Title: US CMS Software and Computing Project, US CMS Collaboration Meeting at FSU, May 2002
1. US CMS Software and Computing Project
US CMS Collaboration Meeting at FSU, May 2002
- Lothar A. T. Bauerdick / Fermilab
- Project Manager
2. Scope and Deliverables
- Provide Computing Infrastructure in the U.S. that needs R&D
- Provide software engineering support for CMS
- Mission is to develop and build User Facilities for CMS physics in the U.S.
- To provide the enabling IT infrastructure that will allow U.S. physicists to fully participate in the physics program of CMS
- To provide the U.S. share of the framework and infrastructure software
- Tier-1 center at Fermilab provides computing resources and support
- User Support for the CMS physics community, e.g. software distribution, help desk
- Support for Tier-2 centers, and for the physics analysis center at Fermilab
- Five Tier-2 centers in the U.S.
- Together they will provide the same CPU/disk resources as the Tier-1
- Facilitate involvement of the collaboration in S&C development
- Prototyping and test-bed effort very successful
- Universities will bid to host Tier-2 centers, taking advantage of existing resources and expertise
- Tier-2 centers to be funded through the NSF program for empowering universities
- Proposal to the NSF submitted Nov 2001
3. Project Milestones and Schedules
- Prototyping, test-beds, R&D started in 2000: developing the LHC Computing Grid in the U.S.
- R&D systems, funded in FY2002 and FY2003
- Used for 5% data challenge (end 2003) → release of the Software and Computing TDR (technical design report)
- Prototype T1/T2 systems, funded in FY2004
- For 20% data challenge (end 2004) → end of Phase 1, Regional Center TDR, start of deployment
- Deployment 2005-2007, at 30%/30%/40% of costs
- Fully functional Tier-1/2 funded in FY2005 through FY2007
- Ready for LHC physics run → start of Physics Program
- S&C Maintenance and Operations from 2007 on
4. US CMS S&C Since UCR
- Consolidation of the project, shaping up the R&D program
- Project baselined in Nov 2001; workplan for CAS, UF, Grids endorsed
- CMS has become the lead experiment for Grid work → Koen, Greg, Rick
- US Grid Projects: PPDG, GriPhyN and iVDGL
- EU Grid Projects: DataGrid, DataTAG
- LHC Computing Grid Project
- Fermilab UF team, Tier-2 prototypes, US CMS testbed
- Major production efforts, PRS support
- Objectivity goes, LCG comes
- We do have a working software and computing system! → Physics Analysis
- CCS will drive much of the common LCG Application Area
- Major challenges to manage and execute the project
- Since fall 2001 we knew the LHC start would be delayed → new date April 2007
- Proposal to NSF in Oct 2001; things are probably moving now
- New DOE funding guidance (and lack thereof from NSF) is starving us in 2002-2004
- Very strong support for the Project from individuals in CMS, Fermilab, the Grid projects, the funding agencies
5. Other New Developments
- NSF proposal guidance AND DOE guidance are lacking (S&C M&O)
- That prompted a change in US CMS line management: the Program Manager will oversee both the Construction Project and the S&C Project
- New DOE guidance for S&C M&O is much below the S&C baseline M&O request
- Europeans have achieved major UF funding, significantly larger relative to the U.S.
- LCG started, expects the U.S. to partner with the European projects
- LCG Application Area possibly imposes issues on the CAS structure
- Many developments and changes that invalidate or challenge much of what the PM tried to achieve
- Opportunity to take stock of where we stand in US CMS S&C before we try to understand where we need to go
6. Vivian Has Left S&C
- Thanks and appreciation for Vivian's work of bringing the UF project to the successful baseline
- New scientist position opened at Fermilab for the UF L2 manager, and physics!
- Other assignments
- Hans Wenzel: Tier-1 Manager
- Jorge Rodriguez: U.Florida pT2 L3 manager
- Greg Graham: CMS GIT Production Task Lead
- Rick Cavanaugh: US CMS Testbed Coordinator
7. Project Status
- User Facilities status and successes
- US CMS prototype systems: Tier-1, Tier-2, testbed
- Intense collaboration with the US Grid projects, Grid-enabled MC production system
- User Support: facilities, software, operations for PRS studies
- Core Application Software status and successes
- See Ian's talk
- Project Office started
- Project Engineer hired, to work on WBS, Schedule, Budget, Reporting, Documenting
- SOWs in place w/ CAS universities; MOUs, subcontracts, invoicing are coming
- In the process of signing the MOUs
- Have a draft of the MOU with iVDGL on prototype Tier-2 funding
8. Successful Baselining Review
- The Committee endorses the proposed project scope, schedule, budgets and management plan
- Endorsement for the scrubbed project plan following the DOE/NSF guidance: $3.5M DOE + $2M NSF in FY2003 and $5.5M DOE + $3M NSF in FY2004!
9. CMS Produced Data in 2001
Reconstructed w/ pile-up. TOTAL: 29 TB
- TYPICAL EVENT SIZES
- Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
- Reconstructed: 1 event at 10^33 = 1.2 MB; 1 event at 2x10^33 = 1.6 MB; 1 event at 10^34 = 5.6 MB (see the volume arithmetic at the end of this slide)
Data produced per site:
  CERN         14 TB
  FNAL         12 TB
  Caltech      0.60 TB
  Moscow       0.45 TB
  INFN         0.40 TB
  Bristol/RAL  0.22 TB
  UCSD         0.20 TB
  IN2P3        0.10 TB
  Wisconsin    0.05 TB
  Helsinki     -
  UFL          0.08 TB
- These fully simulated data samples are essential for physics and trigger studies → Technical Design Report for DAQ and Higher Level Triggers
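To make the volumes above concrete, here is a small back-of-the-envelope sketch in Python (the per-event sizes are the ones quoted on this slide; the 10M-event case anticipates the Spring 2002 production figures later in this talk):

    # Data-volume arithmetic from the typical CMS event sizes above.
    SIM_MB = 1.4                                        # 1 CMSIM = 1 OOHit event [MB]
    RECO_MB = {"1e33": 1.2, "2e33": 1.6, "1e34": 5.6}   # reconstructed [MB/event]

    def sample_tb(n_events, mb_per_event):
        """Total sample size in TB (1 TB = 1e6 MB)."""
        return n_events * mb_per_event / 1e6

    # The Spring 2002 assignment of 10M events at ~1.5 MB/event is ~15 TB:
    print(f"10M events, simulated:     {sample_tb(10e6, 1.5):5.1f} TB")
    # Fully reconstructed events at 10^34 are ~4x larger per event:
    print(f"10M events, reco at 10^34: {sample_tb(10e6, RECO_MB['1e34']):5.1f} TB")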
10. Production Operations
- Production efforts are manpower intensive!
- Fermilab Tier-1 production operations → 1.7 FTE of sustained effort to fill those 8 roles, plus the system support people that need to help if something goes wrong!!
At Fermilab (US CMS, PPDG): Greg Graham, Shafqat Aziz, Yujun Wu, Moacyr Souza, Hans Wenzel, Michael Ernst, Shahzad Muzaffar + staff
At U Florida (GriPhyN, iVDGL): Dimitri Bourilkov, Jorge Rodriguez, Rick Cavanaugh + staff
At Caltech (GriPhyN, PPDG, iVDGL, USCMS): Vladimir Litvin, Suresh Singh et al.
At UCSD (PPDG, iVDGL): Ian Fisk, James Letts + staff
At Wisconsin: Pam Chumney, R. Gowrishankara, David Mulvihill + Peter Couvares, Alain Roy et al.
At CERN (USCMS): Tony Wildish + many
11. US CMS Prototypes and Test-beds
- Tier-1 and Tier-2 prototypes and test-beds operational
- Facilities for event simulation, including reconstruction
- Sophisticated processing for pile-up simulation
- User cluster and hosting of data samples for physics studies
- Facilities and Grid R&D
12. Tier-1 Equipment
13. Tier-1 Equipment
14. Using the Tier-1 System: User System
- Until the Grid becomes reality (maybe soon!) people who want to use computing facilities at Fermilab need to obtain an account
- That requires registration as a Fermilab user (DOE requirement)
- We will make sure that turn-around times are reasonably short; did not hear complaints yet
- Go to http://computing.fnal.gov/cms/ and click on the "CMS Account" button that will guide you through the process
- Step 1: Get a valid Fermilab ID
- Step 2: Get an fnalu account and CMS account
- Step 3: Get a Kerberos principal and crypto card
- Step 4: Information for first-time CMS account users: http://consult.cern.ch/writeup/form01/
- Got > 100 users, currently about 1 new user per week
15US CMS User Cluster
RD on reliable i/a serviceOS Mosix? batch
system Fbsng? Storage Disk farm?
[Diagram: eight worker nodes (FRY1-FRY8) on a 100 Mbps switch; BIGMAC server attached via GigaBit and SCSI 160 to a 250 GB RAID]
- To be released June 2002! nTuple, Objy analysis etc.
16. User Access to Tier-1 Data
- Hosting of Jets/MET data
- Muons will be coming soon
[Diagram: AMD server with AMD/Enstore interface ("Snickers") serving Objects out of the Enstore STKEN silo (> 10 TB) over the network to Users]
Working on providing a powerful disk cache.
Host redirection protocol allows adding more servers → scaling and load balancing (see the sketch below).
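The scaling argument can be illustrated with a minimal redirector sketch (illustrative only: the host names and the load metric are invented, and the real AMS/Enstore redirection protocol is not reproduced here). Each client asks the redirector for a data server; registering another server immediately adds capacity:

    # Minimal host-redirection sketch: the redirector points each client
    # at the least-loaded data server, so adding servers scales the system.
    import random

    class Redirector:
        def __init__(self, servers):
            self.load = {s: 0 for s in servers}    # active sessions per server

        def add_server(self, server):
            self.load[server] = 0                  # scaling: just register more hosts

        def redirect(self):
            low = min(self.load.values())          # least-loaded server wins,
            choice = random.choice(                # ties broken randomly
                [s for s, n in self.load.items() if n == low])
            self.load[choice] += 1
            return choice

        def release(self, server):
            self.load[server] -= 1

    r = Redirector(["ams1.fnal.gov", "ams2.fnal.gov"])   # hypothetical hosts
    r.add_server("ams3.fnal.gov")
    print(r.redirect())    # the client then opens its data connection here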
17. US CMS T2 Prototypes and Test-beds
- Tier-1 and Tier-2 prototypes and test-beds operational
18. California Prototype Tier-2 Setup
19. Benefits of US Tier-2 Centers
- Bring computing resources close to user communities
- Provide dedicated resources to regions (of interest and geographical)
- More control over localized resources, more opportunities to pursue physics goals
- Leverage additional resources, which exist at the universities and labs
- Reduce computing requirements at CERN (supposed to account for only 1/3 of total LHC facilities!)
- Help meet the LHC Computing Challenge
- Provide a diverse collection of sites, equipment and expertise for development and testing
- Provide much needed computing resources
- US CMS plans for about 2 FTE at each Tier-2 site plus equipment funding
- Supplemented with Grid, university and lab funds (BTW, no I/S costs in the US CMS plan)
- Problem: how do you run a center with only two people that will have much greater processing power than CERN has currently?
- This involves facilities and operations R&D to reduce the operations personnel required to run the center, e.g. investigating cluster management software
20. U.S. Tier-1/2 System Operational
- CMS Grid integration and deployment on the U.S. CMS test bed
- Data challenges and production runs on Tier-1/2 prototype systems
- Spring Production 2002 finishing → Physics, Trigger, Detector studies
- Produce 10M events and 15 TB of data, plus 10M minimum-bias events, fully simulated including pile-up, fully reconstructed
- Large assignment to U.S. CMS
- Successful production in 2001
- 8.4M events fully simulated, including pile-up, 50% in the U.S.
- 29 TB of data processed, 13 TB in the U.S.
21. US CMS Prototypes and Test-beds
- All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
- Integrating Grid software into CMS systems
- Bringing CMS production onto the Grid
- Understanding the operational issues
- CMS directly profits from Grid funding
- Deliverables of the Grid projects become useful for the LHC in the real world
- Major successes: MOP, GDMP
22. Grid-enabled CMS Production
- Successful collaboration with the Grid projects!
- MOP (Fermilab, U.Wisconsin/Condor)
- Remote job execution: Condor-G, DAGMan
- GDMP (Fermilab, European DataGrid WP2)
- File replication and replica catalog (Globus)
- Successfully used on the CMS testbed
- First real CMS production use finishing now! (A sketch of the job plumbing follows below.)
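As a hedged illustration of the plumbing involved (the submit-file attributes follow the historical Condor-G syntax, but the gatekeeper address, executable, and the publish step are invented placeholders; the real MOP scripts are considerably more involved), a production assignment can be expanded into per-run Condor-G submit files tied together by a DAGMan DAG:

    # Sketch: emit Condor-G submit files plus a DAGMan DAG that runs a
    # simulation job and then a (placeholder) GDMP publish step per run.
    def write_submit(run):
        """Write a Condor-G submit description for one simulation run."""
        name = f"cmsim_{run:05d}.sub"
        lines = [
            "universe        = globus",
            # hypothetical Tier-2 gatekeeper:
            "globusscheduler = gatekeeper.tier2.example.edu/jobmanager-condor",
            "executable      = run_cmsim.sh",          # placeholder wrapper script
            f"arguments       = run_{run:05d}",
            f"output          = cmsim_{run:05d}.out",
            f"error           = cmsim_{run:05d}.err",
            f"log             = cmsim_{run:05d}.log",
            "queue",
        ]
        with open(name, "w") as f:
            f.write("\n".join(lines) + "\n")
        return name

    # DAG: each simulation job is followed by a publish/transfer step (GDMP).
    with open("production.dag", "w") as dag:
        for run in range(1, 4):
            dag.write(f"JOB sim{run} {write_submit(run)}\n")
            dag.write(f"JOB pub{run} publish_{run:05d}.sub\n")   # placeholder
            dag.write(f"PARENT sim{run} CHILD pub{run}\n")
    # Submitted with: condor_submit_dag production.dag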
23. Recent Successes with the Grid
- Grid-enabled CMS production environment. NB: MOP = Grid-ified IMPALA, a vertically integrated CMS application
- Brings together US CMS with all three US Grid projects
- PPDG: Grid developers (Condor, DAGMan), GDMP (w/ WP2)
- GriPhyN: VDT, in the future also the virtual data catalog
- iVDGL: pT2 sites and US CMS testbed
- CMS Spring 2002 production assignment of 200k events to MOP
- Half-way through; next week transfer back to CERN
- This is being considered a major success for US CMS and the Grid projects!
- Many bugs in Condor and Globus found and fixed
- Many operational issues that needed and still need to be sorted out
- MOP will be moved into the production Tier-1/Tier-2 environment
24. Successes: Grid-enabled Production
- Major milestone for US CMS and PPDG
- From the PPDG internal review of MOP:
- From the Grid perspective, MOP has been outstanding. It has both legitimized the idea of using Grid tools such as DAGMan, Condor-G, GDMP, and Globus in a real production environment outside of prototypes and trade-show demonstrations. Furthermore, it has motivated the use of Grid tools such as DAGMan, Condor-G, GDMP, and Globus in novel environments, leading to the discovery of many bugs which would otherwise have prevented these tools from being taken seriously in a real production environment.
- From the CMS perspective, MOP won early respect for taking on real production problems, and is soon ready to deliver real events. In fact, today or early next week we will update the RefDB at CERN, which tracks production at the various regional centers. This has been delayed because of the numerous bugs that, while being tracked down, involved several cycles of development and redeployment. The end of the current CMS production cycle is in three weeks, and MOP will be able to demonstrate some grid-enabled production capability by then. We are confident that this will happen. It is not necessary at this stage to have a perfect MOP system for CMS production: IMPALA also has some failover capability and we will use that where possible. However, it has been a very useful exercise and we believe that we are among the first teams to tackle Globus and Condor-G in such a stringent and HEP-specific environment.
25. Successes: File Transfers
- In 2001 we were observing typical rates for large data transfers of e.g. CERN → FNAL 4.7 GB/hour
- After network tuning, using Grid tools (Globus URLcopy), we gained a factor of 10!
- Today we are transferring 1.5 TByte of simulated data from UCSD to FNAL
- At rates of 10 MByte/second! That almost saturates the network interfaces out of Fermilab (155 Mbps) and at UCSD (FastEthernet); see the arithmetic sketch at the end of this slide
- The ability to transfer a TeraByte in a day is crucial for the Tier-1/Tier-2 system
- Many operational issues remain to be solved
- GDMP is a grid tool for file replication, developed jointly between the US and the EU
- Showcase application for EU DataGrid WP2 data replication
- Needs more work and strong support → VDT team (PPDG, GriPhyN, iVDGL)
- e.g. CMS GDMP heartbeat for debugging new installations and monitoring old ones
- Installation and configuration issues, releases of underlying software like Globus
- Issues with site security, e.g. firewalls
- Uses the Globus Security Infrastructure, which demands VO Certification Authority infrastructure for CMS
- Etc.
- This needs to be developed, tested and deployed, and shows that the US CMS testbed is invaluable!
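The rates quoted above are easy to sanity-check (a simple arithmetic sketch; all input numbers are taken from this slide):

    # Transfer-rate arithmetic for the UCSD -> FNAL copy quoted above.
    rate_MB_s = 10.0                        # achieved rate [MByte/s]
    print(f"{rate_MB_s * 86400 / 1e6:.2f} TB/day")   # ~0.86 TB -> "a TB in a day"

    mbps = rate_MB_s * 8                    # 10 MByte/s = 80 Mbit/s
    print(f"{mbps / 100:.0%} of UCSD FastEthernet (100 Mbps)")
    print(f"{mbps / 155:.0%} of the FNAL 155 Mbps interface")

    # The 1.5 TByte sample takes well under two days at this rate:
    print(f"{1.5e6 / rate_MB_s / 3600:.0f} hours for 1.5 TByte")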
26. DOE/NSF Grid R&D Funding for CMS
27. Farm Setup
- Almost any computer can run the CMKIN and CMSIM steps using the CMS binary distribution system (US CMS DAR)
This step is almost trivially put on the Grid. Almost. (See the sketch below.)
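A hedged sketch of why this step fits the Grid so easily (the distribution URL and the command-line flags are invented placeholders, not the real DAR interface): a worker node only has to fetch the self-contained binary distribution, unpack it, and run the generator and simulation steps, with no pre-installed CMS software:

    # Illustrative worker-node wrapper for a CMKIN+CMSIM job using a
    # self-contained binary distribution (URL and flags are placeholders).
    import subprocess, tarfile, urllib.request

    DAR_URL = "http://uscms.example.fnal.gov/dar/cmsim-dist.tar.gz"  # placeholder

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)     # fail loudly so the Grid job fails

    urllib.request.urlretrieve(DAR_URL, "dist.tar.gz")
    with tarfile.open("dist.tar.gz") as tar:
        tar.extractall("dist")              # everything the binaries need is inside

    run(["dist/bin/cmkin", "--seed", "12345", "--out", "kine.ntpl"])  # placeholder flags
    run(["dist/bin/cmsim", "--in", "kine.ntpl", "--out", "hits.fz"])  # placeholder flags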
28. e.g. on the 13.6 TF / $53M TeraGrid?
[Diagram: TeraGrid/DTF site layout for NCSA, SDSC, Caltech and Argonne, each with site resources, HPSS/UniTree archival storage and external networks; NCSA/PACI 8 TF, 240 TB; SDSC 4.1 TF, 225 TB. www.teragrid.org]
29. Farm Setup for Reconstruction
- The first step of the reconstruction is Hit Formatting, where simulated data is taken from the Fortran files, formatted, and entered into the Objectivity database
- The process is sufficiently fast and involves enough data that more than 10-20 jobs will bog down the database server (see the throttling sketch below)
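That 10-20-job window suggests the obvious operational safeguard, sketched below (illustrative only; in practice the limit would be enforced by the batch system rather than a Python script, and the writeHits command is a placeholder): never let more than a fixed number of hit-formatting jobs talk to the Objectivity server at once.

    # Throttle concurrent hit-formatting jobs so the database server is
    # never hit by more than MAX_JOBS clients at a time.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    MAX_JOBS = 15    # safely inside the 10-20 job window quoted above

    def format_hits(fz_file):
        # placeholder command: read the Fortran file, write to Objectivity
        return subprocess.run(["writeHits", fz_file]).returncode

    inputs = [f"hits_{i:04d}.fz" for i in range(200)]
    with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
        codes = list(pool.map(format_hits, inputs))
    print(sum(c == 0 for c in codes), "of", len(inputs), "jobs succeeded")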
30. Pile-up Simulation!
- Unique at the LHC due to the high luminosity and short bunch-crossing time
- Up to 200 minimum bias events overlaid on interesting triggers
- Leads to pile-up in the detectors → needs to be simulated!
This makes a CPU-limited task (event simulation) VERY I/O intensive! (See the arithmetic below.)
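The blow-up is easy to quantify (an arithmetic sketch; the minimum-bias event size is assumed to equal the 1.4 MB simulated-event size from the 2001 production slide, and the events-per-hour figure is an invented example):

    # Why pile-up digitization turns CPU-limited simulation into an
    # I/O-intensive task: each signal crossing mixes in ~200 min-bias events.
    N_PILEUP = 200        # minimum bias events per signal event at 10^34
    MB_EVENT = 1.4        # assumed simulated min-bias event size [MB]

    read_per_signal = N_PILEUP * MB_EVENT            # MB read per signal event
    print(f"~{read_per_signal / 1024:.2f} GB read per signal event")

    # At an assumed 100 signal events/hour per node, the read load is:
    print(f"~{read_per_signal * 100 / 1024:.0f} GB/hour per node")
    # ...which is why multiple Objectivity AMS data servers are needed.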
31. Farm Setup for Pile-up Digitization
- The most advanced production step is digitization with pile-up
- The response of the detector is digitized, the physics objects are reconstructed and stored persistently, and at full luminosity 200 minimum bias events are combined with the signal events
Due to the large number of minimum bias events, multiple Objectivity AMS data servers are needed. Several configurations have been tried.
32. Objy Server Deployment Complex
4 production federations at FNAL (uses the catalog only to locate database files). 3 FNAL servers plus several worker nodes used in this configuration: 3 federation hosts with attached RAID partitions, 2 lock servers, 4 journal servers, 9 pile-up servers.
33. Example of CMS Physics Studies
- Resolution studies for jet reconstruction
- Full detector simulation essential to understand jet resolutions
- Indispensable to design realistic triggers and understand rates at high luminosity
[Plots: QCD 2-jet events with FSR, full simulation w/ tracks and HCAL noise, vs. QCD 2-jet events without FSR: no pile-up, no track reconstruction, no HCAL noise]
34. Pile-up and Jet Energy Resolution
- Jet energy resolution
- Pile-up contributions to jets are large and have large variations
- Can be estimated event-by-event from the total energy in the event (illustrated in the sketch below)
- Large improvement if the pile-up correction is applied (red curve), e.g. 50% → 35% at ET = 40 GeV
- Physics studies depend on full detailed detector simulation; realistic pile-up processing is essential!
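For illustration, the event-by-event idea reads roughly like the sketch below (heavily hedged: the functional form and all constants are invented for the example; the real correction is derived from the full ORCA simulation studies): estimate the pile-up ET under the jet cone from the total transverse energy in the event and subtract it.

    # Illustrative event-by-event pile-up correction for jet ET.
    # The constant and the inputs are made-up numbers, not the CMS values.
    CONE_FRACTION = 0.01   # assumed fraction of the event's pile-up ET in the cone

    def corrected_jet_et(raw_jet_et, total_event_et, hard_scatter_et):
        """Subtract the pile-up ET estimated from the event's total energy."""
        pileup_et = total_event_et - hard_scatter_et   # event-by-event estimate
        return raw_jet_et - CONE_FRACTION * pileup_et

    # Example: a raw 45 GeV jet in a busy full-luminosity event
    print(f"{corrected_jet_et(45.0, 800.0, 300.0):.1f} GeV")   # 45 - 0.01*500 = 40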
35. Tutorial at UCSD
- Very successful 4-day tutorial with 40 people attending
- Covering use of CMS software, including CMKIN/CMSIM, ORCA, OSCAR, IGUANA
- Covering physics code examples from all PRS groups
- Covering production tools and environment and Grid tools
- Opportunity to get people together
- UF and CAS engineers with PRS physicists
- Grid developers and CMS users
- The tutorials have been very well thought through and are very useful for self-study, so they will be maintained
- It is amazing what we can already do with CMS software
- E.g. impressive to see the IGUANA visualization environment, including home-made visualizations
- However, our system is (too? still too?) complex
- We maybe need more people taking a day off and going through the self-guided tutorials
36. FY2002 UF Funding
- Excellent initial effort and DOE support for User Facilities
- Fermilab established as Tier-1 prototype and major Grid node for LHC computing
- Tier-2 sites and testbeds are operational and are contributing to production and R&D
- The head start of the U.S. efforts has pushed the CERN commitment to support remote sites
- The FY2002 funding has given major headaches to the PM
- DOE funding of $2.24M was insufficient to ramp the Tier-1 to baseline size
- The NSF contribution is unknown as of today
- According to the plan we should have more people and equipment at the Fermilab T1
- Need some 7 additional FTEs and more equipment funding
- This has been strongly endorsed by the baseline reviews
- All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
37. Plans for 2002-2003
- Finish Spring Production challenge by June
- User cluster, user federations
- Upgrade of facilities ($300k)
- Develop CMS Grid environment toward the LCG Production Grid
- Move CMS Grid environment from testbed to facilities
- Prepare for first LCG-USUF milestone, November?
- Tier-2, iVDGL milestones w/ ATLAS, SC2002
- LCG-USUF Production Grid milestone in May 2003
- Bring Tier-1/Tier-2 prototypes up to scale
- Serving the user community: user cluster, federations, Grid-enabled user environment
- UF studies with the persistency framework
- Start of physics DCs and computing DCs
- CAS + LCG: everything is on the table, but the table is not empty
- Persistency framework: prototype in September 2002, release in July 2003
- DDD and OSCAR/Geant4 releases
- New strategy for visualization / IGUANA
- Develop distributed analysis environment w/ Caltech et al.
38. Funding for UF R&D Phase
- There is a lack of funding and a lack of guidance for 2003-2005
- NSF proposal guidance AND DOE guidance are lacking (S&C M&O)
- New DOE guidance for S&C M&O is much below the S&C baseline M&O request
- Fermilab US CMS projects oversight has proposed minimal M&O for 2003-2004 and large cuts for S&C, given the new DOE guidance
- The NSF has floated the idea of applying a rule of 81/250 x DOE funding
- This would lead to very serious problems in every year of the project: we would lack 1/3 of the requested funding ($14.0M of $21.2M)
39. DOE/NSF Funding Shortfall
40. FY2003 Allocation à la Nov 2001
41. Europeans Achieved Major UF Funding
- Funding for European User Facilities in their countries now looks significantly larger than UF funding in the U.S.
- This statement is true relative to the size of their respective communities
- It is in some cases even true in absolute terms!!
- Given our funding situation, are we going to be a partner for those efforts?
- BTW, US ATLAS proposes major cuts in the UF/Tier-1 pilot flame at BNL
42. How About The Others: DE
43. How About The Others: IT
44. How About The Others: RU
45. How About The Others: UK
46. How About The Others: FR
47. FY2002-FY2004 Are Critical in the US
- Compared to the European efforts, the US CMS UF efforts are very small
- In FY2002 the US CMS Tier-1 is sized at 4 kSI95 CPU and 5 TB storage
- The Tier-1 effort is 5.5 FTE; in addition there are 2 FTE CAS and 1 FTE Grid
- S&C baseline 2003/2004: the Tier-1 effort needs to be at least $1M/year above FY2002 to sustain the UF R&D and become a full part of the LHC Physics Research Grid
- Need some 7 additional FTEs and more equipment funding at the Tier-1
- Part of this effort would go directly into user support
- Essential areas are insufficiently covered now and need to be addressed in 2003 at the latest: fabric management, storage resource management, networking, system configuration management, collaborative tools, interfacing to Grid infrastructure, system management and operations support
- This has been strongly endorsed by the S&C baseline review of Nov 2001
- All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
48. The U.S. User Facilities Will Seriously Fall Behind European Tier-1 Efforts, Given the Funding Situation!
- To keep US leadership, and
- Not to put US-based science at a disadvantage,
- Additional funding is required:
- At least $1M/year at Tier-1 sites
49. LHC Computing Grid Project
- $36M project 2002-2004, half equipment, half personnel; successful RRB
- Expect to ramp to > 30 FTE in 2002, and 60 FTE in 2004
- About $2M / year equipment
- e.g. the UK delivers 26.5% of LCG funding AT CERN ($9.6M)
- US CMS has requested $11.7M IN THE US (CAS $5.89M)
- Current allocation (assuming CAS, iVDGL) would be $7.1M IN THE US
- Largest personnel fraction in the LCG Applications Area
- All personnel to be at CERN
- People staying at CERN for less than 6 months are counted at a 50% level, regardless of their experience
- CCS will work on LCG AA projects → US CMS will contribute to the LCG
- This brings up several issues that US CMS S&C should deal with
- The Europeans have decided to strongly support the LCG Application Area
- But at the same time we do not see more support for the CCS efforts
- CMS and US CMS will have to do, at some level, a rough accounting of LCG AA vs. CAS and of LCG facilities vs. US UF
50. Impact of LHC Delay
- Funding shortages in FY2001 and FY2002 already lead to significant delays
- Others have done more; we are seriously understaffed and do not do enough now
- We lack 7 FTE already this year, and will be able to start hiring only in FY2003
- This has led to delays and will further delay our efforts
- Long term:
- Do not know; predictions of equipment costs are too uncertain to evaluate possible cost savings due to delays by roughly a year
- However, schedules become more realistic
- Medium term:
- Major facilities (LCG) milestones shift by about 6 months
- 1st LCG prototype grid moved to end of 2002 → more realistic now
- End of R&D moves from end 2004 to mid 2005
- Detailed schedule and work plan expected from the LCG project and CMS CCS (June)
- No significant overall cost savings for the R&D phase
- We are already significantly delayed, and not even at half the effort of what other countries are doing (UK, IT, DE, RU!!)
- Catching up on our delayed schedule is feasible, if we can manage to hire 7 people in FY2003 and manage to support this level of effort in FY2004
- Major issue with the lack of equipment funding
- Re-evaluation of equipment deployment will be done during 2002 (PASTA)
51. US S&C Minimal Requirements
- The DOE funding guidance for the preparation of the US LHC research program approaches adequate funding levels around when the LHC starts in 2007, but is heavily back-loaded and does not accommodate the baselined software and computing project and the needs for pre-operations of the detector in 2002-2005.
- We take up the charge to better understand the minimum requirements, and to consider non-standard scenarios for reducing some of the funding shortfalls, but ask the funding agencies to explore all available avenues to raise the funding level.
- The LHC computing model of a worldwide distributed system is new and needs significant R&D. The experiments are approaching this with a series of "data challenges" that will test the developing systems and will eventually yield a system that works.
- US CMS S&C has to be part of the data challenges (DC), provide support for trigger and detector studies (UF subproject), and deliver engineering support for CMS core software (CAS subproject).
52. UF Needs
- The UF subproject is centered on a Tier-1 facility at Fermilab that will be driving the US CMS participation in these data challenges.
- The prototype Tier-2 centers will become integrated parts of the US CMS Tier-1/Tier-2 facilities.
- Fermilab will be a physics analysis center for CMS. LHC physics with CMS will be an important component of Fermilab's research program. Therefore Fermilab needs to play a strong role as a Tier-1 center in the upcoming CMS and LHC data challenges.
- The minimal Tier-1 effort would require at least doubling the current Tier-1 FTEs at Fermilab, and granting at least $300k of yearly funding for equipment. This level represents the critical threshold.
- The yearly costs for this minimally sized Tier-1 center at Fermilab would approach $2M after an initial $1.6M in FY03 (hiring delays). The minimal Tier-2 prototypes would need $400k support for operations; the rest would come out of iVDGL funds.
53. CAS Needs
- Ramping down the CAS effort is not an option, as we would face very adverse effects on CMS. CCS manpower is now even more needed to be able to drive and profit from the new LCG project; there is no reason to believe that the LCG will provide a CMS-ready solution without CCS being heavily involved in the process. We can even less allow for slips or delays.
- Possible savings from the new close collaboration between CMS and ATLAS through the LCG project will potentially give some contingency to the engineering effort that is to date missing in the project plan. That contingency (which would first have to be earned) could not be released before the end of 2005.
- The yearly costs of keeping the current level for CAS are about $1.7M per year (DOE $1000k, NSF $700k), including escalation and reserve.
54. Minimal US CMS S&C until 2005
- Definition of "minimal": if we can't afford even this, the US will not participate in the CMS Data Challenges and LCG Milestones in 2002-2004
- For US CMS S&C the minimal funding for the R&D phase (until 2005) would include (PRELIMINARY):
- Tier-1: $1600k in FY03 and $2000k in the following years
- Tier-2: $400k per year from the NSF to sustain the support for Tier-2 manpower
- CAS: $1M from DOE and $700k from the NSF
- Project Office: $300k (includes reserve)
- A failure to provide this level of funding would lead to severe delays and inefficiencies in the US LHC physics program. Considering the large investments in the detectors, and the large yearly costs of the research program, such an approach would not be cost efficient and productive.
- The ramp-up of the UF to the final system, beyond 2005, will need to be aligned with the plans of CERN and the other regional centers. After 2005 the funding profile seems to approach the demand.
55. Where Do We Stand?
- Setup of an efficient and competent s/w engineering support for CMS
- David is happy and CCS is doing well
- Proposal-driven support for detector/PRS engineering
- Setup of a User Support organization out of UF (and CAS) staff
- PRS is happy (but needs more)
- Proposal-driven provision of resources: data servers, user cluster
- Staff to provide data sets and nTuples for PRS, small specialized production
- Accounts, software releases, distribution, help desk etc.
- Tutorials done at Tier-1 and Tier-2 sites
- Implemented and commissioned a first Tier-1/Tier-2 system of RCs
- UCSD, Caltech, U.Florida, U.Wisconsin, Fermilab
- Shown that Grid tools can be used in production and greatly contribute to the success of Grid projects and middleware
- Validated use of the network between Tier-1 and Tier-2: 1 TB/day!
- Developing a production-quality Grid-enabled User Facility
- Impressive organization for running production in the US
- Team at Fermilab and individual efforts at the Tier-2 centers
- Grid technology helps to reduce the effort
- Close collaboration with the Grid projects infuses additional effort into US CMS
- Collaboration between sites (including ATLAS, like BNL) on facility issues
56. What Have We Achieved?
- We are participating in and driving a world-wide CMS production → DC
- We are driving a large part of the US Grid integration and deployment work
- That goes beyond the LHC and even HEP
- We have shown that the Tier-1/Tier-2 User Facility system in the US can work!
- We definitely are on the map for LHC computing and the LCG
- We also are threatened to be starved over the next years
- The funding agencies have failed to recognize the opportunity for continued US leadership in this field, which others like the UK are realizing and supporting!
- We are thrown back to a minimal funding level, and even that has been challenged
- But this is the time when our partners at CERN will expect to see us deliver and work with the LCG
57. Conclusions
- The US CMS S&C Project looks technically pretty sound
- Our customers (CCS and US CMS users) appear to be happy, but want more
- We also need more R&D to build the system, and we need to do more to measure up to our partners
- We started in 1998 with some supplemental funds; we are a DOE line item now
- We have received less than requested for a couple of years now
- But this FY2002 the project has become bitterly under-funded cf. the reviewed and endorsed baseline
- The funding agencies have failed to provide funding for the US S&C and to provide guidance for the US User Facilities
- The ball is in our (US CMS) court now: it is not an option to do just a little bit of S&C; the S&C R&D is a project, with baseline plans, funding profiles, change control
- It is up to US CMS to decide
- I ask you to support my request to build up the User Facilities in the US
58.
59. UF Equipment Costs
- Detailed information on Tier-1 facility costing
- See document in your handouts!
- All numbers in FY2002 k$
60. Total Project Costs
61. U.S. CMS Tier-1 RC Installed Capacity
- Fully functional facilities
- 5% data challenge: R&D systems
- 20% data challenge: prototype systems
- 310 kSI95 today is 10,000 PCs (see below)
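The PC equivalence is just arithmetic (a sketch; the per-PC rating is inferred from the slide's own "310 kSI95 is 10,000 PCs"):

    # SpecInt95 capacity expressed in 2002-era PCs, using this slide's scale.
    SI95_PER_PC = 310e3 / 10_000            # ~31 SI95 per PC
    print(f"{SI95_PER_PC:.0f} SI95 per PC")

    # e.g. the FY2002 Tier-1 prototype at 4 kSI95 (earlier slide) is roughly:
    print(f"{4e3 / SI95_PER_PC:.0f} PCs")   # ~130 PCs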
62. Alternative Scenarios
- Q: revise the plans so that CMS and ATLAS do not have identical scope?
- Never been tried in HEP; always competitive experiments
- The UF model is NOT to run a computer center, but to have an experiment-driven effort to get the physics environment in place
- S&C is engineering support for the physics project; outsourcing of engineering to a non-experiment-driven (common) project would mean a complete revision of the physics activities. This would require fundamental changes to experiment management and structure that are not in the purview of the US part of the collaboration
- Specifically, the data challenges are not only or primarily done for the S&C project, but are going to be conducted as a coherent effort of the physics, detector AND S&C groups with the goal to advance the physics, detector AND S&C efforts
- The DCs are why we are here. If we cannot participate, there would be no point in going for an experiment-driven UF
63. Alternative Scenarios
- Q: are Tier-2 resources spread too thin?
- The Tier-2 efforts should be as broad as we can afford. We are including university (non-funded) groups, like Princeton
- If the role of the Tier-2 centers were just to provide computing resources, we would not distribute this, but concentrate on the Tier-1 center. Instead the model is to put some resources at the prototype T2 centers, which allows us to pull in additional resources at these sites. This model seems to be rather successful
- iVDGL funds are being used for much of the efforts at the prototype T2 centers. Hardware investments at the Tier-2 sites up to now have been small. The project planned to fund 1.5 FTE at each site (this funding is not yet there). In CMS we see additional manpower of several FTE at those sites, which comes out of the base program and is being attracted from CS and other communities through the involvement in the Grid projects
64. Alternative Scenarios
- Q: additional software development activities to be combined?
- This will certainly happen. Concretely, we have already started to plan the first large-scale ATLAS-CMS common software project, the new persistency framework. Do we expect significant savings in the manpower efforts? These could be on the order of some 20-30%, if these efforts could be closely managed. However, the management is not in US hands, but in the purview of the LCG project. Also, the very project is ADDITIONAL effort that was not necessary when Objectivity was meant to provide the persistency solution
- Generally we do not expect very significant changes in the estimates for the total engineering manpower required to complete the core software efforts; the possible savings would give a minimal contingency to the engineering effort that is to date missing in the project plan → to be earned first, then released in 2005
65. Alternative Scenarios
- Q: are we losing, are there real cost benefits?
- Any experiment that does not have a kernel of people to run the data challenges will significantly lose
- The commodity is people, not equipment
- Sharing of resources is possible (and will happen), but we need to keep minimal R&D equipment. $300k/year for each T1 is very little funding for doing that. Below that we should just go home
- Tier-2: the mission of the Tier-2 centers is to enable universities to be part of the LHC research program. That function will be cut insofar as the funding for it is cut
- To separate the running of the facilities from the experiment's effort: this is a model that we are developing for our interactions with Fermilab CD; this is the ramping to 35 FTE in 2007, not the 13 FTE now; some services already are being effort-reported to CD-CMS. We have to get the structures in place to get this right; there will be overheads involved
- I do not see real cost benefits in any of these for the R&D phase. I prefer not to discuss the model for 2007 now, but we should stay open minded. However, if we want to approach unconventional scenarios we need to carefully prepare for them. That may start in 2003-2004?
66.
- UF PM
- Control room logbook
- Code dist, DAR, role for the Grid
- T2 work
- CCS schedule?
- More comments on the job
- Nucleation point vs. T1 user community
- New hires
- Tony's assignment, production running
- Disk tests, benchmarking, common work w/ BNL and the iVDGL facility group
- Monitoring: NGOP, Ganglia, Iosif's stuff
- Mention challenges to the testbed/MOP config, certificates, installations, and the help we get from the Grid projects: VO, ESnet CA, VDT
- UF workplan