Title: US CMS Software and Computing Project, US CMS Collaboration Meeting at FSU, May 2002
1. US CMS Software and Computing Project
US CMS Collaboration Meeting at FSU, May 2002
- Lothar A. T. Bauerdick / Fermilab
- Project Manager
2. Scope and Deliverables
- Provide Computing Infrastructure in the U.S. that needs R&D
- Provide software engineering support for CMS
- Mission is to develop and build User Facilities for CMS physics in the U.S.
- To provide the enabling IT infrastructure that will allow U.S. physicists to fully participate in the physics program of CMS
- To provide the U.S. share of the framework and infrastructure software
- Tier-1 center at Fermilab provides computing resources and support
- User Support for the CMS physics community, e.g. software distribution, help desk
- Support for Tier-2 centers, and for the physics analysis center at Fermilab
- Five Tier-2 centers in the U.S.
- Together they will provide the same CPU/disk resources as the Tier-1
- Facilitate involvement of the collaboration in S&C development
- Prototyping and test-bed effort very successful
- Universities will bid to host Tier-2 centers, taking advantage of existing resources and expertise
- Tier-2 centers to be funded through the NSF program for empowering universities
- Proposal to the NSF submitted Nov 2001
3. Project Milestones and Schedules
- Prototyping, test-beds, R&D started in 2000: developing the LHC Computing Grid in the U.S.
- R&D systems, funded in FY2002 and FY2003
- Used for 5% data challenge (end 2003) → release of the Software and Computing TDR (technical design report)
- Prototype T1/T2 systems, funded in FY2004
- For 20% data challenge (end 2004) → end of Phase 1, Regional Center TDR, start of deployment
- Deployment 2005-2007, at 30%/30%/40% of costs
- Fully functional Tier-1/2 funded in FY2005 through FY2007
- Ready for LHC physics run → start of Physics Program
- S&C Maintenance and Operations from 2007 on
4. US CMS S&C Since UCR
- Consolidation of the project, shaping up the R&D program
- Project baselined in Nov 2001; workplan for CAS, UF, Grids endorsed
- CMS has become the lead experiment for Grid work → Koen, Greg, Rick
- US Grid Projects: PPDG, GriPhyN and iVDGL
- EU Grid Projects: DataGrid, DataTAG
- LHC Computing Grid Project
- Fermilab UF team, Tier-2 prototypes, US CMS testbed
- Major production efforts, PRS support
- Objectivity goes, LCG comes
- We do have a working software and computing system! → Physics Analysis
- CCS will drive much of the common LCG Application Area
- Major challenges to manage and execute the project
- Since fall 2001 we knew the LHC start would be delayed → new date April 2007
- Proposal to NSF in Oct 2001; things are probably moving now
- New DOE funding guidance (and lack thereof from NSF) is starving us in 2002-2004
- Very strong support for the Project from individuals in CMS, Fermilab, the Grid projects, the funding agencies
5. Other New Developments
- NSF proposal guidance AND DOE guidance are lacking (S&C M&O)
- That prompted a change in US CMS line management: the Program Manager will oversee both the Construction Project and the S&C Project
- New DOE guidance for S&C M&O is much below the S&C baseline M&O request
- Europeans have achieved major UF funding, significantly larger relative to the U.S.
- LCG started, expects the U.S. to partner with the European projects
- LCG Application Area possibly imposes issues on the CAS structure
- Many developments and changes that invalidate or challenge much of what the PM tried to achieve
- Opportunity to take stock of where we stand in US CMS S&C before we try to understand where we need to go
6. Vivian Has Left S&C
- Thanks and appreciation for Vivian's work of bringing the UF project to the successful baseline
- New scientist position opened at Fermilab for the UF L2 manager, and physics!
- Other assignments
- Hans Wenzel: Tier-1 Manager
- Jorge Rodriguez: U.Florida pT2 L3 manager
- Greg Graham: CMS GIT Production Task Lead
- Rick Cavanaugh: US CMS Testbed Coordinator
7. Project Status
- User Facilities status and successes
- US CMS prototype systems: Tier-1, Tier-2, testbed
- Intense collaboration with the US Grid projects, Grid-enabled MC production system
- User Support: facilities, software, operations for PRS studies
- Core Application Software status and successes
- See Ian's talk
- Project Office started
- Project Engineer hired, to work on WBS, Schedule, Budget, Reporting, Documenting
- SOWs in place w/ CAS universities; MOUs, subcontracts, invoicing are coming
- In the process of signing the MOUs
- Have a draft of the MOU with iVDGL on prototype Tier-2 funding
8. Successful Baselining Review
- The Committee endorses the proposed project scope, schedule, budgets and management plan
- Endorsement for the scrubbed project plan following the DOE/NSF guidance: $3.5M DOE + $2M NSF in FY2003 and $5.5M DOE + $3M NSF in FY2004!
9. CMS Produced Data in 2001
Reconstructed w/ pile-up. TOTAL: 29 TB
- TYPICAL EVENT SIZES
- Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
- Reconstructed: 1 event at 10^33 = 1.2 MB; 1 event at 2x10^33 = 1.6 MB; 1 event at 10^34 = 5.6 MB (see the volume arithmetic at the end of this slide)
Data produced per site:
  CERN         14 TB
  FNAL         12 TB
  Caltech      0.60 TB
  Moscow       0.45 TB
  INFN         0.40 TB
  Bristol/RAL  0.22 TB
  UCSD         0.20 TB
  IN2P3        0.10 TB
  Wisconsin    0.05 TB
  Helsinki     -
  UFL          0.08 TB
- These fully simulated data samples are essential for physics and trigger studies → Technical Design Report for DAQ and Higher Level Triggers
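To make the volumes above concrete, here is a small back-of-the-envelope sketch in Python (the per-event sizes are the ones quoted on this slide; the 10M-event case anticipates the Spring 2002 production figures later in this talk):

    # Data-volume arithmetic from the typical CMS event sizes above.
    SIM_MB = 1.4                                        # 1 CMSIM = 1 OOHit event [MB]
    RECO_MB = {"1e33": 1.2, "2e33": 1.6, "1e34": 5.6}   # reconstructed [MB/event]

    def sample_tb(n_events, mb_per_event):
        """Total sample size in TB (1 TB = 1e6 MB)."""
        return n_events * mb_per_event / 1e6

    # The Spring 2002 assignment of 10M events at ~1.5 MB/event is ~15 TB:
    print(f"10M events, simulated:     {sample_tb(10e6, 1.5):5.1f} TB")
    # Fully reconstructed events at 10^34 are ~4x larger per event:
    print(f"10M events, reco at 10^34: {sample_tb(10e6, RECO_MB['1e34']):5.1f} TB")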
10. Production Operations
- Production efforts are manpower intensive!
- Fermilab Tier-1 production operations → 1.7 FTE of sustained effort to fill those 8 roles, plus the system support people that need to help if something goes wrong!!
At Fermilab (US CMS, PPDG): Greg Graham, Shafqat Aziz, Yujun Wu, Moacyr Souza, Hans Wenzel, Michael Ernst, Shahzad Muzaffar + staff
At U Florida (GriPhyN, iVDGL): Dimitri Bourilkov, Jorge Rodriguez, Rick Cavanaugh + staff
At Caltech (GriPhyN, PPDG, iVDGL, USCMS): Vladimir Litvin, Suresh Singh et al.
At UCSD (PPDG, iVDGL): Ian Fisk, James Letts + staff
At Wisconsin: Pam Chumney, R. Gowrishankara, David Mulvihill + Peter Couvares, Alain Roy et al.
At CERN (USCMS): Tony Wildish + many
11. US CMS Prototypes and Test-beds
- Tier-1 and Tier-2 prototypes and test-beds operational
- Facilities for event simulation, including reconstruction
- Sophisticated processing for pile-up simulation
- User cluster and hosting of data samples for physics studies
- Facilities and Grid R&D
12. Tier-1 Equipment
13. Tier-1 Equipment
14. Using the Tier-1 System: User System
- Until the Grid becomes reality (maybe soon!) people who want to use computing facilities at Fermilab need to obtain an account
- That requires registration as a Fermilab user (DOE requirement)
- We will make sure that turn-around times are reasonably short; did not hear complaints yet
- Go to http://computing.fnal.gov/cms/ and click on the "CMS Account" button that will guide you through the process
- Step 1: Get a valid Fermilab ID
- Step 2: Get an fnalu account and CMS account
- Step 3: Get a Kerberos principal and crypto card
- Step 4: Information for first-time CMS account users: http://consult.cern.ch/writeup/form01/
- Got > 100 users, currently about 1 new user per week
15US CMS User Cluster
RD on reliable i/a serviceOS Mosix? batch
system Fbsng? Storage Disk farm?
[Diagram: eight worker nodes (FRY1-FRY8) on a 100 Mbps switch; BIGMAC server attached via GigaBit and SCSI 160 to a 250 GB RAID]
- To be released June 2002! nTuple, Objy analysis etc.
16. User Access to Tier-1 Data
- Hosting of Jets/MET data
- Muons will be coming soon
[Diagram: AMD server with AMD/Enstore interface ("Snickers") serving Objects out of the Enstore STKEN silo (> 10 TB) over the network to Users]
Working on providing a powerful disk cache.
Host redirection protocol allows adding more servers → scaling and load balancing (see the sketch below).
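The scaling argument can be illustrated with a minimal redirector sketch (illustrative only: the host names and the load metric are invented, and the real AMS/Enstore redirection protocol is not reproduced here). Each client asks the redirector for a data server; registering another server immediately adds capacity:

    # Minimal host-redirection sketch: the redirector points each client
    # at the least-loaded data server, so adding servers scales the system.
    import random

    class Redirector:
        def __init__(self, servers):
            self.load = {s: 0 for s in servers}    # active sessions per server

        def add_server(self, server):
            self.load[server] = 0                  # scaling: just register more hosts

        def redirect(self):
            low = min(self.load.values())          # least-loaded server wins,
            choice = random.choice(                # ties broken randomly
                [s for s, n in self.load.items() if n == low])
            self.load[choice] += 1
            return choice

        def release(self, server):
            self.load[server] -= 1

    r = Redirector(["ams1.fnal.gov", "ams2.fnal.gov"])   # hypothetical hosts
    r.add_server("ams3.fnal.gov")
    print(r.redirect())    # the client then opens its data connection here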
17. US CMS T2 Prototypes and Test-beds
- Tier-1 and Tier-2 prototypes and test-beds operational
18. California Prototype Tier-2 Setup
19. Benefits of US Tier-2 Centers
- Bring computing resources close to user communities
- Provide dedicated resources to regions (of interest and geographical)
- More control over localized resources, more opportunities to pursue physics goals
- Leverage additional resources, which exist at the universities and labs
- Reduce computing requirements at CERN (supposed to account for only 1/3 of total LHC facilities!)
- Help meet the LHC Computing Challenge
- Provide a diverse collection of sites, equipment and expertise for development and testing
- Provide much needed computing resources
- US CMS plans for about 2 FTE at each Tier-2 site plus equipment funding
- Supplemented with Grid, university and lab funds (BTW, no I/S costs in the US CMS plan)
- Problem: how do you run a center with only two people that will have much greater processing power than CERN has currently?
- This involves facilities and operations R&D to reduce the operations personnel required to run the center, e.g. investigating cluster management software
20. U.S. Tier-1/2 System Operational
- CMS Grid integration and deployment on the U.S. CMS test bed
- Data challenges and production runs on Tier-1/2 prototype systems
- Spring Production 2002 finishing → Physics, Trigger, Detector studies
- Produce 10M events and 15 TB of data, plus 10M minimum-bias events, fully simulated including pile-up, fully reconstructed
- Large assignment to U.S. CMS
- Successful production in 2001
- 8.4M events fully simulated, including pile-up, 50% in the U.S.
- 29 TB of data processed, 13 TB in the U.S.
21. US CMS Prototypes and Test-beds
- All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
- Integrating Grid software into CMS systems
- Bringing CMS production onto the Grid
- Understanding the operational issues
- CMS directly profits from Grid funding
- Deliverables of the Grid projects become useful for the LHC in the real world
- Major successes: MOP, GDMP
22. Grid-enabled CMS Production
- Successful collaboration with the Grid projects!
- MOP (Fermilab, U.Wisconsin/Condor)
- Remote job execution: Condor-G, DAGMan
- GDMP (Fermilab, European DataGrid WP2)
- File replication and replica catalog (Globus)
- Successfully used on the CMS testbed
- First real CMS production use finishing now! (A sketch of the job plumbing follows below.)
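As a hedged illustration of the plumbing involved (the submit-file attributes follow the historical Condor-G syntax, but the gatekeeper address, executable, and the publish step are invented placeholders; the real MOP scripts are considerably more involved), a production assignment can be expanded into per-run Condor-G submit files tied together by a DAGMan DAG:

    # Sketch: emit Condor-G submit files plus a DAGMan DAG that runs a
    # simulation job and then a (placeholder) GDMP publish step per run.
    def write_submit(run):
        """Write a Condor-G submit description for one simulation run."""
        name = f"cmsim_{run:05d}.sub"
        lines = [
            "universe        = globus",
            # hypothetical Tier-2 gatekeeper:
            "globusscheduler = gatekeeper.tier2.example.edu/jobmanager-condor",
            "executable      = run_cmsim.sh",          # placeholder wrapper script
            f"arguments       = run_{run:05d}",
            f"output          = cmsim_{run:05d}.out",
            f"error           = cmsim_{run:05d}.err",
            f"log             = cmsim_{run:05d}.log",
            "queue",
        ]
        with open(name, "w") as f:
            f.write("\n".join(lines) + "\n")
        return name

    # DAG: each simulation job is followed by a publish/transfer step (GDMP).
    with open("production.dag", "w") as dag:
        for run in range(1, 4):
            dag.write(f"JOB sim{run} {write_submit(run)}\n")
            dag.write(f"JOB pub{run} publish_{run:05d}.sub\n")   # placeholder
            dag.write(f"PARENT sim{run} CHILD pub{run}\n")
    # Submitted with: condor_submit_dag production.dag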
23. Recent Successes with the Grid
- Grid-enabled CMS production environment. NB: MOP = Grid-ified IMPALA, a vertically integrated CMS application
- Brings together US CMS with all three US Grid projects
- PPDG: Grid developers (Condor, DAGMan), GDMP (w/ WP2)
- GriPhyN: VDT, in the future also the virtual data catalog
- iVDGL: pT2 sites and US CMS testbed
- CMS Spring 2002 production assignment of 200k events to MOP
- Half-way through; next week transfer back to CERN
- This is being considered a major success for US CMS and the Grid projects!
- Many bugs in Condor and Globus found and fixed
- Many operational issues that needed and still need to be sorted out
- MOP will be moved into the production Tier-1/Tier-2 environment
24. Successes: Grid-enabled Production
- Major milestone for US CMS and PPDG
- From the PPDG internal review of MOP:
- From the Grid perspective, MOP has been outstanding. It has both legitimized the idea of using Grid tools such as DAGMan, Condor-G, GDMP, and Globus in a real production environment outside of prototypes and trade-show demonstrations. Furthermore, it has motivated the use of Grid tools such as DAGMan, Condor-G, GDMP, and Globus in novel environments, leading to the discovery of many bugs which would otherwise have prevented these tools from being taken seriously in a real production environment.
- From the CMS perspective, MOP won early respect for taking on real production problems, and is soon ready to deliver real events. In fact, today or early next week we will update the RefDB at CERN, which tracks production at the various regional centers. This has been delayed because of the numerous bugs that, while being tracked down, involved several cycles of development and redeployment. The end of the current CMS production cycle is in three weeks, and MOP will be able to demonstrate some grid-enabled production capability by then. We are confident that this will happen. It is not necessary at this stage to have a perfect MOP system for CMS production: IMPALA also has some failover capability and we will use that where possible. However, it has been a very useful exercise and we believe that we are among the first teams to tackle Globus and Condor-G in such a stringent and HEP-specific environment.
25. Successes: File Transfers
- In 2001 we were observing typical rates for large data transfers of e.g. CERN → FNAL 4.7 GB/hour
- After network tuning, using Grid tools (Globus URLcopy), we gained a factor of 10!
- Today we are transferring 1.5 TByte of simulated data from UCSD to FNAL
- At rates of 10 MByte/second! That almost saturates the network interfaces out of Fermilab (155 Mbps) and at UCSD (FastEthernet); see the arithmetic sketch at the end of this slide
- The ability to transfer a TeraByte in a day is crucial for the Tier-1/Tier-2 system
- Many operational issues remain to be solved
- GDMP is a grid tool for file replication, developed jointly between the US and the EU
- Showcase application for EU DataGrid WP2 data replication
- Needs more work and strong support → VDT team (PPDG, GriPhyN, iVDGL)
- e.g. CMS GDMP heartbeat for debugging new installations and monitoring old ones
- Installation and configuration issues, releases of underlying software like Globus
- Issues with site security, e.g. firewalls
- Uses the Globus Security Infrastructure, which demands VO Certification Authority infrastructure for CMS
- Etc.
- This needs to be developed, tested and deployed, and shows that the US CMS testbed is invaluable!
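The rates quoted above are easy to sanity-check (a simple arithmetic sketch; all input numbers are taken from this slide):

    # Transfer-rate arithmetic for the UCSD -> FNAL copy quoted above.
    rate_MB_s = 10.0                        # achieved rate [MByte/s]
    print(f"{rate_MB_s * 86400 / 1e6:.2f} TB/day")   # ~0.86 TB -> "a TB in a day"

    mbps = rate_MB_s * 8                    # 10 MByte/s = 80 Mbit/s
    print(f"{mbps / 100:.0%} of UCSD FastEthernet (100 Mbps)")
    print(f"{mbps / 155:.0%} of the FNAL 155 Mbps interface")

    # The 1.5 TByte sample takes well under two days at this rate:
    print(f"{1.5e6 / rate_MB_s / 3600:.0f} hours for 1.5 TByte")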
26. DOE/NSF Grid R&D Funding for CMS
27. Farm Setup
- Almost any computer can run the CMKIN and CMSIM steps using the CMS binary distribution system (US CMS DAR)
This step is almost trivially put on the Grid. Almost. (See the sketch below.)
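A hedged sketch of why this step fits the Grid so easily (the distribution URL and the command-line flags are invented placeholders, not the real DAR interface): a worker node only has to fetch the self-contained binary distribution, unpack it, and run the generator and simulation steps, with no pre-installed CMS software:

    # Illustrative worker-node wrapper for a CMKIN+CMSIM job using a
    # self-contained binary distribution (URL and flags are placeholders).
    import subprocess, tarfile, urllib.request

    DAR_URL = "http://uscms.example.fnal.gov/dar/cmsim-dist.tar.gz"  # placeholder

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)     # fail loudly so the Grid job fails

    urllib.request.urlretrieve(DAR_URL, "dist.tar.gz")
    with tarfile.open("dist.tar.gz") as tar:
        tar.extractall("dist")              # everything the binaries need is inside

    run(["dist/bin/cmkin", "--seed", "12345", "--out", "kine.ntpl"])  # placeholder flags
    run(["dist/bin/cmsim", "--in", "kine.ntpl", "--out", "hits.fz"])  # placeholder flags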
28. e.g. on the 13.6 TF / $53M TeraGrid?
[Diagram: TeraGrid/DTF site layout for NCSA, SDSC, Caltech and Argonne, each with site resources, HPSS/UniTree archival storage and external networks; NCSA/PACI 8 TF, 240 TB; SDSC 4.1 TF, 225 TB. www.teragrid.org]
29. Farm Setup for Reconstruction
- The first step of the reconstruction is Hit Formatting, where simulated data is taken from the Fortran files, formatted, and entered into the Objectivity database
- The process is sufficiently fast and involves enough data that more than 10-20 jobs will bog down the database server (see the throttling sketch below)
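That 10-20-job window suggests the obvious operational safeguard, sketched below (illustrative only; in practice the limit would be enforced by the batch system rather than a Python script, and the writeHits command is a placeholder): never let more than a fixed number of hit-formatting jobs talk to the Objectivity server at once.

    # Throttle concurrent hit-formatting jobs so the database server is
    # never hit by more than MAX_JOBS clients at a time.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    MAX_JOBS = 15    # safely inside the 10-20 job window quoted above

    def format_hits(fz_file):
        # placeholder command: read the Fortran file, write to Objectivity
        return subprocess.run(["writeHits", fz_file]).returncode

    inputs = [f"hits_{i:04d}.fz" for i in range(200)]
    with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
        codes = list(pool.map(format_hits, inputs))
    print(sum(c == 0 for c in codes), "of", len(inputs), "jobs succeeded")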
30. Pile-up Simulation!
- Unique at the LHC due to the high luminosity and short bunch-crossing time
- Up to 200 minimum bias events overlaid on interesting triggers
- Leads to pile-up in the detectors → needs to be simulated!
This makes a CPU-limited task (event simulation) VERY I/O intensive! (See the arithmetic below.)
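The blow-up is easy to quantify (an arithmetic sketch; the minimum-bias event size is assumed to equal the 1.4 MB simulated-event size from the 2001 production slide, and the events-per-hour figure is an invented example):

    # Why pile-up digitization turns CPU-limited simulation into an
    # I/O-intensive task: each signal crossing mixes in ~200 min-bias events.
    N_PILEUP = 200        # minimum bias events per signal event at 10^34
    MB_EVENT = 1.4        # assumed simulated min-bias event size [MB]

    read_per_signal = N_PILEUP * MB_EVENT            # MB read per signal event
    print(f"~{read_per_signal / 1024:.2f} GB read per signal event")

    # At an assumed 100 signal events/hour per node, the read load is:
    print(f"~{read_per_signal * 100 / 1024:.0f} GB/hour per node")
    # ...which is why multiple Objectivity AMS data servers are needed.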
31. Farm Setup for Pile-up Digitization
- The most advanced production step is digitization with pile-up
- The response of the detector is digitized, the physics objects are reconstructed and stored persistently, and at full luminosity 200 minimum bias events are combined with the signal events
Due to the large number of minimum bias events, multiple Objectivity AMS data servers are needed. Several configurations have been tried.
32. Objy Server Deployment Complex
4 production federations at FNAL (uses the catalog only to locate database files). 3 FNAL servers plus several worker nodes used in this configuration: 3 federation hosts with attached RAID partitions, 2 lock servers, 4 journal servers, 9 pile-up servers.
33. Example of CMS Physics Studies
- Resolution studies for jet reconstruction
- Full detector simulation essential to understand jet resolutions
- Indispensable to design realistic triggers and understand rates at high luminosity
[Plots: QCD 2-jet events with FSR, full simulation w/ tracks and HCAL noise, vs. QCD 2-jet events without FSR: no pile-up, no track reconstruction, no HCAL noise]
34. Pile-up and Jet Energy Resolution
- Jet energy resolution
- Pile-up contributions to jets are large and have large variations
- Can be estimated event-by-event from the total energy in the event (illustrated in the sketch below)
- Large improvement if the pile-up correction is applied (red curve), e.g. 50% → 35% at ET = 40 GeV
- Physics studies depend on full detailed detector simulation; realistic pile-up processing is essential!
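For illustration, the event-by-event idea reads roughly like the sketch below (heavily hedged: the functional form and all constants are invented for the example; the real correction is derived from the full ORCA simulation studies): estimate the pile-up ET under the jet cone from the total transverse energy in the event and subtract it.

    # Illustrative event-by-event pile-up correction for jet ET.
    # The constant and the inputs are made-up numbers, not the CMS values.
    CONE_FRACTION = 0.01   # assumed fraction of the event's pile-up ET in the cone

    def corrected_jet_et(raw_jet_et, total_event_et, hard_scatter_et):
        """Subtract the pile-up ET estimated from the event's total energy."""
        pileup_et = total_event_et - hard_scatter_et   # event-by-event estimate
        return raw_jet_et - CONE_FRACTION * pileup_et

    # Example: a raw 45 GeV jet in a busy full-luminosity event
    print(f"{corrected_jet_et(45.0, 800.0, 300.0):.1f} GeV")   # 45 - 0.01*500 = 40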
35. Tutorial at UCSD
- Very successful 4-day tutorial with 40 people attending
- Covering use of CMS software, including CMKIN/CMSIM, ORCA, OSCAR, IGUANA
- Covering physics code examples from all PRS groups
- Covering production tools and environment and Grid tools
- Opportunity to get people together
- UF and CAS engineers with PRS physicists
- Grid developers and CMS users
- The tutorials have been very well thought through and are very useful for self-study, so they will be maintained
- It is amazing what we can already do with CMS software
- E.g. impressive to see the IGUANA visualization environment, including home-made visualizations
- However, our system is (too? still too?) complex
- We maybe need more people taking a day off and going through the self-guided tutorials
36. FY2002 UF Funding
- Excellent initial effort and DOE support for User Facilities
- Fermilab established as Tier-1 prototype and major Grid node for LHC computing
- Tier-2 sites and testbeds are operational and are contributing to production and R&D
- The head start of the U.S. efforts has pushed the CERN commitment to support remote sites
- The FY2002 funding has given major headaches to the PM
- DOE funding of $2.24M was insufficient to ramp the Tier-1 to baseline size
- The NSF contribution is unknown as of today
- According to the plan we should have more people and equipment at the Fermilab T1
- Need some 7 additional FTEs and more equipment funding
- This has been strongly endorsed by the baseline reviews
- All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
37. Plans for 2002-2003
- Finish Spring Production challenge by June
- User cluster, user federations
- Upgrade of facilities ($300k)
- Develop CMS Grid environment toward the LCG Production Grid
- Move CMS Grid environment from testbed to facilities
- Prepare for first LCG-USUF milestone, November?
- Tier-2, iVDGL milestones w/ ATLAS, SC2002
- LCG-USUF Production Grid milestone in May 2003
- Bring Tier-1/Tier-2 prototypes up to scale
- Serving the user community: user cluster, federations, Grid-enabled user environment
- UF studies with the persistency framework
- Start of physics DCs and computing DCs
- CAS + LCG: everything is on the table, but the table is not empty
- Persistency framework: prototype in September 2002, release in July 2003
- DDD and OSCAR/Geant4 releases
- New strategy for visualization / IGUANA
- Develop distributed analysis environment w/ Caltech et al.
38. Funding for UF R&D Phase
- There is a lack of funding and a lack of guidance for 2003-2005
- NSF proposal guidance AND DOE guidance are lacking (S&C M&O)
- New DOE guidance for S&C M&O is much below the S&C baseline M&O request
- Fermilab US CMS projects oversight has proposed minimal M&O for 2003-2004 and large cuts for S&C, given the new DOE guidance
- The NSF has floated the idea of applying a rule of 81/250 x DOE funding
- This would lead to very serious problems in every year of the project: we would lack 1/3 of the requested funding ($14.0M of $21.2M)
39. DOE/NSF Funding Shortfall
40. FY2003 Allocation à la Nov 2001
41. Europeans Achieved Major UF Funding
- Funding for European User Facilities in their countries now looks significantly larger than UF funding in the U.S.
- This statement is true relative to the size of their respective communities
- It is in some cases even true in absolute terms!!
- Given our funding situation, are we going to be a partner for those efforts?
- BTW, US ATLAS proposes major cuts in the UF/Tier-1 pilot flame at BNL
42. How About The Others: DE
43. How About The Others: IT
44. How About The Others: RU
45. How About The Others: UK
46. How About The Others: FR
47. FY2002-FY2004 Are Critical in the US
- Compared to the European efforts, the US CMS UF efforts are very small
- In FY2002 the US CMS Tier-1 is sized at 4 kSI95 CPU and 5 TB storage
- The Tier-1 effort is 5.5 FTE; in addition there are 2 FTE CAS and 1 FTE Grid
- S&C baseline 2003/2004: the Tier-1 effort needs to be at least $1M/year above FY2002 to sustain the UF R&D and become a full part of the LHC Physics Research Grid
- Need some 7 additional FTEs and more equipment funding at the Tier-1
- Part of this effort would go directly into user support
- Essential areas are insufficiently covered now and need to be addressed in 2003 at the latest: fabric management, storage resource management, networking, system configuration management, collaborative tools, interfacing to Grid infrastructure, system management and operations support
- This has been strongly endorsed by the S&C baseline review of Nov 2001
- All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
48. The U.S. User Facilities Will Seriously Fall Behind European Tier-1 Efforts, Given the Funding Situation!
- To keep US leadership, and
- Not to put US-based science at a disadvantage,
- Additional funding is required:
- At least $1M/year at Tier-1 sites
49. LHC Computing Grid Project
- $36M project 2002-2004, half equipment, half personnel; successful RRB
- Expect to ramp to > 30 FTE in 2002, and 60 FTE in 2004
- About $2M / year equipment
- e.g. the UK delivers 26.5% of LCG funding AT CERN ($9.6M)
- US CMS has requested $11.7M IN THE US (CAS $5.89M)
- Current allocation (assuming CAS, iVDGL) would be $7.1M IN THE US
- Largest personnel fraction in the LCG Applications Area
- All personnel to be at CERN
- People staying at CERN for less than 6 months are counted at a 50% level, regardless of their experience
- CCS will work on LCG AA projects → US CMS will contribute to the LCG
- This brings up several issues that US CMS S&C should deal with
- The Europeans have decided to strongly support the LCG Application Area
- But at the same time we do not see more support for the CCS efforts
- CMS and US CMS will have to do, at some level, a rough accounting of LCG AA vs. CAS and of LCG facilities vs. US UF
50. Impact of LHC Delay
- Funding shortages in FY2001 and FY2002 already lead to significant delays
- Others have done more; we are seriously understaffed and do not do enough now
- We lack 7 FTE already this year, and will be able to start hiring only in FY2003
- This has led to delays and will further delay our efforts
- Long term:
- Do not know; predictions of equipment costs are too uncertain to evaluate possible cost savings due to delays by roughly a year
- However, schedules become more realistic
- Medium term:
- Major facilities (LCG) milestones shift by about 6 months
- 1st LCG prototype grid moved to end of 2002 → more realistic now
- End of R&D moves from end 2004 to mid 2005
- Detailed schedule and work plan expected from the LCG project and CMS CCS (June)
- No significant overall cost savings for the R&D phase
- We are already significantly delayed, and not even at half the effort of what other countries are doing (UK, IT, DE, RU!!)
- Catching up on our delayed schedule is feasible, if we can manage to hire 7 people in FY2003 and manage to support this level of effort in FY2004
- Major issue with the lack of equipment funding
- Re-evaluation of equipment deployment will be done during 2002 (PASTA)
51. US S&C Minimal Requirements
- The DOE funding guidance for the preparation of the US LHC research program approaches adequate funding levels around when the LHC starts in 2007, but is heavily back-loaded and does not accommodate the baselined software and computing project and the needs for pre-operations of the detector in 2002-2005.
- We take up the charge to better understand the minimum requirements, and to consider non-standard scenarios for reducing some of the funding shortfalls, but ask the funding agencies to explore all available avenues to raise the funding level.
- The LHC computing model of a worldwide distributed system is new and needs significant R&D. The experiments are approaching this with a series of "data challenges" that will test the developing systems and will eventually yield a system that works.
- US CMS S&C has to be part of the data challenges (DC), provide support for trigger and detector studies (UF subproject), and deliver engineering support for CMS core software (CAS subproject).
52. UF Needs
- The UF subproject is centered on a Tier-1 facility at Fermilab that will be driving the US CMS participation in these data challenges.
- The prototype Tier-2 centers will become integrated parts of the US CMS Tier-1/Tier-2 facilities.
- Fermilab will be a physics analysis center for CMS. LHC physics with CMS will be an important component of Fermilab's research program. Therefore Fermilab needs to play a strong role as a Tier-1 center in the upcoming CMS and LHC data challenges.
- The minimal Tier-1 effort would require at least doubling the current Tier-1 FTEs at Fermilab, and granting at least $300k of yearly funding for equipment. This level represents the critical threshold.
- The yearly costs for this minimally sized Tier-1 center at Fermilab would approach $2M after an initial $1.6M in FY03 (hiring delays). The minimal Tier-2 prototypes would need $400k support for operations; the rest would come out of iVDGL funds.
53. CAS Needs
- Ramping down the CAS effort is not an option, as we would face very adverse effects on CMS. CCS manpower is now even more needed to be able to drive and profit from the new LCG project; there is no reason to believe that the LCG will provide a CMS-ready solution without CCS being heavily involved in the process. We can even less allow for slips or delays.
- Possible savings from the new close collaboration between CMS and ATLAS through the LCG project will potentially give some contingency to the engineering effort that is to date missing in the project plan. That contingency (which would first have to be earned) could not be released before the end of 2005.
- The yearly costs of keeping the current level for CAS are about $1.7M per year (DOE $1000k, NSF $700k), including escalation and reserve.
54. Minimal US CMS S&C until 2005
- Definition of "minimal": if we can't afford even this, the US will not participate in the CMS Data Challenges and LCG Milestones in 2002-2004
- For US CMS S&C the minimal funding for the R&D phase (until 2005) would include (PRELIMINARY):
- Tier-1: $1600k in FY03 and $2000k in the following years
- Tier-2: $400k per year from the NSF to sustain the support for Tier-2 manpower
- CAS: $1M from DOE and $700k from the NSF
- Project Office: $300k (includes reserve)
- A failure to provide this level of funding would lead to severe delays and inefficiencies in the US LHC physics program. Considering the large investments in the detectors, and the large yearly costs of the research program, such an approach would not be cost efficient and productive.
- The ramp-up of the UF to the final system, beyond 2005, will need to be aligned with the plans of CERN and the other regional centers. After 2005 the funding profile seems to approach the demand.
55. Where Do We Stand?
- Setup of an efficient and competent s/w engineering support for CMS
- David is happy and CCS is doing well
- Proposal-driven support for detector/PRS engineering
- Setup of a User Support organization out of UF (and CAS) staff
- PRS is happy (but needs more)
- Proposal-driven provision of resources: data servers, user cluster
- Staff to provide data sets and nTuples for PRS, small specialized production
- Accounts, software releases, distribution, help desk etc.
- Tutorials done at Tier-1 and Tier-2 sites
- Implemented and commissioned a first Tier-1/Tier-2 system of RCs
- UCSD, Caltech, U.Florida, U.Wisconsin, Fermilab
- Shown that Grid tools can be used in production and greatly contribute to the success of Grid projects and middleware
- Validated use of the network between Tier-1 and Tier-2: 1 TB/day!
- Developing a production-quality Grid-enabled User Facility
- Impressive organization for running production in the US
- Team at Fermilab and individual efforts at the Tier-2 centers
- Grid technology helps to reduce the effort
- Close collaboration with the Grid projects infuses additional effort into US CMS
- Collaboration between sites (including ATLAS, like BNL) on facility issues
56. What Have We Achieved?
- We are participating in and driving a world-wide CMS production → DC
- We are driving a large part of the US Grid integration and deployment work
- That goes beyond the LHC and even HEP
- We have shown that the Tier-1/Tier-2 User Facility system in the US can work!
- We definitely are on the map for LHC computing and the LCG
- We also are threatened to be starved over the next years
- The funding agencies have failed to recognize the opportunity for continued US leadership in this field, which others like the UK are realizing and supporting!
- We are thrown back to a minimal funding level, and even that has been challenged
- But this is the time when our partners at CERN will expect to see us deliver and work with the LCG
57. Conclusions
- The US CMS S&C Project looks technically pretty sound
- Our customers (CCS and US CMS users) appear to be happy, but want more
- We also need more R&D to build the system, and we need to do more to measure up to our partners
- We started in 1998 with some supplemental funds; we are a DOE line item now
- We have received less than requested for a couple of years now
- But this FY2002 the project has become bitterly under-funded cf. the reviewed and endorsed baseline
- The funding agencies have failed to provide funding for the US S&C and to provide guidance for the US User Facilities
- The ball is in our (US CMS) court now: it is not an option to do just a little bit of S&C; the S&C R&D is a project, with baseline plans, funding profiles, change control
- It is up to US CMS to decide
- I ask you to support my request to build up the User Facilities in the US
58.
59. UF Equipment Costs
- Detailed information on Tier-1 facility costing
- See document in your handouts!
- All numbers in FY2002 k$
60. Total Project Costs
61. U.S. CMS Tier-1 RC Installed Capacity
- Fully functional facilities
- 5% data challenge: R&D systems
- 20% data challenge: prototype systems
- 310 kSI95 today is 10,000 PCs (see below)
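The PC equivalence is just arithmetic (a sketch; the per-PC rating is inferred from the slide's own "310 kSI95 is 10,000 PCs"):

    # SpecInt95 capacity expressed in 2002-era PCs, using this slide's scale.
    SI95_PER_PC = 310e3 / 10_000            # ~31 SI95 per PC
    print(f"{SI95_PER_PC:.0f} SI95 per PC")

    # e.g. the FY2002 Tier-1 prototype at 4 kSI95 (earlier slide) is roughly:
    print(f"{4e3 / SI95_PER_PC:.0f} PCs")   # ~130 PCs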
62. Alternative Scenarios
- Q: revise the plans so that CMS and ATLAS do not have identical scope?
- Never been tried in HEP; always competitive experiments
- The UF model is NOT to run a computer center, but to have an experiment-driven effort to get the physics environment in place
- S&C is engineering support for the physics project; outsourcing of engineering to a non-experiment-driven (common) project would mean a complete revision of the physics activities. This would require fundamental changes to experiment management and structure that are not in the purview of the US part of the collaboration
- Specifically, the data challenges are not only or primarily done for the S&C project, but are going to be conducted as a coherent effort of the physics, detector AND S&C groups with the goal to advance the physics, detector AND S&C efforts
- The DCs are why we are here. If we cannot participate, there would be no point in going for an experiment-driven UF
63. Alternative Scenarios
- Q: are Tier-2 resources spread too thin?
- The Tier-2 efforts should be as broad as we can afford. We are including university (non-funded) groups, like Princeton
- If the role of the Tier-2 centers were just to provide computing resources, we would not distribute this, but concentrate on the Tier-1 center. Instead the model is to put some resources at the prototype T2 centers, which allows us to pull in additional resources at these sites. This model seems to be rather successful
- iVDGL funds are being used for much of the efforts at the prototype T2 centers. Hardware investments at the Tier-2 sites up to now have been small. The project planned to fund 1.5 FTE at each site (this funding is not yet there). In CMS we see additional manpower of several FTE at those sites, which comes out of the base program and is being attracted from CS and other communities through the involvement in the Grid projects
64. Alternative Scenarios
- Q: additional software development activities to be combined?
- This will certainly happen. Concretely, we have already started to plan the first large-scale ATLAS-CMS common software project, the new persistency framework. Do we expect significant savings in the manpower efforts? These could be on the order of some 20-30%, if these efforts could be closely managed. However, the management is not in US hands, but in the purview of the LCG project. Also, the very project is ADDITIONAL effort that was not necessary when Objectivity was meant to provide the persistency solution
- Generally we do not expect very significant changes in the estimates for the total engineering manpower required to complete the core software efforts; the possible savings would give a minimal contingency to the engineering effort that is to date missing in the project plan → to be earned first, then released in 2005
65. Alternative Scenarios
- Q: are we losing, are there real cost benefits?
- Any experiment that does not have a kernel of people to run the data challenges will significantly lose
- The commodity is people, not equipment
- Sharing of resources is possible (and will happen), but we need to keep minimal R&D equipment. $300k/year for each T1 is very little funding for doing that. Below that we should just go home
- Tier-2: the mission of the Tier-2 centers is to enable universities to be part of the LHC research program. That function will be cut insofar as the funding for it is cut
- To separate the running of the facilities from the experiment's effort: this is a model that we are developing for our interactions with Fermilab CD; this is the ramping to 35 FTE in 2007, not the 13 FTE now; some services already are being effort-reported to CD-CMS. We have to get the structures in place to get this right; there will be overheads involved
- I do not see real cost benefits in any of these for the R&D phase. I prefer not to discuss the model for 2007 now, but we should stay open minded. However, if we want to approach unconventional scenarios we need to carefully prepare for them. That may start in 2003-2004?
66.
- UF PM
- Control room logbook
- Code dist, DAR, role for the Grid
- T2 work
- CCS schedule?
- More comments on the job
- Nucleation point vs. T1 user community
- New hires
- Tony's assignment, production running
- Disk tests, benchmarking, common work w/ BNL and the iVDGL facility group
- Monitoring: NGOP, Ganglia, Iosif's stuff
- Mention challenges to the testbed/MOP config, certificates, installations, and the help we get from the Grid projects: VO, ESnet CA, VDT
- UF workplan