1
US CMS Software and Computing Project
US CMS Collaboration Meeting at FSU, May 2002
  • Lothar A T Bauerdick/Fermilab
  • Project Manager

2
Scope and Deliverables
  • Provide Computing Infrastructure in the U.S.,
    which needs R&D
  • Provide software engineering support for CMS
  • Mission is to develop and build User
    Facilities for CMS physics in the U.S.
  • To provide the enabling IT infrastructure that
    will allow U.S. physicists to fully participate
    in the physics program of CMS
  • To provide the U.S. share of the framework and
    infrastructure software
  • Tier-1 center at Fermilab provides computing
    resources and support
  • User Support for CMS physics community, e.g.
    software distribution, help desk
  • Support for Tier-2 centers, and for physics
    analysis center at Fermilab
  • Five Tier-2 centers in the U.S.
  • Together will provide same CPU/Disk resources as
    Tier-1
  • Facilitate involvement of the collaboration in S&C
    development
  • Prototyping and test-bed effort very successful
  • Universities will bid to host Tier-2 center
  • taking advantage of existing resources and
    expertise
  • Tier-2 centers to be funded through NSF program
    for empowering Universities
  • Proposal to the NSF submitted Nov 2001

3
Project Milestones and Schedules
  • Prototyping, test-beds, R&D started in 2000:
    developing the LHC Computing Grid in the U.S.
  • R&D systems, funded in FY2002 and FY2003
  • Used for 5% data challenge (end 2003) → release
    Software and Computing TDR (technical design
    report)
  • Prototype T1/T2 systems, funded in FY2004
  • for 20% data challenge (end 2004) → end of Phase
    1, Regional Center TDR, start deployment
  • Deployment 2005-2007, 30%, 30%, 40% of costs
  • Fully Functional Tier-1/2 funded in FY2005
    through FY2007
  • ready for LHC physics run → start of Physics
    Program
  • S&C Maintenance and Operations from 2007 on

4
US CMS S&C Since UCR
  • Consolidation of the project, shaping the R&D
    program
  • Project baselined in Nov 2001: workplan for CAS,
    UF, Grids endorsed
  • CMS has become lead experiment for Grid work →
    Koen, Greg, Rick
  • US Grid Projects: PPDG, GriPhyN and iVDGL
  • EU Grid Projects: DataGrid, DataTAG
  • LHC Computing Grid Project
  • Fermilab UF team, Tier-2 prototypes, US CMS
    testbed
  • Major production efforts, PRS support
  • Objectivity goes, LCG comes
  • We do have a working software and computing
    system! → Physics Analysis
  • CCS will drive much of the common LCG Application
    Area
  • Major challenges to manage and execute the
    project
  • Since fall 2001 we knew LHC start would be
    delayed → new date April 2007
  • Proposal to NSF in Oct 2001, things are probably
    moving now
  • New DOE funding guidance (and lack thereof from
    NSF) is starving us in 2002-2004
  • Very strong support for the Project from
    individuals in CMS, Fermilab, Grids, the funding agencies

5
Other New Developments
  • NSF proposal guidance AND DOE guidance are
    lacking (S&C M&O)
  • That prompted a change in US CMS line management:
    the Program Manager will oversee both the
    Construction Project and the S&C Project
  • New DOE guidance for S&C M&O is much below the
    S&C baseline M&O request
  • Europeans have achieved major UF funding,
    significantly larger relative to the U.S. The LCG
    has started, and expects the U.S. to partner with
    European projects
  • LCG Application Area possibly imposes issues on
    the CAS structure
  • Many developments and changes that invalidate or
    challenge much of what the PM tried to achieve
  • Opportunity to take stock of where we stand in US
    CMS S&C before we try to understand where we need
    to go

6
Vivian has left S&C
  • Thanks and appreciation for Vivian's work of
    bringing the UF project to the successful baseline
  • New scientist position opened at Fermilab for the
    UF L2 manager, and physics!
  • Other assignments
  • Hans Wenzel: Tier-1 Manager
  • Jorge Rodriguez: U. Florida pT2 L3 manager
  • Greg Graham: CMS GIT Production Task Lead
  • Rick Cavanaugh: US CMS Testbed Coordinator

7
Project Status
  • User Facilities status and successes
  • US CMS prototype systems: Tier-1, Tier-2,
    testbed
  • Intense collaboration with US Grid projects,
    Grid-enabled MC production system
  • User Support: facilities, software, operations
    for PRS studies
  • Core Application Software status and successes
  • See Ian's talk
  • Project Office started
  • Project Engineer hired, to work on WBS,
    Schedule, Budget, Reporting, Documenting
  • SOWs in place w/ CAS universities; MOUs,
    subcontracts, invoicing are coming
  • In process of signing the MOUs
  • Have a draft of MOU with iVDGL on prototype
    Tier-2 funding

8
Successful Base-lining Review
  • The Committee endorses the proposed project
    scope, schedule, budgets and management plan
  • Endorsement for the scrubbed project plan
    following the DOE/NSF guidance: $3.5M DOE + $2M NSF
    in FY2003 and $5.5M DOE + $3M NSF in FY2004!

9
CMS Produced Data in 2001
Reconstructed w/ Pile-Up: TOTAL 29 TB
  • TYPICAL EVENT SIZES
  • Simulated
  • 1 CMSIM event = 1 OOHit event: 1.4 MB
  • Reconstructed
  • 1 10^33 event: 1.2 MB
  • 1 2x10^33 event: 1.6 MB
  • 1 10^34 event: 5.6 MB

By site: CERN 14 TB, FNAL 12 TB, Caltech 0.60 TB,
Moscow 0.45 TB, INFN 0.40 TB, Bristol/RAL 0.22 TB,
UCSD 0.20 TB, IN2P3 0.10 TB, Wisconsin 0.05 TB,
Helsinki -, UFL 0.08 TB
  • These fully simulated data samples are essential
    for physics and trigger studies
  • → Technical Design Report for DAQ and Higher
    Level Triggers

10
Production Operations
  • Production Efforts are Manpower intensive!
  • Fermilab Tier-1 Production Operations → 1.7
    FTE sustained effort to fill those 8 roles, plus
    the system support people that need to help if
    something goes wrong!!!

At Fermilab (US CMS, PPDG): Greg Graham, Shafqat
Aziz, Yujun Wu, Moacyr Souza, Hans Wenzel,
Michael Ernst, Shahzad Muzaffar + staff
At U. Florida (GriPhyN, iVDGL): Dimitri Bourilkov,
Jorge Rodriguez, Rick Cavanaugh + staff
At Caltech (GriPhyN, PPDG, iVDGL, USCMS): Vladimir
Litvin, Suresh Singh et al.
At UCSD (PPDG, iVDGL): Ian Fisk, James Letts + staff
At Wisconsin: Pam Chumney, R. Gowrishankara, David
Mulvihill; Peter Couvares, Alain Roy et al.
At CERN (USCMS): Tony Wildish + many
11
US CMS Prototypes and Test-beds
  • Tier-1 and Tier-2 Prototypes and Test-beds
    operational
  • Facilities for event simulation, including
    reconstruction
  • Sophisticated processing for pile-up simulation
  • User cluster and hosting of data samples for
    physics studies
  • Facilities and Grid R&D

12
Tier-1 equipment
13
Tier-1 Equipment
14
Using the Tier-1 system: User System
  • Until the Grid becomes reality (maybe soon!)
    people who want to use computing facilities at
    Fermilab need to obtain an account
  • That requires registration as a Fermilab user
    (DOE requirement)
  • We will make sure that turn-around times are
    reasonably short; we have not heard complaints yet
  • Go to http://computing.fnal.gov/cms/ and click on
    the "CMS Account" button, which will guide you
    through the process
  • Step 1: Get a valid Fermilab ID
  • Step 2: Get an fnalu account and CMS account
  • Step 3: Get a Kerberos principal and crypto card
  • Step 4: Information for first-time CMS account
    users: http://consult.cern.ch/writeup/form01/
  • We have > 100 users, currently about 1 new user
    per week

15
US CMS User Cluster
R&D on reliable i/a service. OS: Mosix? Batch
system: FBSNG? Storage: disk farm?
[Cluster diagram: worker nodes FRY1-FRY8 on a
100 Mbps switch, GigaBit uplink to the BIGMAC
server, SCSI 160 attached 250 GB RAID]
  • To be released June 2002! nTuple, Objy analysis
    etc

16
User Access to Tier-1 Data
  • Hosting of Jets/Met data
  • Muons will be coming soon

[Diagram: AMD server (AMD/Enstore interface),
"Snickers", Enstore STKEN silo, > 10 TB of objects,
users connected over the network]
Working on providing a powerful disk cache.
A host-redirection protocol allows adding more
servers → scaling and load balancing (see the
sketch below).
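To illustrate the host-redirection idea (a schematic sketch only, not the actual AMS/Enstore protocol; host names and ports are hypothetical), a front end can simply hand each client the address of the next data server in a round-robin list, so adding servers spreads the load:

    # Schematic round-robin redirector (illustration of the scaling idea only;
    # the real AMS/Enstore host-redirection protocol is more involved).
    import itertools

    data_servers = ["cmsdata1.fnal.gov:9100", "cmsdata2.fnal.gov:9100"]  # hypothetical hosts
    next_server = itertools.cycle(data_servers)

    def redirect(request: str) -> str:
        """Return the data server the client should contact for this request."""
        return next(next_server)

    for req in ["get /jets/file001", "get /jets/file002", "get /met/file003"]:
        print(req, "->", redirect(req))

Adding a third entry to data_servers is all it takes to spread the same request stream over more hardware.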
17
US CMS T2 Prototypes and Test-beds
  • Tier-1 and Tier-2 Prototypes and Test-beds
    operational

18
California Prototype Tier-2 Setup
  • UCSD
    Caltech

19
Benefits Of US Tier-2 Centers
  • Bring computing resources close to user
    communities
  • Provide dedicated resources to regions (of
    interest and geographical)
  • More control over localized resources, more
    opportunities to pursue physics goals
  • Leverage Additional Resources, which exist at the
    universities and labs
  • Reduce computing requirements of CERN (supposed
    to account for 1/3 of total LHC facilities!)
  • Help meet the LHC Computing Challenge
  • Provide diverse collection of sites, equipment,
    expertise for development and testing
  • Provide much needed computing resources
  • US-CMS plans for about 2 FTE at each Tier-2 site,
    plus equipment funding
  • supplemented with Grid, University and Lab funds
    (BTW no I/S costs in the US CMS plan)
  • Problem: how do you run a center with only two
    people that will have much greater processing
    power than CERN has currently?
  • This involves facilities and operations R&D to
    reduce the operations personnel required to run
    the center, e.g. investigating cluster-management
    software

20
U.S. Tier-1/2 System Operational
  • CMS Grid Integration and Deployment on U.S. CMS
    Test Bed
  • Data Challenges and Production Runs on Tier-1/2
    Prototype Systems
  • Spring Production 2002 finishing → Physics,
    Trigger, Detector studies
  • Produce 10M events and 15 TB of data (also 10M
    mini-bias), fully simulated including pile-up,
    fully reconstructed
  • Large assignment to U.S. CMS
  • Successful Production in 2001
  • 8.4M events fully simulated, including pile-up,
    50% in the U.S.
  • 29 TB of data processed, 13 TB in the U.S.

21
US CMS Prototypes and Test-beds
  • All U.S. CMS S&C institutions are involved in DOE
    and NSF Grid Projects
  • Integrating Grid software into CMS systems
  • Bringing CMS Production onto the Grid
  • Understanding the operational issues
  • CMS directly profits from Grid funding
  • Deliverables of Grid Projects become useful for
    LHC in the real world
  • Major successes: MOP, GDMP

22
Grid-enabled CMS Production
  • Successful collaboration with Grid Projects!
  • MOP (Fermilab, U.Wisconsin/Condor)
  • Remote job execution: Condor-G, DAGMan (see the
    sketch below)
  • GDMP (Fermilab, European DataGrid WP2)
  • File replication and replica catalog (Globus)
  • Successfully used on CMS testbed
  • First real CMS Production use finishing now!
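As a concrete illustration of how such a production chain can be handed to Condor-G/DAGMan, here is a minimal sketch (not the actual MOP/IMPALA code; the submit-description file names are hypothetical) that writes a DAG for a CMKIN → CMSIM → hit-formatting chain and submits it:

    # Illustrative only: express a CMKIN -> CMSIM -> hit-formatting chain as a
    # Condor DAGMan DAG and submit it. DAGMan enforces the ordering and retries;
    # the .sub submit-description files are hypothetical placeholders (in a
    # MOP-like setup they would describe Condor-G jobs for remote execution).
    import subprocess

    dag_lines = [
        "JOB kin  cmkin.sub",
        "JOB sim  cmsim.sub",
        "JOB hits oohit.sub",
        "PARENT kin CHILD sim",
        "PARENT sim CHILD hits",
        "RETRY sim 2",
    ]

    with open("production.dag", "w") as f:
        f.write("\n".join(dag_lines) + "\n")

    subprocess.run(["condor_submit_dag", "production.dag"], check=True)

DAGMan resubmits failed nodes and only starts a step once its parents have finished, which is exactly the bookkeeping a multi-step CMS production job needs.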

23
Recent Successes with the Grid
  • Grid-Enabled CMS Production Environment. NB: MOP =
    Grid-ified IMPALA, a vertically integrated CMS
    application
  • Brings together US CMS with all three US Grid
    Projects
  • PPDG: Grid developers (Condor, DAGMan), GDMP (w/
    WP2)
  • GriPhyN: VDT, in the future also the virtual data
    catalog
  • iVDGL: pT2 sites and US CMS testbed
  • CMS Spring 2002 production assignment of 200k
    events to MOP
  • Half-way through, next week transfer back to CERN
  • This is being considered a major success for US
    CMS and Grids!
  • Many bugs in Condor and Globus found and fixed
  • Many operational issues that needed and still
    need to be sorted out
  • MOP will be moved into production Tier-1/Tier-2
    environment

24
Successes: Grid-enabled Production
  • Major Milestone for US CMS and PPDG
  • From PPDG internal review of MOP
  • From the Grid perspective, MOP has been
    outstanding. It has both legitimized the idea of
    using Grid tools such as DAGMAN, Condor-G, GDMP,
    and Globus in a real production environment
    outside of prototypes and trade show
    demonstrations. Furthermore, it has motivated
    the use of Grid tools such as DAGMAN, Condor-G,
    GDMP, and Globus in novel environments leading to
    the discovery of many bugs which would otherwise
    have prevented these tools from being taken
    seriously in a real production environment.
  • From the CMS perspective, MOP won early respect
    for taking on real production problems, and is
    soon ready to deliver real events. In fact,
    today or early next week we will update the RefDB
    at CERN which tracks production at various
    regional centers. This has been delayed because
    of the numerous bugs that, while being tracked
    down, involved several cycles of development and
    redeployment. The end of the current CMS
    production cycle is in three weeks, and MOP will
    be able to demonstrate some grid enabled
    production capability by then. We are confident
    that this will happen. It is not necessary at
    this stage to have a perfect MOP system for CMS
    Production; IMPALA also has some failover
    capability and we will use that where possible.
    However, it has been a very useful exercise and
    we believe that we are among the first team to
    tackle Globus and Condor-G in such a stringent
    and HEP specific environment.

25
Successes: File Transfers
  • In 2001 we were observing typical rates for large
    data transfers of e.g. CERN → FNAL 4.7 GB/hour
  • After network tuning, using Grid tools (Globus
    url-copy) we gain a factor of 10!
  • Today we are transferring 1.5 TByte of simulated
    data from UCSD to FNAL
  • at rates of 10 MByte/second! That almost
    saturates the network interface out of Fermilab
    (155 Mbps) and at UCSD (FastEthernet)
  • The ability to transfer a TeraByte in a day is
    crucial for the Tier-1/Tier-2 system
  • Many operational issues remain to be solved
  • GDMP is a grid tool for file replication,
    developed jointly between US and EU
  • Showcase application for EU DataGrid WP2
    data replication
  • Needs more work and strong support → VDT team
    (PPDG, GriPhyN, iVDGL)
  • e.g. CMS GDMP heartbeat for debugging new
    installations and monitoring old ones
  • Installation and configuration issues: releases
    of underlying software like Globus
  • Issues with site security and e.g. firewalls
  • Uses the Globus Security Infrastructure, which
    demands VO Certification Authority
    infrastructure for CMS
  • Etc pp
  • This needs to be developed, tested, deployed and
    shows that the USCMS testbed is invaluable!
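As a sanity check on these numbers, the arithmetic below (illustrative only) shows why a sustained 10 MByte/s is the "TeraByte per day" regime and how it compares to the raw 155 Mbps interface:

    # Back-of-the-envelope check of the transfer figures quoted on this slide
    # (illustrative arithmetic only).
    rate_mbyte_s = 10.0                 # achieved UCSD -> FNAL rate, MByte/s
    seconds_per_day = 24 * 3600

    tb_per_day = rate_mbyte_s * seconds_per_day / 1.0e6      # MByte -> TByte
    print(f"sustained throughput: ~{tb_per_day:.2f} TB/day")  # ~0.86 TB/day

    link_mbps = 155.0                   # OC-3 interface out of Fermilab
    link_mbyte_s = link_mbps / 8.0      # ~19 MByte/s before protocol overhead
    print(f"raw link capacity:    ~{link_mbyte_s:.0f} MByte/s, shared with all other traffic")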

26
DOE/NSF Grid R&D Funding for CMS

27
Farm Setup
  • Almost any computer can run the CMKIN and CMSIM
    steps using the CMS binary distribution system
    (US CMS DAR)

This step is almost trivially put on the Grid...
almost
28
e.g. on the 13.6 TF, $53M TeraGrid?
[TeraGrid/DTF diagram: NCSA, SDSC, Caltech and
Argonne site resources with HPSS/UniTree storage and
external networks; NCSA/PACI 8 TF, 240 TB; SDSC
4.1 TF, 225 TB. www.teragrid.org]
29
Farm Setup for Reconstruction
  • The first step of the reconstruction is Hit
    Formatting, where simulated data is taken from
    the Fortran files, formatted, and entered into the
    Objectivity database.
  • The process is fast enough, and involves enough
    data, that more than 10-20 concurrent jobs will
    bog down the database server (see the sketch
    below).
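One simple way to respect that limit (an illustrative sketch, not the production scripts; the executable and file names are hypothetical) is to cap the number of hit-formatting jobs allowed to run against one Objectivity server at a time:

    # Illustrative sketch: cap concurrent hit-formatting jobs so that a single
    # Objectivity database server never sees more than MAX_JOBS at once.
    # "oohitformatter" and the input file names are hypothetical placeholders.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    MAX_JOBS = 15                                   # stay inside the 10-20 job limit
    inputs = [f"cmsim_run_{i:03d}.fz" for i in range(100)]

    def format_hits(fz_file: str) -> int:
        # Each call wraps one hit-formatting job reading a Fortran/ZEBRA file.
        return subprocess.run(["oohitformatter", fz_file]).returncode

    with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
        for fz, rc in zip(inputs, pool.map(format_hits, inputs)):
            print(fz, "ok" if rc == 0 else f"failed (rc={rc})")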

30
Pile-up simulation!
  • Unique at the LHC due to high luminosity and
    short bunch-crossing time
  • Up to 200 Minimum Bias events overlaid on
    interesting triggers
  • Leads to pile-up in the detectors → needs to be
    simulated!

This makes a CPU-limited task (event simulation)
VERY I/O intensive!
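A rough, illustrative estimate using the ~1.4 MB per simulated event quoted on the 2001 data-sample slide shows why:

    # Rough estimate of the extra input that pile-up digitization has to handle
    # (illustrative; an upper bound, since in practice minimum-bias events are
    # served and reused by dedicated pile-up servers, see the following slides).
    min_bias_per_crossing = 200          # overlaid at full luminosity
    mb_event_size_mb = 1.4               # ~MB per simulated minimum-bias event

    pileup_read_mb = min_bias_per_crossing * mb_event_size_mb
    print(f"~{pileup_read_mb:.0f} MB of minimum-bias input per signal event")   # ~280 MB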
31
Farm Setup for Pile-up Digitization
  • The most advanced production step is digitization
    with pile-up
  • The response of the detector is digitized, the
    physics objects are reconstructed and stored
    persistently, and at full luminosity 200
    minimum-bias events are combined with the signal
    events

Due to the large number of minimum-bias events,
multiple Objectivity AMS data servers are needed.
Several configurations have been tried.
32
Objy Server Deployment Complex
4 production federations at FNAL (the catalog is
used only to locate database files). 3 FNAL servers
plus several worker nodes are used in this
configuration: 3 federation hosts with attached RAID
partitions, 2 lock servers, 4 journal servers,
9 pile-up servers.
33
Example of CMS Physics Studies
  • Resolution studies for jet reconstruction
  • Full detector simulation essential to understand
    jet resolutions
  • Indispensable to design realistic triggers and
    understand rates at high lumi

QCD 2-jet events with FSR: full simulation w/
tracks, HCAL noise
QCD 2-jet events, no FSR: no pile-up, no track
reconstruction, no HCAL noise
34
Pile-up and Jet Energy Resolution
  • Jet energy resolution
  • Pile-up contributions to jets are large and have
    large variations
  • Can be estimated event-by-event from the total
    energy in the event (see the schematic sketch
    below)
  • Large improvement if the pile-up correction is
    applied (red curve)
  • e.g. 50% → 35% at ET = 40 GeV
  • Physics studies depend on full, detailed detector
    simulation: realistic pile-up processing is
    essential!
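To illustrate the kind of event-by-event correction meant here (a schematic sketch only, not the CMS/ORCA algorithm; the calibration constant and example numbers are invented):

    # Schematic event-by-event pile-up subtraction for jet ET (illustration only).
    # The pile-up ET under the jet is estimated from the total ET in the event,
    # scaled to the jet's area; k is a made-up calibration constant.
    def corrected_jet_et(raw_jet_et, jet_area, total_event_et, total_area, k=1.0):
        pileup_density = k * total_event_et / total_area   # ET per unit area
        return raw_jet_et - pileup_density * jet_area

    # Example: a 40 GeV jet in a busy high-luminosity event (numbers invented).
    print(corrected_jet_et(raw_jet_et=40.0, jet_area=0.5,
                           total_event_et=500.0, total_area=25.0))   # 30.0 GeV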

35
Tutorial at UCSD
  • Very successful 4-day tutorial with 40 people
    attending
  • Covering use of CMS software, including
    CMKIN/CMSIM, ORCA, OSCAR, IGUANA
  • Covering physics code examples from all PRS
    groups
  • Covering production tools and environment and
    Grid tools
  • Opportunity to get people together
  • UF and CAS engineers with PRS physicists
  • Grid developers and CMS users
  • The tutorials have been very well thought
    through and are very useful for self-study, so
    they will be maintained
  • It is amazing what we already can do with CMS
    software
  • E.g. impressive to see the IGUANA visualization
    environment, including home-made
    visualizations
  • However, our system is (too?, still too?) complex
  • We maybe need more people taking a day off and
    going through the self-guided tutorials

36
FY2002 UF Funding
  • Excellent initial effort and DOE support for User
    Facilities
  • Fermilab established as Tier-1 prototype and
    major Grid node for LHC computing
  • Tier-2 sites and testbeds are operational and are
    contributing to production and R&D
  • Headstart for U.S. efforts has pushed CERN
    commitment to support remote sites
  • The FY2002 funding has given major headaches to
    the PM
  • DOE funding of $2.24M was insufficient to ramp the
    Tier-1 to baseline size
  • The NSF contribution is unknown as of today
  • According to plan we should have more people and
    equipment at Fermilab T1
  • Need some 7 additional FTEs and more equipment
    funding
  • This has been strongly endorsed by the baseline
    reviews
  • All European RCs (DE, FR, IT, UK, even RU!) have
    support at this level of effort

37
Plans For 2002 - 2003
  • Finish the Spring Production challenge by June
  • User Cluster, User Federations
  • Upgrade of facilities ($300k)
  • Develop CMS Grid environment toward LCG
    Production Grid
  • Move CMS Grid environment from testbed to
    facilities
  • Prepare for first LCG-USUF milestone, November?
  • Tier-2, -iVDGL milestones w/ ATLAS, SC2002
  • LCG-USUF Production Grid milestone in May 2003
  • Bring Tier-1/Tier-2 prototypes up to scale
  • Serving the user community: user cluster,
    federations, Grid-enabled user environment
  • UF studies with persistency framework
  • Start of physics DCs and computing DCs
  • CAS and LCG: everything is on the table, but the
    table is not empty
  • persistency framework - prototype in September
    2002
  • Release in July 2003
  • DDD and OSCAR/Geant4 releases
  • New strategy for visualization / IGUANA
  • Develop distributed analysis environment w/
    Caltech et al

38
Funding for the UF R&D Phase
  • There is lack of funding and lack of guidance for
    2003-2005
  • NSF proposal guidance AND DOE guidance are
    lacking (S&C M&O)
  • New DOE guidance for S&C M&O is much below the
    S&C baseline M&O request
  • Fermilab USCMS projects oversight has proposed
    minimal M&O for 2003-2004 and large cuts for S&C,
    given the new DOE guidance
  • The NSF has floated the idea of applying a rule
    of 81/250 of the DOE funding
  • This would lead to very serious problems in every
    year of the project: we would lack 1/3 of the
    requested funding ($14.0M of $21.2M)

39
DOE/NSF funding shortfall
40
FY2003 Allocation à la Nov2001

41
Europeans Achieved Major UF Funding
  • Funding for European User Facilities in their
    countries now looks significantly larger than UF
    funding in the U.S.
  • This statement is true relative to the size of
    their respective communities
  • It is in some cases even true in absolute
    terms!!
  • Given our funding situation, are we going to be a
    partner for those efforts?
  • BTW USATLAS proposes major cuts in UF/Tier-1
    pilot flame at BNL

42
How About The Others: DE
43
How About The Others: IT
44
How About The Others: RU
45
How About The Others: UK
46
How About The Others: FR
47
FY2002 - FY2004 Are Critical in US
  • Compared to European efforts the US CMS UF
    efforts are very small
  • In FY2002 the US CMS Tier-1 is sized at 4 kSI95
    of CPU and 5 TB of storage
  • The Tier-1 effort is 5.5 FTE. In addition there
    are 2 FTE CAS and 1 FTE Grid
  • S&C baseline 2003/2004: the Tier-1 effort needs
    to be at least $1M/year above FY2002 to sustain
    the UF R&D and become a full part of the LHC
    Physics Research Grid
  • Need some 7 additional FTEs and more equipment
    funding at the Tier-1
  • Part of this effort would go directly into user
    support
  • Essential areas are insufficiently covered now
    and need to be addressed in 2003 at the latest
  • Fabric management, storage resource mgmt,
    networking, system configuration management,
    collaborative tools, interfacing to Grid i/s,
    system management and operations support
  • This has been strongly endorsed by the S&C
    baseline review of Nov 2001
  • All European RCs (DE, FR, IT, UK, even RU!) have
    support at this level of effort

48
The U.S. User Facilities Will Seriously Fall Back
Behind European Tier-1 Efforts Given The
Funding Situation!
  • To keep US leadership and
  • not to put US-based science at a disadvantage,
  • additional funding is required:
  • at least $1M/year at Tier-1 sites

49
LHC Computing Grid Project
  • $36M project 2002-2004, half equipment, half
    personnel. Successful RRB
  • Expect to ramp to > 30 FTE in 2002, and 60 FTE in
    2004
  • About $2M/year equipment
  • e.g. the UK delivers 26.5% of LCG funding AT CERN
    ($9.6M)
  • US CMS has requested $11.7M IN THE US (CAS
    $5.89M)
  • Current allocation (assuming CAS, iVDGL) would be
    $7.1M IN THE US
  • Largest personnel fraction in the LCG Applications
    Area
  • All personnel to be at CERN
  • People staying at CERN for less than 6 months
    are counted at a 50% level, regardless of their
    experience.
  • CCS will work on LCG AA projects → US CMS will
    contribute to LCG
  • This brings up several issues that US CMS S&C
    should deal with
  • Europeans have decided to strongly support the
    LCG Application Area
  • But at the same time we do not see more support
    for the CCS efforts
  • CMS and US CMS will have to do, at some level, a
    rough accounting of LCG AA vs CAS and LCG
    facilities vs US UF

50
Impact of LHC Delay
  • Funding shortages in FY2001 and FY2002 have
    already led to significant delays
  • Others have done more: we are seriously
    understaffed and do not do enough now
  • We lack 7 FTE already this year, and can only
    start hiring in FY2003
  • This has led to delays and will further delay our
    efforts
  • Long-term
  • we do not know; predictions of equipment costs
    are too uncertain to evaluate possible cost
    savings due to delays by roughly a year
  • However, schedules become more realistic
  • Medium term
  • Major facilities (LCG) milestones shift by about
    6 months
  • 1st LCG prototype grid moved to end of 2002 →
    more realistic now
  • End of R&D moves from end-2004 to mid-2005
  • Detailed schedule and work plan expected from the
    LCG project and CMS CCS (June)
  • No significant overall cost savings for the R&D
    phase
  • We are already significantly delayed, and not
    even at half the effort of what other countries
    are doing (UK, IT, DE, RU!!)
  • Catching up on our delayed schedule is feasible
    if we can manage to hire 7 people in FY2003 and
    manage to support this level of effort in FY2004
  • Major issue with lack of equipment funding
  • Re-evaluation of equipment deployment will be
    done during 2002 (PASTA)

51
US S&C Minimal Requirements
  • The DOE funding guidance for the preparation of
    the US LHC research program approaches adequate
    funding levels around when the LHC starts in
    2007, but is heavily back-loaded and does not
    accommodate the base-lined software and computing
    project and the needs for pre-operations of the
    detector in 2002-2005.
  • We take up the charge to better understand the
    minimum requirements, and to consider
    non-standard scenarios for reducing some of the
    funding short falls, but ask the funding agencies
    to explore all available avenues to raise the
    funding level.
  • The LHC computing model of a worldwide
    distributed system is new and needs significant
    R&D. The experiments are approaching this with a
    series of "data challenges" that will test the
    developing systems and will eventually yield a
    system that works.
  • US CMS S&C has to be part of the data challenges
    (DC), to provide support for trigger and
    detector studies (UF subproject), and to deliver
    engineering support for CMS core software (CAS
    subproject).

52
UF Needs
  • The UF subproject is centered on a Tier-1
    facility at Fermilab, which will be driving the US
    CMS participation in these Data Challenges.
  • The prototype Tier-2 centers will become
    integrated parts of the US CMS Tier-1/Tier-2
    facilities.
  • Fermilab will be a physics analysis center for
    CMS. LHC physics with CMS will be an important
    component of Fermilab's research program.
    Therefore Fermilab needs to play a strong role as
    a Tier-1 center in the upcoming CMS and LHC data
    challenges.
  • The minimal Tier-1 effort would require at
    least doubling the current Tier-1 FTEs at Fermilab
    and granting at least $300k of yearly funding for
    equipment. This level represents the critical
    threshold.
  • The yearly costs for this minimally sized Tier-1
    center at Fermilab would approach $2M after an
    initial $1.6M in FY03 (hiring delays). The
    minimal Tier-2 prototypes would need $400k of
    support for operations; the rest would come out
    of iVDGL funds.

53
CAS Needs
  • Ramping down the CAS effort is not an option, as
    we would face very adverse effects on CMS. CCS
    manpower is now even more needed to be able to
    drive and profit from the new LCG project - there
    is no reason to believe that the LCG will provide
    a CMS-ready solution without CCS being heavily
    involved in the process. We can even less allow
    for slips or delays.
  • Possible savings with the new close collaboration
    between CMS and ATLAS through the LCG project
    will potentially give some contingency to the
    engineering effort that is to date missing in the
    project plan. That contingency (which would first
    have to be earned) could not be released before
    end of 2005.
  • The yearly costs of keeping the current level for
    CAS are about $1.7M per year (DOE $1000k, NSF
    $700k), including escalation and reserve.

54
Minimal US CMS S&C until 2005
  • Definition of Minimal: if we can't afford even
    this, the US will not participate in the CMS
    Data Challenges and LCG Milestones in 2002 - 2004
  • For US CMS S&C the minimal funding for the R&D
    phase (until 2005) would include (PRELIMINARY):
  • Tier-1: $1600k in FY03 and $2000k in the following
    years
  • Tier-2: $400k per year from the NSF to sustain the
    support for Tier-2 manpower
  • CAS: $1M from DOE and $700k from the NSF
  • Project Office: $300k (includes reserve)
  • A failure to provide this level of funding would
    lead to severe delays and inefficiencies in the
    US LHC physics program. Considering the large
    investments in the detectors and the large
    yearly costs of the research program, such an
    approach would not be cost efficient and
    productive.
  • The ramp-up of the UF to the final system, beyond
    2005, will need to be aligned with the plans of
    CERN and other regional centers. After 2005 the
    funding profile seems to approach the demand.

55
Where do we stand?
  • Setup of an efficient and competent s/w
    engineering support for CMS
  • David is happy and CCS is doing well
  • proposal-driven support for detector/PRS
    engineering
  • Setup of a User Support organization out of UF
    (and CAS) staff
  • PRS is happy (but needs more)
  • proposal-driven provision of resources: data
    servers, user cluster
  • Staff to provide data sets and nTuples for PRS,
    small specialized production
  • Accounts, software releases, distribution, help
    desk etc. pp.
  • Tutorials done at Tier-1 and Tier-2 sites
  • Implemented and commissioned a first Tier-1/Tier-2
    system of RCs
  • UCSD, Caltech, U.Florida, U.Wisconsin, Fermilab
  • Shown that Grid tools can be used in
    production and greatly contribute to the success
    of Grid projects and middleware
  • Validated use of the network between Tier-1 and
    Tier-2: 1 TB/day!
  • Developing a production quality Grid-enabled User
    Facility
  • impressive organization for running production
    in US
  • Team at Fermilab and individual efforts at Tier-2
    centers
  • Grid technology helps to reduce the effort
  • Close collaboration with Grid projects infuses
    additional effort into US CMS
  • Collaboration between sites (including ATLAS,
    like BNL) for facility issues

56
What have we achieved?
  • We are participating in and driving a world-wide
    CMS production → DC
  • We are driving a large part of the US Grid
    integration and deployment work
  • That goes beyond the LHC and even HEP
  • We have shown that the Tier-1/Tier-2 User
    Facility system in the US can work!
  • We definitely are on the map for LHC computing
    and the LCG
  • We are also threatened with being starved over
    the next years
  • The funding agencies have failed to recognize the
    opportunity for continued US leadership in this
    field, as others like the UK are realizing and
    supporting!
  • We are thrown back to a minimal funding level,
    and even that has been challenged
  • But this is the time where our partners at CERN
    will expect to see us deliver and work with the
    LCG

57
Conclusions
  • The US CMS SC Project looks technically pretty
    sound
  • Our customers (CCS and US CMS Users) appear to be
    happy, but want more
  • We also need more R&D to build the system, and
    we need to do more to measure up to our partners
  • We started in 1998 with some supplemental
    funds; we are a DOE line item now
  • We have received less than requested for a couple
    of years now,
  • but this FY2002 the project has become bitterly
    under-funded compared to the reviewed and
    endorsed baseline
  • The funding agencies have defaulted on providing
    funding for US S&C and on providing guidance for
    the US User Facilities
  • The ball is in our (US CMS) park now: it is not
    an option to do just a little bit of S&C; the S&C
    R&D is a project: baseline plans, funding
    profiles, change control
  • It is up to US CMS to decide
  • I ask you to support my request to build up the
    User Facilities in the US

58
  • THE END

59
UF Equipment Costs
  • Detailed Information on Tier-1 Facility Costing
  • See Document in Your Handouts!
  • All numbers in FY2002 k$

60
Total Project Costs
  • In AY M$

61
U.S. CMS Tier-1 RC Installed Capacity
Fully Functional Facilities
5% Data Challenge R&D Systems
20% Data Challenge Prototype Systems
310 kSI95 today is about 10,000 PCs
62
Alternative Scenarios
  • Q: revise the plans so as not to have CMS and
    ATLAS with identical scope?
  • Never been tried in HEP: always competitive
    experiments
  • The UF model is NOT to run a computer center, but
    to have an experiment-driven effort to get the
    physics environment in place
  • S&C is engineering support for the physics
    project; outsourcing of engineering to a
    non-experiment-driven (common) project would mean
    a complete revision of the physics activities.
    This would require fundamental changes to
    experiment management and structure that are not
    in the purview of the US part of the
    collaboration
  • Specifically, the data challenges are not only or
    primarily done for the S&C project, but are going
    to be conducted as a coherent effort of the
    physics, detector AND S&C groups, with the goal
    of advancing the physics, detector AND S&C
    efforts.
  • The DCs are why we are here. If we cannot
    participate there would be no point in going for
    an experiment-driven UF

63
Alternative Scenarios
  • Q: are Tier-2 resources spread too thin?
  • The Tier-2 efforts should be as broad as we can
    afford. We are including non-funded university
    groups, like Princeton
  • If the role of the Tier-2 centers were just to
    provide computing resources we would not
    distribute this, but concentrate on the Tier-1
    center. Instead the model is to put some
    resources at the prototype T2 centers, which
    allows us to pull in additional resources at
    these sites. This model seems to be rather
    successful.
  • iVDGL funds are being used for much of the
    effort at the prototype T2 centers. Hardware
    investments at the Tier-2 sites up to now have
    been small. The project planned to fund 1.5 FTE
    at each site (this funding is not yet there). In
    CMS we see additional manpower of several FTE at
    those sites, which comes out of the base program
    and is being attracted from the CS and other
    communities through the involvement in Grid
    projects

64
Alternative Scenarios
  • Q: additional software development activities to
    be combined?
  • This will certainly happen. Concretely, we have
    already started to plan the first large-scale
    ATLAS-CMS common software project, the new
    persistency framework. Do we expect significant
    savings in the manpower efforts? These could be
    of the order of some 20-30%, if these efforts could be closely
    managed. However, the management is not in US
    hands, but in the purview of the LCG project.
    Also, the very project is ADDITIONAL effort that
    was not necessary when Objectivity was meant to
    provide the persistency solution.
  • generally we do not expect very significant
    changes in the estimates for the total
    engineering manpower required to complete the
    core software efforts, the possible savings would
    give a minimal contingency to the engineering
    effort that is to date missing in the project
    plan. → to be earned first, then released in 2005

65
Alternative Scenarios
  • Q: are we losing, are there real cost benefits?
  • Any experiment that does not have a kernel of
    people to run the data challenges will
    significantly lose
  • The commodity is people, not equipment
  • Sharing of resources is possible (and will
    happen), but we need to keep minimal R&D
    equipment. $300k/year for each T1 is very little
    funding for doing that. Below that we should just
    go home
  • Tier-2: the mission of the Tier-2 centers is to
    enable universities to be part of the LHC
    research program. That function will be cut in as
    much as the funding for it is cut.
  • To separate the running of the facilities from
    the experiment's effort: this is a model that we
    are developing for our interactions with Fermilab
    CD; this is the ramping to 35 FTE in 2007,
    not the 13 FTE now. Some services already now
    are being effort-reported to CD-CMS. We have to
    get the structures in place to get this right;
    there will be overheads involved
  • I do not see real cost benefits in any of these
    for the R&D phase. I prefer not to discuss the
    model for 2007 now, but we should stay open
    minded. However, if we want to approach
    unconventional scenarios we need to carefully
    prepare for them. That may start in 2003-2004?

66
  • UF PM
  • Control room logbook
  • Code dist, dar, role for grid
  • T2 work
  • CCS schedule?
  • More comments on the job
  • Nucleation point vs T1 user community
  • new hires
  • Tony's assignment, prod running
  • Disk tests, benchmarking, common work w/ BNL and
    iVDGL facility grp
  • Monitoring: NGOP, Ganglia, Iosif's stuff
  • Mention challenges to testbed/MOP config,
    certificates, installations, and help we get from
    Grid projects: VO, ESnet CA, VDT
  • UF workplan