Title: US CMS Software and Computing Project Overview -- Joint DOE and NSF Review of LHC Software and Computing, Lawrence Berkeley Lab, January 14-17, 2003
1 US CMS Software and Computing Project Overview
- Lothar A. T. Bauerdick/Fermilab, Project Manager
2 US CMS Software and Computing
- CAS subproject: provide software engineering support for CMS
- UF subproject: provide the S&C environment to do LHC physics in the U.S.
  - Tier-1 center at Fermilab plus five Tier-2 centers in the U.S.
  - The Tier-2s together will provide the same CPU/disk resources as the Tier-1
  - The US CMS system spans Tier-1 and Tier-2 systems from the beginning
  - There is an economy of scale, and we plan for a central support component
- Already making opportunistic use of resources that are NOT Tier-2 centers
  - Important for delivering resources to physics AND for involving universities
  - e.g. the UW Madison Condor pool, MRI initiatives at several universities
- The US CMS Grid system of T1 and T2 prototypes and testbeds has an important function within CMS:
  - help develop a truly global and distributed approach to the LHC computing problem
  - ensure full participation of the US physics community in the LHC research program
- To succeed, the U.S. requires the ability and ambition for leadership, and strong support to get the necessary resources!
3 US CMS S&C Since November 2001
- Consolidation of the project, shaping the R&D program
  - Project baselined in Nov 2001; workplan for CAS, UF, Grids endorsed
- CMS has embraced the Grids
  - Working with and profiting from US and European Grid projects
- UF: US CMS has built an initial successful Grid for CMS production
  - Commissioning of T1/T2 systems: facilities, data storage, data transfers and throughput
  - Major production efforts for PRS/HLT studies
  - Commissioning of the Grid-enabled Impala MC event production system (MOP), testbed
  - Integration Grid Testbed, which uses T1/T2 facilities
  - Evaluation, then commissioning of a production Grid
- CAS: many new developments, technical and organizational
  - We do have a working software and computing system that is fit for realistic physics studies: Higher Level Trigger study, DAQ TDR submitted
  - CCS will drive much of the common LCG Application Area
- PM: many changes
  - NSF proposal (Oct 2001) led to a 2-year 2002-2003 grant in September 2002; the NSF RP may start in 2003 with a $35M total until 2008
  - New DOE guidance received in spring was devastating; the 2003/4 baseline cannot be realized
  - Partly mitigated by softening the profiles of the construction/S&C projects, through D. Green working with DOE
  - A new bare-bones project was defined, smaller than baseline, but funded; it allows us to ramp the Tier-1 effort to the required level
4 Organizational Changes
- US CMS Research Program (RP) started: Detector M&O, Upgrade R&D, and the S&C Project
  - RP proposal submitted to the NSF covering 2003-2008
- Change in US CMS line management: US CMS RP Manager Dan Green; Detector Operations and the S&C Project are part of the RP
  - S&C will continue as a project with a baseline and funding profile
- Host lab: the S&C Project and the M&O effort will each have its own separate funding allocation
  - Changes to the S&C plan will be managed in the same way as changes to the construction plan: through a change control process including approval by the host laboratory and the funding agencies.
- Will need to develop a coherent PMP: role of the RP Manager, PMG, ASCB, SCOP, ...
5 US CMS S&C Organization
- CMS assignments related to U.S. CMS and CCS
  - Greg Graham/Fermilab: CMS CCS Grid Integration Task, Production Subtask Leader
  - Julian Bunn/Caltech: CMS CCS Grid Integration Task, Analysis Subtask Leader
  - Tony Wildish/Princeton: CMS CCS L2 Task, Production Processing and Data Management
  - L.A.T. Bauerdick/Fermilab: CMS CCS L2 Task, Computing Centers
- LCG assignments related to U.S. CMS
  - L.A.T. Bauerdick/Fermilab: SC2 member for U.S. LHC; I. Foster on SC2 for U.S. Grid technologies
  - R. Pordes/Fermilab and M. Livny/UW Madison: LCG Project Execution Board representatives for the U.S. Grid Projects
  - V. White/Fermilab: U.S. RC representative to the LCG Grid Deployment Board
  - Miron Livny: U.S. Grid representative in the GDB
  - Rick Cavanaugh: U.S. CMS representative in the GAG
6 Project Funds FY02
- DOE funded US CMS S&C with $2,394,856 in total
  - The total sum was sent to Fermilab
  - The main CAS effort is subcontracted with universities
  - Funding for CAS efforts at universities in units of 1 full-time person
- NSF awarded a 2-year grant for 2002-2003 (Nov 2001 RP proposal)
  - $800k in FY02, $1000k in FY03 (through NEU) for the US CMS RP
  - $690k and $750k were allocated to S&C by US CMS RPM Dan Green ($60k overhead)
- NSF iVDGL started in FY2002: $13.7M total plus $2M matching over 5 years
  - $466k has gone to US CMS prototype Tier-2 centers
- The project has successfully tracked ACWP for all DOE-funded activities, and has tracked ACWP in FTE-time for the NSF-funded activities
- We have requested invoices for all these activities, and expect to receive them
7 FY02 Funding BA
- Details of the FY02 Budget Authority
8 FY02 Actual Costs of Work Performed
- FY02 BA and ACWP for US CMS S&C
- (Numbers in italics are BCWS, as no invoices were received for NSF-funded efforts)
9 Accomplishments
- Prototyped Tier-1 and Tier-2 centers and deployed a Grid system
- Participated in a world-wide 20 TB data production for HLT studies
  - US CMS delivered key components: IMPALA, DAR
  - Made large data samples (Objectivity and nTuples) available to the physics community
  - Successful submission of the CMS DAQ TDR
- Worked with the Grid projects and VDT to harden middleware products
  - Integrated the VDT middleware into the CMS production system
  - Deployed the Integration Grid Testbed and used it for real productions
- Decoupled the CMS framework from Objectivity
  - Allows writing data persistently as ROOT/IO files
- Released a fully functional Detector Description Database
- Released the Software Quality and Assessment Plan
10 US CMS Contributions to CCS
- Contributions to CCS manpower (assuming LCG 10 FTE)
- CERN Member State contributions mainly come in through the LCG AA
- The U.S. is providing a fair-share contribution
11 US Contribution to Spring Production
- Contribution in events produced
- This is not a complete metric, but gives a good indication
- US CMS is providing a fair share of the CMS resources for simulations to support trigger and physics studies
[Charts: events simulated; events with high-luminosity pile-up]
12 US Contributions to CMS Production
- Spring Production: size of the community of people contributing (this is NOT an FTE count!)
- CMS software and production systems attract a sizable community in the US
  - US base: physicists/PRS people getting involved in CMS production software for their DAQ-TDR preparations
  - US Grids: Trillium, and also middleware providers (Condor)
  - US Project: in total 5 US CMS S&C engineers were involved
13 Main Current Activities
- Develop the US CMS T1/T2 system into a working Data Grid
  - high-throughput data transfers
  - Grid-wide job scheduling
  - monitoring
- Middleware is VDT: Condor, Globus et al.
  - plus the US CMS-developed DPE toolkit and procedures, underlying the standard CMS production environment (Impala, RefDB etc.)
- The Integration Grid Testbed (IGT) was very successful, see later talks
- Next steps: preparing the US CMS Production Grid
- Major coming milestones:
  - participation in the CMS 5% data challenge DC04
  - be operational as part of the LCG Production Grid in June 2003
- Major development efforts are still needed for those milestones:
  - providing a viable storage management solution for multi-terabyte data sets, building on dCache, SRM, etc.
  - end-to-end throughput, with the goal of TBs/day sustained rates from mass storage to mass storage
  - interfacing the Grid VO system to the local user registration/security requirements
  - consolidating the production system, databases, production configuration and metadata systems (MC_Runjob, catalogs, scheduling)
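The TBs/day sustained-throughput goal above is easier to reason about as an average rate held around the clock. A minimal back-of-the-envelope sketch (the function name and the decimal-unit convention are illustrative, not from the project plan):

```python
# Convert a sustained-transfer goal in terabytes per day into the
# average rate that must be held around the clock.
def required_rate_mb_s(terabytes_per_day: float) -> float:
    """Average MB/s needed to move the given volume in 24 hours."""
    seconds_per_day = 24 * 60 * 60              # 86,400 s
    megabytes = terabytes_per_day * 1_000_000   # decimal units: 1 TB = 10^6 MB
    return megabytes / seconds_per_day

# 1 TB/day works out to roughly 11.6 MB/s sustained, mass storage to
# mass storage; a multi-TB/day goal scales linearly from there.
print(round(required_rate_mb_s(1.0), 1))  # prints 11.6
```

The point of the arithmetic is that a TB/day goal is less a burst-rate problem than a sustained-availability one: the rate itself is modest, but it must survive failures and restarts around the clock.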
14 Upcoming Projects
- VO management and security
  - Working with Fermilab security and the PPDG Site-AA team
  - Many R&D, deployment and integration issues
  - Need to develop an operations scenario
- Cluster management: generic farms, partitioning
  - Needed to borrow manpower from Tier-2s (!!)
- Storage management and access
  - Storage architecture: components and interfaces
  - Also data set catalogs, metadata, replication, robust file transfers
- Networking: terabyte throughput to T2s and to CERN
  - TCP/IP on high-throughput WANs, end-to-end, QoS, VPN, Starlight?
- Web services, document systems, databases
  - Need to ramp up general support infrastructure
- Physics Analysis Center
  - Analysis cluster
  - Desktop support
  - Software distribution, software support, user support / helpdesk
- Collaborative tools
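The "robust file transfers" item above is, at its core, retry logic wrapped around whatever tool actually moves the file. A minimal sketch of that control flow, where `copy_once` is a hypothetical callable standing in for the real transfer client (GridFTP, dccp, etc.):

```python
import time

def transfer_with_retry(copy_once, src, dst, max_attempts=5, base_delay=2.0):
    """Retry a single-file transfer with exponential backoff.

    copy_once(src, dst) is a hypothetical callable wrapping the real
    transfer tool; it is expected to raise an exception on failure.
    Returns the number of attempts it took; re-raises after the last one.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy_once(src, dst)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # back off: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a fake copier that fails twice, then succeeds.
failures = [RuntimeError("timeout"), RuntimeError("timeout")]
def flaky_copy(src, dst):
    if failures:
        raise failures.pop()

print(transfer_with_retry(flaky_copy, "in.root", "out.root", base_delay=0.01))  # prints 3
```

A production version would also verify file sizes or checksums after each attempt and distinguish permanent errors (bad path, no permission) from transient ones; the sketch shows only the control flow.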
15 CMS Milestones v33 (June 2002)
16 Set of High-Level Milestones
- Integration Grid Testbed deployed, running PRS production: October 2002
- SC2002 demonstration of Grid distributed production and WorldGrid: November 2002
- Review of SC2002 demo, promotion and termination (might be combined with the testbed review after the SC2002 demonstration): December 2002
- Farm configuration definition and deployment: February 2003
- Fully functional Production Grid on a national scale: February 2003
- Migration of testbed functionality to production facilities: March 2003
- Start of LCG 24x7 Production Grid: June 2003 -- this needs definition from the LCG/GDB as to what it actually means
- Start of CMS DC04 production preparation, PCP04: July 2003
- Running of DC04: February 2004
17 Resource Expectations for DC04-prep
18 The Need for a New Plan
- Although the project was morally baselined in November 2001, and had a scope that corresponded to the Funding Agency guidance, the DOE found that it would be unable to fund the full baseline
- We received new guidance much below the previous one, which also included the costs of detector M&O and Upgrade R&D
- I was asked by the US CMS RPM to produce a bare-bones project plan that would address the funding shortfall during FY03 and FY04
- This was done using a top-down approach and presented to CMS and the Funding Agencies during the summer
- I received guidance for the FY03 DOE funds available to S&C, which is about the cost of the bare-bones plan
- With this input I asked L2 managers and sub-project leaders to develop a new WBS and resource-loaded schedule
- This addresses the necessary planning to arrive at TDRs in 2004 and 2005
  - Which will include a re-evaluation of costs and efforts for the US CMS UF
19 Bare-Bones Plan: Top-Down
- As a response to the revised funding guidance, we defined a Bare-Bones project plan:
- Continue to deliver S&C engineering support to CCS; keep this effort constant over the next 2 years, instead of ramping further
  - However, CMS software is about 12 months behind due to past severe understaffing
  - CMS needs the US (and CERN etc.) manpower to work with the LCG Applications Area
  - Contingencies, if any, will possibly develop in the course of joining forces with ATLAS et al.
- Ensure the US CMS system of Regional Centers is ready for the Data Challenges: (finally!) ramp the Tier-1 effort to the required level of 13 FTE in FY2003
  - Starting at 6 FTE now, we need to ramp to 13 FTE to fully participate in the Data Challenges
  - The Bare-Bones plan should enable US CMS to become part of the LCG Production Grid milestone during 2003
  - Can participate in DC04 and the 10% Data Challenges; will try to pull in as much as we can from the Grid projects
- Modest hardware procurements for Tier-1 and Tier-2 centers
  - Typically $500k/year in FY2003, FY2004
  - Need more ($700k) in 2005 (latest) for participation in the 10% challenge, RC TDR
- Start the deployment phase late in 2005
  - Then need to start hiring or re-assigning facility support people at the Tier-1
  - Start pilot implementation of US Tier-2
20 Bare-Bones Project Costs
21 Bare-Bones Scope: BCWS FY02-FY05
22 Funding Guidance to RP
23 FY03 RP Funding Allocation
- Allocation of RP funding to S&C and M&O by the Research Program Manager:
- Assumption: the RP takes a loan from construction, to be paid back in 2005/6 when the construction project finishes, softening the CP profile
  - This is recognized in the DOE/NSF strawman funding guidance
24 US CMS S&C FY03 Budget
- Cost objective for FY03: $4M
25 A New Detailed WBS for 2003/4
- We have reworked the WBS as a tool to manage the project
  - Adapt the project plan to the new, shrunken scope of slightly above bare bones -- instead of executing the Nov 2001 baseline
- Formulate a consistent plan leading to strong US CMS participation in:
  - the LCG production grid, starting June 2003, and
  - the CMS 5% data challenge DC04
- Those plans are in the process of becoming concrete enough to be planned for in detail:
  - choice of middleware
  - RC resource scheduling
  - security scheme
- The new WBS now clearly recognizes the roles
  - of our engagement in the Grid and end-to-end projects
  - of the testbeds for facilities and Grid developments
  - of the short prototype cycles that enable putting R&D results into production
- Worked out a WBS that reflects reality, has a structure that works with the new Fermilab project accounting scheme, and allows tracking of effort and progress at the lowest level in the WBS
26 New WBS Level 3
- Note the large effort captured from the Grids and/or located at universities
- This is now being explicitly tracked by US CMS
27 US CMS Approach to R&D, Integration, Deployment
- Prototyping, early roll-out, strong QC/QA, documentation, tracking of external practices
28 WBS Level 4 -- 1.1
29 WBS Level 4 -- 1.2
30 WBS Level 4 -- 1.3
31 WBS Level 4 -- 1.4
32 Estimated Resource Needs at T1 FY03/04
33 US CMS T1 and T2 Manpower
34 WBS and Schedule
- Developing the new project plan and a WBS with a resource-loaded schedule is a rather slow process, but we have established the guiding principles and have come a long way
- The WBS exists and is being used for detailed work planning
- The UF effort will need to come from existing manpower at Fermilab; we are working with CD management to extract that manpower
- There is general agreement on the WBS, which will allow us to consolidate a large and distributed effort with many different funding sources
- It is very encouraging to be able to agree on the general strategy between the Tier-1, the Tier-2 centers and the Grid projects
- The new structure is much better suited to the WBS being owned by the local managers. This is necessary to keep the WBS up to date and to track the project.
- The WBS is being resource-loaded; the detailed schedule covers the time until the end of 2003
  - However, there are many unknowns in 2003, including the success of POOL, LCG-1 etc.
- Please browse the WBS and schedule at http://heppc16.ucsd.edu/Planning_new/
- We are developing a detailed equipment procurement plan for FY03
35 NSF Research Program Proposal
- Have a 2-year grant for FY02/FY03
  - Expecting the RP to start in FY03, ramping over 5 years
- This addresses:
  - engineering for CCS
  - developing the distributed Grid computing model
  - building up the LHC computing infrastructure in the US
  - operations of the emerging Grid
  - some middleware support
  - participation in the data challenges
  - deploying the US LHC infrastructure
36 NSF RP Proposal Until 2007
- Proposal for Research Program 2003-2007 as submitted through NEU
37 Conclusions on US CMS S&C
- The US CMS S&C Project is delivering a working Grid environment, with strong participation of Fermilab and U.S. universities
- We need to do a lot more R&D to build the system for physics
- Our customers (CCS, PRS and US CMS users) are happy (the last time we asked), but need and want more support
- US CMS is driving the US Grid integration and deployment work
  - We have a unique opportunity to bring in our ideas of doing science in a global, open, international and collaborative environment
  - Proposal to the NSF ITR solicitation to Globally Enable Analysis Communities
  - That goes beyond the LHC and even HEP
- US CMS has shown that the US Tier-1/Tier-2 User Facility system can indeed work to deliver effort and resources to US CMS!
- We definitely are on the map for LHC computing and the LCG
- With the funding advised by the funding agencies and project oversight, we will have the manpower and equipment at the lab and universities to participate strongly in the CMS data challenges,
  - bringing the opportunity for U.S. leadership to the emerging LHC physics program