Transcript and Presenter's Notes

Title: Plans for the integration of grid tools in the CMS computing environment


1
Plans for the integration of grid tools in the
CMS computing environment
  • Claudio Grandi
  • (INFN Bologna)
  • on behalf of the CMS-CCS group

2
Data Challenge 2002 on Grid
  • Two official CMS productions on the grid in
    2002
  • CMS-EDG Stress Test on the EDG testbed and CMS
    sites
  • 260K events, CMKIN and CMSIM steps
  • Top-down approach: more functionality but less
    robust, large manpower needed
  • USCMS IGT Production in the US
  • 1M events, Ntuple-only (full chain in a single job)
  • 500K events up to CMSIM (two steps in a single job)
  • Bottom-up approach: less functionality but more
    stable, little manpower needed
  • See talk by P.Capiluppi

3
Data Challenge 2004
  • Next important computing milestone for CMS is
    the Data Challenge in 2004 (DC04)
    reconstruction and analysis of CMS data sustained
    over one month at a rate which is 5% of the
    LHC rate at full luminosity (25% of start-up
    luminosity)
  • 50 million fully digitized events needed as
    input
  • will exploit the LCG-1 resources
  • is a pure computing challenge!
  • see talk by V.Innocente for CMS data analysis

4
Pre-Challenge Production
  • Simulation and digitization of 50M events
    (PCP04)
  • 6 months (July to December 2003)
  • Transfer to CERN: 1 TB/day for 2 months
    (Nov.-Dec. 2003)
  • Distributed: most of the CMS Regional Centers will
    participate (a rough check of these targets follows)
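
A back-of-the-envelope check of these targets (a rough Python sketch; it assumes uniform production over the six months and round-the-clock transfers, which is of course idealized):

# Rough PCP arithmetic (illustrative assumptions, not official planning figures)
TOTAL_EVENTS = 50e6            # events to simulate and digitize
DAYS = 184                     # July to December 2003
SECONDS_PER_DAY = 86400
TRANSFER_BYTES_PER_DAY = 1e12  # 1 TB/day to CERN during Nov.-Dec. 2003

events_per_day = TOTAL_EVENTS / DAYS                        # roughly 270K events/day overall
sustained_rate = TRANSFER_BYTES_PER_DAY / SECONDS_PER_DAY   # roughly 12 MB/s to CERN

print(f"~{events_per_day:,.0f} events/day, ~{sustained_rate / 1e6:.0f} MB/s to CERN")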

5
Boundary conditions for PCP
  • CMS persistency is changing
  • POOL (by LCG) is replacing Objectivity/DB
  • CMS Compiler is changing
  • gcc 3.2.2 is replacing 2.95.2
  • Operating system is changing
  • Red Hat 7.3 is replacing 6.1.1
  • Grid middleware structure is changing
  • EDG on top of VDT, and possibly more changes to
    come
  • CMS has to deal with all this while preparing for
    the Pre-Challenge Production!

6
PCP strategy
  • PCP cannot fail (no DC04!)
  • the basic strategy is to run on dedicated, fully
    controllable resources, without the need for grid
    tools
  • grid-based prototypes have to be compatible with
    the basic non-grid environment
  • Jobs will run in a limited sandbox
  • input data local to the job
  • local XML POOL catalogue (prepared by the production
    tools)
  • output data/metadata and job monitoring data are
    produced locally and moved to the site manager
    asynchronously
  • synchronous components optionally update central
    catalogues; if they fail, the job continues and
    the catalogues are updated asynchronously (see the
    sketch below)
  • this reduces dependencies on the external
    environment and improves robustness
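
A minimal sketch of the "optional synchronous update, asynchronous fallback" idea (the function names and journal format here are hypothetical and stand in for the actual CMS production tools):

import json
import time

JOURNAL = "job_journal.jsonl"   # local journal, shipped to the site manager later

def record(entry, update_central=None):
    """Always journal locally; treat the central catalogue update as optional."""
    entry = dict(entry, timestamp=time.time())
    with open(JOURNAL, "a") as f:            # the local write never blocks the job
        f.write(json.dumps(entry) + "\n")
    if update_central is not None:
        try:
            update_central(entry)            # best-effort synchronous update
        except Exception:
            pass                             # job goes on; the catalogue is filled
                                             # in later from the journal, asynchronously

# hypothetical use inside the job wrapper:
# record({"event": "output_file_closed", "lfn": "mc03_somefile.root"},
#        update_central=catalogue_client.register)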

7
Hybrid production model
[Diagram: hybrid production model. A Physics Group asks for an official dataset
and the Production Manager defines assignments in the RefDB. At the Users Site,
the Site Manager starts an assignment (or a User starts a private production)
with MCRunJob, which submits to the Resources either via shell scripts to a
Local Batch Manager, via JDL to the EDG Scheduler (LCG-1 testbed), via DAGMan
(MOP), or via Chimera VDL derivations in the Virtual Data Catalogue handled by
a Planner.]
8
Limited-sandbox environment
[Diagram: limited-sandbox environment. On the Worker Node, a Job Wrapper (job
instrumentation) runs the User Job on its job input and collects the job output;
a Journal writer records a Journal, which a Remote updater and an Asynchronous
updater at the Users Site propagate to the Metadata DB. A minimal wrapper sketch
follows this slide.]
  • File transfers, if needed, are managed by
    external tools (EDG-JSS, additional DAG nodes,
    etc.)
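
A minimal job-wrapper sketch along the lines of the diagram above (the command, the progress marker and the journal format are invented for illustration; the real wrapper is part of the CMS production tools):

import json
import subprocess
import time

def run_wrapped(cmdline, journal_path="journal.jsonl"):
    """Run the user job, watch its output stream, and keep a journal that an
    asynchronous updater can later push to the metadata DB."""
    def journal(event, **extra):
        with open(journal_path, "a") as f:
            f.write(json.dumps({"event": event, "time": time.time(), **extra}) + "\n")

    journal("job_started", cmd=cmdline)
    proc = subprocess.Popen(cmdline, shell=True, text=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in proc.stdout:                  # scan the job's combined output stream
        if "Event" in line:                   # hypothetical progress marker
            journal("progress", line=line.strip())
    proc.wait()
    journal("job_finished", exit_code=proc.returncode)
    return proc.returncode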

9
Job production
  • Done by MCRunJob (see talk by G.Graham)
  • Modular: plug-ins (a minimal sketch follows this
    slide) for
  • reading from RefDB
  • reading from simple GUI
  • submitting to a local resource manager
  • submitting to DAGMan/Condor-G (MOP)
  • submitting to the EDG scheduler
  • producing derivations in the Chimera Virtual Data
    Catalogue (see talk by R.Cavanaugh)
  • Runs on the user (e.g. site manager) host
  • Also defines the sandboxes needed by the job
  • If needed, the specific submission plug-in takes
    care of
  • moving the sandbox files to the worker nodes
  • preparing the XML POOL catalogue with input file
    information
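
The modularity amounts to one job description handed to interchangeable submission plug-ins; a minimal sketch (the classes and the job dictionary are invented; bsub and edg-job-submit merely stand in for a local batch system and the EDG scheduler):

class LocalBatchSubmitter:
    def submit(self, job):
        print(f"bsub {job['script']}")        # hand the script to the local batch manager

class EDGSubmitter:
    def submit(self, job):
        print(f"edg-job-submit {job['jdl']}") # hand a JDL file to the EDG scheduler

SUBMITTERS = {"local": LocalBatchSubmitter(), "edg": EDGSubmitter()}

def submit(job, backend):
    # the production tool stays the same; only the submission plug-in changes
    SUBMITTERS[backend].submit(job)

submit({"script": "cmsim_run.sh", "jdl": "cmsim_run.jdl"}, backend="local")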

10
Job Metadata management
  • Job parameters that represent the job running
    status are stored in a dedicated database
  • when did the job start?
  • is it finished?
  • but also
  • how many events did it produce so far?
  • BOSS is a CMS-developed system that does this by
    extracting the info from the job's standard
    input/output/error streams (principle sketched
    below)
  • The remote updater is based on MySQL
  • A remote updater based on R-GMA is being
    developed (scalability tests being done now) for
    running in a grid environment
  • See talk by C.G.
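
The principle is to infer the running status from the job's own output stream; a hedged sketch with an invented log-line format and a plain dictionary standing in for the MySQL backend:

import re

def update_job_status(stdout_lines, status):
    """Scan the job's standard output and keep a running-status record
    (in the real system this record lives in a database)."""
    for line in stdout_lines:
        if m := re.search(r"Begin processing the (\d+)\w* record", line):
            status["events_so_far"] = int(m.group(1))
        if "Job finished" in line:
            status["finished"] = True
    return status

status = update_job_status(
    ["Begin processing the 1st record", "Begin processing the 250th record"],
    {"events_so_far": 0, "finished": False})
print(status)   # {'events_so_far': 250, 'finished': False}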

11
Dataset Metadata management
  • Dataset metadata are stored in the RefDB (see
    talk by V.Lefebure)
  • what (logical) files is it made of?
  • but also
  • what input parameters to the simulation program?
  • how many events have been produced so far?
  • Information may be updated in the RefDB in
    many ways
  • manual Site Manager operation
  • automatic e-mail from the job
  • a remote updater similar to the BOSS R-GMA one
    will be developed for running in a grid environment
  • Mapping of logical names to GUIDs, and of GUIDs
    to physical file names, will be done on the grid
    by the RLS (illustrated below)
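
The two-level name resolution in the last bullet can be illustrated with plain dictionaries (the GUID and file names below are made up; in production the mappings live in the RefDB and the RLS):

# logical file name -> GUID (dataset bookkeeping), GUID -> physical replicas (RLS)
lfn_to_guid = {"mc03_ttbar_digi_0001": "9b2e4f60-1111-2222-3333-000000000001"}
guid_to_pfns = {"9b2e4f60-1111-2222-3333-000000000001": [
    "gsiftp://tier1.example.org/cms/mc03_ttbar_digi_0001.root",
    "rfio://castor.example.org/cms/mc03_ttbar_digi_0001.root"]}

def resolve(lfn):
    guid = lfn_to_guid[lfn]
    return guid_to_pfns[guid]     # the job picks whichever replica is closest

print(resolve("mc03_ttbar_digi_0001"))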

12
Other issues for PCP
  • Software distribution and installation
  • pre-installed software
  • rpm files installed by LCG site administrators
  • installed on demand (if possible)
  • DAR files located using PACMAN or the Replica
    Manager
  • pile-up data (huge dataset!)
  • must be pre-installed at the site (in an
    appropriate number of copies) to get reasonable
    performance
  • on the grid it is treated as part of the
    digitization software
  • Data transfer
  • Replica Manager or direct gridFTP
  • MSS access using SRM is under test (see the SE
    workshop)

13
DC04 Workflow
  • Process data at 25 Hz (50 MB/s) at the Tier-0
    (a quick rate check follows this slide)
  • Reconstruction produces DST and AOD
  • AOD replicated to all Tier-1s (assume 4 centers)
  • DST replicated to at least one Tier-1
  • Assume Digis are already replicated in at least
    one Tier-1
  • No bandwidth to transfer Digis synchronously
  • Archive Digis to tape library
  • Express lines transferred to selected Tier-1
  • Calibration streams, Higgs analysis stream, ...
  • Analysis and recalibration
  • Produce new calibration data at selected Tier-1
    and update the Conditions Database
  • Analysis from the Tier-2 on AOD, DST,
    occasionally on Digis
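
A quick consistency check on the stated rates, using only the numbers on this slide and the 50M-event input sample:

RATE_HZ = 25                    # events/s processed at the Tier-0
BANDWIDTH = 50e6                # bytes/s quoted for the same stream
TOTAL_EVENTS = 50e6             # digitized input sample from PCP

event_size = BANDWIDTH / RATE_HZ                  # -> 2 MB per event
running_days = TOTAL_EVENTS / RATE_HZ / 86400     # -> about 23 days of continuous running
print(f"{event_size / 1e6:.0f} MB/event, ~{running_days:.0f} days at 25 Hz")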

14
DC04 Strategy
  • DC04 is a computing challenge
  • Run on LCG-1 (possibly supplemented by CMS
    resources)
  • Use Replica Manager services to locate data
  • Use a Workload Management System to select
    resources
  • Use a Grid-wide monitoring system
  • Client-server analysis: Clarens (see talk by
    C.Steenberg)
  • Data management strategy (preliminary; sketched
    below)
  • Express Lines pushed from Tier-0 to Tier-1s
  • AOD, DST published by Tier-0 and pulled by
    Tier-1s
  • Conditions DB segmented into read-only Calibration
    Sets
  • Versioned
  • Metadata stored in the RefDB
  • Temporary solution: specific middleware for
    read-write data management is needed
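
The push/pull split in the preliminary data-management strategy can be sketched as follows (site names and the transfer call are placeholders, not real services):

# Express lines are pushed as soon as the Tier-0 produces them;
# AOD and DST are only published, and each Tier-1 pulls on its own schedule.
published = []                                # files announced by the Tier-0

def transfer(f, site):                        # placeholder for Replica Manager / gridFTP
    print(f"copy {f} -> {site}")

def tier0_output(f, express=False, tier1s=()):
    if express:
        for t1 in tier1s:
            transfer(f, t1)                   # push immediately to the selected Tier-1s
    else:
        published.append(f)                   # publish only, no transfer yet

def tier1_pull(site):
    for f in published:
        transfer(f, site)                     # the Tier-1 fetches at its own pace

tier0_output("calibration_stream_001", express=True, tier1s=["Tier1_A", "Tier1_B"])
tier0_output("aod_run_001")
tier1_pull("Tier1_A")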

15
Summary
  • The next CMS computing challenges will take place
    in a very dynamic environment
  • Data Challenge 2004 will be done on LCG-1
  • The Pre-Challenge Production is already well defined
  • Flexible production tools may run in a local or
    a distributed environment
  • Basically outside the Grid, but an ideal proof of
    maturity for Grid-based prototypes
  • The Data Challenge architecture will be built on
    the experience CMS will gain during PCP
