Transcript and Presenter's Notes

Title: Plans for the integration of grid tools in the CMS computing environment


1
Plans for the integration of grid tools in the
CMS computing environment
  • Claudio Grandi
  • (INFN Bologna)
  • on behalf of the CMS-CCS group

2
Data Challenge 2002 on Grid
  • Two official CMS productions on the grid in
    2002
  • CMS-EDG Stress Test on the EDG testbed and CMS
    sites
  • 260K events, CMKIN and CMSIM steps
  • Top-down approach: more functionality but less
    robust, large manpower needed
  • USCMS IGT Production in the US
  • 1M events, Ntuple-only (full chain in a single job)
  • 500K events up to CMSIM (two steps in a single job)
  • Bottom-up approach: less functionality but more
    stable, little manpower needed
  • See talk by P.Capiluppi

3
Data Challenge 2004
  • Next important computing milestone for CMS is
    the Data Challenge in 2004 (DC04)
    reconstruction and analysis of CMS data sustained
    over one month at a rate which is 5% of the
    LHC rate at full luminosity (25% of start-up
    luminosity)
  • 50 million fully digitized events needed as
    input
  • will exploit the LCG-1 resources
  • is a pure computing challenge!
  • see talk by V.Innocente for CMS data analysis

4
Pre-Challenge Production
  • Simulation and digitization of 50M events
    (PCP04)
  • 6 months (July to December 2003)
  • Transfer to CERN: 1 TB/day for 2 months
    (Nov.-Dec. 2003)
  • Distributed: most of the CMS Regional Centers will
    participate (a rough check of these targets follows)
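
A back-of-the-envelope check of these targets (a rough Python sketch; it assumes uniform production over the six months and round-the-clock transfers, which is of course idealized):

# Rough PCP arithmetic (illustrative assumptions, not official planning figures)
TOTAL_EVENTS = 50e6            # events to simulate and digitize
DAYS = 184                     # July to December 2003
SECONDS_PER_DAY = 86400
TRANSFER_BYTES_PER_DAY = 1e12  # 1 TB/day to CERN during Nov.-Dec. 2003

events_per_day = TOTAL_EVENTS / DAYS                        # roughly 270K events/day overall
sustained_rate = TRANSFER_BYTES_PER_DAY / SECONDS_PER_DAY   # roughly 12 MB/s to CERN

print(f"~{events_per_day:,.0f} events/day, ~{sustained_rate / 1e6:.0f} MB/s to CERN")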

5
Boundary conditions for PCP
  • CMS persistency is changing
  • POOL (by LCG) is replacing Objectivity/DB
  • CMS Compiler is changing
  • gcc 3.2.2 is replacing 2.95.2
  • Operating system is changing
  • Red Hat 7.3 is replacing 6.1.1
  • Grid middleware structure is changing
  • EDG on top of VDT, and possibly more changes to
    come
  • CMS has to deal with all this while preparing for
    the Pre-Challenge Production!

6
PCP strategy
  • PCP cannot fail (no DC04!)
  • the basic strategy is to run on dedicated, fully
    controllable resources, without the need for grid
    tools
  • grid-based prototypes have to be compatible with
    the basic non-grid environment
  • Jobs will run in a limited sandbox
  • input data local to the job
  • local XML POOL catalogue (prepared by the production
    tools)
  • output data/metadata and job monitoring data are
    produced locally and moved to the site manager
    asynchronously
  • synchronous components optionally update central
    catalogues; if they fail, the job continues and
    the catalogues are updated asynchronously (see the
    sketch below)
  • this reduces dependencies on the external
    environment and improves robustness
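
A minimal sketch of the "optional synchronous update, asynchronous fallback" idea (the function names and journal format here are hypothetical and stand in for the actual CMS production tools):

import json
import time

JOURNAL = "job_journal.jsonl"   # local journal, shipped to the site manager later

def record(entry, update_central=None):
    """Always journal locally; treat the central catalogue update as optional."""
    entry = dict(entry, timestamp=time.time())
    with open(JOURNAL, "a") as f:            # the local write never blocks the job
        f.write(json.dumps(entry) + "\n")
    if update_central is not None:
        try:
            update_central(entry)            # best-effort synchronous update
        except Exception:
            pass                             # job goes on; the catalogue is filled
                                             # in later from the journal, asynchronously

# hypothetical use inside the job wrapper:
# record({"event": "output_file_closed", "lfn": "mc03_somefile.root"},
#        update_central=catalogue_client.register)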

7
Hybrid production model
[Diagram: hybrid production model. A Physics Group asks for an official dataset
and the Production Manager defines assignments in the RefDB. At the Users Site,
the Site Manager starts an assignment (or a User starts a private production)
with MCRunJob, which submits to the Resources either via shell scripts to a
Local Batch Manager, via JDL to the EDG Scheduler (LCG-1 testbed), via DAGMan
(MOP), or via Chimera VDL derivations in the Virtual Data Catalogue handled by
a Planner.]
8
Limited-sandbox environment
[Diagram: limited-sandbox environment. On the Worker Node, a Job Wrapper (job
instrumentation) runs the User Job on its job input and collects the job output;
a Journal writer records a Journal, which a Remote updater and an Asynchronous
updater at the Users Site propagate to the Metadata DB. A minimal wrapper sketch
follows this slide.]
  • File transfers, if needed, are managed by
    external tools (EDG-JSS, additional DAG nodes,
    etc.)
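
A minimal job-wrapper sketch along the lines of the diagram above (the command, the progress marker and the journal format are invented for illustration; the real wrapper is part of the CMS production tools):

import json
import subprocess
import time

def run_wrapped(cmdline, journal_path="journal.jsonl"):
    """Run the user job, watch its output stream, and keep a journal that an
    asynchronous updater can later push to the metadata DB."""
    def journal(event, **extra):
        with open(journal_path, "a") as f:
            f.write(json.dumps({"event": event, "time": time.time(), **extra}) + "\n")

    journal("job_started", cmd=cmdline)
    proc = subprocess.Popen(cmdline, shell=True, text=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in proc.stdout:                  # scan the job's combined output stream
        if "Event" in line:                   # hypothetical progress marker
            journal("progress", line=line.strip())
    proc.wait()
    journal("job_finished", exit_code=proc.returncode)
    return proc.returncode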

9
Job production
  • Done by MCRunJob (see talk by G.Graham)
  • Modular: plug-ins (a minimal sketch follows this
    slide) for
  • reading from RefDB
  • reading from simple GUI
  • submitting to a local resource manager
  • submitting to DAGMan/Condor-G (MOP)
  • submitting to the EDG scheduler
  • producing derivations in the Chimera Virtual Data
    Catalogue (see talk by R.Cavanaugh)
  • Runs on the user (e.g. site manager) host
  • Also defines the sandboxes needed by the job
  • If needed, the specific submission plug-in takes
    care of
  • moving the sandbox files to the worker nodes
  • preparing the XML POOL catalogue with input file
    information
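
The modularity amounts to one job description handed to interchangeable submission plug-ins; a minimal sketch (the classes and the job dictionary are invented; bsub and edg-job-submit merely stand in for a local batch system and the EDG scheduler):

class LocalBatchSubmitter:
    def submit(self, job):
        print(f"bsub {job['script']}")        # hand the script to the local batch manager

class EDGSubmitter:
    def submit(self, job):
        print(f"edg-job-submit {job['jdl']}") # hand a JDL file to the EDG scheduler

SUBMITTERS = {"local": LocalBatchSubmitter(), "edg": EDGSubmitter()}

def submit(job, backend):
    # the production tool stays the same; only the submission plug-in changes
    SUBMITTERS[backend].submit(job)

submit({"script": "cmsim_run.sh", "jdl": "cmsim_run.jdl"}, backend="local")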

10
Job Metadata management
  • Job parameters that represent the job running
    status are stored in a dedicated database
  • when did the job start?
  • is it finished?
  • but also
  • how many events did it produce so far?
  • BOSS is a CMS-developed system that does this by
    extracting the info from the job's standard
    input/output/error streams (principle sketched
    below)
  • The remote updater is based on MySQL
  • A remote updater based on R-GMA is being
    developed (scalability tests being done now) for
    running in a grid environment
  • See talk by C.G.
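
The principle is to infer the running status from the job's own output stream; a hedged sketch with an invented log-line format and a plain dictionary standing in for the MySQL backend:

import re

def update_job_status(stdout_lines, status):
    """Scan the job's standard output and keep a running-status record
    (in the real system this record lives in a database)."""
    for line in stdout_lines:
        if m := re.search(r"Begin processing the (\d+)\w* record", line):
            status["events_so_far"] = int(m.group(1))
        if "Job finished" in line:
            status["finished"] = True
    return status

status = update_job_status(
    ["Begin processing the 1st record", "Begin processing the 250th record"],
    {"events_so_far": 0, "finished": False})
print(status)   # {'events_so_far': 250, 'finished': False}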

11
Dataset Metadata management
  • Dataset metadata are stored in the RefDB (see
    talk by V.Lefebure)
  • what (logical) files is it made of?
  • but also
  • what input parameters to the simulation program?
  • how many events have been produced so far?
  • Information may be updated in the RefDB in
    many ways
  • manual Site Manager operation
  • automatic e-mail from the job
  • a remote updater similar to the BOSS R-GMA one
    will be developed for running in a grid environment
  • Mapping of logical names to GUIDs, and of GUIDs
    to physical file names, will be done on the grid
    by the RLS (illustrated below)
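
The two-level name resolution in the last bullet can be illustrated with plain dictionaries (the GUID and file names below are made up; in production the mappings live in the RefDB and the RLS):

# logical file name -> GUID (dataset bookkeeping), GUID -> physical replicas (RLS)
lfn_to_guid = {"mc03_ttbar_digi_0001": "9b2e4f60-1111-2222-3333-000000000001"}
guid_to_pfns = {"9b2e4f60-1111-2222-3333-000000000001": [
    "gsiftp://tier1.example.org/cms/mc03_ttbar_digi_0001.root",
    "rfio://castor.example.org/cms/mc03_ttbar_digi_0001.root"]}

def resolve(lfn):
    guid = lfn_to_guid[lfn]
    return guid_to_pfns[guid]     # the job picks whichever replica is closest

print(resolve("mc03_ttbar_digi_0001"))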

12
Other issues for PCP
  • Software distribution and installation
  • pre-installed software
  • rpm files installed by LCG site administrators
  • installed on demand (if possible)
  • DAR files located using PACMAN or the Replica
    Manager
  • pile-up data (huge dataset!)
  • must be pre-installed at the site (in an
    appropriate number of copies) to get reasonable
    performance
  • on the grid it is treated as part of the
    digitization software
  • Data transfer
  • Replica Manager or direct gridFTP
  • MSS access using SRM is under test (see the SE
    workshop)

13
DC04 Workflow
  • Process data at 25 Hz (50 MB/s) at the Tier-0
    (a quick rate check follows this slide)
  • Reconstruction produces DST and AOD
  • AOD replicated to all Tier-1s (assume 4 centers)
  • DST replicated to at least one Tier-1
  • Assume Digis are already replicated in at least
    one Tier-1
  • No bandwidth to transfer Digis synchronously
  • Archive Digis to tape library
  • Express lines transferred to selected Tier-1
  • Calibration streams, Higgs analysis stream, ...
  • Analysis and recalibration
  • Produce new calibration data at selected Tier-1
    and update the Conditions Database
  • Analysis from the Tier-2 on AOD, DST,
    occasionally on Digis
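
A quick consistency check on the stated rates, using only the numbers on this slide and the 50M-event input sample:

RATE_HZ = 25                    # events/s processed at the Tier-0
BANDWIDTH = 50e6                # bytes/s quoted for the same stream
TOTAL_EVENTS = 50e6             # digitized input sample from PCP

event_size = BANDWIDTH / RATE_HZ                  # -> 2 MB per event
running_days = TOTAL_EVENTS / RATE_HZ / 86400     # -> about 23 days of continuous running
print(f"{event_size / 1e6:.0f} MB/event, ~{running_days:.0f} days at 25 Hz")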

14
DC04 Strategy
  • DC04 is a computing challenge
  • Run on LCG-1 (possibly supplemented by CMS
    resources)
  • Use Replica Manager services to locate data
  • Use a Workload Management System to select
    resources
  • Use a Grid-wide monitoring system
  • Client-server analysis: Clarens (see talk by
    C.Steenberg)
  • Data management strategy (preliminary; sketched
    below)
  • Express Lines pushed from Tier-0 to Tier-1s
  • AOD, DST published by Tier-0 and pulled by
    Tier-1s
  • Conditions DB segmented into read-only Calibration
    Sets
  • Versioned
  • Metadata stored in the RefDB
  • Temporary solution: specific middleware for
    read-write data management is needed
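
The push/pull split in the preliminary data-management strategy can be sketched as follows (site names and the transfer call are placeholders, not real services):

# Express lines are pushed as soon as the Tier-0 produces them;
# AOD and DST are only published, and each Tier-1 pulls on its own schedule.
published = []                                # files announced by the Tier-0

def transfer(f, site):                        # placeholder for Replica Manager / gridFTP
    print(f"copy {f} -> {site}")

def tier0_output(f, express=False, tier1s=()):
    if express:
        for t1 in tier1s:
            transfer(f, t1)                   # push immediately to the selected Tier-1s
    else:
        published.append(f)                   # publish only, no transfer yet

def tier1_pull(site):
    for f in published:
        transfer(f, site)                     # the Tier-1 fetches at its own pace

tier0_output("calibration_stream_001", express=True, tier1s=["Tier1_A", "Tier1_B"])
tier0_output("aod_run_001")
tier1_pull("Tier1_A")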

15
Summary
  • The next CMS computing challenges will take place
    in a very dynamic environment
  • Data Challenge 2004 will be done on LCG-1
  • The Pre-Challenge Production is already well defined
  • Flexible production tools may run in a local or
    a distributed environment
  • Basically outside the Grid, but an ideal proof of
    maturity for Grid-based prototypes
  • The Data Challenge architecture will be built on
    the experience CMS will gain during PCP
