CMS Distributed Data Analysis Challenges
Claudio Grandi, INFN Bologna. ACAT'03, KEK, 3-Dec-2003.
1
CMS Distributed Data Analysis Challenges
  • Claudio Grandi
  • on behalf of the CMS Collaboration

2
Outline
  • CMS Computing Environment
  • CMS Computing Milestones
  • OCTOPUS CMS Production System
  • 2002 Data productions
  • 2003 Pre-Challenge production (PCP03)
  • PCP03 on grid
  • 2004 Data Challenge (DC04)
  • Summary

3
CMS Computing Environment
4
CMS computing context
  • LHC will produce 40 million bunch crossings per second in the CMS detector (1000 TB/s)
  • The on-line system will reduce the rate to 100 events per second (100 MB/s of raw data)
    • Level-1 trigger hardware
    • High-level trigger on-line farm
  • Raw data (1 MB/evt) will be
    • archived on persistent storage (1 PB/year)
    • reconstructed to DST (0.5 MB/evt) and AOD (20 KB/evt)
  • Reconstructed data (and part of the raw data) will be
    • distributed to the computing centers of collaborating institutes
    • analyzed by physicists at their own institutes

5
CMS Data Production at LHC
40 MHz (1000 TB/s) → Level-1 Trigger → 75 kHz (50 GB/s) → High Level Trigger → 100 Hz (100 MB/s) → Data Recording and Offline Analysis
6
CMS Distributed Computing Model
[Diagram: tiered computing model. The Online System at the experiment produces ~PByte/s; 100-1500 MBytes/s flow to the CERN Center (Tier 0/1, PBs of disk and a tape robot); Tier-1 centers (FNAL, IN2P3, INFN, RAL) connect at 2.5-10 Gbps; Tier-2 and Tier-3 (institute) centers with physics data caches connect at 2.5-10 Gbps; Tier-4 workstations connect at 0.1 to 10 Gbps]
7
CMS software for Data Simulation
  • Event Generation
    • Pythia and other generators
    • Generally Fortran programs. Produce N-tuple files (HEPEVT format)
  • Detector simulation
    • CMSIM (uses GEANT-3)
      • Fortran program. Produces formatted Zebra (FZ) files from N-tuples
    • OSCAR (uses GEANT-4 and the CMS COBRA framework)
      • C++ program. Produces POOL files (hits) from N-tuples
  • Digitization (DAQ simulation)
    • ORCA (uses the CMS COBRA framework)
      • C++ program. Produces POOL files (digis) from hits POOL files or FZ
  • Trigger simulation
    • ORCA
      • Reads digis POOL files
      • Normally run as part of the reconstruction phase

8
CMS software for Data Analysis
  • Reconstruction
    • ORCA
      • Produces POOL files (DST and AOD) from hits or digis POOL files
  • Analysis
    • ORCA
      • Reads POOL files in (hits, digis,) DST, AOD formats
    • IGUANA (uses ORCA and OSCAR as back-end)
      • Visualization program (event display, statistical analysis)

9
CMS software data flow
  • Data Simulation: Pythia and other generators produce HEPEVT Ntuples; CMSIM (GEANT3) turns them into Zebra files with HITS, which the ORCA/COBRA Hit Formatter converts into the Hits Database (POOL), or OSCAR/COBRA (GEANT4) writes the POOL hits directly; ORCA/COBRA Digitization merges signal and pile-up into the Digis Database (POOL)
  • Data Analysis: ORCA Reconstruction or User Analysis reads the POOL database and produces Ntuples or Root files; IGUANA provides interactive analysis
(A minimal pipeline sketch of this chain follows below.)
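The chain above can be read as a simple linear pipeline. The sketch below is only illustrative and is not CMS code: the stage names, tools and output formats are taken from the slides, while the data structure and the describe_chain helper are hypothetical.

```python
# Illustrative sketch of the CMS simulation/analysis chain described above.
# Stage names and formats are from the slides; the structure itself is hypothetical.

SIMULATION_CHAIN = [
    {"stage": "Generation",     "tool": "Pythia / other generators",              "output": "HEPEVT N-tuples"},
    {"stage": "Simulation",     "tool": "CMSIM (GEANT3) or OSCAR (GEANT4/COBRA)", "output": "Zebra FZ files or POOL hits"},
    {"stage": "Hit formatting", "tool": "ORCA/COBRA Hit Formatter",               "output": "POOL hits"},
    {"stage": "Digitization",   "tool": "ORCA/COBRA (merges pile-up)",            "output": "POOL digis"},
    {"stage": "Reconstruction", "tool": "ORCA",                                   "output": "POOL DST/AOD"},
    {"stage": "Analysis",       "tool": "ORCA / IGUANA",                          "output": "N-tuples or ROOT files"},
]

def describe_chain(chain):
    """Print each processing step with its tool and output format."""
    for step in chain:
        print(f"{step['stage']:<15} {step['tool']:<45} -> {step['output']}")

if __name__ == "__main__":
    describe_chain(SIMULATION_CHAIN)
```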
10
CMS Computing Milestones
11
CMS computing milestones
  • DAQ TDR (Technical Design Report): Spring-2002 Data Production, Software Baselining
  • Computing Core Software TDR: 2003 Data Production (PCP04), 2004 Data Challenge (DC04)
  • Physics TDR: 2004/05 Data Production (DC05), Data Analysis for the Physics TDR
  • Readiness Review: 2005 Data Production (PCP06), 2006 Data Challenge (DC06)
  • Commissioning
12
Size of CMS Data Challenges
  • 1999:       1 TB,   1 month,   1 person
  • 2000-2001: 27 TB,  12 months, 30 persons
  • 2002:      20 TB,   2 months, 30 persons
  • 2003:     175 TB,   6 months, <30 persons

13
World-wide Distributed Productions
[World map marking CMS Production Regional Centres and CMS Distributed Production Regional Centres]
14
CMS Computing Challenges
  • CMS Computing challenges include
    • production of simulated data for studies on
      • detector design
      • Trigger and DAQ design and validation
      • physics system setup
    • definition and set-up of the analysis infrastructure
    • definition of the computing infrastructure
    • validation of the computing model
  • Distributed system
    • increasing size and complexity
    • tightly coupled to other CMS activities
    • provides computing support for all CMS activities

15
OCTOPUS: CMS Production System
16
OCTOPUS Data Production System
[Workflow diagram]
  • A Physics Group asks for a new dataset; the Production Manager defines assignments in the RefDB (data-level queries)
  • The Site Manager starts an assignment: McRunjob with the CMSProd plug-in reads the RefDB and produces the job shell scripts
  • Jobs are run through BOSS on the Local Batch Manager; job-level queries go to the BOSS DB
17
Remote connections to databases
[Diagram: on the Worker Node, a Job Wrapper (job instrumentation) runs the User Job and handles the job input and output; a Journal Writer records metadata updates in a Journal Catalog; the updates reach the Metadata DB either through a Remote Updater (direct connection from the WN) or through an Asynchronous Updater on the User Interface that replays the Journal Catalog]
  • The Metadata DBs are RLS/POOL, the RefDB and the BOSS DB
(A minimal sketch of this journal-and-asynchronous-updater pattern is shown below.)
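The journal-based update pattern on this slide can be sketched in a few lines. The sketch below is a minimal illustration, not the actual BOSS/RefDB implementation: the journal file format, the journal_write/journal_replay functions and the apply_update callback are all hypothetical.

```python
# Minimal sketch of the journal-writer / asynchronous-updater pattern from the slide.
# The journal format and function names are hypothetical, not the real BOSS/RefDB code.
import json
import time

JOURNAL = "job_journal.log"

def journal_write(record, path=JOURNAL):
    """Job wrapper on the worker node: append a metadata update to the local journal."""
    with open(path, "a") as f:
        f.write(json.dumps({"time": time.time(), **record}) + "\n")

def journal_replay(apply_update, path=JOURNAL):
    """Asynchronous updater: replay journal entries into the metadata DB when it is reachable."""
    with open(path) as f:
        for line in f:
            apply_update(json.loads(line))   # e.g. an INSERT/UPDATE on the RefDB or BOSS DB

if __name__ == "__main__":
    journal_write({"job": 42, "status": "running", "events_done": 1000})
    journal_write({"job": 42, "status": "finished", "events_done": 25000})
    journal_replay(lambda rec: print("would update metadata DB with", rec))
```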

18
Job production
  • MCRunJob
    • Modular: provides plug-ins for
      • reading from the RefDB
      • reading from a simple GUI
      • submitting to a local resource manager
      • submitting to DAGMan/Condor-G (MOP)
      • submitting to the EDG/LCG scheduler
      • producing derivations in the Chimera Virtual Data Catalogue
    • Runs on the user's (e.g. site manager's) host
    • Also defines the sandboxes needed by the job
    • If needed, the specific submission plug-in takes care of
      • preparing the XML POOL catalogue with input file information
      • moving the sandbox files to the worker nodes
  • CMSProd
(A minimal sketch of such a plug-in registry is shown after this list.)
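The plug-in modularity described above can be pictured as a registry keyed on the submission back-end. The sketch below is hypothetical and is not the McRunJob API; the registry, the decorator and the job dictionary are invented for illustration only.

```python
# Illustrative plug-in registry in the spirit of McRunJob's modular design.
# The registry, plug-in names and job dictionary are hypothetical.

SUBMITTERS = {}

def register(name):
    """Decorator that registers a submission plug-in under a back-end name."""
    def wrap(func):
        SUBMITTERS[name] = func
        return func
    return wrap

@register("local")
def submit_local(job):
    print(f"submitting {job['name']} to the local resource manager")

@register("condor_g")
def submit_condor_g(job):
    print(f"submitting {job['name']} via DAGMan/Condor-G (MOP)")

@register("lcg")
def submit_lcg(job):
    print(f"submitting {job['name']} to the EDG/LCG scheduler")

if __name__ == "__main__":
    job = {"name": "assignment_123_cmsim", "sandbox": ["run.sh", "pool_catalog.xml"]}
    SUBMITTERS["lcg"](job)   # the chosen back-end decides how the sandbox is shipped
```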

19
Job Metadata management
  • Job parameters that represent the job running status are stored in a dedicated database
    • when did the job start?
    • is it finished?
    • but also
    • how many events has it produced so far?
  • BOSS is a CMS-developed system that does this by extracting the information from the job's
    standard input/output/error streams (a minimal filter sketch is shown after this list)
  • The remote updater is based on MySQL
  • Remote updaters are now being developed based on
    • R-GMA (still has scalability problems)
    • Clarens (just started)
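The idea behind BOSS, filtering the job's standard streams to update a job-tracking database, can be sketched as follows. The regular expression, the assumed log-line format and the in-memory "database" are hypothetical; the real BOSS extracts the information from the standard streams and stores it via MySQL, as stated above.

```python
# Sketch of a BOSS-style filter: extract the job status from the job's stdout stream.
# The regular expression, log format and in-memory "DB" are hypothetical.
import re

EVENT_RE = re.compile(r"processed\s+(\d+)\s+events")   # assumed log-line format

def filter_stdout(lines, job_id, db):
    """Scan stdout lines and record the latest event count for this job."""
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            db[job_id] = {"events_done": int(m.group(1))}   # would be a MySQL UPDATE in BOSS
    return db

if __name__ == "__main__":
    fake_stdout = ["starting job", "processed 100 events", "processed 250 events"]
    print(filter_stdout(fake_stdout, job_id=42, db={}))   # {42: {'events_done': 250}}
```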

20
Dataset Metadata management
  • Dataset metadata are stored in the RefDB
    • of which (logical) files is it made?
    • but also
    • what were the input parameters to the simulation program?
    • how many events have been produced so far?
  • Information may be updated in the RefDB in many ways
    • manual Site Manager operation
    • automatic e-mail from the job
    • remote updaters based on R-GMA and Clarens (similar to those developed for BOSS) will be developed
  • Mapping of logical names to physical file names will be done on the grid by RLS/POOL
    (see the sketch after this list)
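Logical-to-physical mapping with RLS/POOL amounts to a catalogue from logical file names to one or more physical replicas. The sketch below is a toy illustration with invented file and site names; it does not use the POOL or RLS APIs.

```python
# Toy logical-to-physical file-name catalogue, in the spirit of RLS/POOL.
# The catalogue structure, file names and site names are hypothetical.

catalogue = {
    "lfn:example_digis_0001.root": [
        "srm://castor.cern.ch/cms/pcp03/example_digis_0001.root",
        "srm://se.cnaf.infn.it/cms/pcp03/example_digis_0001.root",
    ],
}

def resolve(lfn, prefer=None):
    """Return a physical replica for a logical name, optionally preferring a site substring."""
    replicas = catalogue.get(lfn, [])
    if prefer:
        preferred = [pfn for pfn in replicas if prefer in pfn]
        if preferred:
            return preferred[0]
    return replicas[0] if replicas else None

if __name__ == "__main__":
    print(resolve("lfn:example_digis_0001.root", prefer="cnaf"))
```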

21
2002 Data Productions
22
2002 production statistics
  • Used Objectivity/DB for persistency
  • 11 Regional Centers, more than 20 sites, about 30 site managers
  • Spring 2002 data production
    • Generation and detector simulation
      • 6 million events in 150 physics channels
    • Digitization
      • >13 million events with different configurations (luminosity)
      • about 200 KSI2000 months
      • more than 20 TB of digitized data
  • Fall 2002 data production
    • 10 million events, full chain (small output)
    • about 300 KSI2000 months
  • Also productions on grid!

23
Spring 2002 production history
[Plot: CMSIM production rate over time, about 1.5 million events per month]
24
Fall 2002 CMS grid productions
  • CMS/EDG Stress Test on the EDG testbed and CMS sites
    • Top-down approach: more functionality but less robust, large manpower needed
    • 260,000 events in 3 weeks
  • USCMS IGT Production in the US
    • Bottom-up approach: less functionality but more stable, little manpower needed
    • 1.2 million events in 2 months

25
2003 Pre-Challenge Production
26
PCP04 production statistics
  • Started in July; supposed to end by Christmas
  • Generation and simulation
    • 48 million events with CMSIM
      • 50-150 KSI2K s/event, 2000 KSI2K months
      • 1 MB/event, 50 TB
      • hit-formatting in progress: the POOL format reduces the size by a factor of 2!
    • 6 million events with OSCAR
      • 100-200 KSI2K s/event, 350 KSI2K months (in progress)
  • Digitization just starting
    • need to digitize 70 million events; not all will be in time for DC04! Estimated:
      • 30-40 KSI2K s/event, 950 KSI2K months
      • 1.5 MB/event, 100 TB
  • Data movement to CERN
    • 1 TB/day for 2 months
(The KSI2K-month totals follow from the per-event CPU costs; see the arithmetic sketch below.)
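The CPU-time totals above are consistent with the quoted per-event costs. A quick check, assuming roughly 2.6 million seconds per month and taking the midpoint of each quoted per-event range:

```python
# Quick consistency check of the PCP CPU estimates quoted above.
# Assumes ~2.6e6 seconds per month; per-event costs are midpoints of the quoted ranges.
SECONDS_PER_MONTH = 30 * 24 * 3600   # ~2.6e6 s

def ksi2k_months(n_events, ksi2k_sec_per_event):
    return n_events * ksi2k_sec_per_event / SECONDS_PER_MONTH

print(round(ksi2k_months(48e6, 100)))   # CMSIM:        ~1850 KSI2K months (quoted: 2000)
print(round(ksi2k_months(6e6, 150)))    # OSCAR:        ~350  KSI2K months (quoted: 350)
print(round(ksi2k_months(70e6, 35)))    # digitization: ~945  KSI2K months (quoted: 950)
```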

27
PCP 2003 production history
[Plot: CMSIM production rate over time, 13 million events per month]
28
PCP04 on grid
29
US DPE production system
  • Running on Grid2003
    • 2000 CPUs
  • Based on VDT 1.1.11
  • EDG VOMS for authentication
  • GLUE Schema for MDS Information Providers
  • MonALISA for monitoring
  • MOP for production control

[Diagram: US DPE Production on Grid2003, the MOP System]
  • DAGMan and Condor-G for job specification and submission
  • A Condor-based match-making process selects resources

30
Performance of US DPE
  • USMOP Regional Center
    • 7.7 Mevts Pythia: 30,000 jobs of 1.5 min each, 0.7 KSI2000 months
    • 2.3 Mevts CMSIM: 9,000 jobs of 10 hours each, 90 KSI2000 months
    • 3.5 TB of data

[Plot: CMSIM production rate over time]
Now running OSCAR productions
31
CMS/LCG-0 testbed
  • CMS/LCG-0 is a CMS-wide testbed based on the LCG pilot distribution (LCG-0), owned by CMS
    • a joint CMS, DataTAG-WP4 and LCG-EIS effort
    • started in June 2003
  • Components from VDT 1.1.6 and EDG 1.4.X (LCG pilot)
  • Components from DataTAG (GLUE schemas and info providers)
  • Virtual Organization Management: VOMS
  • RLS in place of the replica catalogue (uses rlscms provided by CERN/IT)
  • Monitoring: GridICE by DataTAG
  • Tests with R-GMA (as BOSS transport layer for specific tests)
  • No direct MSS access (bridge to SRB at CERN)
  • About 170 CPUs, 4 TB of disk
    • Bari, Bologna, Bristol, Brunel, CERN, CNAF, Ecole Polytechnique, Imperial College, ISLAMABAD-NCP, Legnaro, Milano, NCU-Taiwan, Padova, U.Iowa
  • Allowed CMS software integration to proceed while LCG-1 was not yet available

32
CMS/LCG-0 Production system
  • OCTOPUS is installed on the User Interface
  • CMS software is installed on the Computing Elements as RPMs
[Diagram: on the User Interface, McRunjob/ImpalaLite reads the RefDB and produces JDL job descriptions, tracked in the BOSS DB; the Grid (LCG) Scheduler matches them to Computing Elements (CE) using the Grid Information System (MDS); output data go to Storage Elements (SE) and are registered in the RLS]
(A hedged sketch of building such a JDL job description follows below.)
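Jobs reach the LCG scheduler as JDL descriptions. The sketch below shows how such a description might be assembled; the attribute names (Executable, InputSandbox, Requirements, ...) are standard EDG/LCG JDL, but the values, the helper function and the software tag are hypothetical and are not what McRunjob/ImpalaLite actually emits.

```python
# Hedged sketch: build a minimal EDG/LCG JDL job description.
# Attribute names are standard JDL; the values and the helper are hypothetical,
# not the actual output of McRunjob/ImpalaLite.

def make_jdl(script, sandbox_files, cms_tag="ORCA_7_x"):
    """Return a JDL string for one production job."""
    sandbox = ", ".join(f'"{f}"' for f in sandbox_files)
    return f"""\
Executable      = "{script}";
StdOutput       = "job.out";
StdError        = "job.err";
InputSandbox    = {{ {sandbox} }};
OutputSandbox   = {{ "job.out", "job.err" }};
Requirements    = Member("{cms_tag}", other.GlueHostApplicationSoftwareRunTimeEnvironment);
"""

if __name__ == "__main__":
    print(make_jdl("run_cmsim.sh", ["run_cmsim.sh", "pool_catalog.xml"]))
```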
33
CMS/LCG-0 performance
  • CMS-LCG Regional Center (based on CMS/LCG-0)
    • 0.5 Mevts heavy Pythia: 2000 jobs of 8 hours each, 10 KSI2000 months
    • 1.5 Mevts CMSIM: 6000 jobs of 10 hours each, 55 KSI2000 months
    • 2.5 TB of data
  • Inefficiency estimation
    • 5 to 10% due to site misconfiguration and local failures
    • 0 to 20% due to RLS unavailability
    • a few errors in the execution of the job wrapper
    • overall inefficiency: 5 to 30%

[Plots: Pythia and CMSIM production rates]
Now used as a play-ground for CMS grid-tools development
34
Data Challenge 2004 (DC04)
35
2004 Data Challenge
  • Test the CMS computing system at a rate which corresponds to 5% of the full LHC luminosity
    • corresponds to 25% of the LHC start-up luminosity
    • for one month (February or March 2004)
    • 25 Hz data-taking rate at a luminosity of 0.2 x 10^34 cm^-2 s^-1
    • 50 million events (completely simulated up to digis during PCP03) used as input
  • Main tasks
    • Reconstruction at Tier-0 (CERN) at 25 Hz (40 MB/s)
    • Distribution of DST to Tier-1 centers (5 sites)
    • Re-calibration at selected Tier-1 centers
    • Physics-group analysis at the Tier-1 centers
    • User analysis from the Tier-2 centers

36
DC04 overview: T0 challenge, Calibration challenge and Analysis challenge
[Diagram: the HLT filter delivers 25 Hz of 1 MB/evt raw data into a 40 TByte CERN disk pool (20 days of data); the T0 challenge reconstructs them into 25 Hz of 0.5 MB/evt DST; data pass through a disk cache to archive storage on the CERN tape archive, feeding the Calibration and Analysis challenges]
37
Tier-0 challenge
  • Data serving pool to serve digitized events at 25 Hz to the computing farm, with 20/24-hour operation
    • 40 MB/s
    • adequate buffer space (at least 1/4 of the digi sample in the disk buffer)
    • pre-staging software; file locking while in use, buffer cleaning and restocking as files are processed
  • Computing farm of approximately 400 CPUs
    • jobs running 20/24 hours; 500 events/job, 3 hours/job
    • files in the buffer are locked until successful job completion
    • no dead-time can be introduced to the DAQ; latencies must be no more than of order 6-8 hours
  • CERN MSS: 50 MB/s archiving rate
    • archive 1.5 MB x 25 Hz raw data (digis)
    • archive 0.5 MB x 25 Hz reconstructed events (DST)
  • File catalog: POOL/RLS
    • secure and complete catalog of all data input/products
    • accessible and/or replicable to the other computing centers
(The serving and archiving rates follow directly from the event sizes; see the arithmetic sketch below.)
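The 40 MB/s serving rate and the 50 MB/s archiving rate quoted above follow directly from the 25 Hz rate and the event sizes; a quick check:

```python
# Consistency check of the Tier-0 challenge rates quoted above.
rate_hz = 25      # events per second
digi_mb = 1.5     # MB per digitized (raw) event
dst_mb  = 0.5     # MB per reconstructed event (DST)

print("serving  :", rate_hz * digi_mb, "MB/s")             # ~37.5 MB/s, quoted as 40 MB/s
print("archiving:", rate_hz * (digi_mb + dst_mb), "MB/s")  # 50 MB/s, as quoted
```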

38
Data distribution challenge
  • Replication of the DST and part of the raw data at one or more Tier-1 centers
    • possibly using the LCG replication tools
    • some event duplication is foreseen
  • At CERN 3 GB/s traffic without inefficiencies (about 1/5 at Tier-1)
  • Tier-0 catalog accessible by all sites
  • Replication of calibration samples (DST/raw) to selected Tier-1 centers
  • Transparent access of jobs at the Tier-1 sites to the local data, whether in MSS or on disk buffer
  • Replication of any Physics-Group (PG) data produced at the Tier-1 sites to the other Tier-1 sites and interested Tier-2 sites
  • Monitoring of data transfer activities
    • e.g. with MonALISA

39
Calibration challenge
  • Selected sites will run calibration procedures
  • Rapid distribution of the calibration samples
    (within hours at most) to the Tier-1 site and
    automatically scheduled jobs to process the data
    as it arrives.
  • Publication of the results in an appropriate form
    that can be returned to the Tier-0 for
    incorporation in the calibration database
  • Ability to switch the calibration database at the Tier-0 on the fly and to track from the
    metadata which calibration table has been used

40
Tier-1 analysis challenge
  • All data distributed from Tier-0 safely inserted into local storage
  • Management and publication of a local catalog indicating the status of locally resident data
    • define tools and procedures to synchronize a variety of catalogs with the CERN RLS catalog (EDG-RLS, Globus-RLS, SRB-MCat, ...)
    • Tier-1 catalog accessible to at least the associated Tier-2 centers
  • Operation of the physics-group (PG) productions on the imported data
    • production-like activity
  • Local computing facilities made available to Tier-2 users
    • possibly via the LCG job submission system
  • Export of the PG-data to requesting sites (Tier-0, -1 or -2)
  • Registration of the locally produced data in the Tier-0 catalog to make them available to at least selected sites
    • possibly via the LCG replication tools

41
Tier-2 analysis challenge
  • Point of access to computing resources for the physicists
  • Pulling of data from peered Tier-1 sites as defined by the local Tier-2 activities
  • Analysis on the local PG-data produces plots and/or summary tables
  • Analysis on distributed PG-data or DST available at least at the reference Tier-1 and associated Tier-2 centers
  • Results are made available to selected remote users, possibly via the LCG data replication tools
  • Private analysis on distributed PG-data or DST is outside the DC04 scope but will be kept as a low-priority milestone
    • use of a Resource Broker and Replica Location Service to gain access to appropriate resources without knowing where the input data are
    • distribution of user code to the executing machines
    • user-friendly interface to prepare, submit and monitor jobs and to retrieve results

42
Summary of DC04 scale
  • Tier-0
    • Reconstruction and DST production at CERN
    • 75 TB of input data
    • 180 KSI2K months: 400 CPUs at 24-hour operation (at 500 SI2K/CPU)
    • 25 TB of output data
    • 1-2 TB/day data distribution from CERN to the sum of the T1 centers
  • Tier-1
    • Assume all CMS Tier-1s (except CERN) participate
      • CNAF, FNAL, Lyon, Karlsruhe, RAL
    • Share the T0 output DST between them (5-10 TB each)
      • 200 GB/day transfer from CERN (per T1)
    • Perform scheduled analysis-group production
      • 100 KSI2K months total: 50 CPUs per T1 (24 hrs/day, 30 days)
  • Tier-2
    • Assume about 5-8 T2 centers
      • may be more
    • Store some of the PG-data at each T2 (500 GB-1 TB)
    • Estimate 20 CPUs at each center for 1 month
(These volumes are consistent with the per-event sizes quoted earlier; see the arithmetic sketch below.)
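The DC04 volumes above are consistent with 50 million input events and the per-event sizes quoted earlier. A quick check (the small differences are covered by the share of raw data that is also shipped to the Tier-1 centers):

```python
# Consistency check of the DC04 scale numbers quoted above.
n_events = 50e6
digi_mb  = 1.5            # MB per input (digi) event
dst_mb   = 0.5            # MB per output (DST) event
days     = 30             # roughly one month of running
n_tier1  = 5              # CNAF, FNAL, Lyon, Karlsruhe, RAL

input_tb  = n_events * digi_mb / 1e6       # ~75 TB, as quoted
output_tb = n_events * dst_mb / 1e6        # ~25 TB, as quoted
tb_per_day_total  = output_tb / days       # ~0.8 TB/day for DST alone (quoted: 1-2 TB/day)
gb_per_day_per_t1 = 1e3 * tb_per_day_total / n_tier1   # ~170 GB/day (quoted: 200 GB/day)

print(input_tb, output_tb, round(tb_per_day_total, 2), round(gb_per_day_per_t1))
```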

43
Summary
  • Computing is a CMS-wide activity
    • 18 regional centers, 50 sites
    • committed to supporting other CMS activities
      • support analysis for DAQ, Trigger and Physics studies
  • Increasing in size and complexity
    • 1 TB in 1 month at 1 site in 1999
    • 170 TB in 6 months at 50 sites today
    • ready for the full LHC size in 2007
  • Exploiting new technologies
    • Grid paradigm adopted by CMS
    • close collaboration with LCG and with the EU and US grid projects
    • Grid tools are assuming more and more importance in CMS