Transcript and Presenter's Notes

Title: CMS Plans for Use of LCG-1

1
CMS Plans for Use of LCG-1
  • David Stickland, CMS Core Software and Computing
  • (Heavily based on recent talks by Ian Fisk,
    Claudio Grandi and Tony Wildish)

2
Computing TDR Strategy
Technologies: evaluation and evolution
Estimated Available Resources (no cost book for
computing)
  • Physics Model
  • Data model
  • Calibration
  • Reconstruction
  • Selection streams
  • Simulation
  • Analysis
  • Policy/priorities
  • Computing Model
  • Architecture (grid, OO,)
  • Tier 0, 1, 2 centres
  • Networks, data handling
  • System/grid software
  • Applications, tools
  • Policy/priorities

Iterations / scenarios
Required resources
Validation of Model
DC04 Data challenge: copes with 25 Hz at 2x10^33
for 1 month
Simulations: model systems usage patterns
  • C-TDR
  • Computing model ( scenarios)
  • Specific plan for initial systems
  • (Non-contractual) resource planning

3
Schedule: PCP, DC04, C-TDR
  • 2003 Milestones
  • June Switch to OSCAR (critical path)
  • July Start GEANT4 production
  • Sept Software baseline for DC04
  • 2004 Milestones
  • April DC04 done (incl. post-mortem)
  • April First Draft C-TDR
  • Oct C-TDR Submission

4
PCP and DC04
  • Two quite different stages
  • Pre Challenge Production (PCP)
  • The important thing is to get it done; how it's
    done is not the big issue
  • But anything that can reduce manpower here is
    good, this will be a 6-month process
  • We intend to run a hybrid (non-GRID and GRID)
    operation (with migration from first to second
    case as tools mature)
  • Data Challenge (DC04)
  • Predicated on GRID operation for moving data and
    for running jobs (and/or pseudo jobs) at the
    appropriate locations
  • Exercise some calibration and analysis scenarios

5
DC04 Workflow
  • Process data at 25 Hz at the Tier-0
  • Reconstruction produces DST and AOD
  • AOD replicated to all Tier-1s (assume 4 centers)
  • Sent on to participating Tier-2s (pull)
  • DST replicated to at least one Tier-1
  • Assume Digis are already replicated in at least
    one Tier-1
  • No bandwidth to transfer Digis synchronously
  • Archive Digis to tape library
  • Express lines transferred to selected Tier-1
  • Calibration streams, Higgs analysis stream,
  • Analysis and recalibration
  • Produce new calibration data at selected Tier-1
    and update the Conditions Database
  • Analysis from the Tier-2 on AOD, DST,
    occasionally on Digis
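
As a rough illustration (not from the original slides) of how this distribution
policy could be written down, the stream routing fits in a small table; the
Python names below are hypothetical:

    # Illustrative sketch only: DC04 stream routing as a configuration table.
    # Dictionary keys and destination descriptions are hypothetical placeholders.
    DC04_ROUTING = {
        "AOD":     {"mode": "pull",       "to": "all Tier-1s, then participating Tier-2s"},
        "DST":     {"mode": "pull",       "to": "at least one Tier-1"},
        "Digis":   {"mode": "pre-placed", "to": "at least one Tier-1, archived to tape"},
        "Express": {"mode": "push",       "to": "selected Tier-1s (calibration, Higgs stream)"},
    }

    def destination(stream):
        """Return the planned DC04 destination for a data stream."""
        return DC04_ROUTING[stream]["to"]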

6
Data Flow
In DC04
DC04 Calibration challenge
DC04 Analysis challenge
DC04 T0 challenge
CERN disk pool 40 TByte (20 days data)
25Hz 1MB/evt raw
25Hz 0.5MB reco DST
HLT Filter ?
Disk cache
Archive storage
CERN Tape archive
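
A quick consistency check of the quoted figures (my arithmetic, not taken from
the slides): at 25 Hz and 1 MB per raw event, the Tier-0 accumulates about
2.2 TB of raw data per day, so a 40 TByte pool corresponds to roughly 20 days
of raw data.

    # Rough check of the quoted DC04 Tier-0 figures.
    rate_hz = 25                      # events per second
    raw_mb_per_event = 1.0            # MB per raw event
    daily_tb = rate_hz * raw_mb_per_event * 86400 / 1e6  # ~2.16 TB/day of raw data
    pool_days = 40 / daily_tb         # a 40 TB pool holds ~18-19 days of raw data
    print(round(daily_tb, 2), round(pool_days, 1))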
7
DC04 Strategy (partial...)
  • DC04 is focussed on preparations for the first
    months of data, not the operation in 2010
  • Grid enters mainly in the data distribution and
    analysis
  • Express Lines pushed from Tier-0 to Tier-1s
  • AOD, DST published by Tier-0 and pulled by
    Tier-1s
  • Use Replica Manager services to locate and move
    the data
  • Use a Workload Management System to select
    resources
  • Use a Grid-wide monitoring system
  • Conditions DB segmented into read-only Calibration
    Sets
  • Versioned
  • Metadata stored in the RefDB
  • Temporary solution
  • need specific middleware for read-write data
    management
  • Client-server analysis: Clarens?
  • How does it interface to LCG-1 information and
    data management systems?

8
Boundary conditions for PCP
  • CMS persistency is changing
  • POOL (by LCG) is replacing Objectivity/DB
  • CMS Compiler is changing
  • gcc 3.2.2 is replacing 2.95.2
  • Operating system is changing
  • Red Hat 7.3 is replacing 6.1.1
  • Grid middleware structure is changing
  • EDG on top of VDT
  • ?
  • Flexibility to deal with a dynamic environment
    during the Pre-Challenge Production!

9
Perspective on Spring 02 Production
  • Strong control at each site
  • Complex machinery to install and commission at
    each site
  • Steep learning curve
  • Jobs coupled by Objectivity
  • Dataset-oriented production
  • not well suited to most sites' capabilities

10
Perspective for PCP04
  • Must be able to run on grid and on non-grid
    resources
  • Have less (no!) control at non-dedicated sites
  • Simplify requirements at each site
  • Fewer tools to install
  • Less configuration
  • Fewer site-local services
  • Simpler recovery procedures
  • Opportunistic approach
  • Assignment ≠ dataset
  • Several short assignments make one large dataset
  • Allows splitting assignments across sites
  • Absolute central definition of assignment
  • Use RefDB instead of COBRA metadata to control
    run numbers, etc.
  • Random numbers, events per run from RefDB
  • Expect (intend) to allow greater mobility
  • One assignment ≠ one site
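
To make the "absolute central definition" concrete, an assignment record in
RefDB has to carry at least the items below. This is only an illustrative
sketch; the field names are invented rather than RefDB's actual schema:

    # Illustrative only: the kind of centrally defined assignment implied above.
    # Field names are invented, not the real RefDB schema.
    assignment = {
        "dataset": "example_dataset",    # several assignments can feed one dataset
        "first_run": 1001,               # run numbers fixed centrally, not by the site
        "n_runs": 50,
        "events_per_run": 250,           # taken from RefDB, not chosen locally
        "random_seed_base": 987654321,   # random numbers controlled by RefDB
        "site": None,                    # unset: an assignment is not tied to one site
    }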

11
Catalogs, File Systems etc
  • A reasonably functioning Storage Element needs
  • A data catalog
  • Replication management
  • Transfer Capabilities
  • On the WAN side at least GridFTP
  • On the LAN side, POSIX-compliant I/O (some more
    discussion here, I expect)
  • CMS is installing SRB at CMS centers to solve
    the immediate data-management issues
  • We aim to integrate with an LCG supported SE as
    soon as reasonable functionality exists
  • CMS would like to install dCache on those centers
    doing specialized tasks such as High Luminosity
    Pileup
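
A minimal sketch of the two access paths listed above, assuming a hypothetical
host and file name; globus-url-copy is shown only in its basic
source/destination form:

    # Sketch of the WAN and LAN access paths of a Storage Element
    # (host and file names are hypothetical).
    import subprocess

    # WAN side: replicate a file with GridFTP.
    subprocess.run(
        ["globus-url-copy",
         "gsiftp://se.example-site.org/store/example.root",  # hypothetical source
         "file:///data/local/example.root"],                 # hypothetical destination
        check=True,
    )

    # LAN side: the application reads the local replica through ordinary POSIX I/O.
    with open("/data/local/example.root", "rb") as f:
        header = f.read(1024)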

12
Specific operations
  • CMKIN/CMSIM
  • Can run now
  • Would help us commission sites with McRunjob etc
  • OSCAR (G4)
  • Need to plug the gaps, be sure everything is
    tightly controlled and recorded from assignment
    request to delivered dataset
  • ORCA (Reconstruction)
  • Presumably need the same control-exercise as for
    OSCAR to keep up with the latest versions
  • Digitisation
  • Don't yet know how to do this with pileup on a
    grid
  • Looks like dCache may be able to replace the
    functionality of AMS/RRP (AMS/RRP is a fabric-
    level tool, not middleware or an experiment
    application)

13
Operations (II)
  • Data-movement will be our killer problem
  • Can easily generate > 1 TB per day in the RCs
  • Especially if we don't work to reduce the event
    size
  • Can we expect to import 2 TB/day to the T0?
  • Yes, for a day or two, but for several months?
  • Not without negotiating with CASTOR and the
    network authorities
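
For scale (my arithmetic, not from the slides), 2 TB/day corresponds to a
sustained rate of roughly 23 MB/s, or about 185 Mbit/s, before any protocol
overhead:

    # Rough scale of the quoted 2 TB/day import rate into the T0.
    tb_per_day = 2.0
    mb_per_s = tb_per_day * 1e6 / 86400   # ~23 MB/s sustained
    mbit_per_s = mb_per_s * 8             # ~185 Mbit/s sustained
    print(round(mb_per_s, 1), round(mbit_per_s))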

14
Timelines
  • Start ramping up standalone production in the
    next 2 weeks
  • Exercise the tools and users
  • Bring sites up to speed a few at a time
  • Start deploying SRB and testing it on a larger
    scale
  • Production-wide by end of June
  • Start testing LCG-<n> production in June
  • Use CMS/LCG-0 as available
  • Expect to be ready for PCP in July
  • Partitioning of HW resources will depend on state
    of LCG and the eventual workload

15
PCP strategy
  • PCP cannot be allowed to fail (no DC04)
  • Hybrid, GRID and non-GRID operation
  • Minimum baseline strategy is to be able to run on
    dedicated, fully controllable resources without
    the need of grid tools (local productions)
  • We plan to use LCG and other Grids wherever they
    can operate with reasonable efficiency
  • Jobs will run in a limited-sandbox
  • input data local to the job
  • local XML POOL catalogue (prepared by the prod.
    tools)
  • output data/metadata and job monitoring data
    produced locally and moved to the site manager
    asynchronously
  • synchronous components optionally update central
    catalogs; if they fail, the job continues and
    the catalogs are updated asynchronously
  • reduce dependencies on external environment and
    improve robustness
  • (Conversely, DC04 can be allowed to fail. It is
    the Milestone)

16
Limited-sandbox environment
[Diagram: limited-sandbox environment. On the Worker Node, a Job Wrapper (job
instrumentation) stages the job input, runs the User Job, collects the job
output and writes a Journal; a Remote updater ships the journal to the User's
Site, where an Asynchronous updater applies it to the Metadata DB]
  • File transfers, if needed, are managed by
    external tools (EDG-JSS, additional DAG nodes,
    etc...)
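
A minimal sketch of the wrapper logic implied by this picture, assuming a
hypothetical publish() helper for the optional synchronous catalog update and
a plain journal file for the asynchronous path:

    # Sketch of the limited-sandbox job wrapper (helper names are hypothetical).
    import json, subprocess

    def publish(record):
        """Hypothetical synchronous update of a central catalog; may raise on failure."""
        raise NotImplementedError  # stand-in for a real catalog client

    def run_job(cmd, journal_path="journal.jsonl"):
        # Run the user job against purely local input and output.
        result = subprocess.run(cmd, capture_output=True, text=True)
        record = {"cmd": cmd, "rc": result.returncode}
        try:
            publish(record)        # optional synchronous catalog update
        except Exception:
            pass                   # a catalog outage must not fail the job
        # Always journal locally; the asynchronous updater replays this later.
        with open(journal_path, "a") as journal:
            journal.write(json.dumps(record) + "\n")
        return result.returncode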

17
Grid vs. Local Productions
  • OCTOPUS runs only on the User Interface
  • No CMS know-how is needed at CE/SE
    sites
  • UI installation doesn't require full middleware
  • Little Grid know-how is needed on UI
  • CMS programs (ORCA, OSCAR) pre-installed on CEs
    (RPM, DAR, PACMAN)
  • A site that does local productions is the
    combination of a CE/SE and a UI that submits only
    to that CE/SE
  • Switch between grid and non-grid production by
    only reconfiguring the production software on the
    UI
  • OCTOPUS: 'Overtly Contrived Toolkit Of Previously
    Unrelated Stuff'
  • Soon to have a CVS software repository too
  • Release McRunjob, DAR, BOSS, RefDB, BODE, RMT,
    CMSprod as a coherent whole
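
A toy illustration of the "switch only on the UI" idea; the configuration keys
below are invented for illustration and are not real OCTOPUS or McRunjob
options:

    # Toy illustration: grid vs. local production selected purely on the UI.
    # Keys and values are invented, not real OCTOPUS/McRunjob configuration.
    SUBMISSION_MODES = {
        "local": {"submitter": "local_batch",   "target": "site batch manager"},
        "grid":  {"submitter": "edg_scheduler", "target": "CMS/LCG-0"},
    }

    def configure_ui(mode):
        """Pick the submission back end; the CE/SE side needs no CMS-specific change."""
        return SUBMISSION_MODES[mode]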

18
Hybrid production model
[Diagram: hybrid production model. A Phys. Group asks for an official dataset
and the Production Manager defines assignments in RefDB; a Site Manager starts
an assignment (or a user starts a private production) from the User's Site (or
grid UI). McRunjob then produces shell scripts for a Local Batch Manager, JDL
for the EDG Scheduler on CMS/LCG-0, jobs for DAGMan (MOP), or Chimera VDL
feeding a Virtual Data Catalogue and Planner, depending on the target
resources]
19
A Port in every GRID
  • MCRunJob is compatible with opportunistic use of
    almost any GRID environment
  • We can run purely on top of VDT (USCMS IGT tests,
    Fall 03)
  • We can run in an EDG environment (EDG Stress
    Tests)
  • We can run in CONDOR pools
  • GriPhyN/iVDGL have used CMS production as a test
    case
  • (We think this is a common approach of all
    experiments)
  • Within reason, if a country/center wants to
    experiment with different components above the
    base GRID, we can accommodate that
  • We think it even makes sense for centers to
    explore the tools available!

20
Other Grids
  • The US already deploys a VDT-based GRID
    environment that has been very useful for our
    productions (probably not just a US concern)
  • They will want to explore new VDT versions on
    timescales that may be different from LCG's
  • For good reason, they may have different higher-
    level middleware
  • This makes sense to allow/encourage
  • For the PCP the goal is to get the work done any
    way we can
  • If a region wants to work with LCG1 subsets or
    extensions so be it
  • (Certainly in the US case there will also be pure
    LCG functionality)
  • There is not one and one only GRID
  • collaborative exploration makes sense
  • But for DC04, we expect to validate pure LCG-<n>

21
CMS Development Environment
  • CMS/LCG-0 is a CMS-wide testbed based on the LCG
    pilot distribution, owned by CMS
  • Before LCG-1 (ready in July)
  • gain experience with existing tools before start
    of PCP
  • Feedback to GDA
  • develop production/analysis tools to be deployed
    on LCG-1
  • test new tools of potential interest to CMS
  • Common environment for all CMS productions
  • Use also as a base configuration for non-grid
    productions

22
dCache
  • What is dCache?
  • dCache is a disk caching system developed at DESY
    as a front end for Mass Storage Systems
  • It now has significant developer support from
    FNAL and is used in several running experiments
  • We are using it as a way to utilize disk space
    on the worker nodes and efficiently supply data
    in intense applications like simulation with
    pile-up.
  • Applications access the data in dCache space
    through a POSIX-compliant interface. From the
    user's perspective, the dCache directory (/pnfs)
    looks like any other cross-mounted file system
  • Since this was designed as a front-end to MSS,
    once closed, files cannot be appended
  • Very promising set of features for load balancing
    and error recovery
  • dCache can replicate data between servers if the
    load is too high
  • if a server fails, dCache can create a new pool
    and the application can wait until data is
    available.
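
Because the /pnfs namespace is presented like an ordinary file system, the
access pattern is plain file I/O. A minimal sketch, assuming a hypothetical
path and that /pnfs is visible to the process (for example through an
NFS-mounted PNFS namespace or the dcap preload library, libpdcap.so):

    # Minimal sketch of POSIX-style access to dCache space (hypothetical path).
    PNFS_FILE = "/pnfs/example.org/data/cms/minbias_hits_0001.root"  # hypothetical

    with open(PNFS_FILE, "rb") as f:   # ordinary open/read, no dCache-specific API
        chunk = f.read(1024 * 1024)    # stream the data like any local file

    # Once closed, a file cannot be appended (dCache is a front end to mass
    # storage), so new output always goes to new files.
    print(len(chunk))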

23
High Luminosity Pile-Up
  • This has been a bugbear of all our productions.
    It needs specially configured data serving (Not
    for every center)
  • Tried to make a realistic test. In the interest
    of time we generated a fairly small minimum bias
    dataset.
  • Created 10k cmsim minimum bias events writing the
    fz files directly into d-cache.
  • Hit Formatted all events (4 times).
  • Experimented with writing events to d-cache and
    local disk
  • Placed all ROOT-IO files for Hits, THits, MCInfo,
    etc. into d-cache and the metadata on local
    disk
  • Used soft links from the local disk to /pnfs
    d-cache space
  • Digi Files are written into local disk space
  • Minimum bias Hit files are distributed across the
    cluster
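
A sketch of the soft-link arrangement described above (directory and file
names are hypothetical): the bulky ROOT-IO files sit in /pnfs d-cache space,
the metadata stays on local disk, and links make the two look like one
directory to the application.

    # Sketch of the soft-link layout mixing d-cache and local storage
    # (directory and file names are hypothetical).
    import os

    PNFS_DIR = "/pnfs/example.org/data/cms/minbias"   # large ROOT-IO files in d-cache
    LOCAL_DIR = "/data/local/minbias"                  # metadata and links on local disk

    os.makedirs(LOCAL_DIR, exist_ok=True)
    for name in os.listdir(PNFS_DIR):
        if name.endswith(".root"):                     # link only the bulky event files
            target = os.path.join(PNFS_DIR, name)
            link = os.path.join(LOCAL_DIR, name)
            if not os.path.islink(link):
                os.symlink(target, link)               # the application reads via the link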

24
writeAllDigis Case
  • Pile-up is the most complicated case
  • Pile-up events are stored across the pools
  • Many applications can run in parallel, each
    writing its own metadata but reading the same
    minimum-bias ROOT-IO files

[Diagram: writeAllDigis with dCache. The writeAllDigis application on a Worker
Node reads minimum-bias events through libpdcap.so from many Pool Nodes behind
the dCache server and its PNFS namespace, while writing the Digis to Local
Disk]
25
dCache Request
  • dCache successfully made use of disk space from
    the computing elements, efficiently reading and
    writing events for CMSIM
  • No drawbacks were observed
  • Writing into d-Cache with writeHits jobs worked
    properly
  • Performance is reduced for writing metadata;
    reading fz files worked well
  • The lack of ability to append files makes storing
    metadata inconvenient
  • Moving closed files into d-cache and soft linking
    works fine
  • High Luminosity pile-up worked very well reading
    events directly from d-cache
  • Data Rate is excellent
  • System stability is good
  • Tests were quite successful and generally quite
    promising
  • We think this could be useful at many centers
    (But we are not requiring this)
  • but it looks like it will be required at least at
    the centers where we do digitization (typically
    at some Tier-1s)
  • But can probably be set up in a corner if it is
    not part of the center model

26
Criteria for LCG1 Evaluation
  • We can achieve a reasonable fraction of the PCP
    on LCG resources
  • We continue to see a tendency toward reducing the
    (CMS) expertise required at each site
  • We see the ability to handle the DC04 scales
  • 50-100k files
  • 200TB
  • 1 MSI2k
  • 10-20 users in PCP, 50-100 in DC04 (From any
    country, using any resource)
  • Final criteria for DC04/LCG success will be set
    in September/October

27
Summary
  • Deploying LCG-1 is clearly a major task
  • expansion must be clearly controlled: expand at
    constant efficiency
  • CMS needs LCG-1 for DC04 itself
  • and to prepare for DC04 during Q3/Q4 of this year
  • The pre-challenge production will use LCG-1 as
    much as feasible
  • help burn-in LCG-1
  • PCP can run 'anywhere', not critically dependent
    on LCG-1
  • CMS-specific extensions to LCG-1 environment will
    be minimal and non-intrusive to LCG-1 itself
  • SRB CMS-wide for the PCP
  • dCache where we want to digitize with pileup