Title: HEP Use Cases for Grid Computing J. A. Templon Undecided (NIKHEF)
1. HEP Use Cases for Grid Computing
J. A. Templon, Undecided (NIKHEF)
Grid Tutorial, NIKHEF Amsterdam, 3-4 June 2004
www.eu-egee.org
EGEE is a project funded by the European Union
under contract IST-2003-508833
2. Contents
- The HEP Computing Problem
- How it matches the Grid Computing Idea
- Some HEP Use Cases & Approaches
3. Our Problem
- Place event info on 3D map
- Trace trajectories through hits
- Assign type to each track
- Find particles you want
- Needle in a haystack!
- This is a relatively easy case
4. More complex example
5. Data Handling and Computation for Physics Analysis
[Diagram labels: detector; event filter (selection & reconstruction); raw data; event reprocessing; event summary data; processed data; event simulation; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis]
6. Scales
- To reconstruct and analyze 1 event takes about 90 seconds
- Maybe only a few out of a million are interesting. But we have to check them all!
- Analysis program needs lots of calibration, determined from inspecting results of first pass.
- => Each event will be analyzed several times!
7. One of the four LHC detectors
Online system: multi-level trigger, filter out background, reduce data volume
8. Scales (2)
- 90 seconds per event to reconstruct and analyze
- 100 incoming events per second
- To keep up, need either
  - A computer that is nine thousand times faster, or
  - nine thousand computers working together
- Moore's Law: wait 20 years and computers will be 9000 times faster (we need them in 2007!)
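The arithmetic behind these numbers can be checked with a short sketch. The 90 seconds and 100 events per second come from the slide; the 18-month doubling period for Moore's Law is an assumption, since the slide does not state one, and the variable names are illustrative.

```python
import math

SECONDS_PER_EVENT = 90        # from the slide: reconstruct + analyze one event
EVENTS_PER_SECOND = 100       # from the slide: incoming event rate
DOUBLING_PERIOD_YEARS = 1.5   # ASSUMPTION: performance doubles every ~18 months

# CPU-seconds of work arriving per wall-clock second
# = number of CPUs needed just to keep up with the incoming data.
cpus_needed = SECONDS_PER_EVENT * EVENTS_PER_SECOND

# Years until a single computer is `cpus_needed` times faster,
# i.e. the number of doublings required times the doubling period.
years_to_wait = math.log2(cpus_needed) * DOUBLING_PERIOD_YEARS

print(f"CPUs needed to keep up: {cpus_needed}")
print(f"Years to wait for one fast-enough computer: {years_to_wait:.1f}")
```

Under the assumed doubling period, the wait comes out at roughly two decades, consistent with the slide's "wait 20 years".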
9. Computational Implications and Complications
- Four LHC experiments: roughly 36k CPUs needed
- BUT: accelerator not always on, so need fewer
- BUT: multiple passes per event, so need more!
- BUT: haven't accounted for Monte Carlo production, so more!!
- AND: haven't addressed the needs of physics users at all!
10. LHC User Distribution
11. Classic Motivation for Grids
- Trivially parallel problem
- Large Scales: 100k CPUs, petabytes of data
  - (if we're only talking ten machines, who cares?)
- Large Dynamic Range: bursty usage patterns
  - Why buy 25k CPUs if 60% of the time you only need 900 CPUs?
- Multiple user groups (& purposes) on single system
  - Can't hard-wire the system for your purposes
- Wide-area access requirements
  - Users not in same lab or even continent
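The "bursty usage" argument can be made concrete with a rough utilization estimate. The 25k CPUs, 60% and 900 CPUs are the slide's figures; the assumption that the remaining 40% of the time needs the full 25k CPUs is ours and is only there to complete the example.

```python
OWNED_CPUS = 25_000          # from the slide: size of a privately owned farm
QUIET_FRACTION = 0.60        # from the slide: fraction of time demand is low
QUIET_DEMAND = 900           # from the slide: CPUs needed during quiet periods
PEAK_DEMAND = OWNED_CPUS     # ASSUMPTION: the rest of the time the farm is full

# Time-averaged number of busy CPUs
average_demand = (QUIET_FRACTION * QUIET_DEMAND
                  + (1 - QUIET_FRACTION) * PEAK_DEMAND)

# Fraction of the purchased capacity that is actually used on average
utilization = average_demand / OWNED_CPUS

# CPUs sitting idle during quiet periods (capacity others could use on a grid)
idle_cpus_when_quiet = OWNED_CPUS - QUIET_DEMAND

print(f"Average demand: {average_demand:.0f} CPUs")
print(f"Average utilization: {utilization:.0%}")
print(f"Idle CPUs during quiet periods: {idle_cpus_when_quiet}")
```

Even under this generous peak assumption, well over half of the purchased capacity would sit idle on average, which is the capacity a grid lets other groups use.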
12. Solution using Grids
- Trivially parallel: break up problem into appropriate-sized pieces
- Large Scales: 100k CPUs, petabytes of data
  - Assemble 100k CPUs and petabytes of mass storage
  - Don't need to be in the same place!
- Large Dynamic Range: bursty usage patterns
  - When you need less than you have, others use excess capacity
  - When you need more, use others' excess capacity
- Multiple user groups on single system
  - Generic grid software services (think web server here)
- Wide-area access requirements
  - Public Key Infrastructure for authentication & authorization
13. HEP Use Cases
- Simulation
- Data (Re)Processing
- Physics Analysis
General ideas presented here; contact us for detailed info
14. Simulation
- The easiest use case
- No input data
- Output can be to a central location
- Bookkeeping not really a problem (lost jobs OK)
- Define program version and parameters
- Tune # of events produced per run to reasonable value
- Submit (needed ev)/(ev per job) jobs
- Wait
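The simulation recipe above can be sketched as a small job planner. This is a minimal illustration, not any experiment's actual production system: the function name, the job dictionary fields, the per-job random seed and the example parameter values are all hypothetical.

```python
import math

def plan_simulation_jobs(events_needed, events_per_job, program_version, parameters):
    """Return one job description per simulation job (hypothetical format)."""
    # (needed ev) / (ev per job), rounded up so we never produce too few events
    n_jobs = math.ceil(events_needed / events_per_job)
    jobs = []
    for i in range(n_jobs):
        jobs.append({
            "job_id": i,
            "program_version": program_version,  # same version for every job
            "parameters": parameters,            # same physics parameters
            "n_events": events_per_job,
            # ASSUMPTION: each job gets its own seed so outputs are independent
            "random_seed": 1000 + i,
        })
    return jobs

# Example values are made up for illustration.
jobs = plan_simulation_jobs(
    events_needed=1_000_000,
    events_per_job=500,
    program_version="v1.0",
    parameters={"beam_energy_gev": 7000},
)
print(len(jobs), "jobs to submit")
```

Because there is no input data and lost jobs are acceptable, the planner only rounds the job count up; a shortfall from lost jobs can be covered by submitting a few extra jobs.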
15. Data (Re)Processing
- Quite a bit more challenging: there are input files, and you can't lose jobs
- One job per input file (so far)
- Data distribution strategy
- Monitoring and bookkeeping
- Software distribution
- Traceability of output (provenance)
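The bookkeeping and provenance points above can be illustrated with a minimal sketch, assuming one job per input file. The class, function and file names are hypothetical and do not correspond to any real grid middleware API.

```python
from dataclasses import dataclass

@dataclass
class ReprocessingJob:
    """Bookkeeping record for one input file (hypothetical structure)."""
    input_file: str
    software_version: str
    status: str = "pending"   # pending -> done | failed
    output_file: str = ""
    attempts: int = 0

def make_jobs(input_files, software_version):
    # One job per input file, keyed by the input file name
    return {f: ReprocessingJob(f, software_version) for f in input_files}

def record_result(jobs, input_file, ok, output_file=""):
    # Update the bookkeeping after a job finishes or fails
    job = jobs[input_file]
    job.attempts += 1
    job.status = "done" if ok else "failed"
    if ok:
        job.output_file = output_file

def files_to_resubmit(jobs):
    # Jobs cannot be lost: anything not done must be run again
    return [f for f, j in jobs.items() if j.status != "done"]

def provenance(jobs):
    # Map each output back to the input file and software version that made it
    return {j.output_file: (j.input_file, j.software_version)
            for j in jobs.values() if j.status == "done"}

# Made-up file names and version string, for illustration only.
jobs = make_jobs(["run1.raw", "run2.raw"], "reco-v2")
record_result(jobs, "run1.raw", ok=True, output_file="run1.esd")
record_result(jobs, "run2.raw", ok=False)
print(files_to_resubmit(jobs))
print(provenance(jobs))
```

Unlike the simulation case, every input file must end in the "done" state, so the resubmission list has to be empty before the pass is complete.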
16. km3net Reconstruction Model
- Distributed Event Database?
- Auto Distributed Files?
- Single Mass Store Thermal Grid?
Grid useful here: get a lot, but only when you need it!
Grid data model applicable, but maybe not computational model.
Distribute from shore station? Or dedicated line to better-connected location, distribute from there?
This needs work!! 2 Gbit/s is not a problem, but you want many x 80 Gbit/s!
[Diagram labels: > 1000 CPUs; 1 Mb/s; L1 Trigger; StreamService; 10 Gb/s; Mediterranean; Raw Data Cache; Dual 1 TB Circular Buffers?; > 1 TB]
17. Directed Acyclic Graphs
HEP Analysis Model Idea
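As a rough illustration of the idea, an analysis chain can be written as a directed acyclic graph of steps and run in dependency order. The step names below are hypothetical, loosely borrowed from the data-handling slide, and the sketch uses Python's standard `graphlib` module (Python 3.9 or later); it is not the model shown in the talk.

```python
from graphlib import TopologicalSorter

# Each key is a processing step; its value is the set of steps
# that must finish before it can start. Step names are hypothetical.
workflow = {
    "event_filter": set(),
    "event_simulation": set(),
    "reconstruction": {"event_filter", "event_simulation"},
    "batch_analysis": {"reconstruction"},
    "interactive_analysis": {"batch_analysis"},
}

# static_order() yields the steps so that every step comes after all of
# its dependencies; it raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(workflow).static_order())

for step in order:
    print("run", step)
```

Steps with no dependency on each other, such as the two entry points here, could be submitted to the grid in parallel.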
18. Conclusions
- HEP Computing well-suited to Grids
- HEP is using Grids now
- There is a lot of (fun) work to do!