Title: Piergiorgio Cerello
1. ALICE Computing vs. Grids
- Piergiorgio Cerello
- on behalf of the ALICE Collaboration
- Workshop Commissione Calcolo
- Paestum, June 12, 2003
2. ALICE collaboration
Online system: multi-level trigger to filter out background and reduce data volume
Detector: total weight 10,000 t, overall diameter 16.00 m, overall length 25 m, magnetic field 0.4 Tesla
Data flow:
8 kHz (160 GB/sec) -> level 0 (special hardware) -> 200 Hz (4 GB/sec) -> level 1 (embedded processors) -> 30 Hz (2.5 GB/sec) -> level 2 (PCs) -> 30 Hz (1.25 GB/sec) -> data recording and offline analysis
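As a rough cross-check of the trigger figures quoted on this slide, dividing each bandwidth by its event rate gives the implied data volume per event. The sketch below assumes 1 GB = 1e9 bytes; the function and variable names are illustrative only and do not come from the ALICE software.

```python
# (label, rate in Hz, bandwidth in bytes/s) as quoted on the slide.
# Assumption: 1 GB = 1e9 bytes.
STAGES = [
    ("8 kHz, 160 GB/sec", 8_000.0, 160e9),
    ("200 Hz, 4 GB/sec", 200.0, 4e9),
    ("30 Hz, 2.5 GB/sec", 30.0, 2.5e9),
    ("30 Hz, 1.25 GB/sec", 30.0, 1.25e9),
]


def megabytes_per_event(rate_hz, bandwidth_bytes_per_s):
    """Implied data volume per event, in MB (1 MB = 1e6 bytes)."""
    return bandwidth_bytes_per_s / rate_hz / 1e6


def bandwidth_reduction(stages):
    """Ratio between the first and the last quoted bandwidth."""
    return stages[0][2] / stages[-1][2]


if __name__ == "__main__":
    for label, rate_hz, bandwidth in STAGES:
        print(f"{label}: {megabytes_per_event(rate_hz, bandwidth):.1f} MB/event")
    print(f"overall bandwidth reduction: x{bandwidth_reduction(STAGES):.0f}")
```

With these inputs the first two stages both imply 20 MB per event, and the quoted bandwidth drops by a factor of 128 between the first and the last stage.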
3. AliRoot layout
[Diagram. Labels: G3, G4, FLUKA, ISAJET, HIJING, AliRoot, AliEn, Virtual MC, EVGEN, MEVSIM, HBTAN, STEER, PYTHIA6, PDF, EMCAL, ZDC, ITS, PHOS, PMD, TRD, TOF, RICH, HBTP, CRT, FMD, MUON, TPC, STRUCT, START, RALICE, ROOT]
4. The ALICE Framework
- AliRoot
  - C++: 400 kLOC + 225 kLOC (generated) + macros: 77 kLOC
  - FORTRAN: 13 kLOC (ALICE) + 914 kLOC (external packages)
  - Maintained on Linux (any version!), HP-UX, DEC Unix, Solaris
  - Works also with Intel icc compiler
  - Two packages to install (ROOT + AliRoot) + MCs
  - Less than 1 second to link (thanks to 37 shared libs)
  - 1-click-away install: download and make (non-recursive makefile)
- AliEn
  - 25 kLOC of PERL5 (ALICE)
  - 2 MLOC, mostly PERL5 (open source components)
  - Installed on almost 50 sites by physicists
- >50 users from detector groups develop AliRoot
- 70% of code developed outside, 30% by the core Offline team
5. AliRoot evolution schema
ALICE Offline Framework
Persistency: ROOT output file
Strategic decision in 1998
6. The Virtual MC
- Indicated as a technology of choice by the LHC Simulation Project
8. AliRoot on the Grid
- Minimum dependencies: easy to distribute and install
- Portable: no OS/compiler problems
- Dynamic installation (ROOT + AliRoot) as a Grid job successfully tried
- C API easily linked to the code (Data Access)
- No change in the framework to run on the Grid
- Changes might be required in the future, related to data access
- Optimise analysis use cases
- Manage the access to distributed input (API)
- Run interactively with PROOF
9. AliEn progress
- 32 sites configured
- 5 sites providing mass storage
- 12 production rounds
- 22773 jobs validated, 2428 failed (10%)
- Up to 450 concurrent jobs
- 0.5 operators
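The failure figure can be checked from the two job counts on this slide. The short sketch below assumes the percentage is taken relative to all jobs (validated plus failed); the slide does not state the denominator explicitly.

```python
def failure_fraction(validated, failed):
    """Fraction of failed jobs among all jobs (validated + failed)."""
    return failed / (validated + failed)


if __name__ == "__main__":
    # Job counts quoted on the slide.
    fraction = failure_fraction(validated=22773, failed=2428)
    print(f"failed: {fraction:.1%}")
```

Under that assumption the fraction comes out just under 10%, in line with the rounded value on the slide.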
10. AliEn as a meta-GRID
AliEn User Interface
11. AliEn EDG Interface
- An interface site is an EDG User Interface machine which runs the AliEn Client suite (ClusterMonitor, CE and SE)
- The interface client pulls a job from the server, generates an appropriate jdl file and submits it to a RB
- When the job arrives on a WN, it starts AliEn, reporting directly to the AliEn server.
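The "generates an appropriate jdl file" step can be pictured with a minimal sketch like the one below. It is not the AliEn implementation: the job dictionary, its keys and the wrapper script name are hypothetical, and the attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are the common ones from the EDG job description language as far as I recall them. Pulling the job from the server and submitting the file to the RB are left out.

```python
def jdl_value(value):
    """Render a Python value as a JDL string or a JDL list of strings."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(f'"{item}"' for item in value) + "}"
    return f'"{value}"'


def make_jdl(job):
    """Build the text of a JDL file for one pulled job.

    `job` is a hypothetical dict with the keys "wrapper" (script that
    starts AliEn on the worker node) and "job_id".
    """
    attributes = {
        "Executable": job["wrapper"],
        "Arguments": str(job["job_id"]),
        "StdOutput": "std.out",
        "StdError": "std.err",
        "InputSandbox": [job["wrapper"]],
        "OutputSandbox": ["std.out", "std.err"],
    }
    return "\n".join(f"{key} = {jdl_value(val)};" for key, val in attributes.items())


if __name__ == "__main__":
    # Hypothetical job, for illustration only.
    print(make_jdl({"wrapper": "start_alien_job.sh", "job_id": 42}))
```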
12. AliEn EDG Interface
March 11th: first AliRoot job, driven by AliEn, run on EDG
Status report
13. AliEnFS Distributed Analysis
14. Parallel Analysis of Event Data
[Figure: a local PC running root with the macro ana.C, connected to a remote PROOF cluster of four nodes (node1 to node4) holding .root files]
proof.conf:
slave node1
slave node2
slave node3
slave node4
ROOT session on the local PC:
root [0] tree.Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] dset->Process("ana.C")
15. GARR network stress-tests
ALICE: Bologna, Cagliari, Catania, CNAF, Padova, Torino, Trieste
- Simulate a random analysis by several users
- Check whether present/foreseen bandwidths can cope with short/medium term ALICE needs
- Is automatic replication a network-killer?
- Find bottlenecks in disk-to-disk transfers
16. Tier-1 Typical Workflow
17. Real-life multi-tier Topology
(ALICE HBT production, 5000 events, 9 TB)
1 MB in 50 MB out
18. Preparation and Tools
- SSH keys exchanged among all machines to secure file transfers without typing passwords
- Automatic procedure installed on all machines
  - random time delay (uniform between 0 and a customizable maximum; 1 min and 5 mins tried so far)
  - random selection of the destination server (weight proportional to the maximum bandwidth to the destination site)
  - random selection of the file size (1.6 GB, 0.8 GB, and 0.3 GB)
  - send and retrieve the file using bbFTP with a customizable number of parallel streams (8, 16, ...)
- Report on a log file
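The automatic procedure listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the script used in the tests: the function names and the site-to-bandwidth mapping are hypothetical, and the bbFTP send-and-retrieve step is represented by a caller-supplied `transfer` callable rather than by an actual bbFTP command line.

```python
import random
import time

# File sizes quoted on the slide, in GB.
FILE_SIZES_GB = (1.6, 0.8, 0.3)


def plan_transfer(max_bandwidth_by_site, max_delay_s, rng=random):
    """Draw one random transfer: start delay, destination and file size."""
    sites = list(max_bandwidth_by_site)
    # Selection weight is proportional to the maximum bandwidth to each site.
    weights = [max_bandwidth_by_site[site] for site in sites]
    return {
        "delay_s": rng.uniform(0.0, max_delay_s),
        "destination": rng.choices(sites, weights=weights, k=1)[0],
        "size_gb": rng.choice(FILE_SIZES_GB),
    }


def run_once(max_bandwidth_by_site, max_delay_s, streams, transfer,
             rng=random, sleep=time.sleep):
    """Wait a random delay, run one transfer and return a log line.

    `transfer(destination, size_gb, streams)` is a placeholder for the
    send-and-retrieve step, which the slide says is done with bbFTP.
    """
    plan = plan_transfer(max_bandwidth_by_site, max_delay_s, rng)
    sleep(plan["delay_s"])
    started = time.time()
    transfer(plan["destination"], plan["size_gb"], streams)
    elapsed = time.time() - started
    return (f'{plan["destination"]} {plan["size_gb"]} GB '
            f'{streams} streams {elapsed:.1f} s')
```

A site with a larger maximum bandwidth is picked proportionally more often, and a site with weight zero is never picked. Passing a seeded `random.Random` instance as `rng` makes a test run reproducible.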
19. Results
20. Bandwidth Measurements
Iperf-1.6.3
Netperf-2.1
21. What have we learnt?
- Random, uncoordinated analysis is not compatible with network conditions
- Data re-clustering: input for algorithm developments?
- Parallel remote processing (PROOF)
- Job splitting (AliEn, LCG) and output plug-in merging module (ALICE)
- Actual bandwidth measurements might differ from nominal values
- Useful information on the farm architecture (limits of NFS)
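The job-splitting and output-merging pattern mentioned above can be illustrated with a toy sketch. It shows only the general idea: a range of events is split into sub-jobs, each produces a partial result, and the partial results are merged. The function names and the toy "analysis" are assumptions and do not reflect the AliEn, LCG or ALICE merging code.

```python
from collections import Counter


def split_events(n_events, n_jobs):
    """Split range(n_events) into n_jobs contiguous, near-equal chunks."""
    base, extra = divmod(n_events, n_jobs)
    chunks, start = [], 0
    for index in range(n_jobs):
        size = base + (1 if index < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def analyse(chunk):
    """Toy per-job analysis: histogram of event number modulo 3."""
    return Counter(event % 3 for event in chunk)


def merge(partials):
    """Merge the partial histograms produced by the sub-jobs."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    chunks = split_events(5000, 8)
    print(merge(analyse(chunk) for chunk in chunks))
```

The merged result is the same as running the analysis over the whole event range in one job, which is the property a merging module has to guarantee.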
22. ALICE Physics Data Challenges
- Verify model and computing framework
- Reduce the technological risk
- Understand physics potentialities of the detector
- Prepare code for simulation, reconstruction and analysis
23. Conclusions
- AliRoot is evolving into a solid computing infrastructure
- Tight integration with ROOT: fast prototyping and development cycle
- Virtual MC interface
- AliEn provides a complete GRID solution adapted to HEP needs, and it has allowed large productions with very few people in charge
  - it is interfaced with EDG/LCG
- Strategy: resources driven by AliEn, directly or indirectly (LCG)
- In our opinion, many ALICE-developed solutions have a high potential to be adopted by other experiments and become common solutions
24. ALICE and LCG
- LCG is the LHC Computing Project
- The objective is to build the computing environment for LHC
- ALICE has lobbied strongly to base the LCG project on ROOT and AliEn
- Other experiments chose
  - to establish a client-provider relationship with ROOT
  - to develop alternatives for some existing ROOT elements or hide them behind abstract interfaces
  - to use the result of GRID MiddleWare projects
- ALICE has expressed its worries
  - Little time to develop and deploy a new system
  - Duplication and dispersion of efforts
- ALICE will continue to develop its system
  - to provide basic technology, i.e. VMC and geometrical modeller
  - to maintain/develop the interface(s) to other Grid middleware
  - and it will try to collaborate with LCG wherever possible