Alice DC Status - PowerPoint PPT Presentation

Title: Offline News
Author: Federico Carminati
Last modified by: Piergiorgio Cerello
Created Date: 3/8/2004 11:18:24 AM
Document presentation format: PowerPoint PPT presentation



1
Alice DC Status
  • P. Cerello
  • March 19th, 2004

2
Summary
  • Status of AliRoot
  • Status of AliEn
  • Physics Data Challenge
  • Conclusions

3
AliRoot layout
G3
G4
FLUKA
ISAJET
AliEn
AliRoot
Virtual MC
HIJING
EVGEN
MEVSIM
HBTAN
STEER
PYTHIA6
PDF
EMCAL
ZDC
ITS
PHOS
TRD
TOF
RICH
PMD
HBTP
CRT
FMD
MUON
TPC
START
RALICE
STRUCT
ROOT
4
AliRoot Current status
  • Major changes in the last year
  • New multi-file I/O finally in full production
  • New coordinate system (and we survived!)
  • New reconstruction and simulation drivers
  • First attempt at the ESD and analysis framework
  • Improvements in reconstruction and simulation
  • The system clearly works well, but many changes
    are still to come
  • ESD: the philosophy is still evolving
  • Introduction of FLUKA and new geometrical
    modeller
  • Development of the analysis framework
  • Raw data for all the detectors
  • Introduction of the condition database
    infrastructure

5
Software Development Process
  • ALICE opted for a light core CERN offline team
  • Concentrate on framework, software distribution
    and maintenance
  • plus some people from the collaboration
  • GRID coordination (Torino), World Computing Model
    (Nantes), Detector Construction Database
    (Warsaw), Web and VMC (La Habana)
  • Close integration with physics!
  • The ALICE Physics Coordinator is also a member of
    the offline team
  • A development cycle adapted to ALICE
  • Developers work on the most important feature at
    any moment
  • A stable production version exists
  • Collective ownership of the code
  • Flexible release cycle and simple packaging and
    installation
  • Micro-cycles happen continuously, macro-cycles
    2-3 times per year
  • Discussed and implemented at Offline meetings and
    Code Reviews

6
The ALICE Approach (AliEn)
  • Standards are now emerging for the basic building
    blocks of a GRID
  • There are millions of lines of code in the
    open-source (OS) domain dealing with these issues
  • Why not use these to build the minimal GRID
    that does the job?
  • Fast development of a prototype, no problem in
    exploring new roads, restarting from scratch, etc.
  • Hundreds of users and developers
  • Immediate adoption of emerging standards
  • An example: AliEn by ALICE (5% of code developed,
    95% imported)

7
AliEn Timeline
Functionality Simulation
Interoperability Reconstruction
Performance, Scalability, Standards Analysis
8
AliEn ROOT (A)

Diagram: analysis workflow
  • The USER provides an Analysis Macro and Input Files
    (a Query for Input Data) to a new TAliEnAnalysis Object
  • The object produces a List of Input Data Locations
  • Job Splitting: IO Object 1 for Site A, IO Object 2
    for Site A, IO Object 1 for Site B, IO Object 1 for
    Site C
  • Job Submission: Job Object 1 for Site A, Job Object 2
    for Site A, Job Object 1 for Site B, Job Object 1 for
    Site C
  • Execution
  • Histogram Merging, Tree Chaining
  • Results
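The splitting and merging steps on this slide can be illustrated with a minimal Python sketch. It is not the TAliEnAnalysis implementation: the function names, the file names, and the limit of two files per job are assumptions chosen only so that the output has the same shape as the diagram (two jobs for Site A, one each for Sites B and C).

```python
from collections import defaultdict


def split_jobs_by_site(file_locations, max_files_per_job=2):
    """Group input files by the site that holds them, then cut each
    site's file list into job-sized chunks (hypothetical policy)."""
    files_by_site = defaultdict(list)
    for lfn, site in file_locations:
        files_by_site[site].append(lfn)

    jobs = []
    for site in sorted(files_by_site):
        files = files_by_site[site]
        for start in range(0, len(files), max_files_per_job):
            jobs.append({"site": site,
                         "inputs": files[start:start + max_files_per_job]})
    return jobs


def merge_histograms(partial_histograms):
    """Bin-by-bin sum of equally binned histograms given as lists of counts."""
    return [sum(bins) for bins in zip(*partial_histograms)]


# Hypothetical result of the catalogue query: (logical file name, site).
locations = [
    ("/alice/sim/run1/f1.root", "A"),
    ("/alice/sim/run1/f2.root", "A"),
    ("/alice/sim/run1/f3.root", "A"),
    ("/alice/sim/run1/f4.root", "B"),
    ("/alice/sim/run1/f5.root", "C"),
]
jobs = split_jobs_by_site(locations)
job_sites = [job["site"] for job in jobs]

# Each job would return a partial histogram; the results are summed at the end.
merged = merge_histograms([[1, 2, 0], [0, 3, 1], [2, 0, 0], [1, 1, 1]])
```

Sending each job to the site that already holds its input files is the point of the splitting step: only the small partial results travel back to be merged.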
9
PROOF of AliEn (B)
PROOF uses AliEn Grid File Catalogue and Data
Management to map LFNs to a chain of PFNs and
Workload Management to detect which nodes in a
cluster can be used in a parallel session
Nice! Now I can finally analyze my datasets on
the Grid and produce a histogram. And it is fast
too!
The PROOF system allows parallel analysis of
objects in a set of files and parallel execution of
scripts on clusters of heterogeneous machines
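The mapping of LFNs to a chain of PFNs mentioned on this slide can be pictured with a small Python sketch. The catalogue contents, host names, and the `resolve_chain` helper are all hypothetical; the real AliEn File Catalogue and PROOF interfaces are not shown in this transcript.

```python
# Hypothetical catalogue: logical file name -> list of physical replicas.
CATALOGUE = {
    "/alice/prod/evt001.root": [
        "root://se.site-a.example//data/evt001.root",
    ],
    "/alice/prod/evt002.root": [
        "root://se.site-b.example//data/evt002.root",
        "root://se.site-a.example//data/evt002.root",
    ],
}


def resolve_chain(lfns, catalogue, preferred_host=None):
    """Return one physical file name per logical file name, preferring a
    replica on `preferred_host` when one exists (illustrative policy)."""
    chain = []
    for lfn in lfns:
        replicas = catalogue.get(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        chosen = next(
            (pfn for pfn in replicas
             if preferred_host and preferred_host in pfn),
            replicas[0],
        )
        chain.append(chosen)
    return chain


lfns = ["/alice/prod/evt001.root", "/alice/prod/evt002.root"]
default_chain = resolve_chain(lfns, CATALOGUE)
local_chain = resolve_chain(lfns, CATALOGUE, preferred_host="site-a")
```

The user only ever names logical files; which replica is actually opened is decided at resolution time, which is what lets the same analysis run on whichever nodes are available.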
10
ALICE Physics Data Challenges
Period (milestone) | Fraction of the final capacity (%) | Physics objective
06/01-12/01        | 1                                  | pp studies, reconstruction of TPC and ITS
06/02-12/02        | 5                                  | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04        | 10                                 | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised Monte Carlo; simulated raw data
01/06-06/06        | 20                                 | Test of the final system for reconstruction and analysis
11
PDC 3 schema
Diagram: AliEn job control and data transfer between CERN, Tier1 and Tier2 centres
  • Production of RAW
  • Shipment of RAW to CERN
  • Reconstruction of RAW in all T1s
  • Analysis
12
Merging
Mixed signal
13
AliEn, Genius EDG/LCG seen by ALICE
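Diagram: the user submits jobs to the AliEn server, which dispatches them to AliEn CEs/SEs and, through an AliEn CE that acts as an LCG UI, to the LCG RB and the LCG CEs/SEs. Two catalogs are shown, with the labels LCG LFN, LCG PFN and AliEn LFN.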
14
AliEn EDG Interface
Mar 11th, 2003: first AliRoot job, driven by
AliEn, run on EDG
Status report
15
ALICE PDC-3 LCG
  • All the production will be started via AliEn; the
    analysis will be done via ROOT/PROOF/AliEn
  • LCG-2 will be one CE element of AliEn, which will
    seamlessly integrate LCG and non-LCG resources
  • If LCG-2 works well, it will pull in a large number
    of jobs, and it will be used heavily
  • If LCG-2 does not work well, AliEn will favour
    other resources, and it will be used less
  • In all cases we will use LCG-2 as much as
    possible
  • We will not need to take any decision: the
    performance of the system will decide for us
  • The figure of merit will be
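The self-balancing behaviour described in the bullets above (a resource that works well takes more jobs, without anyone deciding it) can be illustrated with a toy pull model in Python. This is a sketch under assumptions, not AliEn's scheduler: the resource names and the per-job times are invented, and each resource simply takes the next job from a shared queue as soon as it is free.

```python
import heapq


def simulate_pull(n_jobs, seconds_per_job):
    """Toy pull scheduling: whichever resource becomes free first takes
    the next job from the shared queue. Returns jobs done per resource."""
    # Heap entries: (time at which the resource becomes free, resource name).
    free_at = [(0.0, name) for name in sorted(seconds_per_job)]
    heapq.heapify(free_at)
    done = {name: 0 for name in seconds_per_job}
    for _ in range(n_jobs):
        now, name = heapq.heappop(free_at)
        done[name] += 1
        heapq.heappush(free_at, (now + seconds_per_job[name], name))
    return done


# Hypothetical resources: one finishes a job twice as fast as the other.
share = simulate_pull(90, {"fast_resource": 1.0, "slow_resource": 2.0})
```

In this toy run the faster resource ends up with twice as many jobs as the slower one, purely because it asks for work more often.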

16
AliEn LCG Data Challenge
Diagram: a user submits jobs to the AliEn server, which dispatches them either to native AliEn CE/SEs or, through an AliEn CE that acts as an LCG UI, to the LCG RB and on to the LCG CE/SEs. Two catalogs are shown.
17
AliEn LCG Interface
  • Remote AliEn and AliRoot installation OK on all
    LCG-2 sites
  • Job management interface works with no real
    problem
  • No reliable SE available on the LCG production
    infrastructure
  • Generated data are always moved to CERN CASTOR as
    soon as the job finishes, using AliEn tools
    (AIOd)
  • An interface to LCG storage is nevertheless
    available, and it will be tested as soon as LCG
    provides storage support on the EIS testbed

18
Software Installation on LCG
  • Via LCG jobs
  • Installed under VO_ALICE_SW_DIR/
  • root/v3-10-02/
  • geant3/v0-6/
  • aliroot/v4-01-Rev-00/
  • alien/
  • AliEn/
Diagram: from the LCG-UI, installAlice.jdl (running installAlice.sh) and installAliEn.jdl (running installAliEn.sh) are submitted as jobs to each LCG site.
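A site-side check of the layout listed on this slide could look like the following Python sketch. The package directories are the ones named above; the `missing_packages` helper and the temporary directory standing in for the VO software area are assumptions, and nothing here reproduces the actual installAlice.sh or installAliEn.sh scripts.

```python
import os
import tempfile

# Package directories listed on the slide, relative to the VO software area.
EXPECTED_PACKAGES = [
    "root/v3-10-02",
    "geant3/v0-6",
    "aliroot/v4-01-Rev-00",
    "alien",
    "AliEn",
]


def missing_packages(sw_dir, expected=EXPECTED_PACKAGES):
    """Return the expected package directories that are absent under sw_dir."""
    return [pkg for pkg in expected
            if not os.path.isdir(os.path.join(sw_dir, pkg))]


# Demonstration on a temporary directory standing in for the software area,
# where only ROOT has been installed so far.
with tempfile.TemporaryDirectory() as sw_dir:
    os.makedirs(os.path.join(sw_dir, "root", "v3-10-02"))
    missing = missing_packages(sw_dir)
```

Running such a check as part of the installation job would show which packages still need to be installed on a given site.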
19
First Event Round on LCG
             | Submitted | OK  | Aborted by LCG | Zombie | Aborted by AliEn | Still running
Friday batch | 480       | 157 | 5              | 201    | 117              | 0
Sunday batch | 250       | 149 | 0              | 0      | 1                | 100
  • OK: as reported by AliEn. Output transferred to
    CERN CASTOR and registered in the AliEn Data
    Catalogue
  • Aborted by LCG: reported as Aborted by LB.
  • Zombie: lost contact between AliEn and the job.
    All due to server and gateway restarts; many
    probably finished correctly on LCG.
  • Aborted by AliEn: failed. Many due to server and
    gateway problems that have since been fixed.
  • Still running: as reported by AliEn on Sunday,
    Feb 29th, 5 p.m.
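The two batches can be compared with a few lines of Python. The numbers are the ones reported on this slide; the dictionary keys and the `summarize` helper are illustrative names. The success fraction is computed over finished jobs only, so that the 100 Sunday jobs still running are not counted as failures.

```python
# Job counts as reported on the slide.
BATCHES = {
    "Friday": {"submitted": 480, "ok": 157, "aborted_lcg": 5,
               "zombie": 201, "aborted_alien": 117, "running": 0},
    "Sunday": {"submitted": 250, "ok": 149, "aborted_lcg": 0,
               "zombie": 0, "aborted_alien": 1, "running": 100},
}


def summarize(batch):
    """Check that the outcome columns add up to the submitted jobs and
    return the number of finished jobs and the fraction that ended OK."""
    finished = batch["submitted"] - batch["running"]
    outcomes = (batch["ok"] + batch["aborted_lcg"]
                + batch["zombie"] + batch["aborted_alien"])
    if outcomes != finished:
        raise ValueError("outcome columns do not add up to finished jobs")
    return {"finished": finished, "ok_fraction": batch["ok"] / finished}


friday = summarize(BATCHES["Friday"])
sunday = summarize(BATCHES["Sunday"])
```

With these counts, about 33% of the finished Friday jobs ended OK, against about 99% of the finished Sunday jobs, which is consistent with the remark that the server and gateway problems had been fixed in between.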

20
Short history
  • Jan 03: Requirements for ALICE PDC04 presented to
    PEB
  • End Dec 03: Announcement of LCG-2 by mid February
    2004
  • Beg. Jan 04: Decision to delay PDC04 by one month,
    waiting for LCG-2
  • End Jan 04: LCG announces that there will be no
    SE in LCG-2
  • Beg. Feb 04: The WAN resources allocated by LCG
    for data storage are insufficient/inadequate
  • Mid Feb 04: Development of an ALICE solution,
    in haste and against all odds!
  • End Feb 04: IT has also come up with a solution,
    responding to a CMS requirement
  • End Feb 04: Production started, new sites being
    added
  • Confusingly, during all this time LCG-2 has
    been declared ready for ALICE on a day-by-day
    basis!
  • Beg. Mar 04: CASTOR database has to be reinstalled
    (running on Linux 6.2!)
  • Beg. Mar 04: CASTOR servers have to be reinstalled
    for security
  • Beg. Mar 04: LCG RB works differently at
    different centres. CNAF has to be switched on and
    off by hand, otherwise it swallows all the
    jobs!
  • Beg. Mar 04: we are now getting close to 10 TB; 30
    were promised by LCG on 1/1/04
  • Mid Mar 04: Files on the IT-provided pool are
    erased before being copied to tape(!)
  • 18 Mar 04: production restarted, Grid.it included

21
Snapshot on Mar 16th
  • file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls.htm

22
Data Challenge Statistics
  • First round, closed on Mar 16th

23
Data Challenge Statistics
  • First round, closed on Mar 16th

24
Data Challenge Statistics
  • First round, closed on Mar 16th

25
DC Monitoring: http://alien.cern.ch
  • MonALISA: http://aliens3.cern.ch:8080

26
Snapshot on Mar 18th
  • file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls2.htm

27
Data Challenge Statistics
  • Second round, started on Mar 18th: 1713
    jobs

28
Data Challenge Statistics
  • Second round, started on Mar 18th: 1051,
    680

29
Data Challenge Statistics
  • Second round, started on Mar 18th: 592,
    476

30
Present Status
  • AliEn native sites
  • CERN, CNAF, Cyfronet, Catania, FZK, JINR, LBL,
    Lyon, OSC, Prague, Torino
  • LCG-2 sites
  • CERN, CNAF, RAL: ok (up to 400 concurrent jobs)
  • FZK: problems with installation, solved as of
    Mar 18th
  • NIKHEF: old version of aliroot in PATH, solved
    as of Mar 18th
  • TAIWAN: intermittent problems (network?)
  • Fermilab: not an ALICE site
  • Grid.it sites
  • Installation (aliroot and AliEn) ok everywhere but
    Bo
  • In production as of Mar 18th
  • Ba, Ct, Fe, LNL, Pd, To: ok
  • Bo-INGV, Pi: not seen by RB
  • Bo, Rm: minor installation problems
  • Mar 19th, 00:30: Ba 1, Ct 7, Fe 7, LNL 97, Pd
    70, To 17 = 199 running jobs

31
Double access @ CNAF
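Diagram: a user submits jobs to the AliEn server; the worker nodes (WN) at CNAF are reached both through the native AliEn/CNAF CE/SE and, via an AliEn CE acting as an LCG UI and the LCG RB, through the LCG/CNAF CE/SE.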
32
Remarks
  • First GRID production with fully transparent
    common access to different middlewares (AliEn and
    LCG)
  • Significant improvement in LCG stability (450/12
    hours vs. 450/2 months)
  • AliEn/LCG load split is about 50-50
  • Optimal situation with respect to any other choice
    (AliEn only or LCG only): the availability of
    resources is doubled
  • There is room for improvement (on both sides)
  • but
  • The Data Challenge started well, although it is
    just at the beginning
  • We hope for continued support from LCG
  • And centres should provide us with the promised
    resources
  • AliEn already provides functionality for
    distributed analysis
  • LCG/ARDA will improve it

33
Conclusions
  • ALICE has solutions that are evolving into a
    solid computing infrastructure
  • Major decisions have been taken and users have
    adopted them
  • Collaboration between physicists and computer
    scientists is excellent
  • The tight integration with ROOT allows a fast
    prototyping and development cycle
  • AliEn goes a long way toward providing a GRID
    solution adapted to HEP needs
  • It allowed us to do large productions with very
    few people in charge
  • Many ALICE-developed solutions have a high
    potential to be adopted by other experiments and
    indeed are becoming common solutions

35
AliEn
Diagram: AliEn service interactions through the API, in four numbered steps: 1. lookup, 2. authenticate, 3. register, 4. bind (association multiplicities omitted).
36
ARDA in a nutshell
"Long they laboured in the regions of Eä, which
are vast beyond the thought of Elves and Men,
until in the time appointed was made Arda..."
- J.R.R. Tolkien, Valaquenta
  • ARDA RTAG
  • Found AliEn the most complete system among all
    considered in Sep 03
  • Suggested a fast prototype in 6 months
  • Six months were spent calming the turmoil stirred
    up by this report!
  • ARDA has now started, as suggested by the report
  • At least so we hope!
  • ARDA, if successful, will form the basis for the
    EGEE MW

37
AliEn (ARDA)
38
ROOT, ALICE LCG
  • LCG has brought support for ROOT and FLUKA
  • We will continue to develop our system
  • Providing basic technology, e.g. VMC and
    geometrical modeller
  • and we will try to collaborate with LCG
    wherever possible
  • Possible convergence in the simulation area,
    collaboration on simple benchmarks
  • We have proposed to base LCG on ROOT and AliEn
  • LCG established a client-provider relationship
    with ROOT, which is rapidly evolving
  • Is now adopting AliEn via ARDA/EGEE
  • LCG decided to develop alternatives for some ROOT
    elements or hide them with interfaces
  • We expressed our worries
  • No time to develop and deploy a new system
  • Duplication and dispersion of efforts
  • Divergence with the rest of HEP
  • We will keep looking for opportunities to
    collaborate