1
The ARDA project: Grid analysis prototypes of the LHC experiments
Massimo Lamanna, ARDA Project Leader (Massimo.Lamanna@cern.ch)
RAL, 13 May 2004
http://cern.ch/arda
www.eu-egee.org
cern.ch/lcg
EGEE is a project funded by the European Union
under contract IST-2003-508833
2
Contents
  • ARDA Project
  • Mandate and organisation
  • ARDA activities during 2004
  • General pattern
  • LHCb
  • CMS
  • ATLAS
  • ALICE
  • Conclusions and Outlook

3
ARDA working group recommendations: our starting
point
  • New service decomposition
  • Strong influence of the AliEn system
  • the Grid system developed by the ALICE
    experiment and used by a wide scientific
    community (not only HEP)
  • Role of experience, existing technology
  • Web service framework
  • Interfacing to existing middleware to enable
    its use in the experiment frameworks
  • Early deployment of (a series of) prototypes to
    ensure functionality and coherence

4
EGEE and LCG
  • Strong links have already been established between
    EDG and LCG; this will continue within the scope of EGEE
  • The core infrastructure of the LCG and EGEE grids
    will be operated as a single service, and will
    grow out of the LCG service
  • LCG includes many US and Asia partners
  • EGEE includes other sciences
  • Substantial part of infrastructure common to both
  • Parallel production lines as well
  • LCG-2: the 2004 data challenges
  • Pre-production prototype: EGEE MW, the ARDA
    playground for the LHC experiments

5
End-to-end prototypes: why?
  • Provide a fast feedback to the EGEE MW
    development team
  • Avoid uncoordinated evolution of the middleware
  • Coherence between users' expectations and the final
    product
  • Experiments ready to benefit from the new MW as
    soon as possible
  • Frequent snapshots of the middleware available
  • Expose the experiments (and the community in
    charge of the deployment) to the current
    evolution of the whole system
  • Experiment systems are very complex and still
    evolving
  • Move forward towards new-generation real systems
    (analysis!)
  • Prototypes should be exercised with realistic
    workload and conditions
  • No academic exercises or synthetic demonstrations
  • LHC experiment users are absolutely required here
    (EGEE pilot application)
  • A lot of work (experience and useful software) is
    involved in the current experiments' data challenges
  • Concrete starting point
  • Adapt/complete/refactor the existing systems: we do
    not need another system!

6
End-to-end prototypes: how?
  • The initial prototype will have a reduced scope
  • Components selection for the first prototype
  • Experiment components not used in the first
    prototype are not ruled out (and the selected
    ones might be replaced later on)
  • Not all use cases/operation modes will be
    supported
  • Every experiment has a production system (with
    multiple backends, like PBS, LCG, G2003,
    NorduGrid, ...). We focus on end-user analysis on an
    EGEE-MW-based infrastructure
  • Adapt/complete/refactor the existing
    experiment (sub)system!
  • Collaborative effort (not a parallel development)
  • Attract and involve users
  • Many users are absolutely required
  • Informal Use Cases are still being defined, e.g.
  • A physicist selects a data sample (from current
    Data Challenges)
  • With an example/template as starting point (s)he
    prepares a job to scan the data
  • The job is split into sub-jobs and dispatched to the
    Grid; some error recovery is performed automatically
    and the results are merged back into a single output
    (see the sketch after this list)
  • The output (histograms, ntuples) is returned
    together with simple information on the job-end
    status
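
A minimal sketch of this use case, with a local process pool standing in for the Grid dispatch; the function and file names are illustrative assumptions, not ARDA or experiment code:

```python
# Sketch of the informal use case: select a sample, split it into sub-jobs,
# dispatch them, retry failures, and merge the outputs. A local process pool
# stands in for the Grid here; names are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor

def analyse_chunk(files):
    """User analysis on one sub-sample; counting files stands in for real work."""
    return {"events": len(files)}

def run_with_retry(pool, files, retries=2):
    for attempt in range(retries + 1):
        try:
            return pool.submit(analyse_chunk, files).result()
        except Exception:
            if attempt == retries:
                raise  # simple error recovery: resubmit a few times, then give up

def main():
    sample = [f"run{i}.root" for i in range(100)]                    # selected data sample
    chunks = [sample[i:i + 10] for i in range(0, len(sample), 10)]   # job splitting
    with ProcessPoolExecutor() as pool:                              # stands in for Grid dispatch
        partial = [run_with_retry(pool, c) for c in chunks]          # sequential here for brevity
    merged = {"events": sum(p["events"] for p in partial)}           # merge into a single output
    print(merged)                                                    # plus simple job-end status

if __name__ == "__main__":
    main()
```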

7
ARDA @ Regional Centres
  • Deployability is a key factor of MW success
  • A few Regional Centres will have the
    responsibility to provide early installation for
    ARDA
  • Understand deployability issues
  • Extend the ARDA test bed
  • The ARDA test bed will be the next step after the
    most complex EGEE Middleware test bed
  • Stress and performance tests could be ideally
    located outside CERN
  • This is for experiment-specific components (e.g.
    a Meta Data catalogue)
  • Leverage the Regional Centres' local know-how
  • Database technologies
  • Web services
  • Pilot sites might enlarge the resources available
    and give fundamental feedback in terms of
    deployability to complement the EGEE SA1
    activity (EGEE/LCG operations)
  • Running ARDA pilot installations
  • Experiment data available where the experiment
    prototype is deployed

8
Coordination and forum activities
  • The coordination activities would flow naturally
    from the fact that ARDA will be open to provide
    demonstration benches
  • Since it is neither necessary nor possible that
    all projects be hosted inside the ARDA
    experiment prototypes, some coordination is
    needed to ensure that new technologies can be
    exposed to the relevant community
  • Transparent process
  • ARDA should organise a set of regular meetings
    (one per quarter?) to discuss results, problems,
    new/alternative solutions and possibly agree on
    some coherent program of work.
  • The ARDA project leader organises this activity,
    which will be truly distributed and led by the
    active partners
  • ARDA is embedded in EGEE NA4
  • namely NA4-HEP
  • Special relation with LCG GAG
  • LCG forum for Grid requirements and use cases
  • Experiment representatives coincide with the
    EGEE NA4 experiment representatives
  • ARDA will channel this information to the
    appropriate recipients
  • ARDA workshop (January 2004 at CERN; open; over
    150 participants)
  • ARDA workshop (June 21-23 at CERN; by invitation)
  • The first 30 days of EGEE middleware
  • NA4 meeting in mid July (NA4/JRA1 and NA4/SA1
    sessions foreseen; organised by M. Lamanna and
    F. Harris)
  • ARDA workshop (September 2004?; open)

9
People
  • Massimo Lamanna, Birger Koblitz, Dietrich Liko,
    Frederik Orellana, Derek Feichtinger, Andreas Peters,
    Julia Andreeva, Juha Herrala, Andrew Maier,
    Kuba Moscicki
  • Russia: Andrey Demichev, Viktor Pose
  • Taiwan: Wei-Long Ueng, Tao-Sheng Chen
  • Experiment interfaces: Piergiorgio Cerello (ALICE),
    David Adams (ATLAS), Lucia Silvestris (CMS),
    Ulrik Egede (LHCb)
10
Example of activity
  • Existing system as starting point
  • Every experiment has different implementations of
    the standard services
  • Used mainly in production environments
  • Few expert users
  • Coordinated update and read actions
  • ARDA
  • Interface with the EGEE middleware
  • Verify such components (and help them evolve) for
    analysis environments
  • Many users
  • Robustness
  • Concurrent read actions
  • Performance
  • One prototype per experiment
  • A Common Application Layer might emerge in future
  • The ARDA emphasis is to enable each of the
    experiments to do its job

11
LHCb
  • The LHCb system within ARDA uses GANGA as its
    principal component (see next slide).
  • The LHCb/GANGA plan:
  • enable physicists (via GANGA) to analyse the data
    being produced during 2004 for their studies
  • It naturally matches the ARDA mandate
  • Having the prototype where the LHCb data will be
    is the key
  • At the beginning, the emphasis will be on
    validating the tool, focusing on usability and on
    validation of the splitting and merging
    functionality for users' jobs
  • The DIRAC system (the LHCb grid system, used mainly
    in production so far) could be a useful
    playground to understand the detailed behaviour
    of some components, like the file catalogue

12
GANGA: Gaudi/Athena aNd Grid Alliance
  • Gaudi/Athena: the LHCb/ATLAS frameworks
  • Athena uses Gaudi as a foundation
  • Single desktop for a variety of tasks
  • Help configuring and submitting analysis jobs
  • Keep track of what users have done, completely
    hiding all technicalities
  • Resource Broker, LSF, PBS, DIRAC, Condor
  • Job registry stored locally or in the roaming
    profile
  • Automate config/submit/monitor procedures
  • Provide a palette of possible choices and
    specialized plug-ins (pre-defined application
    configurations, batch/grid systems, etc.)
  • Friendly user interface (CLI/GUI) is essential
  • GUI Wizard Interface
  • Help users to explore new capabilities
  • Browse job registry
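
As an illustration of this "single desktop" idea, a sketch of the kind of session Ganga aims at; the class and backend names follow later public Ganga releases and are assumptions relative to the 2004 prototype:

```python
# Typed inside a Ganga session (the `ganga` interactive Python shell).
# Configure, split, submit and monitor a job without backend technicalities.
j = Job(name="scan-dc04")
j.application = Executable(exe="/bin/echo", args=["analyse"])         # pre-defined application plug-in
j.splitter = ArgSplitter(args=[["chunk-%d" % i] for i in range(10)])  # one sub-job per data chunk
j.backend = Condor()          # or Local(), LSF(), a Grid backend, ... chosen from the plug-in palette
j.submit()

jobs                          # the job registry: keeps track of everything submitted
print(j.status)               # monitor progress from the same desktop
```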

[Diagram: GANGA architecture. The GANGA GUI/CLI and its internal model sit on
top of collective Grid services (Bookkeeping Service, Workload Manager,
Profile Service, file catalogue, SE, CE) and drive the GAUDI program; job
options and algorithms go in, histograms and monitoring results come back.]
13
ARDA contribution to Ganga
  • Integration with EGEE middleware
  • While waiting for the EGEE middleware, we developed
    an interface to Condor
  • Use of Condor DAGMan for splitting/merging and
    error-recovery capabilities (see the sketch below)
  • Design and Development
  • Command Line Interface
  • Future evolution of Ganga
  • Release management
  • Software process and integration
  • Testing, tagging policies etc.
  • Infrastructure
  • Installation, packaging etc.
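
A minimal sketch of how such a split/merge workflow with retries can be described for Condor DAGMan (JOB, VARS, RETRY and PARENT..CHILD are standard DAGMan keywords; the submit-file names are hypothetical):

```python
# Generate a DAGMan description: N analysis sub-jobs, automatic resubmission
# on failure, and a final merge step that runs only after all sub-jobs succeed.
N_SUBJOBS = 5

lines = []
subjobs = [f"sub{i}" for i in range(N_SUBJOBS)]
for i, name in enumerate(subjobs):
    lines.append(f"JOB {name} analysis.sub")      # analysis.sub: hypothetical Condor submit file
    lines.append(f'VARS {name} chunk="{i}"')      # tell the sub-job which data chunk to scan
    lines.append(f"RETRY {name} 3")               # error recovery: resubmit up to 3 times
lines.append("JOB merge merge.sub")               # merge.sub: hypothetical merge step
lines.append(f"PARENT {' '.join(subjobs)} CHILD merge")  # merge waits for every sub-job

with open("analysis.dag", "w") as dag:
    dag.write("\n".join(lines) + "\n")
# Submit the whole workflow with: condor_submit_dag analysis.dag
```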

14
LHCb Metadata catalog
  • Used in production (for large productions)
  • Web Service layer being developed (main
    developers in the UK)
  • Oracle backend
  • ARDA contributes testing focused on the
    analysis usage
  • Robustness
  • Performance under high concurrency (read mode);
    see the sketch below

[Plot: measured network rate vs. number of concurrent clients]
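
A minimal sketch of this kind of concurrent-read measurement; the service URL and the method name are illustrative assumptions, not the real bookkeeping interface:

```python
# Drive the XML-RPC bookkeeping service with N concurrent clients and report
# the aggregate query rate as a function of N.
import time
import xmlrpc.client
from concurrent.futures import ThreadPoolExecutor

URL = "http://bookkeeping.example.org:8080/RPC"   # hypothetical service endpoint
QUERIES_PER_CLIENT = 50

def client_task(_):
    proxy = xmlrpc.client.ServerProxy(URL)
    for _ in range(QUERIES_PER_CLIENT):
        proxy.getEvents("DC04")                   # hypothetical read-only query
    return QUERIES_PER_CLIENT

for n_clients in (1, 2, 5, 10, 20, 50):
    start = time.time()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        done = sum(pool.map(client_task, range(n_clients)))
    print(f"{n_clients:3d} clients: {done / (time.time() - start):.1f} queries/s")
```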
15
CERN/Taiwan tests
[Diagram: test setup. Virtual-user clients and a network monitor drive the
bookkeeping servers, backed by Oracle databases at CERN and in Taiwan.]
  • Clone the bookkeeping DB in Taiwan, install the WS
    layer and run performance tests
  • Bookkeeping server performance tests
    (Taiwan/CERN bookkeeping server DB)
  • Web XML-RPC service performance tests: CPU load,
    network send/receive, process time
  • Client host performance tests: CPU load, network
    send/receive, process time
  • Database I/O sensor
16
CMS
  • The CMS system within ARDA is still under
    discussion
  • Providing easy access to (and possibly sharing of)
    data for the CMS users is a key issue
  • RefDB is the bookkeeping engine used to plan and
    steer the production across the different phases
    (simulation, reconstruction and, to some degree,
    the analysis phase)
  • It contains all the necessary information except the
    physical file locations (RLS) and information related
    to the transfer management system (TMDB)
  • The actual mechanism to provide these data to
    analysis users is under discussion
  • Performance measurements are underway (similar
    philosophy to the LHCb metadata catalogue
    measurements)

[Diagram: RefDB in CMS DC04. RefDB provides reconstruction instructions to
McRunjob, which runs reconstruction jobs on the T0 worker nodes; summaries of
successful jobs are fed back to RefDB. Reconstructed data goes to the GDB
castor pool and to tape; a transfer agent checks what has arrived, updates
RLS and TMDB, and moves reconstructed data to the export buffers.]
17
ATLAS
  • The ATLAS system within ARDA has been agreed
  • ATLAS has a complex strategy for distributed
    analysis, addressing different areas with specific
    projects (fast-response user-driven analysis,
    massive production, etc.; see
    http://www.usatlas.bnl.gov/ADA/)
  • Starting point is the DIAL system
  • The AMI metadata catalog is a key component
  • MySQL as a backend
  • Genuine Web Server implementation
  • Robustness and performance tests from ARDA
  • In the start-up phase, ARDA provided some help in
    developing ATLAS production tools
  • Being finalised

18
What is DIAL? (Distributed Interactive Analysis of Large datasets)
19
AMI studies in ARDA
  • ATLAS metadata catalogue; contains file
    metadata
  • Simulation/reconstruction version
  • Does not contain physical file names
  • Many problems still open
  • Large network-traffic overhead due to
    schema-independent tables
  • A SOAP proxy is supposed to provide DB access
  • Note that web services are stateless (no
    automatic handle for the concept of a session,
    transaction, etc.): 1 query, 1 (full) response
  • Large queries might crash the server
  • Should the proxy re-implement all database
    functionality? (see the paging sketch below)
  • Good collaboration in place with ATLAS-Grenoble

[Diagram: many concurrent users querying the metadata backend (MySQL) through
the SOAP proxy]
  • Studied behaviour using many concurrent clients
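
Purely illustrative (not ATLAS code): the kind of database functionality, here paging of a large result set, that a stateless proxy would otherwise have to re-implement so that one query cannot produce an unbounded response; the in-memory table stands in for the MySQL backend:

```python
# Client-side paging over a stateless query interface: many bounded round
# trips instead of one huge response that could exhaust the server.
FAKE_TABLE = [{"lfn": f"file{i}.root"} for i in range(10_000)]  # stand-in for the MySQL backend

def query_page(limit: int, offset: int) -> list:
    """One stateless round trip: return a bounded page of results."""
    return FAKE_TABLE[offset:offset + limit]

def fetch_all(page_size: int = 1000):
    """Repeat bounded requests; each call is an independent query/response."""
    offset = 0
    while True:
        page = query_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size

print(sum(1 for _ in fetch_all()))   # 10000 rows fetched in 1000-row pages
```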

20
ALICE: Grid-enabled PROOF, SuperComputing 2003
(SC2003) demo
[Diagram: SC2003 setup. PROOF slaves at sites A, B and C connect through
per-site TcpRouters to the PROOF master server and the user session.]
  • Strategy
  • ALICE/ARDA will evolve the analysis system
    presented by ALICE at SuperComputing 2003
  • With the new EGEE middleware (at SC2003, AliEn
    was used)
  • Activity on PROOF
  • Robustness
  • Error recovery

21
ALICE-ARDA prototype improvements
  • SC2003
  • The setup was heavily connected with the
    middleware services
  • Somewhat inflexible configuration
  • No chance to use PROOF on federated grids like
    LCG in AliEn
  • The TcpRouter service needs incoming connectivity at
    each site
  • Libraries cannot be distributed using the
    standard rootd functionality
  • Improvement ideas
  • Distribute another daemon with ROOT, which
    replaces the need for a TcpRouter service (see the
    sketch after this list)
  • Connect each slave proofd/rootd via this daemon
    to two central proofd/rootd master multiplexer
    daemons, which run together with the PROOF
    master
  • Use Grid functionality for daemon start-up and
    booking policies through a plug-in interface from
    ROOT
  • Put PROOF/ROOT on top of the grid services
  • Improve on dynamic configuration and error
    recovery
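
A minimal sketch of the key idea behind replacing the TcpRouter, assuming a slave-side agent that dials out to a central multiplexer; the host, port and the use of Python rather than the actual ROOT daemon are illustrative assumptions:

```python
# Slave-side agent: make an OUTBOUND connection to the central multiplexer and
# bridge it to the local proofd, so the site needs no incoming connectivity.
import socket
import threading

MUX_ENDPOINT = ("proof-master.example.org", 9000)   # hypothetical central multiplexer
LOCAL_PROOFD = ("localhost", 1093)                  # default proofd port

def pump(src, dst):
    """Copy bytes from one socket to the other until EOF, then close."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    dst.close()

def bridge_once():
    upstream = socket.create_connection(MUX_ENDPOINT)   # outbound: passes site firewalls/NAT
    local = socket.create_connection(LOCAL_PROOFD)      # local PROOF slave daemon
    threading.Thread(target=pump, args=(upstream, local), daemon=True).start()
    pump(local, upstream)

if __name__ == "__main__":
    bridge_once()
```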

22
ALICE-ARDA improved system
[Diagram: improved system. Proxy proofd and proxy rootd daemons, plus a
booking service built on Grid services, sit in front of the PROOF master.]
  • The remote PROOF slaves look like local PROOF
    slaves on the master machine
  • The booking service is also usable on local clusters

23
Conclusions and Outlook
  • ARDA is starting
  • Main tool: experiment prototypes for analysis
  • Detailed project plan being prepared
  • Good feedback from the LHC experiments
  • Good collaboration with EGEE NA4
  • Good collaboration with Regional Centres. More
    help needed
  • Look forward to contribute to the success of EGEE
  • Helping the EGEE middleware to deliver a fully
    functional solution
  • ARDA main focus
  • Collaborate with the LHC experiments to set up
    the end-to-end prototypes
  • Aggressive schedule
  • First milestone for the end-to-end prototypes is
    Dec 2004