Transcript and Presenter's Notes

Title: The ARDA project: Grid analysis prototypes of the LHC experiments (Massimo Lamanna, ARDA Project Leader)


1
The ARDA project: Grid analysis prototypes of the LHC experiments
Massimo Lamanna, ARDA Project Leader (Massimo.Lamanna@cern.ch)
DESY, 10 May 2004
http://cern.ch/arda
www.eu-egee.org
cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833
2
Contents
  • A bit of history
  • LHC experiments and the LCG project
  • EGEE project
  • ARDA Project
  • Mandate and organisation
  • ARDA activities during 2004
  • Now
  • Second half of 2004
  • Conclusions and Outlook

3
LHC Experiments
ATLAS, CMS, ALICE, LHCb
Storage: raw recording rate 0.1-1 GByte/s,
accumulating at 5-8 PetaByte/year, 10 PetaByte
of disk
Processing: 200,000 of today's fastest PCs
4
Multi-Tiered View of LHC Computing
5
The LHC Computing Grid Project
  • Prepare and deploy the computing environment for
    the LHC experiments
  • Common applications, tools, frameworks and
    environments,
  • Move from testbed systems to real production
    services
  • Experiments need a dependable system
  • Operated and supported 24x7 globally
  • Computing fabrics run as production physics
    services
  • Computing environment must be robust, stable,
    predictable, and supportable
  • Foster collaboration, coherence of the LHC
    computing centres
  • LCG is not a grid technology R&D project
  • Enable physics data analysis and distributed
    collaboration to a new scale

6
The LHC Computing Grid Project: Phase 1 and Phase 2
  • Phase 1: 2002-05
  • Development and prototyping
  • Approved by CERN Council 20 September 2001
  • Phase 2: 2006-08
  • Installation and operation of the full world-wide
    initial production Grid
  • Exploiting Phase 1 experience
  • Costs (materials and staff) included in the LHC
    cost-to-completion estimates

7
The LCG Phase 1 Goals
  • Prepare the LHC computing environment
  • Provide the common tools and infrastructure for
    the physics application software
  • Establish the technology for fabric, network and
    grid management
  • Operate a series of data challenges for the
    experiments
  • Build a solid collaboration and a fertile
    exchange of experience within the community of
    the centres contributing to the LCG.
  • Validate the technology and models by building
    progressively more complex Grid prototypes
  • Develop models for building the Phase 2 Grid
  • Maintain reasonable opportunities for the re-use
    of the results of the project in other fields
  • Deploy a 50% model production GRID including the
    committed LHC Regional Centres
  • Produce a Technical Design Report for the full
    LHC Computing Grid to be built in Phase 2 of the
    project
  • 50% of the complexity of one of the LHC
    experiments

8
Too early?
  • First collisions in Spring 2007
  • 1 year to procure, install, and test the full LHC
    computing fabrics
  • Infrastructure work like civil engineering
    already started
  • The Computing TDR must be ready in mid-2005
  • At least 1 year of experience in operating a
    production grid to validate the computing model
  • Experiments' data challenges should run within
    LCG in 2004
  • With a reasonable level of production service
  • How do we evolve the present services (LCG-2)
    into the final system?

9
The EGEE project
  • Create a European-wide Grid production quality
    infrastructure for multiple sciences
  • Profit from current and planned national and
    regional Grid programmes, building on
  • the results of existing projects such as DataGrid
    (EDG), LCG and others
  • EU Research Network and industrial Grid
    developers
  • Support Grid computing needs common to the
    different communities
  • integrate the computing infrastructures and agree
    on common access policies
  • Exploit International connections (US and AP)
  • Provide interoperability with other major Grid
    initiatives such as the US NSF Cyberinfrastructure
    , establishing a worldwide Grid infrastructure
  • Leverage national resources in a more effective
    way
  • 70 leading institutions in 27 countries
    (including Russia and US)

10
EGEE Scope
  • The project started April 2004
  • First phase will last 2 years with EU funding of
    32M€
  • Possibility of 2nd phase if successful
  • EGEE scope: ALL-inclusive for academic
    applications
  • Open to industrial and socio-economic world as
    well
  • Industrial participation both as potential
    end-users and IT technology and service suppliers
  • EGEE organises an Industry Forum to keep
    Industrial and Commercial parties in close
    contact
  • Services developed in 2004-5 may be tendered to
    Industry in the second phase (2006-7)
  • The major success criterion of EGEE: how many
    satisfied users from how many different domains?
  • 5000 users from at least 5 disciplines
  • 2 pilot application domains: Physics and
    Bioinformatics

11
EGEE and LCG
  • Strong links already established between EDG and
    LCG and this approach will continue in the scope
    of EGEE
  • The core infrastructure of the LCG and EGEE grids
    will be operated as a single service, and will
    grow out of LCG service
  • LCG includes US and Asia
  • EGEE includes other sciences
  • Substantial part of infrastructure common to both
  • Parallel production lines
  • LCG-2: 2004 data challenges
  • Pre-production prototype: EGEE MW, ARDA
    playground

12
ARDA working group recommendations
  • New service decomposition
  • Strong influence of the AliEn system
  • the Grid system developed by the ALICE
    experiment and used by a wide scientific
    community (not only HEP)
  • Role of experience, existing technology
  • Web service framework
  • Interfacing to existing middleware to enable
    their use in the experiment frameworks
  • Early deployment of (a series of) prototypes to
    ensure functionality and coherence

EGEE Middleware
ARDA project
13
Web Services
  • Web services: "The term Web services describes a
    standardized way of integrating Web-based
    applications using the XML, SOAP, WSDL and UDDI
    open standards over an Internet protocol
    backbone. XML is used to tag the data, SOAP is
    used to transfer the data, WSDL is used for
    describing the services available and UDDI is
    used for listing what services are available.
    Used primarily as a means for businesses to
    communicate with each other and with clients, Web
    services allow organizations to communicate data
    without intimate knowledge of each other's IT
    systems behind the firewall.
    Unlike traditional client/server models, such as
    a Web server/Web page system, Web services do not
    provide the user with a GUI. Web services instead
    share business logic, data and processes through
    a programmatic interface across a network. The
    applications interface, not the users. Developers
    can then add the Web service to a GUI (such as a
    Web page or an executable program) to offer
    specific functionality to users.
    Web services allow different applications from
    different sources to communicate with each other
    without time-consuming custom coding, and because
    all communication is in XML, Web services are not
    tied to any one operating system or programming
    language. For example, Java can talk with Perl;
    Windows applications can talk with UNIX
    applications.
    N.B. Web services do not require the use of
    browsers or HTML." From: http://www.webopedia.com
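Since SOAP and WSDL are only named above, a small self-contained illustration of the same programmatic request/response pattern can be given with Python's standard-library XML-RPC modules (the protocol also used for the bookkeeping tests later in this talk); the service name list_datasets, the port and the returned values are invented for this sketch and are not part of any ARDA or EGEE interface.

```python
# A minimal sketch of the request/response pattern that web services
# rely on, using Python's standard-library XML-RPC modules. All names
# here are illustrative, not part of any ARDA/EGEE interface.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def list_datasets(experiment):
    """Toy service method: return dataset names for an experiment."""
    return [f"{experiment}-dc04-reco-{i:03d}" for i in range(3)]

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(list_datasets, "list_datasets")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the call is serialized to XML, sent over HTTP, and the
# XML response is decoded back into native Python objects.
client = ServerProxy("http://localhost:8000")
print(client.list_datasets("CMS"))

server.shutdown()
```

A SOAP service adds a WSDL description of the available operations on top of essentially the same exchange.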

14
End-to-end prototypes: why?
  • Provide a fast feedback to the EGEE MW
    development team
  • Avoid uncoordinated evolution of the middleware
  • Coherence between users' expectations and the
    final product
  • Experiments ready to benefit from the new MW as
    soon as possible
  • Frequent snapshots of the middleware available
  • Expose the experiments (and the community in
    charge of the deployment) to the current
    evolution of the whole system
  • Experiments' systems are very complex and still
    evolving
  • Move forward towards new-generation real systems
    (analysis!)
  • Prototypes should be exercised with realistic
    workload and conditions
  • No academic exercises or synthetic demonstrations
  • LHC experiments' users absolutely required here!!!
  • A lot of work (and useful software) is involved
    in the current experiments' data challenges; this
    will be used as a starting point
  • Adapt/complete/refactor the existing: we do not
    need another system!

15
E2E Prototypes: implementation
  • Every experiment has already at least one system
  • Analysis/Production typically distinct entities
  • Using a variety of back-ends (Batch systems,
    different grid systems)
  • ARDA will put its effort on the experiment
    (sub)system the experiment chooses
  • EGEE MW as foundation layer
  • Multigrid interfaces outside our scope
  • Experiments do know how to deal with this
  • By default, we expect 4 systems
  • There is nothing like an ARDA prototype
  • Adapt/complete/refactor the existing
    (sub)system!
  • Collaborative effort (not a parallel development)
  • Commonality is not ruled out, but it should
    emerge and become attractive for the experiments.
    Anyway not imposed from outside
  • Users, users, users!!!
  • First important checkpoint: December 2004

16
Experiment End-to-End Prototypes
  • The initial prototype will have a reduced scope
  • Components selection for the first prototype
  • Experiment components not in use for the first
    prototype are not ruled out (and the used/selected
    ones might be replaced later on)
  • Not all use cases/operation modes will be
    supported
  • Attract and involve users
  • Many users are absolutely required
  • The Use Cases are still being defined
  • Example
  • A physicist selects a data sample (from current
    Data Challenges)
  • With an example/template as starting point (s)he
    prepares a job to scan the data
  • The job is split into sub-jobs and dispatched to
    the Grid; some error recovery is performed
    automatically and the results are merged back
    into a single output (a minimal sketch of this
    flow follows the list)
  • The output (histograms, ntuples) is returned
    together with simple information on the job-end
    status
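A minimal sketch of the split/dispatch/recover/merge flow described in the example above, with local threads standing in for Grid sub-jobs; the function names, the simulated failure rate and the chunk size are invented for illustration and do not correspond to any experiment framework.

```python
# Illustrative split/dispatch/retry/merge loop; threads stand in for
# Grid sub-jobs. Names and numbers here are invented for the example.
from concurrent.futures import ThreadPoolExecutor
import random

def scan_chunk(files):
    """Pretend analysis of one sub-job: return a per-chunk result."""
    if random.random() < 0.2:                # simulate a transient failure
        raise RuntimeError("worker node lost")
    return sum(len(f) for f in files)        # stand-in for a histogram

def run_with_retry(files, retries=2):
    for attempt in range(retries + 1):       # simple error recovery
        try:
            return scan_chunk(files)
        except RuntimeError:
            if attempt == retries:
                raise

sample = [f"dataset/file_{i:04d}.root" for i in range(100)]
chunks = [sample[i:i + 10] for i in range(0, len(sample), 10)]  # split

with ThreadPoolExecutor(max_workers=4) as pool:                 # dispatch
    partial_results = list(pool.map(run_with_retry, chunks))

print("merged result:", sum(partial_results))                   # merge
```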

17
E2E Prototypes
Experiment software
  • Each experiment chooses the starting point (1
    system)
  • Subset of the existing system
  • Emphasis on analysis
  • EGEE MW as foundation layer
  • There is nothing like an ARDA prototype!
  • Adapt/complete/refactor the existing one
    together with the experiments' teams
  • The initial prototype will have a reduced scope
  • Just the most sensible starting point

(Diagram: the experiment software sits on experiment-specific
middleware and an interface layer to the EGEE middleware; other systems
in use include LCG-2, G2003, NorduGrid, LSF, PBS, etc.; the generic
middleware provides services such as file catalogues, workload
management, CEs and SEs.)
18
ARDA Project: current setup
  • LCG
  • Project leader (Massimo Lamanna/CERN)
  • 4 LCG staff (100% at CERN) matching the 4 EGEE
    staff
  • 1 more staff member from LCG (100% at CERN)
  • About 4 FTEs from other sources (20% at CERN)
  • EGEE
  • 4 NA4 staff (100% at CERN)
  • Experiments
  • 4 experiment interfaces
  • Represent the experiments in project definition,
    implementation and evaluation
  • Identify and coordinate the experiment
    contributions
  • analysis groups in the experiments with whom the
    middleware people can work to specify the
    services and validate the implementations
  • upper middleware teams (experiment-specific MW)

(Diagram: strong links with the experiments' teams, their users and
systems, and with the regional centres.)
19
People
  • Massimo Lamanna, Birger Koblitz, Dietrich Liko,
    Frederik Orellana, Derek Feichtinger, Andreas
    Peters, Julia Andreeva, Juha Herrala, Andrew
    Maier, Kuba Moscicki
  • Russia: Andrey Demichev, Viktor Pose
  • Taiwan: Wei-Long Ueng, Tao-Sheng Chen
  • Experiment interfaces: Piergiorgio Cerello
    (ALICE), David Adams (ATLAS), Lucia Silvestris
    (CMS), Ulrik Egede (LHCb)
20
ARDA @ Regional Centres
  • Deployability is a key factor of MW success
  • A few Regional Centres will have the
    responsibility to provide early installation for
    ARDA
  • Understand Deployability issues
  • Extend the ARDA test bed
  • The ARDA test bed will be the next step after the
    most complex EGEE Middleware test bed
  • Stress and performance tests could ideally be
    located outside CERN
  • This holds for experiment-specific components
    (e.g. a metadata catalogue)
  • Leverage the Regional Centres' local know-how
  • Database technologies
  • Web services
  • Pilot sites might enlarge the resources available
    and give fundamental feedback in terms of
    deployability to complement the EGEE SA1
    activity (EGEE/LCG operations)

21
Coordination and forum activities
  • The coordination activities would flow naturally
    from the fact that ARDA will be open to provide
    demonstration benches
  • Since it is neither necessary nor possible that
    all projects could be hosted inside the ARDA
    experiments prototypes, some coordination is
    needed to ensure that new technologies can be
    exposed to the relevant community
  • Transparent process
  • ARDA should organise a set of regular meetings
    (one per quarter?) to discuss results, problems,
    new/alternative solutions and possibly agree on
    some coherent program of work.
  • The ARDA project leader organises this activity,
    which will be truly distributed and led by the
    active partners
  • Special relation with LCG GAG
  • LCG forum for Grid requirements and use cases
  • Experiments representatives coincide with the
    EGEE NA4 experiments representatives
  • ARDA will channel this information to the
    appropriate recipients
  • ARDA workshop (January 2004 at CERN; open; over
    150 participants)
  • ARDA workshop (June 21-23 at CERN; by invitation)
  • The first 30 days of EGEE middleware
  • ARDA workshop (September 2004?; open)

22
Coordination and forum activities
(Diagram: ARDA sits at the centre, handling collaboration,
coordination, integration, specification, priorities and planning
between the four experiments' distributed analysis projects (ALICE,
ATLAS, CMS, LHCb), EGEE NA4 application identification and support,
the LCG-GAG Grid Application Group (experience and use cases), the
EGEE middleware, the resource providers community, and related tools
such as GAE, PROOF, SEAL and POOL.)
23
Plans and activity within the experiments
  • General pattern
  • Planning
  • Example
  • LHCb
  • CMS
  • ATLAS
  • ALICE

24
Example of activity
  • Existing system as starting point
  • Every experiment has different implementations of
    the standard services
  • Used mainly in production environments
  • Few expert users
  • Coordinated update and read actions
  • ARDA
  • Interface with the EGEE middleware
  • Verify (and help evolve) such components for
    analysis environments
  • Many users
  • Robustness
  • Concurrent read actions
  • Performance
  • One prototype per experiment
  • A Common Application Layer might emerge in future
  • ARDA emphasis is to enable each of the
    experiments to do its job

Very soon
Already started
25
LHCb
  • The LHCb system within ARDA uses GANGA as
    principal component.
  • The LHCb/GANGA plan to enable physicists to use
    GANGA to analyse the data being produced during
    2004 for their studies naturally matches the ARDA
    mandate
  • At the beginning, the emphasis will be to
    validate the tool, focusing on usability and on
    validation of the splitting and merging
    functionality for users' jobs
  • The DIRAC system (the LHCb Grid system, used
    mainly in production so far) could be a useful
    playground to understand the detailed behaviour
    of some components, like the file catalog

26
GANGA: Gaudi/Athena aNd Grid Alliance
  • Gaudi/Athena: LHCb/ATLAS frameworks
  • Athena uses Gaudi as a foundation
  • Single desktop for a variety of tasks
  • Help configuring and submitting analysis jobs
  • Keep track of what users have done, completely
    hiding all technicalities
  • Resource Broker, LSF, PBS, DIRAC, Condor
  • Job registry stored locally or in the roaming
    profile
  • Automate config/submit/monitor procedures
  • Provide a palette of possible choices and
    specialized plug-ins (pre-defined application
    configurations, batch/grid systems, etc.)
  • Friendly user interface (CLI/GUI) is essential
  • GUI Wizard Interface
  • Help users to explore new capabilities
  • Browse job registry

(Diagram: the GANGA GUI and UI drive an internal model that talks to
collective and Grid services (bookkeeping service, workload manager,
profile service, monitoring, file catalog, CE, SE) and to an
instrumented GAUDI program; job options and algorithms go in,
histograms, monitoring information and results come back.)
27
ARDA contribution to Ganga
  • Integration with EGEE middleware
  • While waiting for the EGEE middleware, we
    developed an interface to Condor
  • Use of Condor DAGMan for splitting/merging and
    error-recovery capability (see the DAG sketch
    after this list)
  • Design and Development
  • Command Line Interface
  • Future evolution of Ganga
  • Release management
  • Software process and integration
  • Testing, tagging policies etc.
  • Infrastructure
  • Installation, packaging etc.
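As a rough illustration of how DAGMan can express such a split/merge workflow with error recovery, the snippet below writes a minimal .dag file by hand; the submit-file names, the number of sub-jobs and the retry count are placeholders, not what Ganga actually generates.

```python
# Write a minimal Condor DAGMan description for a split/merge analysis:
# N independent "scan" nodes, a final "merge" node, and automatic
# retries for transient failures. Submit-file names are placeholders.
n_subjobs = 5
lines = []
for i in range(n_subjobs):
    lines.append(f"JOB scan{i} scan_{i}.submit")
    lines.append(f"RETRY scan{i} 2")          # simple error recovery
lines.append("JOB merge merge.submit")
lines.append("PARENT " + " ".join(f"scan{i}" for i in range(n_subjobs))
             + " CHILD merge")                # merge waits for all scans

with open("analysis.dag", "w") as dag:
    dag.write("\n".join(lines) + "\n")

# Submit with: condor_submit_dag analysis.dag
```

DAGMan then resubmits a failed node up to its RETRY count and only starts the merge node once all scan nodes have succeeded.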

28
LHCb Metadata catalog
  • Used in production (for large productions)
  • Web Service layer being developed (main
    developers in the UK)
  • Oracle backend
  • ARDA contributes testing focused on analysis
    usage
  • Robustness
  • Performance under high concurrency (read mode)

(Plot: measured network rate vs. number of concurrent clients)
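A minimal sketch of the kind of concurrent read-only load test behind such measurements, assuming an XML-RPC bookkeeping endpoint; the URL, the method name getJobsSummary and the client counts are invented for illustration.

```python
# Concurrent read-only load test against an XML-RPC catalogue service.
# Endpoint URL and method name are placeholders for this sketch.
import time
from concurrent.futures import ThreadPoolExecutor
from xmlrpc.client import ServerProxy

ENDPOINT = "http://bookkeeping.example.org:8080/RPC2"   # placeholder

def one_client(n_queries=20):
    proxy = ServerProxy(ENDPOINT)
    t0 = time.time()
    for _ in range(n_queries):
        proxy.getJobsSummary("DC04")        # hypothetical read-only call
    return n_queries / (time.time() - t0)   # queries per second

for n_clients in (1, 5, 10, 20):
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        rates = list(pool.map(lambda _: one_client(), range(n_clients)))
    print(f"{n_clients:3d} clients: {sum(rates):7.1f} queries/s total")
```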
29
CERN/Taiwan tests
(Diagram: virtual users on a client host, instrumented with a network
monitor, query bookkeeping servers backed by Oracle databases at CERN
and in Taiwan; CPU load, network traffic and process time are measured
for the Web/XML-RPC service on both sides.)
  • Clone the Bookkeeping DB in Taiwan
  • Install the WS layer
  • Performance tests
  • Database I/O sensor
  • Bookkeeping server performance tests
  • Taiwan/CERN Bookkeeping Server DB
  • XML-RPC service performance tests
  • CPU load, network send/receive sensor, process
    time
  • Client host performance tests
  • CPU load, network send/receive sensor, process
    time
  • DB I/O sensor
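A sketch of a simple host-side sensor of the kind listed above, sampling CPU load and network send/receive counters with the psutil library; the sampling interval and output format are arbitrary choices for this example.

```python
# Periodically sample CPU load and network send/receive counters while
# a test runs; a simple stand-in for the sensors mentioned above.
import psutil

def sample_host(duration_s=10, interval_s=1.0):
    last = psutil.net_io_counters()
    for _ in range(int(duration_s / interval_s)):
        cpu = psutil.cpu_percent(interval=interval_s)   # % over interval
        now = psutil.net_io_counters()
        sent = (now.bytes_sent - last.bytes_sent) / interval_s
        recv = (now.bytes_recv - last.bytes_recv) / interval_s
        last = now
        print(f"cpu {cpu:5.1f}%  sent {sent/1e6:6.2f} MB/s  "
              f"recv {recv/1e6:6.2f} MB/s")

sample_host()
```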
30
CMS
  • The CMS system within ARDA is still under
    discussion
  • This Wednesday CMS session during CMS software
    week
  • It is already clear that the complex RefDB system
    (the heart of the data challenge DC04, recently
    finished) will be one of the areas of
    collaboration between CMS and the corresponding
    ARDA team
  • RefDB is the bookkeeping engine used to plan and
    steer the production across the different phases
    (simulation, reconstruction and, to some degree,
    the analysis phase). It contained all necessary
    information except physical file locations (RLS)
    and information related to the transfer
    management system (TMDB).
  • Performance measurements are underway (similar
    philosophy to the LHCb metadata catalog
    measurements)

31
DC04 data flow at T0 (CERN)
(Diagram, DC04 data flow at T0 (CERN): McRunjob sends reconstruction
instructions derived from RefDB; reconstruction jobs run on T0 worker
nodes and write reconstructed data to the GDB CASTOR pool, with
summaries of successful jobs returned to RefDB; a transfer agent checks
what has arrived, updates the RLS and TMDB catalogues, and reconstructed
data go to tapes and to the export buffers.)
32
ATLAS
  • The ATLAS system within ARDA has been agreed
  • ATLAS has a complex strategy for distributed
    analysis, addressing different areas with
    specific projects (fast response, user-driven
    analysis, massive production, etc.; see
    http://www.usatlas.bnl.gov/ADA/)
  • Starting point is the DIAL system
  • The AMI metadata catalog is a key component
  • mySQL as a back end
  • Genuine Web Server implementation
  • Robustness and performance tests from ARDA
  • In the start up phase, ARDA provided some help in
    developing ATLAS production tools
  • Finishing

33
What is DIAL?
34
ATLAS Metadata Catalog (AMI)
  • ATLAS Metadata Catalogue, contains file metadata
  • Simulation/Reconstruction version
  • File content: event types
  • Does not contain physical filenames
  • SOAP proxy (in Java) front-end to hierarchical
    databases (institute → collaboration)
  • Proxy allows database schema evolution
  • SOAP allows automatic code generation for client

Planned
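Because the catalogue is exposed through a SOAP proxy and the WSDL lets client bindings be generated automatically, a client call can stay very short. The sketch below uses the Python zeep library; the WSDL URL and the SearchQuery operation are placeholders, not the actual AMI interface.

```python
# Minimal SOAP client sketch: zeep reads the WSDL and builds the
# client-side bindings at runtime. URL and operation are placeholders,
# not the real AMI interface.
from zeep import Client

client = Client("http://ami.example.org/AMI?wsdl")   # placeholder WSDL

# Every available service and operation is described in the WSDL:
for service in client.wsdl.services.values():
    print("service:", service.name)

# Hypothetical query: list datasets matching a pattern.
result = client.service.SearchQuery(sql="dataset LIKE 'dc2.%'")
print(result)
```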
35
AMI studies in ARDA
  • Studied behaviour using many concurrent clients

(Diagram: many concurrent user clients query the metadata back end
(MySQL) through the SOAP proxy.)
  • Many problems still open
  • Large network traffic overhead due to
    schema-independent tables
  • SOAP proxy supposed to provide DB properties
  • Browsable results
  • Note that Web Services are stateless (no
    automatic handles for the concepts of session,
    transaction, etc.): 1 query → 1 (full) response
  • Large queries crashed the server (see the paging
    sketch after this list)
  • Should the proxy re-implement all database
    functionality?
  • Nice collaboration in place with ATLAS-Grenoble
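One way to live with the stateless one-query/one-full-response model is to page large queries on the client side; below is a generic sketch of that pattern, where the query function and its limit/offset parameters are invented for illustration.

```python
# Client-side paging over a stateless query service: fetch fixed-size
# pages until an incomplete page signals the end. The query function,
# limit and offset parameters are placeholders for this sketch.
def fetch_all(query_page, page_size=500):
    offset, rows = 0, []
    while True:
        page = query_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:        # last (possibly empty) page
            return rows
        offset += page_size

# Example with an in-memory stand-in for the remote catalogue:
catalogue = [{"file": f"evgen_{i:05d}.root"} for i in range(1234)]
fake_query = lambda limit, offset: catalogue[offset:offset + limit]
print(len(fetch_all(fake_query)))        # 1234
```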

36
ATLAS ATCOM
  • AtCom II: planned successor of AtCom
  • Graphical interactive tool to support production
    management in ATLAS
  • Large scale job definition, submission and
    progress monitoring
  • Linked to several bookkeeping databases (AMI and
    Magda)
  • Plug-ins for LSF, EDG and Nordugrid

37
ALICE
  • The ALICE system within ARDA will be the
    evolution of the analysis system presented by
    ALICE at SuperComputing 2003 (SC2003)
  • With the new EGEE middleware (at SC2003, AliEn
    was used)
  • Some activity on the PROOF system
  • Robustness
  • Error recovery

38
ALIEN system/Grid enabled PROOF (SC2003 Demo)
(Diagram, SC2003 demo: a user session connects through a TcpRouter to
the PROOF master server, which reaches PROOF slaves at sites A, B and C
through a TcpRouter at each site.)
39
ALICE-ARDA prototype improvements
  • SC2003
  • The setup was heavily connected with the
    middleware services
  • Somewhat inflexible configuration
  • No chance to use PROOF on federated grids like
    LCG in AliEn
  • TcpRouter service needs incoming connectivity in
    each site
  • Libraries cannot be distributed using the
    standard rootd functionality
  • Improvement ideas
  • Distribute another daemon with ROOT, which
    replaces the need for a TcpRouter service
  • Connect each slave proofd/rootd via this daemon
    to two central proofd/rootd master multiplexer
    daemons, which run together with the proof
    master
  • Use Grid functionality for daemon start-up and
    booking policies through a plug-in interface from
    ROOT
  • Put PROOF/ROOT on top of the grid services
  • Improve on dynamic configuration and error
    recovery

40
ALICE-ARDA improved system
(Diagram: proxy proofd and proxy rootd daemons run next to the master,
with Grid services providing booking.)
  • The remote proof slaves look like local proof
    slaves on the master machine
  • The booking service is usable also on local
    clusters
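The multiplexer idea amounts to a small forwarding daemon that accepts a slave's connection and relays the byte stream to the master. Below is a generic sketch with Python's asyncio (the real proofd/rootd daemons are C++ inside ROOT); the host name and ports are placeholders.

```python
# Generic TCP forwarding daemon: accept connections locally and relay
# bytes to an upstream endpoint, the core of the "proxy proofd/rootd"
# idea. Hosts and ports below are placeholders.
import asyncio

UPSTREAM = ("proof-master.example.org", 1093)   # placeholder master

async def pump(reader, writer):
    try:
        while data := await reader.read(64 * 1024):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_slave(reader, writer):
    up_reader, up_writer = await asyncio.open_connection(*UPSTREAM)
    # Relay both directions concurrently until either side closes.
    await asyncio.gather(pump(reader, up_writer),
                         pump(up_reader, writer))

async def main():
    server = await asyncio.start_server(handle_slave, "0.0.0.0", 1094)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```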
41
Conclusions and Outlook
  • ARDA is starting
  • Main tool: experiment prototypes for analysis
  • Detailed project plan being prepared
  • Good feedback from the LHC experiments
  • Good collaboration with EGEE NA4
  • Good collaboration with Regional Centres
  • Look forward to contribute to the success of EGEE
  • Helping EGEE Middleware to deliver a fully
    functional solution
  • ARDA main focus
  • Collaborate with the LHC experiments to set up
    the end-to-end prototypes
  • Aggressive schedule
  • First milestone for the end-to-end prototypes is
    Dec 2004

42
Links
  • LCG
  • http://cern.ch/lcg
  • EGEE
  • www.eu-egee.org
  • NA4 (Application Identification and Support):
    http://egee-na4.ct.infn.it/index.php
  • NA4 HEP: http://egee-na4.ct.infn.it/hep/
  • ARDA
  • http://cern.ch/arda
  • GAG
  • http://project-lcg-gag.web.cern.ch/project-lcg-gag/