Title: The ARDA project: Grid analysis prototypes of the LHC experiments (Massimo Lamanna, ARDA Project Leader)
1. The ARDA project: Grid analysis prototypes of the LHC experiments
Massimo Lamanna, ARDA Project Leader
Massimo.Lamanna@cern.ch
DESY, 10 May 2004
http://cern.ch/arda
www.eu-egee.org
cern.ch/lcg
EGEE is a project funded by the European Union
under contract IST-2003-508833
2. Contents
- A bit of history
- LHC experiments and the LCG project
- EGEE project
- ARDA Project
- Mandate and organisation
- ARDA activities during 2004
- Now
- Second half of 2004
- Conclusions and Outlook
3. LHC Experiments
ATLAS, CMS, ALICE, LHCb
- Storage: raw recording rate 0.1-1 GByte/s; accumulating 5-8 PetaByte/year; 10 PetaByte of disk
- Processing: 200,000 of today's fastest PCs
4. Multi-Tiered View of LHC Computing
5. The LHC Computing Grid Project
- Prepare and deploy the computing environment for the LHC experiments
- Common applications, tools, frameworks and environments
- Move from testbed systems to real production services
- Experiments need a dependable system
- Operated and supported 24x7 globally
- Computing fabrics run as production physics services
- The computing environment must be robust, stable, predictable and supportable
- Foster collaboration and coherence of the LHC computing centres
- LCG is not a grid technology R&D project
- Enable physics data analysis and distributed collaboration on a new scale
6. The LHC Computing Grid Project: Phase 1 and Phase 2
- Phase 1: 2002-05
- Development and prototyping
- Approved by CERN Council, 20 September 2001
- Phase 2: 2006-08
- Installation and operation of the full world-wide initial production Grid
- Exploiting the Phase 1 experience
- Costs (materials and staff) included in the LHC cost-to-completion estimates
7. The LCG Phase 1 Goals
- Prepare the LHC computing environment
- Provide the common tools and infrastructure for the physics application software
- Establish the technology for fabric, network and grid management
- Operate a series of data challenges for the experiments
- Build a solid collaboration and a fertile exchange of experience within the community of the centres contributing to the LCG
- Validate the technology and models by building progressively more complex Grid prototypes
- Develop models for building the Phase 2 Grid
- Maintain reasonable opportunities for the re-use of the results of the project in other fields
- Deploy a 50% model production Grid including the committed LHC Regional Centres
- 50% of the complexity of one of the LHC experiments
- Produce a Technical Design Report for the full LHC Computing Grid to be built in Phase 2 of the project
8. Too early?
- First collisions in Spring 2007
- 1 year to procure, install, and test the full LHC computing fabrics
- Infrastructure work (like civil engineering) already started
- The Computing TDR must be ready in mid-2005
- At least 1 year of experience in operating a production grid is needed to validate the computing model
- The experiments' data challenges should run within LCG in 2004
- With a reasonable level of production service
- How do we evolve the present services (LCG-2) into the final system?
9. The EGEE project
- Create a Europe-wide production-quality Grid infrastructure for multiple sciences
- Profit from current and planned national and regional Grid programmes, building on:
- the results of existing projects such as DataGrid (EDG), LCG and others
- the EU Research Network and industrial Grid developers
- Support Grid computing needs common to the different communities
- Integrate the computing infrastructures and agree on common access policies
- Exploit international connections (US and Asia-Pacific)
- Provide interoperability with other major Grid initiatives such as the US NSF Cyberinfrastructure, establishing a worldwide Grid infrastructure
- Leverage national resources in a more effective way
- 70 leading institutions in 27 countries (including Russia and the US)
10. EGEE Scope
- The project started April 2004
- The first phase will last 2 years with EU funding of €32M
- Possibility of a 2nd phase if successful
- EGEE scope: all-inclusive for academic applications
- Open to the industrial and socio-economic world as well
- Industrial participation both as potential end-users and as IT technology and service suppliers
- EGEE organises an Industry Forum to keep industrial and commercial parties in close contact
- Services developed in 2004-5 may be tendered to industry in the second phase (2006-7)
- The major success criterion of EGEE: how many satisfied users from how many different domains?
- 5000 users from at least 5 disciplines
- 2 pilot application domains: Physics and Bioinformatics
11. EGEE and LCG
- Strong links already established between EDG and LCG; this approach will continue in the scope of EGEE
- The core infrastructure of the LCG and EGEE grids will be operated as a single service, and will grow out of the LCG service
- LCG includes US and Asia
- EGEE includes other sciences
- A substantial part of the infrastructure is common to both
- Parallel production lines:
- LCG-2
- 2004 data challenges
- Pre-production prototype
- EGEE MW
- ARDA playground
12. ARDA working group recommendations
- New service decomposition
- Strong influence of the AliEn system
- the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP)
- Role of experience and existing technology
- Web service framework
- Interfacing to existing middleware to enable its use in the experiment frameworks
- Early deployment of (a series of) prototypes to ensure functionality and coherence
[Diagram: EGEE Middleware / ARDA project]
13. Web Services
"The term Web services describes a standardized way of integrating Web-based applications using the XML, SOAP, WSDL and UDDI open standards over an Internet protocol backbone. XML is used to tag the data, SOAP is used to transfer the data, WSDL is used for describing the services available and UDDI is used for listing what services are available. Used primarily as a means for businesses to communicate with each other and with clients, Web services allow organizations to communicate data without intimate knowledge of each other's IT systems behind the firewall.
Unlike traditional client/server models, such as a Web server/Web page system, Web services do not provide the user with a GUI. Web services instead share business logic, data and processes through a programmatic interface across a network. The applications interface, not the users. Developers can then add the Web service to a GUI (such as a Web page or an executable program) to offer specific functionality to users.
Web services allow different applications from different sources to communicate with each other without time-consuming custom coding, and because all communication is in XML, Web services are not tied to any one operating system or programming language. For example, Java can talk with Perl; Windows applications can talk with UNIX applications.
N.B. Web services do not require the use of browsers or HTML."
From http://www.webopedia.com
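The quoted definition above can be made concrete with a minimal sketch: a client calls a remote function through a programmatic XML-over-HTTP interface, with no knowledge of the server's internals. Python's built-in XML-RPC is used here as a lightweight stand-in for SOAP/WSDL; the service function and catalogue contents are invented for illustration.

```python
# Minimal web-service sketch: the client sees only the programmatic
# interface (list_datasets), not the server's implementation.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def list_datasets(experiment):
    # Hypothetical lookup; a real service would query a database.
    catalogue = {"LHCb": ["DC04-sim", "DC04-reco"], "ALICE": ["PDC04"]}
    return catalogue.get(experiment, [])

# Serve on an ephemeral port in a background thread.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(list_datasets)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client and server exchange XML over HTTP behind the scenes.
proxy = ServerProxy(f"http://localhost:{port}")
result = proxy.list_datasets("LHCb")
print(result)  # ['DC04-sim', 'DC04-reco']
server.shutdown()
```

The same decoupling is what lets, say, a Java server talk to a Perl or Python client, as the quote notes: only the XML wire format is shared.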
14. End-to-end prototypes: why?
- Provide fast feedback to the EGEE MW development team
- Avoid uncoordinated evolution of the middleware
- Coherence between users' expectations and the final product
- Experiments ready to benefit from the new MW as soon as possible
- Frequent snapshots of the middleware available
- Expose the experiments (and the community in charge of the deployment) to the current evolution of the whole system
- Experiment systems are very complex and still evolving
- Move forward towards new-generation real systems (analysis!)
- Prototypes should be exercised with realistic workloads and conditions
- No academic exercises or synthetic demonstrations
- LHC experiment users are absolutely required here!
- A lot of work (and useful software) goes into the current experiments' data challenges; this will be used as a starting point
- Adapt/complete/refactor the existing systems: we do not need another system!
15. E2E Prototypes: implementation
- Every experiment already has at least one system
- Analysis/Production typically distinct entities
- Using a variety of back-ends (batch systems, different grid systems)
- ARDA will put its effort on the experiment (sub)system the experiment chooses
- EGEE MW as foundation layer
- Multi-grid interfaces are outside our scope
- Experiments do know how to deal with this
- By default, we expect 4 systems
- There is nothing like an "ARDA prototype"
- Adapt/complete/refactor the existing (sub)system!
- Collaborative effort (not a parallel development)
- Commonality is not ruled out, but it should emerge and become attractive for the experiments; in any case it will not be imposed from outside
- Users, users, users!!!
- First important checkpoint: December 2004
16. Experiment End-to-End Prototypes
- The initial prototype will have a reduced scope
- Component selection for the first prototype
- Experiment components not in use for the first prototype are not ruled out (and used/selected ones might be replaced later on)
- Not all use cases/operation modes will be supported
- Attract and involve users
- Many users are absolutely required
- The Use Cases are still being defined
- Example:
- A physicist selects a data sample (from current Data Challenges)
- With an example/template as starting point, (s)he prepares a job to scan the data
- The job is split into sub-jobs and dispatched to the Grid; some error recovery is performed automatically; the results are merged back into a single output
- The output (histograms, ntuples) is returned together with simple information on the job-end status
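The split/dispatch/recover/merge flow in the example above can be sketched in a few lines; everything here (the dataset, the scan function, the retry policy) is an invented stand-in for the experiment's real analysis job and Grid submission machinery.

```python
# Sketch of the analysis use case: split a job over a data sample into
# sub-jobs, retry failures, and merge the partial outputs.
DATASET = list(range(100))  # stand-in for a selected event sample

def scan(chunk):
    # Stand-in analysis sub-job; in reality this would run the
    # experiment's framework on a Grid worker node.
    return sum(chunk)

def run_with_retry(chunk, retries=1):
    # Simple automatic error recovery: re-run a failed sub-job.
    for attempt in range(retries + 1):
        try:
            return scan(chunk)
        except RuntimeError:
            if attempt == retries:
                raise

def split(data, n_subjobs):
    # Divide the sample into roughly equal chunks, one per sub-job.
    size = (len(data) + n_subjobs - 1) // n_subjobs
    return [data[i:i + size] for i in range(0, len(data), size)]

partials = [run_with_retry(c) for c in split(DATASET, 10)]
merged = sum(partials)  # merge back into a single output
print(merged)           # 4950
```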
17. E2E Prototypes
Experiment software
- Each experiment chooses the starting point (1 system)
- Subset of the existing system
- Emphasis on analysis
- EGEE MW as foundation layer
- There is nothing like an "ARDA prototype"!
- Adapt/complete/refactor the existing one together with the experiment teams
- The initial prototype will have a reduced scope
- Just the most sensible starting point
[Diagram: experiment-specific middleware on top of an EGEE Middleware Interface Layer and other systems in use (LCG-2, G2003, NorduGrid, LSF, PBS, ...); generic middleware services below: FileCatalog, CE, Workload, SE]
18. ARDA Project: current set-up
- LCG
- Project leader (Massimo Lamanna/CERN)
- 4 LCG staff (100% at CERN) matching the 4 EGEE staff
- 1 more staff member from LCG (100% at CERN)
- About 4 FTEs from other sources (20% at CERN)
- EGEE
- 4 NA4 staff (100% at CERN)
- Experiments
- 4 experiment interfaces
- Represent the experiments in project definition, implementation and evaluation
- Identify and coordinate the experiment contributions
- Analysis groups in the experiments with whom the middleware people can work to specify the services and validate the implementations
- Upper middleware teams (experiment-specific MW)
[Diagram: strong links with the experiment teams, the regional centres, the users and the experiment systems]
19. People
- Massimo Lamanna
- Birger Koblitz
- Dietrich Liko
- Frederik Orellana
- Derek Feichtinger
- Andreas Peters
- Julia Andreeva
- Juha Herrala
- Andrew Maier
- Kuba Moscicki
Russia:
- Andrey Demichev
- Viktor Pose
Taiwan:
- Wei-Long Ueng
- Tao-Sheng Chen
Experiment interfaces: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)
20. ARDA @ Regional Centres
- Deployability is a key factor of MW success
- A few Regional Centres will have the responsibility to provide early installations for ARDA
- Understand deployability issues
- Extend the ARDA test bed
- The ARDA test bed will be the next step after the most complex EGEE middleware test bed
- Stress and performance tests could ideally be located outside CERN
- This is for experiment-specific components (e.g. a metadata catalogue)
- Leverage the Regional Centres' local know-how:
- Database technologies
- Web services
- ...
- Pilot sites might enlarge the resources available and give fundamental feedback in terms of deployability, complementing the EGEE SA1 activity (EGEE/LCG operations)
21. Coordination and forum activities
- The coordination activities would flow naturally from the fact that ARDA will be open to provide demonstration benches
- Since it is neither necessary nor possible that all projects be hosted inside the ARDA experiment prototypes, some coordination is needed to ensure that new technologies can be exposed to the relevant community
- Transparent process
- ARDA should organise a set of regular meetings (one per quarter?) to discuss results, problems and new/alternative solutions, and possibly agree on a coherent programme of work
- The ARDA project leader organises this activity, which will be truly distributed and led by the active partners
- Special relation with the LCG GAG
- The LCG forum for Grid requirements and use cases
- Its experiment representatives coincide with the EGEE NA4 experiment representatives
- ARDA will channel this information to the appropriate recipients
- ARDA workshop (January 2004 at CERN; open; over 150 participants)
- ARDA workshop (June 21-23 at CERN; by invitation): "The first 30 days of EGEE middleware"
- ARDA workshop (September 2004?; open)
22. Coordination and forum activities
[Diagram: the ALICE, ATLAS, CMS and LHCb Distributed Analysis efforts feed experience and use cases into ARDA (collaboration, coordination, integration, specification, priorities, planning), alongside EGEE NA4 (application identification and support), GAE, PROOF, LCG-GAG (Grid Application Group), SEAL, POOL, the EGEE middleware and the resource-providers community]
23. Plans and activity within the experiments
- General pattern
- Planning
- Example
- LHCb
- CMS
- ATLAS
- ALICE
24. Example of activity
- Existing system as starting point
- Every experiment has different implementations of the standard services
- Used mainly in production environments
- Few expert users
- Coordinated update and read actions
- ARDA:
- Interface with the EGEE middleware
- Verify (and help to evolve) such components for analysis environments
- Many users
- Robustness
- Concurrent read actions
- Performance
- One prototype per experiment
- A Common Application Layer might emerge in future
- ARDA's emphasis is to enable each experiment to do its job
25. LHCb
- The LHCb system within ARDA uses GANGA as its principal component
- The LHCb/GANGA plan, to enable physicists to use GANGA to analyse the data being produced during 2004 for their studies, naturally matches the ARDA mandate
- At the beginning, the emphasis will be on validating the tool, focusing on usability and on validation of the splitting and merging functionality for user jobs
- The DIRAC system (the LHCb grid system, used mainly in production so far) could be a useful playground to understand the detailed behaviour of some components, like the file catalogue
26. GANGA: Gaudi/Athena aNd Grid Alliance
- Gaudi/Athena: the LHCb/ATLAS frameworks
- Athena uses Gaudi as a foundation
- A single desktop for a variety of tasks
- Helps users configure and submit analysis jobs
- Keeps track of what they have done, hiding all technicalities completely
- Resource Broker, LSF, PBS, DIRAC, Condor
- Job registry stored locally or in the roaming profile
- Automates the config/submit/monitor procedures
- Provides a palette of possible choices and specialized plug-ins (pre-defined application configurations, batch/grid systems, etc.)
- A friendly user interface (CLI/GUI) is essential
- GUI wizard interface
- Helps users to explore new capabilities
- Browse the job registry
[Diagram: the GANGA GUI/UI sits on top of collective and resource Grid services (Bookkeeping Service, WorkLoad Manager, Profile Service, Monitor), driving an instrumented GAUDI program that uses the file catalog, SE and CE; JobOptions and algorithms go in, histograms, monitoring information and results come back]
27. ARDA contribution to Ganga
- Integration with EGEE middleware
- While waiting for the EGEE middleware, we developed an interface to Condor
- Use of Condor DAGMan for splitting/merging and error-recovery capability
- Design and development
- Command Line Interface
- Future evolution of Ganga
- Release management
- Software process and integration
- Testing, tagging policies, etc.
- Infrastructure
- Installation, packaging, etc.
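The DAGMan-based splitting and merging mentioned above can be pictured as a small DAG description. All file and node names below are invented for illustration, but JOB, PARENT/CHILD and RETRY are actual Condor DAGMan directives, RETRY providing the automatic error recovery:

```
# ganga-split.dag : illustrative Condor DAGMan sketch (all names invented)
# Split the user job, run the sub-jobs, then merge their outputs.
JOB  Split   split.sub
JOB  Sub1    subjob.sub
JOB  Sub2    subjob.sub
JOB  Merge   merge.sub
PARENT Split CHILD Sub1 Sub2
PARENT Sub1 Sub2 CHILD Merge
# Error recovery: re-run a failed sub-job up to twice before failing the DAG
RETRY Sub1 2
RETRY Sub2 2
```

DAGMan only starts Merge once both sub-jobs have succeeded, which is exactly the split/merge dependency structure a user analysis job needs.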
28. LHCb metadata catalogue
- Used in production (for large productions)
- Web Service layer being developed (main developers in the UK)
- Oracle back end
- ARDA contributes testing focused on the analysis usage:
- Robustness
- Performance under high concurrency (read mode)
Measured: network rate vs. number of concurrent clients
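The concurrency measurement described above can be sketched as follows: simulated clients issue read-only queries in parallel and the aggregate query rate is recorded. The real ARDA tests exercised the LHCb web-service layer over the network; `query_catalog` here is a local stub standing in for that remote call, and the catalogue contents are invented.

```python
# Sketch of a "network rate vs. number of concurrent clients" test.
import time
from concurrent.futures import ThreadPoolExecutor

CATALOGUE = {f"file{i:04d}": {"size_kb": i} for i in range(1000)}

def query_catalog(name):
    # Read-only lookup; a real client would make an XML-RPC/SOAP call.
    return CATALOGUE[name]

def measure(n_clients, queries_per_client=200):
    # Run n_clients workers, each issuing queries_per_client reads,
    # and return the aggregate rate in queries per second.
    names = list(CATALOGUE)[:queries_per_client]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(query_catalog, name)
                   for _ in range(n_clients) for name in names]
        results = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    return len(results) / elapsed

for n in (1, 5, 10):
    print(f"{n:2d} concurrent clients: {measure(n):10.0f} queries/s")
```

Plotting the measured rate against the client count reveals where the service saturates, which is the kind of curve the slide refers to.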
29. CERN/Taiwan tests
- Clone the Bookkeeping DB in Taiwan
- Install the WS layer
- Performance tests:
- Database I/O sensor
- Bookkeeping Server performance tests (Taiwan/CERN Bookkeeping Server DB)
- Web XML-RPC service performance tests: CPU load, network send/receive sensor, process time
- Client host performance tests: CPU load, network send/receive sensor, process time
[Diagram: a client at CERN drives virtual users against the Bookkeeping Servers (with Oracle DB back-ends) at CERN and in Taiwan, monitoring CPU load, network traffic and process time]
30. CMS
- The CMS system within ARDA is still under discussion
- This Wednesday: CMS session during the CMS software week
- It is already clear that the complex RefDB system (the heart of the recently finished data challenge DC04) will be one of the areas of collaboration between CMS and the corresponding ARDA team
- RefDB is the bookkeeping engine used to plan and steer the production across the different phases (simulation, reconstruction and, to some degree, the analysis phase). It contained all necessary information except physical file locations (RLS) and information related to the transfer management system (TMDB)
- Performance measurements are underway (similar philosophy as for the LHCb metadata catalogue measurements)
31. DC04 data flow at T0 (CERN)
[Diagram: RefDB sends reconstruction instructions to McRunjob, which submits reconstruction jobs to the T0 worker nodes; summaries of successful jobs update RefDB. Reconstructed data go to the GDB Castor pool and to tapes; the transfer agent checks what has arrived, updates RLS and TMDB, and moves the reconstructed data to the export buffers]
32. ATLAS
- The ATLAS system within ARDA has been agreed
- ATLAS has a complex strategy for distributed analysis, addressing different areas with specific projects (fast response, user-driven analysis, massive production, etc.; see http://www.usatlas.bnl.gov/ADA/)
- The starting point is the DIAL system
- The AMI metadata catalogue is a key component
- MySQL as a back end
- Genuine Web Service implementation
- Robustness and performance tests from ARDA
- In the start-up phase, ARDA provided some help in developing ATLAS production tools (now finishing)
33. What is DIAL?
34. ATLAS Metadata Catalogue (AMI)
- ATLAS metadata catalogue: contains file metadata
- Simulation/reconstruction version
- File content, event types
- Does not contain physical filenames
- SOAP proxy (in Java) as front-end to hierarchical databases (institute → collaboration)
- The proxy allows database schema evolution
- SOAP allows automatic code generation for the client (planned)
35. AMI studies in ARDA
- Studied behaviour using many concurrent clients
[Diagram: many users query the SOAP proxy, which fronts the metadata store (MySQL)]
- Many problems still open:
- Large network traffic overhead due to schema-independent tables
- The SOAP proxy is supposed to provide the DB properties
- Browsable results
- Note that Web Services are stateless (no automatic handles for the notion of session, transaction, etc.): 1 query = 1 (full) response
- Large queries crashed the server
- Should the proxy re-implement all the database functionality?
- Nice collaboration in place with ATLAS-Grenoble
36. ATLAS: ATCOM
- AtCom II: the planned successor of AtCom
- Graphical interactive tool to support production management in ATLAS
- Large-scale job definition, submission and progress monitoring
- Linked to several bookkeeping databases (AMI and Magda)
- Plug-ins for LSF, EDG and NorduGrid
37. ALICE
- The ALICE system within ARDA will be the evolution of the analysis system presented by ALICE at SuperComputing 2003 (SC2003)
- With the new EGEE middleware (at SC2003, AliEn was used)
- Some activity on the PROOF system
- Robustness
- Error recovery
38. AliEn system / Grid-enabled PROOF (SC2003 demo)
[Diagram: PROOF slaves at sites A, B and C connect through per-site TcpRouter services to the PROOF master server and the user session]
39. ALICE-ARDA prototype improvements
- SC2003:
- The setup was heavily tied to the middleware services
- Somewhat inflexible configuration
- No chance to use PROOF on federated grids like LCG in AliEn
- The TcpRouter service needs incoming connectivity at each site
- Libraries cannot be distributed using the standard rootd functionality
- Improvement ideas:
- Distribute another daemon with ROOT, which replaces the need for a TcpRouter service
- Connect each slave proofd/rootd via this daemon to two central proofd/rootd master multiplexer daemons, which run together with the PROOF master
- Use Grid functionality for daemon start-up and booking policies through a plug-in interface from ROOT
- Put PROOF/ROOT on top of the grid services
- Improve on dynamic configuration and error recovery
40. ALICE-ARDA improved system
[Diagram: proxy proofd and proxy rootd daemons, together with the Grid services and a booking service, sit between the master machine and the remote slaves]
- The remote PROOF slaves look like a local PROOF slave on the master machine
- The booking service is usable also on local clusters
41. Conclusions and Outlook
- ARDA is starting
- Main tool: experiment prototypes for analysis
- Detailed project plan being prepared
- Good feedback from the LHC experiments
- Good collaboration with EGEE NA4
- Good collaboration with Regional Centres
- We look forward to contributing to the success of EGEE
- Helping the EGEE middleware to deliver a fully functional solution
- ARDA main focus: collaborate with the LHC experiments to set up the end-to-end prototypes
- Aggressive schedule: the first milestone for the end-to-end prototypes is December 2004
42. Links
- LCG: http://cern.ch/lcg
- EGEE: www.eu-egee.org
- NA4 (Application Identification and Support): http://egee-na4.ct.infn.it/index.php
- NA4 HEP: http://egee-na4.ct.infn.it/hep/
- ARDA: http://cern.ch/arda
- GAG: http://project-lcg-gag.web.cern.ch/project-lcg-gag/