1
LCG ARDA project Status and plans
  • Dietrich Liko / CERN

2
The ARDA project
  • ARDA is an LCG project
  • Its main activity is to enable LHC analysis on
    the grid
  • ARDA is contributing to EGEE
  • Includes entire CERN NA4-HEP resource (NA4
    Applications)
  • Interface with the new EGEE middleware (gLite)
  • By construction, ARDA uses the new middleware
  • Follow the grid software as it matures
  • Verify the components in an analysis environment
  • Contributions to the experiments' frameworks
    (discussion, direct contribution, benchmarking, ...)
  • User feedback is fundamental, in particular from
    physicists who need distributed computing to
    perform their analyses
  • Provide early and continuous feedback

3
ARDA prototype overview
4
Ganga4
  • Major version
  • Important contribution from the ARDA team
  • Interesting concepts
  • Note that GANGA is a joint ATLAS-LHCb project
  • Contacts with CMS (exchange of ideas, code
    snippets, ...)

5
ALICE prototype
  • ROOT and PROOF
  • ALICE provides
  • the UI
  • the analysis application (AliROOT)
  • GRID middleware gLite provides all the rest
  • ARDA/ALICE is evolving the ALICE analysis system

[Diagram: end-to-end prototype with UI shell, application (AliROOT) and gLite middleware]
6
[Diagram: user session connected to the PROOF master server, with PROOF slaves at sites A, B and C]
Demo based on a hybrid system using the 2004 prototype
7
ARDA shell + C/C++ API
  • A C++ access library for gLite has been developed
    by ARDA
  • High performance
  • Protocol quite proprietary...
  • Essential for the ALICE prototype
  • Generic enough for general use
  • Using this API, grid commands have been added
    seamlessly to the standard shell

8
Current Status
  • Developed the gLite C/C++ API and API Service
  • providing a generic interface to any GRID service
  • The C/C++ API is integrated into ROOT
  • In the ROOT CVS
  • job submission and job status query for batch
    analysis can be done from inside ROOT
  • A Bash interface for gLite commands with catalogue
    expansion has been developed
  • More powerful than the original shell
  • In use in ALICE
  • Considered a generic middleware contribution
    (essential for ALICE, interesting in general)
  • First version of the interactive analysis
    prototype ready
  • The batch analysis model has been improved
  • submission and status query are integrated into
    ROOT
  • job splitting based on XML query files (a sketch
    follows this list)
  • the application (AliRoot) reads files using xrootd
    without prestaging
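As a hedged illustration of the job-splitting step above (not the actual gLite/ALICE code; the XML tags, attribute names and event counts are assumptions), a small Python sketch:

```python
# Hypothetical sketch of XML-driven job splitting (not the actual gLite/AliRoot code).
# Assumes a catalogue query result of the form:
#   <results><file lfn="lfn:/alice/run1/f1.root" events="500"/>...</results>
import xml.etree.ElementTree as ET

def split_jobs(xml_text, events_per_job):
    """Group catalogue entries into sub-jobs of roughly events_per_job events."""
    files = ET.fromstring(xml_text).findall("file")
    jobs, current, count = [], [], 0
    for f in files:
        current.append(f.get("lfn"))
        count += int(f.get("events"))
        if count >= events_per_job:
            jobs.append(current)
            current, count = [], 0
    if current:
        jobs.append(current)
    return jobs

query_result = """<results>
  <file lfn="lfn:/alice/run1/f1.root" events="500"/>
  <file lfn="lfn:/alice/run1/f2.root" events="700"/>
  <file lfn="lfn:/alice/run1/f3.root" events="300"/>
</results>"""

for i, job in enumerate(split_jobs(query_result, 1000)):
    print("sub-job", i, job)
```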

9
ATLAS/ARDA
  • Main component
  • Contribute to the DIAL evolution
  • gLite analysis server
  • Embedded in the experiment
  • AMI tests and interaction
  • Production and CTB tools
  • Job submission (ATHENA jobs)
  • Integration of the gLite Data Management within
    Don Quijote
  • Active participation in several ATLAS reviews
  • Benefit from the other experiments' prototypes
  • First look at interactivity/resiliency issues
  • e.g. use of DIANE
  • GANGA (Principal component of the LHCb prototype,
    key component of the overall ATLAS strategy)

Tao-Sheng Chen, ASCC
10
Data Management
Don Quijote: locate and move data across grid
boundaries
ARDA has connected gLite to it
11
Combined Test Beam
Real data processed on gLite: standard Athena for the
test beam, data from CASTOR, processed on a gLite
worker node
Example: ATLAS TRT data analysis done by PNPI St.
Petersburg (number of straw hits per layer)
12
DIANE
13
DIANE on gLite running Athena
14
DIANE on LCG (Taiwan)
A worker died: no problem, its tasks get
reallocated
Jobs need some time to start up: no problem.
15
ARDA/CMS
  • Prototype (ASAP)
  • Contributions to CMS-specific components
  • RefDB/PubDB
  • Usage of components used by CMS
  • Notably MonAlisa
  • Contribution to CMS-specific developments
  • Physh

16
ARDA/CMS
  • RefDB Re-Design and PubDB
  • Taking part in the RefDB redesign
  • Developing schema for PubDB and supervising
    development of the first PubDB version
  • Analysis Prototype Connected to MonAlisa
  • Tracking the progress of an analysis task is
    troublesome when the task is split into several
    (hundreds of) sub-jobs
  • The analysis prototype associates each sub-job with
    a built-in identity and the capability to report its
    progress to the MonAlisa system (see the sketch
    after this list)
  • The MonAlisa service receives and combines the
    progress reports of the single sub-jobs and
    publishes the overall progress of the whole task
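As an illustration only, and not the actual ASAP code nor the real MonALISA/ApMon API, a sub-job could report its identity and progress with a small UDP message to a monitoring collector; the host, port and key names below are assumptions.

```python
# Hypothetical stand-in for the MonALISA ApMon client: each sub-job sends a small
# UDP datagram with its identity and progress to a monitoring collector.
# The collector host/port and the key=value payload format are assumptions.
import socket

def report_progress(task_id, subjob_id, events_done, events_total,
                    collector=("monalisa.example.org", 8884)):
    payload = "task={} subjob={} done={} total={}".format(
        task_id, subjob_id, events_done, events_total).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, collector)

# Called periodically from a running sub-job, e.g.:
# report_progress("h2tau2j-2005-03", 42, 1500, 10000)
```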

17
CMS - Using MonAlisa for user job monitoring
18
ARDA/CMS
  • PhySh
  • Physicist Shell
  • ASAP is Python-based and uses XML-RPC calls for
    client-server interaction, like Clarens and PhySh
    (a generic sketch of the pattern follows this list)
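A minimal, generic sketch of that XML-RPC pattern using only the Python standard library; the method name, port and return value are illustrative and not the actual ASAP/PhySh interface.

```python
# Minimal XML-RPC client/server sketch using only the Python standard library.
# The method name, port and return value are illustrative, not the real
# ASAP/PhySh interface.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

def job_status(job_id):
    # A real server would query the task queue / WMS here.
    return {"job": job_id, "status": "Running"}

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False, allow_none=True)
server.register_function(job_status)
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://localhost:8000", allow_none=True)
print(client.job_status("job-001"))   # {'job': 'job-001', 'status': 'Running'}
```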

19
ARDA/CMS
  • CMS prototype (ASAP: ARDA Support for CMS
    Analysis Processing)
  • The first version of the CMS analysis prototype,
    capable of creating, submitting and monitoring CMS
    analysis jobs on the gLite middleware, was
    developed by the end of 2004
  • Demonstrated at the CMS week in December 2004
  • The prototype was evolved to support both RB
    versions deployed at the CERN testbed (prototype
    task queue and gLite 1.0 WMS)
  • Currently submission to both RBs is available and
    completely transparent to the users (same
    configuration file, same functionality)
  • Plan to implement a gLite job submission handler
    for CRAB

20
ASAP: Starting point for users
  • The user is familiar with the experiment
    application needed to perform the analysis (ORCA
    application for CMS)
  • The user knows how to create an executable able to
    run the analysis task (read selected data samples,
    compute derived quantities, take decisions, fill
    histograms, select events, etc.). The executable is
    based on the experiment framework
  • The user has debugged the executable on small data
    samples, on a local computer or on computing
    services (e.g. lxplus at CERN)
  • How to go for larger samples, which can be located
    at any regional centre CMS-wide?
  • The user should not be forced
  • to change anything in the compiled code
  • to change anything in the configuration file for
    ORCA
  • to know where the data samples are located

21
ASAP work and information flow
[Diagram: the job runs on the Worker Node and reports
to a job monitoring directory; ASAP handles job
submission, checking job status, resubmission in case
of failure, fetching results and storing results to
Castor; user credentials are delegated using MyProxy]
Task description supplied by the user: application,
application version, executable, ORCA data cards, data
sample, working directory, Castor directory to save
output, number of events to be processed, number of
events per job, output file location (a configuration
sketch follows)
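To make the task description above concrete, here is a hedged sketch of reading such a configuration with Python's configparser; the section and key names are hypothetical and the real ASAP syntax may differ.

```python
# Hypothetical ASAP-style task description; the key names mirror the list above
# but are illustrative only, and the real ASAP configuration syntax may differ.
import configparser, io

example_cfg = """
[task]
application        = ORCA
version            = 8.7.1
executable         = myAnalysis
orca_data_cards    = .orcarc
data_sample        = example_dataset
working_directory  = /afs/cern.ch/user/x/xyz/asap_work
castor_directory   = /castor/cern.ch/user/x/xyz/output
events_total       = 100000
events_per_job     = 5000
"""

cfg = configparser.ConfigParser()
cfg.read_file(io.StringIO(example_cfg))
task = cfg["task"]
n_jobs = -(-task.getint("events_total") // task.getint("events_per_job"))  # ceiling
print("Split into", n_jobs, "sub-jobs of", task.getint("events_per_job"), "events each")
```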
22
Job Monitoring
  • ASAP Monitor

23
Merging the results
24
H → 2τ → 2j analysis: background data available (all
signal events processed with ARDA)
A. Nikitenko (CMS)
25
Higgs boson mass (Mττ) reconstruction
The Higgs boson mass was reconstructed after basic
off-line cuts: reco ET(τ jet) > 60 GeV, ETmiss >
40 GeV. The Mττ evaluation is shown for the
consecutive cuts pτ > 0 GeV/c, pτ > 0 GeV/c,
Δφ(j1,j2) < 175°.
σ(MH) ∝ σ(ETmiss) / sin(Δφ(j1,j2))
Mττ and σ(Mττ) are in very good agreement with the
old results (CMS Note 2001/040, Table 3): Mττ = 455
GeV/c², σ(Mττ) = 77 GeV/c². ORCA4, Spring 2000
production.
A. Nikitenko (CMS)
26
ARDA ASAP
  • First users were able to process their data on
    gLite
  • The work of these pilot users can be regarded as a
    first round of validation of the gLite middleware
    and of the analysis prototypes
  • The number of users should increase as soon as the
    pre-production system becomes available
  • Interest in having CPUs at the centres where the
    data sit (LHC Tier-1s)
  • To enable user analysis on the Grid
  • we will continue to work in close collaboration
    with the physics community and the gLite developers
  • ensuring a good level of communication between them
  • providing constant feedback to the gLite
    development team
  • Key factors to progress
  • Increasing number of users
  • Larger distributed systems
  • More middleware components

27
ARDA Feedback (gLite middleware)
  • 2004
  • Prototype available (CERN and Madison, Wisconsin)
  • A lot of activity (4 experiments' prototypes)
  • Main limitation: size
  • Experiments' data available!
  • Just a handful of worker nodes
  • 2005
  • Coherent move to prepare a gLite package to be
    deployed on the pre-production service
  • ARDA contribution
  • Mentoring and tutorials
  • Actual tests!
  • A lot of testing during 2005 Q1
  • PreProduction Service is about to start!

28
WMS monitor
29
Data Management
  • Central component together with the WMS
  • Early tests started in 2004
  • Two main components
  • gLiteIO (protocol server to access the data)
  • FiReMan (file catalogue)
  • The two components are not isolated: for example,
    gLiteIO uses the ACLs recorded in FiReMan, and
    FiReMan exposes the physical location of files so
    that the WMS can optimise job submission
  • Both LFC and FiReMan offer large improvements
    over RLS
  • LFC is the most recent LCG2 catalogue
  • Still some issues remaining
  • Scalability of FiReMan
  • Bulk Entry for LFC missing
  • More work needed to understand performance and
    bottlenecks
  • Need to test some real Use Cases
  • In general, the validation of DM tools takes
    time!

30
FiReMan Performance - Queries
  • Query Rate for an LFN

31
FiReMan Performance - Queries
  • Comparison with LFC

32
More data coming: C. Munro (ARDA and Brunel Univ.)
at ACAT 05
33
Summary of gLite usage and testing
  • Info available also under
    http://lcg.web.cern.ch/lcg/PEB/arda/LCG_ARDA_Glite.htm
  • gLite version 1
  • WMS
  • Continuous monitor available on the web (active
    since 17th of February)
  • Concurrency tests
  • Usage with ATLAS and CMS jobs (Using Storage
    Index)
  • Good improvements observed
  • DMS (FiReMan + gLiteIO)
  • Early usage and feedback (since Nov 04) on
    functionality, performance and usability
  • Considerable improvement in performance/stability
    observed since then
  • Some of the tests given to the development team
    for tuning; most of the tests given to JRA1 to be
    used in the testing suite
  • Performance/stability measurements: heavy-duty
    testing needed for real validation
  • Contribution to the common testing effort to
    finalise gLite 1 (with SA1, JRA1 and NA4 testing)
  • Migration of certification tests within the
    certification test suite (LCG → gLite)
  • Comparison between LFC (LCG) and FiReMan
  • Mini tutorial to facilitate the usage of gLite
    within the NA4 testing

34
Metadata services on the Grid
  • gLite has provided a prototype for the EGEE
    Biomed community (in 2004)
  • Requirements in ARDA (HEP) were not all satisfied
    by that early version
  • ARDA preparatory work
  • Stress testing of the existing experiment
    metadata catalogues
  • Existing implementations were shown to share
    similar problems
  • ARDA technology investigation
  • In parallel, the usage of extended file attributes
    in modern file systems (NTFS, NFS, EXT2/3 on SLC3,
    ReiserFS, JFS, XFS) was analysed
  • a sound POSIX standard exists! (see the sketch
    after this list)
  • Prototype activity in ARDA
  • Discussions in LCG, EGEE and the UK GridPP
    Metadata group
  • Synthesis
  • A new interface which will be maintained by EGEE,
    benefiting from the activity in ARDA (tests and
    benchmarking of different databases and direct
    collaboration with LHCb/GridPP)
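For illustration, the extended-attribute idea can be tried directly from Python on a Linux file system that supports user xattrs; the attribute names are arbitrary and this is not the interface of the ARDA prototype.

```python
# Sketch: storing per-file metadata as POSIX extended attributes (Linux only;
# the file system must support user xattrs, e.g. ext3/ext4 with user_xattr).
# Attribute names are illustrative; this is not the ARDA prototype interface.
import os

path = "xattr_demo.tmp"
open(path, "w").close()

os.setxattr(path, "user.run_number", b"12345")
os.setxattr(path, "user.data_type", b"AOD")

for attr in os.listxattr(path):
    print(attr, "=", os.getxattr(path, attr).decode())

os.remove(path)
```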

35
ARDA Implementation
  • Prototype
  • Validate our ideas and expose a concrete example
    to interested parties
  • Multiple back ends
  • Currently Oracle, PostgreSQL, SQLite
  • Dual front ends
  • TCP Streaming
  • Chosen for performance
  • SOAP
  • Formal requirement of EGEE
  • Compare SOAP with TCP Streaming
  • Also implemented as standalone Python library
  • Data stored on the file system

36
Dual Front End
  • Text-based protocol
  • Data streamed to the client in a single connection
  • Implementations
  • Server: C++, multiprocess
  • Clients: C++, Java, Python, Perl, Ruby
  • Most operations are SOAP calls
  • Based on iterators
  • A session is created
  • The initial chunk of data and a session token are
    returned
  • For subsequent requests the client calls
    nextQuery() using the session token (a client-side
    sketch follows this list)
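A hedged, client-side sketch of the iterator pattern just described; only nextQuery() is taken from the slide, while executeQuery(), the token handling and the chunk fields are assumptions, and the stand-in service only mimics the behaviour of the front end.

```python
# Client-side sketch of the iterator-based queries described above. Only
# nextQuery() is named on the slide; executeQuery(), the session token and the
# chunk fields are assumptions, and the service class below only mimics the
# behaviour of the SOAP front end.
class FakeMetadataService:
    """Stand-in for the metadata front end: serves results in fixed-size chunks."""
    def __init__(self, rows, chunk=2):
        self.rows, self.chunk, self.pos = rows, chunk, 0

    def executeQuery(self, query):            # assumed call: opens a session
        self.pos = 0
        first = self.nextQuery("session-1")
        first["session"] = "session-1"
        return first

    def nextQuery(self, token):               # call named on the slide
        out = {"rows": self.rows[self.pos:self.pos + self.chunk]}
        self.pos += self.chunk
        out["eof"] = self.pos >= len(self.rows)
        return out

def fetch_all(service, query):
    """Collect all rows by iterating until the server signals end-of-data."""
    chunk = service.executeQuery(query)
    token = chunk["session"]
    rows = list(chunk["rows"])
    while not chunk["eof"]:
        chunk = service.nextQuery(token)
        rows.extend(chunk["rows"])
    return rows

svc = FakeMetadataService(["f1.root", "f2.root", "f3.root", "f4.root", "f5.root"])
print(fetch_all(svc, "SELECT lfn WHERE run=12345"))
```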

37
More data coming: N. Santos (ARDA and Coimbra Univ.)
at ACAT 05
  • Test protocol performance
  • No work done on the backend
  • Switched 100 Mbit/s LAN
  • Language comparison
  • TCP-S with similar performance in all languages
  • SOAP performance varies strongly with toolkit
  • Protocols comparison
  • Keepalive improves performance significantly
  • On Java and Python, SOAP is several times slower
    than TCP-S
  • Measure scalability of protocols
  • Switched 100 Mbit/s LAN
  • TCP-S 3x faster than gSOAP (with keepalive)
  • Poor performance without keepalive
  • Around 1,000 ops/sec (both gSOAP and TCP-S; see
    the timing sketch below)
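The ops/sec figures above correspond to measurements of the kind sketched below; this is a generic timing loop, where the operation passed in would be a single protocol call in the real test.

```python
# Generic sketch of the kind of throughput figure quoted above: time N
# repetitions of one protocol operation and report operations per second.
# The operation passed in would be a single catalogue call in the real test.
import time

def ops_per_second(operation, n=1000):
    start = time.perf_counter()
    for _ in range(n):
        operation()
    return n / (time.perf_counter() - start)

# e.g. ops_per_second(lambda: svc.nextQuery("session-1")) against a real service
print(round(ops_per_second(lambda: sum(range(100)))))
```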

38
Current Uses of the ARDA Metadata prototype
  • Evaluated by LHCb bookkeeping
  • Migrated bookkeeping metadata to ARDA prototype
  • 20M entries, 15 GB
  • Feedback valuable in improving interface and
    fixing bugs
  • Interface found to be complete
  • ARDA prototype showing good scalability
  • Ganga (LHCb, ATLAS)
  • User analysis job management system
  • Stores job status on ARDA prototype
  • Highly dynamic metadata
  • Discussed within the community
  • EGEE
  • UK GridPP Metadata group

39
ARDA workshops and related activities
  • ARDA workshop (January 2004 at CERN; open)
  • ARDA workshop (June 21-23 at CERN; by invitation)
  • The first 30 days of EGEE middleware
  • NA4 meeting (15 July 2004 in Catania; EGEE open
    event)
  • ARDA workshop (October 20-22 at CERN; open)
  • LCG ARDA Prototypes
  • Joint session with OSG
  • NA4 meeting, 24 November (EGEE conference in Den
    Haag)
  • ARDA workshop (March 7-8 2005 at CERN; open)
  • ARDA workshop (October 2005, together with the LCG
    Service Challenges)
  • Wednesday afternoon meetings started in 2005
  • Presentations from experts and discussion (not
    necessarily from ARDA people)
  • Available from http://arda.cern.ch

40
Conclusions (1/3)
  • ARDA has been set up to
  • Enable distributed HEP analysis on gLite
  • Contacts have been established
  • With the experiments
  • With the middleware developers
  • Experiment activities are progressing rapidly
  • Prototypes for ALICE, ATLAS, CMS and LHCb
  • Complementary aspects are studied
  • Good interaction with the experiments' environments
  • Always seeking users!!!
  • People more interested in physics than in
    middleware: we support them!
  • 2005 will be the key year (gLite version 1 is
    becoming available on the pre-production service)

41
Conclusions (2/3)
  • ARDA provides special feedback to the development
    team
  • First use of components (e.g. gLite prototype
    activity)
  • Try to run real-life HEP applications
  • Dedicated studies offer complementary information
  • Experiment-related ARDA activities produce
    elements of general use
  • Very important by-product
  • Examples
  • Shell access (originally developed in ALICE/ARDA)
  • Metadata catalog (proposed and under test in
    LHCb/ARDA)
  • (Pseudo)-interactivity experience (something
    in/from all experiments)

42
Conclusions (3/3)
  • ARDA is a privileged observatory to follow,
    contribute to and influence the evolution of HEP
    analysis
  • Analysis prototypes are a good idea!
  • Technically, they complement the data challenges
    experience
  • Key point: these systems are exposed to users
  • The approach of 4 parallel lines is not too
    inefficient
  • Contributions in the experiments from day zero
  • Difficult environment
  • Commonality cannot be imposed
  • We could do better in keeping good connection
    with OSG
  • How?

43
Outlook
  • Commonality is a very tempting concept, indeed
  • Sometimes a bit fuzzy, maybe
  • Maybe it is becoming more important
  • Lot of experience in the whole community!
  • Baseline services ideas
  • LHC schedule: physics is coming!
  • Maybe it is emerging (examples are not
    exhaustive)
  • Interactivity is a genuine requirement, e.g.
    PROOF and DIANE
  • Toolkits for the users to build applications on
    top of the computing infrastructure, e.g. GANGA
  • Metadata/workflow systems open to the users
  • Monitoring and discovery services open to users,
    e.g. MonAlisa in ASAP
  • Strong preference for an a posteriori approach
  • All experiments still need their system
  • Keep on being pragmatic

44
People
  • Massimo Lamanna
  • Frank Harris (EGEE NA4)
  • Birger Koblitz
  • Andrey Demichev
  • Viktor Pose
  • Victor Galaktionov
  • Derek Feichtinger
  • Andreas Peters
  • Hurng-Chun Lee
  • Dietrich Liko
  • Frederik Orellana
  • Tao-Sheng Chen
  • Julia Andreeva
  • Juha Herrala
  • Alex Berejnoi
  • 2 PhD students
  • Craig Munro (Brunel Univ.): distributed analysis
    within CMS, working mainly with Julia
  • Nuno Santos (Coimbra Univ.): metadata and resilient
    computing, working mainly with Birger
  • Catalin Cirstoiu and Slawomir Biegluk (short-term
    LCG visitors)

Good collaboration with EGEE/LCG Russian
institutes and with ASCC Taipei