ARDA Reports to the LCG SC2
Transcript and Presenter's Notes

1
ARDA Reports to the LCG SC2
  • L.A.T.Bauerdick
  • for the RTAG-11/ARDA group

Architectural Roadmap towards Distributed Analysis
2
ARDA Mandate

3
ARDA Schedule and Makeup

  • ALICE: Fons Rademakers and Predrag Buncic
  • ATLAS: Roger Jones and Rob Gardner
  • CMS: Lothar Bauerdick and Lucia Silvestris
  • LHCb: Philippe Charpentier and Andrei
    Tsaregorodtsev
  • LCG GTA: David Foster (stand-in: Massimo Lamanna)
  • LCG AA: Torre Wenaus
  • GAG: Federico Carminati

4
ARDA mode of operation
  • Thanks to an excellent committee -- broad
    expertise, agility and responsiveness, very
    constructive and open-minded, and willing to
    sacrifice quite a bit of the summer
  • Series of weekly meetings July and August,
    mini-workshop in September
  • Invited talks from existing experiment
    projects:
  • Summary of Caltech GAE workshop (Torre)
  • PROOF (Fons)
  • AliEn (Predrag)
  • DIAL (David Adams)
  • GAE and Clarens (Conrad Steenberg)
  • Ganga (Pere Mato)
  • Dirac (Andrei)
  • Cross-check of the emerging ARDA decomposition
    of services with other projects:
  • Magda, DIAL -- Torre, Rob
  • EDG, NorduGrid -- Andrei, Massimo
  • SAM, MCRunjob -- Roger, Lothar
  • BOSS, MCRunjob -- Lucia, Lothar
  • Clarens, GAE -- Lucia, Lothar
  • Ganga -- Rob, Torre
  • PROOF -- Fons
  • AliEn -- Predrag

5
Initial Picture of Distributed Analysis (Torre,
Caltech workshop)
6
HEPCAL-II Analysis Use Cases
  • Scenarios based on GAG HEPCAL-II report
  • Determine datasets and possibly event
    components
  • Input data are selected via a query to a metadata
    catalogue
  • Perform iterative analysis activity
  • Selection and algorithm are passed to a workload
    management system, together with a specification
    of the execution environment
  • Algorithms are executed on one or many nodes
  • User monitors progress of job execution
  • Results are gathered together and passed back to
    the job owner
  • Resulting datasets can be published to be
    accessible to other users
  • Specific requirements from HEPCAL-II
  • Job traceability, provenance, logbooks
  • Also discussed: support for finer-grained access
    control and for sharing data within physics
    groups

7
Analysis Scenario
  • This scenario represents the analysis activity
    from the user perspective. However, some other
    actions are done behind the scenes of the user
    interface
  • To carry out analysis tasks, users access shared
    computing resources. To do so, they must be
    registered with their Virtual Organization (VO)
    and authenticated, and their actions must be
    authorized according to their roles within the VO
  • The user specifies the necessary execution
    environment (software packages, databases, system
    requirements, etc.) and the system ensures it on
    the execution node. In particular, the necessary
    environment can be installed according to the
    needs of a particular job
  • The execution of the user job may trigger
    transfers of various datasets between a user
    interface computer, execution nodes and storage
    elements. These transfers are transparent to the
    user

8
Example Asynchronous Analysis
  • Running Grid-based analysis from inside ROOT
    (adapted from AliEn example)
  • ROOT calling the ARDA API from the command prompt
  • // connect / authenticate to the Grid service "arda" as user "lucia"
  • TGrid *arda = TGrid::Connect("arda", "lucia", "", "");
  • // create a new analysis object (<unique ID>, <title>, subjobs)
  • TArdaAnalysis *analysis = new TArdaAnalysis("pass001", "MyAnalysis", 10);
  • // set the program which executes the analysis macro/script
  • analysis->Exec("ArdaRoot.sh", "file:/home/vincenzo/test.C"); // script to execute
  • // set up the event metadata query
  • analysis->Query("2003-09/V6.08.Rev.04/00110/gjetmet.root?pt>0.2");
  • // specify job splitting and run
  • analysis->OutputFileAutoMerge(true); // merge all produced .root files
  • analysis->Split();      // split the task into subjobs
  • analysis->Run();        // submit all subjobs to the ARDA queue
  • // asynchronously, at any time, get the (partial or complete) results
  • analysis->GetResults(); // download partial/final results and merge them
  • analysis->Info();       // display job information

9
Asynchronous Analysis Model
  • Extract a subset of the datasets from the virtual
    file catalogue using metadata conditions provided
    by the user.
  • Split the task according to the location of the
    datasets (see the sketch after this list).
  • A trade-off has to be found between best use of
    available resources and minimal data movements.
    Ideally jobs should be executed where the data
    are stored. Since one cannot expect a uniform
    storage location distribution for every subset of
    data, the analysis framework has to negotiate
    with dedicated Grid services the balancing
    between local data access and data replication.
  • Spawn sub-jobs and submit to Workload Management
    with precise job descriptions
  • The user can monitor the results while and after
    the data are processed
  • Collect and Merge available results from all
    terminated sub-jobs on request
  • Analysis objects associated with the analysis
    task remain persistent in the Grid environment,
    so the user can go offline and reload an analysis
    task at a later date, check the status, merge
    current results or resubmit the same task with
    modified analysis code.
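
A minimal sketch of the splitting step above, assuming a (hypothetical)
catalogue query that returns, for each logical file, the storage element
holding a replica; the type and function names are illustrative only, not
part of any agreed ARDA interface.

    #include <map>
    #include <string>
    #include <vector>

    struct FileReplica {
        std::string lfn;             // logical file name from the catalogue
        std::string storageElement;  // site holding a physical replica
    };

    // One sub-job per storage element: jobs run where the data are, and the
    // broker falls back to replication only where that clearly pays off.
    std::map<std::string, std::vector<std::string>>
    SplitByLocation(const std::vector<FileReplica>& replicas)
    {
        std::map<std::string, std::vector<std::string>> subJobs;
        for (const auto& r : replicas)
            subJobs[r.storageElement].push_back(r.lfn);
        return subJobs;  // key = storage element, value = its input LFNs
    }

Each sub-job is then submitted to Workload Management with its own input
list; partial outputs can be merged whenever the user asks for results.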

10
Synchronous Analysis
  • Scenario using PROOF in the Grid environment
  • Parallel ROOT Facility, main developer Maarten
    Ballintjin/MIT
  • PROOF already provides a ROOT-based framework to
    use (local) cluster computing resources
  • dynamically balancing the workload, with the goal
    of optimizing CPU exploitation and minimizing
    data transfers
  • makes use of the inherent parallelism in event
    data
  • works in heterogeneous clusters with distributed
    storage
  • Extend this to the Grid using interactive
    analysis services, which could be based on the
    ARDA services (see the sketch below)
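
A minimal sketch of driving a PROOF session from ROOT on a cluster; the
master URL, file names, tree name and selector are placeholders, and the
coupling to the ARDA services is not specified here.

    #include "TProof.h"
    #include "TDSet.h"

    void proof_sketch()
    {
        // open a session against a PROOF master (placeholder host name)
        TProof *proof = TProof::Open("proofmaster.example.org");

        // describe the input: files containing a TTree called "Events"
        TDSet *dataset = new TDSet("TTree", "Events");
        dataset->Add("root://se.example.org//store/gjetmet_001.root");
        dataset->Add("root://se.example.org//store/gjetmet_002.root");

        // process in parallel with a user TSelector; PROOF packetizes the
        // events and balances the load dynamically across the workers
        proof->Process(dataset, "MySelector.C+");
    }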

11
ARDA Roadmap Informed By DA Implementations
  • Following SC2 advice, reviewed major existing DA
    projects
  • Clearly AliEn today provides the most complete
    implementation of distributed analysis services;
    it is fully functional and also interfaces to
    PROOF
  • Implements the major HEPCAL-II use cases
  • Presents a clean API to experiment applications,
    Web portals, etc.
  • Should address most requirements for upcoming
    experiments' physics studies
  • Existing and fully functional interface to
    complete analysis package --- ROOT
  • Interface to PROOF cluster-based interactive
    analysis system
  • Interfaces to any other system are well defined
    and certainly feasible
  • Based on Web-services, with global (federated)
    database to give state and persistency to the
    system
  • ARDA approach
  • Re-factor AliEn, using the experience of the
    other projects, to generalize it into an
    architecture; consider OGSI as a natural
    foundation for that
  • Confront ARDA services with existing projects
    (notably EDG, SAM, Dirac, etc.)
  • Synthesize service definitions, specifying their
    contracts and behavior
  • Blueprint for initial distributed analysis
    service infrastructure

12
ARDA Distributed Analysis Services
  • Distributed Analysis in a Grid Services based
    architecture
  • ARDA Services should be OGSI compliant -- built
    upon OGSI middleware
  • Frameworks and applications use the ARDA API,
    with bindings to C, Java, Python, Perl, ...
  • interface through a UI/API factory --
    authentication, persistent session
  • Fabric interface to resources through CE and SE
    services
  • job description language based on Condor
    ClassAds and matchmaking
  • Database(s), accessed through a database proxy,
    provide statefulness and persistence
  • We arrived at a decomposition into the following
    key services:
  • API and User Interface
  • Authentication, Authorization, Accounting and
    Auditing services
  • Workload Management and Data Management services
  • File and (event) Metadata Catalogues
  • Information service
  • Grid and Job Monitoring services
  • Storage Element and Computing Element services
  • Package Manager and Job Provenance services

13
AliEn (re-factored)

14
ARDA Key Services for Distributed Analysis
15
ARDA HEPCAL matching: an example
  • HEPCAL-II Use Case: Group Level Analysis (GLA)
  • User specifies job information, including
  • Selection criteria
  • Metadata Dataset (input)
  • Information about s/w (library) and configuration
    versions
  • Output AOD and/or TAG Dataset (typical)
  • Program to be run
  • User submits job
  • Program is run
  • Selection Criteria are used for a query on the
    Metadata Dataset
  • Event ID satisfying the selection criteria and
    Logical Dataset Name of corresponding Datasets
    are retrieved
  • Input Datasets are accessed
  • Events are read
  • Algorithm (program) is applied to the events
  • Output Datasets are uploaded
  • Experiment Metadata is updated
  • A report summarizing the output of the jobs is
    prepared for the group (e.g. how many events went
    to which stream, ...), extracting the information
    from the application and the Grid middleware
  • ARDA services exercised along the way (in order
    of use; some appear more than once):
  • Authentication
  • Authorization
  • Metadata catalog
  • Workload management
  • Metadata catalog
  • Package manager
  • Compute element
  • File catalog
  • Data management
  • Storage Element
  • Metadata catalog
  • Job provenance
  • Auditing

16
API to Grid services
  • ARDA services present an API, called by
    applications such as the experiments' frameworks,
    interactive analysis packages, Grid portals, Grid
    shells, etc.
  • In particular, note the importance of the UI/API
  • Interfaces services to higher-level software:
  • Experiment frameworks
  • Analysis shells, e.g. ROOT
  • Grid portals and other forms of user interaction
    with the environment
  • Advanced services e.g. virtual data, analysis
    logbooks etc
  • Provide experiment-specific services
  • Data and Metadata management systems
  • Provide an API that others can program against
  • Benefits of a common API to the frameworks
  • Goes beyond traditional UIs à la GANGA, Grid
    portals, etc
  • Benefits in interfacing to analysis applications
    like ROOT et al
  • Process to get a common API between experiments
    --> prototype
  • The UI/API can use Condor ClassAds as a Job
    Description Language (see the sketch below)
  • This will maintain compatibility with existing
    job execution services, in particular LCG-1.
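
Illustrative only: the class names TArdaSession and TArdaJob below are
invented for this sketch (the actual ARDA API was still to be defined), and
the JDL text follows the Condor-ClassAd-style syntax already used by
EDG/LCG-1.

    #include "TString.h"

    void submit_sketch()
    {
        // ClassAd-style job description, as used by the EDG/LCG-1 broker
        TString jdl =
            "Executable    = \"ArdaRoot.sh\";\n"
            "Arguments     = \"test.C\";\n"
            "InputSandbox  = {\"ArdaRoot.sh\", \"test.C\"};\n"
            "OutputSandbox = {\"std.out\", \"std.err\", \"histos.root\"};\n"
            "Requirements  = other.GlueCEStateStatus == \"Production\";\n";

        // Hypothetical calls, sketching the UI/API factory idea:
        // TArdaSession *s   = TArdaSession::Connect("arda", "lucia"); // authenticate once
        // TArdaJob     *job = s->Submit(jdl);                         // matchmaking + submission
        // job->Status();                                              // monitoring
    }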

17
API and User Interface

18
File Catalogue and Data Management
  • Input and output associated with any job can be
    registered in the VOs File Catalogue, a virtual
    file system in which a logical name is assigned
    to a file.
  • Unlike real file systems, the File Catalogue does
    not own the files; it only keeps an association
    between a Logical File Name (LFN) and (possibly
    more than one) Physical File Name (PFN) on a real
    file or mass storage system. PFNs describe the
    physical location of the files and include the
    name of the Storage Element and the path to the
    local file (see the sketch after this list).
  • The system should support file replication and
    caching and will use file location information
    when it comes to scheduling jobs for execution.
  • The directories and files in the File Catalogue
    have privileges for owner, group and world. This
    means that every user can have exclusive read and
    write privileges for their portion of the logical
    file namespace (home directory).
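
A minimal data-structure sketch of the LFN-to-PFN association described
above; the names are illustrative and this is not the catalogue's actual
schema.

    #include <map>
    #include <string>
    #include <vector>

    // One physical replica of a file: which Storage Element holds it and
    // where it sits inside that SE's namespace.
    struct PhysicalFileName {
        std::string storageElement;  // e.g. "se.example.org" (placeholder)
        std::string path;            // path to the local file on that SE
    };

    // The catalogue maps each logical name to all of its replicas; it owns
    // only this association, never the file contents themselves.
    using FileCatalogue =
        std::map<std::string /*LFN*/, std::vector<PhysicalFileName>>;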

19
Job Provenance service
  • The File Catalogue is extended to include
    information about running processes in the system
    (in analogy with the /proc directory on Linux
    systems) and to support virtual data services
  • Each job sent for execution gets a unique id and
    a corresponding /proc/id directory where it can
    register temporary files, standard input and
    output, as well as all job products. In a typical
    production scenario, the job products are renamed
    and registered in their final destination in the
    File Catalogue only after a separate process has
    verified the output. The entries (LFNs) in the
    File Catalogue have an immutable unique file id
    attribute that is required to support long
    references (for instance in ROOT) and symbolic
    links (see the sketch below).
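
A brief sketch of the two ideas above -- the /proc-style per-job area and
the immutable file id carried by every catalogue entry; all names here are
illustrative, not a defined ARDA convention.

    #include <string>

    struct CatalogueEntry {
        const std::string fileId;  // immutable unique id, never reused, so
                                   // long references and links stay valid
        std::string       lfn;     // may change when the output is renamed
    };

    std::string JobScratchDir(const std::string& jobId)
    {
        // a running job registers temporary files, stdin/stdout and its
        // products here until a separate process verifies the output
        return "/proc/" + jobId;
    }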

20
Package Manager Service
  • Allows dynamic installation of application
    software released by the VO (e.g. the experiment
    or a physics group).
  • Each VO can provide the Packages and Commands
    that can be subsequently executed. Once the
    corresponding files with bundled executables and
    libraries are published in the File Catalogue and
    registered, the Package Manager will install them
    automatically as soon as a job becomes eligible
    to run on a site whose policy accepts these jobs.
  • While installing a package in the shared package
    repository, the Package Manager will resolve its
    dependencies on other packages and, taking into
    account package versions, install them as well
    (see the sketch after this list). This means that
    old versions of packages can be safely removed
    from the shared repository and, if they are
    needed again at some later point, they will be
    re-installed automatically by the system. This
    provides a convenient and automated way to
    distribute the experiment-specific software
    across the Grid and assures accountability in the
    long term.
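
Illustrative only: a minimal sketch of the recursive dependency resolution
described above, with "installing" reduced to adding a package name to a
set; versions are folded into the names and all identifiers are invented
for this sketch.

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    using DependencyMap = std::map<std::string, std::vector<std::string>>;

    void InstallWithDependencies(const std::string& pkg,
                                 const DependencyMap& deps,
                                 std::set<std::string>& installed)
    {
        if (installed.count(pkg)) return;        // already in the repository
        auto it = deps.find(pkg);
        if (it != deps.end())
            for (const auto& dep : it->second)   // prerequisites first
                InstallWithDependencies(dep, deps, installed);
        installed.insert(pkg);                   // then the package itself
    }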

21
Computing Element
  • The Computing Element is a service representing a
    computing resource. Its interface should allow
    submission of a job to be executed on the
    underlying computing facility, access to the job
    status information, as well as high-level job
    manipulation commands. The interface should also
    provide access to the dynamic status of the
    computing resource, such as its available
    capacity, load, and the number of waiting and
    running jobs (see the sketch below).
  • This service should be available on a per-VO
    basis.
  • Etc.
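
A minimal sketch of the Computing Element contract just described; the type
and method names are illustrative, not an agreed ARDA interface.

    #include <string>

    struct CEStatus {
        int    availableSlots;  // free capacity
        int    runningJobs;
        int    waitingJobs;
        double load;
    };

    class ComputingElement {
    public:
        virtual ~ComputingElement() = default;
        // submit a job described by a ClassAd-style JDL; returns a job id
        virtual std::string Submit(const std::string& jdl) = 0;
        // query and manipulate individual jobs
        virtual std::string JobStatus(const std::string& jobId) = 0;
        virtual void        Cancel(const std::string& jobId) = 0;
        // dynamic state of the underlying resource
        virtual CEStatus    Status() const = 0;
    };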

22
Talking Points
  • Horizontally structured system of services with a
    well-defined API and a database backend
  • Can easily be extended with additional services,
    new implementations can be moved in, alternative
    approaches tested and commissioned
  • Interface to LCG-1 infrastructure
  • VDT/EDG interface through CE, SE and the use of
    JDL, compatible with the existing infrastructure
  • ARDA VO services can build on the emerging VO
    management infrastructure
  • ARDA initially looked at file-based datasets, not
    object collections
  • Talk with POOL about how to extend the file
    concept to a more generic collection concept
  • Investigate the experiments' metadata/file
    catalogue interaction
  • VO system and site security
  • Jobs are executed on behalf of the VO; however,
    users are fully traceable
  • How do policies get implemented, e.g. analysis
    priorities, MoU contributions, etc.?
  • Auditing and accounting system; priorities
    through special optimizers
  • Accounting of site contributions, which depends
    on what resources sites expose
  • Database backend for the prototype
  • Address latency, stability and scalability issues
    up-front; good experience exists
  • In a sense, the system is the database (possibly
    federated and distributed) that contains all
    there is to know about all jobs, files, metadata,
    algorithms of all users within a VO
  • A set of OGSI grid services provides
    windows/views into the database, while the
    API provides user access
  • This allows structuring into federated grids and
    dynamic workspaces

23
General ARDA Roadmap
  • Emerging picture of waypoints on the ARDA
    roadmap
  • ARDA RTAG report
  • review of existing projects, common architecture,
    component decomposition, re-factoring
  • recommendations for a prototypical architecture
    and definition of prototypical functionality and
    a development strategy
  • Development of a prototype and first release
  • Integration with and deployment on LCG-1
    resources and services
  • Re-engineering of prototypical ARDA services, as
    required
  • OGSI gives a framework in which to run ARDA
    services
  • Addresses architecture
  • Provides framework for advanced interactions with
    the Grid
  • Need to address issues of OGSI performance and
    scalability up-front
  • Importance of modeling, plan for scaling up,
    engineering of underlying services infrastructure

24
Roadmap to a Grid Services Architecture for the LHC
  • Transition to grid services explicitly addressed
    in several existing projects
  • Clarens and Caltech GAE, MonALISA
  • Based on web services for communication,
    Jini-based agent architecture
  • Dirac
  • Based on intelligent agents working within
    batch environments
  • AliEn
  • Based on web services and communication to
    distributed database backend
  • DIAL
  • OGSA interfaces
  • Initial work on OGSA within LCG-GTA
  • GT3 prototyping
  • Leverage experience gained in Grid middleware
    R&D projects

25
ARDA Roadmap for Prototype
  • No evolutionary path from GT2-based grids
  • Recommendation: build a prototype early, based on
    re-factoring existing implementations
  • Prototype provides the initial blueprint
  • Do not aim for a full specification of all the
    interfaces
  • Four-pronged approach:
  • Re-factoring of AliEn, Dirac and possibly other
    services into ARDA
  • Initial release with OGSI::Lite/GT3 proxy,
    consolidation of API, release
  • Implementation of agreed interfaces, testing,
    release
  • GT3 modeling and testing, ev. quality assurance
  • Interfacing to LCG-AA software like POOL,
    analysis shells like ROOT
  • Also an opportunity for early interfacing to
    complementary projects
  • Interfacing to experiments' frameworks
  • metadata handlers, experiment specific services
  • Provide interaction points with community

26
Experiments and LCG Involved in Prototyping
  • The ARDA prototype would define the initial set
    of services and their interfaces; timescale:
    spring 2004
  • Important to involve experiments and LCG at the
    right level
  • Initial modeling of GT3-based services
  • Interface to major cross-experiment packages:
    POOL, ROOT, PROOF, and others
  • Program experiment frameworks against ARDA API,
    integrate with experiment environments
  • Expose services and UI/API to other LHC projects
    to allow synergies
  • Spend appropriate effort to document, package,
    release, deploy
  • After the prototype is delivered, improve on it:
  • Scale up and re-engineer as needed: OGSI,
    databases, information services
  • Deployment and interfaces to site and grid
    operations, VO management etc
  • Build higher-level services and experiment
    specific functionality
  • Work on interactive analysis interfaces and new
    functionalities

27
Major Role for Middleware Engineering
  • The ARDA roadmap is based on a well-factored
    prototype implementation that allows evolutionary
    development into a complete system at the full
    LHC scale
  • The ARDA prototype would be pretty lightweight
  • Stability comes from basing the system on a
    global database to which services talk through a
    database proxy
  • people know how to run large databases -- a
    well-founded principle (see e.g. SAM for Run II),
    with many possible migration paths
  • HEP-specific services are, however, based on
    generic OGSI-compliant services
  • Expect the LCG/EGEE middleware effort to play a
    major role in evolving this foundation, its
    concepts and implementation
  • re-casting the (HEP-specific event-data analysis
    oriented) services into more general services,
    from which the ARDA services would be derived
  • addressing major issues like a solid OGSI
    foundation, robustness, resilience, fault
    recovery, operation and debugging
  • Expect US middleware projects to be involved in
    this!

28
Conclusions
  • ARDA is identifying a service-oriented
    architecture and an initial decomposition of the
    services required for distributed analysis
  • Recognize a central role for a Grid API which
    provides a factory of user interfaces for
    experiment frameworks, applications, portals, etc
  • The ARDA prototype would provide a distributed
    physics analysis environment for distributed
    experimental data
  • for experiment-framework-based analysis
  • Cobra, Athena, Gaudi, AliRoot, ...
  • for ROOT-based analysis
  • Interfacing to other analysis packages like JAS,
    event displays like Iguana, grid portals, etc.
    can be implemented easily