ADA: ATLAS Distributed Analysis

1
ADA: ATLAS Distributed Analysis
BNL Technology Meeting
  • David Adams
  • BNL
  • June 7, 2004

2
Contents
  • Analysis model
  • Key features
  • Architecture
  • Status
  • Deliverables
  • Authentication and authorization
  • Service infrastructure
  • AJDL
  • Analysis services
  • Catalogs
  • Data movement
  • Clients
  • Applications
  • Datasets
  • Deployment
  • Monitoring
  • Status summary
  • ARDA
  • Conclusions
  • Other presentations
  • More information
  • Acknowledgements

3
Analysis model
  • User selects
  • Dataset defining the input data
  • Application to process the data
  • Athena, root, paw, ...
  • Task to configure the application
  • E.g. script to define histograms and code to fill
    them
  • User locates an analysis service (or local
    scheduler)
  • Job (dataset, application and task) submitted to
    the service
  • Examine partial and final results
  • Repeat as desired
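The workflow above can be sketched in Python. This is a toy illustration, not DIAL code: `JobSpec` and `ToyService` are hypothetical names, and counting processed events stands in for filling histograms so that the partial results are easy to see.

```python
from dataclasses import dataclass, field

@dataclass
class JobSpec:
    """The user's three selections (hypothetical class, not a DIAL API)."""
    dataset: str      # ID of the input dataset
    application: str  # e.g. "athena" or "root"
    task: str         # e.g. a script that books and fills histograms

@dataclass
class ToyService:
    """Toy analysis service: processes one chunk per update() and reports partial results."""
    chunks: dict                                   # dataset ID -> events per chunk (illustrative)
    _pending: list = field(default_factory=list)
    _done_events: int = 0

    def submit(self, spec: JobSpec) -> None:
        self._pending = list(self.chunks[spec.dataset])
        self._done_events = 0

    def update(self) -> int:
        # Process one more chunk; return the number of events processed so far.
        if self._pending:
            self._done_events += self._pending.pop(0)
        return self._done_events

    def done(self) -> bool:
        return not self._pending

service = ToyService(chunks={"ds1": [100, 250, 75]})
service.submit(JobSpec("ds1", "athena", "fill_histos.py"))
partials = []
while not service.done():
    partials.append(service.update())
print(partials)  # [100, 350, 425]
```

Each call to `update()` plays the role of examining partial results; repeating the loop with a new `JobSpec` corresponds to "repeat as desired".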

4
Key features
  • Collection of high-level web services
  • User entry points
  • Analysis service (job submission and monitoring)
  • Catalog services (repositories, selection, ...)
  • Job management service (long-running jobs)
  • Other services
  • Dataset splitters and mergers
  • Loose coupling allows
  • Migration from reference to sophisticated
    implementations
  • Contributions from independent development teams
  • Common interfaces
  • Selection of most appropriate service at run time
  • One client can be used with many service
    implementations
  • Many clients can access the same service
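Dataset splitters and mergers are what allow one job to be spread over many workers. A minimal sketch, assuming a dataset is just a list of event IDs and a histogram is a mapping from bin label to count; `split_dataset` and `merge_histograms` are hypothetical names.

```python
from collections import Counter

def split_dataset(event_ids, n_parts):
    """Splitter: cut an event list into n_parts contiguous, nearly equal pieces."""
    size, extra = divmod(len(event_ids), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)  # first `extra` parts get one more event
        parts.append(event_ids[start:end])
        start = end
    return parts

def merge_histograms(partials):
    """Merger: combine per-sub-job histograms (bin -> count) by summing bin contents."""
    total = Counter()
    for hist in partials:
        total.update(hist)
    return dict(total)

parts = split_dataset(list(range(10)), 3)
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Each sub-job fills an even/odd histogram; the merger sums them.
hists = [Counter("even" if e % 2 == 0 else "odd" for e in part) for part in parts]
print(merge_histograms(hists))  # {'even': 5, 'odd': 5}
```

Because summing histograms is associative, merging could itself be distributed (merge pairs, then merge the results).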

5
Key features (cont)
  • AJDL Abstract Job Definition Language
  • Used to define the high-level service interfaces
  • Object oriented
  • Application, Task, Dataset, JobPreferences, Job
  • Extensible
  • E.g. EventDataset, AtlasPoolEventDataset,
    RootHistogramDataset,
  • Data component in XML
  • Argument for service invocation
  • Class representation
  • For constructing services and clients
  • C++, Python and maybe Java
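To illustrate the dual representation (XML data plus a class that reads and writes it), here is a Python sketch of an application component. The element and attribute names are assumptions made for illustration and do not reproduce the actual AJDL schema; the values are borrowed from the aodhisto application mentioned later.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Application:
    """Class representation of an AJDL-style component with an XML data form."""
    name: str
    version: str

    def to_xml(self) -> str:
        # Element and attribute names are illustrative, not the DIAL DTD.
        elem = ET.Element("application", name=self.name, version=self.version)
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Application":
        elem = ET.fromstring(text)
        if elem.tag != "application":
            raise ValueError(f"unexpected element <{elem.tag}>")
        return cls(name=elem.get("name"), version=elem.get("version"))

app = Application("aodhisto", "0.90")
xml_text = app.to_xml()  # the string that would be passed as a service argument
print(Application.from_xml(xml_text) == app)  # True
```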

6
Key features (cont)
  • Multiple implementations of the analysis service
  • Production system
  • Take advantage of existing infrastructure
  • Provide capability to individual users
  • Use system as a whole or individual executors
  • DIAL
  • Distributed Interactive Analysis of Large
    datasets
  • Provide interactive response for analysis jobs
  • ARDA
  • Based on the new EGEE GLITE middleware
  • Long term replacement for the above?
  • If performance is the same or better and all
    sites support it
  • Switch
  • To select between the above and create networks
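One way to picture the switch: if every implementation exposes the same interface, a switch can implement that interface too and forward each job. The sketch below assumes a routing rule based on event count; the rule and all class names are hypothetical.

```python
from abc import ABC, abstractmethod

class AnalysisService(ABC):
    """Common interface assumed for every analysis-service implementation."""
    @abstractmethod
    def submit(self, job: dict) -> str: ...

class DialService(AnalysisService):
    def submit(self, job):
        return f"dial:{job['id']}"

class ProductionService(AnalysisService):
    def submit(self, job):
        return f"prod:{job['id']}"

class SwitchService(AnalysisService):
    """Routes each job to one of two services. Because a switch is itself an
    AnalysisService, switches can be nested to build networks of services."""
    def __init__(self, interactive, batch, max_interactive_events):
        self.interactive = interactive
        self.batch = batch
        self.max_interactive_events = max_interactive_events

    def submit(self, job):
        small = job["events"] <= self.max_interactive_events
        return (self.interactive if small else self.batch).submit(job)

switch = SwitchService(DialService(), ProductionService(), max_interactive_events=10_000)
print(switch.submit({"id": "j1", "events": 500}))        # dial:j1
print(switch.submit({"id": "j2", "events": 2_000_000}))  # prod:j2
```

A client written against `AnalysisService` works unchanged whether it talks to one implementation directly or to a switch.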

7
Key features (cont)
  • Catalogs, catalogs, catalogs
  • Repositories
  • Hold XML descriptions of objects indexed by ID
  • E.g. Dataset and job repositories
  • Selection catalogs
  • Provide metadata enabling users to select objects
  • E.g. (virtual) dataset selection catalog
  • Dataset replica catalog
  • Map virtual datasets to one or more concrete
    representations
  • Virtual data catalog
  • Maps application, task and input dataset to
    output dataset
  • Both datasets are virtual
  • Included in dataset selection catalog?
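The catalog roles listed above reduce to a few mappings. A minimal in-memory sketch with hypothetical class and method names; a real deployment would use a database back end (such as AMI or MySQL) rather than Python dictionaries, and the IDs are illustrative.

```python
class Repository:
    """Holds XML descriptions of objects indexed by ID (e.g. datasets or jobs)."""
    def __init__(self):
        self._xml = {}

    def insert(self, obj_id, xml_text):
        if obj_id in self._xml:
            raise KeyError(f"ID already in use: {obj_id}")
        self._xml[obj_id] = xml_text

    def extract(self, obj_id):
        return self._xml[obj_id]

class ReplicaCatalog:
    """Maps a virtual dataset ID to the IDs of its concrete representations."""
    def __init__(self):
        self._replicas = {}

    def add(self, virtual_id, concrete_id):
        self._replicas.setdefault(virtual_id, []).append(concrete_id)

    def replicas(self, virtual_id):
        return list(self._replicas.get(virtual_id, []))

class VirtualDataCatalog:
    """Maps (application, task, input dataset) to the virtual output dataset."""
    def __init__(self):
        self._outputs = {}

    def record(self, app_id, task_id, input_id, output_id):
        self._outputs[(app_id, task_id, input_id)] = output_id

    def lookup(self, app_id, task_id, input_id):
        return self._outputs.get((app_id, task_id, input_id))  # None if never produced

repo = Repository()
repo.insert("ds-1", '<dataset id="ds-1"/>')
drc = ReplicaCatalog()
drc.add("ds-1", "ds-1-bnl")
drc.add("ds-1", "ds-1-cern")
vdc = VirtualDataCatalog()
vdc.record("app-7", "task-3", "ds-1", "ds-2")
print(drc.replicas("ds-1"), vdc.lookup("app-7", "task-3", "ds-1"))  # ['ds-1-bnl', 'ds-1-cern'] ds-2
```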

8
Key features (cont)
  • User interface(s)
  • Use CINT to provide direct access to DIAL/AJDL
    classes from the root command line
  • Use lcgdict to provide Python bindings for these
    classes
  • GANGA uses the latter as the basis for a GUI

9
Architecture
[Architecture diagram; labels: "Work begun", "Release 1.0"]

10
Demo
  • See DIAL web pages to run demo
  • http://www.usatlas.bnl.gov/dladams/dial
  • Runs at BNL and sites with AFS access
  • Distribution kit provided but not robust
  • Steps
  • Select analysis service
  • Choose application, task and input dataset
  • Submit job
  • Monitor and display partial and final results
  • Repeat as desired

11
Status
  • The following slides are taken from
  • ATLAS software workshop
  • May 23-28

12
Authentication and authorization
  • Authentication done with GSI
  • Users get and register grid certificate
  • BNL tier 1 providing list of authorized users
  • From all ATLAS LCG and USATLAS lists
  • Soon updated automatically every 6 hours
  • For now, send me a message when you join and I
    will ask for an update
  • DIAL service uses this list for authorization
  • Interface and implementation by VS
  • Available to others
  • BNL tier 1 will provide authorization web service
  • Summer 2004?
  • DIAL will add implementation of the same
    interface
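The authorization step amounts to checking the authenticated DN against the current list and refreshing that list periodically. A sketch, assuming the list can be fetched by some callable; `AuthorizedUserList` and the example DNs are hypothetical, and the clock is injectable so the refresh can be demonstrated.

```python
import time

class AuthorizedUserList:
    """Authorizes GSI-authenticated users by DN against a periodically refreshed list."""
    REFRESH_SECONDS = 6 * 3600  # the planned 6-hour update interval

    def __init__(self, fetch_dns, clock=time.time):
        self._fetch_dns = fetch_dns  # hypothetical callable returning the current DN list
        self._clock = clock
        self._dns = frozenset()
        self._loaded_at = None

    def is_authorized(self, dn: str) -> bool:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self.REFRESH_SECONDS:
            self._dns = frozenset(self._fetch_dns())
            self._loaded_at = now
        return dn in self._dns

# Simulated clock and list source for illustration.
now = [0.0]
snapshots = [["/O=Grid/CN=Alice"], ["/O=Grid/CN=Alice", "/O=Grid/CN=Bob"]]
auth = AuthorizedUserList(lambda: snapshots.pop(0), clock=lambda: now[0])
print(auth.is_authorized("/O=Grid/CN=Bob"))  # False: not on the first list
now[0] = 6 * 3600
print(auth.is_authorized("/O=Grid/CN=Bob"))  # True: list refreshed
```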

13
Authentication and authorization (cont)
  • Like to add user IDs
  • Long-term, beyond the lifetime of DNs
    (distinguished names) and CAs (certificate
    authorities)
  • Need catalog to map DNs to IDs
  • For 1.0?
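A sketch of the proposed DN-to-ID catalog, assuming one user ID may be attached to several DNs over time (for example after moving to a new CA). The class and DNs are hypothetical.

```python
import itertools

class UserIdCatalog:
    """Maps certificate DNs to long-lived user IDs, so an ID survives a new DN or CA."""
    def __init__(self):
        self._id_by_dn = {}
        self._next_id = itertools.count(1)

    def register(self, dn, user_id=None):
        # New user: allocate an ID. Existing user with a new certificate: pass the old ID.
        if dn in self._id_by_dn:
            raise KeyError(f"DN already registered: {dn}")
        if user_id is None:
            user_id = next(self._next_id)
        self._id_by_dn[dn] = user_id
        return user_id

    def lookup(self, dn):
        return self._id_by_dn.get(dn)

catalog = UserIdCatalog()
uid = catalog.register("/O=OldCA/CN=Jane Doe")
catalog.register("/O=NewCA/CN=Jane Doe", user_id=uid)  # same person, new CA
print(catalog.lookup("/O=NewCA/CN=Jane Doe") == uid)  # True
```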

14
Service infrastructure
  • For now, persistent standalone services
  • No OGSA (OGSI, WSRF, ...)
  • No UDDI
  • But we would like to have multiple service
    instances in 1.0
  • Expect scaling problems later
  • More service instances for interactive jobs
  • Use production system for long-running jobs

15
AJDL
  • Abstract Job Description Language
  • Components used to define high-level interfaces
  • Dataset, transformation, job and job preferences
  • Both data and class to interpret and manipulate
    data
  • Data in XML
  • Classes in C++ and Python, later perhaps Java
  • Component has generic interface but is extensible
  • High level services typically use generic
    interface
  • User and endpoint application need extensions
  • E.g. application uses event collection (not
    needed?)
  • Or user accesses histograms

16
AJDL (cont)
  • For release 1.0
  • Take AJDL components from DIAL
  • XML schema (DTD)
  • C++ class interfaces
  • C++ class implementation
  • Including writing and reading XML
  • Python implementation by wrapping C++ classes
  • Future
  • Separate AJDL from DIAL
  • Move data for generic interface into generic
    classes
  • So we don't need subclasses to implement generic
    interface
  • Some methods (e.g. kill job) may fail gracefully

17
AJDL Job
  • For release 1.0
  • All exchanged data carried in base class
  • Clients only see base for remote jobs
  • Recently added job ID and list of sub-jobs
  • Job manipulation done through analysis service
  • Methods include create, start, kill, update, ...
  • Later
  • Manipulation through remote class?
  • By calling appropriate analysis service
  • Or running job management service
  • Job is a service (OGSI) or resource (WSRF)
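For the release 1.0 model, where clients see only the base-class job and act on it through the analysis service, a toy sketch might look like this. `Job`, `ToyAnalysisService` and the state names are hypothetical; only the method names create, start, kill and update come from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Base-class job record: everything exchanged with clients lives here."""
    job_id: int
    state: str = "created"
    subjobs: list = field(default_factory=list)  # IDs of sub-jobs

class ToyAnalysisService:
    """Hypothetical service that manipulates jobs by ID."""
    def __init__(self, n_subjobs=2):
        self._jobs = {}
        self._last_id = 0
        self._n_subjobs = n_subjobs

    def _new_job(self):
        self._last_id += 1
        job = Job(self._last_id)
        self._jobs[job.job_id] = job
        return job

    def create(self):
        job = self._new_job()
        job.subjobs = [self._new_job().job_id for _ in range(self._n_subjobs)]
        return job.job_id

    def _set_state(self, job_id, state):
        # Apply the state change to the job and all of its sub-jobs.
        for jid in [job_id, *self._jobs[job_id].subjobs]:
            self._jobs[jid].state = state

    def start(self, job_id):
        self._set_state(job_id, "running")

    def kill(self, job_id):
        self._set_state(job_id, "killed")

    def update(self, job_id):
        # Return a copy, as a remote client would receive a snapshot.
        job = self._jobs[job_id]
        return Job(job.job_id, job.state, list(job.subjobs))

service = ToyAnalysisService()
jid = service.create()
service.start(jid)
print(service.update(jid))  # Job(job_id=1, state='running', subjobs=[2, 3])
```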

18
AJDL dataset
  • For release 1.0
  • Interface in Dataset, EventDataset and
    CompoundDataset
  • Partial implementation in BaseDataset,
    EventDataset and CompoundDataset
  • Remaining implementation provided by subclasses
  • CbntDataset, AtlasPoolEventDataset, ...
  • Subclasses have different XML structure
  • Later
  • Try to move XML data into base classes
  • Maybe drop CompoundDataset
  • Add to Dataset interface
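The split between a generic interface, shared partial implementations and concrete subclasses can be sketched as follows. Event counting is chosen as the generic method purely for illustration, and `FileEventDataset` is a hypothetical stand-in for a concrete class such as AtlasPoolEventDataset.

```python
from abc import ABC, abstractmethod

class Dataset(ABC):
    """Generic interface that high-level services rely on."""
    @abstractmethod
    def event_count(self) -> int: ...

class EventDataset(Dataset):
    """Partial implementation shared by datasets of events."""
    def __init__(self, event_ids):
        self.event_ids = list(event_ids)

    def event_count(self):
        return len(self.event_ids)

class CompoundDataset(Dataset):
    """A dataset built from other datasets; counts are summed over the parts."""
    def __init__(self, parts):
        self.parts = list(parts)

    def event_count(self):
        return sum(part.event_count() for part in self.parts)

class FileEventDataset(EventDataset):
    """Hypothetical concrete subclass that adds the data it needs (a file name)."""
    def __init__(self, filename, event_ids):
        super().__init__(event_ids)
        self.filename = filename

ds = CompoundDataset([
    FileEventDataset("a.pool.root", range(5)),
    CompoundDataset([FileEventDataset("b.pool.root", range(3))]),
])
print(ds.event_count())  # 8
```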

19
AJDL transformation
  • Release 1.0
  • Application is just name and version
  • Scripts to build and run stored at site
  • Software pre-installed
  • Task is collection of text files
  • Carried in XML
  • Viability of model demonstrated by the
    introduction of new applications
  • Some interest in adding task parameters, e.g.
    atlas release
  • Later
  • Application carries scripts
  • Application carries list of SW packages
  • Automatic installation with PacMan
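Carrying a task as text files inside XML might look like this. The `<task>`/`<file>` layout is an assumption for illustration; ElementTree takes care of escaping characters such as `&` and `<` in the file contents.

```python
import xml.etree.ElementTree as ET

def task_to_xml(files):
    """Pack a task, given as {file name: text}, into a single XML string."""
    root = ET.Element("task")
    for name, text in files.items():
        ET.SubElement(root, "file", name=name).text = text
    return ET.tostring(root, encoding="unicode")

def task_from_xml(xml_text):
    """Recover {file name: text} from the XML produced by task_to_xml."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.text or "" for f in root.findall("file")}

task = {
    "fill.C": "h->Fill(pt); if (pt > 20 && eta < 2.5) n++;\n",
    "run.sh": "root -b -q fill.C\n",
}
xml_text = task_to_xml(task)
print(task_from_xml(xml_text) == task)  # True: contents survive the round trip
```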

20
AJDL Job Preferences
  • Class and XML representation added to DIAL
  • Mostly a placeholder
  • Now includes output file catalogs
  • Plan to add user ID, expected CPU time/event, ...

21
Analysis services
  • DIAL service is running
  • Want to improve response time
  • Distribute merging
  • Not yet distributed but no longer blocking
  • Use Condor COD for submission (done but not
    integrated)
  • New problem: client-side crashes
  • Need steering service
  • Separate interactive analysis jobs from batch
    analysis and reconstruction (nothing done)
  • If we support the latter (we do)
  • Like to add a service (or services) based on
    production system
  • Probably not for release 1.0 (no one identified
    yet)

22
Catalogs
  • See "Catalog services for ATLAS" for the list of
    required catalogs
  • Repositories
  • XML description indexed by ID
  • For datasets, tasks, jobs, ...
  • Selection catalogs
  • ID and metadata
  • For datasets and tasks
  • Use AMI to host most of these catalogs
  • Access through generic AMI web service
  • DIAL/AJDL provides client classes for these
  • User-friendly interface

23
Catalogs (cont)
  • For release 1.0
  • Define schema and construct tables
  • AMI dataset and task repositories in place
  • Soon have others: job, job preferences,
    application, ...
  • Provide interface classes
  • Have AMI interface for repositories
  • Have MySQL interface for DSC (dataset selection
    catalog) and DRC (dataset replica catalog)
  • New generic interfaces
  • Abstract SqlTable takes an SqlQuery and returns
    an SqlResult
  • Have MySQL implementation of SqlTable
  • Will add AMI implementation
  • Maybe add Oracle?
  • Easy to select catalogs at runtime
  • Base selection (and maybe repository)
    implementation on these
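The SqlTable abstraction can be sketched in Python. The built-in `sqlite3` module stands in here for the MySQL implementation so the example runs without a server; the `SqlQuery` and `SqlResult` fields are assumptions, and column and table names are interpolated directly, so they must come from trusted code rather than user input.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class SqlQuery:
    columns: list
    where: str = ""     # condition using ? placeholders
    params: tuple = ()  # values bound to the placeholders

@dataclass
class SqlResult:
    columns: list
    rows: list

class SqlTable(ABC):
    @abstractmethod
    def select(self, query: SqlQuery) -> SqlResult: ...

class SqliteTable(SqlTable):
    """Stand-in for the MySQL implementation of SqlTable."""
    def __init__(self, conn, table_name):
        self.conn = conn
        self.table_name = table_name

    def select(self, query):
        sql = f"SELECT {', '.join(query.columns)} FROM {self.table_name}"
        if query.where:
            sql += f" WHERE {query.where}"
        rows = self.conn.execute(sql, query.params).fetchall()
        return SqlResult(list(query.columns), rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER, name TEXT, events INTEGER)")
conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)",
                 [(1, "sample.a", 1000), (2, "sample.b", 50000)])
table = SqliteTable(conn, "datasets")
result = table.select(SqlQuery(["id", "name"], "events > ?", (5000,)))
print(result.rows)  # [(2, 'sample.b')]
```

Selecting a different back end at run time then means constructing a different `SqlTable` subclass; the calling code stays the same.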

24
Catalogs (cont)
  • Future
  • Distribute catalogs
  • Do not depend on access to central catalog
    service
  • Private data in catalogs?
  • Indicate owner/creator for each entry in catalogs
  • User IDs required for this
  • Private catalogs for users?
  • Catalog interface to aid in selection
  • Directly or to build graphical interface
  • To select datasets and tasks
  • Simple queries implemented
  • Enough for 1.0

25
Data movement
  • At present data movement supported through
    FileCatalog interface
  • OK if data is produced in a catalog capable of
    presenting data to the user
  • AfsFileCatalog, MagdaFileCatalog
  • Work required to get existing or new catalogs to
    use DMS, SRM or GridFTP to move data
  • Like to add DonQuijoteFileCatalog
  • Or SrmFileCatalog?
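A sketch of the FileCatalog idea, assuming the interface's job is to turn a logical file name into a readable local path. `CachingFileCatalog` is hypothetical, and `shutil.copy` stands in for a real transfer via DMS, SRM or GridFTP.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod

class FileCatalog(ABC):
    """Presents a logical file name as a path the user's job can open (assumed interface)."""
    @abstractmethod
    def local_path(self, lfn: str) -> str: ...

class CachingFileCatalog(FileCatalog):
    """Copies a replica into a local cache on first access.

    shutil.copy stands in for a real transfer service."""
    def __init__(self, replicas, cache_dir):
        self.replicas = replicas    # lfn -> path of an existing replica
        self.cache_dir = cache_dir

    def local_path(self, lfn):
        target = os.path.join(self.cache_dir, os.path.basename(lfn))
        if not os.path.exists(target):
            shutil.copy(self.replicas[lfn], target)
        return target

# Illustration with temporary directories in place of remote storage.
src_dir, cache_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
src = os.path.join(src_dir, "events.root")
with open(src, "w") as f:
    f.write("payload")
catalog = CachingFileCatalog({"/atlas/dc2/events.root": src}, cache_dir)
path = catalog.local_path("/atlas/dc2/events.root")
```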

26
Clients
  • Use C++/CINT client provided by DIAL
  • User-friendly access from ROOT
  • DIAL classes are processed using rootcint
  • Continuing success: near-full binding in every
    release
  • Problems with a few C++ types, e.g.
    vector<string>
  • GANGA provides Python interface
  • For release 1.0, import with LCGDict
  • Problem: no free functions, so no output stream
  • Adding display() and to_string() methods
  • Python binding now included in DIAL
  • Future: maybe provide some Python implementation
  • Reduce or remove dependency on AJDL C++ library

27
Clients (cont)
  • Java maybe in the future
  • Command line: partial, based on C++
  • dial_submit extended to allow use of analysis
    service
  • Web browser some day

28
Clients (cont)
  • Like to add client tools to improve user
    interface
  • Dataset browser
  • Task browser and editor
  • Job monitor
  • Graphical interface to tie these together
  • Expect these to be provided by GANGA
  • Work begun on job configuration panel that
    includes the above browsers and editors (Alvin's
    talk)

29
ATLAS Applications
  • Minimum for release 1.0 is AOD to histograms
  • User provides algorithm to fill histograms from
    AOD
  • Carried by task
  • Application compiles and (dynamically) links
  • Ketevi working on this
  • Partially implemented
  • aodhisto 0.90
  • Input: AtlasPoolEventDataset
  • Output: RootHistogramDataset
  • Scripts in place; needs integration and testing
  • At present, user code is not yet supported; the
    application runs the Higgs-finding algorithm
    provided by Ketevi
  • Like to find someone to take over this package

30
ATLAS Applications (cont)
  • Also like to have reconstruction
  • Maybe RDO to ESD for release 1.0
  • Later add support for bytestream and AOD
  • Christian and Szymon working on this
  • Application is in place
  • atlasreco 0.90
  • Input: AtlasPoolEventDataset
  • Output: AtlasPoolEventDataset
  • Does reconstruction using release 8.0.1
  • Many ideas for future development
  • See Christian's talk
  • Later add Monte Carlo tasks

31
ATLAS datasets
  • For DC2, need to add datasets describing
  • POOL event collection
  • Implementation now in place: AtlasPoolEventDataset
  • Constructed from a single file holding an
    implicit collection
  • Will add support for multiple files and explicit
    collections
  • Latter will make it possible to implement the
    missing event selection method
  • Which will enable distributed processing of a
    single file
  • One such dataset is in the dataset catalogs
  • ROOT histograms
  • RootHistogramDataset now in place
  • Bytestream

32
Deployment
  • Analysis services
  • Interactive service at BNL
  • Using Condor COD or LSF dial queue
  • Reconstruction service at BNL
  • DIAL using BNL condor
  • Or another service using grid
  • Service to select between these
  • Last two only if we support reconstruction
  • Nice to have another site
  • Service based on 0.90 alpha is usually running at
    BNL
  • Presently uses LSF dial queue
  • Need to choose a different queue for
    reconstruction jobs
  • Probably switch to Condor COD for analysis

33
Deployment (cont)
  • Catalog service hosted by AMI
  • Might deploy some catalog services at BNL
  • Those for which we do not have AMI clients
  • Or to study distributed cataloging
  • Don Quixote at BNL if needed
  • Now deployed for production

34
Monitoring
  • Would like to monitor
  • Available services
  • Jobs
  • CPU usage
  • Disk usage
  • No plans and no requirements for release 1.0
  • Have added some functionality to analysis service
    and client for interactive use
  • Would very much like to have graphical monitoring
  • Might be included in GANGA

35
Status summary
  • Status of deliverables for release 1.0 (updated)
  • Authentication and authorization: ok
  • Service infrastructure: ok
  • AJDL: ok (enhanced)
  • Analysis services: will improve / not ready
  • Catalogs: not ready (much work done)
  • Data movement: not ready (some progress: job
    prefs)
  • Clients: will improve (Python now available)
  • ATLAS Applications: not ready / will improve
  • ATLAS Datasets: not ready / done
  • Deployment: will improve
  • Monitoring: not provided

36
Status summary (cont)
  • Status of deliverables for release 0.90
  • Authentication and authorization: ok
  • Service infrastructure: ok
  • AJDL: ok
  • Analysis services: fix
  • Catalogs: might improve
  • Data movement: might improve
  • Clients: ok
  • ATLAS Applications: ok
  • ATLAS Datasets: ok
  • Deployment: ok
  • Monitoring: ok

37
ARDA
  • ATLAS in ARDA
  • Agreed that ARDA team will deliver an analysis
    service based on the GLITE (EGEE) middleware
  • We have promised a DIAL release by the end of May
  • Release 0.90 is for ARDA (alpha available now)
  • Try to finalize next week
  • Close to what we have now
  • Try to resolve client crash problem
  • Get demos working
  • ARDA/GLITE status
  • I still have not heard much
  • Meeting planned week of June 14 (by invitation)

38
Conclusions
  • Significant effort required for release 1.0
  • However, we do expect to release an end-to-end
    system
  • Push this release to June or July
  • Aim for release 0.90 in next week or two
  • Alpha release available now
  • Extra effort could greatly enhance system
  • Analysis service using production system
  • Means for data movement
  • Client tools (work started)
  • Reconstruction application (first version in
    place)
  • Deployment at other sites
  • Monitoring of any sort

39
Other presentations
  • This session we will hear more about
  • DC2 reconstruction application - Christian
  • DC2 analysis application - Ketevi
  • GANGA/Python interface - Karl
  • Graphical interface - Alvin
  • Working group meeting tomorrow
  • Demos
  • More detailed discussion

40
More information
  • ADA home page
  • http://www.usatlas.bnl.gov/ADA
  • This page has links to deliverables with more
    info
  • Needs updating to reflect changes since April
  • Release info
  • Follow links to DIAL and DIAL releases
  • E.g. release 0.90 described at
  • http://www.usatlas.bnl.gov/dladams/dial/releases/0.90
  • Includes instructions for installing software and
    running the demos

41
Acknowledgments
  • Much work leading up to and during this workshop
  • Including effort from
  • Christian Haeberli
  • Szymon Gadomski
  • Ketevi Assamagan
  • Karl Harrison
  • Alvin Tan
  • Vinay Sambamurthy
  • Nagesh Chetan
  • Chitra Kannan
  • Comments and suggestions from many others