Transcript and Presenter's Notes

Title: MAGDA


1
MAGDA
  • Roger Jones
  • UCL
  • 16th December 2002

2
MAGDA
  • Main authors: Wensheng Deng, Torre Wenaus
  • Magda is a distributed data manager prototype for
    grid-resident data.
  • The system is designed for rapid and flexible
    evolution of database schema and surrounding
    infrastructure, integration and interchange of
    third party components, etc.
  • Web service (via perlSOAP) and command line
    interfaces
  • C, Java and Perl APIs for access to all
    components of the database
  • C and Java APIs are autogenerated by Perl scripts
    from the MySQL database, so they are always synchronised
  • Developed as part of PPDG

3
Magda Documents and Installation Status
  • User guide: http://www.atlasgrid.bnl.gov/magdadoc/userguide.htm
  • In preparation; suggestions are welcome.
  • Useful introduction - SC2002 handout:
    http://www.atlasgrid.bnl.gov/magdademo/sc2002_poster.ppt
  • Servers (web and database) at BNL
  • AFS clients available at
  • /afs/usatlas.bnl.gov/project/magda/current
  • /afs/cern.ch/atlas/maxidisk/d94/wenaus/wdeng/atlas_magda/magda_setup
  • Installation document:
    http://www.atlasgrid.bnl.gov/magdadoc/userguide.htm#3
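  • As an illustration (assuming the magda_setup file at the CERN AFS path
    above is meant to be sourced from a shell; the exact procedure is given
    in the installation document):
  • source /afs/cern.ch/atlas/maxidisk/d94/wenaus/wdeng/atlas_magda/magda_setup
  • magda_findfile    (a tool typed without parameters prints usage info)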

4
Magda Usage
  • Total: 327k files occupying 26 TB
  • 38k DC1 files (mostly gathered using spiders)
  • Transferred more than 4 TB data between CERN
    castor and BNL HPSS since the start of DC1
  • 4k U.S. Grid Testbed DC1 files and replicas
    registered using magda tools
  • Tokyo and Lyon tried for DC1, other sites being
    added progressively.
  • RAL is a priority (large store)
  • GDMP and Reptor integration is now underway, but
    we need these (production-level) tools now

5
Stores currently accessed
  • NFS and AFS disk areas at US ATLAS Tier 1, CERN
  • ATLAS pools in the CERN staging system
  • CERN Castor mass store (ATLAS storage areas, e.g.
    testbeam data)
  • US ATLAS Tier 1 HPSS 'rftp' service (the HPSS
    access mode that US ATLAS currently has access
    to)
  • ATLAS code repository contents
  • Personal data areas
  • MSS Locations at US ATLAS grid testbed sites
    (ANL, LBNL, Boston, Indiana)
  • Also Lyon, Tokyo, ...

6
MAGDA Entities
  • prime: File catalog.
  • Catalogs all instances of all files in the
    system.
  • logical: Logical filename catalog. Metadata about
    logical files (associated keys) not specific to
    particular physical instances.
  • site: A computing facility; may have many data
    stores
  • e.g. CERN CASTOR
  • location: Data locations (e.g. directory, staging
    pool).
  • Associated with a particular site.
  • A given location is designated as either a 'prime'
    or 'replica' location.
  • host: Computers on which the system runs or which
    provide access
  • Is the means by which the spider knows
  • where it is
  • what locations it can scan
  • collection: Collections of logical files.
  • collectionContent: Logical file lists for
    collections.
  • task: Catalog of replication tasks.
  • generic_sig: Generic 'data signature' sufficient
    for regeneration
  • Identifies equivalent data sets
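  • Since these entities live in the MySQL database, a purely illustrative
    schema sketch for two of them might look as follows (only the table
    names come from this slide; the database name 'magda' and every column
    name and type are assumptions):
  • mysql -e "CREATE TABLE logical (lfn VARCHAR(255) PRIMARY KEY, keywords TEXT)" magda
  • mysql -e "CREATE TABLE prime (lfn VARCHAR(255), site VARCHAR(64), location VARCHAR(255), size INT UNSIGNED)" magda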

7
Magda Architecture
8
Magda Command-line Tools
  • Type a tool without parameters to get usage info
  • Calls globus-url-copy internally, and
    globus-job-run to interact with HPSS
  • magda_findfile searches the magda database
  • magda_putfile extended to work with Lyon HPSS
    recently
  • magda_getfile
  • magda_delete
  • Usage: magda_delete filerecord <filename> <site:location>
  • magda_delete filerecord <site:location>
  • magda_delete location <site:location>
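  • For registering files, a purely illustrative sketch (the magda_putfile
    argument form below is an assumption, made by analogy with magda_delete;
    see the user guide for the actual syntax):
  • magda_putfile <filename> <site:location>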

9
Magda Examples
  • magda_findfile dc1.002107.simul.0024 --sub
  • LFN://atlas.org/test.dc1.002107.simul.0024.hlt.eta_scan.zebra
    site:usatlasrftp path size:28188000 primary
  • LFN://atlas.org/dc1.002107.simul.0024.hlt.eta_scan.zebra
    site:utatlasfarm path size:28188000
  • also shows .his and .log files
  • magda_getfile dc1.002107.simul.0024.hlt.eta_scan.log
  • Instance at usatlasrftp:/home/grid_a/simul/002107/log
    remotely accessible.
  • Instance at utatlasfarm:/opt/testbed/cache/replica
    remotely accessible.
  • globus-url-copy -p 3
    gsiftp://atlas000.uta.edu/opt/testbed/cache/replica/dc1.002107.simul.0024.hlt.eta_scan.log
    file:///tmp/dc1.002107.simul.0024.hlt.eta_scan.log 2>&1
  • File dc1.002107.simul.0024.hlt.eta_scan.log
    staged into local directory
  • LFN follows EDG form
  • Multiple versions are handled

10
Magda Replication
  • Automated file replication is supported
  • Definition of replication tasks:
  • collection of files to be replicated
  • information on source location, including a cache
    collection if needed
  • information on the file transport mechanism
    (currently gridftp, bbftp and scp)
  • information on destination location, including a
    destination-side cache if necessary
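  • As a purely hypothetical illustration (none of the field names or values
    below come from Magda; they simply restate the four ingredients above),
    a task entry might read
  • collection=dc1.002107.simul  source=CERN Castor (+ cache collection)
    transport=gridftp  destination=usatlasrftp (+ destination cache)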

11
Magda File Spider
  • File spider processes run as cron jobs on
    distributed hosts to fill catalog and keep the
    catalog up to date
  • Based on the host it is running on, it determines
    which sites and locations are accessible and
    updates them
  • Catalog entry is deleted if file is removed
  • Run crontab -e to set it up as a cron job; useful
    info in
  • /afs/usatlas.bnl.gov/project/magda/current/.cron
  • Spider can be invoked from the command line
  • dyFileSpider.pl site:location
  • magda_putfile is preferred for positive
    registration in production scripts
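  • A minimal crontab sketch (the hourly schedule, the script path under the
    AFS area and the site:location argument shown are assumptions; the .cron
    file above contains the real invocation):
  • 0 * * * * /afs/usatlas.bnl.gov/project/magda/current/dyFileSpider.pl usatlasrftp:/home/grid_a/simul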

12
Magda in U.S. Grid Testbed DC1
13
Magda Production Database
  • Magda production database capability was used in
    U.S. Grid Testbed for DC1
  • Jobinfo:
  • filename, submithost, processhost, joburl,
    moddate, primestore
  • Jobstatus:
  • project, dataset, step, partition, finished,
    joburl, started, group, filename, dirname, extra
  • Very useful feature for general ATLAS DC
    production management
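  • A hypothetical status query (the table and column names are taken from
    the lists above; the database name 'magda' and the idea that 'finished'
    is NULL for unfinished jobs are assumptions):
  • mysql -e "SELECT dataset, step, filename FROM Jobstatus WHERE finished IS NULL" magda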

14
Magda Future Plans
  • Integration with the ATLAS Metadata Interface for DC
    analysis
  • Will integrate Hierarchical Resource Manager
    (HRM) with the command line tools
  • Implementation of management of files distributed
    on the local disk of each node of a Linux farm
  • When file records go up to the order of millions,
    scalability is an important issue. Will look into
    grid catalog service (RLS)
  • Being evaluated by other experiments (STAR)