Transcript and Presenter's Notes

Title: DDM


1
DDM
  • ATLAS software week 23-27 May 2005

2
Outline
  • Existing Tools
  • Lessons Learned
  • DQ2 Architecture
  • Datasets
  • Catalogs, mappings and interactions
  • DQ2 Prototype
  • Technology
  • Servers, CLI, web page
  • Data Movement, subscriptions and datablocks
  • Plans for future
  • SC3
  • Evolution of prototype

3
Tools
  • dms2
  • First version of the tool for end-users
  • Uses the DQ1 Production Servers
  • Issues: slow servers and Grid catalogs, problems with file movement
  • dms3 (current version)
  • Uses an improved version of the DQ1 Servers, separate from Production
  • Includes Reliable File Transfer
  • Issues: slow Grid catalogs; dealing only with individual files is
    difficult
  • dms4
  • Will use the new Distributed Data Management infrastructure (DQ2
    Servers)
  • New Grid catalogs, native support for Datasets, automatic movement of
    blocks of data
4
dms3
  • dms3 is currently the tool to get data from Rome
    or DC-2 production
  • Still based on older DQ Production Servers and
    existing Grid catalogs
  • Requires Grid Certificates
  • http://lcg-registrar.cern.ch
  • Documentation on Twiki, including installation
    notes and common use cases
  • https://uimon.cern.ch/twiki/bin/view/Atlas/DonQuijoteDms3
  • (please, feel free to add notes to this page!)
  • Software
  • CERN
  • /afs/cern.ch/atlas/offline/external/DQClient/dms3/dms3.py
  • External Users
  • Download: http://cern.ch/mbranco/cern/dms3.tar.gz
  • or contact your site administrator to have a
    shared installation

5
Reliable File Transfer
  • DQ includes a simple file transfer service, RFT
  • MySQL backend database with the transfer definitions and queue
  • Set of transfer agents fetching requests from the database (a minimal
    sketch follows this slide's bullets)
  • Uses only GridFTP to move data; supports SRM (get, put) and transfer
    priorities
  • Uses the existing DQ Servers to interface with Grid catalogs
  • Two client interfaces
  • dms3, using the replicate command: the transfer is queued into RFT
  • for end-users only; transfer priority is limited
  • super-user client
  • allows priorities to be set, transfers to be rescheduled or cancelled,
    and tagging of transfers
  • access limited to Production
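A minimal sketch of how such a transfer agent could poll the MySQL queue.
Everything here (the transfer_queue table, its columns, the poll interval) is
an assumption made for illustration; it is not the actual RFT implementation.

    # Sketch only: poll a MySQL-backed transfer queue and move files with GridFTP.
    import subprocess
    import time
    import MySQLdb

    def fetch_pending(db, limit=10):
        # Highest-priority queued requests first (assumed schema)
        cur = db.cursor()
        cur.execute("SELECT id, source_url, dest_url FROM transfer_queue "
                    "WHERE state='QUEUED' ORDER BY priority DESC LIMIT %s", (limit,))
        return cur.fetchall()

    def gridftp_copy(src, dst):
        # GridFTP transfer via the standard globus-url-copy client
        return subprocess.call(["globus-url-copy", src, dst]) == 0

    def agent_loop(db):
        while True:
            for req_id, src, dst in fetch_pending(db):
                state = 'DONE' if gridftp_copy(src, dst) else 'FAILED'
                cur = db.cursor()
                cur.execute("UPDATE transfer_queue SET state=%s WHERE id=%s",
                            (state, req_id))
                db.commit()
            time.sleep(30)  # poll interval between passes over the queue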

6
Reliable File Transfer
  • Transfer rate depends mostly on status of sites
  • Can go from smooth running to very large number
    of failures
  • critical for files with a single copy on the Grid
  • Bottlenecks
  • Grid catalogs for querying
  • Transferring individual files, not blocks of
    files
  • Lack of people to monitor transfer failures and
    sites
  • A single machine being used
  • 7000 transfers/day (mostly requests to transfer
    data to CERN)
  • Will not increase; otherwise it kills Castor@CERN for other users
  • Before Christmas, with an additional machine, we did double that
    amount
  • But that was killing the Castor GridFTP front-end machines very often

7
Lessons learned
  • Catalogs were provided by Grid providers and used
    as-is
  • Granularity: file-level
  • No datasets, no file collections
  • No scoping of queries (difficult to find data,
    slow)
  • No bulk operations
  • Metadata support not usable
  • Too slow
  • No valid workaround to query data per site, MD5 checksums, file sizes
  • Logical Collection Name as metadata (e.g. /datafiles/rome/)
  • Catalogs not always geographically distributed
  • Single point of failure (middleware,
    people/timezones)
  • No ATLAS resources information system (with
    known/negotiated QoS)
  • and unreliable information systems from Grid
    providers

8
Lessons learned
  • No managed and transparent data access,
    unreliable GridFTP
  • SRM (and GridFTP with mass storage) still not
    sufficient
  • Difficult to handle different mass storage stagers from the Grid
  • DQ
  • Single point of failure
  • Naïve validation procedure
  • No self-validation at sites, between site
    contents and global catalogs
  • Operations level
  • Too centralized
  • Insufficient man-power
  • Still need to identify site contacts, at least
    for major sites
  • Insufficient training for users/production
    managers
  • Lack of coordination: launching requests for files not staged, ..
  • Also due to lack of automatic connections between
    Data Management and Production System tasks
  • Monitoring: insufficient tools and people!

9
Lessons learned
  • Multiple flavors of Grid Catalogs with slightly
    different interfaces
  • Effort wasted on developing common interfaces
  • Minimal functionality with maximum error
    propagation!
  • No single data management tool for
  • Production
  • End-user analysis
  • (common across all Grids!)
  • No reliable file transfer plugged into Production
    System
  • Moving individual files is non-optimal!
  • Too many sites used for permanent storage
  • Should restrict the list and comply with the Computing Model and Tier
    organization

10
Distributed Data Management - outline
  • The Database (and Data Management) project recently took
    responsibility in this area (formerly Production)
  • Approach: proceed by evolving Don Quijote, while revisiting
    requirements, design and implementation
  • Provide continuity of needed functionality
  • Add dataset management above file management
  • Dataset: a named collection of files plus descriptive metadata
  • Container Dataset: a named collection of datasets plus descriptive
    metadata (both notions are illustrated after this slide's bullets)
  • Design, implementation, component selection
    driven by startup requirements for performance
    and functionality
  • Covering end user analysis (with priority) as
    well as production
  • Make decisions on implementation and component
    selection accordingly, to achieve the most
    capable system
  • Foresee progressive integration of new middleware
    over time
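A purely illustrative reading of the two definitions above (not the DQ2
schema):

    # Illustration only: a dataset and a container dataset as plain data structures.
    class Dataset:
        """A named collection of files plus descriptive metadata."""
        def __init__(self, name, files, metadata=None):
            self.name = name                  # dataset name
            self.files = list(files)          # (lfn, guid) pairs
            self.metadata = metadata or {}    # descriptive metadata

    class ContainerDataset:
        """A named collection of datasets plus descriptive metadata."""
        def __init__(self, name, datasets, metadata=None):
            self.name = name
            self.datasets = list(datasets)    # names of member datasets/datablocks
            self.metadata = metadata or {}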

11
Don Quijote 2
  • Moves from a file based system to one based on
    datasets
  • Hides file level granularity from users
  • A hierarchical structure makes cataloging more
    manageable
  • However file level access is still possible
  • Scalable global data discovery and access via a
    catalog hierarchy
  • No global physical file replica catalog (but
    global dataset replica catalog and global logical
    file catalog)

12
Catalog architecture and interactions
13
Global catalogs
  • Holds all dataset names and unique IDs (plus system metadata)
  • Maintains versioning information and information on container
    datasets (datasets consisting of other datasets)
  • Maps each dataset to its constituent files; this catalog holds info
    on every logical file, so it must be highly scalable, but it can be
    highly partitioned using metadata etc. (see the sketch below)
  • Stores the locations of each dataset
  • All logically global, but may be distributed physically
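A toy, in-memory picture of the mappings just listed; the names and example
entries are assumptions for illustration, not the DQ2 catalog schema.

    # Global catalog mappings, illustrated with plain dictionaries.
    dataset_repository = {                  # dataset name -> unique ID (+ system metadata)
        "rome.dataset.001": {"duid": "duid-0001", "owner": "prod"},
    }
    dataset_content = {                     # dataset -> constituent logical files (lfn, guid)
        "rome.dataset.001": [("rome.file.A.root", "guid-A"),
                             ("rome.file.B.root", "guid-B")],
    }
    dataset_locations = {                   # dataset -> sites holding a replica
        "rome.dataset.001": ["CERN", "BNL"],
    }

    def files_expected_at(dataset, site):
        """Logical files of a dataset that a hosting site should provide."""
        if site in dataset_locations.get(dataset, []):
            return dataset_content.get(dataset, [])
        return []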
14
Local Catalogs
  • Per-Grid/site/Tier logical-to-physical file name mapping;
    implementations of this catalog are Grid-specific but must use a
    standard interface (one possible shape is sketched below)
  • Per-site storing of user claims on files and datasets; claims are
    used to manage stage lifetime and resources, and to provide
    accounting
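One possible shape for that Grid-neutral interface, using only the
LFN/GUID/PFN terms that appear elsewhere in these slides; this is a sketch,
not the actual POOL File Catalog API.

    # Sketch of a common local-catalog interface with a trivial in-memory backend.
    class LocalReplicaCatalog:
        """Per-Grid/site/Tier LFN/GUID -> PFN mapping behind a common interface."""
        def register(self, lfn, guid, pfn):
            raise NotImplementedError       # Grid-specific implementation
        def lookup_pfns(self, lfn):
            raise NotImplementedError       # all physical replicas of a logical file

    class DictReplicaCatalog(LocalReplicaCatalog):
        def __init__(self):
            self._map = {}                  # lfn -> list of (guid, pfn)
        def register(self, lfn, guid, pfn):
            self._map.setdefault(lfn, []).append((guid, pfn))
        def lookup_pfns(self, lfn):
            return [pfn for _, pfn in self._map.get(lfn, [])]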
15
(Some) DDM Use Cases (1)
  • Data acquisition and publication

  • Publish replica locations
  • Publish dataset info
  • Publish file replica locations
  • Publish dataset file content
16
DDM Use Cases (2)
  • Event selection

  • Select datasets based on physics attributes
  • Versioning / container dataset info
  • Get locations of datasets
  • Local file information
  • Get the files in the datasets
17
DDM Use Cases (3)
  • Dataset replication (see also subscriptions
    later)

  • Get the current dataset location, replicate, then publish the new
    replica info
  • Get/publish local file info
  • Get the files to replicate
  • For more use cases and details see
    https://uimon.cern.ch/twiki/bin/view/Atlas/DonQuijoteUseCases
18
Implementation - Prototype Development Status
  • Technology choices
  • Python clients/servers based on HTTP GET/POST (a toy client sketch
    follows this slide's bullets)
  • The POOL FC interface gives us a choice of back-end (all our catalogs
    fit the LFN-GUID-PFN mapping system)
  • For the prototype a MySQL DB is used (with a planned future evaluation
    of the LCG File Catalog, which would give us ACLs, support for
    user-defined catalogs, etc.)
  • Servers
  • Use HTTPS (with Globus proxy certs) for POSTs and HTTP for GETs,
    i.e. world-readable data (though this can be made secure to e.g. the
    ATLAS VO if required)
  • Clients
  • Python command-line client per server, plus an overall UI client, dq2
  • Web page interface directly to the HTTP servers for querying
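A toy sketch of the HTTP GET query style just described; the server path and
parameter names below are assumptions for illustration, not the actual DQ2
server API.

    # Read-only catalog query over plain HTTP (writes would instead go over
    # HTTPS with a Globus proxy certificate).  Endpoint and parameters are invented.
    import json
    import urllib.parse
    import urllib.request

    def list_files_in_dataset(server, dataset, version=None):
        params = {"dsn": dataset}
        if version is not None:
            params["version"] = version
        url = "http://%s/dq2/content?%s" % (server, urllib.parse.urlencode(params))
        with urllib.request.urlopen(url) as reply:
            return json.loads(reply.read().decode())   # e.g. a list of (lfn, guid) pairs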

19
dq2 commands
  • dq2
  • Usage: dq2 <command> <args>
  • Commands
  • registerNewDataset <dataset name> <lfn1 guid1 lfn2 guid2 ...>
  • registerDatasetLocations <-i|-c> [-v <dataset version>]
    <dataset name> <location(s)>
  • registerNewVersion <dataset name> <new files: lfn1 guid1 lfn2
    guid2 ...>
  • listDatasetReplicas [-i|-c] [-v <dataset version>] <dataset name>
  • listFilesInDataset [-v <dataset version>] <dataset name>
  • listDatasetsInSite [-i|-c] <site name>
  • listFileReplicas <logical file name>
  • listDatasets [-v <dataset version>] <dataset name>
  • eraseDataset <dataset name>
  • -i and -c signify incomplete and complete datasets respectively
    (mandatory for adds, optional for queries; the default is to return
    both). If no -v option is supplied, the latest version is used.
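For example (hypothetical dataset, file and site names):

    dq2 registerNewDataset rome.mytest.001 file1.root guid-0001 file2.root guid-0002
    dq2 registerDatasetLocations -c rome.mytest.001 CERN
    dq2 listFilesInDataset rome.mytest.001
    dq2 listDatasetReplicas -c rome.mytest.001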

20
Web browser interface
21
Datablocks
  • Datablocks are defined as immutable and
    unbreakable collections of files
  • They are a special case of datasets
  • A site cannot hold partial datablocks
  • There are no versions for datablocks
  • Used to aggregate files for convenient
    distribution
  • Files grouped together by physics properties, run
    number etc..
  • Much more scalable than file level distribution
  • The principal means of data distribution and data
    discovery
  • immutability avoids consistency problems when
    distributing data
  • moving data in blocks improves data distribution
    (bulk SRM requests)
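A toy sketch of the rules above: a datablock as an immutable set of files,
with the "no partial datablocks at a site" check made explicit (all names are
invented for illustration).

    # Illustration only: immutability and completeness of datablocks.
    class Datablock:
        def __init__(self, name, files):
            self.name = name
            self.files = frozenset(files)    # immutable content, no versions

        def can_be_hosted_by(self, site_files):
            """A site may hold a datablock only if it holds all of its files."""
            return self.files <= set(site_files)

    block = Datablock("datablock.run01234", ["f1.root", "f2.root"])
    print(block.can_be_hosted_by(["f1.root", "f2.root", "x.root"]))  # True
    print(block.can_be_hosted_by(["f1.root"]))                       # False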

22
Subscriptions
  • A site can subscribe to data
  • When a new version is available, the latest version of the dataset is
    automatically made available, with site-local services carrying out
    the required replication
  • Subscriptions can be made to datasets (for file
    distribution) or container datasets (for
    datablock distribution)
  • Use cases
  • Automatic distribution of datasets holding a
    variable collection of datablocks (container
    datasets)
  • Automatic replication of files by subscribing to
    a mutable dataset (eg file-based calibration data
    distribution)

(diagram: subscriptions between Site X and Site Y)
23
Subscriptions
  • System supports subscriptions for
  • Datasets
  • latest version of a dataset (triggers automatic
    updates whenever a new version appears)
  • Container Datasets
  • which in turn contain datablocks or datasets
  • supports subscriptions to the latest version of a
    container dataset (automatically triggers updates
    whenever e.g. the set of datablocks making up the
    container dataset changes)
  • Datablocks (immutable set of files)
  • Databuckets (see details next)
  • replication of a set of files using a notification model (whenever
    new content appears in the databucket, replication is triggered)
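A minimal sketch, under invented names, of the "latest version" behaviour: a
site's subscription triggers replication only when the catalogued latest
version differs from what the site already holds.

    # Illustration only: resolving latest-version subscriptions.
    latest_version = {"calib.dataset": 3}                 # dataset -> newest version
    subscriptions = [("SITE_X", "calib.dataset"),
                     ("SITE_Y", "calib.dataset")]         # (site, dataset) pairs
    site_has = {("SITE_X", "calib.dataset"): 3,
                ("SITE_Y", "calib.dataset"): 2}           # version each site holds now

    def replicate(site, dataset, version):
        print("replicating %s v%d to %s" % (dataset, version, site))

    for site, dataset in subscriptions:
        newest = latest_version[dataset]
        if site_has.get((site, dataset)) != newest:       # a new version appeared
            replicate(site, dataset, newest)              # site-local services do the copy
            site_has[(site, dataset)] = newest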

24
Subscription Agents
(diagram: subscription agents, their function, and file state in the local
XML POOL FC)
25
Data buckets
(diagram: a file-based data bucket and a remote site)
26
DQ concepts vs DQ2 concepts
  • DQ
  • File, identified by GUID or by LFN
  • The only unit for data movement, querying, and identifying sites (PFN)
27
Claims
28
Plans for future development
  • Service Challenge 3: July - Dec
  • Prototype evolution
  • Fill catalogs with real data (Rome) and test
    robustness and scalability
  • Implement catalogs not yet done (hierarchy,
    claims)
  • External components
  • Testing of gLite FTS will be underway soon
  • POOL FC interfaces for LFC should be available nowish; will evaluate
    it as a suitable backend based on performance
  • Users
  • Agreed with TDAQ to start discussions on whether DDM can/should be
    applied to EF->T0 data movement
  • Support commissioning in the near term
  • Gradual release to user community for analysis

29
What still needs to be done / Milestones
  • To be done
  • Finish hierarchical cataloguing system
  • Monitoring/logging of user operations
  • Security policies
  • Claims management system
  • Milestones

(milestone timeline: June, July, August, September, October)