Title: Distributed Data Management and Processing 2.3


1
Distributed Data Management and Processing 2.3
  • Ian Fisk
  • U.C. San Diego

2
Introduction to 2.3, Distributed Data Management
and Processing
  • 2.3 Distributed Data Management and Processing
    develops software tools to support the CMS
    Distributed Computing Model.

[Diagram: the CMS tiered computing model, with data rates of order PBytes/sec out of the detector into the Online System and roughly 100 MBytes/sec into the Tier 0/1 CERN Computer Center. Tier 1 regional centers (the US Center at Fermilab and the Italy, France, and UK Regional Centers) feed Tier 2 centers, Tier 3 institutes, and Tier 4 physics data caches.]
3
Introduction to 2.3, Distributed Data Management
and Processing
  • Supporting the CMS Distributed Computing Model is
    a daunting task.
  • 1/3 Processing Capability at Tier0, 1/3 Tier1s,
    and 1/3 Tier2s
  • Centers are spread globally over networks of
    variable bandwidth
  • Most physicists will be performing analysis at
    remote centers that locally have only a portion
    of the raw or reconstructed data.
  • Creating tools that will allow efficient access
    to data at local sites and to resources at remote
    sites is a complicated task.
  • This is a larger dataset, using more computing
    power and spread over a greater distance, than
    HEP has previously attempted, and it requires a
    more advanced set of tools.

4
Introduction to 2.3, Distributed Data Management
and Processing
  • DDMP attempts to break the project into 5
    manageable pieces for efficient development
  • 2.3.1 Distributed Process Management
  • 2.3.2 Distributed Database Management
  • 2.3.3 Load Balancing
  • 2.3.4 Distributed Production Tools
  • 2.3.5 System Simulation
  • The combination of these should allow CMS to
    take full advantage of the grid of Distributed
    Computing Resources.
  • CMS has immediate needs for simulation for use in
    completing TDRs (HLT, Physics, ...). Wherever
    possible the project attempts to develop tools
    that are useful in production at existing
    facilities while developing more advanced tools
    for the future.

5
Introduction to 2.3 Distributed Data Management
and Processing
  • Distributed Data Management and Processing
    attempts to integrate software developed
    elsewhere whenever possible, exploiting a number
    of tools being developed for grid computing.

6
Introduction to Distributed Process Management
2.3.1
  • The Goal of Distributed Process Management is to
    develop tools that enable physicists to make
    efficient use of computing resources distributed
    world wide.
  • There are a number of tools available with
    similar goals (LSF, PBS, DQS, Condor, ...), but at
    the moment none are judged adequate to meet the
    long-term needs of CMS.
  • Important issues:
  • Keeping track of long-running jobs
  • Supporting collaboration among multiple physicists
  • Conserving limited network bandwidth
  • Maintaining high availability
  • Tolerating the partition failures that are common
    to WANs

7
Distributed Process Management Prototype
Development
  • The prototype introduces the concept of a
    session, which is a container for interrelated
    jobs. This allows submission, monitoring, and
    termination with a single command, and sessions
    can be shared (a minimal sketch follows below).
  • Processors can be chosen based on data
    availability, processor type, and load.
  • Replicated state is maintained so that
    computations will not be lost if a server fails.
  • The prototype is based on the functional language
    ML and a Group Communications Toolkit, which aids
    in writing distributed programs.
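
To make the session idea concrete, here is a minimal sketch in Python (the actual prototype is written in ML on a Group Communications Toolkit, which is not shown here): a hypothetical Session container that submits, monitors, and terminates a group of related jobs together. Processor selection by data availability, type, and load is left as a placeholder.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Job:
    """One job: a command line plus the dataset it needs (illustrative)."""
    command: list
    dataset: str
    process: subprocess.Popen = None


@dataclass
class Session:
    """Hypothetical container for interrelated jobs (sketch only).

    Mirrors the prototype's idea: one handle to submit, monitor,
    and terminate a whole group of related jobs.
    """
    name: str
    jobs: list = field(default_factory=list)

    def add(self, command, dataset):
        self.jobs.append(Job(command=command, dataset=dataset))

    def submit_all(self):
        # The real prototype would pick processors by data availability,
        # processor type, and load; here every job just runs locally.
        for job in self.jobs:
            job.process = subprocess.Popen(job.command)

    def status(self):
        return {job.dataset: ("running" if job.process and job.process.poll() is None
                              else "finished")
                for job in self.jobs}

    def terminate_all(self):
        for job in self.jobs:
            if job.process and job.process.poll() is None:
                job.process.terminate()


if __name__ == "__main__":
    session = Session(name="orca-test")
    session.add(["sleep", "5"], dataset="run001")
    session.add(["sleep", "5"], dataset="run002")
    session.submit_all()          # one command submits the whole group
    print(session.status())
    session.terminate_all()       # one command terminates the whole group
```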

8
Distributed Process Management Current Status
  • Working prototype exists with features described
    on previous slide.
  • The system has been tested with 32 processors
    performing CMS ORCA production.
  • Some scalability issues were encountered and
    repaired.
  • The system has since been tested on 65 processors
    with no further scalability problems.

9
Distributed Process Management Prototype Future
Plans
  • In the next few months Distributed Process
    Management will move development efforts to the
    CMS Tier2 Prototype Computation Center at
    Caltech/UCSD.
  • The unique split nature of the center and large
    number of processors makes it a nearly ideal
    place to work on scalability, remote submission,
    and more complex ORCA scenarios.
  • In spring 2001 there are plans to support
    multiple users and to develop a queuing system
    for when resources are unavailable.
  • The first prototype is expected to be complete in
    the summer of 2001.

10
Distributed Process Management Fully Functional
Development
  • Milestones tied to deliverables to CMS for use in
    production.
  • The program starts with algorithm development for
    use in process management, including data-aware
    Self-Organizing Neural Network agents for
    scheduling.
  • The fully functional prototype should be
    completed sometime in 2003.

11
Introduction to Distributed Database Management
2.3.2
  • Distributed Database Management develops tools
    external to the ODBMS that control replication
    and synchronization of data over the grid, as
    well as monitor and improve the performance of
    database access.
  • As event production becomes less CERN-centric,
    there is an immediate need for tools to replicate
    data produced at remote sites, and a need to
    evaluate and improve the performance of database
    access.
  • In the future, Distributed Production will need
    tools to automatically synchronize some databases
    across all sites analyzing results, and to
    replicate databases on demand for remote analysis
    jobs.
  • Distributed Database Management attempts to meet
    both these needs.

12
Distributed Database Management Prototype
Development
  • To meet the long- and short-term goals, two paths
    were pursued: an investigational prototype
    written in Perl, and development with the Grid
    Data Management Pilot (GDMP) of a functional
    prototype based on Globus middleware.
  • Both require high speed transfers, secure data
    access, transfer verification, integration of the
    data upon arrival, and remote catalogue querying
    and publishing.

13
Investigational Prototype Goals
  • one-way bulk replication of read-only (static)
    datafiles
  • simple prototype using available software
  • RFIO from HPSS to disk at CERN
  • SCP from disk at CERN to disk at Fermilab
  • FMSS to archive from disk at Fermilab to tape
  • Objectivity tools (oodumpcatalog, oodumpschema,
    oonewfd, ooschemaupgrade, ooattachdb)
  • all wrapped up in Perl with HTTP and TCP/IP
  • aim for automation, not performance
  • transferring one 2 GB file is easy; transferring
    1000 is not
  • objective is to clone (part of) a federation from
    CERN at Fermilab
  • automated MSS-to-MSS transfer via (small) disk
    pools
  • use as a possible fallback solution
  • documentation at http://home.cern.ch/wildish

14
Investigational Prototype Steps
  • The basic steps:
  1. Create an empty federation with the right schema
     and pagesize.
     • Get the schema directly from the source
       federation via a web-enabled ooschemadump.
  2. Find out what data is available.
     • Use a web-enabled oodumpcatalog to list the
       source federation catalogue.
  3. Determine what is new w.r.t. your local
     federation.
     • Use a catalogue-diff, based on DB name or ID
       (a minimal sketch follows this list).
  4. Request the files you want from a server at the
     source site.
     • The server will stage files from HPSS to a
       local disk buffer, then send them to you.
  5. Process files as they arrive.
     • Attach them to your federation, archive them to
       MSS, and purge them when your local disk buffer
       fills up.
  • Repeat steps 2-4 as desired, and step 5 as needed.
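
As an illustration of the catalogue-diff in step 3, here is a minimal sketch in Python. The catalogue format (a mapping from DB ID to file name) is an assumption for illustration; the real prototype drives web-enabled oodumpcatalog output instead.

```python
def catalogue_diff(source_catalogue, local_catalogue):
    """Return entries present in the source federation catalogue
    but missing from the local one, keyed by DB ID (assumed format)."""
    return {db_id: name
            for db_id, name in source_catalogue.items()
            if db_id not in local_catalogue}


if __name__ == "__main__":
    # Toy catalogues; in the prototype these would come from a
    # web-enabled oodumpcatalog at the source and from the local federation.
    source = {101: "hits.1.DB", 102: "hits.2.DB", 103: "digis.1.DB"}
    local = {101: "hits.1.DB"}
    to_request = catalogue_diff(source, local)
    print("databases to request:", to_request)
    # -> {102: 'hits.2.DB', 103: 'digis.1.DB'}
```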

15
Exporting data prototype design
[Diagram: exporting-data prototype design. A catalogue server at CERN exposes ooschemadump and oodumpcatalog over HTTP to a remote client at Fermilab, which uses ooschemaupgrade and oonewfd to build a new cloned federation from the CERN user federation; a firewall may sit on either side.]
16
Distributed Database Management Investigational
Prototype Exporting
[Diagram: investigational prototype, exporting side. At CERN, a catalogue server (oodumpcatalog) answers HTTP queries and a DB server stages files from HPSS into a disk pool with rfcp. The Fermilab client performs a catalogue-diff, pulls files over a TCP socket with scp (secure shell copy) into its own disk pool, processes the new DBs, attaches them to the local cloned federation with ooattachdb, and archives them to the MSS; firewalls may sit on either side.]
17
Distributed Database Management Investigational
Prototype Results
  • 600 GB transferred in 9 days
  • SHIFT20 (200 GB disk) to CMSUN1 (280 GB disk)
  • federation built automatically as data arrived
  • data archived automatically to FMSS
  • peak rate of 2.7 MB/sec, sustainable for several
    hours
  • performance unaffected by batch jobs running on
    the Fermilab client or the CERN server
  • best results with about 40 simultaneous copies
    running
  • monitored with the production-monitoring system
  • the Fermilab client could be monitored from a
    desktop at CERN
  • Investigational Prototype development is frozen,
    but parts of the code are being reused for
    Distributed Production Tools, an updated
    monitoring system, and database comparisons.

18
Distributed Database Management Functional
Prototype
  • Flexible, layered, and modular architecture
    designed to support modifications and extensions,
    using Globus as the basic middleware.
  • Data Model
  • Export Catalog
  • Contains information about newly produced files
    which are ready to be accessed by other sites.
  • The export catalog is published to all the
    subscribed sites.
  • A new export catalog, containing only the newly
    generated files, is generated every time a site
    wants to publish its files.
  • Import Catalog
  • Contains information about files which have been
    published by other sites but not yet transferred
    locally.
  • As soon as a file is transferred locally,
    validated, and attached to the federation, it is
    removed from the import catalog.
  • Subscription Service
  • All the sites that subscribe to a particular site
    get notified whenever there is an update in its
    catalog. Supports both push and pull mechanisms.
    (A minimal sketch of this data model follows.)
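
A minimal sketch of this data model, with hypothetical class and field names (this is not the GDMP code), might look as follows. The push mechanism is modelled as a direct notification to each subscriber; a pull mechanism would simply have a subscriber poll the publisher's export catalog, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileEntry:
    """One published database file (fields are illustrative)."""
    name: str
    size: int


@dataclass
class Site:
    name: str
    new_files: List[FileEntry] = field(default_factory=list)       # produced locally
    import_catalog: List[FileEntry] = field(default_factory=list)  # published elsewhere, not yet local
    subscribers: List["Site"] = field(default_factory=list)

    def publish(self):
        """Generate a new export catalog with only the newly produced
        files and push it to every subscribed site."""
        export_catalog = list(self.new_files)
        self.new_files = []
        for site in self.subscribers:          # push mechanism
            site.notify(export_catalog)
        return export_catalog

    def notify(self, export_catalog):
        """Entries wait in the import catalog until transferred locally."""
        self.import_catalog.extend(export_catalog)

    def transfer_done(self, entry):
        """Once validated and attached to the federation, drop the entry."""
        self.import_catalog.remove(entry)


if __name__ == "__main__":
    cern, fnal = Site("CERN"), Site("FNAL")
    cern.subscribers.append(fnal)
    cern.new_files.append(FileEntry("hits.1.DB", 2_000_000_000))
    cern.publish()
    print([e.name for e in fnal.import_catalog])   # -> ['hits.1.DB']
    fnal.transfer_done(fnal.import_catalog[0])
    print(fnal.import_catalog)                     # -> []
```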

19
Database Replicator Functional Prototype
Architecture
  • Communication
  • Control Messages
  • Data Mover
  • File Transfers
  • Logging Incoming and Outgoing Files
  • Resuming File Transfers
  • Progress Meters
  • Error Checks
  • Security
  • Authentication and authorization
  • Replica Manager
  • Handling Replica Catalogue
  • Replica Selection and Synchronization

[Diagram: Layered Architecture for Distributed Data Management. The Application layer sits on a Request Manager; beneath it, the Replica Manager (Globus Replica Manager), DB Manager (Objectivity API), Information Service (GIS), Security (gssapi), Control Communication (Globus_io), and Data Mover (Globus-ftp) are built on Globus facilities such as Globus-threads and Globus-dc.]
20
Database Replicator Functional Prototype
Architecture
  • Information Service
  • Publishes data and network resources at sites.
  • DB Manager
  • Backend for database-specific functions.
  • Request Manager
  • Generates requests on the client side and handles
    requests on the server side.
  • Application
  • Multi-threaded server handling clients.

[Same layered-architecture diagram as on the previous slide.]
21
Integration into the CMS Environment
[Diagram: integration of GDMP into the CMS production environment at two sites (Site A and Site B). At Site A, physics software in the CMS environment writes a DB; a CheckDB script performs a DB completeness check and triggers the GDMP system through the CMS/GDMP interface. The GDMP server updates the production federation catalog, generates a new export catalog, and publishes it to the subscribers list. Site B reads the published catalog, generates an import catalog, and replicates the files over the WAN, optionally staging them at the source; transferred files are copied to the local MSS, attached to the user federation, the catalog is updated, and local copies are purged by stage/purge scripts.]
22
Database Replicator Functional Prototype Current
Status
  • The decision was made to use the Functional
    Prototype in the fall ORCA production.
  • This required adding some features and making the
    prototype more fault tolerant:
  • Parallel transfers to improve performance.
  • Resumption of file transfers from checkpoints to
    handle network interruptions (see the sketch
    below).
  • Catalogue filtering to allow more choice over
    which files to import from and export to remote
    sites.
  • A user guide.
  • The prototype is being used at remote centers for
    the ORCA fall production to handle replication of
    Objectivity files.
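
The checkpointed resumption mentioned above can be illustrated with a short Python sketch. This shows only the idea (restart from the byte offset the destination already holds), not GDMP's actual transfer code.

```python
import os

CHUNK = 1 << 20  # 1 MiB read size (illustrative)


def resume_copy(source_path, dest_path):
    """Copy source to dest, restarting from wherever a previous,
    interrupted copy stopped (sketch of the checkpoint idea only)."""
    already = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(source_path, "rb") as src, open(dest_path, "ab") as dst:
        src.seek(already)                 # skip what the destination already has
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
    return os.path.getsize(dest_path)     # bytes now present at the destination
```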

23
Database Replicator Prototype Future Plans
  • When the GDMP tools were written they were
    tightly coupled to Objectivity applications and
    were unable to replicate non-Objectivity files.
    With the addition of the Globus Replica
    Catalogue, they should be able to perform
    file-format-independent replication in January of
    2001.
  • In May 2001, integration and development of Grid
    Information Services should begin. At the moment
    the data replicator cannot make an intelligent
    choice as to which copy to access when several
    are available. This decision should be made based
    on current network bandwidth, latency between the
    two given nodes, load on the data servers, etc.
    (as sketched below).
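
A minimal sketch of such a replica choice, with a purely illustrative scoring metric and made-up attribute names (this is not the project's actual selection algorithm), could look like this.

```python
def pick_replica(replicas):
    """Choose the replica with the highest estimated delivery rate.

    `replicas` is a list of dicts with illustrative keys:
    bandwidth_mbps (to us), latency_ms, and server_load (busy fraction, 0..1).
    """
    def score(replica):
        # Penalize busy servers and high latency; favour available bandwidth.
        return (replica["bandwidth_mbps"]
                * (1.0 - replica["server_load"])
                / (1.0 + replica["latency_ms"] / 100.0))

    return max(replicas, key=score)


if __name__ == "__main__":
    replicas = [
        {"site": "CERN", "bandwidth_mbps": 20, "latency_ms": 120, "server_load": 0.7},
        {"site": "FNAL", "bandwidth_mbps": 10, "latency_ms": 30, "server_load": 0.2},
    ]
    print(pick_replica(replicas)["site"])   # -> FNAL with these example numbers
```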

24
Fully Functional Prototype Development
  • Development toward a fully functional prototype
    is foreseen starting after the summer of 2001 and
    continuing until 2003.
  • This involves the testing and integration of grid
    tools currently under development:
  • Mobile agents that float on the network
    independently, communicate, and make intelligent
    decisions when triggered.
  • Use of virtual data, the concept that all except
    irreproducible raw experimental data need exist
    only as specifications for how to derive them.

25
Request Redirection Protocol
  • The second goal of Distributed Database
    Management was to evaluate and improve database
    access.
  • The performance and capabilities of the
    Objectivity AMS server can be improved by writing
    plugins that conform to a well defined interface.
  • To improve the availability of the database
    servers, one such plugin, the Request Redirection
    Protocol, has been implemented. When the
    federated database determines that an AMS has
    crashed (due to a disk failure, etc.), jobs can
    be automatically redirected to an alternate
    server (see the sketch after this list). This has
    been running on the CERN AMS servers for a month.
  • In early 2001, a security protocol plugin will be
    implemented.
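
The redirection behaviour can be sketched as follows; this illustrates the idea only and is not the Objectivity AMS plugin interface. The hostnames and the port number are placeholders.

```python
import socket


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_server(primary, alternates, port=23200):
    """Prefer the primary data server; redirect to the first live
    alternate if the primary appears to be down.

    The port number is illustrative; check your own AMS configuration.
    """
    if reachable(primary, port):
        return primary
    for host in alternates:
        if reachable(host, port):
            return host
    raise RuntimeError("no data server reachable")
```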

26
Introduction to Load Balancing 2.3.3
  • Balancing the use of resources in a distributed
    computing environment is difficult and requires
    integrating and augmenting elements of
    Distributed Process Management and Distributed
    Database Management with intelligent algorithms
    to determine the most efficient course of action.
  • In a distributed computing system jobs can be
    submitted to the computing resources where the
    data is available or the data can be moved to
    available computing resources.
  • Deciding between these two cases to efficiently
    complete all requests and balance the load over
    all the computing grid requires good algorithms
    and lots of information about network traffic,
    CPU loads, and data availability.
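
A first-order version of that decision can be sketched as a simple cost comparison; all of the inputs and the break-even rule below are illustrative assumptions, not the project's actual algorithm.

```python
def should_move_data(data_size_gb, wan_bandwidth_mbps,
                     queue_wait_at_data_site_s, queue_wait_here_s):
    """Rough cost comparison (all inputs are estimates from monitoring).

    Option A: run the job where the data is  -> wait in that site's queue.
    Option B: move the data to free CPUs here -> transfer time + local wait.
    """
    transfer_s = data_size_gb * 8_000 / wan_bandwidth_mbps  # GB -> megabits
    cost_move_data = transfer_s + queue_wait_here_s
    cost_move_job = queue_wait_at_data_site_s
    return cost_move_data < cost_move_job


if __name__ == "__main__":
    # 50 GB over a 100 Mbps link is ~4000 s of transfer; only worth it if
    # the remote queue is backed up by more than that plus our local wait.
    print(should_move_data(50, 100,
                           queue_wait_at_data_site_s=10_000,
                           queue_wait_here_s=600))   # -> True
```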

27
Load Balancing Current Status
  • While there has been considerable work on
    Distributed Process Management and Distributed
    Database Management and some effort on
    information services and algorithm development,
    most of the work on Load Balancing is still to
    come.
  • Preliminary work has been done on a prototype of
    Grid Information Services using Globus
    middleware.
  • It publishes outside the domain the resources
    that can be accessed inside the domain:
  • Static
  • CPU Power
  • Operating System Details
  • Software Versions
  • Available Memory
  • Dynamic
  • CPU Load
  • Network Bandwidth
  • Network Latency
  • Updates every few seconds
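
A minimal sketch of such a resource record, with illustrative field names rather than the Globus MDS schema, is shown below; the dynamic part would be refreshed every few seconds.

```python
import os
import platform
import time


def static_info():
    """Attributes that rarely change (values here are illustrative)."""
    return {
        "cpu_count": os.cpu_count(),
        "os": platform.platform(),
        "software": {"orca": "x.y", "objectivity": "x.y"},  # placeholder versions
    }


def dynamic_info():
    """Attributes refreshed every few seconds."""
    load1, load5, load15 = os.getloadavg()     # Unix only
    return {"load_1min": load1, "timestamp": time.time()}


if __name__ == "__main__":
    record = {"site": "tier2-example", **static_info()}
    for _ in range(3):                         # publish a few updates
        record.update(dynamic_info())
        print(record)
        time.sleep(2)
```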

28
Load Balancing Future Plans
  • Algorithm Development should start in the summer
    of 2001 using conventional and Self Organizing
    Neural Network techniques.
  • Integration of Distributed Process Management and
    Distributed Database Management should begin as
    those projects enter the fully functional
    prototype phase.

29
Introduction to Distributed Production Tools 2.3.4
  • The Goal of Distributed Production Tools is to
    develop tools for immediate use to aid CMS
    production at existing computing facilities.
  • Job submission
  • Transferring and Archiving Results
  • System Monitoring
  • US-CMS until recently had no dedicated production
    facilities. Production in the US was performed
    on existing facilities with a wide variety of
    capabilities, platforms, and configurations.
  • CMS has an immediate need for simulated events to
    complete the Trigger TDR and, later, the Physics
    TDR. This project helps to meet the immediate
    need, while the lessons learned serve the
    long-term goals as well.

30
Distributed Production Tools Current Status
  • Based on the database replicator investigational
    prototype, tools have been designed to
    automatically record and archive results of
    production performed at remote sites and to
    transfer these results to the CERN mass storage
    system. These have been used primarily for
    archiving CMSIM production performed at Padua,
    Moscow, IN2P3, Caltech, Fermilab, Bristol, and
    Helsinki.
  • Tools have been developed to utilize existing
    facilities in the US. The aging HP X-class
    Exemplar system has been used for CMSIM
    production, and the Wisconsin Condor system, a
    cycle-scavenging system that uses spare cycles on
    Linux machines, has been used for CMSIM
    production and will be used for ORCA production
    this fall.

31
Distributed Production Tools Current Status of
System Monitoring
  • Tools have been developed to monitor production
    systems.
  • This helps to evaluate and repair bottlenecks in
    the production systems.
  • This provides realistic input parameters to the
    system simulation tools and improves the quality
    of simulation.
  • This provides information to make intelligent
    choices about requirements of future production
    facilities.
  • Monitoring uses Perl/bash scripts running on each
    node.
  • Information is generated in a netlogger-inspired
    format.
  • UDP datagrams transmit results to collection
    machines (see the sketch after this list).
  • Numerical quantities are histogrammed every n
    minutes and put on the web.
  • During spring production it was used to monitor
    150 nodes, producing 25 MB of ASCII logging per
    day.
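
The collection path can be illustrated with a short Python sketch (the production scripts are Perl/bash): each node formats a record in a netlogger-like key=value style, with illustrative field names, and sends it to a collector as a UDP datagram.

```python
import os
import socket
import time

# Placeholder collector address; replace with your collection machine.
COLLECTOR = ("127.0.0.1", 9876)


def send_record(event, **fields):
    """Emit one netlogger-style record as a UDP datagram (fire and forget)."""
    fields.update({
        "DATE": time.strftime("%Y%m%d%H%M%S", time.gmtime()),
        "HOST": socket.gethostname(),
        "EVNT": event,
    })
    record = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(record.encode(), COLLECTOR)


if __name__ == "__main__":
    load1, _, _ = os.getloadavg()              # Unix only
    send_record("node.status", LOAD=round(load1, 2))
```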

32
Distributed Production Tools Current Status of
System Monitoring
  • Goals of the project are to try to understand how
    best to arrange the data for fast access.
  • Monitor standard things on data servers
  • CPU, network, disk I/O, paging, swap, load
    average etc.
  • Monitor the AMS.
  • Which files the user reads (includes those
    already on disk).
  • Number of open filehandles (also for Lockserver).
  • Monitor the lockserver
  • Transaction ages, hosts holding locks etc.
  • Monitor the staging system
  • Names of files staged in.
  • Time it takes for them to arrive.
  • Names of purged files.

33
System Monitoring Results
  • This shows the AMS activity on 6 AMS servers,
    obtained simply by counting the number of
    filehandles that each server had open at a given
    time.
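
Counting open filehandles for a server process can be illustrated with a short, Linux-specific sketch that lists the process's /proc file-descriptor directory; locating the AMS process IDs is left out.

```python
import os


def open_filehandles(pid):
    """Number of file descriptors a process currently holds open.

    Linux-specific: /proc/<pid>/fd contains one entry per open descriptor.
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except (FileNotFoundError, PermissionError):
        return None   # process gone, or we lack permission to inspect it


if __name__ == "__main__":
    print("this process:", open_filehandles(os.getpid()))
```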

34
Distributed Production Tools Future Plans
  • Tools are being developed to support generic job
    submission over diverse existing computing
    facilities to improve ease of use. The first of
    these, which is based on LSF, will be available
    in the spring of 2001.
  • It is a relatively small extension of the system
    monitoring tools to initiate an action when the
    monitoring detects certain kinds of problems. A
    system already exists to send e-mail to the
    appropriate people. Tools are being developed so
    that all the jobs in a batch queue can be cleanly
    stopped or paused if the system monitoring tools
    determine that a server has crashed or that a
    disk has filled up (a sketch of this idea
    follows).
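
A minimal sketch of the idea, with illustrative thresholds and addresses and a placeholder for the batch-system action (it assumes a local SMTP relay for the e-mail step):

```python
import shutil
import smtplib
from email.message import EmailMessage

DISK_FULL_FRACTION = 0.95          # illustrative threshold


def disk_nearly_full(path="/"):
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= DISK_FULL_FRACTION


def notify(operators, subject, body, smtp_host="localhost"):
    """Send a plain warning e-mail (assumes a local SMTP relay)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["To"] = ", ".join(operators)
    msg["From"] = "monitor@example.org"          # placeholder sender
    msg.set_content(body)
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)


def pause_queue():
    """Placeholder: the real tools would tell the batch system to hold jobs."""
    print("pausing batch queue (placeholder)")


if __name__ == "__main__":
    if disk_nearly_full():
        notify(["operator@example.org"], "disk nearly full",
               "Production disk above threshold; pausing the queue.")
        pause_queue()
```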

35
Introduction to System Simulation 2.3.5
  • Distributed Computing Systems of the scope and
    complexity proposed by CMS do not yet exist. The
    System Simulation project attempts to evaluate
    distributed computing plans by performing
    simulations of large scale computing systems.
  • The MONARC simulation toolkit is used. The goals
    of MONARC are:
  • To provide realistic modeling of distributed
    computing systems, customized for specific HEP
    applications.
  • To reliably model the behavior of computing
    facilities and networks, using specific
    application software and usage patterns.
  • To offer a dynamic and flexible simulation
    environment.
  • To provide a design framework to evaluate a range
    of possible computing systems as measured by the
    ability to provide physicists with the requested
    data within the required time.
  • To narrow down a region of parameter space in
    which viable models can be chosen.
  • The toolkit is Java-based to take advantage of
    Java's built-in support for multi-threading for
    concurrent processing.
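
As a flavour of the kind of modelling involved (this toy sketch is not MONARC, which is a far richer Java toolkit), the following assigns queued jobs to the next free CPU of a farm and reports the total completion time.

```python
import heapq


def simulate_farm(job_cpu_seconds, n_cpus):
    """Toy model: jobs queue for a farm of identical CPUs.

    Returns the makespan, i.e. the time until the last job finishes.
    """
    free_at = [0.0] * n_cpus               # next time each CPU becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for need in job_cpu_seconds:
        start = heapq.heappop(free_at)     # earliest available CPU
        end = start + need
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    jobs = [3600] * 200                    # 200 one-hour jobs
    print(simulate_farm(jobs, n_cpus=50))  # -> 14400.0 s (4 waves of 50 jobs)
```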

36
System Simulation Current Status
  • MONARC is currently in its third phase and was
    recently updated to be able to handle larger
    scale simulations.
  • The simulation of the spring 2000 ORCA production
    served as a nice validation of the toolkit.
    Using inputs from the system monitoring tools,
    the simulation was able to accurately reproduce
    the behavior of the computing farm: CPU
    utilization, network traffic, and total time to
    complete jobs.
  • As an indication of the maturity of the
    simulation tools, there is a simulation being
    performed of Distributed Process Management using
    Self Organizing Neural Networks. Since full
    scale production facilities will not be available
    for some time, it is nice to get a head start on
    algorithm development using the simulation.

37
System Simulation Current Status
  • Simulation GUI

38
System Simulation Spring HLT Production
[Plots: examples of simulated network traffic and CPU efficiency for the spring HLT production, shown as measurement vs. simulation.]
39
System Simulation Future Plans
  • There are plans to update the estimated CMS
    computing needs in December.
  • In early 2001 there are plans to update the
    MONARC package to have modules for Distributed
    Process Management and Distributed Database
    Management.

40
System Simulation Future Plans
  • The upgraded package should allow better
    simulation of Distributed Computing Systems. Two
    studies are planned for spring 2001:
  • A study of the role of tapes in Tier1-Tier2
    interactions, which should help describe those
    interactions and evaluate storage needs.
  • A complex study of Tier0-Tier1-Tier2 interactions
    to evaluate a complete CMS data processing
    scenario, including all the major tasks
    distributed among regional centers.
  • During the remainder of 2001 the System
    Simulation Project will aid in the development of
    load balancing schemes.

41
Conclusions
  • The CMS Distributed Computing model is complex
    and advanced software is needed to make it work.
  • Tools are needed to submit, monitor and control
    groups of jobs at remote and local sites.
  • Data needs to be moved over the computing grid to
    the processes that need it.
  • An intelligent system needs to exist to determine
    the most efficient split of moving data and
    exporting processes.
  • CMS has TDRs due that require large numbers of
    simulated events for analysis, and tools are
    needed to facilitate production.
  • We are trying to deliver both.