Third LCB Workshop - PowerPoint PPT Presentation

Slides: 71
Provided by: cms561


Transcript and Presenter's Notes

Title: Third LCB Workshop

  • Third LCB Workshop
  • Distributed Computing and Regional Centres
  • Harvey B. Newman (CIT)
  • Marseilles, September 29, 1999
  • http//
  • http//

LHC Computing: Different from Previous Experiment
  • Geographical dispersion of people and resources
  • Complexity: the detector and the LHC environment
  • Scale: Petabytes per year of data

1800 Physicists, 150 Institutes, 32
Major challenges associated with:
  • Coordinated use of distributed computing resources
  • Remote software development and physics analysis
  • Communication and collaboration at a distance
R&D: New Forms of Distributed Systems

Challenges: Complexity
  • Events
  • Signal event is obscured by 20 overlapping
    uninteresting collisions in same crossing
  • Track reconstruction time at 10^34
    luminosity: several times the time at 10^33
  • Time does not scale from previous generations

HEP Bandwidth Needs & Price Evolution
  • 1989 - 1999: A Factor of One to Several Hundred
    on Principal Transoceanic
  • A Factor of Up to 1000 in
    Domestic Academic and
    Research Nets
  • 1999 - 2006: Continued Study by ICFA-SCIC;
    1998 Results of ICFA-NTF Show a
    Factor of One to Several Hundred (2X
    Per Year)
  • COSTS (to Vendors)
  • Optical Fibers and WDM: a factor > 2/year
    reduction now → Limits of Transmission Speed,
    Electronics, Protocol Speed
  • PRICE to HEP?
  • Complex Market, but Increased Budget likely to be
    needed. Reference BW/Price Evolution: 1.5
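
As a rough check on the "2X Per Year" figure, compounding an annual doubling from 1999 to 2006 gives a factor a little over one hundred, which sits inside the quoted range of one to several hundred. The sketch below only does that arithmetic; the function name is an illustrative choice and does not come from the ICFA studies.

```python
def compound_growth(factor_per_year: float, start_year: int, end_year: int) -> float:
    """Total growth factor from compounding a fixed yearly factor."""
    years = end_year - start_year
    return factor_per_year ** years


# 2X per year, 1999 to 2006 (seven yearly steps)
total = compound_growth(2.0, 1999, 2006)
print(f"Growth factor 1999-2006 at 2X/year: {total:.0f}")
```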

Cost Evolution: CMS 1996 Versus 1999 Technology
Tracking Team
CMS 1996 Estimates
  • Compare to 1999 Technology Tracking Team
    Projections for 2005
  • CPU: Unit cost will be close to early
  • Disk: Will be more expensive (by 2) than early
  • Tape: Currently Zero to 10% Annual Cost
    Decrease (Potential Problem)

LHC (and HENP) Computing and
Software Challenges
  • Software: Modern Languages, Methods and
    Tools: The Key to Manage Complexity
  • FORTRAN: The End of an Era; OBJECTS: A
    Coming of Age
  • TRANSPARENT Access To Data
  • Location and Storage Medium Independence
  • Data Grids: A New Generation of Data-Intensive
    Network-Distributed Systems for Analysis
  • A Deep Heterogeneous Client/Server
    Hierarchy, of Up to 5 Levels
  • An Ensemble of Tape and Disk Mass Stores
  • LHC Object Database Federations
  • Interaction of the Software and Data Handling
    Architectures: The Emergence of New Classes of
    Operating Systems

Four Experiments
The Petabyte to Exabyte Challenge
  • Higgs and New particles; Quark-Gluon Plasma; CP

Data written to tape: 5 Petabytes/Year and up (1 PB = 10^15 Bytes)
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2010) (2020 ?) total for
the LHC Experiments

To Solve the HENP Data
  • While the proposed future computing and data
    handling facilities are large by present-day
    standards, they will not support FREE access,
    transport or reconstruction for more than a
    minute portion of the data.
  • Need effective global strategies to handle and
    prioritise requests, based on both policies and
    marginal utility
  • Strategies must be studied and prototyped, to
    ensure viability: acceptable turnaround times;
    efficient resource utilization
  • Problems to be Explored: How To
  • Meet the demands of hundreds of users who need
    transparent access to local and remote data, in
    disk caches and tape stores
  • Prioritise hundreds to thousands of requests
    from local and remote communities
  • Ensure that the system is dimensioned
    optimally, for the aggregate demand
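
One way to picture prioritising requests on both policy and marginal utility is a priority queue keyed on a combined score. The sketch below is illustrative only: the request names, the policy weights, the utility values and the choice to multiply the two into a single score are all assumptions, not something specified in the talk.

```python
import heapq
from itertools import count


def prioritise(requests):
    """Yield request names, highest policy_weight * marginal_utility first.

    Each request is a (name, policy_weight, marginal_utility) tuple.
    All three fields are hypothetical quantities used for illustration.
    """
    tie_breaker = count()  # keeps submission order for equal scores
    heap = []
    for name, policy_weight, marginal_utility in requests:
        score = policy_weight * marginal_utility
        # heapq pops the smallest item, so store the negated score
        heapq.heappush(heap, (-score, next(tie_breaker), name))
    while heap:
        _, _, name = heapq.heappop(heap)
        yield name


pending = [
    ("individual-plot", 1.0, 0.2),
    ("group-reprocessing", 2.0, 0.5),
    ("remote-query", 1.0, 0.9),
]
order = list(prioritise(pending))
print(order)
```

A real system would also have to account for resource cost and deadlines; this sketch only shows the ordering step.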

  • Models Of Networked Analysis
    At Regional Centers
  • Caltech, CERN, Columbia, FNAL, Heidelberg,
  • Helsinki, INFN, IN2P3, KEK, Marseilles, MPI,
    Munich, Orsay, Oxford, Tufts
  • Specify the main parameters characterizing
    the Models' performance: throughputs, latencies
  • Develop Baseline Models in the feasible
  • Verify resource requirement baselines
    (computing, data handling, networks)
  • Define the Analysis Process
  • Define RC Architectures and Services
  • Provide Guidelines for the final Models
  • Provide a Simulation Toolset for Further Model

[Figure: Model Circa 2006. CERN: 6×10^7 MIPS, 2000 Tbyte robot. FNAL/BNL: 4×10^6 MIPS, 200 Tbyte robot. University: n×10^6 MIPS, 100 Tbyte robot. Desktops at each site. Links labelled 622 Mbits/s; optional air freight.]

MONARC General Conclusions on LHC
  • Following discussions of computing and network
    requirements, technology evolution and projected
    costs, support requirements etc.
  • The scale of LHC Computing is such that it
    requires a worldwide effort to accumulate the
    necessary technical and financial resources
  • The uncertainty in the affordable network BW
    implies that several scenarios of computing
    resource-distribution must be developed
  • A distributed hierarchy of computing centres will
    lead to better use of the financial and manpower
    resources of CERN, the Collaborations, and the
    nations involved, than a highly centralised model
    focused at CERN
  • Hence: The distributed model also provides better
    use of physics opportunities at the LHC by
    physicists and students
  • At the top of the hierarchy is the CERN Center,
    with the ability to perform all analysis-related
    functions, but not the ability to do them
  • At the next step in the hierarchy is a collection
    of large, multi-service Tier1
    Regional Centres, each with
  • 10-20% of the CERN capacity devoted to one
  • There will be Tier2 or smaller special purpose
    centers in many regions

Grid-Hierarchy Concept
  • Matched to the Worldwide-Distributed
    Collaboration Structure of LHC Experiments
  • Best Suited for the Multifaceted
  • Balance Between
  • Proximity of the data to centralized processing
  • Proximity to end-users for frequently accessed
  • Efficient use of limited network bandwidth
    (especially transoceanic and many world
    regions) through organized caching/mirroring/replication
  • Appropriate use of (world-) regional and local
    computing and data handling resources
  • Effective involvement of scientists and students
    in each world region, in the data analysis and
    the physics

MONARC Phase 1 and 2 Deliverables
  • September 1999: Benchmark test validating the
  • Milestone completed
  • Fall 1999: A Baseline Model representing a
    possible (somewhat simplified)
    solution for LHC Computing.
  • Baseline numbers for a set of system and
    analysis process parameters
  • CPU times, data volumes, frequency and site of
    jobs and data...
  • Reasonable ranges of parameters
  • Derivatives: How the effectiveness depends
    on some of the more sensitive parameters
  • Agreement of the experiments on the
    reasonableness of the Baseline Model
  • Chapter on Computing Models in the CMS and ATLAS
    Computing Technical Progress Reports

MONARC and Regional Centres
  • MONARC RC Representative Meetings in April and
  • Regional Centre Planning well-advanced, with
    optimistic outlook, in US (FNAL for CMS BNL for
    ATLAS), France (CCIN2P3), Italy
  • Proposals to be submitted late this year or
    early next
  • Active R&D and prototyping underway, especially
    in US, Italy, Japan and UK (LHCb), Russia (MSU,
    ITEP), Finland (HIP/Tuovi)
  • Discussions in the national communities also
    underway in Japan, Finland, Russia, UK, Germany
  • Varying situations according to the funding
    structure and outlook
  • Need for more active planning outside of US,
    Europe, Japan, Russia
  • Important for R&D and overall planning
  • There is a near-term need to understand the level
    and sharing of support for LHC computing between
    CERN and the outside institutes, to enable the
    planning in several countries to advance.
  • MONARC CMS/SCB assumption: traditional 1/3 :
    2/3 sharing

MONARC Working Groups Chairs
  • Analysis Process Design
  • P. Capiluppi (Bologna, CMS)
  • Architectures
  • Joel Butler (FNAL, CMS)
  • Simulation
  • Krzysztof Sliwa (Tufts, ATLAS)
  • Testbeds
  • Lamberto Luminari (Rome, ATLAS)
  • Steering
  • Laura Perini (Milan, ATLAS)
  • Harvey Newman (Caltech, CMS)
  • Regional Centres Committee

MONARC Architectures WG
  • Discussion and study of Site Requirements
  • Analysis task division between CERN and RC
  • Facilities required with different analysis
    scenarios, and network bandwidth
  • Support required to (a) sustain the Centre, and
    (b) contribute effectively to the distributed
  • Reports
  • Rough Sizing Estimates for a Large LHC
    Experiment Facility
  • Computing Architectures of Existing
  • LEP, FNAL Run2, CERN Fixed Target (NA45, NA48),
    FNAL Fixed Target (KTeV, FOCUS)
  • Regional Centres for LHC Computing
    (functionality services)
  • Computing Architectures of Future Experiments
    (in progress)
  • Babar, RHIC, COMPASS
  • Conceptual Designs, Drawings and
    Specifications for Candidate Site Architecture

Comparisons with LHC sized experiment CMS or
  • Total CPU: CMS or ATLAS 1.5-2,000,000
    MSi95 (Current Concepts; maybe for 10^33

Architectural Sketch: One Major LHC Experiment,
At CERN (L. Robertson)
  • Mass Market Commodity PC Farms
  • LAN-SAN and LAN-WAN Stars (Switch/Routers)
  • Tapes (Many Drives for ALICE) an archival
    medium only ?

MONARC Architectures WG: Lessons and Challenges
for LHC
  • SCALE: 100 Times more CPU and 10 Times more
    Data than CDF at Run2 (2000-2003)
  • DISTRIBUTION: Mostly Achieved in HEP Only for
    Simulation. For Analysis (and some
    re-Processing), it will not happen without
    advance planning and commitments
  • REGIONAL CENTRES: Require Coherent support,
    continuity, the ability to maintain the code
    base, calibrations and job parameters
  • HETEROGENEITY: Of facility architecture and
    mode of use, and of operating systems, must be
  • FINANCIAL PLANNING: Analysis of the early
    planning for the LEP era showed a definite
    tendency to underestimate the requirements
    (by more than an order of magnitude)
  • Partly due to budgetary considerations

Regional Centre Architecture: Example by I. Gaines
  • Tape Mass Storage & Disk Servers; Database Servers
  • Tier 2; Local institutes
  • Data Import; Data Export
  • Production Reconstruction: Raw/Sim → ESD
    (scheduled, predictable; experiment/physics)
  • Production Analysis: ESD → AOD, AOD → DPD
    (scheduled; physics groups)
  • Individual Analysis: AOD → DPD and plots
    (chaotic; physicists)
  • Physics Software Development
  • R&D Systems and Testbeds
  • Info servers; Code servers
  • Web Servers; Telepresence Servers
  • Training; Consulting; Help Desk

MONARC Architectures WG: Regional Centre
Facilities & Services
  • Regional Centres Should Provide
  • All technical and data services required to do
    physics analysis
  • All Physics Objects, Tags and Calibration data
  • Significant fraction of raw data
  • Caching or mirroring calibration constants
  • Excellent network connectivity to CERN and the
    region's users
  • Manpower to share in the development of common
    maintenance, validation and production software
  • A fair share of post- and re-reconstruction
  • Manpower to share in the work on Common R&D
  • Service to members of other regions on a (?)
    best effort basis
  • Excellent support services for training,
    documentation, troubleshooting at the Centre
    or remote sites served by it
  • Long Term Commitment for staffing, hardware
    evolution and support for R&D, as part of the
    distributed data analysis architecture

MONARC Analysis Process WG
  • How much data is processed by how many people,
    how often, in how many places, with which
  • Analysis Process Design Initial Steps
  • Consider number and type of processing and
    analysis jobs, frequency, number of events, data
    volumes, CPU etc.
  • Consider physics goals, triggers, signals and
    background rates
  • Studies covered Reconstruction, Selection/Sample
    Reduction (one or more passes), Analysis,
  • Lessons from existing experiments are limited:
    each case is tuned to the detector, run
    conditions, physics goals and technology of the
  • Limited studies so far, from the user rather
    than the system point of view; more as
    feedback from simulations is obtained
  • Limitations on CPU dictate a largely Physics
    Analysis Group oriented approach to
    reprocessing of data
  • And Regional (local) support for individual
  • Implies dependence on the RC Hierarchy

MONARC Analysis Process: Initial Sharing
  • Assume similar computing capacity available
    outside CERN for re-processing and data
  • There is no allowance for event simulation and
    reconstruction of simulated data, which it is
    assumed will be performed entirely outside CERN
  • Investment, services and infrastructure should be
    optimised to reduce overall costs (TCO)
  • Tape sharing makes sense if ALICE needs so much
    more at a different time of the year
  • First two assumptions would likely result in
    at least a 1/3 : 2/3 CERN : Outside ratio of
    resources (i.e., likely to be larger outside).

MONARC Analysis Process Example

MONARC Analysis Process Baseline: Group-Oriented

MONARC Baseline Analysis Process: ATLAS/CMS
Reconstruction Step

Monarc Analysis Model Baseline: Event Sizes and
CPU Times
  • Sizes
  • Raw data: 1 MB/event
  • ESD: 100 KB/event
  • AOD: 10 KB/event
  • TAG or DPD: 1 KB/event
  • CPU Time in SI95 seconds
  • (without ODBMS overhead: 20%)
  • Creating ESD (from Raw): 350
  • Selecting ESD: 0.25
  • Creating AOD (from ESD): 2.5
  • Creating TAG (from AOD): 0.5
  • Analyzing TAG or DPD: 3.0
  • Analyzing AOD: 3.0
  • Analyzing ESD: 3.0
  • Analyzing RAW: 350
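
The per-event sizes and CPU times above turn into yearly totals once an event count is fixed. The sketch below assumes 10^9 events per year, decimal units (1 MB = 10^6 bytes) and a year of about 3.15×10^7 seconds; none of these three values is stated on this slide, so treat them as illustrative. Under that assumption raw data comes to about 1 PB/year and ESD to about 100 TB/year, the low end of the ranges quoted for the CERN Center baseline.

```python
EVENTS_PER_YEAR = 1e9       # assumption for illustration, not from the slide
SECONDS_PER_YEAR = 3.15e7   # approximate length of a calendar year

# Per-event sizes in bytes, taken from the baseline (decimal units assumed)
SIZE_BYTES = {"RAW": 1e6, "ESD": 100e3, "AOD": 10e3, "TAG": 1e3}
# Per-event CPU cost in SI95 seconds, taken from the baseline
CPU_SI95_SEC = {"create ESD": 350, "create AOD": 2.5, "create TAG": 0.5}


def yearly_volume_tb(kind: str, events: float = EVENTS_PER_YEAR) -> float:
    """Data volume in TB (10^12 bytes) if every event is stored as `kind`."""
    return SIZE_BYTES[kind] * events / 1e12


def sustained_si95(step: str, events: float = EVENTS_PER_YEAR) -> float:
    """Average SI95 capacity needed to run `step` once per event in a year."""
    return CPU_SI95_SEC[step] * events / SECONDS_PER_YEAR


for kind in SIZE_BYTES:
    print(f"{kind}: {yearly_volume_tb(kind):,.0f} TB/year")
for step in CPU_SI95_SEC:
    print(f"{step}: {sustained_si95(step):,.0f} SI95 sustained")
```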

Monarc Analysis Model Baseline: ATLAS or CMS at
CERN Center
  • CPU Power: 520 KSI95
  • Disk space: 540 TB
  • Tape capacity: 3 PB, 400 MB/sec
  • Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
  • Raw data: 100%, 1-1.5 PB/year
  • ESD data: 100%, 100-150 TB/year
  • Selected ESD: 100%, 20 TB/year
  • Revised ESD: 100%, 40 TB/year
  • AOD data: 100%, 2 TB/year
  • Revised AOD: 100%, 4 TB/year
  • TAG/DPD: 100%, 200 GB/year
  • Simulated data: 100%, 100 TB/year (repository)
  • Covering all Analysis Groups, each selecting
    1% of Total ESD or AOD data for a Typical
Monarc Analysis Model Baseline: ATLAS or CMS at
CERN Center
LHCb (Prelim.)
  • CPU Power: 520 KSI95
  • Disk space: 540 TB
  • Tape capacity: 3 PB, 400 MB/sec
  • Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
  • Raw data: 100%, 1-1.5 PB/year
  • ESD data: 100%, 100-150 TB/year
  • Selected ESD: 100%, 20 TB/year
  • Revised ESD: 100%, 40 TB/year
  • AOD data: 100%, 2 TB/year
  • Revised AOD: 100%, 4 TB/year
  • TAG/DPD: 100%, 200 GB/year
  • Simulated data: 100%, 100 TB/year (repository)
  • Some of these Basic Numbers require
    further Study
300 KSI95 ? 200 TB/yr 140 TB/yr
1-10 TB/yr 70 TB/yr
Monarc Analysis Model Baseline: ATLAS or CMS
Typical Tier1 RC
  • CPU Power: 100 KSI95
  • Disk space: 100 TB
  • Tape capacity: 300 TB, 100 MB/sec
  • Link speed to Tier2: 10 MB/sec (1/2 of 155 Mbps)
  • Raw data: 1%, 10-15 TB/year
  • ESD data: 100%, 100-150 TB/year
  • Selected ESD: 25%, 5 TB/year
  • Revised ESD: 25%, 10 TB/year
  • AOD data: 100%, 2 TB/year
  • Revised AOD: 100%, 4 TB/year
  • TAG/DPD: 100%, 200 GB/year
  • Simulated data: 25%, 25 TB/year (repository)
  • Covering Five Analysis Groups, each
    selecting 1% of Total ESD or AOD data for a
    Typical Analysis
  • Covering All Analysis Groups
MONARC Analysis Process WG: A Short List of
Upcoming Issues
  • Priorities, schedules and policies
  • Production vs. Analysis Group vs. Individual
  • Allowed percentage of access to higher data
    tiers (TAG / Physics Objects / Reconstructed / RAW)
  • Improved understanding of the Data Model, and
  • Including MC production; simulated data storage
    and access
  • Mapping the Analysis Process onto heterogeneous
    distributed resources
  • Determining the role of Institutes' workgroup
    servers and desktops, in the Regional Centre
  • Understanding how to manage persistent data:
    e.g. storage / migration / transport /
    re-compute strategies
  • Deriving a methodology for Model testing and
  • Metrics for evaluating the global efficiency of
    a Model: cost vs throughput; turnaround;
    reliability of data access

MONARC Testbeds WG
  • Measurements of Key Parameters governing the
    behavior and scalability of the Models
  • Simple testbed configuration defined and
  • Sun Solaris 2.6, C++ compiler version 4.2
  • Objectivity 5.1 with /C++, /stl, /FTO, /Java
  • Set up at CNAF, FNAL, Genova, Milano, Padova,
    Roma, KEK, Tufts, CERN
  • Four Use Case Applications Using Objectivity
  • ATLASFAST, GIOD/JavaCMS, ATLAS 1 TB Milestone,
    CMS Test Beams
  • System Performance Tests; Simulation Validation
    Milestone Carried Out: See I. Legrand talk

MONARC Testbed Systems

MONARC Testbeds WG: Isolation of Key Parameters
  • Some Parameters Measured, Installed in the MONARC
    Simulation Models, and Used in First Round
    Validation of Models.
  • Objectivity AMS Response Time-Function, and its
    dependence on
  • Object clustering, page-size, data
    class-hierarchy and access pattern
  • Mirroring and caching (e.g. with the Objectivity
    DRO option)
  • Scalability of the System Under Stress
  • Performance as a function of the number of jobs,
    relative to the single-job performance
  • Performance and Bottlenecks for a variety of
    data access patterns
  • Frequency of following TAG → AOD; AOD → ESD;
    ESD → RAW
  • Data volume accessed remotely
  • Fraction on Tape, and on Disk
  • As Function of Net Bandwidth; Use of QoS
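
The frequency with which a job follows references down the chain TAG → AOD → ESD → RAW largely decides how much data it reads. The sketch below computes the expected data read per event from the baseline event sizes given elsewhere in the talk (1 KB, 10 KB, 100 KB, 1 MB, with 1 MB taken as 1000 KB). The follow probabilities in the example call are hypothetical values for illustration, not measured parameters.

```python
# Per-event sizes in KB from the baseline (1 MB taken as 1000 KB)
SIZE_KB = {"TAG": 1, "AOD": 10, "ESD": 100, "RAW": 1000}
CHAIN = ["TAG", "AOD", "ESD", "RAW"]


def expected_kb_per_event(p_tag_aod: float, p_aod_esd: float, p_esd_raw: float) -> float:
    """Expected KB read per event for a job that always reads TAG and
    follows each reference to the next tier with the given probability."""
    reach = [
        1.0,                                  # TAG is always read
        p_tag_aod,                            # probability of reaching AOD
        p_tag_aod * p_aod_esd,                # ... of reaching ESD
        p_tag_aod * p_aod_esd * p_esd_raw,    # ... of reaching RAW
    ]
    return sum(r * SIZE_KB[tier] for r, tier in zip(reach, CHAIN))


# Hypothetical follow probabilities, for illustration only
print(f"{expected_kb_per_event(0.1, 0.1, 0.01):.2f} KB/event")
```

Because each tier is ten times larger than the one above it, even small follow probabilities toward ESD and RAW can dominate the total.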

MONARC Simulation
  • A CPU- and code-efficient approach for the
    simulation of distributed systems has been
    developed for MONARC
  • provides an easy way to map the distributed data
    processing, transport, and analysis tasks onto
    the simulation
  • can handle dynamically any Model
    configuration, including very elaborate ones with
    hundreds of interacting complex Objects
  • can run on real distributed computer systems,
    and may interact with real components
  • The Java (JDK 1.2) environment is well-suited
    for developing a flexible and distributed process
    oriented simulation.
  • This Simulation program is still under
    development, and dedicated measurements to
    evaluate realistic parameters and validate the
    simulation program are in progress.
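
To illustrate the kind of question such a simulation answers, here is a minimal event-driven sketch of jobs queueing for a fixed pool of CPUs. It is written in Python rather than Java and is not the MONARC tool; the job length, the CPU count and the first-come, first-served rule are assumptions chosen for illustration.

```python
import heapq


def simulate(job_cpu_seconds, n_cpus):
    """Return each job's completion time when jobs queue for n_cpus CPUs.

    All jobs are submitted at t=0 and served first-come, first-served.
    """
    # Min-heap of the times at which each CPU next becomes free
    cpu_free_at = [0.0] * n_cpus
    heapq.heapify(cpu_free_at)
    completion = []
    for work in job_cpu_seconds:
        start = heapq.heappop(cpu_free_at)   # earliest available CPU
        finish = start + work
        completion.append(finish)
        heapq.heappush(cpu_free_at, finish)
    return completion


# Hypothetical workload: identical 100-second jobs on a 4-CPU node
single = simulate([100.0], n_cpus=4)[0]
loaded = simulate([100.0] * 10, n_cpus=4)
print(f"single job: {single:.0f} s")
print(f"10 jobs on 4 CPUs: last finishes at {max(loaded):.0f} s "
      f"({max(loaded) / single:.1f}x the single-job time)")
```

The ratio printed at the end is the "performance relative to the single-job performance" measure, in its simplest form.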

Example Physics Analysis at Regional Centres
  • Similar data processing jobs are performed
    in several RCs
  • Each Centre has TAG and AOD databases
  • Main Centre provides ESD and RAW data
  • Each job processes AOD data, and also a
    fraction of ESD and RAW.

Example Physics Analysis

Simple Validation Measurements: The AMS Data
Access Case
[Figure labels: 4 CPUs Client; Raw Data]

MONARC Strategy and Tools for Phase 2
  • Strategy: Vary System Capacity and Network
    Performance Parameters Over a Wide Range
  • Avoid complex, multi-step decision processes
    that could require protracted study.
  • Keep for a possible Phase 3
  • Majority of the workload satisfied in an
    acceptable time
  • Up to minutes for interactive queries, up to
    hours for short jobs, up to a few days for
    the whole workload
  • Determine requirements baselines and/or flaws
    in certain Analysis Processes in this way
  • Perform a comparison of a CERN-tralised Model,
    and suitable variations of Regional Centre
  • Tools and Operations to be Designed in Phase 2
  • Query estimators
  • Affinity evaluators, to determine proximity of
    multiple requests in space or time
  • Strategic algorithms for caching, reclustering,
    mirroring, or pre-emptively moving
    data (or jobs or parts of jobs)

MONARC Phase 2: Detailed Milestones
July 1999: Complete Phase 1; Begin Second Cycle
of Simulations with More Refined Models

MONARC Possible Phase 3
  • Facilitate the efficient planning and design of
    mutually compatible site and network
    architectures, and services
  • Among the experiments, the CERN Centre
    and Regional Centres
  • Provide modelling consultancy and service to the
    experiments and Centres
  • Provide a core of advanced R&D activities, aimed
    at LHC computing system optimisation and
    production prototyping
  • Take advantage of work on distributed
    data-intensive computing for HENP this year in
    other next generation projects
  • For example in US: Particle Physics Data Grid
    (PPDG) of DoE/NGI; A Physics Optimized Grid
    Environment for Experiments (APOGEE) to
    DoE/HENP; joint GriPhyN proposal to NSF by
  • See H. Newman, http//

MONARC Phase 3
  • Possible Technical Goal: System
    Optimisation; Maximise Throughput and/or Reduce
    Long Turnaround
  • Include long and potentially complex
    decision-processes in the studies and simulations
  • Potential for substantial gains in the work
    performed or resources saved
  • Phase 3 System Design Elements
  • RESILIENCE, resulting from flexible management of
    each data transaction, especially over WANs
  • FAULT TOLERANCE, resulting from robust fall-back
    strategies to recover from abnormal conditions
    co-schedule requests and resources, detect
    or predict faults
  • Synergy with PPDG and other Advanced R&D
  • Potential Importance for Scientific Research and
    Industry: Simulation of Distributed Systems for
    Data-Intensive Computing.

MONARC Status: Conclusions
  • MONARC is well on its way to specifying baseline
    Models representing cost-effective solutions
    to LHC Computing.
  • Initial discussions have shown that LHC computing
    has a new scale and level of complexity.
  • A Regional Centre hierarchy of networked centres
    appears to be the most promising solution.
  • A powerful simulation system has been developed,
    and we are confident of delivering a very useful
    toolset for further model studies by the end of
    the project.
  • Synergy with other advanced R&D projects has been
    identified. This may be of considerable mutual
  • We will deliver important information, and
    example Models
  • That is very timely for the Hoffmann Review and
    discussions of LHC Computing over the next
  • In time for the Computing Progress Reports of
    ATLAS and CMS

LHC Data Models: RD45
  • HEP data models are complex!
  • Rich hierarchy of hundreds of complex data
    types (classes)
  • Many relations between them
  • Different access patterns (Multiple Viewpoints)
  • LHC experiments rely on OO technology
  • OO applications deal with networks of objects
    (and containers)
  • Pointers (or references) are used to describe
  • Existing solutions do not scale
  • Solution suggested by RD45: ODBMS coupled to a
    Mass Storage System

System View of Data Analysis by 2005
  • Multi-Petabyte Object Database Federation
  • Backed by a Networked Set of Archival Stores
  • High Availability and Immunity from Corruption
  • Seamless response to database queries
  • Location Independence: storage brokers; caching
  • Clustering and Reclustering of Objects
  • Transfer only useful data
  • Tape/disk; across networks; disk/client
  • Access and Processing Flexibility
  • Resource and application profiling, state
    tracking, co-scheduling
  • Continuous retrieval/recalculation/storage
  • Trade off data storage, CPU and network
    capabilities to optimize performance and costs

CMS Analysis and Persistent Object Store
  • Data Organized In a(n Object) Hierarchy
  • Raw, Reconstructed (ESD), Analysis Objects (AOD),
  • Data Distribution
  • All raw, reconstructed and master parameter DBs
    at CERN
  • All event TAG and AODs at all regional centers
  • Selected reconstructed data sets at each regional
  • HOT data (frequently accessed) moved to RCs

[Figure labels: Slow Control; Detector Monitoring; Persistent Object Store (Object Database Management System); Calibrations; Group Analyses; User Analysis; Common Filters and Pre-Emptive Object Creation; On Demand Object Creation]

GIOD Summary (Caltech/CERN/FNAL/HP/SDSC)
  • GIOD has
  • Constructed a Terabyte-scale set of fully
    simulated events and used these to create a large
    OO database
  • Learned how to create large database federations
  • Completed 100 (to 170) Mbyte/sec CMS Milestone
  • Developed prototype reconstruction and analysis
    codes, and Java 3D OO visualization prototypes,
    that work seamlessly with persistent
    objects over networks
  • Deployed facilities and database federations as
    useful testbeds for Computing Model

Babar OOFS: Putting The Pieces Together
Dynamic Load Balancing; Hierarchical Secure AMS
  • Defer Request Protocol
  • Transparently delays client while data is made
  • Accommodates high latency storage systems (e.g.,
  • Request Redirect Protocol
  • Redirects client to an alternate AMS
  • Provides for dynamic replication and real-time
    load balancing
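
The two protocols can be pictured as a client loop that reacts to three kinds of reply. This is a toy sketch, not the OOFS or AMS interface: the reply tuples, server names, request string and delay value are all hypothetical.

```python
import time


def fetch(servers, first, request, max_steps=10, sleep=time.sleep):
    """Follow hypothetical ('defer', seconds), ('redirect', server) and
    ('data', payload) replies until data arrives or max_steps is reached."""
    server = first
    for _ in range(max_steps):
        kind, value = servers[server](request)
        if kind == "data":
            return server, value
        if kind == "defer":
            sleep(value)       # client waits while the data is staged
        elif kind == "redirect":
            server = value     # retry against the alternate server
        else:
            raise ValueError(f"unknown reply kind: {kind}")
    raise TimeoutError("gave up after too many defer/redirect replies")


def make_staging_server(payload, staging_replies=1):
    """Toy server that defers `staging_replies` times, then returns data."""
    state = {"left": staging_replies}

    def handle(request):
        if state["left"] > 0:
            state["left"] -= 1
            return ("defer", 0.01)
        return ("data", payload)

    return handle


servers = {
    "ams-a": lambda request: ("redirect", "ams-b"),  # busy server hands off
    "ams-b": make_staging_server(b"event-data"),
}
served_by, data = fetch(servers, "ams-a", "example-request")
print(served_by, data)
```

The client code never needs to know why it was delayed or moved, which is the transparency the slide describes.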

Regional Centers Concept: A Data Grid Hierarchy
  • LHC Grid Hierarchy Example
  • Tier0: CERN
  • Tier1: National Regional Center
  • Tier2: Regional Center
  • Tier3: Institute Workgroup Server
  • Tier4: Individual Desktop
  • Total: 5 Levels
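
The five levels can be written down as a small enumeration. The member names below and the idea of a request escalating one level at a time toward CERN are illustrative assumptions.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The five levels of the example hierarchy, numbered as on the slide."""
    CERN = 0
    NATIONAL_REGIONAL_CENTER = 1
    REGIONAL_CENTER = 2
    INSTITUTE_WORKGROUP_SERVER = 3
    INDIVIDUAL_DESKTOP = 4


def path_to_cern(start: Tier) -> list:
    """Tiers visited if a request escalates one level at a time (assumed)."""
    return [Tier(level) for level in range(start, -1, -1)]


print(len(Tier))
print([t.name for t in path_to_cern(Tier.INSTITUTE_WORKGROUP_SERVER)])
```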

Background: Why Grids?
For transparent, rapid access and delivery of
Petabyte-scale data (and Multi-TIPS computing
  • I. Foster, ANL/Chicago
  • Because the resources needed to solve complex
    problems are rarely colocated
  • Advanced scientific instruments
  • Large amounts of storage
  • Large amounts of computing
  • Groups of smart people
  • For a variety of reasons
  • Resource allocations not optimized for one
  • Required resource configurations change
  • Different views of priorities and truth

Grid Services Architecture
Adapted from Ian Foster: there are computing
grids, data grids, access (collaborative)

Roles of HENP Projects for Distributed Analysis
(→ Grids)
  • RD45, GIOD: Networked Object Databases
  • Clipper/GC: High speed access to Objects
    or File data; FNAL/SAM for
    processing and analysis
  • SLAC/OOFS: Distributed File System,
    Objectivity Interface
  • NILE, Condor: Fault Tolerant Distributed
    Computing with Heterogeneous CPU Resources
  • MONARC: LHC Computing Models: Architecture,
    Simulation, Testbeds; Strategy, Politics
  • PPDG: First Distributed Data Services and
    Grid System Prototype
  • ALDAP: OO Database Structures and
    Access Methods for Astrophysics and HENP
  • APOGEE: Full-Scale Grid Design;
    Instrumentation, System Modeling and
    Simulation, Evaluation/Optimization
  • GriPhyN: Production Prototype Grid in
    Hardware and Software; then Production
  • ALDAP: Accessing Large Data Archives in
    Astronomy and Particle Physics
  • NSF Knowledge Discovery Initiative (KDI)
  • CALTECH, Johns Hopkins, FNAL(SDSS)
  • Explore advanced adaptive database structures,
    physical data storage hierarchies for archival
    storage of next generation astronomy and
    particle physics data
  • Develop spatial indexes, novel data
    organizations, distribution and delivery
    strategies, for Efficient and transparent
    access to data across networks
  • Create prototype network-distributed data query
    execution systems using Autonomous Agent workers
  • Explore commonalities and find effective common
    solutions for particle physics and astrophysics

The China Clipper Project: A Data Intensive Grid
  • China Clipper Goal
  • Develop and demonstrate middleware allowing
    applications transparent, high-speed access to
    large data sets distributed over wide-area
  • Builds on expertise and assets at ANL, LBNL
  • NERSC, ESnet
  • Builds on Globus Middleware and
    high-performance distributed storage
    system (DPSS from LBNL)
  • Initial focus on large DOE HENP applications
  • RHIC/STAR, BaBar
  • Demonstrated data rates to 57 Mbytes/sec.

HENP Grand Challenge/Clipper Testbed and Tasks
  • High-Speed Testbed
  • Computing and networking (NTON, ESnet)
  • Differentiated Network Services
  • Traffic shaping on ESnet
  • End-to-end Monitoring Architecture (QE, QM, CM)
  • Traffic analysis, event monitor agents to
    support traffic shaping and CPU scheduling
  • Transparent Data Management Architecture
  • Application Demonstration
  • Standard Analysis Framework (STAF)
  • Access data at SLAC, LBNL, or ANL (net and data

The Particle Physics Data Grid (PPDG)
  • DoE/NGI Next Generation Internet Project
  • Goal: To be able to query and partially retrieve
    data from PB data stores across Wide Area
    Networks within seconds
  • Drive progress in the development of the
    necessary middleware, networks and fundamental
    computer science of distributed systems.
  • Deliver some of the infrastructure for widely
    distributed data analysis at multi-PetaByte
    scales by 100s to 1000s of physicists

PPDG First Year Deliverable: Site-to-Site
Replication Service
PRIMARY SITE: Data Acquisition, CPU, Disk, Tape
  • Network Protocols Tuned for High Throughput
  • Use of DiffServ for (1) Predictable high
    priority delivery of high-bandwidth data
    streams (2) Reliable background transfers
  • Use of integrated instrumentation to
    detect/diagnose/correct problems in long-lived
    high speed transfers (NetLogger, DoE/NGI)
  • Coordinated reservation/allocation techniques
    for storage-to-storage performance
  • First Year Goal: Optimized cached read access
    to 1-10 Gbytes, drawn from a total data set of
    order One Petabyte

PPDG Multi-site Cached File Access System
[Figure: one PRIMARY SITE (Data Acquisition, Tape, CPU, Disk); three Satellite Sites (Tape, CPU, Disk, Robot); three Universities (CPU, Disk, Users)]

First Year PPDG System Components
  • Middleware Components (Initial Choice): See PPDG
    Proposal Page 15
  • Object and File-Based Application Services:
    Objectivity/DB (SLAC enhanced); GC Query Object,
    Event Iterator, Query Monitor; FNAL SAM System
  • Resource Management: Start with Human
    Intervention (but begin to deploy resource
    discovery & mgmnt tools)
  • File Access Service: Components of OOFS
  • Cache Manager: GC Cache Manager (LBNL)
  • Mass Storage Manager: HPSS, Enstore, OSM
  • Matchmaking Service: Condor (U.
  • File Replication Index: MCAT
  • Transfer Cost Estimation Service: Globus (ANL)
  • File Fetching Service: Components of OOFS
  • File Mover(s): SRB (SDSC); Site specific
  • End-to-end Network Services: Globus tools for
    QoS reservation
  • Security and authentication: Globus (ANL)

PPDG Middleware Architecture for Reliable High
Speed Data Delivery
[Figure labels: Resource Management; Object-based and File-based Application Services; File Replication Index; Matchmaking Service; File Access Service; Cost Estimation; Cache Manager; File Fetching Service; Mass Storage Manager; File Mover (two instances); End-to-End Network Services; Site Boundary; Security Domain]

PPDG Developments In 2000-2001
  • Co-Scheduling algorithms
  • Matchmaking and Prioritization
  • Dual-Metric Prioritization
  • Policy and Marginal Utility
  • DiffServ on Networks to segregate tasks
  • Performance Classes
  • Transaction Management
  • Cost Estimators
  • Application/TM Interaction
  • Checkpoint/Rollback
  • Autonomous Agent Hierarchy

Beyond Traditional Architectures: Mobile Agents
(Java Aglets)
Agents are objects with rules and legs -- D.
  • Mobile Agents: Reactive, Autonomous, Goal Driven,
  • Execute Asynchronously
  • Reduce Network Load: Local Conversations
  • Overcome Network Latency; Some Outages
  • Adaptive → Robust, Fault Tolerant
  • Naturally Heterogeneous
  • Extensible Concept: Agent Hierarchies

Distributed Data Delivery and LHC Software
  • LHC Software and/or Analysis Process must
    account for data and resource-related realities
  • Delay for data location, queueing, scheduling;
    sometimes for transport and reassembly
  • Allow for long transaction times,
    performance shifts, errors, out-of-order arrival
    of data
  • Software Architectural Choices
  • Traditional, single-threaded applications
  • Allow for data arrival and reassembly OR
  • Performance-Oriented (Complex)
  • I/O requests up-front; multi-threaded; data
    driven; respond to ensemble of (changing) cost
  • Possible code movement as well as data movement
  • Loosely coupled, dynamic, e.g. Agent-based
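
The performance-oriented choice (I/O requests issued up front, multi-threaded, tolerant of out-of-order arrival) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch_chunk` is a hypothetical stand-in for a remote read, and its random sleep mimics variable delivery delay. No real data-delivery service is called.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_chunk(index):
    """Hypothetical fetch: sleeps a random time to mimic variable delivery delay."""
    time.sleep(random.uniform(0.0, 0.05))
    return index, f"chunk-{index}".encode()


def fetch_all(n_chunks, max_workers=4):
    """Issue every request up front, accept chunks as they arrive, reassemble in order."""
    arrival_order = []
    chunks = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # All requests are submitted before any result is consumed.
        futures = [pool.submit(fetch_chunk, i) for i in range(n_chunks)]
        for future in as_completed(futures):
            index, payload = future.result()
            arrival_order.append(index)  # may differ from request order
            chunks[index] = payload
    # Reassembly: put the chunks back into their logical order.
    data = b"|".join(chunks[i] for i in range(n_chunks))
    return data, arrival_order
```

The application sees correctly ordered data even though `arrival_order` usually differs from the request order. A traditional single-threaded application would instead block on each chunk in turn.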

GriPhyN: First Production Scale Grid Physics
  • Develop a New Form of Integrated Distributed
    System, while Meeting Primary Goals of the US
    LIGO and LHC Programs
  • Single Unified GRID System Concept; Hierarchical
  • (Sub-)Implementations, for LIGO, SDSS, US CMS,
  • 20 Centers: Few Each in US for LIGO, CMS,
  • Aspects Complementary to Centralized DoE Funding
  • University-Based Regional Tier2 Centers,
    Partnering with the Tier1 Centers
  • Emphasis on Training, Mentoring and Remote
  • Making the Process of Search and Discovery
    Accessible to Students
  • GriPhyN Web Site http//
  • White Paper http//

APOGEE/GriPhyN Data Grid Implementation
  • An Integrated Distributed System of Tier1 and
    Tier2 Centers
  • Flexible, relatively low-cost (PC-based) Tier2
  • Medium-scale (for the LHC era) data storage and
    I/O capability
  • Well-adapted to local operation; modest system
    engineer support
  • Meet changing local and regional needs in the
    active, early phases of data analysis
  • Interlinked with Gbps Network Links: Internet2
    and Regional Nets, circa 2001-2005
  • State-of-the-art QoS techniques to prioritise
    and shape traffic and to manage bandwidth.
    Preview transoceanic BW within the US
  • A working Production-Prototype (2001-2003) for
    Petabyte-Scale Distributed Computing Models
  • Focus on LIGO (+ BaBar and Run2) handling of
    real data, and LHC Mock Data Challenges
    with simulated data
  • Meet the needs, and learn from system
    performance under stress

VRVS: From Videoconferencing to Collaborative
  • > 1400 registered hosts, 22 reflectors, 34
  • Running in U.S., Europe and Asia
  • Switzerland: CERN (2)
  • Italy: CNAF Bologna
  • UK: Rutherford Lab
  • France: IN2P3 Lyon, Marseilles
  • Germany: Heidelberg Univ.
  • Finland: FUNET
  • Spain: IFCA-Univ. Cantabria
  • Russia: Moscow State Univ., Tver. U.
  • U.S.: Caltech, LBNL, SLAC, FNAL, ANL, BNL,
    Jefferson Lab., DoE HQ Germantown
  • Asia: Academia Sinica, Taiwan
  • South America: CeCalcula, Venezuela

Role of Simulation for Distributed Systems
  • Simulations are widely recognized and used as
    essential tools for the design, performance
    evaluation and optimisation of complex
    distributed systems
  • From battlefields to agriculture; from the
    factory floor to telecommunications systems
  • Discrete event simulations with an appropriate
    and high level of abstraction are powerful tools
  • Time intervals, interrupts and performance/load
    characteristics are the essentials
  • Not yet an integral part of the HENP culture, but
  • Some experience in trigger, DAQ and tightly
    coupled computing systems: CERN CS2 models
  • Simulation is a vital part of the study of site
    architectures, network behavior, data
    access/processing/delivery strategies,
    for HENP Grid Design and Optimization
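
To make the idea of a discrete event simulation concrete, here is a minimal Python sketch of a single first-come, first-served server driven by a time-ordered event list. It is not taken from MONARC or any other simulation tool mentioned in this talk. The function name `simulate`, the job labels, and the fixed service time are assumptions chosen to keep the example small.

```python
import heapq


def simulate(arrivals, service_time):
    """Simulate one FIFO server; return each job's completion time.

    arrivals: list of (job id, arrival time) pairs.
    """
    # Event list entries: (time, sequence number, kind, job id).
    # heapq always pops the earliest event; the sequence number breaks ties.
    events = [(t, seq, "arrive", job) for seq, (job, t) in enumerate(arrivals)]
    heapq.heapify(events)
    seq = len(events)
    waiting = []    # jobs queued while the server is busy
    busy = False
    completed = {}
    while events:
        now, _, kind, job = heapq.heappop(events)  # jump straight to the next event
        if kind == "arrive":
            if busy:
                waiting.append(job)
            else:
                busy = True
                heapq.heappush(events, (now + service_time, seq, "finish", job))
                seq += 1
        else:  # "finish"
            completed[job] = now
            if waiting:
                nxt = waiting.pop(0)
                heapq.heappush(events, (now + service_time, seq, "finish", nxt))
                seq += 1
            else:
                busy = False
    return completed
```

The clock advances from event to event rather than in fixed steps, so only time intervals and load characteristics need to be modelled, not the internals of the job.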

Monitoring Architecture: Use of NetLogger as in
  • End-to-end monitoring of grid assets is
    needed to:
  • Resolve network throughput problems
  • Dynamically schedule resources
  • Add precision-timed event monitor agents to:
  • ATM switches
  • DPSS servers
  • Testbed computational resources
  • Produce trend analysis modules for monitor
  • Make results available to applications
  • See talk by B. Tierney
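
The combination of precision-timed events and trend analysis can be illustrated with a small Python sketch. It does not use NetLogger's actual API or log format. The `EventLog` class, the event names in the usage note, and the moving-average trend indicator are all assumptions made for the example.

```python
import time


class EventLog:
    """Collects (timestamp, event name, transfer id) records from monitor points."""

    def __init__(self, clock=time.time):
        self.clock = clock  # injectable so that tests can supply a fake clock
        self.records = []

    def log(self, event, transfer_id):
        self.records.append((self.clock(), event, transfer_id))


def stage_latencies(records, start_event, end_event):
    """Seconds between start_event and end_event for each transfer id."""
    starts, latencies = {}, {}
    for ts, event, tid in sorted(records):
        if event == start_event:
            starts[tid] = ts
        elif event == end_event and tid in starts:
            latencies[tid] = ts - starts[tid]
    return latencies


def moving_average(values, window):
    """Simple trend indicator: mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

A monitor point would call something like `log.log("read.start", transfer_id)` and `log.log("read.end", transfer_id)` (hypothetical event names). The per-transfer latencies can then be fed to `moving_average` to show whether a stage is slowing down over time.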

  • The HENP/LHC Data Analysis Problem
  • Worldwide-distributed Petabyte scale compacted
    binary data, and computing resources
  • Development of a robust networked data access
    and analysis system is mission-critical
  • An aggressive R&D program is required
  • to develop systems for reliable data access,
    processing and analysis across an ensemble
    of networks
  • An effective inter-field partnership is now
    developing through many R&D projects
  • HENP analysis is now one of the driving forces
    for the development of Data Grids
  • Solutions to this problem could be widely
    applicable in other scientific fields and
    industry, by LHC startup

LHC Computing: Upcoming Issues
  • Cost of Computing at CERN for the LHC Program
  • May Exceed 100 MCHF at CERN; Correspondingly
    More in Total
  • Some ATLAS/CMS Basic Numbers (CPU, 100 kB Reco.
    Event) from 1996 Require Further Study
  • We cannot scale up from previous generations
    (new methods)
  • CERN/Outside Sharing: MONARC and CMS/SCB Use
    1/3 : 2/3 Rule
  • Computing Architecture and Cost Evaluation
  • Integration and Total Cost of Ownership
  • Possible Role of Central I/O Servers
  • Manpower Estimates
  • CERN versus scaled Regional Centre estimates
  • Scope of services and support provided
  • Limits of CERN support and service and the need
    for Regional Centres
  • Understanding that LHC Computing is Different
  • A different scale and worldwide distributed
    computing for the first time
  • Continuing System R&D is required