Transcript and Presenter's Notes

Title: Towards a US and LHC Grid


1
  • Towards a US (and LHC) Grid Environment for HENP Experiments
  • CHEP 2000 Grid Workshop
  • Harvey B. Newman, Caltech
  • Padova, February 12, 2000

2
Data Grid Hierarchy: Integration, Collaboration, Marshal Resources
  • (Scale note: 1 TIPS = 25,000 SpecInt95; a PC today is 10-15 SpecInt95)
  • Online System: bunch crossings every 25 nsec, 100 triggers per second, each event 1 MByte in size; PBytes/sec into the online system, 100 MBytes/sec out to the Offline Farm (20 TIPS)
  • Tier 0: CERN Computer Center, fed at 100 MBytes/sec
  • Tier 1 (622 Mbits/sec or air freight from CERN): Fermilab (4 TIPS), France, Italy and Germany Regional Centers
  • Tier 2: regional centers, on 2.4 Gbits/sec links
  • Tier 3 (622 Mbits/sec links): institute servers (0.25 TIPS each) with physics data caches. Physicists work on analysis channels; each institute has about 10 physicists working on one or more channels, and data for these channels should be cached by the institute server
  • Tier 4 (100-1000 Mbits/sec links): workstations
3
To Solve the LHC Data Problem
  • The proposed LHC computing and data handling will
    not support FREE access, transport or processing
    for more than a small part of the data
  • A balance must be struck between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently accessed datasets
  • Strategies must be studied and prototyped, to
    ensure both acceptable turnaround times, and
    efficient resource utilisation
  • Problems to be Explored
  • How to meet demands of hundreds of users who
    need transparent access to local and remote
    data, in disk caches and tape stores
  • Prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
  • Ensure that the system is dimensioned, used and managed optimally for the mixed workload

4
Regional Center Architecture Example by I. Gaines (MONARC)
  • Core fabric: tape mass storage, disk servers and database servers, with data import/export paths to CERN, Tier 2 centers and local institutes (tapes at one edge, desktops at the other)
  • Production Reconstruction (Raw/Sim → ESD): scheduled, predictable; experiment and physics groups
  • Production Analysis (ESD → AOD, AOD → DPD): scheduled; physics groups
  • Individual Analysis (AOD → DPD and plots): chaotic; individual physicists
  • Support services: Physics Software Development, R&D Systems and Testbeds, Info servers, Code servers, Web Servers, Telepresence Servers, Training, Consulting, Help Desk
5
Grid Services Architecture
  • Applications layer: HEP data-analysis related applications
  • Application Toolkits: remote visualization toolkit, remote computation toolkit, remote data toolkit, remote sensors toolkit, remote collaboration toolkit, ...
  • Grid Services: protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
  • Grid Fabric: networks, data stores, computers, display devices, etc., and the associated local services (local implementations)
  • Adapted from Ian Foster
6
Grid Hierarchy Goals: Better Resource Use and Faster Turnaround
  • Grid integration and (de facto standard) common
    services to ease development, operation,
    management and security
  • Efficient resource use and improved
    responsiveness through
  • Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
  • Resource discovery, query estimation
    (redirection),
    co-scheduling, prioritization, local and global
    allocations
  • Network and site instrumentation: performance tracking, monitoring, forward prediction, problem trapping and handling

7
GriPhyN: First Production-Scale Grid Physics Network
  • Develop a New Integrated Distributed System,
    while Meeting Primary Goals of the US LIGO, SDSS
    and LHC Programs
  • Unified GRID System Concept: Hierarchical Structure
  • Twenty Centers with Three Sub-Implementations: 5-6 each in the US for LIGO, CMS and ATLAS; 2-3 for SDSS
  • Emphasis on Training, Mentoring and Remote
    Collaboration
  • Focus on LIGO, SDSS (BaBar and Run2) handling
    of real data, and LHC Mock Data Challenges with
    simulated data
  • Making the Process of Discovery Accessible to
    Students Worldwide
  • GriPhyN Web Site: http://www.phys.ufl.edu/avery/mre/
  • White Paper: http://www.phys.ufl.edu/avery/mre/white_paper.html

8
Grid Development Issues
  • Integration of applications with Grid Middleware
  • Performance-oriented user application software architecture is required, to deal with the realities of data access and delivery
  • Application frameworks must work with system
    state and policy information (instructions)
    from the Grid
  • O(R)DBMSs must be extended to work across
    networks
  • E.g. Invisible (to the DBMS) data transport,
    and catalog update
  • Interfacility cooperation at a new level, across
    world regions
  • Agreement on choice and implementation of
    standard Grid components, services, security and
    authentication
  • Interface the common services locally to match
    with heterogeneous resources, performance levels,
    and local operational requirements
  • Accounting and "exchange of value" software to enable cooperation

9
Roles of Projects for HENP Distributed Analysis
  • RD45, GIOD: Networked object databases
  • Clipper/GC: High-speed access to object or file data; FNAL/SAM for processing and analysis
  • SLAC/OOFS: Distributed file system with Objectivity interface
  • NILE, Condor: Fault-tolerant distributed computing with heterogeneous CPU resources
  • MONARC: LHC Computing Models; architecture, simulation, strategy, politics
  • PPDG: First distributed data services and Data Grid system prototype
  • ALDAP: OO database structures and access methods for astrophysics and HENP data
  • GriPhyN: Production-Scale Data Grid
  • APOGEE: Simulation/modeling, application and network instrumentation, system optimization/evaluation

10
Other ODBMS tests
  • Tests with Versant (fallback ODBMS)
  • DRO WAN tests with CERN
  • Production on CERN's PCSF and file movement to Caltech
  • Objectivity/DB: creation of a 32,000-database federation
11
The China Clipper Project: A Data Intensive Grid (ANL-SLAC-Berkeley)
  • China Clipper Goal: develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks
  • Builds on expertise and assets at ANL, LBNL and SLAC (NERSC, ESnet)
  • Builds on Globus middleware and a high-performance distributed storage system (DPSS from LBNL)
  • Initial focus on large DOE HENP applications: RHIC/STAR, BaBar
  • Demonstrated data rates to 57 MBytes/sec

12
Grand Challenge Architecture
  • An order-optimized prefetch architecture for data
    retrieval from multilevel storage in a multiuser
    environment
  • Queries select events and specific event
    components based upon tag attribute ranges
  • Query estimates are provided prior to execution
  • Queries are monitored for progress, multi-use
  • Because event components are distributed over
    several files, processing an event requires
    delivery of a bundle of files
  • Events are delivered in an order that takes advantage of what is already on disk, with multiuser, policy-based prefetching of further data from tertiary storage (a minimal ordering sketch follows after this list)
  • GCA intercomponent communication is CORBA-based,
    but physicists are shielded from this layer
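The ordering idea can be sketched in a few lines of Python. This is an assumed illustration, not the GCA code; the bundle and file names are invented.

# Minimal sketch (assumed, not the GCA implementation) of order-optimized delivery:
# serve event file bundles that are already staged on disk first, and request
# only the missing files from tertiary storage.

disk_cache = {"f1", "f3", "f4"}                      # files currently staged
bundles = {                                          # bundle -> files it needs
    "bundle_A": {"f1", "f2"},
    "bundle_B": {"f3", "f4"},
    "bundle_C": {"f5", "f6", "f1"},
}

def delivery_order(bundles, cache):
    """Order bundles by the fraction of their files already on disk (highest first)."""
    def staged_fraction(files):
        return len(files & cache) / len(files)
    return sorted(bundles, key=lambda b: staged_fraction(bundles[b]), reverse=True)

def prefetch_requests(bundles, cache):
    """Files to request from tertiary storage (e.g. HPSS); missing ones only."""
    return sorted(set().union(*bundles.values()) - cache)

print(delivery_order(bundles, disk_cache))    # bundle_B first: fully staged
print(prefetch_requests(bundles, disk_cache)) # ['f2', 'f5', 'f6']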

13
GCA System Overview
  • STACS (with its File Catalog, Index and Event Tags) coordinates access to staged event files and other disk-resident event data, and to tertiary storage in HPSS via pftp
14
STorage Access Coordination System (STACS)
  • Query Estimator: takes a query, consults a bit-sliced index, and returns an estimate together with a list of file bundles and events
  • Query Monitor: receives the file bundles and event lists, reports query status and the cache map, and applies the Policy Module
  • Cache Manager: acts on requests for file caching and purging, issuing pftp and file-purge commands and consulting the File Catalog

15
The Particle Physics Data Grid (PPDG)
ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC,
U.Wisc/CS
  • Site-to-Site Data Replication Service at 100 MBytes/sec, between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot)
  • Multi-Site Cached File Access Service
  • First-year goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of order one Petabyte
16
The Particle Physics Data Grid (PPDG)
  • The ability to query and partially retrieve hundreds of terabytes across Wide Area Networks within seconds
  • PPDG uses advanced services in three areas
  • Distributed caching to allow for rapid data
    delivery in response to multiple requests
  • Matchmaking and Request/Resource co-scheduling
    to manage workflow and use computing and net
    resources efficiently to achieve high throughput
  • Differentiated Services to allow
    particle-physics bulk data transport to coexist
    with interactive and real-time remote
    collaboration sessions, and other network
    traffic.

17
PPDG Architecture for Reliable High Speed Data
Delivery
  • Resource Management
  • Object-based and file-based Application Services
  • File Replication Index, Matchmaking Service, File Access Service, Cost Estimation
  • Cache Manager, File Fetching Service, Mass Storage Manager, File Movers (one per site)
  • End-to-End Network Services, crossing site boundaries and security domains
  • Future: File and Object Export, Cache State Tracking, Forward Prediction
18
First Year PPDG System Components
  • Middleware Components (Initial Choice); see the PPDG Proposal
  • Object- and File-Based Application Services: Objectivity/DB (SLAC enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
  • Resource Management: start with human intervention (but begin to deploy resource discovery and management tools: Condor, SRB)
  • File Access Service: components of OOFS (SLAC)
  • Cache Manager: GC Cache Manager (LBNL)
  • Mass Storage Manager: HPSS, Enstore, OSM (site-dependent)
  • Matchmaking Service: Condor (U. Wisconsin)
  • File Replication Index: MCAT (SDSC)
  • Transfer Cost Estimation Service: Globus (ANL)
  • File Fetching Service: components of OOFS
  • File Mover(s): SRB (SDSC); site-specific
  • End-to-end Network Services: Globus tools for QoS reservation
  • Security and authentication: Globus (ANL)

19
CONDOR Matchmaking: A Resource Allocation Paradigm
  • Parties use ClassAds to advertise properties, requirements and ranking to a matchmaker (a minimal matchmaking sketch follows below)
  • ClassAds are self-describing (no separate schema)
  • ClassAds combine query and data

High Throughput Computing
http://www.cs.wisc.edu/condor
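The matchmaking idea can be illustrated with a small Python sketch. This is not the Condor ClassAd language or API; the attribute names, Requirements and Rank expressions below are invented stand-ins for real ClassAd expressions.

# Hypothetical sketch of ClassAd-style matchmaking (not the actual Condor API).
# A job ad and a machine ad each carry attributes, a Requirements predicate,
# and a Rank expression; the matchmaker pairs ads whose requirements are
# mutually satisfied, preferring higher-ranked matches.

job_ad = {
    "Owner": "physicist1",
    "ImageSize_MB": 512,
    "Requirements": lambda m, j: m["Arch"] == "LINUX" and m["Memory_MB"] >= j["ImageSize_MB"],
    "Rank": lambda m, j: m["Mips"],          # prefer faster machines
}

machine_ads = [
    {"Name": "tier2-node01", "Arch": "LINUX", "Memory_MB": 1024, "Mips": 400,
     "Requirements": lambda m, j: j["ImageSize_MB"] <= m["Memory_MB"],
     "Rank": lambda m, j: 0},
    {"Name": "tier2-node02", "Arch": "SOLARIS", "Memory_MB": 2048, "Mips": 300,
     "Requirements": lambda m, j: True,
     "Rank": lambda m, j: 0},
]

def matchmake(job, machines):
    """Return the machine ad whose Requirements and the job's Requirements
    are both satisfied, ranked by the job's Rank expression (highest first)."""
    candidates = [m for m in machines
                  if m["Requirements"](m, job) and job["Requirements"](m, job)]
    return max(candidates, key=lambda m: job["Rank"](m, job), default=None)

print(matchmake(job_ad, machine_ads)["Name"])   # -> tier2-node01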
20
Remote Execution in Condor
  • Agents for remote execution in CONDOR: on the submission side, an Owner Agent and a Customer Agent manage the request queue and the job's object files; on the execution side, an Execution Agent and an Application Agent handle object files, data and checkpoint files for the application process, with remote I/O and checkpointing linking the two sides
21
Beyond Traditional Architectures: Mobile Agents (Java Aglets)
"Agents are objects with rules and legs" -- D. Taylor
(Diagram: an application interacts with a remote service through mobile agents)
  • Mobile Agents
  • Execute Asynchronously
  • Reduce Network Load: Local Conversations
  • Overcome Network Latency and Some Outages
  • Adaptive → Robust, Fault Tolerant
  • Naturally Heterogeneous
  • Extensible Concept: Agent Hierarchies

22
Using the Globus Tools
  • Tests with gsiftp, a modified ftp
    server/client that allows control of the TCP
    buffer size
  • Transfers of Objy database files from the
    Exemplar to
  • Itself
  • An O2K at Argonne (via CalREN2 and Abilene)
  • A Linux machine at INFN (via US-CERN
    Transatlantic link)
  • Target /dev/null in multiple streams (1 to 16
    parallel gsiftp sessions).
  • Aggregate throughput as a function of number
    of streams and send/receive buffer sizes

  • Results: 25 MB/sec on the HiPPI loop-back; 4 MB/sec to Argonne by tuning the TCP window size, saturating the available bandwidth to Argonne (a buffer-sizing sketch follows below)
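The buffer-size dependence follows from the bandwidth-delay product: a single TCP stream is limited to roughly window/RTT, so a larger window or parallel streams are needed to fill a long path. The sketch below illustrates this reasoning; the RTT and bottleneck figures are assumptions, not the measured values quoted above.

# Minimal sketch of the TCP-window reasoning behind the gsiftp buffer tuning
# (illustrative numbers are assumptions, not measurements from the tests above).
# One TCP stream cannot exceed window_size / round_trip_time, so the aggregate
# rate over N parallel streams is capped by N * window / RTT, up to the
# available bottleneck bandwidth.

def stream_rate_MBps(window_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput in MBytes/sec."""
    return window_bytes / rtt_s / 1e6

def aggregate_rate_MBps(n_streams, window_bytes, rtt_s, bottleneck_MBps):
    """Aggregate throughput of n parallel streams, capped by the link."""
    return min(n_streams * stream_rate_MBps(window_bytes, rtt_s), bottleneck_MBps)

# Example: assumed 60 ms RTT on the Caltech-Argonne path, OC-12 (~78 MB/s) bottleneck.
for window_kb in (64, 256, 1024):
    for n in (1, 4, 16):
        rate = aggregate_rate_MBps(n, window_kb * 1024, 0.060, 78.0)
        print(f"window={window_kb:5d} kB  streams={n:2d}  ~{rate:6.1f} MB/s")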
23
Distributed Data Delivery and LHC Software
Architecture
  • Software Architectural Choices
  • Traditional, single-threaded applications
  • Wait for data location, arrival and reassembly
    OR
  • Performance-Oriented (Complex)
  • I/O requests issued up front; multi-threaded, data-driven; responds to an ensemble of (changing) cost estimates (see the sketch after this list)
  • Possible code movement as well as data movement
  • Loosely coupled, dynamic
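A minimal Python sketch of the contrast, using invented fetch/process stand-ins: the data-driven variant issues all I/O requests up front on a thread pool and processes events as their data arrives, while the traditional variant waits for each event in turn.

import concurrent.futures, time

# Sketch contrasting the two architectural choices above (timings are invented).

def fetch(event_id):
    time.sleep(0.1)                 # stand-in for remote data delivery latency
    return f"data({event_id})"

def process(data):
    return len(data)

event_ids = range(8)

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, e) for e in event_ids]        # I/O up front
    results = [process(f.result())
               for f in concurrent.futures.as_completed(futures)]
print(f"data-driven: {len(results)} events in {time.time() - start:.2f}s")

start = time.time()
results = [process(fetch(e)) for e in event_ids]                # wait per event
print(f"single-threaded: {len(results)} events in {time.time() - start:.2f}s")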

24
GriPhyN Foundation
  • Build on the Distributed System Results of the
    GIOD, MONARC, NILE, Clipper/GC and PPDG Projects
  • Long Term Vision in Three Phases
  • 1. Read/write access to high volume data and
    processing power
  • Condor/Globus/SRB and NetLogger components to manage jobs and resources
  • 2. WAN-distributed data-intensive Grid computing
    system
  • Tasks move automatically to the most effective
    Node in the Grid
  • Scalable implementation using mobile agent
    technology
  • 3. Virtual Data concept for multi-PB
    distributed data management, with
    large-scale Agent Hierarchies
  • Transparently match data to sites, manage data replication or transport, and co-schedule data and compute resources
  • Build on VRVS Developments for Remote
    Collaboration

25
GriPhyN/APOGEE: Production Design of a Data Analysis Grid
  • INSTRUMENTATION, SIMULATION, OPTIMIZATION,
    COORDINATION
  • SIMULATION of a Production-Scale Grid Hierarchy
  • Provide a Toolset for HENP experiments to test
    and optimize their data analysis and resource
    usage strategies
  • INSTRUMENTATION of Grid Prototypes
  • Characterize the Grid components' performance under load
  • Validate the Simulation
  • Monitor, Track and Report system state, trends
    and Events
  • OPTIMIZATION of the Data Grid
  • Genetic algorithms, or other evolutionary methods
  • Deliver optimization package for HENP distributed
    systems
  • Applications to other experiments; accelerator and other control systems; other fields
  • COORDINATE with Experiment-Specific Projects: CMS, ATLAS, BaBar, Run2

26
Grid (IT) Issues to be Addressed
  • Dataset compaction; data caching and mirroring strategies
  • Using large time-quanta or very high bandwidth
    bursts, for large data transactions
  • Query estimators, Query Monitors (cf. GCA work)
  • Enable flexible, resilient prioritisation schemes (marginal utility; a toy prioritisation sketch follows after this list)
  • Query redirection, fragmentation, priority
    alteration, etc.
  • Pre-Emptive and realtime data/resource
    matchmaking
  • Resource discovery
  • Data and CPU Location Brokers
  • Co-scheduling and queueing processes
  • State, workflow and performance-monitoring instrumentation; tracking and forward prediction
  • Security: authentication (for resource allocation/usage and priority); running a certificate authority
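A toy Python sketch of marginal-utility prioritisation, with an invented utility/cost model: queued requests are served in order of utility per unit of estimated cost.

import heapq

# Hypothetical prioritisation sketch (the utility model and weights are invented).

def priority(request):
    """Higher utility per unit of estimated cost is served first (min-heap, so negate)."""
    return -request["utility"] / request["estimated_cost"]

requests = [
    {"user": "cms-higgs-group", "utility": 10.0, "estimated_cost": 2.0},
    {"user": "student-analysis", "utility": 3.0, "estimated_cost": 0.5},
    {"user": "bulk-reprocessing", "utility": 20.0, "estimated_cost": 50.0},
]

queue = [(priority(r), i, r) for i, r in enumerate(requests)]
heapq.heapify(queue)
while queue:
    _, _, r = heapq.heappop(queue)
    print(f"serve {r['user']:20s} utility/cost = {r['utility'] / r['estimated_cost']:.2f}")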

27
CMS Example Data Grid Program of Work (I)
  • FY 2000
  • Build basic services; 1-Million-event samples on proto-Tier2s
  • For HLT milestones and detector/physics studies
    with ORCA
  • MONARC Phase 3 simulations for
    study/optimization
  • FY 2001
  • Set up initial Grid system based on PPDG
    deliverables at the first Tier2 centers and
    Tier1-prototype centers
  • High speed site-to-site file replication service
  • Multi-site cached file access
  • CMS Data Challenges in support of DAQ TDR
  • Shakedown of preliminary PPDG (plus MONARC and GIOD) system strategies and tools
  • FY 2002
  • Deploy Grid system at the second set of Tier2
    centers
  • CMS Data Challenges for Software and Computing
    TDR and Physics TDR

28
Data Analysis Grid Program of Work (II)
  • FY 2003
  • Deploy Tier2 centers at last set of sites
  • 5% Scale Data Challenge in Support of the Physics TDR
  • Production-prototype test of Grid Hierarchy
    System, with first elements of the production
    Tier1 Center
  • FY 2004
  • 20% Production (Online and Offline) CMS Mock Data Challenge, with all Tier2 Centers and a partly completed Tier1 Center
  • Build Production-quality Grid System
  • FY 2005 (Q1 - Q2)
  • Final Production CMS (Online and Offline)
    Shakedown
  • Full distributed system software and
    instrumentation
  • Using full capabilities of the Tier2 and Tier1
    Centers

29
Summary
  • The HENP/LHC data handling problem
  • Multi-Petabyte scale, binary pre-filtered data,
    resources distributed worldwide
  • Has no analog now, but will be increasingly prevalent in research and industry by 2005
  • Development of a robust PB-scale networked data
    access and analysis system is mission-critical
  • An effective partnership exists, HENP-wide, through many R&D projects:
  • RD45, GIOD, MONARC, Clipper, GLOBUS, CONDOR, ALDAP, PPDG, ...
  • An aggressive R&D program is required to develop
  • Resilient self-aware systems, for data access,
    processing and analysis across a hierarchy of
    networks
  • Solutions that could be widely applicable to
    data problems in other scientific fields and
    industry, by LHC startup
  • Focus on Data Grids for Next Generation Physics

30
LHC Data Models 1994-2000
  • HEP data models are complex!
  • Rich hierarchy of hundreds of complex data
    types (classes)
  • Many relations between them
  • Different access patterns (Multiple Viewpoints)
  • OO technology
  • OO applications deal with networks of objects
    (and containers)
  • Pointers (or references) are used to describe
    relations
  • Existing solutions do not scale
  • Solution suggested by RD45: an ODBMS coupled to a Mass Storage System
  • Construction of Compact Datasets for Analysis: Rapid Access/Navigation/Transport (a tiny object-model sketch follows below)
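A tiny Python sketch of the "network of objects" and compact-dataset ideas above; the class and attribute names are invented, not the actual LHC data model.

# Assumed illustration: an event holds references to its reconstructed components,
# and a compact (tag/DPD-like) record keeps only the attributes needed for selection.

class Track:
    def __init__(self, pt, charge):
        self.pt, self.charge = pt, charge

class Event:
    def __init__(self, run, number, tracks):
        self.run, self.number = run, number
        self.tracks = tracks                 # references to component objects

    def compact_tag(self):
        """Summary record suitable for a compact analysis dataset."""
        return {"run": self.run, "event": self.number,
                "n_tracks": len(self.tracks),
                "max_pt": max((t.pt for t in self.tracks), default=0.0)}

evt = Event(run=1, number=42, tracks=[Track(12.5, +1), Track(33.0, -1)])
print(evt.compact_tag())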

31
Content Delivery Networks (CDN)
  • Web-Based Server-Farm Networks circa 2000: Dynamic (Grid-Like) Content Delivery Engines
  • Akamai, Adero, Sandpiper
  • 1200 → Thousands of Network-Resident Servers
  • 25 → 60 ISP Networks
  • 25 → 30 Countries
  • 40 Corporate Customers
  • $25 B Capitalization
  • Resource Discovery
  • Build Weathermap of Server Network (State
    Tracking)
  • Query Estimation, Matchmaking/Optimization, Request Rerouting
  • Virtual IP Addressing: One Address per Server Farm
  • Mirroring, Caching
  • (1200) Autonomous-Agent Implementation

32
Strawman Tier 2 Evolution
  • (2000 → 2005)
  • Linux Farm: 1,200 SI95 → 20,000 SI95
  • Disks on CPUs: 4 TB → 50 TB
  • RAID Array: 1 TB → 30 TB
  • Tape Library: 1-2 TB → 50-100 TB
  • LAN Speed: 0.1-1 Gbps → 10-100 Gbps
  • WAN Speed: 155-622 Mbps → 2.5-10 Gbps
  • Collaborative Infrastructure: MPEG2 VGA (1.5-3 Mbps) → Realtime HDTV (10-20 Mbps)
  • Reflects lower Tier 2 component costs due to
    less demanding usage. Some of the CPU will be
    used for simulation.

33
USCMS S&C Spending Profile
2006 is a model year for the operations phase of
CMS
34
GriPhyN Cost
  • System support: $8.0 M
  • R&D: $15.0 M
  • Software: $2.0 M
  • Tier 2 networking: $10.0 M
  • Tier 2 hardware: $50.0 M
  • Total: $85.0 M

35
Grid Hierarchy Concept: Broader Advantages
  • Partitioning of users into proximate communities for support, troubleshooting and mentoring
  • Partitioning of facility tasks, to manage and
    focus resources
  • Greater flexibility to pursue different physics
    interests, priorities, and resource allocation
    strategies by region
  • Lower tiers of the hierarchy → More local control

36
Storage Request Brokers (SRB)
  • Name Transparency: access to data by attributes stored in an RDBMS (MCAT); a small catalog sketch follows after this list
  • Location Transparency: logical collections (by attributes) spanning multiple physical resources
  • Combined location and name transparency means that datasets can be replicated across multiple caches and data archives (PPDG)
  • Data Management Protocol Transparency: SRB with custom-built drivers in front of each storage system
  • The user does not need to know how the data is accessed; SRB deals with the local file system managers
  • SRBs (agents) authenticate themselves and users,
    using Grid Security Infrastructure (GSI)
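A hypothetical Python sketch of the name/location transparency idea (not the real SRB or MCAT interfaces): a catalog maps attributes to logical names, and logical names to physical replicas, so the caller never handles storage-system details.

# Invented catalog entries and URL schemes, for illustration only.
CATALOG = [
    {"logical": "cms/run2000/jets_aod_001",
     "attributes": {"experiment": "CMS", "type": "AOD", "channel": "jets"},
     "replicas": ["hpss://cern.ch/store/jets_aod_001",
                  "file://tier2-caltech/cache/jets_aod_001"]},
]

def find_by_attributes(**query):
    """Return logical names whose attributes match every (key, value) in query."""
    return [e["logical"] for e in CATALOG
            if all(e["attributes"].get(k) == v for k, v in query.items())]

def resolve(logical, prefer="file://"):
    """Pick a replica for a logical name, preferring local (disk-cached) copies."""
    entry = next(e for e in CATALOG if e["logical"] == logical)
    local = [r for r in entry["replicas"] if r.startswith(prefer)]
    return (local or entry["replicas"])[0]

names = find_by_attributes(experiment="CMS", type="AOD")
print(names, "->", resolve(names[0]))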

37
Role of Simulation for Distributed Systems
  • Simulations are widely recognized and used as
    essential tools for the design, performance
    evaluation and optimisation of complex
    distributed systems
  • From battlefields to agriculture; from the factory floor to telecommunications systems
  • Discrete event simulations with an appropriate and high level of abstraction (a toy example follows after this list)
  • Just beginning to be part of the HEP culture
  • Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
  • MONARC (Process-Oriented Java 2 Threads Class
    Lib)
  • These simulations are very different from HEP
    Monte Carlos
  • Time intervals and interrupts are the
    essentials
  • Simulation is a vital part of the study of site
    architectures, network behavior, data
    access/processing/delivery strategies,
    for HENP Grid Design and Optimization
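A toy discrete-event simulation in Python, to make the abstraction concrete; it is not MONARC (which is process-oriented, built on Java 2 threads), and the job arrival and service times are invented.

import heapq

# Minimal discrete-event sketch: events are (time, kind, job) tuples ordered by
# time; here we model jobs arriving at a regional-centre CPU farm and report
# when each one starts and finishes.

def simulate(n_jobs=5, arrival_gap=10.0, service_time=25.0):
    events = []                                   # priority queue of pending events
    for i in range(n_jobs):
        heapq.heappush(events, (i * arrival_gap, "arrive", i))
    cpu_free_at = 0.0
    while events:
        t, kind, job = heapq.heappop(events)
        if kind == "arrive":
            start = max(t, cpu_free_at)           # wait if the farm is busy
            cpu_free_at = start + service_time
            heapq.heappush(events, (cpu_free_at, "done", job))
            print(f"t={t:6.1f}  job {job} arrives, starts at {start:.1f}")
        else:
            print(f"t={t:6.1f}  job {job} finishes")

simulate()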

38
Monitoring Architecture: Use of NetLogger in CLIPPER
  • End-to-end monitoring of grid assets is necessary
    to
  • Resolve network throughput problems
  • Dynamically schedule resources
  • Add precision-timed event monitor agents to
  • ATM switches
  • Storage servers
  • Testbed computational resources
  • Produce trend analysis modules for monitor agents
  • Make results available to applications (a minimal timestamped-event sketch follows below)
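A minimal Python sketch of precision-timestamped monitoring events in this spirit; the field names and event names are assumptions, not the actual NetLogger record format.

import socket, time

# Hypothetical monitoring record: a high-resolution timestamp plus key=value fields.

def log_event(name, **fields):
    """Emit one monitoring record with a high-resolution timestamp."""
    record = {"ts": f"{time.time():.6f}", "host": socket.gethostname(),
              "event": name, **fields}
    print(" ".join(f"{k}={v}" for k, v in record.items()))

log_event("transfer.start", file="run123.db", dest="tier2-caltech", bytes=2_000_000_000)
log_event("transfer.end", file="run123.db", status="ok")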