1
LHC Experiments and the PACI: A Partnership
for Global Data Analysis
  • Harvey B. Newman, Caltech
  • Advisory Panel on CyberInfrastructure
  • National Science Foundation
  • November 29, 2001
  • http://l3www.cern.ch/newman/LHCGridsPACI.ppt

2
Global Data Grid Challenge
  • Global scientific communities, served by
    networks with bandwidths varying by orders of
    magnitude, need to perform computationally
    demanding analyses of geographically distributed
    datasets that will grow by at least 3 orders of
    magnitude over the next decade, from the 100
    Terabyte scale in 2000 to the 100 Petabyte scale
    by 2007

3
The Large Hadron Collider (2006-)
  • The Next-generation Particle Collider
  • The largest superconducting installation in
    the world
  • Bunch-bunch collisions at 40 MHz, each
    generating ~20 interactions
  • Only one in a trillion may lead to a major
    physics discovery
  • Real-time data filtering: Petabytes per second
    to Gigabytes per second (a rough arithmetic
    sketch follows below)
  • Accumulated data of many Petabytes/Year

Large data samples explored and analyzed by
thousands of globally dispersed scientists, in
hundreds of teams
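Purely as an illustrative cross-check (not taken from the slides), a few lines of Python multiply out the rates quoted above; no detector-specific event sizes are assumed.

```python
# Illustrative arithmetic only, multiplying out the rates quoted above.
bunch_crossing_rate_hz = 40e6        # bunch-bunch collisions at 40 MHz
interactions_per_crossing = 20       # ~20 interactions per crossing

interaction_rate_hz = bunch_crossing_rate_hz * interactions_per_crossing
print(f"interaction rate ~ {interaction_rate_hz:.0e} per second")   # ~1e9 /s

# Real-time filtering from Petabytes/sec down to Gigabytes/sec is a
# reduction of roughly six orders of magnitude:
print(f"online data reduction ~ {1e15 / 1e9:.0e} x")

# Discovery-class signatures are rarer still: ~1 in 1e13 interactions.
signal_rate_hz = interaction_rate_hz / 1e13
print(f"candidate discovery events ~ {signal_rate_hz * 86400:.1f} per day")
```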
4
Four LHC Experiments: The Petabyte to Exabyte
Challenge
  • ATLAS, CMS, ALICE, LHCb: Higgs, New Particles,
    Quark-Gluon Plasma, CP Violation

Data stored: ~40 Petabytes/Year and UP
CPU: 0.30 Petaflops and UP
0.1 to 1 Exabyte (1 EB = 10^18 Bytes)
(2007) (2012?) for the LHC Experiments
5
Evidence for the Higgs at LEP at M ~ 115 GeV. The
LEP Program Has Now Ended
6
LHC: Higgs Decay into 4 muons; 1000X LEP Data Rate.
10^9 events/sec, selectivity: 1 in 10^13 (1 person
in a thousand world populations)
7
LHC Data Grid Hierarchy
CERN/Outside Resource Ratio ~1:2;
Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
  • Online System: ~PByte/sec from the Experiment,
    100-400 MBytes/sec to the Tier 0 +1 centre
  • Tier 0 +1 (CERN): 700k SI95, ~1 PB Disk,
    Tape Robot, HPSS
  • 2.5 Gbits/sec links to Tier 1 centres:
    FNAL (200k SI95, 600 TB), IN2P3 Center,
    INFN Center, RAL Center
  • 2.5 Gbps links to Tier 2 centres
  • Tier 3: Institutes (~0.25 TIPS each);
    physicists work on analysis channels; each
    institute has ~10 physicists working on one or
    more channels
  • 100 - 1000 Mbits/sec to Tier 4: physics data
    cache and workstations
8
TeraGrid: NCSA, ANL, SDSC, Caltech
StarLight: Int'l Optical Peering Point (see
www.startap.net)
A Preview of the Grid Hierarchy and Networks of
the LHC Era
[Network map] Sites and hubs: Chicago,
Indianapolis (Abilene NOC), Urbana (NCSA/UIUC),
Pasadena, San Diego, Starlight / NW Univ, UIC,
Ill Inst of Tech, ANL, Univ of Chicago, Multiple
Carrier Hubs. Links: DTF Backplane (4 x λ,
40 Gbps), OC-48 (2.5 Gb/s, Abilene), Multiple
10 GbE (Qwest), Multiple 10 GbE (I-WIRE Dark
Fiber), I-WIRE.
  • Solid lines: in place and/or available in 2001
  • Dashed: I-WIRE lines planned for Summer 2002

Source: Charlie Catlett, Argonne
9
Current Grid Challenges: Resource Discovery,
Co-Scheduling, Transparency
  • Discovery and Efficient Co-Scheduling of
    Computing, Data Handling, and Network Resources
  • Effective, Consistent Replica Management
  • Virtual Data: Recomputation Versus Data
    Transport Decisions
  • Reduction of Complexity In a Petascale World
  • GA3: Global Authentication, Authorization,
    Allocation
  • VDT: Transparent Access to Results (and Data
    When Necessary)
  • Location Independence of the User Analysis,
    Grid, and Grid-Development Environments
  • Seamless Multi-Step Data Processing and
    Analysis: DAGMan (Wisc), MOP/IMPALA (FNAL)

10
CMS Production: Event Simulation and
Reconstruction (Grid-Enabled, Automated)
  • Common production tools (IMPALA), GDMP
  • Steps: Simulation and Digitization (with and
    without pile-up)
  • Worldwide Production at 12 Sites: CERN, FNAL,
    Moscow, INFN, Caltech, UCSD, UFL, Imperial
    College, Bristol, Wisconsin, IN2P3, Helsinki
  • Status by site and step ranges from fully
    operational to in progress; some steps not yet
    operational at IN2P3 and Helsinki
11
US CMS TeraGrid Seamless Prototype
  • Caltech/Wisconsin Condor/NCSA Production
  • Simple Job Launch from Caltech
  • Authentication Using Globus Security
    Infrastructure (GSI)
  • Resources Identified Using Globus Information
    Infrastructure (GIS)
  • CMSIM Jobs (Batches of 100, 12-14 Hours, 100 GB
    Output) Sent to the Wisconsin Condor Flock
    Using Condor-G (see the sketch after this list)
  • Output Files Automatically Stored in NCSA
    Unitree (GridFTP)
  • ORCA Phase: Read-in and Process Jobs at NCSA
  • Output Files Automatically Stored in NCSA Unitree
  • Future: Multiple CMS Sites; Storage in Caltech
    HPSS Also, Using GDMP (With LBNL's HRM)
  • Animated Flow Diagram of the DTF Prototype:
    http://cmsdoc.cern.ch/wisniew/infrastructure.html
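A minimal sketch of what launching one CMSIM batch through Condor-G might look like. The wrapper script, the file names, and the Globus resource contact string are hypothetical placeholders; only the use of Condor-G with a GSI proxy is taken from the slide.

```python
# Hypothetical sketch of a Condor-G submission for one CMSIM batch.
# The wrapper script, file names and Globus resource contact are
# placeholders, not the actual DTF prototype configuration.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    # Send the job to a remote Globus jobmanager (placeholder contact string)
    universe        = globus
    globusscheduler = condor.example.wisc.edu/jobmanager-condor
    # Placeholder wrapper around CMSIM for a batch of 100 events
    executable      = cmsim_wrapper.sh
    arguments       = --events 100
    output          = cmsim_$(Cluster).out
    error           = cmsim_$(Cluster).err
    log             = cmsim_$(Cluster).log
    # GSI proxy credential used for authentication
    x509userproxy   = /tmp/x509up_u1000
    queue
    """)

with open("cmsim.submit", "w") as fh:
    fh.write(submit_description)

# Hand the description to Condor-G; staging the ~100 GB of output into
# NCSA Unitree via GridFTP would be handled by the wrapper / Grid tools.
subprocess.run(["condor_submit", "cmsim.submit"], check=True)
```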

12
Baseline BW for the US-CERN Link: HENP
Transatlantic WG (DOE+NSF)
Transoceanic Networking Integrated with the
TeraGrid, Abilene, Regional Nets and Continental
Network Infrastructures in the US, Europe, Asia,
and South America
US-CERN Plans: 155 Mbps to 2 x 155 Mbps this
Year; 622 Mbps in April 2002; DataTAG 2.5 Gbps
Research Link in Summer 2002; 10 Gbps Research
Link in 2003
13
Transatlantic Net WG (HN, L. Price):
Bandwidth Requirements

Installed BW. Maximum Link Occupancy of 50%
Assumed. The Network Challenge is Shared by Both
Next- and Present-Generation Experiments
14
Internet2 HENP Networking WG Mission
  • To help ensure that the required
  • National and international network
    infrastructures
  • Standardized tools and facilities for high
    performance and end-to-end monitoring and
    tracking, and
  • Collaborative systems
  • are developed and deployed in a timely manner,
    and used effectively to meet the needs of the US
    LHC and other major HENP Programs, as well as
    the general needs of our scientific community.
  • To carry out these developments in a way that is
    broadly applicable across many fields, within and
    beyond the scientific community
  • Co-Chairs: S. McKee (Michigan), H. Newman
    (Caltech); with thanks to R. Gardner and J.
    Williams (Indiana)

15
Grid R&D Focal Areas for NPACI/HENP Partnership
  • Development of Grid-Enabled User Analysis
    Environments
  • CLARENS (IGUANA) Project for Portable
    Grid-Enabled Event Visualization, Data
    Processing and Analysis
  • Object Integration backed by an ORDBMS, and
    File-Level Virtual Data Catalogs
  • Simulation Toolsets for Systems Modeling,
    Optimization
  • For example the MONARC System
  • Globally Scalable Agent-Based Realtime
    Information Marshalling Systems
  • To face the next-generation challenge of
    Dynamic Global Grid design and operations
  • Self-learning (e.g. SONN) optimization
  • Simulation (Now-Casting) enhanced to monitor,
    track and forward predict site, network and
    global system state
  • 1-10 Gbps Networking development and global
    deployment
  • Work with the TeraGrid, STARLIGHT, Abilene, the
    iVDGL GGGOC, HENP Internet2 WG, Internet2 E2E,
    and DataTAG
  • Global Collaboratory Development e.g. VRVS,
    Access Grid

16
CLARENS: a Data Analysis Portal to the Grid
Steenberg (Caltech)
  • A highly functional graphical interface,
    Grid-enabling the working environment for
    non-specialist physicists' data analysis
  • Clarens consists of a server communicating with
    various clients via the commodity XML-RPC
    protocol. This ensures implementation
    independence. (A minimal client sketch follows
    this list.)
  • The server is implemented in C++ to give access
    to the CMS OO analysis toolkit
  • The server will provide a remote API to Grid
    tools
  • Security services provided by the Grid (GSI)
  • The Virtual Data Toolkit: Object collection
    access
  • Data movement between Tier centers using GSI-FTP
  • CMS analysis software (ORCA/COBRA)
  • Current prototype is running on the Caltech
    Proto-Tier2
  • More information at
    http://heppc22.hep.caltech.edu, along with a
    web-based demo
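Because Clarens speaks commodity XML-RPC, any language with an XML-RPC library can act as a client. The sketch below is hypothetical: the port, path, and the file.ls method name are invented placeholders, not the actual Clarens API; only the XML-RPC transport itself is taken from the slide.

```python
# Minimal XML-RPC client sketch for a Clarens-style analysis server.
# The URL, port, path and the 'file.ls' method are hypothetical.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://heppc22.hep.caltech.edu:8080/clarens")

# Standard XML-RPC introspection, if the server exposes it.
try:
    print(server.system.listMethods())
except (xmlrpc.client.Fault, OSError) as err:
    print("introspection not available:", err)

# A hypothetical remote call, e.g. listing files in a dataset area.
# In the real system, GSI security and the Grid API sit behind such calls.
try:
    for entry in server.file.ls("/data/orca"):
        print(entry)
except (xmlrpc.client.Fault, OSError) as err:
    print("call failed:", err)
```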

17
Modeling and Simulation: MONARC System
  • Modelling and understanding current systems,
    their performance and limitations, is essential
    for the design of the future large scale
    distributed processing systems.
  • The simulation program developed within the
    MONARC (Models Of Networked Analysis At Regional
    Centers) project is based on a process-oriented
    approach to discrete event simulation. It is
    built on Java(TM) technology and provides a
    realistic modelling tool for such large scale
    distributed systems. (A toy illustration follows
    below.)

SIMULATION of Complex Distributed Systems
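The MONARC toolkit itself is a full Java simulation framework; the toy sketch below only illustrates the process-oriented discrete-event idea (jobs competing for CPUs, an event queue advancing simulated time). Site names are reused from the figures, while job counts and durations are made up.

```python
# Toy process-oriented discrete-event simulation in the spirit of MONARC.
# Job counts and durations are invented; the real MONARC toolkit is a far
# richer Java simulation framework.
import heapq
import random

random.seed(1)

sites = {"CERN": 30, "CALTECH": 25, "NUST": 20}   # CPUs per regional centre
free_cpus = dict(sites)
event_queue = []                                   # (finish_time, tag, site)
waiting = {name: 0 for name in sites}
completed = {name: 0 for name in sites}

# Submit 200 jobs at t=0, randomly assigned, each needing 2-6 hours of CPU.
for _ in range(200):
    site = random.choice(list(sites))
    if free_cpus[site] > 0:
        free_cpus[site] -= 1
        heapq.heappush(event_queue, (random.uniform(2, 6), "done", site))
    else:
        waiting[site] += 1

now = 0.0
while event_queue:
    now, _, site = heapq.heappop(event_queue)      # next completion event
    completed[site] += 1
    if waiting[site] > 0:                          # start a queued job
        waiting[site] -= 1
        heapq.heappush(event_queue, (now + random.uniform(2, 6), "done", site))
    else:
        free_cpus[site] += 1

print(f"all jobs finished at t = {now:.1f} h, per site: {completed}")
```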
18
MONARC SONN: 3 Regional Centres Learning to
Export Jobs (Day 9)
[Simulation snapshot, Day 9] Regional centres:
CERN (30 CPUs), CALTECH (25 CPUs), NUST (20
CPUs); job efficiencies <E> of 0.73, 0.83 and
0.66; inter-site links of 1 MB/s (150 ms RTT),
1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT).
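The SONN itself is not reproduced here. As a stand-in, the sketch below uses a deliberately simple softmax/weight-update learner to convey the idea of "learning to export jobs" toward centres with better observed efficiency. The <E> values from the figure are reused as nominal per-centre efficiencies (their assignment to particular centres is arbitrary); everything else is invented.

```python
# Not the MONARC SONN: a deliberately simple adaptive learner that captures
# the flavour of learning where to export jobs between regional centres.
import math
import random

random.seed(2)

centres = ["CERN", "CALTECH", "NUST"]
weights = {c: 0.0 for c in centres}            # preference for exporting to c
LEARNING_RATE = 0.3

def observed_efficiency(centre: str) -> float:
    """Stand-in for measured <E>; a real system would monitor actual job
    turnaround, queue lengths and link quality."""
    base = {"CERN": 0.73, "CALTECH": 0.83, "NUST": 0.66}[centre]
    return max(0.0, min(1.0, random.gauss(base, 0.05)))

def choose_centre() -> str:
    """Softmax choice: prefer centres that have recently performed well."""
    exps = {c: math.exp(weights[c]) for c in centres}
    r, acc = random.uniform(0, sum(exps.values())), 0.0
    for c, e in exps.items():
        acc += e
        if r <= acc:
            return c
    return centres[-1]

for day in range(10):                           # simulate 10 "days"
    for _ in range(50):                         # 50 exported jobs per day
        c = choose_centre()
        reward = observed_efficiency(c)
        weights[c] += LEARNING_RATE * (reward - 0.75)   # push toward good centres

print({c: round(w, 2) for c, w in weights.items()})
```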
19
Maximizing US-CERN TCP Throughput (S. Ravot,
Caltech)
  • TCP Protocol Study: Limits
  • We determined precisely:
  • The parameters which limit the throughput over
    a high-BW, long delay (170 msec) network
  • How to avoid intrinsic limits and unnecessary
    packet loss
  • Methods Used to Improve TCP
  • Linux kernel programming in order to tune TCP
    parameters
  • We modified the TCP algorithm
  • A Linux patch will soon be available
  • Result: The Current State of the Art for
    Reproducible Throughput
  • 125 Mbps between CERN and Caltech
  • 135 Mbps between CERN and Chicago
  • Status: Ready for Tests at Higher BW (622 Mbps)
    in Spring 2002

Congestion window behavior of a TCP connection
over the transatlantic line
Reproducible 125 Mbps Between CERN and
Caltech/CACR
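Standard TCP window arithmetic (not the actual kernel modifications described above) makes these limits concrete: at ~170 ms RTT the congestion window must hold a full bandwidth-delay product of data, and a single loss in Reno-style congestion avoidance is very expensive to recover from.

```python
# Bandwidth-delay product arithmetic for the US-CERN path (standard TCP
# reasoning, not the actual Linux patch described above).
RTT_S = 0.170                        # ~170 ms round-trip time CERN <-> Caltech

def required_window_bytes(rate_bps: float, rtt_s: float = RTT_S) -> float:
    """TCP window needed to keep the pipe full: throughput ~ window / RTT."""
    return rate_bps / 8.0 * rtt_s

for label, rate_bps in [("125 Mbps (achieved)", 125e6),
                        ("622 Mbps (2002 target)", 622e6),
                        ("2.5 Gbps (DataTAG link)", 2.5e9)]:
    window_mb = required_window_bytes(rate_bps) / 1e6
    print(f"{label:24s} -> window ~ {window_mb:5.1f} MB")

# After a loss, Reno-style congestion avoidance grows the window by about
# one segment per RTT, so recovering half of a multi-MB window takes
# thousands of RTTs -- hence the emphasis on avoiding unnecessary loss.
mss_bytes = 1460
window_segments = required_window_bytes(622e6) / mss_bytes
print(f"RTTs to recover half the 622 Mbps window: ~{window_segments / 2:.0f}")
```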
20
Agent-Based Distributed System JINI Prototype
(Caltech/Pakistan)
  • Includes Station Servers (static) that host
    mobile Dynamic Services
  • Servers are interconnected dynamically to form a
    fabric in which mobile agents travel, with a
    payload of physics analysis tasks
  • Prototype is highly flexible and robust against
    network outages
  • Amenable to deployment on leading edge and
    future portable devices (WAP, iAppliances, etc.)
  • The system for the travelling physicist
  • The Design and Studies with this prototype use
    the MONARC Simulator, and build on SONN
    studies. See http://home.cern.ch/clegrand/lia/

21
Globally Scalable Monitoring Service
[Architecture] A client (or another service) uses
Discovery against a Lookup Service, which returns
a Proxy to the RC Monitor Service after that
service's Registration. Monitoring data is
gathered from the Farm Monitors at each site by
Push or Pull: rsh, ssh, existing scripts, snmp.
RC Monitor Service:
  • Component Factory
  • GUI marshaling
  • Code Transport
  • RMI data access
Farm Monitors
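The prototype above is JINI/RMI-based. Purely to illustrate the "push" half of the push/pull model, here is a tiny, hypothetical farm-monitor agent; the collector URL and the metric set are invented placeholders.

```python
# Hypothetical push-style farm monitor, for illustration only.
# The real prototype described above is JINI/RMI-based; a plain HTTP POST
# stands in here for service registration and data push.
import json
import os
import time
import urllib.request

COLLECTOR_URL = "http://monitor.example.org/push"   # placeholder collector

def collect_metrics() -> dict:
    """Gather a few cheap local metrics (host name, time, load averages)."""
    load1, load5, load15 = os.getloadavg()
    return {"host": os.uname().nodename,
            "time": time.time(),
            "load1": load1, "load5": load5, "load15": load15}

def push(metrics: dict) -> None:
    data = json.dumps(metrics).encode()
    req = urllib.request.Request(COLLECTOR_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
    except OSError as err:                          # collector unreachable, etc.
        print("push failed:", err)

if __name__ == "__main__":
    for _ in range(3):                              # a few cycles for the sketch
        push(collect_metrics())
        time.sleep(10)
```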
22
Examples
  • GLAST meeting
  • 10 participants connected via VRVS (and 16
    participants in Audio only)

VRVS: 7300 Hosts; 4300 Registered Users in 58
Countries; 34 Reflectors, 7 in I2; Annual Growth
250%
US CMS will use the CDF/KEK remote control room
concept for Fermilab Run II as a starting point.
However, we will (1) expand the scope to
encompass a US based physics group and US LHC
accelerator tasks, and (2) extend the concept to
a Global Collaboratory for realtime data
acquisition and analysis
23
Next Round of Grid Challenges: Global Workflow
Monitoring, Management, and Optimization
  • Workflow Management, Balancing Policy Versus
    Moment-to-moment Capability to Complete Tasks
  • Balance High Levels of Usage of Limited Resources
    Against Better Turnaround Times for Priority
    Jobs
  • Goal-Oriented According to (Yet to be Developed)
    Metrics
  • Maintaining a Global View of Resources and System
    State
  • Global System Monitoring, Modeling,
    Quasi-realtime simulation feedback on the
    Macro- and Micro-Scales
  • Adaptive Learning: new paradigms for execution
    optimization and Decision Support (eventually
    automated)
  • Grid-enabled User Environments

24
PACI, TeraGrid and HENP
  • The scale, complexity and global extent of the
    LHC Data Analysis problem are unprecedented
  • The solution of the problem, using globally
    distributed Grids, is mission-critical for
    frontier science and engineering
  • HENP has a tradition of deploying new highly
    functional systems (and sometimes new
    technologies) to meet its technical and
    ultimately its scientific needs
  • HENP problems are mostly embarrassingly
    parallel but potentially overwhelming in their
    data- and network intensiveness
  • HENP/Computer Science synergy has increased
    dramatically over the last two years, focused on
    Data Grids
  • Successful collaborations in GriPhyN, PPDG, EU
    Data Grid
  • The TeraGrid (present and future) and its
    development program is scoped at an appropriate
    level of depth and diversity
  • to tackle the LHC and other Petascale
    problems, over a 5 year time span
  • matched to the LHC time schedule, with full ops.
    in 2007

25
Some Extra Slides Follow
26
Computing Challenges: LHC Example
  • Geographical dispersion of people and resources
  • Complexity: the detector and the LHC environment
  • Scale: Tens of Petabytes per year of data

5000 Physicists, 250 Institutes, 60 Countries
Major challenges associated with: Communication
and collaboration at a distance;
Network-distributed computing and data resources;
Remote software development and physics analysis;
R&D on New Forms of Distributed Systems: Data
Grids
27
Why Worldwide Computing? Regional Center Concept:
Goals
  • Managed, fair-shared access for Physicists
    everywhere
  • Maximize total funding resources while meeting
    the total computing and data handling needs
  • Balance proximity of datasets to large central
    resources, against regional resources under more
    local control
  • Tier-N Model
  • Efficient network use: higher throughput on
    short paths
  • Local > regional > national > international
  • Utilizing all intellectual resources, in several
    time zones
  • CERN, national labs, universities, remote sites
  • Involving physicists and students at their home
    institutions
  • Greater flexibility to pursue different physics
    interests, priorities, and resource allocation
    strategies by region
  • And/or by Common Interests (physics topics,
    subdetectors, ...)
  • Manage the System's Complexity
  • Partitioning facility tasks, to manage and focus
    resources

28
HENP Related Data Grid Projects
  • Funded Projects
  • PPDG I: USA, DOE, 2M, 1999-2001
  • GriPhyN: USA, NSF, 11.9M + 1.6M, 2000-2005
  • EU DataGrid: EU, EC, 10M, 2001-2004
  • PPDG II (CP): USA, DOE, 9.5M, 2001-2004
  • iVDGL: USA, NSF, 13.7M + 2M, 2001-2006
  • DataTAG: EU, EC, 4M, 2002-2004
  • About to be Funded Project
  • GridPP: UK, PPARC, >15M?, 2001-2004
  • Many national projects of interest to HENP
  • Initiatives in the US, UK, Italy, France, NL,
    Germany, Japan, ...
  • EU networking initiatives (Géant, SURFNet)
  • US Distributed Terascale Facility (53M, 12
    TFLOPS, 40 Gb/s network)

in final stages of approval
29
Network Progress and Issues for Major Experiments
  • Network backbones are advancing rapidly to the 10
    Gbps range; Gbps end-to-end data flows will soon
    be in demand
  • These advances are likely to have a profound
    impact on the major physics Experiments'
    Computing Models
  • We need to work on the technical and political
    network issues
  • Share technical knowledge of TCP Windows,
    Multiple Streams, OS kernel issues; Provide a
    User Toolset
  • Getting higher bandwidth to regions outside W.
    Europe and the US: China, Russia, Pakistan,
    India, Brazil, Chile, Turkey, etc.
  • Even to enable their collaboration
  • Advanced integrated applications, such as Data
    Grids, rely on seamless, transparent operation
    of our LANs and WANs
  • With reliable, quantifiable (monitored), high
    performance
  • Networks need to become part of the Grid(s)
    design
  • New paradigms of network and system monitoring
    and use need to be developed, in the Grid
    context

30
Grid-Related R&D Projects in CMS: Caltech, FNAL,
UCSD, UWisc, UFL
  • Installation, Configuration and Deployment of
    Prototype Tier2 Centers at Caltech/UCSD and
    Florida
  • Large Scale Automated Distributed Simulation
    Production
  • DTF TeraGrid (Micro-)Prototype: CIT, Wisconsin
    Condor, NCSA
  • Distributed MOnte Carlo Production (MOP): FNAL
  • MONARC Distributed Systems Modeling:
    Simulation system applications to Grid Hierarchy
    management
  • Site configurations, analysis model, workload
  • Applications to strategy development, e.g.
    inter-site load balancing using a Self
    Organizing Neural Net (SONN)
  • Agent-based System Architecture for
    Distributed Dynamic Services
  • Grid-Enabled Object Oriented Data Analysis

31
MONARC Simulation System Validation
CMS Proto-Tier1 Production Farm at FNAL
CMS Farm at CERN
32
MONARC SONN: 3 Regional Centres Learning to
Export Jobs (Day 0)
[Simulation snapshot, Day 0] Regional centres:
CERN (30 CPUs), CALTECH (25 CPUs), NUST (20
CPUs); inter-site links of 1 MB/s (150 ms RTT),
1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT).
33
US CMS Remote Control Room For LHC
34
[SC2001 demo setup] A Denver Client holds a Tag
database of 140,000 small objects and sends
Requests to two Tier2 servers holding Full Event
Databases of 100,000 and 40,000 large objects;
the extracted database files are returned over
Parallel tuned GSI FTP.
Bandwidth-Greedy Grid-enabled Object Collection
Analysis for Particle Physics (SC2001 Demo):
Julian Bunn, Ian Fisk, Koen Holtman, Harvey
Newman, James Patton
The object of this demo is to show grid-supported
interactive physics analysis on a set of 144,000
physics events. Initially we start out with
144,000 small Tag objects, one for each event, on
the Denver client machine. We also have 144,000
LARGE objects, containing full event data,
divided over the two tier2 servers.
  • Using the local Tag event database, the user
    plots event parameters of interest
  • The user selects a subset of events to be
    fetched for further analysis
  • Lists of matching events are sent to Caltech
    and San Diego
  • The Tier2 servers begin sorting through their
    databases, extracting the required events
  • For each required event, a new large virtual
    object is materialized in the server-side
    cache; this object contains all tracks in the
    event
  • The database files containing the new objects
    are sent to the client using Globus FTP, and
    the client adds them to its local cache of
    large objects
  • The user can now plot event parameters not
    available in the Tag
  • Future requests take advantage of previously
    cached large objects in the client (a schematic
    sketch of this flow follows below)
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
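A schematic of the client-side flow described above, with hypothetical helper functions standing in for the Tag database query, the server-side virtual-object materialization, and the parallel GSI FTP transfer; the real demo used Objectivity databases and Globus tools.

```python
# Schematic of the SC2001 demo's client-side flow; every helper below is a
# hypothetical stand-in for the demo's actual database and GSI FTP machinery.

local_cache = {}                      # event_id -> large event object

def query_tag_database(cut) -> list:
    """Stand-in: select event ids from the local Tag database using a cut."""
    return [eid for eid in range(144_000) if cut(eid)]

def fetch_from_tier2(server: str, event_ids: list) -> dict:
    """Stand-in for the server materializing large virtual objects and
    shipping the containing database files back over parallel GSI FTP."""
    return {eid: {"server": server, "tracks": []} for eid in event_ids}

def analyze(selection_cut, plot) -> None:
    wanted = query_tag_database(selection_cut)
    missing = [eid for eid in wanted if eid not in local_cache]

    # Split the request between the two Tier2 servers, as in the demo.
    half = len(missing) // 2
    for server, ids in (("caltech-tier2", missing[:half]),
                        ("sandiego-tier2", missing[half:])):
        local_cache.update(fetch_from_tier2(server, ids))

    # Later requests reuse the cached large objects instead of refetching.
    for eid in wanted:
        plot(local_cache[eid])

analyze(lambda eid: eid % 1000 == 0, plot=lambda event: None)
```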