1
The Caltech CMS/L3 Group: Physics, Software and Computing
Grids and Networks for HENP
Harvey B. Newman, California Institute of Technology
Mary Anne Scott (DOE/MICS) Visit
2
CALTECH L3/CMS GROUP
  • E. Aslakson, J. Bunn, G. Denis, P. Galvez, M.
    Gataullin, K. Holtman, S. Iqbal, I. Legrand, X.
    Lei, V. Litvin, H. Newman, S. Ravot, S.
    Shevchenko, S. Singh, E. Soedermadji, C.
    Steenberg, R. Wilkinson, L. Zhang, K. Wei, Q.
    Wei, R. Y. Zhu
  • L3 At LEP 1981 - 2002
  • CMS At LHC 1994 - 2020
  • Search for Higgs, SUSY, New Physics from
    Electroweak to Quantum Gravity
  • Precision Electroweak to the TeV Scale
  • Emphasis on Precision e/γ Measurements
  • MINOS At FNAL 2001 - 2006
  • Neutrino Oscillations and Flavor Mixing

3
L3/CMS GROUP Personnel In FY 2002
  • Harvey Newman Professor
  • Renyuan Zhu Member of the
    Professional Staff
  • Sergey Shevchenko Senior Research Fellow
  • Marat Gataullin, Xia Lei Graduate Students on L3
  • Vladimir Litvin, Iosif Legrand US CMS Software
    Engineers (Distributed Computing and Data Systems)
  • Julian Bunn Senior Staff
    Scientist (CACR/HEP ALDAP)
  • Rick Wilkinson Staff Scientist CMS Core
    SW/Reconstruction
  • Philippe Galvez Senior Network
    Engineer
  • Gregory Denis, Kun Wei Multimedia Engineers
    (VRVS)
  • Sylvain Ravot (from 8/01) Network Engineer
    (LHCNet)
  • Liyuan Zhang, Qing Wei Visiting Scientists (Laser
    Optics Specialists)
  • Takako Hickey (to 5/02) CS Grid Production
    Systems (PPDG)
  • Conrad Steenberg Software Engineer CMS Grid SW
    (PPDG)
  • Koen Holtman CS CMS Grid SW and Databases
    (ALDAP)
  • Suresh Singh (from 3/01) Grid Production and
    Tier2 Support (GriPhyN/iVDGL)
  • Saima Iqbal (from 11/01) ORACLE Databases
    (ALDAP)
  • Edwin Soedermadji (from 11/01) CMS Grid SW
    Tier2 Support (iVDGL)
  • Eric Aslakson (from 12/01) Grid
    Software (PPDG)
  • N. Wisniewski, T. Lee Students Part-time Photon
    Reconstruction

4
L3 PHYSICS RESULTSand the CALTECH GROUP
  • Of the 255 L3 Physics Papers Published to Date:
  • 39 have been written by Caltech group members,
    and rely on their analysis
  • 38 more have been produced under Caltech group
    leadership

5

THE CALTECH L3/CMS GROUP L3 THESES and
PHYSICS ANALYSIS
  • LEP 1 (Z0 Peak, Ecm = 88-93 GeV)
  • Led 3 of 8 Analysis Groups: New Particles,
    Taus, QCD
  • Precision Electroweak: τ+τ-(γ) M. Gruenewald 1993;
    e+e-(γ) W. Lu 1997
  • Inclusive Hard Photons With Jets D.
    Kirkby 1995
  • LEP 2
  • Led 2 of 3 Particle Search Groups: SUSY and
    Exotics
  • W Physics: Mass, Cross Sections and Electroweak
    Couplings A. Shvorob 2000
  • Physics with Single or Multi-γ and Missing Energy
    M. Gataullin 2002 (Anomalous Couplings, SUSY,
    ν Counting)
  • Searches for Supersymmetric Leptons
    Lei Xia 2002


6
Evidence for the Higgs at LEP at M = 115.5 GeV; The
LEP Program Has Now Ended
L3
ALEPH
H → νν Candidate (?): Two well b-tagged jets, m = 114.4
GeV (error ~3 GeV)
7
The Large Hadron Collider (2007-)
  • The Next-generation Particle Collider
  • The largest superconductor installation in
    the world
  • Bunch-bunch collisions at 40 MHz, each generating
    ~20 interactions
  • Only one in a trillion may lead to a major
    physics discovery
  • Real-time data filtering: Petabytes per second
    to Gigabytes per second
  • Accumulated data of many Petabytes/Year

Large data samples explored and analyzed by
thousands of globally dispersed scientists, in
hundreds of teams
8
Four LHC Experiments The
Petabyte to Exabyte Challenge
  • ATLAS, CMS, ALICE, LHCb: Higgs and New Particles;
    Quark-Gluon Plasma; CP Violation

Data stored: 40 Petabytes/Year and up; CPU: 0.30 Petaflops and up (2007)
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2012?)
for the LHC Experiments
9
Higgs Events In CMS
Higgs to Two Photons
Higgs to Four Muons
FULL CMS SIMULATION
  • General purpose pp detector, well-adapted to
    lower initial luminosity
  • Crystal ECAL for precise e and γ measurements
  • Precise All-Silicon Tracker (223 m2) with Three
    Pixel Layers
  • Excellent muon ID and precise momentum
    measurements (Tracker + Standalone Muon)
  • Hermetic jet measurements with good resolution

10
Higgs, SUSY and Dark Matter: Discovery Reach at CMS
  • The Full Range of SM Higgs Masses Will Be
    Covered
  • MH < 1 TeV
  • In the MSSM Higgs Sector
  • Mh < 130 GeV Maximum
  • Nearly All the Parameter Space Will Be
    Explored
  • Discovery Reach for SUSY
  • Squarks and Gluinos to M > 2 TeV (Not
    Sensitive to SM Backgrounds)
  • Cosmologically Interesting Region of SUSY
    Parameter Space Covered
  • SUSY Leptons
  • SUSY Signals Likely To Be Visible In The First
    (few) fb-1
  • LHC First Runs in 2007

11
CALTECH CONTRIBUTIONS and LEADERSHIP In CMS and
LHC
  • US CMS Collaboration Board Chair (1998-2000,
    2000-2002; Re-Nominated in May 2002)
  • Originated and Launched the US CMS S&C
    (Software and Computing) Project
  • Led the MONARC Project
  • Original LHC Grid Data Hierarchy Model; Set
    Computing, Data and Network Requirements
  • Co-PI on PPDG, GriPhyN/iVDGL and ALDAP Grid
    Projects: Grid Software and Systems Development
  • Tier2 Prototype at Caltech and UCSD
  • Regional Center and Network System Design for CMS
    in India, Pakistan, China, Brazil, Romania, ...
  • High Bandwidth Networking and Remote
    Collaboration Systems for LHC and HENP
  • Co-PI of the TAN WG; ICFA-SCIC Chair; PI of the
    I2 HENP WG
  • VRVS System
  • CMS ECAL

12
Baseline BW for the US-CERN Link: HENP
Transatlantic WG (DOE/NSF)
Transoceanic Networking Integrated with the
Abilene, TeraGrid, Regional Nets and Continental
Network Infrastructures in the US, Europe, Asia, and
South America
Baseline evolution typical of major HENP links,
2001-2006
  • US-CERN Link 622 Mbps this month
  • DataTAG 2.5 Gbps Research Link in Summer 2002
  • 10 Gbps Research Link by Approx. Mid-2003

13
Transatlantic Net WG (HN, L. Price)
Bandwidth Requirements

Installed BW. Maximum Link Occupancy 50%
Assumed. See http://gate.hep.anl.gov/lprice/TAN
(US-CERN update)
14
DataTAG Project
[Network map: New York, Abilene, StarLight, ESnet, Geneva;
Wave Triangle; CalREN, STAR TAP]
  • EU-Solicited Project. CERN, PPARC (UK), Amsterdam
    (NL), and INFN (IT), with US (DOE/NSF: UIC, NWU
    and Caltech) partners
  • Main Aims
  • Ensure maximum interoperability between US and EU
    Grid Projects
  • Transatlantic Testbed for advanced network
    research
  • 2.5 Gbps Wavelength Triangle 7/02 (10 Gbps
    Triangle in 2003)

15
TeraGrid: NCSA, ANL, SDSC, Caltech
StarLight Intl Optical Peering Point (see
www.startap.net)
A Preview of the Grid Hierarchy and Networks of
the LHC Era
[Network diagram: DTF Backplane (4 wavelengths, 40 Gbps) linking
Pasadena, San Diego, Urbana (NCSA/UIUC) and the Chicago area
(StarLight / NW Univ, UIC, Ill Inst of Tech, Univ of Chicago, ANL,
multiple carrier hubs), with OC-48 (2.5 Gb/s) Abilene links via
Indianapolis (Abilene NOC); multiple 10 GbE over Qwest and over
I-WIRE dark fiber]
  • Solid lines in place and/or available in 2001
  • Dashed I-WIRE lines planned for Summer 2002

Source: Charlie Catlett, Argonne
16
Internet2 HENP WG
  • Mission: To help ensure that the required
  • National and international network
    infrastructures (end-to-end)
  • Standardized tools and facilities for high
    performance and end-to-end monitoring and
    tracking, and
  • Collaborative systems
  • are developed and deployed in a timely manner,
    and used effectively to meet the needs of the
    US LHC and other major HENP Programs, as well as
    the at-large scientific community.
  • To carry out these developments in a way that is
    broadly applicable across many fields
  • Formed an Internet2 WG as a suitable framework
    Oct. 26 2001
  • Co-Chairs: S. McKee (Michigan), H.
    Newman (Caltech); Sec'y: J. Williams (Indiana)
  • Website: http://www.internet2.edu/henp; also see
    the Internet2 End-to-End Initiative:
    http://www.internet2.edu/e2e

17
HENP Projects: Object Databases and Regional
Centers to Data Grids
  • RD45, GIOD: Networked Object Databases
  • MONARC: LHC Regional Center Computing Model;
    Architecture, Simulation, Strategy, Politics
  • ALDAP: OO Database Structures and Access Methods
    for Astrophysics and HENP Data
  • PPDG
  • GriPhyN: Production-Scale Data Grids
  • iVDGL: Int'l Testbeds as Grid
    Laboratories
  • EU Data Grid

18
MONARC Project
  • Models Of Networked Analysis
    At Regional Centers
  • Caltech, CERN, Columbia, FNAL, Heidelberg,
  • Helsinki, INFN, IN2P3, KEK, Marseilles, MPI
    Munich, Orsay, Oxford, Tufts
  • PROJECT GOALS
  • Develop Baseline Models
  • Specify the main parameters characterizing
    the Models: performance, throughputs, latencies
  • Verify resource requirement baselines:
    Computing, Data Handling, Networks
  • TECHNICAL GOALS
  • Define the Analysis Process
  • Define RC Architectures and Services
  • Provide Guidelines for the final Models
  • Provide a Simulation Toolset for Further Model
    studies

[Diagram, Model Circa 2006: CERN (700k SI95, 1000 TB disk, robot)
linked at 2.5 Gbps to Tier1 centres such as FNAL/BNL (200k SI95,
650 TByte disk, robot); Tier2 centres (50k SI95, 100 TB disk,
robot) at N x 2.5 Gbps; universities at 0.6-2.5 Gbps; optional
air freight]
19
Modeling and Simulation: MONARC System (I.
Legrand)
  • The simulation program developed within MONARC
    (Models Of Networked Analysis At Regional
    Centers) uses a process- oriented approach for
    discrete event simulation, and provides a
    realistic modelling tool for large scale
    distributed systems.

SIMULATION of Complex Distributed Systems for LHC
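To make the process-oriented discrete event simulation approach concrete, here is a minimal, self-contained Python sketch (an added illustration, not the MONARC code, which is a Java-based toolset; all parameters and names below are invented): jobs arrive at a regional-centre farm, queue for a fixed pool of CPUs, and the mean turnaround time is reported.

import heapq
import random

# Minimal discrete event simulation of a regional-centre farm:
# jobs arrive at random, queue for a fixed pool of CPUs, and we
# record the mean time each job spends waiting plus running.
# All numbers below are illustrative, not MONARC parameters.

def simulate(n_cpus=20, n_jobs=200, mean_interarrival=5.0,
             mean_service=80.0, seed=1):
    random.seed(seed)
    events = []            # (time, kind, job_id) min-heap
    t = 0.0
    for job in range(n_jobs):
        t += random.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, "arrive", job))

    free_cpus = n_cpus
    waiting = []           # FIFO queue of queued job ids
    arrival_time = {}
    turnaround = []

    while events:
        now, kind, job = heapq.heappop(events)
        if kind == "arrive":
            arrival_time[job] = now
            waiting.append(job)
        else:               # "finish": a CPU frees up
            free_cpus += 1
            turnaround.append(now - arrival_time[job])
        # start as many queued jobs as there are free CPUs
        while free_cpus > 0 and waiting:
            free_cpus -= 1
            next_job = waiting.pop(0)
            service = random.expovariate(1.0 / mean_service)
            heapq.heappush(events, (now + service, "finish", next_job))

    return sum(turnaround) / len(turnaround)

if __name__ == "__main__":
    print("mean turnaround (arbitrary time units):", round(simulate(), 1))

A process-oriented simulator such as MONARC models many such farms, plus the network links between them, in the same event-driven style.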
20
LHC Data Grid Hierarchy (2007)
CERN/Outside Resource Ratio ~1:2; Tier0 : (Sum of Tier1) : (Sum of
Tier2) ~1:1:1

[Diagram: Online System at the Experiment (~PByte/sec) feeds the
Tier 0+1 centre at CERN (700k SI95, 1 PB disk, tape robot, HPSS)
at 100-400 MBytes/sec; Tier 1 centres (FNAL 200k SI95, 600 TB;
IN2P3, INFN, RAL) connect at 2.5 Gbps; Tier 2 centres at 2.5-10
Gbps; Tier 3 institutes (~0.25 TIPS) at 0.1-10 Gbps; Tier 4
workstations with physics data caches. Physicists work on analysis
channels; each institute has ~10 physicists working on one or more
channels.]
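As a hedged illustration of the resource ratios quoted above (CERN : outside ~1:2 and Tier0 : sum Tier1 : sum Tier2 ~1:1:1), the short Python sketch below encodes a toy tier table and checks the ratios; only the CERN and FNAL CPU figures come from the slide, and every other number is a placeholder.

# Toy encoding of the 2007 LHC Data Grid Hierarchy resource ratios.
# Only the CERN and FNAL CPU figures come from the slide (700k and
# 200k SI95); the remaining Tier1/Tier2 entries are placeholders
# chosen so that Tier0 : sum(Tier1) : sum(Tier2) is roughly 1:1:1.

tiers = {
    "Tier0 (CERN)": {"cpu_kSI95": 700},
    "Tier1":        {"FNAL": 200, "IN2P3": 170, "INFN": 170, "RAL": 160},
    "Tier2":        {"site%d" % i: 70 for i in range(10)},
}

tier0 = tiers["Tier0 (CERN)"]["cpu_kSI95"]
tier1 = sum(tiers["Tier1"].values())
tier2 = sum(tiers["Tier2"].values())

print("Tier0 : sum(Tier1) : sum(Tier2) =",
      f"{tier0} : {tier1} : {tier2} kSI95")
print("CERN / outside ratio = 1 : %.1f" % ((tier1 + tier2) / tier0))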
21
VRVS: 11,020 Hosts; 6,205 Registered Users in 65
Countries; 42 (7 on I2) Reflectors; Annual Growth 2
to 3X
22
The Particle Physics Data Grid (PPDG)
ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC,
U.Wisc/CS, Florida
Site-to-Site Data Replication Service: 100
MBytes/sec

[Diagram: PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot)
replicating to a SECONDARY SITE (CPU, Disk, Tape Robot)]
  • First Round: Optimized cached read access to
    10-100 GBytes drawn from a total data set of
    0.1 to 1 Petabyte (a minimal caching sketch
    follows below)

Multi-Site Cached File Access Service

[Diagram: PRIMARY SITE (DAQ, Tape, CPU, Disk, Robot) serving
Satellite Sites (Tape, CPU, Disk, Robot) and many University
sites (CPU, Disk, Users)]
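The "First Round" bullet above calls for optimized cached read access to 10-100 GByte subsets of a 0.1-1 Petabyte data set. The following Python fragment is a hedged sketch of the basic idea, not PPDG code: a site-local cache fetches a file from the primary site only on a miss and evicts least-recently-used files when a space budget is exceeded (file names, sizes and the budget are invented).

from collections import OrderedDict

# Sketch of a site-local read cache in front of a remote primary
# site. fetch_from_primary() is a stand-in for the real wide-area
# transfer; here it just reports the copy and returns a fake size.

CACHE_BUDGET_GB = 100          # local disk budget for cached files

def fetch_from_primary(lfn):
    print("WAN transfer of", lfn)
    return 2.0                 # pretend every file is 2 GB

class FileCache:
    def __init__(self, budget_gb=CACHE_BUDGET_GB):
        self.budget = budget_gb
        self.used = 0.0
        self.files = OrderedDict()   # lfn -> size, in LRU order

    def open(self, lfn):
        if lfn in self.files:             # cache hit: refresh LRU order
            self.files.move_to_end(lfn)
            return "local:" + lfn
        size = fetch_from_primary(lfn)    # cache miss: pull from primary
        while self.used + size > self.budget and self.files:
            old, old_size = self.files.popitem(last=False)
            self.used -= old_size         # evict least-recently-used file
        self.files[lfn] = size
        self.used += size
        return "local:" + lfn

cache = FileCache()
for name in ["qcd_001.root", "qcd_002.root", "qcd_001.root"]:
    cache.open(name)           # second access to qcd_001 is a local hit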
23
Particle Physics Data Grid Collaboratory Pilot
(2001-2003)
The Particle Physics Data Grid Collaboratory
Pilot will develop, evaluate and deliver vitally
needed Grid-enabled tools for data-intensive
collaboration in particle and nuclear physics.
Novel mechanisms and policies will be vertically
integrated with Grid Middleware, experiment
specific applications and computing resources to
provide effective end-to-end capability.
  • Computer Science Program of Work
  • CS1 Job Description Language
  • CS2 Schedule and Manage Data Processing and
    Placement Activities
  • CS3 Monitoring and Status Reporting
  • CS4 Storage Resource Management
  • CS5 Reliable Replication Services
  • CS6 File Transfer Services
  • .
  • CS11 Grid-enabled Analysis
  • DOE MICS/HENP partnership
  • DB file/object-collection replication, caching,
    catalogs, end-to-end
  • Practical orientation: networks,
    instrumentation, monitoring

24
PPDG Focus and Foundations
  • MISSION FOCUS
  • Allow thousands of physicists to share data and
    computing resources, for scientific processing
    and analyses
  • TECHNICAL FOCUS: End-to-End Applications and
    Integrated Production Systems Using
  • Robust Data Replication
  • Intelligent Job Placement and Scheduling
  • Management of Storage Resources
  • Monitoring and Information Global Services
  • FOUNDATION: Cooperative Reliance On Other SciDAC
    Programs
  • Security: Uniform Authentication, Authorization
  • Reliable High Speed Data Transfer; Network
    Management
  • Common, Interoperable Middleware Services; De
    Facto Standards

25
CMS Productions and Computing Data Challenges
  • Already completed
  • 2000-01: Single site production challenges with
    up to 300 nodes
  • 5 Million events; Pileup for 10^34 luminosity
  • 2000-01: GRID-enabled prototypes demonstrated
  • 2001-02: Worldwide production infrastructure
  • 12 Regional Centers comprising 21 computing
    installations
  • Underway Now
  • Worldwide production 10 million events for DAQ
    TDR
  • 1000 CPUs in use
  • Production and Analysis at CERN and offsite
  • Being Scheduled
  • Single Site Production Challenges
  • Test code performance, computing perf.
    bottlenecks etc
  • Multi Site Production Challenges
  • Test Infrastructure, GRID prototypes, networks,
    replication
  • Single- and Multi-Site Analysis Challenges
  • Stress local and GRID prototypes under the quite
    different conditions of Analysis

26
MONARC Simulation System Validation
CMS Proto-Tier1 Production Farm at FNAL
CMS Farm at CERN
27
Grid-enabled Data Analysis: SC2001 Demo by K.
Holtman, J. Bunn (CMS/Caltech)
  • Demonstration of the use of Virtual Data
    technology for interactive CMS physics analysis
    at Supercomputing 2001, Denver (Nov 2001)
  • Interactive subsetting and analysis of 144,000
    CMS QCD events (105 GB)
  • Tier 4 workstation (Denver) gets data from two
    tier 2 servers (Caltech and San Diego)
  • Prototype tool showing feasibility of these CMS
    computing model concepts
  • Navigates from tag data to full event data
  • Transparently accesses 'virtual' objects through
    a Grid API
  • Reconstructs On-Demand (Virtual Data
    materialisation)
  • Integrates object persistency layer and grid
    layer
  • Peak throughput achieved: 29.1 MByte/s, 78%
    efficiency on 3 Fast Ethernet ports (see the
    arithmetic check below)
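As a quick arithmetic check of the figure just quoted (an added illustration, not part of the original slide): three Fast Ethernet ports provide 3 x 100 Mbit/s = 37.5 MByte/s of raw capacity, and 29.1 MByte/s is about 78% of that.

# Check the SC2001 demo efficiency figure: 29.1 MByte/s over
# 3 Fast Ethernet (100 Mbit/s) ports.
ports = 3
raw_mbit_per_s = ports * 100                 # 300 Mbit/s total
raw_mbyte_per_s = raw_mbit_per_s / 8.0       # 37.5 MByte/s
efficiency = 29.1 / raw_mbyte_per_s
print("raw capacity: %.1f MByte/s, efficiency: %.0f%%"
      % (raw_mbyte_per_s, 100 * efficiency))   # -> 37.5 MByte/s, 78%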

28
Grid-Related R&D Projects in CMS: Caltech, FNAL,
UCSD, UWisc, UFL (1)
  • Installation, Configuration and Deployment of
    Prototype Tier1 and Tier2 Centers at FNAL,
    Caltech/UCSD, Florida
  • Grid Data Management Pilot (GDMP) with EU
    DataGrid
  • Detailed CMS Grid Requirements Documents, CMS
    Notes 2001/037, 2001/047
  • Revised PPDG/GriPhyN Architecture
  • Large Scale Automated Distributed Simulation
    Production
  • DTF TeraGrid Prototype CIT, Wisconsin Condor,
    NCSA
  • Distributed (Automated) MOnte Carlo Production
    (MOP) FNAL
  • MONARC Distributed Systems Modeling:
    Simulation system applications to Grid Hierarchy
    management
  • Site configurations, analysis model, workload
  • Applications to strategy development, e.g.
    inter-site load balancing using a Self
    Organizing Neural Net (SONN)
  • Agent-based System Architecture for
    Distributed Dynamic Services

29
Grid-Related R&D Projects in CMS: Caltech, FNAL,
UCSD, UWisc, UFL (2)
  • Large Scale Data Query Optimization
  • Bit-Sliced TAGs for Data Exploration
  • Development of Prototypes for Object-Collection
    Extraction and Delivery (SC2001 COJAC)
  • Development of a Network-Efficient Interactive
    Remote Analysis Service (Java or C++ Clients,
    C++ Servers): CLARENS
  • Development of a Grid Enabled Analysis
    Environment
  • Development of security infrastructure for a
    Virtual Organization
  • Robust (Scalable, Fault Tolerant) Execution
    Service RES
  • High Throughput Network Developments
  • Network Monitoring Systems in the US and at
    CERN; Dual CPU AMD or P4 Servers with GigE
    Interfaces

30
Cal-Tier2 Prototype (contd.)
31
Cal-Tier2 Prototype Work Plan
  • R&D on distributed computing model
  • Tier2s have 1/3 of the organized resources
  • Contribute to R&D on Optimization of Site
    Facilities
  • Leverage Expertise at CACR and SDSC
  • Network and system expertise at Caltech and UCSD
  • Strategies for production processing and analysis
  • Delivery of CMS production milestones (PRS)
  • Support US-based physics analysis and some data
    distribution among CA universities: CIT,
    UCD, UCLA, UCR, UCSB, UCSD
  • Common Interests: EMU and e/Gamma (w/ Tracker)
  • Local Software and Systems Expertise

32
COJAC: CMS ORCA Java Analysis Component (Java3D,
Objectivity, JNI, Web Services)
Demonstrated Caltech-Rio de Janeiro (Feb.) and
Chile
33
Upcoming Grid Challenges: Global Secure Workflow
Management and Optimization
  • Workflow Management, Balancing Policy Versus
    Moment-to-moment Capability to Complete Tasks
  • Balance High Levels of Usage of Limited Resources
    Against Better Turnaround Times for Priority
    Jobs (a toy scoring sketch follows this list)
  • Goal-Oriented According to (Yet to be Developed)
    Metrics
  • Maintaining a Global View of Resources and System
    State
  • Global System Monitoring, Modeling, Realtime
    tracking and feedback on the Macro- and
    Micro-Scales
  • Realtime Error Detection, Redirection and
    Recovery
  • Global Distributed System Optimization
  • Adaptive Learning: new paradigms for execution
    optimization and Decision Support
  • New Mechanisms, New Metrics
  • User-Grid Interactions: the Grid-Enabled Analysis
    Environment
  • Guidelines, Agents
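As a purely illustrative sketch of the policy-versus-turnaround balancing named above (the metric and weights are assumptions, not a CMS or PPDG algorithm), the Python fragment below scores queued jobs by combining a policy priority with the time each job has waited, so that low-priority work is not starved on a saturated farm.

# Toy "policy vs. turnaround" balancing: each queued job gets a score
# mixing its policy priority with its accumulated wait time, and the
# highest-scoring job is dispatched next. Weights are illustrative.

POLICY_WEIGHT = 10.0     # how strongly policy priority dominates
AGING_WEIGHT = 0.1       # score gained per unit of waiting time

def score(job, now):
    return (POLICY_WEIGHT * job["priority"]
            + AGING_WEIGHT * (now - job["submitted"]))

def next_job(queue, now):
    # pick the job with the highest combined score
    return max(queue, key=lambda job: score(job, now))

queue = [
    {"name": "prod-ntuple",   "priority": 1, "submitted": 0},
    {"name": "user-analysis", "priority": 3, "submitted": 90},
    {"name": "calibration",   "priority": 2, "submitted": 40},
]
print(next_job(queue, now=100)["name"])   # priority wins unless a job has waited very long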

34
Agent-Based Distributed System JINI Prototype
(Caltech/Pakistan)
  • Includes Station Servers (static) that host
    mobile Dynamic Services
  • Servers are interconnected dynamically to form a
    fabric in which mobile agents travel, with a
    payload of physics analysis tasks
  • Prototype is highly flexible and robust against
    network outages
  • Amenable to deployment on leading edge and
    future portable devices (WAP, iAppliances, etc.)
  • The system for the travelling physicist
  • The Design and Studies with this prototype use
    the MONARC Simulator, and build on SONN
    studies. See http://home.cern.ch/clegrand/lia/
    (a minimal station/agent sketch follows below)
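As a loose illustration of the station-server / mobile-service pattern described in this slide (a minimal Python sketch, not the JINI prototype itself; the class and method names are invented), the fragment below shows stations registering with a lookup service and an agent carrying an analysis payload visiting each discovered station.

# Minimal sketch of the station-server / mobile-agent idea:
# stations register with a lookup service, and an agent carrying an
# analysis payload moves from station to station to run it. This is
# an illustration in Python, not the JINI-based prototype itself.

class LookupService:
    def __init__(self):
        self.stations = {}

    def register(self, station):
        self.stations[station.name] = station

    def discover(self):
        return list(self.stations.values())

class StationServer:
    def __init__(self, name, lookup):
        self.name = name
        lookup.register(self)          # announce this station to the fabric

    def run(self, payload):
        print(f"{self.name}: running {payload}")

class MobileAgent:
    def __init__(self, payload):
        self.payload = payload         # e.g. a physics analysis task

    def travel(self, lookup):
        for station in lookup.discover():
            station.run(self.payload)  # hop to each available station

lookup = LookupService()
for site in ("Caltech", "NUST", "CERN"):
    StationServer(site, lookup)
MobileAgent("tag-selection on QCD sample").travel(lookup)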

35
Globally Scalable Monitoring Service: CMS
(Caltech and Pakistan)

[Diagram: Farm Monitors register with Lookup Services (discovery
via a proxy); a Client or other service uses the RC Monitor
Service, which supports push and pull of data from existing
scripts via rsh/ssh and SNMP. Components include a Component
Factory, GUI marshaling, Code Transport, and RMI data access.]
36
US CMS Prototypes and Test-beds
  • Tier-1 and Tier-2 Prototypes and Test-beds
    operational
  • Facilities for event simulation, including
    reconstruction
  • Sophisticated processing for pile-up simulation
  • User cluster and hosting of data samples for
    physics studies
  • Facilities and Grid R&D

37
Application Architecture Interfacing to the Grid
  • (Physicists') Application Codes
  • Experiment's Software Framework Layer
  • Modular and Grid-aware: Architecture able to
    interact effectively with the lower layers
  • Grid Applications Layer (Parameters and
    algorithms that govern system operations)
  • Policy and priority metrics
  • Workflow evaluation metrics
  • Task-Site Coupling proximity metrics (a toy
    metric sketch follows this list)
  • Global End-to-End System Services Layer
  • Monitoring and Tracking of Component performance
  • Workflow monitoring and evaluation mechanisms
  • Error recovery and redirection mechanisms
  • System self-monitoring, evaluation and
    optimization mechanisms
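The "Task-Site Coupling proximity metric" above is only named on the slide; the sketch below is an invented, hedged example of what such a metric could look like (not a CMS definition): it scores candidate sites by combining local availability of the task's input data, free CPU capacity, and network round-trip time, with arbitrary weights.

# Invented illustration of a task-site coupling "proximity" metric:
# prefer sites that already hold the task's input data, have free
# CPUs, and are close (low RTT) to the remaining data. The weights
# and site numbers are placeholders, not CMS parameters.

def proximity(site, task):
    data_local = len(task["inputs"] & site["datasets"]) / len(task["inputs"])
    return (0.6 * data_local
            + 0.3 * site["free_cpu_fraction"]
            - 0.1 * site["rtt_ms_to_source"] / 200.0)

sites = [
    {"name": "CERN",    "datasets": {"qcd", "minbias"},
     "free_cpu_fraction": 0.2, "rtt_ms_to_source": 0},
    {"name": "Caltech", "datasets": {"qcd"},
     "free_cpu_fraction": 0.7, "rtt_ms_to_source": 150},
]
task = {"inputs": {"qcd", "minbias"}}
best = max(sites, key=lambda s: proximity(s, task))
print("send task to", best["name"])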

38
MONARC SONN: 3 Regional Centres Learning to
Export Jobs (Day 9)

[Simulation snapshot, Day 9 (by I. Legrand): CERN (30 CPUs),
CALTECH (25 CPUs) and NUST (20 CPUs) linked at 0.8-1.2 MB/s with
150-200 ms RTT; mean efficiencies <E> of 0.73, 0.83 and 0.66]
39
Focus on the Grid-Enabled Analysis Environment
(GAE)
  • Development of the Grid-enabled production
    environment is progressing, BUT
  • Most of the physicists' effort, and half of the
    resources, will be devoted to Analysis. So we
    need to focus on
  • The Grid-enabled Analysis Environment (GAE)
  • This is where the real Grid Challenges lie
  • Use by a large, diverse community; tasks with
    different technical demands and priorities;
    security challenges (scaling)
  • The problem of high resource usage versus
    reasonable turnaround time for tasks
  • Need to study, and generate guidelines for users,
    to get work done
  • Need to understand how/how much one can/should
    automate operations with Grid tools
  • This is where the keys to success or
    failure are
  • Where the physics gets done; where they live
  • Where the Grid E2E Services and Grid Apps.
    Layers get built

40
GriPhyN PetaScale Virtual Data Grids
[Architecture diagram: Production Teams, Individual Investigators
and Workgroups use Interactive User Tools, built on Virtual Data
Tools, Request Planning and Scheduling Tools, and Request Execution
and Management Tools; these rely on Resource Management Services,
Security and Policy Services, and Other Grid Services, which sit
above Transforms, Raw data sources, and distributed resources
(code, storage, computers, and network).]
41
PPDG Collaboratory Grid Pilot
  • In coordination with complementary projects in
    the US and Europe, this proposal is aimed at
    meeting the urgent needs for advanced
    Grid-enabled technology and strengthening the
    collaborative foundations of experimental
    particle and nuclear physics.
  • Our research and development will focus on
    the missing or less developed layers in the stack
    of Grid middleware and on issues of end-to-end
    integration and adaptation to local requirements.
    Each experiment has its own unique set of
    computing challenges, giving it a vital function
    as a laboratory for CS experimentation. At the
    same time, the wide generality of the needs of
    the physicists for effective distributed data
    access, processing, analysis and remote
    collaboration will ensure that the Grid
    technology that will be developed and/or
    validated by this proposed collaboratory pilot
    will be of more general use.

42
Launching the GAE: Recasting a Mainstream CMS
NTuple Analysis
  • Strategic Oversight and Direction: Harvey Newman
  • Security Infrastructure for a VO: Conrad
    Steenberg
  • Analysis Architecture and Grid integration: Koen
    Holtman
  • Ntuple → AOD RDBMS Conversion (JETMET and
    General ntuple): Eric Aslakson
  • Reproduction of PAW-based ntuple analysis on
    Tier2, timing measurements, identification of
    possible optimizations: Edwin Soedarmadji
  • ROOT version of PAW-based ntuple analysis,
    reproduction of results, timing measurements,
    data access via CLARENS server: Conrad
    Steenberg
  • RDBMS population with ntuple data: Julian Bunn
    (SQLServer), Saima Iqbal (Oracle 9i), Eric
    Aslakson (Tools), Edwin Soedarmadji
    (Optimisations/stored procedures)
  • Monitoring, Simulation, Optimization: Legrand
    (PK, RO Groups)
  • Interactive Environment and Object-Collection
    Prototypes: All

43
Additional Slides on CMS Work Related to Grids
  • Some Extra Slides On CMS Work on Grid-Related
    Activities, and Associated Issues Follow

44
Computing Challenges: Petabytes, Petaflops,
Global VOs
  • Geographical dispersion of people and resources
  • Complexity: the detector and the LHC environment
  • Scale: Tens of Petabytes per year of data

5000 Physicists, 250 Institutes, 60
Countries

Major challenges associated with:
Communication and collaboration at a distance;
Managing globally distributed computing and data resources;
Cooperative software development and physics analysis;
New Forms of Distributed Systems: Data Grids
45
Links Required to US Labs and Transatlantic

Maximum Link Occupancy 50% Assumed.
OC3 = 155 Mbps; OC12 = 622 Mbps; OC48 = 2.5 Gbps; OC192 = 10 Gbps
(a usable-bandwidth illustration follows below)
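To make the 50% occupancy assumption concrete (an added illustration, not from the slide), the snippet below converts each OC level to the usable throughput implied by a 50% maximum link occupancy.

# Usable throughput at the assumed 50% maximum link occupancy.
oc_levels_mbps = {"OC3": 155, "OC12": 622, "OC48": 2500, "OC192": 10000}
MAX_OCCUPANCY = 0.5

for name, mbps in oc_levels_mbps.items():
    print(f"{name}: {mbps} Mbps installed -> "
          f"{mbps * MAX_OCCUPANCY:.0f} Mbps usable")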
46
LHC Grid Computing Project: DRAFT High Level
Milestones

[Milestone chart (dates not shown in the transcript).
Applications: Prototype of Hybrid Event Store (Persistency
Framework); Hybrid Event Store available for general users;
Distributed production using grid services; Full Persistency
Framework; Distributed end-user interactive analysis; LHC Global
Grid TDR.
Grid Development: First Global Grid Service (LCG-1) available;
LCG-1 reliability and performance targets; 50% prototype (LCG-3)
available.]
47
GriPhyN There are Many Interfaces to the Grid
  • Views of the Grid (By Various Actors)
  • From Inside a Running Program
  • By Users Running Interactive Sessions
  • From Agent (armies) Gathering, Disseminating
    Information
  • From Operations Console(s) Supervisory
    Processes/Agents
  • From Workflow Monitors Event-Handlers
  • From Grid Software Developers/Debuggers
  • From Grid- and Grid-Site Administrators
  • Nature of Queries to the Grid
  • From Running Processes (e.g. via DB Systems)
  • By Users Via a DB System, and/or a Query Language
  • By Grid-Instrumentation Processes, for Grid
    Operations
  • By Agents Gathering Information

48
CMS Interfaces to the Grid
  • The CMS Data Grid, from the user's point of view,
    is a way to deliver processed results
  • Object Collections (REC, AOD, DPDs, etc.)
  • From the Grid's own view it is a system to
    monitor, track, marshal and co-schedule
    resources (CPU, storage, networks)
  • to deliver results and information to Users,
    users' batch jobs, managing processes (agents),
    and Grid operators and developers
  • The early Grid Architecture is file-based
  • Identifying (naming) object collections, and
    extracting them to build a file list for
    transport, are a CMS responsibility (a minimal
    mapping sketch follows this list)
  • We will have to learn how to deal with the Grid
    replica catalogs for data/metadata
  • Grid Tools will be general, not too elaborate;
    interfacing these Tools to CARF and the ODBMS
    will be a CMS Task
  • Additional tools will be needed for data
    browsing, Grid progress and state tracking, and
    perhaps some work scheduling
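As a hedged illustration of the file-based responsibility noted above (an invented sketch; the collection names, catalog and mapping are placeholders, not CMS or Grid replica-catalog interfaces), the fragment below expands named object collections into the list of files a file-based transport layer would move.

# Invented sketch: map named object collections onto the list of
# files a file-based Grid layer would transport. The catalog
# contents and naming scheme are placeholders, not CMS APIs.

collection_catalog = {
    "jetmet/AOD/2002A": ["jetmet_aod_001.db", "jetmet_aod_002.db"],
    "egamma/TAG/2002A": ["egamma_tag_001.db"],
}

def files_for_collections(collection_names):
    """Expand object-collection names into a de-duplicated file list."""
    files = []
    for name in collection_names:
        for f in collection_catalog[name]:
            if f not in files:
                files.append(f)
    return files

request = ["jetmet/AOD/2002A", "egamma/TAG/2002A"]
print("files to transfer:", files_for_collections(request))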

49
PPDG Deliverable Set 1
  • Distributed Production File Service (CS-5, CS-6,
    CS-2)
  • Based on GDMP and Globus tools; builds on
    existing CS collaborations with the Globus,
    Condor and LBL Storage Management teams.
  • CMS requirements include
  • Global namespace definition for files
  • Specification language and user interface to
    specify a set of jobs, job parameters, input
    datasets, job flow and dataset disposition
    (a hypothetical example follows this list)
  • Pre-staging and caching of files; resource
    reservation (storage, CPU, perhaps network)
  • HRM integration with HPSS, Enstore and Castor,
    using GDMP
  • Globus information infrastructure, metadata
    catalog, digital certificates; PKI adaptation
    to security at HENP labs
  • System language to describe distributed system
    assets and capabilities, match them to requests,
    and define priorities and policies
  • Monitoring tools and displays to locate
    datasets, track task progress, data flows, and
    estimated time to task completion; display site
    facilities' state (utilization, queues,
    processes) using SNMP
  • ...
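As a hedged illustration of the "specification language ... to specify a set of jobs, job parameters, input datasets, job flow and dataset disposition" requirement above (a purely hypothetical format sketched in Python, not a PPDG, Condor or CMS specification), the fragment below shows what such a request might carry and how a simple job-flow order could be derived from it.

# Hypothetical job-set specification illustrating the CMS requirement:
# jobs, parameters, input datasets, job flow, and dataset disposition.
# This is not a PPDG, Condor, or CMS format; every field is invented.

job_set = {
    "name": "jetmet-reprocessing",
    "jobs": [
        {"id": "reco",   "executable": "orca_reco",   "params": {"events": 100000}},
        {"id": "ntuple", "executable": "make_ntuple", "params": {"selection": "2jets"}},
    ],
    "input_datasets": ["jetmet/RAW/2002A"],
    "job_flow": [("reco", "ntuple")],          # ntuple runs after reco
    "disposition": {"ntuple-output": "replicate to Tier2, keep 90 days"},
}

def execution_order(spec):
    """Very small ordering helper for the linear flow above."""
    after = {b: a for a, b in spec["job_flow"]}
    done, order = set(), []
    while len(order) < len(spec["jobs"]):
        for job in spec["jobs"]:
            jid = job["id"]
            if jid not in done and after.get(jid) in done | {None}:
                order.append(jid)
                done.add(jid)
    return order

print(execution_order(job_set))   # -> ['reco', 'ntuple']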

50
PPDG Deliverable Set 2
  • 2. Distributed Interactive Analysis Service
    (CS-1, CS-2)
  • CMS requirements include
  • User interface for locating, accessing,
    processing and/or delivering (APD) files for
    analysis
  • Tools to display system availability, quotas,
    and priorities to the user
  • Monitoring tools for cost estimation, tracking
    APD of files, and request redirection
  • Display file replicas, properties, estimated
    time for access or delivery
  • Integration into the CMS distributed
    interactive user analysis environment (UAE)

51
PPDG Deliverable Set 3
  • 3. Object-Collection Access: Extensions to
    CS-1, CS-2
  • CMS requirements include
  • Object-Collection extraction, transport and
    delivery
  • Integration with ODBMS
  • Metadata catalog concurrent support