Transcript and Presenter's Notes

Title: U.S. ATLAS Computing Facilities


1
U.S. ATLAS Computing Facilities
  • U.S. ATLAS Physics Computing Review
  • Bruce G. Gibbard, BNL
  • 10-11 January 2000

2
US ATLAS Computing Facilities
  • Facilities procured, installed and operated
  • to meet US MOU Obligations
  • Direct IT responsibility (Monte Carlo, for
    example)
  • Support for detector construction, testing,
    and calibration
  • Support for software development and testing
  • to enable effective participation by US
    physicists in the ATLAS physics program
  • Direct access to and analysis of physics data
    sets
  • Support simulation, re-reconstruction, and
    reorganization of data associated with that
    analysis

3
Setting the Scale
  • Uncertainties in Defining Requirements
  • Five years of detector and algorithm software
    development
  • Five years of computer technology evolution
  • Start from ATLAS estimate rules of thumb
  • Adjust for US ATLAS perspective (experience and
    priorities)
  • Adjust for details of architectural model of US
    ATLAS facilities

4
ATLAS Estimate Rules of Thumb
  • Tier 1 Center in '05 should include ...
  • 30,000 SPECint95 for Analysis
  • 10,000-20,000 SPECint95 for Simulation
  • 50-100 TBytes/year of On-line (Disk) Storage
  • 200 TBytes/year of Near-line (Robotic Tape)
    Storage
  • 100 Mbit/sec connectivity to CERN
  • Assume no major raw data processing or handling
    outside of CERN

5
US ATLAS Perspective
  • US ATLAS facilities must be adequate to meet any
    reasonable U.S. ATLAS computing needs (U.S. role
    in ATLAS should not be constrained by a computing
    shortfall, rather the U.S. role should be
    enhanced by computing strength)
  • Store and re-reconstruct 10-30% of events
  • Take high end of simulation capacity range
  • Take high end of disk capacity range
  • Augment analysis capacity
  • Augment CERN link bandwidth

6
Adjusted For US ATLAS Perspective
  • US ATLAS Tier 1 Center in '05 should include ... (see the sketch after this list)
  • 10,000 SPECint95 for Re-reconstruction
  • 50,000 SPECint95 for Analysis
  • 20,000 SPECint95 for Simulation
  • 100 TBytes/year of On-line (Disk) Storage
  • 300 TBytes/year of Near-line (Robotic Tape)
    Storage
  • Dedicated OC12 (622 Mbit/sec) to CERN
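
The adjustments above can be written out explicitly. Below is a minimal sketch (Python; not from the original slides) that starts from the ATLAS rules of thumb on slide 4 and applies the slide-5 choices: take the high end of quoted ranges, add re-reconstruction capacity, and augment analysis, tape, and the CERN link to the values quoted above.

  # Sketch only: baseline numbers are the ATLAS rules of thumb (slide 4);
  # the other figures are the US ATLAS Tier 1 values quoted above.
  atlas_baseline = {
      "analysis_SI95":   30_000,             # SPECint95
      "simulation_SI95": (10_000, 20_000),   # quoted range
      "disk_TB_per_yr":  (50, 100),          # quoted range
      "tape_TB_per_yr":  200,
      "cern_link_Mbps":  100,
  }

  us_atlas_tier1 = {
      "rereco_SI95":     10_000,  # added: store and re-reconstruct 10-30% of events
      "analysis_SI95":   50_000,  # augmented beyond the 30,000 baseline
      "simulation_SI95": max(atlas_baseline["simulation_SI95"]),  # high end of range
      "disk_TB_per_yr":  max(atlas_baseline["disk_TB_per_yr"]),   # high end of range
      "tape_TB_per_yr":  300,     # augmented beyond the 200 baseline
      "cern_link_Mbps":  622,     # dedicated OC12 in place of 100 Mbit/sec
  }
  print(us_atlas_tier1)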

7
Architectural Model
  • Consists of transparent, hierarchically
    distributed, GRID-connected computing resources
    (relative scales are sketched after this list)
  • Primary ATLAS Computing Centre at CERN
  • US ATLAS Tier 1 Computing Center at BNL
  • National in scope, at 20% of CERN
  • US ATLAS Tier 2 Computing Centers
  • Six, each regional in scope, at 20% of Tier 1
  • Likely one of them at CERN
  • US ATLAS Institutional Computing Facilities
  • Local LAN in scope, not project supported
  • US ATLAS Individual Desk Top Systems
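
Taken at face value, the relative scales above (Tier 1 at about 20% of CERN, each of six Tier 2 centers at about 20% of Tier 1) imply an aggregate US capacity of roughly 44% of CERN's. A minimal sketch of that arithmetic (Python; assumes the six Tier 2 centers are identical):

  # Illustrative only: aggregate US scale implied by the percentages above.
  tier1_vs_cern  = 0.20   # Tier 1 ~ 20% of CERN
  tier2_vs_tier1 = 0.20   # each Tier 2 ~ 20% of Tier 1
  n_tier2        = 6

  us_total_vs_cern = tier1_vs_cern * (1 + n_tier2 * tier2_vs_tier1)
  print(f"Aggregate US capacity ~ {us_total_vs_cern:.0%} of CERN")  # ~44%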

8
Schematic of Model
9
Distributed Model
  • Rationale (benefits)
  • Improved user access to computing resources
  • Local geographic travel
  • Higher performance regional networks
  • Enable local autonomy
  • Less widely shared
  • More locally managed resources
  • Increased capacities
  • Encourage integration of other equipment and
    expertise
  • Institutional, base program
  • Additional funding options
  • Com Sci, NSF

10
Distributed Model
  • But increased vulnerability (risk)
  • Increased dependence on network
  • Increased dependence on GRID infrastructure R&D
  • Increased dependence on facility modeling tools
  • More complex management
  • Risk / benefit analysis must yield positive result

11
Adjusted For Architectural Model
  • US ATLAS facilities in '05 should include ...
  • 10,000 SPECint95 for Re-reconstruction
  • 85,000 SPECint95 for Analysis
  • 35,000 SPECint95 for Simulation
  • 190 TBytes/year of On-line (Disk) Storage
  • 300 TBytes/year of Near-line (Robotic Tape)
    Storage
  • Dedicated OC12 (622 Mbit/sec) Tier 1
    connectivity to each Tier 2
  • Dedicated OC12 (622 Mbit/sec) to CERN

12
GRID Infrastructure
  • GRID infrastructure software must supply
  • Efficiency (optimizing hardware use)
  • Transparency (optimizing user effectiveness)
  • Projects
  • PPDG: Distributed data services (later talk by
    D. Malon)
  • APOGEE: Complete GRID infrastructure including
    distributed resource management, modeling,
    instrumentation, etc.
  • GriPhyN: Staged development toward delivery of a
    production system
  • The alternative to success with these projects is
    a difficult-to-use and/or inefficient overall
    system
  • U.S. ATLAS involvement includes ANL and LBNL

13
Facility Modeling
  • Performance of a complex distributed system is
    difficult, but necessary, to predict
  • MONARC - LHC-centered project
  • Provide toolset for modeling such systems (see
    the toy sketch after this list)
  • Develop guidelines for designing such systems
  • Currently capable of relevant analyses
  • U.S. ATLAS Involvement
  • Later talk by K. Sliwa
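
MONARC itself provides a full simulation toolset; purely to illustrate the kind of question such modeling answers (this sketch is not from the slides and is not MONARC code), the toy model below estimates whether analysis throughput at a regional center is limited by CPU capacity or by the WAN link that delivers its input data. All job parameters are hypothetical.

  # Toy capacity model (illustration only; not the MONARC toolset).
  # Each job needs cpu_si95_sec of CPU (SPECint95-seconds) and pulls
  # input_mb of input data over the WAN from the Tier 1.
  def jobs_per_day(cpu_capacity_si95, wan_mbps, cpu_si95_sec, input_mb):
      cpu_limited = cpu_capacity_si95 * 86_400 / cpu_si95_sec
      wan_limited = (wan_mbps / 8) * 86_400 / input_mb   # Mbit/s -> MByte/s
      return min(cpu_limited, wan_limited)

  # Hypothetical numbers: a Tier 2 with ~10,000 SPECint95 behind a
  # dedicated OC12, running jobs of 500 SPECint95-seconds on 100 MB each.
  print(f"{jobs_per_day(10_000, 622, 500, 100):,.0f} jobs/day")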

14
Components of Model Tier 1
  • Full Function Facility
  • Dedicated Connectivity to CERN
  • Primary Site for Storage/Serving
  • Cache/Replicate CERN data needed by US ATLAS
  • Archive and serve over the WAN all data of
    interest to US ATLAS
  • Computation
  • Primary Site for Re-reconstruction (perhaps only
    site)
  • Major Site for Simulation and Analysis (2 x
    Tier 2)
  • Repository of Technical Expertise and Support
  • Hardware, OSs, utilities, and other standard
    elements of U.S. ATLAS
  • Network, AFS, GRID, other infrastructure
    elements of WAN model

15
Components of Model Tier 2
  • Limit personnel and maintenance support costs
  • Focused Function Facility
  • Excellent connectivity to Tier 1 (Network and GRID)
  • Tertiary storage via Network at Tier 1 (none
    local)
  • Primary Analysis site for its region
  • Major Simulation capabilities
  • Major online storage cache for its region
  • Leverage local expertise and other resources
  • Part of site selection criteria, 1 FTE
    contributed, for example

16
Technology Trends Choices
  • CPU
  • Range: Commodity processors -> SMP servers
  • Factor of 2 decrease in price/performance every
    1.5 years
  • Disk
  • Range: Commodity disk -> RAID disk
  • Factor of 2 decrease in price/performance every
    1.5 years
  • Tape Storage
  • Range: Desktop storage -> High-end storage
  • Factor of 2 decrease in price/performance every
    1.5 - 2 years

17
Price/Performance Evolution
(Chart: price/performance trends as of Dec 1996, from a Harvey Newman presentation, Third LCB Workshop, Marseilles, Sept. 1999)
18
Technology Trends Choices
  • For Costing Purposes
  • Start with familiar, established technologies
  • Project by observed exponential slopes (see the
    sketch below)
  • This is a Conservative Approach
  • There are no known near-term show stoppers to
    these established technologies
  • A new technology would have to be more cost
    effective to supplant projection of an
    established technology
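
The projection above reduces to a single formula: unit cost falls by a factor of 2 every halving period. A minimal sketch (Python; the starting unit price is a placeholder assumption, not a number from the slides):

  # Project unit cost forward assuming price/performance improves by a
  # factor of 2 every `halving_years` years (slides 16 and 18).
  def projected_cost(cost_today, years_ahead, halving_years=1.5):
      return cost_today * 0.5 ** (years_ahead / halving_years)

  # Example: a placeholder cost of 100 per unit of capacity today falls
  # to roughly 10 after 5 years at a 1.5-year halving time.
  print(projected_cost(100.0, years_ahead=5))   # ~9.9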

19
Technology Trends Choices
  • CPU Intensive processing
  • Farms of commodity processors - Intel/Linux
  • I/O Intensive Processing and Serving
  • Mid-scale SMPs (SUN, IBM, etc.)
  • Online Storage (Disk)
  • Fibre Channel Connected RAID
  • Nearline Storage (Robotic Tape System)
  • STK / 9840 / HPSS
  • LAN
  • Gigabit Ethernet

20
Composition of Tier 1
  • Commodity processor farms (Intel/Linux)
  • Mid-scale SMP servers (SUN)
  • Fibre Channel connected RAID disk
  • Robotic tape / HSM system (STK / HPSS)

21
Current Tier 1 Status
  • U.S. ATLAS Tier 1 facility is currently operating
    as a small (5%) adjunct to the RHIC Computing
    Facility (RCF)
  • Deployment includes
  • Intel/Linux farms (28 CPUs)
  • Sun E450 server (2 CPUs)
  • 200 GBytes of Fibre Channel RAID disk
  • Intel/Linux web server
  • Archiving via low priority HPSS Class of Service
  • Shared use of an AFS server (10 GBytes)

22
Current Tier 1 Status
  • These RCF-chosen platforms/technologies are
    common to ATLAS
  • Allows wide range of services with only 1 FTE of
    sys admin contributed (plus US ATLAS librarian)
  • Significant divergence of direction between US
    ATLAS and RHIC has been allowed for
  • Complete divergence (extremely unlikely) would
    exceed current staffing estimates

23
(No Transcript)
24
RAID Disk Subsystem
25
Intel/Linux Processor Farm
26
Intel/Linux Nodes
27
Composition of Tier 2 (Initial One)
  • Commodity processor farms (Intel/Linux)
  • Mid-scale SMP servers
  • Fibre Channel connected RAID disk

28
Staff Estimate (In Pseudo Detail)
29
Time Evolution of Facilities
  • Tier 1 functioning as early prototype
  • Ramp up to meet needs and validate design
  • Assume 2 years for a Tier 2 to become fully established
  • Initiate first Tier 2 in 2001
  • True Tier 2 prototype
  • Demonstrate Tier 1 - Tier 2 interaction
  • Second Tier 2 initiated in 2002 (CERN?)
  • Four remaining initiated in 2003
  • Fully operational by 2005
  • Six are to be identical (CERN exception?)

30
Staff Evolution
31
Network
  • Tier 1 connectivity to CERN and to Tier 2s is
    critical
  • Must be guaranteed and allocable (dedicated and
    differentiated)
  • Must be adequate (triage of functions is
    disruptive)
  • Should grow with need; OC12 should be practical
    by 2005, when serious data will flow (see the
    check below)
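
For a sense of scale (an illustrative check, not a figure from the slides): a dedicated OC12 running at full utilization moves roughly 2,450 TBytes per year, comfortably above the 300 TBytes/year of near-line storage growth quoted earlier. A quick sketch:

  # Rough OC12 throughput check (assumes 100% utilization and ignores
  # protocol overhead).
  oc12_mbps = 622
  bytes_per_year = (oc12_mbps / 8) * 1e6 * 3600 * 24 * 365
  print(f"{bytes_per_year / 1e12:,.0f} TBytes/year")   # ~2,452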

32
WAN Configurations and Cost (FY 2000 k$)
33
Annual Equipment Costs for Tier 1 Center (FY 2000 k$)
34
Annual Equipment Costs for Tier 2 Center (FY 2000 k$)
35
Integrated Facility Capacities by Year
36
US ATLAS Facilities Annual Costs (FY 2000 k$)
37
Major Milestones