Grid Computing at LHC and ATLAS Data Challenges

Transcript and Presenter's Notes


1
Grid Computing at LHC and ATLAS Data Challenges
  • IMFP-2006
  • El Escorial, Madrid, Spain.
  • April 4, 2006
  • Gilbert Poulard (CERN PH-ATC)

2
Overview
  • Introduction
  • LHC experiments Computing challenges
  • WLCG Worldwide LHC Computing Grid
  • ATLAS experiment
  • Building the Computing System
  • Conclusions

3
Introduction: LHC / CERN
[Aerial photo of the LHC ring near Geneva, with Mont Blanc (4810 m) in the background]
4
LHC Computing Challenges
  • Large distributed community
  • Large data volume, with access to it for everyone
  • Large CPU capacity

5
Challenge 1: Large, distributed community
Offline software effort: 1000 person-years per experiment
Software life span: 20 years
6
Large data volume
Experiment   Rate (Hz)   RAW (MB)   ESD/rDST/RECO (MB)   AOD (kB)   Monte Carlo (MB/evt)   Monte Carlo (% of real)
ALICE HI        100        12.5          2.5                250           300                    100
ALICE pp        100         1            0.04                 4             0.4                  100
ATLAS           200         1.6          0.5                100             2                     20
CMS             150         1.5          0.25                50             2                    100
LHCb           2000         0.025        0.025                                                    20
50 days running in 2007
10^7 seconds/year pp from 2008 on -> 2 x 10^9 events/experiment
10^6 seconds/year heavy ion
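As a rough cross-check of the numbers above, the yearly RAW data volume for pp running follows from trigger rate x event size x live time (10^7 s/year). A minimal Python sketch using the pp values from the table:

```python
# Back-of-envelope check: yearly RAW volume = trigger rate x event size x live seconds.
# Numbers are taken from the table above (pp running, 10^7 seconds/year).

SECONDS_PER_YEAR_PP = 1e7  # effective live time quoted on the slide

experiments = {
    # name: (rate in Hz, RAW event size in MB)
    "ALICE pp": (100, 1.0),
    "ATLAS":    (200, 1.6),
    "CMS":      (150, 1.5),
    "LHCb":     (2000, 0.025),
}

for name, (rate_hz, raw_mb) in experiments.items():
    volume_pb = rate_hz * raw_mb * SECONDS_PER_YEAR_PP / 1e9  # MB -> PB
    print(f"{name:8s}: {rate_hz * SECONDS_PER_YEAR_PP:.1e} events/year, "
          f"~{volume_pb:.1f} PB RAW/year")
```

For ATLAS this gives 2 x 10^9 events and ~3.2 PB of RAW data per year, consistent with the figures quoted on the next slide.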
7
Large CPU capacity
  • ATLAS resources in 2008
  • Assume 2 x 10^9 events per year (1.6 MB per event)
  • First-pass reconstruction will run at the CERN
    Tier-0
  • Re-processing will be done at Tier-1s (Regional
    Computing Centres) (~10)
  • Monte Carlo simulation will be done at Tier-2s
    (e.g. physics institutes) (~30)
  • Full simulation of 20% of the data rate
  • Analysis will be done at Analysis Facilities,
    Tier-2s and Tier-3s

CPU (MSi2k) Disk (PB) Tape (PB)
Tier-0 4.1 0.4 5.7
CERN Analysis Facility 2.7 1.9 0.5
Sum of Tier-1s 24.0 14.4 9.0
Sum of Tier-2s 19.9 8.7 0.0
Total 50.7 25.4 15.2
~50,000 of today's CPUs
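To see where that figure comes from: a typical 2004-era PC is rated at roughly 1 kSI2k (the computing-model slide later quotes "PC (2004) ~ 1 kSpecInt2k"), so the 50.7 MSI2k total translates into about 50,000 such machines. A minimal check:

```python
# The total 2008 CPU requirement is quoted in MSI2k (millions of SPECint2000).
# Assuming one contemporary CPU delivers about 1 kSI2k gives the headline number.

total_msi2k = 50.7   # ATLAS total CPU requirement for 2008 (from the table above)
cpu_ksi2k = 1.0      # assumed rating of one 2004-era CPU, in kSI2k

n_cpus = total_msi2k * 1000 / cpu_ksi2k
print(f"~{n_cpus:,.0f} CPUs")   # ~50,700 CPUs
```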
8
CPU Requirements
[Pie chart: 2008 CPU requirements split between CERN, the Tier-1s and the Tier-2s; 58% of the requirement is pledged]
9
Disk Requirements
[Pie chart: 2008 disk requirements split between CERN, the Tier-1s and the Tier-2s; 54% of the requirement is pledged]
10
Tape Requirements
[Pie chart: 2008 tape requirements split between CERN and the Tier-1s; 75% of the requirement is pledged]
11
LHC Computing Challenges
  • Large distributed community
  • Large data volume, with access to it for everyone
  • Large CPU capacity
  • How to face the problems?
  • CERN Computing Review (2000-2001)
  • Grid is the chosen solution
  • Build the LCG (LHC Computing Grid) project
  • Roadmap for the LCG project
  • And for experiments
  • In 2005 LCG became WLCG

12
What is the Grid?
  • The World Wide Web provides seamless access to
    information that is stored in many millions of
    different geographical locations.
  • The Grid is an emerging infrastructure that
    provides seamless access to computing power and
    data storage capacity distributed over the globe.
  • Global Resource Sharing
  • Secure Access
  • Resource Use Optimization
  • The Death of Distance - networking
  • Open Standards

13
The Worldwide LHC Computing Grid Project - WLCG
  • Collaboration
  • LHC Experiments
  • Grid projects in Europe and the US
  • Regional and national centres
  • Choices
  • Adopt Grid technology.
  • Go for a Tier hierarchy
  • Goal
  • Prepare and deploy the computing environment to
    help the experiments analyse the data from the
    LHC detectors.

14
The Worldwide LCG Collaboration
  • Members
  • The experiments
  • The computing centres Tier-0, Tier-1,
    Tier-2
  • Memorandum of understanding
  • Resources, services, defined service levels
  • Resource commitments pledged for the next year,
    with a 5-year forward look

15
WLCG services are built on two major science grid infrastructures:
EGEE - Enabling Grids for E-SciencE
OSG - US Open Science Grid
16
Enabling Grids for E-SciencE
  • EU supported project
  • Develop and operate a multi-science grid
  • Assist scientific communities to embrace grid
    technology
  • First phase concentrated on operations and
    technology
  • Second phase (2006-08): emphasis on extending
    the scientific, geographical and industrial scope
  • world-wide Grid infrastructure
  • international collaboration
  • in phase 2 will have > 90 partners in
    32 countries

17
Open Science Grid
  • Multi-disciplinary Consortium
  • Running physics experiments CDF, D0, LIGO, SDSS,
    STAR
  • US LHC Collaborations
  • Biology, Computational Chemistry
  • Computer Science research
  • Condor and Globus
  • DOE Laboratory Computing Divisions
  • University IT Facilities
  • OSG today
  • 50 Compute Elements
  • 6 Storage Elements
  • VDT 1.3.9
  • 23 VOs

18
Architecture Grid services
  • Storage Element
  • Mass Storage System (MSS) (CASTOR, Enstore, HPSS,
    dCache, etc.)
  • Storage Resource Manager (SRM) provides a common
    way to access MSS, independent of implementation
  • File Transfer Services (FTS) provided e.g. by
    GridFTP or srmCopy
  • Computing Element
  • Interface to local batch system e.g. Globus
    gatekeeper.
  • Accounting, status query, job monitoring
  • Virtual Organization Management
  • Virtual Organization Management Services (VOMS)
  • Authentication and authorization based on VOMS
    model.
  • Grid Catalogue Services
  • Mapping of Globally Unique Identifiers (GUIDs) to
    local file names
  • Hierarchical namespace, access control
  • Interoperability
  • EGEE and OSG both use the Virtual Data Toolkit
    (VDT)
  • Different implementations are hidden by common
    interfaces
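To illustrate the point above that different mass-storage implementations are hidden behind common interfaces, here is a minimal, hypothetical Python sketch; the class and method names are illustrative only, not the actual SRM or VDT APIs:

```python
from abc import ABC, abstractmethod

class StorageElement(ABC):
    """Common interface that experiment code could program against,
    regardless of which mass storage system sits behind it."""

    @abstractmethod
    def copy_in(self, local_path: str, grid_path: str) -> None: ...

    @abstractmethod
    def copy_out(self, grid_path: str, local_path: str) -> None: ...

class CastorSE(StorageElement):
    def copy_in(self, local_path, grid_path):
        print(f"castor: staging {local_path} -> {grid_path}")
    def copy_out(self, grid_path, local_path):
        print(f"castor: recalling {grid_path} -> {local_path}")

class DCacheSE(StorageElement):
    def copy_in(self, local_path, grid_path):
        print(f"dcache: writing {local_path} -> {grid_path}")
    def copy_out(self, grid_path, local_path):
        print(f"dcache: reading {grid_path} -> {local_path}")

def archive(se: StorageElement, files: list[str]) -> None:
    # Client code is unchanged whether the site runs CASTOR, dCache, Enstore, ...
    for f in files:
        se.copy_in(f, f"/grid/atlas/{f}")

archive(CastorSE(), ["run1.raw"])
archive(DCacheSE(), ["run2.raw"])
```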

19
Technology - Middleware
  • Currently, the LCG-2 middleware is deployed in
    more than 100 sites
  • It originated from Condor, EDG, Globus, VDT, and
    other projects.
  • It will now evolve to include functionality of
    the gLite middleware provided by the EGEE
    project, which has just been made available.
  • Site services include security, the Computing
    Element (CE), the Storage Element (SE), and
    Monitoring and Accounting Services, currently
    available both from LCG-2 and gLite.
  • VO services such as Workload Management System
    (WMS), File Catalogues, Information Services,
    File Transfer Services exist in both flavours
    (LCG-2 and gLite) maintaining close relations
    with VDT, Condor and Globus.

20
Technology Fabric Technology
  • Moore's law still holds for processors and disk
    storage
  • For CPU and disks we count a lot on the evolution
    of the consumer market
  • For processors we expect an increasing importance
    of 64-bit architectures and multicore chips
  • Mass storage (tapes and robots) is still a
    computer centre item with computer centre pricing
  • It is too early to conclude on new tape drives
    and robots
  • Networking has seen a rapid evolution recently
  • Ten-gigabit Ethernet is now in the production
    environment
  • Wide-area networking can already now count on 10
    Gb connections between Tier-0 and Tier-1s. This
    will move gradually to the Tier-1 Tier-2
    connections.

21
Common Physics Applications
  • Core software libraries
  • SEAL-ROOT merger
  • Scripting: CINT, Python
  • Mathematical libraries
  • Fitting: MINUIT (in C++)
  • Data management
  • POOL: ROOT I/O for bulk data, RDBMS for metadata
  • Conditions database: COOL
  • Event simulation
  • Event generators: generator library (GENSER)
  • Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
  • Physics validation: compare GEANT4, FLUKA, test
    beam data
  • Software development infrastructure
  • External libraries
  • Software development and documentation tools
  • Quality assurance and testing
  • Project portal Savannah

22
The Hierarchical Model
  • Tier-0 at CERN
  • Record RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS)
  • Distribute second copy to Tier-1s
  • Calibrate and do first-pass reconstruction
  • Tier-1 centres (11 defined)
  • Manage permanent storage RAW, simulated,
    processed
  • Capacity for reprocessing, bulk analysis
  • Tier-2 centres (> 100 identified)
  • Monte Carlo event simulation
  • End-user analysis
  • Tier-3
  • Facilities at universities and laboratories
  • Access to data and processing in Tier-2s, Tier-1s
  • Outside the scope of the project

23
Tier-1s
Tier-1 Centre                        Experiments served with priority
                                     ALICE  ATLAS  CMS  LHCb
TRIUMF, Canada                              X
GridKA, Germany                        X    X      X    X
CC-IN2P3, France                       X    X      X    X
CNAF, Italy                            X    X      X    X
SARA/NIKHEF, NL                        X    X           X
Nordic Data Grid Facility (NDGF)       X    X      X
ASCC, Taipei                                X      X
RAL, UK                                X    X      X    X
BNL, US                                     X
FNAL, US                                           X
PIC, Spain                                  X      X    X
24
Tier-2s
~100 identified; the number is still growing
25
Tier-0 -1 -2 Connectivity
National Research Networks (NRENs) at the Tier-1s:
ASnet, LHCnet/ESnet, GARR, LHCnet/ESnet, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE
26
Prototypes
  • It is important that the hardware and software
    systems developed in the framework of LCG be
    exercised in more and more demanding challenges
  • Data Challenges were recommended by the Hoffmann
    Review of 2001. Though their main goal was to
    validate the distributed computing model and to
    gradually build up the computing systems, the
    results have also been used for physics
    performance studies and for detector, trigger and
    DAQ design. Limitations of the Grids have been
    identified and are being addressed.
  • A series of Data Challenges has been run by the 4
    experiments.
  • Presently, a series of Service Challenges aims at
    realistic end-to-end testing of experiment
    use-cases over extended periods, leading to
    stable production services.
  • The project A Realisation of Distributed
    Analysis for LHC (ARDA) is developing end-to-end
    prototypes of distributed analysis systems using
    the EGEE middleware gLite for each of the LHC
    experiments.

27
Service Challenges
  • Purpose
  • Understand what it takes to operate a real grid
    service run for days/weeks at a time (not just
    limited to experiment Data Challenges)
  • Trigger and verify Tier-1 and large Tier-2
    planning and deployment - tested with realistic
    usage patterns
  • Get the essential grid services ramped up to
    target levels of reliability, availability,
    scalability, end-to-end performance
  • Four progressive steps from October 2004 through
    September 2006
  • End 2004 - SC1: data transfer to a subset of
    Tier-1s
  • Spring 2005 - SC2: include mass storage, all
    Tier-1s, some Tier-2s
  • 2nd half 2005 - SC3: Tier-1s, >20 Tier-2s, first
    set of baseline services
  • Jun-Sep 2006 - SC4: pilot service

28
Key dates for Service Preparation
Sep 05 - SC3 Service Phase
Jun 06 - SC4 Service Phase
Sep 06 - Initial LHC Service in stable operation
Apr 07 - LHC Service commissioned
  • SC3 - Reliable base service; most Tier-1s, some
    Tier-2s; basic experiment software chain; grid
    data throughput 1 GB/sec, including mass storage
    500 MB/sec (150 MB/sec - 60 MB/sec at Tier-1s)
  • SC4 - All Tier-1s, major Tier-2s capable of
    supporting the full experiment software chain,
    including analysis; sustain nominal final grid
    data throughput (~1.5 GB/sec mass storage
    throughput)
  • LHC Service in Operation - September 2006; ramp
    up to full operational capacity by April 2007;
    capable of handling twice the nominal data
    throughput

29
ARDA A Realisation of Distributed Analysis for
LHC
  • Distributed analysis on the Grid is the most
    difficult and least defined topic
  • ARDA sets out to develop end-to-end analysis
    prototypes using the LCG-supported middleware.
  • ALICE uses the AliROOT framework based on PROOF.
  • ATLAS has used DIAL services with the gLite
    prototype as backend; this is rapidly evolving.
  • CMS has prototyped the ARDA Support for CMS
    Analysis Processing (ASAP), which is used by
    several CMS physicists for daily analysis work.
  • LHCb has based its prototype on GANGA, a common
    project between ATLAS and LHCb.

30
Production Grids - What has been achieved
  • Basic middleware
  • A set of baseline services agreed and initial
    versions in production
  • All major LCG sites active
  • 1 GB/sec distribution data rate from mass storage
    to mass storage, > 50% of the nominal LHC data
    rate
  • Grid job failure rate 5-10% for most experiments,
    down from 30% in 2004
  • Sustained 10K jobs per day
  • > 10K simultaneous jobs during prolonged periods

31
Summary on WLCG
  • Two grid infrastructures are now in operation, on
    which we are able to complete the computing
    services for LHC
  • Reliability and performance have improved
    significantly over the past year
  • The focus of Service Challenge 4 is to
    demonstrate a basic but reliable service that
    can be scaled up by April 2007 to the capacity
    and performance needed for the first beams.
  • Development of new functionality and services
    must continue, but we must be careful that this
    does not interfere with the main priority for
    this year: reliable operation of the baseline
    services

From Les Robertson (CHEP06)
32
ATLAS
A Toroidal LHC ApparatuS
  • Detector for the study of high-energy
    proton-proton collisions.
  • The offline computing will have to deal with an
    output event rate of 200 Hz, i.e. 2 x 10^9 events
    per year with an average event size of 1.6 MB.
  • Researchers are spread all over the world.

ATLAS: 2000 collaborators, 150 institutes, 34 countries
Diameter 25 m; barrel toroid length 26 m; endcap
end-wall chamber span 46 m; overall weight 7000 tons
33
The Computing Model
[Diagram: ATLAS computing model data flow, from the Event Builder through the
Event Filter and the CERN Tier-0 to the Tier-1 regional centres, the Tier-2
centres and desktop workstations. Unit of capacity: 1 PC (2004) ~ 1 kSpecInt2k.]
  • Detector -> Event Builder: ~Pb/sec; Event Builder
    -> Event Filter (~159 kSI2k): ~10 GB/sec
  • Some data for calibration and monitoring go to
    the institutes; calibrations flow back
  • Event Filter -> Tier-0: ~450 Mb/sec
  • Tier-0 (~5 MSI2k, HPSS): ~9 PB/year, no
    simulation; ~300 MB/s per Tier-1 per experiment
  • Tier-1s (e.g. the UK (RAL), US, Spanish (PIC) and
    Italian regional centres): ~7.7 MSI2k and ~2
    PB/year per Tier-1, HPSS; ~622 Mb/s links
  • Tier-2 centres: ~200 kSI2k and ~200 TB/year each;
    ~622 Mb/s links
  • Each Tier-2 has ~25 physicists working on one or
    more channels; each Tier-2 should hold the full
    AOD, TAG and relevant Physics Group summary data;
    Tier-2s do the bulk of the simulation
  • Example Tier-2 (~0.25 TIPS): Lancaster,
    Sheffield, Manchester, Liverpool, with a physics
    data cache serving desktop workstations at
    100-1000 MB/s
34
ATLAS Data Challenges (1)
  • LHC Computing Review (2001)
  • Experiments should carry out Data Challenges of
    increasing size and complexity
  • to validate
  • their Computing Model
  • their Complete Software suite
  • their Data Model
  • to ensure
  • the correctness of the technical choices to be
    made

35
ATLAS Data Challenges (2)
  • DC1 (2002-2003)
  • First ATLAS exercise on world-wide scale
  • O(1000) CPUs peak
  • Put in place the full software chain
  • Simulation of the data; digitization; pile-up;
    reconstruction
  • Production system
  • Tools
  • Bookkeeping of data and jobs (AMI); monitoring;
    code distribution
  • Preliminary Grid usage
  • NorduGrid: all production performed on the Grid
  • US Grid: used at the end of the exercise
  • LCG-EDG: some testing during the Data Challenge,
    but no real production
  • At least one person per contributing site
  • Many people involved
  • Lessons learned
  • Management of failures is a key concern
  • Automate as much as possible to cope with the
    large number of jobs
  • Build the ATLAS DC community
  • Physics Monte Carlo data needed for ATLAS High
    Level Trigger Technical Design Report

36
ATLAS Data Challenges (3)
  • DC2 (2004)
  • Similar exercise to DC1 (scale, physics
    processes)
  • BUT
  • Introduced the new ATLAS Production System
    (ProdSys)
  • Unsupervised production across many sites spread
    over three different Grids (US Grid3,
    ARC/NorduGrid, LCG-2)
  • Based on DC1 experience with AtCom and GRAT
  • Core engine with plug-ins
  • 4 major components
  • Production supervisor
  • Executor
  • Common data management system
  • Common production database
  • Use middleware components as much as possible
  • Avoid inventing ATLAS's own version of the Grid
  • Use the middleware broker, catalogs, information
    system, etc.
  • Immediately followed by Rome production (2005)
  • Production of simulated data for an ATLAS Physics
    workshop in Rome in June 2005 using the DC2
    infrastructure.

37
ATLAS Production System
  • ATLAS uses 3 Grids
  • LCG (EGEE)
  • ARC/NorduGrid (evolved from EDG)
  • OSG/Grid3 (US)
  • Plus the possibility of local batch submission (4
    interfaces)
  • Input and output must be accessible from all
    Grids
  • The system makes use of the native Grid
    middleware as much as possible (e.g. Grid
    catalogs), not re-inventing its own solution.

38
ATLAS Production System
  • In order to handle the task of the ATLAS Data
    Challenges, an automated Production System was
    developed. It consists of 4 components (a
    schematic sketch follows below):
  • The production database, which contains abstract
    job definitions
  • A supervisor (Windmill, Eowyn) that reads the
    production database for job definitions and
    presents them to the different Grid executors in
    an easy-to-parse XML format
  • The executors, one for each Grid flavor, which
    receive the job definitions in XML format and
    convert them to the job description language of
    that particular Grid
  • Don Quijote (DQ), the ATLAS Data Management
    System, which moves files from their temporary
    output locations to their final destination on
    some Storage Element and registers the files in
    the Replica Location Service of that Grid
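A minimal, hypothetical Python sketch of this supervisor/executor plug-in pattern (the class names and job strings are illustrative, not the real Windmill/Eowyn or Don Quijote interfaces):

```python
from abc import ABC, abstractmethod

class Executor(ABC):
    """One plug-in per Grid flavor: translates a common job description
    into that Grid's own job description language and submits it."""

    @abstractmethod
    def submit(self, job_xml: str) -> str: ...

class LCGExecutor(Executor):
    def submit(self, job_xml: str) -> str:
        return f"lcg-job-for({job_xml})"   # would translate to JDL and submit

class NorduGridExecutor(Executor):
    def submit(self, job_xml: str) -> str:
        return f"arc-job-for({job_xml})"   # would translate to xRSL and submit

class Supervisor:
    """Reads abstract job definitions from the production DB and hands them,
    as easy-to-parse XML, to whichever executor serves that Grid."""

    def __init__(self, executors: dict[str, Executor]):
        self.executors = executors

    def run_once(self, prod_db_jobs):
        for grid, job_definition in prod_db_jobs:
            job_xml = f"<job>{job_definition}</job>"   # stand-in for the real XML
            job_id = self.executors[grid].submit(job_xml)
            print(f"{grid}: submitted {job_id}")

# Toy "production database" content: (target grid, abstract job definition)
jobs = [("LCG", "simul.sample_A"), ("NorduGrid", "digit.sample_A")]
Supervisor({"LCG": LCGExecutor(), "NorduGrid": NorduGridExecutor()}).run_once(jobs)
```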

39
The 3 Grid flavors LCG-2
ATLAS DC2 Autumn 2004
The number of sites and the resources are evolving quickly
40
The 3 Grid flavors Grid3
ATLAS DC2 Autumn 2004
  • Sep 04
  • 30 sites, multi-VO
  • shared resources
  • 3000 CPUs (shared)
  • The deployed infrastructure has been in operation
    since November 2003
  • At this moment running 3 HEP and 2 Biological
    applications
  • Over 100 users authorized to run in GRID3

41
The 3 Grid flavors NorduGrid
  • NorduGrid is a research collaboration established
    mainly across Nordic Countries but includes sites
    from other countries.
  • They contributed to a significant part of the DC1
    (using the Grid in 2002).
  • It supports production on several operating
    systems.

ATLAS DC2 Autumn 2004
  • > 10 countries, 40 sites, 4000 CPUs,
    30 TB storage

42
Production phases
[Diagram: the ATLAS production chain - event generation (Pythia -> HepMC
events), detector simulation (Geant4 -> hits + MC truth), digitization with
pile-up (-> RDO digits + MC truth), event mixing, byte-stream conversion
(-> bytestream raw digits, as from the detector) and reconstruction (-> ESD,
AOD). Persistency: Athena-POOL.]
Volume of data for 10^7 events: the physics events, minimum-bias events,
piled-up events, mixed events and mixed events with pile-up samples amount
to roughly 5-30 TB each (20 TB, 5 TB, 20 TB, 30 TB, 5 TB).
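A schematic Python sketch of how these phases chain together (function names are illustrative placeholders, not the actual ATLAS job transforms; event mixing is omitted for brevity):

```python
# Schematic chain of the production phases shown above; each stage is a
# placeholder that only records what it would produce.

def generate(n_events):            # Pythia -> HepMC events
    return {"format": "HepMC", "events": n_events}

def simulate(events):              # Geant4 -> hits + MC truth
    return {"format": "Hits+MCTruth", "events": events["events"]}

def digitize(hits, pileup=False):  # -> RDO digits (+ optional pile-up)
    return {"format": "RDO+MCTruth", "pileup": pileup, "events": hits["events"]}

def make_bytestream(digits):       # -> bytestream "raw" data, as from the detector
    return {"format": "Bytestream", "events": digits["events"]}

def reconstruct(raw):              # -> ESD and AOD
    return {"ESD": raw["events"], "AOD": raw["events"]}

sample = reconstruct(make_bytestream(digitize(simulate(generate(10_000_000)),
                                              pileup=True)))
print(sample)
```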
43
ATLAS productions
  • DC2
  • Few datasets
  • Different types of jobs
  • Physics event generation
  • Very short
  • Geant simulation
  • Geant3 in DC1; Geant4 in DC2 and Rome
  • Long: more than 10 hours
  • Digitization
  • Medium: ~5 hours
  • Reconstruction
  • Short
  • All types of jobs run sequentially
  • Each phase one after the other
  • Rome
  • Many different (>170) datasets
  • Different physics channels
  • Same types of jobs
  • Event generation, simulation, etc.
  • All types of jobs run in parallel
  • Now: continuous production
  • Goal is to reach 2M events per week.

The different types of running have a large impact
on the production rate; a rough illustration of the
continuous-production goal follows below.
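As a rough illustration of what the 2M events/week goal implies for the long Geant4 simulation step, assuming, purely for illustration, about 100 events per simulation job and ~10 hours per job (the events-per-job figure is an assumption, not taken from the slides):

```python
# Rough estimate of what "2M events per week" implies for Geant4 simulation,
# the longest step ("long: more than 10 hours" on the slide above).
# The events-per-job figure is an assumption for illustration only.

events_per_week = 2_000_000
events_per_sim_job = 100        # assumed; actual job sizes varied by production
hours_per_sim_job = 10          # from the slide: long, > 10 hours

jobs_per_week = events_per_week / events_per_sim_job
job_hours_per_week = jobs_per_week * hours_per_sim_job
concurrent_slots = job_hours_per_week / (7 * 24)

print(f"{jobs_per_week:,.0f} simulation jobs/week, "
      f"~{concurrent_slots:,.0f} CPU slots busy around the clock")
```

Under these assumptions, simulation alone would keep on the order of a thousand CPU slots continuously busy.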
44
ATLAS Productions countries (sites)
  • Australia (1) (0)
  • Austria (1)
  • Canada (4) (3)
  • CERN (1)
  • Czech Republic (2)
  • Denmark (4) (3)
  • France (1) (4)
  • Germany (12)
  • Greece (0) (1)
  • Hungary (0) (1)
  • Italy (7) (17)
  • Japan (1) (0)
  • Netherlands (1) (2)
  • Norway (3) (2)
  • Poland (1)
  • Portugal (0) (1)
  • Russia (0) (2)
  • Slovakia (0) (1)
  • Slovenia (1)
  • Spain (3)
  • Sweden (7) (5)
  • Switzerland (1) (11)
  • Taiwan (1)
  • UK (7) (8)
  • USA (19)

DC2: 20 countries, 69 sites; Rome: 22 countries, 84 sites
DC2: 13 countries, 31 sites; Rome: 17 countries, 51 sites
DC2: 7 countries, 19 sites; Rome: 7 countries, 14 sites
Spring 2006: 30 countries, 126 sites (LCG 104, OSG/Grid3 8, NDGF 14)
45
ATLAS DC2 Jobs Total
As of 30 November 2004
20 countries, 69 sites, 260,000 jobs, ~2 MSI2k-months
46
Rome production - Number of Jobs
As of 17 June 2005
[Pie chart: number of Rome production jobs per site, with the largest shares
around 4-6% per site]
47
Rome production statistics
  • 173 datasets
  • 6.1 M events simulated and reconstructed (without
    pile-up)
  • Total simulated data: 8.5 M events
  • Pile-up done for 1.3 M events, of which 50 K
    reconstructed

48
ATLAS Production (2006)
49
ATLAS Production (July 2004 - May 2005)
50
ATLAS Service Challenges 3
  • Tier-0 scaling tests
  • Test of the operations at CERN Tier-0
  • Original goal: a 10% exercise
  • Preparation phase: July-October 2005
  • Tests: October 2005 - January 2006

51
ATLAS Service Challenges 3
  • The Tier-0 facility at CERN is responsible for
    the following operations
  • Calibration and alignment
  • First-pass ESD production
  • First-pass AOD production
  • TAG production
  • Archiving of primary RAW and first-pass ESD, AOD
    and TAG data
  • Distribution of primary RAW and first-pass ESD,
    AOD and TAG data.

52
ATLAS SC3/Tier-0 (1)
  • Components of Tier-0
  • Castor mass storage system and local replica
    catalogue
  • CPU farm
  • Conditions DB
  • TAG DB
  • Tier-0 production database
  • Data management system, Don Quijote 2 (DQ2)
  • To be orchestrated by the Tier-0 Management
    System (TOM), based on the ATLAS Production
    System (ProdSys)

53
ATLAS SC3/Tier-0 (2)
  • Deploy and test
  • LCG/gLite components (main focus on T0 exercise)
  • FTS server at T0 and T1
  • LFC catalog at T0, T1 and T2
  • VOBOX at T0, T1 and T2
  • SRM Storage element at T0, T1 and T2
  • ATLAS DQ2 specific components
  • Central DQ2 dataset catalogs
  • DQ2 site services
  • Sitting in VOBOXes
  • DQ2 client for TOM

54
ATLAS Tier-0
[Diagram: ATLAS Tier-0 data flow between the Event Filter (EF), the Castor
disk pool, the tape archive, the reconstruction CPU farm and the Tier-1s,
for the RAW, ESD, AOD and merged AOD (AODm) streams.]
  • RAW: 1.6 GB/file, 0.2 Hz, 17K files/day, 320
    MB/s, 27 TB/day
  • ESD: 0.5 GB/file, 0.2 Hz, 17K files/day, 100
    MB/s, 8 TB/day
  • AOD: 10 MB/file, 2 Hz, 170K files/day, 20 MB/s,
    1.6 TB/day
  • AODm: 500 MB/file, 0.04 Hz, 3.4K files/day, 20
    MB/s, 1.6 TB/day
  • To tape (RAW, ESD, AODm): 0.44 Hz, 37K files/day,
    440 MB/s
  • To the Tier-1s (RAW, ESD x2, AODm x10): 1 Hz, 85K
    files/day, 720 MB/s
  • Reconstruction farm output: 2.24 Hz, 170K
    files/day (temporary), 20K files/day (permanent),
    140 MB/s
  • Further flow shown in the diagram: 0.4 Hz, 190K
    files/day, 340 MB/s
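A quick arithmetic cross-check of the stream rates above (bandwidth = file size x file rate; daily volume = bandwidth x 86400 s), with the tape and Tier-1 export aggregates composed as read off the diagram:

```python
# Cross-check of the per-stream bandwidths and daily volumes quoted above.

streams = {
    # name: (file size in MB, file rate in Hz)
    "RAW":  (1600, 0.2),
    "ESD":  (500, 0.2),
    "AOD":  (10, 2.0),
    "AODm": (500, 0.04),
}

for name, (size_mb, rate_hz) in streams.items():
    mb_per_s = size_mb * rate_hz
    tb_per_day = mb_per_s * 86400 / 1e6
    print(f"{name:4s}: {mb_per_s:6.0f} MB/s, {tb_per_day:5.1f} TB/day")

# Tape archiving (RAW + ESD + AODm) and Tier-1 export (RAW + 2xESD + 10xAODm)
tape = 1600 * 0.2 + 500 * 0.2 + 500 * 0.04             # 440 MB/s
export = 1600 * 0.2 + 2 * 500 * 0.2 + 10 * 500 * 0.04  # 720 MB/s
print(f"tape: {tape:.0f} MB/s, Tier-1 export: {export:.0f} MB/s")
```

The per-stream products reproduce the 320/100/20/20 MB/s figures, and the two aggregates match the 440 MB/s and 720 MB/s flows in the diagram.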
55
Scope of the Tier-0 Scaling Test
  • It was only possible to test
  • EF writing into Castor
  • ESD/AOD production on the reco farm
  • archiving to tape
  • export to Tier-1s of RAW/ESD/AOD
  • The goal was to test as much as possible, as
    realistically as possible
  • mainly a data-flow/infrastructure test (no
    physics value)
  • calibration and alignment processing not included
    yet
  • CondDB and TagDB streams

56
Oct-Dec 2005 Test Some Results
Castor Writing Rates (Dec 19-20)
- EF farm -> Castor (write.raw)
- reco farm -> Castor
  - reco jobs: write.esd, write.aodtmp
  - AOD-merging jobs: write.aod
57
Tier-0 Internal Test, Jan 28-29, 2006
[Plot of transfer rates during the test:]
READING (nominal rate 780 MB/s): Disk -> WN, Disk -> Tape
WRITING (nominal rate 460 MB/s): SFO -> Disk, WN -> Disk
WRITING (nominal rate 440 MB/s): Disk -> Tape
(Nominal-rate markers at 780, 460 and 440 MB/s.)
58
ATLAS SC4 Tests (June to December 2006)
  • Complete Tier-0 test
  • Internal data transfer from Event Filter farm
    to Castor disk pool, Castor tape, CPU farm
  • Calibration loop and handling of conditions data
  • Including distribution of conditions data to
    Tier-1s (and Tier-2s)
  • Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  • Transfer of AOD and TAG data to Tier-2s
  • Data and dataset registration in DB
  • Distributed production
  • Full simulation chain run at Tier-2s (and
    Tier-1s)
  • Data distribution to Tier-1s, other Tier-2s and
    CAF
  • Reprocessing raw data at Tier-1s
  • Data distribution to other Tier-1s, Tier-2s and
    CAF
  • Distributed analysis
  • Random job submission accessing data at Tier-1s
    (some) and Tier-2s (mostly)
  • Tests of performance of job submission,
    distribution and output retrieval

Need to define and test the Tier infrastructure and
the Tier-1 - Tier-1 and Tier-1 - Tier-2 associations
59
ATLAS Tier-1s
2008 Resources                 CPU (MSI2k)   (%)    Disk (PB)   (%)    Tape (PB)   (%)
Canada, TRIUMF                     1.06      4.4      0.62      4.3      0.40      4.4
France, CC-IN2P3                   3.02     12.6      1.76     12.2      1.15     12.8
Germany, FZK                       2.40     10.0      1.44     10.0      0.90     10.0
Italy, CNAF                        1.76      7.3      0.80      5.5      0.67      7.5
Nordic Data Grid Facility          1.46      6.1      0.62      4.3      0.62      6.9
Netherlands, SARA                  3.05     12.7      1.78     12.3      1.16     12.9
Spain, PIC                         1.20      5.0      0.72      5.0      0.45      5.0
Taiwan, ASGC                       1.87      7.8      0.83      5.8      0.71      7.9
UK, RAL                            1.57      6.5      0.89      6.2      1.03     11.5
USA, BNL                           5.30     22.1      3.09     21.4      2.02     22.5

Total 2008 pledged                22.69     94.5     12.55     87.0      9.11    101.4
2008 needed                       23.97    100.0     14.43    100.0      8.99    100.0
2008 missing                       1.28      5.5      1.88     13.0     -0.12     -1.4
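A quick check of the "missing" row (missing = needed - pledged); the percentages differ marginally from the slide because of rounding in the quoted numbers:

```python
# Verify the shortfall of pledged 2008 resources against what is needed.

needed  = {"CPU (MSI2k)": 23.97, "Disk (PB)": 14.43, "Tape (PB)": 8.99}
pledged = {"CPU (MSI2k)": 22.69, "Disk (PB)": 12.55, "Tape (PB)": 9.11}

for resource, need in needed.items():
    missing = need - pledged[resource]
    print(f"{resource}: missing {missing:+.2f} ({missing / need:+.1%} of the need)")
```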
60
ATLAS Tiers Association (SC4-draft)
Tier-1                           Associated Tier-1(s)    Tier-2s or planned Tier-2s
Canada, TRIUMF (5.3)             SARA                    East T2 Fed., West T2 Fed.
France, CC-IN2P3 (13.5)          BNL                     CC-IN2P3 AF, GRIF, LPC, HEP-Beijing, Romanian T2
Germany, FZK-GridKa (10.5)       BNL                     DESY, Munich Fed., Freiburg Uni., Wuppertal Uni., FZU AS (CZ), Polish T2 Fed.
Italy, CNAF (7.5)                RAL                     INFN T2 Fed.
Netherlands, SARA (13.0)         TRIUMF, ASGC
Nordic Data Grid Facility (5.5)  PIC
Spain, PIC (5.5)                 NDGF                    ATLAS T2 Fed.
Taiwan, ASGC (7.7)               SARA                    Taiwan AF Fed.
UK, RAL (7.5)                    CNAF                    Grid London, NorthGrid, ScotGrid, SouthGrid
USA, BNL (24)                    CC-IN2P3, FZK-GridKa    BU/HU T2, Midwest T2, Southwest T2
No association (yet): Melbourne Uni., ICEPP Tokyo, LIP T2, HEP-IL Fed., Russian Fed., CSCS (CH), UIBK, Brazilian T2 Fed.
61
Computing System Commissioning
  • We have defined the high-level goals of the
    Computing System Commissioning operation during
    2006
  • More a running-in of continuous operation than a
    stand-alone challenge
  • Main aim of Computing System Commissioning will
    be to test the software and computing
    infrastructure that we will need at the beginning
    of 2007
  • Calibration and alignment procedures and
    conditions DB
  • Full trigger chain
  • Event reconstruction and data distribution
  • Distributed access to the data for analysis
  • At the end (autumn-winter 2006) we will have a
    working and operational system, ready to take
    data with cosmic rays at increasing rates

62
63
Conclusions (ATLAS)
  • Data Challenges (1, 2) and productions (Rome and
    the current continuous production)
  • Have proven that the 3 Grids - LCG-EGEE,
    OSG/Grid3 and ARC/NorduGrid - can be used in a
    coherent way for real large-scale productions
  • Possible, but not easy
  • In SC3
  • We succeeded in reaching the nominal data
    transfer rate at Tier-0 (internally) and
    reasonable transfer rates to the Tier-1s
  • SC4
  • Should allow us to test the full chain using the
    new WLCG middleware and infrastructure and the
    new ATLAS Production and Data Management systems
  • This will include a more complete Tier-0 test,
    distributed productions and distributed analysis
    tests
  • Computing System Commissioning
  • Will have as its main goal a fully working and
    operational system
  • Leading to a physics readiness report

64
Thank you