1
The LHC Computing Challenge
  • CMS Conference
  • ITEP - Moscow
  • Les Robertson
  • CERN - IT Division
  • 23 November 2000
  • les.robertson@cern.ch

2
Summary
  • HEP offline computing the current model
  • LHC computing requirements
  • The wide area computing model
  • A place for Grid technology?
  • The DataGRID project
  • Conclusions

3
Data Handling and Computation for Physics Analysis
[Flow diagram] detector → event filter (selection and reconstruction) →
raw data → event summary data (processed data) → batch physics
analysis → analysis objects (extracted by physics topic) →
interactive physics analysis; event reprocessing and event simulation
feed the same chain
4
HEP Computing Characteristics
  • Large numbers of independent events
  • trivial parallelism
  • Large data sets
  • smallish records
  • mostly read-only
  • Modest I/O rates
  • few MB/sec per fast processor
  • Modest floating point requirement
  • SPECint performance
  • Very large aggregate requirements: computation,
    data
  • Scaling up is not just big, it is also complex
  • …and once you exceed the capabilities of a single
    geographical installation ………?

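The trivial parallelism above is the key property: events are independent, so a farm simply spreads them across processors with no inter-event communication. A minimal sketch in Python (the event layout and `reconstruct()` are invented for illustration, not real HEP software):

```python
# Sketch of trivially parallel event processing: each event is independent,
# so a worker pool can process events in any order with no communication.
# reconstruct() is a hypothetical stand-in for a real reconstruction step.
from multiprocessing import Pool

def reconstruct(event):
    # Stand-in for per-event reconstruction: combine the raw hit values.
    return sum(event["hits"])

if __name__ == "__main__":
    events = [{"id": i, "hits": [i, i + 1]} for i in range(8)]
    with Pool(4) as pool:
        # One event per task; throughput scales with the number of workers.
        results = pool.map(reconstruct, events)
```

Throughput, not single-job performance, is what matters here: adding workers increases events processed per second without any change to the per-event code.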
5
The SHIFT Software Model
application servers
IP network
stage (migration) servers
Storage access API which can be implemented over IP
  - all data available to all processes
  - replicated components
  - scalable, heterogeneous, distributed
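A toy sketch of the SHIFT idea of a single storage-access API: applications open files through one interface, regardless of which server actually holds the data. The class and method names here are illustrative assumptions, not the actual SHIFT interface:

```python
# Illustrative sketch (not the real SHIFT API): one storage-access
# interface that hides where the data lives; real SHIFT implemented this
# over IP so that all data was available to all processes.
import io

class StorageAPI:
    """Uniform interface; concrete backends could be disk, stage or tape servers."""
    def open(self, path, mode="rb"):
        raise NotImplementedError

class MemoryStore(StorageAPI):
    """In-memory stand-in for a remote disk/stage server."""
    def __init__(self):
        self.files = {}
    def open(self, path, mode="rb"):
        if "w" in mode:
            buf = io.BytesIO()
            self.files[path] = buf
            return buf
        # Return a fresh reader over the stored bytes.
        return io.BytesIO(self.files[path].getvalue())

store = MemoryStore()
store.open("/data/run42.raw", "wb").write(b"event data")   # illustrative path
data = store.open("/data/run42.raw").read()
```

The point of the design is that replicating the backend (more disk servers, more stage servers) scales capacity without changing application code.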
6
Generic computing farm
network servers
application servers
tape servers
disk servers
7
HEP computing farms use commodity
components: simple office PCs
8
Standard components
  • Computing and Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (Fast or Gigabit
    Ethernet)
  • with a minimum of high(er)-end components
  • LAN backbone
  • WAN connection

PC-based disk server: 20 IDE disks, 1.5 TeraBytes
9
HEP is not special, just more cost conscious

10
Limit the role of high-end equipment

11
High Throughput Computing
  • mass of modest problems
  • throughput rather than performance
  • resilience rather than ultimate reliability
  • HEP can exploit inexpensive mass market
    components
  • to build large computing/data clusters
  • scalable, extensible, flexible, heterogeneous,
    ………..
  • and as a result - really hard to manage
  • We should have much in common with data mining,
    Internet computing facilities, ……

Chaotic workload
12
LHC Computing Requirements
13
Projected LHC Computing Fabric at CERN
(no more than 1/3 of the total LHC computing requirement)
Estimated computing resources required at CERN for LHC experiments in 2006

                                     ALICE    ATLAS      CMS     LHCb      Total
CPU capacity (SPECint95)           420 000  520 000  600 000  220 000  1 760 000
estimated CPUs in 2006               3 000    3 000    3 000    1 500     10 500
disk capacity (TB)                     800      750      650      450      2 650
mag. tape capacity (PB)                3.7      3.0      1.8      0.6        9.1
aggregate I/O rate, disk (GB/sec)      100      100      100       40        340
aggregate I/O rate, tape (GB/sec)      1.2      0.8      0.8      0.2        3.0

(the aggregate I/O rate corresponds to the effective throughput of the LAN backbone)
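The per-experiment figures are consistent with the quoted totals; a quick arithmetic check in Python (the experiment-to-value mapping is a best-effort reading of the slide):

```python
# Sanity check on the 2006 CERN capacity estimates quoted above.
# The assignment of values to experiments follows the slide's column order.
cpu_si95 = {"ALICE": 420_000, "ATLAS": 520_000, "CMS": 600_000, "LHCb": 220_000}
disk_tb  = {"ALICE": 800, "ATLAS": 750, "CMS": 650, "LHCb": 450}
tape_pb  = {"ALICE": 3.7, "ATLAS": 3.0, "CMS": 1.8, "LHCb": 0.6}
io_disk  = {"ALICE": 100, "ATLAS": 100, "CMS": 100, "LHCb": 40}   # GB/sec

assert sum(cpu_si95.values()) == 1_760_000   # total CPU capacity
assert sum(disk_tb.values())  == 2_650       # total disk (TB)
assert round(sum(tape_pb.values()), 1) == 9.1  # total tape (PB)
assert sum(io_disk.values())  == 340         # total disk I/O (GB/sec)
```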
14
< 50% of the main analysis capacity will be at
CERN
15
[Chart: disk and tape capacity at CERN, other experiments vs LHC
experiments; Jan 2000: 30 TB disk, 1 PB tape]
16
Components to Fabrics
  • Commodity components are just fine for HEP
  • Masses of experience with inexpensive farms
  • Long experience with mass storage
  • LAN technology is going the right way
  • Inexpensive high performance PC attachments
  • Compatible with hefty backbone switches
  • Good ideas for improving automated operation and
    management
  • Just needs some solid computer engineering R&D?

17
Two Problems
  • Funding
  • will funding bodies place all their investment at
    CERN?
  • Geography
  • does a geographically distributed model better
    serve the needs of the world-wide distributed
    community?

No.  Maybe, if it is reliable and easy to use.
18
World Wide Collaboration → distributed
computing and storage capacity
CMS: 1800 physicists, 150 institutes, 32 countries
19
Solution? - Regional Computing Centres
  • Exploit established computing expertise and
    infrastructure
  • in national labs, universities
  • Reduce dependence on links to CERN
  • full summary data available nearby
  • through a fat, fast, reliable network link
  • Tap funding sources not otherwise available to
    HEP at CERN
  • Devolve control over resource allocation
  • national interests?
  • regional interests?
  • at the expense of physics interests?

20
The Basic Problem - Summary
  • Scalability → cost, complexity, management
  • Thousands of processors, thousands of disks,
    PetaBytes of data, Terabits/second of I/O
    bandwidth, ….
  • Wide-area distribution → complexity, management,
    bandwidth
  • WAN bandwidth is, and will remain, only about 1%
    of LAN bandwidth
  • Distribute, replicate, cache, synchronise the
    data
  • Multiple ownership, policies, ….
  • Integration of this amorphous collection of
    Regional Centres ..
  • .. with some attempt at optimisation
  • Adaptability → flexibility, simplicity
  • We shall only know how analysis will be done once
    the data arrives

21
The Wide Area Computing Model
22
Regional Centres - a Multi-Tier Model
23
  • Tier 0: CERN
  • Data recording, reconstruction, ~20% of analysis
  • Full data sets on permanent mass storage
    raw, ESD, simulated data
  • Hefty WAN capability
  • Range of export-import media
  • 24 X 7 availability
  • Tier 1: established data centre or new
    facility hosted by a lab
  • Major subset of data all/most of the ESD,
    selected raw data
  • Mass storage, managed data operation
  • ESD analysis, AOD generation, major analysis
    capacity
  • Fat pipe to CERN
  • High availability
  • User consultancy; library; collaboration
    software support

24
  • Tier 2: smaller labs, smaller countries,
    probably hosted by an existing data centre
  • Mainly AOD analysis
  • Data cached from Tier 1, Tier 0 centres
  • No mass storage management
  • Minimal staffing costs
  • University physics department
  • Final analysis
  • Dedicated to local users
  • Limited data capacity cached only via the
    network
  • Zero administration costs (fully automated)

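The tier roles above can be summarised as plain data. A sketch (the tier numbering for the university level and the holdings lists are my reading of the slides, not an official definition):

```python
# Multi-tier model sketch: which data products each tier holds, per the
# slides above. Tier 3 (university departments) is an assumed label.
TIERS = {
    0: {"example": "CERN",            "holds": ["raw", "ESD", "AOD", "simulated"], "mass_storage": True},
    1: {"example": "national lab",    "holds": ["ESD", "AOD"],                     "mass_storage": True},
    2: {"example": "smaller lab",     "holds": ["AOD (cached)"],                   "mass_storage": False},
    3: {"example": "university dept", "holds": ["cached via network"],             "mass_storage": False},
}

def most_local_tier_with(product):
    """Highest-numbered (i.e. most local) tier that holds `product`."""
    candidates = [t for t, cfg in TIERS.items()
                  if any(product in h for h in cfg["holds"])]
    return max(candidates) if candidates else None
```

The intent of the model shows up directly: AOD-level analysis can be satisfied close to the physicist, while raw data lives only at Tier 0.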
25
More realistically - a Grid Topology
26
A place for Grid technology?
27
Are Grids a solution?
  • Computational Grids
  • Change of orientation of Meta-computing activity
  • From inter-connected super-computers towards a
    more general concept of a computational power
    Grid ("The Grid", Ian Foster and Carl Kesselman)
  • Has found resonance with the press, funding
    agencies
  • But what is a Grid?
  • Dependable, consistent, pervasive access to
    resources
  • So, in some way Grid technology makes it easy to
    use diverse, geographically distributed, locally
    managed and controlled computing facilities
  • as if they formed a coherent local cluster

Ian Foster and Carl Kesselman, editors, The
Grid: Blueprint for a New Computing
Infrastructure, Morgan Kaufmann, 1999
28
What does the Grid do for you?
  • You submit your work
  • And the Grid
  • Finds convenient places for it to be run
  • Organises efficient access to your data
  • Caching, migration, replication
  • Deals with authentication to the different sites
    that you will be using
  • Interfaces to local site resource allocation
    mechanisms, policies
  • Runs your jobs
  • Monitors progress
  • Recovers from problems
  • Tells you when your work is complete
  • If there is scope for parallelism, it can also
    decompose your work into convenient execution
    units based on the available resources, data
    distribution

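The submit-and-forget workflow above can be caricatured in a few lines. Everything here (Site, Job, the broker heuristic) is invented for illustration and is not a real Grid API:

```python
# Toy resource broker: the Grid "finds convenient places" for a job by
# preferring sites that already cache the job's data, then the least
# loaded site. All names and the heuristic are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_cpus: int
    cached_datasets: set = field(default_factory=set)

@dataclass
class Job:
    dataset: str
    status: str = "queued"

def broker(job, sites):
    """Rank sites: data locality first, then most free CPUs."""
    ranked = sorted(sites,
                    key=lambda s: (job.dataset not in s.cached_datasets,
                                   -s.free_cpus))
    return ranked[0]

sites = [Site("CERN", 10, {"raw-2006"}),
         Site("RAL", 50),
         Site("Lyon", 30, {"esd-2006"})]
chosen = broker(Job("esd-2006"), sites)   # Lyon: it caches the dataset
```

A real broker would also handle authentication, staging, monitoring and recovery, as the slide lists; the sketch only shows the placement decision.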
29
Current state
  • Globus project (http://www.globus.org)
  • Basic middleware
  • Authentication
  • Information service
  • Resource management
  • Good basis to build on
  • Active collaborative community
  • Open approach: Grid Forum
    (http://www.gridforum.org)
  • Who is handling lots of data?
  • How many production quality implementations?

30
R&D required
  • Local fabric
  • Issues of scalability, management, reliability of
    the local computing fabric
  • Adaptation of these amorphous computing fabrics
    to the Grid
  • Wide Area Mass Storage
  • Grid technology in an environment that is High
    Throughput, Data Intensive, and has a Chaotic
    Workload
  • Grid scheduling
  • Data management
  • Monitoring - reliability and performance

31
HEP Grid Initiatives
  • DataGRID
  • European Commission support, HEP, Earth
    Observation, Biology
  • PPDG Particle Physics Data Grid
  • US labs HEP data analysis
  • High performance file transfer, data caching
  • GriPhyN (Grid Physics Network)
  • Computer science focus
  • HEP applications target
  • Several national European initiatives
  • Italy INFN
  • UK, France, Netherlands,

32
The DataGRID Project
33
The Data Grid Project
  • Proposal for EC Fifth Framework funding
  • Principal goals
  • Middleware for fabric and Grid management
  • Large scale testbed
  • Production quality demonstrations
  • mock data, simulation analysis, current
    experiments
  • Three-year phased developments and demos
  • Collaborate with and complement other European
    and US projects
  • Open source and communication
  • GRID Forum
  • Industry and Research Forum

34
DataGRID Partners
  • Managing partners
  • PPARC (UK), INFN (Italy)
  • CNRS (France), NIKHEF (Holland)
  • ESA/ESRIN (Italy), CERN
  • Industry
  • IBM (UK), Compagnie des Signaux (F), Datamat (I)
  • Associate partners
  • Istituto Trentino di Cultura, Helsinki Institute
    of Physics, Swedish Science Research Council,
    Zuse Institut Berlin, University of Heidelberg,
    CEA/DAPNIA (F), IFAE Barcelona, CNR (I), CESNET
    (CZ), KNMI (NL), SARA (NL), SZTAKI (HU)

35
Preliminary programme of work
  • Middleware
  • Grid Workload Management (C. Vistoli/INFN-CNAF)
  • Grid Data Management (B. Segal/CERN)
  • Grid Monitoring services (R. Middleton/RAL)
  • Fabric Management (T. Smith/CERN)
  • Mass Storage Management (J. Gordon/RAL)
  • Testbed
  • Testbed Integration (F. Etienne/CNRS-Marseille)
  • Network Services (C. Michau/CNRS)
  • Scientific Applications
  • HEP Applications (F. Carminati/CERN)
  • Earth Observation Applications (L.
    Fusco/ESA-ESRIN)
  • Biology Applications (C. Michau/CNRS)

36
Middleware
  • Wide-area - building on an existing framework
    (Globus)
  • workload management
  • The workload is chaotic: unpredictable job
    arrival rates, data access patterns
  • The goal is maximising the global system
    throughput (events processed per second)
  • data management
  • Management of petabyte-scale data volumes, in an
    environment with limited network bandwidth and
    heavy use of mass storage (tape)
  • Caching, replication, synchronisation, object
    database model
  • application monitoring
  • Tens of thousands of components, thousands of
    jobs and individual users
  • End-user - tracking of the progress of jobs and
    aggregates of jobs
  • Understanding application and grid level
    performance
  • Administrator: understanding which global-level
    applications were affected by failures, and
    whether and how to recover

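Caching, replication and synchronisation as described above rest on a catalogue that maps logical file names to the sites holding physical copies. A minimal sketch (class and method names are assumptions, not the DataGRID design):

```python
# Toy replica catalogue: logical file name (LFN) -> sites holding a copy.
# A data-management layer would consult this to serve reads locally and
# to decide where new replicas are needed.
class ReplicaCatalog:
    def __init__(self):
        self.replicas = {}                      # lfn -> set of site names

    def register(self, lfn, site):
        self.replicas.setdefault(lfn, set()).add(site)

    def locate(self, lfn, preferred=None):
        """Return a site holding `lfn`, favouring the preferred (local) site."""
        sites = self.replicas.get(lfn, set())
        if preferred in sites:                  # local copy: no WAN traffic
            return preferred
        return min(sites) if sites else None    # else any holder (alphabetical)

rc = ReplicaCatalog()
rc.register("run42.esd", "CERN")
rc.register("run42.esd", "Lyon")
```

The local-first rule reflects the bandwidth constraint stated earlier: WAN capacity is a scarce resource, so reads should hit a nearby replica whenever one exists.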
37
Middleware
  • Local fabric
  • Effective local site management of giant
    computing fabrics
  • Automated installation, configuration management,
    system maintenance
  • Automated monitoring and error recovery -
    resilience, self-healing
  • Performance monitoring
  • Characterisation, mapping, management of local
    Grid resources
  • Mass storage management
  • multi-PetaByte data storage
  • real-time data recording requirement
  • active tape layer, 1,000s of users
  • uniform mass storage interface
  • exchange of data and meta-data between mass
    storage systems

38
Infrastructure
  • Operate a production-quality trans-European
    testbed interconnecting clusters in several
    sites
  • Initial testbed participants: CERN, RAL, INFN
    (several sites), IN2P3-Lyon, ESRIN (ESA-Italy),
    SARA/NIKHEF (Amsterdam), ZUSE Institut (Berlin),
    CESNET (Prague), IFAE (Barcelona), LIP (Lisbon),
    IFCA (Santander) ……
  • Define, integrate and build successive releases
    of the project middleware
  • Define, negotiate and manage the network
    infrastructure
  • assume that this is largely TEN-155 and then
    GÉANT
  • Stage demonstrations, data challenges
  • Monitor, measure, evaluate, report

39
Applications
  • HEP
  • The four LHC experiments
  • Live testbed for the Regional Centre model
  • Earth Observation
  • ESA-ESRIN
  • KNMI (Dutch meteo) climatology
  • Processing of atmospheric ozone data derived from
    ERS GOME and ENVISAT SCIAMACHY sensors
  • Biology
  • CNRS (France), Karolinska (Sweden)
  • Application being defined

40
Data Grid Challenges
  • Data
  • Scaling
  • Reliability

41
DataGRID Challenges (ii)
  • Large, diverse, dispersed project
  • but coordinating this European activity is one of
    the project's raisons d'être
  • Collaboration, convergence with US and other Grid
    activities: this area is very dynamic
  • Organising adequate network bandwidth: a
    vital ingredient for the success of a Grid
  • Keeping the feet on the ground: the GRID is a
    good idea, but not the panacea suggested by some
    recent press articles

42
Conclusions on LHC Computing
  • The scale of the computing needs of the LHC
    experiments is large compared with current
    experiments
  • each experiment is one to two orders of magnitude
    greater than the TOTAL capacity installed at CERN
    today
  • We believe that the hardware technology will be
    there to evolve the current architecture of
    commodity clusters into large scale computing
    fabrics
  • But there are many management problems -
    workload, computing fabric, data, storage in a
    wide area distributed environment
  • Disappointingly, solutions for local site
    management on this scale are not emerging from
    industry
  • The Grid technologies look very promising to
    deliver a major step forward in wide area
    computing usability and effectiveness
  • But a great deal of work will be required to
    make this a reality

These are general problems; HEP has just come
across them first