1
The LHC Computing Challenge
  • CMS Conference
  • ITEP - Moscow
  • Les Robertson
  • CERN - IT Division
  • 23 November 2000
  • les.robertson@cern.ch

2
Summary
  • HEP offline computing the current model
  • LHC computing requirements
  • The wide area computing model
  • A place for Grid technology?
  • The DataGRID project
  • Conclusions

3
Data Handling and Computation for Physics Analysis
[Flow diagram] detector → event filter (selection and reconstruction) →
raw data → event summary data (processed data) → batch physics
analysis → analysis objects (extracted by physics topic) →
interactive physics analysis; event reprocessing and event simulation
feed the same chain
4
HEP Computing Characteristics
  • Large numbers of independent events
  • trivial parallelism
  • Large data sets
  • smallish records
  • mostly read-only
  • Modest I/O rates
  • few MB/sec per fast processor
  • Modest floating point requirement
  • SPECint performance
  • Very large aggregate requirements: computation,
    data
  • Scaling up is not just big, it is also complex
  • …and once you exceed the capabilities of a single
    geographical installation ………?

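The trivial parallelism above is the key property: events are independent, so a farm simply spreads them across processors with no inter-event communication. A minimal sketch in Python (the event layout and `reconstruct()` are invented for illustration, not real HEP software):

```python
# Sketch of trivially parallel event processing: each event is independent,
# so a worker pool can process events in any order with no communication.
# reconstruct() is a hypothetical stand-in for a real reconstruction step.
from multiprocessing import Pool

def reconstruct(event):
    # Stand-in for per-event reconstruction: combine the raw hit values.
    return sum(event["hits"])

if __name__ == "__main__":
    events = [{"id": i, "hits": [i, i + 1]} for i in range(8)]
    with Pool(4) as pool:
        # One event per task; throughput scales with the number of workers.
        results = pool.map(reconstruct, events)
```

Throughput, not single-job performance, is what matters here: adding workers increases events processed per second without any change to the per-event code.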
5
The SHIFT Software Model
application servers
IP network
stage (migration) servers
Storage access API which can be implemented over IP
  - all data available to all processes
  - replicated components
  - scalable, heterogeneous, distributed
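A toy sketch of the SHIFT idea of a single storage-access API: applications open files through one interface, regardless of which server actually holds the data. The class and method names here are illustrative assumptions, not the actual SHIFT interface:

```python
# Illustrative sketch (not the real SHIFT API): one storage-access
# interface that hides where the data lives; real SHIFT implemented this
# over IP so that all data was available to all processes.
import io

class StorageAPI:
    """Uniform interface; concrete backends could be disk, stage or tape servers."""
    def open(self, path, mode="rb"):
        raise NotImplementedError

class MemoryStore(StorageAPI):
    """In-memory stand-in for a remote disk/stage server."""
    def __init__(self):
        self.files = {}
    def open(self, path, mode="rb"):
        if "w" in mode:
            buf = io.BytesIO()
            self.files[path] = buf
            return buf
        # Return a fresh reader over the stored bytes.
        return io.BytesIO(self.files[path].getvalue())

store = MemoryStore()
store.open("/data/run42.raw", "wb").write(b"event data")   # illustrative path
data = store.open("/data/run42.raw").read()
```

The point of the design is that replicating the backend (more disk servers, more stage servers) scales capacity without changing application code.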
6
Generic computing farm
network servers
application servers
tape servers
disk servers
7
HEP computing farms use commodity
components: simple office PCs
8
Standard components
  • Computing and Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (Fast or Gigabit
    Ethernet)
  • with a minimum of high(er)-end components
  • LAN backbone
  • WAN connection

PC-based disk server: 20 IDE disks, 1.5 TeraBytes
9
HEP is not special, just more cost conscious

10
Limit the role of high-end equipment

11
High Throughput Computing
  • mass of modest problems
  • throughput rather than performance
  • resilience rather than ultimate reliability
  • HEP can exploit inexpensive mass market
    components
  • to build large computing/data clusters
  • scalable, extensible, flexible, heterogeneous,
    ………..
  • and as a result - really hard to manage
  • We should have much in common with data mining,
    Internet computing facilities, ……

Chaotic workload
12
LHC Computing Requirements
13
Projected LHC Computing Fabric at CERN
(no more than 1/3 of the total LHC computing requirement)
Estimated computing resources required at CERN for LHC experiments in 2006

                                     ALICE    ATLAS      CMS     LHCb      Total
CPU capacity (SPECint95)           420 000  520 000  600 000  220 000  1 760 000
estimated CPUs in 2006               3 000    3 000    3 000    1 500     10 500
disk capacity (TB)                     800      750      650      450      2 650
mag. tape capacity (PB)                3.7      3.0      1.8      0.6        9.1
aggregate I/O rate, disk (GB/sec)      100      100      100       40        340
aggregate I/O rate, tape (GB/sec)      1.2      0.8      0.8      0.2        3.0

(the aggregate I/O rate corresponds to the effective throughput of the LAN backbone)
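The per-experiment figures are consistent with the quoted totals; a quick arithmetic check in Python (the experiment-to-value mapping is a best-effort reading of the slide):

```python
# Sanity check on the 2006 CERN capacity estimates quoted above.
# The assignment of values to experiments follows the slide's column order.
cpu_si95 = {"ALICE": 420_000, "ATLAS": 520_000, "CMS": 600_000, "LHCb": 220_000}
disk_tb  = {"ALICE": 800, "ATLAS": 750, "CMS": 650, "LHCb": 450}
tape_pb  = {"ALICE": 3.7, "ATLAS": 3.0, "CMS": 1.8, "LHCb": 0.6}
io_disk  = {"ALICE": 100, "ATLAS": 100, "CMS": 100, "LHCb": 40}   # GB/sec

assert sum(cpu_si95.values()) == 1_760_000   # total CPU capacity
assert sum(disk_tb.values())  == 2_650       # total disk (TB)
assert round(sum(tape_pb.values()), 1) == 9.1  # total tape (PB)
assert sum(io_disk.values())  == 340         # total disk I/O (GB/sec)
```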
14
< 50% of the main analysis capacity will be at
CERN
15
[Chart: disk and tape capacity at CERN, other experiments vs LHC
experiments; Jan 2000: 30 TB disk, 1 PB tape]
16
Components to Fabrics
  • Commodity components are just fine for HEP
  • Masses of experience with inexpensive farms
  • Long experience with mass storage
  • LAN technology is going the right way
  • Inexpensive high performance PC attachments
  • Compatible with hefty backbone switches
  • Good ideas for improving automated operation and
    management
  • Just needs some solid computer engineering R&D?

17
Two Problems
  • Funding
  • will funding bodies place all their investment at
    CERN?
  • Geography
  • does a geographically distributed model better
    serve the needs of the world-wide distributed
    community?

No.  Maybe, if it is reliable and easy to use.
18
World Wide Collaboration → distributed
computing and storage capacity
CMS: 1800 physicists, 150 institutes, 32 countries
19
Solution? - Regional Computing Centres
  • Exploit established computing expertise and
    infrastructure
  • in national labs, universities
  • Reduce dependence on links to CERN
  • full summary data available nearby
  • through a fat, fast, reliable network link
  • Tap funding sources not otherwise available to
    HEP at CERN
  • Devolve control over resource allocation
  • national interests?
  • regional interests?
  • at the expense of physics interests?

20
The Basic Problem - Summary
  • Scalability → cost, complexity, management
  • Thousands of processors, thousands of disks,
    PetaBytes of data, Terabits/second of I/O
    bandwidth, ….
  • Wide-area distribution → complexity, management,
    bandwidth
  • WAN bandwidth is, and will remain, only about 1%
    of LAN bandwidth
  • Distribute, replicate, cache, synchronise the
    data
  • Multiple ownership, policies, ….
  • Integration of this amorphous collection of
    Regional Centres ..
  • .. with some attempt at optimisation
  • Adaptability → flexibility, simplicity
  • We shall only know how analysis will be done once
    the data arrives

21
The Wide Area Computing Model
22
Regional Centres - a Multi-Tier Model
23
  • Tier 0: CERN
  • Data recording, reconstruction, ~20% of analysis
  • Full data sets on permanent mass storage
    raw, ESD, simulated data
  • Hefty WAN capability
  • Range of export-import media
  • 24 X 7 availability
  • Tier 1: established data centre or new
    facility hosted by a lab
  • Major subset of data all/most of the ESD,
    selected raw data
  • Mass storage, managed data operation
  • ESD analysis, AOD generation, major analysis
    capacity
  • Fat pipe to CERN
  • High availability
  • User consultancy; library; collaboration
    software support

24
  • Tier 2: smaller labs, smaller countries,
    probably hosted by an existing data centre
  • Mainly AOD analysis
  • Data cached from Tier 1, Tier 0 centres
  • No mass storage management
  • Minimal staffing costs
  • University physics department
  • Final analysis
  • Dedicated to local users
  • Limited data capacity cached only via the
    network
  • Zero administration costs (fully automated)

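The tier roles above can be summarised as plain data. A sketch (the tier numbering for the university level and the holdings lists are my reading of the slides, not an official definition):

```python
# Multi-tier model sketch: which data products each tier holds, per the
# slides above. Tier 3 (university departments) is an assumed label.
TIERS = {
    0: {"example": "CERN",            "holds": ["raw", "ESD", "AOD", "simulated"], "mass_storage": True},
    1: {"example": "national lab",    "holds": ["ESD", "AOD"],                     "mass_storage": True},
    2: {"example": "smaller lab",     "holds": ["AOD (cached)"],                   "mass_storage": False},
    3: {"example": "university dept", "holds": ["cached via network"],             "mass_storage": False},
}

def most_local_tier_with(product):
    """Highest-numbered (i.e. most local) tier that holds `product`."""
    candidates = [t for t, cfg in TIERS.items()
                  if any(product in h for h in cfg["holds"])]
    return max(candidates) if candidates else None
```

The intent of the model shows up directly: AOD-level analysis can be satisfied close to the physicist, while raw data lives only at Tier 0.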
25
More realistically - a Grid Topology
26
A place for Grid technology?
27
Are Grids a solution?
  • Computational Grids
  • Change of orientation of Meta-computing activity
  • From inter-connected super-computers towards a
    more general concept of a computational power
    Grid ("The Grid", Ian Foster and Carl Kesselman)
  • Has found resonance with the press, funding
    agencies
  • But what is a Grid?
  • Dependable, consistent, pervasive access to
    resources
  • So, in some way Grid technology makes it easy to
    use diverse, geographically distributed, locally
    managed and controlled computing facilities
  • as if they formed a coherent local cluster

Ian Foster and Carl Kesselman, editors, The
Grid: Blueprint for a New Computing
Infrastructure, Morgan Kaufmann, 1999
28
What does the Grid do for you?
  • You submit your work
  • And the Grid
  • Finds convenient places for it to be run
  • Organises efficient access to your data
  • Caching, migration, replication
  • Deals with authentication to the different sites
    that you will be using
  • Interfaces to local site resource allocation
    mechanisms, policies
  • Runs your jobs
  • Monitors progress
  • Recovers from problems
  • Tells you when your work is complete
  • If there is scope for parallelism, it can also
    decompose your work into convenient execution
    units based on the available resources, data
    distribution

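The submit-and-forget workflow above can be caricatured in a few lines. Everything here (Site, Job, the broker heuristic) is invented for illustration and is not a real Grid API:

```python
# Toy resource broker: the Grid "finds convenient places" for a job by
# preferring sites that already cache the job's data, then the least
# loaded site. All names and the heuristic are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_cpus: int
    cached_datasets: set = field(default_factory=set)

@dataclass
class Job:
    dataset: str
    status: str = "queued"

def broker(job, sites):
    """Rank sites: data locality first, then most free CPUs."""
    ranked = sorted(sites,
                    key=lambda s: (job.dataset not in s.cached_datasets,
                                   -s.free_cpus))
    return ranked[0]

sites = [Site("CERN", 10, {"raw-2006"}),
         Site("RAL", 50),
         Site("Lyon", 30, {"esd-2006"})]
chosen = broker(Job("esd-2006"), sites)   # Lyon: it caches the dataset
```

A real broker would also handle authentication, staging, monitoring and recovery, as the slide lists; the sketch only shows the placement decision.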
29
Current state
  • Globus project (http://www.globus.org)
  • Basic middleware
  • Authentication
  • Information service
  • Resource management
  • Good basis to build on
  • Active collaborative community
  • Open approach: Grid Forum
    (http://www.gridforum.org)
  • Who is handling lots of data?
  • How many production quality implementations?

30
R&D required
  • Local fabric
  • Issues of scalability, management, reliability of
    the local computing fabric
  • Adaptation of these amorphous computing fabrics
    to the Grid
  • Wide Area Mass Storage
  • Grid technology in an environment that is High
    Throughput, Data Intensive, and has a Chaotic
    Workload
  • Grid scheduling
  • Data management
  • Monitoring - reliability and performance

31
HEP Grid Initiatives
  • DataGRID
  • European Commission support, HEP, Earth
    Observation, Biology
  • PPDG Particle Physics Data Grid
  • US labs HEP data analysis
  • High performance file transfer, data caching
  • GriPhyN (Grid Physics Network)
  • Computer science focus
  • HEP applications target
  • Several national European initiatives
  • Italy INFN
  • UK, France, Netherlands,

32
The DataGRID Project
33
The Data Grid Project
  • Proposal for EC Fifth Framework funding
  • Principal goals
  • Middleware for fabric and Grid management
  • Large scale testbed
  • Production quality demonstrations
  • mock data, simulation analysis, current
    experiments
  • Three-year phased developments and demos
  • Collaborate with and complement other European
    and US projects
  • Open source and communication
  • GRID Forum
  • Industry and Research Forum

34
DataGRID Partners
  • Managing partners
  • PPARC (UK), INFN (Italy)
  • CNRS (France), NIKHEF (Holland)
  • ESA/ESRIN (Italy), CERN
  • Industry
  • IBM (UK), Compagnie des Signaux (F), Datamat (I)
  • Associate partners
  • Istituto Trentino di Cultura, Helsinki Institute
    of Physics, Swedish Science Research Council,
    Zuse Institut Berlin, University of Heidelberg,
    CEA/DAPNIA (F), IFAE Barcelona, CNR (I), CESNET
    (CZ), KNMI (NL), SARA (NL), SZTAKI (HU)

35
Preliminary programme of work
  • Middleware
  • Grid Workload Management (C. Vistoli/INFN-CNAF)
  • Grid Data Management (B. Segal/CERN)
  • Grid Monitoring services (R. Middleton/RAL)
  • Fabric Management (T. Smith/CERN)
  • Mass Storage Management (J. Gordon/RAL)
  • Testbed
  • Testbed Integration (F. Etienne/CNRS-Marseille)
  • Network Services (C. Michau/CNRS)
  • Scientific Applications
  • HEP Applications (F. Carminati/CERN)
  • Earth Observation Applications (L.
    Fusco/ESA-ESRIN)
  • Biology Applications (C. Michau/CNRS)

36
Middleware
  • Wide-area - building on an existing framework
    (Globus)
  • workload management
  • The workload is chaotic: unpredictable job
    arrival rates, data access patterns
  • The goal is maximising the global system
    throughput (events processed per second)
  • data management
  • Management of petabyte-scale data volumes, in an
    environment with limited network bandwidth and
    heavy use of mass storage (tape)
  • Caching, replication, synchronisation, object
    database model
  • application monitoring
  • Tens of thousands of components, thousands of
    jobs and individual users
  • End-user - tracking of the progress of jobs and
    aggregates of jobs
  • Understanding application and grid level
    performance
  • Administrator: understanding which global-level
    applications were affected by failures, and
    whether and how to recover

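Caching, replication and synchronisation as described above rest on a catalogue that maps logical file names to the sites holding physical copies. A minimal sketch (class and method names are assumptions, not the DataGRID design):

```python
# Toy replica catalogue: logical file name (LFN) -> sites holding a copy.
# A data-management layer would consult this to serve reads locally and
# to decide where new replicas are needed.
class ReplicaCatalog:
    def __init__(self):
        self.replicas = {}                      # lfn -> set of site names

    def register(self, lfn, site):
        self.replicas.setdefault(lfn, set()).add(site)

    def locate(self, lfn, preferred=None):
        """Return a site holding `lfn`, favouring the preferred (local) site."""
        sites = self.replicas.get(lfn, set())
        if preferred in sites:                  # local copy: no WAN traffic
            return preferred
        return min(sites) if sites else None    # else any holder (alphabetical)

rc = ReplicaCatalog()
rc.register("run42.esd", "CERN")
rc.register("run42.esd", "Lyon")
```

The local-first rule reflects the bandwidth constraint stated earlier: WAN capacity is a scarce resource, so reads should hit a nearby replica whenever one exists.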
37
Middleware
  • Local fabric
  • Effective local site management of giant
    computing fabrics
  • Automated installation, configuration management,
    system maintenance
  • Automated monitoring and error recovery -
    resilience, self-healing
  • Performance monitoring
  • Characterisation, mapping, management of local
    Grid resources
  • Mass storage management
  • multi-PetaByte data storage
  • real-time data recording requirement
  • active tape layer, 1,000s of users
  • uniform mass storage interface
  • exchange of data and meta-data between mass
    storage systems

38
Infrastructure
  • Operate a production-quality trans-European
    testbed interconnecting clusters in several
    sites
  • Initial testbed participants: CERN, RAL, INFN
    (several sites), IN2P3-Lyon, ESRIN (ESA-Italy),
    SARA/NIKHEF (Amsterdam), ZUSE Institut (Berlin),
    CESNET (Prague), IFAE (Barcelona), LIP (Lisbon),
    IFCA (Santander) ……
  • Define, integrate and build successive releases
    of the project middleware
  • Define, negotiate and manage the network
    infrastructure
  • assume that this is largely TEN-155 and then
    GÉANT
  • Stage demonstrations, data challenges
  • Monitor, measure, evaluate, report

39
Applications
  • HEP
  • The four LHC experiments
  • Live testbed for the Regional Centre model
  • Earth Observation
  • ESA-ESRIN
  • KNMI (Dutch meteo) climatology
  • Processing of atmospheric ozone data derived from
    ERS GOME and ENVISAT SCIAMACHY sensors
  • Biology
  • CNRS (France), Karolinska (Sweden)
  • Application being defined

40
Data Grid Challenges
  • Data
  • Scaling
  • Reliability

41
DataGRID Challenges (ii)
  • Large, diverse, dispersed project
  • but coordinating this European activity is one of
    the project's raisons d'être
  • Collaboration, convergence with US and other Grid
    activities: this area is very dynamic
  • Organising adequate network bandwidth: a
    vital ingredient for the success of a Grid
  • Keeping the feet on the ground: the GRID is a
    good idea, but not the panacea suggested by some
    recent press articles

42
Conclusions on LHC Computing
  • The scale of the computing needs of the LHC
    experiments is large compared with current
    experiments
  • each experiment is one to two orders of magnitude
    greater than the TOTAL capacity installed at CERN
    today
  • We believe that the hardware technology will be
    there to evolve the current architecture of
    commodity clusters into large scale computing
    fabrics
  • But there are many management problems -
    workload, computing fabric, data, storage in a
    wide area distributed environment
  • Disappointingly, solutions for local site
    management on this scale are not emerging from
    industry
  • The Grid technologies look very promising to
    deliver a major step forward in wide area
    computing usability and effectiveness
  • But a great deal of work will be required to
    make this a reality

These are general problems; HEP has just come
across them first