1
The LHC Computing Grid Project
Challenges, Status, Plans
LCG
  • May 2004
  • Jürgen Knobloch
  • IT Department, CERN
  • This file is available at
    http://cern.ch/lcg/presentations/LCG_JK_EU_Visit.ppt

2
CERN
Large Hadron Collider
3
CERN - where the Web was born
Tim Berners-Lee
The World Wide Web provides seamless access to
information that is stored in many millions of
different geographical locations. In contrast,
the Grid is an emerging infrastructure that
provides seamless access to computing power and
data storage capacity distributed over the globe.
WSIS, Geneva, October 10-12, 2003
First Web Server
4
Particle Physics
Establish a periodic system of the fundamental
building blocks and understand forces
5
Methods of Particle Physics
The most powerful microscope
Creating conditions similar to the Big Bang
6
Particle physics data
From raw data to physics results
Raw data (example detector readout):
2037 2446 1733 1699 4003 3611 952 1328 2132 1870 2093 3271 4732 1102 2491 3216 2421 1211 2319 2133 3451 1942 1121 3429 3742 1288 2343 7142
Reconstruction - convert to physics quantities: interaction with detector material, pattern recognition, particle identification
Analysis
Simulation (Monte-Carlo)
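Read as a pipeline, the chain above is: raw detector readings, reconstruction into physics quantities, then analysis, with simulated (Monte-Carlo) events following the same path. A minimal illustrative sketch in Python; every function name and data shape here is invented for clarity and is not taken from any experiment's actual software:

```python
# Illustrative only: invented function names, not real LHC experiment software.

def reconstruct(raw_event):
    """Stand-in for reconstruction: convert raw detector readings into
    physics quantities (pattern recognition, particle identification)."""
    return {"n_hits": len(raw_event), "energy": sum(raw_event)}

def analyse(reco_events):
    """Toy analysis: average reconstructed 'energy' over all events."""
    return sum(e["energy"] for e in reco_events) / len(reco_events)

# Raw data: digitised detector readings, as listed on the slide.
raw_data = [
    [2037, 2446, 1733, 1699, 4003, 3611, 952, 1328],
    [2093, 3271, 4732, 1102, 2491, 3216, 2421, 1211],
]

# Simulated (Monte-Carlo) events would be fed through the same chain.
reconstructed = [reconstruct(event) for event in raw_data]
print("mean event 'energy':", analyse(reconstructed))
```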
7
Challenge 1: Large, distributed community
Offline software effort: 1000 person-years per experiment
Software life span: 20 years
5000 physicists around the world - around the clock
(ATLAS, CMS, LHCb)
8
Challenge 2: Data Volume
Annual data storage: 12-14 PetaBytes per year
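For scale, a quick back-of-the-envelope conversion of that annual volume into an average sustained rate; the assumptions (decimal petabytes, a 365-day year) and the result are derived here, not figures from the slide:

```python
# 12-14 PB/year expressed as an average sustained rate.
# Assumes decimal units (1 PB = 1e15 bytes) and a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600

for petabytes in (12, 14):
    rate_mb_s = petabytes * 1e15 / SECONDS_PER_YEAR / 1e6
    print(f"{petabytes} PB/year ≈ {rate_mb_s:.0f} MB/s sustained")
# -> roughly 380-440 MB/s averaged over the whole year
```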
9
Challenge 3: Find the Needle in a Haystack
10
Therefore: Provide mountains of CPU
Calibration, Reconstruction, Simulation, Analysis
For LHC computing, some 100 million SPECint2000
are needed!
Produced by Intel today in 6 hours
1 SPECint2000 ≈ 0.1 SPECint95 ≈ 1 CERN-unit ≈ 4
MIPS - a 3 GHz Pentium 4 has ~1000 SPECint2000
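Those two figures already fix the scale of the processor farm needed; a quick check using only the numbers quoted above:

```python
# Rough processor count implied by the numbers on this slide.
required_specint2000 = 100e6   # ~100 million SPECint2000 for LHC computing
per_3ghz_pentium4 = 1000       # ~1000 SPECint2000 per 3 GHz Pentium 4

cpus_needed = required_specint2000 / per_3ghz_pentium4
print(f"≈ {cpus_needed:,.0f} Pentium-4-class processors")   # ≈ 100,000
```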
11
The CERN Computing Centre
2,400 processors, 200 TBytes of disk, 12 PB of
magnetic tape
Even with technology-driven improvements in
performance and costs, CERN can provide nowhere
near enough capacity for the LHC!
12
What is the Grid?
  • Resource Sharing
  • On a global scale, across the labs/universities
  • Secure Access
  • Needs a high level of trust
  • Resource Use
  • Load balancing, making most efficient use
  • The Death of Distance
  • Requires excellent networking
  • Open Standards
  • Allow constructive distributed development
  • There is not (yet) a single Grid

6.25 Gbps (20 April 2004)
13
How will it work?
  • The GRID middleware
  • Finds convenient places for the scientist's job
    (computing task) to be run
  • Optimises use of the widely dispersed resources
  • Organises efficient access to scientific data
  • Deals with authentication to the different sites
    that the scientists will be using
  • Interfaces to local site authorisation and
    resource allocation policies
  • Runs the jobs
  • Monitors progress
  • Recovers from problems
  • ... and tells you when the work is complete and
    transfers the result back! (see the sketch below)
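A minimal sketch of that job lifecycle, assuming a toy Site class and invented names throughout; this is not the API of the real LCG/EDG middleware, only the sequence of responsibilities listed above:

```python
# Toy sketch of the middleware's job lifecycle. Every name here is invented
# for illustration; it is not the API of the actual LCG middleware.
import random

class Site:
    def __init__(self, name, free_cpus):
        self.name, self.free_cpus = name, free_cpus

    def submit(self, job):
        # Pretend to run the job; occasionally "fail" so recovery is exercised.
        return {"job": job, "site": self.name, "ok": random.random() > 0.2}

def run_on_grid(job, user, sites):
    site = max(sites, key=lambda s: s.free_cpus)    # find a convenient place to run
    print(f"authenticating {user} at {site.name}")  # authentication / authorisation
    result = site.submit(job)                       # run the job
    while not result["ok"]:                         # monitor progress ...
        print("job failed, resubmitting")           # ... and recover from problems
        result = site.submit(job)
    print(f"{job} complete at {site.name}; transferring output back to {user}")
    return result

run_on_grid("reco-pass-1", "physicist", [Site("CERN", 2400), Site("FZK", 1200)])
```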

14
The LHC Computing Grid Project - LCG
  • Collaboration
  • LHC Experiments
  • Grid projects in Europe and the US
  • Regional and national centres
  • Choices
  • Adopt Grid technology.
  • Go for a Tier hierarchy.
  • Use Intel CPUs in standard PCs.
  • Use the LINUX operating system.
  • Goal
  • Prepare and deploy the computing environment to
    help the experiments analyse the data from the
    LHC detectors.

15
Operational Management of the Project
  • Applications Area - development environment,
    joint projects, data management, distributed
    analysis
  • Middleware Area (now EGEE) - provision of a base
    set of grid middleware: acquisition, development,
    integration, testing, support
  • ARDA - A Realisation of Distributed Analysis for
    LHC
  • CERN Fabric Area - large cluster management, data
    recording, cluster technology, networking,
    computing service at CERN
  • Grid Deployment Area - establishing and managing
    the Grid Service: middleware certification,
    security, operations, registration, authorisation,
    accounting; joint with a new European project,
    EGEE (Enabling Grids for e-Science in Europe)
  • Phase 1 (2002-05) - development of common
    software, prototyping and operation of a pilot
    computing service
  • Phase 2 (2006-08) - acquire, build and operate
    the LHC computing service
16
Most of our work is also useful for
  • Medical/Healthcare (imaging, diagnosis and
    treatment)
  • Bioinformatics (study of the human genome and
    proteome to understand genetic diseases)
  • Nanotechnology (design of new materials from the
    molecular scale)
  • Engineering (design optimization, simulation,
    failure analysis and remote instrument access and
    control)
  • Natural Resources and the Environment (weather
    forecasting, earth observation, modeling and
    prediction of complex systems, earthquakes)

17
Virtual Organizations for LHC and others
18
Deploying the LCG Service
  • Middleware
  • Testing and certification
  • Packaging, configuration, distribution and site
    validation
  • Support - problem determination and resolution,
    feedback to middleware developers
  • Operations
  • Grid infrastructure services
  • Site fabrics run as production services
  • Operations centres - trouble and performance
    monitoring, problem resolution, 24x7 globally
  • Support
  • Experiment integration - ensure optimal use of
    the system
  • User support - call centres/helpdesk with global
    coverage, documentation, training

19
Progressive Evolution
  • Improve reliability, availability
  • Add more sites
  • Establish service quality
  • Run more and more demanding data challenges
  • Improve performance, efficiency
  • scheduling, data migration, data transfer
  • Develop interactive services
  • Increase capacity and complexity gradually
  • Recognise and migrate to de facto standards as
    soon as they emerge

20
Challenges
  • Service quality
  • Reliability, availability, scaling, performance
  • Security - our biggest risk
  • Management and operations
  • grid = a collaboration of computing centres
  • Maturity is some years away - a second (or
    third) generation of middleware will be needed
    before LHC starts
  • In the short term there will be many grids and
    middleware implementations
  • For LCG, inter-operability will be a major
    headache
  • How homogeneous does it need to be?

21
LCG Service Status
  • Certification and distribution process
    established
  • Middleware package components from
  • European DataGrid (EDG)
  • US (Globus, Condor, PPDG, GriPhyN) → the Virtual
    Data Toolkit
  • Agreement reached on principles for registration
    and security
  • Rutherford Lab (UK) providing the initial Grid
    Operations Centre
  • FZK (Karlsruhe) operating the Call Centre
  • LCG-2 software released February 2004
  • More than 40 centres connected with more than
    3000 processors
  • Four collaborations run data challenges on the
    grid

22
Regional centres connected to the LCG grid
> 40 sites, > 3,100 CPUs
23
Data challenges
  • The 4 LHC experiments currently run data
    challenges using the LHC computing grid
  • Part 1: World-wide production of simulated data
  • Job submission, resource allocation and
    monitoring
  • Catalog of distributed data
  • Part 2: Test of Tier-0 operation
  • Continuous (24 x 7) recording of data at up to
    450 MB/s per experiment (target for ALICE in
    2005: 750 MB/s) - see the sketch below
  • First-pass data reconstruction and analysis
  • Distribute data in real-time to Tier-1 centres
  • Part 3: Distributed analysis on the Grid
  • Access to data from anywhere in the world, in an
    organized as well as a chaotic access pattern

Timeline: Part 1 - now; Part 2 - summer 2004; Part 3 - fall 2004
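To put the Part 2 recording targets in perspective, a small derived calculation (decimal units assumed; the per-day volumes are computed here, not quoted on the slide):

```python
# Daily volume implied by the Tier-0 recording targets (decimal MB assumed).
for label, rate_mb_s in [("per experiment, 2004", 450), ("ALICE target, 2005", 750)]:
    tb_per_day = rate_mb_s * 86_400 / 1e6
    print(f"{label}: {rate_mb_s} MB/s ≈ {tb_per_day:.0f} TB/day")
# -> roughly 39 TB/day and 65 TB/day of continuous recording
```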
24
1 Gbyte/s Computing Data Challenge - observed rates
(running in parallel with an increasing production service)
[Plot: observed transfer rate, 0 to 1.4 GB/s, versus time in minutes]
920 MB/s average over a period of 3 days, with an 8-hour period of 1.1 GB/s and peaks of 1.2 GB/s (a daytime tape server intervention is marked on the plot).
In addition: 600 MB/s into CASTOR for 12 hours; then the window of opportunity closed and services started.
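A rough sense of the total volume those sustained rates correspond to (derived arithmetic under decimal units, not a figure from the slide):

```python
# Approximate total data moved at the quoted average rate (decimal units).
avg_mb_s, days = 920, 3
total_tb = avg_mb_s * days * 86_400 / 1e6
print(f"{avg_mb_s} MB/s sustained for {days} days ≈ {total_tb:.0f} TB moved")
# -> roughly 240 TB over the 3-day period
```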
25
High Throughput Prototype (openlab LCG prototype)
  • 10 GE WAN connection
  • ENTERASYS N7 10 GE switches, with multi-GE
    connections to the backbone
  • 180 IA32 CPU servers (dual 2.4 GHz P4, 1 GB
    memory)
  • 56 Itanium servers (dual 1.3/1.5 GHz Itanium 2,
    2 GB memory)
  • 20 tape servers (STK 9940B)
  • 28 disk servers (dual P4, IDE disks, 1 TB disk
    space each)
  • Nodes attached with GE or 10 GE links per node
26
Preparing for 2007
  • 2003 has demonstrated event production
  • In 2004 we must show that we can also handle the
    data even if the computing model is very simple
  • -- This is a key goal of the 2004 Data
    Challenges
  • Target for end of this year
  • Basic model demonstrated using current grid
    middleware
  • All Tier-1s and 25% of Tier-2s operating a
    reliable service
  • Validate security model, understand storage
    model
  • Clear idea of the performance, scaling, and
    management issues

27
LCG and EGEE (and EU projects in general)
  • LCG counts on EGEE middleware with the required
    functionality
  • The experience we have gained with existing
    middleware will be essential - EGEE starts from
    LCG-2
  • LCG focuses on a practical application that
    cannot be satisfied otherwise
  • Pushing technology to its limit
  • We are dedicated to success!
  • LCG provides the running of services for EGEE
  • How else could a 2 year project get quickly
    started?
  • The developments are not special for HEP
  • Other sciences will profit
  • What happens after the 2 (+2) year lifespan of
    EGEE?
  • HEP and CERN are leading users of the GEANT
    network