1
Grid Computing: a Project for Coimbra in ATLAS Physics and beyond
Helmut Wolters, Coimbra, 26/5/2006
2
Overview
  • What is the GRID?
  • Role of LIP in the GRID
  • Advanced computing in Coimbra at the Center for
    Computational Physics: Centopeia
  • Joint venture LIP/CFC
  • Building the Tier2 node at LIP Coimbra/Lisbon

3
What is the GRID ?
  • So you have the Internet
  • it links all computers to a common network (if
    you want)
  • what more do you need?
  • The GRID is much more than that !
  • it offers completely new perspectives
  • and it is not easy to implement

4
What is the GRID?
  • GRID computing is a recent concept which takes
    distributed computing a step further
  • The name GRID was chosen by analogy with the
    electric power grid
  • Transparent plug-in to obtain computing power
    without worrying where it comes from
  • Permanent and available everywhere
  • Pay per use
  • The World Wide Web provides seamless access to
    information that is stored in many millions of
    different geographical locations.
  • In contrast, the GRID is a new computing
    infrastructure which provides seamless access to
    computing power and data storage distributed all
    over the globe.

5
Motivation to build a GRID
  • Single institutions are no longer able to support
    the computing power and storage capacity needed
    for modern scientific research.
  • Compute-intensive sciences which are presently
    driving GRID development:
  • Physics/Astronomy: data from different kinds of
    research instruments
  • Medical/Healthcare: imaging, diagnosis and
    treatment
  • Bioinformatics: study of the human genome and
    proteome to understand genetic diseases
  • Nanotechnology: design of new materials from the
    molecular scale
  • Engineering: design optimization, simulation,
    failure analysis, and remote instrument access and
    control
  • Natural Resources and the Environment: weather
    forecasting, earth observation, modeling and
    prediction of complex systems, river floods and
    earthquake simulation

6
GRID vs. Distributed Computing
  • Distributed infrastructures already exist, but
  • they normally tend to be specialized and local
    systems
  • intended for a single purpose or user group
  • Restricted to a limited number of users
  • Do not allow coherent interactions with resources
    from other institutions.
  • The GRID goes further and takes into account
  • Different kinds of resources
  • Not always the same hardware, data, applications
    and admin. policies.
  • Different kinds of interactions
  • User groups or applications want to interact with
    Grids in different ways.
  • Access computing power / storage capacity across
    different administrative domains by an unlimited
    set of non-local users.
  • Dynamic nature
  • Resources added/removed/changed frequently.
  • World wide dimension.

7
The GRID metaphor
  • The interaction between heterogeneous resources
    (owned by geographically spread organizations),
    applications and users is only possible through
  • the use of a specialized layer of software
    called middleware.
  • The middleware hides all the technical details of
    the infrastructure, allowing a transparent and
    uniform interaction with the grid (see the
    job-submission sketch below)
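  To make the middleware idea concrete, here is a minimal sketch of how a
  user interacted with the LCG-2/EGEE workload management of that era:
  write a small JDL job description and hand it to the middleware, which
  picks a suitable site and runs the job there. The file name, the choice
  of /bin/hostname as payload and the wrapping in Python are illustrative
  assumptions, not part of the presentation; a valid grid proxy and an
  LCG-2 User Interface with the edg-job-* commands are taken for granted.

    # Minimal job-submission sketch (assumes an LCG-2 UI and a valid proxy).
    import subprocess
    import textwrap
    from pathlib import Path

    # A tiny JDL description: run /bin/hostname "somewhere on the grid"
    # and bring stdout/stderr back in the output sandbox.
    jdl = textwrap.dedent("""\
        Executable    = "/bin/hostname";
        Arguments     = "-f";
        StdOutput     = "hostname.out";
        StdError      = "hostname.err";
        OutputSandbox = {"hostname.out", "hostname.err"};
    """)
    Path("hello_grid.jdl").write_text(jdl)

    # The middleware decides which Computing Element actually runs the job;
    # the user never has to know which site that is.
    subprocess.run(["edg-job-submit", "hello_grid.jdl"], check=True)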

8
How is LIP involved
  • LIP is involved in
  • LHC Computing GRID (LCG)
  • The biggest worldwide GRID infrastructure
  • Will be used for the analysis of data produced by
    the LHC accelerator built at CERN
  • Enabling GRIDS for E-Science in Europe (EGEE)
  • A European GRID infrastructure being built for
    multi-disciplinary sciences
  • Iniciativa Nacional GRID (Portuguese National GRID Initiative)
  • Portuguese Government

9
The LCG project
  • The LHC will be the world's most powerful particle
    accelerator (2007)
  • Accelerates two beams of particles (protons or Pb
    ions) in opposite directions, around a 27-km
    tunnel and at velocities close to the speed of
    light
  • These beams are smashed against each other
    producing new particles which will be detected by
    4 LHC experiments (CMS, ATLAS, ALICE and LHC-b).
  • Such collisions are expected to produce the
    states of matter which may have existed in the
    very first instants of the Universe.

10
The LCG project
  • The simulation and reconstruction of a full
    (central) PbPb collision at the LHC (ALICE, about
    84000 primary tracks!) takes 12 hours on a top PC
    and produces more than 2 GB of output.

11
The LCG project
  • pp in ATLAS: still about 1000 s and 5 MB for a tt
    (top-quark pair) production event on a Pentium IV
    2.6 GHz
  • global estimate: 140 CPUs and up to 40 TB of
    storage (see the back-of-envelope sketch below)
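  As a rough cross-check of these numbers, the sketch below works out how
  many such events fit into the quoted 40 TB and how long 140 CPUs would
  need to produce them. The event count is a derived assumption (40 TB
  filled entirely with 5 MB events), not a figure from the talk.

    # Back-of-envelope check of the slide's estimate, under stated assumptions.
    sec_per_event = 1000      # ~1000 s per simulated tt event (slide)
    mb_per_event = 5          # ~5 MB of output per event (slide)
    storage_tb = 40           # quoted storage budget (slide)
    cpus = 140                # quoted CPU estimate (slide)

    events = storage_tb * 1e6 / mb_per_event            # events that fit in 40 TB
    wall_days = events * sec_per_event / cpus / 86400   # days of continuous running
    print(f"{events:.1e} events, about {wall_days:.0f} days on {cpus} CPUs")
    # -> 8.0e+06 events, about 661 days on 140 CPUs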

12
The LCG project
  • LHC Computing GRID
  • LCG aims to build/maintain a data storage and
    analysis infrastructure for the large LHC physics
    community.
  • LHC experiments are expected to produce 10
    Petabytes of experimental data annually
  • 10⁹ collisions/second (1 GHz) → 100 Hz (after
    filtering)
  • 1 collision ≈ 1 MB of data
  • 100 MB/s → 10 PB per year ≈ a 20 km stack of CDs
    (see the arithmetic sketch below)
  • Computing power: about 100 000 of today's fastest
    PCs for analysis and simulation
  • Needs to be available during the 15-year lifetime
    of the LHC machine
  • Fully accessible to about 5000 scientists from
    more than 500 institutes around the world.

13
EGEE project
  • EGEE is a European Union project whose fundamental
    goal is the deployment of a secure, reliable and
    robust grid for all fields of science.
  • EGEE is built on top of the infrastructures
    developed for LCG
  • It integrates national/regional grids
  • Strong requirements for middleware development to
    provide the proper tools to science fields such
    as High-Energy Physics and Bio-medical
    applications
  • Very strong commitment in the dissemination of
    grid technologies
  • Extensive support and training tutorials for both
    users and administrators.

14
LIP and the EGEE organization
  • EGEE South-West federation (SWE)
  • includes LIP and several Spanish sites
  • Responsible for the operation of essential core
    services
  • Responsible for the monitoring of grid resources
  • Receives, responds to and coordinates GRID
    operation problems
  • Regional Operations Centre (ROC)
  • Coordinates the EGEE federations
  • Shared among the different SWE institutes
  • Certifies that a site fulfills all requirements
    to join production infrastructure
  • Negotiates service level agreements (SLAs)
  • EGEE South-West federation offers
  • 8 Resource Brokers and 8 top BDII machines as
    production core services.
  • Local sites deploy
  • 738.3 CPUs (value normalized to 1000 SpecInt2000;
    see the normalization sketch below)
  • 7.7 TB of online storage and 2.9 PB of nearline
    storage (tape backend)
  • Shared by more than 20 virtual organizations.
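  The "normalized CPUs" figure is simply the federation's total SpecInt2000
  benchmark capacity divided by 1000. The sketch below illustrates the
  calculation; the per-CPU ratings in it are made-up placeholders, not the
  actual SWE inventory.

    # Normalized CPU count = total SpecInt2000 capacity / 1000.
    inventory = [
        # (number of CPUs, SpecInt2000 rating per CPU) -- illustrative values
        (100, 1200),
        (200, 900),
    ]
    normalized = sum(n * rating for n, rating in inventory) / 1000.0
    print(f"{normalized:.1f} normalized CPUs")   # -> 300.0 normalized CPUs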

15
LCG tier model
[Diagram: the LCG tier hierarchy, from Tier 0 (the accelerator centre)
through Tier 1 and Tier 2 centres down to Tier 3 (small centres) and
Tier 4 (desktops, portables). Sites shown include RAL, IN2P3, IC, FNAL,
IFCA, UB, NIKHEF, Cambridge, TRIUMF, Budapest, CNAF, Prague, FZK, Taipei,
BNL, PIC, LIP, Nordic, ICEPP, Legnaro, CSCS, IFIC, Rome, CIEMAT, MSU,
USC and Krakow.]
  • Tier-0: the accelerator centre
  • Filter → raw data → reconstruction → event
    summary data (ESD)
  • Record the master copy of raw and ESD
  • Tier-1
  • Managed Mass Storage: permanent storage of raw
    data, ESD, calibration data, meta-data, analysis
    data and databases (tape)
  • Data-heavy (ESD-based) analysis
  • Re-processing of raw data
  • National, regional support
  • Online to the data acquisition process: high
    availability, long-term commitment

  • Tier-2
  • Well-managed, grid-enabled disk storage
  • End-user analysis: batch and interactive
  • Simulation

16
Advanced computing in Coimbra
CFC - Center for Computational Physics: Centopeia
17
Centopeia
  • System for parallel computing
  • 108 "legs": Intel Pentium 4
  • 12 x 2.2 GHz,
  • 24 x 2.4 GHz,
  • 60 x 2.8 GHz,
  • 12 x 3.0 GHz
  • ⇒ GRID-like technical infrastructure
  • UPS: 23 kVA
  • Air conditioning

18
Centopeia: how is it used?
  • it is open to external users
  • 30 from the University of Coimbra (18 from CFC,
    the rest from other Centers/Departments)
  • 25 outside UC (Porto, IST, Aveiro, Évora, Braga,
    etc.)
  • Some topics
  • Dynamic processes in molecules: TDDFT
    (Time-Dependent Density Functional Theory)
  • Lattice QCD (gluon propagator)
  • Protein Unfolding (NAMD code)
  • Optimization of molecular structures (PSM-MGP)
  • Monte-Carlo Simulations of various problems

19
Centopeia: how is it used?
  • Is this a GRID?
  • Of course not: the system is open to everyone,
    but it is more like a single parallel
    supercomputer
  • Can it become part of the GRID?
  • Not easily, because it is a uniform parallelized
    system.
  • Gridification takes away many of the optimizations
    of parallel processing

20
Where is CFC going?
  • New system (coming soon)
  • 130 Sun Fire X4100, each with 8 GB RAM and 2 x AMD
    Opteron dual-core 2.2 GHz: 520 CPU cores, 1.5
    Teraflops
  • 2 x Gigabit network (data/administration)
  • 6 TB central storage
  • SuSE Linux Enterprise Server 9
  • Sun GRID Engine
  • Gigabit link to University Computing Center
  • Challenges and necessities
  • strengthen links with the HPC community in Coimbra
    and Portugal, including enterprises
  • Integration in GRID projects, in Coimbra and
    nationwide
  • large bandwidth allocation by RCCN (the public
    scientific and educational network) (dedicated
    lines?)
  • So there are challenges in common with LHC Grid !

21
Why should we join efforts ?
  • LIP needs much more computing power for
    implementing the GRID Tier2 node
  • we need computing power and storage for
    Simulation, Reconstruction and Data Analysis in
    ATLAS
  • LIP Lisbon is very restricted due to physical
    space limitations
  • Basic infrastructure is a very expensive part of
    computing installations at this level
  • we can share the space
  • we can share the network (not really, but we can
    join efforts)
  • even if the computing part were kept separate

22
How do we join efforts ?
  • joint venture
  • gridify Centopeia once the new cluster is up and
    running
  • install an LHC-compatible GRID structure
  • that gives us 108 Pentium IV CPUs
  • 200 Gigaflops
  • This is quite a good start
  • Share efforts (funding requests, ...) in
  • infrastructure
  • manpower
  • computing power (?)

23
How can we get closer
  • Share efforts in computing power (?)
  • LHC experiments will need a lot of computing
    power
  • convince CFC people to gridify their new cluster
    in the LHC sense?
  • So we could become clients

24
How can we get closer
  • No way
  • LHC Simulation and Analysis software is not
    optimized for parallel computing
  • in fact, it depends on a very CERN-like
    software environment (Scientific Linux,
    libraries, ...)
  • our simulation software only runs on Scientific
    Linux 3.0.6
  • it has to be gcc 3.2.3
  • if we adapt the system to LCG needs, the system
    loses the best part of its capacities
  • you probably won't even get SLC 3 to run on this
    architecture easily

25
How can we get closer
  • the CFC philosophy is "adapt your software to
    the system" (a parallel supercomputer)
  • the LHC Grid philosophy is "adapt the system to
    the software needs", at least for now
  • LHC simulation and analysis software is a very
    complex system of libraries and tools, which is
    not easy to adapt
  • results are needed now, soon

26
How can we get closer
  • maybe we can meet somewhere in a common GRID, but
    not yet
  • challenge: create parallelized analysis/simulation
    tools?
  • CERN software must become more CERN-infrastructure
    independent. This is a long-term process.
  • Collaboration with CFC is a very interesting
    opportunity

27
Maybe we can share something ?
  • sometimes it could run as a GRID and sometimes as
    a parallel computer
  • This is nothing completely strange in LCG
  • Example: ATLAS TDAQ Event Filter (online!), Event
    Handler Requirements / Use Cases
  • UC010
  • The hardware of the EF can be shared with the
    offline computing, in particular with the Tier 0
    of the LHC Computing GRID model. During data
    taking periods, a part of the Tier 0 machines,
    possibly variable in size over time, can be used
    by the EF. Conversely, the machines normally
    devoted only to the EF can be used for offline
    work when ATLAS is not running.

28
Maybe we can share something ?
  • probably we cannot use the same operating system.
  • We could try
  • dual boot: very confusing to configure and
    maintain
  • chroot: a virtual environment contained within
    the host operating system (see the sketch below)
  • XEN: emulate a virtual machine on top of the host
    operating system
  • But these are hypothetical options; they can be
    tried once we have the systems up and running in
    production mode
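  Of the three options, chroot is the lightest-weight one. The sketch below
  shows the basic idea: run a command inside an SLC 3.0.6 tree that lives
  under the host operating system. The path is a hypothetical example, root
  privileges are required, and the bind mounts of /proc, /dev and similar
  that a real setup needs are omitted for brevity.

    # Minimal chroot sketch (hypothetical path, must run as root).
    import os

    SLC_ROOT = "/srv/chroots/slc306"   # assumed location of the SLC 3.0.6 tree

    pid = os.fork()
    if pid == 0:
        # Child: enter the chroot and run a command inside it.
        os.chroot(SLC_ROOT)
        os.chdir("/")
        os.execv("/bin/sh", ["/bin/sh", "-c", "cat /etc/redhat-release"])
    else:
        # Parent: wait for the chrooted command to finish.
        os.waitpid(pid, 0)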

29
LHC Computing GRID (LCG)
  • The largest GRID infrastructure in operation
    (data from 14/5/2006)

30
LIP Tier2 - Site 1
LIP Lisboa
31
LIP Tier2 - Site 1
  • CPUs for GRID
  • 6 dual Xeon 2.8GHz
  • 9 dual PIII 933MHz
  • 20 single CPU (PIV and PIII)
  • partly for GRID
  • 17 AMD 2.2 GHz (2 CPUs dual core)
  • 18 AMD 2.2 GHz (2 CPUs single core)
  • 8 TB disk storage
  • future plans
  • 2006 - 150 CPUs
  • 2007 - 300 CPUs
  • 2008 - 450 CPUs

32
LIP Tier2
[Map: the two LIP Tier2 sites, Coimbra and Lisboa, about 200 km apart]
33
LIP Tier2 - Site 2
LIP Coimbra
34
Gridification of Centopeia
[Diagram: gridification of Centopeia as a joint CFC/LIP effort]
35
Gridification of Centopeia: initial phase
  • 12 legs taken out of Centopeia for testing
  • installing SLC 3.0.6
  • kernel does not recognize disk interface
  • kernel upgrade from 2.4.x to 2.6.x
  • implementing a minimal GRID environment (see the
    reachability sketch below)
  • Computing Element (CE)
  • Storage Element (SE)
  • Monitoring Box (MonBox)
  • User Interface (UI)
  • Worker Nodes (WN)
  • gaining experience
  • join in all 108 nodes as soon as they become
    available
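  Once this minimal set of services is installed, a quick sanity check is
  to see whether each box answers on its conventional port. The sketch
  below assumes the usual LCG defaults (Globus gatekeeper 2119 on the CE,
  GridFTP 2811 on the SE, LDAP 2170 on the site BDII) and hypothetical
  hostnames; it is a reachability check only, not a replacement for the
  official site certification tests.

    # Quick reachability check for a freshly installed minimal grid site.
    import socket

    services = {
        "CE gatekeeper": ("ce.example.org", 2119),   # hostnames are hypothetical
        "SE gridftp":    ("se.example.org", 2811),
        "site BDII":     ("ce.example.org", 2170),
    }

    for name, (host, port) in services.items():
        try:
            with socket.create_connection((host, port), timeout=5):
                print(f"{name:14s} {host}:{port}  OK")
        except OSError as err:
            print(f"{name:14s} {host}:{port}  FAILED ({err})")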

36
GRIDS - still a work in progress
  • Many key concepts identified, known and tested
  • Major efforts now on establishing
  • Standards (a slow process) (Global Grid Forum,
    http://www.gridforum.org/)
  • Production Grids for multiple Virtual
    Organisations
  • Production: reliable, sustainable, quality of
    service
  • In Europe, EGEE
  • In the US, TeraGrid
  • One stack of middleware that serves many research
    (and other!!!) communities
  • Operational procedures and services (people,
    policy,..)
  • New user communities
  • Research and development continues

37
The final idea
  • GRID technologies are already implemented in more
    than 200 EGEE sites worldwide
  • Strong indication that the scientific community
    is benefiting from such a computing infrastructure
  • GRIDS will become more powerful, sophisticated
    and user-friendly

38
Until the final idea is realized
  • we have to do simulations and analysis
  • we use the LCG
  • we are part of LCG
  • we collaborate in developing LCG
  • we should also study how the CFC supercomputer can
    be useful for us (simulation, analysis)
  • this could help us on the way to a common GRID
  • gives access to much more computing
    infrastructure in the future
  • in Coimbra !

39
Local motivation
  • Portugal is somewhat peripheral in Europe
  • Coimbra is somewhat peripheral in Portugal

40
Local motivation
  • Coimbra, "cidade do conhecimento" (city of
    knowledge)
  • we will have to do something in order to stay in
    the game as a global player, not just as an
    educated spectator with 700 years of great
    history and knowledge
  • be that game:
  • top physics (LIP, CFC, and more)
  • top ICT infrastructure (LIP, CFC)
  • much more (but that is probably not with us)
  • CERN: where the Web was born
  • visibility also matters
