Large-Scale Distributed Systems (Systèmes distribués à grande échelle)

Provided by: lri

Transcript and Presenter's Notes

1
A Nation-Wide Experimental Grid
2
Grid: a renewal of distributed-system problems
Grids raise many research issues: security, performance, fault tolerance, load balancing, fairness, coordination, message passing, data storage, programming, communication protocols and architecture, deployment, etc.
Theoretical models and simulators cannot capture real-life conditions, and production platforms have great difficulty reproducing experimental conditions.
How can we test and compare the following?
  • Fault tolerance protocols
  • Security mechanisms
  • Deployment tools
  • etc.

3
Tools for Distributed System Studies
To investigate distributed-system issues, we need:
  • 1) Tools (models, simulators, emulators, experimental platforms)
  • 2) Strong interaction between these research tools
[Figure: Tools for Large Scale Distributed Systems, plotted as log(cost) against log(realism)]
  • Math: models of systems, applications, platforms and conditions
  • Simulation: key system mechanisms, algorithms and application kernels, on virtual platforms under synthetic conditions
  • Emulation: real systems and real applications, on in-lab platforms under synthetic conditions
  • Live systems: real systems and real applications, on real platforms under real conditions
4
We need a Grid experimental platform
To our current knowledge, there is no large-scale testbed dedicated to Grid experiments.
  • Grid5000 as a live system
  • Grid eXplorer as a large scale emulator

[Figure: existing tools placed on the log(cost) against log(realism) chart]
  • Math: models, protocol proofs
  • Simulation: SimGrid, MicroGrid, Bricks, NS, etc.
  • Emulation: Grid eXplorer, WANinLab, Emulab
  • Live systems: Grid5000, TERAGrid, PlanetLab, Naregi Testbed
5
What do we need for Grid experiments?
  • Remotely controllable Grid nodes installed in geographically distributed laboratories
  • A controllable and monitorable network between the Grid nodes
  • A middleware infrastructure connecting the nodes (security)
  • A playground to prepare experiments
  • A toolkit to deploy, manage and run experiments and collect results

6
The Grid5000 Project
1) Build a nation-wide experimental platform for Grid research (like a particle accelerator for computer scientists)
  • 10/11 geographically distributed sites
  • Every site hosts a cluster (from 256 CPUs to 1K CPUs)
  • All sites are connected by RENATER (the French academic network)
  • RENATER hosts probes to trace network load conditions
  • Design and develop a system/middleware environment to safely test and repeat experiments
2) Use the platform for Grid experiments
  • Address critical issues of Grid system/middleware: programming, scalability, fault tolerance, scheduling
  • Address critical issues of Grid networking: high performance transport protocols, QoS
  • Port and test applications
  • Investigate original mechanisms: P2P resource discovery, Desktop Grids

7
Grid5000 Big Picture
[Figure: Grid5000 architecture diagram. Labels recovered from the slide; their layout was lost in extraction]
  • Control site, Site 1, Site 2, Site 3
  • Users (ssh login, password)
  • Front end, Control Master, Control Slave
  • LAB/Firewall, Router, Firewall/NAT, Labs Network
  • Test Cluster
  • Gateway VPN (192. For all nodes)
One machine can be seen as a Virtual Grid Gateway.
8
Grid5000 Committees
Technical Committee
  • David Gueldrech (Sophia)
  • Jean Claude Barbet (Orsay)
  • Franck Bonnassieux (UREC)
  • Julien le duc (Grenoble)
  • Fred Desprez (Lyon)
  • Yvon Jégou (Rennes)
  • Olivier Coulaud (Bordeaux)
  • Frédéric Barbaresco (Toulouse)
Steering Committee (organizer: Franck Cappello, Orsay)
  • Thierry Priol (ACI Grid Director)
  • Brigitte Plateau (President of ACI Grid SC)
  • Dani Vandrome (Director of Renater)
  • Frédéric Desprez (Lyon)
  • Michel Daydé (Toulouse)
  • Yvon Jégou (Rennes)
  • Stéphane Lantéri (Sophia)
  • Raymond Namyst (Bordeaux)
  • Pascale Primet (Lyon)
  • Olivier Richard (Grenoble)
Forums
  • Deployment/exploitation: Franck Cappello (AS1, RTP8)
  • Programming models: Raymond Namyst (AS2, RTP8)
9
Grid5000 Schedule
[Figure: project timeline. The mapping of milestones to dates was lost in extraction]
  • Dates: Sept 03, Nov 03, Jan 04, March 04, Jun/July 04, Sept 04, Oct 04, Nov 04
  • Milestones: Call for Expression Of Interest, Vendor selection, Installation and first tests, Final review, First Demo (SC04), Call for proposals, Selection of 7 sites, ACI GRID Funding
  • Activities: Grid5000 Hardware, Grid5000 System/middleware Forum, Security Prototypes, Control Prototypes, Grid5000 Programming Forum, Grid5000 Builder Community, Grid5000 Experiments
10
Grid5000 Funding (ACI Local District/Prefecture)
[Figure: map with funding labels, in M: 0.6, 0.4, 0.5, 0.35, 0.5, 0.3?, 0.35]
Grid5000 2004: 3M for hardware only
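As a quick consistency check, the seven amounts listed on this funding slide can be added up and compared with the 3M hardware figure. The sketch below assumes that all seven values are expressed in the same unit (millions) and that the uncertain "0,3?" entry is exactly 0.3. Both are assumptions, not facts stated on the slide, and the variable names are illustrative only.

```python
from decimal import Decimal

# Amounts as listed on the slide, with decimal commas converted to points.
# Assumption: every value is in millions, and "0,3?" is taken as 0.3.
site_amounts = [
    Decimal(x) for x in ("0.6", "0.4", "0.5", "0.35", "0.5", "0.3", "0.35")
]

# Decimal avoids binary floating-point rounding in the sum.
total = sum(site_amounts, Decimal("0"))

print(f"Sum of listed amounts: {total}M")
print("Matches the 3M hardware figure:", total == Decimal("3"))
```

Under these assumptions the listed amounts add up to the 3M quoted for hardware.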
11
Grid5000 in September 2004
[Figure: map of Grid5000 nodes. Label: 3 (soon 4)]
12
Summary of Grid5000 XPs
  • Networking
  • - End Host Communication layer
  • - High performance long distance protocols
  • - High Speed Network Emulation
  • - Grid Networking Layer
  • Middleware / OS
  • - Grid5000 control/access/experiment automation
  • - Scheduling / data distribution in Grid
  • - Fault tolerance in Grid
  • - Resource management
  • - Computational Steering
  • - Grid SSI OS and Grid I/O
  • - Desktop Grid/P2P systems
  • Programming
  • - Component programming for the Grid (Java, Corba)
  • - GRID-RPC
  • - GRID-MPI
  • - Code Coupling
  • Applications

13
Middleware 1 (XP) Grid5000
XP: eXPeriments on Grid5000
  • Grid5000 control
  • - Computing environment deployment (Ka-tools)
  • - Experiment automation (security and control)
  • - VGrid: mapping a virtual Grid onto a real testbed
  • - Monitoring, benchmarking, performance characterization and analysis
  • Grid scheduling / data distribution
  • - Scheduling: data transfers, global communications, work stealing, ...
  • - Data redistribution in Grid
  • - Task distribution and load balancing in heterogeneous Grid
  • - Mixed parallelism (task and data parallelism)
  • - Mixing data management and task scheduling
  • - Hierarchical and distributed scheduling
  • Fault tolerance in Grid
  • - Fault tolerant Grid-RPC (RPC-V)
  • - Hierarchical fault tolerant MPI (MPICH-V)
  • - Fault tolerance in the data-flow approach (Athapascan)

14
Middleware 2 (XP) Grid5000
  • Grid Management
  • - AROMA tool: resource management over a Grid of clusters with different classes of services
  • - Mobile agents for open Grid management
  • - Management of Grids and hosted services (security, QoS, monitoring control, dynamic configuration, ...)
  • Optimization for wide area distributed query processing
  • Tools to support the development, administration and usage of heterogeneous resources over the Grid
  • Virtualization of data storage on Grids
  • Automatic deployment of the GridRPC middle tier
  • - Multi-cluster and lightweight Grid resource management (OAR/CIGRI)
  • Global Computing/P2P Middleware
  • - Executing Web Services on Desktop Grid Workers (XtremWeb)
  • - Distributing the coordination in Desktop Grids (XtremWeb)
  • - Harnessing clusters as parallel workers
  • - Probabilistic certification in peer-to-peer systems
  • - Large scale data sharing service based on JXTA (JuxMem)
  • - Management services for textual documents in P2P systems

15
Network (XP) Grid5000
  • End Host Communication layer
  • Communication libraries: Madeleine, MPICH/Madeleine
  • - Intelligent usage of NICs for local and wide area communications
  • - Direct file access over Myrinet: ORFA/NFS and ORFA/LUSTRE
  • High performance long distance protocols
  • - Alternative transport for very high speed networks (backpressure)
  • - Differentiated transport with delay control on WAN
  • Reliable active and non-active multicast
  • Network bandwidth optimization in Grid (VTHD, Paco)
  • - High performance communication across heterogeneous networks
  • Fast forwarding and multiplexing of data on gateway nodes
  • High Speed Network Emulation
  • - Automatic deployment of emulated high speed domains
  • - Experiment design for studies of grid flow interactions
  • Grid Networking Layer
  • - Network resource and QoS on demand
  • - Grid overlay and programmable routers
  • Measurement services for network-aware middleware

16
Programming (XP) Grid5000
  • Component programming on the grid
  • - ProActive: a JAVA library (parallel, distributed, concurrent computing with security and mobility)
  • Assessment of scalability, deployment, security and fault tolerance issues
  • Hierarchical components architecture
  • PadicoTM/Paco: combining parallel and distributed computing
  • RPC Environment
  • Large scale experimentation of the DIET platform (Distributed Interactive Engineering Toolbox)
  • Client/Agent/Server model following the GridRPC standard, with distributed scheduling agents
  • MPI Environment
  • - Time sharing Grid resources
  • Migration over clusters with heterogeneous high speed networks
  • Code Coupling

17
Applications 1 (XP) Grid5000
  • Multi-parametric applications
  • - ACI GRID-TLSE Project: expertise site for sparse linear algebra
  • - Climate modeling and Global Change
  • DataGène Project: functional genomics
  • Large scale experimentation of distributed applications
  • MECAGRID (ACI GRID project, Smash project-team)
  • Massively parallel computations in multi-material fluid mechanics
  • Study of numerical algorithms for heterogeneous computing platforms
  • Grid computing for medical applications (Epidaure project-team)
  • Interoperable medical image registration grid service
  • Optimal design of complex systems (Coprin project-team)
  • Evaluation of parallel optimization algorithms based on interval analysis techniques
  • Study of load balancing strategies on heterogeneous resources
  • Fluid mechanics, molecular dynamics and host-parasite systems in population dynamics, etc.
  • CFD and astrophysics applications
  • Collaborative tools in virtual 3D environments

18
Applications 2 (XP) Grid5000
  • Steering
  • JECS: a JAVA Environment for Computational Steering
  • Distributed computing and interactive visualization of 3D numerical simulations (Caiman and Oasis project-teams)
  • Collaborative environment
  • Computational Electromagnetism application (JEM3D)
  • Steering of numerical simulations (ACI GRID-EPSN Project)
  • Parallel on-line visualization / monitoring
  • Data redistribution
  • Computational steering by direct image manipulation