1
An Architecture for Reconfiguring MPI
Applications in Dynamic and Heterogeneous
Environments
12th SIAM Conference on Parallel Processing for
Scientific Computing
  • Kaoutar El Maghraoui, elmagk@cs.rpi.edu
  • Department of Computer Science
  • Rensselaer Polytechnic Institute
  • http://wcl.cs.rpi.edu/ios/
  • In Collaboration with
  • Dr. Carlos Varela (Thesis Advisor)
  • Dr. Boleslaw Szymanski
  • Travis Desell
  • February 15, 2006

2
Today's Grid Environments
  • Infrastructure
  • Complex, large-scale, dynamic, and prone to high
    fault rates
  • Applications
  • Complex deployment
  • Challenges
  • High-level application development interface
  • Designing and constructing applications for
    adaptability
  • Late mapping of applications to Grid resources
  • Monitoring and control of performance

3
MPI Challenges on Dynamic Grids
  • Tailored for tightly coupled systems
  • Dynamic reconfiguration
  • Process mobility
  • Scale up to accommodate new resources
  • Shrink to accommodate leaving or slow resources
  • Transparent performance monitoring and
    application adaptability
  • Currently handled by the programmer
  • Goal
  • Extending MPI with dynamic reconfiguration and
    adaptability to dynamic computational grids

4
Approach
  • Separation of concerns between the application
    and the middleware
  • Middleware-level
  • When and how to reconfigure applications?
  • Application-level
  • Problem solving
  • Support for migration and/or malleability
  • Gap-bridging software
  • High-level APIs
  • Library support to integrate applications and
    middleware

5
IOS Overview
  • The Internet Operating System (IOS) is a
    decentralized middleware framework that provides
  • Opportunistic load balancing capabilities
  • Resource-level profiling
  • Application-level profiling
  • Goal
  • Automatic reconfiguration of applications in
    dynamic environments (e.g., Computational Grids)
  • Scalability to worldwide execution environments
  • Modular architecture enabling evaluation of
    different load balancing and resource profiling
    strategies
  • Generic Interfaces to interoperate with various
    programming models

6
IOS Architecture
  • Distributed middleware agents
  • Encapsulate modules for resource profiling and
    reconfiguration policies.
  • Capable of interconnecting in various virtual
    topologies (hierarchical or P2P)
  • Interfaces with high-level applications
  • Interfacing with IOS agents
  • Applications implement specific APIs to interface
    with IOS agents
  • Applications need to support component
    migration/malleability

7
IOS Architecture
(Diagram: an IOS-enabled node. The application component exchanges messages
with its peers and talks to the local IOS agent through the IOS API, sending
application profiling information and receiving reconfiguration requests
(migrate/split/merge/replicate). Inside the agent, the profiling module
interfaces to the CPU, memory, and network resource profilers and gathers
communication profiles and inter-delay information; the decision module
evaluates the gain of a potential reconfiguration from the list of profiles
and the available processing capacity; and the protocol module initiates,
sends, and receives steal requests to and from other agents.)
8
IOS Load Balancing Strategies
  • Modularity for customizable load balancing and
    profiling strategies, e.g.
  • Random work-stealing (RS)
  • Based on Cilk's work-stealing approach
  • Lightly loaded nodes send work-steal packets to
    heavily loaded nodes (the steal decision is
    sketched after this list)
  • Application topology-sensitive work-stealing
    (ATS)
  • Extension to RS
  • Collocate processes communicating frequently
  • Network topology-sensitive work-stealing (NTS)
  • Extension to ATS
  • Considers network topology
  • Minimizes WAN latencies
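
A minimal sketch of how such a steal decision might be evaluated, assuming a
simplified cost model in which each node advertises an effective processing
rate; the names (node_profile_t, grant_steal) and the overhead constant are
illustrative assumptions, not part of the IOS implementation.

#include <stdbool.h>

/* Assumed, simplified resource profile exchanged between agents. */
typedef struct {
    double processing_rate;   /* effective work units per second on this node */
} node_profile_t;

#define MIGRATION_OVERHEAD_SECS 2.0   /* assumed cost of checkpoint + respawn */

/* Grant a steal request from a lightly loaded node only if moving one
   process there is expected to finish it sooner than keeping it local. */
bool grant_steal(double work_remaining,
                 const node_profile_t *local,
                 const node_profile_t *requester)
{
    double t_stay = work_remaining / (local->processing_rate + 1e-9);
    double t_move = work_remaining / (requester->processing_rate + 1e-9)
                    + MIGRATION_OVERHEAD_SECS;
    return t_move < t_stay;
}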

9
Reconfiguring MPI Applications with IOS
  • Extending MPI
  • Semi-transparent checkpointing
  • Process migration support
  • Integration with IOS
  • Currently for iterative applications

10
The MPI/IOS Runtime Architecture
  • Instrumented MPI applications
  • Process Checkpointing and Migration (PCM) library
  • Wrappers for some MPI native calls
  • The MPI library
  • The IOS runtime components

11
MPI/IOS Interactions
12
MPI Process Migration
  • Implemented at the user-level
  • Relies on MPI communicator rearrangements and the
    MPI-2 spawning feature
  • Instrumentation of programs with PCM calls
  • Benefit: portability
  • Limitation: semi-transparency

13
Migration Example
(Diagram: one process of MPI_COMM_WORLD is selected to migrate. MPI_Comm_spawn
creates a replacement process on the target node in a newly created
communicator, and the migrating process transfers its state to it.)
14
Migration Example
(Diagram: MPI_Intercomm_merge merges the two communicators into a single
intra-communicator containing both the original processes and the newly
spawned one.)
15
Migration Example
(Diagram: MPI_Comm_create creates a new communicator that excludes the
departing process; the spawned process takes over its rank and
MPI_COMM_WORLD is re-formed. The corresponding calls are sketched below.)
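
The three migration slides above can be summarized in a short code sketch.
This is a minimal, illustrative sequence assuming all processes of the old
communicator call the function collectively once the middleware has selected
migrating_rank; the state transfer, error handling, and the mirrored calls on
the spawned process's side (obtained via MPI_Comm_get_parent) are omitted.
It sketches the general MPI-2 technique, not the PCM library's actual code.

#include <mpi.h>

/* Called collectively by every process of *world once a migration of
   `migrating_rank` has been requested. */
void rearrange_after_migration(MPI_Comm *world, int migrating_rank,
                               const char *binary)
{
    MPI_Comm spawned, merged, newworld;
    MPI_Group merged_grp, new_grp;
    int myrank;

    MPI_Comm_rank(*world, &myrank);

    /* 1. Collectively spawn one replacement process on the target node;
          the migrating rank acts as the root of the spawn (MPI-2). */
    MPI_Comm_spawn((char *)binary, MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   migrating_rank, *world, &spawned, MPI_ERRCODES_IGNORE);

    /* 2. The migrating rank would transfer its checkpointed state to the
          new process here (point-to-point messages over `spawned`). */

    /* 3. Merge the inter-communicator into one intra-communicator that
          contains both the original and the newly spawned processes. */
    MPI_Intercomm_merge(spawned, /* high = */ 0, &merged);

    /* 4. Create the new world communicator without the departing process;
          the replacement process takes over its role. */
    MPI_Comm_group(merged, &merged_grp);
    MPI_Group_excl(merged_grp, 1, &migrating_rank, &new_grp);
    MPI_Comm_create(merged, new_grp, &newworld);

    if (myrank != migrating_rank)
        *world = newworld;   /* survivors continue with the new communicator */
}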
16
Profiling MPI Applications
  • The profiling library is based on the MPI
    profiling interface
  • Transparent interception of all MPI calls
  • Goal: profile MPI applications' communication
    patterns (see the sketch below)
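
As a concrete illustration of the PMPI mechanism, the sketch below intercepts
MPI_Send and records how many bytes each process sends to each destination;
the counter array and its size are assumptions for illustration, not the
actual bookkeeping done by the profiling library.

#include <mpi.h>

#define MAX_RANKS 1024                    /* assumed upper bound on ranks */
static long long bytes_sent[MAX_RANKS];   /* bytes sent per destination   */

/* When this file is linked with the application, every MPI_Send call is
   transparently routed through this wrapper; the actual send is still
   performed by the name-shifted PMPI_Send entry point. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    MPI_Type_size(datatype, &type_size);
    if (dest >= 0 && dest < MAX_RANKS)
        bytes_sent[dest] += (long long)count * type_size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}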

17
How to Instrument MPI Programs with
PCM? (Initialization Phase)
#include <mpi.h>
#include "pcm.h"

MPI_Comm PCM_COMM_WORLD;

int main(int argc, char **argv)
{
    int rank, n, spawnrank;

    MPI_Init(&argc, &argv);
    PCM_COMM_WORLD = MPI_COMM_WORLD;
    PCM_Init(PCM_COMM_WORLD);
    MPI_Comm_rank(PCM_COMM_WORLD, &rank);
    MPI_Comm_size(PCM_COMM_WORLD, &n);
    spawnrank = PCM_Process_Status();
    if (spawnrank > 0) {
        // load any checkpointed data
        PCM_Load();
    }

18
How to Instrument MPI Programs with
PCM? (Iterations Phase)
    for ( ; /* several iterations */ ; ) {
        int pcm_status = PCM_Status(PCM_COMM_WORLD);
        if (pcm_status == PCM_MIGRATE) {
            // checkpoint data
            PCM_Store();
            PCM_COMM_WORLD = PCM_Reconfigure();
        } else if (pcm_status == PCM_RECONFIGURE) {
            PCM_COMM_WORLD = PCM_Reconfigure();
        }
        MPI_Comm_rank(PCM_COMM_WORLD, &rank);

        // Data computation.
        // Exchange of computed data with neighboring processes:
        // MPI_Send() / MPI_Recv()
    }

    PCM_Finalize(PCM_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

19
A Reconfiguration Scenario
(Diagram: a reconfiguration scenario involving an MPI process of rank 1 and
the IOS agent running on Processor 1.)
20
Case Study Heat Diffusion Problem
  • A problem that models heat transfer in a solid
  • A two-dimensional mesh is used to represent the
    problem data space
  • An iterative, highly synchronized application
    (one iteration is sketched below)
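
A minimal sketch of one iteration under an assumed 1-D strip decomposition of
the 2-D mesh, with one ghost row exchanged with each neighbor (neighbors are
MPI_PROC_NULL at the boundaries); the array layout and names are illustrative
assumptions, not the exact case-study code.

#include <mpi.h>

/* Rows 1..local_rows are owned; rows 0 and local_rows+1 are ghost rows. */
void exchange_ghost_rows(double *u, int local_rows, int cols,
                         int up, int down, MPI_Comm comm)
{
    MPI_Sendrecv(&u[1 * cols], cols, MPI_DOUBLE, up, 0,
                 &u[(local_rows + 1) * cols], cols, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[local_rows * cols], cols, MPI_DOUBLE, down, 1,
                 &u[0], cols, MPI_DOUBLE, up, 1,
                 comm, MPI_STATUS_IGNORE);
}

/* One Jacobi-style relaxation step on the interior points. */
void jacobi_step(const double *u, double *unew, int local_rows, int cols)
{
    for (int i = 1; i <= local_rows; i++)
        for (int j = 1; j < cols - 1; j++)
            unew[i * cols + j] = 0.25 * (u[(i - 1) * cols + j] +
                                         u[(i + 1) * cols + j] +
                                         u[i * cols + j - 1] +
                                         u[i * cols + j + 1]);
}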

21
Adaptation Experiments
22
Adaptation Experiments (2)
Adaptation through removing a slow processor
23
Adaptation Experiments (3)
Adaptation through migration to a better cluster
24
Empirical Results: Overhead of the PCM Library
25
Reconfiguration Overhead
26
Breakdown of Reconfiguration Cost
27
Ongoing/Future Work
  • Splitting and Merging MPI Application Processes
  • New reconfiguration policies for dynamic
    environments
  • More realistic load characteristics and network
    latencies.
  • Interoperability with MPICH-G2
  • Improving the PCM API
  • Non-iterative applications

28
Related Work
  • MPICH-G2
  • Grid-enabled implementation of MPI
  • http://www3.niu.edu/mpi/
  • Adaptive MPI (AMPI)
  • MPI implementation with light threads for process
    migration [Huang03]
  • MPI Process Swapping
  • Initial over-allocation of processors and
    selection of the best executing nodes [Sievert04]
  • Extensions to MPI with checkpointing and restart
  • SRS library [Vadhiyar03]: application stop and
    restart
  • CoCheck [Stellner96] and StarFish [Agbaria99]:
    fault tolerance
  • MPICH-V [Bouteiller05]: fault tolerance

29
Questions?
30
Backup Slides
31
Using the IOS middleware
  • Start IOS peer servers (a mechanism for peer
    discovery)
  • Start a network of IOS theaters
  • Write your SALSA programs and extend all actors
    to autonomous actors
  • Bind autonomous actors to theaters
  • IOS automatically reconfigures the location of
    actors in the network for improved performance of
    the application.
  • IOS supports the dynamic addition and removal of
    theaters

32
Parallel Issues
  • When running across multiple resources,
    communication between processes on different
    resources has much lower bandwidth and much higher
    latency than communication between processes on a
    single resource
  • Need to think about communication patterns: is it
    possible to reduce the amount of communication,
    for example by buffering data for longer and
    sending larger batches of data (as sketched
    below)?
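
A minimal sketch of that batching idea, assuming values destined for a remote
process can be accumulated locally and flushed in larger messages; BATCH, the
buffer type, and the function names are illustrative assumptions.

#include <mpi.h>

#define BATCH 1024   /* assumed batch size */

typedef struct {
    double buf[BATCH];
    int    used;
} send_buffer_t;

/* Accumulate a value and send only when the batch is full, so one larger
   message replaces many small, latency-bound ones. */
void buffer_value(send_buffer_t *b, double v, int dest, MPI_Comm comm)
{
    b->buf[b->used++] = v;
    if (b->used == BATCH) {
        MPI_Send(b->buf, b->used, MPI_DOUBLE, dest, 0, comm);
        b->used = 0;
    }
}

/* Flush any remaining values, e.g. at the end of an iteration. */
void flush_buffer(send_buffer_t *b, int dest, MPI_Comm comm)
{
    if (b->used > 0) {
        MPI_Send(b->buf, b->used, MPI_DOUBLE, dest, 0, comm);
        b->used = 0;
    }
}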

33
Today: Globus
  • Developed by Ian Foster and Carl Kesselman
  • Grew from the I-Way (SC-95)
  • Basic Services for distributed computing
  • Resource discovery and information services
  • User authentication and access control
  • Job initiation
  • Communication services (Nexus and MPI)
  • Applications are programmed by hand
  • Many applications
  • User responsible for resource mapping and all
    communication
  • Existing users acknowledge how hard this is

34
Today: Condor
  • Support for matching application requirements to
    resources
  • User and resource provider write ClassAD
    specifications
  • System matches ClassADs for applications with
    ClassADs for resources
  • Selects the best match based on a
    user-specified priority
  • Can extend to Grid via Globus (Condor-G)
  • What is missing?
  • User must handle application mapping tasks
  • No dynamic resource selection
  • No checkpoint/migration (resource re-selection)
  • Performance matching is straightforward
  • Priorities coded into ClassADs

35
Resource Sensitive Model
  • Decision components use a resource-sensitive model
    to decide, based on the profiled applications, how
    to balance resource consumption
  • Reconfiguration decisions
  • Where to migrate
  • When to migrate
  • How many entities to migrate (see the sketch below)
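
A minimal sketch of the "how many entities" decision, assuming the goal is to
make each node's entity count proportional to a scalar capacity measure; the
capacity metric and names are illustrative, not the actual IOS
resource-sensitive model.

/* Number of entities to hand over so the remote node's share of entities
   becomes proportional to its share of the combined capacity. */
int entities_to_migrate(int local_entities, int remote_entities,
                        double local_capacity, double remote_capacity)
{
    double total_entities = local_entities + remote_entities;
    double total_capacity = local_capacity + remote_capacity;

    int remote_target =
        (int)(total_entities * remote_capacity / total_capacity);

    int surplus = remote_target - remote_entities;
    return surplus > 0 ? surplus : 0;   /* migrate only toward under-loaded nodes */
}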

36
37
IOS API
  • The following methods notify the profiling agent
    of actors entering and exiting the theater due to
    migration and binding
  • public void addProfile(UAN uan)
  • public void removeProfile(UAN uan)
  • public void migrateProfile(UAN uan, UAL target)
  • The profiling agent updates its actor profiles
    based on message sending with these methods
  • public void msgSend(UAN uan, Msg_INFO msgInfo)
  • The profiling agent updates its actor profiles
    based on message reception with this method
  • public void msgReceive(UAN uan, UAL targetUAL,
    Msg_INFO msgInfo)
  • The following methods notify the profiling agent
    of the start of a message being processed and the
    end of a message being processed, with a UAN or
    UAL to identify the sending actor
  • public void beginProcessing(UAN uan, Msg_INFO
    msgInfo)
  • public void endProcessing(UAN uan, Msg_INFO
    msgInfo)

38
Virtual Topologies of IOS Agents
  • Agents organize themselves in various
    network-sensitive virtual topologies to sense the
    underlying physical environments
  • Peer-to-peer topology: agents form a p2p network
    to exchange profiled information
  • Cluster-to-cluster topology: agents organize
    themselves into groups of clusters, and the
    cluster managers form a p2p network

39
C2C vs. P2P topologies
40
Parallel Decomposition of the Heat Problem
41
MPI Process Migration
  • Upon a migration notification from the IOS
    middleware
  • The migrating process saves its current state
    through the PCM checkpointing support
  • The rest of the processes are notified of the
    migration event; any communication is suspended
    until the migration is done
  • The migrating process spawns a new process at the
    target location and sends it its local checkpointed
    data (MPI-2)
  • The newly created process restores its state
  • Rearrangement of any shared communicators is
    performed collectively by all processes.
  • Computation is then resumed

42
How to Instrument an MPI Program?
  • The PCM API
  • Process Checkpointing and Migration API
  • Register variables with a checkpoint handler
  • Store data locally or remotely in a PCM daemon
  • Restore previously checkpointed data
  • Periodically probe the status of an MPI
    application or MPI process
  • The PCM Daemon
  • Loaded on every participating node.
  • Communicates with IOS agents and the MPI
    profiling library
  • Handles process migration

43
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, totalProcessors;
    int current_iteration;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &totalProcessors);
    current_iteration = 0;

    // Initialize and distribute data among processors.

    for ( ; /* several loops */ ; ) {
        // Data computation.
        // Exchange of computed data with neighboring processes:
        // MPI_Send() / MPI_Recv()
    }

    // Data collection.
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}