Kaoutar El Maghraoui, elmagkcs.rpi.edu - PowerPoint PPT Presentation

1
An Architecture for Reconfiguring MPI
Applications in Dynamic and Heterogeneous
Environments
12th SIAM Conference on Parallel Processing for
Scientific Computing
  • Kaoutar El Maghraoui, elmagk@cs.rpi.edu
  • Department of Computer Science
  • Rensselaer Polytechnic Institute
  • http://wcl.cs.rpi.edu/ios/
  • In collaboration with:
  • Dr. Carlos Varela (Thesis Advisor)
  • Dr. Boleslaw Szymanski
  • Travis Desell
  • February 22, 2006

2
Today's Grid Environments
  • Infrastructure
  • Complex, large-scale, high fault rates, and
    dynamic
  • Applications
  • Complex development, deployment
  • Challenges
  • High-level application development interface
  • Designing and constructing applications for
    adaptability
  • Late mapping of applications to Grid resources
  • Monitoring and control of performance

3
MPI Challenges on Dynamic Grids
  • Tailored for tightly coupled systems
  • Dynamic reconfiguration
  • Process mobility
  • Scale up to accommodate new resources
  • Shrink to accommodate leaving or slow resources
  • Transparent performance monitoring and
    application adaptability
  • Currently handled by the programmer
  • Goal
  • Extending MPI with dynamic reconfiguration and
    adaptability to dynamic computational grids

4
Approach
  • Separation of concerns between the application
    and the middleware
  • Middleware-level
  • When and how to reconfigure applications?
  • Applications-level
  • Problem solving
  • Support for migration and/or malleability
  • Gap bridging software
  • High level APIs
  • Library support to integrate applications and
    middleware

5
IOS Overview
  • The Internet Operating System (IOS) is a
    decentralized middleware framework that provides
  • Opportunistic load balancing capabilities
  • Resource-level profiling
  • Application-level profiling
  • Goal
  • Automatic reconfiguration of applications in
    dynamic environments (e.g., Computational Grids)
  • Scalability to worldwide execution environments
  • Modular architecture enabling evaluation of
    different load balancing and resource profiling
    strategies
  • Generic interfaces to interoperate with various
    programming models

6
IOS Architecture
  • Distributed middleware agents
  • Encapsulate modules for resource profiling and
    reconfiguration policies.
  • Capable of interconnecting in various virtual
    topologies (e.g., hierarchical or P2P)
  • Interface with high level applications
  • Interfacing with IOS agents
  • Applications implement specific APIs to interface
    with IOS agents
  • Applications need to support component
    migration/malleability

7
IOS Architecture
[Figure: an IOS-enabled node, showing an application component and an IOS agent connected through the IOS API.]
  • Application component: message passing, application profiling
  • Reconfiguration requests: migrate/split/merge/replicate
  • Decision Module: evaluates the gain of a potential
    reconfiguration
  • Profiling Module: interfaces to resource profilers
    (network monitor, memory monitor, CPU monitor)
  • Protocol Module: sends and receives steal requests
  • Other labels in the figure: steal requests, communication
    profiles, list of profiles, available processing,
    inter-delay info, "Reconfigure?", decision, initiate a
    steal request

8
IOS Load Balancing Strategies
  • Modularity for customizable load balancing and
    profiling strategies, e.g.:
  • Random work-stealing (RS)
  • Based on Cilk's work-stealing approach
  • Lightly loaded nodes send work-stealing packets to
    heavily loaded nodes
  • Application topology-sensitive work-stealing
    (ATS)
  • Extension to RS
  • Collocate processes communicating frequently
  • Network topology-sensitive work-stealing (NTS)
  • Extension to ATS
  • Considers network topology
  • Minimizes WAN latencies

9
Reconfiguring MPI Applications with IOS
  • Extending MPI
  • Semi-transparent checkpointing
  • Process migration support
  • Integration with IOS
  • Currently for iterative applications

10
The MPI/IOS Runtime Architecture
  • Instrumented MPI applications
  • Process Checkpointing and Migration (PCM) library
  • Wrappers for some MPI native calls
  • The MPI library
  • The IOS runtime components

11
MPI/IOS Interactions
12
MPI Process Migration
  • Implemented at the user-level
  • Relies on MPI communicator rearrangements and
    MPI-2 spawning feature
  • Instrumentation of programs with PCM calls
  • Benefit: portability
  • Limitation: semi-transparency

13
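Migration Example
[Figure: six processes, labeled 0 to 5, in MPI_COMM_WORLD.]
14
Migration Example
MPI_Intercomm_merge merges the two communicators
[Figure: a process labeled 6 appears alongside processes 0 to 5 of MPI_COMM_WORLD.]
15
Migration Example
MPI_Comm_create creates a new communicator
[Figure: the process labeled 6 on the previous slide is now labeled 3; the other labels are unchanged.]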
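
The rank rearrangement in this example can be sketched in plain C. This is a minimal sketch, not the PCM implementation: `migration_rank_order` is a hypothetical helper, and it assumes the spawned process holds the highest rank (here 6) in the merged communicator, as the second figure suggests.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of PCM or MPI): list the ranks of the merged
 * communicator in the order they should appear in the new communicator.
 *   n         - number of processes before the migration
 *   migrating - rank of the process that is being replaced
 *   order     - output array of length n; order[i] is the merged-communicator
 *               rank of the process that becomes rank i
 * Assumption: the spawned process holds rank n after the merge.
 */
void migration_rank_order(int n, int migrating, int *order)
{
    assert(n > 0 && migrating >= 0 && migrating < n);
    for (int r = 0; r < n; r++) {
        /* The spawned process (rank n) takes over the migrating rank;
           every other process keeps its rank. */
        order[r] = (r == migrating) ? n : r;
    }
}
```

With n = 6 and migrating rank 3, the array is {0, 1, 2, 6, 4, 5}. In an MPI program, an array of this shape could be passed to MPI_Group_incl on the merged communicator's group, and the resulting group to MPI_Comm_create, so that the new process ends up with rank 3.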
16
Profiling MPI Applications
  • The profiling library is based on the MPI
    profiling interface
  • Transparent interception of all MPI calls
  • Goal: profile MPI applications' communication
    patterns

17
How to Instrument MPI Programs with
PCM? (Initialization Phase)
  • #include <mpi.h>
  • #include "pcm.h"
  • MPI_Comm PCM_COMM_WORLD;
  • int main(int argc, char **argv) {
  •   MPI_Init(&argc, &argv);
  •   PCM_COMM_WORLD = MPI_COMM_WORLD;
  •   PCM_Init(PCM_COMM_WORLD);
  •   MPI_Comm_rank(PCM_COMM_WORLD, &rank);
  •   MPI_Comm_size(PCM_COMM_WORLD, &n);
  •   spawnrank = PCM_Process_Status();
  •   if (spawnrank > 0) {
  •     // load any checkpointed data
  •     PCM_Load();
  •   }

18
How to Instrument MPI Programs with
PCM? (Iterations Phase)
  • for (several iterations) {
  •   pcm_status = PCM_Status(PCM_COMM_WORLD);
  •   if (pcm_status == PCM_MIGRATE) {
  •     // checkpoint data
  •     PCM_Store();
  •     PCM_COMM_WORLD = PCM_Reconfigure();
  •   } else if (pcm_status == PCM_RECONFIGURE) {
  •     PCM_COMM_WORLD = PCM_Reconfigure();
  •     MPI_Comm_rank(PCM_COMM_WORLD, &rank);
  •   }
  •   // Data computation.
  •   // Exchange of computed data with
      neighboring processes.
  •   // MPI_Send() MPI_Recv()
  • }
  • PCM_Finalize(PCM_COMM_WORLD);
  • MPI_Finalize();

19
A Reconfiguration Scenario

20
A Reconfiguration Scenario

21
Case Study: Heat Diffusion Problem
  • A problem that models heat transfer in a solid
  • A two-dimensional mesh is used to represent the
    problem data space
  • An Iterative Application
  • Highly synchronized
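
To make the iterative structure concrete, here is a minimal single-process sketch of one sweep over a two-dimensional mesh. The slides do not say which numerical scheme the case study uses; a Jacobi-style four-neighbor average with fixed boundary cells is assumed here, and `heat_sweep` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One Jacobi-style sweep over a rows x cols mesh stored in row-major order
 * (assumed scheme; not taken from the slides). Boundary cells are copied
 * unchanged; each interior cell becomes the average of its four neighbors.
 * Returns the largest absolute change made to any interior cell.
 */
double heat_sweep(const double *cur, double *next, size_t rows, size_t cols)
{
    assert(cur != next);  /* the sweep needs separate input and output buffers */
    double max_change = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            size_t k = i * cols + j;
            if (i == 0 || j == 0 || i == rows - 1 || j == cols - 1) {
                next[k] = cur[k];  /* fixed boundary temperature */
                continue;
            }
            next[k] = 0.25 * (cur[k - cols] + cur[k + cols] +
                              cur[k - 1] + cur[k + 1]);
            double change = next[k] - cur[k];
            if (change < 0.0) change = -change;
            if (change > max_change) max_change = change;
        }
    }
    return max_change;
}
```

In a parallel version each process would own part of the mesh and exchange its edge cells with neighboring processes after every sweep, which is why the application is highly synchronized and why a single slow processor delays every iteration.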

22
Adaptation Experiments
23
Adaptation Experiments (2)
Adaptation through removing a slow processor
24
Adaptation Experiments (3)
Adaptation through migration to a better cluster
25
Empirical Results: Overhead of the PCM Library
26
Reconfiguration Overhead
27
Breakdown of Reconfiguration Cost
28
Ongoing/Future Work
  • Splitting and Merging MPI Application Processes
  • New reconfiguration policies on dynamic
    environments
  • More realistic load characteristics and network
    latencies.
  • Interoperability with MPICH-G2
  • Improving the PCM API
  • Non-iterative applications

29
Related Work
  • MPICH-G2
  • Grid-enabled implementation of MPI
  • http://www3.niu.edu/mpi/
  • Adaptive MPI (AMPI)
  • MPI implementation with light threads for process
    migration [Huang03]
  • MPI Process Swapping
  • Initial over-allocation of processors and
    selection of the best executing nodes [Sievert04]
  • Extensions to MPI with checkpointing and restart
  • SRS library [Vadhiyar03]: application stop and
    restart
  • CoCheck [Stellner96] and StarFish [Agbaria99]:
    fault tolerance
  • MPICH-V [Bouteiller05]: fault tolerance

30
Questions?
31
Backup Slides
32
Resource Sensitive Model
  • Decision components use a resource-sensitive
    model to decide, based on the profiled
    applications, how to balance resource
    consumption
  • Reconfiguration decisions
  • Where to migrate
  • When to migrate
  • How many entities to migrate

33
A General Model for Weighted Resource-Sensitive
Work-Stealing (WRS)
  • Given:
  • A set of resources, $R = \{r_0, \ldots, r_n\}$
  • A set of actors, $A = \{a_0, \ldots, a_n\}$
  • $w(r,A)$ is a weight, based on the importance of the
    resource $r$ to the performance of a set of actors
    $A$
  • $0 \le w(r,A) \le 1$
  • $\sum_{\text{all } r} w(r,A) = 1$
  • $a(r,f)$ is the amount of resource $r$ available at
    foreign node $f$
  • $u(r,l,A)$ is the amount of resource $r$ used by
    actors $A$ at local node $l$
  • $M(A,l,f)$ is the estimated cost of migration of
    actors $A$ from $l$ to $f$
  • $L(A)$ is the average life expectancy of the set of
    actors $A$
  • The predicted increase in overall performance $G$
    gained by migrating $A$ from $l$ to $f$, where $G \le 1$:
  • $\Delta(r,l,f,A) = \dfrac{a(r,f) - u(r,l,A)}{a(r,f) + u(r,l,A)}$
  • $G = \sum_{\text{all } r} \bigl(w(r,A)\,\Delta(r,l,f,A)\bigr) - \dfrac{M(A,l,f)}{10 + \log L(A)}$
  • When work is requested by $f$, migrate the actor(s) $A$
    with the greatest predicted increase in overall
    performance, if positive.
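
The gain computation can be sketched as two small C functions. This is an illustrative sketch, not IOS code: the function names are hypothetical, and the formula is read as the weighted sum of the per-resource normalized differences minus the migration cost divided by 10 + log L(A). Because the slide does not state the base of the logarithm, the caller passes log L(A) directly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Normalized difference for one resource: (a - u) / (a + u), where a is the
 * amount available at the foreign node and u the amount used locally.
 */
double wrs_delta(double available_foreign, double used_local)
{
    double total = available_foreign + used_local;
    /* Assumption: if both amounts are zero, the resource contributes nothing. */
    if (total == 0.0) return 0.0;
    return (available_foreign - used_local) / total;
}

/*
 * Predicted gain G of migrating a set of actors from the local node to a
 * foreign node.
 *   w, a, u        - per-resource weights, foreign availability, local usage
 *   n              - number of resources
 *   migration_cost - M(A,l,f)
 *   log_life       - log L(A), computed by the caller in whatever base the
 *                    model uses (not specified on the slide)
 */
double wrs_gain(const double *w, const double *a, const double *u, size_t n,
                double migration_cost, double log_life)
{
    assert(10.0 + log_life > 0.0);
    double gain = 0.0;
    for (size_t i = 0; i < n; i++)
        gain += w[i] * wrs_delta(a[i], u[i]);
    return gain - migration_cost / (10.0 + log_life);
}
```

A node answering a steal request would evaluate `wrs_gain` for each candidate set of actors and migrate the one with the largest value, and only if that value is positive.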

34
IOS API
  • The following methods notify the profiling agent
    of actors entering and exiting the theater due to
    migration and binding
  • public void addProfile(UAN uan)
  • public void removeProfile(UAN uan)
  • public void migrateProfile(UAN uan, UAL target)
  • The profiling agent updates its actor profiles
    based on message sending with this method
  • public void msgSend(UAN uan, Msg_INFO msgInfo)
  • The profiling agent updates its actor profiles
    based on message reception with this method
  • public void msgReceive(UAN uan, targetUAL,
    Msg_INFO msgInfo)
  • The following methods notify the profiling agent
    of the start of a message being processed and the
    end of a message being processed, with a UAN or
    UAL to identify the sending actor
  • public void beginProcessing(UAN uan, Msg_INFO
    msgInfo)
  • public void endProcessing(UAN uan, Msg_INFO
    msgInfo)

35
Virtual Topologies of IOS Agents
  • Agents organize themselves in various
    network-sensitive virtual topologies to sense the
    underlying physical environments
  • Peer-to-peer topology: agents form a p2p network
    to exchange profiled information.
  • Cluster-to-cluster topology: agents organize
    themselves in groups of clusters. Cluster
    managers form a p2p network.

36
C2C vs. P2P topologies
37
How to Instrument an MPI Program?
  • The PCM API
  • Process Checkpointing and Migration API
  • Register variables with a checkpoint handler
  • Store data locally or remotely in a PCM Daemon
  • Restore previously checkpointed data
  • Periodically probe the status of an MPI
    application or MPI process
  • The PCM Daemon
  • Loaded on every participating node.
  • Communicates with IOS agents and the MPI
    profiling library
  • Handles process migration
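
The register, store, and restore steps listed above can be illustrated with a small in-memory checkpoint handler. This is a hypothetical sketch and not the PCM API: every name here (`ckpt_handler`, `ckpt_register`, and so on) is invented for illustration, and data is kept in local memory rather than sent to a PCM Daemon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CKPT_MAX_VARS 16

/* Hypothetical in-memory checkpoint handler; not the PCM API. */
typedef struct {
    void  *addr[CKPT_MAX_VARS];   /* locations of the live variables */
    void  *saved[CKPT_MAX_VARS];  /* checkpointed copies */
    size_t size[CKPT_MAX_VARS];   /* size of each variable in bytes */
    int    count;
} ckpt_handler;

void ckpt_init(ckpt_handler *h) { h->count = 0; }

/* Register a variable for checkpointing; returns 0 on success, -1 on failure. */
int ckpt_register(ckpt_handler *h, void *addr, size_t size)
{
    assert(addr != NULL && size > 0);
    if (h->count == CKPT_MAX_VARS) return -1;
    void *copy = calloc(1, size);  /* zeroed, so a load before a store is defined */
    if (copy == NULL) return -1;
    h->addr[h->count] = addr;
    h->saved[h->count] = copy;
    h->size[h->count] = size;
    h->count++;
    return 0;
}

/* Copy the current value of every registered variable into its saved slot. */
void ckpt_store(ckpt_handler *h)
{
    for (int i = 0; i < h->count; i++)
        memcpy(h->saved[i], h->addr[i], h->size[i]);
}

/* Overwrite every registered variable with its last stored value. */
void ckpt_load(ckpt_handler *h)
{
    for (int i = 0; i < h->count; i++)
        memcpy(h->addr[i], h->saved[i], h->size[i]);
}

/* Release the saved copies. */
void ckpt_free(ckpt_handler *h)
{
    for (int i = 0; i < h->count; i++)
        free(h->saved[i]);
    h->count = 0;
}
```

An iterative program would register its loop counter and data arrays once, call the store step when a migration is requested, and call the load step in the restarted process before resuming the loop.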