Title: Checkpointing-based Rollback Recovery for Parallel Applications on the InteGrade Grid Middleware
1 Checkpointing-based Rollback Recovery for Parallel Applications on the InteGrade Grid Middleware
- Raphael Y. de Camargo
- Andrei Goldchleger
- Fabio Kon
- Alfredo Goldman
- Department of Computer Science
- University of São Paulo, Brazil
Middleware 2004, Toronto, Canada
2nd International Workshop on Grid Computing
2 Summary
- Introduction
- InteGrade Grid middleware
- BSP Computing Model
- Checkpointing-based Rollback Recovery
- Checkpointing Infrastructure
- Preliminary Experiments
- Conclusions
3 Introduction
- Grid Computing
  - Grid computing allows the leveraging and integration of computer resources distributed across LANs and WANs
  - Besides dedicated computing resources, it is also possible to use idle computing power from commodity workstations (opportunistic computing)
- Challenges
  - The environment is composed of shared user workstations spread across many different LANs
  - Machines may fail, become inaccessible, or switch from idle to busy very rapidly
  - A fault-tolerance mechanism is therefore a major requirement for such a system
4 InteGrade Grid Middleware
- Objectives
  - Use the idle computing power of commodity workstations (opportunistic computing)
  - Allow organizations to increase their available computing power without buying extra hardware
  - Preserve the quality of service for machine owners who share their computing resources
- Implementation Status
  - Basic architecture already implemented
  - Uses CORBA distributed object technology for communication
  - Supports the execution of sequential, BSP, and bag-of-tasks applications
5 InterCluster InteGrade Architecture
- GRM (Global Resource Manager)
  - Manages the grid resources and schedules applications for execution
- ASCT (Application Submission and Control Tool)
  - Allows users to submit and control applications on the Grid
- LRM (Local Resource Manager)
  - Manages a node's resources
- Runtime Libraries
  - Provide support for running parallel applications
6 BSP Parallel Computing Model
- Computation is performed as a sequence of parallel supersteps
- Each superstep is composed of computation and communication, with a synchronization barrier at the end
- Data communicated during a superstep becomes available to the other processes only in the next superstep
- Two communication mechanisms
  - Direct Remote Memory Access (DRMA)
  - Bulk Synchronous Message Passing (BSMP)
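
The superstep rule above can be illustrated with a minimal single-process sketch in C. This is not BSPlib or InteGrade code: the names `sim_init`, `sim_put`, `sim_read`, and `sim_sync`, and the two-buffer layout, are hypothetical and chosen only for illustration. Writes are staged during a superstep and become visible only after the barrier.

```c
#include <assert.h>
#include <string.h>

#define NPROCS 4

/* Hypothetical simulation state: one integer "register" per process. */
static int visible[NPROCS];  /* values readable in the current superstep */
static int staged[NPROCS];   /* values written during the current superstep */

/* Start a simulation with all registers set to zero. */
void sim_init(void) {
    memset(visible, 0, sizeof visible);
    memset(staged, 0, sizeof staged);
}

/* DRMA-style remote write: buffered until the next barrier. */
void sim_put(int dest, int value) {
    assert(dest >= 0 && dest < NPROCS);
    staged[dest] = value;
}

/* Read the value that process `pid` can see in the current superstep. */
int sim_read(int pid) {
    assert(pid >= 0 && pid < NPROCS);
    return visible[pid];
}

/* Synchronization barrier: all staged writes become visible at once. */
void sim_sync(void) {
    memcpy(visible, staged, sizeof visible);
}
```

Calling `sim_put(2, 7)` followed by `sim_read(2)` still returns the old value; only after `sim_sync()` does `sim_read(2)` return 7.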
7 Checkpointing-based Rollback Recovery
- Checkpointing
  - Consists of periodically saving the application state into a checkpoint, so that the state can later be recovered from it
- Checkpointing-based Rollback Recovery
  - The process of restarting an application from an intermediate execution point after a failure is detected
- Two approaches to checkpointing
  - System-level
    - The memory space and processor registers of an application are saved into the checkpoint
  - Application-level
    - The application is responsible for providing the data to be saved and for reconstructing its state from the checkpoint
8 Application-level checkpointing
- The application is responsible for
  - Specifying which data needs to be saved
  - Recovering its state from a previous checkpoint
- Advantages
  - Semantic information about the data being saved is available, which makes portable checkpoints possible
  - Only the data necessary for recovering the application state needs to be saved
- Disadvantages
  - The source code must be instrumented with checkpointing code
  - Access to the application source code is required
  - Forced checkpoints cannot be generated
9 Checkpointing of Parallel Applications
- For parallel applications, we must consider the dependencies among application processes generated by message exchanges
- A global checkpoint is a collection containing one checkpoint from every application process. In the diagram, the global checkpoint s1 is inconsistent while global checkpoint s2 is consistent.
- For BSP applications, consistency can be guaranteed by generating the checkpoints right after the synchronization phases
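
A minimal sketch of a consistency check, assuming the usual definition that a global checkpoint is inconsistent when some message is recorded as received in the receiver's checkpoint but not yet sent in the sender's checkpoint (an orphan message). The `message_t` type, the use of local event counts as timestamps, and the function name `is_consistent` are illustrative assumptions, not part of InteGrade.

```c
#include <stddef.h>

/* Hypothetical record of one message exchange. Times are local event
 * counts on the respective process, not wall-clock times. */
typedef struct {
    int sender;     /* index of the sending process */
    int send_time;  /* sender's event count when the message was sent */
    int receiver;   /* index of the receiving process */
    int recv_time;  /* receiver's event count when it was received */
} message_t;

/* ckpt_time[p] is the local event count at which process p took its
 * checkpoint. Returns 1 if no message is an orphan, 0 otherwise. */
int is_consistent(const int *ckpt_time, const message_t *msgs, size_t nmsgs) {
    for (size_t i = 0; i < nmsgs; i++) {
        int received_before = msgs[i].recv_time <= ckpt_time[msgs[i].receiver];
        int sent_after = msgs[i].send_time > ckpt_time[msgs[i].sender];
        if (received_before && sent_after)
            return 0;  /* receipt is saved, but the send is not */
    }
    return 1;
}
```

In a BSP application, checkpointing every process right after the same barrier means no message can cross the checkpoint line in this way, which is why the check is unnecessary there.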
10 Checkpointing Infrastructure
- Pre-Compiler
  - Instruments the source code of a C/C++ application with checkpointing code
- Runtime libraries
  - Allow saving the application state into a checkpoint and recovering the data from a previous checkpoint
- ExecutionMonitor
  - Keeps information about the applications running on the grid, allowing them to be restarted in case of failures
11 PreCompiler
- Based on OpenC++, which lets us use compile-time reflection to instrument an application's source code with checkpointing code
- Needs to modify the application code in order to save the following data
  - Execution stack: contains runtime data from the functions that are active at a particular moment during application execution
  - Position counter: the current position in the program
  - Heap: contains memory chunks allocated by commands such as malloc and new
  - Global variables
12 Saving and Recovering the Execution Stack State
[Figure: execution stack, where each frame holds local variables, control information, and function parameters]
- Execution stack state
  - Not directly accessible from application code
- Saving the execution stack state
  - Save a list of the currently active functions and the values of their local variables
- Recovering the execution stack state
  - Call the functions in the saved list, declare the local variables, and recover their values from the checkpoint. The remaining code is skipped.
- Position Counter
  - Process state is saved only at certain points in the source code, marked by a call to some function, such as checkpoint_candidate()
13 Saving Local Vars and Pointers
- Local Variables
  - An auxiliary stack keeps the addresses of the local variables that are currently in scope
  - Local variable addresses are pushed onto the stack just after their declaration and removed when the variables leave scope
  - During checkpoint generation, the values stored at these addresses are saved in the checkpoint
- Pointers
  - In the case of pointers, it is necessary first to dereference the pointer
  - When saving pointers with multiple levels of indirection, it is necessary to follow the pointer graph structure
  - Special care is necessary with graphs containing cycles and when multiple pointers reference the same memory chunk
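
The auxiliary stack for local variables can be sketched as follows. This is a simplified illustration, not the InteGrade library: the names `aux_push`, `aux_npop`, `aux_save`, and `aux_restore`, the fixed capacity, and the flat byte buffer used as a checkpoint are all assumptions. Pointer following is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AUX_CAP 64  /* hypothetical fixed capacity */

typedef struct { void *addr; size_t size; } aux_entry_t;

static aux_entry_t aux_stack[AUX_CAP];
static size_t aux_top = 0;

/* Register a local variable right after its declaration. */
void aux_push(void *addr, size_t size) {
    assert(aux_top < AUX_CAP);
    aux_stack[aux_top].addr = addr;
    aux_stack[aux_top].size = size;
    aux_top++;
}

/* Unregister the n most recently pushed variables (they left scope). */
void aux_npop(size_t n) {
    assert(n <= aux_top);
    aux_top -= n;
}

/* Copy the current values of all registered variables into buf.
 * Returns the number of bytes written. */
size_t aux_save(unsigned char *buf, size_t cap) {
    size_t off = 0;
    for (size_t i = 0; i < aux_top; i++) {
        assert(off + aux_stack[i].size <= cap);
        memcpy(buf + off, aux_stack[i].addr, aux_stack[i].size);
        off += aux_stack[i].size;
    }
    return off;
}

/* Write saved values back to the registered addresses, in push order.
 * Returns the number of bytes consumed. */
size_t aux_restore(const unsigned char *buf, size_t len) {
    size_t off = 0;
    for (size_t i = 0; i < aux_top; i++) {
        assert(off + aux_stack[i].size <= len);
        memcpy(aux_stack[i].addr, buf + off, aux_stack[i].size);
        off += aux_stack[i].size;
    }
    return off;
}
```

Because the buffer holds raw bytes in push order, restoring only works when the same variables are registered in the same order, and the result is architecture dependent.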
14 Saving the Heap Memory
- HeapManager
  - Maintains a list of the currently allocated chunks of memory
  - Each entry includes the memory address, its size, and a flag that indicates whether that chunk has already been saved during checkpoint generation
  - Updated before memory management calls such as malloc and free for C, and new and delete for C++
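
A minimal sketch of such a heap manager in C, assuming a singly linked list of chunk records. The names `hm_malloc`, `hm_free`, `hm_mark_saved`, `hm_reset_flags`, and `hm_count` are hypothetical wrappers, not the InteGrade API, and for simplicity the list is updated right after the underlying allocation succeeds. The saved flag is what prevents a chunk referenced by several pointers from being written twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct chunk {
    void *addr;          /* start address of the allocated chunk */
    size_t size;         /* size in bytes */
    int saved;           /* 1 if already written to the current checkpoint */
    struct chunk *next;
} chunk_t;

static chunk_t *chunks = NULL;

/* Allocate memory and record the chunk in the list. */
void *hm_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) return NULL;
    chunk_t *c = malloc(sizeof *c);
    if (c == NULL) { free(p); return NULL; }
    c->addr = p; c->size = size; c->saved = 0; c->next = chunks;
    chunks = c;
    return p;
}

/* Remove the chunk record (if any) and release the memory. */
void hm_free(void *p) {
    for (chunk_t **pp = &chunks; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->addr == p) {
            chunk_t *dead = *pp;
            *pp = dead->next;
            free(dead);
            break;
        }
    }
    free(p);
}

/* Returns 1 if the chunk still has to be written (and marks it saved),
 * 0 if it was already saved or is not a known chunk. */
int hm_mark_saved(void *p) {
    for (chunk_t *c = chunks; c != NULL; c = c->next) {
        if (c->addr == p) {
            if (c->saved) return 0;
            c->saved = 1;
            return 1;
        }
    }
    return 0;
}

/* Clear all flags before generating the next checkpoint. */
void hm_reset_flags(void) {
    for (chunk_t *c = chunks; c != NULL; c = c->next) c->saved = 0;
}

/* Number of chunks currently tracked. */
size_t hm_count(void) {
    size_t n = 0;
    for (chunk_t *c = chunks; c != NULL; c = c->next) n++;
    return n;
}
```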
15 Classes, Structures and BSP Calls
- BSP
  - The bsp_begin and bsp_synch standard functions are replaced by functions from the checkpointing library
  - During reinitialization, calls to functions that modify the state of the BSP library (e.g., bsp_pushregister) must be re-executed
- Structures
  - Saved in the same way as local vars
  - The pointers present in the structure must be followed
- Classes
  - Use introspection to add methods for saving and restoring the class members
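16 Precompiler: Example of Instrumented Code
    int function()
    {
        int lastFunctionCalled = -1;   /* index of the last call made from this function */
        int localVar = 0;

        /* Register the addresses of the locals on the auxiliary stack. */
        ckp_push_data(&lastFunctionCalled, sizeof(int));
        ckp_push_data(&localVar, sizeof(int));

        /* When recovering, restore the locals and jump to the interrupted call. */
        if (ckpRecovering == 1) {
            ckp_get_data(&lastFunctionCalled, sizeof(int));
            ckp_get_data(&localVar, sizeof(int));
            if (lastFunctionCalled == 0)
                goto ckp0;
        }

        // Do computations (...)
    ckp0:
        lastFunctionCalled = 0;
        functionA();
        // Do computations (...)

        ckp_npop_data(2);              /* unregister the two locals */
        return localVar;
    }
Legend: the slide distinguishes the original code from the code modified by the pre-compiler.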
17 Checkpointing Runtime Library
- Checkpointing Library
  - Provides the functionality for maintaining a stack of local variables, managing heap state, and saving the data to a checkpoint
  - Provides a timer that applications can set to ensure a minimum time between checkpoints
  - Checkpoints are currently architecture dependent and are saved to a file in the file system
- BSP Ckp Library
  - Provides specific functionality for checkpointing BSP applications
  - bsp_begin_ckp( ): registers some addresses necessary for checkpointing coordination and initializes the timer
  - bsp_synch_ckp( ): tests whether the timer has expired and, if so, signals the other processes to generate a new checkpoint
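
The minimum-interval timer can be sketched as below. This only illustrates the idea of testing the timer at each synchronization point; the type `ckp_timer_t` and the functions `ckp_timer_init` and `ckp_timer_should_checkpoint` are hypothetical names, and the current time is passed in as a plain number of seconds so that the logic does not depend on any particular clock API.

```c
#include <assert.h>

typedef struct {
    double min_interval;  /* minimum time between checkpoints, in seconds */
    double last_ckp;      /* time of the last checkpoint (or of start-up) */
} ckp_timer_t;

/* Initialize the timer; no checkpoint is due before now + min_interval. */
void ckp_timer_init(ckp_timer_t *t, double min_interval, double now) {
    assert(min_interval >= 0.0);
    t->min_interval = min_interval;
    t->last_ckp = now;
}

/* Called at a synchronization point. Returns 1 and restarts the timer
 * if at least min_interval seconds have elapsed, 0 otherwise. */
int ckp_timer_should_checkpoint(ckp_timer_t *t, double now) {
    if (now - t->last_ckp < t->min_interval)
        return 0;
    t->last_ckp = now;
    return 1;
}
```

Since the test happens only at synchronization points, the actual spacing between checkpoints can be longer than the configured minimum when supersteps are long.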
18 Application Execution Monitoring and Reinitialization
- LRM
  - Captures the exit status of running applications and sends it to the ExecutionMonitor
  - If a process was explicitly killed by the signals SIGTERM or SIGKILL, it is restarted
- BSP Applications
  - For BSP applications, all the processes in the application are reinitialized
- Execution Monitor
  - Contains a list of the applications running on the nodes of its cluster
  - Reschedules new executions with the GRM for failed processes
- GRM
  - Detects when a node or LRM fails and notifies the Execution Monitor
  - Reports node failures to the GRM
19 Preliminary Experiments
- Sequence similarity application
  - Compares two sequences of characters and finds the similarity between them using given criteria
  - Used in bioinformatics to compare sequences of DNA
  - Was parallelized using the BSP computing model
Experiments were performed on a cluster of 10 1.4GHz machines connected by a 100Mbps Fast Ethernet network.

tmin    nckp    ttotal    torig     ovh
600s    0       339.7s    339.9s    0%
60s     5       347.1s    339.9s    2.1%
10s     23      371.9s    339.9s    9.4%
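
The ovh column appears to be the relative increase of the total time over the original time, in percent. That reading is an assumption, but it matches the reported figures. A small sketch in C, with the hypothetical function name `overhead_percent`:

```c
#include <assert.h>

/* Relative overhead, in percent, of a run with checkpointing (t_total)
 * over the same run without it (t_orig). Both times are in seconds. */
double overhead_percent(double t_total, double t_orig) {
    assert(t_orig > 0.0);
    return 100.0 * (t_total - t_orig) / t_orig;
}
```

For the rows above, 347.1 s against 339.9 s gives about 2.1, and 371.9 s against 339.9 s gives about 9.4.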
20 Conclusions
- We described a checkpointing-based rollback recovery mechanism for applications running on the InteGrade Grid middleware
- This mechanism will allow better resource utilization in the Grid, since it will be possible to migrate processes between nodes
- Preliminary experiments indicate that the checkpointing overhead can be low enough for the mechanism to be used with long-running BSP parallel applications
21 Ongoing Work
- Improve pre-compiler support for C++
- Support for portable checkpoints
  - Allows better resource utilization in heterogeneous environments
- Robust storage system for checkpoints
  - Data saved in a distributed way
  - Provide some degree of replication to provide fault tolerance
- Implement an efficient process migration mechanism on InteGrade
  - Can be used for both fault tolerance and dynamic adaptation
22 Questions?