1
FTOP: A library for fault tolerance in a cluster
R. Badrinath, Rakesh Gupta, Nisheeth Shrivastava
2
Why FTOP?
  • A fault-tolerant environment built for PVM.
  • Implements a transparent fault-tolerance technique using checkpointing and rollback recovery (CRR) for PVM-based distributed applications.
  • Handles issues related to in-transit messages, routing of messages to migrated tasks, and open files.
  • Works entirely at user level; no kernel changes are needed.
  • Intended to be extensible to other CRR schemes.

3
FTOP assumptions
  • Assumes a homogeneous Linux cluster with PVM running on every host.
  • One of the hosts is configured as a Global Resource Manager (GRM), which is assumed to be fault free. (impl.!)
  • Another host, also assumed to be fault free, is configured as the stable storage. The file system of the stable storage is NFS-mounted on all other hosts. (Using NFS has problems?)
  • Assumes reliable FIFO channels between hosts in the cluster.
  • Handles task/node crash failures only.

4
System and Fault Model
  • The system consists of
    • a set of workstations,
    • connected through a high-speed LAN,
    • with stable storage accessible to all workstations (assumed to be fault proof).
  • A fault can be
    • a network failure, or
    • a node failure.
  • Fail-stop model.

5
Implementation: Checkpointing
  • Non-blocking coordinated checkpointing.
  • What is checkpointed?
    • The process context (PC value, registers, etc.).
    • The process control state (pid, parent pid, fds of open files, etc.).
    • The process address space (the text area, data area, and stack area).
  • Where are the checkpoints stored?
    • On stable storage (assumed to be failure proof).
    • Two checkpoint files for each process.
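
A minimal sketch of the kind of per-task record this control state could map to; the struct layout and names below are assumptions for illustration, not FTOP's actual on-disk format.

/* Hypothetical per-task control-state record written with the checkpoint;
 * the layout and field names are illustrative, not FTOP's actual format. */
#include <setjmp.h>
#include <sys/types.h>

#define CKPT_MAX_FILES 64

struct ckpt_open_file {
    int   fd;            /* descriptor number                    */
    int   flags;         /* open mode (O_RDONLY, O_APPEND, ...)  */
    off_t offset;        /* file pointer, obtained with lseek()  */
    char  path[256];     /* file name                            */
};

struct ckpt_control_state {
    pid_t   pid;         /* process id                           */
    pid_t   ppid;        /* parent process id                    */
    int     tid;         /* PVM task identifier                  */
    jmp_buf context;     /* registers/PC saved by setjmp()       */
    int     nfiles;      /* number of open-file entries below    */
    struct ckpt_open_file files[CKPT_MAX_FILES];
};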

6
How we checkpoint
  • The process context (PC, register values, etc.)
    • Signal mechanism: on receiving a signal, a process saves its state on the stack, which can then be checkpointed. setjmp() and longjmp() are used.
  • The process memory regions
    • Read-only sections are not checkpointed. Other sections are checkpointed by writing them to a file.
    • The /proc file system provides the section boundaries.
  • The process control state
    • Written to a regular file named after the task id.
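
A hedged sketch of how the signal-driven checkpoint described above could be coded: setjmp() captures the register context inside the handler, /proc/self/maps supplies the section boundaries, and writable regions are dumped to a per-task file. The handler, context, and path names are assumptions, not FTOP's actual code.

/* Sketch of a checkpoint signal handler in the style described above.  */
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

jmp_buf ckpt_ctx;                         /* also lives in the data segment */
static const char *ckpt_path = "/stable/ckpt.t40001";   /* per-task file    */

static void ckpt_handler(int sig)
{
    (void)sig;
    if (setjmp(ckpt_ctx) != 0)            /* longjmp() re-enters here on    */
        return;                           /* recovery and simply resumes    */

    FILE *maps = fopen("/proc/self/maps", "r");
    int out = open(ckpt_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    char line[256];

    if (!maps || out < 0)
        return;
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (perms[1] != 'w')              /* skip read-only sections (text) */
            continue;
        write(out, &start, sizeof start);             /* region header      */
        write(out, &end, sizeof end);
        write(out, (const void *)start, end - start); /* region contents    */
    }
    fclose(maps);
    close(out);
    /* The control state (pid, tid, open files, jmp_buf) is written too.    */
}

int main(void)
{
    signal(SIGUSR1, ckpt_handler);        /* the PVM daemon delivers SIGUSR1 */
    for (;;)
        pause();                          /* stand-in for application work   */
}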

7
Checkpoint Protocol
[Figure: time diagram of the checkpointing protocol among GRM, PVMd, and a TASK, showing SIGALRM, SM_CKPTSIGNAL, SIGUSR1, TM_CKPTDONE, SM_CKPTDONE, SM_CKPTCOMMIT, and a second SIGUSR1.]
Time diagram of the checkpointing protocol. It is based on the two-phase commit protocol.
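
A rough sketch of the GRM side of this two-phase exchange. The helper functions and the enum values are assumptions for illustration; only the message names come from the protocol above.

/* GRM-side coordination loop, sketched with hypothetical helpers. */
enum grm_msg { SM_CKPTSIGNAL = 1, SM_CKPTDONE, SM_CKPTCOMMIT };

void broadcast_to_daemons(enum grm_msg m);   /* assumed helper */
void wait_for_all_daemons(enum grm_msg m);   /* assumed helper */

void grm_checkpoint_round(void)
{
    /* Phase 1: ask every PVM daemon to checkpoint its local tasks.
     * Each daemon sends SIGUSR1 to its tasks and collects TM_CKPTDONE. */
    broadcast_to_daemons(SM_CKPTSIGNAL);

    /* Block until every daemon reports that its tasks have written
     * their tentative checkpoints to stable storage. */
    wait_for_all_daemons(SM_CKPTDONE);

    /* Phase 2: all checkpoints exist, so tell the daemons to commit
     * them; the daemons notify their tasks with a second SIGUSR1. */
    broadcast_to_daemons(SM_CKPTCOMMIT);
}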
8
Checkpoint Protocol (contd.)
[Figure: the same checkpointing protocol across two hosts. The GRM exchanges SM_CKPTSIGNAL, SM_CKPTDONE, and SM_CKPTCOMMIT with the PVMd on Host 1 and Host 2; each PVMd exchanges SIGUSR1 and TM_CKPTDONE with its local tasks (Task 1, Task 2, Task 3).]
9
Other Messages
  • Two more messages are required for the consistency of the checkpoints taken:
    • TM_Ckptsignal (from a task to its daemon)
    • DM_Ckptsignal (from one daemon to another daemon)
  • To allow checkpointing to be partly non-blocking, these messages precede any application message while the checkpoint protocol is in progress, i.e., after a process has taken a checkpoint and before the checkpoint is committed.

10
Other Messages (contd.)
  • On TM_Ckptsignal, if the application message is destined for a local task, the daemon determines the status of that task and delivers the message to the destination only if it has completed its checkpoint.
  • If the application message is bound for a foreign task, the daemon sends DM_Ckptsignal to the destination daemon before sending the application message.
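
A minimal sketch of the daemon-side rule just described; every identifier here (the msg struct, ckpt_in_progress, the helper functions) is a placeholder rather than FTOP's actual code.

/* Daemon-side forwarding rule while the checkpoint protocol is running. */
struct msg { int dst_tid; /* ... header and payload ... */ };

extern int  ckpt_in_progress;                /* between take and commit     */
extern int  is_local_task(int tid);
extern int  task_has_checkpointed(int tid);  /* status kept by the daemon   */
extern int  daemon_of(int tid);
extern void deliver_local(struct msg *m);
extern void hold_until_ckpt_done(struct msg *m);
extern void send_dm_ckptsignal(int daemon);
extern void send_to_daemon(int daemon, struct msg *m);

void forward_app_message(struct msg *m)
{
    if (!ckpt_in_progress) {                 /* normal path                 */
        if (is_local_task(m->dst_tid))
            deliver_local(m);
        else
            send_to_daemon(daemon_of(m->dst_tid), m);
        return;
    }
    if (is_local_task(m->dst_tid)) {
        /* Deliver only after the local destination has checkpointed. */
        if (task_has_checkpointed(m->dst_tid))
            deliver_local(m);
        else
            hold_until_ckpt_done(m);
    } else {
        /* Warn the remote daemon first, then forward the message. */
        send_dm_ckptsignal(daemon_of(m->dst_tid));
        send_to_daemon(daemon_of(m->dst_tid), m);
    }
}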

11
Recovery
  • Fault detection
    • Daemons detect node failure.
    • They inform the GRM through an SM_HOSTX message.
  • Fault assessment
    • The GRM finds all the failed tasks.
  • Fault recovery
    • The GRM spawns the failed tasks on appropriate hosts. Each failed task starts from the beginning and then copies its last checkpoint into its own address space.

12
Recovery (contd.)
  • Recovering tasks
    • The local state of the tasks is restored using the setjmp() and longjmp() calls: setjmp() is called before checkpointing begins, and longjmp() is called after the address space has been restored from the checkpoint file.
  • Note issues related to
    • processes which started after the recovery line, and
    • processes which exited normally after the recovery line.
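
A hedged sketch of the restore path just described, assuming a checkpoint file laid out as (start, end, bytes) records like the earlier checkpoint sketch; restoring the stack while running on it needs extra care (e.g., a temporary stack) that is glossed over here, and the names are illustrative.

/* Restore sketch: copy each saved region back over the address space,
 * then longjmp() into the context captured by setjmp() before failure. */
#include <fcntl.h>
#include <setjmp.h>
#include <unistd.h>

extern jmp_buf ckpt_ctx;             /* restored along with the data segment */

void restore_from_checkpoint(const char *path)
{
    int in = open(path, O_RDONLY);
    unsigned long start, end;

    if (in < 0)
        return;
    /* Each record is (start, end, contents); write it back in place. */
    while (read(in, &start, sizeof start) == sizeof start &&
           read(in, &end, sizeof end) == sizeof end) {
        read(in, (void *)start, end - start);
    }
    close(in);

    longjmp(ckpt_ctx, 1);            /* resume inside the old context */
}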

13
Recovery (contd.)
  • The GRM starts the recovery protocol:
    • It calculates the recovery line.
    • It transmits to every process the file id of its last committed checkpoint (the integer 1 or 2, selecting one of the two checkpoint files).
    • Each process restores its checkpointed image.
  • Processes are not allowed to send or receive application messages during the recovery stage.

14
Recovery Protocol
[Figure: time diagram of the recovery protocol among GRM, PVMd, and a TASK, showing HOSTX, SM_RECOVER, SIGUSR2, TM_RECOVERYDONE, SM_RECOVERYDONE, SM_RECOVERYCOMMIT, and a second SIGUSR2.]
15
Other Issues
  • In-transit messages
    • Logging: relies on the reliable communication model; the log is part of the checkpoint.
    • Replaying: done before any future interaction.
  • Routing
    • Why is it a problem?
    • Maintain a route table: what to keep?
  • Open files
    • Why are they a problem?
    • How to handle them?
  • Reconnecting with the daemon.

16
Handling Routing
  • The tid (task identifier) is used as the address of a message in PVM. Failed tasks get a new tid when they recover; other tasks don't know about this change, which causes routing problems.
  • A mapping table from the oldest to the most recent tid of each task is maintained.
  • The header of each message is parsed, and if the message is destined for one of the failed tasks, the address field is replaced with the most recent tid of that task.
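
A small sketch of this remapping step; the table layout and the function names are illustrative, not FTOP's actual data structures.

/* Old-tid to new-tid remapping applied while parsing message headers. */
#define MAX_TASKS 256

struct tid_map {
    int old_tid;              /* tid the task had when it first started */
    int new_tid;              /* tid assigned after the latest recovery */
};

static struct tid_map map[MAX_TASKS];
static int nmap;

/* Return the most recent tid for a (possibly stale) destination address. */
static int current_tid(int tid)
{
    for (int i = 0; i < nmap; i++)
        if (map[i].old_tid == tid)
            return map[i].new_tid;
    return tid;               /* task never failed: address unchanged */
}

/* Called while parsing a message header: rewrite the destination field. */
static void patch_destination(int *dst_field)
{
    *dst_field = current_tid(*dst_field);
}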

17
Handling Open files
  • lsof, a Linux utility, provides the list of all open files, their descriptors, and their modes. An lseek() call provides the file pointer.
  • All of this information (file name, descriptor, mode, and file pointer) is stored with the checkpoint image of the process.
  • The state of each file is restored from this information at the time of recovery.
  • The file contents themselves may also need to be checkpointed.
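
A hedged sketch of saving and restoring one descriptor's state in the spirit of this slide; FTOP gathers the data via lsof, whereas this sketch queries the descriptor directly with fcntl() and lseek(), and the struct and function names are illustrative.

/* Save/restore of a single open descriptor's state; illustrative only. */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct file_state {
    char  path[256];    /* file name                         */
    int   fd;           /* descriptor number to recreate     */
    int   flags;        /* open mode / status flags          */
    off_t offset;       /* file pointer                      */
};

/* Record the state of an already-open descriptor at checkpoint time. */
void save_file_state(int fd, const char *path, struct file_state *s)
{
    strncpy(s->path, path, sizeof s->path - 1);
    s->path[sizeof s->path - 1] = '\0';
    s->fd     = fd;
    s->flags  = fcntl(fd, F_GETFL);        /* mode                      */
    s->offset = lseek(fd, 0, SEEK_CUR);    /* file pointer              */
}

/* Reopen the file on the same descriptor and seek back, at recovery. */
int restore_file_state(const struct file_state *s)
{
    int fd = open(s->path, s->flags);
    if (fd < 0)
        return -1;
    if (fd != s->fd) {                     /* move it to the old number */
        dup2(fd, s->fd);
        close(fd);
    }
    lseek(s->fd, s->offset, SEEK_SET);
    return 0;
}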

18
Reconnecting with the Daemon
  • A task is connected to the virtual machine through the PVM daemon. A failed task, when respawned on a new host, needs to reconnect to that host's daemon.
  • It connects to the new daemon through the Unix domain socket name advertised by the daemon in a host-specific file. It also cleans up its old socket information.
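
A sketch of that reconnect step, assuming the host-specific file simply contains the socket path on one line; the file location, the function name, and the handshake that follows the connect are assumptions and are not shown.

/* Reconnect to the local PVM daemon over the Unix domain socket whose
 * name is read from a host-specific file; illustrative sketch only.   */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int reconnect_to_daemon(const char *addr_file)
{
    char path[108];                       /* fits sockaddr_un.sun_path */
    FILE *f = fopen(addr_file, "r");

    if (!f)
        return -1;
    if (!fgets(path, sizeof path, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    path[strcspn(path, "\n")] = '\0';     /* strip trailing newline    */

    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_un sa;
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    strncpy(sa.sun_path, path, sizeof sa.sun_path - 1);

    if (connect(s, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(s);
        return -1;
    }
    return s;                             /* socket to the new daemon  */
}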

19
Testing
  • Testing environment
    • The hosts: 3-5 Pentium III machines running Red Hat Linux 7.1.
    • The channel: 100 Mbps Ethernet LAN.
    • Failure simulation: by removing a host from the virtual machine.
  • Test cases
    • Matrix multiplication.
    • PVMPOV (a full-featured distributed ray tracer built on PVM).
    • Others for correctness: simple file I/O, ping-pong, etc.

20
Overheads
[Plots: Checkpointing overhead for the Matrix multiplication program and for the PVMPOV program; each plot shows running time against the checkpointing interval.]
21
Conclusion and future work
  • Builds fault tolerance into standard PVM while staying entirely at the user level.
  • Able to roll back open files and in-transit messages.
  • As future work, we wish to handle device association, which may require explicit OS support.
  • We also intend to integrate well-known optimizations into the checkpointing protocol.
  • We also aim to support other CRR schemes.