SwiFT: SOFTWARE IMPLEMENTED FAULT TOLERANCE Pawan Kumar Choudhary, Kishor S. Trivedi

1
SwiFT: SOFTWARE IMPLEMENTED FAULT TOLERANCE
Pawan Kumar Choudhary, Kishor S. Trivedi
  • Center for Advanced Computing and Communication
  • Department of Electrical and Computer Engineering
  • Duke University

2
Need for Software Fault-tolerance
  • From a user's point of view, fault tolerance has
    two dimensions:
  • Availability: users of telephone switching
    systems, for example, demand continuous
    availability.
  • Data consistency: bank teller machine customers
    demand the highest degree of data consistency.
  • Safety-critical, real-time systems, such as
    nuclear power reactors and flight control
    systems, need the highest levels of both
    availability and data consistency.

3
What is SwiFT
  • SwiFT (Software Implemented Fault Tolerance) is a
    collection of daemon processes and C/C++
    libraries.
  • It provides fault tolerance to applications on a
    cluster of Windows NT nodes, logically configured
    as a ring.
  • It provides automatic error detection and
    recovery, checkpointing/message-logging, fault
    management, event logging and replay, data
    replication, and IP packet re-routing.

4
Transaction processing vs. Process Replication
  • To achieve high availability and reliability in
    applications such as telecommunications in a
    distributed network environment, two types of
    techniques have been deployed for fault
    tolerance:
  • Transaction processing:
  • Applications usually have a well-defined
    transaction boundary, such as updating a record
    or establishing a communication channel.
  • When a fault occurs, both the client and server
    abort the ongoing transaction and roll back to a
    clean state.
  • Process replication:
  • It allows for faster recovery than transaction
    processing and for recovery of non-transactional
    and long-transaction applications, such as
    switching systems and PBXs.
  • It is also suitable for applications that incur
    long transactions or do not satisfy transaction
    properties such as atomicity or isolation.
  • Process replication uses two techniques, atomic
    multicasting and checkpoint/message-logging, to
    make process states consistent.

5
Replication techniques in SwiFT
  • SwiFT applies three different techniques for
    message replication:
  • Cold: Only one active copy of the FT process is
    present. If it fails, SwiFT first tries to
    recover the failed process locally; if the local
    recovery fails, SwiFT migrates the process onto
    another machine.
  • Warm: One or more backup processes run on the
    network, and the primary process periodically
    checkpoints its state to its backup processes.
  • Hot: SwiFT monitors all of a fault-tolerant
    process's replicas. If SwiFT detects any replica
    failure, it recovers the failed replica so the
    number of replicas remains constant.
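
The cold and hot schemes described on this slide can be sketched in a few lines of Python. This is only an illustration of the recovery order, not SwiFT's own code: the function names, the try_restart hook and the node and replica names are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def recover_cold(try_restart: Callable[[str], bool],
                 home: str,
                 spares: Sequence[str]) -> str:
    """Cold scheme: restart on the home node first, migrate only if that fails.

    try_restart(node) is a hypothetical hook that returns True when the
    process was restarted successfully on that node.
    """
    if try_restart(home):
        return home                      # local recovery succeeded
    for node in spares:                  # local recovery failed: migrate
        if try_restart(node):
            return node
    raise RuntimeError("process could not be recovered on any node")


def repair_hot(replicas: Dict[str, bool]) -> List[str]:
    """Hot scheme: bring every failed replica back so the count stays constant.

    replicas maps a replica name to True (alive) or False (failed).
    Returns the names of the replicas that were recovered.
    """
    failed = [name for name, alive in replicas.items() if not alive]
    for name in failed:
        replicas[name] = True            # stands in for restarting the replica
    return failed
```

For example, if the home node cannot restart the process, recover_cold falls through to the first spare node that can, while repair_hot leaves the replica set at its original size.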

6
Checkpoint/Message-Logging
  • SwiFT applies the checkpoint/message-logging
    technique for fault tolerance.
  • SwiFT provides fault tolerant services by
    routinely checkpointing the server's state onto
    backup servers or into stable storage.
  • When a failure occurs, SwiFT stops the failed
    server process and either promotes a backup
    server to being the primary server or creates a
    new process.
  • Checkpointing in SwiFT is done with the help of
    application monitoring, application failure
    recovery, file replication, Windows event
    logging/replay, IP packet dispatching, and IP
    address fail-over.
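
A minimal sketch of the checkpoint/message-logging idea, assuming a toy key-value server. The StableStore and Server classes are hypothetical and are not part of SwiFT's libraries; they only show how a replacement server can rebuild its state from the last checkpoint plus the messages logged since then.

```python
import copy


class StableStore:
    """Stands in for a backup server or stable storage."""

    def __init__(self):
        self.checkpoint = {}   # last saved copy of the server state
        self.log = []          # messages received since that checkpoint


class Server:
    """Toy key-value server that logs each message before applying it."""

    def __init__(self, store):
        self.store = store
        # Recovery path: start from the checkpoint, then replay the log.
        self.state = copy.deepcopy(store.checkpoint)
        for key, value in store.log:
            self.state[key] = value

    def handle(self, key, value):
        self.store.log.append((key, value))   # log first
        self.state[key] = value               # then apply

    def take_checkpoint(self):
        self.store.checkpoint = copy.deepcopy(self.state)
        self.store.log.clear()                # older messages are now covered


store = StableStore()
primary = Server(store)
primary.handle("a", 1)
primary.take_checkpoint()
primary.handle("b", 2)
primary.handle("a", 3)

# The primary fails here; a new process is created from the same store.
replacement = Server(store)
```

After the replay, replacement.state matches the state the primary held at the moment of failure. Logging before applying is the design choice that makes this work: a message is never reflected in the state without also being recoverable.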

7
Components of SwiFT
  • Watchd for process failure detection, recovery,
    replication management, and distributed system
    services;
  • Winckp for transparent process checkpointing and
    mouse/keyboard event logging and replaying;
  • Libft for data checkpointing and communication
    message logging and recovery;
  • REPL for on-line incremental file replication and
    disaster recovery; and
  • One-IP for IP packet dispatching, fail-over and
    re-routing.
  • SwiFT's components are designed to handle both
    client and server error recoveries, so they can
    all be applied within a program.
  • Program developers can often access a server
    program's source code but have no control over
    client programs developed by other companies, so
    SwiFT makes client error recovery as transparent
    as possible.

8
Applications
  • Embedded within the system to improve
    availability and reliability.
  • Especially useful in telecommunications, where
    high availability is desired.
  • In e-commerce and financial transactions over
    the Internet, where data consistency is of the
    utmost importance.

9
SwiFT and our Role
  • SwiFT has been developed by Lucent Technologies
    for Windows NT systems.
  • http://www.bell-labs.com/projects/swift
  • Our emphasis will be on using it to evaluate
    the effectiveness of different recovery
    mechanisms. This will also help in verifying
    several recovery models. For example:
  • S. Garg, Y. Huang, C. Kintala, K. S. Trivedi, and
    S. Yajnik. Performance and reliability evaluation
    of passive replication schemes in application
    level fault tolerance. 1999.

10
Modeling with SwiFT for different Replication schemes
  • CTMC model for a server with no, cold and warm
    replication.
  • Plot 1: availability vs. mean time for node
    failure detection.
  • Plot 2: loss probability and throughput vs. mean
    time for node failure detection.
  • Effect of polling frequency: the polling interval
    for SwiFT is set to 2 seconds.
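
To illustrate why availability depends on the mean time for failure detection, here is a deliberately simplified sketch. It is not the CTMC model from the slide: it assumes a three-state cycle (UP, failed but UNDETECTED, RECOVERING) with exponentially distributed times, and every parameter value used with it is hypothetical.

```python
def availability(mttf_s: float, detect_s: float, recover_s: float) -> float:
    """Steady-state availability of the cycle UP -> UNDETECTED -> RECOVERING -> UP.

    In a cyclic CTMC the long-run fraction of time spent in a state is its
    mean sojourn time divided by the mean length of one full cycle.
    """
    return mttf_s / (mttf_s + detect_s + recover_s)


def sweep_detection(mttf_s, recover_s, detect_times_s):
    """Availability for each candidate mean detection time (all in seconds)."""
    return [(d, availability(mttf_s, d, recover_s)) for d in detect_times_s]


# Hypothetical numbers: one failure per day on average, 10 s to recover.
curve = sweep_detection(mttf_s=86_400.0, recover_s=10.0,
                        detect_times_s=[1.0, 2.0, 5.0, 10.0])
```

In this sketch availability falls as the mean detection time grows. A shorter polling interval reduces detection time, at the price of more monitoring traffic.
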
11
Other Commercial FT processes
  • DOORS: Distributed Object Oriented Reliable
    Service
  • MSCS: Microsoft Cluster Server
  • Microsoft Wolfpack
  • Veritas FirstWatch
  • Vinca Standby Server
  • HP MC/ServiceGuard

12
Summary
  • SwiFT is a collection of reusable software
    components that facilitate the development of
    fault-tolerant applications for the Windows NT
    operating system.
  • Designed with highly available applications in
    mind, its components address cold and warm
    replication management schemes.
  • SwiFT specializes in detecting hangs and failures
    resulting from system crashes.
  • SwiFT addresses the checkpointing of process
    states and the replication of files, processes
    and applications; these are useful for
    implementing low-cost fault tolerance.