Fault Tolerance

Provided by: steve1836

1
Fault Tolerance
  • Chapter 7

2
Agreement in Faulty Systems
  • Lamport, L., Shostak, R., and Pease, M., "The
    Byzantine Generals Problem," ACM Transactions on
    Programming Languages and Systems, vol. 4, no. 3,
    pp. 382-401, July 1982.
  • Basic result: in a system with m faulty
    processes, agreement can be achieved only if
    2m + 1 correctly functioning processes are
    present, for a total of 3m + 1 (assuming reliable
    messaging).
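  • If messages cannot be guaranteed to be delivered
    within a known, finite time, no agreement is
    possible if even one process is faulty (Fischer
    et al., 1985).
  • If messages can be lost, even nonfaulty processes
    cannot reach agreement; for example, the two-army
    problem.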
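Plugging the smallest case into the basic result shows why the next two slides behave differently (n denotes the total number of processes):

```latex
n \;\ge\; \underbrace{(2m + 1)}_{\text{correct}} + \underbrace{m}_{\text{faulty}} \;=\; 3m + 1,
\qquad m = 1 \;\Rightarrow\; n \ge 4
```

So 3 loyal generals plus 1 traitor (n = 4) can reach agreement, while 2 loyal generals plus 1 traitor (n = 3) cannot.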

3
Agreement in Faulty Systems (1)
  • The Byzantine generals problem for 3 loyal
    generals and 1 traitor.
  • (a) The generals announce their troop strengths
    (in units of 1,000 soldiers).
  • (b) The vectors that each general assembles
    based on (a).
  • (c) The vectors that each general receives in
    step 3.
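The steps behind these vectors (announce, assemble, exchange vectors, take an element-wise majority) can be simulated in a few lines. This is a minimal sketch in Python, assuming reliable point-to-point messages and a hypothetical `lie(sender, receiver, index)` callback standing in for whatever values the traitor chooses to send; `None` plays the role of UNKNOWN.

```python
from collections import Counter

UNKNOWN = None  # result when no value holds a strict majority


def byzantine_agreement(values, traitors, lie):
    """Simulate the vector-exchange algorithm.

    values   -- dict: general id -> true troop strength
    traitors -- set of general ids that are traitors
    lie      -- hypothetical callback lie(sender, receiver, index) giving the
                arbitrary value a traitor sends
    Returns the result vector computed by every loyal general.
    """
    generals = sorted(values)

    # Steps 1-2: every general announces its strength; each general collects
    # what it heard into a vector (a traitor may tell each receiver something else).
    vectors = {}
    for r in generals:
        vectors[r] = {
            s: lie(s, r, s) if (s in traitors and s != r) else values[s]
            for s in generals
        }

    # Step 3: every general sends its whole vector to every other general.
    received = {r: [] for r in generals}
    for s in generals:
        for r in generals:
            if s == r:
                continue
            if s in traitors:
                received[r].append({i: lie(s, r, i) for i in generals})
            else:
                received[r].append(dict(vectors[s]))

    # Step 4: element-wise strict majority over the received vectors.
    results = {}
    for r in generals:
        if r in traitors:
            continue
        result = {}
        for i in generals:
            value, count = Counter(v[i] for v in received[r]).most_common(1)[0]
            result[i] = value if count > len(received[r]) / 2 else UNKNOWN
        results[r] = result
    return results


# Generals 1, 2 and 4 are loyal; general 3 is the traitor and sends a
# different made-up value to everyone.
strengths = {1: 1, 2: 2, 3: 3, 4: 4}
print(byzantine_agreement(strengths, {3}, lambda s, rc, i: f"lie-{s}-{rc}-{i}"))
```

Each loyal general ends with {1: 1, 2: 2, 3: None, 4: 4}: they agree on the loyal generals' strengths and on UNKNOWN for the traitor.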

4
Agreement in Faulty Systems (2)
  • The same as in the previous slide, except now with
    2 loyal generals and 1 traitor.
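Seen from general 1's side, the failure is easy to check: with only two incoming vectors, one of which comes from the traitor, no element can reach a strict majority. A minimal sketch in Python of the step-4 majority vote; the strings stand for arbitrary values the traitor might send and are purely illustrative.

```python
from collections import Counter

UNKNOWN = None  # no strict majority


def majority(column):
    """Return the value held by a strict majority of `column`, else UNKNOWN."""
    value, count = Counter(column).most_common(1)[0]
    return value if count > len(column) / 2 else UNKNOWN


def decide(received_vectors):
    """Step 4: element-wise majority of the vectors received in step 3."""
    return [majority(column) for column in zip(*received_vectors)]


# 2 loyal generals (1, 2) and traitor 3: general 1 receives two vectors.
from_2 = [1, 2, "y"]        # general 2 forwards what it heard, incl. the traitor's "y"
from_3 = ["a", "b", "c"]    # the traitor forwards an arbitrary vector
print(decide([from_2, from_3]))   # [None, None, None]

# For comparison: 3 loyal generals (1, 2, 4) and traitor 3.
print(decide([[1, 2, "y", 4], ["a", "b", "c", "d"], [1, 2, "z", 4]]))   # [1, 2, None, 4]
```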

5
Reliable RPC
  • Already covered

6
Reliable Group Communication
  • Reliable multicasting: a message sent to a
    process group is delivered to every process in
    that group.
  • What happens if a new process attempts to join
    the group during the communication?
  • What happens if a sending process crashes during
    communication?
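  • We have to distinguish between reliable
    communication in the presence of faulty processes
    and reliable communication when processes are
    assumed not to fail.
  • First, let's look at the weaker form (no faulty
    processes, no group membership changes, and
    messages delivered in order).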

7
Basic Reliable-Multicasting Schemes
  • A simple solution to reliable multicasting when
    all receivers are known and are assumed not to
    fail:
  • (a) Message transmission.
  • (b) Reporting feedback: receivers report the
    messages they have missed.
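The scheme in the figure can be sketched as follows, assuming the sender numbers its messages, keeps each one in a history buffer until every receiver has acknowledged it, and retransmits whatever a receiver reports missing. The classes and the `drop_for` loss simulation are hypothetical, and all calls are synchronous to keep the example small.

```python
class Receiver:
    def __init__(self, name):
        self.name = name
        self.expected = 0      # next sequence number to deliver
        self.pending = {}      # received but not yet delivered (out of order)
        self.delivered = []

    def on_message(self, sender, seq, message):
        if seq < self.expected or seq in self.pending:
            return  # duplicate: already have it
        self.pending[seq] = message
        # A gap in the sequence numbers reveals missed messages:
        # report each one to the sender, which retransmits it.
        for missing in range(self.expected, seq):
            if missing not in self.pending:
                self.pending[missing] = sender.retransmit(missing)
        # Deliver in sequence order and acknowledge each delivered message.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            sender.on_ack(self.name, self.expected)
            self.expected += 1


class Sender:
    def __init__(self, receivers):
        self.receivers = receivers
        self.next_seq = 0
        self.history = {}   # seq -> message, kept until all receivers ACK
        self.unacked = {}   # seq -> names of receivers that have not ACKed yet

    def multicast(self, message, drop_for=()):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        self.unacked[seq] = {r.name for r in self.receivers}
        for r in self.receivers:
            if r.name not in drop_for:  # simulate loss towards some receivers
                r.on_message(self, seq, message)
        return seq

    def retransmit(self, seq):
        return self.history[seq]

    def on_ack(self, name, seq):
        self.unacked[seq].discard(name)
        if not self.unacked[seq]:
            # Every receiver has the message: purge it from the history buffer.
            del self.unacked[seq]
            del self.history[seq]


a, b = Receiver("A"), Receiver("B")
s = Sender([a, b])
s.multicast("m0", drop_for={"B"})   # B misses m0
print(s.history)                     # {0: 'm0'}: B has not ACKed yet
s.multicast("m1")                    # B sees seq 1, detects the gap, recovers m0
print(b.delivered, s.history)        # ['m0', 'm1'] {}
```

Note that a receiver only notices a loss when a later message arrives, so a lost final message stays undetected in this sketch.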

8
Basic Reliable-Multicasting Schemes
  • Some problems:
  • N receivers imply N acknowledgements per
    message, leading to ACK implosion.
  • Using negative acknowledgements (NACKs) instead
    scales better.
  • With NACKs only, how does the sender know when it
    can purge a message from its send buffer?
  • There are several proposals for reliable
    multicast.

9
Nonhierarchical Feedback Control
  • Several receivers have scheduled a request for
    retransmission, but the first retransmission
    request leads to the suppression of others.
  • SRM: Scalable Reliable Multicast
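A rough simulation of feedback suppression, assuming every receiver that missed the message draws its timer uniformly from [0, max_delay] and that a multicast NACK reaches all other receivers after a fixed `propagation_delay` (both hypothetical parameters). A receiver sends its own NACK only if its timer fires before the first NACK reaches it.

```python
import random


def nack_senders(receivers, max_delay, propagation_delay, rng):
    """Return the receivers that actually multicast a retransmission request."""
    timers = {r: rng.uniform(0.0, max_delay) for r in receivers}
    first = min(timers.values())
    # The earliest NACK is heard everywhere at first + propagation_delay;
    # any receiver whose timer fires by then sends anyway, the rest are suppressed.
    return sorted(r for r, t in timers.items() if t <= first + propagation_delay)


rng = random.Random(42)
print(nack_senders(range(20), max_delay=1.0, propagation_delay=0.0, rng=rng))   # a single NACK
print(nack_senders(range(20), max_delay=1.0, propagation_delay=0.05, rng=rng))  # may include duplicates
```

The design trade-off is visible in the parameters: a larger `max_delay` relative to the propagation delay suppresses more duplicate NACKs, but it also delays recovery.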

10
Scalable Reliable Multicast
  • Intended to provide reliable packet delivery over
    the Internet
  • A receiver sends a join message announcing it
    wants to receive
  • Each group member is responsible for detecting
    packet loss by finding gaps in the sequence
    numbers
  • When a node detects a missing packet, it schedules
    a repair request for a random time in the future.
    When the timer goes off, the node multicasts the
    repair request, and it retries after a further
    random delay. If it receives the packet, it
    cancels the repair request.
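A sketch of the receiver side of this loss recovery in Python. The request timer is drawn from [c1·d, (c1+c2)·d], where d is the receiver's estimated delay to the source, and each retry doubles the interval; this is modeled on the timers described for SRM, but the class, the defaults c1 = c2 = 2, and the explicit `now` clock are assumptions made for the example.

```python
import random


class SrmReceiver:
    """Receiver-side loss detection and repair requests, modeled on SRM."""

    def __init__(self, rng, delay_to_source, c1=2.0, c2=2.0):
        self.rng = rng
        self.d = delay_to_source    # estimated one-way delay to the source
        self.c1, self.c2 = c1, c2   # timer constants (illustrative defaults)
        self.highest = -1           # highest sequence number seen so far
        self.received = set()
        self.request_timers = {}    # seq -> (fire_time, backoff factor)

    def _schedule(self, seq, now, backoff):
        # Random delay from [c1*d, (c1+c2)*d], stretched by the backoff factor.
        delay = backoff * self.rng.uniform(self.c1 * self.d, (self.c1 + self.c2) * self.d)
        self.request_timers[seq] = (now + delay, backoff)

    def on_data(self, seq, now):
        self.received.add(seq)
        self.request_timers.pop(seq, None)  # packet arrived: cancel its repair request
        # Gaps in the sequence numbers reveal lost packets.
        for gap in range(self.highest + 1, seq):
            if gap not in self.received and gap not in self.request_timers:
                self._schedule(gap, now, backoff=1)
        self.highest = max(self.highest, seq)

    def tick(self, now):
        """Fire due timers; return the seqs whose repair request is multicast now."""
        sent = []
        for seq, (fire_time, backoff) in sorted(self.request_timers.items()):
            if fire_time <= now:
                sent.append(seq)
                self._schedule(seq, now, backoff * 2)  # retry later, doubling the interval
        return sent


rx = SrmReceiver(random.Random(0), delay_to_source=1.0)
rx.on_data(0, now=0.0)
rx.on_data(3, now=0.1)            # packets 1 and 2 are missing
print(sorted(rx.request_timers))  # [1, 2]
print(rx.tick(now=10.0))          # both timers have fired: [1, 2]
rx.on_data(1, now=10.5)           # packet 1 arrives: its request is cancelled
print(sorted(rx.request_timers))  # [2]
```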

11
Scalable Reliable Multicast
  • When a node receives a repair request, it
    schedules a repair for some time in the future.
  • When the timer goes off, it multicasts the repair.
  • If it receives the repair from another node
    before the timer goes off, it cancels its own
    repair.
  • Idea: the delays reduce the number of duplicate
    messages in the network and help avoid ACK/NACK
    implosion.
  • The repair timer depends on the distance to the
    node requesting the repair, so a nearby node is
    more likely to retransmit than a faraway node.
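The effect of distance on the repair timer can be checked with a small Monte Carlo sketch. Assume each node holding the packet draws its repair timer uniformly from [d1·dist, (d1+d2)·dist], where dist is its delay to the requesting node, and that the first repair to fire suppresses all others instantly; `d1`, `d2` and the node names are illustrative.

```python
import random


def repair_share(dist_to_requester, trials, rng, d1=1.0, d2=1.0):
    """Fraction of trials in which each node's repair timer fires first."""
    wins = {node: 0 for node in dist_to_requester}
    for _ in range(trials):
        timers = {
            node: rng.uniform(d1 * dist, (d1 + d2) * dist)
            for node, dist in dist_to_requester.items()
        }
        wins[min(timers, key=timers.get)] += 1  # earliest timer sends the repair
    return {node: count / trials for node, count in wins.items()}


rng = random.Random(7)
# Even with overlapping timer ranges, the nearby node wins most of the time.
print(repair_share({"near": 1.0, "mid": 1.5, "far": 4.0}, 10_000, rng))
```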

12
Hierarchical Feedback Control
  • The essence of hierarchical reliable
    multicasting.
  • Each local coordinator forwards the message to
    its children.
  • A local coordinator handles retransmission
    requests.
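A minimal sketch of such a tree in Python, assuming each local coordinator buffers every message it forwards and that children ask their own coordinator, not the original sender, for retransmissions. The `Coordinator` class, the `drop` loss simulation, and querying the parent only on a local miss are illustrative assumptions.

```python
class Coordinator:
    """A node in the multicast tree; the root plays the role of the sender."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.buffer = {}             # seq -> message, kept to serve retransmissions
        self.upstream_requests = 0   # requests this node had to pass up the tree
        if parent is not None:
            parent.children.append(self)

    def receive(self, seq, message, drop=frozenset()):
        """Store a message and forward it to the children; names in `drop` lose it."""
        self.buffer[seq] = message
        for child in self.children:
            if child.name not in drop:
                child.receive(seq, message, drop)

    def request_retransmission(self, seq):
        """Handle a child's retransmission request, locally if possible."""
        if seq not in self.buffer:
            # This coordinator missed it too: ask its own coordinator.
            self.upstream_requests += 1
            self.buffer[seq] = self.parent.request_retransmission(seq)
        return self.buffer[seq]

    def recover(self, seq):
        """Called by a node that detected a gap: ask the local coordinator."""
        self.buffer[seq] = self.parent.request_retransmission(seq)


sender = Coordinator("sender")
c1 = Coordinator("C1", sender)
r1, r2 = Coordinator("R1", c1), Coordinator("R2", c1)

sender.receive(0, "m0", drop={"R2"})   # the root originates m0; only R2 misses it
r2.recover(0)                          # served by C1; the sender is not involved
print(r2.buffer, c1.upstream_requests, sender.upstream_requests)   # {0: 'm0'} 0 0
```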

13
The Atomic Multicast Problem
  • Idea: a message is delivered either to all
    processes in the group or to none at all.
  • One important application is a replicated
    database in a distributed system.
  • A reliable multicast is called virtually
    synchronous if a message multicast to a group is
    delivered to each nonfaulty process in the group
    and, if the sender crashes during the multicast,
    the message is either delivered to all remaining
    processes or ignored by each of them.

14
Virtual Synchrony (1)
  • The logical organization of a distributed system
    to distinguish between message receipt and
    message delivery

15
Virtual Synchrony (2)
  • The principle of virtual synchronous multicast.

16
Message Ordering
  • In general, four different orderings are
    possible
  • 1) unordered multicasts
  • 2) FIFO-ordered multicasts
  • 3) causally-ordered multicasts
  • 4) totally-ordered multicasts
  • A reliable, unordered multicast is a virtually
    synchronous multicast in which no guarantees are
    given concerning the order in which messages are
    delivered by different processes.
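As one illustration of the third ordering, causally-ordered multicasting is commonly implemented with vector timestamps (a technique not shown on the slide): a message from process j is held back until it is the next message from j and everything j had delivered before sending it has also been delivered locally. A minimal sketch in Python, assuming a fixed group of `n` processes numbered 0 to n-1 and reliable delivery; the class name is hypothetical.

```python
class CausalMember:
    """Causally-ordered delivery with vector timestamps (one common technique)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n   # clock[j]: messages from j delivered here (own index: sent)
        self.holdback = []     # received but not yet deliverable
        self.delivered = []

    def send(self, payload):
        self.clock[self.pid] += 1
        self.delivered.append(payload)                 # a sender delivers its own message at once
        return (self.pid, list(self.clock), payload)   # a copy of the vector is the timestamp

    def _deliverable(self, msg):
        sender, ts, _ = msg
        # Next message from `sender`, and nothing the sender had seen is missing here.
        return ts[sender] == self.clock[sender] + 1 and all(
            ts[k] <= self.clock[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, msg):
        self.holdback.append(msg)
        progress = True
        while progress:  # delivering one message may unblock others
            progress = False
            for m in list(self.holdback):
                if self._deliverable(m):
                    self.holdback.remove(m)
                    self.clock[m[0]] += 1
                    self.delivered.append(m[2])
                    progress = True


p0, p1, p2 = (CausalMember(i, 3) for i in range(3))
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")   # sent after delivering m1, so m2 causally depends on m1
p2.receive(m2)       # arrives first and is held back
p2.receive(m1)
print(p2.delivered)  # ['m1', 'm2']
```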

17
Message Ordering (1)
  • Three communicating processes in the same group.
    The ordering of events per process is shown along
    the vertical axis.

18
Message Ordering (2)
  • Four processes in the same group with two
    different senders, and a possible delivery order
    of messages under FIFO-ordered multicasting
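FIFO ordering can be enforced with a per-sender sequence number and a hold-back queue: a message is delivered only after all earlier messages from the same sender, while messages from different senders may interleave freely. A minimal sketch in Python; the sender names, sequence numbers and arrival order are illustrative.

```python
from collections import defaultdict


class FifoMember:
    """FIFO-ordered delivery: per-sender sequence numbers plus a hold-back queue."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # sender -> next sequence number to deliver
        self.holdback = defaultdict(dict)   # sender -> {seq: payload} waiting for a gap
        self.delivered = []

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return  # duplicate of a message that was already delivered
        queue = self.holdback[sender]
        queue[seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in queue:
            self.delivered.append(queue.pop(self.next_seq[sender]))
            self.next_seq[sender] += 1


p = FifoMember()
p.receive("P1", 1, "m2")   # m2 overtook m1 from the same sender: held back
p.receive("P4", 0, "m3")   # another sender: delivered right away
p.receive("P1", 0, "m1")   # fills the gap, releasing m1 and then m2
p.receive("P4", 1, "m4")
print(p.delivered)         # ['m3', 'm1', 'm2', 'm4']
```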

19
Implementing Virtual Synchrony (1)
  • Six different versions of virtually synchronous
    reliable multicasting.

20
Implementing Virtual Synchrony (2)
  • Process 4 notices that process 7 has crashed and
    sends a view change
  • Process 6 sends out all its unstable messages,
    followed by a flush message
  • Process 6 installs the new view when it has
    received a flush message from everyone else
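The member-side logic of this flush step can be sketched in Python as follows, assuming reliable FIFO channels so that a flush from a member arrives after every unstable message that member forwarded. Only the forwarding and the install condition are shown; the `Member` class, the `outbox` list standing in for the network, and the group {4, 5, 6, 7} are hypothetical.

```python
class Member:
    """One group member's handling of a view change (flush step only)."""

    def __init__(self, pid, view):
        self.pid = pid
        self.view = frozenset(view)
        self.unstable = {}        # msg_id -> payload, not yet known to be received by all
        self.pending_view = None  # view being installed, if any
        self.flushes = set()      # members whose flush message has arrived
        self.outbox = []          # (destination, message) pairs; stands in for the network

    def on_view_change(self, new_view):
        self.pending_view = frozenset(new_view)
        others = sorted(self.pending_view - {self.pid})
        # First forward every unstable message to the other members of the new view...
        for msg_id, payload in sorted(self.unstable.items()):
            for dest in others:
                self.outbox.append((dest, ("data", msg_id, payload)))
        # ...then send each of them a flush message.
        for dest in others:
            self.outbox.append((dest, ("flush", self.pid)))
        self._maybe_install()

    def on_flush(self, sender):
        self.flushes.add(sender)
        self._maybe_install()

    def _maybe_install(self):
        # Install the new view once a flush has arrived from everyone else in it.
        if self.pending_view is not None and self.flushes >= self.pending_view - {self.pid}:
            self.view = self.pending_view
            self.pending_view = None
            self.flushes = set()


p6 = Member(6, {4, 5, 6, 7})
p6.unstable["m1"] = "x"
p6.on_view_change({4, 5, 6})   # process 7 has crashed
p6.on_flush(4)
print(sorted(p6.view))         # [4, 5, 6, 7]: still waiting for process 5
p6.on_flush(5)
print(sorted(p6.view))         # [4, 5, 6]
```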