Fault Tolerance - PowerPoint PPT Presentation

Slides: 21

Provided by: steve1836

Transcript and Presenter's Notes

Title: Fault Tolerance

1
Fault Tolerance

Chapter 7

2
Agreement in Faulty Systems

Lamport, L., Shostak, R., and Pease, M.,
The Byzantine Generals Problem, ACM Transactions
on Programming Languages and Systems, vol. 4,
no. 3, pp. 382-401, July 1982.
Basic result: in a system with m faulty
processes, agreement can be achieved only if 2m+1
correctly functioning processes are present, for
a total of 3m+1 (assuming reliable messaging).
If messages are not guaranteed to be delivered
within a known, finite time, no agreement is
possible if even one process is faulty (Fischer
et al., 1985).
- For example, the two-army problem.

3
Agreement in Faulty Systems (1)

The Byzantine generals problem for 3 loyal
generals and 1 traitor.
(a) The generals announce their troop strengths
(in units of 1 kilosoldiers).
(b) The vectors that each general assembles based
on (a).
(c) The vectors that each general receives in
step 3.

4
Agreement in Faulty Systems (2)

The same as on the previous slide, except now
with 2 loyal generals and 1 traitor.
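
The two cases on slides 3 and 4 can be sketched as a small simulation. This is a minimal sketch, not a definitive implementation: the function name byzantine_round, the UNKNOWN marker, and the particular lies the traitor tells (100 + receiver, then vectors of 1000-range garbage) are assumptions made for illustration. Each loyal general takes a per-element majority over the vectors it received in step 3.

```python
from collections import Counter

UNKNOWN = None  # marker for "no majority"


def byzantine_round(strengths, traitor):
    """Run the four steps once; return each loyal general's result vector.

    strengths: true troop strength of every general (index = general).
    traitor:   index of the single traitor (hypothetical lie pattern below).
    """
    generals = range(len(strengths))

    # Steps 1-2: everyone announces a strength; each general assembles a vector.
    # Assumption: the traitor tells every receiver a different value.
    def announced(sender, receiver):
        return 100 + receiver if sender == traitor else strengths[sender]

    vectors = {
        g: [strengths[g] if s == g else announced(s, g) for s in generals]
        for g in generals
    }

    # Step 3: everyone forwards its vector; the traitor forwards garbage
    # that differs per receiver.
    def forwarded(sender, receiver):
        if sender == traitor:
            return [1000 + 10 * receiver + k for k in generals]
        return vectors[sender]

    # Step 4: per element, take the majority over the received vectors.
    results = {}
    for g in generals:
        if g == traitor:
            continue
        received = [forwarded(s, g) for s in generals if s != g]
        decided = []
        for k in generals:
            value, count = Counter(v[k] for v in received).most_common(1)[0]
            decided.append(value if count > len(received) / 2 else UNKNOWN)
        results[g] = decided
    return results


print(byzantine_round([1, 2, 3, 4], traitor=2))
print(byzantine_round([1, 2, 3], traitor=2))
```

With 3 loyal generals and 1 traitor, every loyal general ends up with the same vector, with only the traitor's entry UNKNOWN. With 2 loyal generals and 1 traitor, no element reaches a majority, so every entry is UNKNOWN and the loyal generals cannot agree on each other's strengths.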

5
Reliable RPC

Already covered

6
Reliable Group Communication

Reliable multicasting: a message sent to a
process group is delivered to every process in
that group.
What happens if a new process attempts to join
the group during the communication?
What happens if a sending process crashes during
communication?
We have to distinguish between reliability in the
presence of faults and reliability without them.
First, let's look at the weaker form (no faults,
no group changes, messages in any order).

7
Basic Reliable-Multicasting Schemes

A simple solution to reliable multicasting when
all receivers are known and are assumed not to
fail.
(a) Message transmission
(b) Reporting feedback (receivers know of misses)
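
A minimal sketch of the sender side of this scheme, assuming every receiver positively acknowledges each sequence number. The class name AckingSender and its methods are hypothetical, chosen only for illustration. The sender keeps each message in a history buffer until all known receivers have acknowledged it.

```python
class AckingSender:
    """Sender that buffers a message until every receiver has ACKed it."""

    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.history = {}   # seq -> (payload, receivers that have not ACKed)
        self.next_seq = 0

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = (payload, set(self.receivers))
        return seq

    def on_ack(self, receiver, seq):
        if seq not in self.history:
            return  # already purged, or never sent
        _, waiting = self.history[seq]
        waiting.discard(receiver)
        if not waiting:
            del self.history[seq]  # everyone has it: safe to purge

    def unacked(self, seq):
        """Receivers that may still need a retransmission of seq."""
        return sorted(self.history[seq][1]) if seq in self.history else []


sender = AckingSender(["r1", "r2", "r3"])
seq = sender.multicast("hello")
sender.on_ack("r1", seq)
sender.on_ack("r2", seq)
print(sender.unacked(seq))  # ['r3']
```

Because the sender waits for one acknowledgement per receiver per message, the amount of feedback grows with the size of the group.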

8
Basic Reliable-Multicasting Schemes

Some problems
N receivers implies N acknowledgements, leading
to ACK implosion
Can use NACKs instead, and this scales better
With negative acknowledgements, how do you know
when the sender can purge its send buffer?
There are several proposals for reliable
multicast.
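
The NACK alternative can be sketched as follows. This is an illustration under stated assumptions, not one of the proposals mentioned above: NackSender and NackReceiver are hypothetical names, and message loss is simulated by simply not calling receive. The receiver detects gaps in the sequence numbers and reports only what it is missing.

```python
class NackSender:
    def __init__(self):
        self.history = {}   # seq -> payload; never purged in this sketch
        self.next_seq = 0

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        return seq, payload

    def retransmit(self, seq):
        return seq, self.history[seq]


class NackReceiver:
    def __init__(self):
        self.expected = 0   # next sequence number to deliver
        self.pending = {}   # out-of-order messages held back
        self.delivered = []

    def receive(self, seq, payload):
        """Store the message, deliver what is in order, return seqs to NACK."""
        if seq >= self.expected:
            self.pending[seq] = payload
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        highest = max(self.pending, default=self.expected)
        return [s for s in range(self.expected, highest) if s not in self.pending]


sender, receiver = NackSender(), NackReceiver()
m0, m1, m2 = (sender.multicast(p) for p in "abc")
receiver.receive(*m0)
missing = receiver.receive(*m2)          # m1 was lost on the way
for seq in missing:
    receiver.receive(*sender.retransmit(seq))
print(missing, receiver.delivered)       # [1] ['a', 'b', 'c']
```

Note that the history buffer here only grows: with negative acknowledgements alone, silence from a receiver does not tell the sender that a message has arrived everywhere.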

9
Nonhierarchical Feedback Control

Several receivers have scheduled a request for
retransmission, but the first retransmission
request leads to the suppression of others.
SRM: Scalable Reliable Multicast

10
Scalable Reliable Multicast

Intended to provide reliable packet delivery over
the Internet
A receiver sends a join message announcing it
wants to receive
Each group member is responsible for detecting
packet loss by finding gaps in the sequence
numbers
When a node detects a missing packet, it
schedules a repair request for a random time in
the future. When the timer goes off, the node
multicasts the repair request and, if no repair
arrives, retries after a random delay.
If it receives the packet before the timer goes
off, it cancels the repair request.

11
Scalable Reliable Multicast

When a node receives a repair request it
schedules a repair for some time in the future.
When the timer goes off it multicasts the repair.
If it receives the repair from another node
before the timer goes off then it cancels the
repair.
Idea: the delays help to reduce the number of
duplicate messages in the network and to reduce
ACK/NACK implosion.
The repair timer depends on the distance to the
node requesting the repair, so a nearby node is
more likely to retransmit than a faraway node.
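
The timer-based suppression can be sketched as follows. This is a simplified model under stated assumptions: the function names, the single fixed latency between all nodes, and the constants d1 and d2 are illustrative choices, not values taken from the slides. A node multicasts only if no one else's multicast reached it before its own timer expired.

```python
import random


def repair_delay(distance, rng, d1=1.0, d2=1.0):
    """Random delay that grows with distance to the requester (assumed form)."""
    return rng.uniform(d1 * distance, (d1 + d2) * distance)


def who_multicasts(timers, latency):
    """Return the nodes whose timers fire before they hear someone else.

    timers:  node -> time at which its timer expires.
    latency: time for a multicast to reach the other nodes (assumed uniform).
    """
    senders = []
    for node, fires_at in sorted(timers.items(), key=lambda kv: kv[1]):
        heard = any(timers[s] + latency <= fires_at for s in senders)
        if not heard:
            senders.append(node)  # timer expired first: this node multicasts
    return senders


timers = {"near": 1.2, "mid": 1.5, "far": 6.0}
print(who_multicasts(timers, latency=0.1))  # ['near']
print(who_multicasts(timers, latency=1.0))  # ['near', 'mid']

rng = random.Random(0)
print(repair_delay(1.0, rng) < repair_delay(5.0, rng))  # True
```

With a short latency, the first multicast suppresses all others. With a longer latency, a second node fires before the first multicast reaches it, so suppression reduces duplicates but does not eliminate them.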

12
Hierarchical Feedback Control

The essence of hierarchical reliable
multicasting.
(a) Each local coordinator forwards the message
to its children.
(b) A local coordinator handles retransmission
requests.

13
The Atomic Multicast Problem

Idea: a message is delivered either to all nodes
or to none at all.
One important application is a replicated
database in a distributed system.
A reliable multicast in which a message is
delivered to every nonfaulty process in the group
is said to be virtually synchronous. If the
sender crashes during the multicast, the message
is either delivered to all remaining processes or
ignored by each of them.

14
Virtual Synchrony (1)

The logical organization of a distributed system
to distinguish between message receipt and
message delivery

15
Virtual Synchrony (2)

The principle of virtual synchronous multicast.

16
Message Ordering

In general, four different orderings are
possible:
1) unordered multicasts
2) FIFO-ordered multicasts
3) causally-ordered multicasts
4) totally-ordered multicasts
A reliable, unordered multicast is a virtually
synchronous multicast in which no guarantees are
given concerning the order in which messages are
delivered by different processes.

17
Message Ordering (1)

Three communicating processes in the same group.
The ordering of events per process is shown along
the vertical axis.

18
Message Ordering (2)

Four processes in the same group with two
different senders, and a possible delivery order
of messages under FIFO-ordered multicasting
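
A minimal sketch of how a receiver could enforce FIFO ordering, assuming each sender numbers its own messages from 0. The class name FifoReceiver, the process names P1 and P4, and the message names m1 to m4 are illustrative assumptions. A message that has been received is held back until all earlier messages from the same sender have been delivered.

```python
from collections import defaultdict


class FifoReceiver:
    def __init__(self):
        self.next_seq = defaultdict(int)   # sender -> next seq to deliver
        self.held = defaultdict(dict)      # sender -> {seq: message}
        self.delivered = []                # messages passed to the application

    def receive(self, sender, seq, message):
        if seq < self.next_seq[sender]:
            return  # duplicate of something already delivered
        self.held[sender][seq] = message
        # Deliver as long as the next expected message from this sender is here.
        while self.next_seq[sender] in self.held[sender]:
            self.delivered.append(self.held[sender].pop(self.next_seq[sender]))
            self.next_seq[sender] += 1


p2, p3 = FifoReceiver(), FifoReceiver()
# P1 sends m1 then m2; P4 sends m3 then m4.
for sender, seq, msg in [("P1", 0, "m1"), ("P4", 0, "m3"), ("P1", 1, "m2"), ("P4", 1, "m4")]:
    p2.receive(sender, seq, msg)
# At P3, m2 arrives before m1 and has to wait.
for sender, seq, msg in [("P4", 0, "m3"), ("P1", 1, "m2"), ("P1", 0, "m1"), ("P4", 1, "m4")]:
    p3.receive(sender, seq, msg)
print(p2.delivered)  # ['m1', 'm3', 'm2', 'm4']
print(p3.delivered)  # ['m3', 'm1', 'm2', 'm4']
```

The two receivers deliver in different overall orders, which FIFO ordering allows: only the relative order of messages from the same sender is constrained.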

19
Implementing Virtual Synchrony (1)

Six different versions of virtually synchronous
reliable multicasting.

20
Implementing Virtual Synchrony (2)

(a) Process 4 notices that process 7 has crashed,
sends a view change
(b) Process 6 sends out all its unstable
messages, followed by a flush message
(c) Process 6 installs the new view when it has
received a flush message from everyone else
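
The three steps above can be sketched from the point of view of a single process. This is a simplified sketch under stated assumptions: the class name Member, its method names, and the four-process group {4, 5, 6, 7} are hypothetical, and the network is left out entirely. The process forwards its unstable messages, sends a flush, and installs the new view only once every other member of that view has flushed.

```python
class Member:
    def __init__(self, pid, view):
        self.pid = pid
        self.view = set(view)
        self.unstable = []        # messages not yet known to be received by all
        self.pending_view = None  # proposed view, not yet installed
        self.flushed = set()      # members whose flush has been received

    def on_view_change(self, new_view):
        """Return what to multicast: unstable messages, then a flush."""
        self.pending_view = set(new_view)
        self.flushed = set()
        return [("msg", m) for m in self.unstable] + [("flush", self.pid)]

    def on_flush(self, sender):
        """Record a flush; install the new view once everyone else has flushed."""
        if self.pending_view is None:
            return False
        self.flushed.add(sender)
        others = self.pending_view - {self.pid}
        if others <= self.flushed:
            self.view = self.pending_view
            self.pending_view = None
            self.unstable = []    # everything was forwarded before the flush
            return True
        return False


p6 = Member(6, view={4, 5, 6, 7})
p6.unstable = ["m"]
print(p6.on_view_change({4, 5, 6}))  # [('msg', 'm'), ('flush', 6)]
print(p6.on_flush(4))                # False
print(p6.on_flush(5))                # True
print(sorted(p6.view))               # [4, 5, 6]
```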
