Title: An evaluation of ring-based algorithms for the Eventually Perfect failure detector class
- Joachim Wieland
- Mikel Larrea
- Alberto Lafuente
- The University of the Basque Country
Contents
- Motivation
- Unreliable failure detectors
- System model
- Communication-efficient implementations of ◇P
- A non communication-efficient approach
- Performance evaluation
- Conclusion
Motivation
- FLP impossibility result (Fischer, Lynch, and Paterson): Consensus cannot be solved deterministically in an asynchronous system subject to even a single process crash
- Possibility result (Chandra and Toueg): Consensus can be solved in an asynchronous system subject to failures when the system is augmented with an unreliable failure detector
Motivation (2)
- Evaluate different ring-based algorithms for ◇P
- Two kinds of performance parameters
  - Communication efficiency
  - Quality of service
- Two families of algorithms
  - Communication-efficient ◇P + some optimizations
  - Non communication-efficient ◇Q + transformation to ◇P
    - modular approach
    - designed with quality of service in mind
Unreliable failure detectors
- Distributed oracle that provides (possibly incorrect) hints about the operational status of other processes
- Abstractly characterized in terms of two properties: completeness and accuracy
  - Completeness characterizes the degree to which crashed processes are suspected by correct processes
  - Accuracy characterizes the degree to which correct processes are not suspected, i.e., it restricts the false suspicions that a failure detector can make
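For the two classes compared in this talk, completeness and accuracy can be written out in the standard Chandra and Toueg style. This is a sketch in notation assumed here rather than taken from the slides: H(q, t) is the set of processes that q suspects at time t, and crashed and correct are the sets of processes that do and do not crash in the run.

```latex
\begin{align*}
&\text{Strong completeness:}
  && \exists t \;\forall p \in \mathit{crashed} \;\forall q \in \mathit{correct} \;\forall t' \ge t :\; p \in H(q, t') \\
&\text{Weak completeness:}
  && \exists t \;\forall p \in \mathit{crashed} \;\exists q \in \mathit{correct} \;\forall t' \ge t :\; p \in H(q, t') \\
&\text{Eventual strong accuracy:}
  && \exists t \;\forall t' \ge t \;\forall p, q \in \mathit{correct} :\; p \notin H(q, t')
\end{align*}
```

◇P combines strong completeness with eventual strong accuracy; ◇Q combines weak completeness with eventual strong accuracy.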
Unreliable failure detectors (2)
System model
- Finite set of n processes Π = {p1, p2, ..., pn} that communicate only by message-passing
- Every pair of processes is connected by two unidirectional and reliable communication links
- Processes can fail by crashing. Once a process crashes, it does not recover
- Processes are arranged in a logical ring
- Partially synchronous system
Communication-efficient implementations of ◇P
- A basic communication-efficient algorithm (LLW0)
  - each process p sends heartbeats to the processes in the ring between itself (excluded) and its successor succ_p (included)
  - p monitors its predecessor pred_p by listening for heartbeats from it
  - upon timeout on pred_p, p suspects it and starts monitoring the predecessor of pred_p
  - if p erroneously suspected q, p starts monitoring q again
  - processes propagate the list of suspicions around the ring, piggybacked on the heartbeats
  - upon reception of the list of suspicions from pred_p, p builds a new list by merging the received list with its local suspicions. Then, p sets succ_p to the nearest non-suspected process following it in the ring
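The last two LLW0 steps (merging suspicion lists, then choosing the successor and the heartbeat targets) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the list representation of the ring, and the fallback when every other process is suspected are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def ring_after(ring: List[str], p: str) -> List[str]:
    """Processes that follow p in ring order, ending just before p."""
    i = ring.index(p)
    return ring[i + 1:] + ring[:i]


def nearest_unsuspected_successor(ring: List[str], p: str, suspected: Set[str]) -> str:
    """First process after p in the ring that is not suspected."""
    for q in ring_after(ring, p):
        if q not in suspected:
            return q
    return p  # assumption: if everyone else is suspected, p is its own successor


def heartbeat_targets(ring: List[str], p: str, succ: str) -> List[str]:
    """Processes between p (excluded) and succ (included), in ring order."""
    targets = []
    for q in ring_after(ring, p):
        targets.append(q)
        if q == succ:
            break
    return targets


def on_suspicion_list(ring: List[str], p: str,
                      received: Set[str], local: Set[str]) -> Tuple[Set[str], str]:
    """Merge the list received from pred_p with p's local suspicions,
    then recompute succ_p from the merged list."""
    merged = set(received) | set(local)
    return merged, nearest_unsuspected_successor(ring, p, merged)


ring = ["p1", "p2", "p3", "p4", "p5", "p6"]
merged, succ = on_suspicion_list(ring, "p4", received={"p5"}, local=set())
print(succ, heartbeat_targets(ring, "p4", succ))
```

In this run p4 learns that p5 is suspected and moves its successor to p6, while still sending heartbeats to p5, which is what lets a wrongly suspected process be heard from again.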
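Communication-efficient implementations of ◇P (2)
[Figure: LLW0 on a ring of six processes, p1 to p6, with the label p5 marked along the ring. Initial configuration: pred_1 = p6, succ_1 = p2; pred_2 = p1, succ_2 = p3; pred_3 = p2, succ_3 = p4; pred_4 = p3, succ_4 = p5; pred_5 = p4, succ_5 = p6; pred_6 = p5, succ_6 = p1. Updated configuration: succ_4 = p6 and pred_6 = p4.]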
Communication-efficient implementations of ◇P (3)
- Providing a faster stabilization of the ring (LLW1): sending sporadic one-to-one messages
  - upon timeout on pred_p, p sends (START, p) to pred(pred_p). Upon reception of this message, pred(pred_p) sets its successor to p
  - when p learns that it is erroneously suspecting q, p sends (START, q) to its current pred_p. Upon reception of this message, that process sets its successor to q
- Broadcasting suspicions to reduce the detection latency (LLW2): sending sporadic one-to-all messages
  - upon timeout on pred_p, p sends (SUSPICION, pred_p) to all processes
  - when a process p learns that it is being erroneously suspected, it sends (REFUTATION, p) to all processes
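The sporadic messages of LLW1 and LLW2 can be sketched as pure functions that return the messages to send. This is an illustration under assumptions, not the authors' implementation: the function names and the (destination, message) tuples are hypothetical, pred(pred_p) is taken to be the ring predecessor of pred_p, and a broadcast here skips the sender itself.

```python
from typing import List, Set, Tuple

Message = Tuple[str, str]        # (kind, process id)
Envelope = Tuple[str, Message]   # (destination, message)


def ring_pred(ring: List[str], q: str) -> str:
    """Predecessor of q in ring order (wraps around)."""
    return ring[(ring.index(q) - 1) % len(ring)]


def llw1_on_timeout(ring: List[str], p: str, pred_p: str,
                    suspected: Set[str]) -> Tuple[Set[str], str, Envelope]:
    """p times out on pred_p: suspect it, start monitoring pred(pred_p),
    and send (START, p) to that process."""
    new_pred = ring_pred(ring, pred_p)
    return suspected | {pred_p}, new_pred, (new_pred, ("START", p))


def llw1_on_false_suspicion(q: str, pred_p: str,
                            suspected: Set[str]) -> Tuple[Set[str], str, Envelope]:
    """p learns it wrongly suspects q: send (START, q) to p's current
    pred_p, stop suspecting q, and monitor q again."""
    return suspected - {q}, q, (pred_p, ("START", q))


def on_start(message: Message) -> str:
    """Receiver side of START: the carried process becomes the new successor."""
    kind, new_succ = message
    if kind != "START":
        raise ValueError(f"unexpected message kind: {kind}")
    return new_succ


def llw2_on_timeout(ring: List[str], p: str, pred_p: str) -> List[Envelope]:
    """LLW2: broadcast (SUSPICION, pred_p) to every other process."""
    return [(q, ("SUSPICION", pred_p)) for q in ring if q != p]


def llw2_refute(ring: List[str], p: str) -> List[Envelope]:
    """LLW2: a wrongly suspected p broadcasts (REFUTATION, p)."""
    return [(q, ("REFUTATION", p)) for q in ring if q != p]


ring = ["p1", "p2", "p3", "p4", "p5", "p6"]
suspected, new_pred, envelope = llw1_on_timeout(ring, "p6", "p5", set())
print(new_pred, envelope, on_start(envelope[1]))
```

In the example p6 times out on p5, begins monitoring p4, and the START message makes p4 adopt p6 as its successor in one step instead of waiting for the suspicion list to travel around the ring.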
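Communication-efficient implementations of ◇P (4)
[Figure: LLW1 on a ring of six processes, p1 to p6, with the label p5 marked along the ring. Initial configuration: pred_1 = p6, succ_1 = p2; pred_2 = p1, succ_2 = p3; pred_3 = p2, succ_3 = p4; pred_4 = p3, succ_4 = p5; pred_5 = p4, succ_5 = p6; pred_6 = p5, succ_6 = p1. Updated configuration: succ_4 = p6 and pred_6 = p4.]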
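Communication-efficient implementations of ◇P (5)
[Figure: LLW2 on a ring of six processes, p1 to p6, with the label p5 marked along the ring. Initial configuration: pred_1 = p6, succ_1 = p2; pred_2 = p1, succ_2 = p3; pred_3 = p2, succ_3 = p4; pred_4 = p3, succ_4 = p5; pred_5 = p4, succ_5 = p6; pred_6 = p5, succ_6 = p1. Updated configuration: succ_4 = p6 and pred_6 = p4.]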
A non communication-efficient approach
- A basic ring-based algorithm implementing ◇Q (LLWQ)
  - identical monitoring scheme to LLW0
  - the list of suspicions does not circulate around the ring
  - every process p periodically sends (START, p) to pred_p. When a process p receives (START, new_succ), it sets succ_p to new_succ
- Providing a faster stabilization of the ring
- Transforming ◇Q into ◇P
  - propagating suspicions through the ring (LLWQP1)
  - broadcasting suspicions to reduce the detection latency (LLWQP2)
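The periodic START exchange of the ◇Q algorithm can be sketched as a single simulated round. This is only an illustration: the function name and the dictionary representation are assumptions, and the alive set is a simulation device, since real processes cannot know which processes have crashed.

```python
from typing import Dict, Set


def start_round(pred_of: Dict[str, str], alive: Set[str]) -> Dict[str, str]:
    """Simulate one period: every live process p sends (START, p) to
    pred_of[p]; each live receiver sets its successor to the sender.
    Returns the resulting successor of every process that got a START."""
    succ_of: Dict[str, str] = {}
    for p in sorted(alive):
        target = pred_of[p]
        if target in alive:          # messages to crashed processes have no effect
            succ_of[target] = p      # receiver handles (START, new_succ)
    return succ_of


# p5 has crashed and p6 has already switched to monitoring p4.
pred_of = {"p1": "p6", "p2": "p1", "p3": "p2", "p4": "p3", "p6": "p4"}
alive = {"p1", "p2", "p3", "p4", "p6"}
print(start_round(pred_of, alive))
```

After the round p4's successor is p6, so the ring closes around the crashed process without any suspicion list circulating; the price is one extra periodic message per live process.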
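A non communication-efficient approach (2)
[Figure: LLWQ on a ring of six processes, p1 to p6. Initial configuration: pred_1 = p6, succ_1 = p2; pred_2 = p1, succ_2 = p3; pred_3 = p2, succ_3 = p4; pred_4 = p3, succ_4 = p5; pred_5 = p4, succ_5 = p6; pred_6 = p5, succ_6 = p1. Updated configuration: succ_4 = p6 and pred_6 = p4.]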
Performance evaluation
Performance evaluation (2)
Performance evaluation (3)
Performance evaluation (4)
Performance evaluation (5)
Performance evaluation (6)
Conclusion
- Evaluation of two families of heartbeat-based, ring-based algorithms implementing ◇P
- Communication-efficient family
  - n links are eventually used
- Non communication-efficient family
  - modular approach
  - n + C links are eventually used (with 1 ≤ C ≤ n)
  - better quality of service