Efficient Eventual Leader Election in Crash-Recovery Systems presentation

1
Efficient Eventual Leader Election in
Crash-Recovery Systems

Mikel Larrea, Cristian Martín, Iratxe Soraluze
University of the Basque Country, UPV/EHU

2
Contents

Motivation
System Model
Efficiency Definitions
A Near-Efficient Algorithm
Instability Awareness
Efficient Algorithms
Relaxing the Assumptions

3
Motivation

Unreliable failure detectors have been used to
address Consensus and related problems in
asynchronous crash-prone distributed systems
Theory: impossibility/possibility results,
minimality results
Practice: efficient implementations,
transformations
The Omega failure detector satisfies the
following property (eventual leader election):
there is a time after which every correct process
always trusts the same correct process
Omega is the weakest failure detector for solving
Consensus in the crash failure model
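
The eventual leader election property can be illustrated on a finite trace of leader outputs. The sketch below is only an approximation, since "eventually" cannot be decided from a finite prefix; the function name stabilization_round and the trace layout (one dict per round, mapping each process to the process it trusts) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set


def stabilization_round(trace: List[Dict[str, str]],
                        correct: Set[str]) -> Optional[int]:
    """Return the first round from which all correct processes trust the
    same correct process until the end of the trace, or None."""
    if not trace:
        return None
    final = {trace[-1][p] for p in correct}
    if len(final) != 1:
        return None  # no agreement (or no correct process) in the last round
    leader = next(iter(final))
    if leader not in correct:
        return None  # agreement on an incorrect process does not count
    start = len(trace)
    # Walk backwards while the agreement on `leader` still holds.
    for i in range(len(trace) - 1, -1, -1):
        if all(trace[i][p] == leader for p in correct):
            start = i
        else:
            break
    return start


trace = [
    {"p1": "p1", "p2": "p2", "p4": "p4"},
    {"p1": "p4", "p2": "p2", "p4": "p4"},
    {"p1": "p4", "p2": "p4", "p4": "p4"},
    {"p1": "p4", "p2": "p4", "p4": "p4"},
]
print(stabilization_round(trace, {"p1", "p2", "p4"}))
```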

4
Eventual Leader Election
[Figure: every process that is up trusts p4; legend: crashed, correct]
5
Is Omega a Failure Detector?

The Eventually Perfect failure detector (◇P)
satisfies
Strong completeness: eventually every process
that crashes is permanently suspected by every
correct process
Eventual strong accuracy: there is a time after
which correct processes are not suspected by any
correct process
The Eventually Strong failure detector (◇S)
satisfies
Strong completeness
Eventual weak accuracy: there is a time after
which some correct process is never suspected by
any correct process
Omega is equivalent to ◇S
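
To make the link between suspicion lists and leader election concrete, here is a minimal sketch of the well-known way to obtain a leader from an Eventually Perfect failure detector in the crash (not crash-recovery) model: trust the first process that is not suspected. The function name and the fixed process ordering are assumptions of this example, not something taken from the slides.

```python
from typing import List, Optional, Set


def leader_from_suspects(processes: List[str],
                         suspected: Set[str]) -> Optional[str]:
    """Return the first process, in the agreed order, that is not
    currently suspected, or None if every process is suspected."""
    for p in processes:
        if p not in suspected:
            return p
    return None


order = ["p1", "p2", "p3", "p4"]
print(leader_from_suspects(order, {"p1"}))
```

Once every correct process suspects exactly the crashed processes, all of them return the same correct process, which is the behavior Omega requires.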

6
This Work

We address the implementation of Omega in the
crash-recovery failure model
crashed processes can recover
some (unstable) processes can crash and recover
infinitely often
Previously proposed algorithms are not efficient
they require every process to periodically send a
message to the rest of processes
We propose several algorithms in which
eventually, among correct processes, only one
(the elected leader) keeps sending messages
forever

7
System Model

Finite set of n processes Π = {p1, p2, ..., pn}
that communicate only by message-passing
processes are synchronous
Every pair of processes is connected by two
unidirectional communication links, one in each
direction
types of links: eventually timely, fair lossy
Crash-recovery failure model
types of processes: eventually up, eventually
down, unstable
eventually up processes are correct, the rest
incorrect
we assume that at least one process is correct

8
Efficiency Definitions

An algorithm implementing Omega in the
crash-recovery failure model is efficient if
there is a time after which only one process
sends messages forever
An algorithm implementing Omega in the
crash-recovery failure model is near-efficient if
there is a time after which, among correct
processes, only one sends messages forever
Since the leader must send messages forever, an
efficient algorithm is also near-efficient
In a near-efficient algorithm, besides the
leader, unstable processes can send messages
forever
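
The two definitions can be restated as a small classifier. This is only a sketch: the function classify and its inputs (the set of processes that send messages forever, and the set of correct processes) are names chosen for this example, and in practice neither set is observable in finite time.

```python
from typing import Set


def classify(forever_senders: Set[str], correct: Set[str]) -> str:
    """Return the strongest label that applies to an Omega algorithm,
    given who keeps sending messages forever."""
    correct_senders = forever_senders & correct
    if len(correct_senders) != 1:
        return "neither"
    if len(forever_senders) == 1:
        return "efficient"  # only the (correct) leader sends forever
    return "near-efficient"  # the leader plus some incorrect processes


correct = {"p1", "p4"}
print(classify({"p4"}, correct))
print(classify({"p4", "p2"}, correct))
print(classify({"p1", "p4"}, correct))
```

The function reports only the strongest label, so a result of "efficient" also implies near-efficiency.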

9
A Near-Efficient Algorithm

Assumptions on communication reliability/synchrony:
(i) for every correct process p, there is an
eventually timely link from p to every correct
and every unstable process
(ii) for every unstable process u, there is a
fair lossy link from u to every correct process
Uses a set of candidates to become leader, and a
counter of the number of times that each process
has recovered
During initialization (and upon recovery), a
RECOVERED message is sent to the rest of
processes
The leader is set to the process in the set of
candidates with the smallest associated counter
If a process considers itself the leader, it
sends a LEADER message periodically to the rest
of processes
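
The leader selection rule described on this slide can be sketched as follows. The names choose_leader and should_send_leader are hypothetical, and breaking ties by process identifier is an assumption of this example; the slide only says that the candidate with the smallest recovery counter is chosen.

```python
from typing import Dict, Set


def choose_leader(candidates: Set[str], recovered: Dict[str, int]) -> str:
    """Pick the candidate with the smallest recovery counter.
    Ties are broken by process identifier (an assumption)."""
    return min(candidates, key=lambda q: (recovered.get(q, 0), q))


def should_send_leader(me: str, candidates: Set[str],
                       recovered: Dict[str, int]) -> bool:
    """A process sends LEADER messages only while it considers
    itself the leader."""
    return choose_leader(candidates, recovered) == me


candidates = {"p2", "p3", "p4"}
recovered = {"p2": 5, "p3": 1, "p4": 0}
print(choose_leader(candidates, recovered))
print(should_send_leader("p3", candidates, recovered))
```

Presumably the counter is what keeps unstable processes from winning in the long run: a process that recovers infinitely often accumulates an ever-growing counter, while the counter of a correct process stops growing.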

10
A Near-Efficient Algorithm
11
A Near-Efficient Algorithm
12
Unstable Processes Disagree

With this algorithm, eventually every correct
process always trusts the same correct process l.
Consequently, eventually among correct processes,
only one keeps sending LEADER messages
Concerning the behavior of unstable processes:
(1) upon recovery, they send a RECOVERED message
to the rest of processes
(2) initially they trust themselves, and they can
trust other unstable processes before trusting
process l
We propose an adaptation that avoids (2):
initially they do not trust any process, and if
they remain up for sufficiently long then they
trust l until they crash
the adaptation assumes a majority of correct
processes

13
Unstable Processes Disagree
[Figure: leader trusted by each process: p4, p4, p4, p2, p4, p2]
14
Instability Awareness
15
Instability Awareness
16
Instability Awareness
[Figure: leader trusted by each process: p4, p4, p4, NULL, p4, p4]
17
Instability Awareness

The proposed adaptation makes the algorithm no
longer near-efficient, since all correct
processes may send PONG messages forever
Can we design an algorithm such that
processes do not have access to stable storage,
unstable processes eventually do not disagree,
and it is near-efficient?
Yes We Can!

18
A Near-Efficient Algorithm
19
A Near-Efficient Algorithm
20
An Efficient Algorithm

Assumes that local stable storage is accessible, to store
the process recovery counter
the leader identity
Assumption on communication reliability/synchrony:
(i) for every correct process p, there is an
eventually timely link from p to every correct
and every unstable process
No need of RECOVERED messages
With this algorithm, eventually every process
that is up, either correct or unstable, always
trusts the same correct process l
assuming that every unstable process succeeds in
writing l permanently to its stable storage
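
A minimal sketch of how a process might keep its recovery counter and leader identity in stable storage, here modeled as a JSON file. The file layout and the names load_state, save_state and on_start are assumptions of this example, as is the choice to increment the counter on every start (including the first one); the slides do not specify these details.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def load_state(path: str) -> Tuple[int, Optional[str]]:
    """Read (recovery counter, leader) from stable storage."""
    try:
        with open(path) as f:
            state = json.load(f)
        return state["recovered"], state["leader"]
    except FileNotFoundError:
        return 0, None  # first start: nothing stored yet


def save_state(path: str, recovered: int, leader: Optional[str]) -> None:
    """Write the state to a temporary file, then rename it into place,
    so a crash during the write leaves the old state intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"recovered": recovered, "leader": leader}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def on_start(path: str) -> Tuple[int, Optional[str]]:
    """Run during initialization and upon recovery: bump the counter
    and resume with the last stored leader, if any."""
    recovered, leader = load_state(path)
    recovered += 1
    save_state(path, recovered, leader)
    return recovered, leader
```

Because the leader identity survives a crash, a process that recovers can resume trusting the stored leader instead of starting from itself.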

21
Another Efficient Algorithm

Besides (i), assumes a non-decreasing local clock
at each process
The elected leader will be the oldest correct
process, i.e., the process that first recovers
permanently (never crashing again)

22
Relaxing the Assumptions

Based on message relaying
Weaker assumptions on communication
reliability/synchrony
(i) for every correct process p, there is an
eventually timely path from p to every correct
and every unstable process
(ii) for every unstable process u, there is a
fair lossy link from u to some correct process
Algorithms are no longer (near-)efficient
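
Message relaying can be illustrated with a simple reachability sketch: every process forwards a message the first time it receives it, so the message reaches every process connected to the sender by a path, even without a direct link. The function relay_reach and the link map are hypothetical, and the sketch ignores timing and message loss entirely.

```python
from collections import deque
from typing import Dict, Set


def relay_reach(links: Dict[str, Set[str]], source: str) -> Set[str]:
    """Return the processes reached when each process forwards a
    message over its outgoing links on first receipt."""
    reached = {source}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for q in links.get(p, set()):
            if q not in reached:  # forward only on first receipt
                reached.add(q)
                queue.append(q)
    return reached


# p1 has no direct link to p3, only a path through p2.
links = {"p1": {"p2"}, "p2": {"p3"}, "p3": set()}
print(sorted(relay_reach(links, "p1")))
```

Since intermediate processes must keep forwarding, more than one correct process sends messages forever, which is consistent with the loss of (near-)efficiency noted on the slide.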

23
The One Slide to Remember

The Omega failure detector provides an eventual
leader election functionality in a distributed
system
Theory: weakest failure detector for solving
Consensus
Practice: used by several real fault-tolerant
protocols
It is interesting to design efficient algorithms
implementing Omega
In the crash-recovery failure model, we have to
cope with unstable processes
to prevent them from sending messages forever
to avoid disagreement with correct processes
Stable storage, if available, makes things easier

24
An Example: Paxos

Leslie Lamport. The Part-Time Parliament. ACM
Transactions on Computer Systems, 1998. First
submitted in 1990!
Leader-based Consensus algorithms
Could benefit from efficient leader election
Production use of Paxos (from Wikipedia):
Google Chubby distributed lock service
IBM SAN Volume Controller
Microsoft Autopilot cluster management service
WANdisco Distributed Coordination Engine
Scalien Keyspace