LINF 2345 Leader election and consensus with crash and Byzantine failures - PowerPoint PPT Presentation

1
LINF 2345: Leader election and consensus with
crash and Byzantine failures
  • Seif Haridi
  • Peter Van Roy

2
Overview
  • Synchronous systems with crash failures
  • Leader election in rings
  • Fault-tolerant consensus

3
Leader Election in Rings
4
Background: Rings
  • The ring topology is a circular arrangement of
    nodes, often used as a control topology in
    distributed computations
  • A graph is regular if all nodes have the same
    degree
  • A ring is an undirected, connected, regular graph
    of degree 2
  • G is a ring is equivalent to:
  • There is a one-to-one mapping of V to {0, …, n−1}
    such that the neighbors of node i are nodes i−1
    and i+1 (modulo n)
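The neighbor rule of this mapping can be written down directly; a tiny Python sketch (the helper name is mine):

```python
def ring_neighbors(i, n):
    """Neighbors of node i in an n-node ring: i-1 and i+1, modulo n."""
    return ((i - 1) % n, (i + 1) % n)

# Every node has degree 2, and the wrap-around at 0 and n-1
# makes the graph connected.
```

For example, in a 5-node ring the neighbors of node 0 are nodes 4 and 1.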

5
The Leader Election Problem
  • A situation where a group of processors must
    select a leader among them
  • Simplifies coordination
  • Helpful in achieving fault tolerance
  • Coordinator in two- and three-phase commit
  • Represents a general class of symmetry-breaking
    problems
  • Deadlock removal

6
The Leader Election Problem
  • An algorithm solves the leader election problem
    if:
  • The terminated states are partitioned into
    elected and non-elected states
  • Once a processor enters an elected/non-elected
    state, its transition function will only move it
    to another (or the same) elected/non-elected
    state
  • In every admissible execution exactly one
    processor enters an elected state and all others
    enter a non-elected state.

7
The Leader Election Problem: Rings
  • In fact we have seen an election algorithm in the
    previous section on arbitrary network topologies
  • For rings:
  • Edges go between pi and pi+1 (addition modulo n),
    for all i, 0 ≤ i ≤ n−1
  • Processors have a consistent notion of left
    (clockwise) and right (counterclockwise)

[Figure: a simple oriented ring of three processors p0, p1, p2; each edge end is labeled 1 or 2, giving a consistent orientation]
8
Anonymous Rings
  • A leader election algorithm for a ring is
    anonymous if:
  • Every processor has the same state machine
  • This implies that processors do not have unique
    identifiers
  • An algorithm is uniform if it does not use the
    value n, the number of processors
  • Otherwise the algorithm is nonuniform
  • For each size n there is a state machine, but it
    might be different for different sizes n

9
Anonymous Rings: Impossibility Results
  • Main result:
  • There is no anonymous leader election algorithm
    for ring systems
  • The result can be stated more comprehensively as:
  • There is no nonuniform anonymous algorithm for
    leader election in synchronous rings
  • An impossibility result for synchronous systems
    implies the same impossibility result for
    asynchronous systems. Why?
  • An impossibility result for nonuniform algorithms
    implies the same for uniform algorithms. Why?

10
Anonymous Rings: Impossibility Results
  • An impossibility result for synchronous systems
    implies the same impossibility result for
    asynchronous systems. Why?
  • Answer: An admissible execution in a synchronous
    system is also an admissible execution in an
    asynchronous system
  • Therefore there is always at least one admissible
    execution of any asynchronous algorithm that does
    not satisfy the correctness condition of a leader
    election algorithm
  • An impossibility result for nonuniform algorithms
    implies the same for uniform algorithms. Why?
  • Answer: If there were a uniform algorithm, it
    could be used as a nonuniform algorithm

11
Asynchronous Rings
  • Processors have unique identifiers, which can be
    any natural numbers
  • For each pi, there is a variable idi initialized
    to the identifier of pi
  • We specify a ring by listing the processors
    starting from the one with the smallest
    identifier
  • Each processor pi, 0 ≤ i ≤ n−1, is assigned idi

[Figure: a ring of four processors with identifiers (p0, 0), (p1, 10), (p2, 5), (p3, 97)]
12
Asynchronous Rings: An O(n²) Algorithm
  • Each processor sends a message with its id to its
    left neighbor, and waits for messages from its
    right neighbor
  • When a processor pi receives a message m, it
    checks the id of m
  • If m.id > pi.id, pi forwards m to its own left
    neighbor
  • Otherwise the message is consumed
  • A processor pk that receives a message with its
    own id declares itself as a leader, and sends a
    termination message to its left neighbor
  • A processor that receives a termination message
    forwards it to the left, and terminates as a
    non-leader
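As a sanity check, the forwarding rule can be simulated in Python. The direction convention (the left neighbor of pi is modeled as p(i+1) mod n), the assumption of unique ids, and the function name are mine; the termination-message round is omitted since it does not affect who wins:

```python
def elect_leader(ids):
    """Simulate the O(n^2) ring election; returns the index of the leader.

    Each processor sends its (unique) id to its left neighbor; a message
    carrying a larger id than the receiver's is forwarded, otherwise it is
    consumed.  The processor that receives its own id back (the one with
    the maximum id) is the leader.
    """
    n = len(ids)
    # In-flight messages as (destination, carried id); the left neighbor
    # of processor i is modeled as (i + 1) % n.
    msgs = [((i + 1) % n, ids[i]) for i in range(n)]
    while msgs:
        nxt = []
        for dest, mid in msgs:
            if mid == ids[dest]:
                return dest                        # own id came back: leader
            if mid > ids[dest]:
                nxt.append(((dest + 1) % n, mid))  # forward to the left
            # otherwise the message is consumed
        msgs = nxt
```

Only the maximum id circulates all the way around the ring, which is why the winner is the processor with the largest identifier.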

13
Asynchronous Rings: An O(n²) Algorithm
  • The algorithm never sends more than O(n²)
    messages
  • O(n²) means c·n² is an upper bound, for some
    constant c
  • The processor with the lowest id may forward n
    messages plus one termination message
  • There is an admissible execution in which the
    algorithm sends Θ(n²) messages
  • Θ(n²) means c1·n² is an upper bound and c2·n² is
    a lower bound, for some constants c1 and c2

14
Asynchronous Rings: An O(n²) Algorithm
  • Example (an execution)
  • The message of the processor with identifier i is
    sent exactly i+1 times
  • n termination messages
  • Total is (sum over i of (i+1)) + n
    = n(n+1)/2 + n = Θ(n²)

[Figure: a ring with identifiers 0, 1, 2, …, n−2, n−1 placed in order]
15
Asynchronous Rings: An O(n log n) Algorithm
  • The k-neighborhood of a processor pi is the set
    of processors up to distance k from pi in the
    ring (to the left and to the right)
  • The algorithm operates in phases, starting at 0
  • At the kth phase a processor tries to be the
    winner of that phase
  • To be a phase-k winner it must have the largest
    id in its 2^k-neighborhood
  • Only winners of phase k continue to phase k+1
  • At the end only one processor survives, and it is
    elected as the leader

16
Asynchronous Rings: An O(n log n) Algorithm
  • In phase 0, each processor pi attempts to be a
    phase-0 winner
  • pi sends a ⟨probe, idi⟩ message to its
    1-neighborhood
  • If the id of a neighbor receiving the probe is
    greater than idi, the message is swallowed
  • Otherwise the neighbor sends a reply message
  • If pi receives a reply message from both its
    neighbors, it becomes a phase-0 winner and
    continues with phase 1

17
Asynchronous Rings: An O(n log n) Algorithm
  • In phase k, each processor pi that is a
    (k−1)-phase winner sends probe messages to its
    2^k-neighborhood
  • Each message traverses up to 2^k processors, one
    by one
  • A probe is forwarded by a processor if its id is
    smaller than the probe's id and it is not the
    last processor in the neighborhood

18
Asynchronous Rings: An O(n log n) Algorithm
  • If the probe is not swallowed, the last processor
    sends back a reply
  • If pi receives reply messages from both
    directions, it becomes a phase-k winner and
    continues with phase k+1
  • A processor that receives its own probe declares
    itself the leader and sends a termination message
    around the ring

19
Asynchronous Rings: An O(n log n) Algorithm

[Figure: a ring of processors p1, p2, …, p9; after phase 0, the phase-0 winners are p1, p3, p5, p7; after phase 1, the phase-1 winners are p1 and p5]
20
Asynchronous Rings: An O(n log n) Algorithm
Messages
  • ⟨probe, id, k, i⟩
  • ⟨reply, id, k⟩
  • id: the identifier of the processor
  • k: an integer, the phase number
  • i: an integer, a hop counter

21
Asynchronous Rings: An O(n log n) Algorithm
  • Initially: asleep := true
  • Upon receiving no message:
  •   if asleep then
  •     asleep := false
  •     send ⟨probe, id, 0, 1⟩ to left and right

22
Asynchronous Rings: An O(n log n) Algorithm
  • Upon receiving ⟨probe, j, k, d⟩ from left (resp.
    right):
  •   if j = id then terminate as the leader
  •   if j > id and d < 2^k then
  •     send ⟨probe, j, k, d+1⟩ to right (resp.
        left)
  •   if j > id and d = 2^k then
  •     send ⟨reply, j, k⟩ to left (resp. right)

23
Asynchronous Rings: An O(n log n) Algorithm
  • Upon receiving ⟨reply, j, k⟩ from left (resp.
    right):
  •   if j ≠ id then send ⟨reply, j, k⟩ to right
        (resp. left)
  •   else
  •     if already received ⟨reply, j, k⟩ from
        right (resp. left) then
  •       send ⟨probe, id, k+1, 1⟩ to left and
          right
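A processor survives phase k exactly when it holds the largest id in its 2^k-neighborhood, so the sequence of phase winners can be computed directly from that rule. This is a shortcut over the probe/reply message exchange, not a message-level simulation; the function name and unique-id assumption are mine:

```python
def ring_leader(ids):
    """Return the index of the leader found by the O(n log n) algorithm.

    A processor is a phase-k winner iff its id is the maximum among the
    2^k processors on each side of it (its 2^k-neighborhood).
    """
    n = len(ids)
    winners = list(range(n))
    k = 0
    while len(winners) > 1:
        d = 2 ** k
        # Keep only the processors that are maximal in their neighborhood;
        # the global maximum always survives, so `winners` never empties.
        winners = [i for i in winners
                   if ids[i] == max(ids[(i + j) % n] for j in range(-d, d + 1))]
        k += 1
    return winners[0]
```

After about log n phases the neighborhood wraps the whole ring, leaving only the processor with the globally largest id, which matches the message-passing version.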

24
Fault-Tolerant Consensus
25
Fault-Tolerant Consensus: Overview
  • Study problems that arise when a distributed
    system is unreliable
  • Processors may behave incorrectly
  • The consensus problem
  • Requires processors to agree on a common output
    based on their (possibly conflicting) inputs
  • Types of failures
  • Crash failure (a processor stops operating)
  • Byzantine failure (a processor behaves
    arbitrarily, also known as malicious failure)

26
Fault-Tolerant Consensus: Overview
  • Synchronous systems
  • To solve consensus with Byzantine failures, fewer
    than a third of the processors may behave
    arbitrarily
  • We will show one algorithm in detail, which uses
    the optimal number of rounds but has exponential
    message complexity
  • More sophisticated algorithms are possible, for
    example, algorithms with polynomial message
    complexity

27
Fault-Tolerant Consensus: Overview
  • Asynchronous message passing systems
  • The consensus problem cannot be solved by
    deterministic algorithms, neither for crash nor
    Byzantine failures
  • This is a famous impossibility result first
    proved in 1985 by Fischer, Lynch, and Paterson
  • How do we get around this impossibility?
  • We can introduce a synchrony assumption or we can
    make the algorithm randomized (probabilistic).
  • Both solutions can be practical, but have their
    limitations

28
Synchronous Systems with Crash Failures
  • Assumptions
  • The communication graph is complete, i.e., a
    clique
  • Communication links are fully reliable
  • In the reliable synchronous system:
  • An execution consists of rounds
  • Each round consists of delivery of all messages
    pending in outbuf variables, followed by one
    computation step by each processor

29
Synchronous Systems with Crash Failures
  • An f-resilient system
  • A system where f processors can fail
  • Execution in an f-resilient system
  • There exist a subset F of at most f processors,
    the faulty processors (different for different
    executions)
  • Each round contains exactly one computation event
    for every processor not in F, and at most one
    computation event for every processor in F

30
Synchronous Systems with Crash Failures
  • Execution in an f-resilient system
  • Each round contains exactly one computation event
    for every processor not in F, and at most one
    computation event for every processor in F
  • If a processor in F does not have a computation
    event in some round, then it has no computation
    event in any subsequent round
  • In the last round in which a faulty processor has
    a computation event, an arbitrary subset of its
    outgoing messages are delivered

31
Synchronous Systems with Crash Failures
  • Clean failure
  • A situation where all or none of the processors
    messages are delivered in its last step
  • Consensus is easy and efficient for clean failure
  • We have to deal with non-clean failure
  • As we shall see, this is what makes the algorithm
    expensive

32
The Consensus Problem
  • Each pi has two special components: xi, called
    the input, and yi, called the output
  • Initially
  • Each xi contains a value from some well-ordered
    set
  • yi is undefined
  • Solution to the consensus problem must satisfy
    the following conditions
  • Termination
  • Agreement
  • Validity

33
The Consensus Problem
  • Termination
  • In every admissible execution, yi is eventually
    assigned a value, for every nonfaulty processor
    pi
  • Agreement
  • In every execution, if yi and yj are assigned,
    then yi = yj, for all nonfaulty processors pi and
    pj
  • Validity
  • In every execution, if yi is assigned v for some
    value v on a nonfaulty processor pi, then there
    exists a processor pj such that xj = v

34
Simple Algorithm
  • Needs f+1 rounds
  • Every processor maintains a set of values it
    knows to exist in the system
  • Initially this set contains only its input value
  • In later rounds
  • A processor updates its set by adding new values
    received from other processors
  • And broadcasts any new additions
  • At round f+1 each processor decides on the
    smallest value in its set

35
Simple Algorithm: Consensus in the Presence of
Crash Failures
  • Initially V = {x}
  • Round k, 1 ≤ k ≤ f+1:
  •   send {v ∈ V : pi has not already sent v} to
      all processors
  •   receive Sj from pj, 0 ≤ j ≤ n−1, j ≠ i
  •   V := V ∪ S0 ∪ … ∪ Sn−1
  •   if k = f+1 then y := min(V)
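The rounds above can be simulated directly, with a crash pattern injected to exercise a non-clean failure. The `fail_plan` format and all names here are my own modeling choices, not part of the original algorithm:

```python
def crash_consensus(inputs, f, fail_plan=None):
    """Simulate f+1 rounds of the simple crash-failure consensus algorithm.

    fail_plan maps a processor index to (round, recipients): in that round
    the processor delivers its new values only to `recipients` (a non-clean
    failure) and then halts.  Returns the set of decisions taken by the
    processors that never crashed (it should be a singleton).
    """
    fail_plan = fail_plan or {}
    n = len(inputs)
    V = [{inputs[i]} for i in range(n)]      # values known to each processor
    sent = [set() for _ in range(n)]         # values already broadcast
    alive = set(range(n))
    for rnd in range(1, f + 2):              # rounds 1 .. f+1
        inbox = [set() for _ in range(n)]
        for i in sorted(alive):
            new = V[i] - sent[i]             # broadcast only new additions
            sent[i] |= new
            if i in fail_plan and fail_plan[i][0] == rnd:
                recipients = fail_plan[i][1]  # partial delivery, then crash
                alive.discard(i)
            else:
                recipients = range(n)
            for j in recipients:
                inbox[j] |= new
        for j in alive:
            V[j] |= inbox[j]
    return {min(V[i]) for i in alive}        # decide the smallest known value
```

With f = 1, a processor that crashes in round 1 after reaching only one neighbor still cannot cause disagreement, because round 2 gives the survivors time to relay what they heard.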

36
Illustration of the Algorithm: f = 3
  • The algorithm requires f+1 rounds, and tolerates
    f crash failures

[Figure: rounds 1-4 for processors p0 through p4; in each round a value x is shown being forwarded]
37
Illustration of the Algorithm: f = 3
  • p2 and p4 survive
  • The others crash, one at a time
  • p2 and p4 have the value x

[Figure: rounds 1-4 for processors p0 through p4; in each round the crashing processor delivers x over only one link before halting]
38
How the algorithm works
  • Why is one round not enough?
  • Hint: non-clean failures!
  • In the previous slides, the value x is sent
    across only one link instead of all links,
    because the processor has a non-clean failure
  • We need enough rounds to cover the possibility of
    a non-clean failure in each round

39
Synchronous Systems with Byzantine Failures
  • We want to reach an agreement in spite of
    malicious processors
  • In an execution of an f-resilient Byzantine
    system, there is a subset of at most f processors
    which are faulty
  • In a computation step of a faulty processor, its
    state and the messages sent are completely
    unconstrained
  • A faulty processor may also mimic the behavior of
    a crashed processor

40
The Consensus Problem
  • Termination
  • In every admissible execution, yi is eventually
    assigned a value, for every nonfaulty processor
    pi
  • Agreement
  • In every execution, if yi and yj are assigned,
    then yi = yj, for all nonfaulty processors pi and
    pj
  • Validity
  • In every execution, if yi is assigned v for some
    value v on a nonfaulty processor pi, then there
    exists a processor pj such that xj = v

41
Lower Bounds on the number of Faulty Processors
  • If a third or more processors can be Byzantine
    then consensus cannot be reached

42
Lower Bounds on the number of Faulty Processors
  • If a third or more processors can be Byzantine
    then consensus cannot be reached
  • In a system with three processors such that one
    is Byzantine, there is no algorithm that solves
    the consensus problem

43
Three Processor System
  • Assume that there is a 3-processor algorithm A
    that solves the Byzantine agreement problem when
    one processor is faulty
  • Take two copies of A and configure them into a
    hexagonal system S

[Figure: a triangle of processors 1, 2, 3]
44
Three Processor System

[Figure: the triangle system A (processors 1, 2, 3) and the hexagonal system S built from two copies of A]
  • The input value for processors 1, 2, and 3 in one
    copy is 0
  • The input value for processors 1, 2, and 3 in the
    other copy is 1

45
Three Processor System
  • S is a synchronous system; each processor runs
    its algorithm from the triangle system A
  • Each processor in S knows its neighbors and is
    unaware of the other nodes
  • We expect S to exhibit a well-defined behavior
    with its input
  • Observe: S does not solve the consensus problem
  • Call the resulting execution α (an infinite
    synchronous execution)

[Figure: the hexagonal system S with inputs 0 and 1]
46
Execution α from the point of view of processors
2 and 3

[Figure: execution α of S, restricted to processors 2 and 3, looks like an execution α1 of A in which processor 1 is faulty]
  • Processors 2 and 3 see 1 as faulty, and since A
    is a consensus algorithm they both decide on 0 in
    execution α of S

47
Execution α from the point of view of processors
1 and 2

[Figure: execution α of S, restricted to processors 1 and 2, looks like an execution α2 of A in which processor 3 is faulty]
  • Processors 1 and 2 see 3 as faulty, and since A
    is a consensus algorithm they both decide on 1 in
    execution α of S

48
Execution α from the point of view of processors
1 and 3

[Figure: execution α of S, restricted to processors 1 and 3, looks like an execution α3 of A in which processor 2 is faulty]
  • Processors 1 and 3 see 2 as faulty, and since A
    is a consensus algorithm they both must decide
    on one output value in execution α of S
  • This is not possible, since they have already
    decided differently
  • A contradiction! Therefore A does not exist
49
Consensus Algorithm 1
  • Takes exactly f+1 rounds
  • Requires n ≥ 3f+1
  • The algorithm has two stages
  • First, information is gathered by communication
    among processors
  • Second, each processor computes locally its
    decision value

50
Information Gathering Phase
  • Information in each processor is represented as a
    tree, in which each path from the root to a leaf
    contains f+2 nodes (height f+1)
  • Nodes are labeled by sequences of processor
    names
  • The root is labeled by the empty sequence
  • Let the label of the internal node v be (i1, i2,
    …, ir); then for each i, 0 ≤ i ≤ n−1, not in the
    sequence, v has a child labeled (i1, i2, …,
    ir, i)

51
Information Gathering Phase: n = 4, f = 1
  • xi is the input value of pi
  • At () is the value x of this processor
  • At (j) is the value of pj given by pj
  • At (j,k) is the value of processor pj given by
    processor pk (the opinion of pk about xj)
  • For example, (1,2) is the opinion of p2 about the
    value x1
  • (1,2,0) is the opinion of p0 about the value of x1
    given to p0 by p2

[Figure: tree with root (), children (0), (1), (2), (3), and the children (1,0), (1,2), (1,3) of node (1)]
52
Information Gathering Phase: n = 4, f = 1
  • In the first round, each processor sends its
    initial value to all processors, including itself
  • When pi receives a value x from pj, it stores x
    at the node labeled (j)

[Figure: the same tree as on the previous slide]
53
Information Gathering Phase: n = 4, f = 1
  • At the beginning of round r, each processor
    broadcasts the rth level of its tree
  • When a processor receives a message from pj with
    the value labeled (i1,…,ir), it stores it at the
    node labeled (i1,…,ir,j) in its tree

54
Information Gathering Phase: n = 4, f = 1

[Figure: level 1 is the root (); level 2 holds (0), (1), (2), (3); level 3 holds (0,2), (1,2), (3,2)]
  • A node at level 3 stores the opinion of p2 about
    a value at level 2

55
Information Gathering Phase: n = 4, f = 1
  • pi stores at (i1,…,ir,j) the value that pj says
    that pir says that … that pi1 says it has
  • We denote this value by tree(i1,…,ir,j)
  • Information gathering continues for f+1 rounds,
    until the entire tree has been filled
  • The function resolve is then applied to the tree
    locally (a majority-voting function)

56
Information Gathering Phase: n = 4, f = 1, resolve
  • function resolve(t: Tree) =
      if t is a leaf then value(t)
      else Majority(⟨resolve(ti) : ti is a child of t⟩)
  • Majority takes a list of values and returns the
    most popular one (the one that occurs most
    frequently), otherwise a default value v
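A direct transcription of resolve in Python, with the tree stored as a dict from label tuples to values. The representation, the default value 0, and the strict-majority tie handling are my assumptions:

```python
from collections import Counter

def resolve(tree, n, f, label=()):
    """Majority-vote resolve over an information-gathering tree.

    `tree` maps label tuples (sequences of distinct processor indices)
    to stored values; leaves sit at level f+1.
    """
    if len(label) == f + 1:                  # leaf: return the stored value
        return tree[label]
    # Recursively resolve every child label + (i,) for i not already in label.
    votes = Counter(resolve(tree, n, f, label + (i,))
                    for i in range(n) if i not in label)
    value, count = votes.most_common(1)[0]
    # Strict majority, otherwise a default value (0 here, an assumption).
    return value if count > sum(votes.values()) // 2 else 0
```

For n = 4, f = 1, a tree whose leaves all report 1 resolves to 1 at the root even if a single leaf per subtree is corrupted, since each internal node takes a majority over three leaves.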

57
Information Gathering Phase: n = 4, f = 1, Resolve

[Figure: the full tree: root (); level-1 nodes (0), (1), (2), (3); level-2 leaves (0,1), (0,2), (0,3), (1,0), (1,2), (1,3), (2,0), (2,1), (2,3), (3,0), (3,1), (3,2)]
58
Information Gathering Phase: n = 4, f = 1, p0 is
malicious, p3's tree

[Figure: p3's tree filled with 0 and 1 values; the malicious p0 reports inconsistent values]
59
Information Gathering Phase: n = 3, f = 1, Resolve, p2

[Figure: p2's tree for n = 3, f = 1, filled with 0 and 1 values; one entry is unknown (?)]
60
Consensus Algorithms for Byzantine Failures
  • The minimum number of rounds is f+1, since crash
    failures are a special case of Byzantine failures

61
Exponential Tree Algorithm
  • Each processor maintains a tree data structure in
    its local state
  • Each node of the tree is labeled with a sequence
    of processor indices with no repeats
  • The root's label is the empty sequence λ (the
    root has level 0)
  • The root has n children, labeled 0 through n−1
  • A child node labeled i has n−1 children, labeled
    i:0 through i:n−1, skipping i:i

62
Exponential Tree Algorithm
  • The root's label is the empty sequence λ (the
    root has level 0)
  • The root has n children, labeled 0 through n−1
  • A child node labeled i has n−1 children, labeled
    i:0 through i:n−1, skipping i:i
  • In general, a node at level d with label v has
    n−d children, labeled v:0 through v:n−1, skipping
    any index appearing in v
  • Nodes at level f+1 are the leaves
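The label structure can be generated mechanically; for n = 4 and f = 1 this gives the root, 4 level-1 nodes, and 4 × 3 = 12 leaves (the function name is mine):

```python
def tree_labels(n, f):
    """Levels of the exponential tree: level d holds tuples of d distinct
    processor indices; nodes at level f+1 are the leaves."""
    levels = [[()]]                          # level 0: the root, label λ
    for _ in range(f + 1):
        # Extend each label by every index not already appearing in it.
        levels.append([v + (i,) for v in levels[-1]
                       for i in range(n) if i not in v])
    return levels
```

The no-repeats rule is what makes the tree finite: each level-d node has n−d children, so the tree has size exponential in f but not infinite.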

63
The tree when n = 4 and f = 1
64
The Exponential Tree Algorithm
  • Each processor fills in the tree nodes with
    values as the rounds go by
  • Initially, store your input in the root (level 0)
  • Round 1: send level 0 of your tree (the root);
    store the value received from pj in node j
    (level 1) (default if none)
  • Round 2: send level 1 of your tree; store the
    value received from pj for node k in node k:j
    (level 2) (the value that pj told me that pk told
    pj) (default if none)
  • Continue for f+1 rounds

65
The Exponential Tree Algorithm
  • In the last round, each processor uses the values
    in its tree to compute its decision. The decision
    is resolve(λ), where resolve(π) equals:
  • The value in the tree node labeled π, if it is a
    leaf
  • Majority{resolve(π') : π' is a child of π}
    otherwise (default if none)

66
Proof of Exponential Tree Algorithm
  • Lemma (5.10): Nonfaulty processor pi's resolved
    value for node π = π'j, what pj reports for π',
    equals what pj has stored for π'
  • Basis: π is a leaf. Then pi stores in node π what
    pj sends it for π' in the last round
  • For leaves, the resolved value is the tree value.

67
Proof of Exponential Tree Algorithm
  • Induction: π is not a leaf.
  • By the tree definition, π has at least n−f
    children
  • Since n > 3f, a majority of π's children πk are
    such that pk is nonfaulty.
  • Let πk be a child of π such that pk is nonfaulty.
  • Since pj is nonfaulty, pj correctly reports to pk
    that it has some value v in node π'; thus pk
    stores v in node π = π'j.
  • By induction, pi's resolved value for πk equals
    the value v that pk has in its tree node π
  • So all of π's nonfaulty children resolve to v in
    pi's tree, and thus π resolves to v in pi's tree.

68
The Exponential Tree Algorithm
69
The Exponential Tree Algorithm
  • Validity: Suppose all inputs are v.
  • A nonfaulty processor pi decides on resolve(λ),
    which is the majority among resolve(j), 0 ≤ j ≤
    n−1
  • The previous lemma implies that for each
    nonfaulty pj, resolve(j) is the value stored at
    the root of pj's tree, which is pj's input v.
  • Thus pi decides v.

70
The Exponential Tree Algorithm
  • Agreement: Show that all nonfaulty processors
    resolve the same value for their tree roots
  • A node is common if all nonfaulty processors
    resolve the same value for it. We will show the
    root is common
  • Strategy
  • Show that every node with a certain property is
    common.
  • Show that the root has the property.

71
The Exponential Tree Algorithm
  • If every π-to-leaf path has a common node, then π
    is common.

72
The Exponential Tree Algorithm
  • Show every root-to-leaf path has a common node.
  • There are f+2 nodes on a root-to-leaf path.
  • The label of each non-root node on a
    root-to-leaf path ends in a distinct processor
    index: i1, i2, …, i(f+1).
  • At least one of these indices is that of a
    nonfaulty processor, say ik.
  • Lemma 5.10 implies that the node whose label
    ends in ik is common.