CMPT 401 2008

1
Lecture IX: Coordination and Agreement
  • CMPT 401 2008
  • Dr. Alexandra Fedorova

2
A Replicated Service
[Figure: clients, network, and servers (one master, two slaves); arrows labeled W (write), R (read), and data replication.]
3
A Need For Coordination And Agreement
[Figure: clients, network, and servers (one master, two slaves). Annotations: the servers must coordinate the election of a new master and must agree on the new master.]
4
Roadmap
  • Today we will discuss protocols for coordination
    and agreement
  • This is a difficult problem because of failures
    and the lack of a bound on message delay
  • We will begin with a strong set of assumptions
    (assume few failures), and then we will relax
    those assumptions
  • We will look at several problems requiring
    communication and agreement: distributed mutual
    exclusion and election
  • We will finally learn that in an asynchronous
    distributed system where processes may fail,
    consensus cannot be guaranteed

5
Distributed Mutual Exclusion (DMTX)
  • Similar to a local mutual exclusion problem
  • Processes in a distributed system share a
    resource
  • Only one process can access a resource at a time
  • Examples
  • File sharing
  • Sharing a bank account
  • Updating a shared database

6
Assumptions and Requirements
  • An asynchronous system
  • Processes do not fail
  • Message delivery is reliable (exactly once)
  • Protocol requirements:
  • Safety: at most one process may execute in the
    critical section at a time
  • Liveness: requests to enter and exit the
    critical section eventually succeed
  • Fairness: requests to enter the critical section
    are granted in the order in which they were
    received

7
Evaluation Criteria of DMTX Algorithms
  • Bandwidth consumed
  • proportional to the number of messages sent in
    each entry and exit operation
  • Client delay
  • delay incurred by a process at each entry and
    exit operation
  • System throughput
  • the rate at which processes can access the
    critical section (number of accesses per unit of
    time)

8
DMTX Algorithms
  • We will consider the following algorithms
  • Central server algorithm
  • Ring-based algorithm
  • An algorithm based on voting

9
The Central Server Algorithm
10
The Central Server Algorithm
  • Performance
  • Entering a critical section takes two messages (a
    request message followed by a grant message)
  • System throughput is limited by the
    synchronization delay at the server (the time
    between the release message to the server and the
    grant message to the next client)
  • Fault tolerance
  • Does not tolerate failures
  • What if the client holding the token fails?
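
The request/grant/release exchange described above can be sketched as follows. This is a minimal single-process sketch, not the lecture's own code: the class name CentralServer and its methods are hypothetical, messages are modeled as plain method calls, and failures are ignored (matching the "processes do not fail" assumption).

```python
from collections import deque


class CentralServer:
    """Hypothetical central server: grants one token, queues requests in arrival order."""

    def __init__(self):
        self.holder = None       # process currently holding the token
        self.waiting = deque()   # FIFO queue of pending requests

    def request(self, pid):
        """Handle a request message; return True if the token is granted immediately."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Handle a release message; return the next token holder, or None."""
        if pid != self.holder:
            raise ValueError("only the token holder may release")
        # Grant the token to the oldest waiting request, if any.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

If the holder never calls release (for example because it crashed), every queued process waits forever, which is the fault-tolerance weakness noted on the slide.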

11
A Ring-Based Algorithm
12
A Ring-Based Algorithm (cont)
  • Processes are arranged in a ring
  • There is a communication channel from process pi
    to process p((i+1) mod N)
  • They continuously pass the mutual exclusion token
    around the ring
  • A process that does not need to enter the
    critical section (CS) passes the token along
  • A process that needs to enter the CS retains the
    token; once it exits the CS, it passes the token
    on
  • No fault tolerance
  • Excessive bandwidth consumption
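
The token passing described above can be traced with a small simulation. This is a sketch under stated assumptions: the function name ring_token_walk is hypothetical, processes are numbered 0 to N-1, and a process is assumed to enter and leave the CS instantly when it holds the token.

```python
def ring_token_walk(n, wants_cs, start=0):
    """Pass the token around a ring of n processes until every process in
    wants_cs has entered the critical section.

    Returns (entry_order, hops), where hops counts token-passing messages.
    """
    pending = set(wants_cs)
    if any(p < 0 or p >= n for p in pending):
        raise ValueError("process ids must be in range(n)")
    order = []
    hops = 0
    holder = start
    while pending:
        if holder in pending:
            order.append(holder)      # holder enters, then exits, the CS
            pending.remove(holder)
        if pending:
            holder = (holder + 1) % n  # pass the token to the next process
            hops += 1
    return order, hops
```

In the real algorithm the token keeps circulating even when nobody wants the CS, which is where the bandwidth cost comes from; the sketch stops early only so that it terminates.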

13
Maekawa's Voting Algorithm
  • To enter a critical section a process must
    receive permission from a subset of its peers
  • Processes are organized in voting sets
  • A process is a member of M voting sets
  • All voting sets are of equal size (for fairness)

14
Maekawa's Voting Algorithm
  • Intersection of voting sets guarantees mutual
    exclusion
  • To avoid deadlock, requests to enter the critical
    section must be ordered

[Figure: processes p1, p2, p3, and p4.]
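
The intersection property can be checked mechanically. The sketch below builds voting sets with a simple grid construction, which is an assumption made for illustration (the slides do not say how the sets are built, and Maekawa's optimal sets are smaller): N = k*k processes are laid out in a k-by-k grid and the voting set of a process is its row plus its column. The function names are hypothetical.

```python
from itertools import combinations


def grid_voting_sets(k):
    """Voting sets for N = k*k processes placed in a k-by-k grid.

    The voting set of process i is the union of its row and its column.
    """
    sets = {}
    for i in range(k * k):
        row, col = divmod(i, k)
        row_members = {row * k + c for c in range(k)}
        col_members = {r * k + col for r in range(k)}
        sets[i] = row_members | col_members
    return sets


def pairwise_intersecting(sets):
    """True if every two voting sets share at least one process."""
    return all(sets[a] & sets[b] for a, b in combinations(sets, 2))
```

With this construction every voting set has the same size (2k - 1) and every process belongs to the same number of voting sets (also 2k - 1), matching the equal-size requirement above.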
15
Elections
  • Election algorithms are used when a unique
    process must be chosen to play a particular role
  • Master in a master-slave replication system
  • Central server in the DMTX protocol
  • We will look at the bully election algorithm
  • The bully algorithm tolerates failstop failures
  • But it works only in a synchronous system with
    reliable messaging

16
The Bully Election Algorithm
  • All processes are assigned identifiers
  • The system always elects a coordinator with the
    highest identifier
  • Each process must know all processes with higher
    identifiers than its own
  • Three types of messages:
  • election: a process begins an election
  • answer: a process acknowledges the election
    message
  • coordinator: an announcement of the identity of
    the elected process

17
The Bully Election Algorithm (cont.)
  • Initiation of election
  • Process p1 detects that the existing coordinator
    p4 has crashed and initiates the election
  • p1 sends an election message to all processes
    with higher identifiers than itself

[Figure: processes p1, p2, p3, and p4.]
18
The Bully Election Algorithm (cont.)
  • What happens if there are no further crashes?
  • p2 and p3 receive the election message from p1,
    send back the answer message to p1, and begin
    their own elections
  • p3 sends answer to p2
  • p3 receives no answer message from p4, so after a
    timeout it elects itself as the leader (it now
    has the highest ID among the live processes)

[Figure: processes p1, p2, p3, and p4; two coordinator messages.]
19
The Bully Election Algorithm (cont.)
  • What happens if p3 also crashes after sending the
    answer message but before sending the coordinator
    message?
  • In that case, p2 will time out while waiting for
    the coordinator message and will start a new
    election

[Figure: processes p1, p2, p3, and p4; p2 starts a new election.]
20
The Bully Election Algorithm (summary)
  • The algorithm does not require a central server
  • Does not require knowing identities of all the
    processes
  • Does require knowing identities of processes with
    higher IDs
  • Survives crashes
  • Assumes a synchronous system (relies on timeouts)
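
The message flow of the bully algorithm can be traced with a short simulation. This is a sketch under stated assumptions, not the lecture's code: the function name bully_election is hypothetical, identifiers are integers, no process crashes while the election runs, and a timeout is modeled simply as "no live process with a higher identifier answered".

```python
from collections import deque


def bully_election(initiator, alive, all_ids):
    """Trace a bully election started by `initiator`.

    alive is the set of ids that have not crashed.
    Returns (coordinator, log), where log holds (message_type, sender, receiver).
    """
    if initiator not in alive:
        raise ValueError("the initiator must be a live process")
    log = []
    started = set()                  # processes that already began an election
    pending = deque([initiator])
    coordinator = None
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        got_answer = False
        for q in sorted(x for x in all_ids if x > p):
            log.append(("election", p, q))
            if q in alive:           # live higher processes answer ...
                log.append(("answer", q, p))
                got_answer = True
                pending.append(q)    # ... and begin their own elections
        if not got_answer:           # timeout: nobody higher is alive
            coordinator = p
            for q in sorted(x for x in all_ids if x < p):
                log.append(("coordinator", p, q))
    return coordinator, log
```

For the scenario on the slides (p4 crashed, p1 starts the election) the call bully_election(1, {1, 2, 3}, [1, 2, 3, 4]) elects process 3, and the log contains the answer from p3 to p2 as well as the coordinator announcements from p3.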

21
Consensus in Asynchronous Systems With Failures
  • The algorithms we've covered have limitations
  • Either tolerate only limited failures (failstop)
  • Or assume a synchronous system
  • Consensus cannot be guaranteed in an
    asynchronous system where processes may fail
  • Next we will see why…

22
Consensus
  • All processes agree on the same value (or set of
    values)
  • When do you need consensus?
  • Leader (master) election
  • Mutual exclusion
  • Transaction involving multiple parties (banking)
  • We will look at several variants of the consensus
    problem
  • Consensus
  • Byzantine generals
  • Interactive consistency

23
System Model
  • There is a set of processes {Pi}
  • There is a set of values {v0, …, vN-1} proposed
    by processes
  • Each process Pi decides on di
  • di belongs to the set {v0, …, vN-1}
  • Assumptions
  • Synchronous system (for now)
  • Failstop failures
  • Byzantine failures
  • Reliable channels

24
Consensus
[Figure: Step 1 (Propose): P1, P2, and P3 propose v1, v2, and v3. Step 2 (Decide): P1, P2, and P3 decide d1, d2, and d3.]
Courtesy of Jeff Chase, Duke University
25
Consensus (C)
di = vk
Pi selects di from {v0, …, vN-1}. All Pi select
the same vk (make the same decision)
Courtesy of Jeff Chase, Duke University
26
Conditions for Consensus
  • Termination: all correct processes eventually
    decide.
  • Agreement: all correct processes select the same
    di.
  • Integrity: if all correct processes propose the
    same v, then di = v

27
Byzantine Generals Problem (BG)
[Figure: the leader (commander) proposes vleader; the subordinates (lieutenants) decide di = vleader and dj = vleader.]
  • Two types of generals: commander and subordinates
  • A commander proposes an action (vi).
  • Subordinates must agree

Courtesy of Jeff Chase, Duke University
28
Conditions for Consensus
  • Termination: all correct processes eventually
    decide.
  • Agreement: all correct processes select the same
    di.
  • Integrity: if the commander is correct, then all
    correct processes decide on the value that the
    commander proposed

29
Interactive Consistency (IC)
di = (v0, …, vN-1)
  • Each Pi proposes a value vi
  • Pi selects di = (v0, …, vN-1), a vector reflecting
    the values proposed by all correct participants.
  • All Pi must decide on the same vector

30
Conditions for Consensus
  • Termination: all correct processes eventually
    decide.
  • Agreement: the decision vector of all correct
    processes is the same
  • Integrity: if Pi is correct, then all correct
    processes decide on vi as the ith component of
    their vector

31
Equivalence of IC and BG
  • We will show that BG is equivalent to IC
  • If there is a solution to one, there is a
    solution to the other
  • Notation
  • BGi(j, v) returns the decision value of pi when
    the commander pj proposed v
  • ICi(v1, v2, …, vN)[j] returns the jth value in
    the decision vector of pi in the solution to IC,
    where v1, v2, …, vN are the values that the
    processes proposed
  • Our goal is to find a solution to IC given a
    solution to BG

32
Equivalence of IC and BG
  • We run the BG problem N times
  • Each time a different commander pj proposes a
    value v
  • Recall that in IC each process proposes a value
  • After each run of the BG problem we record
    BGi(j, v) for all i, that is, what each process
    decided when pj proposed v
  • Similarity with IC: we record what each pi
    decided for vector position j
  • We need to record decisions for N vector
    positions, so we run the problem N times

33
Equivalence of IC and BG
[Figure: decision vectors of three processes across three runs of BG.
Initialization: empty decision vectors (?, ?, ?).
Run 1: P0 proposes v0. We record d0 for all p: (d0, ?, ?).
Run 2: P1 proposes v1. We record d1 for all p: (d0, d1, ?).
Run 3: P2 proposes v2. We record d2 for all p: (d0, d1, d2).]
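
The construction of IC from BG can be written down directly. This is a sketch under stated assumptions: bg(i, j, v) is a hypothetical stand-in for the slides' BGi(j, v), and failure_free_bg models only the trivial case in which nobody is faulty, so every process decides what the commander proposed.

```python
def ic_from_bg(bg, proposals):
    """Build each process's IC decision vector from N runs of a BG solver.

    bg(i, j, v) returns the decision of process i when commander j proposes v.
    proposals[j] is the value proposed by process j.
    Returns a list of decision vectors, one per process.
    """
    n = len(proposals)
    # Run j fills position j of every process's decision vector.
    return [[bg(i, j, proposals[j]) for j in range(n)] for i in range(n)]


def failure_free_bg(i, j, v):
    """Toy BG solver for a system without faults: everyone decides v."""
    return v
```

With three processes proposing "a", "b", and "c", every process ends up with the vector ["a", "b", "c"], so the IC agreement and integrity conditions follow from the corresponding BG conditions applied once per run.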
34
Consensus in a Synchronous System Without Failures
  • Each process pi proposes a decision value vi
  • All proposed vi are sent around, such that each
    process knows all proposed vi
  • Once all processes receive all proposed v's, they
    apply to them the same function, such as
    minimum(v1, v2, …, vN)
  • Each process pi sets di = minimum(v1, v2, …, vN)
  • The consensus is reached
  • What if processes fail? Can other processes still
    reach an agreement?

35
Consensus in a Synchronous System With Failstop
Failures
  • We assume that at most f out of N processes fail
  • To reach a consensus despite f failures, we must
    extend the algorithm to take f+1 rounds
  • At round 1 each process pi sends its proposed vi
    to all other processes and receives v's from
    other processes
  • At each subsequent round process pi sends the v's
    that it has not sent before and receives new v's
  • The algorithm terminates after f+1 rounds
  • Let's see why it works…

36
Consensus in a Synchronous System With Failstop
Failures: Proof
  • We will prove it by contradiction
  • Suppose that after the last round some correct
    process pi possesses a value that another correct
    process pj does not possess
  • This must have happened because some other
    process pk sent that value to pi but crashed
    before sending it to pj
  • The crash must have happened in round f+1 (the
    last round). Otherwise, pi would have sent that
    value to pj by round f+1
  • Why did pj not receive that value in any of the
    previous rounds?
  • Only if there was a crash in every round: some
    process sent the value to some other processes,
    but crashed before sending it to pj
  • But this implies that there must have been f+1
    crashes
  • This is a contradiction: we assumed at most f
    failures
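
The f+1 round algorithm can be simulated to see the argument at work. This is a sketch under stated assumptions: the function name failstop_consensus and the crash-schedule format are hypothetical, rounds are lock-step, and a crashing process is modeled as reaching only a chosen subset of recipients in its final round.

```python
def failstop_consensus(proposals, f, crashes=None):
    """Simulate f+1 rounds of value exchange and decide on the minimum.

    proposals: dict mapping process id -> proposed value.
    crashes:   dict mapping process id -> (round, recipients reached in
               that round before the crash); at most f entries.
    Returns a dict mapping each process that never crashed to its decision.
    """
    crashes = crashes or {}
    if len(crashes) > f:
        raise ValueError("more crashes than the algorithm tolerates")
    known = {p: {v} for p, v in proposals.items()}   # values each process has seen
    sent = {p: set() for p in proposals}             # values each process already sent
    crashed = set()
    for rnd in range(1, f + 2):                      # rounds 1 .. f+1
        inbox = {p: set() for p in proposals}
        for p in proposals:
            if p in crashed:
                continue
            new = known[p] - sent[p]                 # only values not sent before
            recipients = set(proposals) - {p}
            if p in crashes and crashes[p][0] == rnd:
                recipients &= set(crashes[p][1])     # partial send, then crash
                crashed.add(p)
            for q in recipients:
                inbox[q] |= new
            sent[p] |= new
        for p in proposals:
            if p not in crashed:
                known[p] |= inbox[p]
    return {p: min(known[p]) for p in proposals if p not in crashed}
```

A worst-case run with f = 2: process 4 proposes the minimum 0 and crashes in round 1 after reaching only process 3, which in turn crashes in round 2 after reaching only process 2. The third round is what lets process 2 pass the value on to process 1, so both survivors decide 0.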

37
Consensus in a Synchronous System: Discussion
  • Can this algorithm withstand other types of
    failures: omission failures, Byzantine failures?
  • Let us look at consensus in the presence of
    Byzantine failures

[Figure: processes separated by a network partition; each
group can agree on a separate value]
38
Consensus in a Synchronous System With Byzantine
Failures
  • Byzantine failure: a process can forward to
    another process an arbitrary value v
  • Byzantine generals: the commander says to one
    lieutenant that v = A, says to another lieutenant
    that v = B
  • We will show that consensus is impossible with
    only 3 generals if one of them may be faulty
  • Pease et al. generalized this to the
    impossibility of consensus when N ≤ 3f (N
    generals, f of them faulty)

39
BG Impossibility With Three Generals
[Figure: Scenario 1 and Scenario 2. Notation: 3:1:u means "3 says 1 says u".]
  • Scenario 1: p2 must decide v (by the integrity
    condition)
  • But p2 cannot distinguish between Scenario 1 and
    Scenario 2, so it will decide w in Scenario 2
  • By symmetry, p3 will decide x in Scenario 2
  • p2 and p3 will have reached different decisions

40
Solution With Four Byzantine Generals
  • We can reach consensus if there are 4 generals
    and at most 1 is faulty
  • Intuition: use the majority rule

[Figure: "Who is telling the truth? Majority rules!" Legend: correct process.]
41
Solution With Four Byzantine Generals
  • Round 1: the commander sends v to all other
    generals
  • Round 2: all generals exchange the values that
    they received from the commander
  • The decision is made based on majority
42
Solution With Four Byzantine Generals
[Figure: p1 (Commander) sends 1:v to p2, p3, and p4. p2 relays 2:1:v and p4 relays 4:1:v; the faulty p3 sends 3:1:u to p2 and 3:1:w to p4.]
  • p2 receives v, v, u and decides v
  • p4 receives v, v, w and decides v
43
Solution With Four Byzantine Generals
[Figure: the faulty p1 (Commander) sends 1:u to p2, 1:w to p3, and 1:v to p4. The others relay what they received: p2 sends 2:1:u, p3 sends 3:1:w, and p4 sends 4:1:v.]
  • p2 receives u, w, v and decides NULL
  • p4 receives u, v, w and decides NULL
  • p3 receives w, u, v and decides NULL
  • The result generalizes to systems with N ≥ 3f + 1
    (N is the number of processes, f is the number of
    faulty processes)
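
The majority rule and the N ≥ 3f + 1 bound can be checked with a few lines. This is a sketch: the function names majority and max_faulty are hypothetical, and None stands in for the slides' NULL decision.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority of `values`, or None (NULL)."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def max_faulty(n):
    """Largest f that satisfies N >= 3f + 1 for N = n processes."""
    return (n - 1) // 3
```

Applied to the two scenarios above, majority(["v", "v", "u"]) yields "v" while majority(["u", "w", "v"]) yields None, and max_faulty(3) is 0, which matches the impossibility result for three generals.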
44
Consensus in an Asynchronous System
  • In the algorithms we've looked at, consensus has
    been reached by using several rounds of
    communication
  • The systems were synchronous, so each round
    always terminated
  • If a process has not received a message from
    another process in a given round, it could assume
    that the process is faulty
  • In an asynchronous system this assumption cannot
    be made!
  • Fischer-Lynch-Paterson (1985): no consensus can
    be guaranteed in an asynchronous communication
    system if even one process may fail.
  • Intuition: a failed process may just be slow,
    and can rise from the dead at exactly the wrong
    time.

45
Consensus in Practice
  • Real distributed systems are by and large
    asynchronous
  • How do they operate if consensus cannot be
    reached?
  • Fault masking: assume that failed processes
    always recover, and define a way to reintegrate
    them into the group.
  • If you haven't heard from a process, just keep
    waiting…
  • A round terminates when every expected message is
    received.
  • Failure detectors: construct a failure detector
    that can determine if a process has failed.
  • A round terminates when every expected message is
    received, or the failure detector reports that
    its sender has failed.

46
Fault Masking
  • In a distributed system, a recovered node's state
    must also be consistent with the states of other
    nodes.
  • Transaction processing systems record state to
    persistent storage, so they can recover after a
    crash and continue as normal
  • What if a node has crashed before important state
    has been recorded on disk?
  • A functioning node may need to respond to a
    peer's recovery.
  • rebuild the state of the recovering node, and/or
  • discard local state, and/or
  • abort/restart operations/interactions in progress
  • e.g., two-phase commit protocol

47
Failure Detectors
  • First problem: how to detect that a member has
    failed?
  • pings, timeouts, beacons, heartbeats
  • recovery notifications
  • Is the failure detector accurate? Does it
    accurately detect failures?
  • Is the failure detector live? Are there bounds
    on failure detection time?
  • In an asynchronous system, it is impossible for a
    failure detector to be both accurate and live
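
A timeout-based detector of the kind listed above can be sketched as follows. The class name HeartbeatDetector and its parameters are hypothetical, and timestamps are passed in explicitly (rather than read from a clock) so that the behavior is deterministic.

```python
class HeartbeatDetector:
    """Suspects a process once it has been silent for max_missed timeout periods."""

    def __init__(self, timeout, max_missed=3):
        self.timeout = timeout        # expected gap between heartbeats
        self.max_missed = max_missed  # missed periods tolerated before suspicion
        self.last_seen = {}           # process id -> time of last heartbeat

    def heartbeat(self, pid, now):
        """Record a heartbeat from pid received at time `now`."""
        self.last_seen[pid] = now

    def suspected(self, pid, now):
        """True if pid has been silent longer than timeout * max_missed."""
        last = self.last_seen.get(pid)
        if last is None:
            return False              # never heard from pid: no basis to suspect
        return now - last > self.timeout * self.max_missed
```

Such a detector is live, because suspicion is raised within a bounded time, but not accurate: a slow process is suspected and then cleared again when its next heartbeat arrives.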

48
Failure Detectors in Real Systems
  • Use a failure detector that is live but not
    accurate.
  • Assume bounded processing delays and delivery
    times.
  • Timeout with multiple retries detects failure
    accurately with high probability. Tune it to
    observed latencies.
  • If a failed site turns out to be alive, then
    restore it or kill it (fencing, fail-silent).
  • What do we assume about communication failures?
  • How much pinging is enough?
  • Tune parameters for your system: can you predict
    how your system will behave under pressure?
  • That's why distributed system engineers often
    participate in multi-day support calls…
  • What about network partitions?
  • Processes form two independent groups, reach
    consensus independently. Rely on quorum.
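
One common way to rely on a quorum is to let only a group holding a strict majority of the members proceed. The sketch below assumes that majority rule (the slides do not specify the quorum definition), and the function name has_quorum is hypothetical.

```python
def has_quorum(group, all_members):
    """True if `group` contains a strict majority of `all_members`."""
    members = set(all_members)
    return 2 * len(set(group) & members) > len(members)
```

Because two disjoint groups cannot both hold a strict majority, at most one side of a partition can make progress; with an even split, neither side can.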

49
Summary
  • Coordination and agreement are essential in real
    distributed systems
  • Real distributed systems are asynchronous
  • Consensus cannot be guaranteed in an asynchronous
    distributed system
  • Nevertheless, people still build useful
    distributed systems that rely on consensus
  • Fault recovery and masking are used as mechanisms
    for helping processes reach consensus
  • Popular fault masking and recovery techniques are
    transactions and replication, the topics of the
    next few lectures