Principles of Reliable Distributed Systems Recitation 11: State Machine Replication with Paxos

1
Principles of Reliable Distributed Systems
Recitation 11: State Machine Replication with
Paxos; Sequential Consistency
  • Spring 2009
  • Alex Shraer

2
Replicated State Machines
  • Data is replicated at n servers.
  • Operations are initiated by clients.
  • Operations need to be performed at all correct
    servers in the same order.
  • Goal: ensure that all the copies are the same
    after the ith operation.
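
A minimal sketch of why the order matters, assuming a deterministic key-value store as the replicated state (the class `KeyValueStore` and the operation tuples are hypothetical, not from the recitation). Replicas that apply the same log in the same order end in the same state; applying the same operations in a different order may not.

```python
class KeyValueStore:
    """Deterministic state machine: the state depends only on the operation sequence."""

    def __init__(self):
        self.data = {}

    def apply(self, op):
        kind, key, value = op
        if kind == "put":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)


def replay(log):
    """Apply every operation of the log, in order, to a fresh replica."""
    replica = KeyValueStore()
    for op in log:
        replica.apply(op)
    return replica.data


log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3), ("delete", "y", None)]

# Three replicas that apply the same log in the same order agree.
replicas = [replay(log) for _ in range(3)]

# The same operations in a different order give a different state.
reordered = replay([log[2], log[0], log[1], log[3]])
```

The agreement holds after every prefix of the log as well, which is the "after the ith operation" part of the goal.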

3
Client-Server Interaction
  • Leader-based: each process (client/server) has an
    estimate of who is the current leader.
  • A client sends a request to its current leader.
  • The leader sends the response to the client.

4
Sequence of Paxos Instances
  • A sequence of separate instances of Paxos.
  • The value chosen by instance i is the ith
    operation.
  • Clients send operations to the current leader.
  • The leader decides where in the sequence each
    operation should appear.
  • If the leader decides that a certain operation
    should appear as the 135th operation, it tries to
    have that operation as the value of the 135th
    instance of Paxos.
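
The slot assignment described above can be sketched as follows. This is only an illustration under assumptions: the `Leader` class and its fields are hypothetical, and the actual Paxos messages for each instance are omitted.

```python
class Leader:
    """Assigns each client operation to the next free Paxos instance."""

    def __init__(self, first_free_instance):
        self.next_instance = first_free_instance
        self.proposals = {}  # instance number -> operation proposed there

    def on_client_request(self, op):
        # Decide where in the sequence the operation should appear.
        instance = self.next_instance
        self.next_instance += 1
        # The leader would now try to get `op` chosen as the value
        # of Paxos instance number `instance`.
        self.proposals[instance] = op
        return instance


leader = Leader(first_free_instance=135)
slot_a = leader.on_client_request("op-a")
slot_b = leader.on_client_request("op-b")
```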

5
Safety and Liveness
  • Reasons a leader's proposal can fail:
      • The leader fails.
      • A different node believes it is the leader.
  • Safety is always preserved, even in the worst
    case.
  • Performance can be optimized during non-faulty
    periods.

6
Replication with Fast Paxos
  • Non-optimized version:
      • New leader learns the entire history
  • Observations:
      • No value is chosen until phase 2 of Paxos.
      • At the end of phase 1, either the value to be
        proposed is determined, or else the proposer is
        free to choose any value.
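
The second observation corresponds to the standard Paxos rule for picking the phase 2 value. A minimal sketch, assuming each phase 1 reply is either `None` (nothing accepted yet) or a pair `(accepted_ballot, accepted_value)` with integer ballots; the function name and the reply format are illustrative assumptions.

```python
def value_to_propose(promises, client_value):
    """Pick the phase 2 value from the phase 1 replies of a majority.

    If some acceptor already accepted a value, the value accepted with the
    highest ballot must be proposed. Otherwise the proposer is free to
    propose any value, here the client's operation.
    """
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return client_value
    _, value = max(accepted, key=lambda p: p[0])
    return value


free_choice = value_to_propose([None, None, None], "client-op")
constrained = value_to_propose([None, (3, "b"), (5, "c")], "client-op")
```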

7
Normal Operation
  • Normal operation: the previous leader has just
    failed and a new one has been selected.
  • The new leader knows most of the operations that
    have already been chosen, since it participated
    in the protocol before it became the leader.
  • Suppose it knows operations 1-134, 138 and 139.

8
Normal Operation (contd)
  • The leader executes phase 1 of instances 135-137
    and of all instances > 139.
  • Suppose the outcome of this phase determines the
    value to be proposed in instances 135 and 140,
    and is unconstrained in the other instances.
  • The new leader now executes phase 2 for instances
    135 and 140. (Why does it have to?)

9
Normal Operation (contd)
  • Every server knows commands 1-135 ⇒ the leader
    can execute them.
  • Cannot execute commands 138-140 before 136 and
    137.
  • Two options:
      • Use the next two client requests as commands 136
        and 137.
      • Fill the gap using no-op operations.
  • Which one is better?
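
The running example can be checked with a small sketch. It assumes the chosen values are kept in a dictionary keyed by instance number; the helper names are hypothetical. Writing a no-op into the dictionary stands in for actually getting it chosen through phase 2, which is not modelled here.

```python
NOOP = "no-op"


def executable_prefix(chosen):
    """Largest n such that instances 1..n all have a chosen value."""
    n = 0
    while (n + 1) in chosen:
        n += 1
    return n


def gaps(chosen):
    """Instances below the highest chosen one that have no chosen value."""
    if not chosen:
        return []
    return [i for i in range(1, max(chosen) + 1) if i not in chosen]


# State after phase 2 of instances 135 and 140: 1-135 and 138-140 are chosen.
chosen = {i: f"op{i}" for i in list(range(1, 136)) + [138, 139, 140]}

before = executable_prefix(chosen)  # commands 138-140 are blocked by the gap
missing = gaps(chosen)

for instance in missing:            # second option: fill the gap with no-ops
    chosen[instance] = NOOP

after = executable_prefix(chosen)
```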

10
Normal Operation (contd)
  • Operations 1-140 have now been chosen, and all
    servers can execute them.
  • The leader also completed phase 1 for instances >
    140.
  • Can start working in express mode:
      • Can propose any value in phase 2 of these
        instances immediately

11
How can gaps occur?
  • The leader can propose operation 142 before it
    knows that the operation it proposed for instance
    141 has been chosen.
  • Bad scenario:
      • All messages it sent proposing operation 141 are
        lost, and operation 142 is chosen before any
        server learns about operation 141.
      • The leader fails before 141 is chosen.

12
Phase 1 for infinity?
  • A new leader executes phase 1 for infinitely many
    instances of Paxos (135-137 and all instances >
    139).
  • Uses the same BallotNum for all of the instances.
  • A response to a prepare message needs to include
    a value only for the instances for which it
    already accepted a value (in phase 2). In the
    example: 135 and 140.
  • So the servers can respond with a "reasonably
    short" message.
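
A minimal sketch of how a server might answer such a prepare message, under assumptions: the `Acceptor` class, integer ballots, and the encoding of the request as an explicit set of instances plus an open-ended lower bound are all illustrative choices, not the recitation's message format.

```python
class Acceptor:
    """Keeps one promised ballot for all instances and the accepted values."""

    def __init__(self):
        self.promised = 0
        self.accepted = {}  # instance -> (ballot, value) accepted in phase 2

    def on_prepare(self, ballot, explicit, open_from):
        """Handle a prepare for the instances in `explicit` and all >= open_from.

        Returns None if the ballot is too old. Otherwise returns only the
        instances in the requested range that already hold an accepted value,
        so the reply stays short although the range is unbounded.
        """
        if ballot <= self.promised:
            return None
        self.promised = ballot
        return {
            i: bv
            for i, bv in self.accepted.items()
            if i in explicit or i >= open_from
        }


acceptor = Acceptor()
acceptor.accepted = {134: (3, "a"), 135: (3, "b"), 138: (3, "c"), 140: (3, "d")}

# Phase 1 for instances 135-137 and all instances > 139, with one ballot.
reply = acceptor.on_prepare(5, explicit={135, 136, 137}, open_from=140)
stale = acceptor.on_prepare(4, explicit={135, 136, 137}, open_from=140)
```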

13
Abnormal Operation
  • We assumed that there is a single leader.
      • With a single leader, only phase 2 needs to be
        executed for each instance.
  • What happens if that is not the case?
      • Safety is preserved (why?).
      • A single leader is needed for liveness.

14
Sequential-Linearization
  • A sequential-linearization π of a concurrent
    execution σ is:
  • A sequential execution
      • Each invocation is immediately followed by its
        response
      • Satisfies the object's sequential specification
  • Looks like σ
      • Responses to all invocations are the same as in σ
      • Responses to pending invocations in σ may be
        added
  • Preserves local real-time order
      • If the completion for operation o1 at process p_i
        occurs in σ before the invocation for operation
        o2 at process p_i, then o1 appears before o2 in π
      • Can be written as π|i = σ|i for all i

15
Sequential Consistency
  • A concurrent execution that has a
    sequential-linearization is sequentially
    consistent
  • What is the difference from linearizability?

16
Sequential Consistency
  • A concurrent execution that has a
    sequential-linearization is sequentially
    consistent
  • What is the difference from linearizability?
  • Both linearizability and sequential consistency
    are strong consistency conditions: all
    processes must agree on the order in which all
    operations occur.
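
The difference between the two conditions can be made concrete with a brute-force checker for small register executions. This is a sketch under stated assumptions: every operation is complete (no pending invocations), each process has at most one outstanding operation, events are encoded as hypothetical tuples `(kind, process, register, value)`, and the initial value of every register is modelled as `None`. It tries every permutation, so it is only usable on tiny examples.

```python
from itertools import permutations

INITIAL = None  # assumed initial value of every register


def operations(events):
    """Pair each invocation ("write"/"read") with its response ("ack"/"ret")."""
    ops, open_ops = [], {}
    for t, (kind, proc, reg, value) in enumerate(events):
        if kind in ("write", "read"):
            open_ops[proc] = {"proc": proc, "kind": kind, "reg": reg,
                              "value": value, "inv": t}
        else:
            op = open_ops.pop(proc)
            if kind == "ret":
                op["value"] = value  # a read is identified by what it returned
            op["res"] = t
            ops.append(op)
    return ops


def legal(seq):
    """Sequential specification: a read returns the latest written value."""
    state = {}
    for op in seq:
        if op["kind"] == "write":
            state[op["reg"]] = op["value"]
        elif state.get(op["reg"], INITIAL) != op["value"]:
            return False
    return True


def respects(seq, same_process_only):
    """Check that 'a completed before b was invoked' implies a precedes b."""
    pos = {id(op): k for k, op in enumerate(seq)}
    for a in seq:
        for b in seq:
            if a["res"] < b["inv"] and pos[id(a)] > pos[id(b)]:
                if not same_process_only or a["proc"] == b["proc"]:
                    return False
    return True


def has_order(events, same_process_only):
    ops = operations(events)
    return any(legal(p) and respects(p, same_process_only)
               for p in permutations(ops))


def sequentially_consistent(events):
    return has_order(events, same_process_only=True)   # local order only


def linearizable(events):
    return has_order(events, same_process_only=False)  # global real-time order


# Process 1 writes 1 and completes; afterwards process 2 reads the initial value.
stale = [("write", 1, "x", 1), ("ack", 1, "x", None),
         ("read", 2, "x", None), ("ret", 2, "x", INITIAL)]
```

For `stale`, the read can be ordered before the write without violating any single process's order, so the execution is sequentially consistent. Linearizability also requires the order across processes to respect real time, which rules that placement out.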

17
Some notations
  • x.write_i(v): invocation by process p_i of a write
    operation with value v to register x
  • x.ack_i: completion of a write operation to
    register x by process p_i
  • x.read_i: invocation by process p_i of a read
    operation from register x
  • x.ret_i(v): completion of a read operation from
    register x by process p_i, with v being the
    returned value

18
Sequentially consistent local-writes algorithm
  • The algorithm emulates a sequentially consistent
    shared register using message passing.
  • abcast and adeliver: reliable atomic broadcast
  • x_i is the local copy of the shared register x at
    p_i
  • upon x.read_i:
      • if num = 0 then
          • invoke x.ret_i(x_i)
  • upon x.write_i(v):
      • num ← num + 1
      • abcast(⟨"write", x, v⟩)
      • invoke x.ack_i
  • upon adeliver_i(j, ⟨"write", x, v⟩):
      • x_i ← v
      • if (i = j) then
          • num ← num − 1
          • if num = 0 and a read on x is pending then
              • invoke x.ret_i(x_i)

The algorithm is taken from the Attiya and Welch book (second
edition), page 197.
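
A small simulation of the algorithm above, offered as a sketch rather than a faithful reproduction. Assumptions: `num` starts at 0, atomic broadcast is modelled by a single global message list that every process consumes in the same order at its own pace, a blocked read may be on any register (a slight generalization of the slide), and the names `AtomicBroadcast`, `Process`, and `deliver_next` are hypothetical.

```python
class AtomicBroadcast:
    """Toy atomic broadcast: one agreed message order, per-process cursors."""

    def __init__(self):
        self.log = []        # (sender, message) in the agreed total order
        self.processes = {}  # pid -> Process
        self.cursor = {}     # pid -> index of the next message to deliver

    def register(self, process):
        self.processes[process.pid] = process
        self.cursor[process.pid] = 0

    def abcast(self, sender, message):
        self.log.append((sender, message))

    def deliver_next(self, pid):
        """Deliver the next undelivered message to process pid, if any."""
        k = self.cursor[pid]
        if k < len(self.log):
            self.cursor[pid] = k + 1
            sender, message = self.log[k]
            self.processes[pid].adeliver(sender, message)


class Process:
    def __init__(self, pid, net, initial=None):
        self.pid, self.net, self.initial = pid, net, initial
        self.copy = {}            # local copies x_i
        self.num = 0              # own writes not yet delivered locally
        self.pending_read = None  # register of a blocked read, if any
        self.responses = []       # responses in the order they were issued
        net.register(self)

    def read(self, x):
        if self.num == 0:
            self.responses.append(("ret", x, self.copy.get(x, self.initial)))
        else:
            self.pending_read = x  # wait until own writes are delivered

    def write(self, x, v):
        self.num += 1
        self.net.abcast(self.pid, ("write", x, v))
        self.responses.append(("ack", x))  # local write: acknowledged at once

    def adeliver(self, sender, message):
        _, x, v = message
        self.copy[x] = v
        if sender == self.pid:
            self.num -= 1
            if self.num == 0 and self.pending_read is not None:
                r, self.pending_read = self.pending_read, None
                self.responses.append(("ret", r, self.copy.get(r, self.initial)))


net = AtomicBroadcast()
p1, p2 = Process(1, net), Process(2, net)
p1.write("x", 1)     # acknowledged immediately
p1.read("x")         # blocks: p1's own write is still undelivered
p2.read("x")         # returns the initial value from the local copy
net.deliver_next(1)  # p1 delivers its own write, so its read completes
net.deliver_next(2)
p2.read("x")
```

In this run p2's first read returns the initial value even though p1's write was already acknowledged, which is allowed by sequential consistency but would not be allowed by linearizability.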
19
Question 2
  • For each of the following executions, determine
    whether it is linearizable, sequentially
    consistent, or neither, and explain (assume that
    the initial value in all registers is ⊥):
  • x.write1(0), x.write2(1), x.ack1, x.read1,
    x.ack2, x.ret1(0), x.read2, x.ret2(1).
  • x.write1(1), x.ack1, x.read2, x.ret2(⊥), x.read2,
    x.ret2(1).
  • x.write1(0), x.write2(1), x.ack1, x.ack2,
    x.read1, x.read2, x.ret1(0), x.ret2(1).
  • x.write1(1), x.ack1, x.write3(2), x.ack3,
    x.read4, x.read2, x.ret2(1), x.ret4(2), x.read4,
    x.ret4(1), x.read2, x.ret2(2).
  • x.write1(1), y.write2(1), y.ack2, x.ack1,
    y.read1, x.read2, x.ret1(⊥), y.ret2(⊥).
  • Hint: it always helps to draw the execution as in
    the lectures, and your explanation should use the
    requirements made by the definition.
