Replication - PowerPoint PPT Presentation

About This Presentation
Title:

Replication

Description:

Replication Improves reliability Improves availability (What good is a reliable system if it is not available?) Replication must be transparent and create the ... – PowerPoint PPT presentation

Slides: 48
Provided by: Sukuma6

Transcript and Presenter's Notes

Title: Replication


1
Replication
  • Improves reliability
  • Improves availability
  • (What good is a reliable system if it is not
    available?)
  • Replication must be transparent and create the
    illusion of a single copy.

2
Updating replicated data
[Figure: Alice and Bob updating a file F, first as a
single shared copy and then as separate replicas.]
Update and consistency are primary issues.
3
Passive replication
  • Each client communicates with one replica, called
    the primary server.
  • Each client maintains a variable L (leader) that
    specifies the replica to which it will send
    requests. Requests are queued at the primary
    server.
  • Backup servers ignore client requests.

[Figure: clients, each holding L = 3, sending requests
to the primary; the other servers are backups.]
4
Primary-backup protocol
  • Receive. Receive the request from the client and
    update the state if appropriate.
  • Broadcast. Broadcast an update of the state to
    all other replicas.
  • Reply. Send a response to the client.

[Figure: the client sends req to the primary; the
primary sends update to the backup and reply to the
client.]
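The receive, broadcast, and reply steps can be sketched in Python as follows. This is a minimal single-process illustration rather than a networked implementation. The class names `Replica` and `Primary`, the key-value request format, and the in-memory "broadcast" loop are all assumptions made for the example.

```python
class Replica:
    """A backup: holds a copy of the state and applies updates from the primary."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply_update(self, key, value):
        self.state[key] = value


class Primary(Replica):
    """The primary: receives client requests, updates backups, then replies."""

    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def handle_request(self, key, value):
        # Receive: update the local state.
        self.apply_update(key, value)
        # Broadcast: push the state update to every backup.
        for backup in self.backups:
            backup.apply_update(key, value)
        # Reply: respond to the client only after the backups are updated.
        return ("ok", key, value)


backups = [Replica("b1"), Replica("b2")]
primary = Primary("p", backups)
reply = primary.handle_request("x", 5)
```

Replying only after the broadcast means that any backup promoted later already reflects every request the client saw acknowledged.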
5
Primary-backup protocol
  • If the client fails to get a response due to the
    crash of the primary, then the request is
    retransmitted until a backup is promoted as the
    primary.
  • The switch should ideally be instantaneous, but
    in practice it is not.
  • Failover time is the duration when there is no
    primary server.

[Figure: the primary crashes while handling a request;
heartbeat messages flow between primary and backup,
and an election follows in which a new primary is
elected.]
6
Active replication
  • Each server receives client requests, and
    broadcasts them to the other servers. They
    collectively implement a fault-tolerant state
    machine. In the presence of crashes, all the
    correct processes reach the same next state.

[Figure: state machine in which the input and the
current State determine the Next state.]
7
Fault-tolerant state machine
  • This formalism is based on a survey by Fred
    Schneider.
  • The clients must receive correct responses even
    if up to m replica servers fail (either fail-stop
    or byzantine).
  • For fail-stop failures, (m + 1) replicas are
    needed. If a client queries the replicas, the
    first one that responds gives a correct value.
  • For byzantine failures, (2m + 1) replicas are
    needed. The m bad responses can be voted out by
    the (m + 1) good responses.
  • But the states of the good processes must be
    correctly updated (byzantine consensus is needed).

[Figure: a fault-intolerant state machine next to its
fault-tolerant, replicated version.]
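The replica counts and the voting step can be illustrated with a short Python sketch. The function names `replicas_needed` and `vote` are hypothetical, and the sketch covers only how a client masks bad responses. It does not implement the byzantine consensus that keeps the good replicas' states in step.

```python
from collections import Counter


def replicas_needed(m, byzantine):
    """Replicas required to mask m faulty replicas."""
    return 2 * m + 1 if byzantine else m + 1


def vote(responses):
    """Return the value reported by a strict majority of replicas, or None."""
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) // 2 else None


# m = 2 byzantine replicas out of 2m + 1 = 5: the 3 good responses win.
responses = [42, 42, 42, 7, 99]
result = vote(responses)
```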
8
Replica coordination
  • Agreement. Every correct replica receives all the
    requests.
  • Order. Every correct replica receives the
    requests in the same order.
  • The agreement part is solved by atomic multicast.
  • The order part is solved by total order multicast.
  • The order part solves the consensus problem, where
    servers agree about the next update. It requires
    a synchronous model. Why?

9
Agreement
  • With fail-stop processors, the agreement part is
    solved by reliable atomic multicast.
  • To deal with byzantine failures, an interactive
    consistency protocol needs to be implemented.
    Thus, with an oral message protocol, n ≥ 3m + 1
    processors will be required.

10
Order
  • Let timestamps determine the message order.

A request is stable at a server when the server
does not expect to receive any other client
request with a lower timestamp. Assume three
clients are trying to send an update, the
channels are FIFO, and their timestamps are 20,
30, 42. Each server will first update its copy
with the value that has the timestamp 20.
[Figure: three clients sending requests with
timestamps 20, 30, and 42 to a server.]
11
Order
But some clients may not have any update. How
long should the server wait? Require clients to
send null messages (as heartbeat signals) with
some timestamp ts. A message (null, 35) means
that the client will not send any update till
ts = 35. These can be part of periodic heartbeat
messages. An alternative is to use virtual time,
where processes are able to undo actions.
[Figure: a server receiving 30, (null, 35), and 42
from three clients.]
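The stability test, including null messages, can be sketched in Python as follows. The `Server` class and its method names are hypothetical. The sketch assumes FIFO channels and that each client's timestamps only increase, so the smallest "latest timestamp heard" over all clients bounds what can still arrive.

```python
class Server:
    """Delivers client requests in timestamp order once they are stable."""

    def __init__(self, clients):
        # Latest timestamp seen from each client (FIFO channels assumed).
        self.latest = {c: 0 for c in clients}
        self.pending = []  # (timestamp, client, update)

    def receive(self, client, ts, update=None):
        """update=None models a null (heartbeat) message."""
        self.latest[client] = ts
        if update is not None:
            self.pending.append((ts, client, update))

    def deliver_stable(self):
        """Return the updates that are stable, in timestamp order."""
        # No client can still send a timestamp below the smallest
        # 'latest' value, so everything at or below it is stable.
        horizon = min(self.latest.values())
        stable = sorted(p for p in self.pending if p[0] <= horizon)
        self.pending = [p for p in self.pending if p[0] > horizon]
        return [update for _, _, update in stable]


server = Server(["a", "b", "c"])
server.receive("a", 30, "update-30")
server.receive("c", 42, "update-42")
before = server.deliver_stable()   # client b is silent: nothing is stable
server.receive("b", 35)            # null message (null, 35)
after = server.deliver_stable()    # the request with timestamp 30 is stable
```

Without the null message from the silent client, the server would wait indefinitely; the heartbeat is what lets the request with timestamp 30 be applied.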
12
What is replica consistency?
Consistency models define a contract between the
data manager and the clients regarding the
responses to read and write operations.
13
Replica Consistency
  • Data-centric: the client communicates with the
    same replica.
  • Client-centric: the client communicates with
    different replicas at different times. This may
    be the case with mobile clients.

14
Data-centric Consistency Models
  • 1. Strict consistency
  • 2. Linearizability
  • 3. Sequential consistency
  • Causal consistency
  • Eventual consistency (as in DNS)
  • Weak consistency
  • There are many other models

15
Strict consistency
  • Strict consistency corresponds to true
    replication transparency. If one of the processes
    executes x := 5 at real time t and this is the
    latest write operation, then at any real time
    t' > t, every process trying to read x will
    receive the value 5. Too strict! Why?

[Figure: p1 executes W(x := 5) at time t; p2 executes
R(x = 5) at time t'.]
Assume the read or write operations are
non-blocking
16
Sequential consistency
  • Some interleaving of the local temporal order of
    events at the different replicas is a consistent
    trace.

[Figure: example trace with operations W(x := 100),
W(x := 99), R(x = 100), R(x = 99).]
17
Sequential consistency
  • Is sequential consistency satisfied here?
    Initially x = y = 0.

[Figure: example trace with operations W(x := 10),
W(x := 8), R(x = 10), W(x := 20), R(x = 20),
R(x = 10).]
18
Causal consistency
  • All writes that are causally related must be
    seen by every process in the same order.

[Figure: example trace with operations W(x := 10),
W(x := 20), R(x = 10), R(x = 20), R(x = 10),
R(x = 20).]
19
Linearizability
  • Linearizability is a correctness criterion for
    concurrent objects (Herlihy and Wing, ACM TOPLAS,
    1990). It provides the illusion that each
    operation on the object takes effect in zero
    time, and the results are equivalent to some
    legal sequential computation.

20
Linearizability
  • A trace in a read-write system is consistent
    when every read returns the latest value written
    into the shared variable preceding that read
    operation. A trace is linearizable when (1) it
    is consistent, and (2) the temporal ordering
    among the reads and writes is respected (may be
    based on real time or logical time).

[Figure: trace with operations W(x := 0), R(x = 1),
W(x := 0), R(x = 1), W(x := 1) and timestamps
ts = 10, 21, 27, 38, 19. Initially x = y = 0.]
Linearizability is stronger than sequential
consistency, i.e. every linearizable object is
also sequentially consistent.
Is it a linearizable trace?
21
Exercise
What consistency model is satisfied by the above?
22
Implementing consistency models
  • Why are there so many consistency models?
  • Each model has a use in some type of
    application.
  • The cost of implementation (as measured by
    message complexity) decreases as the models
    become weaker.

23
Implementing linearizability
[Figure: clients issuing W(x := 20), Read x,
W(x := 10), Read x to the replicas.]
Needs total order multicast of all reads and
writes.
24
Implementing linearizability
  • The total order multicast forces every process to
    accept and handle all reads and writes in the
    same temporal order.
  • The peers update their copies in response to a
    write, but only send acknowledgments for reads.
    After all updates and acknowledgments are
    received, the local copy is returned to the
    client.

25
Implementing sequential consistency
  • Use total order broadcast for writes only.
  • For reads, immediately return the local copy.
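A minimal Python sketch of this scheme follows. It uses a central sequencer to obtain the total order, which is only one of several ways to build total order broadcast and is an assumption of the example, as are the class names `Sequencer` and `Replica`.

```python
class Sequencer:
    """Assigns a global sequence number to every write."""

    def __init__(self):
        self.next_seq = 0
        self.replicas = []

    def broadcast_write(self, key, value):
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            replica.deliver(seq, key, value)


class Replica:
    def __init__(self, sequencer):
        self.store = {}
        self.expected = 0   # next sequence number to apply
        self.buffer = {}    # writes that arrived out of order
        self.sequencer = sequencer
        sequencer.replicas.append(self)

    def deliver(self, seq, key, value):
        self.buffer[seq] = (key, value)
        # Apply buffered writes strictly in sequence order.
        while self.expected in self.buffer:
            k, v = self.buffer.pop(self.expected)
            self.store[k] = v
            self.expected += 1

    def write(self, key, value):
        self.sequencer.broadcast_write(key, value)

    def read(self, key):
        # Reads are local and immediate: no communication.
        return self.store.get(key)


sequencer = Sequencer()
r1, r2 = Replica(sequencer), Replica(sequencer)
r1.write("x", 10)
r2.write("x", 20)
```

Because every replica applies the writes in the same order, all local copies pass through the same sequence of values, even though a read at one replica may lag behind a read at another.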

26
Eventual consistency
  • Only guarantees that all replicas eventually
    receive all updates, regardless of the order.
  • The system does not provide replication
    transparency, but large-scale systems like Bayou
    allow this. Conflicting updates are resolved
    using occasional anti-entropy sessions that
    incrementally steer the system towards a
    consistent configuration.

27
Implementing eventual consistency
  • Updates are propagated via epidemic protocols.
    Server S1 randomly picks a neighboring server S2,
    and passes on the update.
  • Case 1. S2 did not receive the update before. In
    this case, S2 accepts the update, and both S1 and
    S2 continue the process.
  • Case 2. S2 already received the update from
    someone else. In that case, S1 loses interest in
    sending updates to S2 (reduces the probability of
    transmission to S2 to 1/p, where p is a tunable
    parameter).
  • There is always a finite probability that some
    servers do not receive all updates. The number
    can be controlled by changing p.
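The two cases can be simulated with a short Python sketch. The function name `spread` and its parameters are hypothetical. The sketch models "losing interest" as the sender becoming inactive with probability 1/p whenever it contacts an already-informed server; that is one common reading of the rule and is an assumption here.

```python
import random


def spread(n, p, seed=0):
    """Simulate rumor spreading among n servers; return the informed set."""
    rng = random.Random(seed)
    informed = {0}          # server 0 receives the update first
    active = [0]            # servers still interested in spreading it
    while active:
        s1 = rng.choice(active)
        s2 = rng.choice([s for s in range(n) if s != s1])
        if s2 not in informed:
            # Case 1: S2 accepts the update and starts spreading it too.
            informed.add(s2)
            active.append(s2)
        elif rng.random() < 1.0 / p:
            # Case 2: S2 already had it, so S1 may lose interest.
            active.remove(s1)
    return informed


informed = spread(n=50, p=3)
missed = 50 - len(informed)   # servers that never received the update
```

Running the simulation for several values of p shows the trade-off: a larger p keeps servers spreading longer, which costs more messages but leaves fewer servers uninformed.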

28
Anti-entropy sessions
  • These sessions minimize the degree of chaos in
    the states of the replicas.
  • During such a session, server S1 will pull the
    update from S2, and server S3 can push the
    update to S4.

[Figure: servers S1 to S4 exchanging updates, each
labeled with the timestamp of its update (values
shown: 24, 26, 30, 32).]
29
Exercise
  • Let x, y be two shared variables.

        Process P                 Process Q
        initially x = 0           initially y = 0
        x := 1                    y := 1
        if y = 0 → x := 2 fi      if x = 0 → y := 2 fi
        print x                   print y

  • If sequential consistency is preserved, then
    what are the possible values of the printouts?
    List all of them.
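One way to check an answer to this exercise is to enumerate every interleaving that respects each process's program order, which is exactly what sequential consistency permits. The Python sketch below does so. The function names are hypothetical, and treating each guarded statement as one atomic step is a simplifying assumption. It does not change the outcomes here, because neither process writes the variable the other one prints.

```python
from itertools import combinations


def run(schedule):
    """Execute one interleaving; schedule is a sequence of 'P' / 'Q' steps."""
    mem = {"x": 0, "y": 0}
    pc = {"P": 0, "Q": 0}
    for proc in schedule:
        mine, other = ("x", "y") if proc == "P" else ("y", "x")
        if pc[proc] == 0:
            mem[mine] = 1              # x := 1   (or y := 1)
        elif mem[other] == 0:
            mem[mine] = 2              # if y = 0 -> x := 2 fi (or symmetric)
        pc[proc] += 1
    return mem["x"], mem["y"]          # values printed by P and by Q


def all_outcomes():
    """Collect the printouts over all interleavings of the two processes."""
    outcomes = set()
    # Choose which 2 of the 4 slots belong to P; program order is preserved.
    for p_slots in combinations(range(4), 2):
        schedule = ["P" if i in p_slots else "Q" for i in range(4)]
        outcomes.add(run(schedule))
    return outcomes


outcomes = all_outcomes()
```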

30
Client centric consistency model
Relevant in the cloud storage environment
31
Client-centric consistency model
  • Read-after-read
  • If a read from A is followed by a read from B,
    then the second read should return data that is
    at least as recent as the data returned by the
    previous read.

[Figure: replicas A (Iowa City) and B (San Francisco).]
All the emails read at location A must be marked
as read in location B
32
Client-centric consistency model
  • Read-after-write (a.k.a. read your writes)
  • Consider a large distributed store containing a
    massive collection of music. Clients set up
    password-protected accounts for purchasing and
    downloading music.
  • If Alice changes her password in Iowa City,
    travels to Minneapolis, and tries to access the
    collection by logging into the account using her
    new password, then she must be able to do so.

33
Client-centric consistency model
  • Write-after-read (a.k.a. write-follows-read)
  • Each write operation following a read should
    take effect on the previously read copy, or a
    more recent version of it.

Balance in the Iowa City bank after your paycheck
was credited: Balance = 1500.
Use your bank card to pay $500 in a store in
Denver: Balance := Balance - 500.
The write should take effect on Balance = 1500.
But the payment did not go through!
Write-after-read consistency was violated.
34
Client-centric consistency model
  • Write-after-write (a.k.a. monotonic write)
  • When a write at S is followed by a write at a
    different server S', the updates at S must be
    visible at S' before the data is updated at S'.

[Figure: servers S and S', labeled San Francisco and
Dallas.]
Alice gave a raise to each of her 100 employees.
Alice then went to San Francisco.
Only ½ of the updates at S are visible here.
Alice then decided to give a 10% bonus on the new
salary to every employee.
½ of the employees will receive a lower bonus.
Write-after-write consistency was violated.
35
Implementing client-centric consistency
Read set RS, write set WS. Before an operation at
a different server is initiated, the appropriate
RS or WS is fetched from another server.
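The idea can be sketched in Python for the write set WS. The `Server` and `Session` classes below are hypothetical, and the sketch assumes a single client whose write ids increase in issue order. Before any operation at a different server, the session has that server fetch the writes in WS it is missing, which gives read-your-writes and monotonic writes for this client.

```python
class Server:
    def __init__(self):
        self.log = {}      # write id -> (key, value), for writes applied here
        self.store = {}

    def apply(self, wid, key, value):
        if wid not in self.log:
            self.log[wid] = (key, value)
            self.store[key] = value

    def sync_from(self, other, wanted):
        """Fetch the writes in 'wanted' that this server has not applied yet."""
        for wid in sorted(wanted - set(self.log)):
            self.apply(wid, *other.log[wid])


class Session:
    """Tracks the write set WS of one client."""

    def __init__(self):
        self.ws = set()
        self.last_server = None
        self.next_id = 0

    def _move_to(self, server):
        # Before operating at a different server, bring it up to date with WS.
        if self.last_server is not None and server is not self.last_server:
            server.sync_from(self.last_server, self.ws)
        self.last_server = server

    def write(self, server, key, value):
        self._move_to(server)
        wid = self.next_id
        self.next_id += 1
        server.apply(wid, key, value)
        self.ws.add(wid)

    def read(self, server, key):
        self._move_to(server)
        return server.store.get(key)


iowa_city, minneapolis = Server(), Server()
alice = Session()
alice.write(iowa_city, "password", "new-secret")
seen = alice.read(minneapolis, "password")
```

A read set RS could be handled the same way, by recording the ids of the writes that each read reflected and syncing those before the next read elsewhere.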
36
Quorum-based protocols
A quorum system engages only a designated minimum
number of the replicas for every read or write
operation; this number is called the read or
write quorum. When the quorum is not met, the
operation (read or write) is not performed.
Improves reliability and availability, and reduces
the load on individual servers.
37
Quorum-based protocols
Use 2-phase locking to update all the copies
(value, version).
Thomas rule: To write, update > N/2 of them, and
tag it with a new version number. To read, access
> N/2 replicas, and access the value from the copy
with the largest version number. Otherwise abandon
the read.
[Figure: replicas grouped into a write quorum and a
read quorum.]
38
Rationale
N = number of replicas.
[Figure: replicas holding Ver 3 and Ver 2.]
If different replicas store different version
numbers for an item, the state associated with a
larger version number is more recent than the
state associated with a smaller version
number. We require that R + W > N, i.e., read
quorums always intersect with write quorums.
This will ensure that read results always reflect
the result of the most recent write (because the
read quorum will include at least one replica
from the most recent write).
39
How it works
N = number of replicas.
1. Send a write request containing the state and
new version number to all the replicas, and wait
to receive acknowledgements from a write quorum.
At that point the write operation is complete.
The replicas are locked when the write is in
progress.
2. Send a read request for the version number to
all the replicas, and wait for replies from a
read quorum.
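The two steps can be sketched in Python as follows. The `Replica` class and the `write` and `read` functions are hypothetical. A random sample stands in for "the replicas that acknowledged in time", and locking is omitted, so this shows only the version logic under the assumptions R + W > N and W > N/2.

```python
import random


class Replica:
    def __init__(self):
        self.value, self.version = None, 0


def write(replicas, w, value, rng=random):
    """Write to a quorum of w replicas, tagging the value with a new version."""
    quorum = rng.sample(replicas, w)
    # With W > N/2 the quorum overlaps the previous write quorum,
    # so it contains the newest version number.
    new_version = max(r.version for r in quorum) + 1
    for r in quorum:
        r.value, r.version = value, new_version


def read(replicas, r_size, rng=random):
    """Read from a quorum of r_size replicas and return the newest value."""
    quorum = rng.sample(replicas, r_size)
    newest = max(quorum, key=lambda r: r.version)
    return newest.value


replicas = [Replica() for _ in range(5)]     # N = 5
rng = random.Random(1)
write(replicas, 3, "v1", rng)                # W = 3
write(replicas, 3, "v2", rng)
latest = read(replicas, 3, rng)              # R = 3, so R + W > N
```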
40
Quorum-based protocols
After a partition, only the larger segment runs
the consensus protocol. The smaller segment
contains stale data, until the network is
repaired.
[Figure: a partitioned network in which one segment
holds Ver. 1 and the other holds Ver. 0.]
41
Quorum-based protocols: generalized version
Asymmetric quorum: W + R > N, W > N/2.
No two writes overlap. No read overlaps with a
write.
R = read quorum, W = write quorum.
This generalization is due to Gifford.
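The two inequalities can be checked mechanically. In the Python sketch below (function names hypothetical), `is_valid` tests the conditions directly, and `quorums_always_intersect` confirms by brute force over small N that they are what makes every read quorum meet every write quorum and every pair of write quorums overlap.

```python
from itertools import combinations


def is_valid(n, r, w):
    """Check the conditions R + W > N and W > N/2."""
    return r + w > n and 2 * w > n


def quorums_always_intersect(n, a, b):
    """Brute force: does every a-subset meet every b-subset of n replicas?"""
    nodes = range(n)
    return all(set(x) & set(y)
               for x in combinations(nodes, a)
               for y in combinations(nodes, b))


# With N = 5, cheap reads are possible if writes touch more replicas.
read_optimized = is_valid(5, 2, 4)
symmetric = is_valid(5, 3, 3)
too_small = is_valid(5, 2, 3)
```

Choosing a small R and a large W favors read-heavy workloads, while the symmetric majority choice balances the two.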
42
Brewer's CAP Theorem
In an invited talk at the PODC 2000 conference,
Eric Brewer presented a conjecture that it is
impossible for a web service to provide all three
of the following guarantees: consistency (C),
availability (A), and partition tolerance (P).
Individually, each of these guarantees is highly
desirable; however, a web service can meet at
most two of the three guarantees.
43
A High-level View of CAP Theorem
For consistency and availability, propagate the
update from the left to the right partition. But
how can you do it? So sacrifice partition
tolerance. If you prefer partition tolerance and
availability, then sacrifice consistency. Or, if
you prefer both partition tolerance and
consistency, then sacrifice availability: users
in the right partition will wait indefinitely
until the partition is restored and the update is
propagated to the right.
44
Amazon Dynamo
Amazon's Dynamo is a highly scalable and highly
available key-value store designed to support
the implementation of its various e-commerce
services. Dynamo serves tens of millions of
customers at peak times using thousands of
servers located across numerous data centers
around the world. Dynamo uses distributed hash
tables (DHT) to map its servers in a circular
key space using consistent hashing, commonly used
in many P2P networks.
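A minimal Python sketch of a consistent hashing ring follows. It is an illustration of the general technique, not Dynamo's actual implementation: the `Ring` class, the `preference_list` method, and the server names are hypothetical, and virtual nodes are left out.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a server name or key to a point on the circular key space."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, servers):
        self.points = sorted((ring_position(s), s) for s in servers)

    def preference_list(self, key, t):
        """First t servers met walking clockwise from the key's position."""
        start = bisect.bisect_right(self.points, (ring_position(key), ""))
        count = min(t, len(self.points))
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(count)]


servers = ["SA", "SB", "SC", "SD", "SE", "SF", "SG", "SH"]
ring = Ring(servers)
owners = ring.preference_list("K", 3)   # home server plus two successors
```

The property that makes this attractive is locality of change: adding or removing a server affects only the keys whose clockwise walk passes through that server's position.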
45
Amazon Dynamo
[Figure: (a) The key K is stored in the server SG and
is also replicated in servers like SH and SA. (b)
The evolution of multi-version data as reflected
by the values of the vector clocks.]
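How vector clocks expose diverging versions can be sketched in Python. The helper names and the node names `Sx`, `Sy`, `Sz` are illustrative assumptions. A clock is modeled as a dictionary from node name to counter.

```python
def increment(clock, node):
    """Return a copy of the vector clock with node's counter advanced."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if version a has seen everything version b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def concurrent(a, b):
    """Neither version supersedes the other: both must be kept."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Clock for a reconciled version: element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


d1 = increment({}, "Sx")            # first write, handled by Sx
d2 = increment(d1, "Sx")            # second write, again at Sx
d3 = increment(d2, "Sy")            # one client updates via Sy ...
d4 = increment(d2, "Sz")            # ... another updates via Sz
conflict = concurrent(d3, d4)       # two versions now coexist
d5 = increment(merge(d3, d4), "Sx") # reconciled and written back at Sx
```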
46
Amazon Dynamo
Multiple versions of data are, however, rare. In a
24-hour profile of the shopping cart service,
99.94% of requests saw exactly one version,
and 0.00057% of requests saw 2 versions.
Write: the coordinator generates the vector clock
for the new version, and sends it to the top T
reachable nodes. If at least W nodes respond,
then the write is considered successful.
Read: the coordinator sends a request for all
existing versions to the top T reachable servers.
If it receives R responses, then the read is
considered successful.
Uses sloppy quorum: T, R, and W are limited to
the first set of reachable non-faulty servers in
the consistent hashing ring. This speeds up the
read and the write operations by avoiding the
slow servers. Typically, (T, R, W) = (3, 2, 2).
47
Amazon Dynamo
Maintains the spirit of "always write". When a
designated server S is inaccessible or down, the
write is directed to a different server S' with a
hint that this update is meant for S. S' later
delivers the update to S when it recovers (hinted
handoff).
Service level agreement: quite stringent. A
typical SLA requires that 99.9% of the read and
write requests execute within 300 ms; otherwise
customers lose interest and business suffers.
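Hinted handoff can be sketched in Python as follows. The `Node` class and the `write` and `handoff` functions are hypothetical names, and failure detection is reduced to an `up` flag, so this is only an illustration of the idea.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}
        self.hints = []   # (intended node, key, value) held for others

    def put(self, key, value):
        self.store[key] = value


def write(key, value, target, fallback):
    """Always accept the write: use the fallback node if the target is down."""
    if target.up:
        target.put(key, value)
    else:
        # The fallback keeps the update together with a hint naming the target.
        fallback.hints.append((target, key, value))


def handoff(node):
    """Deliver hinted updates to their intended nodes once they are back up."""
    remaining = []
    for target, key, value in node.hints:
        if target.up:
            target.put(key, value)
        else:
            remaining.append((target, key, value))
    node.hints = remaining


s, s_prime = Node("S"), Node("S'")
s.up = False
write("cart", "book", s, s_prime)   # S is down, so S' holds the hinted update
s.up = True
handoff(s_prime)                    # S recovered: S' delivers the update
```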