Consistency and Replication

1
Consistency and Replication

Introduction to Distributed Systems
CS 457/557
Fall 2008
Kenneth Chiu

Topics
Consistency models
Implementation
Replica location and content distribution
Maintaining consistency

3
Why Replicate?

Reliability
If one goes down, the others can stay up.
How can it address corrupted data?
Compare multiple versions
Performance
Divide the work
Place data closer to where it is used.
What is the challenge?
Consistency
Consider a web cache in your browser.

4
Costs

As a scaling technique, replication may not always be
applicable.

Update replica M times per second
Access replica N times per second
What if N is much smaller than M?

What do we do?

[Figure: two "Withdraw 50" requests issued across a WAN.]

A dilemma
Scalability problems can be alleviated by replication and
caching.
But consistency requires global synchronization!
Only real solution is to relax consistency
requirements.

6
Consistency Models Review

Enforcing absolute ordering is too expensive,
especially with replication and caching.
So we need to allow for mis-ordering.
We could just do it casually. Tell programmers,
"Well, you might see things out of order a little
bit, but only in ways that won't matter."
They would say, "What do you mean?"
So we need an exact, very precise way of
specifying the kinds of inconsistencies that the
application might see.
That is the purpose and point of having
consistency models.

7
Data Centric Consistency Models
8
Data Stores

Consistency is viewed as read/write ops on shared
data.
A consistency model is a contract between the
processes and the data store.

9
Continuous Consistency

Three axes for continuous consistency ranges
Deviation in numerical values
Deviation in staleness (age) between replicas
Deviation with respect to ordering
Numerical deviation
Can be specified in terms of deviation in values.
Can also be specified in terms of the number of
updates that have been applied, but not yet seen
by others. Deviation in value is then known as
the weight.
Staleness deviation
A replica can be out-of-date, as long as it is
not too out-of-date
For example, a weather report.
Ordering deviation
Can be specified as the number of ops that may
need to be rolled back.

10
Consistency Unit

Conit: the unit of data over which consistency is
to be measured. Examples?
A single stock
A single weather report

Each replica maintains a vector clock. So it can
do causally ordered multicast.
The notation means time t at replica i.
Conit is data items x and y. Both initialized to
0. Replica A has committed one operation.

Replica A
Ordering deviation is 3, since it has three
uncommitted operations.
Numerical deviation by operations is 1. Weight is
5.

Replica B
Ordering deviation is 2.
Numerical deviation is 3, with weight of 6.

13
Conit Granularity

Why do some hotels have a sink outside?
Should conits be coarse-grained (a whole
database) or fine-grained (just one record in
it)?
In other words, should we try to keep large
pieces of data consistent or small pieces?

Assume that two replicas may only differ in one
outstanding update.
In top, the conit has two data items. In the
bottom, it only has one.
Two updates for the top will force propagation,
on the bottom it will not.
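
The granularity effect described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Conit class, its max_outstanding bound, and the propagation counter are hypothetical names, not part of the slides.

```python
class Conit:
    """A consistency unit covering a set of data items (hypothetical helper)."""

    def __init__(self, items, max_outstanding=1):
        self.items = set(items)
        self.max_outstanding = max_outstanding  # bound on unpropagated updates
        self.outstanding = 0
        self.propagations = 0

    def update(self, item):
        assert item in self.items
        self.outstanding += 1
        # Propagate as soon as the replicas would differ by more than the bound.
        if self.outstanding > self.max_outstanding:
            self.propagations += 1
            self.outstanding = 0


# Top case: one conit covers both data items.
coarse = Conit({"x", "y"})
coarse.update("x")
coarse.update("y")

# Bottom case: each data item is its own conit.
fine = {"x": Conit({"x"}), "y": Conit({"y"})}
fine["x"].update("x")
fine["y"].update("y")

coarse_propagations = coarse.propagations
fine_propagations = sum(c.propagations for c in fine.values())
```

With the same two updates, the coarse conit exceeds its bound and propagates once, while the two fine-grained conits each stay within theirs.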

[Figure: Replica 1 and Replica 2 each receive updates. Top: one conit covers both data items, so the updates are propagated. Bottom: the conit covers one data item, so the updates are postponed.]
15

So should conits always be as small as possible?
Higher overhead.
Similar things in real life. For example, hotel
rooms with sink outside.

16
Consistent Ordering

A more traditional way to model consistency.
From architecture and concurrent programming.

17
Notation

Processes execute to the right as time
progresses.
The notation W1(x)a means that the process P1
wrote the value a to the variable x.
The notation R2(x)a means that the process P2
read the value a from the variable x.
The subscript is often dropped.

18
Sequential Consistency

The result of any execution is the same as if the
(read and write) operations by all processes on
the data store were executed in some sequential
order and the operations of each individual
process appear in this sequence in the order
specified by its program.
There is some global order.
Operations of different processes may be interleaved, but the
operations of each process must appear in program order.

Program A: A-OP1, A-OP2, A-OP3
Program B: B-OP1, B-OP2, B-OP3
Which of these are valid?
Global Order 1: A-OP1, A-OP2, A-OP3, B-OP1, B-OP2, B-OP3
Global Order 2: A-OP1, B-OP1, A-OP2, B-OP2, B-OP3, A-OP3
Global Order 3: A-OP1, B-OP1, A-OP2, B-OP3, B-OP2, A-OP3
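
The question about the three global orders can be checked mechanically. The sketch below tests only the program-order half of the definition (each program's operations must appear in the global order in the order the program issued them). The function name respects_program_order is a hypothetical helper.

```python
def respects_program_order(global_order, programs):
    """True if each program's operations appear in global_order in program order."""
    for ops in programs:
        positions = [global_order.index(op) for op in ops]
        if positions != sorted(positions):
            return False
    return True


program_a = ["A-OP1", "A-OP2", "A-OP3"]
program_b = ["B-OP1", "B-OP2", "B-OP3"]

orders = {
    1: ["A-OP1", "A-OP2", "A-OP3", "B-OP1", "B-OP2", "B-OP3"],
    2: ["A-OP1", "B-OP1", "A-OP2", "B-OP2", "B-OP3", "A-OP3"],
    3: ["A-OP1", "B-OP1", "A-OP2", "B-OP3", "B-OP2", "A-OP3"],
}
valid = {n: respects_program_order(o, [program_a, program_b]) for n, o in orders.items()}
```

Orders 1 and 2 pass; order 3 fails because B-OP3 appears before B-OP2.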
19

Which of these is sequentially consistent?

Consider three concurrently executing processes
P1, P2, and P3.
The data items are x, y, and z.
Assume all initialized to 0.
Assignment is a write operation.
Print is a simultaneous read operation.
All operations are indivisible.
What are some possible execution interleavings?
Which ones are valid?

The signature is the value of the output of P1,
P2, and P3, concatenated in that order.
Not all signatures are valid.
Which of these are valid?

[Figure: the programs of processes P1, P2, and P3.]
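
The transcript does not preserve the three programs. As an assumption, the sketch below uses the common textbook form of this example: P1 runs x = 1; print(y, z), P2 runs y = 1; print(x, z), and P3 runs z = 1; print(x, y). It enumerates every interleaving that keeps program order and collects the signatures that can actually occur.

```python
from itertools import permutations

# Assumed programs: each process writes one variable, then prints the other two.
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}


def signatures():
    """Run every interleaving that keeps program order; collect the signatures."""
    found = set()
    # A schedule names which process takes the next step; the k-th occurrence
    # of a process is its k-th statement, so program order is always kept.
    slots = [p for p in PROGRAMS for _ in range(2)]
    for schedule in set(permutations(slots)):
        store = {"x": 0, "y": 0, "z": 0}
        next_step = {p: 0 for p in PROGRAMS}
        output = {}
        for p in schedule:
            kind, arg = PROGRAMS[p][next_step[p]]
            next_step[p] += 1
            if kind == "write":
                store[arg] = 1
            else:
                output[p] = "".join(str(store[v]) for v in arg)
        found.add(output["P1"] + output["P2"] + output["P3"])
    return found


valid_signatures = signatures()
```

Under these assumed programs, a signature such as 000000 never appears in valid_signatures, because every print follows at least one write.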
23
Sequential Consistency (3) (From 2006)
24
Sequential Consistency (4) (From 2006)

Figure 7-6. Three concurrently-executing
processes.

25
Sequential Consistency (5) (From 2006)

Figure 7-7. Four valid execution sequences for
the processes of Fig. 7-6. The vertical axis is
time.

26
Causal Consistency

For a data store to be considered causally
consistent, it is necessary that the store obeys
the following condition
Writes that are potentially causally related must
be seen by all processes in the same order.
Concurrent writes may be seen in a different
order on different machines.
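
One common way to decide whether two writes are potentially causally related or concurrent is to compare vector timestamps. The slides do not say how the store makes this decision, so treat the following as an illustrative sketch; the function names and the example timestamps are assumptions.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# P1 writes x; P2 reads that value and then writes x: causally related.
w1 = (1, 0)
w2 = (1, 1)
# A write by P2 that never saw P1's write: concurrent with w1.
w3 = (0, 1)

related = happened_before(w1, w2)
unrelated = concurrent(w1, w3)
```

Every process must see w1 before w2, while w1 and w3 may be seen in different orders on different machines.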

Allowed?
This sequence is allowed with a
causally-consistent store, but not with a
sequentially consistent store.

Causally consistent?

29
Grouping Operations

Do SMP machines also need consistency models?
Yes, there are many kinds.
Why do we not care about these when writing multithreaded (MT)
programs?
We do, if we are platform dependent and don't use
locks.
How do we handle consistency in MT programs?
Use locks.
As viewed by an external, data-centric process,
what do locks do?
They turn non-atomic operations into atomic ones
(functionally).
In other words, they group them.

30
Synchronization Variables

Operations are grouped via synchronization
variables (locks).
Each synchronization variable protects an
associated data set.
Each kind of synchronization variable has some
associated properties.

31
Release Consistency

Two operations
Acquire: a critical section is about to be
entered.
Release: a critical section is about to be exited.
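
A minimal single-threaded sketch of the idea: acquire brings the local copy up to date, and release makes local writes visible. The Store and Replica classes are hypothetical, and the sketch deliberately leaves out mutual exclusion and networking.

```python
class Store:
    """The authoritative copy of the guarded data (hypothetical)."""

    def __init__(self):
        self.data = {}


class Replica:
    def __init__(self, store):
        self.store = store
        self.local = {}   # possibly stale local copy
        self.dirty = {}   # writes made inside the critical section

    def acquire(self):
        # Entering the critical section: bring the local copy up to date.
        self.local = dict(self.store.data)

    def write(self, key, value):
        self.local[key] = value
        self.dirty[key] = value

    def read(self, key):
        return self.local.get(key)

    def release(self):
        # Leaving the critical section: make local writes visible to others.
        self.store.data.update(self.dirty)
        self.dirty = {}


store = Store()
a, b = Replica(store), Replica(store)

a.acquire()
a.write("x", 1)
a.release()

stale = b.read("x")   # b has not acquired yet, so it may still be stale
b.acquire()
fresh = b.read("x")
b.release()
```

Reads outside an acquire/release pair carry no guarantee, which is why b sees nothing until it acquires.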

32
Entry Consistency

Necessary criteria for correct synchronization
under entry consistency:
An acquire access of a synchronization variable
is not allowed to perform until all updates to
the guarded shared data have been performed with
respect to that process.
Before an exclusive mode access to a synchronization
variable by a process is allowed to perform with
respect to that process, no other process may
hold the synchronization variable, not even in
nonexclusive mode.
After an exclusive mode access to a synchronization
variable has been performed, any other process's
next nonexclusive mode access to that
synchronization variable may not be performed
until it has performed with respect to that
variable's owner.

An acquire access of a synchronization variable
is not allowed to perform until all updates to
the guarded shared data have been performed with
respect to that process.
In other words, when a process does an acquire, the acquire may
not complete until all remote changes to the
guarded data have been made visible.
Before an exclusive mode access to a synchronization
variable by a process is allowed to perform with
respect to that process, no other process may
hold the synchronization variable, not even in
nonexclusive mode.
In other words, before updating a shared item, a process must
enter the critical section in exclusive mode.
After an exclusive mode access to a synchronization
variable has been performed, any other process's
next nonexclusive mode access to that
synchronization variable may not be performed
until it has performed with respect to that
variable's owner.
In other words, if a process wants to enter a critical section in
nonexclusive mode, it must first check with the
owner of the synchronization variable to get the
most recent copies of the shared data.

Is this valid for entry consistency?
Yes, a valid event sequence for entry consistency.

35
Consistency vs. Coherence

A consistency model describes what happens to a set
of data items when a set of processes operates on that
data.
A coherence model pertains only to a single data
item. So it is about a set of processes writing
to a single data item.

36
Client Centric Models
37
Weaker Models

Sometimes strong models are needed, if the results
of race conditions are very bad.
Banks
Sometimes the results of races are just
inefficiency, or inconvenience, etc.
How strong is Orbitz's model?
If it shows a ticket available, is it really?
How does it prevent two people from reserving the
same seat?
One kind of weaker model is eventual consistency.
If no updates take place for a while, all replicas eventually become consistent.

38
Eventual Consistency
[Figure: a laptop client performs read/write operations over a WAN on a distributed and replicated database. The client moves to another location and (transparently) connects to another replica. Replicas need to maintain client-centric consistency.]

How well does EC work for mobile clients?
Not very well. Things can disappear (go
backwards, etc.).
Client-centric is intended to address this.
Consistent for a single client.

39
Client-Centric Consistency

Intended to address the issues in eventual
consistency for mobile clients.
Consistent for a single client.
Notation
x_i[t] is the version of x at local copy L_i at
time t.
Version x_i[t] is the result of a series of write
operations at L_i that took place since
initialization. This set is WS(x_i[t]).
If the operations in WS(x_i[t1]) have also been
performed at local copy L_j at a later time t2, we
write WS(x_i[t1]; x_j[t2]).

40
Monotonic Reads

A data store is said to provide monotonic-read
consistency if the following condition holds
If a process reads the value of a data item x, any
successive read operation on x by that process
will always return that same value or a more
recent value.
In other words, if a process has seen a value of
x at time t, it will never see an older version
of x at a later time.
Example: suppose a user opens his mailbox in San
Francisco, then flies to New York. Should he see
an earlier version of his mailbox?
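
One way a client could enforce monotonic reads is to remember the newest version it has read and refuse any replica that is older. This is only a sketch: the Session class and the integer version numbers are assumptions, not something the slides specify.

```python
class Replica:
    def __init__(self, version, value):
        self.version = version
        self.value = value


class Session:
    """Client-side session that refuses to read backwards (hypothetical)."""

    def __init__(self):
        self.last_seen = 0

    def read(self, replica):
        if replica.version < self.last_seen:
            raise RuntimeError("replica is older than a version already read")
        self.last_seen = replica.version
        return replica.value


session = Session()
san_francisco = Replica(3, "inbox v3")
new_york = Replica(2, "inbox v2")   # has not caught up yet

first = session.read(san_francisco)
try:
    session.read(new_york)
    rejected = False
except RuntimeError:
    rejected = True
```

A real system would more likely wait for the New York replica to catch up, or redirect the read, rather than raise an error.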

Which one of these obeys this model?

42
Monotonic Writes

In a monotonic-write consistent store, the
following condition holds
A write operation by a process on a data item x
is completed before any successive write
operation on x by the same process.
In other words, a write operation must wait for
all preceding write operations.
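
A sketch of how a replica might enforce this: each client numbers its writes, and the replica buffers any write that arrives before its predecessors. The per-client sequence numbers and the Replica class are assumptions made for illustration.

```python
class Replica:
    def __init__(self):
        self.applied = []    # writes applied, in order
        self.next_seq = {}   # next expected sequence number per client
        self.pending = {}    # (client, seq) -> value, arrived too early

    def receive(self, client, seq, value):
        self.pending[(client, seq)] = value
        expected = self.next_seq.get(client, 0)
        # Apply as many buffered writes as are now in order.
        while (client, expected) in self.pending:
            self.applied.append(self.pending.pop((client, expected)))
            expected += 1
        self.next_seq[client] = expected


replica = Replica()
replica.receive("c", 1, "second write")   # arrives early and is buffered
after_early = list(replica.applied)
replica.receive("c", 0, "first write")    # unblocks both writes
after_both = list(replica.applied)
```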

Which one of these obeys that?

44
Read Your Writes

A data store is said to provide read-your-writes
consistency, if the following condition holds
The effect of a write operation by a process on
data item x will always be seen by a successive
read operation on x by the same process.
In other words a write operation is always
completed before a successive read operation by
the same process, no matter where the read
operation takes place.
Suppose your web browser has a cache.
You update your web page on the server.
You refresh your browser.
Do you have read-your-writes consistency?
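
The browser-cache question can be made concrete with a small sketch. Here the browser remembers the version number of its own latest write and bypasses any cached copy older than that. The Server and Browser classes and the version counter are hypothetical.

```python
class Server:
    def __init__(self):
        self.version = 0
        self.page = ""

    def write(self, page):
        self.version += 1
        self.page = page
        return self.version

    def read(self):
        return self.version, self.page


class Browser:
    def __init__(self, server):
        self.server = server
        self.cache = None      # (version, page) or None
        self.min_version = 0   # newest version this client has written

    def update_page(self, page):
        self.min_version = self.server.write(page)

    def refresh(self):
        # Use the cache only if it reflects this client's own latest write.
        if self.cache is None or self.cache[0] < self.min_version:
            self.cache = self.server.read()
        return self.cache[1]


browser = Browser(Server())
browser.update_page("old text")
before = browser.refresh()
browser.update_page("new text")
after = browser.refresh()
```

A cache without the min_version check would keep returning "old text" after the second update, which is exactly a read-your-writes violation.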

Which of these is read-your-writes?

46
Writes Follow Reads

A data store is said to provide
writes-follow-reads consistency, if the following
holds
A write operation by a process on a data item x
following a previous read operation on x by the
same process is guaranteed to take place on the
same or a more recent value of x that was read.
In other words, any successive write operation by
a process on a data item x is guaranteed to take
place on a copy of x that is up to date with the
value most recently read.
Example: suppose we are replicating a database
for a blog. Performing a write amounts to posting
a response. If we do not use writes-follow-reads,
then it would be possible for a user to read a
response without the original.
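
For the blog example, a replica could hold back a response until the post it answers has arrived. The sketch below assumes each response names the post it replies to; the BlogReplica class and the replies_to parameter are hypothetical.

```python
class BlogReplica:
    def __init__(self):
        self.posts = {}     # post id -> text, visible to readers
        self.waiting = []   # responses whose original has not arrived yet

    def deliver(self, post_id, text, replies_to=None):
        self.waiting.append((post_id, text, replies_to))
        progress = True
        while progress:
            progress = False
            for entry in list(self.waiting):
                pid, body, dependency = entry
                if dependency is None or dependency in self.posts:
                    self.posts[pid] = body
                    self.waiting.remove(entry)
                    progress = True


replica = BlogReplica()
replica.deliver("r1", "I agree", replies_to="p1")   # response arrives first
visible_early = list(replica.posts)
replica.deliver("p1", "Original post")
visible_late = list(replica.posts)
```

The response becomes visible only after the original, so no reader at this replica can see a response without the post it answers.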

Which of these obeys writes-follow-reads?

48
Replica Management
49
Two Subproblems

Your boss says to you, "Our system is too slow,
make it faster."
You decide that replication of servers is the
answer. What do you do next? What are the
questions that need to be answered?
Where to place servers?
Where to place content?

50
Placing Servers

Given a set of N locations, how do you place the
K servers?
What are the goals?
What is the metric that is being optimized?
One algorithm: each time you place a server,
minimize the average remaining distance to
clients.
What is distance?
Is average the right thing to minimize? What if
one client accesses a lot, the other not so much?
Can we ignore the client locations?
Yes, if they are uniformly distributed.
Other ideas for algorithms?
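
The greedy algorithm above can be sketched as follows. It assumes client locations are known and that "distance" is supplied as a function (latency, hop count, and so on); place_servers and its parameters are hypothetical names.

```python
def place_servers(locations, clients, k, distance):
    """Greedily pick k of the candidate locations, one at a time."""
    chosen = []
    for _ in range(k):
        def average_distance(candidate):
            # Each client is served by its nearest server among the sites so far.
            sites = chosen + [candidate]
            return sum(min(distance(c, s) for s in sites) for c in clients) / len(clients)

        remaining = [loc for loc in locations if loc not in chosen]
        chosen.append(min(remaining, key=average_distance))
    return chosen


# Toy one-dimensional example: positions on a line, distance is the gap.
placement = place_servers(
    locations=[0, 5, 10],
    clients=[0, 1, 2, 10],
    k=2,
    distance=lambda a, b: abs(a - b),
)
```

Because each step only optimizes given the earlier choices, the result is not guaranteed to be the best overall placement. Weighting each client by its request rate would address the question about heavy and light clients.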

51
Clustering

One idea: identify the K largest clusters, then
put one server in each cluster.
How do you find clusters?
One way: divide space up into cells, pick the K most
populated ones.
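
A minimal sketch of the cell idea, assuming nodes have two-dimensional coordinates and square cells of a given size; densest_cells and the sample points are illustrative only.

```python
from collections import Counter


def densest_cells(points, cell_size, k):
    """Bucket points into square cells and return the k most populated cells."""
    cells = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return [cell for cell, _ in cells.most_common(k)]


points = [(0.1, 0.2), (0.5, 0.9), (0.7, 0.3), (5.2, 5.1), (5.8, 5.5), (9.0, 1.0)]
best_cells = densest_cells(points, cell_size=1, k=2)
```

The choice of cell_size matters: cells that are too large merge separate clusters, and cells that are too small split one cluster across many cells.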

52
Replica-Server Placement

Choosing a proper cell size for server placement.
It turns out that computing the cell size from the average distance
between two nodes and the number of replicas
works well.
Close to optimum results, but takes much less
time: O(N x max{log(N), K}).
For example, computing the 20 best replica
locations for 64,000 nodes is about 50,000 times
faster.

53
Content Replication and Placement

The logical organization of different kinds of
copies of a data store into three concentric
rings.

Server-initiated replication
Client-initiated replication
54
Content Replication

Permanent replicas
Can be distributed across servers at a single
location. (What problem does this address?)
Can be distributed geographically. (What problem
does this address?)

Server-initiated replicas
Created more dynamically, at the request of the
server.
For example, imagine the traffic on a
hypothetical Red Sox web site the night they won
the World Series.
Can be done to reduce load, and also to improve
client performance.
One algorithm: each server Q keeps track of
requests for each file F, and where they come from.
If the number of requests for F at Q drops below
del(Q, F), the file is removed (if it is not the last
replica).
If the number of requests for F at Q goes above
rep(Q, F), the file is replicated.
If the number of requests for F is between del(Q,
F) and rep(Q, F), the file will be migrated if
for some server P, cntQ(P, F) exceeds
half of the total requests for F.
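
The threshold rules above can be written as a single decision function. In this sketch, counts maps each server P to cntQ(P, F), and the two thresholds are passed in as plain numbers; the function name and return strings are hypothetical.

```python
def decide(counts, del_threshold, rep_threshold, last_replica=False):
    """Decide what server Q does with file F, given per-origin request counts."""
    total = sum(counts.values())
    if total < del_threshold:
        # Too few requests: drop the copy unless it is the only one left.
        return "keep" if last_replica else "delete"
    if total > rep_threshold:
        return "replicate"
    # Between the thresholds: migrate if one server accounts for over half.
    for server, count in counts.items():
        if count > total / 2:
            return f"migrate to {server}"
    return "keep"


rarely_used = decide({"P": 1}, del_threshold=5, rep_threshold=20)
popular = decide({"P": 30}, del_threshold=5, rep_threshold=20)
lopsided = decide({"P": 8, "R": 3}, del_threshold=5, rep_threshold=20)
balanced = decide({"P": 5, "R": 5}, del_threshold=5, rep_threshold=20)
```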

Counting access requests from different clients.

If migration does not succeed for some reason,
then replication is attempted. Server checks all
other servers, starting with the one farthest
away (why?). If some server has cntQ(R,F) above
a certain fraction of the requests for F, a
replication attempt is made.

Client-Initiated Replicas (client-side caches)
Client can cache at will.
Can have different invalidation policies, etc.

59
Content Distribution

What to propagate? Possibilities:
Propagate only a notification of an update.
Invalidation protocol.
Transfer data from one copy to another.
Propagate the update operation to other copies.
When is each advantageous?
Read/write ratio is small?
Read/write ratio is high?

60
Pull vs. Push

Push: updates are sent by servers without being requested.
Pull: updates are sent only when a client specifically asks.
When is each advantageous?
One way of looking at efficiency is whether or
not a message is likely to be useless. For
example, an update message that is not read
before another one is sent.

61
Leases

Hybrid approach: a lease is a promise by the
server to push updates for a specified amount of time.
After that, the client must poll.
Can distinguish three criteria:
If the data is rarely modified, should we give
long or short lease?
If a client often requests an update, should we
give long or short?
If space is short at the server?
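
A lease can be modeled as nothing more than an expiry time. The sketch below uses explicit numeric timestamps so the behavior is easy to follow; the Lease class and client_action function are hypothetical.

```python
class Lease:
    """Server's promise to push updates until expires_at (times in seconds)."""

    def __init__(self, granted_at, duration):
        self.expires_at = granted_at + duration

    def active(self, now):
        return now < self.expires_at


def client_action(lease, now):
    # While the lease holds, the server pushes; afterwards the client must poll.
    return "rely on push" if lease.active(now) else "poll server"


lease = Lease(granted_at=0, duration=30)
during = client_action(lease, now=10)
after = client_action(lease, now=45)
```

The three criteria then become policies for choosing duration: how long since the data was last modified, how often the client asks, and how much lease state the server can afford to keep.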

62
Unicasting vs. Multicasting

Which is better?

63
Consistency Protocols
64
Primary-Based Protocols

In practice, consistency models are usually not
too hard to understand.
If it is too hard to understand, it is too hard
to write correct applications.
Note that this situation is somewhat different
for hardware consistency models. Why?
In primary-based protocols, each data item has an
associated primary replica.
Can be fixed or can move around.

65
Remote-Write Protocols

All write operations are forwarded to a single fixed
primary server (also known as primary-backup).
The primary does the update and forwards it to all backups.
Only when all backups have responded does the primary
respond to the client.

66
[Figure: two clients, the primary server for item x, and backup servers in the data store, connected by the messages listed below.]
W1. Write request
W2. Forward request to primary
W3. Tell backups to update
W4. Acknowledge update
W5. Acknowledge write completed
R1. Read request
R2. Response to read
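
A minimal in-process sketch of the blocking write path: the primary applies the write, tells every backup to update, and acknowledges only when all backups have acknowledged. The Primary and Backup classes are hypothetical, and ordinary method calls stand in for network messages.

```python
class Backup:
    def __init__(self):
        self.data = {}

    def update(self, key, value):
        self.data[key] = value
        return True   # acknowledge the update


class Primary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups

    def write(self, key, value):
        self.data[key] = value
        # Tell every backup to update and collect the acknowledgements.
        acks = [backup.update(key, value) for backup in self.backups]
        # Report completion only when every backup has acknowledged.
        return all(acks)


backups = [Backup(), Backup()]
primary = Primary(backups)
acknowledged = primary.write("x", 42)
```

Because the client waits for every backup, a read at any replica after the acknowledgement sees the new value, at the cost of write latency.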
67

How is the performance of this protocol?
Is it necessary to wait for the W5 to complete
before allowing the client to continue?

68
Local-Write Protocols

Primary copy migrates.
Advantage is that multiple successive writes can
be carried out locally.
Reading processes can continue to read.

70

Also corresponds well with mobile computing.
Before you disconnect, make your laptop the
primary server.
While disconnected, everything is updated locally.
Also fits distributed file systems.

71
Replicated-Write Protocols

Active replication
Writes may happen to any replica
Need to handle ordering issues.
One way is with totally ordered multicast.
Another way is with a sequencer coordinator that
assigns sequence numbers.

Quorum-based: use voting.
To do a write, a client must first get the
approval of a majority of the servers.
File is then updated, and a new version number is
assigned.
To do a read, a client also contacts a majority,
and gets the current version number from them. If
version numbers are the same, then it is the most
recent version.
Generalized
To do a read, assemble a read quorum, NR.
To modify, assemble a write quorum, NW.
Constraints
NR + NW > N, to prevent read-write conflicts.
NW > N/2, to prevent write-write conflicts.
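
The two constraints are easy to check for any proposed configuration. The function name and the sample values of N, NR, and NW below are illustrative assumptions.

```python
def valid_quorum(n, nr, nw):
    """Check the read and write quorum sizes against the two constraints."""
    no_read_write_conflict = nr + nw > n   # every read quorum meets every write quorum
    no_write_write_conflict = nw > n / 2   # any two write quorums overlap
    return no_read_write_conflict and no_write_write_conflict


n = 12
results = {
    (3, 10): valid_quorum(n, 3, 10),
    (7, 6): valid_quorum(n, 7, 6),    # NW is not more than N/2
    (1, 12): valid_quorum(n, 1, 12),  # read one, write all
}
```

With NW = 6 out of 12, two disjoint sets of six servers could each accept a different write, so that configuration is rejected.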

73
Read quorum
Write quorum

Which of these are valid?

74
Cache-Coherence Protocols

For hardware, broadcast or snooping is possible.
Not for distributed systems.
Three aspects:
Coherence detection strategy: when are
inconsistencies detected?
Static: a compiler analyzes the program and inserts instructions
that avoid inconsistencies. What about
for concurrency?
Dynamic: inconsistencies are detected at runtime.
When accessed, block the operation/transaction.
When accessed, but do not block the transaction
(optimistic).
Only at commit.
When is each of these good?
Coherence enforcement strategy: how caches are
kept consistent.
Do not cache any shared data.
If shared data can be cached:
Send invalidation to all caches.
Send the actual update.
When is each of these going to be better?
Modifications by clients: what happens when a
client modifies data.
Write-through
Write-back