CPSC 668 Distributed Algorithms and Systems

Transcript and Presenter's Notes
1
CPSC 668 Distributed Algorithms and Systems
  • Fall 2006
  • Prof. Jennifer Welch

2
Distributed Shared Memory
  • A model for inter-process communication
  • Provides illusion of shared variables on top of
    message passing
  • Shared memory is often considered a more
    convenient programming platform than message
    passing
  • Formally, give a simulation of the shared memory
    model on top of the message passing model
  • We'll consider the special case of
  • no failures
  • only read/write variables to be simulated

3
Shared Memory Issues
  • A process will invoke a shared memory operation
    at some time
  • The simulation algorithm running on the same node
    will execute some code, possibly involving
    exchanges of messages
  • Eventually the simulation algorithm will inform
    the process of the result of the shared memory
    operation.
  • So shared memory operations are not
    instantaneous!
  • Operations (invoked by different processes) can
    overlap
  • What should be returned by operations that
    overlap other operations?
  • defined by a memory consistency condition

4
Sequential Specifications
  • Each shared object has a sequential
    specification: specifies behavior of object in
    the absence of concurrency.
  • Object supports operations
  • invocations
  • matching responses
  • Set of sequences of operations that are legal

5
Sequential Spec for R/W Registers
  • Operations are reads and writes
  • Invocations are readi(X) and writei(X,v)
  • Responses are returni(X,v) and acki(X)
  • A sequence of operations is legal iff each read
    returns the value of the latest preceding write.
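
A minimal Python sketch of this legality check, assuming each operation is
modeled as a ("read", value) or ("write", value) pair and the register is
initially 0 (representation chosen for illustration only):

```python
# Sketch: legality check for a single read/write register (initially 0).
def is_legal(ops, initial=0):
    """True iff every read returns the value of the latest preceding write."""
    current = initial
    for kind, value in ops:
        if kind == "write":
            current = value
        elif kind == "read" and value != current:
            return False
    return True

assert is_legal([("write", 5), ("read", 5)])
assert not is_legal([("write", 5), ("write", 7), ("read", 5)])
```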

6
Memory Consistency Conditions
  • Consistency conditions tie together the
    sequential specification with what happens in the
    presence of concurrency.
  • We will study two well-known conditions
  • linearizability
  • sequential consistency
  • We will only consider read/write registers, in
    the absence of failures.

7
Definition of Linearizability
  • Suppose σ is a sequence of invocations and
    responses.
  • an invocation is not necessarily immediately
    followed by its matching response
  • σ is linearizable if there exists a permutation π
    of all the operations in σ (now each invocation
    is immediately followed by its matching response)
    s.t.
  • π|X is legal (satisfies sequential spec) for all
    X, and
  • if response of operation O1 occurs in σ before
    invocation of operation O2, then O1 occurs in π
    before O2 (π respects real-time order of
    non-concurrent operations in σ).
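
This definition can be checked by brute force on small histories. A sketch,
assuming each operation is a (kind, var, value, invocation time, response
time) tuple; the representation and helper names are illustrative, not part
of the course material:

```python
# Sketch: brute-force linearizability check for small R/W register histories.
from itertools import permutations

def legal(seq, initial=0):
    current = {}                          # latest written value per variable
    for kind, var, value, _, _ in seq:
        if kind == "write":
            current[var] = value
        elif value != current.get(var, initial):
            return False
    return True

def respects_real_time(perm):
    # If some operation responds before another is invoked, the first must
    # also come first in the permutation.
    return not any(op2[4] < op1[3]
                   for i, op1 in enumerate(perm) for op2 in perm[i + 1:])

def is_linearizable(history):
    return any(legal(p) and respects_real_time(p) for p in permutations(history))

# A write overlapping a read: the read may return the new value.
assert is_linearizable([("write", "X", 1, 0.0, 2.0), ("read", "X", 1, 1.0, 3.0)])
# A read that starts after the write finished must not return the old value.
assert not is_linearizable([("write", "X", 1, 0.0, 1.0), ("read", "X", 0, 2.0, 3.0)])
```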

8
Linearizability Examples
Suppose there are two shared variables, X and Y,
both initially 0
(Timing diagram omitted: overlapping read and write operations on X and Y
by p0 and p1.)
Is this sequence linearizable?
Yes.
What if p1's read returns 0?
No.
9
Definition of Sequential Consistency
  • Suppose σ is a sequence of invocations and
    responses.
  • σ is sequentially consistent if there exists a
    permutation π of all the operations in σ s.t.
  • π|X is legal (satisfies sequential spec) for all
    X, and
  • if response of operation O1 occurs in σ before
    invocation of operation O2 at the same process,
    then O1 occurs in π before O2 (π respects
    real-time order of operations by the same process
    in σ).
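
Sequential consistency admits the same kind of brute-force check; only the
ordering constraint changes (program order per process instead of real time).
A self-contained sketch, assuming operations are (proc, kind, var, value)
tuples listed in program order for each process:

```python
# Sketch: brute-force sequential consistency check for small histories.
from itertools import permutations

def legal(seq, initial=0):
    current = {}
    for _, kind, var, value in seq:
        if kind == "write":
            current[var] = value
        elif value != current.get(var, initial):
            return False
    return True

def respects_program_order(history, perm):
    procs = {op[0] for op in history}
    return all([op for op in history if op[0] == p] ==
               [op for op in perm if op[0] == p] for p in procs)

def is_sequentially_consistent(history):
    return any(legal(p) and respects_program_order(history, list(p))
               for p in permutations(history))

# The pattern used later in the SC lower bound: each process writes one
# variable and then reads the other, and both reads return 0.
h = [(0, "write", "X", 1), (0, "read", "Y", 0),
     (1, "write", "Y", 1), (1, "read", "X", 0)]
assert not is_sequentially_consistent(h)
```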

10
Sequential Consistency Examples
Suppose there are two shared variables, X and Y,
both initially 0
(Timing diagram omitted: overlapping read and write operations on X and Y
by p0 and p1.)
Is this sequence sequentially consistent?
Yes.
What if p0's read returns 0?
No.
11
Specification of Linearizable Shared Memory Comm.
System
  • Inputs are invocations on the shared objects
  • Outputs are responses from the shared objects
  • A sequence σ is in the allowable set iff
  • Correct Interaction: each proc. alternates
    invocations and matching responses
  • Liveness: each invocation has a matching
    response
  • Linearizability: σ is linearizable

12
Specification of Sequentially Consistent Shared
Memory
  • Inputs are invocations on the shared objects
  • Outputs are responses from the shared objects
  • A sequence σ is in the allowable set iff
  • Correct Interaction: each proc. alternates
    invocations and matching responses
  • Liveness: each invocation has a matching
    response
  • Sequential Consistency: σ is sequentially
    consistent

13
Algorithm to Implement Linearizable Shared Memory
  • Uses totally ordered broadcast as the underlying
    communication system.
  • Each proc keeps a replica for each shared
    variable
  • When read request arrives
  • send bcast msg containing request
  • when own bcast msg arrives, return value in local
    replica
  • When write request arrives
  • send bcast msg containing request
  • upon receipt, each proc updates its replica's
    value
  • when own bcast msg arrives, respond with ack
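
A compact simulation of this algorithm (a sketch: the broadcast layer, class
names, and driver below are invented for illustration), with totally ordered
broadcast modeled as a single shared log that every process delivers in the
same order:

```python
# Sketch: linearizable R/W registers on top of totally ordered broadcast.
class TotalOrderBroadcast:
    def __init__(self):
        self.log = []                      # one global total order of messages
    def bcast(self, msg):
        self.log.append(msg)

class Process:
    def __init__(self, pid, tob):
        self.pid, self.tob = pid, tob
        self.replica = {}                  # local copy of every shared variable
        self.delivered = 0                 # how much of the log was delivered here

    def read(self, var):                   # read request: broadcast it
        self.tob.bcast(("read", self.pid, var))
    def write(self, var, val):             # write request: broadcast it
        self.tob.bcast(("write", self.pid, var, val))

    def deliver_next(self):
        msg = self.tob.log[self.delivered]
        self.delivered += 1
        if msg[0] == "write":
            _, sender, var, val = msg
            self.replica[var] = val        # every replica applies every write
            if sender == self.pid:
                return ("ack", var)        # own write arrived: respond ack
        else:
            _, sender, var = msg
            if sender == self.pid:         # own read arrived: return replica value
                return ("return", var, self.replica.get(var, 0))
        return None

tob = TotalOrderBroadcast()
p0, p1 = Process(0, tob), Process(1, tob)
p1.write("X", 1)
p0.read("X")
for p in (p0, p1):
    while p.delivered < len(tob.log):
        resp = p.deliver_next()
        if resp:
            print(f"p{p.pid} ->", resp)    # p0 returns 1; p1 gets its ack
```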

14
The Simulation
(Layered architecture diagram omitted: the user of read/write shared memory
issues read/write invocations to, and receives return/ack responses from,
the simulation processes alg0 through algn-1, which form the Shared Memory
layer and communicate via to-bc-send and to-bc-recv over the Totally Ordered
Broadcast layer.)
15
Correctness of Linearizability Algorithm
  • Consider any admissible execution α of the
    algorithm
  • underlying totally ordered broadcast behaves
    properly
  • users interact properly
  • Show that σ, the restriction of α to the events
    of the top interface, satisfies Liveness and
    Linearizability.

16
Correctness of Linearizability Algorithm
  • Liveness (every invocation has a response): by
    the Liveness property of the underlying totally
    ordered broadcast.
  • Linearizability: define the permutation π of the
    operations to be the order in which the
    corresponding broadcasts are received.
  • π is legal because all the operations are
    consistently ordered by the TO bcast.
  • π respects real-time order of operations: if O1
    finishes before O2 begins, O1's bcast is ordered
    before O2's bcast.

17
Why is Read Bcast Needed?
  • The bcast done for a read causes no changes to
    any replicas, just delays the response to the
    read.
  • Why is it needed?
  • Let's see what happens if we remove it.

18
Why Read Bcast is Needed
(Timing diagram omitted: p1 performs write(1), issuing a to-bc-send; the
broadcast is delivered at p0 before p0's read, so p0's read returns 1, but
it has not yet been delivered at p2 when p2's later read occurs, so p2's
read returns 0. Not linearizable!)
19
Algorithm for Sequential Consistency
  • The linearizability algorithm, without doing a
    bcast for reads
  • Uses totally ordered broadcast as the underlying
    communication system.
  • Each proc keeps a replica for each shared
    variable
  • When read request arrives
  • immediately return the value stored in the local
    replica
  • When write request arrives
  • send bcast msg containing request
  • upon receipt, each proc updates its replica's
    value
  • when own bcast msg arrives, respond with ack
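
Compared with the previous sketch, only the read path changes: it answers
immediately from the local replica. A minimal illustrative version (class
and variable names are assumptions for the example):

```python
# Sketch: sequentially consistent registers; reads are local, writes broadcast.
class SCProcess:
    def __init__(self, pid, log):
        self.pid, self.log = pid, log      # log models totally ordered broadcast
        self.replica = {}
        self.delivered = 0

    def read(self, var):
        return self.replica.get(var, 0)    # no broadcast: purely local

    def write(self, var, val):
        self.log.append((self.pid, var, val))   # to-bc-send

    def deliver_next(self):
        sender, var, val = self.log[self.delivered]
        self.delivered += 1
        self.replica[var] = val
        return "ack" if sender == self.pid else None

log = []
p0, p2 = SCProcess(0, log), SCProcess(2, log)
p2.write("X", 1)        # broadcast sent, not yet delivered anywhere
print(p0.read("X"))     # 0: p0's replica is stale, which SC permits
p0.deliver_next()
print(p0.read("X"))     # 1
```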

20
Correctness of SC Algorithm
  • Lemma (9.3) The local copies at each proc. take
    on all the values appearing in write operations,
    in the same order, which preserves the per-proc.
    order of writes.
  • Lemma (9.4) If pi writes Y and later reads X,
    then pi's update of its local copy of Y (on
    behalf of that write) precedes its read of its
    local copy of X (on behalf of that read).

21
Correctness of the SC Algorithm
  • (Theorem 9.5) Why does SC hold?
  • Given any admissible execution α, must come up
    with a permutation π of the shared memory
    operations that is
  • legal and
  • respects per-proc. ordering of operations

22
The Permutation π
  • Insert all writes into π in their to-bcast order.
  • Consider each read R in α in the order of
    invocation
  • suppose R is a read by pi of X
  • place R in π immediately after the later of
  • the operation by pi that immediately precedes R
    in α, and
  • the write that R "read from" (caused the latest
    update of pi's local copy of X preceding the
    response for R)
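
The construction can be written out directly (a sketch with an assumed input
format: the writes are given in to-bcast order, and each read is annotated
with its process's preceding operation and the write it read from):

```python
# Sketch: building the permutation pi described above.
def build_permutation(writes, reads):
    """writes: operation ids in to-bcast order.
    reads: (read_id, prev_op_by_same_proc_or_None, read_from_write_id)
           in order of invocation, so previous ops are already placed."""
    pi = list(writes)                      # all writes, in to-bcast order
    for read_id, prev_op, read_from in reads:
        anchors = [pi.index(x) for x in (prev_op, read_from) if x is not None]
        pos = max(anchors) + 1 if anchors else 0
        pi.insert(pos, read_id)            # immediately after the later anchor
    return pi

# Loosely follows the example on the next slide: two writes are broadcast
# and each writer then reads its own value back.
writes = ["w1", "w2"]
reads = [("r1", "w1", "w1"), ("r2", "w2", "w2")]
print(build_permutation(writes, reads))    # ['w1', 'r1', 'w2', 'r2']
```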

23
Permutation Example
(Timing diagram omitted: p2 performs write(1) and a read returning 1, p0
performs write(2) and a read returning 2, with the two writes broadcast via
to-bc-send; the numbers 1-4 in the original figure give the resulting
permutation π.)
24
Permutation π Respects Per Proc. Ordering
  • For a specific proc
  • Relative ordering of two writes is preserved by
    Lemma 9.3
  • Relative ordering of two reads is preserved by
    the construction of π
  • If write W precedes read R in exec. α, then W
    precedes R in π by construction
  • Suppose read R precedes write W in α. Show same
    is true in π.

25
Permutation π Respects Ordering
  • Suppose R and W are swapped in π
  • There is a read R' by pi that equals or precedes
    R in α
  • There is a write W' that equals W or follows W in
    the to-bcast order
  • And R' "reads from" W'.

α|pi:  ... R' ... R ... W ...
π:     ... W ... W' ... R' ... R ...
  • But
  • R' finishes before W starts in α and
  • updates are done to local replicas in to-bcast
    order (Lemma 9.3) so update for W' does not
    precede update for W
  • so R' cannot read from W'.

26
Permutation π is Legal
  • Consider some read R by pi and some write W s.t.
    R reads from W in α.
  • Suppose in contradiction, some other write W'
    falls between W and R in π.
  • Why does R follow W' in π?

27
Permutation π is Legal
  • Case 1: R follows W' in π because W' is also by
    pi and R follows W' in α.
  • Update for W at pi precedes update for W' at pi
    in α (Lemma 9.3).
  • Thus R does not read from W, contradiction.

28
Permutation π is Legal
  • Case 2: R follows W' in π due to some operation
    O by pi s.t.
  • O precedes R in α, and
  • O is placed between W' and R in π
  • Case 2.1: O is a write.
  • update for W' at pi precedes update for O at pi
    in α (Lemma 9.3)
  • update for O at pi precedes pi's local read for R
    in α (Lemma 9.4)
  • So R does not read from W, contradiction.

29
Permutation π is Legal
π:  ... W ... W' ... O' ... O ... R ...
  • Case 2.2: O is a read.
  • A recursive argument shows that there exists a
    read O' by pi (which might equal O) that
  • reads from W' in α and
  • appears in π between W' and O
  • Update for W at pi precedes update for W' at pi
    in α (Lemma 9.3).
  • Update for W' at pi precedes local read for O' at
    pi in α (otherwise O' would not read from W').
  • Recall that O' equals or precedes O (from above)
    and O precedes R (by assumption for Case 2) in α.
  • Thus R cannot read from W, contradiction.

30
Performance of SC Algorithm
  • Read operations are implemented "locally",
    without requiring any inter-process
    communication.
  • Thus reads can be viewed as "fast": time between
    invocation and response is that needed for some
    local computation.
  • Time for writes is time for delivery of one
    totally ordered broadcast (depends on how
    to-bcast is implemented).

31
Alternative SC Algorithm
  • It is possible to have an algorithm that
    implements sequentially consistent shared memory
    on top of totally ordered broadcast that has
    reverse performance
  • writes are local/fast (even though bcasts are
    sent, don't wait for them to be received)
  • reads can require waiting for some bcasts to be
    received
  • Like the previous SC algorithm, this one does not
    implement linearizable shared memory.
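
One standard way to get this reverse trade-off is to acknowledge a write as
soon as its broadcast is sent, and to make a read wait until all of that
process's own earlier writes have been applied locally. The sketch below is
illustrative (the pending-counter mechanism is an assumption, not a
transcription of the course's algorithm):

```python
# Sketch: sequentially consistent DSM with fast (local) writes.
class FastWriteProcess:
    def __init__(self, pid, log):
        self.pid, self.log = pid, log      # log models totally ordered broadcast
        self.replica = {}
        self.delivered = 0
        self.pending = 0                   # own writes broadcast, not yet delivered

    def write(self, var, val):
        self.log.append((self.pid, var, val))
        self.pending += 1
        return "ack"                       # fast: do not wait for delivery

    def read(self, var):
        while self.pending > 0:            # wait until own writes are applied
            self.deliver_next()
        return self.replica.get(var, 0)

    def deliver_next(self):
        sender, var, val = self.log[self.delivered]
        self.delivered += 1
        self.replica[var] = val
        if sender == self.pid:
            self.pending -= 1

log = []
p = FastWriteProcess(0, log)
p.write("X", 5)                            # returns immediately
print(p.read("X"))                         # 5: the read waited for delivery
```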

32
Time Complexity for DSM Algorithms
  • One complexity measure of interest for DSM
    algorithms is how long it takes for operations to
    complete.
  • The linearizability algorithm required D time for
    both reads and writes, where D is the maximum
    time for a totally-ordered broadcast message to
    be received.
  • The sequential consistency algorithm required D
    time for writes and C time for reads, where C is
    the time for doing some local computation.
  • Can we do better? To answer this question, we
    need some kind of timing model.

33
Timing Model
  • Assume the underlying communication system is the
    point-to-point message passing system (not
    totally ordered broadcast).
  • Assume that every message has delay in the range
    [d - u, d].
  • Claim: Totally ordered broadcast can be
    implemented in this model so that D, the maximum
    time for delivery, is O(d).

34
Time and Clocks in Layered Model
  • Timed execution: associate an occurrence time
    with each node input event.
  • Times of other events are "inherited" from time
    of triggering node input
  • recall assumption that local processing time is
    negligible.
  • Model hardware clocks as before: run at same
    rate as real time, but not synchronized
  • Notions of view, timed view, shifting are same
  • Shifting Lemma still holds (relates h/w clocks
    and msg delays between original and shifted execs)

35
Lower Bound for SC
  • Let Tread = worst-case time for a read to
    complete
  • Let Twrite = worst-case time for a write to
    complete
  • Theorem (9.7): In any simulation of sequentially
    consistent shared memory on top of point-to-point
    message passing, Tread + Twrite ≥ d.

36
SC Lower Bound Proof
  • Consider any SC simulation with Tread + Twrite <
    d.
  • Let X and Y be two shared variables, both
    initially 0.
  • Let α0 be an admissible execution whose top layer
    behavior is
  • write0(X,1) ack0(X) read0(Y) return0(Y,0)
  • write begins at time 0, read ends before time d
  • every msg has delay d
  • Why does α0 exist?
  • The alg. must respond correctly to any sequence
    of invocations.
  • Suppose user at p0 wants to do a write,
    immediately followed by a read.
  • By SC, read must return 0.
  • By assumption, total elapsed time is less than d.

37
SC Lower Bound Proof
  • Similarly, let α1 be an admissible execution whose
    top layer behavior is
  • write1(Y,1) ack1(Y) read1(X) return1(X,0)
  • write begins at time 0, read ends before time d
  • every msg has delay d
  • α1 exists for similar reason.
  • Now merge p0's timed view in α0 with p1's timed
    view in α1 to create admissible execution α'.
  • But α' is not SC, contradiction!
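
Spelling out why α' cannot be sequentially consistent (a short derivation
consistent with the slide, written in LaTeX):

```latex
% Top-layer behavior of alpha': both reads return 0.
\[
\begin{aligned}
\alpha'|p_0 &:\ \mathrm{write}_0(X,1)\ \mathrm{ack}_0(X)\ \mathrm{read}_0(Y)\ \mathrm{return}_0(Y,0)\\
\alpha'|p_1 &:\ \mathrm{write}_1(Y,1)\ \mathrm{ack}_1(Y)\ \mathrm{read}_1(X)\ \mathrm{return}_1(X,0)
\end{aligned}
\]
% Whichever read appears last in a permutation \pi that respects program
% order is preceded in \pi by the (only) write to its variable, so legality
% would force it to return 1, not 0.  Hence no legal \pi exists.
```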

38
SC Lower Bound Proof
(Timing diagram omitted: in α' every operation of p0 and p1 takes place
between time 0 and time d and every message has delay d, yet both reads
return 0. Not SC, contradiction!)
39
Linearizability Write Lower Bound
  • Theorem (9.8): In any simulation of linearizable
    shared memory on top of point-to-point message
    passing, Twrite ≥ u/2.
  • Proof: Consider any linearizable simulation with
    Twrite < u/2.
  • Let α be an admissible exec. whose top layer
    behavior is
  • p1 writes 1 to X, p2 writes 2 to X, p0 reads 2
    from X
  • Shift α to create an admissible exec. in which p1's
    and p2's writes are swapped, causing p0's read to
    violate linearizability.
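
The shift arithmetic behind this step, using the Shifting Lemma mentioned
earlier (the specific delay assignment for α is an illustrative assumption,
since the figure is not reproduced in this transcript):

```latex
% Shifting p_i by x_i changes the delay of a message from p_i to p_j to
%   \delta'(p_i \to p_j) = \delta(p_i \to p_j) - x_i + x_j .
\[
  x_1 = +\tfrac{u}{2}, \qquad x_2 = -\tfrac{u}{2}, \qquad x_0 = 0
\]
\[
  \delta'(p_1 \to p_2) = \delta(p_1 \to p_2) - u, \qquad
  \delta'(p_2 \to p_1) = \delta(p_2 \to p_1) + u
\]
% Taking (for illustration) \delta(p_1 \to p_2) = d, \delta(p_2 \to p_1) = d - u,
% and d - u/2 for the delays involving p_0 keeps every shifted delay in
% [d - u, d], so the shifted execution is admissible.  The shift swaps the
% real-time order of p_1's and p_2's writes while p_0 is unshifted, so its
% read of 2 now violates linearizability.
```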

40
Linearizability Write Lower Bound
(Timing diagram omitted: the original execution α, in which p1's write 1
precedes p2's write 2 and p0 then reads 2, is admissible and linearizable.)
41
Linearizability Write Lower Bound
(Timing diagram omitted: shifting p1 later by u/2 and p2 earlier by u/2
yields an admissible execution in which write 2 completes before write 1
begins, yet p0 still reads 2. Not linearizable, contradiction!)
42
Linearizability Read Lower Bound
  • Approach is similar to the write lower bound.
  • Assume in contradiction there is an algorithm
    with Tread < u/4.
  • Identify a particular execution
  • fix a pattern of read and write invocations,
    occurring at particular times
  • fix the pattern of message delays
  • Shift this execution to get one that is
  • still admissible
  • but not linearizable

43
Linearizability Read Lower Bound
  • Original execution
  • p1 reads X and gets 0 (old value).
  • Then p0 starts writing 1 to X.
  • When write is done, p0 reads X and gets 1 (new
    value).
  • Also, during the write, p1 and p2 alternate
    reading X.
  • At some point, the reads stop getting the old
    value (0) and start getting the new value (1)

44
Linearizability Read Lower Bound
  • Set all delays in this execution to be d - u/2.
  • Now shift p2 earlier by u/2.
  • Verify that result is still admissible (every
    delay either stays the same or becomes d or d -
    u).
  • But in shifted execution, sequence of values read
    is
  • 0, 0, ..., 0, 1, 0, 1, 1, ..., 1

not linearizable!
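
The admissibility check in the middle bullet is the following arithmetic
(shifting p2 earlier by u/2 means x2 = -u/2 and all other xi = 0):

```latex
% Every delay in the original execution is d - u/2.  Shifting changes the
% delay of a message from p_i to p_j to \delta - x_i + x_j, so
\[
  \delta'(p_i \to p_2) = \Bigl(d - \tfrac{u}{2}\Bigr) - \tfrac{u}{2} = d - u,
  \qquad
  \delta'(p_2 \to p_i) = \Bigl(d - \tfrac{u}{2}\Bigr) + \tfrac{u}{2} = d,
\]
% while delays not involving p_2 stay d - u/2.  Every delay is therefore
% still in [d - u, d], so the shifted execution is admissible.
```
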
45
Linearizability Read Lower Bound