UBI529 Distributed Algorithms

Provided by: kevin589
1
UBI529 Distributed Algorithms
Global State of Distributed Systems
2
Motivation
  • Goal: Take a snapshot of the global computation
  • A snapshot of local states on n processes taken
    at exactly the same time
  • Two terms: global state and global snapshot
  • Useful for debugging
  • Useful for backup/check-pointing
  • Useful for calculating global predicate
  • E.g., Exactly how much currency do we have in the
    country (notice that money flows among people
    constantly)?
  • Deadlock Detection
  • Rollback Recovery
  • Termination Detection

3
Global state
  • Global state
  • A set of local states that are concurrent with
    each other
  • Concurrent states: no two states are related by
    the happened-before relation

4
The mystery of the missing dollars
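[Figure: A holds 400 and B holds 300; A sends 100 to B]
  • Picture taken at A: 400
  • A sends 100 to B
  • Picture taken at B: 400
  • Total is 800, although the system only ever holds
    700: the 100 sent by A is counted at both A and B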
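
The arithmetic behind this slide can be replayed in a few lines. This is only a minimal sketch: the balances are the ones read off the slide, and the class and method names (`NaiveSnapshot`, `run`) are made up for illustration.

```java
// Hypothetical illustration of the uncoordinated-snapshot anomaly.
class NaiveSnapshot {
    // Returns {recorded total, true total} for the scenario on the slide.
    static int[] run() {
        int a = 400, b = 300;      // balances assumed from the slide
        int pictureA = a;          // picture taken at A before it sends
        a -= 100;                  // A sends 100 to B ...
        b += 100;                  // ... and B receives it
        int pictureB = b;          // picture taken at B after the receipt
        return new int[] { pictureA + pictureB, a + b };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("recorded total = " + r[0] + ", true total = " + r[1]);
    }
}
```

The two pictures are not concurrent: the picture at B is taken after B received a message that A sent after its own picture, so the same 100 shows up twice.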

5
Global Snapshot Problem
  • Determine the global system state (e.g. the total
    money )
  • Each process records its own state
  • No shared clock/memory
  • Group of photographers taking snaps of different
    portions and trying to combine to get the overall
    picture.

6
Consistent cut
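  • Given a computation $(E, \to)$, a set $F \subseteq E$ is a cut iff
    it is closed under each process's local order: if $e \in F$ and
    $f$ precedes $e$ on the same process, then $f \in F$
  • $F$ is a consistent cut (global snapshot) iff
    $e \in F \land f \to e \Rightarrow f \in F$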

7
Consistent and inconsistent cuts
8
Consistent cut
A cut is a set of events.
  • $(a \in \text{consistent cut } C) \land (b \text{ happened before } a) \Rightarrow b \in C$

[Figure: events on processes P1, P2 and P3 with two cuts, Cut 1 and Cut 2; one is labelled consistent and the other not consistent]
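
The closure condition for a consistent cut can be checked mechanically. The sketch below is illustrative only: the class name `CutCheck`, the `preds` map and the tiny two-process run in `main` are assumptions, not taken from the slide's figure. It relies on the fact that a cut closed under direct predecessors is also closed under the full happened-before relation.

```java
import java.util.*;

class CutCheck {
    // preds.get(e) lists the events that directly happen before e: the previous
    // event on the same process, or the send event of a message received at e.
    static boolean isConsistent(Set<String> cut, Map<String, List<String>> preds) {
        for (String e : cut)
            for (String p : preds.getOrDefault(e, List.of()))
                if (!cut.contains(p)) return false;   // an effect without its cause
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical run: P1 executes a, then b (a send); P2 executes c
        // (the matching receive), then d.
        Map<String, List<String>> preds = Map.of(
            "b", List.of("a"),
            "c", List.of("b"),
            "d", List.of("c"));
        System.out.println(isConsistent(Set.of("a", "b", "c"), preds)); // true
        System.out.println(isConsistent(Set.of("a", "c"), preds));      // false
    }
}
```

The second cut is rejected because it contains the receive event c but not the send event b that happened before it.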
9
Consistent snapshot
  • The set of states immediately following a
    consistent cut forms a consistent snapshot of a
    distributed system.
  • A snapshot that is of practical interest is the
    most recent one. Let $C_1$ and $C_2$ be two consistent
    cuts with $C_1 \subset C_2$. Then $C_2$ is more recent than $C_1$.
  • Analyze why certain cuts in the one-dollar bank
    are inconsistent.

10
Consistent snapshot
  • How to record a consistent snapshot? Note that:
  • 1. The recording must be non-invasive.
  • 2. The recording must be done on the fly.
  • You cannot stop the system.

11
Chandy-Lamport Algorithm
  • Assumes FIFO, unidirectional channels
  • A bidirectional channel is modelled as two
    unidirectional channels
  • Each process has an associated color. All
    processes are initially white.
  • A process records its local state just before
    turning red
  • On turning red, the process sends out a marker on
    all outgoing channels
  • On receiving a marker, a white process turns red

12
Chandy-Lamport Algorithm
  • Works when
  • (1) the graph is strongly connected, and
  • (2) each channel is FIFO.
  • An initiator initiates the algorithm by sending
    out a marker

13
White and red processes
  • Initially every process is white. When a process
    receives a marker, it turns red if it has not
    already done so.
  • Every action by a process, and every message sent
    by a process gets the color of that process.

14
Two steps
  • Step 1. In one atomic action, the initiator (a)
    turns red, (b) records its own state, and (c) sends
    a marker along all outgoing channels.
  • Step 2. Every other process, upon receiving a
    marker for the first time (and before doing
    anything else), (a) turns red, (b) records its own
    state, and (c) sends markers along all outgoing
    channels.
  • The algorithm terminates when (1) every process
    has turned red, and (2) every process has received
    a marker through each incoming channel.
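
The two steps can be replayed on a toy bank with two processes. The following is a single-threaded sketch under several assumptions that are not in the slides: the class name `ChandyLamportDemo`, the encoding of the marker as -1, the starting balances 400 and 300, and the fixed message schedule in `run`. It records the state of each incoming channel as the money that arrives after the receiver turned red and before the marker arrives on that channel.

```java
import java.util.*;

class ChandyLamportDemo {
    static final int MARKER = -1;              // marker encoded as -1 (assumption of this sketch)
    final int[] balance = {400, 300};          // process 0 = A, process 1 = B
    final boolean[] red = new boolean[2];
    final boolean[] closed = new boolean[2];   // marker already received on the channel into p
    final int[] recorded = new int[2];         // recorded local states
    final int[] channelState = new int[2];     // recorded money in transit towards p
    final List<Deque<Integer>> in =            // in.get(p) = FIFO channel into p
        List.of(new ArrayDeque<Integer>(), new ArrayDeque<Integer>());

    void turnRed(int p) {                      // record the state, then send the marker
        red[p] = true;
        recorded[p] = balance[p];
        in.get(1 - p).addLast(MARKER);
    }

    void transfer(int from, int amount) {      // ordinary application message
        balance[from] -= amount;
        in.get(1 - from).addLast(amount);
    }

    void deliver(int p) {                      // p receives the next message on its channel
        int msg = in.get(p).removeFirst();
        if (msg == MARKER) {
            if (!red[p]) turnRed(p);
            closed[p] = true;
        } else {
            if (red[p] && !closed[p]) channelState[p] += msg;  // white message caught in transit
            balance[p] += msg;
        }
    }

    // Runs one fixed schedule and returns the total recorded by the snapshot.
    static int run() {
        ChandyLamportDemo d = new ChandyLamportDemo();
        d.transfer(1, 50);   // B sends 50 to A (still in transit)
        d.turnRed(0);        // A initiates: records 400, sends a marker to B
        d.transfer(0, 100);  // A sends 100 to B after the marker (a red message)
        d.deliver(1);        // B receives the marker: records 250, sends a marker to A
        d.deliver(1);        // B receives the 100; its channel is closed, nothing is recorded
        d.deliver(0);        // A receives the 50 while red: recorded as channel state
        d.deliver(0);        // A receives B's marker: the snapshot is complete
        return d.recorded[0] + d.recorded[1] + d.channelState[0] + d.channelState[1];
    }

    public static void main(String[] args) {
        System.out.println("snapshot total = " + run());
    }
}
```

With this schedule the recorded total is 400 + 250 + 50 = 700, which matches the money actually in the system, even though money keeps moving while the snapshot is taken.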

15
Why does it work?
  • Lemma 1. No red message is received in a white
    action.

16
Why does it work?
[Figure: easy conceptualization of the snapshot state: every action before SSS is white and every action after SSS is red]
  • Theorem. The global state recorded by
    Chandy-Lamport algorithm is equivalent to the
    ideal snapshot state SSS.
  • Hint. A pair of actions (a, b) can be scheduled
    in any order, if there is no causal order between
    them, so (a b) is equivalent to (b a)

17
Why does it work?
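Let an observer observe the following actions:
  $w_i \; w_k \; r_k \; w_j \; r_i \; w_l \; r_j \; r_l$
  $\equiv w_i \; w_k \; w_j \; r_k \; r_i \; w_l \; r_j \; r_l$ (Lemma 1)
  $\equiv w_i \; w_k \; w_j \; r_k \; w_l \; r_i \; r_j \; r_l$ (Lemma 1)
  $\equiv w_i \; w_k \; w_j \; w_l \; r_k \; r_i \; r_j \; r_l$ (done!)
The recorded state is the one between the last white action $w_l$ and the first red action $r_k$.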
18
Example 1. Count the tokens
  • Let us verify that the Chandy-Lamport snapshot
    algorithm correctly counts the tokens circulating
    in the system

[Figure: processes A, B, C and D]
How to account for the channel states? Use sent
and received variables for each process.
19
Chandy-Lamport Algorithm
20
Algorithm
public class RecvCamera extends Process implements Camera {
    // . . .
    public RecvCamera(Linker initComm, CamUser app) {
        // . . .
        for (int i = 0; i < N; i++)
            if (isNeighbor(i)) {
                closed[i] = false;
                chan[i] = new LinkedList();
            } else closed[i] = true;
    }
    public synchronized void globalState() {
        myColor = red;
        app.localState();                 // record local state
        sendToNeighbors("marker", myId);  // send markers
    }
    public synchronized void handleMsg(Msg m, int src, String tag) {
        if (tag.equals("marker")) {
            if (myColor == white) globalState();
            closed[src] = true;
            if (isDone()) {
                // ----- Display channel state (transit messages) chan ----
            }
        } else { // application message
            if ((myColor == red) && (!closed[src]))
                chan[src].add(m);
            app.handleMsg(m, src, tag);   // give it to app
        }
    }
    boolean isDone() {
        if (myColor == white) return false;
        for (int i = 0; i < N; i++)
            if (!closed[i]) return false;
        return true;
    }
}
21
Lai-Yang Algorithm
  • LY1. The initiator records its own state. When
    it needs to send a message m to another process,
    it sends a message (m, red).
  • LY2. When a process receives a message (m, red),
    it records its state if it has not already done
    so, and then accepts the message m.
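
Rules LY1 and LY2 amount to piggybacking the sender's colour on every message. The sketch below is a minimal illustration only: the class `LaiYangProcess`, its `Msg` type and the use of an integer balance as the local state are assumptions. It covers the recording of local states; working out the state of the channels (the white messages still in transit) is not shown.

```java
class LaiYangProcess {
    // A message tagged with its sender's colour (LY1).
    static final class Msg {
        final int amount;
        final boolean red;
        Msg(int amount, boolean red) { this.amount = amount; this.red = red; }
    }

    int state;              // application state: a balance in this sketch
    boolean red = false;    // white until the local state has been recorded
    int recorded;           // meaningful only once red is true

    LaiYangProcess(int initial) { state = initial; }

    void recordIfWhite() {
        if (!red) { red = true; recorded = state; }
    }

    void initiate() { recordIfWhite(); }   // the initiator records its own state

    Msg send(int amount) {                 // LY1: outgoing messages carry the colour
        state -= amount;
        return new Msg(amount, red);
    }

    void receive(Msg m) {                  // LY2: record first, then accept m
        if (m.red) recordIfWhite();
        state += m.amount;
    }
}
```

For example, if a process holding 400 initiates and then sends 100 to a process holding 300, the receiver records 300 before accepting the message, so the recorded states add up to 700 without any marker being sent.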

22
Another example of distributed snapshot
Communicating State Machines
23
Something unusual
  • Let machine i start the Chandy-Lamport snapshot
    before it has sent M along ch1. Also, let machine
    j receive the marker after it sends out M along
    ch2. Observe that the snapshot state is
  • down $\emptyset$ up M
  • Doesn't this appear strange? This state was
    never reached during the computation!

24
Understanding snapshot
25
Understanding snapshot
The observed state is a feasible state that is
reachable from the initial configuration. It may
not actually be visited during a specific
execution. The final state of the original
computation is always reachable from the
observed state.
26
Discussions
  • What good is a snapshot if that state has never
    been visited by the system?
  • - It is relevant for the detection of stable
    predicates.
  • - Useful for checkpointing.

27
Discussions
  • What if the channels are not FIFO?
  • Study how the Lai-Yang algorithm works. It does
    not use any markers.
  • LY1. The initiator records its own state. When
    it needs to send a message m to another process,
    it sends a message (m, red).
  • LY2. When a process receives a message (m, red),
    it records its state if it has not already done
    so, and then accepts the message m.
  • Question 1. Why will it work?
  • Question 2. Are there any limitations of this
    approach?

28
Global state collection
  • Some applications
  • - computing network topology
  • - termination detection
  • - deadlock detection
  • The Chandy-Lamport algorithm does only part of the
    job. Each process collects a fragment of the global
    state, but these pieces have to be stitched
    together to form a global state.

29
A simple exercise
  • Once the pieces of a consistent global state
    become available, consider collecting the global
    state via all-to-all broadcast
  • At the end, each process will compute a set V,
    where $V = \{\, s(i) : 0 \le i \le N-1 \,\}$

[Figure: processes i, j, k and l holding local states s(i), s(j), s(k) and s(l)]
30
All-to-all broadcast
Assume that the topology is a strongly connected
graph.
  • Program broadcast (for process i)
  • define V.i, W.i: set of values
  • initially $V.i = \{s(i)\}$, $W.i = \emptyset$,
    and every channel is empty
  • do $V.i \neq W.i \rightarrow$ send $(V.i \setminus W.i)$ to every
    outgoing channel; $W.i := V.i$
  • $[]\ \neg \text{empty}(k, i) \rightarrow$ receive X from channel (k, i);
    $V.i := V.i \cup X$
  • od

[Figure: processes i and k, with sets V.i, W.i and V.k, W.k, joined by channel (i,k). It acts like a pump.]
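
The two guarded actions of the broadcast program can be simulated round by round. This is a sketch under stated assumptions: the class name `AllToAllBroadcast`, the adjacency-list input `out`, and the use of the integer i to stand for the local state s(i) are all illustrative choices, and the fixed round-robin schedule is just one of many valid interleavings.

```java
import java.util.*;

class AllToAllBroadcast {
    // out.get(i) lists the processes k for which a channel (i, k) exists.
    // Returns the final sets V.0 .. V.(n-1).
    static List<Set<Integer>> run(List<List<Integer>> out) {
        int n = out.size();
        List<Set<Integer>> v = new ArrayList<>(), w = new ArrayList<>();
        Map<Integer, Deque<Set<Integer>>> chan = new HashMap<>();  // key i*n+k = channel (i, k)
        for (int i = 0; i < n; i++) {
            Set<Integer> init = new TreeSet<>();
            init.add(i);                      // V.i = {s(i)}, with s(i) represented by i
            v.add(init);
            w.add(new TreeSet<>());           // W.i = empty set
        }
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < n; i++) {
                // Guard 1: V.i != W.i -> send (V.i \ W.i) on every outgoing channel; W.i := V.i
                if (!v.get(i).equals(w.get(i))) {
                    Set<Integer> diff = new TreeSet<>(v.get(i));
                    diff.removeAll(w.get(i));
                    for (int k : out.get(i))
                        chan.computeIfAbsent(i * n + k, x -> new ArrayDeque<>()).addLast(diff);
                    w.set(i, new TreeSet<>(v.get(i)));
                    progress = true;
                }
            }
            // Guard 2: channel (k, i) not empty -> receive X; V.i := V.i union X
            for (Map.Entry<Integer, Deque<Set<Integer>>> e : chan.entrySet()) {
                int i = e.getKey() % n;       // receiving end of the channel
                while (!e.getValue().isEmpty()) {
                    v.get(i).addAll(e.getValue().removeFirst());
                    progress = true;
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Unidirectional ring 0 -> 1 -> 2 -> 0, which is strongly connected.
        System.out.println(run(List.of(List.of(1), List.of(2), List.of(0))));
    }
}
```

The loop stops exactly when no guard is enabled, that is, when V.i = W.i for every i and every channel is empty. On a strongly connected topology every V.i then contains all the local states.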
31
Proof
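  • Lemma. $\text{empty}(i, k) \Rightarrow W.i \subseteq V.k$.
  • Upon termination, $\forall i: V.i = W.i$,
    and all channels are empty.
  • So, $V.i \subseteq V.k$.
  • On a cyclic path, $V.i = V.k$ must be true.
  • Since $s(i) \in V.i$, it follows that $s(i) \in V.k$.

[Figure: processes i and k, with sets V.i, W.i and V.k, W.k, joined by channel (i,k)]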
32
Acknowledgements
  • This part draws heavily on Dr. Sukumar Ghosh's
    Distributed Systems course 22C166 at the
    University of Iowa

34
Termination Detection and Deadlocks
35
Termination detection
  • During the progress of a distributed computation,
    processes may periodically turn active or
    passive.
  • A distributed computation terminates when
  • (a) every process is passive,
  • (b) all channels are empty, and
  • (c) the global state satisfies the desired
    postcondition

36
Visualizing diffusing computation
[Figure: a diffusing computation started by an initiator, with active and passive processes]
Notice how one process engages another process.
Eventually all processes turn white (passive) and no
message is in transit; this signals termination.
How to develop a signaling mechanism to detect
termination?
37
Dijkstra-Scholten algorithm
The basic scheme
  • Node j engages node k.
  • An initiator initiates termination detection
    by sending signals (messages) down the
    edges via which it engages other nodes.
  • At a suitable time, the recipient sends an
    ack back.
  • When the initiator receives an ack from every
    node that it engaged, it detects termination.

[Figure: node j engages node k by sending a signal; k later returns an ack to j]
38
Dijkstra-Scholten algorithm
  • Deficit(e) = (number of signals on edge e) -
    (number of acks on edge e)
  • For any node, C = total deficit along incoming
    edges and D = total deficit along outgoing edges
  • For the initiator, by definition, C = 0
  • Dijkstra and Scholten used the following two
    invariants to develop their algorithm:
  • Invariant 1. $(C \ge 0) \land (D \ge 0)$
  • Invariant 2. $(C > 0) \lor (D = 0)$

[Figure: example graph with nodes 0 to 5]
39
Dijkstra-Scholten algorithm
  • The invariants must hold when an interim node
    sends an ack.
  • So, acks will be sent when
  • $(C - 1 \ge 0) \land (C - 1 > 0 \lor D = 0)$
    (follows from INV1 and INV2)
  • $= (C > 1) \lor (C \ge 1 \land D = 0)$
  • $= (C > 1) \lor (C = 1 \land D = 0)$

[Figure: example graph with nodes 0 to 5]
40
Dijkstra-Scholten algorithm
  • program detect (for an internal node i)
  • initially C = 0, D = 0, parent = i
  • do
  • $m = \text{signal} \land (C = 0) \rightarrow$
    C := 1; state := active; parent := sender
    (this node can send out messages to engage other
    nodes, or turn passive)
  • $[]\ m = \text{ack} \rightarrow$ D := D - 1
  • $[]\ (C = 1) \land (D = 0) \land \text{state} = \text{passive} \rightarrow$
    send ack to parent; C := 0; parent := i
  • $[]\ m = \text{signal} \land (C = 1) \rightarrow$
    send ack to the sender
  • od

[Figure: example graph with nodes 0 to 5]
Note that the engaged nodes induce a spanning tree.
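
The guarded commands above can be exercised with a small message-queue simulation. The sketch makes several assumptions beyond the slides: node 0 is the initiator, each node engages a fixed list of nodes (`work`) the first time it becomes active and then immediately turns passive, D is incremented whenever a signal is sent, and the names `DijkstraScholten` and `detect` are illustrative.

```java
import java.util.*;

class DijkstraScholten {
    // work.get(j) = nodes that j engages the first time it becomes active
    // (a hypothetical workload). Node 0 is the initiator.
    static boolean detect(List<List<Integer>> work) {
        int n = work.size();
        int[] c = new int[n], d = new int[n], parent = new int[n];
        boolean[] done = new boolean[n];          // workload already performed
        Deque<int[]> net = new ArrayDeque<>();    // {kind, from, to}; kind 0 = signal, 1 = ack
        for (int i = 0; i < n; i++) parent[i] = i;

        // Initiator: C stays 0; it engages its targets and turns passive.
        done[0] = true;
        for (int k : work.get(0)) { net.addLast(new int[] {0, 0, k}); d[0]++; }

        while (!net.isEmpty()) {
            int[] m = net.removeFirst();
            int kind = m[0], from = m[1], i = m[2];
            if (kind == 1) {
                d[i]--;                                    // m = ack -> D := D - 1
            } else if (i != 0 && c[i] == 0) {              // m = signal and C = 0
                c[i] = 1; parent[i] = from;                // node becomes active
                if (!done[i]) {
                    done[i] = true;
                    for (int k : work.get(i)) { net.addLast(new int[] {0, i, k}); d[i]++; }
                }                                          // ... and then turns passive
            } else {
                net.addLast(new int[] {1, i, from});       // already engaged: ack at once
            }
            // (C = 1) and (D = 0) and passive -> ack the parent and leave the tree
            if (i != 0 && c[i] == 1 && d[i] == 0) {
                net.addLast(new int[] {1, i, parent[i]});
                c[i] = 0; parent[i] = i;
            }
        }
        // Termination is detected when the initiator's deficit is back to 0;
        // at that point no other node should hold a deficit either.
        for (int i = 0; i < n; i++) if (c[i] != 0 || d[i] != 0) return false;
        return true;
    }

    public static void main(String[] args) {
        // 0 engages 1 and 2; both of them engage 3.
        System.out.println(detect(List.of(List.of(1, 2), List.of(3), List.of(3), List.of())));
    }
}
```

In the example, node 3 is engaged twice: it joins the tree under node 1, leaves it, and is briefly re-engaged by node 2. The initiator's deficit reaches 0 only after every signal has been acknowledged.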
41
Distributed deadlock
  • Assume each process owns a few resources, and
    review how resources are allocated.
  • Why do deadlocks occur?
  • - Exclusive (i.e., not shared) resources
  • - Non-preemptive scheduling
  • - Circular waiting by all or a subset of
    processes

42
Distributed deadlock
  • Three aspects of deadlock
  • deadlock detection
  • deadlock prevention
  • deadlock recovery

43
Distributed deadlock
  • May occur due to bad design or bad strategy
  • Sometimes prevention is more expensive than
    detection and recovery, so designs may not care
    about deadlocks, particularly if they are rare.
  • May be caused by failures or perturbations in the
    system

44
Wait-for Graph (WFG)
  • Represents who waits for whom.
  • No single process can see the WFG.
  • Review how the WFG is formed.

45
Another classification
  • Resource deadlock
  • R1 AND R2 AND R3
  • also known as AND deadlock
  • Communication deadlock
  • R1 OR R2 OR R3
  • also known as OR deadlock

46
Detection of resource deadlock
  • Notations
  • w[j] = true means that j is waiting
  • depend[j,i] = true means that
    $j \in succ^n(i)$ for some $n > 0$
  • P(i,s,r) is a probe
    (i = initiator, s = sender, r = receiver)

[Figure: wait-for graph on nodes 1 to 4; the initiator, node 4, sends probe P(4,4,3)]
47
Detection of resource deadlock
  • Program for process k
  • do
  • $P(i,s,k) \text{ received} \land w[k] \land (k \neq i) \land \neg depend[k,i] \rightarrow$
    send P(i,k,j) to each successor j; depend[k,i] := true
  • $[]\ P(i,s,k) \text{ received} \land w[k] \land (k = i) \rightarrow$
    process k is deadlocked
  • od
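
The probe rules above can be tried out on a small wait-for graph. This is a hedged sketch, not the slides' code: the class `EdgeChasing`, the adjacency-list representation `succ` and the single probe queue standing in for the network are assumptions. Since only one initiator is simulated, depend[k] stands for depend[k,i].

```java
import java.util.*;

class EdgeChasing {
    // succ.get(k) = processes that k waits for (AND model); waiting[k] = w[k].
    // Returns true if a probe started by the initiator comes back to it.
    static boolean deadlocked(int initiator, List<List<Integer>> succ, boolean[] waiting) {
        if (!waiting[initiator]) return false;
        boolean[] depend = new boolean[succ.size()];   // depend[k] stands for depend[k, initiator]
        Deque<int[]> probes = new ArrayDeque<>();      // probe P(i, s, k) stored as {s, k}
        for (int j : succ.get(initiator)) probes.addLast(new int[] {initiator, j});
        while (!probes.isEmpty()) {
            int k = probes.removeFirst()[1];
            if (!waiting[k]) continue;                 // a process that is not waiting drops the probe
            if (k == initiator) return true;           // k = i: the initiator is deadlocked
            if (!depend[k]) {                          // k != i and not yet dependent
                depend[k] = true;
                for (int j : succ.get(k)) probes.addLast(new int[] {k, j});
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 2 -> 0 is a cycle; 3 waits for 0 but is not on the cycle.
        List<List<Integer>> g = List.of(List.of(1), List.of(2), List.of(0), List.of(0));
        boolean[] w = {true, true, true, true};
        System.out.println(deadlocked(0, g, w));  // true
        System.out.println(deadlocked(3, g, w));  // false: the probe never returns to 3
    }
}
```

The second call shows why the initiator has to lie on a cycle: process 3 is stuck behind the cycle, but its probe never comes back to it, so it cannot declare the deadlock itself.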

48
Observations
  • To detect deadlock, the initiator must be in a
    cycle
  • Message complexity O(|E|), where E = set of edges
  • (edge-chasing algorithm)

Should the links be FIFO?
49
Communication deadlock
[Figure: a wait-for graph] The graph shown has a
resource deadlock but no communication deadlock.
50
Detection of communication deadlock
  • A process ignores a probe if it is not waiting
    for any process. Otherwise:
  • First probe $\rightarrow$ mark the sender as parent
    and forward the probe to the successors
  • Not the first probe $\rightarrow$ send an ack to
    that sender
  • Ack received from every successor $\rightarrow$
    send an ack to the parent
  • Communication deadlock is detected if the
    initiator receives an ack from every successor.

This has many similarities with the Dijkstra-Scholten
termination detection algorithm.
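
The probe and ack rules for the OR model can be simulated in the same style. The sketch assumes that every waiting process has at least one successor and that the initiator declares deadlock once it holds an ack from each of its successors. The class name `OrDeadlock` and the message encoding are illustrative only.

```java
import java.util.*;

class OrDeadlock {
    // succ.get(k) = processes that k waits for (any one of them suffices: OR model).
    static boolean detect(int init, List<List<Integer>> succ, boolean[] waiting) {
        int n = succ.size();
        if (!waiting[init]) return false;
        boolean[] seen = new boolean[n];         // has k already received its first probe?
        int[] parent = new int[n], pending = new int[n];
        Deque<int[]> net = new ArrayDeque<>();   // {kind, from, to}; kind 0 = probe, 1 = ack
        seen[init] = true;
        parent[init] = -1;
        pending[init] = succ.get(init).size();
        for (int j : succ.get(init)) net.addLast(new int[] {0, init, j});
        while (!net.isEmpty()) {
            int[] m = net.removeFirst();
            int kind = m[0], from = m[1], k = m[2];
            if (kind == 0) {
                if (!waiting[k]) continue;                   // not waiting: ignore the probe
                if (seen[k]) {                               // not the first probe: ack the sender
                    net.addLast(new int[] {1, k, from});
                    continue;
                }
                seen[k] = true;                              // first probe: sender becomes parent
                parent[k] = from;
                pending[k] = succ.get(k).size();
                for (int j : succ.get(k)) net.addLast(new int[] {0, k, j});
            } else if (--pending[k] == 0) {                  // acks from every successor
                if (k == init) return true;                  // initiator: deadlock detected
                net.addLast(new int[] {1, k, parent[k]});
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Cycle 0 -> 1 -> 2 -> 0, all waiting.
        System.out.println(detect(0, List.of(List.of(1), List.of(2), List.of(0)),
                                  new boolean[] {true, true, true}));
        // Same cycle, but 2 may also be served by process 3, which is not waiting.
        System.out.println(detect(0, List.of(List.of(1), List.of(2), List.of(0, 3), List.<Integer>of()),
                                  new boolean[] {true, true, true, false}));
    }
}
```

In the second graph the probe sent to process 3 is ignored, so process 2 never collects all of its acks and the initiator never declares a deadlock. This is the expected outcome, since process 3 can still unblock process 2.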
51
Distributed deadlock
  • May occur due to faulty design or resource-sharing
    problems
  • Sometimes prevention is more expensive than
    detection and recovery, so certain designs
    deliberately do not care about deadlocks,
    particularly if they are rare.
  • Sometimes failures or perturbations can
    modify the system state and cause deadlock.

Major issues: detection, prevention, recovery
55
Detection of resource deadlock
Chandy-Misra-Haas algorithm
  • Program for process k
  • do
  • $P(i,s,k) \text{ received} \land w[k] \land (k \neq i) \land \neg depend[k,i] \rightarrow$
    send P(i,k,j) to each successor j; depend[k,i] := true
  • $[]\ P(i,s,k) \text{ received} \land w[k] \land (k = i) \rightarrow$
    process k is deadlocked
  • od

57
Communication deadlock
[Figure: wait-for graph with black nodes and edges, plus node 5 and the red edge (4,5)]
The subgraph of the WFG consisting of black nodes
and black edges has a resource deadlock as well
as a communication deadlock. However, if we add
node 5 and the red edge (4,5) then the
communication deadlock will disappear.
59
Acknowledgements
  • This part of the slides is almost entirely
    based on Dr. Sukumar Ghosh's Distributed Systems
    course 22C166 at the University of Iowa