1
Distributed Systems
Distributed Systems: Clocks, Concurrency, Deadlocks

2
Distributed Systems
Intended Schedule for this Lecture

Times in DS

Physical Time

Synchronization of Clocks

Logical Time

Concurrency

Centralized Algorithm

Token-based Algorithm

Voting Algorithm (Ricart-Agrawala)

Election Algorithms (Bully, Ring)

Deadlock

Centralized Detection

Path Pushing

Distributed Detection of Cycles (Chandy-Misra-Haas)
3
Motivation
Problems with Time
1. Nobody really has one
2. Every ordinary clock is imprecise
3. Due to Albert (Einstein), time is really relative
4. However, we need time stamps in DS (to order events etc.)
4
Motivation
Why Timestamps in Systems?
In order to
1. do some precise performance measurements
2. guarantee up-to-date data or judge the actuality of data
3. establish a total ordering of objects or operations (like transactions)
4. what else?
5
Motivation
Lack of a Uniform Global Time in DS
However, due to the nature of imprecise clocks there is no global, unique time in a DS. Suppose there is a central time server able to deliver exact times via time-messages to all nodes of a widely spread DS (e.g. a MAN) => due to the non-deterministic transfer times of these time-messages there is no uniform time on all nodes of the DS.
The transfer time of time-messages from the central time server to a specific node may vary over time.
6
Distributed Systems
Side Trip into Philosophy and Special Relativity
The Myth of Simultaneity: event 1 and event 2 happen at the same time.
[Figure: event 1 and event 2 occurring simultaneously at different places]
7
Distributed Systems
Event Timelines (Example of previous Slide)
[Figure: event timelines on nodes 3, 4, and 5]
Note: The arrows start from an event and end at an observation. The slope of the arrows depends on the relative speed of propagation.
8
Distributed Systems
Causality
Event 1 causes event 2.
Requirement: We have to establish causality, i.e. each observer must see event 1 before event 2.
9
Distributed Systems
Event Timelines (Example of previous Slide)
[Figure: event timelines on nodes 3, 4, and 5]
Note: In the timeline view, event 2 must be caused by some passage of information from event 1 if it is caused by event 1.
10
Distributed Systems
Example (distributed Unix make)
[Figure: an editor on computer 1 and a compiler on computer 2, each with its own local time, compared against absolute time (God's clock)]
11
Distributed Systems
Physical Time
Some systems really need quite accurate absolute times.
How to achieve high accuracy? Which physical entity may deliver precise timing?
1. The sun
2. An atom => TAI (International Atomic Time)
12
Distributed Systems
Problem with Physical Time
A TAI-day is about 3 msec shorter than a solar day
=> the BIH inserts a leap second whenever the difference between a day and a TAI-day exceeds 800 msec
=> definition of UTC (Universal Time Coordinated), the base of all international time measurement.
13
Distributed Systems
Physical Time
  • UTC signals come from radio broadcasting stations
  • or from satellites (GEOS, GPS), with an accuracy of
  • 1.0 msec (broadcasting station)
  • 0.1 msec (GEOS)
  • 0.1 µsec (GPS)

Remark: Using more than one UTC source you may improve the accuracy.
14
Distributed Systems
Clock Synchronization
  • Adjusting physical clocks
  • local clock behind the reference clock
  • local clock ahead of the reference clock
  • Observation
  • Clocks in a DS tend to drift apart and need to be
    resynchronized periodically
  • A. If the local clock is behind a reference clock, it
  • could be adjusted in one jump, or
  • could be adjusted in a series of small jumps
  • B. What to do if the local clock is ahead of the
    reference clock?

You can adjust by slowing down your local clock, i.e. ignoring some of its HW clock ticks.
15
Distributed Systems
Absolute Clock Synchronization
[Figure: a computer to be synchronized asks a UTC time server (with UTC receiver): the request is sent at t0, the server needs Ts to handle it and answers with tUTC, and the answer arrives at t1. Both time values t0 and t1 are measured with the same (local) clock.]
16
Distributed Systems
Absolute Clock Synchronization
  • Initialize the local clock: t = tUTC
  • (Problem: time-message transfer time)
  • Estimate the message transfer time as (t1 - t0)/2 => t = tUTC + (t1 - t0)/2
  • (Problem: handling time tr of the request message)
  • Suppose tr is known => t = tUTC + (t1 - t0 - tr)/2
  • (Problem: message transfer times are load dependent)
  • Take multiple measurements of (t1 - t0)
  • Throw away measurements above a threshold value
  • Take all others to get an average (see the sketch below)
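A minimal sketch of this estimation in Python; the time-server call and the handling time tr are stand-ins for whatever the real system provides:

import time

def estimate_utc(ask_time_server, tr=0.0):
    # t0, t1: send and receive times, both measured with the local clock
    t0 = time.monotonic()
    t_utc = ask_time_server()          # server returns its UTC time
    t1 = time.monotonic()
    # one-way transfer time estimated as half the round trip minus tr
    return t_utc + (t1 - t0 - tr) / 2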

17
Distributed Systems
Relative Clock Synchronization (Berkeley Algorithm)
If you need a uniform time (without a UTC receiver per computer), but you can establish a central time server:
  • The time server periodically asks all nodes to report their clock times
  • The time server can estimate the local times of all nodes,
  • taking the involved message transfer times into account
  • The time server uses the estimated local times to
  • build the arithmetic mean
  • The corresponding deviations from this
    arithmetic mean are sent to the nodes (see the sketch below)
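An illustrative sketch of the averaging step; the function name and the assumption that reported times are already transfer-time corrected are mine:

def berkeley_round(server_time, reported_times):
    # arithmetic mean over the server's clock and all reported clocks
    clocks = [server_time] + list(reported_times)
    mean = sum(clocks) / len(clocks)
    # each participant gets the deviation it must apply locally
    return mean - server_time, [mean - t for t in reported_times]

# example: server at 10.00, nodes report 9.95 and 10.25
print(berkeley_round(10.00, [9.95, 10.25]))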

18
Distributed Systems
Network Time Protocol (NTP)
  • Goals
  • absolute (UTC) time service in large nets (e.g. the Internet)
  • high availability (via fault tolerance)
  • protection against fabrication (via authentication)
  • Architecture
  • time servers build up a hierarchical synchronization subnet
  • all primary servers (root, level-1 servers) have a UTC receiver
  • secondary servers are synchronized by their
    corresponding parent primary server
  • all other stations are leaves on level 3,
    synchronized by level-2 time servers
  • the accuracy of individual clocks decreases with
    increasing level number
  • the net is able to reconfigure itself

19
Distributed Systems
Three NTP Modes
  • Multicast mode (for quick LANs, low accuracy)
  • the server periodically sends its current time to its
    leaves in the LAN via multicast
  • Procedure-call mode (medium accuracy)
  • the server responds to requests with its current timestamp
  • Symmetric mode (high accuracy, used to
    synchronize between the time servers)
  • exchange of timestamps between the peers

Remark: In all cases the UDP transport protocol is used, i.e. messages can get lost!
20
Distributed Systems
Some NTP Details
Apart from multicast, all other messages are transferred in pairs, i.e. you note the send time as well as the receive time.

[Figure: server B sends message m at t(i-3), server A receives it at t(i-2); server A sends message m' at t(i-1), server B receives it at t(i).]

Let o = tA - tB be the true time offset of B relative to A, oi the estimate of o, t and t' the corresponding message transfer times of m and m', and di = t + t' the total message transfer time. You can measure di = (ti - t(i-3)) - (t(i-1) - t(i-2)), since t(i-2) = t(i-3) + t + o and ti = t(i-1) + t' - o.
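A short sketch of the derived quantities; the offset estimate oi used here is the standard NTP formula, stated as an assumption since the slide stops at di:

def ntp_estimate(t_b_send, t_a_recv, t_a_send, t_b_recv):
    # t_b_send = t(i-3), t_a_recv = t(i-2), t_a_send = t(i-1), t_b_recv = t(i)
    d_i = (t_b_recv - t_b_send) - (t_a_send - t_a_recv)        # di = t + t'
    o_i = ((t_a_recv - t_b_send) - (t_b_recv - t_a_send)) / 2  # estimate of o
    return o_i, d_i

# example: true offset o = 0.5, transfer times t = t' = 0.1
print(ntp_estimate(0.0, 0.6, 1.0, 0.6))   # -> approximately (0.5, 0.2)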
21
Distributed Systems
More NTP Details
  • Successive pairs (oi, di) may have to be filtered
    once more, to get better estimates
  • Time servers synchronize with various other
    time servers,
  • typically with one on the same level
  • and two others on a lower level
  • Servers may choose their synchronization partners
  • Measurements in the Internet show that 99% of
    all nodes
  • have a synchronization error of less than 30
    msec.

22
Distributed Systems
Logical Time
In many cases its sufficient just to order the
relevant events, i.e. we want to be able to
position these events relatively, but not
absolutely. The interesting thing is the
relative position of an event on the time
axis Especially we do not need any scaling of
this logical time axis!
  • A very simple solution is the ring clock (André,
    Herman and Verjus 1985)
  • A clock message circulates
  • Incremented at each event

23
Distributed Systems
Logical Time
  • Characteristics of a logical time
  • causal dependencies have to be mapped correctly
  • (e.g. send message before receive message)
  • non-related events (from independent activities)
  • do not have to be ordered
  • (i.e. they can appear in any order on the
    logical time axis)
  • Assumptions
  • DS = n single-processor nodes
  • Activity of each node = a sequence of totally
    ordered events EN
  • 3 types of events: local events, sends, receives
  • The total activity of the system is E = ∪N EN

24
Distributed Systems
Logical Time
Happen-before relationship of events: Let →p denote the local relation happen-before within node p: a →p b iff a and b are both events on p and a happens before b. We define the global happen-before relation →: a → b holds iff
∃ node p: a →p b, or
∃ message m: a = send(m) and b = receive(m), or
∃ event c: a → c and c → b.
Note: The relation happen-before models potential causality, not necessarily real causality.
25
Distributed Systems
Logical Time
Concurrency of events: Two events a and b are concurrent, a ∥ b, iff neither a → b nor b → a holds.
26
Distributed Systems
Implies an inherent order
Example
[Figure: events e11, e12 on node 1, e21, e22 on node 2, and e31, e32 on node 3]
It holds e11 → e12 → e21 → e22 → e32, furthermore e31 → e32, whereas e31 is not related via happen-before to e11, e12, e21, or e22: e31 is concurrent to e11, e12, e21, and e22.
Remark: The relation happen-before (→) is also called the causality relation.
27
Distributed Systems
Lamport Time
With the ordering implied by the happen-before relation we can establish the Lamport time L via simple counters, where E = set of events.
The mapping L: E → N defines the Lamport time L, i.e. each e ∈ E gets a time stamp L(e), as follows:
1. e is a pure local event or a sending event: if e has no local predecessor, then L(e) = 1; otherwise there is a local predecessor e', thus L(e) = L(e') + 1.
2. e is a receiving event with a corresponding sending event s: if e has no local predecessor, then L(e) = L(s) + 1; otherwise there is a local predecessor e', thus L(e) = max{L(s), L(e')} + 1.
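The two rules translate directly into a small counter class; this is a minimal sketch, not tied to any particular messaging layer:

class LamportClock:
    def __init__(self):
        self.time = 0                    # no event seen yet

    def local_or_send(self):
        # rule 1: L(e) = L(e') + 1, or 1 if there is no predecessor
        self.time += 1
        return self.time

    def receive(self, sender_stamp):
        # rule 2: L(e) = max{L(s), L(e')} + 1
        self.time = max(sender_stamp, self.time) + 1
        return self.time

# example: two local events, then a message stamped 5 arrives
clk = LamportClock()
clk.local_or_send(); clk.local_or_send()
print(clk.receive(5))                    # -> 6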
28
Distributed Systems
Example
[Figure: Lamport time stamps of events on nodes 1, 2, and 3]
Note: Each node has only a local counter, incremented with each local event. With each communication we have to adjust the involved counters of the two communicating nodes to be consistent with the happen-before relation.
Remark: The same mechanism can be used to adjust clocks on different nodes. The Lamport time is consistent with the happen-before relation, i.e. if x → y, then L(x) < L(y), but not vice versa.
29
Distributed Systems
Example Adjusting local clocks with varying rates
30
Distributed Systems
Relationships between the Notions
The Lamport time is consistent with causality, but it does not characterize causality. If x causes y, then x has a smaller Lamport time stamp than y: x → y ⇒ L(x) < L(y). However, L(x) < L(y) does not necessarily imply that x causes y!
31
Distributed Systems
Vector Time
  • There is a DS with n nodes.
  • The n-dimensional vector Vp is the vector time of
    node p,
  • if it is built according to the following rules (see the sketch below):
  • (1) Initially, Vp = (0, ..., 0)
  • (2) For a local event on node p: Vp[p] += 1
  • (3) For a send event on p, do the same and
  • append the new Vp to the message
  • (4) When receiving a message m with an appended
    V(m) on node p:
  • increment Vp as in (2), and then do
  • Vp = max{V(m), Vp}

Build the maximum componentwise.
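A compact vector-clock sketch implementing rules (1)-(4); the class shape is illustrative:

class VectorClock:
    def __init__(self, n, p):
        self.v = [0] * n                 # rule (1): all components zero
        self.p = p                       # index of this node

    def local(self):
        self.v[self.p] += 1              # rule (2)
        return list(self.v)

    def send(self):
        return self.local()              # rule (3): stamp travels with the message

    def receive(self, vm):
        self.v[self.p] += 1              # rule (4): increment as in (2)...
        self.v = [max(a, b) for a, b in zip(self.v, vm)]  # ...then componentwise max
        return list(self.v)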
32
Distributed Systems
Example Vector Time
P1
P2
P3
33
Distributed Systems
Characteristics of the Vector Time
You can define the following relations for the vector time. Suppose u, v are two vector times of dimension n:
1. u ≤ v ⇔ u[p] ≤ v[p] ∀ p = 1, ..., n
2. u < v ⇔ u ≤ v and u ≠ v
3. u ∥ v ⇔ ¬(u ≤ v) and ¬(v ≤ u)
34
Distributed Systems
Characteristics of the Vector Time
The following interrelationships between causality and vector time hold:
A) e → e' ⇔ V(e) < V(e')
B) e ∥ e' ⇔ V(e) ∥ V(e')
The vector time is the best known estimation for global sequencing that is based only on local information.
35
Distributed Systems
Total Ordering of Events
The Lamport time gives us at least a partial ordering of distributed events, which is sufficient for many problems.
However, if we add the unambiguous node number, we can establish a total ordering: an event e on node a gets the global time stamp LT(e) = (L(e), a), with
(L(e), a) < (L(e'), b) ⇔ L(e) < L(e'), or L(e) = L(e') and a < b.
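In Python the pair (L(e), node) already sorts exactly this way, since tuples compare lexicographically:

# (Lamport time, node number) pairs; sorted() realizes the total order LT
events = [(3, 2), (3, 1), (2, 5)]
print(sorted(events))                    # [(2, 5), (3, 1), (3, 2)]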
36
Distributed Systems
Causal Ordering of Messages
A message system that guarantees the original order of the messages is an agreeable characteristic that may ease protocols or algorithms.
Definition: m1 and m2 are two messages received at the same node i. A set of messages is causally ordered if for all pairs (m1, m2) the following holds:
send(m1) → send(m2) ⇒ receive(m1) → receive(m2)
[Figure: example of non-causally ordered messages between P1, P2, and P3]
37
Distributed Systems
Protocol forcing Causal Ordering of Messages
  • Each node i maintains an n×n matrix Mi,
    initialized to 0
  • (i.e. no message was sent up to now).
  • When sending a message from node i to node j,
    increment Mi[i,j],
  • i.e. (i, j, Mi[i,j]) unambiguously identifies
    the message

38
Distributed Systems
Protocol forcing Causal Ordering of Messages
  • The incremented matrix Mi and the node number i
    are
  • appended to the message, i.e. (i, Mi,
    (message)) is sent to node j
  • Upon receiving a message (with matrix M) at node
    j,
  • node j first updates its matrix Mj as follows:
  • ∀ k, l ∈ [1, n], l ≠ j: Mj[k,l] = max{Mj[k,l],
    M[k,l]}, and
  • Mj[i,j] = Mj[i,j] + 1
  • Delay this message until the following holds: M
    ≤ Mj
  • (A ≤ B iff ∀ k, l: A[k,l] ≤ B[k,l]),
  • i.e. wait for earlier messages to node j
    that have not yet arrived
  • (could even be a message from the same node i; see the sketch below)
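A hedged sketch of just the delivery test: a receiver j buffers a message carrying matrix M until M ≤ Mj holds componentwise; function names are illustrative:

def leq(M, Mj):
    # A <= B iff every component of A is <= the matching component of B
    n = len(M)
    return all(M[k][l] <= Mj[k][l] for k in range(n) for l in range(n))

def deliver_ready(buffered, Mj):
    # deliver every buffered (M, message) pair whose matrix test passes
    ready = [msg for M, msg in buffered if leq(M, Mj)]
    buffered[:] = [(M, msg) for M, msg in buffered if not leq(M, Mj)]
    return ready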

39
Distributed Systems
Example
[Figure: messages between P1, P2, and P3; the matrix shown is
0 1 0
0 0 1
0 1 0 ]
40
Distributed Systems
Concurrency Control
  • About coping with some sort of conflicts
  • Locking
  • Transactions
  • Time-Stamp Orderings

41
Distributed Systems
Mutual Exclusion
The problem: For accessing shared data or for using resources we often have to provide exclusiveness! The corresponding pieces of code are named critical sections!
Concurrent accesses are not allowed.
[Figure: tasks on several nodes access shared data; logically we still have a common memory]
42
Distributed Systems
Mutual Exclusion
Requirements for a correct solution:
1. Safety: Only a single task/thread is allowed to be in the critical section! (no deadlocks)
2. Liveness: Each competitor must enter its critical section after a finite waiting time (no starvation)
3. Sequence order: Waiting in front of a critical section is handled according to FCFS
4. Fault tolerance: 1. and 2. have to be fulfilled even in case of failures.
43
Distributed Systems
Criteria for Mutual Exclusion
  • Number of needed messages per critical section
    CS (minimize nm)
  • Protocol delay d (to evaluate who is next) per
    CS (minimize d)
  • Response time RTCS: time interval between
    requesting to enter
  • a CS and leaving the CS (minimize RTCS)
  • Throughput TPCS: CS passages per time unit
    (maximize TPCS)
  • TPCS = 1/(d + ECS), where ECS is the execution time of the CS

44
Distributed Systems
Solutions for Mutual Exclusion in DS
  • Three major approaches
  • Centralized lock manager
  • Token-passing lock manager
  • Standard Token Algorithm
  • Enhanced Token Algorithm
  • Distributed lock manager
  • Lamport Algorithm
  • Ricart-Agrawala Algorithm

45
Distributed Systems
Centralized Lock Manager
One task is designated to be the coordinator for all competing tasks concerning a specific critical region CR (the CSs belonging to the same mutual exclusion problem). The centralized lock manager (CLM) controls accesses to CR using a token which represents the permission to enter the CS. To enter its CS, a client sends a request message to the CLM and then waits for a positive answer from the CLM. If no client holds the token, the CLM responds immediately with the token; otherwise the request is queued (see the sketch below).
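A minimal sketch of the CLM's bookkeeping; class and method names are assumptions:

from collections import deque

class CentralLockManager:
    def __init__(self):
        self.holder = None               # current token holder
        self.queue = deque()             # queued requests

    def request(self, client):
        # grant the token immediately if it is free, else queue
        if self.holder is None:
            self.holder = client
            return True
        self.queue.append(client)        # optionally answer with 'queued'
        return False

    def release(self, client):
        # pass the token on to the next queued client, if any
        assert self.holder == client
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder               # next holder to be notified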
46
Distributed Systems
Centralized Lock Manager
[Figure: three clients and the centralized lock manager with its request queue; one client is the token holder]
The server might crash! 1. A client may hold the token. 2. A client may have returned it. 3. What about queued requests?
Question: What problems might arise?
47
Distributed Systems
The queued message is optional. Benefits?
[Figure: message sequence chart between Application 1, Application 2, and the Lock Manager: send_message/receive_message pairs for requests A1 and A2, grants, and queued requests]
Note: A major drawback of a centralized lock manager is the single point of failure. Another drawback is the danger of becoming a bottleneck. The protocol delay is determined by at least two messages (request, grant).
48
Distributed Systems
Token-Passing Mutual Exclusion
There is a single token for all participants competing for a critical section. To enter a critical section an application must possess this token.
We have to invent a logical ring amongst those participants and hand the token around within this logical ring in order to guarantee that each participant will have the chance to enter the critical section.
  • The token-passing algorithm (see the sketch below):
  • before entering the critical section
  • an application must await the token
  • after the critical section each application
    sends the token
  • to the next neighboring participant
  • if no participant wants to enter the critical
    section,
  • the token continues circulating
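A toy, single-process simulation of one token circulation; node numbering and the wants_cs predicate are illustrative:

def simulate_token_ring(n_nodes, wants_cs, rounds=1):
    token_at = 0
    for _ in range(rounds * n_nodes):
        if wants_cs(token_at):
            print(f"node {token_at} enters its critical section")
        token_at = (token_at + 1) % n_nodes   # hand token to the neighbor

# nodes 1 and 3 want the CS; the token visits everyone once
simulate_token_ring(4, wants_cs=lambda i: i in (1, 3))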

49
Distributed Systems
[Figure: standard token algorithm on a logical ring; one node is the current lock holder]
50
Distributed Systems
Analysis of the Token Based Exclusion
Check the list of requirements: 1. Safety: yes, due to the unique token only the token holder may enter its CS. 2. Liveness: yes, as long as the logical ring has only a finite number of nodes. 3. Sequence order: no, the order of entering the CS may differ from the order of the requests (no FCFS guarantee). 4. Fault tolerance: no, if the logical ring splits, the token may be lost.
51
Distributed Systems
Problems with the Token-Passing Mutual Exclusion
1. How do you determine if the token is lost or is just being used for a very long time?
2. What happens if the node that holds the token crashes for an extended period of time?
3. How to maintain a logical ring if a participant drops out of the system (voluntarily or by failure)?
4. How to identify and add new participants joining the logical ring, respectively remove old ones?
5. The token is perpetually passed around the logical ring even though none of the participants wants to enter its CS => unnecessary overhead
52
Distributed Systems
Implementation Problems
53
Distributed Systems
Implementation Problems
54
Distributed Systems
Implementation Problems
Question: What may happen if you always try to give the token to the next neighboring node? If that participant does not wait for it => poor performance!
55
Distributed Systems
Implementation Problems
Question: How to solve this problem as a system architect if we do not want to change the philosophy of the standard token algorithm?
56
Distributed Systems
Implementation Solution
Invest another TokenHandler thread per application and critical section.
[Flowchart: the participant on node i+1 issues a Local_Request and waits for the token for CrS_1; the TokenHandler on node i+1 receives the token from node i and checks (non-blocking option) whether a Local_Request is pending. If yes, the participant executes Critical Section_1 and sends a Local_Release, after which the TokenHandler sends the token on to node i+2; if no, the token is forwarded immediately.]
57
Distributed Systems
Example: Perpetual Passing of the Token
[Figure: the token circulates across nodes i, j, and k although nodes j and k have no need for the token]
Exercise 1: Invent a better token-based solution avoiding the overhead of perpetual token passing! Hint: You have to know who really wants to get the token!
58
Distributed Systems
Distributed Lock Manager
  • Though similar to the centralized solution
  • there are additional problems to solve
  • Who sends messages when and to whom?
  • Who receives messages when and from whom?
  • Which messages are necessary to enter a critical
    section?

59
Distributed Systems
Distributed Lock Manager
  • Three message types (2 are required, 1 is
    optional)
  • Request_Message
  • Queued_Message
  • Granted_Message

60
Distributed Systems
Request Message
The application wishing to enter its critical section sends this message to all applications (threads) competing for this critical section. How?
  • Either n times individually or via a multicast
    (see later slides).
  • Each request message contains a timestamp from
    the source.

61
Distributed Systems
Queued Message
This message is optional; it is sent by a recipient of the request message whenever the request cannot be granted immediately, i.e.
  • the recipient is currently in the critical section,
    or
  • the recipient had initiated an earlier request

Remark: This message type makes it easier to find out whether there are dead participants.
62
Distributed Systems
Granted Message
Sent to a requesting process by all participants in two circumstances:
  • the recipient is not in its critical section
  • and has no earlier request
  • if the recipient has a queued request, it will send
    the grant
  • upon leaving the critical section
63
Distributed Systems
Release Message
After having released the resource, this message is sent to all participants with a queued request message.
Remark: Have a closer look at both algorithms in Stallings, p. 603-606:
1. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System, Comm. ACM, July 1978
2. Ricart/Agrawala: An Optimal Algorithm for Mutual Exclusion in Computer Networks, Comm. ACM, January and September 1981
64
Distributed Systems
Ricart/Agrawala-Algorithm
[State diagram: computation outside the critical section -> requesting mutual exclusion -> waiting for entrance into the critical section -> critical section -> activating others]
65
Distributed Systems
Closer Look on Ricart/Agrawala-Algorithm (1981)
  • No tokens anymore
  • Cooperative voting to determine the intended
    sequence of CSs
  • Does not rely on an interconnection medium
    offering ordered messages
  • Serialization based on logical time stamps
    (total ordering)
  • If a participant wants to enter its CS, it asks
    all others for permission
  • and does not proceed until it has the permission of
    all other participants
  • If a participant gets a permission request and
    is not interested in its own CS,
  • it returns permission immediately to the
    requester.

66
Distributed Systems
Correctness Conditions (1)
  • All nodes behave identically, thus we just
    regard node x
  • After voting, three groups of requests may be
    distinguished:
  • 1. known at node x with a time stamp less than Cx
  • 2. known at node x with a time stamp greater
    than Cx
  • 3. those still unknown at node x

67
Distributed Systems
Correctness Conditions (2)
During this voting process time stamps may change according to the following conditions.
Condition 1: Requests of group 1 have to be served, or they have to take a time stamp greater than Cx.
Condition 2: Requests of group 2 may not get a time stamp smaller than Cx.
Condition 3: Requests of group 3 must have time stamps greater than Cx.
68
Distributed Systems
Two Phases of the Voting Algorithm
1. Participants at node i willing to enter their critical section send request messages ei to all other participants, where ei contains the actual Lamport time Li of node i. (After each send, node i increments its counter Ci.)
2. All other participants return permission messages ai. Node x replies to a request message ei as soon as all older requests (those received at earlier Lamport times) are completed; it may delay the answer a bit. On receipt of ei: Cx = max{Cx, Ci} + 1.
Result: If all permission messages have arrived at node i, the corresponding requester may enter its critical section (see the sketch below).
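A condensed per-node sketch of the Ricart-Agrawala rules; the class layout and method names are assumptions, and message transport is left out:

class RANode:
    def __init__(self, node_id):
        self.id = node_id
        self.clock = 0                   # Lamport counter Ci
        self.requesting = False
        self.my_stamp = None
        self.deferred = []               # requests we answer after our CS

    def request_cs(self):
        self.clock += 1
        self.requesting = True
        self.my_stamp = (self.clock, self.id)
        return self.my_stamp             # broadcast to all other nodes

    def on_request(self, stamp):
        # total order on (Lamport time, node id) decides who goes first
        self.clock = max(self.clock, stamp[0]) + 1
        if self.requesting and self.my_stamp < stamp:
            self.deferred.append(stamp)  # our request is older: defer
            return False                 # no permission yet
        return True                      # grant permission immediately

    def release_cs(self):
        self.requesting = False
        granted, self.deferred = self.deferred, []
        return granted                   # now grant all deferred requests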
69
Distributed Systems
Example of the Voting Algorithm
[Figure: nodes i, j, and k exchange request messages Mi and Mk]
Suppose Mi < Mk, i.e. the request message Mi has a smaller time stamp than Mk; then we have to delay the answer to the request message ek in node i!
70
Distributed Systems
Comparison between Mutual Exclusion Algorithms
[Table: comparison of the mutual exclusion algorithms; T = message transfer time, E = execution time of the CS]
71
Distributed Systems
Election Algorithms
Suppose your centralized lock manager crashes for a longer period of time. Then you need a new one, i.e. you have to elect a new one. How to do that in a DS?
  • The 2 major election algorithms are based upon:
  • each node has a unique node number
  • (i.e. there is a total ordering of all nodes)
  • the node with the highest number of all active nodes is
    coordinator
  • after a crash, a restarting node is put back into
    the set
  • of active nodes

72
Distributed Systems
Bully Algorithm (Garcia-Molina, 1982)
Goal: Find the active node with the highest number, tell it to be the coordinator, and tell all other nodes too.
Start: The algorithm may start at any node, e.g. at a node recognizing that the previous coordinator is no longer active.
  • Message types:
  • Election messages e, initiating the election
  • Answer messages a, confirming the reception of an
    e-message
  • Coordinator messages c, telling that the sender is
    the new coordinator

73
Distributed Systems
Steps of Bully Algorithm
1. Some node Ni sends e-messages to all other nodes Nj, j > i.
2. If there is no answer within a time limit T, Ni elects itself as coordinator, sending this information via a c-message to all other nodes Nj, j < i.
3. If Ni got an a-message within T (i.e. there is an active node with a higher number), it awaits another time limit T; it restarts the whole algorithm if there is no c-message within T.
4. If Nj receives an e-message from Ni, it answers with an a-message to Ni and starts the algorithm itself (step 1).
5. If a node N, after having crashed and been restarted, is active again, it starts step 1.
6. The node with the highest number establishes itself as coordinator (see the sketch below).
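A toy, failure-free rendering of the election logic; the recursion mirrors step 4, where every contacted higher node takes over:

def bully_election(active_nodes, initiator):
    # step 1: ask all higher-numbered active nodes
    higher = [n for n in active_nodes if n > initiator]
    if not higher:
        return initiator                 # step 2: silence -> coordinator
    # step 4: each higher node restarts the election; the highest wins
    return bully_election(active_nodes, max(higher))

# nodes {1, 2, 4, 5} are active; node 2 notices the coordinator is gone
print(bully_election({1, 2, 4, 5}, initiator=2))   # -> 5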
74
Distributed Systems
Example Bully Algorithm
Nodes 3 and 4 have to start the algorithm due to their higher numbers.
75
Distributed Systems
Ring Algorithm (Le Lann, 1977)
  • Each node is part of one logical ring
  • Each node knows that logical ring, i.e. its
    immediate successor as well
  • as all further successors.
  • 2 types of messages are used:
  • an election message e to elect the new coordinator
  • a coordinator message c to introduce the
    coordinator to the nodes
  • The algorithm is initiated by some node Ni
    detecting
  • that the coordinator no longer works
  • This initiating node Ni sends an e-message with
    its node number i
  • to its immediate successor Ni+1
  • If this immediate successor Ni+1 does not
    answer, it is assumed that
  • Ni+1 has crashed, and the e-message is sent to
    Ni+2

76
Distributed Systems
Ring Algorithm (Le Lann, 1977)
  • If node Ni receives an e- or c-message, it
    contains a list of node numbers
  • If an e-message does not contain its node number
    i, Ni adds its node number
  • and sends this e-message to Ni+1
  • If an e-message contains its node number i, this
    e-message has circled
  • once around the ring of all active nodes
  • If it is a c-message, Ni keeps in mind the node with
    the highest number in that list
  • as being the new coordinator
  • If a c-message has circled once around the
    logical ring, it is deleted
  • After having restarted a crashed node, you can
    use an inquiry message
  • circling once around the logical ring
  • (a sketch of one election round follows below)
77
Distributed Systems
Ring Algorithm (Le Lann, 1977)
78
Distributed Systems
Ring Algorithm (Le Lann, 1977)
79
Distributed Systems
Ring Algorithm (Le Lann, 1977)
This coordinator message circles once around the logical ring.
80
Distributed Systems
Comparison of both Election Algorithms
81
Distributed Systems
Deadlocks in Distributed Systems
  • Prevention (sometimes)
  • Avoidance (far too complicated and
    time-consuming)
  • Ignoring (often done in practice)
  • Detecting (sometimes really needed)

82
Distributed Systems
Deadlocks in Distributed Systems
  • In DS a distinction is made between:
  • Resource deadlock: processes are stuck waiting
    for resources
  • held by each other
  • Communication deadlock: processes are stuck waiting
    for messages
  • from each other while no messages are in transit

83
Distributed Systems
Distributed Deadlocks
  • Using locks within transactions may lead to
    deadlocks

A deadlock has occurred if the global waiting
graph contains a cycle.
84
Distributed Systems
Deadlock Prevention in Distributed Systems
1. Only allow single resource holding (=> no cycles)
2. Preallocation of resources (=> low resource efficiency)
3. Forced release before a new request
4. Acquire in order (quite a cumbersome task to number all resources in a DS)
5. Seniority rules: each application gets a timestamp; if a senior application requests a resource held by a junior, the senior wins.
85
Distributed Systems
Deadlock Avoidance in Distributed Systems
Deadlock avoidance in DS is impractical because:
1. Every node must keep track of the global state of the DS => substantial storage and communication overhead
2. Checking a global state for safety must be done under mutual exclusion
3. Checking for safe states requires substantial processing and communication overhead if there are many processes and resources
86
Distributed Systems
Deadlock Detection in Distributed Systems
Added problem: If there is a deadlock, in general resources from different nodes are involved. Several approaches:
1. Centralized control
2. Hierarchical control
3. Distributed control
In any case: Deadlocks must be detected within a finite amount of time.
87
Distributed Systems
Deadlock Detection in Distributed Systems
  • Correctness of a waiting-graph scheme depends on:
  • progress
  • safety

88
Distributed Systems
Deadlock Detection in Distributed Systems
  • General remarks:
  • Deadlocks must be detected within a finite
    amount of time
  • Message delays and out-of-date data may cause
    false cycles
  • to be detected (phantom deadlocks)
  • After a possible deadlock has been detected,
  • one may need to double-check that it is a real
    one!

89
Distributed Systems
Deadlock Detection in DS Centralized Control
  • local and global deadlock detectors (LDD and GDD)
  • (if an LDD detects a local deadlock, it resolves
    it locally!)
  • The GDD gets status information from the LDDs
  • on waiting-graph updates,
  • periodically, or
  • on each request
  • (if the GDD detects a deadlock involving
    resources at two or more nodes,
  • it resolves this deadlock globally!)

90
Distributed Systems
Deadlock Detection in DS Centralized Control
  • Major drawbacks:
  • The node hosting the GDD is a single point of
    failure
  • Phantom deadlocks may arise because
  • the global waiting graph is not up to date

91
Distributed Systems
Deadlock Detection in DS Hierarchical Control
  • hierarchy of deadlock detectors (controllers)
  • each maintains a waiting graph (the union of the waiting
    graphs of its children)
  • deadlocks are resolved at the lowest level possible

92
Distributed Systems
Deadlock Detection in DS Hierarchical Control
Each node in the tree (except a leaf node) keeps track of the resource allocation information of itself and of all its successors =>
a deadlock that involves a set of resources will be detected by the node that is the common ancestor of all nodes whose resources are among the objects in conflict.
93
Distributed Systems
Distributed Deadlock Detection in DS (Obermark,
1982)
  • no global waiting graph
  • deadlock detection cycle:
  • wait for information from other nodes
  • combine it with the local waiting information
  • break cycles, if detected
  • share information on potential global cycles

Remark: The non-local portion of the global waiting graph is represented by an abstract node ex.
94
Distributed Systems
Distributed Deadlock Detection in DS (Obermark,
1982)
[Figure: waiting graph on node x including the abstract node ex; a cycle through ex is a potential, but not yet a local, deadlock. Already a deadlock?]
95
Distributed Systems
Distributed Deadlock Detection in DS
(Chandy/Misra/Haas 1983)
  • a probe message (i, j, k) is sent whenever a
    process blocks
  • this probe message is sent along the edges of
    the waiting graph
  • if the recipient is waiting for a resource
  • if this probe message comes back to the initiating
    process,
  • then there is a deadlock

96
Distributed Systems
Distributed Deadlock Detection in DS
(Chandy/Misra/Haas)
  • If a process P has to wait for a resource R, it
    sends a message
  • to the owner O of that resource.
  • This message contains:
  • the PID of the waiting process P
  • the PID of the sending process S
  • the PID of the receiving process E
  • The receiving process E checks whether E is also
    waiting. If so,
  • it modifies the message:
  • the first component of the message still holds P
  • the second component is changed to PID(E)
  • the third component is changed to the PID of the
    process
  • that process E is waiting for.
  • If the message ever reaches the waiting process
    P, then there is a deadlock (see the sketch below).
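A simplified sketch of probe propagation, assuming each blocked process waits for exactly one other process (the full algorithm also handles multiple outgoing wait edges):

def has_deadlock(wait_for, initiator):
    # wait_for: maps a blocked process to the process it waits for
    current = initiator
    seen = set()
    while current in wait_for:
        nxt = wait_for[current]          # probe (initiator, current, nxt)
        if nxt == initiator:
            return True                  # probe returned: deadlock
        if nxt in seen:
            return False                 # cycle not involving the initiator
        seen.add(nxt)
        current = nxt
    return False                         # reached a running process

# P0 waits for P1, P1 for P2, P2 for P0 => deadlock
print(has_deadlock({0: 1, 1: 2, 2: 0}, initiator=0))   # -> True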

97
Distributed Systems
Example of Distributed Deadlock Detection in DS
[Figure: waiting graph P0 -> P1 -> P2 -> ... spanning several nodes, with processes P3-P8; probe messages (0,1,2), (0,4,6), (0,5,7) travel along the edges until probe (0,8,0) reaches P0 again, indicating a deadlock]
98
Distributed Systems
Deadlock Detection in DS Distributed Control
Recommended Reading:
Knapp, E.: Deadlock Detection in Distributed Databases, ACM Computing Surveys, 1987
Sinha, P.: Distributed Operating Systems: Concepts and Design, IEEE Computer Society, 1996
Galli, D.: Distributed Operating Systems: Concepts and Practice, Prentice Hall, 2000
99
Distributed Systems
Deadlocks in Message Communication
1. Deadlocks may occur if each member of a specific group is waiting for a message from another member of the same group.
2. Deadlocks may occur due to the unavailability of message buffers etc.
Study for yourself: Read Stallings, Chapter 14.4, p. 615 ff.
100
Distributed Systems
Multicast Paradigm
[Figure: multicast of messages a, b, c, d from sending processes P to the members of a process group]
  • Ordering (unordered, FIFO, Causal, Agreed)
  • Delivery guarantees (unreliable, reliable,
    safe/stable)
  • Open groups versus closed groups
  • Failure model (omission, fail-stop,
    crash-recovery, network partitions)
  • Multiple groups


101
Distributed Systems
Traditional Protocols for Multicast
  • Example TCP/IP: a point-to-point interconnection
  • Automatic flow control
  • Reliable delivery
  • Connection service
  • Complexity O(n^2)
  • Linear degradation in performance

Remark More on Linux-Multicast see
www.cs.washington.edu/esler/multicast/
102
Distributed Systems
Traditional Protocols for Multicast
  • Example: Unreliable broadcast/multicast (UDP,
    IP multicast)
  • Employs hardware support for broadcast and
    multicast
  • Message losses: 0.01% at normal load, more than
    30% at high load
  • Buffer overflows (in the network and in the OS)
  • Interrupt misses
  • No connection service

103
Distributed Systems
IP-Multicast
  • Multicast extension to IP
  • Best-effort multicast service
  • No accurate membership
  • Class D addresses are reserved for multicast
  • 224.0.0.0 to 239.255.255.255 are used as group
    addresses
  • The standard defines how hardware Ethernet
    multicast addresses
  • can be used where available

104
Distributed Systems
IP-Multicast Logical Design
105
Distributed Systems
IP Multicast
  • Extensions to IP inside a host:
  • A host may send an IP multicast
  • by using a multicast address as the destination address
  • A host manages a table of groups and the local
  • application processes that belong to each group
  • When a multicast message arrives at the host, it
    delivers
  • copies of it to all of the local processes that
  • belong to that group
  • A host acts as a member of a group only if it
    has at least
  • one active process that joined that group

106
Distributed Systems
IP Multicast Group Management
  • Extensions to IP within one subnet (IGMP):
  • A multicast router periodically sends queries to
    all hosts
  • participating in IP multicast on the special
    224.0.0.1 all-hosts group
  • Each relevant host sets a random timer for each
    group it is a member of.
  • When the timer expires, it sends a report
    message on that group's multicast address.
  • Each host that sees a report message for a group
  • cancels its local timer for that group
  • When a host joins a group, it announces that on
    the group multicast address

Remark: We have to skip further interesting topics like backbones, multicast routing, and reliable multicast services (see other specialized lectures).