Chapter 18: Distributed Process Management

1
Chapter 18: Distributed Process Management
  • CS 472 Operating Systems
  • Indiana University Purdue University Fort Wayne

2
Distributed Process Management
  • Note: Chapter 18 is an online chapter and does
    not appear in the textbook
  • Available under Online Chapters at
  • WilliamStallings.com/OS/OSe6.html
  • Be aware that the URL is case sensitive
  • A .pdf of this chapter should also be available
    on the class web site under Resources

3
Distributed Process Management
  • This chapter concerns three issues in the design
    of a distributed OS:
  • Process migration
  • Global state of a distributed system
  • Distributed mutual exclusion

4
Process migration
  • A sufficient amount of the state of a process
    must be transferred from one computer to another
    for the process to execute on the target machine
  • Goals
  • Load sharing
  • Efficient interaction with other processes and
    data
  • Access to special resources
  • Survival

5
Process migration goals
  • Load sharing
  • Move processes from heavily loaded to lightly
    loaded systems
  • OS typically initiates the migration for load
    sharing
  • Communications performance
  • Processes that interact intensively can be moved
    to the same node to reduce communications cost
  • May be better to move process to where the data
    reside when the data set is large
  • The process itself typically initiates this
    migration

6
Process migration goals
  • Utilizing special capabilities
  • Process can migrate to take advantage of unique
    hardware or software capabilities
  • Availability (survival)
  • Long-running process may need to move because the
    machine it is running on will be down

7
To migrate process P from A to B . . .
  • Destroy the process on A and create it on B
  • Move at least the PCB
  • Update any links between P and other processes
    and data . . .
  • Including any outstanding messages and signals,
    open files, etc.

8
(Figure slide; no transcript available.)
9
Migration of process P from A to B
  • Entire address space can be moved or pieces
    transferred on demand
  • Transfer strategies (assuming paged virtual
    memory)
  • Eager (all)
  • Precopy
  • Eager (dirty)
  • Copy-on-reference
  • Flushing

10
Transfer strategies
  • Eager (all): Transfer the entire address space
  • No trace of the process is left behind
  • If the address space is large and the process does
    not need most of it, this approach may be
    unnecessarily expensive
  • Precopy: The process continues to execute on the
    source node while the address space is copied
  • Pages modified on the source during the precopy
    operation have to be copied a second time
  • Reduces the time that a process is frozen and
    cannot execute during migration

11
Transfer strategies
  • Eager (dirty): Transfer only the modified pages in
    main memory
  • Any additional blocks of the virtual address
    space are transferred on demand from disk
  • The source machine is involved throughout the
    life of the process
  • Copy-on-reference: Transfer pages only when
    referenced
  • Has the lowest initial cost of process migration
  • Flushing: Pages are cleared from main memory by
    flushing dirty pages to disk
  • Relieves the source of holding any pages of the
    migrated process in main memory
  • Needed pages are subsequently loaded from disk
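
A minimal sketch, with hypothetical Page and strategy-function names,
of how the strategies above differ in which pages they ship at
migration time (precopy and copy-on-reference are omitted):

    from dataclasses import dataclass

    @dataclass
    class Page:
        number: int
        in_memory: bool   # resident in main memory on the source?
        dirty: bool       # modified since last written to disk?

    def eager_all(pages):
        # Eager (all): ship the entire address space up front.
        return [p.number for p in pages]

    def eager_dirty(pages):
        # Eager (dirty): ship only modified resident pages; the rest
        # are later demand-fetched from disk via the source.
        return [p.number for p in pages if p.in_memory and p.dirty]

    def flushing(pages):
        # Flushing: write dirty pages back to disk first; nothing is
        # shipped directly, and the target faults pages in from disk.
        for p in pages:
            p.dirty = False          # stands in for a disk write-back
        return []

    space = [Page(0, True, True), Page(1, True, False), Page(2, False, False)]
    print(eager_all(space))      # [0, 1, 2]: every page
    print(eager_dirty(space))    # [0]: only the dirty resident page
    print(flushing(space))       # []: all pages now come from disk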

12
Initiation of migration can be made . . .
  • by a load-balancing process
  • by the process itself
  • for communications performance
  • for survival
  • to access special resources
  • by the foreign system (eviction)

13
Global state of a distributed system
  • Difficult concept to understand
  • Global state of a distributed system needs to be
    known for mutual exclusion, avoiding deadlock,
    etc.
  • The operating system cannot know the current state
    of all processes in the distributed system

14
Global state of a distributed system
  • A node can only know the current state of all
    local processes and earlier states of remote
    processes
  • States of remote processes are known only through
    messages
  • Even the exact times of remote states cannot be
    known
  • It is impossible to synchronize clocks of nodes
    accurately enough to be of use

15
Example
  • A bank is distributed over two branches
  • To close a checking account at a bank, the
    account balance (global state of account) needs
    to be known
  • Deposits may not have cleared
  • Fund transfers may be pending
  • Checks may not have been cashed
  • Ask all correspondents to state pending activity
  • Close the account when all reply
  • Situation is analogous to determining the global
    state of a system

16
Example
  • At 3 PM the account balance is to be determined
  • Messages are exchanged for needed information
  • A snapshot is established for each branch as of
    3 PM

17
Example
  • Suppose that at the time of balance determination,
    a fund transfer message is in transit from branch
    A to branch B
  • The amount in transit is counted at neither
    branch, so the resulting balance is falsely low

18
Example
  • To correct the balance, all messages in transit
    at the time of observation must be examined
  • Total consists of balance at both branches and
    amount in the messages

19
Example
  • Suppose the clocks at the two branches are not
    perfectly synchronized
  • A transfer leaves branch A at 3:01 by A's clock
  • The amount arrives at branch B at 2:59 by B's clock
  • In a snapshot taken at 3:00, the amount is counted
    twice, once at each branch

20
Terminology
  • Channel
  • Exists between two processes if they exchange
    messages
  • State
  • Sequence of messages that have been sent and
    received along channels incident with the process
  • Snapshot of a process
  • Current local state of the process . . .
  • together with the state as defined above
  • Global state
  • The combined snapshots of all processes

21
Problem
  • Process P gathers snapshots from the other
    processes and determines a global state
  • Process Q does the same
  • The two global states as determined by P and Q
    may be different
  • Solution: Settle for consistent global states
  • Global states are consistent if . . .
  • for each message received, the snapshot of the
    sender indicates that the message was sent
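
A minimal sketch of this consistency test, with assumed snapshot
field names (sent/received) that are not from the chapter: a global
state passes if every message recorded as received also appears as
sent in the sender's snapshot.

    def consistent(snapshots):
        # snapshots: process -> {'sent': {(dst, msg)}, 'received': {(src, msg)}}
        for proc, snap in snapshots.items():
            for src, msg in snap['received']:
                if (proc, msg) not in snapshots[src]['sent']:
                    return False   # received but never recorded as sent
        return True

    # P's snapshot shows m1 sent to Q and Q shows it received: consistent.
    good = {'P': {'sent': {('Q', 'm1')}, 'received': set()},
            'Q': {'sent': set(), 'received': {('P', 'm1')}}}
    # Q records m2 as received, but P's snapshot never sent it.
    bad  = {'P': {'sent': set(), 'received': set()},
            'Q': {'sent': set(), 'received': {('P', 'm2')}}}
    print(consistent(good), consistent(bad))   # True False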

22
Inconsistent global state
23
Consistent global state
24
Distributed snapshot algorithm
  • Assumes that all messages are delivered in the
    order sent and no messages are lost (e.g., TCP)
  • Special control message called a marker is used
  • Any process may initiate the algorithm by
  • recording its state
  • sending out the marker on all outgoing channels
    before any other messages are sent

25
(Figure slide; no transcript available.)
26
Distributed snapshot algorithm
  • Let P be any participating process
  • Upon first receipt of the marker (say from
    process Q) process P does the following
  • P records its local state SP
  • P records the state of the incoming channel from
    Q to P as empty
  • P propagates the marker to all its neighbors
    along all outgoing channels
  • These three steps must be performed atomically
    without any other messages sent or received

27
Distributed snapshot algorithm
  • Later, when P receives a marker from another
    incoming channel (say, from process R) . . .
  • P records the state of the channel from R to P as
    the sequence of messages P has received from R
    from the time P recorded its local state SP to
    the time it received the marker from R
  • The algorithm terminates at process P once the
    marker has been received along every incoming
    channel
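
A minimal single-process sketch of the marker rules above. The class
and callback names are assumptions, and message transport is left to
the caller; this is not the textbook's own code.

    class SnapshotProcess:
        def __init__(self, name, incoming, outgoing, send, capture):
            self.name = name
            self.incoming = set(incoming)   # upstream process names
            self.outgoing = list(outgoing)  # downstream process names
            self.send = send                # send(dst, msg) callback
            self.capture = capture          # returns the local state
            self.local_state = None
            self.channel_state = {}         # src -> recorded messages
            self.recording = set()          # channels not yet closed

        def start_snapshot(self):
            # Any process may initiate: record the local state, then
            # send the marker on every outgoing channel before
            # sending any other message.
            self.local_state = self.capture()
            self.recording = set(self.incoming)
            for dst in self.outgoing:
                self.send(dst, ('MARKER', self.name))

        def on_message(self, src, msg):
            if msg[0] == 'MARKER':
                if self.local_state is None:
                    # First marker: record our state, record the
                    # channel from src as empty, and propagate the
                    # marker (atomically in this sequential model).
                    self.local_state = self.capture()
                    self.channel_state[src] = []
                    self.recording = self.incoming - {src}
                    for dst in self.outgoing:
                        self.send(dst, ('MARKER', self.name))
                else:
                    # Later marker: stop recording src's channel.
                    self.recording.discard(src)
                # The algorithm terminates here once self.recording
                # is empty: a marker arrived on every incoming channel.
            elif src in self.recording:
                # Ordinary message between our state recording and
                # src's marker: part of that channel's recorded state.
                self.channel_state.setdefault(src, []).append(msg)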

28
Distributed snapshot algorithm
  • Once the algorithm has terminated at all
    processes, the consistent global state can be
    assembled at any node
  • Any node wanting a consistent global state asks
    every other node to send it the state data
    recorded at that node

29
Distributed snapshot algorithm
  • The algorithm succeeds even if several nodes
    independently decide to initiate the algorithm
  • The algorithm is not affected by any other
    distributed algorithm the processes are
    executing
  • Algorithm terminates in a finite amount of time
  • Algorithm can be used to adapt any centralized
    algorithm to a distributed environment

30
Distributed mutual exclusion
  • Recall that shared memory and semaphores cannot
    be used to enforce mutual exclusion across the
    nodes of a distributed system
  • Instead, any mechanism must depend on the
    exchange of messages
  • Algorithms for mutual exclusion may be
  • Centralized
  • Distributed

31
Centralized mutual exclusion algorithm
  • Algorithm is straightforward
  • One node is designated as the control node
  • This node controls access to all shared objects
  • Only the control node makes resource-allocation
    decisions
  • Uses Request, Permission, and Release messages
  • The control node may be a bottleneck
  • Failure of the control node causes a breakdown of
    mutual exclusion
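
A minimal sketch of the control node's side, using the Request,
Permission, and Release messages named above; the class name, the
send callback, and the FIFO wait queue are assumptions.

    from collections import deque

    class ControlNode:
        def __init__(self, send):
            self.send = send        # send(node, message) callback
            self.holder = None      # node currently holding the resource
            self.waiting = deque()  # queued requesters, FIFO

        def on_request(self, node):
            if self.holder is None:
                self.holder = node
                self.send(node, 'Permission')
            else:
                self.waiting.append(node)   # wait for a Release

        def on_release(self, node):
            assert node == self.holder
            self.holder = self.waiting.popleft() if self.waiting else None
            if self.holder is not None:
                self.send(self.holder, 'Permission')

    log = []
    ctrl = ControlNode(lambda n, m: log.append((n, m)))
    ctrl.on_request('A'); ctrl.on_request('B'); ctrl.on_release('A')
    print(log)   # [('A', 'Permission'), ('B', 'Permission')]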

32
Distributed mutual exclusion algorithm
  • Each node has only a partial picture of the total
    system and must make decisions based on this
    information
  • All nodes bear equal responsibility for the final
    decision
  • Failure of a node, in general, does not result in
    a total system collapse
  • There is no common clock and no way to adequately
    synchronize clocks

33
Distributed mutual exclusion
  • Distributed algorithm does require a time
    ordering of events
  • For this, an event is the sending of a message
  • Did event E1 on node S1 occur before event E2 on
    node S2 ?
  • Communication delays must be overcome
  • The answer need not be correct, but all nodes
    must reach the same conclusion

34
Lamport's timestamping algorithm
  • Gives a consistent time-ordering of events in a
    distributed system
  • Each node I has a local counter CI
  • When node I sends a message, it first increments
    CI by 1
  • Messages from node I all have the form (m, TI, I),
    where
  • m is the actual message (like Request or Release)
  • I is the node number
  • TI is a copy of CI (the node's timestamp) at
    the time the message was created

35
Lamport's timestamping algorithm
  • When node J receives a message from node I, it
    updates its local CJ to 1 + max{ CJ, TI }
  • (m, TI, I) precedes (m', TJ, J) . . .
  • if TI < TJ
  • or if TI = TJ and I < J
  • For this to work, each message must be sent to
    all other nodes (see the sketch below)
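
A minimal sketch of these rules (the class name is an assumption):
increment before sending, take 1 + max on receipt, and break
timestamp ties with the node number.

    class LamportNode:
        def __init__(self, node_id):
            self.node_id = node_id
            self.clock = 0                 # local counter CI

        def send(self, m):
            self.clock += 1                # increment CI before sending
            return (m, self.clock, self.node_id)    # (m, TI, I)

        def receive(self, message):
            _, t_i, _ = message
            self.clock = 1 + max(self.clock, t_i)   # CJ = 1 + max{CJ, TI}

    def precedes(msg_a, msg_b):
        # (m, TI, I) precedes (m', TJ, J) if TI < TJ,
        # or if TI = TJ and I < J.
        _, t_a, i_a = msg_a
        _, t_b, i_b = msg_b
        return (t_a, i_a) < (t_b, i_b)

    a, b = LamportNode(1), LamportNode(2)
    m1 = a.send('Request')
    b.receive(m1)
    m2 = b.send('Release')
    print(precedes(m1, m2))   # True: the ordering respects causality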

36
Example
  • (a,1,1) < (x,3,2) < (b,5,1) < (j,5,3)

37
Example
  • (a,1,1) < (q,1,4)

38
Distributed mutual exclusion using a distributed
queue
  • The queue is just an array with one entry for
    each node
  • Requests for resources are granted FIFO, based on
    timestamped request messages
  • All nodes maintain a copy of the queue
  • Each node keeps the most recent message from each
    of the other nodes in the queue

39
Distributed mutual exclusion using a distributed
queue
  • All nodes agree on an order for the messages
    within the queue if no messages are in transit
  • The transit problem is overcome by the
    distributed queue algorithm (First Version)
  • Summary on the next slide
  • 3(N-1) messages are involved per request
  • Version Two is more efficient: 2(N-1) messages

40
Summary of distributed queue algorithm
  • A timestamped resource Request message is sent to
    all other nodes
  • A copy of the Request message is also saved in
    the queue of the requesting node
  • If it has not itself made a request, each node
    receiving a request sends a Reply message back to
    the sender
  • This ensures that no earlier Request message is
    in transit when the requester makes its decision
  • A process may access the requested resource when
    its request is the earliest message in its queue
  • After acting on a resource request, a node sends
    a Release message to all other nodes and puts a
    copy in its own queue
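
A minimal sketch of one node's side of this summary. The class name,
the send callback, and the initial all-Release queue contents are
assumptions; delivery of messages between nodes is left to the caller.

    class QueueNode:
        def __init__(self, node_id, all_ids, send):
            self.id = node_id
            self.clock = 0
            self.send = send   # send(dst, msg) callback
            # the "queue": most recent message seen from each node
            self.queue = {i: ('Release', 0, i) for i in all_ids}

        def stamp(self, kind):
            self.clock += 1
            return (kind, self.clock, self.id)

        def broadcast(self, kind):
            msg = self.stamp(kind)
            self.queue[self.id] = msg         # keep our own copy
            for i in self.queue:
                if i != self.id:
                    self.send(i, msg)

        def request(self):
            self.broadcast('Request')

        def release(self):
            self.broadcast('Release')

        def on_message(self, msg):
            kind, t, src = msg
            self.clock = 1 + max(self.clock, t)   # Lamport update
            self.queue[src] = msg
            if kind == 'Request' and self.queue[self.id][0] != 'Request':
                # No pending request of our own: send a Reply so the
                # requester knows nothing earlier of ours is in transit.
                self.send(src, self.stamp('Reply'))

        def may_enter(self):
            # Enter when our Request is the earliest message in the queue.
            mine = self.queue[self.id]
            return mine[0] == 'Request' and all(
                (m[1], m[2]) > (mine[1], mine[2])
                for i, m in self.queue.items() if i != self.id)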

41
(Figure: the four nodes' local queues as node 2 requests entry to a
critical section. Q = reQuest, P = rePly, L = reLease. What if node 4
made an earlier request that is still in transit?)
42
Token-passing algorithm for distributed mutual
exclusion
  • Two arrays are used
  • Token array
  • Passed from node to node
  • The kth position contains the timestamp from the
    last time the token visited node k
  • Request array
  • Maintained by each node
  • The jth position contains the timestamp of the
    last Request message received from node j

43
Token-passing algorithm
  • Send request to all other nodes
  • Wait for the token
  • Release the resource by sending the token to some
    node requesting the resource
  • Choose the first requesting node K whose Request
    message has a timestamp > its timestamp in the
    token
  • That is, request(K) > token(K)

44
(Figure: token-passing example in which node 2 requests entry to a
critical section while node 3 holds the token. Each node's request
array and the token array are shown. Q = reQuest; T = time of the
token's last visit to each node.)
45
Token-passing algorithm
  • See full algorithm in Figure 18.11
  • N messages are needed per resource request
  • Choice of next requesting node is not FIFO
  • However, no starvation
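
A minimal sketch of the release step: scan the other nodes (here in
circular order starting after our own number, an assumption) and hand
the token to the first node K with request(K) > token(K).

    def next_token_holder(my_id, request, token, n):
        # request[k]: timestamp of the last Request seen from node k
        # token[k]:   timestamp of the token's last visit to node k
        for step in range(1, n + 1):
            k = (my_id + step) % n
            if request[k] > token[k]:   # node k has an unserved request
                return k
        return None                     # nobody waiting: keep the token

    request = [3, 0, 5, 2]   # last Request timestamps seen locally
    token   = [3, 0, 4, 2]   # token array carried with the token
    print(next_token_holder(0, request, token, 4))   # 2, since 5 > 4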

46
Distributed deadlock in resource allocation
  • Distributed deadlock prevention
  • Circular-wait can be denied by defining a linear
    ordering of resource types
  • Hold-and-wait condition can be denied by
    requiring that a process request all of its
    required resources at one time
  • The process is blocked until all requests can be
    granted simultaneously
  • Resource requirements need to be known in advance
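
A minimal sketch of denying circular wait: resource types get a fixed
linear order and every process acquires them in that order. The
resource names and the thread locks standing in for remote resources
are illustrative assumptions.

    import threading

    ORDER = {'disk': 0, 'printer': 1, 'tape': 2}   # global linear order
    locks = {name: threading.Lock() for name in ORDER}

    def acquire_in_order(names):
        # Acquiring in ascending global order means no cycle of
        # processes waiting on one another can ever form.
        ordered = sorted(names, key=ORDER.__getitem__)
        for name in ordered:
            locks[name].acquire()
        return ordered

    def release_all(names):
        for name in reversed(names):
            locks[name].release()

    held = acquire_in_order(['printer', 'disk'])   # disk, then printer
    release_all(held)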

47
Distributed deadlock
  • Distributed deadlock avoidance is impractical
  • Every node must keep track of the global state of
    the system
  • The process of checking for a safe global state
    must be done under mutual exclusion
  • Otherwise two nodes, each considering a different
    request, could erroneously honor both requests
    when only one is safe
  • Checking for safe states involves considerable
    processing overhead for a distributed system with
    a large number of processes and resources

48
Distributed deadlock
  • Distributed deadlock detection
  • Each site only knows about its own resources
  • Deadlock may involve distributed resources
  • Three possible techniques
  • Centralized control
  • Hierarchical control
  • Distributed control

49
Distributed deadlock
  • Distributed deadlock detection
  • Centralized control
  • One site is responsible for deadlock detection
  • Simple, subject to failure of central node
  • Hierarchical control
  • Sites are organized as a tree
  • Each node collects information from its children
  • A deadlock is detected at the lowest common
    ancestor of the sites involved
  • Distributed control
  • All processes cooperate in the deadlock detection
    function
  • This may have considerable overhead
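
A minimal sketch of what a centralized detector might run: merge the
wait-for edges reported by each site and search the combined graph
for a cycle. The reporting format and the depth-first search are
assumptions, not the chapter's algorithm.

    def has_deadlock(site_edges):
        # site_edges: per-site sets of edges (P, Q), meaning P waits
        # for a resource held by Q; a cycle in the merge = deadlock.
        graph = {}
        for edges in site_edges:
            for p, q in edges:
                graph.setdefault(p, set()).add(q)

        visiting, done = set(), set()

        def dfs(p):
            visiting.add(p)
            for q in graph.get(p, ()):
                if q in visiting or (q not in done and dfs(q)):
                    return True        # back edge: a wait cycle
            visiting.discard(p)
            done.add(p)
            return False

        return any(dfs(p) for p in graph if p not in done)

    site_a = {('P1', 'P2')}   # at site A, P1 waits for P2
    site_b = {('P2', 'P1')}   # at site B, P2 waits for P1
    print(has_deadlock([site_a, site_b]))   # True: a distributed cycle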

50
Deadlock in message communication
  • Mutual Waiting
  • Deadlock can exist among a group of processes
    when each process is waiting for a message from
    another process in the group and there are no
    messages in transit

(Figure: P1 is waiting for a message from either P2 or P5.)
51
Deadlock in message communication
  • Unavailability of Message Buffers
  • Well known problem in packet-switching data
    networks
  • Store-and-forward deadlock
  • Example of direct store-and-forward deadlock:
  • The buffer space at A is filled with packets
    destined for B
  • The reverse is true at B

52
Deadlock in message communication
  • Unavailability of Message Buffers
  • Indirect store-and-forward deadlock: for each
    node, the queue to the adjacent node in one
    direction is full with packets destined for the
    next node beyond