Chapter 18: Distributed Process Management

1
Chapter 18: Distributed Process Management
  • CS 472 Operating Systems
  • Indiana University Purdue University Fort Wayne

2
Distributed Process Management
  • Note: Chapter 18 is an online chapter and does
    not appear in the textbook
  • Available under Online Chapters at
  • WilliamStallings.com/OS/OSe6.html
  • Be aware that the URL is case sensitive
  • A .pdf of this chapter should also be available
    on the class web site under Resources

3
Distributed Process Management
  • This chapter concerns three issues in the design
    of a distributed OS:
  • Process migration
  • Global state of a distributed system
  • Distributed mutual exclusion

4
Process migration
  • A sufficient amount of the state of a process
    must be transferred from one computer to another
    for the process to execute on the target machine
  • Goals
  • Load sharing
  • Efficient interaction with other processes and
    data
  • Access to special resources
  • Survival

5
Process migration goals
  • Load sharing
  • Move processes from heavily loaded to lightly
    loaded systems
  • OS typically initiates the migration for load
    sharing
  • Communications performance
  • Processes that interact intensively can be moved
    to the same node to reduce communications cost
  • May be better to move process to where the data
    reside when the data set is large
  • The process itself typically initiates this
    migration

6
Process migration goals
  • Utilizing special capabilities
  • Process can migrate to take advantage of unique
    hardware or software capabilities
  • Availability (survival)
  • Long-running process may need to move because the
    machine it is running on will be down

7
To migrate process P from A to B . . .
  • Destroy the process on A and create it on B
  • Move at least the PCB
  • Update any links between P and other processes
    and data . . .
  • Including any outstanding messages and signals,
    open files, etc.

8
(Figure slide; no transcript available.)
9
Migration of process P from A to B
  • Entire address space can be moved or pieces
    transferred on demand
  • Transfer strategies (assuming paged virtual
    memory)
  • Eager (all)
  • Precopy
  • Eager (dirty)
  • Copy-on-reference
  • Flushing

10
Transfer strategies
  • Eager (all): Transfer the entire address space
  • No trace of the process is left behind
  • If the address space is large and the process does
    not need most of it, this approach may be
    unnecessarily expensive
  • Precopy: The process continues to execute on the
    source node while the address space is copied
  • Pages modified on the source during the precopy
    operation have to be copied a second time
  • Reduces the time that a process is frozen and
    cannot execute during migration

11
Transfer strategies
  • Eager (dirty): Transfer only the modified pages in
    main memory
  • Any additional blocks of the virtual address
    space are transferred on demand from disk
  • The source machine is involved throughout the
    life of the process
  • Copy-on-reference: Transfer pages only when
    referenced
  • Has the lowest initial cost of process migration
  • Flushing: Pages are cleared from main memory by
    flushing dirty pages to disk
  • Relieves the source of holding any pages of the
    migrated process in main memory
  • Needed pages are subsequently loaded from disk
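
A minimal sketch, with hypothetical Page and strategy-function names,
of how the strategies above differ in which pages they ship at
migration time (precopy and copy-on-reference are omitted):

    from dataclasses import dataclass

    @dataclass
    class Page:
        number: int
        in_memory: bool   # resident in main memory on the source?
        dirty: bool       # modified since last written to disk?

    def eager_all(pages):
        # Eager (all): ship the entire address space up front.
        return [p.number for p in pages]

    def eager_dirty(pages):
        # Eager (dirty): ship only modified resident pages; the rest
        # are later demand-fetched from disk via the source.
        return [p.number for p in pages if p.in_memory and p.dirty]

    def flushing(pages):
        # Flushing: write dirty pages back to disk first; nothing is
        # shipped directly, and the target faults pages in from disk.
        for p in pages:
            p.dirty = False          # stands in for a disk write-back
        return []

    space = [Page(0, True, True), Page(1, True, False), Page(2, False, False)]
    print(eager_all(space))      # [0, 1, 2]: every page
    print(eager_dirty(space))    # [0]: only the dirty resident page
    print(flushing(space))       # []: all pages now come from disk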

12
Initiation of migration can be made . . .
  • by a load-balancing process
  • by the process itself
  • for communications performance
  • for survival
  • to access special resources
  • by the foreign system (eviction)

13
Global state of a distributed system
  • Difficult concept to understand
  • Global state of a distributed system needs to be
    known for mutual exclusion, avoiding deadlock,
    etc.
  • The operating system cannot know the current state
    of all processes in the distributed system

14
Global state of a distributed system
  • A node can only know the current state of all
    local processes and earlier states of remote
    processes
  • States of remote processes are known only through
    messages
  • Even the exact times of remote states cannot be
    known
  • It is impossible to synchronize clocks of nodes
    accurately enough to be of use

15
Example
  • A bank is distributed over two branches
  • To close a checking account at a bank, the
    account balance (global state of account) needs
    to be known
  • Deposits may not have cleared
  • Fund transfers may be pending
  • Checks may not have been cashed
  • Ask all correspondents to state pending activity
  • Close the account when all reply
  • Situation is analogous to determining the global
    state of a system

16
Example
  • At 3 PM the account balance is to be determined
  • Messages are exchanged for needed information
  • A snapshot is established for each branch as of
    3 PM

17
Example
  • Suppose that at the time of balance determination,
    a fund transfer message is in transit from branch
    A to branch B
  • The amount in transit is counted at neither
    branch, so the resulting balance is falsely low

18
Example
  • To correct the balance, all messages in transit
    at the time of observation must be examined
  • Total consists of balance at both branches and
    amount in the messages

19
Example
  • Suppose the clocks at the two branches are not
    perfectly synchronized
  • A transfer leaves branch A at 3:01 by A's clock
  • The amount arrives at branch B at 2:59 by B's clock
  • In a snapshot taken at 3:00, the amount is counted
    twice, once at each branch

20
Terminology
  • Channel
  • Exists between two processes if they exchange
    messages
  • State
  • Sequence of messages that have been sent and
    received along channels incident with the process
  • Snapshot of a process
  • Current local state of the process . . .
  • together with the state as defined above
  • Global state
  • The combined snapshots of all processes

21
Problem
  • Process P gathers snapshots from the other
    processes and determines a global state
  • Process Q does the same
  • The two global states as determined by P and Q
    may be different
  • Solution: Settle for consistent global states
  • Global states are consistent if . . .
  • for each message received, the snapshot of the
    sender indicates that the message was sent
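
A minimal sketch of this consistency test, with assumed snapshot
field names (sent/received) that are not from the chapter: a global
state passes if every message recorded as received also appears as
sent in the sender's snapshot.

    def consistent(snapshots):
        # snapshots: process -> {'sent': {(dst, msg)}, 'received': {(src, msg)}}
        for proc, snap in snapshots.items():
            for src, msg in snap['received']:
                if (proc, msg) not in snapshots[src]['sent']:
                    return False   # received but never recorded as sent
        return True

    # P's snapshot shows m1 sent to Q and Q shows it received: consistent.
    good = {'P': {'sent': {('Q', 'm1')}, 'received': set()},
            'Q': {'sent': set(), 'received': {('P', 'm1')}}}
    # Q records m2 as received, but P's snapshot never sent it.
    bad  = {'P': {'sent': set(), 'received': set()},
            'Q': {'sent': set(), 'received': {('P', 'm2')}}}
    print(consistent(good), consistent(bad))   # True False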

22
Inconsistent global state
23
Consistent global state
24
Distributed snapshot algorithm
  • Assumes that all messages are delivered in the
    order sent and no messages are lost (e.g., TCP)
  • Special control message called a marker is used
  • Any process may initiate the algorithm by
  • recording its state
  • sending out the marker on all outgoing channels
    before any other messages are sent

25
(Figure slide; no transcript available.)
26
Distributed snapshot algorithm
  • Let P be any participating process
  • Upon first receipt of the marker (say from
    process Q) process P does the following
  • P records its local state SP
  • P records the state of the incoming channel from
    Q to P as empty
  • P propagates the marker to all its neighbors
    along all outgoing channels
  • These three steps must be performed atomically
    without any other messages sent or received

27
Distributed snapshot algorithm
  • Later, when P receives a marker from another
    incoming channel (say, from process R) . . .
  • P records the state of the channel from R to P as
    the sequence of messages P has received from R
    from the time P recorded its local state SP to
    the time it received the marker from R
  • The algorithm terminates at process P once the
    marker has been received along every incoming
    channel
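
A minimal single-process sketch of the marker rules above. The class
and callback names are assumptions, and message transport is left to
the caller; this is not the textbook's own code.

    class SnapshotProcess:
        def __init__(self, name, incoming, outgoing, send, capture):
            self.name = name
            self.incoming = set(incoming)   # upstream process names
            self.outgoing = list(outgoing)  # downstream process names
            self.send = send                # send(dst, msg) callback
            self.capture = capture          # returns the local state
            self.local_state = None
            self.channel_state = {}         # src -> recorded messages
            self.recording = set()          # channels not yet closed

        def start_snapshot(self):
            # Any process may initiate: record the local state, then
            # send the marker on every outgoing channel before
            # sending any other message.
            self.local_state = self.capture()
            self.recording = set(self.incoming)
            for dst in self.outgoing:
                self.send(dst, ('MARKER', self.name))

        def on_message(self, src, msg):
            if msg[0] == 'MARKER':
                if self.local_state is None:
                    # First marker: record our state, record the
                    # channel from src as empty, and propagate the
                    # marker (atomically in this sequential model).
                    self.local_state = self.capture()
                    self.channel_state[src] = []
                    self.recording = self.incoming - {src}
                    for dst in self.outgoing:
                        self.send(dst, ('MARKER', self.name))
                else:
                    # Later marker: stop recording src's channel.
                    self.recording.discard(src)
                # The algorithm terminates here once self.recording
                # is empty: a marker arrived on every incoming channel.
            elif src in self.recording:
                # Ordinary message between our state recording and
                # src's marker: part of that channel's recorded state.
                self.channel_state.setdefault(src, []).append(msg)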

28
Distributed snapshot algorithm
  • Once the algorithm has terminated at all
    processes, the consistent global state can be
    assembled at any node
  • Any node wanting a consistent global state asks
    every other node to send it the state data
    recorded at that node

29
Distributed snapshot algorithm
  • The algorithm succeeds even if several nodes
    independently decide to initiate the algorithm
  • The algorithm is not affected by any other
    distributed algorithm the processes are
    executing
  • Algorithm terminates in a finite amount of time
  • Algorithm can be used to adapt any centralized
    algorithm to a distributed environment

30
Distributed mutual exclusion
  • Recall that shared memory and semaphores cannot
    be used to enforce mutual exclusion across the
    nodes of a distributed system
  • Instead, any mechanism must depend on the
    exchange of messages
  • Algorithms for mutual exclusion may be
  • Centralized
  • Distributed

31
Centralized mutual exclusion algorithm
  • Algorithm is straightforward
  • One node is designated as the control node
  • This node controls access to all shared objects
  • Only the control node makes resource-allocation
    decisions
  • Uses Request, Permission, and Release messages
  • The control node may be a bottleneck
  • Failure of the control node causes a breakdown of
    mutual exclusion
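
A minimal sketch of the control node's side, using the Request,
Permission, and Release messages named above; the class name, the
send callback, and the FIFO wait queue are assumptions.

    from collections import deque

    class ControlNode:
        def __init__(self, send):
            self.send = send        # send(node, message) callback
            self.holder = None      # node currently holding the resource
            self.waiting = deque()  # queued requesters, FIFO

        def on_request(self, node):
            if self.holder is None:
                self.holder = node
                self.send(node, 'Permission')
            else:
                self.waiting.append(node)   # wait for a Release

        def on_release(self, node):
            assert node == self.holder
            self.holder = self.waiting.popleft() if self.waiting else None
            if self.holder is not None:
                self.send(self.holder, 'Permission')

    log = []
    ctrl = ControlNode(lambda n, m: log.append((n, m)))
    ctrl.on_request('A'); ctrl.on_request('B'); ctrl.on_release('A')
    print(log)   # [('A', 'Permission'), ('B', 'Permission')]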

32
Distributed mutual exclusion algorithm
  • Each node has only a partial picture of the total
    system and must make decisions based on this
    information
  • All nodes bear equal responsibility for the final
    decision
  • Failure of a node, in general, does not result in
    a total system collapse
  • There is no common clock and no way to adequately
    synchronize clocks

33
Distributed mutual exclusion
  • Distributed algorithm does require a time
    ordering of events
  • For this, an event is the sending of a message
  • Did event E1 on node S1 occur before event E2 on
    node S2 ?
  • Communication delays must be overcome
  • The answer need not be correct, but all nodes
    must reach the same conclusion

34
Lamport's timestamping algorithm
  • Gives a consistent time-ordering of events in a
    distributed system
  • Each node I has a local counter CI
  • When node I sends a message, it first increments
    CI by 1
  • Messages from node I all have the form (m, TI, I),
    where
  • m is the actual message (like Request or Release)
  • I is the node number
  • TI is a copy of CI (the node's timestamp) at
    the time the message was created

35
Lamport's timestamping algorithm
  • When node J receives a message from node I, it
    updates its local CJ to 1 + max{ CJ, TI }
  • (m, TI, I) precedes (m', TJ, J) . . .
  • if TI < TJ
  • or if TI = TJ and I < J
  • For this to work, each message must be sent to
    all other nodes (see the sketch below)
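
A minimal sketch of these rules (the class name is an assumption):
increment before sending, take 1 + max on receipt, and break
timestamp ties with the node number.

    class LamportNode:
        def __init__(self, node_id):
            self.node_id = node_id
            self.clock = 0                 # local counter CI

        def send(self, m):
            self.clock += 1                # increment CI before sending
            return (m, self.clock, self.node_id)    # (m, TI, I)

        def receive(self, message):
            _, t_i, _ = message
            self.clock = 1 + max(self.clock, t_i)   # CJ = 1 + max{CJ, TI}

    def precedes(msg_a, msg_b):
        # (m, TI, I) precedes (m', TJ, J) if TI < TJ,
        # or if TI = TJ and I < J.
        _, t_a, i_a = msg_a
        _, t_b, i_b = msg_b
        return (t_a, i_a) < (t_b, i_b)

    a, b = LamportNode(1), LamportNode(2)
    m1 = a.send('Request')
    b.receive(m1)
    m2 = b.send('Release')
    print(precedes(m1, m2))   # True: the ordering respects causality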

36
Example
  • (a,1,1) < (x,3,2) < (b,5,1) < (j,5,3)

37
Example
  • (a,1,1) < (q,1,4)

38
Distributed mutual exclusion using a distributed
queue
  • The queue is just an array with one entry for
    each node
  • Requests for resources are granted FIFO, based on
    timestamped request messages
  • All nodes maintain a copy of the queue
  • Each node keeps the most recent message from each
    of the other nodes in the queue

39
Distributed mutual exclusion using a distributed
queue
  • All nodes agree on an order for the messages
    within the queue if no messages are in transit
  • The transit problem is overcome by the
    distributed queue algorithm (First Version)
  • Summary on the next slide
  • 3(N-1) messages are involved per request
  • Version Two is more efficient: 2(N-1) messages

40
Summary of distributed queue algorithm
  • A timestamped resource Request message is sent to
    all other nodes
  • A copy of the Request message is also saved in
    the queue of the requesting node
  • If it has not itself made a request, each node
    receiving a request sends a Reply message back to
    the sender
  • This ensures that no earlier Request message is
    in transit when the requester makes its decision
  • A process may access the requested resource when
    its request is the earliest message in its queue
  • After acting on a resource request, a node sends
    a Release message to all other nodes and puts a
    copy in its own queue
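
A minimal sketch of one node's side of this summary. The class name,
the send callback, and the initial all-Release queue contents are
assumptions; delivery of messages between nodes is left to the caller.

    class QueueNode:
        def __init__(self, node_id, all_ids, send):
            self.id = node_id
            self.clock = 0
            self.send = send   # send(dst, msg) callback
            # the "queue": most recent message seen from each node
            self.queue = {i: ('Release', 0, i) for i in all_ids}

        def stamp(self, kind):
            self.clock += 1
            return (kind, self.clock, self.id)

        def broadcast(self, kind):
            msg = self.stamp(kind)
            self.queue[self.id] = msg         # keep our own copy
            for i in self.queue:
                if i != self.id:
                    self.send(i, msg)

        def request(self):
            self.broadcast('Request')

        def release(self):
            self.broadcast('Release')

        def on_message(self, msg):
            kind, t, src = msg
            self.clock = 1 + max(self.clock, t)   # Lamport update
            self.queue[src] = msg
            if kind == 'Request' and self.queue[self.id][0] != 'Request':
                # No pending request of our own: send a Reply so the
                # requester knows nothing earlier of ours is in transit.
                self.send(src, self.stamp('Reply'))

        def may_enter(self):
            # Enter when our Request is the earliest message in the queue.
            mine = self.queue[self.id]
            return mine[0] == 'Request' and all(
                (m[1], m[2]) > (mine[1], mine[2])
                for i, m in self.queue.items() if i != self.id)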

41
(Figure: the four nodes' local queues as node 2 requests entry to a
critical section. Q = reQuest, P = rePly, L = reLease. What if node 4
made an earlier request that is still in transit?)
42
Token-passing algorithm for distributed mutual
exclusion
  • Two arrays are used
  • Token array
  • Passed from node to node
  • The kth position contains the timestamp from the
    last time the token visited node k
  • Request array
  • Maintained by each node
  • The jth position contains the timestamp of the
    last Request message received from node j

43
Token-passing algorithm
  • Send request to all other nodes
  • Wait for the token
  • Release the resource by sending the token to some
    node requesting the resource
  • Choose the first requesting node K whose Request
    message has a timestamp > its timestamp in the
    token
  • That is, request(K) > token(K)

44
(Figure: token-passing example in which node 2 requests entry to a
critical section while node 3 holds the token. Each node's request
array and the token array are shown. Q = reQuest; T = time of the
token's last visit to each node.)
45
Token-passing algorithm
  • See full algorithm in Figure 18.11
  • N messages are needed per resource request
  • Choice of next requesting node is not FIFO
  • However, no starvation
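
A minimal sketch of the release step: scan the other nodes (here in
circular order starting after our own number, an assumption) and hand
the token to the first node K with request(K) > token(K).

    def next_token_holder(my_id, request, token, n):
        # request[k]: timestamp of the last Request seen from node k
        # token[k]:   timestamp of the token's last visit to node k
        for step in range(1, n + 1):
            k = (my_id + step) % n
            if request[k] > token[k]:   # node k has an unserved request
                return k
        return None                     # nobody waiting: keep the token

    request = [3, 0, 5, 2]   # last Request timestamps seen locally
    token   = [3, 0, 4, 2]   # token array carried with the token
    print(next_token_holder(0, request, token, 4))   # 2, since 5 > 4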

46
Distributed deadlock in resource allocation
  • Distributed deadlock prevention
  • Circular-wait can be denied by defining a linear
    ordering of resource types
  • Hold-and-wait condition can be denied by
    requiring that a process request all of its
    required resources at one time
  • The process is blocked until all requests can be
    granted simultaneously
  • Resource requirements need to be known in advance
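
A minimal sketch of denying circular wait: resource types get a fixed
linear order and every process acquires them in that order. The
resource names and the thread locks standing in for remote resources
are illustrative assumptions.

    import threading

    ORDER = {'disk': 0, 'printer': 1, 'tape': 2}   # global linear order
    locks = {name: threading.Lock() for name in ORDER}

    def acquire_in_order(names):
        # Acquiring in ascending global order means no cycle of
        # processes waiting on one another can ever form.
        ordered = sorted(names, key=ORDER.__getitem__)
        for name in ordered:
            locks[name].acquire()
        return ordered

    def release_all(names):
        for name in reversed(names):
            locks[name].release()

    held = acquire_in_order(['printer', 'disk'])   # disk, then printer
    release_all(held)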

47
Distributed deadlock
  • Distributed deadlock avoidance is impractical
  • Every node must keep track of the global state of
    the system
  • The process of checking for a safe global state
    must be done under mutual exclusion
  • Otherwise two nodes, each considering a different
    request, could erroneously honor both requests
    when only one is safe
  • Checking for safe states involves considerable
    processing overhead for a distributed system with
    a large number of processes and resources

48
Distributed deadlock
  • Distributed deadlock detection
  • Each site only knows about its own resources
  • Deadlock may involve distributed resources
  • Three possible techniques
  • Centralized control
  • Hierarchical control
  • Distributed control

49
Distributed deadlock
  • Distributed deadlock detection
  • Centralized control
  • One site is responsible for deadlock detection
  • Simple, subject to failure of central node
  • Hierarchical control
  • Sites are organized as a tree
  • Each node collects information from its children
  • A deadlock is detected at the lowest common
    ancestor of the sites involved
  • Distributed control
  • All processes cooperate in the deadlock detection
    function
  • This may have considerable overhead
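
A minimal sketch of what a centralized detector might run: merge the
wait-for edges reported by each site and search the combined graph
for a cycle. The reporting format and the depth-first search are
assumptions, not the chapter's algorithm.

    def has_deadlock(site_edges):
        # site_edges: per-site sets of edges (P, Q), meaning P waits
        # for a resource held by Q; a cycle in the merge = deadlock.
        graph = {}
        for edges in site_edges:
            for p, q in edges:
                graph.setdefault(p, set()).add(q)

        visiting, done = set(), set()

        def dfs(p):
            visiting.add(p)
            for q in graph.get(p, ()):
                if q in visiting or (q not in done and dfs(q)):
                    return True        # back edge: a wait cycle
            visiting.discard(p)
            done.add(p)
            return False

        return any(dfs(p) for p in graph if p not in done)

    site_a = {('P1', 'P2')}   # at site A, P1 waits for P2
    site_b = {('P2', 'P1')}   # at site B, P2 waits for P1
    print(has_deadlock([site_a, site_b]))   # True: a distributed cycle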

50
Deadlock in message communication
  • Mutual Waiting
  • Deadlock can exist among a group of processes
    when each process is waiting for a message from
    another process in the group and there are no
    messages in transit

(Figure: P1 is waiting for a message from either P2 or P5.)
51
Deadlock in message communication
  • Unavailability of Message Buffers
  • Well known problem in packet-switching data
    networks
  • Store-and-forward deadlock
  • Example of direct store-and-forward deadlock:
  • The buffer space at A is filled with packets
    destined for B
  • The reverse is true at B

52
Deadlock in message communication
  • Unavailability of Message Buffers
  • Indirect store-and-forward deadlock: for each
    node, the queue to the adjacent node in one
    direction is full with packets destined for the
    next node beyond