Fault Tolerance - PowerPoint PPT Presentation

1
Fault Tolerance
  • Introduction to Distributed Systems, CS 457/557,
    Fall 2008, Kenneth Chiu

2
Nondistributed vs. Distributed
  • Can you make a nondistributed system
    fault-tolerant?
  • Can a non-distributed system have partial
    failures?
  • Key goal of fault tolerance is to allow a system
    to continue to function after a partial failure.

3
Basic Concepts
  • Availability
  • Is it working?
  • Reliability
  • Is it the same as availability?
  • How available is something that is down for 1 ms
    every hour? How reliable is it?
  • Safety
  • How does it fail? What happens when there are no
    signals?
  • Maintainability
  • How easy is it to repair? Can affect
    availability.
  • What is failure?
  • Can't meet promises.
  • Error is something that might lead to failure.
    Part of the state.
  • Fault is the cause of the error, like a short in
    a circuit, etc.
  • Transient
  • Intermittent
  • Permanent
  • A system is called k fault tolerant if it can
    tolerate k faults.
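
The "down for 1 ms every hour" question can be worked through numerically. This is a minimal sketch; the function names and the reading "exactly one 1 ms outage per hour" are assumptions made for illustration.

```python
def availability(downtime_s: float, period_s: float) -> float:
    """Fraction of each period during which the system is up."""
    return 1.0 - downtime_s / period_s


def longest_uninterrupted_run_s(downtime_s: float, period_s: float) -> float:
    """Longest stretch without an outage, assuming one outage per period."""
    return period_s - downtime_s


a = availability(0.001, 3600.0)
run = longest_uninterrupted_run_s(0.001, 3600.0)
print(f"availability = {a:.7%}, longest run without failure = {run:.3f} s")
```

Availability is very high, yet no job that needs more than an hour of uninterrupted service can finish, which is why reliability is a separate property.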

4
Failure Models
  • How can things fail (from the viewpoint of an
    observer)?
  • Why try to define them?
  • Because addressing them may have different levels
    of difficulty.
  • Some systems might be able to tolerate some
    kinds, but not others.
  • Some kinds of faults may be rare.
  • We do not want to address them, but we need to be
    able to define those kinds precisely.

5
Failure Models
  • Crashes are also called fail-stop.
  • Arbitrary (byzantine) failures are all else, and
    include malicious/subverted servers, etc.
  • What kind of failures are easiest? What kind are
    hardest?

6
Redundancy
  • You can mask failures by being redundant.
  • Redundant information. How?
  • Time redundancy. How?
  • Physical redundancy. How?
  • Is nature redundant?

7
Triple Modular Redundancy
Signals pass through three devices.
  • Redundancy. Why three voters?
  • How many fail-stop faults can this tolerate?
  • How many response failures (wrong values)?
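
A voter in triple modular redundancy can be sketched as a strict majority function. This is only an illustration: `majority_vote` is a hypothetical name, and `None` is assumed to model a device that has stopped silently.

```python
from collections import Counter


def majority_vote(inputs):
    """Return the value produced by a strict majority of the devices, else None."""
    present = [v for v in inputs if v is not None]  # None = silent (fail-stop) device
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(inputs) / 2 else None


print(majority_vote([1, 1, 0]))     # one wrong value is outvoted
print(majority_vote([1, None, 1]))  # one silent device is tolerated
```

Under this strict-majority assumption, one faulty device of either kind is masked, and two are not.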

8
Process Resilience
9
  • Protect against process failures, so replicate in
    groups.

10
Process Groups
  • Key technique to tolerating faulty processes is
    to organize them into a group.
  • When a message is sent to a group, all members
    get it.
  • Can be dynamic. A process can also be in multiple
    groups.
  • Can be flat or hierarchical.

11
  • Flat groups: All processes are the same.
  • Hierarchical: Might have a coordinator.
  • What happens when a process is lost in a flat
    group vs. hierarchical group?
  • In flat groups, all processes are the same, so
    there is no danger of losing the coordinator.
  • Which one probably has more complicated
    decision-making algorithms?
  • Using a coordinator makes decision making easier.

[Figure: a flat group of identical processes next to a hierarchical group with a coordinator and workers]
12
(No Transcript)
13
  • Group membership
  • Can use a group server. It is a centralized
    machine that maintains all groups.
  • Distributed? How would you do it in real life?
  • Join by multicasting to all.
  • Send a goodbye message when leaving?
  • Problematic if there is a crash, because there is
    no message announcing that the process that
    crashed has left.
  • Must notice that the crashed process no longer
    responds.
  • Joining and leaving have to be synchronous.
  • As soon as it joins, should receive all messages.
  • As soon as it leaves, must stop receiving.
  • Rebuilding the group is hard.

14
Failure Masking and Replication
  • Process groups are part of the solution for fault
    tolerance.
  • Replace single, vulnerable process with a (fault
    tolerant) group.
  • Two ways to approach replication: primary-based
    or replicated write.
  • A system is k fault tolerant if it can tolerate k
    faults.
  • Suppose everything is fail-stop (silently). How
    many processes do we need to tolerate k faults
    in a simple voting system?
  • Suppose the failures are byzantine. Now how many?

15
Agreement in Faulty Systems
  • When using simple voting, we can tolerate k
    faults with 2k + 1 processes. Not so simple when we are
    talking about agreement.
  • What aspects are important?
  • Synchronous vs. asynchronous: Do the processes
    operate in lock-step?
  • Is communication latency bounded?
  • Is message delivery ordered?
  • Unicasting or multicasting?

16
  • Turns out that distributed agreement is only
    possible under these conditions.

17
Two Generals Problem
  • Two generals want to attack a city from different
    sides. They will only succeed if both attack at
    the same time. They can communicate only by
    messengers sent by horse.
  • How do they reach agreement?

18
Lamport's Algorithm
  • Assumptions
  • Synchronous
  • Unicast, preserve ordering.
  • Latency is bounded.
  • Setup
  • N processes.
  • Each process i will send v_i to the others.
  • Goal is that each process will construct a vector
    V of length N such that if process i is
    nonfaulty, V[i] = v_i.
  • Otherwise V[i] is undefined.
  • Assume at most k faulty processes.

19
Lamport's Algorithm
  • N = 4, k = 1.
  • Steps
  • Each process sends their value to the others
    using reliable unicasting.
  • Results collected into vectors.
  • Vectors re-distributed.
  • If any position has a majority, that is the value.

20
  • Now consider N = 3, k = 1.
  • Lamport proved that you need 3k + 1 processes to
    tolerate k faulty processes.
  • If you may have infinite delays, no agreement is
    possible.
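
The four steps can be simulated for both cases. This is a sketch under stated assumptions: synchronous rounds, reliable ordered unicast, and a faulty process modelled by the hypothetical `lie` function, which sends a different bogus value to every receiver. A real Byzantine process may behave in any way at all.

```python
from collections import Counter


def lie(sender, receiver, slot):
    """Hypothetical faulty behaviour: a distinct bogus value per receiver and slot."""
    return f"bogus-{sender}-{receiver}-{slot}"


def byzantine_agreement(values, faulty):
    """Return {process: result vector} for each nonfaulty process (None = unknown)."""
    n = len(values)
    # Steps 1-2: heard[j][i] is the value process j received from process i.
    heard = {
        j: [values[i] if i not in faulty else lie(i, j, i) for i in range(n)]
        for j in range(n)
    }
    results = {}
    for k in range(n):
        if k in faulty:
            continue
        # Step 3: k receives one vector from every other process.
        received = [
            heard[j] if j not in faulty else [lie(j, k, s) for s in range(n)]
            for j in range(n)
            if j != k
        ]
        # Step 4: keep a slot only if a strict majority of the vectors agree on it.
        vector = []
        for slot in range(n):
            value, count = Counter(vec[slot] for vec in received).most_common(1)[0]
            vector.append(value if count > len(received) / 2 else None)
        results[k] = vector
    return results


print(byzantine_agreement(["a", "b", "c", "d"], faulty={2}))
print(byzantine_agreement(["a", "b", "c"], faulty={2}))
```

With N = 4 the nonfaulty processes all end up with the same vector and the faulty slot is unknown. With N = 3 no slot reaches a majority, so nothing can be decided.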

21
Failure Detection
  • How do you detect that a process has failed?
  • Two mechanisms
  • Ping
  • Heartbeat
  • Nothing can really be done, except some kind of
    timeout.
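
A heartbeat detector built on a timeout might look like the following sketch. The class and method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class HeartbeatDetector:
    """Suspects a process when no heartbeat has arrived within timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # process id -> time of the last heartbeat

    def heartbeat(self, process_id, now_s):
        self.last_seen[process_id] = now_s

    def suspected(self, now_s):
        return {p for p, t in self.last_seen.items() if now_s - t > self.timeout_s}


detector = HeartbeatDetector(timeout_s=3.0)
detector.heartbeat("p1", 0.0)
detector.heartbeat("p2", 0.0)
detector.heartbeat("p1", 4.0)
print(detector.suspected(5.0))
```

A suspected process may only be slow or unreachable, so the timeout gives a suspicion, not proof of a crash.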

22
Reliable Client-Server Communication
23
  • Besides faulty processes, also need to look at
    communications failures.
  • Point-point communication
  • TCP hides lost messages.
  • Crashes are not masked, however.

24
RPC Failures
  • Five different classes of failures.
  • Can't find server.
  • Request message lost.
  • Server crashes after receiving request.
  • Reply message is lost.
  • Client crashes after sending request.

25
Can't Find Server
  • One possibility is to raise an exception.
  • Is RPC still completely transparent to the client?

26
Lost Request
  • Start a timer. If it expires, send another.
  • Or is the server down?

27
Server Crashes
  • What should the proper way to handle the two
    crashes be?
  • Can the client tell which one happened?
  • Can use different semantics: at least once and at
    most once.
  • No general solution for exactly once. Consider a
    print server that crashes and comes back up.
  • Client sends a message, gets an ack that message
    was received.
  • Two strategies the server can use: send a completion
    message either right before or right after printing.
  • After a crash, the client can never reissue, always
    reissue, only reissue if no ack, or only reissue if
    there is an ack.

28
  • Three events that can happen at the server
  • Send the completion message (M).
  • Print the text (P).
  • Crash (C).

29
  • These events can occur in six different
    orderings:
  • M → P → C: A crash occurs after sending the
    completion message and printing the text.
  • M → C (→ P): A crash happens after sending the
    completion message, but before the text could be
    printed.
  • P → M → C: A crash occurs after printing the
    text and sending the completion message.
  • P → C (→ M): The text is printed, after which a
    crash occurs before the completion message could
    be sent.
  • C (→ P → M): A crash happens before the server
    could do anything.
  • C (→ M → P): A crash happens before the server
    could do anything.

30
  • Server crashes and comes back up. No combination
    is satisfactory.
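
The claim can be checked by enumerating every combination of client and server strategy. This is a sketch with hypothetical names; it assumes a reissued request is always printed in full by the recovered server, and that the client sees an ack only if M was sent before the crash.

```python
from itertools import product

SERVER_STRATEGIES = {"M->P": "MP", "P->M": "PM"}
CLIENT_STRATEGIES = {
    "always": lambda acked: True,
    "never": lambda acked: False,
    "only when ACKed": lambda acked: acked,
    "only when not ACKed": lambda acked: not acked,
}


def outcome(order, events_before_crash, reissue):
    """Classify how often the text is printed: ZERO, OK (once), or DUP (twice)."""
    done = order[:events_before_crash]
    prints = done.count("P")
    if reissue("M" in done):
        prints += 1  # assumption: the recovered server prints the reissued request
    return {0: "ZERO", 1: "OK", 2: "DUP"}[prints]


def table():
    """Outcomes for a crash after 2, 1, and 0 server events, per strategy pair."""
    rows = {}
    for (cname, reissue), (sname, order) in product(
        CLIENT_STRATEGIES.items(), SERVER_STRATEGIES.items()
    ):
        rows[(cname, sname)] = [outcome(order, n, reissue) for n in (2, 1, 0)]
    return rows


for key, row in table().items():
    print(key, row)
```

Every row contains at least one ZERO or DUP, so no pair of strategies gives exactly-once printing for all crash points.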

31
  • Server crashes and comes back up. No combination
    is satisfactory.

32
Lost Reply Messages
  • Start a timer, if no reply, send the request
    again.
  • How well does this work?
  • The problem is idempotency.
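
One common way to make retransmission safe for a non-idempotent operation is to tag each request with an identifier and cache the reply. The slides do not prescribe this; the sketch below uses hypothetical names and models lost replies with a simple counter.

```python
class DedupServer:
    """Executes each request id at most once and replays the cached reply."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request id -> cached reply

    def handle(self, request_id, amount):
        if request_id in self.replies:  # retransmission: do not execute again
            return self.replies[request_id]
        self.balance += amount  # non-idempotent operation
        self.replies[request_id] = self.balance
        return self.balance


def call_with_retry(server, request_id, amount, replies_lost):
    """Resend the same request until a reply gets through."""
    attempts = 0
    while True:
        reply = server.handle(request_id, amount)
        attempts += 1
        if attempts > replies_lost:  # the first `replies_lost` replies never arrive
            return reply


server = DedupServer()
print(call_with_retry(server, "req-1", 100, replies_lost=2))
```

The deposit is applied once even though the request reached the server three times.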

33
Client Crashes
  • Can leave dangling resources at the server. Four
    strategies:
  • Use client log. If crash is detected, contact
    server to free resources. How well does this
    work?
  • Very expensive.
  • May have grand-orphans.
  • Network may be partitioned.
  • When the client reboots, it broadcasts, freeing
    all resources. Disadvantage?
  • May have long-running computations.
  • When a reboot message comes in, contact owners.
  • Use expiration. If a resource has not been freed
    for a while, free it automatically. Similar to a
    lease.

34
Reliable Group Communication
35
  • What is reliable multicasting?
  • What happens if during communication, someone
    joins the group?
  • What happens if the sending process crashes
    during the send?
  • To cover these, make a distinction between
    reliable multicasting with process failure and
    without.
  • With faulty processes, multicasting is reliable
    when it goes to all non-faulty group members. But
    how to agree on exactly what is the group?
  • Simpler if there is agreement on the group.

36
  • Consider a single sender multicasting. Assume
    that the underlying system only has unreliable
    multicast.
  • A sender wants to send message number 25.

37
Scalability Issues with Reliable Multicasting
  • What happens if there are a million receivers?
  • How many ACKs?
  • How about returning NAKs only?
  • What happens if a packet is dropped early?
  • How long to buffer a packet using NAKs?
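
NAK-only feedback relies on sequence numbers: a receiver notices a gap and asks for the missing packet, and the sender keeps a history buffer for retransmission. The following is a minimal single-sender sketch with hypothetical class names.

```python
class Sender:
    def __init__(self):
        self.history = {}  # sequence number -> payload, kept for retransmission
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.history[seq] = payload
        self.next_seq += 1
        return seq, payload

    def retransmit(self, seq):
        return seq, self.history[seq]


class Receiver:
    def __init__(self):
        self.expected = 0
        self.held = {}  # out-of-order packets waiting for a gap to be filled
        self.delivered = []

    def receive(self, seq, payload):
        """Accept a packet and return the sequence numbers to NAK."""
        if seq >= self.expected:
            self.held[seq] = payload
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
        highest = max(self.held, default=self.expected)
        return [s for s in range(self.expected, highest) if s not in self.held]


sender, receiver = Sender(), Receiver()
p0, p1, p2 = sender.send("a"), sender.send("b"), sender.send("c")
receiver.receive(*p0)
print(receiver.receive(*p2))  # packet 1 was dropped, so it is NAKed
receiver.receive(*sender.retransmit(1))
print(receiver.delivered)
```

The sketch also shows the buffering problem: without ACKs the sender never learns when it is safe to drop an entry from `history`.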

38
Nonhierarchical Feedback Control
  • Only report NAKs, but multicast them to everyone.
    How does this resolve things?
  • Will everyone multicast the NAK at the same time?
    How can this be resolved?
  • A packet was lost early. All receivers schedule a
    retransmit, but one will go first.

39
  • Disadvantages
  • Still hard to make sure that only one NAK is
    sent.
  • Lots of interruptions. Receivers that got the
    packet successfully are forced to process useless
    NAKs. Solution?
  • Could have a separate multicast group for those.
  • But that requires highly efficient and reliable
    dynamic group management. Could involve just as
    many messages.
  • Could have ones that tend to miss them join.
  • Can have local recovery, to improve efficiency.

40
Hierarchical Feedback Control
  • Scalability for flat schemes is hard.
  • For large groups, need a hierarchy.

41
  • Receivers are divided into subgroups, based on
    physical topology.
  • Subgroups are organized into a tree, with the
    subgroup containing the sender at the root.
    Within a subgroup?
  • Within subgroup, use a method that works well for
    small groups.
  • Each subgroup has a coordinator with a history
    buffer. What happens if the coordinator itself
    misses a packet?
  • If the coordinator misses a packet, it asks the
    coordinator of the parent subgroup.
  • When can a coordinator remove a packet from its
    history buffer?

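[Figure: hierarchical feedback control. The sender S is in the root subgroup; receivers R sit in LAN subgroups, each with a coordinator C, connected over a WAN.]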
42
Virtual Synchrony (1)
  • Figure 8-12. The logical organization of a
    distributed system to distinguish between
    message receipt and message delivery.

43
Virtual Synchrony (2)
  • Figure 8-13. The principle of virtual synchronous
    multicast.

44
Message Ordering (1)
  • Four different orderings are distinguished
  • Unordered multicasts
  • FIFO-ordered multicasts
  • Causally-ordered multicasts
  • Totally-ordered multicasts
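
Causally-ordered delivery is often implemented with vector timestamps. That technique is assumed here for illustration and is not stated on the slide. In this sketch each timestamp counts multicasts per sender, and a message is held back until everything it depends on has been delivered.

```python
class CausalReceiver:
    """Delivers multicast messages in causal order using vector timestamps."""

    def __init__(self, n):
        self.clock = [0] * n  # clock[i] = messages from process i delivered so far
        self.pending = []
        self.delivered = []

    def _deliverable(self, sender, ts):
        next_from_sender = ts[sender] == self.clock[sender] + 1
        deps_met = all(
            ts[k] <= self.clock[k] for k in range(len(ts)) if k != sender
        )
        return next_from_sender and deps_met

    def receive(self, sender, ts, payload):
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:  # keep delivering until nothing more becomes deliverable
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.clock[s] = t[s]
                    self.delivered.append(p)
                    progress = True


r = CausalReceiver(3)
r.receive(1, [1, 1, 0], "m2")  # m2 was sent after its sender had seen m1
r.receive(0, [1, 0, 0], "m1")
print(r.delivered)
```

Here m2 arrives first but is held until m1 has been delivered.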

45
Message Ordering (2)
  • Figure 8-14. Three communicating processes in the
    same group. The ordering of events per process
    is shown along the vertical axis.

46
Message Ordering (3)
  • Figure 8-15. Four processes in the same group
    with two different senders, and a possible
    delivery order of messages under FIFO-ordered
    multicasting

47
Implementing Virtual Synchrony (1)
  • Figure 8-16. Six different versions of virtually
    synchronous reliable multicasting.

48
Implementing Virtual Synchrony (2)
  • Figure 8-17. (a) Process 4 notices that process 7
    has crashed and sends a view change.
  • Figure 8-17. (b) Process 6 sends out all
    its unstable messages, followed by a flush
    message.
  • Figure 8-17. (c) Process 6 installs the new view
    when it has received a flush message from
    everyone else.

49
Distributed Commit
50
Noncomputer-Based Distributed Systems
  • This is the Clayton Tunnel in 1841 in England.
  • A two-way tunnel.
  • At each entrance is a semaphore system that flips
    red when a train passes. It must be manually
    reset to green.
  • Before manual reset, the signal man must make
    sure that the train has exited.
  • Only one train allowed per track in the tunnel.
  • A telegraph, with a fixed set of 3 messages was
    provided.
  • TRAIN-IN-TUNNEL, TUNNEL-IS-CLEAR,
    HAS-THE-TRAIN-LEFT-THE-TUNNEL?
  • In case the semaphore failed, the signal man had
    red and white flags for manual signalling.

51
Noncomputer-Based Distributed Systems
[Figure: the tunnel, with signal man A at one entrance and signal man B at the other]
  • Normal
  • A train enters, flips the semaphore signal red.
  • Signal man A sends TRAIN-IN-TUNNEL.
  • When the train exits, opposite signal man B sends
    TUNNEL-IS-CLEAR.
  • Signal man A manually resets the signal to green.
  • Semaphore failure
  • A train enters, semaphore fails to flip, alarm
    rings.
  • Signal man A sends TRAIN-IN-TUNNEL.
  • Signal man A then manually raises a red flag.
  • When the train exits, opposite signal man B sends
    TUNNEL-IS-CLEAR.
  • Signal man A changes red flag to white flag.
  • Should steps 2 and 3 (sending TRAIN-IN-TUNNEL and
    raising the red flag) be reversed?
  • Weaknesses?
  • What happens if the train has exited by the time
    the TRAIN-IN-TUNNEL message is sent?
  • How far apart do trains need to be? What happens
    if they are too close?

52
  • On August 25th, 1861:
  • Three trains left Brighton at 8:28, 8:31, and
    8:35, due to late running of the first train.
  • The first train entered the tunnel, but the
    semaphore failed to flip to red.
  • The signal man A telegraphed TRAIN-IN-TUNNEL.
  • He went to manually raise a red flag, but was too
    slow, due to the trains being too close together.
  • The driver of the second train barely catches a
    glimpse of the red flag as he passes by, but
    can't stop in time and enters the tunnel. He
    stops in the middle of the tunnel and begins to
    back up.
  • The third train sees the red flag in time, and
    stops before entering.
  • Signal man A now telegraphs TRAIN-IN-TUNNEL
    again, to indicate that there are two trains in
    the tunnel.
  • Signal man A now asks,
    HAS-THE-TRAIN-LEFT-THE-TUNNEL?
  • What should signal man B do now?
  • Signal man B, after the first train has left,
    responds TUNNEL-IS-CLEAR, thinking A meant the
    first train.
  • Signal man A thinks B meant the second train, and
    changes the flag to white.
  • The third train enters the tunnel.
  • 23 people died, 176 were injured. Whose fault was
    it?

53
Distributed Commit
  • Given a group of actors, how do you get them to
    either all agree to do something (commit), or not
    do it (abort)?
  • Suppose you are trying to arrange going to a
    movie with a group of friends by e-mail. You only
    want the event to happen if everyone can go. How
    do you do it?
  • Send out a group e-mail asking if they can make
    it.
  • Wait for responses, sent back just to you.
  • If anyone says they cannot, then send out an
    abort message to everyone.
  • If everyone says they can make it, send out a
    commit message.
  • How can this fail?
  • If communication is unreliable, then problems are
    numerous. Consider only fail-stop (crash)
    failures that are recoverable.
  • Suppose one of your friends has e-mail problems
    before step 1, so they never get the request.
  • Suppose he has e-mail problems after sending back
    OK?
  • Suppose you have e-mail problems?
  • Suppose one of your friends gets a girlfriend and
    starts ignoring you?

54
One-Phase Commit
  • Coordinator decides whether or not to perform
    (commit) the operation, and tells others.
  • What is the obvious problem?

55
Two-Phase Commit
  • Consists of a coordinator and participants.
  • Coordinator multicasts a VOTE_REQUEST message to
    all participants.
  • When a participant receives a VOTE_REQUEST
    message, it replies (unicast) with either
    VOTE_COMMIT or VOTE_ABORT.
  • A VOTE_COMMIT response is essentially a
    contractual guarantee that it will be able to
    commit.
  • Coordinator collects all votes. If all are
    VOTE_COMMIT, then it multicasts a GLOBAL_COMMIT
    message. Otherwise, it will multicast a
    GLOBAL_ABORT message.
  • When a participant receives GLOBAL_COMMIT, it
    locally commits; if it receives GLOBAL_ABORT, it
    locally aborts.
  • Can a process have an error now?

56
Two-Phase Commit FSMs
[Figure: finite state machines of the coordinator and of a participant]
  • Where does the waiting/blocking occur?
  • Coordinator-WAIT
  • Participant-INIT
  • Participant-READY

57
Two-Phase Commit Recovery
[Figure: coordinator and participant state machines with their wait states marked]
  • What happens in case of a crash? How do we detect
    a crash?
  • If timeout in Coordinator-WAIT, then abort.
  • If timeout in Participant-INIT, then abort.
  • If timeout in Participant-READY, then need to find
    out if globally committed or aborted.
  • Just wait for Coordinator to recover.
  • Check with others.

58
Two-Phase Commit Recovery
  • If in Participant-READY, and we wish to check
    with others
  • If Q is in COMMIT, then commit. If Q is in ABORT,
    then ABORT.
  • If Q in INIT, then can safely ABORT.
  • If all in READY, nothing can be done.

59
Two-Phase Commit Recovery
  • If in Participant-READY, and we wish to check
    with others
  • If Q is in COMMIT, then commit. If Q is in ABORT,
    then ABORT.
  • If Q in INIT, then can safely ABORT.
  • Does it make a difference whether we change
    state, then send, or send, then change state?
  • If all in READY, nothing can be done.
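
The rules for a participant stuck in READY fit in one small function. This is a sketch; `decide_from_peers` and the string state names are illustrative choices.

```python
def decide_from_peers(peer_states):
    """Decide for a participant in READY, given the states reported by its peers.

    Returns "COMMIT", "ABORT", or None when the participant must stay blocked.
    """
    if "COMMIT" in peer_states:
        return "COMMIT"
    if "ABORT" in peer_states or "INIT" in peer_states:
        return "ABORT"  # a peer still in INIT cannot have voted to commit yet
    return None  # every peer is in READY: nothing can be decided


print(decide_from_peers(["READY", "COMMIT"]))
print(decide_from_peers(["READY", "INIT"]))
print(decide_from_peers(["READY", "READY"]))
```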

60
2PC Coordinator Code
    write START_2PC to local log
    multicast VOTE_REQUEST to all participants
    while not all votes have been collected
        wait for any incoming vote
        if timeout
            write GLOBAL_ABORT to local log
            multicast GLOBAL_ABORT to all participants
            exit
        record vote
    if all participants sent VOTE_COMMIT and coordinator votes COMMIT
        write GLOBAL_COMMIT to local log
        multicast GLOBAL_COMMIT to all participants
    else
        write GLOBAL_ABORT to local log
        multicast GLOBAL_ABORT to all participants
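
The coordinator logic can also be written as a runnable sketch. The transport is abstracted away: `collect_vote` is a hypothetical callback that returns a participant's vote, or `None` on timeout, and the multicast of the decision is represented by the return value.

```python
def run_coordinator(participants, collect_vote, coordinator_vote="COMMIT"):
    """Run one round of two-phase commit and return (decision, log)."""
    log = ["START_2PC"]
    votes = {}
    for p in participants:
        vote = collect_vote(p)  # hypothetical: None means the wait timed out
        if vote is None:
            log.append("GLOBAL_ABORT")  # log the decision before announcing it
            return "GLOBAL_ABORT", log
        votes[p] = vote
    all_commit = all(v == "VOTE_COMMIT" for v in votes.values())
    decision = (
        "GLOBAL_COMMIT"
        if all_commit and coordinator_vote == "COMMIT"
        else "GLOBAL_ABORT"
    )
    log.append(decision)
    return decision, log


votes = {"p1": "VOTE_COMMIT", "p2": "VOTE_COMMIT"}
print(run_coordinator(["p1", "p2"], votes.get))
print(run_coordinator(["p1", "p3"], votes.get))  # p3 never answers
```

Writing the decision to the log before sending it lets a recovering coordinator repeat the same decision.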

61
2PC Participant Code
    write INIT to local log
    wait for VOTE_REQUEST from coordinator
    if timeout
        write VOTE_ABORT to local log
        exit
    if participant votes COMMIT
        write VOTE_COMMIT to local log
        send VOTE_COMMIT to coordinator
        wait for DECISION from coordinator
        if timeout
            multicast DECISION_REQUEST to other participants
            wait until DECISION is received   /* remain blocked */
            write DECISION to local log
        if DECISION == GLOBAL_COMMIT
            write GLOBAL_COMMIT to local log
        else if DECISION == GLOBAL_ABORT
            write GLOBAL_ABORT to local log
    else
        write VOTE_ABORT to local log
        send VOTE_ABORT to coordinator

62
2PC Decision Request Handler
    while true
        wait until any incoming DECISION_REQUEST is received
        read most recently recorded state from the local log
        if state == GLOBAL_COMMIT
            send GLOBAL_COMMIT to requesting participant
        else if state == INIT or state == GLOBAL_ABORT
            send GLOBAL_ABORT to requesting participant
        else
            skip   /* participant remains blocked */

63
  • Can participants always make a decision to commit
    or abort in the face of failure?
  • No, so this is called a blocking commit protocol.

64
Three-Phase Commit
  • To avoid blocking, there is a three phase commit
    protocol.

65
Three-Phase Commit
  • The states of the coordinator and each
    participant satisfy the following two conditions
  • There is no single state from which it is
    possible to make a transition directly to either
    a COMMIT or an ABORT state.
  • There is no state in which it is not possible to
    make a final decision, and from which a
    transition to a COMMIT state can be made.

66
  • Participant Timeout
  • Init: Abort
  • Precommit: Commit
  • Ready: Abort
  • Coordinator Timeout
  • Wait: Abort
  • Precommit: Commit

67
Recovery
68
Recovery
  • Recovery is to replace an erroneous state with an
    error-free state.
  • Two kinds
  • Backward recovery: go back to a known good state
    (checkpoint).
  • Forward recovery: attempt to go forward to a
    known good state.
  • Requires knowing in advance which errors might
    occur.
  • Examples of backward recovery? Forward recovery?
  • File system backups
  • Erasure codes

69
Checkpointing and Logging
  • Suppose you have a set of 20 files, totaling 100
    MB. You are constantly modifying these files.
  • You want to checkpoint the set of files as a
    whole. How do you do it?
  • Make a complete copy every time.
  • Suppose you want a checkpoint every hour, but you
    only change 1K scattered across all 20 files
    every 10 minutes? How much disk will this take?
    How much time?
  • Log the changes.
  • Suppose you completely replace every byte in half
    your files every 10 minutes. How much disk will
    this take?
  • Best is usually some combination.
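
The disk cost of the two approaches can be compared with simple arithmetic. This sketch assumes that half the files hold half the data (50 MB), that log records carry no overhead beyond the changed bytes, and that 1K means 1024 bytes.

```python
MB = 1024 * 1024
KB = 1024


def full_copy_bytes_per_hour(total_bytes, checkpoints_per_hour=1):
    """Disk used per hour when every checkpoint is a complete copy."""
    return total_bytes * checkpoints_per_hour


def log_bytes_per_hour(bytes_changed_per_interval, intervals_per_hour=6):
    """Disk used per hour when only the changes are logged (one interval = 10 min)."""
    return bytes_changed_per_interval * intervals_per_hour


full = full_copy_bytes_per_hour(100 * MB)
small_log = log_bytes_per_hour(1 * KB)
big_log = log_bytes_per_hour(50 * MB)
print(full // MB, "MB per hour for full copies")
print(small_log // KB, "KB per hour when logging small changes")
print(big_log // MB, "MB per hour when logging half the data every 10 minutes")
```

Logging wins by orders of magnitude for small changes and loses to a full copy when most of the data is rewritten between checkpoints.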

70
Stable Storage
  • Suppose you want to make a disk system that is
    resilient from non-catastrophic hardware
    failures, like a sector going bad, or a disk
    going bad. How would you do it?
  • Use two disks. Make a duplicate of each sector on
    drive 1 on drive 2.
  • When a block is written, first update and verify
    on drive 1, then update and verify on drive 2.

71
Stable Storage
  • (b) How do we recover from a crash, if different
    value? If bad checksum?
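
One conventional answer is sketched below. It assumes each block carries a checksum, that drive 1 is always written first, and that recovery scans every block pair. The verify-after-write step is left out, and the function names are hypothetical.

```python
import zlib


def make_block(data: bytes):
    return {"data": data, "crc": zlib.crc32(data)}


def is_good(block):
    return zlib.crc32(block["data"]) == block["crc"]


def stable_write(drive1, drive2, index, data, crash_after_first=False):
    drive1[index] = make_block(data)  # update drive 1 first
    if crash_after_first:
        return  # simulated crash between the two writes
    drive2[index] = make_block(data)  # then update drive 2


def recover(drive1, drive2):
    for i in range(len(drive1)):
        good1, good2 = is_good(drive1[i]), is_good(drive2[i])
        if good1 and (not good2 or drive1[i] != drive2[i]):
            # Drive 1 is written first, so on a mismatch it holds the newer value.
            drive2[i] = dict(drive1[i])
        elif good2 and not good1:
            # Bad checksum on drive 1: restore it from drive 2.
            drive1[i] = dict(drive2[i])


d1, d2 = [make_block(b"old")], [make_block(b"old")]
stable_write(d1, d2, 0, b"new", crash_after_first=True)
recover(d1, d2)
print(d2[0]["data"])
```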

72
Checkpointing
  • A checkpoint is a complete record of the state of
    an application.
  • How do you checkpoint a non-distributed
    application?
  • OS-level checkpointing: Save the complete process
    state. Transparent to the application.
  • Application-level checkpointing: Application
    saves its own state. Requires coding this
    functionality into the application.

73
Example: E-Mail Client
  • Say you have an e-mail client like Outlook or
    Thunderbird.
  • What would the OS need to do to implement
    OS-level checkpointing? What needs to be saved?
  • Contents of memory, including stack.
  • Window state
  • Thread state. Which threads have been created
    with which options.
  • Signal handler state.
  • Open file descriptor state. File offset state.
  • Network connection state?
  • Others?
  • What would have to be implemented in app-level
    checkpointing?
  • What folder is in main window. What message is
    showing.
  • Composition windows. Contents. Cursor position.
    Undo buffer.
  • Which seems easier? How about a hybrid?

74
Distributed Checkpointing
  • Assume we have a complex computation running on
    10,000 CPUs. Normally, a single CPU system might
    fail once in 5 years, which might be acceptable.
  • Do you think this 10,000 CPU system is going to
    fail once in 5 years?
  • Checkpointing is the answer, but how? The
    computation works by sending many messages
    between the various machines.
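
A rough estimate shows why. This sketch assumes that CPUs fail independently at a constant rate and that any single failure stops the computation; both are simplifying assumptions.

```python
HOURS_PER_YEAR = 365 * 24


def system_mtbf_hours(node_mtbf_years, nodes):
    """Mean time between failures of the whole system, in hours."""
    return node_mtbf_years * HOURS_PER_YEAR / nodes


mtbf = system_mtbf_hours(5, 10_000)
print(f"expected time between failures: {mtbf:.2f} hours")
```

Under these assumptions the system sees a failure every few hours, so any computation longer than that is unlikely to finish without checkpoints.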

75
Example: A Broadcast
  • Assume some kind of computational fluid dynamics
    (CFD) code, computing the flow over a wing. For
    speed, we run on three processes.
  • Periodically, process 1 broadcasts a new velocity
    to processes 2 and 3.

[Figure: process 1 broadcasts a velocity update to processes 2 and 3]
76
Example: A Broadcast
Process 1:
    checkpoint(n)                  // Ckpt 1-1
    broadcast_new_velocity(vel)
    checkpoint(n)                  // Ckpt 1-2
    wr_log("upd %d", vupd)
Process 2:
    checkpoint(n)                  // Ckpt 2-1
    vel = receive_new_velocity()
    checkpoint(n)                  // Ckpt 2-2
    wr_log("upd %d done", vupd)
Process 3:
    checkpoint(n)                  // Ckpt 3-1
    vel = receive_new_velocity()
    checkpoint(n)                  // Ckpt 3-2
    wr_log("vupd %d done", vupd)
  • Each process enters their respective code
    fragments together (SPMD).
  • Assume sending a broadcast is asynchronous.
  • Process 1 broadcasts, Process 2-3 receive the new
    value.
  • Everyone is happy.

77
Example: A Crash
Process 1:
    checkpoint(n)                  // Ckpt 1-1
    broadcast_new_velocity(vel)
    checkpoint(n)                  // Ckpt 1-2
    wr_log("upd %d", upd)
Process 2:
    checkpoint(n)                  // Ckpt 2-1
    vel = receive_new_velocity()
    checkpoint(n)                  // Ckpt 2-2
    wr_log("upd %d", upd)
Process 3:
    checkpoint(n)                  // Ckpt 3-1
    vel = receive_new_velocity()
    checkpoint(n)                  // Ckpt 3-2
    wr_log("upd %d", upd)
  • Suppose there is a power failure across the whole
    cluster. We can recover with checkpoints. Which
    ones do we use?
  • Suppose all logs have upd 10 as last thing.
  • Suppose none do, and we just use the last
    recorded checkpoint?
  • Suppose some have upd 10, some don't.

78
Consistent Checkpoints
  • A consistent collection (recovery line) of
    checkpoints cannot have a checkpoint from P1
    before it has sent a message M, and a checkpoint
    from P2 after it has received M.
  • Recovery to this state would lead to
  • P1 thinks it has not sent M. P2 thinks it has
    received M.
  • Impossible and inconsistent. There is no moment
    in time where this could have been the state of
    the distributed system.
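
The consistency rule can be checked mechanically. In this sketch a set of checkpoints is described by the local time at which each process took its checkpoint, and every message by its local send and receive times; this representation is an assumption made for illustration.

```python
def is_consistent(cut, messages):
    """Return True if no message is received before the cut but sent after it.

    cut: {process: local time of its checkpoint}
    messages: list of (sender, send_time, receiver, recv_time)
    """
    for sender, send_time, receiver, recv_time in messages:
        received_before = recv_time <= cut[receiver]
        sent_before = send_time <= cut[sender]
        if received_before and not sent_before:
            return False  # the receiver remembers M, the sender does not
    return True


messages = [("P1", 5, "P2", 7)]
print(is_consistent({"P1": 6, "P2": 8}, messages))  # sent and received before the cut
print(is_consistent({"P1": 4, "P2": 8}, messages))  # received but never sent
print(is_consistent({"P1": 6, "P2": 6}, messages))  # sent, still in transit
```

A message that was sent but not yet received does not make the set inconsistent, although it is lost on recovery unless it is logged.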

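[Figure: timelines of P1 (checkpoints C1-1 to C1-5) and P2 (checkpoints C2-1 to C2-3), with messages M1 and M2 and candidate recovery lines R1, R2, and R3]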
79
Independent Checkpointing
  • Suppose we have a set of processes that
    independently take periodic checkpoints. Can we
    always find a recovery line?
  • Known as the domino effect.
  • There is a technique described in the book on how
    to find one, if it exists.

80
Coordinated Checkpointing
  • There are distributed snapshot techniques that
    can help, but complex.
  • An alternative is to use a global coordinator.
  • Multicast a CHECKPOINT_REQUEST message.
  • Upon receipt, take a local checkpoint, block any
    new messages the application gives, and send an
    ACK.
  • When coordinator gets an ACK from all processes,
    it sends back CHECKPOINT_DONE.

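[Figure: coordinator C and processes P1 and P2 exchanging CR (checkpoint request), ACK, and CD (checkpoint done) messages, along with a message M1]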
81
Incremental
  • Every time a checkpoint is taken, all processes
    must write their local state.
  • If this is a CFD computation, and each process
    has a 1000x1000x1000 3D grid, then that is a lot
    of storage.
  • What if not all processes have changed their
    state?
  • One way is for each process to decide. May lead
    to a lot of network traffic.
  • If we only care about the coordinator state
    changes, then coordinator can just send
    checkpoint request to each process P it has sent
    a message to since last time.
  • Each process P must then cascade this checkpoint
    request to each process that P has sent a message
    to since the last checkpoint request.
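
The cascade is a reachability computation over the "has sent a message to" relation. The sketch below uses a hypothetical `sent_to` map and returns the processes that end up receiving a checkpoint request.

```python
from collections import deque


def processes_to_checkpoint(coordinator, sent_to):
    """Return the processes reached by the cascading checkpoint request.

    sent_to[p] is the set of processes p has sent a message to since the
    last checkpoint.
    """
    reached = set()
    seen = {coordinator}
    queue = deque([coordinator])
    while queue:
        p = queue.popleft()
        for q in sent_to.get(p, ()):
            if q not in seen:
                seen.add(q)
                reached.add(q)
                queue.append(q)
    return reached


sent_to = {"C": {"A"}, "A": {"B"}, "D": {"A"}}
print(sorted(processes_to_checkpoint("C", sent_to)))
```

Process D is skipped because nothing that depends on the coordinator's state has reached it.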

82
Hybrid
  • What happens if we only use update logging?
  • What happens if we only use checkpointing?

83
Message Logging
  • Earlier, we mentioned that logging changes can be
    more efficient.
  • Assume that in a distributed system, all state
    changes are caused by the sending and receiving
    of messages (piecewise deterministic). Then we
    can just log messages.
  • These messages can then be replayed from the last
    checkpoint.
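
Replay works because the state after a checkpoint is a deterministic function of the messages delivered since then. This sketch uses a hypothetical running-total state and logs each message before delivering it.

```python
def apply(state, message):
    """Deterministic state transition (a hypothetical running total)."""
    return state + message


class LoggedProcess:
    def __init__(self):
        self.state = 0
        self.checkpoint = 0
        self.log = []  # messages delivered since the last checkpoint

    def deliver(self, message):
        self.log.append(message)  # log first, then deliver
        self.state = apply(self.state, message)

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()

    def recover(self):
        state = self.checkpoint
        for message in self.log:  # replay in the original delivery order
            state = apply(state, message)
        self.state = state


p = LoggedProcess()
p.deliver(1)
p.take_checkpoint()
p.deliver(2)
p.deliver(4)
p.state = None  # simulated crash: the in-memory state is lost
p.recover()
print(p.state)
```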

84
Message Replay
[Figure: message replay. After recovering from checkpoint C1, the logged messages M1, M2, and M4 are replayed to reproduce states S1, S2, and S3.]
  • How do we save the messages?

85
Pessimistic vs. Optimistic Logging
  • To replay messages, we need to save them.
  • We can save all messages to stable storage before
    we deliver to application. Upon crash, just
    replay since last checkpoint.
  • Disadvantages?
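  • We can save all messages asynchronously. However,
    a process may crash before a message it received
    has been logged, after already sending a message
    that depends on it to another process R.
  • So R must be rolled back also, which can be
    expensive.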
  • In the synchronous method, we pay a (smaller)
    cost up front.
  • In the asynchronous method, we pay a (bigger)
    cost upon failure.

86
Recovery Lines Revisited
[Figure: timelines of P1 and P2 with checkpoints C1-1 and C2-1]
  • Is the above a valid recovery line? If not, can
    we use logging to make it valid?