Distributed Systems Course Distributed transactions - PowerPoint PPT Presentation

Title: Figure 15.1 A distributed multimedia system Author: George Coulouris Last modified by: czhang Created Date: 6/18/2000 9:59:47 PM Document presentation format – PowerPoint PPT presentation

Slides: 52
Provided by: George508
Learn more at: http://users.cis.fiu.edu


Distributed Systems Course: Distributed transactions
  • 13.1 Introduction
  • 13.2 Flat and nested distributed transactions
  • 13.3 Atomic commit protocols
  • 13.4 Concurrency control in distributed transactions
  • 13.5 Distributed deadlocks
  • 13.6 Transaction recovery

Commitment of distributed transactions
  • a distributed transaction refers to a flat or
    nested transaction that accesses objects managed
    by multiple servers
  • When a distributed transaction comes to an end,
    either all of the servers commit the transaction
    or all of them abort the transaction.
  • one of the servers is the coordinator; it must
    ensure the same outcome at all of the servers.
  • the two-phase commit protocol is the most
    commonly used protocol for achieving this

Distributed transactions
In a nested transaction, the top-level
transaction can open subtransactions, and each
subtransaction can open further subtransactions
down to any depth of nesting
In the nested case, subtransactions at the same
level can run concurrently, so T1 and T2 are
concurrent, and as they invoke objects in
different servers, they can run in parallel.
A flat client transaction completes each of its
requests before going on to the next one.
Therefore, each transaction accesses servers'
objects sequentially.

Nested banking transaction
requests can be run in parallel - with several
servers, the nested transaction is more efficient
  • client transfers 10 from A to C and then
    transfers 20 from B to D

The coordinator of a flat distributed transaction
Why might a participant abort a transaction?
  • Servers execute requests in a distributed
    transaction
  • when it commits they must communicate with one
    another to coordinate their actions
  • a client starts a transaction by sending an
    openTransaction request to a coordinator in any
    server (next slide)
  • it returns a TID unique in the distributed
    system (e.g. server ID + local transaction number)
  • at the end, it will be responsible for committing
    or aborting it
  • each server managing an object accessed by the
    transaction is a participant - it joins the
    transaction (next slide)
  • a participant keeps track of objects involved in
    the transaction
  • at the end it cooperates with the coordinator in
    carrying out the commit protocol
  • note that a participant can call abortTransaction
    in coordinator

A flat distributed banking transaction
A client's (flat) banking transaction involves
accounts A, B, C and D at servers BranchX,
BranchY and BranchZ
Each server is shown with a participant, which
joins the transaction by invoking the join method
in the coordinator
  • Note that the TID (T) is passed with each request
    e.g. withdraw(T,3)

The join operation
  • The interface for Coordinator is shown in Figure
  • it has openTransaction, closeTransaction and
    abortTransaction
  • openTransaction returns a TID which is passed
    with each operation so that servers know which
    transaction is accessing their objects
  • The Coordinator interface provides an additional
    method, join, which is used whenever a new
    participant joins the transaction
  • join(Trans, reference to participant)
  • informs a coordinator that a new participant has
    joined the transaction Trans.
  • the coordinator records the new participant in
    its participant list.
  • the fact that the coordinator knows all the
    participants and each participant knows the
    coordinator will enable them to collect the
    information that will be needed at commit time.
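
The join operation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class layout, the snake_case method names and the in-memory dictionary are hypothetical, and only the operation names (openTransaction, join) and the "server ID + local number" TID format come from the slides.

```python
import itertools


class Coordinator:
    """Hypothetical in-memory coordinator (not the textbook's code)."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._counter = itertools.count(1)
        self.participants = {}  # TID -> list of participant references

    def open_transaction(self):
        # TID = (server ID, local transaction number): unique in the system
        tid = (self.server_id, next(self._counter))
        self.participants[tid] = []
        return tid

    def join(self, tid, participant):
        # Record a new participant; a repeated join by the same server is ignored
        if participant not in self.participants[tid]:
            self.participants[tid].append(participant)


coordinator = Coordinator("BranchX")
t = coordinator.open_transaction()
for branch in ("BranchX", "BranchY", "BranchZ", "BranchY"):
    coordinator.join(t, branch)
print(t, coordinator.participants[t])
```

At commit time the coordinator would walk this participant list to run the commit protocol.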

Atomic commit protocols
  • transaction atomicity requires that at the end,
  • either all of its operations are carried out or
    none of them.
  • in a distributed transaction, the client has
    requested the operations at more than one server
  • one-phase atomic commit protocol
  • the coordinator tells the participants whether to
    commit or abort
  • what is the problem with that?
  • this does not allow one of the servers to decide
    to abort - it may have discovered a deadlock or
    it may have crashed and been restarted
  • two-phase atomic commit protocol
  • is designed to allow any participant to choose to
    abort a transaction
  • phase 1 - each participant votes. If it votes to
    commit, it is prepared. It cannot change its
    mind. In case it crashes, it must save updates in
    permanent store
  • phase 2 - the participants carry out the joint
    decision

The decision could be commit or abort -
participants record it in permanent store

Failure model for the commit protocols
  • Recall the failure model for transactions in
    Chapter 12
  • this applies to the two-phase commit protocol
  • Commit protocols are designed to work in an
    asynchronous system (e.g. messages may take a
    very long time) in which:
  • servers may crash
  • messages may be lost.
  • assume corrupt and duplicated messages are
    removed
  • no byzantine faults - servers either crash or
    they obey their requests
  • 2PC is an example of a protocol for reaching a
    consensus
  • Chapter 11 says consensus cannot be reached in an
    asynchronous system if processes sometimes fail.
  • however, 2PC does reach consensus under those
    conditions
  • because crash failures of processes are masked by
    replacing a crashed process with a new process
    whose state is set from information saved in
    permanent storage and information held by other
    processes
The two-phase commit protocol
Why does participant record updates in permanent
storage at this stage?
How many messages are sent between the
coordinator and each participant?
  • During the progress of a transaction, the only
    communication between coordinator and participant
    is the join request
  • The client request to commit or abort goes to the
    coordinator
  • if client or participant request abort, the
    coordinator informs the participants immediately
  • if the client asks to commit, the 2PC comes into
    use
  • 2PC
  • voting phase: coordinator asks all participants
    if they can commit
  • if yes, participant records updates in permanent
    storage and then votes
  • completion phase: coordinator tells all
    participants to commit or abort
  • the next slide shows the operations used in
    carrying out the protocol

Operations for two-phase commit protocol
This is a request with a reply
Asynchronous request
  • participant interface - canCommit?, doCommit,
    doAbort
  • coordinator interface - haveCommitted,
    getDecision

The two-phase commit protocol
  • Phase 1 (voting phase)
  • 1. The coordinator sends a canCommit? request to
    each of the participants in the transaction.
  • 2. When a participant receives a canCommit?
    request it replies with its vote (Yes or No) to
    the coordinator. Before voting Yes, it prepares
    to commit by saving objects in permanent storage.
    If the vote is No the participant aborts
  • Phase 2 (completion according to outcome of
    vote)
  • 3. The coordinator collects the votes (including
    its own).
  • (a) If there are no failures and all the votes are
    Yes the coordinator decides to commit the
    transaction and sends a doCommit request to each
    of the participants.
  • (b) Otherwise the coordinator decides to abort the
    transaction and sends doAbort requests to all
    participants that voted Yes.
  • 4. Participants that voted Yes are waiting for a
    doCommit or doAbort request from the coordinator.
    When a participant receives one of these messages
    it acts accordingly and in the case of commit,
    makes a haveCommitted call as confirmation to the
    coordinator.
Communication in two-phase commit protocol
Think about the coordinator in step 1 - what is
the problem?
Think about step 2 - what is the problem for the
participant?
Think about participant before step 2 - what is
the problem?
  • Time-out actions in the 2PC
  • to avoid blocking forever when a process crashes
    or a message is lost
  • uncertain participant (step 2) has voted yes. it
    can't decide on its own
  • it uses getDecision method to ask coordinator
    about outcome
  • participant has carried out client requests, but
    has not had a canCommit? from the coordinator. It
    can abort unilaterally
  • coordinator delayed in waiting for votes (step
    1). It can abort and send doAbort to
    participants
Performance of the two-phase commit protocol
  • if there are no failures, the 2PC involving N
    participants requires
  • N canCommit? messages and replies, followed by
    N doCommit messages.
  • the cost in messages is proportional to 3N, and
    the cost in time is three rounds of messages.
  • The haveCommitted messages are not counted
  • there may be arbitrarily many server and
    communication failures
  • 2PC is guaranteed to complete eventually, but
    it is not possible to specify a time limit within
    which it will be completed
  • delays to participants in uncertain state
  • some 3PCs designed to alleviate such delays
  • they require more messages and more rounds for
    the normal case

13.3.2 Two-phase commit protocol for nested transactions
  • Recall Fig 13.1b, top-level transaction T and
    subtransactions T1, T2, T11, T12, T21, T22
  • A subtransaction starts after its parent and
    finishes before it
  • When a subtransaction completes, it makes an
    independent decision either to commit
    provisionally or to abort.
  • A provisional commit is not the same as being
    prepared: it is a local decision and is not
    backed up on permanent storage.
  • If the server crashes subsequently, its
    replacement will not be able to carry out a
    provisional commit.
  • A two-phase commit protocol is needed for nested
    transactions
  • it allows servers of provisionally committed
    transactions that have crashed to abort them when
    they recover.

Figure 13.7 Operations in coordinator for nested transactions
The TID of a subtransaction is an extension of
its parent's TID, so that a subtransaction can
work out the TID of the top-level transaction.
The client finishes a set of nested transactions
by calling closeTransaction or abortTransaction
in the top-level transaction.
openSubTransaction(trans) -> subTrans
    Opens a new subtransaction whose parent is trans and
    returns a unique subtransaction identifier.
getStatus(trans) -> committed, aborted, provisional
    Asks the coordinator to report on the status of the
    transaction trans. Returns values representing
    one of the following: committed, aborted, provisional.
  • This is the interface of the coordinator of a
    subtransaction
  • It allows it to open further subtransactions
  • It allows its subtransactions to enquire about
    its status
  • Client starts by using openTransaction to open a
    top-level transaction.
  • This returns a TID for the top-level transaction
  • The TID can be used to open a subtransaction
  • The subtransaction automatically joins the parent
    and a TID is returned.

Transaction T decides whether to commit
T12 has provisionally committed and T11 has
aborted, but the fate of T12 depends on its
parent T1 and eventually on the top-level
transaction, T.
Although T21 and T22 have both provisionally
committed, T2 has aborted and this means that T21
and T22 must also abort.
Suppose that T decides to commit although T2 has
aborted, also that T1 decides to commit although
T11 has aborted
Figure 13.8
  • Recall that
  • A parent can commit even if a subtransaction
    aborts
  • If a parent aborts, then its subtransactions must
    abort
  • In the figure, each subtransaction has either
    provisionally committed or aborted

Information held by coordinators of nested
transactions
  • When a top-level transaction commits it carries
    out a 2PC
  • Each coordinator has a list of its
    subtransactions
  • At provisional commit, a subtransaction reports
    its status and the status of its descendants to
    its parent
  • If a subtransaction aborts, it tells its parent

Figure 13.9

T12 and T21 share a coordinator as they both run
at server N
When T2 is aborted it tells T (no information
about descendants)
A subtransaction (e.g. T21 and T22) is called an
orphan if one of its ancestors aborts
an orphan uses getStatus to ask its parent about
the outcome. It should abort if its parent has
aborted
canCommit? for hierarchic two-phase commit protocol
canCommit?(trans, subTrans) -> Yes / No
    Call from a coordinator to the coordinator of a child
    subtransaction to ask whether it can commit the
    subtransaction subTrans. The first argument trans
    is the transaction identifier of the top-level
    transaction. Participant replies with its vote
    Yes / No.
Figure 13.10
  • Top-level transaction is coordinator of 2PC.
  • participant list
  • the coordinators of all the subtransactions that
    have provisionally committed
  • but do not have an aborted ancestor
  • E.g. T, T1 and T12 in Figure 13.8
  • if they vote yes, they prepare to commit by
    saving state in permanent store
  • The state is marked as belonging to the top-level
    transaction
  • The 2PC may be performed in a hierarchic or a
    flat manner

Hierarchic 2PC - T asks canCommit? to T1 and T1
asks canCommit? to T12
The trans argument is used when saving the
objects in permanent storage
The subTrans argument is used to find the
subtransaction to vote on. If absent, vote no.

canCommit? for flat two-phase commit protocol
Compare the advantages and disadvantages of the
flat and nested approaches
canCommit?(trans, abortList) -> Yes / No
    Call from coordinator to participant to ask whether it
    can commit a transaction. Participant replies
    with its vote Yes / No.
Figure 13.11
  • Flat 2PC
  • the coordinator of the top-level transaction
    sends canCommit? messages to the coordinators of
    all of the subtransactions in the provisional
    commit list.
  • in our example, T sends to the coordinators of T1
    and T12.
  • the trans argument is the TID of the top-level
    transaction
  • the abortList argument gives all aborted
    subtransactions
  • e.g. server N has T12 prov committed and T21
  • On receiving canCommit?, the participant
  • looks in list of transactions for any that match
    trans (e.g. T12 and T21 at N)
  • it prepares any that have provisionally committed
    and are not in abortList and votes yes
  • if it can't find any it votes no
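
The participant's side of the flat canCommit? can be sketched as follows. The sketch assumes, hypothetically, that a TID is a tuple that extends its parent's tuple (so T is ("T",), T1 is ("T", 1), T12 is ("T", 1, 2)); the function names and the status strings are also assumptions made for the illustration.

```python
def aborted_or_orphan(tid, abort_set):
    # A TID extends its parent's TID, so the transaction itself and
    # every ancestor appear as prefixes of the TID
    return any(tid[:n] in abort_set for n in range(1, len(tid) + 1))


def can_commit(trans, abort_list, local_status):
    """Vote on top-level transaction `trans` at one server.

    local_status maps subtransaction TID -> "provisional" or "aborted".
    Returns (vote, list of subtransactions prepared).
    """
    abort_set = set(abort_list)
    prepared = [
        tid for tid, status in local_status.items()
        if tid[:len(trans)] == trans          # belongs to trans
        and status == "provisional"
        and not aborted_or_orphan(tid, abort_set)
    ]
    # Vote Yes only if something was found to prepare
    return (len(prepared) > 0, prepared)


T = ("T",)
abort_list = [("T", 1, 1), ("T", 2)]            # T11 and T2 aborted
server_n = {("T", 1, 2): "provisional",        # T12
            ("T", 2, 1): "provisional"}        # T21, whose parent T2 aborted
vote_n, prepared_n = can_commit(T, abort_list, server_n)
vote_other, _ = can_commit(T, abort_list, {("T", 2, 2): "provisional"})
print(vote_n, prepared_n, vote_other)
```

Because abortList travels with the request, each server can discard orphans locally without asking their parents.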

Time-out actions in nested 2PC
  • With nested transactions delays can occur in the
    same three places as before
  • when a participant is prepared to commit
  • when a participant has finished but has not yet
    received canCommit?
  • when a coordinator is waiting for votes
  • Fourth place
  • provisionally committed subtransactions of
    aborted subtransactions e.g. T22 whose parent T2
    has aborted
  • use getStatus on parent, whose coordinator should
    remain active for a while
  • If parent does not reply, then abort

Summary of 2PC
  • a distributed transaction involves several
    different servers.
  • A nested transaction structure allows
  • additional concurrency and
  • independent committing by the servers in a
    distributed transaction.
  • atomicity requires that the servers participating
    in a distributed transaction either all commit it
    or all abort it.
  • atomic commit protocols are designed to achieve
    this effect, even if servers crash during their
    execution
  • the 2PC protocol allows a server to abort
    unilaterally
  • it includes timeout actions to deal with delays
    due to servers crashing.
  • 2PC protocol can take an unbounded amount of time
    to complete but is guaranteed to complete
    eventually

13.4 Concurrency control in distributed transactions
  • Each server manages a set of objects and is
    responsible for ensuring that they remain
    consistent when accessed by concurrent
    transactions
  • therefore, each server is responsible for
    applying concurrency control to its own objects.
  • the members of a collection of servers of
    distributed transactions are jointly responsible
    for ensuring that they are performed in a
    serially equivalent manner
  • therefore if transaction T is before transaction
    U in their conflicting access to objects at one
    of the servers then they must be in that order at
    all of the servers whose objects are accessed in
    a conflicting manner by both T and U

13.4.1 Locking
  • In a distributed transaction, the locks on an
    object are held by the server that manages it.
  • The local lock manager decides whether to grant a
    lock or make the requesting transaction wait.
  • it cannot release any locks until it knows that
    the transaction has been committed or aborted at
    all the servers involved in the transaction.
  • the objects remain locked and are unavailable for
    other transactions during the atomic commit
    protocol
  • an aborted transaction releases its locks after
    phase 1 of the protocol.

Interleaving of transactions T and U at servers X
and Y
  • in the example on page 529, we have
  • T before U at server X and U before T at server Y
  • different orderings lead to cyclic dependencies
    and distributed deadlock
  • detection and resolution of distributed deadlock
    in next section

T                            U
Write(A) at X locks A
                             Write(B) at Y locks B
Read(B) at Y waits for U
                             Read(A) at X waits for T

13.4.2 Timestamp ordering concurrency control
  • Single server transactions
  • coordinator issues a unique timestamp to each
    transaction before it starts
  • serial equivalence ensured by committing objects
    in order of timestamps
  • Distributed transactions
  • the first coordinator accessed by a transaction
    issues a globally unique timestamp
  • as before the timestamp is passed with each
    object access
  • the servers are jointly responsible for ensuring
    serial equivalence
  • that is if T accesses an object before U, then T is
    before U at all objects
  • coordinators agree on timestamp ordering
  • a timestamp consists of a pair <local timestamp,
    server-id>
  • the agreed ordering of pairs of timestamps is
    based on a comparison in which the server-id part
    is less significant - they should relate to time
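
The agreed ordering of <local timestamp, server-id> pairs can be illustrated with a Python named tuple, whose built-in comparison is lexicographic and therefore treats the first field as more significant. The field names and the sample values are assumptions chosen for the example.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    local_time: int   # more significant part: relates to time
    server_id: str    # less significant part: only breaks ties


stamps = [Timestamp(12, "Y"), Timestamp(10, "Z"), Timestamp(12, "X")]
ordered = sorted(stamps)
print(ordered)
```

Every server that sorts the same pairs obtains the same order, whether or not the local clocks are synchronized.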

Timestamp ordering concurrency control (continued)
Can the same ordering be achieved at all servers
without clock synchronization?
Why is it better to have roughly synchronized
clocks?
  • The same ordering can be achieved at all servers
    even if their clocks are not synchronized
  • for efficiency it is better if local clocks are
    roughly synchronized
  • then the ordering of transactions corresponds
    roughly to the real time order in which they were
    started
  • Timestamp ordering
  • conflicts are resolved as each operation is
    performed
  • if this leads to an abort, the coordinator will
    be informed
  • it will abort the transaction at the participants
  • any transaction that reaches the client request
    to commit should always be able to do so
  • participant will normally vote yes
  • unless it has crashed and recovered during the
    transaction

Optimistic concurrency control
Use backward validation
1. write/read, 2. read/write, 3. write/write
  • each transaction is validated before it is
    allowed to commit
  • transaction numbers assigned at start of
    validation
  • transactions serialized according to transaction
    numbers
  • validation takes place in phase 1 of 2PC protocol
  • consider the following interleavings of T and U
  • T before U at X and U before T at Y
  1. satisfied
  2. checked
  3. parallel

Suppose T and U start validation at about the same
time
T                    U
Read(A) at X         Read(B) at Y
Write(A)             Write(B)
Read(B) at Y         Read(A) at X
Write(B)             Write(A)
X does T first       Y does U first
No parallel validation: commitment deadlock

Commitment deadlock in optimistic concurrency
control
  • servers of distributed transactions do parallel
    validation
  • therefore rule 3 must be validated as well as
    rule 2
  • the write set of Tv is checked for overlaps with
    write sets of earlier transactions
  • this prevents commitment deadlock
  • it also avoids delaying the 2PC protocol
  • another problem - independent servers may
    schedule transactions in different orders
  • e.g. T before U at X and U before T at Y
  • this must be prevented - some hints as to how on
    page 531

13.5 Distributed deadlocks
  • Single server transactions can experience
    deadlocks
  • prevent or detect and resolve
  • use of timeouts is clumsy, detection is
    preferable
  • it uses wait-for graphs.
  • Distributed transactions lead to distributed
    deadlocks
  • in theory can construct global wait-for graph
    from local ones
  • a cycle in a global wait-for graph that is not in
    local ones is a distributed deadlock

Figure 13.12 Interleavings of transactions U, V
and W
  • objects A, B managed by X and Y; C and D by Z
  • next slide has global wait-for graph

U → V at Y
V → W at Z
W → U at X

Figure 13.13 Distributed deadlock
  • a deadlock cycle has alternate edges showing
    wait-for and held-by
  • wait-for added in order U → V at Y; V → W at Z
    and W → U at X


Deadlock detection - local wait-for graphs
  • Local wait-for graphs can be built, e.g.
  • server Y: U → V added when U requests
  • server Z: V → W added when V requests
  • server X: W → U added when W requests
  • to find a global cycle, communication between the
    servers is needed
  • centralized deadlock detection
  • one server takes on role of global deadlock
    detector
  • the other servers send it their local graphs from
    time to time
  • it detects deadlocks, makes decisions about which
    transactions to abort and informs the other
    servers
  • usual problems of a centralized service - poor
    availability, lack of fault tolerance and no
    ability to scale
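
Centralized detection amounts to merging the local wait-for graphs and searching the result for a cycle. The sketch below assumes each local graph is a dictionary from a waiting transaction to the set of transactions it waits for; the function names are hypothetical and the depth-first search is a simple one, adequate only for small graphs.

```python
def merge(local_graphs):
    # Union of the local wait-for graphs collected by the global detector
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def find_cycle(graph):
    # Depth-first search; returns one cycle as a list of transactions, or None
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


at_x = {"W": {"U"}}   # W → U at X
at_y = {"U": {"V"}}   # U → V at Y
at_z = {"V": {"W"}}   # V → W at Z
local_cycles = [find_cycle(g) for g in (at_x, at_y, at_z)]
global_cycle = find_cycle(merge([at_x, at_y, at_z]))
print(local_cycles, global_cycle)
```

No single server sees a cycle, but the merged graph does, which is exactly what makes the deadlock distributed.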

Figure 13.14 Local and global wait-for graphs
  • Phantom deadlocks
  • a deadlock that is detected, but is not really
    one
  • happens when there appears to be a cycle, but one
    of the transactions has released a lock, due to
    time lags in distributing graphs
  • in the figure suppose U releases the object at X
    then waits for V at Y
  • and the global detector gets Y's graph before X's
    (T → U → V → T)

Edge chasing - a distributed approach to deadlock
detection
  • a global graph is not constructed, but each
    server knows about some of the edges
  • servers try to find cycles by sending probes
    which follow the edges of the graph through the
    distributed system
  • when should a server send a probe (go back to Fig
    13.13)
  • edges were added in order U → V at Y; V → W at Z
    and W → U at X
  • when W → U at X was added, U was waiting, but
  • when V → W at Z was added, W was not waiting
  • send a probe when an edge T1 → T2 is added and
    T2 is waiting
  • each coordinator records whether its transactions
    are active or waiting
  • the local lock manager tells coordinators if
    transactions start/stop waiting
  • when a transaction is aborted to break a
    deadlock, the coordinator tells the participants,
    locks are removed and edges taken from wait-for
    graphs

Edge-chasing algorithms
  • Three steps
  • Initiation
  • When a server notes that T starts waiting for U,
    where U is waiting at another server, it
    initiates detection by sending a probe containing
    the edge <T → U> to the server where U is
    waiting
  • If U is sharing a lock, probes are sent to all
    the holders of the lock.
  • Detection
  • Detection consists of receiving probes and
    deciding whether deadlock has occurred and
    whether to forward the probes.
  • e.g. when server receives probe <T → U> it
    checks if U is waiting, e.g. U → V, if so it
    forwards <T → U → V> to server where V waits
  • when a server adds a new edge, it checks whether
    a cycle is there
  • Resolution
  • When a cycle is detected, a transaction in the
    cycle is aborted to break the deadlock.
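
The detection step can be sketched as a loop that extends a probe one edge at a time. For brevity the sketch keeps all wait-for edges in one dictionary and assumes each transaction waits for at most one other; in a real system each edge is known only at one server and every loop iteration would be a message between servers. The function name is hypothetical.

```python
def chase(probe, waits_for):
    """Forward a probe (a list of transactions) along wait-for edges.

    Returns the path that closes a cycle, or None if the probe
    reaches a transaction that is not waiting.
    """
    while True:
        last = probe[-1]
        if last not in waits_for:       # last transaction is active: stop
            return None
        nxt = waits_for[last]
        if nxt in probe:                # the new edge closes a cycle
            return probe[probe.index(nxt):] + [nxt]
        probe = probe + [nxt]           # forward the extended probe


waits_for = {"U": "V", "V": "W", "W": "U"}
cycle = chase(["W", "U"], waits_for)              # X initiates with <W → U>
no_cycle = chase(["U", "V"], {"U": "V", "V": "W"})  # W is not waiting
print(cycle, no_cycle)
```

Once the cycle is reported, the resolution step picks one transaction in it to abort.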

Figure 13.15 Probes transmitted to detect deadlock
  • example of edge chasing starts with X sending
    <W → U>, then Y sends <W → U → V>, then Z sends
    <W → U → V → W>

Edge chasing conclusion
  • probe to detect a cycle with N transactions will
    require 2(N-1) messages.
  • Studies of databases show that the average
    deadlock involves 2 transactions.
  • the above algorithm detects deadlock provided
  • waiting transactions do not abort
  • no process crashes, no lost messages
  • to be realistic it would need to allow for the
    above failures
  • refinements of the algorithm (p 536-7)
  • to avoid more than one transaction causing
    detection to start and then more than one being
    aborted
  • not time to study these now

Figure 13.16 Two probes initiated

Figure 13.17 Probes travel downhill

Summary of concurrency control for distributed
transactions
  • each server is responsible for the
    serializability of transactions that access its
    own objects.
  • additional protocols are required to ensure that
    transactions are serializable globally.
  • timestamp ordering requires a globally agreed
    timestamp ordering
  • optimistic concurrency control requires global
    validation or a means of forcing a global
    ordering on transactions.
  • two-phase locking can lead to distributed
    deadlocks
  • distributed deadlock detection looks for cycles
    in the global wait-for graph.
  • edge chasing is a non-centralized approach to the
    detection of distributed deadlocks

13.6 Transaction recovery
What is meant by durability?
What is meant by failure atomicity?
  • Atomicity property of transactions
  • durability and failure atomicity
  • durability requires that objects are saved in
    permanent storage and will be available
  • failure atomicity requires that effects of
    transactions are atomic even when the server
    crashes
  • Recovery is concerned with
  • ensuring that a server's objects are durable and
  • that the service provides failure atomicity.
  • for simplicity we assume that when a server is
    running, all of its objects are in volatile
    memory
  • and all of its committed objects are in a
    recovery file in permanent storage
  • recovery consists of restoring the server with
    the latest committed versions of all of its
    objects from its recovery file

Recovery manager
  • The task of the Recovery Manager (RM) is
  • to save objects in permanent storage (in a
    recovery file) for committed transactions
  • to restore the server's objects after a crash
  • to reorganize the recovery file to improve the
    performance of recovery
  • to reclaim storage space (in the recovery file).
  • media failures
  • i.e. disk failures affecting the recovery file
  • need another copy of the recovery file on an
    independent disk. e.g. implemented as stable
    storage or using mirrored disks
  • we deal with recovery of 2PC separately (at the
    end)
  • we study logging (13.6.1) but not shadow versions

Recovery - intentions lists
  • Each server records an intentions list for each
    of its currently active transactions
  • an intentions list contains a list of the object
    references and the values of all the objects that
    are altered by a transaction
  • when a transaction commits, the intentions list
    is used to identify the objects affected
  • the committed version of each object is replaced
    by the tentative one
  • the new value is written to the server's recovery
    file
  • in 2PC, when a participant says it is ready to
    commit, its RM must record its intentions list
    and its objects in the recovery file
  • it will be able to commit later on even if it
    crashes
  • when a client has been told a transaction has
    committed, the recovery files of all
    participating servers must show that the
    transaction is committed,
  • even if they crash between prepare to commit and
    commit

Types of entry in a recovery file
Why is that a good idea?
Object state flattened to bytes
first entry says prepared
  • For distributed transactions we need information
    relating to the 2PC as well as object values,
    that is
  • transaction status (committed, prepared or
    aborted)
  • intentions list

Logging - a technique for the recovery file
  • the recovery file represents a log of the history
    of all the transactions at a server
  • it includes objects, intentions lists and
    transaction status
  • in the order that transactions prepared,
    committed and aborted
  • a recent snapshot + a history of transactions
    after the snapshot
  • during normal operation the RM is called whenever
    a transaction prepares, commits or aborts
  • prepare - RM appends to recovery file all the
    objects in the intentions list followed by status
    (prepared) and the intentions list
  • commit/abort - RM appends to recovery file the
    corresponding status
  • assume append operation is atomic, if server
    fails only the last write will be incomplete
  • to make efficient use of disk, buffer writes.
    Note sequential writes are more efficient than
    those to random locations
  • committed status is forced to the log - in case
    server crashes
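
The appends made at prepare, commit and abort can be sketched with an in-memory list standing in for the recovery file. The class name, the tuple layout of the entries and the use of list positions as pointers are all assumptions made for the illustration; a real recovery manager would write to disk and force the committed status out.

```python
class RecoveryFile:
    """Hypothetical append-only log kept in a Python list instead of on disk."""

    def __init__(self):
        self.entries = []
        self._last_status = None   # position of the previous status entry

    def _append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1

    def _append_status(self, tid, status, intentions=None):
        # Each status entry points back to the previous status entry
        entry = ("status", tid, status, intentions, self._last_status)
        self._last_status = self._append(entry)

    def prepare(self, tid, updates):
        # Objects first, then the prepared status with the intentions list
        intentions = [(name, self._append(("object", name, value)))
                      for name, value in updates.items()]
        self._append_status(tid, "prepared", intentions)

    def commit(self, tid):
        self._append_status(tid, "committed")

    def abort(self, tid):
        self._append_status(tid, "aborted")


log = RecoveryFile()
log.prepare("T", {"A": 80, "B": 220})
log.commit("T")
log.prepare("U", {"B": 242, "C": 278})
for position, entry in enumerate(log.entries):
    print(position, entry)
```

The intentions list stores positions rather than values, so each object value is written only once.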

Log for banking service
  • Logging mechanism for Fig 12.7 (there would
    really be other objects in log file)
  • initial balances of A, B and C: 100, 200, 300
  • T sets A and B to 80 and 220. U sets B and C
    to 242 and 278
  • entries to left of line represent a snapshot
    (checkpoint) of values of A, B and C before T
    started. T has committed, but U is prepared.
  • the RM gives each object a unique identifier (A,
    B, C in diagram)
  • each status entry contains a pointer to the
    previous status entry, so that the RM can
    follow transactions backwards through the file

Recovery of objects - with logging
  • When a server is replaced after a crash
  • it first sets default initial values for its
    objects
  • and then hands over to its recovery manager.
  • The RM restores the server's objects to include
  • all the effects of all the committed transactions
    in the correct order and
  • none of the effects of incomplete or aborted
    transactions
  • it reads the recovery file backwards (by
    following the pointers)
  • restores values of objects with values from
    committed transactions
  • continuing until all of the objects have been
    restored
  • if it started at the beginning, there would
    generally be more work to do
  • to recover the effects of a transaction use the
    intentions list to find the value of the objects
  • e.g. look at previous slide (assuming the server
    crashed before T committed)
  • the recovery procedure must be idempotent
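
Backward recovery can be sketched over a small hand-written log. The entry layout is an assumption: object entries are ("object", name, value) and status entries are ("status", TID, status, intentions list, position of previous status entry). The checkpointed balances from the banking example are passed in as the starting values, which is a simplification of the snapshot in the figure.

```python
def recover(entries, initial):
    """Rebuild object values by following status entries backwards."""
    objects = dict(initial)
    restored = set()
    outcome = {}   # TID -> "committed" or "aborted", seen later in the log
    pos = max((i for i, e in enumerate(entries) if e[0] == "status"),
              default=None)
    while pos is not None and restored != set(objects):
        _, tid, status, intentions, prev = entries[pos]
        if status in ("committed", "aborted"):
            outcome[tid] = status
        elif status == "prepared" and outcome.get(tid) == "committed":
            for name, where in intentions:
                if name not in restored:   # a later committed value wins
                    objects[name] = entries[where][2]
                    restored.add(name)
        pos = prev
    return objects


entries = [
    ("object", "A", 80),
    ("object", "B", 220),
    ("status", "T", "prepared", [("A", 0), ("B", 1)], None),
    ("status", "T", "committed", None, 2),
    ("object", "B", 242),
    ("object", "C", 278),
    ("status", "U", "prepared", [("B", 4), ("C", 5)], 3),
]
balances = recover(entries, {"A": 100, "B": 200, "C": 300})
print(balances)
```

U is only prepared, so its values for B and C are ignored; running recover again on the same log gives the same result, which is the idempotence the slide asks for.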

Logging - reorganising the recovery file
  • RM is responsible for reorganizing its recovery
    file
  • so as to make the process of recovery faster and
  • to reduce its use of space
  • checkpointing
  • the process of writing the following to a new
    recovery file
  • the current committed values of a server's
    objects
  • transaction status entries and intentions lists
    of transactions that have not yet been fully
    resolved
  • including information related to the two-phase
    commit protocol (see later)
  • checkpointing makes recovery faster and saves
    disk space
  • done after recovery and from time to time
  • can use old recovery file until new one is ready,
    add a mark to old file
  • do as above and then copy items after the mark to
    new recovery file
  • replace old recovery file by new recovery file

Figure 13.20 Shadow versions
Recovery of the two-phase commit protocol
  • The above recovery scheme is extended to deal
    with transactions doing the 2PC protocol when a
    server fails
  • it uses new transaction status values: done,
    uncertain (see Fig 13.6)
  • the coordinator uses committed when the outcome
    is commit
  • done when 2PC complete (if a transaction is done
    its information may be removed when reorganising
    the recovery file)
  • the participant uses uncertain when it has voted
    Yes; committed when told the result (uncertain
    entries must not be removed from recovery file)
  • It also requires two additional types of entry

Type of entry   Description of contents of entry
Coordinator     Transaction identifier, list of participants;
                added by RM when coordinator prepared
Participant     Transaction identifier, coordinator;
                added by RM when participant votes yes

Log with entries relating to two-phase commit
protocol
Start at end, for U find it is committed and a
participant
We have T committed and coordinator
But if the server has crashed before the last
entry we have U uncertain and participant
or if the server crashed earlier we have U
prepared and participant
  • entries in log for
  • T where server is coordinator (prepared comes
    first, followed by the coordinator entry, then
    committed done is not shown)
  • and U where server is participant (prepared comes
    first followed by the participant entry, then
    uncertain and finally committed)
  • these entries will be interspersed with values of
    objects
  • recovery must deal with 2PC entries as well as
    restoring objects
  • where server was coordinator find coordinator
    entry and status entries.
  • where server was participant find participant
    entry and status entries

Recovery of the two-phase commit protocol
the most recent entry in the recovery file
determines the status of the transaction at the
time of failure
the RM action for each transaction depends on
whether server was coordinator or participant and
the status

Figure 13.23 Nested transactions
Summary of transaction recovery
  • Transaction-based applications have strong
    requirements for the long life and integrity of
    the information stored.
  • Transactions are made durable by performing
    checkpoints and logging in a recovery file, which
    is used for recovery when a server is replaced
    after a crash.
  • Users of a transaction service would experience
    some delay during recovery.
  • It is assumed that the servers of distributed
    transactions exhibit crash failures and run in an
    asynchronous system,
  • but they can reach consensus about the outcome of
    transactions because crashed servers are replaced
    with new processes that can acquire all the
    relevant information from permanent storage or
    from other servers
