Distributed Systems: - PowerPoint PPT Presentation

About This Presentation
Title:

Distributed Systems:

Description:

Title: No Slide Title Author: Pierre Verbaeten Last modified by: VERBAETEN Created Date: 12/29/1999 10:10:44 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 270
Provided by: PierreVe3

Transcript and Presenter's Notes

Title: Distributed Systems:


1
Distributed Systems Shared Data
2
Overview of chapters
  • Introduction
  • Co-ordination models and languages
  • General services
  • Distributed algorithms
  • Shared data
  • Ch 13 Transactions and concurrency control,
    13.1-13.4
  • Ch 14 Distributed transactions
  • Ch 15 Replication

3
Overview
  • Transactions and locks
  • Distributed transactions
  • Replication

4
Overview
  • Transactions
  • Nested transactions
  • Locks
  • Distributed transactions
  • Replication

5
Transactions Introduction
  • Environment
  • data partitioned over different servers on
    different systems
  • sequence of operations as individual unit
  • long-lived data at servers (cfr. Databases)
  • transactions approach to achieve consistency
    of data in a distributed environment

6
Transactions Introduction
  • Example

Person 1: Withdraw(A, 100); Deposit(B, 100)
Person 2: Withdraw(C, 200); Deposit(B, 200)
[Figure: accounts A, B and C]
7
Transactions Introduction
  • Critical section
  • group of instructions → indivisible block with respect to
    other critical sections
  • short duration
  • atomic operation (within a server)
  • operation is free of interference from operations
    being performed on behalf of other (concurrent)
    clients
  • concurrency in server → multiple threads
  • atomic operation ≠ critical section
  • transaction

8
Transactions Introduction
  • Critical section
  • atomic operation
  • transaction
  • group of different operations + properties
  • single transaction may contain operations on
    different servers
  • possibly long duration

ACID properties
9
Transactions ACID
  • Properties concerning the sequence of operations
    that read or modify shared data
  • Atomicity
  • Consistency
  • Isolation
  • Durability

10
Transactions ACID
  • Atomicity or the all-or-nothing property
  • a transaction either
  • commits: completes successfully, or
  • aborts: has no effect at all
  • the effect of a committed transaction
  • is guaranteed to persist
  • can be made visible to other transactions
  • transaction aborts can be initiated by
  • the system (e.g. when a node fails) or
  • a user issuing an abort command

11
Transactions ACID
  • Consistency
  • a transaction moves data from one consistent
    state to another
  • Isolation
  • no interference from other transactions
  • intermediate effects invisible to other
    transactions
  • The isolation property has 2 parts
  • serializability: running concurrent transactions
    has the same effect as some serial ordering of
    the transactions
  • failure isolation: a transaction cannot see the
    uncommitted effects of another transaction

12
Transactions ACID
  • Durability
  • once a transaction commits, the effects of the
    transaction are preserved despite subsequent
    failures

13
Transactions Life histories
  • Transactional service operations
  • OpenTransaction() → Trans
  • starts new transaction
  • returns unique identifier for transaction
  • CloseTransaction(Trans) → (Commit, Abort)
  • ends transaction
  • returns Commit if the transaction committed, else
    Abort
  • AbortTransaction(Trans)
  • aborts transaction
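
The three service operations can be sketched as a small in-memory class. This is only an illustration: the class name `TransactionService`, the integer identifiers and the status strings are assumptions, not part of the slides.

```python
import itertools


class TransactionService:
    """Toy in-memory version of the three operations (names are hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._status = {}  # transaction id -> "active" | "committed" | "aborted"

    def open_transaction(self):
        # Starts a new transaction and returns its unique identifier.
        trans = next(self._ids)
        self._status[trans] = "active"
        return trans

    def close_transaction(self, trans):
        # Commits if the transaction is still active; reports the outcome.
        if self._status[trans] == "active":
            self._status[trans] = "committed"
        return "Commit" if self._status[trans] == "committed" else "Abort"

    def abort_transaction(self, trans):
        # Only an active transaction can still be aborted.
        if self._status[trans] == "active":
            self._status[trans] = "aborted"
```

A client that calls `abort_transaction` and then `close_transaction` on the same identifier gets `"Abort"` back, which mirrors the "abort by client" life history.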

14
Transactions Life histories
  • History 1 success

T = OpenTransaction(); operation; operation; ...; operation; CloseTransaction(T)
Operations have read or write semantics
15
Transactions Life histories
  • History 2 abort by client

T = OpenTransaction(); operation; operation; ...; operation; AbortTransaction(T)
16
Transactions Life histories
  • History 3 abort by server

T = OpenTransaction(); operation; operation; ...; operation
Server aborts!
Error reported
17
Transactions Concurrency
  • Illustration of well known problems
  • the lost update problem
  • inconsistent retrievals
  • operations used and their implementations
  • Withdraw(A, n): b = A.read(); A.write(b - n)
  • Deposit(A, n): b = A.read(); A.write(b + n)

18
Transactions Concurrency
  • The lost update problem

Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Interleaved execution of operations on B → ?
23
Transactions Concurrency
  • The lost update problem

Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read(); A.write(bt - 4)
U: bu = C.read(); C.write(bu - 3)
T: bt = B.read() (bt = 200)
U: bu = B.read(); B.write(bu + 3)
T: B.write(bt + 4)
Correct: B = 207!!
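
The interleaving can be replayed in a few lines to see where the update is lost. This is a minimal sketch: the `run` helper and the starting balances of A and C are assumptions; only B = 200 comes from the slide.

```python
def run(schedule, accounts):
    """Apply (transaction, op, account, delta) steps in the order given."""
    accounts = dict(accounts)
    last_read = {}  # value most recently read by each transaction
    for trans, op, account, delta in schedule:
        if op == "read":
            last_read[trans] = accounts[account]
        else:  # "write": value read earlier by this transaction, plus delta
            accounts[account] = last_read[trans] + delta
    return accounts


# B = 200 is from the slide; the balances of A and C are assumed.
initial = {"A": 100, "B": 200, "C": 300}

interleaved = [
    ("T", "read", "A", 0), ("T", "write", "A", -4),
    ("U", "read", "C", 0), ("U", "write", "C", -3),
    ("T", "read", "B", 0),
    ("U", "read", "B", 0), ("U", "write", "B", +3),
    ("T", "write", "B", +4),  # overwrites U's update
]
# The same steps, run serially: all of T, then all of U.
serial = [s for s in interleaved if s[0] == "T"] + \
         [s for s in interleaved if s[0] == "U"]

print(run(interleaved, initial)["B"])  # 204: U's deposit of 3 is lost
print(run(serial, initial)["B"])       # 207
```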
24
Transactions Concurrency
  • The inconsistent retrieval problem

Transaction T: Withdraw(A,50); Deposit(B,50)
Transaction U: BranchTotal()
27
Transactions Concurrency
  • The inconsistent retrieval problem

Transaction T: A → B: 50
Transaction U: BranchTotal
T: bt = A.read(); A.write(bt - 50)
U: bu = A.read(); bu = bu + B.read(); bu = bu + C.read()
T: bt = B.read(); B.write(bt + 50)
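
A short replay of this schedule shows the total that U computes against the real branch total. The function name and the starting balances (200 each) are assumptions chosen for illustration; the slides give no concrete values here.

```python
def branch_total_interleaved(accounts):
    """Replay the schedule above; return (total seen by U, true total)."""
    acc = dict(accounts)
    bt = acc["A"]
    acc["A"] = bt - 50      # T: withdraw 50 from A
    bu = acc["A"]           # U: reads A after the withdrawal
    bu = bu + acc["B"]      # U: reads B before the deposit
    bu = bu + acc["C"]
    bt = acc["B"]
    acc["B"] = bt + 50      # T: deposit 50 into B
    return bu, sum(acc.values())


# Assumed starting balances.
print(branch_total_interleaved({"A": 200, "B": 200, "C": 200}))  # (550, 600)
```

U's total is 50 short because it saw A after the withdrawal but B before the deposit.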
28
Transactions Concurrency
  • Illustration of well known problems
  • the lost update problem
  • inconsistent retrievals
  • elements of solution
  • execute all transactions serially?
  • no concurrency → unacceptable
  • execute transactions in such a way that the overall
    execution is equivalent to some serial
    execution
  • sufficient? Yes
  • how? Concurrency control

33
Transactions Concurrency
  • The lost update problem: serially equivalent
    interleaving

Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read(); A.write(bt - 4)
U: bu = C.read(); C.write(bu - 3)
T: bt = B.read(); B.write(bt + 4)
U: bu = B.read(); B.write(bu + 3)
34
Transactions Recovery
  • Illustration of well known problems
  • a dirty read
  • premature write
  • operations used and their implementations
  • Withdraw(A, n): b = A.read(); A.write(b - n)
  • Deposit(A, n): b = A.read(); A.write(b + n)

35
Transactions Recovery
  • A dirty read problem

Transaction T: Deposit(A,4)
Transaction U: Deposit(A,3)
Interleaved execution and abort → ?
38
Transactions Recovery
  • A dirty read problem

Transaction T: 4 → A
Transaction U: 3 → A
T: bt = A.read(); A.write(bt + 4)
U: bu = A.read(); A.write(bu + 3)
U: Commit
T: Abort
39
Transactions Recovery
  • Premature write or Over-writing uncommitted
    values

Transaction T: Deposit(A,4)
Transaction U: Deposit(A,3)
Interleaved execution and abort → ?
42
Transactions Recovery
  • Over-writing uncommitted values

Transaction T: 4 → A
Transaction U: 3 → A
T: bt = A.read(); A.write(bt + 4)
U: bu = A.read(); A.write(bu + 3)
Abort
43
Transactions Recovery
  • Illustration of well known problems
  • a dirty read
  • premature write
  • elements of solution
  • cascading aborts: a transaction reading
    uncommitted data must be aborted if the
    transaction that modified the data aborts
  • to avoid cascading aborts, transactions can only
    read data written by committed transactions
  • undo of write operations must be possible
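
One way to make writes undoable is to record the previous value before each write and restore it on abort. The sketch below assumes that approach; the slides do not say which mechanism is used, and the `UndoableStore` class is hypothetical.

```python
class UndoableStore:
    """Data items with a per-transaction undo log (illustrative only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.undo = {}  # transaction -> list of (key, previous value)

    def write(self, trans, key, value):
        # Remember the value being overwritten so the write can be undone.
        self.undo.setdefault(trans, []).append((key, self.data[key]))
        self.data[key] = value

    def commit(self, trans):
        # Committed writes are kept; the undo records are no longer needed.
        self.undo.pop(trans, None)

    def abort(self, trans):
        # Restore previous values, most recent write first.
        for key, old in reversed(self.undo.pop(trans, [])):
            self.data[key] = old
```

Restoring previous values is only safe if no other transaction has overwritten the item in the meantime, which is exactly the premature write problem illustrated earlier.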

44
Transactions Recovery
  • how to preserve data despite subsequent failures?
  • usually by using stable storage
  • two copies of data stored
  • in separate parts of disks
  • whose decay is not related (the probability that both
    parts are corrupted is small)

45
Nested Transactions
  • Transactions composed of several
    sub-transactions
  • Why nesting?
  • Modular approach to structuring transactions in
    applications
  • means of controlling concurrency within a
    transaction
  • concurrent sub-transactions accessing shared data
    are serialized
  • a finer grained recovery from failures
  • sub-transactions fail independently

46
Nested Transactions
T Transfer
T1 Deposit
T2 Withdraw
  • Sub-transactions commit or abort independently
  • without effect on outcome of other
    sub-transactions or enclosing transactions
  • effect of sub-transaction becomes durable only
    when top-level transaction commits

47
Concurrency control locking
  • Environment
  • shared data in a single server (this section)
  • many competing clients
  • problem
  • realize transactions
  • maximize concurrency
  • solution: serial equivalence
  • difference with mutual exclusion?

48
Concurrency control locking
  • Protocols
  • Locks
  • Optimistic Concurrency Control
  • Timestamp Ordering

49
Concurrency control locking
  • Example
  • access to shared data within a transaction →
    lock (= data reserved for ...)
  • exclusive locks
  • exclude access by other transactions

50
Concurrency control locking
  • Same example (lost update) with locking

Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Colour of data shows owner of lock
61
Concurrency control locking
  • Exclusive locks

Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read()
T: A.write(bt - 4)
U: bu = C.read()
U: C.write(bu - 3)
T: bt = B.read()
U: bu = B.read() (waits until T releases its lock on B)
T: B.write(bt + 4)
T: CloseTransaction(T)
U: B.write(bu + 3)
U: CloseTransaction(U)
62
Concurrency control locking
  • Basic elements of protocol
  • serial equivalence
  • requirements
  • all of a transaction's accesses to a particular
    data item should be serialized with respect to
    accesses by other transactions
  • all pairs of conflicting operations of 2
    transactions should be executed in the same order
  • how?
  • A transaction is not allowed any new locks after
    it has released a lock
  • Two-phase locking

63
Concurrency control locking
  • Two-phase locking
  • Growing Phase
  • new locks can be acquired
  • Shrinking Phase
  • no new locks
  • locks are released
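
The two-phase rule can be enforced with a single flag per transaction. A minimal sketch, assuming a hypothetical `TwoPhaseTransaction` class that only tracks which locks are held (no waiting or conflicts are modelled):

```python
class TwoPhaseTransaction:
    """Tracks lock ownership and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def acquire(self, item):
        # Growing phase only: no new locks once any lock has been released.
        if self.shrinking:
            raise RuntimeError("two-phase rule violated: lock after release")
        self.held.add(item)

    def release(self, item):
        # The first release starts the shrinking phase.
        self.shrinking = True
        self.held.discard(item)
```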

64
Concurrency control locking
  • Basic elements of protocol
  • serial equivalence → two-phase locking
  • hide intermediate results
  • conflict between
  • release of lock → access by other transactions
    possible
  • access should be delayed till commit/abort of the
    transaction
  • how?
  • New mechanism?
  • (better) release of locks only at commit/abort
  • strict two-phase locking
  • locks held till end of transaction

65
Concurrency control locking
  • How increase concurrency and preserve serial
    equivalence?
  • Granularity of locks
  • Appropriate locking rules

66
Concurrency control locking
  • Granularity of locks
  • observations
  • large number of data items on server
  • typical transaction needs only a few items
  • conflicts unlikely
  • large granularity
  • limits concurrent access
  • example all accounts in a branch of bank are
    locked together
  • small granularity
  • overhead

67
Concurrency control locking
  • Appropriate locking rules
  • when conflicts?
  • Read Write locks

68
Concurrency control locking
  • Lock compatibility

For one data item
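
The compatibility table itself did not survive in this transcript. The sketch below assumes the usual read/write rules (read locks can be shared, a write lock excludes everything else); the names `COMPATIBLE` and `can_grant` are illustrative.

```python
# (lock already held by another transaction, lock requested) -> compatible?
# Assumed standard read/write rules; the slide's own table is not available.
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock held by others."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```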
69
Concurrency control locking
  • Strict two-phase locking
  • locking
  • done by server (containing data item)
  • unlocking
  • done by commit/abort of the transactional service

70
Concurrency control locking
  • Use of locks in strict two-phase locking
  • when an operation accesses a data item
  • not locked yet
  • lock is set; operation proceeds
  • conflicting lock set by another transaction
  • transaction must wait till ...
  • non-conflicting lock set by another transaction
  • lock is shared; operation proceeds
  • locked by same transaction
  • lock promoted if necessary; operation proceeds

71
Concurrency control locking
  • Use of locks on strict two-phase locking
  • when an operation accesses a data item
  • when a transaction is committed/aborted
  • server unlocks all data items locked for the
    transaction

72
Concurrency control locking
  • Lock implementation
  • lock manager
  • managing table of locks
  • transaction identifiers
  • identifier of (locked) data item
  • lock type
  • condition variable
  • for waiting transactions
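
A lock manager along these lines can be sketched with one lock table and one condition variable. This is an illustration under assumptions: a single shared `threading.Condition` instead of one per data item, and a hypothetical `LockManager` API with an optional timeout.

```python
import threading


class LockManager:
    """Lock table plus a condition variable for waiting transactions."""

    def __init__(self):
        self._cond = threading.Condition()
        self._table = {}  # data item -> {transaction id: "read" | "write"}

    def _conflicts(self, trans, item, mode):
        others = [m for t, m in self._table.get(item, {}).items() if t != trans]
        if mode == "read":
            return "write" in others
        return bool(others)  # a write lock conflicts with any other holder

    def lock(self, trans, item, mode, timeout=None):
        """Wait until the lock can be granted; False if the timeout expires."""
        with self._cond:
            granted = self._cond.wait_for(
                lambda: not self._conflicts(trans, item, mode), timeout)
            if not granted:
                return False
            holders = self._table.setdefault(item, {})
            if holders.get(trans) != "write":
                holders[trans] = mode  # set, share, or promote the lock
            return True

    def unlock_all(self, trans):
        """Called at commit/abort: release every lock held by `trans`."""
        with self._cond:
            for item in list(self._table):
                self._table[item].pop(trans, None)
                if not self._table[item]:
                    del self._table[item]
            self._cond.notify_all()
```

Releasing locks only through `unlock_all` at commit or abort is what makes the scheme strict two-phase locking.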

73
Concurrency control locking
  • Deadlocks
  • a state in which each member of a group of
    transactions is waiting for some other member to
    release a lock
  • no progress possible!
  • Example with read/write locks

74
Concurrency control locking
  • Same example (lost update) with locking

Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Colour of data shows owner of lock
81
Concurrency control locking
  • Read/write locks

Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read()
T: A.write(bt - 4)
U: bu = C.read()
U: C.write(bu - 3)
T: bt = B.read()
U: bu = B.read()
T: B.write(bt + 4)
U: B.write(bu + 3)
Deadlock!!
82
Concurrency control locking
  • Solutions to the Deadlock problem
  • Prevention
  • by locking all data items used by a transaction
    when it starts
  • by requesting locks on data items in a predefined
    order
  • Evaluation
  • impossible for interactive transactions
  • reduction of concurrency

83
Concurrency control locking
  • Solutions to the Deadlock problem
  • Detection
  • the server keeps track of a wait-for graph
  • lock: edge is added
  • unlock: edge is removed
  • the presence of cycles may be checked
  • when an edge is added
  • periodically
  • example

89
Concurrency control locking
  • Wait-for graph

[Figure: wait-for graph in which T and U wait for each other over B]
Cycle → deadlock
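
Checking a wait-for graph for cycles is a depth-first search. A minimal sketch, assuming the graph is a dictionary mapping each waiting transaction to the set of transactions it waits for (the `find_cycle` name and the representation are illustrative):

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None."""
    visiting, done = [], set()

    def visit(t):
        if t in visiting:                      # back edge: cycle found
            return visiting[visiting.index(t):]
        if t in done:
            return None
        visiting.append(t)
        for u in wait_for.get(t, ()):
            cycle = visit(u)
            if cycle:
                return cycle
        visiting.pop()
        done.add(t)
        return None

    for t in list(wait_for):
        cycle = visit(t)
        if cycle:
            return cycle
    return None


# T waits for U and U waits for T, as in the read/write lock example.
print(find_cycle({"T": {"U"}, "U": {"T"}}))
```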
90
Concurrency control locking
  • Solutions to the Deadlock problem
  • Detection
  • the server keeps track of a wait-for graph
  • the presence of cycles must be checked
  • once a deadlock detected, the server must select
    a transaction and abort it (to break the cycle)
  • choice of transaction? Important factors
  • age of transaction
  • number of cycles the transaction is involved in

91
Concurrency control locking
  • Solutions to the Deadlock problem
  • Timeouts
  • locks granted for a limited period of time
  • within the period: lock is invulnerable
  • after the period: lock is vulnerable

92
Overview
  • Transactions
  • Distributed transactions
  • Flat and nested distributed transactions
  • Atomic commit protocols
  • Concurrency in distributed transactions
  • Distributed deadlocks
  • Transaction recovery
  • Replication

93
Distributed transactions
  • Definition
  • Any transaction whose activities involve
    multiple servers
  • Examples
  • simple: client accesses several servers
  • nested: server accesses several other servers

94
Distributed transactions
  • Examples: simple
  • Serial execution of requests on different servers

95
Distributed transactions
  • Examples nesting
  • Serial or parallel execution of requests on
    different servers

96
Distributed transactions
  • Examples

97
Distributed transactions
  • Commit: agreement between all servers involved
  • to commit
  • to abort
  • take one server as coordinator
  • simple (?) protocol
  • single point of failure?
  • tasks of the coordinator
  • keep track of other servers, called workers
  • responsible for final decision

98
Distributed transactions
  • New service operations
  • AddServer( TransID, CoordinatorID)
  • called by clients
  • first operation on server that has not joined the
    transaction yet
  • NewServer( TransID, WorkerID)
  • called by new server on the coordinator
  • coordinator records ServerID of the worker in its
    workers list
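
The joining handshake can be sketched with two methods per server. This is an illustration only: the `Server` class is hypothetical and direct method calls stand in for remote invocations.

```python
class Server:
    """A server that can act as coordinator and/or worker (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.workers = {}      # as coordinator: transaction id -> worker names
        self.coordinator = {}  # as worker: transaction id -> coordinator

    def open_transaction(self, trans):
        # The server where the transaction is opened becomes its coordinator.
        self.workers[trans] = set()

    def add_server(self, trans, coordinator):
        # Called by the client before its first operation on this server.
        self.coordinator[trans] = coordinator
        coordinator.new_server(trans, self)

    def new_server(self, trans, worker):
        # Called by the joining server; the coordinator records the worker.
        self.workers[trans].add(worker.name)


x, y, z = Server("X"), Server("Y"), Server("Z")
x.open_transaction("T")
z.add_server("T", x)
y.add_server("T", x)
print(sorted(x.workers["T"]))  # ['Y', 'Z']
```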

99
Distributed transactions
  • Examples: simple

T = OpenTransaction(); X.Withdraw(A,4); Z.Deposit(C,4); Y.Withdraw(B,3); Z.Deposit(D,3); CloseTransaction(T)

Servers: X holds A (coordinator); Y holds B (worker); Z holds C and D (worker)

1. T = X.OpenTransaction()
2. X.Withdraw(A,4)
3. Z.AddServer(T, X)
4. X.NewServer(T, Z)
5. Z.Deposit(C,4)
6. Y.AddServer(T, X)
7. X.NewServer(T, Y)
8. Y.Withdraw(B,3)
9. Z.Deposit(D,3)
10. X.CloseTransaction(T)
104
Distributed transactions
  • Examples: data at servers

[Figure: data at servers; the coordinator holds A, the workers hold B and C, D]
105
Overview
  • Transactions
  • Distributed transactions
  • Flat and nested distributed transactions
  • Atomic commit protocols
  • Concurrency in distributed transactions
  • Distributed deadlocks
  • Transaction recovery
  • Replication

106
Atomic Commit protocol
  • Elements of the protocol
  • each server is allowed to abort its part of a
    transaction
  • if a server votes to commit it must ensure that
    it will eventually be able to carry out this
    commitment
  • the transaction must be in the prepared state
  • all altered data items must be on permanent
    storage
  • if any server votes to abort, then the decision
    must be to abort the transaction

107
Atomic Commit protocol
  • Elements of the protocol (cont.)
  • the protocol must work correctly, even when
  • some servers fail
  • messages are lost
  • servers are temporarily unable to communicate

108
Atomic Commit protocol
  • Protocol
  • Phase 1: voting phase
  • Phase 2: completion according to outcome of vote

109
Atomic Commit protocol
  • Protocol

Step  Role                          Status
1     Coordinator                   prepared to commit
2     Worker                        prepared to commit
3     Coordinator (counting votes)  committed
4     Worker                        committed
      Coordinator                   done
110
Atomic Commit protocol
  • Protocol Phase 1 voting phase
  • Coordinator for operation CloseTransaction
  • sends CanCommit to each worker
  • behaves as worker in phase 1
  • waits for replies from workers
  • Worker when receiving CanCommit
  • if for worker transaction can commit
  • saves data items
  • sends Yes to coordinator
  • if for worker transaction cannot commit
  • sends No to coordinator
  • clears data structures, removes locks

111
Atomic Commit protocol
  • Protocol Phase 2
  • Coordinator: collecting votes
  • all votes Yes
  • commit transaction; send DoCommit to workers
  • one vote No
  • abort transaction
  • Worker: voted Yes, waits for decision of
    coordinator
  • receives DoCommit
  • makes committed data available; removes locks
  • receives AbortTransaction
  • clears data structures; removes locks
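
Both phases fit in a short failure-free sketch. The `Worker` class, its state strings and the `close_transaction` function are assumptions for illustration; timeouts, lost messages and crashes are deliberately not modelled.

```python
class Worker:
    """A participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self._ok = can_commit
        self.state = "active"

    def can_commit(self, trans):
        # Phase 1: vote Yes (and become prepared) or vote No (and abort).
        self.state = "prepared" if self._ok else "aborted"
        return self._ok

    def do_commit(self, trans):
        self.state = "committed"

    def abort(self, trans):
        self.state = "aborted"


def close_transaction(trans, workers):
    """Coordinator side: collect votes, then commit or abort everywhere."""
    votes = [w.can_commit(trans) for w in workers]   # phase 1
    if all(votes):                                   # phase 2
        for w in workers:
            w.do_commit(trans)
        return "Commit"
    for w, vote in zip(workers, votes):
        if vote:            # only Yes-voters are still waiting for a decision
            w.abort(trans)
    return "Abort"
```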

112
Atomic Commit protocol
  • Timeouts
  • worker did all/some operations and waits for
    CanCommit
  • unilateral abort possible
  • coordinator waits for votes of workers
  • unilateral abort possible
  • worker voted Yes and waits for final decision of
    coordinator
  • wait unavoidable
  • extensive delay possible
  • additional operation GetDecision can be used to
    get decision from coordinator or other workers

113
Atomic Commit protocol
  • Performance
  • C → W: CanCommit, N-1 messages
  • W → C: Yes/No, N-1 messages
  • C → W: DoCommit, N-1 messages
  • W → C: HaveCommitted, N-1 messages
  • (unavoidable) delays possible

114
Atomic Commit protocol
  • Nested Transactions
  • top level transaction subtransactions
  • transaction tree

115
Atomic Commit protocol
[Figure: transaction tree; T has subtransactions T1 and T2, T1 has T11 and T12, T2 has T21 and T22]
116
Atomic Commit protocol
  • Nested Transactions
  • top level transaction subtransactions
  • transaction tree
  • coordinator top level transaction
  • subtransaction identifiers
  • globally unique
  • allow derivation of ancestor transactions (why
    necessary?)

117
Atomic Commit protocol
  • Nested Transactions Transaction IDs

118
Atomic Commit protocol
  • Upon completion of a subtransaction
  • independent decision to commit or abort
  • commit of subtransaction
  • only provisionally
  • status (including status of descendants) reported
    to parent
  • final outcome dependent on its ancestors
  • abort of subtransaction
  • implies abort of all its descendants
  • abort reported to its parent (always possible?)

119
Atomic Commit protocol
  • Data structures
  • commit list: list of all committed
    (sub)transactions
  • aborts list: list of all aborted
    (sub)transactions
  • example

146
Atomic Commit protocol
  • Data structures example

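[Figure: transaction tree of T with the outcome of each subtransaction]
T1: commit
T11: abort
T12: commit
T2: abort
T21: commit
T22: commit
Other labels in the figure: Z, N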
149
Atomic Commit protocol
  • Data structures final data

150
Atomic Commit protocol
  • Algorithm of coordinator (flat protocol)
  • Phase 1
  • send CanCommit to each worker in commit list, with
  • TransactionId T
  • abort list
  • coordinator behaves as worker
  • Phase 2 (as for non-nested transactions)
  • all votes Yes
  • commit transaction send DoCommit to workers
  • one vote No
  • abort transaction

151
Atomic Commit protocol
  • Algorithm of worker (flat protocol)
  • Phase 1 (after receipt of CanCommit)
  • at least one (provisionally) committed descendant
    of top level transaction
  • transactions with ancestors in abort list are
    aborted
  • prepare for commit of other transactions
  • send Yes to coordinator
  • no (provisionally) committed descendant
  • send No to coordinator
  • Phase 2 (as for non-nested transactions)
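
The worker's phase 1 decision depends on deriving ancestors from subtransaction identifiers. The sketch below assumes a naming convention in which dropping the last character of an identifier gives its parent ("T12" → "T1" → "T"); both that convention and the function names are illustrative, not taken from the slides.

```python
def ancestors(trans_id):
    """'T12' -> ['T1', 'T'] under the assumed naming convention."""
    return [trans_id[:i] for i in range(len(trans_id) - 1, 0, -1)]


def outcome(provisional_commits, abort_list):
    """Decide, per provisionally committed subtransaction, commit or abort."""
    aborted = set(abort_list)
    return {
        t: "abort" if aborted & set(ancestors(t)) else "commit"
        for t in provisional_commits
    }


def vote(provisional_commits, abort_list):
    """Yes if at least one provisionally committed subtransaction survives."""
    decided = outcome(provisional_commits, abort_list)
    return "Yes" if "commit" in decided.values() else "No"


# T12 and T1 survive; T21 and T22 are discarded because T2 aborted.
print(outcome(["T1", "T12", "T21", "T22"], ["T11", "T2"]))
```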

152
Atomic Commit protocol
  • Algorithm of worker (flat protocol)
  • Phase 1 (after receipt of CanCommit)
  • Phase 2 voted yes, waits for decision of
    coordinator
  • receives DoCommit
  • makes committed data available removes locks
  • receives AbortTransaction
  • clears data structures removes locks

153
Atomic Commit protocol
  • Timeouts
  • same 3 as above
  • worker did all/some operations and waits for
    CanCommit
  • coordinator waits for votes of workers
  • worker voted Yes and waits for final decision of
    coordinator
  • provisionally committed child with an aborted
    ancestor
  • does not participate in algorithm
  • has to make an enquiry itself
  • when?

154
Atomic Commit protocol
  • Data structures final data

155
Atomic Commit protocol
  • Data structures example

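[Figure: transaction tree of T with the outcome of each subtransaction]
T1: commit
T11: abort
T12: commit
T2: abort
T21: commit
T22: commit
Other labels in the figure: Z, N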
156
Overview
  • Transactions
  • Distributed transactions
  • Flat and nested distributed transactions
  • Atomic commit protocols
  • Concurrency in distributed transactions
  • Distributed deadlocks
  • Transaction recovery
  • Replication

157
Distributed transactions Locking
  • Locks are maintained locally (at each server)
  • it decides whether
  • to grant a lock
  • to make the requesting transaction wait
  • it cannot release the lock until it knows whether
    the transaction has been
  • committed
  • aborted
  • at all servers
  • deadlocks can occur

158
Distributed transactions Locking
  • Locking rules for nested transactions
  • child transaction inherits locks from parents
  • when a nested transaction commits, its locks are
    inherited by its parents
  • when a nested transaction aborts, its locks are
    removed
  • a nested transaction can get a read lock when all
    the holders of write locks (on that data item)
    are ancestors
  • a nested transaction can get a write lock when
    all the holders of read and write locks (on that
    data item) are ancestors
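
The last two rules can be written as one check over the current lock holders. A minimal sketch, assuming identifiers where an ancestor's name is a proper prefix of its descendant's ("T1" is an ancestor of "T11") and assuming that a transaction's own locks never block it; both are illustrative assumptions.

```python
def is_ancestor(a, t):
    """True if `a` is a proper ancestor of `t` (assumed prefix naming)."""
    return a != t and t.startswith(a)


def can_lock(trans, mode, holders):
    """holders: list of (transaction id, 'read' | 'write') on one data item."""
    for holder, held in holders:
        if holder == trans or is_ancestor(holder, trans):
            continue  # locks held by ancestors (or by itself) do not block
        if mode == "write" or held == "write":
            return False
    return True
```

For example, `can_lock("T11", "read", [("T1", "write")])` holds because T1 is an ancestor, while `can_lock("T12", "write", [("T11", "read")])` does not because T11 is a sibling.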

159
Distributed transactions Locking
  • Who can access A?

[Figure: transaction tree T with subtransactions T1 (T11, T12) and T2 (T21, T22); data item A; labels Z and N]
160
Overview
  • Transactions
  • Distributed transactions
  • Flat and nested distributed transactions
  • Atomic commit protocols
  • Concurrency in distributed transactions
  • Distributed deadlocks
  • Transaction recovery
  • Replication

161
Distributed deadlocks
  • Single server approaches
  • prevention: difficult to apply
  • timeouts: which value, given variable delays?
  • Detection
  • global wait-for-graph can be constructed from
    local ones
  • cycle in global graph possible without cycle in
    local graph

162
Distributed transactions Deadlocks
[Figure: transactions U, V and W]
163
Distributed transactions Deadlocks
  • Algorithms
  • centralised deadlock detection: not a good idea
  • depends on a single server
  • cost of transmission of local wait-for graphs
  • distributed algorithm
  • complex
  • phantom deadlocks
  • edge chasing approach

164
Distributed transactions Deadlocks
  • Phantom deadlocks
  • deadlock detected that is not really a deadlock
  • during deadlock detection
  • while constructing global wait-for graph
  • waiting transaction is aborted

165
Distributed transactions Deadlocks
  • Edge Chasing
  • distributed approach to deadlock detection
  • no global wait-for graph is constructed
  • servers attempt to find cycles
  • by forwarding probes ( messages) that follow
    edges of the wait-for graph throughout the
    distributed system

166
Distributed transactions Deadlocks
  • Edge Chasing
  • three steps
  • initiation: transaction starts waiting
  • new probe constructed
  • detection: probe received
  • extend probe
  • check for loop
  • forward new probe
  • resolution

167
Distributed transactions Deadlocks
  • Edge Chasing: initiation
  • send out probe
  • when transaction T starts waiting for U (and U
    is already waiting for ...)
  • in case of lock sharing, different probes are
    forwarded

T → U
168
Distributed transactions Deadlocks
[Figure: diagram labelled "Initiation", with nodes W, C, Z, U, V]
169
Distributed transactions Deadlocks
  • Edge Chasing: detection
  • when receiving probe
  • check if U is waiting
  • if U is waiting for V (and V is waiting): add V to
    probe
  • check for loop in probe?
  • yes → deadlock
  • no → forward new probe

T → U
T → U → V
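
The initiation and detection steps can be simulated in a single process. This is a sketch under stated assumptions, not the protocol itself: real probes are messages sent between servers, and here every blocked transaction is assumed to wait for exactly one other transaction. The function name and the `waits_for` map are hypothetical.

```python
def chase_edges(waiter, holder, waits_for):
    """Follow a probe that starts at the new edge waiter -> holder.

    waits_for maps each blocked transaction to the transaction it waits for.
    Returns the probe path if it closes a loop, otherwise None.
    """
    probe = [waiter, holder]          # initiation: probe <T -> U>
    while True:
        last = probe[-1]
        if last not in waits_for:     # last transaction is not waiting
            return None               # the probe is dropped, no deadlock
        nxt = waits_for[last]
        if nxt in probe:              # loop in probe: deadlock detected
            return probe + [nxt]
        probe.append(nxt)             # extend the probe and forward it


waits_for = {"U": "V", "V": "W"}
waits_for["W"] = "U"                  # W starts waiting for U
print(chase_edges("W", "U", waits_for))  # ['W', 'U', 'V', 'W']
```

With lock sharing, a transaction can wait for several holders, so the map would hold sets and one probe would be forwarded per holder.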
170
Distributed transactions Deadlocks
[Figure: diagram labelled "Initiation", with nodes W, C, Z, U, V]
171
Distributed transactions Deadlocks
  • Edge Chasing: resolution
  • abort one transaction
  • problem?
  • every waiting transaction can initiate deadlock
    detection
  • detection may happen at different servers
  • several transactions may be aborted
  • solution: transaction priorities

172
Distributed transactions Deadlocks
  • Edge Chasing: transaction priorities
  • assign priority to each transaction, e.g. using
    timestamps
  • solution of problem above
  • abort transaction with lowest priority
  • if different servers detect same cycle, the same
    transaction will be aborted
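
A small sketch of why priorities give a consistent choice. It assumes that priorities come from start timestamps and that the youngest transaction has the lowest priority; the timestamps and the function name are hypothetical.

```python
def choose_victim(cycle, start_time):
    """Return the transaction to abort: the one with the latest start time.

    Ties are broken by name so that every server makes the same choice.
    """
    return max(set(cycle), key=lambda tx: (start_time[tx], tx))


start_time = {"U": 10, "V": 20, "W": 30}
# Two servers detect the same cycle but their probes start at different nodes.
print(choose_victim(["W", "U", "V", "W"], start_time))  # W
print(choose_victim(["V", "W", "U", "V"], start_time))  # W
```

Because the choice depends only on the set of transactions in the cycle, it does not matter which server detects the cycle or where its probe started.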

173
Distributed transactions Deadlocks
  • Edge Chasing: transaction priorities
  • other improvements
  • number of initiated probe messages reduced
  • detection only initiated when a higher priority
    transaction waits for a lower priority one
  • number of forwarded probe messages reduced
  • probes travel downhill, from transactions with
    high priority to transactions with lower
    priorities
  • probe queues required: more complex algorithm

174
Overview
  • Transactions
  • Distributed transactions
  • Flat and nested distributed transactions
  • Atomic commit protocols
  • Concurrency in distributed transactions
  • Distributed deadlocks
  • Transaction recovery
  • Replication

175
Transactions and failures
  • Introduction
  • Approaches to fault-tolerant systems
  • replication
  • instantaneous recovery from a single fault
  • expensive in computing resources
  • restart and restore consistent state
  • less expensive
  • requires stable storage
  • slow(er) recovery process

176
Transactions and failures
  • Overview
  • Stable storage
  • Transaction recovery
  • Recovery of the two-phase commit protocol

177
Transactions and failures Stable storage
  • Ensures that any essential permanent data will be
    recoverable after any single system failure
  • allow system failures
  • during a disk write
  • damage to any single disk block
  • hardware solution → RAID technology
  • software solution
  • based on pairs of blocks for same data item
  • checksum to determine whether block is good or
    bad

178
Transactions and failures Stable storage
  • Based on the following invariant
  • not more than one block of any pair is bad
  • if both are good
  • same data
  • except during execution of write operation
  • write operation
  • maintains invariant
  • writes on both blocks are done strictly
    sequentially
  • restart of stable storage server after crash
  • recovery procedure to restore invariant

179
Transactions and failures Stable storage
  • Recovery for a pair
  • both good and the same
  • ok
  • one good, one bad
  • copy good block to bad block
  • both good and different
  • copy one block to the other
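
A minimal in-memory sketch of the block-pair scheme. Several details are assumptions made for illustration: a CRC32 checksum stored in front of the payload, a Python list standing in for the two disk blocks, and copying the first block to the second when both are good but different (the first block is written first, so it holds the newer data).

```python
import zlib


def make_block(data: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 checksum."""
    return zlib.crc32(data).to_bytes(4, "big") + data


def is_good(block: bytes) -> bool:
    """A block is good when its stored checksum matches its payload."""
    if len(block) < 4:
        return False
    return int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])


def stable_write(pair: list, data: bytes) -> None:
    """Write both blocks strictly one after the other."""
    pair[0] = make_block(data)
    pair[1] = make_block(data)


def recover(pair: list) -> None:
    """Restore the invariant for one pair after a crash."""
    good0, good1 = is_good(pair[0]), is_good(pair[1])
    if good0 and not good1:
        pair[1] = pair[0]             # one good, one bad
    elif good1 and not good0:
        pair[0] = pair[1]             # one good, one bad
    elif good0 and good1 and pair[0] != pair[1]:
        pair[1] = pair[0]             # both good and different


def stable_read(pair: list) -> bytes:
    """Return the payload of the first good block of the pair."""
    block = pair[0] if is_good(pair[0]) else pair[1]
    return block[4:]


pair = [b"", b""]
stable_write(pair, b"balance=100")
# Simulated crash: the first block of a new write completed, the second did not.
pair[0] = make_block(b"balance=200")
recover(pair)
print(stable_read(pair), pair[0] == pair[1])  # b'balance=200' True
```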

180
Transactions and failures
  • Overview
  • Stable storage
  • Transaction recovery
  • Recovery of the two-phase commit protocol

181
Transactions and failures Transaction recovery
  • atomic property of transaction implies
  • durability
  • data items stored in permanent storage
  • data will remain available indefinitely
  • failure atomicity
  • effects of transactions are atomic even when
    servers fail
  • recovery should ensure durability and failure
    atomicity

182
Transactions and failures Transaction recovery
  • Assumptions about servers
  • servers keep data in volatile storage
  • committed data recorded in a recovery file
  • single mechanism: recovery manager
  • save data items in permanent storage for
    committed transactions
  • restore the server's data items after a crash
  • reorganize the recovery file to improve
    performance of recovery
  • reclaim storage space in the recovery file

183
Transactions and failures Transaction recovery
  • Elements of algorithm
  • each server maintains an intention list for all
    of its active transactions: pairs of
  • name
  • new value
  • decision of server: prepared to commit a
    transaction
  • intention list saved in the recovery file
    (stable storage)
  • server receives DoCommit
  • commit recorded in recovery file
  • after a crash: based on recovery file
  • effects of committed transactions restored (in
    correct order)
  • effects of other transactions neglected
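
The elements above can be illustrated with a replay of the recovery file. The record shapes below are hypothetical, chosen only for this sketch; the logging and shadow-version techniques in the book differ in detail.

```python
def recover_state(recovery_file):
    """Rebuild the data items from recovery-file records read in order.

    Hypothetical record shapes:
      ("prepared", tid, {name: new_value})   intention list saved
      ("committed", tid)                     commit recorded
      ("aborted", tid)                       abort recorded
    """
    intentions = {}
    state = {}
    for record in recovery_file:
        kind, tid = record[0], record[1]
        if kind == "prepared":
            intentions[tid] = record[2]
        elif kind == "committed":
            # Apply effects in the order in which transactions committed.
            state.update(intentions.pop(tid, {}))
        elif kind == "aborted":
            intentions.pop(tid, None)
    return state  # transactions that never committed are ignored


log = [
    ("prepared", "T1", {"A": 80, "B": 220}),
    ("prepared", "T2", {"B": 300}),
    ("committed", "T1"),
    ("prepared", "T3", {"A": 0}),
    ("committed", "T2"),
]
print(recover_state(log))  # {'A': 80, 'B': 300}
```

T3 was prepared but not committed when the crash happened, so its intention list has no effect on the restored state.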

184
Transactions and failures Transaction recovery
  • Alternative implementations for recovery file
  • logging technique
  • shadow versions
  • (see book for details)

185
Transactions and failures
  • Overview
  • Stable storage
  • Transaction recovery
  • Recovery of the two-phase commit protocol

186
Transactions and failures two-phase commit
protocol
  • Server can fail during commit protocol
  • each server keeps its own recovery file
  • 2 new status values
  • done
  • uncertain

187
Transactions and failures two-phase commit
protocol
  • meaning of status values
  • committed
  • coordinator: outcome of votes is yes
  • worker: protocol is complete
  • done
  • coordinator: protocol is complete
  • uncertain
  • worker: voted yes, outcome unknown

188
Transactions and failures two-phase commit
protocol
  • Recovery actions (status_at_) in recovery file
  • prepared_at_coordinator
  • no decision before failure of server
  • send AbortTransaction to all workers
  • aborted_at_coordinator
  • send AbortTransaction to all workers
  • committed_at_coordinator
  • decision to commit taken before crash
  • send DoCommit to all workers
  • resume protocol

189
Transactions and failures two-phase commit
protocol
  • Recovery actions (status_at_) in recovery file
  • committed_at_worker
  • send HaveCommitted to coordinator
  • uncertain_at_worker
  • send GetDecision to coordinator to get status
  • prepared_at_worker
  • not yet voted yes
  • unilateral abort possible
  • done_at_coordinator
  • no action required
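
The recovery actions on the last two slides amount to a lookup on the role and the last status found in the recovery file. A sketch as a table; the dictionary layout and the function name are hypothetical, the message names are those used in the slides.

```python
RECOVERY_ACTIONS = {
    ("coordinator", "prepared"): "send AbortTransaction to all workers",
    ("coordinator", "aborted"): "send AbortTransaction to all workers",
    ("coordinator", "committed"): "send DoCommit to all workers",
    ("coordinator", "done"): "no action required",
    ("worker", "committed"): "send HaveCommitted to coordinator",
    ("worker", "uncertain"): "send GetDecision to coordinator",
    ("worker", "prepared"): "may abort unilaterally",
}


def recovery_action(role, status):
    """Return the action for the last status recorded before the crash."""
    return RECOVERY_ACTIONS[(role, status)]


print(recovery_action("worker", "uncertain"))
# send GetDecision to coordinator
```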

190
Overview
  • Transactions
  • Distributed transactions
  • Replication
  • System model and group communication
  • Fault-tolerant services
  • Highly available services
  • Transactions with replicated data

191
Replication
  • A technique for enhancing services
  • Performance enhancement
  • Increased availability
  • Fault tolerance
  • Requirements
  • Replication transparency
  • Consistency

192
Overview
  • Transactions
  • Distributed transactions
  • Replication
  • System model and group communication
  • Fault-tolerant services
  • Highly available services
  • Transactions with replicated data

193
System model and group communication
  • Architectural model

194
System model and group communication
  • 5 phases in the execution of a request
  • Request: FE (front end) issues requests to one or
    more RMs (replica managers)
  • Coordination: needed to execute requests
    consistently
  • FIFO
  • Causal
  • Total
  • Execution: by all managers, perhaps tentatively
  • Agreement
  • Response

195
System model and group communication
  • Need for dynamic groups!
  • Role of group membership service
  • Interface for group membership changes:
    create/destroy groups, add process
  • Implementing a failure detector: monitor group
    members
  • Notifying members of group membership changes
  • Performing group address expansion
  • Handling network partitions: group is
  • Reduced: primary-partition
  • Split: partitionable
196
System model and group communication
197
System model and group communication
  • View delivery
  • To all members when a change in membership occurs
  • not the same as receiving a view
  • Event occurring in a view v(g) at process p
  • Basic requirements for view delivery
  • Order: if process p delivers v(g)
    and then v'(g), then no
    process delivers v'(g) before v(g)
  • Integrity: if p delivers v(g) then p ∈
    v(g)
  • Non-triviality: if q joins group and remains
    reachable then eventually q ∈ v(g) at
    p
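
The order and integrity requirements can be checked on recorded view sequences. This is a sketch with hypothetical data: each process is mapped to the list of views it delivered, and a view is modelled as a frozenset of member names.

```python
from itertools import combinations


def check_integrity(delivered):
    """Integrity: every view a process delivers contains that process."""
    return all(p in view for p, views in delivered.items() for view in views)


def check_order(delivered):
    """Order: views delivered by two processes appear in the same relative order."""
    for views_p, views_q in combinations(delivered.values(), 2):
        position_at_q = {view: i for i, view in enumerate(views_q)}
        common = [position_at_q[v] for v in views_p if v in position_at_q]
        if common != sorted(common):
            return False
    return True


v0, v1 = frozenset("pq"), frozenset("pqr")   # r joins the group {p, q}
ok = {"p": [v0, v1], "q": [v0, v1], "r": [v1]}
bad = {"p": [v0, v1], "q": [v1, v0], "r": [v1]}
print(check_integrity(ok), check_order(ok))  # True True
print(check_order(bad))                      # False
```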

198
System model and group communication
  • View-synchronous group communication
  • Reliable multicast that handles changing group views
  • Guarantees
  • Agreement: correct processes deliver the same set
    of messages in any given view
  • Integrity: if a process delivers m, it will
    not deliver it again
  • Validity: if the system fails to deliver m to q
    then other processes will
    deliver v'(g) (= v(g) \ {q})
    before delivering m

199
System model and group communication
200
Overview
  • Transactions
  • Distributed transactions
  • Replication
  • System model and group communication
  • Fault-tolerant services
  • Highly available services
  • Transactions with replicated data

201
Fault-tolerant services
  • Goal: provide a service that is correct
    despite up to f process failures
  • Assumpti