Title: Postgres-R(SI): Combining Replica Control with Concurrency Control based on Snapshot Isolation
1Postgres-R(SI) Combining Replica Control with
Concurrency Control based on Snapshot Isolation
- Shuqing Wu Bettina Kemme
- School of Computer Science,
- McGill University, Montreal
- CS632 Seminar
- By
- Rishiraj Gupta (05305014)
- CSE, MTech1
- IIT Bombay
2Introduction
- Oracle and PostgreSQL use multiversion
concurrency control providing the isolation level
snapshot isolation - Postgres-R(SI), an extension of PostgreSQL offers
transparent replication. - In this paper, the replica control algorithm is
combined with PostgreSQLs concurrency control
component.
3Introduction (contd)
- Data Replication
- Increase throughput
- Decrease response time
- Provide Fault tolerance
- Challenge Replica Control
- Standard correctness criteria
1-copy-serializability
4Implementing Replica Control
- Kernel-based approach
- Middleware-based approach
- Advantages
- Modular
- Do not need access to DB code
- Reusability
- Disadvantages
- No access to concurrency control information in
the kernel
5Replica Control (RC) methods
- Through Middleware
- Has to implement its own concurrency control (CC)
component. - Only SQL statements are visible but not the
individual tuples to be accessed. - Integrating RC into Kernel of DB system
- Directly interacts with the tuple-based CC of the
DB system. - Direct access to tuples and/or logs allows an
efficient propagation of changed tuples. - Replica Tool comes with the DB system package.
6Classifying Replica Control algorithms
- Time point of coordination
- Lazy schemes
- Eager schemes
- Update handling
- Primary copy approach
- Update everywhere
7Concurrency Control Snapshot Isolation
- Snapshot Isolation (SI)
- Conflict only write/write
- Read from a snapshot of the committed data as of
the time the transaction starts. - First committer wins of 2 concurrent conflicting
write transactions, one commits, the other aborts - Similarity with optimistic concurrency control
- Very popular (Oracle, PostgreSQL, SQL Server 2005)
T0
w(x)
commit
T1
w(x)
r(x)
w(y)
abort
T2
r(x)
w(x)
time
8Overview on Snapshot Isolation(SI)
- Allows some non-serializable executions, but
provided more concurrency for RO transactions. - Read operations read from a committed snapshot
of DB, while Write operations of concurrent
transactions are disallowed. - Replica Control integrated with MVCC algorithm of
version7 PostgreSQL, in order to detect
write/write conflicts and determine snapshots of
read operations. - Borland, Oracle, version 7 of PostgreSQL
implements this SI using MVCC.
9Global Communication System(GCS)
- Properties of GCS
- Uniform reliable delivery
- all or nothing
- avoids lost transactions
- Total Order
- If two member receive m1 and m2, they both
receive them in the same order - determines serialization order
- Virtual Synchrony
- GCS maintains view of the currently connected
members. - If two members both first receive view1 and then
view2, they receive the same set of messages
while members of view1 - handles failures
10SI-P Concurrency Control in PostgreSQL
- Each Transaction Ti assigned a unique identifier
TIDi - Committed (aborted) Version
- w(X) acquire an exclusive lock on X, which is
only released at end of transaction. - Valid Version of X Version of X created by last
committed transaction that updated X (only
one is Valid) - Active Version of X
- t_xmin TID of transaction that created tuple
version V - t_xmax TID of transaction that invalidated V by
an update - Visible tuple version Version read by a r(X)
11Version creation during execution
t_xmin T0 t_xmax Null
b0
Time
- Execution
- Version Check
- Exclusive Lock
- t_xmin, t_xmax
- Waiting queue
- Termination
- clog update
- locks release
w0(X)
X V0
c0
t_xmin T1 t_xmax Null
t_xmin T0 t_xmax T1
b1
w1(X)
X V1
X V0
b2
t1
w2(Y)
c1
t_xmin T3 t_xmax Null
t_xmin T0 t_xmax T1
t_xmin T1 t_xmax T3
w2(X)
t2
a2
b3
X V2
X V1
X V0
w3(X)
t3
b4
r4(X)
t4
At t2 T2 must abort, Why? At t3 V0 V1
committed, V1 valid, V2 is active version At t4
T4 Reads V1
12Information maintained by SI-P
- PostgreSQL keeps information about each active
transaction Ti in shared memory. - xmax TID for the next transaction which will
start just after Ti - xip a list of TIDs of all active transactions
at the moment Ti starts - Ticonc Set of concurrent transactions whose
updates are invisible to Ti - Ticom all other committed transactions.
- clog file storing commit/abort log entries.
13An example for SI-P Protocol
t_xmin T0 t_xmax Null
b0
Time
- Original
- At t2 T2 aborts.
- Replacing the order of w2(X)
- Suppose instead of T1 committing, it aborts.
w0(X)
X V0
c0
t_xmin T1 t_xmax Null
t_xmin T0 t_xmax T1
b1
w1(X)
X V1
X V0
b2
t1
w2(Y)
w2(X)
t_xmin T3 t_xmax Null
t_xmin T0 t_xmax T1
t_xmin T1 t_xmax T3
c1
a1
t_xmin T2 t_xmax T3
w2(X)
t2
t_xmax Null
a2
c2
b3
X V2
X V1
X V0
X V2
w3(X)
t3
b4
r4(X)
t4
At t2 T2 Waits for lock release. When T1
aborts at a1, T2 rebegins the version check for
X t_xmin T0 ( T0 doesnt belongs to T2conc
T0 belongs to T2com) Therefore, T2 gets the
lock on X, and can commit after doing w(X)
At t2 T2 Waits for lock release. When T1
commits at c1, T2 again does the version check
for X t_xmin T1 ( T1 belongs to T2conc )
Therefore, T2 still aborts.
14SI-PR Replica Control based on SI-P
- TIDs are local at each replica and will differ at
different replicas - To compare transactions across replicas, each
update transaction receives a GID - GIDs are generated by using total order in which
writesets are delivered. - Conflicting operations of both local and remote
transactions are executed in the order of
writeset delivery. - Writesets delivered atomically.
15Local Transaction (Ti)
- Execution Phase
- Same as in SI-P. Version check for each w(X).
- New version of X and GID of valid visible
version are added to the writeset WSi. - Send Phase
- Writeset is multicast to all replicas in total
order multicast. - Commit Phase (atomic)
- On delivery of writeset, its GIDi is generated
- If Ti was not yet aborted, GIDi added in internal
table, clog updated, locks released waking up
waiting transactions. - Abort Phase
- clog updated, locks released.
16Remote Transaction (atomic)
- On receiving WSi, Ti is started, its GIDi
generated and (GIDi/TIDi) added to internal
table. - Version Check and Early execution
- Valid version sent along with WSi is compared
with local valid version, to decide
serializablity. - Any local transaction whos WS has not yet been
delivered or sent, and is holding lock on tuples
required by remote transaction, must be aborted. - Late Execution
- Aborting is not done immediately. Its delayed
till Ti is ready to commit on local transaction.
- Commit Phase
- Updates performed, clog updated, locks released.
17Example execution of SI-PR
Multicast order T1 ? T2
R2
R1
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
T0 GID0 T0 commit
T1 GID0 T1 commit
X V0
X V0
t_xmin T0 t_xmax T1
t_xmin T1 t_xmax Null
t_xmin T0 t_xmax T2
t_xmin T2 t_xmax Null
X V0
X V1
X V0
X V2
XV1, GID0
t_xmin T0 t_xmax T2
t_xmin T1 t_xmax Null
t_xmin T2 t_xmax Null
T1 GID1 T1 commit
T1 GID1 T2 abort T1 commit
T1
X V0
X V1
X V2
abort
XV2, GID0
T2 abort
Discard
R1 R2 are two replicas
Time
Time
18An Example Why Late Execution useful?
- A remote transaction should not immediately abort
a local transaction whose writeset has been sent
but not yet delivered.
R1, R2, R3 are three replicas
Assume X , Y V0, GID 0
R1
R2
R3
T1 w(X)
T3 w(Y)
T2 w(Y) w(X)
T2 Aborts T1 Commit T3 Commit
T1 Commit T2 Abort T3 Commit
T1 Commit T3 Abort T2 Commit
T3 Commit T2 Aborts
If Aborting was done as Early Execution
By using Late Execution
All three send their write sets
concurrently Ideally, T1 should commit, T2 abort
T3 commit
Assume Delivery order T1 -gt T2 -gt T3
19Recovery
- GCS delivers view change message to all replicas.
- Crashed Replica rejoins
- Peer sends writesets from previous GID (when
failed) till current GID using log. - No version check required
- New writesets received are buffered.
- New Replica joining
- All writesets applied, that were delivered before
view change message. - For each tuple, visible version transferred to
new replica together with GID of the transaction
that created it. - Restart a total failure
- A new replica group started by replica with
largest GID
20Architecture of Postgres-R
21Evaluation Experiment 1
(a)TPC-W Browsing (Read Only)
(b) TPC-W Ordering (Update)
Test uses OSDL-DBT-1 (TPC-W) benchmark to
evaluate, which simulates an online bookstore
with 80 Browsing and 20 Ordering
transactions. Database Size 1000 items and 40
clients.
22Evaluation Experiment 2
100 Update Transactions
Update Workload Response Time
Test uses Database consisting of 10 tables with
10,000 tuples each. Atleast 20,000 transactions
were run to achieve stable results. Each test has
a fixed set of clients.
23Experimental Conclusions
- Experiment 1
- Response time increases with increasing load.
- Response time of centralized system is much worse
than the replicated configuration. - 10 replica system has less response time than 5
replica system. - Experiment 2
- Central system has considerable faster response
times until max. throughput for 5 clients is
reached. - Replication might pay off in update intensive
workloads in cases system enables to distributed
work loads.
24Conclusions
- We looked at PostgreSQLs version 7 Concurrency
control mechanism (SI-P) - Replica Control protocol (SI-PR) based on SI-P.
- Paper presented an eager, update everywhere
database replication approach based on snapshot
isolation. - Current solutions of pgreplication project of
PostgreSQL can be found at - http//gborg.postgresql.org/project/pgreplication
25Thank you !!
- Questions ??
- rishirajgupta_at_cse.iitb.ac.in
26Extra Slides
27Experiment 3 (Replication Overhead)
To Evaluate the delay incurred by
replication Setup 10 table database, single
client. Transactions updating 10
random tuples
28Transaction Execution
29Version creation during execution
t_xmin T0 t_xmax Null
b0
Time
- Execution
- Version Check
- Exclusive Lock
- t_xmin, t_xmax
- Waiting queue
- Termination
- clog update
- locks release
w0(X)
X V0
c0
t_xmin T1 t_xmax Null
t_xmin T0 t_xmax T1
b1
w1(X)
X V1
X V0
b2
t1
w2(Y)
c1
t_xmin T3 t_xmax Null
t_xmin T0 t_xmax T1
t_xmin T1 t_xmax T3
w2(X)
t2
a2
b3
X V2
X V1
X V0
w3(X)
t3
b4
r4(X)
t4
At t3 V0 V1 committed, V1 valid, V2 is
active version
30Example execution of SI-PR
R2
R1
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
T0 GID0 T0 commit
X V2
X V2
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
X V2
X V2
X V2
X V2
XV1, GID0
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
t_xmin T0 t_xmax Null
T1 GID1 T1 commit
T1 GID1 T2 abort T1 commit
X V2
X V2
X V2
XV2, GID0
T2 abort
Discard
R1 R2 are two replicas
Time
Time