A Theory of Redo Recovery - PowerPoint PPT Presentation

Slides: 26

Provided by: lom1

Transcript and Presenter's Notes

Title: A Theory of Redo Recovery

1
A Theory of Redo Recovery

David Lomet
Microsoft Research, Redmond
Mark Tuttle
HP Research, Cambridge

2
Big Picture
Much simpler than our VLDB95 paper

Redo Recovery requires
Good db state
Replay of the right operations
Good state: updating in conflict order is not required
Write-read conflicts can be ignored
Some db variables are irrelevant (don't need to update them)
Synchronize: state update & ops replayed
Captured in recovery invariant
We prove that maintaining the invariant ⇒ recovery
Current recovery methods maintain invariant
Show how current methods work (e.g. ARIES redo)
Show how new methods could work

3
Conflict State Graph (CSG)

Conflict graph (borrowed from concurrency control)
Nodes are log operations; edges are conflicts (RW, WR, WW)
State graph SG
Add writes(node) = {<name, value>} of vars updated
State for SG = {<x,v> | <x,v> in writes(n) and n is the last node in the state graph with x in vars(n)}
Final state Sfinal of CSG is the desired recovered state
Any prefix of a state graph is a state graph
Prefix: node in prefix ⇒ predecessors in prefix
State of any prefix of CSG can be recovered by replaying operations in suffix in conflict graph order

We will relax CSG requirements
4
Conflict State Graph States
O: readset = {x}, writes = {<x,1>}
P: readset = {x}, writes = {<y,2>}
Q: readset = {x}, writes = {<x,3>}
Write-read edge (O → P)
Write-read, write-write, read-write edge (O → Q)
Read-write edge (P → Q)
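
The edge labels in this example can be checked mechanically. The following is a minimal sketch, not the authors' code: the tuple encoding of O, P, and Q and the names `conflict_edges` and `state_of` are assumptions made for illustration.

```python
from itertools import combinations

# Assumed encoding of the slide's operations, in log order:
# (name, readset, writes as {variable: value}).
LOG = [
    ("O", {"x"}, {"x": 1}),
    ("P", {"x"}, {"y": 2}),
    ("Q", {"x"}, {"x": 3}),
]

def conflict_edges(log):
    """Map each conflicting pair (earlier, later) to its set of conflict kinds."""
    edges = {}
    for (a, reads_a, writes_a), (b, reads_b, writes_b) in combinations(log, 2):
        kinds = set()
        if writes_a.keys() & reads_b:
            kinds.add("WR")  # a writes a variable that b later reads
        if writes_a.keys() & writes_b.keys():
            kinds.add("WW")  # both write the same variable
        if writes_b.keys() & reads_a:
            kinds.add("RW")  # a reads a variable that b later writes
        if kinds:
            edges[(a, b)] = kinds
    return edges

def state_of(log, initial):
    """State of a state graph: the last write to each variable wins."""
    state = dict(initial)
    for _name, _reads, writes in log:
        state.update(writes)
    return state

EDGES = conflict_edges(LOG)
FINAL = state_of(LOG, {"x": 0, "y": 0})
```

Under this encoding, `EDGES` holds one write-read edge from O to P, an edge from O to Q carrying all three conflict kinds, and a read-write edge from P to Q, while `FINAL` is the state with x = 3 and y = 2.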
5
Installation Graph

Example: initial stable state {<x,0>, <y,0>}
O: x ← x+1
P: y ← x+1
After O, P, state is {<x,1>, <y,2>}
Flush y to disk: stable state is {<x,0>, <y,2>}
Replay O: generates correct state {<x,1>, <y,2>}
O's readset {x} is unchanged by P's installation
Even though write-read edge orders P after O
Installation graph
= conflict graph without write-read edges
Installation state graph (ISG)
same writes(n) for node n as conflict state graph
State of any prefix of ISG can be recovered
More prefixes (states) because of fewer edges

y written by P
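
The example on this slide can be replayed directly. This is a minimal sketch that assumes O is x ← x+1 and P is y ← x+1, which is the reading consistent with the states listed; the function names are hypothetical.

```python
def op_O(state):
    """O: x <- x + 1 (reads x, writes x)."""
    return {**state, "x": state["x"] + 1}

def op_P(state):
    """P: y <- x + 1 (reads x, writes y)."""
    return {**state, "y": state["x"] + 1}

initial = {"x": 0, "y": 0}

# Normal execution runs O, then P.
final = op_P(op_O(initial))

# Only y is flushed before the crash, so P is installed ahead of O.
stable = {**initial, "y": final["y"]}

# Recovery replays O alone. O reads only x, which P's installation did not touch.
recovered = op_O(stable)
```

Replaying O from the stable state reproduces the final state even though the write-read edge places P after O, which is why that edge can be dropped from the installation graph.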
6
Installation State Graph States
x=0, y=0
O: readset = {x}, writes = {<x,1>}
Removed write-read edge (O → P)
x=1, y=0
ISG recoverable state
P: readset = {x}, writes = {<y,2>}
Retained write-write, read-write edge (O → Q)
x=0, y=2
x=1, y=2
Q: readset = {x}, writes = {<x,3>}
Retained read-write edge (P → Q)
x=3, y=2
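
The claim that fewer edges yield more prefixes can be checked by enumeration. The sketch below is illustrative only: the edge sets are derived from the readsets and writes of O, P, and Q, and `prefixes` and `prefix_state` are hypothetical helper names.

```python
from itertools import combinations

NODES = ["O", "P", "Q"]  # log order
WRITES = {"O": {"x": 1}, "P": {"y": 2}, "Q": {"x": 3}}

CONFLICT_EDGES = {("O", "P"), ("O", "Q"), ("P", "Q")}
# O -> P is a pure write-read edge, so the installation graph drops it.
# O -> Q also carries write-write and read-write conflicts, so it stays.
INSTALL_EDGES = CONFLICT_EDGES - {("O", "P")}

def prefixes(nodes, edges):
    """All node subsets that contain every predecessor of each member."""
    found = []
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(u in subset for (u, v) in edges if v in subset):
                found.append(subset)
    return found

def prefix_state(prefix, initial):
    """Apply the writes of the prefix's nodes in log order."""
    state = dict(initial)
    for node in prefix:
        state.update(WRITES[node])
    return state

CSG_PREFIXES = prefixes(NODES, CONFLICT_EDGES)
ISG_PREFIXES = prefixes(NODES, INSTALL_EDGES)
ISG_STATES = [prefix_state(p, {"x": 0, "y": 0}) for p in ISG_PREFIXES]
```

With these inputs the conflict graph has four prefixes and the installation graph has five. The extra prefix contains P alone and corresponds to the state x = 0, y = 2.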
7
Exposed Variables

Example
O1: x ← z+1
O2: x ← 25
After O2, we don't care about the x value of O1
Variable x is unexposed after ops I (O1 here) if
the minimum (in conflict order) op in Ops(log) - I that accesses x writes x
without reading it
x's value is a don't-care when x is unexposed
This is an example of physical logging
Prefix of installation graph explains state S if
values of exposed variables in S are the same as values in state of prefix of ISG
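
The exposure test and the "explains" relation can be sketched as follows. This is an illustration under stated assumptions, not the paper's formal definition: log order stands in for conflict order, a variable that no remaining operation accesses is treated as exposed, and the names `Op`, `is_exposed`, and `explains` are hypothetical.

```python
from typing import NamedTuple

class Op(NamedTuple):
    name: str
    reads: frozenset
    writes: frozenset

def is_exposed(var, remaining_ops):
    """remaining_ops is Ops(log) - I, listed in log order."""
    for op in remaining_ops:
        if var in op.reads:
            return True   # earliest access reads var, so its value matters
        if var in op.writes:
            return False  # blind write: the old value is never seen
    return True           # never touched again, so it is part of the final state

def explains(prefix_state, state, remaining_ops):
    """True if state agrees with prefix_state on every exposed variable."""
    return all(
        state[var] == value
        for var, value in prefix_state.items()
        if is_exposed(var, remaining_ops)
    )

O1 = Op("O1", frozenset({"z"}), frozenset({"x"}))  # x <- z + 1
O2 = Op("O2", frozenset(), frozenset({"x"}))       # x <- 25

# I = {O1}, so only O2 remains; suppose z = 4, hence O1 wrote x = 5.
after_O1 = {"x": 5, "z": 4}
x_exposed = is_exposed("x", [O2])
stale_x_ok = explains(after_O1, {"x": 999, "z": 4}, [O2])
wrong_z_ok = explains(after_O1, {"x": 5, "z": 0}, [O2])
```

Here x is unexposed after O1 because O2 overwrites it blindly, so a state with an arbitrary x is still explained by the prefix, while a state with the wrong z is not.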

8
Potentially Recoverable State

Potentially recoverable state: a state from which
the replay of a subset of operations of the conflict graph, in conflict order, will produce the recovered state Sfinal
Theorem: If S is a state explained by a prefix of the installation graph, then S is potentially recoverable

9
REDO Test and Recovery Procedure

REDO tests ops in conflict order (log scan)
Yes (true): replay operation
No (false): bypass operation
redo_set = {O | REDO(O, ...) is true, O on scanned log}
Recover procedure
Set log scan point to checkpoint
while not at log end
  O ← current log operation
  State ← if REDO(O, State, Log, Analysis)
          then O(State)
          else State
  Advance log scan point to next operation
end
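
The Recover procedure translates almost line for line into runnable code. The sketch below is one possible rendering, with assumptions labeled: operations are dictionaries holding a name and a state-transforming function, the checkpoint is an index into the log, and `redo_unless_installed` is a hypothetical REDO test that consults a set of already-installed operations.

```python
def recover(state, log, checkpoint, redo, analysis=None):
    """Scan the log from the checkpoint, replaying each op that passes the REDO test."""
    redo_set = []
    for op in log[checkpoint:]:
        if redo(op, state, log, analysis):
            state = op["apply"](state)   # Yes: replay the operation
            redo_set.append(op["name"])
        # No: bypass the operation and leave the state unchanged
    return state, redo_set

# Assumed example operations: O is x <- x + 1 and P is y <- x + 1.
LOG = [
    {"name": "O", "apply": lambda s: {**s, "x": s["x"] + 1}},
    {"name": "P", "apply": lambda s: {**s, "y": s["x"] + 1}},
]

def redo_unless_installed(op, state, log, analysis):
    """Hypothetical REDO test: replay whatever the analysis says is not installed."""
    return op["name"] not in analysis["installed"]

# Stable state after only y was flushed, so P counts as installed.
stable = {"x": 0, "y": 2}
recovered, redo_set = recover(stable, LOG, 0, redo_unless_installed, {"installed": {"P"}})
```

In this run only O is replayed, and the recovered state has x = 1 and y = 2.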

10
Recovery

Recoverable system: a system with
a potentially recoverable state Spot
Replay of O's in redo_set from Spot produces Sfinal
Inv: ops(Log) - redo_set defines a prefix of the installation state graph that explains State
Every system change must be an atomic transition maintaining Inv
Corollary: Given a state, log, checkpoint, and an execution of Recover (identifying redo_set)
If Inv holds
Then the system is recoverable

Only a specific potentially recoverable state is recoverable
11
Write Graph

Write graph: start from installation state graph
Collapse a set of nodes (result must stay acyclic): merges the nodes
Add new node for next operation
Add edge (collapse cycles)
Remove a write of an unexposed variable
We do not care about values of unexposed variables
Write graph captures entire system state
Prefix that is stable
Suffix in cache
Cache manager uses write graph
To maintain potentially recoverable state
Usually by collapsing suffix node into stable prefix
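
Node collapse and the resulting flush order can be sketched as below. This is an illustration only: the dictionary layout, the choice to collapse the two writers of x (O and Q), and the names `collapse` and `flush_order` are assumptions. The topological sort uses the standard-library `graphlib` module (Python 3.9 or later), which raises `CycleError` if the graph is not acyclic.

```python
from graphlib import TopologicalSorter

# Installation graph for the running example: O -> Q and P -> Q.
NODES = {
    "O": {"ops": ["O"], "writes": {"x": 1}},
    "P": {"ops": ["P"], "writes": {"y": 2}},
    "Q": {"ops": ["Q"], "writes": {"x": 3}},
}
EDGES = {("O", "Q"), ("P", "Q")}

def collapse(nodes, edges, first, second, merged):
    """Merge two nodes; the later node's write to a shared variable wins."""
    new_nodes = {k: v for k, v in nodes.items() if k not in (first, second)}
    new_nodes[merged] = {
        "ops": nodes[first]["ops"] + nodes[second]["ops"],
        "writes": {**nodes[first]["writes"], **nodes[second]["writes"]},
    }
    def rename(name):
        return merged if name in (first, second) else name
    new_edges = {(rename(u), rename(v)) for u, v in edges}
    new_edges.discard((merged, merged))  # drop the self-loop left by the merge
    return new_nodes, new_edges

def flush_order(nodes, edges):
    """One order in which nodes may be installed; fails on a cyclic graph."""
    sorter = TopologicalSorter()
    for name in nodes:
        sorter.add(name)
    for u, v in edges:
        sorter.add(v, u)  # u must be installed before v
    return list(sorter.static_order())

nodes2, edges2 = collapse(NODES, EDGES, "O", "Q", "n")
order = flush_order(nodes2, edges2)

# Installing nodes in flush order walks through the stable states.
state = {"x": 0, "y": 0}
states = [dict(state)]
for name in order:
    state.update(nodes2[name]["writes"])
    states.append(dict(state))
```

After the collapse the cache holds a single version of x, and the surviving edge from P to the merged node tells the cache manager to flush y before x.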

12
Write Graph via Node Collapse: Fewer States
x=0, y=0
O: readset = {x}, writes = {<x,1>}
Removed write-read edge; write graph remains acyclic; based on installation graph
Collapsed node n: Ops(n) = {O, Q}, Writes(n) = {<x,3>}
P: readset = {x}, writes = {<y,2>}
x=0, y=2
Q: readset = {x}, writes = {<x,3>}
Retained read-write edge translates to flush order for cache manager
Keep only one version of each variable in cache
x=3, y=2
13
Managing Recovery
Log: O1, O2, O3
Updating state: collapse to install X
Removing O3 from redo_set
Atomic: the state update and the redo_set change
Volatile state: suffix of write graph, in cache
14
Physiological Recovery
Physical and Logical Recovery described in paper

Physiological recovery (e.g. ARIES)
Operation form: read A, write A
Log: each op has an LSN
Variable tagged with LSN of last log op writing it
REDO: op's LSN > variable's LSN ⇒ Yes (replay)
Our explanation
Ops writing a variable are collapsed to one cache node
Flushing page to stable state (root of write graph)
Collapses cache node into stable state node
Keeps state potentially recoverable
redo test ⇒ node's ops removed from redo_set
Maintains invariant Inv
state change & redo_set change is atomic
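
The LSN comparison can be illustrated with a single page. This is a minimal sketch of the general LSN-based redo test, not ARIES itself: the page layout, the three arithmetic operations, and the function names are assumptions chosen for the example.

```python
def redo_test(op_lsn, page_lsn):
    """Replay only if the page has not yet seen this operation."""
    return op_lsn > page_lsn

def replay(page, log):
    """Apply each logged op whose LSN exceeds the page's LSN, updating the tag."""
    value, page_lsn = page["value"], page["lsn"]
    for op_lsn, fn in log:
        if redo_test(op_lsn, page_lsn):
            value, page_lsn = fn(value), op_lsn
    return {"value": value, "lsn": page_lsn}

# Three ops of the form "read A, write A", tagged with LSNs 1, 2, 3.
LOG = [
    (1, lambda a: a + 1),
    (2, lambda a: a * 2),
    (3, lambda a: a + 5),
]

never_flushed = {"value": 10, "lsn": 0}    # page as of the start of the log
flushed_after_2 = {"value": 22, "lsn": 2}  # page flushed after the op with LSN 2

from_start = replay(never_flushed, LOG)
from_flush = replay(flushed_after_2, LOG)
```

Both starting points recover to the same page. Because the page and its LSN tag reach disk together, flushing the page changes the state and shrinks redo_set in one atomic step.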

15
Extended LSN Method

Generalize physiological ops
read/write multiple variables
Our example: ops can read X, write Y (like P)
and also read X, write X
LSNs still effective for REDO test
Flush synchronizes change to state and redo_set
Cache management
Now requires flush of one variable before another
Our theory captures this "careful write" requirement
Consider B-tree split (B-link tree)
Next slide shows half split graphically
Must also post index term for new node

16
Extended Recovery: B-link Tree Split
New node Y
Old node X
x=0, y=0
Update node X
Move half to node Y: read X, write Y
P: readset = {x}, writes = {<y,2>}
x=0, y=2
Flush Y before X (in SQL Server 6.0)
Update node X: remove Y records
x=3, y=2
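
Why Y must reach disk before X can be seen by simulating a crash under each flush order. The sketch below is a toy model, not SQL Server's implementation: pages hold a list of keys and an LSN, the split key of 30 is arbitrary, and all names are hypothetical.

```python
import copy

SPLIT_KEY = 30

# Each log record: (LSN, page written, function computing that page's new keys).
LOG = [
    (1, "Y", lambda p: [k for k in p["X"]["keys"] if k >= SPLIT_KEY]),  # read X, write Y
    (2, "X", lambda p: [k for k in p["X"]["keys"] if k < SPLIT_KEY]),   # read X, write X
]

def replay(pages):
    """LSN-based redo: replay an op only if its LSN exceeds the target page's LSN."""
    pages = copy.deepcopy(pages)
    for lsn, target, fn in LOG:
        if lsn > pages[target]["lsn"]:
            pages[target] = {"lsn": lsn, "keys": fn(pages)}
    return pages

initial = {
    "X": {"lsn": 0, "keys": [10, 20, 30, 40]},
    "Y": {"lsn": 0, "keys": []},
}
cache = replay(initial)  # normal execution of the split

def crash_after_flushing(flushed):
    """Stable state holds only the flushed pages; recover from it after a crash."""
    stable = copy.deepcopy(initial)
    for page_id in flushed:
        stable[page_id] = copy.deepcopy(cache[page_id])
    return replay(stable)

y_first = crash_after_flushing(["Y"])  # careful write order respected
x_first = crash_after_flushing(["X"])  # X flushed before Y
```

When Y is flushed first, recovery replays only the update to X and reaches the cached state. When X is flushed first, the replayed move reads the already-trimmed X, so Y comes back empty and keys 30 and 40 are lost.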
17
Recoverable Systems: Summary

Cache management keeps state potentially recoverable
Very generally via write graph
Derived from installation state graph
Maintains invariant Inv
so that replayed operations are the correct set
By synchronizing changes to redo_set with changes to state

18
Questions?
19
Outline

Foundation
Conflict graph, state graphs, recovered state
Abstract Recovery
Cache management: maintaining state
Installation order: weaker update order than conflict order
Recovery
Recovery procedure, redo test
Invariant guarantees correct recovery
Coordinating state before failure with recovery execution after failure
Recoverable Systems
Write graphs for maintaining potentially recoverable state
Maintaining recovery invariant
Explaining current recovery methods

20
Managing the Cache

Stable state: prefix of write graph
Usually a single node
Means stable state is potentially recoverable
Cache usually contains write graph suffix
Volatile state, which is lost during a system crash
Usually collapsing nodes so that there is one node per variable
State update: move a minimum write graph node in cache to stable state atomically
Start with potentially recoverable state
Atomic transition: frequently a node collapse
New potentially recoverable state

21
Maintaining Recovery Invariant

Potentially recoverable state: only half of the job
Ops(log) - redo_set must explain state
Jobs need to be synchronized to enforce Inv
Examples: stable state is root of write graph
Logical recovery (in paper)
Physical recovery (in paper)
Physiological recovery
Extended recovery

22
Logical Recovery

Logical recovery with arbitrary log ops: System R
Quiesce and write shadow checkpoint to disk
By dumping cache contents to disk shadow pages
Disk shadow is installed atomically
Replacing old versions of shadow variables
Our explanation
Shadow coalesced on disk is a single write graph node
Encompassing all changes from last checkpoint
Hence is a write graph prefix
Shadow installed atomically via pointer swing
Accomplished by writing new pointer in checkpoint record to log
Log is truncated with the writing of the checkpoint record
All prior records are added to checkpoint
Which installs all earlier operations simultaneously with stable state update, hence maintaining Inv