Title: A Theory of Redo Recovery


1
A Theory of Redo Recovery
  • David Lomet
  • Microsoft Research, Redmond
  • Mark Tuttle
  • HP Research, Cambridge

2
Big Picture
Much simpler than our VLDB95 paper
  • Redo Recovery requires
  • Good db state
  • Replay of the right operations
  • Good state: updates in conflict order are not required
  • Write-read conflicts can be ignored
  • Some db variables are irrelevant (don't need to
    update them)
  • Synchronize state update & ops replayed
  • Captured in recovery invariant
  • We prove that maintaining the invariant ⇒ recovery
  • Current recovery methods maintain invariant
  • Show how current methods work (e.g. ARIES redo)
  • Show how new methods could work

3
Conflict State Graph (CSG)
  • Conflict graph (borrowed from concurrency
    control)
  • Nodes are log operations; edges are conflicts (RW,
    WR, WW)
  • State graph SG
  • Add writes(node) = {<name, value>} of vars
    updated
  • State for SG = {<x,v> | <x,v> in writes(n) and n is
    the last node in the state graph with x in vars(n)}
  • Final state Sfinal of CSG is the desired recovered
    state
  • Any prefix of a state graph is a state graph
  • Prefix: node in prefix ⇒ predecessors in prefix
  • State of any prefix of CSG can be recovered by
  • Replaying operations in suffix in conflict graph
    order
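The prefix and state definitions above can be sketched in a few lines of Python. This is only an illustration, assuming the three operations O, P, Q that the deck uses as its running example; the names `is_prefix`, `prefixes`, and `state_of` are hypothetical helpers, not from the slides.

```python
from itertools import combinations

# Running example: O writes <x,1>, P writes <y,2>, Q writes <x,3>.
NODES = ["O", "P", "Q"]  # listed in log order
WRITES = {"O": {"x": 1}, "P": {"y": 2}, "Q": {"x": 3}}
# Conflict edges as (predecessor, successor) pairs.
CONFLICT_EDGES = {("O", "P"), ("O", "Q"), ("P", "Q")}


def is_prefix(subset, edges):
    """A node set is a prefix if it contains every predecessor of its members."""
    return all(pred in subset for pred, succ in edges if succ in subset)


def prefixes(nodes, edges):
    """Enumerate every prefix of the graph (fine for tiny examples only)."""
    return [set(c) for r in range(len(nodes) + 1)
            for c in combinations(nodes, r) if is_prefix(set(c), edges)]


def state_of(prefix, initial):
    """Each variable takes the value written by the last prefix node writing it."""
    state = dict(initial)
    for node in NODES:
        if node in prefix:
            state.update(WRITES[node])
    return state


csg_prefixes = prefixes(NODES, CONFLICT_EDGES)
state_after_OP = state_of({"O", "P"}, {"x": 0, "y": 0})
```

With all three conflict edges present, the only prefixes are the empty set, {O}, {O,P}, and {O,P,Q}.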

We will relax CSG requirements
4
Conflict State Graph States
O: readset = {x}, writes = {<x,1>}
Write-read edge
Write-read, write-write, and read-write edge
P: readset = {x}, writes = {<y,2>}
Q: readset = {x}, writes = {<x,3>}
Read-write edge
5
Installation Graph
  • Example: initial stable state is <x,0>, <y,0>
  • O: x ← x+1
  • P: y ← x+1
  • After O, P, state is <x,1>, <y,2>
  • Flush y to disk: stable state is <x,0>, <y,2>
  • Replay O: generates correct state <x,1>, <y,2>
  • O's readset {x} is unchanged by P's installation
  • Even though Write-Read edge orders P after O
  • Installation graph
  • conflict graph without write-read edges
  • Installation state graph (ISG)
  • same writes(n) for node n as conflict state graph
  • State of any prefix of ISG can be recovered
  • More prefixes (states) because of fewer edges
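The example above can be replayed directly. The sketch below is a toy model, assuming the state is a plain dictionary and that "flushing y" copies only y's cached value to the stable state; `op_O`, `op_P`, and `is_prefix` are illustrative names.

```python
def op_O(state):
    # O: x <- x + 1 (reads x, writes x)
    return {**state, "x": state["x"] + 1}


def op_P(state):
    # P: y <- x + 1 (reads x, writes y)
    return {**state, "y": state["x"] + 1}


def is_prefix(subset, edges):
    """A node set is a prefix if it contains every predecessor of its members."""
    return all(pred in subset for pred, succ in edges if succ in subset)


initial = {"x": 0, "y": 0}
final = op_P(op_O(initial))              # cached state after O, P

stable = {**initial, "y": final["y"]}    # only y is flushed: <x,0>, <y,2>
recovered = op_O(stable)                 # replay O alone

# The write-read edge O -> P exists in the conflict graph
# but is dropped in the installation graph.
conflict_edges = {("O", "P")}
installation_edges = set()
```

Because the write-read edge is gone, {P} alone is a prefix of the installation graph, which is why the state <x,0>, <y,2> is recoverable even though P follows O in conflict order.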

y written by P
6
Installation State Graph States
x0,y0
O: readset = {x}, writes = {<x,1>}
Removed write-read edge
x1,y0
ISG recoverable state
P: readset = {x}, writes = {<y,2>}
Retained write-write and read-write edge
x0,y2
x1, y2
Q: readset = {x}, writes = {<x,3>}
Retained read-write edge
x3, y2
7
Exposed Variables
  • Example
  • O1: x ← z+1
  • O2: x ← 25
  • After O2, we don't care about the x value of O1
  • Variable x is unexposed after ops I (O1 here)
    if
  • the minimal (in conflict order) op in Ops(log) - I writes x
  • Without reading it
  • x's value is a "don't care" when x is unexposed
  • This is example of Physical Logging
  • Prefix of installation graph explains state S if
    values of exposed variables in S are the same as
    values in state of prefix of ISG
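A small sketch of the exposed-variable test and the "explains" relation may help. It assumes the remaining operations (Ops(log) - I) are given in log order, and it treats a variable that no remaining operation touches as exposed, since its value must then already be final; that last choice is an assumption of this sketch, not something the slide states. All names are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    reads: frozenset
    writes: frozenset


def is_exposed(var, remaining):
    """Look at the first remaining op (in log order) that accesses var."""
    for op in remaining:
        if var in op.reads:
            return True      # value will be read, so it matters
        if var in op.writes:
            return False     # blind overwrite: value is a "don't care"
    return True              # assumption: untouched variables must be final


def explains(prefix_state, state, remaining):
    """prefix_state explains state if they agree on every exposed variable."""
    return all(prefix_state[v] == state[v]
               for v in prefix_state if is_exposed(v, remaining))


O1 = Op("O1", reads=frozenset({"z"}), writes=frozenset({"x"}))  # x <- z + 1
O2 = Op("O2", reads=frozenset(), writes=frozenset({"x"}))       # x <- 25

x_exposed_after_O1 = is_exposed("x", [O2])
z_exposed_after_O1 = is_exposed("z", [O2])
# With I = {O1}, any garbage in x is tolerated because O2 overwrites it.
ok = explains({"x": 6, "z": 5}, {"x": 999, "z": 5}, [O2])
```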

8
Potentially Recoverable State
  • Potentially recoverable state: a state that,
  • by the replay of a subset of operations of the
    conflict graph, in conflict order, will produce
    the recovered state Sfinal
  • Theorem: If S is a state explained by a prefix of
    the installation graph, then S is potentially
    recoverable

9
REDO Test and Recovery Procedure
  • REDO tests ops in conflict order during the log scan
  • Yes (true): replay operation
  • No (false): bypass operation
  • redo_set = {O | REDO(O, ..), O on scanned log}
  • Recover procedure
  • Set log scan point to checkpoint
  • While not at log end
  • O ← current log operation
  • State ← if REDO(O, State, Log, Analysis)
  • then O(State)
  • else State
  • Advance log scan point to next operation
  • End
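The Recover procedure above translates almost line for line into Python. The sketch below assumes the log is a list, the checkpoint is an index into it, and the REDO test is passed in as a function. The `LogOp` type and the `redo_unless_installed` test are illustrative stand-ins, not a real system's REDO test.

```python
from typing import Callable, NamedTuple


class LogOp(NamedTuple):
    name: str
    apply: Callable[[dict], dict]   # maps a state to the next state


def recover(state, log, checkpoint, redo, analysis=None):
    """Scan the log from the checkpoint; replay each op that passes REDO."""
    redo_set = []
    for op in log[checkpoint:]:
        if redo(op, state, log, analysis):
            state = op.apply(state)      # Yes: replay operation
            redo_set.append(op.name)
        # No: bypass operation, state is unchanged
    return state, redo_set


log = [
    LogOp("O", lambda s: {**s, "x": s["x"] + 1}),   # O: x <- x + 1
    LogOp("P", lambda s: {**s, "y": s["x"] + 1}),   # P: y <- x + 1
]


def redo_unless_installed(op, state, log, analysis):
    # Hypothetical test: 'analysis' is the set of ops already installed.
    return op.name not in analysis


# Stable state after flushing only y; P is installed, O is not.
recovered, redo_set = recover({"x": 0, "y": 2}, log, 0,
                              redo_unless_installed, analysis={"P"})
```

Returning `redo_set` alongside the state makes it easy to check that the operations outside `redo_set` are exactly the ones the stable state already reflects.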

10
Recovery
  • Recoverable system: a system with
  • a potentially recoverable state Spot
  • Replay of the ops in redo_set from Spot produces
    Sfinal
  • Inv: ops(Log) - redo_set defines a prefix of the
    installation state graph that explains State
  • Every system change must be an atomic transition
    maintaining Inv
  • Corollary: Given a state, log, checkpoint, and an
    execution of Recover (identifying redo_set)
  • If Inv holds
  • Then System is recoverable

Only specific potentially recoverable state is
recoverable
11
Write Graph
  • Write graph: start from the installation state graph
  • Collapse a set of nodes (keeping the graph acyclic): merges nodes
  • Add new node for next operation
  • Add edge (collapse cycles)
  • Remove a write of an unexposed variable
  • We do not care about values of unexposed
    variables
  • Write graph captures entire system state
  • Prefix that is stable
  • Suffix in cache
  • Cache Manager uses write graph
  • To maintain potentially recoverable state
  • Usually by collapsing suffix node into stable
    prefix
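One way to picture how a cache manager could use a write graph is as a flush-order constraint. The sketch below assumes a two-node write graph in which the node writing y must be installed before the node writing x (the retained read-write edge of the running example); the node names and the `flush_in_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Write graph: node -> set of nodes that must be installed before it.
write_graph = {"n_y": set(), "n_x": {"n_y"}}
# Values each cache node would write when flushed.
node_writes = {"n_y": {"y": 2}, "n_x": {"x": 3}}


def flush_in_order(stable, graph, writes):
    """Install nodes one at a time, always picking a minimal node first."""
    states = [dict(stable)]
    for node in TopologicalSorter(graph).static_order():
        stable = {**stable, **writes[node]}   # atomic install of one node
        states.append(stable)
    return states


stable_states = flush_in_order({"x": 0, "y": 0}, write_graph, node_writes)
```

Each intermediate entry of `stable_states` corresponds to a prefix of the write graph, so under the theory each one is a potentially recoverable state.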

12
Write Graph via Node Collapse: Fewer States
x0,y0
O: readset = {x}, writes = {<x,1>}
Removed write-read edge. Write graph remains
acyclic. Based on installation graph.
Ops(n) = {O,P}, Writes(n) = {<x,3>}
P: readset = {x}, writes = {<y,2>}
x0,y2
Q: readset = {x}, writes = {<x,3>}
Retained read-write edge translates to flush
order for cache manager
Keep only one version of each variable in cache
x3, y2
13
Managing Recovery
[Figure labels: Log (O1, O2, O3); Updating State; Atomic; Collapse to Install; X; Volatile State: Suffix of Write Graph in Cache; Removing O3 from redo_set]
14
Physiological Recovery
Physical and Logical Recovery described in paper
  • Physiological recovery (e.g. ARIES)
  • Operation form: read A, write A
  • Each log op has an LSN
  • Each variable is tagged with the LSN of the last log op writing it
  • REDO test: op's LSN > variable's LSN ⇒ Yes (replay)
  • Our explanation
  • Ops writing a variable are collapsed to one cache node
  • Flushing page to stable state (root of write
    graph)
  • Collapses cache node into stable state node
  • Keeps state potentially recoverable
  • REDO test ⇒ node's ops are removed from redo_set
  • Maintains invariant Inv
  • State change and redo_set change are atomic
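The LSN-based REDO test can be sketched as follows. This is a simplified model, assuming each operation adds a delta to a single page and that flushing a page writes its value together with its LSN; the `Page` and `LogRecord` types are illustrative and are not the ARIES data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    value: int
    lsn: int          # LSN of the last log op that wrote this page


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    page_id: str
    delta: int        # op: read page, write page.value + delta


def apply(rec, pages):
    old = pages[rec.page_id]
    return {**pages, rec.page_id: Page(old.value + rec.delta, rec.lsn)}


def redo_pass(stable_pages, log):
    pages = dict(stable_pages)
    replayed = []
    for rec in log:
        # REDO test: replay only if the page has not seen this op yet.
        if rec.lsn > pages[rec.page_id].lsn:
            pages = apply(rec, pages)
            replayed.append(rec.lsn)
    return pages, replayed


log = [LogRecord(1, "A", 5), LogRecord(2, "B", 7), LogRecord(3, "A", 1)]
# Page A was flushed after LSN 1; page B was never flushed.
stable = {"A": Page(5, 1), "B": Page(0, 0)}
pages, replayed = redo_pass(stable, log)
```

Because the page LSN is written in the same flush as the page contents, the state change and the change to the effective redo_set happen together, which is the atomicity the slide refers to.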

15
Extended LSN Method
  • Generalize physiological ops
  • read/write multiple variables
  • Our example ops can read X, write Y (like P)
  • also read X, write X
  • LSNs still effective for REDO test
  • Flush synchronizes change to state and redo_set
  • Cache management
  • Now requires flush of one variable before another
  • Our theory captures this careful write
    requirement
  • Consider B-tree split (Blink-tree)
  • Next slide shows half split graphically
  • Must also post index term for new node

16
Extended Recovery Blink-tree Split
New Node Y
Old Node X
x0,y0
Update Node X
Move half to node Y: read X, write Y
P: readset = {x}, writes = {<y,2>}
x0,y2
Flush Y before X (in SQL Server 6.0)
Update node X: remove Y records
x3, y2
17
Recoverable Systems Summary
  • Cache management keeps state potentially
    recoverable
  • Very generally via write graph
  • Derived from installation state graph
  • Maintains invariant INV
  • so that replayed operations are correct set
  • By synchronizing changes to redo_set with changes
    to state

18
Questions?
19
Outline
  • Foundation
  • Conflict graph, state graphs, recovered state
  • Abstract Recovery
  • Cache Management maintaining state
  • Installation order: a weaker update order than
    conflict order
  • Recovery
  • Recovery procedure, redo test
  • Invariant guarantees correct recovery
  • Coordinating state before failure with recovery
    execution after failure
  • Recoverable Systems
  • Write graphs for maintaining potentially
    recoverable state
  • Maintaining recovery invariant
  • Explaining current recovery methods

20
Managing the Cache
  • Stable state: prefix of write graph
  • Usually a single node
  • Means stable state is potentially recoverable
  • Cache usually contains write graph suffix
  • Volatile state, which is lost during a system crash
  • Usually collapsing nodes so that there is one node per
    variable
  • State update: move a minimum write graph node in
    cache to stable state atomically
  • Start with potentially recoverable state
  • Atomic transition: frequently a node collapse
  • New potentially recoverable state

21
Maintaining Recovery Invariant
  • Potentially recoverable state is only half of the job
  • Ops(log) - redo_set must explain state
  • Both jobs need to be synchronized to enforce Inv
  • Examples: stable state is root of write graph
  • Logical recovery (in paper)
  • Physical recovery (in paper)
  • Physiological recovery
  • Extended recovery

22
Logical Recovery
  • Logical recovery with arbitrary log ops: System
    R
  • Quiesce and write shadow checkpoint to disk
  • By dumping cache contents to disk shadow pages
  • Disk shadow is installed atomically
  • Replacing old versions of shadow variables
  • Our explanation
  • Shadow coalesced on disk is single write graph
    node
  • Encompassing all changes from last checkpoint
  • Hence is a write graph prefix
  • Shadow installed atomically via pointer swing
  • Accomplished by writing new pointer in checkpoint
    record to log
  • Log is truncated with the writing of the checkpoint
    record
  • All prior records are added to checkpoint
  • Which installs all earlier operations
    simultaneously with stable state update, hence
    maintaining Inv

23
Physical Recovery
  • Physical recovery writes entire page
  • Pages are written back to disk
  • When prefix of log contains only pages already
    written back, log is truncated
  • Via checkpoint record indicating redo pass start
  • All records scanned during recovery are replayed
  • REDO(op) is always yes
  • Our explanation
  • Operations are blind writes of a single variable;
    the read set is empty
  • All variables with operations not in checkpoint
    are unexposed
  • These operations are replayed during recovery
  • They never read
  • Writing to those variables leaves them unexposed
  • However, they are now set to be installed
  • Installation occurs when checkpoint record is
    written
  • Operations now not part of redo scan are thus
    installed

24
Our Goal
  • REDO recovery explanation (not all of recovery)
  • Cache management: stage data to stable state
  • Goal: fewer writes, less constrained order
  • Some methods require careful write ordering: why?
  • Recovery: which ops to replay
  • And how to coordinate state changes with replay
    changes
  • Provably ensure recoverability
  • Disclaimers
  • Abstract story: real recovery needs more
  • Simpler operation model than past work
  • Not everything is explained
  • All actually used recovery techniques are handled
  • But not all recovery techniques we know of are
    quite captured

25
System Model
  • State: {<name, value>}
  • Operation
  • readset(O): set of variables read by O
  • writeset(O): set of variables written by O
  • Operations are atomic: the system must ensure
    atomicity
  • Operation sequence
  • Sequence of ops O1, O2, ..., Ok = Ofinal
  • State sequence
  • Sequence of states S1, S2, ..., Sk = Sfinal
    generated by the op sequence from S0
  • Ok leads to Sk when executed
    against Sk-1
  • Recovery goal
  • From some state and a record of operations (on
    log)
  • Reproduce last state in sequence Sfinal
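The system model can be written down as a fold over the operation sequence. The sketch assumes states are dictionaries and operations are functions from state to state. The definition of `op_Q` as x ← x+2 is an assumption chosen only so that Q writes <x,3> as in the earlier figures.

```python
from itertools import accumulate


def op_O(s):
    return {**s, "x": s["x"] + 1}    # O: x <- x + 1


def op_P(s):
    return {**s, "y": s["x"] + 1}    # P: y <- x + 1


def op_Q(s):
    return {**s, "x": s["x"] + 2}    # assumed form of Q; writes <x,3> here


S0 = {"x": 0, "y": 0}
ops = [op_O, op_P, op_Q]

# states[k] is Sk: the result of executing Ok against S(k-1).
states = list(accumulate(ops, lambda s, op: op(s), initial=S0))
S_final = states[-1]
```

The recovery goal is then to reproduce `S_final` from some earlier, possibly partially installed, state plus the logged operations.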