A chicken in every pot: a persistent snapshot memory scaled in time - PowerPoint PPT Presentation

About This Presentation
Slides: 49
Provided by: csBra

Transcript and Presenter's Notes


1
A chicken in every pot: a persistent snapshot memory scaled in time
  • Liuba Shrira and Hao Xu
  • Brandeis University

2
Storage systems: the 7-year itch
  • 1984: rotational delay, FFS
  • 1991: large memory, LFS
  • 1998: cheaper disk, Elephant
  • 2005: ... a chicken in every pot,
  • a snapshot box on the side...

3
Trends
  • Hardware: disk
  • Cheap ($1/GB) and getting cheaper
  • Software industry: Forbes (12/2004) says
  • the need for keeping past state is growing

4
Trends cont.
  • A casino chases a card counter
  • An IT department is chased by Sarbanes-Oxley
  • A Hippocratic DB is audited for patient privacy
    preservation
  • Need to analyze past activity

5
SNAP: a snapshot system for an object storage system
  • Goal
  • Storage system capability for
  • back-in-time execution (BITE):
  • the application runs against
  • read-only snapshots
  • without synchronization, for analysis in
    retrospect

6
Baseline Requirements for BITE
  • Consistent snapshots: the same (old) invariants hold
  • BITE of general code: after-the-fact, ad hoc analysis
    (vs. predefined SQL access methods)
  • App chooses the snapshot: the snapshot state is meaningful
    to the app (vs. some arbitrary time in the past)
  • High time resolution: fine-grained past analysis
    (vs. backup for recovery)

7
Over long time-scales...
  • Living with the past: how close?
  • today: too close (Temporal DB, CVFS)
  • or too far (warehouse, e.g. Netezza)
  • Snapshots can be of long-term importance, or transient
  • today: uniform, so apps cannot discriminate
  • Inherent tension:
  • latency of access vs.
    cost of representation (space and time)
  • today: limited adaptation (compress or not)

8
Capturing past states
  • Two ways
  • Cheap: no-overwrite update
  • the past stays put; the new state is copied
  • less to write, but
  • bloated DB; the past inherits the same representation
  • Opportunistic: in-place update
  • the past is copied out, separated
  • more to write, but can write smartly,
  • tailor the past's representation, and the DB stays clustered (vigor)
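A toy Python model of the two ways, for illustration only: it assumes pages are dictionary entries and that a snapshot is declared before every update, so every overwritten state must be kept. The class names are hypothetical. Counting page writes shows the trade-off: no-overwrite writes less but keeps every version inside the database, while in-place update pays an extra copy-out write per overwrite and leaves the database holding only current states, with the past in a separate append-only archive.

```python
class NoOverwriteStore:
    """No-overwrite update: each update writes a new copy; old copies stay put,
    so the past lives inside the DB with the same representation."""

    def __init__(self):
        self.versions = {}   # page id -> states, oldest first (all in the DB)
        self.writes = 0

    def update(self, page, state):
        self.versions.setdefault(page, []).append(state)
        self.writes += 1     # one write: the new copy


class InPlaceCopyOutStore:
    """In-place update: the old state is copied out to a separate archive just
    before the DB page is overwritten, so the DB holds only current states."""

    def __init__(self):
        self.db = {}         # page id -> current state
        self.archive = []    # (page id, past state), appended sequentially
        self.writes = 0

    def update(self, page, state):
        if page in self.db:
            self.archive.append((page, self.db[page]))
            self.writes += 1  # extra write: copy out the past state
        self.db[page] = state
        self.writes += 1      # the in-place write itself


a, b = NoOverwriteStore(), InPlaceCopyOutStore()
for s in ["x0", "x1", "x2"]:
    a.update("P", s)
    b.update("P", s)
print(a.writes, b.writes)   # 3 5
print(b.db, b.archive)      # {'P': 'x2'} [('P', 'x0'), ('P', 'x1')]
```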

9
Our requirements
  • Non-disruptive past: just the right distance
    (separated)
  • At an adaptive distance,
  • e.g. faster BITE on more recent states
  • Discriminated past:
  • the application classifies, the snapshot system
    filters
  • some snapshots outlive others,
  • some can be accessed faster
  • Flexible classification, e.g. after the fact

10
Snapshot system operations
  • Request to take a snapshot (declaration):
  • sid = snapshot_request(filter_spec)
  • Request to access snapshot v:
  • snapshot_access(sid)
  • Request to specify a filter for snapshot v:
  • lazy_filter(sid, filter_spec)
  • Example history (transactions T, snapshot declarations S): T1, T2, S1, T3, T4, T5, S2, ...
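A minimal in-memory sketch of these operations, replaying the example history from this slide. Only the names snapshot_request, snapshot_access, lazy_filter, sid and filter_spec come from the slide; the SnapshotSystem class, the commit helper, the string filter specs and the eager copy at declaration time are assumptions made to keep the example short (the system described here copies snapshot states out lazily, on write).

```python
import itertools
import types


class SnapshotSystem:
    """Hypothetical in-memory stand-in for the three snapshot operations."""

    def __init__(self):
        self._sids = itertools.count(1)
        self.db = {}        # current committed state: object id -> value
        self.states = {}    # sid -> state as of the declaration
        self.filters = {}   # sid -> filter spec (None = not classified yet)

    def commit(self, writes):
        """Apply one committed transaction's writes (T1, T2, ... in the history)."""
        self.db.update(writes)

    def snapshot_request(self, filter_spec=None):
        """Declare a snapshot of the current committed state; return its sid.
        Simplification: copies everything now instead of copying out on write."""
        sid = next(self._sids)
        self.states[sid] = dict(self.db)
        self.filters[sid] = filter_spec
        return sid

    def snapshot_access(self, sid):
        """Read-only view of snapshot sid, for back-in-time execution."""
        return types.MappingProxyType(self.states[sid])

    def lazy_filter(self, sid, filter_spec):
        """Classify snapshot sid after the fact."""
        self.filters[sid] = filter_spec


snap = SnapshotSystem()
snap.commit({"x": 1})                        # T1
snap.commit({"y": 1})                        # T2
s1 = snap.snapshot_request()                 # S1: classification deferred
snap.commit({"x": 2})                        # T3
snap.commit({"y": 2})                        # T4
snap.commit({"x": 3})                        # T5
s2 = snap.snapshot_request("short-lived")    # S2: classified at declaration
snap.lazy_filter(s1, "long-lived")           # S1 classified after the fact
print(dict(snap.snapshot_access(s1)))        # {'x': 1, 'y': 1}
```

Returning a read-only mapping mirrors the requirement that BITE runs against read-only snapshots.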

11
Baseline storage system
  • General interface:
  • pages and a page table;
  • transactions access objects on pages
  • Server
  • DB disk: slotted pages of objects,
  • physical oid (page, o),
  • and a page table
  • Transaction log
  • Cache: pages and a modified-object cache
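A sketch of the physical addressing, assuming a Python dictionary stands in for the page table (a real page table maps page numbers to disk locations). A physical oid is a (page, slot) pair: the page table locates the page, and the page's slot array locates the object. Oid, SlottedPage and ObjectStore are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple


class Oid(NamedTuple):
    """Physical object id: (page number, slot number within that page)."""
    page: int
    slot: int


@dataclass
class SlottedPage:
    """A slot array indexing the objects stored on one page."""
    slots: List[bytes] = field(default_factory=list)

    def add(self, obj: bytes) -> int:
        self.slots.append(obj)
        return len(self.slots) - 1


class ObjectStore:
    def __init__(self) -> None:
        # Hypothetical page table: page number -> page (in place of a disk address).
        self.page_table: Dict[int, SlottedPage] = {}

    def new_page(self) -> int:
        pno = len(self.page_table)
        self.page_table[pno] = SlottedPage()
        return pno

    def insert(self, pno: int, obj: bytes) -> Oid:
        return Oid(pno, self.page_table[pno].add(obj))

    def fetch(self, oid: Oid) -> bytes:
        # Two-step lookup: page table -> page, then slot array -> object.
        return self.page_table[oid.page].slots[oid.slot]


store = ObjectStore()
p = store.new_page()
oid_a = store.insert(p, b"patient-17")
oid_b = store.insert(p, b"vitals")
print(oid_b, store.fetch(oid_b))   # Oid(page=0, slot=1) b'vitals'
```

Because every object lookup goes through the page table, swapping in a different page table is enough to redirect reads.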

12
Storage system, cont.: optimistic CC, ARIES
  • Clients
  • fetch pages, run transactions,
  • send modified objects to the server
  • Server
  • validates, commits (WAL),
  • caches committed modifications
  • no-force, no-STEAL

13
The snapshot system
  • Archive separated from DB
  • Archive I/O sequential, DB I/O random
  • Copy-on-write (COW):
  • copy out snapshot states into the archive
  • just before updating the DB,
  • during cleaning
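The copy-out rule can be sketched as follows, under simplifying assumptions: snapshots are numbered 1, 2, ..., and every clean() call writes back a modification committed after the latest snapshot (modifications that straddle a snapshot declaration are not modeled). CopyOutArchiver and its fields are hypothetical. A page's pre-state is appended to the archive only on its first overwrite after each snapshot, and the archive is an append-only list, mirroring its sequential I/O.

```python
class CopyOutArchiver:
    """Copy-on-write copy-out at cleaning time (simplified, hypothetical model).

    Assumes every clean() call writes back a modification committed after the
    latest snapshot; modifications straddling a declaration are not modeled.
    """

    def __init__(self, db):
        self.db = db                # page id -> on-disk state
        self.archive = []           # sequential log of (snapshot, page id, state)
        self.latest = 0             # number of the latest declared snapshot
        self.archived_for = {}      # page id -> last snapshot its pre-state was saved for

    def declare_snapshot(self):
        self.latest += 1
        return self.latest

    def clean(self, page, new_state):
        v = self.latest
        if v > 0 and self.archived_for.get(page, 0) < v:
            # First overwrite of this page since snapshot v: copy out its state at v.
            self.archive.append((v, page, self.db[page]))
            self.archived_for[page] = v
        self.db[page] = new_state   # then update the DB page in place


arch = CopyOutArchiver({"P": "p0", "S": "s0"})
arch.declare_snapshot()        # v1
arch.clean("P", "p1")          # first overwrite since v1: copy out p0
arch.clean("P", "p2")          # already archived for v1: no copy
arch.declare_snapshot()        # v2
arch.clean("P", "p3")          # first overwrite since v2: copy out p2
print(arch.archive)            # [(1, 'P', 'p0'), (2, 'P', 'p2')]
```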

14
Snapshot interface
  • Same as the DB:
  • Snapshot Pages
  • Snapshot Page Table
  • So BITE is transparent
  • BITE on snapshot S(v) uses PageTable(v)

15
Snapshot system: below the interface
  • Some S(v) pages are in the archive,
  • some in the DB,
  • and pages in the archive can have
  • different representations

16
BITE(v): namespace redirection
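Namespace redirection can be sketched as a page read that consults an optional snapshot page table, assuming archive addresses are list indices and that PageTable(v) lists only the pages whose S(v) state was copied out; read_page and the data layout are hypothetical. The caller's code is the same for current and past reads; only the table passed in changes.

```python
from typing import Dict, List, Optional


def read_page(page: str, db: Dict[str, str], archive: List[str],
              snapshot_pt: Optional[Dict[str, int]] = None) -> str:
    """Read `page` through a page table.

    snapshot_pt=None means the current database. Otherwise snapshot_pt plays
    the role of PageTable(v): it maps each page whose S(v) state was copied out
    to its archive address (here, a list index). For pages it does not list,
    the DB copy still holds the S(v) state.
    """
    if snapshot_pt is not None and page in snapshot_pt:
        return archive[snapshot_pt[page]]   # redirected into the archive
    return db[page]                          # shared with the current DB


db = {"P": "p3", "S": "s0"}
archive = ["p0", "p2"]        # copied-out states, in archive order
pt_v1 = {"P": 0}              # S(v1): P was copied out to address 0
print(read_page("P", db, archive),          # p3 (current state)
      read_page("P", db, archive, pt_v1),   # p0 (redirected)
      read_page("S", db, archive, pt_v1))   # s0 (shared with the DB)
```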
17
Creating non-disruptive snapshots (I/O-bound system)
  • Archiving snapshot states when cleaning
  • can slow down cleaning
  • compared to a system without snapshots.
  • Copying to the archive disk (sequential I/O)
  • in parallel with
  • database I/O (random)
  • can partially hide the archiving cost
  • behind database I/O.

18
Creating snapshots: how well can you hide?
  • Determined by
  • how much is archived:
  • compactness of snapshot representation,
  • snapshot frequency,
  • update workload (overwriting)
  • cost of archiving:
  • sequential I/O, other archive traffic (BITE)

19
Creating snapshots: some issues
  • Issue
  • avoid overwriting snapshot states
  • (without blocking, pinning, etc.)
  • Issue
  • update snapshot metadata efficiently
  • (large, dynamic page tables)
  • Issue
  • filter out long-lived snaps (focus here)

20
New techniques for copy-out snapshots
  • VMOB: an in-memory versioned data structure that
    preserves snapshot states without blocking
  • LPT: an incrementally archived page table with
    logarithmic reconstruction cost
  • Filtering: exploit a smart representation for
  • past states (focus here)

21
Filtering: motivation
  • Want unlimited past at high resolution
  • but
  • some snapshots are transient
  • others of long-term interest to application
  • application needs to discriminate between
    snapshots

22
Thresher: a filtering system for SNAP
23
Snapshot representation
  • What can representation do for filtering?
  • lifetime-based allocation:
  • avoids fragmentation
  • diff-based encoding:
  • reduces the cost of copying
  • adaptive combination:
  • the real winner

24
Example: hierarchical snapshots at multiple time granularities
  • An ICU patient-monitoring DB takes snapshots:
  • minute by minute: vital-sign monitor readings
  • hourly: includes the nurses' write-up summarizing
    the monitor readings
  • daily: includes the doctors' notes summarizing
    the nurses' checkups
  • Doctors' snapshots have a longer lifetime than nurses'
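One way an application could express this hierarchy as a filter spec is to classify each snapshot by the coarsest granularity its timestamp falls on. The sketch assumes one snapshot per minute, timestamps in minutes since midnight of the first day, and made-up retention horizons (minute snapshots kept for an hour, hourly ones for a week, daily ones forever); Lifetime, classify, RETAIN_FOR and retained are hypothetical names, not Thresher's filter-spec format.

```python
from enum import IntEnum


class Lifetime(IntEnum):
    """Hypothetical lifetime classes, shortest-lived first."""
    MINUTE = 0   # vital-sign monitor readings
    HOURLY = 1   # adds the nurses' write-ups
    DAILY = 2    # adds the doctors' notes


def classify(minute: int) -> Lifetime:
    """Assumed filter spec: class of the snapshot taken at `minute`."""
    if minute % (24 * 60) == 0:
        return Lifetime.DAILY
    if minute % 60 == 0:
        return Lifetime.HOURLY
    return Lifetime.MINUTE


# Made-up retention horizons in minutes (None = keep forever).
RETAIN_FOR = {Lifetime.MINUTE: 60, Lifetime.HOURLY: 7 * 24 * 60, Lifetime.DAILY: None}


def retained(minute: int, now: int) -> bool:
    """Is the snapshot taken at `minute` still kept at time `now`?"""
    horizon = RETAIN_FOR[classify(minute)]
    return horizon is None or now - minute < horizon


now = 3 * 24 * 60   # three days of minute-by-minute snapshots
kept = [m for m in range(now + 1) if retained(m, now)]
print(len(kept))    # 132: 4 daily + 69 hourly + 59 recent minute snapshots
```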

25
Brief overview: snapshot creation
  • Some notation:
  • snapshot span
  • recorded pages
  • Example:
  • ... v4, T w(x_P), T' w(y_S), v5, T'' ...
    (T w(x_P): transaction T writes object x on page P)
  • Span of v4: T, T'
  • Pages recorded by snapshot v4: P, S
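The two definitions can be computed from a history, assuming a simple event encoding (hypothetical): ("snap", name) for a declaration and ("txn", name, pages_written) for a committed transaction. The transaction names T' and T'' and the pages written by T'' are assumptions for illustration; T'' writes P and Q so that v5 records the pages shown for it on the next slide.

```python
def spans_and_recorded(history):
    """Return {snapshot: (span, recorded_pages)} for a history of events.

    Each event is ("snap", name) for a snapshot declaration or
    ("txn", name, pages_written) for a committed transaction.
    span(v): transactions committed after v and before the next snapshot.
    recorded(v): pages written in span(v); each one's pre-state (its state at v)
    is what gets copied out for v on its first overwrite.
    """
    result = {}
    current = None
    for event in history:
        if event[0] == "snap":
            current = event[1]
            result[current] = ([], set())
        elif current is not None:          # ignore transactions before any snapshot
            _, txn, pages = event
            span, recorded = result[current]
            span.append(txn)
            recorded.update(pages)
    return result


# The slide's example: T writes x on page P, T' writes y on page S.
# T'' and its pages are made up to give v5 something to record.
history = [("snap", "v4"), ("txn", "T", {"P"}), ("txn", "T'", {"S"}),
           ("snap", "v5"), ("txn", "T''", {"P", "Q"})]
for v, (span, recorded) in spans_and_recorded(history).items():
    print(v, span, sorted(recorded))
# v4 ['T', "T'"] ['P', 'S']
# v5 ["T''"] ['P', 'Q']
```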

26
Incremental snapshot creation
  • Archived snapshot pages are dispersed
  • archive (time ->): v4: P, S | v5: P, Q ...
  • Archived snapshot page tables (PT)
  • PT(v4): addr(P4), addr(S4) | PT(v5): addr(P5), addr(Q5) ...
  • Another talk: how to construct archived page
    tables
  • Construct APT(v4) = recorded(v4)
    Construct APT(v5)
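The construction itself is deferred to another talk. As a stand-in, here is a straightforward, non-incremental backward pass (not the LPT technique) based on one reading of the definitions: a page that v did not record was unchanged throughout v's span, so S(v) shares the copy recorded by the next snapshot that did record it, and a page that no later snapshot recorded is still current in the DB. Addresses are modeled as (page, snapshot) pairs, so ("Q", "v5") stands for addr(Q5); the function name is hypothetical. Note that APT(v4) ends up pointing at Q5, which is why the filtering example on the next slide must retain Q5.

```python
def build_archived_page_tables(order, recorded):
    """Build APT(v) for each snapshot with one backward pass (hypothetical helper).

    order: snapshot names in declaration order.
    recorded: snapshot -> set of pages it recorded.
    An APT maps page -> archive address, modeled as (page, snapshot that
    recorded the copy). Pages missing from APT(v) are read from the DB.
    """
    apts = {}
    younger = {}                         # APT of the next snapshot in time
    for v in reversed(order):
        apt = dict(younger)              # pages v did not record: share younger copies
        for page in recorded[v]:
            apt[page] = (page, v)        # v's own copy, e.g. ("P", "v4") for P4
        apts[v] = apt
        younger = apt
    return apts


apts = build_archived_page_tables(["v4", "v5"],
                                  {"v4": {"P", "S"}, "v5": {"P", "Q"}})
print(sorted(apts["v4"].values()))   # [('P', 'v4'), ('Q', 'v5'), ('S', 'v4')]
print(sorted(apts["v5"].values()))   # [('P', 'v5'), ('Q', 'v5')]
```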

27
Filtering example: filter out short-lived v5
  • Doctors: v4 | Nurses: v5
  • archive (time ->): v4: P, S | v5: P, Q | v6
  • Filter long-lived v4, reclaim v5
  • reclaim P5
  • retain Q5 (v4 needs it)
  • filtering incremental snapshots creates
    fragmentation
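Which archived copies can be reclaimed follows from a page-lookup rule: a copy stays live if some retained snapshot reads it, where S(v) reads each page from the copy recorded by the earliest snapshot at or after v that recorded it (an assumption consistent with "retain Q5, v4 needs it"). The sketch below, with a hypothetical reclaimable helper, reproduces this slide's outcome assuming v6 records no pages, then marks where the reclaimed copy sits in an incrementally written archive to show the resulting hole.

```python
def reclaimable(order, recorded, retained):
    """Archived copies that no retained snapshot can reach (hypothetical helper).

    S(v) reads page X from the copy recorded by the earliest snapshot at or
    after v that recorded X.
    """
    live = set()
    for i, v in enumerate(order):
        if v not in retained:
            continue
        seen = set()
        for w in order[i:]:                  # v itself, then younger snapshots
            for page in recorded[w] - seen:
                live.add((page, w))          # S(v) reads w's copy of this page
            seen |= recorded[w]
    every_copy = {(page, w) for w in order for page in recorded[w]}
    return every_copy - live


order = ["v4", "v5", "v6"]
recorded = {"v4": {"P", "S"}, "v5": {"P", "Q"}, "v6": set()}
dead = reclaimable(order, recorded, retained={"v4", "v6"})
print(dead)                                  # {('P', 'v5')}

# Incremental archive layout, oldest first: reclaiming P5 leaves a hole
# between the long-lived copies S4 and Q5.
layout = [("P", "v4"), ("S", "v4"), ("P", "v5"), ("Q", "v5")]
print([i for i, copy in enumerate(layout) if copy in dead])   # [2]
```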

28
Problem: fragmentation
  • A fragmented archive, over time, forces
  • non-sequential archive writes
  • or
  • random reads to copy out long-lived states

29
Our approach: filter-spec
  • Filter spec determines
  • relative snapshot lifetime
  • App knows best
  • the app supplies a filter spec
  • the system filters

30
Avoid fragmentation with filter-spec
  • Known at snapshot declaration:
  • use lifetime-based allocation
  • After the fact:
  • use a flexible representation to filter lazily;
  • the representation allows an adaptive trade-off:
  • cost of filtering vs. cost of BITE

31
App specifies filter at declaration
  • Long-lived pages (archive, time ->): P4, S4, Q5
  • Short-lived pages (archive, time ->): P5
  • Invariant: to reclaim without fragmentation, short-lived areas store no
    long-lived pages
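One way to maintain this invariant, sketched here as an assumption rather than Thresher's actual allocator: when a page's pre-state is copied out, every snapshot declared since that page was last recorded reads the copy, so the copy goes to the long-lived area if any of those snapshots is long-lived. LifetimeAllocator is a hypothetical name; the run below reproduces this slide's layout for the doctors (v4) and nurses (v5) example.

```python
class LifetimeAllocator:
    """Lifetime-based allocation when filters are known at declaration
    (hypothetical sketch). A copied-out page goes to the long-lived area if any
    long-lived snapshot will read it, else to the short-lived area."""

    def __init__(self):
        self.snapshots = []        # (name, long_lived), in declaration order
        self.last_recorder = {}    # page -> index of the last snapshot that recorded it
        self.areas = {"long": [], "short": []}

    def declare(self, name, long_lived):
        self.snapshots.append((name, long_lived))

    def copy_out(self, page):
        """Call once, on the first overwrite of `page` since the latest snapshot."""
        k = len(self.snapshots) - 1
        # Snapshots declared since the page was last recorded all see this state.
        readers = self.snapshots[self.last_recorder.get(page, -1) + 1:k + 1]
        area = "long" if any(long_lived for _, long_lived in readers) else "short"
        self.areas[area].append((page, self.snapshots[k][0]))
        self.last_recorder[page] = k
        return area


alloc = LifetimeAllocator()
alloc.declare("v4", long_lived=True)     # doctors
alloc.copy_out("P")
alloc.copy_out("S")
alloc.declare("v5", long_lived=False)    # nurses
alloc.copy_out("P")                      # only v5 reads P5
alloc.copy_out("Q")                      # v4 and v5 both read Q5
print(alloc.areas)
# {'long': [('P', 'v4'), ('S', 'v4'), ('Q', 'v5')], 'short': [('P', 'v5')]}
```

Reclaiming v5 then frees the short-lived area in one piece, with no long-lived page stranded inside it.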
32
FilterTree: filter pages for free
33
After-the-fact (lazy) filtering
  • Some applications want
  • to defer filter specification
  • Lazy filtering requires copying
  • We can specialize representation (compact)
  • to reduce copying cost

34
Compact representation: diffs
  • Two components, filtered separately:
  • compact diffs reduce the cost of copying
  • (diffs clustered by page)
  • checkpoints accelerate BITE
  • (page-based snapshots,
  • system-declared, can use FilterTree)
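A sketch of how BITE could rebuild one page from the two components, assuming a forward-diff model in which each diff holds the objects of the page that changed since the previous snapshot, with their new values; whether Thresher's diffs run in this direction is not stated here. Snapshots are numbered 0, 1, 2, ...; page_at and the dictionary layouts are hypothetical.

```python
import bisect


def page_at(v, checkpoints, diffs):
    """Rebuild one page as of snapshot v; returns (page, diffs replayed).

    checkpoints: snapshot number -> full page state (slot -> value)
    diffs: snapshot number -> slots changed since the previous snapshot,
           with their values as of that snapshot
    Start from the latest checkpoint at or before v, then replay later diffs.
    """
    points = sorted(checkpoints)
    i = bisect.bisect_right(points, v) - 1
    if i < 0:
        raise KeyError(f"no checkpoint at or before snapshot {v}")
    base = points[i]
    page = dict(checkpoints[base])
    replayed = 0
    for s in range(base + 1, v + 1):
        if s in diffs:
            page.update(diffs[s])
            replayed += 1
    return page, replayed


checkpoints = {0: {"hr": 80, "bp": 120}}
diffs = {1: {"hr": 82}, 2: {"bp": 118}, 3: {"hr": 85}}
print(page_at(3, checkpoints, diffs))   # ({'hr': 85, 'bp': 118}, 3)
checkpoints[2] = {"hr": 82, "bp": 118}  # add a checkpoint: BITE replays less
print(page_at(3, checkpoints, diffs))   # ({'hr': 85, 'bp': 118}, 1)
```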

35
Adaptive trade-off
  • Like a recovery log:
  • less frequent checkpoints
  • increase compactness
  • more frequent checkpoints
  • accelerate BITE
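The recovery-log analogy can be made concrete with a toy per-page cost model; the formula and numbers below are illustrative assumptions, not results from the talk. With a checkpoint every `interval` snapshots, archive space falls as checkpoints get rarer, while the worst-case number of diffs BITE must replay grows. The tradeoff function and its byte sizes are hypothetical.

```python
def tradeoff(n_snapshots, diff_bytes, page_bytes, interval):
    """Toy per-page cost model (made-up parameters, not measured results).

    Every snapshot archives one diff, and snapshots 0, interval, 2*interval, ...
    also archive a full-page checkpoint. Returns (archive bytes, worst-case
    number of diffs BITE must replay after the nearest checkpoint).
    """
    n_checkpoints = -(-n_snapshots // interval)   # ceiling division
    space = n_snapshots * diff_bytes + n_checkpoints * page_bytes
    return space, interval - 1


for interval in (1, 4, 16):
    print(interval, tradeoff(64, 100, 8192, interval))
# 1 (530688, 0)
# 4 (137472, 3)
# 16 (39168, 15)
```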

36
Lazy filtering: checkpoints filtered for free
[Figure: archive regions for diff extents (G1(diffs), G2(diffs)) and a FilterTree for checkpoints; labels E1, E2, E3 and B1, B2, B3]
37
But some applications want more
  • lazy filtering
  • and
  • faster BITE
  • e.g.,
  • an app runs BITE on a batch of recent snapshots
  • to decide which ones to retain,
  • and needs fast BITE to keep up...

38
Combined hybrid
  • Faster BITE in recent window
  • and
  • Lazy filtering

39
Hybrid: checkpoints and checkpoint filtered for free
40
Status
  • Implemented
  • SNAP and Thresher for the Thor storage system
  • Performance results
  • encouraging;
  • here is a 5,000-foot view

41
Performance metrics
  • Cost of filtering:
  • non-disruptiveness: rate-of-drain / rate-of-pour
  • t_clean determines rate-of-drain
  • workload parameter: overwriting
  • Compactness of diff-based representation:
  • retention relative to page-based representation
  • R_diff: fixed
  • R_ckp: tunable by frequency of checkpoints
  • workload parameter: density
  • BITE: page-based snapshots vs. diff-based vs. DB
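The non-disruptiveness metric can be computed as below, assuming t_clean is the average time to clean one page (so the drain rate is 1 / t_clean pages per second) and the pour rate is the rate at which transactions produce modified pages. Function names and numbers are hypothetical; they only illustrate how a longer t_clean, for example from archiving work that is not hidden behind database I/O, lowers the ratio.

```python
def drain_pour_ratio(t_clean, pour_rate):
    """rate-of-drain / rate-of-pour under assumed definitions:
    t_clean = average seconds to clean one page (drain = 1 / t_clean pages/s),
    pour_rate = modified pages arriving per second from transactions.
    A ratio of at least 1 means cleaning keeps up with the update workload."""
    return (1.0 / t_clean) / pour_rate


def relative_drop(t_clean_base, t_clean_snap, pour_rate):
    """Fractional drop in the ratio when snapshots lengthen t_clean."""
    base = drain_pour_ratio(t_clean_base, pour_rate)
    snap = drain_pour_ratio(t_clean_snap, pour_rate)
    return (base - snap) / base


# Hypothetical numbers: 4 ms per page without snapshots, 5 ms with archiving,
# 150 modified pages/s poured in by transactions.
print(drain_pour_ratio(0.004, 150), drain_pour_ratio(0.005, 150))
print(relative_drop(0.004, 0.005, 150))
```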

42
Non-disruptiveness
  • Storage system with hybrid snapshots vs.
  • without snapshots (Thor)
  • How much does
  • rate-of-drain / rate-of-pour drop?

43
Experimental configuration
  • Workloads
  • extend multi-user OO7 to control
  • density
  • overwriting
  • System configuration
  • single client: medium OO7, small DB (185 MB)
  • multiple clients: large DB (140 GB)

44
FilterTree
  • Free!

45
Non-disruptiveness / single client: summertime, life is easy
46
Non-disruptiveness / multi-user: the DB works harder
47
Summary: non-disruptive snapshot memory
  • Unlimited filtered past
  • is cheaper than you may think.
  • ... a chicken in every pot ...
  • Every storage system
  • can have a snapshot box on the side...

48
To get there
  • Generalize
  • ARIES/STEAL: underway
  • file systems: need extended interfaces
  • Beyond
  • upgrades: have techniques
  • provenance: need ideas...