Venti: a new approach to archival storage

1
Venti: a new approach to archival storage
  • Sean Quinlan, Sean Dorward
  • (Bell Labs)
  • Proceedings of the USENIX FAST 2002 Conference
    on File and Storage Technologies

2
Introduction
  • Data archival
  • Prevalent form
  • Regular backup of data to magnetic tape (tar,
    ufsdump), full or incremental
  • Typical scenario
  • To provide backup as a central service for a
    number of client machines

[Figure: clients connected over a network to a backup server]
3
Introduction
  • Problem
  • Restoring data from tape backup is tedious and
    error prone
  • Backup violates file system access permissions,
    so a restore needs an administrator
  • Tapes are mislabeled, reused, or lost
  • Backups are usually very large
  • So snapshot file systems were introduced

4
Introduction
  • Snapshot
  • Each snapshot is a complete file system tree
  • Much like a full backup
  • But the implementation resembles an incremental
    backup
  • Examples: WAFL, AFS, Plan 9

(Plan 9: a distributed OS being developed at Bell Labs)
5
Archival system in Plan 9
  • Optical Jukebox
  • WORM device for data archival in Plan 9
  • Philosophy
  • Random access storage is sufficiently cheap that
    it is feasible to retain snapshots permanently
  • Storage required to retain all daily snapshots of
    a file system is surprisingly modest

6
Venti Archival Server
  • Venti
  • Block-level network storage system intended for
    archival data
  • Venti itself does not provide the services of a
    file or backup system
  • Backend archival storage for client applications
  • Identifies data blocks by a hash of their
    contents
  • The hash (called a fingerprint) is used as the
    address for read and write operations

7
Venti Archival Server
  • Characteristics of Venti
  • A block cannot be modified without changing its
    address
  • Because blocks are addressed by the fingerprint
    of their contents
  • Leads to a write-once policy
  • Writes are idempotent
  • Multiple writes of the same data can be coalesced
    and do not require additional space
  • Provides inherent integrity checking of data
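
The characteristics above can be illustrated with a minimal in-memory sketch. The BlockStore class and its write/read methods are hypothetical names chosen for this example; they are not Venti's actual protocol or API.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store (hypothetical; not Venti's real interface)."""

    def __init__(self):
        self._blocks = {}  # fingerprint (20 bytes) -> block contents

    def write(self, data: bytes) -> bytes:
        # The address of a block is the SHA-1 fingerprint of its contents.
        fp = hashlib.sha1(data).digest()
        # Idempotent write: an identical block is stored only once.
        self._blocks.setdefault(fp, data)
        return fp

    def read(self, fp: bytes) -> bytes:
        data = self._blocks[fp]
        # Inherent integrity check: the contents must hash back to the address.
        if hashlib.sha1(data).digest() != fp:
            raise IOError("block failed integrity check")
        return data

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
fp1 = store.write(b"hello")
fp2 = store.write(b"hello")   # duplicate write, coalesced
fp3 = store.write(b"hello!")  # changed contents give a different address
```

Because the address is derived from the contents, there is no way to overwrite a block in place: changing the data produces a new fingerprint, which is why the model is write-once.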

8
Venti Archival Server
[Figure: client requests reach the Venti server over the network; the server's index maps fingerprints H(A), H(B), H(C) to data blocks A, B, C]
9
Choice of Hash Function
  • SHA-1 hash function
  • Cryptographic hash
  • Output is 160 bits (20 bytes)
  • Computation of the hash value is efficient
  • Is a 160-bit hash value enough?
  • With an exabyte (10^18 bytes) of storage in 8 KB
    blocks -> probability of a collision is less
    than 10^-20
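
The collision figure can be checked with the standard birthday bound, p <= n(n-1)/2 * 2^-b, for n blocks and a b-bit hash. The sketch below assumes 8 KB means 8192 bytes and that SHA-1 output is uniformly distributed; the function name collision_bound is made up for this example.

```python
BLOCK_SIZE = 8 * 1024      # assumed: 8 KB = 8192 bytes
STORAGE_BYTES = 10**18     # one exabyte
HASH_BITS = 160            # SHA-1 output size


def collision_bound(n_blocks: int, hash_bits: int) -> float:
    """Birthday upper bound: number of block pairs divided by number of hash values."""
    pairs = n_blocks * (n_blocks - 1) // 2
    return pairs / 2**hash_bits


n_blocks = STORAGE_BYTES // BLOCK_SIZE
p = collision_bound(n_blocks, HASH_BITS)
print(f"{n_blocks} blocks, collision probability <= {p:.2e}")
```

With roughly 10^14 blocks there are about 10^28 pairs, against about 10^48 possible hash values, so the bound lands below 10^-20.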

10
Applications Using Venti
  • Venti is a building block on which a variety of
    storage applications can be constructed
  • Data retrieval
  • To enable successive data blocks to be retrieved,
    the application must record the fingerprint of
    each block
  • One approach is to pack the fingerprints into
    additional blocks (called pointer blocks)
  • Tree structure
  • The application then needs to record only the
    root fingerprint
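
The pointer-block idea can be sketched as follows. This is an illustration only: the dict-based store, the helper names put, write_stream, and read_stream, and the choice to return the tree depth alongside the root are assumptions for this example, not the on-disk format used by Venti applications.

```python
import hashlib

FP_SIZE = 20                      # SHA-1 digest length in bytes
BLOCK_SIZE = 8192                 # assumed block size
FANOUT = BLOCK_SIZE // FP_SIZE    # fingerprints that fit in one pointer block

store = {}  # stand-in for the Venti server: fingerprint -> block


def put(block: bytes) -> bytes:
    fp = hashlib.sha1(block).digest()
    store[fp] = block
    return fp


def write_stream(data: bytes):
    """Store data as a tree of blocks; return (root fingerprint, depth)."""
    fps = [put(data[i:i + BLOCK_SIZE])
           for i in range(0, len(data), BLOCK_SIZE)] or [put(b"")]
    depth = 0
    # Pack fingerprints into pointer blocks until a single root remains.
    while len(fps) > 1:
        fps = [put(b"".join(fps[i:i + FANOUT]))
               for i in range(0, len(fps), FANOUT)]
        depth += 1
    return fps[0], depth


def read_stream(root: bytes, depth: int) -> bytes:
    """Walk the tree from the root down to the data blocks."""
    fps = [root]
    for _ in range(depth):
        children = []
        for fp in fps:
            block = store[fp]
            children.extend(block[i:i + FP_SIZE]
                            for i in range(0, len(block), FP_SIZE))
        fps = children
    return b"".join(store[fp] for fp in fps)


# About 4 MB of sample data: 501 data blocks, 2 pointer blocks, 1 root.
size = 500 * BLOCK_SIZE + 100
pattern = bytes(range(251))
data = (pattern * (size // len(pattern) + 1))[:size]
root, depth = write_stream(data)
```

The application keeps only the 20-byte root (plus, in this sketch, the depth); every other fingerprint is recovered by reading pointer blocks.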

11
Applications Using Venti
  • Tree structure for storing a linear sequence of
    blocks

12
Application Example - Vac
  • vac
  • Application in the Plan 9 system for storing a
    collection of files and directories as a single
    object (like tar, zip)
  • The reverse operation is unvac
  • Output is always a 45-byte ASCII value
  • Fixed header + ASCII representation of the root
    fingerprint
  • To a user, it appears that vac compresses any
    amount of data down to 45 bytes
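
The 45-byte figure can be reproduced with simple arithmetic: a 4-character "vac:" header plus 40 hexadecimal digits for a 160-bit fingerprint gives 44 bytes. The sketch below assumes the remaining byte is a trailing newline; that detail, and the helper name vac_output, are assumptions for illustration.

```python
import hashlib

HEADER = "vac:"  # fixed header seen in the vac output


def vac_output(root_fp: bytes) -> str:
    # 4 header chars + 40 hex digits + 1 newline (assumed) = 45 bytes
    return HEADER + root_fp.hex() + "\n"


# Any 20-byte SHA-1 digest will do as a stand-in root fingerprint.
root = hashlib.sha1(b"example root block").digest()
out = vac_output(root)
```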

13
Application Example - Vac
  • Vac example from Plan 9 manual page

Without -f, output goes to stdout:
    $ vac file1
    vac:64daefaecc4df4b5cb48a368b361ef56012a4f46
With -f, output goes to the specified file:
    $ vac -f vacfile file1 file2
14
Implementation
  • Prototype
  • Uses
  • An append-only log of data blocks
  • An index that maps fingerprints to locations in
    the log
  • Additionally
  • For robustness: RAID
  • For performance: caching, striping,
    write buffering
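
A minimal sketch of the log-plus-index layout, assuming an in-memory buffer in place of the on-disk log and a Python dict in place of the on-disk index. The LogStore class is hypothetical and omits everything the prototype adds for robustness and performance.

```python
import hashlib
import io


class LogStore:
    """Toy append-only data log with a fingerprint index (illustrative only)."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the on-disk data log
        self.index = {}          # fingerprint -> (offset, length) in the log

    def write(self, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        if fp not in self.index:
            # New block: append to the end of the log and record its location.
            offset = self.log.seek(0, io.SEEK_END)
            self.log.write(data)
            self.index[fp] = (offset, len(data))
        return fp

    def read(self, fp: bytes) -> bytes:
        offset, length = self.index[fp]
        self.log.seek(offset)
        return self.log.read(length)


log_store = LogStore()
fp_a = log_store.write(b"block A")
fp_b = log_store.write(b"block B")
fp_a_again = log_store.write(b"block A")  # already indexed, nothing appended
```

Existing log entries are never rewritten; the index is the only structure that needs a lookup on each read or write, which is why it dominates the cost.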

15
Implementation
  • Data log

16
Implementation
  • Index
  • Main performance penalty for Venti is the index
    lookup
  • So the index is a hash table: the fingerprint
    itself is hashed to find its index entry
  • Additionally, caching, striping, and write
    buffering are used for performance
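
One way to picture a hashed index is a fixed array of buckets, with the bucket chosen from the fingerprint. The bucket count, the use of the first 8 bytes of the fingerprint, and the function names below are all assumptions for this sketch; the slides do not specify the hash function used by the prototype.

```python
import hashlib

N_BUCKETS = 1024  # hypothetical bucket count

buckets = [[] for _ in range(N_BUCKETS)]  # each bucket holds (fingerprint, offset)


def bucket_for(fp: bytes) -> int:
    # SHA-1 output is already well mixed, so a prefix of it spreads
    # fingerprints roughly uniformly across buckets.
    return int.from_bytes(fp[:8], "big") % N_BUCKETS


def index_insert(fp: bytes, offset: int) -> None:
    buckets[bucket_for(fp)].append((fp, offset))


def index_lookup(fp: bytes):
    # Only one bucket has to be examined per lookup.
    for entry_fp, offset in buckets[bucket_for(fp)]:
        if entry_fp == fp:
            return offset
    return None


fp_x = hashlib.sha1(b"block X").digest()
fp_y = hashlib.sha1(b"block Y").digest()
index_insert(fp_x, 0)
index_insert(fp_y, 8192)
```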

17
Performance
  • Trace
  • A 10-year disk trace of two file servers at Bell
    Labs is used
  • Bootes: 1990-1997
  • Emelie: 1997-2001

18
Performance
  • Emelie: storage size

19
Performance
  • Emelie: ratio of archival to active data

20
Conclusion
  • Identifying a block by its SHA-1 hash is well
    suited to archival storage
  • The write-once model and the ability to coalesce
    duplicate copies of a block make Venti a useful
    building block for many interesting storage
    applications
  • Given the rapid growth in capacity of magnetic
    disks, it seems unlikely that archival data will
    be deleted to reclaim space -> Venti provides an
    attractive approach to storing that data