Pastiche: Making Backup Cheap and Easy


1
Pastiche: Making Backup Cheap and Easy
  • Presented by
  • Boon Thau Loo
  • CS294-4

2
Outline
  • Motivation and Goals
  • Enabling Technologies
  • System Design
  • Implementation and Evaluation
  • Conclusion

3
Motivation
  • The majority of users do not back up their data. Those
    who do:
  • Don't back up very often.
  • Don't back up everything.
  • Backup is a significant cost in large
    organizations.
  • Why not use excess disk space for backups?
  • File systems are only half-full on average.
  • Disks are cheap.

4
Pastiche Goals
  • P2P Backup System
  • Target environment
  • Cooperation through untrusted machines.
  • End-user machines
  • Leverage common data when possible for space
    efficiency (backup buddies).
  • Preserve privacy.
  • Efficient, cost-free, administration-free

5
Enabling Technologies
  • Pastry for self-organizing routing and object
    location.
  • Content-based Indexing (Manber94, LBFS)
  • Identify boundary regions (anchors) that divide
    file into chunks
  • Rabin fingerprinting
  • Isolate changes in each chunk.
  • SHA-1 hash of each chunk
  • Convergent encryption (used by FARSITE)
  • Encrypt the file using a key derived from the file's
    contents.
  • Further encrypt the key using the client's key.
  • Encrypted key is stored with file in FARSITE
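
The content-based indexing idea on this slide can be sketched roughly as follows. This is an illustration only, not the Pastiche or LBFS code: a simple polynomial rolling hash stands in for Rabin fingerprinting, and the window size, anchor mask, and the names `chunks` and `signature` are assumptions chosen for the example.

```python
import hashlib

WINDOW = 16            # bytes in the sliding window (assumed value)
MASK = (1 << 6) - 1    # anchor when the low 6 bits are zero (assumed value)
BASE = 257             # rolling-hash base (stand-in for a Rabin polynomial)
MOD = (1 << 31) - 1

def chunks(data: bytes):
    """Split data at anchors picked by a rolling hash of the last WINDOW bytes."""
    out, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + b) % MOD                     # add the newest byte
        # Declare an anchor only once the chunk is at least WINDOW bytes long.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])       # trailing chunk without an anchor
    return out

def signature(data: bytes):
    """SHA-1 hash of each chunk, in file order."""
    return [hashlib.sha1(c).hexdigest() for c in chunks(data)]
```

Because anchors depend only on nearby bytes, an edit tends to change the hashes of the chunks it touches while chunks further along in the file usually keep theirs.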

6
System Design
  • Data Chunks
  • File meta-data
  • Abstracts
  • Joining Pastry
  • Finding Backup buddies
  • Backup Protocol
  • Restoration
  • Failures and Malicious Nodes
  • Greed Prevention

7
Data Chunks
  • Data is stored on disk as immutable chunks.
  • Content-based indexing and convergent encryption
  • Chunks are stored for local host and/or on backup
    clients.
  • Each chunk carries owner lists and maintains
    reference count.
  • When a newly written file is closed, it is
    scheduled for chunking
  • Hc: Handle
  • Ic: Chunk ID
  • Kc: Encryption key
  • Chunk ID list forms file signature.
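
One way to picture the three per-chunk values is the sketch below. The slide does not say how Hc, Kc, and Ic are derived from one another, so the derivations here (handle as SHA-1 of the plaintext, key and chunk ID as hashes of the handle) are assumptions, and `toy_encrypt` is a hash-based XOR keystream used only as a stand-in for a real cipher. It is not secure and is not what Pastiche uses.

```python
import hashlib

def handle(plaintext: bytes) -> bytes:
    """Hc: assumed here to be the SHA-1 hash of the chunk plaintext."""
    return hashlib.sha1(plaintext).digest()

def key_from_handle(hc: bytes) -> bytes:
    """Kc: assumed derivation, a hash of the handle."""
    return hashlib.sha256(b"key" + hc).digest()

def chunk_id(hc: bytes) -> bytes:
    """Ic: assumed derivation, a different hash of the handle."""
    return hashlib.sha1(b"id" + hc).digest()

def _keystream(key: bytes, n: int) -> bytes:
    out, counter = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher (NOT secure); applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def store_form(plaintext: bytes):
    """Return (Ic, ciphertext, Hc) for one chunk."""
    hc = handle(plaintext)
    return chunk_id(hc), toy_encrypt(key_from_handle(hc), plaintext), hc
```

The property that matters for backup buddies is that two hosts holding the same plaintext chunk compute the same chunk ID and the same ciphertext, so the chunk needs to be stored only once, while a host that never had the plaintext cannot derive the key.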

8
Data Chunks (Cont)
  • Backup Request
  • Remote hosts must supply public key with backup
    request.
  • If the chunk already exists, add the requesting host to its owner
    list.
  • Local reference count is incremented.
  • Delete Request
  • Requests from remote hosts must be signed with the requester's
    secret key.
  • Check the signature against the public key (cached from the earlier
    backup request).
  • When the reference count reaches 0, the chunk is removed.
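
The backup and delete request handling described on this slide might look roughly like the sketch below. The class and method names are hypothetical, and signature checking is delegated to a caller-supplied `verify(public_key, message, signature)` function, because the slides do not name a signature scheme.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class StoredChunk:
    data: bytes
    owners: Dict[str, bytes] = field(default_factory=dict)  # host -> public key
    refcount: int = 0

class ChunkStore:
    """Hypothetical in-memory chunk store with owner lists and refcounts."""

    def __init__(self, verify: Callable[[bytes, bytes, bytes], bool]):
        self.chunks: Dict[bytes, StoredChunk] = {}
        self.verify = verify  # placeholder for real signature verification

    def backup(self, chunk_id: bytes, host: str, public_key: bytes,
               data: Optional[bytes] = None) -> bool:
        chunk = self.chunks.get(chunk_id)
        if chunk is None:
            if data is None:
                return False              # chunk unknown: caller must send it
            chunk = self.chunks[chunk_id] = StoredChunk(data)
        if host not in chunk.owners:
            chunk.owners[host] = public_key   # cache key for later deletes
            chunk.refcount += 1
        return True

    def delete(self, chunk_id: bytes, host: str, signature: bytes) -> bool:
        chunk = self.chunks.get(chunk_id)
        if chunk is None or host not in chunk.owners:
            return False
        if not self.verify(chunk.owners[host], chunk_id, signature):
            return False                  # request not signed by the owner
        del chunk.owners[host]
        chunk.refcount -= 1
        if chunk.refcount == 0:
            del self.chunks[chunk_id]     # no owners left: reclaim space
        return True
```

Caching the public key at backup time is what lets the store reject a delete request from any host that did not originally ask for the chunk to be kept.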

9
File Meta-data
  • File meta-data
  • List of handles Hc for chunks comprising the
    file.
  • Ownership, permissions, creation and modification
    times.
  • Mutable with fixed Hc, Kc and Ic
  • File system root meta-data Hc generated based on
    host-specific passphrase.

10
Abstracts
  • Initial backup of a freshly installed machine is
    most expensive.
  • Goal: Find a good buddy that owns all or most of
    your data chunks.
  • Naïve solution: Ship the full signature of the new node
    around.
  • Expensive: 20 bytes of signature for every 16 KB chunk.
  • Solution: Send a random subset of the signature,
    called an abstract.
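
The cost argument can be made concrete with a little arithmetic. The sketch below uses the figures on this slide (20 bytes per 16 KB chunk); the function names and the abstract size passed in are assumptions for illustration.

```python
import random

CHUNK_SIZE = 16 * 1024   # bytes per chunk, from the slide
HASH_SIZE = 20           # bytes of signature per chunk, from the slide

def signature_bytes(data_bytes: int) -> int:
    """Size of the full signature for data_bytes of stored data."""
    n_chunks = -(-data_bytes // CHUNK_SIZE)   # ceiling division
    return n_chunks * HASH_SIZE

def make_abstract(signature, k, seed=None):
    """Pick a random subset of k chunk hashes from the full signature."""
    rng = random.Random(seed)
    return rng.sample(list(signature), min(k, len(signature)))
```

For example, 10 GiB of data is 655,360 chunks, so `signature_bytes(10 * 2**30)` is 13,107,200 bytes (12.5 MiB). An abstract of a few hundred hashes is only a few kilobytes.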

11
Joining Pastry
  • Pastry
  • Self-organizing, p2p overlay
  • Each node maintains
  • Leaf set: the L/2 numerically closest smaller and L/2 closest
    larger nodeIDs
  • Neighborhood set: closest nodes according to the
    proximity metric
  • Routing table: prefix routing
  • Join Pastry overlay with nodeID set to
    Hash(hostname)
  • Find backup buddies

12
Finding Backup Buddies
  • After joining network, route Pastry message with
    abstract to a random nodeID.
  • Each node along the route returns its coverage
    (fraction of chunks in abstract stored locally)
    with the abstract
  • Lighthouse sweep: if the candidate set is insufficient, repeat
    the probe, rotating through the ID space by varying the first
    digit of the original nodeID.
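
Coverage and the sweep targets can be sketched as below. The function names are hypothetical, and nodeIDs are written as hexadecimal strings, which assumes base-16 digits (Pastry's digit base is configurable).

```python
def coverage(abstract, local_chunk_ids) -> float:
    """Fraction of the chunks named in the abstract that are stored locally."""
    if not abstract:
        return 0.0
    hits = sum(1 for c in abstract if c in local_chunk_ids)
    return hits / len(abstract)

def lighthouse_targets(node_id_hex: str):
    """Probe targets obtained by rotating the first digit of the original ID."""
    first = int(node_id_hex[0], 16)
    rest = node_id_hex[1:]
    return [format((first + step) % 16, "x") + rest for step in range(1, 16)]
```

For instance, a node holding two of the four chunks in an abstract reports a coverage of 0.5, and an original target of `a3f` yields the probes `b3f`, `c3f`, and so on around the ID space.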

13
Not Enough Buddies?
  • Each node tries to find 5 buddies.
  • What if you can't find enough buddies?
  • Real possibility for rare installations
  • Create coverage-rate Pastry overlay
  • Replace network proximity distance metric with
    coverage-rate.
  • Pastry neighborhood set: the set of nodes encountered
    during join with the best coverage available.
  • Find buddies in the neighborhood set
  • A is a buddy for B, but B may not be a buddy for A (no
    symmetry)
  • Malicious nodes may misreport
    coverage.

14
Backup Protocol
  • Each Pastiche node controls its own archival
    plan.
  • Snapshot: a discrete backup event.
  • Meta-data skeleton for each snapshot is stored in
    per-file logs.
  • State necessary for a new snapshot: add set, delete
    set, meta-data list

15
Backup Protocol (Cont..)
  • Snapshot process (A stores snapshot on B)
  • A sends public key to B (for future validation)
  • A forwards chunkIDs of add set to B.
  • B fetches chunks not already stored locally.
  • A sends delete list (signed with A's private key)
  • A sends updated meta-data.
  • A sends commit request, B responds when all
    changes are persistent.
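
The message sequence on this slide can be walked through with a small in-process sketch. Everything here is illustrative: `Buddy` and `store_snapshot` are hypothetical names, `verify` is a caller-supplied stand-in for signature checking, and the delete step assumes A is the only owner of the chunks it deletes.

```python
class Buddy:
    """Hypothetical state held by B on behalf of its clients."""
    def __init__(self):
        self.public_keys = {}   # host -> public key
        self.chunks = {}        # chunk ID -> encrypted chunk data
        self.metadata = {}      # host -> latest meta-data

def store_snapshot(b, host, public_key, add_set, delete_list,
                   delete_sig, metadata, verify):
    """A stores a snapshot on B; returns the chunk IDs B had to fetch."""
    b.public_keys[host] = public_key                  # A sends its public key
    # A forwards chunk IDs; B works out which ones it does not hold yet.
    missing = [cid for cid in add_set if cid not in b.chunks]
    # A sends the signed delete list; B checks it before staging anything.
    if not verify(public_key, delete_list, delete_sig):
        raise ValueError("delete list signature rejected")
    # Commit: apply all staged changes together, then acknowledge.
    for cid in missing:
        b.chunks[cid] = add_set[cid]                  # B fetches missing chunks
    for cid in delete_list:
        b.chunks.pop(cid, None)
    b.metadata[host] = metadata                       # updated meta-data
    return missing
```

Only the chunks B lacks cross the network, which is why a buddy with high coverage makes a snapshot cheap.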

16
Restoration
  • Partial restores are straightforward: obtain
    chunks from a buddy.
  • Recover entire machine
  • Keep copy of root meta-data object in each member
    of leaf set.
  • Rejoin with same nodeID (based on hostname)
  • Retrieve root meta-data object from any node in
    leaf set.
  • Root block contains the list of buddies.

17
Detecting Failure and Malice
  • Failures
  • Buddy can drop chunks if it runs out of disk
    space.
  • Buddy may crash or leave the network.
  • Malicious buddy may pretend to store your chunks.
  • Solutions
  • Before taking a new snapshot, query buddies for
    random subset of chunks. Provides instantaneous
    assurance.
  • Periodic probing of buddies: analysis shows that
    checking 0.1 of all chunks is enough.
  • Sybil attack? A malicious party could occupy a substantial
    fraction of the nodeID space.
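
Why a small random sample is informative can be seen from a short calculation. If a buddy has silently dropped a fraction f of your chunks and you query k chunks chosen independently at random, the chance that at least one query hits a missing chunk is 1 - (1 - f)^k. The sketch below computes this and shows a simple probe; the names are hypothetical, and the probe only checks for presence, whereas a real check would have to make the buddy prove it holds the data.

```python
import random

def detection_probability(missing_fraction: float, probes: int) -> float:
    """P(at least one of `probes` independent random queries hits a gap)."""
    return 1.0 - (1.0 - missing_fraction) ** probes

def probe(buddy_chunks, my_chunk_ids, probes: int, rng=random) -> bool:
    """Query a random subset of chunk IDs; True if the buddy has them all."""
    ids = sorted(my_chunk_ids)
    sample = rng.sample(ids, min(probes, len(ids)))
    return all(cid in buddy_chunks for cid in sample)
```

With 10% of the chunks missing, 50 probes already detect the loss with probability above 0.99 under this independence assumption.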

18
Greed Prevention
  • A greedy host can consume storage.
  • Three solutions
  • Group backup clients based on resources consumed.
  • Cryptographic puzzles according to storage
    consumed.
  • Electronic currency
  • Currency accounting requires atomicity between
    exchange of currency and backup.

19
Implementation
  • Chunkstore file system
  • Container files: LRU cache of decrypted,
    recently used files for performance.
  • Chunks increase internal fragmentation.
  • Backup daemon
  • Server: manages remote requests for storage and
    restoration.
  • Client: supervises selection of buddies and
    snapshots.
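
The LRU policy mentioned for the chunkstore cache can be sketched generically as below. This shows the eviction policy only, with hypothetical names, and says nothing about how the actual chunkstore lays out its container files.

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
```

Caching decrypted files this way avoids repeating decryption and chunk reassembly for files that are read often.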

20
Evaluation
  • Compare ext2fs with chunkstore on modified
    Andrew benchmark
  • Total overhead of 7.4% is reasonable.
  • Overheads are due to meta-data management and Rabin
    fingerprint computation (for finding anchors)
  • Backup and restore compare favorably to NFS
    cross-machine copy.
  • Conclusion: the service does not unduly penalize file
    system performance.

21
Evaluation (Cont)
  • Question: How large must the abstract be?
  • Compare machines with a freshly installed machine
  • Abstract size does not seem to matter much.

22
Evaluation (Cont)
  • Question: How effective is the lighthouse sweep
    in discovering buddies?
  • Simulation: 50,000 Pastiche nodes with 11 types of
    nodes.
  • Lighthouse is good enough for common nodes
    (>10%).
  • Rare nodes would require coverage-rate overlay.

23
Evaluation (Cont)
  • Question: How effective is the coverage-rate
    overlay in discovering buddies?
  • 10,000 nodes
  • 3 types of nodes
  • One of a thousand species (same species share 70%
    of content)
  • One of a hundred genera (30%)
  • One of ten orders (20%)
  • Only same species can back each other up.
  • For a neighborhood size of 256, 85% were able to
    find at least one buddy. 72% found at least 5.
  • Neighborhood size matters!

24
Conclusion
  • Pastiche: a P2P backup mechanism.
  • What is Pastiche engineered mostly for?
  • What do end-users back up?
  • Data files (Overlap is minimal)
  • Applications (Lots of overlaps, but would you
    back up your apps?)
  • Privacy?
  • Closely coupled with Pastry
  • Lighthouse sweep.
  • Needs large neighborhood set.