Data Replication in OceanStore - Dennis Geels, UC Berkeley (PowerPoint presentation transcript)

1
Data Replication in OceanStore - Dennis
Geels, UC Berkeley
  • Presented by Vijay Shiv Kumar

2
Talk Outline
  • Introduction to OceanStore
  • Replication Approach
  • System overview
  • Replication subsystem design
  • Evaluation and results
  • Conclusions and future work

3
Introduction to OceanStore
  • A utility infrastructure to span the globe
    and provide continuous access to persistent
    information
  • Global-scale storage system
  • Designed to support billions of users and
    exabytes of data.
  • Utility model: no single entity owns or controls
    all the machines in the network.
  • Does not entrust the privacy or integrity of user
    data to any server.

4
Introduction to OceanStore
  • Machines communicate using an overlay network
    named Tapestry.
  • Each machine has its own GUID.
  • Client applications access OceanStore through a
    local daemon process.
  • Client machines interact with the inner ring, which
  • serializes and processes client requests
  • uses a Byzantine fault-tolerant protocol
  • communicates over high-bandwidth connections

6
Why Replication?
  • Without replication, there are only clients and
    inner-ring servers.
  • Servers have limited CPU, storage, and network
    resources.
  • The inner ring becomes a bottleneck.
  • Some clients may see high access latency.
  • Replication provides high availability (even in
    the presence of network outages).

7
Replication Approach
  • Automatically creates and maintains soft-state
    replicas of data near or on client machines.
  • Replicas cooperate to share data and disseminate
    updates efficiently. No compromise on security
    and consistency guarantees.
  • Flexible read interface
  • Applications can relax data consistency in
    exchange for improved performance
  • Supports retrieval of arbitrarily old versions of
    client data (time travel)

8
System Overview: Data Object Format
  • OceanStore is a versioning storage system.
  • A data object is an ordered sequence of read-only
    versions.
  • The entire stream of versions of a given object is
    named by its AGUID (a secure hash of the owner's
    key and an application ID).
  • Each version is named by its VGUID (a hash of its
    contents).
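This naming scheme can be sketched with a cryptographic hash; the hash function (SHA-256 here) and the concatenation format are illustrative assumptions, not the system's actual key format or serialization:

```python
import hashlib

def aguid(owner_key: bytes, application_id: bytes) -> str:
    # AGUID: secure hash of the owner's key plus an application ID;
    # it names the entire stream of versions of one object.
    # The "|" separator is an illustrative assumption.
    return hashlib.sha256(owner_key + b"|" + application_id).hexdigest()

def vguid(version_contents: bytes) -> str:
    # VGUID: hash of a version's contents, so each version
    # is read-only and self-verifying.
    return hashlib.sha256(version_contents).hexdigest()
```

Because a VGUID is derived from content alone, any holder of a version can recompute the hash and verify it without trusting the server that supplied it.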

9
Data Object Format
  • For larger versions, data is stored in a B-tree
    structure.
  • Individual read-only blocks are stored
    independently and are named by their BGUIDs.
  • Allows for sharing of blocks.

10
Data Object
  • Heartbeat: a mapping between an AGUID and the
    VGUID of its latest version

11
System Overview: Client Requests
  • Heartbeat requests
  • To learn the VGUID of the latest version
  • Update requests
  • Must go to the inner ring
  • Predicate-based update interface (as in Bayou)
  • Greater flexibility, wide range of consistency
    semantics
  • e.g. "the current version of the data object
    matches a specified VGUID"
  • An update is a list of <predicate, action> pairs.
  • The signed update response includes a heartbeat
    naming the new version.
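A hypothetical evaluator for such a predicate-based update, assuming the first pair whose predicate holds against the current version fires (function and variable names are illustrative, not the system's API):

```python
def apply_update(current_vguid: str, current_data: bytes, update):
    """Walk the <predicate, action> list; the first matching
    predicate's action produces the new version's contents."""
    for predicate, action in update:
        if predicate(current_vguid, current_data):
            return action(current_data)
    return current_data  # no predicate matched: update has no effect

# Example: apply only if the current version matches a given VGUID.
expected = "abc123"
update = [
    (lambda v, d: v == expected, lambda d: d + b" appended"),
]
```

A compare-then-write like this gives optimistic concurrency control: the update commits only against the version the writer expected.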

12
Client Requests
  • Read Requests
  • Need not go to the inner ring
  • Read responses need not be signed (the client can
    verify the version's contents against the
    requested VGUID).
  • Rich predicate-based read interface
  • Client needs
  • AGUID of data object
  • User-level description of the desired portion of
    the data.

13
Replication System Design
  • Replicas are created on available servers near the
    clients, or on the clients themselves (promiscuous
    caching).
  • The design adds a second tier with two main
    components
  • Secondary replicas store copies of data objects
    and process heartbeat/data requests
  • The dissemination tree is a multicast tree that
    propagates updates and requests among replicas

15
Secondary Replica
  • The data stored is purely soft state, and is
    easily verifiable (contents can be checked against
    their secure-hash names).

16
Dissemination Tree
  • A multicast tree rooted at the primary replica
  • Carries heartbeats, update/read certificates,
    requests, and updates
  • New replicas join the tree by sending a request
    addressed to the replica's label.
  • Periodic rejoining may shrink the tree's height
    but will never increase it.

17
Dissemination Tree (diagram, not transcribed)
18
Creating Replicas
  • A new replica is created on the client machine
    when a client application first reads a data
    object.
  • The machine joins the dissemination tree using a
    join-tree request.
  • Replicas can also be created on nearby machines.
  • The client sends the request with a TTL parameter.
  • If the TTL expires before a replica is found, the
    last machine creates a replica (acting as a proxy).
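The TTL-based placement described above could look roughly like this sketch (the `Machine` class and `route_read` helper are hypothetical stand-ins for routing toward the object):

```python
class Machine:
    def __init__(self, name, replicas=()):
        self.name = name
        self.replicas = set(replicas)

    def has_replica(self, aguid):
        return aguid in self.replicas

    def create_replica(self, aguid):
        self.replicas.add(aguid)

def route_read(path, ttl, aguid):
    """Walk the routing path toward the object, spending one TTL unit
    per hop; if the TTL runs out before a replica is found, the last
    machine visited creates a proxy replica."""
    for hops_used, machine in enumerate(path, start=1):
        if machine.has_replica(aguid):
            return machine
        if hops_used >= ttl:
            machine.create_replica(aguid)
            return machine
    return None
```

The effect is that repeated reads from a region seed a nearby replica even when no existing copy is within TTL hops.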

19
Updating Replicas
  • The primary replicas forward down the tree
  • a heartbeat naming the new version's VGUID,
  • the successful actions from the update, and
  • the VGUID of the previous version.
  • Each secondary replica
  • checks that it holds the correct initial version,
  • applies the update locally, and
  • verifies that the new VGUID matches the heartbeat.
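The three checks above can be sketched as follows, assuming SHA-256 VGUIDs and a callable action (the function names are illustrative, not the system's API):

```python
import hashlib

def vguid(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def handle_update(replica_data: bytes, prev_vguid: str,
                  action, heartbeat_vguid: str) -> bytes:
    """Hypothetical secondary-replica logic for an update
    pushed down the dissemination tree."""
    # 1. Check that we hold the correct initial version.
    if vguid(replica_data) != prev_vguid:
        raise ValueError("replica out of date; must re-fetch")
    # 2. Apply the update locally.
    new_data = action(replica_data)
    # 3. Verify the result against the signed heartbeat's VGUID.
    if vguid(new_data) != heartbeat_vguid:
        raise ValueError("local result does not match heartbeat")
    return new_data
```

Because the final hash must match the heartbeat signed by the inner ring, a secondary replica can apply updates without itself being trusted.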

20
Updating Replicas (diagram, not transcribed)
21
Reading a Replica
  • Clients specify
  • the AGUID of the data object to be read
  • a version predicate (restrictions on versions)
  • e.g. most recent version as of a certain time,
    or sequence number in a specified range
  • a selection: the range of bytes to read, once a
    version is chosen
  • a failure mode: the behavior of the replica if
    the request cannot be fully satisfied locally
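One way to picture such a read request, with hypothetical field names (this is an illustration, not the actual OceanStore wire format):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ReadRequest:
    aguid: str                            # which data object to read
    as_of_time: Optional[float]           # predicate: latest version as of t
    seq_range: Optional[Tuple[int, int]]  # or: sequence number in a range
    byte_range: Tuple[int, int]           # selection: bytes within the version
    failure_mode: str                     # e.g. forward upstream or fail fast

req = ReadRequest(aguid="a1b2c3", as_of_time=None,
                  seq_range=(10, 20), byte_range=(0, 4096),
                  failure_mode="forward_to_parent")
```

Separating the version predicate from the byte selection is what lets a local replica answer relaxed-consistency or time-travel reads without contacting the inner ring.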

22
Reading a Replica (diagram, not transcribed)
23
Reclaiming resources
  • Machines may remove replicas when
  • they fall out of use, or
  • resources are needed for more important tasks
  • Remaining children in the dissemination tree are
    notified, and they reconnect elsewhere
  • The local replica, being soft state, is simply
    discarded

24
Evaluation - Benchmarks
  • To simulate data sharing among collaborating
    users
  • Several clients open the same data object
  • Concurrently submit read/write requests for 5
    minutes
  • Wait 5 seconds between requests
  • Use this benchmark to measure the effect of local
    replication on client read latency.
  • To simulate single-source streaming data
  • Single writer repeatedly overwrites portions of a
    data object, with zero think time between updates
  • Many readers continually query their local
    secondary replica for the latest version of the
    data object. When they detect a new version, they
    reread the object.
  • Use this benchmark to measure the latency of
    update propagation in a dissemination tree.

25
Evaluation - Infrastructure
  • Testbed: a local cluster of 42 machines at UC
    Berkeley
  • Multiple OceanStore machines run on each physical
    machine
  • Wide-area operation simulated using an artificial
    transit-stub network of 495 nodes
  • Inter-domain latencies of 150 ms
  • Local-area latencies of 10-50 ms
  • Inner-ring servers placed on well-connected nodes
    in different domains
  • One hundred other nodes distributed randomly
    throughout the network

26
Results (slides 26-28 show latency graphs; figures not transcribed)
29
Conclusion
  • The replication system effectively reduces client
    read latency.
  • The self-organizing dissemination trees propagate
    new data quickly and efficiently except in cases
    where write traffic is high.
  • The need for versioning and cryptography support
    limits the systems to which this work applies
  • It should be easily portable to other peer-to-peer
    storage systems that support signed, mutable data
    objects

30
Future Work
  • Include selection clause in update format and in
    the dissemination tree lease
  • Better replica management based on user activity
  • Better cache management by the replica stage
  • Extend the second tier to operate in the absence
    of the primary tier
  • Better heuristics for dissemination tree joining
  • Piggy-back a read request onto the join-tree
    request

31
QUESTIONS