1
Replica Control for Peer-to-Peer Storage Systems
2
P2P
  • Peer-to-peer (P2P) has emerged as an important
    paradigm for sharing resources at the edges of
    the Internet.
  • The most widely exploited resource is storage, as
    typified by P2P music file sharing
  • Napster
  • Gnutella
  • Following the great success of P2P file sharing,
    a natural next step is to develop wide-area P2P
    storage systems that aggregate storage across the
    Internet.

3
Replica Control Protocol
  • Replication
  • to maintain multiple copies of some critical data
    to increase availability
  • Replica Control Protocol
  • to guarantee a consistent view of the replicated
    data

4
Replica Control Methods
  • Optimistic
  • Proceed optimistically with computation on the
    available subgroup and restore consistency when
    subgroups rejoin
  • Approaches
  • Log, Version vector, etc.
  • Pessimistic
  • Restrict computations with worst-case assumptions
  • Approaches
  • Primary site, Voting, etc.

5
Write-ahead Log
  • Files are actually modified in place, but before
    any file block is changed, a record is written to
    a log telling which node is making the change,
    which file block is being changed, and what the
    old and new values are
  • Only after the log has been written successfully
    is the change made to the file
  • Write-ahead log can be used for undo (rollback)
    and redo
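
A minimal sketch of this idea in Python (the record format, file layout, and names are illustrative assumptions, not taken from the slides):

    import json, os

    class WriteAheadLog:
        # Append-only log; each record is made durable before
        # the corresponding in-place file change is performed.
        def __init__(self, path):
            self.f = open(path, "a", encoding="utf-8")

        def log_change(self, node_id, block_no, old_value, new_value):
            record = {"node": node_id, "block": block_no,
                      "old": old_value, "new": new_value}
            self.f.write(json.dumps(record) + "\n")
            self.f.flush()
            os.fsync(self.f.fileno())  # record hits disk before the change

    # Usage: log first, then modify the file block in place.
    # On recovery, the "old" values support undo (rollback) and
    # the "new" values support redo.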

6
Version Vector
  • Version vector for file f
  • An N-element vector, where N is the number of
    nodes on which f is stored
  • The ith element records the number of updates
    done by node i
  • A vector V dominates V' if
  • every element of V ≥ the corresponding element of
    V'
  • V and V' conflict if neither dominates the other

7
Version Vector
  • Consistency resolution
  • If V dominates V', the inconsistency can be
    resolved by copying the replica with vector V
    over the one with V'
  • If V and V' conflict, an inconsistency is
    detected
  • Version vectors can detect only update conflicts;
    they cannot detect read-write conflicts
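
A small Python sketch of dominance checking and conflict detection (representing a vector as a node-to-counter dict is an illustrative choice):

    def dominates(v, w):
        # v dominates w if v[i] >= w[i] for every node i.
        return all(v.get(i, 0) >= w.get(i, 0) for i in set(v) | set(w))

    def compare(v, w):
        if dominates(v, w) and dominates(w, v):
            return "equal"
        if dominates(v, w):
            return "v dominates"   # resolve by copying v's replica over w's
        if dominates(w, v):
            return "w dominates"
        return "conflict"          # concurrent updates; neither dominates

    # Example: nodes A and B updated the file concurrently.
    print(compare({"A": 2, "B": 0}, {"A": 0, "B": 1}))  # -> conflict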

8
Primary Site Approach
  • Data is replicated on at least k+1 nodes (for
    k-resilience)
  • One node acts as the primary site (PS)
  • Any read request is served by the PS
  • Any write request is copied to all other back-up
    sites
  • Any write request arriving at a back-up site is
    forwarded to the PS

9
PS Failure Handling
  • If a back-up fails, there is no interruption in
    service
  • If the PS fails, there are two possibilities
  • If the network is not partitioned
  • Choose a back-up node as the new primary
  • If checkpointing has been active, the system only
    needs to restart from the previous checkpoint
  • If the network is partitioned
  • Only the partition containing the PS can make
    progress
  • Other partitions stop updates on the data
  • It is necessary to distinguish between site
    failures and network partitions

10
Voting Approach
  • V votes are distributed among n replicas such
    that
  • Vw + Vr > V
  • Vw + Vw > V
  • Obtain Vr or more votes to read
  • Obtain Vw or more votes to write
  • Quorum systems are more general than voting
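
A minimal check of these constraints in Python (the function name and vote layout are illustrative):

    def valid_vote_assignment(votes, v_r, v_w):
        # Read/write quorums must intersect: v_r + v_w > V.
        # Two write quorums must intersect:  v_w + v_w > V.
        V = sum(votes)
        return v_r + v_w > V and v_w + v_w > V

    # Example: 5 replicas with one vote each (V = 5).
    assert valid_vote_assignment([1] * 5, 3, 3)      # majority read/write
    assert valid_vote_assignment([1] * 5, 1, 5)      # read-one / write-all
    assert not valid_vote_assignment([1] * 5, 2, 3)  # quorums may miss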

11
Quorum Systems
  • Trees
  • Grid-based (array-based)
  • Torus
  • Hierarchical
  • Multi-column
  • and so on

12
Quorum-Based Schemes (1/2)
  • n replicas, each with a version number
  • Read operation
  • Read-lock and access a read quorum
  • Obtain the replica with the largest version
    number
  • Write operation
  • Write-lock and access a write quorum
  • Update all replicas in the write quorum with the
    new version number (the largest version number
    read, plus 1)
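
A condensed Python sketch of these rules (locking and quorum formation are abstracted away; the class and function names are illustrative):

    class Replica:
        def __init__(self):
            self.version = 0
            self.value = None

    def quorum_read(read_quorum):
        # Return the value of the largest-version-number replica.
        latest = max(read_quorum, key=lambda r: r.version)
        return latest.value, latest.version

    def quorum_write(write_quorum, value):
        # New version number = the largest version seen, plus 1.
        new_version = max(r.version for r in write_quorum) + 1
        for r in write_quorum:
            r.version = new_version
            r.value = value

    # Because any read quorum intersects any write quorum, a read
    # always sees at least one replica carrying the latest version.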
13
Quorum-Based Schemes (2/2)
The set of replicas must behave as if there were
only a single copy. This is the strictest
consistency criterion.
  • One-copy equivalence is guaranteed if we enforce
  • Write-write and write-read lock exclusion
  • Intersection property
  • a non-empty intersection between any read quorum
    and any write quorum
  • a non-empty intersection between any two write
    quorums

14
Witnesses
  • Witness
  • a small entity that maintains enough information
    to identify the replicas that contain the most
    recent version of the data
  • the information could be a timestamp recording
    the time of the latest update
  • it could also be a version number, an integer
    incremented each time the data are updated

15
Classification of P2P Storage Sys.
  • Unstructured
  • Replication Strategies for Highly Available
    Peer-to-peer Storage
  • Replication Strategies in Unstructured
    Peer-to-peer Networks
  • Structured
  • CFS
  • PAST
  • LAR
  • Ivy
  • Oasis
  • Om
  • Eliot
  • Sigma (for mutual exclusion primitive)

[Annotation: the structured systems are grouped into read-only and read/write (mutable) categories]
16
Ivy
  • Stores a set of logs with the aid of distributed
    hash tables.
  • Ivy keeps, for each participant, a log storing
    all of its updates, and maintains data
    consistency optimistically by performing conflict
    resolution among all logs (i.e., consistency is
    maintained in a best-effort manner).
  • The logs must be kept indefinitely, and a
    participant must scan all the logs related to a
    file to look up the up-to-date file data. Thus,
    Ivy is suitable only for small groups of
    participants.

17
Ivy Uses a DHT
[Figure: layered architecture. The distributed application (Ivy) calls put(key, data) / get(key) on the distributed hash table (DHash), which calls lookup(key) on the lookup service (Chord) and receives a node IP address.]
  • The DHT provides
  • A simple API
  • put(key, value) and get(key) → value
  • Availability (replication)
  • Robustness (integrity checking)
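
A sketch of how an application sits on this API (an in-memory stand-in for DHash, purely for illustration; the real system spreads keys across nodes via Chord):

    class DHT:
        # In-memory stand-in for a distributed hash table.
        def __init__(self):
            self.store = {}
        def put(self, key, value):
            self.store[key] = value
        def get(self, key):
            return self.store.get(key)

    dht = DHT()
    dht.put("log-head:alice", "block-97")  # e.g., a per-participant log head
    print(dht.get("log-head:alice"))       # -> "block-97"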

18
Solution: Log-Based
  • Update: each participant maintains a log of its
    changes to the file system
  • Lookup: each participant scans all logs (see the
    sketch below)
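
A minimal sketch of log-based lookup (the record layout and "newest wins" rule are illustrative; Ivy actually orders records with version vectors):

    from dataclasses import dataclass

    @dataclass
    class LogRecord:
        seq: int     # per-participant sequence number
        path: str    # file the record modifies
        data: bytes  # new contents

    def lookup(path, logs):
        # Scan every participant's log and return the newest
        # data for the given path.
        newest = None
        for log in logs:  # one log per participant
            for rec in log:
                if rec.path == path and (newest is None or rec.seq > newest.seq):
                    newest = rec
        return newest.data if newest else None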

19
Eliot
  • Eliot relies on a reliable, fault-tolerant,
    immutable P2P storage substrate (Charles) to
    store data blocks, and uses an auxiliary metadata
    service (MS) for storing mutable metadata.
  • It supports NFS-like consistency semantics;
    however, the traffic between the MS and the
    client is high for such semantics.
  • It also supports AFS open-close consistency
    semantics; however, these semantics may cause the
    problem of lost updates.
  • The MS is provided by a conventional replicated
    database, which may not fit dynamic P2P
    environments.

20
Oasis
  • Oasis is based on Gifford's weighted-voting
    quorum concept and allows dynamic quorum
    membership.
  • It spreads versioned metadata along with data
    replicas over the P2P network.
  • To complete an operation on a data object, a
    client must first find metadata related to the
    object and determine the total number of votes,
    the votes required for read/write operations, the
    replica list, and so on, in order to form a
    quorum accordingly.
  • One drawback of Oasis is that if a node happens
    to use stale metadata, data consistency may be
    violated.

25
Om
  • Om is based on the concepts of automatic replica
    regeneration and replica membership
    reconfiguration.
  • Consistency is maintained by two quorum systems:
    a read-one-write-all quorum system for accessing
    replicas, and a witness-model quorum system for
    reconfiguration.
  • Om allows replica regeneration from a single
    replica. However, a write in Om is always first
    forwarded to the primary copy, which serializes
    all writes and uses a two-phase procedure to
    propagate each write to all secondary replicas.
  • The drawbacks of Om are: (1) the primary replica
    may become a bottleneck; (2) the overhead
    incurred by the two-phase procedure may be too
    high; (3) reconfiguration by the witness model
    has some probability of violating consistency.

26
Om and Pastry
[Figure: PAST example on a Pastry ring with nodes 80, 90, 98, 99, 100, 101, 103, 104, and 120; an object with key 100 is stored at node 100]
27
Om and Pastry
[Figure: PAST example, continued; the object with key 100 is replicated onto neighboring nodes]
28
Om and Pastry
[Figure: PAST example, continued; one of the replicas crashes]
29
Om and Pastry
[Figure: PAST example, continued; Om regenerates the crashed replica on another node]
30
Om Normal-Case Operation
  • Read-one / write-all approach; writes are
    serialized via the primary
[Figure: Pastry ring as before; a read is served by any single replica, while a write is forwarded to the primary, which propagates it to all replicas]
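
A condensed sketch of the normal-case rule (a single-process stand-in; Om's actual two-phase propagation and failure handling are omitted):

    class OmReplicaGroup:
        # Read-one / write-all with writes serialized at the primary.
        def __init__(self, n):
            self.values = [None] * n  # one entry per replica
            self.primary = 0

        def read(self, i):
            # Any single replica can serve a read.
            return self.values[i]

        def write(self, value):
            # Every write is forwarded to the primary, which orders
            # it and propagates it to all replicas (two-phase in Om).
            for i in range(len(self.values)):
                self.values[i] = value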
31
Om System Architecture Overview
32
Witness Model
  • The witness model utilizes the following limited
    view divergence property
  • Intuitively, the property says that two replicas
    are unlikely to have completely different views
    regarding the reachability of a set of
    randomly placed witnesses.

33
Witness Model
  • To utilize the limited view divergence property,
    all replicas logically organize the witnesses
    into an m × t matrix
  • The number of rows, m, determines the probability
    of intersection
  • The number of columns, t, protects against the
    failure of individual witnesses, so that each row
    has at least one functioning witness with high
    probability

34
Witness Model
35
Sigma System Model
36
Sigma System Model
  • Replicas are always available, but their internal
    states may be randomly reset (a failure-recovery
    model).
  • The number of clients is unpredictable. Clients
    are not malicious and are fail-stop.
  • Clients and replicas communicate via messages,
    which may be duplicated or lost, but never
    forged.

37
Sigma
  • The Sigma protocol intelligently collects state
    from all replicas to achieve mutual exclusion.
  • The basic idea of the Sigma protocol is as
    follows. A node u wishing to win the mutual
    exclusion sends a timestamped request to each of
    the n (n = 3k+1) replicas and waits for replies.
    On receiving a request from u, a replica v puts
    u's request into a local queue ordered by
    timestamp, regards as the winner the node whose
    request is at the front of the queue, and replies
    to u with that winner's ID.

38
Sigma
  • When the number of replies received by u exceeds
    m (m = 2k+1), u acts according to the following
    conditions: (1) if more than m replies name u as
    the winner, then u is the winner; (2) if more
    than m replies name some w (w ≠ u) as the winner,
    then w is the winner and u keeps waiting; (3) if
    no node is named the winner by more than m
    replies, then u sends a YIELD message to cancel
    its request temporarily and later re-inserts the
    request after a random backoff.
  • In this manner, one node can eventually be
    elected the winner even when the communication
    delay variance is large.
  • A drawback of the Sigma protocol is that a node
    must send requests to all replicas and obtain
    favorable replies from a large portion (≥ 2/3) of
    them to win the mutual exclusion, which incurs
    large overhead. The overhead grows even larger
    under high contention.
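
A condensed sketch of the client-side decision rule described above (message passing is collapsed into a list of reply values; this illustrates the rule, not Sigma's actual implementation):

    from collections import Counter

    def decide(my_id, replies, m):
        # replies: winner IDs reported by individual replicas.
        # m = 2k+1 when there are n = 3k+1 replicas.
        winner, votes = Counter(replies).most_common(1)[0]
        if votes > m:
            return "i-win" if winner == my_id else "wait"
        # No candidate named by more than m replies: yield, back
        # off randomly, then re-insert the request.
        return "yield-and-retry"

    # Example with k = 1 (n = 4 replicas, m = 3):
    print(decide("u", ["u", "u", "u", "u"], 3))  # -> i-win
    print(decide("u", ["w", "w", "w", "w"], 3))  # -> wait
    print(decide("u", ["u", "u", "w", "w"], 3))  # -> yield-and-retry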

39
References
  • [Ivy] A. Muthitacharoen, R. Morris, T. Gil, and
    B. Chen, "Ivy: A Read/Write Peer-to-Peer File
    System," in Proc. of the Symposium on Operating
    Systems Design and Implementation (OSDI), 2002.
  • [Eliot] C. Stein, M. Tucker, and M. Seltzer,
    "Building a Reliable Mutable File System on
    Peer-to-Peer Storage," in Proc. of the 21st IEEE
    Symposium on Reliable Distributed Systems
    (WRP2PDS), 2002.
  • [Oasis] M. Rodrig and A. LaMarca, "Decentralized
    Weighted Voting for P2P Data Management," in
    Proc. of the 3rd ACM International Workshop on
    Data Engineering for Wireless and Mobile Access,
    pp. 85-92, 2003.
  • [Om] H. Yu and A. Vahdat, "Consistent and
    Automatic Replica Regeneration," in Proc. of the
    First Symposium on Networked Systems Design and
    Implementation (NSDI '04), 2004.
  • [Sigma] S. Lin, Q. Lian, M. Chen, and Z. Zhang,
    "A Practical Distributed Mutual Exclusion
    Protocol in Dynamic Peer-to-Peer Systems," in
    Proc. of the 3rd International Workshop on
    Peer-to-Peer Systems (IPTPS '04), 2004.

40
  • Q&A