Transcript and Presenter's Notes

Title: PAST: A large-scale persistent peer-to-peer storage utility


1
PAST: A large-scale persistent peer-to-peer
storage utility
  • LECS Reading Group
  • 10/23/2001

2
P2P in the Internet
  • Napster: a peer-to-peer file sharing application
  • allows Internet users to exchange files directly
  • a simple idea, hugely successful
  • fastest growing Web application
  • 50 Million users in January 2001
  • shut down in February 2001
  • similar systems/startups followed in rapid
    succession
  • Gnutella, Scour, Freenet, Groove, Flycode, vTrails

3
Peer-to-peer computing
  • Peer-to-peer systems
  • distributed; nodes have identical capabilities
    and responsibilities, and communication is
    symmetric
  • Technical Potential
  • can harness huge amounts of resources
  • user PCs' disk space, upstream bandwidth, CPU
    cycles
  • without requiring expensive hardware,
    bandwidth, or rack space
  • completely distributed
  • robust, less vulnerable to DoS attacks, harder to
    censor

Technical Challenges are decentralized
control, self-organization, adaptation and
scalability!
4-6
Napster (diagram sequence: a peer at 128.1.2.3 registers (xyz.mp3, 128.1.2.3)
with the central Napster server; another peer asks the server "xyz.mp3 ?",
receives 128.1.2.3, and fetches the file directly from that peer)
7-10
Gnutella (diagram sequence: a peer floods the query "xyz.mp3 ?" to its
neighbors, which forward it hop by hop until a peer holding xyz.mp3
responds and serves the file)
11
Peer-to-peer File Sharing
  • Napster
  • decentralized storage of actual content
  • transfer content directly from one peer (client)
    to another
  • centralized index and search
  • simple, but O(N) state and single point of
    failure
  • Gnutella
  • like a decentralized Napster
  • distributed index and search
  • Robust, but worst case O(N) messages per lookup

Next-generation systems build on distributed
indexing and lookup services
12
Large-scale Storage Management Systems
  • Distributed storage infrastructure
  • PAST (Rice and Microsoft Research, routing
    substrate - Pastry)
  • OceanStore (U.C. Berkeley, routing substrate -
    Tapestry)
  • Publius (AT&T)
  • Farsite (Microsoft Research)
  • CFS (MIT, routing substrate - Chord)
  • GRCD (UC Berkeley, builds on CAN)
  • Goals
  • Continuous access to persistent information
  • Utility infrastructure that manages customer
    content
  • Resilience to DoS attacks, censorship, other node
    failures.

13
PAST
  • Internet-based, peer-to-peer global storage
    utility
  • Goals: strong persistence, high availability,
    scalability and security
  • Overview
  • PAST API for Clients
  • Pastry
  • Peer-to-peer routing substrate
  • Storage management
  • store multiple replicas of files
  • Cache management
  • cache additional copies of popular files

14
PAST API for Clients
  • fileId = Insert(name, owner-credentials, k, file)
  • stores the file at k distinct nodes in the PAST
    network
  • fileId = SHA-1(name, owner-credentials, random
    number)
  • file = Lookup(fileId)
  • reliably retrieves a copy of the file, normally
    from a nearby node
  • Reclaim(fileId, owner-credentials)
  • reclaims the storage occupied by the k copies of
    the file identified by fileId

Archival storage and content distribution, not a
general-purpose FS: no searching, directory
lookup, or key distribution operations (a Java sketch of this API follows)
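
A minimal Java sketch of how this client API could be expressed (PAST itself is implemented in Java, per the evaluation slide); the interface and type names below are illustrative assumptions, not the actual PAST code:

    // Illustrative sketch of the three-operation PAST client API above.
    // Names and types are assumptions; the real interface differs.
    public interface PastClient {

        // Stores k replicas of the file on the k PAST nodes whose nodeIds
        // are numerically closest to the returned 160-bit fileId.
        byte[] insert(String name, Credentials ownerCredentials, int k, byte[] file);

        // Reliably retrieves one copy of the file, normally from a nearby replica.
        byte[] lookup(byte[] fileId);

        // Asks the k storing nodes to reclaim the space used by the file;
        // requires the same owner credentials that were used at insert time.
        void reclaim(byte[] fileId, Credentials ownerCredentials);

        // Placeholder for whatever credential scheme a deployment uses.
        interface Credentials {}
    }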
15
PAST IDs
  • File identifier: 160 bits; the 128 most significant
    bits form the keyId
  • Node identifier: 128 bits
  • Both are uniformly distributed
  • Both lie in the same namespace
  • How to map keyIds to nodeIds?
  • Use Pastry (see the fileId sketch below)
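
A sketch, using the JDK's MessageDigest, of how the 160-bit fileId and 128-bit keyId described above could be derived; the exact encoding of the hash input and the salt length are assumptions:

    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import java.util.Arrays;

    // Sketch: fileId = SHA-1(name, owner credentials, random number), 160 bits;
    // the 128 most significant bits form the keyId that Pastry routes on.
    public final class PastIds {

        public static byte[] fileId(String name, byte[] ownerCredentials) throws Exception {
            byte[] salt = new byte[8];                     // the "random number"; length is an assumption
            new SecureRandom().nextBytes(salt);

            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(name.getBytes("UTF-8"));
            sha1.update(ownerCredentials);
            sha1.update(salt);
            return sha1.digest();                          // 160 bits (20 bytes)
        }

        // keyId = 128 most significant bits of the fileId.
        public static byte[] keyId(byte[] fileId) {
            return Arrays.copyOfRange(fileId, 0, 16);      // first 16 bytes
        }
    }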

16
Pastry: Peer-to-peer routing substrate
  • Provides generic, scalable indexing, data location
    and routing for peer-to-peer applications
  • Inspired by Plaxton's algorithm (used in web
    content distribution, e.g., Akamai) and landmark
    hierarchy routing
  • Goals
  • Efficiency
  • Scalability
  • Fault Resilience
  • Self-organization (completely decentralized)

17-21
Pastry Basic Idea (diagram sequence: insert(K1,V1) is routed through the
overlay to the node responsible for key K1, which stores (K1,V1); a later
retrieve(K1) is routed to that same node)
22
PAST/Pastry NodeId space
  • NodeIds are 128 bits long (max. 2^128 nodes)
  • A nodeId is a sequence of L base-2^b (b-bit) digits,
    i.e., L levels with b = 128/L bits per level
  • The namespace is circular, running from 0 to
    2^128 - 1 and wrapping around (2^128 = 0)
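
A small sketch of the digit view described above: a 128-bit nodeId read as L = 128/b digits of b bits each, most significant digit first (the helper name is illustrative):

    import java.math.BigInteger;

    // Sketch: view a 128-bit nodeId as L = 128/b base-2^b digits.
    final class NodeIdView {
        static int[] digits(BigInteger nodeId, int b) {
            int levels = 128 / b;                          // L levels
            int[] d = new int[levels];
            for (int i = 0; i < levels; i++)               // digit i occupies bits [128-(i+1)b, 128-ib)
                d[i] = nodeId.shiftRight(128 - (i + 1) * b).intValue() & ((1 << b) - 1);
            return d;
        }
    }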
23
State of a Pastry Node
  • Routing Table (example: a node whose nodeId begins
    with the digits 1 2 3 ...)
  • entries consist of a nodeId and the
    IP address of that node
  • ceil(log_2^b N) levels; each level corresponds to
    a row
  • 2^b - 1 entries per level, i.e., columns per row
  • each entry in row n corresponds to a node
    whose nodeId matches the local nodeId in the first n
    digits and differs in digit (n+1)

(Figure: the example routing table; row 0 holds entries Xi ..., row 1
holds entries 1 Yi ..., row 2 holds entries 1 2 ..., where Xi and Yi are
arbitrary digits in 0, ..., 2^b - 1.)
24
State of a Pastry Node
  • Leaf Set
  • l nearby nodes based on proximity in nodeId
    space
  • Neighborhood Set
  • l nearby nodes based on a network proximity metric
  • not used for routing
  • used during node addition/recovery

(Figure: example leaf-set entries for node 10233102 in a 16-bit nodeId
space with l = 8 and b = 2.)
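
A sketch of the per-node state described on the two slides above (routing table, leaf set, neighborhood set); field and class names are illustrative assumptions:

    import java.math.BigInteger;
    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of Pastry node state: routing table entries are (nodeId, IP address).
    public final class PastryNodeState {

        static final class Entry {
            final BigInteger nodeId;
            final InetAddress address;
            Entry(BigInteger nodeId, InetAddress address) {
                this.nodeId = nodeId;
                this.address = address;
            }
        }

        // ceil(log_2^b N) rows, 2^b - 1 usable columns per row; row n holds nodes
        // whose nodeIds share the first n digits with this node's nodeId.
        final Entry[][] routingTable;

        // l nodes with nodeIds numerically closest to this node (used for routing).
        final List<Entry> leafSet = new ArrayList<>();

        // l nodes closest by the network proximity metric
        // (used during node addition/recovery, not for routing).
        final List<Entry> neighborhoodSet = new ArrayList<>();

        PastryNodeState(int rows, int b) {
            this.routingTable = new Entry[rows][(1 << b) - 1];
        }
    }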
25
Routing Requests in Pastry
  • Route(my-id, key-id, message)
  • if key-id is in the range of my leaf set
  • forward to the numerically closest node in the
    leaf set
  • else if the routing table has a node-id that shares a
    longer prefix with key-id than my-id does
  • forward to that node
  • else
  • forward to a node-id that shares a prefix of the
    same length with key-id as my-id but is
    numerically closer (see the Java sketch below)

Routing takes O(log N) messages
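
A self-contained Java sketch of this per-hop decision over bare 128-bit nodeIds; the circular wrap-around of the namespace and the exact leaf-set range test are simplified, and the data-structure layout is an assumption:

    import java.math.BigInteger;
    import java.util.List;

    // Sketch of Pastry's forwarding decision. routingTable is assumed to have
    // 128/b rows and 2^b columns (null = empty slot); wrap-around is ignored.
    public final class PastryRouting {
        static final int ID_BITS = 128;

        // Digit `row` (0 = most significant) of a 128-bit id, base 2^b.
        static int digit(BigInteger id, int row, int b) {
            return id.shiftRight(ID_BITS - (row + 1) * b).intValue() & ((1 << b) - 1);
        }

        // Number of leading digits shared by two ids.
        static int sharedPrefixLength(BigInteger a, BigInteger c, int b) {
            for (int i = 0; i < ID_BITS / b; i++)
                if (digit(a, i, b) != digit(c, i, b)) return i;
            return ID_BITS / b;
        }

        /** Next hop for keyId, or myId if this node handles the message itself. */
        static BigInteger nextHop(BigInteger myId, BigInteger keyId, int b,
                                  List<BigInteger> leafSet,
                                  BigInteger[][] routingTable,
                                  List<BigInteger> allKnownNodes) {
            // Case 1: keyId lies within the leaf-set range -> numerically closest node.
            BigInteger min = myId, max = myId;
            for (BigInteger n : leafSet) {
                if (n.compareTo(min) < 0) min = n;
                if (n.compareTo(max) > 0) max = n;
            }
            if (keyId.compareTo(min) >= 0 && keyId.compareTo(max) <= 0) {
                BigInteger best = myId;
                for (BigInteger n : leafSet)
                    if (n.subtract(keyId).abs().compareTo(best.subtract(keyId).abs()) < 0)
                        best = n;
                return best;
            }
            // Case 2: routing-table entry sharing a longer prefix with keyId than myId does.
            int l = sharedPrefixLength(myId, keyId, b);
            BigInteger entry = routingTable[l][digit(keyId, l, b)];
            if (entry != null) return entry;
            // Case 3 (rare): any known node with at least as long a shared prefix
            // that is numerically closer to keyId than this node is.
            for (BigInteger n : allKnownNodes)
                if (sharedPrefixLength(n, keyId, b) >= l
                        && n.subtract(keyId).abs().compareTo(myId.subtract(keyId).abs()) < 0)
                    return n;
            return myId;
        }
    }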
26
Node Addition
  • X: the joining node
  • A: a node near X (network proximity)
  • Z: the node whose nodeId is numerically closest to X's
  • State of X after the join (see the sketch below):
  • leaf-set(X) = leaf-set(Z)
  • neighborhood-set(X) =
    neighborhood-set(A)
  • routing table of X, row i = routing
    table of Ni, row i, where Ni is the ith node
    encountered along the route from A to Z
  • X notifies all nodes in leaf-set(X), which update
    their state

(Diagram: join example in which Lookup(216) is routed from A = 10 through
intermediate nodes N1, N2, N36 to Z = 210, the node numerically closest
to 216; 240 is an adjacent nodeId.)
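
A sketch of how the joining node X could assemble its initial state from A, the intermediate nodes Ni, and Z, following the assignments above; the NodeState container and its dimensions are illustrative assumptions:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: build X's tables from A (nearby in the network), the nodes
    // N1..Nk encountered on the route, and Z (numerically closest nodeId).
    final class JoinSketch {

        static void initialize(NodeState x, NodeState a, List<NodeState> routeNodes, NodeState z) {
            x.leafSet = new ArrayList<>(z.leafSet);                 // leaf-set(X) = leaf-set(Z)
            x.neighborhoodSet = new ArrayList<>(a.neighborhoodSet); // neighborhood-set(X) = neighborhood-set(A)
            for (int i = 0; i < routeNodes.size() && i < x.routingTable.length; i++)
                x.routingTable[i] = routeNodes.get(i).routingTable[i].clone();  // row i from Ni
            // Finally X notifies every node in its leaf set so they can update their own state.
        }

        static final class NodeState {
            List<String> leafSet = new ArrayList<>();
            List<String> neighborhoodSet = new ArrayList<>();
            String[][] routingTable = new String[64][4];            // illustrative dimensions (b = 2)
        }
    }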
27
Node Failures, Recovery
  • Rely on a soft-state protocol to deal with node
    failures
  • neighboring nodes in the nodeId space periodically
    exchange keepalive messages
  • nodes unresponsive for a period T are removed from
    leaf sets (see the sketch below)
  • a recovering node contacts its last known leaf set,
    updates its own leaf set, and notifies members of its
    presence
  • Randomized routing to deal with malicious nodes
    that can cause repeated query failures

PASTRY details buried in Middleware 2001 paper
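
A minimal sketch of the soft-state removal described above: leaf-set members that have not sent a keepalive within the period T are dropped; the data structure and the value of T are assumptions:

    import java.util.Iterator;
    import java.util.Map;

    // Sketch: drop leaf-set members whose last keepalive is older than T.
    final class LeafSetLiveness {
        static final long T_MILLIS = 30_000;   // assumed timeout T

        // lastKeepalive maps a leaf-set member's nodeId to the time (ms) of its last keepalive.
        static void pruneUnresponsive(Map<String, Long> lastKeepalive, long nowMillis) {
            Iterator<Map.Entry<String, Long>> it = lastKeepalive.entrySet().iterator();
            while (it.hasNext()) {
                if (nowMillis - it.next().getValue() > T_MILLIS)
                    it.remove();               // remove the unresponsive node from the leaf set
            }
        }
    }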
28
PAST Storage Management
  • Goals
  • High global storage utilization
  • Graceful degradation near maximal utilization
  • Design Goals
  • Local coordination among nodes.
  • Fully integrate storage management w/ file
    insertion.
  • Modest performance overheads.
  • Challenges
  • Balancing unused storage among nodes vs. the
    requirement to maintain copies of each file at the k
    nodes with nodeIds closest to the fileId

29
Storage Load Imbalance
  • Causes
  • storage capacity differences among individual
    PAST nodes
  • high variance in file size distribution
  • statistical variation in fileID and nodeID
    assignments
  • Impact
  • not all of the k-closest nodes can accommodate a
    file replica
  • 3 solutions to deal with imbalances

30
1 Per-node storage control
  • Assumes no more than 2 orders of magnitude difference
    in the storage capacity of individual nodes
  • Advertised capacity controls admission of new
    nodes, compared to the average capacity (see the
    sketch below)
  • too large: split into multiple nodeIds
  • too small: reject
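
A sketch of this admission check; the factor of 100 mirrors the assumed two-orders-of-magnitude spread, and the exact thresholds are assumptions rather than PAST's actual policy:

    // Sketch: decide how to admit a node based on its advertised capacity
    // relative to the average capacity of nodes already in the system.
    final class AdmissionControl {
        enum Decision { ACCEPT, SPLIT_INTO_MULTIPLE_NODEIDS, REJECT }

        static Decision admit(long advertisedCapacity, long averageCapacity) {
            if (advertisedCapacity > 100L * averageCapacity)   // too large relative to the average
                return Decision.SPLIT_INTO_MULTIPLE_NODEIDS;
            if (100L * advertisedCapacity < averageCapacity)   // too small relative to the average
                return Decision.REJECT;
            return Decision.ACCEPT;
        }
    }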

31
2 Replica Diversion
  • Necessary when a node A among the k closest (to
    the fileId) cannot accommodate the file copy
    locally
  • Goal: balance the unused storage space among the
    nodes in a leaf set
  • Node A diverts the copy to a node B in its leaf set if
    (see the sketch after this list)
  • B is not among the k closest
  • B does not already have a diverted replica
  • Replica diversion is controlled by 3 policies to
    avoid the performance penalty of unnecessary replica
    diversion
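
A sketch of the eligibility test node A applies when choosing a diversion target B in its leaf set; the set-based representation is an assumption, and the three additional policies are only noted in a comment:

    import java.util.Set;

    // Sketch: pick a leaf-set node B that is not among the k closest to the
    // fileId and does not already hold a diverted replica of the file.
    final class ReplicaDiversion {
        static String chooseDivertTarget(Set<String> leafSet,
                                         Set<String> kClosestToFileId,
                                         Set<String> nodesWithDivertedReplica) {
            for (String b : leafSet) {
                if (!kClosestToFileId.contains(b) && !nodesWithDivertedReplica.contains(b))
                    return b;   // candidate B; three further policies restrict this choice
            }
            return null;        // no suitable node: fall back to file diversion (next slide)
        }
    }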

32
3 File Diversion
  • Necessary when a file insert fails even with
    replica diversion
  • Goal: balance the unused storage space among
    different portions of the nodeId space in PAST
  • the client generates a new fileId for the file and
    retries up to 3 times (see the sketch below)
  • the application is notified after 4 successive file
    insert failures
  • it can retry with a smaller file size or a smaller k
    (number of replicas)
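
A sketch of the client-side retry loop described above: each attempt derives a fresh fileId (new random salt), and after 4 successive failures the error is surfaced to the application; the PastClient interface is an assumed stand-in:

    // Sketch: one initial insert plus up to 3 retries with a new fileId each time.
    final class FileDiversionRetry {

        interface PastClient {
            byte[] insert(String name, byte[] ownerCredentials, int k, byte[] file) throws Exception;
        }

        static byte[] insertWithRetry(PastClient past, String name, byte[] owner,
                                      int k, byte[] file) throws Exception {
            Exception last = null;
            for (int attempt = 0; attempt < 4; attempt++) {
                try {
                    return past.insert(name, owner, k, file);  // a fresh salt gives a new fileId
                } catch (Exception failed) {
                    last = failed;
                }
            }
            // 4 successive failures: the application may retry with a smaller file or smaller k.
            throw last;
        }
    }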

33
PAST Cache Management at Nodes
  • Why cache file copies?
  • k replicas may not be enough for very popular
    files
  • beneficial if there exists spatial locality among
    clients of a particular file
  • Goals
  • minimize client access latencies
  • fetch distance in terms of Pastry Routing hops
  • maximize query throughput
  • balance query load in the system

34
Caching Policies
  • Insertion: a file routed through a node as part of
    an Insert or Lookup operation is cached locally if
  • its size is smaller than a fraction c of the node's
    current cache size
  • Replacement: GreedyDual-Size (GD-S) policy (see the
    sketch below)
  • assign a weight H_d to each cached file d, inversely
    proportional to the size of d
  • evict the file v with minimum weight H_v
  • subtract H_v from the weights of all remaining
    cached files (enforces aging)
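
A sketch of the GD-S replacement rule as stated above: each cached file d carries a weight H_d inversely proportional to its size, the minimum-weight file is evicted, and its weight is subtracted from the rest; the constant c is an assumption:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of GreedyDual-Size replacement for the PAST file cache.
    final class GreedyDualSizeCache {
        private static final double C = 1.0;                        // assumed cost constant
        private final Map<String, Double> weight = new HashMap<>(); // fileId -> H_d

        void onCacheInsert(String fileId, long sizeBytes) {
            weight.put(fileId, C / sizeBytes);                      // H_d inversely proportional to size
        }

        /** Evict the minimum-weight file, age the rest, and return the victim's fileId. */
        String evictOne() {
            String victim = null;
            double hV = Double.MAX_VALUE;
            for (Map.Entry<String, Double> e : weight.entrySet())
                if (e.getValue() < hV) { victim = e.getKey(); hV = e.getValue(); }
            if (victim == null) return null;                        // cache is empty
            weight.remove(victim);
            final double hVictim = hV;
            weight.replaceAll((id, h) -> h - hVictim);              // subtract H_v (aging)
            return victim;
        }
    }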

35
Evaluation
  • PAST implemented in Java
  • Network emulation using a Java VM
  • 2 workloads (based on NLANR traces) for
    file sizes
  • 4 normal distributions of node storage sizes

36
Key Results
  • STORAGE
  • Replica and file diversion improve global
    storage utilization from 60.8% to 98% compared to
    no diversion
  • insertion failures drop sharply
  • Caveat: storage capacities used in the experiment are
    1000x below what might be expected in
    practice
  • CACHING
  • Routing hops with caching are lower than without
    caching, even at 99% storage utilization
  • Caveat: median file sizes are very low; caching
    performance will likely degrade if they are
    higher

37
Questions
  • Is PASTRY really self-organizing?
  • IP multicast based expanding ring search etc.
    not viable.
  • Get Nearest network node externally for Node
    Joins/Additions how will you do this in
    practice?
  • Is strong persistence an overkill?
  • Makes the system needlessly complicated
    (especially w.r.t replica maintenance and
    diversion policies)
  • k the number of replicas anyway.
  • How do caches purge copies of Reclaimed files?
  • How to deal with arbitrary large files?
  • Isnt CFS block based storage scheme much better
    in this case?