1
Scalable peer-to-peer substrates: A new foundation for distributed applications?
  • Peter Druschel, Rice University
  • Antony Rowstron,
  • Microsoft Research Cambridge, UK
  • Collaborators
  • Miguel Castro, Anne-Marie Kermarrec, MSR
    Cambridge
  • Y. Charlie Hu, Sitaram Iyer, Animesh Nandi, Atul
    Singh, Dan Wallach, Rice University

2
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • PAST
  • SCRIBE
  • Conclusions

3
Background
  • Peer-to-peer systems
  • distribution
  • decentralized control
  • self-organization
  • symmetry (communication, node roles)

4
Peer-to-peer applications
  • Pioneers: Napster, Gnutella, FreeNet
  • File sharing: CFS, PAST [SOSP'01]
  • Network storage: FarSite [Sigmetrics'00], OceanStore [ASPLOS'00], PAST [SOSP'01]
  • Web caching: Squirrel [PODC'02]
  • Event notification/multicast: Herald [HotOS'01], Bayeux [NOSSDAV'01], CAN-multicast [NGC'01], SCRIBE [NGC'01], SplitStream [submitted]
  • Anonymity: Crowds [CACM'99], Onion routing [JSAC'98]
  • Censorship resistance: Tangler [CCS'02]

5
Common issues
  • Organize, maintain overlay network
  • node arrivals
  • node failures
  • Resource allocation/load balancing
  • Resource location
  • Network proximity routing
  • Idea: provide a generic p2p substrate

6
Architecture
[Diagram: layered architecture]
  • P2P application layer: network storage, event notification, ?
  • P2P substrate (self-organizing overlay network): Pastry
  • Internet: TCP/IP
7
Structured p2p overlays
  • One primitive
  • route(M, X): route message M to the live node
    with nodeId closest to key X
  • nodeIds and keys are from a large, sparse id
    space

8
Distributed Hash Tables (DHT)
[Figure: P2P overlay network mapping keys (k1,v1) ... (k6,v6) onto nodes]
Operations: insert(k,v), lookup(k)
  • p2p overlay maps keys to nodes
  • completely decentralized and self-organizing
  • robust, scalable
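
To make the abstraction concrete, here is a minimal DHT sketch in Python (a ToyDHT class of my own; it assumes SHA-1 keys truncated to a 128-bit id space and a global view of the ring, which a real overlay replaces with distributed routing):

import hashlib
from bisect import bisect_left

class ToyDHT:
    ID_SPACE = 2 ** 128

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)              # live nodeIds on the ring
        self.store = {n: {} for n in self.nodes}   # per-node key/value store

    def _key_id(self, k):
        # Hash the key into the same 128-bit circular id space as nodeIds.
        return int.from_bytes(hashlib.sha1(k.encode()).digest()[:16], "big")

    def _owner(self, key_id):
        # Node whose nodeId is numerically closest to key_id on the circle.
        i = bisect_left(self.nodes, key_id) % len(self.nodes)
        candidates = (self.nodes[i - 1], self.nodes[i])
        dist = lambda n: min((n - key_id) % self.ID_SPACE,
                             (key_id - n) % self.ID_SPACE)
        return min(candidates, key=dist)

    def insert(self, k, v):
        self.store[self._owner(self._key_id(k))][k] = v

    def lookup(self, k):
        return self.store[self._owner(self._key_id(k))].get(k)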

9
Why structured p2p overlays?
  • Leverage pooled resources (storage, bandwidth,
    CPU)
  • Leverage resource diversity (geographic,
    ownership)
  • Leverage existing shared infrastructure
  • Scalability
  • Robustness
  • Self-organization

10
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • PAST
  • SCRIBE
  • Conclusions

11
Pastry
  • Generic p2p location and routing substrate
  • Self-organizing overlay network
  • Lookup/insert object in < log16 N routing steps
    (expected)
  • O(log N) per-node state
  • Network proximity routing

12
Pastry: Related work
  • Chord [SIGCOMM'01]
  • CAN [SIGCOMM'01]
  • Tapestry [TR UCB/CSD-01-1141]
  • PNRP [unpublished]
  • Viceroy [PODC'02]
  • Kademlia [IPTPS'02]
  • Small worlds [Kleinberg '99, '00]
  • Plaxton trees [Plaxton et al. '97]

13
Pastry: Object distribution
  • Consistent hashing [Karger et al. '97]
  • 128-bit circular id space
  • nodeIds (uniform random)
  • objIds (uniform random)
  • Invariant: node with numerically closest nodeId
    maintains object

[Figure: circular id space from 0 to 2^128-1 with uniformly distributed nodeIds; an objId is maintained by the node with the numerically closest nodeId]
14
Pastry: Object insertion/lookup
[Figure: Route(X) travels the circular id space (0 to 2^128-1) to the live node with nodeId closest to X]
A message with key X is routed to the live node with nodeId closest to X.
Problem: a complete routing table is not feasible.
15
Pastry: Routing
  • Tradeoff:
  • O(log N) routing table size
  • O(log N) message forwarding steps

16
Pastry: Routing table (node 65a1fcx)
[Figure: routing table of node 65a1fcx, rows 0-3 shown, log16 N rows in total; row l holds entries whose nodeIds share their first l digits with 65a1fcx, with one column per value of the next digit]
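
A small sketch of how such a table is indexed (hypothetical helpers; assumes nodeIds written as hex strings, i.e. digit base 16, b = 4):

def shared_prefix_len(a, b):
    # Number of leading hex digits the two ids have in common.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Routing table of node 65a1fc..: entry [l][d] holds a nodeId that
# shares its first l digits with the local id and has digit d next.
LOCAL_ID = "65a1fc00"
table = [[None] * 16 for _ in range(len(LOCAL_ID))]

def table_slot(candidate):
    l = shared_prefix_len(LOCAL_ID, candidate)
    if l == len(LOCAL_ID):
        return None                      # candidate is the local node itself
    d = int(candidate[l], 16)            # first digit where the ids differ
    return l, d

row, col = table_slot("d13da300")        # -> (0, 13): row 0, column 'd'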
17
Pastry: Routing
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f, d462ba to the destination d46a1c; each hop shares a progressively longer nodeId prefix with the key (nearby nodes d467c4 and d471f1 also shown)]
  • Properties
  • log16 N steps
  • O(log N) state
18
Pastry: Leaf sets
  • Each node maintains IP addresses of the nodes
    with the L/2 numerically closest larger and
    smaller nodeIds, respectively.
  • routing efficiency/robustness
  • fault detection (keep-alive)
  • application-specific local coordination

19
Pastry: Routing procedure

if (destination D is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of prefix shared with D
    let d = value of the l-th digit in D's address
    if (R[l,d] exists)
        forward to R[l,d]
    else
        forward to a known node that
        (a) shares at least as long a prefix, and
        (b) is numerically closer to D than this node
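
The same procedure as a runnable Python sketch (hypothetical node representation; shared_prefix_len is the helper from the routing-table sketch above, and the leaf-set range check is simplified to plain numeric order, ignoring wrap-around):

def key_num(s):
    return int(s, 16)

def route_next_hop(node, key):
    # node: dict with "id", "leaf_set" (list of nodeIds),
    # "R" (routing table, R[l][d]) and "known" (other known nodes).
    leaves = node["leaf_set"] + [node["id"]]
    if min(map(key_num, leaves)) <= key_num(key) <= max(map(key_num, leaves)):
        # Destination falls within the leaf set: forward to the
        # numerically closest member (possibly this node itself).
        return min(leaves, key=lambda n: abs(key_num(n) - key_num(key)))
    l = shared_prefix_len(node["id"], key)
    d = int(key[l], 16)
    if node["R"][l][d] is not None:
        return node["R"][l][d]           # match one more digit of the key
    # Rare case: forward to any known node that shares at least as long
    # a prefix and is numerically closer to the key than this node.
    for n in node["known"]:
        if (shared_prefix_len(n, key) >= l and
                abs(key_num(n) - key_num(key)) <
                abs(key_num(node["id"]) - key_num(key))):
            return n
    return node["id"]                    # this node is the destination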
20
Pastry: Performance
  • Integrity of overlay / message delivery:
  • guaranteed unless L/2 simultaneous failures of
    nodes with adjacent nodeIds
  • Number of routing hops:
  • no failures: < log16 N expected, 128/b + 1 max
  • during failure recovery:
  • O(N) worst case, average case much better

21
Pastry: Self-organization
  • Initializing and maintaining routing tables and
    leaf sets
  • Node addition
  • Node departure (failure)

22
Pastry: Node addition
[Figure: new node with nodeId d46a1c joins by asking a nearby node (65a1fc) to route a join message with key d46a1c; the nodes along the route (d13da3, d4213f, d462ba, ...) supply the state from which the new node initializes its routing table and leaf set]
23
Node departure (failure)
  • Leaf set members exchange keep-alive messages
  • Leaf set repair (eager): request the set from the
    farthest live node in the set
  • Routing table repair (lazy): get the table from peers
    in the same row, then higher rows

24
Pastry: Experimental results
  • Prototype
  • implemented in Java
  • emulated network
  • deployed testbed (currently 25 sites worldwide)

25
Pastry: Average # of hops
L = 16, 100k random queries
26
Pastry: # of hops (100k nodes)
L = 16, 100k random queries
27
Pastry: # of routing hops (failures)
L = 16, 100k random queries, 5k nodes, 500 failures
28
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • PAST
  • SCRIBE
  • Conclusions

29
Pastry: Proximity routing
  • Assumption: scalar proximity metric
  • e.g. ping delay, IP hops
  • a node can probe distance to any other node
  • Proximity invariant: each routing table entry refers
    to a node close to the local node (in the proximity
    space), among all nodes with the appropriate nodeId
    prefix
  • Locality-related route qualities:
  • distance traveled
  • likelihood of locating the nearest replica

30
Pastry: Routes in proximity space
31
Pastry: Distance traveled
L = 16, 100k random queries, Euclidean proximity space
32
Pastry: Locality properties
  • 1) Expected distance traveled by a message in the
    proximity space is within a small constant of the
    minimum
  • 2) Routes of messages sent with the same key by
    nearby nodes converge at a node near the source
    nodes
  • 3) Among the k nodes with nodeIds closest to the
    key, a message is likely to reach first the node
    closest to the source node

33
Pastry: Node addition
34
Pastry: delay vs IP delay
GATech topology, 0.5M hosts, 60K nodes, 20K random messages
35
Pastry API
  • route(M, X): route message M to the node with nodeId
    numerically closest to X
  • deliver(M): deliver message M to the application
  • forwarding(M, X): message M is being forwarded
    towards key X
  • newLeaf(L): report change in leaf set L to the
    application
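
One possible rendering of this API in Python (a sketch only; the split into application upcalls and a substrate downcall is mine, the method names follow the slide):

from abc import ABC, abstractmethod

class Application(ABC):
    # Upcalls from Pastry into the application.
    @abstractmethod
    def deliver(self, msg): ...              # msg arrived at its destination

    @abstractmethod
    def forwarding(self, msg, key): ...      # msg passing through this node

    @abstractmethod
    def newLeaf(self, leaf_set): ...         # leaf set membership changed

class Substrate(ABC):
    # Downcall from the application into Pastry.
    @abstractmethod
    def route(self, msg, key): ...           # route msg to the live node with
                                             # nodeId numerically closest to key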

36
Pastry: Security
  • Secure nodeId assignment
  • Secure node join protocols
  • Randomized routing
  • Byzantine fault-tolerant leaf set membership
    protocol

37
Pastry: Summary
  • Generic p2p overlay network
  • Scalable, fault resilient, self-organizing,
    secure
  • O(log N) routing steps (expected)
  • O(log N) routing table size
  • Network proximity routing

38
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • PAST
  • SCRIBE
  • Conclusions

39
PAST: Cooperative, archival file storage and distribution
  • Layered on top of Pastry
  • Strong persistence
  • High availability
  • Scalability
  • Reduced cost (no backup)
  • Efficient use of pooled resources

40
PAST API
  • Insert: store a replica of a file at k diverse
    storage nodes
  • Lookup: retrieve the file from a nearby live storage
    node that holds a copy
  • Reclaim: free the storage associated with a file
  • Files are immutable
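
A minimal sketch of these operations on top of Pastry's route primitive (hypothetical message formats; the node closest to fileId is responsible for replicating onto the k closest nodes from its leaf set):

def past_insert(pastry, file_id, data, k):
    # Route the file towards file_id; the node with the numerically
    # closest nodeId stores it and replicates it on the k nodes with
    # nodeIds closest to file_id (drawn from its leaf set).
    pastry.route({"op": "insert", "fileId": file_id, "data": data, "k": k},
                 file_id)

def past_lookup(pastry, file_id, client_id):
    # Route a lookup towards file_id; the first node on the route that
    # holds a replica (or a cached copy) replies to the client.
    pastry.route({"op": "lookup", "fileId": file_id, "replyTo": client_id},
                 file_id)

def past_reclaim(pastry, file_id, reclaim_cert):
    # Free the storage associated with file_id; the request carries a
    # certificate proving the owner is authorized to reclaim the file.
    pastry.route({"op": "reclaim", "fileId": file_id, "cert": reclaim_cert},
                 file_id)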

41
PAST: File storage
[Figure: Insert(fileId) routed through the overlay to the node with nodeId closest to fileId]
42
PAST: File storage
Storage invariant: file replicas are stored on the k nodes with nodeIds closest to fileId (k is bounded by the leaf set size)
43
PAST: File retrieval
[Figure: client C issues Lookup(fileId); k replicas are stored on the nodes closest to fileId]
The file is located in log16 N steps (expected); the lookup usually finds the replica nearest to client C.
44
PAST: Exploiting Pastry
  • Random, uniformly distributed nodeIds:
  • replicas stored on diverse nodes
  • Uniformly distributed fileIds, e.g. SHA-1(filename,
    public key, salt) (sketched below):
  • approximate load balance
  • Pastry routes to closest live nodeId:
  • availability, fault-tolerance
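
A sketch of the fileId derivation named above (the length-prefixed field encoding is an assumption, not PAST's actual format; SHA-1 yields 160 bits, of which Pastry's 128-bit id space uses the most significant bits):

import hashlib

def file_id(filename: str, public_key: bytes, salt: bytes) -> int:
    # 160-bit fileId = SHA-1 over filename, public key and salt.
    h = hashlib.sha1()
    for part in (filename.encode(), public_key, salt):
        h.update(len(part).to_bytes(4, "big"))  # length-prefix each field
        h.update(part)
    return int.from_bytes(h.digest(), "big")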

45
PAST: Storage management
  • Maintain storage invariant
  • Balance free space when global utilization is
    high
  • statistical variation in assignment of files to
    nodes (fileId/nodeId)
  • file size variations
  • node storage capacity variations
  • Local coordination only (leaf sets)

46
Experimental setup
  • Web proxy traces from NLANR:
  • 18.7 GB total; file sizes: 10.5 KB mean, 1.4 KB
    median, 0 min, 138 MB max
  • Filesystem:
  • 166.6 GB total; file sizes: 88 KB mean, 4.5 KB
    median, 0 min, 2.7 GB max
  • 2250 PAST nodes (k = 5)
  • truncated normal distributions of node storage
    sizes, mean 27/270 MB

47
Need for storage management
  • No diversion (t_pri = 1, t_div = 0):
  • max utilization 60.8%
  • 51.1% of inserts failed
  • Replica/file diversion (t_pri = 0.1, t_div = 0.05):
  • max utilization > 98%
  • < 1% of inserts failed

48
PAST: File insertion failures
49
PAST: Caching
  • Nodes cache files in the unused portion of their
    allocated disk space
  • Files are cached on nodes along the route of lookup
    and insert messages (see the sketch below)
  • Goals:
  • maximize query throughput for popular documents
  • balance query load
  • improve client latency
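
A minimal sketch of the caching rule referenced above (hypothetical node attributes; cached copies are expendable and evicted whenever the space is needed for primary replicas, the eviction policy itself is omitted):

def maybe_cache(node, file_id, data):
    # Invoked as insert/lookup traffic is forwarded through this node:
    # keep a copy if it fits in the unused portion of the disk.
    if node.free_space() >= len(data):
        node.cache[file_id] = data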

50
PAST: Caching
[Figure: Lookup(fileId) routed through the overlay; nodes along the route keep a cached copy of the file]
51
PAST: Caching
52
PAST: Security
  • No read access control: users may encrypt content
    for privacy
  • File authenticity: file certificates
  • System integrity: nodeIds and fileIds are
    non-forgeable; sensitive messages are signed
  • Routing: randomized

53
PAST: Storage quotas
  • Balance storage supply and demand
  • user holds smartcard issued by brokers
  • hides user private key, usage quota
  • debits quota upon issuing file certificate
  • storage nodes hold smartcards
  • advertise supply quota
  • storage nodes subject to random audits within
    leaf sets

54
PAST: Related work
  • CFS [SOSP'01]
  • OceanStore [ASPLOS'00]
  • FarSite [Sigmetrics'00]

55
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • PAST
  • SCRIBE
  • Conclusions

56
SCRIBE: Large-scale, decentralized multicast
  • Infrastructure to support topic-based
    publish-subscribe applications
  • Scalable: large numbers of topics and subscribers,
    wide range of subscribers per topic
  • Efficient: low delay, low link stress, low node
    overhead

57
SCRIBE: Large-scale multicast
[Figure: nodes Subscribe(topicId); subscriptions route towards the node with nodeId closest to topicId, forming a multicast tree rooted there; Publish(topicId) disseminates events down the tree]
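
A minimal sketch of Scribe's tree construction over Pastry's route primitive (hypothetical message format and per-topic children table; returning None from the forwarding upcall is this sketch's convention for absorbing a message):

def subscribe(node, topic_id):
    # Route a SUBSCRIBE towards topic_id; the tree root is the node
    # with nodeId numerically closest to topic_id.
    node.route({"op": "subscribe", "topic": topic_id, "child": node.id},
               topic_id)

def on_forwarding(node, msg, topic_id):
    # Pastry forwarding() upcall. Each node on the route adds the
    # previous hop as a child; forwarding stops at the first node
    # that is already in the tree for this topic.
    if msg["op"] == "subscribe":
        already_member = topic_id in node.children
        node.children.setdefault(topic_id, set()).add(msg["child"])
        msg["child"] = node.id                 # this node joins the tree
        if already_member:
            return None                        # stop: tree joined here
    return msg                                 # keep routing towards the root

def publish(node, topic_id, event):
    # Events are routed to the root, then disseminated down the tree
    # via the per-topic children sets.
    node.route({"op": "publish", "topic": topic_id, "event": event}, topic_id)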
58
Scribe: Results
  • Simulation results:
  • comparison with IP multicast: delay, node stress
    and link stress
  • Experimental setup:
  • Georgia Tech transit-stub model
  • 100,000 nodes randomly selected out of 0.5M
  • Zipf-like subscription distribution, 1500 topics

59
Scribe: Topic popularity
gsize(r) = floor(N * r^(-1.25) + 0.5), with N = 100,000 and 1,500 topics
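
Evaluated directly (plain Python):

def gsize(r, n=100_000):
    # Subscriber count for the topic with popularity rank r.
    return int(n * r ** -1.25 + 0.5)

print(gsize(1))      # 100000 subscribers for the most popular topic
print(gsize(1500))   # 11 subscribers for the least popular topic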
60
Scribe: Delay penalty
Relative delay penalty, average and maximum
61
Scribe: Node stress
62
Scribe: Link stress
One message published in each of the 1,500 topics
63
Related work
  • Narada
  • Bayeux/Tapestry
  • CAN-Multicast

64
Summary
  • Self-configuring P2P framework for topic-based
    publish-subscribe
  • Scribe achieves reasonable performance when
    compared to IP multicast
  • Scales to a large number of subscribers
  • Scales to a large number of topics
  • Good distribution of load

65
Status
  • Functional prototypes:
  • Pastry [Middleware 2001]
  • PAST [HotOS-VIII, SOSP'01]
  • SCRIBE [NGC 2001, IEEE JSAC]
  • SplitStream [submitted]
  • Squirrel [PODC'02]
  • http://www.cs.rice.edu/CS/Systems/Pastry

66
Current Work
  • Security
  • secure routing/overlay maintenance/nodeId
    assignment
  • quota system
  • Keyword search capabilities
  • Support for mutable files in PAST
  • Anonymity/Anti-censorship
  • New applications
  • Free software releases

67
Conclusion
  • For more information:
  • http://www.cs.rice.edu/CS/Systems/Pastry