Scalable peer-to-peer substrates: A new foundation for distributed applications (PowerPoint presentation transcript)

1
Scalable peer-to-peer substrates: A new
foundation for distributed applications?
  • Antony Rowstron,
  • Microsoft Research Cambridge, UK
  • Peter Druschel, Rice University
  • Collaborators
  • Miguel Castro, Anne-Marie Kermarrec, MSR
    Cambridge
  • Y. Charlie Hu, Sitaram Iyer, Dan Wallach, Rice
    University

2
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

3
Background: Peer-to-peer Systems
  • distributed
  • decentralized control
  • self-organizing
  • symmetric communication/roles

4
Background
  • Peer-to-peer applications
  • Pioneers: Napster, Gnutella, FreeNet
  • File sharing: CFS, PAST [SOSP'01]
  • Network storage: FarSite [Sigmetrics'00],
    OceanStore [ASPLOS'00], PAST [SOSP'01]
  • Multicast: Herald [HotOS'01], Bayeux [NOSDAV'01],
    CAN-multicast [NGC'01], SCRIBE [NGC'01]

5
Common issues
  • Organize, maintain overlay network
  • node arrivals
  • node failures
  • Resource allocation/load balancing
  • Resource location
  • Locality (network proximity)
  • Idea: generic P2P substrate

6
Architecture
[Figure: layered architecture; P2P application layer (event notification, network storage, ?) on top of the P2P substrate (self-organizing overlay network, i.e. a DHT), which runs over TCP/IP on the Internet]
7
DHTs
  • Peer-to-peer object location and routing
    substrate
  • Distributed Hash Table: maps an object key to a live
    node
  • Insert(key, object)
  • Lookup(key)
  • Keys are typically 128 bits
  • Pastry (developed at MSR Cambridge/Rice) is an
    example of such an infrastructure (see the interface
    sketch below).
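A minimal sketch of this DHT abstraction as a Java interface (hypothetical names; Pastry's actual API appears later in the talk):

import java.math.BigInteger;

// Minimal DHT abstraction: a 128-bit key is mapped to a live node,
// which stores or returns the associated object.
public interface DistributedHashTable {
    // Store the object under the given 128-bit key.
    void insert(BigInteger key, byte[] object);

    // Retrieve the object stored under the key, or null if absent.
    byte[] lookup(BigInteger key);
}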

8
DHT Related work
  • Chord (MIT/UCB) [Sigcomm'01]
  • CAN (ACIRI/UCB) [Sigcomm'01]
  • Tapestry (UCB) [TR UCB/CSD-01-1141]
  • PNRP (Microsoft) Huitema et al., unpublished
  • Kleinberg '99
  • Plaxton et al. '97

9
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

10
Squirrel: Web Caching
  • Reduce latency,
  • Reduce external bandwidth
  • Reduce web server load.
  • ISPs, Corporate network boundaries, etc.
  • Cooperative Web Caching: a group of web caches
    working together and acting as one web cache.

11
Centralized Web Cache
[Figure: browsers with local browser caches on a LAN share a centralized web cache, which fetches from web servers over the Internet; sharing!]
12
Decentralized Web Cache
[Figure: browsers with local browser caches cooperate directly over the LAN, with no dedicated cache between the LAN and the web servers on the Internet]
  • Why?
  • How?

13
Why peer-to-peer ?
  • Cost of dedicated web cache → no additional hardware
  • Administrative costs → self-organizing
  • Scaling needs upgrading → resources grow with clients
  • Single point of failure → fault-tolerant by design

14
Setting
  • Corporate LAN
  • 100 - 100,000 desktop machines
  • Single physical location
  • Each node runs an instance of Squirrel
  • Sets it as the browser's proxy

15
Approaches
  • Home-store model
  • Directory model
  • Both approaches require key generation (sketched below)
  • Hash(URL)
  • Collision-resistant hash (e.g., SHA-1)
  • Hash(http://www.research.microsoft.com/antr) ->
    4ff367a14b374e3dd99f
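A minimal sketch of this key-generation step in Java, assuming the standard SHA-1 MessageDigest and truncation to a 128-bit key (the truncation length is an assumption, not from the slides):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class UrlKeys {
    // Map a URL to a 128-bit object key by truncating its SHA-1 digest.
    public static BigInteger keyFor(String url) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(url.getBytes(StandardCharsets.UTF_8));
        byte[] truncated = new byte[16];          // keep the first 128 bits
        System.arraycopy(digest, 0, truncated, 0, 16);
        return new BigInteger(1, truncated);      // non-negative key
    }
}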

16
Home-store model
[Figure: the client hashes the URL and routes the request to the home node for that key]
17
Home-store model
[Figure: the home node stores the object and serves it back to the client; that's how it works!]
18
Directory model
  • Client nodes always store objects in local
    caches.
  • Main difference between the two schemes whether
    the home node also stores the object.
  • In the directory model, it only stores pointers
    to recent clients, and forwards requests to them.
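A rough sketch (hypothetical helper names, not Squirrel's actual code) contrasting the two request flows at the home node:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class HomeNode {
    byte[] cachedObject;                                   // home-store: the object itself
    final List<String> recentClients = new ArrayList<>();  // directory: pointers only
    final Random random = new Random();

    // Home-store model: the home node caches and serves the object.
    byte[] handleHomeStore(String url) {
        if (cachedObject == null) {
            cachedObject = fetchFromOrigin(url);           // miss: fetch from the web server
        }
        return cachedObject;
    }

    // Directory model: forward to a randomly chosen recent client (delegate),
    // or fetch from the origin server if no delegate is known yet.
    byte[] handleDirectory(String url, String clientAddr) {
        byte[] object = recentClients.isEmpty()
                ? fetchFromOrigin(url)
                : forwardToDelegate(recentClients.get(random.nextInt(recentClients.size())), url);
        recentClients.add(clientAddr);                     // remember this client as a delegate
        return object;
    }

    // Network operations, omitted here (placeholders, not Squirrel's API).
    byte[] fetchFromOrigin(String url) { return new byte[0]; }
    byte[] forwardToDelegate(String delegate, String url) { return new byte[0]; }
}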

19
Directory model
[Figure: the client routes its request to the home node for the URL's key]
20
Directory model
[Figure: the home node picks a random entry from its directory and forwards the request to that delegate, which serves the object to the client]
21
(skip) Full directory protocol
22
Recap
  • Two endpoints of design space, based on the
    choice of storage location.
  • At first sight, both seem to do about as well.
    (e.g. hit ratio, latency).

23
Quirk
  • Consider a
  • Web page with many images, or
  • Heavily browsing node
  • In the Directory scheme,
  • Many home nodes pointing to one delegate
  • Home-store: natural load balancing

.. evaluation on trace-based workloads ..
24
Trace characteristics
25
Total external bandwidth
Redmond
26
Total external bandwidth
Cambridge
27
LAN Hops
Redmond
28
LAN Hops
Cambridge
29
Load in requests per second (Redmond)
[Figure: histogram of per-node load, home-store vs. directory; x-axis: max objects served per node per second, y-axis: number of such seconds (log scale)]
30
Load in requests per second (Cambridge)
[Figure: histogram of per-node load, home-store vs. directory; x-axis: max objects served per node per second, y-axis: number of such seconds (log scale)]
31
Load in requests per minute (Redmond)
[Figure: histogram of per-node load, home-store vs. directory; x-axis: max objects served per node per minute, y-axis: number of such minutes (log scale)]
32
Load in requests per minute (Cambridge)
[Figure: histogram of per-node load, home-store vs. directory; x-axis: max objects served per node per minute, y-axis: number of such minutes (log scale)]
33
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

34
Pastry
  • Generic p2p location and routing substrate (DHT)
  • Self-organizing overlay network
  • Consistent hashing
  • Lookup/insert object in < log16 N routing steps
    (expected)
  • O(log N) per-node state
  • Network locality heuristics

35
Pastry: Object distribution
[Figure: 128-bit circular id space from 0 to 2^128 - 1, with nodeIds and an objId/key placed on the ring]
  • Consistent hashing [Karger et al. '97]
  • 128-bit circular id space
  • nodeIds (uniform random)
  • objIds/keys (uniform random)
  • Invariant: the node with the numerically closest nodeId
    maintains the object (see the sketch below)
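A minimal sketch of this invariant (an assumed helper, not Pastry's internal code), treating the 128-bit id space as circular:

import java.math.BigInteger;
import java.util.List;

public final class IdSpace {
    static final BigInteger RING = BigInteger.ONE.shiftLeft(128);   // 2^128

    // Circular distance between two 128-bit ids.
    static BigInteger distance(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).mod(RING);
        return d.min(RING.subtract(d));
    }

    // The node with the nodeId numerically closest to the key maintains the object.
    static BigInteger ownerOf(BigInteger key, List<BigInteger> liveNodeIds) {
        BigInteger best = null;
        for (BigInteger id : liveNodeIds) {
            if (best == null || distance(id, key).compareTo(distance(best, key)) < 0) {
                best = id;
            }
        }
        return best;
    }
}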
36
Pastry: Object insertion/lookup
[Figure: 128-bit circular id space (0 to 2^128 - 1); Route(X) carries a message with key X around the ring]
A message with key X is routed to the live node with the
nodeId closest to X.
Problem: a complete routing table is not feasible.
37
Pastry Routing
  • Tradeoff
  • O(log N) routing table size
  • O(log N) message forwarding steps

38
Pastry: Routing table (node 65a1fcx)
[Figure: routing table of node 65a1fcx, rows 0 through 3 shown; log16 N rows in total]
39
Pastry: Routing
[Figure: Route(d46a1c) from node 65a1fc; each hop (d13da3, d4213f, d462ba, ...) shares a longer nodeId prefix with the key d46a1c]
  • Properties
  • log16 N steps
  • O(log N) state
40
Pastry Leaf sets
  • Each node maintains IP addresses of the nodes
    with the L numerically closest larger and smaller
    nodeIds, respectively.
  • routing efficiency/robustness
  • fault detection (keep-alive)
  • application-specific local coordination

41
Pastry: Routing procedure

if (destination D is within range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit in D's address
    if (the routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
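A compact Java sketch of this routing step (hypothetical field and helper names; the real Pastry implementation differs). NodeIds and keys are assumed to be 32-digit lowercase hex strings, so lexicographic order matches numeric order:

import java.util.NavigableSet;

class PastryRouter {
    String myId;                      // this node's id (also present in leafSet)
    String[][] routingTable;          // routingTable[row][hexDigit], entries may be null
    NavigableSet<String> leafSet;     // numerically adjacent nodeIds, plus myId

    // Choose the next hop for a message with destination key d.
    String nextHop(String d) {
        if (d.compareTo(leafSet.first()) >= 0 && d.compareTo(leafSet.last()) <= 0) {
            return closestLeaf(d);                      // in range: deliver via leaf set
        }
        int l = sharedPrefixLength(myId, d);            // length of shared prefix
        int digit = Character.digit(d.charAt(l), 16);   // l-th digit of d
        String entry = routingTable[l][digit];
        if (entry != null) {
            return entry;                               // usual case: match one more digit
        }
        return anyCloserNodeSharingPrefix(d, l);        // rare fallback, cases (a)/(b)
    }

    static int sharedPrefixLength(String a, String b) {
        int i = 0;
        while (i < a.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    // Leaf-set member numerically closest to d (simplified here to a neighbour).
    String closestLeaf(String d) {
        String below = leafSet.floor(d);
        return below != null ? below : leafSet.ceiling(d);
    }

    // Fallback: any known node sharing >= l digits with d and numerically
    // closer to d than this node (scan omitted in this sketch).
    String anyCloserNodeSharingPrefix(String d, int l) { return null; }
}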
42
Pastry Routing
  • Integrity of overlay:
  • guaranteed unless L/2 simultaneous failures of
    nodes with adjacent nodeIds
  • Number of routing hops:
  • No failures: < log16 N expected, 128/4 + 1 max
  • During failure recovery:
  • O(N) worst case, average case much better

43
Demonstration
  • VisPastry

44
Pastry Self-organization
  • Initializing and maintaining routing tables and
    leaf sets
  • Node addition
  • Node departure (failure)

45
Pastry: Node addition
[Figure: new node d46a1c joins via nearby node 65a1fc, which routes a join message with key d46a1c through d13da3, d4213f, d462ba towards the existing nodes with the closest nodeIds]
46
Node departure (failure)
  • Leaf set members exchange keep-alive messages
  • Leaf set repair (eager): request the leaf set from the
    farthest live node in the set
  • Routing table repair (lazy): get table entries from peers
    in the same row, then from higher rows

47
Pastry Experimental results
  • Prototype
  • implemented in Java
  • emulated network

48
Pastry: Average # of hops
L = 16, 100k random queries
49
Pastry: # of hops (100k nodes)
L = 16, 100k random queries
50
Pastry: # of routing hops (failures)
L = 16, 100k random queries, 5k nodes, 500
failures
51
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

52
Pastry Locality properties
  • Assumption: scalar proximity metric
  • e.g., ping/RTT delay, IP hops
  • a node can probe the distance to any other node
  • Proximity invariant:
  • Each routing table entry refers to a node
    close to the local node (in the proximity space),
    among all nodes with the appropriate nodeId
    prefix.

53
Pastry Routes in proximity space
54
Pastry: Distance traveled
L = 16, 100k random queries
55
Pastry Locality properties
  • 1) Expected distance traveled by a message in
    the proximity space is within a small constant of
    the minimum
  • 2) Routes of messages sent by nearby nodes with
    the same key converge at a node near the source
    nodes
  • 3) Among k nodes with nodeIds closest to the
    key, message likely to reach the node closest to
    the source node first

56
Demonstration
  • VisPastry

57
Pastry Node addition
58
Pastry API
  • route(M, X): route message M to the node with nodeId
    numerically closest to X
  • deliver(M): deliver message M to the application
    (callback)
  • forwarding(M, X): message M is being forwarded
    towards key X (callback)
  • newLeaf(L): report a change in leaf set L to the
    application (callback); see the interface sketch below
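A minimal sketch of the API above as Java interfaces (Message, Id, and LeafSet are placeholder types; the real Pastry API differs in detail):

import java.math.BigInteger;
import java.util.List;

final class Id { BigInteger value; }            // 128-bit nodeId or key
final class Message { byte[] payload; }
final class LeafSet { List<Id> members; }

interface PastrySubstrate {
    // Route message m to the live node with nodeId numerically closest to key x.
    void route(Message m, Id x);
}

interface PastryApplication {
    // Callback: m has arrived at the node responsible for its key.
    void deliver(Message m);

    // Callback: m is being forwarded through this node towards key x.
    void forwarding(Message m, Id x);

    // Callback: the local leaf set l has changed.
    void newLeaf(LeafSet l);
}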

59
Pastry Security
  • Secure nodeId assignment
  • Randomized routing
  • Byzantine fault-tolerant leaf set membership
    protocol

60
Pastry Summary
  • Generic p2p overlay network
  • Scalable, fault resilient, self-organizing,
    secure
  • O(log N) routing steps (expected)
  • O(log N) routing table size
  • Network locality properties

61
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

62
SCRIBE: Large-scale, decentralized event
notification
  • Infrastructure to support topic-based
    publish-subscribe applications (see the sketch below)
  • Scalable: large numbers of topics and subscribers,
    wide range of subscribers per topic
  • Efficient: low delay, low link stress, low node
    overhead
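A rough sketch (assumed names, not Scribe's actual code) of the per-node state behind a topic's multicast tree: SUBSCRIBE messages routed towards the topicId leave forwarding entries at each node along the Pastry route, and published events are pushed down from the node whose nodeId is closest to the topicId:

import java.math.BigInteger;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ScribeNode {
    // topicId -> children that subscribed through this node.
    final Map<BigInteger, Set<String>> children = new HashMap<>();

    // A SUBSCRIBE for topicId from child arrives here; returns true if the
    // message should stop (this node was already part of the topic's tree),
    // false if it should keep being routed towards topicId via Pastry.
    boolean onSubscribe(BigInteger topicId, String child) {
        Set<String> kids = children.computeIfAbsent(topicId, t -> new HashSet<>());
        boolean alreadyInTree = !kids.isEmpty();
        kids.add(child);
        return alreadyInTree;
    }

    // An event for topicId reaches this node; forward it down the tree.
    void onPublish(BigInteger topicId, byte[] event) {
        for (String child : children.getOrDefault(topicId, new HashSet<>())) {
            send(child, topicId, event);       // forward to each subscribed child
        }
        deliverLocally(topicId, event);        // placeholder: a real node delivers
                                               // only if the local app subscribed
    }

    void send(String node, BigInteger topicId, byte[] event) { /* network send */ }
    void deliverLocally(BigInteger topicId, byte[] event) { /* app callback */ }
}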

63
SCRIBE: Large-scale event notification
[Figure: nodes Subscribe(topicId) and Publish(topicId) on the overlay; subscriptions and events are routed towards the topicId]
64
Scribe Results
  • Simulation results
  • Comparison with IP multicast: delay, node stress,
    and link stress
  • Experimental setup:
  • Georgia Tech transit-stub model
  • 60,000 nodes randomly selected out of 500,000
  • Zipf-like subscription distribution, 1,500 topics

65
Scribe: Topic distribution
[Figure: number of subscribers vs. topic rank (Zipf-like), with example topics such as Windows Update, Stock Alert, and Instant Messaging marked]
66
Scribe Delay penalty
67
Scribe Node stress
68
Scribe Link stress
69
Related work
  • Narada
  • Bayeux/Tapestry
  • Multicast/CAN

70
Summary
  • Self-configuring P2P framework for topic-based
    publish-subscribe
  • Scribe achieves reasonable performance relative to
    IP multicast
  • Scales to a large number of subscribers
  • Scales to a large number of topics
  • Good distribution of load

71
Outline
  • Background
  • Squirrel
  • Pastry
  • Pastry locality properties
  • SCRIBE
  • PAST
  • Conclusions

72
PAST: Cooperative, archival file storage and
distribution
  • Layered on top of Pastry
  • Strong persistence
  • High availability
  • Scalability
  • Reduced cost (no backup)
  • Efficient use of pooled resources

73
PAST API
  • Insert - store replicas of a file at k diverse
    storage nodes (see the interface sketch below)
  • Lookup - retrieve file from a nearby live storage
    node that holds a copy
  • Reclaim - free storage associated with a file
  • Files are immutable
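A minimal sketch of this API as a Java interface (hypothetical signatures; the real PAST interface also involves file certificates and quotas, covered later):

import java.math.BigInteger;

interface Past {
    // Store replicas of the (immutable) file on the k nodes with nodeIds
    // closest to fileId.
    void insert(BigInteger fileId, byte[] file, int k);

    // Retrieve the file from a nearby live node that holds a replica.
    byte[] lookup(BigInteger fileId);

    // Free the storage associated with the file.
    void reclaim(BigInteger fileId);
}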

74
PAST: File storage
[Figure: Insert(fileId) routes the file through the overlay towards fileId]
75
PAST File storage
Storage invariant: file replicas are stored
on the k nodes with nodeIds closest to fileId (k
is bounded by the leaf set size)
76
PAST: File retrieval
[Figure: client C issues Lookup(fileId); k replicas are stored on the nodes with nodeIds closest to fileId]
File located in log16 N steps (expected); the lookup usually
locates the replica nearest to client C.
77
PAST Exploiting Pastry
  • Random, uniformly distributed nodeIds
  • replicas stored on diverse nodes
  • Uniformly distributed fileIds
  • e.g., SHA-1(filename, public key, salt)
  • approximate load balance
  • Pastry routes to closest live nodeId
  • availability, fault-tolerance

78
PAST Storage management
  • Maintain storage invariant
  • Balance free space when global utilization is
    high
  • statistical variation in assignment of files to
    nodes (fileId/nodeId)
  • file size variations
  • node storage capacity variations
  • Local coordination only (leaf sets)

79
Experimental setup
  • Web proxy traces from NLANR
  • 18.7 Gbytes, 10.5K mean, 1.4K median, 0 min,
    138MB max
  • Filesystem
  • 166.6 Gbytes, 88K mean, 4.5K median, 0 min, 2.7
    GB max
  • 2250 PAST nodes (k = 5)
  • truncated normal distributions of node storage
    sizes, mean 27/270 MB

80
Need for storage management
  • No diversion (tpri = 1, tdiv = 0):
  • max utilization 60.8%
  • 51.1% of inserts failed
  • Replica/file diversion (tpri = 0.1, tdiv = 0.05):
  • max utilization > 98%
  • < 1% of inserts failed

81
PAST File insertion failures
82
PAST Caching
  • Nodes cache files in the unused portion of their
    allocated disk space
  • Files are cached on nodes along the route of lookup
    and insert messages
  • Goals:
  • maximize query throughput for popular documents
  • balance query load
  • improve client latency

83
PAST: Caching
[Figure: a Lookup routed towards fileId can be answered from a cached copy on a node along the route]
84
PAST Caching
85
PAST Security
  • No read access control; users may encrypt content
    for privacy
  • File authenticity: file certificates
  • System integrity: nodeIds and fileIds non-forgeable,
    sensitive messages signed
  • Routing: randomized

86
PAST Storage quotas
  • Balance storage supply and demand
  • user holds smartcard issued by brokers
  • hides user private key, usage quota
  • debits quota upon issuing file certificate
  • storage nodes hold smartcards
  • advertise supply quota
  • storage nodes subject to random audits within
    leaf sets

87
PAST Related Work
  • CFS [SOSP'01]
  • OceanStore [ASPLOS'00]
  • FarSite [Sigmetrics'00]

88
Status
  • Functional prototypes
  • Pastry [Middleware 2001]
  • PAST [HotOS-VIII, SOSP'01]
  • SCRIBE [NGC 2001]
  • Squirrel (submitted)
  • http://www.microsoft.research.com/antr/Pastry

89
Current Work
  • Security
  • secure nodeId assignment
  • quota system
  • Keyword search capabilities
  • Support for mutable files in PAST
  • Anonymity/Anti-censorship
  • New applications
  • Software releases

90
Conclusion
  • For more information:
  • http://www.research.microsoft.com/antr/Pastry
  • I'll be here till Friday lunchtime; feel free to
    stop me and talk