Title: BeeHive: Exploiting Power Law Query Distributions for O(1) Lookup Performance in P2P Overlays
1. BeeHive: Exploiting Power Law Query Distributions for O(1) Lookup Performance in P2P Overlays
- Venugopalan Ramasubramanian (Rama) and Emin Gün Sirer
2. Introduction
- distributed peer-to-peer overlay networks
  - decentralized
  - self-organized
- distributed hash tables (DHTs)
  - store/lookup interface
- unstructured overlays
  - Freenet, Gnutella, Kazaa
  - poor lookup performance (both accuracy and latency)
3. Structured Overlays

  overlay                                     lookup performance
  CAN                                         O(d N^(1/d))
  Chord, Kademlia, Pastry, Tapestry, Viceroy  O(log N)
  de Bruijn graphs (Koorde)                   O(log N / log log N)
  Kelips                                      O(1)

- latency-sensitive applications
  - domain name service (DNS) and web access

               median   mean
  overlay RTT  81.9 ms  202 ms
  DNS lookup   112 ms   256 ms
4. Overview of Beehive
- general replication framework
  - proactive replication based on popularity of objects
  - exploits the structure of DHTs
- goals
  - performance: O(1) amortized lookup time
  - scalability: minimize the number of replicas; reduce storage, bandwidth, and network load
  - adaptivity: promptly respond to changes in popularity (flash crowds)
  - mutable objects: quickly disseminate object updates to all replicas
5. Prefix-Matching DHTs
[Figure: prefix-matching route from node 2012 to the home node of object 0121]
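The routing figure itself is lost to extraction, but the mechanism it illustrates is simple: each hop forwards the query to a neighbor whose ID matches at least one more digit of the object's key. Below is a minimal Python sketch of that rule; the `routing_table` layout and function names are illustrative assumptions, not FreePastry's API.

```python
# Hypothetical sketch of prefix-matching (Pastry-style) routing.
# Node and object IDs are strings of base-b digits; each hop moves to
# a neighbor sharing at least one more leading digit with the target.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading-digit prefix of two IDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current: str, target: str, routing_table: dict):
    """Return the routing-table entry that extends the matched prefix.

    routing_table maps (level, next_digit) -> neighbor ID, where level
    is the number of digits the neighbor shares with the target so far.
    """
    level = shared_prefix_len(current, target)
    if level >= len(target):
        return None          # current node already matches the full ID
    return routing_table.get((level, target[level]))

# Toy routing table for the slide's example IDs:
table = {(0, "0"): "0021", (1, "1"): "0112", (2, "2"): "0122"}
print(next_hop("2012", "0121", table))   # -> "0021"
```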
6. Key Intuition
[Figure: lookup path through nodes 2012, 0021, 0112, and 0122, each hop matching one more digit of object 0121]
By replicating popular objects more and unpopular objects less, the overall lookup performance can be tuned to any desired constant efficiently.
7. Popularity-Based Replication
- levels of replication: 0, 1, 2, ...
- a level-i object is replicated on the nodes that share i matching prefix digits with it, i.e. on N/b^i nodes (see the sketch below)
- a level-i object is found in at most i hops
- lower level → greater replication
- default level ⌈log_b N⌉: replicated only at the home node
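As a concrete illustration of the level/replica trade-off, the following sketch (an assumed helper, with N and b borrowed from the example later in the deck) prints how many nodes hold a level-i replica and the corresponding worst-case hop count.

```python
import math

def replicas_at_level(i: int, n_nodes: int, b: int) -> float:
    # A level-i object is stored on every node sharing i prefix digits
    # with it: roughly N / b^i nodes (but at least the home node).
    return max(1.0, n_nodes / b**i)

N, b = 10_000, 32
K = math.ceil(math.log(N, b))        # default level: home node only
for i in range(K + 1):
    print(f"level {i}: ~{replicas_at_level(i, N, b):.1f} replicas,"
          f" lookups take <= {i} hops")
```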
8. Analytical Model
- optimization problem
  - minimize the total number of replicas, s.t.
  - average lookup performance ≤ C
- Zipf-like power-law query distributions (sanity-checked below)
  - popularity of the ith most popular request ∝ 1/i^α
  - DNS requests: α = 0.91 (MIT trace)
  - web requests:

    trace  DEC   UPisa  FuNet  UCB   Quest  NLANR
    α      0.83  0.84   0.84   0.83  0.88   0.90
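The model leans on a standard property of Zipf distributions with α < 1: the most popular fraction x of objects attracts roughly x^(1−α) of all queries. A quick numerical sanity check of that approximation (it is asymptotic, so expect some deviation at finite M):

```python
# Sanity check of the x**(1 - alpha) query-fraction approximation for
# a Zipf(alpha) popularity distribution over M objects. The match is
# only asymptotic; finite-M values deviate visibly.
M, alpha = 1_000_000, 0.9
weights = [1 / (i ** alpha) for i in range(1, M + 1)]
total = sum(weights)
for x in (0.001, 0.01, 0.1):
    measured = sum(weights[: int(x * M)]) / total
    print(f"x = {x}: measured {measured:.2f}, approx {x ** (1 - alpha):.2f}")
```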
9. Optimization Problem
- minimize (number of replicas):
  x_0 + x_1/b + x_2/b^2 + ... + x_{K-1}/b^{K-1}
- such that (average lookup time is at most C hops):
  x_0^(1−α) + x_1^(1−α) + x_2^(1−α) + ... + x_{K-1}^(1−α) ≥ K − C
- and
  x_0 ≤ x_1 ≤ x_2 ≤ ... ≤ x_{K-1} ≤ 1
- b: base; K = log_b(N)
- x_j: fraction of objects replicated at level j or lower
10. Optimal Solution
- closed form, with d = b^((1−α)/α) (evaluated in the sketch below):
  x*_j = [ d^j (K′ − C) / (1 + d + ... + d^(K′−1)) ]^(1/(1−α)) for 0 ≤ j < K′, and x*_j = 1 for j ≥ K′
- K′ is determined (typically 2 or 3) by requiring x_{K′−1} ≤ 1:
  d^(K′−1) (K′ − C) / (1 + d + ... + d^(K′−1)) ≤ 1
- optimal replicas per node, as a fraction of all objects:
  (1 − 1/b) (K′ − C)^(1/(1−α)) / (1 + d + ... + d^(K′−1))^(α/(1−α)) + 1/b^K′
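A sketch that evaluates this closed form, choosing K′ as the largest value for which the x_{K′−1} ≤ 1 test holds. It follows the formulas as reconstructed above; the example on the next slide appears to use slightly different parameter choices and rounding, so outputs need not match it digit for digit.

```python
import math

def beehive_fractions(b: int, C: float, alpha: float, N: int):
    """Evaluate the closed-form x*_j fractions (a sketch; the exact
    choice of K and K' may differ from the authors' own calculation)."""
    K = math.ceil(math.log(N, b))          # number of replication levels
    d = b ** ((1 - alpha) / alpha)

    # Largest K' <= K whose solution keeps x_{K'-1} <= 1.
    k_prime = K
    while k_prime > 1:
        S = sum(d ** j for j in range(k_prime))
        if d ** (k_prime - 1) * (k_prime - C) / S <= 1:
            break
        k_prime -= 1

    S = sum(d ** j for j in range(k_prime))
    x = [(d ** j * (k_prime - C) / S) ** (1 / (1 - alpha))
         for j in range(k_prime)]
    return x + [1.0] * (K - k_prime)       # x_j = 1 for j >= K'

print(beehive_fractions(b=32, C=1, alpha=0.9, N=10_000))
```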
11. Example
- b = 32, C = 1, α = 0.9
- N = 10,000 nodes, M = 1,000,000 objects
- x_0 = 0.001102 → 1,102 objects
- x_1 = 0.0519 → 51,900 objects
- x_2 = 1
- total storage: ~3,700 objects per node (arithmetic checked below)
- total storage for Kelips: M/√N = 10,000 objects per node
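The per-node storage figure is plain arithmetic once the fractions are fixed: objects at exactly level j (a fraction x_j − x_{j−1} of all M) occupy N/b^j nodes each, so the expected count per node is M · Σ_j (x_j − x_{j−1})/b^j. Plugging in the slide's numbers gives roughly 3,600, consistent with the ~3,700 quoted:

```python
# Per-node storage implied by the example's level fractions.
b, M = 32, 1_000_000
x = [0.001102, 0.0519, 1.0]   # fraction of objects at level j or lower
prev = 0.0
per_node = 0.0
for j, xj in enumerate(x):
    per_node += M * (xj - prev) / b ** j
    prev = xj
print(round(per_node))        # ~3,600 objects per node
```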
12. Performance vs. Overhead Trade-off
13. Analytical Model
- configurable target lookup performance
  - continuous range
  - even better with proximity routing
- minimizing the number of replicas provides storage as well as bandwidth efficiency
- K is an upper bound on the lookup performance of a successful query
- assumptions
  - homogeneous object sizes
  - infrequent updates
14. Beehive Replication Protocol
- aggregation phase
  - popularity of objects, Zipf parameter
  - local measurement and limited aggregation
- analysis phase
  - apply the analytical model
  - locally change the replication level
- replication phase
  - push new replicas to nodes one hop away
  - remove old replicas that are no longer required
(a sketch of one round of these three phases follows)
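A self-contained sketch of the three phases on a single node; the class names, the threshold-based stand-in for the analytical model, and the report format are all illustrative assumptions, not the paper's mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    key: str
    hits: float = 0.0      # popularity estimate
    level: int = 3         # default level: home node only (K = 3 here)

@dataclass
class BeehiveNode:
    objects: list = field(default_factory=list)

    def aggregation(self, neighbor_reports: dict):
        # Phase 1: fold neighbors' access counts into local estimates
        # (real Beehive also aggregates the Zipf parameter).
        for obj in self.objects:
            obj.hits += neighbor_reports.get(obj.key, 0)

    def analysis(self, thresholds=(1000, 100, 10)):
        # Phase 2: stand-in for the analytical model; a real node
        # would rank objects against the optimal x_j fractions.
        for obj in self.objects:
            obj.level = next((lvl for lvl, t in enumerate(thresholds)
                              if obj.hits >= t), 3)

    def replication(self) -> dict:
        # Phase 3: advertise desired levels so nodes one hop away can
        # push new replicas or drop ones no longer required.
        return {obj.key: obj.level for obj in self.objects}
```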
15. Beehive Replication Protocol
[Figure: replication levels for an object with prefix 012. At level 3 it is stored only at its home node E; at level 2 on the nodes with prefix 01 (B, E, I); at level 1 on all nodes with prefix 0 (A through I).]
16. Beehive Replication Protocol
- periodic packets to nodes in the routing table
  - asynchronous and independent
  - exploit the structure of the underlying DHT
- a replication packet is sent by node A to each node B in level i of its routing table
- node B pushes new replicas to A and tells A which replicas to remove
- fluctuations in estimated popularity (smoothing sketched below)
  - aging to prevent sudden changes
  - hysteresis to limit thrashing
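A minimal sketch of the two smoothing mechanisms; the decay factor and the hysteresis band are assumed values, not constants from the paper.

```python
# Minimal sketch: exponential aging of access counts plus hysteresis
# on level changes. The 0.5 decay and 10% band are assumed values.

def aged_estimate(old_estimate: float, new_count: int,
                  decay: float = 0.5) -> float:
    """Exponentially weighted popularity, so stale spikes fade."""
    return decay * old_estimate + (1 - decay) * new_count

def should_change_level(estimate: float, threshold: float,
                        band: float = 0.10) -> bool:
    """Only cross a replication-level boundary once the estimate moves
    clearly past it, which limits thrashing near the boundary."""
    return abs(estimate - threshold) > band * threshold
```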
17. Mutable Objects
- version number
- proactive propagation to all nodes (sketched below)
  - the home node sends the update to the nodes in level i of its routing table
  - level-i nodes send it on to level-(i+1) nodes
- lazy propagation
  - replication phase
  - handles updates missed due to node joins and leaves
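A self-contained sketch of the proactive path, with a plain level → neighbors mapping standing in for the DHT routing table; all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    version: int
    payload: bytes

class Node:
    def __init__(self, name, neighbors_by_level):
        self.name = name
        self.replicas = {}
        self.neighbors_by_level = neighbors_by_level  # level -> [Node]

    def receive_update(self, key, version, payload, level):
        stored = self.replicas.get(key)
        if stored and stored.version >= version:
            return                    # stale or duplicate: drop it
        self.replicas[key] = Replica(version, payload)
        # Forward one level further out; replicas that miss this push
        # are repaired later by the lazy (replication-phase) path.
        for nbr in self.neighbors_by_level.get(level + 1, []):
            nbr.receive_update(key, version, payload, level + 1)
```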
18. Implementation
- Pastry (FreePastry 1.3)
  - replication also on leaf-set nodes
  - combines Pastry heart-beat and routing-table maintenance with Beehive aggregation and replication packets
- prototype DNS name server and resolver
  - UDP based; serves A queries
  - falls back to legacy DNS
  - the home node detects and propagates updates
19. Evaluation: DNS Application
- DNS survey
  - queried 594,059 unique domain names
  - TTL distribution: 95% < 1 day
  - rate of change of entries: 0.13 per day
- MIT DNS trace (4-11 December 2000)
  - 4 million queries for 300,000 distinct names
  - Zipf parameter 0.91
- setup
  - simulation mode on a single node
  - 1024 nodes, 40,960 distinct objects
  - 7 queries per second from the MIT trace
  - 0.8 per day rate of change
20. Evaluation: Lookup Performance
[Figure: average lookup latency in hops vs. time in hours (0-40) for Pastry, PC-Pastry, and Beehive]
21. Evaluation: Overhead

  average number of replicas per node
  Pastry        40
  Beehive      380
  PC-Pastry    420
  Kelips      1280

[Figure: object transfers (×10^6) vs. time in hours (0-40) for PC-Pastry and Beehive]
22. Evaluation: Flash Crowds
[Figure: lookup latency in hops vs. time in hours (32-80) for Pastry, PC-Pastry, and Beehive]
Popularity reversal: a complete inversion of the popularity rank of all objects.
23. Evaluation: Zipf Parameter Change
24. Conclusions
- general replication framework to achieve O(1) lookup performance efficiently
- applies to structured overlays with uniform fan-out
- properties
  - high performance
  - adaptivity
  - improved availability and resilience to failures
- well-suited for a cooperative domain name service application