Title: SHELL: A Distributed and Oblivious Heap with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks
1SHELL: A Distributed and Oblivious Heap with
Applications for Robust Information Systems and
Heterogeneous Peer-to-Peer Networks
Christian Scheideler Stefan Schmid
Network Algorithms Summer 2008
2Before we look at SHELL...
- Prof. Scheideler is at a conference
- Hence a special program
- Shell
- - Builds on what we have learned so far!
- Ongoing work...
- - No lecture notes
- - Still has gaps, possibly also errors
- - Slides are in English so that they can be reused elsewhere!
- - Open to input / ideas!
DISTRIBUTED COMPUTING
3Motivation
- Today, there are still many challenges in distributed systems (e.g., the Internet)
- - E.g., viruses, spam, DoS attacks, selfish users, etc.
- - Very active research area
- For example, peer-to-peer computing
- - Dynamics / churn: peers join and leave frequently
- - In a network of 1,000,000 peers where sessions last around 60 minutes, there are hundreds of membership changes every second!
- - Peer-to-peer is based on the contributions of participants: problematic if users are selfish!
- - E.g., BitThief free-rides in BitTorrent
- - Heterogeneity: peers have different Internet connections, different CPUs, run different operating systems, etc.
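The churn figure can be checked with a back-of-the-envelope calculation. The sketch assumes a steady state in which the population stays at N = 10^6 peers and a session lasts T = 60 minutes on average, so that joins and leaves occur at the same rate; these modeling assumptions are not spelled out on the slide.

```latex
\[
\text{leaves per second} \approx \frac{N}{T} = \frac{10^{6}}{3600\,\mathrm{s}} \approx 278\ \mathrm{s}^{-1},
\qquad
\text{membership changes per second} \approx 2 \cdot \frac{N}{T} \approx 556\ \mathrm{s}^{-1}.
\]
```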
4SHELL Overview
- SHELL: our overlay architecture
- Basically, a distributed heap
- Refresher: min-heap
- - children have a larger key than their parent
- - e.g., useful for priority queues (fast removeMin())
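As a reminder of the sequential data structure, here is a minimal sketch using Python's standard `heapq` module. The keys are made up for illustration, and `satisfies_min_heap` is a hypothetical helper that checks the heap property on the array layout used by `heapq`.

```python
import heapq

# Hypothetical keys; any comparable values work.
keys = [17, 3, 26, 9, 10]

heap = []
for k in keys:
    heapq.heappush(heap, k)  # O(log n) insertion that keeps the min-heap invariant

def satisfies_min_heap(h):
    """Check that every child key is >= its parent key.

    In the array layout, the parent of index i is (i - 1) // 2.
    """
    return all(h[(i - 1) // 2] <= h[i] for i in range(1, len(h)))

heap_ok = satisfies_min_heap(heap)

# removeMin(): repeatedly pop the smallest key, O(log n) per operation
removed = [heapq.heappop(heap) for _ in range(len(keys))]
print(heap_ok, removed)
```

Popping until the heap is empty returns the keys in ascending order, which is what makes the structure useful for priority queues.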
slide from GAD lecture 2008...
5Heap Refresher
6A Distributed Heap?
- What is a distributed heap?
- We assume that peers have a key / order / rank / id
- - for example, the time when the peer joined
- (Min-)heap property: peers only connect to peers of lower order
- - for example, peers only connect to older peers
- - Shell constructs a directed overlay
- - (however, there are also backward edges, see later)
[Figure: example distributed heap with peers labeled by order: 28, 26, 23, 21, 20, 19, 18, 17, 16, 9, 10, 3]
7An Oblivious Distributed Heap? (1)
- What is an oblivious distributed heap?
- Oblivious: the overlay topology depends only on the set of currently active peers (and their IDs / orders) in the network
- - but not on the history, e.g., on the time when these peers joined!
- - example: if at join time a new peer is inserted at the end of a list of peers, the resulting topology is not oblivious
- - example: if a new peer is inserted into a list of peers according to the peer's order, the topology is oblivious
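The two list examples can be made concrete with a small sketch. The function names and the peer orders are hypothetical; the code builds the list topology for every possible join sequence of the same peer set and counts how many different topologies result.

```python
from itertools import permutations

def append_topology(join_sequence):
    """Non-oblivious: each new peer is linked to the current end of the list."""
    peers = []
    edges = set()
    for p in join_sequence:
        if peers:
            edges.add((peers[-1], p))
        peers.append(p)
    return frozenset(edges)

def sorted_topology(join_sequence):
    """Oblivious: each new peer is inserted according to its order."""
    peers = []
    for p in join_sequence:
        peers.append(p)
        peers.sort()
    # Link every peer to its successor in the ordered list.
    return frozenset(zip(peers, peers[1:]))

active = [3, 9, 10]  # hypothetical peer orders
append_results = {append_topology(seq) for seq in permutations(active)}
sorted_results = {sorted_topology(seq) for seq in permutations(active)}
print(len(append_results), len(sorted_results))
```

The ordered insertion yields a single topology regardless of the join sequence, whereas the append rule yields several.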
8An Oblivious Distributed Heap? (2)
- Why is obliviousness good?
- - the oblivious property is useful when it comes to fault-tolerance
- - e.g., desktops may crash temporarily and will then rejoin
- - if the topology is oblivious, peers can remember their old contacts, and when an old contact reappears, it can be integrated immediately (instantaneous rejoin)
- Many systems today are oblivious
- - e.g., Pastry, Chord, etc.
- - but not, e.g., Pagoda
- - many systems in practice are not: Gnutella, BitTorrent, etc.
9Objectives of Shell
- Primary goal: a dynamic and robust overlay
- In particular
- - maintaining the heap property
- - low peer degree, low network diameter, low congestion
- - fast join / rejoin / leave
- - peers can simply crash
- Applications
- - i-SHELL: a distributed information system robust to Sybil attacks
- - h-SHELL: a peer-to-peer system for heterogeneous environments
10Overlay Graph (1)
- How to achieve these goals?
- Overlay based on the continuous-discrete approach
- - basically a de Bruijn graph
- Refresher: continuous-discrete approach
- - peers are placed in the cyclic [0,1)-interval
- - a peer at position x is connected to the peers responsible for the continuous positions x/2 and (x+1)/2
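The refresher can be illustrated in the continuous space alone, ignoring which peer is responsible for which position. In this minimal sketch, `de_bruijn_targets` and `route_positions` are hypothetical helper names, and the start position and target bits are made up.

```python
from fractions import Fraction

def de_bruijn_targets(x):
    """Continuous positions a peer at x points to: x/2 and (x+1)/2."""
    return x / 2, (x + 1) / 2

def route_positions(start, target_bits):
    """Follow one de Bruijn hop per target bit, least significant bit first.

    After k hops the current position starts with the k target bits,
    so its distance to the target is below 2**-k.
    """
    x = start
    path = [x]
    for b in reversed(target_bits):
        x = (x + b) / 2  # b = 0 follows the x/2 edge, b = 1 the (x+1)/2 edge
        path.append(x)
    return path

start = Fraction(13, 16)   # hypothetical start position
bits = [1, 0, 1]           # target 0.101 in binary, i.e. 5/8
target = Fraction(5, 8)
path = route_positions(start, bits)
print(path[-1], abs(path[-1] - target) < Fraction(1, 2 ** len(bits)))
```

Each hop shifts one more target bit into the front of the binary expansion of the current position, which is the "adjusting one bit after another" idea behind de Bruijn routing.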
11Overlay Graph (2)
- Our distributed heap has a larger peer degree
- The space is divided into different partitions
- - partition i: 2^i intervals of size 1/2^i
- - a global partition renders the analysis simpler (same views)
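A minimal sketch of the partitioning, with hypothetical helper names: since partition i consists of 2^i equally sized intervals, the interval containing a position x follows directly from x.

```python
def interval_index(x, level):
    """Index of the level-`level` interval (size 1/2**level) containing x in [0,1)."""
    return int(x * (1 << level))

def interval_bounds(index, level):
    """Half-open bounds [lo, hi) of the interval with the given index."""
    size = 1.0 / (1 << level)
    return index * size, (index + 1) * size

idx = interval_index(0.3, 3)      # partition 3 has 8 intervals of size 1/8
bounds = interval_bounds(idx, 3)
print(idx, bounds)
```

Because the partition is global, all peers compute the same interval for the same position, which is the "same views" remark on the slide.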
12Overlay Graph (3)
- A peer connects to all peers of lower order in
- - its level-i home interval (the interval which includes the position x of the peer)
- - the level-i intervals adjacent to the home interval
- - the de Bruijn intervals: the intervals which include positions x/2 and (x+1)/2
- What is level i?
- - level i is chosen such that there are c log n_p peers in the interval
- - n_p: total number of peers in the system with lower order
- - n_p can be estimated; in the following, we assume it is given
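The connection rule can be sketched as follows. This is an illustration under assumptions, not the construction from the slides: all function names are hypothetical, n_p is computed exactly rather than estimated, the logarithm is taken to base 2, and "level i" is interpreted as the finest level whose home interval still contains at least c log n_p lower-order peers.

```python
import math

def interval_index(x, level):
    """Index of the level-`level` interval (size 1/2**level) containing x."""
    return int(x * (1 << level))

def choose_level(x, lower_positions, c=1.0, max_level=32):
    """Finest level whose home interval holds >= c*log2(n_p) lower-order peers (assumed rule)."""
    n_p = len(lower_positions)
    if n_p < 2:
        return 0
    threshold = c * math.log2(n_p)
    level = 0
    while level < max_level:
        home = interval_index(x, level + 1)
        count = sum(1 for y in lower_positions if interval_index(y, level + 1) == home)
        if count < threshold:
            break  # the next finer interval would hold too few peers
        level += 1
    return level

def neighbor_intervals(x, level):
    """Home, adjacent, and de Bruijn intervals of position x at the given level."""
    m = 1 << level
    home = interval_index(x, level)
    return {
        home,
        (home - 1) % m, (home + 1) % m,      # adjacent intervals (cyclic space)
        interval_index(x / 2, level),        # de Bruijn interval containing x/2
        interval_index((x + 1) / 2, level),  # de Bruijn interval containing (x+1)/2
    }

def forward_neighbors(order, peers, c=1.0):
    """peers maps order -> position; returns the lower-order peers this peer connects to."""
    x = peers[order]
    lower = {o: y for o, y in peers.items() if o < order}
    level = choose_level(x, list(lower.values()), c)
    targets = neighbor_intervals(x, level)
    return sorted(o for o, y in lower.items() if interval_index(y, level) in targets)

# Hypothetical system: 16 evenly spaced peers plus a newer peer of order 16 at position 0.3.
peers = {i: i / 16 for i in range(16)}
peers[16] = 0.3
print(forward_neighbors(16, peers))
```

By construction every returned neighbor has a lower order than the connecting peer, so the heap property holds for the forward edges.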
13Overlay Graph (4)
- In order to ensure connectivity when many peers leave, the interval size must be increased over time (the peer upgrades to a larger partition)
- Similarly, if many peers of lower order join in an interval, the peer needs to downgrade
- In addition to these forward edges, peers store incoming edges
- - called backward edges
14Overlay Graph (5)
- These edges are already sufficient for Shell
- However, in order to speed up changes between levels, a peer additionally stores pointers to the peers it would connect to if it upgraded
- - these form a "funnel" of peers to which the peer would connect
- - of course, the peer only connects to these lower-order peers once they are on the corresponding level
- - requires a notification mechanism
- In the following, we will not consider funnel edges in further detail!
[Figure: funnel across levels: Level 1, ..., Level i-2, Level i-1, Level i]
15Implication: Monotonicity
- From this construction, we can already derive some properties
- For instance, Shell features a monotonicity property
- If two peers p and p' are connected to the same interval I and p is of larger order than p', then p knows at least all peers in I that p' knows
- - because peers only connect to lower-order peers in an interval
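The monotonicity property follows directly from the connection rule, as the following minimal sketch shows. The helper name and the peer orders are hypothetical.

```python
def known_in_interval(order, interval_orders):
    """Orders of the peers in interval I that a peer of the given order connects to."""
    return {q for q in interval_orders if q < order}

interval_I = {3, 9, 10, 17, 21}  # hypothetical orders of the peers located in I
low, high = 16, 23               # two peers connected to I; `high` has the larger order

known_low = known_in_interval(low, interval_I)
known_high = known_in_interval(high, interval_I)
print(sorted(known_low), sorted(known_high))
```

The set known to the lower-order peer is always contained in the set known to the higher-order peer, since the filter `q < order` only becomes more permissive as the order grows.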
16Distributed Order... A Simplification
- In the following, we will assume that peers have distinct IDs
- - e.g., assigned at join time by the network entry point
- Otherwise, in case of multiple joins close in time, peers may not be able to decide which one is older => need to introduce blackout zones, etc.
- In the following, we will not consider this issue in more detail
17Analysis of Degree (1)
- The topological description allows us to analyze the peer degree
- Peers employ the following strategy: if the number of neighbors falls below c log n_p in at least one interval, all intervals are doubled
- According to Chernoff bounds, it holds that if one interval contains c log n peers, there is no interval containing more than (1+δ) c log n peers for any δ > 0, with high probability
- Therefore, the degree is in O(log n) w.h.p.
- - with funnel edges, the degree is in O(log^2 n)
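For reference, a standard multiplicative Chernoff bound gives the flavor of this argument. The sketch assumes that peer positions are independent and uniform in [0,1), that X counts the peers in one fixed interval, and that its expectation is μ = c ln n. These assumptions, the restriction to δ ≤ 1, and the constant 3 come from the textbook form of the bound, not from the slides.

```latex
\[
\Pr\bigl[X \ge (1+\delta)\mu\bigr] \le \exp\!\left(-\frac{\delta^{2}\mu}{3}\right)
\quad (0 < \delta \le 1),
\qquad
\mu = c \ln n \;\Longrightarrow\;
\Pr\bigl[X \ge (1+\delta)\, c \ln n\bigr] \le n^{-c\delta^{2}/3}.
\]
```

A union bound over at most n intervals leaves a failure probability of at most n^(1 - cδ²/3), which is polynomially small once c > 3/δ².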
18Analysis of Degree (2)
- What about incoming / backward edges?
19Routing (1)
- The Shell overlay allows peers to route messages
- Similar to continuous-discrete routing (adjusting one bit after another)
- The routing operation route(x) consists of two phases
- - Phase 1: route along forward edges to the peer of lower order which is closest to x (or to a lower-order peer whose home region contains position x)
- - Phase 2: descend along backward edges to the peer which is closest to x
- Implication: if a peer wants to send a message to a peer of lower order, only Phase 1 is necessary, and the message will not traverse any higher-order peers!
20Routing (2)
- Observe that in our overlay, peers have multiple neighbors which could be used for the next de Bruijn routing hop (log n neighbors per interval)
- This can be exploited in order to minimize congestion
- Routing policy: peer p always forwards packets to the neighbor of largest order among the eligible peers (those of lower order than p)
- This alleviates the load on very low-order peers
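The routing policy amounts to a one-line selection rule. In this minimal sketch, `next_hop` is a hypothetical helper and `candidates` stands for the orders of the neighbors located in the interval of the next de Bruijn hop.

```python
def next_hop(current_order, candidates):
    """Pick the eligible neighbor of largest order.

    Eligible means strictly lower order than the current peer.
    Returns None if no candidate is eligible.
    """
    eligible = [o for o in candidates if o < current_order]
    return max(eligible) if eligible else None

hop = next_hop(20, [3, 9, 17, 19, 23])
print(hop)
```

Choosing the maximum among the eligible orders keeps the message as high in the heap as possible, so the lowest-order peers are only used when nothing else is available.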
21Routing (3)
[Figure: routing path; axis labeled "towards higher order peers"]
- Messages travel towards lower-order peers
- But on each hop, a peer of as high an order as possible is taken
22Routing (4)
- Analysis of Phase 1
- - according to continuous-discrete routing, at most log n hops are needed to reach the destination
- - we make the following observation
[Equation not preserved in the transcript; its two factors were annotated as: (a) the probability that all peers of order lower than p but higher than n_p-l_1 are in other intervals, and (b) the probability that this peer is located in the corresponding interval]
23Routing (5)
- Summing up, after some lines of calculation, the probability that the final peer reached is of order n_p/2 or smaller is at most O(n_p^(-c)) for some constant c
- With high probability, in the first phase of routing, a request travels to a peer of order at least n_p/2.
25Routing (7)
- So what is the congestion in the first routing phase?
- See our argument before...
- At most k peers can send via p, the routing path is of length log 2k, and the probability that it enters the interval on one of these hops is c log k / k
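One way to read this argument, offered as an interpretation rather than the authors' derivation, is to multiply the three quantities: the number of potential senders, the number of hops per path, and the per-hop probability of hitting the interval. This treats c log k / k as a per-hop probability and uses a union bound over the hops together with linearity of expectation.

```latex
\[
\mathbb{E}[\text{requests via } p]
\;\le\; \underbrace{k}_{\text{senders}}
\cdot \underbrace{\log 2k}_{\text{hops}}
\cdot \underbrace{\frac{c \log k}{k}}_{\text{per-hop probability}}
\;=\; c \,\log k \,\log 2k
\;=\; O(\log^{2} k).
\]
```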
26Routing (8)
Theorem: The first phase of routing terminates in logarithmic time and yields a congestion of asymptotically log^2 n_p.
27Routing (9)
- Routing phase 2: descend along backward edges to higher-order peers
- - idea: binary search which exploits the monotonicity property
- - higher-order peers know more about the interval
- - on each level i, go to the highest-order peer located in the interval which includes the final position x
- - terminates in logarithmic time
- - logarithmic congestion: in each hop, a peer forwards at most one request
28Join and Leave
- Join: similar to lookup; find the highest-order peer in the final interval, get integrated
- Leave: peers can even crash, no particular operation is needed
- Change of level: in time O(1); the update cost induced at other peers is in O(log^2 n)
29Application 1: i-Shell
- i-Shell is a distributed information system
- Idea: data management through a consistent hashing approach
- Generalized to multiple levels: on each level, data is stored on the peer closest to x
- - on each hop during insertion, a replica is placed
- Order of peers: time-stamps (assigned by the network entry point)
- Thus, peers only connect to older peers
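The consistent hashing idea can be sketched for a single level as follows. The multi-level generalization and the replica placement are not shown; the function names, the use of SHA-256, and the peer positions are assumptions made for illustration.

```python
import hashlib

def position(key: str) -> float:
    """Hash a key to a position in [0,1) (SHA-256 is an arbitrary choice here)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits so that the quotient is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def cyclic_distance(a, b):
    """Distance between two positions on the cyclic [0,1)-interval."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def closest_peer(x, peers):
    """peers maps name -> position; returns the peer closest to position x."""
    return min(peers, key=lambda name: cyclic_distance(peers[name], x))

def responsible_peer(key, peers):
    """Peer that stores the data item with the given key on this level."""
    return closest_peer(position(key), peers)

peers = {"a": 0.02, "b": 0.5, "c": 0.8}  # hypothetical peer positions
print(closest_peer(0.97, peers), responsible_peer("some-file", peers))
```

Because the space is cyclic, position 0.97 is closer to the peer at 0.02 than to the peer at 0.8. When a peer joins or leaves, only the keys in its vicinity change their responsible peer.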
30i-Shell
- Therefore
- - we immediately get that two peers p and p' can communicate on paths which include only peers that are at least their age
- - this renders the communication independent of younger peers
- Side benefit: measurement studies have shown that older peers typically have a longer remaining session time
- - renders the topology more stable
- Shell implies robustness to various attacks
- - e.g., Sybil attack
31Sybil Attack (1)
- Sybil attack
- - big problem in the Internet
- - e.g., spam
- - "Sybil": book by Flora Rheta Schreiber about a person with 16 identities
- Attacker seeks to acquire many identities
- - e.g., to control a large fraction of the network
- Countermeasures
- - virtual identities: captchas, etc.
- - real identities? botnets?
- - Douceur has shown that the issue is difficult to deal with in distributed environments...
32Sybil Attack (2)
- Shell is resilient to Sybil attacks of any scale!
- Model: the Sybil attack starts at some time t0
- Theorem: the traffic of old peers is independent of the Sybil attack
- Techniques
- - Admission control
- - Rate control
[Figure: overlay with peers labeled by order (3, 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 21); annotations: "traffic between older peers unaffected", "higher peers can perform a rate control algorithm", "attack originates from lower peers"]
33Application 2: h-Shell
- Alternatively, IDs could represent the inverse of the peers' capabilities
- Therefore, peers only connect to peers with stronger capabilities
- Interesting architecture for heterogeneous systems
- Corollary: paths between strong peers only include strong peers
- Interesting, e.g., for multi-quality live-streaming
34Conclusion
- Distributed heap based on the continuous-discrete approach
- Oblivious: suited for highly transient environments
- Robustness to Sybil attacks of arbitrary scale
- Alternatively, useful for heterogeneous environments
- Work in progress...