SHELL: A Distributed and Oblivious Heap with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks

1
SHELL: A Distributed and Oblivious Heap with
Applications for Robust Information Systems and
Heterogeneous Peer-to-Peer Networks
Christian Scheideler, Stefan Schmid
Network Algorithms, Summer 2008
2
Before we look at SHELL...

Prof. Scheideler is at a conference
Hence a special program
Shell
- Builds on what we have learned!
Ongoing work...
- No lecture notes
- Still has gaps, possibly also errors
- Slides are in English so that they can be
reused elsewhere!
- Open to input / ideas!

DISTRIBUTED COMPUTING
3
Motivation

Today, there are still many challenges in distributed
systems (e.g., the Internet)
E.g., viruses, spam, DoS attacks, selfish users,
etc.
Very active research area
For example, peer-to-peer computing
Dynamics / churn: peers join and leave
frequently
In a network of 1,000,000 peers where sessions last
around 60 minutes, there are hundreds of
membership changes every second!
Peer-to-peer systems rely on the contributions of
participants: problematic if users are selfish!
E.g., BitThief free-rides in BitTorrent
Heterogeneity: peers have different Internet
connections, different CPUs, run different
operating systems, etc.
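
The churn figure above can be checked with a rough calculation. This is only a sketch under our own assumptions: the network is in steady state, every peer leaves once per session, and each departure is balanced by one join. The function name is hypothetical.

```python
def membership_changes_per_second(num_peers, session_minutes):
    """Steady-state estimate of joins plus leaves per second.

    Assumption: each peer leaves once per session and every
    departure is matched by one join, i.e. two events per session.
    """
    session_seconds = session_minutes * 60
    leaves_per_second = num_peers / session_seconds
    return 2 * leaves_per_second


rate = membership_changes_per_second(1_000_000, 60)
print(round(rate))  # roughly 556 events per second, i.e. "hundreds"
```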

4
SHELL Overview

SHELL: our overlay architecture
Basically, a distributed heap
Refresher: min-heap
- children have larger keys
than their parent
- e.g., useful for priority
queues (fast removeMin())
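
As a reminder of the (non-distributed) min-heap, here is a small sketch using Python's standard heapq module. The helper is_min_heap is our own name and only checks the parent/child key condition stated above.

```python
import heapq

keys = [9, 3, 7, 1, 8]
heap = list(keys)
heapq.heapify(heap)  # rearranges the list in place into a binary min-heap


def is_min_heap(a):
    """Check that no child has a smaller key than its parent."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


smallest = heapq.heappop(heap)  # removeMin() in O(log n)
```

After the pop, the remaining list still satisfies the heap condition, which is what makes repeated removeMin() calls cheap.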

slide from GAD lecture 2008...
5
Heap Refresher

Heap in GAD...

6
A Distributed Heap?

What is a distributed heap?
We assume that peers have a key / order / rank /
id
- for example, the time when the peer joined
(Min-)heap property: peers only connect to peers
of lower order
- for example, peers only connect to older peers
- Shell constructs a directed overlay
(however, there are also backward edges, see later)
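
The heap property of the overlay can be written down as a simple check on the directed edges. The following is a sketch only: the dictionary representation, the function names and the example peers are our assumptions, not part of Shell.

```python
def satisfies_heap_property(forward_edges, order):
    """True if every forward edge points to a peer of lower order."""
    return all(
        order[target] < order[peer]
        for peer, targets in forward_edges.items()
        for target in targets
    )


def backward_edges(forward_edges):
    """Invert the forward edges: for each peer, who points at it."""
    incoming = {}
    for peer, targets in forward_edges.items():
        for target in targets:
            incoming.setdefault(target, set()).add(peer)
    return incoming


# Hypothetical example: order = join time, so smaller means older.
order = {"a": 3, "b": 9, "c": 10, "d": 16}
forward = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"b", "c"}}
```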

[Figure: example overlay; peers labeled with orders from 3 to 28]
7
An Oblivious Distributed Heap? (1)

What is an oblivious distributed heap?
Oblivious: the overlay topology only depends on the set
of currently active peers (and their IDs /
orders) in the network
- but not on history, e.g., on the time when these
peers joined!
- example: if at join time, a new peer is
inserted at the end of a list of peers, the
resulting topology is not oblivious
- example: if a new peer is inserted into a list
of peers according to its order, the
topology is oblivious
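
The two list examples above can be compared directly. In this sketch (function names are ours), a "history" is the sequence in which peers with given orders happen to be inserted. Appending makes the result depend on the history, while inserting by order does not.

```python
import bisect


def append_at_join(join_sequence):
    """Not oblivious: the list reflects the arrival sequence."""
    topology = []
    for peer_order in join_sequence:
        topology.append(peer_order)
    return topology


def insert_by_order(join_sequence):
    """Oblivious: each peer is placed according to its order only."""
    topology = []
    for peer_order in join_sequence:
        bisect.insort(topology, peer_order)
    return topology


history_1 = [3, 1, 2]
history_2 = [2, 3, 1]  # same set of peers, different history
```

For these two histories, insert_by_order yields the same list, whereas append_at_join yields two different lists.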

8
An Oblivious Distributed Heap? (2)

Why is obliviousness good?
- the oblivious property is useful when it comes
to fault-tolerance
- e.g., desktops may crash temporarily, and will
then rejoin
- if the topology is oblivious, peers can remember
their old contacts, and when an old contact
reappears, it can be integrated immediately
(instantaneous rejoin)


Many systems today are oblivious
- e.g., Pastry, Chord, etc.
- but not all: e.g., Pagoda is not
- many systems in practice are not: Gnutella,
BitTorrent, etc.

9
Objectives of Shell

Primary goal: dynamic and robust overlay
In particular:
- maintaining the heap property
- low peer degree, low network diameter, low
congestion
- fast join / rejoin / leave
- peers can simply crash


Applications
- i-SHELL: a distributed information system
robust to Sybil attacks
- h-SHELL: a peer-to-peer system for
heterogeneous environments

10
Overlay Graph (1)

How to achieve these goals?
Overlay based on the continuous-discrete approach
- basically a de Bruijn graph

Refresher: continuous-discrete approach
- peers are placed in the cyclic [0,1)-interval
- a peer at position x is connected to the peers
responsible for the continuous
positions x/2 and (x+1)/2
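
A minimal sketch of the refresher above. Which peer is "responsible" for a point is not spelled out on the slide; here we assume it is the peer closest to the point on the cyclic interval. All function names are ours.

```python
def de_bruijn_points(x):
    """The two continuous positions a peer at x links to."""
    assert 0.0 <= x < 1.0
    return x / 2, (x + 1) / 2


def cyclic_distance(a, b):
    """Distance between two points on the cyclic [0,1)-interval."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def responsible_peer(point, peer_positions):
    """Assumption: the closest peer is responsible for a point."""
    return min(peer_positions, key=lambda pos: cyclic_distance(pos, point))


left, right = de_bruijn_points(0.5)  # 0.25 and 0.75
```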

11
Overlay Graph (2)

Our distributed heap has a larger peer degree
Space is divided into different partitions
- partition i: 2^i intervals of size 1/2^i
- a global partition renders the analysis
simpler (same views)

12
Overlay Graph (3)

A peer connects to all peers of lower order in
- its level-i home interval (the interval which includes
position x of the peer)
- the level-i intervals adjacent to its home interval
- its de Bruijn intervals: the intervals which include
positions x/2 and (x+1)/2
What is level i?
- Level i is chosen such that there are c log n_p
peers in the interval
- n_p: total number of peers in the system with
lower order
- n_p can be estimated; in the following we
assume it is given
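
The interval structure above can be sketched as follows. Several details are our assumptions and not taken from the slides: level i uses 2^i intervals, the logarithm is base 2, c is a free constant, and a peer picks the largest level at which every relevant interval still holds at least c log n_p peers of lower order. Function names are hypothetical.

```python
import math


def interval_index(x, level):
    """Index of the interval of size 1/2**level that contains x."""
    k = 2 ** level
    return min(int(x * k), k - 1)


def relevant_intervals(x, level):
    """Home interval, its two cyclic neighbors, and the de Bruijn intervals."""
    k = 2 ** level
    home = interval_index(x, level)
    return {
        home,
        (home - 1) % k,
        (home + 1) % k,
        interval_index(x / 2, level),
        interval_index((x + 1) / 2, level),
    }


def choose_level(x, lower_positions, c=2.0, max_level=32):
    """Largest level whose relevant intervals all hold >= c*log2(n_p) peers."""
    n_p = len(lower_positions)
    if n_p == 0:
        return 0
    threshold = c * math.log2(n_p)
    level = 0
    for i in range(1, max_level + 1):
        counts = {j: 0 for j in relevant_intervals(x, i)}
        for y in lower_positions:
            j = interval_index(y, i)
            if j in counts:
                counts[j] += 1
        if min(counts.values()) < threshold:
            break
        level = i
    return level


# 64 evenly spaced lower-order peers (purely illustrative).
positions = [k / 64 + 1 / 128 for k in range(64)]
```

With these 64 peers and c = 1, the threshold is 6 peers per interval, so a peer at position 0.3 ends up on level 3 (8 peers per interval) rather than level 4 (4 peers per interval).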

13
Overlay Graph (4)

In order to ensure connectivity when many peers
leave, the interval size must be increased over time
(the peer upgrades to a larger partition)
Similarly, if many peers of lower order join in an
interval, the peer needs to downgrade
In addition to these forward edges, peers store
their incoming edges
- called backward edges

14
Overlay Graph (5)

These edges are already sufficient for Shell
However, in order to speed up changes between
levels, a peer additionally stores pointers to the peers
it would connect to if it upgraded
- the "funnel" of peers to which the peer would connect
- of course, the peer only connects to these lower
order peers once they are on the corresponding
level
- requires a notification mechanism

[Figure: funnel across Level 1, ..., Level i-2, Level i-1, Level i]

In the following, we will
not consider funnel edges
in further detail!
15
Implication: Monotonicity

From this construction, we can already derive
some properties
For instance, Shell features a monotonicity
property
If two peers p and p' are connected to the same
interval I and if p is of larger order than p',
then p knows strictly more peers in I
- because peers only connect to lower order
peers in an interval
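
The monotonicity property can be illustrated with a simplified model in which a peer knows exactly those peers of an interval that have lower order than itself. The function name and the example orders are ours.

```python
def known_peers(own_order, interval_orders):
    """Peers in the interval that a peer of the given order connects to."""
    return {o for o in interval_orders if o < own_order}


interval = [3, 9, 10, 16, 21]          # orders of the peers located in I
lower_view = known_peers(12, interval)   # view of the lower-order peer
higher_view = known_peers(20, interval)  # view of the higher-order peer
```

In this model the lower-order peer's view is always contained in the higher-order peer's view, and the containment is strict whenever I holds a peer whose order lies between the two.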

16
Distributed Order... A Simplification

In the following, we will assume that peers have
distinct IDs
E.g., assigned at join time by the network entry
point
Otherwise, in case of multiple joins close in
time, peers may not be able to decide which is
older => need to introduce blackout zones, etc.
In the following, we will not consider this issue
in more detail

17
Analysis of Degree (1)

The topological description allows us to analyze the
peer degree

Peers employ the following strategy: if the number of
neighbors falls below c log n_p in at least one
interval, all intervals are doubled

According to Chernoff bounds, it holds that
if one interval contains c log n peers, there is
no interval with more than (1+d) c log n peers for any
constant d > 0, with high probability.

Therefore, the degree is in O(log n) w.h.p.
- with funnel edges, the degree is in O(log^2 n)

18
Analysis of Degree (2)

What about incoming / backward edges?

19
Routing (1)

The Shell overlay allows peers to route messages
Similar to continuous-discrete routing
(adjusting one bit after another)
The routing operation route(x) consists of two phases
Phase 1: route along forward edges to the peer of
lower order which is closest to x
(or to a lower order peer whose home
region contains position x)
Phase 2: descend along backward edges to the peer
which is closest to x

Implication: if a peer wants to send a message
to a peer of lower order, only Phase 1 is
necessary, and the message will not traverse any
higher order peers!
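
The idea of "adjusting one bit after another" can be sketched on the continuous [0,1)-interval alone, ignoring peers and orders. This only illustrates continuous-discrete routing in general, not Shell's two phases; the function name is ours.

```python
def de_bruijn_path(x, target, hops):
    """Continuous positions visited when routing from x towards target.

    Each hop moves to x/2 or (x+1)/2. Feeding in the leading bits of
    the target in reverse order lands within 2**-hops of the target.
    """
    bits = [int(target * 2 ** (j + 1)) % 2 for j in range(hops)]
    path = [x]
    for b in reversed(bits):
        x = (x + b) / 2
        path.append(x)
    return path


path = de_bruijn_path(0.9, 0.3, 10)
```

After 10 hops the final position differs from the target 0.3 by less than 2^-10, so a number of hops logarithmic in n suffices once intervals have size about 1/n.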
20
Routing (2)

Observe that in our overlay, peers have multiple
neighbors which could be used for the next de
Bruijn routing hop (log n neighbors per interval)
This can be exploited in order to minimize
congestion
Routing policy: peer p always forwards packets to
the neighbor which is of largest order among the
eligible peers (lower order than p)
This alleviates the load on very low order peers
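
The routing policy above amounts to a one-line selection rule. In this sketch, candidates stands for the orders of the neighbors in the interval of the next de Bruijn hop; the function name is hypothetical.

```python
def next_hop(p_order, candidates):
    """Largest-order neighbor among those of lower order than p, or None."""
    eligible = [o for o in candidates if o < p_order]
    return max(eligible) if eligible else None


hop = next_hop(20, [3, 9, 16, 23, 28])  # picks 16, not the oldest peer 3
```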

21
Routing (3)

Visualization of routing

towards higher order peers

Messages travel towards lower order peers
But on each hop, the highest order peer among the
eligible ones is taken

22
Routing (4)

Analysis of Phase 1
- according to continuous-discrete routing, at
most log n hops are needed to reach the destination
- we make the following observation

[Formula not preserved in the transcript; its two factors were annotated as follows]
- probability that all peers of order lower than p but
higher than n_p - l_1 are in other intervals
- probability that this peer is located in the
corresponding interval
23
Routing (5)

Generally, for the i-th hop

Summing up, after some lines of calculation, the
probability that the
final peer reached is of order n_p/2 or smaller
is at most O(n_p^(-c)) for some constant c

With high probability, in the first phase of routing, a
request travels to a peer of order at least n_p/2.
24
Routing (6)

Definition of congestion

So what is the congestion in the first routing
phase?

25
Routing (7)

So what is the congestion in the first routing
phase?

See our argument before...
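At most k peers can send via p, the routing path is
of length log 2k, and the probability that it enters
the interval on one of these hops is c log k / k
Multiplying these gives an expected load of roughly
k * log 2k * (c log k / k) = O(log^2 k)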
26
Routing (8)
Theorem: The first phase of routing terminates in
logarithmic time and yields a congestion of
asymptotically log^2 n_p.
27
Routing (9)

Routing phase 2: descend along backward edges to
higher order peers
- idea: binary search which exploits the
monotonicity property
- higher order peers know more about an interval
- on each level i, go to the highest order peer
which is located in the interval which includes the final
position x
- terminates in logarithmic time
- logarithmic congestion: in each hop, a peer
forwards at most one request

28
Join and Leave

Join: similar to lookup; find the highest order peer
in the final interval, get integrated
Leave: peers can even crash, no particular
operation is needed
Change of level: in time O(1); the update cost induced
at other peers is in O(log^2 n)

29
Application 1: i-Shell

i-Shell is a distributed information system
Idea: data management through a consistent hashing
approach
Generalized to multiple levels: on each level,
data is stored on the peer closest to x
- on each hop during insertion, a replica is
placed
Order of peers: time-stamps (assigned by the network
entry point)
Thus, peers only connect to older peers
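
The consistent hashing idea can be sketched for a single level. This is not i-Shell's actual scheme: the hash function (SHA-256 truncated to 48 bits), the cyclic distance and all names are our assumptions, and the multi-level replication is left out.

```python
import hashlib


def key_position(key):
    """Map a key to a point x in [0, 1) using 48 bits of SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2 ** 48


def cyclic_distance(a, b):
    """Distance between two points on the cyclic [0,1)-interval."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def storing_peer(key, peer_positions):
    """The peer closest to the key's position x stores the data."""
    x = key_position(key)
    return min(peer_positions, key=lambda pos: cyclic_distance(pos, x))


peers = [0.05, 0.30, 0.55, 0.80]
owner_before = storing_peer("some-file", peers)
# After one more peer joins, the key either stays or moves to the newcomer.
owner_after = storing_peer("some-file", peers + [0.42])
```

The last two lines show the property that makes consistent hashing attractive under churn: a join only affects keys for which the new peer becomes the closest one.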

30
i-Shell

Therefore
- we immediately get that two peers p and p' can
communicate on paths which include only peers
which are at least their age
- this renders the communication independent of
younger peers
Side benefit: measurement studies have shown that
older peers typically have a longer remaining
session time
- renders the topology more stable
Shell implies robustness to various attacks
E.g., Sybil attack

31
Sybil Attack (1)

Sybil attack
- big problem in the Internet
- e.g., spam
- named after the book "Sybil" by Flora Rheta Schreiber
about a person with 16 identities
Attacker seeks to acquire many identities
- e.g., to control a large fraction of the network
Countermeasures
- virtual identities: captchas etc.
- real identities? botnets?
- Douceur has shown that the issue is difficult to
deal with in distributed environments...

32
Sybil Attack (2)

Shell is resilient to Sybil attacks of any scale!
Model: the Sybil attack starts at some time t_0
Theorem: the traffic of old peers is independent of the
Sybil attack
Techniques
- Admission control
- Rate control

[Figure: heap of peers labeled 3 to 21, annotated "traffic between older peers unaffected", "higher peers can perform a rate control algorithm", "attack originates from lower peers"]
33
Application 2: h-Shell