Transcript and Presenter's Notes

Title: Peer-to-Peer Networks


1
Peer-to-Peer Networks
  • Distributed Algorithms for P2P
  • Distributed Hash Tables

P. Felber (Pascal.Felber@eurecom.fr, http://www.eurecom.fr/felber/)
2
Agenda
  • What are DHTs? Why are they useful?
  • What makes a good DHT design
  • Case studies
  • Chord
  • Pastry (locality)
  • TOPLUS (topology-awareness)
  • What are the open problems?

3
What is P2P?
  • A distributed system architecture
  • No centralized control
  • Typically many nodes, but unreliable and
    heterogeneous
  • Nodes are symmetric in function
  • Take advantage of distributed, shared resources
    (bandwidth, CPU, storage) on peer-nodes
  • Fault-tolerant, self-organizing
  • Operate in a dynamic environment; frequent joins
    and leaves are the norm

4
P2P Challenge: Locating Content
  • Simple strategy: expanding ring search until
    content is found
  • If r of N nodes have a copy, the expected search
    cost is at least N / r, i.e., O(N) for constant r
  • Need many copies to keep overhead small

(Figure: a peer searching the network asks "Who has this paper?")
5
Directed Searches
  • Idea
  • Assign particular nodes to hold particular
    content (or know where it is)
  • When a node wants this content, go to the node
    that is supposed to hold it (or knows where it is)
  • Challenges
  • Avoid bottlenecks: distribute the
    responsibilities evenly among the existing
    nodes
  • Adaptation to nodes joining or leaving (or
    failing)
  • Give responsibilities to joining nodes
  • Redistribute responsibilities from leaving nodes

6
Idea: Hash Tables
  • A hash table associates data with keys
  • Key is hashed to find bucket in hash table
  • Each bucket is expected to hold about
    #items / #buckets items
  • In a Distributed Hash Table (DHT), nodes are the
    hash buckets
  • Key is hashed to find responsible peer node
  • Data and load are balanced across nodes

7
DHTs: Problems
  • Problem 1 (dynamicity): adding or removing nodes
  • With hash mod N, virtually every key will change
    its location!
  • h(k) mod m ≠ h(k) mod (m+1) ≠ h(k) mod (m−1)
  • Solution: use consistent hashing (see the sketch
    below)
  • Define a fixed hash space
  • All hash values fall within that space and do not
    depend on the number of peers (hash buckets)
  • Each key goes to the peer closest to its ID in the
    hash space (according to some proximity metric)
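A minimal sketch of consistent hashing under these assumptions: a fixed 2^m hash space, SHA-1 as the hash function, and a "first node clockwise" (successor) rule; the class and method names are illustrative, not taken from the slides.

```python
import hashlib
from bisect import bisect_left, insort

M = 32                                   # fixed m-bit hash space, independent of the peers

def h(value: str) -> int:
    """Hash a string into the fixed space [0, 2^M)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

class ConsistentHashRing:
    def __init__(self):
        self.node_ids = []               # sorted node IDs on the ring

    def add_node(self, address: str):
        insort(self.node_ids, h(address))

    def remove_node(self, address: str):
        self.node_ids.remove(h(address))

    def lookup(self, key: str) -> int:
        """A key goes to the first node clockwise from its hash (its successor)."""
        i = bisect_left(self.node_ids, h(key))
        return self.node_ids[i % len(self.node_ids)]

# Adding or removing one node only moves the keys between it and its neighbour,
# unlike h(k) mod N, where almost every key changes location.
ring = ConsistentHashRing()
for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    ring.add_node(addr)
print(hex(ring.lookup("some-file.mp3")))
```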

8
DHTs: Problems (cont'd)
  • Problem 2 (size): all nodes must be known to
    insert or look up data
  • This only works for small and static server
    populations
  • Solution: each peer knows only a few
    neighbors
  • Messages are routed through neighbors via
    multiple hops (overlay routing)

9
What Makes a Good DHT Design
  • For each object, the node(s) responsible for that
    object should be reachable via a short path
    (small diameter)
  • The different DHTs differ fundamentally only in
    the routing approach
  • The number of neighbors for each node should
    remain reasonable (small degree)
  • DHT routing mechanisms should be decentralized
    (no single point of failure or bottleneck)
  • Should gracefully handle nodes joining and
    leaving
  • Repartition the affected keys over existing nodes
  • Reorganize the neighbor sets
  • Bootstrap mechanisms to connect new nodes into
    the DHT
  • To achieve good performance, the DHT must provide
    low stretch
  • Minimize the ratio of DHT-routing latency to
    unicast (IP) latency

10
DHT Interface
  • Minimal interface (data-centric):
  • Lookup(key) → IP address
  • Supports a wide range of applications, because it
    imposes few restrictions
  • Keys have no semantic meaning
  • Value is application dependent
  • DHTs do not store the data
  • Data storage can be built on top of DHTs (see the
    sketch below)
  • Lookup(key) → data
  • Insert(key, data)
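A minimal sketch of this split, assuming hypothetical Python classes (a `DHT` class for the data-centric lookup layer and a `DHTStorage` class built on top) and an unspecified RPC transport:

```python
class DHT:
    """Data-centric lookup layer: maps a key to the responsible peer, nothing more."""
    def lookup(self, key: bytes) -> str:
        """Return the IP address of the node responsible for key."""
        raise NotImplementedError

class DHTStorage:
    """Storage service layered on top of a DHT; the DHT itself stores no data."""
    def __init__(self, dht: DHT, transport):
        self.dht = dht
        self.transport = transport       # e.g. an RPC stub: send(ip, op, key, data)

    def insert(self, key: bytes, data: bytes) -> None:
        ip = self.dht.lookup(key)                      # 1. resolve the key to a node ...
        self.transport.send(ip, "store", key, data)    # 2. ... then store the value there

    def get(self, key: bytes) -> bytes:
        ip = self.dht.lookup(key)
        return self.transport.send(ip, "fetch", key, None)
```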

11
DHTs in Context
Layered view (the CFS storage stack), from top to
bottom:
  • User Application: calls load_file / store_file
  • File System (CFS): retrieve and store files, map
    files to blocks; calls load_block / store_block
  • Reliable Block Storage (DHash): storage,
    replication, caching; calls lookup
  • DHT (Chord): lookup and routing; calls send /
    receive
  • Transport (TCP/IP): communication
12
DHTs Support Many Applications
  • File sharing: CFS, OceanStore, PAST, ...
  • Web cache: Squirrel, ...
  • Censor-resistant stores: Eternity, FreeNet, ...
  • Application-layer multicast: Narada, ...
  • Event notification: Scribe
  • Naming systems: ChordDNS, INS, ...
  • Query and indexing: Kademlia, ...
  • Communication primitives: I3, ...
  • Backup store: HiveNet
  • Web archive: Herodotus

13
DHT Case Studies
  • Case Studies
  • Chord
  • Pastry
  • TOPLUS
  • Questions
  • How is the hash space divided evenly among nodes?
  • How do we locate a node?
  • How do we maintain routing tables?
  • How do we cope with (rapid) changes in
    membership?

14
Chord (MIT)
  • Circular m-bit ID space for both keys and nodes
  • Node ID = SHA-1(IP address)
  • Key ID = SHA-1(key)
  • A key is mapped to the first node whose ID is
    equal to or follows the key ID
  • Each node is responsible for O(K/N) keys
  • O(K/N) keys move when a node joins or leaves

(Figure: Chord ring with m = 6, IDs from 0 to 2^m − 1)
15
Chord State and Lookup (1)
  • Basic Chord: each node knows only 2 other nodes
    on the ring
  • Successor
  • Predecessor (for ring management)
  • Lookup is achieved by forwarding requests around
    the ring through successor pointers
  • Requires O(N) hops

(Figure: Chord ring with m = 6 and nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; a lookup for key K54 is forwarded around the ring via successor pointers)
16
Chord State and Lookup (2)
Finger table
  • Each node knows m other nodes on the ring
  • Successors: finger i of n points to the node at
    n + 2^i (or its successor)
  • Predecessor (for ring management)
  • O(log N) state per node
  • Lookup is achieved by following the closest
    preceding fingers, then the successor (see the
    sketch below)
  • O(log N) hops
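A sketch of the finger table and the closest-preceding-finger lookup, assuming for simplicity that every node can see the full sorted list of node IDs (a real Chord node would issue remote calls instead); `ChordNode` and its methods are illustrative names:

```python
M = 6                                    # m-bit identifier space, as in the figures

def between(x, a, b):
    """True if x lies strictly between a and b on the ring (interval may wrap)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class ChordNode:
    def __init__(self, node_id, all_ids):
        self.id = node_id
        self.all_ids = sorted(all_ids)   # global view of the ring, for the sketch only

    def successor_of(self, ident):
        """First node ID equal to or following ident on the ring."""
        for nid in self.all_ids:
            if nid >= ident:
                return nid
        return self.all_ids[0]           # wrap around past 2^M - 1

    def finger(self, i):
        """Finger i of n points to successor(n + 2^i)."""
        return self.successor_of((self.id + 2 ** i) % 2 ** M)

    def lookup(self, key, hops=0):
        if key == self.id:
            return self.id, hops
        succ = self.finger(0)            # finger 0 is the immediate successor
        if key == succ or between(key, self.id, succ):
            return succ, hops + 1
        for i in reversed(range(M)):     # follow the closest preceding finger
            f = self.finger(i)
            if between(f, self.id, key):
                return ChordNode(f, self.all_ids).lookup(key, hops + 1)
        return succ, hops + 1            # fallback: forward to the successor

ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(ChordNode(8, ids).lookup(54))      # key K54 from N8 -> (56, 3): N56 in O(log N) hops
```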

(Figure: Chord ring with m = 6; finger table of N8: N8+1 → N14, N8+2 → N14, N8+4 → N14, N8+8 → N21, N8+16 → N32, N8+32 → N42; a lookup for K54 proceeds by following the closest preceding fingers)
17
Chord Ring Management
  • For correctness, Chord needs to maintain the
    following invariants
  • For every key k, succ(k) is responsible for k
  • Successor pointers are correctly maintained
  • Finger tables are not necessary for correctness
  • One can always default to successor-based lookup
  • Finger tables can be updated lazily

18
Joining the Ring
  • Three-step process
  • Initialize all fingers of new node
  • Update fingers of existing nodes
  • Transfer keys from successor to new node

19
Joining the Ring Step 1
  • Initialize the new node's finger table
  • Locate any node n already in the ring
  • Ask n to look up the peers at j + 2^0, j + 2^1,
    j + 2^2, ...
  • Use the results to populate the finger table of j
    (see the sketch below)
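A sketch of step 1, assuming the joining node can ask any existing node n for lookups; `init_finger_table` and the `_ExistingNode` stub are hypothetical names used only for this illustration:

```python
M = 6                                         # m-bit ID space, as in the Chord figures

def init_finger_table(j_id: int, n, m: int = M):
    """Build the finger table of joining node j by asking an existing node n.

    n.lookup(ident) is assumed to return the ID of successor(ident)."""
    fingers = []
    for i in range(m):
        target = (j_id + 2 ** i) % (2 ** m)   # j + 2^0, j + 2^1, ..., j + 2^(m-1)
        fingers.append(n.lookup(target))
    return fingers

class _ExistingNode:
    """Stand-in for a node already in the ring (it would normally route each lookup)."""
    RING = [1, 8, 14, 21, 28, 32, 38, 42, 48, 51, 56]
    def lookup(self, ident):
        return next((x for x in self.RING if x >= ident), self.RING[0])

print(init_finger_table(26, _ExistingNode()))  # fingers of a node joining with ID 26
```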

20
Joining the Ring Step 2
  • Updating fingers of existing nodes
  • New node j calls update function on existing
    nodes that must point to j
  • These are the nodes p with p + 2^i ∈ (pred(j), j],
    i.e., p in the ranges [pred(j) − 2^i + 1, j − 2^i]
  • O(log N) nodes need to be updated

(Figure: Chord ring with m = 6; new node N28 joins, and existing fingers that should now point to N28, such as N8+16, are updated)
21
Joining the Ring Step 3
  • Transfer key responsibility
  • Connect to successor
  • Copy keys from successor to new node
  • Update successor pointer and remove keys
  • Only the keys in the range (pred(j), j] are
    transferred

22
Stabilization
  • Case 1: finger tables are reasonably fresh
  • Case 2: successor pointers are correct, but
    fingers are not
  • Case 3: successor pointers are inaccurate or key
    migration is incomplete (MUST BE AVOIDED!)
  • Stabilization algorithm periodically verifies and
    refreshes node pointers (including fingers)
  • Basic principle (at node n), see the sketch below
  • x = n.succ.pred
    if x ∈ (n, n.succ) then n.succ = x
    notify n.succ
  • Eventually stabilizes the system when no node
    joins or fails
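A sketch of this stabilization rule as a local simulation, assuming each node object exposes `succ` and `pred` references directly (no real network calls); `Node.stabilize` and `Node.notify` mirror the principle above:

```python
def between(x, a, b):
    """True if x lies strictly between a and b on the ring (interval may wrap)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.succ = self
        self.pred = None

    def stabilize(self):
        """x = succ.pred; if x is in (n, n.succ), adopt it; then notify the successor."""
        x = self.succ.pred
        if x is not None and between(x.id, self.id, self.succ.id):
            self.succ = x
        self.succ.notify(self)

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if self.pred is None or between(candidate.id, self.pred.id, self.id):
            self.pred = candidate

# N26 has just joined between N21 and N32; periodic stabilization repairs the ring.
n21, n26, n32 = Node(21), Node(26), Node(32)
n21.succ, n21.pred = n32, n32
n32.succ, n32.pred = n21, n21
n26.succ = n32                      # the new node only knows its successor at first
for _ in range(3):
    for n in (n21, n26, n32):
        n.stabilize()
print(n21.succ.id, n32.pred.id)     # -> 26 26
```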

23
Dealing With Failures
  • Failure of nodes might cause incorrect lookup
  • N8 doesn't know its correct successor, so a
    lookup of K19 fails
  • Solution: successor list (see the sketch below)
  • Each node n knows its r immediate successors
  • After a failure, n knows its first live successor
    and updates its successor list
  • Correct successors guarantee correct lookups
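A small sketch of the successor-list idea, assuming an illustrative `alive` flag stands in for a real failure detector:

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.alive = True                 # illustrative stand-in for a failure detector
        self.successor_list = []          # the r immediate successors, closest first

    def first_live_successor(self):
        """Skip failed successors; the list is then refreshed from the survivor."""
        for s in self.successor_list:
            if s.alive:
                return s
        raise RuntimeError("all r successors are dead: the ring may be broken")

# N8's direct successor fails, but lookups continue through the next live successor N18.
n8, n14, n18, n21 = Node(8), Node(14), Node(18), Node(21)
n8.successor_list = [n14, n18, n21]       # r = 3 here; the next slide sizes r properly
n14.alive = False
print(n8.first_live_successor().id)       # -> 18
```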

(Figure: Chord ring with m = 6; key K19 is stored at N18, but lookup(K19) issued at N8 fails while N8's successor pointer is stale after a failure)
24
Dealing With Failures (contd)
  • Successor lists guarantee correct lookup with
    some probability
  • Can choose r to make probability of lookup
    failure arbitrarily small
  • Assume half of the nodes fail and failures are
    independent
  • P(n's successor list is all dead) = 0.5^r
  • P(n does not break the Chord ring) = 1 − 0.5^r
  • P(no broken node) = (1 − 0.5^r)^N
  • r = 2 log2(N) gives (1 − 1/N^2)^N ≈ 1 − 1/N
  • With high probability (1 − 1/N), the ring is not
    broken

25
Evolution of P2P Systems
  • Nodes leave frequently, so surviving nodes must
    be notified of arrivals to stay connected after
    their original neighbors fail
  • Consider a time t with N nodes in the system
  • Doubling time: time from t until N new nodes join
  • Halving time: time from t until N nodes leave
  • Half-life: minimum of the halving and doubling
    times
  • Theorem: there exists a sequence of joins and
    leaves such that any node that has received fewer
    than k notifications per half-life will be
    disconnected with probability at least
    (1 − 1/(e−1))^k ≈ 0.418^k

26
Chord and Network Topology
Nodes that are numerically close are not
topologically close (1M nodes ≈ 10 overlay hops)
27
Pastry (MSR)
  • Circular m-bit ID space for both keys and nodes
  • Addresses in base 2^b, with m / b digits
  • Node ID = SHA-1(IP address)
  • Key ID = SHA-1(key)
  • A key is mapped to the node whose ID is
    numerically closest to the key ID

(Figure: Pastry ring with m = 8, b = 2, IDs from 0 to 2^m − 1)
28
Pastry Lookup
  • Prefix routing from A to B
  • At the h-th hop, arrive at a node that shares a
    prefix of length at least h digits with B
  • Example: 5324 routes to 0629 via 5324 → 0748 →
    0605 → 0620 → 0629
  • If there is no such node, forward the message to a
    neighbor numerically closer to the destination
    (successor): 5324 → 0748 → 0605 → 0609 → 0620 →
    0629
  • O(log_{2^b} N) hops (see the sketch below)
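A sketch of a single prefix-routing hop, with simplifying assumptions: IDs are 4-digit strings in base 16 (so the slide's example IDs parse), the routing table is a dict keyed by (row, next digit), and the numerically-closer fallback ignores ring wrap-around; all names are illustrative:

```python
BASE = 16                               # one digit = b bits; base 16 here so '5324' etc. parse

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def next_hop(current: str, key: str, routing_table: dict, leaf_set: list) -> str:
    """One Pastry hop: extend the shared prefix with the key by one digit if the
    routing table has a matching entry, else fall back to a numerically closer node."""
    h = shared_prefix_len(current, key)
    if h == len(key):
        return current                                # the current node is responsible
    entry = routing_table.get((h, key[h]))            # row h, column = key's next digit
    if entry is not None:
        return entry
    closer = [n for n in leaf_set
              if abs(int(n, BASE) - int(key, BASE)) < abs(int(current, BASE) - int(key, BASE))]
    return min(closer, key=lambda n: abs(int(n, BASE) - int(key, BASE)), default=current)

# Example shaped after the slide: from 5324 toward key 0629, the first hop goes to a
# node starting with '0' (here 0748), the next to one starting with '06', and so on.
rt_5324 = {(0, '0'): '0748'}
print(next_hop('5324', '0629', rt_5324, leaf_set=[]))   # -> '0748'
```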

29
Pastry State and Lookup
  • For each prefix, a node knows some other node (if
    any) with same prefix and different next digit
  • For instance, N0201 knows
  • N-: N1???, N2???, N3???
  • N0: N00??, N01??, N03??
  • N02: N021?, N022?, N023?
  • N020: N0200, N0202, N0203
  • When multiple nodes, choose topologically-closest
  • Maintain good locality properties (more on that
    later)

(Figure: Pastry ring with m = 8, b = 2; routing state of node N0201 and the route of a lookup for key K2120 toward node N2120)
30
A Pastry Routing Table
b = 2, so node IDs are in base 4; m = 16 bits, i.e., m/b = 8 digits per ID
Node ID: 10233102

Leaf set: contains the nodes numerically closest to the local node
(MUST BE UP TO DATE)
  SMALLER: 10233000, 10233001, 10233021, 10233033
  LARGER: 10233120, 10233122, 10233230, 10233232

Routing table: m/b rows, 2^b − 1 entries per row
  Entries in the n-th row share their first n digits with the current node
    (common prefix | next digit | rest)
  Entries in column m have m as their next digit; the slot for the current
    node's own n-th digit is marked with just that digit
  Entries with no suitable node ID are left empty
  Rows for node 10233102:
    row 0: 02212102, 22301203, 31203203
    row 1: 11301233, 12230203, 13021022
    row 2: 10031203, 10132102, 10323302
    row 3: 10200230, 10211302, 10222302
    row 4: 10230322, 10231000, 10232121
    row 5: 10233001, 10233232
    row 6: 10233120

Neighborhood set: contains the nodes closest to the local node according to
the proximity metric
  13021022, 10200230, 11301233, 31301233
  02212102, 22301203, 31203203, 33213321
31
Pastry and Network Topology
The expected distance to routing-table entries
increases with the row number
Smaller and smaller numerical jumps; bigger and
bigger topological jumps
32
Joining
(Figure: node X with ID 0629 joins; the nodes it contacts on the way to its ID each contribute a row to 0629's routing table)
33
Locality
  • The joining phase preserves the locality property
  • First, A must be near X
  • Entries in row zero of A's routing table are
    close to A, and A is close to X ⇒ row zero of X
    (X0) can be taken from A (A0)
  • The distance from B to the nodes in its row one
    (B1) is much larger than the distance from A to B
    (B comes from A's row zero) ⇒ B1 is a reasonable
    choice for X1, C2 for X2, etc.
  • To avoid cascading errors, X requests the state
    from each of the nodes in its routing table and
    updates its own with any closer node
  • This scheme works pretty well in practice
  • Minimizes the distance of the next routing step,
    but with no sense of global direction
  • Stretch is around 2-3

34
Node Departure
  • A node is considered failed when its immediate
    neighbors in the node-ID space can no longer
    communicate with it
  • To replace a failed node in the leaf set, the
    node contacts the live node with the largest
    index on the side of the failed node and asks for
    its leaf set
  • To repair a failed routing-table entry R_d^l, the
    node first contacts the node referred to by
    another entry R_i^l (i ≠ d) of the same row and
    asks for that node's entry for R_d^l
  • If a member of the neighborhood set M is not
    responding, the node asks the other members for
    their M sets, checks the distance of each newly
    discovered node, and updates its own M set

35
CAN (Berkeley)
  • Cartesian space (d-dimensional)
  • The space wraps around: a d-torus
  • The space is incrementally split between nodes
    that join
  • The node (cell) responsible for a key k is
    determined by hashing k once per dimension

(Figure: a 2-dimensional CAN coordinate space split among nodes)
36
CAN State and Lookup
  • A node A only maintains state for its immediate
    neighbors (N, S, E, W)
  • 2d neighbors per node
  • Messages are routed to the neighbor that
    minimizes the Cartesian distance to the key's
    point (see the sketch below)
  • More dimensions mean faster routing but also more
    state
  • (d/4) · N^(1/d) hops on average
  • Multiple choices: we can route around failures
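A sketch of greedy CAN routing on a unit d-torus, assuming each point learns its zone neighbors through a hypothetical `neighbors_of` callback; wrap-around is handled per dimension:

```python
import math

SIZE = 1.0                                  # the coordinate space is [0,1)^d and wraps (torus)

def torus_distance(p, q):
    """Cartesian distance on the d-torus."""
    s = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        delta = min(delta, SIZE - delta)    # wrap-around in each dimension
        s += delta * delta
    return math.sqrt(s)

def route(position, key_point, neighbors_of):
    """Greedily forward to the neighbor that minimizes the distance to the key's point.

    neighbors_of(p) is assumed to return the coordinates of p's 2d zone neighbors."""
    path = [position]
    while True:
        options = neighbors_of(path[-1]) + [path[-1]]
        best = min(options, key=lambda p: torus_distance(p, key_point))
        if best == path[-1]:                # no neighbor is closer: this zone owns the key
            return path
        path.append(best)

# Tiny 2-D example with a 3x3 grid of zone centres (d = 2, so N/S/E/W neighbors only).
grid = [(x / 3 + 1 / 6, y / 3 + 1 / 6) for x in range(3) for y in range(3)]
def neighbors_of(p):
    return [q for q in grid if 0 < torus_distance(p, q) < 0.34]

print(route(grid[0], (0.8, 0.8), neighbors_of))
```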

(Figure: d = 2; node A and its neighbors N, S, E, W; a message is routed greedily toward node B)
37
CAN Landmark Routing
  • CAN nodes do not have a pre-defined ID
  • Nodes can be placed according to locality
  • Use a well-known set of m landmark machines
    (e.g., root DNS servers)
  • Each CAN node measures its RTT to each landmark
  • Orders the landmarks by increasing RTT: m!
    possible orderings (see the sketch below)
  • CAN construction
  • Place nodes with the same ordering close together
    in the CAN
  • To do so, partition the space into m! zones: m
    zones on x, m − 1 on y, etc.
  • A node interprets its ordering as the coordinate
    of its zone
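A small sketch of landmark binning, assuming three hypothetical landmarks and RTTs measured elsewhere; the ordering index picks one of the m! zones:

```python
from itertools import permutations

LANDMARKS = ["l1", "l2", "l3"]              # e.g., well-known servers; m = 3 here

def landmark_ordering(rtts: dict) -> tuple:
    """Order the landmark names by increasing measured RTT."""
    return tuple(sorted(rtts, key=rtts.get))

def zone_index(ordering: tuple) -> int:
    """Map one of the m! orderings to a zone number in 0 .. m! - 1."""
    return sorted(permutations(LANDMARKS)).index(ordering)

# Two topologically close nodes tend to measure similar RTTs, get the same
# ordering, and therefore land in the same zone of the CAN space.
rtts_node_a = {"l1": 40, "l2": 12, "l3": 95}
rtts_node_b = {"l1": 38, "l2": 15, "l3": 90}
print(zone_index(landmark_ordering(rtts_node_a)),
      zone_index(landmark_ordering(rtts_node_b)))    # same zone index for both
```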

38
CAN and Network Topology
Use m landmarks to split the space into m! zones
Nodes get a random zone within their ordering's zone
Topologically close nodes tend to be in the same
zone
(Figure: landmarks A, B, C split the space into the
3! = 6 zones ABC, ACB, BAC, BCA, CAB, CBA)
39
Topology-Awareness
  • Problem
  • P2P lookup services generally do not take
    topology into account
  • In Chord/CAN/Pastry, neighbors are often not
    locally nearby
  • Goals
  • Provide small stretch: route packets to their
    destination along a path that mimics the
    router-level shortest-path distance
  • Stretch = DHT-routing distance / IP-routing
    distance
  • Our solution
  • TOPLUS (TOPology-centric Look-Up Service)
  • An extremist design for topology-aware DHTs

40
TOPLUS Architecture
Group nodes into nested groups using IP prefixes:
AS, ISP, LAN (an IP prefix is a contiguous address
range of the form w.x.y.z/n)
Use the IPv4 address range (32 bits) for node IDs
and key IDs
Assumption: nodes with the same IP prefix are
topologically close
41
Node State
Each node n is part of a series of telescoping
sets H_i, with sibling sets S_i
Node n must know all up nodes in its inner group
Node n must know one delegate node in each tier-i
set S ∈ S_i
42
Routing with XOR Metric
  • To look up key k, node n forwards the request to
    the node in its routing table whose ID j is
    closest to k according to the XOR metric (see the
    sketch below)
  • Let j = j31 j30 ... j0 and k = k31 k30 ... k0;
    then d(j, k) = Σ_i |j_i − k_i| · 2^i, i.e., the
    integer value of j XOR k
  • Refinement of longest-prefix match
  • Note that the closest ID is unique:
    d(j, k) = d(j′, k) ⇒ j = j′
  • Example (8 bits)
  • k = 10010110
  • j = 10110110, d(j, k) = 2^5 = 32
  • j′ = 10001001, d(j′, k) = 2^4 + 2^3 + 2^2 + 2^1 +
    2^0 = 31
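A sketch of the XOR closest-ID rule with the slide's 8-bit example, assuming integer node IDs (Python's `^` operator gives the bitwise XOR, whose integer value is exactly d(j, k)):

```python
def xor_distance(j: int, k: int) -> int:
    """d(j, k) = sum over bits of |j_i - k_i| * 2^i, i.e. the integer value of j XOR k."""
    return j ^ k

def closest_node(known_ids, k):
    """Forward toward the known node whose ID minimizes the XOR distance to key k.
    The closest ID is unique: d(j, k) = d(j', k) implies j = j'."""
    return min(known_ids, key=lambda j: xor_distance(j, k))

# The slide's 8-bit example:
k  = 0b10010110
j1 = 0b10110110                 # d(j1, k) = 2^5 = 32
j2 = 0b10001001                 # d(j2, k) = 2^4 + 2^3 + 2^2 + 2^1 + 2^0 = 31
print(xor_distance(j1, k), xor_distance(j2, k))   # 32 31
print(format(closest_node([j1, j2], k), '08b'))   # j2, the longer-prefix match, wins
```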

43
Prefix Routing Lookup
Compute the 32-bit key k (using a hash function)
Perform a longest-prefix match of k against the
entries in the routing table, using the XOR metric
Route the message to the node in the inner group
with the closest ID (according to the XOR metric)
44
TOPLUS and Network Topology
Smaller and smaller numerical and topological
jumps
Always move closer to the destination
45
Group Maintenance
  • To join the system, a node n finds its closest
    existing node n′
  • n copies the routing and inner-group tables of n′
  • n modifies its routing table to satisfy a
    diversity property
  • Requires that the delegate nodes of n and n′ are
    distinct with high probability
  • Allows us to find a replacement delegate in case
    of failure
  • Upon failure, update inner-group tables
  • Lazy update of routing tables
  • Membership tracking within groups (local, small)

46
On-Demand Caching
To look up k, create k′ = k with its first r bits
replaced by those of w.x.y.z/r (the node responsible
for k′ within that group serves as the cache); see
the sketch below
Cache data in the group (ISP, campus) with prefix
w.x.y.z/r
Extends naturally to multiple levels (cache
hierarchy)
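A sketch of the key-rewriting step, assuming 32-bit keys and a group prefix given in w.x.y.z/r notation; `rewrite_key` is an illustrative helper, not TOPLUS code:

```python
import ipaddress

def rewrite_key(k: int, group_prefix: str) -> int:
    """Replace the first r bits of a 32-bit key k by the bits of the prefix w.x.y.z/r,
    so the rewritten key falls inside the caching group's part of the ID space."""
    net = ipaddress.ip_network(group_prefix)
    r = net.prefixlen
    low_mask = (1 << (32 - r)) - 1                 # keep the low 32 - r bits of k
    return int(net.network_address) | (k & low_mask)

k = 0xD1A5C3E7                                     # 32-bit key produced by the hash function
k_prime = rewrite_key(k, "192.168.0.0/16")         # cache group with prefix 192.168.0.0/16
print(ipaddress.ip_address(k_prime))               # 192.168.x.x: resolved inside the group
```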
47
Measuring TOPLUS Stretch
  • Obtained prefix information from
  • BGP tables from the Oregon and Michigan
    universities
  • Routing registries from Castify, RIPE
  • Sample of 1000 different IP addresses
  • Point-to-point IP measurements using King
  • TOPLUS distance: weighted average over all
    possible paths between source and destination
  • Weights: probability of a delegate being in each
    group
  • TOPLUS stretch = TOPLUS distance / IP distance

48
Results
  • Original tree
  • 250,562 distinct IP prefixes
  • Up to 11 levels of nesting
  • Mean stretch: 1.17
  • 16-bit regrouping (prefixes longer than 16 bits → 16)
  • Aggregate small tier-1 groups
  • Mean stretch: 1.19
  • 8-bit regrouping (prefixes longer than 16 bits → 8)
  • Mean stretch: 1.28
  • Original + 1: add one level with 256 8-bit prefixes
  • Mean stretch: 1.9
  • Artificial, 3-tier tree
  • Mean stretch: 2.32

49
TOPLUS Summary
  • Problems
  • Non-uniform ID space (requires bias in hash to
    balance load)
  • Correlated node failures
  • Advantages
  • Small stretch
  • IP longest-prefix matching allows fast forwarding
  • On-demand P2P caching is straightforward to
    implement
  • Can be easily deployed in a static environment
    (e.g., a multi-site corporate network)
  • Can be used as a benchmark to measure the speed of
    other P2P services

50
Other Issues: Hierarchical DHTs
  • The Internet is organized as a hierarchy
  • Should DHT designs be flat?
  • Hierarchical DHTs: multiple overlays, possibly
    managed by different DHTs (Chord, CAN, etc.)
  • First, locate group responsible for key in
    top-level DHT
  • Then, find peer in next-level overlay, etc.
  • By designating the most reliable peers as
    super-nodes (members of multiple overlays), the
    number of hops can be decreased significantly
  • How can we deploy and maintain such architectures?

51
Hierarchical DHTs: Example
(Figure: a top-level Chord overlay connects super-nodes s1, s2, s3, s4; each super-node also belongs to a lower-level group, e.g., a CAN group or a Chord group)
52
Other Issues: DHT Querying
  • DHTs allow us to locate data very quickly...
  • Lookup(Beatles/Help) → IP address
  • ...but it only works for exact matches
  • Users tend to submit broad queries
  • Lookup(Beatles/) → IP address
  • Queries may be inaccurate
  • Lookup(Beattles/Help) → IP address
  • Idea: index data using partial queries as keys
  • Other approach: fuzzy matching (UCSB)

53
Some Other Issues
  • Better handling of failures
  • In particular, Byzantine failures: a single
    corrupted node may compromise the system
  • Reasoning about the dynamics of the system
  • A large system may never achieve a quiescent
    ideal state
  • Dealing with untrusted participants
  • Data authentication, integrity of routing tables,
    anonymity and censor resistance, reputation
  • Traffic-awareness, load balancing

54
Conclusion
  • DHT is a simple, yet powerful abstraction
  • Building block of many distributed services (file
    systems, application-layer multicast, distributed
    caches, etc.)
  • Many DHT designs, with various pros and cons
  • Balance between state (degree), speed of lookup
    (diameter), and ease of management
  • The system must support rapid changes in
    membership
  • Dealing with joins/leaves/failures is not trivial
  • The dynamics of P2P networks are difficult to
    analyze
  • Many open issues are worth exploring