Introduction to Structured Overlay Networks - PowerPoint PPT Presentation

1
Introduction to Structured Overlay Networks
  • Seif Haridi
  • KTH/SICS

2
Presentation Overview
  • Gentle introduction to Structured Overlay
    Networks and Distributed Hash Tables
  • Chord algorithms and others

3
What's a Distributed Hash Table (DHT)?
  • An ordinary hash table, which is distributed
  • Every node provides a lookup operation
  • Provide the value associated with a key
  • Nodes keep routing pointers
  • If item not found, route to another node

11/20/2009
4
So what?
Time to find data is logarithmic. Size of routing
tables is logarithmic. Example: log2(1,000,000) ≈ 20.
EFFICIENT!
Store a number of items proportional to the number of
nodes. Typically, with D items and n nodes: store
D/n items per node; move D/n items when nodes
join/leave/fail. EFFICIENT!
  • Self-management of routing info
  • Ensure routing information is up-to-date
  • Self-management of items
  • Ensure that data is always replicated and
    available
  • Characteristic properties
  • Scalability
  • Number of nodes can be huge
  • Number of items can be huge
  • Self-manage in the presence of joins/leaves/failures
  • Routing information
  • Data items

5
Traditional Motivation (1/2)
  • Peer-to-Peer file sharing very popular
  • Napster
  • Completely centralized
  • Central server knows who has what
  • Judicial problems
  • Gnutella
  • Completely decentralized
  • Ask everyone you know to find data
  • Very inefficient

central index
decentralized index
6
Traditional Motivation (2/2)
  • Grand vision of DHTs
  • Provide efficient file sharing
  • Quote from Chord: "In particular, Chord can
    help avoid single points of failure or control
    that systems like Napster possess, and the lack
    of scalability that systems like Gnutella display
    because of their widespread use of broadcasts."
    (Stoica et al. 2001)
  • Hidden assumptions
  • Millions of unreliable nodes
  • User can switch off computer any time
    (leave = failure)
  • Extreme dynamism (nodes joining/leaving/failing)
  • Heterogeneity of computers and latencies
  • Untrusted nodes

7
Motivation: DHT overlay as communication
infrastructure
  • Internet communication
  • IP/port, TCP and UDP
  • Not suited for 21st century computing
  • Firewalls
  • NATs
  • Changing IP addresses

8
Name based communication
  • DHTs can overcome these
  • How?
  • Use the DHT
  • Map names to locations
  • Bypass firewalls and NATs by routing through
    neighbors

9
Name based communication
  • What about group communication?
  • IP Multicast is not enabled on the Internet
  • Use the overlay to broadcast to all nodes
  • Create multiple groups, broadcast within each

10
What's it good for?
  • Let's look at 10 applications built using such
    systems

11
Distributed Backup
  • Setup
  • Clients installed the backup tool
  • Decide on amount of space to share
  • Choose files for backup
  • Regular backup
  • Data is encrypted
  • Stored in the directory

12
Distributed File System
  • Similar to AFS and NFS
  • Files stored in directory
  • What is new?
  • Application logic self-managed
  • Add/remove servers on the fly
  • Automatically handles failures
  • Automatically load-balances
  • No manual configuration needed

13
P2P Cache
  • A distributed cache
  • Every node in an org. runs a client
  • Want to browse a web page?
  • If it exists locally -> download it from a peer
  • Otherwise, fetch and cache
  • No central proxy needed

14
P2P Web Servers
  • Distributed Web Server
  • Pages stored in the directory
  • What is new?
  • Application logic self-managed
  • Automatically load-balances
  • Add/remove servers on the fly
  • Automatically handles failures

15
P2P SIP
  • Session Initiation Protocol
  • Used to initiate calls on the Internet
  • Is being standardized
  • Use the directory to find end-hosts
  • Improving Skype

16
Host Identity Payload (HIP)
  • Uses the directory to provide seamless mobility
  • Unlike Mobile IP
  • No home agent needed
  • Self-managing

17
PIER (databases)
  • A relational view of the directory
  • Use SQL to fetch data
  • Standard operations (projection, selection,
    equi-join)

18
Summary
  • DHT is a useful data structure
  • Assumptions mentioned might not be true
  • Moderate amount of dynamism
  • Leave not same thing as failure
  • Dedicated servers
  • Nodes can be trusted
  • Less heterogeneity

19
Chord as Example of DHT
20
How to construct a DHT (Chord)?
  • Use a logical name space, called the identifier
    space, consisting of identifiers 0, 1, 2, …, N-1
  • Identifier space is a logical ring modulo N
  • Every node picks a random identifier through a
    hash function H
  • Example
  • Space N = 16: {0, …, 15}
  • Five nodes a, b, c, d, e
  • a picks 6
  • b picks 5
  • c picks 0
  • d picks 11
  • e picks 2

21
Definition of Successor
  • The successor of an identifier is the
  • first node met going in clockwise direction
  • starting at the identifier
  • Example
  • succ(12) = 14
  • succ(15) = 2
  • succ(6) = 6
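As a concrete illustration, the successor function can be sketched in a few lines of Python. The node ids below are assumed for illustration only (chosen so that they reproduce the succ values on this slide; the transcript does not list the figure's nodes):

```python
N = 16                          # size of the identifier space (ring modulo N)
nodes = sorted([2, 6, 11, 14])  # hypothetical node ids on the ring

def succ(ident):
    """First node met going in clockwise direction, starting at ident."""
    ident %= N
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]  # wrapped past the highest id: back to the first node

print(succ(12), succ(15), succ(6))  # -> 14 2 6
```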

22
Where to store data (Chord)?
  • Use globally known hash function, H
  • Each item <key, value> gets
  • identifier H(key) = k
  • Store each item at its successor
  • Node succ(k) is responsible for item k
  • Example
  • H(Marina) = 12
  • H(Peter) = 2
  • H(Seif) = 9
  • H(Stefan) = 14

Store a number of items proportional to the number of
nodes. Typically (on average), with D items and n
nodes: store D/n items per node; move D/n items
when nodes join/leave/fail. EFFICIENT!
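The D/n load claim above can be checked with a quick simulation; the ring of 50 random node ids and the use of SHA-1 as the globally known hash function H are assumptions made purely for this experiment:

```python
import hashlib
import random
from collections import Counter

N = 2 ** 16  # identifier space (assumed size for the experiment)
random.seed(42)
nodes = sorted(random.sample(range(N), 50))  # n = 50 hypothetical nodes

def H(key):
    """Globally known hash function, mapped into the identifier space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % N

def succ(ident):
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

# store D items at their successors and measure the load per node
D = 5000
load = Counter(succ(H(f"item-{i}")) for i in range(D))
print(sum(load.values()) / len(nodes))  # average load: D/n = 100.0 items
```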
23
Where to point (Chord)?
  • Each node points to its successor
  • The successor of a node n is succ(n+1)
  • Known as a node's succ pointer
  • Each node points to its predecessor
  • First node met in anti-clockwise direction
    starting at n-1
  • Known as a node's pred pointer
  • Example
  • 0's successor is succ(1) = 2
  • 2's successor is succ(3) = 5
  • 5's successor is succ(6) = 6
  • 6's successor is succ(7) = 11
  • 11's successor is succ(12) = 0

24
DHT Lookup
  • To lookup a key k
  • Calculate H(k)
  • Follow succ pointers until item k is found
  • Example
  • Lookup "Seif" at node 2
  • H(Seif) = 9
  • Traverse nodes
  • 2, 5, 6, 11 (BINGO)
  • Return "Stockholm" to initiator

25
DHT Lookup
  • (a, b] is the segment of the ring moving clockwise
    from but not including a until and including b
  • n.foo(.) denotes an RPC of foo(.) to node n
  • n.bar denotes an RPC to fetch the value of the
    variable bar in node n
  • We call the process of finding the successor of
    an id a LOOKUP
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  •   if predecessor ≠ nil and id ∈ (predecessor, n]
      then return n
  •   else if id ∈ (n, successor] then
  •     return successor
  •   else // forward the query around the circle
  •     return successor.findSuccessor(id)
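A minimal executable sketch of this procedure, using the five-node ring from the earlier slides; the wrap-around interval test is the only subtle part:

```python
def between(x, a, b):
    """x in (a, b] on the ring: from, not including, a up to and including b."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.predecessor = None

    def find_successor(self, ident):
        # ask this node to find the successor of ident
        if self.predecessor is not None and between(ident, self.predecessor.id, self.id):
            return self
        if between(ident, self.id, self.successor.id):
            return self.successor
        # forward the query around the circle
        return self.successor.find_successor(ident)

# build the ring 0 -> 2 -> 5 -> 6 -> 11 -> 0 from the slides (N = 16)
ids = [0, 2, 5, 6, 11]
ring = [Node(i) for i in ids]
for i, node in enumerate(ring):
    node.successor = ring[(i + 1) % len(ring)]
    node.predecessor = ring[i - 1]

print(ring[1].find_successor(9).id)  # lookup H(Seif) = 9 at node 2 -> 11
```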

26
DHT Lookup and Update
  • // store and retrieve items via lookups
  • procedure n.put(id, value)
  •   s := findSuccessor(id)
  •   s.store(id, value)
  • procedure n.get(id)
  •   s := findSuccessor(id)
  •   return s.retrieve(id)
  • PUT and GET are nothing but lookups!!

27
Speeding up lookups
  • If only the pointer to succ(n+1) is used
  • Worst case lookup time is N, for N nodes
  • Improving lookup time (finger/routing table)
  • Point to succ(n+1)
  • Point to succ(n+2)
  • Point to succ(n+4)
  • Point to succ(n+8)
  • …
  • Point to succ(n+2^(M-1))
  • Distance to the destination is always halved

Time to find data is logarithmic. Size of routing
tables is logarithmic. Example: log2(1,000,000) ≈ 20.
EFFICIENT!
28
Chord Routing (1/7)
  • Routing table size: M, where N = 2^M
  • Every node n knows successor(n + 2^(i-1)), for i =
    1..M
  • Routing entries = log2(N)
  • log2(N) hops from any node to any other node

[Ring diagram: identifier space 0-15; Get(15) issued]
29
Chord Routing (2/7)
  • Routing table size: M, where N = 2^M
  • Every node n knows successor(n + 2^(i-1)), for i =
    1..M
  • Routing entries = log2(N)
  • log2(N) hops from any node to any other node

[Ring diagram: identifier space 0-15; Get(15) being forwarded]
30
Chord Routing (3/7)
  • Routing table size: M, where N = 2^M
  • Every node n knows successor(n + 2^(i-1)), for i =
    1..M
  • Routing entries = log2(N)
  • log2(N) hops from any node to any other node

[Ring diagram: identifier space 0-15; Get(15) being forwarded]
31
Chord Routing (4/7)
  • From node 1, only 2 hops to node 0 where item 15
    is stored
  • For an id space of 16, the maximum is log2(16) =
    4 hops between any two nodes
  • In fact, if nodes are uniformly distributed, the
    maximum is log2(# of nodes), i.e. log2(8) hops
    between any two nodes
  • The average complexity is
  • ½ log2(# of nodes)

[Ring diagram: identifier space 0-15; Get(15) resolved]
32
Chord Routing (5/7) Pseudo code
findSuccessor(.)
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  •   if predecessor ≠ nil and id ∈ (predecessor, n]
      then return n
  •   if id ∈ (n, successor] then
  •     return successor
  •   else
  •     n' := closestPrecedingNode(id)
  •     return n'.findSuccessor(id)
  • // search locally for the highest predecessor of
    id
  • procedure closestPrecedingNode(id)
  •   for i = m downto 1 do
  •     if finger[i] ∈ (n, id) then return
        finger[i]
  •   end
  •   return n
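A runnable sketch of finger-based routing over the same toy ring (N = 16, M = 4). The node ids are the ones used throughout these slides; flattening the per-node state into dictionaries is an assumption for brevity:

```python
M = 4
N = 2 ** M  # identifier space of the slides' example
ids = [0, 2, 5, 6, 11]

def between(x, a, b, incl=False):
    """x in (a, b), or (a, b] when incl is set, on the ring modulo N."""
    if incl and x == b:
        return True
    return a < x < b if a < b else (x > a or x < b)

def succ_id(i):
    i %= N
    return next((n for n in ids if n >= i), ids[0])

# finger[i] of node n points at successor(n + 2**(i-1)), i = 1..M
finger = {n: [succ_id(n + 2 ** (i - 1)) for i in range(1, M + 1)] for n in ids}

def closest_preceding_node(n, ident):
    for f in reversed(finger[n]):  # for i = m downto 1
        if between(f, n, ident):
            return f
    return n

def find_successor(n, ident):
    s = succ_id(n + 1)  # the node's plain succ pointer
    if between(ident, n, s, incl=True):
        return s
    p = closest_preceding_node(n, ident)
    if p == n:
        return s
    return find_successor(p, ident)

print(find_successor(2, 9))  # routes 2 -> 6, then finds successor 11
```

Note that the lookup for id 9 starting at node 2 takes two hops via the fingers, versus four when only succ pointers are followed.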

33
Chord Discussion
  • We are basically done
  • But…
  • What about joins and failures/leaves?
  • Nodes come and go as they wish
  • What about data?
  • Should I lose my doc because some kid decided to
    shut down his machine and he happened to store my
    file? What about storing addresses of files
    instead of files?
  • What did we gain compared to Gnutella? Increased
    guarantees and determinism?
  • So actually we just started…

34
Agenda
  • Handling successor pointers
  • Joins, Leaves
  • Scalability
  • Routing table: reducing the cost from O(N) to
    O(log N)
  • Failures (for all the above)

35
Handling Successors: Ring Maintenance
  • Everything depends on successor pointers, so we
    had better have them right all the time!!
  • In Chord, in addition to the successor pointer,
    every node has a predecessor pointer as well for
    ring maintenance

36
Handling Dynamism
  • Periodic stabilization is used to make pointers
    eventually correct
  • Try pointing succ to closest alive successor
  • Try pointing pred to closest alive predecessor
  • When receiving notify(p) at n
  •   if pred = nil or p ∈ (pred, n)
  •     set pred := p
  • Periodically at n
  •   v := succ.pred
  •   if v ≠ nil and v ∈ (n, succ)
  •     set succ := v
  •   send a notify(n) to succ
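The two rules can be exercised in a small simulation; the ring 0 -> 5 -> 11 and the joining node 8 are assumed purely for illustration:

```python
succ, pred = {}, {}  # per-node succ/pred pointers; None models an unset pred

def between(x, a, b):
    """x in (a, b] modulo the ring."""
    return a < x <= b if a < b else (x > a or x <= b)

def notify(n, p):
    """Node n learns about a potential predecessor p."""
    if pred[n] is None or between(p, pred[n], n):
        pred[n] = p

def stabilize(n):
    """Periodically at n: tighten succ via succ.pred, then notify succ."""
    v = pred[succ[n]]
    if v is not None and between(v, n, succ[n]):
        succ[n] = v
    notify(succ[n], n)

# stable ring 0 -> 5 -> 11 -> 0 (identifier space modulo 16)
for a, b in [(0, 5), (5, 11), (11, 0)]:
    succ[a], pred[b] = b, a

# node 8 joins: lookup(8) returned successor 11
succ[8], pred[8] = 11, None

stabilize(8)  # 11 hears notify(8) and sets pred[11] = 8
stabilize(5)  # 5 reads pred[11] = 8, sets succ[5] = 8, notifies 8
print(succ[5], pred[11], pred[8])  # -> 8 8 5
```

Two stabilization rounds are enough here; in general, pointers become eventually correct as every node keeps running stabilize periodically.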

37
Handling joins
  • When n joins
  • Find n's successor with lookup(n)
  • Set succ to n's successor
  • Stabilization fixes the rest

[Ring diagram: nodes 11, 13, 15]
  • Periodically at n
  •   set v := succ.pred
  •   if v ≠ nil and v ∈ (n, succ)
  •     set succ := v
  •   send a notify(n) to succ
  • When receiving notify(p) at n
  •   if pred = nil or p ∈ (pred, n)
  •     set pred := p

S. Haridi, ID2210, Lecture 02
38
Handling Successors - Chord Algorithm
39
Handling Joins/Leaves for Fingers: Finger
Stabilization (1/5)
  • Periodically refresh finger table entries, and
    store the index of the next finger to fix
  • This is also the initialization procedure for the
    finger table (copy the finger table of succ, then
    fix)
  • Local variable next, initially 0
  • procedure n.fixFingers()
  •   next := next + 1
  •   if next > m then next := 1
  •   finger[next] := findSuccessor(n + 2^(next-1))
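A sketch of the round-robin refresh for node 0's finger table. Replacing findSuccessor with a static lookup over the example ring is an assumption for illustration; in a real node it would be the routed lookup:

```python
M = 4
N = 2 ** M
ids = [0, 2, 5, 6, 11]  # the slides' example ring

def find_successor(i):
    """Stand-in for the routed lookup: successor over a static node set."""
    i %= N
    return next((n for n in ids if n >= i), ids[0])

finger = {}
next_idx = 0  # "local variable next, initially 0"

def fix_fingers(n):
    """One periodic round: refresh a single finger entry of node n."""
    global next_idx
    next_idx += 1
    if next_idx > M:
        next_idx = 1
    finger[next_idx] = find_successor(n + 2 ** (next_idx - 1))

for _ in range(M):  # after M periodic rounds the whole table is refreshed
    fix_fingers(0)
print(finger)  # -> {1: 2, 2: 2, 3: 5, 4: 11}
```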

40
Example: finger stabilization (2/5)
  • Current situation: succ(N48) is N60
  • succ(N21.Finger[j].start) = succ(53) =
    N21.Finger[j].node = N60

[Ring diagram: nodes N21, N26, N32, N48, N60;
N21.Finger[j].start = 53, N21.Finger[j].node = N60]
41
Example: finger stabilization (3/5)
  • New node N56 joins and stabilizes its successor
    pointer
  • Finger j of node N21 is now wrong
  • N21 eventually tries to fix finger j by looking up
    53, which stops at N48, and nothing changes

[Ring diagram: nodes N21, N26, N32, N48, N56, N60;
N21.Finger[j].node still points at N60]
42
Example: finger stabilization (4/5)
  • N48 will eventually stabilize its successor
  • This means the ring is correct now

[Ring diagram: nodes N21, N26, N32, N48, N56, N60;
N48's successor is now N56]
43
Example: finger stabilization (5/5)
  • When N21 tries to fix finger j again, this time
    the response from N48 will be correct and N21
    corrects the finger

[Ring diagram: nodes N21, N26, N32, N48, N56, N60;
N21.Finger[j].node now points at N56]
44
Agenda
  • Handling successor pointers
  • Joins, Leaves
  • Scalability
  • Routing table: reducing the cost from O(N) to
    O(log N)
  • Failures (for all the above)
  • Handling data
  • Joins, Leaves

45
Handling Failures: Replication of Successors
  • Evidently, the failure of one successor pointer
    means total collapse
  • Solution: a node has a successor list of size
    r containing its r immediate successors
  • How big should r be? log2(# of nodes) or a large
    constant should be ok
  • Enhance periodic stabilization to handle failures

46
Dealing with failures
  • Each node keeps a successor-list
  • Pointers to the r closest successors
  • succ(n+1)
  • succ(succ(n+1)+1)
  • succ(succ(succ(n+1)+1)+1)
  • ...
  • If successor fails
  • Replace with closest alive successor
  • If predecessor fails
  • Set pred to nil
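A minimal sketch of the failover rule, with an assumed ring 0 -> 5 -> 8 -> 11 in which node 8 has crashed:

```python
alive = {0: True, 5: True, 8: False, 11: True}  # node 8 has crashed

# node 5's successor-list of length r = 3 on the ring 0 -> 5 -> 8 -> 11
successor_list = {5: [8, 11, 0]}

def first_alive_successor(n):
    """Replace a failed succ with the closest alive entry in the list."""
    for s in successor_list[n]:
        if alive[s]:
            return s
    raise RuntimeError("all r successors failed; the ring is lost")

print(first_alive_successor(5))  # -> 11
```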

47
Handling leaves
  • When n leaves
  • Just disappear (like failure)
  • When pred detected failed
  • Set pred to nil
  • When succ detected failed
  • Set succ to closest alive in successor list

[Ring diagram: nodes 11, 13, 15]
  • Periodically at n
  •   set v := succ.pred
  •   if v ≠ nil and v ∈ (n, succ)
  •     set succ := v
  •   send a notify(n) to succ
  • When receiving notify(p) at n
  •   if pred = nil or p ∈ (pred, n)
  •     set pred := p

48
Handling Failures - Ring (1/5)
  • Maintaining the ring
  • Each node maintains a successor list of length r
  • If a node's immediate successor fails, it uses
    the second entry in its successor list
  • updateSuccessorList copies a successor list from
    s, removing the last entry and prepending s
  • Join a Chord ring containing node n'
  • procedure n.join(n')
  •   predecessor := nil
  •   s := n'.findSuccessor(n)
  •   updateSuccessorList(s.successorList)

49
Handling Failures - Ring (2/5)
  • Check whether predecessor has failed (failure
    detector)
  • procedure n.checkPredecessor()
  •   if predecessor has failed then
  •     predecessor := nil

50
Handling Failures - Ring (3/5)
  • procedure n.stabilize()
  •   s := first alive node in successorList
  •   x := s.predecessor
  •   if x ≠ nil and x ∈ (n, s) then s := x end
  •   updateSuccessorList(s.successorList)
  •   s.notify(n)
  • procedure n.notify(n')
  •   if predecessor = nil or
      n' ∈ (predecessor, n) then predecessor := n'

51
Failure - Ring (4/5) Example: Node failure (N26)
  • Initially

[Ring diagram: N21 -> N26 -> N32; suc(N21,1) = N26,
suc(N21,2) = N32, suc(N26,1) = N32, pred(N32) = N26]
  • After N21 performed stabilize(), before
    N21.notify(N32)

[Ring diagram: N26 failed; suc(N21,1) = N32,
pred(N32) still N26]
52
Failure - Ring (5/5) Example: Node failure (N26)
  • After N21 performed stabilize(), before
    N21.notify(N32)
  • N21.notify(N32) has no effect

[Ring diagram: suc(N21,1) = N32; pred(N32) still N26]
  • After N32.checkPredecessor()

[Ring diagram: suc(N21,1) = N32; pred(N32) = nil]
  • Next N21.stabilize() fixes N32's predecessor

53
Failure - Lookups (1/5)
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  •   if id ∈ (n, successor] then
  •     return successor
  •   else
  •     n' := closestPrecedingNode(id)
  •     return try
        n'.findSuccessor(id) catch failure of n'
        then mark n' in finger[.] as
        failed; n.findSuccessor(id)
  • // search locally for the highest predecessor of
    id
  • procedure closestPrecedingNode(id)
  •   for i = m downto 1 do
  •     if finger[i].node is alive and
        finger[i] ∈ (n, id) then return finger[i]
  •   end
  •   return n
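A runnable sketch of a failure-tolerant lookup on the toy ring (N = 16). The scenario is assumed for illustration: node 6 has crashed, successor pointers have already been repaired via successor lists, but finger tables still contain the dead node; the try/catch retry is modeled by marking and skipping fingers that turn out to be dead:

```python
N, M = 16, 4
alive = {0: True, 2: True, 5: True, 6: False, 11: True}  # node 6 crashed
ids = sorted(alive)

def betw(x, a, b, incl=False):
    """x in (a, b) on the ring, or (a, b] when incl is set."""
    if incl and x == b:
        return True
    return a < x < b if a < b else (x > a or x < b)

def succ_among(pool, i):
    i %= N
    return next((n for n in pool if n >= i), pool[0])

# fingers were built before the crash, so they may still point at node 6
finger = {n: [succ_among(ids, n + 2 ** i) for i in range(M)] for n in ids}
# successor pointers have been repaired through the successor lists
live = sorted(n for n in ids if alive[n])
successor = {n: succ_among(live, n + 1) for n in live}
failed = set()  # fingers discovered dead during lookups

def find_successor(n, ident):
    if betw(ident, n, successor[n], incl=True):
        return successor[n]
    for f in reversed(finger[n]):
        if f in failed or not betw(f, n, ident):
            continue
        if not alive[f]:  # the "RPC" to f fails: mark it and try the next finger
            failed.add(f)
            continue
        return find_successor(f, ident)
    return successor[n]

print(find_successor(2, 9), sorted(failed))  # -> 11 [6]
```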

54
Some Chord Results: Load balancing of keys
  • For any set of N nodes and K keys, with high
    probability
  • Each node is responsible for at most (1 + ε)K/N
    keys
  • When an (N + 1)st node joins or leaves the
    network, responsibility for O(K/N) keys changes
    hands (and only to or from the joining or leaving
    node)
  • ε is bounded by (at most) O(log N)

55
Some Chord Results: Load balancing of keys
  • ε is reduced to a small constant by running log N
    virtual nodes (each with its own identifier) on
    each physical node

56
Some Chord Results: Lookup is logarithmic in the
number of nodes
  • With high probability, the number of nodes that
    must be contacted to find a successor in an
    N-node network is O(log N)
  • This holds only if node and key identifiers are
    random

57
Some Chord Results: Successor List Failure
  • If we use a successor list of length r = Ω(log N)
    in a network that is initially stable, and then
    every node fails with probability 1/2, then with
    high probability findSuccessor returns the
    closest living successor to the query key
  • Notice this requires that the nodes in the
    successor list are random

58
Variations of Chord
  • DKS
  • Chord#

59
DKS Routing
  • Generalization of Chord to provide arbitrary
    arity
  • Provides logk(n) hops per lookup
  • k being a configurable parameter
  • n being the number of nodes
  • Instead of only log2(n)

60
Achieving logk(N) lookup
  • Each node has logk(N) = L levels, N = k^L
  • Each level contains k intervals
  • Example: k = 4, N = 64 (= 4^3), node 0

[Ring diagram: interval boundaries at 0, 16, 32, 48
(level 1) and 0, 4, 8, 12 (level 2)]
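A sketch of how the level intervals can be computed, assuming each node subdivides its own enclosing interval at every level (the boundary values reproduce the figure for node 0, k = 4, N = 64):

```python
k, N, L = 4, 64, 3  # arity, identifier space, levels: N = k**L

def intervals(node, level):
    """The k intervals that `node` sees at the given level (1..L)."""
    width = N // k ** level           # interval width shrinks by k per level
    base = node - node % (width * k)  # start of the node's parent interval
    return [(base + i * width, base + (i + 1) * width) for i in range(k)]

for level in range(1, L + 1):
    print(level, [lo for lo, hi in intervals(0, level)])
# level 1 -> starts at 0, 16, 32, 48
# level 2 -> starts at 0, 4, 8, 12
# level 3 -> starts at 0, 1, 2, 3
```

With one pointer per interval per level, a lookup resolves one level per hop, giving the logk(N) bound.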
61
Achieving logk(N) lookup
  • Each node has logk(N) = L levels, N = k^L
  • Each level contains k intervals
  • Example: k = 4, N = 64 (= 4^3), node 0

[Ring diagram: interval boundaries 0, 4, 8, 12, 16, 32, 48]
62
Achieving logk(N) lookup
  • Each node has logk(N) = L levels, N = k^L
  • Each level contains k intervals
  • Example: k = 4, N = 64 (= 4^3), node 0

[Ring diagram: interval boundaries 0, 4, 8, 12, 16, 32, 48]
63
Arity is Important
  • Maximum number of hops can be configured
  • Example: a 2-hop system

64
Chord#
  • The routing table has exponentially increasing
    pointers on the ring (node space) and NOT the
    identifier space (skip-list like structure)

65
Routing Table of Chord#
  • Building the routing table
  • log2(N) pointers
  • exponentially spaced pointers
66
Chord vs. Chord#
Good for load balancing
67
Effect of virtual nodes
68
Stretch (proximity routing)
  • Stretch is the ratio between
  • the latency of a Chord lookup, from the time the
    lookup is initiated to the time the result is
    returned to the initiator, and
  • the latency of an optimal lookup using the
    underlying network
  • Network lookup latency
  • is computed as the round-trip time between the
    initiator and the server responsible for the
    queried ID

69
Stretch