Principles of Reliable Distributed Systems, Tutorial 4: SkipNet

Slides: 34
Provided by: idi3
Learn more at: http://eecourses.technion.ac.il

1
Principles of Reliable Distributed Systems
Tutorial 4: SkipNet
  • Spring 2008
  • Alex Shraer

2
Reading Material
  • SkipNet: A Scalable Overlay Network with
    Practical Locality Properties. Harvey, Jones,
    Saroiu, Theimer, Wolman. Microsoft Research

3
Reminder: DHT Advantages
  • Peer-to-peer: no centralized control or
    infrastructure
  • Scalability: O(log N) routing hops, routing table
    size, and join time
  • Load-balancing
  • Overlay robustness

4
DHT Disadvantages: SkipNet Motivation
  • No control over where data is stored
  • Data may be stored far from its users
  • Data may be stored outside its administrative
    domain
  • This makes privileges hard to administer
  • It invites different security attacks
  • Local accesses may leave the local organization
  • In practice, organizations want:
  • Content locality: explicitly place data where we
    want (inside the organization)
  • Path locality: guarantee that local traffic (a
    user in the organization looks for a file of the
    organization) remains local
  • No prefix search
  • Search(key) would return the files whose names
    have key as a prefix

5
Practical Requirements
  • Data controllability
  • Organizations want control over their own data,
    even if local data is globally available
  • Manageability
  • Data control allows for data administration,
    provisioning and manageability

6
Practical Requirements (contd)
  • Security
  • Content and path locality are key building blocks
    for dealing with certain external attacks (DoS,
    traffic analysis)
  • Data availability
  • Local data survives network partitions
  • Performance
  • Data can be stored near the clients that use it

7
SkipNet: Content Locality
  • Place files at nodes according to names
  • Name ID space (DNS-like), used for both files and
    nodes
  • Node name = reverse DNS name of the host
    (com.microsoft.host1)
  • File names have the same prefix as the node that
    stores them
  • Problem?
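
Name-based placement can be sketched as follows. This is a minimal illustration, not SkipNet's actual code: it assumes a file is stored on the node whose name is the closest one at or before the file name in sorted order. All node and file names are hypothetical apart from com.microsoft.host1.

```python
from bisect import bisect_right

# Hypothetical node names (reverse DNS), kept sorted by name ID.
NODE_NAMES = sorted([
    "com.microsoft.host1",
    "com.microsoft.host2",
    "com.sun.host1",
    "edu.ucb.host1",
])

def node_for_file(file_name: str) -> str:
    """Return the node whose name is the closest one at or before file_name.

    Assumption of this sketch: a file name that sorts before every node
    name wraps around to the last node.
    """
    i = bisect_right(NODE_NAMES, file_name)
    return NODE_NAMES[i - 1]  # i == 0 wraps to NODE_NAMES[-1]

print(node_for_file("com.microsoft.host1/skipnet.html"))
```

In this sketch every file that shares a node's name prefix lands on that single node.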

8
Constrained Load-Balancing
  • Data is uniformly distributed over a designated
    subset of nodes
  • e.g., inside an organization
  • How can this be achieved?
  • Numeric ID space!
  • Similar to Chord, Pastry and others
  • Nodes are randomly distributed
  • Hashes of the node names and content identifiers
    are mapped into the numeric ID space.
  • Content is stored on the node whose ID is closest
    to the content's hashed name.
  • Key property of SkipNet: two address spaces
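
The numeric ID idea can be illustrated with a small sketch. The 16-bit ID width, the use of SHA-1, and the host and file names are assumptions made for the example only; how evenly the load spreads depends on where the node IDs happen to fall.

```python
import hashlib
from collections import Counter

BITS = 16  # assumed numeric ID width for this sketch

def numeric_id(name: str) -> int:
    """Hash a name into the numeric ID space (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - BITS)

def closest_node(content_name: str, nodes: list[str]) -> str:
    """Pick the node whose numeric ID is closest to the content's hashed name."""
    target = numeric_id(content_name)
    # The node name breaks ties so that the result is deterministic.
    return min(nodes, key=lambda n: (abs(numeric_id(n) - target), n))

# Hypothetical designated subset: the nodes of one organization.
subset = [f"com.microsoft.host{i}" for i in range(1, 5)]
load = Counter(closest_node(f"file{i}.html", subset) for i in range(1000))
print(load)
```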

9
Skip Lists - Reminder
  • In-memory dictionary data structure
  • Sorted linked list in which a subset of nodes has
    additional links that skip over many list elements
  • Perfect (deterministic) skip list
  • A pointer at level h skips over 2^h elements
  • Search: O(log N), where N is the number of nodes
    in the list
  • Insertion/deletion is expensive and awkward

10
Skip Lists - Reminder
  • Probabilistic skip list
  • A node is at level h with probability 1/2^h
  • Search, insert, delete: O(log N) w.h.p.
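
As a reminder of how the probabilistic variant works, here is a minimal in-memory sketch. The class names, the height cap of 16 and the sentinel head node are choices made for this example.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[h] is the successor at level h

class SkipList:
    MAX_HEIGHT = 16  # assumed cap, enough for roughly 2**16 elements

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel, holds no key
        self.rng = random.Random(seed)

    def _random_height(self):
        # A node has a pointer at level h (from 0) with probability 1/2**h.
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        # Walk from the top level down, recording at every level the last
        # node whose key is smaller than the search key.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for h in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[h] is not None and node.next[h].key < key:
                node = node.next[h]
            preds[h] = node
        return preds

    def contains(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        if self.contains(key):
            return
        preds = self._predecessors(key)
        node = Node(key, self._random_height())
        for h in range(len(node.next)):
            node.next[h] = preds[h].next[h]
            preds[h].next[h] = node

    def delete(self, key):
        preds = self._predecessors(key)
        target = preds[0].next[0]
        if target is None or target.key != key:
            return
        for h in range(len(target.next)):
            preds[h].next[h] = target.next[h]

sl = SkipList(seed=0)
for key in [5, 1, 9, 3]:
    sl.insert(key)
print(sl.contains(9), sl.contains(4))
```

Note that every search starts at the head node, which is the property the next slide lists as a drawback.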

11
Skip List: Good for Us?
  • The Good
  • Sorted list: path locality for name-based search
  • O(log N) search with skip pointers
  • Up to log(N) skip pointers per node: O(log N)
    insertion
  • The Bad
  • Lookup starts from the root only
  • Unequal load
  • Nodes on the top levels have a high chance of
    being on the routing path

12
SkipNet Global View
[Figure: the full SkipNet routing infrastructure for an 8-node system, including the ring labels, at levels L = 0 through L = 3.]
13
SkipNet Structure
  • Skip graph: a distributed skip list
  • Every node belongs to rings at all levels
  • Search can start at any node
  • Use doubly linked lists at each level to account
    for the absence of head and tail nodes.
  • Perfect vs. probabilistic
  • Perfect: pointers at level h point to nodes that
    are exactly 2^h nodes to the left and right.
  • Probabilistic: a node at level h
    probabilistically determines which ring it
    belongs to.
  • All rings are sorted according to name IDs
  • Ring membership is determined by numeric IDs
  • All nodes sharing the same numeric ID prefix of
    length h are members of the same ring at level
    h
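
The ring rules above can be made concrete with a small sketch. The numeric ID assignment below is an assumption: the figures do not survive in this transcript, so the IDs were only chosen to agree with the routing tables of nodes A and V listed in the slides.

```python
# Assumed numeric IDs for the eight example nodes (illustrative only).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "O": "110", "Z": "111"}

def ring(prefix: str) -> list[str]:
    """Members of the ring labelled by a numeric ID prefix, sorted by name ID."""
    return sorted(n for n, num in NUMERIC_ID.items() if num.startswith(prefix))

def routing_table(name: str, levels: int = 3) -> dict[int, tuple[str, str]]:
    """Map level h to the (clockwise, counterclockwise) neighbors of a node."""
    table = {}
    for h in range(levels):
        members = ring(NUMERIC_ID[name][:h])  # the level-h ring of this node
        i = members.index(name)
        table[h] = (members[(i + 1) % len(members)],
                    members[(i - 1) % len(members)])
    return table

for h, (cw, ccw) in sorted(routing_table("A").items(), reverse=True):
    print(h, cw, ccw)
```

Each node needs only its own neighbors at every level, so no node plays the role of a list head.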

14
SkipNet Routing Tables
[Figure: the rings at each level. L = 3: Rings 000 through 111. L = 2: Rings 00, 01, 10 and 11. L = 1: Rings 0 and 1. L = 0: root ring. Node A's routing table is marked.]
15
An Alternative View
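[Figure: the eight nodes A, D, M, O, T, V, X, Z ordered by name ID, each labelled with a 3-bit numeric ID (000 through 111).]
Node A's routing table:
Level  Clockwise  Counterclockwise
2      T          T
1      M          X
0      D          Z
Node V's routing table:
Level  Clockwise  Counterclockwise
2      D          D
1      Z          O
0      X          T
SkipNet nodes ordered by name ID. Routing tables
of nodes A and V shown.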
16
Routing By Name ID
  • Routing in a skip graph resembles search in a
    skip list
  • Simple rule
  • Forward the message to the node that is closest
    to the destination, without going too far.
  • Route either clockwise or counterclockwise
  • Terminates when the message arrives at a node
    whose name ID is closest to the destination.
  • Number of hops is O(log N) w.h.p.

17
Example Routing from A to V
[Figure: routing from A to V on the ring hierarchy, from the root ring (L = 0) up to Rings 000 through 111 (L = 3).]
18
Example Routing from A to V
[Figure: routing from A to V on the ring hierarchy, from the root ring (L = 0) up to Rings 000 through 111 (L = 3). Node T's routing table is marked.]
19
Example Routing from A to V
[Figure: routing from A to V on the ring hierarchy, from the root ring (L = 0) up to Rings 000 through 111 (L = 3).]
20
Example Routing to Object
[Figure: routing to an object on the ring hierarchy, from the root ring (L = 0) up to Rings 000 through 111 (L = 3).]
21
Name ID Routing Algorithm
// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg)
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight
  while (h >= 0)
    nbr = localNode.RouteTable[msg.dir][h]
    if (LiesBetween(localNode.nameID, nbr.nameID,
                    msg.nameID, msg.dir))
      SendToNode(msg, nbr)
      return
    h = h - 1
  // h < 0 implies we are the closest node.
  DeliverMessage(msg.msg)

SendMsg(nameID, msg)
  if (LongestPrefix(nameID, localNode.nameID) == 0)
    msg.dir = RandomDirection()    // load balancing
  else if (nameID < localNode.nameID)
    msg.dir = counterClockwise     // path locality
  else
    msg.dir = clockwise            // path locality
  msg.nameID = nameID
  RouteByNameID(msg)
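
The name ID routing rule can be tried out on the 8-node example with the runnable sketch below. It simulates the hops in one loop instead of sending messages, and it assumes a numeric ID assignment chosen only to agree with the routing tables of nodes A and V given in the slides. The helper names are made up for this example.

```python
# Assumed numeric IDs (illustrative; consistent with the tables of A and V).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "O": "110", "Z": "111"}
MAX_HEIGHT = 2  # highest level at which a ring has more than one member

def neighbor(name, h, clockwise):
    """Neighbor of a node in its level-h ring (rings are sorted by name ID)."""
    members = sorted(n for n, num in NUMERIC_ID.items()
                     if num.startswith(NUMERIC_ID[name][:h]))
    step = 1 if clockwise else -1
    return members[(members.index(name) + step) % len(members)]

def on_arc(start, x, end):
    """True if x lies on the clockwise arc (start, end] of the name ring."""
    def key(n):
        return (n < start, n)  # names before start wrap around to the end
    return x != start and key(x) <= key(end)

def lies_between(cur, nbr, dest, clockwise):
    """True if nbr comes after cur and no later than dest in the direction."""
    if clockwise:
        return on_arc(cur, nbr, dest)
    # Counterclockwise from cur to dest is clockwise from dest to cur.
    return nbr != cur and (nbr == dest or on_arc(dest, nbr, cur))

def route_by_name_id(source, dest, clockwise):
    """Return the nodes visited; the last one delivers the message."""
    path = [source]
    node = source
    while True:
        for h in range(MAX_HEIGHT, -1, -1):  # try the longest pointer first
            nbr = neighbor(node, h, clockwise)
            if lies_between(node, nbr, dest, clockwise):
                node = nbr
                path.append(node)
                break
        else:
            return path  # no pointer gets closer: deliver here

print(route_by_name_id("A", "V", clockwise=True))
print(route_by_name_id("A", "V", clockwise=False))
```

Under these assumed IDs the clockwise route from A to V passes through T, and the counterclockwise route passes through X.
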
22
Routing By Numeric ID
  • Numeric IDs are random; no ring is sorted by
    them
  • We can't route top-down!
  • Bottom-up routing
  • Routing begins in the level 0 ring and continues
    until a node is found whose numeric ID matches
    the destination numeric ID in the first digit.
  • Messages are forwarded from a ring at level h,
    R_h, to a ring at level h+1, R_(h+1), such that
    the nodes in R_(h+1) share h+1 digits with the
    destination numeric ID.
  • Terminates when the message is delivered, or
    when none of the nodes in R_h share h+1 digits
    with the destination numeric ID; the message
    then ends at the node in R_h with the closest
    numeric ID.

23
Example Routing by Numeric ID
[Figure: the ring hierarchy from the root ring (L = 0) up to Rings 000 through 111 (L = 3), with the member nodes A, D, M, O, T, V, X, Z shown in each ring.]
  • Hash(Foo.c) = 101

24
Routing by Numeric ID
  • The same routing tables are used for routing by
    name ID and by numeric ID
  • When numeric IDs are binary, in each ring R_h
    only 2 nodes are visited in expectation before
    encountering one that belongs to the next ring
    R_(h+1)
  • The number of message hops is O(log N) w.h.p.

25
Routing Algorithm
// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
// Initially: msg.ringLvl = -1,
// msg.startNode = msg.bestNode = null,
// msg.finalDestination = false
RouteByNumericID(msg)
  if (msg.numID == localNode.numID ||
      msg.finalDestination)
    DeliverMessage(msg.msg)
    return
  if (localNode == msg.startNode)
    // Done traversing current ring.
    msg.finalDestination = true
    SendToNode(msg, msg.bestNode)
    return
  h = CommonPrefixLen(msg.numID, localNode.numID)
  if (h > msg.ringLvl)
    // Found a higher ring.
    msg.ringLvl = h
    msg.startNode = msg.bestNode = localNode
  else if (abs(localNode.numID - msg.numID) <
           abs(msg.bestNode.numID - msg.numID))
    // Found a better candidate for current ring.
    msg.bestNode = localNode
  // Otherwise the message keeps moving along the
  // current ring (level msg.ringLvl).
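
A runnable sketch of bottom-up routing on the 8-node example follows. It simulates the hops in one loop and always forwards clockwise along the current ring. The numeric IDs are an assumption, chosen only to agree with the routing tables of nodes A and V given in the slides, and the helper names are made up for this example.

```python
# Assumed numeric IDs (illustrative; consistent with the tables of A and V).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "O": "110", "Z": "111"}

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def clockwise_neighbor(nodes, name, h):
    """Next node, by name ID, in the level-h ring that contains name."""
    members = sorted(n for n, num in nodes.items()
                     if num.startswith(nodes[name][:h]))
    return members[(members.index(name) + 1) % len(members)]

def distance(a, b):
    return abs(int(a, 2) - int(b, 2))

def route_by_numeric_id(nodes, source, num_id):
    """Return the nodes visited; the last one delivers the message."""
    ring_lvl, start, best, final = -1, None, None, False
    node, path = source, [source]
    while True:
        if nodes[node] == num_id or final:
            return path
        if node == start:  # done traversing the current ring
            final = True
            node = best
        else:
            h = common_prefix_len(num_id, nodes[node])
            if h > ring_lvl:  # found a higher ring
                ring_lvl, start, best = h, node, node
            elif distance(nodes[node], num_id) < distance(nodes[best], num_id):
                best = node  # better candidate within the current ring
            node = clockwise_neighbor(nodes, node, ring_lvl)
        path.append(node)

print(route_by_numeric_id(NUMERIC_ID, "A", "101"))
```

With the assumed IDs, a message for numeric ID 101 that starts at A visits A, D and V. A node that is alone in its ring appears more than once in the returned path, because it forwards the message to itself.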

26
Base (k) for Numeric IDs
  • If a higher base k > 2 is used for numeric IDs,
    routing takes O(k log_k N) hops w.h.p.
  • When we increase k:
  • more rings in each level
  • fewer levels
  • fewer pointers in the routing table
  • less state but more hops
  • Optimization: dense routing table (R-Table)
  • Unlike the normal (sparse) R-Table, it keeps k-1
    pointers to contiguous nodes in both directions
    at each level.
  • More state but fewer hops
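
The trade-off can be checked with a few lines of arithmetic. This sketch only evaluates the k log_k N term with constants dropped; the network size of 65536 nodes is an arbitrary example.

```python
import math

def levels(n: int, k: int) -> float:
    """Approximate number of ring levels, log_k(n)."""
    return math.log2(n) / math.log2(k)

def hop_term(n: int, k: int) -> float:
    """The k * log_k(n) term of the O(k log_k N) bound (constants dropped)."""
    return k * levels(n, k)

for k in (2, 4, 16):
    print(k, levels(65536, k), hop_term(65536, k))
```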

27
Node Join
  • Two-stage process: (1) bottom-up, (2) top-down
  • Bottom-up: find the top-level ring that matches
    the node's numeric ID.
  • Top-down: build the new node's routing table
  • Find a neighbor in the top ring using name ID
    search.
  • Starting from this neighbor, search for the name
    ID at the next lower level and thus find the
    neighbors at that level.
  • Repeat until the search reaches the root.
  • The existing nodes' routing tables are updated
    after the new node has joined the root ring.
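
The two stages can be sketched on the 8-node example as follows. This is a simplification: it computes the new node's neighbors from a global view instead of exchanging messages, and it assumes numeric IDs chosen only to agree with the routing tables of nodes A and V given in the slides. The function names are hypothetical.

```python
from bisect import bisect_left

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def join(nodes, name, num_id):
    """Add a node and return its table: level -> (clockwise, counterclockwise).

    nodes maps name ID to numeric ID and must not be empty.
    """
    # Stage 1 (bottom-up): the highest level whose ring for num_id
    # already has at least one member.
    top = max(common_prefix_len(num_id, other) for other in nodes.values())
    table = {}
    # Stage 2 (top-down): find the name ID neighbors ring by ring.
    for level in range(top, -1, -1):
        members = sorted(n for n, num in nodes.items()
                         if num.startswith(num_id[:level]))
        i = bisect_left(members, name)  # position the new name would take
        table[level] = (members[i % len(members)],
                        members[(i - 1) % len(members)])
    nodes[name] = num_id
    return table

# The example system without V (assumed numeric IDs), then V joins.
nodes = {"A": "000", "T": "001", "M": "010", "X": "011",
         "D": "100", "O": "110", "Z": "111"}
print(join(nodes, "V", "101"))
```

Under these assumptions the table computed for V has the same rows as V's routing table in the slides.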

28
Node join illustrated
[Figure: a joining node, Ring P and the higher-level Rings P0 and P1. Label: only a few nodes are traversed in expectation.]
29
Node Join - Analysis
  • Key ideas
  • Climb to a weakly populated ring.
  • Search for the node's neighbors at the lower
    levels only after finding the neighbors at the
    higher levels.
  • The range of nodes traversed at a level is the
    range between the neighbors found at the next
    higher level.
  • Insertion traverses O(log N) hops w.h.p.
  • Expected O(log N) levels, with a constant number
    of neighbors at each level.

30
Node Departure/Failure
  • Graceful (notified) vs. crash departure
  • Key issue: routing table updates
  • Key idea: separate vital information from
    optimizations
  • Routing is correct as long as the root-level ring
    is maintained.
  • Other levels are regarded as optimization hints
  • Does this remind you of something?
  • Upper-ring membership is maintained through a
    background repair process.

31
Leaf Sets
  • Idea: use redundant pointers at level 0
  • Store L/2 pointers in each direction
  • SkipNet uses L = 16
  • Not an original SkipNet idea: also used in Pastry
  • Protects from independent failures
  • Improves search performance
  • Route directly using the leaf set once within L/2
    of the target
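
A leaf set is simple to sketch. The function names and the way failures are represented here are assumptions of this example, not SkipNet's interface.

```python
def leaf_set(root_ring: list[str], name: str, size: int = 16):
    """Return the (clockwise, counterclockwise) leaf pointers of a node.

    root_ring must be sorted by name ID. size is L, so L/2 pointers are
    kept in each direction (fewer when the ring is small).
    """
    n = len(root_ring)
    half = min(size // 2, n - 1)
    i = root_ring.index(name)
    clockwise = [root_ring[(i + d) % n] for d in range(1, half + 1)]
    counterclockwise = [root_ring[(i - d) % n] for d in range(1, half + 1)]
    return clockwise, counterclockwise

def next_live(pointers: list[str], failed: set[str]):
    """First leaf pointer that has not failed, or None if all have failed."""
    return next((p for p in pointers if p not in failed), None)

ring = ["A", "D", "M", "O", "T", "V", "X", "Z"]
cw, ccw = leaf_set(ring, "A", size=4)
print(cw, ccw, next_live(cw, failed={"D"}))
```

If the immediate neighbor fails, the node can still reach the ring through the next pointer in its leaf set.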

32
Constrained Load Balancing (CLB)
  • Multiple DHTs with differing scopes using a
    single SkipNet structure
  • A result of the ability to route in both address
    spaces
  • Divide data object names into two parts with !:
    the CLB domain and the CLB suffix
  • microsoft.com!skipnet.html: CLB domain
    microsoft.com, CLB suffix skipnet.html
  • microsoft.com/skipnet.html!: controlled
    placement
  • !microsoft.com/skipnet.html: global DHT

33
CLB Example
[Figure: the SkipNet ring divided into the segments com.microsoft, com.sun, gov.irs and edu.ucb.]
  • File ID: com.microsoft!skipnet.html
  • Route by name ID to com.microsoft
  • Inside com.microsoft, route by numeric ID to
    hash(skipnet.html)
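
The two steps of a CLB lookup can be sketched as below. Several things are assumptions of this example: node numeric IDs are taken as hashes of the node names, SHA-1 and the 16-bit ID width are arbitrary, the host names are hypothetical, and only file IDs of the form domain!suffix (or !suffix for the global DHT) are handled.

```python
import hashlib

def numeric_id(name: str, bits: int = 16) -> int:
    """Hash a name into an assumed 16-bit numeric ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)

def clb_lookup(file_id: str, node_names: list[str]) -> str:
    """Return the node that stores file_id, written as 'domain!suffix'."""
    if "!" not in file_id:
        raise ValueError("expected a '!' between CLB domain and CLB suffix")
    domain, _, suffix = file_id.partition("!")
    # Step 1: the name ID narrows the search to the domain's segment.
    # An empty domain means every node, i.e. a global DHT.
    segment = [n for n in node_names
               if not domain or n == domain or n.startswith(domain + ".")]
    if not segment:
        raise LookupError(f"no node in CLB domain {domain!r}")
    # Step 2: inside the segment, pick the closest numeric ID.
    target = numeric_id(suffix)
    return min(segment, key=lambda n: (abs(numeric_id(n) - target), n))

# Hypothetical hosts in the four organizations of the example.
NODES = ["com.microsoft.host1", "com.microsoft.host2", "com.sun.host1",
         "edu.ucb.host1", "gov.irs.host1"]
print(clb_lookup("com.microsoft!skipnet.html", NODES))
```

The file's placement is load-balanced by hashing, yet it stays on a node inside com.microsoft.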

34
SkipNet Path Locality
[Figure: the SkipNet ring divided into the segments com.microsoft, com.sun, gov.irs and edu.ucb.]
  • Organizations correspond to contiguous SkipNet
    segments
  • Internal routing by NameID remains internal
  • Nodes have left / right pointers