Title: Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
1Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
2A peer-to-peer storage problem
- 10000 scattered music enthusiasts
- Willing to store and serve replicas
- Contributing resources, e.g., storage, bandwidth, etc.
- How do you find the data?
- Efficient Lookup mechanism needed!
3The lookup problem
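[Figure: nodes N1 to N6 connected through the Internet. A publisher holds the pair (key = title, value = MP3 data); a client issues Lookup(title) and must discover which node has it.]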
4Centralized lookup (Napster)
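[Figure: the publisher at N4 holds (key = title, value = MP3 data) and registers SetLoc(title, N4) with a central DB; the client sends Lookup(title) to the DB. Nodes N1 to N3 and N6 to N9 are also shown.]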
Simple, but O(N) state and a single point of failure. Legal issues!
5Napster Publish
[Figure: peer 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), ...]
6Napster Search
[Figure: a client asks the central server "Where is file A?"; the server answers search(A) --> 123.2.0.18.]
7Napster
- Central Napster server
- Can ensure correct results?
- Bottleneck for scalability
- Single point of failure
- Susceptible to denial of service
- Malicious users
- Lawsuits, legislation
- Search is centralized
- File transfer is direct (peer-to-peer)
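The publish and search steps on the Napster slides can be sketched as a single in-memory index. This is a toy model, not Napster's actual protocol: the class name `CentralIndex` and its method names are hypothetical, and the addresses are the ones shown on the slides.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a central lookup server (hypothetical names)."""

    def __init__(self):
        # title -> set of peer addresses that announced the title
        self._locations = defaultdict(set)

    def insert(self, title, peer_addr):
        """A peer announces that it stores `title`."""
        self._locations[title].add(peer_addr)

    def search(self, title):
        """Return the peers known to hold `title` (possibly none)."""
        return sorted(self._locations.get(title, ()))


index = CentralIndex()
for title in ("X", "Y", "Z"):          # "I have X, Y, and Z!"
    index.insert(title, "123.2.21.23")
index.insert("A", "123.2.0.18")

print(index.search("A"))               # where is file A?
```

All state lives in one process, which is why the slide lists the server as both a bottleneck and a single point of failure; the file transfer itself would still go directly between peers.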
8Flooded queries (Gnutella)
[Figure: the client floods Lookup(title) from neighbor to neighbor across nodes N1 to N9 until it reaches the publisher at N4, which holds (key = title, value = MP3 data).]
Robust, but worst case O(N) messages per lookup
9Gnutella Query Flooding
Breadth-First Search (BFS)
10Gnutella Query Flooding
- A node/peer connects to a set of Gnutella neighbors
- Forward queries to neighbors
- Client which has the Information responds.
- Flood network with TTL for termination
- Results are complete
- Bandwidth wastage
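A minimal sketch of TTL-limited flooding as a breadth-first search, to make the bandwidth cost concrete. The topology, the `files` map, and the function name `flood_query` are assumptions made up for illustration; real Gnutella messages carry more state than this.

```python
from collections import deque


def flood_query(neighbors, files, start, title, ttl):
    """Breadth-first flood from `start`; returns (holders, messages_sent)."""
    holders = []
    seen = {start}
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        node, remaining = queue.popleft()
        if title in files.get(node, ()):
            holders.append(node)          # this peer would respond
        if remaining == 0:
            continue                      # TTL exhausted: stop forwarding
        for peer in neighbors.get(node, ()):
            messages += 1                 # every forward costs a message
            if peer not in seen:          # duplicates are dropped on arrival
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return holders, messages


# Hypothetical overlay: N6 is the only peer holding the file.
neighbors = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N4"],
    "N3": ["N1", "N4"],
    "N4": ["N2", "N3", "N6"],
    "N6": ["N4"],
}
files = {"N6": {"title"}}

print(flood_query(neighbors, files, "N1", "title", ttl=3))
print(flood_query(neighbors, files, "N1", "title", ttl=2))
```

With a TTL of 3 the query reaches N6; with a TTL of 2 it stops one hop short and finds nothing, yet still spends messages on duplicate deliveries.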
11Gnutella Random Walk
- Improved over query flooding
- Same overlay structure as Gnutella
- Forward the query to a random subset of its neighbors
- Reduced bandwidth requirements
- Incomplete results
- High latency
12KaZaA (FastTrack network)
- Hybrid of centralized Napster and decentralized Gnutella
- Super-peers act as local search hubs
- Each super-peer is similar to a Napster server for a small portion of the network
- Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)
- Users upload their list of files to a super-peer
- Super-peers periodically exchange file lists
- You send queries to a super-peer for files of interest
- The local super-peer may flood the queries to other super-peers if it cannot satisfy them itself
- Exploit the heterogeneity of peer nodes
13KaZaA
- Uses supernodes to improve scalability, establish hierarchy
- Uptime, bandwidth
- Closed-source
- Uses HTTP to carry out download
- Encrypted protocol; queuing, QoS
14KaZaA Network Design
15KaZaA File Insert
insert(X, 123.2.21.23) ...
I have X!
123.2.21.23
16KaZaA File Search
Where is file A?
17Routed queries (Freenet, Chord, etc.)
[Figure: the client's Lookup(title) is routed hop by hop through a few of the nodes N1 to N9 to the node where the publisher stored (key = title, value = MP3 data).]
18Routing challenges
- Define a useful key nearness metric
- Keep the hop count small
- Keep the tables small
- Stay robust despite rapid change (node addition/removal)
- Freenet emphasizes anonymity
- Chord emphasizes efficiency and simplicity
19Chord properties
- Efficient: O(log(N)) messages per lookup
- N is the total number of servers
- Scalable: O(log(N)) state per node
- Robust: survives massive failures
- Proofs are in paper / tech report
- Assuming no malicious participants
20Chord overview
- Provides peer-to-peer hash lookup
- Lookup(key) → IP address
- Mapping: key → IP address
- How does Chord route lookups?
- How does Chord maintain routing tables?
21Chord IDs
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- Both are uniformly distributed
- Both exist in the same ID space
- How to map key IDs to node IDs?
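The two identifier definitions above can be sketched with Python's standard `hashlib`. The function name `chord_id` and the `bits` parameter (which keeps only the top bits so that small example rings are readable) are assumptions for illustration, not part of Chord itself.

```python
import hashlib

SHA1_BITS = 160  # width of a SHA-1 digest


def chord_id(data: bytes, bits: int = SHA1_BITS) -> int:
    """Map arbitrary bytes to an identifier in [0, 2**bits).

    Truncating to the top `bits` bits is an illustrative shortcut;
    with the default, the full 160-bit digest is used.
    """
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (SHA1_BITS - bits)


key_id = chord_id(b"some song title")      # key identifier = SHA-1(key)
node_id = chord_id(b"123.2.21.23")         # node identifier = SHA-1(IP address)
print(hex(key_id))
print(hex(node_id))
```

Because both calls use the same hash, keys and nodes land in the same identifier space, which is what lets a key be assigned to a node by comparing IDs.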
22Consistent hashing [Karger 97]
[Figure: circular 7-bit ID space holding nodes N32, N90, N105 and keys K5, K20, K80. "Key 5" is drawn as K5 and "Node 105" as N105.]
A key is stored at its successor: the node with the next-higher ID
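The successor rule can be sketched with a sorted list and a binary search. The node and key IDs are the ones from the consistent-hashing slide; the function name `successor` is an assumption, and a key whose ID equals a node ID is treated here as belonging to that node.

```python
from bisect import bisect_left


def successor(node_ids, key_id):
    """First node ID at or after key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]      # index len(ring) wraps to the lowest node


nodes = [32, 90, 105]               # N32, N90, N105 in a 7-bit ID space
for key in (5, 20, 80, 110):
    print(f"K{key} -> N{successor(nodes, key)}")
```

K110 has no node above it, so it wraps around to N32; that wrap-around is what makes the ID space circular.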
23Basic lookup
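[Figure: ring with nodes N10, N32, N60, N90, N105, N120. The query "Where is key 80?" is passed from successor to successor until the answer "N90 has K80" comes back.]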
24Simple lookup algorithm
- Lookup(my-id, key-id)
-   n = my successor
-   if my-id < n < key-id
-     call Lookup(key-id) on node n // next hop
-   else
-     return my successor // done
- Correctness depends only on successors
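A runnable sketch of the successor-only lookup, using the ring from the basic-lookup slide. The helper `in_interval` handles the wrap-around that the slide's `<` comparisons gloss over; both function names and the dictionary representation of successor pointers are assumptions for illustration.

```python
SIZE = 128  # 7-bit identifier space, as on the consistent-hashing slide


def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if a < b:
        return a < x <= b
    return x > a or x <= b          # interval wraps past zero


def simple_lookup(succ, start, key_id):
    """Follow successor pointers until the key falls in (node, successor].

    Returns (node responsible for the key, number of forwarding hops).
    """
    node, hops = start, 0
    while not in_interval(key_id, node, succ[node]):
        node = succ[node]           # next hop
        hops += 1
    return succ[node], hops         # done


# Successor pointers for N10, N32, N60, N90, N105, N120.
succ = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}

print(simple_lookup(succ, 10, 80))  # N90 has K80
print(simple_lookup(succ, 32, 5))   # wraps around to N10
```

The hop count grows linearly with the number of nodes between the start and the key, which is the cost the finger table is meant to remove.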
25Finger table allows log(N)-time lookups
[Figure: node N80 with fingers spanning ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring; the long fingers act as a fast track / express lane.]
26Finger i points to successor of n + 2^i
[Figure: N80's fingers again (½ down to 1/128 of the ring); the finger aimed at 80 + 32 = 112 points to N120, the successor of 112.]
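The finger rule can be sketched as follows, assuming a 7-bit identifier space. The node set is hypothetical (borrowed from the Lookup(K19) slide, so the finger for 112 lands on a different node than in the figure for this slide), and `successor` and `finger_table` are illustrative names.

```python
from bisect import bisect_left

M = 7                   # identifier bits (assumption: same 7-bit space as earlier)
SIZE = 1 << M


def successor(ring, ident):
    """First node ID at or after ident (mod 2**M), wrapping around."""
    ring = sorted(ring)
    return ring[bisect_left(ring, ident % SIZE) % len(ring)]


def finger_table(n, ring):
    """Finger i is the successor of n + 2**i, for i = 0 .. M-1."""
    return [successor(ring, n + (1 << i)) for i in range(M)]


ring = [5, 10, 20, 32, 60, 80, 99, 110]
for i, finger in enumerate(finger_table(80, ring)):
    print(f"finger {i}: successor of {(80 + (1 << i)) % SIZE} is N{finger}")
```

Each node keeps only M entries, one per power of two, which is where the O(log N) state per node comes from.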
27Lookup with fingers
- Lookup(my-id, key-id)
-   look in local finger table for
-     highest node n s.t. my-id < n < key-id
-   if n exists
-     call Lookup(key-id) on node n // next hop
-   else
-     return my successor // done
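A small simulation of the finger-based lookup on a hypothetical 7-bit ring (the node IDs are the ones drawn on the Lookup(K19) slide). Finger tables are recomputed from a global node list for brevity, which a real node could not do; all function names are assumptions.

```python
from bisect import bisect_left

M = 7
SIZE = 1 << M


def successor(ring, ident):
    """First node ID at or after ident; `ring` must be sorted."""
    return ring[bisect_left(ring, ident % SIZE) % len(ring)]


def between(x, a, b):
    """True if x is strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def fingers(ring, n):
    return [successor(ring, n + (1 << i)) for i in range(M)]


def finger_lookup(ring, start, key_id):
    """Return (responsible node, list of nodes visited)."""
    node, path = start, [start]
    while True:
        succ = successor(ring, node + 1)
        if key_id == succ or between(key_id, node, succ):
            return succ, path                       # done
        # Highest finger strictly between this node and the key;
        # fall back to the plain successor if no finger qualifies.
        node = next((f for f in reversed(fingers(ring, node))
                     if between(f, node, key_id)), succ)
        path.append(node)                           # next hop


ring = [5, 10, 20, 32, 60, 80, 99, 110]
print(finger_lookup(ring, 80, 19))
```

Starting at N80, the lookup jumps across the zero point to N5, then to N10, whose successor N20 is responsible for K19.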
28Lookups take O(log(N)) hops
[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110. Lookup(K19) hops along fingers toward N20, the successor of K19.]
29Joining: linked list insert
[Figure: N36 joins between N25 and N40; N40 currently holds K30 and K38.]
1. Lookup(36)
30Join (2)
[Figure: N25, N36, and N40 (holding K30 and K38).]
2. N36 sets its own successor pointer
31Join (3)
[Figure: N25; N36 now holds K30; N40 still holds K30 and K38.]
3. Copy keys 26..36 from N40 to N36
32Join (4)
[Figure: N25, then N36 (holding K30), then N40 (holding K30 and K38).]
4. Set N25's successor pointer
Update finger pointers in the background.
Correct successors produce correct lookups.
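The four join steps can be sketched as a linked-list insert over plain dictionaries. This is a single-threaded toy under stated assumptions: `succ` maps each node to its successor, `keys` maps each node to the key IDs it stores, and the function names are hypothetical. Concurrent joins and finger updates are not modeled.

```python
def in_range(x, a, b):
    """True if x is in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def find_predecessor(succ, start, ident):
    """Walk successors until ident falls in (node, successor]."""
    node = start
    while not in_range(ident, node, succ[node]):
        node = succ[node]
    return node


def join(succ, keys, new_id, bootstrap):
    pred = find_predecessor(succ, bootstrap, new_id)   # 1. Lookup(new_id)
    old_succ = succ[pred]
    succ[new_id] = old_succ                            # 2. new node sets its successor
    keys[new_id] = {k for k in keys.get(old_succ, set())
                    if in_range(k, pred, new_id)}      # 3. copy keys in (pred, new_id]
    succ[pred] = new_id                                # 4. set predecessor's successor


succ = {25: 40, 40: 25}
keys = {25: set(), 40: {30, 38}}
join(succ, keys, 36, bootstrap=40)
print(succ)
print(keys)
```

As on the slides, the keys are copied rather than moved, so N40 still holds K30 after the join; when the old copy is dropped is left open here.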
33Failures might cause incorrect lookup
[Figure: ring with N10, N80, N85, N102, N113, N120 after some nodes have failed; Lookup(90) arrives at N80.]
N80 doesn't know its correct successor, so the lookup is incorrect
34Solution: successor lists
- Each node knows r immediate successors
- After failure, will know first live successor
- Correct successors guarantee correct lookups
- Guarantee is with some probability
35Choosing the successor list length
- Assume 1/2 of nodes fail
- P(successor list all dead) = (1/2)^r
- I.e. P(this node breaks the Chord ring)
- Depends on independent failure
- P(no broken nodes) = (1 - (1/2)^r)^N
- r = 2 log(N) makes prob. = 1 - 1/N
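The arithmetic above can be checked numerically. With r = 2 log2(N), each node's list is entirely dead with probability (1/2)^r = 1/N^2, so (1 - 1/N^2)^N is approximately 1 - 1/N. The sketch below assumes base-2 logarithms and independent failures; the function name is hypothetical.

```python
import math


def p_ring_intact(n_nodes, r, p_fail=0.5):
    """P(no node has an all-dead successor list), assuming independent failures."""
    return (1 - p_fail ** r) ** n_nodes


N = 1000
r = math.ceil(2 * math.log2(N))     # round up to a whole number of entries

print("successor list length r =", r)
print("P(no broken nodes)      =", p_ring_intact(N, r))
print("1 - 1/N                 =", 1 - 1 / N)
```

For N = 1000 this gives r = 20 entries, and the probability that the ring stays intact is slightly above 1 - 1/N because r was rounded up.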
36Lookup with fault tolerance
- Lookup(my-id, key-id)
-   look in local finger table and successor-list
-     for highest node n s.t. my-id < n < key-id
-   if n exists
-     call Lookup(key-id) on node n // next hop
-     if call failed,
-       remove n from finger table
-       return Lookup(my-id, key-id)
-   else return my successor // done
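A toy simulation of the fault-tolerant lookup. Several things here are assumptions rather than Chord's actual code: each node's table holds only a successor list (no fingers), `call` stands in for an RPC that fails when the target is down, and the final "return my successor" step also probes the successor so that a dead entry is skipped. The node IDs are the ones drawn on the failure slide; which of them fail is made up.

```python
SIZE = 128  # 7-bit identifier space


class Unreachable(Exception):
    """Raised when a simulated remote call hits a failed node."""


def between(x, a, b):
    """True if x is strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def call(alive, node):
    if node not in alive:
        raise Unreachable(node)


def successor_lists(ring, r):
    """Each node knows its r immediate successors."""
    ring = sorted(ring)
    n = len(ring)
    return {node: [ring[(i + j) % n] for j in range(1, r + 1)]
            for i, node in enumerate(ring)}


def lookup(tables, alive, node, key_id):
    known = tables[node]
    while known:
        ahead = [n for n in known if between(n, node, key_id)]
        if ahead:   # highest known node before the key
            target = max(ahead, key=lambda n: (n - node) % SIZE)
        else:       # nothing before the key: closest remaining successor
            target = min(known, key=lambda n: (n - node) % SIZE)
        try:
            call(alive, target)
        except Unreachable:
            known.remove(target)    # forget the dead entry, then retry
            continue
        return lookup(tables, alive, target, key_id) if ahead else target
    raise LookupError(f"node {node} has no live entries left")


ring = [10, 80, 85, 102, 113, 120]
print(lookup(successor_lists(ring, 3), set(ring), 10, 90))
print(lookup(successor_lists(ring, 3), set(ring) - {85, 102}, 10, 90))
```

With every node up, key 90 resolves to N102. With N85 and N102 down, N80 discards both dead entries from its list and answers N113, the first live successor of the key.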
37Chord status
- Working implementation as part of CFS
- Chord library: 3,000 lines of C++
- Deployed in small Internet testbed
- Includes
- Correct concurrent join/fail
- Proximity-based routing for low delay
- Load control for heterogeneous nodes
- Resistance to spoofed node IDs
38Experimental overview
- Quick lookup in large systems
- Low variation in lookup costs
- Robust despite massive failure
- See paper for more results
- Experiments confirm theoretical results
39Chord lookup cost is O(log N)
[Plot: average messages per lookup vs. number of nodes; the constant factor is 1/2.]
40Failure experimental setup
- Start 1,000 CFS/Chord servers
- Successor list has 20 entries
- Wait until they stabilize
- Insert 1,000 key/value pairs
- Five replicas of each
- Stop X% of the servers
- Immediately perform 1,000 lookups
41Massive failures have little impact
(1/2)^6 is 1.6%
[Plot: failed lookups (percent) vs. failed nodes (percent).]
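The 1.6% figure is the chance that every copy of a block sits on a failed server when half the servers are stopped. A minimal check, assuming six copies per key (one reading of "five replicas" plus the original) and independent failures; the function name is hypothetical.

```python
def p_all_copies_lost(failed_fraction, copies=6):
    """P(every copy is on a failed server), assuming independent failures."""
    return failed_fraction ** copies


for percent in (10, 30, 50):
    p = p_all_copies_lost(percent / 100)
    print(f"{percent}% of nodes failed -> {100 * p:.2f}% of lookups expected to fail")
```

At 50% failed nodes this gives (1/2)^6, about 1.6%, matching the annotation on the plot.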
42Related Work
- CAN (Ratnasamy, Francis, Handley, Karp, Shenker)
- Pastry (Rowstron, Druschel)
- Tapestry (Zhao, Kubiatowicz, Joseph)
- Chord emphasizes simplicity
43Chord Summary
- Chord provides peer-to-peer hash lookup
- Efficient: O(log(N)) messages per lookup
- Robust as nodes fail and join
- Good primitive for peer-to-peer systems
- http://www.pdos.lcs.mit.edu/chord
44Reflection on Chord
- Strict overlay structure
- Strict data placement
- If data keys are uniformly distributed, and # of keys >> # of nodes
- Load balanced
- Each node has O(1/N) fraction of keys
- Node addition/deletion only moves O(1/N) of the load, so load movement is minimized!
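The claim about load movement can be observed with a small experiment: hash some keys onto a ring, add one node, and see which keys change owner. The node and key names, their counts, and the function names are all made up for illustration.

```python
import hashlib
from bisect import bisect_left


def chord_id(name):
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def owners(node_names, key_names):
    """Map every key to the node whose ID is its successor on the ring."""
    ring = sorted((chord_id(n), n) for n in node_names)
    ids = [ident for ident, _ in ring]
    return {k: ring[bisect_left(ids, chord_id(k)) % len(ring)][1]
            for k in key_names}


nodes = [f"node-{i}" for i in range(50)]
keys = [f"key-{i}" for i in range(5000)]

before = owners(nodes, keys)
after = owners(nodes + ["node-new"], keys)
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys moved")
print("all moved keys went to the new node:",
      all(after[k] == "node-new" for k in moved))
```

Only the keys between the new node and its predecessor change hands, and they all go to the new node; every other assignment is untouched.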
45Reflection on Chord
- Routing table (successor list + finger table)
- Deterministic
- Network topology unaware
- Routing latency could be a problem
- Proximity Neighbor Selection (PNS)
- m neighbor candidates, choose min latency
- Still O(logN) hops
46Reflection on Chord
- Predecessor + successor must be correct, aggressively maintained
- Finger tables are lazily maintained
- Tradeoff: bandwidth vs. routing correctness
47Reflection on Chord
- Assume uniform node distribution
- In the wild, nodes are heterogeneous
- Load imbalance!
- Virtual servers
- A node hosts multiple virtual servers
- O(log N) virtual servers per node
49Join: lazy finger update is OK
[Figure: ring segment with N2, N25, N36, N40; K30 is stored at N36.]
N2's finger should now point to N36, not N40.
Lookup(K30) visits only nodes < 30, so it will undershoot.
50CFS: a peer-to-peer storage system
- Inspired by Napster, Gnutella, Freenet
- Separates publishing from serving
- Uses spare disk space, net capacity
- Avoids centralized mechanisms
- Delete this slide?
- Mention distributed hash lookup
51CFS architecture (move later?)
[Figure: functions placed in the CFS stack: block storage, availability / replication, authentication, caching, consistency, server selection, keyword search, lookup. Layers shown: DHash distributed block store; Chord.]
- Powerful lookup simplifies other mechanisms