Title: Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan
1. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan
- Presented by Alexei Semenov
2. Introduction
- Main problem with peer-to-peer applications: we need to efficiently locate the node that stores a particular data item.
- Chord supports only one operation: given a key, it maps the key onto a node. It uses a variant of consistent hashing to assign keys to Chord nodes.
- The advantages of using consistent hashing:
  - it balances the load
  - few keys move when nodes join or leave the system
- What distinguishes Chord from many other peer-to-peer lookup protocols?
  - Simplicity
  - Provable correctness
  - Provable performance
3. Related Work
- Chord vs. traditional name and location services
- Freenet provides anonymity, while Chord doesn't
- Globe exploits network locality better than Chord
- Plaxton provides stronger guarantees than Chord
- Although Chord is weaker than these systems in some respects, it still performs well, does better in other respects, and is considerably less complicated.
4. System Model
- Features of Chord:
  - Load balance
  - Decentralization
  - Scalability
  - Availability
  - Flexible naming
- The Chord software takes the form of a library linked with the client and server applications that use it. The application interacts with Chord in two ways:
  - Chord provides a lookup(key) algorithm that yields the IP address of the node responsible for the key.
  - The Chord software on each node notifies the application of changes in the set of keys that the node is responsible for.
5. The Base Chord Protocol: Consistent Hashing (1)
- Chord uses consistent hashing, but improves its scalability by avoiding the requirement that every node know about every other node.
- The consistent hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1. A node's identifier is chosen by hashing the node's IP address, while a key identifier is produced by hashing the key.
- Consistent hashing assigns keys to nodes as follows: identifiers are ordered on an identifier circle modulo 2^m. Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space. This node is called the successor node of k.
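As a minimal sketch of the identifier assignment described above, the snippet below hashes a string with SHA-1 and reduces the digest modulo 2^m. The function name chord_id, the sample IP address, and the sample key are illustrative assumptions, not part of Chord itself.

```python
import hashlib

M = 160  # identifier length in bits; a SHA-1 digest is 160 bits long


def chord_id(value: str, m: int = M) -> int:
    """Map a string to an m-bit identifier (SHA-1 digest reduced modulo 2**m)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


# A node hashes its IP address; a key hashes the key itself.
# Both values below are made-up examples.
node_identifier = chord_id("192.0.2.10")
key_identifier = chord_id("my-file.txt")

# A smaller m gives a smaller identifier circle, handy for experiments.
small_identifier = chord_id("my-file.txt", m=8)
```

Because nodes and keys go through the same function, they land on the same identifier circle and can be compared directly.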
6. The Base Chord Protocol: Consistent Hashing (2)
- Example (m = 3): the successor of identifier 1 is node 1, so key 1 would be located at node 1. Similarly, key 2 would be located at node 3, and key 6 at node 0.
Consistent hashing enables nodes to enter and leave the network with minimal disruption.
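The example can be reproduced with a few lines of code. This is a sketch that assumes the ring holds nodes 0, 1, and 3 (consistent with the placements stated above) and that every node identifier is known in one sorted list, which a real Chord node does not have. The name successor and the joining node 7 are illustrative choices.

```python
from bisect import bisect_left


def successor(node_ids, k, m):
    """Return the first node whose identifier is equal to or follows k on the circle."""
    ring = sorted(node_ids)
    k %= 2 ** m
    i = bisect_left(ring, k)        # first node id >= k
    return ring[i % len(ring)]      # wrap around to the smallest id if none follows


nodes = [0, 1, 3]
placement = {k: successor(nodes, k, 3) for k in (1, 2, 6)}

# Hypothetical join: a node with identifier 7 enters the ring.
placement_after_join = {k: successor(nodes + [7], k, 3) for k in (1, 2, 6)}
moved = [k for k in placement if placement[k] != placement_after_join[k]]
```

Only key 6 changes owner when node 7 joins; keys 1 and 2 stay where they were, which is the "minimal disruption" property.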
7. The Base Chord Protocol: Scalable Key Location
- With consistent hashing alone, a lookup may have to traverse all nodes to find the appropriate mapping. That is why Chord maintains additional routing information.
- Each node n maintains a routing table with at most m entries, where m is the number of bits in the key/node identifiers. This table is called the finger table.
- A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node.
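The slide does not state how the finger table is filled. In the Chord paper, the i-th finger of node n (1 <= i <= m) is successor((n + 2^(i-1)) mod 2^m). The sketch below builds finger tables under that rule and uses them for a lookup. It is a simplification: it computes fingers from a global sorted list of node identifiers and stores identifiers only (no IP address or port). The names successor, between, finger_table, and lookup are illustrative.

```python
from bisect import bisect_left


def successor(ring, k):
    """First node in the sorted ring whose identifier is equal to or follows k."""
    return ring[bisect_left(ring, k) % len(ring)]


def between(x, a, b):
    """True if x lies strictly between a and b, moving clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def finger_table(ring, n, m):
    """Fingers of node n: successor((n + 2**i) mod 2**m) for i = 0 .. m-1."""
    return [successor(ring, (n + 2 ** i) % 2 ** m) for i in range(m)]


def lookup(ring, start, k, m):
    """Return (node responsible for k, number of forwarding hops)."""
    n, hops = start, 0
    while True:
        fingers = finger_table(ring, n, m)
        succ = fingers[0]                       # the first finger is n's successor
        if k == succ or between(k, n, succ):    # k lies in (n, succ]
            return succ, hops
        # Forward to the farthest finger that still precedes k.
        n = next(f for f in reversed(fingers) if between(f, n, k))
        hops += 1


ring = [0, 1, 3]  # assumed example ring with m = 3
tables = {n: finger_table(ring, n, 3) for n in ring}
owner, hops = lookup(ring, 3, 1, 3)
```

Starting at node 3, the lookup for identifier 1 is forwarded once (to node 0) before node 1 is identified as the successor.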
8. The Base Chord Protocol: Node Joins
- Nodes can join or leave at any time. Preserving the ability to locate every key in the network may present a challenge. Chord deals with this problem by making sure that:
  - Each node's successor is correctly maintained
  - For every key k, node successor(k) is responsible for k.
  - Each node's predecessor is correctly maintained
- When a node n joins the network, Chord performs three operations:
  - Initializes the predecessor and fingers of node n
  - Updates the fingers and predecessors of existing nodes to reflect the addition of n
  - Notifies the higher-layer software so that it can transfer state associated with keys that node n is now responsible for.
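The third operation, handing keys over to the new node, can be sketched as follows. It assumes that a joining node n takes exactly the keys in the circular interval (predecessor(n), n] from its successor, which follows from the rule that successor(k) is responsible for k. The function names and the example key sets are illustrative.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b], moving clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero


def keys_to_transfer(successor_keys, pred_id, new_id):
    """Keys that the successor should hand over to a node joining with new_id."""
    return {k for k in successor_keys if in_half_open(k, pred_id, new_id)}


# Assumed example with m = 3: ring {0, 1, 3}; node 0 currently stores keys 4..7 and 0.
held_by_node_0 = {4, 5, 6, 7, 0}

# Node 5 joins between node 3 (its predecessor) and node 0 (its successor).
moved = keys_to_transfer(held_by_node_0, pred_id=3, new_id=5)
kept = held_by_node_0 - moved
```

Here node 5 receives keys 4 and 5, and node 0 keeps the rest.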
9. Concurrent Operations and Failures
- Stabilization
  - Needed in case of concurrent joins. A basic stabilization protocol keeps nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups. Successor pointers are then used to verify and correct finger table entries, which allows these lookups to be fast as well as correct.
- Failures and Replication
  - When a node n fails, nodes whose finger tables include n must find n's successor. In addition, the failure of n must not disrupt queries that are in progress.
  - To recover from a failure, correct successor pointers must be maintained. To that end, each Chord node maintains a successor list of its r nearest successors on the Chord ring.
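A minimal sketch of how a successor list helps after a failure: a node walks its list and adopts the first entry that is still alive. The liveness set stands in for whatever failure detection a deployment uses (for example, timeouts), and all names and values here are illustrative assumptions.

```python
def successor_list(ring, n, r):
    """The r nearest successors of node n on the ring (fewer if the ring is small)."""
    ring = sorted(ring)
    i = ring.index(n)
    count = min(r, len(ring) - 1)
    return [ring[(i + j) % len(ring)] for j in range(1, count + 1)]


def first_live_successor(succ_list, alive):
    """Return the first entry of the successor list that is still alive, or None."""
    for s in succ_list:
        if s in alive:
            return s
    return None


ring = [0, 1, 3, 7]                      # assumed example ring
backups = successor_list(ring, 1, r=2)   # node 1 remembers two successors
alive = {0, 1, 7}                        # node 3 has failed
new_successor = first_live_successor(backups, alive)
```

Node 1 skips the failed node 3 and adopts node 7. A lookup fails only if every node in the list has failed, which is why a larger r tolerates more simultaneous failures.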
10. Stabilization Example
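The stabilization steps from slide 9 can be illustrated with a small in-memory simulation. It follows the join, stabilize, and notify routines as given in the Chord paper, with simplifications that are assumptions of this sketch: nodes are plain Python objects rather than networked peers, find_successor walks successor pointers instead of using fingers, and stabilization runs in fixed rounds rather than periodically.

```python
def in_open_interval(x, a, b):
    """True if x lies strictly between a and b, moving clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps past zero; if a == b, everything except a


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self       # a lone node is its own successor
        self.predecessor = None

    def find_successor(self, k):
        """Walk successor pointers until k falls in (node, node.successor]."""
        node = self
        while not (k == node.successor.id
                   or in_open_interval(k, node.id, node.successor.id)):
            node = node.successor
        return node.successor

    def join(self, known):
        """Join via a node already in the ring; only the successor is set here."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Adopt a closer successor if one has appeared, then notify it."""
        x = self.successor.predecessor
        if x is not None and in_open_interval(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if (self.predecessor is None
                or in_open_interval(candidate.id, self.predecessor.id, self.id)):
            self.predecessor = candidate


# Nodes 3 and 1 both join through node 0 before any stabilization has run.
n0, n1, n3 = Node(0), Node(1), Node(3)
n3.join(n0)
n1.join(n0)
for _ in range(5):                  # a few rounds are enough for this tiny ring
    for node in (n0, n1, n3):
        node.stabilize()
```

Right after the two joins, both new nodes point at node 0 as their successor. Repeated stabilize and notify calls repair the pointers until the ring reads 0, 1, 3 and back to 0.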
11. Simulation Results
- Protocol Simulator
  - Implemented in iterative style, which means that a node that resolves a lookup initiates all communication. It asks a series of nodes for information from their finger tables, each time moving closer on the Chord ring to the desired successor.
- Load Balance
  - The number of keys per node exhibits large variations that increase linearly with the number of keys.
- Path Length
  - The mean path length increases logarithmically with the number of nodes.
- Simultaneous Node Failures
  - No significant lookup failure
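The path-length result can be explored with a toy experiment. This is not the simulator used in the presentation: it is a rough sketch that assumes 16-bit identifiers, random node placement, a global view of the ring for computing fingers, and path length counted as the number of forwarding hops. All names and parameters are illustrative.

```python
import random
from bisect import bisect_left

M = 16  # identifier bits; kept small so the experiment runs quickly (assumption)


def successor(ring, k):
    """First node in the sorted ring whose identifier is equal to or follows k."""
    return ring[bisect_left(ring, k) % len(ring)]


def between(x, a, b):
    """True if x lies strictly between a and b, moving clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b


def path_length(ring, start, k):
    """Number of forwarding hops needed to resolve key k starting at node start."""
    n, hops = start, 0
    while True:
        fingers = [successor(ring, (n + 2 ** i) % 2 ** M) for i in range(M)]
        succ = fingers[0]
        if k == succ or between(k, n, succ):
            return hops
        # Jump to the farthest finger that still precedes k.
        n = next(f for f in reversed(fingers) if between(f, n, k))
        hops += 1


def mean_path_length(num_nodes, lookups=200, seed=0):
    """Average hop count over random lookups on a random ring of num_nodes nodes."""
    rng = random.Random(seed)
    ring = sorted(rng.sample(range(2 ** M), num_nodes))
    total = sum(
        path_length(ring, rng.choice(ring), rng.randrange(2 ** M))
        for _ in range(lookups)
    )
    return total / lookups


small_ring_mean = mean_path_length(16)
large_ring_mean = mean_path_length(256)
```

Comparing the two means for rings of 16 and 256 nodes gives a feel for how slowly the hop count grows as the ring gets larger.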
12. Experimental Results
- A prototype implementation of Chord was deployed on the Internet, with Chord nodes at ten sites on a subset of the RON test-bed in the USA: in California, Colorado, Massachusetts, New York, North Carolina and Pennsylvania. The Chord software runs on UNIX, uses 160-bit keys obtained from the SHA-1 cryptographic hash function, and uses TCP to communicate between nodes. Chord runs in iterative style.
- The figure shows the measured latency of Chord lookups over a range of numbers of nodes.
- Lookup latency grows slowly with the total number of nodes, which agrees with the simulation results and demonstrates Chord's scalability.
13. Conclusion
- Chord features simplicity, provable correctness and provable performance even when there are concurrent node arrivals and departures.
- It continues to function properly even when a node's information is only partially correct.
- It scales well with the number of nodes, recovers from large numbers of simultaneous node failures and joins, and answers most lookups correctly even during recovery.
- Chord might be valuable to peer-to-peer, large-scale distributed applications such as cooperative file sharing, time-shared available storage systems, etc.