Transcript and Presenter's Notes

Title: Peer-to-Peer Systems: Theory


1
Peer-to-Peer Systems: Theory & Practice
  • Ashwin R. Bharambe
  • 15-744 Lecture

2
Overview
  • Internet Indirection Infrastructure (i3)
  • Freenet
  • BitTorrent
  • Content distribution
  • Effect of P2P networks on the Internet
  • What does the new traffic matrix look like?

3
i3 Motivation
  • Today's Internet is based on a point-to-point
    abstraction
  • Applications need more
  • Multicast
  • Mobility
  • Anycast
  • Existing solutions
  • Change IP layer
  • Overlays

So, what's the problem? A different solution for
each service
4
The i3 solution
Every problem in CS → indirection, the only primitive needed
  • Solution
  • Add an indirection layer on top of IP
  • Implement using overlay networks
  • Solution Components
  • Naming using identifiers
  • Subscriptions using triggers
  • DHT as the gluing substrate

5
i3 Rendezvous Communication
  • Packets addressed to identifiers (names)
  • A trigger (identifier, IP address) is inserted by
    the receiver

[Figure: the Sender sends (id, data); the trigger forwards it to Receiver (R)]
Senders are decoupled from receivers
6
i3 Service Model
  • API
  • sendPacket(id, p)
  • insertTrigger(id, addr)
  • removeTrigger(id, addr) // optional
  • Best-effort service model (like IP)
  • Triggers periodically refreshed by end-hosts
  • Reliability, congestion control, and flow-control
    implemented at end-hosts
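
A minimal, single-process sketch of this service model in Python. It assumes an in-memory trigger table in place of the DHT and a print statement in place of IP forwarding; the snake_case names mirror the API above.

```python
# Sketch of the i3 service model: triggers map an identifier to receiver
# addresses, and send_packet forwards a packet to every matching trigger.
# The DHT that would normally hold the triggers is an in-memory dict here.
from collections import defaultdict

class I3Node:
    def __init__(self):
        self.triggers = defaultdict(set)   # id -> set of receiver addresses

    def insert_trigger(self, ident, addr):
        self.triggers[ident].add(addr)     # refreshed periodically by end-hosts

    def remove_trigger(self, ident, addr):
        self.triggers[ident].discard(addr)

    def send_packet(self, ident, payload):
        # Best-effort, like IP: packets to an id with no trigger are dropped.
        for addr in self.triggers.get(ident, ()):
            deliver(addr, payload)         # stands in for IP forwarding

def deliver(addr, payload):
    print(f"deliver to {addr}: {payload!r}")

node = I3Node()
node.insert_trigger("id-42", "10.0.0.7")   # receiver R subscribes
node.send_packet("id-42", b"hello")        # sender only needs to know the id
```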

7
i3 Implementation
  • Use a Distributed Hash Table
  • Scalable, self-organizing, robust
  • Suitable as a substrate for the Internet

[Figure: Sender, Receiver (R); the trigger is stored with DHT.put(id), and the i3 server holding it forwards the packet with IP.route(R)]
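
As a sketch of how a trigger ends up on a particular i3 server, the toy DHT below hashes identifiers onto a ring of servers; the "closest hash" rule and the server names are assumptions standing in for Chord-style consistent hashing.

```python
# Toy DHT placement: each server owns a region of the hashed id space, and a
# trigger for an identifier is stored on the server whose hash follows it.
import hashlib
from bisect import bisect_left

def h(x: str) -> int:
    return int(hashlib.sha1(x.encode()).hexdigest(), 16)

class TinyDHT:
    def __init__(self, servers):
        self.ring = sorted((h(s), s) for s in servers)    # hash ring of servers

    def server_for(self, ident: str) -> str:
        keys = [k for k, _ in self.ring]
        i = bisect_left(keys, h(ident)) % len(self.ring)  # wrap around the ring
        return self.ring[i][1]

dht = TinyDHT(["i3-server-a", "i3-server-b", "i3-server-c"])
print(dht.server_for("id-42"))   # the server that would store trigger (id-42, R)
```
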
8
Mobility and Multicast
send to many
  • Mobility supported naturally
  • The end-host inserts a trigger with its new IP address;
    everything is transparent to the sender
  • Robust, and supports location privacy
  • Multicast
  • All receivers insert triggers under same ID
  • Sender uses that ID for sending
  • Can optimize tree construction to balance load
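
Both properties fall out of the trigger abstraction, as the short sketch below illustrates; the in-memory trigger table and printed delivery are assumptions standing in for the i3 DHT and IP.

```python
# Mobility and multicast built from triggers alone.
triggers = {}   # id -> set of receiver addresses

def insert_trigger(ident, addr):
    triggers.setdefault(ident, set()).add(addr)

def remove_trigger(ident, addr):
    triggers.get(ident, set()).discard(addr)

def send(ident, data):
    for addr in triggers.get(ident, ()):
        print(f"{data!r} -> {addr}")

# Multicast: all receivers insert triggers under the same id.
for r in ("R1", "R2", "R3"):
    insert_trigger("group-7", r)
send("group-7", "announcement")              # sender uses only the group id

# Mobility: a receiver moves and re-inserts its trigger; the sender is unchanged.
remove_trigger("group-7", "R1")
insert_trigger("group-7", "R1-new-address")
send("group-7", "still reaches R1")
```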

9
Anycast
send to any one
  • Generalized matching
  • The first k bits must match exactly; longest prefix match
    among the rest

[Figure: triggers (a, b1) → R1, (a, b2) → R2, (a, b3) → R3; the Sender sends to (a, b), which matches one of them]
  • Related triggers must be on same server
  • Server selection (randomize last bits)
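
A sketch of the generalized matching rule, assuming identifiers are plain bit strings and k is the length of the exact-match prefix; among the triggers that match the first k bits, the one with the longest common prefix in the remaining bits wins.

```python
# Anycast matching: exact match on the first k bits, then longest prefix
# match among the remaining bits.
def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def anycast_match(packet_id: str, trigger_ids, k: int):
    candidates = [t for t in trigger_ids if t[:k] == packet_id[:k]]
    if not candidates:
        return None
    return max(candidates, key=lambda t: common_prefix_len(t[k:], packet_id[k:]))

# Receivers randomize the last bits of their trigger ids for server selection.
triggers = {"1010" + "0110": "R1", "1010" + "1100": "R2", "1010" + "0011": "R3"}
packet = "1010" + "0100"
print(triggers[anycast_match(packet, triggers, k=4)])   # picks R1 here
```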

10
Generalization: Identifier Stack
  • Stack of identifiers
  • i3 routes packet through these identifiers
  • Receivers
  • trigger maps id to <stack of ids>
  • Sender can also specify id-stack in packet
  • Mechanism
  • first id used to match trigger
  • rest added to the RHS of trigger
  • recursively continued
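
The mechanism can be sketched as below; the identifiers, the transcoder, and the simplification that delivering to an address just prints and continues with the rest of the stack are assumptions made for illustration (in i3 the receiving host would forward the packet itself).

```python
# Identifier-stack routing: pop the head id, replace it with the matching
# trigger's right-hand side, and recurse until addresses are reached.
triggers = {
    "id_jpg": ["id_gif_to_jpg", "addr_R"],    # receiver-installed chain
    "id_gif_to_jpg": ["addr_transcoder"],
}

def route(stack, payload):
    head, rest = stack[0], stack[1:]
    if head.startswith("addr_"):
        print(f"deliver {payload!r} to {head}")
        if rest:
            route(rest, payload)              # continue with the remaining ids
        return
    route(triggers[head] + rest, payload)     # RHS of the trigger + rest

route(["id_jpg"], "gif bytes")                # transcoder first, then receiver R
```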

11
Service Composition
  • Receiver-mediated: R sets up the chain and passes
    id_gif/jpg to the sender; the sender is oblivious
  • Sender-mediated: S can include (id_gif/jpg, ID)
    in its packet; the receiver is oblivious

[Figure: the Sender (GIF) sends to (ID_GIF/JPG, ID); the transcoder S_GIF/JPG converts the data and forwards it via ID to Receiver R (JPG)]
12
Public, Private Triggers
  • Servers publish their public ids, e.g., via DNS
  • Clients contact server using public ids, and
    negotiate private ids used thereafter
  • Useful
  • Efficiency -- private ids chosen on close-by
    i3-servers
  • Security -- private ids are shared-secrets
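
A sketch of the handshake, assuming an in-memory trigger table and a made-up public id; the point is only that the public id bootstraps the exchange and the private ids, acting as shared secrets, carry all later traffic.

```python
# Public/private trigger negotiation.
import secrets

triggers = {}                                    # id -> receiver address

def insert_trigger(ident, addr): triggers[ident] = addr
def send(ident, payload): print(f"to {triggers[ident]}: {payload}")

# The server publishes a public id (e.g., via DNS) and listens on it.
insert_trigger("public:chat-service", "server")

# The client picks a private id, subscribes to it, and sends it to the
# server over the public id.
client_priv = secrets.token_hex(16)
insert_trigger(client_priv, "client")
send("public:chat-service", {"reply_to": client_priv})

# The server picks its own private id and returns it over the client's.
server_priv = secrets.token_hex(16)
insert_trigger(server_priv, "server")
send(client_priv, {"use": server_priv})

# All further traffic uses only the private ids; they are shared secrets and
# can be placed on nearby i3 servers for efficiency.
send(server_priv, "first private message")
```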

13
Scalable Multicast
  • Replication possible at any i3-server in the
    infrastructure.
  • Tree construction can be done internally

[Figure: trigger g points to R1, R2, and x; trigger x points to R3 and R4; a packet (g, data) is replicated down this tree to R1-R4]
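
The tree in the figure can be expressed as chained triggers, as in the sketch below (the in-memory table and the "R" prefix for end hosts are assumptions).

```python
# Multicast tree from chained triggers: g fans out to R1, R2 and an interior
# trigger x, which replicates further to R3 and R4.
triggers = {
    "g": ["R1", "R2", "x"],
    "x": ["R3", "R4"],
}

def multicast(ident, data):
    for target in triggers.get(ident, []):
        if target in triggers:       # another trigger: replicate inside i3
            multicast(target, data)
        else:                        # an end host: deliver
            print(f"{data!r} -> {target}")

multicast("g", "update")             # reaches R1, R2, R3, R4
```
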
14
Evaluation
  • Efficiency
  • Metric: latency stretch (i3 path latency relative to direct IP latency)
  • Sender → i3 may take many hops
  • Sender → i3 → Receiver: triangle routing
  • Heuristics reduce stretch to about 1.5
  • Good enough?
  • Performance
  • Overheads
  • lookups in hash table, finger table
  • What speeds can this support?
  • Indirection layer over IP
  • Decoupling of senders and receivers
  • One framework for various new abstractions
  • Scalable, incrementally deployable
  • Efficiency?

15
Switch tracks
I don't understand any DHT stuff; it's all
unreal. All I understand is FILE SHARING!
16
P2P Applications
  • Centralized model
  • e.g., Napster
  • global index held by central authority
  • direct contact between requestors and providers

[Figure: NAPSTER, with a central index server]
17
P2P Applications
  • Decentralized model
  • e.g., Freenet, Gnutella
  • no global index; local knowledge only
  • (approximate answers)
  • contact mediated by a chain of intermediaries

[Figure: FREENET or GNUTELLA (fully decentralized) and KAZAA (with index servers)]
18
What is Freenet and Why?
  • Distributed, peer-to-peer file-sharing system
  • Complete anonymity for producers and consumers
    of information
  • Resistance to attempts by third parties to deny
    access to information

19
Freenet How it works
  • Data structure
  • Key Management
  • Problems
  • How can one node know about others?
  • How can it get data from remote nodes?
  • How are new nodes added to Freenet?
  • How does Freenet manage its data?

20
Data structure
  • Each document is associated with a key
  • Routing Table
  • <address, key value> pairs
  • Data Structure should be able to
  • rapidly find the document given a certain key
  • rapidly find the closest key to a given key
  • keep track of the popularity of documents and know
    which document to delete when under pressure
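
A sketch of these structures, assuming integer keys for readability and an LRU-ordered store as a stand-in for "delete the least popular document under pressure".

```python
# A Freenet node's core data structures: a routing table mapping keys to the
# addresses of other nodes, and a bounded datastore evicted in LRU order.
from collections import OrderedDict

class FreenetNode:
    def __init__(self, capacity=4):
        self.routing = {}                   # key -> address of the next node to try
        self.store = OrderedDict()          # key -> document, in LRU order
        self.capacity = capacity

    def closest_key(self, key):
        # Find the closest known key (linear scan here; a sorted structure
        # would make this fast).
        return min(self.routing, key=lambda k: abs(k - key), default=None)

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return self.store[key]
        return None                         # would forward to closest_key(key)

    def put(self, key, doc):
        self.store[key] = doc
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used doc

node = FreenetNode()
node.routing = {10: "peer-a", 42: "peer-b", 99: "peer-c"}
print(node.closest_key(40))                 # 42, so the request goes to peer-b
```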

21
Key Management (1)
  • A way to locate a document anywhere
  • Keys are used to form a URI
  • Keyword-signed Key (KSK)
  • Based on a short descriptive string, usually a
    set of keywords that can describe the document
  • Example: University/cmu/cs/ashu
  • Uniquely identifies a document
  • Potential problem: global namespace

22
Key Management (2)
  • Signed-subspace Key (SSK)
  • Add sender information to avoid namespace
    conflict
  • Private key to sign / public key to verify
  • Content-hash Key (CHK)
  • Hash of the document
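
The three key types can be sketched as below; SHA-1 stands in for the real key-derivation details, and the SSK signature step is only indicated in a comment rather than computed.

```python
# Freenet key types, sketched with plain hashing.
import hashlib

def ksk(description: str) -> str:
    # Keyword-signed key: derived from a short descriptive string. The
    # namespace is global and flat, so two users can collide on one string.
    return hashlib.sha1(description.encode()).hexdigest()

def ssk(author_public_key: bytes, description: str) -> str:
    # Signed-subspace key: the author's public key is mixed in, so the
    # description only needs to be unique within that author's subspace.
    # (The stored document is signed with the matching private key.)
    return hashlib.sha1(author_public_key + description.encode()).hexdigest()

def chk(document: bytes) -> str:
    # Content-hash key: the hash of the document itself, so any change to
    # the content changes the key.
    return hashlib.sha1(document).hexdigest()

print(ksk("University/cmu/cs/ashu"))
print(ssk(b"\x02" * 32, "cs/ashu"))
print(chk(b"the document bytes"))
```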

23
  • Forward to the nearest untried key
  • Perform a depth-first search
  • On success, return data to the upstream requestor
  • Cache the data source

[Figure: a request routed across nodes A, B, C, D, I, with "A, Help me!" and "Sorry, No" callouts]
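
A sketch of this search on a tiny, made-up topology (integer keys, unlimited hops); real Freenet bounds the search with a hops-to-live counter.

```python
# Freenet-style request routing: forward to the closest untried key, back-
# track on failure, and cache the data along the successful return path.
NODES = {
    "A": {"peers": {10: "B", 90: "D"}, "store": {}},
    "B": {"peers": {50: "C"},          "store": {}},
    "C": {"peers": {},                 "store": {55: "the data"}},
    "D": {"peers": {},                 "store": {}},
}

def request(node, key, visited=None):
    visited = visited if visited is not None else set()
    visited.add(node)
    n = NODES[node]
    if key in n["store"]:
        return n["store"][key], node
    # Depth-first: try peers in order of key closeness, skipping visited ones.
    for k in sorted(n["peers"], key=lambda k: abs(k - key)):
        nxt = n["peers"][k]
        if nxt in visited:
            continue
        result = request(nxt, key, visited)
        if result is not None:
            data, source = result
            n["store"][key] = data      # cache on the way back upstream
            return data, source
    return None                         # "Sorry, no": backtrack

print(request("A", 55))                 # found via A -> D (miss) -> B -> C
```
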
24
Routing algorithm characteristics
Data partitioning
  • Key clustering
  • Nodes know about keys similar to theirs
  • Store clusters of files with similar keys
  • Popular data gets cached more
  • Seamless replication to avoid hot-spots
  • As time progresses, connectivity increases

25
File insertion
  • Query the file key
  • A response → key collision
  • Re-send with a different key
  • On success, nodes cache the file with a pointer
    to the data source
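
A sketch of the insertion rule; the single dict standing in for the routed network and the salt-based re-keying are assumptions made to keep the example self-contained.

```python
# File insertion: query the key first; a response means a collision, so
# re-send with a different key; otherwise store (and, in Freenet, every node
# along the insert path caches the file with a pointer to the data source).
import hashlib

def lookup(store, key):
    return store.get(key)               # a routed query in real Freenet

def insert(store, doc: bytes, salt: int = 0):
    key = hashlib.sha1(doc + str(salt).encode()).hexdigest()
    if lookup(store, key) is not None:
        return insert(store, doc, salt + 1)   # collision: pick another key
    store[key] = doc
    return key

store = {}
print(insert(store, b"report.pdf contents"))
```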

26
Node join
  • Need to assign a key to the node
  • Two options
  • Existing node chooses the key
  • Joining node chooses its key
  • What's the problem?
  • Uses a bit commitment protocol
  • hash(a) → hash(b, hash(a))
  • → hash(c, hash(b, hash(a)))
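
A sketch of that chain; combining each node's seed with the running hash by concatenation is an assumption (the point is only that no single node, including the joining one, can unilaterally choose the final key).

```python
# Node-join commitment chain: the new node commits to a random seed, and each
# node on a random walk folds in its own seed, hashing the running value.
import hashlib, secrets

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def join(path_length=3):
    a = secrets.token_bytes(20)          # seed chosen by the joining node
    chain = h(a)                         # it announces hash(a), committing to a
    for _ in range(path_length):
        b = secrets.token_bytes(20)      # seed of the next node on the walk
        chain = h(b + chain)             # hash(b, hash(a)), hash(c, hash(b, hash(a))), ...
    # The final value determines the new node's key; the seeds are then
    # revealed so every participant can verify the commitments.
    return chain.hex()

print(join())
```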

27
Anonymity
  • Sender remains anonymous
  • The data-source field is randomly reset as the packet
    traverses the network
  • Pre-routing through mix-nets can enhance this further
  • Receiver (or key) anonymity
  • mix-nets

28
Network convergence
  • [Plot: X-axis: time; Y-axis: pathlength]
  • 1000 nodes, 50-item datastores, 250-entry
    routing tables
  • The routing tables were initialized to a
    ring-lattice topology
  • Pathlength: the number of hops actually taken
    before finding the data

29
Scalability
  • [Plot: X-axis: # of nodes; Y-axis: pathlength]
  • The relation between network size and average
    pathlength
  • Initially 20 nodes; nodes are added regularly

30
Small-world Model
  • [Plot: X-axis: # of links; Y-axis: fraction of nodes,
    log scale]
  • Most nodes have only a few connections, while a
    small number of nodes have a large set of
    connections

WHY?
Power law
31
What's good?
  • Distributed storage and retrieval
  • Anonymity
  • Adaptive replication
  • based on usage patterns

Anything else?
32
Is it perfect?
  • Query path-length
  • Not bounded
  • Difficult to know the cause of search failures
  • Document did not exist?
  • Could not find it?

Anything else?
33
Switch tracks
How does file sharing change the Internet?
34
File Sharing
  • We know it's everywhere!
  • Characteristics completely different from the web
  • How does this change the Internet as we know it?
  • Traffic patterns

35
Users are patient
batch mode delivery!
36
Aging!
  • Older clients request less
  • The chance of using a client remains the same
  • Amount of information requested decreases

37
Audio-Video
Small objects → audio; large objects → video
38
Object Dynamics
  • Fetch-at-most-once
  • Short-lived popularity
  • Recently born objects most popular
  • Most requests are for old objects

39
File sharing is not Zipf!
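
A small simulation sketch of why this happens: objects are chosen with Zipf popularity, but each client fetches any given object at most once, which clips the head of the request distribution (the client, object, and request counts are made up).

```python
# Fetch-at-most-once flattens a Zipf popularity curve.
import random
from collections import Counter

random.seed(0)
N_CLIENTS, N_OBJECTS, REQS_PER_CLIENT = 1000, 500, 50
weights = [1 / (rank + 1) for rank in range(N_OBJECTS)]   # Zipf(1) popularity

requests = Counter()
for _ in range(N_CLIENTS):
    owned = set()
    for _ in range(REQS_PER_CLIENT):
        obj = random.choices(range(N_OBJECTS), weights=weights)[0]
        if obj in owned:
            continue                     # fetch-at-most-once: no re-downloads
        owned.add(obj)
        requests[obj] += 1

# Under pure Zipf the top object would get roughly 15% of all 50,000 requests;
# here no object can get more than one request per client.
print(requests.most_common(5))
```
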
40
Conclusions
  • Many other interesting aspects
  • Some obvious, some not
  • Contribution
  • Fetch-at-most-once
  • Also, study of locality in file-sharing workload
  • significant locality
  • substantial opportunity for caching

P2P systems are here to stay!