Computer networks - PowerPoint PPT Presentation


1
Computer networks
  • Lecture 11: P2P, Semantic P2P
  • Prof. Younghee Lee

2
Peer-to-Peer?
  • Centralized server
  • Distributed server
  • Client-server paradigm
  • Flat: RPC
  • Hierarchical: DNS, mount
  • Peer-to-peer paradigm
  • Each peer is both a client and a transient server (easy part)
  • How does a peer determine which peers have the desired content?
  • Connecting to peers that have copies of the desired object (difficult part)
  • A dynamic member list makes it more difficult
  • Pure: Gnutella, Chord
  • Hybrid: Napster, Groove
  • Other challenges
  • Scalability: up to hundreds of thousands or millions of machines
  • Dynamicity: machines can come and go at any time

3
P2P file sharing
  • Napster
  • Centralized, sophisticated search
  • Client-server search
  • Point-to-point file transfer
  • Gnutella
  • Decentralized directory
  • Flooding, TTL, unreachable nodes
  • FastTrack (KaZaA)
  • Heterogeneous peers
  • Freenet
  • Anonymity, caching, replication

4
Routing: Structured Approaches
  • Goal: make sure that an item (file) is always found in a reasonable number of steps
  • Abstraction: a distributed hash table (DHT) data structure
  • insert(id, item)
  • item = query(id)
  • Note: item can be anything: a data object, document, file, pointer to a file
  • Proposals
  • CAN (ICIR/Berkeley)
  • Chord (MIT/Berkeley)
  • Pastry (Rice)
  • Tapestry (Berkeley)

5
High-level idea: Indirection
  • Indirection in space
  • Logical IDs (content-based)
  • Routing to those IDs
  • Content-addressable network
  • Tolerant of nodes joining and leaving the network
  • Indirection in time
  • Scheme to temporally decouple send and receive
  • Soft state: publisher requests a TTL on storage
  • Distributed Hash Table

6
Distributed Hash Table (DHT)
  • Hash table
  • Data structure that maps keys to values
  • DHT
  • A hash table, but spread across the Internet
  • Interface
  • insert(key, value)
  • lookup(key)
  • Every DHT node supports a single operation
  • Given a key as input, route messages toward the node holding the key
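As a toy illustration of this interface, the sketch below spreads a key-value store across a few named nodes by hashing the key. The node names and the simple mod-N placement are assumptions for illustration; a real DHT routes messages hop by hop instead of computing the owner directly.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # assumed peer set
tables = {n: {} for n in NODES}          # each node's local store

def owner(key):
    """Map a key to the node responsible for it (toy placement)."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def insert(key, value):
    tables[owner(key)][key] = value

def lookup(key):
    return tables[owner(key)].get(key)

insert("song.mp3", "peer 10.0.0.5")
assert lookup("song.mp3") == "peer 10.0.0.5"
```

Mod-N placement breaks badly when N changes, which is exactly why real DHTs use consistent hashing instead (next slides).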

7
DHT in action: put()
[Figure: insert(K1, V1) is routed across the overlay to the node that stores (K1, V1)]
Operation: take a key as input, route messages to the node holding the key
8
DHT in action: get()
[Figure: retrieve(K1) is routed to the node holding K1]
Operation: take a key as input, route messages to the node holding the key
9
Routing: Chord
  • Associate to each node and item a unique ID in a uni-dimensional space
  • Goals
  • Scale to hundreds of thousands of nodes
  • Handle rapid arrival and failure of nodes
  • Properties
  • Routing table size O(log(N)), where N is the total number of nodes
  • Guarantees that a file is found in O(log(N)) steps

10
Aside: Consistent Hashing [Karger97]
  • A key is stored at its successor: the node with the next-higher ID
  • This is designed to let nodes enter and leave the network with minimal disruption
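The successor rule can be sketched as follows; the small ID space and the node IDs are made up for illustration. Note how a joining node takes over only the keys between its predecessor and itself, which is the "minimal disruption" property.

```python
import bisect

M = 2 ** 6                          # small ID space for illustration
node_ids = sorted([8, 21, 40, 55])  # assumed node IDs on the ring

def successor(key_id):
    """First node ID clockwise from key_id on the ring (wraps around)."""
    i = bisect.bisect_left(node_ids, key_id % M)
    return node_ids[i % len(node_ids)]

assert successor(22) == 40
assert successor(56) == 8           # wraps past the largest node ID

# When node 30 joins, only keys in (21, 30] move (from node 40):
node_ids = sorted(node_ids + [30])
assert successor(25) == 30
```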

11
Routing: Chord Basic Lookup
12
Routing: Finger Table (Faster Lookups)
13
Routing: Join Operation
14
Routing: Chord Summary
  • Assume the identifier space is 0..2^m
  • Each node maintains
  • Finger table
  • Entry i in the finger table of n is the first node that succeeds or equals n + 2^i
  • Predecessor node
  • An item identified by id is stored on the successor node of id
  • Pastry
  • Similar to Chord
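The finger-table rule and the resulting O(log N) lookup can be sketched as below. The ring size (m = 6) and node IDs are assumptions for illustration, and the fallback "go straight to the successor" step stands in for Chord's final predecessor-to-successor hop.

```python
import bisect

m = 6
MOD = 2 ** m                       # identifier space 0..2^m - 1
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(x):
    """First node that succeeds or equals x on the ring."""
    i = bisect.bisect_left(nodes, x % MOD)
    return nodes[i % len(nodes)]

def fingers(n):
    """Entry i is the first node that succeeds or equals n + 2^i."""
    return [successor(n + 2 ** i) for i in range(m)]

def lookup(n, key):
    """Greedily follow fingers toward successor(key); count hops."""
    hops = 0
    while n != successor(key):
        # farthest finger that does not overshoot the key
        cand = [f for f in fingers(n)
                if f != n and (f - n) % MOD <= (key - n) % MOD]
        n = max(cand, key=lambda f: (f - n) % MOD) if cand else successor(key)
        hops += 1
    return n, hops

assert lookup(8, 54) == (56, 3)    # 8 -> 42 -> 51 -> 56
```

Each hop at least halves the remaining distance on the ring, which is where the O(log N) bound comes from.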

15
CAN
  • Associate to each node and item a unique ID in a d-dimensional space
  • Virtual Cartesian coordinate space
  • Entire space is partitioned amongst all the nodes
  • Every node owns a zone in the overall space
  • Abstraction
  • Can store data at points in the space
  • Can route from one point to another
  • Point → the node that owns the enclosing zone
  • Properties
  • Routing table size O(d)
  • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
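A greedy CAN-style forwarding step can be sketched as follows. To keep it short, each zone is approximated by its owner's point and the neighbor lists are assumed, so this is an illustration of "forward to the neighbor closest to the target point," not the real zone-based algorithm. The coordinates come from the example slides that follow.

```python
# Node coordinates from the 2-D example; adjacency is assumed.
nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {
    "n1": ["n2", "n3"], "n2": ["n1", "n4"],
    "n3": ["n1", "n4", "n5"], "n4": ["n2", "n3", "n5"],
    "n5": ["n3", "n4"],
}

def dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def route(start, point):
    """Greedily forward to the neighbor closest to the target point."""
    path, cur = [start], start
    while True:
        best = min(neighbors[cur], key=lambda n: dist(nodes[n], point))
        if dist(nodes[best], point) >= dist(nodes[cur], point):
            return path            # no neighbor is closer: stop here
        path.append(best)
        cur = best

print(route("n1", (7, 5)))         # greedy path from n1 toward (7, 5)
```

Each hop makes progress in the coordinate space, and with n zones in d dimensions the expected path length is O(d·n^(1/d)).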

16
CAN Example: Two-Dimensional Space
  • Space divided between nodes
  • All nodes together cover the entire space
  • Each node covers either a square or a rectangular area with ratio 1:2 or 2:1
  • Example
  • Assume space size (8 x 8)
  • Node n1(1, 2) is the first node that joins → it covers the entire space

[Figure: 8×8 coordinate space (axes 0–7); n1 owns the entire space]
17
CAN Example: Two-Dimensional Space
  • Node n2(4, 2) joins → the space is divided between n1 and n2

[Figure: the space is split between n1 (left) and n2 (right)]
18
CAN Example: Two-Dimensional Space
  • Node n3(3, 5) joins → n1's zone is divided between n1 and n3

[Figure: n3 takes the upper part of n1's former zone]
19
CAN Example: Two-Dimensional Space
  • Nodes n4(5, 5) and n5(6, 6) join

[Figure: n4 and n5 now own zones in the upper-right region]
20
CAN Example: Two-Dimensional Space
  • Nodes: n1(1, 2), n2(4, 2), n3(3, 5), n4(5, 5), n5(6, 6)
  • Items: f1(2, 3), f2(5, 1), f3(2, 1), f4(7, 5)

[Figure: nodes n1–n5 and items f1–f4 placed at their coordinates]
21
CAN Example: Two-Dimensional Space
  • Each item is stored by the node that owns the point the item maps to in the space

[Figure: each item fi lies inside the zone of its owning node]
22
CAN Query Example
  • Each node knows its neighbors in the d-space
  • Forward the query to the neighbor that is closest to the query ID
  • Example: assume n1 queries for f4

[Figure: the query for f4(7, 5) hops from n1 across neighboring zones]
23
Routing: Concerns/Optimizations
  • Each hop in a routing-based P2P network can be expensive
  • No correlation between neighbors and their location
  • A query can repeatedly jump from Europe to North America, though both the initiator and the node that stores the item are in Europe!
  • Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance
  • CAN/Chord optimizations
  • Weight neighbor nodes by RTT
  • When routing, choose the neighbor that is closer to the destination with the lowest RTT from me
  • Reduces path latency
  • Multiple physical nodes per virtual node
  • Reduces path length (fewer virtual nodes)
  • Reduces path latency (can choose the physical node from a virtual node with the lowest RTT)
  • Improved fault tolerance (only one node per zone needs to survive to allow routing through the zone)
  • What type of lookups?
  • Only exact match!
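The RTT-weighted neighbor choice described above might look like this in outline; the neighbor records, the `progress` metric (hops saved toward the destination), and the RTT values are all invented for illustration:

```python
# Among neighbors that make the most progress toward the destination,
# pick the one with the lowest measured RTT.
neighbors = [
    {"id": "peer-a", "progress": 3, "rtt_ms": 120},
    {"id": "peer-b", "progress": 3, "rtt_ms": 15},
    {"id": "peer-c", "progress": 1, "rtt_ms": 5},
]

def next_hop(neighbors):
    best_progress = max(n["progress"] for n in neighbors)
    closest = [n for n in neighbors if n["progress"] == best_progress]
    return min(closest, key=lambda n: n["rtt_ms"])

assert next_hop(neighbors)["id"] == "peer-b"
```

Note that peer-c has the lowest RTT overall but is not chosen: progress toward the destination is ranked first, so the path stays short while each hop stays cheap.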

24
BitTorrent
  • A P2P file sharing system
  • Load sharing through file splitting
  • Uses the bandwidth of peers instead of a server
  • Successfully used
  • Used to distribute Red Hat 9 ISOs (about 80 TB)
  • Setup
  • A seed node has the file
  • The file is split into fixed-size segments (typically 256 KB)
  • A hash is calculated for each segment
  • A tracker node is associated with the file
  • A .torrent meta-file is built for the file; it identifies the address of the tracker node
  • The .torrent file is passed around the web

25
BitTorrent
  • Download
  • A client contacts the tracker identified in the .torrent file (using HTTP)
  • The tracker sends the client a (random) list of peers who have/are downloading the file
  • The client contacts peers on the list to see which segments of the file they have
  • The client requests segments from peers (via TCP)
  • The client uses the hash from the .torrent to confirm that a segment is legitimate
  • The client reports to other peers on the list that it has the segment
  • Other peers start to contact the client to get the segment (while the client is getting other segments)
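The integrity check in the download steps above can be sketched as follows: the .torrent carries one SHA-1 hash per fixed-size segment, and a downloaded segment is accepted only if its hash matches. The file content here is made up for illustration.

```python
import hashlib

PIECE_SIZE = 256 * 1024                    # typical segment size
data = b"x" * (PIECE_SIZE + 1000)          # pretend file content
pieces = [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]

# In reality these hashes come from the .torrent meta-file:
torrent_hashes = [hashlib.sha1(p).digest() for p in pieces]

def verify(index, received):
    """Accept a downloaded segment only if its SHA-1 matches the .torrent."""
    return hashlib.sha1(received).digest() == torrent_hashes[index]

assert verify(0, pieces[0])                        # legitimate segment
assert not verify(0, b"tampered" + pieces[0][8:])  # corrupted segment rejected
```

Because each segment is verified independently, a client can safely fetch different segments from many untrusted peers in parallel.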

26
Conclusions
  • Distributed Hash Tables are a key component of
    scalable and robust overlay networks
  • CAN: O(d) state, O(d·n^(1/d)) distance
  • Chord: O(log n) state, O(log n) distance
  • Both can achieve stretch
  • Simplicity is key
  • Services built on top of distributed hash tables
  • p2p file storage, i3 (chord)
  • multicast (CAN, Tapestry)
  • persistent storage (OceanStore using Tapestry)

27
Semantic-based P2P systems [1]
  • P2P systems: no notion of semantics → difficulty in knowledge sharing
  • Semantic web
  • Knowledge sharing among different nodes with possibly different schemas; uses a centralized repository
  • Combining P2P and semantic systems: a large-scale collection of structured data
  • Now
  • Several efforts in this direction
  • No efficient, scalable semantic P2P system yet → an important area of research

28
Semantic-based P2P systems [1]
  • Efforts in this direction: 3 categories
  • Addressing storing and querying of RDF stores
  • Resource Description Framework (RDF): a general method of modeling information through a variety of syntax formats
  • Addressing issues of semantic interoperability in a peering setting
  • Attempts to build scalable infrastructures for semantic systems
  • GridVine: semantic interoperability + scalability issues
  • (Benchmarking systems on this now)

29
Various approaches
  • Addressing storing and querying of RDF stores
  • RDF data in a distributed environment
  • A mediator stores a central schema and does query reformulation
  • Different mediator architectures: Hierarchical Mediator Architecture (HMA)
  • Semantic query routing in unstructured networks using social metaphors
  • A human may observe the communication between others and learn who may be able to answer the queries
  • Expertise-based peer selection
  • Addressing issues of semantic interoperability in a peering setting
  • The semantic interoperability problem is addressed by focusing on the (pairwise) mapping between dynamic schemas
  • Semantic Mapping by Approximation
  • An approximation value that indicates how strongly one concept is a subconcept of a second is calculated for each pair of concepts
  • Satisficing Ontology Mapping: satisficing decision maker [3]

30
Various approaches
  • Attempts to build scalable infrastructures for semantic systems
  • Active XML
  • Dynamic XML documents over web services for distributed data integration
  • Edutella
  • Attempts to design and implement a schema-based P2P infrastructure for the semantic web; it focuses on the exchange of learning material
  • Piazza
  • A peer data management system that facilitates decentralized sharing of heterogeneous data
  • PIER
  • P2P Information Exchange and Retrieval (PIER): a P2P query engine for query processing in Internet-scale distributed systems
  • PeerDB
  • An object management system that provides sophisticated searching capabilities
  • GridVine [4]
  • The first attempt at using a structured overlay network (namely P-Grid) to realize semantic overlays
  • It realizes semantic overlays by separating a logical layer from a physical layer, applying the well-known database principle of data independence

31
Various approaches
  • Semantic-based peer-to-peer systems
  • SWAP project (Semantic Web and Peer-to-Peer)
  • X-Leges system: for legislative document exchange
  • Semantic-based P2P system for local e-Government (Madrid university)
  • Many e-government efforts within semantic web + P2P
  • DIP projects, IFIP WG8.5, … e-government
  • Semantic Grid resource discovery using DHTs in Atlas (Athens university)
  • Resource discovery in the semantic Grid: RDF-based query answering on top of P2P networks
  • A semantic-web-service-based P2P infrastructure for the interoperability of medical information systems (METU, Turkey)

32
Semantic Overlay Networks in P2P [2]
  • Hash-based queries scale, but are not semantic
  • How about forwarding queries to peers that are more likely to have what you are looking for?
  • Ontological structure
  • A query is routed within each relevant cluster only
  • Reduces flood messages (compared with searching for content by flooding)

33
Semantic Overlay Networks in P2P
Semantic Overlay Network
  • An overlay network associated with a concept in a classification hierarchy

34
Semantic Overlay Networks in P2P
  • Generating semantic overlay networks
  • Fewer nodes per SON → more results sooner
  • Fewer SONs per node → fewer connections
  • Coverage?

35
Semantic Overlay Networks in P2P
  • More SONs per node → high coverage
  • But too many links → many query messages
  • Layered SONs
  • Choosing SONs to join
  • c1, c2, c9 (for c3, c4), c12 (for c5, c6, … c8)

[Figure: hierarchy of concepts]
36
Semantic Overlay Networks in P2P
  • Problems?

37
References
  • [1] Vijay Srinivas Agneeswaran. A Survey of Semantic-Based Peer-to-Peer Systems. Distributed Information Systems Lab (LSIR), École Polytechnique Fédérale de Lausanne.
  • [2] Arturo Crespo et al. Semantic Overlay Networks for P2P Systems. Stanford University.
  • [3] Marc Ehrig and Steffen Staab. Satisficing Ontology Mapping. In Steffen Staab and Heiner Stuckenschmidt, editors, Semantic Web and Peer-to-Peer. Springer-Verlag, Berlin Heidelberg, 2006.
  • [4] Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth, and Tim Van Pelt. GridVine: Building Internet-Scale Semantic Overlay Networks. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors, International Semantic Web Conference, volume 3298 of Lecture Notes in Computer Science, pages 107–121. Springer, 2004.

2009-07-15
ICE0602 Ubiquitous Networking