Information Retrieval Techniques for Peer-to-Peer Networks

1
Information Retrieval Techniques for Peer-to-Peer
Networks
  • Demetrios Zeinalipour-Yazti
  • Dimitrios Gunopulos
  • Vana Kalogeraki
  • Computing in Science & Engineering, IEEE, 2004

2
Outline
  • Introduction
  • Peer-to-Peer network Information Retrieval
    techniques
  • Experiments
  • Conclusion

3
Introduction (cont.)
  • Powerful PCs are shifting the client-server model
    toward a peer-to-peer (P2P) topology.
  • Peers collaborate in an ad hoc manner and share
    information in large-scale distributed
    environments without centralized coordination.
  • P2P offers an alternative model to the
    architecture of centralized Web crawlers (JXTA).

4
Introduction (cont.)
  • The P2P information-retrieval (IR) environment we
    discuss here assumes that each peer has a database
    of documents that it shares in the network.
  • A node sends Query messages, which contain sets
    of keywords, to its peers.
  • The peer's reply message contains pointers to the
    matching documents if the evaluation is
    successful.

5
Introduction
  • Traditional IR algorithms do not apply directly
    to P2P systems because there is no central
    repository.
  • Improving search capabilities is an important
    step in making P2P systems applicable to a wide
    set of applications beyond simple object storage.

6
P2P Network IR Techniques (cont.)
  • Breadth-First Search (BFS)
  • Used in Gnutella.
  • Node q sends a Query message to all its
    neighbors (peers).
  • Each peer p forwards the query to all its peers
    except the sender and then searches its local
    repository.
  • QueryHit messages travel back along the same
    paths that carried the Query messages.
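
The flooding behaviour described above can be sketched as a small simulation. This is only an illustration, not the Gnutella implementation: the overlay dictionary, the `matches` set, and the function name `bfs_flood` are assumptions made for the example, and duplicate queries are assumed to be dropped on arrival.

```python
from collections import deque

def bfs_flood(graph, source, matches):
    """Flood a query from `source`; return (hits, messages_sent).

    graph:   dict peer -> list of neighbor peers (hypothetical overlay)
    matches: set of peers whose local repository satisfies the query
    """
    parent = {source: None}       # who first delivered the query to each peer
    queue = deque([source])
    messages = 0
    hits = {}
    while queue:
        peer = queue.popleft()
        if peer != source and peer in matches:
            # The QueryHit retraces the query's path back to the source.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits[peer] = path
        for neighbor in graph[peer]:
            if neighbor == parent[peer]:
                continue          # never send the query back to the sender
            messages += 1
            if neighbor not in parent:   # assumed: duplicates are dropped
                parent[neighbor] = peer
                queue.append(neighbor)
    return hits, messages

overlay = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q", "c"], "c": ["a", "b"]}
hits, messages = bfs_flood(overlay, "q", {"c"})
```

On this four-peer overlay the single matching peer is found, but five Query messages are sent to reach three peers, which hints at the message overhead of flooding.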

7
BFS (cont.)
8
BFS
  • Too many messages
  • network utilization
  • processing resources
  • Mitigated by associating each query with a
    time-to-live (TTL) parameter.
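
A minimal sketch of how a TTL bounds the flood, assuming the TTL is decremented once per hop and the query is discarded when it reaches zero. The function name `flood_with_ttl` and the chain-shaped overlay are hypothetical.

```python
def flood_with_ttl(graph, source, ttl):
    """Return (peers_reached, messages_sent) for a TTL-limited flood."""
    seen = {source}
    frontier = [(source, None)]   # (peer, sender it received the query from)
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer, sender in frontier:
            for neighbor in graph[peer]:
                if neighbor == sender:
                    continue
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append((neighbor, peer))
        frontier = next_frontier
        ttl -= 1                  # one hop consumed per round
    return seen - {source}, messages

chain = {"q": ["a"], "a": ["q", "b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
near, near_msgs = flood_with_ttl(chain, "q", 2)
far, far_msgs = flood_with_ttl(chain, "q", 4)
```

A smaller TTL sends fewer messages but also reaches fewer peers, so it trades recall for network load.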

9
Random Breadth-First-Search (RBFS) (cont.)
  • Peer q forwards the search message to only a
    fraction of its peers, selected at random.
  • Does not require global knowledge.
  • The query might not reach some large network
    segments, because nodes do not know that a
    particular link could take the query to that
    large segment.
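
The forwarding step of RBFS might look like the sketch below. The function name `rbfs_targets`, the `fraction` parameter, and the choice to always forward to at least one peer are assumptions made for illustration.

```python
import math
import random

def rbfs_targets(neighbors, sender, fraction, rng=random):
    """Pick a random fraction of the neighbors, excluding the sender."""
    candidates = [p for p in neighbors if p != sender]
    if not candidates:
        return []
    # Assumption: forward to at least one peer so the query keeps moving.
    count = max(1, math.ceil(fraction * len(candidates)))
    return rng.sample(candidates, count)

picked = rbfs_targets(["a", "b", "c", "d", "s"], "s", 0.5, random.Random(0))
```

Because the choice is blind, nothing prevents the sample from missing the one link that leads to a large part of the network.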

10
RBFS
11
Intelligent Search Mechanism (ISM) (cont.)
  • A peer propagates the query message to those
    peers that are more likely to reply.
  • Profile mechanism: each peer keeps a profile of
    its neighboring peers.
  • Relevance rank: uses the profiles to select the
    most relevant neighbors.
  • Once the profile repository is full, the node
    employs a least recently used (LRU) policy to
    keep the most recent queries.
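
One way to picture the bounded profile is an ordered table that evicts its least recently used entry. The class `PeerProfile`, its `capacity` parameter, and the choice to store a result count per query are assumptions; the slides do not specify the profile layout.

```python
from collections import OrderedDict

class PeerProfile:
    """Bounded table of the most recent queries a neighbor has answered."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()      # query text -> number of results

    def record(self, query, num_results):
        if query in self.entries:
            self.entries.move_to_end(query)    # refresh recency
        self.entries[query] = num_results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

profile = PeerProfile(capacity=2)
profile.record("p2p search", 1)
profile.record("gnutella", 2)
profile.record("p2p search", 3)   # refreshed, so "gnutella" is now oldest
profile.record("lru cache", 1)    # evicts "gnutella"
```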

12
ISM (cont.)
13
ISM (cont.)
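  • $RR_{P_l}(P_i, q) = \sum_{j} Qsim(q_j, q)^{\alpha} \cdot S(P_i, q_j)$,
    summed over the queries $q_j$ that $P_i$ answered.
  • Example: $P_1$ replied to queries $q_1$ and $q_2$ with
    similarities $Qsim(q_1, q) = 0.5$ and $Qsim(q_2, q) = 0.1$
    to the query $q$.
  • For $P_2$, $Qsim(q_3, q) = 0.4$ and $Qsim(q_4, q) = 0.3$.
  • $\alpha = 10$: $0.5^{10} + 0.1^{10} > 0.4^{10} + 0.3^{10}$
  • $\alpha = 1$: $0.5 + 0.1 < 0.4 + 0.3$
  • $\alpha = 0$: $1 + 1 = 1 + 1$ (>RES)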
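
The three cases of the example can be checked numerically. The sketch assumes that S(Pi, qj) is the number of results Pi returned for qj and that every answered query in the example returned exactly one result; the function name `relevance_rank` is hypothetical.

```python
def relevance_rank(answered, alpha):
    """answered: list of (qsim, num_results) pairs for one neighbor."""
    return sum(qsim ** alpha * results for qsim, results in answered)

# Assumed: each answered query returned exactly one result (S = 1).
p1 = [(0.5, 1), (0.1, 1)]
p2 = [(0.4, 1), (0.3, 1)]

favours_p1 = relevance_rank(p1, 10) > relevance_rank(p2, 10)
favours_p2 = relevance_rank(p1, 1) < relevance_rank(p2, 1)
tie = relevance_rank(p1, 0) == relevance_rank(p2, 0)
```

A large exponent rewards the single most similar past query, an exponent of 1 weighs all past queries equally, and an exponent of 0 ignores similarity and only counts results.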

14
ISM
  • Potential disadvantage:
  • A peer always queries the same neighbors.
  • Newly added neighbors are not given the
    opportunity to be explored.
  • Remedy: pick a small random subset of peers and
    add it to the set of most relevant peers for
    each query.
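
The remedy on this slide can be sketched as a selection of the top-ranked peers plus a few random extras. The names `select_peers`, `k`, and `extra`, and the example ranks, are assumptions made for illustration.

```python
import random

def select_peers(ranks, k, extra, rng=random):
    """ranks: dict peer -> relevance rank; returns the peers to query."""
    by_rank = sorted(ranks, key=ranks.get, reverse=True)
    chosen = by_rank[:k]                  # the k most relevant peers
    rest = by_rank[k:]
    # Add a few random peers so new or unexplored neighbors get a chance.
    chosen += rng.sample(rest, min(extra, len(rest)))
    return chosen

ranks = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.0}
chosen = select_peers(ranks, k=2, extra=1, rng=random.Random(1))
```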

15
>RES (cont.)
  • Forwards a query to a subset of peers based on
    some aggregated statistics.
  • A peer q forwards a search message to the k peers
    that returned the most results for the last
    m queries.
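
A minimal sketch of this heuristic, assuming each peer keeps a sliding window of the result counts from the last m queries it sent to each neighbor. The class `MostResultsSelector` and its methods are hypothetical names.

```python
from collections import deque

class MostResultsSelector:
    """Rank neighbors by results returned over their last m queries."""

    def __init__(self, m):
        self.m = m
        self.history = {}    # peer -> deque of recent result counts

    def record(self, peer, num_results):
        window = self.history.setdefault(peer, deque(maxlen=self.m))
        window.append(num_results)     # the oldest count falls out of the window

    def top_k(self, k):
        totals = {peer: sum(w) for peer, w in self.history.items()}
        return sorted(totals, key=totals.get, reverse=True)[:k]

selector = MostResultsSelector(m=2)
for count in (10, 0, 0):
    selector.record("a", count)    # the early burst of 10 is forgotten
for count in (1, 2):
    selector.record("b", count)
selector.record("c", 1)
best = selector.top_k(2)
```

Unlike the relevance rank, this statistic is purely quantitative: it ignores what the past queries were about.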

16
>RES
17
P2P Network IR Techniques (cont.)
  • Random-walker searches
  • Each node forwards a query message, called a
    walker, to one randomly chosen peer.
  • The number of messages increases linearly.
  • Adaptive probabilistic search (APS): uses feedback
    from previous searches (rather than forwarding
    randomly).
  • Local routing indices
  • A node knows which peers lead to the desired
    documents, but it doesn't know the exact path to
    those documents.
  • Push updates (each node sends information to its
    peers).
  • (It complements the ISM approach.)
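
The random-walker idea from this slide can be illustrated with a single walker that costs one message per hop. The function name `random_walk`, the `max_hops` limit, and the tiny overlay are assumptions; the overlay gives each peer exactly one neighbor only so that the outcome is deterministic.

```python
import random

def random_walk(graph, source, matches, max_hops, rng=random):
    """Return (peer_with_match or None, messages_sent) for one walker."""
    peer, messages = source, 0
    for _ in range(max_hops):
        peer = rng.choice(graph[peer])   # forward to one random neighbor
        messages += 1                    # one message per hop
        if peer in matches:
            return peer, messages
    return None, messages

# Each peer has a single neighbor here, so the walk is forced.
ring = {"q": ["a"], "a": ["b"], "b": ["q"]}
found, cost = random_walk(ring, "q", {"b"}, max_hops=5)
missed, short_cost = random_walk(ring, "q", {"b"}, max_hops=1)
```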

18
P2P Network IR Techniques
  • Randomized gossiping
  • scales poorly
  • Centralized approaches
  • A central repository indexes the documents shared
    by all peers.
  • Searching object identifiers
  • Uses object identifiers (a hash code of the file
    name) rather than keywords,
  • but can't capture the relevance of the documents.
  • Distributed IR
  • Assumes that the querying party has some
    statistical knowledge about each database's
    contents, so it needs a global view of the system.

19
PeerWare Infrastructure and Experiments (cont.)
  • Evaluation metrics
  • We implement only the algorithms that require
    local knowledge: BFS, RBFS, >RES, and ISM.
  • Recall rate: the fraction of documents each of
    the search mechanisms retrieves.
  • Technique efficiency: the number of messages
    needed to find the results.
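
The two metrics could be computed as in the sketch below. It assumes recall is measured against a reference set of relevant documents and that efficiency is reported as a ratio to a baseline message count; the function names and the sample numbers are illustrative and are not taken from the experiments.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def message_ratio(messages_used, baseline_messages):
    """Messages used, relative to a baseline technique."""
    return messages_used / baseline_messages

# Made-up numbers, for illustration only.
recall = recall_rate({"d1", "d2", "d3"}, {"d1", "d2", "d3", "d4"})
ratio = message_ratio(50, 200)
```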

20
Experiment (cont.)
  • TTL = 4 (BFS, ISM)
  • ISM achieved almost a 90% recall rate while using
    only 38% of the messages BFS required.
  • >RES and ISM started out with a low recall rate,
    reflecting how they initially choose their
    neighbors.

21
Experiment
  • TTL = 5
  • ISM uses only 57% of the messages but discovers
    almost all documents.
  • (ISM, RBFS, and >RES had the same query response
    time (QRT): 250 ms.)

22
Conclusion
  • The P2P computing model distributes the incurred
    network load among all interested parties.
  • Efficient search and retrieval remains a
    challenge to be explored.
  • Another challenge is to create overlay topologies
    in which close-by nodes have semantically
    related documents and interests.
  • Future work: make our middleware infrastructure
    publicly available and test it in a larger and
    more realistic environment.