PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities

1
PlanetP: Using Gossiping to Build Content
Addressable Peer-to-Peer Information Sharing
Communities
  • Jianzhi Wang

2
Outline
  • Introduction
  • Architecture
  • Performance
  • Conclusion and Future Work

3
Introduction
  • PlanetP is a content-addressable
    publish/subscribe service for unstructured
    peer-to-peer (P2P) communities.
  • PlanetP is composed of two components:
  • an infrastructural gossiping layer
  • an approximation of a state-of-the-art text-based
    search and rank algorithm.
  • Two data structures are replicated across the community:
  • a membership directory
  • an extremely compact content index (Bloom
    filter)

4
Outline
  • Introduction
  • Architecture
  • Gossiping
  • Content Search and Retrieval
  • Document Locating
  • Local Index
  • Global Index
  • Ranking algorithm
  • Performance
  • Conclusion and Future Work

5
Gossiping
  • Demers' algorithm
  • Rumor
  • Periodically push a change to a randomly chosen
    peer
  • Anti-entropy
  • Pull a data structure summary from a randomly
    chosen peer
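
The two mechanisms above can be sketched as a small round-based simulation. This is a simplified illustration, not PlanetP's implementation: the function name, the ae_every interval, and the rule that informed peers push in every round are assumptions made for the example.

```python
import random

def simulate_gossip(n_peers, ae_every=10, seed=0, max_rounds=1000):
    """Spread one change from peer 0; return (rounds, all_informed)."""
    rng = random.Random(seed)
    knows = [False] * n_peers
    knows[0] = True                      # peer 0 originates the change
    rounds = 0
    while not all(knows) and rounds < max_rounds:
        rounds += 1
        snapshot = list(knows)           # state at the start of the round
        for peer in range(n_peers):
            # Pick a random peer other than ourselves.
            target = rng.randrange(n_peers - 1)
            if target >= peer:
                target += 1
            if snapshot[peer]:
                # Rumor: push the change to the chosen peer.
                knows[target] = True
            elif rounds % ae_every == 0 and snapshot[target]:
                # Anti-entropy (assumed every ae_every rounds):
                # pull the summary and fetch what is missing.
                knows[peer] = True
    return rounds, all(knows)

rounds, done = simulate_gossip(50)
print(f"all informed: {done} after {rounds} rounds")
```

Lowering the gossiping rate corresponds to making each round longer: fewer messages per second, but more wall-clock time before every peer is informed.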

6
Gossiping (Cont.)
  • Extended Gossiping Algorithm
  • Rumor
  • Anti-entropy
  • Highly dynamic, less accurate
  • Partial anti-entropy
  • Advantages
  • Reduces cost
  • Reduces the propagation time

7
Document Locating
  • Uses the global index to derive the set of peers
    that have the query terms.
  • t → p (terms to peers)
  • Forwards the query to these peers and asks them
    to return URLs for any documents that are
    relevant to the query.
  • t → D (terms to documents)
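
A minimal sketch of the two lookup steps, assuming plain Python sets stand in for the Bloom filters and that a peer qualifies when it holds at least one query term. The names candidate_peers and locate, and the example URLs, are hypothetical.

```python
def candidate_peers(global_index, query_terms):
    """Step 1 (t -> p): peers whose summary contains a query term."""
    return {peer for peer, terms in global_index.items()
            if any(t in terms for t in query_terms)}

def locate(global_index, local_indexes, query_terms):
    """Step 2 (t -> D): ask each candidate peer for matching URLs."""
    urls = set()
    for peer in candidate_peers(global_index, query_terms):
        inverted = local_indexes[peer]   # term -> set of URLs on that peer
        for t in query_terms:
            urls |= inverted.get(t, set())
    return urls

# Replicated summary: which terms each peer claims to hold.
global_index = {"peerA": {"gossip", "bloom"}, "peerB": {"ranking"}}
# Each peer's own inverted index, consulted only when the peer is contacted.
local_indexes = {
    "peerA": {"gossip": {"http://peerA/d1"}, "bloom": {"http://peerA/d2"}},
    "peerB": {"ranking": {"http://peerB/d3"}},
}
print(locate(global_index, local_indexes, ["gossip"]))
```

Only peerA is contacted for the query "gossip"; peerB is filtered out by the replicated summary without any network traffic.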

8
Local Index (Bloom Filter)
  • A Bloom filter is an array of bits used to
    represent a set of strings; in our case, the set
    of terms in the peer's local index.

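[Figure: Bloom filter. Terms (e.g., "Autonomic", "cat") are passed through hash functions that set bits to 1 in the filter; a lookup answers Yes or No. The local inverted index for document D1 is stored as an XML snippet.]
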
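
A minimal Bloom filter sketch to make the idea concrete. The filter size, the number of hash functions, and the use of salted SHA-256 are assumptions for illustration; the slides do not state PlanetP's actual parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, term):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def __contains__(self, term):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(term))

bf = BloomFilter()
bf.add("autonomic")
print("autonomic" in bf)
```

A term that was added is always reported as present. A term that was never added is usually reported as absent, but false positives are possible, which is the price paid for compactness.
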
9
Advantages of Bloom Filter
  • It is extremely compact
  • An update of a peer's Bloom filter can be spread
    to the whole community in constant time,
    regardless of how many changes it includes
  • The cost of replicating the global index can be
    reduced by simply decreasing the gossiping rate,
    a tradeoff between propagation time and bandwidth
    usage

10
Global Index
  • The global index is replicated on every peer in
    the community by gossiping
  • The global index maintains all of the information
    about the documents in the community.
  • One entry per peer: (peer IP, peer status, Bloom filter)
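
One way to picture the replicated directory, as a sketch only: each peer keeps a table with one entry per member. The class and function names are hypothetical, and a frozenset of terms stands in for the Bloom filter.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    ip: str
    online: bool
    terms: frozenset  # stand-in for the peer's Bloom filter

def apply_gossip(global_index, entry):
    """Record the latest entry heard for a peer (no versioning here)."""
    global_index[entry.ip] = entry

def online_peers_with(global_index, term):
    """Peers currently marked online whose summary contains the term."""
    return sorted(e.ip for e in global_index.values()
                  if e.online and term in e.terms)

index = {}
apply_gossip(index, DirectoryEntry("10.0.0.1", True, frozenset({"gossip"})))
apply_gossip(index, DirectoryEntry("10.0.0.2", False, frozenset({"gossip"})))
print(online_peers_with(index, "gossip"))
```

Keeping the status in the entry lets a searcher skip peers believed to be offline without discarding their summaries.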

11
Content Ranking Algorithm
  • Ranking documents (TFxIDF)
  • For a query, the rank of a document is defined as
    the similarity between them
  • The weight of a term for a document is determined
    by how frequently it appears in that document
  • The weight of a term for a query follows the
    inverse of how often it shows up in the entire
    collection (IDF)
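
The transcript does not preserve the exact weighting used on the slide, so the sketch below assumes one common TFxIDF variant: query weight IDF = log(1 + N / f_t), document weight 1 + log(f_D,t), and normalization by the square root of the document length. All names are illustrative.

```python
import math

def idf(num_docs, docs_with_term):
    """Query-side weight: rarer terms in the collection weigh more."""
    return math.log(1 + num_docs / docs_with_term)

def similarity(query_terms, doc_term_counts, doc_freq, num_docs):
    """Sum of (query weight x document weight), length-normalized."""
    score = 0.0
    for t in query_terms:
        f_dt = doc_term_counts.get(t, 0)   # occurrences of t in this document
        f_t = doc_freq.get(t, 0)           # documents in the collection with t
        if f_dt == 0 or f_t == 0:
            continue
        score += idf(num_docs, f_t) * (1 + math.log(f_dt))
    length = sum(doc_term_counts.values())
    return score / math.sqrt(length) if length else 0.0

doc_freq = {"gossip": 1, "the": 10}        # collection-wide statistics
d1 = {"gossip": 2, "the": 2}
d2 = {"the": 4}
print(similarity(["gossip"], d1, doc_freq, 10))
print(similarity(["gossip"], d2, doc_freq, 10))
```

Note that the function needs collection-wide statistics (num_docs and doc_freq), which no single peer holds in a P2P community.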

12
Content Ranking Algorithm (Cont.)
  • Document-Query Similarity
  • Question
  • How can a peer obtain the number of documents and
    the number of times that a specific term appears
    in the collection?
  • Answer
  • We need an approximation algorithm

13
Content Ranking Algorithm (Cont.)
  • Mapping terms to peers
  • Ranking Peers
  • Inverse Peer Frequency (IPF)
  • Peer rank for a query
  • Select Peers (Q, k)
  • Rank the peers for Q
  • Contact peers from the top to the bottom of the ranking
  • Each contacted peer returns its relevant
    documents with their query-document similarity
    rank.
  • Stop contacting peers when p consecutive peers fail to
    contribute a document to the top k ranked documents
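
The selection loop above can be sketched as follows. The IPF formula log(1 + N / N_t), the use of sets in place of Bloom filters, and the ask_peer callback are assumptions for illustration; only the overall procedure comes from the slide.

```python
import math

def ipf(num_peers, peers_with_term):
    """Inverse peer frequency: terms held by fewer peers weigh more."""
    return math.log(1 + num_peers / peers_with_term)

def rank_peers(query_terms, peer_terms):
    """Order peers by the summed IPF of the query terms they hold."""
    n = len(peer_terms)
    counts = {t: sum(1 for terms in peer_terms.values() if t in terms)
              for t in query_terms}
    ranks = {}
    for peer, terms in peer_terms.items():
        r = sum(ipf(n, counts[t]) for t in query_terms if t in terms)
        if r > 0:
            ranks[peer] = r
    return sorted(ranks, key=lambda p: (-ranks[p], p))

def select_documents(query_terms, k, p, peer_terms, ask_peer):
    """Contact peers in rank order; stop after p consecutive peers
    add nothing to the current top-k list of (score, url) pairs."""
    top, misses = [], 0
    for peer in rank_peers(query_terms, peer_terms):
        results = ask_peer(peer, query_terms)
        merged = sorted(set(top) | set(results), reverse=True)[:k]
        misses = misses + 1 if merged == top else 0
        top = merged
        if misses >= p:
            break
    return top

peer_terms = {"a": {"gossip", "bloom"}, "b": {"gossip"}, "c": {"ranking"}}
answers = {"a": [(0.9, "u1")], "b": [(0.5, "u2")]}
print(select_documents(["gossip", "bloom"], 1, 1, peer_terms,
                       lambda peer, q: answers.get(peer, [])))
```

Peer c never appears in the ranking because it holds none of the query terms, and the search stops at peer b because its document does not enter the top k.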

14
Outline
  • Introduction
  • Architecture
  • Performance
  • Search Efficacy
  • Storage Cost
  • Gossiping Performance
  • Conclusion and Future Work

15
Search Efficacy
  • Two accepted information retrieval metrics
  • Recall (R)
  • The fraction of all relevant documents that are retrieved
  • Precision (P)
  • The fraction of retrieved documents that are relevant
  • The characteristics of the collections used
  • Number of documents
  • Number of unique terms
  • Number of queries
  • Documents-to-peers distribution
  • Uniform
  • Weibull
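
For reference, recall and precision can be computed from a result set as in the sketch below. The function name and document identifiers are illustrative.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)           # relevant documents found
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

r, p = recall_precision(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"recall={r:.2f} precision={p:.2f}")
```

Here two of the three relevant documents are found, and two of the four returned documents are relevant.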

16
Evaluation Result
  • PlanetP performs very close to a
    centralized implementation.
  • The performance of PlanetP is independent of
    how the documents are distributed in the
    community.
  • Reducing the percentage of unique term indices
    does not sacrifice much performance.

17
Evaluation Results (Cont.)
  • PlanetP's adaptive stopping heuristic is critical
    to its performance. (How to determine p)
  • The dynamic heuristic allows the search to contact
    more peers when documents are widely distributed.

18
Storage Cost
  • PlanetP can easily scale to several
    thousand peers

19
Gossiping Performance
  • The reliability of PlanetP's gossiping algorithms
  • Can a change be propagated to all
    online nodes?
  • What kinds of changes are gossiped?
  • changes in a Bloom filter
  • joining of a new member (peer)
  • rejoining of a previously offline member (peer)
  • the leaving of a peer is not gossiped

20
Gossiping of Bloom Filter Changes
  • Simulates the addition of 1000 new terms to some
    peers
  • This is not a small change to the simulation
    collection

21
Gossiping of Bloom Filter Changes (Cont.)
  • Gossiping new information is very scalable
  • The way to decrease the communication overhead is
    very simple
  • PlanetP's gossiping significantly outperforms
    algorithms that use only anti-entropy.

22
Joining of New Members (Worst Case)
  • The joining process is much more bandwidth intensive
    than gossiping Bloom filter changes
  • PlanetP does quite well on DSL or higher
    bandwidth networks
  • PlanetP requires modification for mixed networks
    with lower bandwidth

23
Modified Gossiping Algorithms
  • The members with lower network bandwidth could be
    the bottleneck of the community

24
Dynamic Operation (Common Case)
  • Gossiping with partial anti-entropy performs
    better than gossiping with anti-entropy only

25
Dynamic Operation (Cont.)
  • By applying the modified gossiping algorithm for
    a mixed network, the fast peers of the community
    do not lose much performance.
  • The normal case does not need much bandwidth

26
Outline
  • Introduction
  • Architecture
  • Performance
  • Conclusion and Future Work

27
Conclusion
  • PlanetP is a powerful P2P publish/subscribe
    information sharing infrastructure
  • PlanetP is able to robustly disseminate new
    information throughout the community.
  • PlanetP's extremely compact global index does not
    affect its ranking accuracy and performance
  • PlanetP does not incur high cost at a scale
    of several thousand peers.

28
Future Work
  • What is the performance of PlanetP in a community
    on the scale of millions or billions of peers?

29
Questions?