Title: PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
Outline
- Introduction
- Architecture
- Performance
- Conclusion and Future Work
Introduction
- PlanetP is a content-addressable publish/subscribe service for unstructured peer-to-peer (P2P) communities.
- PlanetP is composed of two components:
  - an infrastructural gossiping layer, which replicates shared data structures across the community:
    - a membership directory
    - an extremely compact content index (Bloom filter)
  - an approximation of a state-of-the-art text-based search and rank algorithm
Outline
- Introduction
- Architecture
  - Gossiping
  - Content Search and Retrieval
    - Document Locating
    - Local Index
    - Global Index
    - Ranking Algorithm
- Performance
- Conclusion and Future Work
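Gossiping
- PlanetP builds on the gossiping algorithm of Demers et al., which combines two mechanisms:
  - Rumor mongering: periodically push a change to a randomly chosen peer
  - Anti-entropy: periodically pull a summary of the data structure from a randomly chosen peer and fetch any missing updates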
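The two Demers mechanisms can be sketched as a small round-based simulation. This is a minimal sketch under our own assumptions, not PlanetP's code: peers are list indices, one hypothetical update id stands for a Bloom filter change, a rumor spreader loses interest after `stop_after` pushes to peers that already knew the update (one of the variants Demers et al. describe), and anti-entropy merges the source's full update set instead of first comparing a compact summary.

```python
import random


class Peer:
    def __init__(self):
        self.known = set()  # ids of every update this peer has seen
        self.hot = {}       # update id -> pushes that found the target already knew it


def random_other(n, i, rng):
    """Pick a random peer index in [0, n) different from i."""
    j = rng.randrange(n - 1)
    return j if j < i else j + 1


def push_rumors(peers, i, rng, stop_after):
    """Rumor mongering: push each hot update of peer i to one random peer."""
    target = peers[random_other(len(peers), i, rng)]
    for update in list(peers[i].hot):
        if update in target.known:
            peers[i].hot[update] += 1
            if peers[i].hot[update] >= stop_after:
                del peers[i].hot[update]  # lose interest in this rumor
        else:
            target.known.add(update)
            target.hot[update] = 0  # the target starts spreading it too


def pull_anti_entropy(peers, i, rng):
    """Anti-entropy: peer i pulls any updates a random peer has that it lacks."""
    source = peers[random_other(len(peers), i, rng)]
    peers[i].known |= source.known


def simulate(n_peers=50, rounds=30, stop_after=2, anti_entropy_every=5, seed=1):
    rng = random.Random(seed)
    peers = [Peer() for _ in range(n_peers)]
    peers[0].known.add("bf-update-1")  # hypothetical Bloom filter change
    peers[0].hot["bf-update-1"] = 0
    for r in range(1, rounds + 1):
        for i in range(n_peers):
            push_rumors(peers, i, rng, stop_after)
            if r % anti_entropy_every == 0:  # anti-entropy runs less often
                pull_anti_entropy(peers, i, rng)
    return peers


peers = simulate()
print(sum("bf-update-1" in p.known for p in peers), "of", len(peers), "peers know the update")
```

Rumor mongering does most of the spreading in the first few rounds; the periodic anti-entropy pulls catch the peers that a dying rumor happened to miss.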
Gossiping (Cont.)
- Extended gossiping algorithm
  - Rumor mongering
  - Anti-entropy
  - Highly dynamic, less accurate
  - Partial anti-entropy
- Advantages
  - Reduces gossiping cost
  - Reduces propagation time
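Partial anti-entropy can be sketched as piggybacking a small summary on each rumor push. This is our reading of the idea, not PlanetP's specification: here the target's reply carries the ids of the (hypothetical) `recent_limit` updates it learned most recently, and the sender pulls any of them it lacks, so a missed update can be repaired without a full anti-entropy exchange.

```python
from collections import deque


class Peer:
    def __init__(self, recent_limit=5):
        self.known = {}                           # update id -> payload
        self.recent = deque(maxlen=recent_limit)  # ids of the most recently learned updates

    def learn(self, uid, payload):
        if uid not in self.known:
            self.known[uid] = payload
            self.recent.append(uid)


def push_with_partial_anti_entropy(sender, target, uid):
    """Push one rumor; the target's reply piggybacks the ids of its most
    recent updates, and the sender fetches any it has not seen."""
    target.learn(uid, sender.known[uid])
    summary = list(target.recent)  # small, bounded summary (ids only)
    missing = [r for r in summary if r not in sender.known]
    for r in missing:
        sender.learn(r, target.known[r])
    return missing


a, b = Peer(), Peer()
a.learn("u1", "new Bloom filter of peer A")
b.learn("u2", "peer C joined")
print(push_with_partial_anti_entropy(a, b, "u1"))  # ['u2']
```

Because the summary is bounded by `recent_limit`, each push costs only a few extra ids, which is the cost advantage the slide lists.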
Document Locating
- Step 1 (global index, terms → peers): use the global index to find the set of peers whose Bloom filters contain the query terms.
- Step 2 (local index, terms → documents): forward the query to those peers and ask them to return URLs for any documents relevant to the query.
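The two steps can be sketched as below, assuming plain Python sets stand in for Bloom filters and the remote query in step 2 is an ordinary function call. The names `locate`, `global_index` and `local_indexes`, and the choice to treat a peer as a candidate if it holds any query term, are illustrative assumptions.

```python
def locate(query_terms, global_index, local_indexes):
    """Two-step lookup.

    global_index:  peer -> set of terms (stand-in for that peer's Bloom filter)
    local_indexes: peer -> {term: set of document URLs} (each peer's inverted index)
    """
    # Step 1 (global index, terms -> peers): which peers may hold a query term?
    candidates = [peer for peer, terms in global_index.items()
                  if any(t in terms for t in query_terms)]
    # Step 2 (local index, terms -> documents): each candidate answers from its own index.
    urls = set()
    for peer in candidates:  # in PlanetP this would be a message to a remote peer
        index = local_indexes[peer]
        for t in query_terms:
            urls |= index.get(t, set())
    return candidates, urls


global_index = {"p1": {"bloom", "gossip"}, "p2": {"xml"}}
local_indexes = {"p1": {"bloom": {"http://p1/a.xml"}},
                 "p2": {"xml": {"http://p2/b.xml"}}}
print(locate(["bloom"], global_index, local_indexes))
```

With real Bloom filters, step 1 can also select false-positive peers; they simply return no URLs in step 2.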
Local Index (Bloom Filter)
- A Bloom filter is an array of bits used to represent a set of strings; in our case, the set of terms in the peer's local index.
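[Figure: terms such as "Autonomic" and "cat", taken from an XML snippet indexed in an inverted index (document D1), are mapped by several hash functions to bits that are set to 1; a membership test answers Yes or No.]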
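A minimal Bloom filter sketch, assuming m = 1024 bits, k = 4 hash functions, and bit positions derived from one SHA-256 digest by double hashing. These parameters and the class name are illustrative choices, not values from PlanetP.

```python
import hashlib


class BloomFilter:
    """An m-bit Bloom filter with k hash functions, stored in one Python int."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0

    def _positions(self, term):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False means "definitely absent"; True means "probably present".
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


# Summarize a peer's local index: every term it holds goes into the filter.
local_index_terms = ["autonomic", "cat", "xml"]
bf = BloomFilter()
for term in local_index_terms:
    bf.add(term)
print(bf.might_contain("cat"))  # True: inserted terms are never missed
```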
Advantages of the Bloom Filter
- It is extremely compact.
- An update to a peer's Bloom filter can be spread to the whole community in constant time, regardless of how many changes it includes.
- The cost of replicating the global index can be reduced simply by decreasing the gossiping rate, a tradeoff between propagation time and bandwidth usage.
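The compactness in the first bullet is paid for with false positives; a Bloom filter never reports a stored term as absent. For a filter of m bits with k hash functions holding n terms (symbols introduced here, not on the slides), the standard approximation of the false-positive rate, and the k that minimizes it, are:

```latex
p_{\text{fp}} \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\,\ln 2
```

Because m is fixed, a changed filter is a single fixed-size piece of data however many terms were added, which is one way to read the constant-time claim in the second bullet.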
Global Index
- The global index is replicated on every peer in the community by gossiping.
- Through each peer's Bloom filter, the global index summarizes all of the documents in the community.
- Each entry is a tuple (peer IP, peer status, Bloom filter).
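A global index entry, and the merge a peer might apply when gossip delivers one, can be sketched as follows. The `version` counter, the class and function names, and the example IP addresses are hypothetical; the slides list only the tuple (peer IP, peer status, Bloom filter).

```python
from dataclasses import dataclass


@dataclass
class GlobalIndexEntry:
    peer_ip: str
    online: bool      # peer status
    bloom_bits: int   # the peer's Bloom filter, as an int bit array
    version: int = 0  # hypothetical: incremented by the owner on every change


def apply_gossip(replica, incoming):
    """Merge one gossiped entry into this peer's replica of the global index.

    replica maps peer IP -> GlobalIndexEntry. Returns True if the entry was
    new or newer (and so is worth spreading further), False if it was stale.
    """
    current = replica.get(incoming.peer_ip)
    if current is None or incoming.version > current.version:
        replica[incoming.peer_ip] = incoming
        return True
    return False


replica = {}
apply_gossip(replica, GlobalIndexEntry("10.0.0.7", True, 0b1011, version=1))
apply_gossip(replica, GlobalIndexEntry("10.0.0.7", True, 0b0001, version=0))  # stale, ignored
print(bin(replica["10.0.0.7"].bloom_bits))  # 0b1011
```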
Content Ranking Algorithm
- Ranking documents (TFxIDF)
  - For a query, the rank of a document is its similarity to the query.
  - The weight of a term for a document is determined by how often the term appears in that document (term frequency, TF).
  - The weight of a term for a query is based on the inverse of how often the term appears in the entire collection (inverse document frequency, IDF).
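One common way to write the TFxIDF scheme these bullets describe is shown below; treat it as an illustrative formulation, since the exact normalization PlanetP uses is not given on the slides. Here f_{D,t} is the number of occurrences of term t in document D, f_t the number of times t appears in the whole collection, N the number of documents, and |D| the length of D in terms.

```latex
w_{D,t} = 1 + \ln f_{D,t},
\qquad
w_{Q,t} = \ln\!\left(1 + \frac{N}{f_t}\right),
\qquad
\mathrm{Sim}(Q, D) = \frac{\sum_{t \in Q} w_{Q,t}\, w_{D,t}}{\sqrt{|D|}}
```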
Content Ranking Algorithm (Cont.)
- Document-query similarity
- Question
  - How can a peer obtain the number of documents in the collection and the number of times a specific term appears in it?
- Answer
  - No peer sees the whole collection (the global index holds only Bloom filters), so PlanetP needs an approximation algorithm.
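Computed centrally, the similarity needs N and f_t for the whole collection, which is exactly what the question above is about. The sketch below assumes the weights w_{D,t} = 1 + ln f_{D,t} and w_{Q,t} = ln(1 + N/f_t) and normalizes by sqrt(|D|); this formulation and the function name are illustrative assumptions. A PlanetP peer cannot compute either global count this way, because it only holds other peers' Bloom filters.

```python
import math
from collections import Counter


def rank_documents(query, docs):
    """Centralized TFxIDF ranking; docs maps a document id to its list of terms.

    w_D,t = 1 + ln f_D,t     (f_D,t: occurrences of t in D)
    w_Q,t = ln(1 + N / f_t)  (N: number of documents; f_t: occurrences of t in the collection)
    Sim(Q, D) = (sum over t in Q of w_Q,t * w_D,t) / sqrt(|D|)
    """
    n_docs = len(docs)  # N: requires seeing the whole collection
    f_t = Counter(t for terms in docs.values() for t in terms)  # also global
    scores = {}
    for doc_id, terms in docs.items():
        f_dt = Counter(terms)
        sim = sum((1 + math.log(f_dt[t])) * math.log(1 + n_docs / f_t[t])
                  for t in set(query) if f_dt[t] > 0)
        if sim > 0:
            scores[doc_id] = sim / math.sqrt(len(terms))
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["bloom", "filter", "bloom"],
    "d2": ["gossip", "rumor"],
    "d3": ["bloom", "gossip"],
}
print(rank_documents(["bloom"], docs))  # ['d1', 'd3']
```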
Content Ranking Algorithm (Cont.)
- Mapping terms to peers
- Ranking peers
  - Inverse Peer Frequency (IPF)
  - Peer rank for a query
- Select Peers (Q, k)
  - Rank the peers for query Q.
  - Contact peers from the top of the ranking downward.
  - Each contacted peer returns its relevant documents together with their query-document similarity scores.
  - Stop contacting peers when p consecutive peers fail to contribute a document to the top k ranked documents.
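The peer-ranking and stopping steps can be sketched as below. The IPF weight IPF_t = ln(1 + N/N_t), with N peers of which N_t have t in their Bloom filter, and the peer rank R_i(Q) as the sum of IPF_t over the query terms in peer i's filter, follow the usual statement of PlanetP's heuristic but should be treated as assumptions here. Sets stand in for Bloom filters, and `search_peer` is a hypothetical callback in place of the network request.

```python
import math


def rank_peers(query, peer_terms):
    """Order peers by R_i(Q) = sum of IPF_t over query terms t in peer i's filter.

    peer_terms maps a peer id to its set of terms (stand-in for its Bloom
    filter). IPF_t = ln(1 + N / N_t): N peers in total, N_t peers holding t.
    """
    n_peers = len(peer_terms)
    ipf = {}
    for t in set(query):
        n_t = sum(1 for terms in peer_terms.values() if t in terms)
        if n_t:
            ipf[t] = math.log(1 + n_peers / n_t)
    rank = {p: sum(w for t, w in ipf.items() if t in terms)
            for p, terms in peer_terms.items()}
    return sorted((p for p in rank if rank[p] > 0), key=rank.get, reverse=True)


def select_peers(query, peer_terms, search_peer, k, p):
    """Contact peers from the top of the ranking; stop once p consecutive
    peers add nothing to the current top-k documents.

    search_peer(peer, query) returns a list of (similarity, url) pairs.
    """
    top, misses = [], 0
    for peer in rank_peers(query, peer_terms):
        results = search_peer(peer, query)
        merged = sorted(top + results, reverse=True)[:k]
        misses = 0 if any(r in merged for r in results) else misses + 1
        top = merged
        if misses >= p:
            break
    return top


peers = {"a": {"bloom", "gossip"}, "b": {"bloom"}, "c": {"xml"}, "d": {"gossip"}}
print(rank_peers(["bloom", "gossip"], peers))  # ['a', 'b', 'd']
```

A larger p makes the search contact more peers before giving up, which matters when relevant documents are spread thinly across the community.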
Outline
- Introduction
- Architecture
- Performance
  - Search Efficacy
  - Storage Cost
  - Gossiping Performance
- Conclusion and Future Work
Search Efficacy
- Two accepted information retrieval metrics
  - Recall (R): the fraction of all relevant documents that a search retrieves
  - Precision (P): the fraction of retrieved documents that are relevant
- Characteristics of the collections used
  - Number of documents
  - Number of unique terms
  - Number of queries
- Documents-to-peers distribution
  - Uniform
  - Weibull
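The two metrics reduce to set arithmetic on one query's results, as in this small sketch; the function name and the example document ids are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Recall = |retrieved & relevant| / |relevant|;
    precision = |retrieved & relevant| / |retrieved|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 4 documents returned, 3 relevant ones exist, 2 of them found.
print(recall_precision(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"]))
```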
Evaluation Results
- PlanetP's search performance is very close to that of a centralized implementation.
- PlanetP's performance is independent of how documents are distributed in the community.
- Reducing the percentage of unique terms that are indexed does not sacrifice much performance.
Evaluation Results (Cont.)
- PlanetP's adaptive stopping heuristic (how p is determined) is critical to its performance.
- The dynamic heuristic lets a search contact more peers when documents are widely distributed.
Storage Cost
- Given its storage cost, PlanetP can easily scale to several thousand peers.
Gossiping Performance
- How reliable are PlanetP's gossiping algorithms?
  - Can a change be propagated to all online nodes?
- Which changes are gossiped?
  - changes to a Bloom filter
  - the joining of a new member (peer)
  - the rejoining of a previously offline member (peer)
  - a peer leaving is not gossiped
Gossiping of Bloom Filter Changes
- The experiment simulates adding 1000 new terms to some peers.
- This is not a small change to the simulated collection.
Gossiping of Bloom Filter Changes (Cont.)
- Gossiping new information is very scalable.
- Communication overhead can be reduced simply, by lowering the gossiping rate.
- PlanetP's gossiping significantly outperforms algorithms that use only anti-entropy.
Joining of New Members (Worst Case)
- Joining is much more bandwidth-intensive than gossiping Bloom filter changes.
- PlanetP performs quite well on networks with DSL or higher bandwidth.
- PlanetP requires modification for mixed networks that include lower-bandwidth peers.
Modified Gossiping Algorithms
- Members with lower network bandwidth can become the bottleneck of the community.
Dynamic Operation (Common Case)
- Gossiping with partial anti-entropy performs better than gossiping with anti-entropy alone.
Dynamic Operation (Cont.)
- With the modified gossiping algorithm for mixed networks, the community's fast peers lose little performance.
- The common case does not need much bandwidth.
Conclusion
- PlanetP is a powerful P2P publish/subscribe information sharing infrastructure.
- PlanetP robustly disseminates new information throughout the community.
- PlanetP's extremely compact global index does not significantly affect its ranking accuracy or performance.
- PlanetP's costs remain low at a scale of several thousand peers.
Future Work
- How does PlanetP perform in communities of millions or billions of peers?
Questions?