Peer to Peer Information Retrieval - PowerPoint PPT Presentation

About This Presentation
Title:

Peer to Peer Information Retrieval

Description:

P2PIR is one application of peer-to-peer networks. P2PIR combines key elements of file sharing and federated information retrieval. No single technique solves all P2PIR problems. Recall and precision are used to evaluate P2PIR. Information retrieval is the field dealing with the structure, analysis, organization, storage, searching and retrieval of information, and searching in peer-to-peer networks is called peer-to-peer information retrieval.

Slides: 22
Provided by: chetansun


Transcript and Presenter's Notes



1
Peer to Peer Information Retrieval
  • By Chetan K. Sundarde
  • @CHETANSUNDARDE
  • https://www.linkedin.com/in/chetansundarde

2
Outline -
  • Peer to Peer Network
  • Information Retrieval
  • Peer to Peer Information Retrieval (P2PIR)
  • Peer to peer IR system architectures
  • Techniques used in IR in P2P networks
  • Basic algorithms used in P2PIR
  • Evaluation techniques used in P2PIR
  • Challenges
  • Conclusion
  • References

3
Peer To Peer Network
  • A collection of distributed systems
  • Computers leave and join the network frequently
  • Each computer acts as a server and a client
    simultaneously
  • Three tasks that every peer-to-peer network
    performs:
  • Searching: querying and getting a list of
    document references.
  • Locating: resolving a document reference to a
    concrete location of the full document.
  • Transferring: downloading the document.
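The three tasks above can be sketched in a single process. The `Peer` class, its addresses and its documents are hypothetical: the address merely stands in for a network location, and a real network would send these calls as messages between machines.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Peer:
    """Hypothetical peer that plays the server role (search, serve)."""
    address: str                                          # stands in for a network location
    files: dict[str, str] = field(default_factory=dict)  # document name -> content

    def search(self, keyword: str) -> list[tuple[str, str]]:
        """Server side of searching: return document references, not documents."""
        return [(name, self.address) for name in self.files if keyword in name]

    def serve(self, name: str) -> str:
        """Server side of transferring: return the full document."""
        return self.files[name]


def fetch(peers: list[Peer], keyword: str) -> Optional[str]:
    """Client side: searching, then locating, then transferring."""
    # 1. Searching: query every known peer and collect document references.
    references = [ref for peer in peers for ref in peer.search(keyword)]
    if not references:
        return None
    # 2. Locating: resolve the first reference to the peer that holds the document.
    name, address = references[0]
    holder = next(peer for peer in peers if peer.address == address)
    # 3. Transferring: download the full document from that peer.
    return holder.serve(name)


peers = [
    Peer("10.0.0.1", {"p2p-notes.txt": "notes on P2P networks"}),
    Peer("10.0.0.2", {"ir-survey.txt": "a survey of information retrieval"}),
]
print(fetch(peers, "survey"))  # a survey of information retrieval
```

Keeping search results as references (name, address) rather than full documents reflects the split between searching and transferring described above.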

4
Applications of P2P
  • Information Retrieval
  • File Sharing
  • Gnutella, Napster, BitTorrent, etc.

5
Information Retrieval -
  • A field dealing with the structure, analysis,
    organization, storage, searching and retrieval of
    information is called information retrieval
  • Searches for relevant documents on the basis of
    user input

Figure: an information need is expressed as a query to the IR system, which performs retrieval over the document collection and returns an answer list.
6
Basic Architecture of Information Retrieval
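Figure: components are the user interface, text operations, query operations, user feedback, indexing, the database manager, the index, the text database, searching and ranking. The user need becomes a query; searching the index yields retrieved docs, and ranking orders them into ranked docs.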
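The indexing, searching and ranking components in this architecture can be illustrated with a minimal inverted index. This is a sketch under simple assumptions: text operations are only lowercasing and tokenizing, and ranking sums raw term frequencies instead of using a real scoring model. All function names and documents are hypothetical.

```python
import re
from collections import defaultdict


def text_operations(text):
    """Text operations: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collection):
    """Indexing: build an inverted index, term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in collection.items():
        for term in text_operations(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search_and_rank(index, query):
    """Searching: collect documents containing any query term.
    Ranking: order them by summed term frequency (a deliberately simple score)."""
    scores = defaultdict(int)
    for term in text_operations(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


collection = {
    "d1": "peer to peer networks",
    "d2": "information retrieval in peer to peer networks",
    "d3": "web search engines",
}
index = build_index(collection)
print(search_and_rank(index, "peer retrieval"))  # [('d2', 3), ('d1', 2)]
```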
7
Fields of Information Retrieval
Example of Content | Example of Application | Example of Task
Text | Web search | Ad hoc search
Images, video | Vertical search | Filtering
Scanned documents | Desktop search | Question answering
Text, images, audio, video, documents, zip files, etc. | Peer-to-peer search | P2P information retrieval
P2PIR = P2P file sharing + federated IR
8
Comparison between File Sharing and Information Retrieval
Aspect | File Sharing | Information Retrieval
Application | Locating | Searching
Index content | File identifiers | Document content
Index size | Small | Large
Data exchange unit | File | Search result
Data exchange size | Megabytes | Kilobytes (small)
P2PIR combines file sharing networks and federated
information retrieval.
9
Peer to peer Information Retrieval (P2PIR)
  • Searching in peer-to-peer networks
  • Each peer shares its information with other
    peers
  • A peer searches for information by sending
    queries to its peers
  • Queries are routed to one or many other peers
  • Query results are provided in the form of an
    index

10
Generations of P2PIR
  • 1st generation
  • 2nd generation
  • 3rd generation

11
Peer to peer IR system architectures
  • Based on relationship between peers
  • Cooperative system
  • Uncooperative system
  • Based on the network structure
  • Centralized network
  • Structured architecture
  • Unstructured architecture
  • Based on the tasks performed in a P2P network
  • Centralized Global Index
  • Distributed Global Index
  • Strict Local Indices
  • Aggregated Local Indices

12
Peer to peer IR system architectures
  • Based on relationship between peers
  • Cooperative system
  • Resource descriptions, collection statistics and
    the collection index are usually stored in a
    central place
  • Peers can use this information to help their
    search
  • Uncooperative system
  • Each peer is independent
  • Based on the network structure
  • Centralized network
  • Structured architecture
  • Unstructured architecture

13
Peer to peer IR system architectures
  • Centralized network
  • A mix of the traditional client-server
    architecture and the pure peer-to-peer
    architecture
  • Unstructured Architecture
  • All the peers in the system are equal.
  • They can all issue requests, respond to other
    peers' requests and route requests to other
    nodes to locate information.
  • Structured architecture
  • Peers are grouped or clustered
  • Documents are placed not at random nodes but at
    specified locations
  • Uses a Distributed Hash Table (DHT)
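In the unstructured architecture, a peer that cannot answer a request forwards it to its neighbours. A common way to do this is breadth-first flooding with a time-to-live (TTL), as in Gnutella. The following is a single-process sketch, not a network implementation: `flood_search`, the toy graph and the file name are hypothetical, and duplicate deliveries are counted as messages but not forwarded again (Gnutella servents discard duplicates using each message's unique ID).

```python
from collections import deque


def flood_search(graph, shared, start, wanted, ttl):
    """BFS flooding with a TTL (simplified Gnutella-style search).
    graph: peer -> list of neighbour peers; shared: peer -> set of document names.
    Returns (peers holding `wanted`, number of query messages sent)."""
    hits, messages = set(), 0
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in shared.get(peer, set()):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbour in graph[peer]:
            messages += 1  # every forwarded copy of the query costs one message
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, messages


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
shared = {"D": {"p2p.pdf"}, "E": {"p2p.pdf"}}
for ttl in (2, 3):
    hits, messages = flood_search(graph, shared, "A", "p2p.pdf", ttl)
    print(ttl, sorted(hits), messages)
# 2 ['D'] 6
# 3 ['D', 'E'] 9
```

Raising the TTL finds more copies but also sends more messages, which is the main scalability cost of flooding.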

14
Peer to peer IR system architectures
  • Based on the tasks performed in a P2P network
  • Sub-tasks performed by the searching task
  • Indexing
  • Who constructs the index? Where is it stored?
  • Query Routing
  • What path is used to send the query?
  • Query Processing
  • Which peer performs the actual query processing?
  • Four commonly used peer-to-peer architectures
  • Centralized Global Index
  • Distributed Global Index
  • Strict Local Indices
  • Aggregated Local Indices

15
Peer-to-Peer architectures used in IR
Figure: four panels (Central Global Index, Distributed Global Index, Aggregated Local Index, Strict Local Index); G marks a global index and L marks a local index.
16
Algorithms used in P2PIR
  • Statistical IR algorithms
  • Vector Space Model (VSM)
  • Document A: books on computer networks
  • Document B: network routing in P2P networks
  • Query Q: computer network
  • Each element of the vector corresponds to the
    importance of a term in the document
  • Retrieved documents are ranked by the similarity
    between the document vector and the query vector
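The VSM example on this slide can be worked through in code. This sketch assumes raw term-frequency weights and a toy stemmer that only strips a trailing plural "s" (so "networks" matches "network"); real systems typically use TF-IDF weighting and a proper stemmer. Similarity is measured with cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, split on non-alphanumerics, strip a plural 's' (toy stemmer)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def tf_vector(text):
    """Raw term-frequency vector stored sparsely as term -> count."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Rank documents by cosine similarity to the query vector."""
    q = tf_vector(query)
    scores = {name: cosine(tf_vector(text), q) for name, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "A": "books on computer networks",
    "B": "network routing in P2P networks",
}
for name, score in rank("computer network", docs):
    print(name, round(score, 4))
# A 0.7071
# B 0.5345
```

Document A matches both query terms once (score 1/√2), while B matches only "network" (twice) and has more unrelated terms, giving 2/√14, so A ranks first.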

17
Algorithms used in P2PIR
  • Statistical IR algorithms
  • Latent Semantic Indexing (LSI)

Figure: term-document matrix with terms as rows and document vectors Va, Vb, ... as columns
  • SVD: singular value decomposition
  • Reduces dimensionality
  • Discovers word semantics
  • Cat <-> Pet
  • Bus <-> Travel
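LSI can be sketched without a numerical library. This pure-Python example replaces a library SVD (such as `numpy.linalg.svd`) with power iteration plus deflation on A^T A, which yields the top-k singular values and right singular vectors; the tiny corpus and all function names are made up for illustration. "cat" and "animal" never share a document, so their raw term vectors are orthogonal, but both co-occur with "pet", so in the rank-2 latent space they point the same way, while "bus" stays unrelated.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def top_singular_pairs(A, k, iters=200, seed=0):
    """Top-k (sigma, v) pairs of A via power iteration with deflation on A^T A.
    v is a right singular vector (document space); sigma**2 is an eigenvalue of A^T A."""
    n = len(A[0])
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        x = normalize([rng.random() + 0.1 for _ in range(n)])  # fresh positive start vector
        for _ in range(iters):
            x = normalize([dot(M[i], x) for i in range(n)])
        eigenvalue = dot(x, [dot(M[i], x) for i in range(n)])  # Rayleigh quotient
        pairs.append((math.sqrt(max(eigenvalue, 0.0)), x))
        # Deflation: subtract the found component so the next pass finds the next one.
        M = [[M[i][j] - eigenvalue * x[i] * x[j] for j in range(n)] for i in range(n)]
    return pairs


terms = ["cat", "pet", "animal", "food", "bus", "travel", "ticket"]
docs = ["cat pet", "pet animal", "pet food", "bus travel", "travel ticket"]
# Term-document matrix: one row per term, one column per document (binary weights).
A = [[1.0 if term in doc.split() else 0.0 for doc in docs] for term in terms]

pairs = top_singular_pairs(A, k=2)
# Term coordinates in the 2-D latent space: rows of A V_k, which equal U_k Sigma_k.
latent = {term: [dot(A[i], v) for _, v in pairs] for i, term in enumerate(terms)}

cat, animal = terms.index("cat"), terms.index("animal")
print(cosine(A[cat], A[animal]))                             # 0.0: never in the same document
print(round(cosine(latent["cat"], latent["animal"]), 3))     # 1.0: same latent concept
print(round(abs(cosine(latent["cat"], latent["bus"])), 3))   # 0.0: unrelated concept
```

Keeping only k = 2 dimensions is the dimensionality reduction step: the pet-related terms collapse onto one concept and the travel-related terms onto another.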

18
Algorithms used in P2PIR
  • Distributed Hash Table (DHT)
  • A method of hash-table lookup over a
    decentralized, distributed network
  • Key-value pairs are stored in the DHT at a parent
    node (structured architecture), with keys such as
  • Kd = hash("books on computer networks")
  • Kq = hash("computer network")
  • Any node in the DHT can then efficiently retrieve
    the value by providing its key.
  • Example: BitTorrent (trackerless mode); Napster,
    by contrast, used a central index server
  • Well-known DHTs include CAN, Chord, etc.
  • DHTs can be extended with content-based search:
  • Full-text retrieval
  • Content-based image retrieval
  • Content-based music retrieval, etc.
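The key-value idea on this slide can be sketched as a consistent-hashing ring. The `ToyDHT` class below is hypothetical and runs in a single process: it hashes node names and keys with SHA-1 and stores each key at its successor, the first node ID at or after the key's hash, which is the placement rule Chord uses. Lookups here jump straight to the responsible node, whereas a real Chord node routes the request in O(log N) hops using finger tables. The second half shows why plain DHT lookup is exact-match only, and one simple way (publishing a document under each of its terms) to move toward content-based search.

```python
import bisect
import hashlib


def ring_id(name):
    """Map a string onto the 160-bit identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Single-process consistent-hashing ring with Chord-style successor placement."""

    def __init__(self, node_names):
        self.ids = sorted(ring_id(n) for n in node_names)
        self.store = {node_id: {} for node_id in self.ids}

    def successor(self, key_id):
        """First node ID >= key_id, wrapping around the ring."""
        i = bisect.bisect_left(self.ids, key_id)
        return self.ids[i % len(self.ids)]

    def put(self, key, value):
        node = self.successor(ring_id(key))
        self.store[node].setdefault(key, set()).add(value)

    def get(self, key):
        return self.store[self.successor(ring_id(key))].get(key, set())


dht = ToyDHT([f"peer{i}" for i in range(8)])

# Exact match: Kd = hash("books on computer networks") only matches the identical string.
dht.put("books on computer networks", "doc-A")
print(dht.get("books on computer networks"))  # {'doc-A'}
print(dht.get("computer network"))            # set(): Kq differs from Kd

# Simple content-based extension: publish each document under each of its terms.
documents = {"doc-A": "books on computer networks",
             "doc-B": "network routing in P2P networks"}
for doc_id, text in documents.items():
    for term in set(text.lower().split()):
        dht.put(term, doc_id)
print(sorted(dht.get("computer")))  # ['doc-A']
print(sorted(dht.get("networks")))  # ['doc-A', 'doc-B']
```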

19
P2P Information Retrieval Techniques
Figure: taxonomy of P2P IR techniques for unstructured and structured networks. Items shown: blind search (BFS, e.g. Gnutella; RBFS; random walk), routing indices, indexing, clustering, semantic searching (e.g. SON) and pSearch.
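Among the blind-search techniques in this taxonomy, random walk is the usual alternative to BFS flooding: instead of forwarding to every neighbour, each of k walkers forwards the query to one random neighbour per step. This simplified sketch (hypothetical names and graph) stops each walker at its first hit or after `max_steps`; published schemes also let walkers check back with the requester to decide when to stop.

```python
import random


def random_walk_search(graph, shared, start, wanted, walkers, max_steps, seed=0):
    """k-walker random walk search (simplified).
    graph: peer -> list of neighbours; shared: peer -> set of document names.
    Each walker stops at its first hit or after max_steps.
    Returns (peers that answered, number of query messages sent)."""
    rng = random.Random(seed)
    if wanted in shared.get(start, set()):
        return {start}, 0  # answered locally, no messages needed
    hits, messages = set(), 0
    for _ in range(walkers):
        peer = start
        for _ in range(max_steps):
            peer = rng.choice(graph[peer])  # forward to one random neighbour
            messages += 1
            if wanted in shared.get(peer, set()):
                hits.add(peer)
                break
    return hits, messages


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
shared = {"D": {"p2p.pdf"}, "E": {"p2p.pdf"}}
hits, messages = random_walk_search(graph, shared, "A", "p2p.pdf",
                                    walkers=2, max_steps=10)
print(sorted(hits), messages)
```

The message cost is bounded by `walkers * max_steps`, whereas the cost of flooding grows with the TTL and the node degrees; the trade-off is that a random walk may miss copies that flooding would find.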
20
Evaluation in P2P IR
  • Recall (Are all the relevant documents
    retrieved?)
  • The fraction of the documents relevant to the
    query that are successfully retrieved
  • Recall = number of relevant documents retrieved /
    total number of relevant documents in the
    collection
  • Precision (Are the retrieved documents relevant?)
  • The fraction of retrieved documents that are
    relevant to the search query
  • Precision = number of relevant documents
    retrieved / total number of documents retrieved

Figure: Venn diagram of the Relevant and Retrieved sets; their overlap is the set of retrieved relevant documents.
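The two formulas on this slide translate directly into code. The result list and relevance judgments below are hypothetical, chosen so the arithmetic is easy to check: 4 documents retrieved, 3 of them relevant, and 6 relevant documents in the whole collection.

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|;
    Recall = |retrieved & relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    found = len(retrieved & relevant)  # retrieved relevant documents
    precision = found / len(retrieved) if retrieved else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["d1", "d2", "d3", "d7"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```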
21
Evaluation Techniques in P2P IR
  • F-Score / F-measure
  • Harmonic mean of precision and recall.
  • Hits per Query
  • average number of distinct relevant documents
    discovered per search query.
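Both measures on this slide can be sketched in a few lines. F1 is the harmonic mean 2PR / (P + R); the `beta` parameter below is the standard F-beta generalization, included here as an addition beyond the slide. `hits_per_query` follows the slide's definition, counting each relevant document once per query even if several peers return it. Function names and data are hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def hits_per_query(results_per_query, relevant_per_query):
    """Average number of distinct relevant documents found per query."""
    total = sum(len(set(results) & set(relevant))
                for results, relevant in zip(results_per_query, relevant_per_query))
    return total / len(results_per_query)


print(f_measure(0.75, 0.5))  # 0.6

results = [["d1", "d1", "d2", "d9"], ["d3"]]  # d1 was returned by two peers
relevant = [["d1", "d2"], ["d4"]]
print(hits_per_query(results, relevant))  # 1.0
```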

22
Applications of P2P Information Retrieval in the
Real World
  • YaCy (www.yacy.net)
  • Local index entries are injected into a
    distributed global index
  • YaCy uses no centralized servers.
  • The resulting decentralized web search had about
    1.4 billion documents in its index, and more than
    600 peer operators contributed each month. About
    130,000 search queries were performed with this
    network each day (Feb 2015).
  • Faroo (www.faroo.com)
  • A proprietary peer-to-peer search engine that
    uses a distributed global index.
  • It performs distributed crawling and ranking.
  • Faroo encrypts queries and results for privacy
    protection.
  • 2 million peers.
  • Some other P2PIR systems: Sixearch, ODISSEA,
    MINERVA, Seeks, etc.

23
Applications of P2P Information Retrieval in the
Real World
  • Some other P2PIR systems
  • Sixearch
  • ODISSEA
  • MINERVA
  • Seeks

24
Challenges -
  • Cross-Language Information Retrieval
  • Maintaining index freshness
  • Security features
  • Quality of service
  • Efficient use of resources
  • Increasing the range of peer-to-peer networks

25
Conclusion -
  • P2PIR is one application of peer-to-peer
    networks
  • P2PIR combines key elements of file sharing and
    federated information retrieval
  • No single technique solves all P2PIR problems
  • Recall and precision are used to evaluate P2PIR

26
References
  • Almer S. Tigelaar, Djoerd Hiemstra and Dolf
    Trieschnigg. Peer-to-Peer Information Retrieval.
    University of Twente, IEEE paper, Sept 2012.
  • Rasanjalee Dissanayaka Mudiyanselage.
    Ontology-based Search Algorithms over
    Large-Scale Unstructured Peer-to-Peer Networks.
    Georgia State University, IEEE, Oct 2014.
  • Demetrios Zeinalipour-Yazti. Information
    Retrieval in Peer-to-Peer Systems. University of
    California, Riverside, IEEE, June 2003.
  • Chengye Lu. Peer to Peer English/Chinese
    Cross-Language Information Retrieval. Queensland
    University of Technology, Sept 2008.

27
References
  • Xiuqi Li and Jie Wu. Searching Techniques in
    Peer-to-Peer Networks. Florida Atlantic
    University, Boca Raton, FL 33431, 2007.
  • Christos Gkantsidis, Milena Mihail and Amin
    Saberi. Random Walks in Peer-to-Peer Networks.
    Georgia Institute of Technology, Atlanta, GA,
    2002.
  • Taoufik Yeferny, Amel Bouzeghoub and Khedija
    Arour. A Query Learning Routing Approach Based
    on Semantic Clusters. International Journal of
    Advanced Information Technology (IJAIT), Vol. 1,
    No. 6, December 2011.
  • Yulian Yang. Semantic Information Retrieval
    over P2P Networks. Université de Lyon, CNRS,
    INSA-Lyon, LIRIS, UMR5205, F-69621, France, 2009.
