Distributed Content Based Visual Information Retrieval System On Peer To Peer Network

Provided by: chrisv8
Learn more at: https://crystal.uta.edu

1
Distributed Content Based Visual Information
Retrieval System On Peer To Peer Network
by SAMEER ABROL
Source: ACM Transactions on Information Systems (TOIS), Volume 22, Issue 3 (July 2004), pages 477-501
Year of publication: 2004
ISSN: 1046-8188
Authors:
  • Irwin King, The Chinese University of Hong Kong, Shatin, Hong Kong
  • Cheuk Hang Ng, The Chinese University of Hong Kong, Shatin, Hong Kong
  • Ka Cheung Sia, The Chinese University of Hong Kong, Shatin, Hong Kong
Publisher: ACM Press, New York, NY, USA
2
Discussion Outline
  • Introduction
  • Background
  • Contribution
  • Clustering Of P2P Network
  • Content Based Query Routing
  • Architecture And Design Of DISCOVIR
  • Experimental Analysis
  • Conclusion
  • Questions

Content Based Information Retrieval

3
Introduction
  • Peer-to-peer (P2P) applications such as Gnutella
    have demonstrated the significance of distributed
    information sharing systems.
  • A P2P network offers a completely decentralized
    and distributed paradigm.
  • Currently, most content-based image retrieval
    (CBIR) systems are based on the centralized
    computing model.
  • P2P networks offer the advantages of
    decentralization by distributing:
      • Storage
      • Information
      • Computation cost
  • Because of these desirable qualities, many
    research projects have focused on:
      • Designing different P2P systems
      • Improving their performance


5
Peer-To-Peer Networks
  • Unlike in a client-server architecture, individual
    computers connect directly with each other
    without using dedicated servers
  • Each computer acts as a server and a client
    simultaneously
  • Computers join and leave the network frequently
  • Emerging P2P networks offer the following
    advantages:
      • Distributed resources: the storage, information
        and computational cost can be distributed among
        the peers
      • Increased reliability: there is no reliance on
        centralized coordinators
      • Comprehensiveness of information: the P2P
        network has the potential to reach every
        computer on the Internet

6
Peer-To-Peer Networks: Flooding Broadcast of
Queries
Plainly, this model is wasteful because peers
are forced to handle irrelevant queries.
  • Different files are shared by different peers
  • A peer broadcasts a query request to its
    connected peers
  • Peers share information directly with each other
    (unlike in a client-server architecture)
  • Messages are sent over multiple hops
  • Each peer looks up its own local shared
    collection and responds to queries

Figure 1: Illustration of information retrieval
in P2P
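The flooding broadcast described above can be sketched as a hop-limited breadth-first search. This is a minimal illustration that assumes an adjacency-list topology and a Gnutella-style TTL. The function `flood_query` and the five-peer network are hypothetical. Counting messages makes the waste visible: every peer within the TTL handles the query, whether or not it holds relevant images.

```python
from collections import deque


def flood_query(graph, source, ttl):
    """Breadth-first flooding of a query from `source`.

    graph: dict mapping each peer to a list of neighbouring peers.
    Returns (set of peers that handled the query, number of messages sent).
    """
    visited = {source}                 # peers that have seen the query
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue                   # TTL exhausted: do not forward further
        for neighbour in graph[peer]:
            messages += 1              # every forward costs a message, even to a peer that already saw it
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return visited, messages


# Hypothetical toy topology (undirected: each link is listed at both ends).
network = {
    "A": ["B", "E", "C"],
    "B": ["A", "C"],
    "C": ["B", "D", "A"],
    "D": ["C", "E"],
    "E": ["D", "A"],
}
print(flood_query(network, "A", 1))
print(flood_query(network, "A", 2))
```

Raising the TTL from 1 to 2 here reaches only one extra peer but more than triples the message count. Most of the extra messages are duplicates, which illustrates the wastefulness noted above.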
7
Peer-To-Peer Networks: Other Discovery Methods
These methods are still under research
for content-based information retrieval.
  • Distributed Hash Tables (DHTs)
      • A technique to map a filename to a key
      • Each peer stores a certain range of (key, value)
        pairs
      • Examples are Chord (key as an m-bit integer)
        and CAN (key as a point in a d-dimensional
        Cartesian coordinate space)
      • DHTs mandate a specific network structure and
        incur a certain penalty when peers join and
        leave the network
      • Their performance under the dynamic conditions of
        prevalent P2P systems is unknown
  • Routing indices approach
      • Each peer maintains a routing index
      • This method requires all peers to agree upon a
        set of document categories
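The sketch below illustrates how a DHT such as Chord maps a filename to a key. It assumes SHA-1 hashing onto an m-bit identifier ring, with m = 8 as an arbitrary choice, and the rule that each key is stored on its successor peer. It omits finger tables, joins and failures, which is where the join/leave penalty mentioned above arises. Note that the key depends only on the filename. A DHT therefore supports exact-name lookup but not similarity search over image content.

```python
import hashlib
from bisect import bisect_left

M_BITS = 8  # hypothetical identifier size; Chord uses m-bit identifiers


def chord_key(name, m=M_BITS):
    """Hash a string (a filename or peer address) to an m-bit integer key."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(key, peer_ids):
    """Return the first peer ID at or clockwise after `key` on the ring."""
    ring = sorted(peer_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]  # wrap around past the largest ID


peers = [5, 20, 200]
key = chord_key("sunset.jpg")
print(key, "is stored on peer", successor(key, peers))
```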

8
Content Based Information Retrieval
The goal of a CBIR system is to operate on a
collection of images and, in response to visual
queries, retrieve the relevant images.
  • Features are extracted from the images in the
    database, then stored and indexed (done
    off-line)
  • The user supplies a query example image, from
    which image features are extracted
  • These features are used to find the images in
    the database that are most similar
  • A candidate list of similar images is shown to
    the user
  • Using the user's feedback, the query is refined
    and reissued as a new query in an iterative
    manner
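The off-line indexing, query-by-example and feedback loop above can be sketched in a few functions. This is a minimal sketch under stated assumptions. Images are flat lists of grey levels, the feature is a normalised intensity histogram, and similarity is Euclidean distance; none of these choices comes from the source. `refine_query` uses a Rocchio-style update. It is a swapped-in technique standing in for the unspecified way the query is optimized from user feedback.

```python
import math


def histogram_feature(pixels, bins=4):
    """Map an image (list of 0-255 grey levels) to a normalised histogram."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // 256] += 1
    return [c / len(pixels) for c in counts]


def build_index(images, bins=4):
    """Off-line step: extract and store a feature vector for every image."""
    return {name: histogram_feature(px, bins) for name, px in images.items()}


def search(index, query_vector, k=2):
    """Return the k image names whose features are closest to the query."""
    return sorted(index, key=lambda name: math.dist(index[name], query_vector))[:k]


def refine_query(query_vector, relevant_vectors, alpha=0.5):
    """Rocchio-style feedback: move the query towards the relevant images' mean."""
    n = len(relevant_vectors)
    mean = [sum(col) / n for col in zip(*relevant_vectors)]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query_vector, mean)]


images = {"dark": [10, 20, 30, 40], "bright": [200, 220, 240, 250],
          "mixed": [10, 100, 200, 250]}
index = build_index(images)
query = histogram_feature([15, 25, 35, 200])
print(search(index, query))                       # candidate list shown to the user
print(refine_query(query, [index["mixed"]]))      # user marked "mixed" as relevant
```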

9
Content-Based Information Retrieval: Basic Concept
Figure 2
11
Contributions
  • Efficient data lookup
      • Organize information in the P2P network
      • Route each query intelligently according to its
        content
      • Unlike CAN or Chord, each peer indexes its own
        collection; there is no fixed topology and no
        fixed data placement
  • Rich query
      • Users can perform more complex queries based on
        the content of information rather than on simple
        text
  • The algorithm is implemented in the DISCOVIR
    system (covered in a later section)

13
Peer Clustering
The network is organized systematically, like
the Yellow Pages, to improve query
efficiency.
  • Peers with similar image features are clustered
    together
  • An extra layer of connections, called attractive
    links, is added on top of the original P2P
    network
  • Each peer shares a set of images and is
    responsible for extracting the content-based
    features of its shared images
  • This collection of feature vectors is used to
    describe the characteristics of the shared images
  • The similarity between two peers is determined
    using the set of feature vectors as the
    signature value of a peer

14
Peer Clustering: Summary Of Key Terms
Term | Definition
Link_random(p) | Random link: the original connection in the P2P network that a peer p makes randomly to another peer in the network
Link_attractive(p) | Attractive link: the connection that a peer p makes explicitly to another peer with which it shares similar images
Cat(p) | A signature value representing the characteristics of a peer p
Sim(p, q) | The distance measure between two peers p and q, which is a function of Cat(p) and Cat(q)
Sim(p, Q) | The distance measure between a peer p and an image query Q
Peer(p, t) | The set of peers that a peer p can reach within t hops
Collection(p) | The set of images that a peer p shares
Match(Collection(p), Q) | The distance measure from each image in Collection(p) to the query Q
Table I
15
Peer Clustering: Important Definitions
  • Collection(p)
      • Represents the set of n images that a peer p
        shares: $Collection(p) = \{I_1, I_2, \dots, I_n\}$
      • Low-level feature extraction is performed on each
        image to map it to a multi-dimensional vector by
        a function $f: I \rightarrow \mathbb{R}^d$,
        where f is a specific feature extraction function
        and $\mathbb{R}^d$ is the space of real-valued
        d-dimensional vectors
      • After extraction, each peer contains a set of
        feature vectors $\{x_1, x_2, \dots, x_n\}$,
        where $x_i = f(I_i)$

16
Peer Clustering: Important Definitions
  • Cat(p)
      • $Cat(p) = (\mu_p, \sigma_p^2)$, where $\mu_p$ and
        $\sigma_p^2$ are the mean and variance of the
        collection of image feature vectors that peer p
        shares
  • Sim(p, q)
      • The distance measure between the signature
        values of two peers, Cat(p) and Cat(q)

17
Peer Clustering: Explanation
We assume that each peer typically shares images
related to a certain topic.
  • The feature vectors of its shared images form a
    sub-cluster in the high-dimensional space
  • Thus, the mean and variance can describe the
    characteristics of this collection
  • The target is to group these sub-clusters to form
    a cluster of peers that share similar images
  • The more similar two peers p and q are, the
    smaller the value of Sim(p, q)
  • The Sim(p, q) measure is small when the two
    peers' means are close and both of their
    variances are small
  • If both variances are small, each peer's shared
    images are highly related to a common topic
  • When the means are close, the two sub-clusters
    are close in the high-dimensional space
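The sketch below makes the signature and distance concrete. It assumes a peer's variance is summarised as a single scalar: the mean squared distance of its vectors from their mean. The distance `sim` is the distance between the means plus both standard deviations. This is an assumed illustrative form, chosen only because it behaves as described above (small when the means are close and both variances are small). It is not the formula from the paper.

```python
import math


def signature(vectors):
    """Cat(p): (mean vector, scalar variance) of a peer's feature vectors.

    The variance is the mean squared distance of the vectors from their
    mean, one simple way to summarise how tightly they cluster.
    """
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    variance = sum(math.dist(v, mean) ** 2 for v in vectors) / n
    return mean, variance


def sim(cat_p, cat_q):
    """Assumed illustrative peer distance, not the paper's formula."""
    (mean_p, var_p), (mean_q, var_q) = cat_p, cat_q
    return math.dist(mean_p, mean_q) + math.sqrt(var_p) + math.sqrt(var_q)


tree_a = signature([[1, 0], [1, 2]])      # hypothetical 2-D features
tree_b = signature([[2, 1], [0, 1]])
sunset = signature([[9, 9], [11, 9]])
print(sim(tree_a, tree_b), sim(tree_a, sunset))
```

The two "tree" peers share a mean, so only their spreads contribute to their distance. The "sunset" peer is far away in feature space.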

18
Peer Clustering: Clustering Algorithm
Peer clustering is done by assigning attractive
links. The peer clustering strategy has three
main steps:
  • Signature value calculation
      • Every peer p calculates its signature value,
        Cat(p), based on the characteristics of the
        images it shares
  • Neighborhood discovery
      • The peer broadcasts a signature query message
        to ask for the signature values of the peers
        in Peer(p, t)
  • Similarity calculation and attractive link
    establishment
      • The new peer p can now find the peers whose
        signature values are closest to its own
      • It makes attractive connections to link them up

Algorithm 1
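Algorithm 1 can be sketched as follows. The sketch assumes the signature query reaches exactly the peers within t hops over random links; a hop-limited breadth-first search stands in for the broadcast and its replies. Peer names mirror the Tree4/Tree3/Sunset4 example in this section. The scalar signature values and the absolute-difference `sim` are hypothetical simplifications.

```python
from collections import deque


def peers_within(random_links, p, t):
    """Peer(p, t): the peers reachable from p within t hops, excluding p."""
    seen, frontier = {p}, deque([(p, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == t:
            continue
        for nb in random_links.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, hops + 1))
    return seen - {p}


def add_attractive_link(random_links, signatures, p, t, sim, attractive_links):
    """Link p to the peer in Peer(p, t) whose signature is closest to Cat(p)."""
    candidates = sorted(peers_within(random_links, p, t))  # sorted: deterministic ties
    if not candidates:
        return None
    best = min(candidates, key=lambda q: sim(signatures[p], signatures[q]))
    # Record the link at both ends, treating connections as bidirectional.
    attractive_links.setdefault(p, set()).add(best)
    attractive_links.setdefault(best, set()).add(p)
    return best


random_links = {"Tree4": ["Sunset4"], "Sunset4": ["Tree4", "Tree3", "Sunset1"],
                "Tree3": ["Sunset4"], "Sunset1": ["Sunset4"]}
signatures = {"Tree4": 1.0, "Tree3": 1.2, "Sunset4": 5.0, "Sunset1": 5.5}
attractive = {}
best = add_attractive_link(random_links, signatures, "Tree4", 2,
                           lambda a, b: abs(a - b), attractive)
print(best, attractive)
```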
19
Peer Clustering: Illustration Of Peer Clustering
  • A peer's name indicates the topic of most of its
    images; e.g. Tree1 mostly shares images related
    to trees
  • On joining the network, Tree4 connects to a
    randomly selected peer, Sunset4
  • It sends out a signature query to learn the
    locations and signature values of other peers
  • After collecting the replies, Tree4 makes an
    attractive link to Tree3 to perform peer
    clustering, because Sim(Tree4, Tree3) is the
    smallest
  • Peers with similar characteristics are gradually
    connected by attractive links to form clusters

20
Peer Clustering: Illustration Of Peer Clustering
Figure 3
21
Peer Clustering: Illustration Of Peer Clustering
Table II
23
Firework Query Model: Illustration Of Firework
Query Model
Figure 4
24
Firework Query Model
To make use of the clustered P2P network, the
Firework Query Model is proposed.
  • It is a content-based routing strategy
  • A query message is routed selectively according
    to the content of the query
  • In this model, a query message first walks
    around the network from peer to peer via random
    links
  • Once it reaches the target cluster, the query
    message is broadcast by peers through the
    attractive connections inside the cluster

Algorithm 2
25
Firework Query Model: Illustration Of Firework
Query Model
Assume peer Tree4 initiates a search for images
similar to its query image, which is an image of
the sea.
  • First, the features of this query image are
    extracted and used to calculate the similarity
    between the query and Tree4's own signature
    value, Cat(Tree4)
  • Since the query is not close enough to Tree4's
    signature value (the distance Sim(Tree4, Q)
    exceeds a preset threshold θ), Tree4 sends the
    query to Sunset4 over its random link
  • On receiving this query, Sunset4 carries out two
    steps:
      • Shared file lookup: the peer looks up its shared
        image collection for images matching the query
      • Route selection: the peer calculates the
        distance between the query and its own
        signature value, Sim(Sunset4, Q), and compares
        it with θ to decide whether to forward the
        query over its attractive links or over a
        random link

26
Firework Query Model: Illustration Of Firework
Query Model (Cont.)
  • Two mechanisms prevent query messages from
    looping:
      • Replicated message checking rule
          • Each new query message is checked against the
            local cache for duplicates
          • If the message has already passed through the
            peer, it is not propagated
      • Time-To-Live (TTL)
          • For random connections, the probability of
            decreasing the TTL value is 1
          • For attractive connections, the probability is
            an arbitrary value in [0, 1], called
            Chance-To-Survive
  • This strategy reduces the number of messages
    passing outside the target cluster
  • More information can be retrieved inside the
    cluster
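The routing rules above can be pulled together in a small simulation. This is a sketch under several assumptions:

  • Signatures and the query are scalars, and Sim(p, Q) is taken to be |signature − query|.
  • Outside the target cluster the query follows a single random link.
  • Every peer's duplicate cache is modelled as one shared set, since only one query is simulated.

The function and parameter names are hypothetical. `attractive_decrement_prob` is the probability of decreasing the TTL on an attractive link, the value the slides call Chance-To-Survive.

```python
import random
from collections import deque


def firework_query(random_links, attractive_links, signatures, origin, query,
                   ttl, threshold, attractive_decrement_prob, seed=0):
    """Simulate one Firework query and return the set of peers that handled it."""
    rng = random.Random(seed)
    seen = {origin}                          # stand-in for each peer's duplicate cache
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()    # the peer performs its local lookup here
        if remaining == 0:
            continue                         # TTL exhausted: stop forwarding
        if abs(signatures[peer] - query) <= threshold:
            # Query matches this peer's cluster: broadcast over attractive links.
            targets = attractive_links.get(peer, [])
            decrement_prob = attractive_decrement_prob
        else:
            # Not in the target cluster yet: continue the walk over one random link.
            neighbours = random_links.get(peer, [])
            targets = [rng.choice(neighbours)] if neighbours else []
            decrement_prob = 1.0             # random links always consume TTL
        for nb in targets:
            if nb in seen:
                continue                     # already handled: do not propagate again
            seen.add(nb)
            new_ttl = remaining - 1 if rng.random() < decrement_prob else remaining
            queue.append((nb, new_ttl))
    return seen


signatures = {"Tree4": 1.0, "Tree3": 1.1, "Sunset4": 5.0,
              "Sea1": 9.0, "Sea2": 9.2, "Sea3": 8.9}
random_links = {"Tree4": ["Sunset4"], "Sunset4": ["Sea1"], "Sea1": ["Sunset4"],
                "Sea2": ["Tree3"], "Sea3": ["Sunset4"], "Tree3": ["Sea2"]}
attractive_links = {"Tree4": ["Tree3"], "Tree3": ["Tree4"],
                    "Sea1": ["Sea2", "Sea3"], "Sea2": ["Sea1", "Sea3"],
                    "Sea3": ["Sea1", "Sea2"]}
handled = firework_query(random_links, attractive_links, signatures,
                         origin="Tree4", query=9.0, ttl=4, threshold=0.5,
                         attractive_decrement_prob=0.0)
print(sorted(handled))
```

In this toy run the sea query leaves the tree cluster, walks through Sunset4 and then fans out across the whole sea cluster. Tree3, which is irrelevant, never handles it.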

28
DISCOVIR
The DISCOVIR system is compatible with the Gnutella
(v0.4) protocol.
  • Each peer is responsible for performing feature
    extraction on its shared images using the
    DISCOVIR client program
  • With this program, each peer maintains a local
    index of the feature vectors of its image
    collection
  • When a peer initiates a query by giving an
    example image and a particular feature extraction
    method, it sends the feature vector, contained in
    a query message, to all its connected peers
  • Other peers compare this query with their feature
    vector indexes
  • Based on a distance measure, they find a set of
    similar images and return the results to the
    requester
  • Two types of messages are added:
      • ImageQuery: carries the name of the feature
        extraction method and the feature vector of the
        query image
      • ImageQueryHit: contains the location, filename
        and size of each similar image retrieved, and
        its similarity measure to the query
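The sketch below shows how the new messages can ride on Gnutella v0.4. It packs the standard 23-byte v0.4 descriptor header: a 16-byte descriptor ID, then the payload descriptor, TTL and hops (one byte each), then a little-endian 4-byte payload length. The actual ImageQuery layout is given in Figure 6 and is not reproduced here. The payload encoding (a NUL-terminated method name followed by float32 values), the descriptor code `0x90` and the method name `ColorHistogram` are therefore all hypothetical.

```python
import struct
import uuid

# Gnutella v0.4 descriptor header: 16-byte descriptor ID, payload descriptor,
# TTL, hops (1 byte each), then a 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")
IMAGE_QUERY_DESCRIPTOR = 0x90  # hypothetical placeholder, not DISCOVIR's real code


def encode_image_query(method, vector, ttl=7):
    """Build a header plus a hypothetical ImageQuery payload."""
    payload = method.encode("ascii") + b"\x00"              # NUL-terminated method name
    payload += struct.pack(f"<{len(vector)}f", *vector)     # float32 feature values
    header = HEADER.pack(uuid.uuid4().bytes, IMAGE_QUERY_DESCRIPTOR,
                         ttl, 0, len(payload))
    return header + payload


def decode_image_query(message):
    """Inverse of encode_image_query; returns (ttl, hops, method, vector)."""
    _, _, ttl, hops, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    name, _, rest = payload.partition(b"\x00")
    vector = list(struct.unpack(f"<{len(rest) // 4}f", rest))
    return ttl, hops, name.decode("ascii"), vector


msg = encode_image_query("ColorHistogram", [0.5, 0.25])
print(len(msg), decode_image_query(msg))
```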

29
DISCOVIR: Screen Shot
Figure 5: Screenshot of DISCOVIR
30
DISCOVIR: Types Of Messages
Figure 6: ImageQuery message format
Figure 7: ImageQueryHit message format
31
DISCOVIR: Architecture
  • Connection Manager: responsible for setting up
    and managing TCP connections between DISCOVIR
    clients
  • Packet Router: controls routing, and assembles
    and disassembles messages passed between the
    DISCOVIR network and the different components of
    the DISCOVIR client program
  • Plug-in Manager: coordinates the download and
    storage of different feature extraction plug-ins
    and their interaction with the Feature Extractor
    and Image Indexer
  • HTTP Agent: a tiny web server that handles
    file download requests from other DISCOVIR peers
  • Feature Extractor: collaborates with the Plug-in
    Manager to perform feature extraction and
    thumbnail generation on the shared image
    collection, in two modes:
      • Preprocessing: extracts the feature vectors of
        shared images in order to make the collection
        searchable in the network
      • Real-time extraction: extracts the feature
        vector of the query image on the fly and passes
        the query to the Packet Router
  • Image Indexer: indexes the image collection by
    content feature and carries out clustering to
    speed up the retrieval of images
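The division of labour between the Plug-in Manager, Feature Extractor and Image Indexer can be sketched as two small classes. Because an ImageQuery names its feature extraction method, the index below is kept per method. The class and method names and the toy `MeanBrightness` plug-in are hypothetical. The lookup is a linear scan rather than the clustered index described above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PluginManager:
    """Registry of installed feature-extraction plug-ins, keyed by method name."""
    plugins: dict = field(default_factory=dict)

    def install(self, name, extractor):
        self.plugins[name] = extractor

    def extract(self, name, image):
        # Used both for preprocessing and for real-time query extraction.
        return self.plugins[name](image)


@dataclass
class ImageIndexer:
    """Per-method index mapping each shared file to its feature vector."""
    manager: PluginManager
    index: dict = field(default_factory=dict)

    def preprocess(self, method, images):
        # Preprocessing mode: make every shared image searchable for `method`.
        self.index[method] = {name: self.manager.extract(method, data)
                              for name, data in images.items()}

    def lookup(self, method, query_vector, k=3):
        # Linear scan for brevity; the real Image Indexer also uses clustering.
        entries = self.index.get(method, {})
        ranked = sorted((math.dist(vec, query_vector), name)
                        for name, vec in entries.items())
        return [(name, dist) for dist, name in ranked[:k]]


manager = PluginManager()
manager.install("MeanBrightness", lambda pixels: [sum(pixels) / len(pixels)])
indexer = ImageIndexer(manager)
indexer.preprocess("MeanBrightness",
                   {"a.jpg": [10, 20], "b.jpg": [200, 220], "c.jpg": [100, 120]})
query = manager.extract("MeanBrightness", [90, 110])   # real-time extraction
hits = indexer.lookup("MeanBrightness", query, k=2)    # (filename, distance) pairs
print(hits)
```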

32
DISCOVIR: Architecture
Figure 8
33
DISCOVIR: Flow Of Operations
  • Preprocessing
      • The Plug-in Manager module queries the list of
        available feature extraction modules on the
        DISCOVIR control website
      • Selected feature extraction modules are
        downloaded and installed at the user's request
      • The Feature Extractor module extracts features
        and generates thumbnails for all shared images
        using the feature extraction method chosen by
        the user
      • The Image Indexer module then indexes the image
        collection using the extracted multidimensional
        feature vectors
  • Connection Establishment
      • The Connection Manager module asks the bootstrap
        server for peers available to accept incoming
        connections
  • Query Message Routing
      • The Feature Extractor module processes the query
        image and assembles an ImageQuery message, which
        is sent out through the Packet Router module
      • When other peers receive the ImageQuery
        message, they perform two operations:
          • Query message propagation: the Packet Router
            module applies the checking rules
          • Local index look-up: the peer uses the Image
            Indexer module and the information in the
            ImageQuery message to search its local index
            of shared files for similar images. Once
            similar images are retrieved, an ImageQueryHit
            message is delivered back to the requester
            through the Packet Router module
  • Query Result Display
      • When an ImageQueryHit message returns to the
        requester, it obtains a list detailing the
        location and size of the matched images. The
        HTTP Agent module downloads the thumbnails and
        full-size images from the peers using the HTTP
        protocol.

35
DISCOVIR: Performance Metrics
  • Recall: the success rate of retrieving the
    desired results, $\text{Recall} = R_a / R$,
      • where $R_a$ is the number of retrieved relevant
        documents
      • and $R$ is the total number of relevant
        documents in the P2P network
  • Query scope: the fraction of peers visited by
    each query, $\text{Query Scope} = V_{peer} / T_{peer}$,
      • where $V_{peer}$ is the number of peers that
        received and handled the query
      • and $T_{peer}$ is the total number of peers in
        the P2P network
  • Query efficiency: the ratio between the recall
    and the query scope,
    $\text{Query Efficiency} = \text{Recall} / \text{Query Scope}$
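As a worked example of the three ratios, the code below uses hypothetical counts chosen only to show the arithmetic; they are not results from the paper. A query that finds half the relevant images while visiting a tenth of the peers has an efficiency of 5.

```python
def recall(retrieved_relevant, total_relevant):
    return retrieved_relevant / total_relevant          # R_a / R


def query_scope(visited_peers, total_peers):
    return visited_peers / total_peers                  # V_peer / T_peer


def query_efficiency(recall_value, scope_value):
    return recall_value / scope_value                   # recall per unit of scope


r = recall(30, 60)            # hypothetical: 30 of 60 relevant images retrieved
s = query_scope(100, 1000)    # hypothetical: 100 of 1000 peers handled the query
print(r, s, query_efficiency(r, s))
```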

36
DISCOVIR: Experiment
Figure 9: Recall versus number of peers
37
DISCOVIR: Experiment
Figure 10: Query scope versus number of peers
38
DISCOVIR: Experiment
Figure 11: Query efficiency versus number of peers
40
Conclusion
  • We saw an implementation of a CBIR system over the
    current Gnutella network
  • Such an architecture fully utilizes the storage and
    computation capacity of computers on the Internet
  • To solve the query broadcasting problem, the
    authors proposed a peer clustering and
    intelligent query routing strategy to search
    images efficiently over a P2P network
  • The Firework Query Model outperforms the BFS
    (flooding) method in both network traffic cost
    and query efficiency
