Title: Distributed Content Based Visual Information Retrieval System On Peer To Peer Network
1. Distributed Content Based Visual Information Retrieval System On Peer To Peer Network
by SAMEER ABROL
Source: ACM Transactions on Information Systems (TOIS), Volume 22, Issue 3 (July 2004), Pages 477-501
Year of Publication: 2004
ISSN: 1046-8188
Authors:
- Irwin King, The Chinese University of Hong Kong, Shatin, Hong Kong
- Cheuk Hang Ng, The Chinese University of Hong Kong, Shatin, Hong Kong
- Ka Cheung Sia, The Chinese University of Hong Kong, Shatin, Hong Kong
Publisher: ACM Press, New York, NY, USA
2. Discussion Outline
- Introduction
- Background
- Contribution
- Clustering Of P2P Network
- Content Based Query Routing
- Architecture And Design Of DISCOVIR
- Experimental Analysis
- Conclusion
- Questions
Content Based Information Retrieval
3. Introduction
- Peer-to-peer (P2P) applications such as Gnutella have demonstrated the significance of distributed information sharing systems.
- A P2P network offers a completely decentralized and distributed paradigm.
- Currently, most content-based image retrieval (CBIR) systems are based on the centralized computing model.
- P2P offers the advantages of decentralization by distributing:
  - Storage
  - Information
  - Computation cost
- Because of these desirable qualities, many research projects have focused on:
  - Designing different P2P systems
  - Improving their performance
5. Peer-To-Peer Networks
- Unlike in a client-server architecture, individual computers connect directly with each other without using dedicated servers
- Each computer acts as a server and a client simultaneously
- Computers leave and join the network frequently
- Emerging P2P networks offer the following advantages:
  - Distributed resources: the storage, information and computational cost can be distributed among the peers
  - Increased reliability: no reliance on centralized coordinators
  - Comprehensiveness of information: the P2P network has the potential of reaching every computer on the Internet
6. Peer-To-Peer Networks: Flooding Broadcast of Queries
Plainly, this model is wasteful because peers are forced to handle irrelevant queries
- Different files are shared by different peers
- A peer broadcasts a query request to its connecting peers
- Peers share information directly with each other (unlike in a client-server architecture)
- Messages are sent over multiple hops
- Each peer looks up its own local shared collection and responds to queries
Figure 1: Illustration of information retrieval in P2P
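The flooding behaviour described on this slide can be sketched as a breadth-first broadcast with a hop limit. This is a minimal illustration and not Gnutella's actual implementation: the function name `flood_query`, the five-peer topology and the file names are all hypothetical.

```python
from collections import deque

def flood_query(graph, shared, start, wanted, ttl):
    """Breadth-first flooding: every peer forwards the query to all of its
    neighbours until the hop budget (ttl) is used up.
    Returns (peers that hold the wanted file, peers that handled the query)."""
    seen = {start}
    hits = set()
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in shared.get(peer, ()):   # look up the local shared collection
            hits.add(peer)
        if remaining == 0:                   # hop budget exhausted: do not forward
            continue
        for neighbour in graph[peer]:
            if neighbour not in seen:        # a peer handles each query only once
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, seen

# Hypothetical five-peer network (undirected adjacency lists).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared = {"D": {"tree.jpg"}, "E": {"sea.jpg"}}

hits, visited = flood_query(graph, shared, "A", "sea.jpg", ttl=3)
print(sorted(hits), len(visited))
```

In this toy run every peer has to handle the query even though only one of them holds a matching file, which is the waste the slide points out.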
7. Peer-To-Peer Networks: Other Discovery Methods
These methods are still under research for content-based information retrieval
- Distributed Hash Tables (DHTs)
  - A technique to map a filename to a key
  - Each peer stores a certain range of (key, value) pairs
  - Examples are the Chord (key as an m-bit integer) and CAN (key as a point in a d-dimensional Cartesian coordinate space) models
  - DHTs mandate a specific network structure and incur a certain penalty on joining and leaving the network
  - Their performance under the dynamic conditions of prevalent P2P systems is unknown
- Routing indices approach
  - Each peer maintains a routing index
  - This method requires all peers to agree upon a set of document categories
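A rough sketch of the DHT idea, in the style of Chord: a filename is hashed to an m-bit key, and the key is assigned to the first node identifier at or after it on the ring. The value `M = 8`, the helper names `chord_key` and `successor`, and the node identifiers are assumptions chosen for readability; real systems use much larger identifier spaces and locate the successor by routing instead of sorting a full node list.

```python
import hashlib
from bisect import bisect_left

M = 8  # number of identifier bits (assumed small so the numbers stay readable)

def chord_key(name, m=M):
    """Map a filename to an m-bit integer key by truncating a SHA-1 digest."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(node_ids, key):
    """Node responsible for `key`: the first node id >= key on the ring,
    wrapping around to the smallest id when the key is past the last node."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key)
    return ids[i % len(ids)]

nodes = [10, 80, 200]                      # hypothetical node identifiers
for name in ("tree1.jpg", "tree2.jpg"):
    key = chord_key(name)
    print(name, "-> key", key, "-> node", successor(nodes, key))
```

Because the key depends only on the exact filename, two visually similar images generally land on unrelated keys, which suggests why exact-match lookup of this kind does not directly serve similarity queries.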
8. Content Based Information Retrieval
The goal of CBIR systems is to operate on a collection of images and, in response to visual queries, extract the relevant images.
- Features are extracted from the images in the database and are stored and indexed (done off-line).
- The user supplies a query example image, from which image features are extracted
- These image features are used to find the images in the database that are most similar
- A candidate list of similar images is shown to the user
- From the user's feedback the query is optimized and used as a new query in an iterative manner
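The off-line indexing and on-line query steps above can be sketched with a toy feature. The intensity histogram, the Euclidean distance and the three-image database are assumptions made for illustration; a real CBIR system would use richer colour, texture or shape features, and the relevance-feedback loop is not shown.

```python
import math

def histogram_feature(pixels, bins=4, max_value=256):
    """Toy feature: normalised intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_images(index, query_feature, k=2):
    """Return the k image names whose stored features are closest to the query."""
    return sorted(index, key=lambda name: euclidean(index[name], query_feature))[:k]

# Off-line step: extract and index features for a hypothetical image database.
database = {
    "dark.png":   [10, 20, 30, 40],
    "bright.png": [200, 220, 240, 250],
    "mixed.png":  [10, 100, 150, 250],
}
index = {name: histogram_feature(px) for name, px in database.items()}

# On-line step: extract the query image's feature and rank the database by distance.
query = histogram_feature([5, 15, 25, 60])
print(rank_images(index, query))
```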
9. Content-Based Information Retrieval: Basic Concept
Fig 2
11. Contributions
- Efficient data lookup
  - Organizes information in the P2P network
  - Routes the query intelligently according to the content of the query
  - Unlike CAN or Chord, allows each peer to index its own collection, with no fixed topology and no fixed data placement
- Rich query
  - The user can perform more complex queries based on the content of information rather than simple text
- The algorithm is implemented in the DISCOVIR system (covered in the last section)
13. Peer Clustering
The network is organized in a systematic way, like the Yellow Pages, in order to improve query efficiency
- Peers with similar image features are clustered together
- Clustering makes use of an extra layer of connections, called attractive links, on top of the original P2P network
- Each peer shares a set of images and is responsible for extracting the content-based features of its shared images
- This collection of feature vectors is used to describe the characteristics of the shared images
- Similarity between two peers is determined using the set of feature vectors as the signature value of a peer
14. Peer Clustering: Summary Of Key Terms
Term | Definition
Link_random(p) | Random link: the original connection in the P2P network, which a peer p makes randomly to another peer in the network
Link_attractive(p) | Attractive link: the connection which a peer p makes explicitly to another peer with which it shares similar images
Cat(p) | A signature value representing the characteristic of a peer p
Sim(p, q) | The distance measure between two peers p and q, which is a function of Cat(p) and Cat(q)
Sim(p, Q) | The distance measure between a peer p and an image query Q
Peer(p, t) | The set of peers that a peer p can reach within t hops
Collection(p) | The set of images that a peer p shares
Match(Collection(p), Q) | The distance measure from each image in Collection(p) to the query Q
Table I
15. Peer Clustering: Important Definitions
- Collection(p)
  - Represents the set of n images that a peer p shares
  - Low-level feature extraction is performed on each image to map it to a multi-dimensional vector by a function f: f : I -> R^d
  - Here f is a specific feature extraction function, I is an image, and R^d is the space of real-valued d-dimensional vectors
  - After extraction, each peer holds the set of feature vectors {f(I_1), ..., f(I_n)}, where I_1, ..., I_n are the images in Collection(p)
16. Peer Clustering: Important Definitions
- Cat(p)
  - The signature value of peer p, made up of the mean and the variance of the collection of image feature vectors that the peer shares
- Sim(p, q)
  - Distance measure between the signature values of two peers, Cat(p) and Cat(q)
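As a hedged illustration of these two definitions, the sketch below computes a signature as the per-dimension mean and variance of a peer's feature vectors. The exact formula for Sim(p, q) is not given in the text here, so the product form in `sim` is purely an assumed stand-in, chosen only because it is small when the means are close and both variances are small. The use of population variance and the sample vectors are also assumptions.

```python
import math

def signature(vectors):
    """Cat(p) sketch: per-dimension mean and (population) variance of the
    feature vectors a peer shares."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dims)]
    return mean, var

def sim(cat_p, cat_q):
    """ASSUMED distance, not the presentation's formula: the gap between the
    two means, scaled up by the total variance of each peer."""
    (mu_p, var_p), (mu_q, var_q) = cat_p, cat_q
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_p, mu_q)))
    return gap * (1 + sum(var_p)) * (1 + sum(var_q))

# Hypothetical 2-dimensional feature vectors for three peers.
tree_a = signature([[1.0, 1.0], [3.0, 1.0]])
tree_b = signature([[2.0, 2.0], [2.0, 2.0]])
sunset = signature([[10.0, 1.0], [10.0, 1.0]])

print(sim(tree_a, tree_b), sim(tree_a, sunset))
```

With these made-up vectors the two tree peers come out closer to each other than either is to the sunset peer, matching the convention that a smaller Sim value means more similar peers.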
17. Peer Clustering: Explanation
We assume each peer often shares images related to a certain topic
- The feature vectors of its shared images form a sub-cluster in the high-dimensional space
- Thus, the mean and variance can describe the characteristics of this collection
- The target is to group these sub-clusters to form a cluster of peers that share similar images
- The more similar two peers p and q are, the smaller the value of Sim(p, q)
- Sim(p, q) is small when the means of the two peers are close and both variances are small
- If both variances are small, the shared images of each peer are highly related to a common topic
- When the means are close, the two sub-clusters are close in the high-dimensional space
18. Peer Clustering: Clustering Algorithm
Peer clustering is done by assigning attractive links. There are three main steps in the peer clustering strategy:
- Signature value calculation
  - Every peer calculates its signature value, Cat(p), based on the characteristics of the images shared by that peer p.
- Neighborhood discovery
  - A new peer p broadcasts a signature query message to ask for the signature values of the peers within t hops, Peer(p, t)
- Similarity calculation and attractive link establishment
  - The new peer p can now find the peers with signature values closest to its own
  - It makes an attractive connection to link up with them
Algorithm 1
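The three steps can be sketched as follows, assuming the signature values are already computed. This is not a reproduction of Algorithm 1: the helper names, the scalar signatures, the absolute-difference distance and the four-peer topology (including the peer named Sea1) are hypothetical, and the sketch links to only the single closest peer.

```python
from collections import deque

def peers_within(graph, start, hops):
    """Peer(p, t) sketch: the peers reachable from `start` within `hops` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, dist = frontier.popleft()
        if dist == hops:
            continue
        for nxt in graph[peer]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen

def choose_attractive_link(graph, signatures, new_peer, hops, distance):
    """Pick the discovered peer whose signature is closest to the new peer's."""
    candidates = peers_within(graph, new_peer, hops)
    return min(candidates, key=lambda q: distance(signatures[new_peer], signatures[q]))

# Hypothetical network: Tree4 has just joined through a random link to Sunset4.
graph = {
    "Tree4": ["Sunset4"],
    "Sunset4": ["Tree4", "Tree3", "Sea1"],
    "Tree3": ["Sunset4"],
    "Sea1": ["Sunset4"],
}
# Scalar signatures stand in for Cat(p); real signatures hold a mean and a variance.
signatures = {"Tree4": 1.0, "Tree3": 1.2, "Sunset4": 5.0, "Sea1": 9.0}

def scalar_distance(a, b):
    return abs(a - b)

print(choose_attractive_link(graph, signatures, "Tree4", hops=2, distance=scalar_distance))
```

The hop limit matters: with `hops=1` the new peer only learns about Sunset4 and would have to link to it, while a wider discovery radius lets it find the more similar Tree3.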
19. Peer Clustering: Illustration Of Peer Clustering
- A peer named Tree1 is one for which the majority of the images it shares are related to trees
- On joining the network, Tree4 connects to a randomly selected peer, Sunset4
- It sends out a signature query to learn the location and signature value of other peers
- After collecting replies from other peers, Tree4 makes an attractive link to Tree3 to perform peer clustering, because Sim(Tree4, Tree3) is the smallest
- Peers of similar characteristics will gradually be connected by attractive links to form a cluster
20. Peer Clustering: Illustration Of Peer Clustering
Figure 3
21. Peer Clustering: Illustration Of Peer Clustering
Table II
23. Firework Query Model: Illustration Of Firework Query Model
Figure 4
24. Firework Query Model
To make use of the clustered P2P network, the Firework Query Model is proposed
- It is a content-based routing strategy
- A query message is routed selectively according to the content of the query
- In this model, a query message first walks around the network from peer to peer over random links
- Once it reaches the target cluster, the query message is broadcast by peers through the attractive connections inside the cluster
Algorithm 2
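The routing choice at a single peer can be sketched as below. This is not a reproduction of Algorithm 2. It assumes that Sim(p, Q) is a distance, that a value at or below a preset threshold means the peer belongs to the target cluster, and that a peer outside the cluster forwards over all of its random links. The function name `next_hops`, the threshold value and the peer names are hypothetical.

```python
def next_hops(distance_to_query, threshold, random_links, attractive_links, sender=None):
    """Decide where one peer forwards a query under the Firework idea."""
    if distance_to_query <= threshold:
        # The peer is in the target cluster: broadcast over attractive links.
        targets = attractive_links
    else:
        # The peer is outside the target cluster: keep walking over random links.
        targets = random_links
    # Never send the query straight back to the peer it came from.
    return [peer for peer in targets if peer != sender]

# A peer far from the query passes it on over its random link.
print(next_hops(3.0, 0.5, random_links=["Sunset4"], attractive_links=["Tree3"]))

# A peer close to the query spreads it inside its cluster.
print(next_hops(0.2, 0.5, random_links=["Sunset4"],
                attractive_links=["Sea2", "Sea3"], sender="Sea2"))
```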
25. Firework Query Model: Illustration Of Firework Query Model
Assume peer Tree4 initiates a search to find images similar to its query image, which is an image of the sea
- First, the features of this query image are extracted and used to calculate the similarity between the query and the peer's own signature value, Cat(p)
- Since the query does not match Tree4's own signature value closely enough to meet the preset threshold, Tree4 sends the query to Sunset4
- On receiving this query, Sunset4 carries out two steps:
  - Shared file lookup: the peer looks up its shared image collection for images matching the query, Match(Collection(p), Q)
  - Route selection: the peer calculates the similarity between the query and its signature value, which is represented as Sim(p, Q)
26. Firework Query Model: Illustration Of Firework Query Model (Cont)
- Mechanisms used to prevent query messages from looping:
  - Replicated message checking rule
    - Each new query message is checked against the local cache for duplication
    - If the message has already passed through the peer, it is not propagated
  - Time-To-Live (TTL)
    - For random connections, the probability of decreasing the TTL value is 1
    - For attractive connections, the probability is an arbitrary value in [0, 1] called Chance-To-Survive
    - This strategy reduces the number of messages passing outside the target cluster
    - More information can be retrieved inside the cluster
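Both loop-prevention mechanisms can be sketched in a few lines. The class and method names are hypothetical, and the sketch follows the slide's wording literally: the Chance-To-Survive value is used as the probability that the TTL is decreased when a message travels over an attractive link.

```python
import random

class QueryForwarder:
    """Per-peer bookkeeping for the two loop-prevention rules (illustrative only)."""

    def __init__(self, chance_to_survive, rng=None):
        self.chance_to_survive = chance_to_survive  # value in [0, 1]
        self.seen_ids = set()                       # local cache of message ids
        self.rng = rng or random.Random()

    def is_duplicate(self, message_id):
        """Replicated message check: True if this message already passed through."""
        if message_id in self.seen_ids:
            return True
        self.seen_ids.add(message_id)
        return False

    def next_ttl(self, ttl, link_type):
        """TTL after one hop: always decreased on a random link, decreased
        with probability chance_to_survive on an attractive link."""
        if link_type == "random" or self.rng.random() < self.chance_to_survive:
            return ttl - 1
        return ttl

forwarder = QueryForwarder(chance_to_survive=0.0)
print(forwarder.is_duplicate("q-1"), forwarder.is_duplicate("q-1"))
print(forwarder.next_ttl(5, "random"), forwarder.next_ttl(5, "attractive"))
```

A lower probability of decreasing the TTL on attractive links lets a message travel further inside the cluster than outside it, which is the effect the last two bullets describe.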
28. DISCOVIR
The DISCOVIR system is compatible with the Gnutella (v0.4) protocol
- Each peer is responsible for performing feature extraction on its shared images using the DISCOVIR client program
- With this program, each peer maintains its local index of the feature vectors of its image collection
- When a peer initiates a query by giving an example image and a particular feature extraction method, it sends the feature vector, contained in a query message, to all its connecting peers
- Other peers compare this query to their feature vector index
- Based on a distance measure, they find a set of similar images and return the results to the requestor
- Two types of messages are added:
  - ImageQuery: carries the name of the feature extraction method and the feature vector of the query image
  - ImageQueryHit: contains the location, filename and size of the similar images retrieved, and their similarity measure to the query
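To make the content of the two message types concrete, here is a sketch of how a peer might answer an ImageQuery from its local index. The field names, Python types, index layout and Euclidean distance are assumptions made for illustration; they do not reproduce the wire format of the DISCOVIR messages.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class ImageQuery:
    feature_name: str             # name of the feature extraction method
    feature_vector: List[float]   # feature vector of the query image

@dataclass
class ImageHit:
    location: str                 # where the image can be fetched (hypothetical form)
    filename: str
    size: int                     # file size in bytes
    similarity: float             # distance of this image to the query

@dataclass
class ImageQueryHit:
    hits: List[ImageHit] = field(default_factory=list)

def answer(query, index, location, max_distance):
    """Search the local index built with the query's feature method and
    return the images within max_distance, closest first."""
    hits = []
    for filename, (vector, size) in index.get(query.feature_name, {}).items():
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, query.feature_vector)))
        if d <= max_distance:
            hits.append(ImageHit(location, filename, size, d))
    hits.sort(key=lambda h: h.similarity)
    return ImageQueryHit(hits)

# Hypothetical local index: feature method -> filename -> (feature vector, size).
index = {"color_hist": {"a.jpg": ([0.0, 0.0], 100), "b.jpg": ([3.0, 4.0], 200)}}
query = ImageQuery("color_hist", [0.0, 0.0])
print(answer(query, index, "peer-1", max_distance=1.0))
```

Carrying the feature method name in the query lets the receiving peer compare the vector only against an index built with the same method.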
29. DISCOVIR: Screen Shot
Figure 5: Screenshot of DISCOVIR
30. DISCOVIR: Type of Messages
Figure 6: ImageQuery message format
Figure 7: ImageQueryHit message format
31. DISCOVIR: Architecture
- Connection Manager: responsible for setting up and managing the TCP connections between DISCOVIR clients
- Packet Router: controls the routing, assembling and disassembling of messages between the DISCOVIR network and the different components of the DISCOVIR client program
- Plug-in Manager: coordinates the download and storage of different feature extraction plug-ins and their interaction with the Feature Extractor and Image Indexer
- HTTP Agent: a tiny web server that handles file download requests from other DISCOVIR peers
- Feature Extractor: collaborates with the Plug-in Manager to perform feature extraction and thumbnail generation from the shared image collection
  - Preprocessing: extracts the feature vectors of shared images in order to make the collection searchable in the network
  - Real-time extraction: extracts the feature vector of the query image on the fly and passes the query to the Packet Router
- Image Indexer: indexes the image collection by content feature and carries out clustering to speed up the retrieval of images
32. DISCOVIR: Architecture
Figure 8
33. DISCOVIR: Flow Of Operations
- Preprocessing
  - The Plug-in Manager module is responsible for querying the list of available feature extraction modules on the DISCOVIR control website
  - Selected feature extraction modules are downloaded and installed upon the user's request
  - The Feature Extractor module extracts features and generates thumbnails for all shared images using the particular feature extraction method chosen by the user
  - The Image Indexer module then indexes the image collection using the extracted multi-dimensional feature vectors
- Connection Establishment
  - The Connection Manager module asks the bootstrap server for peers available for accepting incoming connections
- Query Message Routing
  - The Feature Extractor module processes the query image and assembles an ImageQuery message to be sent out through the Packet Router module
  - When other peers receive the ImageQuery message, they perform two operations:
    - Query message propagation: the Packet Router module employs the checking rules to decide whether to propagate the message
    - Local index look-up: the peer uses the Image Indexer module and the information in the ImageQuery message to search its local index of shared files for similar images. An ImageQueryHit message is delivered back to the requestor through the Packet Router module once similar images are retrieved
- Query Result Display
  - When an ImageQueryHit message returns to the requestor, the requestor obtains a list detailing the location and size of the matched images. The HTTP Agent module downloads thumbnails or full-size images from the peer using HTTP.
35. DISCOVIR: Performance Metrics
- Recall: the success rate of retrieving the desired results
  - Recall = Ra / R
  - where Ra is the number of retrieved relevant documents and R is the total number of relevant documents in the P2P network
- Query Scope: the fraction of peers visited by each query
  - Query Scope = Vpeer / Tpeer
  - where Vpeer is the number of peers that received and handled the query and Tpeer is the total number of peers in the P2P network
- Query Efficiency: the ratio between the recall and the query scope
  - Query Efficiency = Recall / Query Scope
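The three metrics translate directly into code. The numbers in the example are invented to show the arithmetic and are not results from the experiments.

```python
def recall(retrieved_relevant, total_relevant):
    """Recall = Ra / R."""
    return retrieved_relevant / total_relevant

def query_scope(visited_peers, total_peers):
    """Query Scope = Vpeer / Tpeer."""
    return visited_peers / total_peers

def query_efficiency(recall_value, scope_value):
    """Query Efficiency = Recall / Query Scope."""
    return recall_value / scope_value

# Made-up example: 30 of 40 relevant images found while visiting 250 of 1000 peers.
r = recall(30, 40)
s = query_scope(250, 1000)
print(r, s, query_efficiency(r, s))
```

A query that reaches the same recall while visiting fewer peers scores a higher efficiency, which is the property a selective routing strategy aims for.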
36. DISCOVIR: Experiment
Figure 9: Recall versus Number of Peers
37. DISCOVIR: Experiment
Figure 10: Query Scope versus Number of Peers
38. DISCOVIR: Experiment
Figure 11: Query Efficiency versus Number of Peers
40. Conclusion
- We saw an implementation of a CBIR system over the current Gnutella network
- Such an architecture fully utilizes the storage and computation capacity of computers on the Internet
- To solve the query broadcasting problem, the authors proposed a peer clustering and intelligent query routing strategy to search images efficiently over a P2P network
- The Firework Query Model outperforms the BFS method in both network traffic cost and query efficiency measure