Distributed Content Based Visual Information Retrieval System On Peer To Peer Network


Slides: 43

Provided by: chrisv8

Learn more at: https://crystal.uta.edu


1
Distributed Content Based Visual Information
Retrieval System On Peer To Peer Network
by SAMEER ABROL
Source: ACM Transactions on Information Systems (TOIS), Volume 22, Issue 3 (July 2004), Pages 477-501
Year of Publication: 2004
ISSN: 1046-8188
Authors:
Irwin King, The Chinese University of Hong Kong, Shatin, Hong Kong
Cheuk Hang Ng, The Chinese University of Hong Kong, Shatin, Hong Kong
Ka Cheung Sia, The Chinese University of Hong Kong, Shatin, Hong Kong
Publisher: ACM Press, New York, NY, USA
2
Discussion Outline

Introduction
Background
Contribution
Clustering Of P2P Network
Content Based Query Routing
Architecture And Design Of DISCOVIR
Experimental Analysis
Conclusion
Questions

Content Based Information Retrieval

3
Introduction

Peer-To-Peer Applications (e.g. Gnutella) have
demonstrated the significance of distributed
information sharing systems.
Peer-To-Peer Network offers a completely
decentralized and distributed paradigm.
Currently, most content-based image retrieval
(CBIR) systems are based on the centralized
computing model.
P2P offers the advantages of decentralization by
distributing
Storage
Information
Computation cost
Because of these desirable qualities, many
research projects have focused on
Designing different P2P systems
Improving their performance

5
Peer-To-Peer Networks

Unlike the client-server architecture, individual
computers connect directly with each other
without using dedicated servers
Each computer acts as a server and a client
simultaneously
Computers leave and join the network frequently
Emerging P2P networks offer the following
advantages
Distributed Resources: The storage, information
and computational cost can be distributed among
the peers
Increased Reliability: No reliance on
centralized coordinators
Comprehensiveness of Information: The P2P
network has the potential of reaching every
computer on the Internet

6
Peer-To-Peer Networks: Flooding Broadcast of
Queries
Plainly, this model is wasteful because peers
are forced to handle irrelevant queries

Different files are shared by different peers
A peer broadcasts a query request to its connected
peers
Peers share information directly with each other
(unlike the client-server architecture)
Messages are sent over multiple hops
Each peer looks up its own local shared
collection and responds to queries

FIG 1 Illustration of Information retrieval
in P2P
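
The flooding behaviour described on this slide can be sketched in a few lines of Python. This is a simplified model, not the Gnutella wire protocol: the overlay graph, the peer names and the function `flood_query` are hypothetical, and each peer is assumed to forward the query to all of its neighbours (including the one it came from) until the time-to-live runs out.

```python
from collections import deque

def flood_query(graph, start, ttl):
    """Breadth-first flooding over an overlay graph.

    Every peer that receives the query for the first time forwards it to
    all of its neighbours while hops remain. Returns the set of peers
    reached and the number of query messages sent.
    """
    visited = {start}
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: the peer answers locally but does not forward
        for neighbour in graph[peer]:
            messages += 1  # one message per link traversal
            if neighbour not in visited:  # duplicates are received but not re-forwarded
                visited.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return visited, messages

# Hypothetical 5-peer overlay (adjacency lists).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, sent = flood_query(overlay, "A", ttl=3)
```

Even in this tiny overlay the number of messages exceeds the number of peers, and every reached peer has to process the query whether or not it holds anything relevant.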
7
Peer-To-Peer Networks: Other Discovery Methods
These methods are still under research
for Content Based Information
Retrieval

Distributed Hash Tables
A technique to map a filename to a key
Each peer stores a certain range of (key, value)
pairs
Examples include Chord (key as an m-bit
integer) and CAN (key as a point in a
d-dimensional Cartesian coordinate space)
DHTs mandate a specific network structure and
incur a certain penalty on joining and leaving
the network
Their performance under the dynamic conditions of
prevalent P2P systems is unknown
Routing indices approach
Each peer maintains a routing index
This method requires all peers to agree upon a
set of document categories
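
To make the DHT idea concrete, the following sketch shows Chord-style key placement only: a filename is hashed to an m-bit integer and stored on the first peer whose identifier is equal to or follows that key on the ring. It leaves out finger tables and join/leave handling, and the value of `M`, the peer names and the helper names `key_of` and `successor` are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left

M = 16  # key length in bits (illustrative value)

def key_of(name, m=M):
    """Map a filename or peer name to an m-bit integer key."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(key, ring):
    """Return the first peer identifier at or after `key` on the sorted ring,
    wrapping around to the smallest identifier if none is larger."""
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]

# Hypothetical peers; their identifiers are hashes of their names.
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
ring = sorted(key_of(p) for p in peers)

# The peer responsible for storing the (key, value) pair of this file.
owner = successor(key_of("sunset.jpg"), ring)
```

Because placement depends only on the hash of the filename, a lookup needs the exact name; this is why such schemes do not directly support similarity queries over image content.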

8
Content Based Information Retrieval
The goal of CBIR systems is to operate on a
collection of images and, in response to visual
queries, extract the relevant images.

Features are extracted from the images in the
database, then stored and indexed (done
off-line).
The user supplies a query example image, from which
image features are extracted
These image features are used to find the images in
the database that are most similar
A candidate list of similar images is shown to the
user
Based on user feedback, the query is optimized and
used as a new query in an iterative manner
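
The off-line indexing and query-by-example steps above can be sketched as follows. The feature extractor here (mean of each colour channel) is a toy stand-in chosen for brevity, not a method named in the presentation; the image data and all function names are hypothetical, and the relevance-feedback loop is left out.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def extract_features(pixels):
    """Toy feature extractor: mean of each colour channel of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def build_index(images):
    """Off-line step: extract and store one feature vector per image."""
    return {name: extract_features(pixels) for name, pixels in images.items()}

def query(index, example_pixels, k=2):
    """On-line step: rank indexed images by distance to the example image."""
    q = extract_features(example_pixels)
    ranked = sorted(index, key=lambda name: euclidean(index[name], q))
    return ranked[:k]

# Hypothetical two-pixel "images" so the example stays readable.
images = {
    "tree.jpg": [(20, 120, 30), (30, 140, 40)],
    "sea.jpg": [(10, 60, 200), (20, 80, 220)],
    "sunset.jpg": [(230, 120, 40), (250, 100, 20)],
}
index = build_index(images)
top = query(index, [(15, 70, 210)], k=2)  # a bluish example image
```

In this toy data the bluish query ranks `sea.jpg` first, since its channel means coincide with those of the query.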

9
Content Based Information Retrieval: Basic Concept
Fig 2
11
Contributions

Efficient Data Lookup
Organize information in the P2P network
Route the query intelligently according to the
content of the query
Unlike CAN or Chord, each peer indexes its own
collection, with no fixed topology and no fixed data
placement
Rich Query
Users can perform more complex queries based on the
content of information rather than simple text
The algorithm is implemented in the DISCOVIR system
(covered in the last section)

13
Peer Clustering
The network is organized in a systematic way,
like the Yellow Pages, in order to
improve query efficiency

Cluster peers with similar image features
together
Makes use of an extra layer of connections,
called attractive links, on top of the original
P2P network
Each peer shares a set of images and is
responsible for extracting the content-based
features of its shared images
This collection of feature vectors is used to
describe the characteristics of the shared images
Similarity between two peers is determined using
the set of feature vectors as the signature value of
a peer

14
Peer Clustering: Summary Of Key Terms
Term                        Definition
Link_random (p)             Random link: the original connection in the P2P network, which a peer p makes randomly to another peer in the network
Link_attractive (p)         Attractive link: the connection which a peer p makes explicitly to another peer with which it shares similar images
Cat (p)                     A signature value representing the characteristics of a peer p
Sim (p, q)                  The distance measure between two peers p and q, which is a function of Cat (p) and Cat (q)
Sim (p, Q)                  The distance measure between a peer p and an image query Q
Peer (p, t)                 The set of peers that a peer p can reach within t hops
Collection (p)              The set of images that a peer p shares
Match (Collection (p), Q)   The distance measure from each image in Collection (p) to the query Q
Table I
15
Peer Clustering: Important Definitions

Collection (p)
Represents the set of n images that a peer p shares
Low level feature extraction is performed on each
image to map it to a multi-dimensional vector by
a function f
$f : I \to \mathbb{R}^d$
where f is a specific feature extraction
function, I is an image and $\mathbb{R}^d$ is the space of
real-valued d-dimensional vectors
After extraction, each peer contains a set of
feature vectors
$\{ x_i = f(I_i) : I_i \in \mathrm{Collection}(p),\ i = 1, \dots, n \}$

16
Peer Clustering: Important Definitions

Cat (p)
$\mathrm{Cat}(p) = (\mu_p, \sigma_p^2)$
where $\mu_p$ and $\sigma_p^2$ are the mean and
variance of the collection of image feature vectors that
peer p shares
Sim (p, q)
Distance measure between the signature values of two peers,
Cat (p) and Cat (q)

17
Peer Clustering: Explanation
We assume each peer often shares images related
to a certain topic

The feature vectors of its shared images form a
sub-cluster in the high dimensional space
Thus, the mean and variance can describe the
characteristics of this collection
The target is to group these sub-clusters to form
a cluster of peers that share similar images
The more similar two peers p and q are, the
smaller the value of Sim(p, q)
Sim(p, q) is small when the means of the two
peers are close and both variances are
small
If both variance measures are small, it means
the shared images are highly related to a common
topic
When the means are close, it means that the two
sub-clusters are close in the high dimensional space
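
As a rough illustration of the explanation above, the sketch below computes a mean-and-variance signature for a peer and a distance between two signatures. The exact Sim formula is not given in this transcript, so `sim` here is an assumed stand-in (distance between means plus the summed variances) that only reproduces the stated property: it is small when the means are close and both variances are small. All names and numbers are hypothetical.

```python
import math

def signature(vectors):
    """Cat(p) sketch: per-dimension mean and variance of a peer's feature vectors."""
    n = len(vectors)
    dims = range(len(vectors[0]))
    mean = [sum(v[d] for v in vectors) / n for d in dims]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in dims]
    return mean, var

def sim(cat_p, cat_q):
    """Assumed distance between two signatures, NOT the formula from the paper.

    Grows with the gap between the means and with either peer's variance.
    """
    (mean_p, var_p), (mean_q, var_q) = cat_p, cat_q
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_p, mean_q)))
    spread = sum(var_p) + sum(var_q)
    return gap + spread

# Hypothetical 2-D feature vectors for three peers.
tree1 = signature([(0.10, 0.90), (0.20, 0.80)])
tree2 = signature([(0.15, 0.85), (0.25, 0.75)])
sea1 = signature([(0.90, 0.10), (0.80, 0.20)])

closer = sim(tree1, tree2) < sim(tree1, sea1)
```

With these numbers the two tree peers come out much closer to each other than either is to the sea peer, which is the behaviour peer clustering relies on.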

18
Peer Clustering: Clustering Algorithm
Peer clustering is done by assigning attractive
links. There are three main steps in the peer
clustering strategy

Signature Value Calculation
Every peer calculates its signature value,
Cat (p), based on the characteristics of the images
shared by that peer p.
Neighborhood Discovery
The new peer broadcasts a signature query message to ask for
the signature values of the peers in Peer (p, t)
Similarity Calculation and Attractive Link
Establishment
The new peer p can now find the other peers with
signature values closest to its own value
It makes an attractive connection to link them up

Algorithm 1
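
The three steps can be sketched as below. This is not a transcription of Algorithm 1, which did not survive in this transcript: signature values are assumed to be precomputed as scalar (mean, variance) pairs, `sim` is an assumed stand-in distance, and the peer names, link layout and numbers are hypothetical.

```python
from collections import deque

def peers_within(random_links, start, t):
    """Peer(p, t): peers reachable from `start` within t hops over random links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if hops == t:
            continue
        for neighbour in random_links[peer]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    seen.discard(start)
    return seen

def sim(cat_p, cat_q):
    """Assumed distance between two scalar (mean, variance) signatures."""
    (mean_p, var_p), (mean_q, var_q) = cat_p, cat_q
    return abs(mean_p - mean_q) + (var_p + var_q)

def choose_attractive_link(new_peer, cat, random_links, t):
    """Steps 2 and 3: discover neighbours, then pick the most similar one."""
    candidates = peers_within(random_links, new_peer, t)
    return min(candidates, key=lambda q: sim(cat[new_peer], cat[q]))

# Step 1 is assumed done: hypothetical precomputed signature values.
cat = {
    "Tree3": (0.20, 0.01),
    "Tree4": (0.22, 0.01),
    "Sea1": (0.50, 0.02),
    "Sunset4": (0.80, 0.02),
}
random_links = {
    "Tree4": ["Sunset4"],
    "Sunset4": ["Tree4", "Sea1", "Tree3"],
    "Sea1": ["Sunset4"],
    "Tree3": ["Sunset4"],
}
target = choose_attractive_link("Tree4", cat, random_links, t=2)
```

The hop limit t bounds how much of the network a joining peer has to probe; with t=1 in this layout the only candidate would be the randomly chosen neighbour.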
19
Peer Clustering: Illustration Of Peer Clustering

A peer named Tree1 means the majority of images
it shares are related to Tree
On joining the network, Tree4 connects to a
randomly selected peer, Sunset4
It sends out a signature query to learn the
location and signature value of other peers
After collecting replies from other peers, peer
Tree4 makes an attractive link to Tree3 to
perform peer clustering. This is because Sim
(Tree4,Tree3) is the smallest
Peers of similar characteristics will gradually
be connected by attractive links to form a
cluster

20
Peer Clustering: Illustration Of Peer Clustering
Figure 3
21
Peer Clustering: Illustration Of Peer Clustering
Table II
23
Firework Query Model: Illustration Of Firework
Query Model
Figure 4
24
Firework Query Model
To make use of the clustered P2P network, the Firework
Query Model is proposed

It is a content based routing strategy
A query message is routed selectively according
to the content of the query
In this model, a query message first walks
around the network from peer to peer over random
links
Once it reaches the target cluster, the query
message is broadcast by peers through the
attractive connections inside the cluster

Algorithm 2
25
Firework Query Model: Illustration Of Firework
Query Model
Assume peer Tree4 initiates a search to find
images similar to its query image, which is an
image of Sea

First, the features of this query image are
extracted and used to calculate the similarity
between the query and Tree4's own signature value, Cat
(p)
Because the query is not close enough to Tree4's
signature value when compared against a preset
threshold, Tree4 sends the query to Sunset4 over its random link
On receiving this query, Sunset4 carries out two
steps
Shared file lookup: The peer searches its shared
image collection for images matching the query
Route selection: The peer calculates the
similarity between the query and its signature value,
which is represented as Sim (Sunset4, Q)

26
Firework Query Model: Illustration Of Firework
Query Model (Cont)

Mechanisms used to prevent query messages from
looping
Replicated message checking rule
Each new query message is checked against the
local cache for duplication
If the message has already passed through the peer,
it is not propagated
Time-To-Live (TTL)
For random connections, the probability of
decreasing the TTL value is 1
For attractive connections, the probability is an
arbitrary value in [0, 1] called Chance-To-Survive
This strategy reduces the number of messages
passing outside the target cluster
More information can be retrieved inside the
cluster
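
The routing and loop-prevention rules can be put together in a small simulation. This is a sketch under assumptions rather than the presenter's Algorithm 2: Sim(p, Q) is given as a precomputed table, a single shared `seen` set stands in for the per-peer message caches, and, following the wording of this slide, `cts` is treated as the probability that the TTL is decremented on an attractive link. Peer names, distances and the threshold are hypothetical.

```python
from collections import deque
import random

def firework_query(net, dist_to_query, start, ttl, threshold, cts, rng):
    """Route a query: random links outside the target cluster,
    attractive links inside it. Returns peers in the order they handled it."""
    seen = set()   # replicated-message check (stands in for per-peer caches)
    handled = []
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if peer in seen:
            continue  # message already passed through this peer: drop it
        seen.add(peer)
        handled.append(peer)
        if dist_to_query[peer] <= threshold:
            # Inside the target cluster: broadcast over attractive links,
            # TTL is decremented only with probability cts.
            links = net[peer]["attractive"]
            if rng.random() < cts:
                hops_left -= 1
        else:
            # Outside the cluster: walk over random links, TTL always drops.
            links = net[peer]["random"]
            hops_left -= 1
        if hops_left > 0:
            queue.extend((neighbour, hops_left) for neighbour in links)
    return handled

# Hypothetical network: a Sea cluster reachable from Tree4 via Sunset4.
net = {
    "Tree4": {"random": ["Sunset4"], "attractive": ["Tree3"]},
    "Tree3": {"random": [], "attractive": ["Tree4"]},
    "Sunset4": {"random": ["Tree4", "Sea1"], "attractive": []},
    "Sea1": {"random": ["Sunset4"], "attractive": ["Sea2", "Sea3"]},
    "Sea2": {"random": [], "attractive": ["Sea1", "Sea3"]},
    "Sea3": {"random": [], "attractive": ["Sea1", "Sea2"]},
}
# Hypothetical Sim(p, Q) values for a Sea query image.
dist_to_query = {"Tree4": 0.9, "Tree3": 0.9, "Sunset4": 0.7,
                 "Sea1": 0.1, "Sea2": 0.1, "Sea3": 0.2}

path = firework_query(net, dist_to_query, "Tree4", ttl=3,
                      threshold=0.3, cts=0.0, rng=random.Random(0))
```

With `cts=0.0` the TTL is never decremented inside the cluster, so the query covers every Sea peer while Tree3 is never contacted; the duplicate check is what stops the broadcast from looping.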

28
DISCOVIR
The DISCOVIR system is compatible with the Gnutella
(v0.4) protocol

Each peer is responsible for performing feature
extraction on its shared images using the DISCOVIR
client program
With this program, each peer maintains its local
index of the feature vectors of its image collection
When a peer initiates a query by giving an
example image and a particular feature extraction
method, it sends the feature vector, contained in
a query message, to all its connected peers
Other peers compare this query to their feature
vector index
Based on a distance measure, they find a set of
similar images and return the results to the
requestor
Two types of messages are added
ImageQuery: It carries the name of the feature
extraction method and the feature vector of the query
image
ImageQueryHit: It contains the location,
filename and size of the similar images retrieved, and
their similarity measure to the query
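
The two message types and the local comparison can be modelled roughly as follows. This is an in-memory sketch only: it does not reproduce the DISCOVIR or Gnutella wire formats, and the field names, the `answer` helper, the feature method name, the address and the file data are all hypothetical.

```python
from dataclasses import dataclass
import math

@dataclass
class ImageQuery:
    feature_method: str    # name of the feature extraction method
    feature_vector: tuple  # feature vector of the query image

@dataclass
class ImageQueryHit:
    location: str          # where the image can be downloaded from
    filename: str
    size: int              # file size in bytes
    distance: float        # similarity measure to the query

def answer(query, local_index, location, max_distance):
    """Compare a query against a peer's local index and build the hits."""
    hits = []
    for filename, (method, vector, size) in local_index.items():
        if method != query.feature_method:
            continue  # vectors from different extraction methods are not comparable
        d = math.dist(vector, query.feature_vector)
        if d <= max_distance:
            hits.append(ImageQueryHit(location, filename, size, d))
    return sorted(hits, key=lambda h: h.distance)

# Hypothetical local index: filename -> (method, feature vector, size).
local_index = {
    "sea1.jpg": ("colour-mean", (0.10, 0.30, 0.90), 48213),
    "tree7.jpg": ("colour-mean", (0.20, 0.80, 0.20), 51002),
}
q = ImageQuery("colour-mean", (0.10, 0.35, 0.85))
hits = answer(q, local_index, location="10.0.0.5:6346", max_distance=0.2)
```

Carrying the extraction method name in the query lets a peer skip images indexed with a different method instead of comparing incompatible vectors.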

29
DISCOVIR: Screen Shot
Figure 5 Screen-shot of DISCOVIR
30
DISCOVIR: Type of Messages
Figure 6 ImageQuery message Format
Figure 7 ImageQueryHit message Format
31
DISCOVIR: Architecture

Connection Manager: Responsible for setting up
and managing TCP connections between DISCOVIR
clients
Packet Router: Controls the routing, assembly
and disassembly of messages between the DISCOVIR
network and the different components of the DISCOVIR
client program
Plug-in Manager: Coordinates the download and
storage of different feature extraction plug-ins
and their interaction with the Feature Extractor and
Image Indexer
HTTP Agent: A tiny web server that handles
file download requests from other DISCOVIR peers
Feature Extractor: Collaborates with the Plug-in
Manager to perform feature extraction and
thumbnail generation from the shared image collection
Preprocessing: Extracts the feature vectors of
shared images in order to make the collection
searchable in the network
Real-Time Extraction: Extracts the feature
vector of the query image on the fly and passes
the query to the Packet Router
Image Indexer: Indexes the image collection by
content feature and carries out clustering to
speed up retrieval of images

32
DISCOVIR: Architecture
Figure 8
33
DISCOVIR: Flow Of Operations

Preprocessing
The Plug-in Manager module is responsible for querying
the list of available feature extraction modules
on the DISCOVIR control website
Selected feature extraction modules are
downloaded and installed upon the user's request
The Feature Extractor module extracts features
and generates thumbnails for all shared images
using the particular feature extraction method
chosen by the user
The Image Indexer module then indexes the image
collection using the extracted multidimensional
feature vectors
Connection Establishment
The Connection Manager module asks the Bootstrap
server for peers available for accepting incoming
connections
Query Message Routing
The Feature Extractor module processes the query
image and assembles an ImageQuery message to be
sent out through the Packet Router module
When other peers receive the ImageQuery
message, they perform two operations
Query Message Propagation: The Packet Router module
employs the checking rules
Local Index Look-up: The peer uses the Image
Indexer module and the information in the ImageQuery
message to search its local index of shared files
for similar images. An ImageQueryHit message is
delivered back to the requestor through the Packet
Router module once similar images are retrieved
Query Result Display: When an ImageQueryHit
message returns to the requestor, the requestor obtains
a list detailing the location and size of the matched
images. The HTTP Agent module downloads thumbnails
or full size images from the peer using HTTP.

35
DISCOVIR: Performance Metrics

Recall: The success rate of the desired result
retrieved
$\text{Recall} = R_a / R$
where $R_a$ is the number of retrieved relevant
documents and
$R$ is the total number of relevant
documents in the P2P network
Query Scope: Fraction of peers visited by
each query
$\text{Query Scope} = V_{peer} / T_{peer}$
where $V_{peer}$ is the number of peers
that received and handled the query and
$T_{peer}$ is the total
number of peers in the P2P network
Query Efficiency: The ratio between the recall
and the query scope
$\text{Query Efficiency} = \text{Recall} / \text{Query Scope}$
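
A short worked example may help relate the three metrics. The function names and the numbers below are made up for illustration and are not results from the experiments.

```python
def recall(retrieved_relevant, total_relevant):
    """Fraction of all relevant documents in the network that were retrieved."""
    return retrieved_relevant / total_relevant

def query_scope(visited_peers, total_peers):
    """Fraction of peers that received and handled the query."""
    return visited_peers / total_peers

def query_efficiency(recall_value, scope_value):
    """Recall obtained per unit of query scope."""
    return recall_value / scope_value

# Hypothetical query: 30 of 40 relevant images found while visiting
# 150 of 1000 peers.
r = recall(30, 40)
s = query_scope(150, 1000)
e = query_efficiency(r, s)
```

Here the recall is 0.75 and the scope 0.15, giving an efficiency of about 5. A query that reached the same recall by flooding every peer would have a scope of 1 and an efficiency of only 0.75.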

36
DISCOVIR: Experiment
Figure 9 Recall versus Number of peers
37
DISCOVIR: Experiment
Figure 10 Query Scope versus Number of Peers
38
DISCOVIR: Experiment
Figure 11 Query Efficiency versus Number of Peers
40
Conclusion

We saw an implementation of a CBIR system over the current
Gnutella network
Such an architecture fully utilizes the storage and
computation capacity of computers on the Internet
To solve the query broadcasting problem, the authors
proposed a peer clustering and intelligent query
routing strategy to search images efficiently
over a P2P network
The Firework Query Model outperforms the BFS method
in both network traffic cost and query efficiency
measure

