Affinity Hybrid Tree: An Indexing Technique for Content-Based Image Retrieval in Multimedia Databases

1
Affinity Hybrid Tree: An Indexing Technique for
Content-Based Image Retrieval in Multimedia
Databases

Kasturi Chatterjee and Shu-Ching Chen
Distributed Multimedia Information System
Laboratory
School of Computing and Information Sciences
Florida International University, Miami, FL
33199, USA

2
Outline

Motivation
Need for indexing in multimedia databases
Need for high-level image relationships
Need for embedding high-level relationships
in the index structure
Literature Review
Multidimensional index structures
Indexing mechanisms supporting CBIR and RF
Affinity relationships
Affinity Hybrid Tree (AH-Tree)
Proposed structure
Characteristics
Similarity queries
Experimental Analysis
Conclusion and Future Work

3
Motivation

Need for indexing in multimedia databases
The popularity of multimedia presentation and
storage calls for efficient multimedia storage
and retrieval
Indexing is an integral part of designing a
database system: it reduces computation
overhead and optimizes retrieval
Multimedia data (e.g., images) are represented
differently from traditional data, generally as
multi-dimensional feature vectors
Index structures should handle high
dimensionality efficiently, since the higher the
dimension, the better the multimedia
representation and the more satisfactory the
retrieval results

Thus, we need a specialized index structure and
retrieval mechanism, different from traditional
indexing, to handle the above concerns

4
Motivation

Need for high-level image relationships
Content-Based Image Retrieval (CBIR) with
Relevance Feedback (RF) is a popular image
retrieval mechanism
CBIR incorporates high-level image
relationships into the retrieval method with the
help of RF, to capture the user's concept of
similarity
The more accurately the user's similarity
concept is interpreted, the more relevant the
query results

5
Motivation

Need for embedding high-level relationships in
the index structure
An index structure is required to make the
retrieval mechanism computationally efficient,
while high-level image relationships improve the
quality and relevance of the retrieval results

Hence, efficient multimedia storage that
supports a retrieval mechanism such as CBIR
needs an index structure that supports
high-level image relationships

6
Literature Review

Multi-dimensional index structures
Feature based: project an image as a feature
vector in a feature space and index the space,
e.g., KDB-tree and R-tree
Distance based: built on the distances or
similarities between pairs of data objects,
e.g., M-tree, vp-tree

Each category can be further subdivided into
Data partitioned (DP): the index structure
consists of bounding regions (BRs) arranged in a
(spatial) containment hierarchy, e.g., R-tree,
X-tree
Space partitioned: the index structure consists
of space recursively partitioned into mutually
disjoint subspaces, e.g., KDB-tree, vp-tree

7
Literature Review

Indexing mechanisms supporting CBIR and RF
None of the index mechanisms discussed captures
the high-level relationship as it is, without
attempting to translate it into its low-level
equivalent [4][6][11]
Capturing the user's similarity concept in the
form of feature-level closeness is often
error-prone and/or imposes a heavy burden on the
end user (which is not desirable)
Affinity relationships
A way to capture the user's similarity measure
in a CBIR paradigm
A parameter of the Markov Model Mediator
proposed in [16], whose main idea is that the
more frequently two images are accessed
together, the more related they are and the
higher their affinity value
The relative affinity measurement between two
images m and n is calculated from the following
quantities

use_{m,k}: usage pattern of image m w.r.t. query
image q_k per time period
access_k: access frequency of query q_k per
time period
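
The two quantities above can be combined in a short sketch. The product-sum form used here, aff_{m,n} = sum over k of use_{m,k} * use_{n,k} * access_k, is an assumption that is consistent with the definitions on this slide and with the idea that images accessed together under frequent queries are more related; it should be checked against [16]. The function name and the toy data are hypothetical.

```python
def relative_affinity(use, access):
    """Pairwise affinity from usage patterns and query access frequencies.

    use[m][k]  -- usage pattern of image m w.r.t. query q_k (e.g. 1 if accessed)
    access[k]  -- access frequency of query q_k
    Assumed form: aff[m][n] = sum_k use[m][k] * use[n][k] * access[k].
    """
    n_images = len(use)
    n_queries = len(access)
    aff = [[0.0] * n_images for _ in range(n_images)]
    for m in range(n_images):
        for n in range(n_images):
            aff[m][n] = sum(
                use[m][k] * use[n][k] * access[k] for k in range(n_queries)
            )
    return aff


# Toy data: 3 images, 3 queries.
use = [
    [1, 0, 1],  # image 0 was accessed with queries q_0 and q_2
    [1, 1, 0],  # image 1 was accessed with queries q_0 and q_1
    [0, 1, 1],  # image 2 was accessed with queries q_1 and q_2
]
access = [3, 2, 5]
aff = relative_affinity(use, access)
print(aff[0][1], aff[0][2], aff[1][2])
```

Images 0 and 2 share the most frequently issued query (q_2), so under this assumed form they receive the highest pairwise affinity of the three pairs.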
8
Literature Review

Limitations of existing works
None of the existing multidimensional indexing
structures can incorporate the high-level image
relationship efficiently in its framework
Feature-based index structures cannot
incorporate the high-level image relationship,
because Spatial Access Methods require the
distances between objects to be strictly related
to the object positions in a low-dimensional
vector space [13], etc.
Thus, image object similarity needs to be
translated along different dimensions, which
becomes problematic
Distance-based index structures can incorporate
the high-level image relationship as it is, but
not efficiently, since the number of object-pair
distance calculations is huge
Thus, it negates the very essence of indexing,
which aims at keeping the computation overhead
as low as possible

9
Affinity Hybrid Tree (AH-Tree)

Proposed Structure
Solves the two limitations discussed by
combining feature-based and distance-based
index mechanisms into one hybrid structure

The feature-based index mechanism filters the
feature space and reduces the number of distance
computations to be performed
Reduces computational overhead
The distance-based index mechanism incorporates
the high-level image relationship as it is,
without translating it into its low-level
equivalent
Increases retrieved image relevance by capturing
the user concept as it is
10
Affinity Hybrid Tree (AH-Tree)

Building AH-Tree
1. Build the space index by feeding it the data
points
2. For each data node of the space index, check
whether the number of data points is greater
than a threshold
3. If yes, merge the data nodes and build a
distance-based index for each data node. Each
data node then contains the root of the
corresponding distance-based index tree
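
The three build steps can be sketched in a few lines, under loud assumptions: a uniform grid stands in for the feature-based space index, and a single pivot with a distance-sorted entry list stands in for a distance-based tree such as the M-tree. The merging of data nodes in step 3 is not modeled, and all names (`build_ah_tree_sketch`, `cell_size`, `threshold`) are hypothetical.

```python
import math
from collections import defaultdict


def build_ah_tree_sketch(points, cell_size, threshold):
    """points: dict mapping object id -> feature vector (tuple of floats)."""
    # Step 1: build the space index (assumed here: a uniform grid).
    cells = defaultdict(list)
    for obj_id, vec in points.items():
        key = tuple(int(x // cell_size) for x in vec)
        cells[key].append(obj_id)

    index = {}
    for key, ids in cells.items():
        # Step 2: check the number of data points against the threshold.
        if len(ids) > threshold:
            # Step 3: build a distance-based index for this data node
            # (assumed here: one pivot plus entries sorted by distance to it).
            pivot = ids[0]
            entries = sorted(
                (math.dist(points[pivot], points[i]), i) for i in ids
            )
            index[key] = {"kind": "metric", "pivot": pivot, "entries": entries}
        else:
            index[key] = {"kind": "flat", "ids": ids}
    return index


points = {"a": (0.1, 0.1), "b": (0.2, 0.3), "c": (0.4, 0.4), "d": (1.5, 1.5)}
index = build_ah_tree_sketch(points, cell_size=1.0, threshold=2)
for cell, node in sorted(index.items()):
    print(cell, node["kind"])
```

Only the crowded cell pays for a distance-based structure, which is the point of using the feature space as a filter before any pairwise distances are computed.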

11
Affinity Hybrid Tree (AH-Tree)

Characteristics
Incorporation of affinity value
The affinity value between the image objects is
incorporated after the AH-Tree is built, during
queries, to prune the tree
Why after the tree is built and not during
construction?
The AH-Tree uses a metric distance function to
calculate the (dis)similarity among the image
objects in the distance-based index structure
A distance function is metric if it obeys the
laws of
(a) symmetry
(b) positivity
(c) triangle inequality

12
Affinity Hybrid Tree (AH-Tree)

Why after the tree is built and not during
construction? (cont.)
Incorporating the affinity value while building
the tree would preserve the triangle inequality
only if the affinity values between the image
objects, which are used to scale the similarity
measurement, were all equal, which is clearly
not possible.
Lemma 1: The affinity relationship cannot be
involved while constructing the metric tree, as
it no longer keeps the search space metric.
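
A small numeric check illustrates Lemma 1. The way affinity scales the distance is not specified in the slides; dividing the feature distance by the pairwise affinity is an assumption made only for this sketch, and the three images and their affinity values are hypothetical.

```python
from itertools import permutations


def is_metric(objects, d, tol=1e-12):
    """Check positivity, symmetry and the triangle inequality on a finite set."""
    for x, y in permutations(objects, 2):
        if d(x, y) <= 0 or abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in permutations(objects, 3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


# Three hypothetical images placed on a line in feature space.
position = {"a": 0.0, "b": 1.0, "c": 2.0}


def feature_distance(x, y):
    return abs(position[x] - position[y])


def scaled_distance(affinity):
    """Distance that divides the feature distance by the pair's affinity."""
    def d(x, y):
        return feature_distance(x, y) / affinity[frozenset((x, y))]
    return d


unequal = {frozenset("ab"): 1.0, frozenset("bc"): 1.0, frozenset("ac"): 0.5}
equal = {pair: 0.5 for pair in unequal}

print(is_metric(position, feature_distance))          # True
print(is_metric(position, scaled_distance(unequal)))  # False: d(a,c)=4 > 1+1
print(is_metric(position, scaled_distance(equal)))    # True
```

With unequal affinities the scaled distance from a to c exceeds the sum of the other two, so the triangle inequality fails; it survives only when every pair is scaled by the same value.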

13
Affinity Hybrid Tree (AH-Tree)

Characteristics
Promotion of affinity value
Since the affinity value is included in the tree
after construction, a technique is needed to
promote the affinity value from the leaves
through the intermediate parent nodes up to the
root
The affinity value is promoted as follows
1. During each query Q, the affinity value of
each leaf with respect to the query object is
derived
2. The affinity value of a leaf node, along with
those of its siblings, is used to calculate the
affinity value of their parent (intermediate
node)
3. Step 2 is repeated until the affinity value
of the root is determined
The affinity value is thus promoted and
distributed to each node of the tree, and is
used during retrieval for the similarity query Q

14
Affinity Hybrid Tree (AH-Tree)

Promotion of affinity value (cont.)
The promotion technique is described as follows
1. Let N_a and N_b be leaf nodes of the
distance-based index structure with image
objects O_a and O_b respectively, and let N_r
be their parent
2. Let aff_{a,q} and aff_{b,q} be the
pre-computed affinity values between the query
object and O_a and O_b, respectively
3. Thus, the affinity of the parent N_r is equal
to max(aff_{a,q}, aff_{b,q})

15
Affinity Hybrid Tree (AH-Tree)

Promotion of affinity value (cont.)
The defined promotion technique ensures two
important criteria
Avoiding false dismissal: promoting the maximum
value ensures that if any of the children
possesses an affinity value greater than the
required affinity, the parent is traversed and
included in the query path
Avoiding unnecessary traversal: the parent node
is discarded from the traversal path if none of
its children has an affinity value greater than
the required affinity
Thus, the affinity promotion technique has two
advantages
It enables us to embed high-level relationships
in the metric space, thus capturing the user's
concept better and giving better query results
It avoids unnecessary traversal, which further
reduces the computation overhead
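
The max-promotion rule and the pruning it enables can be sketched on a toy tree. The `Node` class, the function names and the affinity values are hypothetical; the comparison uses "greater than or equal", following the similarity-query criterion stated elsewhere in the deck.

```python
class Node:
    def __init__(self, obj=None, children=()):
        self.obj = obj                  # image object id (leaves only)
        self.children = list(children)  # empty for leaves
        self.aff = 0.0                  # affinity w.r.t. the current query


def promote(node, aff_to_query):
    """Set each node's affinity bottom-up: a parent gets the max of its children."""
    if node.children:
        node.aff = max(promote(child, aff_to_query) for child in node.children)
    else:
        node.aff = aff_to_query.get(node.obj, 0.0)
    return node.aff


def search(node, required, visited):
    """Return leaf objects with affinity >= required, skipping pruned subtrees."""
    if node.aff < required:
        return []  # no descendant can qualify, so the subtree is not traversed
    visited.append(node)
    if not node.children:
        return [node.obj]
    hits = []
    for child in node.children:
        hits.extend(search(child, required, visited))
    return hits


n1 = Node(children=[Node("a"), Node("b")])
n2 = Node(children=[Node("c"), Node("d")])
root = Node(children=[n1, n2])

promote(root, {"a": 0.9, "b": 0.2, "c": 0.1, "d": 0.3})
visited = []
hits = search(root, 0.5, visited)
print(hits, len(visited))
```

Because a parent carries the maximum of its children, a subtree is skipped only when no leaf below it can qualify (no false dismissal), and the second subtree here is never entered (no unnecessary traversal).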
16
Affinity Hybrid Tree (AH-Tree)

Similarity Queries
The query is represented as a collection of
features
Once the feature vector is obtained, the AH-Tree
is traversed from the root to the feature
subspaces
Once the appropriate feature subspaces are
obtained, the corresponding metric spaces are
merged
The affinity value is promoted from the leaves
to the root
The image objects returned are those whose
(i) distances to the queried object satisfy the
similarity measurement requirement, and
(ii) affinity values with the queried object are
greater than or equal to the supplied affinity
value

k-NN Search: returns the top k objects most
similar to a query image
17
Affinity Hybrid Tree (AH-Tree)

Range Queries
1. Both the search range and the search radius
are supplied with the query
2. Search the feature space to get the subspaces
overlapping with the query object or falling
within the specified range
3. Merge neighboring feature spaces to increase
the metric search space
4. Affinity promotion
5. The similarity of each routing object to the
query object is evaluated with respect to the
search radius. If it is satisfied, the routing
object is then evaluated with respect to the
supplied affinity value

18
Affinity Hybrid Tree (AH-Tree)

Range Queries (cont.)
6. If both are satisfied, the metric search is
iterated for the sub-tree of the routing object

Additional Characteristics
The search radius (range) is often difficult for
a naïve user to specify correctly. To avoid this
problem, a parallel result queue is maintained,
consisting of objects that satisfy the affinity
check even if the similarity check fails. The
queue is presented to the user if he/she is not
satisfied with the query result. This gives
higher priority to the high-level image
relationships than to the low-level
representations.
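
The parallel result queue can be sketched as follows. To keep the example short, a flat scan over the candidate objects stands in for the tree traversal of the merged metric space; the function name, the one-dimensional features and the affinity values are all hypothetical.

```python
def range_query(candidates, query, radius, min_aff, dist, aff):
    """Return (results, affinity_only).

    results        -- objects passing both the radius and the affinity check
    affinity_only  -- the parallel queue: objects passing only the affinity check
    """
    results, affinity_only = [], []
    for obj in candidates:
        if aff(obj, query) < min_aff:
            continue  # fails the affinity check: dropped from both lists
        if dist(obj, query) <= radius:
            results.append(obj)
        else:
            affinity_only.append(obj)
    return results, affinity_only


# Hypothetical one-dimensional features and affinities w.r.t. query "q".
feature = {"q": 0.0, "x": 0.5, "y": 3.0, "z": 0.2}
affinity_to_q = {"x": 0.8, "y": 0.9, "z": 0.1}

results, fallback = range_query(
    candidates=["x", "y", "z"],
    query="q",
    radius=1.0,
    min_aff=0.5,
    dist=lambda a, b: abs(feature[a] - feature[b]),
    aff=lambda a, b: affinity_to_q[a],
)
print(results, fallback)
```

Object y lies outside the radius but has high affinity, so it lands in the fallback queue that can be shown if the user is not satisfied with the main result.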

19
Affinity Hybrid Tree (AH-Tree)

k-NN Search
1. The feature space uses a branch-and-bound [8]
technique to perform the k-NN search
2. Performs an ordered depth-first search in the
feature space
3. Determines the k nearest subspaces of a given
query point
at each non-leaf node, metric bounds are
calculated between the query point and all its
MBRs and stored in an ordered list
the list is pruned depending on similarity
measures
on reaching data nodes, the nearest distance is
updated, and iteration continues until the k
nearest subspaces are obtained
4. The metric trees corresponding to each
feature space are combined in an ordered fashion
to refine the query result and increase the
metric search space

20
Affinity Hybrid Tree (AH-Tree)

k-NN Search (cont.)
5. The metric space search is the same as in the
range search method, except that both the search
radius and the affinity value become dynamic
here

The search radius and affinity value are made
dynamic by setting them to the distance and
affinity value, respectively, between Q and the
current kth nearest neighbor, and by storing all
non-leaf nodes satisfying the similarity
measurement in a priority queue
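
The dynamic search radius can be sketched as a best-first k-NN search over a small two-level metric tree. Only the radius side is shown; the dynamic affinity threshold is left out. The node layout (a `pivot` with a covering `radius`, and either `children` or `objects`) is an assumption loosely modeled on routing objects in an M-tree [5], not the authors' implementation.

```python
import heapq


def knn(root, query, k, dist):
    """Best-first k-NN: the search radius shrinks to the current kth distance."""
    best = []       # max-heap of (-distance, object), at most k entries
    counter = 0     # tie-breaker so heapq never compares node dicts
    queue = [(0.0, counter, root)]  # priority queue of (lower bound, tie, node)
    while queue:
        lower, _, node = heapq.heappop(queue)
        radius = -best[0][0] if len(best) == k else float("inf")
        if lower > radius:
            break  # every remaining node is farther than the kth neighbor
        if "objects" in node:
            for obj in node["objects"]:
                d = dist(obj, query)
                if len(best) < k:
                    heapq.heappush(best, (-d, obj))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, obj))
        else:
            for child in node["children"]:
                # Lower bound on the distance from the query to the subtree.
                bound = max(dist(child["pivot"], query) - child["radius"], 0.0)
                counter += 1
                heapq.heappush(queue, (bound, counter, child))
    return sorted((-neg_d, obj) for neg_d, obj in best)


# Hypothetical one-dimensional data in two leaf nodes.
leaf1 = {"pivot": 2.0, "radius": 1.0, "objects": [1.0, 2.0, 3.0]}
leaf2 = {"pivot": 11.0, "radius": 1.0, "objects": [10.0, 11.0, 12.0]}
tree = {"pivot": 6.5, "radius": 5.5, "children": [leaf1, leaf2]}

neighbors = knn(tree, query=2.4, k=2, dist=lambda a, b: abs(a - b))
print([obj for _, obj in neighbors])
```

Once two neighbors are found in the first leaf, the radius drops to the distance of the second one, and the far leaf is discarded from the priority queue without any of its objects being compared.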
21
Experimental Analysis

Experimental Setup
The application was built using C in a Linux
environment
A node size of 4 KB was used
The image database has 10,000 images in 72
semantic categories
The feature matrix was developed from the color
information of each image in HSV color space
A 10,000 x 10,000 affinity matrix was
pre-computed from the training set and used to
capture user perception
Computation Overhead
Computation overhead is expressed with the
following
a) I/O cost
b) CPU cost (number of distance computations)

The AH-Tree is compared with the M-Tree, which
has the potential to introduce the high-level
image relationship into its index structure
The AH-Tree is not compared with any space-based
index structures, since they are incapable of
embedding the high-level relationship as it is
into their index structures
22
Experimental Analysis

I/O Cost
The space filtering mechanism reduces the number
of image objects in the metric space, which
affects the I/O cost
Tree Construction
Range Query

23
Experimental Analysis

I/O Cost
k-NN Search

24
Experimental Analysis

CPU Cost
The space filtering mechanism reduces the number
of image objects in the metric space, which
reduces the number of similarity computations,
thus reducing CPU cost drastically
Tree Construction
Range Query

25
Experimental Analysis

CPU Cost
k-NN Search

26
Experimental Analysis

Accuracy
Accuracy is defined as the percentage of the
retrieved images that are semantically related
to the query image

Such a stark difference in accuracy is clearly
due to the introduction of the high-level
relationship in the AH-Tree, which captures the
user's concept of similarity better than relying
on the distance measurement alone
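
The accuracy measure can be written out as a short function. Treating "semantically related" as "belongs to the same semantic category as the query image" is an assumption made for this sketch; the image ids and category labels are hypothetical.

```python
def accuracy(retrieved, query, category):
    """Percentage of retrieved images in the same category as the query."""
    if not retrieved:
        return 0.0
    related = sum(1 for img in retrieved if category[img] == category[query])
    return 100.0 * related / len(retrieved)


category = {"q": "beach", "i1": "beach", "i2": "beach", "i3": "city", "i4": "beach"}
print(accuracy(["i1", "i2", "i3", "i4"], "q", category))  # 75.0
```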
27
Conclusion

The AH-Tree clearly outperforms existing
multi-dimensional indexing structures in terms
of computation overhead and accuracy of query
results in the content-based image retrieval
paradigm
The feature space filtering technique
contributes greatly to these improvements
Using the metric space to introduce the
high-level image relationship as it is helps to
produce such high-accuracy retrieval results
The AH-Tree is flexible enough to introduce any
kind of high-level image relationship
The AH-Tree, to the best of our knowledge, is
the first attempt at combining feature space and
metric space into a hybrid structure capable of
achieving the two major goals of an index
structure supporting multimedia objects
Efficient query retrieval with reduced CPU and
I/O cost
Relevant query results with satisfactory
accuracy

28
Future Work

Introducing query refinement mechanisms
Integrating some data mining techniques to
calculate affinity relationships on the fly
Developing a unified seamless index structure
supporting all kinds of multimedia objects and
retrieval

29
Questions
30
References

[1] S. Berchtold, D. A. Keim, and H. Kriegel. The
X-tree: An index structure for high-dimensional
data. In Proceedings of the 22nd International
Conference on Very Large Databases, pages 28-39,
Bombay, India, September 1996.
[2] K. Chakrabarti. Hybrid tree code.
http://www.ics.uci.edu/~kaushik/research/htree.html,
2005.
[3] K. Chakrabarti and S. Mehrotra. The hybrid
tree: An index structure for high-dimensional
feature spaces. In Proceedings of the IEEE
International Conference on Data Engineering,
pages 440-447, Sydney, Australia, March 1999.
[4] K. Chakrabarti, K. Porkaew, M. Ortega, and S.
Mehrotra. Evaluating refined queries in top-k
retrieval systems. IEEE Transactions on Knowledge
and Data Engineering (TKDE), 16(2):256-270,
February 2004.
[5] P. Ciaccia, M. Patella, and P. Zezula.
M-tree: An efficient access method for similarity
search in metric spaces. In Proceedings of the
23rd VLDB International Conference, pages
426-435, Athens, Greece, August 1997.
[6] R. Fagin. Fuzzy queries in multimedia
database systems. In PODS '98: Proceedings of the
Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems, pages 1-10,
Seattle, Washington, United States, June 1998.
[7] D. Greene. An implementation and performance
analysis of spatial data access methods. In
Proceedings of ICDE, pages 606-615, Los Angeles,
California, United States, February 1989.
[8] A. Guttman. R-trees: A dynamic index
structure for spatial searching. In Proceedings
of the 1984 ACM SIGMOD International Conference
on Management of Data, pages 47-57, Boston,
Massachusetts, United States, June 1984.
[9] R. Krishnapuram, S. Medasani, J. Hwan, C. Y.
Sik, and R. Balasubramaniam. Content based image
retrieval based on fuzzy approach. IEEE
Transactions on Knowledge and Data Engineering
(TKDE), 16(10):1185-1199, 2004.
[10] D. B. Lomet and B. Salzberg. The hB-tree: A
multiattribute indexing method with good
guaranteed performance. ACM Transactions on
Database Systems, 15(4):625-658, 1990.
[11] A. Motro. VAGUE: A user interface to
relational databases that permits vague queries.
ACM Transactions on Office Information Systems,
6(3):187-214, 1988.

31
References

[12] M. Patella. M-tree code.
http://www-db.deis.unibo.it/Mtree, 2005.
[13] J. Robinson. The k-d-b-tree: A search
structure for large multidimensional dynamic
indexes. In Proceedings of the 1981 ACM SIGMOD
International Conference on Management of Data,
pages 10-18, Ann Arbor, Michigan, United States,
April 1981.
[14] N. Roussopoulos, S. Kelley, and F. Vincent.
Nearest neighbor queries. In Proceedings of the
1995 ACM SIGMOD International Conference on
Management of Data, pages 71-79, San Jose,
California, United States, May 1995.
[15] Y. Rui, T. Huang, and S. Mehrotra.
Content-based image retrieval with relevance
feedback in MARS. In Proceedings of International
Conference on Image Processing, pages 815-818,
Santa Barbara, California, United States,
October 1997.
[16] M.-L. Shyu, S.-C. Chen, M. Chen, C. Zhang,
and C.-M. Shu. MMM: A stochastic mechanism for
image database queries. In Proceedings of the
IEEE Fifth International Symposium on Multimedia
Software Engineering (MSE2003), pages 188-195,
Taichung, Taiwan, ROC, December 2003.
[17] D. A. White and R. Jain. Similarity indexing
with the SS-tree. In Proceedings of the 12th
International Conference on Data Engineering,
pages 516-523, New Orleans, LA, United States,
February 1996.
[18] P. N. Yianilos. Data structures and
algorithms for nearest neighbor search in general
metric spaces. In Proceedings of the 3rd Annual
ACM-SIAM Symposium on Discrete Algorithms, pages
311-321, Philadelphia, PA, United States, January
1993.
[19] P. Zezula, P. Ciaccia, and F. Rabitti.
M-tree: A dynamic index for similarity queries in
multimedia databases. Technical Report 7,
HERMES ESPRIT LTR Projects, 1996.