Title: Affinity Hybrid Tree: An Indexing Technique for Content-Based Image Retrieval in Multimedia Databases
1Affinity Hybrid Tree: An Indexing Technique for
Content-Based Image Retrieval in Multimedia
Databases
- Kasturi Chatterjee, Shu-Ching Chen
- Distributed Multimedia Information System Laboratory
- School of Computing and Information Sciences
- Florida International University, Miami, FL 33199, USA
2Outline
- Motivation
  - Need for indexing in multimedia databases
  - Need for high-level image relationships
  - Need for embedding high-level relationships in the index structure
- Literature Review
  - Multidimensional index structures
  - Indexing mechanisms supporting CBIR and RF
  - Affinity relationships
- Affinity Hybrid Tree (AH-Tree)
  - Proposed structure
  - Characteristics
  - Similarity queries
- Experimental Analysis
- Conclusion and Future Work
3Motivation
- Need for indexing in multimedia databases
  - The popularity of multimedia presentation and storage emphasizes the requirement for efficient multimedia storage and retrieval.
  - Indexing is an integral part of designing a database system: it reduces computation overhead and optimizes retrieval.
  - Multimedia data (e.g., images) are represented differently from traditional data, generally in the form of multi-dimensional feature vectors.
  - Index structures should handle high dimensionality efficiently, because the higher the dimension, the better the multimedia representation and the more satisfactory the retrieval results.
- Thus, we need a specialized index structure and retrieval mechanism, different from traditional indexing, to handle the above concerns.
4Motivation
- Need for high-level image relationships
  - Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) is a popular image retrieval mechanism.
  - CBIR incorporates high-level image relationships in the retrieval method with the help of RF, to capture the user's concept of similarity.
  - The more accurately the user's similarity concept is interpreted, the more relevant the query results.
5Motivation
- Need for embedding high-level relationships in the index structure
  - An index structure is required to aid a retrieval mechanism in terms of computational efficiency, while the high-level image relationship improves the quality and relevance of the retrieval result.
- Hence, efficient multimedia storage that supports a retrieval mechanism such as CBIR needs an index structure that supports high-level image relationships.
6Literature Review
- Multi-dimensional index structures
  - Feature based: project an image as a feature vector in a feature space and index the space, e.g., KDB-tree and R-tree
  - Distance based: built on the distances or similarities between two data objects, e.g., M-tree, vp-tree
- Each category can be further subdivided into
  - Data partitioned (DP): the index structure consists of bounding regions (BRs) arranged in a (spatial) containment hierarchy, e.g., R-tree, X-tree
  - Space partitioned: the index structure consists of space recursively partitioned into mutually disjoint subspaces, e.g., KDB-tree, vp-tree
7Literature Review
- Indexing mechanisms supporting CBIR and RF
  - None of the discussed index mechanisms captures the high-level relationship as it is, without attempting to translate it into its low-level equivalent [4, 6, 11].
  - Capturing the user's similarity concept in the form of feature-level closeness is often error-prone and/or imposes a heavy burden on the end user (which is not desirable).
- Affinity relationships
  - A way to capture the user's similarity measure in a CBIR paradigm.
  - A parameter of the Markov Model Mediator proposed in [16], whose main idea is that the more frequently two images are accessed together, the more related they are and the higher their affinity value.
  - The relative affinity measurement between two images m and n is calculated from the following quantities:
    - use_{m,k}: usage pattern of image m w.r.t. query image q_k per time period
    - access_k: access frequency of query q_k per time period
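The affinity formula itself did not survive on this slide. As a minimal sketch, assuming the form commonly given for the Markov Model Mediator, aff_{m,n} = sum over k of use_{m,k} * use_{n,k} * access_k, the computation could look as follows. The function name `relative_affinity` and the toy access log are hypothetical, and the formula should be checked against [16] before relying on it.

```python
from typing import Sequence


def relative_affinity(use: Sequence[Sequence[int]],
                      access: Sequence[float],
                      m: int, n: int) -> float:
    """Assumed form: aff[m][n] = sum_k use[m][k] * use[n][k] * access[k]."""
    return sum(use[m][k] * use[n][k] * access[k] for k in range(len(access)))


# Hypothetical log: 3 images, 2 queries.
# use[i][k] is 1 if image i was accessed for query k, else 0.
use = [
    [1, 1],  # image 0 was accessed for both queries
    [1, 0],  # image 1 was accessed for query 0 only
    [0, 1],  # image 2 was accessed for query 1 only
]
access = [3, 2]  # query 0 was issued 3 times, query 1 twice

aff_01 = relative_affinity(use, access, 0, 1)  # co-accessed under query 0 -> 3
aff_02 = relative_affinity(use, access, 0, 2)  # co-accessed under query 1 -> 2
aff_12 = relative_affinity(use, access, 1, 2)  # never co-accessed -> 0
```

Images that are never accessed for the same query get an affinity of zero, which matches the stated idea that affinity grows with how often two images are accessed together.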
8Literature Review
- Limitations of existing works
  - None of the existing multidimensional indexing structures can incorporate the high-level image relationship efficiently in its framework.
  - Feature-based index structures cannot incorporate the high-level image relationship, because Spatial Access Methods require the distances between objects to be strictly related to the object position in a low-dimensional vector space (e.g., [13]).
    - Thus, image object similarity needs to be translated along different dimensions, which becomes problematic.
  - Distance-based index structures can incorporate the high-level image relationship as it is, but not efficiently, since the number of object-pair distance calculations is huge.
    - Thus, it negates the very essence of indexing, which aims at keeping the computation overhead as low as possible.
9Affinity Hybrid Tree (AH-Tree)
- Proposed structure
  - Solves the two limitations discussed by combining feature-based and distance-based index mechanisms into one hybrid structure.
    - The feature-based index mechanism filters the feature space and reduces the number of distance computations to be performed. Effect: reduced computational overhead.
    - The distance-based index mechanism incorporates the high-level image relationship as it is, without translating it into its low-level equivalent. Effect: increased relevance of the retrieved images, because the user concept is captured as it is.
10Affinity Hybrid Tree (AH-Tree)
- Building the AH-Tree
  - 1. Build the space index by feeding it the data points.
  - 2. For each data node of the space index, check whether the number of data points is greater than a threshold.
  - 3. If yes, merge the data nodes and build a distance-based index for each data node. Each data node then holds the root of the corresponding distance-based index tree.
11Affinity Hybrid Tree (AH-Tree)
- Characteristics
  - Incorporation of affinity value: the affinity value between the image objects is incorporated after the AH-Tree is built, during the queries, for pruning the tree.
  - Why after the tree is built and not during construction?
    - The AH-Tree uses a metric distance function to calculate the (dis)similarity among the image objects in the distance-based index structure.
    - A distance function is metric if it satisfies
      - (a) symmetry
      - (b) positivity
      - (c) the triangular inequality
12Affinity Hybrid Tree (AH-Tree)
- Why after the tree is built and not during construction? (cont.)
  - To satisfy the triangular inequality property, the affinity value could not be incorporated in the tree during building: doing so would require the affinity values between the image objects, which are used to scale the similarity measurement, to be equal, which is clearly not possible.
- Lemma 1: The affinity relationship cannot be involved while constructing the metric tree, because the search space would no longer be metric.
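A small numeric check can make the lemma concrete. The sketch below assumes, purely for illustration, that affinity would scale a distance by division (higher affinity means smaller distance); the slides do not state the exact scaling. The helper `is_metric`, the three points, and the affinity values are all hypothetical.

```python
from itertools import permutations


def is_metric(d, n, tol=1e-12):
    """Check symmetry, positivity and the triangular inequality on objects 0..n-1."""
    for i in range(n):
        if abs(d(i, i)) > tol:
            return False
        for j in range(n):
            if i != j and d(i, j) <= 0:
                return False              # positivity violated
            if abs(d(i, j) - d(j, i)) > tol:
                return False              # symmetry violated
    for i, j, k in permutations(range(n), 3):
        if d(i, k) > d(i, j) + d(j, k) + tol:
            return False                  # triangular inequality violated
    return True


points = [0.0, 1.0, 2.0]                  # three objects on a line


def euclid(i, j):
    return abs(points[i] - points[j])


def make_scaled(aff):
    """Distance divided by a per-pair affinity (assumed scaling, for illustration)."""
    def scaled(i, j):
        return 0.0 if i == j else euclid(i, j) / aff[frozenset((i, j))]
    return scaled


unequal = {frozenset((0, 1)): 4.0, frozenset((1, 2)): 4.0, frozenset((0, 2)): 1.0}
equal = {pair: 4.0 for pair in unequal}

plain_ok = is_metric(euclid, 3)                  # True
unequal_ok = is_metric(make_scaled(unequal), 3)  # False: d(0,2)=2 > 0.25 + 0.25
equal_ok = is_metric(make_scaled(equal), 3)      # True: uniform scaling keeps the metric
```

With unequal affinities the scaled distance from object 0 to object 2 exceeds the detour through object 1, so the triangular inequality fails; only when all pairwise affinities are equal does the scaled function stay metric.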
13Affinity Hybrid Tree (AH-Tree)
- Characteristics
  - Promotion of affinity value
    - Since the affinity value is included in the tree after construction, a technique is needed to promote the affinity value from the leaves through the intermediate parent nodes up to the root.
    - The affinity value is promoted as follows:
      - 1. During each query Q, the affinity value of the leaves with respect to the query object is derived.
      - 2. The affinity value of a leaf node, along with those of its siblings, is used to calculate the affinity value of their parent (an intermediate node).
      - 3. Step 2 is repeated until the affinity value of the root is determined.
    - The affinity value is thus promoted and distributed to each node of the tree, and is used during retrieval for the similarity query Q.
14Affinity Hybrid Tree (AH-Tree)
- Promotion of affinity value (cont.)
  - The promotion technique is illustrated as follows:
    - 1. Let N_a and N_b be the leaf nodes of the distance-based index structure with image objects O_a and O_b respectively, and let N_r be their parent.
    - 2. Let aff_{a,q} and aff_{b,q} be the pre-computed affinity values between the query object q and objects a and b, respectively.
    - 3. Thus, the affinity of the parent N_r is equal to max(aff_{a,q}, aff_{b,q}).
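The bottom-up promotion can be sketched as a short recursive pass. This is an illustrative sketch, not the authors' implementation: the `Node` class, the `promote` function, and the affinity numbers are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    obj: Optional[str] = None                 # image id for a leaf, None otherwise
    children: List["Node"] = field(default_factory=list)
    aff: float = 0.0                          # affinity w.r.t. the current query


def promote(node: Node, aff_to_query: Dict[str, float]) -> float:
    """Set node.aff bottom-up: leaves take their own value, parents the max of children."""
    if not node.children:
        node.aff = aff_to_query.get(node.obj, 0.0)
    else:
        node.aff = max(promote(child, aff_to_query) for child in node.children)
    return node.aff


# Leaves N_a and N_b under parent N_r, with hypothetical affinities to the query.
n_a, n_b = Node(obj="a"), Node(obj="b")
n_r = Node(children=[n_a, n_b])
root_aff = promote(n_r, {"a": 0.2, "b": 0.7})  # -> 0.7
```

Because the pass is recomputed from the per-query affinity table, the tree built on the metric distance is left untouched between queries.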
15Affinity Hybrid Tree (AH-Tree)
- Promotion of affinity value (cont.): the defined promotion technique ensures two important criteria
  - Avoiding false dismissal: promoting the maximum value ensures that if any of the children possesses an affinity value > the required affinity, the parent is traversed and included in the query path.
  - Avoiding unnecessary traversal: the parent node is discarded from the traversal path if none of its children has an affinity value > the required affinity.
- Thus, the affinity promotion technique has two advantages
  - It enables us to embed high-level relationships in the metric space, thus capturing the user's concept better and giving better query results.
  - By avoiding unnecessary traversal, it further reduces the computation overhead.
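Both criteria can be checked on a toy tree. In the sketch below the tree is a nested dict, the names (`promote`, `search`, the node labels) are hypothetical, and the test `>=` follows the similarity query definition ("greater than or equal to the supplied affinity value"); it is a minimal illustration under those assumptions.

```python
def promote(node, aff):
    """Store in node['aff'] the maximum affinity found in the subtree below node."""
    if "children" not in node:
        node["aff"] = aff.get(node["obj"], 0.0)
    else:
        node["aff"] = max(promote(child, aff) for child in node["children"])
    return node["aff"]


def search(node, required, visited):
    """Return leaf objects with affinity >= required, skipping hopeless subtrees."""
    visited.append(node["name"])
    if "children" not in node:
        return [node["obj"]] if node["aff"] >= required else []
    hits = []
    for child in node["children"]:
        if child["aff"] >= required:      # promoted max: something below qualifies
            hits += search(child, required, visited)
    return hits


tree = {"name": "root", "children": [
    {"name": "left", "children": [
        {"name": "Na", "obj": "a"}, {"name": "Nb", "obj": "b"}]},
    {"name": "right", "children": [
        {"name": "Nc", "obj": "c"}, {"name": "Nd", "obj": "d"}]},
]}
aff = {"a": 0.1, "b": 0.8, "c": 0.2, "d": 0.3}   # hypothetical affinities to the query

promote(tree, aff)
visited = []
hits = search(tree, 0.5, visited)
brute_force = [obj for obj, value in aff.items() if value >= 0.5]
# hits == ["b"]; visited == ["root", "left", "Nb"]
```

The `right` subtree is never entered because its promoted maximum (0.3) is below the threshold, while `b` is still found because its value was promoted through `left` to the root.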
16Affinity Hybrid Tree (AH-Tree)
- Similarity queries
  - The query is represented as a collection of features.
  - Once the feature vector is obtained, the AH-Tree is traversed from the root to the feature subspaces. Once the appropriate feature subspaces are obtained, the corresponding metric spaces are merged and the affinity value is promoted from the leaves to the root.
  - The image objects returned are those whose
    - (i) distances to the queried object satisfy the similarity measurement requirement, and
    - (ii) affinity values with the queried object are greater than or equal to the supplied affinity value.
  - k-NN search: returns the top k objects most similar to a query image.
17Affinity Hybrid Tree (AH-Tree)
- Range queries
  - 1. Both the search range and the search radius are supplied with the query.
  - 2. Search the feature space to get the subspaces overlapping with the query object or falling within the specified range.
  - 3. Merge neighboring feature spaces to increase the metric search space.
  - 4. Affinity promotion.
  - 5. Similarities of router objects are evaluated against the query object with respect to the search radius. If that check is satisfied, they are evaluated with respect to the supplied affinity value.
18Affinity Hybrid Tree (AH-Tree)
- Range queries (cont.)
  - 6. If both are satisfied, the metric search is iterated for the sub-tree of the routing object.
- Additional characteristics
  - The search radius (range) is often difficult for a naïve user to specify correctly. To avoid this problem, a parallel result queue is maintained, which consists of objects satisfying only the affinity check even if the similarity check fails. The queue is presented to the user if the user is not satisfied with the query result. This gives a higher priority to the high-level image relationships than to the low-level representations.
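The parallel result queue can be illustrated at the leaf level. This sketch deliberately omits the tree traversal and only shows how candidate objects might be split between the main result and the fallback queue; the function `range_query`, its parameters, and the toy data are assumptions, with Euclidean distance standing in for the similarity measure.

```python
import math


def range_query(query, objects, aff_to_query, radius, min_aff):
    """Split candidates into (results, fallback).

    results:  objects passing both the radius check and the affinity check
    fallback: objects passing only the affinity check (the parallel queue)
    """
    results, fallback = [], []
    for obj_id, vec in objects.items():
        if aff_to_query.get(obj_id, 0.0) < min_aff:
            continue                          # fails the affinity check: dropped
        if math.dist(query, vec) <= radius:
            results.append(obj_id)            # passes both checks
        else:
            fallback.append(obj_id)           # kept in case the radius was too tight
    return results, fallback


objects = {"a": (0.1, 0.0), "b": (5.0, 5.0), "c": (0.2, 0.1)}   # hypothetical features
aff_to_query = {"a": 0.9, "b": 0.8, "c": 0.1}                   # hypothetical affinities

results, fallback = range_query((0.0, 0.0), objects, aff_to_query,
                                radius=1.0, min_aff=0.5)
# results == ["a"], fallback == ["b"]; "c" is near but has low affinity
```

Object `b` is far away in feature space but strongly related by affinity, so it is held back for the user instead of being discarded, which is the sense in which affinity takes priority over the low-level representation.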
19Affinity Hybrid Tree (AH-Tree)
- k-NN search
  - 1. The feature space uses a branch-and-bound [8] technique to perform the k-NN search.
  - 2. It performs an ordered depth-first search in the feature space.
  - 3. It determines the k nearest subspaces of a given query point:
    - at each non-leaf node, metric bounds are calculated between the query point and all its MBRs and stored in an ordered list
    - the list is pruned depending on similarity measures
    - on reaching data nodes, the nearest distance is updated, and the iteration continues until the k nearest subspaces are obtained
  - 4. The metric trees corresponding to each feature space are combined in an ordered fashion to refine the query result and increase the metric search space.
20Affinity Hybrid Tree (AH-Tree)
- k-NN search (cont.)
  - 5. The metric space search is the same as in the range search method, except that both the search radius and the affinity value become dynamic here.
    - The search radius and affinity value are made dynamic by setting them to the distance and the affinity value, respectively, between Q and the current kth nearest neighbor, and by storing all non-leaf nodes that satisfy the similarity measurement in a priority queue.
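The dynamic search radius can be sketched with a priority queue over routing entries. The sketch below covers only the radius; the dynamic affinity threshold is left out because the slides do not say how it is combined with the radius. The one-level cluster layout (router vector, covering radius, members), the function `knn`, and the data are hypothetical, and Euclidean distance stands in for the metric.

```python
import heapq
import math


def knn(query, clusters, k):
    """clusters: list of (router_vec, covering_radius, [(obj_id, vec), ...])."""
    # Priority queue of clusters keyed by the smallest distance a member could have.
    pending = []
    for idx, (router, cover, _members) in enumerate(clusters):
        lower_bound = max(0.0, math.dist(query, router) - cover)
        heapq.heappush(pending, (lower_bound, idx))

    best = []             # max-heap through negated distances: (-dist, obj_id)
    radius = math.inf     # dynamic radius: distance to the current kth neighbor
    computed = 0          # number of object distance computations
    while pending:
        lower_bound, idx = heapq.heappop(pending)
        if lower_bound > radius:
            break         # no remaining cluster can beat the current kth neighbor
        for obj_id, vec in clusters[idx][2]:
            dist = math.dist(query, vec)
            computed += 1
            if len(best) < k:
                heapq.heappush(best, (-dist, obj_id))
            elif dist < -best[0][0]:
                heapq.heapreplace(best, (-dist, obj_id))
            if len(best) == k:
                radius = -best[0][0]          # shrink the radius as neighbors improve
    return sorted((-neg, obj_id) for neg, obj_id in best), computed


clusters = [
    ((0.0, 0.0), 1.0, [("a1", (0.5, 0.0)), ("a2", (0.0, 1.0))]),
    ((10.0, 10.0), 1.0, [("b1", (10.0, 10.0)), ("b2", (10.0, 11.0))]),
]
neighbors, computed = knn((0.0, 0.0), clusters, k=2)
# neighbors == [(0.5, "a1"), (1.0, "a2")], computed == 2
```

After the first cluster is scanned the radius drops to 1.0, so the second cluster (lower bound about 13.1) is never opened and only two object distances are computed.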
21Experimental Analysis
- Experimental setup
  - The application was built using C in a Linux environment.
  - A node size of 4 KB was used.
  - The image database has 10,000 images in 72 semantic categories.
  - The feature matrix was developed from the color information of each image in HSV color space.
  - A 10,000 x 10,000 affinity matrix was pre-computed from the training set and used to capture user perception.
- Computation overhead is expressed in terms of
  - a) I/O cost
  - b) CPU cost (number of distance computations)
- The AH-Tree is compared with the M-Tree, which has the potential of introducing the high-level image relationship in its index structure.
- The AH-Tree is not compared with any space-based index structures, since they are incapable of embedding the high-level relationship as it is into their index structures.
22Experimental Analysis
- I/O cost: the space filtering mechanism reduces the number of image objects in the metric space, which affects the I/O cost
  - Tree construction
  - Range query
23Experimental Analysis
24Experimental Analysis
- CPU cost: the space filtering mechanism reduces the number of image objects in the metric space, which reduces the number of similarity computations and thus the CPU cost
  - Tree construction
  - Range query
25Experimental Analysis
26Experimental Analysis
- Accuracy
  - Accuracy is defined as the percentage of the retrieved images that are semantically related to the query image.
  - The difference in accuracy is attributed to the introduction of the high-level relationship in the AH-Tree, which captures the user's concept of similarity better than relying on the distance measurement alone.
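The accuracy measure is a simple ratio. In the sketch below, "semantically related" is modeled as membership in a set of relevant image ids, which is an assumption made for illustration; the function name and the sample ids are hypothetical.

```python
def accuracy(retrieved, relevant):
    """Percentage of retrieved images that are semantically related to the query."""
    if not retrieved:
        return 0.0
    related = sum(1 for image_id in retrieved if image_id in relevant)
    return 100.0 * related / len(retrieved)


retrieved = ["img1", "img2", "img3", "img4"]     # hypothetical query result
relevant = {"img1", "img2", "img4", "img9"}      # hypothetical ground truth
score = accuracy(retrieved, relevant)            # 3 of 4 retrieved are related -> 75.0
```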
27Conclusion
- The AH-Tree outperforms the existing multi-dimensional indexing structure it was compared against (M-Tree) in computation overhead and in the accuracy of query results for content-based image retrieval.
  - The feature space filtering technique accounts for much of the reduction in computation overhead.
  - Using the metric space to introduce the high-level image relationship as it is helps to produce the high-accuracy retrieval results.
- The AH-Tree is flexible enough to introduce any kind of high-level image relationship.
- The AH-Tree, to the best of our knowledge, is the first attempt at combining feature space and metric space into a hybrid structure capable of meeting the two major goals of an index structure supporting multimedia objects:
  - efficient query retrieval with reduced CPU and I/O cost
  - relevant query results with satisfactory accuracy
28Future Work
- Introducing query refinement mechanisms
- Integrating some data mining techniques to calculate affinity relationships on the fly
- Developing a unified, seamless index structure supporting all kinds of multimedia objects and retrieval
29Questions
30References
- [1] S. Berchtold, D. A. Keim, and H. Kriegel. The X-tree: An index structure for high dimensional data. In Proceedings of the 22nd International Conference on Very Large Databases, pages 28-39, Bombay, India, September 1996.
- [2] K. Chakrabarti. Hybrid tree code. http://www.ics.uci.edu/~kaushik/research/htree.html, 2005.
- [3] K. Chakrabarti and S. Mehrotra. The hybrid tree: An index structure for high-dimensional feature spaces. In Proceedings of the IEEE International Conference on Data Engineering, pages 440-447, Sydney, Australia, March 1999.
- [4] K. Chakrabarti, K. Porkaew, M. Ortega, and S. Mehrotra. Evaluating refined queries in top-k retrieval systems. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(2):256-270, February 2004.
- [5] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd VLDB International Conference, pages 426-435, Athens, Greece, August 1997.
- [6] R. Fagin. Fuzzy queries in multimedia database systems. In PODS '98: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 1-10, Seattle, Washington, United States, June 1998.
- [7] D. Greene. An implementation and performance analysis of spatial data access methods. In Proceedings of ICDE, pages 606-615, Los Angeles, California, United States, February 1989.
- [8] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pages 47-57, Boston, Massachusetts, United States, June 1984.
- [9] R. Krishnapuram, S. Medasani, J. Hwan, C. Y. Sik, and R. Balasubramaniam. Content based image retrieval based on fuzzy approach. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(10):1185-1199, 2004.
- [10] D. B. Lomet and B. Salzberg. The hB-tree: A multiattribute indexing method with good guaranteed performance. ACM Transactions on Database Systems, 15(4):625-658, 1990.
- [11] A. Motro. VAGUE: A user interface to relational databases that permits vague queries. ACM Transactions on Office Information Systems, 6(3):187-214, 1988.
31References
- [12] M. Patella. M-tree code. http://www-db.deis.unibo.it/Mtree, 2005.
- [13] J. Robinson. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, pages 10-18, Ann Arbor, Michigan, United States, April 1981.
- [14] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 71-79, San Jose, California, United States, May 1995.
- [15] Y. Rui, T. Huang, and S. Mehrotra. Content-based image retrieval with relevance feedback in MARS. In Proceedings of the International Conference on Image Processing, pages 815-818, Santa Barbara, California, United States, October 1997.
- [16] M.-L. Shyu, S.-C. Chen, M. Chen, C. Zhang, and C.-M. Shu. MMM: A stochastic mechanism for image database queries. In Proceedings of the IEEE Fifth International Symposium on Multimedia Software Engineering (MSE2003), pages 188-195, Taichung, Taiwan, ROC, December 2003.
- [17] D. A. White and R. Jain. Similarity indexing with the SS-tree. In Proceedings of the 12th International Conference on Data Engineering, pages 516-523, New Orleans, LA, United States, February 1996.
- [18] P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, Philadelphia, PA, United States, January 1993.
- [19] P. Zezula, P. Ciaccia, and F. Rabitti. M-tree: A dynamic index for similarity queries in multimedia databases. Technical Report 7, HERMES ESPRIT LTR Project, 1996.