CIS750 - PowerPoint PPT Presentation

About This Presentation
Title:

CIS750

Slides: 58
Provided by: Vas111
Learn more at: https://cis.temple.edu



1
CIS750 Seminar in Advanced Topics in Computer
Science: Advanced topics in databases
Multimedia Databases
  • V. Megalooikonomou
  • Generic Multimedia Indexing
  • (some slides are based on notes by C. Faloutsos)

2
General Overview
  • Multimedia Indexing
  • Spatial Access Methods (SAMs)
  • k-d trees
  • Point Quadtrees
  • MX-Quadtree
  • z-ordering
  • R-trees
  • Generic Multimedia Indexing

3
Multimedia Indexing: Detailed outline
  • Generic Multimedia Indexing
  • Problem definition
  • Distance function
  • Similarity queries: types
  • Requirements (ideal method)
  • Basic idea, lower bounding
  • GEMINI approach
  • Applications
  • 1-D Time sequences
  • 2-D Color images

4
Generic Multimedia Indexing - problem
  • Given a database of multimedia objects
  • Design fast search algorithms that locate objects
    that match a query object, exactly or
    approximately
  • Objects
  • 1-d time sequences
  • Digitized voice or music
  • 2-d color images
  • 2-d or 3-d gray scale medical images
  • Video clips
  • E.g., find companies whose stock prices move
    similarly

6
Generic Multimedia Indexing - problem
  • 1st step: provide a measure of the distance
    between two objects
  • Distance function D()
  • Given two objects OA, OB, the distance
    (dis-similarity) of the two objects is denoted
    by
  • D(OA, OB)
  • E.g., the Euclidean distance (square root of the sum of
    squared differences) of two equal-length time series
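A minimal sketch of this distance function, assuming Python (our choice, since the slides contain no code) and time series given as equal-length lists of numbers; the name `euclidean` is hypothetical:

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """D(x, y): square root of the sum of squared differences."""
    if len(x) != len(y):
        raise ValueError("time series must have equal length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Two hypothetical 4-day price series that differ only on the last day
print(euclidean([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))  # 2.0
```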

8
Types of Similarity Queries
  • Similarity queries are classified into
  • Whole match queries
  • Given a collection of N objects O1, ..., ON and a
    query object Q, find the data objects that are within
    distance ε from Q
  • Sub-pattern match
  • Given a collection of N objects O1, ..., ON, a
    query (sub-)object Q, and a tolerance ε, identify
    the parts of the data objects that match the
    query Q
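Both query types can be answered by sequential scanning, the baseline the rest of the lecture tries to beat. The sketch below assumes Euclidean distance and, for sub-pattern match, that the "parts" are contiguous windows of the same length as Q; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def whole_match(db: List[Sequence[float]], q: Sequence[float],
                eps: float) -> List[int]:
    """Indices i of all objects O_i with D(O_i, Q) <= eps (sequential scan)."""
    return [i for i, o in enumerate(db) if euclidean(o, q) <= eps]


def subpattern_match(db: List[Sequence[float]], q: Sequence[float],
                     eps: float) -> List[Tuple[int, int]]:
    """(object index, offset) of every window of length len(Q) within eps of Q."""
    m = len(q)
    hits = []
    for i, o in enumerate(db):
        for start in range(len(o) - m + 1):
            if euclidean(o[start:start + m], q) <= eps:
                hits.append((i, start))
    return hits


db = [[1, 2, 3, 4], [1, 2, 3, 6], [9, 9, 9, 9]]
print(whole_match(db, [1, 2, 3, 4], 2.0))   # [0, 1]
print(subpattern_match(db, [2, 3], 0.0))    # [(0, 1), (1, 1)]
```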

9
Types of Similarity Queries
[Figure: sequences S1, ..., Sn (days 1 to 365) mapped by F to points F(S1), ..., F(Sn) in a 2-d feature space with axes avg and std]

12
Types of Similarity Queries
  • Additional types of queries
  • K-nearest neighbor queries
  • Given a collection of N objects O1, ..., ON and a
    query object Q, find the K data objects most
    similar to Q
  • All-pairs queries (or spatial joins)
  • Given a collection of N objects O1, ..., ON, find all
    pairs of objects that are within distance ε of each other
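A brute-force sketch of both query types, again assuming Euclidean distance over equal-length vectors; `knn` and `all_pairs` are hypothetical names. The all-pairs scan needs N(N-1)/2 distance computations, which is why a spatial join over an index is attractive.

```python
import heapq
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    # equal-length vectors assumed
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn(db: List[Sequence[float]], q: Sequence[float], k: int) -> List[int]:
    """Indices of the k objects closest to Q (ties broken by lower index)."""
    return heapq.nsmallest(k, range(len(db)), key=lambda i: euclidean(db[i], q))


def all_pairs(db: List[Sequence[float]], eps: float) -> List[Tuple[int, int]]:
    """All index pairs (i, j), i < j, whose objects are within eps of each other."""
    return [(i, j) for i, j in combinations(range(len(db)), 2)
            if euclidean(db[i], db[j]) <= eps]


db = [[0, 0], [1, 0], [5, 5], [0, 2]]
print(knn(db, [0, 0], 2))   # [0, 1]
print(all_pairs(db, 2.0))   # [(0, 1), (0, 3)]
```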

15
Ideal method requirements
  • Fast: sequential scanning and computing the distance
    to each and every object is too slow for large
    databases
  • Correct: no false dismissals; false alarms are
    acceptable. Why?
  • Small space overhead
  • Dynamic: easy to insert, delete, and update
    objects

16
Approach Outline
  • Use k feature-extraction functions to map objects
    into a k-dimensional space (i.e., apply a mapping F())
  • Use highly fine-tuned database SAMs (Spatial
    Access Methods) like R-trees to accelerate the
    search (by pruning out large portions of the
    database that are not promising)

20
Basic idea
  • Focus on whole match queries
  • Given a collection of N objects O1, ..., ON, a
    distance/dis-similarity function D(Oi, Oj), and
    a query object Q, find the data objects that are
    within distance ε from Q
  • Sequential scanning?
  • May be too slow, for the following reasons:
  • Distance computation is expensive (e.g., edit
    distance between DNA strings)
  • The database size N may be huge
  • Faster alternative?

21
Basic idea
  • Faster alternative
  • Step 1: a quick-and-dirty test to quickly discard
    the vast majority of non-qualifying objects
  • Step 2: use SAMs to achieve faster-than-sequential
    searching
  • Example
  • Database of yearly stock price movements
  • Euclidean distance function
  • Characterize each movement with a single number (feature)
  • Or use two or more features

22
Basic idea - illustration
  • A query with tolerance ε becomes a sphere with
    radius ε in the feature space

24
Basic idea: caution!
  • The mapping F() from objects to k-d points should
    not distort the distances
  • D() distance of two objects
  • Df() distance of the corresponding feature
    vectors
  • Ideally, perfect preservation of distances
  • In practice, a guarantee of no false dismissals
  • How? If the distance in f-space matches or
    underestimates the distance between two objects
    in the original space

25
Basic idea: lower bounding
  • Let O1, O2 be two objects with distance function
    D(), and let F(O1), F(O2) be their feature vectors
    with distance function Df(); then
  • To guarantee no false dismissals for whole
    match queries, the feature extraction function
    F() should satisfy
  • $D_f(F(O_1), F(O_2)) \le D(O_1, O_2)$
  • for every pair of objects O1, O2
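Why does the average (used later for time sequences) satisfy this condition? For two sequences x, y of length n, the Cauchy-Schwarz inequality gives |Σ(x_i − y_i)| ≤ √n · D(x, y), so √n times the difference of the averages never exceeds D(x, y). The sketch below assumes that scaled average as the feature F() (a hypothetical choice for illustration) and checks the inequality on random pairs; the random test only illustrates the property, while the Cauchy-Schwarz argument is what proves it.

```python
import math
import random
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def feature(x: Sequence[float]) -> float:
    """Hypothetical 1-d feature F(x): sqrt(n) times the average of x."""
    return math.sqrt(len(x)) * sum(x) / len(x)


def d_f(fx: float, fy: float) -> float:
    """Distance in the 1-d feature space."""
    return abs(fx - fy)


def check_lower_bound(trials: int = 1000, n: int = 16, seed: int = 0) -> bool:
    """Test D_f(F(x), F(y)) <= D(x, y) on random pairs (illustration, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        # small tolerance for floating-point rounding
        if d_f(feature(x), feature(y)) > euclidean(x, y) + 1e-12:
            return False
    return True


print(check_lower_bound())   # True
```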

28
Lower bounding - Proof
  • Let Q be the query object, O a qualifying
    object, and ε the tolerance.
  • Prove: if object O qualifies, it will be retrieved
    by a range query in f-space
  • That is, $D(Q, O) \le \varepsilon \Rightarrow D_f(F(Q), F(O)) \le \varepsilon$
  • Indeed, $D_f(F(Q), F(O)) \le D(Q, O) \le \varepsilon$
  • What about all-pairs? (spatial join on
    f-space)
  • What about nearest-neighbor queries?
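The slide leaves the nearest-neighbor case open. One standard answer, sketched here as an illustration rather than as the slide's own solution, is a multi-step search that also relies only on lower bounding: take the k nearest neighbors in feature space, use the largest of their actual distances as a radius, run a range query in feature space with that radius, and refine. The function names and the linear scans standing in for a SAM are assumptions.

```python
import math
from typing import Any, Callable, List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def multistep_knn(db: List[Vector], q: Vector, k: int,
                  F: Callable[[Vector], Any],
                  d_f: Callable[[Any, Any], float]) -> List[int]:
    """k-NN with no false dismissals, assuming d_f(F(a), F(b)) <= D(a, b) and k <= len(db)."""
    feats = [F(o) for o in db]
    fq = F(q)
    # Step 1: the k nearest neighbors in feature space (cheap distance)
    first = sorted(range(len(db)), key=lambda i: d_f(feats[i], fq))[:k]
    # Step 2: their actual distances; the largest, eps_max, is an upper bound
    # on the actual distance of the true k-th nearest neighbor
    eps_max = max(euclidean(db[i], q) for i in first)
    # Step 3: range query in feature space with radius eps_max; by lower
    # bounding, no true k-NN is dismissed (a SAM would answer this step)
    cand = [i for i in range(len(db)) if d_f(feats[i], fq) <= eps_max]
    # Step 4: refine the candidates with the actual distance
    return sorted(cand, key=lambda i: euclidean(db[i], q))[:k]


def avg_feature(x: Vector) -> float:
    # hypothetical 1-d feature: sqrt(n) * average, which lower-bounds Euclidean D()
    return math.sqrt(len(x)) * sum(x) / len(x)


db = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 3], [5, 5, 5, 5]]
print(multistep_knn(db, [0, 0, 0, 1], 2, avg_feature, lambda a, b: abs(a - b)))
```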

30
GEneric Multimedia object INdexIng
  • GEMINI approach
  • Determine distance function D()
  • Find one or more numerical feature-extraction
    functions (to provide a quick and dirty test)
  • Prove that Df() lower-bounds D() to guarantee no
    false dismissals
  • Use a SAM (e.g., R-tree) to store and retrieve
    k-d feature vectors
  • !!! The methodology addresses only the speed of
    search, not the quality of the results, which
    depends on the distance function
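The GEMINI steps can be sketched as a filter-and-refine range query. This is a minimal illustration, assuming Euclidean D(), a linear scan in place of the SAM, and the scaled average as a hypothetical one-feature F(); any F() whose feature distance lower-bounds D() would do.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gemini_range_query(db: List[Vector], q: Vector, eps: float,
                       F: Callable[[Vector], Vector]) -> Tuple[List[int], List[int]]:
    """Whole-match range query: filter in feature space, then refine.

    Returns (answers, candidates). No false dismissals as long as the
    Euclidean distance between feature vectors lower-bounds D()."""
    fq = F(q)
    # Filter (quick-and-dirty test); a SAM such as an R-tree would replace this scan
    candidates = [i for i, o in enumerate(db) if euclidean(F(o), fq) <= eps]
    # Refine: compute the actual distance only for candidates, dropping false alarms
    answers = [i for i in candidates if euclidean(db[i], q) <= eps]
    return answers, candidates


def avg_feature(x: Vector) -> Vector:
    # hypothetical 1-d feature: sqrt(n) * average, which lower-bounds Euclidean D()
    return [math.sqrt(len(x)) * sum(x) / len(x)]


db = [[1, 2, 3, 4], [4, 3, 2, 1], [9, 9, 9, 9]]
print(gemini_range_query(db, [1, 2, 3, 4], 1.0, avg_feature))   # ([0], [0, 1])
```

Here the reversed series has the same average as the query, so it passes the filter as a false alarm and is removed only by the refine step; the lower bound is what rules out false dismissals.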

31
Generic Multimedia Object Indexing
  • Applications
  • 1-d time sequences
  • 2-d color images
  • Problems to solve
  • How to apply the lower-bounding lemma
  • Curse of Dimensionality (time sequences)
  • Cross-talk of features (color images)

35
1-D Time Sequences
  • Distance function: Euclidean distance
  • Find features that
  • Preserve/lower-bound the distance
  • Carry as much information as possible (reduce
    false alarms)
  • If we are allowed to use only one feature, what
    would it be? The average.
  • Extending it:
  • The average of the 1st half, of the 2nd half, of the
    1st quarter, etc.
  • Coefficients of the Fourier transform (DFT),
    wavelet transform, etc.
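The segment averages can be written as a feature map. A minimal sketch, assuming one level of s equal, non-overlapping segments; the slide's mix of halves and quarters would need rescaling to keep the bound, since each level separately lower-bounds D(). Scaling each segment average by √m, where m is the segment length, makes the per-segment Cauchy-Schwarz argument add up to a lower bound on D(). `segment_features` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def segment_features(x: Sequence[float], s: int) -> List[float]:
    """Hypothetical feature map: sqrt(m) * average of each of s equal segments (m = n / s)."""
    n = len(x)
    if n % s:
        raise ValueError("length must be divisible by the number of segments")
    m = n // s
    return [math.sqrt(m) * sum(x[j * m:(j + 1) * m]) / m for j in range(s)]


print(segment_features([1, 2, 3, 4, 5, 6, 7, 8], 2))   # [5.0, 13.0]

# Empirical check of the lower bound on random pairs (not a proof)
rng = random.Random(0)
for _ in range(500):
    x = [rng.gauss(0, 1) for _ in range(16)]
    y = [rng.gauss(0, 1) for _ in range(16)]
    assert euclidean(segment_features(x, 4), segment_features(y, 4)) <= euclidean(x, y) + 1e-12
```

With s = 1 this reduces to the scaled average; with s = n it is the identity map and preserves distances exactly.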

37
1-D Time Sequences
  • Show that the distance in feature space
    lower-bounds the actual distance
  • What about DFT?
  • Parseval's theorem: the DFT preserves the energy
    of a signal as well as the distance between
    two signals
  • $D(x, y) = D(X, Y)$
  • where X and Y are the Fourier transforms of
    x and y
  • If we keep only the first k ≤ n coefficients of the DFT, we
    lower-bound the actual distance
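A small numerical check of Parseval's theorem and of the truncation argument. The sketch assumes the orthonormal DFT (scaled by 1/√n), the normalization under which distances are preserved exactly; the naive O(n²) transform and the sample sequences are for illustration only.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Orthonormal DFT: X_f = (1/sqrt(n)) * sum_t x_t * exp(-2*pi*i*f*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def dist(u: Sequence[complex], v: Sequence[complex]) -> float:
    """Euclidean distance; works for real and complex vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 7.0]
y = [2.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0, 6.0]
X, Y = dft(x), dft(y)
print(dist(x, y), dist(X, Y))        # equal up to rounding (Parseval)
for k in range(len(x) + 1):
    print(k, dist(X[:k], Y[:k]))     # non-decreasing in k, never above dist(x, y)
```

Dropping coefficients only removes non-negative terms from the sum of squares, which is why keeping the first k coefficients lower-bounds the distance.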

38
1-D Time Sequences
  • Response time improves the more the transform
    concentrates the energy of the signal in a few coefficients
  • The DFT concentrates the energy for a large class of
    signals, the colored noises
  • Colored noises: skewed energy spectrum that drops
    as $O(f^{-b})$
  • The energy spectrum (or power spectrum) of a signal is
    the squared amplitude $|X_f|^2$ as a function of
    the frequency f
  • b = 2: random walks, or brown noise (very
    predictable)
  • b > 2: black noises
  • b = 1: pink noise
  • b = 0: white noise (completely unpredictable)
  • Colored noises appear even in images (photographs)
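The effect of the spectrum's slope can be illustrated by comparing white noise with a random walk (its cumulative sum). This sketch assumes k = 8 of n = 128 coefficients and a fixed random seed; `energy_fraction` is a hypothetical helper.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Orthonormal DFT (the same definition as for Parseval's theorem)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def energy_fraction(x: Sequence[float], k: int) -> float:
    """Fraction of the signal's total energy held by its first k DFT coefficients."""
    energy = [abs(c) ** 2 for c in dft(x)]
    return sum(energy[:k]) / sum(energy)


rng = random.Random(42)
white = [rng.gauss(0, 1) for _ in range(128)]   # white noise, b = 0
walk, level = [], 0.0
for step in white:          # random walk (brown noise, b = 2):
    level += step           # cumulative sum of the white noise
    walk.append(level)

print(energy_fraction(white, 8), energy_fraction(walk, 8))
```

For white noise the first k coefficients hold roughly k/n of the energy on average, whereas a random walk concentrates its energy at low frequencies, so one would expect a much larger fraction for it. For real-valued signals, coefficients f and n − f have equal magnitude, so the first k coefficients cover only one side of the spectrum; this sketch ignores that refinement.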

40
2-D color images
  • Image features for Content Based Image Retrieval
    (CBIR)
  • Low level
  • Color: color histograms
  • Texture: directionality, granularity, contrast
  • Shape: turning angle, moments of inertia,
    pattern spectrum
  • Position: 2D strings method
  • etc.
  • Object Level
  • Regions

41
2-D color images: color histograms
  • Each color image is a 2-d array of pixels
  • Each pixel has 3 color components (R, G, B)
  • h colors, each color denoting a point in 3-d
    color space (as many as $2^{24}$ colors)
  • For each image, compute the h-element color
    histogram: each component is the percentage of
    pixels that are most similar to that color
  • The histogram of image I is defined as follows:
  • For a color Ci, Hci(I) is the number
    of pixels of color Ci in image I
  • OR
  • For any pixel in image I, Hci(I) is the
    probability that the pixel has color Ci.
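A minimal sketch of the h-element histogram, assuming pixels given as (R, G, B) tuples in 0..255 and uniform quantization of each channel into `levels` bins (so h = levels^3), in place of the color clustering used in practice; `color_histogram` is a hypothetical name. Each entry is the fraction of pixels falling in that color bin.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(image: Sequence[Sequence[Pixel]], levels: int = 4) -> List[float]:
    """h-element histogram (h = levels**3); entry c = fraction of pixels in color bin c."""
    h = levels ** 3
    counts = [0] * h
    total = 0
    for row in image:
        for r, g, b in row:
            # uniform quantization of each 0..255 channel into `levels` bins
            qr, qg, qb = (r * levels) // 256, (g * levels) // 256, (b * levels) // 256
            counts[(qr * levels + qg) * levels + qb] += 1
            total += 1
    return [c / total for c in counts]


img = [[(255, 0, 0), (255, 0, 0)],
       [(0, 0, 255), (250, 10, 5)]]   # three reddish pixels, one blue pixel
hist = color_histogram(img)           # h = 64 bins
print(hist[48], hist[3])              # 0.75 0.25 (red bin, blue bin)
```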

42
2-D color images: color histograms
  • Usually similar colors are clustered together and
    one representative color is chosen for each color
    bin
  • Most commercial CBIR systems include the color
    histogram as one of the features (e.g., QBIC of
    IBM)
  • No spatial information

43
Color histograms - distance
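  • One method to measure the distance between two
    histograms x and y is the quadratic form
  • $d_h^2(x, y) = (x - y)^T A (x - y) = \sum_{i=1}^{h} \sum_{j=1}^{h} a_{ij} (x_i - y_i)(x_j - y_j)$
  • where the color-to-color similarity matrix
    A has entries $a_{ij}$ that describe the similarity
    between color i and color j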
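The quadratic form can be computed directly. In this sketch the 3 × 3 similarity matrix for bright red, pink, and orange is made up for illustration (a real system would derive A from distances between bin colors); with A equal to the identity, the formula reduces to the plain Euclidean distance. The double sum has h² terms, which is the cost and the cross-talk that make this distance hard to index.

```python
import math
from typing import Sequence


def quadratic_form_distance(x: Sequence[float], y: Sequence[float],
                            A: Sequence[Sequence[float]]) -> float:
    """d_h(x, y) = sqrt((x - y)^T A (x - y)); every pair (i, j) contributes (cross-talk)."""
    z = [xi - yi for xi, yi in zip(x, y)]
    h = len(z)
    s = sum(A[i][j] * z[i] * z[j] for i in range(h) for j in range(h))  # h*h terms
    return math.sqrt(max(s, 0.0))  # clamp tiny negative rounding; assumes A is positive semi-definite


# Hypothetical 3-color example: bright red, pink, orange
A = [[1.0, 0.8, 0.5],
     [0.8, 1.0, 0.3],
     [0.5, 0.3, 1.0]]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
all_red, all_pink = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(quadratic_form_distance(all_red, all_pink, A))    # sqrt(0.4), about 0.632
print(quadratic_form_distance(all_red, all_pink, I3))   # sqrt(2), about 1.414 (plain Euclidean)
```

Because red and pink are declared similar (0.8), an all-red and an all-pink image come out closer than the plain Euclidean distance between their histograms suggests.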

44
Color histograms: lower bounding
  • Two obstacles to using color histograms as
    feature vectors in GEMINI
  • Dimensionality curse (h is large: 64, 128, ...)
  • Distance function is quadratic
  • It involves all cross terms (cross-talk among
    features)
  • - expensive to compute
  • - precludes the use of SAMs

[Figure: color histograms x and q (e.g., 64 colors); labeled bins include bright red, pink, and orange]
45
Color histograms: lower bounding
  • 1st step: define the distance function between
    two color images: $D() = d_h()$
  • 2nd step: find numerical features (one or more)
    whose Euclidean distance lower-bounds $d_h()$
  • If we were allowed to use one numerical feature to
    describe a color image, what should it be?
  • The average amount of each color component (R, G, B):
    $\bar{x} = (R_{avg}, G_{avg}, B_{avg})^T$
  • where $R_{avg} = \frac{1}{P} \sum_{p=1}^{P} R(p)$,
    and similarly for G and B
  • where P is the number of pixels in the
    image and R(p) is the red component (intensity) of
    the p-th pixel

46
Color histograms: lower bounding
  • Given the average color vectors $\bar{x}$ and $\bar{y}$ of two
    images, we define $d_{avg}()$ as the Euclidean distance
    between the 3-d average color vectors
  • 3rd step: prove that the feature distance
    $d_{avg}()$ lower-bounds the actual distance $d_h()$
  • Main idea of the approach:
  • First, filtering using the average (R, G, B)
    color,
  • then more accurate matching using the full
    h-element histogram
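A minimal sketch of the average color vector and of d_avg(), assuming pixels given as (R, G, B) tuples; `average_color` and `d_avg` are hypothetical names, and `math.dist` requires Python 3.8 or later. It shows only the cheap filtering feature, not the lower-bounding proof of the 3rd step.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def average_color(image: Sequence[Sequence[Pixel]]) -> Tuple[float, float, float]:
    """x_bar = (R_avg, G_avg, B_avg): per-channel mean over all P pixels."""
    pixels = [p for row in image for p in row]
    P = len(pixels)
    return (sum(p[0] for p in pixels) / P,
            sum(p[1] for p in pixels) / P,
            sum(p[2] for p in pixels) / P)


def d_avg(image1: Sequence[Sequence[Pixel]], image2: Sequence[Sequence[Pixel]]) -> float:
    """Euclidean distance between the 3-d average color vectors."""
    return math.dist(average_color(image1), average_color(image2))


img1 = [[(255, 0, 0), (0, 0, 255)]]       # one red and one blue pixel
img2 = [[(100, 0, 100), (100, 0, 100)]]
print(average_color(img1))   # (127.5, 0.0, 127.5)
print(d_avg(img1, img2))     # 27.5 * sqrt(2), about 38.89
```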

47
Color auto-correlogram
  • pick any pixel p1 of color Ci in the image I
  • at distance k away from p1 pick another pixel p2
  • what is the probability that p2 is also of color
    Ci ?

[Figure: image I with a red pixel P1 and a pixel P2 at distance k from P1; is P2 also red?]
48
Color auto-correlogram
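  • The auto-correlogram of image I for color Ci and
    distance k:
    $\gamma^{(k)}_{C_i}(I) = \Pr\left[\, p_2 \in I_{C_i} \mid p_1 \in I_{C_i},\ |p_1 - p_2| = k \,\right]$
    where $I_{C_i}$ is the set of pixels of color Ci
  • It integrates both color information and spatial
    information.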
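A straightforward (unoptimized) sketch of the auto-correlogram for a single distance k, assuming the image is already a 2-d array of quantized color indices and using the D8 (chessboard) distance max(|dx|, |dy|); pixel pairs whose second pixel falls outside the image are skipped, which is our assumption. `autocorrelogram` is a hypothetical name.

```python
from typing import Dict, Sequence


def autocorrelogram(img: Sequence[Sequence[int]], k: int) -> Dict[int, float]:
    """For each color c: Pr[p2 has color c | p1 has color c, D8(p1, p2) = k].

    img is a 2-d array of quantized color indices; k >= 1."""
    rows, cols = len(img), len(img[0])
    # all offsets at chessboard (D8) distance exactly k: max(|dy|, |dx|) == k
    ring = [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
            if max(abs(dy), abs(dx)) == k]
    same: Dict[int, int] = {}
    pairs: Dict[int, int] = {}
    for y in range(rows):
        for x in range(cols):
            c = img[y][x]
            for dy, dx in ring:
                yy, xx = y + dy, x + dx
                if 0 <= yy < rows and 0 <= xx < cols:   # skip p2 outside the image
                    pairs[c] = pairs.get(c, 0) + 1
                    if img[yy][xx] == c:
                        same[c] = same.get(c, 0) + 1
    return {c: same.get(c, 0) / n for c, n in pairs.items()}


checker = [[0, 1],
           [1, 0]]
print(autocorrelogram(checker, 1))   # 1/3 for each color: 1 of 3 neighbors matches
```

Unlike a histogram, the result depends on where the colors are, not only on how many pixels have each color.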

49
Color auto-correlogram
50
Implementations
  • Pixel Distance Measures
  • Use D8 distance (also called chessboard
    distance)
  • Choose distances k ∈ {1, 3, 5, 7}
  • Computation complexity
  • Histogram
  • Correlogram

51
Implementations
  • Features Distance Measures
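  • A small $D(f(I_1) - f(I_2))$ means that I1 and I2 are
    similar.
  • Example: f(a) = 1000, f(a′) = 1050; f(b) = 100,
    f(b′) = 150. The absolute difference is 50 in both
    cases, but relatively it is much larger for b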
  • For histogram
  • For correlogram
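The example motivates a relative distance. The histogram and correlogram formulas did not survive in this transcript; a measure in this spirit, which we assume here for illustration, divides each absolute difference by 1 plus the sum of the two values, so the same absolute gap counts for less between large values. `relative_l1` is a hypothetical name, and feature values are assumed non-negative (counts or probabilities).

```python
from typing import Sequence


def relative_l1(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Sum over features of |f1_i - f2_i| / (1 + f1_i + f2_i); assumes non-negative values."""
    return sum(abs(a - b) / (1 + a + b) for a, b in zip(f1, f2))


# The slide's example: the same absolute gap of 50, weighted differently
print(relative_l1([1000], [1050]))   # 50 / 2051, about 0.0244
print(relative_l1([100], [150]))     # 50 / 251, about 0.199
```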

52
Color Histogram vs Correlogram
  • If there is no difference between the query and
    the target images, both methods have good
    performance.

[Figure: query image (512 colors) and the top five results (1st to 5th) retrieved by the correlogram method and by the histogram method]
53
Color Histogram vs Correlogram
  • The correlogram method is more stable to color
    change than the histogram method.

[Figure: query and target images; the target is ranked 1st by the correlogram method and 48th by the histogram method]
54
Color Histogram vs Correlogram
  • The correlogram method is more stable to large
    appearance change than the histogram method

[Figure: query and target images; the target is ranked 1st by the correlogram method and 31st by the histogram method]
55
Color Histogram vs Correlogram
  • The correlogram method is more stable to contrast and
    brightness changes than the histogram method.

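[Figure: four queries (Query 1 to Query 4) and a target image; ranks of the target under the correlogram (C) and histogram (H) methods: C 178th / H 230th, C 1st / H 1st, C 1st / H 3rd, C 5th / H 18th]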
56
Color Histogram vs Correlogram
  • The color correlogram describes the global
    distribution of local spatial correlations of
    colors.
  • It is easy to compute
  • It is more stable than the color histogram method

57
Multimedia Indexing: Conclusions
  • GEMINI is a popular method
  • Whole matching problem
  • Should pay attention to
  • Distance functions
  • Feature Extraction functions
  • Lower Bounding
  • Particular application
  • Sub-pattern matching?