Large Scale Discovery of Spatially Related Images - PowerPoint PPT Presentation




1
Large Scale Discovery of Spatially Related Images
  • Ondrej Chum and Jiří Matas
  • Center for Machine Perception
  • Czech Technical University
  • Prague

2
Related Vision Problems
  • Organize my holiday snapshots
  • Schaffalitzky and Zisserman ECCV02
  • Find images containing a given object
    (window)
  • Sivic ICCV03, Nister CVPR06, Jegou CVPR07,
    Philbin CVPR07, Chum ICCV07
  • Find small object in a film
  • Sivic and Zisserman CVPR04
  • Match and reconstruct San Marco
  • Snavely, Seitz and Szeliski SIGGRAPH06

This Work
  • Find and match ALL spatially related images in a
    large database, using only visual information,
    i.e. without using Flickr tags, EXIF information, or GPS.

3
Visual Only Approach
  • Large database (100 000 images in our
    experiments)
  • Find spatially related clusters
  • Fast method, even for databases of up to 2^50 images
  • Probability of successfully discovering a spatial
    relation between images is independent of the
    database size

4
Image Clustering and its Time Complexity
Standard approach (using image retrieval): quadratic
in the size of the database D, O(D^2); the
multiplicative constant of the quadratic term is ≈ 1,
so the method is quadratic even for small D
  1. Take each image in turn
  2. Use an image retrieval system to retrieve related
    images
  3. Compute connected components of the graph
  • Proposed method
    1. Seed generation (hashing): characterize images by
      pseudo-random numbers stored in a hash table
      • time complexity equal to the sum of
        variances of Poisson distributions
      • linear for database sizes D up to 2^50
    2. Seed growing (retrieval): complete the clusters
      only for the cluster members, c ≪ D; complexity O(cD)
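Step 3 of the standard approach reduces to finding connected components of an image graph whose edges are the verified matches. A minimal union-find sketch, assuming the matches are already available as (i, j) index pairs; `connected_components` is a hypothetical helper, not the authors' code. The expensive part of the standard approach is producing the edges (D queries, each against the whole database), not this step.

```python
def connected_components(num_images, edges):
    """Group image indices 0..num_images-1 into connected components."""
    parent = list(range(num_images))

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of the two endpoints of every match.
    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Collect images by their root; each group is built in increasing order.
    groups = {}
    for image in range(num_images):
        groups.setdefault(find(image), []).append(image)
    return sorted(groups.values())


# Images 0-1-2 are linked, 3-4 are linked, 5 matched nothing.
print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # → [[0, 1, 2], [3, 4], [5]]
```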

5
Building on Two Methods
  • Fast (low recall) seed generation based on
    hashing
  • Thorough (high recall) seed growing based on
    image retrieval

Chum, Philbin, Isard, and Zisserman: Scalable
Near Identical Image and Shot Detection, CIVR 2007
Chum, Philbin, Sivic, Isard, and Zisserman: Total
Recall: Automatic Query Expansion with a
Generative Feature Model for Object
Retrieval, ICCV 2007
6
Image Representation
Feature detector → SIFT descriptor [Lowe04] →
vector quantization (visual vocabulary) →
bag of words → set of words
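A toy sketch of the last three stages, assuming the descriptors are already computed: each descriptor is vector-quantized to its nearest visual word (brute-force Euclidean search), the word counts form the bag of words, and dropping the counts gives the set of words that min-Hash works on. The 2-D "descriptors" and the 4-word vocabulary are made-up stand-ins for 128-D SIFT vectors and a real vocabulary, and the function names are hypothetical.

```python
import math


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word (Euclidean)."""
    return [
        min(range(len(vocabulary)), key=lambda w: math.dist(d, vocabulary[w]))
        for d in descriptors
    ]


def bag_of_words(words, vocabulary_size):
    """Histogram of visual-word occurrences."""
    counts = [0] * vocabulary_size
    for w in words:
        counts[w] += 1
    return counts


def set_of_words(words):
    """Binary representation: which visual words occur at all."""
    return set(words)


vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 0.1), (0.95, 0.05), (0.9, 0.8)]
words = quantize(descriptors, vocabulary)
print(words)                   # → [0, 1, 1, 3]
print(bag_of_words(words, 4))  # → [1, 2, 0, 1]
print(set_of_words(words))     # → {0, 1, 3}
```

A real vocabulary is far larger, so the brute-force search would be replaced by an approximate nearest-neighbour structure.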
7
Hypothesizing Seeds with min-Hash
  • Spatially related images share visual words
  • Problem: robustly estimate the set overlap of
    high-dimensional sparse binary vectors in constant
    time, independent of the dimensionality (d on the
    order of 10^5)
  • Set overlap is estimated probabilistically via
    min-Hash
  • Similar approach to LSH (locality-sensitive
    hashing)

Image similarity measured as a set overlap (using the
min-Hash algorithm):
sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|
8
min-Hash
  • According to some (replicable) key, select a small
    number of non-zero elements
  • Similar vectors should have similar selected
    elements
  • Key: generate a random number (a hash) for each
    dimension; choose the non-zero element with the
    minimal value of the key

[Figure: min-Hash example. Four sparse binary vectors
with the values (29, 12, 19), (26, 3, 26), (29, 12, 1),
and (35, 27, 7) selected by three keys; the first and
third vectors differ in only two elements and agree
on two of the three values.]
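A minimal min-Hash sketch under our own assumptions: each key is a random permutation of the d dimensions (a pseudo-random value per dimension, as the slide describes), drawn from Python's `random` module with a fixed seed so it is replicable. For a set of words, the min-Hash under a key is the word with the smallest key value, and the fraction of keys on which two sets agree estimates their set overlap. All function names are hypothetical.

```python
import random


def make_keys(dimension, num_hashes, seed=0):
    """One random permutation of range(dimension) per hash, replicable via seed."""
    rng = random.Random(seed)
    keys = []
    for _ in range(num_hashes):
        perm = list(range(dimension))
        rng.shuffle(perm)
        keys.append(perm)
    return keys


def min_hash(word_set, keys):
    """For each key, select the present word whose key value is minimal."""
    return [min(word_set, key=lambda w: perm[w]) for perm in keys]


def set_overlap(a, b):
    """Exact similarity |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def estimated_overlap(sketch_a, sketch_b):
    """Fraction of keys on which the two min-Hashes agree."""
    agree = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return agree / len(sketch_a)


a = set(range(0, 60))
b = set(range(20, 80))  # |A ∩ B| = 40, |A ∪ B| = 80
keys = make_keys(dimension=200, num_hashes=1000, seed=1)
print(set_overlap(a, b))  # → 0.5
print(estimated_overlap(min_hash(a, keys), min_hash(b, keys)))  # close to 0.5
```

The estimate works because, under a random permutation, the minimum of A ∪ B falls inside A ∩ B with probability exactly |A ∩ B| / |A ∪ B|, and only then do the two min-Hashes coincide.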
9
Seed Generation: Probability of Success
An image pair forms a seed if at least one of k
s-tuples of min-Hashes agrees. The probability
that an image pair is retrieved is a function of
the similarity:
P(retrieved) = 1 - (1 - sim(A1, A2)^s)^k
where s and k are user-controllable parameters of
the method: s governs the size of the hashing
table and k is the number of hashing tables. A pair
of images is successfully retrieved if there is at
least one collision in one of the tables
(equivalent to AND-OR).
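Seed generation itself can be sketched as k hash tables, each keyed by an s-tuple of min-Hashes; two images collide in a table only if all s min-Hashes agree (AND), and they become a seed candidate if they collide in any table (OR). The class name, the use of word indices as min-Hash values, and the permutation keys are our assumptions; only s and k come from the slide. Because `add` only looks up images inserted earlier, the structure can be updated incrementally as the database grows.

```python
import random
from collections import defaultdict


class SeedHashTables:
    """Hypothetical helper: k hash tables keyed by s-tuples of min-Hashes."""

    def __init__(self, dimension, s, k, seed=0):
        rng = random.Random(seed)
        self.s, self.k = s, k
        # s * k independent keys, each a random permutation of the dimensions.
        self.keys = [rng.sample(range(dimension), dimension) for _ in range(s * k)]
        self.tables = [defaultdict(list) for _ in range(k)]

    def _sketch(self, word_set):
        # One min-Hash per key: the present word with the smallest key value.
        return [min(word_set, key=lambda w: perm[w]) for perm in self.keys]

    def add(self, image_id, word_set):
        """Insert an image; return ids of earlier images it collides with."""
        sketch = self._sketch(word_set)
        candidates = set()
        for t, table in enumerate(self.tables):
            # Table t uses min-Hashes t*s .. t*s + s - 1 as its bucket key.
            bucket_key = tuple(sketch[t * self.s:(t + 1) * self.s])
            candidates.update(table[bucket_key])
            table[bucket_key].append(image_id)
        return candidates


tables = SeedHashTables(dimension=500, s=3, k=64, seed=0)
print(tables.add("img0", set(range(50))))            # → set()
print(tables.add("img1", set(range(50)) | {100}))    # → {'img0'}
print(tables.add("img2", set(range(300, 350))))      # → set()
```

Disjoint word sets can never collide, since their min-Hashes are drawn from different words; the near-duplicate pair agrees on each 3-tuple with probability about 0.94, so with 64 tables a miss is practically impossible.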
10
Probability of Retrieving an Image Pair
[Plot: probability of retrieval vs. similarity (set
overlap); panels: "Images of the same object and
unrelated images" and "Near duplicate images".
Marked pairs: 100% (sim 0.746), 100% (sim 0.322),
99.5% (sim 0.217), 13.9% (sim 0.066), 8.9% (sim 0.057),
5.1% (sim 0.047).]
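The retrieval probability can be evaluated directly from the formula P(retrieved) = 1 - (1 - sim^s)^k. The slides do not state s and k; the values s = 3 and k = 512 below are our assumption, chosen because they land close to the plotted points (the similarities on the plot are themselves rounded).

```python
def retrieval_probability(sim, s, k):
    """P(at least one of k s-tuples of min-Hashes agrees) = 1 - (1 - sim**s)**k."""
    return 1.0 - (1.0 - sim ** s) ** k


S, K = 3, 512  # assumed parameters, not taken from the slides
for sim in (0.047, 0.057, 0.066, 0.074, 0.217, 0.322, 0.746):
    print(f"sim {sim:.3f}: {100 * retrieval_probability(sim, S, K):5.1f} %")
```

The steep sigmoid is the point of the AND-OR construction: near-duplicates (high overlap) are found almost surely, while weakly related pairs are found only occasionally.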
11
Spatially Related Images
[Plot: probability of retrieval (log scale) vs.
similarity (set overlap) for spatially related image
pairs. Marked pairs range from 5.1% (sim 0.047) to
18.9% (sim 0.074); other marked values: 7.2%, 8.9%,
9.8%, 10.7%, 13.9%, 16.3%.]
12
Seed Generation
[Figure: related image pairs labelled 5, 4, 6, 4, 7,
and 10; P(no seed) is 94.00%, 85.73%, and 68.88% as
more related pairs are considered.]
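One reading of this slide reproduces its numbers exactly: treat the labels as per-pair retrieval probabilities in percent and take the pairs as independent (both our assumptions). P(no seed) is then the product of the per-pair failure probabilities, and using one pair (6%), three pairs (6%, 4%, 5%), and all six pairs gives 94.00%, 85.73%, and 68.88%.

```python
from math import prod


def p_no_seed(pair_probabilities):
    """Probability that none of the related pairs is retrieved (independence assumed)."""
    return prod(1.0 - p for p in pair_probabilities)


for pairs in ([0.06], [0.06, 0.04, 0.05], [0.06, 0.04, 0.05, 0.04, 0.07, 0.10]):
    print(f"pairs: {len(pairs)}, P(no seed) = {100 * p_no_seed(pairs):.2f} %")
# → pairs: 1, P(no seed) = 94.00 %
# → pairs: 3, P(no seed) = 85.73 %
# → pairs: 6, P(no seed) = 68.88 %
```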
13
Seed Generation
Resemblance to RANSAC
  • Related image pair = an all-inlier sample (there
    is no need to enumerate them all; one hit is
    sufficient)
  • Probability of retrieving an image pair = the
    fraction of inliers
  • Number of related image pairs = how many times
    we can try
[Figure: P(no seed) for larger sets of related pairs:
68.88%, 55.13%, 31.84%, 1.94%.]
14
At Least One Seed in Cluster
Estimate of the probability of failure, plotted
against the size of the cluster. Assumption used
in this plot: all images in the cluster are
related.
[Plot: P(no seed) vs. cluster size]
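Following the RANSAC analogy, a cluster of n mutually related images offers n(n-1)/2 "tries", so the failure probability shrinks rapidly with cluster size. A sketch under stated assumptions: every pair has the same similarity, pairs are treated as independent, and s = 3, k = 512 are assumed values not given on the slides.

```python
def retrieval_probability(sim, s, k):
    """P(at least one of k s-tuples of min-Hashes agrees)."""
    return 1.0 - (1.0 - sim ** s) ** k


def p_no_seed_in_cluster(cluster_size, sim, s, k):
    """All cluster_size * (cluster_size - 1) / 2 pairs are related, with equal similarity."""
    pairs = cluster_size * (cluster_size - 1) // 2
    return (1.0 - retrieval_probability(sim, s, k)) ** pairs


S, K = 3, 512  # assumed parameters, not taken from the slides
for n in (2, 4, 8, 16, 32):
    print(f"cluster size {n:2d}: P(no seed) = {p_no_seed_in_cluster(n, 0.06, S, K):.4f}")
```

The database size D does not appear anywhere in this calculation, which is the sense in which the probability of discovering a cluster is independent of the size of the database.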
15
Growing the Seed
  • Application of Total Recall
  • Combining average query expansion and transitive
    closure
  • 3D geometric constraint (not only affine
    transformation)
  • Tighter geometric constraints (10 pixel threshold)

[Figure: average query expansion (from possibly
multiple coplanar structures): features are
back-projected to turn the query into an enhanced
query; transitive closure crawl.]
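The transitive closure crawl can be sketched as a breadth-first search started from the images of a verified seed. `retrieve` is a hypothetical callback standing in for query-expanded retrieval with spatial verification; the sketch implements neither average query expansion nor the geometric checks. Each cluster member is queried exactly once, which is why seed growing costs O(cD) when every query scans a database of size D.

```python
from collections import deque


def grow_seed(seed_images, retrieve):
    """Query every newly found image until no new verified match appears."""
    cluster = set(seed_images)
    queue = deque(seed_images)
    while queue:
        image = queue.popleft()
        for match in retrieve(image):
            if match not in cluster:
                cluster.add(match)
                queue.append(match)
    return cluster


# Toy verified-retrieval results: a chain a-b-c-d plus an unrelated image e.
verified = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
print(sorted(grow_seed(["a", "b"], lambda img: verified[img])))  # → ['a', 'b', 'c', 'd']
```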
16
Summary of the Method
[Figure: overview of the method. Labels: images,
min-Hash seeds, spatial verification, rejected seed,
seed, query expansion, cluster skeleton, failed
retrieval, missed cluster, unknown structure.]
17
Experiment 1: Univ. of Kentucky Dataset
  • Nister and Stewenius
  • 2550 clusters of size 4 (very small clusters)
  • Partial ground truth: different clusters
    share the same background
  • How many clusters have at least one seed?

46.9%
CONTRAST, A DIFFERENT TASK: if we were looking for
ALL results, not ANY (seed), the standard retrieval
measure on this dataset would be only 1.63 out of
4.
18
Experimental Validation: UKY Dataset
In the University of Kentucky dataset, the average
similarity is slightly above 0.06.

[Plot: P(no seed) vs. cluster size]
19
Experimental Results on 100k Images
Images downloaded from Flickr. Includes 11 Oxford
landmarks with manually labelled ground truth:
  • All Souls
  • Hertford
  • Ashmolean
  • Keble
  • Balliol
  • Magdalen
  • Bodleian
  • Pitt Rivers
  • Christ Church
  • Radcliffe Camera
  • Cornmarket
20
Experimental Results on 100k Images
  • Settings scalable to millions of images, also
    finding small clusters
  • Settings scalable to billions of images, finding
    only larger clusters
Timing: 17 min 13 sec; 16 min 20 sec; 0.019 sec
/ image
21
Application: Object Labelling
  • Factorizing the clusters using multiple
    constraints
  • Matches between images
  • Weak geometric constraints (coplanarity,
    disparity)
  • Photographer's psychology: people tend to take
    pictures of single objects

24
Automatic 3D Reconstruction
26
Conclusions
  • Novel method for fast clustering in large
    collections
  • Combines a fast low-recall method (seed generation)
    and a thorough (total recall) method for seed
    growing
  • Probability of finding a cluster rapidly
    increases with its size and is independent of the
    size of the database
  • Can be incrementally updated as the database
    grows
  • Efficient: 0.019 sec / image on a single PC
  • Fully parallelizable
  • State-of-the-art near-duplicate detection comes
    as a bonus (as part of seed generation)

27
Thank you!
Technical report available at
http://cmp.felk.cvut.cz/chum/papers/Chum-TR-08.pdf
Thanks to Daniel Martinec, Michal Perdoch, James
Philbin, Jakub Pokluda