Nearest Neighbor

Slides: 19
Provided by: csC76
Learn more at: http://www.cs.cmu.edu
1
Nearest Neighbor
  • Paul Hsiung
  • March 16, 2004

2
Quick Review of NN
  • Set of points P
  • Query point q
  • Distance metric d
  • Find p in P such that d(p,q) ≤ d(p',q) for all
    p' in P
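As a baseline for the definition above, here is a minimal brute-force sketch. The Euclidean metric and the names `euclidean` and `nearest_neighbor` are assumptions chosen for illustration; the slide leaves d unspecified.

```python
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """One possible distance metric d (an assumption, not fixed by the slide)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbor(P: Sequence[Point], q: Point,
                     d: Callable[[Point, Point], float] = euclidean) -> Point:
    """Linear scan: return the p in P that minimizes d(p, q)."""
    return min(P, key=lambda p: d(p, q))


P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
nn = nearest_neighbor(P, q)
print(nn)  # (1.0, 1.0)
```

The scan costs one distance computation per point, which is the cost the structures on the following slides try to beat.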

[Figure: query point q and its nearest neighbor p]
3
NN Used In
  • Image databases (Pentland et al.)
  • Color indexing (Swain et al.)
  • Recognizing 3D objects (Murase et al.)
  • Shapes (Mori et al.)
  • Drug testing
  • DNA sequence matching (Buhler)

4
Tree-based Approaches
  • Quadtrees
      • Split in the middle in all dimensions
      • Split until no points or one point left
  • Kd-trees
      • Split in one dimension
      • Pick the middle wisely
  • Ball-trees
      • Pick two pivots and split
  • SR-trees
      • We have rectangles and spheres, so why not
        combine them?
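To make the kd-tree bullets concrete, here is a minimal construction sketch. Cycling through the axes by depth and splitting at the median are assumptions: they are one common reading of "pick the middle wisely", not something the slide specifies. The `Node` class and `build_kdtree` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point              # splitting point stored at this node
    axis: int                 # dimension this node splits on
    left: Optional["Node"]    # points with smaller coordinate on `axis`
    right: Optional["Node"]   # points with larger-or-equal coordinate


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split on one dimension per level, at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])          # assumed rule: cycle through axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # assumed rule: median split
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point)  # (7, 2)
```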

5
Indyk's Gripe
  • Beyond 10 or 20 dimensions, tree-based structures
    will look at many points
  • No better than brute-force linear search
  • So he came up with a hash-table approach:
    Locality-Sensitive Hashing (LSH)
  • The rest of the talk is on his paper

6
LSH
7
Interlude: Near Neighbor
  • Set of points P
  • Query point q
  • Distance metric d
  • Find p in P such that d(p,q) ≤ (1+ε) d(P,q), where
    d(P,q) is the distance of q to its closest point
    in P
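The relaxed condition can be checked directly. The sketch below is only an illustration: `is_near_neighbor` and `abs_dist` are hypothetical names, and the one-dimensional integer points are chosen so the arithmetic is exact.

```python
def is_near_neighbor(p, q, P, eps, d):
    """True if d(p, q) <= (1 + eps) * d(P, q)."""
    closest = min(d(x, q) for x in P)   # d(P, q): distance from q to its closest point
    return d(p, q) <= (1 + eps) * closest


def abs_dist(a, b):
    """Distance on the number line (an assumed metric for this example)."""
    return abs(a - b)


P = [0, 10, 13]
q = 12                                  # closest point is 13, so d(P, q) = 1

print(is_near_neighbor(13, q, P, 0.0, abs_dist))  # True: the exact nearest neighbor
print(is_near_neighbor(10, q, P, 1.0, abs_dist))  # True: 2 <= (1 + 1) * 1
print(is_near_neighbor(10, q, P, 0.5, abs_dist))  # False: 2 > 1.5
```

With ε = 0 the condition reduces to the exact nearest neighbor problem from the earlier slide.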

[Figure: points p and q, with radii d(P,q) and (1+ε)d(P,q)]
8
Hash
  • Pick a subset I of random coordinates
  • Hash function, h(p), will return a bucket ID
  • h(p) = projection of p onto I

9
Intuition
  • If two points are close, they hash to same bucket
    with some probability p1
  • If they are far, they hash to same bucket with a
    smaller probability p2 < p1

10
Indyk's Hash
  • Convert coordinates of p to {0,1}^d
  • Use Hamming distance: d(p,q) = number of positions
    on which p and q differ
  • Example
  • p = (0,1,0,1,1,1,0,0,1,0)
  • I = {2,5,7}
  • Then, h(p) = (1,1,0)
  • Demo
  • http://web.mit.edu/ardonite/6.838/locality-hashing.htm
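The example on this slide can be reproduced in a few lines. This is a sketch under the assumption that the coordinates in I are 1-based, which is what makes the slide's numbers come out; `h` and `hamming` are illustrative helper names.

```python
def h(p, I):
    """Bucket ID: the projection of binary vector p onto the coordinates in I.

    I holds 1-based positions, matching the example on the slide.
    """
    return tuple(p[i - 1] for i in I)


def hamming(p, q):
    """Number of positions on which p and q differ."""
    return sum(a != b for a, b in zip(p, q))


p = (0, 1, 0, 1, 1, 1, 0, 0, 1, 0)
I = (2, 5, 7)
print(h(p, I))  # (1, 1, 0)

# A point that differs from p only at position 3 (which is not in I)
# lands in the same bucket.
q = (0, 1, 1, 1, 1, 1, 0, 0, 1, 0)
print(hamming(p, q), h(q, I) == h(p, I))  # 1 True
```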

11
Why Locality-sensitive?
  • Pr[h(p) = h(q)] = (1 - d(p,q)/D)^k
  • D is the number of dimensions in the binary
    representation
  • k is the size of I
  • We can vary the probability by changing k
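The formula can be checked by brute force on a small case. The sketch assumes the k coordinates of I are drawn independently and uniformly (that is, with replacement), which is the setting in which (1 - d(p,q)/D)^k holds exactly; the function names are hypothetical.

```python
from itertools import product


def collision_probability(dist, D, k):
    """Closed form from the slide: (1 - d(p,q)/D)^k."""
    return (1 - dist / D) ** k


def exact_collision_probability(p, q, k):
    """Enumerate every length-k tuple of coordinates and count the tuples
    on which p and q agree everywhere (so that h(p) == h(q))."""
    D = len(p)
    agree = sum(
        all(p[i] == q[i] for i in I)
        for I in product(range(D), repeat=k)
    )
    return agree / D ** k


p = (0, 1, 1, 0)
q = (0, 1, 1, 1)   # Hamming distance 1, D = 4

for k in (1, 2, 3):
    print(k, collision_probability(1, 4, k), exact_collision_probability(p, q, k))
```

Larger k makes the collision probability fall off faster with distance, which is the knob the last bullet refers to.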

[Figure: two plots of Pr versus distance, for k = 1 and k = 2]
12
Now to Use It (Training)
  • Generate l hash functions h_1, ..., h_l
  • Store each point p in the bucket h_i(p) of the
    i-th hash array, i = 1, ..., l
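A minimal sketch of the training step, assuming binary points, 0-based coordinate indices, and Python dictionaries standing in for the hash arrays. `build_lsh_tables` and its `seed` parameter are illustrative names, not taken from the paper.

```python
import random
from collections import defaultdict


def build_lsh_tables(points, k, l, seed=0):
    """Generate l hash functions (each a random subset I of k coordinates)
    and store every point's index in bucket h_i(p) of the i-th table."""
    rng = random.Random(seed)
    D = len(points[0])
    index_sets = [tuple(rng.sample(range(D), k)) for _ in range(l)]
    tables = [defaultdict(list) for _ in range(l)]
    for idx, p in enumerate(points):
        for I, table in zip(index_sets, tables):
            bucket_id = tuple(p[i] for i in I)   # h_i(p): projection of p onto I
            table[bucket_id].append(idx)
    return index_sets, tables


points = [
    (0, 1, 0, 1, 1, 1, 0, 0),
    (0, 1, 0, 1, 1, 1, 0, 1),
    (1, 0, 1, 0, 0, 0, 1, 1),
    (1, 1, 1, 1, 0, 0, 0, 0),
]
index_sets, tables = build_lsh_tables(points, k=3, l=4, seed=1)
```

Each point is stored l times, once per table, so memory grows linearly in l.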

13
Now to Use It (Query)
  • Retrieve all the points that belong to the
    buckets h_1(q), ..., h_l(q)
  • Return the retrieved point that is closest to q
  • This solves the Near Neighbor problem
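The query step can be sketched as follows, assuming binary points under Hamming distance and tables built as described on the training slide (rebuilt inline here so the example runs on its own). `lsh_query` is a hypothetical name, and returning `None` when no bucket matches is a choice made for this sketch.

```python
import random
from collections import defaultdict


def hamming(p, q):
    """Number of positions on which p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def lsh_query(q, points, index_sets, tables):
    """Collect the points in buckets h_1(q), ..., h_l(q) and return the
    index of the retrieved point closest to q, or None if nothing is found."""
    candidates = set()
    for I, table in zip(index_sets, tables):
        candidates.update(table.get(tuple(q[i] for i in I), ()))
    if not candidates:
        return None
    return min(candidates, key=lambda idx: hamming(points[idx], q))


# Training, inline: l random coordinate subsets of size k, one table each.
rng = random.Random(0)
points = [
    (0, 1, 0, 1, 1, 1, 0, 0),
    (0, 1, 0, 1, 1, 1, 0, 1),
    (1, 0, 1, 0, 0, 0, 1, 1),
    (1, 1, 1, 1, 0, 0, 0, 0),
]
D, k, l = 8, 3, 4
index_sets = [tuple(rng.sample(range(D), k)) for _ in range(l)]
tables = [defaultdict(list) for _ in range(l)]
for idx, p in enumerate(points):
    for I, table in zip(index_sets, tables):
        table[tuple(p[i] for i in I)].append(idx)

# A query identical to a stored point collides with it in every table.
result = lsh_query(points[2], points, index_sets, tables)
print(result)  # 2
```

Only the retrieved candidates are compared against q, so the work per query depends on bucket sizes rather than on the size of P.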

14
Indyk's Results
  • Compared with a tree-based algorithm
  • Color histogram dataset from Corel Draw
  • 20,000 images, 64 dimensions
  • Used 1k, 2k, 5k, 10k, 19k points for training
  • 1k points are used for query
  • Computed the miss ratio: the fraction of queries
    with no hits
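The miss ratio in the last bullet is a simple fraction. The sketch below takes "no hits" to mean that a query retrieved no candidate point at all, represented here as `None`; that reading and the name `miss_ratio` are assumptions.

```python
def miss_ratio(results):
    """Fraction of queries whose result is None (nothing was retrieved)."""
    return sum(r is None for r in results) / len(results)


# Four queries, two of which found no candidate in any bucket.
print(miss_ratio([3, None, 7, None]))  # 0.5
```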

15
Indyk's Results
16
Results II
17
Ugly Side
  • Works best with Hamming distance
  • Can be extended to L1 and L2 norms
  • Requires parameter tweaking (size of I and number
    of hash buckets)
  • Does not work well on uniform data

18
Bibliography
  • A. Gionis, P. Indyk, R. Motwani. Similarity
    Search in High Dimensions via Hashing. In Proc.
    25th VLDB, 1999
  • J. Buhler. Efficient Large-Scale Sequence
    Comparison by Locality-Sensitive Hashing. In
    Bioinformatics 17(5), pp. 419-428, 2001
  • H. Murase, S. K. Nayar. Visual Learning and
    Recognition of 3D Objects from Appearance. In
    IJCV, Vol. 14, No. 1, pp. 5-24, 1995
  • A. Pentland, R. W. Picard, S. Sclaroff. Photobook:
    Tools for Content-Based Manipulation of Image
    Databases. In SPIE Vol. 2185, pp. 34-47, 1994
  • M. J. Swain, D. H. Ballard. Color Indexing. In
    IJCV, Vol. 7, No. 1, pp. 11-32, 1991
  • G. Mori, S. Belongie, J. Malik. Shape Contexts
    Enable Efficient Retrieval of Similar Shapes. In
    CVPR, Vol. 1, pp. 723-730, 2001
  • Slides: Algorithms for Nearest Neighbor Search,
    by Piotr Indyk
  • Slides: Approximate Nearest Neighbor in High
    Dimensions via Hashing, by Aris Gionis, Piotr
    Indyk, and Rajeev Motwani