Nearest Neighbor

About This Presentation

Slides: 19

Provided by: csC76

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Nearest Neighbor

1
Nearest Neighbor

Paul Hsiung
March 16, 2004

2
Quick Review of NN

Set of points P
Query point q
Distance metric d
Find p in P such that $d(p,q) \le d(p',q)$ for all
p' in P

[Figure: point p and query point q]

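
As a baseline for the definition on this slide, a brute-force linear scan might look like the sketch below. The function name `nearest_neighbor`, the sample points, and the choice of Euclidean distance via `math.dist` are assumptions for illustration; the slide leaves the metric d generic.

```python
import math

def nearest_neighbor(points, q, d=math.dist):
    """Return the point p in `points` that minimizes d(p, q) by linear scan."""
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda p: d(p, q))

# Hypothetical sample data
P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(P, q))
```

Every query touches all of P, which is the cost the later slides try to avoid.
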
3
NN Used In

Image databases (Pentland et al.)
Color indexing (Swain et al.)
Recognizing 3D objects (Murase et al.)
Shapes (Mori et al.)
Drug testing
DNA sequence matching (Buhler)

4
Tree-based Approaches

Quadtrees
Split middle in all dimensions
Split until no points or one point left
Kd-trees
Split in one dimension
Pick the middle wisely
Ball-trees
Pick two pivots and split
SR-trees
We have rectangles and spheres, so why not
combine them?
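
The kd-tree rule listed above (split in one dimension, at a well-chosen middle) might be sketched as follows. Cycling through the axes by depth and splitting at the median are common choices assumed here, not something the slides specify; `build_kdtree` and the dict-based node layout are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Recursively split `points` on one axis per level, at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])            # assumed: cycle through the axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                      # assumed: the median is "the middle"
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

# Hypothetical sample data
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["point"], tree["axis"])
```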

5
Indyk's Gripe

Beyond 10 or 20 dimensions, tree-based structures
will look at many points
No better than brute force linear search
So he came up with a hash table approach
Locality Sensitive Hashing (LSH)
Rest of talk will be on his paper

6
LSH
7
Interlude: Near Neighbor

Set of points P
Query point q
Distance metric d
Find p in P such that $d(p,q) \le (1+\epsilon)\,d(P,q)$, where
d(P,q) is the distance of q to its closest point
in P

[Figure: query point q, a point p, and the distances d(P,q) and (1+ε)d(P,q)]

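
The relaxed condition on this slide can be checked directly against a brute-force scan. This is only a sketch: `is_approx_near_neighbor`, the Euclidean metric, and the sample points are assumptions for illustration.

```python
import math

def is_approx_near_neighbor(p, P, q, eps, d=math.dist):
    """True if d(p, q) <= (1 + eps) * d(P, q), where d(P, q) is the
    distance from q to its closest point in P."""
    d_P_q = min(d(x, q) for x in P)
    return d(p, q) <= (1 + eps) * d_P_q

# Hypothetical sample data: the true nearest neighbor of q is (1.2, 0.0)
P = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.0), (5.0, 0.0)]
q = (2.0, 0.0)
print(is_approx_near_neighbor((1.0, 0.0), P, q, eps=0.5))
print(is_approx_near_neighbor((0.0, 0.0), P, q, eps=0.5))
```

With eps = 0 the check accepts only exact nearest neighbors, so the earlier problem is the special case.
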
8
Hash

Pick a subset I of random coordinates
Hash function, h(p), will return a bucket ID
h(p) = projection of p on I

9
Intuition

If two points are close, they hash to same bucket
with some probability p1
If they are far, they hash to same bucket with a
smaller probability p2 < p1

10
Indyk's Hash

Convert coordinates of p to $\{0,1\}^d$
Use Hamming distance: d(p,q) = number of positions on
which p and q differ
Example:
p = (0,1,0,1,1,1,0,0,1,0)
I = {2,5,7}
Then, h(p) = (1,1,0)
Demo:
http://web.mit.edu/ardonite/6.838/locality-hashing.htm
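
The example on this slide can be reproduced in a few lines. This is a minimal sketch: the helper names `make_projection_hash`, `random_subset`, and `hamming` are hypothetical, and coordinates are 1-indexed to match the slide.

```python
import random

def make_projection_hash(I):
    """Return h, where h(p) is the projection of p onto the 1-indexed coordinates I."""
    def h(p):
        return tuple(p[i - 1] for i in I)
    return h

def random_subset(d, k, rng=random):
    """Pick k distinct coordinates out of 1..d at random (assumed sampling scheme)."""
    return sorted(rng.sample(range(1, d + 1), k))

def hamming(p, q):
    """Number of positions on which p and q differ."""
    return sum(a != b for a, b in zip(p, q))

p = (0, 1, 0, 1, 1, 1, 0, 0, 1, 0)
h = make_projection_hash([2, 5, 7])
print(h(p))  # (1, 1, 0), as on the slide
```

The tuple returned by `h` serves as the bucket ID; any hashable key would do.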

11
Why Locality-sensitive?

$\Pr[h(p) = h(q)] = (1 - d(p,q)/D)^k$
D is the number of dimensions in the binary
representation
k is the size of I
We can vary the probability by changing k

[Figure: two plots of Pr versus distance, for k = 1 and k = 2]

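
To see how k shapes the collision probability, the formula on this slide can be evaluated and compared with a simulation. The sketch below assumes the k coordinates are drawn independently and uniformly (with replacement), which is the setting in which the formula holds exactly; the function names, the sample points, and the trial count are illustrative.

```python
import random

def collision_probability(dist, D, k):
    """(1 - dist/D)^k for Hamming distance `dist` in D binary dimensions."""
    return (1 - dist / D) ** k

def estimate_collision(p, q, k, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[h(p) = h(q)] over random choices of I."""
    rng = random.Random(seed)
    D = len(p)
    hits = 0
    for _ in range(trials):
        I = [rng.randrange(D) for _ in range(k)]   # assumed: with replacement
        hits += all(p[i] == q[i] for i in I)
    return hits / trials

p = (0, 1, 0, 1, 1, 1, 0, 0, 1, 0)
q = (1, 1, 0, 1, 1, 1, 0, 0, 1, 1)   # differs from p in 2 of 10 positions
for k in (1, 2, 3):
    print(k, collision_probability(2, 10, k), estimate_collision(p, q, k))
```

Larger k makes the probability fall off faster with distance, which separates near points from far ones more sharply but also lowers the chance that near points collide.
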
12
Now to Use It (Training)

Generate l hash functions h1, ..., hl
Store each point p in the bucket hi(p) of the
i-th hash array, i = 1, ..., l

13
Now to Use It (Query)

Retrieve all the points that belong to the
buckets h1(q), ..., hl(q)
Return the retrieved point that is closest to q
This solves the Near Neighbor problem
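
The training and query steps can be put together in a short sketch. It assumes binary points stored as tuples, Python dicts standing in for the hash arrays, and coordinate subsets drawn with `random.sample`; `build_index`, `query`, and the parameter values are hypothetical names and settings, not the authors' implementation.

```python
import random
from collections import defaultdict

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def build_index(points, k, l, seed=0):
    """Training: draw l coordinate subsets of size k, then store every
    point p in bucket h_i(p) of the i-th table."""
    rng = random.Random(seed)
    D = len(points[0])
    subsets = [rng.sample(range(D), k) for _ in range(l)]
    tables = [defaultdict(list) for _ in range(l)]
    for p in points:
        for I, table in zip(subsets, tables):
            table[tuple(p[i] for i in I)].append(p)
    return subsets, tables

def query(index, q):
    """Query: collect the points in buckets h_1(q), ..., h_l(q) and return
    the one closest to q, or None if every bucket is empty."""
    subsets, tables = index
    candidates = set()
    for I, table in zip(subsets, tables):
        candidates.update(table.get(tuple(q[i] for i in I), []))
    return min(candidates, key=lambda p: hamming(p, q), default=None)

# Hypothetical data: 50 random 16-bit points
rng = random.Random(1)
points = [tuple(rng.randint(0, 1) for _ in range(16)) for _ in range(50)]
index = build_index(points, k=4, l=5)
print(query(index, points[7]) == points[7])
```

A query can come back empty when q shares no bucket with any stored point, so the guarantee is probabilistic and depends on k and l.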

14
Indyk's Results

Compared with another tree-based algorithm
Color histogram dataset from Corel Draw
20,000 images, 64 dimensions
Used 1k, 2k, 5k, 10k, 19k points for training
1k points are used for query
Computed missed ratio: fraction of queries with
no hits
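
The missed ratio defined above is simple to compute once each query's outcome is recorded. In this sketch a miss is represented by `None`; that convention, the name `missed_ratio`, and the sample outcomes are assumptions for illustration and are not data from the experiments.

```python
def missed_ratio(results):
    """Fraction of queries with no hits, where a miss is recorded as None."""
    if not results:
        raise ValueError("at least one query result is required")
    return sum(r is None for r in results) / len(results)

# Hypothetical outcomes for four queries: two hits, two misses
print(missed_ratio([(0, 1), None, (1, 1), None]))
```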

15
Indyk's Results
16
Results II
17
Ugly Side

Works best with Hamming distance
Can be extended to L1 and L2 norms
Requires parameter tweaking (size of I and number
of hash buckets)
Does not work well on uniform data
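
On the L1 point above: one known way to bring L1 distance into the Hamming setting, for points with small non-negative integer coordinates, is to write each coordinate in unary. The sketch below assumes a known maximum coordinate value C; `unary_embed`, `l1`, and the sample values are illustrative, and nothing here reflects how the experiments on the earlier slides were run.

```python
def unary_embed(p, C):
    """Encode each coordinate x in 0..C as x ones followed by C - x zeros."""
    bits = []
    for x in p:
        if not 0 <= x <= C:
            raise ValueError("coordinate out of range 0..C")
        bits.extend([1] * x + [0] * (C - x))
    return tuple(bits)

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

# Hypothetical points with coordinates in 0..4
p, q = (3, 0, 2), (1, 4, 2)
print(l1(p, q), hamming(unary_embed(p, 4), unary_embed(q, 4)))
```

The Hamming distance between the encodings equals the L1 distance between the original points, at the cost of C bits per coordinate.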

18
Bibliography

A. Gionis, P. Indyk, R. Motwani. Similarity
Search in High Dimensions via Hashing. In 25th
VLDB, 1999
J. Buhler. Efficient Large-Scale Sequence
Comparison by Locality-Sensitive Hashing. In
Bioinformatics 17(5): 419-428, 2001
H. Murase, S. K. Nayar. Visual Learning and
Recognition of 3D Objects from Appearance. In
IJCV, Vol. 14, No. 1: 5-24, 1995
A. Pentland, R. W. Picard, S. Sclaroff. Photobook:
Tools for Content-Based Manipulation of Image
Databases. In SPIE Vol. 2185: 34-47, 1994
M. J. Swain, D. H. Ballard. Color Indexing. In
IJCV, Vol. 7, No. 1: 11-32, 1991
G. Mori, S. Belongie, J. Malik. Shape Contexts
Enable Efficient Retrieval of Similar Shapes. In
CVPR, Vol. 1: 723-730, 2001
Slides: Algorithms for Nearest Neighbor Search,
by Piotr Indyk
Slides: Approximate Nearest Neighbor in High
Dimensions via Hashing, by Aris Gionis, Piotr
Indyk, and Rajeev Motwani
