SIFT - PowerPoint PPT Presentation

Title: SIFT


1
SIFT
  • Guest Lecture by Jiwon Kim
  • http://www.cs.washington.edu/homes/jwkim/

2
SIFT Features and Its Applications
3
Autostitch Demo
4
Autostitch
  • Fully automatic panorama generation
  • Input: set of images
  • Output: panorama(s)
  • Uses SIFT (Scale-Invariant Feature Transform) to
    find/align images

5
1. Solve for homography
6
1. Solve for homography
7
1. Solve for homography
8
2. Find connected sets of images
9
2. Find connected sets of images
10
2. Find connected sets of images
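Grouping matched images into connected sets (one set per panorama) is a connected-components problem on the image match graph. A minimal sketch, assuming pairwise matching has already produced a list of verified image pairs (the input format and the name `connected_image_sets` are hypothetical), using union-find over image indices:

```python
def connected_image_sets(num_images, verified_pairs):
    """Group image indices into connected sets, one per candidate panorama.

    verified_pairs: iterable of (i, j) image pairs whose feature matches were
    accepted by geometric verification (hypothetical input format).
    """
    parent = list(range(num_images))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in verified_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two sets

    groups = {}
    for i in range(num_images):
        groups.setdefault(find(i), []).append(i)
    # Deterministic output: each set sorted, sets ordered by their contents.
    return sorted(sorted(g) for g in groups.values())
```

For example, `connected_image_sets(6, [(0, 1), (1, 2), (4, 5)])` yields the sets `[0, 1, 2]`, `[3]` and `[4, 5]`; image 3 matches nothing, so it forms a set on its own.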
11
3. Solve for camera parameters
  • New images initialised with rotation, focal
    length of best matching image

12
3. Solve for camera parameters
  • New images initialised with rotation, focal
    length of best matching image

13
4. Blending the panorama
  • Burt & Adelson 1983
  • Blend frequency bands over range ∝ λ

14
2-band Blending
Low frequency (λ > 2 pixels)
High frequency (λ < 2 pixels)
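The idea behind 2-band blending can be sketched in one dimension: split each aligned signal into a low-frequency band (a Gaussian blur) and a high-frequency band (the residual), blend the low band smoothly, and switch the high band sharply. This is an illustrative sketch, not Autostitch's implementation; the function names, the 1-D setting and `sigma=2.0` (loosely echoing the 2-pixel split above) are assumptions.

```python
import math


def gaussian_blur_1d(signal, sigma):
    """Blur a 1-D signal with a normalized Gaussian kernel, clamping at the ends."""
    radius = max(1, int(math.ceil(3 * sigma)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    n = len(signal)
    return [sum(w * signal[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, kernel)) / total
            for i in range(n)]


def two_band_blend(a, b, weight_a, sigma=2.0):
    """Blend two aligned 1-D signals using separate low- and high-frequency bands.

    weight_a[i] in [0, 1] is signal a's weight at sample i, e.g. a linear
    ramp across the overlap region.
    """
    low_a, low_b = gaussian_blur_1d(a, sigma), gaussian_blur_1d(b, sigma)
    out = []
    for i, w in enumerate(weight_a):
        high_a, high_b = a[i] - low_a[i], b[i] - low_b[i]
        # Low band: smooth linear blend over a wide transition.
        low = w * low_a[i] + (1.0 - w) * low_b[i]
        # High band: sharp switch to whichever signal has the larger weight.
        high = high_a if w >= 0.5 else high_b
        out.append(low + high)
    return out
```

Because the high band is never averaged, fine detail from both inputs is not ghosted, while the low band still hides exposure differences over a wide transition.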
15
Linear Blending
16
2-band Blending
17
So, what is SIFT?
  • Scale-Invariant Feature Transform
  • David Lowe at UBC
  • Scale/rotation invariant
  • Among the best-performing feature descriptors
    known at the time
  • Many real-world applications
  • Object recognition
  • Panorama stitching
  • Robot localization
  • Video indexing

18
Example: object recognition
19
SIFT properties
  • Locality: features are local, so robust to
    occlusion and clutter
  • Distinctiveness: individual features can be
    matched to a large database of objects
  • Quantity: many features can be generated for even
    small objects
  • Efficiency: close to real-time performance

20
SIFT algorithm overview
  • Feature detection
  • Detect points that can be repeatably selected
    under location/scale change
  • Feature description
  • Assign orientation to detected feature points
  • Construct a descriptor for image patch around
    each feature point
  • Feature matching

21
1. Feature detection
  • Detect points stable under location/scale change
  • Build continuous space (x, y, scale)
  • Approximated by multi-scale Difference-of-Gaussian
    pyramid
  • Select maxima/minima in (x, y, scale)
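A single-octave sketch of this step in plain Python: blur the image at a geometric series of scales, subtract adjacent levels, and keep samples that are larger or smaller than all 26 neighbours in (x, y, scale). The values `sigma0=1.6` and `k=sqrt(2)` and all function names are assumptions for illustration; a full implementation would also repeat this per octave on downsampled images.

```python
import math


def blur_1d(row, sigma):
    """Gaussian blur of one row; indices past either end are clamped."""
    radius = max(1, int(math.ceil(3 * sigma)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    n = len(row)
    return [sum(w * row[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, kernel)) / total
            for i in range(n)]


def blur_2d(img, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    rows = [blur_1d(r, sigma) for r in img]
    cols = [blur_1d(list(c), sigma) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dog_stack(img, sigma0=1.6, k=math.sqrt(2.0), levels=5):
    """Blur at sigma0 * k**i for i < levels; return differences of adjacent levels."""
    gauss = [blur_2d(img, sigma0 * k ** i) for i in range(levels)]
    h, w = len(img), len(img[0])
    return [[[g1[y][x] - g0[y][x] for x in range(w)] for y in range(h)]
            for g0, g1 in zip(gauss, gauss[1:])]


def dog_extrema(dog, threshold=0.0):
    """Return (x, y, level) for samples above or below all 26 neighbours in (x, y, scale)."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][0]) - 1):
                v = dog[s][y][x]
                if abs(v) <= threshold:
                    continue
                neighbours = [dog[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((x, y, s))
    return found
```

A bright blob shows up as a DoG minimum at its centre, at the level whose blur best matches the blob's size; the top and bottom DoG levels only serve as scale neighbours.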

22
1. Feature detection
23
1. Feature detection
  • Localize extrema by fitting a quadratic
  • Sub-pixel/sub-scale interpolation using Taylor
    expansion
  • Take derivative and set to zero
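Written out as in Lowe's 2004 paper (listed in the references), with $\mathbf{x} = (x, y, \sigma)^T$ the offset from the sample point and $D$ and its derivatives evaluated at that point, the quadratic fit and its extremum are:

```latex
\begin{aligned}
D(\mathbf{x}) &\approx D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x}
  + \frac{1}{2}\,\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x} \\
\hat{\mathbf{x}} &= -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} \\
D(\hat{\mathbf{x}}) &= D + \frac{1}{2}\,\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}
\end{aligned}
```

The last line follows by substituting $\hat{\mathbf{x}}$ back into the expansion. If any component of $\hat{\mathbf{x}}$ exceeds 0.5, the extremum lies closer to a neighbouring sample point, and the paper uses $|D(\hat{\mathbf{x}})|$ (with pixel values in [0, 1]) as the contrast value for the low-contrast test.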

24
1. Feature detection
  • Discard low-contrast/edge points
  • Low contrast: discard keypoints whose contrast is
    < threshold
  • Edge points: high contrast in one direction, low
    in the other → compute principal curvatures from
    eigenvalues of 2x2 Hessian matrix, and limit ratio
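A sketch of both tests, assuming pixel values scaled to [0, 1]. The defaults `contrast_threshold=0.03` and `r=10` are the values reported in Lowe's 2004 paper; the function names and the finite-difference Hessian are illustrative. The edge test never computes eigenvalues: if one principal curvature is r times the other, Tr(H)²/Det(H) = (r+1)²/r, which grows with r, so bounding it bounds the ratio.

```python
def hessian_2x2(patch):
    """Finite-difference spatial Hessian at the centre of a 3x3 patch (patch[y][x])."""
    c = patch[1][1]
    dxx = patch[1][2] - 2.0 * c + patch[1][0]
    dyy = patch[2][1] - 2.0 * c + patch[0][1]
    dxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    return dxx, dyy, dxy


def keep_keypoint(d_value, dxx, dyy, dxy, contrast_threshold=0.03, r=10.0):
    """Return True if an extremum passes the low-contrast and edge tests.

    d_value is the interpolated DoG value D(x_hat), with pixel values in [0, 1].
    """
    if abs(d_value) < contrast_threshold:
        return False                      # low contrast
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False                      # curvatures of opposite sign: not a clean peak
    # Tr^2/Det equals (r+1)^2/r when one principal curvature is r times the other
    # and grows with that ratio, so this bounds the ratio without eigenvalues.
    return trace * trace / det < (r + 1.0) ** 2 / r
```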

25
1. Feature detection
  • Example
  • (a) 233x189 image
  • (b) 832 DoG extrema
  • (c) 729 left after peak value threshold
  • (d) 536 left after testing ratio of principal
    curvatures

26
2. Feature description
  • Assign orientation to keypoints
  • Create histogram of local gradient directions
    computed at selected scale
  • Assign canonical orientation at peak of smoothed
    histogram
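A sketch of orientation assignment, assuming the gradients around the keypoint have already been sampled at the keypoint's scale and weighted. The 36-bin histogram, the extra orientations for peaks within 80% of the highest one, and the parabola fit for sub-bin accuracy follow Lowe's 2004 paper; the `[1, 1, 1] / 3` smoothing kernel, the number of smoothing passes and the function name are assumptions.

```python
def dominant_orientations(gradients, num_bins=36, peak_ratio=0.8, smooth_passes=2):
    """Return keypoint orientations (degrees) from sampled gradients.

    gradients: iterable of (magnitude, angle_degrees) pairs from the region
    around the keypoint, already weighted (assumption).
    """
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for magnitude, angle in gradients:
        hist[int((angle % 360.0) // bin_width) % num_bins] += magnitude
    # Circular smoothing with a [1, 1, 1] / 3 box filter (hist[-1] wraps around).
    for _ in range(smooth_passes):
        hist = [(hist[i - 1] + hist[i] + hist[(i + 1) % num_bins]) / 3.0
                for i in range(num_bins)]
    peak = max(hist)
    orientations = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % num_bins]
        # Keep local maxima within peak_ratio of the highest peak.
        if h > left and h > right and h >= peak_ratio * peak:
            # Vertex of the parabola through the three bins, as a fraction of a bin.
            offset = 0.5 * (left - right) / (left - 2.0 * h + right)
            orientations.append(((i + 0.5 + offset) * bin_width) % 360.0)
    return orientations
```

A keypoint with two strong peaks yields two orientations; each would become a separate keypoint with its own descriptor.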

27
2. Feature description
  • Construct SIFT descriptor
  • Create array of orientation histograms
  • 8 orientations × 4×4 histogram array = 128
    dimensions
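The 4×4×8 layout can be sketched as below. This is a simplified version under stated assumptions: it takes a 16x16 grid of gradients already rotated to the keypoint's canonical orientation and puts each gradient into exactly one cell and one of 8 orientation bins (Lowe's descriptor additionally uses Gaussian weighting and trilinear interpolation between bins). The normalize, clamp at 0.2, renormalize step follows Lowe's 2004 paper; the function names are hypothetical.

```python
import math


def _normalize(values):
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def sift_like_descriptor(patch_gradients, clamp=0.2):
    """Simplified 4x4x8 = 128-D descriptor from a 16x16 grid of gradients.

    patch_gradients[y][x] = (magnitude, angle_degrees), with angles already
    measured relative to the keypoint's canonical orientation.
    """
    descriptor = [0.0] * 128
    for y in range(16):
        for x in range(16):
            magnitude, angle = patch_gradients[y][x]
            cell = (y // 4) * 4 + (x // 4)                      # one of 4x4 = 16 cells
            orientation_bin = int((angle % 360.0) // 45.0) % 8  # one of 8 directions
            descriptor[cell * 8 + orientation_bin] += magnitude
    # Unit length removes uniform contrast changes; clamping large entries
    # reduces the influence of large gradient magnitudes before renormalizing.
    descriptor = _normalize(descriptor)
    return _normalize([min(v, clamp) for v in descriptor])
```

Because of the first normalization, multiplying every gradient magnitude by the same constant (a uniform contrast change) leaves the descriptor unchanged.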

28
2. Feature description
  • Advantage over simple correlation
  • Gradients less sensitive to illumination change
  • Gradients may shift: robust to deformation,
    viewpoint change

29
Performance: stability to noise
  • Match features after random change in image scale
    & orientation, with differing levels of image
    noise
  • Find nearest neighbor in database of 30,000
    features

30
Performance: stability to affine change
  • Match features after random change in image scale
    & orientation, with 2% image noise, and affine
    distortion
  • Find nearest neighbor in database of 30,000
    features

31
Performance: distinctiveness
  • Vary size of database of features, with 30 degree
    affine change, 2% image noise
  • Measure % correct for single nearest neighbor
    match

32
3. Feature matching
  • For each feature in A, find nearest neighbor in B

A
B
33
3. Feature matching
  • Exact nearest neighbor search is too slow for a
    large database of 128-dimensional data
  • Approximate nearest neighbor search
  • Best-bin-first (Beis et al. 97): modification to
    k-d tree algorithm
  • Use heap data structure to identify bins in order
    by their distance from query point
  • Result: can give a speedup by a factor of 1000
    while finding the nearest neighbor (of interest)
    95% of the time
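A compact sketch of best-bin-first search over a k-d tree: branches not taken during descent go into a heap keyed by the squared distance from the query to the splitting plane, the closest pending branch is explored next, and the search stops after `max_checks` nodes. Counting nodes rather than leaves, the default of 200 and all names here are simplifying assumptions, not the exact procedure of Beis et al.

```python
import heapq


def build_kdtree(points, depth=0):
    """points: list of (vector, label). Returns a nested dict, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bbf_nearest(tree, query, max_checks=200):
    """Approximate nearest neighbour by best-bin-first search.

    Returns ((vector, label), squared_distance) for the best point examined.
    """
    best, best_d = None, float("inf")
    heap = [(0.0, 0, tree)]    # (lower bound on squared distance, tie-breaker, subtree)
    pushed = 1
    checks = 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break              # every remaining bin is at least this far away
        # Descend towards the query, queueing each branch not taken.
        while node is not None and checks < max_checks:
            vec, _label = node["point"]
            checks += 1
            d = sq_dist(vec, query)
            if d < best_d:
                best, best_d = node["point"], d
            diff = query[node["axis"]] - vec[node["axis"]]
            near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
            if far is not None:
                heapq.heappush(heap, (diff * diff, pushed, far))
                pushed += 1
            node = near
    return best, best_d
```

With an unlimited `max_checks` the pruning test makes the search exact; lowering it trades accuracy for speed, which is the trade-off quantified on the slide.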

34
3. Feature matching
  • Reject false matches
  • Compare distance of nearest neighbor to second
    nearest neighbor
  • Common features aren't distinctive, therefore bad
  • Threshold of 0.8 on the distance ratio provides
    excellent separation
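A brute-force sketch of the ratio test in plain Python. In practice the two nearest neighbours would come from an approximate search such as best-bin-first; exhaustive search is used here only to keep the example short. The function name is hypothetical; `ratio=0.8` is the threshold from the slide.

```python
import math


def match_features(descs_a, descs_b, ratio=0.8):
    """Match each descriptor in A to its nearest neighbour in B.

    A match is kept only if d1 < ratio * d2, where d1 and d2 are the distances
    to the nearest and second-nearest neighbours in B.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, a in enumerate(descs_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(descs_b))
        if len(dists) < 2:
            continue  # the ratio test needs a second neighbour
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

A feature whose two closest candidates are almost equally close (a repeated, non-distinctive pattern) fails the test even if its nearest neighbour is very close.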

35
3. Feature matching
  • Now, given feature matches
  • Find an object in the scene
  • Solve for homography (panorama)
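Solving for a homography needs at least 4 matches. A minimal sketch that recovers H from exactly 4 correspondences by fixing h33 = 1 and solving the resulting 8x8 linear system (this parameterization fails if the true h33 is 0). A real stitcher would fit H robustly over many matches, e.g. with RANSAC; that part is omitted, and all names here are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """H (3x3, h33 = 1) mapping 4 points src[i] = (x, y) onto dst[i] = (u, v)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, x, y):
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

The 4 points must be in general position (no three collinear), otherwise the 8x8 system is singular.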

36
3. Feature matching
  • Example: 3D object recognition

37
3. Feature matching
  • 3D object recognition
  • Assume affine transform: clusters of size ≥ 3
  • Looking for 3 matches out of 3000 that agree on
    same object and pose: too many outliers for
    RANSAC or LMS
  • Use Hough Transform
  • Each match votes for a hypothesis for object
    ID/pose
  • Voting for multiple bins & large bin size allow
    for error due to similarity approximation
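Each match predicts a similarity pose (translation, scale ratio, rotation) from the two keypoints' locations, scales and orientations. The sketch below votes into the two closest bins per dimension (16 bins per match) using 30° orientation bins, factor-of-2 scale bins and a caller-chosen location bin size, which is how we read Lowe's 2004 paper; treat these values and every name below as assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def pose_from_match(model_kp, scene_kp):
    """Similarity pose (tx, ty, scale, rotation_deg) predicted by one match.

    Keypoints are (x, y, scale, orientation_degrees); the pose maps the model
    keypoint onto the scene keypoint: scene = scale * R(rotation) * model + t.
    """
    mx, my, ms, mo = model_kp
    sx, sy, ss, so = scene_kp
    scale = ss / ms
    rotation = (so - mo) % 360.0
    c, s = math.cos(math.radians(rotation)), math.sin(math.radians(rotation))
    tx = sx - scale * (c * mx - s * my)
    ty = sy - scale * (s * mx + c * my)
    return tx, ty, scale, rotation


def two_nearest_bins(value):
    """The bin containing value plus the adjacent bin closer to it."""
    lo = math.floor(value)
    return (lo, lo + 1) if value - lo >= 0.5 else (lo - 1, lo)


def hough_votes(matches, loc_bin, angle_bin=30.0):
    """matches: list of (object_id, model_kp, scene_kp).

    Returns {(object_id, tx_bin, ty_bin, scale_bin, angle_bin): set of match indices}.
    """
    n_angle = int(round(360.0 / angle_bin))
    votes = defaultdict(set)
    for idx, (obj, model_kp, scene_kp) in enumerate(matches):
        tx, ty, scale, rotation = pose_from_match(model_kp, scene_kp)
        # Location bins of width loc_bin, scale bins a factor of 2 wide, angle bins of angle_bin degrees.
        choices = [two_nearest_bins(tx / loc_bin), two_nearest_bins(ty / loc_bin),
                   two_nearest_bins(math.log2(scale)), two_nearest_bins(rotation / angle_bin)]
        for bx, by, bs, ba in product(*choices):   # 2**4 = 16 bins per match
            votes[(obj, bx, by, bs, ba % n_angle)].add(idx)
    return votes


def clusters(votes, min_size=3):
    """Bins with at least min_size matches become pose hypotheses to verify."""
    return [(key, ids) for key, ids in votes.items() if len(ids) >= min_size]
```

Voting into neighbouring bins means that matches whose predicted poses straddle a bin boundary still meet in at least one shared bin.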

38
3. Feature matching
  • 3D object recognition: solve for pose
  • Affine transform of x,y to u,v
  • Rewrite to solve for transform parameters
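With the affine model u = m1·x + m2·y + tx, v = m3·x + m4·y + ty, each match contributes two linear equations in the six unknowns, so three matches determine the pose and more give a least-squares fit. The u and v equations share no unknowns, so this sketch solves two 3x3 normal-equation systems, which is equivalent to solving the stacked 6-parameter system. Function names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(model_pts, image_pts):
    """Least-squares affine fit from model (x, y) to image (u, v).

    Needs at least 3 non-collinear correspondences.
    Returns (m1, m2, m3, m4, tx, ty).
    """
    # Normal equations (A^T A) p = A^T b, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(model_pts, image_pts):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    m1, m2, tx = solve_3x3(ata, atu)
    m3, m4, ty = solve_3x3(ata, atv)
    return m1, m2, m3, m4, tx, ty
```

The residuals of this fit are what the verification step uses to discard outlier matches.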

39
3. Feature matching
  • 3D object recognition: verify model
  • Discard outliers for pose solution in previous step
  • Perform top-down check for additional features
  • Evaluate probability that match is correct
  • Use Bayesian model, with probability that
    features would arise by chance if object was not
    present
  • Takes account of object size in image, textured
    regions, model feature count in database,
    accuracy of fit (Lowe 01)

40
Planar recognition
  • Training images

41
Planar recognition
  • Reliably recognized at a rotation of 60° away
    from the camera
  • Affine fit approximates perspective projection
  • Only 3 points are needed for recognition

42
3D object recognition
  • Training images

43
3D object recognition
  • Only 3 keys are needed for recognition, so extra
    keys provide robustness
  • Affine model is no longer as accurate

44
Recognition under occlusion
45
Illumination invariance
46
Applications of SIFT
  • Object recognition
  • Panoramic image stitching
  • Robot localization
  • Video indexing
  • The Office of the Past
  • Document tracking and recognition

47
Location recognition
48
Robot Localization
49
Map continuously built over time
50
Locations of map features in 3D
51
  • Sony Aibo
  • SIFT usage
  • Recognize charging station
  • Communicate with visual cards
  • Teach object recognition

52
The Office of the Past
  • Paper everywhere

53
Unify physical and electronic desktops
Video camera
  • Recognize video of paper on physical desktop
  • Tracking
  • Recognition
  • Linking

Desktop
54
Unify physical and electronic desktops
Video camera
  • Applications
  • Find lost documents
  • Browse remote desktop
  • Find electronic version
  • History-based queries

Desktop
55
Example input video
56
Demo: Remote desktop
57
System overview
Video camera
Computer
User
Desk
58
System overview
Video of desk
59
System overview
Images from PDF
Video of desk
60
System overview
Images from PDF
Video of desk
Track recognize
61
System overview
Internal representation
Images from PDF
Video of desk
Track recognize
T
T1
62
System overview
Internal representation
Images from PDF
Video of desk
Track recognize
T
T1
Scene Graph
63
System overview
Where is my W-2?
Internal representation
Images from PDF
Video of desk
Track recognize
T
T1
64
System overview
Where is my W-2?
Answer
Internal representation
Images from PDF
Video of desk
Track recognize
Desk
Desk
T
T1
65
Assumptions
  • Document
  • Corresponding electronic copy exists
  • No duplicates of same document

66
Assumptions
  • Document
  • Corresponding electronic copy exists
  • No duplicates of same document
  • Motion
  • 3 event types: move/entry/exit
  • One document at a time
  • Only topmost document can move
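To make these assumptions concrete, here is a toy desk model in which documents are kept in bottom-to-top order and only the topmost document at a point may move or exit. The class names, the rectangle representation and the default page size are hypothetical; this is not the scene graph used in the paper.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Doc:
    name: str
    x: float          # top-left corner on the desk (hypothetical units)
    y: float
    w: float = 8.5    # hypothetical page size
    h: float = 11.0

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def overlaps(self, other):
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


class Desk:
    """Documents in bottom-to-top order; later entries lie on top of earlier ones."""

    def __init__(self):
        self.docs = []

    def _topmost_at(self, px, py):
        for doc in reversed(self.docs):
            if doc.contains(px, py):
                return doc
        return None

    def entry(self, name, x, y):
        self.docs.append(Doc(name, x, y))   # a new document lands on top

    def exit(self, px, py):
        doc = self._topmost_at(px, py)      # only the topmost document can leave
        if doc is not None:
            self.docs.remove(doc)
        return doc

    def move(self, x1, y1, x2, y2):
        doc = self._topmost_at(x1, y1)      # only the topmost document can move
        if doc is not None:
            self.docs.remove(doc)
            doc.x, doc.y = x2, y2
            self.docs.append(doc)           # the moved document ends up on top
        return doc

    def where(self, name):
        """Return (doc, names of documents lying on top of it), or None."""
        for i, doc in enumerate(self.docs):
            if doc.name == name:
                return doc, [d.name for d in self.docs[i + 1:] if d.overlaps(doc)]
        return None
```

A query such as "Where is my W-2?" then reduces to `where("W-2.pdf")`, which reports the document's position and what is stacked on top of it.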

67
Non-assumptions
  • Desk need not be initially empty

68
Non-assumptions
  • Desk need not be initially empty
  • Stacks may overlap

69
Algorithm overview
Input Frames


70
Algorithm overview
Input Frames


Event Detection
before
after
71
Algorithm overview
Input Frames


Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
72
Algorithm overview
Input Frames


Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
73
Algorithm overview
Input Frames


Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
74
Algorithm overview
Input Frames


Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
SIFT
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
75
Document tracking example
before
after
76
Document tracking example
before
after
77
Document tracking example
before
after
78
Document tracking example
before
after
79
Document tracking example
before
after
80
Document tracking example
before
after
81
Document tracking example
before
after
82
Document tracking example
before
after
83
Document tracking example
before
after
84
Document tracking example
Motion (x, y, θ)
before
after
85
Document Recognition
  • Match against PDF image database



File2.pdf
File3.pdf
File4.pdf
File5.pdf
File6.pdf
File1.pdf
86
Document Recognition
  • Performance analysis
  • Tested 20 pages against database of 162 pages

87
Document Recognition
  • Performance analysis
  • Tested 20 pages against database of 162 pages
  • 200x300 pixels per document for reliable match

(Plot: recognition rate vs. document resolution)
88
Document Recognition
  • Performance analysis
  • Tested 20 pages against database of 162 pages
  • 200x300 pixels per document for reliable match

(Plot: recognition rate vs. document resolution, with
values 0.9 and 300 marked)
89
Results
  • Input video
  • 40 minutes
  • 1024x768 @ 15 fps
  • 22 documents, 49 events
  • Running time
  • Video processed offline
  • No optimization
  • A few hours for entire video

90
Demo: Paper tracking
91
Photo sorting example
92
Photo sorting example
93
Demo: Photo sorting
94
Future work
  • Enhance realism
  • Handle more realistic desktops
  • Real-time performance
  • More applications
  • Support other document tasks
  • E.g., attach reminder, cluster documents
  • Beyond documents
  • Other 3D desktop objects, e.g., books/CDs

95
Summary
  • SIFT is
  • Scale/rotation invariant local feature
  • Highly distinctive
  • Robust to occlusion, illumination change, 3D
    viewpoint change
  • Efficient (close to real-time performance)
  • Suitable for many useful applications

96
References
  • David G. Lowe, "Distinctive image features from
    scale-invariant keypoints," International Journal
    of Computer Vision, 60(2), 2004, pp. 91-110.
  • Matthew Brown and David G. Lowe, "Recognising
    panoramas," International Conference on Computer
    Vision (ICCV 2003), Nice, France, October 2003,
    pp. 1218-1225.
  • Jiwon Kim, Steven M. Seitz and Maneesh Agrawala,
    "Video-Based Document Tracking: Unifying Your
    Physical and Electronic Desktops," ACM Symposium
    on User Interface Software and Technology
    (UIST 2004), pp. 99-107.