Transcript and Presenter's Notes

Title: CS 223-B Part A Lect.: Advanced Features


1
CS 223-B Part A Lect.: Advanced Features
  • Sebastian Thrun
  • Gary Bradski

http://robots.stanford.edu/cs223b/index.html
2
Readings
  • This lecture is in 2 separate parts: A -
    Fourier, Gabor, SIFT; and B - Texture and other
    operators. B is optional due to time
    limitations, but good to look through nevertheless.
  • Read:
  • Computer Vision, Forsyth & Ponce.
  • Chapters 7 and (optional, for texture) 9, but do
    it lightly, just for the gist.
  • David G. Lowe, Distinctive Image Features from
    Scale-Invariant Keypoints, IJCV '04.
  • Just read/take notes on the basic flow of the
    algorithm.
  • W. Freeman and E. Adelson, The Design and Use of
    Steerable Filters, IEEE Trans. Patt. Anal. and
    Machine Intell., Vol. 13, No. 9.
  • Read pages 1-15.

3
Left over questions
  • Calibration question: the optimization is based
    on gradient-descent iterations, which depend on
    finding a good initial starting guess.
  • How do we scale image derivatives?? Great
    question!
  • Images exist as brightness values over pixels.
    What, then, are the units of a simple derivative
    operator like [-1 0 1]?

In the features lecture, we only wanted to find
edges (identification), but what if we had
instead wanted to make measurements?
In optical flow, we end up wanting to calculate
the velocity v_x (in pixels), which is found (in
the optical flow lecture) to equal -I_t / I_x:
the temporal derivative (image difference)
I(t+1) - I(t), in brightness units, divided by
the spatial derivative I_x, in brightness/pixel:

v_x [pixels] = -I_t / I_x = [brightness] / ([brightness]/[pixel])

Oops! The [-1 0 1] kernel spans 2 pixels, so our
derivative is a factor of 2 too great => NEED TO
NORMALIZE: I_x = [-1/2 0 1/2].
4
Good Features beat Good Algorithms
  • For tasks such as recognition, tracking, and
    segmentation, experience shows:
  • With the right features, all algorithms will
    work well.
  • With the wrong features, good algorithms will
    work marginally better than bad/simple
    algorithms, but they won't work well.

5
Fourier Transform 1
  • Foundational trick: represent signal/data in
    terms of an orthogonal basis. For example, a
    vector v in 3-space can be represented as a
    projection onto 3 orthonormal vectors.
  • In the same way, a function can be represented as
    a point projected into a space of (infinitely
    many) orthogonal functions. For Fourier
    transforms, we project a function onto a space of
    cosines and sines.
  • Intuitively, how do we know this sin, cos basis
    is orthogonal?
  • Sin or cos periodically spends as much time above
    as below the axis. If the frequencies are
    mismatched, the functions will cancel each other
    out over minus to plus infinity.
  • Formally, one could use the trigonometric
    product-to-sum identities to prove this (see the
    orthogonality relations below).

Eqns from Computer Vision IT412
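The orthogonality relations alluded to above (the slide's equation images did not survive extraction; these are the standard identities, stated over one period for integers m, n >= 1):

\int_0^{2\pi} \sin(mx)\sin(nx)\,dx = \pi\,\delta_{mn}
\int_0^{2\pi} \cos(mx)\cos(nx)\,dx = \pi\,\delta_{mn}
\int_0^{2\pi} \sin(mx)\cos(nx)\,dx = 0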
6
Fourier Transform 2
The Fourier transform is defined over a
continuous domain.
The inverse transform recovers the signal from
its frequency components.
In general, the Fourier transform is complex.
The Fourier spectrum is its magnitude, the phase
is its angle, and we often view the power
spectrum.
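Since the slide's equations did not survive extraction, here are the standard 1D definitions (one common convention):

F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-i 2\pi u x}\, dx
f(x) = \int_{-\infty}^{\infty} F(u)\, e^{i 2\pi u x}\, du

Writing F(u) = R(u) + i\,I(u):
Spectrum: |F(u)| = \sqrt{R^2 + I^2}
Phase: \phi(u) = \tan^{-1}(I/R)
Power: P(u) = |F(u)|^2 = R^2 + I^2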
7
Fourier Properties
The Fourier transform:
Is linear.
Its spatial scale is inverse to frequency.
A shift in space goes to a phase change.
Its symmetry for real signals: F(-u) is the
complex conjugate of F(u).
Convolution property: convolution in space is
multiplication in frequency.
Note that the scale property implies a delta
function goes to a uniform function.
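The same properties as formulas (standard results, restated here because the slide's equations did not survive):

Linearity: a\,f(x) + b\,g(x) \leftrightarrow a\,F(u) + b\,G(u)
Scaling: f(ax) \leftrightarrow \frac{1}{|a|}\,F(u/a)
Shift: f(x - x_0) \leftrightarrow e^{-i 2\pi u x_0}\,F(u)
Symmetry (real f): F(-u) = F^*(u)
Convolution: (f * g)(x) \leftrightarrow F(u)\,G(u)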
8
Fourier Discrete (DFT)
  • Animals and machines live in a discrete world.
    To move the continuous Fourier world to its
    discrete version, we sample:
  • => Multiply by an infinite series of delta
    functions spaced a sampling interval apart.
  • => In the frequency domain, this convolves the
    spectrum with an impulse train at the inverse
    spacing, replicating the spectrum.

9
Fourier Discrete (DFT) 2
All real-world signals are band limited: they
don't have infinite frequencies or infinite
spatial extent. This is good; otherwise our
discrete Fourier copies would collide and alias
together. But what if we still sample too seldom?
Even band-limited copies will eventually collide.
How do we keep the copies apart? Sample at at
least twice the signal's band-limit frequency =>
the Nyquist criterion.
10
2D DFT
Discrete Fourier Transform (DFT)
Inverse DFT
On serial machines, the DFT is optimally
implemented via the Fast Fourier Transform (FFT);
the direct DFT can be faster on parallel machines.
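The 2D transform pair that the two labels above refer to, in one standard form (normalization conventions vary):

F(u,v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-i 2\pi (ux/M + vy/N)}
f(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\, e^{i 2\pi (ux/M + vy/N)}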
11
Fourier Examples
Figure pairs (raw image vs. Fourier amplitude):
Sinusoid, higher frequency: DC term + side lobes, wide spacing.
Sinusoid, lower frequency: DC term + side lobes, close spacing.
Sinusoid, tilted: tilted spectrum.
Images from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
12
More Fourier Examples
  • Fourier basis element example (real part):
    F_{u,v}(x,y), with real part cos(2\pi(ux+vy)).
  • F_{u,v}(x,y) = const. along lines where
    (ux + vy) = const.
  • For the vector (u,v):
  • Magnitude gives frequency.
  • Direction gives orientation.
Slides from Marc Pollefeys, Comp 256 lecture 7
13
More Fourier Examples
Here u and v are larger than in the previous
slide.
Slides from Marc Pollefeys, Comp 256 lecture 7
14
More Fourier Examples
And larger still...
Slides from Marc Pollefeys, Comp 256 lecture 7
15
Fourier Filtering
Multiplying by a filter in the frequency domain
<=> convolving with the filter in the spatial
domain.
Fourier Amplitude
Images from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
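A minimal NumPy sketch of the idea, using an ideal low-pass mask chosen purely for illustration (practical filters taper smoothly to avoid ringing; all names here are ours, not the slides'):

    import numpy as np

    def lowpass_fft(image, cutoff=0.25):
        # Low-pass filter by masking the FFT; cutoff is a fraction of Nyquist.
        F = np.fft.fftshift(np.fft.fft2(image))   # move the DC term to the center
        h, w = image.shape
        yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
        radius = np.sqrt((yy / (h / 2.0)) ** 2 + (xx / (w / 2.0)) ** 2)
        F[radius > cutoff] = 0                    # zero out the high frequencies
        return np.real(np.fft.ifft2(np.fft.ifftshift(F)))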
16
Fourier Lens
Remember that the Fourier transform takes delta
functions to uniform, and uniform to delta?
Figures from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
17
Phase Carries More Information
Raw Images
Reconstruct (inverse FFT) mixing the magnitude
and phase images
18
Phase Coherence for Feature Detection?
Note that the Fourier components of a square
wave cohere (are in phase) at the step junction:
here, they must all pass through zero right at
the step edge, and achieve local maxima at the
corners.
Phase coherence is maximal at the corner points
of triangle and trapezoid waves too.
Images: Peter Kovesi, Proc. VIIth Digital Image
Computing: Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
19
Phase Coherence for Feature Detection
Gist of the idea: the Fourier transform yields a
series of real and imaginary sinusoidal terms. At
any point x, the local Fourier components will
each have an amplitude A_n(x) and a phase angle
phi_n(x). Vector addition of these terms yields a
vector E(x) at the average phase angle.
Morrone defined a measure that is 1 at absolute
phase coherence (everything points in the same
direction) and zero for no phase coherence. Local
maxima indicate edges and corners, insensitive to
contrast in the image.
In practice, these local components are
calculated with Gabor filters at several
orientations, which can yield oriented edges and
corners, as formalized below.
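One common statement of Morrone's measure (notation as in Kovesi's papers; restated since the slide's own equation did not survive):

PC(x) = \frac{|E(x)|}{\sum_n A_n(x)}

This is 1 when every local component is exactly in phase and falls toward 0 as they cancel.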
Images: Peter Kovesi, Proc. VIIth Digital Image
Computing: Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
20
Phase Coherence for Feature Detection
Comparison of phase coherence vs. the Harris
corner detector. The Harris response varies by 2
or more orders of magnitude: how do you threshold
it? Phase coherence can only vary between 0 and
1, and is not sensitive to contrast or lighting.
Images: Peter Kovesi, Proc. VIIth Digital Image
Computing: Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
21
Gabor filters and Jets
  • Global information is used for physical-systems
    identification.
  • Example: the impulse response of a centrifuge
    identifies resonance points, which indicate
    which spin frequencies to avoid.
  • Local information is used for physical signal
    analysis.
  • In images, it is the relationship of details that
    matters, not (usually) things like average
    brightness.
  • In 1946, Gabor suggested representing signals
    jointly over space/time and frequency in what he
    called information diagrams. He showed that a
    Gaussian occupies minimal area in such diagrams.
    Time analysis and frequency analysis are the two
    extremes of such an analysis.

22
Gabor filters and Jets
  • Gabor filters are formed by modulating a complex
    sinusoid by a Gaussian function.
  • Gabor filters became popular in vision partly
    because J. G. Daugman (1980, '88, '90) showed
    that the receptive fields of most
    orientation-selective neurons in the (cat's)
    brain look very much like Gabor functions.
  • As with Gabor filters, the brain often makes use
    of overcomplete, non-orthogonal functions.

J. G. Daugman, Two dimensional spectral analysis
of cortical receptive field profiles, Vision
Res., vol. 20, pp. 847-856, 1980.
J. Daugman, Complete discrete 2-D Gabor
transforms by neural networks for image analysis
and compression, IEEE Transactions on Acoustics,
Speech, and Signal Processing, vol. 36, no. 7,
pp. 1169-1179, 1988.
Daugman, J. G. (1990) An information-theoretic
view of analogue representation in striate
cortex, Computational Neuroscience, Ed. Schwartz,
E. L., Cambridge, MA: MIT Press, 403-424.
23
Gabor filters and Jets
Rotated Gaussian
Oriented Complex Sinusoid
2D Gabor filter
Depending on one's task (object ID, texture
analysis, tracking, ...), one must then decide
what size filters, in what orientations, and at
what frequencies to use.
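The three labels above refer to the standard 2D Gabor construction. Since the slide's equations did not survive extraction, here is one common parameterization:

x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta
g(x,y) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\!\left(i\left(\frac{2\pi x'}{\lambda} + \psi\right)\right)

That is, a Gaussian envelope (rotated by theta, with aspect ratio gamma) multiplying an oriented complex sinusoid of wavelength lambda and phase offset psi.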
24
Gabor filters and Jets
In practice, once the scales, orientations, and
radial frequencies are chosen, one usually sets
up filters in quadrature (90-degree phase shift)
pairs and just empirically normalizes them such
that the response to a uniform background is zero.
Quadrature pairs; in practice the center point
(p,q) is set to (0,0).
The magnitude response is then calculated as the
root sum of squares of the even (cosine) and odd
(sine) filter responses:
m = sqrt(r_even^2 + r_odd^2).
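A minimal NumPy sketch of such a quadrature pair (the function and parameter names are illustrative, not from the slides):

    import numpy as np

    def gabor_pair(size=31, sigma=4.0, theta=0.0, wavelength=8.0, gamma=0.5):
        # Quadrature (even/odd) Gabor pair: a rotated Gaussian envelope
        # times cosine and sine carriers at the chosen wavelength.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
        even = envelope * np.cos(2 * np.pi * xr / wavelength)
        odd = envelope * np.sin(2 * np.pi * xr / wavelength)
        even -= even.mean()   # empirical fix: zero response to uniform input
        return even, odd

The magnitude image is then sqrt(conv(I, even)^2 + conv(I, odd)^2).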
25
Gabor filters and Jets
Von der Malsburg organized Gabor filters at
multiple scales and orientations into a vector,
or "Jet".
A graph of such Jets (Elastic Graph Matching)
has proven to be a good primitive for object
recognition.
L. Wiskott, J.-M. Fellous, N. Krüger, C. von der
Malsburg, Face Recognition by Elastic Bunch Graph
Matching, IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 19(7), July 1997,
pp. 775-779.
Image from Laurenz Wiskott,
http://itb.biologie.hu-berlin.de/~wiskott/
26
Gabor filters and Jets Example
Gabor Filters used
Gang Song, Tao Wang, Yimin Zhang, Wei Hu,
Guangyou Xu, Gary Bradski, Face Modeling and
Recognition Using Bayesian Networks, Submitted
to CVPR 2004
27
Scale
  • 3D-to-2D perspective projection gives widely
    varying scale for the same object. Computer
    vision needs to address scale.
  • The Gabor discussion above addressed image scale
    via the sigma of the modulating Gaussians and the
    frequency of the complex sinusoid.
  • We can deal with scale directly by repeatedly
    down-sampling the image to look for coarser and
    coarser patterns. We call this scale space, or
    Image Pyramids.

28
Image Pyramids
Commonly, we down-sample by 2 or sqrt(2); sqrt(2)
obviously calls for inter-pixel interpolation.
For down-sampling by 2, a typical Gaussian sigma
is 1.4; for sqrt(2), sigma is typically sqrt(1.4).
A full pyramid is cheap: the power-of-2 series
1 + 1/4 + 1/16 + ... adds only a third more
pixels to process, and a sqrt(2) pyramid at most
doubles them.
Laplacian Pyramid (Error Pyramid)
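A minimal NumPy/SciPy sketch of a power-of-2 pyramid and its error pyramid (the difference-of-blur form below is a simplification of the classic construction; names are illustrative):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_pyramid(image, levels=4, sigma=1.4):
        # Power-of-2 pyramid: blur, then keep every other pixel.
        pyramid = [image.astype(float)]
        for _ in range(levels - 1):
            blurred = gaussian_filter(pyramid[-1], sigma)
            pyramid.append(blurred[::2, ::2])
        return pyramid

    def error_pyramid(gauss_pyr, sigma=1.4):
        # Each level minus its blurred self: a simple Laplacian-style
        # (difference-of-Gaussian) error pyramid.
        return [g - gaussian_filter(g, sigma) for g in gauss_pyr]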
29
Steerability
Bill Freeman, in his 1992 thesis, determined the
necessary conditions for steerability -- the
ability to synthesize a filter of any orientation
from a linear combination of filters at fixed
orientations. The simplest example of this is
oriented first-derivative-of-Gaussian filters at
0 and 90 degrees:
Steering eqn: G1(theta) = cos(theta) G1(0) + sin(theta) G1(90)
[Figure: the 0- and 90-degree filter set, the
synthesized 30-degree filter, and its response
on the raw image.]
Taken from W. Freeman, T. Adelson, The Design
and Use of Steerable Filters, IEEE Trans.
Patt. Anal. and Machine Intell., vol. 13, no. 9,
pp. 891-900, Sept 1991
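A small NumPy/SciPy sketch of the steering equation for first-derivative-of-Gaussian filters (names are illustrative):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def steered_response(image, theta, sigma=2.0):
        # Basis responses: first derivative of a Gaussian at 0 and 90 degrees.
        gx = gaussian_filter(image.astype(float), sigma, order=(0, 1))  # d/dx
        gy = gaussian_filter(image.astype(float), sigma, order=(1, 0))  # d/dy
        # Steering: synthesize the response at any angle from the two bases.
        return np.cos(theta) * gx + np.sin(theta) * gy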
30
Steerability
Freeman showed that any band-limited filter can
form a steerable basis, with as many basis
filters as it has non-zero Fourier coefficients.
An important example is the 2nd derivative of
Gaussian (Laplacian):
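For the second derivative of a Gaussian, three basis filters suffice; the interpolation functions given by Freeman and Adelson are:

G_2^{\theta} = k_a(\theta)\,G_{2a} + k_b(\theta)\,G_{2b} + k_c(\theta)\,G_{2c}
k_a(\theta) = \cos^2\theta, \quad k_b(\theta) = -2\cos\theta\sin\theta, \quad k_c(\theta) = \sin^2\theta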
Taken from W. Freeman, T. Adelson, The Design
and Use of Steerable Filters, IEEE Trans. Patt.
Anal. and Machine Intell., vol. 13, no. 9, pp.
891-900, Sept 1991
31
Steerable Pyramid
We may combine steerability with pyramids to get
a steerable Laplacian pyramid, as shown below.
[Figure: decomposition and reconstruction
diagrams. The pyramid levels are band pass with
a high pass on top and a low-pass residual at
the bottom; the band-pass levels are oriented.
Also shown: a 2-level decomposition of a
white-circle example.]
Images from http://www.cis.upenn.edu/~eero/steerpyr.html
32
Scale Invariant Feature Transform
  • The idea is to find local features that stay the
    same (as much as possible) under:
  • Scale change
  • 2D rotation in the image x,y plane
  • 3D rotation (affine variation)
  • Illumination change
  • Collections of such features can be used for
    reliable:
  • 3D object recognition
  • User interfaces, toy interfaces
  • Robot localization, navigation, and mapping
  • Digital image stitching and organization
  • 3D scene understanding

33
Scale Invariant Feature Transform
  • High-level algorithm:
  • Find peak responses (over scale) in a Laplacian
    (difference-of-Gaussian) pyramid.
  • Localize each response with sub-pixel accuracy.
  • Only keep corner-like responses.
  • Assign an orientation.
  • Create a recognition signature.
  • Solve for affine parameters (3D rotation changes).

34
Scale Invariant Feature Transform
From a Gaussian scale pyramid, create
Difference-of-Gaussian (DoG) images,
and find maximum responses over space and scale.
Images from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
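A rough NumPy/SciPy sketch of this detection stage: a DoG stack plus a 3x3x3 local-extremum test. Thresholds and names are illustrative, and the sub-pixel fit and corner test of the next slide are omitted:

    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter

    def dog_extrema(image, sigmas=(1.0, 1.4, 2.0, 2.8, 4.0), thresh=0.02):
        # Difference-of-Gaussian stack across a set of scales.
        blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
        dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
        # Keep points that are local maxima over space and scale
        # (minima would be handled symmetrically) and respond strongly.
        local_max = maximum_filter(dog, size=3) == dog
        return np.argwhere(local_max & (dog > thresh))  # rows of (scale, y, x)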
35
Scale Invariant Feature Transform
At the location and scale of each peak found,
find the gradient orientation.
Use the gradients to keep only corner-like
peaks, in a manner similar to the Harris corner
detector.
At each peak location and scale, use the
gradients to form slip-tolerant
orientation-histogram recognition keys.
Images from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
36
Scale Invariant Feature Transform
To account for out-of-image-plane (3D) rotation,
solve for affine distortion parameters.
For the features found, set up a system of
equations, which takes the form A x = b. The
over-determined (least-squares) solution is then
x = (A^T A)^{-1} A^T b.
Eqns from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
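In code one would rarely form the normal equations explicitly; with NumPy, for example (A and b stand for whatever stacked feature equations have been built):

    import numpy as np
    x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)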
37
Scale Invariant Feature Transform
Recognition example: models of SIFT features
were learned, with object outlines taken from
background subtraction.
Objects may then be found under occlusion and 3D
rotation.
Images from David Lowe, Object Recognition from
Local Scale-Invariant Features, Proc. of the
International Conference on Computer Vision,
Corfu (Sept. 1999)
38
Scale Invariant Feature Transform
Image stitching example: attach images together
from keypoints by solving the homography, then
find similar images in a roll and stitch them.
Images from M. Brown and D. G. Lowe, Recognising
Panoramas, in Proceedings of the 9th
International Conference on Computer Vision
(ICCV 2003)
39
Scale Invariant Feature Transform
Localizing example:
Find different views of the same scene in video [2].
Given key images, find and trigger on them [1].
2) Josef Sivic and Andrew Zisserman, Video
Google: A Text Retrieval Approach to Object
Matching in Videos, ICCV 2003
1) David G. Lowe, Distinctive Image Features from
Scale-Invariant Keypoints, submitted to the
International Journal of Computer Vision, version
dated June 2003
40
Log-Polar Transform
Go from Euclidean (x,y) to log-polar space:
log(r e^{i theta}) => (log r, theta) space.
The log-polar transform is always done relative
to a chosen center point (x_c, y_c).
  • Images, and further advances, in George Wolberg,
    Siavash Zokai, Robust Image Registration Using
    Log-Polar Transform, ICIP 2000

Rotation and scale are converted to shifts along
the theta or log r axis. Shifting back to a
canonical location gives rotation and scale
invariance. If used on a Fourier magnitude image
(translation invariant), we get rotation, scale,
and translation invariance (called the
Fourier-Mellin transform).
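A minimal NumPy/SciPy sketch of the resampling (OpenCV's warpPolar offers the same in one call; all names here are illustrative):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def log_polar(image, center=None, output_shape=(256, 256)):
        # Resample an image onto a (log r, theta) grid about a center point.
        h, w = image.shape
        cy, cx = center if center is not None else (h / 2.0, w / 2.0)
        n_r, n_theta = output_shape
        r_max = np.hypot(max(cy, h - cy), max(cx, w - cx))
        log_r = np.linspace(0, np.log(r_max), n_r)
        theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
        rr = np.exp(log_r)[:, None]               # radii on a log scale
        ys = cy + rr * np.sin(theta)[None, :]
        xs = cx + rr * np.cos(theta)[None, :]
        return map_coordinates(image, [ys, xs], order=1)

Rotation of the input then becomes a shift along the theta axis of the output, and scaling a shift along the log r axis.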
41
Bilateral Filtering
  • We want smoothing that preserves edges. Typically
    this is done via Perona and Malik's anisotropic
    diffusion; more clever is the Tomasi and Manduchi
    approximation (sketched after the citation below).
  • Rather than just convolving with a Gaussian in
    space,
  • the convolution weights use a Gaussian in space
    together with a Gaussian in gray-level values.

C. Tomasi and R. Manduchi, "Bilateral Filtering
for Gray and Color Images", Proceedings of the
1998 IEEE International Conference on Computer
Vision, Bombay, India
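A brute-force sketch of that weighting, assuming NumPy (in practice OpenCV's bilateralFilter is the usual choice; parameter names here are illustrative):

    import numpy as np

    def bilateral(image, sigma_space=3.0, sigma_range=25.0, radius=6):
        # Weights = spatial Gaussian * gray-level (range) Gaussian.
        img = image.astype(float)
        h, w = img.shape
        padded = np.pad(img, radius, mode='reflect')
        out = np.zeros_like(img)
        norm = np.zeros_like(img)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                shifted = padded[radius + dy:radius + dy + h,
                                 radius + dx:radius + dx + w]
                weight = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_space ** 2))
                          * np.exp(-(shifted - img) ** 2 / (2 * sigma_range ** 2)))
                out += weight * shifted
                norm += weight
        return out / norm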
42
But Bio-Vision is more dynamic
  • Artifacts of a competitive edge/diffusion
    process: the Neon Color Spreading illusion.

The best explanation is Grossberg and Mingolla's:
edge detectors need to be shut off, which is
performed by competitive inhibition. When weaker
edges meet stronger ones, the weaker edge is
suppressed, breaking the dikes that hold back the
diffusion process. When the edges are
disconnected, the illusion goes away or is
diminished (below).
Grossberg, S., & Mingolla, E. (1985). Neural
Dynamics of Form Perception: Boundary Completion.
Psychol. Rev., 92, 173-211.
43
Local vs. Global
Still, vision is a stranger thing than simple
processing
44
Local vs. Global
Still, vision is a stranger thing than simple
processing
45
Computer vision often misses the fact that vision
is an active sense
These lines are straight
Nothing is moving here