1
Object Recognition
  • Outline
  • Introduction
  • Representation: Concept
  • Representation: Features
  • Learning & Recognition
  • Segmentation & Recognition

2
  • Credits: major sources of material, including
    figures and slides, were:
  • Riesenhuber & Poggio. Hierarchical models of
    object recognition in cortex. Nature
    Neuroscience, 1999.
  • B. Mel. SEEMORE. Neural Computation, 1997.
  • Ullman, Vidal-Naquet & Sali. Visual features of
    intermediate complexity and their use in
    classification. Nature Neuroscience, 2002.
  • David G. Lowe. Distinctive Image Features from
    Scale-Invariant Keypoints. Int. J. of Computer
    Vision, 2004.
  • and various resources on the WWW

3
Why is it difficult?
  • Because appearance varies drastically with:
  • position/pose/scale
  • lighting/shadows
  • articulation/expression
  • partial occlusion
  • → need invariant recognition!

4
The Classical View
Historically: Image → Feature Extraction → Segmentation → Recognition
Problem: bottom-up segmentation only works in a
very limited range of situations! This
architecture is fundamentally flawed!
Two ways out: 1) direct recognition, 2)
integration of segmentation & recognition.
5
Ventral Stream
[Figure: the ventral stream hierarchy, from edges and bars (D. van Essen, V2) to objects and faces (K. Tanaka, IT); → larger RFs, higher complexity, higher invariance]
6
Basic Models
seminal work by Fukushima (the Neocognitron); a
newer version by Riesenhuber and Poggio
7
Questions
  • what are the intermediate features?
  • how/why are they being learned?
  • how is invariance computation implemented?
  • what nonlinearities at what level (dendrites?)
  • how is invariance learned?
  • temporal continuity? role of eye movements?
  • the basic model is feedforward; what do feedback
    connections do?
  • attention/segmentation/Bayesian inference?

8
Representation: Concept
  • 3-D models (won't talk about these)
  • view-based:
  • holistic descriptions of a view
  • invariant features / histogram techniques
  • spatial constellations of localized features

9
Holistic Descriptions I: Templates
  • Idea:
  • compare image (regions) directly to a template
  • image patches and object templates are represented
    as high-dimensional vectors
  • simple comparison metrics (Euclidean distance,
    normalized correlation, ...); a sketch follows below
  • Problem:
  • such metrics are not robust w.r.t. even small changes
    in position/aspect/scale or deformations
  • → difficult to achieve invariance

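A minimal sketch of such direct template comparison, assuming grayscale numpy arrays (the function names are illustrative):

    import numpy as np

    def normalized_correlation(patch, template):
        # Flatten, subtract means, and compare the directions of the two vectors.
        a = patch.astype(float).ravel();    a -= a.mean()
        b = template.astype(float).ravel(); b -= b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def best_match(image, template):
        # Slide the template over the image; keep the highest-scoring location.
        th, tw = template.shape
        best, best_xy = -np.inf, (0, 0)
        for y in range(image.shape[0] - th + 1):
            for x in range(image.shape[1] - tw + 1):
                s = normalized_correlation(image[y:y+th, x:x+tw], template)
                if s > best:
                    best, best_xy = s, (y, x)
        return best_xy, best

Even a small shift, scale change, or deformation of the object lowers the score, which is exactly the robustness problem noted above.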
10
Holistic Descriptions II: Eigenspace Approach
  • Somewhat better: Eigenspace approaches
  • perform Principal Component Analysis (PCA) on
    training images (e.g., Eigenfaces); see the sketch below
  • compare images by projecting onto a subset of the PCs

Murase & Nayar (1995)
Turk & Pentland (1992)
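A minimal sketch of the eigenspace approach, assuming aligned training images stacked as rows of a matrix (names are illustrative):

    import numpy as np

    def fit_eigenspace(X, k):
        # X: (n_images, n_pixels) matrix of vectorized, aligned training images.
        mean = X.mean(axis=0)
        # Rows of Vt are the principal components ("eigenfaces").
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:k]

    def project(x, mean, components):
        # Describe an image by its k coefficients in the eigenspace.
        return components @ (x - mean)

Two images are then compared by the distance between their projections rather than between raw pixel vectors.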
11
Assessment
  • quite successful for segmented and carefully
    aligned images (e.g., eyes and nose are at the
    same pixel coordinates in all images)
  • but similar problems as above:
  • not well-suited for clutter
  • problems with occlusions
  • some notable extensions try to deal with this
    (e.g., Leonardis, 1996, 1997)

12
Feature Histograms
Idea: reach invariance by computing invariant
features. Examples: Mel (1997), Schiele & Crowley
(1997, 2000).
Histogram pooling: throw occurrences of a simple
feature from all image regions together into one
bin (a sketch follows below).
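A minimal sketch of histogram pooling, assuming each local feature has already been quantized to an integer codebook index (a hypothetical preprocessing step):

    import numpy as np

    def pooled_histogram(feature_indices, n_bins):
        # Throw feature occurrences from all image regions into one histogram;
        # all spatial information is discarded.
        h = np.bincount(feature_indices, minlength=n_bins).astype(float)
        return h / max(h.sum(), 1.0)

    def intersection(h1, h2):
        # Histogram intersection: 1.0 for identical normalized histograms.
        return float(np.minimum(h1, h2).sum())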
13
  • Assessment
  • works very well for segmented images with
    only one object, but...
  • Problem:
  • histograms of simple features over the whole
    image lead to a superposition catastrophe;
    there is no binding mechanism
  • consider several objects in a scene: the histogram
    contains all their features, with no representation
    of which features came from the same object
  • the system breaks down for clutter or complex
    backgrounds

14
B. Mel (1997)
15
Training and test images, performance
[Figure: training and test images and recognition performance, panels A-E]
16
Feature Constellations
Observation: holistic templates and histogram
techniques can't handle cluttered scenes
well. Idea: how about constellations of
features? E.g., a face is a constellation of eyes,
nose, mouth, etc.
17
Representation: Features
  • Only discuss local features:
  • image patches
  • wavelet bases, e.g., Haar, Gabor
  • complex features, e.g., SIFT (Scale-Invariant
    Feature Transform)

18
Image Patches
Ullman, Vidal-Naquet, Sali (2002)
Fragments are selected by their merit: the mutual
information I(F;C) between fragment detection F and
object class C. In classification, a detected fragment
votes with a weight given by its log-likelihood ratio,
w_F = log [ p(F|C) / p(F|¬C) ] (a sketch follows).
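A minimal sketch of the merit computation, assuming binary fragment detections and binary class labels (numpy arrays of 0/1):

    import numpy as np

    def fragment_merit(detected, labels):
        # Mutual information I(F;C) between fragment detection and class,
        # estimated from empirical co-occurrence frequencies.
        mi = 0.0
        for f in (0, 1):
            for c in (0, 1):
                p_fc = np.mean((detected == f) & (labels == c))
                if p_fc > 0:
                    p_f = np.mean(detected == f)
                    p_c = np.mean(labels == c)
                    mi += p_fc * np.log2(p_fc / (p_f * p_c))
        return mi

Fragments of intermediate complexity maximize this merit, which is the point of the next slide.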
19
Intermediate complexity is best (trivial result,
really)
20
Recognition examples
21
Gabor Wavelets
[Figure: Gabor wavelets in image space and in frequency space]
  • in frequency space a Gabor wavelet is a Gaussian
  • different wavelets are scaled/rotated versions
    of a mother wavelet

22
Gabor Wavelets as filters
Gabor filters: sin() and cos() parts.
Compute the correlation of the image with the filter
at every location x0.
23
Tiling of frequency space: Jets
[Figure: measured frequency tuning of biological
neurons (left) and dense coverage of frequency space]
Applying different Gabor filters (with different
k) to the same image location gives a vector of
filter responses: a Jet (a sketch follows below).
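A minimal sketch of computing a Jet, assuming a grayscale numpy image and a location far enough from the border (all parameter values are illustrative):

    import numpy as np

    def gabor_kernel(size, wavelength, theta, sigma):
        # Complex Gabor: an oriented plane wave under a Gaussian envelope.
        r = np.arange(size) - size // 2
        x, y = np.meshgrid(r, r)
        xr = x * np.cos(theta) + y * np.sin(theta)
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        return envelope * np.exp(2j * np.pi * xr / wavelength)

    def jet(image, x0, y0, size=21, wavelengths=(4, 8, 16), n_orient=4):
        # Responses of filters of different scales/orientations at one location.
        h = size // 2
        patch = image[y0-h:y0+h+1, x0-h:x0+h+1].astype(float)
        responses = []
        for lam in wavelengths:
            for k in range(n_orient):
                g = gabor_kernel(size, lam, k * np.pi / n_orient, sigma=0.5 * lam)
                responses.append(abs(np.sum(patch * np.conj(g))))
        return np.array(responses)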
24
SIFT Features
  • step 1: find scale-space extrema

25
  • step 2: apply contrast and curvature requirements
    (a sketch follows below)

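A minimal sketch of these two tests, following the criteria in Lowe (2004) and assuming dog is a 2-D difference-of-Gaussians array with values normalized to [0, 1]:

    import numpy as np

    def passes_keypoint_tests(dog, y, x, contrast_thresh=0.03, r=10.0):
        # Contrast test: discard low-contrast extrema.
        if abs(dog[y, x]) < contrast_thresh:
            return False
        # Curvature test: reject edge-like points where the ratio of the
        # principal curvatures of the 2x2 Hessian is too large.
        dxx = dog[y, x+1] + dog[y, x-1] - 2 * dog[y, x]
        dyy = dog[y+1, x] + dog[y-1, x] - 2 * dog[y, x]
        dxy = (dog[y+1, x+1] - dog[y+1, x-1]
               - dog[y-1, x+1] + dog[y-1, x-1]) / 4.0
        tr, det = dxx + dyy, dxx * dyy - dxy**2
        return det > 0 and tr**2 / det < (r + 1)**2 / r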
26
  • step 3: the local image descriptor extracted at
    keypoints is a 128-dim vector

27
Learning and Recognition
  • top-down model matching
  • Elastic graph matching
  • bottom-up indexing
  • with or without shared features

28
Elastic Graph Matching (EGM)
Representation: graph nodes labelled with Jets
(Gabor filter responses of different
scales/orientations). Matching: minimize a cost
function that punishes dissimilarities of Gabor
responses and distortions of the graph, through
stochastic optimization techniques (a sketch follows below).
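A minimal sketch of such a cost function, assuming jets have already been extracted at the current node positions (the data layout and lambda value are illustrative):

    import numpy as np

    def jet_similarity(j1, j2):
        # Normalized dot product of jet magnitudes.
        return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-12))

    def match_cost(image_jets, model_jets, pos, model_pos, edges, lam=0.1):
        # Reward similar Gabor responses at corresponding nodes; punish
        # distortion of graph edges relative to the model graph.
        sim = sum(jet_similarity(image_jets[i], model_jets[i])
                  for i in range(len(model_jets)))
        distort = sum(np.sum(((pos[a] - pos[b]) - (model_pos[a] - model_pos[b]))**2)
                      for a, b in edges)
        return -sim + lam * distort

Matching then repeatedly perturbs node positions at random (re-extracting the image jets) and keeps changes that lower this cost.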
29
Bunch Graphs
Idea: add invariance by labelling graph nodes
with a collection or "bunch" of different feature
exemplars (Wiskott et al., 1995, 1997). Advantage:
this decouples finding the facial features from the
identification. Matching uses a MAX rule (a sketch
follows below).
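A minimal sketch of the MAX rule over a bunch of exemplar jets (continuing the illustrative jet representation from the sketches above):

    import numpy as np

    def jet_similarity(j1, j2):
        return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-12))

    def bunch_similarity(image_jet, bunch):
        # Compare against every exemplar in the bunch (e.g., many different
        # eyes) and keep the best, so localization works for unseen faces.
        return max(jet_similarity(image_jet, exemplar) for exemplar in bunch)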
30
Indexing Methods
  • when you want to recognize very many objects,
    it's inefficient to individually check for each
    model by searching for all of its features in a
    top-down fashion
  • better: indexing methods
  • also share features among object models

31
Recognition with SIFT features
  • recognition: extract SIFT features; match each to its
    nearest neighbor in a database of stored features;
    use a Hough transform to pool votes (a simplified
    sketch follows below)

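A simplified sketch of the matching stage, assuming 128-dim descriptors in numpy arrays. Lowe's ratio test replaces an absolute distance threshold, and votes are pooled per object here rather than in full Hough pose bins (location/scale/orientation), which the real system uses:

    import numpy as np

    def match_and_vote(query_desc, db_desc, db_labels, ratio=0.8):
        # Nearest-neighbor matching of SIFT descriptors; each accepted
        # match casts one vote for the object model it was stored under.
        votes = {}
        for d in query_desc:
            dists = np.linalg.norm(db_desc - d, axis=1)
            i, j = np.argsort(dists)[:2]
            if dists[i] < ratio * dists[j]:   # keep distinctive matches only
                votes[db_labels[i]] = votes.get(db_labels[i], 0) + 1
        return votes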
32
Recognition with Gabor Jets and Color Features
33
Scaling Behavior when Sharing Features between
Models
  • Recognition speed is limited more by the number
    of features than by the number of object models;
    a modest number of features is o.k.
  • can incorporate many feature types
  • can incorporate stereo (reasoning about
    occlusions)

34
Hierarchies of Features
  • Long history of using hierarchies:
  • Fukushima's Neocognitron (1983)
  • Nelson & Selinger (1998, 1999)
  • Advantages of using a hierarchy:
  • faster learning and processing
  • better grip on correlated deformations
  • easier to find the proper specificity
    vs. invariance tradeoff?

35
Feature Learning
  • Unsupervised clustering is not necessarily optimal
    for discrimination
  • Use a big bag of features and fish out the useful
    ones (e.g., via boosting; Viola, 1997); takes very
    long to train, since you have to consider every
    feature from that big bag (a sketch follows this list)
  • Note: the usefulness of one feature depends on
    which other ones you are already using.
  • Learn higher-level features as (nonlinear)
    combinations of lower-level features (Perona
    et al., 2000); also takes very long to train,
    only up to 5 features. But could use a locality
    constraint

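A minimal sketch of boosting-style selection from a big bag of binary features; the reweighting is what makes a feature's usefulness depend on the features already chosen (simplified: no threshold/polarity search as in Viola's stumps):

    import numpy as np

    def select_features(F, labels, k):
        # F: (n_samples, n_features) binary responses; labels: 0/1 array.
        n, m = F.shape
        w = np.ones(n) / n
        chosen = []
        for _ in range(k):
            # Weighted error of each feature used as a direct predictor;
            # every feature in the bag is scanned on every round (slow!).
            errs = np.array([np.sum(w * (F[:, j] != labels)) for j in range(m)])
            j = int(np.argmin(errs))
            chosen.append(j)
            eps = min(max(errs[j], 1e-12), 1 - 1e-12)
            alpha = 0.5 * np.log((1 - eps) / eps)
            # Emphasize the examples this feature gets wrong.
            agree = np.where(F[:, j] == labels, 1.0, -1.0)
            w *= np.exp(-alpha * agree)
            w /= w.sum()
        return chosen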
36
Feedback
Question: why all the feedback connections in the
brain? Important for on-line processing?
Neuroscience: object recognition in 150 ms (Thorpe
et al., 1996), but interesting temporal response
properties of IT neurons (Oram & Richmond, 1999);
some V1 neurons restore a line behind an
occluder.
Idea: a feed-forward architecture can't
correct errors made at early stages later on. A
feedback architecture can! High-level
hypotheses try to reinforce their lower-level
evidence while hypotheses compete at all
levels.
37
Recognition & Segmentation
  • Basic idea: integrate recognition with
    segmentation in a feedback architecture
  • object hypotheses reinforce their supporting
    evidence and inhibit competing evidence,
    suppressing features that do not belong to them
    (the idea goes back to at least the PDP books)
  • at the same time, restore features missing due to
    partial occlusion (associative memory property)

38
Current work in this area
  • mostly demonstrating how recognition can aid
    segmentation
  • what is missing is a clear and elegant
    demonstration of a truly integrated system that
    shows how the two kinds of processing help each
    other
  • maybe don't treat them as two kinds of processing
    but as one inference problem
  • how best to do this? The million-dollar question.