Introduction to Object Recognition

Slides: 55

Provided by: george76

Learn more at: https://www.cse.unr.edu

1
Introduction to Object Recognition

CS773C Machine Intelligence Advanced Applications
Spring 2008 Object Recognition

2
Outline

The Problem of Object Recognition
Approaches to Object Recognition
Requirements and Performance Criteria
Representation Schemes
Matching Schemes
Example Systems
Indexing
Grouping
Error Analysis

3
Problem Statement

Given some knowledge of how certain objects may
appear and an image of a scene possibly
containing those objects, report which objects
are present in the scene and where.

Recognition should be (1) invariant to viewpoint
changes and object transformations and (2)
robust to noise and occlusions.
4
Challenges

The appearance of an object can have a large
range of variation due to:
photometric effects
scene clutter
changes in shape (e.g., non-rigid objects)
viewpoint changes
Different views of the same object can give rise
to widely different images.

5
Object Recognition Applications

Quality control and assembly in industrial
plants.
Robot localization and navigation.
Monitoring and surveillance.
Automatic exploration of image databases.

6
Human Visual Recognition

A spontaneous, natural activity for humans and
other biological systems.
People know about tens of thousands of different
objects, yet they can easily distinguish among
them.
People can recognize objects with movable parts
or objects that are not rigid.
People can balance the information provided by
different kinds of visual input.

7
Why Is It Difficult?

Hard mathematical problems in understanding the
relationship between geometric shapes and their
projections into images.
We must match an image to one of a huge number of
possible objects, in any of an infinite number of
possible positions (computational complexity)

8
Why Is It Difficult? (contd)

We do not understand the recognition problem

9
What do we do in practice?

Impose constraints to simplify the problem.
Construct useful machines rather than modeling
human performance.

10
Approaches Differ According To

Knowledge they employ
Model-based approach (i.e., based on an explicit
model of the object's shape or appearance)
Context-based approach (i.e., based on the
context in which objects may be found)
Function-based approach (i.e., based on the
function that objects may serve)

11
Approaches Differ According To (contd)

Restrictions on the form of the objects
2D or 3D objects
Simple vs complex objects
Rigid vs deforming objects
Representation schemes
Object-centered
Viewer-centered

12
Approaches Differ According To (contd)

Matching scheme
Geometry-based
Appearance-based
Image formation model
Perspective projection
Affine transformation (e.g., planar objects)
Orthographic projection + scale

13
Requirements

Viewpoint Invariant
Translation, Rotation, Scale
Robust
Noise (i.e., sensor noise)
Local errors in early processing modules (e.g.,
edge detection)
Illumination/Shadows
Partial occlusion (i.e., self and from other
objects)
Intrinsic shape distortions (i.e., non-rigid
objects)

14
Performance Criteria

Scope
What kinds of objects can be recognized and in
what kinds of scenes?
Robustness
Does the method tolerate reasonable amounts of
noise and occlusion in the scene?
Does it degrade gracefully as those tolerances
are exceeded?

15
Performance Criteria (contd)

Efficiency
How much time and memory are required to search
the solution space?
Accuracy
Correct recognition
False positives (wrong recognitions)
False negatives (missed recognitions)
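
As a minimal sketch of how the three accuracy outcomes above could be tallied for one scene, assume the ground truth and the system output are both available as sets of object labels. The function name and the label strings are hypothetical, chosen only for illustration.

```python
def tally_recognitions(reported, ground_truth):
    """Count correct recognitions, false positives, and false negatives.

    Both arguments are sets of object labels for a single scene.
    """
    correct = reported & ground_truth          # present and reported
    false_positives = reported - ground_truth  # reported but not present
    false_negatives = ground_truth - reported  # present but missed
    return len(correct), len(false_positives), len(false_negatives)


# Hypothetical scene: "cup" is found, "stapler" is a wrong recognition,
# and "phone" is missed.
counts = tally_recognitions({"cup", "stapler"}, {"cup", "phone"})
```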

16
Representation Schemes

(1) Object-centered

(2) Viewer-centered

17
Object-centered Representation

Associates a coordinate system with the object
The object geometry is expressed in this frame

Advantage: every view of the object is
available.
Disadvantage: might not be easy to
build (i.e., reconstruct 3D from 2D).
18
Object-centered Representation (contd)

Two different matching approaches:
(1) Derive a similar object-centered description
from the scene and match it with the models (e.g.,
using shape-from-X methods).
(2) Apply a model of the image formation process
on the candidate model to back-project it onto
the scene (camera calibration required).

19
Viewer-centered Representation

Objects are described by a set of characteristic
views or aspects

Advantages: (i) easier to build compared to
object-centered, (ii) matching is easier since it
involves 2D descriptions.
Disadvantage: requires a large number of views.
20
Predicting New Views

There is some evidence that the human visual
system uses a viewer-centered representation
for object recognition.
It predicts the appearance of objects in images
obtained under novel conditions by generalizing
from familiar images of the objects.

21
Predicting New Views (contd)
[Figure: familiar views are used to predict a novel view]
22
Matching Schemes
(1) Geometry-based:
explore correspondences between model and
scene features.
(2) Appearance-based:
represent objects from all possible viewpoints
and all possible illumination directions.
23
Geometry-based Matching

Advantage: efficient in segmenting the object
of interest from the scene and robust in handling
occlusion.
Disadvantage: relies heavily on feature extraction,
so performance degrades when imaging
conditions give rise to poor segmentations.

24
Appearance-based Matching

Advantage: circumvents the feature extraction
problem by enumerating many possible object
appearances in advance.
Disadvantages: (i) difficulties with segmenting
the objects from the background and dealing with
occlusions, (ii) too many possible appearances,
(iii) it is unclear how to sample the space of
appearances.

25
Model-Based Object Recognition

The environment is rather constrained and
recognition relies upon the existence of a set of
predefined objects.

26
Goals of Matching

Identify a group of features from an unknown
scene which approximately match a set of features
from a known view of a model object.
Recover the geometric transformation that the
model object has undergone

27
Transformation Space

2D objects: 4 parameters (2 translation, 1
rotation, 1 scale)
3D objects, perspective projection: 6 parameters
(3 rotation, 3 translation)
3D objects, orthographic projection + scale
(essentially 5 parameters and a constant for
depth)
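
To make the 2D case concrete, the following is a small sketch of a transformation with the four parameters listed above (two for translation, one for rotation, one for scale). The function and parameter names are assumptions made for this illustration.

```python
import math


def similarity_2d(point, tx, ty, theta, scale):
    """Apply a 4-parameter 2D transformation: rotate, scale, then translate."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + tx,
            scale * (s * x + c * y) + ty)


# Rotate (1, 0) by 90 degrees, double its size, then shift it by (2, 3).
moved = similarity_2d((1.0, 0.0), tx=2.0, ty=3.0, theta=math.pi / 2, scale=2.0)
```

A point of the transformation space for 2D objects is then simply the tuple (tx, ty, theta, scale).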

28
Matching: Two Steps

Hypothesis generation: the identities of one or
more models are hypothesized.
Hypothesis verification: tests are performed to
check if a given hypothesis is correct or not.
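
One simple verification test, sketched here under the assumption that features are 2D points and that a hypothesis is a candidate transformation, is to transform the model and count how many model points land near some scene point. The function name, the tolerance, and the acceptance threshold are hypothetical choices, not values from the slides.

```python
import math


def verify_hypothesis(model_points, scene_points, transform,
                      tol=1.0, min_fraction=0.6):
    """Accept a hypothesis if enough transformed model points have support."""
    supported = 0
    for p in model_points:
        q = transform(p)
        # A model point is supported if some scene point lies within tol.
        if any(math.dist(q, s) <= tol for s in scene_points):
            supported += 1
    return supported / len(model_points) >= min_fraction


model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
scene = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (9.0, 9.0)]  # last point is clutter

good = verify_hypothesis(model, scene, lambda p: (p[0] + 5.0, p[1] + 5.0))
bad = verify_hypothesis(model, scene, lambda p: (p[0] + 20.0, p[1] + 20.0))
```

Allowing a fraction below 1.0 is what lets the test tolerate partial occlusion of the model.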

29
Hypothesis Generation-Verification Example
30
Efficient Hypothesis Generation

How to choose the scene groups?
Do we need to consider every possible group?
How to find groups of features that are likely
to belong to the same object?
Use grouping schemes
Database organization and searching
Do we need to search the whole database of
models?
How should we organize the model database to
allow for fast and efficient storage and
retrieval?
Use indexing schemes

31
Interpretation Trees (E. Grimson and T.
Lozano-Perez, 1987)

Nodes of the tree represent match pairs (i.e.,
scene to model feature match).
Each level of the tree represents all possible
matches between an image feature fi and a model
feature mj
The tree represents the complete search space.

32
Interpretation Trees (contd) (E. Grimson and T.
Lozano-Perez, 1987)

Interpretation: a path through the tree.
(Model features m1, m2, m3, m4)
(Scene features f1, f2)
Use a depth-first tree search to find a match
(or interpretation).

33
Interpretation Trees (contd) (E. Grimson and T.
Lozano-Perez, 1987)

Search space is very large (i.e., exponential
number of matches).
Find consistent interpretations without exploring
all possible ways of matching image and model
features.
Use geometric constraints to prune the tree:
Unary constraints: properties of individual
features (e.g., length/orientation of a line)
Binary constraints: properties of pairs of
features (e.g., distance/angle between two
lines)
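
The pruned search can be sketched as follows. This is a simplified illustration, not the algorithm exactly as published: each feature is assumed to be a line summarized as (x, y, length), where (x, y) is its midpoint, the unary constraint compares lengths, and the binary constraint compares midpoint distances. It also assumes every scene feature belongs to the model, so spurious scene features are not handled. All names and tolerances are hypothetical.

```python
import math


def unary_ok(f, m, tol=0.1):
    """Unary constraint: the two lines have similar length."""
    return abs(f[2] - m[2]) <= tol


def binary_ok(f1, m1, f2, m2, tol=0.1):
    """Binary constraint: scene and model midpoint distances agree."""
    return abs(math.dist(f1[:2], f2[:2]) - math.dist(m1[:2], m2[:2])) <= tol


def interpretations(scene, model, partial=()):
    """Depth-first search; yields one model index per scene feature."""
    level = len(partial)
    if level == len(scene):
        yield partial  # a complete path through the tree
        return
    f = scene[level]
    for j, m in enumerate(model):
        if j in partial or not unary_ok(f, m):
            continue  # prune: model feature already used, or lengths differ
        if all(binary_ok(f, m, scene[i], model[partial[i]])
               for i in range(level)):
            yield from interpretations(scene, model, partial + (j,))


# Model features m1..m4 and scene features f1, f2, as on the earlier slide.
model = [(0, 0, 1.0), (3, 0, 2.0), (0, 4, 1.0), (3, 4, 2.0)]
scene = [(10, 10, 1.0), (13, 10, 2.0)]
found = list(interpretations(scene, model))
```

With these numbers, only two of the twelve possible pairings survive the constraints, which is the point of pruning.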

34
Alignment Approach (Huttenlocher and Ullman, 1990)

Most approaches searched for the largest pairing
of model and image features for which there exists
a single geometric transformation mapping each
model feature to its corresponding image feature.
The alignment approach seeks to recover the
geometric transformation between the model and
the scene using a minimum number of
correspondences.

35
Alignment Approach (contd) (Huttenlocher and
Ullman, 1990)

Weak perspective model (3 correspondences,
$O(m^3 n^3)$ cases):
$x' = \Pi(sRx + b)$
$\Pi$: orthographic projection
$s$: scale
$R$: 3D rotation
$b$: translation
Equivalent to an affine transformation (valid
when the object is far from the camera and the
object depth is small relative to the distance
from the camera):
$x' = Lx + b$
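
For the planar case, where L is a 2x2 matrix, three non-collinear point correspondences give six linear equations for the six unknowns in L and b. The sketch below solves them with Cramer's rule using only the standard library. It illustrates the affine form only and is not the full 3D weak-perspective recovery; the function names are assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(A, rhs):
    """Solve the 3x3 system A z = rhs by Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("model points are collinear")
    solution = []
    for col in range(3):
        Ac = [list(row) for row in A]
        for r in range(3):
            Ac[r][col] = rhs[r]
        solution.append(det3(Ac) / d)
    return solution


def affine_from_three(model_pts, image_pts):
    """Recover L (2x2) and b with image = L * model + b."""
    A = [[x, y, 1.0] for x, y in model_pts]
    l11, l12, b1 = solve3(A, [u for u, _ in image_pts])
    l21, l22, b2 = solve3(A, [v for _, v in image_pts])
    return ((l11, l12), (l21, l22)), (b1, b2)


L, b = affine_from_three([(0, 0), (1, 0), (0, 1)],
                         [(2, 3), (4, 3), (2, 6)])
```

Each choice of three model features and three image features yields one such candidate transformation, which is where the O(m^3 n^3) count of cases comes from.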

36
Pose Clustering (e.g., Thompson and Mundy, 1987,
Ballard, 1981)

Main idea
If there is a transformation that can bring into
alignment a large number of features, then this
transformation will receive a large number of
votes.

37
Pose Clustering (e.g., Thompson and Mundy, 1987,
Ballard, 1981)

Main Steps
(1) Quantize the space of possible
transformations (usually 4D - 6D).
(2) For each hypothetical match, solve for the
transformation that aligns the matched
features.
(3) Cast a vote in the corresponding
transformation space bin.
(4) Find "peak" in transformation space.
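
The four steps can be sketched for the 2D case (4D transformation space). This is a minimal illustration under several assumptions: points are stored as complex numbers, a hypothetical match pairs two model points with two scene points, and the bin sizes are arbitrary. Function and variable names are hypothetical.

```python
import cmath
import math
from collections import Counter
from itertools import permutations


def pose_votes(model, scene, t_step=1.0,
               angle_step=math.radians(10), scale_step=0.1):
    """Vote in a quantized (tx, ty, rotation, scale) space."""
    votes = Counter()
    for a, b in permutations(model, 2):
        for p, q in permutations(scene, 2):
            # Step 2: solve for the transform z * m + t sending a->p, b->q.
            z = (q - p) / (b - a)  # rotation and scale as one complex number
            t = p - z * a          # translation
            # Steps 1 and 3: quantize the parameters and cast a vote.
            key = (round(t.real / t_step), round(t.imag / t_step),
                   round(cmath.phase(z) / angle_step),
                   round(abs(z) / scale_step))
            votes[key] += 1
    return votes


model = [0 + 0j, 4 + 0j, 0 + 3j]
# Scene: model rotated 90 degrees, scaled by 2, shifted by (10, 5), plus clutter.
scene = [2j * m + (10 + 5j) for m in model] + [1 + 17j]

votes = pose_votes(model, scene)
peak_bin, peak_count = votes.most_common(1)[0]  # step 4: find the peak
```

The correct pose collects one vote from each of the six ordered pairs of model points, while incorrect matches scatter over many bins.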

38
Pose Clustering (example) (e.g., Thompson and
Mundy, 1987, Ballard, 1981)
39
Appearance-based Recognition (e.g., Murase and
Nayar, 1995, Turk and Pentland, 1991)

Represent an object by the set of its possible
appearances (i.e., under all possible viewpoints
and illumination conditions).
Identifying an object implies finding the closest
stored image.

40
Appearance-based Recognition (e.g., Murase and
Nayar, 1995, Turk and Pentland, 1991)

In practice, a subset of all possible appearances
is used.
Images are highly correlated, so compress them
into a low-dimensional space that captures key
appearance characteristics (e.g., use Principal
Component Analysis (PCA)).
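
A heavily simplified sketch of this idea follows. It assumes images are flattened into lists of pixel values, keeps only the first principal component, and finds it by power iteration instead of a full eigen-decomposition, which is a simplification made for this illustration. The tiny 4-pixel "images", labels, and function names are all hypothetical.

```python
import math


def fit_first_component(images, iterations=100):
    """Return the mean image and the first principal axis."""
    n, dim = len(images), len(images[0])
    mean = [sum(img[k] for img in images) / n for k in range(dim)]
    centred = [[img[k] - mean[k] for k in range(dim)] for img in images]
    # Heuristic start vector: the centred image with the largest norm.
    axis = max(centred, key=lambda c: sum(v * v for v in c))
    for _ in range(iterations):
        # Multiply by the scatter matrix: sum_i c_i * (c_i . axis).
        new_axis = [0.0] * dim
        for c in centred:
            dot = sum(ci * ai for ci, ai in zip(c, axis))
            for k in range(dim):
                new_axis[k] += c[k] * dot
        norm = math.sqrt(sum(v * v for v in new_axis))
        if norm == 0.0:
            raise ValueError("images have no variation")
        axis = [v / norm for v in new_axis]
    return mean, axis


def project(image, mean, axis):
    """Coordinate of an image along the principal axis."""
    return sum((p - m) * a for p, m, a in zip(image, mean, axis))


def recognize(query, images, labels, mean, axis):
    """Label of the stored image closest to the query in the subspace."""
    q = project(query, mean, axis)
    coords = [project(img, mean, axis) for img in images]
    best = min(range(len(images)), key=lambda i: abs(coords[i] - q))
    return labels[best]


stored = [[10, 10, 0, 0], [9, 11, 0, 1], [0, 0, 10, 10], [1, 0, 11, 9]]
labels = ["A", "A", "B", "B"]
mean, axis = fit_first_component(stored)
label = recognize([9, 10, 1, 0], stored, labels, mean, axis)
```

A practical system would keep several components and many more views per object; one component is enough here only because the toy data vary along a single direction.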

41
Indexing-based Recognition

Preprocessing step: groups of model features are
used to index the database, and the indexed
locations are filled with entries containing
references to the model objects and information
that can later be used for pose recovery.
Recognition step: groups of scene features are
used to index the database, and the model objects
listed in the indexed locations are collected
into a list of candidate models (hypotheses).

42
Indexing-Based Recognition (contd)

Use a priori stored information about the models
to quickly eliminate non-feasible matches during
recognition.
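
The two steps might look as follows in a toy setting. The sketch assumes features are 2D points, groups are point triples, and the index is a pair of quantized side-length ratios of the triangle, which does not change under translation, rotation, or uniform scaling. The model names, the quantization step, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangle_key(p, q, r, step=0.05):
    """Quantized side ratios of a triangle, used as the index."""
    sides = sorted([math.dist(p, q), math.dist(q, r), math.dist(p, r)])
    return (round(sides[0] / sides[2] / step),
            round(sides[1] / sides[2] / step))


def build_index(models):
    """Preprocessing: store (model name, group) under each group's index."""
    table = defaultdict(list)
    for name, points in models.items():
        for triple in combinations(points, 3):
            table[triangle_key(*triple)].append((name, triple))
    return table


def candidate_models(table, scene_points):
    """Recognition: collect the models listed at the indexed locations."""
    names = set()
    for triple in combinations(scene_points, 3):
        for name, _ in table.get(triangle_key(*triple), []):
            names.add(name)
    return names


models = {"wrench": [(0, 0), (4, 0), (0, 3)],
          "bracket": [(0, 0), (2, 0), (1, 5)]}
table = build_index(models)
# The wrench rotated 90 degrees, scaled by 2, and shifted.
scene = [(10, 5), (10, 13), (4, 5)]
candidates = candidate_models(table, scene)
```

The stored triples are what would later be used for pose recovery; here they are kept but not used.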

43
Invariants

Properties that do not change with object
transformations or viewpoint changes.
Ideally, we would like the index computed from a
group of model features to be invariant.
Only one entry per group needs to be stored this
way.

44
Planar (2D) objects

The index is computed based on invariant
properties.
One entry per group needs to be stored in this
case.

Affine invariants (geometric hashing; Lamdan et
al., 1988)
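
The affine invariant commonly used in geometric hashing is the pair of coordinates of a point expressed in the frame defined by three other (basis) points. A short sketch, with hypothetical function names and an arbitrary affine map chosen for the demonstration:

```python
def affine_coords(p, e0, e1, e2):
    """Coordinates (alpha, beta) with p = e0 + alpha*(e1-e0) + beta*(e2-e0)."""
    ux, uy = e1[0] - e0[0], e1[1] - e0[1]
    vx, vy = e2[0] - e0[0], e2[1] - e0[1]
    px, py = p[0] - e0[0], p[1] - e0[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        raise ValueError("basis points are collinear")
    alpha = (px * vy - py * vx) / det
    beta = (ux * py - uy * px) / det
    return alpha, beta


def some_affine_map(p):
    """An arbitrary affine transformation used only for this demonstration."""
    x, y = p
    return (2 * x + y + 3, -x + 3 * y - 1)


basis = [(0, 0), (1, 0), (0, 1)]
point = (2, 3)

before = affine_coords(point, *basis)
after = affine_coords(some_affine_map(point),
                      *[some_affine_map(e) for e in basis])
```

Because the coordinates are the same before and after the map, a quantized (alpha, beta) pair can serve as the index, and only one entry per group needs to be stored.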
45
Geometric Hashing
46
Three-Dimensional Objects

No general-case invariants exist for single views
of general 3D objects (Clemens and Jacobs, 1991).
Special-case and model-based invariants (Rothwell
et al., 1995, Weinshall, 1993)

47
Indexing for 3D Object Recognition (contd)

One approach might be ...

48
Indexing for 3D Object Recognition (contd)

Another approach might be ...

49
Grouping

Grouping is the process that organizes the image
into parts, each likely to come from a single
object.
It reduces the number of hypotheses dramatically.
Non-accidental properties (grouping cues):
Orientation, Collinearity, Parallelism, Proximity

Convex groups (Jacobs, 1996)
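
As a rough sketch of grouping with two of the cues above, the code below joins line segments that are both nearly parallel and close to each other, then reports the connected groups. It is not the convex-grouping method cited above. Segments are assumed to be pairs of endpoints, proximity is measured crudely between endpoints only, and the thresholds and names are hypothetical.

```python
import math


def orientation(seg):
    """Undirected orientation of a segment, in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def are_parallel(a, b, tol=math.radians(5)):
    d = abs(orientation(a) - orientation(b))
    return min(d, math.pi - d) <= tol


def are_close(a, b, max_gap=2.0):
    """Proximity cue: smallest endpoint-to-endpoint distance."""
    return min(math.dist(p, q) for p in a for q in b) <= max_gap


def group_segments(segments):
    """Union-find over segment pairs that are parallel and close."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if (are_parallel(segments[i], segments[j])
                    and are_close(segments[i], segments[j])):
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


segments = [((0, 0), (5, 0)), ((0, 1), (5, 1)),      # parallel and close
            ((20, 20), (20, 25)), ((5, 0), (5, 4))]  # isolated; perpendicular
groups = group_segments(segments)
```

Hypotheses then only need to be generated from features within a group, not from every subset of scene features.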
50
Error Analysis

Uncertainty in feature locations
It is important to analyze the sensitivity of
each algorithm with respect to uncertainty in the
location of the image features.
Case of Indexing
Analyze how errors in the locations of the points
affect the invariants.
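
One hypothetical way to probe this sensitivity numerically, assuming the invariant is the affine coordinate pair of a point relative to three basis points, is to shift one basis point by a small amount and measure how far the invariant moves. The names and numbers below are illustrative only.

```python
def affine_coords(p, e0, e1, e2):
    """Coordinates (alpha, beta) with p = e0 + alpha*(e1-e0) + beta*(e2-e0)."""
    ux, uy = e1[0] - e0[0], e1[1] - e0[1]
    vx, vy = e2[0] - e0[0], e2[1] - e0[1]
    px, py = p[0] - e0[0], p[1] - e0[1]
    det = ux * vy - uy * vx
    return (px * vy - py * vx) / det, (ux * py - uy * px) / det


def invariant_shift(p, basis, noise):
    """Largest change in the invariant when e1 moves by `noise` along x."""
    e0, e1, e2 = basis
    clean = affine_coords(p, e0, e1, e2)
    noisy = affine_coords(p, e0, (e1[0] + noise, e1[1]), e2)
    return max(abs(c - n) for c, n in zip(clean, noisy))


p = (10.0, 10.0)
small_basis = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
large_basis = ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0))

shift_small = invariant_shift(p, small_basis, noise=0.1)
shift_large = invariant_shift(p, large_basis, noise=0.1)
```

In this example the same 0.1 location error moves the invariant far more for the compact basis than for the widely spaced one, so the tolerance used when indexing would need to depend on the basis.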

51
Error Analysis (contd)
52
References

E. Grimson and T. Lozano-Perez, "Localizing
overlapping parts by searching the interpretation
tree", IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 9, no. 4, pp. 469-482,
July 1987.
D. Huttenlocher and S. Ullman, "Recognizing solid
objects by alignment with an image",
International Journal of Computer Vision, vol. 5,
no. 2, pp. 195-212, 1990.
Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine
invariant model-based object recognition", IEEE
Trans. on Robotics and Automation, vol. 6, no. 5,
pp. 578-589, October 1990.
I. Rigoutsos and R. Hummel, "A Bayesian approach
to model matching with geometric hashing", CVGIP:
Image Understanding, vol. 62, pp. 11-26, 1995.

53
References (contd)

D. Clemens and D. Jacobs, "Space and time bounds
on indexing 3D models from 2D images", IEEE
Transactions on Pattern Analysis and Machine
Intelligence, vol. 13, no. 10, pp. 1007-1017, 1991.
D. Thompson and J. Mundy, "Three dimensional
model matching from an unconstrained viewpoint",
IEEE Conference on Robotics and Automation, pp.
208-220, 1987.
D. Ballard, "Generalizing the Hough transform to
detect arbitrary patterns", Pattern Recognition,
vol. 13, no. 2, pp. 111-122, 1981.
H. Murase and S. Nayar, "Visual learning and
recognition of 3D objects from appearance",
International Journal of Computer Vision, vol.
14, pp. 5-24, 1995.

54
References (contd)

M. Turk and A. Pentland, "Eigenfaces for
recognition", Journal of Cognitive Neuroscience,
vol. 3, pp. 71-86, 1991.
D. Jacobs, "Robust and efficient detection of
salient convex groups", IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol.
18, no. 1, pp. 23-37, 1996.
K. Bowyer and C. Dyer, "Aspect graphs: an
introduction and survey of recent results",
International Journal of Imaging Systems and
Technology, vol. 2, pp. 315-328, 1990.
