Introduction to Object Recognition - PowerPoint PPT Presentation

Slides: 55
Provided by: george76
Learn more at: https://www.cse.unr.edu

1
Introduction to Object Recognition
  • CS773C Machine Intelligence Advanced Applications
  • Spring 2008 Object Recognition

2
Outline
  • The Problem of Object Recognition
  • Approaches to Object Recognition
  • Requirements and Performance Criteria
  • Representation Schemes
  • Matching Schemes
  • Example Systems
  • Indexing
  • Grouping
  • Error Analysis

3
Problem Statement
  • Given some knowledge of how certain objects may
    appear and an image of a scene possibly
    containing those objects, report which objects
    are present in the scene and where.

Recognition should be (1) invariant to viewpoint
changes and object transformations, and (2) robust
to noise and occlusions.

4
Challenges
  • The appearance of an object can vary widely
    due to:
  • photometric effects
  • scene clutter
  • changes in shape (e.g., non-rigid objects)
  • viewpoint changes
  • Different views of the same object can give rise
    to widely different images!

5
Object Recognition Applications
  • Quality control and assembly in industrial
    plants.
  • Robot localization and navigation.
  • Monitoring and surveillance.
  • Automatic exploration of image databases.

6
Human Visual Recognition
  • A spontaneous, natural activity for humans and
    other biological systems.
  • People know about tens of thousands of different
    objects, yet they can easily distinguish among
    them.
  • People can recognize objects with movable parts
    or objects that are not rigid.
  • People can balance the information provided by
    different kinds of visual input.

7
Why Is It Difficult?
  • Hard mathematical problems in understanding the
    relationship between geometric shapes and their
    projections into images.
  • We must match an image to one of a huge number of
    possible objects, in any of an infinite number of
    possible positions (computational complexity)

8
Why Is It Difficult? (contd)
  • We do not understand the recognition problem

9
What do we do in practice?
  • Impose constraints to simplify the problem.
  • Construct useful machines rather than modeling
    human performance.

10
Approaches Differ According To
  • Knowledge they employ
  • Model-based approach (i.e., based on explicit
    model of the object's shape or appearance)
  • Context-based approach (i.e., based on the
    context in which objects may be found)
  • Function-based approach (i.e., based on the
    function for which objects may serve)

11
Approaches Differ According To (contd)
  • Restrictions on the form of the objects
  • 2D or 3D objects
  • Simple vs complex objects
  • Rigid vs deforming objects
  • Representation schemes
  • Object-centered
  • Viewer-centered

12
Approaches Differ According To (contd)
  • Matching scheme
  • Geometry-based
  • Appearance-based
  • Image formation model
  • Perspective projection
  • Affine transformation (e.g., planar objects)
  • Orthographic projection + scale (i.e., weak
    perspective)

13
Requirements
  • Viewpoint Invariant
  • Translation, Rotation, Scale
  • Robust
  • Noise (i.e., sensor noise)
  • Local errors in early processing modules (e.g.,
    edge detection)
  • Illumination/Shadows
  • Partial occlusion (i.e., self and from other
    objects)
  • Intrinsic shape distortions (i.e., non-rigid
    objects)

14
Performance Criteria
  • Scope
  • What kinds of objects can be recognized, and in
    what kinds of scenes?
  • Robustness
  • Does the method tolerate reasonable amounts of
    noise and occlusion in the scene?
  • Does it degrade gracefully as those tolerances
    are exceeded?

15
Performance Criteria (contd)
  • Efficiency
  • How much time and memory are required to search
    the solution space?
  • Accuracy
  • Correct recognition
  • False positives (wrong recognitions)
  • False negatives (missed recognitions)
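
The accuracy criteria above can be tallied with a few lines of code. This is a minimal sketch, assuming the recognizer outputs a collection of object labels per scene and that ground-truth labels are available; the function name `score` and the example labels are hypothetical.

```python
def score(predicted, actual):
    """Count correct recognitions, false positives and false negatives."""
    predicted, actual = set(predicted), set(actual)
    correct = len(predicted & actual)          # reported and really present
    false_pos = len(predicted - actual)        # reported but not present
    false_neg = len(actual - predicted)        # present but missed
    precision = correct / (correct + false_pos) if predicted else 0.0
    recall = correct / (correct + false_neg) if actual else 0.0
    return {
        "correct": correct,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


result = score(["cup", "phone", "lamp"], ["cup", "lamp", "book"])
print(result["correct"], result["false_positives"], result["false_negatives"])
```

Precision and recall are added here only as convenient summaries of the three counts; the slides themselves list just the counts.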

16
Representation Schemes
  • (1) Object-centered
  • (2) Viewer-centered

17
Object-centered Representation
  • Associates a coordinate system with the object
  • The object geometry is expressed in this frame

Advantage: every view of the object is available.
Disadvantage: might not be easy to build (i.e.,
requires reconstructing 3D from 2D).

18
Object-centered Representation (contd)
  • Two different matching approaches
  • (1) Derive a similar object-centered description
    from the scene and match it with the models
    (e.g., using shape-from-X methods).
  • (2) Apply a model of the image formation process
    on the candidate model to back-project it onto
    the scene (camera calibration required).

19
Viewer-centered Representation
  • Objects are described by a set of characteristic
    views or aspects

Advantages: (i) easier to build than an
object-centered representation, (ii) matching is
easier since it involves 2D descriptions.
Disadvantage: requires a large number of views.

20
Predicting New Views
  • There is some evidence that the human visual
    system uses a viewer-centered representation
    for object recognition.
  • It predicts the appearance of objects in images
    obtained under novel conditions by generalizing
    from familiar images of the objects.

21
Predicting New Views (contd)
[Figure: familiar views are used to predict a novel view]

22
Matching Schemes
(1) Geometry-based: explore correspondences between
model and scene features.
(2) Appearance-based: represent objects from all
possible viewpoints and all possible illumination
directions.

23
Geometry-based Matching
  • Advantage: efficient in segmenting the object
    of interest from the scene and robust in handling
    occlusion.
  • Disadvantage: relies heavily on feature
    extraction, so performance degrades when imaging
    conditions give rise to poor segmentations.

24
Appearance-based Matching
  • Advantage: circumvents the feature extraction
    problem by enumerating many possible object
    appearances in advance.
  • Disadvantages: (i) difficulty segmenting the
    objects from the background and dealing with
    occlusions, (ii) too many possible appearances,
    (iii) unclear how to sample the space of
    appearances.

25
Model-Based Object Recognition
  • The environment is rather constrained, and
    recognition relies upon the existence of a set of
    predefined objects.

26
Goals of Matching
  • Identify a group of features from an unknown
    scene which approximately match a set of features
    from a known view of a model object.
  • Recover the geometric transformation that the
    model object has undergone

27
Transformation Space
  • 2D objects (2 translation, 1 rotation, 1 scale)
  • 3D objects, perspective projection (3 rotation, 3
    translation)
  • 3D objects, orthographic projection + scale
    (essentially 5 parameters and a constant for
    depth)

28
Matching: Two Steps
  • Hypothesis generation: the identities of one or
    more models are hypothesized.
  • Hypothesis verification: tests are performed to
    check if a given hypothesis is correct or not.

29
Hypothesis Generation-Verification Example
30
Efficient Hypothesis Generation
  • How to choose the scene groups?
  • Do we need to consider every possible group?
  • How to find groups of features that are likely
    to belong to the same object?
  • Use grouping schemes
  • Database organization and searching
  • Do we need to search the whole database of
    models?
  • How should we organize the model database to
    allow for fast and efficient storage and
    retrieval?
  • Use indexing schemes

31
Interpretation Trees (E. Grimson and T.
Lozano-Perez, 1987)
  • Nodes of the tree represent match pairs (i.e.,
    scene to model feature match).
  • Each level of the tree represents all possible
    matches between an image feature f_i and a model
    feature m_j.
  • The tree represents the complete search space.

32
Interpretation Trees (contd) (E. Grimson and T.
Lozano-Perez, 1987)
  • Interpretation: a path through the tree.
  • (Model features m1, m2, m3, m4)
  • (Scene features f1, f2)
  • Use a depth-first tree search to find a match
    (or interpretation).

33
Interpretation Trees (contd) (E. Grimson and T.
Lozano-Perez, 1987)
  • Search space is very large (i.e., exponential
    number of matches).
  • Find consistent interpretations without exploring
    all possible ways of matching image and model
    features.
  • Use geometric constraints to prune the tree
  • Unary constraints: properties of individual
    features (e.g., length/orientation of a line)
  • Binary constraints: properties of pairs of
    features (e.g., distance/angle between two
    lines)
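
The pruned depth-first search described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes features are 2D line segments, uses segment length as the unary constraint (which presumes no scale change) and the angle between two segments as the binary constraint, and omits the null match that a full interpretation tree would use to handle occluded or spurious features. All function names and tolerances are hypothetical.

```python
import math


def length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def angle_between(a, b):
    """Unsigned angle between two undirected segments, in [0, pi/2]."""
    def direction(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1)
    d = abs(direction(a) - direction(b)) % math.pi
    return min(d, math.pi - d)


def interpretations(scene, model, len_tol=0.1, ang_tol=0.05):
    """Yield lists where entry i is the model index matched to scene[i]."""
    def unary_ok(f, m):
        return abs(length(f) - length(m)) <= len_tol

    def binary_ok(f_a, m_a, f_b, m_b):
        return abs(angle_between(f_a, f_b) - angle_between(m_a, m_b)) <= ang_tol

    def extend(partial):
        level = len(partial)              # tree level = next scene feature
        if level == len(scene):
            yield list(partial)           # a complete path is an interpretation
            return
        f = scene[level]
        for j, m in enumerate(model):
            if j in partial or not unary_ok(f, m):
                continue                  # prune on the unary constraint
            if all(binary_ok(f, m, scene[k], model[partial[k]])
                   for k in range(level)):
                partial.append(j)
                yield from extend(partial)
                partial.pop()             # backtrack

    yield from extend([])


# Model: a 3-4-5 right triangle. Scene: two of its edges after a
# 90-degree rotation and a translation.
model = [((0, 0), (3, 0)), ((3, 0), (3, 4)), ((3, 4), (0, 0))]
scene = [((10, 8), (6, 8)), ((6, 8), (10, 5))]
print(list(interpretations(scene, model)))
```

Because a branch is abandoned as soon as one constraint fails, most of the exponential tree is never visited.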

34
Alignment Approach (Huttenlocher and Ullman, 1990)
  • Most approaches searched for the largest pairing
    of model and image features for which there exists
    a single geometric transformation mapping each
    model feature to its corresponding image feature.
  • The alignment approach seeks to recover the
    geometric transformation between the model and
    the scene using a minimum number of
    correspondences.

35
Alignment Approach (contd) (Huttenlocher and
Ullman, 1990)
  • Weak perspective model (3 correspondences,
    $O(m^3 n^3)$ cases)
  • $x' = \Pi(sRx + b)$
  • $\Pi$: orthographic projection
  • $s$: scale
  • $R$: 3D rotation
  • $b$: translation
  • Equivalent to an affine transformation (valid
    when object is far from camera and object depth
    small relative to distance from camera)

  • $x' = Lx + b$
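
As a worked illustration of recovering a transformation from a minimum number of correspondences, the sketch below solves for an affine map x' = Lx + b from three point pairs and then counts how many model points it brings onto image points. It assumes the planar case, where L is a 2x2 matrix and b a 2-vector (6 unknowns, 6 equations); the 3D-model case treated by Huttenlocher and Ullman is not covered. The function names and the tolerance are hypothetical.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_affine(model_pts, image_pts):
    """Solve x' = L x + b from exactly three 2D correspondences (Cramer's rule)."""
    A = [[x, y, 1.0] for x, y in model_pts]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("model points are collinear")
    rows = []
    for c in range(2):                        # one 3x3 system per image coordinate
        rhs = [p[c] for p in image_pts]
        coeffs = []
        for k in range(3):
            Ak = [row[:] for row in A]
            for i in range(3):
                Ak[i][k] = rhs[i]
            coeffs.append(det3(Ak) / d)
        rows.append(coeffs)
    L = [rows[0][:2], rows[1][:2]]
    b = [rows[0][2], rows[1][2]]
    return L, b


def apply_affine(L, b, p):
    x, y = p
    return (L[0][0] * x + L[0][1] * y + b[0],
            L[1][0] * x + L[1][1] * y + b[1])


def count_support(L, b, model_pts, image_pts, tol=0.5):
    """Verification: number of model points mapped close to some image point."""
    support = 0
    for p in model_pts:
        qx, qy = apply_affine(L, b, p)
        if any((qx - x) ** 2 + (qy - y) ** 2 <= tol ** 2 for x, y in image_pts):
            support += 1
    return support


model = [(0, 0), (1, 0), (0, 1), (2, 3)]
image = [(5, 1), (5, 3), (3, 1), (-1, 5)]
L, b = solve_affine(model[:3], image[:3])     # hypothesis from 3 correspondences
print(L, b, count_support(L, b, model, image))
```

In a full system this hypothesize-and-verify step would be repeated over candidate triples of model and image features, which is where the cubic factors in the case count come from.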

36
Pose Clustering (e.g., Thompson and Mundy, 1987,
Ballard, 1981)
  • Main idea
  • If there is a transformation that can bring into
    alignment a large number of features, then this
    transformation will receive a large number of
    votes.

37
Pose Clustering (contd) (e.g., Thompson and Mundy,
1987, Ballard, 1981)
  • Main Steps
  • (1) Quantize the space of possible
    transformations (usually 4D - 6D).
  • (2) For each hypothetical match, solve for the
    transformation that aligns the matched
    features.
  • (3) Cast a vote in the corresponding
    transformation space bin.
  • (4) Find "peak" in transformation space.
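
The four steps above can be sketched for the 4D case of 2D objects (scale, rotation, two translations). This is a minimal sketch under stated assumptions: points are represented as complex numbers so that a similarity transform is z' = a*z + b, each hypothetical match pairs two model points with two scene points, and the bin sizes are arbitrary choices. The function names are hypothetical.

```python
import cmath
import math
from collections import Counter
from itertools import permutations


def pose_from_pairs(p1, p2, q1, q2):
    """Similarity transform z' = a*z + b mapping p1 -> q1 and p2 -> q2."""
    a = (q2 - q1) / (p2 - p1)      # encodes scale (modulus) and rotation (phase)
    b = q1 - a * p1                # translation
    return a, b


def pose_clustering(model, scene, scale_step=0.25,
                    angle_step=math.radians(10), trans_step=1.0):
    votes = Counter()
    for p1, p2 in permutations(model, 2):          # step 2: every hypothetical match
        for q1, q2 in permutations(scene, 2):
            a, b = pose_from_pairs(p1, p2, q1, q2)
            key = (round(abs(a) / scale_step),      # step 1: quantized 4D bin
                   round(cmath.phase(a) / angle_step),
                   round(b.real / trans_step),
                   round(b.imag / trans_step))
            votes[key] += 1                         # step 3: cast a vote
    return votes.most_common(1)[0]                  # step 4: peak bin and its count


model = [0j, 3 + 0j, 3 + 1j, 1 + 2j]
a_true, b_true = cmath.rect(1.0, math.radians(30)), 4 - 2j
scene = [a_true * p + b_true for p in model] + [10 + 10j, -7 + 3j]  # plus clutter
print(pose_clustering(model, scene))
```

Every correct pairing votes for the same bin, while incorrect pairings scatter across the space. A more careful version would wrap the angle bins at ±180 degrees and spread votes over neighboring bins to reduce quantization effects.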

38
Pose Clustering (example) (e.g., Thompson and
Mundy, 1987, Ballard, 1981)
39
Appearance-based Recognition (e.g., Murase and
Nayar, 1995, Turk and Pentland, 1991)
  • Represent an object by the set of its possible
    appearances (i.e., under all possible viewpoints
    and illumination conditions).
  • Identifying an object implies finding the closest
    stored image.

40
Appearance-based Recognition (contd) (e.g., Murase
and Nayar, 1995, Turk and Pentland, 1991)
  • In practice, a subset of all possible appearances
    is used.
  • Images are highly correlated, so compress them
    into a low-dimensional space that captures key
    appearance characteristics (e.g., use Principal
    Component Analysis (PCA)).
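
The compress-then-match idea can be sketched in pure Python. This is an illustrative sketch only: images are assumed to be flattened into equal-length lists of pixel values, the principal directions are found by power iteration with deflation (a stand-in for a proper eigen-solver), and recognition returns the label of the nearest stored image in the low-dimensional space. All names and the toy 4-pixel "images" are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_direction(rows, iters=200):
    """Power iteration for the dominant eigenvector of rows^T rows."""
    dim = len(rows[0])
    v = [float(k + 1) for k in range(dim)]       # arbitrary non-zero start
    n0 = math.sqrt(dot(v, v))
    v = [x / n0 for x in v]
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]
        w = [sum(p * r[k] for p, r in zip(proj, rows)) for k in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    return v


def fit_pca(images, k):
    """Return the mean image and k orthonormal principal directions."""
    n = len(images)
    mean = [sum(col) / n for col in zip(*images)]
    residual = [[x - m for x, m in zip(img, mean)] for img in images]
    basis = []
    for _ in range(k):
        v = leading_direction(residual)
        basis.append(v)
        # Deflation: remove the component along v before the next pass.
        residual = [[x - dot(r, v) * vk for x, vk in zip(r, v)] for r in residual]
    return mean, basis


def project(img, mean, basis):
    centered = [x - m for x, m in zip(img, mean)]
    return [dot(centered, v) for v in basis]


def recognize(query, labels, images, k=2):
    mean, basis = fit_pca(images, k)
    coords = [project(img, mean, basis) for img in images]
    q = project(query, mean, basis)
    best = min(range(len(images)), key=lambda i: math.dist(q, coords[i]))
    return labels[best]                          # closest stored appearance


labels = ["bright", "dark", "left", "right"]
images = [[9, 9, 8, 9], [1, 0, 1, 1], [9, 9, 0, 1], [1, 0, 9, 8]]
print(recognize([8, 9, 1, 0], labels, images))
```

With real images the vectors have thousands of entries, and the stored set would sample viewpoints and illumination directions for each object rather than one image per label.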

41
Indexing-based Recognition
  • Preprocessing step: groups of model features are
    used to index the database, and the indexed
    locations are filled with entries containing
    references to the model objects and information
    that can later be used for pose recovery.
  • Recognition step: groups of scene features are
    used to index the database, and the model objects
    listed in the indexed locations are collected
    into a list of candidate models (hypotheses).

42
Indexing-Based Recognition (contd)
  • Use a-priori stored information about the models
    to quickly eliminate non-feasible matches during
    recognition.

43
Invariants
  • Properties that do not change with object
    transformations or viewpoint changes.
  • Ideally, we would like the index computed from a
    group of model features to be invariant.
  • Only one entry per group needs to be stored this
    way.

44
Planar (2D) objects
  • The index is computed based on invariant
    properties.
  • One entry per group needs to be stored in this
    case.

Example: affine invariants (geometric hashing;
Lamdan et al., 1988)

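
A minimal sketch of indexing with affine invariants, in the spirit of geometric hashing: the coordinates of a point relative to a basis of three non-collinear points are unchanged by any affine transformation, so they can serve as the index. The preprocessing step fills a hash table from the models, and the recognition step votes for (model, basis) entries using one scene basis. The bin size, function names, and the two toy models are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def affine_coords(p, e0, e1, e2):
    """(alpha, beta) with p = e0 + alpha*(e1 - e0) + beta*(e2 - e0)."""
    ux, uy = e1[0] - e0[0], e1[1] - e0[1]
    vx, vy = e2[0] - e0[0], e2[1] - e0[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-9:
        return None                           # collinear basis, no invariant
    px, py = p[0] - e0[0], p[1] - e0[1]
    return (px * vy - py * vx) / det, (ux * py - uy * px) / det


def quantize(c, step=0.25):
    return round(c[0] / step), round(c[1] / step)


def build_table(models):
    """Preprocessing: index every (model, basis) by its invariant coordinates."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            e0, e1, e2 = (pts[i] for i in basis)
            for k, p in enumerate(pts):
                if k in basis:
                    continue
                c = affine_coords(p, e0, e1, e2)
                if c is not None:
                    table[quantize(c)].append((name, basis))
    return table


def vote(table, scene, basis):
    """Recognition: votes for (model, basis) entries given one scene basis."""
    e0, e1, e2 = (scene[i] for i in basis)
    votes = Counter()
    for k, p in enumerate(scene):
        if k in basis:
            continue
        c = affine_coords(p, e0, e1, e2)
        if c is not None:
            for entry in table.get(quantize(c), []):
                votes[entry] += 1
    return votes


models = {
    "A": [(0, 0), (4, 0), (0, 4), (3, 5), (1, 6)],
    "B": [(0, 0), (6, 0), (1, 3), (5, 2), (3, 7)],
}
table = build_table(models)
# Model A under x' = 2x + y + 3, y' = -x + 1.5y - 1, plus one clutter point.
scene = [(3, -1), (11, -5), (7, 5), (14, 3.5), (11, 7), (20, 20)]
votes = vote(table, scene, (0, 1, 2))
print(votes[("A", (0, 1, 2))])
```

Entries with many votes become candidate hypotheses to verify. In practice several scene bases are tried, since a randomly chosen basis may not lie on a single model.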
45
Geometric Hashing
46
Three-Dimensional Objects
  • No general-case invariants exist for single views
    of general 3D objects (Clemens and Jacobs, 1991).
  • Special-case and model-based invariants (Rothwell
    et al., 1995, Weinshall, 1993)

47
Indexing for 3D Object Recognition (contd)
  • One approach might be ...

48
Indexing for 3D Object Recognition (contd)
  • Another approach might be ...

49
Grouping
  • Grouping is the process that organizes the image
    into parts, each likely to come from a single
    object.
  • It reduces the number of hypotheses dramatically.
  • Non-accidental properties (grouping clues)
  • Orientation, Collinearity, Parallelism, Proximity

Convex groups (Jacobs, 1996)
50
Error Analysis
  • Uncertainty in feature locations
  • It is important to analyze the sensitivity of
    each algorithm with respect to uncertainty in the
    location of the image features.
  • Case of Indexing
  • Analyze how errors in the locations of the points
    affect the invariants.
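
One simple way to study this sensitivity empirically is a Monte Carlo experiment: perturb the point locations with noise and measure how much the invariant moves. The sketch below does this for the affine coordinates of a point relative to a three-point basis. It assumes independent Gaussian noise of standard deviation `sigma` on each coordinate; the function names, the point layout, and the number of trials are hypothetical, and an analytic error model would be needed for firm conclusions.

```python
import random
import statistics


def affine_coords(p, e0, e1, e2):
    """(alpha, beta) with p = e0 + alpha*(e1 - e0) + beta*(e2 - e0)."""
    ux, uy = e1[0] - e0[0], e1[1] - e0[1]
    vx, vy = e2[0] - e0[0], e2[1] - e0[1]
    det = ux * vy - uy * vx
    px, py = p[0] - e0[0], p[1] - e0[1]
    return (px * vy - py * vx) / det, (ux * py - uy * px) / det


def invariant_spread(points, sigma, trials=2000, seed=0):
    """Std. dev. of (alpha, beta) when all four points are jittered by sigma."""
    rng = random.Random(seed)
    alphas, betas = [], []
    for _ in range(trials):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in points]
        a, b = affine_coords(noisy[3], noisy[0], noisy[1], noisy[2])
        alphas.append(a)
        betas.append(b)
    return statistics.pstdev(alphas), statistics.pstdev(betas)


# Three basis points followed by the point whose invariant is indexed.
points = [(0, 0), (4, 0), (0, 4), (3, 5)]
for sigma in (0.0, 0.02, 0.2):
    print(sigma, invariant_spread(points, sigma))
```

The measured spread suggests how wide the index bins (or the search neighborhood in the table) would need to be for a given level of feature-location noise.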

51
Error Analysis (contd)
52
References
  • E. Grimson and T. Lozano-Perez, "Localizing
    overlapping parts by searching the interpretation
    tree", IEEE Transactions on Pattern Analysis and
    Machine Intelligence, vol. 9, no. 4, pp. 469-482,
    July 1987.
  • D. Huttenlocher and S. Ullman, "Recognizing solid
    objects by alignment with an image",
    International Journal of Computer Vision, vol. 5,
    no. 2, pp. 195-212, 1990.
  • Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine
    invariant model-based object recognition", IEEE
    Trans. on Robotics and Automation, vol. 6, no. 5,
    pp. 578-589, October 1990.
  • I. Rigoutsos and R. Hummel, "A Bayesian approach
    to model matching with geometric hashing", CVGIP:
    Image Understanding, vol. 62, pp. 11-26, 1995.

53
References (contd)
  • D. Clemens and D. Jacobs, "Space and time bounds
    on indexing 3D models from 2D images", IEEE
    Transactions on Pattern Analysis and Machine
    Intelligence, vol. 13, no. 10, pp. 1007-1017,
    1991.
  • D. Thompson and J. Mundy, "Three dimensional
    model matching from an unconstrained viewpoint",
    IEEE Conference on Robotics and Automation, pp.
    208-220, 1987.
  • D. Ballard, "Generalizing the Hough transform to
    detect arbitrary patterns", Pattern Recognition,
    vol. 13, no. 2, pp. 111-122, 1981.
  • H. Murase and S. Nayar, "Visual learning and
    recognition of 3D objects from appearance",
    International Journal of Computer Vision, vol.
    14, pp. 5-24, 1995.

54
References (contd)
  • M. Turk and A. Pentland, "Eigenfaces for
    Recognition", Journal of Cognitive Neuroscience,
    Vol. 3, pp. 71-86, 1991.
  • D. Jacobs, "Robust and efficient detection of
    salient convex groups", IEEE Transactions on
    Pattern Analysis and Machine Intelligence, vol.
    18, no. 1, pp. 23-37, 1996.
  • K. Bowyer and C. Dyer, "Aspect graphs: an
    introduction and survey of recent results",
    International Journal of Imaging Systems and
    Technology, vol. 2, pp. 315-328, 1990.