Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions

1
Object Recognition using Local Affine Frames on
Maximally Stable Extremal Regions
  • Stepan Obdrzalek
  • Jiri Matas

2
Proposed Algorithm
  • Identify affine-covariant regions of interest
  • MSER detector
  • Construct local affine frames (LAFs)
  • Invariant to geometry and photometrics
  • Normalize LAF geometry and color
  • Generate descriptors of patches
  • Discrete cosine transformation
  • Recognition and localization
  • Establish tentative correspondences
  • Find a globally consistent subset
  • Infer presence and location of object

3
Requirement for Region Detectors
  • Consistent
  • Discriminative
  • Invariant (actually covariant)
  • Appearance is consistent with the transformation
  • scaling, rotation, shearing
  • Fixed shape is insufficient
  • Shape must be covariant to object position
    (Sticky)

4
Popular Affine Covariant Detectors
  • Harris-Affine
  • Hessian-Affine
  • Edge
  • Intensity Extrema
  • Salient Regions
  • MSER

5
Harris-affine and Hessian-affine
  • Detect interest points
  • Identify corners in image using Harris corner
    detector
  • Determine the characteristic scale
  • Maximization of Laplacian-of-Gaussians
  • Determine an elliptical region for each point
  • Second moment matrix
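
The role of the second moment matrix can be illustrated with a small sketch. It assumes the image gradients (Ix, Iy) inside a window are already available as a list of pairs. The function names and the constant k = 0.04 are illustrative choices, not values from the slides.

```python
def second_moment_matrix(gradients):
    """Sum of outer products of (Ix, Iy) over a window.

    Returns (a, b, c) for the symmetric matrix [[a, b], [b, c]].
    """
    a = sum(ix * ix for ix, _ in gradients)
    b = sum(ix * iy for ix, iy in gradients)
    c = sum(iy * iy for _, iy in gradients)
    return a, b, c


def harris_response(m, k=0.04):
    """Harris corner measure det(M) - k * trace(M)^2 (k is an assumed constant)."""
    a, b, c = m
    return (a * c - b * b) - k * (a + c) ** 2
```

A window whose gradients point in two different directions gives a positive response (corner), while gradients along a single direction give a negative one (edge). The eigenvectors and eigenvalues of the same matrix would define the axes of the elliptical region.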

6
Edge based detector
  • Edges are stable across view, scale, illumination
  • Detect interest points
  • Identify corners in image using Harris corner
    detector
  • Identify edges using the Canny detector
  • Combine to form a parallelogram
  • Determine the characteristic scale
  • Parallelograms where textures hit an extremum

7
Intensity based detector
  • Detect interest points
  • Identify local extremum in intensity
  • Analyze rays projecting radially
  • Determine the characteristic scale
  • Best-fit ellipse that passes through ray-points
    with large intensity shifts

8
Salient region detector
  • Based on PDF of intensity values computed over
    elliptical region
  • Detect interest points
  • Measure the pixel entropy within elliptical
    regions
  • Select regions with high complexity
  • Determine the characteristic scale
  • Optimal scale is determined by the identified
    region
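
A minimal sketch of the entropy measurement, assuming a grayscale image stored as a list of rows and an axis-aligned ellipse (the real detector also searches over ellipse orientation and scale). The helper names are hypothetical.

```python
import math
from collections import Counter


def pixels_in_ellipse(image, center, a, b):
    """Collect intensities inside an axis-aligned ellipse with semi-axes a (rows), b (cols)."""
    cr, cc = center
    return [v for r, row in enumerate(image) for c, v in enumerate(row)
            if ((r - cr) / a) ** 2 + ((c - cc) / b) ** 2 <= 1.0]


def intensity_entropy(pixels):
    """Shannon entropy in bits of the intensity histogram of the given pixels."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A uniform region has zero entropy, while a region with many equally frequent intensity values has high entropy and would be preferred as a salient region.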

9
Maximally Stable Extremal Region (MSER)
  • Connected component of thresholded image
  • Efficient to implement: O(number of pixels)
  • Detect interest points
  • All pixels inside the MSER have higher or lower
    intensities than in the surrounding regions
  • Regions are selected to be stable over intensity
    range
  • Determine the characteristic scale
  • Optimal scale is automatic to MSER algorithm
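
The stability criterion can be sketched for a single dark region grown from a seed pixel. This is a simplified illustration, not the efficient algorithm referred to above: it re-runs a flood fill per threshold, and the parameter `delta` and the function names are assumptions.

```python
from collections import deque


def region_area(image, seed, threshold):
    """Area of the 4-connected component of pixels <= threshold containing seed."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] > threshold:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and image[nr][nc] <= threshold):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def most_stable_threshold(image, seed, thresholds, delta=1):
    """Threshold at which the relative change of the region area is smallest."""
    areas = [region_area(image, seed, t) for t in thresholds]
    best_t, best_score = None, float("inf")
    for i in range(delta, len(thresholds) - delta):
        if areas[i] == 0:
            continue
        score = (areas[i + delta] - areas[i - delta]) / areas[i]
        if score < best_score:
            best_t, best_score = thresholds[i], score
    return best_t, areas
```

The region whose area barely changes over a range of thresholds is the maximally stable one; its boundary then supplies the shape used in the later frame construction.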

10
Runtime comparison
15
Local Affine Frame (LAF) from Features
  • Comparing transformed image regions can be
    simplified by constructing a viewpoint invariant
    coordinate system that is feature-based
  • Coordinates are based on local features
  • Coordinates stick to features
  • Features must describe 6 degrees of freedom
  • Simple points and ellipses are not sufficient
  • MSER regions are sufficient
  • Assumptions
  • Local planarity
  • Perspective camera

16
Local Affine Frame (LAF) from Features
17
Local Affine Frame (LAF) from Features
  • 2D affine transformation has 6 degrees of
    freedom
  • 6 independent constraints must be found
  • Correspondence of 3 non-collinear points
  • Constraints are derived from detected primitives
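
As a worked illustration of the 6 degrees of freedom, the affine transformation can be recovered from 3 non-collinear point correspondences. The sketch below assumes the parameterization x' = a*x + b*y + tx, y' = c*x + d*y + ty; the function name is hypothetical.

```python
def affine_from_three_points(src, dst):
    """Return (a, b, c, d, tx, ty) mapping the three src points onto the dst points."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (u0, v0), (u1, v1), (u2, v2) = dst
    # Edge vectors relative to the first point remove the translation.
    e1x, e1y, e2x, e2y = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    det = e1x * e2y - e2x * e1y
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    f1x, f1y, f2x, f2y = u1 - u0, v1 - v0, u2 - u0, v2 - v0
    # Linear part A satisfies A [e1 e2] = [f1 f2], so A = F E^-1.
    a = (f1x * e2y - f2x * e1y) / det
    b = (f2x * e1x - f1x * e2x) / det
    c = (f1y * e2y - f2y * e1y) / det
    d = (f2y * e1x - f1y * e2x) / det
    # Translation follows from the first correspondence.
    tx = u0 - (a * x0 + b * y0)
    ty = v0 - (c * x0 + d * y0)
    return a, b, c, d, tx, ty
```

Each point correspondence contributes 2 constraints, so 3 points give exactly the 6 needed; collinear points leave the linear part underdetermined, which is why the sketch rejects them.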

18
Local Affine Frame (LAF) from Features
  • Region shape constructions
  • Center of gravity
  • 2 constraints: resolves translation
  • 2x2 covariance matrix Σ
  • 3 constraints: together with the COG, fixes the
    affine transformation up to an unknown rotation
  • Concavities
  • 4 constraints: line and point tangent to line
  • Do not require detection of the whole region
  • Curvature inflection points
  • From concave to convex
  • Straight line segments of boundary
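
A minimal sketch of the center-of-gravity and covariance construction, assuming the region is given as a list of (x, y) pixel coordinates. The Cholesky-based whitening is one possible way to normalize the shape and is an illustrative choice; the result is fixed only up to a rotation, as stated above.

```python
import math


def region_moments(pixels):
    """Center of gravity and covariance (sxx, sxy, syy) of (x, y) coordinates."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    return (mx, my), (sxx, sxy, syy)


def whitening_transform(cov):
    """Inverse of the Cholesky factor L (cov = L L^T) as (w11, w12, w21, w22).

    Applying it to centered coordinates yields unit covariance.
    """
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    return (1 / l11, 0.0, -l21 / (l11 * l22), 1 / l22)
```

Subtracting the center of gravity uses 2 constraints and the covariance uses 3 more; the remaining rotation has to come from another primitive, such as a concavity or an extremal point.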

19
Local Affine Frame (LAF) from Features
  • Intensity constructions: pixels inside a region
  • Orientations of gradients
  • Rotation
  • Direction of dominant texture periodicity
  • Rotation
  • Extrema of RGB or any scalar function
  • 2 constraints

20
Local Affine Frame (LAF) from Features
  • Topology of regions: mutual configuration of
    regions
  • Nested regions
  • Neighboring regions
  • Holes
  • Incident regions

21
LAF Construction
  • Construction of primitives covering 6 degrees of
    freedom

22
Geometric Normalization
  • Translate between canonical / image frame
  • Origin (0,0)^T, basis vectors (1,0)^T, (0,1)^T
  • Measurement Region (MR)
  • Image region used to determine local
    correspondences
  • (-2,3) x (-2,3)
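
The mapping between the canonical frame and the image can be sketched as follows. The frame is assumed to be stored as an origin plus two basis vectors in image coordinates, and the measurement region is sampled with nearest-neighbour lookup on a hypothetical `size` x `size` grid; a real implementation would likely interpolate.

```python
def canonical_to_image(frame, u, v):
    """Map canonical coordinates (u, v) to image (x, y) using the local affine frame."""
    (ox, oy), (e1x, e1y), (e2x, e2y) = frame
    return ox + u * e1x + v * e2x, oy + u * e1y + v * e2y


def sample_measurement_region(image, frame, lo=-2.0, hi=3.0, size=8):
    """Sample the square [lo, hi] x [lo, hi] of the canonical frame from the image."""
    rows, cols = len(image), len(image[0])
    step = (hi - lo) / (size - 1)
    patch = []
    for j in range(size):
        row = []
        for i in range(size):
            x, y = canonical_to_image(frame, lo + i * step, lo + j * step)
            r = min(max(int(round(y)), 0), rows - 1)  # clamp to image bounds
            c = min(max(int(round(x)), 0), cols - 1)
            row.append(image[r][c])
        patch.append(row)
    return patch
```

Because every patch is sampled in its own frame, two views of the same surface related by an affine transformation produce the same normalized patch, up to sampling error.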

23
Photometric Normalization
  • Translate between canonical / image frame
  • Reflections and shadows are ignored
  • Illumination, gain, aperture, etc. is modeled by
    affine transformations of color channels
  • Transformation between two patches I and I' is
  • Requires 6 additional normalization parameters
  • Intensities are affinely transformed to have
  • zero mean
  • unit variance
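
A minimal sketch of the photometric normalization, assuming the patch is a list of (r, g, b) tuples and that each channel is normalized independently. The mean and standard deviation of the three channels are the 6 stored parameters in this sketch; the function names are hypothetical.

```python
import math


def normalize_channel(values):
    """Affinely map one channel to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        std = 1.0  # flat channel: only center it, avoid division by zero
    return [(v - mean) / std for v in values], (mean, std)


def normalize_rgb(patch):
    """Return the three normalized channels and the 6 normalization parameters."""
    channels, params = [], []
    for k in range(3):
        norm, (mean, std) = normalize_channel([p[k] for p in patch])
        channels.append(norm)
        params.extend([mean, std])
    return channels, params
```

Two patches that differ only by a positive gain and an offset per channel normalize to the same values, which is what makes the later descriptor comparison insensitive to such illumination changes.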

24
Normalization of Local Representation
  • Translate between canonical / image frame
  • 12 normalization parameters stored with the
    descriptor
  • Coverage

25
Descriptors
  • Desirable properties
  • Distinguish between large number of regions
  • Maximize ratio of similarities between matches
    and mismatches
  • Robust or invariant to localization errors and
    transformations
  • Efficient on memory and speed
  • Discrete Cosine Transformation (JPEG
    compression)
  • Algorithms require O(n lg n)
  • Hardware implementations
  • Robust to misalignment
  • Same discrimination as SIFT
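
A sketch of a DCT-based descriptor for a normalized square patch. For clarity it evaluates the orthonormal 2D DCT-II directly rather than with a fast O(n lg n) algorithm. The number of retained coefficients (`keep`) and the choice to skip the DC term are assumptions, not details from the slides.

```python
import math


def dct_2d(patch):
    """Orthonormal 2D DCT-II of a square patch (direct evaluation)."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (patch[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def low_frequency_descriptor(patch, keep=3):
    """Keep the keep x keep lowest-frequency coefficients, skipping the DC term."""
    c = dct_2d(patch)
    return [c[u][v] for u in range(keep) for v in range(keep) if (u, v) != (0, 0)]
```

Keeping only low frequencies discards fine detail, which is one plausible reason such a descriptor tolerates small misalignment of the patch.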

26
Matching detected frames with query frames
  • Comparison
  • Compute similarities between all detected and
    query frames
  • Matching
  • Select most likely matches
  • Verification
  • Consistency check that incorporates geometric
    constraints

27
Comparison
  • Determine the probability that a transformation
    can take place
  • Based on training experience
  • If the probability is below a threshold, the
    dissimilarity is set to ∞ (the pair is rejected)
  • Otherwise, determined by descriptor similarity

28
Matching
  • Nearest Match
  • Most common
  • For each detected frame, find closest query
    frame
  • Mutually Nearest Match
  • For symmetric matching (e.g. stereo)
  • For each detected, find closest query
  • For each query, find closest detected
  • Match if (closest query and closest detected
    agree) or (difference < threshold)
  • All (or N most) similar
  • Repetitive structures (many ambiguous
    correspondences)
  • Keep all correspondences, resolution left to
    verification
  • High number of false correspondences
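
The first two strategies can be sketched with brute-force search over descriptor vectors. Euclidean distance is an assumption here; the slides do not name the similarity measure, and the function names are hypothetical.

```python
def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest(descriptors, candidates):
    """For each descriptor, the index of its closest candidate."""
    return [min(range(len(candidates)), key=lambda j: distance(d, candidates[j]))
            for d in descriptors]


def mutually_nearest(detected, query):
    """Pairs (i, j) where detected i and query j are each other's nearest neighbour."""
    d2q = nearest(detected, query)
    q2d = nearest(query, detected)
    return [(i, j) for i, j in enumerate(d2q) if q2d[j] == i]
```

Nearest matching assigns every detected frame a partner, while the mutual variant drops one-sided matches and so returns fewer but cleaner tentative correspondences.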

29
Verification
  • All matches should be consistent with same model
  • 3D models would only be effective if visible
    parts of the image are very large (building
    interiors)
  • Sufficient to model as planar surfaces
  • If 2 tentative correspondences are part of the
    same plane
  • Similar geometric transformation
  • Similar photometric transformation
  • Set of all correspondences is decomposed into
    subsets of consistent correspondences
  • Each subset represents a single plane in the
    scene
  • Small sets are rejected
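
A rough sketch of the decomposition into consistent subsets. It assumes each tentative correspondence already carries the 6 parameters of the affine transformation between its two frames, and it groups them greedily by per-parameter agreement. The tolerance, the minimum group size and the grouping rule itself are illustrative assumptions, not the authors' procedure.

```python
def consistent_subsets(transforms, tol=0.2, min_size=3):
    """Greedily group 6-parameter transforms that agree within tol; drop small groups.

    Returns lists of indices into transforms.
    """
    groups = []
    for idx, t in enumerate(transforms):
        for group in groups:
            ref = transforms[group[0]]  # first member acts as the group's reference
            if all(abs(a - b) <= tol for a, b in zip(t, ref)):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return [g for g in groups if len(g) >= min_size]
```

Comparing raw parameters mixes translation and linear terms on one scale, so a practical version would weight them separately; the same grouping idea would apply to the photometric parameters.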

30
Experimental Validation COIL-100
  • 100 objects
  • 72 images of each object
  • 5° pose intervals
  • Controlled lighting

31
Experimental Validation ZuBuD
  • 201 buildings
  • 5 pictures each

32
Experimental Validation FOCUS
  • Product logos
  • Logos occupy small image portion
  • 360 color images

33
Conclusion
  • Object recognition based on local measurements
  • Affine invariance achieved by expressing local
    appearance in terms of affine covariant
    coordinates
  • Promising results
  • Problems
  • Speed is the primary issue
  • All query frames are compared to all database
    frames
  • Speed can be improved using hashing, possibly at
    the cost of accuracy
  • Planar surface assumption
  • Rigid objects
  • Shadow, etc.