Dense Motion Estimation

Slides: 65
Provided by: staffUst6


Title: Dense Motion Estimation

Dense Motion Estimation
  • Reading: Szeliski, Chapter 8

Dense Motion Estimation
  • 2D motion in video sequence
  • Object tracking
  • Image stabilization

Motion Estimation
  • Error metric
  • Compare images
  • Search technique
  • Full search -- simple but slow
  • Hierarchical coarse-to-fine
  • Fourier transforms
  • Incremental methods
  • Optical flow
  • Multiple independent motions

Translational Alignment
  • Alignment between two images or image patches

Translational Alignment
  • Minimize the Sum of Squared Differences (SSD)
  • Assumption: corresponding pixel values remain
    the same in the two images
  • This is the brightness constancy constraint
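
The equation for this slide did not survive extraction. A hedged reconstruction, using the notation of Szeliski's Chapter 8 (I_0 is the template, I_1 the target, u the displacement, and e_i the per-pixel residual; these symbols are assumptions about what the slide showed):

```latex
E_{\mathrm{SSD}}(\mathbf{u})
  = \sum_i \left[ I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right]^2
  = \sum_i e_i^2
```

Brightness constancy is the assumption that e_i would be zero at the true displacement in the absence of noise.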

Robust Error Metrics
  • Robust norm of error
  • (Huber 1981; Hampel, Ronchetti, Rousseeuw et al.
    1986; Black and Anandan 1996; Stewart 1999)
  • Sum of Absolute Differences (SAD, L1 norm)

Grows less quickly than the quadratic penalty
associated with least squares
E_SAD is NOT differentiable at the origin, so it is
not well suited to gradient descent approaches
Robust Error Metric
  • Smoothly varying function (Black and Rangarajan
    1996)
  • Quadratic for small values, but grows more
    slowly away from the origin
  • Geman-McClure function
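
To make the contrast between these penalties concrete, here is a minimal sketch of the quadratic, absolute-value, and Geman-McClure penalties applied to a residual e. The function names and the scale parameter `a` are illustrative choices, not taken from the slides.

```python
def quadratic_penalty(e):
    """Least-squares penalty: grows as e squared."""
    return e * e


def sad_penalty(e):
    """L1 penalty: grows linearly, but has a kink at e = 0."""
    return abs(e)


def geman_mcclure(e, a=1.0):
    """Geman-McClure penalty: about e**2 near zero, saturating at a**2."""
    return (e * e) / (1.0 + (e * e) / (a * a))


def robust_error(residuals, penalty=geman_mcclure):
    """Sum a per-pixel penalty over a list of residuals."""
    return sum(penalty(e) for e in residuals)
```

With this form, one large outlier residual contributes at most `a**2` to the total, whereas under the quadratic penalty it would dominate the sum.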

Spatially Varying Weights
  • Pixels that may lie outside of the boundaries
  • Partially or completely downweight the
    contribution of certain pixels
  • Erase moving object for background alignment
  • Multiple moving objects

Weighted (or Windowed) SSD function
Weighted SSD
  • Large range of potential motion
  • Bias towards smaller overlap solutions
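
The weighted SSD and the usual remedy for the small-overlap bias can be written as follows. This is a reconstruction under the assumption that w_0 and w_1 are per-pixel weights that are zero outside the valid image region; the symbol A_overlap is introduced here for illustration.

```latex
E_{\mathrm{WSSD}}(\mathbf{u})
  = \sum_i w_0(\mathbf{x}_i)\, w_1(\mathbf{x}_i + \mathbf{u})
    \left[ I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right]^2

A_{\mathrm{overlap}}(\mathbf{u})
  = \sum_i w_0(\mathbf{x}_i)\, w_1(\mathbf{x}_i + \mathbf{u}),
\qquad
\bar{E}(\mathbf{u}) = \frac{E_{\mathrm{WSSD}}(\mathbf{u})}{A_{\mathrm{overlap}}(\mathbf{u})}
```

Dividing by the overlap area gives a per-pixel error, so a shift is not preferred merely because fewer pixels overlap.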

Bias and Gain (Exposure Differences)
  • When the images being aligned were not taken
    with the same exposure
  • Simple model of linear intensity variation
  • This is the bias and gain model

Bias and Gain
  • Least Squares with Bias and Gain
  • Linear regression
  • Color image
  • Estimate bias and gain for each color channel
  • Weighted prediction in video codecs
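
As an illustration of the linear-regression step, the sketch below fits a gain and bias between two already aligned patches. It assumes the model I_1 ≈ (1 + alpha) I_0 + beta; the function name `fit_bias_gain` is hypothetical. For a color image the same fit would be run once per channel.

```python
import numpy as np


def fit_bias_gain(patch0, patch1):
    """Least-squares fit of patch1 ~ (1 + alpha) * patch0 + beta.

    Returns (alpha, beta). Both patches must have the same shape and
    are assumed to be spatially aligned already.
    """
    x = np.asarray(patch0, dtype=float).ravel()
    y = np.asarray(patch1, dtype=float).ravel()
    # Design matrix for ordinary linear regression: y = gain * x + beta.
    design = np.column_stack([x, np.ones_like(x)])
    (gain, beta), *_ = np.linalg.lstsq(design, y, rcond=None)
    return gain - 1.0, beta
```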

Cross-Correlation
  • An alternative to taking intensity differences
  • Maximize the product of the two aligned images

Is bias and gain modeling unnecessary?
Not quite: if a very bright patch exists in one image,
the maximum product may lie in that area
Normalized Cross-Correlation
Uses the mean intensities of the corresponding patches
  • NCC score lies in [-1, 1]
  • Works well when matching images taken with
    different exposure
  • Degrades for noisy low-contrast regions (zero
    variance)

Normalized Cross-Correlation
  • Normalized SSD score (Criminisi, Shotton, Blake
    et al., 2007).
  • Produces results comparable to NCC
  • More efficient when applied to a large number of
    overlapping patches using a moving average
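
A minimal sketch of normalized cross-correlation between two equally sized patches follows. Returning NaN for a zero-variance patch is a choice made here to flag the undefined case; it is not something the slides specify.

```python
import numpy as np


def ncc(patch0, patch1):
    """Normalized cross-correlation of two same-shape patches, in [-1, 1]."""
    a = np.asarray(patch0, dtype=float)
    b = np.asarray(patch1, dtype=float)
    a = a - a.mean()  # remove each patch's mean (handles bias)
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))  # handles gain
    if denom == 0.0:
        return float("nan")  # undefined when a patch has zero variance
    return float(np.sum(a * b) / denom)
```

Because the mean is subtracted and the result is divided by both standard deviations, the score is unchanged by a positive gain and any bias applied to either patch.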

Hierarchical Motion Estimation
  • How can we find its minimum?
  • Full search over some range of shifts
  • Often used for block matching in motion
    compensated video compression
  • Simple to implement but slow
  • To accelerate the search process
  • Hierarchical motion estimation

Hierarchical Motion Estimation
  • Steps
  • Construct image pyramid
  • At coarser levels, search over a smaller number
    of discrete pixels
  • Motion estimation at coarse level is used to
    initialize a smaller local search at the next
    finer level
  • Not guaranteed to produce the same results as a
    full search, but works almost as well and much
    faster

Hierarchical Motion Estimation
  • Image downsampling
  • Coarsest level: search for the best displacement
    that minimizes the difference between the two
    downsampled images
  • Full search over the reduced range of shifts
  • Use the result to predict a likely displacement
    at the next finer level
  • Search over displacements is repeated at the finer
    level over a much narrower range
  • Incremental refinement step with warped image
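
The steps above can be sketched as a small coarse-to-fine integer search. This is an illustrative toy, not the presenter's implementation: the 2x2 box-filter pyramid, the overlap-normalized error, and all function names and default radii are assumptions.

```python
import numpy as np


def mean_sq_diff(i0, i1, u, v):
    """Mean squared difference between I0(x) and I1(x + (u, v)) on the overlap."""
    h, w = i0.shape
    x0, x1 = max(0, -u), min(w, w - u)
    y0, y1 = max(0, -v), min(h, h - v)
    if x1 <= x0 or y1 <= y0:
        return np.inf  # no overlap for this shift
    d = i1[y0 + v:y1 + v, x0 + u:x1 + u] - i0[y0:y1, x0:x1]
    return float(np.mean(d * d))


def full_search(i0, i1, center, radius):
    """Exhaustive search over integer shifts within radius of center."""
    cu, cv = center
    best = None
    for v in range(cv - radius, cv + radius + 1):
        for u in range(cu - radius, cu + radius + 1):
            err = mean_sq_diff(i0, i1, u, v)
            if best is None or err < best[0]:
                best = (err, u, v)
    return best[1], best[2]


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])


def coarse_to_fine(i0, i1, levels=3, coarse_radius=4, fine_radius=1):
    """Full search at the coarsest level, then narrow refinements."""
    pyramid = [(i0, i1)]
    for _ in range(levels - 1):
        a, b = pyramid[-1]
        pyramid.append((downsample(a), downsample(b)))
    u, v = full_search(*pyramid[-1], (0, 0), coarse_radius)
    for a, b in reversed(pyramid[:-1]):
        # Double the coarse estimate to predict the finer-level shift.
        u, v = full_search(a, b, (2 * u, 2 * v), fine_radius)
    return u, v
```

With three levels, a coarse radius of 4 covers shifts of up to 16 pixels at full resolution while evaluating far fewer candidates than a full-resolution search over the same range.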

Incremental Refinement
  • Search so far: estimates to the nearest integer pixel
  • Higher accuracy is required for stabilization or
  • Sub-pixel estimates
  • Evaluate several values (u,v) around the best
    value found so far
  • Interpolate the matching score to find the
    analytic minimum
  • Gradient descent on SSD energy function
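
One simple way to interpolate the matching score is to fit a parabola through the score at the best integer shift and its two neighbors along one axis. The sketch below shows that one-dimensional case; the helper name `parabola_offset` is hypothetical, and applying it separately per axis is a simplifying assumption.

```python
def parabola_offset(e_minus, e_zero, e_plus):
    """Sub-pixel offset of the minimum of a parabola through three scores.

    e_minus, e_zero, e_plus are matching errors at shifts -1, 0, +1
    relative to the best integer shift. Returns an offset in (-1, 1),
    or 0.0 when the three samples do not form a convex parabola.
    """
    curvature = e_minus - 2.0 * e_zero + e_plus
    if curvature <= 0.0:
        return 0.0
    return 0.5 * (e_minus - e_plus) / curvature
```

For scores sampled from E(x) = (x - 0.3)^2, i.e. 1.69, 0.09, and 0.49, the formula returns 0.3.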

Incremental Refinement
  • SSD energy and Taylor series expansion
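
The expansion this slide refers to was lost with the slide image. A hedged reconstruction in the notation of Szeliski's Chapter 8, where J_1 denotes the image gradient of I_1 and e_i the current residual:

```latex
\begin{aligned}
E_{\mathrm{LK\text{-}SSD}}(\mathbf{u} + \Delta\mathbf{u})
  &= \sum_i \left[ I_1(\mathbf{x}_i + \mathbf{u} + \Delta\mathbf{u}) - I_0(\mathbf{x}_i) \right]^2 \\
  &\approx \sum_i \left[ I_1(\mathbf{x}_i + \mathbf{u})
     + J_1(\mathbf{x}_i + \mathbf{u})\,\Delta\mathbf{u} - I_0(\mathbf{x}_i) \right]^2 \\
  &= \sum_i \left[ J_1(\mathbf{x}_i + \mathbf{u})\,\Delta\mathbf{u} + e_i \right]^2,
\end{aligned}
\qquad
J_1 = \nabla I_1 = \left( \frac{\partial I_1}{\partial x}, \frac{\partial I_1}{\partial y} \right)

A\,\Delta\mathbf{u} = \mathbf{b},
\qquad
A = \sum_i J_1^T(\mathbf{x}_i + \mathbf{u})\, J_1(\mathbf{x}_i + \mathbf{u}),
\qquad
\mathbf{b} = -\sum_i e_i\, J_1^T(\mathbf{x}_i + \mathbf{u})
```

Setting the derivative of the linearized energy with respect to the update to zero gives the 2x2 normal equations on the last line.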

Lucas and Kanade (1981)
Incremental Refinement
Optical flow constraint or brightness constancy
constraint equation
Incremental Refinement
  • For efficiency
  • Precomputing the Hessian and Jacobian images
    saves significant computation
  • Precomputing the inner product between the gradient
    field and shifted versions of I1 allows the
    iterative re-computation of e_i to be performed
    in constant time (independent of the number of
    pixels)

Incremental Refinement
  • Iterations
  • The effectiveness relies on the quality of Taylor
    series approximation
  • When far away from the true displacement (say,
    1-2 pixels), several iterations may be needed
  • It is possible to estimate a value for J_1 using
    a least squares fit to a series of larger
    displacements in order to increase the range of
    convergence (Jurie and Dhome 2002) or to learn
    a special-purpose recognizer for a given patch

Incremental Refinement
  • Stopping criterion
  • Monitor the magnitude of the displacement
    correction and stop when it drops below a
    certain threshold (say, 1/10 of a pixel)
  • For larger motions
  • Combine the incremental update rule with a
    hierarchical coarse-to-fine search strategy
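
The iteration and stopping rule can be sketched for a pure translation as below. This is a toy under stated assumptions: bilinear warping with border clamping, central-difference gradients of the warped image, a fixed border excluded from the sums, and hypothetical function names. It recomputes the gradients every iteration rather than using the precomputation tricks mentioned earlier.

```python
import numpy as np


def bilinear_shift(img, u, v):
    """Sample img at (x + u, y + v) for every pixel, clamping at the border."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x = np.clip(xs + u, 0, w - 1)
    y = np.clip(ys + v, 0, h - 1)
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom


def lucas_kanade_translation(i0, i1, u0=(0.0, 0.0), tol=0.1,
                             max_iter=50, border=5):
    """Estimate u such that I1(x + u) ~ I0(x) by Gauss-Newton updates."""
    u = np.array(u0, dtype=float)
    inner = (slice(border, -border), slice(border, -border))
    for _ in range(max_iter):
        warped = bilinear_shift(i1, u[0], u[1])
        gy, gx = np.gradient(warped)           # Jacobian image of warped I1
        e = (warped - i0)[inner].ravel()       # residuals e_i
        J = np.column_stack([gx[inner].ravel(), gy[inner].ravel()])
        A = J.T @ J                            # 2x2 Gauss-Newton Hessian
        b = -J.T @ e                           # gradient-weighted residual
        du = np.linalg.solve(A, b)
        u += du
        if np.hypot(du[0], du[1]) < tol:       # stop below 1/10 pixel
            break
    return u
```

For motions larger than a pixel or two, `u0` would typically come from a coarser pyramid level rather than zero.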

Incremental Refinement
  • The linear system can be poorly conditioned when
    the patch being aligned lacks two-dimensional
    texture (the aperture problem)

Uncertainty Modeling
  • Capture the reliability of a particular
    patch-based motion estimate
  • Simplest model: a covariance matrix
  • Captures the expected variance in the motion
    estimate in all possible directions
  • Under small amounts of additive Gaussian noise

Uncertainty modeling
  • For larger amounts of noise, the linearization
    performed by the Lucas-Kanade algorithm is only
    approximate
  • The minimum and maximum eigenvalues of the
    Hessian A can now be interpreted as the (scaled)
    inverse variances in the least-certain and
    most-certain directions of motion.
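
The eigenvalue interpretation can be checked numerically. The sketch below builds the 2x2 Hessian A from a patch's gradients and, assuming the small additive Gaussian noise model above, forms the covariance as sigma_n^2 A^{-1}. Function names and the synthetic patches in the usage note are illustrative.

```python
import numpy as np


def patch_hessian(patch):
    """2x2 Gauss-Newton Hessian A = sum of outer products of the gradient."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])


def motion_covariance(patch, noise_sigma=1.0):
    """Covariance of the translation estimate: sigma_n**2 * inverse(A).

    Raises numpy.linalg.LinAlgError when A is singular, i.e. when the
    patch has no two-dimensional texture.
    """
    return noise_sigma ** 2 * np.linalg.inv(patch_hessian(patch))
```

A patch containing only a horizontal intensity ramp (a pure edge-like structure) gives a Hessian whose smaller eigenvalue is zero, so motion along the edge is unconstrained; a patch such as `x * y` has two positive eigenvalues and a finite covariance in every direction.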

Bias and gain, weighting, and robust error metrics
  • 4x4 system of equations to estimate the
    displacement update together with bias and gain
  • Weighted SSD using the Lucas-Kanade algorithm
  • Robust error metrics
  • Solved using the iteratively reweighted least
    squares technique
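
To show the idea of iteratively reweighted least squares in isolation, here is a sketch on a toy line-fitting problem rather than on image alignment. The weights come from the Geman-McClure penalty (weight proportional to rho'(e)/e); the function name, the scale `a`, and the fixed iteration count are assumptions.

```python
import numpy as np


def irls_line_fit(x, y, a=5.0, n_iter=10):
    """Robustly fit y ~ slope * x + intercept with Geman-McClure weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])
    weights = np.ones_like(y)  # first pass is ordinary least squares
    for _ in range(n_iter):
        sw = np.sqrt(weights)
        # Weighted least squares: scale each row by sqrt(weight).
        params, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        residuals = design @ params - y
        # Geman-McClure weight, proportional to rho'(e) / e.
        weights = 1.0 / (1.0 + (residuals / a) ** 2) ** 2
    return params
```

In the alignment setting, the same loop would wrap the Lucas-Kanade normal equations, with each pixel's contribution to A and b scaled by its current weight.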

8.2 Parametric Motion
  • More sophisticated motion models
  • Affine, has 6 unknowns
  • Full search over possible range is impractical
  • Lucas-Kanade algorithm → generalized to
    parametric motion models

(Lucas and Kanade 1981; Rehg and Witkin 1991; Fuh
and Maragos 1991; Bergen, Anandan, Hanna et al.
1992; Shashua and Toelg 1997; Shashua and Wexler
2001; Baker and Matthews 2004).
Parametric Motion
  • Instead of using a single constant translation u
  • Use a spatially varying motion field or
    correspondence map

Parametric Motion
Incremental Refinement
  • Translational motion
  • Parametric motion
  • Jacobian
  • (Gauss-Newton) Hessian
  • Gradient weighted residual vector
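
The equations labeled on this slide were lost in extraction. A hedged reconstruction following Szeliski's notation, where x'(x; p) is the parametric motion field, J_{x'} its Jacobian with respect to the parameters p, and e_i the current residual. The affine parameter ordering shown is an assumption.

```latex
E_{\mathrm{LK\text{-}PM}}(\mathbf{p} + \Delta\mathbf{p})
  \approx \sum_i \left[ J_1(\mathbf{x}'_i)\,\Delta\mathbf{p} + e_i \right]^2,
\qquad
J_1(\mathbf{x}'_i) = \nabla I_1(\mathbf{x}'_i)\, J_{\mathbf{x}'}(\mathbf{x}_i),
\qquad
J_{\mathbf{x}'} = \frac{\partial \mathbf{x}'}{\partial \mathbf{p}}

A = \sum_i J_{\mathbf{x}'}^T(\mathbf{x}_i)
    \left[ \nabla I_1^T(\mathbf{x}'_i)\, \nabla I_1(\mathbf{x}'_i) \right]
    J_{\mathbf{x}'}(\mathbf{x}_i),
\qquad
\mathbf{b} = -\sum_i J_{\mathbf{x}'}^T(\mathbf{x}_i)
    \left[ e_i\, \nabla I_1^T(\mathbf{x}'_i) \right]

\text{Affine, } \mathbf{p} = (t_x, t_y, a_{00}, a_{01}, a_{10}, a_{11}):
\qquad
J_{\mathbf{x}'} =
\begin{bmatrix}
1 & 0 & x & y & 0 & 0 \\
0 & 1 & 0 & 0 & x & y
\end{bmatrix}
```

For pure translation J_{x'} is the 2x2 identity, and A and b reduce to the translational Lucas-Kanade quantities.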

Patch-based Approximation
  • Expensive computation of A, b
  • N pixels and n parameters: O(n^2 N)
  • Divide the image into sub-blocks P_j and only
    accumulate the simpler 2x2 quantities inside
    each block

Compositional Approach
  • Complex parametric motion such as homography
  • Warp target image I_1 according to the current
    motion estimate

Compositional Approach
  • The warped image and the template are assumed to
    be fairly similar, so only an incremental
    parametric motion is required, i.e. the
    incremental motion can be evaluated around p = 0

Szeliski and Shum (1997)
Compositional Approach
  • Homography

Compositional Approach
  • If the appearance of the warped and template
    images is similar enough, we can replace the
    gradient of the warped image with the gradient
    of the template
  • Precompute the Hessian matrix
  • The residual vector b can also be partially
    precomputed, i.e., the steepest descent images
    can be precomputed and stored for later
    multiplication with the error images

Inverse Compositional Algorithm
Baker and Matthews (2004)
  • Rather than (conceptually) re-warping the warped
    target image I_1(x), they instead warp the
    template image I_0(x) and minimize
  • Identical to the forward warped algorithm, except
    that the gradients of the warped image are
    replaced by the gradients of the template
  • The sign of e_i differs

Inverse Compositional Algorithm
Non-Linear Least Squares
  • Solve using
  • Update
  • The parameter λ is an additional damping
    parameter used to ensure that the system takes a
    downhill step in energy (squared error) and is
    an essential component of the Levenberg-Marquardt
    algorithm
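
A damped step of this kind can be sketched as below. The form (A + lam * diag(A)) dp = b is one common way to write the damped normal equations; the function name `lm_step` and the symbol `lam` for the damping parameter are assumptions, since the slide's equations were lost.

```python
import numpy as np


def lm_step(A, b, lam):
    """Solve the damped normal equations (A + lam * diag(A)) dp = b."""
    A = np.asarray(A, dtype=float)
    damped = A + lam * np.diag(np.diag(A))
    return np.linalg.solve(damped, np.asarray(b, dtype=float))
```

With `lam = 0` this is the plain Gauss-Newton step; increasing `lam` shortens the step, which is how the damping encourages a downhill move when the linearization is poor.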

8.4 Optical Flow
  • Optical flow or optic flow is the pattern of
    apparent motion of objects, surfaces, and edges
    in a visual scene caused by the relative motion
    between an observer (an eye or a camera) and the
    scene
  • The concept of optical flow was first studied in
    the 1940s and ultimately published by American
    psychologist James J. Gibson as part of his
    theory of affordance.
  • Optical flow techniques utilize this motion of
    objects, surfaces, and edges
  • Applications: motion detection, object
    segmentation, time-to-collision and focus of
    expansion calculations, motion compensated
    encoding, and stereo disparity measurement

8.4 Optical Flow
  • Independent estimate of motion at each pixel
  • Number of variables is twice the number of
    measurements -- underconstrained problem
  • Two typical approaches
  • Patch-based or window-based approach
  • Add smoothness terms on the u_i field using
    regularization or Markov random fields and
    search for a global minimum

Optical Flow
  • Phase correlation: inverse of the normalized
    cross-power spectrum
  • Block-based methods: minimizing the sum of squared
    differences or sum of absolute differences, or
    maximizing normalized cross-correlation
  • Differential methods of estimating optical flow,
    based on partial derivatives of the image signal
    and/or the sought flow field and higher-order
    partial derivatives, such as
  • Lucas-Kanade method: regarding image patches and
    an affine model for the flow field
  • Horn-Schunck method: optimizing a functional
    based on residuals from the brightness constancy
    constraint, and a particular regularization term
    expressing the expected smoothness of the flow
    field
  • Buxton-Buxton method: based on a model of the
    motion of edges in image sequences
  • Black-Jepson method: coarse optical flow via
    correlation
  • General variational methods: a range of
    modifications/extensions of Horn-Schunck, using
    other data terms and other smoothness terms.
  • Discrete optimization methods: the search space
    is quantized, and then image matching is
    addressed through label assignment at every
    pixel, such that the corresponding deformation
    minimizes the distance between the source and the
    target image. The optimal solution is often
    recovered through min-cut max-flow algorithms,
    linear programming or belief propagation methods.
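
As an illustration of the first item in this list, the sketch below estimates a global integer translation by phase correlation. It assumes the two images are the same size and related by a (circular) shift; the function name and the small constant guarding the division are choices made here.

```python
import numpy as np


def phase_correlation(i0, i1):
    """Integer shift (u, v) such that I1(x + u, y + v) ~ I0(x, y)."""
    f0 = np.fft.fft2(i0)
    f1 = np.fft.fft2(i1)
    cross = f1 * np.conj(f0)
    # Normalize the cross-power spectrum so only phase remains.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint correspond to negative shifts.
    v = py - h if py > h // 2 else py
    u = px - w if px > w // 2 else px
    return int(u), int(v)
```

The inverse transform of the normalized spectrum is ideally a single impulse located at the shift, which is why the peak location alone gives the displacement.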

Optical Flow
  • Regularization-based framework (Horn and Schunck
    1981)
  • Instead of solving for each motion (or motion
    update) independently
  • Simultaneously minimized over all flow vectors
  • Smoothness constraints
  • Brightness constancy constraint
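
A compact sketch of the classic Horn-Schunck iteration is given below to show how the smoothness and brightness constancy terms interact. It is a simplified toy, not the presenter's code: it uses a 4-neighbor average, gradients of the averaged frame pair, a single resolution level, and a fixed iteration count, and the default `alpha` is only sensible for images with intensities of order one.

```python
import numpy as np


def neighbor_average(f):
    """Average of the four axis-aligned neighbors, replicating the border."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])


def horn_schunck(i0, i1, alpha=0.1, n_iter=300):
    """Dense flow (u, v) with I1(x + u, y + v) ~ I0(x, y)."""
    i0 = np.asarray(i0, dtype=float)
    i1 = np.asarray(i1, dtype=float)
    iy, ix = np.gradient(0.5 * (i0 + i1))  # spatial derivatives
    it = i1 - i0                           # temporal derivative
    u = np.zeros_like(i0)
    v = np.zeros_like(i0)
    for _ in range(n_iter):
        u_bar = neighbor_average(u)
        v_bar = neighbor_average(v)
        # Brightness constancy residual of the smoothed flow, scaled by
        # the smoothness weight and local gradient energy.
        r = (ix * u_bar + iy * v_bar + it) / (alpha ** 2 + ix ** 2 + iy ** 2)
        u = u_bar - ix * r
        v = v_bar - iy * r
    return u, v
```

Each update pulls the flow toward its neighborhood average (smoothness) and then corrects it along the image gradient (brightness constancy), so all flow vectors are coupled rather than solved independently.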

Optical Flow
  • Combine local and global flow estimation
  • Using a locally aggregated Hessian as the
    brightness constancy term
  • Replace the per-pixel Hessian and residual
    with their locally aggregated versions

Optical Flow
  • Combine global (parametric) and local motion
  • Estimate either per-image or per-segment affine
    motion models combined with per-pixel residual
  • Image brightness varying
  • Gradient descent and coarse-to-fine continuation
    methods to minimize the global energy function
  • Combinatorial optimization methods based on
    Markov random fields

Multi-frame Motion Estimation
  • Filter the spatio-temporal volume using oriented
    or steerable filters (Heeger 1988)
  • Spatio-temporal filtering uses a 3D volume around
    each pixel to determine the best orientation in
    space-time, which corresponds to a pixel's
    velocity

Multi-frame Motion Estimation
  • Spatio-temporal filters have moderately large
    extents, which severely degrades the quality of
    their estimates near motion discontinuities
  • An alternative to full spatio-temporal filtering
    is to estimate more local spatio-temporal
    derivatives and use them inside a global
    optimization framework to fill in textureless
    regions (Bruhn, Weickert, and Schnörr 2005;
    Govindu 2006).

8.5 Layered Motion
  • Global smoothness? Local neighborhood
  • Visual motion is caused by the movement of a
    number of objects at different depths
  • Pixels are grouped into appropriate objects or
    layers
  • The pixel motions can be described more succinctly
    and estimated more reliably

Layered Motion
  • Compact representation
  • Exploit the information available in multiple
    video frames
  • Accurately modeling the appearance of pixels near
    motion discontinuities
  • Image-based rendering
  • Object-level video editing

Layered Motion
Wang and Adelson (1994)
  • How to compute layered representation of a video?
  • Estimate affine motion models over a collection
    of non-overlapping patches
  • Cluster the estimates using K-means
  • Alternate between
  • Assigning pixels to layers
  • Recomputing the motion estimates for each layer
  • Construct layers
  • by warping and merging the various layer pieces
    from all frames together
  • Median filter (produces sharp composite layers
    that are robust to small intensity variations;
    also used to infer occlusion between layers)
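
The clustering step can be illustrated with a small K-means over per-patch affine parameter vectors. This is only a sketch of that one step: the 6-vector layout, the farthest-point initialization, and the function name `kmeans` are assumptions, and the later pixel assignment, re-estimation, and compositing steps are not shown.

```python
import numpy as np


def kmeans(points, k, n_iter=20):
    """Cluster rows of points into k groups; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    # Farthest-point initialization keeps the result deterministic.
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each motion estimate to its nearest cluster center.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers
```

Each resulting center is a candidate layer motion, to which pixels can then be assigned in the alternating loop described above.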

Layered Motion
Weiss and Adelson (1996)
  • Probabilistic mixture model to
  • infer both the optimal number of layers and
  • the per-pixel layer assignments
  • Per-layer affine motion → smooth regularized
    per-pixel motion (Weiss 1997)
  • Better handle curved layers

Layered Motion
  • Distinction between motion estimating and layer
  • Later estimating the layer colors
  • Generalized to account for real-world rigid
    motion scenes

Baker, Szeliski, and Anandan (1998)
A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
  • Motion of each frame
  • Described using a 3D camera model
  • Motion of each layer
  • Described using 3D plane equation
  • Per-pixel residual depth offsets
  • Initial layers estimation
  • Similar to Wang and Adelson, 1994
  • Affine motion → homography
  • Final model refinement
  • Jointly re-optimize the layer pixel color and
    opacity and depth, plane, and motion parameters
  • By minimizing the discrepancy between the
    re-synthesized and observed motion sequence

A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
  • Results

(g) before and (h) after residual depth estimation
A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
  • Motion boundaries and layer assignments are much
  • Individual layer color values are also sharper
  • because of per-pixel depth offsets
  • Require a rough initial assignment
  • Improvement: Torr, Szeliski, and Anandan (2001)
  • Automated Bayesian techniques for
  • Initializing the system and
  • Determining the optimal number of layers

Layered Motion
  • Active research area
  • Sawhney and Ayer 1996
  • Jojic and Frey 2001
  • Xiao and Shah 2005
  • Kumar, Torr, and Zisserman 2008
  • Thayananthan, Iwasaki, and Cipolla 2008
  • Schoenemann and Cremers 2008
  • Alternate between segmentation and estimation of
    optical flow

Transparent Layers and Reflections
  • Reflections in windows, picture frames
  • Reflection model: how much intensity each layer
    contributed to the final image

Glass surface
The amount of reflected light is quite low
compared to the transmitted light (the picture of
the girl) and yet the algorithm is still able to
recover both layers.
Transparent Layers and Reflections
  • If the motions of individual layers are known
  • Suffer from low-frequency ambiguities
  • Especially if a layer lacks dark pixels
  • Or if the motion is uni-directional

Transparent Layers and Reflections
Szeliski, Avidan, and Anandan (2000)
  • Simultaneous estimation of motion and layer
  • Alternating between
  • Robustly computing the motion layers
  • Making conservative estimates of the layer
  • Final motion and layer
  • Polished using gradient descent on joint
    constrained least squares
  • Parametric motion models
  • Only valid for planar reflectors and scenes with
    shallow depth
  • More extensions: Swaminathan, Kang, Szeliski et
    al. 2002; Criminisi, Kang, Swaminathan et al.
    2005; Tsin, Kang, and Szeliski 2006