Fusion of frequency and spatial domain information for motion analysis

Provided by: firdaus8

1
Fusion of frequency and spatial domain
information for motion analysis
  • Alexia Briassouli, Narendra Ahuja
  • Beckman Institute, Dept of ECE
  • UIUC
  • ICPR-2004

2
Motion Estimation Techniques
  • Single motion estimation based on optical flow
    uses Brightness Constancy Constraint
  • Data Conservation Constraint
  • Assumes that image brightness of a region remains
    constant while its location may change
  • Violated at motion and occlusion boundaries,
    specular reflections and transparency
  • Spatial Coherence Constraint
  • Assumes that surfaces have spatial extent and
    optical flow within the neighbourhood changes
    gradually
  • Violated at surface boundaries
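
For small displacements, the brightness constancy constraint linearizes to Ix·u + Iy·v + It = 0. The sketch below solves that equation in the least-squares sense for a single translation over a whole window. The smooth test pattern, the 64x64 size and the function name flow_from_brightness_constancy are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def flow_from_brightness_constancy(frame0, frame1):
    """Least-squares solution of Ix*u + Iy*v + It = 0 over the whole window."""
    Iy, Ix = np.gradient(frame0)          # derivatives along rows (y) and columns (x)
    It = frame1 - frame0                  # temporal derivative, one frame apart
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

def pattern(x, y):
    # Smooth synthetic luminance (assumed), so the linearization is accurate.
    return np.sin(0.2 * x) + np.cos(0.15 * y) + np.sin(0.1 * (x + y))

y, x = np.mgrid[0:64, 0:64].astype(float)
true_u, true_v = 0.3, -0.2                # sub-pixel translation per frame
frame0 = pattern(x, y)
frame1 = pattern(x - true_u, y - true_v)  # same pattern, displaced by (u, v)

u, v = flow_from_brightness_constancy(frame0, frame1)
print(f"estimated flow: u={u:.3f}, v={v:.3f}")
```

The estimate is only approximate: the linearization drops higher-order terms, and a window containing two motions would violate the single-translation assumption.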

3
Spatial Approaches
  • Detection and estimation of multiple motions is a
    segmentation problem
  • Ill-posed: requires simultaneous determination
    of optical flow and motion boundaries
  • Generalized Aperture Problem
  • Aperture size must be large to detect the
    presence of motion (to constrain the solution)
  • Aperture size must be small to avoid violating
    optical flow assumptions and to avoid multiple
    motions
  • Most spatial approaches use iterative EM
    techniques on parametric models

4
Spatiotemporal Energy Model
  • Adelson and Bergen, 1985, Journal of the Optical
    Society of America A
  • Provides an optical flow analysis for single-body
    motion based on spectral analysis
  • A motion sequence is a pattern in the x-y-t space
  • Velocity of motion corresponds to a 3D
    orientation in this space
  • Motion orientation and motion energy can be
    extracted by linear filters oriented in
    space-time and tuned in spatial frequency
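
The link between velocity and orientation in x-y-t space can be checked numerically: the 3D spectrum of a translating pattern lies on a plane through the origin. The sketch below assumes circular (wrap-around) integer shifts and as many frames as the image is wide, so that the discrete plane is exact. The sizes and the velocity are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                      # image is N x N and the sequence has N frames (assumed)
u, v = 2, 1                 # pixels per frame along x (columns) and y (rows)

img = rng.random((N, N))
# Frame t is the first frame circularly shifted by (v*t, u*t).
seq = np.stack([np.roll(img, (v * t, u * t), axis=(0, 1)) for t in range(N)])

energy = np.abs(np.fft.fftn(seq)) ** 2          # axes are (t, y, x)
kt, ky, kx = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")

# Discrete version of the motion plane: kt + v*ky + u*kx = 0 (mod N).
on_plane = (kt + v * ky + u * kx) % N == 0
fraction_on_plane = energy[on_plane].sum() / energy.sum()
print(f"fraction of spectral energy on the motion plane: {fraction_on_plane:.6f}")
```

The normal of that plane, (u, v, 1) in (kx, ky, kt) coordinates, encodes the velocity.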

5
Spectral vs. Spatial Approaches
  • Yu, et al. showed that
  • Spectral motion model describes both occlusion
    and transparency based discontinuities
  • Spatial model is more appropriate for occlusion
    analysis because it provides finer resolution and
    requires fewer frames
  • Spatially
  • Image sequence can be decomposed into different
    layers, where each layer has a smooth optical
    flow field
  • Discontinuities due to occlusion and transparency
    are different in the spatial domain
  • Occlusion is a step-function at the occlusion
    boundary
  • Transparency results from overlap of 2 motions in
    the window
  • Cannot be unified into a single model which
    accounts for both kinds of multiple motions

6
Spectral Analysis of Occlusion
  • Occlusion in spatial domain is modeled by
  • where
  • x: 2D spatial coordinates
  • U(x): Heaviside unit step function describing
    the occlusion boundary
  • I1(x): occluding 2D signal (foreground) moving
    with velocity v1 = (u1, v1)
  • I2(x): occluded 2D signal (background) moving
    with velocity v2 = (u2, v2)
  • FT of the signal is
  • where
  • k: spatial frequency (ωx, ωy)^T
  • ωt: temporal frequency
  • The first term is the spectrum of the occluding
    signal along with a distortion term A(k)
  • The second term is the exact spectrum of the
    occluded signal
  • The third term is a convolution of a 3D spectral
    line passing through the origin and the spectrum
    of the occluded signal

7
Spectral Analysis of Transparency
  • Transparency is viewed as a special case of
    occlusion by substituting the Heaviside function
    with a real constant a (0 < a < 1)
  • The corresponding spectrum is characterized by 2
    oriented planes without distortion
  • Though in the case of occlusion there is an
    additional distortion term, most of the energy is
    concentrated on the two spectral planes
  • Thus both occlusion and transparency are
    characterized by multiple spectral planes passing
    through the origin
  • Corresponding motion is described by the normals
    to these planes!!
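
The two-plane structure for transparency follows from linearity of the FT and can be reproduced with a toy sequence. The sketch below mixes two translating layers with weights a and 1 - a and measures how much spectral energy falls on the two motion planes. Circular shifts, zero-mean random textures, the velocities and a = 0.6 are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                                   # N x N frames, N frames (assumed)
a = 0.6                                  # transparency constant, 0 < a < 1
(u1, v1), (u2, v2) = (2, 1), (-1, 3)     # hypothetical layer velocities (x, y)

layer1 = rng.standard_normal((N, N))
layer2 = rng.standard_normal((N, N))

def translate(img, u, v, t):
    """Circularly shift img by t frames of motion (u along x, v along y)."""
    return np.roll(img, (v * t, u * t), axis=(0, 1))

seq = np.stack([a * translate(layer1, u1, v1, t) + (1 - a) * translate(layer2, u2, v2, t)
                for t in range(N)])

energy = np.abs(np.fft.fftn(seq)) ** 2   # axes are (t, y, x)
kt, ky, kx = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
plane1 = (kt + v1 * ky + u1 * kx) % N == 0
plane2 = (kt + v2 * ky + u2 * kx) % N == 0

fraction_plane1 = energy[plane1].sum() / energy.sum()
fraction_both = energy[plane1 | plane2].sum() / energy.sum()
print(f"plane 1 only: {fraction_plane1:.3f}, both planes: {fraction_both:.6f}")
```

One plane alone captures only part of the energy, while the two planes together capture essentially all of it. For occlusion the slides state that an additional distortion term appears, which this additive toy model does not reproduce.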

8
Comparison
  • With spectral analysis, multiple planes describe
    both occlusion and transparency based
    discontinuities
  • Spatial analysis is able to describe only
    occlusion based motion
  • However, there is a severe problem in obtaining
    the energy spectrum of an image sequence
  • Due to the block effect of the DFT
  • To overcome this, the LFT is used, but the
    resulting blurring of the spectrum reduces the
    resolution of the spectral model
  • To increase resolution, the LFT window needs to
    be increased, but in a large spatio-temporal
    neighborhood the constant motion assumption is
    endangered
  • Therefore occlusions are best analyzed in spatial
    domain

9
Why Integrated Approach?
  • Spectral analysis has the following advantages
  • Motion estimation is based on phase changes of
    the FT, so it is robust to global illumination
    changes
  • Computational cost is significantly lower
  • Size and shape of moving objects do not affect
    analysis
  • However spectral information suffers from
    resolution problems
  • Use spatial information to improve analysis
    accuracy

10
Frequency Domain
  • M: number of moving objects l, 1 ≤ l ≤ M
  • luminance at pixel and velocity
  • The FT of object l is
  • Where

  • is the 2D freq
  • is the image size
  • is FT magnitude and is
    FT phase
  • Each object is displaced by after
    each frame, so its FT becomes
  • Background has FT
  • FT of frame k is
  • Measurement noise is
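
The statement that a displaced object keeps its FT magnitude and only gains a linear phase is the Fourier shift theorem. The sketch below checks it, assuming a circular shift on a square grid. The 32x32 size and the displacement are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
obj = rng.random((N, N))
dy, dx = 3, 5                                    # displacement in rows and columns

shifted = np.roll(obj, (dy, dx), axis=(0, 1))    # circular displacement (assumed)

# Frequencies in cycles per pixel along each axis.
ky = np.fft.fftfreq(N)[:, None]
kx = np.fft.fftfreq(N)[None, :]
phase_ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))

# Shift theorem: FT(shifted) = FT(obj) * exp(-j 2 pi (ky*dy + kx*dx)).
max_error = np.max(np.abs(np.fft.fft2(shifted) - np.fft.fft2(obj) * phase_ramp))
magnitude_change = np.max(np.abs(np.abs(np.fft.fft2(shifted)) - np.abs(np.fft.fft2(obj))))
print(max_error, magnitude_change)
```

Applying the displacement once per frame multiplies the phase ramp once per frame, which is where the harmonic structure over frames comes from.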

11
Frequency Domain (contd)
  • For frame 1
  • A moving object occludes one part of the
    background and un-occludes another. FT of
    un-occluded and occluded parts of background from
    frames 1 to k are and
  • FT of frame N
  • Stacking the FTs of the N frames
  • X = Z + Vnoise + Vbck
  • where Z is the N × (M + 1) data matrix
  • Vnoise is additive measurement noise
  • Vbck represents occluded and unoccluded
    background areas

12
Frequency Domain (contd)
  • Decompose Z as Z = AS
  • S = [Sb, S1, ..., SM]^T
  • A: a Vandermonde matrix containing motion
    information, with rows
  • The resulting system is overdetermined
  • Solve it in a least-squares (LS) sense to get S
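
The least-squares step can be illustrated at a single 2D frequency bin. The sketch below assumes known constant velocities, builds a Vandermonde matrix whose row k holds the k-th powers of each component's harmonic, and recovers the background and object spectra from noisy frame FTs. The chosen bin, velocities, frame count and noise level are hypothetical, and the occlusion term Vbck is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                              # image width and height in pixels (assumed square)
n_frames = 8
kx, ky = 5, 4                       # one 2D frequency bin, chosen arbitrarily
velocities = [(2, 0), (-3, 1)]      # hypothetical (dx, dy) per frame for M = 2 objects

# Harmonic of each component at this bin; the static background has harmonic 1.
harmonics = np.array([1.0 + 0j] +
                     [np.exp(-2j * np.pi * (kx * dx + ky * dy) / N) for dx, dy in velocities])

# Row k of A is [1, z_1**k, ..., z_M**k]: a Vandermonde matrix, n_frames x (M + 1).
A = np.vander(harmonics, n_frames, increasing=True).T

S_true = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # [Sb, S1, S2] at this bin
noise = 1e-6 * (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))
X = A @ S_true + noise                                          # frame FTs at this bin

S_est, *_ = np.linalg.lstsq(A, X, rcond=None)
recovery_error = np.max(np.abs(S_est - S_true))
print(A.shape, recovery_error)
```

At bins where two harmonics coincide (for example at zero frequency) the columns of A become identical and the components cannot be separated there.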

13
Counting Number of Objects
  • Rank of the noiseless data correlation matrix
    RZ = A RS A^H, where RS is the correlation matrix
    of S, is equal to the rank of A
  • Due to Vandermonde structure of A, it has M
    independent columns. Therefore rank gives number
    of independently moving objects
  • For noise with RV = σ²I, the singular values of
    the sample correlation matrix RX are λi + σ²
  • where λi are the singular values of RZ
  • In practice the nonzero λi ≫ σ², so M can be
    determined from them
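
The counting argument can be sketched with synthetic snapshots: the correlation matrix of X = AS + V has as many large singular values as A has independent columns, and the rest sit near the noise variance. The sketch assumes a fixed A across snapshots, a known noise level and a simple threshold at ten times the noise variance. None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_snapshots, sigma = 10, 2000, 0.05

# Hypothetical harmonic angles: static background plus M = 2 moving objects.
angles = np.array([0.0, 0.9, -1.7])
A = np.exp(-1j * np.outer(np.arange(n_frames), angles))      # Vandermonde, n_frames x 3

S = rng.standard_normal((3, n_snapshots)) + 1j * rng.standard_normal((3, n_snapshots))
V = sigma * (rng.standard_normal((n_frames, n_snapshots))
             + 1j * rng.standard_normal((n_frames, n_snapshots))) / np.sqrt(2)
X = A @ S + V

R_X = X @ X.conj().T / n_snapshots                            # sample correlation matrix
singular_values = np.linalg.svd(R_X, compute_uv=False)

# Count singular values that clearly exceed the noise floor sigma**2 (assumed threshold).
n_components = int(np.sum(singular_values > 10 * sigma ** 2))
print(np.round(singular_values, 4), n_components)
```

Here the count includes the background column, so the number of moving objects is n_components minus one.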

14
Motion Estimation
  • FT of the frames contains motion information in
    the form of a sum of weighted harmonics
  • Authors propose a simple, computationally cheap
    method for motion estimation that is not
    restricted to constant translations
  • Constant Motion
  • Phase change F1,k between frames 1 and k
  • Its inverse FT f1,k is a weighted sum of delta
    functions
  • Peaks corresponding to the harmonics
    for each object l
  • Can extract motion
  • In practice aliasing due to neighboring peaks can
    degrade resolution
  • First detect and remove strong motion components.
  • Then weaker harmonics can be detected more easily
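
A closely related, standard technique is phase correlation, where the inverse FT of the unit-magnitude cross-power spectrum of two frames shows peaks at the displacements present. The sketch below uses it as a stand-in for the slides' phase-change analysis and is not the authors' exact procedure. The zero-mean textures, object positions, displacements and helper names are assumptions.

```python
import numpy as np

def phase_correlation(frame_a, frame_b, eps=1e-12):
    """Inverse FT of the unit-magnitude cross-power spectrum of two frames."""
    Fa, Fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    cross = Fb * np.conj(Fa)
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))

def place(canvas, patch, top, left):
    """Return a copy of canvas with patch pasted at (top, left)."""
    out = canvas.copy()
    h, w = patch.shape
    out[top:top + h, left:left + w] = patch
    return out

rng = np.random.default_rng(0)
N = 64
# Zero-mean random textures (luminance with the mean removed) on an empty background.
obj1, obj2 = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
d1, d2 = (0, 4), (3, -2)          # hypothetical (dy, dx) displacements from frame 1 to k

empty = np.zeros((N, N))
frame_1 = place(place(empty, obj1, 8, 8), obj2, 36, 30)
frame_k = place(place(empty, obj1, 8 + d1[0], 8 + d1[1]), obj2, 36 + d2[0], 30 + d2[1])

corr = phase_correlation(frame_1, frame_k)
peaks = set()
for idx in np.argsort(corr.ravel())[-2:]:            # two strongest peaks
    py, px = np.unravel_index(idx, corr.shape)
    # Indices above N/2 wrap around to negative displacements.
    peaks.add((int(py) - N if py > N // 2 else int(py),
               int(px) - N if px > N // 2 else int(px)))
print(peaks)
```

With objects of very different size or contrast, the weaker peak can be hard to separate from the noise floor, which is the motivation for removing strong components first.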

15
Time Varying Motion
  • The initial estimate gives the average velocity
  • T1,k is the time from frame 1 to k
  • This can be repeated for shorter subsequences,
    until velocities become similar, which results in
    constant motion
  • If velocity of object l from frames 1 to k is
    and the rows of
    are
  • does not have Vandermonde structure, so
    number of independent motions cannot be estimated
    beforehand
  • However, displacements between frames 1 to k can
    be estimated
  • From these estimates number of motions can be
    found
  • S can be obtained by an LS solution using this
    non-Vandermonde matrix instead of A.
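
The idea of estimating an average velocity and then repeating on shorter subsequences can be shown with a 1D toy signal. The sketch below measures displacements by 1D phase correlation on a circularly shifted signal. The two-regime velocity profile (2 then 5 pixels per frame) and the helper names are hypothetical.

```python
import numpy as np

def displacement(sig_a, sig_b):
    """Integer circular shift from sig_a to sig_b via 1D phase correlation."""
    cross = np.fft.fft(sig_b) * np.conj(np.fft.fft(sig_a))
    corr = np.real(np.fft.ifft(cross / (np.abs(cross) + 1e-12)))
    shift = int(np.argmax(corr))
    return shift - len(sig_a) if shift > len(sig_a) // 2 else shift

rng = np.random.default_rng(0)
signal = rng.random(64)

steps = [2, 2, 2, 2, 5, 5, 5, 5]                     # per-frame velocity, two regimes
positions = np.concatenate([[0], np.cumsum(steps)])  # 0, 2, 4, 6, 8, 13, 18, 23, 28
frames = [np.roll(signal, int(p)) for p in positions]

def average_velocity(first, last):
    """Displacement between two frames divided by the elapsed number of frames."""
    return displacement(frames[first], frames[last]) / (last - first)

v_all = average_velocity(0, 8)       # whole sequence: lies between the true velocities
v_first = average_velocity(0, 4)     # first subsequence
v_second = average_velocity(4, 8)    # second subsequence
print(v_all, v_first, v_second)
```

The whole-sequence estimate falls between the two true velocities, and splitting at the change point recovers each one.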

16
Difference Masks from Frequency Domain Solutions
  • An accurate solution for S can be obtained from
    X - Vbck = Z + Vnoise, if Vbck is known
  • Approximate Vbck using an object mask which is
    iteratively improved
  • From each LS solution and frame luminance s
    at each pixel, get
  • Dmask,l is closer to 1 for pixels belonging to
    object l, since Dl(x,y) ≈ 0 in these positions
  • In pixels not on object l, Dmask,l is closer to 0
  • Thus LS solution of Sl gives a measure of
    probability that a pixel (x,y) belongs to object l

17
Probability Masks from Velocity Mapping
  • Frame pixels are tracked by assigning object
    velocities or background velocity 0 to each of
    them
  • If a pixel is tracked with correct velocity
    its luminance remains fairly constant, i.e. it
    has small variance
  • If a pixel is tracked with an incorrect velocity,
    its variance increases
  • Let Fl be the probability that tracked pixel has
    small variance i.e. the pixel belongs to object l
  • This gives spatial probability mask
  • that pixel (x,y) belongs to object l
  • Frequency based and spatial masks are combined to
    give an optimal probability mask that helps find
    the Vbck and S
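
The velocity-mapping idea can be sketched as follows: motion-compensate the sequence with each candidate velocity, compute the temporal variance of every tracked pixel, and turn low variance into high probability. The exponential mapping exp(-variance / tau2), the scale tau2 and the normalization across candidates are assumptions made here, since the slides do not give the exact formula.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_frames = 32, 6
background = rng.random((N, N))
obj = rng.random((8, 8))
obj_velocity = 2                     # hypothetical: pixels per frame along x
top, left = 12, 4                    # object position in the first frame

frames = []
for t in range(n_frames):
    frame = background.copy()
    col = left + obj_velocity * t
    frame[top:top + 8, col:col + 8] = obj        # object occludes the static background
    frames.append(frame)
frames = np.stack(frames)

def tracked_variance(frames, velocity):
    """Temporal variance of each first-frame pixel tracked with the given x velocity."""
    aligned = [np.roll(f, -velocity * t, axis=1) for t, f in enumerate(frames)]
    return np.var(np.stack(aligned), axis=0)

candidates = {"background": 0, "object": obj_velocity}
tau2 = 0.01                                      # assumed variance scale
weights = {name: np.exp(-tracked_variance(frames, v) / tau2)
           for name, v in candidates.items()}
total = sum(weights.values())
prob = {name: w / total for name, w in weights.items()}

print(prob["object"][top:top + 8, left:left + 8].min(),
      prob["background"][0:8, 24:32].min())
```

Pixels that the object passes over are ambiguous under both candidates, which is one reason to combine this spatial mask with the frequency-based one.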

18
Results - Synthetic Data
  • Constant Motion
  • 2 squares translating against a black background
  • SVD of Rx has 2 harmonics
  • Peaks of FT give correct velocities, and moving
    objects are correctly segmented
  • Non-constant Translational Motion
  • 2 separate motions in frames 1-5 and frames 6-16
  • When all frames are used, the estimates fall
    between the true values
  • When analyzed as two sequences, there is a clear
    separation of time-varying velocities

19
Results - Real Image Synthetic Sequences
  • Motion is accurately estimated as (15, 0)
  • There are artifacts in determination of Vbck,
    which become zero when Vbck is known
  • Vertical stripes outside the motion region due to
    regularization in the LS solution

20
Results - Real Sequence with Multiple Objects
  • Sequence with dark car moving rightwards and
    white car moving leftwards
  • Initial LS solution separates background and the
    2 moving objects
  • Frequency domain results are enhanced by the
    spatial techniques (probability masks), which
    reduce the error

Figure panels: original sequence; initially recovered background; originally extracted first and second cars; cars after spatial masking technique.

21
Conclusion and Critique
  • Contributions
  • Authors build on work done by Yu, Sommer, et
    al. towards integrating spatial and spectral
    methods
  • Propose new technique of determining number of
    objects, motion estimation and segmentation in
    frequency domain, coupled with object masks in
    spatial domain to improve robustness
  • Results look good
  • Limitations
  • Method is restricted to translational motion only;
    it is not obvious how it can be extended to
    rotations and shear (rotations will destroy the
    Vandermonde structure of the A matrix)
  • The technique for non-constant translations is
    very hacky
  • Avoids addressing the segmentation problem by
    recasting it as temporal segmentation
  • I suspect it would not work very well in practice
  • Paper does not address the theoretical
    foundations of the mathematics.