Title: Fusion of frequency and spatial domain information for motion analysis
1 Fusion of frequency and spatial domain information for motion analysis
- Alexia Briassouli, Narendra Ahuja
- Beckman Institute, Dept of ECE
- UIUC
- ICPR-2004
2 Motion Estimation Techniques
- Single-motion estimation based on optical flow uses two constraints:
  - Brightness Constancy Constraint (Data Conservation Constraint)
    - Assumes that the image brightness of a region remains constant while its location may change
    - Violated at motion and occlusion boundaries, and by specular reflections and transparency
  - Spatial Coherence Constraint
    - Assumes that surfaces have spatial extent, so the optical flow within a neighbourhood changes gradually
    - Violated at surface boundaries
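The brightness constancy constraint can be made concrete in 1D. If a pattern is only translated, I_x u + I_t = 0 holds at every pixel, and a least-squares estimate over a window is u = -sum(I_x I_t) / sum(I_x^2). The sketch below is a minimal illustration under those assumptions (a synthetic ramp, central spatial and forward temporal differences). estimate_velocity_1d and ramp are hypothetical helpers, not part of the paper.

```python
def estimate_velocity_1d(frame0, frame1):
    """Least-squares velocity from the 1D brightness constancy constraint I_x*u + I_t = 0."""
    num = 0.0
    den = 0.0
    for x in range(1, len(frame0) - 1):
        ix = (frame0[x + 1] - frame0[x - 1]) / 2.0  # central spatial derivative
        it = frame1[x] - frame0[x]                  # forward temporal derivative
        num += ix * it
        den += ix * ix
    if den == 0.0:
        # No spatial gradient anywhere in the window: u is unconstrained.
        raise ValueError("no spatial gradient: velocity is unconstrained")
    return -num / den


def ramp(x, t, u):
    # Hypothetical brightness pattern 2x + 5 translating with velocity u (brightness conserved).
    return 2.0 * (x - u * t) + 5.0


u_true = 0.5
f0 = [ramp(x, 0, u_true) for x in range(10)]
f1 = [ramp(x, 1, u_true) for x in range(10)]
print(estimate_velocity_1d(f0, f1))  # 0.5
```

A flat window makes the denominator zero, which is the single-pixel version of the aperture problem discussed next.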
3 Spatial Approaches
- Detection and estimation of multiple motions is a segmentation problem
  - Ill-posed: it requires simultaneous determination of the optical flow and the motion boundaries
- Generalized Aperture Problem
  - The aperture must be large to detect the presence of motion (to constrain the solution)
  - The aperture must be small to avoid violating the optical flow assumptions and to avoid including multiple motions
- Most spatial approaches use iterative EM techniques on parametric models
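The "aperture must be large enough to constrain the solution" half of the trade-off shows up in gradient-based normal equations. Summing [[I_x^2, I_x I_y], [I_x I_y, I_y^2]] over a window gives a 2x2 matrix that is singular when every gradient in the window points the same way, as along a straight edge. This is a minimal sketch assuming Lucas-Kanade style normal equations; the slides do not name a specific spatial method, and structure_tensor_det is a hypothetical helper.

```python
def structure_tensor_det(image, cx, cy, radius=1):
    """Determinant of the windowed sum of [[Ix^2, IxIy], [IxIy, Iy^2]].

    A zero determinant means the constraints I_x*u + I_y*v + I_t = 0 inside
    the window cannot pin down both velocity components (aperture problem).
    """
    sxx = sxy = syy = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (image[y][x + 1] - image[y][x - 1]) / 2.0  # central differences
            iy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx * syy - sxy * sxy


size = 7
# Straight edge: intensity varies only along x, so only the normal flow is recoverable.
edge = [[3.0 * x for x in range(size)] for y in range(size)]
# Curved iso-intensity contours: gradients in the window point in different directions.
blob = [[(x - 3) ** 2 + 2.0 * (y - 3) ** 2 for x in range(size)] for y in range(size)]

print(structure_tensor_det(edge, 3, 3))  # 0.0
print(structure_tensor_det(blob, 3, 3))  # 2304.0
```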
4 Spatiotemporal Energy Model
- Adelson and Bergen, 1984, Journal of the Optical Society of America
- They provide an optical flow analysis for single-body motion based on spectral analysis
- A motion sequence is a pattern in x-y-t space
  - The velocity of motion corresponds to a 3D orientation in this space
  - Motion orientation and motion energy can be extracted by linear filters oriented in space-time and tuned in spatial frequency
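The key fact behind the energy model is that a rigidly translating pattern concentrates its spectral energy on an oriented plane omega_t = -v^T k, which becomes a line in the 1D x-t case. The sketch below checks this with a plain DFT on a small periodic x-t volume instead of Adelson and Bergen's oriented space-time filters. It then recovers the velocity as the line that carries the most energy. The sizes, pattern and velocity are illustrative assumptions.

```python
import cmath
import math

N = 8          # number of frames and pixels (equal, so discrete lines wrap cleanly)
U_TRUE = 3     # hypothetical velocity in pixels/frame
pattern = [4, 1, 0, 2, 5, 3, 1, 0]

# Space-time volume of a 1D pattern translating rigidly: I(x, t) = p(x - u t), periodic.
video = [[pattern[(x - U_TRUE * t) % N] for x in range(N)] for t in range(N)]


def dft2(vol):
    """Naive 2D DFT over (t, x); entry [n][m] is temporal freq n, spatial freq m."""
    size = len(vol)
    return [[sum(vol[t][x] * cmath.exp(-2j * math.pi * (m * x + n * t) / size)
                 for t in range(size) for x in range(size))
             for m in range(size)]
            for n in range(size)]


def line_energy(spec, u):
    """Energy on the discrete line omega_t = -u * omega_x (mod N)."""
    size = len(spec)
    return sum(abs(spec[(-u * m) % size][m]) ** 2 for m in range(size))


spectrum = dft2(video)
energies = [line_energy(spectrum, u) for u in range(N)]
best = max(range(N), key=lambda u: energies[u])
print(best)  # 3
```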
5 Spectral vs. Spatial Approaches
- Yu et al. showed that
  - The spectral motion model describes both occlusion- and transparency-based discontinuities
  - The spatial model is more appropriate for occlusion analysis because it provides finer resolution and requires fewer frames
- Spatially
  - An image sequence can be decomposed into different layers, where each layer has a smooth optical flow field
  - Discontinuities due to occlusion and transparency are different in the spatial domain
    - Occlusion is a step function at the occlusion boundary
    - Transparency results from the overlap of 2 motions in the window
  - They cannot be unified into a single model that accounts for both kinds of multiple motion
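6 Spectral Analysis of Occlusion
- Occlusion in the spatial domain is modeled by $I(\mathbf{x}, t) = U(\mathbf{x} - \mathbf{v}_1 t)\, I_1(\mathbf{x} - \mathbf{v}_1 t) + [1 - U(\mathbf{x} - \mathbf{v}_1 t)]\, I_2(\mathbf{x} - \mathbf{v}_2 t)$
- where
  - $\mathbf{x}$: 2D spatial coordinates
  - $U(\mathbf{x})$: Heaviside unit step function describing the occlusion boundary
  - $I_1(\mathbf{x})$: occluding 2D signal (foreground) moving with velocity $\mathbf{v}_1 = (u_1, v_1)$
  - $I_2(\mathbf{x})$: occluded 2D signal (background) moving with velocity $\mathbf{v}_2 = (u_2, v_2)$
- The FT of the signal is (constant factors omitted) $\hat{I}(\mathbf{k}, \omega_t) = [\hat{I}_1(\mathbf{k}) + A(\mathbf{k})]\, \delta(\omega_t + \mathbf{v}_1^T \mathbf{k}) + \hat{I}_2(\mathbf{k})\, \delta(\omega_t + \mathbf{v}_2^T \mathbf{k}) - [\hat{U}(\mathbf{k})\, \delta(\omega_t + \mathbf{v}_1^T \mathbf{k})] * [\hat{I}_2(\mathbf{k})\, \delta(\omega_t + \mathbf{v}_2^T \mathbf{k})]$
- where
  - $\mathbf{k} = (\omega_x, \omega_y)^T$: spatial frequency
  - $\omega_t$: temporal frequency
- The first term is the spectrum of the occluding signal together with a distortion term $A(\mathbf{k})$
- The second term is the exact spectrum of the occluded signal
- The third term is a convolution of a 3D spectral line passing through the origin with the spectrum of the occluded signal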
7 Spectral Analysis of Transparency
- Transparency is viewed as a special case of occlusion, obtained by substituting the Heaviside function with a real constant $a$ ($0 < a < 1$)
- The corresponding spectrum is characterized by 2 oriented planes without distortion
- Although occlusion has an additional distortion term, most of its energy is concentrated on the two spectral planes
- Thus both occlusion and transparency are characterized by multiple spectral planes passing through the origin
- The corresponding motions are described by the normals to these planes!!
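With U replaced by the constant a, the transparent signal is a I_1(x - v_1 t) + (1 - a) I_2(x - v_2 t). Its spectrum is a Î_1(k) δ(ω_t + v_1^T k) + (1 - a) Î_2(k) δ(ω_t + v_2^T k) (constant factors omitted), i.e. exactly two planes. The 1D x-t sketch below assumes periodic signals and integer velocities, and shows that the two strongest velocity lines are the two motions. A prime size N is chosen so that distinct discrete lines meet only at the DC bin. All values are illustrative.

```python
import cmath
import math

N = 7                 # prime size: distinct velocity lines meet only at the DC bin
U1, U2 = 1, 5         # hypothetical velocities (5 = -2 mod 7, i.e. 2 px/frame leftwards)
ALPHA = 0.6           # transparency constant a, 0 < a < 1
p1 = [3, 0, 1, 4, 1, 0, 2]
p2 = [0, 5, 2, 0, 1, 3, 1]

# Transparency: the step function U is replaced by the constant a.
video = [[ALPHA * p1[(x - U1 * t) % N] + (1 - ALPHA) * p2[(x - U2 * t) % N]
          for x in range(N)] for t in range(N)]


def dft2(vol):
    """Naive 2D DFT over (t, x); entry [n][m] is temporal freq n, spatial freq m."""
    size = len(vol)
    return [[sum(vol[t][x] * cmath.exp(-2j * math.pi * (m * x + n * t) / size)
                 for t in range(size) for x in range(size))
             for m in range(size)]
            for n in range(size)]


def line_energy(spec, u):
    """Energy on the discrete line omega_t = -u * omega_x (mod N)."""
    size = len(spec)
    return sum(abs(spec[(-u * m) % size][m]) ** 2 for m in range(size))


spectrum = dft2(video)
energies = {u: line_energy(spectrum, u) for u in range(N)}
top_two = sorted(energies, key=energies.get, reverse=True)[:2]
print(sorted(top_two))  # [1, 5]
```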
8 Comparison
- With spectral analysis, multiple planes describe both occlusion- and transparency-based discontinuities
- Spatial analysis can describe only occlusion-based motion
- However, there is a severe problem in obtaining the energy spectrum of an image sequence
  - It is caused by the block effect of the DFT
  - To overcome this, the LFT is used, but the resulting blurring of the spectrum reduces the resolution of the spectral model
  - To increase resolution, the LFT window must be enlarged, but in a large spatio-temporal neighborhood the constant-motion assumption is endangered
- Therefore occlusions are best analyzed in the spatial domain
9 Why an Integrated Approach?
- Spectral analysis has the following advantages
  - Motion estimation is based on phase changes of the FT, so it is robust to global illumination changes
  - Computational cost is significantly lower
  - The size and shape of moving objects do not affect the analysis
- However, spectral information suffers from resolution problems
- Spatial information is used to improve the accuracy of the analysis
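Global illumination changes of the form s -> g s + c (gain g > 0, offset c) leave the phase of every non-zero, non-DC Fourier coefficient unchanged. The gain only rescales magnitudes, and the offset only moves the DC term. Below is a minimal 1D check with an illustrative frame and hypothetical gain and offset values.

```python
import cmath
import math


def dft(signal):
    """Naive 1D DFT."""
    n = len(signal)
    return [sum(s * cmath.exp(-2j * math.pi * k * x / n) for x, s in enumerate(signal))
            for k in range(n)]


frame = [4.0, 1.0, 0.0, 2.0, 5.0, 3.0, 1.0, 0.0]
brighter = [1.8 * s + 10.0 for s in frame]   # hypothetical global gain and offset

f, g = dft(frame), dft(brighter)
for k in range(1, len(frame)):               # skip DC, which absorbs the offset
    if abs(f[k]) > 1e-9:
        # Same phase: the gain only rescales the magnitude.
        assert abs(cmath.phase(g[k] / f[k])) < 1e-9
print("non-DC phases unchanged")
```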
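10 Frequency Domain
- $M$: number of moving objects $l$, $1 \le l \le M$
- $s_l(x, y)$: luminance of object $l$ at pixel $(x, y)$; $(u_l, v_l)$: its velocity
- The FT of object $l$ is $S_l(\omega_x, \omega_y) = \sum_{x, y} s_l(x, y)\, e^{-j(\omega_x x + \omega_y y)} = |S_l(\omega_x, \omega_y)|\, e^{j\Phi_l(\omega_x, \omega_y)}$
- where
  - $(\omega_x, \omega_y) = (2\pi m / N_x,\ 2\pi n / N_y)$ is the 2D frequency and $N_x \times N_y$ is the image size
  - $|S_l|$ is the FT magnitude and $\Phi_l$ is the FT phase
- Each object is displaced by $(u_l, v_l)$ after each frame, so in frame $k$ its FT becomes $S_l(\omega_x, \omega_y)\, e^{-j(\omega_x u_l + \omega_y v_l)(k-1)}$
- The background has FT $S_b(\omega_x, \omega_y)$
- Ignoring occlusion, the FT of frame $k$ is $X_k = S_b + \sum_{l=1}^{M} S_l\, e^{-j(\omega_x u_l + \omega_y v_l)(k-1)} + V_{noise,k}$
- $V_{noise,k}$ is the measurement noise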
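The displacement model relies on the DFT shift theorem. The sketch below checks numerically that circularly translating an object by (k-1)(u_l, v_l) multiplies its DFT by exp(-j(ω_x u_l + ω_y v_l)(k-1)). It assumes circular (wrap-around) motion, which is exactly what the DFT models, so it ignores the background occlusion that is handled separately. Image size, object and displacement are illustrative.

```python
import cmath
import math

NX, NY = 6, 5  # hypothetical image size


def dft2(img):
    """S(w_x, w_y) = sum_{x,y} s(x, y) exp(-j(w_x x + w_y y)); entry [n][m], w = 2*pi*(m/NX, n/NY)."""
    return [[sum(img[y][x] * cmath.exp(-1j * (2 * math.pi * m / NX * x + 2 * math.pi * n / NY * y))
                 for y in range(NY) for x in range(NX))
             for m in range(NX)]
            for n in range(NY)]


obj = [[(3 * x + 5 * y * y) % 7 for x in range(NX)] for y in range(NY)]
U, V, K = 2, 1, 3  # hypothetical per-frame displacement (u_l, v_l) and frame index k

# Frame k: the object circularly shifted by (k - 1) * (u_l, v_l).
frame_k = [[obj[(y - (K - 1) * V) % NY][(x - (K - 1) * U) % NX] for x in range(NX)]
           for y in range(NY)]

S1, Sk = dft2(obj), dft2(frame_k)
max_err = 0.0
for n in range(NY):
    for m in range(NX):
        wx, wy = 2 * math.pi * m / NX, 2 * math.pi * n / NY
        predicted = S1[n][m] * cmath.exp(-1j * (wx * U + wy * V) * (K - 1))
        max_err = max(max_err, abs(Sk[n][m] - predicted))
print(max_err < 1e-9)  # True
```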
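11 Frequency Domain (contd)
- For frame 1: $X_1 = S_b + \sum_{l=1}^{M} S_l + V_{noise,1}$
- A moving object occludes one part of the background and un-occludes another. The FTs of the un-occluded and occluded parts of the background between frames 1 and $k$ are $S_{uo,k}$ and $S_{o,k}$
- FT of frame $N$: $X_N = S_b + \sum_{l=1}^{M} S_l\, e^{-j(\omega_x u_l + \omega_y v_l)(N-1)} + S_{uo,N} - S_{o,N} + V_{noise,N}$
- Stacking the FTs of the $N$ frames gives $X = Z + V_{noise} + V_{bck}$
- where
  - $Z$ is the $N \times (M+1)$ data matrix
  - $V_{noise}$ is the additive measurement noise
  - $V_{bck}$ represents the occluded and un-occluded background areas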
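12 Frequency Domain (contd)
- Decompose $Z$ as $Z = AS$
  - $S = [S_b, S_1, \ldots, S_M]^T$
  - $A$ is a Vandermonde matrix containing the motion information, with rows $[1,\ e^{-j(\omega_x u_1 + \omega_y v_1)(k-1)},\ \ldots,\ e^{-j(\omega_x u_M + \omega_y v_M)(k-1)}]$, $k = 1, \ldots, N$
- We have $X = AS + V_{noise} + V_{bck}$
  - This is an overdetermined system (more frames than unknowns, $N > M + 1$)
  - Solve it in the least-squares (LS) sense to get $S$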
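At a single frequency, the system has N equations in M+1 unknowns. Below is a minimal sketch of the LS step under several assumptions: hypothetical phases θ_l = ω_x u_l + ω_y v_l, hypothetical spectral values, V_bck ignored, and small Gaussian noise. It solves the normal equations A^H A S = A^H X with plain Gaussian elimination; a QR or SVD solver would be numerically preferable for larger problems.

```python
import cmath
import random


def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small square complex system."""
    n = len(mat)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / aug[r][r]
    return sol


def least_squares(a, x):
    """Solve min ||A s - x|| via the normal equations A^H A s = A^H x."""
    rows, cols = len(a), len(a[0])
    gram = [[sum(a[k][i].conjugate() * a[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]
    proj = [sum(a[k][i].conjugate() * x[k] for k in range(rows)) for i in range(cols)]
    return solve(gram, proj)


N_FRAMES = 10
thetas = [0.0, 1.0, 2.5]   # hypothetical phases; 0.0 is the static background column
# Vandermonde rows [1, z_1^(k-1), z_2^(k-1)] with z_l = exp(-j * theta_l).
A = [[cmath.exp(-1j * th * (k - 1)) for th in thetas] for k in range(1, N_FRAMES + 1)]
S_true = [2.0 + 1.0j, -0.5 + 3.0j, 1.5 - 2.0j]   # hypothetical [S_b, S_1, S_2]

random.seed(0)
X = [sum(A[k][l] * S_true[l] for l in range(3))
     + complex(random.gauss(0, 1e-3), random.gauss(0, 1e-3))
     for k in range(N_FRAMES)]

S_hat = least_squares(A, X)
print(max(abs(s - t) for s, t in zip(S_hat, S_true)))  # small, on the order of the noise
```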
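13 Counting Number of Objects
- The rank of the noiseless data correlation matrix $R_Z = A R_S A^H$, where $R_S$ is the correlation matrix of $S$, is equal to the rank of $A$
- Due to the Vandermonde structure of $A$, its columns for distinct motions are linearly independent, so the rank gives the number of independently moving objects (a static background with non-zero FT counts as one more column)
- For noise with $R_V = \sigma^2 I$, the singular values of the sample correlation matrix $R_X$ are $\lambda_i + \sigma^2$, where $\lambda_i$ are the singular values of $R_Z$
- In practice $\lambda_i \gg \sigma^2$ for the signal components, so $M$ can be determined from them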
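Computing the eigenvalues of R_X in pure Python is lengthy, so this sketch swaps in a closely related harmonic-retrieval fact instead of the authors' exact estimator. For one frequency bin, the frame samples x_k = sum_l c_l z_l^k with r distinct z_l = exp(-j θ_l) form a Hankel matrix of rank r. The rank is counted with a relative tolerance, which stands in for "singular values well above σ^2". The phases and weights are hypothetical. A static background with non-zero FT is one more harmonic (z = 1), while a black background adds none.

```python
import cmath


def numerical_rank(mat, rel_tol=1e-8):
    """Rank via Gaussian elimination with full pivoting and a relative tolerance."""
    m = [row[:] for row in mat]
    rows, cols = len(m), len(m[0])
    scale = max(abs(v) for row in m for v in row)
    rank = 0
    for step in range(min(rows, cols)):
        # Choose the largest remaining entry as the pivot.
        pr, pc = max(((r, c) for r in range(step, rows) for c in range(step, cols)),
                     key=lambda rc: abs(m[rc[0]][rc[1]]))
        if abs(m[pr][pc]) <= rel_tol * scale:
            break
        m[step], m[pr] = m[pr], m[step]
        for row in m:
            row[step], row[pc] = row[pc], row[step]
        for r in range(step + 1, rows):
            f = m[r][step] / m[step][step]
            for c in range(step, cols):
                m[r][c] -= f * m[step][c]
        rank += 1
    return rank


def frame_samples(weights, thetas, n_frames):
    # x_k = sum_l c_l * exp(-j * theta_l * k): one noiseless frequency bin across frames.
    return [sum(c * cmath.exp(-1j * th * k) for c, th in zip(weights, thetas))
            for k in range(n_frames)]


def hankel(samples, size):
    return [[samples[i + j] for j in range(size)] for i in range(size)]


thetas = [0.0, 1.0, 2.5]                          # background, object 1, object 2
with_bg = frame_samples([2.0, 1.5, 1.0], thetas, 9)
black_bg = frame_samples([0.0, 1.5, 1.0], thetas, 9)
print(numerical_rank(hankel(with_bg, 5)))   # 3
print(numerical_rank(hankel(black_bg, 5)))  # 2
```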
14 Motion Estimation
- The FT of the frames contains motion information in the form of a sum of weighted harmonics
- The authors propose a simple, computationally cheap method for motion estimation that is not restricted to constant translations
- Constant motion
  - Phase change $F_{1,k}$ between frames 1 and $k$
  - Its inverse FT $f_{1,k}$ is a weighted sum of delta functions
  - Peaks correspond to the harmonics $\delta(x - (k-1)u_l,\ y - (k-1)v_l)$ for each object $l$
  - The motion can be extracted from the peak locations
- In practice, aliasing due to neighboring peaks can degrade resolution
  - First detect and remove strong motion components
  - Then weaker harmonics can be detected more easily
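For a single object under circular translation, the normalized cross-power spectrum of frames 1 and k is exactly exp(-j ω (k-1) u), whose inverse DFT is a delta at the displacement (k-1) u. The 1D sketch below illustrates this phase-correlation step with an illustrative pattern and velocity. It does not implement the detect-and-remove loop for several objects, where each object contributes its own weighted peak.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive 1D DFT; the inverse includes the 1/n factor."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [sum(s * cmath.exp(sign * 2j * math.pi * k * x / n) for x, s in enumerate(signal))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(frame_1, frame_k):
    """Return the circular displacement of frame_k relative to frame_1."""
    f1, fk = dft(frame_1), dft(frame_k)
    cross = []
    for a, b in zip(fk, f1):
        prod = a * b.conjugate()
        # Keep only the phase change; skip bins where the spectrum vanishes.
        cross.append(prod / abs(prod) if abs(prod) > 1e-12 else 0j)
    corr = dft(cross, inverse=True)   # ideally a delta at the displacement
    return max(range(len(corr)), key=lambda x: corr[x].real)


pattern = [4, 1, 0, 2, 5, 3, 1, 0, 6, 2, 2, 7, 0, 1, 3, 5]
U_TRUE, K = 2, 4                   # hypothetical: 2 px/frame, comparing frames 1 and 4
shift = U_TRUE * (K - 1)           # expected displacement (k - 1) * u
frame_k = [pattern[(x - shift) % len(pattern)] for x in range(len(pattern))]
print(phase_correlation_shift(pattern, frame_k))  # 6
```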
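15 Time Varying Motion
- The initial estimate gives the average velocity, i.e. the displacement between frames 1 and $k$ divided by $T_{1,k}$
  - $T_{1,k}$ is the time from frame 1 to frame $k$
- This can be repeated for shorter subsequences until the velocities become similar, which corresponds to constant motion
- If the average velocity of object $l$ from frame 1 to frame $k$ is $(u_{l,k}, v_{l,k})$, the rows of the motion matrix $\tilde{A}$ are $[1,\ e^{-j(\omega_x u_{1,k} + \omega_y v_{1,k}) T_{1,k}},\ \ldots,\ e^{-j(\omega_x u_{M,k} + \omega_y v_{M,k}) T_{1,k}}]$
- $\tilde{A}$ does not have Vandermonde structure, so the number of independent motions cannot be estimated beforehand
- However, the displacements between frames 1 and $k$ can be estimated
  - From these estimates the number of motions can be found
  - $S$ can then be obtained by an LS solution using $\tilde{A}$ instead of $A$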
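The subsequence idea can be sketched on 1D object positions, assuming the per-frame displacements have already been estimated. The average velocity over frames a to b is the displacement divided by T_{a,b}. The hypothetical halving rule below splits an interval until both halves agree, then merges neighbours with equal velocity. The positions are illustrative (a motion change after frame 5), and the splitting rule is an assumption, not the paper's procedure.

```python
from fractions import Fraction


def avg_velocity(positions, a, b):
    """Average velocity between frame indices a < b: displacement / T_{a,b}."""
    return Fraction(positions[b] - positions[a], b - a)


def split_constant(positions, a, b):
    """Recursively halve [a, b] until both halves report the same average velocity."""
    if b - a <= 1:
        return [(a, b, avg_velocity(positions, a, b))]
    mid = (a + b) // 2
    if avg_velocity(positions, a, mid) == avg_velocity(positions, mid, b):
        return [(a, b, avg_velocity(positions, a, b))]
    return split_constant(positions, a, mid) + split_constant(positions, mid, b)


def merge_equal(segments):
    """Join adjacent segments that share the same velocity."""
    merged = [segments[0]]
    for a, b, v in segments[1:]:
        pa, pb, pv = merged[-1]
        if pv == v and pb == a:
            merged[-1] = (pa, b, v)
        else:
            merged.append((a, b, v))
    return merged


# Hypothetical positions over 16 frames (index 0 = frame 1):
# 1 px/frame up to frame 5, then 3 px/frame up to frame 16.
positions = [i if i <= 4 else 4 + 3 * (i - 4) for i in range(16)]

print(avg_velocity(positions, 0, 15))  # 37/15, between the true values 1 and 3
print(merge_equal(split_constant(positions, 0, 15)))
# [(0, 4, Fraction(1, 1)), (4, 15, Fraction(3, 1))]
```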
16 Difference Masks from Frequency Domain Solutions
- An accurate solution for $S$ can be obtained from $X - V_{bck} = Z + V_{noise}$ if $V_{bck}$ is known
- $V_{bck}$ is approximated using an object mask, which is iteratively improved
- From each LS solution and the frame luminance at each pixel, compute the difference $D_l(x, y)$ and the mask $D_{mask,l}(x, y)$
- $D_{mask,l}$ is close to 1 for pixels belonging to object $l$, since $D_l(x, y) \approx 0$ at these positions
- In pixels not on object $l$, $D_{mask,l}$ is close to 0
- Thus the LS solution for $S_l$ gives a measure of the probability that pixel $(x, y)$ belongs to object $l$
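The slides do not give the exact map from the difference D_l(x, y) to D_mask,l. Below is a minimal sketch that assumes a Gaussian map D_mask,l = exp(-D_l^2 / (2σ^2)) with a hypothetical σ. The mask is 1 where the LS estimate matches the frame and decays towards 0 elsewhere.

```python
import math


def difference_mask(frame, recovered, sigma=10.0):
    """Map |frame - recovered object| to (0, 1]; close to 1 where the LS solution explains the pixel."""
    return [[math.exp(-((f - r) ** 2) / (2.0 * sigma ** 2)) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, recovered)]


# Hypothetical 1x6 strip: pixels 1-2 belong to object l (luminance 200) on a background of 40.
frame = [[40.0, 200.0, 200.0, 40.0, 40.0, 40.0]]
recovered_object = [[0.0, 198.0, 201.0, 0.0, 0.0, 0.0]]  # imperfect LS estimate of s_l
mask = difference_mask(frame, recovered_object)
print([round(v, 3) for v in mask[0]])  # [0.0, 0.98, 0.995, 0.0, 0.0, 0.0]
```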
17 Probability Masks from Velocity Mapping
- Frame pixels are tracked by assigning to each of them one of the object velocities or the background velocity 0
- If a pixel is tracked with the correct velocity, its luminance remains fairly constant, i.e. it has small variance
- If a pixel is tracked with an incorrect velocity, its variance increases
- Let $F_l$ be the probability that a pixel tracked with the velocity of object $l$ has small variance, i.e. that the pixel belongs to object $l$
- This gives a spatial probability mask that pixel $(x, y)$ belongs to object $l$
- The frequency-based and spatial masks are combined into an optimal probability mask, which helps find $V_{bck}$ and $S$
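Below is a 1D sketch of velocity-mapped tracking, assuming one object moving at 2 px/frame over a static background. Each pixel of frame 1 is followed with each candidate velocity (0 for background, 2 for the object). The luminance variance along the track is converted into a likelihood with a hypothetical exp(-var/τ) map. The result is then fused with a hypothetical frequency-domain mask by treating the two as independent evidence. The fusion rule, τ and all values are assumptions, not the paper's "optimal" combination.

```python
import math

N_PIX, N_FRAMES = 12, 4
background = [10, 30, 20, 50, 40, 60, 30, 20, 10, 40, 50, 30]  # hypothetical static background
OBJ_LUM, OBJ_WIDTH, OBJ_START, OBJ_VEL = 100, 3, 1, 2             # hypothetical object


def make_frame(t):
    left = OBJ_START + OBJ_VEL * t
    return [OBJ_LUM if left <= x < left + OBJ_WIDTH else background[x] for x in range(N_PIX)]


frames = [make_frame(t) for t in range(N_FRAMES)]


def tracked_variance(x, v):
    """Variance of the luminance seen when pixel x of frame 1 is followed with velocity v."""
    samples = [frames[t][x + v * t] for t in range(N_FRAMES)]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def spatial_object_prob(x, tau=20.0):
    # Hypothetical map: small variance -> high likelihood, normalized over the two hypotheses.
    f_obj = math.exp(-tracked_variance(x, OBJ_VEL) / tau)
    f_bck = math.exp(-tracked_variance(x, 0) / tau)
    return f_obj / (f_obj + f_bck)


def fuse(p_freq, p_spatial):
    # Hypothetical fusion rule: combine the two masks as independent evidence.
    num = p_freq * p_spatial
    return num / (num + (1 - p_freq) * (1 - p_spatial))


# Pixel 0 is background that the object never covers; pixels 1-3 belong to the object in frame 1.
pixels = range(4)
spatial = [spatial_object_prob(x) for x in pixels]
freq_mask = [0.3, 0.6, 0.7, 0.4]  # hypothetical frequency-domain mask values
fused = [fuse(f, s) for f, s in zip(freq_mask, spatial)]
print([round(p, 3) for p in spatial])  # [0.002, 1.0, 1.0, 1.0]
print([round(p, 3) for p in fused])    # [0.001, 1.0, 1.0, 1.0]
```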
18 Results: Synthetic Data
- Constant motion
  - 2 squares translating against a black background
  - The SVD of $R_X$ reveals 2 harmonics
  - The FT peaks give the correct velocities, and the moving objects are correctly segmented
- Non-constant translational motion
  - 2 separate motions, in frames 1-5 and in frames 6-16
  - When all frames are used, the estimates lie between the true values
  - When the frames are analyzed as two sequences, the time-varying velocities are clearly separated
19 Results: Real Image Synthetic Sequences
- The motion is accurately estimated as (15, 0)
- There are artifacts in the determination of $V_{bck}$, which vanish when $V_{bck}$ is known
- Vertical stripes appear outside the motion region due to regularization in the LS solution
20 Results: Real Sequence with Multiple Objects
- A sequence with a dark car moving rightwards and a white car moving leftwards
- The initial LS solution separates the background and the 2 moving objects
- The frequency-domain results are enhanced by the spatial techniques (probability masks), which reduces the error
Figure panels: original sequence; initially recovered background; initially extracted first and second cars; cars after the spatial masking technique
21 Conclusion and Critique
- Contributions
  - The authors build on work by Yu, Sommer, et al. towards integrating spatial and spectral methods
  - They propose a new technique for determining the number of objects, motion estimation and segmentation in the frequency domain, coupled with object masks in the spatial domain to improve robustness
  - The results look good
- Limitations
  - The method is restricted to translational motion only; it is not obvious how it can be extended to rotations and shear (rotations would destroy the Vandermonde structure of the A matrix)
  - The technique for non-constant translations is very hacky
  - It avoids addressing the segmentation problem by recasting it as temporal segmentation
  - I suspect it would not work very well in practice
  - The paper does not address the theoretical foundations of the mathematics