Provided by: firdaus8


Title: Fusion of frequency and spatial domain information for motion analysis

1
Fusion of frequency and spatial domain
information for motion analysis

Alexia Briassouli, Narendra Ahuja
Beckman Institute, Dept of ECE
UIUC
ICPR-2004

2
Motion Estimation Techniques

Single motion estimation based on optical flow
uses Brightness Constancy Constraint
Data Conservation Constraint
Assumes that image brightness of a region remains
constant while its location may change
Violated at motion and occlusion boundaries,
specular reflections and transparency
Spatial Coherence Constraint
Assumes that surfaces have spatial extent and
optical flow within the neighbourhood changes
gradually
Violated at surface boundaries
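
As a worked form of the data conservation constraint, here is the standard textbook linearization. The symbols I(x, y, t) for image brightness and (u, v) for the flow are assumptions for illustration and do not appear on the slide.

```latex
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0
```

The right-hand form follows from a first-order Taylor expansion. It gives one equation for the two unknowns (u, v), which is why a second constraint such as spatial coherence is needed.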

3
Spatial Approaches

Detection and estimation of multiple motions is a
segmentation problem
Ill-posed: requires simultaneous determination
of optical flow and motion boundaries
Generalized Aperture Problem
Aperture size must be large to detect the
presence of motion (to constrain the solution)
Aperture size must be small to avoid violating
optical flow assumptions and to avoid multiple
motions
Most spatial approaches use iterative EM
techniques on parametric models

4
Spatiotemporal Energy Model

Adelson and Bergen, 1985, Journal of the Optical
Society of America A
Provides an optical flow analysis for single-body
motion based on spectral analysis
A motion sequence is a pattern in the x-y-t space
Velocity of motion corresponds to a 3D
orientation in this space
Motion orientation and motion energy can be
extracted by linear filters oriented in
space-time and tuned in spatial frequency
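
The link between velocity and orientation in x-y-t space can be checked numerically. The sketch below is not from the presentation. It assumes a random texture translating by (u, v) pixels per frame with circular wrap-around, and as many frames as pixels per side, so that the DFT is exact. The function name and parameters are hypothetical.

```python
import numpy as np

def plane_energy_fraction(n=16, u=1, v=2, seed=0):
    """Fraction of spectral energy on the plane kt + u*kx + v*ky = 0 (mod n)."""
    rng = np.random.default_rng(seed)
    frame0 = rng.random((n, n))
    # Frame t is frame 0 shifted circularly by (v*t rows, u*t columns).
    seq = np.stack([np.roll(frame0, shift=(v * t, u * t), axis=(0, 1))
                    for t in range(n)])
    energy = np.abs(np.fft.fftn(seq)) ** 2          # axes are (t, y, x)
    kt, ky, kx = np.meshgrid(np.arange(n), np.arange(n), np.arange(n),
                             indexing="ij")
    on_plane = (kt + v * ky + u * kx) % n == 0
    return energy[on_plane].sum() / energy.sum()

fraction = plane_energy_fraction()
```

Under these assumptions essentially all of the energy lies on a single plane through the origin, and the plane's normal (u, v, 1) encodes the velocity.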

5
Spectral vs. Spatial Approaches

Yu, et al. showed that
Spectral motion model describes both occlusion
and transparency based discontinuities
Spatial model is more appropriate for occlusion
analysis because it provides finer resolution and
requires fewer frames
Spatially
Image sequence can be decomposed into different
layers, where each layer has a smooth optical
flow field
Discontinuities due to occlusion and transparency
are different in the spatial domain
Occlusion is a step-function at the occlusion
boundary
Transparency results from overlap of 2 motions in
the window
Cannot be unified into a single model which
accounts for both kinds of multiple motions

6
Spectral Analysis of Occlusion

Occlusion in spatial domain is modeled by
where
x: 2D spatial coordinates
U(x): Heaviside unit step function describing
the occlusion boundary
I1(x): occluding 2D signal (foreground) moving
with velocity v1 = (u1, v1)
I2(x): occluded 2D signal (background) moving
with velocity v2 = (u2, v2)
FT of the signal is
where
k: spatial frequency (ω_x, ω_y)^T
ω_t: temporal frequency
The first term is the spectrum of the occluding
signal along with a distortion term A(k)
The second term is the exact spectrum of the
occluded signal
The third term is a convolution of a 3D spectral
line passing through the origin and the spectrum
of the occluded signal
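
The equations on this slide did not survive in the transcript. The block below is one plausible reconstruction from the definitions above, not the slide's own formula. It assumes the occlusion boundary moves with the foreground, and it drops constant factors.

```latex
% Spatial model (assumed form): the step U moves with the occluding signal I_1
I(\mathbf{x}, t) = U(\mathbf{x} - \mathbf{v}_1 t)\, I_1(\mathbf{x} - \mathbf{v}_1 t)
  + \bigl[1 - U(\mathbf{x} - \mathbf{v}_1 t)\bigr]\, I_2(\mathbf{x} - \mathbf{v}_2 t)

% Spectrum, up to constant factors.
% *_k is convolution over k only; * is convolution over (k, \omega_t).
\hat{I}(\mathbf{k}, \omega_t) \propto
    (\hat{U} *_{\mathbf{k}} \hat{I}_1)(\mathbf{k})\,
      \delta(\mathbf{k}^T \mathbf{v}_1 + \omega_t)
  + \hat{I}_2(\mathbf{k})\, \delta(\mathbf{k}^T \mathbf{v}_2 + \omega_t)
  - \bigl[\hat{U}(\mathbf{k})\, \delta(\mathbf{k}^T \mathbf{v}_1 + \omega_t)\bigr]
    * \bigl[\hat{I}_2(\mathbf{k})\, \delta(\mathbf{k}^T \mathbf{v}_2 + \omega_t)\bigr]
```

In this reading, the distortion term A(k) would be the part of the first term that goes beyond the plain spectrum of I1. The third term is the convolution described in the last bullet.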

7
Spectral Analysis of Transparency

Transparency is viewed as a special case of
occlusion by substituting the Heaviside function
with a real constant a (0 < a < 1)
The corresponding spectrum is characterized by 2
oriented planes without distortion
Though in the case of occlusion there is an
additional distortion term, most of the energy is
concentrated on the two spectral planes
Thus both occlusion and transparency are
characterized by multiple spectral planes passing
through the origin
Corresponding motion is described by the normals
to these planes!!
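
A sketch of the transparency case in formulas. It assumes two layers I1 and I2 moving with velocities v1 and v2, spatial frequency k and temporal frequency ω_t, with constant factors dropped. The notation is chosen for illustration and may differ from the original slide.

```latex
I(\mathbf{x}, t) = a\, I_1(\mathbf{x} - \mathbf{v}_1 t)
                 + (1 - a)\, I_2(\mathbf{x} - \mathbf{v}_2 t), \qquad 0 < a < 1

\hat{I}(\mathbf{k}, \omega_t) \propto
    a\, \hat{I}_1(\mathbf{k})\, \delta(\mathbf{k}^T \mathbf{v}_1 + \omega_t)
  + (1 - a)\, \hat{I}_2(\mathbf{k})\, \delta(\mathbf{k}^T \mathbf{v}_2 + \omega_t)
```

Each delta function confines one layer's energy to the plane k^T v_i + ω_t = 0. For v_i = (u_i, v_i) the normal of that plane is (u_i, v_i, 1), which is how the normals describe the motion.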

8
Comparison

With spectral analysis, multiple planes describe
both occlusion and transparency based
discontinuities
Spatial analysis is able to describe only
occlusion based motion
However, there is a severe problem in obtaining
the energy spectrum of an image sequence
Due to the block effect of the DFT
To overcome this, the LFT is used, but blurring of
the spectrum reduces the resolution of the spectral
model
To increase resolution, the LFT window needs to
be increased, but in a large spatio-temporal
neighborhood the constant motion assumption is
endangered
Therefore occlusions are best analyzed in spatial
domain

9
Why Integrated Approach?

Spectral analysis has the following advantages
Motion estimation is based on phase changes of
the FT, so it is robust to global illumination
changes
Computational cost is significantly lower
Size and shape of moving objects do not affect
analysis
However spectral information suffers from
resolution problems
Use spatial information to improve analysis
accuracy
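
A small check of the illumination claim. It models a global illumination change as a positive gain applied to the whole frame, which is an assumption, and the helper name `unit_phase` is hypothetical. Under that model the FT phase is unchanged and only the magnitude scales.

```python
import numpy as np

def unit_phase(frame):
    """Unit-magnitude phase factors of the 2D FT of a frame."""
    spectrum = np.fft.fft2(frame)
    return spectrum / np.abs(spectrum)

rng = np.random.default_rng(0)
frame = rng.random((16, 16))
dimmed = 0.4 * frame   # assumed global illumination change: gain only

# Largest difference between the phase factors of the two frames.
phase_difference = np.max(np.abs(unit_phase(dimmed) - unit_phase(frame)))
```

An additive brightness offset would change only the zero-frequency term, so it would not disturb phase-based estimates at the other frequencies either.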

10
Frequency Domain

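M: number of moving objects, indexed by l, 1 ≤ l ≤ M
Object l has a luminance at each pixel and a velocity
The FT of object l is S_l, expressed in terms of
the 2D frequency,
the image size,
the FT magnitude and the
FT phase
Each object is displaced after each frame, so its
FT is multiplied by a linear phase term
(Fourier shift theorem)
Background has FT S_b
FT of frame k is the sum of the background FT, the
displaced object FTs and the measurement noise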
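
The symbols on this slide were lost in the transcript. The block below is a hedged reconstruction of the model it describes. The names s_l, d_l, N_1, N_2, Φ_l, X_k and V_noise,k are assumptions chosen for illustration, and occlusion effects are ignored here.

```latex
% FT of object l at 2D frequency \omega on an N_1 x N_2 image
S_l(\boldsymbol{\omega}) = |S_l(\boldsymbol{\omega})|\,
  e^{j \Phi_l(\boldsymbol{\omega})}, \qquad
\boldsymbol{\omega} = \Bigl(\tfrac{2\pi m}{N_1}, \tfrac{2\pi n}{N_2}\Bigr)^T

% Shift theorem: per-frame displacement d_l, applied k-1 times after frame 1
s_l\bigl(\mathbf{x} - (k-1)\,\mathbf{d}_l\bigr)
  \;\longleftrightarrow\;
S_l(\boldsymbol{\omega})\, e^{-j (k-1)\, \boldsymbol{\omega}^T \mathbf{d}_l}

% FT of frame k: background, displaced objects, measurement noise
X_k(\boldsymbol{\omega}) = S_b(\boldsymbol{\omega})
  + \sum_{l=1}^{M} S_l(\boldsymbol{\omega})\,
    e^{-j (k-1)\, \boldsymbol{\omega}^T \mathbf{d}_l}
  + V_{\mathrm{noise},k}(\boldsymbol{\omega})
```

At a fixed frequency, each object contributes a complex harmonic in the frame index k.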

11
Frequency Domain (contd)

For frame 1, the FT is the sum of the background FT,
the object FTs and the noise
A moving object occludes one part of the
background and un-occludes another. The FTs of the
un-occluded and occluded parts of the background
from frames 1 to k enter the FT of frame k as
additional terms, and likewise up to frame N
Stacking the FTs of the N frames gives
X = Z + Vnoise + Vbck
where Z is an N x (M + 1) data matrix
Vnoise is additive measurement noise
Vbck represents occluded and unoccluded
background areas

12
Frequency Domain (contd)

Decompose Z as Z = AS
S = [S_b, S_1, ..., S_M]^T
A: a Vandermonde matrix containing the motion
information, with one row per frame
We have X = AS + Vnoise + Vbck, which
is an overdetermined system
Solve in an LS sense to get S
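
The LS step can be sketched at a single frequency. This is an illustration under stated assumptions, not the authors' implementation. The per-frame phase advances in `phases` are hypothetical known values, there is no noise and no occlusion term, and the function name is invented.

```python
import numpy as np

def vandermonde_motion_matrix(phases, n_frames):
    """N x (M+1) matrix whose row k holds exp(-1j * k * phase) per component."""
    z = np.exp(-1j * np.asarray(phases, dtype=float))
    return np.vander(z, n_frames, increasing=True).T

rng = np.random.default_rng(1)
phases = [0.0, 0.7, 1.9]    # static background plus two objects (assumed)
A = vandermonde_motion_matrix(phases, n_frames=8)

# Unknown FT values [S_b, S_1, S_2] at this frequency.
S_true = rng.normal(size=3) + 1j * rng.normal(size=3)
X = A @ S_true               # stacked frame FTs, noise-free

# Eight equations and three unknowns: solve in the least-squares sense.
S_hat, *_ = np.linalg.lstsq(A, X, rcond=None)
```

With distinct phases the columns of A are independent, so the overdetermined system has a unique LS solution.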

13
Counting Number of Objects

Rank of the noiseless data correlation matrix
R_Z = A R_S A^H, where R_S is the correlation matrix of S,
is equal to the rank of A
Due to the Vandermonde structure of A, it has M
independent columns. Therefore the rank gives the number
of independently moving objects
For noise with R_V = σ^2 I, the singular values of the
sample correlation matrix R_X are λ_i + σ^2,
where λ_i are the
singular values of R_Z
In practice λ_i >> σ^2 for the signal components,
so M can be determined from them
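
The rank argument can be tried on synthetic data. The sketch below is a simplified stand-in, not the paper's procedure. It assumes a fixed Vandermonde matrix A, random snapshots of S, a black background, and a hypothetical threshold of ten times the median eigenvalue, which only works if more than half of the eigenvalues are noise.

```python
import numpy as np

def count_harmonics(X, factor=10.0):
    """Count eigenvalues of the sample correlation matrix above the noise floor."""
    R = X @ X.conj().T / X.shape[1]           # sample correlation matrix R_X
    eigenvalues = np.linalg.eigvalsh(R)       # real, ascending (R is Hermitian)
    # Assumption: the median eigenvalue is a noise eigenvalue (about sigma^2).
    return int(np.sum(eigenvalues > factor * np.median(eigenvalues)))

rng = np.random.default_rng(2)
n_frames, n_snapshots, sigma = 10, 500, 0.05
phases = np.array([0.5, 1.4, 2.6])            # three moving objects (assumed)
A = np.exp(-1j * np.outer(np.arange(n_frames), phases))   # N x M Vandermonde

def complex_normal(shape):
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

X = A @ complex_normal((3, n_snapshots)) + sigma * complex_normal((n_frames, n_snapshots))
M_hat = count_harmonics(X)
```

Here three eigenvalues stand well above the remaining seven, which cluster near σ^2, so the count recovers the number of objects.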

14
Motion Estimation

FT of the frames contains motion information in
the form of a sum of weighted harmonics
Authors propose a simple, computationally cheap
method for motion estimation that is not
restricted to constant translations
Constant Motion
Phase change Φ_{1,k} between frames 1 and k
Its inverse FT φ_{1,k} is a weighted sum of delta
functions
Peaks correspond to the harmonics
of each object l
Motion can be extracted from the peak locations
In practice, aliasing due to neighboring peaks can
degrade resolution
First detect and remove strong motion components.
Then weaker harmonics can be detected more easily

15
Time Varying Motion

The initial estimate gives the average velocity
over frames 1 to k
T_{1,k} is the time from frame 1 to k
This can be repeated for shorter subsequences,
until the velocities become similar, i.e. the motion
within each subsequence is constant
If the velocity of object l varies from frame 1 to k,
the matrix that replaces A
does not have Vandermonde structure, so the
number of independent motions cannot be estimated
beforehand
However, displacements between frames 1 and k can
be estimated
From these estimates the number of motions can be
found
S can be obtained by an LS solution using this
matrix instead of A
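
The averaging effect and the subsequence idea can be illustrated with a toy 1D track. All numbers are hypothetical, frames are assumed one time unit apart so that T_{1,k} = k - 1, and the helper name is invented.

```python
import numpy as np

def average_velocity(positions, first, last):
    """Average velocity between frames first and last (1-based, inclusive)."""
    displacement = positions[last - 1] - positions[first - 1]
    return displacement / (last - first)

# Hypothetical track over 16 frames: 2 px/frame up to frame 5, then 5 px/frame.
steps = np.array([2.0] * 4 + [5.0] * 11)
positions = np.concatenate([[0.0], np.cumsum(steps)])

v_all = average_velocity(positions, 1, 16)     # whole sequence
v_early = average_velocity(positions, 1, 5)    # first subsequence
v_late = average_velocity(positions, 5, 16)    # second subsequence
```

The estimate over the whole sequence falls between the two true velocities, while each subsequence recovers its own constant velocity.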

16
Difference Masks from Frequency Domain Solutions

An accurate solution for S can be obtained from
X - Vbck = Z + Vnoise, if Vbck is known
Approximate Vbck using an object mask, which is
iteratively improved
From each LS solution S_l and the frame luminance s
at each pixel, get the difference D_l(x,y) and the
mask Dmask,l
Dmask,l is closer to 1 for pixels belonging to
object l, since D_l(x,y) ≈ 0 in these positions
In pixels not on object l, Dmask,l is closer to 0
Thus the LS solution S_l gives a measure of the
probability that a pixel (x,y) belongs to object l
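
The exact mask formula is missing from the transcript. The sketch below assumes one simple possibility, an absolute difference normalized to [0, 1], to show the behavior the slide describes. The function name and the test image are hypothetical.

```python
import numpy as np

def difference_mask(object_estimate, frame):
    """Assumed form: D_l = |estimate - frame|, Dmask_l = 1 - D_l / max(D_l)."""
    difference = np.abs(object_estimate - frame)    # D_l(x, y)
    peak = difference.max()
    if peak == 0:
        return np.ones_like(difference)
    return 1.0 - difference / peak

# Hypothetical frame: background luminance 0.2, one object of luminance 0.9.
frame = np.full((8, 8), 0.2)
frame[2:5, 2:5] = 0.9
# Spatial-domain estimate of object l alone, as if from an ideal LS solution.
object_estimate = np.zeros((8, 8))
object_estimate[2:5, 2:5] = 0.9

mask = difference_mask(object_estimate, frame)
```

On the object the difference is zero and the mask equals 1. Elsewhere the estimate disagrees with the frame and the mask drops toward 0.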

17
Probability Masks from Velocity Mapping

Frame pixels are tracked by assigning object
velocities or background velocity 0 to each of
them
If a pixel is tracked with the correct velocity,
its luminance remains fairly constant, i.e. it
has small variance
If a pixel is tracked with an incorrect velocity, its
variance increases
Let F_l be the probability that a tracked pixel has
small variance, i.e. that the pixel belongs to object l
This gives a spatial probability mask, the probability
that pixel (x,y) belongs to object l
Frequency based and spatial masks are combined to
give an optimal probability mask that helps find
the Vbck and S
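
A minimal sketch of velocity mapping on a synthetic sequence. The mapping from variance to probability, exp(-variance / tau), and the value of tau are assumptions, since the slide does not give the formula. Function names are hypothetical.

```python
import numpy as np

def tracked_variance(seq, u, v):
    """Temporal variance of each frame-0 pixel when tracked with velocity (u, v)."""
    aligned = np.stack([np.roll(frame, shift=(-v * t, -u * t), axis=(0, 1))
                        for t, frame in enumerate(seq)])
    return aligned.var(axis=0)

def probability_mask(variance, tau=1e-2):
    """Assumed mapping: small variance gives a probability close to 1."""
    return np.exp(-variance / tau)

rng = np.random.default_rng(3)
background = rng.random((32, 32))
patch = rng.random((6, 6))
frames = []
for t in range(5):
    frame = background.copy()
    frame[10:16, 4 + 2 * t:10 + 2 * t] = patch   # object moves 2 px/frame in x
    frames.append(frame)
seq = np.stack(frames)

p_object = probability_mask(tracked_variance(seq, u=2, v=0))
p_background = probability_mask(tracked_variance(seq, u=0, v=0))
```

A pixel that starts on the object, such as (row 12, column 6), gets a high value in `p_object` and a lower one in `p_background`. A static background pixel behaves the opposite way.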

18
Results: Synthetic Data

Constant Motion
2 squares translating against a black background
SVD of Rx has 2 harmonics
Peaks of FT give correct velocities, and moving
objects are correctly segmented

Non-constant Translational Motion
2 separate motions in frames 1-5 and frames 6-16
When all frames are used, estimates between the true
values are obtained
When analyzed as two sequences, there is a clear
separation of the time-varying velocities

19
Results - Real Image Synthetic Sequences

Motion is accurately estimated as (15, 0)
There are artifacts due to the estimation of Vbck,
which vanish when Vbck is known
Vertical stripes outside the motion region due to
regularization in the LS solution

20
Results: Real Sequence with Multiple Objects

Sequence with dark car moving rightwards and
white car moving leftwards
Initial LS solution separates background and the
2 moving objects
Frequency domain results are enhanced by the
spatial techniques (probability masks), which
reduce the error

Figure panels: original sequence; initially recovered background; originally extracted first and second cars; cars after spatial masking technique

21
Conclusion and Critique

Contributions
Authors build on work done by Yu, Sommer, et
al. towards integrating spatial and spectral
methods
Propose a new technique for determining the number of
objects, motion estimation and segmentation in the
frequency domain, coupled with object masks in the
spatial domain to improve robustness
Results look good
Limitations
Method is restricted to translational motion only;
it is not obvious how it can be extended to rotations
and shear (rotations will destroy the Vandermonde
structure of the A matrix)
The technique for non-constant translations is
rather ad hoc
It avoids addressing the segmentation problem by
recasting it as temporal segmentation
I suspect it would not work very well in practice
Paper does not address the theoretical
foundations of the mathematics.