Detecting Pedestrians Using Patterns of Motion and Appearance Viola presentation

1
Detecting Pedestrians Using Patterns of Motion
and Appearance (Viola & Jones)

Jasper Snoek

2
Closely Related Work

P. Viola and M. Jones, Robust Real-time Object
Detection, Workshop on Statistical and
Computational Theories of Vision, July 2001.
P. Viola and M. Jones, Rapid Object Detection
Using a Boosted Cascade of Simple Features,
CVPR, 2001.
P. Viola and M. Jones, Robust Real-Time Face
Detection, IJCV, 2004.
P. Viola, M. Jones and D. Snow, Detecting
Pedestrians Using Patterns of Motion and
Appearance, ICCV, 2003.

3
The Goals

Development of a representation of image motion
which is extremely efficient.
Implementation of a state of the art pedestrian
detection system which operates on low-res images
under difficult conditions.

The Approach

Find extremely basic features of the images that
can be computed very quickly. (Real-time)
Get a huge set of features, and then use machine
learning techniques (AdaBoost) to find the best
distinguishing features.

4
The Features

First, 5 images are created from the original 2
(I_t and I_{t+1}) to represent motion:
Δ, U, D, L, R, by shifting I_t 1 pixel in the
corresponding direction (e.g. U means Up; Δ means
no shift, so it is the temporal gradient) and taking
the absolute difference with I_{t+1}.
These images represent crude gradients in motion.
The image shifted in the direction of motion lines
up best with I_{t+1}, so the sum of its pixels will
be smaller than the sums of the other images.
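
A minimal NumPy sketch of the five motion images described above, assuming the two frames are grayscale 2-D arrays of the same size. The function names and the choice to repeat edge pixels at the border are illustrative assumptions, not details given in the slides.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift img by (dy, dx) pixels, repeating edge pixels at the border."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Output pixel (y, x) takes the value of input pixel (y - dy, x - dx).
    return padded[1 - dy:1 - dy + h, 1 - dx:1 - dx + w]

def motion_images(frame_t, frame_t1):
    """Return the delta, U, D, L, R images for two consecutive frames."""
    a = frame_t.astype(np.int32)    # avoid uint8 wrap-around when subtracting
    b = frame_t1.astype(np.int32)
    return {
        "delta": np.abs(a - b),                # no shift: temporal gradient
        "U": np.abs(shift(a, -1, 0) - b),      # frame t shifted up one row
        "D": np.abs(shift(a, 1, 0) - b),       # shifted down
        "L": np.abs(shift(a, 0, -1) - b),      # shifted left
        "R": np.abs(shift(a, 0, 1) - b),       # shifted right
    }
```

Comparing the pixel sums of the U, D, L and R images then gives a crude estimate of the direction of motion.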

5
The Features

A feature F_i is a thresholded filter f_i:
F_i = α if f_i(I_t, Δ, U, D, L, R) > t_i
      β otherwise
for some constants α, β, t_i.
There are essentially 3 types of filters:
1. f_i = r_i(S)
2. f_i = abs(r_i(Δ) - r_i(S))
3. f_j = φ_j(S)
φ_m represents a sum of pixels over a rectangular
filter m.
S is one of I_t, Δ, U, D, L or R.
r_i(S) is a sum of pixel values over a box region
of image S.
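
The definitions above can be sketched in a few lines of Python. This is an illustration only: the box coordinates, the threshold and the two output values are made-up numbers, and `box_sum` uses a plain slice sum rather than any fast method.

```python
import numpy as np

def box_sum(img, top, left, height, width):
    """r(S): sum of pixel values over a box region of image S."""
    return int(img[top:top + height, left:left + width].sum())

def feature(filter_value, threshold, alpha, beta):
    """F_i: alpha if the filter response exceeds t_i, beta otherwise."""
    return alpha if filter_value > threshold else beta

# Hypothetical example: a type 1 filter on a small synthetic image.
S = np.arange(16).reshape(4, 4)
f = box_sum(S, top=1, left=1, height=2, width=2)    # 5 + 6 + 9 + 10 = 30
F = feature(f, threshold=20, alpha=1.0, beta=-1.0)  # 30 > 20, so F is alpha
```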

6
Examples

Take two images

[Figure: the two input frames, I_t and I_{t+1}]
7
Representing Motion (Examples)

Compute U, D, L, R by shifting image I_t over 1
pixel and taking the absolute difference with
I_{t+1}. Δ is computed as just abs(I_t - I_{t+1}).

D has a sum of 121,020; U has a sum of 62,126. U has
the smaller sum, so motion is in the upward direction.
[Figure: the D and U images]
8
Filter Type 1

f_i = r_i(S), where S is any of I_t, Δ, U, D, L, R.
r_i(S) is the sum of
pixel values over a box region.

[Figure: the L image]
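
Box sums of this kind are commonly computed with an integral image (summed-area table), the technique used in the Viola and Jones face detection papers. The slides do not spell this out, so the sketch below is an assumption about how the sums could be made fast; the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])
```

Once the table is built, every box sum costs four array reads regardless of the box size, which is what makes a very large filter set practical.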
9
Filter Type 2

f_i = abs(r_i(Δ) - r_i(S))

[Figure: the Δ and U images]
S is any of U, D, L, R. r_i(S) is the sum of
pixel values over a box region.
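
A short sketch of this filter, assuming the Δ image and one shifted-difference image are already available as NumPy arrays. The `(top, left, height, width)` box layout is an assumption made for the example.

```python
import numpy as np

def box_sum(img, top, left, height, width):
    """r(S): sum of pixel values over a box region."""
    return int(img[top:top + height, left:left + width].sum())

def motion_filter(delta, shifted, box):
    """f = abs(r(delta) - r(S)) over one box, with S one of U, D, L, R."""
    return abs(box_sum(delta, *box) - box_sum(shifted, *box))
```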
10
Rectangular Features (Filter Type 3)

f_i = φ_i(S), where φ represents a rectangular filter.
The filter value is the total difference in pixel
values between the dark and light parts of the
rectangles.

[Figure: three examples, with differences of 224, 6,683 and 5,476]
If we set the threshold to 300, this filter can
recognize the symmetry between eyes.
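
As a rough illustration, a two-rectangle filter can be written as the difference between the pixel sums of two adjacent halves of a box. The left/right split used here is an assumption, since the slide's figures are not available; the three difference values are the ones quoted on the slide.

```python
import numpy as np

def two_rect_difference(img, top, left, height, width):
    """Absolute difference between the pixel sums of the left and right
    halves of a box (width is assumed to be even)."""
    half = width // 2
    first = int(img[top:top + height, left:left + half].sum())
    second = int(img[top:top + height, left + half:left + width].sum())
    return abs(first - second)

# The three differences quoted on the slide, tested against the threshold of 300.
slide_differences = [224, 6683, 5476]
below_threshold = [d < 300 for d in slide_differences]   # [True, False, False]
```

Only the first example falls below the threshold, which is the sense in which the thresholded filter picks out the symmetric case.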
11
Classifier

A classifier is a thresholded sum of features:
C(I_t, I_{t+1}) = 1 iff Σ_i F_i(I_t, Δ, U, D, L, R) > T.
A feature is a thresholded filter:
F_i = α if f_i(I_t, Δ, U, D, L, R) > t_i
      β otherwise
This gives us 4 parameters to select (α, β, t_i,
T) in addition to choosing what subset of filters
to use.
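
The classifier rule can be sketched as follows. The filter responses and parameter values are hypothetical; in the real system they would be chosen by training rather than written by hand.

```python
def feature(f_value, t, alpha, beta):
    """F_i: alpha if the filter response exceeds t_i, beta otherwise."""
    return alpha if f_value > t else beta

def classify(filter_values, params, T):
    """C = 1 iff the sum of the feature outputs exceeds T.

    params holds one (t_i, alpha_i, beta_i) tuple per filter response.
    """
    total = sum(feature(f, t, a, b) for f, (t, a, b) in zip(filter_values, params))
    return 1 if total > T else 0

# Hypothetical responses of three filters and their per-filter parameters.
responses = [30, 5, 12]
params = [(20, 1.0, -1.0), (10, 0.5, -0.5), (10, 0.8, -0.2)]
label = classify(responses, params, T=1.0)   # 1.0 - 0.5 + 0.8 exceeds 1.0
```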

12
AdaBoost

1990 - The Strength of Weak Learnability
(Schapire)
1997 - Generalized version of AdaBoost (Schapire &
Singer)
AdaBoost is an algorithm for constructing a
strong classifier as a linear combination
of simple weak classifiers h_t(x).
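
To make the idea concrete, here is a minimal sketch of discrete AdaBoost with threshold stumps as the weak classifiers. It is a simpler variant than the generalized version mentioned above and is not the authors' training code; the exhaustive threshold search and all names are assumptions chosen for clarity rather than speed.

```python
import numpy as np

def train_adaboost(X, y, n_rounds):
    """Discrete AdaBoost with threshold stumps.

    X: (n_samples, n_filters) array of filter responses.
    y: labels in {-1, +1}.
    Returns a list of (weight, filter_index, threshold, polarity).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # uniform example weights to start
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                  # try every filter ...
            for t in np.unique(X[:, j]):    # ... every threshold ...
                for polarity in (1, -1):    # ... and both directions
                    pred = np.where(polarity * (X[:, j] - t) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, polarity, pred)
        err, j, t, polarity, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)        # keep the log finite
        weight = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-weight * y * pred)              # up-weight the mistakes
        w /= w.sum()
        stumps.append((weight, j, t, polarity))
    return stumps

def predict(stumps, X):
    """Strong classifier: sign of the weighted sum of the weak classifiers."""
    score = np.zeros(X.shape[0])
    for weight, j, t, polarity in stumps:
        score += weight * np.where(polarity * (X[:, j] - t) > 0, 1, -1)
    return np.where(score > 0, 1, -1)
```

Because each round selects a single filter, running the loop for a fixed number of rounds doubles as feature selection from the full filter set.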

13
Cascaded Classifier

Using all the features in the classifier would
take too long.
Instead a cascade of classifiers was used where
each subsequent level of the cascade contains
more features.
This way image patches that are very different
from actual pedestrians can be thrown out using
only a few features.
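
The early-rejection logic of a cascade is short enough to show directly. In this sketch each stage is any callable that returns True to pass a patch on; the stand-in stages below compare a number against a bound and are purely hypothetical.

```python
def cascade_classify(patch, stages):
    """Run the stages in order, cheapest first, and stop at the first rejection."""
    for stage in stages:
        if not stage(patch):
            return False       # rejected early; later stages never run
    return True                # survived every stage: report a detection

# Stand-in stages that record when they are evaluated.
evaluated = []

def make_stage(name, min_value):
    def stage(patch):
        evaluated.append(name)
        return patch >= min_value
    return stage

stages = [make_stage("stage 1", 1), make_stage("stage 2", 5), make_stage("stage 3", 9)]
result = cascade_classify(3, stages)   # fails stage 2, so stage 3 is skipped
```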

14
Experiments

Train each classifier in the cascade using 2250
positive examples and 2250 false positives from
the previous stages of the cascade. (This lowers the
false positive rate at each stage.)
Each stage is trained so that 99.5% of true
positives from the previous stage are kept while 10%
of false positives are eliminated (if this can't
be done, more features are added).
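
The per-stage targets compound across the cascade. The sketch below works through that arithmetic under two assumptions that are not stated in the slides: every stage exactly meets its target, and the stages act independently. The number of stages is a free parameter here.

```python
def cascade_rates(n_stages, stage_detection=0.995, stage_fp_pass=0.90):
    """Overall detection and false positive rates after n_stages stages.

    stage_fp_pass is the fraction of false positives a stage lets through,
    i.e. 1 minus the 10% it eliminates.
    """
    detection = stage_detection ** n_stages
    false_positive = stage_fp_pass ** n_stages
    return detection, false_positive

# Hypothetical 10-stage cascade.
detection, false_positive = cascade_rates(10)
```

Under these assumptions the detection rate decays slowly while the false positive rate falls geometrically, which is why each stage can afford a very high detection target.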

15
Experiments

Two detectors (dynamic and static).
Dynamic: trained using 54,624 filters on the
original image I_t and the motion images Δ, U, D,
L, R.
Static: trained using 24,328 filters on only the
original image I_t.

16
Results

ROC curves for the classification (by adjusting
the number of features)

17
Results

Correct detections: 80%
False positive rate (the total number of false
positives / the total number of patches tested):
1/400,000 for the dynamic detector, which
corresponds to 1 false positive every 2 frames.
1/15,000 for the static detector, which
corresponds to 13 false positives per frame.
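
As a quick consistency check, the two pairs of figures above imply how many patches are scanned in each frame. The helper name is made up for this example; the numbers are the ones quoted on the slide.

```python
def implied_patches_per_frame(fp_per_frame, patches_per_false_positive):
    """Patches scanned per frame implied by a false positive rate of
    1 / patches_per_false_positive and a per-frame false positive count."""
    return fp_per_frame * patches_per_false_positive

dynamic = implied_patches_per_frame(0.5, 400_000)   # 1 every 2 frames: 200,000
static = implied_patches_per_frame(13, 15_000)      # 13 per frame: 195,000
```

Both detectors come out at roughly 200,000 patches per frame, so the quoted rates and per-frame counts agree with each other.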

18
Results
[Figure panels: dynamic detector, static detector]
19

[Figure panels: dynamic detector, static detector]
20
Comments

Using more complex features such as optical flow
would likely be more successful (but might make
things slower).
Why not use basic background subtraction? It
would greatly reduce the number of pixels the
detector would have to search over.

21
Comments

Using information about where pedestrians were in
previous frames would improve the detector and
help against occlusions, etc. (i.e. tracking).
Is overfitting a problem? AdaBoost can succumb
to overfitting the training data (and thus
generalize badly) by picking too many features.
Here we have 2250 training examples and 54,624
features. Is 24.3 features per training example
not too many?
