New Features and Insights for Pedestrian Detection
Stefan Walk, Nikodem Majer, Konrad Schindler, Bernt Schiele
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Authors (1/4)
- Stefan Walk
  - Experience
    - 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
    - 2003-2007, Diploma in Physics, Technische Universität Darmstadt, Germany
  - Research interests
    - People detection
    - Detection from video data (utilizing motion information)
  - Papers
    - Multi-cue Onboard Pedestrian Detection (CVPR09)
Authors (2/4)
- Nikodem Majer
  - Experience
    - 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
  - Research interests
  - Papers
Authors (3/4)
- Konrad Schindler
  - Experience
    - 2009-, Assistant Professor, TU Darmstadt, Germany
    - 2007-2008, post-doc, ETH Zurich
    - 2004-2006, post-doc, Monash University, Melbourne, Australia
    - 2001-2003, research assistant, Graz University of Technology, Austria
  - Research interests
    - Computer vision (3D scene analysis, biologically inspired vision, tracking)
    - Image processing, pattern recognition, machine learning, photogrammetry
  - Papers
    - PAMI10, CVPR10, ICCV10
Authors (4/4)
- Bernt Schiele
  - Experience
    - 1999-2004, Assistant Professor, ETH Zurich, Switzerland
    - 1997-2000, Postdoctoral Associate and Visiting Assistant Professor, MIT, Cambridge, MA, USA
    - 1994, Visiting researcher at CMU
    - Associate Editor of PAMI and IJCV; Area Chair of ECCV08, CVPR09, ICCV09; Program Chair of ICCV 2011
  - Research interests
    - Perceptual computing, human-computer interfaces
  - Papers
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Abstract (1/2)
- Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory
- In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly, even in the case of low-quality video and consequently degraded flow fields
- Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state-of-the-art by up to 20%
Abstract (2/2)
- Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general
- First, we show that a commonly under-estimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations
- Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Main contributions
- First, we introduce a new feature based on self-similarity of low-level features, in particular color histograms from different sub-regions within the detector window
- The second main contribution is to establish a standard of what pedestrian detection with a global descriptor can achieve at present, including a number of recent advances which we believe should be part of best practice, but have not yet been included in systematic evaluations
- Our third main contribution is two important insights that apply not only to pedestrian detection, but more generally to classifier-based object detection: (1) bootstrapping is very important; (2) the existing evaluation protocol is insufficient
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Outline
- Motivation: on challenging datasets (Caltech Pedestrian, TUD-Brussels), detection performance is still unsatisfactory, so better features and classifiers are needed
- This section reviews the features and classifiers compared in the paper
- Related features
  - Haar-like, Viola & Jones 2001
  - HOG (Histogram of Oriented Gradients), Dalal 2005
  - HOF (Histogram of Flow), Dalal 2006, a motion feature
  - HOG-LBP, 2009, gradients combined with local texture
  - CSS (Color Self-Similarity), proposed in this paper
- Related classifiers
  - SVM
  - MPLBoost (Multiple Pose Boosting), Dollar 2008
Haar-like feature (1/2)
- Haar-like feature
  - Differences of sums over adjacent rectangular image regions, computed efficiently with integral images
  - Rotated variants exist (45°, 22.5°, 11.25°)
  - Haar features are re-evaluated in this paper (CVPR10)
- [Figures: examples of Haar-like feature templates]
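A minimal sketch of the Haar-like feature idea from the slide above, assuming the standard integral-image formulation (function names here are illustrative, not from the paper):

```python
import numpy as np

def integral_image(img):
    """Integral image: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) using the integral image."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar response: left half minus right half."""
    half = w // 2
    left = box_sum(ii, y, x, y + h, x + half)
    right = box_sum(ii, y, x + half, y + h, x + w)
    return left - right
```

Once the integral image is built, every Haar response costs only a handful of lookups, which is why these features were attractive for fast detectors.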
Haar-like feature (2/2)
- [Original text unrecoverable: discussion of Haar feature computation over window positions (x, y, …)]
HOG feature (1/1)
- HOG feature: gradient orientation histograms
  - Optional gamma/color normalization of the input
  - Compute image gradients and their orientations
  - Accumulate orientation histograms, weighted by gradient magnitude, over small cells
  - Normalize the histograms over overlapping blocks of cells and concatenate them into the descriptor
- [Figures: HOG computation pipeline and descriptor visualization]
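The pipeline above can be sketched in a few dozen lines. This is a simplified stand-in (hard orientation binning instead of the trilinear interpolation of the real descriptor); names and defaults are illustrative:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, block=2):
    """Minimal HOG sketch: per-cell gradient orientation histograms,
    L2-normalized over overlapping block x block cell groups, concatenated."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            b = (a / (180.0 / bins)).astype(int) % bins  # hard binning
            for i in range(bins):
                hist[cy, cx, i] = m[b == i].sum()
    feats = []
    for cy in range(n_cy - block + 1):
        for cx in range(n_cx - block + 1):
            v = hist[cy:cy+block, cx:cx+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(feats)
```

For the standard 64x128 pedestrian window this gives 16x8 cells, 15x7 = 105 blocks, and 105 x 36 = 3780 dimensions, matching the well-known HOG descriptor size.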
HOF feature (1/1)
- HOF feature: histograms of optic flow
  - Estimate the flow components in x and y
  - Histogram differences of the flow between neighboring regions, which cancels constant (camera-induced) motion
  - The resulting histograms are normalized and concatenated as in HOG
- [Figure: original vs. 3x3 IMHwd (Internal Motion Histogram wavelet difference) layout]
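A toy illustration of the motion-cancellation idea behind IMHwd (this is a simplification of the actual scheme, assuming a dense flow field is already given; only right-neighbor differences are used here):

```python
import numpy as np

def flow_diff_histograms(u, v, cell=8, bins=9):
    """Histogram the *differences* of mean flow between neighboring cells,
    so a constant (camera-induced) flow component cancels out."""
    n_cy, n_cx = u.shape[0] // cell, u.shape[1] // cell
    # Mean flow per cell
    cu = u[:n_cy*cell, :n_cx*cell].reshape(n_cy, cell, n_cx, cell).mean(axis=(1, 3))
    cv = v[:n_cy*cell, :n_cx*cell].reshape(n_cy, cell, n_cx, cell).mean(axis=(1, 3))
    # Differences to the right-hand neighbor cell
    du, dv = cu[:, 1:] - cu[:, :-1], cv[:, 1:] - cv[:, :-1]
    mag = np.hypot(du, dv)
    ang = np.rad2deg(np.arctan2(dv, du)) % 360.0
    b = (ang / (360.0 / bins)).astype(int) % bins
    hist = np.zeros(bins)
    for i in range(bins):
        hist[i] = mag[b == i].sum()
    return hist

# A purely translational (camera-induced) flow field produces no response:
u = np.full((64, 64), 3.0)
v = np.full((64, 64), -1.0)
```

Internal motion of the pedestrian's limbs, by contrast, produces flow differences between neighboring cells and therefore a nonzero histogram.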
HOG-LBP (1/1)
- HOG-LBP feature: concatenation of HOG and LBP
  - HOG encodes gradient/edge structure
  - LBP (Local Binary Pattern) encodes local texture
  - Reported excellent results on the INRIA Person dataset (2009)
- [Figure: LBP computation]
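A minimal sketch of the basic 8-neighbor LBP operator mentioned above (the HOG-LBP paper uses a more refined uniform-pattern variant; this shows only the core coding step):

```python
import numpy as np

def lbp_8neighbors(img):
    """Basic LBP: threshold each pixel's 8 neighbors at the center value
    and read the resulting bits as a code in [0, 255]."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]  # interior pixels (centers)
    # Neighbor offsets in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```

The detector feature is then a histogram of these codes over local regions, which is what gets concatenated with HOG.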
CSS (1/1)
- CSS feature: color self-similarity
  - Compute local color histograms over 8x8-pixel blocks inside the detection window
- We experimented with different color spaces, including 3x3x3 histograms in RGB, HSV, HLS and CIE Luv space, and 4x4 histograms in normalized rg, HS and uv, discarding the intensity and only keeping the chrominance. Among these, HSV worked best, and is used in the following
- The feature consists of the pairwise similarities between all block histograms; among the similarity measures tried (L1-norm, L2-norm, chi-square distance, histogram intersection), histogram intersection worked best
- For a 64x128 window there are 8x16 = 128 blocks of 8x8 pixels, giving 128x127/2 = 8,128 pairwise similarities
- Furthermore, second order image statistics, especially co-occurrence histograms, are gaining popularity, pushing feature spaces to extremely high dimensions
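The CSS construction above can be sketched directly. This assumes an HSV image with channels scaled to [0, 1] and uses histogram intersection as the similarity; the function name is illustrative:

```python
import numpy as np

def css_feature(hsv_img, cell=8, bins=(3, 3, 3)):
    """Color self-similarity sketch: local color histograms over cell x cell
    blocks, then pairwise histogram intersection between all blocks."""
    H, W, _ = hsv_img.shape
    n_cy, n_cx = H // cell, W // cell
    hists = []
    for cy in range(n_cy):
        for cx in range(n_cx):
            patch = hsv_img[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].reshape(-1, 3)
            h, _ = np.histogramdd(patch, bins=bins, range=((0, 1),) * 3)
            h = h.ravel()
            hists.append(h / (h.sum() + 1e-9))  # normalize each block histogram
    hists = np.array(hists)
    D = len(hists)
    # Histogram intersection between every pair of blocks: D*(D-1)/2 values
    feat = [np.minimum(hists[i], hists[j]).sum()
            for i in range(D) for j in range(i + 1, D)]
    return np.array(feat)
```

On a 64x128 window this yields 128 block histograms and a 8,128-dimensional feature, matching the count on the slide.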
Classifiers
- SVMs
  - Linear SVM
  - Histogram Intersection Kernel SVM (HIKSVM)
- MPLBoost, Multiple Pose Boosting (ECCV08 workshop)
  - Learns K strong classifiers in parallel; the final score of a window is the maximum over the K classifiers
  - A positive example only needs to be classified positive by one of the K classifiers, so each classifier can specialize on a subset of poses or viewpoints
  - Negative examples have to be classified negative by all K classifiers
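The max-over-K scoring rule is the essential mechanism; a toy sketch (the two "pose expert" scorers here are hypothetical linear stand-ins, not trained boosting classifiers):

```python
import numpy as np

def mpl_score(window_features, classifiers):
    """Max-over-K scoring as in MPLBoost-style classifiers: a window scores
    high if ANY of the K classifiers fires, letting each one specialize
    on a subset of poses/viewpoints."""
    return max(clf(window_features) for clf in classifiers)

# Two hypothetical 'pose experts' as simple linear scorers:
frontal = lambda x: float(np.dot(x, np.array([1.0, 0.0])))
profile = lambda x: float(np.dot(x, np.array([0.0, 1.0])))

# A window that only resembles a profile pedestrian still scores high,
# because only one of the K experts needs to accept it:
score = mpl_score(np.array([0.1, 0.9]), [frontal, profile])
```

A single monolithic classifier would have to average over both appearances; the max rule avoids that compromise.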
Evaluation protocol (1/4)
- Detections are matched against ground-truth annotations window by window
- A detection and an annotation match if their overlap fulfills the PASCAL VOC criterion: intersection over union > 50%
- Each annotation can be matched by at most one detection; the question is how to treat annotations and detections outside the subset being evaluated
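The VOC overlap criterion above is simple to state in code (boxes as corner coordinates; the function name is illustrative):

```python
def iou(a, b):
    """PASCAL VOC overlap criterion: intersection over union of two boxes,
    each given as (x0, y0, x1, y1). A match requires iou > 0.5."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```

Note that intersection over *union* is stricter than intersection over either single area: two boxes that overlap by half of each only reach an IoU of 1/3.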
Evaluation protocol (2/4)
- We split the set of annotations and detections into considered and ignored sets
- Annotations can fall into the ignored set because of size, position, occlusion level, aspect ratio or non-pedestrian label in the Caltech setting
- Detections can fall into the ignored set because of size. E.g. if we wish to evaluate on 50-pixel-or-taller, unoccluded pedestrians, any annotation labeled as occluded and any annotation or detection < 50 pixels falls in the ignored set
Evaluation protocol (3/4)
- For considered detections
  - If they match a considered annotation they count as true positive
  - If they match no annotation, or only one that has already been matched to another detection, they count as false positive
  - If they match an ignored annotation they are discarded
- For ignored detections
  - If an ignored detection matches an ignored annotation, it should be discarded
  - If an ignored detection matches no annotation, it seems reasonable to discard it, but this may introduce a bias
  - If an ignored detection matches a considered annotation, count it as a true positive
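The matching rules for considered detections can be sketched as a greedy loop over score-sorted detections. This is a simplified illustration (it does not implement every ignored-detection case above, and the names are hypothetical):

```python
def _iou(a, b):
    """Intersection over union of boxes (x0, y0, x1, y1)."""
    inter = (max(0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0, min(a[3], b[3]) - max(a[1], b[1])))
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def evaluate(detections, annotations, thr=0.5):
    """Greedy matching: detections sorted by descending score; each
    annotation matched at most once; matches to ignored annotations are
    discarded. detections: [(box, score)], annotations: [(box, is_ignored)]."""
    tp = fp = 0
    matched = [False] * len(annotations)
    for box, _ in sorted(detections, key=lambda d: -d[1]):
        best, best_iou = None, thr
        for i, (abox, _) in enumerate(annotations):
            if not matched[i] and _iou(box, abox) > best_iou:
                best, best_iou = i, _iou(box, abox)
        if best is None:
            fp += 1                      # no free annotation overlaps enough
        elif annotations[best][1]:
            pass                         # matched an ignored annotation: discard
        else:
            matched[best] = True
            tp += 1                      # matched a considered annotation
    return tp, fp
```

Even in this sketch, subtle choices appear (e.g. whether an ignored annotation should "absorb" the best-scoring detection or be skipped in favor of a considered one), which is exactly the ambiguity the next slide summarizes.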
Evaluation protocol (4/4)
- To summarize, there is no single correct way to evaluate on a subset of annotations, and all choices have undesirable side effects
- It is therefore imperative that published results are accompanied by detections, and that evaluation scripts are made public
- As there are boundary effects in almost any setting (all realistic datasets have a minimum annotation size), it must be possible for others to verify that differences are not artifacts of the evaluation
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Database
- INRIA Person dataset
- Caltech Pedestrian dataset
  - Published in 2009 by Dollar et al.
  - Recorded from a moving vehicle
  - Roughly 192k frames for training and 155k frames for testing
  - Annotated with occlusion information; evaluation uses every 30th frame
  - Low video quality (compression artifacts) and many small pedestrians
- TUD-Brussels dataset
  - Published in 2009 by Wojek et al.
  - Recorded from a moving vehicle
  - Test set with 1,326 annotated pedestrians, many at small scales
  - Detection window of 64x128 pixels; pedestrian annotations down to 48x96 pixels
Experiment 1: HOG-LBP (1/1)
- [Result plots on INRIA and TUD-Brussels]
- However, while we were able to reproduce their good results on INRIA Person, we could not gain anything with LBPs on other datasets. They seem to be affected when imaging conditions change (in our case, we suspect demosaicing artifacts to be the issue)
Experiment 2: Color information (1/2)
- [Result plots on TUD-Brussels]
- More than 1 fppi (false positive per image) is usually not acceptable in any practical application
- Self-similarity of colors is more appropriate than using the underlying color histograms directly as a feature
- On the contrary, adding the color histogram values directly even hurts the performance of HOG
Experiment 2: Color information (2/2)
- Why is CSS effective?
  - Self-similarity encodes relevant parts like clothing and visible skin regions
- Why does directly using color information show no improvement?
  - The training data was recorded with a different camera and in different lighting conditions than the test data, so the weights learned for color do not generalize from one to the other (a similar reason as for the Haar feature)
Experiment 3: Bootstrapping (1/2)
- With less than two bootstrapping rounds, performance depends heavily on the initial training set
- At least two retraining rounds are required in the HOG + linear SVM framework
- This problem is alleviated, but not solved, by using more initial negative samples
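The bootstrapping (hard-negative mining) loop discussed above can be sketched as follows. The nearest-mean "classifier" here is a deliberately trivial stand-in for the paper's linear SVM, and all names are illustrative:

```python
import numpy as np

def train_with_bootstrapping(pos, initial_neg, neg_pool, rounds=2):
    """Hard-negative mining sketch: after each training round, scan a pool
    of negative windows, add the highest-scoring (hardest) ones to the
    training set, and retrain."""
    neg = initial_neg.copy()
    for _ in range(rounds + 1):          # initial training + `rounds` retrainings
        mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
        w = mu_p - mu_n                  # stand-in linear model (nearest mean)
        b = -0.5 * (mu_p + mu_n).dot(w)
        scores = neg_pool.dot(w) + b
        hard = neg_pool[scores > 0]      # false positives = hard negatives
        if len(hard) == 0:
            break                        # no more mistakes on the pool
        neg = np.vstack([neg, hard])
    return w, b, len(neg)
```

The point of the experiment is that with fewer than two such retraining rounds, the final model still reflects the (random) initial negative set rather than the truly hard negatives.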
Experiment 3: Bootstrapping (2/2)
- For boosting classifiers (Fig. 3(c)), the situation is worse: although mean performance seems stable over bootstrapping rounds, the overall variance only decreases slowly; the initial selection of negative samples has a high influence on the final performance even after 3 bootstrapping rounds
Experiment 4: Seed self-similarity (1/1)
- [Result plot on TUD-Brussels]
- Self-similarity on HOG blocks shows little improvement
- It is important to make sure the result does not depend on the initial selection of negative samples, e.g. by retraining enough rounds with SVMs
Experiment 5: Caltech Pedestrian (1/2)
- [Result plots]
Experiment 5: Caltech Pedestrian (2/2)
- Color self-similarity is indeed complementary to gradient information
- Motion information contributes greatly to pedestrian detection. The reason that HOF works so well on the "near" scale is probably that during multi-scale flow estimation compression artifacts are less visible at higher pyramid levels, so that the flow field is more accurate for larger people
- The performance of all evaluated algorithms is abysmal under heavy occlusion
Experiment 6: Haar feature (1/1)
- [Result plot on TUD-Brussels]
- Judging from the available research, our feeling is that Haar features can potentially harm more than they help
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Conclusion
- Main results
  - Motion features derived from optic flow yield substantial improvements on image sequences, in combination with HOG
  - The new color self-similarity feature consistently improves performance across datasets (CSS)
  - The number of bootstrapping rounds has a drastic influence on detector performance
  - Current evaluation protocols lack crucial details and can distort comparisons
- Practical findings
  - LBP features only brought gains on the INRIA dataset
  - HOG + linear SVM needs at least 2 bootstrapping rounds
  - Haar features can potentially harm more than they help
Thanks!!