New Features and Insights for Pedestrian Detection
Stefan Walk, Nikodem Majer, Konrad Schindler, Bernt Schiele
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Authors (1/4)
- Stefan Walk
  - Experience
    - 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
    - 2003-2007, Diploma in Physics, Technische Universität Darmstadt, Germany
  - Research interests
    - People detection
    - Detection from video data (utilizing motion information)
  - Papers
    - Multi-cue Onboard Pedestrian Detection (CVPR09)
Authors (2/4)
- Nikodem Majer
  - Experience
    - 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
  - Research interests
  - Papers
Authors (3/4)
- Konrad Schindler
  - Experience
    - 2009-, Assistant Professor, TU Darmstadt, Germany
    - 2007-2008, post-doc, ETH Zurich
    - 2004-2006, post-doc, Monash University, Melbourne, Australia
    - 2001-2003, research assistant, Graz University of Technology, Austria
  - Research interests
    - Computer vision (3D scene analysis, biologically inspired vision, tracking)
    - Image processing, pattern recognition, machine learning, photogrammetry
  - Papers
    - PAMI10, CVPR10, ICCV10
Authors (4/4)
- Bernt Schiele
  - Experience
    - 1999-2004, Assistant Professor, ETH Zurich, Switzerland
    - 1997-2000, Postdoctoral Associate and Visiting Assistant Professor, MIT, Cambridge, MA, USA
    - 1994, Visiting researcher at CMU
    - Associate Editor of PAMI and IJCV; Area Chair of ECCV08, CVPR09, ICCV09; Program Chair of ICCV 2011
  - Research interests
    - Perceptual computing, human-computer interfaces
  - Papers
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Abstract (1/2)
- Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory
- In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly, even in the case of low-quality video and consequently degraded flow fields
- Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state-of-the-art by up to 20%
Abstract (2/2)
- Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general
- First, we show that a commonly under-estimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations
- Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Main contributions
- First, we introduce a new feature based on self-similarity of low-level features, in particular color histograms from different sub-regions within the detector window
- The second main contribution is to establish a standard of what pedestrian detection with a global descriptor can achieve at present, including a number of recent advances which we believe should be part of best practice, but have not yet been included in systematic evaluations
- Our third main contribution is two important insights that apply not only to pedestrian detection, but more generally to classifier-based object detection: (1) bootstrapping is very important; (2) the existing evaluation protocol is insufficient
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Outline
- Motivation: on challenging datasets (Caltech Pedestrian, TUD-Brussels), detection performance is still unsatisfactory, so better features and classifiers are needed
- This section reviews the features and classifiers compared in the paper
- Related features
  - Haar-like, Viola & Jones 2001
  - HOG (Histogram of Oriented Gradients), Dalal 2005
  - HOF (Histogram of Flow), Dalal 2006, a motion feature
  - HOG-LBP, 2009, gradients combined with local texture
  - CSS (Color Self-Similarity), proposed in this paper
- Related classifiers
  - SVM
  - MPLBoost (Multiple Pose Boosting), Dollar 2008
Haar-like feature (1/2)
- Haar-like feature
  - Differences of sums over adjacent rectangular image regions, computed efficiently with integral images
  - Rotated variants exist (45°, 22.5°, 11.25°)
  - Haar features are re-evaluated in this paper (CVPR10)
- [Figures: examples of Haar-like feature templates]
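A minimal sketch of the Haar-like feature idea from the slide above, assuming the standard integral-image formulation (function names here are illustrative, not from the paper):

```python
import numpy as np

def integral_image(img):
    """Integral image: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) using the integral image."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar response: left half minus right half."""
    half = w // 2
    left = box_sum(ii, y, x, y + h, x + half)
    right = box_sum(ii, y, x + half, y + h, x + w)
    return left - right
```

Once the integral image is built, every Haar response costs only a handful of lookups, which is why these features were attractive for fast detectors.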
Haar-like feature (2/2)
- [Original text unrecoverable: discussion of Haar feature computation over window positions (x, y, …)]
HOG feature (1/1)
- HOG feature: gradient orientation histograms
  - Optional gamma/color normalization of the input
  - Compute image gradients and their orientations
  - Accumulate orientation histograms, weighted by gradient magnitude, over small cells
  - Normalize the histograms over overlapping blocks of cells and concatenate them into the descriptor
- [Figures: HOG computation pipeline and descriptor visualization]
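The pipeline above can be sketched in a few dozen lines. This is a simplified stand-in (hard orientation binning instead of the trilinear interpolation of the real descriptor); names and defaults are illustrative:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, block=2):
    """Minimal HOG sketch: per-cell gradient orientation histograms,
    L2-normalized over overlapping block x block cell groups, concatenated."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            b = (a / (180.0 / bins)).astype(int) % bins  # hard binning
            for i in range(bins):
                hist[cy, cx, i] = m[b == i].sum()
    feats = []
    for cy in range(n_cy - block + 1):
        for cx in range(n_cx - block + 1):
            v = hist[cy:cy+block, cx:cx+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(feats)
```

For the standard 64x128 pedestrian window this gives 16x8 cells, 15x7 = 105 blocks, and 105 x 36 = 3780 dimensions, matching the well-known HOG descriptor size.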
HOF feature (1/1)
- HOF feature: histograms of optic flow
  - Estimate the flow components in x and y
  - Histogram differences of the flow between neighboring regions, which cancels constant (camera-induced) motion
  - The resulting histograms are normalized and concatenated as in HOG
- [Figure: original vs. 3x3 IMHwd (Internal Motion Histogram wavelet difference) layout]
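A toy illustration of the motion-cancellation idea behind IMHwd (this is a simplification of the actual scheme, assuming a dense flow field is already given; only right-neighbor differences are used here):

```python
import numpy as np

def flow_diff_histograms(u, v, cell=8, bins=9):
    """Histogram the *differences* of mean flow between neighboring cells,
    so a constant (camera-induced) flow component cancels out."""
    n_cy, n_cx = u.shape[0] // cell, u.shape[1] // cell
    # Mean flow per cell
    cu = u[:n_cy*cell, :n_cx*cell].reshape(n_cy, cell, n_cx, cell).mean(axis=(1, 3))
    cv = v[:n_cy*cell, :n_cx*cell].reshape(n_cy, cell, n_cx, cell).mean(axis=(1, 3))
    # Differences to the right-hand neighbor cell
    du, dv = cu[:, 1:] - cu[:, :-1], cv[:, 1:] - cv[:, :-1]
    mag = np.hypot(du, dv)
    ang = np.rad2deg(np.arctan2(dv, du)) % 360.0
    b = (ang / (360.0 / bins)).astype(int) % bins
    hist = np.zeros(bins)
    for i in range(bins):
        hist[i] = mag[b == i].sum()
    return hist

# A purely translational (camera-induced) flow field produces no response:
u = np.full((64, 64), 3.0)
v = np.full((64, 64), -1.0)
```

Internal motion of the pedestrian's limbs, by contrast, produces flow differences between neighboring cells and therefore a nonzero histogram.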
HOG-LBP (1/1)
- HOG-LBP feature: concatenation of HOG and LBP
  - HOG encodes gradient/edge structure
  - LBP (Local Binary Pattern) encodes local texture
  - Reported excellent results on the INRIA Person dataset (2009)
- [Figure: LBP computation]
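A minimal sketch of the basic 8-neighbor LBP operator mentioned above (the HOG-LBP paper uses a more refined uniform-pattern variant; this shows only the core coding step):

```python
import numpy as np

def lbp_8neighbors(img):
    """Basic LBP: threshold each pixel's 8 neighbors at the center value
    and read the resulting bits as a code in [0, 255]."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]  # interior pixels (centers)
    # Neighbor offsets in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```

The detector feature is then a histogram of these codes over local regions, which is what gets concatenated with HOG.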
CSS (1/1)
- CSS feature: color self-similarity
  - Compute local color histograms over 8x8-pixel blocks inside the detection window
- We experimented with different color spaces, including 3x3x3 histograms in RGB, HSV, HLS and CIE Luv space, and 4x4 histograms in normalized rg, HS and uv, discarding the intensity and only keeping the chrominance. Among these, HSV worked best, and is used in the following
- The feature consists of the pairwise similarities between all block histograms; among the similarity measures tried (L1-norm, L2-norm, chi-square distance, histogram intersection), histogram intersection worked best
- For a 64x128 window there are 8x16 = 128 blocks of 8x8 pixels, giving 128x127/2 = 8,128 pairwise similarities
- Furthermore, second order image statistics, especially co-occurrence histograms, are gaining popularity, pushing feature spaces to extremely high dimensions
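The CSS construction above can be sketched directly. This assumes an HSV image with channels scaled to [0, 1] and uses histogram intersection as the similarity; the function name is illustrative:

```python
import numpy as np

def css_feature(hsv_img, cell=8, bins=(3, 3, 3)):
    """Color self-similarity sketch: local color histograms over cell x cell
    blocks, then pairwise histogram intersection between all blocks."""
    H, W, _ = hsv_img.shape
    n_cy, n_cx = H // cell, W // cell
    hists = []
    for cy in range(n_cy):
        for cx in range(n_cx):
            patch = hsv_img[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].reshape(-1, 3)
            h, _ = np.histogramdd(patch, bins=bins, range=((0, 1),) * 3)
            h = h.ravel()
            hists.append(h / (h.sum() + 1e-9))  # normalize each block histogram
    hists = np.array(hists)
    D = len(hists)
    # Histogram intersection between every pair of blocks: D*(D-1)/2 values
    feat = [np.minimum(hists[i], hists[j]).sum()
            for i in range(D) for j in range(i + 1, D)]
    return np.array(feat)
```

On a 64x128 window this yields 128 block histograms and a 8,128-dimensional feature, matching the count on the slide.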
Classifiers
- SVMs
  - Linear SVM
  - Histogram Intersection Kernel SVM (HIKSVM)
- MPLBoost, Multiple Pose Boosting (ECCV08 workshop)
  - Learns K strong classifiers in parallel; the final score of a window is the maximum over the K classifiers
  - A positive example only needs to be classified positive by one of the K classifiers, so each classifier can specialize on a subset of poses or viewpoints
  - Negative examples have to be classified negative by all K classifiers
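The max-over-K scoring rule is the essential mechanism; a toy sketch (the two "pose expert" scorers here are hypothetical linear stand-ins, not trained boosting classifiers):

```python
import numpy as np

def mpl_score(window_features, classifiers):
    """Max-over-K scoring as in MPLBoost-style classifiers: a window scores
    high if ANY of the K classifiers fires, letting each one specialize
    on a subset of poses/viewpoints."""
    return max(clf(window_features) for clf in classifiers)

# Two hypothetical 'pose experts' as simple linear scorers:
frontal = lambda x: float(np.dot(x, np.array([1.0, 0.0])))
profile = lambda x: float(np.dot(x, np.array([0.0, 1.0])))

# A window that only resembles a profile pedestrian still scores high,
# because only one of the K experts needs to accept it:
score = mpl_score(np.array([0.1, 0.9]), [frontal, profile])
```

A single monolithic classifier would have to average over both appearances; the max rule avoids that compromise.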
Evaluation protocol (1/4)
- Detections are matched against ground-truth annotations window by window
- A detection and an annotation match if their overlap fulfills the PASCAL VOC criterion: intersection over union > 50%
- Each annotation can be matched by at most one detection; the question is how to treat annotations and detections outside the subset being evaluated
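The VOC overlap criterion above is simple to state in code (boxes as corner coordinates; the function name is illustrative):

```python
def iou(a, b):
    """PASCAL VOC overlap criterion: intersection over union of two boxes,
    each given as (x0, y0, x1, y1). A match requires iou > 0.5."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```

Note that intersection over *union* is stricter than intersection over either single area: two boxes that overlap by half of each only reach an IoU of 1/3.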
Evaluation protocol (2/4)
- We split the set of annotations and detections into considered and ignored sets
- Annotations can fall into the ignored set because of size, position, occlusion level, aspect ratio or non-pedestrian label in the Caltech setting
- Detections can fall into the ignored set because of size. E.g. if we wish to evaluate on 50-pixel-or-taller, unoccluded pedestrians, any annotation labeled as occluded and any annotation or detection < 50 pixels falls in the ignored set
Evaluation protocol (3/4)
- For considered detections
  - If they match a considered annotation they count as true positive
  - If they match no annotation, or only one that has already been matched to another detection, they count as false positive
  - If they match an ignored annotation they are discarded
- For ignored detections
  - If an ignored detection matches an ignored annotation, it should be discarded
  - If an ignored detection matches no annotation, it seems reasonable to discard it, but this may introduce a bias
  - If an ignored detection matches a considered annotation, count it as a true positive
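The matching rules for considered detections can be sketched as a greedy loop over score-sorted detections. This is a simplified illustration (it does not implement every ignored-detection case above, and the names are hypothetical):

```python
def _iou(a, b):
    """Intersection over union of boxes (x0, y0, x1, y1)."""
    inter = (max(0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0, min(a[3], b[3]) - max(a[1], b[1])))
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def evaluate(detections, annotations, thr=0.5):
    """Greedy matching: detections sorted by descending score; each
    annotation matched at most once; matches to ignored annotations are
    discarded. detections: [(box, score)], annotations: [(box, is_ignored)]."""
    tp = fp = 0
    matched = [False] * len(annotations)
    for box, _ in sorted(detections, key=lambda d: -d[1]):
        best, best_iou = None, thr
        for i, (abox, _) in enumerate(annotations):
            if not matched[i] and _iou(box, abox) > best_iou:
                best, best_iou = i, _iou(box, abox)
        if best is None:
            fp += 1                      # no free annotation overlaps enough
        elif annotations[best][1]:
            pass                         # matched an ignored annotation: discard
        else:
            matched[best] = True
            tp += 1                      # matched a considered annotation
    return tp, fp
```

Even in this sketch, subtle choices appear (e.g. whether an ignored annotation should "absorb" the best-scoring detection or be skipped in favor of a considered one), which is exactly the ambiguity the next slide summarizes.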
Evaluation protocol (4/4)
- To summarize, there is no single correct way to evaluate on a subset of annotations, and all choices have undesirable side effects
- It is therefore imperative that published results are accompanied by detections, and that evaluation scripts are made public
- As there are boundary effects in almost any setting (all realistic datasets have a minimum annotation size), it must be possible for others to verify that differences are not artifacts of the evaluation
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Database
- INRIA Person dataset
- Caltech Pedestrian dataset
  - Published in 2009 by Dollar et al.
  - Recorded from a moving vehicle
  - Roughly 192k frames for training and 155k frames for testing
  - Annotated with occlusion information; evaluation uses every 30th frame
  - Low video quality (compression artifacts) and many small pedestrians
- TUD-Brussels dataset
  - Published in 2009 by Wojek et al.
  - Recorded from a moving vehicle
  - Test set with 1,326 annotated pedestrians, many at small scales
  - Detection window of 64x128 pixels; pedestrian annotations down to 48x96 pixels
Experiment 1: HOG-LBP (1/1)
- [Result plots on INRIA and TUD-Brussels]
- However, while we were able to reproduce their good results on INRIA Person, we could not gain anything with LBPs on other datasets. They seem to be affected when imaging conditions change (in our case, we suspect demosaicing artifacts to be the issue)
Experiment 2: Color information (1/2)
- [Result plots on TUD-Brussels]
- More than 1 fppi (false positive per image) is usually not acceptable in any practical application
- Self-similarity of colors is more appropriate than using the underlying color histograms directly as a feature
- On the contrary, adding the color histogram values directly even hurts the performance of HOG
Experiment 2: Color information (2/2)
- Why is CSS effective?
  - Self-similarity encodes relevant parts like clothing and visible skin regions
- Why does directly using color information show no improvement?
  - The training data was recorded with a different camera and in different lighting conditions than the test data, so the weights learned for color do not generalize from one to the other (a similar reason as for the Haar feature)
Experiment 3: Bootstrapping (1/2)
- With less than two bootstrapping rounds, performance depends heavily on the initial training set
- At least two retraining rounds are required in the HOG + linear SVM framework
- This problem is alleviated, but not solved, by using more initial negative samples
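The bootstrapping (hard-negative mining) loop discussed above can be sketched as follows. The nearest-mean "classifier" here is a deliberately trivial stand-in for the paper's linear SVM, and all names are illustrative:

```python
import numpy as np

def train_with_bootstrapping(pos, initial_neg, neg_pool, rounds=2):
    """Hard-negative mining sketch: after each training round, scan a pool
    of negative windows, add the highest-scoring (hardest) ones to the
    training set, and retrain."""
    neg = initial_neg.copy()
    for _ in range(rounds + 1):          # initial training + `rounds` retrainings
        mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
        w = mu_p - mu_n                  # stand-in linear model (nearest mean)
        b = -0.5 * (mu_p + mu_n).dot(w)
        scores = neg_pool.dot(w) + b
        hard = neg_pool[scores > 0]      # false positives = hard negatives
        if len(hard) == 0:
            break                        # no more mistakes on the pool
        neg = np.vstack([neg, hard])
    return w, b, len(neg)
```

The point of the experiment is that with fewer than two such retraining rounds, the final model still reflects the (random) initial negative set rather than the truly hard negatives.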
Experiment 3: Bootstrapping (2/2)
- For boosting classifiers (Fig. 3(c)), the situation is worse: although mean performance seems stable over bootstrapping rounds, the overall variance only decreases slowly; the initial selection of negative samples has a high influence on the final performance even after 3 bootstrapping rounds
Experiment 4: Seed self-similarity (1/1)
- [Result plot on TUD-Brussels]
- Self-similarity on HOG blocks shows little improvement
- It is important to make sure the result does not depend on the initial selection of negative samples, e.g. by retraining enough rounds with SVMs
Experiment 5: Caltech Pedestrian (1/2)
- [Result plots]
Experiment 5: Caltech Pedestrian (2/2)
- Color self-similarity is indeed complementary to gradient information
- Motion information contributes greatly to pedestrian detection. The reason that HOF works so well on the "near" scale is probably that during multi-scale flow estimation compression artifacts are less visible at higher pyramid levels, so that the flow field is more accurate for larger people
- The performance of all evaluated algorithms is abysmal under heavy occlusion
Experiment 6: Haar feature (1/1)
- [Result plot on TUD-Brussels]
- Judging from the available research, our feeling is that Haar features can potentially harm more than they help
Outline
- Authors
- Abstract
- Main contributions
- Algorithms
- Experiments
- Conclusion
Conclusion
- Main results
  - Motion features derived from optic flow yield substantial improvements on image sequences, in combination with HOG
  - The new color self-similarity feature consistently improves performance across datasets (CSS)
  - The number of bootstrapping rounds has a drastic influence on detector performance
  - Current evaluation protocols lack crucial details and can distort comparisons
- Practical findings
  - LBP features only brought gains on the INRIA dataset
  - HOG + linear SVM needs at least 2 bootstrapping rounds
  - Haar features can potentially harm more than they help
Thanks!!