Title: Integrating Recognition and Reconstruction for Cognitive Scene Interpretation
1Integrating Recognition and Reconstruction for Cognitive Scene Interpretation
- Bastian Leibe, Nico Cornelis, Kurt Cornelis, Luc Van Gool
- Computer Vision Laboratory, ETH Zurich
- VISICS, KU Leuven
- Sicily Workshop, Syracusa, 22.09.2006
- CVPR06 Video Proceedings, DAGM06
2Motivation
- Urban traffic scene analysis from a moving vehicle
- Detect objects in the image
- Localize them in 3D
- Build up a metric scene model
- Applications: e.g. in driver assistance systems
3Challenges
Brightly-lit areas
Motion blur
Lens flare
Dark shadows
Intra-category variability, multiple
viewpoints, partial occlusion, ...
4Cognitive Loop with 3D Geometry
- Connect recognition and reconstruction
- Reconstruction pathway delivers scene geometry
- → Greatly improves recognition performance
- Recognition detects objects that disturb reconstruction
- → More accurate geometry estimate
5Outline
- Hardware setup
- Reconstruction pathway
- Real-time Structure-from-Motion
- Real-time dense reconstruction
- Recognition pathway
- Local-feature based object detection
- Incorporation of scene geometry
- Temporal integration in world coordinate frame
- Feedback to reconstruction
- Results and Conclusion
6Hardware Setup
- Stereo camera rig mounted on top of the vehicle
- Calibrated w.r.t. wheel base points
- Video streams captured at 25 fps, 360×288 resolution
7Real-Time Structure-from-Motion
- Basis: very fast feature matching
- Simple features
- Optimized for urban environment
- Only computed on green channel of a single camera
- Rest: standard SfM pipeline
Cornelis et al., CVPR06
8Real-Time Dense Reconstruction
- Dense reconstruction on rectified images
- Ruled surface assumption to speed up dense reconstruction
- Correlation measure: sum of per-pixel SSDs along vertical lines
- Line-sweep algorithm with ordering constraints (DP)
- Fast computation on GPU
- Errors introduced by pixels not belonging to facades!
Cornelis et al., CVPR06
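The correlation measure and the dynamic-programming step named on this slide can be sketched roughly as follows. This is a minimal CPU sketch in NumPy, not the GPU implementation of Cornelis et al.: the function names `column_ssd_costs` and `dp_labels`, the integer disparity hypotheses, and the linear jump penalty (used here as a simple stand-in for the ordering constraints) are all illustrative assumptions.

```python
import numpy as np

def column_ssd_costs(left, right, max_disp):
    """costs[d, x]: SSD between column x of `left` and column x - d of `right`,
    summed along the whole vertical line. Under the ruled-surface assumption
    each image column gets a single disparity, so one cost per column suffices."""
    h, w = left.shape
    costs = np.full((max_disp + 1, w), np.inf)
    for d in range(max_disp + 1):
        diff = left[:, d:].astype(float) - right[:, :w - d].astype(float)
        costs[d, d:] = np.sum(diff ** 2, axis=0)
    return costs

def dp_labels(costs, smooth=1.0):
    """Choose one disparity per column by dynamic programming.
    Assumption: a linear penalty on disparity jumps between neighbouring
    columns replaces the ordering constraint mentioned on the slide."""
    n_disp, w = costs.shape
    finite = np.where(np.isfinite(costs), costs, 1e18)  # invalid cells: huge cost
    idx = np.arange(n_disp)
    jump = smooth * np.abs(idx[:, None] - idx[None, :])  # jump[d, d_prev]
    total = finite[:, 0].copy()
    back = np.zeros((n_disp, w), dtype=int)
    for x in range(1, w):
        cand = total[None, :] + jump            # cand[d, d_prev]
        back[:, x] = np.argmin(cand, axis=1)    # best predecessor for each d
        total = cand[idx, back[:, x]] + finite[:, x]
    labels = np.empty(w, dtype=int)
    labels[-1] = int(np.argmin(total))
    for x in range(w - 1, 0, -1):               # backtrack the optimal path
        labels[x - 1] = back[labels[x], x]
    return labels
```

Because the cost is aggregated over a full vertical line, pixels on that line that do not belong to the facade (for example parked cars) contaminate the sum, which is the error source the last bullet points out.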
9Real-Time Dense Reconstruction (2)
- Merge dense reconstructions using known camera poses.
- Voted polygon carving on 2D projection
10Real-Time Dense Reconstruction (2)
- Merge dense reconstructions using known camera poses.
- Voted polygon carving on 2D projection
- Surfaces registered on world map using GPS
11Textured 3D Model
- Run-times
- SfM + bundle adjustment: 26-30 fps on CPU
- Dense reconstruction: 26 fps on GPU
12Information Flow into Recognition
- For each frame, 3D reconstruction delivers
- External camera calibration
- Ground plane estimate
- → Used for improving recognition of the next frame.
13Appearance-Based Car Detection
- Bank of 5 single-view ISM detectors
- Each based on 3 local cues
- Harris-Laplace, Hessian-Laplace, and DoG interest regions
- Local Shape Context descriptors
- Semi-profile detectors additionally mirrored
- Not real-time yet
Leibe, Mikolajczyk, Schiele,06
14Implicit Shape Model - Representation
- Learn appearance codebook
- Extract patches at interest points
- Agglomerative clustering → codebook
- Learn spatial distributions
- Match codebook to training images
- Record matching positions on object
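The two training stages listed above can be illustrated with a small sketch. It is only an approximation of the idea under stated assumptions: descriptors are plain NumPy vectors, similarity is Euclidean distance with a hypothetical threshold `max_dist`, the clustering is a naive average-link loop, and the names `build_codebook` and `record_occurrences` are made up for this example.

```python
import numpy as np

def build_codebook(descriptors, max_dist):
    """Naive average-link agglomerative clustering.
    Merges the two closest clusters until the smallest average distance
    exceeds max_dist; returns one centre (mean descriptor) per cluster."""
    descriptors = np.asarray(descriptors, dtype=float)
    clusters = [[i] for i in range(len(descriptors))]
    d = np.linalg.norm(descriptors[:, None] - descriptors[None, :], axis=2)
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = d[np.ix_(clusters[a], clusters[b])].mean()
                if avg < best:
                    best, pair = avg, (a, b)
        if best > max_dist:
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    return np.array([descriptors[c].mean(axis=0) for c in clusters])

def record_occurrences(descriptors, offsets, codebook, max_dist):
    """For every codebook entry, store the object-relative positions
    (offsets) of all training patches that match it."""
    occurrences = {k: [] for k in range(len(codebook))}
    for desc, off in zip(np.asarray(descriptors, dtype=float), offsets):
        dist = np.linalg.norm(codebook - desc, axis=1)
        for k in np.flatnonzero(dist <= max_dist):
            occurrences[int(k)].append(tuple(off))
    return occurrences
```

At recognition time, each matched codebook entry would cast votes for object positions using the stored offsets.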
15Implicit Shape Model - Recognition
Interest Points
Leibe & Schiele, 04
16Implicit Shape Model - Recognition
Leibe & Schiele, 04
17 2D/3D Interactions
- Likelihood of 3D hypothesis H given image I and 2D detections h
- 2D recognition score
- Expressed in terms of per-pixel p(figure) probabilities
18 2D/3D Interactions
- Likelihood of 3D hypothesis H given image I and 2D detections h
- 3D prior
- Distance prior (uniform range)
- Size prior (Gaussian)
- → Significantly reduced search space
Search corridor
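A rough sketch of how a ground plane turns a 2D detection into a 3D hypothesis that the two priors can score. Everything numeric here is an assumption for illustration: a pinhole camera with its optical axis parallel to a flat ground plane, a horizon row `y_horizon`, and the prior parameters (3-50 m distance range, 1.8 m mean car width, 0.3 m standard deviation). The function names are hypothetical.

```python
import math

def backproject_to_ground(y_bottom, width_px, focal_px, cam_height_m, y_horizon):
    """Distance and metric width of a detection whose lower edge touches
    the ground plane. Returns None if the lower edge is at or above the
    horizon, i.e. the object cannot be standing on the ground."""
    dy = y_bottom - y_horizon
    if dy <= 0:
        return None
    distance = focal_px * cam_height_m / dy
    width_m = width_px * distance / focal_px
    return distance, width_m

def prior_3d(distance, width_m, d_range=(3.0, 50.0), mean_w=1.8, sigma_w=0.3):
    """Uniform distance prior times Gaussian size prior (assumed values)."""
    if not d_range[0] <= distance <= d_range[1]:
        return 0.0
    p_dist = 1.0 / (d_range[1] - d_range[0])
    z = (width_m - mean_w) / sigma_w
    p_size = math.exp(-0.5 * z * z) / (sigma_w * math.sqrt(2.0 * math.pi))
    return p_dist * p_size
```

Detections that back-project outside the distance range, or to an implausible size, receive a prior near zero. That is what restricts the search to a corridor of image positions and scales.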
19 2D/3D Interactions
- Likelihood of 3D hypothesis H given image I and 2D detections h
- 2D/3D transfer
- Two image-plane detections are consistent if they correspond to the same 3D object
- → Multi-viewpoint integration
- → Multi-camera integration
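The formula shown on the three "2D/3D Interactions" slides was an image and did not survive in this transcript. One factorization consistent with the three labelled terms (2D recognition score, 3D prior, 2D/3D transfer) is sketched below. The notation is an assumption, not a copy of the slide.

```latex
p(H \mid \mathcal{I}) \;\sim\; \sum_{h}
  \underbrace{p(h \mid H)}_{\text{2D/3D transfer}} \;
  \underbrace{p(H)}_{\text{3D prior}} \;
  \underbrace{p(h \mid \mathcal{I})}_{\text{2D recognition score}}
```

Read this way, the sum over 2D detections h is what lets several image-plane detections, from different viewpoint detectors or different cameras, support the same 3D hypothesis H.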
20Detections Using Ground Plane Constraints
Left camera, 1175 frames
21Quantitative Results
- Detection performance on first 600 frames
- All cars annotated that were >50% visible
- Ground plane constraint significantly improves precision
- Performance: 0.2 fp/image at 50% recall
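For reference, the quantities quoted above relate to raw detection counts as in the following sketch. The function name and the counts in the usage note are hypothetical. They are chosen only so that the arithmetic reproduces the quoted operating point and are not the experiment's actual numbers.

```python
def detection_metrics(n_true_pos, n_false_pos, n_annotated, n_frames):
    """Recall, precision and false positives per image from raw counts."""
    recall = n_true_pos / n_annotated
    n_detections = n_true_pos + n_false_pos
    precision = n_true_pos / n_detections if n_detections else 0.0
    fp_per_image = n_false_pos / n_frames
    return recall, precision, fp_per_image
```

With made-up counts of 500 correct detections out of 1000 annotated cars and 120 false positives over 600 frames, `detection_metrics(500, 120, 1000, 600)` gives a recall of 0.5 and 0.2 false positives per image.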
22Temporal Integration
- Temporal integration in world coordinate frame
- Using external camera calibration from SfM.
- Each detection transfers to a 3D observation H.
- Find superset of 3D hypotheses.
- Estimate orientation using cluster shape and detected viewpoints.
- Select set of 3D hypotheses that best explain the observations.
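The first steps of this slide can be sketched as follows: per-frame 3D observations are moved into the world frame with the SfM camera pose, then nearby observations are grouped into candidate hypotheses. The pose convention (R, t map camera coordinates to world coordinates), the greedy grouping rule, and both function names are assumptions made for this example. The actual selection step is the MDL optimization on the next slide.

```python
import numpy as np

def to_world(points_cam, R, t):
    """Map N x 3 camera-frame points to the world frame: X_w = R X_c + t."""
    return np.asarray(points_cam, dtype=float) @ np.asarray(R).T + np.asarray(t)

def cluster_observations(points_world, radius):
    """Greedy grouping: an observation joins the first cluster whose current
    mean lies within `radius`, otherwise it starts a new cluster.
    Returns the mean position of each cluster."""
    clusters = []
    for p in points_world:
        for c in clusters:
            if np.linalg.norm(np.mean(c, axis=0) - p) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [np.mean(c, axis=0) for c in clusters]
```

Because all observations live in one world frame, detections of the same parked car from successive frames fall on top of each other even though the camera has moved.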
23Hypothesis Selection for 3D Detections
- Quadratic Boolean Optimization Problem (from MDL)
- Individual scores (diagonal terms)
- Interaction costs (off-diagonal terms)
Leonardis et al., 95
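The optimization problem itself is not preserved in this transcript. In the usual way of writing a quadratic Boolean problem with individual scores on the diagonal and interaction costs off the diagonal, it would look like the sketch below. The symbols n, Q and q_ij are assumed notation.

```latex
\max_{\mathbf{n}} \; \mathbf{n}^{\top} Q \, \mathbf{n},
\qquad
Q = \begin{bmatrix}
  q_{11} & \cdots & q_{1N} \\
  \vdots & \ddots & \vdots \\
  q_{N1} & \cdots & q_{NN}
\end{bmatrix},
\qquad
\mathbf{n} = (n_1, \dots, n_N)^{\top}, \; n_i \in \{0, 1\}
```

Here n_i = 1 means hypothesis i is selected, q_ii is its individual score, and q_ij (i ≠ j) is the cost incurred when hypotheses i and j are selected together.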
24Result of Temporal Integration
25Online 3D Car Location Estimates
26 3D Estimates After Convergence
27Feedback into 3D Reconstruction
- Feedback of detections and segmentation maps
- Used to discard features on cars for SfM
- Used to mask out cars in dense reconstruction
- → More accurate 3D estimates in the next frame.
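The feedback step can be pictured as a simple masking operation. The sketch below assumes the recognition stage provides a per-pixel p(figure) map for the detected cars and that SfM features are (x, y) pixel positions. The 0.5 threshold and both function names are illustrative assumptions.

```python
import numpy as np

def mask_from_pfigure(p_figure, threshold=0.5):
    """Boolean car mask from a per-pixel p(figure) map (threshold assumed)."""
    return np.asarray(p_figure) > threshold

def discard_features_on_cars(keypoints_xy, car_mask):
    """Keep only the keypoints that do not fall on a pixel marked as car.
    Keypoints outside the image are kept unchanged."""
    pts = np.asarray(keypoints_xy)
    xy = np.round(pts).astype(int)
    h, w = car_mask.shape
    inside = (xy[:, 0] >= 0) & (xy[:, 0] < w) & (xy[:, 1] >= 0) & (xy[:, 1] < h)
    keep = np.ones(len(xy), dtype=bool)
    keep[inside] = ~car_mask[xy[inside, 1], xy[inside, 0]]
    return pts[keep]
```

The same Boolean mask could be used to exclude car pixels from the per-column correlation sums in the dense reconstruction.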
28Another Application: 3D City Modeling
- Enhancing your driving experience
29Conclusion
- System for traffic scene analysis integrating
- Structure-from-Motion
- Dense 3D Reconstruction
- Object detection and localization in 2D and 3D
- Temporal integration in world coordinate frame
- Cognitive Loop between 2D and 3D processing
- Reconstruction delivers camera calibration, ground plane
- 3D context tremendously improves recognition performance
- Car detection, segmentation makes 3D estimation more accurate
- System applied to challenging real-world task
- Real-time 3D reconstruction (26-30 fps)
- Accurate object detection and 3D pose estimation results
30Thank you very much for your attention!
http://www.vision.ethz.ch/
http://www.esat.kuleuven.be/psi/visics/
32Real-Time Dense Reconstruction (3)
- Ruled surfaces are registered on world map using GPS.
- Run-times
- SfM + bundle adjustment: 26-30 fps on CPU
- Dense reconstruction: 26 fps on GPU
33MDL Hypothesis Selection
Joint work with Ales Leonardis, UOL
- Savings of a hypothesis
- with
- S_area: data points N belonging to h
- S_model: model cost
- S_error: estimate of error, according to
- Final form of equation
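The equations on this slide were images and are not preserved. For orientation, the savings of a single hypothesis in MDL-based selection following Leonardis et al. are commonly written as a weighted combination of the three terms named above. The weights K_0, K_1, K_2 are assumed notation, and the slide's "final form" is not reconstructed here.

```latex
S_h = K_0 \, S_{\text{area}} - K_1 \, S_{\text{model}} - K_2 \, S_{\text{error}}
```

A hypothesis is worth keeping when the data it explains (S_area) outweighs the cost of describing the model and its residual error.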
34MDL Hypothesis Selection (2)
- Savings of combined hypothesis
- Goal: find combination that best explains the data
- Quadratic Boolean Optimization problem
Leonardis et al., 95
35Conclusion
- System for cognitive scene analysis
- Structure-from-Motion
- Dense 3D Reconstruction
- Object detection and localization
- Temporal integration in world coordinate frame
- Cognitive Loop between 2D and 3D processing
- Reconstruction delivers camera calibration and ground plane.
- 3D context tremendously improves recognition performance.
- Car detection/segmentation makes 3D estimation more accurate.
- Further work
- Add motion model for moving objects
- Extend recognition to more categories
- Optimize recognition run-time
36Information Flow into Recognition
- For each frame, 3D reconstruction delivers
- External camera calibration
- Ground plane estimate
- → Used for improving recognition of the next frame.
- In return, recognition feeds back
- Object detections and top-down segmentation
- → Used to discard features on cars for SfM
- → Used to mask out cars in dense reconstruction
- → More accurate 3D estimates in the next frame.
37Real-Time Dense Reconstruction
- Dense reconstruction on rectified images
- Sum of per-pixel SSDs along vertical lines as correlation measure
- Ruled surface assumption to speed up dense reconstruction
- Fast computation on GPU
- Errors introduced by pixels not belonging to facades!
38Real-Time Dense Reconstruction
- Dense reconstruction on rectified images
- Ruled surface assumption to speed up dense reconstruction
- Correlation measure: sum of per-pixel SSDs along vertical lines
- Plane-sweep algorithm with ordering constraints (DP)
- Fast computation on GPU
41Outline
- Object categorization approach
- Initial hypothesis generation
- Category-specific figure-ground segmentation
- Hypothesis verification using segmentation
- Extensions
- Discussion & Outlook
- New promising directions
42Segmentation: Probabilistic Formulation
- Hypothesis generation
- Segmentation
43Segmentation
- Interpretation of p(figure) map
- per-pixel confidence in object hypothesis
- Use for hypothesis verification
44Formalization in MDL Framework
- Savings of a hypothesis
- Savings of hypothesis combination
- Goal: find combination that best explains the image
- Quadratic Boolean Optimization problem
Leonardis et al., 95
- (In practice often sufficient to compute greedy approximation)
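A greedy approximation of the kind mentioned in the last bullet might look like the sketch below. It assumes the problem is given as a matrix Q with individual savings on the diagonal and (typically negative) interaction terms off the diagonal, and it maximizes n^T Q n over binary n. The function name and the stopping rule are assumptions, not the authors' implementation.

```python
import numpy as np

def greedy_selection(Q):
    """Greedy approximation to max_n n^T Q n with n in {0, 1}^N.
    Starts from the empty set and repeatedly adds the hypothesis with the
    largest positive gain; stops when no addition increases the objective."""
    Q = np.asarray(Q, dtype=float)
    selected = np.zeros(len(Q), dtype=bool)
    while True:
        # Gain of adding i: q_ii plus its interactions with the selected set.
        gains = np.diag(Q) + (Q[:, selected] + Q[selected, :].T).sum(axis=1)
        gains[selected] = -np.inf
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            return np.flatnonzero(selected).tolist()
        selected[best] = True
```

For example, with `Q = [[3, -2, 0], [-2, 2.5, 0], [0, 0, 1]]` hypotheses 0 and 1 overlap strongly, so the procedure keeps hypothesis 0, rejects hypothesis 1, and adds the independent hypothesis 2.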
45 2D/3D Interactions
- Relationship between 2D hypothesis h and 3D hypothesis H given image I
- 2D recognition score
- Expressed in terms of per-pixel p(figure) probabilities
46 2D/3D Interactions
- Relationship between 2D hypothesis h and 3D hypothesis H given image I
- 3D prior
- Distance prior (uniform range)
- Size prior (Gaussian)
- → Significantly reduced search space
Search corridor
47 2D/3D Interactions
- Relationship between 2D hypothesis h and 3D hypothesis H given image I
- 2D/3D transfer
- Two image-plane detections are consistent if they correspond to the same 3D object
- → Multi-viewpoint integration
- → Multi-camera integration
48 2D/3D Knowledge Transfer
49 2D/3D Knowledge Transfer
50Textured 3D model
51Effect of the Ground Plane
53Ground Plane
- Use city model to reduce the number of false positives
54Cognitive Loop with 3D Geometry
- Cognitive Loop
- Bidirectional knowledge transfer involving a
semantic level