Integrating Recognition and Reconstruction for Cognitive Scene Interpretation - PowerPoint PPT Presentation

Slides: 31
Provided by: bast150
Learn more at: http://lear.inrialpes.fr
Transcript and Presenter's Notes


1
Integrating Recognition and Reconstruction for
Cognitive Scene Interpretation
  • Bastian Leibe, Nico Cornelis, Kurt Cornelis, Luc
    Van Gool
  • Computer Vision Laboratory, ETH Zurich
  • Sicily Workshop, Syracusa, 22.09.2006

VISICS, KU Leuven

CVPR'06 Video Proceedings, DAGM'06
2
Motivation
  • Urban traffic scene analysis from a moving
    vehicle
  • Detect objects in the image
  • Localize them in 3D
  • Build up a metric scene model
  • Applications e.g. in driver assistance systems

3
Challenges
Brightly-lit areas
Motion blur
Lens flare
Dark shadows
Intra-category variability, multiple
viewpoints, partial occlusion, ...
4
Cognitive Loop with 3D Geometry
  • Connect recognition and reconstruction
  • Reconstruction pathway delivers scene geometry
  • → Greatly improves recognition performance
  • Recognition detects objects that disturb
    reconstruction
  • → More accurate geometry estimate

5
Outline
  • Hardware setup
  • Reconstruction pathway
  • Real-time Structure-from-Motion
  • Real-time dense reconstruction
  • Recognition pathway
  • Local-feature based object detection
  • Incorporation of scene geometry
  • Temporal integration in world coordinate frame
  • Feedback to reconstruction
  • Results and Conclusion

6
Hardware Setup
  • Stereo camera rig mounted on top of the vehicle
  • Calibrated w.r.t. wheel base points
  • Video streams captured at 25 fps, 360×288
    resolution

7
Real-Time Structure-from-Motion
  • Basis: very fast feature matching
  • Simple features
  • Optimized for urban environment
  • Only computed on green channel of a single camera
  • Rest: standard SfM pipeline

Cornelis et al., CVPR06
8
Real-Time Dense Reconstruction
  • Dense reconstruction on rectified images
  • Ruled surface assumption to speed-up dense
    reconstruction
  • Correlation measure: sum of per-pixel SSDs along
    vertical lines
  • Line-sweep algorithm with ordering constraints
    (DP)
  • Fast computation on GPU
  • Errors introduced by pixels not belonging to
    facades!

Cornelis et al., CVPR06
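
The column-wise matching cost and the sweep over columns can be sketched in a few lines. This is a minimal pure-Python illustration, not the GPU implementation from the talk: the function names are hypothetical, images are plain lists of rows, and a simple penalty on disparity jumps between neighbouring columns stands in for the ordering constraints named on the slide.

```python
def column_ssd(left, right, x, d):
    """Sum of per-pixel squared differences along the vertical line at column x."""
    return sum((row_l[x] - row_r[x - d]) ** 2 for row_l, row_r in zip(left, right))


def sweep_columns(left, right, max_disp, jump_penalty=1.0):
    """Choose one disparity per column by dynamic programming over the columns.

    Assumption: one disparity per column (vertical facades, ruled surface);
    jump_penalty is an illustrative smoothness term.
    """
    width = len(left[0])
    disparities = range(max_disp + 1)
    big = float("inf")
    # cost[x][d] is infinite when column x - d would fall outside the right image.
    cost = [[column_ssd(left, right, x, d) if x - d >= 0 else big
             for d in disparities] for x in range(width)]
    best = [cost[0][:]]   # best[x][d]: cheapest path ending in (x, d)
    back = []             # back[x - 1][d]: predecessor disparity of (x, d)
    for x in range(1, width):
        row, ptr = [], []
        for d in disparities:
            prev = min(disparities,
                       key=lambda p: best[-1][p] + jump_penalty * abs(p - d))
            row.append(cost[x][d] + best[-1][prev] + jump_penalty * abs(prev - d))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest path back from the last column.
    d = min(disparities, key=lambda k: best[-1][k])
    path = [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    return path[::-1]
```

Because the cost is summed over a whole column, pixels that do not lie on the facade (for example cars) contribute to the same sum, which is the source of the errors noted on the slide.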
9
Real-Time Dense Reconstruction (2)
  • Merge dense reconstructions using known camera
    poses.
  • Voted polygon carving on 2D projection

10
Real-Time Dense Reconstruction (2)
  • Merge dense reconstructions using known camera
    poses.
  • Voted polygon carving on 2D projection
  • Surfaces registered on world map using GPS

11
Textured 3D Model
  • Run-times:
  • SfM and bundle adjustment: 26-30 fps on CPU
  • Dense reconstruction: 26 fps on GPU

12
Information Flow into Recognition
  • For each frame, 3D reconstruction delivers:
  • External camera calibration
  • Ground plane estimate
  • → Used for improving recognition of the next
    frame.

13
Appearance-Based Car Detection
  • Bank of 5 single-view ISM detectors
  • Each based on 3 local cues
  • Harris-Laplace, Hessian-Laplace, and DoG interest
    regions
  • Local Shape Context descriptors
  • Semi-profile detectors additionally mirrored
  • Not real-time yet

Leibe, Mikolajczyk, Schiele, '06
14
Implicit Shape Model - Representation
  • Learn appearance codebook
  • Extract patches at interest points
  • Agglomerative clustering → codebook
  • Learn spatial distributions
  • Match codebook to training images
  • Record matching positions on object

15
Implicit Shape Model - Recognition
Interest Points
Leibe & Schiele, '04
16
Implicit Shape Model - Recognition
Leibe & Schiele, '04
17
2D/3D Interactions
  • Likelihood of 3D hypothesis H given image I and
    2D detections h
  • 2D recognition score
  • Expressed in terms of per-pixel p(figure)
    probabilities

18
2D/3D Interactions
  • Likelihood of 3D hypothesis H given image I and
    2D detections h
  • 3D prior
  • Distance prior (uniform range)
  • Size prior (Gaussian)
  • → Significantly reduced search space

Search corridor
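
As a rough illustration of how such a prior prunes hypotheses, the sketch below multiplies a uniform distance prior with a Gaussian size prior. All numbers (distance range, mean height, standard deviation) and function names are hypothetical placeholders, not values from the talk.

```python
import math


def distance_prior(distance_m, min_m=2.0, max_m=50.0):
    """Uniform prior: constant density inside the allowed range, zero outside."""
    return 1.0 / (max_m - min_m) if min_m <= distance_m <= max_m else 0.0


def size_prior(height_m, mean_m=1.5, sigma_m=0.2):
    """Gaussian prior on the real-world height of the object."""
    z = (height_m - mean_m) / sigma_m
    return math.exp(-0.5 * z * z) / (sigma_m * math.sqrt(2.0 * math.pi))


def prior_3d(distance_m, height_m):
    """Combined prior for a 3D hypothesis at a given distance and height."""
    return distance_prior(distance_m) * size_prior(height_m)
```

Hypotheses outside the distance range get zero prior, and implausibly large or small objects are strongly down-weighted, so only a narrow band of image positions and scales remains worth searching.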
19
2D/3D Interactions
  • Likelihood of 3D hypothesis H given image I and
    2D detections h
  • 2D/3D transfer
  • Two image-plane detections are consistent if they
    correspond to the same 3D object
  • → Multi-viewpoint integration
  • → Multi-camera integration
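
One simple way to test this consistency is to back-project each detection onto the ground plane and compare the resulting 3D points. The sketch assumes an idealized pinhole camera with a horizontal optical axis above a flat ground plane; the function names, the camera parameters, and the 1 m threshold are illustrative assumptions.

```python
import math


def foot_point_on_ground(u, v, focal_px, cx, cy, cam_height_m, cam_offset_x_m=0.0):
    """Back-project the bottom-centre pixel (u, v) of a detection onto the ground.

    Returns (lateral, depth) in metres in a common vehicle frame;
    cam_offset_x_m is the lateral position of this camera in that frame.
    """
    if v <= cy:
        raise ValueError("pixel lies on or above the horizon")
    depth = focal_px * cam_height_m / (v - cy)
    lateral = (u - cx) * depth / focal_px + cam_offset_x_m
    return lateral, depth


def same_object(p, q, max_gap_m=1.0):
    """Two detections are treated as consistent if their ground points are close."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= max_gap_m
```

The same check covers both cases on the slide: detections from different viewpoint-specific detectors in one image, and detections from the two cameras of the stereo rig, which differ only in `cam_offset_x_m`.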

20
Detections Using Ground Plane Constraints
Left camera, 1175 frames
21
Quantitative Results
  • Detection performance on first 600 frames
  • All cars annotated that were >50% visible
  • Ground plane constraint significantly improves
    precision
  • Performance: 0.2 fp/image at 50% recall
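
For readers unfamiliar with the measure, the sketch below shows how recall and false positives per image are computed from raw counts. The function name is hypothetical, and the counts in the usage note are invented for illustration only; they are not the counts behind the reported result.

```python
def detection_rates(true_positives, false_positives, annotated_objects, num_images):
    """Return (recall, precision, false positives per image) from raw counts."""
    recall = true_positives / annotated_objects
    precision = true_positives / (true_positives + false_positives)
    fp_per_image = false_positives / num_images
    return recall, precision, fp_per_image
```

With made-up counts of 50 correct detections, 120 false positives, 100 annotated cars and 600 images, this gives a recall of 0.5 at 0.2 false positives per image.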

22
Temporal Integration
  • Temporal integration in world coordinate frame
  • Using external camera calibration from SfM.
  • Each detection transfers to a 3D observation H.
  • Find a superset of 3D hypotheses.
  • Estimate orientation using cluster shape and
    detected viewpoints.
  • Select set of 3D hypotheses that best explain the
    observations.
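
Two of these steps lend themselves to a short sketch: mapping a per-frame observation into the world frame with the camera pose, and reading an orientation off the shape of a cluster of observations. The code works on the 2D ground plane only, uses hypothetical function names, and estimates orientation from cluster shape alone (the detected viewpoints are not used here).

```python
import math


def to_world(point_cam, cam_x, cam_z, cam_yaw):
    """Map a ground-plane point (lateral, depth) from camera to world coordinates.

    Assumed convention: yaw 0 means the camera looks along world +z with
    lateral pointing along world +x; positive yaw turns towards +x.
    """
    lateral, depth = point_cam
    wx = cam_x + depth * math.sin(cam_yaw) + lateral * math.cos(cam_yaw)
    wz = cam_z + depth * math.cos(cam_yaw) - lateral * math.sin(cam_yaw)
    return wx, wz


def principal_axis_angle(points):
    """Angle (radians, from the x axis) of the main axis of a 2D point cluster."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    mz = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    szz = sum((p[1] - mz) ** 2 for p in points)
    sxz = sum((p[0] - mx) * (p[1] - mz) for p in points)
    return 0.5 * math.atan2(2.0 * sxz, sxx - szz)
```

Because all observations are expressed in one world frame, detections of the same parked car from successive frames fall on the same world position even though the vehicle has moved in between.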

23
Hypothesis Selection for 3D Detections
  • Quadratic Boolean Optimization Problem (from MDL)
  • Individual scores (diagonal terms)
  • Interaction costs (off-diagonal terms)

Leonardis et al., '95
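
A quadratic Boolean problem of this shape asks for the binary selection vector m that maximises m^T Q m, where the diagonal of Q holds the individual scores and the off-diagonal entries hold the (usually negative) interaction costs. The sketch below is a simple greedy approximation with a hypothetical function name; it is not necessarily the solver used in the talk.

```python
def greedy_select(q):
    """Greedily maximise m^T Q m over binary vectors m.

    q is a square matrix as a list of rows. Returns the selected indices
    and the objective value of that selection.
    """
    n = len(q)
    selected = []
    total = 0.0
    while True:
        best_gain, best_i = 0.0, None
        for i in range(n):
            if i in selected:
                continue
            # Gain of switching hypothesis i on, given the current selection.
            gain = q[i][i] + sum(q[i][j] + q[j][i] for j in selected)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return selected, total
        selected.append(best_i)
        total += best_gain
```

With two strongly overlapping hypotheses and one independent hypothesis, the greedy pass keeps the stronger of the overlapping pair plus the independent one, since adding the weaker overlapping hypothesis would lower the objective.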
24
Result of Temporal Integration
25
Online 3D Car Location Estimates
26
3D Estimates After Convergence
27
Feedback into 3D Reconstruction
  • Feedback of detections and segmentation maps
  • Used to discard features on cars for SfM
  • Used to mask out cars in dense reconstruction
  • → More accurate 3D estimates in the next frame.
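
The feature-discarding step amounts to a mask lookup. A minimal sketch, assuming integer pixel coordinates and a binary car mask stored as a list of rows (the function name is hypothetical):

```python
def drop_features_on_cars(features, car_mask):
    """Keep only feature points (x, y) that fall outside the car segmentation."""
    return [(x, y) for x, y in features if not car_mask[y][x]]
```

The same mask can be applied to the dense reconstruction, so that pixels on cars no longer contribute to the matching cost.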

28
Another Application: 3D City Modeling
  • Enhancing your driving experience

29
Conclusion
  • System for traffic scene analysis integrating
  • Structure-from-Motion
  • Dense 3D Reconstruction
  • Object detection and localization in 2D and 3D
  • Temporal integration in world coordinate frame
  • Cognitive Loop between 2D and 3D processing
  • Reconstruction delivers camera calibration,
    ground plane
  • 3D context tremendously improves recognition
    performance
  • Car detection, segmentation makes 3D estimation
    more accurate
  • System applied to challenging real-world task
  • Real-time 3D reconstruction (26-30 fps)
  • Accurate object detection and 3D pose estimation
    results

30
  • Thank you very much for your attention!

http://www.vision.ethz.ch/
http://www.esat.kuleuven.be/psi/visics/
31
Real-Time Dense Reconstruction (2)
  • Merge dense reconstructions using known camera
    poses.
  • Voted polygon carving on 2D projection

32
Real-Time Dense Reconstruction (3)
  • Ruled surfaces are registered on world map using
    GPS.
  • Run-times:
  • SfM and bundle adjustment: 26-30 fps on CPU
  • Dense reconstruction: 26 fps on GPU

33
MDL Hypothesis Selection
Joint work with Ales Leonardis, UOL
  • Savings of a hypothesis h
  • with
  • S_area: data points N belonging to h
  • S_model: model cost
  • S_error: estimate of error
  • Final form of equation

34
MDL Hypothesis Selection (2)
  • Savings of combined hypothesis
  • Goal: find the combination that best explains
    the data
  • Quadratic Boolean Optimization problem
    (Leonardis et al., '95)
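
The formulas on these two backup slides are not preserved in the transcript. In the MDL literature following Leonardis et al., the savings of a single hypothesis and the joint selection problem are usually written along the following lines. The weighting constants K_0, K_1, K_2, the selection vector m, and the entries q_ij are notation from that literature and an assumption about what the slides showed.

```latex
% Savings of a single hypothesis h
S_h = K_0\, S_{\text{area}} - K_1\, S_{\text{model}} - K_2\, S_{\text{error}}

% Selecting the best combination of n hypotheses
\max_{m \in \{0,1\}^n} \; m^{\top} Q\, m,
\qquad
Q = \begin{pmatrix}
  q_{11} & \cdots & q_{1n} \\
  \vdots & \ddots & \vdots \\
  q_{n1} & \cdots & q_{nn}
\end{pmatrix}
```

Here the diagonal entries q_ii carry the savings of the individual hypotheses and the off-diagonal entries q_ij carry the interaction costs between overlapping hypotheses, matching the diagonal and off-diagonal terms named on the hypothesis selection slide.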

35
Conclusion
  • System for cognitive scene analysis
  • Structure-from-Motion
  • Dense 3D Reconstruction
  • Object detection and localization
  • Temporal integration in world coordinate frame
  • Cognitive Loop between 2D and 3D processing
  • Reconstruction delivers camera calibration and
    ground plane.
  • 3D context tremendously improves recognition
    performance.
  • Car detection/segmentation makes 3D estimation
    more accurate.
  • Further work
  • Add motion model for moving objects
  • Extend recognition to more categories
  • Optimize recognition run-time

36
Information Flow into Recognition
  • For each frame, 3D reconstruction delivers
  • External camera calibration
  • Ground plane estimate
  • → Used for improving recognition of the next
    frame.
  • In return, recognition feeds back
  • Object detections and top-down segmentation
  • → Used to discard features on cars for SfM
  • → Used to mask out cars in dense reconstruction
  • → More accurate 3D estimates in the next frame.

37
Real-Time Dense Reconstruction
  • Dense reconstruction on rectified images
  • Sum of per-pixel SSDs along vertical lines as
    correlation measure
  • Ruled surface assumption to speed-up dense
    reconstruction
  • Fast computation on GPU
  • Errors introduced by pixels not belonging to
    facades!

38
Real-Time Dense Reconstruction
  • Dense reconstruction on rectified images
  • Ruled surface assumption to speed-up dense
    reconstruction
  • Correlation measure: sum of per-pixel SSDs along
    vertical lines
  • Plane-sweep algorithm with ordering constraints
    (DP)
  • Fast computation on GPU

39
Real-Time Dense Reconstruction (2)
  • Merge dense reconstructions using known camera
    poses.
  • Voted polygon carving on 2D projection

40
Real-Time Dense Reconstruction (2)
  • Merge dense reconstructions using known camera
    poses.
  • Voted polygon carving on 2D projection

41
Outline
  • Object categorization approach
  • Initial hypothesis generation
  • Category-specific figure-ground segmentation
  • Hypothesis verification using segmentation
  • Extensions
  • Discussion and Outlook
  • New promising directions

42
Segmentation: Probabilistic Formulation
  • Hypothesis generation
  • Segmentation

43
Segmentation
  • Interpretation of p(figure) map
  • per-pixel confidence in object hypothesis
  • Use for hypothesis verification

44
Formalization in MDL Framework
  • Savings of a hypothesis
  • Savings of hypothesis combination
  • Goal: find the combination that best explains
    the image
  • Quadratic Boolean Optimization problem
    (Leonardis et al., '95)
  • (In practice often sufficient to compute a
    greedy approximation)

45
2D/3D Interactions
  • Relationship between 2D hypothesis h and 3D
    hypothesis H given image I
  • 2D recognition score
  • Expressed in terms of per-pixel p(figure)
    probabilities

46
2D/3D Interactions
  • Relationship between 2D hypothesis h and 3D
    hypothesis H given image I
  • 3D prior
  • Distance prior (uniform range)
  • Size prior (Gaussian)
  • → Significantly reduced search space

Search corridor
47
2D/3D Interactions
  • Relationship between 2D hypothesis h and 3D
    hypothesis H given image I
  • 2D/3D transfer
  • Two image-plane detections are consistent if they
    correspond to the same 3D object
  • → Multi-viewpoint integration
  • → Multi-camera integration

48
2D/3D Knowledge Transfer
49
2D/3D Knowledge Transfer
50
Textured 3D model
51
Effect of the Ground Plane
52
2D/3D Interactions
  • Relationship between 2D hypothesis h and 3D
    hypothesis H given image I
  • 2D/3D transfer
  • Two image-plane detections are consistent if they
    correspond to the same 3D object
  • → Multi-viewpoint integration
  • → Multi-camera integration

53
Ground Plane
  • Use city model to reduce the amount of false
    positives

54
Cognitive Loop with 3D Geometry
  • Cognitive Loop
  • Bidirectional knowledge transfer involving a
    semantic level