Title: Seeing the World Behind the Image


1
Seeing the World Behind the Image
Spatial Layout for 3D Scene Understanding
  • Derek Hoiem
  • July 10, 2007
  • Robotics Institute
  • Carnegie Mellon University

Thesis Committee: Alexei A. Efros, Martial Hebert, Rahul Sukthankar,
Takeo Kanade, William Freeman
2
Scene Understanding
3
The World Behind the Image
4
3D Spatial Layout
SKY
VERTICAL
VERTICAL
SUPPORT
  • Description of 3D Surfaces
  • Occlusion Relationships
  • Camera Viewpoint and Objects

5
3D Spatial Layout
  • Description of 3D Surfaces
  • Occlusion Relationships
  • Camera Viewpoint and Objects

6
3D Spatial Layout
Car
Person
Car
  • Description of 3D Surfaces
  • Occlusion Relationships
  • Camera Viewpoint and Objects

7
Recent Work in 3D
Oliva & Torralba 2001
Saxena, Chung & Ng 2005
Torralba, Murphy & Freeman 2003
8
Our Main Challenge
  • Recovering 3D geometry from a single 2D projection
  • Infinite number of possible solutions!
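
The ambiguity can be made concrete with a small sketch. It assumes an ideal pinhole camera with focal length 1 (an assumption for illustration, not something stated on the slide); the function names are hypothetical. Every 3D point along one viewing ray projects to the same pixel, so a single image alone cannot tell them apart.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    return (focal * X / Z, focal * Y / Z)


def points_on_ray(pixel, depths, focal=1.0):
    """Return 3D points at the given depths that all project to `pixel`."""
    x, y = pixel
    return [(x * z / focal, y * z / focal, z) for z in depths]


# One pixel, four very different depths (illustrative values).
pixel = (0.25, -0.5)
candidates = points_on_ray(pixel, depths=[1.0, 2.0, 8.0, 64.0])
projections = [project(p) for p in candidates]

for point, proj in zip(candidates, projections):
    print(point, "->", proj)
```

All four candidates map back to the same image location, which is why additional knowledge about how the world is structured is needed to pick a likely one.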


9
Our World is Structured
Abstract World
Image Credit (left): F. Cunin and M.J. Sailor, UCSD
10
Early Work in 3D Scene Understanding
Guzman 1968
Ohta & Kanade 1978
  • Hanson & Riseman 1978 (VISIONS)
  • Barrow & Tenenbaum 1978 (Intrinsic Images)
  • Brooks 1979 (ACRONYM)
  • Marr 1982 (2½-D Sketch)

11
Learn the Structure of the World
12
Infer Most Likely Scene
Unlikely
Likely
13
Description of 3D Surfaces
  • Goal: Label image into 7 geometric classes
  • Support
  • Vertical
      • Planar: facing Left (←), Center (↑), Right (→)
      • Non-planar: Solid (X), Porous or wiry (O)
  • Sky

14
Use All Available Cues
Color, texture, image location
Vanishing points, lines
Texture gradient
15
Get Good Spatial Support
50x50 Patch
50x50 Patch
16
Image Segmentation
  • A single segmentation won't work
  • Solution: multiple segmentations


17
Labeling Segments


For each segment, get:
P(good segment | data)
P(label | good segment, data)
18
Image Labeling
Labeled Segmentations

Labeled Pixels
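
One plausible way to go from labeled segmentations to labeled pixels is to average, for each pixel, the label distributions of the segments that contain it, weighting each by how likely that segment is to be good. The sketch below assumes this weighting scheme; the function name and all numbers are hypothetical and are not taken from the slides.

```python
def pixel_label_confidence(segment_estimates):
    """Combine per-segment estimates for one pixel.

    segment_estimates: one entry per segmentation, for the segment that
    contains the pixel, as (p_good, {label: p_label_given_good}).
    Returns a normalized {label: confidence} dictionary.
    """
    totals = {}
    for p_good, label_probs in segment_estimates:
        for label, p_label in label_probs.items():
            # Weight each segment's label estimate by P(good segment | data).
            totals[label] = totals.get(label, 0.0) + p_good * p_label
    norm = sum(totals.values())
    if norm == 0.0:
        raise ValueError("no segment gives this pixel any support")
    return {label: value / norm for label, value in totals.items()}


# Illustrative values: the same pixel seen through three segmentations.
estimates = [
    (0.9, {"support": 0.80, "vertical": 0.15, "sky": 0.05}),
    (0.3, {"support": 0.20, "vertical": 0.70, "sky": 0.10}),
    (0.6, {"support": 0.70, "vertical": 0.20, "sky": 0.10}),
]
confidence = pixel_label_confidence(estimates)
best_label = max(confidence, key=confidence.get)
print(best_label, confidence)
```

The unreliable middle segment votes for "vertical", but it carries little weight, so the pixel is still labeled "support".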
19
Confidences from Logistic AdaBoost with Decision Trees
[Figure: example decision trees with Yes/No splits on tests such as Gray?, High in Image?, Many Long Lines?, Smooth?, Green?, Blue?, Very High Vanishing Point?]
P(label | good segment, data)
Ground, Vertical, Sky
Collins et al. 2002
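
As a rough sketch of how boosted decision trees can yield a confidence: each tree returns a real-valued score at its leaf, the scores are summed, and a logistic function maps the sum to a probability (the exact scaling depends on the boosting formulation). The trees, feature names, thresholds, and leaf values below are invented for illustration; in practice they would be learned from training images.

```python
import math


def tree_sky_1(f):
    """Hypothetical weak learner: splits on blueness, then on position or lines."""
    if f["blue"] > 0.5:
        return 1.2 if f["height_in_image"] > 0.6 else -0.3
    return -0.8 if f["num_long_lines"] > 10 else -0.2


def tree_sky_2(f):
    """Hypothetical weak learner: splits on smoothness, then on position."""
    if f["smooth"] > 0.5:
        return 0.7 if f["height_in_image"] > 0.5 else -0.5
    return -0.9


def confidence(trees, features):
    """Sum the leaf scores of all trees and squash with a logistic function."""
    score = sum(tree(features) for tree in trees)
    return 1.0 / (1.0 + math.exp(-score))


sky_trees = [tree_sky_1, tree_sky_2]
sky_like = {"blue": 0.9, "height_in_image": 0.8, "num_long_lines": 0, "smooth": 0.9}
ground_like = {"blue": 0.1, "height_in_image": 0.2, "num_long_lines": 2, "smooth": 0.3}

p_sky_top = confidence(sky_trees, sky_like)
p_sky_bottom = confidence(sky_trees, ground_like)
print(round(p_sky_top, 3), round(p_sky_bottom, 3))
```

A blue, smooth segment high in the image receives a high "sky" confidence, while a rough, non-blue segment low in the image receives a low one.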
20
Surface Confidence Maps
Input Image
Most Likely Labels
Vertical
Sky
Support
21
Surface Estimates: Outdoor
Avg. Accuracy: Main Class 88%, Subclass 62%
Input Image
Ground Truth
Our Result
22
Surface Estimates: Indoor
Avg. Accuracy: Main Class 93%, Subclass 76%
Input Image
Ground Truth
Our Result
23
Automatic Photo Popup
Labeled Image
Fit Ground-Vertical Boundary with Line Segments
Form Segments into Polylines
Cut and Fold
Final Pop-up Model
Hoiem, Efros & Hebert 2005
24
Robot Navigation
Nabbe, Hoiem, Hebert & Efros 2006
25
Robot Navigation
26
Image
Ground Truth
27
Occlusion Reasoning is Necessary
Ground Truth
3D Model
28
Recover Major Occlusions
29
Prior Work: Finding Boundaries
NCuts Segmentation
Input Image
Pb Boundaries
NCuts: Cour et al. 2004
Pb: Martin et al. 2002
30
Segmentation into Physical Boundaries
31
Prior Work: Figure/Ground Assignment
  • Line labeling approach
  • Focus on junctions

Guzman 1968
also Clowes 1971, Huffman 1971, Waltz 1975, ..., Saund 2006
32
Prior Work: Figure/Ground Assignment
Input Image
Figure/Ground Goal
Pb Boundaries
Human Boundaries
Figure/Ground Accuracy (Shapemes + CRF)
Pb Boundaries: 68.9%
Human Boundaries: 78.3%
Boundary Shape Cues
Continuity/Junction Cues
Ren et al. 2006
33
Recover Major Occlusions
Occlusion Boundaries
Inferred Depth
34
Start with Oversegmentation
Occlusion boundary?
Initial Segmentation
35
2D Cues for Occlusions
Region: Color and Texture
Boundaries: Strength and Continuity
36
2D Junctions
2
1
3
2D Boundary T-Junction
Image
37
3D Surface Clues for Occlusions
Support
Planar
Porous
Sky
Solid
2
3
1
Geometric T-Junction
Surface Labels
38
3D Depth Cues for Occlusion
Surfaces
Initial Boundaries
Depth Underestimate
Depth Overestimate
39
Illustration of Depth Range
SKY
SUPPORT
Depth (Min)
Image
Depth (Max)
40
Gradual Occlusion Inference
?
Initial Segmentation
Final Boundaries
Initial Depth (Min)
Initial Depth (Max)
41
Gradual Occlusion Inference
P(occlusion)
Soft Boundary Map
Stage 1 Result
42
Gradual Occlusion Inference
P(occlusion)
Soft Boundary Map
Stage 1 Result
43
Gradual Occlusion Inference
P(occlusion) CRF(continuity, closure)
Soft-Max Boundary Map
Stage 2 Result
44
Gradual Occlusion Inference
P(occlusion) CRF(continuity, closure, surfaces)
Stage 3 Result
Soft-Max Boundary Map
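
The staged inference can be pictured as repeatedly discarding boundaries that are unlikely to be occlusions and merging the regions they separate, so that later stages work with larger regions. The sketch below shows only that merging step, using a union-find structure; the region ids, probabilities, and threshold are hypothetical, and the classifier and CRF that would produce P(occlusion) are not modeled.

```python
def merge_regions(num_regions, boundary_prob, threshold):
    """Merge regions across boundaries whose P(occlusion) is below threshold.

    boundary_prob: {(region_a, region_b): p_occlusion} for adjacent regions.
    Returns one representative region id per input region.
    """
    parent = list(range(num_regions))

    def find(r):
        # Follow parent links to the root, compressing the path on the way.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), p_occlusion in boundary_prob.items():
        if p_occlusion < threshold:
            parent[find(a)] = find(b)
    return [find(r) for r in range(num_regions)]


# Illustrative oversegmentation: five regions in a row.
boundaries = {(0, 1): 0.05, (1, 2): 0.90, (2, 3): 0.10, (3, 4): 0.60}
labels = merge_regions(5, boundaries, threshold=0.3)
print(labels)
```

After this step the weak boundaries between regions 0 and 1 and between regions 2 and 3 are gone, leaving three regions for the next stage.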
45
Final Estimate
Depth (Min)
Boundaries, Foreground/Background, Contact
Depth (Max)
46
Evaluation
  • Training: 50 images
  • Testing: 250 images (50 quantitative)

47
Occlusion vs. Non-Occlusion
48
Foreground/Background Accuracy
Ours (accuracy, %)
          Edge/Region Cues   3D Cues   With CRF
Stage 1   58.7               71.7
Stage 2   65.4               75.6      77.3
Stage 3   68.2               77.1      79.9

Ren et al. 2006, Corel Images (Shapemes + CRF; accuracy, %)
Pb Boundaries      68.9
Human Boundaries   78.3
49
Occlusion Result
Depth (Min)
Depth (Max)
Boundaries, Foreground/Background, Contact
50
Occlusion Result
Depth (Min)
Boundaries, Foreground/Background, Contact
Depth (Max)
51
3D Model with Occlusions
3D Model without Occlusion Reasoning
3D Model with Occlusion Reasoning
52
Recovering Viewpoint and Objects
Objects
3D Surfaces
Viewpoint
53
Results of a 2D Pedestrian Detector
True Detection
False Detections
Missed
Missed
True Detections
Detector from Dalal & Triggs 2005
54
2D Contextual Reasoning
Kumar & Hebert 2005
Torralba, Murphy & Freeman 2004
  • Winn & Shotton 2006
  • Fink & Perona 2003
  • Carbonetto, Freitas & Barnard 2004
  • He, Zemel & Carreira-Perpiñán 2004

55
Reasoning within the 3D Scene
Close
Not Close
56
Camera Viewpoint
57
Object Size → Camera Viewpoint
Input Image
Loose Viewpoint Prior
58
Object Size → Camera Viewpoint
Input Image
Loose Viewpoint Prior
59
Object Size → Camera Viewpoint
Object Position/Sizes
Viewpoint
60
Object Size → Camera Viewpoint
Object Position/Sizes
Viewpoint
61
Object Size → Camera Viewpoint
Object Position/Sizes
Viewpoint
62
Object Size → Camera Viewpoint
Object Position/Sizes
Viewpoint
63
Camera Viewpoint ↔ Object Height
Input Image
2D Object Heights
3D Object Heights
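
The link between viewpoint and object height can be sketched with a common approximation. The slide does not state a formula, so the following is an assumption: the camera is roughly level, the ground is flat, the object stands on the ground, and image coordinates are normalized by image height with v measured upward from the bottom edge. Under these assumptions, world height ≈ image height × camera height / (horizon position − object bottom position). Function and parameter names are hypothetical.

```python
def object_height_3d(v_bottom, h_image, v_horizon, camera_height):
    """Approximate world height of a ground-standing object.

    v_bottom, h_image, v_horizon are in normalized image units;
    camera_height and the result share the same world unit (e.g. metres).
    """
    below_horizon = v_horizon - v_bottom
    if below_horizon <= 0:
        raise ValueError("object bottom must lie below the horizon")
    return h_image * camera_height / below_horizon


def horizon_from_object(v_bottom, h_image, object_height, camera_height):
    """Invert the same relation: a known object height constrains the horizon."""
    return v_bottom + h_image * camera_height / object_height


# Illustrative values: a detection whose top just reaches the horizon
# must be about as tall as the camera is high.
person = object_height_3d(v_bottom=0.3, h_image=0.2, v_horizon=0.5, camera_height=1.67)

# A detection assumed to be a 1.7 m person votes for a horizon position.
horizon = horizon_from_object(v_bottom=0.3, h_image=0.2, object_height=1.7, camera_height=1.67)
print(person, horizon)
```

The same relation works in both directions: object sizes constrain the viewpoint, and a viewpoint estimate turns 2D heights into 3D heights.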
64
Viewpoint from Scene Matching
LabelMe with Viewpoint Annotations
Input Image


65
What do surfaces and viewpoint say about objects?
Image
P(object)
66
What do surfaces and viewpoint say about objects?
Image
P(surfaces)
P(viewpoint)
P(object | surfaces, viewpoint)
P(object)
67
Input to Our Algorithm
Surface Estimates
Viewpoint Initial
Object Detection
Local Car Detector
Local Ped Detector
Surfaces
68
Exact Inference over Tree with Belief Propagation
[Figure: tree-structured graphical model. Viewpoint θ is the root; objects o1 ... on are its children, each with local object evidence; local surfaces s1 ... sn attach to the objects, each with local surface evidence.]
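
Because the model is a tree rooted at the viewpoint, exact inference needs only one message per object. The sketch below is a simplified version under several assumptions: the viewpoint is discretized into a few values, each object is a binary present/absent variable, and the surface evidence is folded into each object's local evidence term. All names and numbers are hypothetical.

```python
def infer(prior, obj_given_view, obj_evidence):
    """Exact marginals in a star-shaped model: viewpoint -> objects.

    prior[k]             : P(viewpoint = k)
    obj_given_view[i][k] : P(object i present | viewpoint = k)
    obj_evidence[i]      : (likelihood if absent, likelihood if present)
    Returns (viewpoint posterior, P(object i present | all evidence)).
    """
    K, n = len(prior), len(obj_evidence)
    # Message from each object to the viewpoint: sum over the object's states.
    msgs = [[(1.0 - obj_given_view[i][k]) * obj_evidence[i][0]
             + obj_given_view[i][k] * obj_evidence[i][1]
             for k in range(K)] for i in range(n)]

    # Viewpoint posterior: prior times all incoming messages, normalized.
    unnorm = list(prior)
    for m in msgs:
        unnorm = [u * mk for u, mk in zip(unnorm, m)]
    total = sum(unnorm)
    view_post = [u / total for u in unnorm]

    # Object marginals: combine the prior with messages from the other objects.
    obj_post = []
    for i in range(n):
        present = absent = 0.0
        for k in range(K):
            rest = prior[k]
            for j in range(n):
                if j != i:
                    rest *= msgs[j][k]
            present += rest * obj_given_view[i][k] * obj_evidence[i][1]
            absent += rest * (1.0 - obj_given_view[i][k]) * obj_evidence[i][0]
        obj_post.append(present / (present + absent))
    return view_post, obj_post


# Three candidate viewpoints; two detections agree with viewpoint 1,
# a third candidate only makes sense under viewpoint 2 and has no local support.
prior = [1 / 3, 1 / 3, 1 / 3]
obj_given_view = [[0.05, 0.6, 0.05], [0.05, 0.5, 0.1], [0.05, 0.05, 0.5]]
obj_evidence = [(0.2, 0.8), (0.3, 0.7), (0.5, 0.5)]
view_post, obj_post = infer(prior, obj_given_view, obj_evidence)
best_view = max(range(len(view_post)), key=view_post.__getitem__)
print(best_view, [round(p, 3) for p in obj_post])
```

The two mutually consistent detections pull the viewpoint toward the value they agree on, and the third candidate, which would need a different viewpoint, is suppressed.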
69
Improved Viewpoint Estimate
Viewpoint Initial
Viewpoint Final
Likelihood
Likelihood
Horizon
Height
Horizon
Height
70
Improved Object Estimate
Legend: Car TP / FP, Ped TP / FP
Panels: Initial (Local), Final (Global)
Car Detection: 4 TP / 1 FP and 4 TP / 2 FP
Ped Detection: 4 TP / 0 FP and 3 TP / 2 FP
71
Experiments on LabelMe Dataset
  • Testing with LabelMe dataset
  • Cars as small as 14 pixels
  • Peds as small as 36 pixels

72
More Tasks → Better Detection
Local Detector from Murphy et al. 2003
[Plots: Car Detection and Pedestrian Detection; Detection Rate vs. False Positives Per Image; curves: All Information, Objects + View, Objects + Geom, Objects Only]
Hoiem, Efros & Hebert 2006
73
Good Detectors Become Better
Local Detector from Dalal-Triggs 2005
[Plots: Car Detection and Pedestrian Detection; curves: All Information, Objects Only]
74
Better Detectors → Better Viewpoint
Median Error
  Horizon Prior               8.5%
  Using 2003 Local Detector   3.8%
  Using 2005 Local Detector   3.0%
90% Bound
75
More is Better
  • More objects → Better viewpoint estimates
  • Detect Cars Only: 7.3% Error
  • Detect Peds Only: 5.0% Error
  • Detect Both: 3.8% Error

Better viewpoint → Better object detection: 10% fewer false positives
at same detection rate
76
Results
Car TP / FP Ped TP / FP
Initial 6 TP / 1 FP
Final 9 TP / 0 FP
77
Results
Car TP / FP Ped TP / FP
Initial 3 TP / 3 FP
Final 5 TP / 1 FP
78
Putting Objects in Perspective
Ped
Ped
Car
79
Geometrically Coherent Image Interpretation
Surface Maps
Support
Viewpoint/Size Reasoning
Viewpoint and Objects
80
Geometrically Coherent Image Interpretation
Surface Maps
Depth, Boundaries
Support
Boundaries
Horizon, Object Maps
Horizon, Object Maps
Viewpoint/Size Reasoning
Viewpoint and Objects
81
Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
82
Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
83
Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
84
Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
85
Next Steps
  • More robust and comprehensive high-level
    reasoning
  • Learn perceptual similarity and general
    appearance models

86
Conclusions
  • One image contains much 3D information
  • Learn statistical models of the structure of our
    world from training images
  • Important aspects of approach:
      • Use all available cues
      • Delay decisions
      • Think of vision as one 3D scene understanding
        problem

87
Video
88
Thank you
  • Acknowledgements
  • Committee: Alyosha, Martial, Rahul, Takeo, and
    Bill
  • Practice Presentation: Srinivas, Tom, Alex

90
Vision as Scene Understanding
Ohta & Kanade 1978
  • Guzman (SEE), 1968
  • Hanson & Riseman (VISIONS), 1978
  • Barrow & Tenenbaum, 1978
  • Brooks (ACRONYM), 1979
  • Marr (2½-D Sketch), 1982
  • Ohta & Kanade, 1978

91
Vision as Scene Understanding
Guzman 1968
Ohta & Kanade 1978
92
Results
Car TP / FP Ped TP / FP
93
Failures
94
Failures: Reflections, Rare Viewpoint
Input Image
Ground Truth
Our Result
95
Results
Car TP / FP Ped TP / FP
Initial 1 TP / 23 FP
Final 0 TP / 10 FP
Local Detector from Murphy-Torralba-Freeman 2003
96
Results
Car TP / FP Ped TP / FP
Initial 1 TP / 5 FP
Final 5 TP / 2 FP
97
How do we get robust scene priors?
Hill
Standing on Step
99
How to find occluding contours?
100
Other slides
101
Overview of Our Algorithm
Input Image
Multiple Segmentations
Surface Estimates
Final Labels
Learned Models
102
Estimating surface properties
  • We want to know:
      • Is a segment good?
      • If so, what is the surface label?
  • Learn these likelihoods from training images

P(good segment | data)
P(label | good segment, data)
103
Results
Input Image
Ground Truth
Our Result
104
Results
Input Image
Ground Truth
Our Result
105
Average Accuracy
Main Class: 88.1%, Subclasses: 61.5%
106
Experiments: Input Image
107
Experiments: Ground Truth
108
Experiments: Our Result
109
Surface Estimates: Paintings
Input Image
Our Result
110
Object Pasting
Lalonde et al. 2007
111
Object Pasting
Before
After
112
Object Pasting
Before
After
113
Are Surfaces Enough?
115
3D Surface Clues for Occlusions
Support
Planar
Porous
Sky
Solid
2
1
3
2D Boundary T-Junction
Surface Labels