CV: methods of 3D sensing

Transcript and Presenter's Notes

1
CV: methods of 3D sensing
  • Structured light
  • Shape-from-shading
  • Photometric stereo
  • Depth-from-focus
  • Structure from motion.

2
Alternate projection models
  • orthographic
  • weak perspective
  • simpler mathematical models
  • approximations often very good in center of the
    FOV
  • can use as a first approximation and then switch
    to full perspective

3
Perspective vs orthographic projection
Orthographic is often used in design and
blueprints. True (scaled) dimensions can be taken
from the image
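The three projection models can be contrasted in a short sketch (the point coordinates, focal length f, and reference depth Zbar are illustrative values, not from the slides):

```python
# Comparing projection models for a 3D camera-frame point (X, Y, Z).

def perspective(X, Y, Z, f):
    # full perspective: divide by the point's own depth
    return (f * X / Z, f * Y / Z)

def orthographic(X, Y, Z):
    # orthographic: drop depth entirely (true scaled dimensions)
    return (X, Y)

def weak_perspective(X, Y, Z, f, Zbar):
    # weak perspective: orthographic projection followed by a single
    # scale f/Zbar using one average depth for the whole object
    s = f / Zbar
    return (s * X, s * Y)

# Near the center of the FOV, for objects shallow relative to their
# distance (Z close to Zbar), weak perspective closely approximates
# full perspective.
p_full = perspective(10.0, 5.0, 1000.0, 50.0)
p_weak = weak_perspective(10.0, 5.0, 1000.0, 50.0, 1000.0)
```

Here the two models agree exactly because Z equals Zbar; the approximation degrades as the object's depth range grows.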
4
Orthographic projection
5
Weak perspective is orthographic projection plus uniform scaling
6
Study of approximation
7
P3P problem: solve for the pose of an object relative to the
camera using 3 corresponding points (Pi, Qi)
3 points in 3D
3 corresponding 2D image points
8
What is the pose of an object?
  • pose means position and orientation
  • work in 3D camera frame defined by a known
    camera with known parameters
  • common problem: given the image of a known model
    of an object, compute the pose of that object in
    the camera frame
  • needed for object recognition by alignment and
    for robot manipulation

9
Recognition by alignment
  • Have CAD model of objects
  • Detect image features of objects
  • Compute object pose from 3D-2D point matches

10
P3P solution approach
11
General PnP problem
  • perspective n-point problem
  • Given n 3D points from some model
  • Given n 2D image points known to correspond to
    the 3D model points
  • Given perspective transformation with known
    camera parameters (not pose)
  • Solve for the location of all n model points in
    terms of camera coordinates, or the relative
    rotation and translation of the object model

12
Formal definition of PnP problem
Solutions exist for P3P: in most cases there are 2
solutions; in a rare case there are 4 solutions
(see the Fischler and Bolles 1981 paper). An
iterative solution, good for continuous tracking,
is given below. A simpler solution using weak
perspective has been provided by Huttenlocher and
Ullman (1988)
13
Deriving 3 quadratic equations in 3 unknowns
We know the unit vectors qi; by solving for the 3
distances ai we will know where each point
Pi = ai qi is located
We know the interpoint distances dij from the
model, and the qi are unit vectors, so each pair
(i, j) gives the quadratic
fij(ai, aj) = ai^2 + aj^2 - 2 ai aj cos(θij) - dij^2 = 0,
where cos(θij) = qi · qj
14
Iteratively solving 3 equations in 3 unknowns
We want all three equations to equal 0
15
Approximate via Taylor series
Start with guessed values (a1, a2, a3) and move
along the gradient toward (0, 0, 0)
16
Solution using Newton's method
17
Our functions have simple partial derivatives
18
Iteration can be very fast
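A minimal sketch of the Newton iteration on the three quadratic P3P equations (the synthetic points, starting guess, and iteration count are illustrative assumptions, not the original course code):

```python
import numpy as np

def p3p_newton(cos_t, d, a0, iters=20):
    # unknowns a1,a2,a3: distances from camera center to model points;
    # cos_t[k]: cosine of the angle between rays qi, qj for pair k;
    # d[k]: known model inter-point distance for pair k
    pairs = [(0, 1), (0, 2), (1, 2)]
    a = np.array(a0, float)
    for _ in range(iters):
        # f_k = ai^2 + aj^2 - 2 ai aj cos(theta_ij) - d_ij^2
        f = np.array([a[i]**2 + a[j]**2 - 2*a[i]*a[j]*cos_t[k] - d[k]**2
                      for k, (i, j) in enumerate(pairs)])
        # simple analytic partial derivatives of each f_k
        J = np.zeros((3, 3))
        for k, (i, j) in enumerate(pairs):
            J[k, i] = 2*a[i] - 2*a[j]*cos_t[k]
            J[k, j] = 2*a[j] - 2*a[i]*cos_t[k]
        a = a - np.linalg.solve(J, f)   # Newton step toward (0, 0, 0)
    return a

# synthetic data: 3 points already expressed in the camera frame
P = np.array([[1.0, 0.0, 4.0], [0.0, 1.2, 4.0], [-1.0, -0.5, 4.0]])
q = P / np.linalg.norm(P, axis=1, keepdims=True)    # unit rays
pairs = [(0, 1), (0, 2), (1, 2)]
cos_t = [q[i] @ q[j] for i, j in pairs]
d = [np.linalg.norm(P[i] - P[j]) for i, j in pairs]
a = p3p_newton(cos_t, d, a0=[4.0, 4.0, 4.0])
```

With a reasonable starting guess the iteration converges in a handful of steps; different starting guesses can land on different admissible solutions, which is how both common solutions can be found.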
19
Notes on this P3P method
  • the equations actually have 8 solutions
  • 4 are behind the camera (ai replaced by -ai)
  • 4 are possible, but rare
  • 2 are common: how do we get both solutions?
  • method used by Ohmura et al (1988) to track a
    human face at a workstation, using points outside
    the eyes and one under the nose
  • any 3 model points can align with any 3 image
    points: we could match a ship to the image of a
    face

20
Using weak perspective
  • algorithm by Huttenlocher and Ullman is in
    closed form (no iterations)
  • it produces 2 solutions
  • these solutions can be used as starting points
    for the iterative perspective method
  • additional point correspondences can be used to
    choose correct starting point

21
Shape from shading methods
  • Computing surface normals of diffuse objects from
    the intensity of surface pixels

22
Surface normals in the camera frame C (orthographic projection)
23
Information used by such algorithms
  • Typically use weak perspective projection model
  • The brightest surface element points toward the
    light source
  • The normal is determined to be perpendicular to
    the line of sight at the object limb
  • Use differential equations to propagate z from
    the boundary using the surface normals.
  • Smooth using neighbor information.

24
Results from Tsai-Shah Alg.
Left: from a computer-generated image of a vase;
right: from a bust of Mozart
25
Constraint on surface normals
There is a cone of constraint for a normal N
relative to the light source.
26
How to use the constraints?
27
Photometric stereo: calibrate by lighting a
sphere, to get lookup tables
28
Photometric stereo with 3 lights
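With 3 known, non-coplanar light directions, the per-pixel solve can be sketched as follows (a Lambertian model is assumed; the light directions, albedo, and normal here are illustrative):

```python
import numpy as np

# Lambertian model: measured intensity I_k = albedo * (N . L_k) for
# each light direction L_k. Stacking the 3 lights gives L g = I with
# g = albedo * N, a 3x3 linear solve per pixel.

def photometric_stereo(L, I):
    g = np.linalg.solve(L, I)       # solve for g = albedo * N
    albedo = np.linalg.norm(g)
    N = g / albedo                  # unit surface normal
    return N, albedo

# three non-coplanar unit light directions (example values)
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])
L = L / np.linalg.norm(L, axis=1, keepdims=True)

true_N = np.array([0.0, 0.0, 1.0])  # surface facing the camera
I = 0.8 * (L @ true_N)              # simulated intensities, albedo 0.8
N, albedo = photometric_stereo(L, I)
```

The solve recovers both the normal and the albedo; this is why the light matrix must be invertible, i.e. the three lights must not be coplanar.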
29
Photometric stereo online
30
Comments
  • Photometric stereo is a brilliant idea
  • Rajarshi Ray got it to work well even on specular
    objects, such as metal parts
  • Requires careful setup and calibration
  • Not a replacement for structured light, which has
    better precision and flexibility as evidenced by
    many applications.

31
Depth from focus
  • Humans and machine vision devices can use focus
    in a single image to estimate depth

32
Use the model of a thin lens
World point P is in focus at image point p
33
Automatic focus technique
  • Consumer camera autofocus uses many methods
  • One method requires the user to frame the object
    in a small window (a face?)
  • Focus is changed automatically until the contrast
    is best
  • Search over focal length until the small window
    has the sharpest features (most energy)

34
Depth map from focus concept
  • for each focal length fi in a range
  • set the focal plane at fi and take an image
  • for all pixels (x, y) in the image,
  • compute contrast(fi, x, y)
  • set Depth[x, y] to the fi with maximum
    contrast(fi, x, y)
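The loop above can be sketched in a few lines (the contrast measure, local variance over a small window, and the toy image stack are illustrative choices, not the slides' specification):

```python
import numpy as np

def local_contrast(img, x, y, r=1):
    # variance in a (2r+1)-wide window as a simple contrast measure
    w = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
    return w.var()

def depth_from_focus(images):
    # images[i] is the frame taken with the focal plane at f_i;
    # each pixel gets the index of the focal setting where it is
    # sharpest
    h, w = images[0].shape
    depth = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            c = [local_contrast(img, x, y) for img in images]
            depth[y, x] = int(np.argmax(c))
    return depth

# toy stack: frame 1 is "in focus" (high contrast checkerboard),
# frames 0 and 2 are defocused (flat)
flat = np.full((4, 4), 5.0)
sharp = np.indices((4, 4)).sum(0) % 2 * 10.0
depth = depth_from_focus([flat, sharp, flat])
```

Every pixel of the toy stack selects frame 1, the sharpest setting; real implementations replace the brute-force loop with filtered contrast maps.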

35
A look at blur vs focal length
  • Can define the resolution limit in line pairs per
    inch; can define the depth-of-field of sensing

36
Points P create a blurred image on non-optimal
image planes
Point P is in focus on plane S, but out of focus
on planes S' and S''
(Figure: lens and image plane positions)
37
How many line pairs can be resolved?
  • imagine a target that is just a set of parallel
    black lines on white paper
  • if lines are far apart relative to the blur
    radius b, then their image will be a set of lines
  • if the lines are close relative to blur radius
    b, then a gray image without clear lines will be
    observed

38
Thin lens equation relates object depth to image
plane via f
For a world point P in focus, the thin lens
equation is
1/f = 1/u + 1/v
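The equation can be rearranged to give the image distance directly (units here are millimeters, chosen to match the later examples):

```python
# Thin lens equation 1/f = 1/u + 1/v, solved for the image distance v
# at which a world point at depth u comes into focus.

def image_distance(f, u):
    return 1.0 / (1.0 / f - 1.0 / u)

v = image_distance(50.0, 1000.0)   # f = 50 mm, u = 1000 mm
```

For f = 50 mm and u = 1000 mm this gives v = 50000/950, about 52.6 mm, slightly beyond the focal plane as expected for a finite-distance object.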
39
Derivation of thin lens equation from geometry
40
To compute depth-of-field
  • the blur changes for different image-plane
    locations via simple geometry
  • move the image plane forward: get blur
  • move the image plane backward: get blur
  • move the image plane to the extremes within the
    limiting blur b and compute the depth of field

41
Extreme locations of v set the extremes of u
a is the aperture. By similar triangles,
b/a = (v' - v)/v', so v/v' = (a - b)/a
42
Compute near extreme of u
Apply the thin lens equation with v'
Note that if b = 0, we obtain Un = u
43
Compute far extreme of u
DEF: The depth of field is the difference between
the far and near object planes (Ur - Un) for the
given imaging parameters and blur limit b. Smaller
focal lengths f yield larger DOF.
44
Example computation
  • assume f = 50 mm, u = 1000 mm,
  • b = 0.025 mm, a = 5 mm
  • Un = 1000 (5 + 0.025) / (5 + 25/50)
  •    = 1000 (5.025)/5.5 ≈ 914
  • Ur = 1000 (5 - 0.025) / (5 - 25/50)
  •    = 1000 (4.975)/4.5 ≈ 1106
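The arithmetic above follows the formulas Un = u(a + b)/(a + ub/f) and Ur = u(a - b)/(a - ub/f), which the slide's numbers imply; a small sketch reproduces them:

```python
# Near and far in-focus extremes for focal length f, subject distance
# u, blur limit b, and aperture a (all in mm, matching the slide).

def near_far(f, u, b, a):
    un = u * (a + b) / (a + u * b / f)   # near extreme of u
    ur = u * (a - b) / (a - u * b / f)   # far extreme of u
    return un, ur

un, ur = near_far(f=50.0, u=1000.0, b=0.025, a=5.0)
# un is about 914 mm and ur about 1106 mm, as in the example
```

Setting b = 0 in both formulas collapses them to Un = Ur = u, i.e. only the exact focus distance is sharp.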

45
Example computation
  • assume f = 25 mm, u = 1000 mm,
  • b = 0.025 mm, a = 5 mm
  • Un = 1000 (5 + 0.025) / (5 + 25/25)
  •    = 1000 (5.025)/6.0 ≈ 838
  • Ur = 1000 (5 - 0.025) / (5 - 25/25)
  •    = 1000 (4.975)/4.0 ≈ 1244
  • A smaller f gives larger DOF

46
Large a needed to pinpoint u
  • changing the aperture to 10 mm:
  • Un = 955 mm
  • Ur = 1050 mm
  • changing the aperture to 20 mm:
  • Un = 977 mm
  • Ur = 1024 mm
  • (See work of Murali Subbarao)

47
Structure from Motion
  • A moving camera/computer computes the 3D
    structure of the scene and its own motion

48
Sensing 3D scene structure via a moving camera
We now have two views separated in time/space,
compared to stereo, which has multiple views at
the same time.
49
Assumptions for now
  • The scene is rigid.
  • The scene may move or the camera may move, giving
    a sequence of 2 or more 2D images
  • Corresponding 2D image points (Pi, Pj) are
    available across the images

50
What can be computed
  • The 3D coordinates of the scene points
  • The motion of the camera

Camera sees many frames of 2D points
Rigid scene with many 3D interest points
From Jebara, Azarbayejani, Pentland
51
From 2D point correspondences, compute the 3D
world points and the camera rotation and translation
52
Applications
  • We can compute a 3D model of a landmark from a
    video
  • We can create 3D television!
  • We can compute the trajectory of the sensor
    relative to the 3D object points

53
Using only 2D correspondences, SfM can compute the
3D object points
up to one scale factor.
54
http://www1.cs.columbia.edu/jebara/htmlpapers/SFM
/sfm.html Jebara, Azarbayejani, Pentland
  • Two video frames with corresponding 2D interest
    points. 3D points can be computed from SfM
    method.
  • Some edges detected from 2D gradients.
  • Texture mapping from 2D frames onto 3D polyhedral
    model.
  • 3D model can be viewed arbitrarily!

55
Virtual museums; 3D TV?
  • Much work, and software, from about 10 years ago.
  • 3D models, including shape and texture, can be
    made of famous places (Notre Dame, Taj Mahal,
    Titanic, etc.) and made available to those who
    cannot travel to see the real landmark.
  • Theoretically, only quality video is required.
  • Usually, some handwork is needed.

56
Shape from Motion methods
  • Typically require careful mathematics
  • EX: from 5 matched points, get 10 equations to
    estimate 10 unknowns; there is also a more
    popular 8-point linear method
  • Effects of noise imply many matches are needed,
    and results can still have large errors
  • Methods can run in real time
  • Rich literature, still evolving

57
Special mathematics
  • Epipolar geometry is modeled
  • Fundamental matrix: computed from a pair of
    cameras and point matches
  • Essential matrix: specialization of the
    fundamental matrix when calibration is available

58
Epipolar constraint on view pair
A) The relative orientation of cameras C1 and C2
can be computed from many point matches
B) 3D point positions (P) can also be computed
from many point matches. The fundamental matrix
represents the constraints.
59
Revisit: internal parameters of the camera (5, 6,
or 7?)
  • Properties of actual camera, not its pose
  • Actual focal length f
  • Actual pixel size Sx, Sy
  • Actual location Ix, Iy of optical axis on image
    array
  • Can have skew Sk
  • Can have radial distortion of the lens r.

(Figure: sensor array and optical axis)
60
6 Extrinsic/external parameters
  • Define the pose of the camera in the world
  • 3 rotation parameters relative to W
  • 3 translation parameters
  • Projection of world to image
  • IP = Mi Me WP
  • where Me has 6 parameters and Mi has 5
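The projection IP = Mi Me WP can be sketched numerically (simple matrix forms are assumed, with no skew or distortion; all parameter values here are illustrative):

```python
import numpy as np

# Intrinsics Mi (5 parameters: f, pixel sizes sx, sy, image center
# ix, iy) and extrinsics Me = [R | t] (6 parameters: rotation and
# translation), projecting a homogeneous world point WP.

f, sx, sy, ix, iy = 50.0, 0.01, 0.01, 320.0, 240.0  # example values
Mi = np.array([[f / sx, 0.0,    ix],
               [0.0,    f / sy, iy],
               [0.0,    0.0,    1.0]])

R = np.eye(3)                       # no rotation, for the example
t = np.array([0.0, 0.0, 0.0])       # no translation
Me = np.hstack([R, t[:, None]])     # 3x4 extrinsic matrix

WP = np.array([10.0, 5.0, 1000.0, 1.0])  # homogeneous world point
IP = Mi @ Me @ WP
u, v = IP[0] / IP[2], IP[1] / IP[2]      # divide out the scale
```

The final division by IP[2] is the perspective divide; with the identity extrinsics shown here, it is simply division by the point's depth.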

61
Fundamental matrix F
  • Represents epipolar structure of 2 views of scene
  • Depends only on the internal parameters of the
    camera and the relative pose of the two views
  • Not dependent on the scene
  • Can compute F, and E, and more from many
    correspondences; lots of literature and public
    software
  • What actual mathematical methods? What point
    detection and point correspondence methods?
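As one answer to the question above, the classic linear 8-point method can be sketched (synthetic cameras and points are assumed for illustration; production code adds coordinate normalization and robust matching):

```python
import numpy as np

def eight_point(x1, x2):
    # each match (x1, x2) gives one linear constraint x2^T F x1 = 0;
    # rows hold the coefficients of the 9 entries of F
    A = np.array([[a*u, a*v, a, b*u, b*v, b, u, v, 1.0]
                  for (u, v), (a, b) in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # least-squares null vector of A
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                      # enforce the rank-2 constraint
    return U @ np.diag(S) @ Vt

def project(P, X):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

# two synthetic camera views of 12 random rigid-scene points
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (12, 3)) + np.array([0.0, 0.0, 5.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, np.array([[-0.5], [0.0], [0.1]])])
x1, x2 = project(P1, X), project(P2, X)
F = eight_point(x1, x2)
```

With noise-free matches the recovered F satisfies the epipolar constraint essentially exactly; with real detections, many matches plus outlier rejection (e.g. RANSAC) are needed, which is exactly the noise sensitivity noted on the earlier slide.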

62
Summary of shape-from methods
  • each uses a simple source of information; the
    math model often uses minimal information
  • Psychologist J.J. Gibson, and others, were aware
    of the information used by humans
  • David Marr, around 1980, proposed the study of
    Type-I AI research:
  • study the information processing problem
  • identify what information is used
  • develop/study algorithm choices
  • favor the algorithm suited to the human
    architecture

63
Recent years
  • The trend is away from minimal models; minimal
    models are fragile
  • Multiple channels cooperate and compete (see
    experiments by Ramachandran at UCSD)
  • The human brain is more plastic than formerly
    believed; many things are learned, with new
    neurons and connections