Towards direct spatial manipulation of virtual 3D objects using vision-based tracking and gesture recognition of unmarked hands

Slides: 75

Provided by: tecgraf

Transcript and Presenter's Notes

1
Towards direct spatial manipulation of virtual
3D objects using vision-based tracking and
gesture recognition of unmarked hands

Siniša Kolarić

Master's thesis (dissertação de mestrado) defense slides
March 28, 2008
Advisor: Prof. Marcelo Gattass
Co-advisor: Prof. Alberto Barbosa Raposo
2
Motivation

Manipulating 3D objects
in an intuitive fashion
using bare hands

3
Objective
Implementing five basic 3D manipulation
operations

Selection
Deselection
Translation
Rotation
Scaling

4
Selected Past Systems
(marked, instrumented and unmarked hands)
5
Cutler et al (1997)
6
Forsberg et al (1998)
7
Segen, Kumar (2000)
8
Schkolne (2001)
9
Bettio et al (2007)
10
Our approach
11
Workplace
12
Stereo rig calibration
13
Stereo rig calibration
Jean-Yves Bouguet
14
Stereo rig calibration

Zhang's method
Recovers:
Intrinsic parameters
  Two focal lengths fx, fy
  The principal point (ox, oy)
  Distortion parameters (k1, k2, k3, k4)
Extrinsic parameters
  Rotation matrix R
  Translation vector

15
Stereo rig calibration

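The extrinsic parameters of the stereo rig (rotation matrix R and
translation vector) are then computed from the extrinsic parameters
of the two individual cameras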
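
The slide's formula is not preserved in this transcript. As a hedged sketch, one common convention assumes each camera maps a world point X to X_cam = R_cam * X + T_cam; the relative pose of the rig is then R = R_r * R_l^T and T = T_r - R * T_l, so that X_right = R * X_left + T. The helper names and the sample poses below are hypothetical, and the thesis may use a different convention.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def stereo_extrinsics(R_l, T_l, R_r, T_r):
    """Relative pose (R, T) such that X_right = R * X_left + T.

    Assumes each camera maps world points as X_cam = R_cam * X + T_cam.
    """
    R = matmul(R_r, transpose(R_l))
    RT_l = matvec(R, T_l)
    T = [T_r[i] - RT_l[i] for i in range(3)]
    return R, T

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Hypothetical per-camera extrinsics (not values from the thesis).
R_l, T_l = rot_y(0.05), [0.06, 0.0, 0.5]
R_r, T_r = rot_y(-0.05), [-0.06, 0.0, 0.5]
R, T = stereo_extrinsics(R_l, T_l, R_r, T_r)
```

A quick sanity check is to map one world point into both camera frames and confirm that R and T carry the left-camera coordinates onto the right-camera coordinates.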

16
Stereo rig calibration

Obtaining the fundamental matrix F
Needed later on for triangulation based on stereo
vision
The epipolar constraint $x'^{T} F x = 0$ holds for any pair of
corresponding points $x$, $x'$
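
The property that F satisfies for corresponding points is the epipolar constraint x'^T F x = 0, with x and x' in homogeneous pixel coordinates. A minimal sketch of checking it follows; the matrix used is that of an idealized rectified rig (an assumption made for illustration, not the rig from the thesis), for which the constraint reduces to both points lying on the same image row.

```python
def epipolar_residual(F, x_left, x_right):
    """Return x_right^T * F * x_left for pixel points given as (u, v)."""
    xl = (x_left[0], x_left[1], 1.0)
    xr = (x_right[0], x_right[1], 1.0)
    Fx = [sum(F[i][k] * xl[k] for k in range(3)) for i in range(3)]
    return sum(xr[i] * Fx[i] for i in range(3))

# Fundamental matrix of an idealized rectified stereo pair (illustrative).
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

# Same row in both views: the residual vanishes.
on_same_row = epipolar_residual(F_RECTIFIED, (320.0, 200.0), (290.0, 200.0))
# Rows differ by 4 pixels: the residual is nonzero.
off_row = epipolar_residual(F_RECTIFIED, (320.0, 200.0), (290.0, 204.0))
```

With measured points the residual is rarely exactly zero, so in practice it is compared against a small tolerance.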

17
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Left camera view
Right camera view
Views differ slightly (stereo disparity)
18
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Left camera view
Right camera view
Find corresponding points (the points where the
squares meet)
There are 7 x 6 = 42 corresponding points for an
8x7 checkerboard
19
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Hartley, Zisserman MVG
20
Stereo rig calibration
Finding fundamental matrix F of the stereo
rig (normalized 8-point algorithm for F)
Hartley, Zisserman MVG
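
What makes the 8-point algorithm "normalized" is a preprocessing step applied to each image's points: translate them so their centroid is at the origin, then scale them so their mean distance from the origin is sqrt(2). The sketch below covers only that step (the linear solve and the rank-2 enforcement are omitted); the function name and the sample points are hypothetical.

```python
import math

def normalize_points(points):
    """Hartley normalization of 2D points.

    Returns (normalized_points, T), where T is the 3x3 similarity
    transform that maps the original homogeneous points onto the
    normalized ones. Assumes the points are not all identical.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    mean_dist = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n
    s = math.sqrt(2.0) / mean_dist
    T = [[s, 0.0, -s * cx],
         [0.0, s, -s * cy],
         [0.0, 0.0, 1.0]]
    normalized = [(s * (p[0] - cx), s * (p[1] - cy)) for p in points]
    return normalized, T

# Hypothetical pixel coordinates of four checkerboard corners.
points = [(100.0, 100.0), (300.0, 100.0), (300.0, 400.0), (100.0, 400.0)]
normalized, T = normalize_points(points)
```

If T and T' are the transforms for the left and right images and F_hat is estimated from the normalized points, the matrix for the original pixel coordinates is F = T'^T * F_hat * T.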
21
State switching using gestures
Each gesture is a switch triggering an event
Left hand
Right hand
22
State switching using gestures
Examples of manipulation operations
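
The idea that each gesture acts as a switch can be sketched as a small state machine. The posture names OPEN, POINTING and FIST appear later in the slides; the states and the transition table below are purely hypothetical, since the transcript does not record which posture triggers which operation.

```python
# Hypothetical transition table: (current state, detected posture) -> next state.
TRANSITIONS = {
    ("IDLE", "POINTING"): "SELECTED",
    ("SELECTED", "FIST"): "TRANSLATING",
    ("TRANSLATING", "OPEN"): "SELECTED",
    ("SELECTED", "OPEN"): "IDLE",
}

def next_state(state, posture):
    """Return the new state; unknown (state, posture) pairs change nothing."""
    return TRANSITIONS.get((state, posture), state)

def run(postures, state="IDLE"):
    """Feed a per-frame posture sequence through the state machine."""
    history = [state]
    previous = None
    for posture in postures:
        # Edge-triggered: only a change of posture fires an event.
        if posture != previous:
            state = next_state(state, posture)
        previous = posture
        history.append(state)
    return history

history = run(["POINTING", "FIST", "FIST", "OPEN", "OPEN"])
```

Making the switch edge-triggered keeps a posture held over many frames from firing the same event repeatedly.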
23
Viola-Jones detection method
(applied to hand detection and gesture recognition)
Hit
Hit and false hit
24
Viola-Jones detection method
(applied to hand detection and gesture recognition)
Hit and multiple false hits
Miss
25
Viola-Jones detection method

Pros

invariance with regard to background
insensitivity to changes in illumination/lighting
invariance with regard to camera
invariance with regard to scale
fast execution (15x faster than previous best
methods)
works with grayscale images only; color is not needed

Cons

very long training times (up to several days for
one object (or hand posture) on a 30-node cluster)

26
Viola-Jones detection method
Originally developed for face detection
However, works for any type of object
27
Viola-Jones detection method
Extended Viola-Jones method by Lienhart, Maydt
28
Viola-Jones detection method
Extended Viola-Jones method by Lienhart, Maydt
29
Viola-Jones detection method
Strong classifier obtained by AdaBoost
A linear combination of weak classifiers
h_t(x) (each weak classifier is based on one rectangular feature)
Extended Viola-Jones method by Lienhart, Maydt
30
Viola-Jones detection method
Example: a strong classifier consisting of two
weak classifiers
Extended Viola-Jones method by Lienhart, Maydt
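
A strong classifier built from two weak classifiers can be sketched as follows, using the decision rule of the original Viola-Jones formulation: the output is 1 when the weighted vote of the weak classifiers reaches half of the total weight. The feature values are assumed to be precomputed rectangular-feature responses, and the thresholds, polarities and weights are made-up numbers, not trained values.

```python
def weak_classifier(feature_value, threshold, polarity):
    """h_t(x): 1 if polarity * feature_value < polarity * threshold, else 0."""
    return 1 if polarity * feature_value < polarity * threshold else 0

def strong_classifier(feature_values, weak_params, alphas):
    """Weighted vote: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t)."""
    votes = sum(
        alpha * weak_classifier(value, threshold, polarity)
        for value, (threshold, polarity), alpha
        in zip(feature_values, weak_params, alphas)
    )
    return 1 if votes >= 0.5 * sum(alphas) else 0

# Hypothetical parameters: (threshold, polarity) per weak classifier.
weak_params = [(10.0, 1), (-5.0, -1)]
alphas = [0.8, 0.4]

both_fire = strong_classifier([4.0, 2.0], weak_params, alphas)
only_second_fires = strong_classifier([12.0, 2.0], weak_params, alphas)
only_first_fires = strong_classifier([4.0, -9.0], weak_params, alphas)
```

Because the first weak classifier carries the larger weight, it alone is enough to reach the vote threshold, while the second one alone is not.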
31
Viola-Jones detection method

Strong classifiers can be arbitrarily accurate,
but tend to become slow as more weak classifiers
are added during the learning process
Way out: cascades of strong classifiers
Basically, several strong classifiers linked into
a chain

Extended Viola-Jones method by Lienhart, Maydt
32
Viola-Jones detection method
A cascade of strong classifiers
Extended Viola-Jones method by Lienhart, Maydt
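
The speed benefit of a cascade comes from early rejection: a window is passed to the next stage only if the current stage accepts it, so most background windows are discarded after one or two cheap stages. In this sketch each stage is a placeholder predicate standing in for a trained strong classifier, and the feature names are hypothetical.

```python
def evaluate_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one image window.

    The window is rejected by the first stage that returns False.
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index
    return True, len(stages)

# Placeholder stages; real stages would be trained strong classifiers.
stages = [
    lambda w: w["contrast"] > 0.2,
    lambda w: w["edge_score"] > 0.5,
    lambda w: w["shape_score"] > 0.7,
]

background = {"contrast": 0.1, "edge_score": 0.9, "shape_score": 0.9}
hand = {"contrast": 0.6, "edge_score": 0.8, "shape_score": 0.9}

background_result = evaluate_cascade(background, stages)
hand_result = evaluate_cascade(hand, stages)
```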
33
2D hand tracking using Flocks of KLT features
34
2D hand tracking using Flocks of KLT features
35
2D hand tracking using Flocks of KLT features

Hand's mean position = average of the KLT features'
positions

36
2D hand tracking using Flocks of KLT features

Two conditions are enforced at each frame:
No two KLT features can be closer to each other
than some threshold distance
No KLT feature can be farther from the feature
median than a second threshold distance

2005 Kölsch, Turk - Hand tracking with Flocks of
Features
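
The two flock conditions, together with the mean-position estimate of the hand, can be sketched as below. This is an illustration under assumptions: the median is taken per coordinate, the later feature of a too-close pair is the one flagged, and the handling of flagged features (for example relocating them) is left out. Function names and sample values are hypothetical.

```python
import math
from statistics import median

def flock_mean(features):
    """Hand position estimate: average of the feature positions."""
    n = len(features)
    return (sum(x for x, _ in features) / n, sum(y for _, y in features) / n)

def flock_violations(features, min_dist, max_dist):
    """Indices of features that violate either flock condition."""
    mx = median(x for x, _ in features)
    my = median(y for _, y in features)
    violators = set()
    # Condition 2: no feature farther than max_dist from the median.
    for i, (x, y) in enumerate(features):
        if math.hypot(x - mx, y - my) > max_dist:
            violators.add(i)
    # Condition 1: no two features closer to each other than min_dist.
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if math.dist(features[i], features[j]) < min_dist:
                violators.add(j)
    return sorted(violators)

# Hypothetical feature positions in pixels.
features = [(100.0, 100.0), (104.0, 100.0), (100.0, 105.0),
            (101.0, 100.5), (160.0, 100.0)]
violating = flock_violations(features, min_dist=2.0, max_dist=30.0)
mean_position = flock_mean(features)
```

Here the feature at (160, 100) has drifted away from the flock and the feature at (101, 100.5) sits too close to a neighbor, so both are flagged.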
37
3D reconstruction of hands' position using
triangulation

A 3D point in the scene is projected onto both
the left and right image planes

3D point
Hartley, Zisserman MVG
38
3D reconstruction of hands' position using
triangulation

Ideal case: rays back-projected from measured
pixel points do meet in space

Hartley, Zisserman MVG
39
3D reconstruction of hands' position using
triangulation

Real life: rays back-projected from imperfectly
measured pixel points do not meet in space

Hartley, Zisserman MVG
40
3D reconstruction of hands' position using
triangulation

Solution: the mid-point method. The intersection is
estimated as the point of minimum distance from
both rays
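
A minimal sketch of the mid-point method: each back-projected ray is given by an origin (the camera center) and a direction, the closest point on each ray to the other ray is found in closed form, and the estimate is the midpoint of the segment joining them. The camera centers and directions below are made-up values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def midpoint_triangulation(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two 3D rays.

    Ray i is p_i + s_i * d_i. Returns (midpoint, gap), where gap is the
    distance between the two closest points. Raises ValueError for
    parallel rays.
    """
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * v for p, v in zip(p1, d1)]  # closest point on ray 1
    q2 = [p + t * v for p, v in zip(p2, d2)]  # closest point on ray 2
    mid = [(u + v) / 2.0 for u, v in zip(q1, q2)]
    return mid, math.dist(q1, q2)

# Hypothetical camera centers 10 cm apart, looking at a point 1 m away.
c_left, c_right = [-0.05, 0.0, 0.0], [0.05, 0.0, 0.0]
ideal_mid, ideal_gap = midpoint_triangulation(
    c_left, [0.05, 0.0, 1.0], c_right, [-0.05, 0.0, 1.0])
# A slightly perturbed right ray no longer meets the left one.
noisy_mid, noisy_gap = midpoint_triangulation(
    c_left, [0.05, 0.0, 1.0], c_right, [-0.05, 0.01, 1.0])
```

In the ideal case the gap is zero and the midpoint coincides with the true intersection; with the perturbed ray the gap is positive and the midpoint lies between the two rays.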

41
3D reconstruction of hands' position using
triangulation

Better: minimize the geometric error by finding
points $\hat{x}$, $\hat{x}'$ so that $\hat{x}'^{T} F \hat{x} = 0$

Hartley, Zisserman MVG
42
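3D reconstruction of hands' position using
triangulation

That is, minimize the cost function
$C(x, x') = d(x, \hat{x})^2 + d(x', \hat{x}')^2$ subject to $\hat{x}'^{T} F \hat{x} = 0$
Having $\hat{x}$, $\hat{x}'$, use any triangulation method
(e.g. mid-point) to find the originating 3D point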

Hartley, Zisserman MVG
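
For a general fundamental matrix, the optimal method in Hartley and Zisserman reduces this minimization to finding the roots of a degree-6 polynomial. The sketch below covers only an easy special case, chosen here for illustration: an ideal rectified rig, where the epipolar constraint says that corrected points must share the same image row. Minimizing the sum of squared distances then leaves the columns untouched and moves both rows to their average.

```python
def correct_rectified_pair(left, right):
    """Optimal correction for a rectified stereo pair.

    Returns (left_hat, right_hat, cost), where the corrected points lie
    on the same row and cost = d(x, x_hat)^2 + d(x', x_hat')^2.
    """
    (x, y), (xp, yp) = left, right
    y_hat = 0.5 * (y + yp)  # common row that minimizes the squared error
    left_hat, right_hat = (x, y_hat), (xp, y_hat)
    cost = (y - y_hat) ** 2 + (yp - y_hat) ** 2
    return left_hat, right_hat, cost

# Hypothetical measured points whose rows disagree by 4 pixels.
left_hat, right_hat, cost = correct_rectified_pair((320.0, 200.0), (290.0, 204.0))
```

The corrected pair satisfies the constraint exactly, so the two back-projected rays meet and any triangulation method returns the same 3D point.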
43
3D reconstruction of hands' position using
triangulation
Hartley, Zisserman MVG
44
3D reconstruction of hands' position using
triangulation
Hartley, Zisserman MVG
45
Basic ingredients summary

A well-defined workplace setup with a calibrated
stereo rig
Viola-Jones method for hand detection and
recognition
Flocks of KLT features for 2D hand tracking (in
both cameras' views)
Triangulation (based on stereo vision) for
recovery of the third hand coordinate (depth)
using two tracked 2D positions

46
Tests and Results
47
Tracing lines
48
Detector performance (hand posture OPEN)
49
Detector performance (hand posture POINTING)
50
Detector performance (hand posture FIST)
51
Video

show video

52
Contributions
53
1) 3D TRACKING OF UP TO TWO UNMARKED HANDS

Key ingredients:
2D flock-of-KLT-features hand tracking
Triangulation based on stereo vision for
extracting the hands' third dimension (depth)

54
2) A NOVEL SPATIAL-INPUT DEVICE

Key ingredients:
The aforementioned 3D unmarked hand tracking
Use of the Viola-Jones detection method for state
switching
Two hands give a 2 x 3 = 6 d.o.f. spatial input
device

55
3) FREE-HAND SPATIAL MANIPULATION

In conjunction with the aforementioned spatial
input device, the prototype developed enables the
user to manipulate 3D virtual objects using
free-hand motion
In other words, there is no need to instrument
the user's hands in any way in order to perform
3D manipulation operations

56
Limitations
57
Flocks-of-features sometimes drift to surrounding
objects

Flocks of tracked features sometimes drift to
other objects
Can especially happen on cluttered desks

58
(Still) overly high false hit rates

The false hit rates we achieved with the hand
detectors are still too high
Longer training sessions and more powerful
computing resources are needed
Various heuristics come to the rescue (e.g. the
average posture in the last 1000 milliseconds)
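
The averaging heuristic can be sketched as a sliding-window vote. This sketch interprets "average posture" as the most frequent posture among the detections of the last 1000 milliseconds, which is an assumption; the class name and the sample detections are hypothetical.

```python
from collections import Counter, deque

class PostureSmoother:
    """Report the most frequent posture seen within a time window."""

    def __init__(self, window_ms=1000):
        self.window_ms = window_ms
        self.samples = deque()  # (timestamp_ms, posture) pairs, oldest first

    def update(self, timestamp_ms, posture):
        """Add one detection and return the smoothed posture."""
        self.samples.append((timestamp_ms, posture))
        # Drop detections older than the window.
        while timestamp_ms - self.samples[0][0] > self.window_ms:
            self.samples.popleft()
        counts = Counter(p for _, p in self.samples)
        return counts.most_common(1)[0][0]

smoother = PostureSmoother(window_ms=1000)
detections = [(0, "FIST"), (100, "FIST"), (200, "OPEN"), (300, "FIST")]
outputs = [smoother.update(t, p) for t, p in detections]
later = smoother.update(1500, "OPEN")
```

The single OPEN detection at 200 ms is outvoted and does not change the reported posture, while an OPEN seen after the older detections have expired does. The price is a delay of up to one window length before a genuine posture change is reported.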

59
Future work
60
Richer set of manipulations and deformations
Going beyond the basic set of manipulations: we
want deformations too
61
Advanced (volumetric) topological data structures
Needed to support advanced deformation operations
62
Increasing robustness of detection

The goal isn't to add more gestures; the goal is
to increase the robustness of detection for the
existing 2-3 gestures
Too many gestures lead to the user's cognitive
overload (at least in the beginning)
2-3 gestures suffice to implement A LOT of
functionality

63
Improving the sense of "where"

Improving the sense of position and orientation in
the fishtank VR by adding spatial cues
Shadows cast by 3D objects
2D projections of 3D objects onto the XY, YZ, and ZX planes

64
Wide-angle/fisheye cameras
Increasing the workspace
65
Cameras towards the user
Workspace (use hands here)
Desktop computer users
66
Cameras towards the user
Notebook users
67
Cameras towards the user
camera built into the cellphone
use hands here
Mobile platform users: 3D modeling on cell phones
68
Model-based (3D) hand tracking
3D hand tracking for more expressive manipulation
M. Bray, E. Koller-Meier, L. Van Gool (2007)
69
Dynamic gestures
Dynamic gestures for more expressive manipulation
Hand posture changes in space AND time
70
Human factors
Comfort zones for hand actions, while standing
Kölsch 2004
71
Human factors
osha.gov
72
Human factors
Achieving comfort: putting elbows on chair
supports
73
Natural fit: head-mounted displays
(the user can reach into the display volume in front
of her/him)
Eyewear (personal displays) by Lumus Inc.
(www.lumus-optical.com/)
74
Integrating other ways to recover the hands' depth
ZCam: such a camera would eliminate the
calibration and triangulation steps
3DV Systems' ZCam depth-sensing camera
75
Thank you
