Provided by: tecgraf
1
Towards direct spatial manipulation of virtual
3D objects using vision-based tracking and
gesture recognition of unmarked hands
  • Siniša Kolarić

Master's thesis (dissertação de mestrado)
Defense slides, March 28, 2008
Advisor: Prof. Marcelo Gattass
Co-advisor: Prof. Alberto Barbosa Raposo
2
Motivation
  • Manipulating 3D objects
  • in an intuitive fashion
  • using bare hands

3
Objective
Implementing five basic 3D manipulation
operations
  • Selection
  • Deselection
  • Translation
  • Rotation
  • Scaling

4
Selected Past Systems
(marked, instrumented and unmarked hands)
5
Cutler et al. (1997)
6
Forsberg et al. (1998)
7
Segen, Kumar (2000)
8
Schkolne (2001)
9
Bettio et al. (2007)
10
Our approach
11
Workplace
12
Stereo rig calibration
13
Stereo rig calibration
Jean-Yves Bouguet
14
Stereo rig calibration
  • Zhang's method, which recovers
  • Intrinsic parameters: two focal lengths fx, fy;
    the principal point (ox, oy); distortion
    parameters (k1, k2, k3, k4)
  • Extrinsic parameters: rotation matrix R and a
    translation vector

15
Stereo rig calibration
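  • The extrinsic parameters of the stereo rig (the
    rotation R and translation between the two
    cameras) are then derived from the per-camera
    extrinsic parameters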
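The equation for the rig's extrinsics did not survive extraction. The following is a worked sketch only. It assumes that each camera's extrinsics map a world point into that camera's frame as $X_c = R_i X_w + T_i$ (the convention of Bouguet's toolbox). It uses $R_l, T_l$ and $R_r, T_r$ for the left and right cameras; this is our notation, not the slides'. The pose of the right camera relative to the left then follows by eliminating $X_w$:

```latex
\begin{aligned}
X_l &= R_l X_w + T_l \;\Rightarrow\; X_w = R_l^{\top}(X_l - T_l) \\
X_r &= R_r X_w + T_r = R_r R_l^{\top} X_l + \left(T_r - R_r R_l^{\top} T_l\right) \\
\Rightarrow\; R &= R_r R_l^{\top}, \qquad T = T_r - R\,T_l
\end{aligned}
```

With this notation, $X_r = R X_l + T$ holds for any point seen by both cameras.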

16
Stereo rig calibration
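  • Obtaining the fundamental matrix F
  • Needed later on for triangulation based on stereo
    vision
  • The epipolar constraint $\mathbf{x}'^{\top} F \mathbf{x} = 0$
    holds for any pair of corresponding points
    $\mathbf{x} \leftrightarrow \mathbf{x}'$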

17
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Left camera view
Right camera view
Views differ slightly (stereo disparity)
18
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Left camera view
Right camera view
Find corresponding points (the points where the
checkerboard squares meet)
There are 7 × 6 = 42 corresponding points for an
8x7 checkerboard
19
Stereo rig calibration
Finding fundamental matrix F of the stereo rig
Hartley, Zisserman MVG
20
Stereo rig calibration
Finding fundamental matrix F of the stereo
rig (normalized 8-point algorithm for F)
Hartley, Zisserman MVG
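Below is a minimal NumPy sketch of the normalized 8-point algorithm as presented by Hartley and Zisserman. It assumes the corresponding corners are given as N × 2 pixel arrays with N ≥ 8; the 42 checkerboard corners qualify. The function names `normalize` and `eight_point` are our own. The sketch uses the convention $\mathbf{x}_r^{\top} F \mathbf{x}_l = 0$ and deliberately omits robust outlier handling such as RANSAC.

```python
import numpy as np


def normalize(pts):
    """Similarity transform moving the centroid to the origin and scaling
    the mean distance from it to sqrt(2); returns homogeneous points and T."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T


def eight_point(x_left, x_right):
    """Estimate F with x_right^T F x_left = 0 from N >= 8 pixel correspondences."""
    xl, Tl = normalize(x_left)
    xr, Tr = normalize(x_right)
    # One row of A per correspondence, so that A @ F.ravel() = 0 (row-major F).
    A = np.column_stack([
        xr[:, 0] * xl[:, 0], xr[:, 0] * xl[:, 1], xr[:, 0],
        xr[:, 1] * xl[:, 0], xr[:, 1] * xl[:, 1], xr[:, 1],
        xl[:, 0], xl[:, 1], np.ones(len(xl)),
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization: F = Tr^T F_hat Tl.
    F = Tr.T @ F @ Tl
    return F / np.linalg.norm(F)
```

Normalizing first is the point of the method. In raw pixel coordinates the nine columns of A differ by several orders of magnitude, which makes the SVD solution poorly conditioned.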
21
State switching using gestures
Each gesture is a switch triggering an event
Left hand
Right hand
22
State switching using gestures
Examples of manipulation operations
23
Viola-Jones detection method
(applied to hand detection and gesture recognition)
Hit
Hit and false hit
24
Viola-Jones detection method
(applied to hand detection and gesture recognition)
Hit and multiple false hits
Miss
25
Viola-Jones detection method
Pros
  • invariance with regard to background
  • insensitivity to changes in illumination/lighting
  • invariance with regard to camera
  • invariance with regard to scale
  • fast execution (15x faster than previous best
    methods)
  • works with gray images only; color is not needed
Cons
  • very long training times (up to several days for
    one object (or hand posture) on a 30-node cluster)

26
Viola-Jones detection method
Originally developed for face detection
However, it can be trained to detect other object classes, such as hand postures
27
Viola-Jones detection method
Extended Viola-Jones method by Lienhart, Maydt
28
Viola-Jones detection method
Extended Viola-Jones method by Lienhart, Maydt
29
Viola-Jones detection method
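Strong classifier obtained by AdaBoost
A linear combination of weak classifiers:
$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
where each weak classifier $h_t(x)$, weighted by $\alpha_t$, is based on a single rectangular feature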
Extended Viola-Jones method by Lienhart, Maydt
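The following small NumPy sketch makes the AdaBoost formula concrete. It contains the basic ingredients: an integral image, so that any rectangle sum costs four lookups; one upright two-rectangle feature; a decision-stump weak classifier $h_t(x) \in \{-1, +1\}$; and the weighted vote $H(x)$. All function names are hypothetical. Lienhart and Maydt's extension also adds 45°-rotated features, which this sketch does not cover. In a real detector, the weights and thresholds come from AdaBoost training rather than being set by hand.

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], padded with a leading zero row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def weak_classifier(value, threshold, polarity):
    """Decision stump on one feature value; returns +1 or -1."""
    return 1 if polarity * value < polarity * threshold else -1


def strong_classifier(ii, stumps):
    """H(x) = sign(sum_t alpha_t * h_t(x)).

    stumps: list of (alpha, feature_fn, threshold, polarity), where
    feature_fn(ii) computes one rectangular feature on the window.
    """
    score = sum(alpha * weak_classifier(feat(ii), thr, pol)
                for alpha, feat, thr, pol in stumps)
    return 1 if score >= 0 else -1
```

With two stumps on the same feature, the output is simply the vote of the stump with the larger $\alpha_t$. This mirrors the two-weak-classifier example on the next slide.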
30
Viola-Jones detection method
Example: a strong classifier consisting of two
weak classifiers
Extended Viola-Jones method by Lienhart, Maydt
31
Viola-Jones detection method
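  • Strong classifiers can be made arbitrarily
    accurate, but tend to become slow as more weak
    classifiers are added during the learning process
  • Way out: cascades of strong classifiers
  • Basically, several strong classifiers linked into
    a chain; a window must pass every stage to be
    accepted, so most negative windows are rejected
    early by the cheaper first stages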

Extended Viola-Jones method by Lienhart, Maydt
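The cascade's early-rejection logic fits in a few lines of Python. Here each stage is assumed, hypothetically, to be a pair: a scoring function (for example, a strong classifier's weighted sum) and a stage threshold tuned during training.

```python
from typing import Any, Callable, Sequence, Tuple

# (stage score function, stage threshold); a hypothetical representation.
Stage = Tuple[Callable[[Any], float], float]


def cascade_accepts(window: Any, stages: Sequence[Stage]) -> bool:
    """Accept a window only if every stage's score reaches its threshold."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            # Early rejection: later, more expensive stages are never evaluated.
            return False
    return True
```

Because almost all windows in an image contain no hand, most of them exit at the first stage or two. The cascade's speed comes from this early exit.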
32
Viola-Jones detection method
A cascade of strong classifiers
Extended Viola-Jones method by Lienhart, Maydt
33
2D hand tracking using Flocks of KLT features
34
2D hand tracking using Flocks of KLT features
35
2D hand tracking using Flocks of KLT features
  • The hand's mean position is the average of the
    KLT feature positions

36
2D hand tracking using Flocks of KLT features
  • Two conditions are enforced at each frame
  • No two KLT features can be closer to each other
    than some threshold distance
  • No KLT feature can be farther from the feature
    median than a second threshold distance

Kölsch, Turk (2005), Hand tracking with Flocks of
Features
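Here is a per-frame sketch of the two flock constraints. It assumes NumPy and pixel coordinates, and `enforce_flock` is a hypothetical name. Kölsch and Turk replace features that violate a constraint with new ones; this sketch only flags them. As a simplifying assumption, it then averages the remaining features to estimate the hand's mean position.

```python
import numpy as np


def enforce_flock(points, min_dist, max_dist):
    """Apply the two flock conditions to one frame of KLT features.

    points: (N, 2) feature positions in pixels.
    Returns (hand_position, keep_mask). A feature is flagged (False) if it is
    farther than max_dist from the flock median, or closer than min_dist to
    an earlier feature that was kept.
    """
    points = np.asarray(points, dtype=float)
    median = np.median(points, axis=0)
    keep = np.linalg.norm(points - median, axis=1) <= max_dist
    for i in range(len(points)):
        if not keep[i]:
            continue
        for j in range(i):
            if keep[j] and np.linalg.norm(points[i] - points[j]) < min_dist:
                keep[i] = False
                break
    # Mean of the surviving features; fall back to the median if none survive.
    hand_position = points[keep].mean(axis=0) if keep.any() else median
    return hand_position, keep
```

The median is used for the distance test because, unlike the mean, it is not pulled away by a few features that have already drifted onto the background.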
37
3D reconstruction of hand position using
triangulation
  • A 3D point in the scene is projected onto both
    the left and right image planes

3D point
Hartley, Zisserman MVG
38
3D reconstruction of hand position using
triangulation
  • Ideal case: rays back-projected from the measured
    pixel points meet in space

Hartley, Zisserman MVG
39
3D reconstruction of hand position using
triangulation
  • Real life: rays back-projected from imperfectly
    measured pixel points generally do not meet in space

Hartley, Zisserman MVG
40
3D reconstruction of hand position using
triangulation
  • Solution: the mid-point method, which estimates
    the intersection as the midpoint of the shortest
    segment between the two rays (the point minimizing
    the sum of squared distances to both rays)
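Below is a NumPy sketch of the mid-point method. It assumes calibrated cameras with the extrinsic convention $X_c = R X_w + T$ (so the camera center is $-R^{\top}T$), an intrinsic matrix K, and pixels that are already undistorted. The names `backproject` and `midpoint_triangulate` are hypothetical. Besides the 3D estimate, the function returns the length of the shortest segment between the rays. This is zero in the ideal case and grows with measurement error, so it serves as a sanity check.

```python
import numpy as np


def backproject(K, R, T, pixel):
    """World-space ray (origin, unit direction) through an undistorted pixel."""
    center = -R.T @ T
    direction = R.T @ np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    return center, direction / np.linalg.norm(direction)


def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    Returns (3D point, length of that shortest segment).
    """
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b  # zero only for parallel rays
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p, q = c1 + s * d1, c2 + t * d2  # closest points on each ray
    return (p + q) / 2, np.linalg.norm(p - q)
```

For a hand tracked in both views, `pixel` would be the flock's mean position in each camera image.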

41
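3D reconstruction of hand position using
triangulation
  • Better: minimize the geometric error by finding
    points $\hat{\mathbf{x}}, \hat{\mathbf{x}}'$ close to the measured
    $\mathbf{x}, \mathbf{x}'$, so that $\hat{\mathbf{x}}'^{\top} F \hat{\mathbf{x}} = 0$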

Hartley, Zisserman MVG
42
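3D reconstruction of hand position using
triangulation
  • That is, minimize the cost function
    $C = d(\mathbf{x}, \hat{\mathbf{x}})^2 + d(\mathbf{x}', \hat{\mathbf{x}}')^2$
    subject to $\hat{\mathbf{x}}'^{\top} F \hat{\mathbf{x}} = 0$
  • Having $\hat{\mathbf{x}}, \hat{\mathbf{x}}'$, use any triangulation method
    (e.g. mid-point) to find the originating 3D point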

Hartley, Zisserman MVG
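Hartley and Zisserman's optimal method finds $\hat{\mathbf{x}}, \hat{\mathbf{x}}'$ exactly by solving a degree-6 polynomial. Shown here is a simpler, swapped-in alternative: their first-order (Sampson) correction. It moves the stacked coordinates $(x, y, x', y')$ along the gradient of the epipolar residual, so in general it satisfies the constraint only approximately. The function name is hypothetical. Points are assumed to be in pixels, with the convention $\mathbf{x}'^{\top} F \mathbf{x} = 0$.

```python
import numpy as np


def sampson_correct(F, x, xp):
    """First-order correction of a correspondence x <-> xp (pixel coords).

    Returns (x_hat, xp_hat), which approximately satisfy
    [xp_hat, 1] @ F @ [x_hat, 1] = 0.
    """
    xh = np.array([x[0], x[1], 1.0])
    xph = np.array([xp[0], xp[1], 1.0])
    Fx = F @ xh
    Ftxp = F.T @ xph
    err = xph @ Fx  # algebraic epipolar residual x'^T F x
    # Gradient of err with respect to (x, y, x', y').
    J = np.array([Ftxp[0], Ftxp[1], Fx[0], Fx[1]])
    delta = -err / (J @ J) * J
    return xh[:2] + delta[:2], xph[:2] + delta[2:]
```

For an ideally rectified pair, F reduces to $[\mathbf{e}_1]_\times$. In that case the correction simply splits the vertical disparity between the two points, and the result is exact.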
43
3D reconstruction of hand position using
triangulation
Hartley, Zisserman MVG
44
3D reconstruction of hand position using
triangulation
Hartley, Zisserman MVG
45
Basic ingredients summary
  • A well-defined workplace setup with a calibrated
    stereo rig
  • Viola-Jones method for hand detection and
    recognition
  • Flocks of KLT features for 2D hand tracking (in
    both cameras' views)
  • Triangulation (based on stereo vision) for
    recovery of the third hand coordinate (depth)
    using two tracked 2D positions

46
Tests and Results
47
Tracing lines
48
Detector performance (hand posture OPEN)
49
Detector performance (hand posture POINTING)
50
Detector performance (hand posture FIST)
51
Video
  • show video

52
Contributions
53
1) 3D TRACKING OF UP TO TWO UNMARKED HANDS
  • Key ingredients
  • 2D flock-of-KLT-features hand tracking
  • Triangulation based on stereo vision for
    extracting the hands' third dimension (depth)

54
2) A NOVEL SPATIAL-INPUT DEVICE
  • Key ingredients
  • The aforementioned 3D unmarked hand tracking
  • Use of the Viola-Jones detection method for state
    switching
  • Two hands give a 2 × 3 = 6 d.o.f. spatial input
    device

55
3) FREE-HAND SPATIAL MANIPULATION
  • In conjunction with the aforementioned spatial
    input device, the prototype developed enables the
    user to
  • Manipulate 3D virtual objects using free-hand
    motion
  • In other words, there is no need to instrument
    the user's hands in any way in order to perform
    3D manipulation operations
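The slides do not say how the two tracked hand positions drive translation, rotation and scaling. What follows is therefore only one plausible, hypothetical mapping, and all names are ours. The midpoint between the hands gives the translation (3 d.o.f.). The change in hand separation gives a uniform scale (1). The rotation of the inter-hand vector gives 2 rotational d.o.f. Together these use the 2 × 3 = 6 inputs that two hands provide. Roll about the inter-hand axis cannot be observed from two points alone.

```python
import numpy as np


def _skew(v):
    """Cross-product matrix: _skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def rotation_between(a, b):
    """Smallest rotation taking direction a onto direction b (Rodrigues' formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_t, cos_t = np.linalg.norm(axis), a @ b
    if sin_t < 1e-12:
        if cos_t > 0:
            return np.eye(3)  # already aligned
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = np.array([1.0, 0, 0]) if abs(a[0]) < 0.9 else np.array([0, 1.0, 0])
        k = np.cross(a, helper)
        Kx = _skew(k / np.linalg.norm(k))
        return np.eye(3) + 2 * Kx @ Kx
    Kx = _skew(axis / sin_t)
    return np.eye(3) + sin_t * Kx + (1 - cos_t) * Kx @ Kx


def two_hand_update(left_prev, right_prev, left_now, right_now):
    """Hypothetical mapping of two 3D hand positions to (translation, scale, rotation)."""
    left_prev, right_prev, left_now, right_now = (
        np.asarray(p, dtype=float) for p in (left_prev, right_prev, left_now, right_now))
    translation = (left_now + right_now) / 2 - (left_prev + right_prev) / 2
    v_prev, v_now = right_prev - left_prev, right_now - left_now
    scale = np.linalg.norm(v_now) / np.linalg.norm(v_prev)
    rotation = rotation_between(v_prev, v_now)
    return translation, scale, rotation
```

In such a scheme, the update would be applied only while a "selected" state is active. That state would be toggled by the detected gestures.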

56
Limitations
57
Flocks-of-features sometimes drift to surrounding
objects
  • Flocks of tracked features sometimes drift to
    other objects
  • This happens especially on cluttered desks

58
(Still) overly high false hit rates
  • The false hit rates we achieved for the hand
    detectors are still too high
  • Longer training sessions and more powerful
    computing resources are needed
  • Various heuristics come to the rescue (e.g. the
    average posture over the last 1000 milliseconds)
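For discrete postures, one way to read "the average posture in the last 1000 milliseconds" is as a majority vote over a sliding time window. The sketch below assumes exactly that reading. Several details are our choices, not the thesis's: the class name, the use of `None` for frames with no detection, and the tie-breaking (Python's `Counter.most_common` keeps first-seen order among equal counts).

```python
from collections import Counter, deque


class PostureSmoother:
    """Majority vote over the postures detected within a sliding time window."""

    def __init__(self, window_ms=1000):
        self.window_ms = window_ms
        self.history = deque()  # (timestamp_ms, posture) pairs, oldest first

    def update(self, timestamp_ms, posture):
        """Add one detection (posture may be None for a miss); return the vote."""
        self.history.append((timestamp_ms, posture))
        # Drop detections that fall outside the window.
        while self.history and timestamp_ms - self.history[0][0] > self.window_ms:
            self.history.popleft()
        counts = Counter(p for _, p in self.history if p is not None)
        return counts.most_common(1)[0][0] if counts else None
```

A single false hit (e.g. one FIST frame among OPEN frames) does not flip the reported state. The trade-off is added latency when the user genuinely changes posture.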

59
Future work
60
Richer set of manipulations and deformations
Going beyond the basic set of manipulations: we
want deformations too
61
Advanced (volumetric) topological data structures
Needed to support advanced deformation operations
62
Increasing robustness of detection
  • The goal isn't to add more gestures; the goal is
    to increase the robustness with which the existing
    2-3 gestures are detected
  • Too many gestures lead to cognitive overload for
    users (at least in the beginning)
  • 2-3 gestures suffice to implement A LOT of
    functionality

63
Improving the sense of where
  • Improving the sense of position and orientation in
    fish-tank VR by adding spatial cues
  • Shadows cast by 3D objects
  • 2D projections of 3D objects onto the XY, YZ and
    ZX planes

64
Wide-angle/fish eye cameras
Increasing the workspace
65
Cameras towards the user
Workspace (use hands here)
Desktop computer users
66
Cameras towards the user
Notebook users
67
Cameras towards the user
camera built into the cellphone
use hands here
Mobile platform users: 3D modeling on cell phones
68
Model-based (3D) hand tracking
3D hand tracking for more expressive manipulation
M. Bray, E. Koller-Meier, L. Van Gool (2007)
69
Dynamic gestures
Dynamic gestures for more expressive manipulation
Hand posture changes in space AND time
70
Human factors
Comfort zones for hand actions, while standing
Kölsch 2004
71
Human factors
osha.gov
72
Human factors
Achieving comfort: resting elbows on the chair's
armrests
73
Natural fit: head-mounted displays
(user can reach into the display volume in front
of her/him)
Eyewear (personal displays) by Lumus Inc.
(www.lumus-optical.com/)
74
Integrating other ways to recover the hand's depth
ZCam: such a camera would eliminate the
calibration and triangulation steps
3DV Systems' ZCam depth-sensing camera
75
Thank you