Using a Webcam as a Game Controller - PowerPoint PPT Presentation

1
Using a Webcam as a Game Controller

Jonathan Blow
GDC 2002

2
Motivation

A potentially rich control paradigm, allowing for nuance.
Removes the barrier of some funny plastic controller.
Successful experiment: Konami's Police 911.

3
My game: Air Guitar

A beat-matching game where you stand and play air guitar to your favorite songs.
Previous beat-matching games (Parappa, DDR) are very digital; I want to use a webcam to make Air Guitar more organic and to allow the user to be expressive.
Technically demanding as a vision app (needs semantics about what is what).

4
Real-World Concerns

Noise
Illumination changes
Camera auto-adjusts
Background changes / camera moves
Shadows
Camera saturation / under-excitement

5
Varying Lighting Conditions

Can't rely on RGB values alone to identify what a pixel belongs to.
Need context... hmm, this becomes a hard AI problem.

6
Vision Techniques That Suck

Background subtraction (shadows, motion!)
Noise reduction by smoothing (resolution!)
Turning functions (unstable)
Frame coherence (just a band-aid)
Edge detection
Hysteresis (Latin for "cheap hack")
Discreteness

7
General Paradigm

Technique should:
Work on a still image.
Be robust: avoid discrete decisions wherever possible.
Work in as general a case as we can manage, but we won't strive to be ideally general.
We will do whatever it takes to get the job done.

8
Restrained Ambition

Only trying to roughly determine the positions of torso and arms.
Okay to say the user must wear a long-sleeved shirt of uniform color that contrasts with the background.
We won't dictate the color of the shirt (too restrictive!)
We won't dictate colors of other things (user's skin, background).

9
Early Segmentation

Divide up the image into regions of like pixels to ease computation.
Ad hoc technique: iterate over scanlines, potentially adding each pixel to its neighbor's group.
This technique sucks.

10
The Unreasonable Instability of Approximate Clustering

Real clustering is slow.
Loose clustering is interactively unstable.
Even just the small amount of camera noise makes things go berserk; motion is even worse.
Clustering is about continuous → discrete. We wanted to avoid that, so we should be very careful.

11
My solution: Be Inflexible

Simply divide the image into square regions of constant size.
If any region needs more detail, subdivide it.
Noise still affects this system (some regions subdivide / recombine from frame to frame), but it's relatively stable.
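A minimal Python sketch of the fixed-grid scheme, under stated assumptions: the image is a grayscale list of rows whose dimensions are multiples of the base block size, `base` is `min_size` times a power of two, and a block "needs more detail" when its value range exceeds a threshold (the talk does not say which criterion it used). The names `segment` and `needs_detail` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]     # grayscale rows (assumed input format)
Block = Tuple[int, int, int]  # (x, y, size) of a square region


def needs_detail(img: Image, x: int, y: int, size: int, threshold: float) -> bool:
    """Hypothetical criterion: the block's value range exceeds `threshold`."""
    values = [img[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    return max(values) - min(values) > threshold


def segment(img: Image, base: int, min_size: int, threshold: float) -> List[Block]:
    """Tile the image with fixed `base`-sized squares, splitting busy ones into quadrants."""
    height, width = len(img), len(img[0])
    blocks: List[Block] = []

    def visit(x: int, y: int, size: int) -> None:
        if size > min_size and needs_detail(img, x, y, size, threshold):
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half)
        else:
            blocks.append((x, y, size))

    # The grid itself never depends on image content; only the split decisions do.
    for y in range(0, height - base + 1, base):
        for x in range(0, width - base + 1, base):
            visit(x, y, base)
    return blocks
```

Because region boundaries always sit on the fixed grid, noise can only flip individual split decisions; it cannot drag boundaries around the way a greedy scanline grouping can.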

12
Which color space do we work in?

Want to group pixels that are alike, i.e. nearby in some color space.
Choices: nonlinear RGB, linear-light RGB, CIE LAB, many others.
CIE LAB produced nicer results for some ad hoc segmentation experiments, but is expensive to compute.
Linear-light RGB is the right thing for inverse rendering techniques, and it is cheap to compute.
I started with CIE LAB, but now use linear RGB.
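The slides don't say how the camera's nonlinear RGB was linearized (automatic gamma detection is listed later as a missing piece). Assuming the camera delivers standard sRGB-encoded 8-bit values, the conversion to linear light is the sRGB transfer function below; a 256-entry lookup table keeps it cheap per pixel. The function names are illustrative.

```python
from typing import List, Tuple


def srgb_to_linear(c8: int) -> float:
    """Decode one 8-bit sRGB channel value to linear light in [0, 1]."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92  # linear toe segment of the sRGB curve
    return ((c + 0.055) / 1.055) ** 2.4


# Only 256 inputs per channel, so a lookup table makes the per-pixel cost trivial.
LINEAR_LUT: List[float] = [srgb_to_linear(v) for v in range(256)]


def pixel_to_linear(rgb: Tuple[int, int, int]) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to linear-light RGB via the lookup table."""
    r, g, b = rgb
    return (LINEAR_LUT[r], LINEAR_LUT[g], LINEAR_LUT[b])
```

Under this assumption an 8-bit value of 128 decodes to roughly 0.216 in linear light, which is why sums and ratios of raw camera values don't correspond to physical light.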

13
Simple Inverse Rendering

Assume all surfaces have Lambertian reflectance:
$p = m\,l\cos\theta$, where $p$ is the observed color, $m$ is the material color, $l$ is the illuminant color, and $\theta$ is the angle between the light direction and the surface normal.
Can't disambiguate material color from illuminant color.
The compound color $ml$, under varying scale, forms a vector through the origin in RGB space.
This is a much more specific relation than e.g. Euclidean distance.
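To make the "vector through the origin" relation concrete: if shading only scales a color, two pixels of the same material differ in length but not in direction, so comparing colors by the angle between them ignores brightness. This sketch assumes linear-light RGB inputs; `angle_between` and the sample colors are hypothetical, not code from the talk.

```python
import math
from typing import Tuple

RGB = Tuple[float, float, float]  # linear-light RGB (assumed)


def angle_between(a: RGB, b: RGB) -> float:
    """Angle in radians between two colors treated as rays from the RGB origin."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.pi  # black has no direction; treat it as maximally different (assumption)
    cos_angle = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Clamp so rounding error can't push the cosine outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


lit = (0.6, 0.2, 0.1)        # a shirt color in direct light
shadowed = (0.3, 0.1, 0.05)  # the same material at half the brightness
other = (0.6, 0.2, 0.3)      # a genuinely different hue

print(angle_between(lit, shadowed))  # close to 0
print(angle_between(lit, other))
print(math.dist(lit, shadowed), math.dist(lit, other))  # about 0.32 vs 0.20
```

Euclidean distance calls the differently hued color the closer match, while the angle correctly groups the lit and shadowed shirt together.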

14
Covariance Bodies

5 numbers worth of storage
Ellipsoid-shaped (take eigenvectors of matrix)
Statistical significance: expected value of points
Advantage: consistency under summation
Can use them to vaguely characterize shapes.
Generalizes to n dimensions.
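One plausible layout for a 2D covariance body, assuming it stores raw moment sums of pixel positions so that bodies for separate regions merge by plain addition. This sketch keeps a point count next to five running sums (the talk's exact five numbers aren't specified) and gets the ellipse from the closed-form eigen-decomposition of the 2x2 covariance matrix. `CovarianceBody` is a hypothetical name.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CovarianceBody:
    """2D second-moment accumulator: a count plus five running sums."""
    n: float = 0.0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def add_point(self, x: float, y: float, weight: float = 1.0) -> None:
        self.n += weight
        self.sx += weight * x
        self.sy += weight * y
        self.sxx += weight * x * x
        self.sxy += weight * x * y
        self.syy += weight * y * y

    def __add__(self, other: "CovarianceBody") -> "CovarianceBody":
        # Raw sums add exactly, so merging two bodies equals accumulating all their points.
        return CovarianceBody(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                              self.sxx + other.sxx, self.sxy + other.sxy, self.syy + other.syy)

    def covariance(self) -> Tuple[float, float, float]:
        """(cxx, cxy, cyy) about the mean; requires at least one point."""
        mx, my = self.sx / self.n, self.sy / self.n
        return (self.sxx / self.n - mx * mx,
                self.sxy / self.n - mx * my,
                self.syy / self.n - my * my)

    def ellipse_axes(self) -> Tuple[float, float, float]:
        """Eigenvalues (major, minor) and the major axis angle in radians."""
        cxx, cxy, cyy = self.covariance()
        half_trace = (cxx + cyy) / 2.0
        disc = math.sqrt(((cxx - cyy) / 2.0) ** 2 + cxy * cxy)
        angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
        return half_trace + disc, half_trace - disc, angle
```

An elongated region shows up as `major` much larger than `minor`; a round one has the two close together. That ratio is one way to tell arm-like from torso-like windows in the later body-shape pass, though any threshold would need tuning.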

15
Covariance Bodies for Color Plane Fitting

Least-squares plane fit uses the same matrix.
Track the right-hand side (RHS): 3 more numbers.
Sum these to get group plane fits.
(example)
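Reading the slide as fitting one value per pixel (say one color channel or brightness) as a linear function of pixel position, $v \approx ax + by + c$, the least-squares normal equations use the count and the same position sums a 2D covariance body keeps, plus three right-hand-side sums $\sum xv$, $\sum yv$, $\sum v$. That interpretation is an assumption. This self-contained sketch solves the 3x3 system with Cramer's rule; `PlaneFit` is a hypothetical name.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


@dataclass
class PlaneFit:
    # The count and position sums a 2D covariance body would track...
    n: float = 0.0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0
    # ...plus three right-hand-side sums for the fitted value v.
    sxv: float = 0.0
    syv: float = 0.0
    sv: float = 0.0

    def add_sample(self, x: float, y: float, v: float) -> None:
        self.n += 1.0
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y
        self.sxv += x * v
        self.syv += y * v
        self.sv += v

    def __add__(self, other: "PlaneFit") -> "PlaneFit":
        # Every field is a plain sum, so a group's accumulator is the sum of its members'.
        return PlaneFit(*(getattr(self, f.name) + getattr(other, f.name) for f in fields(self)))

    def solve(self) -> Tuple[float, float, float]:
        """Least-squares (a, b, c) for v ~= a*x + b*y + c."""
        m = [[self.sxx, self.sxy, self.sx],
             [self.sxy, self.syy, self.sy],
             [self.sx, self.sy, self.n]]
        rhs = [self.sxv, self.syv, self.sv]
        d = det3(m)
        if abs(d) < 1e-12:
            raise ValueError("degenerate region: too few or collinear samples")
        coeffs = []
        for col in range(3):
            # Cramer's rule: swap one column for the right-hand side.
            mc = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
            coeffs.append(det3(mc) / d)
        return coeffs[0], coeffs[1], coeffs[2]
```

A single row of pixels can't determine a plane on its own, but once its accumulator is summed with a neighbor's, the group fit is well defined.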

16
Calibration Mode

Stand in a fixed pose
Pose designed to be easily recognizable
Gives us things that help later:
Body measurements
Background of scene
Shirt color (and histogram)
Skin color
Coarse model of environment illumination

17
How We Recognize This Pose

Pick a color to look for; isolate it.
Project this color to the X and Y image axes.
Find spikes in the projection.
Use heuristics to judge shape and give a confidence value:
Outliers
Relative spike sizes
Screen real-estate occupied
(example)
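A sketch of the projection step, assuming the chosen color has already been turned into a boolean mask. Counting hits per column and per row gives the X and Y projections; a "spike" here is a run of bins at or above a hypothetical `min_height`. The confidence heuristics listed on the slide are not reproduced.

```python
from typing import List, Optional, Sequence, Tuple

Mask = Sequence[Sequence[bool]]


def project(mask: Mask) -> Tuple[List[int], List[int]]:
    """Count selected pixels per column (X projection) and per row (Y projection)."""
    cols = [0] * len(mask[0])
    rows = [0] * len(mask)
    for y, row in enumerate(mask):
        for x, hit in enumerate(row):
            if hit:
                cols[x] += 1
                rows[y] += 1
    return cols, rows


def find_spikes(profile: Sequence[int], min_height: int) -> List[Tuple[int, int, int]]:
    """Return (first_bin, last_bin, peak) for each run of bins >= min_height (min_height >= 1)."""
    values = list(profile)
    spikes: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(values + [0]):  # trailing 0 closes a run that reaches the end
        if v >= min_height and start is None:
            start = i
        elif v < min_height and start is not None:
            spikes.append((start, i - 1, max(values[start:i])))
            start = None
    return spikes


# A T-shaped mask: arms along row 1, torso down column 3.
mask = [[y == 1 or (x == 3 and 1 <= y <= 5) for x in range(7)] for y in range(7)]
cols, rows = project(mask)
print(find_spikes(cols, 3))  # [(3, 3, 5)]
print(find_spikes(rows, 3))  # [(1, 1, 7)]
```

The column projection spikes at the torso and the row projection spikes at the outstretched arms; judging relative spike sizes like these is where the slide's heuristics would come in.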

18
Try many colors.

Sort colors present in the scene by popularity; cluster them.
Create a fuzzy color cone through each cluster.
Vary the cone radius.
Do the recognition listed on the previous slide; select the color cone with the best score.
Fixed color grid (to combat instability!)
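A sketch of the color-cone search, with its assumptions labeled: colors are linear RGB in [0, 1], popularity is counted on a fixed grid of `cells` bins per channel (standing in for the clustering step), membership falls off linearly with angle from the cone axis out to `radius`, and `score` is a caller-supplied stand-in for the pose recognizer from the previous slide. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]  # linear RGB, channels in [0, 1] (assumed)


def popular_colors(pixels: Sequence[RGB], cells: int) -> List[RGB]:
    """Bin colors on a fixed grid and return cell centers, most popular first."""
    counts = Counter(tuple(min(int(c * cells), cells - 1) for c in p) for p in pixels)
    return [tuple((i + 0.5) / cells for i in cell) for cell, _ in counts.most_common()]


def cone_membership(pixel: RGB, axis: RGB, radius: float) -> float:
    """1.0 on the cone axis, falling linearly to 0.0 at `radius` radians off-axis."""
    norms = math.hypot(*pixel) * math.hypot(*axis)
    if norms == 0.0:
        return 0.0
    cos_angle = sum(p * a for p, a in zip(pixel, axis)) / norms
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    return max(0.0, 1.0 - angle / radius)


def best_cone(pixels: Sequence[RGB], axes: Sequence[RGB], radii: Sequence[float],
              score: Callable[[List[float]], float]) -> Optional[Tuple[float, RGB, float]]:
    """Try every (axis, radius) pair; return (score, axis, radius) of the best one."""
    best: Optional[Tuple[float, RGB, float]] = None
    for axis in axes:
        for radius in radii:
            weights = [cone_membership(p, axis, radius) for p in pixels]
            s = score(weights)
            if best is None or s > best[0]:
                best = (s, axis, radius)
    return best
```

Quantizing to a fixed grid means a color's bin doesn't move when the rest of the scene changes, which echoes the slide's point about instability; using plain `sum` as the score in a test just picks the cone that covers the most pixels.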

19
(demo of calibration mode)

20
Head Finding

Many heuristics
Medium-detail region (flatness / sharpness)
But not a long sharp edge
Compact body
Skin-colored
Not the background

21
Skin color?

Fit points in RGB space with an approximating surface?
Where do I get a good skin color database?

22
www.hotornot.com!

I get to work and check people out at the same time.
(app demo)

23
Gameplay Recognition Mode

Goal: Find positions of the user's torso and arms.
When we're actually playing the game, we use the info provided by calibration to help us.
Currently we only use shirt and skin color.

24
Body Shape Analysis

Slide a square window across the image; for each window position, use the pixel regions falling within the window to perform a local shape analysis.
Examine the resulting ellipses to find the arms.
Arms are long, centered ellipses; round regions are the torso. (example)
Path-trace these to get an ordered series of points representing each arm.
Fit one or two line segments to this series of points (one segment = straight arm, two = bent arm).
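The slides don't say how the one-or-two segment fit is done. One simple approach, sketched under that caveat: measure how well a single chord from the first to the last traced point fits, try every interior point as an elbow, and accept the bent fit only when it cuts the squared error by a hypothetical factor `bend_gain`.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dist2_to_segment(p: Point, a: Point, b: Point) -> float:
    """Squared distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    t = 0.0
    if length2 > 0.0:
        # Parameter of the closest point on the infinite line, clamped to the segment.
        t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2))
    cx, cy = a[0] + t * dx, a[1] + t * dy
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2


def chord_error(points: Sequence[Point], i: int, j: int) -> float:
    """Total squared error of points[i..j] against the chord from points[i] to points[j]."""
    return sum(dist2_to_segment(points[k], points[i], points[j]) for k in range(i, j + 1))


def fit_arm(points: Sequence[Point], bend_gain: float = 4.0) -> List[Point]:
    """Return [start, end] for a straight arm or [start, elbow, end] for a bent one."""
    last = len(points) - 1
    straight = chord_error(points, 0, last)
    best_k: Optional[int] = None
    best_err = straight
    for k in range(1, last):
        err = chord_error(points, 0, k) + chord_error(points, k, last)
        if err < best_err:
            best_k, best_err = k, err
    # Hypothetical acceptance rule: the bend must cut the error by a factor of bend_gain.
    if best_k is not None and best_err * bend_gain < straight:
        return [points[0], points[best_k], points[last]]
    return [points[0], points[last]]
```

Requiring a clear improvement before declaring a bend keeps slight tracing noise from flipping a straight arm into a bent one on every other frame, though the right margin depends on the noise level.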

25
Hands in front of body?

The arm will blend into the body.
The hands will look like holes in the body.
This messes up arm detection.

26
Multi-step Process

Do a sliding window pass: approximate the extents of the torso using the initial set of regions (holes may be there).
Look for hand-colored blobs in this area.
Merge those blobs with the set of torso regions.
Do another sliding window pass, now detecting elongated shapes (for arms).
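A pixel-mask sketch of the hole-filling step, assuming boolean masks for the first-pass torso and for skin color (from calibration), and using the torso's bounding box as its "approximate extents". The talk works with regions rather than individual pixels, so this only illustrates the merge; the names are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[bool]]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) inclusive extents of the set pixels, or None if the mask is empty."""
    xs = [x for row in mask for x, hit in enumerate(row) if hit]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def merge_hands_into_torso(torso: Mask, skin: Mask) -> Mask:
    """Return a copy of `torso` with skin-colored pixels inside its extents added."""
    merged = [row[:] for row in torso]
    box = bounding_box(torso)
    if box is None:
        return merged
    x0, y0, x1, y1 = box
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if skin[y][x]:
                merged[y][x] = True  # a hand held in front of the body fills its hole
    return merged
```

Skin pixels outside the torso's extents (a face, an arm held out to the side) are left alone, so the second sliding-window pass still sees the arms as separate elongated shapes.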

27
Creating a 3D character pose from 2D information

Resolve ambiguities with game-domain constraints (e.g. hands always within some plane in front of the torso).
Use inverse kinematics and some simple body knowledge to recover 3D joint angles.
See the column "The Inner Product" in the April 2002 issue of Game Developer for an explanation of 3D IK, with source code.
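The IK itself is covered in the column cited above. As a hedged illustration of the 2D-to-3D ambiguity: under an orthographic camera (an assumption), a bone of known length $L$ (e.g. from the calibration body measurements) whose image projection has length $\ell$ has a depth offset of $\pm\sqrt{L^2 - \ell^2}$, and a constraint like "hands are in front of the torso" picks the sign. The law of cosines then gives the elbow angle from the bone lengths and the shoulder-to-hand distance. The function names and the camera-looks-down-+z convention are assumptions.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]


def child_depth(parent_xy: Point2, child_xy: Point2, parent_z: float,
                bone_length: float, toward_camera: bool = True) -> float:
    """Recover the child joint's depth from its 2D offset, assuming an orthographic camera."""
    dx = child_xy[0] - parent_xy[0]
    dy = child_xy[1] - parent_xy[1]
    projected2 = dx * dx + dy * dy
    # Noise can make the projected length exceed the true length; clamp to zero offset.
    dz = math.sqrt(max(0.0, bone_length * bone_length - projected2))
    # The sign of dz is the ambiguity a game-domain constraint resolves. With the camera
    # assumed to look down +z, "in front of the torso" means a smaller z.
    return parent_z - dz if toward_camera else parent_z + dz


def elbow_angle(upper_len: float, fore_len: float, shoulder_to_hand: float) -> float:
    """Interior elbow angle in radians (pi means a straight arm), via the law of cosines."""
    c = (upper_len ** 2 + fore_len ** 2 - shoulder_to_hand ** 2) / (2.0 * upper_len * fore_len)
    return math.acos(max(-1.0, min(1.0, c)))
```

For example, a forearm of length 5 that projects to length 3 sits 4 units closer to the camera than the elbow, and an upper arm of 3 and forearm of 4 with the hand 5 away from the shoulder gives a right-angle elbow.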

28
Method Advantages

It's reasonably fast.
Works with a moving background / camera.
Doesn't care much about shadows.

29
Method Shortcomings

Currently confused by similar colors (low clustering resolution).
Requires a few more technical solutions before it will be truly robust (e.g. automatic gamma detection).

30
Future Work

Performance: 640x480 @ 30 fps
More inverse rendering work (specularity)
Local surface modeling (eliminate confusion due to similar colors)
Texture classification
Mental model feedback

31
Coding Issues

How do you get video images from a webcam in Windows?
VFW (Video for Windows) code by Nathan d'Obrenan in Game Programming Gems 2.
Unfortunately, VFW is a legacy API.
DirectShow is the thing you need to use for future compatibility.

32
DirectShow is terrible!

Needlessly complex and bloated.
The base classes provided in the DirectX SDK induce a lot of latency ("latency death").
A minimal implementation of "just give me a damn frame from the camera" took 1,500 lines of code; it should have taken 8.
Ask me if you want the source code (jon@bolt-action.com).
Or use VFW or a proprietary API.

33
Blatant Plug