Using a Webcam as a Game Controller

1
Using a Webcam as a Game Controller
  • Jonathan Blow
  • GDC 2002

2
Motivation
  • A potentially rich control paradigm, allowing for
    nuance.
  • Removes the barrier of some funny plastic
    controller.
  • Successful experiment: Konami's Police 911

3
My game: Air Guitar
  • A beat-matching game where you stand and play
    air guitar to your favorite songs.
  • Previous beat-matching games (Parappa, DDR) are
    very digital; I want to use a webcam to make Air
    Guitar more organic and to allow the user to be
    expressive.
  • Technically demanding as a vision app (needs
    semantics about what is what).

4
Real-World Concerns
  • Noise
  • Illumination changes
  • Camera auto-adjusts
  • Background changes / camera moves
  • Shadows
  • Camera saturation / under-excitement

5
Varying Lighting Conditions
  • Can't rely on RGB values to identify pixels
  • Need context... hmm, this becomes a hard AI
    problem.

6
Vision Techniques That Suck
  • Background subtraction (shadows, motion!)
  • Noise reduction by smoothing (resolution!)
  • Turning functions (unstable)
  • Frame coherence (just a band-aid)
  • Edge detection
  • Hysteresis (Latin for "cheap hack")
  • Discreteness

7
General Paradigm
  • Technique should:
    • Work on a still image
    • Be robust: avoid discrete decisions wherever
      possible.
    • Work in as general a case as we can manage,
      but we won't strive to be ideally general.
  • We will do whatever it takes to get the job
    done.

8
Restrained Ambition
  • Only trying to roughly determine the positions of
    torso and arms
  • Okay to say the user must wear a long-sleeved
    shirt of uniform color that contrasts with the
    background
  • We won't dictate the color of the shirt (too
    restrictive!)
  • We won't dictate colors of other things (user's
    skin, background).

9
Early Segmentation
  • Divide up the image into regions of like pixels
    to ease computation.
  • Ad hoc technique: iterate over scanlines,
    potentially adding each pixel to its neighbor's
    group.
  • This technique sucks.

10
The Unreasonable Instability of Approximate
Clustering
  • Real clustering is slow
  • Loose clustering is interactively unstable
  • Even just the small amount of camera noise makes
    things go berserk; motion is even worse.
  • Clustering is about continuous -> discrete. We
    wanted to avoid that, so we should be very
    careful.

11
My solution: Be Inflexible
  • Simply divide the image into square regions of
    constant size.
  • If any region needs more detail, subdivide it.
  • Noise still affects this system (some regions
    subdividing / recombining from frame to frame)
    but it's relatively stable.
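
A minimal sketch of the fixed-grid idea above, not the talk's actual code. It assumes a grayscale image stored as a list of rows, a power-of-two cell size, and variance as the "needs more detail" test; the names grid_regions, max_var and min_size are made up for illustration.

```python
def region_variance(img, x0, y0, size):
    """Variance of the gray values inside one square region."""
    vals = [img[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def subdivide(img, x0, y0, size, max_var, min_size, out):
    """Keep a quiet square as one region; split a busy one into four."""
    busy = region_variance(img, x0, y0, size) > max_var
    if not busy or size <= min_size:
        out.append((x0, y0, size))
        return
    half = size // 2  # assumes size is a power of two
    for dy in (0, half):
        for dx in (0, half):
            subdivide(img, x0 + dx, y0 + dy, half, max_var, min_size, out)


def grid_regions(img, cell=4, max_var=100.0, min_size=1):
    """Tile the image with constant-size squares, refining where needed."""
    height, width = len(img), len(img[0])
    out = []
    for y0 in range(0, height - cell + 1, cell):
        for x0 in range(0, width - cell + 1, cell):
            subdivide(img, x0, y0, cell, max_var, min_size, out)
    return out
```

Because the grid positions never move, noise can only flip a cell between "split" and "not split"; it cannot reshuffle region boundaries the way loose clustering does.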

12
Which color space do we work in?
  • Want to group pixels that are alike: nearby in
    some color space.
  • Choices: nonlinear RGB, linear-light RGB, CIE
    LAB, many others.
  • CIE LAB produced nicer results for some ad hoc
    segmentation experiments, but is expensive to
    compute.
  • Linear-light RGB is the right thing for inverse
    rendering techniques; it is cheap to compute.
  • I started with CIE LAB, but now use linear RGB.
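
For reference, one way to get linear-light values from 8-bit pixels. This sketch assumes the camera delivers sRGB-encoded data, which the slides do not state (auto gamma detection is listed later as unsolved); the function and table names are illustrative.

```python
def srgb_to_linear(value_8bit):
    """Decode one 8-bit sRGB channel value to linear light in [0, 1]."""
    c = value_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# A 256-entry table keeps the per-pixel cost to a single lookup.
LINEAR_TABLE = [srgb_to_linear(v) for v in range(256)]


def pixel_to_linear(rgb):
    """Convert an (r, g, b) tuple of 8-bit values to linear light."""
    return tuple(LINEAR_TABLE[c] for c in rgb)
```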

13
Simple Inverse Rendering
  • Assume all surfaces have Lambertian reflectance
  • p = ml cos θ, where θ is the angle between light
    and surface normal.
  • Can't disambiguate material color from illuminant
    color
  • The compound color ml, under varying scale, forms
    a vector through the origin in RGB space.
  • This is a much more specific relation than e.g.
    Euclidean distance.
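
A small sketch of why the "vector through the origin" relation is more specific than Euclidean distance. It assumes linear-light RGB tuples; the function names are illustrative, not from the talk.

```python
import math


def euclidean(a, b):
    """Plain straight-line distance between two RGB points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_between(a, b):
    """Angle in radians between two RGB vectors; None if either is black."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # a black pixel has no direction
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return math.acos(max(-1.0, min(1.0, cosine)))


lit = (0.8, 0.4, 0.2)
shaded = tuple(0.25 * c for c in lit)   # same ml, smaller cos θ
other = (0.2, 0.4, 0.8)                 # a different compound color
```

Here lit and shaded are far apart in Euclidean terms but have essentially zero angle between them, while other is clearly off the ray.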

14
Covariance Bodies
  • 5 numbers' worth of storage
  • Ellipsoid-shaped (take eigenvectors of matrix)
  • Statistical significance: expected value of
    points
  • Advantage: consistency under summation
  • Can use them to vaguely characterize shapes.
  • Generalizes to n dimensions.
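
A sketch of a 2D covariance body kept as raw sums, so that two bodies combine by plain addition. This layout is an assumption: it stores six sums (including the point count) for convenience, whereas the slide counts 5 numbers; the class and method names are illustrative.

```python
import math


class CovBody:
    """Raw sums for a 2D point set: n, Σx, Σy, Σxx, Σxy, Σyy."""

    def __init__(self, n=0.0, sx=0.0, sy=0.0, sxx=0.0, sxy=0.0, syy=0.0):
        self.sums = (n, sx, sy, sxx, sxy, syy)

    @classmethod
    def from_points(cls, pts):
        body = cls()
        for x, y in pts:
            body = body + cls(1.0, x, y, x * x, x * y, y * y)
        return body

    def __add__(self, other):
        # Summation is just component-wise addition of the raw sums.
        return CovBody(*(a + b for a, b in zip(self.sums, other.sums)))

    def mean(self):
        n, sx, sy = self.sums[:3]
        return sx / n, sy / n

    def covariance(self):
        """Return (cxx, cxy, cyy) of the symmetric 2x2 covariance matrix."""
        n, sx, sy, sxx, sxy, syy = self.sums
        mx, my = sx / n, sy / n
        return sxx / n - mx * mx, sxy / n - mx * my, syy / n - my * my

    def ellipse(self):
        """Return (major eigenvalue, minor eigenvalue, major-axis angle)."""
        a, b, c = self.covariance()
        mid = (a + c) / 2.0
        spread = math.hypot((a - c) / 2.0, b)
        angle = 0.5 * math.atan2(2.0 * b, a - c)
        return mid + spread, mid - spread, angle
```

A large ratio between the two eigenvalues indicates an elongated shape; similar eigenvalues indicate a round one.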

15
Covariance Bodies for Color Plane Fitting
  • Least-squares plane fit uses the same matrix.
  • Track RHS: 3 more numbers.
  • Sum these to get group plane fits.
  • (example)
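
A sketch of a summable least-squares plane fit for one color channel, v ≈ a·x + b·y + c. The normal-equation matrix is built from the same position sums a covariance body tracks, plus three right-hand-side sums. The data layout and function names are assumptions, and Cramer's rule is used only to keep the example dependency-free.

```python
def plane_sums(samples):
    """Accumulate n, Σx, Σy, Σxx, Σxy, Σyy and the RHS Σxv, Σyv, Σv."""
    sums = [0.0] * 9
    for x, y, v in samples:
        terms = (1.0, x, y, x * x, x * y, y * y, x * v, y * v, v)
        for i, term in enumerate(terms):
            sums[i] += term
    return sums


def add_sums(a, b):
    """Group fit = element-wise sum of the per-region accumulators."""
    return [p + q for p, q in zip(a, b)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_plane(sums):
    """Solve the 3x3 normal equations; None if the points are degenerate."""
    n, x, y, xx, xy, yy, xv, yv, v = sums
    matrix = [[xx, xy, x], [xy, yy, y], [x, y, n]]
    rhs = [xv, yv, v]
    d = det3(matrix)
    if abs(d) < 1e-12:
        return None
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)  # (a, b, c)
```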

16
Calibration Mode
  • Stand in a fixed pose
  • Pose designed to be easily recognizable
  • Gives us things that help later:
    • Body measurements
    • Background of scene
    • Shirt color (and histogram)
    • Skin color
    • Coarse model of environment illumination

17
How We Recognize This Pose
  • Pick a color to look for; isolate it.
  • Project this color to the X and Y image axes
  • Find spikes in projection
  • Use heuristics to judge shape and give a
    confidence value:
    • Outliers
    • Relative spike sizes
    • Screen real-estate occupied
  • (example)
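
A sketch of the projection step, assuming the isolated color is available as a 0/1 mask (list of rows). The spike test used here, a fixed fraction of the peak, is a stand-in heuristic and not the talk's; note that it is itself a discrete decision of the kind the talk prefers to avoid.

```python
def project(mask):
    """Count set pixels per column (X projection) and per row (Y projection)."""
    width = len(mask[0])
    cols = [sum(row[x] for row in mask) for x in range(width)]
    rows = [sum(row) for row in mask]
    return cols, rows


def find_spikes(profile, frac=0.5):
    """Return (start, end) index runs where the profile is near its peak."""
    peak = max(profile)
    if peak == 0:
        return []
    threshold = frac * peak
    spikes, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            spikes.append((start, i - 1))
            start = None
    if start is not None:
        spikes.append((start, len(profile) - 1))
    return spikes
```

For a pose with arms held out, the X projection spikes at the torso columns and the Y projection spikes at the arm rows.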

18
Try many colors.
  • Sort colors present in scene by popularity;
    cluster them.
  • Create a fuzzy color cone through each cluster.
  • Vary the cone radius.
  • Do the recognition listed on previous slide;
    select the color cone with the best score.
  • Fixed color grid (to combat instability!)
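
One possible reading of a "fuzzy color cone", sketched under assumptions: the cone's axis is a ray from the origin through a cluster color, the "radius" is the ratio of perpendicular distance to distance along the axis, and membership falls off linearly to zero at the cone surface. None of these choices are confirmed by the slides.

```python
import math


def cone_membership(pixel, axis, radius):
    """Soft score in [0, 1]: 1 on the cone axis, 0 at or beyond the surface."""
    axis_len = math.sqrt(sum(a * a for a in axis))
    along = sum(p * a for p, a in zip(pixel, axis)) / axis_len
    if along <= 0.0:
        return 0.0  # black, or pointing away from the axis
    perp_sq = sum(p * p for p in pixel) - along * along
    perp = math.sqrt(max(0.0, perp_sq))  # clamp rounding error
    return max(0.0, 1.0 - (perp / along) / radius)


def mask_for_cone(image, axis, radius):
    """Per-pixel soft mask for one candidate cone."""
    return [[cone_membership(px, axis, radius) for px in row] for row in image]
```

Varying the radius then means calling mask_for_cone with several radius values, running the pose recognition on each mask, and keeping the best-scoring one.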

19
(demo of calibration mode)

20
Head Finding
  • Many heuristics:
    • Medium-detail region (Flatness sharpness)
    • But not a long sharp edge
    • Compact body
    • Skin-colored
    • Not the background

21
Skin color?
  • Fit points in RGB space with an approximating
    surface?
  • Where do I get a good skin color database?

22
www.hotornot.com!
  • I get to work and check people out at the same
    time.
  • (app demo)

23
Gameplay Recognition Mode
  • Goal: find positions of user's torso and arms.
  • When we're actually playing the game, we use the
    info provided by calibration to help us.
  • Currently only use shirt and skin color.

24
Body Shape Analysis
  • Slide a square window across the image; for each
    window position, use the pixel regions falling
    within the window to perform a local shape
    analysis.
  • Examine the resulting ellipses to find the arms.
    These are long, centered ellipses; round regions
    are the torso. (example)
  • Path-trace these to get an ordered series of
    points representing each arm.
  • Fit one or two line segments to this series of
    points (one segment: straight arm; two: bent).
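
A sketch of the last step only. The slides do not say how the one-or-two-segment fit is done; this version swaps in a single split step in the style of Douglas-Peucker simplification: draw the chord from first to last point, and if some point strays farther than a tolerance, treat the farthest point as the elbow. The tol value and function names are illustrative.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def fit_arm(points, tol=2.0):
    """Return [shoulder, hand] for a straight arm, or
    [shoulder, elbow, hand] for a bent one."""
    start, end = points[0], points[-1]
    elbow_index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], start, end)
        if d > worst:
            elbow_index, worst = i, d
    if worst <= tol:
        return [start, end]
    return [start, points[elbow_index], end]
```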

25
Hands in front of body?
  • The arm will blend into the body.
  • The hands will look like holes in the body.
  • This messes up arm detection.

26
Multi-step Process
  • Do a sliding window pass: approximate extents of
    torso using initial set of regions (holes may be
    there).
  • Look for hand-colored blobs in this area.
  • Merge those blobs with the set of torso regions.
  • Do another sliding window pass, now detecting
    elongated shapes (for arms).

27
Creating a 3D character pose from 2D information
  • Resolve ambiguities with game-domain constraints
    (e.g. hands always within some plane in front of
    torso).
  • Use inverse kinematics and some simple body
    knowledge to recover 3D joint angles.
  • See the column "The Inner Product" in the April
    2002 issue of Game Developer for an explanation
    of 3D IK, and source code.
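
A toy illustration of recovering depth from 2D using "simple body knowledge"; this is not the IK from the column. It assumes orthographic projection, a straight limb of known length (e.g. measured during calibration), and the game-domain constraint that the hand is in front of the shoulder, which fixes the sign of the depth. All names are hypothetical.

```python
import math


def limb_depth(shoulder_2d, hand_2d, limb_length):
    """Depth of the hand toward the camera, from foreshortening alone."""
    projected = math.hypot(hand_2d[0] - shoulder_2d[0],
                           hand_2d[1] - shoulder_2d[1])
    # Noise can make the projection longer than the limb; clamp to zero.
    depth_sq = max(0.0, limb_length ** 2 - projected ** 2)
    return math.sqrt(depth_sq)


def limb_pitch(shoulder_2d, hand_2d, limb_length):
    """Angle (radians) by which the limb tilts out of the image plane."""
    projected = math.hypot(hand_2d[0] - shoulder_2d[0],
                           hand_2d[1] - shoulder_2d[1])
    return math.atan2(limb_depth(shoulder_2d, hand_2d, limb_length), projected)
```

Without the "hands in front of torso" constraint, the same 2D measurement would be equally consistent with the hand pointing away from the camera.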

28
Method Advantages
  • It's reasonably fast
  • Works with moving background / camera
  • Doesn't care much about shadows

29
Method Shortcomings
  • Currently confused by similar colors (low
    clustering resolution)
  • Requires a few more technical solutions before it
    will be truly robust (e.g. auto gamma detection).

30
Future Work
  • Performance: 640x480 @ 30fps
  • More inverse rendering work (specularity)
  • Local surface modeling (eliminate confusion due
    to similar colors)
  • Texture classification
  • Mental model feedback

31
Coding Issues
  • How do you get video images from a webcam in
    Windows?
  • VFW code by Nathan d'Obrenan in Game Programming
    Gems 2
  • Unfortunately, VFW is a legacy API
  • DirectShow is the thing you need to use for
    future compatibility.

32
DirectShow is terrible!
  • Needlessly complex and bloated.
  • The base classes provided in the DirectX SDK
    induce a lot of latency (latency death)
  • A minimal implementation of "just give me a damn
    frame from the camera" took 1,500 lines of code;
    should have taken 8.
  • Ask me if you want the source code
    (jon@bolt-action.com)
  • Or use VFW or a proprietary API.

33
Blatant Plug
  • Experimental Gameplay Workshop
  • Friday, 4pm-7pm, Fairmont Regency I