Using a Webcam as a Game Controller
  • Jonathan Blow
  • GDC 2002

  • A potentially rich control paradigm, allowing for …
  • Removes the barrier of some funny plastic controller.
  • Successful experiment: Konami's Police 911

My game: Air Guitar
  • A beat-matching game where you stand and play
    air guitar to your favorite songs.
  • Previous beat-matching games (Parappa, DDR) are
    very digital; I want to use a webcam to make Air
    Guitar more organic and to allow the user to be …
  • Technically demanding as a vision app (needs
    semantics about what is what).

Real-World Concerns
  • Noise
  • Illumination changes
  • Camera auto-adjusts
  • Background changes / camera moves
  • Shadows
  • Camera saturation / under-excitement

Varying Lighting Conditions
  • Can't rely on RGB values to identify pixels
  • Need context. Hmm, this becomes a hard AI problem.

Vision Techniques That Suck
  • Background subtraction (shadows, motion!)
  • Noise reduction by smoothing (resolution!)
  • Turning functions (unstable)
  • Frame coherence (just a band-aid)
  • Edge detection
  • Hysteresis (Latin for cheap hack)
  • Discreteness

General Paradigm
  • Technique should:
    • Work on a still image
    • Be robust: avoid discrete decisions wherever
      possible
    • Work in as general a case as we can manage,
      but we won't strive to be ideally general.
  • We will do whatever it takes to get the job done.

Restrained Ambition
  • Only trying to roughly determine the positions of
    torso and arms
  • Okay to say the user must wear a long-sleeved
    shirt of uniform color that contrasts with the …
  • We won't dictate the color of the shirt (too …)
  • We won't dictate colors of other things (user's
    skin, background).

Early Segmentation
  • Divide up the image into regions of like pixels
    to ease computation.
  • Ad hoc technique: iterate over scanlines,
    potentially adding each pixel to a neighbor's region.
  • This technique sucks.

The Unreasonable Instability of Approximate Clustering
  • Real clustering is slow
  • Loose clustering is interactively unstable
  • Even just the small amount of camera noise makes
    things go berserk; motion is even worse.
  • Clustering is about continuous → discrete. We
    wanted to avoid that, so we should be very careful.

My solution: Be Inflexible
  • Simply divide the image into square regions of
    constant size.
  • If any region needs more detail, subdivide it.
  • Noise still affects this system (some regions
    subdividing / recombining from frame to frame)
    but it's relatively stable.
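The fixed-grid idea can be sketched as a quadtree-style split. This is a minimal illustration, not the talk's code: it assumes a grayscale image stored as a list of rows, measures "needs more detail" as pixel variance above a hypothetical `max_var` threshold, and stops splitting at a hypothetical `min_size`.

```python
# Sketch of "be inflexible" segmentation: constant-size square blocks,
# subdivided quadtree-style only when a block needs more detail.
# ASSUMPTIONS (not from the talk): grayscale list-of-rows input, "needs
# more detail" = variance above max_var, splitting stops at min_size.

def block_variance(img, x, y, size):
    vals = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def segment(img, block=8, min_size=2, max_var=100.0):
    """Return a list of (x, y, size) squares covering the image.

    Pixels past the last whole block on the right/bottom are ignored here.
    """
    h, w = len(img), len(img[0])
    regions = []

    def visit(x, y, size):
        if size > min_size and block_variance(img, x, y, size) > max_var:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half)
        else:
            regions.append((x, y, size))

    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            visit(x, y, block)
    return regions
```

Because block boundaries never move, camera noise can only flip individual blocks between split and unsplit, rather than reshuffling whole clusters from frame to frame.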

Which color space do we work in?
  • Want to group pixels that are alike: nearby in
    some color space.
  • Choices: nonlinear RGB, linear-light RGB, CIE
    LAB, many others.
  • CIE LAB produced nicer results for some ad hoc
    segmentation experiments, but is expensive to compute.
  • Linear-light RGB is the right thing for inverse
    rendering techniques, and it is cheap to compute.
  • I started with CIE LAB, but now use linear RGB.
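Getting to linear-light RGB from 8-bit camera values can be as cheap as one table lookup per channel. A minimal sketch, assuming the camera output follows the sRGB transfer curve; real webcams may use a different gamma, which is why auto gamma detection comes up later as an open problem.

```python
# Sketch: 8-bit camera RGB -> linear-light RGB via a lookup table.
# ASSUMPTION: the camera encodes with the sRGB transfer function.

def srgb_to_linear(c):
    """Decode one sRGB component in [0, 1] to linear light in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

# 256-entry table: converting a pixel becomes three list lookups.
LINEAR_LUT = [srgb_to_linear(i / 255.0) for i in range(256)]

def pixel_to_linear(r, g, b):
    return (LINEAR_LUT[r], LINEAR_LUT[g], LINEAR_LUT[b])
```

By comparison, CIE LAB needs a 3x3 matrix multiply into XYZ plus cube roots for every pixel, on top of this linearization.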

Simple Inverse Rendering
  • Assume all surfaces have Lambertian reflectance
  • $p = m\,l\cos\theta$, where $\theta$ is the angle between the light and the surface normal
  • Can't disambiguate material color from illuminant
  • The compound color ml, under varying scale, forms
    a vector through the origin in RGB space.
  • This is a much more specific relation than e.g.
    Euclidean distance.
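Written out per channel, treating $m$ and $l$ as RGB triples (our assumption; the slide does not spell this out), the argument is:

$$
\mathbf{p} = \cos\theta\,(m_R l_R,\ m_G l_G,\ m_B l_B) = s\,\mathbf{k},
\qquad s = \cos\theta \ge 0,\quad \mathbf{k} = \mathbf{m} \odot \mathbf{l}
$$

Shading only changes the scale $s$, so every pixel of one surface lies on the ray $\{s\,\mathbf{k} : s \ge 0\}$. Since $(\alpha\mathbf{m}) \odot (\mathbf{l}/\alpha) = \mathbf{m} \odot \mathbf{l}$ for any $\alpha > 0$, only the compound $\mathbf{k}$ is observable. Comparing two pixels by the angle $\phi$ between them,

$$
\cos\phi = \frac{\mathbf{p}_1 \cdot \mathbf{p}_2}{\lVert\mathbf{p}_1\rVert\,\lVert\mathbf{p}_2\rVert} = 1
\quad\text{for } \mathbf{p}_1 = s_1\mathbf{k},\ \mathbf{p}_2 = s_2\mathbf{k},\ s_1, s_2 > 0,
$$

whereas the Euclidean distance $\lVert\mathbf{p}_1 - \mathbf{p}_2\rVert = |s_1 - s_2|\,\lVert\mathbf{k}\rVert$ grows with the shading difference.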

Covariance Bodies
  • 5 numbers' worth of storage
  • Ellipsoid-shaped (take eigenvectors of matrix)
  • Statistical significance: expected value of …
  • Advantage: consistency under summation
  • Can use them to vaguely characterize shapes.
  • Generalizes to n dimensions.
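A 2D covariance body for a set of pixel positions might look like the sketch below. The slide quotes five numbers without listing them; this sketch instead keeps a count plus raw sums (six numbers), an assumption made so that merging two bodies is plain addition. The ellipse axes come from the eigen-decomposition of the 2x2 covariance matrix.

```python
import math

class CovarianceBody2D:
    """Accumulated moments of a 2D point set (e.g. pixel positions).

    ASSUMPTION: stores count + raw sums (6 numbers) so merging is exact;
    the talk's 5-number layout is not specified.
    """
    def __init__(self, n=0.0, sx=0.0, sy=0.0, sxx=0.0, sxy=0.0, syy=0.0):
        self.n, self.sx, self.sy = n, sx, sy
        self.sxx, self.sxy, self.syy = sxx, sxy, syy

    @classmethod
    def from_points(cls, points):
        body = cls()
        for x, y in points:
            body.add(x, y)
        return body

    def add(self, x, y, w=1.0):
        self.n += w
        self.sx += w * x
        self.sy += w * y
        self.sxx += w * x * x
        self.sxy += w * x * y
        self.syy += w * y * y

    def __add__(self, other):
        # Consistency under summation: merging regions just adds the sums.
        return CovarianceBody2D(self.n + other.n, self.sx + other.sx,
                                self.sy + other.sy, self.sxx + other.sxx,
                                self.sxy + other.sxy, self.syy + other.syy)

    def mean(self):
        return self.sx / self.n, self.sy / self.n

    def covariance(self):
        """Return (var_x, cov_xy, var_y)."""
        mx, my = self.mean()
        return (self.sxx / self.n - mx * mx,
                self.sxy / self.n - mx * my,
                self.syy / self.n - my * my)

    def ellipse_axes(self):
        """Std. deviations along the principal axes, plus major-axis angle."""
        a, b, c = self.covariance()
        tr, det = a + c, a * c - b * b
        disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
        l1, l2 = tr / 2 + disc, tr / 2 - disc        # eigenvalues
        angle = 0.5 * math.atan2(2 * b, a - c)       # major-axis direction
        return math.sqrt(max(l1, 0.0)), math.sqrt(max(l2, 0.0)), angle
```

The ratio of the two axis lengths is one simple way to separate long, arm-like ellipses from round, torso-like ones.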

Covariance Bodies for Color Plane Fitting
  • Least-squares plane fit uses the same matrix.
  • Track the right-hand side (RHS): 3 more numbers.
  • Sum these to get group plane fits.
  • (example)
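The slide does not say which quantities are fit. One reading, assumed here: fit a single color channel $v$ as a plane $v = a x + b y + c$ over pixel positions. The 3x3 normal-equation matrix then uses only positional sums (the same ones a covariance body accumulates), and the right-hand side adds three more sums per channel. Accumulators for separate regions can be added to get the group's plane fit.

```python
# Sketch: least-squares plane fit v = a*x + b*y + c from summed moments.
# ASSUMPTION: x, y are pixel positions and v is one color channel.

class PlaneFitAccumulator:
    def __init__(self):
        self.s = [0.0] * 9  # n, sx, sy, sxx, sxy, syy | sxv, syv, sv (RHS)

    @classmethod
    def from_samples(cls, samples):
        acc = cls()
        for x, y, v in samples:
            acc.add(x, y, v)
        return acc

    def add(self, x, y, v):
        for i, term in enumerate((1, x, y, x * x, x * y, y * y, x * v, y * v, v)):
            self.s[i] += term

    def __add__(self, other):
        # Group fit: sum the accumulators, then solve once.
        out = PlaneFitAccumulator()
        out.s = [p + q for p, q in zip(self.s, other.s)]
        return out

    def solve(self):
        """Return (a, b, c), or None if the positions are collinear."""
        n, sx, sy, sxx, sxy, syy, sxv, syv, sv = self.s
        m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
        return solve3(m, [sxv, syv, sv])

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    result = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        result.append(det3(mc) / d)
    return tuple(result)
```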

Calibration Mode
  • Stand in a fixed pose
  • Pose designed to be easily recognizable
  • Gives us things that help later:
    • Body measurements
    • Background of scene
    • Shirt color (and histogram)
    • Skin color
    • Coarse model of environment illumination

How We Recognize This Pose
  • Pick a color to look for; isolate it.
  • Project this color onto the X and Y image axes.
  • Find spikes in the projection.
  • Use heuristics to judge shape and give a
    confidence value:
    • Outliers
    • Relative spike sizes
    • Screen real estate occupied
  • (example)
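The projection and spike-finding steps can be sketched as follows. The input is assumed to be a binary mask (list of rows of 0/1) marking pixels that match the chosen color. The spike rule is an assumption, since the talk does not say how spikes are detected: here a spike is a maximal run of bins whose count is at least `frac` times the largest bin.

```python
# Sketch: project a color mask onto the image axes and find spikes.
# ASSUMPTION: spike = maximal run of bins >= frac * (largest bin).

def project(mask):
    cols = [sum(col) for col in zip(*mask)]   # projection onto X axis
    rows = [sum(row) for row in mask]         # projection onto Y axis
    return cols, rows

def find_spikes(profile, frac=0.5):
    """Return (start, end_exclusive, total_count) for each spike."""
    peak = max(profile) if profile else 0
    if peak == 0:
        return []
    thresh = frac * peak
    spikes, start = [], None
    for i, v in enumerate(profile + [0]):     # sentinel closes a final run
        if v >= thresh and start is None:
            start = i
        elif v < thresh and start is not None:
            spikes.append((start, i, sum(profile[start:i])))
            start = None
    return spikes
```

Spike widths, relative totals, and leftover mass outside the spikes are the kind of raw numbers the listed heuristics (outliers, relative spike sizes, screen real estate) could be scored from.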

Try many colors.
  • Sort colors present in the scene by popularity;
    cluster them.
  • Create a fuzzy color cone through each cluster.
  • Vary the cone radius.
  • Do the recognition listed on the previous slide;
    select the color cone with the best score.
  • Fixed color grid (to combat instability!)
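A fuzzy color cone follows naturally from the inverse-rendering slide: membership depends on the angle between a pixel's RGB vector and the cone axis, so brightness changes along the axis don't matter. The linear falloff below is an assumption (the talk gives no falloff shape), and `score` stands in for the pose heuristics; it is a hypothetical caller-supplied function.

```python
import math

# Sketch: fuzzy color cone membership and a brute-force search over cones.
# ASSUMPTION: membership falls linearly from 1 on the axis to 0 at `radius`.

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return math.pi  # black pixels carry no direction: treat as outside
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def cone_membership(pixel, axis, radius):
    """1.0 on the axis, falling linearly to 0.0 at `radius` radians."""
    return max(0.0, 1.0 - angle_between(pixel, axis) / radius)

def best_cone(pixels, axes, radii, score):
    """Try every (axis, radius) pair; keep the one `score` rates highest.

    `score` is hypothetical: it maps a list of memberships to a number.
    """
    return max(((a, r) for a in axes for r in radii),
               key=lambda ar: score([cone_membership(p, ar[0], ar[1])
                                     for p in pixels]))
```

In the spirit of the "fixed color grid" bullet, the candidate `axes` could be snapped to a fixed grid of directions rather than taken straight from the clusters, so small frame-to-frame shifts don't change which cones are tried.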

(demo of calibration mode)

Head Finding
  • Many heuristics:
    • Medium-detail region (Flatness sharpness)
      • But not a long sharp edge
    • Compact body
    • Skin-colored
    • Not the background

Skin color?
  • Fit points in RGB space with an approximating …
  • Where do I get a good skin color database?

  • I get to work and check people out at the same time.
  • (app demo)

Gameplay Recognition Mode
  • Goal: find the positions of the user's torso and arms.
  • When we're actually playing the game, we use the
    info provided by calibration to help us.
  • Currently only use shirt and skin color.

Body Shape Analysis
  • Slide a square window across the image; for each
    window position, use the pixel regions falling
    within the window to perform a local shape
    analysis.
  • Examine the resulting ellipses to find the arms.
    These are long, centered ellipses; round regions
    are the torso. (example)
  • Path-trace these to get an ordered series of
    points representing each arm.
  • Fit one or two line segments to this series of
    points (one segment: straight arm; two: bent arm).
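Choosing between one segment (straight arm) and two (bent arm) can be sketched as a brute-force search for the best elbow point. Assumptions not from the talk: segment endpoints are taken from the traced points themselves, error is the sum of squared distances to the segments, and a bend is accepted only if it cuts the error by a hypothetical factor `gain`.

```python
# Sketch: fit one or two line segments to an ordered list of arm points.
# ASSUMPTIONS: endpoints/elbow chosen from the points, squared-distance
# error, bend accepted only if error drops by `gain` times.

def seg_dist2(p, a, b):
    """Squared distance from point p to segment ab."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def fit_arm(points, gain=4.0, tol=1e-9):
    """Return [start, end] (straight) or [start, elbow, end] (bent).

    `points` must hold at least two ordered (x, y) pairs.
    """
    start, end = points[0], points[-1]
    straight = sum(seg_dist2(p, start, end) for p in points)
    best, best_k = straight, None
    for k in range(1, len(points) - 1):       # try each interior elbow
        elbow = points[k]
        err = (sum(seg_dist2(p, start, elbow) for p in points[:k + 1])
               + sum(seg_dist2(p, elbow, end) for p in points[k:]))
        if err < best:
            best, best_k = err, k
    if best_k is not None and best * gain + tol < straight:
        return [start, points[best_k], end]
    return [start, end]
```

The search is quadratic in the number of points, which is acceptable for the short paths an arm trace produces; `tol` keeps rounding noise on a straight arm from being mistaken for a bend.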

Hands in front of body?
  • The arm will blend into the body.
  • The hands will look like holes in the body.
  • This messes up arm detection.

Multi-step Process
  • Do a sliding window pass: approximate extents of
    torso using initial set of regions (holes may be …)
  • Look for hand-colored blobs in this area.
  • Merge those blobs with the set of torso regions.
  • Do another sliding window pass, now detecting
    elongated shapes (for arms).

Creating a 3D character pose from 2D information
  • Resolve ambiguities with game-domain constraints
    (e.g. hands always within some plane in front of …)
  • Use inverse kinematics and some simple body
    knowledge to recover 3D joint angles.
  • See the column "The Inner Product" in the April
    2002 issue of Game Developer for an explanation
    of 3D IK, and source code.
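Two small pieces of this 2D-to-3D step can be sketched under strong assumptions: an orthographic camera, and bone lengths known from the body measurements taken in calibration mode. A bone's foreshortening gives the depth difference between its endpoints only up to sign; that sign is the kind of ambiguity the slide resolves with game-domain constraints, so here the caller chooses it. The elbow angle then follows from the law of cosines. This is not the column's IK code.

```python
import math

# Sketch of two IK helpers. ASSUMPTIONS: orthographic projection, known
# bone lengths; negative depth = toward the camera in this convention.

def depth_offset(projected_len, true_len, toward_camera=True):
    """Depth difference between a bone's endpoints from its 2D length."""
    if projected_len >= true_len:
        return 0.0  # noise: bone appears unforeshortened, assume flat
    dz = math.sqrt(true_len ** 2 - projected_len ** 2)
    return -dz if toward_camera else dz

def elbow_angle(upper, fore, shoulder_to_hand):
    """Interior elbow angle in radians (pi = straight arm)."""
    d = min(shoulder_to_hand, upper + fore)   # clamp unreachable targets
    cos_e = (upper ** 2 + fore ** 2 - d ** 2) / (2 * upper * fore)
    return math.acos(max(-1.0, min(1.0, cos_e)))
```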

Method Advantages
  • It's reasonably fast
  • Works with moving background / camera
  • Doesn't care much about shadows

Method Shortcomings
  • Currently confused by similar colors (low
    clustering resolution)
  • Requires a few more technical solutions before it
    will be truly robust (e.g. auto gamma detection).

Future Work
  • Performance: 640x480 @ 30 fps
  • More inverse rendering work (specularity)
  • Local surface modeling (eliminate confusion due
    to similar colors)
  • Texture classification
  • Mental model feedback

Coding Issues
  • How do you get video images from a webcam in …
  • VFW code by Nathan d'Obrenan in Game Programming
    Gems 2
  • Unfortunately, VFW is a legacy API
  • DirectShow is the thing you need to use for
    future compatibility.

DirectShow is terrible!
  • Needlessly complex and bloated.
  • The base classes provided in the DirectX SDK
    induce a lot of latency (latency death)
  • A minimal implementation of "just give me a damn
    frame from the camera" took 1,500 lines of code;
    it should have taken 8.
  • Ask me if you want the source code
  • Or use VFW or a proprietary API.

Blatant Plug
  • Experimental Gameplay Workshop
  • Friday, 4pm-7pm, Fairmont Regency I