Gesture Recognition - Presentation Transcript
1
Gesture Recognition
  • 1. Recognition of parameterized gestures
  • 2. Real-time sign language recognition using a
    single video camera
  • Jaron Schaeffer
  • Jaron.Schaeffer_at_jayweb.de

2
TOC
  • Recognition of parameterized gestures
  • Parametric gestures
  • Previous Approaches
  • Parametric Gaussian Hidden Markov Models
  • Training/Testing
  • Results
  • Real-time sign language recognition using a
    single video camera
  • Objective
  • Feature Extraction
  • The desk-based recognizer
  • The wearable-based recognizer

3
Part 1: Recognition and Interpretation of
Parametric Gesture
  1. Parametric gestures
  2. Previous Approaches
  3. Parametric Gaussian Hidden Markov Models
  4. Training/Testing
  5. Results
4
Recognition and Interpretation of Parametric
Gesture
  • What is a parametric gesture?
  • A gesture that has a parameter T
    which is needed to fully understand
    the gesture.
  • In the example, T is the size of
    the fish, given by the distance
    between the signer's hands.
  • Another example: a pointing gesture,
    where T is the direction pointed
    to.

I caught a fish. It was this big.
5
Recognition of Parametric Gesture: Previous
Approaches 1
  • Ad-hoc method for each gesture to be
    recognized
  • Use an ad-hoc method to extract the parameter T
    for each different parametric gesture
  • Problems
  • Difficult to write
  • Only works for gestures already labeled
  • Unknown gestures have to be modelled as noise
    from an existing prototype
  • A new method is needed for each gesture

6
Recognition of Parametric Gesture: Previous
Approaches 2
  • Use multiple HMMs to cover the parameter
    space
  • Use an HMM for each possible value of T in
    parameter space
  • Problems
  • It is unknown how many separate models will be
    necessary
  • As dimensionality of parameter space increases, a
    large number of models will be needed
  • Unreasonable demands on the amount of training
    data

7
Repetition: Standard continuous Gaussian HMMs
8
Repetition: Standard continuous Gaussian HMMs
  • Example Gaussian HMM

(Figure: example Gaussian HMM with states 1, 2 and 3; the likelihood for an output of 2 is about 10, given the system is in state 2)
9
Parametric Gaussian HMMs The model
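The model equation on this slide was an image and is not in the transcript. As a hedged reconstruction based on the Wilson/Bobick paper cited at the end: each state j keeps a Gaussian output distribution, but its mean depends linearly on the gesture parameter T (written below as a vector theta), while the covariance stays fixed:

  \hat{\mu}_j(\theta) = W_j \theta + \bar{\mu}_j
  P(x_t \mid q_t = j, \theta) = \mathcal{N}\big(x_t;\ \hat{\mu}_j(\theta),\ \Sigma_j\big)

Training therefore has to estimate W_j, \bar{\mu}_j and \Sigma_j for every state; this is the W_j whose magnitude is discussed in the results slides.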
10
Parametric Gaussian HMMs: Training
  • Training means: set the HMM parameters to
    maximize the probability of the training
    sequences
  • Each training sequence is paired with a value of
    T
  • Baum-Welch form of expectation-maximization alg.
    is used to update the parameters of the output
    probability distributions

11
Training Parametric Gaussian HMMs: Expectation-Maximization algorithm
  • Assumption: In addition to the observable data
    (the observation sequence x_t), there is hidden
    data (the state sequence q_t)
  • Expectation-Maximization algorithm
  • Expectation
  • Compute/guess the value of the hidden data given
    some of the observable data (Forward/Backward
    algorithm)
  • Maximization
  • Given this guess at the hidden data, compute an
    updated value of the parameters
  • Repeat until satisfied (change in parameters is
    small)
  • A lot of math; no more details here (a rough
    sketch of the EM loop follows below)
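To make the Expectation and Maximization steps concrete, here is a minimal, self-contained Baum-Welch sketch for an ordinary (non-parametric) 1-D Gaussian HMM. It is an illustration only: it omits numerical scaling and the T-dependent means of the parametric model.

  import numpy as np

  def gauss(x, mu, var):
      # 1-D Gaussian density
      return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

  def baum_welch(x, n_states=2, n_iter=20):
      T = len(x)
      A = np.full((n_states, n_states), 1.0 / n_states)   # transition probabilities
      pi = np.full(n_states, 1.0 / n_states)               # initial state probabilities
      mu = np.linspace(x.min(), x.max(), n_states)         # output means
      var = np.full(n_states, x.var() + 1e-6)              # output variances

      for _ in range(n_iter):
          # emission likelihoods b_j(x_t)
          B = np.array([[gauss(x[t], mu[j], var[j]) for j in range(n_states)]
                        for t in range(T)])

          # E-step: forward/backward recursions give the posterior over the
          # hidden state sequence q_t (unscaled, so only for short sequences)
          alpha = np.zeros((T, n_states))
          beta = np.zeros((T, n_states))
          alpha[0] = pi * B[0]
          for t in range(1, T):
              alpha[t] = (alpha[t - 1] @ A) * B[t]
          beta[-1] = 1.0
          for t in range(T - 2, -1, -1):
              beta[t] = A @ (B[t + 1] * beta[t + 1])
          gamma = alpha * beta
          gamma /= gamma.sum(axis=1, keepdims=True)        # P(q_t = j | x)

          xi = np.zeros((n_states, n_states))              # expected transition counts
          for t in range(T - 1):
              num = alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :]
              xi += num / num.sum()

          # M-step: re-estimate all parameters from the posteriors
          pi = gamma[0]
          A = xi / xi.sum(axis=1, keepdims=True)
          mu = (gamma * x[:, None]).sum(axis=0) / gamma.sum(axis=0)
          var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / gamma.sum(axis=0) + 1e-6

      return pi, A, mu, var

  # Tiny usage example: observations drawn from two regimes around 0 and 5.
  obs = np.concatenate([np.random.normal(0, 1, 30), np.random.normal(5, 1, 30)])
  print(baum_welch(obs))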

12
Training Parametric Gaussian HMMs: Training results
  • After applying the EM algorithm for each training
    sequence, we get new values for the parameters of
    the output probability distributions
  • Ready for testing!

13
Recognition of Parametric Gesture: Testing
  • Testing
  • Given a parameterized HMM and an input sequence,
    we wish to compute T and the probability of the
    input sequence.
  • Extracting T
  • Complicated in contrast to normal HMM testing
  • Again, use an Expectation-Maximization (EM)
    algorithm that finally leads to an estimate of T
  • Probability of the input sequence given T: use
    the Viterbi algorithm.
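In symbols (a reconstruction consistent with the bullets above, not copied from the slide), testing estimates the parameter by maximizing the likelihood of the observation sequence, then scores the sequence at that estimate:

  \hat{T} = \arg\max_{T} \, P(x_1, \ldots, x_N \mid T)
  \text{score} = P(x_1, \ldots, x_N \mid \hat{T}) \quad \text{(computed with the Viterbi algorithm)}

The EM iteration alternates between computing the state posteriors for the current guess of T (E-step) and re-solving for T given those posteriors (M-step).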

14
Recognition of Parametric Gesture: Results
STIVE input and output
  • Testing for the fish-size gesture
  • 30 examples of the fish gesture were collected
    using STIVE (STereo INteractive Virtual
    Environment) at a frame rate of 20 Hz
  • STIVE returned the 3D positions of head and hands
  • Each sequence was on average 43 samples long
  • T interpreted as fish size in inches
  • Values varied from 7.7 in (small fish) to 36.6
    inches (respectable catch)

15
Recognition of Parametric Gesture: Results
STIVE input and output
  • Testing for the fish-size gesture
  • 6-state parameterized HMM with no skip
    transitions or backward transitions
  • Training with 15 randomly chosen sequences out of
    the 30; the rest were used for testing

16
Recognition of Parametric Gesture: Results
Testing for the size gesture
(Figure: estimated sizes with mean and standard deviation)
Average absolute error of only 0.16 in
17
Recognition of Parametric Gesture: Results
  • Testing for the pointing gesture
  • HMM now parameterized by more than one variable
    (the (X, Y) position on a plane in front of the
    user)
  • Motion capture system to record wrist position of
    right hand at a frame rate of 30Hz
  • 50 sequences collected
  • T interpreted as position of the wrist on the
    pointing plane
  • 8-state parameterized HMM with no skip
    transitions or backward transitions
  • 20 sequences for training, 30 for testing

18
Recognition of Parametric Gesture: Results
  • Testing for the pointing gesture: Results

19
Recognition of Parametric Gesture: Results under
noise
The average error as a function of noise
  • N(0, x)-distributed noise added for testing
  • f(x) is mean error between estimated/measured T
    under noise and measured T in the noise-free case
  • Under noise, the HMM performs even better than
    directly measuring T
  • Why?
  • Direct measuring is more sensitive to noise,
    since only one still image is used to measure T;
    the HMM uses the complete sequence to extract T.

20
Recognition of Parametric Gesture: Results
  • Results quite good
  • Why?
  • Magnitude of W_j is greatest for states
    corresponding to the middle phase of the gestures
  • In the middle phases of the gestures, variation
    of T maximally impacts the execution of the
    gesture
  • System automatically learns which segment in the
    gesture is most diagnostic of T

21
Part 2: Real-time sign language recognition using
a single video camera
  1. Objective
  2. Feature Extraction
  3. The desk-based recognizer
  4. The wearable-based recognizer
22
Objective
  • Recognition of sentence-level American Sign
    Language (ASL)
  • Sentences of the form
  • personal pronoun, verb, noun, adjective,
    (same) personal pronoun
  • are to be recognized
  • Example: I like cars red

23
The American Sign Language
  • Language of choice for most deaf people in the
    United States
  • Uses approx. 6000 gestures for common words and
    finger spelling for communicating obscure words
  • Signed conversations proceed at about the pace of
    spoken conversation
  • Some aspects of ASL ignored for simplification
  • Storing objects in space for later reference,
    moving of eyebrows for questions or directives

24
Understanding ASL: The Task
  • Two extensible HMM-based systems are provided for
    recognition, both using one color camera
  • Desk mounted camera in front of user
  • Camera mounted in a cap worn by the user
  • Tracking stage does not attempt a fine
    description of hand shape; instead it concentrates
    on the evolution of the gestures through time
  • 40-word test lexicon with words that would
    generate coherent sentences given the grammar
    constraint

25
Understanding ASL Hidden Markov Modeling
  • Estimate the number of different states involved
    in specifying a sign to determine the initial HMM
    topology
  • For less complicated signs, skip transitions can
    be introduced
  • Here, a 4-state HMM with one skip transition was
    determined to be appropriate (see the sketch below)
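As an illustration only (the exact transition layout is an assumption, not given on the slides), a 4-state left-to-right HMM with one skip transition could be initialized like this:

  import numpy as np

  # Hypothetical initial transition matrix: each state may stay or move to the
  # next state; state 1 may also skip straight to state 3 (the skip transition).
  A_init = np.array([
      [0.4, 0.3, 0.3, 0.0],   # state 1: self, next, skip
      [0.0, 0.5, 0.5, 0.0],   # state 2: self, next
      [0.0, 0.0, 0.5, 0.5],   # state 3: self, next
      [0.0, 0.0, 0.0, 1.0],   # state 4: final, absorbing
  ])
  assert np.allclose(A_init.sum(axis=1), 1.0)  # each row is a probability distribution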

26
Understanding ASL Feature extraction - Hardware
  • Hands are tracked in real-time using a single
    color camera
  • 320x243 pixel resolution
  • Silicon Graphics 200 MHz workstation maintains
    hand tracking at 10 frames per second
    (sufficient)
  • Natural color of hands is needed

27
Understanding ASL Feature Extraction - Hand
segmentation
  • Hand segmentation
  • To segment each hand initially, find a pixel of
    the natural hand color in the image
  • Take this pixel as a seed and tolerantly grow the
    hand region by checking the 8 neighbours for the
    appropriate color
  • Labels 'left hand' and 'right hand' are assigned
    to whichever blob is leftmost and rightmost (a
    rough sketch of the region growing follows below)

(Figure: seed pixels and the two grown blobs, labelled left hand and right hand)
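A minimal sketch of this seed-based region growing (an assumed implementation; the tolerance value and the color-distance test are not from the paper):

  import numpy as np
  from collections import deque

  def grow_hand_region(image, seed, hand_color, tol=30.0):
      # Grow a blob from the seed pixel over 8-connected neighbours whose RGB
      # color lies within `tol` (Euclidean distance) of the natural hand color.
      h, w, _ = image.shape
      blob = np.zeros((h, w), dtype=bool)
      blob[seed] = True
      queue = deque([seed])
      while queue:
          y, x = queue.popleft()
          for dy in (-1, 0, 1):
              for dx in (-1, 0, 1):
                  ny, nx = y + dy, x + dx
                  if 0 <= ny < h and 0 <= nx < w and not blob[ny, nx] \
                          and np.linalg.norm(image[ny, nx].astype(float) - hand_color) < tol:
                      blob[ny, nx] = True
                      queue.append((ny, nx))
      return blob  # boolean mask of one hand region

A queue-based flood fill is used rather than recursion so that large blobs do not overflow the call stack.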
What about occluding hands?
28
Understanding ASL Feature extraction Features
used
  • 16-element feature vector constructed per frame
    (eight elements for each hand; a computation
    sketch follows after the footnotes below)
  • Centroid (X,Y) position
  • Change in (X,Y) to previous frame
  • Area in Pixels
  • Angle of axis of least inertia1 (found by first
    eigenvector of the blob)
  • Length of this eigenvector
  • Eccentricity2 of bounding ellipse

1. Inertia (German: Trägheit). 2. Eccentricity: here, the deviation from a circular shape.
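A sketch of how these per-hand features could be computed from a blob mask (an assumed implementation, reusing the hypothetical grow_hand_region() output; eight values per hand, sixteen for both hands):

  import numpy as np

  def blob_features(mask, prev_centroid=None):
      # mask: boolean image of one hand blob (e.g. from grow_hand_region)
      ys, xs = np.nonzero(mask)
      area = float(len(xs))                                # area in pixels
      cx, cy = xs.mean(), ys.mean()                        # centroid (X, Y)
      dx = dy = 0.0
      if prev_centroid is not None:                        # change to previous frame
          dx, dy = cx - prev_centroid[0], cy - prev_centroid[1]
      # principal axes from the eigenvectors of the blob's covariance matrix
      evals, evecs = np.linalg.eigh(np.cov(np.vstack([xs, ys])))
      major = evecs[:, 1]                                  # first eigenvector
      angle = np.arctan2(major[1], major[0])               # angle of axis of least inertia
      length = 2.0 * np.sqrt(evals[1])                     # length along this axis
      ecc = np.sqrt(max(0.0, 1.0 - evals[0] / max(evals[1], 1e-9)))  # eccentricity
      return np.array([cx, cy, dx, dy, area, angle, length, ecc])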
29
Understanding ASL Feature Extraction Occluding
hands
  • Occlusion in hand
    segmentation
  • Only one large blob
  • Assign each of the two hands the features of this
    single large blob
  • This method, combined with the time context
    provided by the HMM, is sufficient to distinguish
    many different signs that have hand occlusions as
    a trait (see the sketch below)
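A sketch of this fallback (reusing the hypothetical blob_features() above): when the hands merge into one blob, both hands simply receive that blob's features.

  import numpy as np

  def frame_features(blobs):
      # blobs: list of boolean hand masks found in the current frame
      if len(blobs) == 1:                                  # occlusion: one merged blob
          f = blob_features(blobs[0])
          return np.concatenate([f, f])                    # same features for both hands
      ordered = sorted(blobs, key=lambda b: np.nonzero(b)[1].mean())
      return np.concatenate([blob_features(ordered[0]),    # leftmost blob -> left hand
                             blob_features(ordered[-1])])  # rightmost blob -> right hand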

30
Understanding ASL The desk-based recognizer
  • Camera on a desk in front of the user
  • 478 sentences used, constructed from the 40-word
    lexicon
  • Each sign is 1 to 3 seconds long
  • No pause between signs in a sentence, but
    sentences themselves are distinct
  • 384 sentences used for training, rest for testing

31
Understanding ASL The desk-based recognizer -
Training
  • Sentences are divided into five equal portions
    for initial segmentation
  • Initial estimates for the means and variances of
    the output probabilities are provided iteratively
    using Viterbi alignment
  • Results are fed into a Baum-Welch re-estimator
    whose estimates are refined in embedded training
  • Contexts are not used, since they would require
    more data to train

32
Understanding ASL The desk-based recognizer
Test 1
  • Uses part-of-speech grammar
  • personal pronoun, verb, noun, adjective,
    (same) personal pronoun
  • Word recognition accuracy Acc is calculated by
    the formula given below
  • N = total number of words in the test set
  • S = number of substitutions
  • No insertions or deletions, since the number and
    class of words to be recognized are known
  • Acc = percentage of correctly recognized words
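The formula itself was a slide image and is missing from the transcript; with no insertions or deletions, the usual word-accuracy definition (likely the one used by Starner et al.) reduces to:

  Acc = \frac{N - S}{N} \times 100\%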

33
Understanding ASL The desk-based recognizer
Test 2
  • Does not use a part-of-speech grammar
  • Word recognition accuracy Acc is calculated by
    the formula given below
  • N = total number of words in the test set
  • S = number of substitutions
  • I = number of insertions
  • D = number of deletions
  • Insertions and deletions are possible, since the
    number of words and the word classes are unknown
  • Acc can now be negative
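Again the formula image is missing; the standard definition consistent with these bullets (and with Acc possibly becoming negative) is:

  Acc = \frac{N - D - S - I}{N} \times 100\%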

34
Understanding ASL The desk-based recognizer
Results
  • Third test performed: strip the absolute (X, Y)
    positions from the feature vector
  • Simulates daily use of the recognizer, where the
    signer is not always in the same position when
    the system is used
  • Word accuracy results

35
Understanding ASL The wearable-based recognizer
  • Camera mounted on a cap worn by the signer
  • Same 500 sentences
  • At beginning and end of sentence, hands were
    often found in a resting position
  • To take this into account, another token called
    silence was added to the dictionary
  • 400 sentences for training, 100 for testing

36
Understanding ASL The wearable-based recognizer
  • New grammar for testing purposes Only
    restriction is that each sentence is 5 words long
  • Word Accuracy Rate Acc is calculated in the same
    way as with the desk-based recognizer

37
Understanding ASL The wearable-based recognizer
- Results
38
End of presentation
  • Thanks for your attention!
  • References
  • Real-Time American Sign Language Recognition
    Using Desk and Wearable Computer Based Video
  • Thad Starner, Joshua Weaver, Alex Pentland
  • M.I.T. Media Laboratory Perceptual Computing
    Section Technical Report No. 466
  • IEEE PAMI, 1998
  • Recognition and Interpretation of Parametric
    Gesture
  • Andrew D. Wilson, Aaron F. Bobick
  • M.I.T. Media Laboratory Perceptual Computing
    Section Technical Report No. 421
  • International Conference on Computer Vision,
    1998
  • An Introduction to Hidden Markov Models
  • L.R. Rabiner and B.H. Juang
  • IEEE ASSP Magazine, pp. 4-16, Jan 1986

Any questions?