Transcript and Presenter's Notes

Title: Combined Gesture-Speech Analysis and Synthesis


1
Combined Gesture-Speech Analysis and Synthesis
  • M. Emre Sargin, Engin Erzin, Yücel Yemez, A.
    Murat Tekalp
  • {msargin, eerzin, yyemez, mtekalp}@ku.edu.tr
  • Multimedia Vision and Graphics Laboratory, Koç
    University

2
Outline
  • Project Objective
  • Technical Description
  • Preparation of Gesture-Speech Database
  • Detection of Gesture Elements
  • Gesture-Speech Correlation Analysis
  • Synthesis of Gestures Accompanying Speech
  • Resources
  • Work Plan
  • Team Members

3
Project Objective
  • The production of speech and gesture is
    interactive throughout the entire communication
    process.
  • Computer-Human Interaction systems should be
    interactive in the same way: in an edutainment
    application, an animated person's speech should be
    aided and complemented by his/her gestures.
  • Two main goals of this project:
  • Analysis and modeling of the correlation between
    speech and gestures.
  • Synthesis of correlated natural gestures
    accompanying speech.

4
Technical Description
  • Preparation of Gesture-Speech Database
  • Detection of Gesture Elements
  • Gesture-Speech Correlation Analysis
  • Synthesis of Gestures Accompanying Speech

5
Preparation of Database
  • Gestures of a specific person will be
    investigated.
  • The video database for that specific person
    should include the gestures that he/she
    frequently uses.
  • Locations of the head, arms, elbows, etc. should
    be easily detectable and traceable.

6
Detection of Gesture Elements
  • In this project, we consider arm and head
    gestures.
  • The main tasks in the detection of gesture
    elements:
  • Tracking of head region.
  • Tracking of hand and possibly shoulder and elbow.
  • Extraction of gesture features.
  • Recognition and labeling of gestures.

7
Head Region Tracking
  • To extract motion information from the head, one
    must first extract the head region.
  • An exhaustive search for the head in each frame
    is a possible solution; however, it is
    computationally inefficient.
  • Tracking is efficient in terms of computational
    complexity.
  • The motion information calculated for tracking
    will also be used for the head gesture features.

8
Tracking Methodology
  • Exhaustive search for head region in initial
    frame
  • Haar-Based Face Detection
  • Skin Color information
  • Extraction of motion information from head region
  • Optical flow vectors
  • Fitting global motion parameters optical flow
    vectors
  • Warp search window according to motion
    information.
  • Search for head region in the search window.
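
A minimal sketch of this tracking loop, using the Python bindings of the OpenCV library listed under Resources (a modern stand-in for the library as it existed at the time). The video file name, cascade choice, and all parameter values are illustrative assumptions, and the windowed re-detection step is only indicated in a comment.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("speaker.avi")            # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# 1) Exhaustive search in the initial frame: Haar-based face detection.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
x, y, w, h = cascade.detectMultiScale(prev_gray, 1.1, 4)[0]

# Corner features inside the head region feed the optical-flow step.
mask = np.zeros_like(prev_gray)
mask[y:y + h, x:x + w] = 255
pts = cv2.goodFeaturesToTrack(prev_gray, 50, 0.01, 5, mask=mask)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 2) Optical-flow vectors for the head region (pyramidal Lucas-Kanade).
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_old, good_new = pts[status == 1], nxt[status == 1]

    # 3) Fit global motion parameters to the optical-flow vectors.
    M, _ = cv2.estimateAffinePartial2D(good_old, good_new)

    # 4) Warp the search window according to the fitted motion; the
    #    windowed re-detection of the head region is omitted here.
    x, y = cv2.transform(np.array([[[x, y]]], np.float32), M)[0][0]

    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
```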

9
Head Tracking Results
10
Hand Tracking Methodology
  • The hand region will be extracted using skin
    color information.
  • Robust state-space tracking will be applied:
  • Observations are the position of the hand.
  • States are the position, speed, and acceleration
    of the hand.
  • Kalman filtering removes unwanted noise from the
    features.
  • In a regular Kalman filter, the parameters are
    fixed.
  • In a robust Kalman filter, the parameters are
    re-adjusted at each iteration to minimize the MSE
    and overcome the effects of abrupt changes in the
    motion of the hand.
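
A minimal 1-D constant-acceleration Kalman filter, as a sketch of the state-space tracker described above (one such filter per image coordinate). The frame rate, noise covariances, and synthetic measurements are illustrative assumptions; the robust variant's re-tuning is only noted in a comment.

```python
import numpy as np

dt = 1.0 / 25.0                      # assumed 25 fps video
F = np.array([[1, dt, 0.5 * dt**2],  # state transition for state
              [0, 1,  dt],           # [position, velocity, acceleration]
              [0, 0,  1.0]])
H = np.array([[1.0, 0.0, 0.0]])      # we observe hand position only
Q = np.eye(3) * 1e-3                 # process noise, fixed (regular KF)
R = np.array([[0.1]])                # measurement noise, fixed

x, P = np.zeros((3, 1)), np.eye(3)
measurements = np.cumsum(np.random.randn(100))  # stand-in position track

for z in measurements:
    # Predict.
    x, P = F @ x, F @ P @ F.T + Q
    # Update with the observed position z.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(3) - K @ H) @ P
    # A robust Kalman filter would re-adjust Q and R here from the
    # innovation statistics to absorb abrupt changes in hand motion.
```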

11
Extraction of Gesture Features
  • Head Gesture Features: the global motion
    parameters calculated within the head region will
    be used.
  • Hand Gesture Features: the hand's center-of-mass
    position and its calculated velocity will form
    the hand gesture features.
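
As a sketch of how these per-frame feature vectors might be assembled; the layouts below are illustrative assumptions, not the project's exact definitions.

```python
import numpy as np

def head_features(global_motion_params):
    # Head gesture features: the fitted global motion parameters
    # (e.g. a 2x3 affine matrix) flattened into one vector.
    return np.asarray(global_motion_params).ravel()

def hand_features(center_prev, center, dt):
    # Hand gesture features: center-of-mass position plus the
    # velocity estimated from consecutive frames.
    center_prev, center = np.asarray(center_prev), np.asarray(center)
    velocity = (center - center_prev) / dt
    return np.concatenate([center, velocity])    # [cx, cy, vx, vy]
```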

12
Gesture-Speech Correlation Analysis
  • Recognized gestures are labeled w.r.t. time.
  • Head Gestures: Down, Up, Left, Right, Left-Right,
    etc.
  • Arm Gestures: Abduction, Adduction, Extension,
    etc.
  • Recognized speech patterns are labeled w.r.t.
    time.
  • Semantic Info: Approval and Refusal phrases, etc.
  • Prosodic Info: Intonational phrases, ToBI
    transcriptions, etc.
  • Correlation analysis via examining:
  • Co-occurrence Matrix
  • Input/Output Hidden Markov Models

13
Co-occurrence Matrix
  • Estimation of the joint probability distribution
    function f(g, s):
  • For each time sample, give a vote to the related
    gesture-speech label pair.
  • For a specific speech element s_i, the most
    correlated gesture is
  • g_i = argmax_x f(g_x, s_i)
  • Relatively easy to compute.
  • Gives an intuition about what we are examining.
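
A small sketch of this voting scheme; the label alphabets are illustrative assumptions.

```python
import numpy as np

gestures = ["down", "up", "left", "right", "none"]   # assumed alphabet
speech = ["approval", "refusal", "other"]            # assumed alphabet
counts = np.zeros((len(gestures), len(speech)))

def vote(g_label, s_label):
    # One vote per time sample for the co-occurring label pair.
    counts[gestures.index(g_label), speech.index(s_label)] += 1

def most_correlated_gesture(s_label):
    # g_i = argmax_x f(g_x, s_i), with f the normalized count matrix.
    f = counts / counts.sum()        # estimate of the joint pdf f(g, s)
    return gestures[np.argmax(f[:, speech.index(s_label)])]

vote("up", "approval")                        # e.g. one labeled time sample
print(most_correlated_gesture("approval"))    # -> "up"
```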
14
Input/Output Hidden Markov Models
  • An IOHMM is a graphical model that allows the
    mapping of input sequences into output sequences.
  • It is used in three sequence-processing tasks:
  • Prediction
  • Regression
  • Classification
  • The model is trained to maximize the conditional
    probability of an output sequence y_1, ..., y_t
    given an input sequence x_1, ..., x_t.
  • In our project:
  • The input sequence will be the speech labels.
  • The output sequence will be the gesture labels.
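
A minimal forward pass for a discrete IOHMM, sketching how the conditional probability P(y | x) is evaluated when transitions and emissions depend on the input symbol. The toy dimensions and random parameters are illustrative assumptions, and training (e.g. by EM, as in the Torch library listed under Resources) is omitted.

```python
import numpy as np

n_states, n_in, n_out = 3, 2, 4
rng = np.random.default_rng(0)

# A[x] is the state-transition matrix used when the input symbol is x;
# B[x] holds the per-state output distributions for input symbol x.
A = rng.dirichlet(np.ones(n_states), size=(n_in, n_states))
B = rng.dirichlet(np.ones(n_out), size=(n_in, n_states))
pi = np.full(n_states, 1.0 / n_states)

def conditional_likelihood(xs, ys):
    # Forward algorithm for P(y_1..y_t | x_1..x_t) under the
    # input-driven model.
    alpha = pi * B[xs[0]][:, ys[0]]
    for x, y in zip(xs[1:], ys[1:]):
        alpha = (alpha @ A[x]) * B[x][:, y]
    return alpha.sum()

# Toy sequences: speech labels as inputs, gesture labels as outputs.
print(conditional_likelihood(xs=[0, 1, 1], ys=[2, 0, 3]))
```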

15
Synthesis of Gestures Accompanying Speech
  • Based on the methodology used in the correlation
    analysis, given a speech signal:
  • Features will be extracted.
  • The most probable speech label will be assigned
    to each speech pattern.
  • The gesture pattern that is most correlated with
    the speech pattern will be used to animate a
    stick model of a person.
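
Putting the pieces together, a sketch of this synthesis chain under the co-occurrence approach; every helper below is a hypothetical stand-in, and most_correlated_gesture() refers to the co-occurrence sketch above.

```python
def split_into_patterns(speech_signal):
    # Hypothetical: segment the signal into speech patterns and
    # extract features per pattern.
    return speech_signal

def classify_speech_label(pattern):
    # Hypothetical: return the most probable speech label for a pattern.
    return "approval"

def animate_stick_model(gesture_label):
    # Hypothetical: drive the stick-figure animation with the gesture.
    print("animating:", gesture_label)

def synthesize(speech_signal):
    # For each speech pattern, pick the most correlated gesture pattern
    # (via the co-occurrence estimate) and animate the stick model.
    for pattern in split_into_patterns(speech_signal):
        s_label = classify_speech_label(pattern)
        animate_stick_model(most_correlated_gesture(s_label))
```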

16
Resources
  • Database Preparation and Labeling:
  • VirtualDub
  • Anvil
  • Praat
  • Image Processing and Feature Extraction:
  • Matlab Image Processing Toolbox
  • OpenCV Image Processing Library
  • Gesture-Speech Correlation Analysis:
  • HTK HMM Toolbox
  • Torch Machine Learning Library

17
Work Plan
  • Timeline of the project
  • Schedule of the lectures

18
Team Members
  • Ferda Ofli
  • Koç University
  • Image, Video Processing and Feature Extraction
  • Yelena Yasinnik
  • Massachusetts Institute of Technology
  • Audio-Visual Correlation Analysis
  • Oya Aran
  • Boğaziçi University
  • Gesture Based Human-Computer Interaction Systems

19
Team Members
  • Alexey Anatolievich Karpov
  • Saint-Petersburg Institute for Informatics and
    Automation
  • Speech Based Human-Computer Interaction Systems
  • Stephen Wilson
  • University College Dublin
  • Audio-Visual Gesture Annotation
  • Alexander Refsum Jensenius
  • Department of Music, University of Oslo
  • Gesture Analysis

20
References
  • J. Yao and J. R. Cooperstock, "Arm Gesture
    Detection in a Classroom Environment," Proc.
    WACV '02, pp. 153-157, 2002.
  • Y. Azoz, L. Devi, R. Sharma, "Tracking Hand
    Dynamics in Unconstrained Environments," Proc.
    Int. Conference on Automatic Face and Gesture
    Recognition '98, pp. 274-279, 1998.
  • S. Malassiotis, N. Aifanti, M. G. Strintzis, "A
    Gesture Recognition System Using 3D Data," Proc.
    Int. Symposium on 3D Data Processing,
    Visualization and Transmission '02, pp. 190-193,
    2002.
  • J.-M. Chung, N. Ohnishi, "Cue Circles: Image
    Feature for Measuring 3-D Motion of Articulated
    Objects Using Sequential Image Pair," Proc. Int.
    Conference on Automatic Face and Gesture
    Recognition '98, pp. 474-479, 1998.
  • S. Kettebekov, M. Yeasin, R. Sharma, "Prosody
    based co-analysis for continuous recognition of
    coverbal gestures," Proc. ICMI '02, pp. 161-166,
    2002.
  • F. Quek, D. McNeill, R. Ansari, X.-F. Ma, R.
    Bryll, S. Duncan, K. E. McCullough, "Gesture cues
    for conversational interaction in monocular
    video," Proc. Int. Workshop on Recognition,
    Analysis, and Tracking of Faces and Gestures in
    Real-Time Systems '99, pp. 119-126, 1999.
  • For detailed information visit
    http://htk.eng.cam.ac.uk
  • L. Rabiner, B. Juang, "An introduction to hidden
    Markov models," IEEE ASSP Magazine, Vol. 3,
    Iss. 1, pp. 4-16, Jan 1986.
  • A. Just, O. Bernier, S. Marcel, "Recognition of
    isolated complex mono- and bi-manual 3D hand
    gestures," Proc. 6th ICAFGR, 2004.