1
multimodal emotion recognition and expressivity analysis: ICME 2005 Special Session
  • Stefanos Kollias, Kostas Karpouzis
  • Image, Video and Multimedia Systems Lab, National Technical University of Athens

2
expressivity and emotion recognition
  • affective computing
  • capability of machines to recognize, express,
    model, communicate and respond to emotional
    information
  • computers need the ability to recognize human
    emotion
  • everyday HCI is emotional: three-quarters of computer users admit to swearing at computers
  • user input and system reaction are important to
    pinpoint problems or provide natural interfaces

3
the targeted interaction framework
  • Generating intelligent interfaces with affective,
    learning, reasoning and adaptive capabilities.
  • Multidisciplinary expertise is the basic means for novel interfaces, including perception and emotion recognition, semantic analysis, cognition, modelling and expression generation, and the production of multimodal avatars capable of adapting to the goals and context of the interaction.
  • Humans function due to four primary modes of being, i.e., affect, motivation, cognition, and behavior; these are related to feeling, wanting, thinking, and acting.
  • Affect is particularly difficult, as it requires understanding and modelling the causes and consequences of emotions. The latter, especially as realized in behavior, is a daunting task.

4
(No Transcript)
5
everyday emotional states
I think you might be getting just a wee bit bored... maybe a coffee?
  • dramatic extremes (terror, misery, elation) are
    fascinating, but marginal for HCI.
  • the target of an affect-aware system:
  • register everyday states with an emotional component: excitement, boredom, irritation, enthusiasm, stress, satisfaction, amusement
  • achieve sensitivity to everyday emotional states

6
affective computing applications
  • detect specific incidents/situations that need
    human intervention
  • e.g. anger detection in a call center
  • naturalistic interfaces
  • keyboard/mouse/pointer paradigm can be difficult
    for the elderly, handicapped people or children
  • speech and gesture interfaces can be useful

7
the EU perspective
  • Until 2002, related research was dominated by mid-scale projects:
  • ERMIS: multimodal emotion recognition (facial expressions, linguistic and prosody analysis)
  • NECA: networked affective ECAs
  • SAFIRA: affective input interfaces
  • NICE: Natural Interactive Communication for Edutainment
  • MEGA: Multisensory Expressive Gesture Applications
  • INTERFACE: Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments

8
the EU perspective
  • FP6 (2002-2006) issued two calls for multimodal
    interfaces
  • Call 1 (April 2003) and Call 5 (September 2005)
    covering multimodal and multilingual areas
  • Integrated Projects: AMI (Augmented Multi-party Interaction) and CHIL (Computers In the Human Interaction Loop)
  • Networks of Excellence: HUMAINE and SIMILAR
  • other calls covered leisure and entertainment, e-Inclusion, cognitive systems, and presence and interaction

9
the HUMAINE Network of Excellence
  • FP6 Call 1 Network of Excellence: Research on Emotions and Human-Machine Interaction
  • start: 1st January 2004, duration: 48 months
  • IST thematic priority: Multimodal Interfaces
  • emotions in human-machine interaction
  • creation of a new, interdisciplinary research
    community
  • advancing the state of the art in a principled way

10
the HUMAINE Network of Excellence
  • 33 partner groups from 14 countries
  • coordinated by Queen's University of Belfast
  • goals of HUMAINE:
  • integrate existing expertise in psychology, computer engineering, cognition, interaction and usability
  • promote shared insight
  • http://emotion-research.net

11
moving forward
  • future EU orientations include (extracted from the Call 1 evaluation, 2004):
  • adaptability and re-configurable interfaces
  • collaborative technologies and interfaces in the
    arts
  • less explored modalities, e.g. haptics,
    bio-sensing
  • affective computing, including character and
    facial expression recognition and animation
  • more product innovation and industrial impact
  • FP7 direction: Simulation, Visualization, Interaction, Mixed Reality
  • blending semantic/knowledge and interface
    technologies

12
the special session
  • segment-based approach to the recognition of
    emotions in speech
  • M. Shami, M. Kamel, University of Waterloo
  • comparing feature sets for acted and spontaneous
    speech in view of automatic emotion recognition
  • T. Vogt, E. André, University of Augsburg
  • annotation and detection of blended emotions in
    real human-human dialogs recorded in a call
    center
  • L. Vidrascu, L. Devillers, LIMSI-CNRS, France
  • a real-time lip sync system using a genetic
    algorithm for automatic neural network
    configuration
  • G. Zoric, I. Pandzic, University of Zagreb
  • visual/acoustic emotion recognition
  • Cheng-Yao Chen, Yue-Kai Huang, Perry Cook,
    Princeton University
  • an intelligent system for facial emotion
    recognition
  • R. Cowie, E. Douglas-Cowie, Queen's University of Belfast, J. Taylor, King's College, S. Ioannou, M. Wallace, IVML/NTUA

13
the big picture
  • feature extraction from multiple modalities
  • prosody, words, face, gestures, biosignals
  • unimodal recognition
  • multimodal recognition
  • using detected features to cater for affective
    interaction

14
audiovisual emotion recognition
  • the core system combines modules dealing with
  • visual signs
  • linguistic content of speech (what you say)
  • paralinguistic content (how you say it)
  • and recognition based on all the signs

15
facial analysis module
  • face detection, i.e. finding a face without prior
    information about its location
  • using prior knowledge about where to look
  • face tracking
  • extraction of key regions and points in the face
  • monitoring of movements over time (as features for users' expressions/emotions)
  • provide confidence level for the validity of each
    detected feature

16
facial analysis module
  • face detection, obtained through SVM
    classification
  • facial feature extraction, by robust estimation
    of the primary facial features, i.e., eyes,
    mouth, eyebrows and nose
  • fusion of different extraction techniques, with
    confidence level estimation.
  • MPEG-4 FP and FAP feature extraction to feed the
    expression and emotion recognition task.
  • 3-D modeling for improved accuracy in FP and FAP
    feature estimation, at an increased computational
    load, when the facial user model is known.
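
One way to read the "fusion of different extraction techniques, with confidence level estimation" bullet above is a confidence-weighted average of the candidate locations that each technique proposes for a feature point. The sketch below assumes that reading; the actual ERMIS fusion rule is not given on the slide, and the function and weights are illustrative.

```python
import numpy as np

def fuse_estimates(points, confidences):
    """Fuse candidate feature-point locations proposed by several extraction
    techniques into a single estimate, weighting each by its confidence.
    Returns the fused (x, y) point and the mean confidence."""
    points = np.asarray(points, dtype=float)          # shape (n_techniques, 2)
    weights = np.asarray(confidences, dtype=float)    # one confidence per technique
    fused = (points * weights[:, None]).sum(axis=0) / weights.sum()
    return fused, float(weights.mean())

# three hypothetical detectors proposing the left mouth corner
fused_point, confidence = fuse_estimates(
    [(101.0, 220.0), (103.5, 221.0), (99.0, 219.5)],
    [0.9, 0.6, 0.4],
)
print(fused_point, confidence)
```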

17
facial analysis module
the extracted mask for the eyes and the detected feature points in the masks
18
FAP estimation
  • Absence of a clear quantitative definition of FAPs
  • It is possible to model FAPs through the movement of FDP feature points, using distances s(x, y) between them (see the sketch below)

e.g. close_t_r_eyelid (F20) - close_b_r_eyelid (F22) → D13 = s(3.2, 3.4) → f13 = D13 - D13_NEUTRAL
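
As a rough illustration of the distance-based FAP estimate above, the following Python sketch computes f13 for the right eyelid from two tracked feature points and the neutral-frame distance. The function and variable names are illustrative, not taken from the ERMIS implementation, and the sign convention is simply "opening minus neutral opening".

```python
import math

def point_distance(p, q):
    """Euclidean distance between two 2D feature points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def estimate_f13(upper_eyelid, lower_eyelid, d13_neutral):
    """Approximate the right-eyelid FAP combination (F20 - F22) as the change
    of the eyelid distance D13 = s(3.2, 3.4) relative to the neutral frame."""
    d13 = point_distance(upper_eyelid, lower_eyelid)
    return d13 - d13_neutral

# hypothetical tracked points for FDPs 3.2 (upper eyelid) and 3.4 (lower eyelid)
f13 = estimate_f13((120.0, 85.0), (120.5, 97.0), d13_neutral=14.0)
print(f"f13 = {f13:.2f}")  # negative: the eye is more closed than in the neutral frame
```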
19
face detection

block diagram: candidate regions → Classify (SVM classifier) → face / no face
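
A minimal sketch of the face / no face decision using an SVM, assuming scikit-learn and random toy patches in place of a real face database (the slide only states that an SVM classifier is used, not which features or library):

```python
import numpy as np
from sklearn.svm import SVC

# toy training data: flattened grayscale windows, label 1 = face, 0 = no face
# (in practice the windows would come from an annotated face database)
rng = np.random.default_rng(0)
X_train = rng.random((200, 19 * 19))          # hypothetical 19x19 pixel windows
y_train = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

def classify_window(window):
    """Return 'face' or 'no face' for a single flattened image window."""
    label = clf.predict(window.reshape(1, -1))[0]
    return "face" if label == 1 else "no face"

print(classify_window(rng.random(19 * 19)))
```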

20
face detection
detected face → estimation of the active contour of the face → extraction of the facial area
21
facial feature extraction
extraction of the eye and mouth key regions within the face
extraction of MPEG-4 Facial Points (FPs), i.e. key points in the eye and mouth regions
22
other visual features
  • visemes, eye gaze, head pose
  • movement patterns, temporal correlations
  • hand gestures, body movements
  • deictic/conversational gestures
  • body language
  • measurable parameters to render expressivity on affective ECAs
  • spatial extent, repetitiveness, volume, etc. (a rough sketch of such measures follows)
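
The expressivity parameters listed above can be approximated from tracked hand coordinates. The sketch below shows one plausible way to compute a spatial extent and a path length for a 2D hand trajectory; the definitions are illustrative and not those used by the session papers.

```python
import numpy as np

def expressivity_features(trajectory):
    """Crude expressivity descriptors for a hand trajectory,
    given as a sequence of (x, y) positions over time."""
    traj = np.asarray(trajectory, dtype=float)
    extent = traj.max(axis=0) - traj.min(axis=0)       # bounding-box size
    spatial_extent = float(np.linalg.norm(extent))     # diagonal of the box
    path_length = float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())
    return {"spatial_extent": spatial_extent, "path_length": path_length}

# hypothetical trajectory of a waving hand
print(expressivity_features([(0, 0), (10, 5), (0, 10), (10, 15), (0, 20)]))
```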

23
video analysis using 3D
  • Step 1: scan or approximate a 3D model
  • (in this case estimated from video data only, using a face space approach)

24
video analysis using 3D
  • Step 2: represent the 3D model using a predefined template geometry; the same template is used for expressions.
  • The template shows higher density around the eyes and mouth and lower density around flatter areas such as the cheeks and forehead.

25
video analysis using 3D
  • Step 3: construct a database of facial expressions by recording various actors. The statistics derived from these performances are stored in terms of a Dynamic Face Space (see the sketch below).
  • Step 4: apply the expressions to the actor in the video data
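
Storing the statistics of the recorded performances "in terms of a Dynamic Face Space" suggests a linear subspace of expression deformations. A minimal sketch, assuming a PCA-style face space built from stacked vertex displacements (the slide does not spell out the actual construction):

```python
import numpy as np

def build_face_space(expression_meshes, neutral_mesh, n_components=10):
    """Build a linear 'face space' from recorded expression meshes.
    Each mesh is an (n_vertices, 3) array; displacements from the neutral
    mesh are stacked and reduced with PCA (via SVD)."""
    neutral = neutral_mesh.reshape(-1)
    D = np.stack([m.reshape(-1) - neutral for m in expression_meshes])
    D -= D.mean(axis=0)
    _, _, components = np.linalg.svd(D, full_matrices=False)
    return components[:n_components]          # basis of expression deformations

def apply_expression(neutral_mesh, basis, coefficients):
    """Synthesise an expression as neutral + weighted sum of basis deformations."""
    deformation = coefficients @ basis
    return neutral_mesh + deformation.reshape(neutral_mesh.shape)

# hypothetical usage: 8 recordings of a 100-vertex template mesh
rng = np.random.default_rng(0)
neutral = rng.random((100, 3))
recordings = [neutral + 0.01 * rng.standard_normal((100, 3)) for _ in range(8)]
basis = build_face_space(recordings, neutral, n_components=4)
expression = apply_expression(neutral, basis, np.array([0.5, 0.0, 0.0, 0.0]))
```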

26
video analysis using 3D
  • Step 5: matching; rotate the head, apply various expressions and match the current state against the 2D video frame
  • a global minimization process (sketched below)
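
A toy version of the matching step, where pose and expression parameters are searched so that a rendered model matches the current video frame. The renderer is a placeholder, and a local optimiser stands in for the global minimization the slides describe (which takes several minutes per frame); the original system's procedure is not specified beyond these slides.

```python
import numpy as np
from scipy.optimize import minimize

def fit_frame(frame, render, x0):
    """Find pose + expression parameters that best explain one video frame.
    `render(params)` is assumed to return a synthetic image with the same
    shape as `frame`; the cost is the sum of squared pixel differences."""
    def cost(params):
        return float(np.sum((render(params) - frame) ** 2))
    result = minimize(cost, x0, method="Nelder-Mead")
    return result.x
```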

27
video analysis using 3D
  • the global matching/minimization process is complex
  • it is sensitive to:
  • illumination, which may vary across the sequence
  • shading and shadowing effects on the face
  • color changes or color differences
  • variability in expressions: some expressions cannot be generated using the statistics of the a priori recorded sequences
  • it is time consuming (several minutes per frame)

28
video analysis using 3D
local template matching and pose estimation (figures)
29
video analysis using 3D
30
video analysis using 3D
3D models
31
video analysis using 3D
Add expressions
32
auditory module
  • linguistic analysis aims to extract the words
    that the speaker produces
  • paralinguistic analysis aims to extract
    significant variations in the way words are
    produced - mainly in pitch, loudness, timing, and
    voice quality
  • both are designed to cope with the less than
    perfect signals that are likely to occur in real
    use

33
linguistic analysis
(a) the Linguistic Analysis Subsystem; (b) the Speech Recognition Module: enhanced speech signal → parameter extraction → search engine (acoustic modeling, language modeling, dictionary module) → text
34
paralinguistic analysis
  • ASSESS, developed by QUB, describes speech at multiple levels:
  • intensity and spectrum; edits, pauses and frication; raw pitch estimates and a smooth fitted curve; rises and falls in intensity and pitch
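
ASSESS itself is a QUB tool, but comparable raw descriptors (a pitch track, frame-level intensity, a crude pause estimate) can be pulled from a speech signal with an off-the-shelf library. The sketch below assumes librosa and is in no way a re-implementation of ASSESS; thresholds and the sampling rate are arbitrary.

```python
import librosa
import numpy as np

def basic_prosody(path):
    """Extract rough prosodic descriptors: an f0 (pitch) track, a frame-level
    intensity (RMS) track, and the fraction of low-energy frames as a crude
    pause estimate."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    rms = librosa.feature.rms(y=y)[0]
    pause_ratio = float(np.mean(rms < 0.1 * rms.max()))   # arbitrary threshold
    return {
        "mean_f0": float(np.nanmean(f0)),
        "mean_rms": float(rms.mean()),
        "pause_ratio": pause_ratio,
        "voiced_ratio": float(np.mean(voiced_flag)),
    }
```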

35
integrating the evidence
  • level 1:
  • facial emotion
  • phonetic emotion
  • linguistic emotion
  • level 2:
  • total emotional state (the result of the "level 1" emotions)
  • modeling technique: fuzzy set theory (research by Massaro suggests this models the way humans integrate signs)
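
A minimal sketch of the two-level integration: each level-1 module returns membership degrees per emotion label, and the level-2 state aggregates them. A simple weighted aggregation stands in for the fuzzy rule base, which the slide does not describe in detail; labels and weights are illustrative.

```python
def fuse_emotions(facial, phonetic, linguistic, weights=(0.4, 0.3, 0.3)):
    """Combine level-1 memberships (dicts: emotion label -> degree in [0, 1])
    into a level-2 emotional state by weighted aggregation, then pick the
    label with the highest combined membership."""
    labels = set(facial) | set(phonetic) | set(linguistic)
    combined = {}
    for label in labels:
        combined[label] = (
            weights[0] * facial.get(label, 0.0)
            + weights[1] * phonetic.get(label, 0.0)
            + weights[2] * linguistic.get(label, 0.0)
        )
    best = max(combined, key=combined.get)
    return best, combined

state, memberships = fuse_emotions(
    facial={"boredom": 0.6, "neutral": 0.4},
    phonetic={"boredom": 0.3, "irritation": 0.5},
    linguistic={"neutral": 0.7},
)
print(state, memberships)
```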

36
integrating the evidence
  • mechanisms linking attention and emotion in the
    brain form a useful model

diagram: goals (SFG); IMC (heteromodal cortex); goals/inhibition (ACG); visual input; thalamus / superior colliculus; salience (NBM, ACh source); valence (amygdala)
37
biosignal analysis
  • different emotional expressions produce different changes in autonomic activity:
  • anger: increased heart rate and skin temperature
  • fear: increased heart rate, decreased skin temperature
  • happiness: decreased heart rate, no change in skin temperature
  • easily integrated with external channels (face and speech)
  • presentation by J. Kim at the HUMAINE WP4 workshop, September 2004
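
The autonomic patterns above can be written as simple rules over heart-rate and skin-temperature changes. The toy function below encodes only the three patterns listed on the slide and is obviously not a complete biosignal recogniser; the threshold is arbitrary.

```python
def autonomic_pattern(delta_heart_rate, delta_skin_temp, threshold=0.5):
    """Map changes (relative to a resting baseline) in heart rate and skin
    temperature to one of the three patterns listed on the slide."""
    hr_up = delta_heart_rate > threshold
    hr_down = delta_heart_rate < -threshold
    temp_up = delta_skin_temp > threshold
    temp_down = delta_skin_temp < -threshold
    temp_flat = not (temp_up or temp_down)

    if hr_up and temp_up:
        return "anger"
    if hr_up and temp_down:
        return "fear"
    if hr_down and temp_flat:
        return "happiness"
    return "unknown"

print(autonomic_pattern(2.0, 1.2))   # -> anger
print(autonomic_pattern(1.5, -0.8))  # -> fear
```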

38
biosignal analysis
Acoustics and noise
EEG - brain waves
Respiration - breathing rate
Temperature
EMG - muscle tension
BVP - blood volume pulse
GSR - skin conductivity
EKG - heart rate
39
biosignal analysis
  • skin sensing requires physical contact
  • accuracy and robustness to motion artifacts need to be improved
  • vulnerable to distortion
  • most research measures artificially elicited emotions in a lab environment, from a single subject
  • different individuals show emotion with different responses in the autonomic channels (hard to generalize across subjects)
  • physiological emotion recognition is rarely studied; the literature offers ideas rather than well-defined solutions

40
multimodal emotion recognition
  • recognition models and application dependency:
  • discrete / dimensional / appraisal theory models
  • theoretical models of multimodal integration:
  • direct / separate / dominant / motor integration
  • modality synchronization:
  • visemes, EMG/FAPs, SC-RSP and speech
  • temporal evolution and modality sequentiality (see the alignment sketch below)
  • multimodal recognition techniques:
  • classifiers, context, goals, cognition/attention, modality significance in interaction
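
The synchronization issue (visemes, EMG/FAPs, skin conductance and respiration arrive at different rates than speech and video features) usually means resampling every feature stream onto a common timeline before fusion. A minimal sketch with linear interpolation, under that assumption; stream names and rates are illustrative.

```python
import numpy as np

def align_streams(streams, frame_rate=25.0):
    """Resample feature streams with different sampling rates onto a common
    timeline (here: a video-like frame rate), so frame-level fusion is possible.
    `streams` maps a name to (timestamps_in_seconds, values)."""
    t_end = min(ts[-1] for ts, _ in streams.values())
    common_t = np.arange(0.0, t_end, 1.0 / frame_rate)
    aligned = {
        name: np.interp(common_t, np.asarray(ts), np.asarray(vals))
        for name, (ts, vals) in streams.items()
    }
    return aligned, common_t

aligned, t = align_streams({
    "fap_f13": ([0.0, 0.04, 0.08, 0.12], [0.0, -1.0, -2.0, -1.5]),   # ~25 Hz
    "skin_conductance": ([0.0, 0.1, 0.2], [2.0, 2.1, 2.3]),          # ~10 Hz
})
print(t[:3], aligned["fap_f13"][:3])
```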