Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues

1
Using Word-level Features to Better Predict
Student Emotions during Spoken Tutoring Dialogues
  • Mihai Rotaru
  • Diane J. Litman
  • DoD Group Meeting Presentation

2
Introduction
  • Why is it important to detect/handle emotions?
  • Emotion annotation
  • Classification task
  • Previous work

3
(Spoken) Tutoring dialogues
  • Education
  • Classroom setting
  • Human (one-on-one) tutoring
  • Computer tutoring (ITS: Intelligent Tutoring
    Systems)
  • Addressing the learning gap between human and
    computer tutoring
  • Dialogue-based ITS (e.g., Why2)
  • Improve the language understanding module of ITS
  • Incorporate affective reasoning
  • Connection between learning and student emotional
    state
  • Adding human-provided emotional scaffolding to a
    reading tutor increases student persistence (Aist
    et al., 2002)

4
Human-Computer Excerpt
  • Tutor26: Let me ask my question in a more
    general way. What happens to the velocity of a
    body when there is no force acting on it?
  • Student27: dammit (ASR: it is)
  • Tutor28: Could you please repeat that?
  • Student29: same (ASR: i same)
  • Tutor30: Right. And what happens to the
    HORIZONTAL velocity of a body when there is no
    HORIZONTAL force acting on it?
  • Student31: zero (ASR: the zero)
  • Tutor32: Well... OK, let's consider how forces
    in the vertical direction affect the motion of a
    body in a horizontal direction in a different
    scenario <omitted>
  • Student33: oh gravity you already said this (ASR:
    beats gravity you're exert less)
  • Tutor34: Fine. Are there any other forces acting
    on the apple as it falls?
  • Student35: no why are you doing this again (ASR:
    no y and to it yes)
  • Tutor36: Correct. The only force acting on the
    apple as it falls to the earth is the earth's
    gravitational force. In which direction does
    gravitational force act?
  • Student37: downward you computer (ASR: downward
    you computer)

5
Affective reasoning
  • Prerequisites
  • Dialogue-based ITS: Why2
  • Interaction via speech: ITSPOKE (Intelligent
    Tutoring SPOKEn dialogue system)
  • Affective reasoning
  • Detect student emotions
  • Handle student emotions

6
  • Back-end is Why2-Atlas system (VanLehn et al.,
    2002)
  • Sphinx2 speech recognition and Cepstral
    text-to-speech

9
Student emotions
  • Emotion annotation
  • Perceived, intuitive expressions of emotion
  • Relative to other turns in context and tutoring
    task
  • 3 Main emotion classes
  • Negative - e.g. uncertain, bored, irritated,
    confused, sad (question turns)
  • Positive - e.g. confident, enthusiastic
  • Neutral - no strong expression of negative or
    positive emotion (grounding turns)
  • Corpora
  • Human-Human (453 student turns from 10 dialogues)
  • Human-Computer (333 student turns from 15
    dialogues)

10
Annotation example
  • Tutor: Uh let us talk of one car first.
  • Student: ok. (EMOTION: NEUTRAL)
  • Tutor: If there is a car, what is it that exerts
    force on the car such that it accelerates
    forward?
  • Student: The engine. (EMOTION: POSITIVE)
  • Tutor: Uh well engine is part of the car, so how
    can it exert force on itself?
  • Student: um (EMOTION: NEGATIVE)

11
Classification task
  • 3 Levels of Annotation Granularity
  • NPN - Negative, Positive, Neutral
  • NnN - Negative, Non-Negative
  • positives and neutrals are conflated as
    Non-Negative
  • EnE - Emotional, Non-Emotional
  • negatives and positives are conflated as
    Emotional; neutrals are Non-Emotional
  • useful for triggering system adaptation (HH
    corpus analysis)
  • Agreed subset
  • Predict the class of each student turn
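Because NnN and EnE are only relabelings of the three NPN classes, the conflation fits in a few lines. A minimal Python sketch; the label strings and function names are hypothetical, and reading "agreed subset" as "turns on which two annotators chose the same label" is our assumption.

```python
# Hypothetical label strings; the slides do not specify an encoding.
NPN = ("negative", "positive", "neutral")


def to_nnn(label: str) -> str:
    """NnN: positives and neutrals are conflated as Non-Negative."""
    if label not in NPN:
        raise ValueError(f"unknown NPN label: {label!r}")
    return "negative" if label == "negative" else "non-negative"


def to_ene(label: str) -> str:
    """EnE: negatives and positives are Emotional; neutrals are Non-Emotional."""
    if label not in NPN:
        raise ValueError(f"unknown NPN label: {label!r}")
    return "non-emotional" if label == "neutral" else "emotional"


def agreed_subset(labels_a, labels_b):
    """Assumption: keep only turns where both annotators chose the same label.

    Returns (turn_index, label) pairs so the surviving turns can be looked up later.
    """
    return [(i, a) for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]


# The three student turns from the annotation example (slide 10).
example = ["neutral", "positive", "negative"]
print([to_nnn(t) for t in example])  # ['non-negative', 'non-negative', 'negative']
print([to_ene(t) for t in example])  # ['non-emotional', 'emotional', 'emotional']
```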

12
Previous work - Features
  • Human-Human
  • 5 feature types
  • Acoustic-prosodic
  • amplitude, pitch, duration
  • Lexical
  • Other automatic
  • Manual
  • Identifiers
  • Combinations
  • Current turn
  • Contextual
  • Local: previous two turns
  • Global: all turns so far
  • Human-Computer
  • 3 feature types
  • Acoustic-prosodic
  • amplitude, pitch, duration
  • Lexical
  • Other automatic
  • Manual
  • Identifiers
  • Combinations
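The current/contextual split can be sketched for a single numeric feature. This Python fragment is illustrative only: taking the raw values of the two preceding turns as "local" context and a running mean over all turns so far as "global" context are assumptions, not the exact feature definitions used in the previous work.

```python
from statistics import mean


def contextual_features(values, i):
    """Current, local and global context for turn i of one numeric feature.

    values: one number per student turn (e.g. mean pitch), in dialogue order.
    Assumptions (not from the slides): "local" = the raw values of the two
    preceding turns (None when the dialogue is too short), and "global" = the
    mean over all turns so far, including the current one.
    """
    return {
        "current": values[i],
        "local_prev1": values[i - 1] if i >= 1 else None,
        "local_prev2": values[i - 2] if i >= 2 else None,
        "global_mean": mean(values[: i + 1]),
    }


pitch_means = [100.0, 120.0, 140.0]  # toy numbers, not corpus data
print(contextual_features(pitch_means, 2))
# {'current': 140.0, 'local_prev1': 120.0, 'local_prev2': 100.0, 'global_mean': 120.0}
```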

13
Previous work - Results
Litman and Forbes, ACL 2004
14
How to improve?
  • Use word-level features instead of turn-level
    features
  • Extend the pitch feature set
  • Simplified word-level emotion model

15
Why word-level features?
  • Emotion might not be expressed over the entire
    turn
  • This is great

(Figure: the example turn, with labels Angry and Happy)
16
Why word-level features? (2)
  • Can approximate the pitch contour better at the
    sub-turn level
  • Especially for longer turns

(Figure: pitch contour of the example turn "This is great")
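To make the sub-turn argument concrete, the sketch below averages f0 within each word's time span and compares that with a single turn-level mean. The frame format, the word alignments, and all numbers are invented for illustration.

```python
from statistics import mean


def word_pitch_means(frames, words):
    """Mean f0 per word: a piecewise approximation of the turn's pitch contour.

    frames: (time_s, f0_hz) pairs; f0 == 0 marks unvoiced frames, which are skipped.
    words:  (word, start_s, end_s) alignments, e.g. from the recognizer (assumed).
    """
    result = []
    for word, start, end in words:
        voiced = [f0 for t, f0 in frames if start <= t < end and f0 > 0]
        result.append((word, mean(voiced) if voiced else None))
    return result


# Toy contour for "This is great": rises, then falls (invented numbers).
frames = [(0.0, 120), (0.1, 130), (0.2, 0), (0.3, 180),
          (0.4, 200), (0.5, 150), (0.6, 110), (0.7, 100)]
words = [("This", 0.0, 0.2), ("is", 0.2, 0.4), ("great", 0.4, 0.8)]

turn_mean = mean(f0 for _, f0 in frames if f0 > 0)
print(round(turn_mean, 1))              # 141.4: one number, shape lost
print(word_pitch_means(frames, words))  # per-word means: This 125, is 180, great 140
```

The turn-level mean collapses the rise on "is" and the fall on "great" into one value; the per-word means keep that shape.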
17
Extended pitch feature set
  • Previous work
  • Min, Max
  • Avg, Stdev
  • Extend with
  • Start, End
  • Regression coefficient and regression error
  • Quadratic regression coefficient

from Batliner et al. 2003
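One way to compute the extended set for a single contour, as a hedged Python sketch. The slides name the features but not their exact definitions, so the x axis (frame index), the regression error (mean squared residual of the linear fit), and the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_coefficient(xs, ys):
    """Least-squares a in y = a*x**2 + b*x + c, solved with Cramer's rule."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # s[k] = sum of x^k
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    # Replace the first column (the unknown a) with the right-hand side.
    numerator = [[rhs[r]] + normal[r][1:] for r in range(3)]
    return _det3(numerator) / _det3(normal)


def pitch_features(f0):
    """Extended pitch feature set for one voiced f0 contour (one value per frame).

    Assumptions: x is the frame index, "regression error" is the mean squared
    residual of the linear fit, and stdev is the population standard deviation.
    """
    if len(f0) < 3:
        raise ValueError("need at least 3 voiced frames")
    xs = list(range(len(f0)))
    mx, my = mean(xs), mean(f0)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, f0))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return {
        "min": min(f0), "max": max(f0),          # previous work
        "avg": mean(f0), "stdev": pstdev(f0),    # previous work
        "start": f0[0], "end": f0[-1],           # extension
        "lin_slope": slope,                      # extension: regression coefficient
        "lin_error": mean((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, f0)),
        "quad_coef": quadratic_coefficient(xs, f0),  # extension
    }


rise_fall = [120.0, 160.0, 180.0, 160.0, 120.0]  # invented contour
print(round(pitch_features(rise_fall)["quad_coef"], 2))  # -14.29 (negative: rise then fall)
```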
18
But wait
(Figure: turn-level vs. word-level pipelines. Turn-level: student turn, one feature vector, machine learning, turn emotional class. Word-level: one feature vector per word (Word 1 to Word n), with an open question ("?") about how machine learning should map these vectors to the turn emotional class.)
Sönmez et al., 1998
19
Word-level emotion model
(Figure: turn-level vs. word-level pipelines. Turn-level: student turn, one feature vector, machine learning, turn emotional class. Word-level: a feature vector for each word (Word 1 to Word n) is mapped by machine learning to a word-level emotion, and the word-level emotions are combined into the turn emotional class.)
20
Word-level emotion model
  • Training phase
  • Each word labeled with turn class
  • Extra features to identify the position of the
    word in the turn (distance in words from the
    beginning and end of the turn)
  • Learn emotion model at the word level
  • Test phase
  • Predict each word class based on the learned
    model
  • Use majority/weighted voting to label the turn
    based on its word classes
  • Ties are broken randomly
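The training and test phases can be sketched as two small functions. This is a minimal illustration, not the authors' code: feature keys are hypothetical, positions are counted from 0, and the weights for weighted voting (e.g. classifier confidence) are an assumption since the slides do not say what they are.

```python
import random
from collections import Counter


def word_instances(word_features, turn_label):
    """Training phase: one instance per word, each labeled with the turn's class.

    word_features: one feature dict per word (hypothetical keys such as "f0_mean").
    Adds the word's distance in words from the start and from the end of the turn
    (0-based here; the slides do not say which origin was used).
    """
    n = len(word_features)
    return [({**feats, "dist_from_start": i, "dist_from_end": n - 1 - i}, turn_label)
            for i, feats in enumerate(word_features)]


def vote(word_predictions, weights=None, rng=random):
    """Test phase: label the turn from its predicted word classes.

    Plain majority voting when weights is None; otherwise each word's vote counts
    with its weight. Ties are broken randomly.
    """
    if weights is None:
        weights = [1.0] * len(word_predictions)
    totals = Counter()
    for label, weight in zip(word_predictions, weights):
        totals[label] += weight
    best = max(totals.values())
    tied = sorted(label for label, score in totals.items() if score == best)
    return rng.choice(tied)


print(vote(["emotional", "non-emotional", "emotional"]))  # emotional
```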

21
Questions to answer
  • Will word-level features work better than
    turn-level features for emotion prediction?
  • Yes
  • If yes, where does the advantage come from?
  • Better prediction of longer turns
  • Is there a feature set that offers robust
    performance?
  • Yes. Combination of pitch and lexical features at
    word level.

22
Experiments
  • EnE classification, agreed turns
  • Two contrasting corpora
  • Two contrasting learners (WEKA)
  • IB1: nearest-neighbor classifier
  • ADA: boosted decision trees
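IB1 is WEKA's 1-nearest-neighbor learner. The sketch below mimics the idea under stated assumptions (numeric features rescaled to [0, 1] with the training minimum and maximum, Euclidean distance, first neighbor found wins ties); it is not WEKA's implementation, and the toy data is invented.

```python
import math


def ib1_predict(train_x, train_y, query):
    """1-nearest-neighbor prediction in the spirit of WEKA's IB1.

    Assumptions: numeric features, each rescaled to [0, 1] with the training
    min/max, Euclidean distance, and the first-found neighbor wins ties.
    """
    columns = list(zip(*train_x))
    ranges = [(min(c), max(c)) for c in columns]

    def scale(row):
        return [0.0 if hi == lo else (v - lo) / (hi - lo)
                for v, (lo, hi) in zip(row, ranges)]

    scaled = [scale(row) for row in train_x]
    q = scale(query)
    nearest = min(range(len(scaled)), key=lambda i: math.dist(scaled[i], q))
    return train_y[nearest]


# Hypothetical turns described by (mean f0 in Hz, duration in s).
X = [[100.0, 1.0], [200.0, 3.0]]
y = ["non-emotional", "emotional"]
print(ib1_predict(X, y, [190.0, 2.8]))  # emotional
```

Rescaling matters here because f0 in hertz would otherwise dominate a duration measured in seconds.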

23
Feature sets
  • Only pitch and lexical features
  • 6 sets of features
  • Turn level
  • Lex-Turn: only lexical
  • Pitch-Turn: only pitch
  • PitchLex-Turn: lexical and prosodic
  • Word level
  • Lex-Word: only lexical + positional
  • Pitch-Word: only pitch + positional
  • PitchLex-Word: lexical and prosodic + positional
  • Baseline: majority class
  • 10 x 10 cross-validation
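A sketch of the evaluation protocol: 10 runs of 10-fold cross-validation, each run with a different shuffle, scored here with the majority-class baseline. Function names are hypothetical; unlike WEKA, which stratifies folds for nominal classes by default, this version does not stratify.

```python
import random
from collections import Counter
from statistics import mean


def train_majority(train_x, train_y):
    """Majority-class baseline: always predict the most frequent training label."""
    top = Counter(train_y).most_common(1)[0][0]
    return lambda _row: top


def repeated_cv_accuracy(xs, ys, train, runs=10, folds=10, seed=0):
    """runs x folds cross-validation: mean accuracy over all runs*folds test folds.

    train(train_x, train_y) must return a predict(row) callable. Each run reshuffles
    the data with its own seed before splitting it into interleaved folds.
    """
    if len(ys) < folds:
        raise ValueError("need at least one instance per fold")
    accuracies = []
    for run in range(runs):
        order = list(range(len(ys)))
        random.Random(seed + run).shuffle(order)
        for k in range(folds):
            test = order[k::folds]
            held_out = set(test)
            train_idx = [i for i in order if i not in held_out]
            predict = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            hits = sum(predict(xs[i]) == ys[i] for i in test)
            accuracies.append(hits / len(test))
    return mean(accuracies)


labels = ["emotional"] * 15 + ["non-emotional"] * 5  # toy labels
print(repeated_cv_accuracy(list(range(20)), labels, train_majority))  # 0.75
```

For the word-level feature sets, the split would presumably have to be made over turns, so that words from one turn never land in both the training and the test folds; the slides do not say how this was handled.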

24
Results: IB1 on HH
  • Word-level features significantly outperform
    turn-level features
  • Word-level better than turn-level on longer turns
  • Best performers: Lex-Word, PitchLex-Word

25
Results: ADA on HH
  • Turn-level performance increases substantially
    (compared with IB1)
  • Word-level significantly better than turn-level
    on feature sets with pitch
  • Word-level better than turn-level on longer turns
    but the difference is smaller
  • Best performers: Lex-Turn, Lex-Word,
    PitchLex-Word

26
Results: IB1 on HC
  • Word-level features significantly outperform
    turn-level features
  • Lexical information less helpful than on the HH
    corpus
  • Word-level better than turn-level on longer turns
  • Best performers: Pitch-Word, PitchLex-Word

27
Results: ADA on HC
  • Word-level vs. turn-level difference no longer
    significant
  • IB1 better than ADA on word-level features
  • ADA has bigger variance on this corpus
  • Word-level better than turn-level on longer turns
    but the difference is smaller
  • Best performers: Pitch-Turn, Pitch-Word,
    PitchLex-Turn, PitchLex-Word

28
Discussion
  • Lexical features: turn-level and word-level
    performance are similar
  • Performance dependent on corpus and learner
  • Pitch features: turn-level and word-level
    differ significantly
  • Word-level better than turn-level (4/6)
  • PitchLex-Word a consistent best performer
  • Our best accuracies comparable with previous work

29
Conclusions and Future Work
  • Word-level better than turn-level for emotion
    prediction
  • Even under a very simple word-level emotion model
  • Word-level better at predicting longer turns
  • PitchLex-Word a consistent best performer
  • Future work
  • More refined word-level emotion models
  • HMMs
  • Co-training
  • Filter irrelevant words
  • Use the prosodic information left out
  • See if our conclusions generalize to detecting
    student uncertainty
  • Experiment with other sub-turn units (breath
    groups)

30
Feature Extraction per Student Turn
  • Five feature types
  • acoustic-prosodic (1)
  • non-acoustic-prosodic
  • lexical (2)
  • other automatic (3)
  • manual (4)
  • identifiers (5)
  • Research questions
  • utility of different features
  • speaker and task dependence

31
Feature Types (1)
  • Acoustic-Prosodic Features (normalized)
  • 4 pitch (f0): max, min, mean, standard dev.
  • 4 energy (RMS): max, min, mean, standard dev.
  • 4 temporal: turn duration (seconds)
  • pause length preceding turn (seconds)
  • tempo (syllables/second)
  • internal silence in turn (zero f0
    frames)
  • Available to ITSPOKE in real time
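A sketch of the four temporal features for one turn. Assumptions: the pause is measured from the end of the preceding tutor turn, internal silence is expressed as the fraction of zero-f0 frames, and the normalization mentioned on the slide is left out because its method is not given.

```python
def temporal_features(turn_start, turn_end, prev_turn_end, n_syllables, f0_frames):
    """Temporal features of one student turn (times in seconds).

    Assumptions: prev_turn_end is the end of the preceding (tutor) turn, and
    internal silence is reported as the fraction of frames with zero f0.
    No normalization is applied here.
    """
    duration = turn_end - turn_start
    return {
        "duration_s": duration,
        "pause_before_s": turn_start - prev_turn_end,
        "tempo_syl_per_s": n_syllables / duration,
        "internal_silence": sum(1 for f0 in f0_frames if f0 == 0) / len(f0_frames),
    }


print(temporal_features(10.0, 12.0, 9.5, 6, [0, 120, 130, 0]))
# {'duration_s': 2.0, 'pause_before_s': 0.5, 'tempo_syl_per_s': 3.0, 'internal_silence': 0.5}
```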

32
Feature Types (2)
  • Lexical Items
  • word occurrence vector
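A minimal word occurrence vector, assuming binary presence (not counts), lowercase whitespace tokenization with surrounding punctuation stripped, and a vocabulary built beforehand from training transcripts. The function name is hypothetical.

```python
def word_occurrence_vector(turn_text, vocabulary):
    """Binary bag-of-words: 1 if a vocabulary word occurs in the turn, else 0.

    Assumptions: lowercase whitespace tokenization with surrounding punctuation
    stripped; the vocabulary would be built from the training transcripts.
    """
    tokens = {tok.strip(".,!?") for tok in turn_text.lower().split()}
    return [1 if word in tokens else 0 for word in vocabulary]


vocab = ["car", "engine", "the", "um"]
print(word_occurrence_vector("The engine.", vocab))  # [0, 1, 1, 0]
```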

33
Feature Types (3)
  • Other Automatic Features available from ITSPOKE
    logs
  • Turn Begin Time (seconds from dialog start)
  • Turn End Time (seconds from dialog start)
  • Is Temporal Barge-in (student turn begins before
    tutor turn ends)
  • Is Temporal Overlap (student turn begins and
    ends in tutor turn)
  • Number of Words in Turn
  • Number of Syllables in Turn
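The timing-based features follow directly from the definitions above. A sketch, assuming start and end times for the student turn and the preceding tutor turn are available from the logs in seconds from dialogue start; the function name is hypothetical and the syllable count is taken as an input rather than computed.

```python
def log_features(student_start, student_end, tutor_start, tutor_end, words, n_syllables):
    """Other automatic features for one student turn, from logged timestamps.

    All times are seconds from dialogue start; tutor_* refer to the preceding
    tutor turn. The syllable count is passed in rather than computed here.
    """
    return {
        "turn_begin_time": student_start,
        "turn_end_time": student_end,
        # student turn begins before the tutor turn ends
        "is_temporal_barge_in": student_start < tutor_end,
        # student turn both begins and ends inside the tutor turn
        "is_temporal_overlap": tutor_start <= student_start and student_end <= tutor_end,
        "num_words": len(words),
        "num_syllables": n_syllables,
    }


# Tutor speaks from 2.0 s to 7.0 s; the student starts at 5.0 s and stops at 8.0 s.
f = log_features(5.0, 8.0, 2.0, 7.0, ["no", "why"], 2)
print(f["is_temporal_barge_in"], f["is_temporal_overlap"])  # True False
```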

34
Feature Types (4)
  • Manual Features (currently) available only from
    human transcription
  • Is Prior Tutor Question (tutor turn contains
    ?)
  • Is Student Question (student turn contains ?)
  • Is Semantic Barge-in (student turn begins at
    tutor word/pause boundary)
  • Number of Hedging/Grounding Phrases (e.g.
    mm-hm, um)
  • Is Grounding (canonical phrase turns not
    preceded by a tutor question)
  • Number of False Starts in Turn (e.g.
    acc-acceleration)
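Several of these features are simple tests on the transcription. The sketch below is illustrative: the phrase lists are hypothetical (the slide gives only mm-hm and um as examples), multi-word phrases are ignored, the false-start rule is a heuristic for the acc-acceleration transcription convention, and Is Semantic Barge-in is left out because it needs word-level tutor timing.

```python
# Hypothetical phrase lists; the slide only gives "mm-hm" and "um" as examples.
HEDGING_GROUNDING = {"um", "uh", "mm-hm", "ok", "okay", "yeah"}
CANONICAL_GROUNDING = {"ok", "okay", "mm-hm", "yeah"}


def is_false_start(token):
    """Heuristic: 'acc-acceleration' style tokens, a cut-off prefix plus the full word."""
    prefix, dash, rest = token.partition("-")
    return bool(dash) and bool(prefix) and rest.startswith(prefix)


def transcript_features(prior_tutor_text, student_text):
    """Manual features computed from a human transcription of one exchange."""
    tokens = [t.strip(".,!?") for t in student_text.lower().split()]
    prior_question = "?" in prior_tutor_text
    return {
        "is_prior_tutor_question": prior_question,
        "is_student_question": "?" in student_text,
        "num_hedging_grounding": sum(t in HEDGING_GROUNDING for t in tokens),
        # canonical phrase turn not preceded by a tutor question
        "is_grounding": bool(tokens)
        and all(t in CANONICAL_GROUNDING for t in tokens)
        and not prior_question,
        "num_false_starts": sum(is_false_start(t) for t in tokens),
    }


# The first exchange of the annotation example (slide 10).
print(transcript_features("Uh let us talk of one car first.", "ok.")["is_grounding"])  # True
```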