Title: Labelling Emotional User States in Speech: Where Are the Problems, Where Are the Solutions?
1 Labelling Emotional User States in Speech: Where Are the Problems, Where Are the Solutions?
- Anton Batliner, Stefan Steidl, University of Erlangen
- HUMAINE WP5-WS, Belfast, December 2004
2 Overview
- decisions to be made
- mapping data onto labels
- the human factor
- and later on: again some mapping
- illustrations
- the sparse data problem
- new dimensions
- new measure for labeller agreement
- and afterwards: what to do with the data?
- handling of databases
- we proudly present CEICES
- some statements
3 Mapping data onto labels I
- catalogue of labels
- data-driven (selection from "HUMAINE"-catalogue?)
- should be a semi-open class
- unit of annotation
- word (phrase or turn) in speech
- in video?
- alignment of video time stamps with speech data necessary
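The time-stamp alignment mentioned above might be sketched as follows; the frame rate, the function name, and the rounding convention are assumptions, not from the slides:

```python
def word_to_frames(start_s: float, end_s: float, fps: float = 25.0) -> list:
    """Map a word's speech time stamps (seconds) onto the video frame
    indices it covers, so word-based labels can be aligned with video."""
    first = round(start_s * fps)
    last = round(end_s * fps)
    return list(range(first, last + 1))

# a word spoken from 1.0 s to 2.0 s covers frames 25 .. 50 at 25 fps
print(word_to_frames(1.0, 2.0))
```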
4 Mapping data onto labels II
- categorical (hard/soft) labelling vs. dimensions
- formal vs. functional labelling
- functional: holistic user states
- formal: prosody, voice quality, syntax, lexicon, FAPs, ...
- reference baseline
- speaker-/user-specific
- neutral phase at beginning of interaction
- sliding window
- emotion content vs. signs of emotion?
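The sliding-window reference baseline mentioned above might be sketched like this; the window size and normalisation against the running mean are illustrative assumptions:

```python
def sliding_baseline(values, window=5):
    """Normalise each word-level feature value against the mean of the
    preceding `window` values (a speaker-specific sliding-window
    baseline); the first value is its own reference, i.e. maps to 0."""
    out = []
    for i, v in enumerate(values):
        ref = values[max(0, i - window):i] or [v]
        out.append(v - sum(ref) / len(ref))
    return out

# e.g. F0 means per word, in Hz (invented numbers)
print(sliding_baseline([100, 110, 120], window=2))  # [0.0, 10.0, 15.0]
```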
5 The human factor I
- expert labellers vs. naïve labellers
- experts
- experienced, i.e. consistent
- with (theoretical) bias
- expensive
- few
- "naïve" labellers
- maybe less consistent
- no bias, i.e., ground truth?
- less expensive
- more
- representative data → much data → high effort
- are there "bad" labellers?
- does high interlabeller agreement really mean good labellers?
6 The human factor II: evaluation of annotations
- WP3: kappa etc.
- engineering: past WP9
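Kappa, as named on the slide, needs no toolkit; a minimal sketch of Cohen's kappa for two labellers, with invented example annotations:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labellers over the
    same sequence of units (e.g. words)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # expected chance agreement from the two marginal distributions
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in freq_a)
    return (observed - expected) / (1 - expected)

# two labellers annotating six words (invented data)
a = ["N", "N", "A", "E", "N", "A"]
b = ["N", "A", "A", "E", "N", "N"]
print(round(cohens_kappa(a, b), 3))  # 0.455
```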
7 And later on
- mapping of labels onto cover classes
- sparse data
- classification performance
- embedding into the application task
- small number of alternatives
- criteria?
- dimensional labels adequate?
- human processing
- system restrictions: cf. the story of 33 vs. 2 levels of accentuation
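The mapping of labels onto cover classes could be as simple as a lookup table; the grouping below is purely illustrative, since the slide explicitly leaves the criteria open:

```python
# Illustrative cover-class scheme (NOT from the slides): fine-grained
# labels grouped into a small number of alternatives for the application.
COVER = {
    "angry": "negative", "touchy": "negative", "reprimanding": "negative",
    "emphatic": "negative",
    "joyful": "positive", "motherese": "positive",
    "neutral": "neutral",
}

def to_cover(label: str) -> str:
    """Map a fine-grained label onto a cover class; unlisted rare
    labels (bored, helpless, surprised, ...) fall into 'rest'."""
    return COVER.get(label, "rest")

print(to_cover("touchy"), to_cover("helpless"))  # negative rest
```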
8 The Sparse Data Problem
- un-balanced distribution (Pareto?)
- (too) few for robust training
- down- or up-sampling necessary for testing
- looking for "interesting" ("provocative"?) data: does this mean to beg the question?
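Down-sampling, as mentioned above, might be sketched like this; the class names and sizes are invented for illustration:

```python
import random

def downsample(samples, seed=0):
    """Balance an unbalanced label distribution by randomly reducing
    every class to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in samples:
        by_class.setdefault(label, []).append(item)
    n_min = min(len(v) for v in by_class.values())
    return [(item, label)
            for label, items in by_class.items()
            for item in rng.sample(items, n_min)]

# 8 neutral words vs. 2 emphatic words (invented)
data = [("w%d" % i, "neutral") for i in range(8)] + \
       [("e1", "emphatic"), ("e2", "emphatic")]
balanced = downsample(data)
print(len(balanced))  # 4: two items per class
```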
9 The Sparse Data Problem: some frequencies, word-based
- not neutral in .../23 (27/9.6), 10.3/4.6, 15.4/8 with/without emph.
- scenario-specific: ironic vs. motherese/reprimanding
- emphatic in-between
- rare birds in AIBO (surprised, helpless, bored) → consensus labelling
10 Towards New Dimensions
- from categories to dimensions
- confusion matrices → similarities
→ Non-Metrical Multi-Dimensional Solution (NMDS)
11 11 emotional user state labels, data-driven, word-based
- joyful
- surprised
- motherese
- neutral (default)
- rest (waste-paper-basket, non-neutral)
- bored
- helpless, hesitant
- emphatic (possibly indicating problems)
- touchy (irritated)
- angry
- reprimanding
- effort: 10-15 times real-time
12 Confusion matrix: majority voting 3/5 vs. rest
- if 2/2/1, both 2/2 count as majority vote ("pre-emphasis")

          A   T   R   J   M   E   N   W   S   B   H
Angry    43  13  12  00  00  12  18  00  00  00  00
Touchy   04  42  11  00  00  13  23  00  00  02  00
Reprim.  03  15  45  00  01  14  18  00  00  00  00
Joyful   00  00  01  54  02  07  32  00  00  00  00
Mother.  00  00  01  00  61  04  30  00  00  00  00
Emph.    01  05  06  00  01  53  29  00  00  00  00
Neutral  00  02  01  00  02  13  77  00  00  00  00
Waste-p. 00  07  06  00  08  19  21  32  00  01  01
Surpr.   00  00  00  00  00  20  40  00  40  00  00
Bored    00  14  01  00  01  12  28  01  00  39  00
Helpl.   00  01  00  02  00  12  37  03  00  00  41

- R = reprimanding, W = waste-paper-basket category
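The voting scheme in the slide header (3/5 majority; 2/2/1 split with "pre-emphasis") could be sketched as follows; the exact tie-handling is an assumption based on the slide's wording:

```python
from collections import Counter

def majority_vote(votes, threshold=3):
    """Majority voting over five labellers: a label wins outright with
    >= `threshold` votes; in a 2/2/1 split ('pre-emphasis'), both
    2-vote labels are returned as joint majority candidates."""
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    if top_count >= threshold:
        return [top_label]
    if top_count == 2:
        # relative majority with pre-emphasis: all labels with 2 votes
        return sorted(lab for lab, c in counts if c == 2)
    return []  # 1/1/1/1/1: no majority at all

print(majority_vote(["A", "A", "A", "N", "E"]))  # ['A']
print(majority_vote(["A", "A", "N", "N", "E"]))  # ['A', 'N']
```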
13"traditional" emotional dimensions in
feeltraceVALENCE and AROUSAL
14 NMDS: 2-dimensional solution with 7 labels, relative majority with pre-emphasis
- orientation?
- valence?
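The step from a confusion matrix to NMDS input might look like this: confusions are symmetrised into dissimilarities, which a non-metric MDS solver would then embed. The toy matrix and the symmetrisation rule are illustrative assumptions:

```python
def confusion_to_dissimilarity(conf):
    """Turn a row-wise confusion matrix (in per cent) into a symmetric
    dissimilarity matrix: the more often two labels are confused with
    each other, the smaller their distance."""
    n = len(conf)
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = (conf[i][j] + conf[j][i]) / 2.0  # symmetrise
                dis[i][j] = 1.0 - sim / 100.0
    return dis

# toy 3x3 example (Angry / Emphatic / Neutral; values invented)
conf = [[60, 20, 20],
        [10, 60, 30],
        [ 5, 15, 80]]
d = confusion_to_dissimilarity(conf)
print(round(d[0][1], 3))  # 1 - ((20 + 10) / 2) / 100 = 0.85
```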
15 And back
- from categories to dimensions
- what about the way back?
- automatic clustering?
- thresholds
- ....
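A threshold-based way back from dimensions to categories, as the slide asks about, could be as simple as this; the threshold value and the category names are invented for illustration:

```python
def point_to_category(valence, arousal, t=0.3):
    """Map a point in the valence/arousal plane (values in [-1, 1])
    back onto a coarse category via fixed thresholds - one possible
    scheme among many."""
    if abs(valence) <= t and abs(arousal) <= t:
        return "neutral"
    if valence > t:
        return "positive"
    return "negative/aroused" if arousal > t else "negative/passive"

print(point_to_category(-0.8, 0.9))  # negative/aroused
```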
16 Towards New Quality Measures
- Stefan Steidl: Entropy-Based Evaluation of Decoders
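One ingredient of such a measure is the entropy over the labellers' votes for a single word; a minimal sketch (not Steidl's actual decoder evaluation, just the entropy itself):

```python
import math
from collections import Counter

def label_entropy(votes):
    """Entropy (in bits) of the label distribution one word received
    from several labellers: 0 means full agreement, higher values
    mean more disagreement."""
    n = len(votes)
    return sum((c / n) * math.log2(n / c)
               for c in Counter(votes).values())

print(label_entropy(["E", "E", "E", "E", "E"]))  # 0.0 (full agreement)
print(label_entropy(["E", "E", "N", "N"]))       # 1.0 (maximal 2-way split)
```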
17 Handling of Databases
- http://www.phonetik.uni-muenchen.de/Forschung/BITS/index.html
- Publications
- The Production of Speech Corpora (ISBN 3-8330-0700-1)
- The Validation of Speech Corpora (ISBN 3-8330-0700-1)
- The Production of Speech Corpora
- Florian Schiel, Christoph Draxler
- Angela Baumann, Tania Ellbogen, Alexander Steffen
- Version 2.5, June 1, 2004
18 CEICES
- Combining Efforts for Improving automatic Classification of Emotional user States: a "forced co-operation" initiative under the guidance of HUMAINE
- evaluation of annotations
- assessment of F0 extraction algorithms
- assessment of the impact of single features (feature classes)
- improvement of classification performance via sharing of features
19 Ingredients of CEICES
- speech data German AIBO database
- annotations
- functional, emotional user states, word-based
- (prosodic peculiarities, word-based)
- manually corrected
- segment boundaries for words
- F0
- specifications of Train/Vali/Test, etc.
- reduction of effort: ASCII file sharing via portal
- forced co-operation via agreement
20 [annotation tool screenshot: corrected F0, automatic F0, word boundaries, pitch labels]
21 Agreement
- open for non-HUMAINE partners
- nominal fee for distribution and handling
- commitments
- to share labels and extracted feature values
- to use specified sub-samples
- expected outcome
- assessment of F0 extraction, impact of features, ...
- set of feature classes/vectors with evaluation
- common publication(s)
22 Some statements
- annotation has to be data-driven
- there are no bad labellers
- classification results have to be used for labelling assessment
- automatic labelling is not good enough - or maybe you should call it extraction
- each label type has to be mapped onto very few categorical classes at the end of the day
23 Thank you for your attention