Context and Prosody in the Interpretation of Cue Phrases in Dialogue

Provided by: juliahir

Learn more at: http://www.cs.columbia.edu


1
Context and Prosody in the Interpretation of Cue
Phrases in Dialogue
Spoken Dialog with Humans and Machines

Julia Hirschberg
Columbia University and KTH
11/22/07

2
In collaboration with

Agustín Gravano, Stefan Benus, Héctor Chávez,
Shira Mitchell, and Lauren Wilcox
With thanks to Gregory Ward and Elisa Sneed German

3
Managing Conversation

How do speakers indicate conversational structure
in human/human dialogue?
How do they communicate varying levels of
attention, agreement, acknowledgment?
What role does lexical choice play in these
communicative acts? Phonetic realization?
Prosodic variation? Prior context?
Can human/human behavior be modeled in Spoken
Dialogue Systems?

4
Cue Phrases/Discourse Markers/Cue Words/
Discourse Particles/Clue Words

Linguistic expressions that can be employed
to convey information about the discourse
structure, or
to make a semantic (literal?) contribution.
Examples
now, well, so, alright, and, okay, first, on the
other hand, by the way, for example,

5
Some Examples

that's pretty much okay
Speaker 1: between the yellow mermaid and the whale
Speaker 2: okay
Speaker 1: and it is
okay we gonna be placing the blue moon

6
A Problem for Spoken Dialogue Systems

How do speakers produce and hearers interpret
such potentially ambiguous terms?
How important is acoustic/prosodic information?
Phonetic variation?
Discourse context?

7
Research Goals

Learn which features best characterize the
different functions of single affirmative cue
words.
Determine how these can be identified
automatically.
Important in Spoken Dialogue Systems
Understand user input.
Produce output appropriately.

8
Overview

Previous research
The Columbia Games Corpus
Collection paradigm
Annotations
Perception Study of Okays
Experimental design
Analysis and results
Machine Learning Experiments on Okay
Future work: Entrainment and Cue Phrases

9
Previous Work

General studies
Schiffrin 82, 87; Reichman 85; Grosz & Sidner 86
Cues to cue phrase disambiguation
Hirschberg & Litman 87, 93; Hockey 93; Litman 94
Cues to Dialogue Act identification
Jurafsky et al. 98; Rosset & Lamel 04
Contextual cues to the production of backchannels
Ward & Tsukahara 00; Sanjanhar & Ward 06

10
The Columbia Games Corpus: Collection

12 spontaneous task-oriented dyadic conversations
in Standard American English (9h 8m speech)
2 subjects playing a series of computer games, no
eye contact (45m 39s mean session time)
2 sessions per subject, w/different partners
Several types of games, designed to vary the way
discourse entities became old, or given, in the
discourse, in order to study variation in the
intonational realization of information status

11
Cards Game 1
Player 1 (Describer)

Short monologues
Vary frequency and order of occurrence of objects
on the cards.

12
Cards Game 2
Player 1 (Describer)
Player 2 (Searcher)

Dialogue
Vary frequency and order of occurrence of objects
on the cards across speakers.

13
Objects Game

Follower must place the target object where it
appears on the Describer's screen solely via the
description provided (4h 19m)

Describer
Follower
14
The Columbia Games Corpus: Recording and Logging

Recorded on separate channels in soundproof
booth, digitized and downsampled to 16 kHz
All user and system behaviors logged

15
The Columbia Games Corpus: Annotation

Orthographic transcription and alignment (73k
words).
Laughs, coughs, breaths, smacks,
throat-clearings.
Self-repairs.
Intonation, using ToBI conventions.
Function (10 categories) of affirmative cue words
(alright, mm-hm, okay, right, uh-huh, yeah, yes,
...).
Question form and function.
Turn-taking behaviors.

16
The Columbia Games Corpus: ToBI Labeling

Tones
Pitch accents: L*, H*, L+H*, H+!H*, ...
Phrase accents: L-, H-, !H-
Boundary tones: L%, H%
Break Indices
Degrees of junction: 0 (no word boundary) ... 4
(full intonational phrase boundary)
Miscellaneous
Disfluencies, non-speech sounds, ...

17
The Columbia Games Corpus: ToBI Example

18
Perception Study: Selection of Materials

Speaker 1: yeah um there's like there's some space there's
Speaker 2: okay I think I got it
Speaker 1: but it's gonna be below the onion
Speaker 2: okay
Speaker 1: okay alright I'll try it okay
Speaker 2: okay the owl is blinking

19
Perception Study: Experiment Design

54 instances of okay (18 for each function).
2 tokens for each okay:
Isolated condition: only the word okay.
Contextualized condition: 2 full speaker turns:
The turn containing the target okay, and
the previous turn by the other speaker.

20
Perception Study: Experiment Design

One third each: all 3 labelers agreed, 2 agreed, none agreed.
Two conditions:
Part 1: 54 isolated tokens
Part 2: 54 contextualized tokens
Subjects asked to classify each token of okay
as
Acknowledgment / Agreement, or
Backchannel, or
Cue beginning discourse segment.

21
Perception Study: Definitions Given to the Subjects

Acknowledge/Agreement
The function of okay that indicates "I believe
what you said" and/or "I agree with what you
say".
Backchannel
The function of okay in response to another
speaker's utterance that indicates only "I'm
still here" or "I hear you and please continue".
Cue beginning discourse segment
The function of okay that marks a new segment of
a discourse or a new topic. This use of okay
could be replaced by "now".

22
Perception Study: Subjects and Procedure

Subjects
20 paid subjects (10 female, 10 male).
Ages between 20 and 60.
Native speakers of English.
No hearing problems.
GUI on a laboratory workstation with headphones.

23
Results: Inter-Subject Agreement

Kappa measure of agreement with respect to chance
(Fleiss 71)

                          Isolated Condition   Contextualized Condition
Overall                   .120                 .294
Ack / Agree vs. Other     .089                 .227
Backchannel vs. Other     .118                 .164
Cue beginning vs. Other   .157                 .497
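
For readers who want to see how a Fleiss-style kappa of this kind is computed, here is a minimal sketch. It assumes the ratings are arranged as one row per token and one column per function, each cell holding the number of subjects who chose that function. The function name and the toy ratings are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must have been rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Share of all assignments that went to each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that chose the same category.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    # Undefined (division by zero) if all ratings fall into a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy table: 4 tokens, 20 subjects, 3 functions
# (Ack/Agree, Backchannel, Cue beginning).
toy_ratings = [
    [12, 6, 2],
    [3, 15, 2],
    [1, 2, 17],
    [8, 8, 4],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

The "X vs. Other" rows would correspond to collapsing the table to two columns (the function of interest and everything else) before calling the same function.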
24
Results: Cues to Interpretation

Phonetic transcription of okay
Isolated Condition
Strong correlation for realization of initial
vowel
? Backchannel
? Ack/Agree, Cue Beginning
Contextualized Condition
No strong correlations found for phonetic
variants.

25
Results: Cues to Interpretation

Ack / Agree
Isolated condition: shorter /k/
Contextualized condition: shorter latency between turns; shorter pause before okay
Backchannel
Isolated condition: higher final pitch slope; longer 2nd syllable; lower intensity
Contextualized condition: higher final pitch slope; more words by S2 before okay; fewer words by S1 after okay
Cue beginning
Isolated condition: lower final pitch slope; lower overall pitch slope
Contextualized condition: lower final pitch slope; longer latency between turns; more words by S1 after okay

S1: utterer of the target okay. S2: the other speaker.

26
Results: Cues to Interpretation

Phrase-final intonation (ToBI)
(Both isolated and contextualized conditions.)
H-H% → Backchannel
H-L%
L-H% → Ack/Agree, Backchannel
L-L% → Ack/Agree, Cue beginning

27
Perception Study: Conclusions

Agreement
Availability of context improves inter-subject
agreement.
Cue beginnings easier to disambiguate than the
other two functions.
Cues to interpretation
Contextual features override word features.
Exception: final pitch slope of okay in both
conditions.

28
Machine Learning Experiments: Okay

Can we identify the different functions of okay
in our larger corpus reliably?
What features perform best?
How do these compare to those that predict human
judgments?

29
Method

ML Algorithm
JRip: Weka's implementation of the propositional
rule learner Ripper (Cohen 95).
We also tried J4.8, Weka's implementation of the
decision tree learner C4.5 (Quinlan 93, 96),
with similar results.
10-fold cross validation in all experiments.
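
The 10-fold cross-validation procedure can be sketched in a few lines. This is an illustration only: it is not Weka, and the learner is a trivial majority-class stand-in rather than JRip or J4.8. All function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(training_pairs):
    """Stand-in learner: always predict the most frequent training label."""
    majority = Counter(label for _, label in training_pairs).most_common(1)[0][0]
    return lambda item: majority


def cross_validated_error(items, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for fold in k_fold_indices(len(items), k, seed):
        held_out = set(fold)
        training = [
            (items[i], labels[i]) for i in range(len(items)) if i not in held_out
        ]
        predict = train_fn(training)
        errors += sum(predict(items[i]) != labels[i] for i in fold)
    return errors / len(items)


# Hypothetical toy data: 70 tokens of one function, 30 of another.
toy_labels = ["ack"] * 70 + ["cue"] * 30
toy_items = list(range(100))
print(cross_validated_error(toy_items, toy_labels, train_majority))  # 0.3
```

Any function that takes training pairs and returns a predictor can be passed as `train_fn`, so a real rule or tree learner could be dropped in without changing the evaluation loop.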

30
Units of Analysis

IPU (Inter-pausal unit)
Maximal sequence of words delimited by pauses >
50 ms.
Conversational Turn
Maximal sequence of IPUs by the same speaker,
with no contribution from the other speaker.
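
The two units defined above can be derived from word-level time alignments roughly as follows. This is a sketch under simplifying assumptions: each word carries a speaker, a start time, and an end time in seconds, and a turn ends as soon as the next IPU in start-time order belongs to the other speaker. The `Word` class, the function names, and the timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def group_ipus(words, max_pause=0.050):
    """Split one speaker's time-ordered words into IPUs.

    A new IPU starts whenever the silence since the previous word
    exceeds max_pause (50 ms, as in the definition above).
    """
    ipus = []
    for word in words:
        if ipus and word.start - ipus[-1][-1].end <= max_pause:
            ipus[-1].append(word)
        else:
            ipus.append([word])
    return ipus


def group_turns(ipus_by_speaker):
    """Merge consecutive IPUs by the same speaker into turns.

    Simplification: IPUs from all speakers are ordered by start time, and
    a turn ends when the next IPU belongs to a different speaker.
    """
    all_ipus = sorted(
        (ipu for ipus in ipus_by_speaker.values() for ipu in ipus),
        key=lambda ipu: ipu[0].start,
    )
    turns = []
    for ipu in all_ipus:
        if turns and turns[-1][-1][0].speaker == ipu[0].speaker:
            turns[-1].append(ipu)
        else:
            turns.append([ipu])
    return turns


# Hypothetical alignment for a short exchange.
speaker_a = [
    Word("A", "between", 0.00, 0.30), Word("A", "the", 0.32, 0.40),
    Word("A", "whale", 0.40, 0.80), Word("A", "and", 1.50, 1.70),
    Word("A", "it", 1.72, 1.80), Word("A", "is", 2.10, 2.30),
]
speaker_b = [Word("B", "okay", 1.00, 1.30)]

ipus = {"A": group_ipus(speaker_a), "B": group_ipus(speaker_b)}
for turn in group_turns(ipus):
    print(turn[0][0].speaker, [[w.text for w in ipu] for ipu in turn])
```

Overlapping speech is not handled here; a fuller treatment would need a rule for IPUs that start while the other speaker is still talking.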

31
Experimental features

Text-based features (from transcriptions)
Word identity, POS tags (automatic), position of word in
IPU / turn
IPU and turn length in words; previous turn by same speaker?
Timing features (from time alignment)
Word / IPU / turn duration; amount of speaker
overlap
Time to word beginning/end in IPU, turn
Acoustic features
min, mean, max, stdev x pitch, intensity
Slope of pitch, stylized pitch, and intensity,
over the whole word, and over its last 100, 200,
300 ms.
Acoustic features from last IPU of prior
speaker's turn.
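
As an illustration of the acoustic features listed above, the sketch below computes summary statistics and a least-squares slope over a whole track and over its final window. It assumes a pitch (or intensity) track given as parallel lists of times in seconds and values. The function names and the synthetic track are hypothetical, and using the population standard deviation is an arbitrary choice here.

```python
from statistics import mean, pstdev


def slope(times, values):
    """Least-squares slope of values over times (value units per second)."""
    t_mean, v_mean = mean(times), mean(values)
    denominator = sum((t - t_mean) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("need at least two distinct time points")
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return numerator / denominator


def final_slope(times, values, window):
    """Slope over the last `window` seconds of the track."""
    cutoff = times[-1] - window
    tail = [(t, v) for t, v in zip(times, values) if t >= cutoff]
    return slope([t for t, _ in tail], [v for _, v in tail])


def summary(values):
    """min, mean, max, stdev of a track (population stdev, by assumption)."""
    return {
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
        "stdev": pstdev(values),
    }


# Synthetic 400 ms pitch track sampled every 10 ms: flat at 200 Hz for the
# first 200 ms, then rising at 100 Hz per second.
times = [i * 0.01 for i in range(41)]
pitch = [200.0 if t < 0.2 else 200.0 + 100.0 * (t - 0.2) for t in times]

print(summary(pitch))
print("whole word:", slope(times, pitch))
for window in (0.1, 0.2, 0.3):
    print(f"last {int(window * 1000)} ms:", final_slope(times, pitch, window))
```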

32
Results: Classification of individual words

Classification of each individual word into its
most common functions.
alright → Ack/Agree, Cue Begin, Other
mm-hm → Ack/Agree, Backchannel
okay → Ack/Agree, Backchannel, Cue Begin,
AckCueBegin, AckCueEnd, Other
right → Ack/Agree, Check, Literal Modifier
yeah → Ack/Agree, Backchannel

33
Majority Labeled Functions of Okay (n = 2434)

1137 Acknowledgment / Agreement
548 Cue begin discourse segment
232 Pivot ending (A/A + Cue end)
121 Backchannel
68 Pivot beginning (A/A + Cue beg)
33 Check with the interlocutor
29 Literal modifier
15 Stall / Filler
10 Cue end discourse seg
6 Back from task

34
Results: Classification of okay

Feature Set           Error Rate   F-Measure
                      (%)          Ack/Agree   Backchannel   Cue Begin   Ack/Agree + Cue Begin   Ack/Agree + Cue End
Majority label                     1137        121           548         68                      232
Text-based            31.7         .76         .16           .77         .09                     .33
Acoustic              40.2         .69         .24           .64         .03                     .25
Text-based + Timing   25.6         .79         .31           .82         .18                     .67
Full set              25.5         .80         .46           .83         .21                     .66
Baseline (1)          48.3         .68         .00           .00         .00                     .00
Human labelers (2)    14.0         .89         .78           .94         .56                     .73

(1) Majority class baseline: ACK/AGREE.
(2) Calculated with respect to each labeler's agreement with the
majority labels.
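
The baseline row can be reproduced from the label counts alone. The sketch below assumes the baseline was scored over the tokens whose counts appear on the "Majority Labeled Functions of Okay" slide (the ten listed counts sum to 2199) and that it always predicts the most frequent label. The variable names are hypothetical.

```python
# Counts of majority-labeled functions of okay, as listed on the slide.
label_counts = {
    "Ack/Agree": 1137,
    "Cue begin": 548,
    "Pivot ending": 232,
    "Backchannel": 121,
    "Pivot beginning": 68,
    "Check": 33,
    "Literal modifier": 29,
    "Stall/Filler": 15,
    "Cue end": 10,
    "Back from task": 6,
}

total = sum(label_counts.values())
majority = max(label_counts, key=label_counts.get)

# A classifier that always answers with the majority label is wrong on
# every token that carries any other label.
error_rate = 1 - label_counts[majority] / total

# For the majority class, recall is 1 (every true instance is retrieved)
# and precision equals the share of the class in the data.
precision = label_counts[majority] / total
recall = 1.0
f_measure = 2 * precision * recall / (precision + recall)

print(majority, total)              # Ack/Agree 2199
print(f"{100 * error_rate:.1f}")    # 48.3
print(f"{f_measure:.2f}")           # 0.68
```

Under these assumptions the result matches the table: 48.3% error and an F-measure of .68 for Ack/Agree, with .00 for every other function because the baseline never predicts them.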
35
Conclusions: ML Experiments

Context and timing features
Like the perception-in-context results: timing
Pause after okay, not before
Number of succeeding words
Acoustic features impoverished
No phonetic features
No pitch slope
But ToBI labels (where available) didn't help

36
Future Work

Experiments with full ToBI labeling
Other features
Lexical, Acoustic-Prosodic, and Discourse
Entrainment and Dis-Entrainment
Positive correlations for affirmative cue words
Affirmative cue word entrainment and game scores
Affirmative cue word entrainment and overlaps and
interruptions in turn-taking

37
Tack! (Thank you!)
38
Other Work

Benus et al, 2007
The prosody of backchannels in American
English, ICPhS 2007, Saarbrücken, Germany,
August 2007.
Gravano et al, 2007
Classification of discourse functions of
affirmative words in spoken dialogue,
Interspeech 2007, Antwerp, Belgium, August 2007.

39
Importance for Spoken Dialogue Systems

Convey ambiguous terms with the intended meaning
Interpret the user's input correctly

40
Experiment Design

Goal: Study the relation between the down-stepped
contour and
Information status
Syntactic position
Discourse position
Spontaneous speech
Both monologue and dialogue

41
Experiment Design

Three computer games.
Two players, each on a different computer.
They collaborate to perform a common task.
Totally unrestricted speech.

42
Objects Game
Player 1 (Describer)
Player 2 (Searcher)

Dialogue
Vary target and surrounding objects (subject and
object position).

43
Games Session

Repeat 3 times
Cards Game 1
Cards Game 2
Short break (optional)
Repeat 3 times
Objects Game
Each subject participated in 2 sessions.
12 sessions

44
Subjects

Postings
Columbia's webpage for temporary job ads.
Craigslist
http://www.craigslist.org
Category: Gigs → Event gigs
Problem
People are unreliable
50% did not show up, or cancelled on short
notice.

45
Subjects

Possible solutions
Give precise instructions to e-mail ALL required
info
Name, native speaker?, hearing impairments?, etc.
Ask for a phone number.
Call them and explain why it is so important for
us that they show up (or cancel with adequate
notice).
Increase the pay after each session.
Example: 5, 10, 15 instead of 10, 10, 10.

46
Recording

Sound-proof booth
2 subjects + 1 or 2 confederates.
Head-mounted mics.
Digital Audio Tape (DAT): one channel per
speaker.
Wav files
One mono file per speaker.
Sample rate: 48000
Downsampled to 16000 (but kept original files!)
20 hours of speech → 2.8 GB (16k)

47
Logs

Log everything the subjects do to a text file.
Example
170355234 BEGIN_EXECUTION
170404868 NEXT_TURN
170431837 RESULTS 97 points awarded.
170438426 NEXT_TURN
170503873 RESULTS 92 points awarded.
...
Later, this may be used (e.g.) to divide each
session into smaller tasks or conversations.
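
A log in this shape could be divided into smaller tasks along the following lines. This is a sketch that assumes each line is a timestamp token, an event name, and optional free text, and that every NEXT_TURN event opens a new task. The timestamp is kept as an opaque string because the slide does not state its format. Function names are hypothetical.

```python
SAMPLE_LOG = """\
170355234 BEGIN_EXECUTION
170404868 NEXT_TURN
170431837 RESULTS 97 points awarded.
170438426 NEXT_TURN
170503873 RESULTS 92 points awarded.
"""


def parse_line(line):
    """Split a log line into timestamp, event name, and optional detail."""
    timestamp, event, *rest = line.split(maxsplit=2)
    return {
        "timestamp": timestamp,  # kept as text; format not stated on the slide
        "event": event,
        "detail": rest[0] if rest else "",
    }


def split_into_tasks(lines):
    """Group records so that each NEXT_TURN starts a new task (assumption)."""
    tasks = []
    for line in lines:
        if not line.strip():
            continue
        record = parse_line(line)
        if record["event"] == "NEXT_TURN" or not tasks:
            tasks.append([])
        tasks[-1].append(record)
    return tasks


tasks = split_into_tasks(SAMPLE_LOG.splitlines())
for task in tasks:
    print([record["event"] for record in task])

# Points per task, read from the RESULTS lines.
points = [
    int(record["detail"].split()[0])
    for task in tasks
    for record in task
    if record["event"] == "RESULTS"
]
print(points)  # [97, 92]
```

The resulting task boundaries could then be matched against the audio to cut each session into per-task conversations.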
