belief updating in spoken dialog systems

Slides: 41

Provided by: danb7

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
belief updating in spoken dialog systems

Dan Bohus
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

acknowledgements: Alex Rudnicky, CMU; Eric Horvitz; Tim Paek, MSR; Antoine Raux
2
spoken dialog systems

use natural language processing technology
engage in a goal-oriented conversation
research community
information access, command-and-control
personal assistants, taskable agents, tutoring systems
industry
simpler automated phone systems

3
the problem

misunderstandings lead to interaction breakdowns

4
more concretely

S: What city are you leaving from?
U: Urbana Champaign [recognized: OKAY IN THAT same PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [recognized: for minute since HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [recognized: CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [recognized: SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [recognized: THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [recognized: AT THE TENTH OF AUGUST]
S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [recognized: flight destination mr WEEKEND]
S: traveling on Saturday, August 12th... I have a flight departing Chicago at 1:40pm, arrives Seoul at ...

5
understanding errors

stem from the speech recognition process
difficult operating conditions
typical word-error-rates
20-30%
up to 50% for non-natives
strong negative impact on interactions
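
Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows; the function name, the lowercasing, and the whitespace tokenization are simplifying assumptions, not details from the talk.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For the date turn in the example dialog, comparing "the tenth of august" with the recognized "AT THE TENTH OF AUGUST" gives one insertion over four reference words. Note that the rate can exceed 1.0 when the recognizer inserts many words.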

6
pathways to increased robustness

improve recognition

gracefully handle errors through conversation

detect the problems
develop a set of recovery strategies
know how to choose between them (policy)

7
guarding against misunderstandings

S: What city are you leaving from?
U: for minute since HEY
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: CHICAGO
S: traveling from Chicago. Where would you like to go?
U: SEOUL
S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: AT THE TENTH OF AUGUST
S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: flight destination mr WEEKEND

8
guarding against misunderstandings

confidence scores (recognized user turn / score)

S: What city are you leaving from?
U: for minute since HEY / 0.07
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: CHICAGO / 0.72
S: traveling from Chicago. Where would you like to go?
U: SEOUL / 0.65
S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M / 0.35
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: AT THE TENTH OF AUGUST / 0.58
S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: flight destination mr WEEKEND / 0.28

arrival = Seoul / 0.65

confirmation actions
reject
explicit confirmation: Did you say Seoul?
implicit confirmation: traveling to Seoul. What day did you need to travel?
accept
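
One simple way to connect a confidence score to the four confirmation actions is a set of thresholds. The sketch below is only an illustration: the function name and all three threshold values are hypothetical and are not taken from the talk.

```python
def choose_confirmation_action(
    confidence: float,
    reject_below: float = 0.2,     # hypothetical threshold
    explicit_below: float = 0.5,   # hypothetical threshold
    implicit_below: float = 0.8,   # hypothetical threshold
) -> str:
    """Map a confidence score in [0, 1] to one of the four actions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < reject_below:
        return "reject"
    if confidence < explicit_below:
        return "explicit_confirmation"   # e.g. "Did you say Seoul?"
    if confidence < implicit_below:
        return "implicit_confirmation"   # e.g. "traveling to Seoul. What day ...?"
    return "accept"
```

With these illustrative thresholds, a score of 0.07 is rejected and a score of 0.65 triggers an implicit confirmation. A fixed threshold policy of this kind looks at a single score in isolation, which is the limitation belief updating is meant to address.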
9
belief updating

arrival = Seoul / 0.65

S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M / 0.35

arrival = ?
10
belief updating problem statement

given
an initial belief $B_{\text{initial}}(C)$ over concept $C$
a system action $SA(C)$
a user response $R$
construct an updated belief
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M

11
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

related work proposed approach data
experiments and results global performance
conclusion
12
detecting misunderstandings and corrections

confidence annotation
word-level [Cox, Chase, Bansal, Ravishankar, etc.]
semantic confidence annotation [Walker, San-Segundo, Bohus, etc.]
correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]
detect when the user corrects the system

arrival = Seoul / 0.65
S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M
Conf = 0.35
Corr = 0.47
arrival = ?
13
current solutions for tracking beliefs

most systems only track single values
new values overwrite old values
use simple heuristic rules
explicit confirmation
S: did you say you wanted to fly to Seoul?
yes → trust hypothesis
no → delete hypothesis
other → non-understanding
implicit confirmation
S: traveling to Seoul. What day did you need to travel?
rely on new values overwriting old values
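
The heuristic rules above can be sketched as a small update function over a single-value belief. All names here (`SingleValueBelief`, `heuristic_update`, the action and answer strings) are hypothetical, and leaving the belief unchanged on a non-understanding is an assumption; the slide only labels that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleValueBelief:
    value: Optional[str] = None  # only one value is tracked per concept
    trusted: bool = False        # set once the user confirms the value


def heuristic_update(
    belief: SingleValueBelief,
    system_action: str,
    user_answer: Optional[str] = None,
    heard_value: Optional[str] = None,
) -> SingleValueBelief:
    """Apply the simple confirmation heuristics to a single-value belief."""
    if system_action == "explicit_confirmation":
        if user_answer == "yes":   # yes -> trust hypothesis
            return SingleValueBelief(belief.value, trusted=True)
        if user_answer == "no":    # no -> delete hypothesis
            return SingleValueBelief(None, trusted=False)
        # other -> non-understanding; ASSUMPTION: belief is left unchanged
        return belief
    if system_action == "implicit_confirmation":
        if heard_value is not None:  # new values overwrite old values
            return SingleValueBelief(heard_value, trusted=False)
        return belief
    raise ValueError(f"unknown system action: {system_action!r}")
```

Because a newly heard value simply replaces the old one, a misrecognized correction after an implicit confirmation overwrites the belief without any weighing of the evidence.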

14
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

15
belief updating problem statement
arrival = Seoul / 0.65

S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M / 0.35

f → arrival = ?

given
an initial belief $B_{\text{initial}}(C)$ over concept $C$
a system action $SA(C)$
a user response $R$
construct an updated belief
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

16
belief representation
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
departure

most accurate representation
probability distribution over the set of possible values

however
system hears only a small number of conflicting values for a concept throughout a session
max 3 conflicting values heard
only in 7% of cases, more than 1 value heard

17
belief representation
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

compressed belief representation
k hypotheses + other
dynamically add and drop hypotheses
remember m hypotheses, add n new ones (m + n = k)

S: flying from Aspen. What is your destination?
U: NO NO I DIDNT THAT THAT

B(C) is a multinomial variable of degree k + 1
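
A minimal sketch of the compressed representation, assuming a belief is stored as a mapping from value to probability with the leftover mass assigned to a catch-all "other" entry. The names `OTHER`, `compress`, and `candidate_hypotheses` are hypothetical, and ranking old hypotheses by probability is an assumption about how the m remembered ones are chosen.

```python
from typing import Dict, List

OTHER = "<other>"  # hypothetical label for the catch-all category


def compress(belief: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable values and fold the remaining mass into OTHER."""
    named = {v: p for v, p in belief.items() if v != OTHER}
    top = sorted(named.items(), key=lambda item: item[1], reverse=True)[:k]
    kept = dict(top)
    kept[OTHER] = max(0.0, 1.0 - sum(kept.values()))
    return kept


def candidate_hypotheses(
    belief: Dict[str, float], new_values: List[str], m: int, n: int
) -> List[str]:
    """Remember m old hypotheses and add up to n new ones (m + n = k), plus OTHER."""
    old = [v for v in compress(belief, m) if v != OTHER]
    new = [v for v in new_values if v not in old][:n]
    return old + new + [OTHER]
```

The returned list has at most k + 1 entries, which matches the k + 1 outcomes of the multinomial variable B(C).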

18
system action
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
19
user response
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
20
approach
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

multinomial regression problem
multinomial generalized linear model
sample efficient
stepwise approach
feature selection
BIC to control over-fitting
one separate model for each system action
$B_{\text{updated}}(C) \leftarrow f_{SA(C)}(B_{\text{initial}}(C), R)$
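
To make the "one model per system action" idea concrete, the sketch below uses the multinomial logistic (softmax) form, which is one common multinomial generalized linear model; the slide does not name the link function, so this choice is an assumption. The two features, the outcome labels, and every coefficient are invented for illustration and are not fitted values from the study.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-outcome linear scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict_updated_belief(
    weights: Dict[str, List[float]], features: Sequence[float]
) -> Dict[str, float]:
    """weights maps each outcome to [intercept, coef_1, coef_2, ...]."""
    x = [1.0] + list(features)
    scores = {}
    for outcome, coefs in weights.items():
        if len(coefs) != len(x):
            raise ValueError("coefficient vector does not match feature count")
        scores[outcome] = sum(c * xi for c, xi in zip(coefs, x))
    return softmax(scores)


# One separate model per system action; ALL coefficients are made up.
# Features: [initial_confidence, response_confidence]; "other" is the reference.
MODELS_BY_ACTION: Dict[str, Dict[str, List[float]]] = {
    "explicit_confirmation": {
        "initial": [0.0, 2.0, -1.0],
        "new": [0.0, -1.0, 2.0],
        "other": [0.0, 0.0, 0.0],
    },
    "implicit_confirmation": {
        "initial": [0.5, 1.5, -1.0],
        "new": [0.0, -1.0, 1.5],
        "other": [0.0, 0.0, 0.0],
    },
}


def update_belief(
    system_action: str, initial_confidence: float, response_confidence: float
) -> Dict[str, float]:
    """Select the model for this system action and predict the updated belief."""
    model = MODELS_BY_ACTION[system_action]
    return predict_updated_belief(model, [initial_confidence, response_confidence])
```

Calling `update_belief("implicit_confirmation", 0.65, 0.35)` returns a distribution over the initial hypothesis, a newly heard hypothesis, and "other", corresponding to the k=2 + other configuration.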

21
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

22
data

collected with RoomLine
a phone-based mixed-initiative spoken dialog system
conference room reservation
explicit and implicit confirmations
simple heuristic rules for belief updating
explicit confirm: yes / no
implicit confirm: new values overwrite old ones

23
corpus

user study
46 participants (first-time users)
10 scenario-based interactions each
corpus
449 sessions, 8848 user turns
orthographically transcribed
manually annotated
misunderstandings
corrections
correct concept values

24
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

25
models

k=2 + other (m=1, n=1)
k=3 + other (m=2, n=1)
k=4 + other (m=3, n=1)
full model
all features
basic model
all features except priors and confusability
runtime model
all features available at runtime

26
baselines

initial baseline
accuracy of system beliefs before the update
heuristic baseline
accuracy of heuristic update rule used by the system
correction baseline
accuracy if we knew exactly when the user corrects the system

27
results for k=2 hypotheses + other
explicit confirm
initial baseline (i)
heuristic baseline (h)
basic model (BM)
full model (FM)
runtime model (RM)
correction baseline (c)
28
a question remains

does this really matter?

29
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

30
a new user study

implemented models in RavenClaw
40 participants, first-time, non-native users
improvements more likely at high word-error-rates
10 scenario-driven interactions each
between-subjects: 2 gender-balanced groups
control: RoomLine using heuristic update rules
treatment: RoomLine using runtime models

31
effect on task success

logistic ANOVA on task success

p = 0.009
$\text{logit}(\text{TaskSuccess}) \leftarrow 2.09 - 0.05 \cdot \text{WER} + 0.69 \cdot \text{Condition}$
[Figure: probability of task success (%) vs. word error rate (%)]
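
The fitted model on this slide can be evaluated directly by passing the logit through the logistic function. The sketch below assumes that WER is expressed in percentage points (matching the 0 to 100 axis), that Condition is coded 1 for the treatment group and 0 for control, and that the Condition term enters with a positive sign (the transcript drops the operator). The function name is hypothetical.

```python
import math


def task_success_probability(wer_percent: float, treatment: bool) -> float:
    """Evaluate logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition."""
    condition = 1.0 if treatment else 0.0  # ASSUMED coding: treatment = 1
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))  # inverse of the logit link
```

Comparing `task_success_probability(w, True)` with `task_success_probability(w, False)` over a range of `w` reproduces the kind of curves the figure plots; under these assumptions the treatment curve lies above the control curve at every word error rate.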
32
how about efficiency?

ANOVA on task duration for successful tasks
$\text{Duration} \leftarrow -0.21 + 0.013 \cdot \text{WER} - 0.106 \cdot \text{Condition}$
significant improvement
equivalent to a 7.9% absolute reduction in word-error rate

p = 0.0003
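
The "equivalent reduction in word-error" figure can be read off the regression: the Condition effect on duration is divided by the per-point effect of WER. The sketch assumes the WER term is positive and the Condition term negative, and it uses the rounded coefficients shown on the slide, so it lands near, not exactly on, the reported 7.9. The function name is hypothetical.

```python
def equivalent_wer_reduction(
    wer_coef: float = 0.013,         # duration change per WER point (rounded)
    condition_coef: float = -0.106,  # duration change under treatment (rounded)
) -> float:
    """WER reduction (in points) with the same effect on duration as the treatment."""
    if wer_coef == 0:
        raise ValueError("wer_coef must be non-zero")
    return -condition_coef / wer_coef
```

With the rounded coefficients this gives roughly 8.2 points; the gap to 7.9 is presumably due to rounding of the displayed coefficients.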
33
outline

related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work

34
summary
U: CHICAGO / 0.72
S: traveling from Chicago. Where would you like to go?
U: SEOUL / 0.65
S: traveling to Seoul. What day did you need to travel?
U: THE TRAVELING to berlin P_M / 0.35
S: traveling in the afternoon. Okay, what day would you be departing Chicago?

arrival = Seoul / 0.65
arrival = ?
approach for constructing accurate beliefs
integrate information across multiple turns
large gains in task success and efficiency

35
other advantages

learns from data
tuned to the domain in which it operates
sample efficient / scalable
performs a local one-turn optimization
works independently on concepts
portable
decoupled from dialog task specification
no strong assumptions about dialog management

36
future work

integrate information from n-best list
integrate other high-level knowledge
domain-specific constraints
inter-concept dependencies
unsupervised / implicit learning
domain-specificity

37
thank you! questions?
38
improvements at different WER
[Figure: absolute improvement in task success vs. word-error-rate]
39
user study

10 scenarios, fixed order
presented graphically (explained during briefing)

participants compensated per task success

40
informative features

priors and confusability
initial confidence scores
concept identity
barge-in
expectation match
repeated grammar slots
