Learn more at: http://www.cs.cmu.edu

Title: belief updating in spoken dialog systems


1
belief updating in spoken dialog systems
  • Dan Bohus
  • Computer Science Department
  • Carnegie Mellon University
  • Pittsburgh, PA 15213

acknowledgements: Alex Rudnicky, CMU; Eric Horvitz and Tim Paek, MSR; Antoine Raux
2
spoken dialog systems
  • use natural language processing technology
  • engage in a goal-oriented conversation
  • research community
      • information access, command-and-control
      • personal assistants, taskable agents, tutoring systems
  • industry
      • simpler automated phone systems

3
the problem
  • misunderstandings lead to interaction breakdowns

4
more concretely
(S: system prompt; U: what the user said, followed in brackets by what the recognizer heard)
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT same PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [for minute since HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING to berlin P_M]
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [flight
    destination mr WEEKEND]
  • S: traveling on Saturday, August 12th... I have a
    flight departing Chicago at 1:40pm arrives Seoul
    at

5
understanding errors
  • stem from the speech recognition process
  • difficult operating conditions
  • typical word-error-rates
      • 20-30%
      • up to 50% for non-native speakers
  • strong negative impact on interactions
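
For reference, word-error-rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` is ours and does not come from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing reference word i, prev[j] holds the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Applied to the first user turn of the example dialog (the user says "Urbana Champaign", the recognizer returns "OKAY IN THAT same PAY"), this counts 5 errors against 2 reference words, so the rate for a single utterance can exceed 100%.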

6
pathways to increased robustness
  • improve recognition
  • gracefully handle errors through conversation
      • detect the problems
      • develop a set of recovery strategies
      • know how to choose between them (policy)

7
guarding against misunderstandings
  • S: What city are you leaving from?
  • U: [for minute since HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: [SEOUL]
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M]
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: [AT THE TENTH OF AUGUST]
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: [flight destination mr WEEKEND]

8
guarding against misunderstandings
  • S: What city are you leaving from?
  • U: [for minute since HEY] / 0.07
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: [CHICAGO] / 0.72
  • S: traveling from Chicago. Where would you like
    to go?
  • U: [SEOUL] / 0.65
    arrival = {Seoul / 0.65}
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M] / 0.35
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: [AT THE TENTH OF AUGUST] / 0.58
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: [flight destination mr WEEKEND] / 0.28

confidence scores
  • confirmation actions
      • reject
      • explicit confirmation: Did you say Seoul?
      • implicit confirmation: traveling to Seoul...
        What day did you need to travel?
      • accept
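
One common way to connect confidence scores to the four confirmation actions listed on this slide is a set of thresholds. The sketch below is only an illustration of that idea: the function name and all three threshold values are assumptions of ours, not values reported in the talk.

```python
def choose_confirmation_action(confidence: float,
                               reject_below: float = 0.2,
                               explicit_below: float = 0.5,
                               implicit_below: float = 0.9) -> str:
    """Map a confidence score in [0, 1] to a confirmation action.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < reject_below:
        return "reject"
    if confidence < explicit_below:
        return "explicit_confirmation"
    if confidence < implicit_below:
        return "implicit_confirmation"
    return "accept"
```

With these assumed thresholds, the 0.65 score attached to Seoul would trigger an implicit confirmation, which matches the system's behavior in the example dialog, and the 0.07 score on the first turn would be rejected.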
9
belief updating
  • S: What city are you leaving from?
  • U: [for minute since HEY] / 0.07
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: [CHICAGO] / 0.72
  • S: traveling from Chicago. Where would you like
    to go?
  • U: [SEOUL] / 0.65
    arrival = {Seoul / 0.65}
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M] / 0.35
    arrival = ?
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: [AT THE TENTH OF AUGUST] / 0.58
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: [flight destination mr WEEKEND] / 0.28

confidence scores: the value after each recognized user turn
10
belief updating problem statement
  • given
      • an initial belief B_initial(C) over concept C
      • a system action SA(C)
      • a user response R
  • construct an updated belief
      • B_updated(C) ← f(B_initial(C), SA(C), R)
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M]

11
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

related work proposed approach data
experiments and results global performance
conclusion
12
detecting misunderstandings and corrections
  • confidence annotation
      • word-level [Cox, Chase, Bansal, Ravishankar, etc]
      • semantic confidence annotation [Walker,
        San-Segundo, Bohus, etc]
  • correction detection [Litman, Swerts, Hirschberg,
    Krahmer, Levow]
      • detect when the user corrects the system

arrival = {Seoul / 0.65}
S: traveling to Seoul... What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
Conf = 0.35
Corr = 0.47
arrival = ?
13
current solutions for tracking beliefs
  • most systems only track single values
      • new values overwrite old values
  • use simple heuristic rules
      • explicit confirmation
          • S: did you say you wanted to fly to Seoul?
          • yes → trust hypothesis
          • no → delete hypothesis
          • other → non-understanding
      • implicit confirmation
          • S: traveling to Seoul... what day did you need
            to travel?
          • rely on new values overwriting old values
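
The heuristic rules on this slide can be written down in a few lines. This is a minimal sketch under our own assumptions: the function name, the string labels for actions and response types, and the returned status labels are hypothetical, and the user response is assumed to have already been classified as "yes", "no" or something else.

```python
from typing import Optional, Tuple


def heuristic_update(current: Optional[str],
                     system_action: str,
                     response_type: str,
                     new_value: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Single-value belief update; returns (value, status)."""
    if system_action == "explicit_confirmation":
        if response_type == "yes":
            return current, "trusted"            # yes -> trust hypothesis
        if response_type == "no":
            return None, "deleted"               # no -> delete hypothesis
        return current, "non_understanding"      # other -> non-understanding
    if system_action == "implicit_confirmation":
        if new_value is not None:
            return new_value, "overwritten"      # new values overwrite old values
        return current, "unchanged"
    raise ValueError(f"unknown system action: {system_action}")
```

Note that the implicit-confirmation branch keeps no memory of the previous value and ignores confidence scores entirely, which is the limitation the rest of the talk addresses.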

14
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

15
belief updating problem statement
arrival = {Seoul / 0.65}
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M] / 0.35

f → arrival = ?
  • given
      • an initial belief B_initial(C) over concept C
      • a system action SA(C)
      • a user response R
  • construct an updated belief
      • B_updated(C) ← f(B_initial(C), SA(C), R)

16
belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
[figure: belief over the departure concept]
  • most accurate representation
      • probability distribution over the set of
        possible values
  • however
      • system hears only a small number of conflicting
        values for a concept throughout a session
      • max 3 conflicting values heard
      • only in 7% of cases, more than 1 value heard

17
belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
  • compressed belief representation
      • k hypotheses + other
  • dynamically add and drop hypotheses
      • remember m hypotheses, add n new ones (m + n = k)

S: flying from Aspen... what is your destination?
U: [NO NO I DIDNT THAT THAT]
  • B(C) is a multinomial variable of degree k + 1
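
To make the "k hypotheses + other" representation concrete, here is a minimal sketch of one way to store and prune such a belief. The dictionary layout, the `<other>` key and both function names are illustrative assumptions of ours; the talk does not specify an implementation.

```python
OTHER = "<other>"  # hypothetical label for the lumped "other" outcome


def compress(belief: dict, k: int) -> dict:
    """Keep the k most probable values; fold the remaining mass into OTHER."""
    if k < 0:
        raise ValueError("k must be non-negative")
    named = {value: p for value, p in belief.items() if value != OTHER}
    top = sorted(named.items(), key=lambda item: item[1], reverse=True)[:k]
    kept = dict(top)
    kept[OTHER] = max(0.0, 1.0 - sum(kept.values()))
    return kept


def candidate_values(belief: dict, heard: list, m: int, n: int) -> list:
    """Outcomes of the updated belief: the m strongest old hypotheses,
    up to n newly heard values, plus OTHER (so k = m + n, degree k + 1)."""
    old = [value for value in compress(belief, m) if value != OTHER]
    new = []
    for value in heard:
        if value not in old and value not in new and len(new) < n:
            new.append(value)
    return old + new + [OTHER]
```

For instance, with made-up numbers, `compress({"Aspen": 0.6, "Austin": 0.3, "Boston": 0.05}, 2)` keeps Aspen and Austin and assigns the remaining mass to `<other>`. In the Aspen exchange on this slide no new value is heard, so `candidate_values({"Aspen": 0.8}, [], 1, 1)` would contain only Aspen and `<other>`.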

18
system action
B_updated(C) ← f(B_initial(C), SA(C), R)
19
user response
B_updated(C) ← f(B_initial(C), SA(C), R)
20
approach
B_updated(C) ← f(B_initial(C), SA(C), R)
  • multinomial regression problem
      • multinomial generalized linear model
      • sample efficient
  • stepwise approach
      • feature selection
      • BIC to control over-fitting
  • one separate model for each system action
      • B_updated(C) ← f_SA(C)(B_initial(C), R)
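
As a rough illustration of the pieces named on this slide, the sketch below shows the prediction side of a multinomial logistic model (one of the standard multinomial generalized linear models), a lookup of one model per system action, and the usual BIC formula. Everything concrete here is an assumption of ours: the two features, the weight values, the two-outcome layout and the function names are invented for illustration and are not the fitted models from the talk.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_belief(weights, features):
    """weights: one row per outcome, each row = [bias, coef_1, ..., coef_n]."""
    scores = [row[0] + sum(w * x for w, x in zip(row[1:], features))
              for row in weights]
    return softmax(scores)


# Hypothetical weights; features = [initial_confidence, user_said_no].
# Outcome order: [other, initial hypothesis]; "other" is the reference row.
MODELS = {
    "explicit_confirmation": [[0.0, 0.0, 0.0], [-1.0, 3.0, -4.0]],
    "implicit_confirmation": [[0.0, 0.0, 0.0], [-0.5, 3.0, -2.0]],
}


def update_belief(system_action, features):
    """Apply the model that belongs to the given system action."""
    return predict_belief(MODELS[system_action], features)


def bic(log_likelihood, num_parameters, num_samples):
    """Bayesian information criterion; lower is better."""
    return num_parameters * math.log(num_samples) - 2.0 * log_likelihood
```

In a stepwise procedure, a candidate feature would be kept only if adding it lowers the BIC, so the penalty term `num_parameters * log(num_samples)` is what guards against over-fitting.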

21
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

22
data
  • collected with RoomLine
      • a phone-based mixed-initiative spoken dialog
        system
      • conference room reservation
  • explicit and implicit confirmations
  • simple heuristic rules for belief updating
      • explicit confirm: yes / no
      • implicit confirm: new values overwrite old ones

23
corpus
  • user study
      • 46 participants (first-time users)
      • 10 scenario-based interactions each
  • corpus
      • 449 sessions, 8848 user turns
      • orthographically transcribed
      • manually annotated
          • misunderstandings
          • corrections
          • correct concept values

24
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

25
models
  • k=2 + other (m=1, n=1)
  • k=3 + other (m=2, n=1)
  • k=4 + other (m=3, n=1)
  • full model
      • all features
  • basic model
      • all features except priors and confusability
  • runtime model
      • all features available at runtime

26
baselines
  • initial baseline
      • accuracy of system beliefs before the update
  • heuristic baseline
      • accuracy of heuristic update rule used by the
        system
  • correction baseline
      • accuracy if we knew exactly when the user
        corrects the system

27
results for k=2 hyps + other
[chart: explicit confirm; series: initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM), runtime model (RM), correction baseline (c)]
28
a question remains
  • does this really matter?

29
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

30
a new user study
  • implemented models in RavenClaw
  • 40 participants, first-time, non-native users
      • improvements more likely at high word-error-rates
  • 10 scenario-driven interactions each
  • between-subjects, 2 gender-balanced groups
      • control: RoomLine using heuristic update rules
      • treatment: RoomLine using runtime models

31
effect on task success
  • logistic ANOVA on task success

p = 0.009
logit(TaskSuccess) ← 2.09 - 0.05·WER + 0.69·Condition
[plot: probability of task success (0-100%) vs. word error rate (0-100%)]
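
The fitted model on this slide can be evaluated directly to see what it predicts. The sketch below assumes three things that the transcript does not state explicitly: WER is expressed in percentage points (matching the 0-100 plot axis), Condition is coded 1 for the treatment group and 0 for control, and the sign of the Condition term, which was lost in extraction, is positive. The function name is ours.

```python
import math


def task_success_probability(wer_percent: float, treatment: bool) -> float:
    """Evaluate logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition.

    Assumes WER in percentage points and Condition = 1 for treatment.
    """
    condition = 1.0 if treatment else 0.0
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))  # inverse of the logit link
```

Under these assumptions the control group's predicted success probability falls to 50% at a WER of 2.09 / 0.05 = 41.8, and the Condition coefficient is worth 0.69 / 0.05 = 13.8 points of WER: the treatment group at a given WER is predicted to do as well as the control group at a WER 13.8 points lower.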
32
how about efficiency?
  • ANOVA on task duration for successful tasks
      • Duration ← -0.21 + 0.013·WER - 0.106·Condition
  • significant improvement (p = 0.0003)
      • equivalent to 7.9% absolute reduction in
        word-error-rate
33
outline
  • related work
  • proposed approach
  • data
  • experiments and results
  • effects on global performance
  • conclusion and future work

34
summary
[figure: arrival and departure beliefs tracked after each user turn]
  • U: [CHICAGO] / 0.72
  • S: traveling from Chicago. Where would you like
    to go?
  • U: [SEOUL] / 0.65
    arrival = {Seoul / 0.65}
  • S: traveling to Seoul... What day did you need to
    travel?
  • U: [THE TRAVELING to berlin P_M] / 0.35
    arrival = ?
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?

  • approach for constructing accurate beliefs
  • integrate information across multiple turns
  • large gains in task success and efficiency

35
other advantages
  • learns from data
      • tuned to the domain in which it operates
  • sample efficient / scalable
      • performs a local one-turn optimization
      • works independently on concepts
  • portable
      • decoupled from dialog task specification
      • no strong assumptions about dialog management

36
future work
  • integrate information from n-best list
  • integrate other high-level knowledge
  • domain-specific constraints
  • inter-concept dependencies
  • unsupervised / implicit learning
  • domain-specificity

37
thank you! questions?
38
improvements at different WER
[plot: absolute improvement in task success vs. word-error-rate]
39
user study
  • 10 scenarios, fixed order
  • presented graphically (explained during briefing)
  • participants compensated per task success

40
informative features
  • priors and confusability
  • initial confidence scores
  • concept identity
  • barge-in
  • expectation match
  • repeated grammar slots