Belief Updating in Spoken Dialog Systems - PowerPoint PPT Presentation

Slides: 40
Provided by: danb7
Learn more at: http://www.cs.cmu.edu
1
Belief Updating in Spoken Dialog Systems
  • Dan Bohus
  • www.cs.cmu.edu/dbohus
  • dbohus_at_cs.cmu.edu
  • Computer Science Department
  • Carnegie Mellon University
  • Pittsburgh, PA, 15217

2
problem
spoken language interfaces lack robustness when
faced with understanding errors.
  • stems mostly from speech recognition
  • spans most domains and interaction types

3
more concretely
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow. I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT
    DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th. I have a
    flight departing Chicago at 1:40pm, arrives Seoul
    at

4
non- and misunderstandings
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow. I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT
    DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th. I have a
    flight departing Chicago at 1:40pm, arrives Seoul
    at

5
approaches for increasing robustness
  • fix recognition
  • gracefully handle errors through interaction
  • detect the problems
  • develop a set of recovery strategies
  • know how to choose between them (policy)

6
six not-so-easy pieces
7
belief updating
misunderstandings
  • construct more accurate beliefs by integrating
    information over multiple turns

detection
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = ?
8
belief updating problem statement
  • given
  • an initial belief P_initial(C) over concept C
  • a system action SA
  • a user response R
  • construct an updated belief
  • P_updated(C) ← f(P_initial(C), SA, R)

destination = seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = ?
9
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

related work restricted version data user
response analysis experiment results
caveats future work
10
confidence annotation + heuristic updates
  • confidence annotation
  • traditionally focused on word-level errors
    [Chase, Cox, Bansal, Ravishankar]
  • more recently: semantic confidence annotation
    [Walker, San-Segundo, Bohus]
  • machine learning approach
  • results fairly good, but not perfect
  • heuristic updates
  • explicit confirmation: no → don't trust; yes → trust
  • implicit confirmation: no → don't trust; otherwise → trust
  • suboptimal for several reasons
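
The heuristic update rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's code: the function name `heuristic_update`, the response labels, and the use of 1.0 / 0.0 to stand for "trust" / "don't trust" are all hypothetical. The slide gives no rule for a non-yes, non-no answer to an explicit confirmation, so the sketch simply keeps the initial score in that case.

```python
def heuristic_update(action: str, response: str, initial_conf: float) -> float:
    """Return an updated confidence for the value being confirmed.

    action:   "explicit" or "implicit" (type of confirmation, assumed labels)
    response: "YES", "NO" or "OTHER" (coarse user response type, assumed labels)
    """
    if action not in ("explicit", "implicit"):
        raise ValueError(f"unknown confirmation action: {action!r}")
    if response == "NO":
        # Both rules: a "no" means the value is not trusted.
        return 0.0
    if action == "explicit":
        # Explicit confirmation: only a "yes" makes the value trusted.
        # Assumption: any other response leaves the belief unchanged.
        return 1.0 if response == "YES" else initial_conf
    # Implicit confirmation: anything other than "no" means trust.
    return 1.0


print(heuristic_update("implicit", "OTHER", 0.65))
```

Because the implicit rule trusts every response that is not a "no", a misrecognized value that the user ignores ends up fully trusted, which is one way such rules can be suboptimal.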

11
correction detection
  • detect if the user is trying to correct the
    system [Litman, Swerts, Hirschberg, Krahmer,
    Levow]
  • machine learning approach
  • features from different knowledge sources in the
    system
  • results fairly good, but not perfect

12
integration
  • confidence annotation and correction detection
    are useful tools
  • but separately, neither solves the problem
  • bridge together in a unified approach to
    accurately track beliefs

13
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

14
belief updating: general form
  • given
  • an initial belief P_initial(C) over concept C
  • a system action SA
  • a user response R
  • construct an updated belief
  • P_updated(C) ← f(P_initial(C), SA, R)

15
restricted version: 2 simplifications
  • compact belief
  • system unlikely to hear more than 3 or 4 values
  • single vs. multiple recognition results
  • in our data: max 3 values, only 6.9% have >1
    value
  • confidence score of top hypothesis
  • updates after confirmation actions
  • reduced problem
  • ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)

16
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

17
data
  • collected with RoomLine
  • a phone-based mixed-initiative spoken dialog
    system
  • conference room reservation
  • search and negotiation
  • explicit and implicit confirmations
  • confidence threshold model (+ some exploration)
  • unplanned implicit confirmations
  • I found 10 rooms for Friday between 1 and 3 p.m.
    Would you like a small room or a large one?

18
corpus
  • user study
  • 46 participants (naïve users)
  • 10 scenario-based interactions each
  • compensated per task success
  • corpus
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • rich annotation: correct concepts, corrections,
    etc.

19
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

20
user response types
  • following Krahmer and Swerts
  • study on Dutch train-table information system
  • 3 user response types
  • YES: yes, right, that's right, correct, etc.
  • NO: no, wrong, etc.
  • OTHER
  • cross-tabulated against correctness of
    confirmations
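
As a rough illustration of this analysis, the sketch below assigns each user response one of the three types and cross-tabulates it against whether the confirmed value was correct. It is an assumption-laden toy, not the annotation scheme used in the study: the phrase lists contain only the examples given on the slide (the "etc." is not filled in), matching on the start of the utterance is a simplification, and the names `response_type` and `cross_tabulate` are hypothetical.

```python
from collections import Counter

# Only the example phrases listed on the slide; the real lists are longer.
YES_PHRASES = ("that's right", "yes", "right", "correct")
NO_PHRASES = ("no", "wrong")


def response_type(utterance: str) -> str:
    """Classify an utterance as YES, NO or OTHER by its leading phrase."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    text = " ".join(w for w in words if w)
    for label, phrases in (("YES", YES_PHRASES), ("NO", NO_PHRASES)):
        for phrase in phrases:
            if text == phrase or text.startswith(phrase + " "):
                return label
    return "OTHER"


def cross_tabulate(samples):
    """samples: iterable of (value_was_correct, user_utterance) pairs."""
    table = Counter()
    for value_was_correct, utterance in samples:
        row = "CORRECT" if value_was_correct else "INCORRECT"
        table[(row, response_type(utterance))] += 1
    return table


table = cross_tabulate([
    (True, "yes"),
    (False, "no no I'm traveling to Birmingham"),
    (False, "Birmingham"),
])
for cell, count in sorted(table.items()):
    print(cell, count)
```

Dividing each cell by its row total gives row percentages of the kind reported for explicit and implicit confirmations.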

21
user responses to explicit confirmations
  • from transcripts (numbers in brackets from Krahmer & Swerts)

              YES          NO           Other
  CORRECT     94% [93%]    0% [0%]      5% [7%]
  INCORRECT   1% [6%]      72% [57%]    27% [37%]

  • from decoded

              YES          NO           Other
  CORRECT     87%          1%           12%
  INCORRECT   1%           61%          38%
22
other responses to explicit confirmations
  • 70%: users repeat the correct value
  • 15%: users don't address the question
  • attempt to shift conversation focus

              User does not correct     User corrects
  CORRECT     1159                      0
  INCORRECT   29 [10% of incorrect]     250 [90% of incorrect]
23
user responses to implicit confirmations
  • from transcripts (numbers in brackets from Krahmer & Swerts)

              YES          NO           Other
  CORRECT     30% [0%]     7% [0%]      63% [100%]
  INCORRECT   6% [0%]      33% [15%]    61% [85%]

  • from decoded

              YES          NO           Other
  CORRECT     28%          5%           67%
  INCORRECT   7%           27%          66%
24
ignoring errors in implicit confirmations
              User does not correct     User corrects
  CORRECT     552                       2
  INCORRECT   118 [51% of incorrect]    111 [49% of incorrect]
  • users correct later (40% of 118)
  • users interact strategically
  • correct only if essential

                 not corrected later    corrected later
  not critical   55                     2
  critical       14                     47
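
The percentages on this slide can be re-derived from the counts. The short check below assumes that the second table's columns are "not corrected later" / "corrected later" and its rows are "not critical" / "critical" (the negation markers did not survive in the transcript); all variable names are illustrative.

```python
# Counts from the slide: user reactions to INCORRECT implicit confirmations.
not_corrected, corrected = 118, 111
incorrect_total = not_corrected + corrected

share_not_corrected = not_corrected / incorrect_total
share_corrected = corrected / incorrect_total

# Assumed layout: row -> (not corrected later, corrected later).
later = {"not critical": (55, 2), "critical": (14, 47)}
corrected_later = sum(row[1] for row in later.values())
share_corrected_later = corrected_later / not_corrected

print(f"not corrected: {share_not_corrected:.1%}, corrected: {share_corrected:.1%}")
print(f"corrected later: {share_corrected_later:.1%} of {not_corrected}")
```

Under that reading the four cells add up to the 118 uncorrected errors, and the derived shares agree with the figures on the slide to within a couple of percentage points.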
25
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

26
machine learning approach
  • need good probability outputs
  • low cross-entropy between model predictions and
    reality
  • cross-entropy = negative average log posterior
  • logistic regression
  • sample efficient
  • stepwise approach → feature selection
  • logistic model tree for each action
  • root splits on response-type
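
A minimal sketch of these ideas, under loud assumptions: the leaf coefficients below are invented for illustration (they are not learned values from the study), each leaf uses only the initial confidence score as a feature, and `updated_confidence`, `soft_error` and `hard_error` are hypothetical names. The sketch shows a root split on response type with a logistic regression at each leaf, plus the cross-entropy computed as the negative average log posterior of the true outcome. Reading the "soft error" reported in the results as this cross-entropy, and "hard error" as the thresholded misclassification rate, is also an assumption.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical leaves for ONE system action: response type -> (bias, weight).
LEAVES = {"YES": (2.0, 1.5), "NO": (-3.0, 1.0), "OTHER": (-0.5, 2.0)}


def updated_confidence(response_type: str, initial_conf: float) -> float:
    """Root splits on response type; each leaf is a logistic regression."""
    bias, weight = LEAVES[response_type]
    return logistic(bias + weight * initial_conf)


def soft_error(probs, labels, eps=1e-12):
    """Cross-entropy: negative average log posterior of the true outcome."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y else 1.0 - p
        total += math.log(min(max(p_true, eps), 1.0 - eps))
    return -total / len(labels)


def hard_error(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded prediction is wrong."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


probs = [updated_confidence(r, 0.65) for r in ("YES", "NO", "OTHER")]
labels = [1, 0, 1]  # toy ground truth: was the value correct?
print(soft_error(probs, labels), hard_error(probs, labels))
```

A model can have a low hard error and still a high cross-entropy if its probabilities are overconfident, which is why good probability outputs are evaluated separately.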

27
features. target.
  • initial situation
  • initial confidence score
  • concept identity, dialog state, turn number
  • system action
  • other actions performed in parallel
  • features of the user response
  • acoustic / prosodic features
  • lexical features
  • grammatical features
  • dialog-level features
  • target: was the value correct?

28
baselines
  • initial baseline
  • accuracy of system beliefs before the update
  • heuristic baseline
  • accuracy of heuristic rule currently used in the
    system
  • oracle baseline
  • accuracy if we knew exactly when the user is
    correcting the system

29
results: explicit confirmation
[chart: hard error (%) and soft error]
30
results: implicit confirmation
[chart: hard error (%) and soft error]
31
results: unplanned implicit confirmation
[chart: hard error (%) and soft error]
32
informative features
  • initial confidence score
  • prosody features
  • barge-in
  • expectation match
  • repeated grammar slots
  • concept id

33
outline
  • related work
  • a reduced version. approach
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

34
eliminate simplification 1
  • current restricted version
  • belief = confidence score of top hypothesis
  • only 6.9% of cases had more than 1 hypothesis
  • extend to
  • N hypotheses + 1 (other), where N is a small
    integer (2 or 3)
  • approach: multinomial generalized linear model
  • use information from multiple recognition
    hypotheses
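
One common instance of a multinomial generalized linear model is multinomial logistic regression (a softmax over linear scores). The sketch below shows how such a model would produce a belief over N hypotheses plus an "other" outcome. It illustrates the general technique only, not the model proposed in the talk: the feature vector and weights are made up, and `multinomial_belief` is a hypothetical name.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_belief(features, weights):
    """weights: one weight vector per outcome (N hypotheses + 1 'other')."""
    scores = [sum(w * x for w, x in zip(ws, features)) for ws in weights]
    return softmax(scores)


# Toy example with N = 2 hypotheses plus "other"; features = [bias, top confidence].
features = [1.0, 0.65]
weights = [
    [0.0, 2.0],   # hypothesis 1 (hypothetical weights)
    [0.0, 0.5],   # hypothesis 2
    [0.0, 0.0],   # other
]
belief = multinomial_belief(features, weights)
print(belief)
```

The output is a full distribution over the N + 1 outcomes, rather than a single confidence score for the top hypothesis.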

35
eliminate simplification 2
  • current restricted version
  • only updates following system confirmation
    actions
  • users might correct the system at any point
  • extend to
  • updates after all system actions

36
shameless self promotion
  • rejection threshold adaptation
  • non-understanding impact on performance
    [Interspeech-05]
  • comparative analysis of 10 recovery
    strategies [SIGdial-05]
  • wizard experiment
  • towards learning non-understanding recovery
    policies [SIGdial-05]

37
shameless CMU promotion
  • Ananlada (Moss) Chotimongkol
  • automatic concept and task structure acquisition
  • Antoine Raux
  • turn-taking, conversation micro-management
  • Jahanzeb Sherwani
  • multimodal personal information management
  • Satanjeev Banerjee
  • meeting understanding
  • Stefanie Tomko
  • universal speech interface
  • Thomas Harris
  • multi-participant dialog
  • DoD / Young Researchers Roundtable

38
thank you!
39
a more subtle caveat
  • distribution of training data
  • confidence annotator + heuristic update rules
  • distribution of run-time data
  • confidence annotator + learned model
  • always a problem when interacting with the world
  • hopefully, distribution shift will not cause
    large degradation in performance
  • remains to validate empirically
  • maybe a bootstrap approach?