1
sorry, I didn't catch that! an investigation
of non-understandings and recovery strategies
  • Dan Bohus www.cs.cmu.edu/~dbohus
  • Alexander I. Rudnicky www.cs.cmu.edu/~air
  • Computer Science Department
  • Carnegie Mellon University
  • Pittsburgh, PA, 15213

2
systems often do not understand correctly
  • non-understandings and misunderstandings

3
systems often do not understand correctly
  • detection: typically trivial, although diagnosis is not
  • strategies: large space of strategies; tradeoffs
    between them not well understood
  • policy (knowing how to engage the strategies):
    simple heuristics (incremental prompting)

4
questions under investigation
  • what are the main causes of non-understandings?
  • how large is their impact on performance?
  • how do various recovery strategies compare to
    each other?
  • what are the relationships between strategies and
    user behaviors?
  • data
  • can we improve global dialog performance by using
    a smarter policy?
  • if yes, can we learn a better policy from data?

5
data collection
  • Roomline
  • phone-based, mixed-initiative system
  • conference room reservations
  • experimental design
  • control group: uninformed recovery policy
  • wizard group: recovery policy implemented by a
    wizard
  • 46 participants, first-time users
  • tasks experimental procedure
  • up to 10 scenario-driven interactions

6
non-understanding recovery strategies
S: For when do you need the conference room?
 1. ASK REPEAT: Could you please repeat that?
 2. ASK REPHRASE: Could you please try to rephrase that?
 3. NOTIFY (NTFY): Sorry, I didn't catch that ...
 4. YIELD TURN (YLD)
 5. REPROMPT (RP): For when do you need the conference room?
 6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation
 7. MOVE-ON: Sorry, I didn't catch that. For which day you need the room?
 8. YOU CAN SAY (YCS): Sorry, I didn't catch that. For when do you need the conference room? You can say something like tomorrow at 10 am
 9. TERSE YOU CAN SAY (TYCS): Sorry, I didn't catch that. You can say something like tomorrow at 10 am
10. FULL HELP (HELP): Sorry, I didn't catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am
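
The ten strategies on this slide can be kept in a small lookup table keyed by their short labels. The sketch below is illustrative only: `STRATEGIES` and `uninformed_policy` are hypothetical names, the labels AREP, ARPH and MOVE are borrowed from the pair-wise comparison table later in the deck, and choosing uniformly at random is an assumption about what an "uninformed" policy might look like, not a description of the Roomline implementation.

```python
import random

# Labels and prompts as listed on the slide. YIELD TURN has no prompt text
# on the slide, so it is left empty here.
STRATEGIES = {
    "AREP": "Could you please repeat that?",
    "ARPH": "Could you please try to rephrase that?",
    "NTFY": "Sorry, I didn't catch that ...",
    "YLD": "",
    "RP": "For when do you need the conference room?",
    "DRP": "Right now I need to know the date and time for when you need "
           "the reservation",
    "MOVE": "Sorry, I didn't catch that. For which day you need the room?",
    "YCS": "Sorry, I didn't catch that. For when do you need the conference "
           "room? You can say something like tomorrow at 10 am",
    "TYCS": "Sorry, I didn't catch that. You can say something like "
            "tomorrow at 10 am",
    "HELP": "Sorry, I didn't catch that. I am currently trying to make a "
            "conference room reservation for you. Right now I need to know "
            "the date and time for when you need the reservation. You can "
            "say something like tomorrow at 10 am",
}


def uninformed_policy(rng: random.Random) -> str:
    """Pick a strategy label uniformly at random (an assumed behavior)."""
    return rng.choice(sorted(STRATEGIES))


if __name__ == "__main__":
    rng = random.Random(0)
    label = uninformed_policy(rng)
    print(label, "->", STRATEGIES[label] or "(no prompt)")
```

Keeping the policy as a function of the random generator makes it easy to swap in a smarter, data-driven policy later without touching the prompt table.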
7
corpus statistics
  • 449 sessions
  • 8278 user turns
  • utterances transcribed and checked
  • manual annotations
  • misunderstandings
  • correct concept values at each turn
  • sources of understanding errors
  • user response-types to recovery strategies

8
questions under investigation
  • data
  • what are the main causes of non-understandings?
  • how large is their impact on performance?
  • how do various recovery strategies compare to
    each other?
  • what are the relationships between strategies and
    user behaviors?

9
causes of non-understandings
[Figure: user / system diagram with four levels: conversation level, intention level, signal level, channel level]
10
causes of non-understandings
  • out-of-application (conversation level): 16%
  • out-of-grammar (intention level): 16%
  • ASR error (signal level): 62%
  • endpointer error (channel level)
11
questions under investigation
  • data
  • what are the main causes of non-understandings?
  • how large is their impact on performance?
  • how do various recovery strategies compare to
    each other?
  • what are the relationships between strategies and
    user behaviors?

data causes of non-understandings impact on
performance strategy comparison user behaviors
12
modeling impact on performance
  • logistic regression
  • P(Task Success)

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{FNON})}}$$
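
To make the logistic model on this slide concrete, here is a minimal sketch that evaluates P(Task Success) for a few values of FNON. The slide does not define FNON or report the fitted coefficients, so the function name `p_task_success` and the values of `ALPHA` and `BETA` below are hypothetical and chosen only for illustration.

```python
import math


def p_task_success(fnon: float, alpha: float, beta: float) -> float:
    """Logistic model: 1 / (1 + exp(-(alpha + beta * fnon)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


# Hypothetical coefficients (not taken from the study). A negative beta
# means the predicted success probability drops as FNON grows.
ALPHA, BETA = 2.0, -8.0

if __name__ == "__main__":
    for fnon in (0.0, 0.1, 0.25, 0.5):
        p = p_task_success(fnon, ALPHA, BETA)
        print(f"FNON={fnon:.2f}  P(success)={p:.3f}")
```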
13
questions under investigation
  • data
  • what are the main causes of non-understandings?
  • how large is their impact on performance?
  • how do various recovery strategies compare to
    each other?
  • what are the relationships between strategies and
    user behaviors?

14
strategy performance: recovery rate
[Figure: recovery rate by strategy (Help, Yield, Notify, MoveOn, RePrompt, YouCanSay, AskRepeat, AskRephrase, TerseYouCanSay, DetailedReprompt)]
  • overall logistic ANOVA
  • significant differences in mean recovery rates
  • all pairs comparison (corrected using FDR)
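
The slide says the all-pairs comparison was corrected using FDR but does not name the procedure. A common choice is the Benjamini-Hochberg step-up procedure, sketched below as an assumption rather than as the authors' method. The p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values for a handful of hypothetical pairwise comparisons.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
    print(benjamini_hochberg(pvals))
```

With ten strategies there are 45 pairwise comparisons, which is why some correction for multiple testing is needed before reading individual differences as significant.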

15
questions under investigation
  • data
  • what are the main causes of non-understandings?
  • how large is their impact on performance?
  • how do various recovery strategies compare to
    each other?
  • what are the relationships between strategies and
    user behaviors?

16
user response types
  • tagging scheme by Shin
  • also used by Choularton, Raux
  • 5 categories
  • repeat
  • rephrase
  • contradict
  • change
  • other
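
Once each user response has been tagged with one of the five categories, the per-category shares can be computed directly. The sketch below assumes the tags are available as plain strings; `RESPONSE_TYPES`, `response_distribution` and the sample tags are hypothetical and not drawn from the corpus.

```python
from collections import Counter

# The five categories of the tagging scheme listed on the slide.
RESPONSE_TYPES = ("repeat", "rephrase", "contradict", "change", "other")


def response_distribution(tags):
    """Return the percentage of each response type in a list of tags."""
    unknown = set(tags) - set(RESPONSE_TYPES)
    if unknown:
        raise ValueError(f"unknown response types: {sorted(unknown)}")
    counts = Counter(tags)
    total = len(tags)
    return {
        t: (100.0 * counts[t] / total if total else 0.0)
        for t in RESPONSE_TYPES
    }


if __name__ == "__main__":
    # Made-up tags, for illustration only.
    tags = ["repeat", "rephrase", "rephrase", "change",
            "other", "rephrase", "repeat", "rephrase"]
    print(response_distribution(tags))
```

Running the same function on the tags collected after each recovery strategy would give a per-strategy breakdown.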

17
response types after non-understanding
[Figure: distribution of user response types (contradict, change, other, rephrase, repeat; axis 0 to 50) in three corpora: Communicator (Shin et al.), Pizza (Choularton & Dale), Roomline (this study)]
18
user response types by strategy
[Figure: share of Repeat, Rephrase, Change and Other responses (axis 0 to 100) for each strategy: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt]
19
summary
  • sources of non-understandings: ASR, but also
    language errors → more shaping strategies
  • impact on performance: regression model allows
    better quantitative assessment
  • strategy comparison: help, move-on → further
    investigate move-on
  • user responses: margin for improving control over
    user responses
  • can we improve global dialog performance by using
    a smarter policy? yes
  • can we learn a better policy from data?
    preliminary results promising

20
thank you! questions?
21
rejections
22
strategy performance assessment
  • recovery rate
  • recovery utility
  • weighted sum of correctly and incorrectly
    acquired concepts
  • weights are determined in a data-driven fashion
  • recovery efficiency
  • also takes time to recovery into account
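
The first two measures on this slide can be sketched as follows. This is an illustration under stated assumptions: recovery is taken to mean that the turn after the strategy is correctly understood, the `RecoveryAttempt` record and the function names are hypothetical, and the weights are placeholders rather than the data-driven values the slide mentions. Recovery efficiency is left out because the slide does not say how time enters the measure.

```python
from dataclasses import dataclass


@dataclass
class RecoveryAttempt:
    next_turn_understood: bool  # was the following turn correctly understood?
    correct_concepts: int       # concepts acquired correctly in that turn
    incorrect_concepts: int     # concepts acquired incorrectly in that turn


def recovery_rate(attempts):
    """Fraction of attempts whose following turn was correctly understood."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return sum(a.next_turn_understood for a in attempts) / len(attempts)


def recovery_utility(attempt, w_correct=1.0, w_incorrect=-2.0):
    """Weighted sum of correctly and incorrectly acquired concepts.

    The default weights are placeholders, not values from the study.
    """
    return (w_correct * attempt.correct_concepts
            + w_incorrect * attempt.incorrect_concepts)


if __name__ == "__main__":
    # Made-up attempts for one hypothetical strategy.
    attempts = [
        RecoveryAttempt(True, 2, 0),
        RecoveryAttempt(False, 0, 1),
        RecoveryAttempt(True, 1, 0),
        RecoveryAttempt(False, 0, 0),
    ]
    print("recovery rate:", recovery_rate(attempts))
    print("mean utility:",
          sum(recovery_utility(a) for a in attempts) / len(attempts))
```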

23
experimental design scenarios
  • 10 scenarios, fixed order
  • presented graphically (explained during briefing)

24
strategy pair-wise comparison
  • recovery performance ranked list, based on
    pair-wise t-tests

RNK  strategy  MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
1    MOVE      -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
2    HELP      -     -     -     -     -     -     1.55  1.64  1.73  1.87
3    TYCS      -     -     -     -     -     -     1.5   1.58  1.68  1.81
4    RP        -     -     -     -     -     -     -     -     1.46  1.58
5    YCS       -     -     -     -     -     -     -     -     1.44  1.55
6    ARPH      -     -     -     -     -     -     -     -     1.42  1.53
?    DRP       -     -     -     -     -     -     -     -     -     -
?    NTFY      -     -     -     -     -     -     -     -     -     -
?    AREP      -     -     -     -     -     -     -     -     -     -
?    YLD       -     -     -     -     -     -     -     -     -     -
  • CER evaluation shows similar results

25
recovery for various response-types
27
impact of recovery rate on performance
  • recovery: next turn is correctly understood
  • P(Task Success)

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{RecoveryRate})}}$$
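
The coefficients of a model like the one on this slide would normally be estimated from per-session data. The sketch below fits α and β by plain gradient ascent on the mean log-likelihood. It is an assumption-laden illustration: the slide does not say how the model was fitted, `fit_logistic` is a hypothetical helper, and the sessions are synthetic.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit alpha and beta of P(y=1) = sigmoid(alpha + beta * x)
    by gradient ascent on the mean log-likelihood."""
    alpha = beta = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(alpha + beta * x)  # residual for this session
            grad_a += err
            grad_b += err * x
        alpha += lr * grad_a / n
        beta += lr * grad_b / n
    return alpha, beta


# Synthetic sessions (not real data): recovery rate and task success (0/1).
recovery_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
task_success = [0, 0, 1, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    alpha, beta = fit_logistic(recovery_rates, task_success)
    print(f"alpha={alpha:.2f}  beta={beta:.2f}")
    print(f"P(success | recovery rate 0.9) = {sigmoid(alpha + beta * 0.9):.2f}")
```

In practice a statistics package would also report standard errors and significance for β, which a hand-rolled fit like this one does not.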