Title: Intelligent Help (or lack thereof) in Spoken Dialog Systems
1. HELP!
- Intelligent Help (or lack thereof) in Spoken Dialog Systems
- Dialogs on Dialogs discussion
- Stefanie Tomko
- 20-Feb-04
2. Papers
- Adding intelligent help to mixed-initiative spoken dialogue systems. G. Gorrell, I. Lewin, and M. Rayner. In Proc. of ICSLP, 2002.
- Targeted help for spoken dialogue systems: intelligent feedback improves naive users' performance. B.A. Hockey, O. Lemon, E. Campana, L. Hiatt, G. Aist, J. Hieronymus, A. Gruenstein, and J. Dowding. In Proc. of EACL, 2003.
- ??? There isn't a lot out there about this!
3. We need Help!
- 56% of NL system users in an experiment asked for help without explicit knowledge that they could do so
- Speech Graffiti users knew about various help/orientation keywords:
- 91% used 'options'
- 70% used 'where was I?'
- 48% used 'help'
4. What is Help?
How do I do <something>?
I didn't understand what you said.
How do I say that?
Where was I?
What can I do?
5. User-initiated Help examples
- NL Movieline
- Wordy, general
- This system allows you to obtain movie and theater information for Pittsburgh. You can ask for the location, phone number, and movie listing for a certain theater. Or, you can ask about a particular movie to get the rating and/or genre and find out where it is playing. Specify both a movie and theater to obtain showtimes. If you get stuck, you can say 'Reset' to start over.
- Jupiter
- Example based
- You can ask about general weather forecasts as well as information on temperature, windspeed, ...
- Try saying one of the following: 'what's the weather for Denver?' 'what cities do you know about?' 'what do you know about besides weather?' 'what can I say?'
- Try saying one of the following: 'are there any advisories for the United States?' 'what is the extended forecast for Boston?' 'will it rain in Toronto?'
6. User-initiated Help examples (2)
- Speech Graffiti
- Somewhat "state" based
- <slot> options: you can say, rating is... G, PG, PG-13, R, NC-17, not rated, or you can ask, what is the rating?
- options: you can specify or ask about title, show time, day
- help gives a list of keywords on the 1st round, then gives an explanation of keyword functions
- TellMe
- Orientation, or, at main level, lots of general system info
- You're in Sports, in the NHL section.
7. These are all kind of "dumb"
- They might not take system state into account
- They aren't smart about what users really want to do
- They might not tell users exactly how to speak
- They might not orient users to where they are in the system
- But at least they give users some information
8. System-initiated "Help" examples
- NL Movieline
- Excuse me?
- Didn't catch that.
- Jupiter
- Pardon me?
- Speech Graffiti
- I'm sorry, I'm having trouble understanding you
- TellMe
- I'm sorry, I didn't get that. Please say a category in Travel.
- These are really dumb!
9. Intelligent/Targeted Help
- Makes system-initiated help a little smarter
- Goal: provide immediate feedback, tailored to what the user said, for cases in which the system was not able to understand an utterance
- A somewhat different perspective compared to traditional error handling:
- Traditional: What should I do to deal with this error?
- Targeted help: How can I help the user not make this error in the future?
10. Gorrell et al. ICSLP paper
- Grammar-based vs. statistical LMs
- Grammars easy to create (?)
- GB performs better if users know what to say
- SLMs better for unusual/less constrained utts
- 1st attempt: recognition only (i.e., no help)
- Run all utts through GBLM and SLM, choose based on confidence scores
- Not reliable enough
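The "run both recognizers, pick by confidence" strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not Gorrell et al.'s implementation. The `Hypothesis` type, the [0, 1] score range, the scores themselves, and the tie-breaking rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognizer's best guess (hypothetical structure)."""
    source: str        # "GBLM" or "SLM"
    words: str         # recognized word string
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def choose_hypothesis(gblm: Hypothesis, slm: Hypothesis) -> Hypothesis:
    """Keep whichever recognizer reports the higher confidence.

    Assumption: ties go to the grammar-based result, since GB recognition
    tends to do better when users know what to say.
    """
    return gblm if gblm.confidence >= slm.confidence else slm


gb = Hypothesis("GBLM", "turn on the kitchen light", 0.62)
sl = Hypothesis("SLM", "turn on the kitchen lights please", 0.71)
print(choose_hypothesis(gb, sl).source)  # SLM
```

The slide's point is that raw confidence scores were not reliable enough to make this comparison work well.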
11. On/Off House
- User initiative
- Natural language
- Turn off the light in the bathroom
- Are the hall and kitchen lights switched on?
- Could you tell me which lights are on?
12. Targeted Help
13. Classification
- Hand-classified training set
- 12 classes
- 24 features
- Most common classes
- REFEXP_COMMAND (35)
- I didn't quite catch that. To turn a device on or off, you could try something like 'turn on the kitchen light.'
- LONG_COMMAND (13)
- I didn't quite catch that. Long commands can be difficult to understand. Perhaps try giving separate commands for each device.
- PRON_COMMAND (11)
- I didn't quite catch that. To change the status of a device or group of devices you've just referred to, you could try for example 'turn it on' or 'turn them off.'
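Once the classifier has picked a class, producing help is a lookup from class label to canned message. The sketch below copies the three messages from this slide. The fallback text and the function name are hypothetical. In the paper the class comes from a trained decision tree, which is not modeled here.

```python
# Help messages copied from the slide; FALLBACK is a hypothetical default for the other classes.
HELP_MESSAGES = {
    "REFEXP_COMMAND": ("I didn't quite catch that. To turn a device on or off, "
                       "you could try something like 'turn on the kitchen light.'"),
    "LONG_COMMAND": ("I didn't quite catch that. Long commands can be difficult "
                     "to understand. Perhaps try giving separate commands for each device."),
    "PRON_COMMAND": ("I didn't quite catch that. To change the status of a device "
                     "or group of devices you've just referred to, you could try "
                     "for example 'turn it on' or 'turn them off.'"),
}

FALLBACK = "I didn't quite catch that."


def targeted_help(predicted_class: str) -> str:
    """Map the classifier's predicted class label to its help message."""
    return HELP_MESSAGES.get(predicted_class, FALLBACK)


print(targeted_help("LONG_COMMAND"))
```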
14. Evaluation
- Baseline classification error: 65%
- Cross-validated final decision tree error: 12%
- Between-subjects user study; task:
- Call a voice-enabled house and leave it in a secure state
- No training
- Targeted help (N=16) vs. control help (N=15)
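One possible reading of the 65% baseline error is a majority-class baseline. This is an inference that the slide does not state. If REFEXP_COMMAND accounts for 35% of the training utterances, then always guessing that class is wrong 65% of the time. The sketch below assumes that the "(35)" on the previous slide is a percentage. The label list is made up to match that assumption.

```python
from collections import Counter


def majority_baseline_error(labels: list[str]) -> float:
    """Error rate of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return 1 - majority_count / len(labels)


# Hypothetical distribution over 100 utterances: 35/13/11 for the named classes,
# the remaining 41 spread over nine other (made-up) classes.
labels = (["REFEXP_COMMAND"] * 35 + ["LONG_COMMAND"] * 13 + ["PRON_COMMAND"] * 11
          + [f"OTHER_{i % 9}" for i in range(41)])
print(round(majority_baseline_error(labels), 2))  # 0.65
```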
15. Results
Measure               Targeted help   Control help
WER (GB only?)        39%             55%
Grammaticality        47%             36%
WER (?), 1st 5 utts   45%             76%
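WER is normally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is the standard textbook formulation. It is not necessarily the scoring tool used in the paper, and the example utterances are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (off -> of) plus one deletion (the): 2 errors / 7 words, about 0.286
print(word_error_rate("turn off the light in the bathroom",
                      "turn of the light in bathroom"))
```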
16. Results (2)
- Targeted help group had more variety in constructions
- Targeted help users requested help more often
- Six TH users vs. only one (!) control user
- Longer dialogs in TH group
- Some of this is system exploration
- No significant differences in awareness of final house state or perception of the system's abilities
- No comparison of task completion
17. Hockey et al. EACL paper
- Domain: WITAS command and control for a robotic helicopter
- Targeted Help is an independent module
[Flowchart: decision nodes "Grammar-based LM parsable?" and "SLM parsable?", each with yes/no branches; action nodes "Send to SLM", "Play regular output", and "Create and play appropriate help message"]
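The flowchart on this slide can be read as a small routing function. The branch mapping below is reconstructed from the node labels and is an assumption, not taken from the paper:

- A parsable grammar-based result goes to normal processing.
- Otherwise the utterance is sent to the SLM, and a usable SLM hypothesis drives a help message.
- If neither produces anything usable, a generic prompt is played.

"SLM parsable?" is approximated here as "the SLM returned a non-empty hypothesis". All recognizers, the grammar, and the message wording are toy stand-ins.

```python
from typing import Callable, Optional

Recognizer = Callable[[str], Optional[str]]  # audio -> word string, or None if nothing recognized


def route_utterance(audio: str,
                    gblm_recognize: Recognizer,
                    slm_recognize: Recognizer,
                    parses: Callable[[str], bool],
                    make_help: Callable[[str], str]) -> str:
    # 1. Grammar-based LM first: a parsable result goes to normal processing.
    words = gblm_recognize(audio)
    if words is not None and parses(words):
        return "REGULAR: " + words
    # 2. Otherwise "send to SLM"; if it yields a hypothesis, build targeted help from it.
    slm_words = slm_recognize(audio)
    if slm_words:
        return "HELP: " + make_help(slm_words)
    # 3. Both LMs failed: generic fallback (hypothetical wording).
    return "HELP: I didn't quite catch that."


# Toy setup: "audio" is just a string standing in for speech.
GRAMMAR = {"zoom in", "land at the tower"}


def toy_gblm(audio: str) -> Optional[str]:
    return audio if audio in GRAMMAR else None  # only "hears" in-grammar speech


def toy_slm(audio: str) -> Optional[str]:
    return audio or None  # "hears" everything non-empty


def toy_help(words: str) -> str:
    return f"The system heard '{words}'."


print(route_utterance("zoom in", toy_gblm, toy_slm, GRAMMAR.__contains__, toy_help))
print(route_utterance("zoom in on the red car", toy_gblm, toy_slm, GRAMMAR.__contains__, toy_help))
```

Keeping the help logic behind `make_help` mirrors the slide's point that targeted help is an independent module. The dialogue manager only sees either a regular result or a help message.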
18. Help message content
- Message contains one or more of:
- A. What the system heard
- A report of the backup SLM recognition hypothesis
- B. What the problem was (diagnostic)
- A description of the problem with the user's utterance
- C. What you might say instead
- A similar in-grammar example
- Rule-based determination of exact content for B and C
- Not clear how often A, B, and C appear, or in what combinations
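A message built from any subset of parts A, B, and C might be assembled as below. The slides only name the three parts. The sentence templates, the function name, and the generic fallback are hypothetical.

```python
from typing import Optional


def compose_help(heard: Optional[str] = None,
                 problem: Optional[str] = None,
                 suggestion: Optional[str] = None) -> str:
    """Combine whichever of the three parts are available into one message."""
    parts = []
    if heard:       # A. what the system heard (backup SLM hypothesis)
        parts.append(f"The system heard '{heard}'.")
    if problem:     # B. diagnostic description of the problem
        parts.append(problem)
    if suggestion:  # C. similar in-grammar example
        parts.append(f"You could try saying '{suggestion}'.")
    return " ".join(parts) if parts else "I didn't quite catch that."


print(compose_help(heard="fly to the hospital",
                   problem="The system doesn't understand the word 'hospital'.",
                   suggestion="fly to the tower"))
```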
19. B. Diagnostic
- Endpointing
- Check whether the first recognized word is a valid initial word for parsable input
- Out-of-vocabulary
- Compare SLM vocab to GBLM vocab
- Subcategorization
- Check features of verbs in SLM hypothesis
- Zoom in: intransitive
- → 'Zoom in on the red car' is not accepted
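The three diagnostics can be sketched as simple checks on the SLM hypothesis. This is a minimal illustration, not the WITAS system's rules. The vocabularies and the list of valid initial words are hypothetical. The subcategorization test is a crude string-prefix stand-in for checking verb features, and it follows the slide's example in treating "zoom in" as intransitive.

```python
# Toy resources; all contents are hypothetical stand-ins for the real grammar and lexicon.
GBLM_VOCAB = {"zoom", "in", "on", "land", "at", "the", "tower", "red", "car",
              "fly", "to", "is", "where"}
INITIAL_WORDS = {"zoom", "land", "fly", "where", "is"}  # words that can begin parsable input
INTRANSITIVE_VERBS = {"zoom in"}                           # verbs that take no object


def endpointing_problem(words: list[str]) -> bool:
    """Endpointing: is the first recognized word an invalid initial word?"""
    return bool(words) and words[0] not in INITIAL_WORDS


def out_of_vocabulary(words: list[str]) -> list[str]:
    """OOV: words in the SLM hypothesis that the GBLM vocabulary lacks (order kept)."""
    return [w for w in words if w not in GBLM_VOCAB]


def subcat_problem(words: list[str]) -> bool:
    """Subcategorization: an intransitive verb followed by further material."""
    text = " ".join(words)
    return any(text.startswith(verb + " ") for verb in INTRANSITIVE_VERBS)


def diagnose(slm_hypothesis: str) -> list[str]:
    words = slm_hypothesis.lower().split()
    problems = []
    if endpointing_problem(words):
        problems.append("endpointing")
    oov = out_of_vocabulary(words)
    if oov:
        problems.append("out-of-vocabulary: " + ", ".join(oov))
    if subcat_problem(words):
        problems.append("subcategorization")
    return problems


print(diagnose("zoom in on the red car"))  # ['subcategorization']
print(diagnose("uh zoom in"))              # ['endpointing', 'out-of-vocabulary: uh']
print(diagnose("fly to the hospital"))     # ['out-of-vocabulary: hospital']
```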
20. C. In-grammar example
- Try to use words and dialog-move type from the user's original utterance
- wh-question
- yn-question
- answer
- command
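Choosing the in-grammar example might work roughly as follows. Candidates are restricted to the user's dialog-move type, and the one sharing the most words with the user's utterance is picked. The example list, the overlap measure, and the tie-breaking by list order are assumptions. The sketch also assumes the move type of the user's utterance is already known.

```python
from typing import Optional

# Hypothetical in-grammar examples, each tagged with its dialog-move type.
EXAMPLES = [
    ("wh-question", "where is the red car"),
    ("yn-question", "is the red car at the tower"),
    ("answer", "yes"),
    ("command", "zoom in"),
    ("command", "land at the tower"),
    ("command", "fly to the tower"),
]


def pick_example(move_type: str, user_utterance: str) -> Optional[str]:
    """Same move type first, then maximum word overlap with the user's utterance."""
    user_words = set(user_utterance.lower().split())
    candidates = [text for mtype, text in EXAMPLES if mtype == move_type]
    if not candidates:
        return None
    # max() returns the first candidate among ties, so list order breaks ties.
    return max(candidates, key=lambda text: len(user_words & set(text.split())))


print(pick_example("command", "fly over to the tower now"))  # fly to the tower
print(pick_example("wh-question", "where's the red car"))    # where is the red car
```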
21. Evaluation
- Between-groups user study
- Targeted help vs. no help
- Was user-initiated help available?
- N=20, 5 tasks each
- Only T1 and T5 assessed
- Locate an x and then land at the y
22. Results
- Significantly fewer TH users gave up on tasks
- Control users gave up on 39% of tasks
- TH users gave up on only 6%
- Time to completion effects
- Hard to measure "completion"!
- Task (→ users get better over time)
- Help x Task
- Help alone
- (p < .1 in "lenient" analysis)
23. Discussion
- Definitely an improvement over "dumb" options
- How easy are these options to automate and port to new domains/systems?
- Classifier version needs training data
- Rule-based version needs rules
- Is there such a thing as too smart?
- 'The system doesn't understand the word X'
- 'The system doesn't understand the word X used with the red car'
24. Discussion (2)
- Do grammaticality improvements fostered by TH persist?
- How frequently is TH activated?
- Does frequency decrease over time?
- At a faster rate than with plain-old help?
- In the rule-based system, how often do both LMs fail?
25. Discussion (3)
- How often does either system (esp. rule-based) provide inappropriate help?
- Wrong dialogue-move type?
- Wrong vocabulary?
- What % of 1st utts after TH are grammatical?
- cf. plain-old help
- Are there other ways to implement/supplement TH?
- State information?
- Back-off to directed dialog? (in worst case)
26. Anything else?
- Let me know if you come across any more
references to this sort of thing