Title: Error Handling in the RavenClaw Dialog Management Framework
Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University
( infrastructure architecture )
( current research )
RavenClaw dialog management
- RavenClaw: dialog management framework for complex, task-oriented domains
  - Dialog Task Specification (DTS): hierarchical plan which captures the domain-specific dialog control logic
  - Dialog Engine: executes a given DTS
- platform for research
  - error handling (this poster)
  - multi-participant dialog (Thomas Harris)
  - turn-taking (Antoine Raux)
- systems built
  - information access: Let's Go! Bus Information, RoomLine
  - guidance through procedures: LARRI, IPA
  - taskable agent: Vera
  - command-and-control: TeamTalk
- demo: RoomLine
  - conference room reservations
  - live schedules for 13 rooms in 2 buildings on campus
  - size, location, a/v equipment
  - recognition: Sphinx-2, 3-gram
  - parsing: Phoenix
  - synthesis: Cepstral Theta
belief updating
Bohus and Rudnicky, "Constructing Accurate Beliefs in Spoken Dialog Systems", in ASRU-2005
- problem: confidence scores provide an initial assessment of the reliability of the information obtained from the user. However, a system should leverage information available in subsequent user responses in order to update and improve the accuracy of its beliefs.
- goal: bridge confidence annotation and correction detection in a unified framework for belief updating in task-oriented spoken dialog systems
- approach
  - machine learning (generalized linear models)
  - integrate features from multiple knowledge sources in the system
  - work with compressed beliefs
    - top hypothesis + other: ASRU-2005 paper
    - k hypotheses + other: work in progress
- sample problem
  - S: where would you like to fly from?
  - U: Boston/0.45, Austin/0.30
  - S: sorry, did you say you wanted to fly from Boston?
  - U: No/0.37, Aspen/0.7
  - updated belief: Boston/?, Austin/?, Aspen/?
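The belief updating idea can be illustrated with a minimal sketch, assuming a logistic model (one kind of generalized linear model) over a few features of the confirmation exchange. The feature names and weights below are hypothetical and chosen by hand; in the approach described on the poster they would be learned from data. Only the top hypothesis is tracked, in the spirit of the compressed beliefs mentioned above.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic link function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, hand-picked weights (assumption); a real system would learn them.
WEIGHTS = {
    "initial_confidence": 2.0,  # confidence of the top hypothesis before confirming
    "user_said_no": -3.0,       # the response to the confirmation contained "no"
    "user_said_yes": 3.0,       # the response to the confirmation contained "yes"
    "new_value_heard": -1.0,    # a different value appeared in the response
}
BIAS = -0.5

def updated_belief(features: dict) -> float:
    """Probability that the top hypothesis is correct after the user's response."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(z)

# Boston/0.45 was confirmed explicitly; the user answered "No" and named another city.
after_no = updated_belief({
    "initial_confidence": 0.45,
    "user_said_no": 1.0,
    "user_said_yes": 0.0,
    "new_value_heard": 1.0,
})

# Same starting point, but the user answered "yes".
after_yes = updated_belief({
    "initial_confidence": 0.45,
    "user_said_no": 0.0,
    "user_said_yes": 1.0,
    "new_value_heard": 0.0,
})
```

With these illustrative weights the belief in Boston drops after the "No" response and rises after a "yes". The sketch says nothing about how much weight the competing hypotheses (Austin, Aspen) should receive.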
- results: data-driven approach significantly outperforms common heuristics
[Figure: bar charts of error rates for updates following explicit confirmation and updates following implicit confirmation, comparing initial, heuristic, proposed, and oracle]
RavenClaw error handling architecture
- goal: task-independent, adaptive and scalable error handling architecture
- approach
  - error handling strategies and error handling decision process are decoupled from the dialog task: reusability, uniformity, plug-and-play strategies, less development effort
  - error handling decision process implemented in a distributed fashion
    - local concept error handling decision process (handles potential misunderstandings)
    - local request error handling decision process (handles non-understandings)
    - currently implemented as POMDPs (for concepts) and MDPs (for request agents)
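The decoupling described for the error handling architecture can be pictured with a small sketch: strategies live in a shared library, and each concept or request agent gets its own local decision process that picks from that library. All class and function names here are hypothetical, and the fixed rules inside `decide` are a deliberate simplification standing in for the POMDP and MDP policies mentioned on the poster.

```python
from typing import Callable, Dict, Optional

# Library of error handling strategies, defined once and shared by every dialog task.
STRATEGIES: Dict[str, Callable[[dict], str]] = {}

def register(name: str):
    """Decorator that plugs a new strategy into the library under `name`."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        STRATEGIES[name] = fn
        return fn
    return wrap

@register("ExplicitConfirm")
def explicit_confirm(ctx: dict) -> str:
    return f"Did you say you wanted {ctx['value']}?"

@register("AskRepeat")
def ask_repeat(ctx: dict) -> str:
    return "Can you please repeat that?"

class ConceptErrorHandler:
    """Local decision process for one concept (potential misunderstandings)."""
    def __init__(self, concept: str, confirm_below: float = 0.8):
        self.concept = concept
        self.confirm_below = confirm_below  # assumed threshold, not a learned policy

    def decide(self, value: str, confidence: float) -> Optional[str]:
        if confidence < self.confirm_below:
            ctx = {"concept": self.concept, "value": value}
            return STRATEGIES["ExplicitConfirm"](ctx)
        return None  # belief is trusted, no error handling action needed

class RequestErrorHandler:
    """Local decision process for one request agent (non-understandings)."""
    def __init__(self, request: str):
        self.request = request

    def decide(self, understood: bool) -> Optional[str]:
        if understood:
            return None
        return STRATEGIES["AskRepeat"]({"request": self.request})

start_time = ConceptErrorHandler("start_time")
get_start_time = RequestErrorHandler("GetStartTime")
prompt = start_time.decide("the room starting at two p.m.", confidence=0.45)
```

Because the handlers only look up strategies by name, a new strategy can be added with one more `@register` function and without touching the dialog task itself.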
misunderstanding recovery strategies
- Explicit Confirmation: "Did you say you wanted a room on Friday?"
- Implicit Confirmation: "a room on Friday for what time?"
learning policies for recovering from non-understandings
Bohus and Rudnicky, "Sorry, I Didn't Catch That! An Investigation of Non-understanding Errors and Recovery Strategies", in SIGdial-2005
- question: can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies?
- approach: a between-groups experiment
  - control group: system chooses a non-understanding strategy randomly (i.e. in an uninformed fashion)
  - wizard group: a human wizard chooses which strategy should be used whenever a non-understanding happens
  - 23 participants in each condition
  - first-time users, balanced by gender x native language
  - each attempted a maximum of 10 scenario-based interactions
  - evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel)
- results: wizard policy outperforms uninformed recovery policy on a number of global and local metrics
[Chart legend: wizard policy, uninformed policy]
non-understanding recovery strategies (RoomLine)
- AskRepeat
  - "Can you please repeat that?"
- AskRephrase
  - "Could you please try to rephrase that?"
- Reprompt
  - "Would you like a small or a large room?"
- DetailedReprompt
  - "Sorry, I'm not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room."
- Notify
  - "Sorry, I didn't catch that."
- Yield
  - Ø
- MoveOn
  - "Sorry, I didn't catch that. One choice would be Wean Hall 7220. Would you like a reservation for this room?"
- YouCanSay
  - "Sorry, I didn't catch that. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."
- TerseYouCanSay
- Full-Help
  - "Sorry, I didn't catch that. So far I found 5 rooms matching your constraints. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."
[Charts, each comparing natives and non-natives: avg. task success rate; avg. recovery WER; avg. recovery concept utility; avg. recovery efficiency]
[Figure: RoomLine dialog task tree. Agent labels: Welcome (i), GetQuery, DoQuery (x), DiscussResults, GetDate (r), GetStartTime (r), GetEndTime (r). Concept labels: date, start_time, end_time. Other labels: concept error handling MDP, request error handling MDP]
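The Dialog Task Specification is described on this poster as a hierarchical plan that the Dialog Engine executes. A minimal sketch of such a plan, using the agent names visible in the RoomLine task tree figure, is shown here. The parent and child relations, the `kind` values, and the plain depth-first execution order are all assumptions made for illustration, not the framework's actual data structures or control flow.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    """One node of a hypothetical dialog task tree."""
    name: str
    kind: str = "agency"  # assumed kinds: agency, inform, request, execute
    children: List["Agent"] = field(default_factory=list)

# Tree layout is an assumption; only the agent names come from the figure labels.
roomline = Agent("RoomLine", children=[
    Agent("Welcome", "inform"),
    Agent("GetQuery", children=[
        Agent("GetDate", "request"),
        Agent("GetStartTime", "request"),
        Agent("GetEndTime", "request"),
    ]),
    Agent("DoQuery", "execute"),
    Agent("DiscussResults"),
])

def execution_order(root: Agent) -> List[str]:
    """Visit leaf agents depth-first, left to right, using an explicit stack."""
    order: List[str] = []
    stack = [root]
    while stack:
        agent = stack.pop()
        if agent.children:
            # Push children in reverse so the leftmost child is popped first.
            stack.extend(reversed(agent.children))
        else:
            order.append(agent.name)
    return order

plan = execution_order(roomline)
```

An explicit stack is used instead of recursion because the poster's architecture figure mentions a dialog stack; how the real engine manipulates that stack is not specified here.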
predicting likelihood of success
- question: can we learn a better policy from data?
- decision theoretic approach
  - learn to predict likelihood of success for each strategy
    - use features available at runtime
    - stepwise logistic regression (good class posterior probabilities)
  - compute expected utility for each strategy
  - choose strategy with maximum expected utility
- preliminary results promising, a new experiment needed for validation
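The decision theoretic steps listed above (predict the likelihood of success, compute expected utility, pick the maximum) can be sketched as follows. The per-strategy weights, the single runtime feature `prev_nonunderstandings`, and the utility values are all hypothetical; the poster does not report the fitted models or the utilities used.

```python
import math
from typing import Dict, Tuple

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logistic regression weights, one model per strategy (assumption).
MODELS: Dict[str, Dict[str, float]] = {
    "AskRepeat": {"bias": 0.2, "prev_nonunderstandings": -0.6},
    "MoveOn": {"bias": -0.1, "prev_nonunderstandings": 0.3},
    "YouCanSay": {"bias": 0.0, "prev_nonunderstandings": 0.1},
}

# Hypothetical utilities: (utility if recovery succeeds, utility if it fails).
UTILITY: Dict[str, Tuple[float, float]] = {
    "AskRepeat": (1.0, -0.2),
    "MoveOn": (0.9, -0.1),
    "YouCanSay": (1.0, -0.3),
}

def p_success(strategy: str, features: Dict[str, float]) -> float:
    """Predicted likelihood that `strategy` recovers from the non-understanding."""
    weights = MODELS[strategy]
    z = weights["bias"] + sum(
        w * features.get(name, 0.0) for name, w in weights.items() if name != "bias"
    )
    return logistic(z)

def expected_utility(strategy: str, features: Dict[str, float]) -> float:
    p = p_success(strategy, features)
    u_success, u_failure = UTILITY[strategy]
    return p * u_success + (1.0 - p) * u_failure

def choose_strategy(features: Dict[str, float]) -> str:
    """Pick the strategy with maximum expected utility."""
    return max(MODELS, key=lambda s: expected_utility(s, features))

first_try = choose_strategy({"prev_nonunderstandings": 0})
after_three = choose_strategy({"prev_nonunderstandings": 3})
```

With these made-up numbers the policy asks the user to repeat on a first non-understanding and moves on after several consecutive ones, which shows how the choice can depend on runtime features instead of being random.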
- for 5 of 10 strategies, models perform better than a majority baseline, on both soft and hard error

  strategy           majority baseline error -> cross-validation error
  Reprompt           49.2 -> 32.8
  YouCanSay          48.6 -> 34.3
  TerseYouCanSay     43.5 -> 32.6
  MoveOn             35.6 -> 30.0
  DetailedReprompt   37.7 -> 34.4

[Figure: RavenClaw architecture on the RoomLine example. Labels: dialog task specification; dialog engine; error handling decision process; error handling strategies; ExplConf(start_time), GetStartTime, GetQuery, RoomLine; concept bindings for date, start_time, end_time, time, location, and network (with_network -> true, without_network -> false)]
[Figure labels: dialog stack; expectation agenda. Example exchange from the figure:]
System: For when do you need the room?
User: Let's try two to four p.m.
Parse: time(two) end_time(four)
System: Did you say you wanted the room starting at two p.m.?
- error handling strategies are implemented as library dialog agents
- new strategies can be plugged in as they are developed
rejection threshold optimization
Bohus and Rudnicky, "A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems", to be presented at Interspeech
- data-driven approach for tuning state-specific rejection thresholds in a spoken dialog system
transfer of confidence annotators across domains
work in progress, in collaboration with Antoine Raux
- migrate (adapt) a confidence annotator trained with data from domain A to domain B, without any labeled data in the new domain (B).
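The state-specific rejection threshold tuning mentioned under rejection threshold optimization could be sketched as a per-state search over candidate thresholds. The poster does not say how errors are costed, so the fixed false-accept and false-reject costs below are assumptions, as are the state names and the toy data; this is a generic cost-minimizing search, not the method of the cited paper.

```python
from typing import Dict, List, Optional, Tuple

# (confidence score, whether the recognized input was actually correct)
Sample = Tuple[float, bool]

def cost(threshold: float, samples: List[Sample],
         false_accept_cost: float = 2.0, false_reject_cost: float = 1.0) -> float:
    """Total cost of rejecting every input whose confidence is below `threshold`."""
    total = 0.0
    for confidence, correct in samples:
        accepted = confidence >= threshold
        if accepted and not correct:
            total += false_accept_cost   # a misunderstanding got through
        elif not accepted and correct:
            total += false_reject_cost   # a correct input was thrown away
    return total

def best_threshold(samples: List[Sample],
                   candidates: Optional[List[float]] = None) -> float:
    """Lowest-cost threshold among the candidates (ties go to the smallest)."""
    if candidates is None:
        candidates = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00
    return min(candidates, key=lambda t: cost(t, samples))

# Toy labeled data per dialog state (hypothetical).
data: Dict[str, List[Sample]] = {
    "GetStartTime": [(0.9, True), (0.7, True), (0.4, False), (0.3, False)],
    "YesNo": [(0.6, True), (0.35, True), (0.2, True), (0.1, False)],
}

thresholds = {state: best_threshold(samples) for state, samples in data.items()}
```

Tuning one threshold per state lets a state with reliable low-confidence inputs (here the hypothetical `YesNo` state) accept more than a state where low-confidence inputs are mostly wrong.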