Error Handling in the RavenClaw Dialog Management Framework

Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University
( infrastructure architecture )

1. RavenClaw dialog management

RavenClaw: dialog management framework for complex, task-oriented domains
  Dialog Task Specification (DTS): hierarchical plan which captures the domain-specific dialog control logic
  Dialog Engine: executes a given DTS

platform for research
  error handling [this poster]
  multi-participant dialog (Thomas Harris)
  turn-taking (Antoine Raux)

systems built
  information access: Let's Go! Bus Information, RoomLine
  guidance through procedures: LARRI, IPA
  taskable agent: Vera
  command-and-control: TeamTalk

demo: RoomLine
  conference room reservations
  live schedules for 13 rooms in 2 buildings on campus
  size, location, a/v equipment
  recognition: Sphinx-2 [3-gram]
  parsing: Phoenix
  synthesis: Cepstral Theta

( current research )

3. belief updating
Bohus and Rudnicky - Constructing Accurate Beliefs in Spoken Dialog Systems, in ASRU-2005

problem: confidence scores provide an initial assessment of the reliability of the information obtained from the user. However, a system should leverage information available in subsequent user responses in order to update and improve the accuracy of its beliefs.

goal: bridge confidence annotation and correction detection in a unified framework for belief updating in task-oriented spoken dialog systems

approach
  machine learning (generalized linear models)
  integrate features from multiple knowledge sources in the system
  work with compressed beliefs
    top hypothesis + other [ASRU-2005 paper]
    k hypotheses + other [work in progress]

sample problem
  S: Where would you like to fly from?
  U: [Boston / 0.45] [Austin / 0.30]
  S: Sorry, did you say you wanted to fly from Boston?
  U: [No / 0.37] [Aspen / 0.7]
  Updated belief = ? [Boston / ?; Austin / ?; Aspen / ?]
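
The compressed belief representation and the sample problem above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' model: the function names `compress_top_plus_other` and `updated_top_confidence`, the two response features, and all weights are hypothetical placeholders. In the approach described on the poster, the weights would be learned from data with a generalized linear model.

```python
import math


def compress_top_plus_other(hyps):
    """Compress a belief into (top hypothesis, its confidence, mass on 'other')."""
    top, conf = max(hyps.items(), key=lambda kv: kv[1])
    return top, conf, 1.0 - conf


def updated_top_confidence(initial_conf, features, weights, bias):
    """Logistic model (one kind of generalized linear model) estimating the
    probability that the top hypothesis is correct after the user response."""
    z = bias + weights["initial_conf"] * initial_conf
    z += sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Initial belief from the sample problem.
belief = {"Boston": 0.45, "Austin": 0.30}
top, conf, other = compress_top_plus_other(belief)

# Hypothetical features describing the response to the explicit confirmation.
features = {"answer_is_no": 1.0, "new_value_conf": 0.7}
# Hypothetical weights chosen for illustration only (not learned values).
weights = {"initial_conf": 2.0, "answer_is_no": -3.0, "new_value_conf": -1.0}

p = updated_top_confidence(conf, features, weights, bias=0.0)
print(top, conf, other, p)
```

With these made-up weights, a "No" answer pushes the confidence in the top hypothesis well below its initial value, which is the qualitative behavior a learned model would be expected to capture.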
results: data-driven approach significantly outperforms common heuristics
[Figure: error rates; series: initial, heuristic, proposed, oracle; panels: updates following explicit confirmation, updates following implicit confirmation]

2. RavenClaw error handling architecture

goal: task-independent, adaptive and scalable error handling architecture

approach
  error handling strategies and error handling decision process are decoupled from the dialog task → reusability, uniformity, plug-and-play strategies, lessens development effort
  error handling decision process implemented in a distributed fashion
    local concept error handling decision process (handles potential misunderstandings)
    local request error handling decision process (handles non-understandings)
    currently implemented as POMDPs (for concepts) and MDPs (for request agents)
misunderstanding recovery strategies
  Explicit Confirmation: Did you say you wanted a room on Friday?
  Implicit Confirmation: a room on Friday ... for what time?

4. learning policies for recovering from non-understandings
Bohus and Rudnicky - Sorry, I Didn't Catch That! An Investigation of Non-understanding Errors and Recovery Strategies, in SIGdial-2005

question: can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies?

approach: a between-groups experiment
  control group: system chooses a non-understanding strategy randomly (i.e. in an uninformed fashion)
  wizard group: a human wizard chooses which strategy should be used whenever a non-understanding happens
  23 participants in each condition
  first-time users, balanced by gender x native language
  each attempted a maximum of 10 scenario-based interactions
  evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel)

results: wizard policy outperforms uninformed recovery policy on a number of global and local metrics

[Figure: wizard policy vs. uninformed policy, for non-natives and natives; panels: avg. task success rate, avg. recovery WER, avg. recovery concept utility, avg. recovery efficiency]

non-understanding recovery strategies
  AskRepeat: Can you please repeat that?
  AskRephrase: Could you please try to rephrase that?
  Reprompt: Would you like a small or a large room?
  DetailedReprompt: Sorry, I'm not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room.
  Notify: Sorry, I didn't catch that
  Yield: Ø
  MoveOn: Sorry, I didn't catch that. One choice would be Wean Hall 7220. Would you like a reservation for this room?
  YouCanSay: Sorry, I didn't catch that. Right now I'm trying to find out if you would prefer a small room or a large one. You can say "I want a small room" or "I want a large room". If the size of the room doesn't matter to you, just say "I don't care".
  TerseYouCanSay
  Full-Help: Sorry, I didn't catch that. So far I found 5 rooms matching your constraints. Right now I'm trying to find out if you would prefer a small room or a large one. You can say "I want a small room" or "I want a large room". If the size of the room doesn't matter to you, just say "I don't care".

[Figure: RoomLine dialog task tree with agents Welcome, GetQuery, DoQuery, DiscussResults, GetDate, GetStartTime, GetEndTime and concepts date, start_time, end_time; labels: concept error handling MDP, request error handling MDP]
predicting likelihood of success

question: can we learn a better policy from data?

decision theoretic approach
  learn to predict likelihood of success for each strategy
    use features available at runtime
    stepwise logistic regression (good class posterior probabilities)
  compute expected utility for each strategy
  choose strategy with maximum expected utility

preliminary results promising; a new experiment needed for validation
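
The decision-theoretic steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the feature names, the per-strategy weights and biases, and the utility values (1.0 for a successful recovery, 0.0 for a failed one) are hypothetical, and the logistic models are written out by hand rather than fitted with stepwise regression.

```python
import math
from typing import Dict, Tuple

Model = Tuple[Dict[str, float], float]  # (feature weights, bias)


def success_probability(model: Model, features: Dict[str, float]) -> float:
    """Logistic regression posterior: P(recovery succeeds | runtime features)."""
    weights, bias = model
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def expected_utility(p: float, u_success: float, u_failure: float) -> float:
    return p * u_success + (1.0 - p) * u_failure


def choose_strategy(models: Dict[str, Model], features: Dict[str, float],
                    u_success: float = 1.0, u_failure: float = 0.0):
    """Return the strategy with maximum expected utility and all utilities."""
    utilities = {
        name: expected_utility(success_probability(m, features), u_success, u_failure)
        for name, m in models.items()
    }
    best = max(utilities, key=utilities.get)
    return best, utilities


# Hypothetical per-strategy models and runtime features (illustration only).
models = {
    "AskRepeat": ({"num_prev_nonunderstandings": -0.8}, 0.5),
    "MoveOn": ({"num_prev_nonunderstandings": 0.4}, -0.2),
    "YouCanSay": ({"is_native": -0.3}, 0.3),
}
features = {"num_prev_nonunderstandings": 2.0, "is_native": 1.0}

best, utilities = choose_strategy(models, features)
print(best, utilities)
```

Keeping one model per strategy mirrors the "predict likelihood of success for each strategy" step; the utilities could also be made strategy-specific without changing the selection rule.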

for 5 of 10 strategies, models perform better than a majority baseline, on both soft and hard error

strategy            majority baseline error → cross-validation error
Reprompt            49.2 → 32.8
YouCanSay           48.6 → 34.3
TerseYouCanSay      43.5 → 32.6
MoveOn              35.6 → 30.0
DetailedReprompt    37.7 → 34.4

[Figure: RavenClaw architecture. dialog task specification; dialog engine; dialog stack: GetStartTime, GetQuery, RoomLine; expectation agenda: date, start_time, end_time, time, location, network (with_network → true, without_network → false); error handling decision process: ExplConf(start_time); error handling strategies]

System: For when do you need the room?
User: Let's try two to four p.m.
Parse: time(two) end_time(four)
System: Did you say you wanted the room starting at two p.m.?

5. rejection threshold optimization
Bohus and Rudnicky - A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems, to be presented at Interspeech

6. transfer of confidence annotators across domains
work in progress, in collaboration with Antoine Raux