Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System

Author: Sunao Hara

Slides: 34

Provided by: Sunao

Learn more at: http://www.lrec-conf.org

1
Estimation Method of User Satisfaction Using
N-gram-based Dialog History Model for Spoken
Dialog System
LREC2010 O3 - Dialogue and Evaluation

Sunao Hara, Norihide Kitaoka, Kazuya Takeda
naoh, kitaoka, kazuya.takeda_at_nagoya-u.jp

2
Introduction

Introduction
MusicNavi2 database
N-gram modeling
Estimation experiment
Conclusion

The aim of this study
Construct an estimation model of user satisfaction for spoken dialog systems (SDSs) based on data from real PC environments
Experiment
Field experiment using an SDS for a music retrieval application
Construct and evaluate an estimation model for user satisfaction using an N-gram dialog history model

3
Background (1/2)

Use of speech input applications (e.g. Skype) by PC users is spreading
More users may use Spoken Dialog Systems (SDSs) via the Internet
The acoustic properties of PC environments differ among users
e.g. microphones, noise conditions, etc.
From a practical application standpoint
Evaluation and prediction of the system performance (User Satisfaction) are also important issues

Collect speech under realistic PC environments
Build an estimation model for User Satisfaction
4
Background (2/2)

Evaluation using automatically measured metrics
Tune the system parameters at the design stage
Select the best dialog strategy for SDS applications
PARADISE framework [Walker et al., 1997]
Detection of problematic dialogs for call center Interactive Voice Response (IVR) systems
Detect as early as possible that the conversation will break down
Problematic dialog predictor using the SLU-success feature [Walker et al., 2002]
N-gram-based call quality monitoring system [Kim, 2007]

SLU: Spoken Language Understanding
Can we estimate the user satisfaction of an SDS by modeling the dialog context?
5
MusicNavi2 database

Field experiment using a music retrieval system with a spoken dialog interface
1. Download the system through the Internet
2. Use it for a certain period
3. Fill in questionnaires on the web page
Music retrieval system: MusicNavi2
Music retrieval application with a spoken dialog interface
A spoken dialog interface for retrieving and playing songs stored on the user's PC
Can collect speech data in cooperation with a server program via the Internet

6
Example of a dialog
U: User, S: System
User's utterances / System's prompts
U: Hello (ko-n-ni-chi-wa)
S: Hello
U: Da-i-to-ka-i
S: Do you want to retrieve the song "Da-i-to-ka-i"?
U: Yes (ha-i)
S: Now playing the song "Da-i-to-ka-i" by Crystal King.
U: Stop (te-i-shi)
S: Now stopping.
7
Data collection by the field test

Large-scale field test through the Internet
Subjects used MusicNavi2 on their own PCs
Participants: 1369 subjects
Total usage: 488 hours
User's task
To listen to at least five songs
To perform at least twenty QA dialogs, or to use the system for over forty minutes
Questionnaire (only for users who completed the task)
Satisfaction level for the SDS, from 1 to 5

1: Extremely unsatisfied, 2: Unsatisfied, 3: Acceptable, 4: Satisfied, 5: Extremely satisfied
8
Distributions of the experimental subjects
and the equipment used by them

Subjects who answered questionnaires
449 subjects (278 males and 171 females)
Total: 34296 utterances

Charts: Microphone; Loudspeaker / headphone
9
Overview of the MusicNavi2 database
Figure panels: Word Error Rate; Utterances per song played; # of utterances
10
Pre-analysis of the MusicNavi2 database

Classification of users by their satisfaction level
Task-complete users: c = 1, 2, 3, 4, 5
Task-incomplete users: c = ?
Summary of data
Total: 518 subjects

c                  ?      1      2      3      4      5
# of subjects      69     38     102    107    155    47
# of utterances    52.2   134.5  119.7  114.9  106.5  98.4
WER (%)            70.5   54.1   51.0   46.8   41.2   35.3
Utt. / song        107    7.21   5.34   5.12   4.22   3.43
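
The summary table can be cross-checked against the subject counts quoted on the earlier slides. The short Python sketch below only re-adds the numbers from the table; the variable names are illustrative and not part of the original material.

```python
# Per-class subject counts copied from the summary table.
# "?" marks task-incomplete users; 1-5 are satisfaction levels.
subjects = {"?": 69, "1": 38, "2": 102, "3": 107, "4": 155, "5": 47}

total = sum(subjects.values())
task_complete = total - subjects["?"]

# WER (%) for the task-complete classes c = 1..5, copied from the table.
wer = [54.1, 51.0, 46.8, 41.2, 35.3]
wer_decreases = all(a > b for a, b in zip(wer, wer[1:]))

print(total)          # 518
print(task_complete)  # 449
print(wer_decreases)  # True
```

The totals agree with the 518 subjects in the summary and the 449 subjects who answered the questionnaire, and WER falls steadily as the satisfaction level rises.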
11
Modeling method for the dialog
context

The dialog management of an SDS is designed by a dialog developer
The management is not always satisfactory for users
Assume that satisfaction appears in the dialog context
Statistically learn the naturalness of the dialog
Use N-grams to model the dialog context
Construct a model for each class of users
Estimate an unknown user's satisfaction based on the likelihood under each N-gram model
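
As a minimal sketch of the last step, assuming the per-class log-likelihoods have already been computed, the estimate is simply the class whose model scores the dialog highest. The function name and the numbers are hypothetical and are not values from the study.

```python
def estimate_class(loglik_by_class):
    """Return the class whose N-gram model gives the highest log-likelihood."""
    return max(loglik_by_class, key=loglik_by_class.get)


# Hypothetical log-likelihoods log P(x | c) for one unknown user.
logliks = {"?": -58.2, "1": -41.7, "2": -39.9, "3": -40.3, "4": -42.8, "5": -47.1}

print(estimate_class(logliks))  # 2
```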

12
Spoken dialog logs to dialog act symbols

The vocabulary size of the recognition dictionary, that is, the number of songs, differs between users
Word-level information is informative, but it is too sparse to handle statistically
Use dialog act symbols for the user's and system's acts
Defined 21 system dialog acts and 19 user dialog acts

13
Example of an encoded dialog
U: User, S: System
Dialog act symbols / User's utterances and System's prompts
x1 USR_CMD_HELLO        U: Hello (ko-n-ni-chi-wa)
x2 SYS_INFO_GREETING    S: Hello
x3 USR_REQUEST_BYMUSIC  U: Da-i-to-ka-i
x4 SYS_CONFIRM_KEYWORD  S: Do you want to retrieve the song "Da-i-to-ka-i"?
x5 USR_CMD_YES          U: Yes (ha-i)
x6 SYS_PLAY_SONG        S: Now playing the song "Da-i-to-ka-i" by Crystal King.
x7 USR_CMD_STOP         U: Stop (te-i-shi)
x8 SYS_INFO_STOPPED     S: Now stopping.
14
Modeling the dialog act sequence by N-gram

A dialog act sequence x
The dialog act symbols arranged in time order t
N-gram probability (likelihood) of x given the model for a user class c
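
A rough Python sketch of this likelihood follows, assuming the usual N-gram factorization P(x | c) = prod_t P(x_t | x_{t-N+1}, ..., x_{t-1}, c). The slides use Witten-Bell smoothing from the SRILM toolkit; this sketch swaps in add-one smoothing purely to stay short, and the function names and the `<s>` padding symbol are hypothetical.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical padding symbol for the start of a dialog


def train_ngram(sequences, n):
    """Count n-grams and their (n-1)-symbol histories over the training dialogs of one class."""
    ngrams, histories, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [BOS] * (n - 1) + list(seq)
        vocab.update(seq)
        for t in range(n - 1, len(padded)):
            history = tuple(padded[t - n + 1:t])
            ngrams[history + (padded[t],)] += 1
            histories[history] += 1
    return ngrams, histories, vocab


def log_likelihood(seq, model, n):
    """log P(x | c) with add-one smoothing (a stand-in for Witten-Bell, not the slides' method)."""
    ngrams, histories, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen symbols
    padded = [BOS] * (n - 1) + list(seq)
    total = 0.0
    for t in range(n - 1, len(padded)):
        history = tuple(padded[t - n + 1:t])
        count = ngrams[history + (padded[t],)]
        total += math.log((count + 1) / (histories[history] + v))
    return total


# Toy training data for a single class, reusing symbols from the encoded example dialog.
train = [["USR_CMD_HELLO", "SYS_INFO_GREETING", "USR_REQUEST_BYMUSIC",
          "SYS_CONFIRM_KEYWORD", "USR_CMD_YES", "SYS_PLAY_SONG"]]
model = train_ngram(train, 2)
ll_seen = log_likelihood(train[0], model, 2)
ll_reversed = log_likelihood(list(reversed(train[0])), model, 2)
print(ll_seen > ll_reversed)  # True
```

Training one such model per user class and comparing the resulting log-likelihoods gives the class estimate.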

15
Estimation experiment

Detection of the user's class using N-gram models
Experimental conditions
N-gram: 1-gram, 2-gram, ..., 8-gram
Witten-Bell smoothing (using the SRILM toolkit)
Input sequence: USR (user acts only), SYS (system acts only), SYSUSR (both)
Leave-one-out cross validation
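
The three input sequences can be illustrated with the encoded example dialog. The sketch below assumes that a stream is obtained by filtering symbols on their USR_ / SYS_ prefix; the slides do not state how the streams were built, and the function name is hypothetical.

```python
def select_stream(acts, stream):
    """Keep only the dialog acts that belong to the requested input stream."""
    if stream not in ("USR", "SYS", "SYSUSR"):
        raise ValueError(f"unknown stream: {stream}")
    if stream == "SYSUSR":
        return list(acts)  # both user and system acts, in time order
    prefix = stream + "_"
    return [a for a in acts if a.startswith(prefix)]


dialog = [
    "USR_CMD_HELLO", "SYS_INFO_GREETING",
    "USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD",
    "USR_CMD_YES", "SYS_PLAY_SONG",
    "USR_CMD_STOP", "SYS_INFO_STOPPED",
]

print(select_stream(dialog, "USR"))
# ['USR_CMD_HELLO', 'USR_REQUEST_BYMUSIC', 'USR_CMD_YES', 'USR_CMD_STOP']
```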

16
Estimation experiment

Detection method
Model selection by thresholding the likelihood ratio
Evaluation metrics
ROC curve (true detection rate vs. false detection rate)
Area under the ROC curve (AUC)
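
A minimal sketch of these metrics in plain Python follows, assuming each user receives a score equal to the log-likelihood ratio between the target-class model and the other model. The scores and labels are hypothetical, not data from the study, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """(false detection rate, true detection rate) per threshold, highest threshold first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical log-likelihood ratios; label 1 marks the class to be detected.
scores = [2.1, 0.4, -0.3, 1.5, -1.2, 0.4]
labels = [1, 1, 0, 1, 0, 0]

print(round(roc_auc(scores, labels), 3))  # 0.944
```

Sweeping the threshold traces the ROC curve, and the AUC summarizes it in a single number that does not depend on any one threshold.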
17
AUC (Area under the ROC curve)

Task incomplete users
N       SYS    USR    SYSUSR
1-gram  0.901  0.873  0.927
2-gram  0.948  0.929  0.977
3-gram  0.989  0.954  0.993
4-gram  0.995  0.952  0.997
5-gram  0.993  0.954  0.995
6-gram  0.989  0.951  0.995
7-gram  0.988  0.946  0.995
8-gram  0.987  0.936  0.994

Unsatisfied users
N       SYS    USR    SYSUSR
1-gram  0.611  0.638  0.619
2-gram  0.628  0.644  0.724
3-gram  0.591  0.651  0.704
4-gram  0.583  0.681  0.739
5-gram  0.629  0.662  0.739
6-gram  0.632  0.639  0.761
7-gram  0.604  0.633  0.765
8-gram  0.592  0.622  0.756
18
Detection result of task incomplete users

SYSUSR

N AUC
1-gram 0.927
2-gram 0.977
3-gram 0.993
4-gram 0.997
5-gram 0.995
6-gram 0.995
7-gram 0.995
8-gram 0.994
19
Detection result of unsatisfied users

SYSUSR

N AUC
1-gram 0.619
2-gram 0.724
3-gram 0.704
4-gram 0.739
5-gram 0.739
6-gram 0.761
7-gram 0.765
8-gram 0.756
20
Conclusion

Estimation method of user satisfaction using an N-gram-based dialog history model for SDSs
Constructed a database from real PC environments
Achieved high performance in the detection of task incomplete users
100% true detection rate at a 6% false detection rate
Insufficient performance in the detection of unsatisfied users
The N-gram model was effective in comparison with the 1-gram model
Using both system and user dialog acts was effective
Future work
N-gram model-based estimation of dialog failure (online detection)
Analysis of the dialog context that affected user satisfaction
Integrated method using acoustic features, prosodic features, dialog features, etc.

Thanks for your kind attention!

22
(No Transcript)
23
Modeling the dialog act sequence by N-gram

Dialog logs are encoded to dialog act symbols automatically
A dialog act sequence x
The dialog act symbols arranged in time order t
N-gram probability (likelihood) of x given a model with a satisfaction level s

User's dialog acts: from speech recognition results; they are defined in the recognition dictionary
System's dialog acts: from system prompts or responses; they are the same as the system's internal acts
24
Detection by thresholding

Model selection by an a posteriori odds classifier
Introduce a priori odds 1/a and Bayes factor B
Finally, select the model by comparing B with the threshold a

a = 1 means the ML (maximum likelihood) classifier
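
One way to read this rule, as an assumption: the a posteriori odds are the a priori odds times the Bayes factor, B / a, and the target class is detected when they exceed 1, that is, when B > a. The Python sketch below works in the log domain; the function name and its arguments are hypothetical.

```python
import math


def detect(loglik_target, loglik_other, a=1.0):
    """Detect the target class when the Bayes factor B = P(x|target) / P(x|other) exceeds a."""
    log_b = loglik_target - loglik_other  # log of the Bayes factor
    return log_b > math.log(a)


print(detect(-10.0, -11.0))          # True: a = 1 is the maximum likelihood decision
print(detect(-10.0, -11.0, a=10.0))  # False: a larger a demands stronger evidence
```

Sweeping a over a range of values produces the operating points of the ROC curve.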
25
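Estimation experiment for 6 classes of satisfaction

Experimental conditions
Leave-one-out cross validation (1 user held out, the other 517 used for training)
Satisfaction level s: ? (task incomplete), 1, 2, 3, 4, 5
N-gram: 1-gram, 2-gram, ..., 8-gram
Input sequence
User dialog acts only (USR)
System dialog acts only (SYS)
Both user and system dialog acts (USRSYS)
Evaluation metric
Accuracy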

26
(No Transcript)

27
Detection result for 6 classes of satisfaction
28
Confusion matrix

3-gram of SYS sequence

Rows: actual class; columns: estimated class

Actual \ Estimated   ?    1    2    3    4    5
?                    43   5    7    5    6    3
1                    0    7    8    9    11   3
2                    1    8    31   16   35   11
3                    0    9    22   23   45   8
4                    0    8    34   29   66   18
5                    0    4    5    6    24   8
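
The overall accuracy can be read off the confusion matrix. The short Python sketch below copies the counts from the matrix, with rows taken as the actual class and columns as the estimated class; the variable names are illustrative.

```python
classes = ["?", "1", "2", "3", "4", "5"]

# Rows: actual class, columns: estimated class (3-gram, SYS sequence).
confusion = [
    [43, 5, 7, 5, 6, 3],
    [0, 7, 8, 9, 11, 3],
    [1, 8, 31, 16, 35, 11],
    [0, 9, 22, 23, 45, 8],
    [0, 8, 34, 29, 66, 18],
    [0, 4, 5, 6, 24, 8],
]

correct = sum(confusion[i][i] for i in range(len(classes)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Per-class recall: fraction of each actual class that was estimated correctly.
recall = {c: confusion[i][i] / sum(confusion[i]) for i, c in enumerate(classes)}

print(correct, total)      # 178 518
print(round(accuracy, 3))  # 0.344
```

The row sums reproduce the per-class subject counts (69, 38, 102, 107, 155, 47). Most of the correct decisions come from the task-incomplete class "?" and from class 4.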
29
(No Transcript)
30
(No Transcript)
31
Modeling the N-gram

Dialog logs are encoded to dialog act symbols automatically
User's dialog acts
From speech recognition results
They are defined in the recognition dictionary
System's dialog acts
From system responses or acts
They are the same as the system's internal acts
A dialog act sequence x
The dialog act symbols arranged in time order t
N-gram models for the 6 classes
Witten-Bell smoothing (using the SRILM toolkit)
