Title: Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System
1Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System
LREC2010 O3 - Dialogue and Evaluation
- Sunao Hara, Norihide Kitaoka, Kazuya Takeda
- naoh, kitaoka, kazuya.takeda_at_nagoya-u.jp
2Introduction
- Introduction
- MusicNavi2 database
- N-gram modeling
- Estimation experiment
- Conclusion
- The aim of this study
- Construct an estimation model of user satisfaction for spoken dialog systems (SDSs) based on data from real PC environments
- Experiment
- Field experiment using an SDS for a music retrieval application
- Construct and evaluate an estimation model for user satisfaction using an N-gram history model
3Background (1/2)
- Use of speech input applications (e.g. Skype) by PC users is spreading
- More users may use Spoken Dialog Systems (SDSs) via the Internet
- The acoustic properties of PC environments differ among users
- e.g. microphones, noise conditions, etc.
- From a practical application standpoint
- Evaluation and prediction of the system performance (User Satisfaction) are also important issues
Collect speech in realistic PC environments
Build an estimation model for User Satisfaction
4Background (2/2)
- Evaluation using automatically measured metrics
- Tune the system parameters at the design stage
- Select the best dialog strategy for SDS applications
- PARADISE Framework (Walker et al., 1997)
- Detection of problematic dialogs for call center Interactive Voice Response (IVR) systems
- Detect as early as possible that the conversation will break down
- Problematic dialog predictor using the SLU-success feature (Walker et al., 2002), where SLU stands for Spoken Language Understanding
- N-gram-based call quality monitoring system (Kim, 2007)
Can we estimate the user satisfaction of an SDS by modeling the dialog context?
5MusicNavi2 database
- Field experiment using a music retrieval system with a spoken dialog interface
- 1. Download the system through the Internet
- 2. Use it for a certain period
- 3. Fill in questionnaires on the web page
- Music retrieval system: MusicNavi2
- Music retrieval application with a spoken dialog interface
- The spoken dialog interface retrieves and plays songs stored on the user's PC
- Can collect speech data in cooperation with a server program via the Internet
6Example of a dialog
U: User, S: System
Users' utterances / System's prompts
U: Hello (ko-n-ni-chi-wa)
S: Hello
U: Da-i-to-ka-i
S: Do you want to retrieve the song Da-i-to-ka-i?
U: Yes (ha-i)
S: Now, playing the song Da-i-to-ka-i by Crystal King.
U: Stop (te-i-shi)
S: Now, stopping.
7Data collection by the field test
- Large-scale field test through the Internet
- Subjects used MusicNavi2 on their own PCs
- Participants: 1,369 subjects
- Total usage: 488 hours
- Users' task
- To listen to at least five songs
- To perform at least twenty QA dialogs, or to use the system for over forty minutes
- Questionnaire (answered only by users who completed the task)
- Satisfaction level for the SDS, from 1 to 5
- 1: Extremely unsatisfied, 2: Unsatisfied, 3: Acceptable, 4: Satisfied, 5: Extremely satisfied
8Distributions of the experimental subjects and the equipment they used
- Subjects who answered questionnaires
- 449 subjects (278 males and 171 females)
- Total: 34,296 utterances
Figure panels: Microphone; Loudspeaker / headphone
9Overview of the MusicNavi2 database
Figure panels: Word Error Rate; Utterances per song played; Number of utterances
10Pre-analysis of the MusicNavi2 database
- Classification of users by their satisfaction level
- Task-complete users: c = 1, 2, 3, 4, 5
- Task-incomplete users: c = ?
- Summary of data
- Total: 518 subjects
c                    ?      1      2      3      4      5
# of subjects       69     38    102    107    155     47
# of utterances   52.2  134.5  119.7  114.9  106.5   98.4
WER (%)           70.5   54.1   51.0   46.8   41.2   35.3
Utt. / song        107   7.21   5.34   5.12   4.22   3.43
11Modeling method for the dialog
context
- The dialog management of an SDS is designed by a dialog developer
- The management is not always satisfactory for users
- Assume that satisfaction appears in the dialog context
- Statistically learn the naturalness of the dialog
- Use N-grams to model the dialog context
- Construct a model for each class of users
- Estimate an unknown user's satisfaction based on the likelihood of the N-gram models
12Spoken dialog logs to dialog act symbols
- Vocabulary size of the recognition dictionary
- That is, the number of songs
- Differs between users
- Word-level information is informative, but too sparse to model statistically
- Use dialog act symbols for the users' and system's acts
- Defined 21 system dialog acts and 19 user dialog acts
13Example of an encoded dialog
U: User, S: System
Dialog act symbols         Users' utterances / System's prompts
x1 USR_CMD_HELLO           U: Hello (ko-n-ni-chi-wa)
x2 SYS_INFO_GREETING       S: Hello
x3 USR_REQUEST_BYMUSIC     U: Da-i-to-ka-i
x4 SYS_CONFIRM_KEYWORD     S: Do you want to retrieve the song Da-i-to-ka-i?
x5 USR_CMD_YES             U: Yes (ha-i)
x6 SYS_PLAY_SONG           S: Now, playing the song Da-i-to-ka-i by Crystal King.
x7 USR_CMD_STOP            U: Stop (te-i-shi)
x8 SYS_INFO_STOPPED        S: Now, stopping.
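The encoding step could be sketched as follows. This is a minimal illustration, not the authors' implementation: the lookup table `USER_COMMANDS`, the set `SONG_TITLES`, and the fallback symbol `USR_UNKNOWN` are hypothetical names introduced here, and system turns are assumed to arrive already labeled with the system's internal act.

```python
# Hypothetical lookup tables; the real system defines 19 user acts and 21 system acts.
USER_COMMANDS = {
    "ko-n-ni-chi-wa": "USR_CMD_HELLO",
    "ha-i": "USR_CMD_YES",
    "te-i-shi": "USR_CMD_STOP",
}
SONG_TITLES = {"da-i-to-ka-i"}  # stands in for the per-user song vocabulary


def encode_user_utterance(recognized: str) -> str:
    """Map one recognition result to a user dialog act symbol."""
    key = recognized.strip().lower()
    if key in USER_COMMANDS:
        return USER_COMMANDS[key]
    if key in SONG_TITLES:
        # Every song title collapses to one symbol, so the size of the
        # user's song vocabulary no longer affects the model.
        return "USR_REQUEST_BYMUSIC"
    return "USR_UNKNOWN"  # hypothetical fallback symbol, not from the slides


def encode_dialog(turns):
    """turns: list of (speaker, content); system content is already an act symbol."""
    symbols = []
    for speaker, content in turns:
        if speaker == "U":
            symbols.append(encode_user_utterance(content))
        else:
            symbols.append(content)
    return symbols


example_turns = [
    ("U", "ko-n-ni-chi-wa"),
    ("S", "SYS_INFO_GREETING"),
    ("U", "Da-i-to-ka-i"),
    ("S", "SYS_CONFIRM_KEYWORD"),
    ("U", "ha-i"),
    ("S", "SYS_PLAY_SONG"),
    ("U", "te-i-shi"),
    ("S", "SYS_INFO_STOPPED"),
]
encoded = encode_dialog(example_turns)
```

Collapsing all song titles into a single symbol is what removes the dependence on each user's vocabulary size.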
14Modeling the dialog act sequence by N-gram
- A dialog act sequence $x = (x_1, x_2, \ldots, x_T)$
- The dialog act symbols arranged in time order t
- N-gram probability (likelihood) given a model for a user class c
- $P(x \mid c) = \prod_{t=1}^{T} P(x_t \mid x_{t-N+1}, \ldots, x_{t-1}, c)$
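A class-conditional N-gram likelihood of this kind might be computed as in the sketch below. It is an illustration under stated assumptions: add-one smoothing is swapped in for simplicity (the experiments use Witten-Bell smoothing via SRILM), the class name `AddOneNgram` is hypothetical, and the toy training sequences are invented for the example, not taken from the MusicNavi2 data.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Yield (history, symbol) pairs, padding the start with <s>."""
    padded = ["<s>"] * (n - 1) + list(seq)
    for t in range(n - 1, len(padded)):
        yield tuple(padded[t - n + 1:t]), padded[t]


class AddOneNgram:
    """N-gram model with add-one smoothing (a stand-in for Witten-Bell)."""

    def __init__(self, n, vocab):
        self.n = n
        self.vocab_size = len(vocab)
        self.ngram_counts = Counter()
        self.history_counts = Counter()

    def fit(self, sequences):
        for seq in sequences:
            for hist, sym in ngrams(seq, self.n):
                self.ngram_counts[(hist, sym)] += 1
                self.history_counts[hist] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(x_t | history) over the whole sequence."""
        total = 0.0
        for hist, sym in ngrams(seq, self.n):
            num = self.ngram_counts[(hist, sym)] + 1
            den = self.history_counts[hist] + self.vocab_size
            total += math.log(num / den)
        return total


vocab = {
    "USR_CMD_HELLO", "SYS_INFO_GREETING", "USR_REQUEST_BYMUSIC",
    "SYS_CONFIRM_KEYWORD", "USR_CMD_YES", "SYS_PLAY_SONG",
    "USR_CMD_STOP", "SYS_INFO_STOPPED",
}
# Toy training data, invented for illustration only.
smooth_dialogs = [["USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD",
                   "USR_CMD_YES", "SYS_PLAY_SONG"]] * 3
repeated_requests = [["USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD",
                      "USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD",
                      "USR_CMD_STOP"]] * 3
models = {
    5: AddOneNgram(2, vocab).fit(smooth_dialogs),     # class c = 5
    1: AddOneNgram(2, vocab).fit(repeated_requests),  # class c = 1
}
test_seq = ["USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD", "USR_REQUEST_BYMUSIC"]
scores = {c: m.log_likelihood(test_seq) for c, m in models.items()}
best = max(scores, key=scores.get)
```

In this toy setup the test sequence repeats a request after a confirmation, so the model trained on the repeated-request dialogs assigns it the higher likelihood.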
15Estimation experiment
- Detection of the user's class using N-gram models
- Experimental conditions
- N-gram order: 1-gram, 2-gram, ..., 8-gram
- Witten-Bell smoothing (using the SRILM toolkit)
- Input sequence: USR, SYS, SYSUSR
- Leave-one-out cross validation
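The input-sequence conditions and the leave-one-out protocol could be sketched as below. This assumes that USR keeps only user acts, SYS keeps only system acts, and SYSUSR keeps both in their original order; the function names `select_stream` and `leave_one_out` are hypothetical helpers introduced for illustration.

```python
def select_stream(symbols, stream):
    """Keep only the dialog acts that belong to the requested input stream."""
    prefixes = {
        "USR": ("USR_",),
        "SYS": ("SYS_",),
        "SYSUSR": ("SYS_", "USR_"),
    }
    # str.startswith accepts a tuple of prefixes.
    return [s for s in symbols if s.startswith(prefixes[stream])]


def leave_one_out(users):
    """Yield (held_out_user, training_users) once for every user."""
    for i, held_out in enumerate(users):
        yield held_out, users[:i] + users[i + 1:]


sequence = [
    "USR_CMD_HELLO", "SYS_INFO_GREETING",
    "USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD",
]
usr_only = select_stream(sequence, "USR")
sys_only = select_stream(sequence, "SYS")
both = select_stream(sequence, "SYSUSR")
folds = list(leave_one_out(["user_a", "user_b", "user_c"]))
```

With leave-one-out, each user's dialog is scored by models trained on all the other users, so no user's own data influences the estimate of that user's class.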
16Estimation experiment
- Detection method
- Model selection by thresholding the likelihood ratio
- Evaluation metrics
- ROC curve
- Area under the ROC curve (AUC)
Figure: ROC curve (axes: true detection, false detection)
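A ROC curve and its AUC can be computed from per-user scores as in this sketch. The scores stand for log-likelihood ratios between the target-class model and the other model; the example scores and labels are invented for illustration, and the helper names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false detection rate, true detection rate) for every threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a ROC curve given as ordered (fdr, tdr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented scores: higher means "more likely to belong to the target class".
example_scores = [2.0, 1.5, 0.3, -0.2, -1.0]
example_labels = [1, 1, 0, 1, 0]  # 1 = target class, 0 = other
area = auc(roc_points(example_scores, example_labels))
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1); an AUC of 1.0 means the two groups are perfectly separated by the score, and 0.5 corresponds to chance.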
17AUC (Area under the ROC curve)
Detection of task-incomplete users
N        SYS     USR     SYSUSR
1-gram   0.901   0.873   0.927
2-gram   0.948   0.929   0.977
3-gram   0.989   0.954   0.993
4-gram   0.995   0.952   0.997
5-gram   0.993   0.954   0.995
6-gram   0.989   0.951   0.995
7-gram   0.988   0.946   0.995
8-gram   0.987   0.936   0.994
Detection of unsatisfied users
N        SYS     USR     SYSUSR
1-gram   0.611   0.638   0.619
2-gram   0.628   0.644   0.724
3-gram   0.591   0.651   0.704
4-gram   0.583   0.681   0.739
5-gram   0.629   0.662   0.739
6-gram   0.632   0.639   0.761
7-gram   0.604   0.633   0.765
8-gram   0.592   0.622   0.756
18Detection results for task-incomplete users
N AUC
1-gram 0.927
2-gram 0.977
3-gram 0.993
4-gram 0.997
5-gram 0.995
6-gram 0.995
7-gram 0.995
8-gram 0.994
19Detection results for unsatisfied users
N AUC
1-gram 0.619
2-gram 0.724
3-gram 0.704
4-gram 0.739
5-gram 0.739
6-gram 0.761
7-gram 0.765
8-gram 0.756
20Conclusion
- Estimation method of user satisfaction using an N-gram-based dialog history model for SDSs
- Constructed a database recorded in real PC environments
- Achieved high performance in the detection of task-incomplete users
- 100% true detection rate at a 6% false detection rate
- Insufficient performance in the detection of unsatisfied users
- N-gram models were effective compared with the 1-gram model
- Using both system and user dialog acts was effective
- Future work
- N-gram model-based estimation of dialog failure (online detection)
- Analysis of the dialog contexts that affect user satisfaction
- Integrated method using acoustic features, prosodic features, dialog features, etc.
21- Thanks for your kind attention!
23Modeling the dialog act sequence by N-gram
- Encoded dialog logs to dialog act symbols automatically
- Users' dialog acts: obtained from speech recognition results; they are defined in the recognition dictionary
- System's dialog acts: obtained from system prompts or responses; they are the same as the system's internal acts
- A dialog act sequence $x = (x_1, x_2, \ldots, x_T)$
- The dialog act symbols arranged in time order t
- N-gram probability (likelihood) given a model with a satisfaction level s
- $P(x \mid s) = \prod_{t=1}^{T} P(x_t \mid x_{t-N+1}, \ldots, x_{t-1}, s)$
24Detection by thresholding
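- Model selection by an a posteriori odds classifier
- Introduce a priori odds 1/a and Bayes factor B
- Finally, detection reduces to comparing the Bayes factor B with the threshold a
- a = 1 corresponds to the maximum-likelihood (ML) classifier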
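The thresholding rule might look like the sketch below. It assumes the a posteriori odds are the product of the a priori odds 1/a and the Bayes factor B, that B is the likelihood ratio of the target-class model to the other model, and that the target class is selected when the odds exceed 1. The function name `detect` and the example log-likelihoods are hypothetical.

```python
import math


def detect(loglik_target, loglik_other, a=1.0):
    """Select the target class when (1/a) * B > 1, i.e. when log B > log a."""
    log_bayes_factor = loglik_target - loglik_other
    return log_bayes_factor > math.log(a)


# With a = 1 the rule picks whichever model gives the higher likelihood (ML).
ml_decision = detect(-10.0, -12.0)
# A larger a demands stronger evidence before the target class is selected.
strict_decision = detect(-10.0, -12.0, a=10.0)
```

Working in the log domain avoids underflow, since the likelihood of a long dialog act sequence is a product of many small probabilities. Sweeping a over a range of values yields the operating points of a ROC curve.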
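25Estimation experiment (6 classes)
- Estimation of the user's class using N-gram models
- Experimental conditions
- Test on 1 user and train on the remaining 517 users (leave-one-out)
- Satisfaction level s: ? (task incomplete), 1 (unsatisfied), 2, 3, 4, 5 (satisfied)
- N-gram order: 1-gram, 2-gram, ..., 8-gram
- Input sequence
- User dialog acts only (USR)
- System dialog acts only (SYS)
- Both user and system dialog acts (USRSYS)
- Evaluation metric
- Accuracy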
27Detection result for 6-classes of satisfaction
28Confusion matrix
Rows: actual class; columns: estimated class
Actual \ Estimated    ?    1    2    3    4    5
?                    43    5    7    5    6    3
1                     0    7    8    9   11    3
2                     1    8   31   16   35   11
3                     0    9   22   23   45    8
4                     0    8   34   29   66   18
5                     0    4    5    6   24    8
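The accuracy implied by this confusion matrix can be checked with a short calculation. The sketch assumes rows are actual classes and columns are estimated classes, which is consistent with the row sums matching the number of subjects per class (69, 38, 102, 107, 155, 47).

```python
labels = ["?", "1", "2", "3", "4", "5"]
confusion = [
    [43, 5, 7, 5, 6, 3],
    [0, 7, 8, 9, 11, 3],
    [1, 8, 31, 16, 35, 11],
    [0, 9, 22, 23, 45, 8],
    [0, 8, 34, 29, 66, 18],
    [0, 4, 5, 6, 24, 8],
]

total = sum(sum(row) for row in confusion)
# Correct estimates lie on the diagonal.
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

# Per-class recall: fraction of each actual class that was estimated correctly.
recall = {lab: confusion[i][i] / sum(confusion[i]) for i, lab in enumerate(labels)}
```

The diagonal sums to 178 of 518 subjects, an accuracy of about 34%. The task-incomplete class (?) is recovered far more reliably (43 of 69) than any individual satisfaction level.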
31Modeling the N-gram
- Encoded dialog logs to dialog act symbols automatically
- Users' dialog acts
- Using speech recognition results
- They are defined in the recognition dictionary
- System's dialog acts
- Using system responses or acts
- They are the same as the system's internal acts
- A dialog act sequence x
- The dialog act symbols arranged in time order t
- Construct an N-gram model for each of the 6 classes
- Witten-Bell smoothing (using the SRILM toolkit)