Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System

Author: Sunao Hara
Slides: 34
Learn more at: http://www.lrec-conf.org

1
Estimation Method of User Satisfaction Using
N-gram-based Dialog History Model for Spoken
Dialog System
LREC2010 O3 - Dialogue and Evaluation
  • Sunao Hara, Norihide Kitaoka, Kazuya Takeda
  • naoh, kitaoka, kazuya.takeda_at_nagoya-u.jp

2
Introduction
  1. Introduction
  2. Musicnavi2 database
  3. N-gram modeling
  4. Estimation experiment
  5. Conclusion
  • The aim of this study
    • Construct an estimation model of user
      satisfaction for spoken dialog systems (SDSs)
      based on data from real PC environments
  • Experiment
    • Field experiment using an SDS for a music
      retrieval application
    • Construct and evaluate an estimation model for
      user satisfaction using an N-gram history model

3
Background (1/2)
  • Use of speech input applications (e.g. Skype) by
    PC users is spreading
    • More users may use Spoken Dialog Systems
      (SDSs) via the Internet
  • The acoustic properties of PC environments differ
    among users
    • e.g. microphones, noise conditions, etc.
  • From a practical application standpoint
    • Evaluation and prediction of the system
      performance (User Satisfaction) are also
      important issues

Collect speech in realistic PC environments
Build an estimation model for User Satisfaction
4
Background (2/2)
  • Evaluation using automatically measured
    metrics
    • Tune the system parameters at the design
      stage
    • Select the best dialog strategy for SDS
      applications
    • PARADISE framework [Walker et al. 1997]
  • Detection of problematic dialogs for call
    center Interactive Voice Response (IVR) systems
    • Detect as early as possible that the
      conversation will break down
    • Problematic dialog predictor using the SLU-success
      feature [Walker et al. 2002]
      (SLU: Spoken Language Understanding)
    • N-gram-based call quality monitoring system
      [Kim 2007]

Can we estimate the user satisfaction of an SDS by
modeling the dialog context?
5
MusicNavi2 database
  • Field experiment using a music retrieval system
    with a spoken dialog interface
    • 1. Download the system through the Internet
    • 2. Use it for a certain period
    • 3. Fill in questionnaires on the web page
  • Music retrieval system: MusicNavi2
    • Music retrieval application with a spoken dialog
      interface
    • The spoken dialog interface retrieves and
      plays songs stored on the user's PC
    • Can collect speech data in cooperation with a
      server program via the Internet

6
Example of a dialog
U: User, S: System
User's utterances / System's prompts
U: Hello (ko-n-ni-chi-wa)
S: Hello
U: Da-i-to-ka-i
S: Do you want to retrieve the song Da-i-to-ka-i?
U: Yes (ha-i)
S: Now, playing the song Da-i-to-ka-i by Crystal King.
U: Stop (te-i-shi)
S: Now, stopping.
7
Data collection by the field test
  • Large-scale field test through the Internet
    • Subjects used MusicNavi2 on their own PCs
    • Participants: 1369 subjects
    • Total usage: 488 hours
  • User's task
    • To listen to at least five songs
    • To perform at least twenty QA dialogs, or to
      use the system for over forty minutes
  • Questionnaire (only for users who completed the task)
    • Satisfaction level for the SDS, from 1 to 5

1: Extremely unsatisfied, 2: Unsatisfied, 3: Acceptable, 4: Satisfied, 5: Extremely satisfied
8
Distributions of the experimental subjects
and the equipment they used
  • Subjects who answered questionnaires
    • 449 subjects (278 males and 171 females)
    • Total: 34296 utterances

[Charts: Microphone; Loudspeaker / headphone]
9
Overview of the MusicNavi2 database
[Figure panels: Word Error Rate; Utterances per song played; # of utterances]
10
Pre-analysis of the MusicNavi2 database
  • Classification of users by their satisfaction
    level
    • Task complete users: c = 1, 2, 3, 4, 5
    • Task incomplete users: c = ?
  • Summary of data
    • Total: 518 subjects

c                 ?      1      2      3      4      5
# of subjects     69     38     102    107    155    47
# of utterances   52.2   134.5  119.7  114.9  106.5  98.4
WER (%)           70.5   54.1   51.0   46.8   41.2   35.3
Utt. / song       107    7.21   5.34   5.12   4.22   3.43
11
Modeling method for the dialog
context
  • The dialog management of an SDS is designed by a
    dialog developer
    • The management is not always satisfactory for
      users
  • Assume that satisfaction appears in the dialog
    context
    • Statistically learn the naturalness of the
      dialog
  • Use N-grams to model the dialog context
    • Construct a model for each class of users
    • Estimate an unknown user's satisfaction based on
      the likelihood under each N-gram model

12
Spoken dialog logs to dialog act symbols
  • The vocabulary size of the recognition dictionary
    (that is, the number of songs) differs
    between users
  • Word-level information is informative, but it is
    too sparse to model statistically
  • Use dialog act symbols for the user's and system's
    acts
    • Defined 21 system dialog acts and 19 user dialog
      acts

13
Example of an encoded dialog
U: User, S: System
Dialog act symbol        User's utterance / System's prompt
x1 USR_CMD_HELLO         U: Hello (ko-n-ni-chi-wa)
x2 SYS_INFO_GREETING     S: Hello
x3 USR_REQUEST_BYMUSIC   U: Da-i-to-ka-i
x4 SYS_CONFIRM_KEYWORD   S: Do you want to retrieve the song Da-i-to-ka-i?
x5 USR_CMD_YES           U: Yes (ha-i)
x6 SYS_PLAY_SONG         S: Now, playing the song Da-i-to-ka-i by Crystal King.
x7 USR_CMD_STOP          U: Stop (te-i-shi)
x8 SYS_INFO_STOPPED      S: Now, stopping.
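
As a minimal sketch, the encoded example dialog can be split into the three input sequences used in the estimation experiment (USR, SYS, SYSUSR) and cut into N-grams. The sketch assumes that the speaker can be read from the `USR_` / `SYS_` prefix of each symbol; the helper names `split_streams` and `ngrams` are illustrative and are not part of MusicNavi2.

```python
from typing import Dict, List, Tuple

# Dialog act symbols x1..x8 from the encoded example dialog.
DIALOG = [
    "USR_CMD_HELLO",
    "SYS_INFO_GREETING",
    "USR_REQUEST_BYMUSIC",
    "SYS_CONFIRM_KEYWORD",
    "USR_CMD_YES",
    "SYS_PLAY_SONG",
    "USR_CMD_STOP",
    "SYS_INFO_STOPPED",
]


def split_streams(acts: List[str]) -> Dict[str, List[str]]:
    """Return the user-only, system-only and combined sequences.

    Assumption: the speaker is encoded in the symbol prefix.
    """
    return {
        "USR": [a for a in acts if a.startswith("USR_")],
        "SYS": [a for a in acts if a.startswith("SYS_")],
        "SYSUSR": list(acts),
    }


def ngrams(acts: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a dialog act sequence."""
    return [tuple(acts[i:i + n]) for i in range(len(acts) - n + 1)]


streams = split_streams(DIALOG)
bigrams = ngrams(streams["SYSUSR"], 2)
```

The combined sequence keeps the turn-taking order, so a bigram such as (USR_REQUEST_BYMUSIC, SYS_CONFIRM_KEYWORD) captures how the system reacted to a user act, which the single-speaker sequences cannot express.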
14
Modeling the dialog act sequence by N-gram
  • A dialog act sequence x
    • The dialog act symbols arranged in time order t
  • N-gram probability (likelihood) of x given the
    model for a user class c
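
The two quantities named on this slide can be written out with the standard N-gram factorization. This is a sketch under that assumption: the sequence length T and the exact notation are introduced here for illustration and may differ from the authors' own formulas.

```latex
x = (x_1, x_2, \dots, x_T)

P(x \mid c) = \prod_{t=1}^{T} P\left(x_t \mid x_{t-N+1}, \dots, x_{t-1}, c\right)
```

Each factor conditions the dialog act at time t on the N-1 preceding acts, with the probabilities estimated only from the dialogs of users in class c.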

15
Estimation experiment
  • Detection of the user's class using N-gram models
  • Experimental conditions
    • N-gram: 1-gram, 2-gram, ..., 8-gram
    • Witten-Bell smoothing (using the SRILM toolkit)
    • Input sequence: USR, SYS, SYSUSR
    • Leave-one-out cross validation
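
The conditions above can be illustrated with a small pure-Python sketch. It is not SRILM and not the authors' setup: it is limited to bigrams, uses a simple interpolated Witten-Bell estimate, and runs on hypothetical toy data. The class name `WittenBellBigram`, the boundary markers, the class labels, and the symbol `USR_CMD_NO` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"  # sequence boundary markers (an assumption of this sketch)


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model over dialog act symbols."""

    def __init__(self, sequences: Sequence[Sequence[str]], vocab: Sequence[str]):
        self.vocab_size = len(set(vocab) | {EOS})
        self.unigram: Counter = Counter()
        self.bigram: Dict[str, Counter] = defaultdict(Counter)
        for seq in sequences:
            padded = [BOS, *seq, EOS]
            for prev, cur in zip(padded, padded[1:]):
                self.unigram[cur] += 1
                self.bigram[prev][cur] += 1
        self.total = sum(self.unigram.values())

    def p_unigram(self, w: str) -> float:
        # Interpolate with a uniform distribution; the weight of the uniform
        # part grows with the number of distinct symbol types seen.
        if self.total == 0:
            return 1.0 / self.vocab_size
        types = len(self.unigram)
        return (self.unigram[w] + types / self.vocab_size) / (self.total + types)

    def p_bigram(self, prev: str, w: str) -> float:
        followers = self.bigram.get(prev)
        if not followers:  # unseen history: fall back to the unigram estimate
            return self.p_unigram(w)
        n, t = sum(followers.values()), len(followers)
        return (followers[w] + t * self.p_unigram(w)) / (n + t)

    def log_likelihood(self, seq: Sequence[str]) -> float:
        padded = [BOS, *seq, EOS]
        return sum(math.log(self.p_bigram(p, c)) for p, c in zip(padded, padded[1:]))


def leave_one_out_accuracy(data: List[Tuple[str, List[str]]]) -> float:
    """Hold out each dialog once, train one model per class on the rest,
    and pick the class whose model gives the highest likelihood."""
    vocab = sorted({a for _, seq in data for a in seq})
    correct = 0
    for i, (label, seq) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        models = {
            c: WittenBellBigram([s for lab, s in rest if lab == c], vocab)
            for c in sorted({lab for lab, _ in rest})
        }
        guess = max(models, key=lambda c: models[c].log_likelihood(seq))
        correct += int(guess == label)
    return correct / len(data)


# Toy data (hypothetical): USR_CMD_NO is not a symbol taken from the slides.
SMOOTH = ["USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD", "USR_CMD_YES", "SYS_PLAY_SONG"]
ROUGH = ["USR_REQUEST_BYMUSIC", "SYS_CONFIRM_KEYWORD", "USR_CMD_NO",
         "SYS_CONFIRM_KEYWORD", "USR_CMD_NO"]
DATA = [("satisfied", SMOOTH)] * 3 + [("unsatisfied", ROUGH)] * 3
accuracy = leave_one_out_accuracy(DATA)
```

Retraining every class model for each held-out dialog is affordable at this scale, and it keeps the held-out dialog out of its own class model, which is the point of leave-one-out evaluation.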

16
Estimation experiment
  • Detection method
  • Model selection by thresholding the likelihood
    ratio
  • Evaluation metrics
  • ROC curve
  • Area under the ROC curve (AUC)

[Figure: ROC curve, true detection vs. false detection]
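
A minimal sketch of how an ROC curve and its AUC follow from thresholding a score. It assumes that each dialog has a score equal to the log-likelihood ratio between the target class model and the competing model, and that a dialog is detected when its score reaches the threshold. The scores and labels below are made-up illustration values, not results from the experiment.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Sweep the threshold over all observed scores.

    Returns (false detection rate, true detection rate) pairs, starting
    at (0, 0). A dialog is detected when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical log-likelihood ratios; label 1 marks the class to be detected.
SCORES = [2.1, 1.3, 0.4, -0.2, -0.9, -1.5]
LABELS = [1, 1, 0, 1, 0, 0]
curve = roc_points(SCORES, LABELS)
area = auc(curve)
```

An AUC of 1.0 means some threshold separates the two groups perfectly, while 0.5 corresponds to chance-level detection.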
17
AUC (Area under the ROC curve)
  • Task incomplete users

N        SYS     USR     SYSUSR
1-gram   0.901   0.873   0.927
2-gram   0.948   0.929   0.977
3-gram   0.989   0.954   0.993
4-gram   0.995   0.952   0.997
5-gram   0.993   0.954   0.995
6-gram   0.989   0.951   0.995
7-gram   0.988   0.946   0.995
8-gram   0.987   0.936   0.994

  • Unsatisfied users

N        SYS     USR     SYSUSR
1-gram   0.611   0.638   0.619
2-gram   0.628   0.644   0.724
3-gram   0.591   0.651   0.704
4-gram   0.583   0.681   0.739
5-gram   0.629   0.662   0.739
6-gram   0.632   0.639   0.761
7-gram   0.604   0.633   0.765
8-gram   0.592   0.622   0.756
18
Detection result of task incomplete users
  • SYSUSR

N AUC
1-gram 0.927
2-gram 0.977
3-gram 0.993
4-gram 0.997
5-gram 0.995
6-gram 0.995
7-gram 0.995
8-gram 0.994
19
Detection result of unsatisfied users
  • SYSUSR

N AUC
1-gram 0.619
2-gram 0.724
3-gram 0.704
4-gram 0.739
5-gram 0.739
6-gram 0.761
7-gram 0.765
8-gram 0.756
20
Conclusion
  • Estimation method of user satisfaction using an
    N-gram-based dialog history model for SDSs
    • Constructed a database from real PC environments
    • Achieved high performance in the detection of
      task incomplete users
      • 100% true detection rate at a 6% false detection
        rate
    • Insufficient performance in the detection of
      unsatisfied users
    • Higher-order N-gram models were more effective
      than the 1-gram model
    • Using both system and user dialog acts was
      effective
  • Future work
    • N-gram model-based estimation of dialog failure
      (online detection)
    • Analysis of the dialog contexts that affect user
      satisfaction
    • Integrated method using acoustic features,
      prosodic features, dialog features, etc.

21
  • Thanks for your kind attention!

23
Modeling the dialog act sequence by N-gram
  • Dialog logs are encoded into dialog act symbols
    automatically
    • User's dialog acts: derived from speech recognition
      results; defined in the recognition dictionary
    • System's dialog acts: derived from system prompts or
      responses; the same as the system's internal acts
  • A dialog act sequence x
    • The dialog act symbols arranged in time order t
  • N-gram probability (likelihood) of x given a
    model with a satisfaction level s
24
Detection by thresholding
  • Model selection by an a posteriori odds
    classifier
  • Introduce the a priori odds 1/a and the Bayes factor B
  • Finally, the decision reduces to comparing B with
    the threshold a

a = 1 gives the ML (maximum likelihood) classifier
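
The bullets on this slide are consistent with the usual two-class posterior odds rule, sketched below. The class labels c_1 (the class to be detected) and c_0 (the alternative) are notation assumed here, and the derivation is a reconstruction rather than the authors' exact formulas.

```latex
\frac{P(c_1 \mid x)}{P(c_0 \mid x)}
  = \frac{P(x \mid c_1)}{P(x \mid c_0)} \cdot \frac{P(c_1)}{P(c_0)}
  = B \cdot \frac{1}{a},
\qquad
B = \frac{P(x \mid c_1)}{P(x \mid c_0)}

\text{decide } c_1 \iff B \cdot \frac{1}{a} > 1 \iff B > a
```

With a = 1 the prior odds are even, so the rule selects whichever class model assigns the higher likelihood to x. Sweeping a over a range of values traces out an ROC curve.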
25
6-class estimation experiment
  • Experimental conditions
    • Leave-one-out: one subject held out, the
      remaining 517 used for training
    • Class s: ? (task incomplete), 1 (unsatisfied), 2, 3, 4, 5 (satisfied)
    • N-gram: 1-gram, 2-gram, ..., 8-gram
  • Input sequence
    • User dialog acts only (USR)
    • System dialog acts only (SYS)
    • Both user and system dialog acts (USRSYS)
  • Evaluation metric
    • Accuracy
27
Detection result for 6-classes of satisfaction
28
Confusion matrix
  • 3-gram of SYS sequence
  • Rows: actual class; columns: estimated class

Actual \ Estimated   ?    1    2    3    4    5
?                    43   5    7    5    6    3
1                    0    7    8    9    11   3
2                    1    8    31   16   35   11
3                    0    9    22   23   45   8
4                    0    8    34   29   66   18
5                    0    4    5    6    24   8
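
As a short worked example, overall accuracy and per-class recall can be computed directly from the confusion matrix. The sketch assumes that rows are actual classes and columns are estimated classes; the row sums (69, 38, 102, 107, 155, 47) match the per-class subject counts in the pre-analysis summary, which supports that reading.

```python
LABELS = ["?", "1", "2", "3", "4", "5"]

# Rows: actual class, columns: estimated class (3-gram, SYS sequence).
CONFUSION = [
    [43, 5, 7, 5, 6, 3],
    [0, 7, 8, 9, 11, 3],
    [1, 8, 31, 16, 35, 11],
    [0, 9, 22, 23, 45, 8],
    [0, 8, 34, 29, 66, 18],
    [0, 4, 5, 6, 24, 8],
]

total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(len(LABELS)))
accuracy = correct / total

# Recall per actual class: diagonal entry divided by the row sum.
recall = {
    label: CONFUSION[i][i] / sum(CONFUSION[i])
    for i, label in enumerate(LABELS)
}
```

On these numbers 178 of 518 subjects are classified correctly (about 34%). Task incomplete users are recovered far more reliably (43 of 69) than any of the five satisfaction levels.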
31
Modeling the N-gram
  • Dialog logs are encoded into dialog act symbols
    automatically
  • User's dialog acts
    • Using speech recognition results
    • They are defined in the recognition dictionary
  • System's dialog acts
    • Using system responses or acts
    • They are the same as the system's internal acts
  • A dialog act sequence x
    • The dialog act symbols arranged in time order t
  • Construct an N-gram model for each of the 6 classes
    • Witten-Bell smoothing (using the SRILM toolkit)
