DQR test suites for spoken dialogue system evaluation: A paradigm for a qualitative evaluation

LREC '98, Granada

1
DQR test suites for spoken dialogue system evaluation: A paradigm for a qualitative evaluation

Jean-Yves Antoine
VALORIA
U. Bretagne Sud
Vannes, France

Jérôme Zeiliger
INRS-Telecom
Quebec, Canada

Jean Caelen
CLIPS, Institut IMAG
Grenoble, France
2
Quantitative evaluation

Overall performance of the system
Accuracy rates: system outputs vs. predefined references
Advantages
Objective evaluation
Overall improvements over time
Drawbacks
Lack of predictive power
Lack of genericity

3
Predictability: some questions

Overall accuracy rate of the system
How does it depend on the performance of its components?
Overall accuracy rate of a specific component
How does it depend on the testing data?
How does it depend on the application?
What does it tell us about future improvements?

4
Predictability: a solution
Quantitative evaluation
Qualitative evaluation

Assessment of the overall improvements of the technology
Appropriateness to a specific task / application

Evaluation of the system's behaviour on EVERY specific phenomenon
PREDICTABILITY
5
DQR methodology

Qualitative Evaluation in NLP
TSNLP, FRACAS, AUPELF-UREF
DQR test suites

Declaration (D): the utterance the system should understand. D concerns a specific phenomenon.
"Peter is attending a meeting. He is to chair it."
Question (Q): assesses the understanding of D.
"Is Peter to chair a meeting?"
Reply (R): Yes / No
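
The D/Q/R triple above can be held in a small record. The following is a minimal sketch, not the authors' format: the class name `DQRItem`, its field names, the `passes` helper and the "anaphora" label attached to the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQRItem:
    """One test item: a Declaration, a Question about it, and the expected Reply."""
    phenomenon: str       # hypothetical label for the phenomenon under test
    declaration: str      # D: the utterance the system should understand
    question: str         # Q: probes the understanding of D
    expected_reply: bool  # R: True for "Yes", False for "No"


# The example from the slide; the "anaphora" label is our assumption.
item = DQRItem(
    phenomenon="anaphora",
    declaration="Peter is attending a meeting. He is to chair it.",
    question="Is Peter to chair a meeting?",
    expected_reply=True,
)


def passes(test_item: DQRItem, system_reply: bool) -> bool:
    """A system passes the item when its Yes/No reply equals the expected one."""
    return system_reply == test_item.expected_reply
```

A system that answers Yes to the question passes this item; one that answers No fails it.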

6
DQR Evaluation and Speech
EXTENSIONS OF THE DQR METHODOLOGY
Specificity of the spoken language interaction
Specificity of the speech technologies
Structural analysis: spontaneous, unexpected structures
Dialogue strategy
Practical adaptation of the DQR test suites
7
Multi-level Evaluation

Speech Understanding

Literal understanding (structural analysis)
Implicit understanding (anaphora, ellipses)
Inference
- common-sense reasoning (logical inferences)
- pragmatic reasoning
- multiple-turn inferences

Dialogue

Speech acts interpretation (intention in action)
Speaker's intention recognition (preliminary intention)
Relevance
- reply of the system
- dialogue strategy

8
Practical achievement
Simplicity of the question Q

(D) I need to go to Granada tomorrow morning
(Q) Go to Granada
(R) Yes

Simplicity of the evaluation

Computation of the answer: mere unification
Accuracy rate specific to each phenomenon

R_system = UNIF(D, Q)
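
The reply computation and the per-phenomenon accuracy rate can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the system's internal representations are assumed to be flat slot/value frames, the slot names are invented, and `unif` is a simple one-way match (every slot of Q must appear in D with the same value) standing in for the unification named on the slide.

```python
from collections import defaultdict


def unif(d, q):
    """Reply Yes (True) when every slot of Q is found in D with the same value.

    Assumption: a one-way match over flat frames, used here as a stand-in
    for unification over the system's own representations.
    """
    return all(slot in d and d[slot] == value for slot, value in q.items())


def accuracy_by_phenomenon(outcomes):
    """Compute one accuracy rate per phenomenon.

    `outcomes` is an iterable of (phenomenon, passed) pairs, where `passed`
    says whether the system's reply matched the expected reply.
    """
    totals = defaultdict(int)
    passed_counts = defaultdict(int)
    for phenomenon, passed in outcomes:
        totals[phenomenon] += 1
        passed_counts[phenomenon] += int(passed)
    return {p: passed_counts[p] / totals[p] for p in totals}


# Hypothetical frames for "(D) I need to go to Granada tomorrow morning"
# and "(Q) Go to Granada"; the slot names are assumptions.
d = {"act": "go", "destination": "Granada", "day": "tomorrow", "part_of_day": "morning"}
q = {"act": "go", "destination": "Granada"}
r_system = unif(d, q)  # True, i.e. the reply "Yes"
```

Because both D and Q are analysed by the system under test, the comparison only involves that system's own representations.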
9
Genericity
Unification of the intrinsic representations of the system
No predefined references
No common representations
Complete independence
10
Predictability: literal understanding

Key information retrieval

(D) I need to go to Granada tomorrow morning
(Q) Go to Granada
(R) Yes

Sharper understanding

(D) Turn on right after the building with the red shutters
(Q) Red shutters
(R) Yes
(Q) Building with shutters
(R) Yes
11
Predictability: negative tests
Positive tests: tracking the errors
Negative tests: explaining the errors
Example: literal understanding
(D) Turn on right after the building with the red shutters
(Q) Red building
(R) No
(D) Move the circle and the triangle on the right
(Q) Move the right triangle
(R) No
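
One way to see what a negative test adds is to compare two hypothetical understanding strategies on the shutters example. The sketch below is an assumption-laden illustration: the nested frames, the slot names, and both functions (`structural_reply`, `keyword_reply`) are invented here and are not taken from the presentation. The structural matcher uses a one-way nested match rather than full unification.

```python
def subsumed(q, d):
    """True when every slot of q appears in d with a matching (possibly nested) value."""
    if isinstance(q, dict):
        return isinstance(d, dict) and all(
            slot in d and subsumed(value, d[slot]) for slot, value in q.items()
        )
    return q == d


def structural_reply(d, q):
    """Yes (True) when q matches d itself or any nested sub-structure of d."""
    if subsumed(q, d):
        return True
    if isinstance(d, dict):
        return any(structural_reply(sub, q) for sub in d.values())
    return False


def leaves(x):
    """Collect all atomic values of a (possibly nested) frame."""
    if isinstance(x, dict):
        found = set()
        for value in x.values():
            found |= leaves(value)
        return found
    return {x}


def keyword_reply(d, q):
    """A deliberately naive strategy: Yes when all words of q occur somewhere in d."""
    return leaves(q) <= leaves(d)


# Hypothetical frame for "(D) Turn on right after the building with the red shutters".
d = {
    "act": "turn",
    "direction": "right",
    "landmark": {"head": "building", "with": {"head": "shutters", "color": "red"}},
}
red_shutters = {"head": "shutters", "color": "red"}                   # expected: Yes
building_with_shutters = {"head": "building", "with": {"head": "shutters"}}  # expected: Yes
red_building = {"head": "building", "color": "red"}                   # expected: No
```

Both strategies answer Yes to the two positive questions, so positive tests alone cannot tell them apart. Only the negative question "Red building" exposes the keyword strategy, which wrongly answers Yes because it ignores which noun the adjective attaches to.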
12
Predictability: spoken constructions