Building a sentential model for automatic prosody evaluation

1
Building a sentential model for automatic prosody evaluation
Part A
  • Kyuchul Yoon
  • School of English Language & Literature
  • Yeungnam University
  • 2009.06.19
  • Korea University

2
English pronunciation evaluation
Introduction
  • English pronunciation proficiency evaluation
  • Ultimate goals
  • Evaluation at
  • The segmental level
  • The suprasegmental level
  • Current goals
  • Evaluation at
  • The suprasegmental level

3
English pronunciation evaluation
Introduction
  • The goal of the present study
  • Prosody evaluation of a single target utterance
  • Produced by a Korean student
  • Given
  • An English target sentence
  • A sentential model for prosody evaluation

4
Manual vs. automatic
Introduction
  • Problems of manual evaluation
  • What to evaluate
  • How to evaluate
  • Consistency
  • Problems of automatic evaluation
  • How to reflect human knowledge

5
Manual vs. automatic
Introduction
  • A possible solution?
  • Avoid knowledge-based abstraction
  • Compare a target utterance with native speakers'
    utterances
  • Use multiple utterances for comparison
  • Multiple good utterances from native speakers
  • Adopt raw values
  • Calculate difference values between the target
    and the good utterances in terms of
  • The three prosodic aspects: F0, intensity,
    durations → 3D coordinates

6
How to build the model
Introduction
  • Use multivariate statistical analysis
  • A discriminant analysis
  • The components of the model
  • (The segmental proficiency scores controlled)
  • The manual prosody evaluation scores (response)
  • The automatic prosody evaluation scores (factors)
  • The requirements of the model
  • The correlation between the two levels: manual
    scores vs. automatic scores
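The correlation requirement above can be illustrated with a minimal sketch. It assumes that each utterance's automatic score has already been collapsed into a single distance value, which the slides do not state; the data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: manual scores (1 to 5) and one automatic distance
# per utterance (assumption: larger distance means less native-like).
manual = [5, 5, 3, 3, 1, 1]
automatic = [120.0, 150.0, 400.0, 380.0, 900.0, 850.0]

r = pearson_r(manual, automatic)
print(f"r = {r:.3f}")
```

With distance-like automatic scores, a usable model would be expected to show a strongly negative correlation, since larger distances from the native utterances should go with lower manual scores.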

7
How to build the model
Introduction
  • The manual prosody scores (an ideal case)
  • The good utterance versions (point 5) by many
    native speakers of English
  • The utterance versions by Korean students whose
    prosodic proficiencies are
  • High (point 5)
  • Intermediate (point 3)
  • Low (point 1)
  • On a scale of 1 (worst) to 5 (best)

8
How to build the model
Introduction
  • The automatic prosody scores
  • Use of Praat scripts
  • Comparison between a single target utterance and
    multiple native speakers' utterances to yield
    scores for
  • The F0 difference
  • The intensity difference
  • The duration difference
  • in the form of 3D coordinates (x, y, z) = (F0,
    Int, Dur)
  • One utterance yields as many coordinates as the
    number of good native speakers
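A minimal sketch of how such a score array could be assembled is given here. The slides use Praat scripts; this Python version is only an illustration, and the difference measure (sum of absolute point-wise differences), the dictionary layout, and all values are assumptions.

```python
def total_abs_difference(target, model):
    """Sum of absolute point-wise differences between two equal-length sequences."""
    if len(target) != len(model):
        raise ValueError("sequences must be aligned to the same length")
    return sum(abs(t - m) for t, m in zip(target, model))

def score_array(target, natives):
    """Return one (F0, Int, Dur) coordinate per native model utterance."""
    return [
        (
            total_abs_difference(target["f0"], native["f0"]),
            total_abs_difference(target["intensity"], native["intensity"]),
            total_abs_difference(target["durations"], native["durations"]),
        )
        for native in natives
    ]

# Hypothetical, already time-aligned measurements
# (F0 in Hz, intensity in dB, segment durations in ms).
target = {"f0": [200, 180, 150], "intensity": [70, 65, 60], "durations": [100, 200, 150]}
natives = [
    {"f0": [210, 190, 140], "intensity": [72, 66, 61], "durations": [120, 180, 150]},
    {"f0": [200, 170, 150], "intensity": [70, 60, 60], "durations": [100, 250, 100]},
]

coords = score_array(target, natives)
print(coords)
```

The list has one coordinate per native speaker, so ten model utterances would give ten points for a single learner utterance.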

9
How to build the model
Introduction
  • Evaluation by comparisons

10
A 3D sentential model for prosody evaluation
Introduction
  • A 3D model
  • 3D axes: F0, intensity, durations
  • (F0, Int, Dur) coordinates = (x, y, z)
  • Automatic scores as scatterplot points
  • Manually evaluated scores group the points

11
A 3D sentential model for prosody evaluation
Introduction
  • Validity of the model
  • Sufficient separation of groups with different
    manual scores
  • Colors: manual scores
  • Arrowheads: automatic scores

12
Sentential prosody evaluation [7]
Methods
Before and after duration manipulation
(Figure panels: native, learner before, learner after)
13
Sentential prosody evaluation [7]
Methods
F0 point-to-point comparison between native and
learner after normalization
(Figure panels: native, learner after)
Automatic score: (F0, Int, Dur) = (x, y, z)
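A point-to-point comparison of this kind can be sketched as follows. The sketch assumes the two contours already span the same normalized time axis and differ only in the number of samples, so both are linearly resampled to a common number of points before the differences are summed; the function names and values are hypothetical, and the same routine would apply to intensity contours.

```python
def resample(values, n):
    """Linearly interpolate `values` onto n equally spaced points."""
    if len(values) < 2 or n < 2:
        raise ValueError("need at least two input points and two output points")
    step = (len(values) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = min(int(pos), len(values) - 2)  # index of the left neighbour
        frac = pos - lo                      # position between the two neighbours
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out

def point_to_point_difference(learner, native, n_points=100):
    """Sum of absolute differences between two contours on a shared time grid."""
    a = resample(learner, n_points)
    b = resample(native, n_points)
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical F0 contours in Hz.
learner_f0 = [100.0, 200.0]
native_f0 = [110.0, 160.0, 210.0]

f0_score = point_to_point_difference(learner_f0, native_f0, n_points=3)
print(f0_score)
```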
14
Sentential prosody evaluation [7]
Methods
Intensity point-to-point comparison between native
and learner after normalization
(Figure panels: native, learner after)
Automatic score: (F0, Int, Dur) = (x, y, z)
15
Sentential prosody evaluation [7]
Methods
Duration segment-to-segment comparison between
native and learner
(Figure panels: native, learner before)
Automatic score: (F0, Int, Dur) = (x, y, z)
Euclidean distance metric as the evaluation measure:
for P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ...,
qn) in Euclidean n-dimensional space,
$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
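As a small worked example of the Euclidean distance measure, the sketch below compares two vectors of segment durations. The duration values are hypothetical, and treating each segment as one dimension is an assumption about how the measure is applied.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points in n-dimensional space."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical durations (ms) of three matching segments.
native_durations = [80, 120, 60]
learner_durations = [110, 160, 60]

dur_score = euclidean_distance(native_durations, learner_durations)
print(dur_score)  # sqrt(30^2 + 40^2 + 0^2) = 50.0
```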
16
Manual evaluation of sentential prosody
Methods
Manual scores for Set B utterances: "The dancing
queen likes only the apple pies"
17
Sentential prosody evaluation [7]
Methods
A sample score array for one utterance from group
K5: one learner utterance vs. 10 model native
utterances
Automatic prosody score for K5.U1:
(899,142,408), (360,92,190), (716,178,183)
18
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
19
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
20
A sample prosody evaluation with a discriminant
analysis
Results
21
To make this fully automatic
Discussion
  • For manual evaluation of the training model
  • The number of Korean learners
  • The more the better
  • The levels of English proficiency
  • The more diverse the better (scores 1 through 5)
  • For automatic evaluation of the trainees
  • Need automatic segmentation (ASR)
  • Need to deal with redundant/missing segments

22
Building a sentential model for automatic
evaluation of pronunciation proficiency
  • What about segmental evaluation?

Part B
23
Segmental evaluation by spectral comparison
Methods
  • Sex/age controlled (no normalization was used)
  • Adult male (native/Korean) speakers were selected
  • Spectral comparison
  • Three equally spaced spectral slices were used
    for each pair of matching segments
  • A Euclidean distance measure was computed from
    each pair of matching spectral envelopes
  • Four coordinates for pronunciation proficiency
    evaluation
  • Segments, F0, intensity, durations
  • (w, x, y, z) becomes one entry of the score array
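The spectral comparison above can be sketched under several assumptions: the three slices are taken at one quarter, one half, and three quarters of the segment, each spectral envelope is a list of levels (dB) at the same frequency bins for both speakers, and the per-slice distances are summed into the segmental score w. None of these details is given in the slides, and all names and values are hypothetical.

```python
import math

def slice_times(start, end, n_slices=3):
    """Times of n equally spaced slices strictly inside the segment [start, end]."""
    width = (end - start) / (n_slices + 1)
    return [start + width * (i + 1) for i in range(n_slices)]

def envelope_distance(env_a, env_b):
    """Euclidean distance between two spectral envelopes on the same frequency bins."""
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must share the same frequency bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a, env_b)))

def segment_distance(slices_a, slices_b):
    """Sum of envelope distances over the matching slices of one segment pair."""
    return sum(envelope_distance(a, b) for a, b in zip(slices_a, slices_b))

# Hypothetical envelopes: three slices, two frequency bins each.
native_slices = [[60.0, 40.0], [62.0, 42.0], [58.0, 38.0]]
learner_slices = [[63.0, 44.0], [62.0, 42.0], [58.0, 38.0]]

times = slice_times(0.0, 4.0)
w = segment_distance(native_slices, learner_slices)
print(times, w)
```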

24
Manual evaluation of overall proficiency
Methods
Manual scores for Set C utterances: "Put your toys
away right now"
<Table 4> The overall scores of the 34 utterances
for the Set C sentence "Put your toys away right
now." The manual evaluation was performed by a
Korean phonetician. Note that the subjects were
all male adults.
25
A pronunciation proficiency evaluation model by a
Korean phonetician
Results
Korean phonetician's models
(Intensity axis not shown)
26
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
27
A discriminant analysis
Results
<Table 5> The classification table from the
discriminant analysis of one test data. The
number in each cell represents the probability of
the automatic pronunciation proficiency score
being classified into the predicted group.
<Table 6> The confusion matrix for the
classification table.
28
Discriminant analyses with leave-one-out
cross-validation
Results
Testing for score 4: 6 out of 9 correct
Testing for score 2: 12 out of 15 correct
29
Discriminant analyses with leave-one-out
cross-validation
Results
  • For the N4 and K2 groups, evaluation models were
    built by using
  • The discriminant analysis with
  • Leave-one-out cross-validation
  • The number of models (built by discriminant
    analyses) was 24
  • Group N4: 9 subjects
  • Group K2: 15 subjects
  • Success rate
  • Group N4: 6 out of 9 predicted correctly
  • Group K2: 12 out of 15 predicted correctly
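The leave-one-out procedure can be sketched as below: each subject is held out once, a model is built from the remaining subjects, and the held-out subject is classified, so 9 + 15 subjects give 24 models. To keep the sketch self-contained, a nearest-centroid classifier stands in for the discriminant analysis; this substitution, the function names, and the data are assumptions and do not reproduce the reported results.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_centroid(train, sample):
    """Return the label whose group centroid is closest to `sample`.

    Stand-in for a discriminant analysis (assumption, not the slides' method).
    """
    groups = {}
    for label, point in train:
        groups.setdefault(label, []).append(point)
    return min(groups, key=lambda lab: distance(centroid(groups[lab]), sample))

def leave_one_out(data):
    """Hold out each sample once; return {label: (correct, total)}."""
    results = {}
    for i, (label, point) in enumerate(data):
        train = data[:i] + data[i + 1:]  # one model per held-out sample
        predicted = nearest_centroid(train, point)
        correct, total = results.get(label, (0, 0))
        results[label] = (correct + int(predicted == label), total + 1)
    return results

# Hypothetical (F0, Int, Dur) scores for two groups.
data = [
    ("N4", (100, 40, 90)), ("N4", (120, 50, 110)), ("N4", (90, 45, 100)),
    ("K2", (800, 150, 400)), ("K2", (700, 170, 380)), ("K2", (850, 140, 420)),
]

results = leave_one_out(data)
print(results)
```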

30
Automatic evaluation of pronunciation proficiency
Discussion
  • Viability of sentential models for the evaluation
    of
  • Segmental proficiency: spectral comparison
  • Prosodic proficiency: F0/intensity/durations
  • in the form of multiple score array
    coordinates (segments, F0, intensity,
    durations) = (w, x, y, z)
  • Comparison seems to work
  • A target utterance vs. multiple model native
    utterances
  • Better models can be built with
  • More (controlled) utterances
  • More score resolution
  • Current: score 2 (bad), score 4 (good)
  • Future: score 1 (worst), score 3 (fair), score
    5 (best)

31
References
[1] Boersma, Paul, Praat, a system for doing phonetics by computer, Glot International 5(9/10), pp. 341-345, 2001.
[2] Mahalanobis, P.C., On the generalized distance in statistics, Proceedings of the National Institute of Science of India 12, pp. 49-55, 1936.
[3] Moulines, E. & F. Charpentier, Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication 9, pp. 453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, Correlates of linguistic rhythm in the speech signal, Cognition 73, pp. 265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, Design and construction of Korean-Spoken English Corpus (K-SEC), Malsori 46, pp. 159-174, 2003.
[6] Yoon, K., Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody, Journal of the Modern British & American Language & Literature 25(4), pp. 197-215, 2007.
[7] Yoon, K., Synthesis and evaluation of prosodically exaggerated utterances, Unpublished manuscript, 2008.