Title: Formant Synthesis of Speaker Age (a question regarding prosodic timing)
1 Formant Synthesis of Speaker Age (a question regarding prosodic timing)
Swedish National Graduate School of Language
Technology
- Susanne Schötz
- Centre for Languages and Literature
- Lund University
- susanne.schotz_at_ling.lu.se
2 A Prototype System for Analysis of
Speaker Age by Formant Synthesis
3 background (1): speaker age
- acoustic cues to speaker age in almost every phonetic dimension
- relative importance of these cues not fully explored
- one reason: the lack of an adequate analysis tool (where a large number of potential age parameters can be varied systematically and studied in detail)
4 background (2): formant synthesis
- robust and flexible, but not as natural-sounding as concatenation synthesis
- GLOVE (OVE III with improved glottal source, developed at the Royal Institute of Technology, Stockholm) used in experiments on voice variation since 1989 (Carlson et al., 1991; Karlsson, 1992)
- data-driven formant synthesis (GLOVE) (Sjölander, 2001; Sigvardson, 2002; Öhlin, 2004)
5 purpose & aim
- develop a prototype system for analysis of speaker age by data-driven formant synthesis
- use as a tool to analyse, model and synthesize speaker age
6 material
- 1 word: själen [ˈɧɛːlən] (the soul)
- 4 speakers (same dialect family)
- speaker 1: a granddaughter (aged 6)
- speaker 2: a daughter (aged 36)
- speaker 3: a mother (aged 66)
- speaker 4: a grandmother (aged 91)
7 material: acoustic pre-analysis
- F0 contours for [ˈɧɛːlən]
- F1-F2 plot for the steady-state part of [ɛː]
8 method (1)
- data-driven analysis by synthesis (GLOVE, Praat, Perl)
- automatic extraction of 23 GLOVE parameters (once every 10 ms)
- formant synthesis (GLOVE)
- audio-visual comparison of synthesis to original
- parameter adjustment rules
9 results: speaker 1 (6 years)
- original / synthesis
- (+) fricative, creak
- (-) formant error, amplitude
10 results: speaker 2 (36 years)
- original / synthesis
- (+) formants, F0
- (-) dull
11 results: speaker 3 (66 years)
- original / synthesis
- (+) formants, F0
- (-) amplitude, dull
12 results: speaker 4 (91 years)
- original / synthesis
- (+) formants, F0 (incl. creak)
- (-) amplitude, fricative, dull
13 method (2)
- weighted linear interpolation between two source speakers to synthesize a target age (Praat, Java)
- ex. target age 51
- source speakers: 2 (36) and 3 (66)
- age weights for each source speaker: 0.5
- duration interpolation for segment 1 (source speaker 2 dur: 100 ms, source speaker 3 dur: 200 ms)
- target dur: 100 x 0.5 + 200 x 0.5 = 150 ms
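The worked example above can be sketched in code. This is a minimal illustration, not the prototype's actual Praat/Java implementation: the function names `age_weights` and `interpolate` are hypothetical, and deriving each weight from the distance between the target age and the source speaker's age is an assumption that happens to reproduce the 0.5/0.5 weights of the example.

```python
def age_weights(target_age, age_a, age_b):
    """Return (weight_a, weight_b) for two source speakers.

    Assumption: the weights sum to 1 and each one grows as the target
    age gets closer to that source speaker's age.
    """
    if age_a == age_b:
        raise ValueError("source speakers must differ in age")
    if not min(age_a, age_b) <= target_age <= max(age_a, age_b):
        raise ValueError("target age must lie between the source ages")
    weight_b = (target_age - age_a) / (age_b - age_a)
    return 1.0 - weight_b, weight_b


def interpolate(value_a, value_b, weight_a, weight_b):
    """Weighted linear interpolation of one parameter value."""
    return value_a * weight_a + value_b * weight_b


# Example from the slide: target age 51, source speakers aged 36 and 66,
# segment 1 durations of 100 ms and 200 ms.
w2, w3 = age_weights(51, 36, 66)
target_dur_ms = interpolate(100, 200, w2, w3)
print(w2, w3, target_dur_ms)  # 0.5 0.5 150.0
```

The same `interpolate` call could in principle be applied to any other frame-wise parameter, provided the two source parameter tracks are first aligned in time.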
14 results (interpolation)
- at first glance: similarities
- linear interpolation not optimal: aging is not linear!
- only a first attempt
- how to evaluate?
- perception tests (31 students)
- age
- (naturalness)
15 evaluation results: natural speakers
CA (chronological age)  PA (perceived age)
6 7
36 30
66 35/36
91 74
16 evaluation results: data-driven synthesis
CA PA
6 12
36 41
66 44
91 69
17 evaluation results: linear age interpolation
CA PA
10 13
20 63
30 54
40 42
50 42
60 54
70 58
80 70
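One possible way to summarize the two synthesis tables is the mean absolute difference between CA (read here as chronological age) and PA (perceived age). This is only a sketch: the measure itself is an assumption and not necessarily the one used in the evaluation, and the helper name `mean_abs_error` is hypothetical. The CA/PA pairs are copied from the tables above.

```python
def mean_abs_error(pairs):
    """Mean absolute difference between CA and PA over (CA, PA) pairs."""
    return sum(abs(ca - pa) for ca, pa in pairs) / len(pairs)


# (CA, PA) pairs from the data-driven synthesis table.
data_driven = [(6, 12), (36, 41), (66, 44), (91, 69)]

# (CA, PA) pairs from the linear age interpolation table.
interpolated = [(10, 13), (20, 63), (30, 54), (40, 42),
                (50, 42), (60, 54), (70, 58), (80, 70)]

print(mean_abs_error(data_driven))   # 13.75
print(mean_abs_error(interpolated))  # 13.5
```

A single average hides large individual deviations, such as the interpolated target age 20 being perceived as 63, so per-age differences are worth inspecting as well.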
18 summary & discussion (1)
- both similarities and differences between original and synthesis
- a good start, but needs more work
- formants
- amplitudes
- voice source parameters
- try other interpolation algorithms (spline?)
- if developed further, may be used to model and synthesize speaker age
19 Question about prosodic timing
- Bruce (1983): strong Swedish dialect -> later tonal peaks
- older speakers often sound more dialectal (Stölten & Engstrand, 2003)
- could prosodic timing be age-related?
- my hypothesis: older speakers -> later tonal peaks
- investigated 1 word: Nordanvinden (4 speakers)
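To make the hypothesis measurable, the timing of a tonal peak could be expressed as a delay relative to a segment boundary. The sketch below is an illustration under stated assumptions: the function name `peak_delay_ms`, the choice of reference onset, and the toy F0 values are all hypothetical and are not measurements from the four speakers. The 10 ms frame step simply mirrors the extraction rate mentioned earlier.

```python
def peak_delay_ms(times_ms, f0_hz, reference_onset_ms):
    """Return the time of the F0 maximum relative to a reference point.

    Unvoiced frames are expected as None and are ignored.
    """
    voiced = [(t, f0) for t, f0 in zip(times_ms, f0_hz) if f0 is not None]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    # Pick the frame with the highest F0 (first one if there is a tie).
    peak_time, _ = max(voiced, key=lambda frame: frame[1])
    return peak_time - reference_onset_ms


# Toy contour (hypothetical values), one frame every 10 ms.
times = [0, 10, 20, 30, 40, 50]
f0 = [None, 180, 200, 220, 210, None]

# Delay of the peak relative to an assumed segment onset at 10 ms.
print(peak_delay_ms(times, f0, reference_onset_ms=10))  # 20
```

Comparing such delays across the four speakers would require a consistent reference point in each recording, which is what the question about synchronisation to segments is getting at.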
20 F0 contours for nordanvinden
[Figure: F0 contours, one panel per speaker, labelled by age: 66, 6, 91, 36]
21 F0 contours for nordanvinden
[Figure: the same F0 contours with an H marked in each panel; panels labelled by age: 66, 6, 91, 36]
22 Questions
- What have I missed?
- synchronisation to segments?
- context effects?
- ?
- How do I proceed?
- suggestions?
- Do you (as prosodic experts) believe that there
may be a relation between age and prosodic timing?
23 thank you!
- questions, answers and comments welcome!