Formant Synthesis of Speaker Age (+ a question regarding prosodic timing)

Description:

Graduate School of Language Technology. Formant Synthesis of Speaker Age (+ a question regarding prosodic timing). Susanne Schötz, Centre for Languages and Literature.

Slides: 24

Provided by: ipdsUniki

Transcript and Presenter's Notes

1
Formant Synthesis of Speaker Age (+ a question
regarding prosodic timing)
Swedish National Graduate School of Language
Technology

Susanne Schötz
Centre for Languages and Literature
Lund University
susanne.schotz_at_ling.lu.se

2
A Prototype System for Analysis of
Speaker Age by Formant Synthesis
3
background (1): speaker age

acoustic cues to speaker age in almost every
phonetic dimension
relative importance of these cues not fully
explored
one reason: the lack of an adequate analysis tool
(where a large number of potential age parameters
can be varied systematically and studied in
detail)

4
background (2): formant synthesis

robust and flexible, but not as natural-sounding as
concatenation synthesis
GLOVE (OVE III with improved glottal source, dev.
at Royal Institute of Technology, Stockholm) used
in experiments on voice variation since 1989
(Carlson et al., 1991; Karlsson, 1992)
data-driven formant synthesis (GLOVE) (Sjölander,
2001; Sigvardson, 2002; Öhlin, 2004)

5
purpose & aim

develop a prototype system for analysis of
speaker age by data-driven formant synthesis
use as a tool to analyse, model and synthesize
speaker age

6
material

1 word: själen [ˈɧɛːlən] (the soul)
4 speakers (same dialect family)
speaker 1: a granddaughter (aged 6)
speaker 2: a daughter (aged 36)
speaker 3: a mother (aged 66)
speaker 4: a grandmother (aged 91)

7
material: acoustic pre-analysis

[Figure: F0 contours for [ˈɧɛːlən]]
[Figure: F1-F2 plot for the steady-state part of [ɛː]]

8
method (1)

data-driven analysis by synthesis (GLOVE, Praat,
Perl)
automatic extraction of 23 GLOVE parameters
(once every 10 ms)
formant synthesis (GLOVE)
audio-visual comparison of synthesis to original
parameter adjustment rules

9
results: speaker 1 (6 years)

original synthesis
+ fricative, creak
- formant error, amplitude

10
results: speaker 2 (36 years)

original synthesis
+ formants, F0
- dull

11
results: speaker 3 (66 years)

original synthesis
+ formants, F0
- amplitude, dull

12
results: speaker 4 (91 years)

original synthesis
+ formants, F0 (incl. creak)
- amplitude, fricative, dull

13
method (2)

weighted linear interpolation between two source
speakers to synthesize a target age (Praat,
Java)
ex. target age: 51
source speakers: 2 (36) and 3 (66)
age weights for each source speaker: 0.5
duration interpolation for segment 1 (source
speaker 2 dur: 100 ms, source speaker 3 dur:
200 ms)
target dur = 100 x 0.5 + 200 x 0.5 = 150 ms
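
The duration example above can be sketched in a few lines of Python. This is only an illustration, not the Praat/Java implementation used in the prototype. The function names are hypothetical, and deriving the weights from the distance between the target age and the two source ages is an assumption; the slide only states that both weights are 0.5 for target age 51.

```python
def age_weights(target_age, age_a, age_b):
    """Return (weight_a, weight_b) for a target age between two source ages.

    ASSUMPTION: weights are proportional to closeness in age; the slide
    only gives the resulting values (0.5 and 0.5 for target age 51).
    """
    if age_a == age_b:
        raise ValueError("source ages must differ")
    if not min(age_a, age_b) <= target_age <= max(age_a, age_b):
        raise ValueError("target age must lie between the source ages")
    weight_b = (target_age - age_a) / (age_b - age_a)
    return 1.0 - weight_b, weight_b


def interpolate(value_a, value_b, weight_a, weight_b):
    """Weighted linear interpolation of one parameter value."""
    return value_a * weight_a + value_b * weight_b


# Values from the slide: speaker 2 (aged 36) and speaker 3 (aged 66).
w2, w3 = age_weights(51, 36, 66)
target_dur_ms = interpolate(100, 200, w2, w3)
print(w2, w3, target_dur_ms)
```

The same `interpolate` call could be applied to any other synthesis parameter, one frame at a time, once the two source speakers' segments have been brought to a common duration.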

14
results (interpolation)

at first glance: similarities
linear interpolation not optimal: aging is not
linear!
only a first attempt
how to evaluate?
- perception tests (31 students)
age
(naturalness)

15
evaluation results: natural speakers
CA (chronological age) PA (perceived age)
6 7
36 30
66 35/36
91 74
16
evaluation results: data-driven synthesis
CA PA
6 12
36 41
66 44
91 69
17
evaluation results: linear age interpolation
CA PA
10 13
20 63
30 54
40 42
50 42
60 54
70 58
80 70
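
One possible way to compare the three evaluation tables is to look at how far the second column (PA) lies from the first (CA) in each condition. The sketch below uses the values from the slides; the choice of mean absolute error as a summary is an assumption made here, not a measure reported in the presentation, and the natural-speaker entry "35/36" is read as 35.5 purely for illustration.

```python
# (CA, PA) pairs copied from the three evaluation slides.
# ASSUMPTION: the entry "35/36" for the 66-year-old is read as 35.5.
NATURAL = [(6, 7), (36, 30), (66, 35.5), (91, 74)]
SYNTHESIS = [(6, 12), (36, 41), (66, 44), (91, 69)]
INTERPOLATION = [(10, 13), (20, 63), (30, 54), (40, 42),
                 (50, 42), (60, 54), (70, 58), (80, 70)]


def mean_abs_error(pairs):
    """Average distance in years between CA and PA."""
    return sum(abs(pa - ca) for ca, pa in pairs) / len(pairs)


def largest_miss(pairs):
    """Return the (CA, PA) pair with the biggest gap."""
    return max(pairs, key=lambda pair: abs(pair[1] - pair[0]))


for name, pairs in [("natural speakers", NATURAL),
                    ("data-driven synthesis", SYNTHESIS),
                    ("linear age interpolation", INTERPOLATION)]:
    print(name, mean_abs_error(pairs), largest_miss(pairs))
```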
18
summary & discussion (1)

both similarities and differences between
original and synthesis
a good start, but needs more work on
- formants
- amplitudes
- voice source parameters
try other interpolation algorithms (spline?)
if developed further, may be used to model and
synthesize speaker age
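
As a rough illustration of the spline idea mentioned above, the following sketch fits a natural cubic spline through one parameter value per source speaker and reads off the value at a target age. It is only one possible alternative to linear interpolation, not something the prototype implements. The function name is hypothetical, and the durations for speakers 1 and 4 are invented placeholders (only the 100 ms and 200 ms values for speakers 2 and 3 come from the earlier example).

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    if n < 2 or len(ys) != len(xs):
        raise ValueError("need at least three points and equal-length inputs")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Right-hand side of the tridiagonal system for the quadratic terms.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 * (ys[i + 1] - ys[i]) / h[i]
                    - 3 * (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward elimination (natural boundary: zero curvature at both ends).
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution for the polynomial coefficients of each interval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the range of source ages")
        j = min(bisect_right(xs, x) - 1, n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


ages = [6, 36, 66, 91]              # the four source speakers
durations_ms = [90, 100, 200, 260]  # HYPOTHETICAL for speakers 1 and 4
duration_at = natural_cubic_spline(ages, durations_ms)
print(duration_at(51))
```

Unlike the two-speaker linear case, the value at a target age now depends on all four source speakers, which is one way to let the interpolated trajectory bend between them.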

19
Question about prosodic timing

Bruce (1983): strong Swedish dialect -> later
tonal peaks
older speakers often sound more dialectal
(Stölten & Engstrand, 2003)
could prosodic timing be age-related?
my hypothesis: older speakers -> later tonal peaks
investigated 1 word: Nordanvinden (4 speakers)

20
[Figure: F0 contours for nordanvinden, one panel per speaker (aged 6, 36, 66 and 91)]
21
[Figure: F0 contours for nordanvinden, one panel per speaker (aged 6, 36, 66 and 91), each with a point labelled H]
22
Questions

What have I missed?
synchronisation to segments?
context effects?
?
How do I proceed?
suggestions?
Do you (as prosodic experts) believe that there
may be a relation between age and prosodic timing?