Formant Synthesis of Speaker Age (+ a question regarding prosodic timing)

1
Formant Synthesis of Speaker Age (+ a question regarding prosodic timing)
Swedish National Graduate School of Language Technology
  • Susanne Schötz
  • Centre for Languages and Literature
  • Lund University
  • susanne.schotz@ling.lu.se

2
A Prototype System for Analysis of Speaker Age by Formant Synthesis

3
background (1): speaker age
  • acoustic cues to speaker age in almost every
    phonetic dimension
  • relative importance of these cues not fully
    explored
  • one reason: the lack of an adequate analysis tool
    (one in which a large number of potential age
    parameters can be varied systematically and
    studied in detail)

4
background (2): formant synthesis
  • robust and flexible, but not as natural-sounding
    as concatenative synthesis
  • GLOVE (OVE III with an improved glottal source,
    developed at the Royal Institute of Technology,
    Stockholm) used in experiments on voice variation
    since 1989 (Carlson et al., 1991; Karlsson, 1992)
  • data-driven formant synthesis with GLOVE
    (Sjölander, 2001; Sigvardson, 2002; Öhlin, 2004)

5
purpose and aim
  • develop a prototype system for analysis of
    speaker age by data-driven formant synthesis
  • use as a tool to analyse, model and synthesize
    speaker age

6
material
  • 1 word: själen (the soul)
  • 4 speakers (same dialect and family)
  • speaker 1 a granddaughter (aged 6)
  • speaker 2 a daughter (aged 36)
  • speaker 3 a mother (aged 66)
  • speaker 4 a grandmother (aged 91)

7
material: acoustic pre-analysis
  • F0 contours for själen; F1-F2 plot for the
    steady-state part of the stressed vowel [ɛː]
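A minimal sketch of the steady-state measurement behind such an F1-F2 plot, assuming the formant tracks have already been cut to the stressed vowel and that its central 50 % approximates the steady state. Both are assumptions: the slides do not say how the steady-state part was delimited. The function name `steady_state_formants` and the example values are hypothetical.

```python
from statistics import mean

def steady_state_formants(f1_track, f2_track, fraction=0.5):
    """Mean F1 and F2 (Hz) over the central part of a vowel.

    Assumption: the tracks cover exactly the vowel, and its central
    `fraction` of frames approximates the steady state.
    """
    if len(f1_track) != len(f2_track) or not f1_track:
        raise ValueError("tracks must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(f1_track)
    margin = int(n * (1 - fraction) / 2)  # frames dropped at each edge
    central = slice(margin, n - margin)   # never empty for 0 < fraction <= 1
    return mean(f1_track[central]), mean(f2_track[central])

# Made-up formant tracks (Hz) for one vowel, one value per frame
f1 = [450, 600, 600, 600, 600, 500]
f2 = [1500, 1900, 1900, 1900, 1900, 1700]
print(steady_state_formants(f1, f2))  # (600, 1900)
```

Each speaker then contributes one (F1, F2) point to the plot.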

8
method (1)
  • data-driven analysis by synthesis (GLOVE, Praat,
    Perl)
  • automatic extraction of 23 GLOVE parameters
    (once every 10 ms)
  • formant synthesis (GLOVE)
  • audio-visual comparison of synthesis to original
  • parameter adjustment rules
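A minimal sketch of how a parameter track extracted once every 10 ms might be held in memory for comparison and adjustment. The class `ParameterTrack` and the parameter names in the example are hypothetical: the slides neither list the 23 GLOVE parameters nor describe the format exchanged between Praat, Perl and GLOVE.

```python
from dataclasses import dataclass, field

FRAME_STEP_MS = 10  # one parameter frame every 10 ms, as on the slide

@dataclass
class ParameterTrack:
    """Hypothetical container for frame-wise synthesis parameters."""
    names: list  # parameter names, e.g. ["F0", "F1", "F2"]
    frames: list = field(default_factory=list)  # one value list per frame

    def append(self, values):
        if len(values) != len(self.names):
            raise ValueError("expected one value per parameter")
        self.frames.append(list(values))

    def duration_ms(self):
        return len(self.frames) * FRAME_STEP_MS

    def value_at(self, name, time_ms):
        """Value of `name` in the frame covering time_ms.

        Frame i spans [i * 10, (i + 1) * 10) ms; times past the end
        map to the last frame.
        """
        if not self.frames:
            raise ValueError("empty track")
        if time_ms < 0:
            raise ValueError("time must be non-negative")
        index = min(int(time_ms // FRAME_STEP_MS), len(self.frames) - 1)
        return self.frames[index][self.names.index(name)]

track = ParameterTrack(names=["F0", "F1", "F2"])  # made-up values below
track.append([220.0, 600.0, 1800.0])
track.append([225.0, 610.0, 1790.0])
print(track.duration_ms())         # 20
print(track.value_at("F1", 12.5))  # 610.0
```

Keeping one row per 10 ms frame means that an adjustment rule applied over a time span amounts to editing a contiguous range of rows.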

9
results: speaker 1 (6 years)
  • original and synthesis
  • + fricative, creak
  • - formant error, amplitude

10
results: speaker 2 (36 years)
  • original and synthesis
  • + formants, F0
  • - dull

11
results: speaker 3 (66 years)
  • original and synthesis
  • + formants, F0
  • - amplitude, dull

12
results: speaker 4 (91 years)
  • original and synthesis
  • + formants, F0 (incl. creak)
  • - amplitude, fricative, dull

13
method (2)
  • weighted linear interpolation between two source
    speakers to synthesize a target age (Praat,
    Java)
  • e.g. target age: 51
  • source speakers: 2 (36) and 3 (66)
  • age weight for each source speaker: 0.5
  • duration interpolation for segment 1 (source
    speaker 2: dur 100 ms; source speaker 3: dur
    200 ms)
  • target dur = 100 × 0.5 + 200 × 0.5 = 150 ms
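The weighting above can be sketched as follows, assuming each source speaker's weight grows linearly as the target age approaches that speaker's age. This reproduces the 0.5/0.5 split for target age 51, but the formula is otherwise an assumption, as is picking the two speakers whose ages enclose the target (e.g. for the targets 10 and 80 in the evaluation). All function names are hypothetical.

```python
# speaker number -> age, from the material slide
SOURCE_AGES = {1: 6, 2: 36, 3: 66, 4: 91}

def age_weights(target_age, age_a, age_b):
    """Linear weights (w_a, w_b) for source speakers aged age_a < age_b.

    The weights sum to 1; a speaker's weight is 1 at its own age.
    """
    if not age_a <= target_age <= age_b:
        raise ValueError("target age must lie between the two source ages")
    w_b = (target_age - age_a) / (age_b - age_a)
    return 1.0 - w_b, w_b

def bracketing_speakers(target_age, source_ages=SOURCE_AGES):
    """Assumed rule: the two speakers whose ages enclose the target age."""
    ordered = sorted(source_ages.items(), key=lambda item: item[1])
    for (spk_a, age_a), (spk_b, age_b) in zip(ordered, ordered[1:]):
        if age_a <= target_age <= age_b:
            return spk_a, spk_b
    raise ValueError("target age outside the range of the source speakers")

def interpolate(value_a, value_b, w_a, w_b):
    """Weighted linear interpolation of one parameter value (e.g. a duration)."""
    return value_a * w_a + value_b * w_b

spk_a, spk_b = bracketing_speakers(51)  # speakers 2 (36) and 3 (66)
w_a, w_b = age_weights(51, SOURCE_AGES[spk_a], SOURCE_AGES[spk_b])
print(spk_a, spk_b, w_a, w_b)           # 2 3 0.5 0.5
print(interpolate(100, 200, w_a, w_b))  # 150.0 (ms, segment 1)
```

The same weights would be applied to every interpolated value, not only to segment durations.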

14
results (interpolation)
  • at first glance: similarities
  • linear interpolation not optimal: aging is not
    linear!
  • only a first attempt
  • how to evaluate?
    - perception tests (31 students)
      - age
      - (naturalness)

15
evaluation results: natural speakers
(CA = chronological age, PA = perceived age; both in years)
  CA   PA
   6    7
  36   30
  66   35/36
  91   74

16
evaluation results: data-driven synthesis
  CA   PA
   6   12
  36   41
  66   44
  91   69

17
evaluation results: linear age interpolation
  CA   PA
  10   13
  20   63
  30   54
  40   42
  50   42
  60   54
  70   58
  80   70

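One simple way to summarize these tables (not necessarily the analysis used in the study) is the deviation of perceived from chronological age for each stimulus, plus its mean absolute value. The sketch copies the values for data-driven synthesis and linear interpolation; the natural-speaker entry for age 66 ("35/36") is ambiguous and is therefore left out.

```python
# (CA, PA) pairs copied from the evaluation tables
DATA_DRIVEN = [(6, 12), (36, 41), (66, 44), (91, 69)]
INTERPOLATED = [(10, 13), (20, 63), (30, 54), (40, 42),
                (50, 42), (60, 54), (70, 58), (80, 70)]

def deviations(pairs):
    """PA - CA per stimulus; positive means heard as older than intended."""
    return [pa - ca for ca, pa in pairs]

def mean_absolute_error(pairs):
    devs = deviations(pairs)
    return sum(abs(d) for d in devs) / len(devs)

print(deviations(DATA_DRIVEN))            # [6, 5, -22, -22]
print(mean_absolute_error(DATA_DRIVEN))   # 13.75
print(deviations(INTERPOLATED))           # [3, 43, 24, 2, -8, -6, -12, -10]
print(mean_absolute_error(INTERPOLATED))  # 13.5
```

The signed deviations show what the mean hides: the two oldest synthesized voices are heard as younger, while the interpolated 20- and 30-year targets are heard as considerably older.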
18
summary and discussion (1)
  • both similarities and differences between
    original and synthesis
  • a good start, but needs more work on:
    - formants
    - amplitudes
    - voice source parameters
  • try other interpolation algorithms (spline?)
  • if developed further, may be used to model and
    synthesize speaker age

19
Question about prosodic timing
  • Bruce (1983): strong Swedish dialect → later
    tonal peaks
  • older speakers often sound more dialectal
    (Stölten & Engstrand, 2003)
  • could prosodic timing be age-related?
  • my hypothesis: older speakers → later tonal
    peaks
  • investigated: 1 word, Nordanvinden (the north
    wind), 4 speakers
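The hypothesis could be made measurable by timing the F0 peak relative to a segmental landmark. A minimal sketch, assuming an F0 track sampled at a fixed step (None for unvoiced frames) and a hand-labelled landmark such as the onset of the stressed vowel; which landmark to use is itself open (cf. the question about synchronisation to segments). The function `peak_delay_ms` and the example contour are hypothetical.

```python
from typing import Optional, Sequence

def peak_delay_ms(f0_track: Sequence[Optional[float]], step_ms: float,
                  landmark_ms: float) -> float:
    """Time of the F0 maximum minus a segmental landmark time (ms).

    f0_track holds one F0 value (Hz) per frame, None for unvoiced frames.
    A larger result means a later peak relative to the landmark.
    """
    voiced = [(i, f0) for i, f0 in enumerate(f0_track) if f0 is not None]
    if not voiced:
        raise ValueError("no voiced frames")
    # max() returns the first maximum, so ties resolve to the earliest frame
    peak_index, _ = max(voiced, key=lambda frame: frame[1])
    return peak_index * step_ms - landmark_ms

# Made-up contour sampled every 10 ms; landmark (e.g. vowel onset) at 10 ms
contour = [None, 180.0, 200.0, 230.0, 210.0, None]
print(peak_delay_ms(contour, step_ms=10, landmark_ms=10))  # 20
```

Computed for each of the four speakers, systematically larger delays for the older speakers would be in line with the hypothesis.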

20
F0 contours for nordanvinden
[Figure: F0 contours of nordanvinden for the speakers aged 6, 36, 66 and 91]
21
F0 contours for nordanvinden
[Figure: F0 contours of nordanvinden for the speakers aged 6, 36, 66 and 91, with tonal peaks marked H]
22
Questions
  • What have I missed?
    - synchronisation to segments?
    - context effects?
    - ?
  • How do I proceed?
    - suggestions?
  • Do you (as prosodic experts) believe that there
    may be a relation between age and prosodic timing?

23
  • thank you!
  • questions, answers and comments welcome!