1
Speech and speaker normalization (in vowel
normalization)
  • Venice International University
  • Phonetic and technological aspects of speaker
    characteristics
  • Prof. Dr. J. Harrington
  • Presented by Clara Tillmanns
  • clarainindia_at_yahoo.com
  • 18.10.2007

2
Contents
  1. Speech and speaker normalization in vowel
    normalization: definition
  2. Influencing parameters and instruments for vowel
    normalization
  3. Theories
  4. Studies: Johnson 1990 and 1999
  5. Recapitulation

3
Definition
  • Normalization.
  • We know there is extensive variation in speech.
    How is it that listeners nevertheless agree in
    their perception of vowels?

4
Fig. 1: Scatter plot of first and second formant
values of American English vowels. From Peterson &
Barney (1952).
5
Definition
  • Normalization.
  • Which information influences this decision?

6
Definition
  • Normalization.
  • And, which mechanism leads to the decision?

7
Contents
  • Speech and speaker normalization: definition
  • Influencing parameters and instruments for vowel
    normalization
      • Context
      • Formant ratio
      • F0
      • Visual information
      • Auditory gestalts
  • Theories
  • Studies: Johnson 1990 and 1999
  • Recapitulation

8
Influencing parameters and instruments for vowel
normalization
  • [Diagram: cues arranged as extrinsic vs.
    intrinsic. Labels: Context, Formant ratio, F0,
    Auditory gestalts, Visual information]
9
Influencing parameters and instruments for vowel
normalization
  • [Diagram: cues arranged as extrinsic vs.
    intrinsic and syllable-internal vs.
    syllable-external. Labels: Context, Formant
    ratio, F0, Auditory gestalts, Visual information]
10
Influencing parameters and instruments for vowel
normalization
  • [Diagram: cues arranged as extrinsic vs.
    intrinsic and syllable-internal vs.
    syllable-external. Labels: Context, Formant
    ratio, Vocalic, Prosodic, F0, Tonal, Auditory
    gestalts, Visual information]
11
Influencing parameters and instruments for vowel
normalization
  • Context
  • Perceived vowel quality is influenced
      • by the formant frequencies of context vowels
        (Ladefoged & Broadbent 1957)
      • by the F0 range of the carrier phrase (Johnson
        1990)
  • Tones: the pitch range of a context utterance
    influences the perception of Mandarin Chinese
    tones (Leather 1983)

12
Influencing parameters and instruments for vowel
normalization
  • [Diagram: cues arranged as extrinsic vs.
    intrinsic and syllable-internal vs.
    syllable-external. Labels: Context, Formant
    ratio, Vocalic, Relative patterns, Prosodic,
    Gender, F0, Tonal, Auditory gestalts, Visual
    information]
13
Influencing parameters and instruments for vowel
normalization
  • Formant ratio
  • Vowels are relative patterns, not absolute
    frequencies
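
The idea of a vowel as a relative pattern can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: the formant values are hypothetical (not taken from the slides), the second talker is modeled by scaling every formant by one constant factor, and `formant_ratios` is a made-up helper name. Under those assumptions the absolute frequencies differ while the log ratios between adjacent formants stay the same.

```python
import math

def formant_ratios(f1, f2, f3):
    """Return log ratios between adjacent formants (frequencies in Hz)."""
    return (math.log(f2 / f1), math.log(f3 / f2))

# Hypothetical formant values for one vowel; not measurements from the slides.
talker_a = (700.0, 1800.0, 2600.0)

# Assumption: a second talker whose formants are all scaled by the same
# factor, a rough stand-in for a shorter vocal tract.
scale = 1.2
talker_b = tuple(scale * f for f in talker_a)

ratios_a = formant_ratios(*talker_a)
ratios_b = formant_ratios(*talker_b)

print("Talker A formants:", talker_a, "ratios:", ratios_a)
print("Talker B formants:", talker_b, "ratios:", ratios_b)
```

Real talkers do not differ by a single uniform scale factor, so ratios of measured formants would match only approximately.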

14
Influencing parameters and instruments for vowel
normalization
  • Formant ratio
  • Fig. 2: Spectrogram of a man and a woman saying
    "cat". The three lowest vowel formants (vocal
    tract resonant frequencies) are marked as F1, F2
    and F3 (Johnson 2004)

15
Influencing parameters and instruments for vowel
normalization
  • F0
  • Miller (1953)
      • doubled F0 and found a vowel category shift
        for most American English vowels
  • Fujisaki & Kawashima (1968)
      • found F1 boundary shifts of 100 to 200 Hz for
        F0 shifts of 200 Hz

16
Influencing parameters and instruments for vowel
normalization
  • [Diagram: cues arranged as extrinsic vs.
    intrinsic and syllable-internal vs.
    syllable-external. Labels: Context, Formant
    ratio, Vocalic, Relative patterns, Prosodic,
    Gender, F0, Tonal, Auditory gestalts, Visual
    information, Articulatory gestures, Gender / Age]
17
Influencing parameters and instruments for vowel
normalization
  • Visual information
  • Gender: boundary shift much like the F0 shift
    (Strand & Johnson 1996)
  • Age
  • Vowel quality boundary shift through differing
    visual phonetic information (Johnson 1999)
  • Sociocultural: speech intelligibility is reduced
    when the voice is associated with an Asian-looking
    face (Rubin 1992)

18
Influencing parameters and instruments for vowel
normalization
  • Auditory gestalts: secondary cues
  • Duration
  • Formant frequency movement trajectories
  • Lehiste & Meltzer (1973)
      • Fixed-duration vowels synthesized with
        steady-state formant frequencies: 51% correct
      • Mixed lists of the original vowels from men,
        women and children: 79% correct
  • Hillenbrand & Nearey (1999)
      • Flat-formant vowels were correctly identified
        74% of the time, while vowels synthesized with
        the original formant frequency trajectories
        were correctly identified 89% of the time.

19
Contents
  1. Speech and speaker normalization in vowel
    normalization: definition
  2. Influencing parameters and instruments for vowel
    normalization
  3. Theories
      3.1 Vocal tract normalization (VTN)
      3.2 Talker normalization (TN)
  4. Studies: Johnson 1990 and 1999
  5. Recapitulation

20
Theories - VTN
  • Vocal tract normalization theories hold that
    listeners perceptually evaluate vowels on a
    talker-specific coordinate system (Johnson
    2004)
  • Context vowels (reference)
  • Visual information about the size of the vocal
    tract
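
One way to picture a talker-specific coordinate system is to re-express each talker's formants relative to that talker's own mean and spread. The sketch below uses per-talker z-scoring, a technique brought in here purely for illustration and not described in the slides. The reference vowels are hypothetical, the second talker is assumed to be a uniformly scaled copy of the first, and `to_talker_coordinates` is a made-up name.

```python
from statistics import mean, pstdev

def to_talker_coordinates(vowels):
    """Z-score each formant against the talker's own mean and spread.

    `vowels` maps a vowel label to an (F1, F2) pair in Hz.
    """
    f1s = [f1 for f1, _ in vowels.values()]
    f2s = [f2 for _, f2 in vowels.values()]
    m1, s1 = mean(f1s), pstdev(f1s)
    m2, s2 = mean(f2s), pstdev(f2s)
    return {v: ((f1 - m1) / s1, (f2 - m2) / s2)
            for v, (f1, f2) in vowels.items()}

# Hypothetical reference (context) vowels for one talker, in Hz.
talker_low = {"i": (300.0, 2200.0), "a": (700.0, 1200.0), "u": (320.0, 900.0)}

# Assumption: a second talker with all formants 15% higher.
talker_high = {v: (1.15 * f1, 1.15 * f2) for v, (f1, f2) in talker_low.items()}

norm_low = to_talker_coordinates(talker_low)
norm_high = to_talker_coordinates(talker_high)

for v in talker_low:
    print(v, norm_low[v], norm_high[v])
```

In this idealized case the two talkers' vowels land on the same normalized coordinates. Vowels that differ because of articulatory habit rather than vocal tract size would not line up this neatly.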

21
Theories - VTN
  • But: talkers may differ from each other at the
    level of their articulatory habits of speech
  • Perception may not be able to depend on vocal
    tract normalization to remove talker
    differences by removing vocal tract differences
    (Johnson 2004)
  • Does speaker/speech variation depend on
    anatomical differences only?

22
Theories - VTN
  • Cross-linguistic gender differences
  • Bladon, Henton & Pickering (1984)
  • The differences between men and women vary from
    language to language.
  • Cultural factors are involved in defining and
    shaping male or female speech
  • Anatomy does not completely determine the vowel
    formant frequencies

23
Theories - VTN

Fig. 3: Spectral shift needed to normalize male
and female spectra. From Bladon, Henton &
Pickering (1984).
24
Theories - VTN
  • This seems to suggest that talkers choose
    different styles of speaking as social, dialectal
    gender markers.
  • A speaker normalization that removes vocal tract
    differences will fail to account for the
    linguistic categorical similarity of vowels that
    are different due to different habits of
    articulation.
  • (Johnson 2004)

25
Theories - TN
  • Talker normalization is subject to expectations
  • Magnuson & Nusbaum (1994) compared
  • 1-voice with 2-voice instructions in a
    mixed-talker and blocked-talker experiment.
  • The advantage of the blocked-talker condition
    disappeared when subjects did not know about the
    different F0s of the two voices.
  • Talker normalization is an active process
  • Kato & Kakehi (1988): listener adaptation to
    talker voice
  • Increase in recognition accuracy over the course
    of 5 stimuli presented in noise

26
Theories - TN
  • In this approach, cognitive categories are
    represented as collections of the stored
    cognitive representations of experienced
    instances of the category,
  • rather than as normalized abstract
    representations from which category-internal
    structure has been removed (Johnson 2004)
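
The contrast drawn here (stored instances rather than one normalized abstract prototype) can be illustrated with a toy classifier. This is a minimal sketch, not the model from Johnson (2004): the stored tokens are hypothetical (F1, F2) values, the exponential similarity function and the `sensitivity` parameter are assumptions chosen for simplicity, and `classify` is a made-up name.

```python
import math
from collections import defaultdict

def classify(token, exemplars, sensitivity=0.01):
    """Label `token` by summed similarity to every stored exemplar.

    `exemplars` is a list of (label, (F1, F2)) pairs in Hz. Similarity
    decays exponentially with Euclidean distance, so near instances
    count for more than distant ones.
    """
    support = defaultdict(float)
    for label, (f1, f2) in exemplars:
        distance = math.hypot(token[0] - f1, token[1] - f2)
        support[label] += math.exp(-sensitivity * distance)
    return max(support, key=support.get)

# Hypothetical stored instances from several talkers; no normalization
# is applied, so talker variation stays inside each category.
memory = [
    ("hood", (450.0, 1100.0)), ("hood", (480.0, 1200.0)), ("hood", (520.0, 1250.0)),
    ("hud", (620.0, 1200.0)), ("hud", (680.0, 1350.0)), ("hud", (740.0, 1400.0)),
]

label = classify((500.0, 1180.0), memory)
print(label)
```

Because every experienced instance is kept, adding tokens from a new talker changes later classifications without any separate normalization step.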

27
Contents
  1. Speech and speaker normalization in vowel
    normalization: definition
  2. Influencing parameters and instruments for vowel
    normalization
  3. Theories
  4. Studies
      4.1 Johnson 1990
      4.2 Johnson 1999
  5. Recapitulation

28
Studies
  • The role of perceived speaker identity in F0
    normalization of vowels (Johnson 1990)
  • Presentation of vowels from a hood-hud
    continuum in two different intonational contexts
    which were judged to have been produced by
    different speakers, even though the F0 of the
    test word was identical in the two contexts.

29
Studies
  • The role of perceived speaker identity in F0
    normalization of vowels (Johnson 1990)
  • Shift in identification as a result of the
    intonational context
  • which was interpreted as evidence for the role of
    perceived speaker identity in vowel normalization
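
A shift in identification along a continuum is usually summarized as a change in the category boundary, the point where responses cross 50%. The sketch below shows one simple way to estimate that point by linear interpolation. The response proportions are invented for illustration and are not Johnson's data, the context labels are neutral placeholders, and `category_boundary` is a hypothetical helper.

```python
def category_boundary(steps, prop_hud):
    """Return the continuum position where 'hud' responses cross 50%.

    Uses linear interpolation between the two neighbouring steps.
    """
    points = list(zip(steps, prop_hud))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("responses never cross 50%")

steps = [1, 2, 3, 4, 5, 6, 7]

# Hypothetical proportions of "hud" responses at each continuum step
# in two intonational contexts (illustrative numbers only).
context_a = [0.02, 0.05, 0.20, 0.60, 0.90, 0.97, 1.00]
context_b = [0.01, 0.03, 0.08, 0.30, 0.70, 0.95, 0.99]

boundary_a = category_boundary(steps, context_a)
boundary_b = category_boundary(steps, context_b)
shift = boundary_b - boundary_a

print(boundary_a, boundary_b, shift)
```

Published studies typically fit a psychometric function instead of interpolating, but the quantity being compared across contexts is the same.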

30
Studies
  • Auditory-visual integration of talker gender in
    vowel perception (Johnson 1999)
  • Exp. 1 found that the gender of
    auditory-visually presented stimuli shifts the
    phoneme boundary of a vowel continuum
  • Exp. 2 found that visual phonetic information is
    integrated in the boundary shift
  • Exp. 3 showed that listeners integrate abstract
    gender information with phonetic information in
    speech perception

31
Contents
  1. Speech and speaker normalization in vowel
    normalization: definition
  2. Influencing parameters and instruments for vowel
    normalization
  3. Theories
  4. Studies: Johnson 1990 and 1999
  5. Recapitulation

32
Recapitulation
  • Strong internal and external influences on the
    perception (of vowels)
  • Explanation must integrate repeated learning
  • Information on speaker identity influences the
    perception (of vowels)
  • But: is the perception of speaker identity
    influenced by certain components of the speech
    signal?
  • Can speaker identity be manipulated?

33
References
  • Bladon, R. A., Henton, C. G. & Pickering, J. B.
    (1984) Towards an auditory theory of speaker
    normalization. Language & Communication 4, 59-69.
  • Fujisaki, H. & Kawashima, T. (1968) The roles of
    pitch and higher formants in the perception of
    vowels. IEEE Transactions on Audio and
    Electroacoustics AU-16, 73-77.
  • Hillenbrand, J. M. & Nearey, T. M. (1999)
    Identification of resynthesized /hVd/ utterances:
    Effects of formant contour. J. Acoust. Soc. Am.
    105, 3509-3523.
  • Ladefoged, P. & Broadbent, D. E. (1957)
    Information conveyed by vowels. J. Acoust. Soc.
    Am. 29, 98-104.
  • Leather, J. (1983) Speaker normalization in the
    perception of lexical tone. Journal of Phonetics
    11, 373-382.
  • Lehiste, I. & Meltzer, D. (1973) Vowel and
    speaker identification in natural and synthetic
    speech. Language and Speech 16, 356-364.
  • Johnson, K., Strand, E. A. & D'Imperio, M. (1999)
    Auditory-visual integration of talker gender in
    vowel perception. Journal of Phonetics 27,
    359-384.
  • Johnson, K. (2004) Speaker normalization in
    speech perception. Ohio State University.
  • Johnson, K. (1990) The role of perceived speaker
    identity in F0 normalization of vowels. J.
    Acoust. Soc. Am. 88, 642-654.
  • Kato, K. & Kakehi, K. (1988) Listener
    adaptability to individual speaker differences in
    monosyllabic speech perception. J. Acoust. Soc.
    of Japan 44, 180-186.
  • Magnuson, J. & Nusbaum, H. (1994) Are
    representations used for talker identification
    available for talker normalization? Proceedings
    of the International Conference on Spoken
    Language Processing.
  • Miller, R. L. (1953) Auditory tests with
    synthetic vowels. J. Acoust. Soc. Am. 25,
    114-121.
  • Peterson, G. E. & Barney, H. L. (1952) Control
    methods used in the study of vowels. J. Acoust.
    Soc. Am. 24, 175-184.
  • Rubin, D. L. (1992) Non-language factors
    affecting undergraduates' judgments of
    non-native English-speaking teaching assistants.
    Research in Higher Education 33, 4.
  • Strand, E. A. & Johnson, K. (1996) Gradient and
    visual speaker normalization in the perception of
    fricatives. In Natural language processing and
    speech technology: Results of the 3rd KONVENS
    conference, Bielefeld (D. Gibbon, Ed.). Berlin:
    Mouton de Gruyter (pp. 14-26).