Title: ECE 598: The Speech Chain
ECE 598: The Speech Chain
- Lecture 8: Formant Transitions; Vocal Tract Transfer Function
Today
- Perturbation Theory
  - A different way to estimate vocal tract resonant frequencies, useful for consonant transitions
- Syllable-Final Consonants: Formant Transitions
- Vocal Tract Transfer Function
  - Uniform Tube (Quarter-Wave Resonator)
  - During Vowels: All-Pole Spectrum
  - Q
  - Bandwidth
- Nasal Vowels: Sum of two transfer functions gives spectral zeros
Topic 1: Perturbation Theory
Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940)
The area function A(x) is constant everywhere except for one small perturbation.
Method:
1. Compute the formants of the unperturbed vocal tract.
2. Perturb the formant frequencies to match the area perturbation.
Conservation of Energy Under Perturbation
Sensitivity Functions
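The equations for the energy argument and the sensitivity functions did not survive on these slides. As a hedged sketch, the standard first-order result of this kind of perturbation analysis (stated here as an assumption, not copied from the slides) weights the fractional area change by the difference between the kinetic and potential energy densities of the nth resonance, $KE_n(x)$ and $PE_n(x)$:

```latex
\frac{\Delta F_n}{F_n} \approx \int_0^L S_n(x)\,\frac{\Delta A(x)}{A(x)}\,dx,
\qquad
S_n(x) = \frac{KE_n(x) - PE_n(x)}{\int_0^L \bigl(KE_n(x') + PE_n(x')\bigr)\,dx'}
```

Under this form, a squeeze ($\Delta A < 0$) where kinetic energy dominates lowers $F_n$, and a squeeze where potential energy dominates raises it, which agrees with the summary at the end of the lecture.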
Sensitivity Functions for the Quarter-Wave Resonator (Lips Open)
[Figure: sensitivity functions versus position x, from 0 to L; phonemes labeled: /AA/, /ER/, /IY/, /W/]
- Note: the low F3 of /er/ is caused in part by a side branch under the tongue; perturbation alone is not enough to explain it.
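A minimal sketch of these sensitivity functions, assuming the lossless uniform quarter-wave tube from the transfer-function section (closed at the glottis, open at the lips), an assumed speed of sound c = 350 m/s, an assumed length L = 17.5 cm, and x measured from the glottis (x = 0) to the lips (x = L). With velocity proportional to sin(k_n x) and pressure to cos(k_n x), the energy-difference form gives S_n(x) = -cos(2 k_n x)/L. The function names are illustrative only.

```python
import math

C = 350.0   # speed of sound in m/s (assumed value)
L = 0.175   # vocal tract length in m (assumed value)


def qwr_formant(n: int) -> float:
    """Formant n (1-based) of a uniform tube closed at the glottis and open at the lips."""
    return (2 * n - 1) * C / (4 * L)


def qwr_sensitivity(n: int, x: float) -> float:
    """S_n(x) = (KE - PE) / total energy for the uniform quarter-wave resonator.

    x runs from the glottis (0) to the lips (L), an assumed convention.
    Velocity ~ sin(k x) and pressure ~ cos(k x), so KE - PE ~ -cos(2 k x).
    """
    k = 2 * math.pi * qwr_formant(n) / C
    return -math.cos(2 * k * x) / L


def integrate(f, a: float, b: float, steps: int = 2000) -> float:
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


for n in (1, 2, 3):
    # Integral of S_n over the whole tube = effect of scaling A(x) by a constant
    net = integrate(lambda x: qwr_sensitivity(n, x), 0.0, L)
    print(f"F{n} = {qwr_formant(n):.0f} Hz, "
          f"S at glottis = {qwr_sensitivity(n, 0.0) * L:+.2f}/L, "
          f"S at lips = {qwr_sensitivity(n, L) * L:+.2f}/L, "
          f"net = {net:+.1e}")
```

The sign flip between the two ends mirrors the summary rule: in this idealized tube the lips are a velocity peak for every formant, so narrowing them lowers all formants, while the glottis is a pressure peak. The net value should come out close to zero, because scaling the whole area function by a constant leaves the formants unchanged.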
Sensitivity Functions for the Half-Wave Resonator (Lips Rounded)
[Figure: sensitivity functions versus position x, from 0 to L; phonemes labeled: /L, OW/, /UW/]
- Note: the high F3 of /l/ is caused in part by a side branch above the tongue; perturbation alone is not enough to explain it.
Formant Frequencies of Vowels (from Peterson and Barney, 1952)
Topic 2: Formant Transitions, Syllable-Final Consonants
Events in the Closure of a Nasal Consonant
- Formant Transitions
- Vowel Nasalization
- Nasal Murmur
Formant Transitions: A Perturbation Theory Model
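The perturbation-theory model of a transition can be sketched by narrowing the tract near the point of closure and applying the first-order formula. This assumes the same idealized quarter-wave tube and conventions as the sensitivity-function sketch (c = 350 m/s, L = 17.5 cm, x from glottis to lips); the 1.5 cm constriction length and the 30% area reduction are hypothetical. First-order theory only holds for small perturbations, so this indicates the direction of the transitions, not the formant values at full closure.

```python
import math

C = 350.0   # speed of sound, m/s (assumed)
L = 0.175   # tract length, m (assumed)


def formant(n: int) -> float:
    """Formant n of the unperturbed quarter-wave tube."""
    return (2 * n - 1) * C / (4 * L)


def sensitivity(n: int, x: float) -> float:
    """S_n(x) = -cos(2 k_n x) / L, with x measured from the glottis (assumed convention)."""
    k = 2 * math.pi * formant(n) / C
    return -math.cos(2 * k * x) / L


def formant_shift(n: int, x0: float, x1: float, frac_change: float, steps: int = 1000) -> float:
    """First-order Delta F_n (Hz) for Delta A / A = frac_change on [x0, x1], zero elsewhere."""
    h = (x1 - x0) / steps
    integral = sum(sensitivity(n, x0 + (i + 0.5) * h) for i in range(steps)) * h
    return formant(n) * frac_change * integral


# Hypothetical constriction: area reduced by 30% over the last 1.5 cm before the lips
for n in (1, 2, 3):
    dF = formant_shift(n, L - 0.015, L, -0.3)
    print(f"F{n}: {formant(n):.0f} Hz, shift {dF:+.1f} Hz")
```

For a constriction at the lips, all three shifts come out negative, consistent with formants falling into a labial closure; moving the same constriction to the glottal end flips the signs.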
Formant Transitions: Labial Consonants
- Examples: "the mom", "the bug"
Formant Transitions: Alveolar Consonants
- Examples: "the supper", "the tug"
Formant Transitions: Post-alveolar Consonants
- Examples: "the shoe", "the zsazsa"
Formant Transitions: Velar Consonants
- Examples: "the gut", "sing a song"
Topic 3: Vocal Tract Transfer Functions
Transfer Function
- Transfer function: $T(\omega) = \mathrm{Output}(\omega)/\mathrm{Input}(\omega)$
- In speech, it's convenient to write $T(\omega) = U_L(\omega)/U_G(\omega)$
  - $U_L(\omega)$: volume velocity at the lips
  - $U_G(\omega)$: volume velocity at the glottis
  - $T(0) = 1$
- Speech recorded at a microphone is a pressure:
  - $P_R(\omega) = R(\omega)\,T(\omega)\,U_G(\omega)$
  - $R(\omega) = j\rho f/r$: radiation characteristic
  - $\rho$: density of air
  - $r$: distance to the microphone
  - $f$: frequency in Hertz
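As a quick check of what the radiation characteristic does to the spectrum, the sketch below evaluates $R(\omega) = j\rho f/r$ at two frequencies an octave apart. The air density of 1.2 kg/m^3 and the 0.3 m microphone distance are assumed values; neither affects the slope.

```python
import math

RHO = 1.2   # density of air in kg/m^3 (assumed typical room-temperature value)


def radiation(f_hz: float, r_m: float) -> complex:
    """Radiation characteristic R = j * rho * f / r, as written on the slide."""
    return 1j * RHO * f_hz / r_m


r = 0.3  # hypothetical microphone distance in metres
gain_db = 20 * math.log10(abs(radiation(2000.0, r)) / abs(radiation(1000.0, r)))
print(f"{gain_db:.2f} dB per octave")
```

Because |R| is proportional to f, the microphone pressure rises by 20log10(2), about 6 dB per octave, relative to the lip volume velocity.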
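Transfer Function of an Ideal Uniform Tube
- Ideal terminations:
  - Reflection coefficient at the glottis (zero velocity): $\gamma = 1$
  - Reflection coefficient at the lips (zero pressure): $\gamma = -1$
- Obviously, this is an approximation, but it gives
  - $T(\omega) = \dfrac{1}{\cos(\omega L/c)} = \dfrac{\omega_1^2\,\omega_2^2\,\omega_3^2\cdots}{\cdots(\omega_3+\omega)(\omega_2+\omega)(\omega_1+\omega)(\omega_1-\omega)(\omega_2-\omega)(\omega_3-\omega)\cdots}$
  - $\omega_n = n\pi c/L - \pi c/(2L)$
  - $F_n = nc/(2L) - c/(4L)$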
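The pole positions of the ideal tube can be checked numerically. This sketch assumes c = 350 m/s and L = 17.5 cm (values chosen for illustration), prints the first three $F_n$, and compares $1/\cos(\omega L/c)$ with a truncated version of its product expansion $\prod_n \omega_n^2/(\omega_n^2 - \omega^2)$ at 300 Hz.

```python
import math

C = 350.0   # speed of sound, m/s (assumed)
L = 0.175   # tube length, m (assumed)


def omega_n(n: int) -> float:
    """omega_n = n*pi*c/L - pi*c/(2L), in rad/s."""
    return n * math.pi * C / L - math.pi * C / (2 * L)


def T_closed_form(w: float) -> float:
    """Ideal uniform tube: T = 1 / cos(omega L / c)."""
    return 1.0 / math.cos(w * L / C)


def T_product(w: float, terms: int = 2000) -> float:
    """Truncated product of omega_n^2 / (omega_n^2 - omega^2)."""
    result = 1.0
    for n in range(1, terms + 1):
        wn2 = omega_n(n) ** 2
        result *= wn2 / (wn2 - w * w)
    return result


print([round(omega_n(n) / (2 * math.pi)) for n in (1, 2, 3)])  # [500, 1500, 2500] (F_n in Hz)
w = 2 * math.pi * 300.0
print(T_closed_form(w), T_product(w))
```

Truncating the product drops factors that are slightly greater than 1, so the truncated value lands just below the closed form; the gap shrinks as more terms are kept.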
[Figure: transfer function of the ideal uniform tube] The peaks are actually infinite in height; the figure is clipped to fit the display.
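Transfer Function of a Non-Ideal Uniform Tube
- Almost ideal terminations:
  - At the glottis, velocity is almost zero: $\gamma \approx 1$
  - At the lips, pressure is almost zero: $\gamma \approx -1$
- $T(\omega) \approx 1/\bigl(j/Q + \cos(\omega L/c)\bigr)$
- At $F_n = nc/(2L) - c/(4L)$:
  - $T(2\pi F_n) \approx -jQ$
  - $20\log_{10}|T(2\pi F_n)| \approx 20\log_{10} Q$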
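A minimal sketch of the non-ideal tube, assuming the same c and L as in the ideal-tube check and a hypothetical Q = 10. At each $F_n$ the cosine vanishes, so T reduces to $1/(j/Q) = -jQ$ and every peak sits at $20\log_{10}10 = 20$ dB.

```python
import math

C, L = 350.0, 0.175   # assumed speed of sound (m/s) and tube length (m)
Q = 10.0              # hypothetical quality factor


def T(f_hz: float) -> complex:
    """Non-ideal uniform tube: T = 1 / (j/Q + cos(omega L / c))."""
    w = 2 * math.pi * f_hz
    return 1.0 / (1j / Q + math.cos(w * L / C))


for n in (1, 2, 3):
    Fn = n * C / (2 * L) - C / (4 * L)   # resonance frequencies from the slide
    Tn = T(Fn)
    print(f"F{n} = {Fn:.0f} Hz: T = {Tn:.3f}, level = {20 * math.log10(abs(Tn)):.1f} dB")
```

Unlike the ideal tube, the peaks are now finite, and all of them have the same height because this model uses a single Q for every resonance.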
[Figure: transfer function of the non-ideal uniform tube]
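Transfer Function of a Vowel: Height of First Peak is $Q_1 = F_1/B_1$
- $T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
- $T(2\pi F_1) \approx (2\pi F_1)^2 / \bigl((j4\pi F_1)(\pi B_1)\bigr)$
  - $= -jF_1/B_1$
- Call $Q_n = F_n/B_n$
- $T(2\pi F_1) \approx -jQ_1$
- $20\log_{10}|T(2\pi F_1)| \approx 20\log_{10} Q_1$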
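The peak-height approximation can be tested by evaluating one factor of the product directly. The formant and bandwidth values below are borrowed from the synthetic example later in this lecture and serve only as illustration; the product is truncated to three formants.

```python
import math

# Hypothetical (F_n, B_n) pairs in Hz, taken from the synthetic example
FORMANTS = [(500.0, 80.0), (1500.0, 240.0), (2500.0, 600.0)]


def pole_pair(w: float, F: float, B: float) -> complex:
    """One factor: ((2piF)^2 + (piB)^2) / ((jw + j2piF + piB)(jw - j2piF + piB))."""
    num = (2 * math.pi * F) ** 2 + (math.pi * B) ** 2
    den = (1j * w + 1j * 2 * math.pi * F + math.pi * B) * (1j * w - 1j * 2 * math.pi * F + math.pi * B)
    return num / den


def T(w: float) -> complex:
    """All-pole vowel transfer function, truncated to the formants listed above."""
    result = 1 + 0j
    for F, B in FORMANTS:
        result *= pole_pair(w, F, B)
    return result


F1, B1 = FORMANTS[0]
w1 = 2 * math.pi * F1
print("Q1 =", F1 / B1)
print("first factor alone at F1:", pole_pair(w1, F1, B1))
print("full product at F1:", T(w1))
```

The first factor alone comes out within about 1% of $-jQ_1$. The full three-formant product is somewhat larger in magnitude, because the F2 and F3 factors are slightly greater than 1 at F1; the approximation on the slide neglects them.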
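Transfer Function of a Vowel: Bandwidth of a Peak is $B_n$
- $T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
- $T(2\pi F_1 + \pi B_1) \approx (2\pi F_1)^2 / \bigl((j4\pi F_1)(\pi B_1 + j\pi B_1)\bigr)$
  - $= -(1+j)\,Q_1/2$
- At $f = F_n \pm 0.5B_n$:
  - $|T(\omega)| \approx \sqrt{0.5}\,Q_n$
  - $20\log_{10}|T(\omega)| \approx 20\log_{10} Q_n - 3$ dB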
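The 3 dB claim can be checked the same way, again with a single resonance and hypothetical values F1 = 500 Hz, B1 = 80 Hz. The sketch prints the level at F1 ± B1/2 relative to the peak.

```python
import math


def resonance(f_hz: float, F: float, B: float) -> complex:
    """Single pole-pair factor of T(omega), evaluated at frequency f_hz."""
    w = 2 * math.pi * f_hz
    num = (2 * math.pi * F) ** 2 + (math.pi * B) ** 2
    den = (1j * w + 1j * 2 * math.pi * F + math.pi * B) * (1j * w - 1j * 2 * math.pi * F + math.pi * B)
    return num / den


F1, B1 = 500.0, 80.0   # hypothetical first formant and bandwidth (Hz)
peak_db = 20 * math.log10(abs(resonance(F1, F1, B1)))
for f in (F1 - B1 / 2, F1 + B1 / 2):
    drop = 20 * math.log10(abs(resonance(f, F1, B1))) - peak_db
    print(f"{f:.0f} Hz: {drop:+.2f} dB relative to the peak")
```

The two drops are not exactly equal, because the slide treats the far factor $(j\omega + j2\pi F_1 + \pi B_1)$ as constant across the peak while it actually grows slightly with frequency; their average is very close to 3 dB.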
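Amplitudes of Higher Formants Include the Rolloff
- $T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
- At $f$ above $F_1$, $|T(2\pi f)|$ includes a rolloff factor of approximately $F_1/f$:
  - $T(2\pi F_2) \approx (-jF_2/B_2)(F_1/F_2)$
  - $20\log_{10}|T(2\pi F_2)| \approx 20\log_{10} Q_2 - 20\log_{10}(F_2/F_1)$
- $1/f$ rolloff: 6 dB per octave (per doubling of frequency)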
Vowel Transfer Function: Synthetic Example
- $L_1 = 20\log_{10}(500/80) \approx 16$ dB
- $L_2 = 20\log_{10}(1500/240) - 20\log_{10}(F_2/F_1) \approx 16\text{ dB} - 9.5\text{ dB}$
- $L_3 = 20\log_{10}(2500/600) - 20\log_{10}(F_3/F_1) - 20\log_{10}(F_3/F_2)$
- $B_1 = 80$ Hz
- $B_2 = 240$ Hz
- $B_3 \approx 600$ Hz? (hard to measure, because the rolloff from F1 and F2 turns the F3 peak into a plateau)
- The F4 peak is completely swamped by the rolloff from the lower formants.
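The arithmetic of the synthetic example can be reproduced with the slide's rule of thumb: start from $20\log_{10}Q_n$ and subtract $20\log_{10}(F_n/F_k)$ for every lower formant k. This is a bookkeeping approximation taken from the slides, not an exact evaluation of the all-pole product (which would also include the boost contributed by higher formants); the bandwidths are the slide's values, with B3 itself uncertain.

```python
import math

# Formant frequencies and bandwidths from the synthetic example (Hz)
F = [500.0, 1500.0, 2500.0]
B = [80.0, 240.0, 600.0]   # B3 is marked as uncertain on the slide


def db(x: float) -> float:
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(x)


def peak_level(n: int) -> float:
    """Slide's rule of thumb: 20log10(Q_n) minus 20log10(F_n/F_k) for each lower formant k."""
    level = db(F[n] / B[n])
    for k in range(n):
        level -= db(F[n] / F[k])
    return level


for n in range(3):
    print(f"L{n + 1} = {peak_level(n):.1f} dB")
```

Under this rule, L1 is about 15.9 dB, L2 about 6.4 dB, and L3 about -6.0 dB.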
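Shorthand Notation for the Spectrum of a Vowel
- $T(s) = \prod_{n=1}^{\infty} \dfrac{s_n s_n^*}{(s - s_n)(s - s_n^*)}$
- $s = j\omega$
- $s_n = -\pi B_n + j2\pi F_n$
- $s_n^* = -\pi B_n - j2\pi F_n$
- $s_n s_n^* = |s_n|^2 = (2\pi F_n)^2 + (\pi B_n)^2$
- $T(0) = 1$
- $20\log_{10}|T(0)| = 0$ dB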
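In the s-domain notation, each formant is a pair of complex-conjugate poles. This sketch (function names are illustrative) builds T(s) from the $s_n$, confirms T(0) = 1, and checks that evaluating it at $s = j\omega$ matches the factored $j\omega$ form used on the previous slides. The formant list is the hypothetical set from the synthetic example.

```python
import math


def pole(F: float, B: float) -> complex:
    """s_n = -pi*B_n + j*2*pi*F_n."""
    return complex(-math.pi * B, 2 * math.pi * F)


def T(s: complex, formants) -> complex:
    """Product over n of s_n s_n^* / ((s - s_n)(s - s_n^*))."""
    result = 1 + 0j
    for F, B in formants:
        sn = pole(F, B)
        result *= (sn * sn.conjugate()) / ((s - sn) * (s - sn.conjugate()))
    return result


def T_jw(w: float, formants) -> complex:
    """The same function written in the j*omega form from the earlier slides."""
    result = 1 + 0j
    for F, B in formants:
        num = (2 * math.pi * F) ** 2 + (math.pi * B) ** 2
        den = (1j * w + 1j * 2 * math.pi * F + math.pi * B) * (1j * w - 1j * 2 * math.pi * F + math.pi * B)
        result *= num / den
    return result


FORMANTS = [(500.0, 80.0), (1500.0, 240.0), (2500.0, 600.0)]  # hypothetical (F_n, B_n) in Hz
w = 2 * math.pi * 1000.0
print("T(0) =", T(0j, FORMANTS))
print("s-form at 1 kHz: ", T(1j * w, FORMANTS))
print("jw-form at 1 kHz:", T_jw(w, FORMANTS))
```

The two forms agree because $(s - s_n)(s - s_n^*)$ with $s = j\omega$ expands to exactly the two denominator factors $(j\omega + \pi B_n \mp j2\pi F_n)$.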
Another Shorthand Notation for the Spectrum of a Vowel
- $T(s) = \prod_{n=1}^{\infty} \dfrac{1}{(1 - s/s_n)(1 - s/s_n^*)}$
Topic 4: Nasalized Vowels
Vowel Nasalization
[Figure panels: nasalized vowel; nasal consonant]
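Nasalized Vowel
- $P_R(\omega) = R(\omega)\bigl(U_L(\omega) + U_N(\omega)\bigr)$
  - $U_N(\omega)$: volume velocity from the nostrils
- $P_R(\omega) = R(\omega)\bigl(T_L(\omega) + T_N(\omega)\bigr)U_G(\omega) = R(\omega)\,T(\omega)\,U_G(\omega)$
- $T(\omega) = T_L(\omega) + T_N(\omega)$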
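Nasalized Vowel
- $T(s) = T_L(s) + T_N(s)$
  - $= \dfrac{1}{(1-s/s_{Ln})(1-s/s_{Ln}^*)} + \dfrac{1}{(1-s/s_{Nn})(1-s/s_{Nn}^*)}$
  - $\approx \dfrac{2(1-s/s_{Zn})(1-s/s_{Zn}^*)}{(1-s/s_{Ln})(1-s/s_{Ln}^*)(1-s/s_{Nn})(1-s/s_{Nn}^*)}$
- $1/s_{Zn} \approx \tfrac{1}{2}\bigl(1/s_{Ln} + 1/s_{Nn}\bigr)$
- $s_{Zn}$: the $n$th spectral zero
- Under this approximation, $T(s) = 0$ if $s = s_{Zn}$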
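The location of the spectral zero can be checked for a single oral/nasal pole pair. The pole values below are hypothetical (oral pole at 500 Hz with 80 Hz bandwidth, nasal pole at 700 Hz with 100 Hz bandwidth). The sketch compares the slide's approximation for $s_{Zn}$ with the exact root of the numerator of $T_L(s) + T_N(s)$, restricted to this one pair of factors.

```python
import cmath
import math


def pole(F: float, B: float) -> complex:
    """s = -pi*B + j*2*pi*F."""
    return complex(-math.pi * B, 2 * math.pi * F)


# Hypothetical oral and nasal poles (Hz)
sL = pole(500.0, 80.0)    # oral pole
sN = pole(700.0, 100.0)   # nasal pole

# Slide's approximation: 1/s_Z = (1/s_L + 1/s_N) / 2
sZ_approx = 1 / (0.5 * (1 / sL + 1 / sN))

# Exact zeros of (1 - s/sN)(1 - s/sN*) + (1 - s/sL)(1 - s/sL*) = a s^2 + b s + c,
# using (1 - s/p)(1 - s/p*) = 1 - 2 Re(1/p) s + s^2 / |p|^2
a = 1 / abs(sN) ** 2 + 1 / abs(sL) ** 2
b = -2 * ((1 / sN).real + (1 / sL).real)
c = 2.0
disc = cmath.sqrt(b * b - 4 * a * c)
sZ_exact = (-b + disc) / (2 * a)   # the root with positive imaginary part

print("approx zero: %.1f Hz" % (sZ_approx.imag / (2 * math.pi)))
print("exact zero:  %.1f Hz" % (sZ_exact.imag / (2 * math.pi)))
```

Both estimates fall between the oral and nasal pole frequencies, and for these values they agree to within a few percent.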
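The Pole-Zero Pair
- $20\log_{10}|T(\omega)| \approx 20\log_{10}\left|\dfrac{1}{(1-s/s_{Ln})(1-s/s_{Ln}^*)}\right| + 20\log_{10}\left|\dfrac{(1-s/s_{Zn})(1-s/s_{Zn}^*)}{(1-s/s_{Nn})(1-s/s_{Nn}^*)}\right| + 20\log_{10}2$
  - First term: original vowel log spectrum
  - Second term: log spectrum of a pole-zero pair
  - Last term: a constant offset of about 6 dB from the factor of 2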
Additive Terms in the Log Spectrum
Transfer Function of a Nasalized Vowel
Pole-Zero Pairs in the Spectrogram
[Figure labels: nasal pole; zero; oral pole]
Summary
- Perturbation Theory
  - Squeeze near a velocity peak: formant goes down
  - Squeeze near a pressure peak: formant goes up
- Formant Transitions
  - Labial closure: loci near 250, 1000, 2000 Hz
  - Alveolar closure: loci near 250, 1700, 3000 Hz
  - Velar closure: F2 and F3 come together ("velar pinch")
- Vocal Tract Transfer Function
  - $T(s) = \prod_{n} s_n s_n^*/\bigl((s-s_n)(s-s_n^*)\bigr)$
  - $|T(\omega = 2\pi F_n)| \approx Q_n = F_n/B_n$
  - 3 dB bandwidth = $B_n$ Hertz
  - $T(0) = 1$
- Nasal Vowels
  - Sum of two transfer functions gives a spectral zero between the oral and nasal poles
  - Pole-zero pair is a local perturbation of the spectrum