Title: Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
1. Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
- Mark Hasegawa-Johnson
- University of Illinois at Urbana-Champaign, USA
- Assistant Professor, Electrical and Computer Engineering Department
- Assistant Professor, Beckman Institute for Advanced Science and Technology
- Adjunct Professor, Speech and Hearing Sciences Department
2. Lecture 1: Introduction to Spectrogram Reading
- Review
  - Laplace and Fourier transforms
  - Short-time Fourier transform (STFT) and windowing
  - White noise
  - Periodic signals
- Spectrogram reading: pitch
  - Wideband and narrowband spectrograms
- Spectrogram reading: manner
  - Speech physiology
  - Manner classification of phonemes
- Spectrogram reading: formants
  - Log-linear form of a rational filter
3. Laplace and Fourier Transforms
4. Transform Properties
5. Transforms Worth Knowing: Impulses
6. Transforms Worth Knowing: Filters
7. Rectangular Window
8. Hamming and Hanning Windows
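Slides 7 and 8 compare the rectangular, Hamming, and Hanning (Hann) windows. As a minimal sketch, assuming NumPy and the common symmetric textbook definitions (the helper `peak_sidelobe_db` is hypothetical, written for illustration), the code below builds each window and measures how far its highest spectral sidelobe sits below the main lobe. Textbook values are roughly -13 dB (rectangular), -31 dB (Hann), and -43 dB (Hamming); lower sidelobes mean less leakage from strong harmonics into neighboring frequencies on a spectrogram.

```python
import numpy as np

def rectangular(N: int) -> np.ndarray:
    """Rectangular window: every sample weighted equally."""
    return np.ones(N)

def hamming(N: int) -> np.ndarray:
    """Hamming window: w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def hann(N: int) -> np.ndarray:
    """Hann ("Hanning") window: w[n] = 0.5 - 0.5 cos(2 pi n / (N - 1))."""
    n = np.arange(N)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / (N - 1))

def peak_sidelobe_db(w: np.ndarray, nfft: int = 8192) -> float:
    """Highest sidelobe of the window's spectrum, in dB relative to the main-lobe peak."""
    W = np.abs(np.fft.rfft(w, nfft))
    W_db = 20 * np.log10(W / W.max() + 1e-12)
    # Walk down the main lobe until the spectrum stops falling (the first null).
    k = 0
    while k + 1 < len(W_db) and W_db[k + 1] < W_db[k]:
        k += 1
    return float(W_db[k:].max())

for name, win in [("rectangular", rectangular), ("hann", hann), ("hamming", hamming)]:
    print(f"{name:12s} peak sidelobe: {peak_sidelobe_db(win(64)):6.1f} dB")
```

The definitions agree with NumPy's own `np.hamming` and `np.hanning`, which could be used directly instead.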
9. Periodic Signals
10. Random Signals (Noise)
11. The Short-Time Fourier Transform
12. The Spectrogram
13. Narrowband Spectrogram: $N > 2T_0$
14. Wideband Spectrogram: $N < T_0$
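Slides 11-14 define the spectrogram as the log magnitude of the STFT and distinguish narrowband ($N > 2T_0$) from wideband ($N < T_0$) analysis. The sketch below is a minimal illustration, assuming NumPy, a Hamming window, a 1 ms hop, and an idealized 125 Hz impulse-train source (all assumptions, as are the helper names). With a 40 ms window the harmonics of $F_0$ appear as separate peaks in frequency; with a 4 ms window some frames contain a glottal pulse and some do not, so pitch shows up as a rhythm in time.

```python
import numpy as np

def impulse_train(f0_hz: float, fs: int, dur_s: float) -> np.ndarray:
    """Idealized voicing source: one unit impulse every T0 = 1/F0 seconds."""
    x = np.zeros(int(dur_s * fs))
    x[::int(round(fs / f0_hz))] = 1.0
    return x

def spectrogram_db(x, fs, win_ms, hop_ms=1.0, nfft=1024):
    """Log magnitude (dB) of a Hamming-windowed STFT.

    Returns (times_s, freqs_hz, S_db); S_db has one row per frame."""
    N = int(round(win_ms * 1e-3 * fs))            # window length in samples
    hop = max(1, int(round(hop_ms * 1e-3 * fs)))  # frame step in samples
    nfft = max(nfft, N)                           # FFT must cover the window
    w = np.hamming(N)
    starts = np.arange(0, len(x) - N + 1, hop)
    frames = np.stack([x[s:s + N] * w for s in starts])
    X = np.fft.rfft(frames, n=nfft, axis=1)
    S_db = 20 * np.log10(np.abs(X) + 1e-10)       # floor avoids log(0)
    times = (starts + N / 2) / fs                 # frame centres
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return times, freqs, S_db

fs, f0 = 16000, 125.0                                 # T0 = 8 ms
x = impulse_train(f0, fs, 0.5)
_, freqs, narrow = spectrogram_db(x, fs, win_ms=40)   # N = 40 ms > 2*T0 = 16 ms
_, _, wide = spectrogram_db(x, fs, win_ms=4)          # N = 4 ms < T0 = 8 ms
```

In `narrow`, the bin at 1000 Hz (the 8th harmonic) stands far above the bin at 1062.5 Hz, halfway between harmonics. In `wide`, the per-frame peak level swings widely from frame to frame at the pitch rate.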
15. Fundamental Frequency
[Figure annotations: $10F_0$, $4T_0$]
Fundamental frequency (pitch): $F_0 = 1/T_0$
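Slide 15 defines pitch as $F_0 = 1/T_0$. One common way to measure $T_0$ from a waveform, not taken from these slides, is to find the lag at which the signal best matches a shifted copy of itself. The sketch below assumes NumPy, a search range of 60-400 Hz (an assumption), and a synthetic test tone with five harmonics of 200 Hz.

```python
import numpy as np

def estimate_f0(x, fs, fmin=60.0, fmax=400.0):
    """Return F0 = 1/T0, taking T0 as the autocorrelation peak lag within [1/fmax, 1/fmin]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[k], k = 0, 1, 2, ...
    kmin = int(np.floor(fs / fmax))                     # shortest period considered
    kmax = min(int(np.ceil(fs / fmin)), len(r) - 1)     # longest period considered
    k0 = kmin + int(np.argmax(r[kmin:kmax + 1]))        # T0 in samples
    return fs / k0                                      # F0 = 1/T0 = fs / k0

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
# Test signal: five harmonics of 200 Hz, so T0 = 5 ms = 80 samples.
x = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 6))
print(estimate_f0(x, fs))
```

Restricting the lag range keeps the estimator from picking the trivial peak at lag 0 or a multiple of the true period.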
16. On to New Material: Manner Features, Speech Production, and Landmarks
17. Anatomy of Speech Production
[Figure: vocal tract diagram, labeled: hard palate, nasal cavity, lips, soft palate (open), oral cavity, pharynx, epiglottis, tongue blade, vocal folds, tongue body, jaw, tongue root]
18. Speech Sources: Voicing, Turbulence, and Transients
- The vocal folds
  - A nonlinear, high-impedance oscillator
  - Excitation is like a periodic impulse train
- Turbulence
  - Vortices striking an obstacle produce white noise
  - Excitation is like white noise
- Transient
  - High pressure, suddenly released
  - Excitation is like a single loud impulse, $\delta(t)$
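The three excitations above differ in their spectra. The sketch below, assuming NumPy and a 16 kHz sampling rate (the function names are hypothetical), generates a discrete-time version of each source and inspects its DFT magnitude: the impulse train has energy only at multiples of $F_0$, white noise is flat on average, and a single impulse is exactly flat.

```python
import numpy as np

FS = 16000                          # assumed sampling rate, Hz
rng = np.random.default_rng(0)      # fixed seed so the noise is reproducible

def voicing(f0_hz: float, n: int) -> np.ndarray:
    """Vocal folds: periodic impulse train with period T0 = 1/F0."""
    x = np.zeros(n)
    x[::int(round(FS / f0_hz))] = 1.0
    return x

def turbulence(n: int) -> np.ndarray:
    """Turbulence: white Gaussian noise."""
    return rng.standard_normal(n)

def transient(n: int, at: int = 0) -> np.ndarray:
    """Transient: a single unit impulse (discrete-time delta) at sample `at`."""
    x = np.zeros(n)
    x[at] = 1.0
    return x

def magnitude(x: np.ndarray) -> np.ndarray:
    """Magnitude of the DFT at frequencies 0..FS/2."""
    return np.abs(np.fft.rfft(x))

# 0.1 s of 200 Hz voicing: energy only at multiples of 200 Hz (every 20th bin, 10 Hz/bin).
v = magnitude(voicing(200.0, 1600))
# A delta has a perfectly flat magnitude spectrum, wherever it occurs.
d = magnitude(transient(1600, at=37))
# White noise is flat only on average: compare mean power in the lower and upper halves.
p = magnitude(turbulence(2**16)) ** 2
low, high = p[: len(p) // 2].mean(), p[len(p) // 2:].mean()
```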
19. The Vocal Folds: A Nonlinear, High-Impedance Oscillator
The vocal tract rings like a bell, shaping the sound produced by the vocal folds (cross-sectional area of the vocal tract: 0.5-10 cm²).
The glottis (the opening between the vocal folds, inside the larynx) has an open area of 0.03 cm². To get through, air from the lungs must speed up into a high-speed jet. Driven by the jet, the vocal folds flap back and forth at a rate of 100-200 pulses/second.
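The areas on this slide imply how much the air must speed up. Under a simplifying assumption of steady, incompressible flow (not stated on the slide), the volume velocity $U = A v$ is the same through every cross-section, so the jet speeds up by the area ratio. The sketch also converts the 100-200 pulses/second rate into a pitch period.

```python
# Continuity for steady, incompressible flow (a simplifying assumption):
# volume velocity U = A * v is the same through every cross-section,
# so v_glottis / v_tract = A_tract / A_glottis.
A_GLOTTIS_CM2 = 0.03                 # open area between the vocal folds (slide)
A_TRACT_CM2 = (0.5, 10.0)            # vocal-tract cross-section range (slide)

def speedup(a_tract_cm2: float, a_glottis_cm2: float = A_GLOTTIS_CM2) -> float:
    """How many times faster the glottal jet moves than the air in the tract."""
    return a_tract_cm2 / a_glottis_cm2

def pitch_period_ms(pulses_per_s: float) -> float:
    """T0 = 1/F0, expressed in milliseconds."""
    return 1000.0 / pulses_per_s

for a in A_TRACT_CM2:
    print(f"tract area {a:5.1f} cm^2 -> jet is {speedup(a):6.1f}x faster")
for f0 in (100.0, 200.0):
    print(f"{f0:.0f} pulses/s -> T0 = {pitch_period_ms(f0):.1f} ms")
```

Under these assumptions the glottal jet moves roughly 17 to 333 times faster than the air in the vocal tract, and the pitch period is 5-10 ms.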
20. Turbulence: Vortices Striking an Obstacle Produce White Noise
In a fricative, the area of the tongue constriction is about 0.2 cm². To get through, the air speeds up into a turbulent jet.
The turbulent jet strikes downstream obstacles, such as the teeth. The jet contains vortices of all different radii, between 0 and 0.2 cm; therefore, the resulting sound contains noise at all frequencies above about 700 Hz.
21. Transient: High Pressure, Suddenly Released
While the tongue tip is closed, air pressure builds up behind the constriction.
When the constriction is released, the airflow through it changes suddenly (from zero to nonzero). This sudden change in airflow is heard as a pop.
22. The Source-Filter Model of Speech Production
Corresponds to $S(s) = H(s)E(s)$, where:
- $S(s)$ = recorded speech spectrum
- $E(s)$ = source spectrum
- $H(s)$ = transfer function (filtering by the vocal tract)
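A product of spectra, $S = HE$, corresponds to filtering the source signal through the vocal tract in the time domain. The sketch below is a minimal all-pole illustration, assuming NumPy, a 16 kHz sampling rate, a 100 Hz impulse-train source, and hypothetical formants (500, 1500, 2500 Hz) and bandwidths; it is not a method given on the slides.

```python
import numpy as np

FS = 16000  # assumed sampling rate, Hz

def resonator(f_hz: float, bw_hz: float, fs: int = FS) -> np.ndarray:
    """Denominator [1, a1, a2] of a two-pole resonance with poles r*exp(+-j*2*pi*f/fs)."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2 * np.pi * f_hz / fs
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

def vocal_tract(formants, bandwidths, fs: int = FS) -> np.ndarray:
    """All-pole H(z) = 1 / A(z), with A(z) the product of one resonator per formant."""
    a = np.array([1.0])
    for f, bw in zip(formants, bandwidths):
        a = np.convolve(a, resonator(f, bw, fs))
    return a

def all_pole_filter(a: np.ndarray, e: np.ndarray) -> np.ndarray:
    """s[n] = e[n] - a[1] s[n-1] - ... - a[p] s[n-p]  (requires a[0] == 1)."""
    s = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s

def gain_db(a: np.ndarray, f_hz: float, fs: int = FS) -> float:
    """|H| in dB at frequency f_hz, evaluating A(z) on the unit circle."""
    z_inv = np.exp(-2j * np.pi * f_hz / fs)
    A = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return float(-20 * np.log10(abs(A)))

# E: 100 Hz impulse train (T0 = 10 ms); H: hypothetical formants and bandwidths.
e = np.zeros(int(0.1 * FS))
e[::FS // 100] = 1.0
a = vocal_tract([500, 1500, 2500], [60, 90, 120])
s = all_pole_filter(a, e)   # S = H E, computed as filtering in the time domain
```

`gain_db` shows the filter peaking at each assumed formant, while the harmonics of the source determine where along that envelope the speech spectrum has energy.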
23. Manner Classification of Phonemes: ±continuant
- -continuant: lips or tongue close COMPLETELY on the midline of the vocal tract
  - stops (p, b, t, d, k, g)
  - nasals (m, n, ng)
  - affricates (q, j, ch, zh)
  - syllable-initial lateral (l, e.g., lake)
- +continuant: no complete closure
  - fricatives (f, v, s, z, sh, x, Chinese h)
  - glides (w, y, r, English h)
  - vowels (a, e, i, o, u)
  - diphthongs (in buy, boy, bow)
24. Manner Classification of Phonemes: ±sonorant
- +sonorant: a sound you can sing (from Latin)
  - nasals (m, n, ng)
  - lateral (l)
  - glides (w, y, r)
  - vowels (a, e, i, o, u)
  - diphthongs (buy, boy, bow)
- -sonorant: air pressure builds up behind the constriction, so voicing amplitude drops (also called an obstruent consonant)
  - stops (p, b, t, d, k, g)
  - affricates (q, j, ch, zh)
  - fricatives (f, v, s, z, sh, x)
- Special status of sonorant in Chinese:
  - the initial must be all-sonorant (liang) or all-obstruent (qing)
  - the final must be all-sonorant
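Slides 23 and 24 together give every listed phone a value for continuant and for sonorant. The sketch below encodes those lists in the slides' own symbols (the dictionary and function names are hypothetical) and shows that the two binary features split the phones into four manner groups. "h" is left out because the slides list it as a fricative or glide without giving its sonorant value.

```python
# (continuant, sonorant) for phones listed on slides 23-24, in the slides' own symbols.
FEATURES = {}
for p in "p b t d k g q j ch zh".split():   # stops, affricates
    FEATURES[p] = (False, False)
for p in "m n ng l".split():                # nasals, syllable-initial lateral
    FEATURES[p] = (False, True)
for p in "f v s z sh x".split():            # fricatives
    FEATURES[p] = (True, False)
for p in "w y r a e i o u".split():         # glides, vowels
    FEATURES[p] = (True, True)

MANNER = {
    (True, True): "vowel/glide",
    (True, False): "fricative",
    (False, True): "nasal/lateral",
    (False, False): "stop/affricate",
}

def manner(phone: str) -> str:
    """Look up a phone's (continuant, sonorant) pair and name its manner group."""
    return MANNER[FEATURES[phone]]

def groups() -> dict:
    """Partition the phones into the four (continuant, sonorant) classes."""
    out = {}
    for phone, feats in FEATURES.items():
        out.setdefault(MANNER[feats], []).append(phone)
    return out

print(groups())
```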
25. Sonorant Consonants: Glide, Lateral, Nasal
layya ton: /l/, /y/, /t/, /n/ (the /y/ is +continuant; the others are -continuant)
ame: /m/ (-continuant)
26. Obstruent Consonants: Fricatives, Affricates, and Stops
sa (+continuant)
shi (+continuant)
qe (-continuant)
iji (-continuant)
ba (-continuant)
ita (-continuant)
27. Place of Primary Articulation
- Palatal (blade): q, j, sh, y, i
- Alveolar (blade): t, d, s, z, n, l
- Retroflex (blade): ch, zh, x, r, er
- Velar (body): k, g, ng, w, u
- Dental (blade): th, dh
- Labial (lips): p, b, f, v, m, w, u, o
- Uvular (body): h, o
- Pharyngeal (body): a, ae
- Laryngeal: h
28. Features of Secondary Articulators: Lateral, Nasal, Affricated, Aspirated
- +sonorant, +continuant: vowels, glides
- +sonorant, -continuant:
  - nasal: the soft palate is open, so air escapes through the nose
  - lateral: the tongue is open on the sides, so air can escape around the edges of the tongue
- -sonorant, +continuant: fricatives
- -sonorant, -continuant:
  - affricated: the tongue stays nearly closed after the release, causing frication (q, j, ch, zh)
  - aspirated: the larynx stays open after the release, causing aspiration (p, t, k)
  - -affricated, -aspirated: nothing special happens after the release; the vowel starts immediately (b, d, g)
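The feature hierarchy on this slide works like a decision tree: sonorant and continuant pick one of four branches, and the secondary features refine the two -continuant branches. The function below is a minimal sketch of that reading (its name and error behavior are hypothetical); the slide lists only nasal and lateral for +sonorant, -continuant sounds, so any other combination is rejected rather than guessed.

```python
def manner_class(sonorant: bool, continuant: bool, nasal: bool = False,
                 lateral: bool = False, affricated: bool = False,
                 aspirated: bool = False) -> str:
    """Manner class from the binary features on the slide (a sketch, not a full phonology)."""
    if sonorant and continuant:
        return "vowel or glide"
    if sonorant:                  # +sonorant, -continuant
        if nasal:
            return "nasal"        # soft palate open
        if lateral:
            return "lateral"      # air escapes around the tongue edges
        raise ValueError("+sonorant, -continuant needs +nasal or +lateral")
    if continuant:                # -sonorant, +continuant
        return "fricative"
    # -sonorant, -continuant: what happens after the release decides the class
    if affricated:
        return "affricate"        # q, j, ch, zh
    if aspirated:
        return "aspirated stop"   # p, t, k
    return "unaspirated stop"     # b, d, g

print(manner_class(sonorant=False, continuant=False, aspirated=True))
```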
30. Waveforms and Spectrograms: Aspirated and Unaspirated Stops
[Figure panels: unaspirated /b/; aspirated /t/]
31. Phonetic Subsegments in the Release of an Aspirated Stop
32. Waveforms and Spectrograms: Fricatives and Affricates
[Figure panels: iji, qe, sa, shi]
33. Landmarks: Changes in the Features Continuant and Sonorant
[Figure labels: /m/ closure, /m/ release, /n/ closure, /n/ release, /t/ closure, /t/ release, /v/ closure, /v/ release, /l/ release, /k/]
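Because landmarks are defined as changes in continuant or sonorant, they can be located once each analysis frame has a value for those two features. The sketch below assumes the frame-level decisions are already available (for example from a classifier) and uses a hypothetical 10 ms frame step; it reports each boundary where either feature flips, along with the new value.

```python
from typing import List, Tuple

Frame = Tuple[bool, bool]   # (continuant, sonorant) for one analysis frame

def find_landmarks(frames: List[Frame], hop_s: float = 0.01):
    """Return (time_s, changes) at each frame boundary where continuant or sonorant changes.

    `changes` lists the new feature values, e.g. ["-continuant"]."""
    landmarks = []
    for i in range(1, len(frames)):
        (c0, s0), (c1, s1) = frames[i - 1], frames[i]
        changes = []
        if c0 != c1:
            changes.append(("+" if c1 else "-") + "continuant")
        if s0 != s1:
            changes.append(("+" if s1 else "-") + "sonorant")
        if changes:
            landmarks.append((round(i * hop_s, 6), changes))
    return landmarks

V, M, S = (True, True), (False, True), (True, False)   # vowel, nasal, fricative
print(find_landmarks([V] * 3 + [M] * 3 + [V] * 3))     # "ama": /m/ closure, /m/ release
print(find_landmarks([V] * 2 + [S] * 4 + [V] * 2))     # "asa": /s/ closure, /s/ release
```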
34. The Vocal Tract Transfer Function
35. Log-Spectral Separation of Source and Filter
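Taking the log turns the product $S = HE$ into a sum, $\log|S| = \log|H| + \log|E|$, which is what makes source and filter separable. One standard way to exploit this, swapped in here and not necessarily the lecture's method, is the real cepstrum: the inverse DFT of the log magnitude spectrum, where the smooth filter envelope concentrates at low "quefrencies". The sketch assumes NumPy; `n_keep = 30` is an arbitrary illustrative cutoff.

```python
import numpy as np

def real_cepstrum(x: np.ndarray, nfft: int = 1024) -> np.ndarray:
    """c[n] = IDFT{ log|X[k]| }: a log-magnitude spectrum turned into a sequence."""
    X = np.fft.rfft(x, nfft)
    return np.fft.irfft(np.log(np.abs(X) + 1e-12), nfft)

def filter_envelope(x: np.ndarray, n_keep: int = 30, nfft: int = 1024) -> np.ndarray:
    """Smooth log|H| estimate: keep only the lowest n_keep quefrencies (and their mirror)."""
    c = real_cepstrum(x, nfft)
    lifter = np.zeros(nfft)
    lifter[:n_keep] = 1.0
    lifter[nfft - n_keep + 1:] = 1.0     # c is even: c[n] == c[nfft - n]
    return np.fft.rfft(c * lifter).real  # back to a log-magnitude spectrum

# Additivity check: if S = H * E bin by bin, the cepstra add.
rng = np.random.default_rng(1)
N = 1024
h = 0.9 ** np.arange(N)                                   # a smooth "filter"
e = rng.standard_normal(N)                                # a noisy "source"
s = np.fft.irfft(np.fft.rfft(h) * np.fft.rfft(e), N)      # S = H E (circular convolution)
```

Because the cepstra add, a lifter that keeps the low-quefrency part of the speech cepstrum mostly keeps the filter's contribution, and the discarded high-quefrency part mostly holds the source's harmonic structure.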
36. Formant Frequencies: Resonant Frequencies of the Vocal Tract
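A common first approximation, not stated on this slide, treats the vocal tract as a uniform tube closed at the glottis and open at the lips. Its resonances fall at odd multiples of a quarter wavelength, $F_k = (2k-1)\,c/(4L)$. The sketch assumes a 17.5 cm tract and $c = 350$ m/s, both typical textbook values.

```python
def uniform_tube_formants(length_cm: float = 17.5, c_cm_per_s: float = 35000.0,
                          n: int = 3) -> list:
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    F_k = (2k - 1) * c / (4 * L)  (quarter-wavelength resonator)."""
    return [(2 * k - 1) * c_cm_per_s / (4 * length_cm) for k in range(1, n + 1)]

print(uniform_tube_formants())                 # [500.0, 1500.0, 2500.0]
print(uniform_tube_formants(length_cm=14.0))   # [625.0, 1875.0, 3125.0]
```

A shorter tract raises every resonance, one reason formant values differ across talkers.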
37. Formant Frequencies of a Vowel
From Peterson and Barney, "Control Methods Used in a Study of the Vowels," Journal of the Acoustical Society of America, 1952
38. Classifying Vowels
- Vowel: F1 = 800 Hz and F2 = 1200 Hz; therefore the vowel is /AH/.
- Diphthong: F1 starts at 800 Hz and falls to 300 Hz; F2 starts at 1200 Hz and rises to 2000 Hz; therefore the diphthong is /AY/.
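The two examples differ only in whether F1 and F2 move between the start and the end of the vowel. The sketch below measures the formants at those two points and reports the direction of movement; the 150 Hz threshold for "the formants moved" is a hypothetical choice, not a value from the slides.

```python
def describe_vowel(f1_start: float, f2_start: float, f1_end: float, f2_end: float,
                   min_move_hz: float = 150.0) -> str:
    """Label a vowel from F1/F2 measured near its beginning and its end.

    min_move_hz is a hypothetical threshold for 'the formants moved'."""
    d1, d2 = f1_end - f1_start, f2_end - f2_start
    if max(abs(d1), abs(d2)) < min_move_hz:
        return f"monophthong: F1 = {f1_start:.0f} Hz, F2 = {f2_start:.0f} Hz"

    def move(name: str, d: float) -> str:
        verb = "rises" if d > 0 else "falls" if d < 0 else "holds"
        return f"{name} {verb} {abs(d):.0f} Hz"

    return f"diphthong: {move('F1', d1)}, {move('F2', d2)}"

print(describe_vowel(800, 1200, 800, 1200))   # the /AH/ example
print(describe_vowel(800, 1200, 300, 2000))   # the /AY/ example
```

Mapping the resulting targets to a label such as /AH/ or /AY/ would still need reference formant values for each vowel.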
39. Rational Filters: Obstruents
40. Example: Front Cavity Resonance of /ch/ (q) Is near F3 of the Following Vowel
41. Rational Filters: Nasal Consonants
42. Examples: Nasal Consonants
/m/: This talker makes /m/ with resonances at 1000 Hz and 1800 Hz uncancelled, but with the resonance at 300 Hz cancelled by zeros.
/ng/: This talker makes /ng/ with resonances at 300 Hz and 1000 Hz uncancelled, but with the resonance at 1800 Hz cancelled by zeros.
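In a rational (pole-zero) filter, a pair of zeros at the same place as a pair of poles removes that resonance from the spectrum. The sketch below mimics the /m/ example, assuming NumPy, a 16 kHz sampling rate, and hypothetical bandwidths: poles at 300, 1000, and 1800 Hz, with a zero pair placed exactly on the 300 Hz pole pair. Exact cancellation is an idealization; real nasal zeros only partly cancel nearby poles.

```python
import numpy as np

FS = 16000  # assumed sampling rate, Hz

def pair(f_hz: float, bw_hz: float, fs: int = FS) -> np.ndarray:
    """Second-order section [1, c1, c2] with roots r*exp(+-j*2*pi*f/fs)."""
    r = np.exp(-np.pi * bw_hz / fs)
    th = 2 * np.pi * f_hz / fs
    return np.array([1.0, -2.0 * r * np.cos(th), r * r])

def response_db(zeros, poles, f_hz, fs: int = FS) -> np.ndarray:
    """20 log10 |B(z)/A(z)| on the unit circle; zeros/poles are (freq, bandwidth) pairs."""
    b, a = np.array([1.0]), np.array([1.0])
    for f, bw in zeros:
        b = np.convolve(b, pair(f, bw, fs))
    for f, bw in poles:
        a = np.convolve(a, pair(f, bw, fs))
    z_inv = np.exp(-2j * np.pi * np.asarray(f_hz, dtype=float) / fs)
    # np.polyval wants the highest power first, so reverse the z^-1 coefficients.
    H = np.polyval(b[::-1], z_inv) / np.polyval(a[::-1], z_inv)
    return 20 * np.log10(np.abs(H))

poles = [(300, 80), (1000, 100), (1800, 120)]   # hypothetical bandwidths
freqs = [300, 600, 1000, 1800]
no_zero = response_db([], poles, freqs)          # all three resonances visible
m_like = response_db([(300, 80)], poles, freqs)  # zero pair cancels the 300 Hz pole pair
```

With the zeros in place, the response is identical to a filter built from the 1000 Hz and 1800 Hz poles alone, so the 300 Hz peak disappears.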
43. Summary
- The spectrogram is the log magnitude of the STFT.
- Wideband spectrogram ($N < T_0$): pitch shows up in the time domain.
- Narrowband spectrogram ($N > 2T_0$): pitch shows up in the frequency domain.
- Landmarks occur at changes in the values of the distinctive features continuant and sonorant:
  - +continuant, +sonorant: vowels, glides, diphthongs
  - +continuant, -sonorant: fricatives
  - -continuant, +sonorant: nasals, laterals
  - -continuant, -sonorant: stops, affricates
- Recognition of vowels and glides: F1 and F2 are usually enough.
- Recognition of diphthongs: F1 and F2 at two separate points in time (the beginning and the end of the vowel).
- Obstruent consonants: back-cavity formants are cancelled by zeros, leaving only the front-cavity formants (e.g., F3 for /sh/, /q/).
- Nasal consonants: resonances of the mouth-nose system are often cancelled by zeros, leaving primarily low-frequency energy.