1
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology

Mark Hasegawa-Johnson
University of Illinois at Urbana-Champaign, USA
Assistant Professor, Electrical and Computer
Engineering Department
Assistant Professor, Beckman Institute for
Advanced Science and Technology
Adjunct Professor, Speech and Hearing Sciences
Department

2
Lecture 1: Introduction to Spectrogram Reading

Review
  Laplace and Fourier transforms
  Short-time Fourier transform (STFT) and windowing
  White noise
  Periodic signals
Spectrogram reading: Pitch
  Wideband and narrowband spectrograms
Spectrogram reading: Manner
  Speech physiology
  Manner classification of phonemes
Spectrogram reading: Formants
  Log-linear form of a rational filter

3
Laplace and Fourier Transforms
4
Transform Properties
5
Transforms worth knowing: Impulses
6
Transforms worth knowing: Filters
7
Rectangular Window
8
Hamming and Hanning Windows
9
Periodic Signals
10
Random Signals (Noise)
11
The Short-Time Fourier Transform
12
The Spectrogram
13
Narrowband Spectrogram: N > 2T0
14
Wideband Spectrogram: N < T0
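The two window-length regimes can be illustrated with a minimal sketch, assuming NumPy and a synthetic impulse-train signal. The sampling rate, pitch, hop sizes, and the function name `log_spectrogram` are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def log_spectrogram(x, win_len, hop):
    """Log magnitude (dB) of the STFT, using a Hamming window of win_len samples."""
    window = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    spectrum = np.fft.rfft(frames, axis=1)
    return 20.0 * np.log10(np.abs(spectrum) + 1e-10)  # small floor avoids log(0)

fs = 8000          # assumed sampling rate (Hz)
f0 = 100           # assumed pitch (Hz)
t0 = fs // f0      # pitch period T0 in samples (80)

x = np.zeros(fs)   # one second of a periodic impulse train
x[::t0] = 1.0

# Narrowband: N > 2*T0, so each frame spans several pitch periods.
narrow = log_spectrogram(x, win_len=4 * t0, hop=t0 // 2)
# Wideband: N < T0, so a frame holds at most one pitch pulse.
wide = log_spectrogram(x, win_len=t0 // 2, hop=t0 // 4)
```

In this toy case the narrowband rows show peaks at multiples of F0 along the frequency axis, while the wideband rows alternate between frames that contain a pulse and frames that do not, which is the time-domain striation pattern.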
15
Fundamental Frequency (Pitch): F0 = 1/T0
[Figure annotations: 4T0, 10F0]
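The relation F0 = 1/T0 suggests one simple way to estimate pitch: find the period T0 as the lag of the autocorrelation peak. The following is only a sketch under assumed settings (NumPy, a synthetic two-harmonic signal, a 60-400 Hz search range); the slides do not prescribe this method, and `estimate_f0` is a hypothetical helper name.

```python
import numpy as np

def estimate_f0(x, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 as fs / T0, where T0 is the lag of the autocorrelation peak."""
    x = x - np.mean(x)
    # Keep non-negative lags only: r[k] is the autocorrelation at lag k.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / f0_max)   # shortest period considered
    lag_max = int(fs / f0_min)   # longest period considered
    t0 = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / t0

fs = 8000                        # assumed sampling rate (Hz)
n = np.arange(800)
# Synthetic "voiced" signal: 125 Hz fundamental plus its second harmonic.
x = np.sin(2 * np.pi * 125 * n / fs) + 0.5 * np.sin(2 * np.pi * 250 * n / fs)
f0_hat = estimate_f0(x, fs)      # expected to land near 125 Hz
```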
16
On to New Material: Manner Features, Speech Production, and Landmarks
17
Anatomy of Speech Production
[Figure labels: Hard Palate, Nasal Cavity, Lips, Soft Palate (Open), Oral Cavity, Pharynx, Epiglottis, Tongue Blade, Vocal Folds, Tongue Body, Jaw, Tongue Root]
18
Speech sources: Voicing, Turbulence, and Transients
The vocal folds
  A nonlinear, high-impedance oscillator
  Excitation is like a periodic impulse train
Turbulence
  Vortices striking an obstacle produce white noise
  Excitation is like white noise
Transient
  High pressure, suddenly released
  Excitation is like a single loud impulse, δ(t)
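The three idealized excitations listed above can be written down directly. This is a minimal sketch assuming NumPy; the function names, sampling rate, and Gaussian noise model are illustrative choices, not part of the slides.

```python
import numpy as np

def voicing_source(n_samples, fs, f0):
    """Periodic impulse train: one unit pulse every T0 = fs / f0 samples."""
    e = np.zeros(n_samples)
    e[::int(round(fs / f0))] = 1.0
    return e

def turbulence_source(n_samples, seed=0):
    """White noise (Gaussian here, as an assumption) standing in for turbulence."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples)

def transient_source(n_samples, onset=0, amplitude=1.0):
    """Single impulse at sample `onset`, a discrete stand-in for delta(t)."""
    e = np.zeros(n_samples)
    e[onset] = amplitude
    return e

voiced = voicing_source(8000, fs=8000, f0=100)   # 100 pulses in one second
noise = turbulence_source(8000)
burst = transient_source(8000, onset=400)
```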

19
The vocal folds: A nonlinear, high-impedance oscillator
Vocal tract rings like a bell, shaping the sound produced by the vocal folds (cross-sectional area of the vocal tract: 0.5-10 cm²).
The glottis (the opening between the vocal folds, inside the larynx) has an open area of 0.03 cm². In order to get through, air from the lungs must speed up into a high-speed jet. The vocal folds flap back and forth, driven by the jet, at a rate of 100-200 pulses/second.
20
Turbulence: Vortices striking an obstacle produce white noise
In a fricative, the area of the tongue constriction is about 0.2 cm². In order to get through, air speeds up into a turbulent jet.
The turbulent jet strikes against downstream obstacles, like the teeth. The jet contains vortices of all different radii, between 0 mm and 0.2 cm; therefore the resulting sound contains noise at all frequencies above about 700 Hz.
21
Transient: High pressure, suddenly released
While the tongue tip is closed, air pressure builds up behind the constriction.
When the constriction is released, there is a sudden change in air flow through the constriction (from 0 to nonzero). The sudden change in airflow is heard as a pop.
22
The Source-Filter Model of Speech Production
Corresponds to S(s) = H(s)E(s), where
  S(s) = recorded speech spectrum
  E(s) = source spectrum
  H(s) = transfer function (filtering by the vocal tract)
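A discrete-time analogue of S = HE can be checked numerically. The sketch below assumes NumPy and uses the DFT in place of the Laplace transform; the single 500 Hz damped resonance standing in for the vocal tract is a made-up filter, chosen only for illustration.

```python
import numpy as np

fs = 8000                                   # assumed sampling rate (Hz)
n = np.arange(256)

# Hypothetical vocal-tract impulse response: one damped resonance at 500 Hz.
h = np.exp(-n / 40.0) * np.cos(2 * np.pi * 500 * n / fs)

# Source: periodic impulse train at F0 = 100 Hz (voicing).
e = np.zeros(256)
e[::fs // 100] = 1.0

# Filtering in the time domain is convolution of source and filter.
s = np.convolve(e, h)                       # length 256 + 256 - 1 = 511

nfft = 512                                  # >= len(s), so no circular wrap-around
S = np.fft.rfft(s, nfft)
H = np.fft.rfft(h, nfft)
E = np.fft.rfft(e, nfft)

# In the frequency domain the same operation is a product: S = H * E.
product_matches = np.allclose(S, H * E)
```

Taking the log magnitude turns the product into a sum, log|S| = log|H| + log|E|, which is why source and filter contributions add on a log-magnitude spectrogram.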
23
Manner Classification of Phonemes: continuant
-continuant: lips or tongue close COMPLETELY on the midline of the vocal tract
  stops (p,b,t,d,k,g)
  nasals (m,n,ng)
  affricates (q,j,ch,zh)
  syllable-initial lateral (l, e.g., lake)
+continuant: no complete closure
  fricatives (f,v,s,z,sh,x, Chinese h)
  glides (w,y,r, English h)
  vowels (a,e,i,o,u)
  diphthongs (in buy, boy, bow)

24
Manner Classification of Phonemes: sonorant
+sonorant: a sound you can sing (Latin)
  nasals (m,n,ng)
  lateral (l)
  glides (w,y,r)
  vowels (a,e,i,o,u)
  diphthongs (buy, boy, bow)
-sonorant: air pressure builds up behind the constriction; voicing amplitude drops (also called an obstruent consonant)
  stops (p,b,t,d,k,g)
  affricates (q,j,ch,zh)
  fricatives (f,v,s,z,sh,x)
Special status of sonorant in Chinese
  initial must be all-sonorant (liang) or all-obstruent (qing)
  final must be all-sonorant
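The two binary features on these slides partition the manner classes into four groups. A minimal sketch of that partition as a lookup table follows; the names `MANNER_CLASSES` and `manner_classes` are illustrative, and True/False stand for "+" and "-".

```python
# Keys are (continuant, sonorant); True stands for "+", False for "-".
MANNER_CLASSES = {
    (True, True): ["vowel", "glide", "diphthong"],
    (True, False): ["fricative"],
    (False, True): ["nasal", "lateral"],
    (False, False): ["stop", "affricate"],
}

def manner_classes(continuant, sonorant):
    """Return the manner classes consistent with the two feature values."""
    return MANNER_CLASSES[(continuant, sonorant)]

print(manner_classes(continuant=False, sonorant=True))  # ['nasal', 'lateral']
```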

25
Sonorant Consonants: Glide, Lateral, Nasal
layya ton -- /l/, /y/, /t/, /n/ (the /y/ is +continuant, the others are -continuant)
ame -- /m/ is -continuant
26
Obstruent Consonants: Fricatives, Affricates, and Stops
sa (+continuant)
shi (+continuant)
qe (-continuant)
iji (-continuant)
ba (-continuant)
ita (-continuant)
27
Place of Primary Articulation
Palatal (Blade): q, j, sh, y, i
Alveolar (Blade): t, d, s, z, n, l
Retroflex (Blade): ch, zh, x, r, er
Velar (Body): k, g, ng, w, u
Dental (Blade): th, dh
Labial (Lips): p, b, f, v, m, w, u, o
Uvular (Body): h, o
Pharyngeal (Body): a, ae
Laryngeal: h
28
Features of Secondary Articulators: lateral, nasal, affricated, aspirated
+sonorant, +continuant: vowels, glides
+sonorant, -continuant:
  +nasal: soft palate is open; air escapes through the nose
  +lateral: tongue is open on the sides; air can escape around the edges of the tongue
-sonorant, +continuant: fricatives
-sonorant, -continuant:
  +affricated: tongue stays nearly closed after release, causing frication (q,j,ch,zh)
  +aspirated: larynx stays open after release, causing aspiration (p,t,k)
  -affricated, -aspirated: nothing special happens after release; vowel starts immediately (b,d,g)

29
Sonorant Consonants: Glide, Lateral, Nasal
layya ton -- /l/, /y/, /t/, /n/ (the /y/ is +continuant, the others are -continuant)
ame -- /m/ is -continuant
30
Waveforms and Spectrograms: Aspirated and Unaspirated Stops
Unaspirated /b/
Aspirated /t/
31
Phonetic Subsegments in the Release of an
Aspirated Stop
32
Waveforms and Spectrograms: Fricatives and Affricates
iji
qe
sa
shi
33
Landmarks: Changes in the features continuant, sonorant
[Figure labels: /m/ release, /t/ release, /k/, /l/ release, /m/ closure, /n/ release, /v/ release, /t/ closure, /v/ closure, /n/ closure]
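Since landmarks are defined as changes in the features continuant and sonorant, they can be located in a labeled segment sequence by comparing neighboring segments. The sketch below assumes a hand-labeled, time-ordered list of segments; the tuple layout and the helper name `find_landmarks` are illustrative assumptions, and a real system would detect these changes from the acoustics rather than from labels.

```python
def find_landmarks(segments):
    """segments: time-ordered list of (label, continuant, sonorant).

    Returns (boundary_index, left_label, right_label) for every boundary
    where either feature changes value.
    """
    landmarks = []
    for i in range(1, len(segments)):
        left, right = segments[i - 1], segments[i]
        if left[1:] != right[1:]:           # compare the two feature values
            landmarks.append((i, left[0], right[0]))
    return landmarks

# "ame": vowel, nasal (-continuant, +sonorant), vowel.
ame = [("a", True, True), ("m", False, True), ("e", True, True)]
print(find_landmarks(ame))  # [(1, 'a', 'm'), (2, 'm', 'e')]
```

The two boundaries found here correspond to the /m/ closure and the /m/ release.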
34
The Vocal Tract Transfer Function
35
Log-Spectral Separation of Source and Filter
36
Formant Frequencies: Resonant Frequencies of the Vocal Tract
37
Formant Frequencies of a Vowel
From Peterson and Barney, "Control Methods Used in a Study of the Vowels," Journal of the Acoustical Society of America, 1952
38
Classifying Vowels
Steady formants: F1 = 800 Hz, F2 = 1200 Hz. Therefore vowel is /AH/.
Moving formants: F1 starts at 800 Hz, falls to 300 Hz; F2 starts at 1200 Hz, rises to 2000 Hz. Therefore diphthong is /AY/.
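The vowel versus diphthong decision in this example comes down to whether F1 and F2 move between the start and the end of the segment. A rough sketch of that test follows; the 200 Hz movement threshold and the name `is_diphthong` are assumptions made for illustration and do not come from the slides.

```python
def is_diphthong(start, end, threshold_hz=200.0):
    """start, end: (F1, F2) in Hz at the beginning and end of the vowel.

    threshold_hz is an assumed value, not taken from the slides.
    Returns True when either formant moves by more than the threshold.
    """
    return any(abs(e - s) > threshold_hz for s, e in zip(start, end))

# Formant values from the slide's two examples.
print(is_diphthong((800, 1200), (300, 2000)))   # True  (moving: diphthong)
print(is_diphthong((800, 1200), (800, 1200)))   # False (steady: vowel)
```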
39
Rational Filters: Obstruents
40
Example: Front Cavity Resonance of /ch/ (q) is near F3 of Following Vowel
41
Rational Filters: Nasal Consonants
42
Examples: Nasal Consonants
/m/: This talker makes /m/ with resonances at 1000 Hz and 1800 Hz uncancelled, but with the resonance at 300 Hz cancelled by zeros.
/ng/: This talker makes /ng/ with resonances at 300 Hz and 1000 Hz uncancelled, but with the resonance at 1800 Hz cancelled by zeros.
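The cancellation described for /m/ can be reproduced with the log-linear form of a rational filter, where each zero adds log|s - z| and each pole subtracts log|s - p|. The sketch below assumes NumPy, places resonances at the 300, 1000, and 1800 Hz values quoted above, and uses an assumed 100 Hz bandwidth with an idealized zero pair sitting exactly on the 300 Hz poles; the overall gain constant is omitted, and the helper names are hypothetical.

```python
import numpy as np

def root_pair(freq_hz, bandwidth_hz):
    """Complex-conjugate s-plane roots for a resonance (pole) or anti-resonance (zero)."""
    sigma = -np.pi * bandwidth_hz
    omega = 2 * np.pi * freq_hz
    return [complex(sigma, omega), complex(sigma, -omega)]

def log_magnitude(freq_hz, poles, zeros):
    """log|H(j*2*pi*f)| = sum_k log|s - z_k| - sum_k log|s - p_k| (gain omitted)."""
    s = 1j * 2 * np.pi * np.asarray(freq_hz, dtype=float)
    log_h = np.zeros(s.shape)
    for z in zeros:
        log_h += np.log(np.abs(s - z))
    for p in poles:
        log_h -= np.log(np.abs(s - p))
    return log_h

bw = 100.0                                   # assumed bandwidth (Hz)
poles = root_pair(300, bw) + root_pair(1000, bw) + root_pair(1800, bw)
freqs = [300.0, 600.0, 1000.0, 1800.0]

all_pole = log_magnitude(freqs, poles, zeros=[])
# /m/-like case: zeros placed on the 300 Hz poles cancel that resonance.
m_like = log_magnitude(freqs, poles, zeros=root_pair(300, bw))
```

With no zeros, the response peaks at 300 Hz relative to 600 Hz; with the zero pair in place, the result equals that of a filter with only the 1000 Hz and 1800 Hz resonances.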
43
Summary

Spectrogram is the log magnitude of the STFT.
  Wideband spectrogram: N < T0, pitch shows up in the time domain
  Narrowband spectrogram: N > 2T0, pitch shows up in the frequency domain
Landmarks occur at changes in the values of the distinctive features continuant and sonorant
  +continuant, +sonorant: vowels, glides, diphthongs
  +continuant, -sonorant: fricatives
  -continuant, +sonorant: nasals, laterals
  -continuant, -sonorant: stops, affricates
Recognition of Vowels and Glides: F1 and F2 are usually enough
Recognition of Diphthongs: F1 and F2 at two separate points in time (beginning and ending of the vowel)
Obstruent Consonants: Back cavity formants are cancelled by zeros, leaving only the front cavity formants (e.g., F3 for /sh/, /q/)
Nasal Consonants: Resonances of the mouth-nose system are often cancelled by zeros, leaving primarily low-frequency energy.