Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Net

1
Landmark-Based Speech Recognition: Spectrogram
Reading, Support Vector Machines, Dynamic
Bayesian Networks, and Phonology
  • Mark Hasegawa-Johnson
  • University of Illinois at Urbana-Champaign, USA
  • Assistant Professor, Electrical and Computer
    Engineering Department
  • Assistant Professor, Beckman Institute for
    Advanced Science and Technology
  • Adjunct Professor, Speech and Hearing Sciences
    Department

2
Lecture 1: Introduction to Spectrogram Reading
  • Review
      • Laplace and Fourier transforms
      • Short-time Fourier transform (STFT) and windowing
      • White noise
      • Periodic signals
  • Spectrogram reading: Pitch
      • Wideband and narrowband spectrograms
  • Spectrogram reading: Manner
      • Speech physiology
      • Manner classification of phonemes
  • Spectrogram reading: Formants
      • Log-linear form of a rational filter

3
Laplace and Fourier Transforms
4
Transform Properties
5
Transforms Worth Knowing: Impulses
6
Transforms Worth Knowing: Filters
7
Rectangular Window
8
Hamming and Hanning Windows
9
Periodic Signals
10
Random Signals (Noise)
11
The Short-Time Fourier Transform
12
The Spectrogram
13
Narrowband Spectrogram: N > 2T0
14
Wideband Spectrogram: N < T0
15
Fundamental Frequency
Fundamental frequency (pitch): F0 = 1/T0
[Figure annotations: 10F0, 4T0]
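The window-length conditions on the two preceding slides can be tried out numerically. The following is a minimal sketch, not the lecture's own code: it assumes an 8 kHz sampling rate, a synthetic impulse train with T0 = 80 samples (F0 = 100 Hz), a Hamming window, and a hypothetical helper named `spectrogram`. N and T0 are both measured in samples.

```python
import numpy as np

def spectrogram(x, n_window, hop, n_fft=640):
    """Log-magnitude STFT in dB (rows: time frames, columns: frequency bins)."""
    window = np.hamming(n_window)
    starts = range(0, len(x) - n_window + 1, hop)
    frames = np.stack([x[s:s + n_window] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return 20.0 * np.log10(magnitude + 1e-6)  # small floor avoids log(0)

fs = 8000            # assumed sampling rate (Hz)
t0 = 80              # assumed pitch period in samples, so F0 = fs / t0 = 100 Hz
x = np.zeros(fs // 2)
x[::t0] = 1.0        # periodic impulse train standing in for voiced speech

narrow = spectrogram(x, n_window=3 * t0, hop=10)   # N > 2*T0
wide = spectrogram(x, n_window=t0 // 2, hop=10)    # N < T0

# With n_fft = 640 the bin spacing is 12.5 Hz: bin 8 is 100 Hz, bin 4 is 50 Hz.
harmonic_contrast_db = np.mean(narrow[:, 8] - narrow[:, 4])

# Frame-to-frame variation of the per-frame peak level.
narrow_swing_db = np.ptp(narrow.max(axis=1))
wide_swing_db = np.ptp(wide.max(axis=1))
```

In this sketch the narrowband frames always span three pitch periods, so their level stays nearly constant over time while harmonics of F0 stand out along the frequency axis. The wideband frames are shorter than one period, so some contain a pulse and some do not, and the pitch appears as variation along the time axis.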
16
On to New Material: Manner Features, Speech
Production, and Landmarks
17
Anatomy of Speech Production
[Figure labels: Hard Palate, Nasal Cavity, Lips, Soft Palate (Open), Oral Cavity, Pharynx, Epiglottis, Tongue Blade, Vocal Folds, Tongue Body, Jaw, Tongue Root]
18
Speech Sources: Voicing, Turbulence, and
Transients
  • The vocal folds
      • A nonlinear, high-impedance oscillator
      • Excitation is like a periodic impulse train
  • Turbulence
      • Vortices striking an obstacle produce white noise
      • Excitation is like white noise
  • Transient
      • High pressure, suddenly released
      • Excitation is like a single loud impulse, δ(t)
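The three idealized excitations listed above can be written down directly as discrete-time signals. This is a minimal sketch under assumed values (8 kHz sampling rate, 100 ms duration, F0 = 100 Hz, a fixed random seed); none of these numbers come from the slides apart from F0 lying in the 100-200 pulses/second range quoted later.

```python
import numpy as np

fs = 8000                 # assumed sampling rate (Hz)
n_samples = fs // 10      # 100 ms of signal
rng = np.random.default_rng(0)

# Voicing: periodic impulse train, one pulse every T0 = fs / F0 samples.
f0 = 100                  # assumed fundamental frequency (Hz)
voicing = np.zeros(n_samples)
voicing[::fs // f0] = 1.0

# Turbulence: white (here Gaussian) noise.
turbulence = rng.standard_normal(n_samples)

# Transient: a single impulse at the moment of release.
transient = np.zeros(n_samples)
transient[0] = 1.0

# The impulse train has energy only at multiples of F0; the single impulse
# has a flat magnitude spectrum.
voicing_spectrum = np.abs(np.fft.rfft(voicing))
transient_spectrum = np.abs(np.fft.rfft(transient))
harmonic_bins = np.flatnonzero(voicing_spectrum > 1e-9)
```

Because the 100 ms signal holds exactly ten pitch periods, the DFT bin spacing is 10 Hz and the nonzero bins of `voicing_spectrum` fall on every tenth bin, that is, on multiples of 100 Hz.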

19
The Vocal Folds: A Nonlinear, High-Impedance
Oscillator
The vocal tract rings like a bell, shaping the
sound produced by the vocal folds (cross-sectional
area of the vocal tract: 0.5-10 cm²).
The glottis (the opening between the vocal folds) has
an open area of 0.03 cm². In order to get
through, air from the lungs must speed up into a
high-speed jet. The vocal folds flap back and forth,
driven by the jet, at a rate of 100-200
pulses/second.
20
Turbulence: Vortices Striking an Obstacle Produce
White Noise
In a fricative, the area of the tongue constriction
is about 0.2 cm². In order to get through, air
speeds up into a turbulent jet.
The turbulent jet strikes against downstream
obstacles, like the teeth. The jet contains
vortices of all different radii, between 0 mm and
0.2 cm; therefore the resulting sound contains
noise at all frequencies above about 700 Hz.
21
Transient: High Pressure, Suddenly Released
While the tongue tip is closed, air pressure builds
up behind the constriction.
When the constriction is released, there is a sudden
change in air flow through the constriction (from
0 to nonzero). The sudden change in airflow is
heard as a pop.
22
The Source-Filter Model of Speech Production
Corresponds to S(s) = H(s)E(s), where
S(s) = recorded speech spectrum
E(s) = source spectrum
H(s) = transfer function (filtering by the vocal tract)
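A discrete-time analogue of this relation can be checked numerically: filtering in the time domain is convolution, and the DFT of a linear convolution equals the product of the zero-padded DFTs. The sketch below is illustrative only. The impulse response `h` (a single damped 500 Hz resonance), the 8 kHz sampling rate, and the 100 Hz impulse-train source are all assumptions, not values from the lecture.

```python
import numpy as np

fs = 8000                                   # assumed sampling rate (Hz)
n = np.arange(400)

# Hypothetical vocal-tract impulse response: one damped resonance at 500 Hz.
h = np.exp(-n / 80.0) * np.cos(2 * np.pi * 500 * n / fs)

# Source: voiced excitation modelled as an impulse train with F0 = 100 Hz.
e = np.zeros(800)
e[::80] = 1.0

# Time domain: the speech signal is the source filtered by the vocal tract.
s = np.convolve(e, h)

# Frequency domain: zero-pad every DFT to the full convolution length.
n_fft = len(s)
S = np.fft.rfft(s, n_fft)
H = np.fft.rfft(h, n_fft)
E = np.fft.rfft(e, n_fft)

product_matches = np.allclose(S, H * E)
```

The DFT stands in here for the Laplace-domain quantities on the slide; the same product structure holds when the transforms are evaluated on the unit circle.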
23
Manner Classification of Phonemes: continuant
  • -continuant: lips or tongue close COMPLETELY
    on the midline of the vocal tract
      • stops (p,b,t,d,k,g)
      • nasals (m,n,ng)
      • affricates (q,j,ch,zh)
      • syllable-initial lateral (l, e.g., lake)
  • +continuant: no complete closure
      • fricatives (f,v,s,z,sh,x, Chinese h)
      • glides (w,y,r, English h)
      • vowels (a,e,i,o,u)
      • diphthongs (in buy, boy, bow)

24
Manner Classification of Phonemes: sonorant
  • +sonorant: a sound you can sing (Latin)
      • nasals (m,n,ng)
      • lateral (l)
      • glides (w,y,r)
      • vowels (a,e,i,o,u)
      • diphthongs (buy, boy, bow)
  • -sonorant: air pressure builds up behind the
    constriction; voicing amplitude drops (also
    called an obstruent consonant)
      • stops (p,b,t,d,k,g)
      • affricates (q,j,ch,zh)
      • fricatives (f,v,s,z,sh,x)
  • Special status of sonorant in Chinese
      • initial must be all-sonorant (liang) or
        all-obstruent (qing)
      • final must be all-sonorant

25
Sonorant Consonants: Glide, Lateral, Nasal
"layya ton": /l/, /y/, /t/, /n/ (the /y/ is
+continuant; the others are -continuant)
"ame": /m/ is -continuant
26
Obstruent Consonants: Fricatives, Affricates, and
Stops
sa (+continuant)
shi (+continuant)
qe (-continuant)
iji (-continuant)
ba (-continuant)
ita (-continuant)
27
Place of Primary Articulation
Palatal (Blade): q, j, sh, y, i
Alveolar (Blade): t, d, s, z, n, l
Retroflex (Blade): ch, zh, x, r, er
Velar (Body): k, g, ng, w, u
Dental (Blade): th, dh
Labial (Lips): p, b, f, v, m, w, u, o
Uvular (Body): h, o
Pharyngeal (Body): a, ae
Laryngeal: h
28
Features of Secondary Articulators: lateral,
nasal, affricated, aspirated
  • +sonorant, +continuant: vowels, glides
  • +sonorant, -continuant
      • +nasal: soft palate is open; air escapes
        through the nose
      • +lateral: tongue is open on the sides; air can
        escape around the edges of the tongue
  • -sonorant, +continuant: fricatives
  • -sonorant, -continuant
      • +affricated: tongue stays nearly closed after
        release, causing frication (q,j,ch,zh)
      • +aspirated: larynx stays open after release,
        causing aspiration (p,t,k)
      • -affricated, -aspirated: nothing special happens
        after release; vowel starts immediately (b,d,g)

29
Sonorant Consonants: Glide, Lateral, Nasal
"layya ton": /l/, /y/, /t/, /n/ (the /y/ is
+continuant; the others are -continuant)
"ame": /m/ is -continuant
30
Waveforms and Spectrograms: Aspirated and
Unaspirated Stops
Unaspirated /b/
Aspirated /t/
31
Phonetic Subsegments in the Release of an
Aspirated Stop
32
Waveforms and Spectrograms: Fricatives and
Affricates
iji
qe
sa
shi
33
Landmarks: Changes in the Features continuant,
sonorant
[Figure labels: /m/ release, /t/ release, /k/, /l/ release, /m/ closure, /n/ release, /v/ release, /t/ closure, /v/ closure, /n/ closure]
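The idea that landmarks sit where the continuant or sonorant value changes can be sketched as a scan over a labelled segment sequence. This is an illustrative toy, not the recognizer described in the lecture: the `MANNER` table encodes the feature values given on the manner-classification slides, while the function name `landmarks` and the hand-labelled input sequences are assumptions.

```python
# (continuant, sonorant) values for each manner class, as given on the slides.
MANNER = {
    "vowel": (True, True),
    "glide": (True, True),
    "fricative": (True, False),
    "nasal": (False, True),
    "lateral": (False, True),
    "stop": (False, False),
    "affricate": (False, False),
}

def landmarks(segments):
    """Return (left, right, changed_features) for each boundary where
    continuant or sonorant changes. `segments` is a list of
    (label, manner_class) pairs in time order."""
    found = []
    for (left, left_manner), (right, right_manner) in zip(segments, segments[1:]):
        left_cont, left_son = MANNER[left_manner]
        right_cont, right_son = MANNER[right_manner]
        changed = []
        if left_cont != right_cont:
            changed.append("continuant")
        if left_son != right_son:
            changed.append("sonorant")
        if changed:
            found.append((left, right, changed))
    return found

# "ame": vowel, nasal, vowel gives a closure landmark and a release landmark.
ame = landmarks([("a", "vowel"), ("m", "nasal"), ("e", "vowel")])

# "sa": fricative then vowel gives one landmark where sonorant changes.
sa = landmarks([("s", "fricative"), ("a", "vowel")])
```

A boundary between two segments with identical feature values, such as a glide followed by a vowel, produces no landmark in this sketch.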
34
The Vocal Tract Transfer Function
35
Log-Spectral Separation of Source and Filter
36
Formant Frequencies: Resonant Frequencies of the
Vocal Tract
37
Formant Frequencies of a Vowel
From Peterson and Barney, "Control Methods Used in a
Study of the Vowels," Journal of the Acoustical
Society of America, 1952
38
Classifying Vowels
Vowel: F1 = 800 Hz, F2 = 1200 Hz.
Therefore the vowel is /AH/.
Diphthong: F1 starts at 800 Hz and falls to 300 Hz;
F2 starts at 1200 Hz and rises to 2000 Hz.
Therefore the diphthong is /AY/.
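The distinction drawn on this slide, steady formants for a vowel versus moving formants for a diphthong, can be expressed as a comparison of (F1, F2) at two points in time. The sketch below is a toy illustration: the function name `describe_formant_track` and the 200 Hz movement threshold are assumptions chosen for the example, not values from the lecture.

```python
def describe_formant_track(start, end, min_shift_hz=200.0):
    """Compare (F1, F2) in Hz at the start and end of a vocalic segment.

    min_shift_hz is a hypothetical threshold: if either formant moves by at
    least this much, the segment is treated as a diphthong.
    """
    f1_shift = end[0] - start[0]
    f2_shift = end[1] - start[1]
    moving = max(abs(f1_shift), abs(f2_shift)) >= min_shift_hz
    return {
        "kind": "diphthong" if moving else "vowel",
        "f1_shift": f1_shift,
        "f2_shift": f2_shift,
    }

# Formant values read off the slide's two examples.
steady = describe_formant_track((800, 1200), (800, 1200))
gliding = describe_formant_track((800, 1200), (300, 2000))
```

Deciding which vowel or diphthong is present would additionally require reference formant values, such as those tabulated by Peterson and Barney, which this sketch does not include.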
39
Rational Filters: Obstruents
40
Example: Front Cavity Resonance of /ch/ (q) Is
Near F3 of Following Vowel
41
Rational Filters: Nasal Consonants
42
Examples: Nasal Consonants
/m/: This talker makes /m/ with resonances at
1000 Hz and 1800 Hz uncancelled, but with the
resonance at 300 Hz cancelled by zeros.
/ng/: This talker makes /ng/ with resonances at
300 Hz and 1000 Hz uncancelled, but with the resonance
at 1800 Hz cancelled by zeros.
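How a zero can cancel a resonance in a rational (pole-zero) filter can be seen by evaluating the transfer function on the unit circle. The sketch below loosely mirrors the /m/ example: poles near 300, 1000, and 1800 Hz, with a zero pair placed close to the 300 Hz pole. The sampling rate, the pole and zero radii, and the helper names `conjugate_pair` and `gain_db` are assumptions made for illustration.

```python
import numpy as np

fs = 8000  # assumed sampling rate (Hz)

def conjugate_pair(freq_hz, radius):
    """Complex-conjugate pair of roots at the given frequency and radius."""
    root = radius * np.exp(2j * np.pi * freq_hz / fs)
    return [root, np.conj(root)]

def gain_db(zeros, poles, freq_hz):
    """Magnitude in dB of H(z) = prod(z - zeros) / prod(z - poles) at freq_hz."""
    z = np.exp(2j * np.pi * freq_hz / fs)
    numerator = np.prod([z - q for q in zeros])    # empty product is 1.0
    denominator = np.prod([z - p for p in poles])
    return 20.0 * np.log10(np.abs(numerator / denominator))

# Resonances (poles) at the three frequencies named in the /m/ example.
poles = (conjugate_pair(300, 0.97)
         + conjugate_pair(1000, 0.97)
         + conjugate_pair(1800, 0.97))

# Hypothetical zero pair sitting next to the 300 Hz pole.
zeros_m = conjugate_pair(300, 0.95)

# How much the 300 Hz peak is reduced once the zeros are added.
drop_at_300_db = gain_db([], poles, 300) - gain_db(zeros_m, poles, 300)

# The 1000 Hz resonance still stands out above a nearby off-resonance point.
peak_at_1000_db = gain_db(zeros_m, poles, 1000) - gain_db(zeros_m, poles, 1400)
```

Moving the zero pair to 1800 Hz instead would give the pattern described for /ng/, with the 300 Hz and 1000 Hz resonances left uncancelled.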
43
Summary
  • Spectrogram is the log magnitude of the STFT.
  • Wideband spectrogram: N < T0; pitch shows up in the
    time domain
  • Narrowband spectrogram: N > 2T0; pitch shows up in
    the frequency domain
  • Landmarks occur at changes in the values of the
    distinctive features continuant and sonorant
      • +continuant, +sonorant: vowels, glides,
        diphthongs
      • +continuant, -sonorant: fricatives
      • -continuant, +sonorant: nasals, laterals
      • -continuant, -sonorant: stops, affricates
  • Recognition of vowels and glides: F1 and F2 are
    usually enough
  • Recognition of diphthongs: F1 and F2 at two
    separate points in time (beginning and ending of
    the vowel).
  • Obstruent consonants: back cavity formants are
    cancelled by zeros, leaving only the front cavity
    formants (e.g., F3 for /sh/, /q/)
  • Nasal consonants: resonances of the mouth-nose
    system are often cancelled by zeros, leaving
    primarily low-frequency energy.