Speech Recognition Chapter 3 - PowerPoint PPT Presentation

About This Presentation
Title:

Speech Recognition Chapter 3

Description:

Speech Recognition Chapter 3 Speech Front-Ends Linear Prediction Analysis Linear-Prediction Based Processing Cepstral Analysis Auditory signal Processing Linear ... – PowerPoint PPT presentation

Slides: 81
Provided by: JuanArtur3

Transcript and Presenter's Notes



1
Speech Recognition Chapter 3
2
Speech Front-Ends
  • Linear Prediction Analysis
  • Linear-Prediction Based Processing
  • Cepstral Analysis
  • Auditory Signal Processing

3
Linear Prediction Analysis
  • Introduction
  • Linear Prediction Model
  • Linear Prediction Coefficients Computation
  • Linear Prediction for Automatic Speech
    Recognition
  • Linear Prediction in Speech Processing
  • How good is the LP Model.

4
Signal Processing Front End
Converts the speech waveform into some type of
parametric representation.
Diagram: speech signal s_k → Signal Processing Front End (Filterbank, or Linear Prediction Front End producing Linear Prediction Coefficients) → observation sequence O = o(1), o(2), ..., o(T)
5
Introduction
  • In short intervals, it provides a good model of
    the speech.
  • Mathematically precise and simple.
  • Easy to implement in software or hardware.
  • Works fine for recognition applications.
  • It also has applications in formant and pitch
    estimation, speech coding and synthesis.

6
Linear Prediction Model
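  • Basic idea: a speech sample is approximated as a linear
    combination of the previous $M$ samples,
    $s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_M s(n-M)$
  • $a_1, \dots, a_M$ are called LP (Linear
    Prediction) coefficients.
  • By including the excitation signal, we obtain
    $s(n) = \sum_{i=1}^{M} a_i\, s(n-i) + G\, u(n)$
  • where $u(n)$ is the normalised excitation and
    $G$ is the gain of the excitation.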

7
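  • In the z-domain (Sec. 1.1.4, p. 15, Deller), for prediction order $M$,
    $S(z) = \sum_{i=1}^{M} a_i z^{-i} S(z) + G\, U(z)$
  • leading to the transfer function (Fig. 3.27)
    $H(z) = \frac{S(z)}{G\, U(z)} = \frac{1}{1 - \sum_{i=1}^{M} a_i z^{-i}} = \frac{1}{A(z)}$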

8
  • The LP model retains the spectral magnitude, but it
    is minimum phase (Sec. 1.1.7, Deller).
  • However, in practice, phase is not very important
    for speech perception.

Observation: H(z) models the glottal
filter G(z) and the lip radiation R(z).
9
Linear Prediction Coefficients Computation
  • Introduction
  • Methodologies

10
Linear Prediction Coefficients Computation
  • The LP coefficients can be obtained by solving the
    following system of equations (Sec. 3.3.2; proof:
    )

11
Methodologies
  • Autocorrelation Method
  • Covariance Method
  • Not commonly used in Speech Recognition

12
Autocorrelation Method
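  • Assumption: each frame is independent (Fig. 3.29).
  • Solution (Juang, Sec. 3.3.3, pp. 105-106):
    $\sum_{k=1}^{M} a_k\, r(|i-k|) = r(i), \quad 1 \le i \le M$    (1)
  • where, for a frame of $N$ samples,
    $r(k) = \sum_{n=0}^{N-1-k} s(n)\, s(n+k)$    (2)

These equations are known as the Yule-Walker equations.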
13
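  • Using matrix notation, with autocorrelation values $r(k)$ and order $M$:
    $\mathbf{R}\,\mathbf{a} = \mathbf{r}$, where $R_{ik} = r(|i-k|)$,
    $\mathbf{a} = (a_1, \dots, a_M)^T$ and $\mathbf{r} = (r(1), \dots, r(M))^T$
  • or
    $\mathbf{a} = \mathbf{R}^{-1}\,\mathbf{r}$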

14
Features
Symmetric.
Diagonal elements are the same.
Toeplitz matrix
15
  • This matrix is known as Toeplitz. A linear system
    with this matrix can be solved very efficiently.
  • Examples (Fig. 3.32 and 3.33 )
  • Example (Fig. 3.34 )
  • Example (Fig. 3.35 )
  • Example (Fig. 3.36 )
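
The autocorrelation method can be sketched in a few lines. The function names `autocorrelation` and `toeplitz_system` are illustrative and not from the slides, and the sketch assumes the usual short-time autocorrelation over one frame, with samples outside the frame taken as zero.

```python
def autocorrelation(frame, order):
    """Short-time autocorrelation r(0..order) of one frame.

    r(k) = sum over n of frame[n] * frame[n + k], with samples
    outside the frame treated as zero (assumption of the method).
    """
    n_samples = len(frame)
    return [
        sum(frame[n] * frame[n + k] for n in range(n_samples - k))
        for k in range(order + 1)
    ]


def toeplitz_system(r):
    """Build the Yule-Walker system R a = rhs from r(0..M).

    R[i][k] = r(|i - k|) is symmetric with constant diagonals
    (Toeplitz); rhs[i] = r(i + 1).
    """
    order = len(r) - 1
    matrix = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    rhs = r[1:]
    return matrix, rhs


r = autocorrelation([1.0, 2.0, 3.0, 4.0], order=2)
R, rhs = toeplitz_system(r)
```

Because every diagonal of `R` holds a single value, the whole matrix is described by the M values r(0)..r(M-1), which is what makes the fast recursive solution possible.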

16
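Linear Prediction for Automatic Speech Recognition
Processing chain (block diagram):
  • Preemphasis: flattens the spectrum
  • Frame blocking
  • Windowing: to minimise signal discontinuity
  • Autocorrelation analysis: equation (2), usually M = 8
  • LPC analysis: Durbin algorithm
  • LPC parameter conversion: to cepstral coefficients
  • Parameter weighting: to minimise noise sensitivity
  • Temporal derivative: incorporates signal dynamics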
17
Preemphasis
  • The transfer function of the glottis can be
    modelled as follows
  • The radiation effect can be modelled as follows

18
Hence, to obtain the transfer function of the
vocal tract the other pole must be cancelled as
follows.
19
Preemphasis should be done only for sonorant
sounds.
This process can be automated as follows.
where is the autocorrelation function.
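
The slide's own formula is missing from the transcript. One common way to automate preemphasis, assumed here and not confirmed by the source, is a first-order filter y(n) = s(n) - a s(n-1) whose coefficient is set per frame to a = r(1)/r(0). That ratio is close to 1 for sonorant (voiced) frames and small for unvoiced ones. The function name `adaptive_preemphasis` is hypothetical.

```python
def adaptive_preemphasis(frame):
    """First-order preemphasis with a frame-dependent coefficient.

    Assumption: a = r(1) / r(0), where r is the autocorrelation of
    the (non-empty) frame. Returns (a, filtered_frame).
    """
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[n] * frame[n + 1] for n in range(len(frame) - 1))
    a = r1 / r0 if r0 > 0.0 else 0.0
    # y(0) has no previous sample, so it is passed through unchanged.
    filtered = [frame[0]] + [
        frame[n] - a * frame[n - 1] for n in range(1, len(frame))
    ]
    return a, filtered


a, y = adaptive_preemphasis([1.0, 1.0, 1.0, 1.0])
```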
20
Frames of N samples, with a frame shift of M samples
21
  • Minimize signal discontinuities at the edges of
    the frames.
  • A typical window is the Hamming window.
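
Frame blocking and windowing can be sketched as follows. The function names are illustrative; N is the frame length and M the frame shift, as on the previous slide, and the sketch assumes N is at least 2.

```python
import math


def hamming(n_samples):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def frame_blocks(signal, n_samples, shift):
    """Split the signal into frames of N samples, shifted by M samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    last_start = len(signal) - n_samples
    return [
        signal[start:start + n_samples]
        for start in range(0, last_start + 1, shift)
    ]


def windowed_frames(signal, n_samples, shift):
    """Apply the Hamming window to every frame."""
    window = hamming(n_samples)
    return [
        [x * w for x, w in zip(frame, window)]
        for frame in frame_blocks(signal, n_samples, shift)
    ]


frames = frame_blocks(list(range(10)), n_samples=4, shift=3)
```

The window tapers each frame towards 0.08 at both edges, which is what reduces the discontinuity between a frame and the implicit zeros outside it.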

22
(No Transcript)
23
LPC Analysis
  • Converts the autocorrelation coefficients into an
    LPC parameter set.
  • LPC parameter sets:
  • LPC coefficients
  • Reflection (PARCOR) coefficients
  • Log area ratio coefficients
  • The formal method to obtain the LPC parameter set
    is known as Durbin's method.

24
Durbin's method
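
The slide with the recursion was not transcribed. A minimal sketch of the standard Levinson-Durbin recursion is given here, assuming the sign convention s(n) ≈ Σ a_i s(n-i); the function name and return layout are illustrative.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations from autocorrelations r(0..order).

    Returns (a, k, err): LP coefficients a(1..order), reflection
    (PARCOR) coefficients k(1..order) and the final prediction
    error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    k = [0.0] * (order + 1)  # k[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        updated = a[:]
        updated[i] = k[i]
        for j in range(1, i):
            updated[j] = a[j] - k[i] * a[i - j]
        a = updated
        err *= 1.0 - k[i] ** 2
    return a[1:], k[1:], err


a, k, err = levinson_durbin([30.0, 20.0, 11.0], order=2)
```

Each pass raises the model order by one and reuses the previous solution, so the cost grows with the square of the order instead of the cube needed by a general linear solver.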
25
(No Transcript)
26
LPC (Typical values)
27
LPC Parameter Conversion
  • Conversion to Cepstral Coefficients.
  • Robust feature set for speech recognition.
  • Algorithm
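
The algorithm itself is missing from the transcript. The sketch below uses the standard recursion from LP coefficients to cepstral coefficients, assuming the model H(z) = G / (1 - Σ a_k z^{-k}) and taking c(0) as the log of the squared gain; the function name and arguments are illustrative.

```python
import math


def lpc_to_cepstrum(a, gain_sq, n_ceps):
    """Cepstral coefficients c(0..n_ceps) of the all-pole model.

    a holds a(1..p), so a[m - 1] is a(m). For m <= p:
      c(m) = a(m) + sum_{k=1}^{m-1} (k / m) c(k) a(m - k)
    For m > p the a(m) term drops out and k starts at m - p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    c[0] = math.log(gain_sq)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c


c = lpc_to_cepstrum([0.5], gain_sq=1.0, n_ceps=3)
```

For the single-pole example the result can be checked by hand, since ln(1 / (1 - 0.5 z^{-1})) expands to Σ (0.5^n / n) z^{-n}.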

28
Parameter weighting
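  • Low-order cepstral coefficients are sensitive to the
    overall spectral slope, and high-order coefficients are
    highly sensitive to noise, so the coefficients are
    weighted to reduce both sensitivities.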

29
Temporal Cepstral Derivative
  • First- or second-order derivatives are enough.
  • It can be approximated as follows
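
The approximation formula is missing from the transcript. A common choice, assumed here, is a least-squares slope over a window of 2K + 1 frames, Δc_m(t) ≈ Σ_k k c_m(t+k) / Σ_k k². Repeating the first and last frame at the sequence edges is also an assumption of this sketch.

```python
def delta_cepstrum(frames, half_width=2):
    """First-order temporal derivative of a cepstral sequence.

    frames[t][m] is coefficient m at frame t. Frames before the
    start or after the end are replaced by the nearest frame.
    """
    n_frames = len(frames)
    norm = sum(k * k for k in range(-half_width, half_width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for m in range(len(frames[t])):
            acc = 0.0
            for k in range(-half_width, half_width + 1):
                idx = min(max(t + k, 0), n_frames - 1)
                acc += k * frames[idx][m]
            row.append(acc / norm)
        deltas.append(row)
    return deltas


ramp = [[float(t)] for t in range(7)]
d = delta_cepstrum(ramp, half_width=2)
```

Applying the same function to its own output gives a second-order derivative.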

30
(No Transcript)
31
(No Transcript)
32
Given
33
Hamming Windowed
Large prediction errors, since speech is
predicted from previous samples arbitrarily set to
zero.
34
Large prediction errors, since speech is
predicted from previous samples arbitrarily set to
zero.
35
Unvoiced signals are not position sensitive. They
do not show any special effect at the frame edges.
36
Observe the whitening phenomenon in the error
spectrum.
37
Observe the whitening phenomenon in the
error spectrum.
38
Observe the periodicity of the error waveform,
which is taken as the basis for pitch estimators.
39
  • Observe that a sharp decrease in the prediction error
    is obtained for small values of M (M = 1...4).
  • Observe that the unvoiced signal has a higher RMS error.

40
Observe the all-pole model ability to match the
spectrum.
41
Linear Prediction in Speech Processing
  • LPC for Vocal Tract Shape Estimation
  • LPC for Pitch Detection
  • LPC for Formant prediction

42
LPC for Vocal Tract Shape Estimation
Block diagram: preemphasis (free of glottis and radiation effects), windowing (to minimise signal discontinuity), parameter calculation, vocal tract shape estimation.
Further annotations: to minimise noise sensitivity; to cepstral coefficients.
43
Parameter Calculation
  • Durbin's method (as in speech recognition)
  • If this method is used, the autocorrelation
    analysis must be performed first.
  • Lattice Filter

44
Lattice Filter
  • The reflection coefficients are obtained directly
    from the signal, avoiding the autocorrelation
    analysis.
  • Methods: Itakura-Saito (PARCOR), Burg, new forms
  • Advantage: easier to implement in hardware
  • Disadvantage: needs around 5 times more computation

45
Itakura-Saito (PARCOR)
where
Accumulates over time (n).
It can be shown that the PARCOR coefficients
obtained by the Itakura-Saito method are exactly
the same as the reflection coefficients obtained
by the Levinson-Durbin algorithm.
Example
46
Burg
where
Example
47
Example
Itakura-Saito
Burg
48
New Forms
  • Strobach, New forms of Levinson and Schur
    algorithms, IEEE Signal Processing Magazine, pp.
    12-36, 1991.

49
Vocal Tract Shape Estimation
From
We obtain
Therefore, by setting the lip area to an
arbitrary value we can obtain the vocal tract
configuration relative to the initial condition.
This technique has been successfully used to train
deaf persons.
50
LPC for Pitch Detection
Block diagram: Speech sampled at 10 kHz → LPF 800 Hz → Downsampler 5:1 → LPC Analysis → Inverse Filtering A(z) → Autocorrelation → Peak finding → V/U decision or Pitch
51
LPC for Formant Detection
Block diagram: Sampled Speech → LPC Analysis → LPC Spectrum → Emphasise Peaks (second derivative) → Peak finding → Formants
52
LPC Spectrum
  • LP assumes that the vocal tract system can be
    modelled with an all-pole system
  • The spectrum can be obtained by
  • In order to emphasise formant peaks we can set

53
Therefore
Spectrum (DTFT)
Spectrum (DFT)
In order to increase the spectral resolution we
pad with zeros
In order to use an FFT algorithm
54
Calculate the spectral magnitude (DFT)
Invert the spectral magnitude (DFT)
This spectrum is called the LPC Spectrum.
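
The steps above can be sketched as follows, assuming the inverse filter A(z) = 1 - Σ a_k z^{-k} and a gain G for the model. The DFT is written out directly to keep the sketch dependency-free; an FFT would normally be used. The function name `lpc_spectrum` is illustrative.

```python
import cmath


def lpc_spectrum(a, gain, n_fft):
    """LPC spectrum |H(k)| = gain / |A(k)| for bins k = 0..n_fft/2.

    The coefficient sequence [1, -a(1), ..., -a(M)] is zero-padded
    to n_fft points, its DFT magnitude is computed and then
    inverted. Requires n_fft > len(a).
    """
    coeffs = [1.0] + [-x for x in a]
    coeffs += [0.0] * (n_fft - len(coeffs))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        a_k = sum(
            c * cmath.exp(-2j * cmath.pi * k * n / n_fft)
            for n, c in enumerate(coeffs)
        )
        spectrum.append(gain / abs(a_k))
    return spectrum


spec = lpc_spectrum([0.5], gain=1.0, n_fft=8)
```

Zero-padding does not add information. It only samples the same smooth all-pole envelope on a finer frequency grid, which makes the formant peaks easier to locate.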
55
How good is the LP Model
  • As shown by the physiological analysis of the
    vocal tract the speech model is as follows
  • However, it can be shown ( ) that the LP model
    is good for estimating the magnitude of a pole-zero
    system.

56
Proof
  • According to Lemma 1 ( ) and Lemma 2 ( ),
    can be written as follows
  • The estimates are calculated such that they
    correspond to the of this model.

All-pass component
57
  • Since hence
  • Therefore, if the estimates are exact,
    we obtain at least a model with the correct
    magnitude.

58
Lemma 1
  • Lemma 1 (System Decomposition)
  • Any causal rational system
  • can be decomposed as (proof: )

Minimum-phase component
59
Proof
For two poles and two zeros
Let us define
Re-arranging this equation
60
With the knowledge that
Hence
61
Therefore
End of proof.
62
Lemma 2
  • Lemma 2: the minimum-phase component can be expressed
    as an all-pole system
  • In theory the order goes to infinity; in practice it is
    limited.

63
Linear Prediction Based Processing
  • Criticisms of the Linear Prediction Model
  • Perceptual Linear Prediction (PLP)
  • LP Cepstra

64
Criticisms of the Linear Prediction Model
  • The LP spectrum approximates the speech spectrum
    equally well at all frequencies of the analysis
    band.
  • This property is inconsistent with human
    hearing.

65
Perceptual Linear Prediction (PLP)
Block diagram: Critical Band Spectral Analysis → Equal Loudness Pre-emphasis → Intensity-Loudness Conversion → IDFT → Yule-Walker Equations Solution
66
Critical Band Analysis
Block diagram: Speech Signal Frame → Windowing (20 ms Hamming window) → DFT (200 samples plus 56 zeros of padding, for a 10 kHz sampling rate) → Short-Term Spectra → Critical Band Spectral Resolution
67
Critical-Band Spectral Resolution
Frequency Warping (Hertz -> Bark)
Convolution and Downsampling
filter-bank masking curve approximation
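
The warping function is not given in the transcript. The sketch below assumes the Hertz-to-Bark mapping commonly used in PLP, Ω(f) = 6 ln(f/600 + sqrt((f/600)² + 1)), which equals 6 asinh(f/600); the function names are illustrative.

```python
import math


def hz_to_bark(f_hz):
    """Warp a frequency in Hz onto the Bark scale (assumed PLP form)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(bark):
    """Inverse warping, from Bark back to Hz."""
    return 600.0 * math.sinh(bark / 6.0)


nyquist_bark = hz_to_bark(5000.0)  # upper band edge for 10 kHz sampling
```

The mapping is nearly linear below a few hundred hertz and logarithmic above, so equally spaced points on the Bark axis give fine resolution at low frequencies and coarse resolution at high ones.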
68
Equal Loudness Pre-emphasis
Approximate the non-equal sensitivity of the
human hearing at different frequencies.
69
Intensity-Loudness Power Law
Approximate the non-linear relation between the
intensity of sound and its perceived loudness.
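
Neither curve is given in the transcript. As a sketch only, the functions below use the forms commonly quoted for PLP: an equal-loudness weighting E(ω) = (ω² + 56.8e6) ω⁴ / ((ω² + 6.3e6)² (ω² + 0.38e9)) and a cube-root-like compression with exponent 0.33. The constants and function names are assumptions, not taken from the slides.

```python
import math


def equal_loudness(f_hz):
    """Equal-loudness weight at frequency f_hz (assumed PLP curve)."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    numerator = (w2 + 56.8e6) * w2 * w2
    denominator = (w2 + 6.3e6) ** 2 * (w2 + 0.38e9)
    return numerator / denominator


def intensity_to_loudness(power):
    """Power-law compression of a non-negative power value."""
    return power ** 0.33


def perceptual_weighting(band_powers, band_centres_hz):
    """Apply equal-loudness weighting, then the power law, per band."""
    return [
        intensity_to_loudness(equal_loudness(f) * p)
        for p, f in zip(band_powers, band_centres_hz)
    ]


weighted = perceptual_weighting([1.0, 1.0], [100.0, 1000.0])
```

With equal input power, the 1000 Hz band comes out louder than the 100 Hz band, reflecting the lower sensitivity of hearing at low frequencies.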
70
Cepstral Analysis
  • Introduction
  • Homomorphic Processing
  • Cepstral Spectrum
  • Cepstrum
  • Mel-Cepstrum
  • Cepstrum in Speech Processing

71
Introduction
When speech is pre-emphasised
The excitation is not necessary for estimating the
vocal tract function.
Therefore, it is desirable to separate the
excitation information from the vocal tract
information.
72
We can think of the speech spectrum as a signal and
observe that it is the product of a slowly
varying signal and a rapidly varying signal.
We can therefore try to take advantage of this
knowledge. The formal technique that exploits
this property is called homomorphic processing.
73
Homomorphic Processing
  • It is a technique for filtering non-linear systems.
  • In homomorphic processing, signals that are combined
    non-linearly are transformed to a linear
    domain.

Block diagram: H → F(z) → H^-1
74
In order to obtain a linear system a complex log
transformation is applied to the speech spectrum.
S(z)
log
exp
75
Cepstral Spectrum
Definition.
where
is the STFT
76
Cepstrum
Definition.
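
The definition itself is missing from the transcript. A widely used one, assumed here, is the real cepstrum: the inverse DFT of the log magnitude of the frame's DFT. The direct DFT keeps the sketch dependency-free, and the small floor that avoids log(0) is an implementation choice, not part of the definition.

```python
import cmath
import math


def dft(x):
    """Direct DFT of a real or complex sequence."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum c(n) = IDFT(log |DFT(frame)|)."""
    n_points = len(frame)
    log_mag = [math.log(max(abs(value), floor)) for value in dft(frame)]
    # The log magnitude of a real frame is real and even, so its
    # inverse DFT is real; the imaginary residue is rounding noise.
    return [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * n / n_points)
            for k in range(n_points)).real / n_points
        for n in range(n_points)
    ]


x = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
c = real_cepstrum(x)
```

Because the log turns the product of the slowly varying vocal tract spectrum and the rapidly varying excitation spectrum into a sum, the two end up in different regions of the cepstrum and can be separated by liftering.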
77
Cepstrum In Speech Processing
  • Pitch Estimation
  • Formant Estimation
  • Pitch and Formant Estimation

78
Pitch Estimation
Block diagram: Sampled Speech → Cepstrum → High-Pass Liftering → Emphasise Peaks (second derivative) → Peak finding → Pitch
79
Formant Estimation
Block diagram: Sampled Speech → Cepstrum → Low-Pass Liftering → Emphasise Peaks (second derivative) → Peak finding → Formants
80
Pitch and Formant Estimation
Block diagram: Sampled Speech → Cepstrum, then two branches:
  • High-Pass Liftering → Emphasise Peaks (second derivative) → Peak finding → Pitch
  • Low-Pass Liftering → Emphasise Peaks (second derivative) → Peak finding → Formants