Speech Recognition Chapter 3 - PowerPoint PPT Presentation


About This Presentation

Title:

Speech Recognition Chapter 3

Description:

Speech Recognition Chapter 3 Speech Front-Ends Linear Prediction Analysis Linear-Prediction Based Processing Cepstral Analysis Auditory signal Processing Linear ... – PowerPoint PPT presentation

Slides: 81

Provided by: JuanArtur3

Transcript and Presenter's Notes

Title: Speech Recognition Chapter 3

1
Speech Recognition Chapter 3
2
Speech Front-Ends

Linear Prediction Analysis
Linear-Prediction Based Processing
Cepstral Analysis
Auditory signal Processing

3
Linear Prediction Analysis

Introduction
Linear Prediction Model
Linear Prediction Coefficients Computation
Linear Prediction for Automatic Speech
Recognition
Linear Prediction in Speech Processing
How good is the LP Model.

4
Signal Processing Front End
Converts the speech waveform into some type of
parametric representation.
Input: s(k)
Signal Processing Front End: Filterbank, or Linear Prediction Front End (Linear Prediction Coefficients)
Output: O = o(1) o(2) ... o(T)
5
Introduction

Over short intervals, it provides a good model of
the speech signal.
Mathematically precise and simple.
Easy to implement in software or hardware.
Works well for recognition applications.
It also has applications in formant and pitch
estimation, speech coding and synthesis.

6
Linear Prediction Model

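Basic idea: a speech sample is approximated as a linear combination of the M previous samples,
$\hat{s}(n) = \sum_{i=1}^{M} a_i\, s(n-i)$
where $a_1, \dots, a_M$ are called LP (Linear
Prediction) coefficients.
By including the excitation signal, we obtain
$s(n) = \sum_{i=1}^{M} a_i\, s(n-i) + G\, u(n)$
where $u(n)$ is the normalised excitation and
$G$ is the gain of the excitation.

In the z-domain (Sec. 1.1.4, p. 15, Deller)
$S(z) = \sum_{i=1}^{M} a_i z^{-i} S(z) + G\, U(z)$
leading to the transfer function (Fig. 3.27)
$H(z) = \frac{S(z)}{G\, U(z)} = \frac{1}{1 - \sum_{i=1}^{M} a_i z^{-i}}$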

The LP model retains the spectral magnitude, but it
is a minimum-phase model (Sec. 1.1.7, Deller).
However, in practice, phase is not very important
for speech perception.

Observation: H(z) models the glottal
filter G(z) and the lip radiation R(z).
9
Linear Prediction Coefficients Computation

Introduction
Methodologies

10
Linear Prediction Coefficients Computation

The LP coefficients can be obtained by solving the
following system of equations (Sec. 3.3.2, proof
)

11
Methodologies

Autocorrelation Method
Covariance Method (not commonly used in speech recognition)

12
Autocorrelation Method

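Assumption: each frame is independent (Fig. 3.29
).
Solution (Juang, Sec. 3.3.3, pp. 105-106)
$\sum_{i=1}^{M} a_i\, r(|i-k|) = r(k), \quad 1 \le k \le M$
where, for a windowed frame $s(n)$ of $N$ samples,
$r(k) = \sum_{n=0}^{N-1-k} s(n)\, s(n+k)$   (2)

These equations are known as the Yule-Walker equations.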
13

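Using matrix notation, with $r(k)$ the autocorrelation at lag $k$ and $M$ the prediction order,
$\begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}$
or
$\mathbf{R}\,\mathbf{a} = \mathbf{r}$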

14
Features
Symmetric.
Diagonal elements are the same.
Toeplitz matrix
15

This matrix is known as a Toeplitz matrix. A linear system
with this matrix can be solved very efficiently.
Examples (Fig. 3.32 and 3.33 )
Example (Fig. 3.34 )
Example (Fig. 3.35 )
Example (Fig. 3.36 )
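The autocorrelation method can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names are hypothetical, the frame is assumed to be already windowed, and a generic Gaussian elimination is used for clarity even though the Toeplitz structure allows a faster solver.

```python
def autocorrelation(frame, max_lag):
    """r(k) = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting (generic, O(M^3))."""
    m = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * m
    for row in range(m - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, m))
        solution[row] = (aug[row][m] - tail) / aug[row][row]
    return solution


def lp_coefficients(frame, order):
    """Solve the Yule-Walker equations R a = r for a_1..a_order."""
    r = autocorrelation(frame, order)
    # Toeplitz matrix: entry (i, k) depends only on |i - k|.
    toeplitz = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    return solve_linear_system(toeplitz, r[1:order + 1])


# Toy frame: a decaying exponential, which a first-order predictor fits well.
frame = [0.5 ** n for n in range(50)]
coeffs = lp_coefficients(frame, 2)
```

For this toy frame the first coefficient comes out close to 0.5 and the second close to 0, since one previous sample already predicts the next one.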

16
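Linear Prediction for Automatic Speech Recognition
Preemphasis: flattens the spectrum
Frame blocking
Windowing: to minimise signal discontinuity
Autocorrelation analysis: equation (2), usually M = 8
LPC analysis: Durbin algorithm
LPC parameter conversion: to cepstral coefficients
Parameter weighting: to minimise noise sensitivity
Temporal cepstral derivative: incorporates signal dynamics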
17
Preemphasis

The transfer function of the glottis can be
modelled as follows
The radiation effect can be modelled as follows

18
Hence, to obtain the transfer function of the
vocal tract the other pole must be cancelled as
follows.
19
Preemphasis should be done only for sonorant
sounds.
This process can be automated as follows,
where $r(k)$ is the autocorrelation function.
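A minimal Python sketch of preemphasis follows. The first-order filter y(n) = s(n) - c s(n-1) is the usual form; the adaptive rule c = r(1)/r(0) is an assumption about what the missing slide formula expresses, and both function names are hypothetical.

```python
def preemphasize(signal, coeff):
    """First-order preemphasis y(n) = s(n) - coeff * s(n-1), with s(-1) = 0."""
    previous = 0.0
    out = []
    for sample in signal:
        out.append(sample - coeff * previous)
        previous = sample
    return out


def adaptive_coefficient(signal):
    """Assumed adaptive rule: coeff = r(1) / r(0).

    The ratio is close to 1 for low-pass (sonorant-like) frames and small
    or negative for noise-like frames, so little preemphasis is applied there.
    """
    r0 = sum(x * x for x in signal)
    r1 = sum(signal[i] * signal[i + 1] for i in range(len(signal) - 1))
    return r1 / r0 if r0 > 0 else 0.0


flat = [1.0, 1.0, 1.0, 1.0]
emphasized = preemphasize(flat, 0.5)
coeff = adaptive_coefficient(flat)
```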
20
Frames of N samples, with a frame shift of M samples
21

Minimize signal discontinuities at the edges of
the frames.
A typical window is the Hamming window.
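Frame blocking and windowing can be sketched as below. The function names are hypothetical, and dropping the incomplete last frame is an assumption; the Hamming window uses the standard definition w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)).

```python
import math


def hamming(n_samples):
    """Hamming window of length n_samples."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def split_into_frames(signal, frame_size, frame_shift):
    """Frames of frame_size samples, each starting frame_shift after the last."""
    frames = []
    start = 0
    # Assumption: a trailing partial frame is simply dropped.
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += frame_shift
    return frames


def windowed_frames(signal, frame_size, frame_shift):
    """Apply the Hamming window to every frame."""
    window = hamming(frame_size)
    return [[s * w for s, w in zip(frame, window)]
            for frame in split_into_frames(signal, frame_size, frame_shift)]


frames = split_into_frames(list(range(10)), 4, 3)
window = hamming(5)
```

With a frame shift smaller than the frame size, consecutive frames overlap, which smooths the frame-to-frame variation of the estimated parameters.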

22
(No Transcript)
23
LPC Analysis

Converts the autocorrelation coefficients into
an LPC parameter set.
LPC Parameter set
LPC coefficients
Reflection (PARCOR) coefficients
Log area ratio coefficients
The formal method to obtain the LPC parameter set
is known as Durbin's method.

24
Durbin's method
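The recursion itself is not in the transcript. A minimal sketch of the standard Levinson-Durbin recursion is given here, assuming the predictor convention s(n) ~ a_1 s(n-1) + ... + a_M s(n-M); the function name and return layout are hypothetical.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations from autocorrelations r[0..order].

    Returns (lpc, reflection, error): the LP coefficients a_1..a_order,
    the reflection (PARCOR) coefficients k_1..k_order, and the final
    prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[1..order]; a[0] is unused
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        # Correlation not yet explained by the order i-1 predictor.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], reflection, error


# Autocorrelation of an ideal first-order process with a_1 = 0.5.
lpc, reflection, error = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The recursion needs on the order of M^2 operations instead of the M^3 of a generic solver, and it yields the reflection coefficients as a by-product.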
25
(No Transcript)
26
LPC (Typical values)
27
LPC Parameter Conversion

Conversion to Cepstral Coefficients.
Robust feature set for speech recognition.
Algorithm
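The algorithm on this slide was not transcribed. The sketch below uses the standard recursion from LP coefficients to cepstral coefficients, assuming the model H(z) = G / (1 - sum a_i z^-i); the gain term c_0 = ln G is left out, and the function name is hypothetical.

```python
def lpc_to_cepstrum(lpc, n_cepstra):
    """Cepstral coefficients c_1..c_n from LP coefficients a_1..a_p.

    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}      for 1 <= m <= p
    c_m =       sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k}    for m > p
    """
    p = len(lpc)
    c = [0.0] * (n_cepstra + 1)  # c[1..n_cepstra]; c[0] is unused
    for m in range(1, n_cepstra + 1):
        acc = lpc[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * lpc[m - k - 1]
        c[m] = acc
    return c[1:]


# Single pole at 0.5: the exact cepstrum is c_m = 0.5**m / m.
cepstra = lpc_to_cepstrum([0.5], 4)
```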

28
Parameter weighting

Low-order cepstral coefficients are highly
sensitive to noise
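One common weighting is a raised-sine lifter, w_m = 1 + (Q/2) sin(pi m / Q) for m = 1..Q. Whether the slide used this particular window is an assumption; the sketch only illustrates the idea of de-emphasising the first and last coefficients.

```python
import math


def raised_sine_lifter(n_cepstra):
    """Weights w_m = 1 + (Q/2) sin(pi m / Q) for m = 1..Q, with Q = n_cepstra."""
    q = n_cepstra
    return [1.0 + (q / 2.0) * math.sin(math.pi * m / q)
            for m in range(1, q + 1)]


def apply_lifter(cepstra):
    """Weight cepstral coefficients c_1..c_Q element by element."""
    weights = raised_sine_lifter(len(cepstra))
    return [w * c for w, c in zip(weights, cepstra)]


weights = raised_sine_lifter(12)
weighted = apply_lifter([1.0] * 12)
```

For Q = 12 the weight peaks at m = 6 and falls back towards 1 at both ends of the coefficient range.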

29
Temporal Cepstral Derivative

First- or second-order derivatives are enough.
They can be approximated as follows
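The approximation formula is missing from the transcript. A usual choice is a least-squares slope over 2K+1 frames, delta c_m(t) = sum_k k c_m(t+k) / sum_k k^2; the sketch below assumes that form, repeats the edge frames at the boundaries, and uses hypothetical names.

```python
def delta_cepstra(cepstra_by_frame, half_width=2):
    """Regression-based time derivative of each cepstral coefficient.

    cepstra_by_frame[t][m] is coefficient m of frame t.
    """
    n_frames = len(cepstra_by_frame)
    norm = sum(k * k for k in range(-half_width, half_width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for m in range(len(cepstra_by_frame[t])):
            acc = 0.0
            for k in range(-half_width, half_width + 1):
                # Assumption: edge frames are repeated past the boundaries.
                neighbour = min(max(t + k, 0), n_frames - 1)
                acc += k * cepstra_by_frame[neighbour][m]
            row.append(acc / norm)
        deltas.append(row)
    return deltas


# A coefficient that grows by 1 per frame has a slope of 1 away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta_cepstra(ramp, half_width=1)
```

Applying the same function to its own output gives a second-order derivative.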

30
(No Transcript)
31
(No Transcript)
32
Given
33
Hamming Windowed
Large prediction errors, since speech is
predicted from previous samples arbitrarily set to
zero.
34
Large prediction errors, since speech is
predicted from previous samples arbitrarily set to
zero.
35
Unvoiced signals are not position sensitive. They
do not show special effects at the edges.
36
Observe the whitening phenomenon in the error
spectrum.
37
Observe the whitening phenomenon in the
error spectrum.
38
Observe the periodicity of the error waveform,
which is taken as the basis for pitch estimators.
39

Observe that a sharp decrease
in the prediction error is obtained
for small values of M (M = 1...4).
Observe that the unvoiced signal
has a higher RMS error.

40
Observe the all-pole model ability to match the
spectrum.
41
Linear Prediction in Speech Processing

LPC for Vocal Tract Shape Estimation
LPC for Pitch Detection
LPC for Formant prediction

42
LPC for Vocal Tract Shape Estimation
To minimise signal discontinuity
Free of glottis and radiation effects
Vocal Tract Shape Estimation
Parameter Calculation
to minimise noise sensitivity
To Cepstral Coefficients
43
Parameter Calculation

Durbin's method (as in speech recognition)
If this method is used, the
autocorrelation analysis should be performed first.
Lattice Filter

44
Lattice Filter

The reflection coefficients are obtained directly
from the signal, avoiding the autocorrelation
analysis.
Methods
Itakura-Saito (Parcor)
Burg
New forms
Advantage
Easier to implement in Hardware
Disadvantage
needs around 5 times more calculation.

45
Itakura-Saito (PARCOR)
where
Accumulates over time (n).
It can be shown that the PARCOR coefficients
obtained by the Itakura-Saito method are exactly
the same as the reflection coefficients obtained
by the Levinson-Durbin algorithm.
Example
46
Burg
where
Example
47
Example
Itakura-Saito
Burg
48
New Forms

Strobach, "New forms of Levinson and Schur
algorithms", IEEE Signal Processing Magazine, pp.
12-36, 1991.

49
Vocal Tract Shape Estimation
From
We obtain
Therefore, by setting the lip area to an
arbitrary value we can obtain the vocal tract
configuration relative to the initial condition.
This technique has been successfully used to train
deaf persons.
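The two formulas on this slide were not transcribed. The sketch below assumes the common convention A_{m+1} / A_m = (1 - k_m) / (1 + k_m) for a lossless-tube model, with |k_m| < 1; other texts flip the sign of k or number the sections from the opposite end, so both the convention and the function names are assumptions.

```python
import math


def log_area_ratios(reflection):
    """g_m = log((1 - k_m) / (1 + k_m)) under the assumed sign convention."""
    return [math.log((1.0 - k) / (1.0 + k)) for k in reflection]


def relative_areas(reflection, reference_area=1.0):
    """Section areas relative to an arbitrary reference area.

    Starts from reference_area (for instance the lip area) and applies
    A_{m+1} = A_m * (1 - k_m) / (1 + k_m) section by section.
    """
    areas = [reference_area]
    for k in reflection:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


areas = relative_areas([0.5, 0.0])
ratios = log_area_ratios([0.5, 0.0])
```

Only ratios of areas are determined, which is why the reference area can be chosen freely.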
50
LPC for Pitch Detection
Speech sampled at 10 kHz
Inverse filtering A(z)
LPF 800 Hz
Downsampler 5:1
Peak finding
Autocorrelation
LPC Analysis
V/U decision or Pitch
51
LPC for Formant Detection
Sampled Speech
Formants
Peak finding
LPC Spectrum
Emphasis Peaks (second derivative)
LPC Analysis
52
LPC Spectrum

LP assumes that the vocal tract system can be
modelled with an all-pole system
The spectrum can be obtained by
In order to emphasise formant peaks we can set

53
Therefore
Spectrum (DTFT)
Spectrum (DFT)
In order to increase the spectral resolution we
pad with zeros
In order to use an FFT algorithm
54
Calculate the spectral magnitude (DFT)
Invert the spectral magnitude (DFT)
This spectrum is called the LPC Spectrum.
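The steps above can be sketched as follows, assuming the inverse filter A(z) = 1 - sum a_i z^-i and a stable model with no pole on the unit circle. A direct DFT is used so the example needs only the standard library; in practice an FFT would be used, and the function name is hypothetical.

```python
import cmath
import math


def lpc_spectrum(lpc, n_fft=256, gain=1.0):
    """Magnitude of H = gain / A at n_fft/2 + 1 frequencies from 0 to pi."""
    # Coefficients of A(z), zero-padded to the DFT length.
    inverse_filter = [1.0] + [-a for a in lpc]
    padded = inverse_filter + [0.0] * (n_fft - len(inverse_filter))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        value = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(padded))
        # Inverting the magnitude of A gives the all-pole (LPC) spectrum.
        spectrum.append(gain / abs(value))
    return spectrum


# Single pole at 0.5: |H| = 1 / 0.5 at frequency 0 and 1 / 1.5 at pi.
spectrum = lpc_spectrum([0.5], n_fft=8)
```

Zero padding does not add information; it only samples the same smooth envelope on a denser frequency grid.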
55
How good is the LP Model

As shown by the physiological analysis of the
vocal tract the speech model is as follows
However, it can be shown ( ) that the LP model
is good for estimating the magnitude of a pole-zero
system.

56
Proof

According to Lemma 1 ( ) and Lemma 2 ( ),
the model can be written as follows
The estimates are calculated such that they
correspond to the minimum-phase component of this model.

All-pass component
57

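Since the all-pass component has unit magnitude, the magnitude
response of the model equals that of its minimum-phase component.
Therefore, if the estimates are exact, we
at least obtain a model with the correct
magnitude.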

58
Lemma 1

Lemma 1 (System Decomposition)
Any causal rational system
can be decomposed as (proof )

Minimum-phase component
59
Proof
For two poles and two zeros
Let us define
Re-arranging this equation
60
With the knowledge that
Hence
61
Therefore
End of proof.
62
Lemma 2

Lemma 2: The minimum-phase component can be expressed
as an all-pole system
whose order in theory goes to infinity but in practice is
limited.

63
Linear Prediction Based Processing

Criticisms of the Linear Prediction Model
Perceptual Linear Prediction (PLP)
LP Cepstra

64
Criticisms of the Linear Prediction Model

The LP spectrum approximates the speech spectrum
equally well at all frequencies of the analysis
band.
This property is inconsistent with human
hearing.

65
Perceptual Linear Prediction (PLP)
Critical Band Spectral Analysis
Equal Loudness Pre-emphasis
Intensity Loudness
IDFT
Yule-Walker Equations Solutions
66
Critical Band Analysis
Speech Signal Frame
Critical Band Spectral Resolution
Short-Term Spectra
Windowing
DFT (20 ms Hamming window; 200 samples plus 56 zeros of padding for a
10 kHz sampling rate)
67
Critical-Band Spectral Resolution
Frequency Warping (Hertz -> Bark)
Convolution and Downsampling
filter-bank masking curve approximation
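The warping step can be illustrated with the Hertz-to-Bark mapping commonly used in PLP analysis, Bark = 6 asinh(f / 600). That the slide used exactly this mapping is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_bark(frequency_hz):
    """Bark = 6 * ln(f/600 + sqrt((f/600)^2 + 1)) = 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(frequency_hz / 600.0)


def bark_to_hz(bark):
    """Inverse mapping, useful for placing filters at equal Bark spacing."""
    return 600.0 * math.sinh(bark / 6.0)


# Filter centre frequencies spaced 1 Bark apart up to 5 kHz (Nyquist at 10 kHz).
top_bark = hz_to_bark(5000.0)
centres_hz = [bark_to_hz(b) for b in range(1, int(top_bark) + 1)]
```

The centres are close together at low frequencies and spread out at high frequencies, which is the non-uniform resolution the warping is meant to capture.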
68
Equal Loudness Pre-emphasis
Approximate the non-equal sensitivity of the
human hearing at different frequencies.
69
Intensity-Loudness Power Law
Approximate the non-linear relation between the
intensity of sound and its perceived loudness.
70
Cepstral Analysis

Introduction
Homomorphic Processing
Cepstral Spectrum
Cepstrum
Mel-Cepstrum
Cepstrum in Speech Processing

71
Introduction
When speech is pre-emphasised
The excitation is not necessary for estimating the
vocal tract function.
Therefore, it is desirable to separate the
excitation information from the vocal tract
information.
72
We can think of the speech spectrum as a signal and
observe that it is composed of the
multiplication of a slow signal and a
fast signal.
Therefore, we can try to take advantage of this
knowledge. The formal technique which exploits
this feature is called Homomorphic Processing.
73
Homomorphic Processing

It is a technique for filtering non-linear systems.
In Homomorphic Processing, non-linearly related
signals are transformed to a domain in which they
combine linearly.

F(z)
H
H-1
74
In order to obtain a linear system a complex log
transformation is applied to the speech spectrum.
S(z)
log
exp
75
Cepstral Spectrum
Definition.
where
is the STFT
76
Cepstrum
Definition.
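The definition did not survive in the transcript. A minimal sketch of the real cepstrum, taken here as the inverse DFT of the log magnitude of the DFT of a frame, is shown below. The magnitude floor and the function name are assumptions, and a direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of log |DFT(frame)|."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        value = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        # The floor avoids log(0) at spectral zeros.
        log_mag.append(math.log(max(abs(value), floor)))
    # The log magnitude of a real frame is even, so the result is real.
    return [sum(lm * cmath.exp(2j * math.pi * k * q / n)
                for k, lm in enumerate(log_mag)).real / n
            for q in range(n)]


# A scaled impulse has a flat spectrum, so only quefrency 0 is non-zero.
cepstrum = real_cepstrum([math.e] + [0.0] * 7)
```

The logarithm turns the product of the slow and fast spectral components into a sum, so they occupy different quefrency regions and can be separated by liftering.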
77
Cepstrum In Speech Processing

Pitch Estimation
Formant Estimation
Pitch and Formant Estimation

78
Pitch Estimation
Sampled Speech
Peak finding
High-Pass Liftering
Emphasis Peaks (second derivative)
Cepstrum
Pitch
79
Formant Estimation
Sampled Speech
Peak finding
Low-Pass Liftering
Emphasis Peaks (second derivative)
Cepstrum
Formants
80
Pitch and Formant Estimation
Sampled Speech
Peak finding
High-Pass Liftering
Emphasis Peaks (second derivative)
Cepstrum
Pitch
Peak finding
Low-Pass Liftering
Emphasis Peaks (second derivative)
Formants
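The pitch branch of the diagram can be sketched as below, under several assumptions: the high-pass lifter is implemented by searching only quefrencies inside a plausible pitch range, the peak-emphasis (second derivative) step is skipped, and the function name and default range are hypothetical. The test frame is an ideal impulse train, not real speech.

```python
import cmath
import math


def cepstral_pitch(frame, sample_rate, min_hz=60.0, max_hz=400.0, floor=1e-12):
    """Pitch estimate (Hz) from the strongest cepstral peak in the search range."""
    n = len(frame)
    # Log-magnitude spectrum (direct DFT; an FFT would be used in practice).
    log_mag = []
    for k in range(n):
        value = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        log_mag.append(math.log(max(abs(value), floor)))
    # High-pass liftering: only quefrencies in the pitch range are examined.
    low_q = int(sample_rate / max_hz)
    high_q = min(int(sample_rate / min_hz), n // 2)
    best_q, best_value = low_q, float("-inf")
    for q in range(low_q, high_q + 1):
        c = sum(lm * cmath.exp(2j * math.pi * k * q / n)
                for k, lm in enumerate(log_mag)).real / n
        if c > best_value:
            best_q, best_value = q, c
    return sample_rate / best_q


# Impulse train with a period of 32 samples at 8 kHz, i.e. 250 Hz.
frame = [1.0 if i % 32 == 0 else 0.0 for i in range(256)]
pitch = cepstral_pitch(frame, 8000, min_hz=150.0, max_hz=400.0)
```

The search range is narrowed in the example so that only the first cepstral peak falls inside it; a wider range would also include its multiples. The formant branch would instead keep the low quefrencies and look for peaks in the smoothed log spectrum.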
