Transcript: A 12-WEEK PROJECT IN Speech Coding and Recognition
1
A 12-WEEK PROJECT IN Speech Coding and Recognition
  • by Fu-Tien Hsiao
  • and Vedrana Andersen

2
Overview
  • An Introduction to Speech Signals (Vedrana)
  • Linear Prediction Analysis (Fu)
  • Speech Coding and Synthesis (Fu)
  • Speech Recognition (Vedrana)

3
Speech Coding and Recognition
  • AN INTRODUCTION TO SPEECH SIGNALS

4
AN INTRODUCTION TO SPEECH SIGNALS: Speech Production
  • Flow of air from lungs
  • Vibrating vocal cords
  • Speech production cavities
  • Lips
  • Sound wave
  • Vowels (a, e, i), fricatives (f, s, z) and
    plosives (p, t, k)

5
AN INTRODUCTION TO SPEECH SIGNALS: Speech Signals
  • Sampling frequency 8–16 kHz
  • Short-time stationary assumption (frames 20–40 ms)
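To make the short-time processing concrete, here is a minimal framing sketch in Python/NumPy; the 30 ms frame and 15 ms hop are illustrative choices within the 20–40 ms window above, not values from the slides.

import numpy as np

def frame_signal(x, fs, frame_ms=30, hop_ms=15):
    # Split a speech signal into overlapping short-time frames,
    # inside which the signal is assumed stationary.
    frame_len = int(fs * frame_ms / 1000)
    hop_len = int(fs * hop_ms / 1000)
    assert len(x) >= frame_len
    n_frames = 1 + (len(x) - frame_len) // hop_len
    return np.stack([x[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])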

6
AN INTRODUCTION TO SPEECH SIGNALS: Model for Speech Production
  • Excitation (periodic, noisy)
  • Vocal tract filter (nasal cavity, oral cavity,
    pharynx)

7
AN INTRODUCTION TO SPEECH SIGNALS: Voiced and Unvoiced Sounds
  • Voiced sounds: periodic excitation, pitch period
  • Unvoiced sounds: noise-like excitation
  • Short-time measures: power and zero-crossing rate (see the sketch below)
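A sketch of the two short-time measures, assuming NumPy. Voiced frames typically show high power and a low zero-crossing rate; unvoiced frames the opposite.

import numpy as np

def short_time_power(frame):
    # Average squared amplitude of the frame.
    return np.mean(frame ** 2)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    return np.mean(frame[:-1] * frame[1:] < 0)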

8
AN INTRODUCTION TO SPEECH SIGNALS: Frequency Domain
  • Pitch, harmonics (excitation)
  • Formants, envelope (vocal tract filter)
  • Harmonic product spectrum
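The harmonic product spectrum can be sketched in a few NumPy lines: decimated copies of the magnitude spectrum are multiplied together so the harmonics reinforce each other at the fundamental. The number of harmonics is an illustrative choice.

import numpy as np

def harmonic_product_spectrum(frame, fs, n_harmonics=4):
    # Multiply the magnitude spectrum with its decimated copies;
    # the harmonics line up at the pitch frequency, producing a peak.
    spec = np.abs(np.fft.rfft(frame))
    hps = spec.copy()
    for h in range(2, n_harmonics + 1):
        n = len(spec) // h
        hps[:n] *= spec[::h][:n]
    peak = np.argmax(hps[1:]) + 1      # skip the DC bin
    return peak * fs / len(frame)      # bin index -> pitch in Hz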

9
AN INTRODUCTION TO SPEECH SIGNALS: Speech Spectrograms
  • Time-varying formant structure
  • Narrowband / wideband

10
Speech Coding and Recognition
  • LINEAR PREDICTION ANALYSIS

11
LINEAR PREDICTION ANALYSIS: Categories
  • Vocal Tract Filter
  • Linear Prediction Analysis
  • Error Minimization
  • Levinson-Durbin Recursion
  • Residual sequence u(n)

12
LINEAR PREDICTION ANALYSIS: Vocal Tract Filter (1)
  • Vocal tract filter
  • What if we assume an all-pole filter?

[Diagram: input periodic impulse train → vocal tract filter → output speech]
13
LINEAR PREDICTION ANALYSIS: Vocal Tract Filter (2)
  • Autoregressive (AR) model (all-pole filter):
    s(n) = a_1 s(n-1) + a_2 s(n-2) + … + a_p s(n-p) + A u_g(n),
    where p is called the model order
  • Speech is a linear combination of past samples and an extra part, A u_g(n)
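A minimal sketch of the AR model as a synthesis filter, assuming SciPy; the order, coefficients, and pitch period below are made up for illustration, not taken from the slides.

import numpy as np
from scipy.signal import lfilter

# All-pole synthesis: s(n) = a_1 s(n-1) + a_2 s(n-2) + A u_g(n).
a = np.array([1.3, -0.8])      # a_1, a_2 (illustrative, stable poles)
A = 1.0                        # gain
u = np.zeros(400)
u[::80] = 1.0                  # periodic impulse train, pitch period 80 samples
s = lfilter([A], np.concatenate(([1.0], -a)), u)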

14
LINEAR PREDICTION ANALYSIS: Linear Prediction Analysis (1)
  • Goal: how do we find the coefficients a_k in this all-pole model?

[Diagram: physical model vs. analysis system: the impulse A u_g(n) drives the all-pole model to produce the speech s(n); an unknown analysis box (?) turns s(n) into an error e(n)]
  • The a_k here are fixed but unknown; we try to find estimates â_k of the a_k
15
LINEAR PREDICTION ANALYSIS: Linear Prediction Analysis (2)
  • What is really inside the ? box?
  • A predictor P(z) (an FIR filter), where
    ŝ(n) = â_1 s(n-1) + â_2 s(n-2) + … + â_p s(n-p)
  • If â_k = a_k, then e(n) = A u_g(n)

[Diagram: the predictor P(z) produces the prediction ŝ(n) from the original s(n); subtracting gives the prediction error e(n) = s(n) - ŝ(n), so the analysis filter is A(z) = 1 - P(z)]
16
LINEAR PREDICTION ANALYSIS: Linear Prediction Analysis (3)
  • If we can find a predictor generating the smallest error e(n), one close to A u_g(n), then we can use A(z) to estimate the filter coefficients
  • The resulting 1/A(z) is then very similar to the vocal tract model
17
LINEAR PREDICTION ANALYSIS: Error Minimization (1)
  • Problem: how to find the minimum error?
  • Energy of the error: E = Σ_n e²(n), where e(n) = s(n) - ŝ(n)
  • E is a quadratic function of the a_i
  • For a quadratic function of the a_i we can find the smallest value by setting ∂E/∂a_i = 0 for each i

18
LINEAR PREDICTION ANALYSIS: Error Minimization (2)
  • Differentiation gives a set of linear equations:
    Σ_{k=1}^{p} a_k R(i-k) = R(i),  i = 1, …, p
  • where R(i) = Σ_n s(n) s(n-i)
  • This is actually an autocorrelation of s(n)
19
LINEAR PREDICTION ANALYSIS: Error Minimization (3)
  • Hence, let's write the linear equations in matrix form: R a = r
  • The vector of linear prediction coefficients a is our goal
  • How do we solve it efficiently? (A brute-force baseline is sketched below)
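A brute-force sketch of the normal equations in NumPy, the O(p³) baseline that the recursion on the next slide improves on; the helper names are my own.

import numpy as np

def autocorr(s, max_lag):
    # R(i) = sum_n s(n) s(n-i), the autocorrelation from the previous slide.
    return np.array([np.dot(s[: len(s) - i], s[i:]) for i in range(max_lag + 1)])

def lpc_brute_force(s, p):
    # Solve R a = r directly; R is symmetric Toeplitz, r = [R(1) .. R(p)].
    r = autocorr(s, p)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1 : p + 1])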

20
LINEAR PREDICTION ANALYSIS: Levinson-Durbin Recursion (1)
  • The L-D recursion method is based on two properties of the matrix:
  • Symmetric
  • Toeplitz
  • Hence we can solve the system in O(p²) instead of O(p³)
  • Don't forget our objective, which is to find the a_k that simulate the vocal tract filter
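A sketch of the Levinson-Durbin recursion itself, given the autocorrelation values R(0)..R(p) as a NumPy array (e.g. from the autocorr helper above); it returns coefficients in the same predictor convention as lpc_brute_force.

import numpy as np

def levinson_durbin(r, p):
    # r: NumPy array of autocorrelation values R(0) .. R(p).
    a = np.zeros(p + 1)
    a[0] = 1.0
    E = r[0]                                   # prediction error energy
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / E                           # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        E *= 1.0 - k * k                       # error shrinks at each order
    return -a[1:], E                           # predictor coefficients a_k, error

The returned error energy E, collected for increasing p, gives the error-vs-order comparison on the next slide.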

21
LINEAR PREDICTION ANALYSIS: Levinson-Durbin Recursion (2)
  • In the exercise, we solve the system both by brute force and by L-D recursion; the resulting parameters show no difference

[Plot: error energy vs. predictor order]

22
LINEAR PREDICTION ANALYSIS: Residual Sequence u(n)
  • Knowing the filter coefficients, we can find the residual sequence u(n) by inverse filtering
  • Try to compare:
  • the original s(n)
  • the residual u(n)
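Inverse filtering is a single FIR pass through A(z) = 1 - Σ a_k z^(-k); a sketch assuming SciPy, with a holding the predictor coefficients a_1..a_p from the recursion above.

import numpy as np
from scipy.signal import lfilter

def residual(s, a):
    # Pass s(n) through A(z) = 1 - sum_k a_k z^-k to recover u(n).
    b = np.concatenate(([1.0], -np.asarray(a)))
    return lfilter(b, [1.0], s)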

23
Speech Coding and Recognition
  • SPEECH CODING AND SYNTHESIS

24
SPEECH CODING AND SYNTHESIS: Categories
  • Analysis-by-Synthesis
  • Perceptual Weighting Filter
  • Linear Predictive Coding
  • Multi-Pulse Linear Prediction
  • Code-Excited Linear Prediction (CELP)
  • CELP Experiment
  • Quantization

25
SPEECH CODING AND SYNTHESIS: Analysis-by-Synthesis (1)
  • Analyze the speech by estimating an LP synthesis filter
  • Compute a residual sequence as an excitation signal to reconstruct the signal
  • Encoder/decoder:
  • parameters such as the LP synthesis filter, gain, and pitch are coded, transmitted, and decoded

26
SPEECH CODING AND SYNTHESIS: Analysis-by-Synthesis (2)
  • Frame by frame
  • Without error minimization
  • With error minimization

27
SPEECH CODING AND SYNTHESIS: Perceptual Weighting Filter (1)
  • Perceptual masking effect:
  • within the formant regions, one is less sensitive to noise
  • Idea:
  • design a filter that de-emphasizes the error in the formant regions
  • Result:
  • synthetic speech with more error near formant peaks but less error elsewhere

28
SPEECH CODING AND SYNTHESIS: Perceptual Weighting Filter (2)
  • In the frequency domain:
  • LP synthesis filter vs. PW filter
  • Perceptual weighting coefficient α:
  • α = 1: no filtering
  • as α decreases, the filtering increases
  • the optimal α depends on perception

29
SPEECH CODING AND SYNTHESIS: Perceptual Weighting Filter (3)
  • In the z-domain: LP filter vs. PW filter
  • Numerator: generates zeros at the original poles of the LP synthesis filter
  • Denominator: places the poles closer to the origin; α determines the distance
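The pole movement can be written directly in coefficients: replacing z by z/α scales the k-th coefficient by α^k. A sketch assuming the common W(z) = A(z) / A(z/α) form, which matches the numerator/denominator behaviour described above (the exact filter form is my assumption, not stated on the slide).

import numpy as np

def pw_filter_coeffs(a, alpha):
    # W(z) = A(z) / A(z / alpha), with A(z) = 1 - sum_k a_k z^-k.
    # Numerator: zeros at the original LP poles.
    # Denominator: the same poles scaled by alpha, i.e. closer to the origin.
    a = np.asarray(a)
    k = np.arange(1, len(a) + 1)
    num = np.concatenate(([1.0], -a))
    den = np.concatenate(([1.0], -a * alpha ** k))
    return num, den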

30
SPEECH CODING AND SYNTHESIS: Linear Predictive Coding (1)
  • Based on the above methods: the PW filter and analysis-by-synthesis
  • If the excitation signal is an impulse train, then during voicing we can get a reconstructed signal very close to the original
  • More often, however, the residual is far from an impulse train

31
SPEECH CODING AND SYNTHESIS: Linear Predictive Coding (2)
  • Hence, many kinds of coding try to improve on this
  • They differ primarily in the type of excitation signal
  • Two kinds:
  • Multi-Pulse Linear Prediction
  • Code-Excited Linear Prediction (CELP)

32
SPEECH CODING AND SYNTHESIS: Multi-Pulse Linear Prediction (1)
  • Concept: represent the residual sequence by placing impulses so as to make ŝ(n) closer to s(n)

[Block diagram: s(n) enters LP analysis; an excitation generator produces the multi-pulse u(n), which drives the LP synthesis filter to give ŝ(n); the difference s(n) - ŝ(n) passes through the PW filter into error minimization, which steers the excitation generator]
33
SPEECH CODING AND SYNTHESIS: Multi-Pulse Linear Prediction (2)
  • Step 1: Estimate the LPC filter without excitation
  • Step 2: Place one impulse (position and amplitude)
  • Step 3: Determine the new error
  • Step 4: Repeat steps 2-3 until reaching the desired minimum error (sketched below)
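A simplified sketch of the greedy loop in steps 2-4, assuming NumPy/SciPy; the perceptual weighting filter is omitted for brevity and all names are illustrative.

import numpy as np
from scipy.signal import lfilter

def multipulse_excitation(target, a, n_pulses):
    n = len(target)
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = lfilter([1.0], np.concatenate(([1.0], -np.asarray(a))), impulse)
    u = np.zeros(n)              # the multi-pulse excitation being built
    err = target.copy()          # step 1 is assumed done: `a` is the LPC fit
    for _ in range(n_pulses):
        best = (-np.inf, 0, 0.0)
        for pos in range(n):     # step 2: try every placement
            hp = np.concatenate((np.zeros(pos), h[: n - pos]))
            g = np.dot(err, hp) / np.dot(hp, hp)   # optimal amplitude there
            gain = g * np.dot(err, hp)             # error reduction it buys
            if gain > best[0]:
                best = (gain, pos, g)
        _, pos, g = best
        u[pos] += g
        err -= g * np.concatenate((np.zeros(pos), h[: n - pos]))  # step 3
    return u                     # step 4: stop after n_pulses impulses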

34
SPEECH CODING AND SYNTHESIS: Code-Excited Linear Prediction (1)
  • The difference:
  • represent the residual v(n) by codewords (found by exhaustive search) from a codebook of zero-mean Gaussian sequences
  • consider the primary pitch pulses, which are predictable over consecutive periods
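A sketch of the exhaustive codebook search with an optimal gain per codeword, assuming NumPy/SciPy; pitch prediction and perceptual weighting are omitted for brevity. The (L, K) = (40, 16) codebook size matches the test setting reported later in the deck.

import numpy as np
from scipy.signal import lfilter

def celp_search(target, a, codebook):
    # Try every zero-mean Gaussian codeword: synthesize it through 1/A(z)
    # and keep the index/gain pair with the smallest squared error.
    den = np.concatenate(([1.0], -np.asarray(a)))
    best_i, best_g, best_err = 0, 0.0, np.inf
    for i, cw in enumerate(codebook):
        y = lfilter([1.0], den, cw)
        g = np.dot(target, y) / np.dot(y, y)   # optimal gain for this codeword
        err = np.sum((target - g * y) ** 2)
        if err < best_err:
            best_i, best_g, best_err = i, g, err
    return best_i, best_g

codebook = np.random.randn(16, 40)   # K = 16 codewords of length L = 40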

35
SPEECH CODING AND SYNTHESIS: Code-Excited Linear Prediction (2)

[Block diagram: s(n) enters LP analysis, which yields the LP parameters; a Gaussian excitation codebook and a multi-pulse generator produce u(n), which drives the LP synthesis filter to give ŝ(n); the difference s(n) - ŝ(n) passes through the PW filter into error minimization, which steers the codebook search]
36
SPEECH CODING AND SYNTHESIS: CELP Experiment (1)
  • An experiment with CELP

[Plot: original signal (blue) and reconstructed signal (green), with the excitation signal below]

37
SPEECH CODING AND SYNTHESIS: CELP Experiment (2)
  • Test the quality for different settings
  • LPC model order M:
  • initial M = 10
  • test M = 2
  • PW coefficient

38
SPEECH CODING AND SYNTHESIS: CELP Experiment (3)
  • Codebook (L, K)
  • K: codebook size
  • K strongly influences the computation time:
  • reducing K from 1024 to 256 cuts the time from 13 to 6 seconds
  • initial (40, 1024), test (40, 16)
  • L: length of the random signal
  • L determines the number of subblocks in the frame

39
SPEECH CODING AND SYNTHESIS: Quantization
  • With quantization:
  • 16000 bps CELP
  • 9600 bps CELP
  • Trade-off:
  • bandwidth efficiency vs. speech quality

40
Speech Coding and Recognition
  • SPEECH RECOGNITION

41
SPEECH RECOGNITION: Dimensions of Difficulty
  • Speaker dependent / independent
  • Vocabulary size (small, medium, large)
  • Discrete words / continuous utterance
  • Quiet / noisy environment

42
SPEECH RECOGNITION: Feature Extraction
  • Overlapping frames
  • Feature vector for each frame
  • Mel-cepstrum, difference cepstrum, energy, difference energy
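One way to build such a per-frame feature vector, sketched with the librosa library; the file name and frame sizes are illustrative assumptions, not the project's settings.

import numpy as np
import librosa

y, sr = librosa.load("word.wav", sr=16000)       # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=256)    # mel-cepstrum
d_mfcc = librosa.feature.delta(mfcc)                      # difference cepstrum
energy = librosa.feature.rms(y=y, frame_length=512, hop_length=256)
d_energy = librosa.feature.delta(energy)                  # difference energy
features = np.vstack([mfcc, d_mfcc, energy, d_energy]).T  # one row per frame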

43
SPEECH RECOGNITION: Vector Quantization
  • Vector quantization
  • K-means algorithm
  • Observation sequence for the whole word
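A compact k-means sketch in NumPy for training the codebook and mapping each frame's feature vector to a discrete symbol, which strung together gives the observation sequence for the whole word; the codebook size is an illustrative choice.

import numpy as np

def train_codebook(features, K=64, n_iter=20, seed=0):
    # Plain k-means: alternate nearest-codeword assignment
    # and centroid update until the codebook settles.
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), K, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                codebook[k] = features[labels == k].mean(axis=0)
    return codebook

def to_observations(features, codebook):
    # One discrete symbol (nearest codeword index) per frame.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)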

44
SPEECH RECOGNITION: Hidden Markov Model (1)
  • Changing states, emitting symbols
  • π(1), A, B

[Diagram: five-state HMM with transitions among states 1-5]
45
SPEECH RECOGNITION: Hidden Markov Model (2)
  • Probability of transition
  • State transition matrix
  • State probability vector
  • State equation
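A toy example of these objects, assuming the convention A[i, j] = P(state j at t+1 | state i at t); the numbers are made up.

import numpy as np

A = np.array([[0.8, 0.2, 0.0],       # state transition matrix
              [0.0, 0.7, 0.3],
              [0.1, 0.0, 0.9]])
p = np.array([1.0, 0.0, 0.0])        # state probability vector pi(1)
p_next = A.T @ p                     # state equation: p(t+1) = A^T p(t)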

46
SPEECH RECOGNITION: Hidden Markov Model (3)
  • Probability of observing
  • Observation probability matrix
  • Observation probability vector
  • Observation equation

47
SPEECH RECOGNITION: Hidden Markov Model (4)
  • Discrete observation hidden Markov model
  • Two HMM problems
  • Training problem
  • Recognition problem

48
SPEECH RECOGNITION: Recognition Using HMM (1)
  • Determining the probability that a given HMM produced the observation sequence
  • Straightforward computation enumerates all S^T possible paths (S states, T observations)

49
SPEECH RECOGNITION: Recognition Using HMM (2)
  • Forward-backward algorithm, using only the forward part
  • Forward partial observation sequence o_1 … o_t
  • Forward probability α_t(i) = P(o_1 … o_t, state i at time t)

50
SPEECH RECOGNITION: Recognition Using HMM (3)
  • Initialization: α_1(i) = π_i b_i(o_1)
  • Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_{ij}] b_j(o_{t+1})
  • Termination: P(O | λ) = Σ_i α_T(i)
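The three bullets map directly onto a few NumPy lines; a sketch of the standard alpha recursion, which costs O(T·S²) instead of the O(S^T) path enumeration, using the same matrix conventions as the toy example above.

import numpy as np

def forward(pi, A, B, obs):
    # alpha[t, i] = P(o_1 .. o_t, state i at time t | model)
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]                     # initialization
    for t in range(1, T):                            # recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                           # termination: P(O | model)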

51
SPEECH RECOGNITION: Training HMM
  • No known analytical way
  • Forward-backward (Baum-Welch) reestimation, a hill-climbing algorithm
  • Reestimates the HMM parameters in such a way that the probability of the observation sequence increases
  • Method:
  • uses the forward and backward probabilities to calculate state transition probabilities and observation probabilities
  • reestimates the model to improve the probability
  • Need for scaling (see the sketch below)
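The need for scaling comes from alpha shrinking geometrically with t. A common fix, sketched here and not necessarily the project's exact method, normalizes alpha at every step and accumulates the log-likelihood instead.

import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()                # scale factor for this step
        loglik += np.log(c)
        alpha /= c                     # keep alpha from underflowing
    return loglik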

52
SPEECH RECOGNITION: Experiments
  • Matrices A and B
  • Observation sequences for the words "one" and "two"

53
Thank you!