Chap 6. Speech Signal Representations - PowerPoint Presentation

Provided by: yihru

1
Chap 6. Speech signal Representations
  • Short-time Fourier analysis
  • (1) Speech signal is time-variant
  • (2) Short-time stationary
  • (3) Time domain → frequency domain using the FFT
  • (4) Windowing function (FFT size)
  • The Hamming window is the most frequently
    used one.
  • For 8 kHz sampling: FFT size 256 (240-sample,
    30 ms window, zero-padded)
  • For 16 kHz sampling: FFT size 512 (480-sample,
    30 ms window)
  • (5) Spectrograms
  • Narrowband and wideband spectrograms (FFT
    resolution, output filter)
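A minimal NumPy sketch of the framing, windowing, zero-padding, and FFT steps above, assuming 8 kHz audio, a 240-sample (30 ms) Hamming window zero-padded to a 256-point FFT, and a hop of 80 samples (10 ms). The hop size and the name `stft_magnitude` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def stft_magnitude(x, win_len=240, n_fft=256, hop=80):
    """Short-time magnitude spectra of x, one row per frame.

    Each frame of win_len samples is multiplied by a Hamming window,
    zero-padded to n_fft samples, and transformed with a real FFT.
    """
    x = np.asarray(x, dtype=float)
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # rfft zero-pads each windowed frame up to n_fft points
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

fs = 8000
t = np.arange(fs) / fs                  # 1 second of signal
x = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone
S = stft_magnitude(x)
print(S.shape)                          # (98, 129)
print(S[0].argmax() * fs / 256)         # 1000.0
```

With a 256-point FFT at 8 kHz the bin spacing is 31.25 Hz; the 30 ms window, not the FFT size, sets the actual frequency resolution, which is the narrowband/wideband trade-off.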

2
Source-filter model
Nostrils (nasal coupling): an all-pole model is not good enough
Lips model: usually $1-\alpha z^{-1}$ with $\alpha$ = 0.9,
0.95, or 0.97
All-pole model (LPC)
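The first-order form $1-\alpha z^{-1}$ is also what is commonly applied to speech as a pre-emphasis filter, $y[n] = x[n] - \alpha x[n-1]$. A minimal sketch, assuming $\alpha$ = 0.97 as the default; `pre_emphasis` is a hypothetical helper name.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Apply 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # The first sample has no predecessor, so it passes through unchanged.
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

y = pre_emphasis([1.0, 1.0, 1.0, 1.0], alpha=0.9)
print(y)
```

A constant (DC) input is attenuated to $1-\alpha$ of its level after the first sample, which illustrates the high-pass tilt of the filter.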
3
Linear Predictive Coding (LPC)
  • Linear prediction (AR model)
  • Based on the lossless tube model (the lattice
    formulation, introduced later)
  • 8 kHz sampling, c = 340 m/s, L = 17 cm → N = 8
    (2 poles per 1 kHz)
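The order N = 8 can be checked with the standard lossless-tube relation (assumed here to be the one the slide uses): a tube of length L sampled at $F_s$ has $N = 2LF_s/c$ sections, so

```latex
N = \frac{2 L F_s}{c}
  = \frac{2 \times 0.17\,\mathrm{m} \times 8000\,\mathrm{Hz}}{340\,\mathrm{m/s}}
  = 8,
```

which agrees with the rule of thumb of 2 poles per kHz over the 4 kHz bandwidth.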

4
  • Solve for the linear prediction coefficients using
    the MMSE criterion
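For reference, a standard form of this MMSE derivation, assuming the predictor $\tilde{x}[n] = \sum_{k=1}^{p} a_k x[n-k]$ (some texts use the opposite sign for $a_k$). Setting each partial derivative of the squared error to zero yields the normal equations:

```latex
E = \sum_{n} \Big( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Big)^2,
\qquad
\frac{\partial E}{\partial a_i} = 0
\;\Longrightarrow\;
\sum_{k=1}^{p} a_k\, \phi(i,k) = \phi(i,0), \quad i = 1,\dots,p,
\qquad
\phi(i,k) = \sum_{n} x[n-i]\, x[n-k].
```

The covariance and autocorrelation methods differ only in the range of $n$ used in $\phi(i,k)$.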

5
  • Covariance method
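In a common statement of the covariance method, the error is summed over a fixed interval $0 \le n \le N-1$ without windowing the signal, so samples just before the frame are used:

```latex
\phi(i,k) = \sum_{n=0}^{N-1} x[n-i]\, x[n-k], \qquad 1 \le i \le p,\; 0 \le k \le p,
\qquad
\Phi\,\mathbf{a} = \boldsymbol{\psi}, \quad
\Phi_{ik} = \phi(i,k), \quad \psi_i = \phi(i,0).
```

$\Phi$ is symmetric and positive definite but generally not Toeplitz, which is why a Cholesky-type decomposition is used instead of the Levinson-Durbin recursion.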

6
  • Solution of the covariance method using Cholesky
    decomposition
  • (1) Solve for V and D

7
  • (2) Solve for A
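The two steps can be sketched in NumPy as follows, assuming the decomposition $\Phi = V D V^T$ with V unit lower-triangular and D diagonal. The names `ldl_decompose` and `covariance_lpc` are hypothetical, and the sums run over n = p, ..., N−1 so that only samples inside the given array are used (an assumption about where the frame starts).

```python
import numpy as np

def ldl_decompose(phi):
    """Factor a symmetric positive-definite matrix as phi = V @ diag(d) @ V.T."""
    p = phi.shape[0]
    V = np.eye(p)
    d = np.zeros(p)
    for j in range(p):
        d[j] = phi[j, j] - np.sum(V[j, :j] ** 2 * d[:j])
        for i in range(j + 1, p):
            V[i, j] = (phi[i, j] - np.sum(V[i, :j] * V[j, :j] * d[:j])) / d[j]
    return V, d

def covariance_lpc(x, p):
    """Covariance-method LPC: solve Phi a = psi via Phi = V D V^T."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Row i holds x[n - i] for n = p..N-1, so phi(i, k) = sum_n x[n-i] x[n-k]
    lagged = np.stack([x[p - i:N - i] for i in range(p + 1)])
    phi_full = lagged @ lagged.T
    phi, psi = phi_full[1:, 1:], phi_full[1:, 0]
    # Step (1): solve for V and D
    V, d = ldl_decompose(phi)
    # Step (2a): forward substitution, V y = psi
    y = np.zeros(p)
    for i in range(p):
        y[i] = psi[i] - V[i, :i] @ y[:i]
    # Step (2b): back substitution, V^T a = D^{-1} y
    z = y / d
    a = np.zeros(p)
    for i in range(p - 1, -1, -1):
        a[i] = z[i] - V[i + 1:, i] @ a[i + 1:]
    return a

rng = np.random.default_rng(0)
e = rng.standard_normal(400)
x = np.zeros(400)
for n in range(2, 400):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]   # AR(2) test signal
print(covariance_lpc(x, 2))   # roughly [1.3, -0.6]
```

Because the covariance method is an unwindowed least-squares fit, its result should match a generic least-squares solver on the same rows up to rounding.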

8
  • Autocorrelation Method
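In the usual formulation of the autocorrelation method, the frame is windowed first and the error is summed over all n, so $\phi(i,k)$ depends only on $|i-k|$ and the normal equations become Toeplitz:

```latex
x_w[n] = w[n]\, x[n], \quad 0 \le n \le N-1,
\qquad
R(k) = \sum_{n=k}^{N-1} x_w[n]\, x_w[n-k],
\qquad
\sum_{k=1}^{p} a_k\, R(|i-k|) = R(i), \quad i = 1,\dots,p.
```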

9
  • R is Toeplitz
  • Using the Levinson-Durbin algorithm

The algorithm converts the ladder filter → lattice
filter; the lattice filter is a cascade form. $E_i$:
the squared prediction error; $k_i$: the coefficients
of the lattice filter (reflection coefficients)
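A sketch of the Levinson-Durbin recursion in NumPy, assuming the convention $A(z) = 1 - \sum_k a_k z^{-k}$ (some texts flip the sign of $k_i$). It returns the predictor coefficients, the reflection coefficients $k_i$, and the errors $E_i$ named above; `levinson_durbin` is a hypothetical name.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve sum_k a_k R(|i-k|) = R(i), i = 1..p, for Toeplitz R.

    r: autocorrelation values R(0), ..., R(p).
    Returns (a, k, E): a_1..a_p, k_1..k_p, and E_0..E_p.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(p + 1)          # a[j] holds a_j; a[0] is unused
    k = np.zeros(p + 1)
    E = np.zeros(p + 1)
    E[0] = r[0]
    for i in range(1, p + 1):
        # k_i = (R(i) - sum_{j=1}^{i-1} a_j R(i-j)) / E_{i-1}
        k[i] = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E[i - 1]
        new_a = a.copy()
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        E[i] = (1.0 - k[i] ** 2) * E[i - 1]
    return a[1:], k[1:], E

a, k, E = levinson_durbin([1.0, 0.5, 0.25], 2)
# For R(k) = 0.5^k: a = [0.5, 0.0], k = [0.5, 0.0], E = [1.0, 0.75, 0.75]
```

The recursion costs $O(p^2)$ operations instead of the $O(p^3)$ of a general solver, because it exploits the Toeplitz structure.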
10
  • Lattice filter
  • Define forward/backward prediction errors
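One common way to write these errors, assuming the same $A(z) = 1 - \sum_k a_k z^{-k}$ convention and reflection coefficients $k_i$ from the Levinson-Durbin recursion, is the stage-by-stage lattice recursion:

```latex
e_0[n] = b_0[n] = x[n],
\qquad
e_i[n] = e_{i-1}[n] - k_i\, b_{i-1}[n-1],
\qquad
b_i[n] = b_{i-1}[n-1] - k_i\, e_{i-1}[n],
\quad i = 1,\dots,p.
```

The forward error $e_p[n]$ at the last stage equals the prediction error of the order-$p$ predictor.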

11
  • LPC using Lattice filter
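One way to obtain LPC directly from the lattice is to estimate each $k_i$ from the forward and backward errors of the previous stage. The sketch below uses the normalized (PARCOR) cross-correlation formula $k_i = \sum e_{i-1}[n]\,b_{i-1}[n-1] / \sqrt{\sum e_{i-1}^2[n] \sum b_{i-1}^2[n-1]}$; choosing this estimator (rather than, for example, Burg's) is our assumption, and `lattice_parcor` is a hypothetical name.

```python
import numpy as np

def lattice_parcor(x, p):
    """Estimate reflection coefficients k_1..k_p stage by stage in a lattice.

    Uses e_i[n] = e_{i-1}[n] - k_i b_{i-1}[n-1] and
         b_i[n] = b_{i-1}[n-1] - k_i e_{i-1}[n].
    """
    x = np.asarray(x, dtype=float)
    e = x.copy()          # forward error e_0[n] = x[n]
    b = x.copy()          # backward error b_0[n] = x[n]
    ks = []
    for _ in range(p):
        f = e[1:]         # e_{i-1}[n]
        g = b[:-1]        # b_{i-1}[n-1], aligned with f
        k = np.dot(f, g) / np.sqrt(np.dot(f, f) * np.dot(g, g))
        ks.append(k)
        e, b = f - k * g, g - k * f
    return np.array(ks)

rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
x = np.zeros(20000)
for n in range(1, 20000):
    x[n] = 0.9 * x[n - 1] + noise[n]    # AR(1) test signal
print(lattice_parcor(x, 2))             # approximately [0.9, 0.0]
```

The normalization keeps every estimate within $|k_i| \le 1$ by the Cauchy-Schwarz inequality, so the resulting all-pole filter stays stable.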

12
Spectral analysis vs. LPC
  • LPC spectrum
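The LPC spectrum is the magnitude response of the all-pole model $G / A(e^{j\omega})$ with $A(z) = 1 - \sum_k a_k z^{-k}$; it can be evaluated on an FFT grid by zero-padding the coefficients of $A(z)$. The gain argument and the name `lpc_spectrum_db` are illustrative assumptions.

```python
import numpy as np

def lpc_spectrum_db(a, gain=1.0, n_fft=512):
    """Return 20*log10|G / A(e^{jw})| at n_fft//2 + 1 frequencies from 0 to pi."""
    # Coefficients of A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    A = np.fft.rfft(coeffs, n=n_fft)
    return 20 * np.log10(gain / np.abs(A))

spec = lpc_spectrum_db([0.9])
print(spec[0], spec[-1])    # about +20 dB at DC, about -5.6 dB at Nyquist
```

Compared with an FFT spectrum of the same frame, this curve keeps the formant envelope and drops the pitch harmonics.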

13
  • Prediction error vs. LPC order
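Within the autocorrelation method, the Levinson-Durbin error update explains the shape of this curve (a standard result): since $|k_i| < 1$ when the Toeplitz matrix is positive definite, each added order reduces the error or leaves it unchanged.

```latex
E_i = \left(1 - k_i^2\right) E_{i-1}
\;\Longrightarrow\;
E_p = R(0) \prod_{i=1}^{p} \left(1 - k_i^2\right),
\qquad
E_0 \ge E_1 \ge \cdots \ge E_p .
```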

14
Conversion between parameters
  • Reflection coefficients vs. LPC
  • Log-area ratios
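A sketch of the conversions, assuming the same sign convention as the Levinson-Durbin recursion ($a_i^{(i)} = k_i$, $a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}$). The log-area ratio is written as $g_i = \log\frac{1-k_i}{1+k_i}$; other texts use the reciprocal ratio, so treat that sign as an assumption. All function names are hypothetical.

```python
import numpy as np

def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients k_1..k_p -> predictor a_1..a_p."""
    a = np.zeros(0)
    for ki in k:
        # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1) for j < i, and a_i^(i) = k_i
        a = np.concatenate((a - ki * a[::-1], [ki]))
    return a

def lpc_to_reflection(a):
    """Step-down recursion: predictor a_1..a_p -> reflection k_1..k_p (needs |k_i| < 1)."""
    a = np.asarray(a, dtype=float).copy()
    k = np.zeros(len(a))
    for i in range(len(a), 0, -1):
        k[i - 1] = a[-1]
        prev = a[:-1]
        # a_j^(i-1) = (a_j^(i) + k_i a_{i-j}^(i)) / (1 - k_i^2)
        a = (prev + k[i - 1] * prev[::-1]) / (1.0 - k[i - 1] ** 2)
    return k

def log_area_ratios(k):
    """g_i = log((1 - k_i) / (1 + k_i)); the sign convention varies between texts."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))

k = [0.5, -0.3, 0.2]
a = reflection_to_lpc(k)
print(a, lpc_to_reflection(a), log_area_ratios(k))
```

The round trip recovers the original reflection coefficients, which is a convenient sanity check when quantizing parameters in one domain and synthesizing in another.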

15
Cepstral processing
  • Spectral vs. Cepstral
  • The cepstrum is a homomorphic transformation
    (deconvolution)
  • Block diagram
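The deconvolution idea behind the block diagram can be summarized in a standard form, with excitation $e[n]$ and vocal-tract response $h[n]$ (the real cepstrum keeps only the log magnitude):

```latex
x[n] = e[n] * h[n]
\;\xrightarrow{\ \mathcal{F}\ }\;
X(e^{j\omega}) = E(e^{j\omega})\, H(e^{j\omega})
\;\xrightarrow{\ \log\ }\;
\log X = \log E + \log H
\;\xrightarrow{\ \mathcal{F}^{-1}\ }\;
\hat{x}[n] = \hat{e}[n] + \hat{h}[n].
```

Convolution in time becomes addition in the cepstral domain, so the two components can be separated by liftering when they occupy different quefrency ranges.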

16
  • Cepstrum of a real signal
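A minimal real-cepstrum sketch, $c[n] = \mathrm{IDFT}\{\log|\mathrm{DFT}\{x\}|\}$, computed with a DFT long enough that time aliasing is negligible. The small `eps` guarding against $\log 0$ and the name `real_cepstrum` are our assumptions.

```python
import numpy as np

def real_cepstrum(x, n_fft=None, eps=1e-12):
    """c[n] = IDFT{ log |DFT{x}| }; eps guards against log(0)."""
    X = np.fft.fft(x, n=n_fft)
    return np.fft.ifft(np.log(np.abs(X) + eps)).real

h = 0.5 ** np.arange(64)          # minimum-phase test signal, H(z) ~ 1 / (1 - 0.5 z^-1)
c = real_cepstrum(h, n_fft=1024)
print(c[:3])                      # approximately [0.0, 0.25, 0.0625]
```

For this one-pole signal the values match $0.5^n / (2n)$ for $n \ge 1$: the real cepstrum is the even part of the complex cepstrum, hence the factor 1/2.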

17
  • Cepstrum of a pole-zero function
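For a minimum-phase rational function (all zeros $z_k$ and poles $p_k$ inside the unit circle, $G > 0$ assumed), expanding each factor with $\log(1 - c\,z^{-1}) = -\sum_{n \ge 1} c^n z^{-n} / n$ gives the complex cepstrum in closed form:

```latex
H(z) = G\,\frac{\prod_{k}\left(1 - z_k z^{-1}\right)}{\prod_{k}\left(1 - p_k z^{-1}\right)}
\quad\Longrightarrow\quad
\hat{h}[n] =
\begin{cases}
\log G, & n = 0,\\[2pt]
\displaystyle \sum_{k} \frac{p_k^{\,n}}{n} - \sum_{k} \frac{z_k^{\,n}}{n}, & n > 0,\\[2pt]
0, & n < 0,
\end{cases}
```

so the cepstrum of a smooth vocal-tract response decays at least as fast as $|c|^n / n$ for its largest-magnitude root $c$.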

18
LPC-derived cepstrum
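The cepstrum of the all-pole model $H(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})$ can be computed without any FFT through the standard recursion $c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n} c_k a_{n-k}$ for $1 \le n \le p$, and $c_n = \sum_{k=n-p}^{n-1} \frac{k}{n} c_k a_{n-k}$ for $n > p$. A sketch, assuming $c_0 = \log G$; `lpc_to_cepstrum` is a hypothetical name.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps, gain=1.0):
    """Cepstrum c_0..c_{n_ceps-1} of H(z) = G / (1 - sum_k a_k z^-k)."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps)
    c[0] = np.log(gain)
    for n in range(1, n_ceps):
        acc = a[n - 1] if n <= p else 0.0       # the a_n term exists only for n <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]  # a[n-k-1] holds a_{n-k}
        c[n] = acc
    return c

print(lpc_to_cepstrum([0.5], 4))   # 0.5^n / n: [0.0, 0.5, 0.125, 0.041666...]
```

This recursion yields the complex cepstrum of the minimum-phase model, which is twice the real cepstrum for $n \ge 1$.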

19
  • Cepstrum of a speech signal
  • Periodic excitation train
  • → impulse train with the same period in the
    cepstral domain
  • Cepstrum of a windowed signal
  • Example
  • (from Fundamentals of Speech Recognition, by B.
    H. Juang)
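A small sketch of this effect: a periodic excitation produces a cepstral peak at the pitch period, well above the low-quefrency vocal-tract part. The test signal is hypothetical: impulses every P = 80 samples with geometrically decaying amplitudes (chosen so the log spectrum has no zeros), convolved with $h[n] = 0.9^n$, and no analysis window is applied. `cepstral_pitch_period` is an illustrative name.

```python
import numpy as np

def cepstral_pitch_period(x, min_lag, max_lag, eps=1e-12):
    """Quefrency (in samples) of the largest real-cepstrum peak in [min_lag, max_lag)."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)).real
    return min_lag + int(np.argmax(c[min_lag:max_lag]))

N, P = 1024, 80
e = np.zeros(N)
e[::P] = 0.8 ** np.arange(len(e[::P]))   # decaying impulse train, period P
h = 0.9 ** np.arange(64)                  # short minimum-phase "vocal tract"
x = np.convolve(e, h)[:N]
print(cepstral_pitch_period(x, 20, N // 2))   # 80
```

Skipping the lowest quefrencies (here below 20 samples) excludes the vocal-tract contribution $\hat{h}[n]$, which is concentrated near $n = 0$.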

20
Mel-Frequency Cepstral Coefficients (MFCC)
  • Change the frequency scale to the mel scale

Frequency quantization?
21
  • M = 20 for Fs = 8 kHz, M = 24 for Fs = 16 kHz
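A minimal MFCC sketch, assuming M is the number of triangular mel filters, the common mel formula $2595 \log_{10}(1 + f/700)$ (other constants exist), a 256-point FFT at 8 kHz, and a DCT-II over the log filter energies. Filter shape, normalization, and the 13 kept coefficients are illustrative choices, and every function name here is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced in mel, shape (n_filters, n_fft//2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fb = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        lo, center, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_frame(power_spectrum, fb, n_ceps=13, eps=1e-10):
    """Log mel-filter energies followed by a DCT-II."""
    log_e = np.log(fb @ power_spectrum + eps)
    M = len(log_e)
    n = np.arange(n_ceps)[:, None]
    m = np.arange(M)[None, :]
    dct = np.cos(np.pi * n * (m + 0.5) / M)
    return dct @ log_e

fs, n_fft = 8000, 256
fb = mel_filterbank(20, n_fft, fs)          # M = 20 filters for Fs = 8 kHz
frame = np.hamming(240) * np.random.default_rng(0).standard_normal(240)
power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
print(mfcc_frame(power, fb).shape)          # (13,)
```

Equal spacing on the mel axis makes the filters narrow at low frequencies and wide at high frequencies, which is one answer to the frequency-quantization question above.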

22
Pitch detector of speech signal
  • Speech is a quasi-periodic signal, because it
    is time-varying.
  • Find the pitch frequency (fundamental frequency, F0)
  • → find the period of a discrete-time signal
  • Assume the pitch contour is continuous
  • → find a smooth pitch contour
  • → a smoothing/contour-tracking algorithm is
    needed.

23
  • Autocorrelation method
  • The autocorrelation function of a periodic signal
    is also periodic → the time shift (lag) with the
    maximum autocorrelation → period
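A sketch of autocorrelation pitch estimation on one frame. The lag search is limited to an assumed F0 range of 60 to 400 Hz, which also keeps lag 0 and obvious double-period lags out of the search; the pulse-train test signal and the name `autocorr_pitch` are hypothetical.

```python
import numpy as np

def autocorr_pitch(x, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 as fs / (lag of maximum autocorrelation) in [fs/f0_max, fs/f0_min]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # remove DC before correlating
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[k] = sum_n x[n] x[n+k], k >= 0
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

fs = 8000
n = np.arange(400)                  # one 50 ms frame
x = 0.9 ** (n % 80)                 # decaying pulses repeating every 80 samples
print(autocorr_pitch(x, fs))        # 100.0
```

Because this autocorrelation sums fewer terms at larger lags, it tapers with lag; that taper favors the true period over its multiples but can also pull the peak toward shorter lags, one source of the errors listed on the next slide.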

24
  • An example of Autocorrelation and pitch contour
  • Half-pitch/double-pitch errors?
  • U/V (unvoiced/voiced) decision?
  • Smoothness of the pitch contour?
  • Max-picking range? Global or local max?
25
  • Normalized cross-correlation method: uses
    cross-correlation instead of autocorrelation
  • Decaying the normalized cross-correlation
    w.r.t. T?
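A sketch of the normalized cross-correlation: each lag compares a fixed window of `n_win` samples with an equally long shifted window and divides by both energies, so the value is at most 1 and does not taper with lag. The decay weighting over T asked about above is not modeled because its form is not given; the lag range and the name `nccf` are assumptions.

```python
import numpy as np

def nccf(x, n_win, lag_min, lag_max):
    """phi(k) = sum x[n] x[n+k] / sqrt(sum x[n]^2 * sum x[n+k]^2), n = 0..n_win-1.

    x must contain at least n_win + lag_max samples.
    """
    x = np.asarray(x, dtype=float)
    ref = x[:n_win]
    e0 = np.sum(ref * ref)
    phi = np.zeros(lag_max - lag_min + 1)
    for i, k in enumerate(range(lag_min, lag_max + 1)):
        seg = x[k:k + n_win]
        phi[i] = np.sum(ref * seg) / np.sqrt(e0 * np.sum(seg * seg))
    return phi

fs = 8000
lag_min, lag_max = 20, 133            # about 400 Hz down to 60 Hz
n = np.arange(400 + lag_max)
x = 0.9 ** (n % 80)                   # hypothetical pulse train, period 80 samples
phi = nccf(x, 400, lag_min, lag_max)
best = lag_min + int(np.argmax(phi))
print(best, fs / best)                # 80 100.0
```

For an exactly periodic signal the NCCF reaches 1.0 at the period, which makes a fixed threshold usable as a voicing cue.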

26
  • Pitch tracking
  • → keep several candidates in each frame
  • → define a cost function
  • → Viterbi search
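A sketch of the tracking step under a hypothetical cost function: the local cost of a candidate is 1 minus its strength (for example an NCCF peak value), and the transition cost is proportional to the pitch jump in octaves, $|\log_2(f_t / f_{t-1})|$. Both costs, the weight, and the name `viterbi_pitch` are illustrative assumptions.

```python
import math

def viterbi_pitch(candidates, strengths, jump_weight=1.0):
    """Pick one F0 candidate per frame, minimizing local + transition costs."""
    def transition(f_prev, f):
        # One octave jump costs jump_weight
        return jump_weight * abs(math.log2(f / f_prev))

    cost = [[1.0 - s for s in strengths[0]]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for f, s in zip(candidates[t], strengths[t]):
            scores = [cost[t - 1][i] + transition(f_prev, f)
                      for i, f_prev in enumerate(candidates[t - 1])]
            best_i = min(range(len(scores)), key=scores.__getitem__)
            row_cost.append(scores[best_i] + 1.0 - s)
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest candidate in the last frame
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return path[::-1]

cands = [[100, 200], [50, 102], [104, 208]]
strength = [[0.9, 0.8], [0.95, 0.9], [0.9, 0.85]]
print(viterbi_pitch(cands, strength))   # [100, 102, 104]
```

In the second frame the strongest candidate is 50 Hz (a half-pitch error); a frame-by-frame maximum would pick it, but the jump cost steers the Viterbi path to 102 Hz.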