Linear Predictive Coding for Speech Compression

File properties: Title: Scalable Video Coding Update; Author: yangzi; Created Date: 2/2/2005 5:07:21 PM; Document presentation format: On-screen Show; Company: Northwestern University

1
Linear Predictive Coding for Speech Compression
  • Dev Ghosh
  • ECE 463

9 March 2006
2
Overview
  • General Model for Speech Synthesis
  • Channel Vocoder
  • Linear Predictive Coder (LPC-10)
  • Code Excited Linear Prediction (CELP)
  • Novel Application
      • Sub-band adaptive filtering based on cochlear
        model

3
Model for Speech Synthesis
  • Speech produced by forcing air through vocal
    cords, larynx, pharynx, mouth and nose
  • At transmitter speech is divided into segments
  • Each segment analyzed to determine excitation
    signal and parameters of vocal tract filter

[Figure: Excitation source → Vocal tract filter → Speech]
4
Channel Vocoder - analysis
  • Each segment of input speech analyzed by a bank
    of (bandpass) analysis filters
  • Energy at output of each filter is estimated 50
    times a second and transmitted to receiver
  • Decision made whether segment is
      • voiced (e.g., /a/, /e/, /o/) or
      • unvoiced (e.g., /s/, /f/)
  • Estimate of pitch period (period of fundamental
    harmonic) is determined
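A minimal sketch of the analysis step, assuming 8 kHz input and 20 ms frames (160 samples, so the energy in each channel is estimated 50 times a second). The filter bank is a hypothetical choice: six second-order bandpass biquads (RBJ audio-EQ-cookbook form) with made-up center frequencies and Q, not the filters of any particular vocoder.

```python
import numpy as np

def bandpass_biquad(x, fs, f0, q):
    """Second-order bandpass filter (RBJ cookbook, 0 dB peak gain), direct form I."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    b, a = b / a[0], a / a[0]
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y

def channel_energies(x, fs=8000, centers=(300, 600, 1000, 1500, 2200, 3000),
                     q=4.0, frames_per_sec=50):
    """Energy at the output of each bandpass filter, once per frame.

    Returns an array of shape (n_frames, n_channels); centers and q are hypothetical.
    """
    frame = fs // frames_per_sec          # 160 samples = 20 ms at 8 kHz
    n_frames = len(x) // frame
    out = np.zeros((n_frames, len(centers)))
    for k, f0 in enumerate(centers):
        y = bandpass_biquad(x, fs, f0, q)
        for m in range(n_frames):
            seg = y[m * frame:(m + 1) * frame]
            out[m, k] = np.sum(seg ** 2)
    return out
```

For a 1000 Hz test tone, the largest energy in each frame should land in the 1000 Hz channel (index 2).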

5
Voiced vs. Unvoiced Speech
6
Channel vocoder - synthesis
  • Vocal tract filter implemented by bank of
    (bandpass) synthesis filters
  • For voiced segments, the input is a periodic
    pulse generator
  • For unvoiced segments, the input is a
    pseudonoise source
  • Pulse period determined by the pitch estimate
  • Scaled by the transmitted energy estimates
  • First approach to speech compression
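The synthesis side can be sketched as follows, assuming the same hypothetical biquad filter bank as an analysis stage would use, unit-power excitation, and that each channel's output is scaled so its energy over the segment equals the transmitted energy for that channel. A seeded Gaussian generator stands in for the pseudonoise source.

```python
import numpy as np

def bandpass_biquad(x, fs, f0, q):
    """Second-order bandpass (RBJ cookbook, 0 dB peak gain), direct form I."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b0, b2 = alpha / (1 + alpha), -alpha / (1 + alpha)
    a1, a2 = -2 * np.cos(w0) / (1 + alpha), (1 - alpha) / (1 + alpha)
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y

def excitation(n, voiced, pitch_period=None, seed=0):
    """Pulse train (voiced) or pseudonoise (unvoiced), normalized to unit power."""
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0               # one pulse per pitch period
    else:
        e = np.random.default_rng(seed).standard_normal(n)
    return e / np.sqrt(np.mean(e ** 2))

def synthesize_segment(energies, centers, fs, voiced, pitch_period=None, n=160, q=4.0):
    """Drive each synthesis filter with the excitation and match its channel energy."""
    e = excitation(n, voiced, pitch_period)
    out = np.zeros(n)
    for target, f0 in zip(energies, centers):
        y = bandpass_biquad(e, fs, f0, q)
        y *= np.sqrt(target / max(np.sum(y ** 2), 1e-12))   # scale to the energy estimate
        out += y
    return out
```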

7
Linear Predictive Coder
  • Models vocal tract as a single linear filter
  • $y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$
  • Output $y_n$, input $\epsilon_n$, gain $G$,
    predictor coefficients $a_i$, filter order $M$
  • Input is random noise (unvoiced) or periodic
    pulse (voiced)
  • LPC-10 is a standard coder (2.4 kb/s at 8000
    samples/s)
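The single-filter model $y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$ can be run directly as an all-pole recursion. This is a minimal sketch assuming zero initial filter state; the helper names are illustrative. For scale, 2.4 kb/s at 8000 samples/s is 2400/8000 = 0.3 bits per sample.

```python
import numpy as np

def lpc_synthesize(a, gain, eps):
    """All-pole synthesis: y[n] = sum_{i=1}^{M} a[i-1] * y[n-i] + gain * eps[n].

    Samples before the start of the segment are taken as zero.
    """
    M = len(a)
    y = np.zeros(len(eps))
    for n in range(len(eps)):
        acc = gain * eps[n]
        for i in range(1, min(M, n) + 1):
            acc += a[i - 1] * y[n - i]
        y[n] = acc
    return y

def pulse_train(n, period):
    """Voiced excitation: one unit pulse per pitch period."""
    e = np.zeros(n)
    e[::period] = 1.0
    return e
```

For unvoiced frames the same filter would be driven by noise instead, e.g. `np.random.default_rng(0).standard_normal(n)`.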

8
LPC - Voiced/Unvoiced Decision
  • Voiced speech has more energy and lower frequency
    than unvoiced
  • Speech segment is lowpass filtered; energy at
    the output relative to background noise is used
    in the decision
  • Zero-crossings counted to determine frequency
  • Continuity criterion: voicing decisions of
    neighboring frames taken into account
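The three cues on this slide can be combined as in the sketch below. The moving-average lowpass, the energy ratio, the zero-crossing limit and the neighbor-majority continuity rule are all hypothetical stand-ins chosen for illustration, not the LPC-10 values.

```python
import numpy as np

def zero_crossings(seg):
    """Number of sign changes in the segment."""
    s = np.signbit(seg)
    return int(np.count_nonzero(s[1:] != s[:-1]))

def voicing_decisions(frames, noise_energy, energy_ratio=4.0, max_zc=40, lp_taps=8):
    """Per-frame voiced (True) / unvoiced (False) decisions.

    energy_ratio, max_zc and lp_taps are illustrative thresholds (assumptions).
    """
    kernel = np.ones(lp_taps) / lp_taps          # crude moving-average lowpass
    raw = []
    for seg in frames:
        low = np.convolve(seg, kernel, mode="same")
        loud = np.sum(low ** 2) > energy_ratio * noise_energy   # energy vs background
        slow = zero_crossings(seg) < max_zc                      # few crossings = low freq
        raw.append(bool(loud and slow))
    # Continuity: an isolated decision that disagrees with both neighbors is flipped.
    smoothed = list(raw)
    for m in range(1, len(raw) - 1):
        if raw[m - 1] == raw[m + 1] != raw[m]:
            smoothed[m] = raw[m - 1]
    return smoothed
```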

9
LPC - Estimating Pitch Period
  • Extracting pitch from short noisy segment is
    difficult
  • One approach is to maximize the autocorrelation
    over candidate lags
  • Periodicity is often too weak for a clear peak
  • Threshold can't be used because the maximum value
    is not known in advance

10
LPC - Estimating Pitch Period
  • LPC-10 uses average magnitude difference function
    (AMDF)
  • $\mathrm{AMDF}(P) = \frac{1}{N}\sum_{i=1}^{N} |y_i - y_{i-P}|$
  • If $y_n$ is periodic with period $P_0$, samples
    $P_0$ apart have values close to each other, so
    the AMDF has a minimum at $P_0$
  • AMDF is periodic for voiced speech and roughly
    flat for unvoiced speech
  • AMDF is at its minimum when $P$ is the pitch
    period; spurious minima in unvoiced segments are
    shallow
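The AMDF can be computed directly from its definition. In this sketch the first `max_lag` samples of the array serve as history, so every lag averages the same N samples; the 60 to 400 Hz search range is again an assumption.

```python
import numpy as np

def amdf(seg, max_lag):
    """AMDF(P) = (1/N) * sum_i |y_i - y_{i-P}| for P = 0..max_lag."""
    seg = np.asarray(seg, dtype=float)
    cur = seg[max_lag:]                          # the N samples y_i being averaged
    return np.array([np.mean(np.abs(cur - seg[max_lag - P:len(seg) - P]))
                     for P in range(max_lag + 1)])

def amdf_pitch(seg, fs=8000, fmin=60, fmax=400):
    """Pitch period in samples: the lag of the AMDF minimum within the search range."""
    lo, hi = int(fs // fmax), int(fs // fmin)
    d = amdf(seg, hi)
    return lo + int(np.argmin(d[lo:hi + 1]))
```

Unlike the autocorrelation peak, the AMDF minimum for a perfectly periodic segment is essentially zero, which makes it easier to recognize.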

11
LPC - Obtaining Vocal Tract Filter
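  • At transmitter, we want filter coeffs that best
    match the segment in the mean squared error sense
  • $E[e_n^2] = E\left[\left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n\right)^2\right]$
  • Autocorrelation approach assumes $y_n$ is
    stationary
  • $A = R^{-1}P$, where $R$ is the $M \times M$
    autocorrelation matrix with entries
    $R_{yy}(|i-j|)$ and $P$ has entries $R_{yy}(i)$
  • $R$ is Toeplitz, so the Levinson-Durbin
    recursion solves for $A$ without inverting $R$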
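A sketch of the autocorrelation method, assuming a nonzero segment (so $R_{yy}(0) > 0$) and the predictor convention $y_n \approx \sum_i a_i y_{n-i}$. The recursion also produces the reflection coefficients $k_i$ used later for transmission; note that sign conventions for $k_i$ differ between texts.

```python
import numpy as np

def autocorr(seg, M):
    """R_yy(m) for m = 0..M, treating samples outside the segment as zero."""
    seg = np.asarray(seg, dtype=float)
    N = len(seg)
    return np.array([np.dot(seg[:N - m], seg[m:]) for m in range(M + 1)])

def levinson_durbin(r, M):
    """Solve R A = P (R Toeplitz) recursively.

    Returns predictor coefficients a_1..a_M, reflection coefficients k_1..k_M,
    and the final prediction-error energy.
    """
    a = np.zeros(M + 1)                 # a[0] unused; a[1..M] are the coefficients
    k = np.zeros(M + 1)
    err = r[0]
    for i in range(1, M + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])   # r_i - sum_j a_j r_{i-j}
        k[i] = acc / err
        a_prev = a.copy()
        a[i] = k[i]
        for j in range(1, i):
            a[j] = a_prev[j] - k[i] * a_prev[i - j]
        err *= 1.0 - k[i] ** 2          # error energy shrinks at each order
    return a[1:], k[1:], err
```

Each order costs O(i) operations, so the whole recursion is O(M²) instead of the O(M³) of a general matrix inverse.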

12
LPC - Obtaining the Vocal Tract Filter
  • Covariance approach discards stationarity
    assumption (not valid for speech signals)
  • $c_{ij} = E[y_{n-i} y_{n-j}]$
  • yields $CA = S$, where $C = [c_{ij}]$ and
    $S = [c_{10}, c_{20}, \ldots, c_{M0}]^T$

13
LPC - Obtaining the Vocal Tract Filter
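  • $c_{ij}$ are estimated over the segment
    $n_0 \le n \le n_1$ as
    $c_{ij} = \sum_{n=n_0}^{n_1} y_{n-i} y_{n-j}$
  • No longer assume values of $y_n$ outside the
    segment are zero
  • $C$ is symmetric but generally not Toeplitz, so
    Levinson-Durbin does not apply; Cholesky
    decomposition is used instead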
  • Reflection coeffs used to update voicing decision
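A sketch of the covariance method, assuming the samples $y_{n_0-M}, \ldots, y_{n_0-1}$ just before the segment are available (they are used, not set to zero). Each triangular system is solved here with the general `np.linalg.solve` for brevity; `scipy.linalg.solve_triangular` would exploit the triangular structure.

```python
import numpy as np

def covariance_lpc(y, n0, n1, M):
    """Covariance-method LPC: solve C a = S by Cholesky factorization.

    c_ij = sum_{n=n0}^{n1} y[n-i] * y[n-j]; requires n0 >= M.
    """
    y = np.asarray(y, dtype=float)
    if n0 < M:
        raise ValueError("need n0 >= M so that y[n0-M] exists")
    n = np.arange(n0, n1 + 1)
    c = np.array([[np.dot(y[n - i], y[n - j]) for j in range(M + 1)]
                  for i in range(M + 1)])
    C, S = c[1:, 1:], c[1:, 0]          # C = [c_ij], S = [c_10, ..., c_M0]
    L = np.linalg.cholesky(C)           # C = L L^T; fails if C is not positive definite
    z = np.linalg.solve(L, S)           # forward step: L z = S
    return np.linalg.solve(L.T, z)      # backward step: L^T a = z
```

On a noise-free second-order recursion the method should recover the generating coefficients exactly, which the test checks.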

14
LPC - Transmitting Parameters
  • Tenth order filter used for voiced speech and
    fourth order for unvoiced
  • Vocal tract filter is sensitive to errors in
    reflection coeffs close to one
  • $g_i = \frac{1 + k_i}{1 - k_i}$ are quantized and
    sent instead of $k_i$
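The transform on this slide and its inverse are simple enough to check numerically. This sketch only shows the mapping, not a quantizer; the function names are hypothetical.

```python
import numpy as np

def k_to_g(k):
    """g = (1 + k) / (1 - k), defined for |k| < 1."""
    k = np.asarray(k, dtype=float)
    return (1 + k) / (1 - k)

def g_to_k(g):
    """Inverse map applied at the receiver: k = (g - 1) / (g + 1)."""
    g = np.asarray(g, dtype=float)
    return (g - 1) / (g + 1)
```

For k = 0.98 and k = 0.99 the ratio is 99 and 199: a 0.01 step in k near one becomes a step of 100 in g. An error of fixed size in the quantized g therefore maps back to a very small error in k exactly where the filter is most sensitive.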

15
Code Excited Linear Prediction
  • In LPC-10, a single pulse per pitch period leads
    to a buzzy twang
  • CELP allows a variety of excitation signals,
    drawn from a codebook
  • For each segment encoder finds excitation vector
    that generates synthesized speech that best
    matches speech being coded
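The search described above is analysis-by-synthesis: every candidate excitation is run through the vocal tract filter and compared with the target. This is a stripped-down sketch assuming a random codebook, zero filter memory, and a plain squared error; practical CELP coders typically add perceptual weighting and an adaptive (pitch) codebook, which are omitted here.

```python
import numpy as np

def lpc_synthesize(a, gain, eps):
    """y[n] = sum_i a[i-1] * y[n-i] + gain * eps[n], with zero initial state."""
    y = np.zeros(len(eps))
    for n in range(len(eps)):
        acc = gain * eps[n]
        for i in range(1, min(len(a), n) + 1):
            acc += a[i - 1] * y[n - i]
        y[n] = acc
    return y

def celp_search(target, codebook, a):
    """Return (index, gain, error) of the codevector whose synthesized output
    best matches target in the squared-error sense."""
    best = (-1, 0.0, np.inf)
    for idx, c in enumerate(codebook):
        s = lpc_synthesize(a, 1.0, c)        # synthesized speech for unit gain
        energy = np.dot(s, s)
        if energy == 0.0:
            continue
        g = np.dot(target, s) / energy       # least-squares optimal gain for this vector
        err = np.sum((target - g * s) ** 2)
        if err < best[2]:
            best = (idx, g, err)
    return best
```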

16
Sub-band adaptive filtering
  • Multi-channel speech enhancement system
  • The greater the number of sub-bands used, the
    faster the convergence of the overall system

17
Cochlear Modelling
  • Sub-band filters are distributed logarithmically
    in frequency to approximate distribution of
    filters in cochlea
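Logarithmic spacing means every band covers the same frequency ratio. A minimal sketch, assuming hypothetical limits and band count, with each center placed at the geometric mean of its edges; this is plain geometric spacing, not a physiological cochlear map.

```python
import numpy as np

def log_spaced_bands(f_lo, f_hi, n_bands):
    """Band edges spaced logarithmically from f_lo to f_hi, plus geometric-mean centers."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return edges, centers
```

With `log_spaced_bands(100.0, 4000.0, 8)` the low bands are narrow and the high bands wide, as in the cochlea, while the ratio between successive edges stays constant.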

18
Adaptive Noise Cancellation
  • LMS algorithm is used to model differential
    transfer function between noise signals in a
    number of sub-bands
  • Signal power in each sub-band is lower, and
    shorter filters are used
  • Convergence is equal across all bands if power is
    distributed equally and filter lengths are the
    same
  • Convergence dominated by sub-band with greatest
    power
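Within one sub-band, the noise canceller can be sketched as a standard LMS adaptive filter. This sketch assumes the sub-band signals are already available (the analysis filter bank and any decimation are not shown), and the tap count and step size `mu` are hypothetical. The filter learns the transfer function from the reference noise to the noise in the primary signal, and the error output is the enhanced signal.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps, mu):
    """LMS adaptive noise canceller for one sub-band.

    primary:   speech plus noise at the main input
    reference: noise-only input correlated with the noise in primary
    Returns (enhanced, w), where w models reference -> noise-in-primary.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)                 # recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf           # primary minus noise estimate
        w += mu * e * buf                  # LMS weight update
        out[n] = e
    return out, w
```

The step size bounds both stability and convergence speed, and both depend on the band's signal power; that is why convergence is equal across bands only when power is distributed equally.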