Speech Coding Techniques - PowerPoint PPT Presentation

Slides: 39
Provided by: csieN
Transcript and Presenter's Notes

Title: Speech Coding Techniques


1
Speech Coding Techniques
  • 4/7/2003

2
Introduction
  • Efficient speech-coding techniques
  • Advantages for VoIP
  • Digital streams of ones and zeros
  • The lower the bandwidth, the lower the quality
  • RTP payload types
  • Processing power
  • Better quality (for a given bandwidth) requires a
    more complex algorithm
  • A balance between quality and cost

3
Voice Quality
  • Bandwidth is easily quantified
  • Voice quality is subjective
  • MOS, Mean Opinion Score
  • ITU-T Recommendation P.800
  • Excellent 5
  • Good 4
  • Fair 3
  • Poor 2
  • Bad 1
  • A minimum of 30 people
  • They listen to voice samples or take part in conversations

4
  • P.800 recommendations
  • The selection of participants
  • The test environment
  • Explanations to listeners
  • Analysis of results
  • Toll quality
  • A MOS of 4.0 or higher
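
As a rough illustration of the scoring described on these two slides, a MOS is the arithmetic mean of the listeners' ratings on the 1-to-5 scale, and toll quality corresponds to a mean of 4.0 or higher. The function names and the sample ratings below are hypothetical, and the sketch ignores the P.800 rules on participant selection and test environment.

```python
def mean_opinion_score(ratings):
    """Average listener ratings given on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


def is_toll_quality(mos, threshold=4.0):
    """Toll quality as defined on the slide: a MOS of 4.0 or higher."""
    return mos >= threshold


# Made-up ratings for illustration only (a real test uses many more listeners).
ratings = [5, 4, 4, 3, 4, 5, 4, 3]
score = mean_opinion_score(ratings)
print(score, is_toll_quality(score))
```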

5
About Speech
  • Speech
  • Air pushed from the lungs past the vocal cords
    and along the vocal tract
  • The basic vibrations: the vocal cords
  • The sound is altered by the disposition of the
    vocal tract (tongue and mouth)
  • Model the vocal tract as a filter
  • The shape changes relatively slowly
  • The vibrations at the vocal cords
  • The excitation signal

6
Speech sounds
  • Voiced sound
  • The vocal cords vibrate open and closed
  • Quasi-periodic pulses of air
  • The rate of the opening and closing: the pitch
  • Unvoiced sounds
  • Forcing air at high velocities through a
    constriction
  • Noise-like turbulence
  • Show little long-term periodicity
  • Short-term correlations still present
  • Plosive sounds
  • A complete closure in the vocal tract
  • Air pressure is built up and released suddenly

7
Voice Sampling
  • Discrete-Time LTI Systems: The Convolution Sum
  • [Figure: plots of h[n], x[n], and y[n] against n]
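
The convolution sum named on this slide, y[n] = Σ_k x[k] h[n − k], can be sketched directly. The input and impulse-response values below are illustrative assumptions, not values recovered from the slide.

```python
def convolve(x, h):
    """Convolution sum y[n] = sum_k x[k] * h[n - k] for finite sequences."""
    if not x or not h:
        raise ValueError("both sequences must be non-empty")
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            # h is only defined for indices 0 .. len(h) - 1
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


# Assumed example: a two-sample input through a three-tap averaging-style filter.
x = [0.5, 2.0]
h = [1.0, 1.0, 1.0]
print(convolve(x, h))
```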
8
  • Nyquist sampling theorem
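
A small sketch of what the sampling theorem implies: a signal band-limited to f_max must be sampled at more than 2 × f_max, and any component above half the sampling rate folds back (aliases) into the band. The 4 kHz speech bandwidth used here is an assumption chosen to match the 8 kHz sampling rate quoted later in the slides; the function names are hypothetical.

```python
def nyquist_rate(max_frequency_hz):
    """Sampling rate that must be exceeded to avoid aliasing."""
    return 2 * max_frequency_hz


def alias_frequency(frequency_hz, sample_rate_hz):
    """Apparent frequency of a sampled tone after folding into [0, fs/2]."""
    nearest_multiple = sample_rate_hz * round(frequency_hz / sample_rate_hz)
    return abs(frequency_hz - nearest_multiple)


print(nyquist_rate(4000))           # rate needed for an assumed 4 kHz band
print(alias_frequency(5000, 8000))  # a 5 kHz tone sampled at 8 kHz
```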

9
Quantization (Scalar Quantization)
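  • Assume $|x[n]| \le A$ and divide the range $[-A, A]$ into
    $L = 2^R$ quantization levels $J_1, J_2, \ldots, J_k, \ldots, J_L$
  • $J_k = [m_{k-1}, m_k]$, with $m_0 = -A$ and $m_L = A$
  • Each quantization level $J_k$ is represented by a value $v_k$
  • $S = \bigcup_k J_k$, $V = \{v_1, v_2, \ldots, v_k, \ldots, v_L\}$
  • [Figure: the range from $m_0 = -A$ to $m_L = A$ split into
    intervals $J_k$ with representative values $v_k$]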
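
A minimal sketch of a uniform scalar quantizer following the definitions on this slide. Two choices are assumptions rather than something the slide states: the intervals are equal in width, and each representative value is the interval midpoint. Intervals are indexed from 0 here, whereas the slide counts from 1.

```python
def uniform_quantize(x, amplitude, bits):
    """Map x in [-amplitude, amplitude] to (interval index, representative value).

    The range is split into L = 2**bits equal intervals; index 0 starts at
    -amplitude. Out-of-range inputs are clipped.
    """
    levels = 2 ** bits
    step = 2 * amplitude / levels
    x = max(-amplitude, min(amplitude, x))
    # The top edge x == amplitude belongs to the last interval.
    index = min(int((x + amplitude) / step), levels - 1)
    value = -amplitude + (index + 0.5) * step  # assumed: interval midpoint
    return index, value


print(uniform_quantize(0.3, amplitude=1.0, bits=2))
```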
10
Non-Uniform Quantization
  • Concept: small quantization levels for small x, large
    quantization levels for large x
  • Goal: constant SNR_Q for all x
11
Companding
  • [Figure: compressor → bit stream (11011101) → expandor]
  • Compressor + Expandor → Compandor
  • F(x) specifies the non-uniform quantization
    characteristics
12
Non-Uniform Quantization
  • μ-law
  • A-law
  • Typical values in practice
  • $\mu = 255$, $A = 87.6$
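
To make the compressor/expandor idea concrete, here is a sketch of the continuous μ-law characteristic, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) for |x| ≤ 1, and its inverse. The function names are hypothetical, and this is the textbook formula rather than the segmented approximation used by practical codecs.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Compress x in [-1, 1]; small amplitudes are expanded, large ones squeezed."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for sample in (0.01, 0.1, 0.5, 1.0):
    compressed = mu_law_compress(sample)
    print(sample, round(compressed, 3), round(mu_law_expand(compressed), 6))
```

Applying a uniform quantizer to the compressed value gives fine steps for quiet samples and coarse steps for loud ones, which is the non-uniform behaviour the slides describe.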

13
Types of Speech Codecs
  • Waveform codecs, source codecs (also known as
    vocoders), and hybrid codecs

14
Speech Source Model and Source Coding
  • Excitation: a periodic pulse train generator (voiced) or a
    random sequence generator (unvoiced), selected by the v/u
    switch and scaled by the gain G to give u[n]
  • Excitation parameters: v/u (voiced/unvoiced), N (pitch for
    voiced), G (signal gain) → excitation signal u[n]
  • Vocal tract model: G(z), G(ω), g[n]; it filters u[n] to
    produce the speech signal x[n]
  • Vocal tract parameters: {a_k}, the LPC coefficients →
    formant structure of speech signals
  • A good approximation, though not precise enough
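
The source model on this slide can be sketched as an excitation generator feeding an all-pole filter, x[n] = u[n] + Σ_k a_k x[n − k]. The function names, the Gaussian noise source, and the single-coefficient example are assumptions for illustration; a real vocoder updates the parameters every frame.

```python
import random


def excitation(voiced, pitch_period, gain, length, seed=0):
    """u[n]: a pulse every pitch_period samples (voiced) or random noise (unvoiced)."""
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]


def vocal_tract(u, a):
    """All-pole filter: x[n] = u[n] + sum_k a[k-1] * x[n-k], k = 1 .. len(a)."""
    x = []
    for n, sample in enumerate(u):
        acc = sample
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc += coeff * x[n - k]
        x.append(acc)
    return x


u = excitation(voiced=True, pitch_period=4, gain=2.0, length=8)
print(vocal_tract(u, a=[0.5]))
```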
15
LPC Vocoder(Voice Coder)
  • Encoder: x[n] → LPC analysis → {a_k}, N, G, v/u →
    encoder → bit stream (e.g. 11011)
  • N by pitch detection, v/u by voicing detection
  • {a_k} can be non-uniformly or vector quantized to
    reduce the bit rate further
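
The slides do not say how the LPC analysis step computes {a_k}. One common approach, shown here as an assumption, is the autocorrelation method solved with the Levinson-Durbin recursion. The function names are hypothetical, and the sketch omits windowing and pitch/voicing detection.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0 .. max_lag."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a so that x[n] ~ sum_k a[k-1] * x[n-k].

    Returns (a, prediction_error). Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        error *= 1.0 - k * k
    return a, error


# Autocorrelation of an idealised first-order process with correlation 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```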
16
G.711
  • The most commonplace codec
  • Used in circuit-switched telephone network
  • PCM, Pulse-Code Modulation
  • If uniform quantization
  • 12 bits × 8 k samples/sec = 96 kbps
  • Non-uniform quantization
  • 64 kbps, the DS0 rate
  • μ-law
  • North America
  • A-law
  • Other countries, a little friendlier to lower
    signal levels
  • An MOS of about 4.3
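
The bit-rate arithmetic on this slide is simply bits per sample times samples per second. The sketch below assumes 8 bits per companded sample for G.711, which is what the DS0 rate of 64 kbps implies at 8000 samples per second; the function name is hypothetical.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz=8000):
    """Bit rate in bits per second for fixed-size PCM samples."""
    return bits_per_sample * sample_rate_hz


print(pcm_bit_rate(12))  # uniform quantization, 12 bits per sample
print(pcm_bit_rate(8))   # companded (non-uniform), assumed 8 bits per sample
```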

17
ADPCM (Adaptive Differential PCM)
  • DPCM and ADPCM
  • ADPCM = adaptive prediction in DPCM + adaptive
    quantization
  • Adaptive quantization
  • Quantization level $\Delta$ varies with local signal
    level
  • $\Delta[n] = a\,\sigma_x[n]$
  • $\sigma_x[n]$: locally estimated standard deviation of
    x[n]
  • G.721: ADPCM-coded speech at 32 kbps
  • G.726 (A-law or μ-law)
  • 16, 24, 32, 40 kbps
  • MOS 4.0 at 32 kbps
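
A minimal sketch of the differential idea behind DPCM: quantize the difference between each sample and a prediction, and let the encoder track the decoder's reconstruction so the two stay in step. The previous-sample predictor and the fixed step size are simplifying assumptions; ADPCM as described above would adapt both, and this is not the G.721/G.726 algorithm.

```python
def dpcm_encode(samples, step):
    """Return integer codes for the quantized prediction differences."""
    codes = []
    prediction = 0.0
    for x in samples:
        code = round((x - prediction) / step)
        codes.append(code)
        # Track what the decoder will reconstruct, not the true sample.
        prediction += code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


samples = [0.0, 0.4, 0.9, 1.1, 0.7]
codes = dpcm_encode(samples, step=0.25)
print(codes, dpcm_decode(codes, step=0.25))
```

Because the encoder predicts from its own reconstruction, the error on each sample stays within half a step instead of accumulating.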

18
Analysis-by-Synthesis (AbS) Codecs
  • Hybrid codec
  • Fill the gap between waveform and source codecs
  • The most successful and commonly used
  • Time-domain AbS codecs
  • Not a simple two-state, voiced/unvoiced
  • Different excitation signals are attempted
  • Closest to the original waveform is selected
  • MPE, Multi-Pulse Excited
  • RPE, Regular-Pulse Excited
  • CELP, Code-Excited Linear Predictive

19
G.728 LD-CELP
  • CELP codecs
  • A filter: its characteristics change over time
  • A codebook of acoustic vectors
  • A vector: a set of elements representing various
    characteristics of the excitation
  • Transmit
  • Filter coefficients, gain, a pointer to the
    vector chosen
  • Low Delay CELP
  • Backward-adaptive coder
  • Use previous samples to determine filter
    coefficients
  • Operates on five samples at a time
  • Delay < 1 ms
  • Only the pointer is transmitted

20
  • 1024 vectors in the code book
  • 10-bit pointer (index)
  • 16 kbps
  • LD-CELP encoder
  • Minimize a frequency-weighted mean-square error

21
  • LD-CELP decoder
  • An MOS score of about 3.9
  • One-quarter of G.711 bandwidth

22
G.723.1 ACELP
  • 6.3 or 5.3 kbps
  • Both mandatory
  • Can change from one to another during a
    conversation
  • The coder
  • A band-limited input speech signal
  • Sampled at 8 kHz, 16-bit uniform PCM quantization
  • Operate on blocks of 240 samples at a time
  • A look-ahead of 7.5 ms
  • A total algorithmic delay of 37.5 ms, plus other
    delays
  • A high-pass filter to remove any DC component

23
  • G.723.1 Annex A
  • Silence Insertion Descriptor (SID) frames of
    size four octets
  • The two lsbs of the first octet
  • 00: 6.3 kbps, 24 octets/frame
  • 01: 5.3 kbps, 20 octets/frame
  • 10: SID frame, 4 octets/frame
  • An MOS of about 3.8
  • At least 37.5 ms delay
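
A receiver can tell the three frame types apart from the two least significant bits of the first octet, as listed above. The sketch below is an assumption-level illustration with hypothetical names; it returns None for the bit pattern 11, which the slide does not describe.

```python
# Two lsbs of the first octet -> (frame kind, frame size in octets)
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}


def classify_frame(first_octet):
    """Return (kind, size_in_octets), or None for the pattern not covered here."""
    return FRAME_TYPES.get(first_octet & 0b11)


for octet in (0xFC, 0x01, 0x02, 0x03):
    print(hex(octet), classify_frame(octet))
```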

24
G.729
  • 8 kbps
  • Input frames of 10 ms, 80 samples for 8 kHz
    sampling rate
  • 5 ms look-ahead
  • Algorithmic delay of 15 ms
  • An 80-bit frame for 10 ms of speech
  • A complex codec
  • G.729A (Annex A): a number of simplifications
  • Same frame structure
  • Encoder and decoder can each be G.729 or G.729A
  • Slightly lower quality

25
  • G.729B
  • VAD, Voice Activity Detection
  • Based on analysis of several parameters of the
    input
  • The current frame plus two preceding frames
  • DTX, Discontinuous Transmission
  • Send nothing or send an SID frame
  • SID frame contains information to generate
    comfort noise
  • CNG, Comfort Noise Generation
  • G.729: an MOS of about 4.0
  • G.729A: an MOS of about 3.7
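
The VAD/DTX behaviour described on this slide can be illustrated with a deliberately simplified sketch: frames judged to contain speech are sent, the first silent frame is replaced by an SID frame, and nothing is sent for the silent frames that follow. The energy threshold is a stand-in assumption; the actual G.729B detector analyses several input parameters across the current and two preceding frames.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def dtx_decisions(frames, threshold):
    """Return 'speech', 'sid', or 'none' for each frame."""
    decisions = []
    in_silence = False
    for frame in frames:
        if frame_energy(frame) >= threshold:
            decisions.append("speech")
            in_silence = False
        elif not in_silence:
            decisions.append("sid")   # describe the background noise once
            in_silence = True
        else:
            decisions.append("none")  # receiver keeps generating comfort noise
    return decisions


frames = [[100, -100], [1, -1], [0, 0], [200, 50], [0, 1]]
print(dtx_decisions(frames, threshold=50))
```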

26
Other Codecs
  • CDMA QCELP defined in IS-733
  • Variable-rate coder
  • Two most common rates
  • The high rate, 13.3 kbps
  • A lower rate, 6.2 kbps
  • Silence suppression
  • For use with RTP, RFC 2658

27
  • GSM Enhanced Full-Rate (EFR)
  • GSM 06.60
  • An enhanced version of GSM Full-Rate
  • ACELP-based codec
  • The same bit rate and the same overall packing
    structure
  • 12.2 kbps
  • Support discontinuous transmission
  • For use with RTP, RFC 1890

28
  • GSM Adaptive Multi-Rate (AMR) codec
  • GSM 06.90
  • Eight different modes
  • 4.75 kbps to 12.2 kbps
  • 12.2 kbps, GSM EFR
  • 7.4 kbps, IS-641 (TDMA cellular systems)
  • Change the mode at any time
  • Offer discontinuous transmission
  • The coding choice of many 3G wireless networks

29
  • The MOS values are for laboratory conditions
  • G.711 does not deal with lost packets
  • G.729 can accommodate a lost frame by
    interpolating from previous frames
  • But this causes errors in subsequent speech frames
  • Processing Power
  • G.728 or G.729: 40 MIPS
  • G.726: 10 MIPS

30
  • Cascaded Codecs
  • E.g., G.711 stream -> G.729 encoder/decoder
  • The resulting quality might not even come close to G.729
  • Each coder only generates an approximation of the
    incoming signal

31
Tones, Signal, and DTMF Digits
  • The hybrid codecs are optimized for human speech
  • Other data may need to be transmitted
  • Tones: fax tones, dial tone, busy tone
  • DTMF digits for two-stage dialing or voice mail
  • G.711 is OK
  • With G.723.1 and G.729, tones can become unintelligible
  • The ingress gateway needs to intercept
  • The tones and DTMF digits
  • Use an external signaling system

32
  • Easy at the start of a call
  • Difficult in the middle of a call
  • Encode the tones differently from the speech
  • Send them along the same media path
  • An RTP packet provides the name of the tone and
    the duration
  • Or, a dynamic RTP profile: an RTP packet
    containing the frequency, volume, and duration
  • RFC 2198
  • An RTP payload format for redundant audio data
  • Sending both types of RTP payload

33
  • RTP Payload Format for DTMF Digits
  • An Internet Draft
  • Both methods described before
  • A large number of tones and events
  • DTMF digits, a busy tone, a congestion tone, a
    ringing tone, etc.
  • The named events
  • E: the end of the tone; R: reserved

34
  • Payload format
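
The slide's figure did not survive, so the sketch below assumes the four-octet named-event layout that was later published as RFC 2833: an 8-bit event code, the E (end) bit, the R (reserved) bit, a 6-bit volume, and a 16-bit duration in timestamp units. The function names are hypothetical.

```python
import struct


def pack_event(event, end, volume, duration):
    """Build the 4-octet named-event payload (assumed RFC 2833 layout)."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_volume = (0x80 if end else 0x00) | volume  # R bit left at 0
    return struct.pack("!BBH", event, flags_volume, duration)


def unpack_event(payload):
    """Return (event, end, volume, duration) from a 4-octet payload."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    return event, bool(flags_volume & 0x80), flags_volume & 0x3F, duration


payload = pack_event(event=5, end=True, volume=10, duration=800)
print(payload.hex(), unpack_event(payload))
```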

35
  • Finis

36
Discrete-Time LTI Systems: The Convolution Sum
  • [Figure: plots of h[n], x[n], and y[n] against n]
37
Frequency-Domain Representation of Sampling
38
Speech Source Model and Source Coding
  • Vocal Tract Model