Perceptual Audio Coding

1
Perceptual Audio Coding
  • By Nick Hone and Colin Carnegie
  • July 25, 2005

N
2
Why Perceptual Coding?
  • The data rate of a standard audio CD is 172
    KB/sec
  • Home Internet connections had speeds of 5 KB/sec
  • Perceptual coding exploits the imprecision of the
    human ear
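
The figures on this slide can be checked with a short calculation. This is a minimal sketch assuming standard CD audio parameters (44.1 kHz, 16-bit, stereo) and reading "KB" as 1024 bytes; the function names are illustrative only.

```python
# CD audio parameters: 44.1 kHz sampling, 16-bit samples, two channels.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2


def cd_rate_kib_per_s():
    """Uncompressed CD data rate in KiB/s (1 KiB = 1024 bytes)."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS / 1024


def required_compression(link_kib_per_s):
    """Compression ratio needed to stream CD audio in real time over a link."""
    return cd_rate_kib_per_s() / link_kib_per_s


print(round(cd_rate_kib_per_s(), 1))        # 172.3
print(round(required_compression(5.0), 1))  # 34.5
```

Under these assumptions, real-time streaming over a 5 KB/sec link needs roughly a 34:1 reduction in data rate.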

N
3
Perceptual Coding Standards
  • Audio
  • MPEG-1 Part 3 Layers 1, 2, and 3
  • Ogg Vorbis
  • Video
  • MPEG-1 Part 2, MPEG-2 Part 2
  • H.264
  • Ogg Theora

N
4
Human Auditory Limitations
  • Limited frequency resolution within critical bands
  • Loud signals mask quieter signals at adjacent frequencies
  • Quantization noise is less audible at these
    points - we can save bits!

C
5
Compression System
C
6
Filter Bank
  • 32-band polyphase filter
  • Overlapping bands
  • Non-reversible (the original signal cannot be recovered exactly)
  • Equal-width, linearly spaced bands (unlike the ear's roughly 25 critical bands)
  • Noise introduced is negligible

N
7
Psychoacoustic Model
  • Operates on blocks of 12 samples per sub-band
  • Hann-weighted FFT, with the spectrum separated into bands
  • Tonal and non-tonal power separated
  • Non-tonal power placed at the geometric mean frequency of its band
  • Masking function calculated using global and
    dynamic threshold values
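
The first steps of the model can be sketched in a few lines. This is a simplified illustration, not the standard's psychoacoustic model: the 64-point frame, the 7 dB peak margin, and the band edges passed to `geometric_mean_frequency` are assumptions chosen for the example, and a naive DFT stands in for an FFT.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def power_spectrum_db(samples):
    """Hann-weighted DFT power in dB for bins 0..N/2 (naive O(N^2) DFT)."""
    n_points = len(samples)
    weighted = [s * w for s, w in zip(samples, hann(n_points))]
    spectrum = []
    for k in range(n_points // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, x in enumerate(weighted))
        spectrum.append(10 * math.log10(abs(acc) ** 2 + 1e-12))
    return spectrum


def tonal_bins(spectrum_db, margin_db=7.0):
    """Local peaks that exceed the bins two positions away by margin_db (assumed rule)."""
    return [k for k in range(2, len(spectrum_db) - 2)
            if spectrum_db[k] > spectrum_db[k - 1]
            and spectrum_db[k] >= spectrum_db[k + 1]
            and spectrum_db[k] - spectrum_db[k - 2] >= margin_db
            and spectrum_db[k] - spectrum_db[k + 2] >= margin_db]


def geometric_mean_frequency(low_hz, high_hz):
    """Frequency at which a band's non-tonal power is concentrated."""
    return math.sqrt(low_hz * high_hz)


# A pure tone sitting exactly on bin 8 of a 64-point frame.
N = 64
tone = [math.sin(2 * math.pi * 8 * n / N) for n in range(N)]
print(tonal_bins(power_spectrum_db(tone)))        # [8]
print(geometric_mean_frequency(1000.0, 4000.0))   # 2000.0
```

Everything not flagged as tonal would be summed per band and treated as a single noise-like masker at the band's geometric mean frequency.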

C
8
Bit Allocation
  • Quantization required for each sub-band is determined
    by an iterative process
  • Starting from the total bit budget, give one bit to the
    sub-band with the highest signal-to-mask ratio,
    then recalculate the ratios
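
The iterative process can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: each extra bit per sample is taken to improve the signal-to-noise ratio by about 6.02 dB, a bit costs 12 bits per frame (one per sub-band sample), and the loop stops early once the noise is below the mask everywhere. Scale-factor cost and the standard's permitted step sizes are ignored, and the function name is hypothetical.

```python
def allocate_bits(smr_db, total_bits, samples_per_band=12, max_bits=15, db_per_bit=6.02):
    """Greedy allocation driven by the recalculated signal-to-mask ratios.

    smr_db holds the signal-to-mask ratio of each sub-band in dB.
    Returns the number of bits per sample assigned to each sub-band.
    """
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining >= samples_per_band:
        # Recalculated ratio: how far the quantization noise sits above the mask.
        nmr = [s - db_per_bit * b if b < max_bits else float("-inf")
               for s, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # assumed stopping rule: noise is already masked everywhere
        bits[worst] += 1
        remaining -= samples_per_band
    return bits


# Four sub-bands, enough budget for six one-bit steps.
print(allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=72))  # [3, 1, 0, 2]
```

The sub-band whose signal is already below the mask (-3 dB) receives no bits at all, which is where the savings come from.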

C
9
Polyphase Filter Implementation
  • Sub-band filtering achieved using a modified
    discrete cosine transform
  • Implemented in MATLAB with loops evaluating the
    filter equation (not preserved in this transcript)
  • C is the filter response
  • X holds the PCM samples
  • i denotes the current sub-band
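
The slide's equation is not in the transcript. A likely candidate, given the symbols C, X and i, is the commonly cited MPEG-1 analysis filter bank form, s[i] = sum over k=0..63 of cos((2i+1)(k-16)pi/64) * sum over j=0..7 of C[k+64j] * X[k+64j]. The sketch below evaluates that form with loops, in Python rather than the authors' MATLAB. The prototype filter is a hypothetical Hann-windowed sinc standing in for the coefficient table defined in the standard, and X[0] is assumed to be the newest sample.

```python
import math

NUM_BANDS = 32
WINDOW_LEN = 512


def prototype_lowpass():
    """HYPOTHETICAL prototype h[n]: Hann-windowed sinc, cutoff pi/64 rad/sample.

    A stand-in only; the standard tabulates its own 512 coefficients.
    """
    h = []
    for n in range(WINDOW_LEN):
        t = (n - (WINDOW_LEN - 1) / 2) / 64
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (WINDOW_LEN - 1))
        h.append(sinc * window / 64)
    return h


def analysis_window(h):
    """C[n]: h[n] with the sign flipped on every odd block of 64 samples."""
    return [-v if (n // 64) % 2 else v for n, v in enumerate(h)]


def subband_samples(x, c):
    """Folded form: window, fold 512 products into 64 sums, then cosine matrixing."""
    z = [sum(c[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]
    return [sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * z[k]
                for k in range(64)) for i in range(NUM_BANDS)]


def subband_samples_direct(x, h):
    """Equivalent direct form: 32 band-pass filters h[n] * cos((2i+1)(n-16)pi/64)."""
    return [sum(h[n] * math.cos((2 * i + 1) * (n - 16) * math.pi / 64) * x[n]
                for n in range(WINDOW_LEN)) for i in range(NUM_BANDS)]


def band_energies(signal, c, hop=32):
    """Accumulate output energy per sub-band, advancing 32 input samples per step."""
    energies = [0.0] * NUM_BANDS
    for t in range(WINDOW_LEN - 1, len(signal), hop):
        x = [signal[t - n] for n in range(WINDOW_LEN)]  # x[0] is the newest sample
        for i, s in enumerate(subband_samples(x, c)):
            energies[i] += s * s
    return energies


h = prototype_lowpass()
c = analysis_window(h)
# Test tone at the centre frequency of sub-band 5, i.e. (2*5 + 1) * pi / 64.
tone = [math.cos(11 * math.pi / 64 * m) for m in range(WINDOW_LEN + 32 * 15)]
energies = band_energies(tone, c)
print(max(range(NUM_BANDS), key=energies.__getitem__))
```

The sign flips in `analysis_window` are what make the folded form equal to the direct band-pass form, because shifting the cosine argument by a block of 64 samples changes its sign. The folded version needs far fewer multiplications per output sample.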

N
10
Bitstream Formatting 1
  • MPEG-1 Layer 1 frames are independent
  • Frames can be randomly accessed
  • Each frame stores 384 PCM samples
  • Each frame is identified by a valid header
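
The 384-sample frame fixes both the frame duration and, for a given bit rate, the frame size. A minimal sketch, assuming the usual Layer I convention of 4-byte slots with an optional padding slot; the function names are illustrative.

```python
SAMPLES_PER_FRAME = 384  # 12 samples in each of the 32 sub-bands


def frame_duration_ms(sample_rate_hz):
    """Playback time covered by one Layer I frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Layer I frame length in bytes: a whole number of 4-byte slots."""
    return (12 * bitrate_bps // sample_rate_hz + padding) * 4


print(round(frame_duration_ms(44_100), 2))   # 8.71
print(frame_size_bytes(384_000, 48_000))     # 384
print(frame_size_bytes(384_000, 44_100, 1))  # 420
```

Because every frame is self-contained and under 10 ms long, a decoder can seek to any header and start decoding from there.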

N
11
Bitstream Formatting 2
  • Header contains data such as the sampling rate and
    whether a CRC field is present
  • The infrequently used 16-bit CRC field allows
    error checking
  • All sub-bands have a bit allocation field
  • Sub-bands with zero bits do not have scale factor
    or sample fields
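
Reading the header fields mentioned above can be sketched as follows. This assumes the standard 32-bit MPEG-1 audio header layout (12-bit sync word, ID bit, 2-bit layer, protection bit, 4-bit bit-rate index, 2-bit sampling-rate index, padding bit) and handles Layer I only; the function name and the returned dictionary keys are hypothetical.

```python
SAMPLE_RATES_HZ = {0: 44_100, 1: 48_000, 2: 32_000}


def parse_layer1_header(header):
    """Decode selected fields from a 4-byte MPEG-1 Layer I frame header."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    value = int.from_bytes(header, "big")
    if value >> 20 != 0xFFF:
        raise ValueError("missing 12-bit sync word")
    if (value >> 19) & 1 != 1 or (value >> 17) & 3 != 3:
        raise ValueError("not an MPEG-1 Layer I header")
    bitrate_index = (value >> 12) & 0xF
    rate_index = (value >> 10) & 3
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or reserved values are not handled here")
    return {
        "crc_present": (value >> 16) & 1 == 0,   # protection bit 0 means a CRC follows
        "bitrate_bps": bitrate_index * 32_000,   # Layer I rates step in 32 kbit/s units
        "sample_rate_hz": SAMPLE_RATES_HZ[rate_index],
        "padding": (value >> 9) & 1,
    }


# Example header: Layer I, CRC present, 384 kbit/s, 44.1 kHz, no padding.
print(parse_layer1_header(bytes([0xFF, 0xFE, 0xC0, 0x00])))
```

A decoder hunting for the next frame would scan for the sync word and accept a candidate only if the remaining fields decode to valid values.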

N
12
Analysis Frames
N
13
Implementation-Bit Allocation
  • Bit allocation and psychoacoustic models are
    combined
  • Applied to the 32 sub-bands instead of critical
    bands

C
14
Implementation-Bit Allocation
  • Average power calculated for each sub-band
  • Sub-band power scaled by the frequency response of
    the human ear's sensitivity
  • Power of sub-bands adjacent to high-power sub-bands
    is reduced
  • Iterative bit allocation process distributes bits
    to the highest-power sub-bands
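
The four steps above can be sketched as a small pipeline. The slides do not give the actual weights or reduction rule, so everything numeric here is an assumption: the flat ear weighting, the 10% neighbour spread, and the factor-of-four drop in residual power per allocated bit (about 6 dB). All function names are hypothetical.

```python
def average_power(subband_samples):
    """Mean squared value of the samples in each sub-band."""
    return [sum(v * v for v in band) / len(band) for band in subband_samples]


def apply_ear_weighting(power, weights):
    """Scale each sub-band by an (assumed) ear-sensitivity weight."""
    return [p * w for p, w in zip(power, weights)]


def reduce_neighbours(power, spread=0.1):
    """Lower each sub-band by a fraction of its louder neighbour, floored at zero."""
    reduced = []
    for i, p in enumerate(power):
        left = power[i - 1] if i > 0 else 0.0
        right = power[i + 1] if i + 1 < len(power) else 0.0
        reduced.append(max(0.0, p - spread * max(left, right)))
    return reduced


def allocate_by_power(power, bit_budget, max_bits=15):
    """Give bits one at a time to the sub-band with the most residual power."""
    bits = [0] * len(power)
    residual = list(power)
    for _ in range(bit_budget):
        best = max(range(len(residual)), key=residual.__getitem__)
        if residual[best] <= 0.0:
            break  # nothing audible left to encode
        bits[best] += 1
        # Each bit is assumed to cut the remaining noise power by a factor of 4.
        residual[best] = 0.0 if bits[best] >= max_bits else residual[best] / 4.0
    return bits


# Four sub-bands with three samples each (a real frame has 32 bands of 12 samples).
samples = [[4, -4, 4], [1, 1, -1], [2, -2, 2], [0, 0, 0]]
power = apply_ear_weighting(average_power(samples), [1.0, 1.0, 1.0, 1.0])
print(allocate_by_power(reduce_neighbours(power), bit_budget=4))  # [3, 0, 1, 0]
```

In this example the quiet second sub-band sits next to the loudest one, so its power is reduced to zero and it receives no bits.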

C
15
Power in a Signal
C
16
Bit Allocation
C
17
Test Signal Spectrogram
N
18
Test Signal
  • Spectrum spans 0 to 20 kHz
  • Most power near 0 Hz
  • Noise is relatively low (blue-green)

N
19
Compressed Signal Spectrogram
N
20
Compressed Signal
  • Spectrum spans 0 to 4 kHz
  • Almost all power concentrated in 0 to 4 kHz
  • High frequencies replaced with noise

N
21
Conclusion
  • MPEG is really, really hard
  • Concepts are simple to prove
  • The devil is in the details

C