A Tutorial on MPEG/Audio Compression
  • Davis Pan, IEEE Multimedia Journal,
  • Summer 1995
  • Presented by
  • Randeep Singh Gakhal
  • CMPT 820, Spring 2004

Outline
  • Introduction
  • Technical Overview
  • Polyphase Filter Bank
  • Psychoacoustic Model
  • Coding and Bit Allocation
  • Conclusions and Future Work

Introduction
  • What does MPEG-1 Audio provide?
  • A transparently lossy audio compression system
    that exploits the weaknesses of the human ear.
  • Can provide compression by a factor of 6 and
    retain sound quality.
  • One part of a three part standard that includes
    audio, video, and audio/video synchronization.

Technical Overview
MPEG-I Audio Features
  • PCM sampling rate of 32, 44.1, or 48 kHz
  • Four channel modes
  • Monophonic and Dual-monophonic
  • Stereo and Joint-stereo
  • Three modes (layers in MPEG-I speak)
  • Layer I: Computationally cheapest, highest bit
    rates of the three
  • Layer II: Bit rate 128 kbps, used in VCD
  • Layer III: Most complicated encoding/decoding,
    bit rates 64 kbps, originally intended for
    streaming audio

Human Audio System (ear + brain)
  • Human sensitivity to sound is non-linear across
    the audible range (20 Hz to 20 kHz)
  • The audible range is broken into regions within
    which humans cannot perceive a difference in
    frequency,
  • called the critical bands

MPEG-I Encoder Architecture [1]
MPEG-I Encoder Architecture
  • Polyphase Filter Bank: Transforms PCM samples to
    frequency-domain signals in 32 subbands
  • Psychoacoustic Model: Calculates acoustically
    irrelevant parts of the signal
  • Bit Allocator: Allots bits to subbands according
    to input from the psychoacoustic calculation.
  • Frame Creation: Generates an MPEG-I compliant
    bitstream

The Polyphase Filter Bank
Polyphase Filter Bank
  • Divides audio signal into 32 equal width subband
    streams in the frequency domain.
  • Inverse filter at decoder cannot recover signal
    without some, albeit inaudible, loss.
  • Based on work by Rothweiler [2].
  • Standard specifies a 512-coefficient analysis
    window, C[n]

Polyphase Filter Bank
  • Buffer of 512 PCM samples, with 32 new samples,
    X[n], shifted in every computation cycle
  • Calculate window samples for i = 0..511:
    Z[i] = C[i] * X[i]
  • Partial calculation for i = 0..63:
    Y[i] = sum over j = 0..7 of Z[i + 64j]
  • Calculate 32 subband samples for i = 0..31:
    S[i] = sum over k = 0..63 of M[i][k] * Y[k]
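The three steps above can be sketched directly in NumPy. This is a minimal sketch of one computation cycle, assuming the 512 analysis-window coefficients C from the standard are supplied by the caller:

```python
import numpy as np

def polyphase_analysis(x_buffer, C):
    """One cycle of the MPEG-1 analysis filter bank (sketch).

    x_buffer : the 512 most recent PCM samples
    C        : the 512-coefficient analysis window from the standard
    Returns the 32 subband samples S[0..31].
    """
    # Step 1: window the samples, Z[i] = C[i] * X[i], i = 0..511
    Z = C * x_buffer
    # Step 2: partial sums, Y[i] = sum_{j=0..7} Z[i + 64j], i = 0..63
    Y = Z.reshape(8, 64).sum(axis=0)
    # Step 3: matrixing, S[i] = sum_{k=0..63} M[i,k] * Y[k], i = 0..31
    i = np.arange(32)[:, None]
    k = np.arange(64)[None, :]
    M = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)
    return M @ Y
```

Each call consumes 32 new samples shifted into the front of the buffer, producing one sample per subband.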

Polyphase Filter Bank
  • Visualization of the filter [1]

Polyphase Filter Bank
  • The net effect:
    S[i] = sum over k = 0..63 and j = 0..7 of
    M[i][k] * C[k + 64j] * X[k + 64j]
  • Analysis matrix: M[i][k] = cos((2i + 1)(k - 16) pi / 64)
  • Requires 512 + 32x64 = 2560 multiplies.
  • Each subband has bandwidth pi/32T, centered at odd
    multiples of pi/64T

Polyphase Filter Bank
  • Shortcomings
  • Equal-width filters do not correspond to the
    critical-band model of the auditory system.
  • The filter bank and its inverse are NOT lossless.
  • Frequency overlap between adjacent subbands.

Polyphase Filter Bank
  • Comparison of filter banks and critical bands [1]

Polyphase Filter Bank
  • Frequency response of one subband [1]

Psychoacoustic Model
The Weakness of the Human Ear
  • Frequency-dependent resolution
  • We do not have the ability to discern minute
    differences in frequency within a critical band
  • Auditory masking
  • When two signals of very close frequency are both
    present, the louder will mask the softer.
  • A masked signal must be louder than some
    threshold for it to be heard, which gives us room
    to introduce inaudible quantization noise.

MPEG-I Psychoacoustic Models
  • MPEG-I standard defines two models
  • Psychoacoustic Model 1
  • Less computationally expensive
  • Makes some serious compromises in what it assumes
    a listener cannot hear
  • Psychoacoustic Model 2
  • Provides more features suited for Layer III
    coding, assuming, of course, an increased
    processing load

Psychoacoustic Model
  • Convert samples to frequency domain
  • Use a Hann weighting and then a DFT
  • Simply gives a frequency-domain representation
    free of edge artifacts (from the finite window
    size).
  • Model 1 uses 512 (Layer I) or 1024 (Layers II and
    III) sample window.
  • Model 2 uses a 1024 sample window and two
    calculations per frame.
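The window-then-transform step can be sketched as follows; the dB reference here is arbitrary (the real model normalizes to a fixed playback level):

```python
import numpy as np

def perceptual_spectrum(samples):
    """Hann-weighted DFT of one analysis window (sketch).

    Model 1 uses 512 samples for Layer I, 1024 for Layers II/III.
    """
    n = len(samples)
    window = np.hanning(n)              # Hann weighting suppresses edge artifacts
    spectrum = np.fft.rfft(samples * window)
    # Power per frequency bin in dB (small epsilon avoids log of zero)
    power_db = 10 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    return power_db
```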

Psychoacoustic Model
  • Need to separate sound into tones and noise
  • Model 1
  • Local peaks are tones; lump the remaining
    spectrum per critical band into noise at a
    representative frequency
  • Model 2
  • Calculates a "tonality index" to determine the
    likelihood of each spectral point being a tone,
    based on the previous two analysis windows
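Model 1's peak picking can be sketched as a simple local-maximum scan. This is an illustrative simplification: the actual model additionally requires a peak to exceed bins a few positions away by a fixed dB margin:

```python
def find_tonal_peaks(power_db):
    """Flag local spectral maxima as candidate tones (Model 1 style sketch).

    power_db : power spectrum in dB, one value per frequency bin
    Returns the bin indices of candidate tonal components.
    """
    peaks = []
    for k in range(1, len(power_db) - 1):
        # A bin is a candidate tone if it exceeds both neighbours
        if power_db[k] > power_db[k - 1] and power_db[k] >= power_db[k + 1]:
            peaks.append(k)
    return peaks
```

Everything not flagged as tonal is lumped, per critical band, into a single noise component.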

Psychoacoustic Model
  • Smear each signal within its critical band
  • Use either a masking (Model 1) or a spreading
    function (Model 2).
  • Adjust the calculated threshold by incorporating
    the quiet mask: the masking threshold for each
    frequency when no other frequencies are present.
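The quiet mask (absolute threshold of hearing) is not given on the slide, but a widely used closed-form approximation, due to Terhardt, can serve as a sketch:

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's approximation; not the exact table in the standard)."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

The curve dips around 3-4 kHz, where the ear is most sensitive, and rises steeply at both ends of the audible range.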

Psychoacoustic Model
  • Calculate a masking threshold for each subband in
    the polyphase filter bank
  • Model 1
  • Selects the minimum of the masking-threshold
    values in the range of each subband
  • Inaccurate at higher frequencies: recall that
    subbands are linearly distributed, but critical
    bands are NOT!
  • Model 2
  • If subband wider than critical band
  • Use minimal masking threshold in subband
  • If critical band wider than subband
  • Use average masking threshold in subband
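The Model 2 min/avg rule above can be sketched like this, with a simplifying assumption (labelled in the code) that the spectral bins divide evenly across the 32 subbands:

```python
import numpy as np

def threshold_per_subband(bin_thresholds, subband_is_wider):
    """Model 2 rule (sketch): per subband, take the minimum masking
    threshold if the subband is wider than the local critical band,
    else the average.

    bin_thresholds   : masking threshold per spectral bin
                       (assumed evenly divisible into 32 subbands)
    subband_is_wider : bool per subband; True at low frequencies, where
                       one equal-width subband spans several narrow
                       critical bands
    """
    bins = np.asarray(bin_thresholds, dtype=float).reshape(32, -1)
    return np.array([b.min() if wider else b.mean()
                     for b, wider in zip(bins, subband_is_wider)])
```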

Psychoacoustic Model
  • The hard work is done; now we just calculate
    the signal-to-mask ratio (SMR) per subband
  • SMR = signal energy / masking threshold
  • We pass our result on to the coding unit, which
    can now produce a compressed bitstream
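Expressed in dB, the per-subband SMR handed to the bit allocator is simply:

```python
import math

def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio in dB for one subband:
    SMR = signal energy / masking threshold, converted to dB."""
    return 10 * math.log10(signal_energy / masking_threshold)
```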

Psychoacoustic Model (example)
  • Input [1]

Psychoacoustic Model (example)
  • Transformation to perceptual domain [1]

Psychoacoustic Model (example)
  • Calculation of masking thresholds [1]

Psychoacoustic Model (example)
  • Signal-to-mask ratios [1]

Psychoacoustic Model (example)
  • What we actually send [1]

Coding and Bit Allocation
Layer Specific Coding
  • Layer-specific frame formats [1]

Layer Specific Coding
  • Stream of samples is processed in groups [1]

Layer I Coding
  • Group 12 samples from each subband and encode
    them in each frame (384 samples total)
  • Each group is encoded with 0-15 bits/sample
  • Each group has a 6-bit scale factor
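The grouping scheme above can be sketched as follows. The scale-factor table here is illustrative, not the 64-entry table from the standard:

```python
import numpy as np

def encode_group(samples, n_bits):
    """Quantize one group of 12 subband samples (Layer I style sketch).

    A 6-bit index into a scale-factor table normalizes the group; each
    sample is then quantized uniformly with n_bits (0-15) per sample.
    """
    # Hypothetical 64-entry scale-factor table (roughly 2 dB steps)
    scalefactors = 2.0 ** (np.arange(64) / 3.0)
    peak = np.max(np.abs(samples))
    # Smallest scale factor that covers the group's peak amplitude
    sf_index = min(int(np.searchsorted(scalefactors, peak)), 63)
    if n_bits == 0:
        return sf_index, np.zeros(len(samples), dtype=int)
    levels = 2 ** n_bits - 1
    normalized = samples / scalefactors[sf_index]   # now within [-1, 1]
    codes = np.round((normalized + 1) / 2 * levels).astype(int)
    return sf_index, codes
```

Layer I transmits one such scale factor and one allocation per subband per frame.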

Layer II Coding
  • Similar to Layer I except
  • Groups are now 3 sets of 12 samples per subband,
    so 1152 samples per frame
  • Can have up to 3 scale factors per subband to
    avoid audible distortion in special cases
  • Called scale factor selection information (SCFSI)

Layer III Coding
  • Further subdivides subbands using the Modified
    Discrete Cosine Transform (MDCT), a lossless
    transform
  • Larger frequency resolution means smaller time
    resolution, and thus the
  • possibility of pre-echo
  • The Layer III encoder can detect and reduce
    pre-echo by borrowing bits from future encodings
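A direct (non-optimized) MDCT can be sketched as below; with 50%-overlapped blocks the time-domain aliasing cancels on the inverse, which is why the transform itself loses no information. Windowing and the fast lapped-transform tricks used in real encoders are omitted:

```python
import numpy as np

def mdct(x):
    """MDCT of a 2N-sample block into N coefficients (sketch).

    X[k] = sum_{n=0..2N-1} x[n] cos(pi/N (n + 0.5 + N/2)(k + 0.5))
    """
    N2 = len(x)            # block length, 2N
    N = N2 // 2
    n = np.arange(N2)[None, :]
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ x
```

Layer III's long blocks span 36 subband samples, yielding 18 MDCT coefficients per subband.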

Bit Allocation
  • Determine number of bits to allot for each
    subband given SMR from psychoacoustic model.
  • Layers I and II
  • Calculate the mask-to-noise ratio:
  • MNR = SNR - SMR (in dB)
  • SNR given by the MPEG-I standard (as a function
    of quantization levels)
  • Now iterate until no bits are left to allocate:
  • Allocate bits to the subband with the lowest MNR.
  • Re-calculate the MNR for the subband allocated
    more bits.
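The greedy loop above can be sketched like this. The SNR table and per-step bit cost are illustrative stand-ins for the tables in the standard:

```python
def allocate_bits(smr, snr_table, bit_pool, step_bits=16):
    """Greedy Layer I/II-style bit allocation (sketch).

    smr       : signal-to-mask ratio per subband, in dB
    snr_table : SNR in dB achieved at allocation level 0, 1, 2, ...
                (stand-in for the table in the standard)
    bit_pool  : bits available for the frame
    step_bits : cost of raising one subband's allocation by one level
                (illustrative)
    Returns the allocation level chosen for each subband.
    """
    levels = [0] * len(smr)
    mnr = [snr_table[0] - s for s in smr]       # MNR = SNR - SMR (dB)
    while bit_pool >= step_bits:
        # Give bits to the subband where quantization noise is closest
        # to (or furthest above) the masking threshold
        worst = min(range(len(smr)), key=lambda i: mnr[i])
        if levels[worst] + 1 >= len(snr_table):
            break                               # cannot refine further
        levels[worst] += 1
        mnr[worst] = snr_table[levels[worst]] - smr[worst]
        bit_pool -= step_bits
    return levels
```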

Bit Allocation
  • Layer III
  • Employs noise allocation
  • Quantizes each spectral value and employs Huffman
    coding
  • If Huffman encoding results in noise in excess of
    the allowed distortion for a subband, the encoder
    increases the resolution on that subband
  • Whole process repeats until one of three
    specified stop conditions is met.

Conclusions and Future Work
  • MPEG-I provides tremendous compression for
    relatively cheap computation.
  • Not suitable for archival or audiophile-grade
    music, as very seasoned listeners can discern
    artifacts
  • Modifying or searching MPEG-I content requires
    decompression and is not cheap!

Future Work
  • MPEG-1 audio lays the foundation for all modern
    audio compression techniques
  • Lots of progress since then (1994!)
  • MPEG-2 (1996) extends MPEG audio compression to
    support 5.1 channel audio
  • MPEG-4 (1998) attempts to code based on perceived
    audio objects in the stream
  • Finally, MPEG-7 (2001) operates at an even higher
    level of abstraction, focusing on meta-data
    coding to make content searchable and retrievable

  • [1] D. Pan, "A Tutorial on MPEG/Audio
    Compression," IEEE Multimedia Journal, 1995.
  • [2] J. H. Rothweiler, "Polyphase Quadrature
    Filters: A New Subband Coding Technique," Proc.
    of the Int. Conf. IEEE ASSP, 27.2, pp. 1280-1283,
    Boston, 1983.