Audio Compression Techniques - PowerPoint PPT Presentation




Slides: 27
Provided by: Pau169

Audio Compression Techniques
  • MUMT 611, January 2005
  • Assignment 2
  • Paul Kolesnik

  • Digital Audio Compression
  • Removal of redundant or otherwise irrelevant
    information from audio signal
  • Audio compression algorithms are often referred
    to as audio encoders
  • Applications
  • Reduces required storage space
  • Reduces required transmission bandwidth

Audio Compression
  • Audio signal overview
  • Sampling rate (number of samples per second)
  • Bit rate (number of bits per second). Typically, an
    uncompressed stereo 16-bit 44.1 kHz signal has a
    bit rate of about 1.4 Mbps (megabits per second)
  • Number of channels (mono / stereo / multichannel)
  • Reduction by lowering those values or by data
    compression / encoding
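
The 1.4 Mbps figure above follows directly from the three signal parameters. A minimal sketch of the arithmetic (the function name pcm_bit_rate is only illustrative):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Stereo, 16-bit, 44.1 kHz (CD-style) audio
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)              # 1411200 bits per second
print(cd_rate / 1_000_000)  # 1.4112 megabits per second
print(cd_rate / 8 / 1000)   # 176.4 kilobytes per second
```

Halving any one of the three parameters halves the bit rate, which is the "reduction by lowering those values" option; data compression aims to reduce the rate without doing so.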

Audio Data Compression
  • Redundant information
  • Implicit in the remaining information
  • Ex. oversampled audio signal
  • Irrelevant information
  • Perceptually insignificant
  • Cannot be recovered from remaining information

Audio Data Compression
  • Lossless Audio Compression
  • Removes redundant data
  • Resulting signal is the same as the original (perfect reconstruction)
  • Lossy Audio Encoding
  • Removes irrelevant data
  • Resulting signal is similar to original

Audio Data Compression
  • Audio vs. Speech Compression Techniques
  • Speech Compression uses a human vocal tract model
    to compress signals
  • Audio Compression does not use this technique due
    to larger variety of possible signal variations

Generic Audio Encoder
  • [Figure not included in transcript]

Generic Audio Encoder
  • Psychoacoustic Model
  • Psychoacoustics: the study of how sounds are
    perceived by humans
  • Uses perceptual coding
  • Eliminates information from the audio signal that
    is inaudible to the ear
  • Detects conditions under which different audio
    signal components mask each other

Psychoacoustic Model
  • Signal Masking
  • Threshold cut-off
  • Spectral (Frequency / Simultaneous) Masking
  • Temporal Masking
  • Threshold cut-off and spectral masking occur in the
    frequency domain; temporal masking occurs in the
    time domain

Signal Masking
  • Threshold cut-off
  • Hearing threshold level is a function of frequency
  • Any frequency components below the threshold will
    not be perceived by the human ear
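
As a rough illustration of a frequency-dependent hearing threshold, the sketch below uses Terhardt's widely quoted approximation of the threshold in quiet. That formula is an assumption brought in for illustration; the slides do not say which threshold curve an encoder uses, and the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's formula, assumed here)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(freq_hz, level_db_spl):
    """A component is kept only if its level lies above the threshold."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 16000):
    print(freq, round(threshold_in_quiet_db(freq), 1))
```

With this curve the ear is most sensitive around 3 to 4 kHz, so a 10 dB SPL tone would be kept at 3.3 kHz but discarded at 100 Hz.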

Signal Masking
  • Spectral Masking
  • A frequency component can be partly or fully
    masked by another component that is close to it
    in frequency
  • This shifts the hearing threshold

Signal Masking
  • Temporal Masking
  • A quieter sound can be masked by a louder sound
    if they are temporally close
  • Sounds that occur both (shortly) before and after
    volume increase can be masked

Spectral Analysis
  • Tasks of Spectral Analysis
  • To derive masking thresholds to determine which
    signal components can be eliminated
  • To generate a representation of the signal to
    which masking thresholds can be applied
  • Spectral Analysis is done through transforms or
    filter banks

Spectral Analysis
  • Transforms
  • Fast Fourier Transform (FFT)
  • Discrete Cosine Transform (DCT) - similar to FFT
    but uses cosine values only
  • Modified Discrete Cosine Transform (MDCT) (used
    by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3):
    overlapped and windowed version of DCT
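
To make "overlapped and windowed" concrete, here is a direct (slow, O(N^2)) MDCT sketch in plain Python. It assumes a sine window, 50% block overlap, and an even half-block size; real codecs use fast algorithms and their own window shapes, so treat this as an illustration rather than any codec's actual transform.

```python
import math


def sine_window(half):
    """Sine window of length 2*half; satisfies w[n]^2 + w[n+half]^2 == 1."""
    size = 2 * half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(block, window):
    """Map 2*half windowed time samples to half coefficients."""
    half = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, half)
                for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs, window):
    """Map half coefficients back to 2*half windowed, time-aliased samples."""
    half = len(coeffs)
    return [window[n] * (2.0 / half)
            * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def roundtrip(signal, half):
    """Analyse 50%-overlapped blocks, then overlap-add the inverse blocks."""
    window = sine_window(half)
    pad = (-len(signal)) % half
    padded = [0.0] * half + list(signal) + [0.0] * (pad + half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        coeffs = mdct(padded[start:start + 2 * half], window)
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value  # aliasing cancels between neighbours
    return out[half:half + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(21)]
restored = roundtrip(signal, half=4)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # close to zero
```

Each block of 2N samples yields only N coefficients, so the 50% overlap does not increase the amount of data; the original is recovered only after neighbouring blocks are added together.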

Spectral Analysis
  • Filter Banks
  • Time sample blocks are passed through a set of
    bandpass filters
  • Masking thresholds are applied to resulting
    frequency subband signals
  • Polyphase and wavelet banks are the most popular
    filter structures

Filter Bank Structures
  • Polyphase Filter Bank
    (used in all of the MPEG-1 encoders)
  • Signal is separated into subbands, the widths of
    which are equal over the entire frequency range
  • The resulting subband signals are downsampled to
    create shorter signals (which are later
    reconstructed during decoding process)
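
The split, downsample, and reconstruct cycle can be sketched with the simplest possible case: a two-band Haar filter bank. This is a stand-in chosen for brevity, not the 32-band polyphase bank of MPEG-1, and the function names are illustrative.

```python
def analyze(samples):
    """Split an even-length signal into two equal-width subbands.

    Each subband is downsampled by 2, so the total number of values
    stays the same as in the input.
    """
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high


def synthesize(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    samples = []
    for l, h in zip(low, high):
        samples.extend([l + h, l - h])
    return samples


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = analyze(signal)
print(low)   # [5.0, 11.0, 7.0, 5.0]
print(high)  # [-1.0, -1.0, 1.0, 0.0]
print(synthesize(low, high) == signal)  # True
```

The low band carries the slowly varying part and the high band the sample-to-sample detail; each is half as long as the input, which is the "shorter signals" effect of downsampling.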

Filter Bank Structures
  • Wavelet Filter Bank
    (used by the Enhanced Perceptual Audio Coder (EPAC)
    by Lucent)
  • Unlike the polyphase filter bank, the subbands do
    not have equal widths (narrower for lower
    frequencies, wider for higher frequencies)
  • This allows for better time resolution at high
    frequencies (e.g. short attacks), but at the
    expense of frequency resolution

Noise Allocation
  • System task: derive and apply the shifted hearing
    threshold to the input signal
  • Anything below the threshold doesn't need to be
    encoded
  • Any noise below the threshold is irrelevant
  • Frequency component quantization
  • Tradeoff between space and noise
  • Encoder saves on space by using just enough bits
    for each frequency component to keep noise under
    the threshold - this is known as noise allocation
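
A minimal sketch of the "just enough bits" idea, assuming the common rule of thumb that each quantizer bit buys roughly 6 dB of signal-to-noise ratio. The band levels, the threshold values, and the function names are hypothetical; real encoders also work within an overall bit budget, which this sketch ignores.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer


def bits_for_band(signal_db, mask_db):
    """Smallest bit count that pushes quantization noise below the mask."""
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # the band itself is masked and needs no bits
    return math.ceil(signal_to_mask / DB_PER_BIT)


def allocate(bands):
    """bands: list of (signal level in dB, masking threshold in dB)."""
    return [bits_for_band(signal_db, mask_db) for signal_db, mask_db in bands]


print(allocate([(60, 40), (30, 35), (50, 44)]))  # [4, 0, 1]
```

The second band lies entirely under its threshold and gets no bits, while the first needs four bits to keep its noise about 20 dB below the signal.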

Noise Allocation
  • Pre-echo
  • In case a single audio block contains silence
    followed by a loud attack, pre-echo error occurs
    - there will be audible noise in the silent part
    of the block after decoding
  • This is avoided by pre-monitoring audio data at
    encoding stage and separating audio into shorter
    blocks in potential pre-echo case
  • This does not completely eliminate pre-echo, but
    can make it short enough to be masked by the
    attack (temporal masking)
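
One simple way to "pre-monitor" a block is to compare the energy of consecutive sub-blocks and flag a sudden jump. The sketch below does that; the number of parts, the ratio, and the noise floor are hypothetical tuning values, and production encoders use more elaborate criteria.

```python
def energy(samples):
    return sum(s * s for s in samples)


def needs_short_blocks(block, parts=4, ratio=10.0, floor=1e-9):
    """True if some sub-block is much louder than the one just before it."""
    size = len(block) // parts
    energies = [energy(block[i * size:(i + 1) * size]) for i in range(parts)]
    return any(later > ratio * max(earlier, floor)
               for earlier, later in zip(energies, energies[1:]))


silence_then_attack = [0.0] * 12 + [1.0] * 4
steady_tone = [0.5, -0.5] * 8

print(needs_short_blocks(silence_then_attack))  # True
print(needs_short_blocks(steady_tone))          # False
```

When the check fires, the encoder would switch to shorter blocks so that quantization noise cannot spread far ahead of the attack.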

Pre-echo Effect
  • [Figure not included in transcript]

Additional Encoding Techniques
  • Other encoding techniques are available
    (as alternatives or in combination)
  • Predictive Coding
  • Coupling / Delta Encoding
  • Huffman Encoding

Additional Encoding Techniques
  • Predictive Coding
  • Often used in speech and image compression
  • Estimates the expected value for each sample
    based on previous sample values
  • The encoder transmits/stores the difference between
    the expected and the actual value
  • The decoder generates the same estimate for each
    sample and then adjusts it by the difference stored
    for that sample
  • Used for additional compression in MPEG-2 AAC
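
The encoder/decoder symmetry of predictive coding can be sketched with the simplest predictor, "the next sample equals the previous one". That predictor is an assumption made for brevity; practical codecs use higher-order and often adaptive predictors.

```python
def predictive_encode(samples):
    """Store each sample as the difference from its predicted value."""
    residuals = []
    previous = 0  # both sides agree to start the prediction at zero
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def predictive_decode(residuals):
    """Repeat the same prediction and correct it with the stored difference."""
    samples = []
    previous = 0
    for residual in residuals:
        previous = previous + residual
        samples.append(previous)
    return samples


samples = [100, 102, 105, 104, 104]
residuals = predictive_encode(samples)
print(residuals)                                # [100, 2, 3, -1, 0]
print(predictive_decode(residuals) == samples)  # True
```

After the first value the residuals stay small, so they can be represented with fewer bits than the raw samples.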

Additional Encoding Techniques
  • Coupling / Delta encoding
  • Used in cases where audio signal consists of two
    or more channels (stereo or surround sound)
  • Similarities between channels are used for
    compression
  • A sum and a difference between two channels are
    derived; the difference is usually some value close
    to zero and therefore requires less space to store
  • This is a lossless encoding process
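
A small sketch of sum/difference coding on integer samples, showing why the step loses nothing. The function names are illustrative.

```python
def to_sum_diff(left, right):
    """Replace (left, right) with (sum, difference) per sample."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Invert the transform; total+diff and total-diff are always even."""
    left = [(t + d) // 2 for t, d in zip(total, diff)]
    right = [(t - d) // 2 for t, d in zip(total, diff)]
    return left, right


left = [1000, 1010, 990]
right = [998, 1011, 990]
total, diff = to_sum_diff(left, right)
print(total)  # [1998, 2021, 1980]
print(diff)   # [2, -1, 0]
print(from_sum_diff(total, diff) == (left, right))  # True
```

For similar channels the difference signal hovers near zero, which is what makes it cheap to store.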

Additional Encoding Techniques
  • Huffman Coding
  • Information-theory-based technique
  • An element that recurs often in the signal is
    represented by a shorter code, and the mapping
    between codes and values is stored in a look-up
    table
  • Implemented using look-up tables in the encoder and
    in the decoder
  • Provides substantial lossless compression, but
    requires high computational power and therefore
    is not very popular
  • Used by MPEG-1 Layer III and MPEG-2 AAC
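
The look-up-table idea can be sketched as follows. This version builds the table from the data itself, which is an assumption made for illustration; audio codecs typically ship predefined tables instead. The function names are hypothetical.

```python
import heapq
from collections import Counter


def build_codes(symbols):
    """Return a {symbol: bit string} table; frequent symbols get short codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree)
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)
        count2, _, table2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in table1.items()}
        merged.update({s: "1" + code for s, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def huffman_decode(bits, codes):
    lookup = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # codes are prefix-free, so a match is final
            out.append(lookup[current])
            current = ""
    return out


data = [0, 0, 0, 0, 0, 1, 1, -1, 2]
codes = build_codes(data)
bits = huffman_encode(data, codes)
print(len(bits))                            # 15, versus 18 with fixed 2-bit codes
print(huffman_decode(bits, codes) == data)  # True
```

The decoder needs the same table as the encoder, which is why the table is either transmitted with the data or fixed in advance.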

Encoding - Final Stages
  • Audio data packed into frames
  • Frames stored or transmitted

  • Questions