An Overview of Perceptual Audio Coding and MPEG AAC


Slides: 42
Provided by: Author197
Title: An Overview of Perceptual Audio Coding and MPEG AAC

An Overview of Perceptual Audio Coding and MPEG AAC
  • Audio coding or audio compression algorithms are
    used to obtain compact digital representation of
    high-fidelity (wideband) audio signals for the
    purpose of efficient transmission or storage.
  • The central objective in audio coding is to
    represent the signal with the minimum number of
    bits while achieving transparent signal
    reproduction, i.e. generating output audio that
    cannot be distinguished from the original input
    even by a listener with "golden ears".
  • The Moving Picture Experts Group (MPEG) audio
    compression algorithm is an International
    Organization for Standardization (ISO) standard
    for high-fidelity audio compression.

  • MPEG audio compression standards are lossy audio
    coding standards. They compress audio by
    reducing perceptual and statistical redundancy.
  • The basic task of a perceptual audio coding
    system is to compress the digital audio data in a
    way that
  • - the compression is as high as possible, and
  • - the reconstructed (decoded) audio sounds
    exactly like (or as close as possible to) the
    original audio before compression.

Audio Coding Techniques
  • Parametric Coding
  • Waveform Coding
      • Time Domain: PCM, DPCM, ADPCM etc.
      • Frequency Domain: Transform Coding, Subband Coding
  • Hybrid Coding

Perceptual Audio Coding Basics
  • Human hearing is limited to frequencies below
    20 kHz in most cases
  • Human hearing is insensitive to quiet frequency
    components that accompany other, stronger
    frequency components
  • Stereo audio streams contain largely redundant
    information
  • MPEG audio compression takes advantage of these
    facts to reduce the extent and detail of mostly
    inaudible frequency ranges

Generic Perceptual Audio Coding Architecture
Psychoacoustic Principles
  • High-precision engineering models for
    high-fidelity audio currently do not exist. So,
    audio coding algorithms rely upon generalized
    receiver models to optimize coding efficiency.
  • In the case of audio, the receiver is ultimately
    the human ear and sound perception is affected by
    its masking properties.
  • Perceptual audio coders achieve compression by
    exploiting the fact that irrelevant signal
    information is not detectable by even a well
    trained or sensitive listener.

  • Irrelevant signal information is identified
    during signal analysis by incorporating into the
    coder several psychoacoustic principles,
    including absolute hearing thresholds, critical
    band frequency analysis, simultaneous masking,
    the spread of masking along the basilar membrane,
    and temporal masking.
  • By combining all these, a quantitative estimate
    of the fundamental limit of transparent audio
    signal compression, i.e. the perceptual entropy,
    is determined for a given audio frame.

  • Perceptual entropy denotes the minimum number of
    bits that must be allocated to a given audio
    frame to represent it in a perceptually lossless
    way.

Absolute Threshold of Hearing
  • The absolute threshold of hearing characterizes
    the amount of energy needed in a pure tone such
    that it can be detected by a listener in a
    noiseless environment.
  • It can be expressed with a non-linear function
    of the frequency f (in Hz):
  • $T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4$ (dB SPL)
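
The threshold formula above can be evaluated directly. The following is a minimal sketch, assuming f is given in Hz and converted to kHz inside the function; the function name is made up for this illustration.

```python
import math

def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Print the threshold at a few frequencies; the ear is most sensitive
# (lowest threshold) in the region of a few kHz.
for f in (100, 1000, 3300, 10000, 16000):
    print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```

The curve rises steeply towards both ends of the audible range, so coding noise is hardest to hide in the region around 3 kHz.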

  • When applied to signal compression, it could be
    interpreted as a maximum allowable energy level
    for coding distortions introduced in the
    frequency domain.
  • Using this information, the encoder tries to keep
    the noise introduced during quantization below
    this threshold.
  • As a result, the quantization noise does not
    become audible.
  • However, the detection threshold for spectrally
    complex quantization noise is a modified version
    of the absolute threshold, with its shape
    determined by the stimuli present at any given
    time.
  • Since stimuli are in general time-varying, the
    detection threshold is also a time-varying
    function of the input signal.
  • A spreading function helps to determine the
    modified detection threshold of hearing in the
    presence of stimuli in a given audio frame.

Critical Bands
  • The human ear can be viewed as a discrete set of
    band-pass filters that covers the entire 20 kHz
    frequency range.
  • The cochlea, in the inner ear, contains
    frequency-sensitive positions. Whenever a tone
    enters the cochlea, it travels until it reaches
    the position where it resonates. (The cochlea
    works as a spectrum analyzer.)
  • The critical bandwidth is a function of
    frequency that quantifies the cochlear filter
    pass bands. (unit: Bark)

  • As the center frequency increases, the critical
    bandwidth also increases.
  • Spectral analysis of audio content is performed
    using critical bands.
  • The critical bandwidth at center frequency f
    (in Hz) is given as
  • $BW_c(f) = 25 + 75 [1 + 1.4 (f/1000)^2]^{0.69}$ (Hz)
  • To convert frequency in Hz to Bark
  • $Z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan[(f/7500)^2]$ (Bark)
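
The two critical-band formulas can be sketched in a few lines. This sketch assumes f is in Hz and that the bandwidth formula uses f/1000 (frequency in kHz), which is the commonly cited form; the function names are made up for this illustration.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz onto the Bark (critical-band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark, "
          f"bandwidth {critical_bandwidth_hz(f):7.1f} Hz")
```

The bandwidth stays near 100 Hz at low center frequencies and grows quickly above roughly 500 Hz, while the whole audible range maps onto about 25 Bark.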

Figure: Idealized critical band filter bank
  • Masking refers to a process where one sound is
    rendered inaudible because of the presence of
    another sound
  • Simultaneous Masking (Frequency domain)
  • Relative shapes of the masker and maskee
    magnitude spectra determine extent of masking
  • Non-simultaneous Masking (Time domain)
  • Phase relationships between masker and
    maskee determine masking outcome.

  • Depending on the behavior of the masker and the
    maskee, the following cases arise
  • Noise Masking Tone (NMT)
  • Tone Masking Noise (TMN)
  • Noise Masking Noise (NMN)

Noise Masking Tone and Tone Masking Noise
We can see the asymmetry of masking power between
noise and tonal maskers. Significantly greater
masking power is associated with noise maskers
than with tonal maskers.
Difference between SMR, NMR and SNR
Spread of Masking
  • A masker centered within one critical band has a
    predictable effect on detection thresholds in
    other critical bands. This effect is known as
    the spread of masking.
  • It is often modeled in coding applications by an
    approximately triangular spreading function.
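
A triangular spreading function can be sketched as below. The slopes of 25 dB per Bark below the masker and 10 dB per Bark above it are assumed, commonly quoted values, and the function names are hypothetical. The offset between masker level and the resulting masked threshold is left out on purpose.

```python
def triangular_spread_db(dz_bark: float,
                         lower_slope: float = 25.0,
                         upper_slope: float = 10.0) -> float:
    """Attenuation in dB (<= 0) of a masker's effect dz_bark Barks away from it."""
    if dz_bark < 0:
        return lower_slope * dz_bark   # below the masker: steep roll-off
    return -upper_slope * dz_bark      # above the masker: shallower roll-off

def spread_masker(masker_level_db: float, masker_bark: float, band_centers_bark):
    """Spread one masker's level across the given critical-band centers."""
    return [masker_level_db + triangular_spread_db(z - masker_bark)
            for z in band_centers_bark]

# A 70 dB masker at 10 Bark, observed in the neighbouring bands.
print(spread_masker(70.0, 10.0, [8, 9, 10, 11, 12]))
```

The asymmetric slopes reflect that a masker affects the bands above it more strongly than the bands below it.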

Non-simultaneous Masking (Temporal Masking)
MPEG Audio Codec Family
  • MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
  • MPEG-1 Layer 3 (mp3)
  • MPEG-2 (ISO/IEC 13818-7) AAC
  • MPEG-4 (ISO/IEC 14496-3) AAC
  • MPEG-4 HE-AAC v2

MP3 Compression Flow Chart
Layer 3 adds a 2-stage filter bank, more frequency
resolution and improved Huffman coding to the
basic perceptual coder principle
MDCT Filter bank
QMF Filter bank
  • Bit rates available
  • In MPEG-1 Layer 3, the available bit rates are 32,
    40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224,
    256 and 320 kbit/s, and the available sampling
    frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is
    almost always used (it coincides with the
    sampling rate of compact discs), and 128 kbit/s
    has become the de facto "good enough" standard,
    although 192 kbit/s is becoming increasingly
    popular over peer-to-peer file sharing networks.
  • MPEG-2 and the unofficial MPEG-2.5 add some
    further bit rates (8, 16, 24, 32, 40, 48, 56, 64,
    80, 96, 112, 128, 144, 160 kbit/s) while
    providing lower sampling frequencies (8, 11.025,
    12, 16, 22.05 and 24 kHz).
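
To put these bit rates in perspective, the sketch below compares them with uncompressed CD audio and computes the size of one MPEG-1 Layer 3 frame. It assumes 16-bit stereo PCM as the reference and 1152 samples per frame; the function names are made up for this illustration.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 2) -> int:
    """Bit rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

def compression_ratio(mp3_bitrate_bps: int, sample_rate_hz: int = 44100) -> float:
    """How many times smaller the MP3 stream is than 16-bit stereo PCM."""
    return pcm_bitrate_bps(sample_rate_hz) / mp3_bitrate_bps

def mpeg1_layer3_frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Frame length in bytes: 1152 samples per frame, i.e. 1152 / 8 = 144."""
    return 144 * bitrate_bps // sample_rate_hz + padding

for kbps in (128, 192, 320):
    bps = kbps * 1000
    print(f"{kbps} kbit/s: ratio {compression_ratio(bps):.1f}:1, "
          f"frame {mpeg1_layer3_frame_bytes(bps, 44100)} bytes")
```

At 128 kbit/s and 44.1 kHz the stream is roughly one eleventh the size of the CD original.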

Design limitations of MP3
  • There are several limitations inherent to the
    MP3 format that cannot be overcome by using a
    better encoder. Newer audio compression formats
    such as Vorbis and AAC no longer have these
    limitations.
  • In technical terms, MP3 is limited in the
    following ways
  • Bitrate is limited to a maximum of 320 kbit/s
  • Time resolution can be too low for highly
    transient signals, causing some smearing of
    percussive sounds
  • Frequency resolution is limited by the small long
    block window size, decreasing coding efficiency
  • No scale factor band for frequencies above
    15.5/15.8 kHz
  • Joint stereo is done on a frame-to-frame basis
  • Encoder/decoder overall delay is not defined,
    which means lack of official provision for
    gapless playback. However, some encoders such as
    LAME can attach additional metadata that will
    allow players that are aware of it to deliver
    gapless playback.
  • Nevertheless, a well-tuned MP3 encoder can
    perform competitively even with these
    limitations.

Advanced Audio Coding (AAC)
  • AAC is a standardized, lossy digital audio
    compression scheme. It was developed with the
    cooperation and contributions of companies
    including Dolby, Fraunhofer (FhG), AT&T, Sony and
    Nokia, and was officially declared an
    international standard by the Moving Picture
    Experts Group in April 1997.
  • Not backward compatible with other MPEG audio
    standards (like mp3)

  • AAC was promoted as the successor to MP3 for
    audio coding at medium to high bitrates.
  • AAC follows the same basic coding paradigm as
    Layer-3 (high frequency resolution filterbank,
    non-uniform quantization, Huffman coding,
    iteration loop structure using
    analysis-by-synthesis), but improves on Layer-3
    in many details and uses new coding tools for
    improved quality at low bit-rates.
  • Its popularity is currently maintained by its
    being the default codec of iTunes, the media
    player that powers the iPod, the most popular
    digital audio player on the market.
  • Furthermore, the iTunes Music Store, whose sales
    account for 85% of the market for legal online
    downloads, sells AAC-encoded songs (encapsulated
    with FairPlay Digital Rights Management).

AAC's improvements over MP3
  • Sample frequencies from 8 kHz to 96 kHz (official
    MP3: 16 kHz to 48 kHz)
  • Up to 48 channels
  • Higher efficiency and simpler filterbank (hybrid
    → pure MDCT)
  • Higher coding efficiency for stationary signals
    (blocksize 576 → 1024 samples)
  • Higher coding efficiency for transient signals
    (blocksize 192 → 128 samples)
  • Can use a Kaiser-Bessel derived window function
    to reduce spectral leakage at the expense of
    widening the main lobe
  • Much better handling of frequencies above 16 kHz
  • More flexible joint stereo (separate for every
    scale band)
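
The Kaiser-Bessel derived (KBD) window mentioned in the list above can be built from the running sum of a Kaiser window. The following is a minimal pure-Python sketch; the function names, the series-based Bessel routine and the default alpha value are assumptions made for this illustration, not taken from the AAC specification.

```python
import math

def bessel_i0(x: float, terms: int = 50) -> float:
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kbd_window(n: int, alpha: float = 4.0) -> list:
    """Kaiser-Bessel derived window of even length n with shape parameter alpha."""
    half = n // 2
    beta = math.pi * alpha
    # Kaiser window of length half + 1 (left unnormalized; the scale cancels below).
    kaiser = [bessel_i0(beta * math.sqrt(max(0.0, 1.0 - (2.0 * j / half - 1.0) ** 2)))
              for j in range(half + 1)]
    total = sum(kaiser)
    running, left = 0.0, []
    for j in range(half):
        running += kaiser[j]
        left.append(math.sqrt(running / total))
    return left + left[::-1]  # the second half mirrors the first

w = kbd_window(256, alpha=6.0)
# Power-complementary check needed for perfect reconstruction with the MDCT.
print(max(abs(w[i] ** 2 + w[i + 128] ** 2 - 1.0) for i in range(128)))
```

The check at the end confirms that the squared window and its half-length shift add up to one, which is the condition an MDCT window has to meet for the overlapped blocks to reconstruct the signal.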

  • Both the mid/side coding and the intensity coding
    are more flexible, so they can be applied more
    frequently to reduce the bit-rate.
  • An optional backward prediction, computed line by
    line, achieves better coding efficiency,
    especially for very tone-like signals. This
    feature is only available within the rarely used
    main profile.
  • Improved Huffman coding: in AAC, coding by
    quadruples of frequency lines is applied more
    often. In addition, the assignment of Huffman
    code tables to coder partitions can be much more
    flexible.
  • AAC and HE-AAC are far better than MP3 at very
    low bitrates, but at medium to higher bitrates
    the two formats are more comparable
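
The mid/side coding mentioned in the list above can be illustrated with a small sketch. It works on plain lists of samples for simplicity, whereas an AAC encoder would apply the transform to spectral coefficients band by band; the function names are hypothetical.

```python
def ms_encode(left, right):
    """Turn left/right channels into mid (average) and side (half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, -0.25, 1.0]
right = [0.5, -0.25, 0.75]
mid, side = ms_encode(left, right)
print(mid, side)             # side is zero wherever both channels agree
print(ms_decode(mid, side))  # recovers the original channels
```

When the two channels are nearly identical, the side signal is close to zero and costs very few bits, which is why stereo redundancy is worth exploiting.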

  • Modular encoding
  • AAC takes a modular approach to encoding.
    Depending on the complexity of the bitstream to
    be encoded, the desired performance and the
    acceptable output, implementers may create
    profiles to define which of a specific set of
    tools they want to use for a particular
    application. The standard offers four default
    profiles
  • Low Complexity (LC) - the simplest and most
    widely used and supported
  • Main Profile (MAIN) - like the LC profile, with
    the addition of backwards prediction
  • Sample-Rate Scalable (SRS), a.k.a. Scalable
    Sample Rate (MPEG-4 AAC-SSR)
  • Long Term Prediction (LTP) added in the MPEG-4
    standard - an improvement of the MAIN profile
    using a forward predictor with lower
    computational complexity.
  • Depending on the AAC profile and the MP3 encoder,
    96 kbit/s AAC can give nearly the same or better
    perceptual quality as 128 kbit/s MP3

MPEG-2 AAC Flowchart
Extensions and Improvements
  • Some extensions have been added to the
    original AAC standard
  • MPEG-4 Scalable To Lossless (SLS)
  • High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1
    or AAC+ - the combination of SBR (Spectral Band
    Replication) and AAC, used for low bitrates
  • HE-AAC v.2, a.k.a. aacPlus v2 - the combination
    of Parametric Stereo (PS) and HE-AAC
  • Perceptual Noise Substitution (PNS)
  • Long Term Predictor (LTP) - added in MPEG-4 Part

MPEG AAC Performance
  • MPEG AAC provides excellent audio quality.
    Reaching perceptually transparent quality at only
    64 kbit/s per channel, it fulfills the
    requirements for broadcast quality as defined by
    the European Broadcasting Union.
  • With sampling rates ranging from 8 kHz up to
    96 kHz and above, with bit rates up to 256 kbit/s,
    and with support for up to 48 channels, MPEG AAC
    is one of the most flexible audio codecs. Of
    course, the standard also supports mono, stereo,
    and all common multi-channel configurations
    (e.g. 5.1).
  • The low computational demands make AAC the ideal
    codec for any low bit rate, high-quality audio
    application.

  • HE-AAC is the low bit rate codec in the AAC
    family and is a combination of the AAC LC
    (Advanced Audio Coding Low Complexity) audio
    coder and the SBR (Spectral Band Replication)
    bandwidth expansion tool.
  • This combination achieves good stereo quality
    already at bit rates of 32 to 48 kbit/s. HE-AAC
    is also known as aacPlus and can be used in
    multi-channel operations.

  • Combined with parametric stereo, the HE-AAC codec
    provides good audio quality starting at bit rates
    around 16 to 24 kbit/s for stereo content.
  • HE-AAC v2 is also known as aacPlus v2.

  • Rough work
  • Explain basic psychoacoustic principles:
    absolute threshold of hearing, critical bands,
    phenomenon of masking (simultaneous, masking
    asymmetry, spread of masking, non-simultaneous),
    perceptual entropy
  • MPEG audio codec family: mp3, mp2 AAC, mp4 AAC,
    advanced AAC plus version 1, advanced AAC plus
    version 2
  • (mention features present/absent in each)

  • Limitations of mp3
  • What is different in AAC ?
  • Features in AAC
  • Explain each feature in detail (mp2, mp4)