Ogg Vorbis: Audio Compression Scheme - PowerPoint PPT Presentation


Slides: 16
Provided by: Gre9198



Ogg Vorbis Audio Compression Scheme
  • Presented by Greg Eustace
  • MUMT 611

  • Introduction
  • Licensing
  • Availability
  • Audio quality metric
  • File structure
  • Encoding/decoding procedure

  • Vorbis is a lossy audio compression scheme
    intended to replace other similar formats (MP3,
    etc.). It uses a psychoacoustic model to
    eliminate perceptually negligible information,
    thereby decreasing file size.
  • The architecture is forward-adaptive: the encoder
    transmits its decoding configuration (including
    codebooks) in the stream headers. This leaves room
    for future improvements to the encoding scheme and
    makes a range of encoding algorithms possible.
  • Vorbis website: http://www.vorbis.com/

  • Ogg is the transport mechanism in which Vorbis
    audio is contained. It can hold audio, video (Ogg
    Theora) and metadata, alone or multiplexed. Ogg
    can also carry other audio formats, such as FLAC
    and Speex. Vorbis requires a container (such as
    Ogg) for framing, synchronization, positioning
    and error correction.
  • Both Ogg and Vorbis are developed by the Xiph.Org
    Foundation:
  • http://www.xiph.org/
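Ogg's framing role can be made concrete with a minimal Python sketch that parses one Ogg page. The field layout (a 27-byte fixed header followed by a segment table of lacing values) follows RFC 3533; the function name `parse_ogg_page` is hypothetical, and the sketch neither verifies the page CRC nor handles truncated input.

```python
import struct

# Fixed part of an Ogg page header (RFC 3533): 27 bytes, little-endian.
OGG_HEADER = struct.Struct("<4sBBqIIIB")

def parse_ogg_page(data, offset=0):
    """Parse one Ogg page starting at `offset`.

    Returns (header, packets, unfinished, next_offset): `packets` are the packets
    that end on this page, `unfinished` is the tail of a packet that continues
    on the next page. The CRC field is read but not verified in this sketch.
    """
    (capture, version, flags, granule, serial,
     sequence, crc, n_segments) = OGG_HEADER.unpack_from(data, offset)
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    table_start = offset + OGG_HEADER.size
    lacing_values = data[table_start:table_start + n_segments]
    body_offset = table_start + n_segments

    packets, current = [], bytearray()
    for lacing in lacing_values:
        current += data[body_offset:body_offset + lacing]
        body_offset += lacing
        if lacing < 255:  # a lacing value below 255 ends the current packet
            packets.append(bytes(current))
            current = bytearray()

    header = {
        "continued": bool(flags & 0x01),   # first packet continues from the previous page
        "first_page": bool(flags & 0x02),  # beginning of the logical stream
        "last_page": bool(flags & 0x04),   # end of the logical stream
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
    }
    return header, packets, bytes(current), body_offset
```

A lacing value of 255 means the packet continues into the next segment, which is how Ogg carries packets of any length. When `unfinished` is non-empty, a caller would prepend it to the first packet of the next page, whose header has the `continued` flag set.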

Licensing and Availability
  • Ogg Vorbis is open source, unpatented and free.
    There are no licensing fees for developers,
    musicians, record labels, etc.
  • Encoders and decoders are available on all major
    platforms, and many popular media players support
    the format. Available tools include the official
    command-line encoder, oggenc, and a graphical
    encoder/decoder/player called OggDrop.
  • Software can be obtained here: http://www.vorbis.c

Audio quality metric
  • Ogg Vorbis audio quality is measured on a
    subjective quality scale ranging from -1 to 10.
    The default quality setting is 3, which the
    developers claim sounds better than 128 kbps
    MP3s while occupying roughly 10% less file space.
  • Although each quality rating corresponds to an
    average bit rate, the encoder does not attempt to
    hit a specific ABR. Moreover, the developers
    avoid judging quality by this criterion, reasoning
    that bit rate is only a loose indicator of audio
    quality when different compression algorithms
    are in use.
  • This is demonstrated in comparisons between Vorbis
    and MP3, MP3Pro, WMA, AAC and RealAudio, where
    comparable average bit rates clearly correspond
    to varying degrees of audio quality.
  • Comparisons can be found here: http://www.xiph.org
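As a worked example of the arithmetic behind an average bit rate: it is simply the file's total number of bits divided by its playing time. The helper names below are hypothetical; the point is that two files yielding the same figure can still differ in quality, which is why the bit rate is only a loose guide.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bit rate in kbit/s: total bits divided by playing time."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes * 8 / duration_seconds / 1000

def expected_size_bytes(bitrate_kbps, duration_seconds):
    """File size implied by an average bit rate over a given duration."""
    return bitrate_kbps * 1000 * duration_seconds / 8
```

For example, a 240-second file of 3,840,000 bytes averages 128 kbit/s, whether its bits were spread evenly (constant rate) or unevenly (VBR).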

Audio quality metric
  • Additional specifications
  • Encodes at a variable bit rate (VBR) or an average
    bit rate (ABR). ABR is used for streaming, to
    meet bandwidth requirements.
  • Supports up to 255 channels of audio.
  • Works with sample rates from 8 kHz to 192 kHz.
  • Can potentially support bit rate peeling. This
    entails reducing an already compressed file to a
    lower quality without decoding and re-encoding it,
    so the encoding artifacts of the first conversion
    are not reintroduced (and thereby compounded).

File Structure
  • The Vorbis I specification can be found here:
  • http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html
  • The Vorbis bit stream consists of four packet
    types, which occur in sequence. The first three
    are headers. The header size is
  • 1. The identification header: identifies the bit
    stream as Vorbis and gives the version in use. It
    includes audio characteristics required for
    further interpretation, such as the sampling rate
    and number of channels.
  • 2. The comment header: includes tags consisting of
    user comments and a vendor string. Tags
    themselves may be user-defined.
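A minimal sketch of reading the identification and comment headers, assuming each packet has already been extracted from its container. The layouts (a type byte, the string `vorbis`, little-endian fields, the two block-size exponents packed into one byte, a trailing framing bit) follow our reading of the Vorbis I specification; the function names are hypothetical and error handling is deliberately thin.

```python
import struct

def _check_common_header(packet, expected_type):
    # Every Vorbis header packet starts with its type byte and the string "vorbis".
    if len(packet) < 7 or packet[0] != expected_type or packet[1:7] != b"vorbis":
        raise ValueError(f"not a Vorbis header of type {expected_type}")

def parse_identification_header(packet):
    _check_common_header(packet, 1)
    if len(packet) < 30:
        raise ValueError("identification header too short")
    (version, channels, sample_rate, bitrate_max, bitrate_nominal,
     bitrate_min, sizes, framing) = struct.unpack_from("<IBIiiiBB", packet, 7)
    blocksize_0 = 1 << (sizes & 0x0F)  # low 4 bits: short block size exponent
    blocksize_1 = 1 << (sizes >> 4)    # high 4 bits: long block size exponent
    if version != 0 or channels == 0 or sample_rate == 0 or not framing & 1:
        raise ValueError("invalid identification header")
    if not 64 <= blocksize_0 <= blocksize_1 <= 8192:
        raise ValueError("invalid block sizes")
    return {"channels": channels, "sample_rate": sample_rate,
            "bitrate_max": bitrate_max, "bitrate_nominal": bitrate_nominal,
            "bitrate_min": bitrate_min,
            "blocksize_0": blocksize_0, "blocksize_1": blocksize_1}

def parse_comment_header(packet):
    _check_common_header(packet, 3)
    pos = 7

    def read_string():
        # Each string is a 32-bit little-endian length followed by UTF-8 bytes.
        nonlocal pos
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = []
    for _ in range(count):
        field, _, value = read_string().partition("=")
        comments.append((field.upper(), value))  # field names are case-insensitive
    if pos >= len(packet) or not packet[pos] & 1:
        raise ValueError("missing framing bit")
    return vendor, comments
```

User-defined tags fall out naturally: any `FIELD=value` string is accepted, and the sketch only normalises the field name's case.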

File Structure
  • 3. The codec setup header: setup components
    include modes, mappings, floors, residues
    and codebooks, all of which have specific roles
    in the decoding process.
  • Codebooks are required for decoding the audio
    stream. For efficiency, audio is represented by
    codewords derived using vector quantisation and
    entropy encoding methods (Huffman binary tree
    representation). Encoding/decoding of individual
    audio packets involves reading from the
    appropriate codebook.
  • 4. Audio packets.
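The codebook bullet can be illustrated with a brute-force sketch. A Vorbis codebook stores only the length of each entry's codeword; here each entry, in order, takes the lowest-valued codeword of its length that is still free (our reading of the rule in the specification). Bits are given as a string of `0`/`1` characters for readability, whereas a real decoder reads them from the packed packet, and the vector quantisation lookup that maps an entry number to a vector of values is not shown. The function names are hypothetical.

```python
def assign_codewords(lengths):
    """Assign prefix-free codewords, in entry order, from a list of code lengths.

    Each entry takes the numerically lowest codeword of its length that neither
    starts with an already-assigned codeword nor is a prefix of one.
    """
    assigned = []
    for length in lengths:
        for value in range(1 << length):
            candidate = format(value, f"0{length}b")
            if all(not candidate.startswith(c) and not c.startswith(candidate)
                   for c in assigned):
                assigned.append(candidate)
                break
        else:
            raise ValueError("code lengths are overspecified")
    return assigned

def decode_entries(bits, codewords):
    """Decode a string of '0'/'1' characters into codebook entry numbers."""
    lookup = {code: entry for entry, code in enumerate(codewords)}
    entries, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # the code is prefix-free, so the first match is the codeword
            entries.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return entries
```

Walking the string bit by bit is equivalent to descending the Huffman binary tree one branch per bit until a leaf (an entry) is reached.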

Encoding/Decoding
  • In general, encoding involves separating the
    input audio into frames which are compressed to
    form packets. An overlapping transform called the
    Modified Discrete Cosine Transform (MDCT), a
    Fourier-related transform based on the type-IV
    discrete cosine transform, is used to convert
    time-domain data to the frequency domain for
    further processing.
  • The MDCT relies on time-domain aliasing
    cancellation (TDAC): the inverse MDCT (IMDCT) of
    each block contains aliasing errors, which cancel
    when consecutive windowed blocks, overlapping by
    50%, are added together. The decoder synthesizes
    audio frames from the packets and reassembles
    them to approximate the original audio stream.
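The aliasing cancellation can be checked numerically. The Python sketch below uses a direct O(N²) textbook MDCT (not the fast algorithm a real decoder would use) together with the Vorbis window shape, and reconstructs every sample covered by two overlapping blocks. The IMDCT scale factor of 2/N is a convention chosen here so that windowed overlap-add returns the input exactly; the function names are hypothetical.

```python
import math

def vorbis_window(length):
    """Symmetric Vorbis window: sin(pi/2 * sin^2(pi * (i + 0.5) / length))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (i + 0.5) / length) ** 2)
            for i in range(length)]

def mdct(block):
    """Direct MDCT: 2N windowed time samples -> N frequency coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time samples that still contain aliasing."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def analyse_and_resynthesise(signal, block_len):
    """Window, MDCT, IMDCT, window again, then overlap-add with 50% overlap."""
    hop = block_len // 2
    window = vorbis_window(block_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - block_len + 1, hop):
        block = [signal[start + i] * window[i] for i in range(block_len)]
        rebuilt = imdct(mdct(block))
        for i in range(block_len):
            out[start + i] += rebuilt[i] * window[i]  # aliasing cancels in the overlap
    return out
```

The window satisfies w[i]² + w[i + N]² = 1 and is symmetric, which is exactly what lets the aliasing terms of neighbouring blocks cancel; only the first and last half-blocks, which lack a partner, are not reconstructed.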

Encoding/Decoding
  • The decoding and synthesis process includes
    several steps.
  • 1. Decode packet type flag: first, the decoder
    must verify that a given packet contains audio
    data by inspecting its type flag.
  • 2. Decode mode number: the mode number indicates
    the current frame size, window type, transform
    type and mapping number.
  • The frame size is a power of 2 between 64 and
    8192, and can be either short or long. Short
    windows are used near attack transients in order
    to limit artifacts (pre-echo) associated with
    the MDCT.
  • The window taper varies for long windows
    depending on whether the previous and subsequent
    frames are short or long.
  • The transform type is always type 0, the MDCT,
    in Vorbis I.
  • The mapping number selects a mapping, which
    describes the channel coupling scheme and lists
    submaps that bundle sets of channel vectors.
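The packet type flag, mode number and window shape can be sketched as follows, under stated assumptions. Vorbis packs bits least-significant-bit first; an audio packet starts with a single 0 bit, followed by the mode number in `ilog(mode_count - 1)` bits and, for long blocks only, a previous-window flag and a next-window flag. `mode_blockflags` is a hypothetical stand-in for the mode table from the setup header, and `window_shape` follows our reading of the specification's window computation, in which a long block uses a short slope on any side that borders a short block. All function names are hypothetical.

```python
import math

class BitReader:
    """Reads bits least-significant-bit first, the order Vorbis uses."""
    def __init__(self, data):
        self.data = data
        self.bit_pos = 0

    def read(self, count):
        value = 0
        for i in range(count):
            byte = self.data[self.bit_pos >> 3]
            value |= ((byte >> (self.bit_pos & 7)) & 1) << i
            self.bit_pos += 1
        return value

def ilog(x):
    """Number of bits needed to represent a non-negative integer (ilog(0) == 0)."""
    return x.bit_length() if x > 0 else 0

def decode_audio_packet_header(packet, mode_blockflags):
    """Return (mode, blockflag, prev_window_flag, next_window_flag)."""
    reader = BitReader(packet)
    if reader.read(1) != 0:
        raise ValueError("not an audio packet")
    mode = reader.read(ilog(len(mode_blockflags) - 1))
    if mode >= len(mode_blockflags):
        raise ValueError("invalid mode number")
    blockflag = mode_blockflags[mode]  # 0 = short block, 1 = long block
    prev_flag = next_flag = None
    if blockflag:  # only long blocks carry the neighbouring-window flags
        prev_flag = reader.read(1)
        next_flag = reader.read(1)
    return mode, blockflag, prev_flag, next_flag

def window_shape(n, blocksize_0, blockflag, prev_flag, next_flag):
    """Window for a block of size n; a long block tapers with a short slope
    on any side whose neighbouring block is short."""
    window = [0.0] * n
    if blockflag and not prev_flag:
        left_start, left_n = n // 4 - blocksize_0 // 4, blocksize_0 // 2
    else:
        left_start, left_n = 0, n // 2
    if blockflag and not next_flag:
        right_start, right_n = 3 * n // 4 - blocksize_0 // 4, blocksize_0 // 2
    else:
        right_start, right_n = n // 2, n // 2
    for i in range(left_start, left_start + left_n):  # rising slope
        x = (i - left_start + 0.5) / left_n * math.pi / 2
        window[i] = math.sin(math.pi / 2 * math.sin(x) ** 2)
    for i in range(left_start + left_n, right_start):  # flat top
        window[i] = 1.0
    for i in range(right_start, right_start + right_n):  # falling slope
        x = (i - right_start + 0.5) / right_n * math.pi / 2 + math.pi / 2
        window[i] = math.sin(math.pi / 2 * math.sin(x) ** 2)
    return window
```

For example, with two modes (one short, one long) the single byte `0x0E` decodes as an audio packet in mode 1 whose previous and next blocks are both long.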

Encoding/Decoding
  • 3. Decode window shape: the window shape for a
    given long frame is decoded.
  • 4. Decode the floor: the floor vector is a
    low-resolution representation of the audio
    spectrum for the given channel in the current
    frame, generally used akin to a whitening
    filter. During encoding this data is extracted
    from the log spectrum of the audio stream using
    either floor configuration type 0 or 1.
  • Type 0 uses a Line Spectral Pair (LSP, also
    known as Line Spectral Frequency or LSF)
    representation to encode a smooth spectral
    envelope curve as the frequency response of the
    LSP filter. This representation is equivalent to
    a traditional all-pole infinite impulse response
    filter as would be used in linear predictive
    coding (LPC); an LSP representation may be
    converted to an LPC representation and vice versa.
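The LSP/LPC equivalence can be sketched by rebuilding the LPC polynomial A(z) from line spectral frequencies: the frequencies are the unit-circle roots of a symmetric polynomial P(z) and an antisymmetric polynomial Q(z), and A(z) = (P(z) + Q(z)) / 2. This is a generic textbook conversion, not the Vorbis floor 0 decoder (which evaluates the curve from the LSP coefficients directly); it assumes an even order, and the assignment of alternate frequencies to P and Q is a convention chosen here. Names are hypothetical.

```python
import math

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def lsp_to_lpc(freqs):
    """Convert even-order LSP frequencies (radians, ascending) to the LPC
    polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p, returned as [1, a1, ..., ap]."""
    if len(freqs) % 2:
        raise ValueError("this sketch only handles even orders")
    p_poly = [1.0, 1.0]   # P(z) has a fixed root at z = -1
    q_poly = [1.0, -1.0]  # Q(z) has a fixed root at z = +1
    for i, w in enumerate(freqs):
        pair = [1.0, -2.0 * math.cos(w), 1.0]  # roots at exp(+jw) and exp(-jw)
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, pair)
        else:
            q_poly = poly_mul(q_poly, pair)
    # A(z) = (P(z) + Q(z)) / 2; the highest-order terms cancel, so drop the last one.
    a = [(x + y) / 2 for x, y in zip(p_poly, q_poly)]
    return a[:-1]
```

Going the other way (LPC to LSP) means finding the unit-circle roots of P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z).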

Encoding/Decoding
  • Type 1 uses a piecewise straight-line
    representation to encode a spectral envelope
    curve. The representation plots this curve
    mechanically on a linear frequency axis and a
    logarithmic (dB) amplitude axis. The integer
    plotting algorithm used is similar to Bresenham's
    line algorithm.
  • 5. Decode the residue: the residue is the fine
    spectral detail that remains after the floor has
    been removed from (divided out of) the audio
    spectrum, for a given channel and frame, during
    encoding.
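The floor type 1 line drawing can be sketched with an integer-only routine in the style of Bresenham's algorithm. This follows our reading of the `render_line` procedure in the Vorbis I specification; the later step that maps the integer y values to dB amplitudes through a lookup table is omitted.

```python
def render_line(x0, y0, x1, y1, v):
    """Draw an integer line from (x0, y0) toward (x1, y1) into v[x0] .. v[x1 - 1]."""
    dy = y1 - y0
    adx = x1 - x0
    base = abs(dy) // adx            # whole-step slope magnitude
    if dy < 0:
        base = -base                 # truncate toward zero, as C integer division does
    sy = base - 1 if dy < 0 else base + 1
    ady = abs(dy) - abs(base) * adx  # leftover rise, spread out by the error term
    y, err = y0, 0
    v[x0] = y
    for x in range(x0 + 1, x1):
        err += ady
        if err >= adx:               # accumulated error crosses a step: take the extra unit
            err -= adx
            y += sy
        else:
            y += base
        v[x] = y
```

Drawing each segment between consecutive floor points this way fills the whole curve; the endpoint x1 is left for the following segment.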

Encoding/Decoding
  • 6. Inverse channel coupling of residue vectors:
    during encoding, the bit rate is lowered by
    eliminating redundancies between channels.
  • Two mechanisms exist for channel coupling:
  • Channel interleaving via residue backend type 2
  • Cartesian to square polar mapping.
  • The inverse process is performed during decoding.
  • For encoder quality settings of six or greater,
    channel coupling is lossless.
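Square polar mapping works on pairs of residue values, one from each coupled channel. The decoder-side `uncouple` below follows our reading of the specification's inverse mapping; the encoder-side `couple` is a hypothetical forward mapping written to be its exact inverse and is not taken from any particular encoder. On integers the round trip is exact, consistent with coupling being lossless when no further approximation is applied. Residue type 2 interleaving is not shown.

```python
def couple(a, b):
    """Hypothetical forward mapping: (a, b) -> (magnitude, angle)."""
    if abs(a) > abs(b):
        magnitude = a
        angle = a - b if a > 0 else b - a
    else:
        magnitude = b
        angle = a - b if b > 0 else b - a
    return magnitude, angle

def uncouple(magnitude, angle):
    """Inverse square polar mapping: (magnitude, angle) -> (a, b)."""
    if magnitude > 0:
        if angle > 0:
            return magnitude, magnitude - angle
        return magnitude + angle, magnitude
    if angle > 0:
        return magnitude, magnitude + angle
    return magnitude - angle, magnitude
```

When the two channels are similar, the angle value is small, which is what makes it cheap to code.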

Encoding/Decoding
  • 7. The floor curve is generated from the decoded
    floor data.
  • 8. The floor and residue vectors are multiplied
    element by element (the specification calls this
    a dot product) to produce an audio spectrum vector.
  • 9. The audio spectrum is converted back to the
    time domain via the inverse MDCT.
  • 10. The result of the transform is overlapped
    and added together frame-by-frame to provide the
    new audio stream.
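The final steps, multiplying floor and residue and then overlap-adding consecutive frames, reduce to a few lines. This sketch assumes every block has the same size and that each IMDCT output has already been windowed; real Vorbis streams mix short and long blocks, which changes how much of each block overlaps. Names are hypothetical.

```python
def spectrum_from_floor_and_residue(floor, residue):
    """Element-wise product of the floor curve and the residue vector."""
    if len(floor) != len(residue):
        raise ValueError("floor and residue must have the same length")
    return [f * r for f, r in zip(floor, residue)]

class OverlapAdder:
    """Overlap-add for equal-sized blocks: the left half of each windowed IMDCT
    output is added to the right half saved from the previous frame."""
    def __init__(self):
        self.saved_right = None

    def push(self, block):
        half = len(block) // 2
        left, right = block[:half], block[half:]
        output = None
        if self.saved_right is not None:  # the first frame yields no audio
            output = [a + b for a, b in zip(self.saved_right, left)]
        self.saved_right = right
        return output
```

Each call after the first returns half a block of finished audio, so output lags input by one frame.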