Serving, shrinking, and otherwise messing about with perfectly good audio files - PowerPoint PPT Presentation



Serving, shrinking, and otherwise messing about with perfectly good audio files


Loudness related to force with which a sound presses on ... Popular formats are Real audio, MS ASF, Apple Quicktime. Compression & Streaming. Dr Paul Vickers ... – PowerPoint PPT presentation

Provided by: paulvi
Transcript and Presenter's Notes

Title: Serving, shrinking, and otherwise messing about with perfectly good audio files

Compression & Streaming
  • Serving, shrinking, and otherwise messing about
    with perfectly good audio files

Loudness and power
  • Loudness related to force with which a sound
    presses on your eardrum
  • The more power, the louder the sound
  • Power is proportional to the square of a sound's
    amplitude (or voltage)

Sampling error and noise
  • CD audio uses 44.1 kHz at 16-bit resolution
  • Sampled voltages quantised to -32768 to 32767
  • Quantisation introduces error (through rounding)
  • Largest error is 0.5, which is 2^-16 times the
    amplitude of the loudest sample value
  • Power is related to the square of amplitude, so the
    error has 2^-32 times the power of the loudest signal
  • Ratio of signal to error (noise) is 2^32:1
  • Or 96.3 dB (10 log10(2^32))
  • SNR of 96 dB
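
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's reasoning; the function name is illustrative and it assumes a worst-case rounding error of 0.5 against a peak of 2^(bits-1).

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """Signal-to-noise ratio of linear PCM, following the slide's reasoning."""
    peak = 2 ** (bits - 1)               # loudest sample magnitude, 32768 for 16 bits
    max_error = 0.5                      # worst-case rounding error
    amplitude_ratio = peak / max_error   # equals 2**bits
    power_ratio = amplitude_ratio ** 2   # power goes with amplitude squared
    return 10 * math.log10(power_ratio)

print(round(quantisation_snr_db(16), 1))  # 96.3
print(round(quantisation_snr_db(8), 1))   # 48.2
```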

Signal to noise ratio
  • So, CD audio has SNR of 96 dB
  • 8-bit sampling has SNR of 48 dB
  • Therefore, 1 bit of resolution adds approx. 6 dB
    to the dynamic range
  • Threshold of pain is 120 dB so we need 20-bit
    resolution to capture the dynamic range of the
    human auditory system
  • Loud samples are rare, so noise is more
    noticeable than the theory would suggest
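
The "6 dB per bit" rule of thumb can be turned around to ask how many bits a given dynamic range needs. The helper below is a hypothetical illustration, not part of the original slides.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB for each extra bit

def bits_for_dynamic_range(target_db: float) -> int:
    """Smallest whole number of bits whose range covers target_db."""
    return math.ceil(target_db / DB_PER_BIT)

print(round(DB_PER_BIT, 2))          # 6.02
print(bits_for_dynamic_range(96))    # 16
print(bits_for_dynamic_range(120))   # 20
```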

  • A standard .WAV file (no such thing) stores
    samples as 16-bit values.
  • These values are codes representing the voltages
    (amplitudes) of the signal
  • System called pulse code modulation (contrast
    with pulse amplitude modulation and pulse width
    modulation)
  • WAV format actually supports nearly 100 different
    coding systems

  • Lossless compression (e.g. LZW) does not work
    well on audio as there are very few repeating
    patterns
  • Sampled audio tends to have random noise in the
    least significant bits, so very few bytes repeat
  • WinZip hardly compresses audio files at all
  • Try girl2.wav and 528 Hz.wav. Why does the second
    file compress 2.33:1?
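
A rough way to try this without the two .wav files is to synthesise a pure tone and a copy with noise in the low bits, then compress both. The sketch below makes several assumptions: it uses Python's zlib (DEFLATE) as a stand-in for LZW or WinZip, and the noise level and amplitude are arbitrary.

```python
import math
import random
import zlib

RATE = 44_100  # samples per second, as for CD audio

def tone(freq_hz: float, seconds: float = 1.0, amplitude: int = 10_000) -> list[int]:
    """Pure sine tone as a list of 16-bit sample values."""
    n = int(RATE * seconds)
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / RATE))
            for i in range(n)]

def add_lsb_noise(samples: list[int], seed: int = 0) -> list[int]:
    """Add a few counts of random noise, mimicking a real recording."""
    rng = random.Random(seed)
    return [s + rng.randint(-8, 8) for s in samples]

def compression_ratio(samples: list[int]) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    return len(raw) / len(zlib.compress(raw, 9))

clean = tone(528)
noisy = add_lsb_noise(clean)
print(f"clean tone: {compression_ratio(clean):.2f}:1")
print(f"noisy tone: {compression_ratio(noisy):.2f}:1")
```

The clean tone repeats almost exactly every few thousand samples, so the compressor finds long matches; the noisy copy gives it very little to work with.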

Other techniques
  • Need some different compression techniques
  • Popular ones are
  • Differential PCM (DPCM)
  • Adaptive DPCM (ADPCM)
  • A-Law
  • µ-Law
  • Logarithmic non-linear codings
  • Perceptual codings

Differential PCM
  • Consider the differences in value between
    individual samples at rates of, say, 44.1 kHz
  • Usually fairly small
  • Small differences need fewer bits than the
    samples themselves
  • So, DPCM stores sample differences, hence the
    name
  • Leads to some inaccuracy and requires look-ahead
    to balance things out

DPCM example
  • To reduce 8-bit sample values to 4-bit
  • Consider three samples of 17, 28, 30
  • Differences are 11, 2
  • 4-bit system only allows values -8 to 7
    (1000 to 0111)
  • Thus 11 overflows and is clipped to 7
  • But decompressing would then give 17, 24, 26
  • Instead, take the difference between the last
    decompressed sample and the next actual sample
  • 28 - 17 = 11 > 7, so store 7. 17 + 7 = 24.
    Next difference is 30 - 24 = 6
  • This gives 7, 6 which, when decompressed, gives
    17, 24, 30
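
The worked example can be reproduced with a small encoder that tracks what the decoder will reconstruct and takes each difference against that value. This is a minimal sketch; the function names are illustrative and the first sample is assumed to be stored uncompressed.

```python
def clip(value: int, low: int = -8, high: int = 7) -> int:
    """Limit a difference to the signed 4-bit range."""
    return max(low, min(high, value))

def dpcm_encode(samples: list[int]) -> tuple[int, list[int]]:
    """Return (first sample, 4-bit differences)."""
    first = samples[0]
    decoded = first                   # the decoder's running value
    diffs = []
    for actual in samples[1:]:
        d = clip(actual - decoded)    # difference from the decoded sample
        diffs.append(d)
        decoded += d
    return first, diffs

def dpcm_decode(first: int, diffs: list[int]) -> list[int]:
    """Rebuild samples by accumulating the differences."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

first, diffs = dpcm_encode([17, 28, 30])
print(diffs)                      # [7, 6]
print(dpcm_decode(first, diffs))  # [17, 24, 30]
```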

Predictor based compression
  • Try to predict next sample on basis of previous
  • If correct, no need to store sample as
    decompressor uses same rules and so can work it
    out too
  • If prediction correct, output 1 else output 0
    followed by actual sample
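
A toy version of this scheme is sketched below. The prediction rule (the next sample equals the previous one) is an assumption chosen for brevity, and the flags are stored as list items rather than single bits.

```python
def predict(history: list[int]) -> int:
    """Hypothetical rule: assume the next sample repeats the last one."""
    return history[-1] if history else 0

def predictive_encode(samples: list[int]) -> list[int]:
    out, history = [], []
    for s in samples:
        if predict(history) == s:
            out.append(1)          # prediction correct: flag only
        else:
            out.extend([0, s])     # prediction wrong: flag, then the sample
        history.append(s)
    return out

def predictive_decode(stream: list[int]) -> list[int]:
    samples, i = [], 0
    while i < len(stream):
        if stream[i] == 1:
            samples.append(predict(samples))   # same rule as the encoder
            i += 1
        else:
            samples.append(stream[i + 1])
            i += 2
    return samples

data = [0, 0, 5, 5, 5, 7]
packed = predictive_encode(data)
print(packed)                      # [1, 1, 0, 5, 1, 1, 0, 7]
print(predictive_decode(packed))   # [0, 0, 5, 5, 5, 7]
```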

Adaptive DPCM
  • ADPCM uses prediction
  • Outputs predicted differences. If accurate, then
    the difference between actual and predicted
    samples has lower variance than the actual
    samples and thus takes fewer bits
  • Uses 4-bit codes representing predicted diff.
    between two 16-bit samples
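
The idea can be illustrated with a deliberately simplified adaptive coder. This is not the IMA or any other standard ADPCM algorithm: the predictor (previous reconstructed sample), the starting step of 16, and the step adaptation rule are all assumptions made for the sketch.

```python
def clip(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))

def next_step(step: int, code: int) -> int:
    """Toy adaptation rule: grow on large codes, shrink on small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step

def adpcm_encode(samples: list[int]) -> list[int]:
    predicted, step, codes = 0, 16, []
    for s in samples:
        code = clip(round((s - predicted) / step), -8, 7)   # 4-bit code
        codes.append(code)
        # Update exactly as the decoder will, so both stay in step.
        predicted = clip(predicted + code * step, -32768, 32767)
        step = next_step(step, code)
    return codes

def adpcm_decode(codes: list[int]) -> list[int]:
    predicted, step, out = 0, 16, []
    for code in codes:
        predicted = clip(predicted + code * step, -32768, 32767)
        out.append(predicted)
        step = next_step(step, code)
    return out

samples = [0, 100, 300, 600, 800, 900, 950, 960]
codes = adpcm_encode(samples)
print(codes)
print(adpcm_decode(codes))
```

The decoder lags behind the fast rise at first, then catches up as the step size grows, which is the trade-off adaptive schemes make.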

Sub-band coding
  • Low frequencies have fewer cycles per second and
    thus lots of small differences
  • High frequencies have larger differences
  • Dividing signal into frequency bands allows low
    frequencies to be coded with fewer bits than high
    frequencies
  • Bands to which ear is less sensitive can be less
    accurately stored

Speech compression
  • Musical sound has little silence
  • Speech has many pauses and silences
  • These can be replaced by duration codes
  • Can reduce a signal by 50% by doing this
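
Replacing silences with duration codes might look like the sketch below. The threshold, the minimum run length, and the ("silence", n) marker format are all hypothetical; note that near-silent samples come back as exact zeros, so the scheme is lossy.

```python
def squeeze_silence(samples: list[int], threshold: int = 2, min_run: int = 4) -> list:
    """Replace runs of near-silent samples with ("silence", length) markers."""
    out, run = [], []

    def flush() -> None:
        if len(run) >= min_run:
            out.append(("silence", len(run)))   # duration code
        else:
            out.extend(run)                     # too short to be worth coding
        run.clear()

    for s in samples:
        if abs(s) <= threshold:
            run.append(s)
        else:
            flush()
            out.append(s)
    flush()
    return out

def expand_silence(encoded: list) -> list[int]:
    out = []
    for item in encoded:
        if isinstance(item, tuple):
            out.extend([0] * item[1])
        else:
            out.append(item)
    return out

speech = [40, -35, 0, 1, 0, -1, 0, 0, 52, 1, 47]
packed = squeeze_silence(speech)
print(packed)   # [40, -35, ('silence', 6), 52, 1, 47]
```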

  • Predictive techniques need knowledge of what has
    gone before
  • If a stream (e.g. live radio feed) is opened in
    the middle, this state information is unavailable
  • Therefore, insert checkpoints that contain:
  • Uncompressed samples, or
  • Compressor state vector
  • Checkpoints allow decompressor to reset itself
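
As a sketch of the first option (uncompressed samples as checkpoints), the code below stores a full sample every few packets in a difference-coded stream. The interval and the ("full", ...) and ("diff", ...) packet format are assumptions for illustration.

```python
CHECKPOINT_EVERY = 4   # hypothetical interval, in samples

def encode_with_checkpoints(samples: list[int]) -> list[tuple[str, int]]:
    """Store differences, with a full sample at every checkpoint."""
    stream, previous = [], 0
    for i, s in enumerate(samples):
        if i % CHECKPOINT_EVERY == 0:
            stream.append(("full", s))            # uncompressed sample
        else:
            stream.append(("diff", s - previous))
        previous = s
    return stream

def decode_from(stream: list[tuple[str, int]], start: int) -> list[int]:
    """Join the stream at packet `start`; output begins at the next checkpoint."""
    out, value, synced = [], 0, False
    for kind, payload in stream[start:]:
        if kind == "full":
            value, synced = payload, True         # decoder resets itself here
        elif synced:
            value += payload
        else:
            continue                              # no state yet, so skip
        out.append(value)
    return out

samples = [10, 12, 15, 14, 13, 13, 16, 20, 21]
stream = encode_with_checkpoints(samples)
print(decode_from(stream, 0))   # the whole signal
print(decode_from(stream, 2))   # [13, 13, 16, 20, 21]
```

A shorter interval lets a late joiner start sooner but spends more of the stream on uncompressed data.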

Non-linear coding
  • High sample resolution (bit depth) gives wide
    dynamic range
  • Reducing from 16 bits to 8 bits halves storage
    requirements, but reduces dynamic range by a
    factor of about 63,000 in power (96 dB down to
    48 dB)
  • Standard PCM is linear
  • Sample value 50 is twice the amplitude of 25
  • In an 8-bit system, sounds less than 1/256th of
    the loudest possible signal disappear

Non-linear coding
  • Ear is quite insensitive to small changes in loud
    sounds but very sensitive to same small change in
    quieter sounds
  • Linear coding is ideal for computational
    manipulation but wasteful
  • Non-linear coding uses a logarithmic scale
  • Value of 1 may be much less than 1/50th of
    intensity represented by value of 50
  • More bits for quiet sounds and fewer bits for
    very loud sounds

µ-Law & A-Law
  • µ-Law and A-Law use logarithmic compression to
    convert linear-coded PCM samples into 8-bit codes
  • Provide greater accuracy for the small (quiet)
    samples that form bulk of an audio signal
  • Human auditory system has (approx) logarithmic
    response so these techniques give highest
    accuracy where most audible
  • Dynamic range is 14 bits & 13 bits respectively
    (84 dB and 78 dB)
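
The effect of logarithmic coding on quiet samples can be seen with the continuous µ-law curve, F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ) with µ = 255. The sketch below uses that textbook formula and a simple rounding to 8-bit codes; real telephony codecs use a piecewise-linear approximation, which is not reproduced here.

```python
import math

MU = 255  # value used for 8-bit mu-law

def mu_law_encode(x: float) -> int:
    """Map a sample in [-1.0, 1.0] to a signed code in [-127, 127]."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)

def mu_law_decode(code: int) -> float:
    """Invert the companding curve."""
    y = code / 127
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def linear_round_trip(x: float) -> float:
    """Plain 8-bit linear quantisation, for comparison."""
    return round(x * 127) / 127

quiet = 0.001   # a sample at 1/1000 of full scale
print(mu_law_decode(mu_law_encode(quiet)))   # close to 0.001
print(linear_round_trip(quiet))              # 0.0: the sound has vanished
```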

Perceptual coding
  • DPCM, ADPCM, µ-Law & A-Law do not give
    high-enough compression for demanding multimedia
    and web applications
  • Using psychoacoustic models of our auditory
    system we can take information out of the audio
    signal without changing its perceptual
    characteristics (well, sort of)
  • Linear PCM captures sound as it is
  • Perceptual coding captures audio as it sounds

Perceptual coding
  • PC uses knowledge of the masking properties of
    the human auditory system and our sensitivity to
    different frequency bands
  • PC introduces significant noise into the signal
  • but in such a way that we don't hear it.
  • MP3, ATRAC (MiniDisc), DCC use perceptual coding

  • Part of an audio signal can be inaudible
  • A loud sound can mask a simultaneous quiet sound
  • A quiet sound immediately following a very loud
    sound may also be inaudible
  • E.g. you have to turn up the radio when your car
    goes faster
  • E.g. A handclap (normally loud) heard straight
    after a gun shot would sound quiet
  • PC assigns fewer bits to masked signals

MPEG audio
  • MPEG audio layers 1, 2, 3
  • Layer 3 is the most commonly used, hence MP3
  • A standard for coding an audio stream into a bit
    stream at various bit rates
  • The higher the bit rate, the more data
  • At a bit rate of 96 kbps achieve bandwidth of
    about 15 kHz and compression of 16:1
  • At 128 kbps, get closer to 20 kHz and compression
    of about 12:1
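
The compression figures can be checked against the bit rate of uncompressed PCM. The slides do not state the source format; the sketch below assumes 16-bit stereo, and it is an inference that the quoted ratios correspond to a 48 kHz source, since CD-rate audio gives slightly lower ratios.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kbps."""
    return sample_rate_hz * bits * channels / 1000

def compression_ratio(source_kbps: float, coded_kbps: float) -> float:
    return source_kbps / coded_kbps

studio = pcm_bit_rate_kbps(48_000, 16, 2)   # 1536.0 kbps
cd = pcm_bit_rate_kbps(44_100, 16, 2)       # 1411.2 kbps

for rate in (96, 128):
    print(rate,
          compression_ratio(studio, rate),
          round(compression_ratio(cd, rate), 1))
```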

  • MiniDisc uses adaptive transform acoustic coding
    (ATRAC)
  • Compression of 5:1
  • Like MP3, uses perceptual coding and sub-band
    coding
  • ATRAC uses three sub-bands, MP3 uses 32

  • Streaming is the process of sending an audio file
    as a continuous stream that can be played back
    the moment the stream starts
  • Avoids having to download the file first
  • Suitable for live situations, e.g. webcasts,
    internet radio, etc.
  • Need to know about network capabilities of client
  • e.g. no point sending 128 kbps MP3 audio to a
    56k modem client
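
Matching the stream to the client could be as simple as picking the highest available bit rate that fits the connection. The function, the list of rates, and the 80% headroom factor below are hypothetical choices for illustration.

```python
from typing import Optional

def choose_bit_rate(available_kbps: list[int],
                    client_kbps: float,
                    headroom: float = 0.8) -> Optional[int]:
    """Highest stream rate that fits within a fraction of the client's bandwidth."""
    budget = client_kbps * headroom
    usable = [rate for rate in available_kbps if rate <= budget]
    return max(usable) if usable else None

rates = [32, 64, 96, 128]
print(choose_bit_rate(rates, client_kbps=56))     # 32
print(choose_bit_rate(rates, client_kbps=1000))   # 128
print(choose_bit_rate(rates, client_kbps=20))     # None
```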

  • Smooth signal heard where transmitter sends data
    at least as fast as client can decode it
  • Low-bandwidth connections and network congestion
    lead to a low stream rate: either poorer-quality
    audio, or glitches and pauses
  • Popular formats are RealAudio, MS ASF, Apple
    QuickTime

Creating streamed content
  • Very simple
  • Connect a live feed to a streaming-enabled media
    server
  • Use tools such as Windows Media Encoder or Real's
    Helix Producer to turn audio files into
    streamable files. Even Sound Forge can save as
    .ASF and .RM
  • Select required bit rate/bandwidth
  • Some services provide multiple bit rates
