Serving, shrinking, and otherwise messing about with perfectly good audio files - PowerPoint PPT Presentation

Compression & Streaming. Dr Paul Vickers

Provided by: paulvi
Learn more at: http://computing.unn.ac.uk

1
Compression & Streaming
  • Serving, shrinking, and otherwise messing about
    with perfectly good audio files

2
Loudness and power
  • Loudness related to force with which a sound
    presses on your eardrum
  • The more power, the louder the sound
  • Power is proportional to the square of a sound's
    amplitude (or voltage)

3
Sampling error and noise
  • CD audio uses 44.1 kHz at 16-bit resolution
  • Sampled voltages quantised to -32768 to +32767
  • Quantisation introduces error (through rounding)
  • Largest error is 0.5, which is 2^-16 times the
    amplitude of the loudest sample value
  • Power related to square of amplitude, so error has
    power 2^-32 times that of the loudest signal
  • Ratio of signal to error (noise) is 2^32:1
  • Or 96.3 dB (10 log10(2^32))
  • SNR of 96 dB
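
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own reasoning (worst-case rounding error of 0.5 against the largest sample magnitude); the function name is made up for illustration.

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """Signal-to-noise ratio in dB for linear PCM with the given resolution."""
    peak = 2 ** (bits - 1)               # largest sample magnitude, e.g. 32768
    max_error = 0.5                      # worst-case rounding error
    amplitude_ratio = peak / max_error   # equals 2**bits
    power_ratio = amplitude_ratio ** 2   # power goes as amplitude squared
    return 10 * math.log10(power_ratio)

print(round(quantisation_snr_db(16), 1))  # 96.3
print(round(quantisation_snr_db(8), 1))   # 48.2
```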

4
Signal to noise ratio
  • So, CD audio has SNR of 96 dB
  • 8-bit sampling has SNR of 48 dB
  • Therefore, 1 bit of resolution adds approx. 6 dB
    to the dynamic range
  • Threshold of pain is 120 dB so we need 20-bit
    resolution to capture the dynamic range of the
    human auditory system
  • Loud samples are rare, so noise is more
    noticeable than the theory would suggest
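
The "6 dB per bit" rule of thumb can be turned round to ask how many bits a given dynamic range needs. A minimal sketch, assuming the exact per-bit figure of 20 log10(2) rather than the rounded 6 dB; the names are illustrative only.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB of dynamic range per bit

def bits_for_dynamic_range(range_db: float) -> int:
    """Smallest whole number of bits whose dynamic range covers range_db."""
    return math.ceil(range_db / DB_PER_BIT)

print(round(DB_PER_BIT, 2))          # 6.02
print(bits_for_dynamic_range(96))    # 16
print(bits_for_dynamic_range(120))   # 20
```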

5
Coding
  • A standard .WAV file (no such thing) stores
    samples as 16-bit values.
  • These values are codes representing the voltages
    (amplitudes) of the signal
  • System called pulse code modulation (contrast
    with pulse amplitude modulation and pulse width
    modulation)
  • WAV format actually supports nearly 100 different
    coding systems

6
Compression
  • Lossless compression (e.g. LZW) does not work
    well on audio as there are very few repeating
    patterns
  • Sampled audio tends to have random noise in the
    least significant bits making very few bytes
    identical.
  • WinZip hardly compresses audio files at all
  • Try girl2.wav and 528 Hz.wav. Why does the second
    file compress 2.33:1?
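
The effect is easy to reproduce without the two files. The sketch below does not use girl2.wav or 528 Hz.wav; it generates one second of a pure 528 Hz tone and one second of the same tone with random noise added (a stand-in for real recorded audio, which is an assumption), then compares how well zlib, a general-purpose lossless compressor, copes with each.

```python
import math
import random
import struct
import zlib

RATE = 44100  # samples per second

def pack16(samples):
    """Pack integers as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)

def ratio(raw: bytes) -> float:
    """Compression ratio (original size / compressed size)."""
    return len(raw) / len(zlib.compress(raw, 9))

# One second of a pure tone: the waveform repeats, so patterns recur.
tone = [round(20000 * math.sin(2 * math.pi * 528 * n / RATE)) for n in range(RATE)]

# The same tone at half level plus noise: the low bits are now random.
rng = random.Random(0)
noisy = [s // 2 + rng.randint(-8000, 8000) for s in tone]

tone_ratio = ratio(pack16(tone))
noisy_ratio = ratio(pack16(noisy))
print("pure tone  %.2f:1" % tone_ratio)
print("noisy tone %.2f:1" % noisy_ratio)
```

The pure tone compresses noticeably because its waveform repeats; the noisy signal barely shrinks because almost no byte sequences recur.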

7
Other techniques
  • Need some different compression techniques
  • Popular ones are:
    • Differential PCM (DPCM)
    • Adaptive DPCM (ADPCM)
    • A-Law
    • µ-Law
    • Logarithmic non-linear codings
    • Perceptual codings

8
Differential PCM
  • Consider the differences in value between
    individual samples at rates of, say, 44.1 kHz
  • Usually fairly small
  • Small differences need fewer bits than the
    samples themselves
  • So, DPCM stores sample differences, hence the
    name
  • Leads to some inaccuracy and requires look ahead
    to balance things out

9
DPCM example
  • To reduce 8-bit sample values to 4-bit
    differences
  • Consider three samples of 17, 28, 30
  • Differences 11, 2
  • 4-bit system only allows values -8 to +7 (1000 to
    0111)
  • Thus 11 overflows, therefore clipped at 7
  • But decompressing would then give 17, 24, 26
  • But if we look at diff. between decompressed
    sample and next actual
  • 17 to 28: 11 → 7. 17 + 7 = 24. Diff. 24 to 30: 6
  • Gives 7, 6 which, when decompressed, gives 17, 24,
    30
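
The worked example can be sketched in Python. The encoder below tracks what the decompressor will reconstruct and takes each difference against that value rather than against the previous actual sample. The function names are illustrative, and storing the first sample uncompressed is an assumption.

```python
def clip(value, lo=-8, hi=7):
    """Clip a difference to the signed 4-bit range."""
    return max(lo, min(hi, value))

def dpcm_encode_naive(samples):
    """Clip differences between successive actual samples (errors accumulate)."""
    return [clip(b - a) for a, b in zip(samples, samples[1:])]

def dpcm_encode(samples):
    """Clip differences against the value the decoder will have reconstructed."""
    decoded = samples[0]
    diffs = []
    for actual in samples[1:]:
        diff = clip(actual - decoded)
        diffs.append(diff)
        decoded += diff          # mirror the decoder's state
    return diffs

def dpcm_decode(first, diffs):
    """Rebuild samples by accumulating differences from the first sample."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

samples = [17, 28, 30]
print(dpcm_decode(17, dpcm_encode_naive(samples)))  # [17, 24, 26]
print(dpcm_encode(samples))                         # [7, 6]
print(dpcm_decode(17, dpcm_encode(samples)))        # [17, 24, 30]
```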

10
Predictor based compression
  • Try to predict next sample on basis of previous
    samples
  • If correct, no need to store sample as
    decompressor uses same rules and so can work it
    out too
  • If prediction correct, output 1 else output 0
    followed by actual sample
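
A minimal sketch of the scheme follows. The slide does not say which predictor is used, so the one here (straight-line extrapolation from the two previous samples) is purely an assumption, and tuples stand in for the 1 and 0 flag bits.

```python
def predict(history):
    """Guess the next sample by extending the line through the last two."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def encode(samples):
    """Emit (1,) for a correct prediction, else (0, actual_sample)."""
    history, tokens = [], []
    for s in samples:
        tokens.append((1,) if predict(history) == s else (0, s))
        history.append(s)
    return tokens

def decode(tokens):
    """Apply the same predictor; take the stored sample on a miss."""
    history = []
    for tok in tokens:
        history.append(predict(history) if tok[0] == 1 else tok[1])
    return history

samples = [0, 2, 4, 6, 8, 7, 6, 5]
tokens = encode(samples)
print(tokens)
print(decode(tokens) == samples)  # True
```

Only the two samples where the slope changes have to be stored; every other sample costs a single flag.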

11
Adaptive DPCM
  • ADPCM uses prediction
  • Outputs predicted differences. If accurate then
    diff between actual and predicted samples has
    lower variance than actual samples and thus takes
    fewer bits
  • Uses 4-bit codes representing predicted diff.
    between two 16-bit samples

12
Sub-band coding
  • Low frequencies have fewer cycles per second and
    thus lots of small differences
  • High frequencies have larger differences
  • Dividing signal into frequency bands allows low
    frequencies to be coded with fewer bits than high
    frequencies
  • Bands to which ear is less sensitive can be less
    accurately stored

13
Speech compression
  • Musical sound has little silence
  • Speech has many pauses and silences
  • These can be replaced by duration codes
  • Can reduce a signal by 50% by doing this
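
Replacing silences with duration codes might look like the sketch below. The threshold, the minimum run length and the token format are all assumptions made for illustration. Near-silent samples are decoded as exact zeros, so the scheme is slightly lossy.

```python
def encode_silence(samples, threshold=2, min_run=4):
    """Replace runs of near-silent samples with ("silence", length) codes."""
    out, i = [], 0
    while i < len(samples):
        j = i
        while j < len(samples) and abs(samples[j]) <= threshold:
            j += 1
        if j - i >= min_run:
            out.append(("silence", j - i))   # duration code
            i = j
        else:
            out.append(("sample", samples[i]))
            i += 1
    return out

def decode_silence(tokens):
    """Expand duration codes back into runs of zero samples."""
    out = []
    for kind, value in tokens:
        out.extend([0] * value if kind == "silence" else [value])
    return out

speech = [5, 9, 0, 1, 0, 0, -1, 0, 7, 3]
tokens = encode_silence(speech)
print(tokens)
print(decode_silence(tokens))  # [5, 9, 0, 0, 0, 0, 0, 0, 7, 3]
```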

14
Checkpointing
  • Predictive techniques need knowledge of what has
    gone before
  • If a stream (e.g. live radio feed) is opened in
    the middle, this state information is unavailable
  • Therefore, insert checkpoints that contain:
    • Uncompressed samples, or
    • Compressor state vector
  • Checkpoints allow decompressor to reset itself
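
The idea can be sketched with a simple difference coder that emits an uncompressed sample at fixed intervals. The interval, the token format and the use of unclipped differences are assumptions chosen to keep the example short. A decoder that joins mid-stream discards data until the next checkpoint arrives.

```python
def encode_with_checkpoints(samples, interval=4):
    """Difference-code samples, inserting an uncompressed sample periodically."""
    stream, prev = [], 0
    for i, s in enumerate(samples):
        if i % interval == 0:
            stream.append(("checkpoint", s))   # uncompressed sample
        else:
            stream.append(("diff", s - prev))
        prev = s
    return stream

def decode_from(stream, start=0):
    """Decode from any position, waiting for a checkpoint before producing output."""
    out, prev = [], None
    for kind, value in stream[start:]:
        if kind == "checkpoint":
            prev = value                       # reset decoder state
        elif prev is None:
            continue                           # no state yet: skip this token
        else:
            prev += value
        out.append(prev)
    return out

samples = [10, 12, 15, 14, 13, 13, 16, 20, 21]
stream = encode_with_checkpoints(samples)
print(decode_from(stream))            # [10, 12, 15, 14, 13, 13, 16, 20, 21]
print(decode_from(stream, start=2))   # [13, 13, 16, 20, 21]
```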

15
Non-linear coding
  • High sample resolution (bit depth) gives wide dynamic range
  • Reducing from 16 bits to 8 bits halves storage
    requirements, but reduces dynamic range by 63,000
    times (96 dB down to 48 dB)
  • Standard PCM is linear
  • Sample value 50 is twice the amplitude of 25
  • In 8-bit system, sounds less than 1/256th of
    loudest possible signal disappear
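
Both points can be illustrated in a few lines. The sketch assumes the simplest possible reduction, dropping the low 8 bits of each 16-bit sample; the function name is hypothetical.

```python
def to_8_bit(sample16: int) -> int:
    """Reduce a signed 16-bit sample to 8 bits by discarding the low byte."""
    return int(sample16 / 256)   # truncates towards zero

# A 48 dB drop in dynamic range expressed as a power ratio.
power_ratio = 10 ** ((96 - 48) / 10)
print(round(power_ratio))        # 63096

# Loud samples survive; quiet ones collapse to zero.
print(to_8_bit(32767))           # 127
print(to_8_bit(200))             # 0
print(to_8_bit(-100))            # 0
```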

16
Non-linear coding
  • Ear is quite insensitive to small changes in loud
    sounds but very sensitive to same small change in
    quieter sounds
  • Linear coding ideal for computational manipulation
    but wasteful
  • Non-linear coding uses a logarithmic scale
  • Value of 1 may be much less than 1/50th of
    intensity represented by value of 50
  • More bits for quiet sounds and fewer bits for
    very loud sounds

17
µ-Law & A-Law
  • µ-Law and A-Law use logarithmic compression to
    convert linear-coded PCM samples into 8-bit codes
  • Provide greater accuracy for the small (quiet)
    samples that form bulk of an audio signal
  • Human auditory system has (approx) logarithmic
    response so these techniques give highest
    accuracy where most audible
  • Dynamic range is 14 bits and 13 bits respectively
    (84 dB and 78 dB)
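
The benefit for quiet samples can be seen with the continuous µ-law companding curve (µ = 255). This is only a sketch: telephony codecs use a piecewise approximation of this curve rather than the formula itself, and the 127-level quantiser here is a simplification chosen for illustration.

```python
import math

MU = 255

def mu_compress(x: float) -> float:
    """Map x in [-1, 1] onto a logarithmic scale, also in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantise(v: float, levels: int = 127) -> float:
    """Round to the nearest of `levels` steps on each side of zero."""
    return round(v * levels) / levels

quiet = 0.001                                  # a very quiet sample
linear = quantise(quiet)                       # plain 8-bit linear coding
companded = mu_expand(quantise(mu_compress(quiet)))

print(linear)      # 0.0, the sample has vanished
print(companded)   # close to 0.001
```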

18
Perceptual coding
  • DPCM, ADPCM, µ-Law & A-Law do not give
    high-enough compression for demanding multimedia
    and web applications
  • Using psychoacoustic models of our auditory
    system we can take information out of the audio
    signal without changing its perceptual
    characteristics (well, sort of)
  • Linear PCM captures sound as it is
  • Perceptual coding captures audio as it sounds

19
Perceptual coding
  • PC uses knowledge of the masking properties of
    the human auditory system and our sensitivity to
    different frequency bands
  • PC introduces significant noise into the signal,
    but in such a way that we don't hear it
  • MP3, ATRAC (MiniDisc), DCC use perceptual coding
    techniques

20
Masking
  • Part of an audio signal can be inaudible
  • A loud sound can mask a simultaneous quiet sound
  • A quiet sound immediately following a very loud
    sound may also be inaudible
  • E.g. you have to turn up the radio when your car
    goes faster
  • E.g. A handclap (normally loud) heard straight
    after a gun shot would sound quiet
  • PC assigns fewer bits to masked signals

21
MPEG audio
  • MPEG audio layer 1, 2, 3
  • Most commonly use layer 3, hence MP3
  • A standard for coding an audio stream into a bit
    stream at various bit rates
  • The higher the bit rate, the more data
  • At a bit rate of 96 kbps achieve bandwidth of
    about 15 kHz and compression of 16:1
  • At 128 kbps, get closer to 20 kHz and compression
    of about 12:1

22
ATRAC
  • MiniDisc uses adaptive transform acoustic coding
  • Compression of 5:1
  • Like MP3 uses perceptual coding and sub-band
    compression
  • ATRAC uses three sub-bands, MP3 uses 32

23
Streaming
  • Streaming is the process of sending an audio file
    as a continuous stream that can be played back
    the moment the stream starts
  • Avoids having to download the file first
  • suitable for live situations, e.g. web casts,
    internet radio, etc.
  • Need to know about network capabilities of client
  • e.g. no point sending 128 kbps MP3 audio to a 56k
    modem client
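
The check is trivial to express in code. The sketch below assumes the server has a hypothetical list of available bit rates and knows the client's link speed; it ignores protocol overhead, which in practice reduces the usable bandwidth further.

```python
def can_stream(stream_kbps: float, link_kbps: float) -> bool:
    """True if the link can carry the stream in real time."""
    return stream_kbps <= link_kbps

def pick_bit_rate(available_kbps, link_kbps):
    """Highest available bit rate the client's link can carry, or None."""
    usable = [r for r in available_kbps if can_stream(r, link_kbps)]
    return max(usable) if usable else None

print(can_stream(128, 56))                    # False
print(pick_bit_rate([32, 64, 96, 128], 56))   # 32
print(pick_bit_rate([128], 56))               # None
```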

24
Streaming
  • Smooth signal heard where transmitter sends data
    at least as fast as client can decode it
  • Low-bandwidth connections and network congestion
    lead to low stream rate: either poorer quality
    audio, or glitches and pauses
  • Popular formats are RealAudio, MS ASF, Apple
    QuickTime

25
Creating streamed content
  • Very simple
  • Connect a live feed to a streaming-enabled media
    producer
  • Use tools such as Windows Media Encoder or Real's
    Helix Producer to turn audio files into
    streamable files. Even Sound Forge can save as
    .ASF and .RM
  • Select required bit rate/bandwidth
  • Some services provide multiple bit rates

26
Example
  • http://computing.unn.ac.uk/staff/cgpv1/music!.htm