Audio

Slides: 20

Provided by: Hao65

1
Audio

Hao Jiang
Computer Science Department
Boston College
Oct. 11, 2007

2
Digital Audio

Audio comes from different sources:
  Speech.
  Sounds of instruments; music.
  Sounds of all other kinds (the sound of wind, trains and the ocean).
Audio needs new methods for coding and processing.
Audio processing is a key task in multimedia systems:
  Audio coding (MPEG audio, MP3, AAC and others).
  Authoring and representation (composition).
  Analysis and searching (retrieval and databases).
  3D sound, etc.
We will focus on basic audio processing, MPEG audio and related topics.

3
Audio Processing

Audio authoring

Audio file formats: waveform files and MIDI (Musical Instrument Digital Interface). Instead of storing waveform samples, a MIDI file holds a sequence of commands that tell an audio device to generate specified notes with given properties.
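The Python sketch below shows roughly what such commands look like. It builds raw MIDI Note On and Note Off channel messages. Each message is a status byte (0x90 or 0x80 plus the channel number), followed by a note number and a velocity, both in the range 0 to 127. A real MIDI file also stores delta times and chunk headers, which are omitted here. The helper names are hypothetical. The pitch conversion assumes equal temperament with note 69 = A4 = 440 Hz.

```python
def _check_range(channel: int, note: int, velocity: int) -> None:
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """3-byte Note On message: status 0x90 | channel, then note and velocity."""
    _check_range(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """3-byte Note Off message: status 0x80 | channel, then note and release velocity."""
    _check_range(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def note_frequency(note: int) -> float:
    """Equal-tempered pitch in Hz, assuming note 69 (A4) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


if __name__ == "__main__":
    # Middle C (note 60) on channel 0: six bytes describe the note, no samples needed
    message = note_on(0, 60, 100) + note_off(0, 60)
    print(message.hex(" "))              # 90 3c 64 80 3c 00
    print(round(note_frequency(60), 2))  # 261.63
```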
4
Audio Processing Using Matlab

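To load a WAV file in Windows:
  audat = wavread('filename.wav');
  (Newer Matlab releases replace wavread with audioread: [audat, samplingrate] = audioread('filename.wav');)
Or, open the file directly and read a stream of 16-bit words (2 bytes) or single bytes, depending on the WAV format.
To play a sound, use sound(audat, samplingrate).
To display the spectrogram, use specgram (spectrogram in newer releases).
Audio analysis is done in frames 20 ms to 40 ms long.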
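Outside Matlab, the same load-and-frame steps can be sketched in Python with the standard-library `wave` and `array` modules, assuming 16-bit PCM data. `read_wav_16bit`, `split_frames` and `make_test_wav` are hypothetical names for this illustration. The 20 ms default is just one choice from the 20 ms to 40 ms range above.

```python
import array
import io
import math
import wave


def read_wav_16bit(source):
    """Read 16-bit PCM WAV data from a path or file object.

    Returns (samples, sample_rate, channels); multichannel data stays interleaved.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit (2-byte) samples")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")  # signed 16-bit words in the host's byte order
    samples.frombytes(raw)
    return samples.tolist(), rate, channels


def split_frames(samples, rate, frame_ms=20.0):
    """Split a mono signal into consecutive, non-overlapping frames of frame_ms ms.

    Trailing samples that do not fill a whole frame are dropped.
    """
    size = int(rate * frame_ms / 1000)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def make_test_wav(rate=16000, freq=440.0, seconds=1.0):
    """Hypothetical helper: synthesize a mono 16-bit sine tone as an in-memory WAV."""
    n = int(rate * seconds)
    tone = array.array("h", (int(10000 * math.sin(2 * math.pi * freq * k / rate))
                             for k in range(n)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(tone.tobytes())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    samples, rate, channels = read_wav_16bit(make_test_wav())
    frames = split_frames(samples, rate, 20.0)
    print(len(frames), len(frames[0]))  # 50 320
```

The frames here do not overlap. A one-second clip at 16 kHz therefore yields 50 frames of 320 samples.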

5
Frequency Domain Analysis

The Fourier transform can be used to decompose a signal into a sum of sinusoidal waves.
In Matlab, we can use fft (Fast Fourier Transform) for frequency-domain analysis.

[Figure: a time-domain waveform with period T, and its frequency-domain components; the base frequency is 1/T.]
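Matlab's fft is a library call. To show what it computes, here is a minimal recursive radix-2 FFT in plain Python, assuming a power-of-two input length. The test signal is an assumption for illustration. It is a 250 Hz fundamental (period T = 4 ms) plus a weaker 750 Hz harmonic, sampled at 8 kHz, so the strongest bin falls at the base frequency 1/T. `dominant_frequency` is a hypothetical helper.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(x, rate):
    """Frequency (Hz) of the strongest non-DC bin below the Nyquist frequency."""
    spectrum = fft(x)
    n = len(x)
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    return k * rate / n


if __name__ == "__main__":
    rate, n = 8000, 256
    # 250 Hz fundamental (period T = 4 ms) plus a weaker 750 Hz harmonic
    x = [math.sin(2 * math.pi * 250 * t / rate) + 0.3 * math.sin(2 * math.pi * 750 * t / rate)
         for t in range(n)]
    print(dominant_frequency(x, rate))  # 250.0
```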
6
MP3 and Others

MPEG (Moving Picture Experts Group) and ISO (International Organization for Standardization) have published several standards for digital audio coding:
  MPEG-1 Layers 1, 2 and 3 (MP3)
  MPEG-2 AAC
  MPEG-4 AAC and TwinVQ
Other standards:
  Dolby AC-3
They are widely used in consumer electronics, digital audio broadcasting, DVDs, movies, etc.

7
Perceptual Coding in MPEG
[Figure: perceptual coding block diagram with the audio input, FFT, masking threshold, dynamic bit allocation, encoder, MUX and output bit stream.]
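The dynamic bit allocation step can be sketched as a greedy loop. This is a simplified illustration, not the procedure the standard specifies. It makes three assumptions:

- A signal-to-mask ratio (SMR, in dB) is already known for each band from the masking threshold.
- Each extra bit per sample improves quantizer SNR by about 6.02 dB.
- One more bit for a band costs 12 bits in the frame, one per sample of a 12-sample block.

`allocate_bits` and its parameters are hypothetical.

```python
def allocate_bits(smr_db, budget_bits, samples_per_band=12, max_bits=15, db_per_bit=6.02):
    """Greedy sketch: give one more bit per sample to the band whose quantization
    noise is most audible, i.e. the band with the highest noise-to-mask ratio."""
    bits = [0] * len(smr_db)
    remaining = budget_bits
    while remaining >= samples_per_band:
        open_bands = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not open_bands:
            break
        # NMR (dB) = SMR - SNR, with SNR approximated as db_per_bit * bits
        worst = max(open_bands, key=lambda b: smr_db[b] - db_per_bit * bits[b])
        bits[worst] += 1
        remaining -= samples_per_band
    return bits


if __name__ == "__main__":
    smr = [20.0, 5.0, -10.0, 12.0]     # made-up SMRs for four bands
    print(allocate_bits(smr, 6 * 12))  # [3, 1, 0, 2]
```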
8
Simultaneous Masking

A strong audio component can mask weaker components at nearby frequencies.

[Figure: sound pressure level (dB) versus frequency (20 Hz to 20000 Hz, 1000 Hz marked), showing a masker, its masking threshold, and the threshold in quiet.]
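The threshold in quiet is often approximated by Terhardt's formula, which the Python sketch below evaluates. The formula is a commonly used approximation, not necessarily the exact curve in the figure. `is_audible` is a hypothetical helper. It shows how a component could be discarded when it falls below both the threshold in quiet and a masking threshold.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute hearing threshold, in dB SPL."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def is_audible(freq_hz: float, level_db: float,
               masking_threshold_db: float = float("-inf")) -> bool:
    """Hypothetical helper: a component is audible only above both thresholds."""
    return level_db > max(threshold_in_quiet_db(freq_hz), masking_threshold_db)


if __name__ == "__main__":
    for f in (20, 100, 1000, 3300, 10000, 20000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):7.1f} dB")
```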
9
Masking and Quantization
[Figure: sound pressure level (dB) versus frequency (20 Hz to 20000 Hz) across critical band A and a neighboring critical band, showing the masker, the minimum masking threshold for band A, the signal-to-mask ratio, and the SNR of (m-1)-bit and m-bit quantizers.]
A critical band defines the frequency resolution of hearing at a given frequency.
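Critical bands are commonly measured on the Bark scale. The Python sketch below converts hertz to Bark using the Zwicker and Terhardt approximation. The function names are hypothetical, and treating each one-Bark interval as one critical band is a simplification.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker-Terhardt approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def critical_band_index(freq_hz: float) -> int:
    """Index of the one-Bark-wide band containing freq_hz (a simplification)."""
    return int(hz_to_bark(freq_hz))


if __name__ == "__main__":
    # The same 100 Hz step spans almost a whole band at low frequencies,
    # but only a small fraction of a band at high frequencies
    print(hz_to_bark(300) - hz_to_bark(200))
    print(hz_to_bark(10100) - hz_to_bark(10000))
```

With this approximation, the 20 Hz to 20000 Hz range maps to roughly 0 to 25 Bark.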
10
Temporal Masking
[Figure: amplitude versus time, showing the pre-masking curve and the post-masking curve.]
11
MPEG Perceptual Model

A Matlab demo.

12
MPEG Audio Layer 1

MPEG-1 and MPEG-2 audio allow sampling rates of 44.1, 48, 32, 22.05, 24 and 16 kHz.
MPEG filters the input audio into 32 subbands.

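[Figure: Layer 1 encoding. A frame of 384 audio samples goes through filtering and downsampling into 32 subbands of 12 samples each; each block of 12 samples is normalized by a scale factor and passed to the perceptual coder.]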
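The "normalize by scale factor" step can be sketched as follows. The Python below assumes subband samples in [-1, 1]. It also assumes a scale-factor table of the form 2 · 2^(-i/3) for i = 0 to 62, which is meant to follow the Layer 1/2 table but should be checked against ISO/IEC 11172-3. The 32-band analysis filter bank that produces the subband samples is not shown, and the function names are hypothetical.

```python
# Assumed Layer 1/2 scale-factor table: 2 * 2**(-i/3) for i = 0..62 (decreasing)
SCALE_FACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]


def choose_scale_factor(block):
    """Return (index, value) of the smallest table entry >= the block's peak magnitude."""
    peak = max(abs(s) for s in block)
    best = 0
    for i, sf in enumerate(SCALE_FACTORS):
        if sf < peak:
            break
        best = i
    return best, SCALE_FACTORS[best]


def normalize_block(block):
    """Divide a 12-sample subband block by its scale factor.

    Values land in [-1, 1] as long as the peak does not exceed 2.0 (the largest entry).
    """
    index, sf = choose_scale_factor(block)
    return index, [s / sf for s in block]


def normalize_frame(subbands):
    """Normalize a Layer 1 frame given as 32 subbands x 12 samples (384 samples)."""
    if len(subbands) != 32 or any(len(b) != 12 for b in subbands):
        raise ValueError("expected 32 subbands of 12 samples each")
    return [normalize_block(b) for b in subbands]
```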
13
MPEG Audio Layer 2

Layer 2 is very similar to Layer 1, but codes three 12-sample blocks of each subband together.
It also improves the quantization of the scale factors, and it groups three audio samples together in bit assignment.

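[Figure: Layer 2 encoding. A frame of 3 x 384 = 1152 audio samples goes through filtering and downsampling into 32 subbands of 36 samples each; the samples are normalized by scale factors and passed to the perceptual coder.]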
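Grouping three samples helps when a quantizer has a number of levels that is not a power of two. For example, three 3-level samples need 2 bits each (6 bits) when coded separately. Only 27 combinations exist, however, and 27 combinations fit in a single 5-bit codeword. The Python sketch below shows the packing for example 3-, 5- and 9-level quantizers, assuming quantized values from 0 to levels - 1. The function names are hypothetical.

```python
def group_three(samples, levels):
    """Pack three quantized values (each 0..levels-1) into one codeword."""
    s1, s2, s3 = samples
    if not all(0 <= s < levels for s in samples):
        raise ValueError("sample out of range")
    return s1 + levels * s2 + levels * levels * s3


def ungroup_three(code, levels):
    """Inverse of group_three."""
    s1 = code % levels
    code //= levels
    return s1, code % levels, code // levels


def bits_separate(levels):
    """Bits for three values coded one at a time."""
    return 3 * (levels - 1).bit_length()


def bits_grouped(levels):
    """Bits for one codeword covering levels**3 combinations."""
    return (levels ** 3 - 1).bit_length()


if __name__ == "__main__":
    for levels in (3, 5, 9):
        print(levels, bits_separate(levels), bits_grouped(levels))
    # 3 6 5
    # 5 9 7
    # 9 12 10
```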
14
Overlapped Transform and MDCT
In an overlapped transform, each block of 2N samples is transformed into N coefficients.
[Figure: overlapping windows 1 to 4, each 2N samples long. In the reverse transform, the outputs of windows 1 to 4 are overlapped to give the reconstructed result.]
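The forward and reverse steps in the figure can be checked numerically. The Python sketch below is a direct O(N^2) MDCT with a sine window and 50% overlap. It assumes the IMDCT is scaled by 2/N, since normalizations vary between texts. With that scaling, overlap-adding the windowed inverse transforms cancels the time-domain aliasing and reconstructs the input. The zero padding and the function names are choices made for this illustration.

```python
import math


def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n_coeffs = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    n_coeffs = len(coeffs)
    return [
        2.0 / n_coeffs * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def mdct_roundtrip(signal, n_coeffs):
    """Analyze with 50%-overlapping windowed MDCT blocks, then overlap-add the IMDCTs."""
    window = sine_window(2 * n_coeffs)
    tail = (-len(signal)) % n_coeffs
    # Pad N zeros at each end so every real sample is covered by exactly two windows
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * (n_coeffs + tail)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = [padded[start + i] * window[i] for i in range(2 * n_coeffs)]
        coeffs = mdct(block)                   # 2N samples -> N coefficients
        for i, v in enumerate(imdct(coeffs)):  # N coefficients -> 2N samples
            out[start + i] += v * window[i]    # synthesis window, then overlap-add
    return out[n_coeffs:n_coeffs + len(signal)]
```

In a codec, the coefficients would be quantized between `mdct` and `imdct`. Here they pass through unchanged, so the output matches the input up to floating-point rounding.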
15
Some Matlab Codes

The program compares the DCT and the MDCT in audio processing.
The code is available on the course website as the tarball mdct_and_dct.tar.

16
MP3

MP3 (MPEG-1 Layer 3) is built on top of MPEG audio Layer 2.
MP3 further applies an MDCT to each subband and encodes the MDCT coefficients.
MP3 then uses Huffman coding to compress the bit stream further, losslessly.
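The lossless Huffman stage can be illustrated with a small Python sketch. MP3 selects among predefined Huffman tables rather than building a code for each file. This sketch builds a code from symbol counts only to show the principle. The sample data, quantized values dominated by zeros, is made up.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far}); the tie breaker
    # keeps heapq from ever comparing the dictionaries
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Walk the bit string; the prefix-free property makes each match unambiguous."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    data = [0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0]
    code = huffman_code(data)
    bits = encode(data, code)
    print(code, len(bits), "bits instead of", 2 * len(data))
```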

17
File Format
MPEG audio puts a header in each frame, so that frames can be decoded separately.
[Figure: consecutive frames (Frame 1, Frame 2), each laid out as Header | CRC | Bit Allocation | Scale factors | Subband Data.]
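Because each frame starts with its own header, a decoder can locate frames and read their parameters independently. The Python sketch below decodes a 4-byte header. It assumes the MPEG-1 field layout of ISO/IEC 11172-3: 12-bit sync word, ID bit, layer, protection bit, bit-rate index, sampling frequency, padding, private bit and mode. It treats ID = 0 as MPEG-2 at half the sampling rate. `parse_header` is a hypothetical name, and the bit-rate table lookup is left out.

```python
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}                  # layer code 0b00 is reserved
BASE_RATES = {0b00: 44100, 0b01: 48000, 0b10: 32000}  # rate code 0b11 is reserved
MODES = {0: "stereo", 1: "joint stereo", 2: "dual channel", 3: "single channel"}


def parse_header(data: bytes) -> dict:
    """Decode the fixed 32-bit header at the start of an MPEG audio frame."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if (h >> 20) != 0xFFF:
        raise ValueError("missing 12-bit sync word")
    mpeg1 = ((h >> 19) & 1) == 1  # ID bit: 1 = MPEG-1, 0 = MPEG-2 (half rates)
    layer_code = (h >> 17) & 0b11
    rate_code = (h >> 10) & 0b11
    if layer_code not in LAYERS or rate_code not in BASE_RATES:
        raise ValueError("reserved layer or sampling-rate code")
    rate = BASE_RATES[rate_code] if mpeg1 else BASE_RATES[rate_code] // 2
    return {
        "mpeg1": mpeg1,
        "layer": LAYERS[layer_code],
        "crc_present": ((h >> 16) & 1) == 0,  # protection_bit 0: a 16-bit CRC follows
        "bitrate_index": (h >> 12) & 0xF,
        "sample_rate": rate,
        "padding": ((h >> 9) & 1) == 1,
        "mode": MODES[(h >> 6) & 0b11],
    }


if __name__ == "__main__":
    print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```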
18
Other Audio Coding Standards