Title: Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
1 Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
Alexey Lukin, AES Student Member, Moscow State University, Moscow, Russia
Jeremy Todd, AES Member, iZotope, Inc., Cambridge, MA
2 Short-Time Fourier Transform
- Most commonly used transform for audio
- Spectral analysis
- Noise reduction (spectral subtraction algorithms)
- Time-variable filters and other effects
- Very fast implementation for a large number of bands via FFT
- Good energy compaction for many musical signals
- Many oscillations in basis functions → ringing (Gibbs phenomenon)
- Uniform frequency resolution → inadequate resolution at low frequencies
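The uniform frequency resolution mentioned above can be seen in a minimal NumPy sketch of a Hann-windowed STFT. The function name `stft_mag`, the 44.1 kHz sample rate, and the window and hop sizes are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def stft_mag(x, win_size, hop):
    """Magnitude STFT: rows are frequency bins, columns are frames."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack([x[i * hop : i * hop + win_size] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 44100                           # assumed sample rate in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz test tone
win_size = 2048                      # assumed window size in samples
mag = stft_mag(x, win_size, hop=512)

# Every bin is the same width in Hz, at bass and treble alike.
bin_spacing_hz = fs / win_size
peak_bin = int(np.argmax(mag[:, 0]))
```

With these assumed settings each bin is about 21.5 Hz wide everywhere, which is coarse relative to the spacing of low bass notes and finer than needed in the treble.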
3 Filter banks
- Idea
- Decompositions of time-frequency plane
Uncertainty principle
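For an STFT, the uncertainty trade-off shows up as simple arithmetic: a longer window narrows the frequency bins by exactly the factor by which it lengthens the time span. The sketch below assumes a 44.1 kHz sample rate and power-of-two window sizes; neither is stated in the slides. At that rate, 512 and 4096 samples come out at roughly 12 ms and 93 ms, which may be the sizes behind the window lengths quoted in the spectrogram captions.

```python
fs = 44100  # assumed sample rate in Hz

def resolution(win_size, fs):
    """Return (window duration in ms, FFT bin spacing in Hz)."""
    duration_ms = 1000.0 * win_size / fs
    bin_spacing_hz = fs / win_size
    return duration_ms, bin_spacing_hz

table = {n: resolution(n, fs) for n in (256, 512, 1024, 2048, 4096)}
for n, (ms, hz) in table.items():
    # duration in seconds times bin spacing in Hz is 1 for every window size
    print(f"{n:5d} samples: {ms:6.1f} ms window, {hz:6.1f} Hz per bin")
```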
4 Suggested approach
Transforms must vary their time-frequency resolution in a perceptually motivated way
- Imitation of time-frequency resolution of human hearing
- Adaptation of resolution to local signal features
5 Spectrograms
- Problems
- Most perceptually meaningful energy is concentrated in the narrow band below 4 kHz → can't see useful details
- Time/frequency resolution trade-off
Conventional STFT spectrogram (linear frequency scale)
6 Spectrograms
- Problems
- Poor frequency resolution at low frequencies → can't separate bass harmonics from bass drum
- Time/frequency resolution trade-off
Mel-scale STFT spectrogram (window size 12 ms)
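To illustrate why a mel-scale display gives the band below 4 kHz more room than a linear one, here is a small sketch. It assumes the widely used formula mel = 2595 log10(1 + f/700) and a 44.1 kHz sample rate; the slides do not say which mel variant or sample rate was used.

```python
import math

def hz_to_mel(f_hz):
    """Common mel formula (an assumption; the slides do not name a variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

nyquist = 22050.0  # assumes 44.1 kHz sampling

# Share of the vertical axis occupied by the band below 4 kHz.
linear_share = 4000.0 / nyquist
mel_share = hz_to_mel(4000.0) / hz_to_mel(nyquist)
```

Under these assumptions the band below 4 kHz takes up under a fifth of a linear axis but more than half of a mel axis. The display scale does not change the underlying STFT resolution, so the low-frequency bins are still as wide as the window size dictates.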
7 Spectrograms
- Problems
- Poor time resolution at transients → time-smearing of drums
Mel-scale STFT spectrogram (window size 93 ms)
8 Spectrograms
- Simple solution: combine spectrograms with different resolutions
- Take bass from spectrogram with good frequency resolution
- Take treble from spectrogram with good time resolution
9 Spectrograms
- Simple solution
- Combine spectrograms with different resolutions: take bass from spectrogram with good frequency resolution, take treble from spectrogram with good time resolution
Combined resolution spectrogram (window sizes from 12 to 93 ms)
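The combined-resolution idea can be sketched as follows: compute several STFTs on a shared time-frequency grid, then copy each frequency band from the STFT whose window suits it. This is only an illustration under stated assumptions. The function names, the crossover frequencies, the hop size, the 44.1 kHz sample rate, and the zero-padding used to align the grids are all hypothetical and not taken from the slides.

```python
import numpy as np

def stft_mag(x, win_size, hop, n_fft):
    """Centered, zero-padded magnitude STFT normalized by window gain."""
    window = np.hanning(win_size)
    padded = np.pad(x, n_fft // 2)          # zeros on both sides
    n_frames = 1 + len(x) // hop
    out = np.empty((n_fft // 2 + 1, n_frames))
    for k in range(n_frames):
        center = k * hop + n_fft // 2       # same frame centers for all sizes
        start = center - win_size // 2
        seg = padded[start : start + win_size]
        out[:, k] = np.abs(np.fft.rfft(seg * window, n=n_fft)) / window.sum()
    return out

def combined_spectrogram(x, fs, win_sizes, crossovers_hz, hop):
    """Longest window in the lowest band, shortest window in the highest."""
    n_fft = max(win_sizes)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = np.searchsorted(crossovers_hz, freqs, side="right")  # band per bin
    sizes = sorted(win_sizes, reverse=True)
    specs = [stft_mag(x, w, hop, n_fft) for w in sizes]
    out = np.empty_like(specs[0])
    for b, spec in enumerate(specs):
        out[band == b] = spec[band == b]
    return out, band

fs = 44100                                              # assumed sample rate
x = np.random.default_rng(0).standard_normal(fs)        # 1 s of test noise
spec, band = combined_spectrogram(
    x, fs, win_sizes=[512, 1024, 2048, 4096],
    crossovers_hz=[200.0, 800.0, 3200.0], hop=256)      # hypothetical crossovers
```

The hard switch between bands in this sketch leaves visible seams at the crossover frequencies; a smoother blend across neighboring bands would likely look better.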
10 Spectrograms
- Better approach: select best resolution for each time-frequency neighborhood
- Criteria?
- Better frequency resolution at bass (reflects a-priori psychoacoustical knowledge)
- Maximal energy compaction (to minimize spectral smearing in both time and frequency)
Figure: candidate STFT window sizes of 6, 12, 24, 48, and 96 ms, with the best one marked
11 Spectrograms
- Calculation of energy compaction (energy smearing in the given block, for all given resolutions)
Here a_{i,r} are the STFT magnitudes in the block sorted in descending order, S_r is the energy smearing for the given resolution r, and r_0 is the resolution with the best energy compaction, that is, the smallest S_r.
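The slide's formula for S_r did not survive in this transcript, so the sketch below uses a stand-in measure that is purely an assumption: the magnitude-weighted mean rank of the descending-sorted magnitudes. It is small when a few coefficients carry most of the magnitude and large when the block is evenly smeared. The names `smearing` and `best_resolution` are hypothetical, and the blocks being compared are assumed to hold the same number of coefficients.

```python
import numpy as np

def smearing(block):
    """Assumed stand-in for S_r: magnitude-weighted mean rank of the
    descending-sorted magnitudes (not necessarily the slide's formula)."""
    a = np.sort(np.abs(block).ravel())[::-1]   # a_i, sorted descending
    total = a.sum()
    if total == 0.0:
        return 0.0
    ranks = np.arange(1, a.size + 1)
    return float((ranks * a).sum() / total)

def best_resolution(blocks):
    """blocks maps a resolution label r to the magnitudes of the same
    time-frequency block at that resolution. Returns (r0, {r: S_r})."""
    scores = {r: smearing(b) for r, b in blocks.items()}
    r0 = min(scores, key=scores.get)           # least smearing wins
    return r0, scores

compact = np.zeros((8, 8))
compact[3, 4] = 1.0                            # all magnitude in one coefficient
smeared = np.ones((8, 8))                      # magnitude spread evenly
r0, scores = best_resolution({"12 ms": smeared, "93 ms": compact})
```

In this toy case the block labeled "93 ms" is selected because its magnitude sits in a single coefficient.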
12 Spectrograms
- Benefits
- Sharper bass drum hits and other transients, even in mid-frequency range
- Sharper guitar harmonics in high frequencies
Adaptive resolution spectrogram (window sizes from 12 to 93 ms)
13 Spectrograms
More examples
Conventional STFT spectrogram
Tone onset waveform
14 Spectrograms
More examples
Adaptive resolution spectrogram
Combined resolution spectrogram
15 Processing framework
- General framework for multi-resolution processing
- Perform processing with several different resolutions
- Adaptively combine (mix) results in time-frequency space
- Mixing is controlled by a-priori knowledge of psychoacoustics and analysis of local signal features (e.g. transience)
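The mixing step of the framework above can be sketched as a weighted blend on a shared time-frequency grid. This assumes the per-resolution results have already been brought onto the same grid, and that some detector supplies a transience map with values between 0 and 1; the function name `mix_resolutions` and the toy data are hypothetical.

```python
import numpy as np

def mix_resolutions(results, weights):
    """Blend per-resolution results on a shared time-frequency grid.
    results, weights: shape (n_resolutions, n_bins, n_frames)."""
    results = np.asarray(results, dtype=float)
    weights = np.asarray(weights, dtype=float)
    norm = weights.sum(axis=0)
    norm = np.where(norm > 0.0, norm, 1.0)     # avoid division by zero
    return (weights * results).sum(axis=0) / norm

# Toy data: one result from a long window, one from a short window.
long_win = np.full((4, 6), 1.0)
short_win = np.full((4, 6), 3.0)
transience = np.zeros((4, 6))
transience[:, 2] = 1.0                         # hypothetical detector output

# Favor the short window where the signal is transient.
mixed = mix_resolutions([long_win, short_win],
                        [1.0 - transience, transience])
```

Here the frame flagged as transient takes the short-window result and every other frame takes the long-window result. Frequency-dependent weights that favor long windows at bass could be folded into the same weight arrays.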
16 Noise reduction
- Spectral subtraction algorithm modifications
- Better frequency resolution at low frequencies (according to the human hearing resolution)
- Better temporal resolution near signal transients (for reduction of Gibbs phenomenon)
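For reference, the per-bin gain of textbook power spectral subtraction can be sketched as below. This is the generic single-resolution form, not the authors' modified algorithm. The parameter names `over_subtraction` and `floor` and their default values are assumptions.

```python
import numpy as np

def spectral_subtraction_gain(power, noise_power,
                              over_subtraction=1.0, floor=0.05):
    """Per-bin gain for textbook power spectral subtraction.
    power: |X|^2 of the noisy STFT; noise_power: estimated noise |N|^2."""
    power = np.maximum(np.asarray(power, dtype=float), 1e-12)
    noise_power = np.asarray(noise_power, dtype=float)
    gain_sq = 1.0 - over_subtraction * noise_power / power
    # The floor limits attenuation so that bins are never fully zeroed.
    return np.sqrt(np.maximum(gain_sq, floor ** 2))

power = np.array([[100.0, 1.0],
                  [4.0, 0.5]])                 # toy noisy power spectrogram
gain = spectral_subtraction_gain(power, noise_power=1.0)
cleaned_magnitude = gain * np.sqrt(power)
```

In a multi-resolution setting, a gain like this would be computed separately for each STFT resolution before the results are mixed.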
17 Noise reduction
- Results of single-resolution and multi-resolution
algorithms
Noisy recording (guitar and castanets)
18 Noise reduction
- Results of single-resolution and multi-resolution
algorithms
Single resolution
Multi-resolution
19 Your questions
?
Demo web page: http://www.izotope.com/tech/aes_adapt/
Poster session P17: Monday, 9:00-10:30 a.m.