Title: Perception of mid frequency and high frequency intermodulation distortion in loudspeakers, and its relationship to high-definition audio.
- (A physicist meets the twilight zone)
- David Griesinger
- Lexicon
- 3 Oak Park
- Bedford, MA 01730
2. Why Bother?
- What is the relationship between high frequency intermodulation distortion and recordings with frequency response above 20kHz?
- Why do my choral recordings sound fuzzy, particularly when played at high level?
- And does the perception of fuzziness grow more obvious with long-term listening?
- Is the short-term A/B test the ultimate Gold Standard for audio reproduction?
- Is it possible that long-term listening can reveal flaws that go undetected in a short-term A/B test?
- This talk will examine the physics and physiology that these questions involve.
- We may even suggest a few answers!
3. High Frequency Intermodulation Distortion and Ultrasonic Reproduction
- THE essential paper on this subject is Kiryu and Ashihara, "Detection Threshold for Tones above 22kHz," Convention Paper 5401, presented at the 110th AES Convention, May 12-15, 2001, Amsterdam.
- The authors presented 13 subjects with a test signal consisting of a 2kHz tone combined with odd-order harmonics, both sonic and ultrasonic.
- The ultrasonic harmonics were switched on and off at a 2Hz rate.
- ALL subjects could discriminate the ultrasonics when the combined signal was presented through a single loudspeaker.
- NONE of the subjects could discriminate the ultrasonics when each ultrasonic harmonic was reproduced from a separate speaker.
4. Spectrum at the listener position, from Kiryu and Ashihara
Note that an essential feature of this experiment is that the stimulus harmonics are all ODD. Asymmetric non-linearity produces both odd and EVEN harmonics, which are immediately visible (and audible).
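The logic of this slide can be checked numerically. The sketch below builds one period of equal-amplitude odd harmonics (harmonic k of the 2kHz tone sits at k × 2kHz) and passes it through two illustrative transfer curves: a symmetric cubic and an asymmetric square-law term. Both curves and their 0.1 coefficients are assumptions chosen for illustration, not models of any real loudspeaker or ear.

```python
import cmath
import math

N = 256                   # samples per period; the 2 kHz fundamental is DFT bin 1
ODD = range(1, 12, 2)     # harmonics 1, 3, ..., 11 (2 kHz ... 22 kHz)

def harmonic_mag(y, k):
    """Amplitude of harmonic k, from a single DFT bin over exactly one period."""
    s = sum(y[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
    return 2 * abs(s) / N

def even_energy(y):
    """Summed power of harmonics 2, 4, ..., 22."""
    return sum(harmonic_mag(y, k) ** 2 for k in range(2, 24, 2))

# Equal-amplitude odd harmonics, scaled so the peak stays below 1.
x = [sum(math.sin(2 * math.pi * k * n / N) for k in ODD) / len(ODD) for n in range(N)]

symmetric = [v - 0.1 * v ** 3 for v in x]   # odd transfer curve: f(-v) == -f(v)
asymmetric = [v + 0.1 * v ** 2 for v in x]  # square-law term breaks the symmetry

print(f"even harmonics, symmetric:  {even_energy(symmetric):.1e}")
print(f"even harmonics, asymmetric: {even_energy(asymmetric):.1e}")
```

The symmetric curve leaves the even-harmonic energy at floating-point noise, while the asymmetric curve creates clearly measurable even harmonics. That is why an odd-only stimulus is such a sensitive probe for asymmetric non-linearity.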
5. Kiryu and Ashihara's result is strong
- Their choice of source signal MAXIMIZES the (possible) audibility of an ultrasonic signal.
- The sound pressure of the ultrasonic harmonics is equal to the sound pressure of the harmonics below 20kHz.
- For almost all common sound sources the ultrasonics are weaker.
- Thus if the ultrasonics are perceived directly, through some effect of their presence on nerve firings for the lower harmonics, this signal should produce a positive result.
- The use of only odd harmonics for the source signal maximizes the chance that the ultrasonics will be perceived if ANY part of human physiology is (asymmetrically) non-linear.
- The basilar membrane is inherently asymmetrically non-linear. The hair cells are half-wave rectifiers.
- The probability of finding asymmetric non-linearities in other parts of the system is large.
6. There was a null result!
- When the EXTERNAL non-linearities were eliminated, there was NO ultrasonic perception.
- We can conclude that:
- The mechanical conduction of ultrasonics to the basilar membrane is either effectively zero, or linear (at most symmetrically non-linear).
- If significant ultrasonic energy actually reaches the basilar membrane, there are NO hair cells that respond to it.
- We will present evidence from other experiments that supports these conclusions.
7. DG experiment: Modulated Harmonics
- We wanted to measure the distortion-generating mechanism observed by Kiryu and Ashihara by inducing distortion in common tweeters.
- It seemed like a simple experiment:
- Make some sweep signals in MATLAB.
- Record a few common instruments with a B&K 4133 microphone.
- Filter out the frequencies below 20kHz.
- Play them back at various levels, and listen.
8. ENTER the TWILIGHT ZONE
- All I needed was a good quality sound card that would record and play back at 96kHz.
- Many manufacturers seem to offer such a thing at reasonable prices. I chose one by the most popular manufacturer; let's call them "C".
- After the usual frustrating two hours to purchase the board, disassemble the computer, install the drivers, call customer service when the drivers crash, download new drivers from the web, etc.:
- Although you could set the device to record at 96kHz, signals above 23kHz would not record.
- MATLAB sweeps generated at 96kHz would not play back above 23kHz.
- More calls to customer service. Typical conversation: "What interrupts is the card using? What video card are you using? What operating system are you using?" (Windows 2000 Pro.) "OH! That's your problem; try XP."
- So I decided to bite the bullet and upgrade to XP. I also upgraded to the most expensive version of this manufacturer's sound boards.
- But the XP upgrade took longer than expected at the computer shop.
9. More twilight
- I decided to buy a USB-based device that did 96/24, as this would be useful for demos from my laptop, such as at this lecture.
- But the drivers would not load in my regular laptop, and customer service had no advice at all.
- I tried a different laptop. The drivers loaded fine. Same operating system (Windows ME).
- But this external board would also not record or play above 23kHz.
- Customer service was again not very helpful: "What interrupts is the USB service routine using? You are actually looking at the output with an oscilloscope!?"
- It's a good thing I am sometimes known as Mr. Matrix!
10. A bit of light
- So I just kept calling customer service until I got an operator who was willing to answer the question "Are ANY of your boards actually capable of operating at 96kHz?"
- After a great deal of waiting on hold the answer came back:
- "NO."
- "In fact, none of our competitors do either. We all have 96/24 converters, but we run them at 48kHz."
- "What do you suggest that I do?"
- "Don't buy consumer; go to pro."
- So I did. No problem, just much more money.
11. So finally to the experiments
- A C language program was written to generate a twin-tone frequency sweep, broken into tone bursts (so as not to burn out the tweeter).
- The sweep is repeated with 6dB increases in level, so the level dependence of any distortion can be measured.
12. Spectrum of the burst signal
- The two frequencies are relatively constant during each burst, but sweep slowly upwards in frequency.
- The start and end frequencies of the sweep in each tone can be specified, along with the sweep rate.
- This allows different types of non-linearity to be tested.
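To see how the choice of start and end frequencies separates types of non-linearity, a hypothetical helper (`im_products`, not part of the author's C program) can list which intermodulation products |m·f1 + n·f2| of a two-tone burst land in the 20Hz-20kHz band, grouped by order. Second-order products require an asymmetric non-linearity, while third-order products are the leading products of a symmetric one; stopping at order 3 is an assumption of the sketch.

```python
def im_products(f1, f2, max_order=3, lo=20.0, hi=20000.0):
    """Audible intermodulation products |m*f1 + n*f2| with 2 <= |m| + |n| <= max_order.

    Returns a dict mapping order -> sorted list of frequencies (Hz) inside [lo, hi].
    Harmonics of a single tone (m == 0 or n == 0) are included in the search.
    """
    found = {}
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if order < 2 or order > max_order:
                continue
            f = abs(m * f1 + n * f2)
            if lo <= f <= hi:
                found.setdefault(order, set()).add(f)
    return {k: sorted(v) for k, v in sorted(found.items())}

print(im_products(24000.0, 26000.0))   # {2: [2000.0]}
print(im_products(21000.0, 23000.0))   # {2: [2000.0], 3: [19000.0]}
```

With 24 and 26kHz tones the only audible product is the 2kHz difference tone, so hearing it points to an asymmetric non-linearity. Moving the pair to 21 and 23kHz also brings a third-order product (2·21kHz - 23kHz = 19kHz) into the audio band.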
13. Result: nothing significant is heard.
- Subharmonics of the burst signal can be heard, with difficulty, in a quiet room.
- When the ultrasonic level is above 80dB SPL at one meter, the subharmonics are below 30dB SPL at 15cm from the tweeter.
- The levels of the subharmonics are consistent with the distortion in the amplifier: 0.1%.
- The LOUDSPEAKER seems blameless.
- 4 different loudspeakers with different tweeters were tried, with the same result.
- The observed harmonics were produced by the amplifier.
14. DG experiment 2: Rattling Keys
- A set of three house keys on a plastic key ring was shaken in front of a B&K 4133 microphone, and the output was recorded at 96kHz.
- The resulting signal has an enormous crest factor: 28dB.
- This means it is 16dB quieter than non-compressed music with the same peak signal voltage.
- And it has very high ultrasonic content.
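Crest factor is the ratio of peak level to RMS level, in dB. A minimal sketch of the calculation follows; the 12dB figure for uncompressed music is not stated on the slide, it is simply what the quoted 28dB and 16dB imply.

```python
import math

def crest_factor_db(x):
    """Peak-to-RMS ratio of a signal, in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(peak / rms)

fs = 48000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
print(f"sine crest factor: {crest_factor_db(sine):.2f} dB")   # 3.01 dB

# At equal peak voltage, each extra dB of crest factor is one dB less RMS level.
keys_cf = 28.0    # rattling keys, from the slide
music_cf = 12.0   # uncompressed music, inferred from 28 dB - 16 dB
print(f"keys are {keys_cf - music_cf:.0f} dB quieter than music at the same peak")
```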
15. Keys spectrum
16. Experiment
- Reproduce the signal with and without the ultrasonic component.
- This was done by low-pass filtering the original signal at 20kHz, and alternating it with the full bandwidth signal.
- Reproduce only the components above 20kHz and listen for ANY audible sound.
17. Keys ultrasonics
- Waveform of the frequencies above 20kHz, at the same scale as the previous slide.
18. Result: the same as for bursts
- No difference could be heard with and without the ultrasonics (but the one subject was rather old).
- When the ultrasonic signals alone were played at high levels, intermodulation products from the input signals were easily heard, at levels consistent with amplifier distortion.
19. Conclusions from DG ultrasonic tests
- The various tweeters tested (3 metal dome tweeters and one soft dome tweeter) produce insignificant amounts of intermodulation products below 20kHz when driven by ultrasonic signals.
- Amplifier distortion can produce distortion products below 20kHz that are audible (with difficulty) in the absence of other signals below 20kHz.
- But with a high quality amplifier these distortion products are not audible in the presence of even extraordinary ultrasonic sources such as rattling keys,
- unless the amplifier is driven into clipping.
20. Ultrasonic content of musical instruments
- Trumpet: spectrum of the note with the highest harmonics.
21. Sopranino Recorder
- Spectrum of the highest note (3200Hz).
22. Sopranino Recorder 2
- Although the highest note of the particular sopranino recorder I own produces ultrasonic harmonics,
- these harmonics are AT MAXIMUM 40dB below the level of the fundamental.
- Compare this to the levels used by Kiryu and Ashihara, where the ultrasonic harmonics were equal in level to the fundamental.
- Notice also that both even and odd harmonics are present in the sopranino.
- So any even-order harmonic products will be masked.
23. Percussion
- DG lacks a home drum set, so he went looking for recordings of drums.
- MORE TWILIGHT ZONE
- I have many samplers and examples of SACD and DVD-Audio disks.
- Very few had any popular music that was not resampled from 48kHz.
- So I borrowed three more samplers and five SACD disks from John Newton.
- None of the popular music samples had anything at all above 23kHz.
24. SACD examples: Sting
25. Steely Dan, Two Against Nature
26. Diana Krall
27. Jazz at 192kHz test DVD (the spot with the highest harmonics)
Notice the ultrasonic harmonics are lower than
the fundamentals by more than 42dB.
28. John Eargle, Schnittke SACD
Note that the ultrasonic harmonics disappear into the SACD noise at about 27kHz. Remember that the SACD noise is believed by everyone to be inaudible.
29. Timing accuracy and information theory
- It is widely believed that the assumed superiority of DVD-Audio and SACD comes from improved resolution due to improved timing accuracy.
- It is well known that human binaural hearing can distinguish timing differences between the ears of as little as two microseconds.
- This is often taken to imply that the frequency response of the physiological system must extend to 500kHz (the reciprocal of two microseconds).
- However, it is easy to demonstrate that a 1kHz sine tone modulated with a raised cosine can be accurately localized, even though the waveform contains NO frequencies above 1400Hz!
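The spectral claim follows from a trig identity: 0.5·(1 - cos(2π·fm·t))·sin(2π·fc·t) = 0.5·sin(2π·fc·t) - 0.25·sin(2π(fc+fm)t) - 0.25·sin(2π(fc-fm)t). The slide does not give the modulation rate; this sketch assumes fm = 400Hz because 1000Hz + 400Hz reproduces the stated 1400Hz limit, and lists every DFT bin with non-negligible energy.

```python
import cmath
import math

fs, N = 8000, 800          # 0.1 s window -> 10 Hz DFT bin spacing
fc, fm = 1000.0, 400.0     # carrier, and an assumed 400 Hz raised-cosine modulation rate

# 1 kHz sine whose envelope 0.5*(1 - cos) rises and falls smoothly 400 times per second.
x = [0.5 * (1 - math.cos(2 * math.pi * fm * n / fs)) * math.sin(2 * math.pi * fc * n / fs)
     for n in range(N)]

def bin_mag(k):
    """Amplitude of DFT bin k (frequency k * fs / N)."""
    return 2 * abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))) / N

present = [k * fs / N for k in range(N // 2) if bin_mag(k) > 1e-9]
print(present)   # [600.0, 1000.0, 1400.0]
```

Every envelope edge is smooth, yet nothing lies above fc + fm, so sharp onsets are not required to carry timing information.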
30. Bandwidth and Signal to Noise Ratio
- In physics, the accuracy of timing is not determined by the bandwidth alone, but roughly by the product of the bandwidth and the signal to noise ratio.
- Audio systems have low bandwidth but very high signal to noise ratio.
- The hair cells on the basilar membrane fire when movement causes an ion channel to open.
- Firings are maximized at positive zero crossings of the membrane motion.
31. Timing difference and signal to noise
- The timing difference between two waveforms can be determined as long as the signal to noise ratio is high enough to allow accurate determination of the zero crossings.
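A first-order estimate makes this concrete: near a zero crossing a sine of amplitude A and frequency f changes at 2πfA volts per second, so noise of RMS value σ shifts the crossing by roughly σ/(2πfA). This back-of-envelope sketch treats SNR as sine amplitude over RMS noise and ignores the further improvement from averaging many crossings.

```python
import math

def zero_crossing_jitter(freq_hz, snr_db):
    """Approximate RMS timing error (s) of one zero crossing of a sine in additive noise.

    snr_db is taken here as sine amplitude over RMS noise, in dB.
    """
    noise_over_amp = 10 ** (-snr_db / 20)
    return noise_over_amp / (2 * math.pi * freq_hz)

def snr_needed(freq_hz, jitter_s):
    """Inverse of zero_crossing_jitter: the SNR (dB) that gives a target timing error."""
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_s)

for snr in (40, 60, 96):
    print(f"1 kHz at {snr} dB SNR: {zero_crossing_jitter(1000, snr) * 1e6:.4f} us")
print(f"SNR needed for 2 us at 1 kHz: {snr_needed(1000, 2e-6):.1f} dB")
```

On this estimate a 1kHz tone needs only about 38dB of SNR for its zero crossings to be placed within 2 microseconds, far less than 16-bit audio provides.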
32. Sampling Theory and Timing
- Sampling theory proves that as long as the sampling rate is more than twice the bandwidth of a signal, ALL the information content of that signal will be retained after sampling, and the signal can be exactly reconstructed.
33. Sampling Theory and interchannel Timing
- The timing of any zero crossing is exactly preserved if SR > 2BW. Extra samples are wasted.
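As a sketch of this point, the code below samples a 1kHz tone at 48kHz in two channels, delays the second by 2 microseconds (about a tenth of one sample period), rounds both to 16 bits, and recovers the delay from the phase of a single DFT bin. The phase estimator is a convenient choice for the illustration, not a model of how the auditory system measures interaural delay.

```python
import cmath
import math

fs, N, f = 48000, 4800, 1000.0   # 0.1 s window holds exactly 100 cycles of the tone
delay = 2e-6                    # 2 microseconds, about a tenth of a 48 kHz sample period

def quantize16(v):
    """Round to a 16-bit grid (full scale = +/-1)."""
    return round(v * 32767) / 32767

left = [quantize16(0.9 * math.sin(2 * math.pi * f * n / fs)) for n in range(N)]
right = [quantize16(0.9 * math.sin(2 * math.pi * f * (n / fs - delay))) for n in range(N)]

def phase_at(x, freq):
    """Phase (radians) of the DFT bin that holds the tone."""
    k = round(freq * N / fs)
    return cmath.phase(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))

# Delaying a tone by tau lags its phase by 2*pi*f*tau.
est = (phase_at(left, f) - phase_at(right, f)) / (2 * math.pi * f)
print(f"true delay 2.0000 us, estimated {est * 1e6:.4f} us")
```

The delay is recovered to within a few hundredths of a microsecond even though it is far shorter than the sample interval; the limit is set by quantization noise, not by the sampling rate.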
34. Sampling and reconstruction in practice
- 10 or more years ago it was difficult to sample signals with sufficient accuracy to approach the theoretical ideals.
- Similarly, reconstruction with practical D/A converters and filters could cause (barely) audible artifacts.
- The artifacts present in practical converters were reduced at higher sampling rates.
- So raising the sampling rate above 48kHz seemed reasonable for the highest quality audio.
- At present, integrated A/D and D/A converters are available that sample and reconstruct signals to the theoretical limits (at 18 or 20 bit accuracy).
- These converters are inexpensive and in wide use. For these converters there is no advantage to higher sampling rates unless we can prove that ultrasonic frequencies somehow contribute to human perception.
- The author is unaware of any experiment meeting double-blind standards that supports this claim.
35. Absolute timing and Human Physiology
- The human brain is a computer of great sophistication and complexity, with a clock frequency of 1kHz. (Hiroshi Riquimaroux)
- The author is unaware of any experiment that shows timing accuracy in speech or music better than about 1ms.
- Physiological processes do exist in binaural hearing with interchannel timing accuracies down to 2 microseconds, but this is not the same thing as long-term timing accuracy over fractions of a second.
- These interchannel timing differences are exactly preserved at common sampling rates.
- Hiroshi's comment pretty much sums it up!
36. Ultrasonic Directivity
- The directivity of a tweeter depends on the diameter of the diaphragm and the frequency. As written in MATLAB, using a Bessel function of order 1:
- If a is the diaphragm radius, lambda is the wavelength, theta is the angle off axis, and p is the sound pressure (normalized to 1 on axis), then
- mu = 2*pi*a/lambda
- p = 2*besselj(1, mu*sin(theta))./(mu*sin(theta))
To actually hear ultrasonics the listener must be very carefully aligned with the driver, both horizontally and vertically. This precision of alignment is unlikely in music listening.
From Philip M. Morse, Vibration and Sound, second edition, McGraw-Hill, 1948.
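The MATLAB expression above is the classic result for a rigid circular piston in an infinite baffle, p = 2·J1(ka·sin θ)/(ka·sin θ) with ka = 2πa/λ and a the piston radius. Python's standard library has no Bessel function, so this sketch evaluates J1 from Bessel's integral with the trapezoid rule (where SciPy is available, `scipy.special.j1` does the same job). The 12.7mm radius (a nominal 1-inch dome) and c = 343 m/s are illustrative assumptions, and real domes are not rigid pistons.

```python
import math

def bessel_j1(x, steps=200):
    """J1(x) from Bessel's integral (1/pi) * integral over [0, pi] of cos(t - x*sin(t)) dt.

    The integrand is smooth and periodic, so the trapezoid rule converges very fast.
    """
    g = lambda t: math.cos(t - x * math.sin(t))
    h = math.pi / steps
    s = 0.5 * (g(0.0) + g(math.pi)) + sum(g(i * h) for i in range(1, steps))
    return s * h / math.pi

def piston_directivity(ka, theta):
    """Far-field pressure of a rigid circular piston, normalized to 1 on axis."""
    u = ka * math.sin(theta)
    if abs(u) < 1e-9:
        return 1.0                      # limit of 2*J1(u)/u as u -> 0
    return 2 * bessel_j1(u) / u

def half_pressure_angle(ka):
    """Off-axis angle in degrees where pressure falls to 0.5 (-6 dB), found by bisection.

    Returns about 90 if the beam never narrows that far (small ka).
    """
    lo, hi = 0.0, math.asin(min(1.0, 3.8317 / ka))   # 3.8317 ~ first zero of J1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if piston_directivity(ka, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return math.degrees(lo)

c = 343.0       # speed of sound in air, m/s (assumed)
a = 0.0127      # assumed radius of a nominal 1-inch dome, m
for f in (10000, 20000, 30000):
    ka = 2 * math.pi * f * a / c        # same quantity as mu = 2*pi*a/lambda above
    print(f"{f // 1000} kHz: ka = {ka:.2f}, -6 dB half-angle = {half_pressure_angle(ka):.1f} deg")
```

Under these assumptions the -6dB half-angle shrinks to under 20 degrees at 30kHz, which illustrates how small the ultrasonic sweet spot becomes.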
37. The frequency transmission of the pinnae and middle ear
From B. C. J. Moore, B. R. Glasberg and T. Baer, "A model for the prediction of thresholds, loudness and partial loudness," J. Audio Eng. Soc., vol. 45, pp. 224-240 (1997).
Note that the external pinnae structures and the mechanics of the middle ear severely attenuate sound transmission above 4kHz. The slope of this curve would predict an attenuation of about 40dB at 30kHz.
38. Conclusions for High Definition Audio
- Adding ultrasonics to a recording technique does NOT improve the time resolution of typical signals, either for imaging or for precision of tempo. The presumption that it does is based on a misunderstanding of both information theory and human physiology.
- Kiryu and Ashihara have shown that ultrasonic harmonics of a 2kHz signal are NOT audible in the absence of external (non-human) intermodulation distortion.
- Their experiments put a limit on the possibility that a physiological non-linearity can make ultrasonic harmonics perceptible. They find that such a non-linearity does not exist at ultrasonic sound pressure levels below 80dB SPL.
- All commercial recordings tested by the author as of 6/1/03 contained either no ultrasonic information, or ultrasonic harmonics at levels more than 40dB below the fundamentals.
- Our experiments suggest that the most important source of audible intermodulation for ultrasonics is in the electronics, not in the transducers.
- Some consumer-grade equipment makes a tacit admission of the inaudibility of frequencies above 22kHz by simply not reproducing them. Yet the advertising for these products claims the benefits of higher resolution.
- Even assuming ultrasonics are audible, loudspeaker directivity creates an unusually tiny sweet spot, both horizontally and vertically.
39. Mid-Frequency Intermodulation Distortion, or why do my loudspeakers sound fuzzy?
- Problem:
- Loudspeaker reproduction of massed chorus and orchestra is often perceived (by the author) as harsher than the live chorus.
- The degree of harshness seemed to depend on the loudspeaker type, and on the playback level.
- It seemed worthwhile to investigate whether this perception could be related to intermodulation distortion.
40. The validity of short duration A/B tests
- Floyd Toole has established a protocol for loudspeaker evaluation that allows rapid comparison of two loudspeakers in the identical acoustical location.
- Tests using this protocol have proven to be reliable, in that they consistently rank-order loudspeakers in a way that is robust for different listeners and for repeated tests.
- But it is not clear that rapid A/B tests are the ideal way to test for intermodulation distortion, which may require a period of time to be perceived.
- In a rapid A/B test the primary perception is loudness.
- Once loudness has been controlled, differences in frequency response and timbre dominate the result.
- Human hearing adapts to errors in spectrum and timbre over a period of 10 to 20 minutes.
- Even an old-fashioned phonograph sounds pretty good once you get used to it!
- And many of the monitoring loudspeakers in common use are severely colored. The people who use them say they sound fine.
- Is it possible that, after adapting to spectrum and timbre, intermodulation problems might become more apparent?
41. Test Signals
- We wanted test signals that would mimic the levels and frequencies found in choral and orchestral music.
- This music typically has fundamentals in the range of 100Hz to 1000Hz, with substantial high-order harmonics.
- The mix of fundamentals and harmonics is related by common musical intervals.
- A sweep signal consisting of two or more pitches was chosen, where the pitch intervals were held constant. Preliminary tests showed that the harmonic content of the chosen pitches did not significantly affect either the measured or perceived amount of intermodulation distortion.
- Eventually a minor triad was chosen as a test signal. The signal consists of three equal-amplitude sine waves: a root, the minor third above, and the fifth above the root. The minor third was chosen as even tempered, and the fifth was chosen to be perfect.
- Thus a sweep would consist of a frequency f0 which sweeps from 250Hz to 4kHz, in combination with f1 = 1.1892*f0 (2^(3/12), the even-tempered minor third), in combination with f2 = 1.5*f0.
42. Sweep rate and waveform
- The sweep takes 10 seconds to cover four octaves.
- The sweep is then repeated at a 6dB higher amplitude, until the maximum level is reached.
- Each block in the waveform below is 10 seconds long, and sweeps f0 over four octaves.
- In this case the signal has been formed by summing the sines of f0, f1, and f2, resulting in a symmetric output signal.
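The test signal described on the last two slides can be sketched as follows. The author's generator is not shown, so this is an assumption-laden Python version: the sweep is exponential (a constant number of octaves per second, so f0 goes from 250Hz to 4kHz in 10 seconds), the minor third uses the even-tempered ratio 2^(3/12), and the starting level of -30dBFS and the block count are hypothetical parameters.

```python
import math

F_START, OCTAVES, T_SWEEP = 250.0, 4, 10.0   # f0: 250 Hz -> 4 kHz in 10 s
RATIOS = (1.0, 2 ** (3 / 12), 1.5)           # root, even-tempered minor third, perfect fifth
RATE = OCTAVES * math.log(2) / T_SWEEP       # d(ln f0)/dt for a constant octaves-per-second sweep

def f0_at(t):
    """Instantaneous root frequency (Hz) at time t within one sweep."""
    return F_START * math.exp(RATE * t)

def tone_phase(t, ratio):
    """Integral of 2*pi*ratio*f0(t) dt from 0 to t, so each tone sweeps without phase jumps."""
    return 2 * math.pi * ratio * F_START * (math.exp(RATE * t) - 1) / RATE

def triad_sweep(fs, n_blocks, first_level_db=-30.0, step_db=6.0):
    """Concatenate n_blocks identical sweeps, each step_db louder than the previous one."""
    n_per_block = int(fs * T_SWEEP)
    one_sweep = [sum(math.sin(tone_phase(n / fs, r)) for r in RATIOS) / len(RATIOS)
                 for n in range(n_per_block)]
    out = []
    for b in range(n_blocks):
        gain = 10 ** ((first_level_db + b * step_db) / 20)
        out.extend(gain * v for v in one_sweep)
    return out

signal = triad_sweep(48000, 5)   # five 10 s blocks, rising in 6 dB steps
```

Integrating the instantaneous frequency, rather than computing sin(2π·f0(t)·t), keeps each tone continuous so the sweep itself adds no clicks that could be mistaken for distortion.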
43. Fine waveform and spectrum
44. Result: speaker 1
Typical output spectrum at the highest level.
45. Analysis
- Results were analyzed with a C language program that eliminated the source tones with a tracking filter.
- This program outputs files that can be plotted with MATLAB.
Notice that the total distortion is not strongly level dependent in this loudspeaker, nor is it strongly dependent on frequency. Is this possibly a source of fuzziness?
46. Analysis of distortion
- The tracking filter selects two harmonics:
- one characteristic of symmetric distortion,
- and one characteristic of asymmetric distortion.
(Figure panels: symmetric distortion; asymmetric distortion.)
Notice that this loudspeaker has a symmetric distortion characteristic almost independent of level, at least below 1kHz. Asymmetric distortion rises with level.
47. Is the observed distortion audible?
- The observed distortion is audible on the test signal, particularly on the high level segment, but only if the test subject wears earplugs.
- Distortion is audible on the lower level portions also, and it seems relatively independent of level.
- But is the perceived distortion in the speaker, or is it in the listener?
- As a test, the same signal was reproduced through three loudspeakers, one for each frequency: f0, f1, and f2.
- The perception was subtly different when the signals were combined and reproduced through a single loudspeaker.
- But these differences could be ascribed to the non-anechoic conditions of the test.
48. Human hearing is inherently non-linear
Hair cells fire when the ion channel controlled by the hair opens. This causes a burst of neural activity at the zero crossings of the pressure waveform. This process is similar to a half-wave rectifier followed by a differentiator.
All the sounds we hear pass through this asymmetric non-linear system. We perceive the signals as undistorted only through the action of the filters in the basilar membrane. These filters are not particularly effective at low frequencies!
49. Hair cell firing
- Hair cells act as a half-wave rectifier. We are unaware of the (negative) half of the waveform.
50. Result of the half-wave rectification
- The pitch of low frequencies is determined not through the basilar membrane filters, but through the time intervals between nerve firings.
- Consequently we cannot distinguish between real frequencies and subharmonics generated through the half-wave rectification process.
- This leads to the well-known phenomenon of "false bass".
- Listening to two tones that are harmonically related will often produce the perception of the fundamental. For example, a tone at 50Hz will be heard when 100Hz and 150Hz are played together.
- Complex low frequency signals, such as a minor triad, are heard as an uninterpretable mix of fundamentals and harmonics.
- Composers outside of grunge rock tend to avoid them!
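The 100Hz + 150Hz example can be reproduced with the simplest possible hair-cell model, a bare half-wave rectifier (max(v, 0)). This is a deliberate simplification of slide 48: it omits the differentiator and everything else real transduction does. The sketch measures the 50Hz component before and after rectification.

```python
import cmath
import math

fs = 8000
N = fs                    # one second of signal -> 1 Hz DFT bins

def amplitude_at(x, freq_hz):
    """Amplitude of the component at an integer frequency, from one DFT bin."""
    k = int(freq_hz)
    return 2 * abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))) / N

pair = [math.sin(2 * math.pi * 100 * n / fs) + math.sin(2 * math.pi * 150 * n / fs)
        for n in range(N)]
rectified = [max(v, 0.0) for v in pair]   # crude hair-cell model: keep the positive half only

before = amplitude_at(pair, 50)
after = amplitude_at(rectified, 50)
print(f"50 Hz before rectification: {before:.1e}")
print(f"50 Hz after rectification:  {after:.3f}")
```

The input contains no 50Hz energy at all, but the rectified waveform repeats every 20ms and carries a strong 50Hz component, which is the "false bass" the slide describes.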
51. Example: a low frequency triad
- We can generate a minor triad sweep from 80Hz to 320Hz.
- The lower frequencies simply sound as if the loudspeaker is broken.
- Only above 250Hz do we begin to resolve the pitches that make up the harmony.
52. Distortion Models: symmetric
- We need a mathematical model for loudspeaker distortion that will allow us to find the just noticeable level at which distortion becomes perceivable.
- After a great deal of head-scratching, two simple models were chosen.
- 1. Symmetric compression, where delta is typically 0.1 or less:
- If the signal voltage is positive: Vout = exp((1-delta)*log(Vin))
- If the signal voltage is negative: Vout = -exp((1-delta)*log(abs(Vin)))
- This distortion is identical to a uniform compression of the waveform. If delta = 0.1, then the waveform is compressed by 1dB for every 10dB of level increase.
- The advantage of this model is that the percentage distortion is independent of the signal level and spectrum.
- There is a small dependence on crest factor.
53. Distortion Models: asymmetric
- A simple asymmetric distortion model can be created by using a different gain for the positive and negative signal voltages.
- If the signal voltage is positive: Vout = (1-delta_a)*Vin
- If the signal voltage is negative: Vout = Vin
- This model also produces a distortion that is independent of level and spectrum.
- A C language program was written that applies these two distortions to a stereo input file, measures the resulting percentage of distortion, and outputs the distortion only as a stereo file.
- This program allows us to listen to the result of the distortion process on a wide variety of input signals. Since the distortion alone is the output, various distortion percentages can be created by simply mixing the distortion with the original signal.
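The author's C program is not shown, so the following is only a Python sketch of the same idea under stated assumptions: the symmetric model is applied first and the asymmetric one second (the order is a guess), the "distortion only" signal is what remains after subtracting the best-fit linear copy of the input, and mono lists stand in for stereo files. The names `distortion_only` and `mix` are hypothetical. The defaults delta = 0.05 and delta_a = 0.03 are the values used in the Fauré example.

```python
import math

def sym_compress(v, delta):
    """Symmetric model: sign(v) * |v|**(1 - delta)."""
    return 0.0 if v == 0.0 else math.copysign(abs(v) ** (1 - delta), v)

def asym_gain(v, delta_a):
    """Asymmetric model: positive voltages scaled by (1 - delta_a), negative ones unchanged."""
    return (1 - delta_a) * v if v > 0 else v

def distortion_only(x, delta=0.05, delta_a=0.03):
    """Apply both models, then subtract the best linear fit so only the distortion remains."""
    y = [asym_gain(sym_compress(v, delta), delta_a) for v in x]
    g = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [b - g * a for a, b in zip(x, y)]

def mix(x, d, gain_db):
    """Add the distortion back to the original, raised or lowered by gain_db."""
    k = 10 ** (gain_db / 20)
    return [a + k * b for a, b in zip(x, d)]

def rms(s):
    return math.sqrt(sum(v * v for v in s) / len(s))

fs = 48000
x = [0.5 * math.sin(2 * math.pi * 220 * n / fs) for n in range(4800)]
d = distortion_only(x)
print(f"distortion: {100 * rms(d) / rms(x):.2f} % of the input RMS")
louder = mix(x, d, 20.0)   # distortion raised 20 dB before mixing
```

Since the residual is orthogonal to the input, mixing it back at different gains changes only the distortion percentage, not the level of the undistorted signal.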
54. Distortion test with Cool-Edit
Here is a segment from the Fauré Requiem, repeated once. Here is the same signal with delta = 0.05 and delta_a = 0.03. The distortion has been amplified 20dB and the original signal removed.
55. Sum of signal and distortion
(Audio examples at 30%, 15%, and 7% distortion.)
The first section is distorted; the repeat is clean. 15% distortion is quite difficult to hear with this signal!!!
56. So why does it sound fuzzy?
Filter the short segment of Fauré at 1kHz with a 100Hz bandwidth. Playing this signal produces a shattering perception, particularly at high levels. Considerable energy in the 100Hz region is perceived, even though there is no energy in that frequency range.
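The shattering effect can be imitated in a few lines. This sketch builds a noise-like band from 950 to 1050Hz as a sum of sines with random phases (a stand-in for the filtered Fauré, which is not available here), passes it through the bare half-wave rectifier of slide 49, and measures the power that appears below 100Hz. The component count, seed, and levels are arbitrary choices, and the rectifier is a simplification of real hair-cell behavior.

```python
import cmath
import math
import random

fs = 8000
N = fs                          # one second of signal -> 1 Hz DFT bins
random.seed(1)                  # fixed seed so the example is repeatable

# Noise-like band 950-1050 Hz: 101 equal-amplitude sines with random phases.
phases = [(f, random.uniform(0, 2 * math.pi)) for f in range(950, 1051)]
band = [sum(math.sin(2 * math.pi * f * n / fs + p) for f, p in phases) / 10 for n in range(N)]
rectified = [max(v, 0.0) for v in band]     # crude hair-cell model: positive half only

TW = [cmath.exp(-2j * math.pi * m / N) for m in range(N)]   # DFT twiddle factors

def power_below(x, f_max):
    """Power in DFT bins 1..f_max Hz (DC excluded), summed as amplitude**2 / 2."""
    total = 0.0
    for k in range(1, f_max + 1):
        X = sum(x[n] * TW[(k * n) % N] for n in range(N))
        total += (2 * abs(X) / N) ** 2 / 2
    return total

# The sample rate is kept low so this pure-Python DFT runs quickly; rectifier
# harmonics near 8 kHz alias slightly into the low bins, which the sketch ignores.
low_band = power_below(band, 100)
low_rect = power_below(rectified, 100)
print(f"power below 100 Hz, band-pass signal: {low_band:.1e}")
print(f"power below 100 Hz, after rectifying: {low_rect:.3f}")
```

The band-pass signal has no energy below 100Hz, yet after rectification its envelope fluctuations appear there, consistent with the perceived energy in the 100Hz region.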
57. Result: hair-cell distortion produces the perception of shattering
- Explanation:
- A 100Hz bandwidth noise-like signal at 1kHz creates intermodulation products in the 100Hz region when passed through an asymmetric detector.
- These subharmonics may excite the neural sensors for low frequencies directly through motion of the basilar membrane.
- They also might be directly perceived by correlation detectors in the 1kHz neural channels.
- As the frequency is raised above 1kHz both detection mechanisms will be less active.
- In fact, a 4kHz signal with a 100Hz bandwidth produces very little shattering, and few perceptual subharmonics.
- A 4kHz signal with a 400Hz bandwidth sounds pretty bad.
58. Shattering at high frequencies
- Try an 8kHz signal with a 250Hz bandwidth.
- How about 18kHz and a 500Hz bandwidth?
- How about 15kHz and a 500Hz bandwidth?
59. Converter Intermodulation
- Very inexpensive converters can have high intermodulation distortion at high frequencies.
- For example, the converters in this laptop.
- Example: 15kHz with a 500Hz bandwidth, as output from this laptop.
60. Conclusions: fuzzy speakers
- The loudspeakers tested have intermodulation distortion lower than the threshold of detection for complex tones.
- Non-linear distortion in human hearing appears to account for the audible distortion in full chorus.
- The non-linearities, particularly at high frequencies, may be a form of age-related hearing loss.
- This type of distortion may be well understood by researchers in the fields of hearing and speech, or hearing pathology.
61. Conclusions: A/B tests
- No evidence was uncovered in this study that would invalidate rapid, blind A/B tests as the gold standard for audio research.
- But the possibility remains,
- particularly in the study of room acoustics.
- Intelligibility, muddiness, and envelopment may all depend on the time period devoted to listening to a particular acoustic signal.