Perception of mid frequency and high frequency intermodulation distortion in loudspeakers, and its relationship to high-definition audio.

  • (A physicist meets the twilight zone)
  • David Griesinger
  • Lexicon
  • 3 Oak Park
  • Bedford, MA 01730

Why Bother?
  • What is the relationship between high frequency
    intermodulation distortion and recordings with
    frequency response above 20kHz?
  • Why do my choral recordings sound fuzzy,
    particularly when played at high levels?
  • And does the perception of fuzziness grow more
    obvious with long-term listening?
  • Is the short-term A/B test the ultimate gold
    standard for audio reproduction?
  • Is it possible that long-term listening can
    reveal flaws that go undetected in a short-term
    A/B test?
  • This talk will examine the physics and physiology
    that these questions involve.
  • We may even suggest a few answers!

High Frequency Intermodulation Distortion and
Ultrasonic reproduction
  • THE essential paper on this subject is Kiryu
    and Ashihara, "Detection of Threshold for Tones
    above 22kHz," Convention Paper 5401, presented
    at the 110th AES Convention, May 12-15, 2001.
  • The authors presented 13 subjects with a test
    signal consisting of a 2kHz tone combined with
    odd order harmonics, both sonic and ultrasonic.
  • The ultrasonic harmonics were switched on and
    off at a 2Hz rate.
  • ALL subjects could discriminate the ultrasonics
    when the combined signal was presented through a
    single loudspeaker.
  • NONE of the subjects could discriminate the
    ultrasonics when each ultrasonic harmonic was
    reproduced from a separate speaker.

Spectrum at the listener position from Kiryu
and Ashihara
Note that an essential feature of this experiment
is that the stimulus harmonics are all ODD.
Asymmetric non-linearity produces both odd and
EVEN harmonics, which are immediately visible
(and audible).
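A minimal Python sketch of why an odd-only stimulus is such a sensitive probe. It assumes a 2kHz fundamental with equal-amplitude odd harmonics up to 30kHz, sampled at 192kHz (a hypothetical harmonic set; the paper's exact components are not listed here). An odd-symmetric non-linearity (tanh, chosen for illustration) leaves the spectrum odd, while the asymmetric gain model described later in this talk (with an illustrative delta_a = 0.03) creates a component at 4kHz, where the stimulus has none.

```python
import math

FS = 192_000                     # sample rate (Hz), high enough for 30 kHz
F0 = 2_000                       # fundamental (Hz)
ODD_HARMONICS = range(1, 16, 2)  # harmonics 1, 3, ..., 15 -> 2, 6, ..., 30 kHz
N = 20 * FS // F0                # exactly 20 periods, so each harmonic sits on a DFT bin

def stimulus():
    """Equal-amplitude odd harmonics of F0, sonic and ultrasonic alike."""
    return [sum(math.sin(2 * math.pi * h * F0 * n / FS) for h in ODD_HARMONICS)
            for n in range(N)]

def symmetric(v):
    """An odd function: treats positive and negative voltages alike."""
    return math.tanh(0.3 * v)

def asymmetric(v, delta_a=0.03):
    """Slightly less gain for positive voltages than for negative ones."""
    return (1.0 - delta_a) * v if v > 0.0 else v

def amplitude_at(x, freq):
    """Amplitude of the sine component at `freq` (single-bin DFT)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

x = stimulus()
for name, y in (("input", x),
                ("symmetric", [symmetric(v) for v in x]),
                ("asymmetric", [asymmetric(v) for v in x])):
    print(f"{name:>10}: component at 4 kHz = {amplitude_at(y, 2 * F0):.2e}")
```

The 4kHz product is an even multiple of the fundamental and lies well inside the audible band, which is why any asymmetric non-linearity in the chain reveals itself immediately with this stimulus.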
Kiryu and Ashihara's result is strong
  • Their choice of source signal MAXIMIZES the
    (possible) audibility of an ultrasonic signal.
  • The ultrasonic harmonics have the same sound
    pressure as the harmonics below 20kHz.
  • For almost all common sound sources the
    ultrasonics are weaker.
  • Thus if the ultrasonics are perceived directly
    through some effect of their presence on nerve
    firings for the lower harmonics, this signal
    should produce a positive result.
  • The use of only odd harmonics for the source
    signal maximizes the chance that the ultrasonics
    will be perceived if ANY part of human physiology
    is (asymmetrically) non-linear.
  • The basilar membrane is inherently asymmetrically
    non-linear. The hair cells are half-wave
    rectifiers.
  • The probability of finding asymmetric
    non-linearities in other parts of the system is

There was a null result!
  • When the EXTERNAL non-linearities were
    eliminated, there was NO ultrasonic perception.
  • We can conclude that:
  • The mechanical conduction of ultrasonics to the
    basilar membrane is either effectively zero, or
    symmetrically linear.
  • If significant ultrasonic energy actually reaches
    the basilar membrane, there are NO hair cells
    that respond to it.
  • We will present evidence from other experiments
    that supports these conclusions.

DG experiment 1: Modulated Harmonics
  • We wanted to measure the distortion-generating
    mechanism observed by Kiryu and Ashihara by
    inducing distortion in common tweeters.
  • It seemed like a simple experiment:
  • Make some sweep signals in MATLAB.
  • Record a few common instruments with a B&K 4133.
  • Filter out the frequencies below 20kHz.
  • Play them back at various levels, and listen.

  • All I needed was a good quality sound card that
    would record and playback 96kHz.
  • Many manufacturers seem to offer such a thing at
    reasonable prices. I chose one by the most
    popular manufacturer; let's call them C.
  • After the usual frustrating two hours spent
    purchasing the board, disassembling the computer,
    installing the drivers, calling customer service
    when the drivers crashed, downloading new drivers
    from the web, etc.
  • Although you could set the device to record at
    96kHz, signals above 23kHz would not record.
  • MATLAB sweeps generated at 96kHz would not play
    back above 23kHz.
  • More calls to customer service. Typical
    conversation: "What interrupts is the card using?
    What video card are you using? What operating
    system are you using?" (Windows 2000 Pro.) "OH!
    That's your problem; try XP."
  • So I decided to bite the bullet and upgrade to
    XP. I also upgraded to the most expensive
    version of this manufacturer's sound boards.
  • But the XP upgrade took longer than expected at
    the computer shop.

More twilight
  • I decided to buy a USB based device that did
    96/24, as this would be useful for demos from my
    laptop such as at this lecture.
  • But the drivers would not load in my regular
    laptop, and customer service had no advice at
    all.
  • I tried a different laptop. The drivers loaded
    fine. Same operating system. (Windows ME.)
  • But this external board would also not record or
    play above 23kHz.
  • Customer service was again not very helpful:
    "What interrupts is the USB service routine
    using?" "You are actually looking at the output
    with an oscilloscope!?"
  • It's a good thing I am sometimes known as Mr.

A bit of light
  • So I just kept calling customer service until I
    got an operator who was willing to answer the
    question: "Are ANY of your boards actually capable
    of operating at 96kHz?"
  • After a great deal of waiting on hold the answer
    came back.
  • NO
  • In fact, none of our competitors do either. We
    all have 96/24 converters, but we run them at
  • What do you suggest that I do?
  • Don't buy consumer; go to pro.
  • So I did. No problem, just much more money.

So, finally, to the experiments
  • A C language program was written to generate a
    twin-tone frequency sweep, broken into tone
    bursts (so as not to burn out the tweeter).
  • The sweep is repeated with 6dB increases in
    level, so the level dependence of any distortion
    could be measured.
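The C program is not included in the slides. Below is a minimal Python sketch of the same kind of signal under assumed parameters: 96kHz sampling, linear sweeps, 50ms bursts separated by 150ms gaps, 5ms raised-cosine ramps, and hypothetical start and end frequencies (22 to 40kHz and 23 to 41kHz, so the difference tone stays at 1kHz). The ramps are a design choice of this sketch: abruptly gated ultrasonic bursts would themselves put some click energy into the audible band.

```python
import math

FS = 96_000   # sample rate (Hz)

def twin_tone_bursts(fa_start, fa_end, fb_start, fb_end, sweep_s,
                     burst_s=0.05, gap_s=0.15, ramp_s=0.005, amplitude=0.5, fs=FS):
    """Two equal-amplitude tones, each swept linearly from its start to its end
    frequency over sweep_s seconds and gated into raised-cosine-edged bursts."""
    period = burst_s + gap_s
    phase_a = phase_b = 0.0
    out = []
    for n in range(int(sweep_s * fs)):
        t = n / fs
        frac = t / sweep_s
        fa = fa_start + (fa_end - fa_start) * frac
        fb = fb_start + (fb_end - fb_start) * frac
        phase_a += 2 * math.pi * fa / fs        # integrate frequency: no phase jumps
        phase_b += 2 * math.pi * fb / fs
        pos = t % period
        if pos < burst_s:
            # Smooth on/off ramps so the gate itself does not splatter energy downwards.
            r = min(1.0, pos / ramp_s, (burst_s - pos) / ramp_s)
            gate = 0.5 * (1.0 - math.cos(math.pi * r))
        else:
            gate = 0.0
        out.append(gate * 0.5 * amplitude * (math.sin(phase_a) + math.sin(phase_b)))
    return out

def stepped_levels(block, steps, step_db=6.0):
    """Repeat the sweep, raising the level by step_db dB on each repeat."""
    gain = 10 ** (step_db / 20)
    return [v * gain ** i for i in range(steps) for v in block]

# Hypothetical example, shortened to 2 s per sweep to keep it quick to run.
sweep = twin_tone_bursts(22_000, 40_000, 23_000, 41_000, sweep_s=2.0, amplitude=0.1)
signal = stepped_levels(sweep, steps=3)
```

The starting amplitude leaves headroom for the 6dB steps: after two steps the peak is about four times the first block's peak.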

Spectrum of the burst signal
  • The two frequencies are relatively constant
    during each burst, but sweep slowly upwards in
  • The start frequencies and end frequencies of the
    sweep in each tone can be specified, along with
    the sweep rate.
  • This allows different types of non-linearity to
    be tested.

Result: nothing significant is heard.
  • Subharmonics of the burst signal can be heard
    with difficulty
  • In a quiet room
  • When the ultrasonic level is above 80dB SPL at
    one meter, the subharmonics are below 30dB SPL
    at 15cm from the tweeter.
  • The levels of the subharmonics are consistent
    with the distortion in the amplifier: 0.1%.
  • The LOUDSPEAKER seems blameless.
  • 4 different loudspeakers with different tweeters
    were tried, with the same result.
  • The observed harmonics were produced by the
    amplifier.

DG experiment 2: Rattling Keys
  • A set of three house keys on a plastic key ring
    was shaken in front of a B&K 4133 microphone,
    and the output was recorded at 96kHz.
  • The resulting signal has an enormous crest
    factor: 28dB.
  • This means it is 16dB quieter than
    non-compressed music with the same peak signal
    level.
  • And very high ultrasonic content.
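Crest factor is the ratio of peak level to RMS level. When two signals are normalised to the same peak, their RMS levels differ by the difference in their crest factors, so the 16dB figure above corresponds to assuming that uncompressed music has a crest factor of about 12dB (28 - 16; this music figure is inferred from the slide, not measured here). A small Python sketch of the arithmetic:

```python
import math

def crest_factor_db(x):
    """Peak level relative to RMS level, in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)

def rms_penalty_db(crest_a_db, crest_b_db):
    """How much quieter (in RMS) signal A is than signal B at the same peak level."""
    return crest_a_db - crest_b_db

sine = [math.sin(2 * math.pi * n / 8) for n in range(800)]   # 100 cycles, 8 samples each
click = [1.0] + [0.0] * 999                                   # one impulse in 1000 samples
print(f"sine:  {crest_factor_db(sine):.2f} dB")
print(f"click: {crest_factor_db(click):.2f} dB")
print(f"keys vs music: {rms_penalty_db(28.0, 12.0):.0f} dB quieter")
```

A pure sine sits at about 3dB; sparse, click-like signals such as the keys push the crest factor far higher.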

Keys spectrum
  • Reproduce the signal with and without the
    ultrasonic component.
  • This was done by low-pass filtering the original
    signal at 20kHz, and alternating it with the full
    bandwidth signal.
  • Reproduce only the components above 20kHz and
    listen for ANY audible sound.
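The slides do not say which low-pass filter was used. One generic way to produce both versions is a complementary split: low-pass the signal with a linear-phase FIR, then subtract the result from a delayed copy of the input, so the two bands add back exactly to the (delayed) original. The sketch below assumes a Blackman-windowed-sinc design with 255 taps at 96kHz; these are illustrative choices, not the author's filter.

```python
import math

FS = 96_000                      # sample rate of the key recording (Hz)
CUTOFF = 20_000                  # split frequency (Hz)
TAPS = 255                       # odd length -> integer delay of (TAPS - 1) // 2 samples

def lowpass_kernel(cutoff=CUTOFF, fs=FS, taps=TAPS):
    """Blackman-windowed-sinc low-pass FIR (a generic textbook design)."""
    m = (taps - 1) // 2
    fc = cutoff / fs                              # cycles per sample
    h = []
    for n in range(taps):
        k = n - m
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        w = (0.42 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
             + 0.08 * math.cos(4 * math.pi * n / (taps - 1)))
        h.append(ideal * w)
    total = sum(h)
    return [c / total for c in h]                 # normalise to unity gain at DC

def split_bands(x, h=None):
    """Return (below, above) the cutoff. below + above equals x delayed by
    (len(h) - 1) // 2 samples, so the two bands are exactly complementary."""
    if h is None:
        h = lowpass_kernel()
    m = (len(h) - 1) // 2
    below, above = [], []
    for n in range(len(x)):
        acc = sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
        delayed = x[n - m] if n >= m else 0.0
        below.append(acc)
        above.append(delayed - acc)
    return below, above
```

For the A/B comparison, `below` is the band-limited version and `below + above` (the delayed original) is the full-bandwidth version, so both have the same latency; `above` alone is the ultrasonic-only signal.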

Keys ultrasonics
  • Waveform of the frequencies above 20kHz (same
    scale as previous slide).

Result: the same as for bursts
  • No difference could be heard with and without the
    ultrasonics
  • (but the one subject was rather old).
  • When only the ultrasonic signals were played at
    high levels, intermodulation products from the
    input signals were easily heard, at levels
    consistent with amplifier distortion.

Conclusions from DG ultrasonic tests
  • The tweeters tested (three metal-dome tweeters
    and one soft-dome tweeter) produce
    insignificant amounts of intermodulation products
    below 20kHz when driven by ultrasonic signals.
  • Amplifier distortion can produce distortion
    products below 20kHz that are audible (with
    difficulty) in the absence of other signals below
    20kHz.
  • But with a high quality amplifier these
    distortion products are not audible in the
    presence of even extraordinary ultrasonic sources
    such as rattling keys.
  • Unless the amplifier is driven into clipping.

Ultrasonic content of musical instruments
  • Trumpet spectrum of the note with the highest

Sopranino Recorder
  • Spectrum of the highest note (3200Hz)

Sopranino Recorder 2
  • The highest note of the particular sopranino
    recorder I own produces ultrasonic harmonics.
  • But these harmonics are, AT MAXIMUM, 40dB below
    the level of the fundamental.
  • Compare this to the levels used by Kiryu and
    Ashihara, where the ultrasonic harmonics were
    equal in level to the fundamentals.
  • Notice also that both even and odd harmonics are
    present in the sopranino.
  • So any even order harmonic products will be

  • DG lacks a home drum set, so he went looking for
    recordings of drums.
  • I have many samplers and examples of SACD and
    DVD-Audio discs.
  • Very few had any popular music that was not
    resampled from 48kHz.
  • So I borrowed three more samplers and five SACD
    disks from John Newton.
  • None of the popular music samples had anything at
    all above 23kHz.

SACD examples: Sting
Steely Dan, Two Against Nature
Diana Krall
Jazz at 192kHz test DVD (the spot with the
highest harmonics)
Notice the ultrasonic harmonics are lower than
the fundamentals by more than 42dB.
John Eargle, Schnittke SACD
Note the ultrasonic harmonics disappear into the
SACD noise at about 27kHz. Remember that the
SACD noise is believed by everyone to be
Timing accuracy and information theory.
  • It is widely believed that the assumed
    superiority of DVD-Audio and SACD comes from
    improved resolution due to improved timing
    accuracy.
  • It is well known that human binaural hearing can
    distinguish timing differences between the ears
    of as little as two microseconds
  • This is often taken to imply that the frequency
    response of the physiological system must extend
    to 500kHz.
  • However it is easy to demonstrate that a 1kHz
    sine tone modulated with a raised cosine can
    be accurately localized, even though the waveform
    contains NO frequencies above 1400Hz!
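The slide does not give the rate of the raised-cosine envelope. Assuming it repeats at f_m = 400Hz (2.5ms per burst; an assumption, chosen because it matches the 1400Hz figure), the product-to-sum identity shows that a 1kHz carrier modulated this way contains only three frequencies:

```latex
\begin{aligned}
x(t) &= \tfrac{1}{2}\bigl(1 - \cos 2\pi f_m t\bigr)\sin 2\pi f_c t \\
     &= \tfrac{1}{2}\sin 2\pi f_c t
        - \tfrac{1}{4}\sin 2\pi (f_c + f_m) t
        - \tfrac{1}{4}\sin 2\pi (f_c - f_m) t \\
f_c = 1000\,\mathrm{Hz},\ f_m = 400\,\mathrm{Hz}
     &\;\Rightarrow\; \text{components at } 600,\ 1000,\ 1400\,\mathrm{Hz}
\end{aligned}
```

Any interaural timing cue carried by this signal is therefore carried entirely by components at or below 1400Hz.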

Bandwidth and Signal to Noise Ratio
  • In physics, the accuracy of timing is determined
    not by the bandwidth alone, but roughly by the
    product of the bandwidth and the signal-to-noise
    ratio.
  • Audio systems have low bandwidth but very high
    signal to noise ratio.
  • The hair cells in the basilar membrane fire when
    movement causes an ion channel to open.
  • Firings maximize at positive zero crossings of
    the membrane motion

Timing difference and signal to noise
  • The timing difference between two waveforms can
    be determined as long as the signal to noise
    ratio is high enough to allow accurate
    determination of the zero crossing.
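To illustrate, here is a Python sketch (standard library only) that recovers a 2 microsecond interchannel delay from a 1kHz tone sampled at 48kHz, where one sample period is about 20.8 microseconds. The estimator (linear interpolation of rising zero crossings, averaged over 100 cycles) is a simple choice for illustration, not a model of the auditory system, and the noise levels are arbitrary. As the noise rises, the estimate scatters more, which is the signal-to-noise dependence described above.

```python
import math
import random

FS = 48_000.0      # sample rate (Hz); one sample period is about 20.8 us
F = 1_000.0        # test tone (Hz)
TAU = 2e-6         # true interchannel delay: 2 microseconds
N = 4_800          # 0.1 s of signal, 100 cycles

def channel(delay, noise_rms, rng):
    """A 1 kHz sine delayed by `delay` seconds, plus Gaussian noise.
    The +1.0 rad phase keeps the first crossing away from sample 0."""
    return [math.sin(2 * math.pi * F * (n / FS - delay) + 1.0) + rng.gauss(0.0, noise_rms)
            for n in range(N)]

def rising_zero_crossings(x):
    """Times of rising zero crossings, located by linear interpolation between samples."""
    times = []
    for n in range(len(x) - 1):
        if x[n] < 0.0 <= x[n + 1]:
            frac = -x[n] / (x[n + 1] - x[n])
            times.append((n + frac) / FS)
    return times

def estimate_delay(noise_rms, seed=1):
    rng = random.Random(seed)
    left = rising_zero_crossings(channel(0.0, noise_rms, rng))
    right = rising_zero_crossings(channel(TAU, noise_rms, rng))
    diffs = [r - l for l, r in zip(left, right)]
    return sum(diffs) / len(diffs)

for noise_rms in (0.0, 1e-3, 1e-2):
    print(f"noise {noise_rms:g}: estimated delay {estimate_delay(noise_rms) * 1e6:.3f} us")
```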

Sampling Theory and Timing
  • Sampling theory proves that as long as the
    sampling rate is at least twice the bandwidth of
    a signal ALL the information content of that
    signal will be retained after sampling, and can
    be exactly reconstructed.

Sampling Theory and interchannel Timing
  • The timing of any zero-crossing is exactly
    preserved if SR > 2BW. Extra samples are wasted.
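A sketch of this claim in Python, under stated assumptions: a test signal of 1kHz plus a weaker 5kHz component (bandwidth 5kHz, well below half the 48kHz sample rate) is sampled for 0.1 s and rebuilt between samples with Whittaker-Shannon (sinc) interpolation. The rising zero crossing found on the rebuilt waveform is compared with the true one. The remaining error comes mainly from truncating the interpolation sum to a finite record, and it is a tiny fraction of the 20.8 microsecond sample period.

```python
import math

FS = 48_000.0
T = 1.0 / FS
N = 4_800                          # 0.1 s of samples

def x_true(t):
    """Band-limited test signal: 1 kHz plus a weaker 5 kHz component."""
    return math.sin(2 * math.pi * 1000 * t + 1.0) + 0.1 * math.sin(2 * math.pi * 5000 * t + 0.2)

SAMPLES = [x_true(n * T) for n in range(N)]

def sinc(u):
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def x_reconstructed(t):
    """Whittaker-Shannon interpolation, truncated to the N samples we have."""
    return sum(s * sinc(t / T - n) for n, s in enumerate(SAMPLES))

def bisect_root(f, lo, hi, iters=50):
    """Root of f in [lo, hi], assuming f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crossing_error(start=N // 2):
    """Timing error of a rising zero crossing located on the reconstructed waveform."""
    n = next(k for k in range(start, N - 1) if SAMPLES[k] < 0.0 <= SAMPLES[k + 1])
    lo, hi = n * T, (n + 1) * T
    return bisect_root(x_reconstructed, lo, hi) - bisect_root(x_true, lo, hi)

print(f"sample period {T * 1e6:.1f} us, zero-crossing error {crossing_error() * 1e9:.2f} ns")
```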

Sampling and reconstruction in practice
  • 10 or more years ago it was difficult to sample
    signals with sufficient accuracy to approach the
    theoretical ideals.
  • Similarly, reconstruction with practical D/A
    converters and filters could cause (barely)
    audible artifacts.
  • The artifacts present in practical converters
    were reduced at higher sampling rates.
  • So raising the sampling rate above 48kHz seemed
    reasonable for the highest quality audio.
  • Integrated A/D and D/A converters are now
    available that sample and reconstruct signals to
    the theoretical limits (at 18- or 20-bit
    resolution).
  • These converters are inexpensive and in wide use.
    For these converters, there is no advantage to
    higher sampling rates unless we can prove that
    ultrasonic frequencies somehow contribute to
    human perception.
  • The author is unaware of any experiment meeting
    double blind standards that supports this claim.

Absolute timing and Human Physiology
  • The human brain is a computer of great
    sophistication and complexity, with a clock
    frequency of 1kHz. (Hiroshi Riquimaroux)
  • The author is unaware of any experiment that
    shows timing accuracy in speech or music
    better than about 1ms.
  • Physiological processes do exist in binaural
    hearing with interchannel timing accuracies down
    to 2 microseconds, but this is not the same thing
    as long-term timing accuracy over fractions of a
    second.
  • These interchannel timing differences are exactly
    preserved at common sampling rates.
  • Hiroshi's comment pretty much sums it up!

Ultrasonic Directivity
  • The directivity of a tweeter depends on the
    size of the diaphragm and the frequency. As
    written in MATLAB, using a Bessel function of
    order 1:
  • If a is the diaphragm radius, lambda is the
    wavelength, and p is the relative sound pressure
    at angle theta off axis,
  • mu = 2*pi*a/lambda;
  • p = 2*besselj(1, mu*sin(theta))./(mu*sin(theta));
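A minimal Python translation of the MATLAB expression above. It assumes a rigid circular piston in an infinite baffle (a real dome only approximates this), a hypothetical 25mm (1 inch) dome, so a = 12.5mm, and a speed of sound of 343 m/s. Python's standard library has no Bessel function, so J1 is computed from Bessel's integral with the trapezoid rule, which converges very quickly for this periodic integrand. The helper reports the off-axis angle at which the pressure falls by 6dB.

```python
import math

def bessel_j1(x, steps=200):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt (trapezoid rule)."""
    def integrand(t):
        return math.cos(t - x * math.sin(t))
    h = math.pi / steps
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi

def piston_directivity(theta, radius, freq, c=343.0):
    """Relative pressure 2*J1(mu sin theta)/(mu sin theta) of a rigid baffled piston."""
    mu = 2 * math.pi * radius * freq / c           # mu = 2*pi*a/lambda, a = radius
    x = mu * math.sin(theta)
    return 1.0 if abs(x) < 1e-12 else 2.0 * bessel_j1(x) / x   # on axis the limit is 1

def minus_6db_angle(radius, freq, c=343.0):
    """Off-axis angle (degrees) where the pressure has fallen to half, or None."""
    if piston_directivity(math.pi / 2, radius, freq, c) >= 0.5:
        return None                                 # main lobe wider than +/-90 degrees
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):                             # bisection inside the main lobe
        mid = 0.5 * (lo + hi)
        if piston_directivity(mid, radius, freq, c) >= 0.5:
            lo = mid
        else:
            hi = mid
    return math.degrees(0.5 * (lo + hi))

for f in (5_000, 20_000, 30_000, 40_000):
    angle = minus_6db_angle(0.0125, f)
    print(f"{f / 1000:.0f} kHz: -6 dB at", "beyond 90 deg" if angle is None else f"{angle:.1f} deg")
```

The bisection is safe because the side lobes of this pattern never reach half of the on-axis pressure, so the only region above -6dB is the main lobe.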

To actually hear ultrasonics the listener must be
very carefully aligned with the driver, both
horizontally and vertically. This precision of
alignment is unlikely in music listening.
(From Philip Morse, Vibration and Sound, second
edition, McGraw-Hill, 1948.)
The frequency transmission of the pinnae and
middle ear
From B. C. J. Moore, B. R. Glasberg and T.
Baer, "A model for the prediction of thresholds,
loudness and partial loudness," J. Audio Eng.
Soc., vol. 45, pp. 224-240 (1997).
Note that the external pinnae structures and the
mechanics of the middle ear severely attenuate
sound transmission above 4kHz. The slope of this
curve would predict an attenuation of about 40dB
at 30kHz.
Conclusions for High Definition Audio
  • Adding ultrasonics to a recording technique does
    NOT improve time resolution of typical signals
    either for imaging or precision of tempo. The
    presumption that it does is based on a
    misunderstanding of both information theory and
    human physiology.
  • Kiryu and Ashihara have shown that ultrasonic
    harmonics of a 2kHz signal are NOT audible in the
    absence of external (non-human) intermodulation.
  • Their experiments put a limit on the possibility
    that a physiological non-linearity can make
    ultrasonic harmonics perceptible. They find that
    such a non-linearity does not exist at ultrasonic
    sound pressure levels below 80dB.
  • All commercial recordings tested by the author as
    of 6/1/03 contained either no ultrasonic
    information, or ultrasonic harmonics at levels
    more than 40dB below the fundamentals.
  • Our experiments suggest that the most important
    source of audible intermodulation for ultrasonics
    is in the electronics, not in the transducers.
  • Some consumer grade equipment makes a tacit
    admission of the inaudibility of frequencies
    above 22kHz by simply not reproducing them. Yet
    the advertising for these products claims the
    benefits of higher resolution.
  • Even assuming ultrasonics are audible,
    loudspeaker directivity creates an unusually tiny
    sweet spot, both horizontally and vertically.

Mid-Frequency Intermodulation Distortion, or: Why
do my loudspeakers sound fuzzy?
  • Problem
  • Loudspeaker reproduction of massed chorus and
    orchestra is often perceived (by the author) as
    harsher than the live chorus.
  • The degree of harshness seemed to depend on the
    loudspeaker type, and on the playback level.
  • It seemed worthwhile to investigate whether this
    perception could be related to intermodulation
    distortion.

The validity of short duration A/B tests
  • Floyd Toole has established a protocol for
    loudspeaker evaluation that allows rapid
    comparison of two loudspeakers in the identical
    acoustical location.
  • Tests using this protocol have proven to be
    reliable, in that they consistently rank-order
    loudspeakers in a way that is robust for
    different listeners and for repeated tests.
  • But it is not clear that rapid A/B tests are the
    ideal way to test for intermodulation distortion,
    which may require a period of time to be
    perceived.
  • In a rapid A/B test the primary perception is
    loudness.
  • Once loudness has been controlled, differences in
    frequency response and timbre dominate the
    perception.
  • Human hearing adapts to errors in spectrum and
    timbre over a period of 10 to 20 minutes.
  • Even an old fashioned phonograph sounds pretty
    good once you get used to it!
  • And many of the monitoring loudspeakers in common
    use are severely colored. The people who use
    them say they sound fine.
  • Is it possible that after adapting to spectrum
    and timbre, intermodulation problems might become
    more apparent?

Test Signals
  • We wanted test signals that would mimic the
    levels and frequencies found in choral and
    orchestral music.
  • This music typically has fundamentals in the
    range of 100Hz to 1000Hz, with substantial
    high-order harmonics.
  • The mix of fundamentals and harmonics is related
    by common musical intervals.
  • A sweep signal consisting of two or more pitches
    was chosen, where the pitch intervals were held
    constant. Preliminary tests showed that the
    harmonic content of the chosen pitches did not
    significantly affect either the measured or
    perceived amount of intermodulation distortion.
  • Eventually a minor triad was chosen as a test
    signal. The signal consists of three equal
    amplitude sine waves: a root, the minor third
    above, and the fifth above the root. The minor
    third was chosen as even tempered, and the fifth
    was chosen to be perfect.
  • Thus a sweep would consist of a frequency f0
    which sweeps from 250Hz to 4kHz, in combination
    with f1 = 1.1892f0, in combination with f2 =
    1.5f0.
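One way to generate the stepped triad sweep in Python. The slides give the range, the interval ratios and the 10 second duration, but not the sweep law, so an exponential sweep (constant octaves per second) and a 48kHz sample rate are assumptions here. The third uses the equal-tempered ratio 2^(3/12), about 1.1892, and the fifth is exactly 1.5.

```python
import math

FS = 48_000                      # sample rate (Hz); an assumption
MINOR_THIRD = 2 ** (3 / 12)      # equal-tempered minor third, about 1.1892
FIFTH = 1.5                      # perfect (just) fifth
RATIOS = (1.0, MINOR_THIRD, FIFTH)

def triad_sweep_block(amplitude, duration=10.0, f_start=250.0, f_end=4000.0, fs=FS):
    """One exponential sweep of the root f0 from f_start to f_end (four octaves by
    default); the third and fifth keep fixed frequency ratios to f0 throughout."""
    k = math.log(f_end / f_start) / duration          # f0(t) = f_start * exp(k t)
    out = []
    for n in range(int(duration * fs)):
        t = n / fs
        # Phase of f0 is the integral of 2*pi*f0(t); the other tones scale it.
        phase0 = 2 * math.pi * f_start * (math.exp(k * t) - 1.0) / k
        # Three equal-amplitude sines, scaled so the peak cannot exceed `amplitude`.
        out.append(amplitude / 3.0 * sum(math.sin(r * phase0) for r in RATIOS))
    return out

def stepped_triad_sweep(start_amplitude, steps, step_db=6.0, **sweep_kwargs):
    """Repeat the sweep, each block step_db dB louder than the last."""
    gain = 10 ** (step_db / 20)
    signal = []
    for i in range(steps):
        signal.extend(triad_sweep_block(start_amplitude * gain ** i, **sweep_kwargs))
    return signal
```

Integrating the instantaneous frequency, rather than computing sin(2*pi*f(t)*t), avoids the common mistake of producing a sweep that ends at the wrong frequency.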

Sweep rate and waveform
  • The sweep rate was set at 10 seconds for a four
    octave sweep.
  • The sweep is then repeated at a 6dB higher
    amplitude, until the maximum level is reached.
  • Each block in the waveform below is 10 seconds
    long, and sweeps f0 over four octaves.
  • In this case the signal has been formed by
    summing the sines of f0, f1, and f2, resulting in
    a symmetric output signal.

Fine waveform and spectrum
Result: speaker 1
Typical output spectrum at the highest level.
  • Results were analyzed with a C language program
    that eliminated the source tones with a tracking
    filter.
  • This program outputs files that can be plotted
    with MATLAB.

Notice the total distortion is not strongly level
dependent in this loudspeaker. Nor is it
strongly dependent on frequency. Is this
possibly a source of fuzziness?
Analysis of distortion
  • The tracking filter selects two harmonics
  • One characteristic of symmetric distortion,
  • And one characteristic of asymmetric distortion.

Figure panels: symmetric distortion; asymmetric distortion.
Notice that this loudspeaker has a symmetric
distortion characteristic almost independent of
level, at least below 1kHz. Asymmetric
distortion rises with level.
Is the observed distortion audible?
  • The observed distortion is audible on the test
    signal, particularly on the high level segment.
    But only if the test subject wears earplugs.
  • Distortion is audible on the lower level portions
    also, and it seems relatively independent of
    level.
  • But is the perceived distortion in the speaker
    or is it in the listener?
  • As a test, the same signal was reproduced through
    three loudspeakers, one for each frequency, f0,
    f1, and f2.
  • The perception was subtly different when the
    signals were combined and reproduced through a
    single loudspeaker.
  • But these differences could be ascribed to the
    non-anechoic conditions of the test.

Human hearing is inherently non-linear
Hair cells fire when the ion channel controlled
by the hair opens. This causes a burst of neural
activity at the zero-crossings of the pressure
waveform. This process is similar to a half-wave
rectifier followed by a differentiator.
All the sounds we hear pass through this
asymmetric non-linear system. We perceive the
signals as undistorted only through the action of
the filters in the basilar membrane. These
filters are not particularly effective at low
frequencies.
Hair cell firing
  • Hair cells act as a half-wave rectifier. We are
    unaware of the (negative) half of the waveform.

Result of the half-wave rectification
  • The pitch of low frequencies is determined not
    through the basilar membrane filters, but through
    the time intervals between nerve firings.
  • Consequently we cannot distinguish between real
    frequencies and subharmonics generated through
    the half-wave rectification process.
  • This leads to the well-known phenomenon of false
  • Listening to two tones that are harmonically
    related will often produce the perception of the
    fundamental. For example, a tone at 50Hz will be
    heard when 100Hz and 150Hz are played together.
  • Complex low frequency signals, such as a minor
    triad, are heard as an uninterpretable mix of
    fundamentals and harmonics.
  • Composers outside of grunge rock tend to
    avoid them!
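A small Python sketch of the 100Hz plus 150Hz example, modelling the hair cells as an ideal half-wave rectifier (a deliberate oversimplification). The two-tone input has no 50Hz component; after rectification a strong 50Hz component appears, which is one way to see how the perceived fundamental can arise.

```python
import math

FS = 48_000
N = FS                                   # 1 s, so every integer frequency is a DFT bin

def amplitude_at(x, freq):
    """Amplitude of the sine component at `freq` (single-bin DFT)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

two_tones = [math.sin(2 * math.pi * 100 * n / FS) + math.sin(2 * math.pi * 150 * n / FS)
             for n in range(N)]
rectified = [max(v, 0.0) for v in two_tones]     # crude hair-cell model: keep the positive half

print(f"50 Hz before rectification: {amplitude_at(two_tones, 50):.2e}")
print(f"50 Hz after rectification:  {amplitude_at(rectified, 50):.3f}")
```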

Example: a low-frequency triad
  • We can generate a minor triad sweep from 80Hz to
  • The lower frequencies simply sound as if the
    loudspeaker is broken
  • Only above 250Hz do we begin to resolve the
    pitches that make up the harmony.

Distortion Models - symmetric
  • We need a mathematical model for loudspeaker
    distortion that will allow us to find the just
    noticeable level at which distortion becomes
    audible.
  • After a great deal of head-scratching, two simple
    models were chosen:
  • 1. Symmetric compression, where delta is
    typically 0.1 or less
  • If the signal voltage is positive
  • Vout = exp((1-delta)*log(Vin))
  • If the signal voltage is negative
  • Vout = -exp((1-delta)*log(abs(Vin)))
  • This distortion is identical to a uniform
    compression of the waveform. If delta = 0.1,
    then the waveform is compressed by 1dB for every
    10dB of level increase.
  • The advantage of this model is that the
    percentage distortion is independent of the
    signal level and spectrum.
  • There is a small dependence on crest factor.
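Written out, the symmetric model is a power law, which makes two of the claims above easy to check: levels in dB are scaled by (1 - delta), and scaling the input only rescales the output, so the waveform shape (and hence the percentage distortion) does not depend on level.

```latex
\begin{aligned}
V_{\mathrm{out}} &= \operatorname{sgn}(V_{\mathrm{in}})\,e^{(1-\delta)\ln\lvert V_{\mathrm{in}}\rvert}
  = \operatorname{sgn}(V_{\mathrm{in}})\,\lvert V_{\mathrm{in}}\rvert^{1-\delta} \\
20\log_{10}\lvert V_{\mathrm{out}}\rvert &= (1-\delta)\,20\log_{10}\lvert V_{\mathrm{in}}\rvert \\
\delta = 0.1,\ \Delta L_{\mathrm{in}} = 10\,\mathrm{dB}
  &\;\Rightarrow\; \Delta L_{\mathrm{out}} = 0.9 \times 10\,\mathrm{dB} = 9\,\mathrm{dB} \\
V_{\mathrm{in}} \to \alpha V_{\mathrm{in}}
  &\;\Rightarrow\; V_{\mathrm{out}} \to \alpha^{1-\delta}\, V_{\mathrm{out}}
\end{aligned}
```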

Distortion Models - asymmetric
  • A simple asymmetric distortion model can be
    created by using a different gain for the
    positive and negative signal voltages.
  • If the signal voltage is positive
  • Vout = (1-delta_a)*Vin
  • If the signal voltage is negative
  • Vout = Vin
  • This model also produces a distortion that is
    independent of level and spectrum.
  • A C language program was written that applies
    these two distortions to a stereo input file,
    measures the resulting percentage of distortion,
    and outputs the distortion only as a stereo file.
  • This program allows us to listen to the result of
    the distortion process on a wide variety of input
    signals. Since the distortion alone is the
    output, various distortion percentages can be
    created by simply mixing the distortion with the
    original signal.
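The C program itself is not shown in the slides. The following Python sketch applies the two models to a mono signal and measures the percentage distortion, using the values from the Fauré example (delta = 0.05, delta_a = 0.03). How the program separated "the distortion alone" is not stated, so the definition here (output minus the least-squares best-fitting scaled input, with DC removed) is an assumption. The sketch checks that the percentage does not change with level, and shows that only the asymmetric model adds even (here, second) harmonics.

```python
import math

def symmetric_compression(v, delta):
    """Vout = exp((1 - delta) * log(Vin)), mirrored for negative Vin; 0 stays 0."""
    if v == 0.0:
        return 0.0
    return math.copysign(math.exp((1.0 - delta) * math.log(abs(v))), v)

def asymmetric_gain(v, delta_a):
    """Vout = (1 - delta_a) * Vin for positive Vin, Vout = Vin otherwise."""
    return (1.0 - delta_a) * v if v > 0.0 else v

def rms(s):
    return math.sqrt(sum(v * v for v in s) / len(s))

def distortion_only(x, model, param):
    """Output minus the least-squares best-fitting scaled copy of the input,
    with DC removed (a hypothetical definition of 'the distortion alone')."""
    y = [model(v, param) for v in x]
    g = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    d = [b - g * a for a, b in zip(x, y)]
    mean = sum(d) / len(d)
    return [v - mean for v in d], g

def percent_distortion(x, model, param):
    d, g = distortion_only(x, model, param)
    return 100.0 * rms(d) / (abs(g) * rms(x))

def amplitude_at(x, cycles):
    """Amplitude of the component making `cycles` whole cycles across x."""
    size = len(x)
    re = sum(v * math.cos(2 * math.pi * cycles * n / size) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * cycles * n / size) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / size

sine = [math.sin(2 * math.pi * 10 * n / 4800) for n in range(4800)]   # 10 cycles
for level in (0.1, 1.0):
    x = [level * v for v in sine]
    print(f"level {level}: symmetric {percent_distortion(x, symmetric_compression, 0.05):.2f}%, "
          f"asymmetric {percent_distortion(x, asymmetric_gain, 0.03):.2f}%")
d_sym, _ = distortion_only(sine, symmetric_compression, 0.05)
d_asym, _ = distortion_only(sine, asymmetric_gain, 0.03)
print(f"2nd harmonic: symmetric {amplitude_at(d_sym, 20):.1e}, "
      f"asymmetric {amplitude_at(d_asym, 20):.1e}")
```

Because the function returns only the distortion, different distortion percentages can be auditioned by mixing it back with the input at any gain.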

Distortion test with Cool-Edit
Here is a segment from the Fauré Requiem,
repeated once. Here is the same signal with
delta = 0.05 and delta_a = 0.03. The distortion
has been amplified 20dB and the original signal
Sum of signal and distortion
The first section is distorted; the repeat is
clean. 15 distortion is quite difficult to hear
with this signal!!!
So why does it sound fuzzy?
Filter the short segment of Fauré at 1kHz with a
100Hz bandwidth. Playing this signal produces a
shattering perception, particularly at high
levels. Considerable energy in the 100Hz region
is perceived, even though there is no energy in
that frequency range.
Result: hair-cell distortion produces the
perception of shattering
  • Explanation
  • A 100Hz bandwidth noise-like signal at 1kHz
    creates intermodulation products in the 100Hz
    region when passed through an asymmetric
    non-linearity.
  • These subharmonics may excite the neural sensors
    for low frequencies directly through motion of
    the basilar membrane.
  • They also might be directly perceived by
    correlation detectors in the 1kHz neural
  • As the frequency is raised above 1kHz both
    detection mechanisms will be less active.
  • In fact, a 4kHz signal with a 100Hz bandwidth
    produces very little shattering, and few
    perceptual subharmonics.
  • A 4kHz signal with a 400Hz bandwidth sounds
    pretty bad.
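A rough Python sketch of the mechanism proposed above. As a hypothetical stand-in for the filtered Fauré excerpt, it uses 51 equal-amplitude sines with random phases, spaced 2Hz apart between 950Hz and 1050Hz. This signal is passed through an ideal half-wave rectifier, and the fraction of its power between 2Hz and 100Hz is measured before and after, using the Goertzel recursion for each frequency bin. The input has essentially no power there; the rectified signal has a substantial share, carried by the fluctuations of the band's envelope.

```python
import math
import random

FS = 16_000                     # sample rate (Hz)
DURATION = 0.5                  # seconds -> DFT bins every 2 Hz
N = int(FS * DURATION)
DF = 1.0 / DURATION             # bin spacing (Hz)

def narrowband_noise(center=1000.0, bandwidth=100.0, seed=0):
    """Equal-amplitude, random-phase sines every DF Hz across the band
    (a hypothetical stand-in for music filtered to 1 kHz, 100 Hz wide)."""
    rng = random.Random(seed)
    n_comp = int(round(bandwidth / DF)) + 1
    freqs = [center - bandwidth / 2 + i * DF for i in range(n_comp)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in freqs]
    return [sum(math.sin(2 * math.pi * f * n / FS + p) for f, p in zip(freqs, phases))
            for n in range(N)]

def bin_power(x, freq):
    """Power of the component at `freq` (a multiple of DF), via the Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * (freq / DF) / len(x))
    s1 = s2 = 0.0
    for v in x:
        s1, s2 = v + coeff * s1 - s2, s1
    mag2 = max(s1 * s1 + s2 * s2 - coeff * s1 * s2, 0.0)   # |X[k]|^2
    amplitude = 2.0 * math.sqrt(mag2) / len(x)
    return amplitude * amplitude / 2.0

def low_band_fraction(x, f_max=100.0):
    """Share of the signal's power lying between DF and f_max (DC excluded)."""
    total = sum(v * v for v in x) / len(x)
    low = sum(bin_power(x, k * DF) for k in range(1, int(f_max / DF) + 1))
    return low / total

band = narrowband_noise()
rectified = [max(v, 0.0) for v in band]        # crude half-wave (hair-cell) model
print(f"power below 100 Hz, input:     {low_band_fraction(band):.1e}")
print(f"power below 100 Hz, rectified: {low_band_fraction(rectified):.3f}")
```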

Shattering at high frequencies
  • Try an 8kHz signal with a 250Hz bandwidth
  • How about 18kHz and a 500Hz bandwidth?
  • How about 15kHz and 500Hz bandwidth?

Converter Intermodulation
  • Very inexpensive converters can have high
    intermodulation distortion at high frequencies
  • For example, the converters in this laptop.
  • Example: 15kHz, 500Hz bandwidth, as output from
    this laptop.

Conclusions: fuzzy speakers
  • The loudspeakers tested have intermodulation
    distortion lower than the threshold of detection
    for complex tones.
  • Non-linear distortion in human hearing appears to
    account for the audible distortion in full
  • The non-linearities, particularly at high
    frequencies, may be a form of age-related hearing
    loss.
  • This type of distortion may be well understood by
    researchers in the fields of hearing and speech,
    or hearing pathology.

Conclusions: A/B tests
  • No evidence was uncovered in this study that
    would invalidate rapid, blind, A/B tests as the
    gold standard for audio research.
  • But the possibility remains
  • Particularly in the study of room acoustics
  • intelligibility, muddiness, and envelopment all
    may depend on the time period devoted to
    listening to a particular acoustic signal.