The Physics and PsychoAcoustics of Surround Recording Part 2


1
The Physics and Psycho-Acoustics of Surround
Recording Part 2

David Griesinger
Lexicon
dgriesinger_at_lexicon.com
www.world.std.com/griesngr

2
Introduction

We all know how to make a good recording
We need good music
A very good performance
And satisfactory balance between the solos and
the instruments.
But we want to make a great recording
How do we do it?
How do we know when a recording is great?
We must learn how to hear the technical quality
of a great recording,
And learn how to achieve the best result.
The talk is based on classical music, but the
techniques and perceptions apply to all
recordings.

3
The recording space is very important!

It is much easier to achieve a great result in a
large hall.
But large halls with great acoustics are rare.
Our job is to achieve a great result in the hall we
have available (usually small).
This talk will tell you how to do it.
And help you hear the difference.
We will not talk about issues such as
instrumental balance
or the differences between microphones or sample
rates.
We will talk about basic sound properties:
The clarity and localization of the direct sound
The perceived distance between the sound source
and the listener (depth)
The recording and reproduction of the sound of
the hall.

4
Major Goals

To review the physical and psychoacoustic
properties that make a great recording (or a
great performance space).
The clarity of the direct sound (the absence of
muddiness)
The creation of a large listening area and a
stable front image using three front speakers
in a 5.1 recording.
The blending together of the different
instruments into a whole acoustic scene through
early reflections.
The re-creation of the acoustic space of the
performance, through late reflections and
envelopment.
To show how muddiness occurs when there are too
many early reflections
To show how we perceive muddiness through our
perception of pitch.
To show how the loudspeaker positions in the
playback room influence the envelopment at low
frequencies.
To play as many musical examples as possible!

5
Localization: a stable front image over a large
listening area

In a high-quality recording the front image does
not greatly change when a listener moves away
from the sweet spot.
Image stability requires using the center channel
speaker in a 5.1 recording.
Even without the center speaker some two channel
recordings are more stable than others.
Popular music recordings are often better than
classical recordings in image stability.
The secret is amplitude panning,
which is almost universally used in popular music
recording.

6
Time delay panning

Many engineers attempt to record a broad sound
source with closely spaced microphones
Omni microphones are often used in a so-called
Decca Tree.
Cardioid microphones are often used in the ORTF
configuration
Both these techniques rely on time delay
differences to spread the front image
Time delay spreading only works when the listener
is in the sweet spot.
The front image is not stable over a large area.

7
Training to hear localization

The importance of ignoring the sweet spot
Most research tests of localization use a single
listener, who is strictly restricted to the sweet
spot.
Your customers will not listen this way!
How do you know if the recording has a stable
front image?
Move laterally in front of the loudspeakers.
Does the sound image stay wide and fixed to the
loudspeakers, or does it follow you?
Do the soloists in the center follow you left or
right? If they do they are recorded with too
much phantom center.
Since most five-channel recording methods are
derived from stereo techniques, almost all have
too much phantom center.
A center image that follows a listener who moves
laterally out of the sweet spot is the most
common failing of even the best five-channel
recordings.
Play examples

8
Example: time delay panning outside the sweet
spot.
Record the orchestra with a Decca Tree: three
omni microphones separated by one meter. A
source on the left will be picked up at nearly equal
level in all three microphones, but the arrival times
will differ by about 3ms.
On playback, a listener on the far right will
hear this instrument coming from the right
loudspeaker. This listener will hear every
instrument coming from the right.
9
Amplitude panning outside the sweet spot.
If you record with three widely spaced
microphones, an instrument on the left will have
a much higher amplitude in the left microphone, and it
will also reach that microphone much earlier than the others.
A listener on the far right will still hear the
instrument on the left. Now the orchestra
spreads out across the full width of the loudspeakers,
even when the listener is not in the sweet spot.
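These two slides can be checked with a little geometry. The sketch below is a minimal model under assumptions of ours: a hypothetical stage and playback layout (coordinates in metres), a speed of sound of 343 m/s, free-field 1/r level loss with no room reflections, and a crude precedence rule in which the loudspeaker whose sound arrives first sets the perceived direction. It computes the delays and levels that an instrument on the left produces in a Decca Tree and in a widely spaced array, then asks which loudspeaker a listener on the far right hears first.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value

def arrivals(source, positions):
    """Relative delay (ms) and level (dB) of a source at each position.

    Both are relative to the nearest position. Levels use simple 1/r
    spreading in a free field; room reflections are ignored.
    """
    dists = [math.dist(source, p) for p in positions]
    nearest = min(dists)
    delays = [1000.0 * (d - nearest) / SPEED_OF_SOUND for d in dists]
    levels = [20.0 * math.log10(nearest / d) for d in dists]
    return delays, levels

def first_speaker(rec_delays, speakers, listener, names=("L", "C", "R")):
    """Name of the loudspeaker whose signal reaches the listener first.

    Arrival = delay already in the recording + speaker-to-listener path.
    A crude precedence-effect model that ignores level differences.
    """
    totals = [d + 1000.0 * math.dist(s, listener) / SPEED_OF_SOUND
              for d, s in zip(rec_delays, speakers)]
    return names[totals.index(min(totals))]

# Hypothetical geometry in metres (x = left/right, y = depth).
source = (-6.0, 10.0)                              # an instrument on the left
decca = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]      # three omnis, 1 m apart
spaced = [(-5.0, 7.0), (0.0, 7.0), (5.0, 7.0)]     # widely spaced, near the players
speakers = [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]   # L, C, R loudspeakers
far_right = (2.0, 2.5)                             # listener in front of R

for label, mics in (("Decca tree", decca), ("spaced mics", spaced)):
    delays, levels = arrivals(source, mics)
    print(label,
          "delays ms:", [round(d, 1) for d in delays],
          "levels dB:", [round(v, 1) for v in levels],
          "first at far right:", first_speaker(delays, speakers, far_right))
```

With this geometry the Decca Tree channels differ by only about 1.4 and 3.0 ms and by less than 1 dB, and the listener at the far right hears the first arrival from the right loudspeaker. With the widely spaced microphones the left channel leads the center by about 10 ms and is about 6.5 dB louder, so the instrument stays on the left. Ignoring level is reasonable for the nearly equal Decca channels; in the spaced case the earliest channel is also the loudest, so both cues agree.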
10
WARNING!!!

In the author's experience, a front image that is
not stable when you walk in front of the speakers
will never make a great recording,
regardless of how beautiful it is in the sweet
spot.
This is my FIRST test of a recording, either two
channel or surround.

11
Summary of acoustic perceptions in a recording

1. Clarity: the lack of muddiness
Clarity is perceived through the direct sound:
sound that travels directly from the instrument
to the microphone.
A clear direct sound requires that the microphone
be relatively close to the instrument!
2. Blend and depth
Blend and depth are perceived through early
reflections that arrive from all around the
listener.
The total energy in these early reflections must
be less than the energy in the direct sound!
In a surround recording these reflections should
come equally from all the loudspeakers (except
the center), and they must be decorrelated
(different in each loudspeaker).
3. Envelopment (reverberation)
Envelopment is perceived through late reflected
energy that arrives from all around the listener
(not just from the rear!).
The energy must be decorrelated in each
loudspeaker.

12
Clarity

Clarity to an acoustician is determined through
intelligibility: the ability to understand
speech or a musical line.
For this talk I will use a different meaning.
For me clarity is the perception that the sound
source is acoustically close to the listener.
While this definition may seem vague, almost
everyone agrees on the optimal acoustic distance
for a recorded sound source.
We can demonstrate this perception

13
Muddiness: dry speech vs. 40ms reflections
Mono speech: the sound is clear, but much
too close to the loudspeaker.
Speech with 40ms of allpass reflections and no direct
sound, in mono and stereo: note that both the mono and the
stereo version sound muddy and distant. There is
no phantom image in the stereo version.
14
Reflections used in these experiments
The reflections used in these experiments form a
decaying burst which peaks about 25ms after the
direct sound, and has largely decayed away by
50ms. The reflections are different in the two
channels, and have a flat frequency response.
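A reflection pattern like the one described here can be approximated in a few lines. This sketch assumes a Hann-windowed white-noise burst (our choice of shape; the slide gives only the peak near 25ms and the decay by 50ms) and a 48kHz sample rate. Drawing a different noise sequence for each channel is what makes the reflections decorrelated; the same sequence in both channels gives a correlation of 1.

```python
import numpy as np

FS = 48000  # sample rate in Hz (assumed)

def reflection_burst(seed, fs=FS, length_ms=50.0):
    """Reflections as white noise under a Hann envelope.

    The envelope peaks at length_ms / 2 (25 ms) and reaches zero at
    length_ms (50 ms). White noise keeps the frequency response flat on average.
    """
    n = int(fs * length_ms / 1000)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n) * np.hanning(n)

left = reflection_burst(seed=1)
right = reflection_burst(seed=2)   # different noise in each channel: decorrelated
same = reflection_burst(seed=1)    # identical noise in both channels, for comparison

print("correlation, different noise:", round(float(np.corrcoef(left, right)[0, 1]), 3))
print("correlation, identical noise:", round(float(np.corrcoef(left, same)[0, 1]), 3))
```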
15
Optimum level for Early Reflections

Recorded sound consists of a mix of direct sound
and reflections
Too many reflections and muddiness results.
But reflections add a sense of blend and depth.
An optimum mix must be found.
The optimum level for early reflections is -4 to
-6dB relative to the direct sound.
This level is preferred by almost every listener.
In a surround recording the reflections should
come equally from all directions (except the
center), and be decorrelated.
The perceived result is independent of the
precise delay time and the pattern of the
reflections.
It is the total energy which determines the
perception.
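Because the perception depends on the total reflected energy rather than on the detailed pattern, a test signal only needs its reflections scaled to a target energy. The sketch below is a hypothetical helper, not the processing used for the demonstrations: it builds an impulse response with a unit direct sound and a Hann-windowed noise burst (our assumption) starting delay_ms later, scaled so that the total reflected energy is level_db relative to the direct sound. The settings in the loop follow the variants on the next slide, except that the extra 20-80ms reflections of the last variant are omitted.

```python
import numpy as np

def add_reflections(fs, delay_ms, level_db, length_ms=50.0, seed=0):
    """Unit direct sound at t = 0 plus one burst of reflections.

    The burst (Hann-windowed white noise, an illustrative choice) starts
    delay_ms after the direct sound and is scaled so that its TOTAL energy
    is level_db relative to the direct sound's energy of 1.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * length_ms / 1000)
    burst = rng.standard_normal(n) * np.hanning(n)
    burst *= np.sqrt(10.0 ** (level_db / 10.0) / np.sum(burst ** 2))  # energy, not amplitude
    start = int(fs * delay_ms / 1000)
    ir = np.zeros(start + n)
    ir[0] = 1.0                        # direct sound
    ir[start:start + n] += burst       # the Hann window is 0 at its first sample
    return ir

def reflected_db(ir):
    """Energy after the direct sound, in dB relative to the direct sound."""
    return 10.0 * np.log10(np.sum(ir[1:] ** 2) / ir[0] ** 2)

fs = 48000
for delay_ms, level_db in ((0, -5), (20, -5), (50, -3), (150, -12)):
    ir = add_reflections(fs, delay_ms, level_db)
    print(f"delay {delay_ms:3d} ms: reflections at {reflected_db(ir):.2f} dB")
# To audition a variant, convolve a dry mono signal `dry` with it:
# wet = np.convolve(dry, ir)
```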

16
Depth without Muddiness

Dry speech
Note the sound is uncomfortably close
Mix of dry with early reflections at -5dB.
The mix has distance (depth), and is not muddy!
Note there is no apparent reverberation, just
depth.
Same, but with the reflections delayed 20ms at
-5dB.
Note also that with the additional delay the
reflections begin to be heard as discrete echoes,
but the apparent distance remains the same.
Same, but with the reflections delayed 50ms at
-3dB.
Now the sound is becoming garbled. These
reflections are undesirable!
If the speech were faster it would be difficult
to understand.
Same, but with reflections delayed 150ms at -12dB.
I also added a few reflections between 20 and
80ms at a level of -8dB to
smooth the decay.
Note the strong hall sense, and the lack of
muddiness.

17
The ideal mix

We see from the previous slide that the ideal
acoustic mix has three independent perceptual
requirements
1. The direct sound dominates the total energy
by at least 4dB.
2. There are early reflections that add blend,
distance, and depth to the sound.
These should come equally from all directions in
a surround recording
And they should avoid adding energy in the 50ms
to 100ms time region.
3. There should be reflections (reverberation)
with time delays greater than 150ms to provide
the impression of the hall.
To make a great recording we must separately
capture all three!
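The three requirements can be turned into a rough numerical check on an impulse response. In this sketch the time windows follow the slides (0-50ms, 50-150ms, and after 150ms). Treating the direct sound as the single first sample only works for synthetic impulse responses, and the 10dB gap between the early and the 50-150ms energy and the -30dB floor for the late energy are hypothetical thresholds of ours, not figures from the talk.

```python
import numpy as np

def mix_profile(ir, fs):
    """Reflected energy in the slide's time windows, in dB re the direct sound.

    Assumes the direct sound is the single sample ir[0] (true for synthetic
    impulse responses; a measured one would need the direct sound isolated).
    """
    direct = ir[0] ** 2

    def window_db(t0_ms, t1_ms=None):
        a = max(1, int(fs * t0_ms / 1000))             # never include the direct sound
        b = len(ir) if t1_ms is None else int(fs * t1_ms / 1000)
        energy = np.sum(ir[a:b] ** 2)
        return 10.0 * np.log10(max(energy, 1e-12) / direct)   # floor avoids log(0)

    return {"early_0_50ms": window_db(0, 50),
            "mid_50_150ms": window_db(50, 150),
            "late_150ms_on": window_db(150),
            "all_reflections": window_db(0)}

def meets_ideal(p, margin_db=4.0, mid_gap_db=10.0, late_floor_db=-30.0):
    """Hypothetical pass/fail check; thresholds other than 4 dB are ours."""
    return bool(p["all_reflections"] <= -margin_db                          # 1. direct dominates
                and p["mid_50_150ms"] <= p["early_0_50ms"] - mid_gap_db    # 2. little 50-150 ms energy
                and p["late_150ms_on"] >= late_floor_db)                   # 3. a hall tail exists

# Synthetic test IR: direct sound, early reflections 5-50 ms at -5 dB,
# nothing from 50-150 ms, and a late tail from 150 ms at -12 dB.
fs = 48000
rng = np.random.default_rng(0)
ir = np.zeros(fs)
ir[0] = 1.0
e0, e1, l0 = fs * 5 // 1000, fs * 50 // 1000, fs * 150 // 1000
early = rng.standard_normal(e1 - e0)
ir[e0:e1] = early * np.sqrt(10 ** (-5 / 10) / np.sum(early ** 2))
late = rng.standard_normal(fs - l0)
ir[l0:] = late * np.sqrt(10 ** (-12 / 10) / np.sum(late ** 2))

profile = mix_profile(ir, fs)
print({k: round(float(v), 1) for k, v in profile.items()}, meets_ideal(profile))
```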

18
Direction of early reflections

It is not possible to detect whether the
reflections come from the front or the rear when
they arrive between 20ms and 50ms after the end
of a sound.
But it is more natural if they come from both
front and rear.
Using all four speakers also results in the
largest sweet spot (demo).

19
Muddiness is hard to avoid in small spaces!

We are attempting to show that the optimum total
energy for all reflections is at least 4dB less
than the direct sound.
The total reflected energy sum does not include
the floor reflection.
I will explain why later if there is time.
The direct sound must dominate the total sound
picture
The reverberation radius of a small hall or
church is usually below 2m, and may be as low as
1m.
Every microphone used in the recording picks up
both direct sound and reverberation.
But only the microphone closest to the sound
source picks up true direct sound.
Direct sound into all the other microphones is
perceived as a reflection, and adds to the
potential distance and muddiness.
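The reverberation radius can be estimated from a room's volume and reverberation time. The sketch below uses the classical Sabine diffuse-field approximation r_c ≈ 0.057 * sqrt(Q * V / RT60) (metres, cubic metres, seconds), which is not given in the talk; Q is the directivity factor of the source or microphone, 1 for omnidirectional. It also computes the direct/reverberant ratio at a given distance, which in this model is 0dB at the radius and falls 6dB per doubling of distance.

```python
import math

def reverb_radius(volume_m3, rt60_s, q=1.0):
    """Reverberation radius (critical distance) in metres, classical Sabine
    diffuse-field estimate: r_c ~= 0.057 * sqrt(Q * V / RT60)."""
    return 0.057 * math.sqrt(q * volume_m3 / rt60_s)

def direct_to_reverberant_db(distance_m, radius_m):
    """Direct/reverberant ratio: 0 dB at the radius, -6 dB per doubling of
    distance (direct sound falls as 1/r, the diffuse level is taken as constant)."""
    return 20.0 * math.log10(radius_m / distance_m)

r_bsh = reverb_radius(18700, 1.9)           # Boston Symphony Hall, omni source
print(f"Boston Symphony Hall: r_c = {r_bsh:.1f} m")
print(f"At 3 reverb radii: D/R = {direct_to_reverberant_db(3.0, 1.0):.1f} dB")
```

With the Boston Symphony Hall figures quoted later in this talk (18,700 m³, RT 1.9s), an omnidirectional source gives a radius of about 5.7m; directional sources and microphones (Q > 1) push it higher. A listener 3 reverb radii away, like the typical Jordan Hall audience member mentioned below, hears the direct sound about 9.5dB below the reverberation.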

20
Muddiness also comes from the playback room!
In this room there is no absorption in the front,
and thus the reverberation radius is small,
perhaps as low as 2.5m. The distance from the
front loudspeakers to the listeners is greater
than the reverberation radius. So the
reverberation will be stronger than the direct
sound. We are trying to keep the direct sound
stronger than the reflections by 4dB. This goal
is probably not possible to achieve in this room!
(Except at frequencies above 1000Hz, where the
side curtains begin to be absorptive.) Always mix
your recordings in an absorbent space!
21
Boston Cantata Singers: Cantata 76, "Die Himmel
erzählen die Ehre Gottes"
Performance in Jordan Hall, January 23, 2004.
The reverberation time in Jordan Hall is 1.4 seconds at
1000Hz, similar to the Semperoper
Dresden. The typical audience member is 3
reverb radii from this singer. The dramatic
consequences are highly audible.
Although Jordan is beloved as a chamber music
hall, the stage house is deep and reverberant.
When the hall is full, the sound in the audience
can be dry and muddy. The recording engineer
must overcome these obstacles.
22
Cantata Singers: Bach BWV 76
Multimiked recording. Note the clarity of vocal
timbre (low sonic distance).
Recording simulating the sound in the hall. Note
the timbre coloration and the sense of distance
to the performers. With the picture and after
adaptation the performance is quite enjoyable.
23
The Ideal Reverberation

Has 20ms to 50ms reflections with a total energy
of -4dB to -6dB relative to the direct sound.
Has relatively little energy from 50 to 150ms.

24
Most small rooms (including playback rooms)

Have an exponential decay.
If we pick up enough late reflections to hear the
hall, we will get too many early reflections.
We will get coloration and poor intelligibility.
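A short calculation shows why. Assuming an ideal exponential decay that starts immediately after the direct sound (real rooms have some initial delay), the reverberant energy remaining after time t is exp(-k*t) with k = ln(10^6)/RT60, because the energy falls by 60dB, a factor of 10^6, in one reverberation time. The sketch below splits that energy into the time windows used in this talk.

```python
import math

def decay_energy_fraction(t0_ms, t1_ms, rt60_s):
    """Fraction of reverberant energy arriving between t0_ms and t1_ms for an
    ideal exponential decay starting at t = 0. Pass math.inf for 'to the end'."""
    k = math.log(1e6) / rt60_s            # energy falls by 1e6 (60 dB) in RT60
    return math.exp(-k * t0_ms / 1000.0) - math.exp(-k * t1_ms / 1000.0)

rt60 = 1.8   # Swedenborg Chapel, empty
for t0, t1 in ((0, 50), (50, 150), (150, math.inf)):
    share = decay_energy_fraction(t0, t1, rt60)
    print(f"{t0:>3}-{t1} ms: {100 * share:4.1f}% of the reverberant energy")
```

For RT60 = 1.8s (the empty Swedenborg Chapel discussed below), about 32% of the reverberant energy falls in the first 50ms, 37% between 50 and 150ms, and only 32% after 150ms (exactly -5dB, because 150ms is one twelfth of RT60). Picking up enough of the late part to hear the hall therefore brings along about twice as much energy in the first 150ms.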

25
Example of a small recording space: Swedenborg
Chapel, Cambridge
26
Oriana Consort in Swedenborg Chapel
27
Oriana Setup
28
Recording in Swedenborg Chapel, Cambridge

The chapel holds perhaps 200 people, but when it
is empty the RT is 1.8 seconds,
and the reverberation radius is 1.5m.
The picture shows four supercardioid microphones
about 1m from the chorus. These provide the
direct sound.
With the supercardioid pattern we have a 6dB
direct/reverberant ratio, so the reverberation is
less than the direct sound by about 6dB.
Note that in this space we must add hall sound
and early reflections very carefully, or the
sound will become muddy!
In addition the early reflections and
reverberation arrive soon after the direct sound.
The sound seems small and cramped. There is no
sense of space around the direct sound.
The chorus microphones are as close as they can
be to the chorus without creating balance
problems.
We cannot exclude the early reverberation by
moving the mikes closer.

29
Main microphones in Swedenborg Chapel

The picture also shows two variable pattern
microphones about 2m from the chorus.
I put these there for an experiment. The sound
is not very good.
The problem with a main microphone pair in this
space is that it must be placed too far from the
singers!
A main pair must be at least 2.5m away or there
will be balance problems.
This distance is beyond the reverberation radius,
and the sound will be muddy.

30
Hall Sound in Swedenborg

The chapel is reverberant with a high
reverberation level
But the reverberation is too strong in the
10-150ms time range.
Using cardioid microphones pointing away from the
sound source reduces the early reverberation
energy and maximizes the late energy.
The hall sounds larger and better.

31
Distance Perception and MUD

Reflections during the sound event and up to
150ms after it ends create the perception of
distance
But there is a price to pay
Reflections from 10-50ms do not impair
intelligibility.
The fluctuations they produce are perceived as an
acoustic halo or "air" around the original sound
stream (ESI).
Reflections from 50-150ms contribute to the
perception of distance but they degrade both
timbre and intelligibility, producing the
perception of sonic MUD.
We will have many examples of mud in this talk!

32
Training to hear MUD

Mud occurs when the reverberant decay of the
recording venue has too much reflected energy in
the 10-150ms region of the decay curve.
This is true of nearly all sound stages, small
auditoria, and churches.
If you are recording in such a space with a
relatively large ensemble, you are in trouble.
The perception of mud can be tricky, because our
hearing mechanism adapts to a muddy environment,
and the sonic degradation becomes inaudible after
about 10 minutes.
It is easy to convince yourself the recording is
excellent when you have been listening to it all
day.
This is why we can enjoy a concert even when we
are sitting far from the instruments.
You MUST compare your recording to a reference
recording in a quick A/B test.

33
Example: John Eargle at Skywalker Ranch

John Eargle has made wonderful recordings,
particularly those with the Dallas Symphony on
Delos Records
But even he can be fooled by a small space
As I said, you adapt quickly to such a space, and
no longer hear the mud that it produces.
John Eargle recently made a 5.1 channel DVD-Audio
recording at Skywalker Ranch in Marin County, California.
He was very excited by it, but listen and
compare it to the Dallas recordings.
Skywalker is a large sound stage with
controllable acoustics. It is not a concert
hall.
As a consequence the reverberation radius is
relatively short. By my estimate (without having
seen it) the radius is less than 3.5 meters.
It is very easy to record mud in such a space.
Many instruments are beyond the reverb radius.
Adding more microphones only increases the
reverberant pickup.

34
Recording in a large space is much easier!
Covenant church is a very large space, holding
more than 1000 people. It is damped by pew
cushions and acoustic treatment on the walls,
yielding an RT of 2.5 seconds and a large
reverberation radius probably above 3m. The
microphones can be quite distant without picking
up early reflections or reverberation. It is a
very good place to record! (And it is
exceptionally beautiful visually)
35
Example: depth perspective through mike
technique

When the reverberation radius is large enough we
can use an extra pair of microphones to create a
single early reflection.
This can provide the needed perspective and depth.

Audio examples: direct sound; direct + early reflection (-5dB); direct + early reflection + late reverberation (-8dB)
[Diagram: microphones feeding two 480L processors]
36
The depth impression is greatly improved in
surround

I will run the same experiment, but use all five
speakers.
The early reflections will come from both the
front and rear equally, but different delay
patterns will be used for each speaker.
This means the reflections are decorrelated.
The late (hall) reflections will also come
equally but decorrelated in the front and rear
speakers.
This will create a large and uniform sweet spot
for the acoustics.

37
The Polyhymnia Pentangle

The Polyhymnia engineers employ a surround array
of spaced omni microphones, at a spacing similar
to the ITU playback array.
The technique works well in spaces where the
reverberation radius is equal to or greater than
the microphone spacing!
In this case the direct sound picked up by the
rear microphones is perceived as an early lateral
reflection and adds distance to the front
image.
Caution!! In a small hall this array will be TOO
MUDDY!!!

In practice the Polyhymnia engineers often pick
up the direct sound with accent microphones. In
this case the front microphones provide a first
reflection to the front speakers. The center
microphone is also often moved closer to the
sound sources, so it picks up mostly direct sound.
38
Boston Symphony Hall
39
Boston Symphony Hall

2,631 seats, 662,000 ft³ (18,700 m³), RT 1.9s
It's enormous!
One of the greatest concert halls in the world,
maybe the best.
Recording here is almost too easy!
Working here is a rare privilege,
so rare that I do not do it. (It's a union
shop.)
The recording in this talk is courtesy of Alan
McClellan of WGBH Boston. (Mixed from 16 tracks
by the presenter.)
The reverb radius is >20 ft (>6.6m) even on stage.
The stage house is enormous and NOT reverberant.
With the orchestra in place, the stage house RT is 1
sec.

40
Boston Symphony Hall, occupied, stage to front of
balcony, 1000Hz
This picture compares favorably to our picture of
the ideal reverberation on a recording. But this
is what an audience member hears 100 feet from
the stage!
41
Boston Symphony Orchestra in Symphony Hall
42
Boston Cantata Singers in Symphony Hall. March
17, 2002
43
Microphone Array (WGBH)
44
Beware the main microphone array

Nearly all engineers will provide a main
microphone: usually a Decca Tree, or a pair of
omni or cardioid microphones.
Almost always the sound from this array is only
acceptable for instruments close to the
microphones.
Most of the instruments are far beyond the
reverberation radius.
The more distant instruments must be spot-miked.
A cardioid pair (ORTF) has too much phantom
center for an acceptable surround recording
(this is a two-channel technique only).
Very frequently time delay panning (for a Decca
Tree or spaced omnis) makes the sound unusable
in a high-quality mix.
Time delay panning makes the front image unstable
Closely spaced microphones yield high correlation
at low frequencies, which degrades the sense of
space.
It is better to simply turn off the main
microphone (even if your instructor insists you
install one.)
In our Boston Symphony Hall recording, a pair of
B&K omnis spaced 25cm apart was hung behind the
conductor by the WGBH engineer.

Audio examples: front pair; front pair, LF
45
Correlation in the main microphone: two omnis
spaced by 25cm, just behind the conductor.
Solid line: measured correlation; dashed line: calculated,
assuming d = 25cm.
The high correlation in this pair makes the sound
unusable in a stereo or surround mix. It sounds
unpleasant even in this lecture room, as the
audio demo makes clear.
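The calculated curve can be approximated with the usual textbook model for two omnidirectional microphones in an ideal three-dimensional diffuse field, where the correlation is sin(kd)/(kd) with k = 2πf/c. We do not know exactly how the calculated curve on the slide was produced; this sketch assumes c = 343 m/s and uses NumPy's normalized sinc, np.sinc(x) = sin(πx)/(πx).

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def diffuse_coherence(freq_hz, spacing_m):
    """Correlation of two omnis in an ideal 3-D diffuse field: sin(kd)/(kd).

    With k = 2*pi*f/c, kd/pi = 2*f*d/c, and np.sinc(x) = sin(pi*x)/(pi*x),
    which also returns exactly 1 at f = 0.
    """
    return np.sinc(2.0 * np.asarray(freq_hz, dtype=float) * spacing_m / C)

freqs = np.array([50.0, 100.0, 200.0, 400.0, 686.0])
for d in (0.25, 2.0):
    print(f"d = {d} m:", np.round(diffuse_coherence(freqs, d), 2))
```

For d = 25cm the correlation is still above 0.95 at 100Hz and first reaches zero only at c/(2d) ≈ 686Hz, so the bass from this pair is almost mono. A pair spaced 2m apart first reaches zero at about 86Hz.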
46
Beware the exclusive use of spaced front
microphones

In our recording the wide front orchestra pick-up
is fine for the first row of the strings.
But nearly all the orchestra is beyond the
reverberation radius for these microphones.
If we want good balance and clarity, we must use
additional microphones over the orchestra
And treat these microphones as part of our main
array.
Using cardioid microphones in front will help a
lot.
The cardioid is 4.7 dB less sensitive to
reverberation, which helps it pick out more distant
instruments with clarity.
Using super cardioid microphones will help a
little bit more.
But if the stage house is reverberant the
improvement is minimal.
The author greatly prefers to use (equalized)
directional microphones for orchestra and chorus
pick-up.
After equalization the bass performance is
adequate.
There is better control of leakage, and less MUD.
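The reverberation figures above follow from the polar patterns. For a first-order pattern a + (1 - a)*cos(θ), the diffuse (reverberant) energy picked up relative to an omni of the same on-axis sensitivity is a² + (1 - a)²/3, and the directivity index is -10*log10 of that. The sketch below uses common textbook values of a; the supercardioid is taken as a = (√3 - 1)/2, one of several definitions in use.

```python
import math

def random_energy_efficiency(a):
    """Diffuse-field energy picked up by the pattern a + (1 - a)*cos(theta),
    relative to an omni with the same on-axis sensitivity: the average of
    the squared pattern over the sphere, a**2 + (1 - a)**2 / 3."""
    return a ** 2 + (1.0 - a) ** 2 / 3.0

def directivity_index_db(a):
    """How many dB less reverberation the pattern picks up than an omni."""
    return -10.0 * math.log10(random_energy_efficiency(a))

patterns = {
    "omni": 1.0,
    "cardioid": 0.5,
    "supercardioid": (math.sqrt(3.0) - 1.0) / 2.0,   # one common definition
    "hypercardioid": 0.25,
}
for name, a in patterns.items():
    di = directivity_index_db(a)
    # The same direct/reverberant ratio is reached 10**(di/20) times farther away.
    print(f"{name:14s} DI = {di:4.1f} dB, distance factor {10 ** (di / 20):.2f}")
```

The cardioid comes out at about 4.8dB, close to the 4.7dB quoted above; the supercardioid adds only about 1dB more (5.7dB), which matches "a little bit more". At the same direct/reverberant ratio a cardioid can be about 1.7 times farther from the source than an omni.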

47
Balance and distance come first

In any recording the balance between the musical
forces should reflect the needs of the music.
In this recording, even with 120 singers, the
chorus is nearly inaudible in the hall.
So we must heavily use the chorus accent
microphones.
In the final mix MOST of the energy in the
recording will come from these. In practice,
these are our MAIN microphones!
However, if we heavily use the chorus
microphones, the chorus will sound too close to
the loudspeakers
And in front of the orchestra.
To correct this distance problem we MUST use
electronic early reflections.
There is no other possible solution.
Play example

48
Let's build the hall sound

We need decorrelated reverberation in both the
front and the rear with equal level
Test just the hall microphones to see if the
reverberation is enveloping and uniform.
Then add the front microphones for the direct
sound.
Where the hall balance is not correct you MUST
augment the natural reverberation with
electronics.
In this recording the orchestra is much stronger
than the chorus, even with 120 singers, and
there is too little chorus in the natural
reverberation!!
When we add the accent microphones the chorus
will sound as if they are in a smaller space.
So we add electronic reverberation from the
chorus (equally in all four outer speakers) from
the surround reverberator.

49
Final Mix

The final mix uses the three omni microphones
over the chorus as the main microphones. They
are simply patched to left, center, and right.
The spot microphones for the soloists are mostly
mixed to the center, with some panning to the
left or right. (No divergence was used.)
The orchestra is a combination of two widely spaced
omnis patched to left front and right front,
augmented by spot microphones over the woodwinds
and the more distant strings.
The center channel was provided automatically
through leakage from the soloists' microphones.
The rear channels come from a widely spaced pair
of omnis about 20 feet behind the conductor,
extensively augmented by electronic early
reflections and late reverberation.

50
Hall sound decorrelation at low frequencies.

It is widely believed that localization is
impossible below 100Hz.
So a single subwoofer has become the standard for
reproducing low frequencies.
Although localization below 100Hz is difficult in
a small room, there is a large difference between
a single subwoofer and an independently driven
pair.
We have turned off the subwoofer in this room and
we are running the other speakers full-range.
A great recording will easily demonstrate the
difference between a single subwoofer and
full-range discrete speakers.
As a consequence you must be sure the hall sound
in your recordings is decorrelated at low
frequencies!
Both in the front and in the rear of a surround
recording.
Most single microphone array surround techniques
fail for this reason.

51
Conclusions

A great recording
Has a stable front image over a large listening
area.
Has direct sound stronger than early reflections,
microphone leakage, and reverberation.
So it is not MUDDY!
Has decorrelated early reflections both in the
front speakers and in the rear speakers.
These provide a sense of blend and depth to the
recording. But be sure to mix in an absorbent
space!
Has decorrelated late reverberation in both the
front and the back speakers.
The decorrelation must extend to low
frequencies.
It is possible to make a great recording in a
small space
But if the group is physically larger than the
reverberation radius, electronic early
reflections and reverberation will probably be
necessary.

52
Medial reflections: the detection of muddiness.

Medial reflections can cause clear differences in
quality.
We can measure medial energy through an analysis
of pitch.
Pitch information is available in each critical
band, even those above the frequency of auditory
phase-locking.
Here is an example of speech filtered into a
1000Hz 1/3 octave band.

The waveform appears to be a series of decaying
tone bursts, repeating at the fundamental
frequency. When this signal is rectified, there
is substantial energy at the fundamental
frequency.
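The claim that a rectified band carries energy at the fundamental is easy to check on a synthetic signal. In this sketch a 100Hz pulse train stands in for a voiced vowel and a brick-wall FFT filter stands in for a real 1/3 octave filter (both our assumptions); the band is the 1000Hz band used above. Three harmonics (900, 1000 and 1100Hz) fall inside the band, and their beating makes the rectified band repeat at 100Hz.

```python
import numpy as np

def third_octave_band(x, fs, fc):
    """Brick-wall 1/3 octave band-pass via the FFT, edges fc * 2**(+/-1/6).
    A simplification: a practical analyzer would use smoother filters."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(f < fc * 2 ** (-1 / 6)) | (f > fc * 2 ** (1 / 6))] = 0.0
    return np.fft.irfft(spec, n=len(x))

fs = 16000                    # sample rate in Hz
x = np.zeros(fs)              # 1 second of signal
x[::160] = 1.0                # pulse train: 160 samples = 10 ms -> 100 Hz fundamental

band = third_octave_band(x, fs, 1000.0)   # keeps harmonics at 900, 1000, 1100 Hz
env = np.abs(band)                        # full-wave rectification

# Spectrum of the rectified band below 500 Hz, with DC removed
env_spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
low = freqs < 500.0
peak_hz = freqs[low][np.argmax(env_spec[low])]
print(f"strongest envelope component below 500 Hz: {peak_hz:.0f} Hz")
```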
53
Waveform of speech formants
The waveform of the word "five" in the 2kHz 1/3
octave band.
The same, but convolved with a 20ms windowed
burst of white noise, simulating a diffuse
reflection, or the sound of a small reverberant
room.
Non-reverberant speech has a clear repeating
pattern in the waveform. Reverberant speech does
not. We can devise a measurement system around
this difference.
54
The plus/minus pitch detector
The pitch detector operates separately on each
third octave band. Each band is rectified and
low-pass filtered. The output is delayed, and
then added and subtracted from the undelayed
signal. The logs of the plus signal and the
minus signal are then subtracted from each
other. The result has a high sensitivity to
fundamental pitch.
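The detector described above can be sketched as follows. Several details are our assumptions: brick-wall FFT filters stand in for the real band-pass and low-pass filters, the envelope mean is removed before the comparison, and the plus and minus signals are reduced to their energies summed over the whole signal, so the sketch returns one coherence value per trial period rather than the time-varying output shown on the following slides. Near the true period the delayed envelope nearly cancels the undelayed one, so the minus energy collapses and the log difference peaks.

```python
import numpy as np

def third_octave_band(x, fs, fc):
    """Brick-wall 1/3 octave band-pass via the FFT (stand-in for a real filter)."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(f < fc * 2 ** (-1 / 6)) | (f > fc * 2 ** (1 / 6))] = 0.0
    return np.fft.irfft(spec, n=len(x))

def lowpass(x, fs, cutoff_hz):
    """Brick-wall low-pass via the FFT (stand-in for a real filter)."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), 1.0 / fs) > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(x))

def plus_minus_coherence(x, fs, fc, min_period_ms=3.0, max_period_ms=12.5,
                         env_cutoff_hz=800.0):
    """Pitch coherence in dB for each trial period (lag) in one 1/3 octave band."""
    # Band-pass, rectify, low-pass: the envelope repeats at the pitch period.
    env = lowpass(np.abs(third_octave_band(x, fs, fc)), fs, env_cutoff_hz)
    env = env - env.mean()
    lags = np.arange(int(fs * min_period_ms / 1000), int(fs * max_period_ms / 1000) + 1)
    coherence = np.empty(len(lags))
    for i, lag in enumerate(lags):
        undelayed, delayed = env[lag:], env[:-lag]
        plus = np.sum((undelayed + delayed) ** 2)    # energy of the sum
        minus = np.sum((undelayed - delayed) ** 2)   # energy of the difference
        # Subtract the logs; the tiny constant avoids log(0).
        coherence[i] = 10.0 * np.log10(plus + 1e-12) - 10.0 * np.log10(minus + 1e-12)
    return lags, coherence

fs = 16000
pulses = np.zeros(fs)                 # 1 s synthetic "voice": 125 Hz pulse train
pulses[::128] = 1.0                   # 128 samples = 8 ms period
noise = np.random.default_rng(0).standard_normal(fs)
for name, sig in (("pulse train", pulses), ("white noise", noise)):
    lags, coh = plus_minus_coherence(sig, fs, fc=2000.0)
    best = lags[np.argmax(coh)]
    print(f"{name}: best period {1000 * best / fs:.1f} ms, peak {coh.max():.1f} dB")
```

On the exactly periodic pulse train the best period is 8.0ms and the peak is very large, limited only by the small constant that prevents log(0); white noise stays within a few dB of 0dB. A steady synthetic tone remains periodic after convolution with any fixed impulse response, so this sketch will not reproduce the loss of coherence caused by the 20ms noise burst on the earlier slide; that effect presumably requires signals whose pitch and spectrum change over time, like real speech.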
55
Example: "one, two", 2500Hz 1/3 octave band.
Pitch detector output with dry speech: the
syllables "one, two" with no added reverberation.
Note the high accuracy of the fundamental
extraction and the >15dB S/N.
56
Same, but convolved with 20ms of white noise
Convolving with white noise changes neither the
intelligibility nor the C80, but it dramatically
changes the sound and the pitch coherence. By
chance the second syllable is not seriously
degraded, but the first one is, at least in this
1/3 octave band. The sound quality is markedly
degraded. We need a measure for this perception.
57
"one, two", 2500Hz band: equal mix of direct sound and
one diffuse reflection at 30ms.
The high pitch coherence and high
direct/reverberant ratio in the first 30ms is
easily seen at the start of each syllable.
58
Segment of opera: old Bolshoi
Segment from the old Bolshoi
Segment from the new Bolshoi. (I was unable to
produce a similar plot.)
Segment of Verdi: pitch coherence of the 2500Hz
1/3 octave band. F, F, glide to A. Recording
from the back of the first balcony. There is no
obvious gap before reflections arrive, and the
pitch coherence appears relatively high.
59
Sound examples: syllables "one, two, three" with
no reverberation
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz
Note that the height and frequency
of the pitch coherence peaks are (almost) uniform
across all bands.
60
Maximum pitch coherence vs 1/3 octave band for
non-reverberant speech
The syllables "one, two, three, four, five, six, seven"
are analyzed. Note that the maximum pitch
coherence is relatively constant across all 1/3
octave bands, although the value depends on the
particular vowel.
61
"one, two, three" convolved with 20ms noise
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz
Note that most of the pitch coherence has been
eliminated.
62
Maximum pitch coherence vs 1/3 octave bands for
speech convolved with 20ms noise.
The syllables "one, two, three, four, five, six, seven"
are analyzed. Note the pitch coherence is low and
not constant across third octave bands.
63
Pitch coherence of speech with a diffuse
reflection at a level of 0dB
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz
Note the low pitch coherence for some
of the syllables in several bands.
64
Maximum pitch coherence vs 1/3 octave bands for
direct/reflected ratio of 0dB
Analysis of the syllables "one, two, three, four,
five, six, seven". Note the low and noise-like
coherence for most of the syllables.
65
Pitch coherence of speech with a diffuse
reflection at a level of -4dB (optimum)
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz
Note the high pitch
coherence on most syllables in most bands. This
reflection level is usually chosen as optimum.
66
Max pitch coherence vs 1/3 octave band for direct
sound plus a reflection at -4dB
Analysis of the syllables "one, two, three, four,
five, six, seven". Note the pitch coherence is
both high and uniform across 1/3 octave bands.
67
Teatro alla Scala, Milan
Echograms from La Scala (from Hidaka and
Beranek) illustrate these profiles. Top curve:
2kHz octave band, 0-200ms. At 2kHz, note the high
direct sound and the low level of reflections in the
50-150ms time range. Bottom curve: 500Hz octave
band, 0-200ms. Note the high reverberation level
and short critical distance.
68
Let's listen to Alla Scala!

Matlab can be used to read these printed impulse
responses and convert them into real impulse
responses.
1. First we read the .bmp file from a scan, and
convert the peaks in the file to delta functions
with identical time delay, and an amplitude
equivalent to the peak height.
All the direct sound energy is combined into a
single delta function, and the level of the
direct sound is normalized (relative to the rest
of the decay), so the 2kHz and 500Hz impulses
can be accurately combined.
2. We then apply a random offset of ±5ms to the
delay time to correct for the quantization in the
scan.
3. We then extend the echogram to higher times by
tacking on an exponentially decaying segment of
white noise, with a decay rate equal to the
published data for the hall.
4. We then filter the result for the 2kHz
echogram with a 1kHz high-pass filter, and combine
it with the 500Hz echogram low-pass filtered at
1kHz.
5. If desired we can create a right channel and
a left channel reverberation by using a
different set of random variables in steps 2 and
3.
6. We convolve a segment of dry sound with the
new impulse response.
The result is sonically quite convincing!
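Steps 2, 3 and 5 can be sketched in NumPy (the talk used Matlab). This is a minimal illustration under our own assumptions: the scanned peaks from step 1 are already available as times and heights, with the first entry being the combined, normalized direct sound; the ±5ms jitter is our reading of the slide; the noise tail is matched to the energy density of the last 20ms of peaks; and the crossover of step 4 is not shown. The peak values in the usage example are invented for illustration and are not La Scala data.

```python
import numpy as np

def echogram_to_ir(times_ms, amps, fs, rt60_s, total_s, jitter_ms, seed):
    """Rebuild an impulse response from peaks read off a printed echogram.

    times_ms / amps: peak times and heights; entry 0 is the normalized direct sound.
    """
    rng = np.random.default_rng(seed)
    ir = np.zeros(int(fs * total_s))
    ir[0] = amps[0]                                   # direct sound: not jittered

    # Step 2: each reflection becomes a delta with a random time offset,
    # breaking up the quantization of the scanned plot.
    for t, a in zip(times_ms[1:], amps[1:]):
        t_jit = t + rng.uniform(-jitter_ms, jitter_ms)
        ir[max(1, int(round(fs * t_jit / 1000)))] += a

    # Step 3: continue the decay with exponentially decaying white noise.
    # The amplitude falls 60 dB (a factor of 1000) in rt60_s seconds.
    end_ms = max(times_ms)
    recent = [a for t, a in zip(times_ms[1:], amps[1:]) if t >= end_ms - 20.0]
    sigma = np.sqrt(np.sum(np.square(recent)) / 0.020 / fs)   # match energy per second
    start = int(fs * end_ms / 1000)
    t = np.arange(len(ir) - start) / fs
    ir[start:] += sigma * rng.standard_normal(len(t)) * np.exp(-np.log(1000.0) * t / rt60_s)
    return ir

# Hypothetical peaks (NOT measured data): time in ms, height relative to direct
times = [0, 12, 25, 31, 48, 60, 85, 110, 140, 170, 200]
heights = [1.0, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.12, 0.1, 0.08]
fs = 48000

# Step 5: different random draws give decorrelated left and right channels.
left = echogram_to_ir(times, heights, fs, rt60_s=1.5, total_s=1.5, jitter_ms=5.0, seed=1)
right = echogram_to_ir(times, heights, fs, rt60_s=1.5, total_s=1.5, jitter_ms=5.0, seed=2)
```

The two seeds give left and right impulse responses with the same decay but decorrelated detail. Step 6 is then an ordinary convolution, for example np.convolve(dry, left) for a dry mono signal dry.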

69
Alla Scala at 500Hz: reading the plot
Top curve: 500Hz measured impulse response as
given by Beranek, JASA Vol. 107, No. 1, Jan 2000, pp.
356-367. Bottom curve: impulse response as
regenerated from delta functions, passed through
a 500Hz 6th-order 1-octave filter. Note the
correspondence is more than plausible.
70
Alla Scala at 500Hz: randomizing and extending
Top graph: Alla Scala published data. Bottom
graph: regenerated impulse response after
randomization and extension.
71
Pitch coherence of speech in La Scala
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz
Note the excellent
sharpness of the pitch peaks, and good
consistency across bands.
72
Maximum coherence vs 1/3 octave bands: La Scala,
Milan
Pitch coherence is similar to our example where
the direct/reverberant ratio is 4dB. While not as
clear as in some examples, fundamental pitch is
easily extracted using this simple detector.
73
Listen to Alla Scala, NNT Tokyo, Semperoper
2kHz
500Hz
2kHz and 500Hz impulse responses from La Scala,
Milan; NNT Theater, Tokyo; and the Semperoper,
Dresden (all data from Hidaka and Beranek)
Original Sound
74
Pitch coherence: NNT opera house, Tokyo
1/3 octave bands: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz
Note that the peaks, where they
exist, are very broad, indicating inexact pitch
extraction. For most bands, pitch is not
extracted for all syllables.
75
Maximum coherence vs 1/3 octave band: NNT Opera
Theater, Tokyo
Fundamental pitch is not extractable using this
simple detector.
76
Binaural Examples in Opera Houses

It is very difficult to study opera acoustics, as
the sound changes drastically depending on
the set design,
the position of the singers (actors),
the presence of the audience, and
the presence of the orchestra.
Binaural recordings made during performances give
us the only clues.
Here is a sound bite from a famous German opera
house. Note the excessive distance of the
singers, and the low intelligibility. This is
MUD in action!
And here is an example from another famous German
opera house. Note the increase in
intelligibility, reduced distance, and the
improvement in dramatic connection between the
singer and the audience.

77
Synthetic Opera House Study

We can use MC12 Logic 7 to separate the orchestra
from the singers on commercial recordings, and
test different theories of balance and
reverberation.
From Barenboim's Elektra. The balance in the original
is OK.

Audio examples: original; orchestra left/right; vocals;
downmix with no reverb on the singers; reverb from the
orchestra; reverb from the singers; downmix with
reverb on the singers.
78
Muddiness: dry speech vs. 20ms noise
Mono speech signal. Convolved with noise
(diffuse reflections), in mono and stereo: note that the
reflections increase muddiness and distance. The
stereo version is more natural than the mono, but
equally distant.
79
Recorded speech in Covenant
Voice segment recorded at 1.5m with a
supercardioid mike. The same segment with the
reflections below: note that the muddiness increases
dramatically. Reflections: a frequency-flat pattern
with peak energy about 30ms after the direct sound.
80
Demo 1 Clarity

Demonstrate dry sound
Demonstrate muddy sound by adding reflections in
monaural.
Note that adding only very early reflections does
not decrease the intelligibility.
But it increases the perceived distance of the
source.
Demonstrate adding reflections in surround
Note that adding the reflections in surround
increases the perceived distance more
effectively.
Less reflected energy is needed, and the direct
sound remains clear.
The optimum early energy is between -4dB and -6dB.