The Physics and PsychoAcoustics of Surround Recording Part 2 - PowerPoint PPT Presentation

1
The Physics and Psycho-Acoustics of Surround
Recording Part 2
  • David Griesinger
  • Lexicon
  • dgriesinger_at_lexicon.com
  • www.world.std.com/griesngr

2
Introduction
  • We all know how to make a good recording
  • We need good music
  • A very good performance
  • And satisfactory balance between the solos and
    the instruments.
  • But we want to make a great recording
  • How do we do it?
  • How do we know when a recording is great?
  • We must learn how to hear the technical quality
    of a great recording,
  • And learn how to achieve the best result.
  • The talk is based on classical music but the
    techniques and perceptions apply to all
    recordings.

3
The recording space is very important!
  • It is much easier to achieve a great result in a
    large hall.
  • But large halls with great acoustics are rare.
  • Our job is to make a great result in the hall we
    have available (usually small).
  • This talk will tell you how to do it.
  • And help you hear the difference.
  • We will not talk about issues such as
    instrumental balance
  • or the differences between microphones or sample
    rates.
  • We will talk about basic sound properties
  • The clarity and localization of the direct sound
  • The perceived distance between the sound source
    and the listener (depth)
  • The recording and reproduction of the sound of
    the hall.

4
Major Goals
  • To review the physical and psychoacoustic
    properties that make a great recording (or a
    great performance space).
  • The clarity of the direct sound (the absence of
    muddiness)
  • The creation of a large listening area and a
    stable front image using three front speakers
    in a 5.1 recording.
  • The blending together of the different
    instruments into a whole acoustic scene through
    early reflections.
  • The re-creation of the acoustic space of the
    performance, through late reflections and
    envelopment.
  • To show how muddiness occurs when there are too
    many early reflections
  • To show how we perceive muddiness through our
    perception of pitch.
  • To show how the loudspeaker positions in the
    playback room influence the envelopment at low
    frequencies.
  • To play as many musical examples as possible!

5
Localization: a stable front image over a large
listening area
  • In a high-quality recording the front image does
    not greatly change when a listener moves away
    from the sweet spot.
  • Image stability requires using the center channel
    speaker in a 5.1 recording.
  • Even without the center speaker some two channel
    recordings are more stable than others.
  • Popular music recordings are often better than
    classical recordings in image stability.
  • The secret is Amplitude Panning
  • Which is almost universally used in popular music
    recording.

6
Time delay panning
  • Many engineers attempt to record a broad sound
    source with closely spaced microphones
  • Omni microphones are often used in a so-called
    Decca Tree.
  • Cardioid microphones are often used in the ORTF
    configuration
  • Both these techniques rely on time delay
    differences to spread the front image
  • Time delay spreading only works when the listener
    is in the sweet spot.
  • The front image is not stable over a large area.

7
Training to hear localization
  • The importance of ignoring the sweet spot
  • Most research tests of localization use a single
    listener, who is strictly restricted to the sweet
    spot.
  • Your customers will not listen this way!
  • How do you know if the recording has a stable
    front image?
  • Move laterally in front of the loudspeakers.
    Does the sound image stay wide and fixed to the
    loudspeakers, or does it follow you?
  • Do the soloists in the center follow you left or
    right? If they do they are recorded with too
    much phantom center.
  • Since most 5 channel recording methods are
    derived from stereo techniques almost all have
    too much phantom center.
  • A center image that follows a listener who moves
    laterally out of the sweet spot is the most
    common failing of even the best five channel
    recordings.
  • Play examples

8
Example: time delay panning outside the sweet
spot.
Record the orchestra with a Decca Tree: three
omni microphones separated by one meter. A
source on the left will be picked up with equal
level in all three microphones. The arrival times
will differ by about 3ms.
On playback, a listener on the far right will
hear this instrument coming from the right
loudspeaker. This listener will hear every
instrument coming from the right.
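As a rough check on the size of these delays, the sketch below converts a microphone spacing into the largest arrival-time difference it can produce. The speed of sound (343 m/s) and the function name are assumptions added for illustration; they are not from the slides.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def path_delay_ms(path_difference_m: float) -> float:
    """Arrival-time difference, in ms, for a given path-length difference."""
    return 1000.0 * path_difference_m / SPEED_OF_SOUND_M_S


# Worst case: the source lies on the line through two microphones 1 m apart,
# so the full spacing becomes path difference.
max_delay_ms = path_delay_ms(1.0)
print(f"max delay for 1 m spacing: {max_delay_ms:.2f} ms")
```

A one meter spacing therefore gives delays of at most a few milliseconds, with essentially no level difference to anchor the image.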
9
Amplitude panning outside the sweet spot.
If you record with three widely spaced
microphones, an instrument on the left will have
high amplitude in the left microphone. The time
delay will also be much shorter.
A listener on the far right will hear the
instrument on the left. Now the orchestra
spreads out across the entire loudspeaker basis,
even when the listener is not in the sweet spot.
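To see why widely spaced microphones give amplitude differences while closely spaced ones do not, the sketch below compares direct-sound levels using a simple inverse-distance (1/r) law. The source and microphone positions are hypothetical values chosen for illustration, and reflections are ignored.

```python
import math


def distance_m(a, b):
    """Straight-line distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def level_difference_db(source, mic_near, mic_far):
    """Direct-sound level in mic_near minus level in mic_far (1/r law)."""
    return 20.0 * math.log10(
        distance_m(source, mic_far) / distance_m(source, mic_near)
    )


# Hypothetical geometry: source 4 m left of center, 3 m from the mic line.
source = (-4.0, 3.0)
close_db = level_difference_db(source, (-0.5, 0.0), (0.5, 0.0))  # 1 m apart
wide_db = level_difference_db(source, (-4.0, 0.0), (4.0, 0.0))   # 8 m apart
print(f"closely spaced: {close_db:.1f} dB, widely spaced: {wide_db:.1f} dB")
```

With these assumed positions the closely spaced pair differs by little more than a decibel, while the widely spaced pair differs by roughly nine.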
10
WARNING!!!
  • In the author's experience a front image that is
    not stable when you walk in front of the speakers
    will never make a great recording,
  • regardless of how beautiful it is in the sweet
    spot.
  • This is my FIRST test of a recording, either two
    channel or surround.

11
Summary of acoustic perceptions in a recording
  • 1. Clarity: the lack of muddiness
  • Clarity is perceived through the direct sound:
    sound that travels directly from the instrument
    to the microphone.
  • A clear direct sound requires that the microphone
    be relatively close to the instrument!
  • 2. Blend and depth
  • Blend and depth are perceived through early
    reflections that arrive from all around the
    listener.
  • The total energy in these early reflections must
    be less than the energy in the direct sound!
  • In a surround recording these reflections should
    come equally from all the loudspeakers (except
    the center), and they must be decorrelated
    (different in each loudspeaker).
  • 3. Envelopment (reverberation)
  • Envelopment is perceived through late reflected
    energy that arrives from all around the listener.
    (Not just from the rear!)
  • The energy must be decorrelated in each
    loudspeaker

12
Clarity
  • Clarity to an acoustician is determined through
    intelligibility: the ability to understand
    speech or a musical line.
  • For this talk I will use a different meaning
  • For me clarity is the perception that the sound
    source is acoustically close to the listener.
  • While this definition may seem vague, almost
    everyone agrees on the optimal acoustic distance
    for a recorded sound source.
  • We can demonstrate this perception

13
Muddiness: dry speech and 40ms reflections
Mono speech: the sound is clear, but much
too close to the loudspeaker.
Speech with 40ms allpass reflections and no direct
sound, in mono and in stereo: note that both the mono and the
stereo versions sound muddy and distant. There is
no phantom image in the stereo version.
14
Reflections used in these experiments
The reflections used in these experiments form a
decaying burst which peaks about 25ms after the
direct sound, and has largely decayed away by
50ms. The reflections are different in the two
channels, and have a flat frequency response.
15
Optimum level for Early Reflections
  • Recorded sound consists of a mix of direct sound
    and reflections
  • Too many reflections and muddiness results.
  • But reflections add a sense of blend and depth.
  • An optimum mix must be found.
  • The optimum level for early reflections is -4 to
    -6dB relative to the direct sound.
  • This level is preferred by almost every listener.
  • In a surround recording the reflections should
    come equally from all directions (except the
    center), and be decorrelated.
  • The perceived result is independent of the
    precise delay time and the pattern of the
    reflections.
  • It is the total energy which determines the
    perception.
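The level rule above can be sketched numerically. The example below builds a different random reflection pattern for each of the four outer speakers and scales them so that their summed energy sits 5dB below a unit-energy direct sound. The tap counts, the 20-50ms window, the random gains, and the assumption that decorrelated reflections add in energy are all illustrative choices, not the reflection generator used in the demonstrations.

```python
import math
import random


def db_to_energy(db):
    """Convert a level in dB to an energy ratio."""
    return 10.0 ** (db / 10.0)


def energy_db(taps):
    """Summed energy of (delay_ms, gain) taps in dB re a unit direct sound."""
    return 10.0 * math.log10(sum(g * g for _, g in taps))


def make_reflections(n_taps, total_db, seed, t_min_ms=20.0, t_max_ms=50.0):
    """Random reflection pattern scaled so its summed energy is total_db."""
    rng = random.Random(seed)
    raw = [
        (rng.uniform(t_min_ms, t_max_ms),
         rng.uniform(0.5, 1.0) * rng.choice((-1.0, 1.0)))
        for _ in range(n_taps)
    ]
    scale = math.sqrt(db_to_energy(total_db) / sum(g * g for _, g in raw))
    return sorted((t, g * scale) for t, g in raw)


speakers = ["L", "R", "LS", "RS"]           # the center is deliberately left out
target_total_db = -5.0                       # within the -4 to -6dB range
per_speaker_db = target_total_db - 10.0 * math.log10(len(speakers))

# A different seed per speaker gives a different (decorrelated) pattern.
patterns = {name: make_reflections(6, per_speaker_db, seed=i)
            for i, name in enumerate(speakers)}
all_taps = [tap for pattern in patterns.values() for tap in pattern]
total_db = energy_db(all_taps)
print(f"per speaker: {per_speaker_db:.2f} dB, total: {total_db:.2f} dB")
```

Because only the total energy is constrained, the individual delays and gains can change freely between speakers without moving the direct-to-reflected ratio.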

16
Depth without Muddiness
  • Dry speech
  • Note the sound is uncomfortably close
  • Mix of dry with early reflections at -5dB.
  • The mix has distance (depth), and is not muddy!
  • Note there is no apparent reverberation, just
    depth.
  • Same but with the reflections delayed 20ms at
    -5dB.
  • Note also that with the additional delay the
    reflections begin to be heard as discrete echoes.
  • But the apparent distance remains the same.
  • Same but with the reflections delayed 50ms at
    -3dB
  • Now the sound is becoming garbled. These
    reflections are undesirable!
  • If the speech were faster it would be difficult
    to understand.
  • Same but with reflections delayed 150ms at -12dB
  • I also added a few reflections between 20 and
    80ms at a level of -8dB to smooth the decay.
  • Note the strong hall sense, and the lack of
    muddiness.

17
The ideal mix
  • We see from the previous slide that the ideal
    acoustic mix has three independent perceptual
    requirements
  • 1. The direct sound dominates the total energy
    by at least 4dB.
  • 2. There are early reflections that add blend,
    distance, and depth to the sound.
  • These should come equally from all directions in
    a surround recording
  • And they should avoid adding energy in the 50ms
    to 100ms time region.
  • 3. There should be reflections (reverberation)
    with time delays greater than 150ms to provide
    the impression of the hall.
  • To make a great recording we must separately
    capture all three!

18
Direction of early reflections
  • It is not possible to detect whether the
    reflections come from the front or the rear when
    they arrive between 20ms and 50ms after the end
    of a sound.
  • But it is more natural if they come from both
    front and rear.
  • Using all four speakers also results in the
    largest sweet spot - demo

19
Muddiness is hard to avoid in small spaces!
  • We are attempting to show that the optimum total
    energy for all reflections is at least 4dB less
    than the direct sound.
  • The total reflected energy sum does not include
    the floor reflection.
  • I will explain why later if there is time.
  • The direct sound must dominate the total sound
    picture
  • The reverberation radius of a small hall or
    church is usually below 2m, and may be as low as
    1m.
  • Every microphone used in the recording picks up
    both direct sound and reverberation.
  • But only the microphone closest to the sound
    source picks up true direct sound.
  • Direct sound into all the other microphones is
    perceived as a reflection, and adds to the
    potential distance and muddiness.
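The reverberation radius can be estimated from room volume and reverberation time. The sketch below uses the common Sabine-based approximation r ≈ 0.057·sqrt(Q·V/RT60); that formula, the room dimensions, and the function names are assumptions brought in for illustration and do not appear in the slides. It also shows how far a microphone can be from an omnidirectional source before the 4dB direct-sound margin is lost.

```python
import math


def reverberation_radius_m(volume_m3, rt60_s, source_directivity=1.0):
    """Sabine-based estimate of the distance where direct = reverberant."""
    return 0.057 * math.sqrt(source_directivity * volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m, radius_m):
    """Direct/reverberant ratio for an omni microphone at distance_m."""
    return 20.0 * math.log10(radius_m / distance_m)


# Hypothetical small hall: 1500 cubic metres with an RT of 1.8 s.
radius = reverberation_radius_m(1500.0, 1.8)
ratio_at_3_radii = direct_to_reverberant_db(3.0 * radius, radius)

# Largest omni distance that still keeps the direct sound 4dB ahead.
max_distance_for_4db = radius * 10.0 ** (-4.0 / 20.0)
print(f"radius {radius:.2f} m, D/R at 3 radii {ratio_at_3_radii:.1f} dB, "
      f"4dB margin inside {max_distance_for_4db:.2f} m")
```

In this assumed room the 4dB margin holds only within about one meter of the source, which is why every additional distant microphone adds to the reflected energy.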

20
Muddiness also comes from the playback room!
In this room there is no absorption in the front,
and thus the reverberation radius is small,
perhaps as low as 2.5m. The distance from the
front loudspeakers to the listeners is greater
than the reverberation radius. So the
reverberation will be stronger than the direct
sound. We are trying to keep the direct sound
stronger than the reflections by 4dB. This goal
is probably not possible to achieve in this room!
(Except at frequencies above 1000Hz, where the
side curtains begin to be absorptive.) Always mix
your recordings in an absorbent space!
21
Boston Cantata Singers: Cantata 76, "Die Himmel
erzählen die Ehre Gottes"
Performance in Jordan Hall, January 23, 2004.
Reverberation time in Jordan: 1.4 seconds at
1000Hz. This is similar to the Semperoper
Dresden. The typical audience member is 3
reverb radii from this singer. The dramatic
consequences are highly audible.
Although Jordan is beloved as a chamber music
hall, the stage house is deep and reverberant.
When the hall is full, the sound in the audience
can be dry and muddy. The recording engineer
must overcome these obstacles.
22
Cantata Singers: Bach BWV 76
Multimiked recording. Note the clarity of vocal
timbre (low sonic distance).
Recording simulating the sound in the hall. Note
the timbre coloration and the sense of distance
to the performers. With the picture, and after
adaptation, the performance is quite enjoyable.
23
The Ideal Reverberation
  • Has 20ms to 50ms reflections with a total energy
    of -4dB to -6dB relative to the direct sound.
  • Has relatively little energy from 50 to 150ms.

24
Most small rooms (including playback rooms)
  • Have exponential decay
  • If we pick up enough late reflections to hear the
    hall, we will get too many early reflections.
  • We will get coloration and poor intelligibility.
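The trade-off in an exponential decay can be made concrete. The sketch below splits the reflected energy of an idealized exponential decay into the time windows used in this talk; the decay is assumed to start at the arrival of the direct sound, and the 1.0 s reverberation time is a hypothetical value.

```python
import math


def energy_fraction(t_start_s, t_end_s, rt60_s):
    """Share of reverberant energy arriving between t_start_s and t_end_s."""
    k = 6.0 * math.log(10.0) / rt60_s  # energy falls by 60dB in rt60_s
    return math.exp(-k * t_start_s) - math.exp(-k * t_end_s)


def split(rt60_s):
    """Energy shares before 50ms, from 50 to 150ms, and after 150ms."""
    return {
        "early 0-50ms": energy_fraction(0.0, 0.05, rt60_s),
        "medial 50-150ms": energy_fraction(0.05, 0.15, rt60_s),
        "late >150ms": energy_fraction(0.15, math.inf, rt60_s),
    }


small_room = split(1.0)  # hypothetical RT of 1.0 s
for name, share in small_room.items():
    print(f"{name}: {100.0 * share:.1f}% of reflected energy")
```

With a pure exponential decay the shares are fixed by the reverberation time alone, so raising the late energy by any amount raises the early and medial energy by the same factor.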

25
Example of a small recording space: Swedenborg
Chapel, Cambridge
26
Oriana Consort in Swedenborg Chapel
27
Oriana Setup
28
Recording in Swedenborg Chapel, Cambridge
  • The chapel holds perhaps 200 people, but when it
    is empty the RT is 1.8 seconds.
  • And the reverberation radius is 1.5m
  • The picture shows four supercardioid microphones
    about 1m from the chorus. These provide the
    direct sound.
  • With the supercardioid pattern we have a 6dB
    direct/reverberant ratio, so the reverberation is
    less than the direct sound by about 6dB.
  • Note that in this space we must add hall sound
    and early reflections very carefully, or the
    sound will become muddy!
  • In addition the early reflections and
    reverberation arrive soon after the direct sound.
    The sound seems small and cramped. There is no
    sense of space around the direct sound.
  • The chorus microphones are as close as they can
    be to the chorus without creating balance
    problems.
  • We cannot exclude the early reverberation by
    moving the mikes closer.

29
Main microphones in Swedenborg Chapel
  • The picture also shows two variable pattern
    microphones about 2m from the chorus.
  • I put these there for an experiment. The sound
    is not very good
  • The problem with a main microphone pair in this
    space is that it must be placed too far from the
    singers!
  • A main pair must be at least 2.5m away or there
    will be balance problems.
  • This distance is beyond the reverberation radius,
    and the sound will be muddy.

30
Hall Sound in Swedenborg
  • The chapel is reverberant with a high
    reverberation level
  • But the reverberation is too strong in the
    10-150ms time range.
  • Using cardioid microphones pointing away from the
    sound source reduces the early reverberation
    energy and maximizes the late energy.
  • The hall sounds larger and better.

31
Distance Perception and MUD
  • Reflections during the sound event and up to
    150ms after it ends create the perception of
    distance
  • But there is a price to pay
  • Reflections from 10-50ms do not impair
    intelligibility.
  • The fluctuations they produce are perceived as an
    acoustic "halo" or "air" around the original sound
    stream. (ESI)
  • Reflections from 50-150ms contribute to the
    perception of distance but they degrade both
    timbre and intelligibility, producing the
    perception of sonic MUD.
  • We will have many examples of mud in this talk!

32
Training to hear MUD
  • Mud occurs when the reverberant decay of the
    recording venue has too much reflected energy in
    the 10-150ms region of the decay curve.
  • This is true of nearly all sound stages, small
    auditoria, and churches.
  • If you are recording in such a space with a
    relatively large ensemble, you are in trouble.
  • The perception of mud can be tricky, because our
    hearing mechanism adapts to a muddy environment,
    and the sonic degradation becomes inaudible after
    about 10 minutes.
  • It is easy to convince yourself the recording is
    excellent when you have been listening to it all
    day.
  • This is why we can enjoy a concert even when we
    are sitting far from the instruments.
  • You MUST compare your recording to a reference
    recording in a short time A/B test.

33
Example: John Eargle at Skywalker Ranch
  • John Eargle has made wonderful recordings,
    particularly those with the Dallas Symphony on
    Delos Records
  • But even he can be fooled by a small space
  • As I said, you adapt quickly to such a space, and
    no longer hear the mud that it produces.
  • John Eargle recently made a 5.1 channel DVD audio
    recording at Skywalker Ranch in Marin County,
    California.
  • He was very excited by it, but listen and
    compare to Dallas.
  • Skywalker is a large sound stage with
    controllable acoustics. It is not a concert
    hall.
  • As a consequence the reverberation radius is
    relatively short. By my estimate (without having
    seen it) the radius is less than 3.5 meters.
  • It is very easy to record mud in such a space.
  • Many instruments are beyond the reverb radius.
  • Adding more microphones only increases the
    reverberant pickup.

34
Recording in a large space is much easier!
Covenant church is a very large space, holding
more than 1000 people. It is damped by pew
cushions and acoustic treatment on the walls,
yielding an RT of 2.5 seconds and a large
reverberation radius, probably above 3m. The
microphones can be quite distant without picking
up early reflections or reverberation. It is a
very good place to record! (And it is
exceptionally beautiful visually)
35
Example: depth perspective through mike
technique
  • When the reverberation radius is large enough we
    can use an extra pair of microphones to create a
    single early reflection.
  • This can provide the needed perspective and depth

[Sound examples: direct sound only; direct plus early
reflection at -5dB; direct plus early reflection plus late
reverberation at -8dB. Diagram labels: Mike, 480L.]
36
The depth impression is greatly improved in
surround
  • I will run the same experiment, but use all five
    speakers.
  • The early reflections will come from both the
    front and rear equally, but different delay
    patterns will be used for each speaker.
  • This means the reflections are decorrelated.
  • The late (hall) reflections will also come
    equally but decorrelated in the front and rear
    speakers.
  • This will create a large and uniform sweet spot
    for the acoustics.

37
The Polyhymnia Pentangle
  • The Polyhymnia engineers employ a surround array
    of spaced omni microphones, at a spacing similar
    to the ITU playback array.
  • The technique works well in spaces where the
    reverberation radius is equal to or greater than
    the microphone spacing!
  • In this case the direct sound picked up by the
    rear microphones is perceived as an early lateral
    reflection and adds distance to the front
    image.
  • Caution!! In a small hall this array will be TOO
    MUDDY!!!

In practice the Polyhymnia engineers often pick
up the direct sound with accent microphones. In
this case the front microphones provide a first
reflection to the front speakers. The center
microphone is also often moved closer to the
sound sources, so it picks up mostly direct sound.
38
Boston Symphony Hall
39
Boston Symphony Hall
  • 2631 seats, 662,000 ft³, 18,700 m³, RT 1.9s
  • It's enormous!
  • One of the greatest concert halls in the world,
    maybe the best.
  • Recording here is almost too easy!
  • Working here is a rare privilege.
  • Sufficiently rare that I do not do it. (It's a union
    shop.)
  • The recording in this talk is courtesy of Alan
    McClellan of WGBH Boston. (Mixed from 16 tracks
    by the presenter.)
  • Reverb radius is >20ft (>6.6m) even on stage.
  • The stage house is enormous and NOT reverberant.
    With the orchestra in place, stage house RT 1
    sec.

40
Boston Symphony Hall, occupied, stage to front of
balcony, 1000Hz
This picture compares favorably to our picture of
the ideal reverberation on a recording. But this
is what an audience member hears 100 feet from
the stage!
41
Boston Symphony Orchestra in Symphony Hall
42
Boston Cantata Singers in Symphony Hall. March
17, 2002
43
Microphone Array (WGBH)
44
Beware the main microphone array
  • Nearly all engineers will provide a main
    microphone, usually a Decca Tree, or a pair of
    omni or cardioid microphones.
  • Almost always the sound from this array is only
    acceptable for instruments close to the
    microphones.
  • Most of the instruments are far beyond the
    reverberation radius.
  • The more distant instruments must be spot-miked.
  • A cardioid pair (ORTF) has too much phantom
    center for an acceptable surround recording.
    (this is a two-channel technique only.)
  • Very frequently time delay panning (for a Decca
    Tree or spaced omnis) makes the sound unusable
    in a high-quality mix.
  • Time delay panning makes the front image unstable
  • Closely spaced microphones yield high correlation
    at low frequencies, which degrades the sense of
    space.
  • It is better to simply turn off the main
    microphone (even if your instructor insists you
    install one.)
  • In our Boston Symphony Hall recording a pair of
    B&K omnis spaced 25cm apart was hung behind the
    conductor by the WGBH engineer.

[Sound examples: front pair; front pair, LF]
45
Correlation in the main microphone: two omnis
spaced by 25cm, just behind the conductor.
Solid line: measured correlation. Dashed line: calculated,
assuming d = 25cm.
The high correlation in this pair makes the sound
unusable in a stereo or surround mix. It sounds
unpleasant even in this lecture room, as the
audio demo makes clear.
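A calculated curve of this kind can be sketched with the standard diffuse-field result for two omnidirectional microphones, sin(kd)/(kd) with k = 2πf/c. Whether the plotted curve used exactly this expression is an assumption, as are the speed of sound and the function name.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def diffuse_field_correlation(freq_hz, spacing_m):
    """Correlation of two spaced omnis in an ideal diffuse sound field."""
    x = 2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND_M_S
    return 1.0 if x == 0.0 else math.sin(x) / x


spacing_m = 0.25
first_zero_hz = SPEED_OF_SOUND_M_S / (2.0 * spacing_m)
for freq in (50.0, 100.0, 200.0, 400.0, first_zero_hz):
    value = diffuse_field_correlation(freq, spacing_m)
    print(f"{freq:6.0f} Hz: correlation {value:+.3f}")
```

Under these assumptions the pair stays almost fully correlated through the bass range and only decorrelates as the frequency approaches several hundred hertz.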
46
Beware the exclusive use of spaced front
microphones
  • In our recording the wide front orchestra pick-up
    is fine for the first row of the strings.
  • But nearly all the orchestra is beyond the
    reverberation radius for these microphones.
  • If we want good balance and clarity, we must use
    additional microphones over the orchestra
  • And treat these microphones as part of our main
    array.
  • Using cardioid microphones in front will help a
    lot.
  • The cardioid is 4.7 dB less sensitive to
    reverberation, which will pick out more distant
    instruments with clarity.
  • Using super cardioid microphones will help a
    little bit more.
  • But if the stage house is reverberant the
    improvement is minimal.
  • The author greatly prefers to use (equalized)
    directional microphones for orchestra and chorus
    pick-up.
  • After equalization the bass performance is
    adequate.
  • There is better control of leakage, and less MUD.
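The 4.7 dB figure for the cardioid can be checked from the random energy efficiency of a first-order pattern a + (1 - a)·cos θ, which is a² + (1 - a)²/3. The pattern coefficients below are the textbook values for each microphone type; the function name is an assumption for illustration.

```python
import math


def reverb_rejection_db(a):
    """Reduction in diffuse pickup, relative to an omni, for the
    first-order pattern a + (1 - a) * cos(theta)."""
    random_energy_efficiency = a * a + (1.0 - a) ** 2 / 3.0
    return -10.0 * math.log10(random_energy_efficiency)


patterns = {
    "omni": 1.0,
    "cardioid": 0.5,
    "supercardioid": (math.sqrt(3.0) - 1.0) / 2.0,  # about 0.366
}
rejection = {name: reverb_rejection_db(a) for name, a in patterns.items()}
for name, db in rejection.items():
    print(f"{name}: {db:.2f} dB less sensitive to reverberation")
```

The supercardioid gains roughly one further decibel over the cardioid, consistent with the "little bit more" described above.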

47
Balance and distance come first
  • In any recording the balance between the musical
    forces should reflect the needs of the music.
  • In this recording, even with 120 singers the
    chorus is nearly inaudible in the hall.
  • So we must heavily use the chorus accent
    microphones.
  • In the final mix MOST of the energy in the
    recording will come from these. In practice,
    these are our MAIN microphones!
  • However, if we heavily use the chorus
    microphones, the chorus will sound too close to
    the loudspeakers
  • And in front of the orchestra.
  • To correct this distance problem we MUST use
    electronic early reflections.
  • There is no other possible solution.
  • Play example

48
Let's build the hall sound
  • We need decorrelated reverberation in both the
    front and the rear with equal level
  • Test just the hall microphones to see if the
    reverberation is enveloping and uniform.
  • Then add the front microphones for the direct
    sound.
  • Where the hall balance is not correct you MUST
    augment the natural reverberation with
    electronics.
  • In this recording the orchestra is much stronger
    than the chorus, even with 120 singers, and
    there is too little chorus in the natural
    reverberation!!
  • When we add the accent microphones the chorus
    will sound as if they are in a smaller space.
  • So we add electronic reverberation from the
    chorus (equally in all four outer speakers) from
    the surround reverberator.

49
Final Mix
  • The final mix uses the three omni microphones
    over the chorus as the main microphones. They
    are simply patched to left, center, and right.
  • The spot microphones for the soloists are mostly
    mixed to the center, with some panning to the
    left or right. (No divergence was used.)
  • The orchestra is a combination of two wide spaced
    omnis patched to left front and right front.
  • Augmented by spot microphones over the woodwinds
    and the more distant strings.
  • The center channel was provided automatically
    through leakage from the soloists' microphones.
  • The rear channels come from a widely spaced pair
    of omnis about 20 feet behind the conductor,
  • Extensively augmented by electronic early
    reflections and late reverberation.

50
Hall sound decorrelation at low frequencies.
  • It is widely believed that localization is
    impossible below 100Hz.
  • So a single subwoofer has become the standard for
    reproducing low frequencies.
  • Although localization below 100Hz is difficult in
    a small room, there is a large difference between
    a single subwoofer and an independently driven
    pair.
  • We have turned off the subwoofer in this room and
    we are running the other speakers full-range.
  • A great recording will easily demonstrate the
    difference between a single subwoofer and
    full-range discrete speakers.
  • As a consequence you must be sure the hall sound
    in your recordings is decorrelated at low
    frequencies!
  • Both in the front and in the rear of a surround
    recording.
  • Most single microphone array surround techniques
    fail for this reason.

51
Conclusions
  • A great recording
  • Has a stable front image over a large listening
    area.
  • Has direct sound stronger than early reflections,
    microphone leakage, and reverberation.
  • So it is not MUDDY!
  • Has decorrelated early reflections both in the
    front speakers and in the rear speakers.
  • These provide a sense of blend and depth to the
    recording. But be sure to mix in an absorbent
    space!
  • Has decorrelated late reverberation in both the
    front and the back speakers.
  • The decorrelation must be active for low
    frequencies
  • It is possible to make a great recording in a
    small space
  • But if the group is physically larger than the
    reverberation radius, electronic early
    reflections and reverberation will probably be
    necessary.

52
Medial Reflections: the detection of muddiness.
  • Medial reflections can cause clear differences in
    quality.
  • We can measure medial energy through an analysis
    of pitch.
  • Pitch information is available in each critical
    band, even those above the frequency of auditory
    phase-locking.
  • Here is an example of speech filtered into a
    1000Hz 1/3 octave band.

The waveform appears to be a series of decaying
tone bursts, repeating at the fundamental
frequency. When this signal is rectified, there
is substantial energy at the fundamental
frequency.
53
Waveform of speech formants
The waveform of the word "five" in the 2kHz 1/3
octave band.
The same, but convolved with a 20ms windowed
burst of white noise, simulating a diffuse
reflection, or the sound of a small reverberant
room.
Non-reverberant speech has a clear repeating
pattern in the waveform. Reverberant speech does
not. We can devise a measurement system around
this difference.
54
The plus/minus pitch detector
The pitch detector operates separately on each
third octave band. Each band is rectified and
low-pass filtered. The output is delayed, and
then added and subtracted from the undelayed
signal. The logs of the plus signal and the
minus signal are then subtracted from each
other. The result has a high sensitivity to
fundamental pitch.
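A minimal sketch of this detector for a single band follows. The one-pole low-pass filter, its 400Hz cutoff, the use of summed squares as the power measure, the search over delays, and the synthetic test signal are all assumptions made for illustration; the third-octave band filter is omitted and the input is taken to be already band-limited.

```python
import math


def envelope(x, fs, cutoff_hz=400.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for v in x:
        y += alpha * (abs(v) - y)
        out.append(y)
    return out


def plus_minus_db(env, lag, start):
    """log(plus power) minus log(minus power), in dB, for one delay."""
    plus = sum((env[n] + env[n - lag]) ** 2 for n in range(start, len(env)))
    minus = sum((env[n] - env[n - lag]) ** 2 for n in range(start, len(env)))
    return 10.0 * math.log10((plus + 1e-12) / (minus + 1e-12))


def detect_pitch_hz(x, fs, f_min=80.0, f_max=400.0):
    """Return the pitch whose period gives the largest plus/minus ratio."""
    env = envelope(x, fs)
    max_lag = int(fs / f_min)
    start = max_lag + int(0.02 * fs)   # skip the filter's start-up transient
    lags = range(int(fs / f_max), max_lag + 1)
    best = max(lags, key=lambda lag: plus_minus_db(env, lag, start))
    return fs / best


# Synthetic band signal: decaying 1kHz tone bursts repeating at 125Hz.
fs = 16000
period = 128
burst_train = [
    math.exp(-(n % period) / (0.002 * fs))
    * math.sin(2.0 * math.pi * 1000.0 * (n % period) / fs)
    for n in range(1600)
]
pitch_hz = detect_pitch_hz(burst_train, fs)
print(f"detected fundamental: {pitch_hz:.1f} Hz")
```

When the delay matches the pitch period the minus signal nearly vanishes, so the log difference peaks sharply; reflections that smear the envelope reduce that peak.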
55
Example: "one, two", 2500Hz 1/3 octave band.
Pitch detector output with dry speech: the
syllables "one, two" with no added reverberation.
Note the high accuracy of the fundamental
extraction and the >15dB S/N.
56
Same, but convolved with 20ms of white noise
Convolving with white noise does not change the
intelligibility, nor the C80, but dramatically
changes the sound and the pitch coherence. By
chance the second syllable is not seriously
degraded, but the first one is, at least in this
1/3 octave band. The sound quality is markedly
degraded. We need a measure for this perception.
57
"one, two", 2500Hz band: equal mix of direct and
one diffuse reflection at 30ms.
The high pitch coherence and high
direct/reverberant ratio in the first 30ms are
easily seen at the start of each syllable.
58
Segment of opera: old Bolshoi
Segment from the old Bolshoi.
Segment from the new Bolshoi. (I was unable to
produce a similar plot.)
Segment of Verdi: pitch coherence of the 2500Hz
1/3 octave band. F, F, glide to A. Recording
from the back of the first balcony. There is no
obvious gap before reflections arrive, and the
pitch coherence appears relatively high.
59
Sound examples: syllables "one, two, three" with
no reverberation
[Figure panels: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the height and frequency
of the pitch coherence peaks are (almost) uniform
through all bands.
60
Maximum pitch coherence vs 1/3 octave band for
non-reverberant speech
The syllables one two three four five six seven
are analyzed. Note that the maximum pitch
coherence is relatively constant across all 1/3
octave bands, although the value depends on the
particular vowel.
61
"one, two, three" convolved with 20ms noise
[Figure panels: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note that most of the pitch coherence has been
eliminated.
62
Maximum pitch coherence vs 1/3 octave bands for
speech convolved with 20ms noise.
The syllables one two three four five six seven
are analyzed. Note the pitch coherence is low and
not constant across third octave bands.
63
Pitch coherence of speech with a diffuse
reflection at a level of 0dB
[Figure panels: 1kHz, 1.25kHz, 1.6kHz, 2kHz, and 2.5kHz 1/3 octave bands]
Note the low pitch coherence for some
of the syllables in several bands.
64
Maximum pitch coherence vs 1/3 octave bands for
direct and reverb at 0dB
Analysis of the syllables one two three four
five six seven. Note the low and noise-like
coherence for most of the syllables.
65
Pitch coherence of speech with a diffuse
reflection at a level of -4dB (optimum)
[Figure panels: 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands]
Note the high pitch
coherence on most syllables in most bands. This
reflection level is usually chosen as optimum.
66
Max pitch coherence vs 1/3 octave band for direct
and reflected at -4dB
Analysis of the syllables one two three four
five six seven. Note the pitch coherence is
both high and uniform across 1/3 octave bands
67
Teatro Alla Scala, Milan
Echograms from La Scala (from Hidaka and
Beranek) illustrate these profiles.
Top curve: 2kHz octave band, 0-200ms. At 2kHz note the high
direct sound and low level of reflections in the
50-150ms time range.
Bottom curve: 500Hz octave
band, 0-200ms. Note the high reverberation level
and short critical distance.
68
Let's listen to Alla Scala!
  • Matlab can be used to read these printed impulse
    responses and convert them into real impulse
    responses.
  • 1. First we read the .bmp file from a scan, and
    convert the peaks in the file to delta functions
    with identical time delay, and an amplitude
    equivalent to the peak height.
  • All the direct sound energy is combined into a
    single delta function, and the level of the
    direct sound is normalized (relative to the rest
    of the decay), so the 2kHz and 500Hz impulses
    can be accurately combined.
  • 2. We then apply a random variation of ±5ms to the
    delay time to correct for the quantization in the
    scan.
  • 3. We then extend the echogram to higher times by
    tacking on an exponentially decaying segment of
    white noise, with a decay rate equal to the
    published data for the hall.
  • 4. We then filter the result for the 2kHz
    echogram with a 1kHz high-pass filter, and combine
    it with the 500Hz echogram low-pass filtered at
    1kHz.
  • 5. If desired we can create a right channel and
    a left channel reverberation by using a
    different set of random variables in steps 2 and
    3.
  • 6. We convolve a segment of dry sound with the
    new impulse response.
  • The result is sonically quite convincing!
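
The steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Matlab code used for the talk. It assumes the peaks have already been read off the scan as (delay in ms, amplitude) pairs, it skips the band filtering of step 4, and it matches the noise tail to the level of the last scanned peak. The function names, the example peak values and the 1.2s decay time are all hypothetical.

```python
import math
import random


def build_impulse_response(peaks, fs, rt60_s, total_s, jitter_ms=5.0, seed=0):
    """Turn echogram peaks into a sampled impulse response.

    peaks: (delay_ms, amplitude) pairs; the earliest one is treated as the
    direct sound (hypothetical input format).
    """
    rng = random.Random(seed)
    peaks = sorted(peaks)
    n = int(round(total_s * fs))
    ir = [0.0] * n

    # Steps 1 and 2: one delta function per peak, each reflection delay
    # shifted by a uniform random amount within +/- jitter_ms.
    for i, (t_ms, amp) in enumerate(peaks):
        jitter = 0.0 if i == 0 else rng.uniform(-jitter_ms, jitter_ms)
        idx = int(round(max(0.0, t_ms + jitter) * fs / 1000.0))
        if idx < n:
            ir[idx] += amp

    # Step 3: append exponentially decaying white noise after the last
    # scanned peak. The amplitude falls by 60dB (a factor of 1000) per rt60_s.
    start = int(round(peaks[-1][0] * fs / 1000.0)) + 1
    k = math.log(1000.0) / rt60_s
    tail_level = abs(peaks[-1][1])  # assumption: continue at the last peak's level
    for j in range(start, n):
        t = (j - start) / fs
        ir[j] += tail_level * rng.gauss(0.0, 1.0) * math.exp(-k * t)
    return ir


def convolve(dry, ir):
    """Step 6: direct-form convolution of a dry signal with the response."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out


# Step 5: a different random seed per channel gives a left/right pair.
example_peaks = [(0.0, 1.0), (20.0, 0.5), (45.0, 0.3)]  # made-up values
left = build_impulse_response(example_peaks, fs=1000, rt60_s=1.2, total_s=0.3, seed=1)
right = build_impulse_response(example_peaks, fs=1000, rt60_s=1.2, total_s=0.3, seed=2)
wet_left = convolve([1.0, 0.0, 0.5], left)
```

The direct-form convolution is slow in pure Python and is only meant to show the idea; for real audio an FFT-based convolution would be the usual choice.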

69
Alla Scala at 500Hz reading the plot
Top curve: 500Hz measured impulse response as
given by Beranek, JASA Vol. 107, No. 1, Jan 2000,
pp 356-367.
Bottom curve: impulse response as regenerated from
delta functions, passed through a 500Hz 6th order
1 octave filter. Note the correspondence is more
than plausible.
70
Alla Scala 500Hz randomizing and extending
Top graph: Alla Scala published data.
Bottom graph: regenerated impulse response after
randomization and extension.
71
Pitch coherence of speech in La Scala
1kHz
1.25kHz
1.6kHz
2kHz
2.5kHz
3.2kHz
Note the excellent sharpness of the pitch peaks,
and good consistency across bands.
72
Maximum coherence vs 1/3 octave bands La Scala,
Milan
Pitch coherence is similar to our example where
the direct/reverberant ratio is 4dB. While not as
clear as in some examples, fundamental pitch is
easily extracted using this simple detector.
73
Listen to Alla Scala, NNT Tokyo, Semperoper
2kHz
500Hz
2kHz and 500Hz
Impulse responses from La Scala, Milan; NNT
Theater, Tokyo; Semperoper, Dresden.
(All data from Hidaka and Beranek.)
Original Sound
74
Pitch Coherence NNT opera house, Tokyo
1kHz
1.25kHz
1.6kHz
2kHz
2.5kHz
3.2kHz
Note the peaks, where they exist, are very broad,
indicating inexact pitch extraction. For most
bands, there is no extracted pitch for all
syllables.
75
Maximum coherence vs 1/3 octave band NNT Opera
Theater, Tokyo
Fundamental pitch is not extractable using this
simple detector.
76
Binaural Examples in Opera Houses
  • It is very difficult to study opera acoustics, as
    the sound changes drastically depending on
  • the set design,
  • the position of the singers (actors),
  • the presence of the audience, and
  • the presence of the orchestra.
  • Binaural recordings made during performances give
    us the only clues.
  • Here is a sound bite from a famous German opera
    house. Note the excessive distance of the
    singers, and the low intelligibility. This is
    MUD in action!
  • And here is an example from another famous German
    opera house. Note the increase in
    intelligibility, reduced distance, and the
    improvement in dramatic connection between the
    singer and the audience.

77
Synthetic Opera House Study
  • We can use MC12 Logic 7 to separate the orchestra
    from the singers on commercial recordings, and
    test different theories of balance and
    reverberation.
  • From Elektra (Barenboim). The balance in the
    original is OK by Barenboim.

Original. Orchestra Left/Right. Vocals.
Downmix - no reverb on the singers.
Reverb from orchestra. Reverb from singers.
Downmix with reverb on the singers.
78
Muddiness: Dry Speech and 20ms noise
Mono speech signal.
Convolved with noise (diffuse reflections): Mono,
Stereo.
Note the reflections increase muddiness and
distance. The stereo version is more natural than
the mono, but equally distant.
79
Recorded speech in Covenant
Voice segment recorded at 1.5m with a
supercardioid mike.
The same segment with the reflections below. Note
the muddiness increases dramatically.
A frequency-flat reflection pattern with peak
energy about 30ms after the direct sound.
80
Demo 1 Clarity
  • Demonstrate dry sound
  • Demonstrate muddy sound by adding reflections in
    monaural.
  • Note that adding only very early reflections does
    not decrease the intelligibility.
  • But it increases the perceived distance of the
    source.
  • Demonstrate adding reflections in surround
  • Note that adding the reflections in surround
    increases the perceived distance more
    effectively.
  • Less reflected energy is needed, and the direct
    sound remains clear.
  • The optimum early energy is between -4dB and -6dB
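
As a quick aid for reading these figures, the short sketch below converts a level in dB to linear ratios. It assumes the -4dB and -6dB values give the early reflected energy relative to the direct sound, which the slide does not state explicitly; the function names are illustrative.

```python
def db_to_energy_ratio(level_db):
    """Energy (power) ratio for a level in dB: 10^(dB/10)."""
    return 10.0 ** (level_db / 10.0)


def db_to_amplitude_ratio(level_db):
    """Amplitude (gain) ratio for a level in dB: 10^(dB/20)."""
    return 10.0 ** (level_db / 20.0)


for level in (-4.0, -6.0):
    print(level, round(db_to_energy_ratio(level), 3),
          round(db_to_amplitude_ratio(level), 3))
# -4.0 0.398 0.631
# -6.0 0.251 0.501
```

Under that assumption, the optimum early energy is roughly 25% to 40% of the direct sound energy.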