1. Sound and Music for Video Games
- Technology Overview
- Roger Crawfis
- Ohio State University
2. Overview
- Fundamentals of Sound
- Psychoacoustics
- Interactive Audio
- Applications
3. What is sound?
- Sound is the sensation perceived by the sense of hearing
- Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves
4. Dual Nature of Sound
- Transfer of sound and physical stimulation of the ear
- Physiological and psychological processing in the ear and brain (psychoacoustics)
5. Transmission of Sound
- Requires a medium with elasticity and inertia (air, water, steel, etc.)
- Movements of air molecules result in the propagation of a sound wave
6. Longitudinal Motion of Air
7. Wavefronts and Rays
8. Reflection of Sound
9. Absorption of Sound
- Some materials readily absorb the energy of a sound wave
- Examples: carpet, curtains in a movie theater
10. Refraction of Sound
11. Refraction of Sound
12. Diffusion of Sound
- Not analogous to diffusion of light
- Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies
- Nearly full diffusion of sound requires a reflection phase grating (Schroeder Diffuser)
13. The Inverse-Square Law (Attenuation)
- I = W / (4πr²)
- I is the sound intensity in W/cm², W is the sound power of the source in W, and r is the distance from the source in cm
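As a minimal numeric sketch of the law above (assuming an idealized point source radiating uniformly in all directions, with no absorption):

```python
import math

def sound_intensity(power_w, distance_cm):
    """Inverse-square law: intensity in W/cm^2 at distance r (in cm)
    from a point source of power W, spread over a sphere of area 4*pi*r^2."""
    return power_w / (4.0 * math.pi * distance_cm ** 2)

# Doubling the distance quarters the intensity.
i_near = sound_intensity(1.0, 100.0)
i_far = sound_intensity(1.0, 200.0)
```

This is why simple game attenuation models scale volume with 1/r² (often clamped, since a true point source gets arbitrarily loud as r approaches 0).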
14. The Skull
- Occludes wavelengths that are small relative to the skull
- Causes diffraction around the head (helps amplify sounds)
- Wavelengths much larger than the skull are not affected (explains why low frequencies are not directional)
15. The Pinna
16. Ear Canal and Skull
- (A) Dark line: ear canal only
- (B) Dashed line: ear canal and skull diffraction
17. Auditory Area (20 Hz-20 kHz)
18. Spatial Hearing
- The ability to determine the direction and distance of a sound source
- Not a fully understood process
- However, some cues have been identified as useful
19. The Duplex Theory of Localization
- Interaural Intensity Differences (IIDs)
- Interaural Arrival-Time Differences (ITDs)
20. Interaural Intensity Difference
- The skull produces a sound shadow
- The intensity difference results from one ear being shadowed and the other not
- The IID does not apply to frequencies below 1000 Hz (waves similar to or larger than the size of the head)
- Sound shadowing can result in drops of up to 20 dB for frequencies > 6000 Hz
- The Inverse-Square Law can also affect intensity
21. Head Rotation or Tilt
- Rotation or tilt can alter the interaural spectrum in a predictable manner
22. Interaural Arrival-Time Difference
- Perception of the phase difference between the ears caused by an arrival-time delay (ITD)
- The ear closest to the sound source hears the sound before the other ear
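A common textbook sketch of the ITD is Woodworth's spherical-head approximation (not from these slides; the head radius below is an assumed average, and the formula models the extra path length around a rigid sphere):

```python
import math

SPEED_OF_SOUND_CM_S = 34300.0  # speed of sound in air, cm/s
HEAD_RADIUS_CM = 8.75          # assumed average human head radius

def itd_seconds(azimuth_deg):
    """Interaural arrival-time difference for a distant source at the
    given azimuth (0 = straight ahead, 90 = directly to one side),
    using Woodworth's formula: ITD = (r/c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_CM / SPEED_OF_SOUND_CM_S) * (theta + math.sin(theta))
```

The ITD is zero for a source straight ahead and grows to roughly two thirds of a millisecond for a source directly to the side, which is the cue the duplex theory assigns to low frequencies.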
23. Digital Sound
- Remember that sound is an analogue process (like vision)
- Computers need to deal with digital processes (like digital images)
- Computer imagery and computer sound processing share many similar properties
24. Class or Semantics
- Sample
- Stream Sounds
- Music
- Tracks
- MIDI
25. Sound for Games
- Stereo doesn't cut it anymore; you need positional audio
- Positional audio increases immersion
- The old way: vary volume as position changes
- The new way: Head-Related Transfer Functions (HRTFs) for 3D positional audio with 2-4 speakers
- Games use:
- Dolby 5.1: requires lots of speakers
- Creative's EAX: environmental audio
- Aureal's A3D: good positional audio
- DirectSound3D: Microsoft's answer
- OpenAL: open, cross-platform API
26. Audio Basics
- Sound has two fundamental physical properties:
- Frequency: the pitch of the wave; oscillations per second (Hertz)
- Amplitude: the loudness or strength of the wave (decibels)
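The two properties map directly onto the parameters of a pure tone. A minimal sketch (the function name and defaults are illustrative, not from the slides):

```python
import math

def sine_samples(freq_hz, amplitude, sample_rate=44100, duration_s=0.01):
    """Generate samples of a pure tone: freq_hz sets the pitch,
    amplitude sets the loudness (here on a -1..1 scale)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

tone = sine_samples(440.0, 0.5)  # concert A at half amplitude
```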
27. Sampling
- A sound wave is sampled
- measurements of amplitude taken at a fast rate
- results in a stream of numbers
28. Data Rates for Sound
- The human ear can hear frequencies between ?? and ??.
- Must sample at twice the highest frequency.
- Assume stereo (two channels)
- Assume a 44.1 kHz sampling rate (the CD sampling rate)
- Assume 2 bytes per channel per sample
- How much raw data is required to record 3 minutes of music?
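Working the question above through with the stated assumptions:

```python
sample_rate = 44100    # samples per second per channel (CD rate)
channels = 2           # stereo
bytes_per_sample = 2   # 16-bit samples
seconds = 3 * 60       # 3 minutes

raw_bytes = sample_rate * channels * bytes_per_sample * seconds
megabytes = raw_bytes / (1024 * 1024)
# 44100 * 2 * 2 * 180 = 31,752,000 bytes, roughly 30 MB
```

About 30 MB for three minutes of raw CD-quality audio, which is why compression matters.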
29. Waveform Sampling: Quantization
- Quantization introduces noise
- Examples: 16-, 12-, 8-, 6-, 4-bit music; 16-, 12-, 8-, 6-, 4-bit speech
30. Limits of Human Hearing
- Time and Frequency
- Events longer than 0.03 seconds are resolvable in time
- Shorter events are perceived as features in frequency
- 20 Hz < Human Hearing < 20 kHz (for those under 15 or so)
- Pitch is PERCEPTION related to FREQUENCY
- Human pitch resolution is about 40-4000 Hz
31. Limits of Human Hearing
- Amplitude or Power???
- Loudness is PERCEPTION related to POWER, not AMPLITUDE
- Power is proportional to the (integrated) square of the signal
- The range of human loudness perception is about 120 dB, where dB = 10 × log10(power ratio) = 20 × log10(amplitude ratio)
- Waveform shape is of little consequence. Energy at each frequency, and how that changes in time, is the most important feature of a sound.
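The two decibel formulas are consistent because power goes as amplitude squared, so the exponent turns a factor of 10 into a factor of 20:

```python
import math

def db_from_power_ratio(p_ratio):
    """Decibels from a ratio of powers."""
    return 10.0 * math.log10(p_ratio)

def db_from_amplitude_ratio(a_ratio):
    """Decibels from a ratio of amplitudes; power ~ amplitude**2,
    so 10*log10(a**2) = 20*log10(a)."""
    return 20.0 * math.log10(a_ratio)

# 10x the power is +10 dB; 10x the amplitude is 100x the power, +20 dB.
```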
32. Limits of Human Hearing
- Waveshape or Frequency Content??
- Here are two waveforms with identical power spectra, which are (nearly) perceptually identical
- [Figure: Wave 1, Wave 2, and their shared magnitude spectrum]
33. Limits of Human Hearing
- Masking in Amplitude, Time, and Frequency
- Masking in amplitude: loud sounds mask soft ones
- Example: quantization noise
- Masking in time: a soft sound just before a louder sound is more likely to be heard than if it is just after
- Example (and reason): reverb vs. preverb
- Masking in frequency: a loud neighboring frequency masks soft spectral components. Low sounds mask higher ones more than high masks low.
34. Limits of Human Hearing
- Masking in Amplitude
- Intuitively, a soft sound will not be heard if there is a competing loud sound. Reasons:
- Gain controls in the ear (the stapedius reflex and more)
- Interaction (inhibition) in the cochlea
- Other mechanisms at higher levels
35. Limits of Human Hearing
- Masking in Time
- In the time range of a few milliseconds
- A soft event following a louder event tends to be grouped perceptually as part of that louder event
- If the soft event precedes the louder event, it might be heard as a separate event (become audible)
36. Limits of Human Hearing
- Masking in Frequency
- Only one component in this spectrum is audible, because of frequency masking
37. Sampling Rates
- For cheap compression, look at lowering the sampling rate first
- 44.1 kHz, 16-bit: CD quality
- 8 kHz, 8-bit mu-law: phone quality
- Examples:
- Music: 44.1, 32, 22.05, 16, 11.025 kHz
- Speech: 44.1, 32, 22.05, 16, 11.025, 8 kHz
38. Views of Digital Sound
- Two (mainstream) views of sound and their implications for compression:
- 1) Sound is Perceived
- The auditory system doesn't hear everything present
- Bandwidth is limited
- Time resolution is limited
- Masking in all domains
- 2) Sound is Produced
- A perfect model could provide perfect compression
39. Production Models
- Build a model of the sound production system, then fit the parameters
- Example: if the signal is speech, then a well-parameterized vocal model can yield the highest quality and compression ratio
- Benefits: highest possible compression
- Drawbacks: signal source(s) must be assumed, known, or identified
40. MIDI and Other Event Models
- Musical Instrument Digital Interface
- Represents music as notes and events, and uses a synthesis engine to render it
- An Edit Decision List (EDL) is another example
- A history of source materials, transformations, and processing steps is kept. Operations can be undone or recreated easily. Intermediate non-parametric files are not saved.
41. Event Based Compression
- A musical score is a very compact representation of music
- Benefits:
- Highest possible compression
- Drawbacks:
- Cannot guarantee the performance
- Cannot assure the quality of the sounds
- Cannot make arbitrary sounds
42. Event Based Compression
- Enter General MIDI
- Guarantees a base set of instrument sounds, and a means for addressing them, but doesn't guarantee any quality
- Better yet, Downloadable Sounds
- Download samples for instruments
- Benefits: does more to guarantee quality
- Drawbacks: samples aren't reality
43. Event Based Compression
- Downloadable Algorithms
- Specify the algorithm; the synthesis engine runs it, and we just send parameter changes
- Part of Structured Audio (MPEG-4)
- Benefits:
- Can upgrade algorithms later
- Can implement scalable synthesis
- Drawbacks:
- Different algorithm for each class of sounds (but can always fall back on samples)
44. Compressed Audio Formats

Name                  Extension      Ownership
AIFF (Mac)            .aif, .aiff    Public
AU (Sun/NeXT)         .au            Public
CD audio (CDDA)       N/A            Public
MP3                   .mp3           MPEG Audio Layer-III
Windows Media Audio   .wma           Proprietary (Microsoft)
QuickTime             .qt            Proprietary (Apple)
RealAudio             .ra, .ram      Proprietary (Real Networks)
WAV                   .wav           Public
45. To be continued
- Stop here
- Sound Group Technical Presentations
- Suggested Topics:
- Compression
- Controlling the Environment
- ToolKit I features
- ToolKit II features
- Examples and Demos
46. Environmental Effects
- Obstruction/Occlusion
- Reverberation
- Doppler Shift
- Atmospheric Effects
47. Obstruction
- Same as sound shadowing
- Generally approximated by a ray test and a low-pass filter
- High frequencies should get shadowed while low frequencies diffract
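The low-pass step can be as cheap as a one-pole filter, a minimal sketch of the idea (the smoothing factor below is an illustrative assumption; real engines tune it to the occluding material):

```python
def low_pass(samples, alpha=0.2):
    """One-pole low-pass filter, a cheap stand-in for sound shadowing.
    Each output moves a fraction alpha toward the input, so rapid
    (high-frequency) changes are smoothed away while slow ones pass."""
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

hf = low_pass([1.0, -1.0] * 50)  # fastest possible alternation: attenuated
dc = low_pass([1.0] * 100)       # constant (lowest frequency): passes through
```

When the ray test reports an obstruction, the engine routes the source through a filter like this; when the path clears, it fades alpha back toward 1.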
48. Obstruction
49. Occlusion
- A completely blocked sound
- Example: a sound that penetrates a closed door or a wall
- The sound will be muffled (low-pass filter)
50. Reverberation
- Effects from sound reflection
- Similar to echo
- Static reverberation
- Dynamic reverberation
51. Static Reverberation
- Relies on the closed-container assumption
- Parameters used to specify approximate environment conditions (decay, room size, etc.)
- Examples: Microsoft DirectSound3D, EAX
52. Static Reverberation
53. Dynamic Reverberation
- Calculation of reflections off of surfaces, taking into account surface properties
- Diffusion and diffraction are typically ignored
- Wave Tracing
- Example: Aureal A3D 2.0
54. Dynamic Reverberation
55. Comparison
- Static reverberation: less expensive computationally, simple to implement
- Dynamic reverberation: very expensive computationally, difficult to implement, but potentially superior results
56. Doppler Shift
- Change in frequency due to the relative velocity of source and listener
- Very susceptible to temporal aliasing
- The faster the update rate, the better
- Requires dedicated hardware
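The underlying formula is simple; a sketch for motion along the line between source and listener (positive speeds mean they approach each other; the default speed of sound assumes air at room temperature):

```python
def doppler_frequency(freq_hz, source_speed, listener_speed=0.0, c=343.0):
    """Observed frequency under the classical Doppler effect:
    f' = f * (c + v_listener) / (c - v_source), speeds in m/s."""
    return freq_hz * (c + listener_speed) / (c - source_speed)

# A 440 Hz siren approaching at 34.3 m/s (~10% of c) is heard ~11% higher.
f = doppler_frequency(440.0, 34.3)
```

The temporal-aliasing warning on the slide applies here: the engine recomputes the relative speed each frame, and large per-frame jumps in f' produce audible pitch steps, so frequent updates (or interpolation) are needed.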
57. Atmospheric Effects
- The atmosphere attenuates high frequencies faster than low frequencies
- Moisture in the air increases this effect