Color constancy in the auditory system: Cancellation of reliable spectral characteristics - PowerPoint PPT Presentation

Provided by: arielsh
Transcript and Presenter's Notes

1
  • Color constancy in the auditory system
    Cancellation of reliable spectral
    characteristics
  • with Michael J. Kiefte
  • Dalhousie University

2
? m e r I k ? z d eI r i l æ n d i
t ts i z o r d aI
3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Replace /al/ and /ar/ with a pure tone at the F3
offset frequency.
The same pattern of /da/-/ga/ perception holds for
sine waves and for full-spectrum /al/ and /ar/.
7
(No Transcript)
8
/eda/
/eba/
/oba/
/oda/
9
(No Transcript)
10
/eda/
/eba/
/oba/
/oda/
11
(No Transcript)
12
At Wisconsin
Flanking spectral energy influences perception of
vowels.
  • Sinewaves replacing F2 in CVCs: beb-b?b, ded-d?d
    (Holt, Lotto, & Kluender, 2000)
Flanking energy influences perception of
consonants.
  • Sinewaves replacing F3 in VCCVs: arda-arga,
    alda-alga (Lotto & Kluender, 1998)
  • Sinewaves replacing F2 in VCVs: iba-ida, uba-uda
    (Holt, 1999)
  • Band-limited and band-pass harmonics for CVCs:
    iba-ida, uba-uda (Coady, Kluender, & Rhode, 2003)
  • Complementary spectra for VCVs: iba-ida, uba-uda
    (Coady, Kluender, & Rhode, 2003)
13
Spectral Contrast
  • Contrast effects have been demonstrated for every
    modality.
  • Lightness, color, spatial frequency, apparent
    velocity
  • Loudness, pitch
  • Taste, touch, smell

14
(No Transcript)
15
Sensorineural systems respond to change, and to
little else.
  • Optimizes restricted dynamic range of biological
    transducers to function across a much larger
    physical dynamic range.
  • Common process is to neglect that which does not
    change.
  • Adaptation is ubiquitous mechanism that
    highlights change.
  • Many other mechanisms for increasing sensitivity
    to change.
  • Absolute values have little relevance to
    perception.

16
(No Transcript)
17
Hybrid stimuli were generated by setting the
spectral tilt of one vowel to the natural tilt of
another. In this example the spectral tilt of
/u/ (blue line) is raised to match that of /i/
(gold line) with the result given by the red line.
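The tilt manipulation described above can be pictured with a minimal sketch. It assumes, purely for illustration, that spectral tilt is the slope of a least-squares line fitted to level (dB) against log2 frequency, in dB per octave; the slide does not say how tilt was measured or imposed. The function names and the example spectra are hypothetical.

```python
import math

def spectral_tilt(freqs_hz, levels_db):
    """Least-squares slope of level (dB) against log2 frequency, in dB/octave.

    Assumption: tilt is summarized by one linear trend across the spectrum.
    """
    xs = [math.log2(f) for f in freqs_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(levels_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, levels_db))
    return sxy / sxx

def match_tilt(freqs_hz, source_db, target_db):
    """Re-tilt the source spectrum so its fitted slope equals the target's.

    Only the linear trend changes; local peaks (deviations from the fitted
    line) and the mean level of the source are left as they were.
    """
    delta = spectral_tilt(freqs_hz, target_db) - spectral_tilt(freqs_hz, source_db)
    xs = [math.log2(f) for f in freqs_hz]
    pivot = sum(xs) / len(xs)  # rotate about the mean log-frequency
    return [y + delta * (x - pivot) for x, y in zip(xs, source_db)]

# Hypothetical coarse spectra (dB) standing in for /u/ (steep) and /i/ (shallow).
freqs = [250, 500, 1000, 2000, 4000]
u_db = [62.0, 58.0, 41.0, 30.0, 14.0]
i_db = [60.0, 50.0, 47.0, 52.0, 44.0]

hybrid_db = match_tilt(freqs, u_db, i_db)
print("tilt of /u/    :", round(spectral_tilt(freqs, u_db), 2), "dB/octave")
print("tilt of /i/    :", round(spectral_tilt(freqs, i_db), 2), "dB/octave")
print("tilt of hybrid :", round(spectral_tilt(freqs, hybrid_db), 2), "dB/octave")
```

Rotating about the mean log-frequency keeps the overall level of the source unchanged, which is one possible design choice among several.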
18
Proportion of /i/ responses showing significant
effects of both spectral tilt and F2 and F3
formant frequency. Data are aggregated from 14
subjects. The red shaded region indicates ±1
standard error for the 50% crossover estimated
from bootstrap resampling of individual subjects.
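A bootstrap standard error of this kind can be sketched as follows. The sketch assumes each subject contributes one crossover value and that subjects are resampled with replacement; the slide does not give the actual procedure, and the 14 values, the function name, and the number of resamples are hypothetical.

```python
import random
import statistics

def bootstrap_se(per_subject_values, n_boot=2000, seed=0):
    """Standard error of the mean, estimated by resampling subjects.

    Each bootstrap replicate draws len(values) subjects with replacement
    and records the mean; the SE is the standard deviation of those means.
    """
    rng = random.Random(seed)
    n = len(per_subject_values)
    boot_means = []
    for _ in range(n_boot):
        resample = [per_subject_values[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.mean(resample))
    return statistics.stdev(boot_means)

# Hypothetical 50% crossover points for 14 subjects, on a normalized
# stimulus continuum (0 = one endpoint, 1 = the other).
crossovers = [0.42, 0.55, 0.48, 0.61, 0.50, 0.39, 0.57,
              0.46, 0.52, 0.44, 0.58, 0.49, 0.53, 0.47]

se = bootstrap_se(crossovers)
analytic_sem = statistics.stdev(crossovers) / len(crossovers) ** 0.5
print("bootstrap SE :", round(se, 4))
print("analytic SEM :", round(analytic_sem, 4))
```

The analytic standard error of the mean is printed only as a sanity check; the two numbers should be close for a statistic as simple as the mean.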
19
Proportion of /i/ responses with modified
carrier in which spectral tilt is matched to that
of the target vowel. Listeners respond primarily
on the basis of formant frequency information.
20
Proportion of /i/ responses with modified carrier
filtered by a single pole with the frequency and
bandwidth of target vowel F2. Listeners respond
primarily on the basis of spectral tilt. There is
some effect of F3, which may be confused with F2.
21
Proportions of /i/ responses with modified
random carrier with spectral tilt adjusted to
match that of target vowel.
22
Proportion of /i/ responses with modified
random carrier with additional pole corresponding
in frequency and bandwidth to F2 of target vowel.
23
(Fig. 1) Responses to RSS. (A to D) Four RSS in a
set containing 83 stimuli with spectral contrast
of 10 dB SD (dotted lines). (E) FRA of AC neuron
computed with pure tones of differing
level/frequency combinations. Driven rate is
computed over entire stimulus duration. Atten,
attenuation; Sp, spikes. (F) RSS WFs were
computed at various mean sound levels from 83
stimuli. Shape but not magnitude of WF remains
constant. (G and H) Raster plots of action
potentials (spikes) in response to tones at 70 dB
attenuation and to RSS at 80 dB attenuation
(arrows in E and F). Shaded areas represent
stimulus duration. Barbour & Wang, Science,
2003.
24
(Fig. 2) WF shape remains relatively constant
throughout stimulus duration. Shown is mean WF
similarity at less than or equal to 100 ms (n = 90
neurons) and at more than 100 ms (n = 22 neurons).
Barbour & Wang, Science, 2003.
25
(Fig. 3) Two opposite responses to spectral
contrast. (A) Tuning of the neuron in Fig. 1 to
tones (?) as compared with RSS WF ( ). (B) The WF
was converted into OLS over several spectral
contrast values (5 to 20 dB SD). Stimulus
spectra were smoothed for illustration only.
(C) Rate-level response curves for the four
stimuli in (B). Lowest contrast elicited least
response; highest contrast, greatest. Threshold
shifts commonly occur as contrast is varied.
(D) The peak rates from the rate-level curves in
(C) (filled symbols) are plotted against stimulus
contrast to produce a monotonic rate-contrast
curve. (E) Tuning of another neuron to tones (?),
0.4-octave BPN ( ), and RSS WF ( ). (F)
Smoothed spectra of OLS at contrast values
of 0 to 20 dB SD. (G) Rate-level curves showing
decreased responsiveness at the highest
contrasts. (H) Rate-contrast curve revealing
nonmonotonic characteristics. Barbour & Wang,
Science, 2003.

26
(Fig. 4) Population responses to spectral
contrast. (A and B) Rate-contrast curves for 54
high (blue) and 36 low (green) contrast neurons.
(C) Mean rate-contrast curves for high- and
low-contrast neurons. Error bars show standard
error of the mean (SEM). (D) Percentage of
neurons whose rate-contrast peaks occurred at the
given contrast values. (E) Mean ± SEM
rate-level curves showing lower thresholds and
higher rates for high- than for low-contrast
neurons but similar shapes. (F) Percentage of
neurons whose rate-level peaks occurred at the
given sound levels. Magnitude of
contrast preference (max absolute rate-contrast
slope with sign preserved) is shown as a
function of (G) CF and (H) orthogonal distance
lateral to the lateral sulcus and (I) as a
histogram. Barbour & Wang, Science, 2003.
27
(Fig. 5) Canonical responses to spectral
contrast. This coding scheme reflects
complex multi-frequency signal integration that
cannot be predicted from frequency tuning alone.
Spontaneous discharge rate. Barbour & Wang,
Science, 2003.
28
(a) Influence of stimulus-specific adaptation
(SSA) on the frequency-response curve. Thick
black line, before adaptation; magenta line,
during adaptation; thin black line, after 30 s of
recovery; blue tick mark on x-axis, adapting
frequency (3.33 kHz). (b) The oddball stimuli.
Each stimulus set consisted of three blocks
(1–3). In block 1, the lower-frequency tone (f1)
was common ('standard') and the higher-frequency
tone (f2) was rare ('deviant'). In block 2, the
roles were reversed: f2 was standard, and f1 was
deviant. In control block 3, f1 and f2 were mixed
with 50/50 probability. (c) The receptive field
of an A1 neuron, with white lines denoting the
frequencies f1 and f2 (for Δf = 0.10) and the
tone amplitude used in the oddball stimuli; these
fall well inside the receptive field. (d)
Responses (peri-stimulus time histogram, PSTH)
of the neuron in c to the oddball stimuli. Shown
are responses to the four stimulus sets (columns)
and the two frequencies f1 and f2 (rows). Red
lines, responses to deviant; blue lines,
responses to standard; black lines, responses to
control (50/50 probability). Each panel thus
represents responses to the same physical
stimulus in different probability contexts. (e)
Responses of the same neuron, averaged over f1
and f2 separately for each probability condition.
The small bars denote spike counts, and a star
indicates significantly larger responses to the
deviant than to the standard (t-test, one-tailed,
P < 0.0001). Scales in d and e are identical. (f)
Responses of another A1 neuron, presented as in
e. Ulanovsky, Las, & Nelken, Nature, 2003.
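The three-block oddball design in (b) can be laid out as a minimal sketch. The 90/10 split is taken from the 90/10 condition named in the later captions; the trial count, the function name, and the use of a simple random shuffle to order the tones are assumptions made for illustration only.

```python
import random

def oddball_block(n_trials, p_f1, seed=0):
    """Return a shuffled list of 'f1'/'f2' labels for one block.

    Exactly round(n_trials * p_f1) trials are 'f1'; the rest are 'f2'.
    """
    rng = random.Random(seed)
    n_f1 = round(n_trials * p_f1)
    block = ["f1"] * n_f1 + ["f2"] * (n_trials - n_f1)
    rng.shuffle(block)
    return block

N_TRIALS = 400  # hypothetical block length

blocks = {
    "block 1 (f1 standard, f2 deviant)": oddball_block(N_TRIALS, 0.9, seed=1),
    "block 2 (f2 standard, f1 deviant)": oddball_block(N_TRIALS, 0.1, seed=2),
    "block 3 (control, 50/50)": oddball_block(N_TRIALS, 0.5, seed=3),
}

for name, block in blocks.items():
    print(name, "f1:", block.count("f1"), "f2:", block.count("f2"))
```

Across the three blocks each tone appears as standard, as deviant, and as control, which is what allows responses to the same physical stimulus to be compared across probability contexts.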
29
(a) Population PSTHs. (b) The difference signal
(DS) between the population PSTHs for deviant and
standard: DS = deviant − standard. (c) Histogram
of the neuron-specific SI for all neurons. The
number of neurons with SI > 0 (deviant stronger
than standard on average) and SI < 0 is shown.
(d) Scatterplot of SI(f2) versus SI(f1) for all
neurons, with the number of dots above and below
the diagonal. (e) Left, schematic frequency
response curve of a hypothetical neuron,
undergoing either stimulus-insensitive adaptation
('fatigue', thick lines) or SSA (thin lines).
Right, plot of SI(f2) versus SI(f1) for
stimulus-insensitive adaptation (filled circle)
and for SSA (open square). (f) A high-resolution
plot of the initial portion of DS, showing DS
dependence on Δf (for 90/10). Blue ticks are
latency of the population response to the
standard (12, 13 and 16 ms for Δf = 0.37, 0.10
and 0.04, respectively). (g) Time course of
adaptation for the standard and deviant stimuli:
average population spike count versus serial
trial number, for 90/10, Δf = 0.37. (h)
Population comparison of the frequency response
curve with the responses to the deviant (red) and
to the standard (blue); see Methods. The six
frequencies represent the three Δf values, in
pairs (for example, Δf = 0.37 for the two
outermost points). Error bars, population mean ±
s.e.m. The number of neurons here was smaller
than in c, as only neurons that had frequency
response curve data were included here.
Ulanovsky, Las, & Nelken, Nature, 2003.
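The caption says only that a positive SI means the deviant response is stronger than the standard response on average. A minimal sketch of one index with that property is the normalized difference (d − s)/(d + s); treating SI as exactly this form is an assumption here, and the function name and spike counts are hypothetical.

```python
def stimulus_specific_index(deviant_response, standard_response):
    """Normalized difference between deviant and standard responses.

    Assumed form: (d - s) / (d + s). Positive when the deviant response
    is larger, negative when the standard response is larger, and bounded
    between -1 and 1 for non-negative responses.
    """
    total = deviant_response + standard_response
    if total == 0:
        return 0.0  # no response in either context: no preference
    return (deviant_response - standard_response) / total

# Hypothetical mean spike counts for one neuron at the two frequencies.
si_f1 = stimulus_specific_index(deviant_response=12.0, standard_response=6.0)
si_f2 = stimulus_specific_index(deviant_response=9.0, standard_response=7.0)
print("SI(f1) =", round(si_f1, 3))
print("SI(f2) =", round(si_f2, 3))
```

A pair (SI(f1), SI(f2)) computed this way gives one dot in a scatterplot like the one described in (d).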
30
(a–c) Scatterplots of the discriminability,
90/10 versus 50/50, for Δf = 0.37, 0.10 and
0.04. Each dot represents one neuron, and the
number of dots above and below the diagonal is
shown. Dots above the diagonal mean better
discriminability for the 90/10 condition than
for 50/50. The total number of neurons is
smaller than in Fig. 2c because only neurons that
had 50/50 data were included here. The
populations of the most sensitive neurons (top
10%), which differ in size and composition for
the three Δf values, are marked for 90/10 ()
and for 50/50 ( ). Stars () denote neurons that
are most sensitive for both 90/10 and 50/50
conditions. The circle and square represent the
two cells in Fig. 1e and f, respectively. (d)
Psychometric curves, showing discriminability by
the population of most sensitive neurons (mean ±
s.e.m.). Solid line, 90/10; hatched line,
50/50; arrows, crossing points of 75% threshold.
Ulanovsky, Las, & Nelken, Nature, 2003.
31
(a) Manipulating the interstimulus interval. Each
column represents a different interstimulus
interval; all rows show single-unit data, except
the last row, which shows data recorded from
multiunit clusters. (b–c) SSA for amplitude
deviants. (b) Neuronal responses. The first row
shows responses to the weak tone when it was
deviant (red) and standard (blue); the second row
shows responses to the strong tone, and the third
row shows their average. (c) Scatterplots of
SI(f2) versus SI(f1), for single-unit and
multi-unit data. In both a and b, stars indicate
significantly larger responses to the deviant
than to the standard (t-test, one-tailed, P <
0.05). Ulanovsky, Las, & Nelken, Nature, 2003.
32
http://micro.magnet.fsu.edu/optics/lightandcolor/sources.html
33
Spectral distribution curves demonstrating the
relative amounts of energy versus wavelength for
the three most common sources of white light are
illustrated in Figure 2. The red spectrum
represents the relative energy of tungsten light
over the visible spectrum. As is apparent, the
energy of tungsten light increases as wavelength
increases, which dramatically affects the average
color temperature of the resultant light,
especially when it is compared to that of natural
sunlight and fluorescent light. The yellow
spectrum represents what humans see with a
natural sunlight spectrum sampled at noon. Under
normal circumstances sunlight would have the
greatest amount of energy, but the spectrum has
been normalized in order to compare it to the
other two. The blue spectrum illustrates what is
seen with fluorescent light and contains some
notable differences from the tungsten and natural
sunlight spectra. Several energy peaks are
present in the fluorescent light spectrum, which
are a result of the superposed line spectrum of
mercury vapor in a fluorescent lamp.
http://micro.magnet.fsu.edu/primer/lightandcolorintro.html
34
Compare these to representations of the spectral
power distributions of average daylight and a
normal fluorescent light source.
Note how the fluorescent source is relatively low
in terms of relative power as compared to CIE
Source A (a tungsten-filament bulb) and average
daylight, and how its relative power spikes
sharply at certain wavelengths. These spikes are
also typical of gas-discharge lamps.
http://www.adobe.com/support/techguides/color/colortheory/light.html
35
About Color Constancy
  • In order to achieve color constancy, the visual
    system must somehow extract reliable spectral
    properties across the entire image in order to
    determine inherent spectral properties of objects
    within the scene (Boynton, 1988; Churchland &
    Sejnowski, 1988; Foster et al., 1997).
  • Presence of spectral peaks in illumination
    compromises color constancy (Boynton & Purl,
    1989; Von Fieandt et al., 1964).
  • Combinations of as few as 3 basis functions can
    describe over 99% of spectral reflectances of
    Munsell color chips (Cohen, 1964) and of natural
    objects (Maloney, 1985).
  • As few as 2 or 3 basis functions, describing
    estimated spectrum of illumination, are adequate
    to describe human color constancy (Maloney &
    Wandell, 1986).

36
Auditory color constancy
  • It is unknown how many basis functions are
    required to describe relevant variations in
    spectral shape for natural acoustic sources.
  • We do know that many more than 3 are required.
    For example, at least 10 discrete cosine
    transform coefficients (DCTCs) are required only
    to classify simple vowel sounds in a fashion
    comparable to human performance (Zahorian &
    Jagharghi, 1993).
  • While human color vision is accomplished using
    only four receptor classes (rods and cones), the
    human auditory system uses an array of about
    3,500 inner hair cells to encode spectral
    composition.
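To make the idea of describing spectral shape with a handful of cosine coefficients concrete, here is a minimal sketch. It assumes a plain, unnormalized DCT-II over uniformly spaced frequency channels; the cited study's frequency scaling and any amplitude weighting are not given in the slides and are not reproduced here. The channel levels and function names are hypothetical.

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_j x[j] * cos(pi * k * (2j + 1) / (2N))."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for j in range(n))
            for k in range(n)]

def idct2(coeffs, n_keep=None):
    """Inverse of dct2, optionally using only the first n_keep (>= 1) coefficients."""
    n = len(coeffs)
    keep = n if n_keep is None else n_keep
    return [coeffs[0] / n + (2.0 / n) * sum(
                coeffs[k] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for k in range(1, keep))
            for j in range(n)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Hypothetical 32-channel spectrum (dB): a falling tilt plus two formant-like peaks.
spectrum_db = [-0.5 * j
               + 12.0 * math.exp(-((j - 6) / 2.0) ** 2)
               + 8.0 * math.exp(-((j - 20) / 3.0) ** 2)
               for j in range(32)]

coeffs = dct2(spectrum_db)
err_3 = rms_error(spectrum_db, idct2(coeffs, n_keep=3))
err_10 = rms_error(spectrum_db, idct2(coeffs, n_keep=10))
err_full = rms_error(spectrum_db, idct2(coeffs))
print("RMS error with 3 coefficients :", round(err_3, 3), "dB")
print("RMS error with 10 coefficients:", round(err_10, 3), "dB")
```

Low-order coefficients capture broad properties such as overall level and tilt, and higher-order coefficients add progressively finer spectral detail, so the reconstruction error cannot grow as more coefficients are kept.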

37
To where from here
  • Developing nonspeech acoustic contexts that
    permit detailed analysis of perceptual
    cancellation of reliable spectral
    characteristics.
  • Extending these findings to perception of
    nonspeech spectra, such as differences between
    timbres of musical instruments.
  • Establishing same effects in animal model
    (chinchilla) for future physiological
    investigation.
  • Evaluating whether these processes provide a
    mechanism for talker normalization in speech.
  • Developing an explicit model to explain
    accumulation and cancellation of redundant
    spectral characteristics.
