Title: Neuromorphic Signal Processing for Auditory Scene Analysis
1. Neuromorphic Signal Processing for Auditory Scene Analysis
- Jose C. Principe, Ph.D.
- Distinguished Professor and Director
- Computational NeuroEngineering Laboratory,
University of Florida - Gainesville, FL 32611
- principe@cnel.ufl.edu
- http://www.cnel.ufl.edu
2. Table of Contents
- The need to go beyond traditional signal processing and linear modeling
- Examples:
- Dynamic Vector Quantizers
- Signal-to-Symbol Translators
- Entropy-based learning as a model for information processing in distributed systems
3. DSP for Man-Made Signals
- Digital Signal Processing methods have been developed assuming linear, time-invariant systems and stationary Gaussian processes.
- Complex exponentials are eigenvectors of linear systems.
- FFTs define frequency in an interval.
- Wiener filters are the optimal linear filters for stationary random processes.
- Markov models are context insensitive.
4. Neurobiological Reality
- To become more productive, we should develop a new systematic theory of biological information processing based on the known biological reality:
- Decomposition in real exponentials (mesoscopic)
- Local time descriptors (spike trains)
- Nonlinear dynamical models
- Adaptive distributed representations
5. Why Delay a Neuromorphic Theory of Signal Processing?
- A revamped framework is needed to understand biological information processing. It should be based on the distributed nature of the computation, the nonlinear nature of the dynamic PEs, and the competition and association of the interactions at different spatio-temporal scales.
- Here we show three examples of how the addition of dynamics has impacted conventional models and is helping us find new paradigms for computation.
6. Protocol for Time-Varying Modeling
7. Protocol for Time-Varying Modeling
8. Types of Memory
- Generalized feedforward (gamma memory; see Principe et al. and the sketch below)
- Spatial feedback
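To make the gamma memory concrete, here is a minimal Python sketch of a gamma delay line; the recursion is the standard published one, while the tap count and the value of mu are illustrative choices rather than the talk's settings.

```python
import numpy as np

def gamma_memory(u, num_taps=5, mu=0.5):
    """Pass a signal through a gamma memory line.

    Each tap is a cascaded leaky integrator:
        x_k[n] = (1 - mu) * x_k[n-1] + mu * x_{k-1}[n-1]
    with x_0[n] = u[n]; mu trades memory depth for resolution.
    """
    x = np.zeros(num_taps + 1)           # x[0] holds the current input
    taps = np.zeros((len(u), num_taps))  # tap outputs over time
    for n, sample in enumerate(u):
        x_prev = x.copy()
        x[0] = sample
        for k in range(1, num_taps + 1):
            x[k] = (1 - mu) * x_prev[k] + mu * x_prev[k - 1]
        taps[n] = x[1:]
    return taps
```

For mu = 1 the line reduces to an ordinary tap delay line; smaller mu gives deeper but lower-resolution memory.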
9. Temporal SOM Research
- There are basically two approaches to a temporal self-organizing map (SOM): memory is placed either at the input (embedding) or at the output. See:
- Kangas: external memory or hierarchical maps
- Chappell and Taylor, Critchley: a time constant at each PE
- Kohonen and Kangas: the hypermap
- Goppert and Rosenstiel: combined distance to the input and distance to the last winner
10. SOMs for Dynamic Modeling
- Principe et al. applied temporal SOMs to local nonlinear dynamical modeling (a sketch follows below).
- SOMs were used to cluster the NASA Langley supersonic wind tunnel dynamics. From the SOM weights, optimal filters were derived to predict the best control strategy to keep the tunnel at the optimum operating point.
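A minimal sketch of the local-modeling idea, assuming a codebook som_weights has already been trained on the embedded signal; the embedding order and the per-cluster least-squares fit are illustrative, not the actual wind-tunnel configuration.

```python
import numpy as np

def fit_local_models(signal, som_weights, order=4):
    """Fit one linear predictor per SOM prototype (local modeling).

    Time-delay embed the signal, assign each embedded vector to its
    nearest SOM prototype, then least-squares fit a linear one-step
    predictor inside each cluster.
    """
    signal = np.asarray(signal, dtype=float)
    X = np.array([signal[n - order:n] for n in range(order, len(signal))])
    y = signal[order:]
    dists = ((X[:, None, :] - som_weights[None, :, :]) ** 2).sum(-1)
    winners = dists.argmin(axis=1)
    models = {}
    for k in np.unique(winners):
        idx = winners == k
        models[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return models
```

Prediction then consists of finding the winner for the current embedded state and applying that cluster's filter.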
11. SOMs for Dynamic Modeling
- See also Ritter and Schulten.
12. Biological Motivation: NO
- Nitric oxide (NO) exists in the brain
- NO is produced by firing neurons
- NO diffuses rapidly and has a long half-life
- NO helps control the neurons' synaptic strength (LTP/LTD)
- NO is believed to be a diffusive messenger
- Krekelberg has shown many interesting properties
13. Biological Activity Diffusion
- Turing's reaction-diffusion equation
- A biological method of combining spatial (reaction) information with temporal (diffusion) information
- R-D equations can create wave-fronts
- They need excitable, nonlinear kinetics and relaxation after excitation
- Example: the FitzHugh-Nagumo equations (below)
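For reference, a standard textbook form of the FitzHugh-Nagumo system (coefficient conventions vary across the literature) is

$$\dot{v} = v - \frac{v^3}{3} - w + I_{\text{ext}}, \qquad \dot{w} = \epsilon\,(v + a - b\,w),$$

where $v$ is the fast excitation variable and $w$ the slow recovery variable: exactly the excitable, nonlinear kinetics with relaxation that the bullets above require.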
14. Temporal Activity Diffusion (TAD)
- The goal is to create a truly distributed, spatio-temporal memory
- Similar to NO diffusion, applied at the SOM outputs
- Activity is diffused to neighboring PEs
- Lowers the threshold of PEs with temporally active neighbors
- Creates temporal and spatial neighborhoods
15. SOM-TAD
- Models diffusion with a traveling wave-front
- Activity decays over time
16. SOM-TAD Equations
- Exponential decay of activity at each PE
- Activity creates a traveling wave (build-up)
- Winner selected including the enhancement
- Normal SOM update rule
(A minimal code sketch of these four steps follows.)
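The equations themselves were on slide images that did not survive this transcript. The following Python sketch therefore only illustrates the four steps above on a 1-D map; the decay, diffusion, and enhancement forms and all parameter values are assumptions, not the paper's equations.

```python
import numpy as np

def som_tad_step(som_w, activity, x, decay=0.9, gain=0.5, lr=0.1, sigma=1.0):
    """One assumed SOM-TAD update on a 1-D map.

    Activity decays exponentially and spreads to map neighbors
    (the traveling wave); it then lowers the effective distance of
    temporally active PEs (the enhancement) before the winner is
    chosen and the normal SOM update is applied.
    """
    n = len(som_w)
    spread = 0.5 * (np.roll(activity, 1) + np.roll(activity, -1))
    activity = decay * activity + (1 - decay) * spread   # decay + diffusion
    dist = np.linalg.norm(som_w - x, axis=1) - gain * activity
    win = int(np.argmin(dist))                           # enhanced winner
    activity[win] += 1.0                                 # winner injects activity
    h = np.exp(-0.5 * ((np.arange(n) - win) / sigma) ** 2)
    som_w += lr * h[:, None] * (x - som_w)               # normal SOM rule
    return som_w, activity, win
```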
17. SOM-TAD Memory
- TAD creates a spatially distributed memory
18. SOM-TAD Application
- Adjustable wave-front speed and width
- Temporally self-organize spoken phonemes
- Words: "suit" and "small"
- Sampled at 16 kHz, 3 bandpass filters (0.6-1.0 kHz, 1.0-3.5 kHz, and 3.5-7.4 kHz)
- See also Ruwisch et al.
19. Phoneme Organization
[Figure: map organization of the phonemes /s/, /m/, /a/, /u/, /t/, /l/; probabilities without TAD.]
20. Phoneme Organization Results
[Figure: winners and enhancement.]
21. Plasticity
- Temporal information creates plasticity in the VQ
[Figures: codebook without temporal information vs. with temporal information.]
22. Tessellation Dynamics
This demonstration shows how the GAS-TAD uses the past of the signal to anticipate the future. Note that the network has FIXED coefficients: all the dynamics that are seen come from the input and from the coupling given by the temporal diffusion memory mechanism.
Run Demonstration
23. VQ Results
VQ: 27,27,27,27,27,27,27,27,27,27,27,27,27,27
VQ: 12,12,16,16,25,25,25,25,27,27,27,27,27,27
GAS-TAD removes noise from the signal by using temporal information to anticipate the next input.
24. VQ for Speech Recognition
- GAS-TAD was used to VQ speech and remove noise using temporal information
- 15 speakers saying the digits one through ten: 10 for training, 5 for testing
- Preprocessing (sketched below):
- 10 kHz sampling, 25.6 ms frames, 50% overlap
- 12 liftered cepstral coefficients
- Mean filtered 3 at a time to reduce the number of input vectors
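A minimal Python sketch of such a front end; the Hamming window, the lifter form, and the framing arithmetic are my assumptions, not the talk's exact recipe.

```python
import numpy as np

def cepstral_features(x, fs=10_000, frame_ms=25.6, overlap=0.5, n_ceps=12):
    """Framewise liftered real-cepstrum features, mean-filtered 3 at a time."""
    x = np.asarray(x, dtype=float)
    flen = int(fs * frame_ms / 1000)        # 256 samples at 10 kHz
    hop = int(flen * (1 - overlap))         # 50% overlap
    win = np.hamming(flen)
    lifter = 1 + (n_ceps / 2) * np.sin(np.pi * np.arange(1, n_ceps + 1) / n_ceps)
    feats = []
    for start in range(0, len(x) - flen + 1, hop):
        spec = np.abs(np.fft.rfft(x[start:start + flen] * win)) + 1e-10
        ceps = np.fft.irfft(np.log(spec))   # real cepstrum
        feats.append(ceps[1:n_ceps + 1] * lifter)
    f = np.array(feats)
    f = f[:len(f) - len(f) % 3]             # average 3 consecutive frames
    return f.reshape(-1, 3, n_ceps).mean(axis=1)
```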
25. Trainable VQ
26. Training
- Each VQ was trained with 10 instances of the desired digit plus random vectors from the other 9 digits
27. Recognition System
- An MLP with a gamma memory at the input was used for recognition
- A winner-take-all stage determines the digit
28. System Performance
- Compare no VQ (raw input) vs. NG VQ vs. GAS-TAD VQ
- GAS-TAD VQ reduces the errors by 40% and 25%, respectively
- An HMM provided 81% (small database)
29. Conclusions
- The TAD algorithm uses temporal plasticity induced by the diffusion of activity through time and space
- Unique spatio-temporal memory
- Dynamics that can help disambiguate the static spatial information with temporal information
- Principe J., Euliano N., Garani S., "Principles and networks for self-organization in space-time," Neural Networks, Special Issue on SOMs, Aug. 2002 (in press)
30. New Paradigms for Computation
- Interpretation of the real world requires two basic steps:
- Mapping signals into symbols
- Processing symbols
- For optimality, both have to be accomplished with as little error as possible.
31. New Paradigms for Computation
- Turing machines process symbols perfectly
- But can they map signals to symbols (STS) optimally?
- I submit that STS mappings should be implemented by processors that learn directly from the data, using non-convergent (chaotic) dynamics to fully utilize the time dimension for computation.
32. New Paradigms for Computation
- STS processors interface the infinite complexity of the external world with the finite resources of conventional symbolic information processors.
- Such STS processors exist in animal and human brains, and their principles of operation are now becoming known.
- This translation is not easy, as we can see from the size of animal cortices.
33. New Paradigms for Computation
- Our aim (with Walter Freeman) is to construct a neuromorphic processor in analog VLSI that operates in accordance with the nonlinear (chaotic) neurodynamics of the cerebral cortex.
- Besides hierarchical organization, nonlinear dynamics provides the only known mechanism that can communicate local effects over long spatial scales; unlike hierarchy, chaos does not need any extra hardware.
34. Freeman's K0 Model
- Freeman modeled the hierarchical organization of neural assemblies using K (Katchalsky) sets
- The simplest, the K0 set, is a distributed, nonlinear, two-variable dynamic model (see below)
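The K0 equations were on the slide image; in Freeman's papers the K0 set is commonly written as a second-order linear dynamics driven through a static sigmoid, roughly

$$\frac{1}{ab}\,\ddot{x}_i(t) + \frac{a+b}{ab}\,\dot{x}_i(t) + x_i(t) = \sum_j w_{ij}\,Q\big(x_j(t)\big) + I_i(t),$$

where $a$ and $b$ are the rate constants and $Q(\cdot)$ is Freeman's asymmetric sigmoid. This is the form assumed in the simulation sketch after the next slide.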
35. Freeman's PE (KII)
- The fundamental building block is a tetrad of K0 nodes interconnected with fixed weights.
- The Freeman PE functions as an oscillator. The frequency is set by a, b and the strength of the negative feedback.
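Below is a toy Euler simulation of such a tetrad: two excitatory and two inhibitory K0 nodes with fixed coupling. The weight values and the tanh stand-in for Freeman's asymmetric sigmoid are assumptions, so the gains may need tuning before the tetrad oscillates the way the slide describes.

```python
import numpy as np

# Node order: [E1, E2, I1, I2]; positive weights excite, negative inhibit.
A, B = 0.22, 0.72                        # rate constants (1/ms)
W = np.array([[0.0,  1.6, -2.0,  0.0],   # E1 <- E2 (+), I1 (-)
              [1.6,  0.0,  0.0, -2.0],   # E2 <- E1 (+), I2 (-)
              [1.5,  0.0,  0.0, -2.0],   # I1 <- E1 (+), I2 (-)
              [0.0,  1.5, -2.0,  0.0]])  # I2 <- E2 (+), I1 (-)

def simulate_kii(T=200.0, dt=0.01, pulse=1.0):
    """Euler-integrate the four coupled second-order K0 equations."""
    n = int(T / dt)
    x, v = np.zeros(4), np.zeros(4)      # node states and derivatives
    out = np.empty((n, 4))
    for k in range(n):
        ext = np.array([pulse, 0.0, 0.0, 0.0]) if k * dt < 5.0 else np.zeros(4)
        drive = W @ np.tanh(x) + ext
        # (1/ab) x'' + ((a+b)/ab) x' + x = drive
        v += dt * (A * B * (drive - x) - (A + B) * v)
        x += dt * v
        out[k] = x
    return out
```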
36. Freeman's KII Model
- An area of the cortex is modeled as a layer of Freeman PEs, where the excitatory connections are trainable. This is a set of coupled oscillators in a space-time lattice.
[Figure: lattice of KII units; the excitatory couplings are marked adaptive.]
37. Freeman's KII Model
- How does it work?
- PEs oscillate (at a characteristic frequency) when an input is applied, and the oscillation propagates.
- The spatial coupling depends on the learned weights, so information is coded in the spatial amplitude of quasi-sinusoidal waves.
38. Freeman's KII Model
39. Freeman's KIII Model
- The olfactory system is a multilayer arrangement of Freeman PEs connected with dispersive delays, each layer with its own natural (noncommensurate) frequencies.
- End result: the system state never settles, creating a chaotic attractor with wings.
40. Freeman's KIII Model
- How does it work?
- With no input, the system is in a state of high-dimensional chaos, searching a large space.
- When a known input is applied to the KII network, the dimensionality of the system rapidly collapses to one of the wings of the attractor, of low dimensionality.
- Symbols are coded into these transiently stable attractors.
41. Freeman's KIII Model
[Figure: KIII architecture. A PG layer of P nodes feeds, through the nonlinearities f1(.) and f2(.), the AON layer (a single KII unit), which feeds the PC layer (a single KII unit) through f3(.) and the EC layer through f4(.); long-range feedback runs from all M1s back to all Ps and to all G1s.]
42. Freeman's KIII Model
All these attractors can be used as different symbols.
43. Conclusion
- Coupled nonlinear oscillators can be used as signal-to-symbol translators.
- The dynamics can be implemented in mixed-signal VLSI chips to work as intelligent preprocessors for sensory inputs.
- The readout of such systems is spatio-temporal and needs to be further researched.
Principe J., Tavares V., Harris J., Freeman W., "Design and implementation of a biologically realistic olfactory cortex in analog VLSI," Proc. IEEE, vol. 89, no. 7, pp. 1030-1051, 2001.
44. Information Theoretic Learning
- The mean square error (MSE) criterion has been the workhorse of optimum filtering and neural networks.
- We have introduced a new learning principle, based on ENTROPY, that applies to both supervised and unsupervised learning.
- When we distill the method, we see that it is based on interactions among pairs of information particles, which brings the possibility of using it as a principle for adaptation in highly complex systems.
45. A Different View of Entropy
- Rényi's entropy: $$H_\alpha(X) = \frac{1}{1-\alpha}\,\log \int f_X^\alpha(x)\,dx$$
- Shannon's entropy is the special case obtained when $\alpha \to 1$
46. Quadratic Entropy
- Quadratic entropy ($\alpha = 2$): $$H_2(X) = -\log \int f_X^2(x)\,dx$$
- Information Potential: $V(X) = \int f_X^2(x)\,dx$, so $H_2(X) = -\log V(X)$
- Parzen window pdf estimation with a Gaussian kernel (symmetric)
47. IP as an Estimator of Quadratic Entropy
- Information Potential (IP) estimated from samples $\{y_i\}_{i=1}^N$: substituting the Parzen estimate into $V$ and using the fact that two Gaussian kernels convolve into a Gaussian of twice the variance gives
$$\hat{V}(y) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_i - y_j)$$
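A direct Python rendering of this estimator (the kernel size sigma is a free parameter):

```python
import numpy as np

def information_potential(y, sigma=1.0):
    """V_hat = (1/N^2) * sum_ij G_{sigma*sqrt(2)}(y_i - y_j)."""
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    var = 2.0 * sigma ** 2                # two Parzen kernels convolved
    g = np.exp(-d2 / (2 * var)) / np.sqrt(2 * np.pi * var) ** y.shape[1]
    return g.mean()

def quadratic_entropy(y, sigma=1.0):
    return -np.log(information_potential(y, sigma))
```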
48. Information Force (IF)
- The force between two information particles (IPTs) is the derivative of the IP with respect to a sample; for Gaussian kernels,
$$F_i = \frac{\partial \hat{V}}{\partial y_i} = -\frac{1}{N^2 \sigma^2}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_i - y_j)\,(y_i - y_j)$$
49. Entropy Criterion
- Think of the IPTs as the outputs of a nonlinear mapper (such as an MLP). How can we train the MLP?
- Use the IF as the injected error.
- Then apply the backpropagation algorithm.
- Minimization of entropy means maximization of the IP.
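In symbols, this is ordinary backpropagation with the information forces in the place of the injected error:

$$\frac{\partial \hat{V}}{\partial w} = \sum_{i=1}^{N} \frac{\partial \hat{V}}{\partial y_i}\,\frac{\partial y_i}{\partial w} = \sum_{i=1}^{N} F_i\,\frac{\partial y_i}{\partial w}.$$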
50. Implications of Entropy Learning
- Note that the MLP is being adapted in unsupervised mode, with a property of its output.
- The cost function is totally independent of the mapper, so it can be applied generically.
- The algorithm is O(N^2).
51. Properties of Entropy Learning with the Information Potential
- The IP with Gaussian kernels preserves the global minimum of Rényi's entropy.
- The global minimum of Rényi's entropy coincides with Shannon's.
- Around the global minimum, the Rényi entropy cost (of any order) has the same eigenvalues as Shannon's.
- The global minimum degenerates to a line (because entropy is insensitive to the mean).
52. Extension to On-Line Adaptation: the Stochastic Information Gradient (SIG)
- Write the information force for the ADALINE
- Then approximate every summation by a single term (the sketch below gives one published form)
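The slide's equations did not survive the transcript; following the published SIG derivation, for the ADALINE output $y_n = w^T x_n$ with a window of $L$ past samples, the entropy estimate and its single-pair gradient are roughly

$$\hat{H}(y_n) = -\log\Big(\frac{1}{L}\sum_{i=1}^{L} G_\sigma(y_n - y_{n-i})\Big), \qquad \frac{\partial \hat{H}}{\partial w}\Big|_{L=1} = \frac{(y_n - y_{n-1})(x_n - x_{n-1})}{\sigma^2}.$$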
53. Relation between SIG and Hebbian Learning
- For Gaussian kernels the expression simplifies (up to the kernel size) to $$\Delta w \propto (y_n - y_{n-1})\,(x_n - x_{n-1})$$
- We see that SIG gives rise to a sample-by-sample adaptation rule that is Hebbian between consecutive samples!
54. Does SIG Work?
- Generated 50 samples of a 2-D distribution where the x axis is uniform, the y axis is Gaussian, and the sample covariance matrix is the identity.
- PCA would converge to any direction, but SIG consistently found the 90-degree direction! (A toy reconstruction follows.)
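A toy reconstruction of this experiment, ascending the windowed SIG on a normalized linear projection; every setting here (window length, kernel size, step size, the normalization step) is a guess, so treat this as illustrative rather than as the slide's actual simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, sigma, eta = 50, 10, 0.5, 0.1
# x axis uniform, y axis Gaussian, both with unit variance
data = np.column_stack([rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N),
                        rng.standard_normal(N)])

w = np.array([1.0, 0.2])
w /= np.linalg.norm(w)
for epoch in range(500):
    for n in range(L, N):
        dx = data[n] - data[n - L:n]        # input differences
        dy = dx @ w                         # projected differences
        g = np.exp(-dy ** 2 / (2 * sigma ** 2))
        w += eta * (g * dy) @ dx / (sigma ** 2 * g.sum())  # entropy ascent
        w /= np.linalg.norm(w)              # keep the direction normalized
print(np.degrees(np.arctan2(w[1], w[0])))   # ideally near +/-90 deg (Gaussian axis)
```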
55. SIG for Optimal Linear Filters
- We can derive an LMS-like algorithm using the IP.
- For $\alpha = 2$, Gaussian kernels, and $i = 1$ we again get (with the constants absorbed in the step size) an update driven by the increments of the error and of the input:
$$w_{n+1} = w_n + \eta\,G_\sigma(e_n - e_{n-1})\,(e_n - e_{n-1})\,(x_n - x_{n-1})$$
56. Example I: Blind Source Separation
- Instantaneous mixing: the observations are x = A s, where s are the independent sources and A is the mixing matrix.
- This saves the estimation of the joint pdf, which has dimension equal to the number of sources.
57. Applications to Blind Source Separation
- Rényi's mutual information is insensitive to rotations, so we can apply the same idea as with Shannon. After spatial whitening, the cost function becomes simply the sum of the marginal entropies, $J = \sum_i H_2(y_i)$, minimized over rotations.
- We use the IP to estimate the entropies. The method is called MeRMaId (Minimum Rényi's Mutual Information).
58. Applications to Blind Source Separation
- Mixed 10 sources instantaneously. [Figure: figure of merit vs. number of samples.]
- MeRMaId is the most efficient in the number of samples!
59. Example II: Optimal Feature Extraction
Question: how do we project data to a subspace while preserving discriminability? Answer: by maximizing the mutual information between the desired responses and the output of the nonlinear mapper.
60. Block Diagram (2-D Feature Space)
[Figure: the input x feeds a nonlinear mapper trained by back-propagation; its output y, together with the class identity d, defines an information potential field whose forces are the back-propagated errors.]
61. SAR/Automatic Target Recognition
MSTAR Public Release database. Three-class problem: BMP2, BTR70, T72. Input images are 64x64. The output space is 2-D. A Gaussian classifier is computed in the output space.
62. SAR/Automatic Target Recognition
63. SAR/Classification
- Confusion matrix comparisons (Pcc)
[Table: confusion-matrix counts.]
64. Example III: Image/Voice Data Fusion
- Use IED to maximize the MI between the speech and the image of the speaker. Image blocks (6x6) are sent to a two-output perceptron, which is trained to maximize the IED (over 20 frames) with the speech; so this is pre-fusion.
- Very simple preprocessing was used (subsampling, and energy contours for the speech).
65. Example III: Image/Voice Data Fusion
Gray scale codes the MI: when the subject is speaking, the lips have the largest MI with the sound.
66. Conclusion
- ITL seems to be a new paradigm for adaptation, based on pairwise interactions, that can be used to self-organize complex distributed systems.
- When we apply the principle to dynamic Processing Elements, we see that it translates into Hebbian learning on the increments. So, after all, Hebbian learning may not be coding correlation but entropy.