Neuromorphic Signal Processing for Auditory Scene Analysis

1
Neuromorphic Signal Processing for Auditory Scene
Analysis
  • Jose C. Principe, Ph.D.
  • Distinguished Professor and Director
  • Computational NeuroEngineering Laboratory,
    University of Florida
  • Gainesville, FL 32611
  • principe@cnel.ufl.edu
  • http://www.cnel.ufl.edu

2
Table of Contents
  • The need to go beyond traditional signal
    processing and linear modeling.
  • Examples
  • Dynamic Vector Quantizers
  • Signal-to-Symbol Translators
  • Entropy-based learning as a model for information
    processing in distributed systems

3
DSP for Man-Made Signals
  • Digital Signal Processing methods have been
    developed assuming linear, time invariant systems
    and stationary Gaussian processes.
  • Complex exponentials are eigenfunctions of
    linear, time-invariant systems
  • FFTs define frequency over an analysis interval
  • Wiener filters are linear optimal for stationary
    random processes.
  • Markov models are context insensitive

4
Neurobiological reality
  • To become more productive we should develop a
    new systematic theory of biological information
    processing based on the known biological reality.
  • Decomposition in real exponentials (mesoscopic)
  • Local time descriptors (spike trains)
  • Nonlinear dynamical models
  • Adaptive distributed representations

5
Why delay a Neuromorphic Theory of Signal
Processing?
  • A revamped framework is needed to understand
    biological information processing. It should be
    based on the distributed nature of the
    computation, the nonlinear nature of the dynamic
    PEs, and the competition and association of the
    interactions at different space-time scales.
  • Here we show three examples of how the addition
    of dynamics has impacted conventional models and
    is helping us find new paradigms for computation.

6
Protocol for Time-varying modeling
7
Protocol for Time-varying modeling
8
Types of Memory
  • Generalized feedforward (the gamma memory; see
    Principe et al. and the recursion sketched below)
  • Spatial feedback
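
A minimal statement of the gamma memory recursion (the standard form from Principe et al.; the depth K and the parameter name mu are the usual conventions, not taken from this slide):

```latex
g_0(n) = x(n), \qquad
g_k(n) = (1-\mu)\, g_k(n-1) + \mu\, g_{k-1}(n-1), \quad k = 1,\dots,K
```

with 0 < mu < 1 trading memory depth for resolution; mu = 1 recovers the ordinary tapped delay line.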

9
Temporal SOM Research
  • Basically two approaches for a temporal
    self-organizing map (SOM): memory is either
    placed at the input (embedding) or at the output.
    See:
  • Kangas: external memory or hierarchical maps
  • Chappell and Taylor, Critchley: time constant at
    each PE
  • Kohonen and Kangas: hypermap
  • Goppert and Rosenstiel: combined distance to the
    input and distance to the last winner

10
SOMs for Dynamic Modeling
  • Principe et al. applied temporal SOMs for local
    nonlinear dynamical modeling.
  • SOMs were used to cluster the NASA Langley
    supersonic wind tunnel dynamics. From the SOM
    weights, optimal filters were derived to predict
    the best control strategy to keep the tunnel at
    the optimum operating point.

11
SOMs for Dynamic Modeling
  • See also Ritter and Schulten.

12
Biological Motivation - NO
  • Nitric Oxide (NO) exists in the brain
  • NO produced by firing neurons
  • NO diffuses rapidly with long half-life
  • NO helps control the neurons' synaptic strength
    (LTP/LTD)
  • NO is believed to be a diffusive messenger
  • Krekelberg has shown many interesting properties.

13
Biological Activity Diffusion
  • Turing's reaction-diffusion equations
  • Biological method of combining spatial (reaction)
    info with temporal info (diffusion)
  • R-D equations can create wave fronts
  • needs excitable, nonlinear kinetics and
    relaxation after excitation
  • Example: the FitzHugh-Nagumo equations (shown below)
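
The standard FitzHugh-Nagumo kinetics referenced above, reproduced here for readability (parameter names a, b, epsilon follow the usual convention):

```latex
\dot{v} = v - \tfrac{v^3}{3} - w + I, \qquad
\dot{w} = \varepsilon\,(v + a - b\,w)
```

The cubic term provides the excitable nonlinear kinetics and the slow recovery variable w the relaxation after excitation; with diffusive spatial coupling these kinetics support the travelling wave fronts mentioned above.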

14
Temporal Activity Diffusion-TAD
  • Goal is to create a truly distributed,
    spatio-temporal memory
  • Similar to NO diffusion in the SOM outputs
  • Activity diffused to neighboring PEs
  • lowers threshold of PEs with temporally active
    neighbors
  • creates temporal and spatial neighborhoods

15
SOM-TAD
  • Models diffusion with a traveling wave-front
  • Activity decays over time

16
SOM-TAD Equations
  • Exponential decay of activity at each PE
  • Activity creates traveling wave (build-up)
  • Winner selected including enhancement
  • Normal SOM update rule (the four steps are
    sketched below)
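
A minimal sketch of the four steps as one update loop. The 1-D map layout, the decay/diffusion/gain constants, and the way activity is injected at the winner are illustrative assumptions, not the exact SOM-TAD equations:

```python
# Hedged sketch of a SOM with Temporal Activity Diffusion (SOM-TAD).
import numpy as np

rng = np.random.default_rng(0)
n_units, dim = 20, 3
W = rng.normal(size=(n_units, dim))          # codebook vectors
activity = np.zeros(n_units)                 # diffused temporal activity

decay, diffusion, gain, eta, sigma = 0.7, 0.3, 0.5, 0.05, 2.0
coords = np.arange(n_units)

def step(x):
    global activity, W
    # 1) exponential decay of activity at each PE
    activity *= decay
    # 2) activity diffuses to neighbouring PEs (builds a travelling wave)
    spread = np.roll(activity, 1) + np.roll(activity, -1)
    activity += diffusion * spread
    # 3) winner selection: distance lowered by local activity (enhancement)
    dist = np.linalg.norm(W - x, axis=1) - gain * activity
    win = int(np.argmin(dist))
    # 4) the winner injects new activity and the usual SOM update is applied
    activity[win] += 1.0
    h = np.exp(-((coords - win) ** 2) / (2 * sigma ** 2))
    W += eta * h[:, None] * (x - W)
    return win

for _ in range(100):                         # toy input sequence
    step(rng.normal(size=dim))
```

The enhancement term lowers the effective distance of units whose neighbours were recently active, which is how temporal context is allowed to bias the winner selection.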

17
SOM-TAD Memory
  • TAD creates a spatially distributed memory

18
SOM-TAD Application
  • Adjustable wave-front speed and width
  • Temporally self-organize spoken phonemes
  • words "suit" and "small"
  • Sampled at 16 kHz, 3 bandpass filters (0.6-1.0
    kHz, 1.0-3.5 kHz, and 3.5-7.4 kHz)
  • See also Ruwisch et al.

19
Phoneme Organization
[Figure: SOM lattice with phoneme regions labeled s, m, a, u, t, l; panels compare phoneme probabilities with TAD and without TAD.]
20
Phoneme Organization Results
Winners and Enhancement
21
Plasticity
  • Temporal information creates plasticity in the VQ

Without temporal info
With temporal info
22
Tessellation Dynamics
This demonstration shows how the GAS-TAD uses the
past of the signal to anticipate the future.
Note that the network has FIXED coefficients.
All the dynamics that are seen come from the
input and the coupling given by the temporal
diffusion memory mechanism.
Run Demonstration
23
VQ Results
VQ: 27,27,27,27,27,27,27,27,27,27,27,27,27,27
VQ: 12,12,16,16,25,25,25,25,27,27,27,27,27,27
GAS-TAD removes noise from the signal, using temporal
information to anticipate the next input
24
VQ for Speech Recognition
  • GAS-TAD used to VQ speech and remove noise using
    temporal information
  • 15 speakers saying the digits one through ten --
    10 training, 5 testing
  • Preprocessing (sketched below):
  • 10 kHz sampling, 25.6 ms frames, 50% overlap
  • 12 liftered cepstral coefficients
  • Mean-filtered 3 frames at a time to reduce the
    number of input vectors
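
A hedged sketch of that front end. The FFT-based real cepstrum, the Hamming window, and the sinusoidal lifter are assumptions; the slide only specifies the sampling rate, frame size, overlap, number of coefficients, and the 3-frame averaging:

```python
import numpy as np

def cepstral_features(x, fs=10_000, n_ceps=12):
    frame, hop = int(0.0256 * fs), int(0.0256 * fs) // 2   # 256 samples, 50% overlap
    # sinusoidal lifter over the kept coefficients (assumed form)
    lifter = 1 + (n_ceps / 2) * np.sin(np.pi * np.arange(1, n_ceps + 1) / n_ceps)
    feats = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * np.hamming(frame)
        spec = np.abs(np.fft.rfft(seg)) + 1e-10
        ceps = np.fft.irfft(np.log(spec))                   # real cepstrum
        feats.append(lifter * ceps[1:n_ceps + 1])           # 12 liftered coefficients
    feats = np.array(feats)
    # mean-filter three frames at a time to reduce the number of input vectors
    n = len(feats) // 3
    return feats[:3 * n].reshape(n, 3, -1).mean(axis=1)

features = cepstral_features(np.random.randn(10_000))       # 1 s of toy "speech"
```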

25
Trainable VQ
26
Training
  • Each VQ trained with 10 instances of desired
    digit plus random vectors from other 9 digits

27
Recognition System
  • An MLP with a gamma memory for input was used for
    recognition
  • Winner-take-all determines digit

28
System Performance
  • Compare no VQ (raw input) vs. NG VQ vs. GAS-TAD VQ
  • GAS-TAD VQ reduces errors by 40% and 25%
  • HMM provided 81% (small database)

29
Conclusions
  • TAD algorithm uses temporal plasticity induced by
    the diffusion of activity through time and space
  • Unique spatio-temporal memory
  • Dynamics that can help disambiguate the static
    spatial information with temporal information.
  • Principe J., Euliano N., Garani S., "Principles
    and networks for self-organization in space-time,"
    Special Issue on SOMs, Neural Networks, Aug. 2002
    (in press).

30
New paradigms for computation
  • Interpretation of the real world requires two
    basic steps:
  • Mapping signals into symbols
  • Processing symbols
  • For optimality both have to be accomplished with
    as little error as possible.

31
New paradigms for computation
  • Turing machines process symbols perfectly
  • But can they map signals-to-symbols (STS)
    optimally?
  • I submit that STS mappings should be implemented
    by processors that learn directly from the data
    using non-convergent (chaotic) dynamics to fully
    utilize the time dimension for computation.

32
New paradigms for computation
  • STS processors interface the infinite complexity
    of the external world with the finite resources
    of conventional symbolic information processors.
  • Such STS processors exist in animal and human
    brains, and their principles of operation are now
    becoming known.
  • This translation is not easy, as the size of
    animal cortices suggests.

33
New paradigms for computation
  • Our aim (w/ Walter Freeman) is to construct a
    neuromorphic processor in analog VLSI that
    operates in accordance with the nonlinear
    (chaotic) neurodynamics of the cerebral cortex.
  • Besides hierarchical organization, nonlinear
    dynamics provides the only known mechanism that
    can communicate local effects over long spatial
    scales, and unlike hierarchy, chaos does not
    require any extra hardware.

34
Freeman's K0 model
  • Freeman modeled the hierarchical organization
    of neural assemblies using K (Katchalsky) sets
  • The simplest (K0) is a distributed, nonlinear,
    two-variable dynamic model (form sketched below)
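
A hedged sketch of the K0 node in the form used in the K-set literature: second-order linear dynamics with time constants a and b, driven through the asymmetric sigmoid Q. The exact constants are not given on this slide:

```latex
\frac{1}{a\,b}\left[\ddot{x}_i(t) + (a+b)\,\dot{x}_i(t) + a\,b\,x_i(t)\right]
= \sum_j w_{ij}\, Q\big(x_j(t)\big) + I_i(t),
\qquad
Q(x) = q\left(1 - e^{-(e^{x}-1)/q}\right)
```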

35
Freeman's PE (KII)
  • The fundamental building block is a tetrad of K0
    nodes interconnected with fixed weights

The Freeman PE functions as an oscillator. Frequency
is set by a, b and the strength of the negative
feedback (a minimal simulation sketch follows).
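
A minimal numerical illustration of why such a negative-feedback arrangement of K0 nodes oscillates. It reduces the tetrad to one excitatory and one inhibitory node and uses illustrative gains and sigmoid parameters, so it is a sketch of the mechanism rather than the Freeman PE itself:

```python
import numpy as np

a, b = 0.22, 0.72            # second-order time constants (1/ms), as in K-set models
kei, kie = 1.6, 1.6          # excitatory->inhibitory and inhibitory->excitatory gains
dt, steps = 0.01, 40_000

def Q(x, q=5.0):
    # asymmetric sigmoid of the K-set models (clipped on the left branch)
    return np.where(x > -4.0, q * (1.0 - np.exp(-(np.exp(x) - 1.0) / q)), -1.0)

def k0_deriv(x, v, drive):
    # a*b*x'' + (a+b)*x' + x = drive   (second-order K0 dynamics)
    return v, (drive - (a + b) * v - x) / (a * b)

xe = ve = xi = vi = 0.0
trace = []
for t in range(steps):
    drive_e = -kie * Q(xi) + (1.0 if t * dt < 5.0 else 0.0)   # brief input pulse
    drive_i = kei * Q(xe)
    dxe, dve = k0_deriv(xe, ve, drive_e)
    dxi, dvi = k0_deriv(xi, vi, drive_i)
    xe, ve = xe + dt * dxe, ve + dt * dve
    xi, vi = xi + dt * dxi, vi + dt * dvi
    trace.append(xe)
# `trace` holds the oscillatory response of the excitatory node
```

Raising the feedback gains typically turns the damped ringing into a sustained oscillation, which is the operating regime described on the next slides.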
36
Freeman's KII model
  • An area of the cortex is modeled as a layer of
    Freeman PEs, where the excitatory connections are
    trainable. This is a set of coupled oscillators
    in a space-time lattice.

[Figure: layer of Freeman PEs; the excitatory lateral connections are adaptive.]
37
Freeman's KII model
  • How does it work?
  • PEs oscillate (characteristic frequency) when an
    input is applied, and the oscillation propagates.
  • The spatial coupling depends on the learned
    weights, so information is coded in the spatial
    amplitude of quasi-sinusoidal waves.

38
Freeman's KII model
39
Freeman's KIII model
  • The olfactory system is a multilayer arrangement
    of Freeman PEs connected with dispersive delays,
    each layer with its own natural (noncommensurate)
    frequencies.
  • End result: the system state never settles,
    creating a chaotic attractor with wings.

40
Freeman's KIII model
  • How does it work?
  • With no input the system is in a state of high
    dimensional chaos, searching a large space.
  • When a known input is applied to the KII network
    the dimensionality of the system rapidly
    collapses to one of the wings of the attractor of
    low dimensionality.
  • Symbols are coded into these transiently stable
    attractors.

41
Freeman's KIII model
[Diagram: KIII architecture with a PG layer of P units, an AON layer (single KII unit), a PC layer (single KII unit), and an EC layer, linked by excitatory and inhibitory pathways f1(.)-f4(.), with feedback from all M1s and connections to all G1s.]
42
Freeman's KIII model
All these attractors can be used as different
symbols
43
Conclusion
  • Coupled nonlinear oscillators can be used as
    signal-to-symbol translators.
  • The dynamics can be implemented in mixed signal
    VLSI chips to work as intelligent preprocessors
    for sensory inputs.
  • The readout of such systems is spatio-temporal
    and needs to be further researched.

Principe J., Tavares V., Harris J., Freeman W.,
"Design and implementation of a biologically
realistic olfactory cortex in analog VLSI,"
Proc. IEEE, vol. 89, no. 7, pp. 1030-1051, 2001.
44
Information Theoretic Learning
  • The mean square error (MSE) criterion has been
    the workhorse of optimum filtering and neural
    networks.
  • We have introduced a new learning principle that
    applies to both supervised and unsupervised
    learning based on ENTROPY.
  • When we distill the method we see that it is
    based on interactions among pairs of information
    particles, which brings the possibility of using
    it as a principle for adaptation in highly
    complex systems.

45
A Different View of Entropy
  • Shannon's entropy
  • Renyi's entropy
  • Shannon's is the special case when α → 1
    (definitions below)
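
For readability, the standard definitions behind the two bullets (the original slide showed them as images):

```latex
H_S(X) = -\sum_k p_k \log p_k,
\qquad
H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_k p_k^{\alpha},
\qquad
\lim_{\alpha \to 1} H_\alpha(X) = H_S(X)
```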

46
Quadratic Entropy
  • Quadratic entropy (α = 2)
  • Information Potential
  • Parzen window pdf estimation with a Gaussian
    kernel (symmetric); the estimator is written out
    below
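
Written out, under the usual ITL conventions (the kernel size sigma is a free parameter; the slide's own normalization is not visible):

```latex
H_2(Y) = -\log \int f_Y^2(y)\,dy,
\qquad
\hat f_Y(y) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(y - y_i)
\;\Rightarrow\;
\hat H_2(Y) = -\log V(Y),
\quad
V(Y) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_i - y_j)
```

V(Y) is the information potential referred to on the next slide.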

47
IP as an Estimator of Quadratic Entropy
  • Information Potential (IP)

48
Information Force (IF)
  • Between two information particles (IPTs)
  • Overall (expressions below)
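
A hedged statement of the forces, up to constant factors (the slide's own expressions are not visible):

```latex
F_{ij} = \frac{\partial}{\partial y_i} G_{\sigma\sqrt{2}}(y_i - y_j)
       = -\frac{y_i - y_j}{2\sigma^2}\, G_{\sigma\sqrt{2}}(y_i - y_j),
\qquad
F_i = \frac{\partial V}{\partial y_i} = \frac{2}{N^2}\sum_{j} F_{ij}
```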

49
Entropy Criterion
  • Think of the IPTs as the outputs of a nonlinear
    mapper (such as an MLP). How can we train the
    MLP?
  • Use the IF as the injected error (see the
    numerical sketch below).
  • Then apply the backpropagation algorithm.

Minimization of entropy means maximization of the IP.
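
A minimal numpy sketch of the quantities involved, assuming scalar outputs and a Gaussian kernel of size sigma (both assumptions); the forces F are what would be injected in place of the usual error and back-propagated:

```python
import numpy as np

def information_potential_and_forces(y, sigma=1.0):
    """y: (N,) samples at the mapper output (or error samples in supervised MEE)."""
    d = y[:, None] - y[None, :]                      # pairwise differences
    k = np.exp(-d**2 / (4 * sigma**2))               # Gaussian kernel, bandwidth sigma*sqrt(2)
                                                     # (normalization constant dropped)
    V = k.mean()                                     # information potential, H2 = -log V
    # force on particle i: proportional to dV/dy_i, up to a constant factor
    # that can be absorbed into the learning rate
    F = -(d * k).sum(axis=1) / (2 * sigma**2 * len(y)**2)
    return V, F

y = np.random.randn(50)
V, F = information_potential_and_forces(y)
# To minimize entropy (maximize V), inject F as the "error" at each output
# sample and backpropagate it through the mapper instead of d - y.
```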
50
Implications of Entropy Learning
  • Note that the MLP is being adapted in
    unsupervised mode, with a property of its output.
  • The cost function is totally independent of the
    mapper, so it can be applied generically.
  • The algorithm is O(N^2).

51
Properties of Entropy Learning with Information
Potential
  • The IP with Gaussian kernels preserves the global
    minimum of Renyi's entropy.
  • The global minimum of Renyi's entropy coincides
    with that of Shannon's entropy.
  • Around the global minimum, the Renyi entropy cost
    (of any order) has the same eigenvalues as the
    Shannon cost.
  • The global minimum is degenerate along a line
    (because entropy is insensitive to the mean).

52
Extension to on-line adaptation: the Stochastic
Information Gradient (SIG)
  • Write the Information Force for the ADALINE
  • and approximate every summation by a single
    term (one-term form below)
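
A hedged one-term form for the ADALINE with error e_k = d_k - w^T x_k, keeping only the most recent pair of samples and dropping constant factors into the step size:

```latex
\frac{\partial V}{\partial w}
\;\approx\;
\frac{1}{2\sigma^2}\,
G_{\sigma\sqrt{2}}(e_k - e_{k-1})\,(e_k - e_{k-1})\,(x_k - x_{k-1})
```

so the weights are adapted with a single kernel evaluation per sample instead of the O(N^2) batch sum.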

53
Relation between SIG and Hebbian learning
  • For Gaussian kernels the expression simplifies
    (written out below).
  • We see that SIG gives rise to a sample-by-sample
    adaptation rule that is like Hebbian learning
    between consecutive samples!
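
Under the same assumptions, absorbing the kernel factor into the step size (the sign depends on whether entropy is minimized or maximized, and the error is replaced by the output in the unsupervised case):

```latex
\Delta w_k \;\propto\; (e_k - e_{k-1})\,(x_k - x_{k-1})
```

i.e., a product of increments of the input and increments at the output: Hebbian learning between consecutive samples.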

54
Does SIG work?
  • Generated 50 samples of a 2-D distribution where
    the x axis is uniform, the y axis is Gaussian,
    and the sample covariance matrix is the identity.
  • PCA would converge to any direction, but SIG
    consistently found the 90-degree direction!

55
SIG for Optimal Linear Filters
  • We can derive an LMS-like algorithm using the IP.
  • For α = 2, Gaussian kernels, and i = 1 we again
    obtain the sample-by-sample rule above.

56
Example I: Blind Source Separation
  • Instantaneous mixing
  • This saves the estimation of the joint pdf, which
    has dimension equal to the number of sources.

57
Applications to Blind Source Separation
  • Renyi's MI is insensitive to rotations, so we can
    apply the same idea as with Shannon. The cost
    function becomes simply the sum of the marginal
    entropies (sketched below).
  • We use the IP to estimate the entropies. The
    method is called MeRMaId.
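
A hedged sketch of that cost, assuming the mixtures are first spatially whitened so that only a rotation R remains to be found:

```latex
y = R\,z, \quad z = W x \ \text{(whitened mixtures)},
\qquad
J(R) = \sum_{i=1}^{n} H_2(y_i) \ \to\ \min_{R \in SO(n)}
```

with each marginal entropy estimated through the information potential; the joint entropy term of the mutual information drops out because it is rotation-invariant.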

58
Applications to Blind Source Separation
  • Mixed 10 sources instantaneously (figure of merit
    shown on the slide).
  • MeRMaId is the most efficient in number of samples!

59
Example II: Optimal Feature Extraction
Question: How do we project data to a subspace while
preserving discriminability? Answer: By maximizing
the mutual information between the desired responses
and the output of the nonlinear mapper.
60
Block Diagram (2-D feature space)
[Block diagram: input x is mapped to a 2-D output y; the class identity d defines the information potential field, and the resulting forces are back-propagated through the mapper.]
61
SAR/Automatic Target Recognition
MSTAR Public Release database, three-class problem:
BMP2, BTR70, T72. Input images are 64x64. The output
space is 2-D. A Gaussian classifier is computed in
the output space.
62
SAR/Automatic Target Recognition
63
SAR/Classification
  • Confusion Matrix Comparisons (Pcc)

(counts)
64
Example III: Image/voice data fusion
  • Use IED to maximize the MI between speech and
    image of the speaker. Image blocks (6x6) are sent
    to a two output perceptron, which is trained to
    maximize IED (20 frames) with the speech. So this
    is pre-fusion.
  • Used very simple preprocessing (subsampling and
    energy contours in speech).

65
Example III: Image/voice data fusion
Gray scale is coded for MI. When the subject is
speaking, the lips have the largest MI with the
sound.
66
Conclusion
  • ITL seems to be a new paradigm for adaptation,
    based on pairwise interactions, that can be used
    to self-organize complex distributed systems.
  • When we apply the principle to dynamic processing
    elements, we see that it translates into Hebbian
    learning on the increments. So, after all, Hebbian
    learning may not be coding correlation but entropy.