Title: Tutorial on Neural Network Models for Speech and Image Processing
1. Tutorial on Neural Network Models for Speech and Image Processing
- B. Yegnanarayana
- Speech Vision Laboratory
- Dept. of Computer Science and Engineering
- IIT Madras, Chennai-600036
- yegna_at_cs.iitm.ernet.in
WCCI 2002, Honolulu, Hawaii, USA, May 12, 2002
2. Need for New Models of Computing for Speech and Image Tasks
- Speech and image processing tasks
- Issues in dealing with these tasks by human beings
- Issues in dealing with the tasks by machine
- Need for new models of computing for dealing with natural signals
- Need for effective (relevant) computing
- Role of Artificial Neural Networks (ANN)
3. Organization of the Tutorial
- Part I: Feature extraction and classification problems with speech and image data
- Part II: Basics of ANN
- Part III: ANN models for feature extraction and classification
- Part IV: Applications in speech and image processing
4. PART I: Feature Extraction and Classification Problems in Speech and Image
5. Feature Extraction and Classification Problems in Speech and Image
- Distinction between natural and synthetic signals (unknown model vs known model generating the signal)
- Nature of speech and image data (non-repetitive data, but repetitive features)
- Need for feature extraction and classification
- Methods for feature extraction and models for classification
- Need for nonlinear approaches (methods and models)
6. Speech vs Audio
- Audio (audible) signals (noise, music, speech and other signals)
- Categories of audio signals
  - Audio signal vs non-signal (noise)
  - Signal from speech production mechanism vs other audio signals
  - Non-speech vs speech signals (as with natural language)
7. Speech Production Mechanism (figure)
8. Different Types of Sounds (figure)
9. Categorization of Sound Units (figure)
10. Nature of Speech Signal
- Digital speech: sequence of samples or numbers
- Waveform for the word MASK (figure)
- Characteristics of speech signal
  - Excitation source characteristics
  - Vocal tract system characteristics
11. Waveform for the word MASK (figure)
12. Source-System Model of Speech Production
- Block diagram (figure): an impulse train generator (controlled by the pitch period) and a random noise generator feed, through a voiced/unvoiced switch and gain G, the excitation u(n) to a time-varying digital filter (vocal tract parameters), producing the speech signal s(n)
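The source-system model above can be sketched in a few lines of Python. This is a toy illustration, not code from the tutorial: the function names and the single filter coefficient are illustrative, and the vocal tract is modeled as a simple all-pole (autoregressive) filter.

```python
import numpy as np

def source_filter_synth(excitation, a, gain):
    """Pass an excitation u(n) through an all-pole vocal-tract filter:
    s(n) = gain*u(n) - sum_k a[k-1]*s(n-k)."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s

def impulse_train(n_samples, pitch_period):
    """Voiced excitation: impulses spaced one pitch period apart."""
    u = np.zeros(n_samples)
    u[::pitch_period] = 1.0
    return u

# Voiced sounds use an impulse train; unvoiced sounds use random noise.
voiced = source_filter_synth(impulse_train(400, 80), a=[-0.9], gain=1.0)
unvoiced = source_filter_synth(
    np.random.default_rng(0).standard_normal(400), a=[-0.9], gain=0.5)
```

Switching between the two excitations (and varying the filter coefficients over time) is what the voiced/unvoiced switch and the time-varying filter in the block diagram represent.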
13. Features from Speech Signal (demo)
- Different components of speech (speech, source and system)
- Different speech sound units (alphabet in Indian languages)
- Different emotions
- Different speakers
14. Speech Signal Processing Methods
- To extract source-system features and suprasegmental features
- Production-based features
- DSP-based features
- Perception-based features
15. Models for Matching and Classification
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMM)
- Gaussian Mixture Models (GMM)
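Of the three matching models listed, DTW is the easiest to show compactly. The following is a minimal, generic implementation of the standard dynamic program for two 1-D sequences (a sketch, not code from the tutorial):

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic Time Warping distance between two 1-D sequences.
    Classic O(len(x)*len(y)) dynamic program: each cell accumulates the
    local cost plus the cheapest of the three allowed predecessor moves."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In speech matching the scalars would be replaced by feature vectors (e.g. cepstral frames) and `abs` by a vector distance; the recursion is unchanged.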
16. Applications of Speech Processing
- Speech recognition
- Speaker recognition/verification
- Speech enhancement
- Speech compression
- Audio indexing and retrieval
17. Limitations of Feature Extraction Methods and Classification Models
- Fixed frame analysis
- Variability in the implicit pattern
- Not pattern-based analysis
- Temporal nature of the patterns
18. Need for New Approaches
- To deal with ambiguity and variability in the data for feature extraction
- To combine evidence from multiple sources (classifiers and knowledge sources)
19. Images
- Digital image: matrix of numbers
- Types of images
  - Line sketches, binary, gray level and color
  - Still images, video, multimedia
20. Image Analysis
- Feature extraction
- Image segmentation: gray level, color, texture
- Image classification
21. Processing of Texture-like Images
- 2-D Gabor filter (figures)
- A typical Gaussian filter with σ = 30
- A typical Gabor filter with σ = 30, ω = 3.14 and θ = 45°
22. Limitations
- Feature extraction
- Matching
- Classification methods/models
23. Need for New Approaches
- Feature extraction: PCA and nonlinear PCA
- Matching: stereo images
- Smoothing: using the knowledge of the image, not of the noise
- Edge extraction and classification: integration of global and local information, or combining evidence
24. PART II: Basics of ANN
25. Artificial Neural Networks
- Problem solving: pattern recognition tasks by human and machine
- Pattern vs data
- Pattern processing vs data processing
- Architectural mismatch
- Need for new models of computing
26. Biological Neural Networks
- Structure and function: neurons, interconnections, dynamics for learning and recall
- Features: robustness, fault tolerance, flexibility, ability to deal with a variety of data situations, collective computation
- Comparison with computers: speed, processing, size and complexity, fault tolerance, control mechanism
- Parallel and Distributed Processing (PDP) models
27. Basics of ANN
- ANN terminology: processing unit (fig), interconnection, operation and update (input, weights, activation value, output function, output value)
- Models of neurons: MP neuron, perceptron and adaline
- Topology (fig)
- Basic learning laws (fig)
28. Model of a Neuron (figure)
29. Topology (figure)
30. Basic Learning Laws (figure)
31. Activation and Synaptic Dynamics Models
- General activation dynamics model: passive decay term, excitatory term, inhibitory term
- General synaptic dynamics model: passive decay term, correlation term
- Stability and convergence
32. Functional Units and Pattern Recognition Tasks
- Feedforward ANN
  - Pattern association
  - Pattern classification
  - Pattern mapping/classification
- Feedback ANN
  - Autoassociation
  - Pattern storage (LTM)
  - Pattern environment storage (LTM)
- Feedforward and Feedback (Competitive Learning) ANN
  - Pattern storage (STM)
  - Pattern clustering
  - Feature map
33. Two-Layer Feedforward Neural Network (FFNN) (figure)
34. PR Tasks by FFNN
- Pattern association
  - Architecture: two layers, linear processing units, single set of weights
  - Learning: Hebb's rule (orthogonal inputs), Delta rule (linearly independent inputs)
  - Recall: direct
  - Limitation: linear independence; number of patterns restricted to input dimensionality
  - To overcome: nonlinear processing units; leads to a pattern classification problem
- Pattern classification
  - Architecture: two layers, nonlinear processing units, geometrical interpretation
  - Learning: perceptron learning
  - Recall: direct
  - Limitation: linearly separable functions only; cannot handle hard problems
  - To overcome: more layers; leads to a hard learning problem
- Pattern mapping/classification
  - Architecture: multilayer (hidden layers), nonlinear processing units, geometrical interpretation
  - Learning: generalized delta rule (backpropagation)
  - Recall: direct
  - Limitation: slow learning; does not guarantee convergence
  - To overcome: more complex architecture
35. Perceptron Network
- Perceptron classification problem
- Perceptron learning law
- Perceptron convergence theorem
- Perceptron representation problem
- Multilayer perceptron
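The perceptron learning law above can be sketched as follows. This is a minimal generic implementation (names and the AND-gate demo are illustrative): the weight vector is corrected only on misclassified inputs, and by the perceptron convergence theorem it terminates when the classes are linearly separable.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=100):
    """Perceptron learning law: w <- w + lr*(target - output)*x,
    with the bias absorbed as an extra input fixed at 1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, ti in zip(Xb, y):
            out = 1 if xi @ w > 0 else 0
            if out != ti:
                w += lr * (ti - out) * xi
                errors += 1
        if errors == 0:                          # converged: no mistakes
            break
    return w

def perceptron_predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)
```

A linearly separable example such as the AND function converges in a few epochs; XOR (the representation problem on the slide) never does with a single layer.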
36. Geometric Interpretation of Perceptron Learning (figure)
37. Generalized Delta Rule (Backpropagation Learning)
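A compact sketch of the generalized delta rule, assuming a sigmoid network with one hidden layer trained on XOR (the hidden-layer size, learning rate and seed are illustrative choices, not values from the tutorial). The deltas are the squared-error gradient propagated backwards through the layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(hidden=4, epochs=20000, lr=1.0, seed=0):
    """Backpropagation on a 2-hidden-1 network learning XOR, a problem a
    single-layer perceptron cannot represent. Returns the mean squared
    error before and after training."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(0, 1, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    mse0 = None
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
        o = sigmoid(h @ W2 + b2)              # forward pass: output layer
        if mse0 is None:
            mse0 = float(((o - t) ** 2).mean())
        d_o = (o - t) * o * (1 - o)           # output-layer delta
        d_h = (d_o @ W2.T) * h * (1 - h)      # delta propagated backwards
        W2 -= lr * h.T @ d_o; b2 -= lr * d_o.sum(axis=0)
        W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(axis=0)
    o = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return mse0, float(((o - t) ** 2).mean())
```

The slow, non-guaranteed convergence noted on the previous slide shows up here directly: gradient descent can settle into a local minimum depending on the initial weights.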
38. Issues in Backpropagation Learning
- Description and features of error backpropagation
- Performance of backpropagation learning
- Refinements of backpropagation learning
- Interpretation of results of learning
- Generalization
- Tasks with backpropagation network
- Limitations of backpropagation learning
- Extensions to backpropagation
39. PR Tasks by FBNN
- Autoassociation
  - Architecture: single layer with feedback, linear processing units
  - Learning: Hebb (orthogonal inputs), Delta (linearly independent inputs)
  - Recall: activation dynamics until stable states are reached
  - Limitation: no accretive behavior
  - To overcome: nonlinear processing units; leads to a pattern storage problem
- Pattern storage
  - Architecture: feedback neural network, nonlinear processing units, states, Hopfield energy analysis
  - Learning: not important
  - Recall: activation dynamics until stable states are reached
  - Limitation: hard problems, limited number of patterns, false minima
  - To overcome: stochastic update, hidden units
- Pattern environment storage
  - Architecture: Boltzmann machine, nonlinear processing units, hidden units, stochastic update
  - Learning: Boltzmann learning law, simulated annealing
  - Recall: activation dynamics, simulated annealing
  - Limitation: slow learning
  - To overcome: different architecture
40. Hopfield Model
- Model: binary units s_i ∈ {-1, +1}, symmetric weights w_ij = w_ji, energy E = -(1/2) Σ_i Σ_j w_ij s_i s_j
- Pattern storage condition: w_ij = (1/N) Σ_l a_i^l a_j^l, where a^l is the l-th pattern to be stored
- Capacity of Hopfield model: number of patterns for a given probability of error (about 0.14N for N units)
- Continuous Hopfield model
41. State Transition Diagram (figure)
42. Computation of Weights for Pattern Storage
- Patterns to be stored: (111) and (010). This results in a set of inequalities to be satisfied.
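Instead of solving the inequalities by hand, the Hebbian rule from the previous slide can be applied directly to the two patterns. A minimal numpy sketch, assuming bipolar encoding of the binary patterns and asynchronous recall (the function names are illustrative):

```python
import numpy as np

def store_patterns(patterns):
    """Hebbian computation of Hopfield weights: map binary patterns to
    bipolar (+1/-1), sum the outer products, and zero the diagonal
    (no self-connections)."""
    S = 2 * np.array(patterns) - 1
    W = S.T @ S
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, sweeps=20):
    """Asynchronous activation dynamics until a stable state is reached.
    A unit keeps its previous output when its net input is exactly zero."""
    s = 2 * np.array(state) - 1
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            net = W[i] @ s
            new = s[i] if net == 0 else (1 if net > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return ((s + 1) // 2).tolist()
```

Both stored patterns are stable states of the resulting network, and a corrupted input relaxes to the nearest one.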
43. Pattern Storage Tasks
- Hard problems: conflicting requirements on a set of inequalities
- Hidden units: problem of false minima
- Stochastic update
- Stochastic equilibrium: Boltzmann-Gibbs law
44. Simulated Annealing
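The stochastic update and annealing schedule can be sketched together. This is a simplified illustration (the probability form, schedule constants and function names are assumptions, not taken from the slides): each unit fires with a Boltzmann-Gibbs probability, and the temperature is lowered geometrically so the network can escape false minima before settling.

```python
import math
import random

def stochastic_update(net, T, rng):
    """Boltzmann-Gibbs stochastic update: the unit outputs +1 with
    probability 1/(1 + exp(-2*net/T)). As T -> 0 this approaches the
    deterministic threshold rule; at high T updates are nearly random."""
    x = 2.0 * net / T
    if x > 30.0:
        p = 1.0
    elif x < -30.0:
        p = 0.0
    else:
        p = 1.0 / (1.0 + math.exp(-x))
    return 1 if rng.random() < p else -1

def anneal(W, s, T0=10.0, alpha=0.9, sweeps=60, seed=0):
    """Simulated annealing on a Hopfield-type network: stochastic sweeps
    over all units while the temperature follows a geometric schedule
    T <- alpha*T (with a small floor)."""
    rng = random.Random(seed)
    T = T0
    for _ in range(sweeps):
        for i in range(len(s)):
            net = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = stochastic_update(net, T, rng)
        T = max(alpha * T, 1e-3)
    return s
```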
45. Boltzmann Machine
- Pattern environment storage
- Architecture: visible units, hidden units, stochastic update, simulated annealing
- Boltzmann learning law
46. Discussion on Boltzmann Learning
- Expression for Boltzmann learning
  - Significance of p_ij and p'_ij
  - Learning and unlearning
  - Local property
  - Choice of the learning rate η and initial weights
- Implementation of Boltzmann learning
  - Algorithm for learning a pattern environment
  - Algorithm for recall of a pattern
  - Implementation of simulated annealing
  - Annealing schedule
- Pattern recognition tasks by Boltzmann machine
  - Pattern completion
  - Pattern association
  - Recall from noisy or partial input
- Interpretation of Boltzmann learning
  - Markov property of simulated annealing
  - Clamped free energy and full free energy
- Variations of Boltzmann learning
  - Deterministic Boltzmann machine
47. Competitive Learning Neural Network (CLNN)
- Structure (figure): an input layer feeding an output layer with on-center and off-surround connections
48. PR Tasks by CLNN
- Pattern storage (STM)
  - Architecture: two layers (input and competitive), linear processing units
  - Learning: no learning in the FF stage, fixed weights in the FB layer
  - Recall: not relevant
  - Limitation: STM only, no application, of theoretical interest
  - To overcome: nonlinear output function in the FB stage, learning in the FF stage
- Pattern clustering (grouping)
  - Architecture: two layers (input and competitive), nonlinear processing units in the competitive layer
  - Learning: only in the FF stage, competitive learning
  - Recall: direct in the FF stage; activation dynamics until a stable state is reached in the FB layer
  - Limitation: fixed (rigid) grouping of patterns
  - To overcome: train neighbourhood units in the competitive layer
- Feature map
  - Architecture: self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
  - Learning: weights leading to the neighbourhood units in the competitive layer
  - Recall: apply input, determine winner
  - Limitation: only visual features, not quantitative
  - To overcome: more complex architecture
49. Learning Algorithms for PCA Networks
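The slide's equations are not reproduced here, but Oja's rule is one standard PCA learning algorithm and serves as a sketch (the demo data and constants below are illustrative): a Hebbian update with an implicit weight-decay term drives a single linear unit's weight vector to the first principal component.

```python
import numpy as np

def oja_learn(X, lr=0.01, epochs=100, seed=0):
    """Oja's learning rule:
        w <- w + lr * y * (x - y*w),  with  y = w.x
    The -y*w term keeps the weight norm bounded, and w converges to the
    unit-norm first principal component of the input distribution."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += lr * y * (x - y * w)
    return w

# Demo: data with dominant variance along the direction (1, 1)/sqrt(2).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1)) * np.array([1.0, 1.0]) \
    + 0.1 * rng.normal(size=(300, 2))
w = oja_learn(X)
```

Generalizations such as Sanger's rule extract several components with a network of such units.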
50. Self-Organization Network
- (a) Network structure: input layer fully connected to the output layer (figure)
- (b) Neighborhood regions at different times in the output layer (figure)
51. Illustration of SOM
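A minimal Kohonen SOM sketch in the spirit of the illustrations: a 1-D chain of units organizing 2-D inputs (the chain length, decay schedules and function names are illustrative assumptions). The winner and its neighbourhood on the chain move toward each input, with the learning rate and neighbourhood width shrinking over time.

```python
import numpy as np

def train_som(X, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Kohonen self-organizing map: a 1-D chain of units mapping inputs
    of dimension X.shape[1]. Each input pulls the winning unit and its
    (Gaussian-weighted) chain neighbours toward itself."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 1, (n_units, X.shape[1]))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                  # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)  # shrinking neighbourhood
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            d = np.arange(n_units) - winner          # distance along the chain
            h = np.exp(-(d ** 2) / (2 * sigma ** 2)) # neighbourhood function
            W += lr * h[:, None] * (x - W)
    return W
```

The same update with a 2-D grid of units and 16-dimensional LPC inputs gives the phoneme-map organization mentioned later in the tutorial.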
52. PART III: ANN Models for Feature Extraction and Classification
53. Neural Network Architectures and Models for Feature Extraction
- Multilayer Feedforward Neural Network (MLFFNN)
- Autoassociative Neural Network (AANN)
- Constraint Satisfaction Model (CSM)
- Self-Organization Map (SOM)
- Time Delay Neural Network (TDNN)
- Hidden Markov Models (HMM)
54. Multilayer FFNN
- Nonlinear feature extraction followed by a linearly separable classification problem
55. Multilayer FFNN
- Complex decision hypersurfaces for classification
- Asymptotic approximation of a posteriori class probabilities
56. Radial Basis Function
- Radial Basis Function NN: clustering followed by classification
- Structure (figure): input vector a passes through basis functions φ_j(a) centered at c_1, ..., c_N, whose outputs are combined to produce the class labels
57. Autoassociative Neural Network (AANN)
- Architecture
- Nonlinear PCA
- Feature extraction
- Distribution capturing ability
58. Autoassociative Neural Network (AANN)
- Structure (figure): input layer, dimension-compression hidden layer, output layer
59. Distribution Capturing Ability of AANN
- Distribution of feature vectors (fig)
- Illustration of the distribution in the 2-D case (fig)
- Comparison with Gaussian Mixture Model (fig)
60. Distribution of Feature Vectors (figure)
61. (a) Illustration of the distribution in the 2-D case; (b, c) Comparison with Gaussian Mixture Model (figures)
62. Feature Extraction by AANN
- Input and output to AANN: sequence of signal samples (captures dominant second-order statistical features)
- Input and output to AANN: sequence of residual samples (captures higher-order statistical features in the sample sequence)
63. Constraint Satisfaction Model
- Purpose: to satisfy the given (weak) constraints as much as possible
- Structure: feedback network with units (hypotheses) and connections (constraints/knowledge)
- Goodness-of-fit function: depends on the outputs of the units and the connection weights
- Relaxation strategies: deterministic and stochastic
64. Applications of CS Models
- Combining evidence
- Combining classifier outputs
- Solving optimization problems
65. Self-Organization Map (illustrations)
- Organization of 2-D input to a 1-D feature mapping
- Organization of 16-dimensional LPC vectors to obtain a phoneme map
- Organization of large document files
66. Time Delay Neural Networks for Temporal Pattern Recognition (figure)
67. Stochastic Models for Temporal Pattern Recognition
- Maximum likelihood formulation: determine the class w, given the observation symbol sequence y, using the criterion w* = arg max_w P(y | w)
- Markov models
- Hidden Markov Models
68. PART IV: Applications in Speech and Image Processing
69. Applications in Speech and Image Processing
- Edge extraction in texture-like images
- Texture segmentation/classification by CS model
- Road detection from satellite images
- Speech recognition by CS model
- Speaker recognition by AANN model
70. Problem of Edge Extraction in Texture-like Images
- Nature of texture-like images
- Problem of edge extraction
- Preprocessing (1-D) to derive partial evidence
- Combining evidence using the CS model
71. Problem of Edge Extraction
- Texture edges are the locations where there is an abrupt change in texture properties
- Figures: image with 4 natural texture regions; edge map showing micro edges; edge map showing macro edges
72. 1-D Processing using Gabor Filter and Difference Operator
- 1-D Gabor smoothing filter: magnitude and phase
- 1-D Gabor filter: a Gaussian modulated by a complex sinusoid, with even (cosine) and odd (sine) components
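The 1-D Gabor filter described above can be written directly from its definition. A minimal numpy sketch (the parameter values and truncation at 3σ are illustrative choices): the real part is the even cosine component and the imaginary part the odd sine component, and filtering a line of samples gives the magnitude and phase used for edge evidence.

```python
import numpy as np

def gabor_1d(sigma, omega, half_width=None):
    """1-D complex Gabor filter: a Gaussian window modulated by the
    complex sinusoid exp(j*omega*n). Real part = even (cosine)
    component, imaginary part = odd (sine) component."""
    if half_width is None:
        half_width = int(3 * sigma)          # truncate the Gaussian tail
    n = np.arange(-half_width, half_width + 1)
    gauss = np.exp(-n ** 2 / (2 * sigma ** 2))
    return gauss * np.exp(1j * omega * n)

def gabor_filter_line(signal, sigma, omega):
    """Apply the filter along one line of samples; return the magnitude
    and phase of the complex output."""
    h = gabor_1d(sigma, omega)
    out = np.convolve(signal, h, mode='same')
    return np.abs(out), np.angle(out)
```

Applying a bank of such filters (different sigma and omega) along every row, then a difference operator along the columns, yields the partial edge evidence described on the following slides.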
73. 1-D Processing using Gabor Filter and Difference Operator (contd.)
- Differential operator for edge evidence: first derivative of the 1-D Gaussian function
- Need for a set (bank) of Gabor filters
74. Texture Edge Extraction using 1-D Gabor Magnitude and Phase
- Apply a 1-D Gabor filter along each of the parallel lines of the image in one direction (say, horizontal)
- Apply all Gabor filters of the filter bank in a similar way
- For each Gabor-filtered output, extract partial edge information by applying the 1-D differential operator in the orthogonal direction (say, vertical)
- Repeat the entire process with the two directions interchanged to obtain the partial edge evidence in the other direction
- Combine the partial edge evidence using a constraint satisfaction neural network model
75. Texture Edge Extraction using a Set of 1-D Gabor Filters
- Processing pipeline (figure): input image → bank of 1-D Gabor filters → filtered images → post-processing using the 1-D differential operator and thresholding → edge evidence → combining the edge evidence using the constraint satisfaction neural network model → edge map
76. Combining Evidence using the CSNN Model
- Structure of the 3-D CSNN model (figure): a 3-D lattice of size I×J×K, with +ve connections among the nodes across the layers for each pixel, and -ve connections from a set of neighboring nodes to each node in the same layer
77. Combining the Edge Evidence using the Constraint Satisfaction Neural Network (CSNN) Model
- The neural network model contains nodes arranged in a 3-D lattice structure
- Each node corresponds to a pixel in the post-processed Gabor filter output
- The post-processed output of a single 1-D Gabor filter is the input to one 2-D layer of nodes
- Different layers of nodes, each corresponding to a particular filter output, are stacked one upon the other to form the 3-D structure
- Each node represents a hypothesis
- A connection between two nodes represents a constraint
- Each node is connected to other nodes with inhibitory and excitatory connections
78. Combining Evidence using the CSNN Model (contd.)
- Let W(i,j,k; i1,j1,k) represent the weight of the connection from node (i,j,k) to node (i1,j1,k) within layer k, and let W(i,j,k; i,j,k1) represent the constraint between the nodes in two different layers (k and k1) in the same column (equations)
- Each node is connected to the other nodes in the same column with excitatory connections
79. Combining Evidence using the CSNN Model (contd.)
- Let O_{i,j,k} denote the output of node (i,j,k), and the set {O_{i,j,k}} the state of the network
- The state of the neural network model is initialized using the edge evidence
- In the deterministic relaxation method, the state of the network is updated iteratively by changing the output of one node at a time
- The net input of each node is obtained using
  U_{i,j,k}(n) = Σ W_{i,j,k; i1,j1,k} O_{i1,j1,k} + Σ W_{i,j,k; i,j,k1} O_{i,j,k1} + I_{i,j,k}
  where U_{i,j,k}(n) is the net input to node (i,j,k) at the nth iteration, and I_{i,j,k} is the external input given to node (i,j,k)
- The state of the network is updated by thresholding the net input, where θ is the threshold
80. Comparison of Edge Extraction using Gabor Magnitude and Gabor Phase
- Results (figures) for each texture image: edge maps from 1-D Gabor magnitude, 1-D Gabor phase, and a 2-D Gabor filter
81. Texture Segmentation and Classification
- Image analysis (revisited)
- Problem of texture segmentation and classification
- Preprocessing using 2-D Gabor filters to derive feature vectors
- Combining the partial evidence using the CS model
82. CS Model for Texture Classification
- Supervised and unsupervised problems
- Modeling of image constraints
- Formulation of the a posteriori probability CS model
- Hopfield neural network model and its energy function
- Deterministic and stochastic relaxation strategies
83. CS Model for Texture Classification: Modeling of Image Constraints
- Feature formation process: defined by the conditional probability of the feature vector g_s of each pixel s, given the model parameters of each class k
- Partition process: defines the probability of the label of a pixel given the labels of the pixels in its p-th order neighborhood
- Label competition process: describes the conditional probability of assigning a new label to an already labeled pixel
84. CS Model for Texture Classification: Modeling of Image Constraints (contd.)
- Formulation of the a posteriori probability (equations)
- Total energy of the system (equation)
85. CS Model for Texture Classification
- Structure (figure): an I×J×K lattice of nodes (i,j,k), one layer per class label k = 1, ..., K; +ve connections among the nodes across the layers for each pixel, and -ve connections from a set of neighboring nodes to each node in the same layer; an accompanying plot shows the energy E decreasing with the network state
86. Hopfield Neural Network and its Energy Function
- Structure (figure): nodes of the I×J×K lattice with outputs o_1, ..., o_j, ..., o_N and biases B_1, ..., B_j, ..., B_N
87. Results of Texture Classification: Natural Textures
- Figures: natural textures, initial classification, final classification
88. Results of Texture Classification: Remotely Sensed Data
- Figures: Band-2 IRS image containing 4 texture classes, initial classification, final classification
89. Results of Texture Classification: Multispectral Data
- Figures: SIR-C/X-SAR image of the Lost City of Ubar; classification using multispectral and textural information; classification using multispectral information alone
90. Speech Recognition using CS Model
- Problem of recognition of SCV units (table)
- Issues in classification of SCVs (table)
- Representation of an isolated utterance of an SCV unit
  - 60 ms before and 140 ms after the vowel onset point
  - 240-dimensional feature vector consisting of weighted cepstral coefficients
- Block diagram of the recognition system for SCV units (fig)
- CS network for classification of SCV units (fig)
91. Problem of Recognition of SCV Units (table)
92. Issues in Classification of SCVs
- Importance of SCVs
  - High frequency of occurrence: about 45%
- Main issues in classification of SCVs
  - Large number of SCV classes
  - Similarity among several SCV classes
- Models for classification of SCVs
  - Should have good discriminatory capability (artificial neural networks)
  - Should be able to handle a large number of classes (neural networks based on a modular approach)
93. Block Diagram of the Recognition System for SCV Units (figure)
94. CS Network for Classification of SCV Units
- Structure (figure): Vowel, POA and MOA feedback subnetworks; the external evidence (bias) for each node is computed using the output of the corresponding MLFFNN (e.g., MLFFNN1, MLFFNN5, MLFFNN9)
95. Classification Performance of CSM and other SCV Recognition Systems on Test Data of 80 SCV Classes (table)
96. Speaker Verification using AANN Models and Vocal Tract System Features
- One AANN for each speaker
- Verification by identification
- AANN structure: 19L 38N 4N 38N 19L
- Features: 19 weighted LPCC from 16th-order LPC for each frame of 27.5 ms, with a frame shift of 13.75 ms
- Training: pattern mode, 100 epochs, 1 min of data
- Testing: the model giving the highest confidence for 10 sec of test data
97. Speaker Recognition using Source Features
- One model for each speaker
- Structure of AANN: 40L 48N 12N 48N 40L
- Training: about 10 sec of data, 60 epochs
- Testing: select the model giving the highest confidence for 2 sec of test data
98. Other Applications
- Speech enhancement
- Speech compression
- Image compression
- Character recognition
- Stereo image matching
99. Summary and Conclusions
- Speech and image processing: natural tasks
- Significance of pattern processing
- Limitations of conventional computer architecture
- Need for new models and architectures for pattern processing tasks
- Basics of ANN
- Architectures of ANN for feature extraction and classification
- Potential of ANN for speech and image processing
100. References
1. B. Yegnanarayana, Artificial Neural Networks, Prentice-Hall of India, New Delhi, 1999.
2. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, New Jersey, 1993.
3. Alan C. Bovik, Handbook of Image and Video Processing, Academic Press, 2001.
4. Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Spoken Language Processing, Prentice-Hall, New Jersey, 2001.
5. P. P. Raghu, Artificial Neural Network Models for Texture Analysis, PhD Thesis, CSE Dept., IIT Madras, 1995.
6. C. Chandra Sekhar, Neural Network Models for Recognition of Stop Consonant Vowel (SCV) Segments in Continuous Speech, PhD Thesis, CSE Dept., IIT Madras, 1996.
7. P. Kiran Kumar, Texture Edge Extraction using One-Dimensional Processing, MS Thesis, CSE Dept., IIT Madras, 2001.
8. S. P. Kishore, Speaker Verification using Autoassociative Neural Network Models, MS Thesis, CSE Dept., IIT Madras, 2000.
9. B. Yegnanarayana, K. Sharath Reddy and S. P. Kishore, "Source and System Features for Speaker Recognition using AANN Models", ICASSP, May 2001.
10. S. P. Kishore, Suryakanth V. Gangashetty and B. Yegnanarayana, "Online Text-Independent Speaker Verification System using Autoassociative Neural Network Models", INNS-IEEE Int. Conf. Neural Networks, July 2001.
11. K. Sharat Reddy, Source and System Features for Speaker Recognition, MS Thesis, CSE Dept., IIT Madras, September 2001.
12. B. Yegnanarayana and S. P. Kishore, "Autoassociative Neural Networks: An Alternative to GMM for Pattern Recognition", to appear in Neural Networks, 2002.