Artificial Intelligence Chapter 20.5: Neural Networks

November 11, 2004

1
Artificial Intelligence Chapter 20.5: Neural Networks

Michael Scherger
Department of Computer Science
Kent State University

2
Contents

Introduction
Simple Neural Networks for Pattern Classification
Pattern Association
Neural Networks Based on Competition
Backpropagation Neural Network

3
Introduction

Much of the material in these notes comes from Fundamentals of Neural Networks: Architectures, Algorithms, and Applications by Laurene Fausett, Prentice Hall, Englewood Cliffs, NJ, 1994.

4
Introduction

Aims
Introduce some of the fundamental techniques and
principles of neural network systems
Investigate some common models and their
applications

5
What are Neural Networks?

Neural Networks (NNs) are networks of neurons,
for example, as found in real (i.e. biological)
brains.
Artificial Neurons are crude approximations of
the neurons found in brains. They may be physical
devices, or purely mathematical constructs.
Artificial Neural Networks (ANNs) are networks of
Artificial Neurons, and hence constitute crude
approximations to parts of real brains. They may
be physical devices, or simulated on conventional
computers.
From a practical point of view, an ANN is just a
parallel computational system consisting of many
simple processing elements connected together in
a specific way in order to perform a particular
task.
One should never lose sight of how crude the
approximations are, and how over-simplified our
ANNs are compared to real brains.

6
Why Study Artificial Neural Networks?

They are extremely powerful computational devices (Turing equivalent, universal computers).
Massive parallelism makes them very efficient.
They can learn and generalize from training data, so there is no need for enormous feats of programming.
They are particularly fault tolerant; this is equivalent to the graceful degradation found in biological systems.
They are very noise tolerant, so they can cope with situations where normal symbolic systems would have difficulty.
In principle, they can do anything a symbolic/logic system can do, and more. (In practice, getting them to do it can be rather difficult.)

7
What are Artificial Neural Networks Used for?

As with the field of AI in general, there are two basic goals for neural network research:
Brain modeling: the scientific goal of building models of how real brains work.
This can potentially help us understand the nature of human intelligence, formulate better teaching strategies, or devise better remedial actions for brain-damaged patients.
Artificial system building: the engineering goal of building efficient systems for real world applications.
This may make machines more powerful, relieve humans of tedious tasks, and may even improve upon human performance.

8
What are Artificial Neural Networks Used for?

Brain modeling
Models of human development help children with
developmental problems
Simulations of adult performance aid our
understanding of how the brain works
Neuropsychological models suggest remedial
actions for brain damaged patients
Real world applications
Financial modeling: predicting stocks, shares, currency exchange rates
Other time series prediction: climate, weather, airline marketing tactician
Computer games: intelligent agents, backgammon, first person shooters
Control systems: autonomous adaptable robots, microwave controllers
Pattern recognition: speech recognition, hand-writing recognition, sonar signals
Data analysis: data compression, data mining
Noise reduction: function approximation, ECG noise reduction
Bioinformatics: protein secondary structure, DNA sequencing

9
Learning in Neural Networks

There are many forms of neural networks. Most
operate by passing neural activations through a
network of connected neurons.
One of the most powerful features of neural
networks is their ability to learn and generalize
from a set of training data. They adapt the
strengths/weights of the connections between
neurons so that the final output activations are
correct.

10
Learning in Neural Networks

There are three broad types of learning:
Supervised Learning (i.e. learning with a
teacher)
Reinforcement learning (i.e. learning with
limited feedback)
Unsupervised learning (i.e. learning with no help)

11
A Brief History

1943: McCulloch and Pitts proposed the McCulloch-Pitts neuron model.
1949: Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.
1958: Rosenblatt introduced the simple single layer networks now called Perceptrons.
1969: Minsky and Papert's book Perceptrons demonstrated the limitations of single layer perceptrons, and almost the whole field went into hibernation.
1982: Hopfield published a series of papers on Hopfield networks.
1982: Kohonen developed the Self-Organizing Maps that now bear his name.
1986: The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole field took off again.
1990s: The sub-field of Radial Basis Function Networks was developed.
2000s: The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent.

12
Overview

Artificial Neural Networks are powerful computational systems consisting of many simple processing elements connected together to perform tasks in a way loosely analogous to biological brains.
They are massively parallel, which makes them efficient, robust, fault tolerant and noise tolerant.
They can learn from training data and generalize to new situations.
They are useful for brain modeling and for real world applications involving pattern recognition, function approximation, prediction, and similar tasks.

13
The Nervous System

The human nervous system can be broken down into three stages (receptors, the neural network of the brain, and effectors) that may be represented in block diagram form.
The receptors collect information from the environment, e.g. photons on the retina.
The effectors generate interactions with the environment, e.g. activate muscles.
The flow of information/activation is represented by arrows: feedforward and feedback.

14
Levels of Brain Organization

The brain contains both large scale and small
scale anatomical structures and different
functions take place at higher and lower levels.
There is a hierarchy of interwoven levels of organization:
1. Molecules and Ions
2. Synapses
3. Neuronal microcircuits
4. Dendritic trees
5. Neurons
6. Local circuits
7. Inter-regional circuits
8. Central nervous system
The ANNs we study in this module are crude
approximations to levels 5 and 6.

15
Brains vs. Computers

There are approximately 10 billion neurons in the human cortex, compared with tens of thousands of processors in the most powerful parallel computers.
Each biological neuron is connected to several thousand other neurons, similar to the connectivity in powerful parallel computers.
Lack of processing units can be compensated by speed. The typical operating speeds of biological neurons are measured in milliseconds ($10^{-3}$ s), while a silicon chip can operate in nanoseconds ($10^{-9}$ s).
The human brain is extremely energy efficient, using approximately $10^{-16}$ joules per operation per second, whereas the best computers today use around $10^{-6}$ joules per operation per second.
Brains have been evolving for tens of millions of
years, computers have been evolving for tens of
decades.
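Taking the slide's round figures at face value, the two comparisons above reduce to simple ratios:

```latex
\frac{10^{-3}\ \text{s}}{10^{-9}\ \text{s}} = 10^{6}
\qquad\text{and}\qquad
\frac{10^{-6}}{10^{-16}} = 10^{10}
```

That is, by these figures a silicon chip switches roughly a million times faster than a biological neuron, while the brain uses roughly ten billion times less energy per operation.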

16
Structure of a Human Brain
17
Slice Through a Real Brain
18
Biological Neural Networks

1. The majority of neurons encode their outputs or activations as a series of brief electrical pulses (i.e. spikes or action potentials).
2. Dendrites are the receptive zones that receive activation from other neurons.
3. The cell body (soma) of the neuron processes the incoming activations and converts them into output activations.
4. Axons are transmission lines that send activation to other neurons.
5. Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large neural networks.

19
The McCulloch-Pitts Neuron

This vastly simplified model of real neurons is
also known as a Threshold Logic Unit
A set of synapses (i.e. connections) brings in
activations from other neurons.
A processing unit sums the inputs, and then
applies a non-linear activation function (i.e.
squashing/transfer/threshold function).
An output line transmits the result to other
neurons.

20
Networks of McCulloch-Pitts Neurons

Artificial neurons have the same basic components
as biological neurons. The simplest ANNs consist
of a set of McCulloch-Pitts neurons labeled by
indices k, i, j and activation flows between them via synapses with strengths $w_{ki}$, $w_{ij}$

21
Some Useful Notation

We often need to talk about ordered sets of related numbers; we call them vectors, e.g.
$x = (x_1, x_2, x_3, \ldots, x_n)$, $y = (y_1, y_2, y_3, \ldots, y_m)$
The components $x_i$ can be added up to give a scalar (number), e.g.
$s = x_1 + x_2 + x_3 + \ldots + x_n = \sum_{i=1}^{n} x_i$
Two vectors of the same length may be added to give another vector, e.g.
$z = x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$
Two vectors of the same length may be multiplied to give a scalar, e.g.
$p = x \cdot y = x_1 y_1 + x_2 y_2 + \ldots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
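The three vector operations translate directly into code. A minimal Python sketch using plain lists; the function names are illustrative, not taken from the notes:

```python
from typing import List

Vector = List[float]


def vector_sum(x: Vector) -> float:
    """s = x1 + x2 + ... + xn"""
    return sum(x)


def vector_add(x: Vector, y: Vector) -> Vector:
    """z = x + y, computed component by component."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [xi + yi for xi, yi in zip(x, y)]


def dot(x: Vector, y: Vector) -> float:
    """p = x . y = x1*y1 + x2*y2 + ... + xn*yn"""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(vector_sum(x))     # 6.0
print(vector_add(x, y))  # [5.0, 7.0, 9.0]
print(dot(x, y))         # 32.0
```

The dot product is the operation a neuron performs on its inputs and weights, which is why it recurs throughout the rest of these notes.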

22
Some Useful Functions

Common activation functions
Identity function
$f(x) = x$ for all $x$
Binary step function with threshold $\theta$ (aka Heaviside function or threshold function)
$f(x) = 1$ if $x \ge \theta$, and $f(x) = 0$ if $x < \theta$

23
Some Useful Functions

Binary sigmoid
Bipolar sigmoid
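The formulas for the two sigmoids are not reproduced on this slide. The sketch below assumes the usual forms with a steepness parameter σ: the binary sigmoid $f(x) = 1/(1 + e^{-\sigma x})$, with range (0, 1), and the bipolar sigmoid $g(x) = 2f(x) - 1$, with range (-1, 1). The function names and the default σ = 1 are illustrative choices.

```python
import math


def identity(x: float) -> float:
    return x


def binary_step(x: float, theta: float = 0.0) -> int:
    """Heaviside/threshold function: 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0


def binary_sigmoid(x: float, sigma: float = 1.0) -> float:
    """Logistic function with range (0, 1); sigma controls the steepness."""
    z = sigma * x
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bipolar_sigmoid(x: float, sigma: float = 1.0) -> float:
    """Rescaled logistic function with range (-1, 1)."""
    return 2.0 * binary_sigmoid(x, sigma) - 1.0


for v in (-2.0, 0.0, 2.0):
    print(v, binary_step(v), round(binary_sigmoid(v), 3), round(bipolar_sigmoid(v), 3))
```

The bipolar sigmoid is the same curve as $\tanh(\sigma x / 2)$; both sigmoids are smooth approximations to a step, which matters later for gradient-based learning.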

24
The McCulloch-Pitts Neuron Equation

Using the above notation, we can now write down a simple equation for the output out of a McCulloch-Pitts neuron as a function of its n inputs $in_i$
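The equation itself did not survive in this transcript. A reconstruction in the notation of these notes is sketched below, assuming weights $w_i$ on the input connections (as on slide 20) and a threshold $\theta$; sgn denotes the binary step function of slide 22 with threshold 0. The original McCulloch-Pitts model did not have adjustable real-valued weights, so treat the $w_i$ as an assumption.

```latex
out = \operatorname{sgn}\!\left(\sum_{i=1}^{n} w_i\,in_i - \theta\right),
\qquad
\operatorname{sgn}(x) =
\begin{cases}
1 & \text{if } x \ge 0,\\
0 & \text{if } x < 0.
\end{cases}
```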

25
Review

Biological neurons, consisting of a cell body,
axons, dendrites and synapses, are able to
process and transmit neural activation
The McCulloch-Pitts neuron model (Threshold Logic
Unit) is a crude approximation to real neurons
that performs a simple summation and thresholding
function on activation levels
Appropriate mathematical notation facilitates the
specification and programming of artificial
neurons and networks of artificial neurons.

26
Networks of McCulloch-Pitts Neurons

One neuron can't do much on its own. Usually we will have many neurons labeled by indices k, i, j and activation flows between them via synapses with strengths $w_{ki}$, $w_{ij}$

27
The Perceptron

We can connect any number of McCulloch-Pitts
neurons together in any way we like.
An arrangement of one input layer of
McCulloch-Pitts neurons feeding forward to one
output layer of McCulloch-Pitts neurons is known
as a Perceptron.

28
Logic Gates with MP Neurons

We can use McCulloch-Pitts neurons to implement
the basic logic gates.
All we need to do is find the appropriate
connection weights and neuron thresholds to
produce the right outputs for each set of inputs.
We shall see explicitly how one can construct
simple networks that perform NOT, AND, and OR.
It is then a well known result from logic that we
can construct any logical function from these
three operations.
The resulting networks, however, will usually
have a much more complex architecture than a
simple Perceptron.
We generally want to avoid decomposing complex
problems into simple logic gates, by finding the
weights and thresholds that work directly in a
Perceptron architecture.

29
Implementation of Logical NOT, AND, and OR

Logical OR
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 1

[Figure: inputs x1 and x2 feed output unit y with weights 2 and 2; threshold θ = 2]
30
Implementation of Logical NOT, AND, and OR

Logical AND
x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1

[Figure: inputs x1 and x2 feed output unit y with weights 1 and 1; threshold θ = 2]
31
Implementation of Logical NOT, AND, and OR

Logical NOT
x1 y
0 1
1 0

[Figure: input x1 (weight -1) and a constant bias input 1 (weight 2) feed output unit y; threshold θ = 2]
32
Implementation of Logical NOT, AND, and OR

Logical AND NOT
x1 x2 y
0 0 0
0 1 0
1 0 1
1 1 0

[Figure: inputs x1 and x2 feed output unit y with weights 2 and -1; threshold θ = 2]
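The four gate networks above can be checked mechanically. A minimal Python sketch, assuming the unit fires when its weighted input sum reaches the threshold (sum ≥ θ) and reading the weights off slides 29-32; the function names are illustrative only.

```python
from typing import Sequence

THETA = 2  # every gate on slides 29-32 uses threshold 2


def mp_neuron(inputs: Sequence[int], weights: Sequence[float], theta: float = THETA) -> int:
    """Fire (1) iff the weighted input sum reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def or_gate(x1: int, x2: int) -> int:
    return mp_neuron((x1, x2), (2, 2))


def and_gate(x1: int, x2: int) -> int:
    return mp_neuron((x1, x2), (1, 1))


def and_not_gate(x1: int, x2: int) -> int:
    return mp_neuron((x1, x2), (2, -1))  # x1 AND NOT x2


def not_gate(x1: int) -> int:
    # The constant bias input of 1 (weight 2) lets the unit fire when x1 = 0.
    return mp_neuron((x1, 1), (-1, 2))


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_gate(x1, x2), and_gate(x1, x2), and_not_gate(x1, x2))
print(not_gate(0), not_gate(1))  # 1 0
```

Each printed row reproduces the corresponding truth table on slides 29, 30 and 32.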
33
Logical XOR

Logical XOR
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0

[Figure: inputs x1 and x2 feed output unit y; weights and threshold unknown (?)]
34
Logical XOR

How long do we keep looking for a solution? We
need to be able to calculate appropriate
parameters rather than looking for solutions by
trial and error.
Each training pattern produces a linear
inequality for the output in terms of the inputs
and the network parameters. These can be used to
compute the weights and thresholds.

35
Finding the Weights Analytically

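We have two weights $w_1$ and $w_2$ and the threshold $\theta$, and for each training pattern we need to satisfy
$w_1 in_1 + w_2 in_2 - \theta \ge 0$ if the target output is 1, and $w_1 in_1 + w_2 in_2 - \theta < 0$ if the target output is 0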

36
Finding the Weights Analytically

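For the XOR network, the four training patterns give
$(0, 0) \to 0$: $0 < \theta$
$(0, 1) \to 1$: $w_2 \ge \theta$
$(1, 0) \to 1$: $w_1 \ge \theta$
$(1, 1) \to 0$: $w_1 + w_2 < \theta$
Clearly the second and third inequalities are incompatible with the fourth (the first makes $\theta$ positive, so the second and third give $w_1 + w_2 \ge 2\theta > \theta$), so there is in fact no solution. We need more complex networks, e.g. that combine together many simple networks, or use different activation/thresholding/transfer functions.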
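The inequalities already prove that no single threshold unit computes XOR. As a sanity check rather than a proof, the hypothetical helpers below search an arbitrary finite grid of weights and thresholds: the search finds solutions for AND but none for XOR. The grid range and step size are illustrative assumptions.

```python
from itertools import product

XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def solves(table, w1, w2, theta):
    """True if the unit sgn(w1*x1 + w2*x2 - theta) reproduces every row of the table."""
    return all(
        (1 if w1 * x1 + w2 * x2 - theta >= 0 else 0) == y
        for (x1, x2), y in table.items()
    )


def search(table, values):
    """Every (w1, w2, theta) drawn from `values` that solves the table."""
    return [p for p in product(values, repeat=3) if solves(table, *p)]


grid = [v / 2 for v in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
print(len(search(AND_TABLE, grid)) > 0)  # True
print(search(XOR_TABLE, grid))           # []
```

A finite search can never rule out solutions outside its grid; the inequality argument is what makes the XOR result general.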

37
ANN Topologies

Mathematically, ANNs can be represented as
weighted directed graphs. For our purposes, we
can simply think in terms of activation flowing
between processing units via one-way connections
Single-Layer Feed-forward NNs: One input layer and one output layer of processing units. No feed-back connections. (For example, a simple Perceptron.)
Multi-Layer Feed-forward NNs: One input layer, one output layer, and one or more hidden layers of processing units. No feed-back connections. The hidden layers sit in between the input and output layers, and are thus hidden from the outside world. (For example, a Multi-Layer Perceptron.)
Recurrent NNs: Any network with at least one feed-back connection. It may, or may not, have hidden units. (For example, a Simple Recurrent Network.)

38
ANN Topologies
39
Detecting Hot and Cold

It is a well-known and interesting psychological phenomenon that if a cold stimulus is applied to a person's skin for a short period of time, the person will perceive heat.
However, if the same stimulus is applied for a
longer period of time, the person will perceive
cold. The use of discrete time steps enables the
network of MP neurons to model this phenomenon.

40
Detecting Hot and Cold

The desired response of the system is that cold is perceived if a cold stimulus is applied for two time steps:
$y_2(t) = x_2(t-2) \text{ AND } x_2(t-1)$
It is also required that heat be perceived if either a hot stimulus is applied or a cold stimulus is applied briefly (for one time step) and then removed:
$y_1(t) = x_1(t-1) \text{ OR } [\,x_2(t-3) \text{ AND NOT } x_2(t-2)\,]$

41
Detecting Heat and Cold
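[Figure: network with inputs x1 (hot) and x2 (cold), hidden units z1 and z2, and outputs y1 (Heat) and y2 (Cold). Weights: x1 → y1 = 2, z1 → y1 = 2, z2 → z1 = 2, x2 → z1 = -1, x2 → z2 = 2, x2 → y2 = 1, z2 → y2 = 1]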
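The heat/cold network can be simulated in discrete time. A minimal Python sketch, assuming the wiring y1 = x1 OR z1 (weights 2, 2), z1 = z2 AND NOT x2 (weights 2, -1), z2 = x2 delayed by one step (weight 2) and y2 = x2 AND z2 (weights 1, 1), with a threshold of 2 on every unit (the same threshold as the gates on slides 29-32). All units update synchronously, so each activation at time t is computed from activations at time t-1. The names `fires` and `simulate` are illustrative.

```python
from typing import List, Tuple


def fires(net: float, theta: float = 2.0) -> int:
    """Threshold unit: 1 iff the net input reaches theta (assumed 2 for every unit)."""
    return 1 if net >= theta else 0


def simulate(stimuli: List[Tuple[int, int]], idle_steps: int = 3) -> List[Tuple[int, int]]:
    """Run the network on (x1, x2) = (hot, cold) inputs, one pair per time step.

    Entry k of the result is (y1, y2) = (perceive heat, perceive cold) at t = k + 1.
    """
    z1 = z2 = 0
    outputs = []
    for x1, x2 in stimuli + [(0, 0)] * idle_steps:  # idle steps let activity propagate
        # The right-hand side uses only the previous step's z1, z2: a synchronous update.
        y1, y2, z1, z2 = (
            fires(2 * x1 + 2 * z1),  # y1 = x1 OR z1
            fires(1 * x2 + 1 * z2),  # y2 = x2 AND z2
            fires(2 * z2 - 1 * x2),  # z1 = z2 AND NOT x2
            fires(2 * x2),           # z2 = x2, one step later
        )
        outputs.append((y1, y2))
    return outputs


print(simulate([(0, 1)]))          # [(0, 0), (0, 0), (1, 0), (0, 0)]
print(simulate([(0, 1), (0, 1)]))  # entry 1 is (0, 1): cold perceived at t = 2
```

Applying cold for a single step and then removing it produces heat at t = 3, matching the sequence of slides 42-45; holding cold for two steps produces cold at t = 2, matching slides 46-48. Removing the cold after those two steps then produces heat at t = 4, exactly as the equation for $y_1$ on slide 40 predicts.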
42
Detecting Heat and Cold
[Animation frame: apply cold]
43
Detecting Heat and Cold
[Animation frame: remove cold]
44
Detecting Heat and Cold
[Animation frame]
45
Detecting Heat and Cold
[Animation frame: perceive heat]
46
Detecting Heat and Cold
[Animation frame: apply cold]
47
Detecting Heat and Cold
[Animation frame]
48
Detecting Heat and Cold
[Animation frame: perceive cold]
49
Example Classification

Consider the example of classifying airplanes
given their masses and speeds
How do we construct a neural network that can
classify any type of bomber or fighter?

50
A General Procedure for Building ANNs

1. Understand and specify your problem in terms
of inputs and required outputs, e.g. for
classification the outputs are the classes
usually represented as binary vectors.
2. Take the simplest form of network you think
might be able to solve your problem, e.g. a
simple Perceptron.
3. Try to find appropriate connection weights
(including neuron thresholds) so that the network
produces the right outputs for each input in its
training data.
4. Make sure that the network works on its
training data, and test its generalization by
checking its performance on new testing data.
5. If the network doesn't perform well enough, go back to stage 3 and try harder.
6. If the network still doesn't perform well enough, go back to stage 2 and try harder.
7. If the network still doesn't perform well enough, go back to stage 1 and try harder.
8. Problem solved: move on to the next problem.

51
Building a NN for Our Example

For our airplane classifier example, our inputs
can be direct encodings of the masses and speeds
Generally we would have one output unit for each
class, with activation 1 for yes and 0 for no
With just two classes here, we can have just one
output unit, with activation 1 for fighter and
0 for bomber (or vice versa)
The simplest network to try first is a simple
Perceptron
We can further simplify matters by replacing the threshold with a bias
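Replacing the threshold with a bias amounts to treating $-\theta$ as the weight on an extra input that is always 1. Written out (the index 0 for the extra input is a labeling choice made here):

```latex
\sum_{i=1}^{n} w_i\,in_i - \theta
  = \sum_{i=0}^{n} w_i\,in_i,
\qquad \text{where } in_0 = 1 \text{ and } w_0 = -\theta
```

The unit then fires when this extended sum is $\ge 0$, and the bias weight $w_0$ can be learned in exactly the same way as the other weights.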

52
Building a NN for Our Example
53
Building a NN for Our Example
54
Decision Boundaries in Two Dimensions

For simple logic gate problems, it is easy to
visualize what the neural network is doing. It
is forming decision boundaries between classes.
Remember, the network output is
$out = \operatorname{sgn}(w_1 in_1 + w_2 in_2 - \theta)$
The decision boundary (between out = 0 and out = 1) is at
$w_1 in_1 + w_2 in_2 - \theta = 0$

55
Decision Boundaries in Two Dimensions
In two dimensions the decision boundaries are always straight lines
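For two inputs, the boundary $w_1 in_1 + w_2 in_2 - \theta = 0$ can be rewritten as the line $in_2 = -(w_1/w_2)\,in_1 + \theta/w_2$ whenever $w_2 \ne 0$. A small Python sketch using the AND and OR weights from slides 29-30 as example values; `boundary_line` and `output` are hypothetical helper names.

```python
def boundary_line(w1: float, w2: float, theta: float):
    """Rewrite w1*in1 + w2*in2 - theta = 0 as in2 = slope * in1 + intercept."""
    if w2 == 0:
        raise ValueError("w2 = 0: the boundary is the vertical line in1 = theta / w1")
    return -w1 / w2, theta / w2


def output(in1: float, in2: float, w1: float, w2: float, theta: float) -> int:
    """1 on the side where w1*in1 + w2*in2 - theta >= 0 (boundary included), else 0."""
    return 1 if w1 * in1 + w2 * in2 - theta >= 0 else 0


GATES = {"AND": (1, 1, 2), "OR": (2, 2, 2)}  # (w1, w2, theta)
for name, (w1, w2, theta) in GATES.items():
    slope, intercept = boundary_line(w1, w2, theta)
    print(f"{name}: in2 = {slope} * in1 + {intercept}")
    print("  outputs:", [output(a, c, w1, w2, theta) for a in (0, 1) for c in (0, 1)])
```

For AND the line is $in_2 = 2 - in_1$, which passes exactly through (1, 1); that point is still classified as 1 because the unit fires when the weighted sum is $\ge \theta$.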
56
Decision Boundaries for AND and OR
57
Decision Boundaries for XOR

There are two obvious remedies:
either change the transfer function so that it has more than one decision boundary,
or use a more complex network that is able to generate more complex decision boundaries.

58
Logical XOR (Again)

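$z_1 = x_1 \text{ AND NOT } x_2$
$z_2 = x_2 \text{ AND NOT } x_1$
$y = z_1 \text{ OR } z_2$

[Figure: two-layer network. Weights: x1 → z1 = 2, x2 → z1 = -1, x2 → z2 = 2, x1 → z2 = -1, z1 → y = 2, z2 → y = 2]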
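A minimal Python check of the two-layer XOR network, assuming weights 2 and -1 into each hidden unit, 2 from each hidden unit to y, and a threshold of 2 on every unit (the AND NOT and OR settings of slides 32 and 29). The names `unit` and `xor_net` are illustrative.

```python
def unit(net: float, theta: float = 2.0) -> int:
    """Threshold unit: 1 iff the net input reaches theta."""
    return 1 if net >= theta else 0


def xor_net(x1: int, x2: int) -> int:
    z1 = unit(2 * x1 - 1 * x2)     # z1 = x1 AND NOT x2
    z2 = unit(2 * x2 - 1 * x1)     # z2 = x2 AND NOT x1
    return unit(2 * z1 + 2 * z2)   # y = z1 OR z2


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

The hidden layer is what makes this work: each hidden unit draws one straight-line boundary, and the output unit combines the two half-planes.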
59
Decision Hyperplanes and Linear Separability

If we have two inputs, then the weights define a
decision boundary that is a one dimensional
straight line in the two dimensional input space
of possible input values
If we have n inputs, the weights define a decision boundary that is an n-1 dimensional hyperplane in the n dimensional input space
$w_1 in_1 + w_2 in_2 + \ldots + w_n in_n - \theta = 0$

60
Decision Hyperplanes and Linear Separability

This hyperplane is clearly still linear (i.e.
straight/flat) and can still only divide the
space into two regions. We still need more
complex transfer functions, or more complex
networks, to deal with XOR type problems
Problems with input patterns which can be classified using a single hyperplane are said to be linearly separable. Problems (such as XOR) which cannot be classified in this way are said to be not linearly separable.

61
General Decision Boundaries

Generally, we will want to deal with input
patterns that are not binary, and expect our
neural networks to form complex decision
boundaries
We may also wish to classify inputs into many
classes (such as the three shown here)

62
Learning and Generalization

A network will also produce outputs for input
patterns that it was not originally set up to
classify (shown with question marks), though
those classifications may be incorrect
There are two important aspects of the network's operation to consider:
Learning: The network must learn decision surfaces from a set of training patterns so that these training patterns are classified correctly.
Generalization: After training, the network must also be able to generalize, i.e. correctly classify test patterns it has never seen before.
Usually we want our neural networks to learn
well, and also to generalize well.

63
Learning and Generalization

Sometimes, the training data may contain errors
(e.g. noise in the experimental determination of
the input values, or incorrect classifications)
In this case, learning the training data
perfectly may make the generalization worse
There is an important tradeoff between learning
and generalization that arises quite generally

64
Generalization in Classification

Suppose the task of our network is to learn a
classification decision boundary
Our aim is for the network to generalize to classify new inputs appropriately. If we know that the training data contains noise, we don't necessarily want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.

65
Generalization in Function Approximation

Suppose we wish to recover a function for which
we only have noisy data samples
We can expect the neural network output to give a
better representation of the underlying function
if its output curve does not pass through all the
data points. Again, allowing a larger error on
the training data is likely to lead to better
generalization.

66
Training a Neural Network

Whether our neural network is a simple
Perceptron, or a much more complicated multilayer
network with special activation functions, we
need to develop a systematic procedure for
determining appropriate connection weights.
The general procedure is to have the network
learn the appropriate weights from a
representative set of training data
In all but the simplest cases, however, direct
computation of the weights is intractable

67
Training a Neural Network

Instead, we usually start off with random initial
weights and adjust them in small steps until the
required outputs are produced
We shall now look at a brute force derivation of
such an iterative learning algorithm for simple
Perceptrons.
Later, we shall see how more powerful and general
techniques can easily lead to learning algorithms
which will work for neural networks of any
specification we could possibly dream up

68
Perceptron Learning

For simple Perceptrons performing classification,
we have seen that the decision boundaries are
hyperplanes, and we can think of learning as the
process of shifting around the hyperplanes until
each training pattern is classified correctly
Somehow, we need to formalize that process of
shifting around into a systematic algorithm
that can easily be implemented on a computer
The shifting around can conveniently be split
up into a number of small steps.

69
Perceptron Learning

If the network weights at time t are $w_{ij}(t)$, then the shifting process corresponds to moving them by an amount $\Delta w_{ij}(t)$ so that at time t+1 we have weights
$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$
It is convenient to treat the thresholds as weights, as discussed previously, so we don't need separate equations for them

70
Formulating the Weight Changes

Suppose the target output of unit j is $targ_j$ and the actual output is $out_j = \operatorname{sgn}(\sum_i in_i w_{ij})$, where $in_i$ are the activations of the previous layer of neurons (e.g. the network inputs)
Then we can just go through all the possibilities to work out an appropriate set of small weight changes

71
Perceptron Algorithm

Step 0: Initialize weights and bias
For simplicity, set weights and bias to zero
Set learning rate α (0 < α < 1) (also written η)
Step 1: While stopping condition is false, do steps 2-6
Step 2: For each training pair s:t, do steps 3-5
Step 3: Set activations of input units
$x_i = s_i$

72
Perceptron Algorithm

Step 4: Compute response of output unit

73
Perceptron Algorithm

Step 5: Update weights and bias if an error occurred for this pattern
if $y \ne t$:
  $w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$
  $b(\text{new}) = b(\text{old}) + \alpha t$
else:
  $w_i(\text{new}) = w_i(\text{old})$
  $b(\text{new}) = b(\text{old})$
Step 6: Test stopping condition
If no weights changed in Step 2, stop; else, continue
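Steps 0-6 translate almost line by line into code. Step 4's formula did not survive in this transcript, so the sketch below assumes a three-valued bipolar response of the kind used in Fausett's text: with $y_{in} = b + \sum_i x_i w_i$, the output is 1 if $y_{in} > \theta$, 0 if $-\theta \le y_{in} \le \theta$, and -1 if $y_{in} < -\theta$ (here θ is the half-width of an undecided band, not the firing threshold used earlier). The values α = 0.5 and θ = 0.2, the epoch cap, and the bipolar AND data are illustrative assumptions; this is one reading of the algorithm, not a definitive implementation.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (input vector s, target t)


def response(y_in: float, theta: float) -> int:
    """Assumed Step 4: bipolar output with an undecided band [-theta, theta]."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples: List[Sample], alpha: float = 0.5,
                     theta: float = 0.2, max_epochs: int = 100):
    """Returns (weights, bias, epochs_used); raises RuntimeError if no epoch is error-free."""
    n = len(samples[0][0])
    w = [0.0] * n                            # Step 0: weights and bias start at zero
    b = 0.0
    for epoch in range(1, max_epochs + 1):   # Step 1: loop until the stopping condition holds
        changed = False
        for s, t in samples:                 # Step 2: each training pair s:t
            x = list(s)                      # Step 3: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            y = response(y_in, theta)        # Step 4: response of the output unit
            if y != t:                       # Step 5: update only when an error occurred
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                      # Step 6: stop if no weights changed this epoch
            return w, b, epoch
    raise RuntimeError("no error-free epoch within max_epochs")


# Bipolar AND: inputs and targets in {-1, 1}
BIPOLAR_AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, epochs = train_perceptron(BIPOLAR_AND)
print(w, b, epochs)  # [0.5, 0.5] -0.5 2
```

On bipolar AND all the learning happens in the first epoch; the second epoch makes no changes, which triggers Step 6. On XOR, which is not linearly separable, no epoch is ever error-free, so only the epoch cap stops the loop.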

74
Convergence of Perceptron Learning

The weight changes $\Delta w_{ij}$ need to be applied repeatedly for each weight $w_{ij}$ in the network, and for each training pattern in the training set. One pass through all the weights for the whole training set is called one epoch of training
Eventually, usually after many epochs, when all the network outputs match the targets for all the training patterns, all the $\Delta w_{ij}$ will be zero and the process of training will cease. We then say that the training process has converged to a solution

75
Convergence of Perceptron Learning

It can be shown that if there does exist a possible set of weights for a Perceptron which solves the given problem correctly, then the Perceptron Learning Rule will find such a set of weights in a finite number of iterations
Equivalently, since a simple Perceptron can solve exactly the linearly separable problems, the Perceptron Learning Rule will find, in a finite number of iterations, a set of weights that solves any linearly separable problem correctly

76
Overview and Review

Neural network classifiers learn decision
boundaries from training data
Simple Perceptrons can only cope with linearly
separable problems
Trained networks are expected to generalize, i.e.
deal appropriately with input data they were not
trained on
One can train networks by iteratively updating
their weights
The Perceptron Learning Rule will find weights
for linearly separable problems in a finite
number of iterations.

77
Hebbian Learning

In 1949 neuropsychologist Donald Hebb postulated
how biological neurons learn
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
In other words
1. If two neurons on either side of a synapse
(connection) are activated simultaneously (i.e.
synchronously), then the strength of that synapse
is selectively increased.
This rule is often supplemented by
2. If two neurons on either side of a synapse are
activated asynchronously, then that synapse is
selectively weakened or eliminated.
so that chance coincidences do not build up
connection strengths.

78
Hebbian Learning Algorithm

Step 0: Initialize all weights
For simplicity, set weights and bias to zero
Step 1: For each input training vector, do steps 2-4
Step 2: Set activations of input units
$x_i = s_i$
Step 3: Set the activation for the output unit
$y = t$
Step 4: Adjust weights and bias
$w_i(\text{new}) = w_i(\text{old}) + x_i y$
$b(\text{new}) = b(\text{old}) + y$
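A direct Python transcription of Steps 0-4, run on AND with bipolar inputs and targets (an illustrative choice of data). Unlike the Perceptron algorithm, there is a single pass and no error check. The `predict` helper, which outputs 1 when the net input is ≥ 0 and -1 otherwise, is an assumption added for testing the result.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (input vector s, target t)


def train_hebb(samples: List[Sample]):
    """Hebb rule, Steps 0-4: one pass, w_i += x_i * y and b += y with y = t."""
    n = len(samples[0][0])
    w = [0.0] * n                        # Step 0: weights and bias start at zero
    b = 0.0
    for s, t in samples:                 # Step 1: each training vector
        x = list(s)                      # Step 2: x_i = s_i
        y = t                            # Step 3: output activation set to the target
        w = [wi + xi * y for wi, xi in zip(w, x)]  # Step 4: adjust weights
        b += y                                      #         and bias
    return w, b


def predict(s: Sequence[int], w: List[float], b: float) -> int:
    """Bipolar output: 1 if the net input is >= 0, else -1."""
    net = b + sum(xi * wi for xi, wi in zip(s, w))
    return 1 if net >= 0 else -1


BIPOLAR_AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_hebb(BIPOLAR_AND)
print(w, b)                                        # [2.0, 2.0] -2.0
print([predict(s, w, b) for s, _ in BIPOLAR_AND])  # [1, -1, -1, -1]
```

With bipolar data these weights happen to classify AND correctly. With binary 0/1 targets, every pattern whose target is 0 contributes $x_i y = 0$, so those patterns never change the weights at all; this is one concrete way in which Hebbian learning falls short of learning a training set.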

79
Hebbian vs Perceptron Learning

In the notation used for Perceptrons, the Hebbian learning weight update rule is
$\Delta w_{ij} = out_j \cdot in_i$
There is strong physiological evidence that this type of learning does take place in the region of the brain known as the hippocampus.
Recall that the Perceptron learning weight update rule we derived was
$\Delta w_{ij} = \eta \cdot targ_j \cdot in_i$ (applied only when $out_j \ne targ_j$)
There is some similarity, but it is clear that
Hebbian learning is not going to get our
Perceptron to learn a set of training data.

80
Adaline

Adaline (Adaptive Linear Neuron) was developed by Widrow and Hoff in 1960.
Uses bipolar activations (-1 and 1) for its input signals and target values
Weight connections are adjustable
Trained using the delta rule for weight update
$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha (targ_j - out_j) x_i$
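A minimal sketch of per-pattern delta-rule training, assuming (as is usual for Adaline) that $out_j$ during training is the unthresholded net input $b + \sum_i x_i w_i$, with thresholding applied only when classifying. The learning rate, epoch count, random seed, and bipolar AND data are illustrative assumptions, as are the names `train_adaline` and `classify`.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (input vector s, bipolar target t)


def train_adaline(samples: List[Sample], alpha: float = 0.01,
                  epochs: int = 500, seed: int = 0):
    """Delta rule: w_i += alpha * (t - y_in) * x_i and b += alpha * (t - y_in)."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # small random initial weights
    b = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):
        for s, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(s, w))  # linear output, no threshold
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, s)]
            b += alpha * err
    return w, b


def classify(s: Sequence[int], w: List[float], b: float) -> int:
    """Threshold only at classification time: 1 if net input >= 0, else -1."""
    return 1 if b + sum(xi * wi for xi, wi in zip(s, w)) >= 0 else -1


BIPOLAR_AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_adaline(BIPOLAR_AND)
print([round(v, 2) for v in w], round(b, 2))
print([classify(s, w, b) for s, _ in BIPOLAR_AND])  # [1, -1, -1, -1]
```

Because the error is measured before thresholding, the weights move toward the least-squares solution, which for bipolar AND is $w_1 = w_2 = 0.5$, $b = -0.5$; with a fixed α the per-pattern updates keep them jittering slightly around that point rather than settling exactly.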

81
Adaline Training Algorithm

Step 0: Initialize weights and bias
Set weights to small random values
Set learning rate α (0 < α < 1) (also written η)
Step 1: While stopping condition is false, do steps 2-6
Step 2: For each training pair s:t, do steps 3-5
Step 3: Set activations of input units
$x_i = s_i$

82
Adaline Training Algorithm