Artificial Intelligence

Chapter 20.5: Neural Networks

- Michael Scherger
- Department of Computer Science
- Kent State University

Contents

- Introduction
- Simple Neural Networks for Pattern Classification
- Pattern Association
- Neural Networks Based on Competition
- Backpropagation Neural Network

Introduction

- Much of the material in these notes comes from Fundamentals of Neural Networks: Architectures, Algorithms, and Applications by Laurene Fausett, Prentice Hall, Englewood Cliffs, NJ, 1994.

Introduction

- Aims
- Introduce some of the fundamental techniques and principles of neural network systems
- Investigate some common models and their applications

What are Neural Networks?

- Neural Networks (NNs) are networks of neurons, for example, as found in real (i.e. biological) brains.
- Artificial Neurons are crude approximations of the neurons found in brains. They may be physical devices, or purely mathematical constructs.
- Artificial Neural Networks (ANNs) are networks of Artificial Neurons, and hence constitute crude approximations to parts of real brains. They may be physical devices, or simulated on conventional computers.
- From a practical point of view, an ANN is just a parallel computational system consisting of many simple processing elements connected together in a specific way in order to perform a particular task.
- One should never lose sight of how crude the approximations are, and how over-simplified our ANNs are compared to real brains.

Why Study Artificial Neural Networks?

- They are extremely powerful computational devices (Turing equivalent, universal computers)
- Massive parallelism makes them very efficient
- They can learn and generalize from training data, so there is no need for enormous feats of programming
- They are particularly fault tolerant; this is equivalent to the graceful degradation found in biological systems
- They are very noise tolerant, so they can cope with situations where normal symbolic systems would have difficulty
- In principle, they can do anything a symbolic/logic system can do, and more. (In practice, getting them to do it can be rather difficult.)

What are Artificial Neural Networks Used for?

- As with the field of AI in general, there are two basic goals for neural network research:
- Brain modeling: the scientific goal of building models of how real brains work
- This can potentially help us understand the nature of human intelligence, formulate better teaching strategies, or find better remedial actions for brain-damaged patients.
- Artificial system building: the engineering goal of building efficient systems for real-world applications.
- This may make machines more powerful, relieve humans of tedious tasks, and may even improve upon human performance.

What are Artificial Neural Networks Used for?

- Brain modeling
- Models of human development help children with developmental problems
- Simulations of adult performance aid our understanding of how the brain works
- Neuropsychological models suggest remedial actions for brain-damaged patients
- Real-world applications
- Financial modeling: predicting stocks, shares, currency exchange rates
- Other time series prediction: climate, weather, airline marketing tactician
- Computer games: intelligent agents, backgammon, first-person shooters
- Control systems: autonomous adaptable robots, microwave controllers
- Pattern recognition: speech recognition, handwriting recognition, sonar signals
- Data analysis: data compression, data mining
- Noise reduction: function approximation, ECG noise reduction
- Bioinformatics: protein secondary structure, DNA sequencing

Learning in Neural Networks

- There are many forms of neural networks. Most operate by passing neural activations through a network of connected neurons.
- One of the most powerful features of neural networks is their ability to learn and generalize from a set of training data. They adapt the strengths/weights of the connections between neurons so that the final output activations are correct.

Learning in Neural Networks

- There are three broad types of learning:
- Supervised learning (i.e. learning with a teacher)
- Reinforcement learning (i.e. learning with limited feedback)
- Unsupervised learning (i.e. learning with no help)

A Brief History

- 1943: McCulloch and Pitts proposed the McCulloch-Pitts neuron model.
- 1949: Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.
- 1958: Rosenblatt introduced the simple single-layer networks now called Perceptrons.
- 1969: Minsky and Papert's book Perceptrons demonstrated the limitations of single-layer perceptrons, and almost the whole field went into hibernation.
- 1982: Hopfield published a series of papers on Hopfield networks.
- 1982: Kohonen developed the Self-Organizing Maps that now bear his name.
- 1986: The Back-Propagation learning algorithm for Multi-Layer Perceptrons was rediscovered and the whole field took off again.
- 1990s: The sub-field of Radial Basis Function Networks was developed.
- 2000s: The power of Ensembles of Neural Networks and Support Vector Machines became apparent.

Overview

- Artificial Neural Networks are powerful computational systems consisting of many simple processing elements connected together to perform tasks analogously to biological brains.
- They are massively parallel, which makes them efficient, robust, fault tolerant and noise tolerant.
- They can learn from training data and generalize to new situations.
- They are useful for brain modeling and real-world applications involving pattern recognition, function approximation, prediction, ...

The Nervous System

- The human nervous system can be broken down into three stages that may be represented in block diagram form.
- The receptors collect information from the environment, e.g. photons on the retina.
- The effectors generate interactions with the environment, e.g. activate muscles.
- The flow of information/activation is represented by arrows: feed forward and feedback.

Levels of Brain Organization

- The brain contains both large-scale and small-scale anatomical structures, and different functions take place at higher and lower levels. There is a hierarchy of interwoven levels of organization:
- 1. Molecules and ions
- 2. Synapses
- 3. Neuronal microcircuits
- 4. Dendritic trees
- 5. Neurons
- 6. Local circuits
- 7. Inter-regional circuits
- 8. Central nervous system
- The ANNs we study in this module are crude approximations to levels 5 and 6.

Brains vs. Computers

- There are approximately 10 billion neurons in the human cortex, compared with tens of thousands of processors in the most powerful parallel computers.
- Each biological neuron is connected to several thousand other neurons, similar to the connectivity in powerful parallel computers.
- Lack of processing units can be compensated by speed. The typical operating speed of biological neurons is measured in milliseconds ($10^{-3}$ s), while a silicon chip can operate in nanoseconds ($10^{-9}$ s).
- The human brain is extremely energy efficient, using approximately $10^{-16}$ joules per operation per second, whereas the best computers today use around $10^{-6}$ joules per operation per second.
- Brains have been evolving for tens of millions of years; computers have been evolving for tens of decades.

Structure of a Human Brain

Slice Through a Real Brain

Biological Neural Networks

- 1. The majority of neurons encode their outputs or activations as a series of brief electrical pulses (i.e. spikes or action potentials).
- 2. Dendrites are the receptive zones that receive activation from other neurons.
- 3. The cell body (soma) of the neuron processes the incoming activations and converts them into output activations.
- 4. Axons are transmission lines that send activation to other neurons.
- 5. Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large neural networks.

The McCulloch-Pitts Neuron

- This vastly simplified model of real neurons is also known as a Threshold Logic Unit.
- A set of synapses (i.e. connections) brings in activations from other neurons.
- A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing/transfer/threshold function).
- An output line transmits the result to other neurons.

Networks of McCulloch-Pitts Neurons

- Artificial neurons have the same basic components as biological neurons. The simplest ANNs consist of a set of McCulloch-Pitts neurons labeled by indices k, i, j, and activation flows between them via synapses with strengths $w_{ki}$, $w_{ij}$.

Some Useful Notation

- We often need to talk about ordered sets of related numbers; we call them vectors, e.g.
- $x = (x_1, x_2, x_3, \dots, x_n)$, $y = (y_1, y_2, y_3, \dots, y_m)$
- The components $x_i$ can be added up to give a scalar (number), e.g.
- $s = x_1 + x_2 + x_3 + \dots + x_n = \sum_{i=1}^{n} x_i$
- Two vectors of the same length may be added to give another vector, e.g.
- $z = x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$
- Two vectors of the same length may be multiplied to give a scalar, e.g.
- $p = x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
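
As a minimal sketch, the three vector operations above can be written in plain Python. The function names `component_sum`, `vector_add`, and `dot` are illustrative choices, not names used in the notes.

```python
def component_sum(x):
    """Add up the components of x to give a scalar."""
    return sum(x)


def vector_add(x, y):
    """Add two vectors of the same length, component by component."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Multiply two vectors of the same length to give a scalar."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


x = [1, 2, 3]
y = [4, 5, 6]
print(component_sum(x))  # 6
print(vector_add(x, y))  # [5, 7, 9]
print(dot(x, y))         # 32
```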

Some Useful Functions

- Common activation functions
- Identity function
- $f(x) = x$ for all $x$
- Binary step function (with threshold $\theta$) (aka Heaviside function or threshold function)

Some Useful Functions

- Binary sigmoid
- Bipolar sigmoid
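
The formulas for these functions did not survive in the notes. The sketch below assumes their commonly used textbook forms: a step that outputs 1 when $x \ge \theta$, a logistic binary sigmoid with range (0, 1), and a bipolar sigmoid obtained by rescaling it to (-1, 1). The steepness parameter `sigma` and the function names are assumptions for illustration.

```python
import math


def identity(x):
    """Identity function: f(x) = x for all x."""
    return x


def binary_step(x, theta=0.0):
    """Binary step (threshold) function: 1 if x >= theta, otherwise 0."""
    return 1 if x >= theta else 0


def binary_sigmoid(x, sigma=1.0):
    """Logistic sigmoid with outputs in (0, 1); sigma sets the steepness."""
    return 1.0 / (1.0 + math.exp(-sigma * x))


def bipolar_sigmoid(x, sigma=1.0):
    """Binary sigmoid rescaled so that outputs lie in (-1, 1)."""
    return 2.0 * binary_sigmoid(x, sigma) - 1.0


print(binary_step(1.5, theta=2), binary_step(2, theta=2))  # 0 1
print(binary_sigmoid(0.0), bipolar_sigmoid(0.0))           # 0.5 0.0
```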

The McCulloch-Pitts Neuron Equation

- Using the above notation, we can now write down a simple equation for the output $out$ of a McCulloch-Pitts neuron as a function of its n inputs $in_i$.
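
The equation itself is missing from these notes. One form consistent with the decision-boundary expression used later in the notes is sketched below. It assumes connection weights $w_i$ (set every $w_i = 1$ for an unweighted neuron), a threshold $\theta$, and a step function $\mathrm{sgn}$ that outputs 0 or 1:

```latex
out = \mathrm{sgn}\!\left( \sum_{i=1}^{n} w_i \, in_i - \theta \right),
\qquad
\mathrm{sgn}(x) =
\begin{cases}
1 & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
```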

Review

- Biological neurons, consisting of a cell body, axons, dendrites and synapses, are able to process and transmit neural activation.
- The McCulloch-Pitts neuron model (Threshold Logic Unit) is a crude approximation to real neurons that performs a simple summation and thresholding function on activation levels.
- Appropriate mathematical notation facilitates the specification and programming of artificial neurons and networks of artificial neurons.

Networks of McCulloch-Pitts Neurons

- One neuron can't do much on its own. Usually we will have many neurons labeled by indices k, i, j, and activation flows between them via synapses with strengths $w_{ki}$, $w_{ij}$.

The Perceptron

- We can connect any number of McCulloch-Pitts neurons together in any way we like.
- An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.

Logic Gates with MP Neurons

- We can use McCulloch-Pitts neurons to implement the basic logic gates.
- All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs.
- We shall see explicitly how one can construct simple networks that perform NOT, AND, and OR.
- It is then a well-known result from logic that we can construct any logical function from these three operations.
- The resulting networks, however, will usually have a much more complex architecture than a simple Perceptron.
- We generally want to avoid decomposing complex problems into simple logic gates, by finding the weights and thresholds that work directly in a Perceptron architecture.

Implementation of Logical NOT, AND, and OR

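- Logical OR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

[Figure: McCulloch-Pitts neuron for OR. Inputs x1 and x2 each connect to the output y with weight 2; the threshold is θ = 2.]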

Implementation of Logical NOT, AND, and OR

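- Logical AND

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: McCulloch-Pitts neuron for AND. Inputs x1 and x2 each connect to the output y with weight 1; the threshold is θ = 2.]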

Implementation of Logical NOT, AND, and OR

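- Logical NOT

x1  y
0   1
1   0

[Figure: McCulloch-Pitts neuron for NOT. Input x1 connects to the output y with weight -1, and a bias unit with constant activation 1 connects with weight 2; the threshold is θ = 2.]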

Implementation of Logical NOT, AND, and OR

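- Logical AND NOT

x1  x2  y
0   0   0
0   1   0
1   0   1
1   1   0

[Figure: McCulloch-Pitts neuron for AND NOT. Input x1 connects to the output y with weight 2 and input x2 with weight -1; the threshold is θ = 2.]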
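
The four gates above can be checked with a short sketch. It assumes a neuron fires when its weighted input sum is at least the threshold (θ = 2 throughout, as in the diagrams); the helper `mp_neuron` and the gate function names are illustrative.

```python
def mp_neuron(inputs, weights, theta=2):
    """Return 1 when the weighted input sum reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def logical_or(x1, x2):
    return mp_neuron((x1, x2), (2, 2))


def logical_and(x1, x2):
    return mp_neuron((x1, x2), (1, 1))


def logical_not(x1):
    # The second input is the bias unit: always 1, connected with weight 2.
    return mp_neuron((x1, 1), (-1, 2))


def logical_and_not(x1, x2):
    return mp_neuron((x1, x2), (2, -1))


# Print the truth tables: x1, x2, OR, AND, AND NOT.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, logical_or(x1, x2), logical_and(x1, x2),
              logical_and_not(x1, x2))
```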

Logical XOR

- Logical XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: single McCulloch-Pitts neuron with inputs x1 and x2, output y, and unknown weights marked "?".]

Logical XOR

- How long do we keep looking for a solution? We need to be able to calculate appropriate parameters rather than looking for solutions by trial and error.
- Each training pattern produces a linear inequality for the output in terms of the inputs and the network parameters. These can be used to compute the weights and thresholds.

Finding the Weights Analytically

- We have two weights $w_1$ and $w_2$ and the threshold $\theta$, and for each training pattern we need to satisfy $out = \mathrm{sgn}(w_1 in_1 + w_2 in_2 - \theta)$

Finding the Weights Analytically

- For the XOR network
- Clearly the second and third inequalities are incompatible with the fourth, so there is in fact no solution. We need more complex networks, e.g. ones that combine many simple networks, or that use different activation/thresholding/transfer functions.
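
The inequalities referred to above are not shown in these notes. Assuming the neuron outputs 1 exactly when $w_1 in_1 + w_2 in_2 - \theta \ge 0$, the four XOR training patterns would give:

```latex
\begin{aligned}
(in_1, in_2) = (0, 0),\ out = 0 &: \quad w_1 \cdot 0 + w_2 \cdot 0 - \theta < 0 \;\Rightarrow\; \theta > 0 \\
(in_1, in_2) = (0, 1),\ out = 1 &: \quad w_1 \cdot 0 + w_2 \cdot 1 - \theta \ge 0 \;\Rightarrow\; w_2 \ge \theta \\
(in_1, in_2) = (1, 0),\ out = 1 &: \quad w_1 \cdot 1 + w_2 \cdot 0 - \theta \ge 0 \;\Rightarrow\; w_1 \ge \theta \\
(in_1, in_2) = (1, 1),\ out = 0 &: \quad w_1 \cdot 1 + w_2 \cdot 1 - \theta < 0 \;\Rightarrow\; w_1 + w_2 < \theta
\end{aligned}
```

Adding the second and third inequalities gives $w_1 + w_2 \ge 2\theta$. Since the first requires $\theta > 0$, we have $2\theta > \theta$, which contradicts the fourth.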

ANN Topologies

- Mathematically, ANNs can be represented as weighted directed graphs. For our purposes, we can simply think in terms of activation flowing between processing units via one-way connections.
- Single-Layer Feed-forward NNs: One input layer and one output layer of processing units. No feed-back connections. (For example, a simple Perceptron.)
- Multi-Layer Feed-forward NNs: One input layer, one output layer, and one or more hidden layers of processing units. No feed-back connections. The hidden layers sit in between the input and output layers, and are thus hidden from the outside world. (For example, a Multi-Layer Perceptron.)
- Recurrent NNs: Any network with at least one feed-back connection. It may, or may not, have hidden units. (For example, a Simple Recurrent Network.)

ANN Topologies

Detecting Hot and Cold

- It is a well-known and interesting psychological phenomenon that if a cold stimulus is applied to a person's skin for a short period of time, the person will perceive heat.
- However, if the same stimulus is applied for a longer period of time, the person will perceive cold. The use of discrete time steps enables the network of MP neurons to model this phenomenon.

Detecting Hot and Cold

- The desired response of the system is that cold is perceived if a cold stimulus is applied for two time steps:
- $y_2(t) = x_2(t-2) \text{ AND } x_2(t-1)$
- It is also required that heat be perceived if either a hot stimulus is applied or a cold stimulus is applied briefly (for one time step) and then removed:
- $y_1(t) = x_1(t-1) \text{ OR } [\, x_2(t-3) \text{ AND NOT } x_2(t-2) \,]$

Detecting Heat and Cold

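[Figure: network of McCulloch-Pitts neurons for detecting heat and cold. Inputs x1 (Heat) and x2 (Cold) feed hidden units z1 and z2 and output units y1 (Heat) and y2 (Cold); the connection weights shown are 2, -1, and 1.]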

Detecting Heat and Cold

[Figure sequence: a cold stimulus is applied for one time step and then removed. The activation propagates through the network over the following time steps until heat is perceived.]

[Figure sequence: a cold stimulus is applied and held for two time steps, after which cold is perceived.]
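
The two scenarios above can be reproduced with a small discrete-time simulation. This is a sketch under stated assumptions, not the exact network from the slides: every unit uses threshold 2 and updates synchronously from the previous time step, and the wiring (z2 from x2 with weight 2; z1 from z2 with weight 2 and x2 with weight -1; y1 from x1 and z1 with weight 2 each; y2 from z2 and x2 with weight 1 each) is chosen so that it realizes the two formulas given earlier for y1(t) and y2(t). The function names are illustrative.

```python
def step(net, theta=2):
    """McCulloch-Pitts threshold: fire when the net input reaches theta."""
    return 1 if net >= theta else 0


def simulate(x1_seq, x2_seq):
    """Return the outputs (y1, y2) at times t = 1, 2, ... given the inputs
    applied at times t = 0, 1, ... All units start at 0."""
    z1 = z2 = y1 = y2 = 0
    history = []
    for x1, x2 in zip(x1_seq, x2_seq):
        # The right-hand side uses only values from the previous time step.
        z1, z2, y1, y2 = (
            step(2 * z2 - 1 * x2),  # z1 = z2 AND NOT x2
            step(2 * x2),           # z2 copies x2 with a one-step delay
            step(2 * x1 + 2 * z1),  # y1 = x1 OR z1   (heat)
            step(1 * z2 + 1 * x2),  # y2 = z2 AND x2  (cold)
        )
        history.append((y1, y2))
    return history


# Cold applied for one step, then removed: heat is perceived at t = 3.
print(simulate([0, 0, 0], [1, 0, 0]))  # [(0, 0), (0, 0), (1, 0)]
# Cold applied for two steps: cold is perceived at t = 2.
print(simulate([0, 0], [1, 1]))        # [(0, 0), (0, 1)]
```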

Example Classification

- Consider the example of classifying airplanes given their masses and speeds.
- How do we construct a neural network that can classify any type of bomber or fighter?

A General Procedure for Building ANNs

- 1. Understand and specify your problem in terms of inputs and required outputs, e.g. for classification the outputs are the classes, usually represented as binary vectors.
- 2. Take the simplest form of network you think might be able to solve your problem, e.g. a simple Perceptron.
- 3. Try to find appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data.
- 4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data.
- 5. If the network doesn't perform well enough, go back to stage 3 and try harder.
- 6. If the network still doesn't perform well enough, go back to stage 2 and try harder.
- 7. If the network still doesn't perform well enough, go back to stage 1 and try harder.
- 8. Problem solved: move on to the next problem.

Building a NN for Our Example

- For our airplane classifier example, our inputs can be direct encodings of the masses and speeds.
- Generally we would have one output unit for each class, with activation 1 for yes and 0 for no.
- With just two classes here, we can have just one output unit, with activation 1 for fighter and 0 for bomber (or vice versa).
- The simplest network to try first is a simple Perceptron.
- We can further simplify matters by replacing the threshold with a bias.

Building a NN for Our Example

Building a NN for Our Example

Decision Boundaries in Two Dimensions

- For simple logic gate problems, it is easy to visualize what the neural network is doing. It is forming decision boundaries between classes. Remember, the network output is $out = \mathrm{sgn}(w_1 in_1 + w_2 in_2 - \theta)$
- The decision boundary (between $out = 0$ and $out = 1$) is at
- $w_1 in_1 + w_2 in_2 - \theta = 0$

Decision Boundaries in Two Dimensions

In two dimensions the decision boundaries are always straight lines.
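
To make the straight line concrete, the boundary equation $w_1 in_1 + w_2 in_2 - \theta = 0$ can be solved for $in_2$. The sketch below does this for the AND gate weights used earlier in these notes ($w_1 = w_2 = 1$, $\theta = 2$); the function name `boundary_in2` is an illustrative choice.

```python
def boundary_in2(w1, w2, theta, in1):
    """Solve w1*in1 + w2*in2 - theta = 0 for in2 (requires w2 != 0)."""
    if w2 == 0:
        raise ValueError("w2 must be non-zero to solve for in2")
    return (theta - w1 * in1) / w2


# AND gate: the boundary is the line in2 = 2 - in1.
for in1 in (0, 1, 2):
    print(in1, boundary_in2(1, 1, 2, in1))
```

Only the input (1, 1) lies on or above this line, which matches the AND truth table.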

Decision Boundaries for AND and OR

Decision Boundaries for XOR

- There are two obvious remedies:
- either change the transfer function so that it has more than one decision boundary
- or use a more complex network that is able to generate more complex decision boundaries

Logical XOR (Again)

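- $z_1 = x_1 \text{ AND NOT } x_2$
- $z_2 = x_2 \text{ AND NOT } x_1$
- $y = z_1 \text{ OR } z_2$

[Figure: two-layer network for XOR. Inputs x1 and x2 feed hidden units z1 and z2 with weights 2 and -1; z1 and z2 each feed the output y with weight 2.]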
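
A short sketch of this two-layer construction follows. It assumes threshold 2 for every unit and the weights read from the three equations above (2 and -1 for each AND NOT unit, 2 and 2 for the OR unit); the names `mp_neuron` and `xor` are illustrative.

```python
def mp_neuron(inputs, weights, theta=2):
    """Return 1 when the weighted input sum reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def xor(x1, x2):
    z1 = mp_neuron((x1, x2), (2, -1))   # z1 = x1 AND NOT x2
    z2 = mp_neuron((x1, x2), (-1, 2))   # z2 = x2 AND NOT x1
    return mp_neuron((z1, z2), (2, 2))  # y  = z1 OR z2


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))
```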

Decision Hyperplanes and Linear Separability

- If we have two inputs, then the weights define a decision boundary that is a one-dimensional straight line in the two-dimensional input space of possible input values.
- If we have n inputs, the weights define a decision boundary that is an (n-1)-dimensional hyperplane in the n-dimensional input space:
- $w_1 in_1 + w_2 in_2 + \dots + w_n in_n - \theta = 0$

Decision Hyperplanes and Linear Separability

- This hyperplane is clearly still linear (i.e. straight/flat) and can still only divide the space into two regions. We still need more complex transfer functions, or more complex networks, to deal with XOR-type problems.
- Problems with input patterns which can be classified using a single hyperplane are said to be linearly separable. Problems (such as XOR) which cannot be classified in this way are said to be non-linearly separable.

General Decision Boundaries

- Generally, we will want to deal with input patterns that are not binary, and expect our neural networks to form complex decision boundaries.
- We may also wish to classify inputs into many classes (such as the three shown here).

Learning and Generalization

- A network will also produce outputs for input patterns that it was not originally set up to classify (shown with question marks), though those classifications may be incorrect.
- There are two important aspects of the network's operation to consider:
- Learning: The network must learn decision surfaces from a set of training patterns so that these training patterns are classified correctly.
- Generalization: After training, the network must also be able to generalize, i.e. correctly classify test patterns it has never seen before.
- Usually we want our neural networks to learn well, and also to generalize well.

Learning and Generalization

- Sometimes, the training data may contain errors (e.g. noise in the experimental determination of the input values, or incorrect classifications).
- In this case, learning the training data perfectly may make the generalization worse.
- There is an important tradeoff between learning and generalization that arises quite generally.

Generalization in Classification

- Suppose the task of our network is to learn a classification decision boundary.
- Our aim is for the network to generalize to classify new inputs appropriately. If we know that the training data contains noise, we don't necessarily want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.

Generalization in Function Approximation

- Suppose we wish to recover a function for which we only have noisy data samples.
- We can expect the neural network output to give a better representation of the underlying function if its output curve does not pass through all the data points. Again, allowing a larger error on the training data is likely to lead to better generalization.

Training a Neural Network

- Whether our neural network is a simple Perceptron, or a much more complicated multilayer network with special activation functions, we need to develop a systematic procedure for determining appropriate connection weights.
- The general procedure is to have the network learn the appropriate weights from a representative set of training data.
- In all but the simplest cases, however, direct computation of the weights is intractable.

Training a Neural Network

- Instead, we usually start off with random initial weights and adjust them in small steps until the required outputs are produced.
- We shall now look at a brute-force derivation of such an iterative learning algorithm for simple Perceptrons.
- Later, we shall see how more powerful and general techniques can easily lead to learning algorithms which will work for neural networks of any specification we could possibly dream up.

Perceptron Learning

- For simple Perceptrons performing classification, we have seen that the decision boundaries are hyperplanes, and we can think of learning as the process of shifting around the hyperplanes until each training pattern is classified correctly.
- Somehow, we need to formalize that process of shifting around into a systematic algorithm that can easily be implemented on a computer.
- The shifting around can conveniently be split up into a number of small steps.

Perceptron Learning

- If the network weights at time $t$ are $w_{ij}(t)$, then the shifting process corresponds to moving them by an amount $\Delta w_{ij}(t)$ so that at time $t+1$ we have weights
- $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$
- It is convenient to treat the thresholds as weights, as discussed previously, so we don't need separate equations for them.

Formulating the Weight Changes

- Suppose the target output of unit j is $targ_j$ and the actual output is $out_j = \mathrm{sgn}(\sum_i in_i w_{ij})$, where $in_i$ are the activations of the previous layer of neurons (e.g. the network inputs).
- Then we can just go through all the possibilities to work out an appropriate set of small weight changes.

Perceptron Algorithm

- Step 0: Initialize weights and bias
- For simplicity, set weights and bias to zero
- Set learning rate $\alpha$ ($0 < \alpha < 1$), also written $\eta$
- Step 1: While stopping condition is false, do steps 2-6
- Step 2: For each training pair $s:t$, do steps 3-5
- Step 3: Set activations of input units
- $x_i = s_i$

Perceptron Algorithm

- Step 4: Compute response of output unit

Perceptron Algorithm

- Step 5: Update weights and bias if an error occurred for this pattern
- if $y \ne t$
- $w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$
- $b(\text{new}) = b(\text{old}) + \alpha t$
- else
- $w_i(\text{new}) = w_i(\text{old})$
- $b(\text{new}) = b(\text{old})$
- Step 6: Test stopping condition
- If no weights changed in Step 2, stop; else, continue
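
Steps 0 to 6 can be sketched in Python as follows. The response function for Step 4 is not shown in these notes, so this sketch assumes bipolar targets (+1 or -1) and an output of +1, 0, or -1 depending on whether the net input is above, within, or below a band of half-width `theta` around zero. The `max_epochs` safety limit, the parameter defaults, and the function names are also assumptions for illustration.

```python
def respond(w, b, x, theta=0.0):
    """Step 4: net input followed by a three-valued threshold (assumed form)."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def perceptron_train(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """samples is a list of (input vector, bipolar target) pairs."""
    n = len(samples[0][0])
    w = [0.0] * n                # Step 0: weights and bias start at zero
    b = 0.0
    for _ in range(max_epochs):  # Step 1 (max_epochs guards against
        changed = False          # problems that never converge)
        for x, t in samples:     # Steps 2-3: present each training pair
            y = respond(w, b, x, theta)
            if y != t:           # Step 5: update only when an error occurred
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:          # Step 6: stop after an epoch with no changes
            break
    return w, b


# Logical AND with bipolar inputs and targets.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = perceptron_train(and_samples)
print(w, b)
print([respond(w, b, x) for x, _ in and_samples])
```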

Convergence of Perceptron Learning

- The weight changes $\Delta w_{ij}$ need to be applied repeatedly for each weight $w_{ij}$ in the network, and for each training pattern in the training set. One pass through all the weights for the whole training set is called one epoch of training.
- Eventually, usually after many epochs, when all the network outputs match the targets for all the training patterns, all the $\Delta w_{ij}$ will be zero and the process of training will cease. We then say that the training process has converged to a solution.

Convergence of Perceptron Learning

- It can be shown that if there does exist a possible set of weights for a Perceptron which solves the given problem correctly, then the Perceptron Learning Rule will find them in a finite number of iterations.
- Moreover, it can be shown that if a problem is linearly separable, then the Perceptron Learning Rule will find a set of weights in a finite number of iterations that solves the problem correctly.

Overview and Review

- Neural network classifiers learn decision boundaries from training data.
- Simple Perceptrons can only cope with linearly separable problems.
- Trained networks are expected to generalize, i.e. deal appropriately with input data they were not trained on.
- One can train networks by iteratively updating their weights.
- The Perceptron Learning Rule will find weights for linearly separable problems in a finite number of iterations.

Hebbian Learning

- In 1949 neuropsychologist Donald Hebb postulated how biological neurons learn:
- "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
- In other words:
- 1. If two neurons on either side of a synapse (connection) are activated simultaneously (i.e. synchronously), then the strength of that synapse is selectively increased.
- This rule is often supplemented by:
- 2. If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.
- so that chance coincidences do not build up connection strengths.

Hebbian Learning Algorithm

- Step 0: Initialize all weights
- For simplicity, set weights and bias to zero
- Step 1: For each input training vector, do steps 2-4
- Step 2: Set activations of input units
- $x_i = s_i$
- Step 3: Set the activation for the output unit
- $y = t$
- Step 4: Adjust weights and bias
- $w_i(\text{new}) = w_i(\text{old}) + x_i y$
- $b(\text{new}) = b(\text{old}) + y$
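
The steps above amount to a single pass over the training set. A minimal sketch, assuming bipolar inputs and targets and using the illustrative name `hebb_train`:

```python
def hebb_train(samples):
    """One pass of the Hebb rule over (input vector, target) pairs."""
    n = len(samples[0][0])
    w = [0] * n           # Step 0: weights and bias start at zero
    b = 0
    for s, t in samples:  # Step 1: visit each training vector once
        x = s             # Step 2: input activations
        y = t             # Step 3: output activation is set to the target
        w = [wi + xi * y for wi, xi in zip(w, x)]  # Step 4: adjust weights
        b = b + y                                  # and bias
    return w, b


# Logical AND with bipolar inputs and targets.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(and_samples))  # ([2, 2], -2)
```

With these weights the net input $b + \sum_i x_i w_i$ is positive only for the input (1, 1), so for this particular data set the learned weights do separate the two classes.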

Hebbian vs Perceptron Learning

- In the notation used for Perceptrons, the Hebbian learning weight update rule is
- $\Delta w_{ij} = out_j \cdot in_i$
- There is strong physiological evidence that this type of learning does take place in the region of the brain known as the hippocampus.
- Recall that the Perceptron learning weight update rule we derived was
- $\Delta w_{ij} = \eta \cdot targ_j \cdot in_i$
- There is some similarity, but it is clear that Hebbian learning is not going to get our Perceptron to learn a set of training data.

Adaline

- Adaline (Adaptive Linear Neuron) was developed by Widrow and Hoff in 1960.
- Uses bipolar activations (-1 and 1) for its input signals and target values
- Weight connections are adjustable
- Trained using the delta rule for weight update
- $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha (targ_j - out_j) x_i$

Adaline Training Algorithm

- Step 0: Initialize weights and bias
- Set weights to small random values
- Set learning rate $\alpha$ ($0 < \alpha < 1$), also written $\eta$
- Step 1: While stopping condition is false, do steps 2-6
- Step 2: For each training pair $s:t$, do steps 3-5
- Step 3: Set activations of input units
- $x_i = s_i$

Adaline Training Algorithm

- Step 4: Compute net input to output unit
- $y_{in} = b + \sum_i x_i w_i$
- Step 5: Update bias and weights
- $w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y_{in}) x_i$
- $b(\text{new}) = b(\text{old}) + \alpha (t - y_{in})$
- Step 6: Test for stopping condition
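
A minimal sketch of this training loop follows. The notes do not specify the stopping condition, so the sketch simply runs for a fixed number of epochs. That choice, the learning rate, the seeded random initialization, and the function names are all assumptions made for illustration.

```python
import random


def adaline_train(samples, alpha=0.05, epochs=200, seed=0):
    """Delta-rule training on (input vector, bipolar target) pairs."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step 0: small random initial weights and bias.
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    b = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):   # Steps 1 and 6: fixed epoch budget (assumed)
        for x, t in samples:  # Steps 2-3: present each training pair
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            error = t - y_in
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]  # Step 5
            b += alpha * error
    return w, b


def adaline_classify(w, b, x):
    """Apply a bipolar threshold to the net input after training."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1


# Logical AND with bipolar inputs and targets.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = adaline_train(and_samples)
print([adaline_classify(w, b, x) for x, _ in and_samples])
```

Unlike the Perceptron rule, the update uses the net input $y_{in}$ rather than the thresholded output, so the weights keep changing slightly even when every pattern is already classified correctly.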

Autoassociative Net

- The feed-forward autoassociative net has the following diagram.
- Useful for determining if something is a part of the test pattern or not
- Weight matrix diagonal is usually zero, which improves generalization
- Hebbian learning can be used if mutually orthogonal vectors are used

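[Figure: feed-forward autoassociative net with input units x1, ..., xi, ..., xn connected to output units y1, ..., yj, ..., ym.]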

BAM Net

- Bidirectional Associative Memory (BAM) net