Introduction to Neural Networks. Backpropagation Algorithm - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Introduction to Neural Networks. Backpropagation Algorithm


1
Lecture 4b
COMP4044 Data Mining and Machine Learning
COMP5318 Knowledge Discovery and Data Mining
  • Introduction to Neural Networks. Backpropagation
    algorithm.
  • Reference: Dunham, pp. 61-66 and 103-114

2
Outline
  • Introduction to Neural Networks
  • What is an artificial neural network?
  • Human nervous system
  • Taxonomy of neural networks
  • Backpropagation algorithm
  • Example
  • Error space
  • Universality of backpropagation
  • Generalization and overfitting
  • Heuristic modifications of backpropagation
  • Convergence example
  • Momentum
  • Learning rate
  • Limitations and capabilities
  • Interesting applications

3
What is an Artificial Neural Network (NN)?
  • A network of many simple units (neurons, nodes)

  • The units are connected by connections
  • Each connection has a numeric weight associated
    with it
  • Units receive inputs (from the environment or
    other units) via the connections. They produce
    output using their weights and the inputs (i.e.
    they operate locally)
  • A NN can be represented as a directed graph

  • NNs learn from examples and exhibit some
    capability for generalization beyond the training
    data
  • knowledge is acquired by the network from its
    environment via learning and is stored in the
    weights of the connections
  • the training (learning) rule is a procedure for
    modifying the weights of the connections in order to
    perform a certain task

4
Neuron Model
  • Each connection from unit i to unit j has a numeric
    weight wij associated with it, which determines
    the strength and the sign of the connection
  • Each neuron first computes the weighted sum of its
    inputs, w·p, and then applies an activation
    function f to derive the output (activation) a
  • A neuron may have a special weight called a bias
    weight b, which is connected to a fixed input of 1.
  • NNs represent a function of their weights
    (parameters). By adjusting the weights, we change
    this function. This is done by using a learning
    rule.

Example: if there are 2 inputs p1 = 2 and p2 = 3, and if
w11 = 3, w12 = 1, b = -1.5, then
a = f(2·3 + 3·1 - 1.5) = f(7.5)
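The computation above can be sketched in a few lines of Python. This is a minimal illustration only: the sigmoid is assumed here as the transfer function f (the slide leaves f unspecified), and the function names are made up for this example.

    import math

    def sigmoid(x):
        # one common choice of transfer function f
        return 1.0 / (1.0 + math.exp(-x))

    def neuron_output(p, w, b, f=sigmoid):
        # weighted sum of the inputs plus the bias, passed through f
        net = sum(wi * pi for wi, pi in zip(w, p)) + b
        return f(net)

    # the example above: p1 = 2, p2 = 3, w11 = 3, w12 = 1, b = -1.5
    a = neuron_output([2, 3], [3, 1], -1.5)   # f(7.5), ~0.9994 for a sigmoid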
5
Artificial NNs vs. Biological NNs?
  • Artificial neurons are
  • (extremely) simple abstractions of biological
    neurons
  • realized as a computer program or specialized
    hardware
  • Networks of artificial neurons
  • do not have a fraction of the power of the human
    brain but can be trained to perform useful
    functions
  • Some of the artificial NNs are models of
    biological NNs, some are not
  • Computational Neuroscience deals with creating
    realistic models of biological neurons and brain
  • The inspiration for the field of NNs came from
    the desire to produce artificial systems capable
    of sophisticated, perhaps "intelligent",
    computations similar to those that the human
    brain routinely performs, and thereby also to
    enhance our understanding of the brain

6
Human Nervous System
  • We have only just begun to understand how our
    neural system operates
  • A huge number of neurons and interconnections
    between them
  • 100 billion (i.e. 10^11) neurons in the brain
  • a full Olympic-sized swimming pool contains 10^10
    raindrops; the number of stars in the Milky Way
    is of the same order of magnitude
  • 10^4 connections per neuron
  • Biological neurons are slower than computers
  • Neurons operate in 10^-3 seconds, computers in
    10^-9 seconds
  • The brain makes up for the slow rate of operation
    by the large number of neurons and connections

7
Efficiency of Biological Neural Systems
For interested students, not examinable
  • The brain performs tasks like pattern
    recognition, perception, motor control many times
    faster than the fastest digital computers
  • efficiency of the sonar system of a bat
  • sonar is an active echo-location system
  • a bat's sonar provides information about the
    distance from a target, its relative velocity,
    size, azimuth, elevation and the size of various
    features of the target
  • the complex neural computations needed to extract
    all this information from the target echo occur
    within a brain the size of a plum!
  • the precision and success rate of the target
    location achieved by the echo-locating bat is
    rather impossible to match by radar or sonar
    engineers
  • How does a human brain or the brain of a bat do
    it?

8
Biological Neurons
  • Purpose of neurons: to transmit information in the
    form of electrical signals
  • a neuron accepts many inputs, which are all added up in
    some way
  • if enough active inputs are received at once, the
    neuron is activated and fires; if not, it
    remains in its inactive state
  • Structure of neuron
  • body (soma) - contains the nucleus, which contains
    the chromosomes
  • dendrites
  • axon
  • synapse - a narrow gap that
    couples the axon with the dendrite of another
    cell
  • there is no direct linkage across the junction; it is a
    chemical one
  • information is passed from one neuron to another
    through synapses

9
Different types of biological neurons
10
Operation of biological neurons
  • Signals are transmitted between neurons by
    electrical pulses (action potentials, AP)
    traveling along the axon
  • When the potential at the synapse is raised
    sufficiently by the AP, it releases chemicals
    called neurotransmitters
  • it may take the arrival of more than one AP
    before the synapse is triggered
  • The neurotransmitters diffuse across the gap and
    chemically activate gates on the dendrites, that
    allows charged ions to flow
  • The flow of ions alters the potential of the
    dendrite and provides a voltage pulse on the
    dendrite (post-synaptic-potential, PSP)
  • some synapses excite the dendrite they affect,
    while others inhibit it
  • the synapses also determine the strength of the
    new input signal
  • each PSP travels along its dendrite and spreads
    over the soma
  • the soma sums the effects of thousands of PSPs; if
    the resulting potential exceeds a threshold, the
    neuron fires and generates an AP

11
Learning in Biological NNs
  • We were born with some of our neural structures;
    others have been established by experience
  • At the early stage of the human brain development
    (first 2 years) about 1 million synapses are
    formed per second
  • Synapses are then modified through the learning
    process
  • Learning is achieved by
  • creation of new synaptic connections between
    neurons
  • modification of existing synapses
  • The synapses are thought to be mainly responsible
    for learning
  • 1949, Hebb proposed his famous learning rule
  • The strength of a synapse between 2 neurons is
    increased by the repeated activation of one
    neuron by the other across the synapse

12
Correspondence Between Artificial and Biological
Neurons
  • How does this artificial neuron relate to the
    biological one?
  • input p (or input vector p) - input signal (or
    signals) at the dendrite
  • weight w (or weight vector w) - strength of the
    synapse (or synapses)
  • summer and transfer function - cell body
  • neuron output a - signal at the axon

13
Taxonomy of NNs
  • Active phase: feedforward (acyclic) and
    recurrent (cyclic, feedback)
  • Learning phase: supervised and unsupervised
  • Feedforward supervised networks
  • typically used for classification and function
    approximation
  • perceptrons, ADALINEs, backpropagation networks,
    RBF, Learning Vector Quantization (LVQ) networks
  • Feedforward unsupervised networks
  • Hebbian networks used for associative learning
  • Competitive networks performing clustering and
    visualization, e.g. Self-Organizing Kohonen
    Feature Maps (SOM)
  • Recurrent networks temporal data processing
  • recurrent backpropagation, associative memories,
    adaptive resonance networks

14
Backpropagation algorithm
15
Neural Network (NN) Model
  • Computational model consisting of 3 parts
  • 1) Architecture: neurons and connections
  • input, hidden, output neurons
  • fully or partially connected
  • neuron model: computation performed by each
    neuron (type of transfer function)
  • initialization of the weights
  • 2) Learning algorithm
  • how are the weights of the connections changed in
    order to facilitate learning
  • Goal for classification tasks: mapping between
    the input examples and the classes
  • 3) Recall technique: how is the information
    obtained from the NN
  • for classification tasks: how is the class of a
    new example determined

16
Backpropagation Network - Architecture
  • 1) A network with 1 or more hidden layers
  • e.g. a NN for the iris data

[Figure: layered network for the iris data - inputs (e.g. 1 input neuron for each attribute), hidden neurons (1 hidden layer), output neurons (e.g. 1 output neuron for each class)]
  • 2) Feedforward network - each neuron receives
    input only from the neurons in the previous layer
  • 3) Typically fully connected - all neurons in a
    layer are connected with all neurons in the next
    layer
  • 4) Weight initialization: small random values,
    e.g. in [-1, 1]

17
Backpropagation Network Architecture 2
  • 5) Neuron model - weighted sum of input signals,
    differentiable transfer function

a = f(wp + b)
  • any differentiable transfer function f can be
    used most frequently the sigmoid and tan-sigmoid
    (hyperbolic tangent sigmoid) functions are used

18
Architecture Number of Input Units
  • Numerical data - typically 1 input unit for each
    attribute
  • Categorical data - 1 input unit for each
    attribute value
  • How many input units for the weather data?

[Figure: one input unit per attribute value - outlook (sunny, overcast, rainy), temperature (hot, mild, cool), humidity (high, normal), windy (false, true)]
  • Encoding of the input examples: typically binary,
    depending on the value of the attribute (on and
    off)
  • e.g. ex.1 → 1 0 0  1 0 0  1 0  0 1 (see the sketch below)
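A minimal sketch of this binary encoding in Python; the attribute and value lists mirror the weather data above, while the function and variable names are illustrative, not part of the lecture.

    # possible values for each weather attribute, in a fixed order
    ATTRIBUTES = [
        ("outlook", ["sunny", "overcast", "rainy"]),
        ("temperature", ["hot", "mild", "cool"]),
        ("humidity", ["high", "normal"]),
        ("windy", ["false", "true"]),
    ]

    def encode(example):
        # one input unit per attribute value: 1 if the value is "on", 0 otherwise
        bits = []
        for name, values in ATTRIBUTES:
            bits += [1 if example[name] == value else 0 for value in values]
        return bits

    # e.g. an example with outlook=sunny, temperature=hot, humidity=high, windy=true
    print(encode({"outlook": "sunny", "temperature": "hot",
                  "humidity": "high", "windy": "true"}))
    # -> [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]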

19
Number of Output Units
  • Typically 1 neuron for each class

ex.1: 1 0 0  1 0 0  1 0  0 1 → target class 1 0
  • Encoding of the targets (classes): typically
    binary
  • e.g. class 1 (no) → 1 0, class 2 (yes) → 0 1

20
Number of Hidden Layers and Units in Them
  • An art! Typically - by trial and error
  • The task constrains the number of inputs and
    output units but not the number of hidden layers
    and neurons in them
  • Too many hidden layers and units (i.e. too many
    weights) → overfitting
  • Too few → underfitting, i.e. the NN is not able
    to learn the input-output mapping
  • A heuristic to start with: 1 hidden layer with n
    hidden neurons, n = (inputs + output_neurons)/2,
    e.g. (10 + 2)/2 = 6 for the weather data

21
Learning in Backpropagation NNs
  • Supervised learning: the training data
  • consists of labeled examples (p, d), i.e. the
    desired output d for them is given (p - input
    vector, d - desired output)
  • can be viewed as a teacher during the training
    process
  • error - difference between the desired output d and
    the actual network output a
  • Idea of backpropagation learning
  • For each training example p
  • Propagate p through the network and calculate the
    output a . Compare the desired d with the actual
    output a and calculate the error
  • Update weights of the network to reduce the
    error
  • Until error over all examples < threshold
  • Why "backpropagation"? It adjusts the weights
    backwards (from the output to the input units) by
    propagating the weight change

22
Backpropagation Learning - 2
  • Sum of Squared Errors (E) is a classical measure
    of error
  • E for a single training example, over all output
    neurons: E = 1/2 Σi (di - ai)²
  • di - desired, ai - actual network output for
    output neuron i
  • Thus, backpropagation learning can be viewed as
    an optimization search in the weight space
  • Goal state: the set of weights for which the
    performance index (error) is minimum
  • Search method: hill climbing

23
Error Landscape in Weight Space
  • E is a function of the weights
  • Several local minima and one global minimum

[Figure: E as a function of w1 and w2]
  • How to minimize the error? Take steps downhill
  • Not guaranteed to find the global minimum except
    in the (glorious) situation where there is only
    one global minimum
  • How to get to the bottom as fast as possible?
    (i.e. we need to know what direction to move that
    will make the largest reduction in error)

24
Steepest Gradient Descent
  • The direction of steepest descent is given by the
    gradient and can be computed (∂E/∂w)
  • A function decreases most rapidly when the
    direction of movement is the direction of the
    negative of the gradient
  • Hence, we want to adjust the weights so that the
    change moves the system down the error surface in
    the direction of the locally steepest descent,
    given by the negative of the gradient
  • η - learning rate, defines the step size;
    typically in the range (0, 1)
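In symbols, each weight is changed by Δw = -η ∂E/∂w. A minimal, illustrative Python sketch of one such update (the function name, and the assumption that the gradients are already available, are mine rather than the lecture's):

    def gradient_step(weights, gradients, learning_rate=0.1):
        # move each weight a small step against its error gradient:
        # w_new = w_old - eta * dE/dw
        return [w - learning_rate * g for w, g in zip(weights, gradients)]

    # e.g. weights [0.5, -0.3] with gradients [0.2, -0.1] and eta = 0.1
    print(gradient_step([0.5, -0.3], [0.2, -0.1]))   # -> approximately [0.48, -0.29]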

25
Backpropagation Algorithm - Idea
  • The backpropagation algorithm adjusts weights by
    working backward from the output layer to the
    input layer
  • Calculate the error and propagate this error from
    layer to layer
  • 2 approaches
  • Incremental: the weights are adjusted after each
    training example is applied
  • Also called approximate steepest descent
  • Preferred as it requires less space
  • Batch: weights are adjusted once, after all
    training examples are applied and a total error
    has been calculated
  • Solid lines - forward propagation of signals
  • Dashed lines backward propagation of error

26
For interested students, not examinable
Backpropagation - Derivation
  • a neural network with one hidden layer
    indexes
  • i over output neurons, j over hidden, k over
    inputs
  • E (over the output neurons, for the current input
    vector p, i.e. incremental mode):
    E = 1/2 Σi (di - oi)²

di - target output of neuron i for p; oi - actual
output of neuron i for p
  • Express E in terms of weights and input signals
  • 1. Input for the hidden neuron j for p:
    netj = Σk wkj pk + bj

2. Activation of neuron j as a function of its
input: oj = f(netj)
27
For interested students, not examinable
Backpropagation Derivation - 2
3. Input for the output neuron i: neti = Σj wji oj + bi
4. Output for the output neuron i: oi = f(neti)
5. Substituting 4 into E: E = 1/2 Σi (di - f(neti))²
28
For interested students, not examinable
Backpropagation Derivation - 3
6. Steepest gradient descent: adjust the weights
proportionally to the negative of the error
gradient. For a weight wji to an output neuron:
Δwji = -η ∂E/∂wji = η (di - oi) f'(neti) oj = η δi oj   <- chain rule
For a weight wkj to a hidden neuron:
Δwkj = -η ∂E/∂wkj = η δj pk, where δj = f'(netj) Σi wji δi
29
Backpropagation Rule - Summary
For the weight w from a neuron with output o into neuron q: Δw = η δq o, where
δq = (dq - oq) f'(netq) if q is an output neuron
δq = f'(netq) Σi wqi δi if q is a hidden neuron
( i is over the nodes in the layer above q)
30
Derivative of Sigmoid Activation Function
  • From the formulas for δ it is clear that we must be able
    to calculate the derivative of f. For a
    sigmoid transfer function: f'(net) = f(net)(1 - f(net)) = o(1 - o)
  • Thus, the backpropagation errors for a network with a
    sigmoid transfer function are:
  • q is an output neuron: δq = (dq - oq) oq (1 - oq)
  • q is a hidden neuron: δq = oq (1 - oq) Σi wqi δi
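These formulas translate directly into code. A minimal sketch, assuming sigmoid units (the function names are illustrative):

    def sigmoid_derivative(o):
        # for a sigmoid, f'(net) can be written in terms of the output o = f(net)
        return o * (1.0 - o)

    def delta_output(d, o):
        # delta for an output neuron: (d - o) * o * (1 - o)
        return (d - o) * sigmoid_derivative(o)

    def delta_hidden(o, weights_to_layer_above, deltas_of_layer_above):
        # delta for a hidden neuron: o * (1 - o) * sum of w * delta over the layer above
        s = sum(w * dl for w, dl in zip(weights_to_layer_above, deltas_of_layer_above))
        return sigmoid_derivative(o) * s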

31
Backpropagation Algorithm - Summary
  • 1. Determine the architecture of the network
  • how many input and output neurons what output
    encoding
  • hidden neurons and layers
  • 2. Initialize all weights (biases incl.) to
    small random values, typically in [-1, 1]
  • 3. Repeat until termination criterion satisfied
  • (forward pass) Present a training example and
    propagate it through the network to calculate the
    actual output
  • (backward pass) Compute the error (the δ
    values for the output neurons)
  • Starting with the output layer, repeat for
    each layer in the network:
  • - propagate the δ
    values back to the previous layer
  • - update the
    weights between the two layers
  • The stopping criterion is checked at the end of
    each epoch:
  • The error (mean absolute or mean squared) is
    below a threshold
  • All training examples are propagated and the
    total error is calculated
  • The threshold is determined heuristically, e.g.
    0.3
  • Maximum number of epochs is reached
  • Early stopping using a validation set
  • It typically takes hundreds or thousands of
    epochs for a NN to converge
  • Try Matlab's demo nnd11bc!
  • epoch = 1 pass through the training set (a compact
    sketch of the whole loop is given below)
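The whole procedure, in incremental mode with a single hidden layer of sigmoid units, can be sketched as below. This is a minimal illustration written for this transcript (plain Python rather than the Matlab demos); all names, the data format (a list of (input, target) pairs) and the default parameter values are assumptions, not part of the lecture.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train(examples, n_in, n_hidden, n_out, eta=0.1, epochs=1000, threshold=0.3):
        # weights[j][k] connects input k to hidden neuron j; the last entry is the bias weight
        w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        w_out = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

        for epoch in range(epochs):
            total_error = 0.0
            for p, d in examples:  # incremental mode: update after every example
                # forward pass
                h = [sigmoid(sum(w[k] * x for k, x in enumerate(p)) + w[-1]) for w in w_hidden]
                o = [sigmoid(sum(w[j] * hj for j, hj in enumerate(h)) + w[-1]) for w in w_out]
                total_error += 0.5 * sum((di - oi) ** 2 for di, oi in zip(d, o))

                # backward pass: deltas (sigmoid derivative written as o * (1 - o))
                delta_o = [(di - oi) * oi * (1 - oi) for di, oi in zip(d, o)]
                delta_h = [hj * (1 - hj) * sum(w_out[i][j] * delta_o[i] for i in range(n_out))
                           for j, hj in enumerate(h)]

                # weight updates: delta_w = eta * delta * input signal (bias input is 1)
                for i in range(n_out):
                    for j in range(n_hidden):
                        w_out[i][j] += eta * delta_o[i] * h[j]
                    w_out[i][-1] += eta * delta_o[i]
                for j in range(n_hidden):
                    for k in range(n_in):
                        w_hidden[j][k] += eta * delta_h[j] * p[k]
                    w_hidden[j][-1] += eta * delta_h[j]

            if total_error < threshold:  # stopping criterion checked at the end of each epoch
                break
        return w_hidden, w_out

    # e.g. the 2-d toy problem used later in the lecture: 2 inputs, 3 hidden, 2 outputs
    data = [([0.6, 0.1], [1, 0]), ([0.2, 0.3], [0, 1])]
    w_hidden, w_out = train(data, n_in=2, n_hidden=3, n_out=2)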

32
How to Determine if an Example is Correctly
Classified?
  • Accuracy may be used to evaluate performance
    once training has finished, or as a stopping
    criterion checked at the end of each epoch
  • Binary encoding
  • apply each example and get the resulting
    activations of the output neurons; the example
    will belong to the class corresponding to the
    output neuron with the highest activation
  • Example: 3 classes, the outputs for ex. X are
    0.3, 0.7, 0.5 => ex. X belongs to class 2
  • i.e. each output value is regarded as the
    probability of the example belonging to the class
    corresponding to this output
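A minimal sketch of this recall step (the names are illustrative; the returned class index is 0-based, so the slide's "class 2" corresponds to index 1):

    def predict_class(outputs):
        # the example belongs to the class whose output neuron has the highest activation
        return max(range(len(outputs)), key=lambda i: outputs[i])

    print(predict_class([0.3, 0.7, 0.5]))   # -> 1, i.e. the second class ("class 2" above)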

33
Backpropagation - Example
  • 2 classes, 2-d input data
  • training set
  • ex.1: 0.6 0.1 - class 1 (banana)
  • ex.2: 0.2 0.3 - class 2 (orange)
  • Network architecture
  • How many inputs?
  • How many hidden neurons?
  • Heuristic:
  • n = (inputs + output_neurons)/2
  • How many output neurons?
  • What encoding of the outputs?
  • 1 0 for class 1, 0 1 for class 2
  • Initial weights and learning rate
  • Let's set η = 0.1 and the weights as in the
    picture

34
Backpropagation Example (cont. 1)
  • 1. Forward pass for ex. 1 - calculate the
    outputs o6 and o7
  • o1 = 0.6, o2 = 0.1, target output: 1 0, i.e.
    class 1
  • Activations of the hidden units:
  • net3 = o1·w13 + o2·w23 + b3 = 0.6·0.1 + 0.1·(-0.2) + 0.1 = 0.14
  • o3 = 1/(1 + e^-net3) = 0.53
  • net4 = o1·w14 + o2·w24 + b4 = 0.6·0 + 0.1·0.2 + 0.2 = 0.22
  • o4 = 1/(1 + e^-net4) = 0.55
  • net5 = o1·w15 + o2·w25 + b5 = 0.6·0.3 + 0.1·(-0.4) + 0.5 = 0.64
  • o5 = 1/(1 + e^-net5) = 0.65
  • Activations of the output units:
  • net6 = o3·w36 + o4·w46 + o5·w56 + b6 = 0.53·(-0.4) + 0.55·0.1 + 0.65·0.6 - 0.1 = 0.13
  • o6 = 1/(1 + e^-net6) = 0.53
  • net7 = o3·w37 + o4·w47 + o5·w57 + b7 = 0.53·0.2 + 0.55·(-0.1) + 0.65·(-0.2) + 0.6 = 0.52
  • o7 = 1/(1 + e^-net7) = 0.63

35
Backpropagation Example (cont. 2)
  • 2. Backward pass for ex. 1
  • Calculate the output errors δ6 and δ7 (note
    that d6 = 1, d7 = 0 for class 1)
  • δ6 = (d6 - o6)·o6·(1 - o6) = (1 - 0.53)·0.53·(1 - 0.53) = 0.12
  • δ7 = (d7 - o7)·o7·(1 - o7) = (0 - 0.63)·0.63·(1 - 0.63) = -0.15
  • Calculate the new weights between the hidden and
    output units (η = 0.1):
  • Δw36 = η·δ6·o3 = 0.1·0.12·0.53 = 0.006
  • w36new = w36old + Δw36 = -0.4 + 0.006 = -0.394
  • Δw37 = η·δ7·o3 = 0.1·(-0.15)·0.53 = -0.008
  • w37new = w37old + Δw37 = 0.2 - 0.008 = 0.192
  • Similarly for w46new, w47new, w56new and
    w57new
  • For the biases b6 and b7 (remember, biases are
    weights with input 1):
  • Δb6 = η·δ6·1 = 0.1·0.12 = 0.012
  • b6new = b6old + Δb6 = -0.1 + 0.012 = -0.088
  • Similarly for b7

36
Backpropagation Example (cont. 3)
  • Calculate the errors of the hidden units δ3, δ4
    and δ5
  • δ3 = o3·(1 - o3)·(w36·δ6 + w37·δ7)
  •    = 0.53·(1 - 0.53)·(-0.4·0.12 + 0.2·(-0.15)) = -0.019
  • Similarly for δ4 and δ5
  • Calculate the new weights between the input and
    hidden units (η = 0.1):
  • Δw13 = η·δ3·o1 = 0.1·(-0.019)·0.6 = -0.0011
  • w13new = w13old + Δw13 = 0.1 - 0.0011 = 0.0989
  • Similarly for w23new, w14new, w24new, w15new
    and w25new; b3, b4 and b5
  • 3. Repeat the same procedure for the other
    training examples
  • Forward pass for ex. 2, backward pass for ex. 2
  • Forward pass for ex. 3, backward pass for ex. 3
  • Note: it's better to apply the input examples in
    random order (a sketch that reproduces the numbers
    above is given below)
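The numbers above can be reproduced with a short Python sketch; the weight values used here are the ones implied by the arithmetic on these slides (the network picture itself is not part of this transcript), so treat them as a reconstruction rather than the original figure.

    import math
    sig = lambda x: 1 / (1 + math.exp(-x))

    o1, o2, eta = 0.6, 0.1, 0.1            # ex. 1, learning rate 0.1
    d6, d7 = 1, 0                          # target output for class 1

    # forward pass (hidden layer, then output layer)
    o3 = sig(o1 * 0.1 + o2 * (-0.2) + 0.1)                 # net3 = 0.14, o3 ~ 0.53
    o4 = sig(o1 * 0.0 + o2 * 0.2 + 0.2)                    # net4 = 0.22, o4 ~ 0.55
    o5 = sig(o1 * 0.3 + o2 * (-0.4) + 0.5)                 # net5 = 0.64, o5 ~ 0.65
    o6 = sig(o3 * (-0.4) + o4 * 0.1 + o5 * 0.6 - 0.1)      # o6 ~ 0.53
    o7 = sig(o3 * 0.2 + o4 * (-0.1) + o5 * (-0.2) + 0.6)   # o7 ~ 0.63

    # backward pass
    delta6 = (d6 - o6) * o6 * (1 - o6)                         # ~ 0.12
    delta7 = (d7 - o7) * o7 * (1 - o7)                         # ~ -0.15
    delta3 = o3 * (1 - o3) * ((-0.4) * delta6 + 0.2 * delta7)  # ~ -0.019

    # sample weight updates
    w36_new = -0.4 + eta * delta6 * o3     # ~ -0.394
    w13_new = 0.1 + eta * delta3 * o1      # ~ 0.0989
    print(round(o6, 2), round(o7, 2), round(w36_new, 3), round(w13_new, 4))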

37
Backpropagation Example (cont. 4)
  • 4. At the end of the epoch, check if the
    stopping criterion is satisfied
  • if yes - stop training
  • if not - continue training with a new
    epoch
  • (go to step 1)

38
Steepest Gradient Descent
  • Does the gradient descent guarantee that after
    each adjustment the error will be reduced? No!
  • Not optimal - is guaranteed to find a minimum but
    it might be a local minimum!
  • a local minimum may be a good enough solution

Backpropagation's error space: many local and 1
global minimum
39
Universality of Backpropagation
  • Boolean functions
  • Every boolean function of the inputs can be
    represented by a network with a single hidden layer
  • Continuous functions - universal approximation
    theorems
  • Any continuous function can be approximated with
    arbitrarily small error by a network with one
    hidden layer (Cybenko 1989, Hornik et al. 1989)
  • Any function (incl. discontinuous) can be
    approximated to arbitrarily small error by a
    network with two hidden layers (Cybenko 1988)
  • These are existence theorems: they say the
    solution exists but don't say how to choose the
    number of hidden neurons!
  • For a given network it is hard to say exactly
    which functions can be represented and which ones
    cannot

40
Overfitting
  • Occurs when
  • Training examples are noisy
  • Number of the free (trainable) parameters is
    bigger than the number of training examples
  • The network has been trained too long
  • Preventing overtraining
  • Use a network that is just large enough to provide
    an adequate fit
  • Occam's Razor: don't use a bigger net when a
    smaller one will work
  • The network should not have more free parameters
    than there are training examples!
  • However, it is difficult to know beforehand how
    large a network should be for a specific
    application!

41
Preventing Overtraining - Validation Set Approach
  • Also called an early stopping method
  • Available data is divided into 3 subsets
  • Training set
  • Used for computing the gradient and updating the
    weights
  • Validation set
  • The error on the validation set is monitored
    during the training
  • This error will normally decrease during the
    initial phase of training (as does the training
    error)
  • However, when the network begins to overfit the
    data, the error on the validation set will
    typically begin to rise
  • Training is stopped when the error on the
    validation set increases for a pre-specified
    number of iterations; the weights and biases
    at the minimum of the validation set error are
    returned
  • Testing set
  • Not used during training but to compare different
    algorithms once training has completed
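A minimal sketch of this early-stopping logic in Python; the callables `train_step` and `validation_error`, the `patience` parameter and all other names are illustrative placeholders, not part of the lecture.

    def train_with_early_stopping(train_step, validation_error, max_epochs=1000, patience=10):
        # stop when the validation error has not improved for `patience` consecutive
        # epochs, and return the weights from the best (lowest validation error) epoch
        best_error, best_weights, epochs_without_improvement = float("inf"), None, 0
        for epoch in range(max_epochs):
            weights = train_step()                 # one epoch of training on the training set
            error = validation_error(weights)      # error on the held-out validation set
            if error < best_error:
                best_error, best_weights, epochs_without_improvement = error, weights, 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break
        return best_weights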

42
Error Surface and Convergence - Example
  • Path b gets trapped in a local minimum
  • What can be done? Try different initializations
  • Path a converges to the optimum solution but is
    very slow
  • What can we do?

Try nnd12sd1!
43
Speeding up the Convergence
  • Solution 1: Increase the learning rate
  • Faster on the flat part but unstable when falling
    into the steep valley that contains the minimum
    point - overshooting the minimum
  • Try nnd12sd2!
  • Solution 2: Smooth out the trajectory by
    averaging the weight updates, e.g. make the current
    update dependent on the previous one
  • The use of momentum might smooth out the
    oscillations and produce a stable trajectory

44
Backpropagation with Momentum - Example
  • Example: the same learning rate and initial
    position
  • Smooth and faster convergence
  • Stable algorithm
  • By the use of momentum we can use a larger
    learning rate while maintaining the stability of
    the algorithm

[Figure: squared error during training]
  • Try nnd12mo!
  • Typical momentum values used in practice: 0.6-0.9
    (a sketch of the update is given below)
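A minimal sketch of a momentum-smoothed weight update; the function name and the momentum value of 0.8 (which lies in the 0.6-0.9 range quoted above) are illustrative choices, not the lecture's.

    def momentum_update(weight, gradient, previous_delta, eta=0.1, alpha=0.8):
        # blend the new gradient step with the previous step:
        # delta_w(t) = alpha * delta_w(t-1) - eta * dE/dw
        delta = alpha * previous_delta - eta * gradient
        return weight + delta, delta

    # the returned delta is carried over to the next call as previous_delta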

45
More on the Learning Rate
  • Constant throughout training (standard steepest
    descent)
  • The performance is very sensitive to the proper
    setting of the learning rate
  • Too small - slow convergence
  • Too big - oscillation, overshooting of the
    minimum
  • It is not possible to determine the optimum
    learning rate before training, as it changes
    during training and depends on the error surface
  • Variable learning rate
  • goal: keep the learning rate as large as possible
    while keeping learning stable
  • Several algorithms have been proposed

46
Limitations and Capabilities
  • Multilayer perceptrons (MLPs) trained with
    backpropagation can perform function
    approximation and pattern classification
  • Theoretically they can
  • Perform any linear and non-linear computation
  • Approximate any reasonable function arbitrarily
    well
  • => they are able to overcome the limitations of
    earlier NNs (perceptrons and ADALINEs)
  • In practice
  • May not always find a solution - can be trapped
    in a local minimum
  • Performance is sensitive to the starting
    conditions (weight initialization)
  • Sensitive to the number of hidden layers and
    neurons
  • Too few neurons - underfitting, unable to learn
    what you want it to learn
  • Too many - overfitting, learns slowly
  • => the architecture of an MLP network is not
    completely constrained by the problem to be
    solved, as the number of hidden layers and
    neurons is left to the designer

47
Limitations and Capabilities cont.
  • Sensitive to the value of the learning rate
  • Too small - slow learning
  • Too big - instability or poor performance
  • The proper choice depends on the nature of the
    examples
  • Trial and error
  • Refer to the choices that have worked well in
    similar problems
  • => successful application of NNs requires time
    and experience
  • Backpropagation - summary
  • uses a steepest descent algorithm for minimizing
    the mean squared error
  • Gradient descent (GD)
  • Standard GD is slow as it requires a small learning
    rate for stable learning
  • GD with momentum is faster as it allows a higher
    learning rate while maintaining stability
  • There are several variations of the
    backpropagation algorithm

48
Some Interesting NN Applications
  • A few examples of the many significant
    applications of NNs
  • You can use them for the paper presentation in
    w12 and 13!
  • Network design was the result of several months of
    trial-and-error experimentation
  • Moral: NNs are widely applicable but they cannot
    magically solve problems; wrong choices lead to
    poor performance
  • "NNs are the second best way of doing just about
    anything" - John Denker
  • NNs provide passable performance on many tasks
    that would be difficult to solve explicitly with
    other techniques

49
For interested students only, not examinable
NETtalk
  • Sejnowski and Rosenberg, 1987
  • Pronunciation of written English
  • Fascinating problem in linguistics
  • Task with high commercial profit
  • How?
  • Mapping the text stream to phonemes
  • Passing the phonemes to a speech generator
  • Task for the NN: learning to map the text to
    phonemes
  • Good task for a NN, as most of the rules are
    approximately correct
  • E.g. the "c" in cat → /k/, the "c" in century → /s/

50
For interested students only, not examinable
NETtalk -Architecture
  • 203 input neurons: 7 (sliding window - the
    character to be pronounced and the 3 characters
    before and after it) x 29 possible characters (26
    letters plus blank, period and other punctuation)
  • 80 hidden
  • 26 output, corresponding to the phonemes

51
For interested students only, not examinable
NETtalk - Performance
  • Training set
  • 1024 words hand-transcribed into phonemes
  • Accuracy on the training set: 90% after 50 epochs
  • Why not 100%?
  • A few dozen hours of training time; a few months
    of experimentation with different architectures
  • Testing
  • Accuracy: 78%
  • Importance
  • A good showpiece for the philosophy of NNs
  • The network appears to mimic the speech patterns
    of young children: incomprehensible babble at first (as
    the weights are random), then gradually improving
    until it becomes understandable

52
For interested students only, not examinable
Handwritten Character Recognition
  • Le Cun et al., 1989
  • Reading zip codes on hand-addressed envelopes
  • Task for the NN
  • A preprocessor is used to recognize the segments
    of the individual digits
  • Based on the segments, the network has to
    identify the digits
  • Network architecture
  • 256 input neurons: 16x16 array of pixels
  • 3 hidden layers: 768, 192, 30 neurons
    respectively
  • 10 output neurons: digits 0-9
  • Not a fully connected network
  • If it were a fully connected network: 200,000
    connections (impossible to train); instead only
    9,760 connections
  • Units in the hidden layers act as feature
    detectors, e.g. each unit in the 1st hidden
    layer is connected to 25 input neurons
    (a 5x5 pixel region)

53
For interested students only, not examinable
Handwritten Character Recognition cont.
  • Training: 7,300 examples
  • Testing: 2,000 examples
  • Accuracy: 99%
  • Hardware implementation (in VLSI)
  • enables letters to be sorted at high speed by
    zip code
  • One of the largest applications of NNs

54
For interested students only, not examinable
Driving Motor Vehicles
  • Pomerleau et al., 1993
  • ALVINN (Autonomous Land Vehicle In a Neural
    Network)
  • Learns to drive a van along a single lane on a
    highway
  • Once trained on a particular road, ALVINN can
    drive at speeds > 40 miles per hour
  • Chevy van and US Army HMMWV personnel carrier
  • computer-controlled steering, acceleration and
    braking
  • sensors: color stereo video camera, radar,
    positioning system, scanning laser range finders

55
For interested students only, not examinable
ALVINN - Architecture
  • Fully connected backpropagation NN with 1 hidden
    layer
  • 960 input neurons: the signal from the camera is
    preprocessed to yield a 30x32 image intensity grid
  • 5 hidden neurons
  • 32 output neurons, corresponding to directions
  • If the output node with the highest activation is
  • the left-most, then ALVINN turns sharply left
  • the right-most, then ALVINN turns sharply right
  • a node between them, then ALVINN directs the van
    in a proportionally intermediate direction
  • Smoothing the direction: it is calculated as the
    average suggested not only by the output node
    with the highest activation but also by that node's
    immediate neighbours
  • Training examples (image-direction pairs)
  • Recording such pairs while a human drives the
    vehicle
  • After collecting 5 minutes of such data and 10 minutes of
    training, ALVINN can drive on its own

56
For interested students only, not examinable
ALVINN - Training
  • Training examples (image-direction pairs)
  • Recording such pairs while a human drives the
    vehicle
  • After collecting 5 min of such data and 10 min of
    training, ALVINN can drive on its own
  • Potential problem: as the human is too good and
    (typically) does not stray from the lane, there
    are no training examples that show how to recover
    when you are misaligned with the road
  • Solution: ALVINN corrects this by creating
    synthetic training examples - it rotates each
    video image to create additional views of what
    the road would look like if the van were a little
    off course to the left or right

57
For interested students only, not examinable
ALVINN - Results
  • Impressive results
  • ALVINN has driven at speeds up to 70 miles per
    hour for up to 90 miles on public highways near
    Pittsburgh
  • Also at normal speeds on single-lane dirt roads,
    paved bike paths, and two-lane suburban streets
  • Limitations
  • Unable to drive on a road type for which it
    hasn't been trained
  • Not very robust to changes in lighting conditions
    and the presence of other vehicles
  • Comparison with traditional vision algorithms
  • These use image processing to analyse the scene and
    find the road, and then follow it
  • Most of them achieve 3-4 miles per hour

58
For interested students only, not examinable
ALVINN - Discussion
  • Why is ALVINN so successful?
  • Fast computation - once trained, the NN is able
    to compute a new steering direction 10 times a
    second => the computed direction can be off by
    10 from the ideal as long as the system is able
    to make a correction within a few tenths of a second
  • Learning from examples is very appropriate
  • There is no good theory of driving, but it is easy to
    collect examples - this motivated the use of learning
    algorithms (but not necessarily NNs)
  • Driving is a continuous, noisy domain in which
    almost all features contribute some information
    => NNs are a better choice than some other learning
    algorithms (e.g. DTs)