Introduction to Neural Networks. Backpropagation Algorithm - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Introduction to Neural Networks. Backpropagation Algorithm


1
Lecture 4b
COMP4044 Data Mining and Machine Learning
COMP5318 Knowledge Discovery and Data Mining
  • Introduction to Neural Networks. Backpropagation
    algorithm.
  • Reference: Dunham, pp. 61-66 and 103-114

2
Outline
  • Introduction to Neural Networks
  • What is an artificial neural network?
  • Human nervous system
  • Taxonomy of neural networks
  • Backpropagation algorithm
  • Example
  • Error space
  • Universality of backpropagation
  • Generalization and overfitting
  • Heuristic modifications of backpropagation
  • Convergence example
  • Momentum
  • Learning rate
  • Limitations and capabilities
  • Interesting applications

3
What is an Artificial Neural Network (NN)?
  • A network of many simple units (neurons, nodes)

  • The units are connected by connections
  • Each connection has a numeric weight associated
    with it
  • Units receive inputs (from the environment or
    other units) via the connections. They produce
    output using their weights and the inputs (i.e.
    they operate locally)
  • A NN can be represented as a directed graph

  • NNs learn from examples and exhibit some
    capability for generalization beyond the training
    data
  • knowledge is acquired by the network from its
    environment via learning and is stored in the
    weights of the connections
  • the training (learning) rule is a procedure for
    modifying the weights of the connections in order to
    perform a certain task

4
Neuron Model
  • Each connection from unit i to unit j has a numeric
    weight wij associated with it, which determines
    the strength and the sign of the connection
  • Each neuron first computes the weighted sum of its
    inputs, w·p, and then applies an activation
    function f to derive the output (activation) a
  • A neuron may have a special weight called a bias
    weight b, which is connected to a fixed input of 1.
  • NNs represent a function of their weights
    (parameters). By adjusting the weights, we change
    this function. This is done by using a learning
    rule.

Example: if there are 2 inputs p1 = 2 and p2 = 3, and if
w11 = 3, w12 = 1, b = -1.5, then
a = f(2·3 + 3·1 - 1.5) = f(7.5)
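The computation above can be sketched in a few lines of Python. This is a minimal illustration only: the sigmoid is assumed here as the transfer function f (the slide leaves f unspecified), and the function names are made up for this example.

    import math

    def sigmoid(x):
        # one common choice of transfer function f
        return 1.0 / (1.0 + math.exp(-x))

    def neuron_output(p, w, b, f=sigmoid):
        # weighted sum of the inputs plus the bias, passed through f
        net = sum(wi * pi for wi, pi in zip(w, p)) + b
        return f(net)

    # the example above: p1 = 2, p2 = 3, w11 = 3, w12 = 1, b = -1.5
    a = neuron_output([2, 3], [3, 1], -1.5)   # f(7.5), ~0.9994 for a sigmoid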
5
Artificial NNs vs. Biological NNs?
  • Artificial neurons are
  • (extremely) simple abstractions of biological
    neurons
  • realized as a computer program or specialized
    hardware
  • Networks of artificial neurons
  • do not have a fraction of the power of the human
    brain but can be trained to perform useful
    functions
  • Some of the artificial NNs are models of
    biological NNs, some are not
  • Computational Neuroscience deals with creating
    realistic models of biological neurons and brain
  • The inspiration for the field of NNs came from
    the desire to produce artificial systems capable
    of sophisticated, perhaps "intelligent",
    computations similar to those that the human
    brain routinely performs, and thereby also to
    enhance our understanding of the brain

6
Human Nervous System
  • We have only just begun to understand how our
    neural system operates
  • A huge number of neurons and interconnections
    between them
  • 100 billion (i.e. 10^11) neurons in the brain
  • a full Olympic-sized swimming pool contains 10^10
    raindrops; the number of stars in the Milky Way
    is of the same order of magnitude
  • 10^4 connections per neuron
  • Biological neurons are slower than computers
  • Neurons operate in 10^-3 seconds, computers in
    10^-9 seconds
  • The brain makes up for the slow rate of operation
    by the large number of neurons and connections

7
Efficiency of Biological Neural Systems
For interested students, not examinable
  • The brain performs tasks like pattern
    recognition, perception, motor control many times
    faster than the fastest digital computers
  • efficiency of the sonar system of a bat
  • sonar is an active echo-location system
  • a bat's sonar provides information about the
    distance from a target, its relative velocity,
    size, azimuth, elevation and the size of various
    features of the target
  • the complex neural computations needed to extract
    all this information from the target echo occur
    within a brain the size of a plum!
  • the precision and success rate of the target
    location achieved by the echo-locating bat is
    rather impossible to match by radar or sonar
    engineers
  • How does a human brain or the brain of a bat do
    it?

8
Biological Neurons
  • Purpose of neurons: to transmit information in the
    form of electrical signals
  • a neuron accepts many inputs, which are all added up in
    some way
  • if enough active inputs are received at once, the
    neuron is activated and fires; if not, it
    remains in its inactive state
  • Structure of neuron
  • body (soma) - contains the nucleus, which contains
    the chromosomes
  • dendrites
  • axon
  • synapse - a narrow gap that
    couples the axon with the dendrite of another
    cell
  • there is no direct linkage across the junction; it is a
    chemical one
  • information is passed from one neuron to another
    through synapses

9
Different types of biological neurons
10
Operation of biological neurons
  • Signals are transmitted between neurons by
    electrical pulses (action potentials, AP)
    traveling along the axon
  • When the potential at the synapse is raised
    sufficiently by the AP, it releases chemicals
    called neurotransmitters
  • it may take the arrival of more than one AP
    before the synapse is triggered
  • The neurotransmitters diffuse across the gap and
    chemically activate gates on the dendrites, that
    allows charged ions to flow
  • The flow of ions alters the potential of the
    dendrite and provides a voltage pulse on the
    dendrite (post-synaptic-potential, PSP)
  • some synapses excite the dendrite they affect,
    while others inhibit it
  • the synapses also determine the strength of the
    new input signal
  • each PSP travels along its dendrite and spreads
    over the soma
  • the soma sums the effects of thousands of PSPs; if
    the resulting potential exceeds a threshold, the
    neuron fires and generates an AP

11
Learning in Biological NNs
  • We were born with some of our neural structures;
    others have been established by experience
  • At the early stage of the human brain development
    (first 2 years) about 1 million synapses are
    formed per second
  • Synapses are then modified through the learning
    process
  • Learning is achieved by
  • creation of new synaptic connections between
    neurons
  • modification of existing synapses
  • The synapses are thought to be mainly responsible
    for learning
  • 1949, Hebb proposed his famous learning rule
  • The strength of a synapse between 2 neurons is
    increased by the repeated activation of one
    neuron by the other across the synapse

12
Correspondence Between Artificial and Biological
Neurons
  • How does this artificial neuron relate to the
    biological one?
  • input p (or input vector p) - input signal (or
    signals) at the dendrite
  • weight w (or weight vector w) - strength of the
    synapse (or synapses)
  • summer and transfer function - cell body
  • neuron output a - signal at the axon

13
Taxonomy of NNs
  • Active phase: feedforward (acyclic) and
    recurrent (cyclic, feedback)
  • Learning phase: supervised and unsupervised
  • Feedforward supervised networks
  • typically used for classification and function
    approximation
  • perceptrons, ADALINEs, backpropagation networks,
    RBF, Learning Vector Quantization (LVQ) networks
  • Feedforward unsupervised networks
  • Hebbian networks used for associative learning
  • Competitive networks performing clustering and
    visualization, e.g. Self-Organizing Kohonen
    Feature Maps (SOM)
  • Recurrent networks temporal data processing
  • recurrent backpropagation, associative memories,
    adaptive resonance networks

14
Backpropagation algorithm
15
Neural Network (NN) Model
  • Computational model consisting of 3 parts
  • 1) Architecture: neurons and connections
  • input, hidden, output neurons
  • fully or partially connected
  • neuron model: computation performed by each
    neuron (type of transfer function)
  • initialization of the weights
  • 2) Learning algorithm
  • how are the weights of the connections changed in
    order to facilitate learning
  • Goal for classification tasks: mapping between
    the input examples and the classes
  • 3) Recall technique: how is the information
    obtained from the NN
  • for classification tasks: how is the class of a
    new example determined

16
Backpropagation Network - Architecture
  • 1) A network with 1 or more hidden layers
  • e.g. a NN for the iris data

[Figure: layered network for the iris data - inputs (e.g. 1 input neuron for each attribute), hidden neurons (1 hidden layer), output neurons (e.g. 1 output neuron for each class)]
  • 2) Feedforward network - each neuron receives
    input only from the neurons in the previous layer
  • 3) Typically fully connected - all neurons in a
    layer are connected with all neurons in the next
    layer
  • 4) Weight initialization: small random values,
    e.g. in [-1, 1]

17
Backpropagation Network Architecture 2
  • 5) Neuron model - weighted sum of input signals,
    differentiable transfer function

a = f(wp + b)
  • any differentiable transfer function f can be
    used most frequently the sigmoid and tan-sigmoid
    (hyperbolic tangent sigmoid) functions are used

18
Architecture Number of Input Units
  • Numerical data - typically 1 input unit for each
    attribute
  • Categorical data - 1 input unit for each
    attribute value
  • How many input units for the weather data?

[Figure: one input unit per attribute value - outlook (sunny, overcast, rainy), temperature (hot, mild, cool), humidity (high, normal), windy (false, true)]
  • Encoding of the input examples: typically binary,
    depending on the value of the attribute (on and
    off)
  • e.g. ex.1 → 1 0 0  1 0 0  1 0  0 1 (see the sketch below)
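A minimal sketch of this binary encoding in Python; the attribute and value lists mirror the weather data above, while the function and variable names are illustrative, not part of the lecture.

    # possible values for each weather attribute, in a fixed order
    ATTRIBUTES = [
        ("outlook", ["sunny", "overcast", "rainy"]),
        ("temperature", ["hot", "mild", "cool"]),
        ("humidity", ["high", "normal"]),
        ("windy", ["false", "true"]),
    ]

    def encode(example):
        # one input unit per attribute value: 1 if the value is "on", 0 otherwise
        bits = []
        for name, values in ATTRIBUTES:
            bits += [1 if example[name] == value else 0 for value in values]
        return bits

    # e.g. an example with outlook=sunny, temperature=hot, humidity=high, windy=true
    print(encode({"outlook": "sunny", "temperature": "hot",
                  "humidity": "high", "windy": "true"}))
    # -> [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]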

19
Number of Output Units
  • Typically 1 neuron for each class

ex.1: 1 0 0  1 0 0  1 0  0 1 → target class 1 0
  • Encoding of the targets (classes): typically
    binary
  • e.g. class 1 (no) → 1 0, class 2 (yes) → 0 1

20
Number of Hidden Layers and Units in Them
  • An art! Typically - by trial and error
  • The task constrains the number of inputs and
    output units but not the number of hidden layers
    and neurons in them
  • Too many hidden layers and units (i.e. too many
    weights) → overfitting
  • Too few → underfitting, i.e. the NN is not able
    to learn the input-output mapping
  • A heuristic to start with: 1 hidden layer with n
    hidden neurons, n = (inputs + output_neurons)/2,
    e.g. (10 + 2)/2 = 6 for the weather data

21
Learning in Backpropagation NNs
  • Supervised learning: the training data
  • consists of labeled examples (p, d), i.e. the
    desired output d for them is given (p - input
    vector, d - desired output)
  • can be viewed as a teacher during the training
    process
  • error - difference between the desired output d and
    the actual network output a
  • Idea of backpropagation learning
  • For each training example p
  • Propagate p through the network and calculate the
    output a . Compare the desired d with the actual
    output a and calculate the error
  • Update weights of the network to reduce the
    error
  • Until error over all examples < threshold
  • Why "backpropagation"? It adjusts the weights
    backwards (from the output to the input units) by
    propagating the weight change

22
Backpropagation Learning - 2
  • Sum of Squared Errors (E) is a classical measure
    of error
  • E for a single training example, over all output
    neurons: E = 1/2 Σi (di - ai)²
  • di - desired, ai - actual network output for
    output neuron i
  • Thus, backpropagation learning can be viewed as
    an optimization search in the weight space
  • Goal state: the set of weights for which the
    performance index (error) is minimum
  • Search method: hill climbing

23
Error Landscape in Weight Space
  • E is a function of the weights
  • Several local minima and one global minimum

[Figure: E as a function of w1 and w2]
  • How to minimize the error? Take steps downhill
  • Not guaranteed to find the global minimum except
    in the (glorious) situation where there is only
    one global minimum
  • How to get to the bottom as fast as possible?
    (i.e. we need to know what direction to move that
    will make the largest reduction in error)

24
Steepest Gradient Descent
  • The direction of steepest descent is given by the
    gradient and can be computed (∂E/∂w)
  • A function decreases most rapidly when the
    direction of movement is the direction of the
    negative of the gradient
  • Hence, we want to adjust the weights so that the
    change moves the system down the error surface in
    the direction of the locally steepest descent,
    given by the negative of the gradient
  • η - learning rate, defines the step size;
    typically in the range (0, 1)
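In symbols, each weight is changed by Δw = -η ∂E/∂w. A minimal, illustrative Python sketch of one such update (the function name, and the assumption that the gradients are already available, are mine rather than the lecture's):

    def gradient_step(weights, gradients, learning_rate=0.1):
        # move each weight a small step against its error gradient:
        # w_new = w_old - eta * dE/dw
        return [w - learning_rate * g for w, g in zip(weights, gradients)]

    # e.g. weights [0.5, -0.3] with gradients [0.2, -0.1] and eta = 0.1
    print(gradient_step([0.5, -0.3], [0.2, -0.1]))   # -> approximately [0.48, -0.29]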

25
Backpropagation Algorithm - Idea
  • The backpropagation algorithm adjusts weights by
    working backward from the output layer to the
    input layer
  • Calculate the error and propagate this error from
    layer to layer
  • 2 approaches
  • Incremental: the weights are adjusted after each
    training example is applied
  • Also called approximate steepest descent
  • Preferred as it requires less space
  • Batch: weights are adjusted once, after all
    training examples are applied and a total error
    has been calculated
  • Solid lines - forward propagation of signals
  • Dashed lines backward propagation of error

26
For interested students, not examinable
Backpropagation - Derivation
  • a neural network with one hidden layer
    indexes
  • i over output neurons, j over hidden, k over
    inputs
  • E (over the output neurons, for the current input
    vector p, i.e. incremental mode):
    E = 1/2 Σi (di - oi)²

di - target output of neuron i for p; oi - actual
output of neuron i for p
  • Express E in terms of weights and input signals
  • 1. Input for the hidden neuron j for p:
    netj = Σk wkj pk + bj

2. Activation of neuron j as a function of its
input: oj = f(netj)
27
For interested students, not examinable
Backpropagation Derivation - 2
3. Input for the output neuron i: neti = Σj wji oj + bi
4. Output for the output neuron i: oi = f(neti)
5. Substituting 4 into E: E = 1/2 Σi (di - f(neti))²
28
For interested students, not examinable
Backpropagation Derivation - 3
6. Steepest gradient descent: adjust the weights
proportionally to the negative of the error
gradient. For a weight wji to an output neuron:
Δwji = -η ∂E/∂wji = η (di - oi) f'(neti) oj = η δi oj   <- chain rule
For a weight wkj to a hidden neuron:
Δwkj = -η ∂E/∂wkj = η δj pk, where δj = f'(netj) Σi wji δi
29
Backpropagation Rule - Summary
For the weight w from a neuron with output o into neuron q: Δw = η δq o, where
δq = (dq - oq) f'(netq) if q is an output neuron
δq = f'(netq) Σi wqi δi if q is a hidden neuron
( i is over the nodes in the layer above q)
30
Derivative of Sigmoid Activation Function
  • From the formulas for δ it is clear that we must be able
    to calculate the derivative of f. For a
    sigmoid transfer function: f'(net) = f(net)(1 - f(net)) = o(1 - o)
  • Thus, the backpropagation errors for a network with a
    sigmoid transfer function are:
  • q is an output neuron: δq = (dq - oq) oq (1 - oq)
  • q is a hidden neuron: δq = oq (1 - oq) Σi wqi δi
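These formulas translate directly into code. A minimal sketch, assuming sigmoid units (the function names are illustrative):

    def sigmoid_derivative(o):
        # for a sigmoid, f'(net) can be written in terms of the output o = f(net)
        return o * (1.0 - o)

    def delta_output(d, o):
        # delta for an output neuron: (d - o) * o * (1 - o)
        return (d - o) * sigmoid_derivative(o)

    def delta_hidden(o, weights_to_layer_above, deltas_of_layer_above):
        # delta for a hidden neuron: o * (1 - o) * sum of w * delta over the layer above
        s = sum(w * dl for w, dl in zip(weights_to_layer_above, deltas_of_layer_above))
        return sigmoid_derivative(o) * s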

31
Backpropagation Algorithm - Summary
  • 1. Determine the architecture of the network
  • how many input and output neurons what output
    encoding
  • hidden neurons and layers
  • 2. Initialize all weights (biases incl.) to
    small random values, typically in [-1, 1]
  • 3. Repeat until termination criterion satisfied
  • (forward pass) Present a training example and
    propagate it through the network to calculate the
    actual output
  • (backward pass) Compute the error (the δ
    values for the output neurons)
  • Starting with the output layer, repeat for
    each layer in the network:
  • - propagate the δ
    values back to the previous layer
  • - update the
    weights between the two layers
  • The stopping criterion is checked at the end of
    each epoch:
  • The error (mean absolute or mean squared) is
    below a threshold
  • All training examples are propagated and the
    total error is calculated
  • The threshold is determined heuristically, e.g.
    0.3
  • Maximum number of epochs is reached
  • Early stopping using a validation set
  • It typically takes hundreds or thousands of
    epochs for a NN to converge
  • Try Matlab's demo nnd11bc!
  • epoch = 1 pass through the training set (a compact
    sketch of the whole loop is given below)
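The whole procedure, in incremental mode with a single hidden layer of sigmoid units, can be sketched as below. This is a minimal illustration written for this transcript (plain Python rather than the Matlab demos); all names, the data format (a list of (input, target) pairs) and the default parameter values are assumptions, not part of the lecture.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train(examples, n_in, n_hidden, n_out, eta=0.1, epochs=1000, threshold=0.3):
        # weights[j][k] connects input k to hidden neuron j; the last entry is the bias weight
        w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        w_out = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

        for epoch in range(epochs):
            total_error = 0.0
            for p, d in examples:  # incremental mode: update after every example
                # forward pass
                h = [sigmoid(sum(w[k] * x for k, x in enumerate(p)) + w[-1]) for w in w_hidden]
                o = [sigmoid(sum(w[j] * hj for j, hj in enumerate(h)) + w[-1]) for w in w_out]
                total_error += 0.5 * sum((di - oi) ** 2 for di, oi in zip(d, o))

                # backward pass: deltas (sigmoid derivative written as o * (1 - o))
                delta_o = [(di - oi) * oi * (1 - oi) for di, oi in zip(d, o)]
                delta_h = [hj * (1 - hj) * sum(w_out[i][j] * delta_o[i] for i in range(n_out))
                           for j, hj in enumerate(h)]

                # weight updates: delta_w = eta * delta * input signal (bias input is 1)
                for i in range(n_out):
                    for j in range(n_hidden):
                        w_out[i][j] += eta * delta_o[i] * h[j]
                    w_out[i][-1] += eta * delta_o[i]
                for j in range(n_hidden):
                    for k in range(n_in):
                        w_hidden[j][k] += eta * delta_h[j] * p[k]
                    w_hidden[j][-1] += eta * delta_h[j]

            if total_error < threshold:  # stopping criterion checked at the end of each epoch
                break
        return w_hidden, w_out

    # e.g. the 2-d toy problem used later in the lecture: 2 inputs, 3 hidden, 2 outputs
    data = [([0.6, 0.1], [1, 0]), ([0.2, 0.3], [0, 1])]
    w_hidden, w_out = train(data, n_in=2, n_hidden=3, n_out=2)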

32
How to Determine if an Example is Correctly
Classified?
  • Accuracy may be used to evaluate performance
    once training has finished, or as a stopping
    criterion checked at the end of each epoch
  • Binary encoding
  • apply each example and get the resulting
    activations of the output neurons; the example
    will belong to the class corresponding to the
    output neuron with the highest activation
  • Example: 3 classes, the outputs for ex. X are
    0.3, 0.7, 0.5 => ex. X belongs to class 2
  • i.e. each output value is regarded as the
    probability of the example belonging to the class
    corresponding to this output
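A minimal sketch of this recall step (the names are illustrative; the returned class index is 0-based, so the slide's "class 2" corresponds to index 1):

    def predict_class(outputs):
        # the example belongs to the class whose output neuron has the highest activation
        return max(range(len(outputs)), key=lambda i: outputs[i])

    print(predict_class([0.3, 0.7, 0.5]))   # -> 1, i.e. the second class ("class 2" above)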

33
Backpropagation - Example
  • 2 classes, 2-d input data
  • training set
  • ex.1: 0.6 0.1 - class 1 (banana)
  • ex.2: 0.2 0.3 - class 2 (orange)
  • Network architecture
  • How many inputs?
  • How many hidden neurons?
  • Heuristic:
  • n = (inputs + output_neurons)/2
  • How many output neurons?
  • What encoding of the outputs?
  • 1 0 for class 1, 0 1 for class 2
  • Initial weights and learning rate
  • Let's set η = 0.1 and the weights as in the
    picture

34
Backpropagation Example (cont. 1)
  • 1. Forward pass for ex. 1 - calculate the
    outputs o6 and o7
  • o1 = 0.6, o2 = 0.1, target output: 1 0, i.e.
    class 1
  • Activations of the hidden units:
  • net3 = o1·w13 + o2·w23 + b3 = 0.6·0.1 + 0.1·(-0.2) + 0.1 = 0.14
  • o3 = 1/(1 + e^-net3) = 0.53
  • net4 = o1·w14 + o2·w24 + b4 = 0.6·0 + 0.1·0.2 + 0.2 = 0.22
  • o4 = 1/(1 + e^-net4) = 0.55
  • net5 = o1·w15 + o2·w25 + b5 = 0.6·0.3 + 0.1·(-0.4) + 0.5 = 0.64
  • o5 = 1/(1 + e^-net5) = 0.65
  • Activations of the output units:
  • net6 = o3·w36 + o4·w46 + o5·w56 + b6 = 0.53·(-0.4) + 0.55·0.1 + 0.65·0.6 - 0.1 = 0.13
  • o6 = 1/(1 + e^-net6) = 0.53
  • net7 = o3·w37 + o4·w47 + o5·w57 + b7 = 0.53·0.2 + 0.55·(-0.1) + 0.65·(-0.2) + 0.6 = 0.52
  • o7 = 1/(1 + e^-net7) = 0.63

35
Backpropagation Example (cont. 2)
  • 2. Backward pass for ex. 1
  • Calculate the output errors δ6 and δ7 (note
    that d6 = 1, d7 = 0 for class 1)
  • δ6 = (d6 - o6)·o6·(1 - o6) = (1 - 0.53)·0.53·(1 - 0.53) = 0.12
  • δ7 = (d7 - o7)·o7·(1 - o7) = (0 - 0.63)·0.63·(1 - 0.63) = -0.15
  • Calculate the new weights between the hidden and
    output units (η = 0.1):
  • Δw36 = η·δ6·o3 = 0.1·0.12·0.53 = 0.006
  • w36new = w36old + Δw36 = -0.4 + 0.006 = -0.394
  • Δw37 = η·δ7·o3 = 0.1·(-0.15)·0.53 = -0.008
  • w37new = w37old + Δw37 = 0.2 - 0.008 = 0.192
  • Similarly for w46new, w47new, w56new and
    w57new
  • For the biases b6 and b7 (remember, biases are
    weights with input 1):
  • Δb6 = η·δ6·1 = 0.1·0.12 = 0.012
  • b6new = b6old + Δb6 = -0.1 + 0.012 = -0.088
  • Similarly for b7

36
Backpropagation Example (cont. 3)
  • Calculate the errors of the hidden units δ3, δ4
    and δ5
  • δ3 = o3·(1 - o3)·(w36·δ6 + w37·δ7)
  •    = 0.53·(1 - 0.53)·(-0.4·0.12 + 0.2·(-0.15)) = -0.019
  • Similarly for δ4 and δ5
  • Calculate the new weights between the input and
    hidden units (η = 0.1):
  • Δw13 = η·δ3·o1 = 0.1·(-0.019)·0.6 = -0.0011
  • w13new = w13old + Δw13 = 0.1 - 0.0011 = 0.0989
  • Similarly for w23new, w14new, w24new, w15new
    and w25new; b3, b4 and b5
  • 3. Repeat the same procedure for the other
    training examples
  • Forward pass for ex. 2, backward pass for ex. 2
  • Forward pass for ex. 3, backward pass for ex. 3
  • Note: it's better to apply the input examples in
    random order (a sketch that reproduces the numbers
    above is given below)
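The numbers above can be reproduced with a short Python sketch; the weight values used here are the ones implied by the arithmetic on these slides (the network picture itself is not part of this transcript), so treat them as a reconstruction rather than the original figure.

    import math
    sig = lambda x: 1 / (1 + math.exp(-x))

    o1, o2, eta = 0.6, 0.1, 0.1            # ex. 1, learning rate 0.1
    d6, d7 = 1, 0                          # target output for class 1

    # forward pass (hidden layer, then output layer)
    o3 = sig(o1 * 0.1 + o2 * (-0.2) + 0.1)                 # net3 = 0.14, o3 ~ 0.53
    o4 = sig(o1 * 0.0 + o2 * 0.2 + 0.2)                    # net4 = 0.22, o4 ~ 0.55
    o5 = sig(o1 * 0.3 + o2 * (-0.4) + 0.5)                 # net5 = 0.64, o5 ~ 0.65
    o6 = sig(o3 * (-0.4) + o4 * 0.1 + o5 * 0.6 - 0.1)      # o6 ~ 0.53
    o7 = sig(o3 * 0.2 + o4 * (-0.1) + o5 * (-0.2) + 0.6)   # o7 ~ 0.63

    # backward pass
    delta6 = (d6 - o6) * o6 * (1 - o6)                         # ~ 0.12
    delta7 = (d7 - o7) * o7 * (1 - o7)                         # ~ -0.15
    delta3 = o3 * (1 - o3) * ((-0.4) * delta6 + 0.2 * delta7)  # ~ -0.019

    # sample weight updates
    w36_new = -0.4 + eta * delta6 * o3     # ~ -0.394
    w13_new = 0.1 + eta * delta3 * o1      # ~ 0.0989
    print(round(o6, 2), round(o7, 2), round(w36_new, 3), round(w13_new, 4))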

37
Backpropagation Example (cont. 4)
  • 4. At the end of the epoch, check if the
    stopping criterion is satisfied
  • if yes - stop training
  • if not - continue training with a new
    epoch
  • (go to step 1)

38
Steepest Gradient Descent
  • Does the gradient descent guarantee that after
    each adjustment the error will be reduced? No!
  • Not optimal - is guaranteed to find a minimum but
    it might be a local minimum!
  • a local minimum may be a good enough solution

Backpropagation's error space: many local and 1
global minimum
39
Universality of Backpropagation
  • Boolean functions
  • Every boolean function of the inputs can be
    represented by a network with a single hidden layer
  • Continuous functions - universal approximation
    theorems
  • Any continuous function can be approximated with
    arbitrarily small error by a network with one
    hidden layer (Cybenko 1989, Hornik et al. 1989)
  • Any function (incl. discontinuous) can be
    approximated to arbitrarily small error by a
    network with two hidden layers (Cybenko 1988)
  • These are existence theorems: they say the
    solution exists but don't say how to choose the
    number of hidden neurons!
  • For a given network it is hard to say exactly
    which functions can be represented and which ones
    cannot

40
Overfitting
  • Occurs when
  • Training examples are noisy
  • Number of the free (trainable) parameters is
    bigger than the number of training examples
  • The network has been trained too long
  • Preventing overtraining
  • Use a network that is just large enough to provide
    an adequate fit
  • Occam's Razor: don't use a bigger net when a
    smaller one will work
  • The network should not have more free parameters
    than there are training examples!
  • However, it is difficult to know beforehand how
    large a network should be for a specific
    application!

41
Preventing Overtraining - Validation Set Approach
  • Also called an early stopping method
  • Available data is divided into 3 subsets
  • Training set
  • Used for computing the gradient and updating the
    weights
  • Validation set
  • The error on the validation set is monitored
    during the training
  • This error will normally decrease during the
    initial phase of training (as does the training
    error)
  • However, when the network begins to overfit the
    data, the error on the validation set will
    typically begin to rise
  • Training is stopped when the error on the
    validation set increases for a pre-specified
    number of iterations; the weights and biases
    at the minimum of the validation set error are
    returned
  • Testing set
  • Not used during training but to compare different
    algorithms once training has completed
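A minimal sketch of this early-stopping logic in Python; the callables `train_step` and `validation_error`, the `patience` parameter and all other names are illustrative placeholders, not part of the lecture.

    def train_with_early_stopping(train_step, validation_error, max_epochs=1000, patience=10):
        # stop when the validation error has not improved for `patience` consecutive
        # epochs, and return the weights from the best (lowest validation error) epoch
        best_error, best_weights, epochs_without_improvement = float("inf"), None, 0
        for epoch in range(max_epochs):
            weights = train_step()                 # one epoch of training on the training set
            error = validation_error(weights)      # error on the held-out validation set
            if error < best_error:
                best_error, best_weights, epochs_without_improvement = error, weights, 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break
        return best_weights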

42
Error Surface and Convergence - Example
  • Path b gets trapped in a local minimum
  • What can be done? Try different initializations
  • Path a converges to the optimum solution but is
    very slow
  • What can we do?

Try nnd12sd1!
43
Speeding up the Convergence
  • Solution 1: Increase the learning rate
  • Faster on the flat part but unstable when falling
    into the steep valley that contains the minimum
    point - overshooting the minimum
  • Try nnd12sd2!
  • Solution 2: Smooth out the trajectory by
    averaging the weight updates, e.g. make the current
    update dependent on the previous one
  • The use of momentum might smooth out the
    oscillations and produce a stable trajectory

44
Backpropagation with Momentum - Example
  • Example: the same learning rate and initial
    position
  • Smooth and faster convergence
  • Stable algorithm
  • By the use of momentum we can use a larger
    learning rate while maintaining the stability of
    the algorithm

[Figure: squared error during training]
  • Try nnd12mo!
  • Typical momentum values used in practice: 0.6-0.9
    (a sketch of the update is given below)
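A minimal sketch of a momentum-smoothed weight update; the function name and the momentum value of 0.8 (which lies in the 0.6-0.9 range quoted above) are illustrative choices, not the lecture's.

    def momentum_update(weight, gradient, previous_delta, eta=0.1, alpha=0.8):
        # blend the new gradient step with the previous step:
        # delta_w(t) = alpha * delta_w(t-1) - eta * dE/dw
        delta = alpha * previous_delta - eta * gradient
        return weight + delta, delta

    # the returned delta is carried over to the next call as previous_delta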

45
More on the Learning Rate
  • Constant throughout training (standard steepest
    descent)
  • The performance is very sensitive to the proper
    setting of the learning rate
  • Too small - slow convergence
  • Too big - oscillation, overshooting of the
    minimum
  • It is not possible to determine the optimum
    learning rate before training, as it changes
    during training and depends on the error surface
  • Variable learning rate
  • goal: keep the learning rate as large as possible
    while keeping learning stable
  • Several algorithms have been proposed

46
Limitations and Capabilities
  • Multilayer perceptrons (MLPs) trained with
    backpropagation can perform function
    approximation and pattern classification
  • Theoretically they can
  • Perform any linear and non-linear computation
  • Approximate any reasonable function arbitrarily
    well
  • => they are able to overcome the limitations of
    earlier NNs (perceptrons and ADALINEs)
  • In practice
  • May not always find a solution - can be trapped
    in a local minimum
  • Performance is sensitive to the starting
    conditions (weight initialization)
  • Sensitive to the number of hidden layers and
    neurons
  • Too few neurons - underfitting, unable to learn
    what you want it to learn
  • Too many - overfitting, learns slowly
  • => the architecture of an MLP network is not
    completely constrained by the problem to be
    solved, as the number of hidden layers and
    neurons is left to the designer

47
Limitations and Capabilities cont.
  • Sensitive to the value of the learning rate
  • Too small - slow learning
  • Too big - instability or poor performance
  • The proper choice depends on the nature of the
    examples
  • Trial and error
  • Refer to the choices that have worked well in
    similar problems
  • => successful application of NNs requires time
    and experience
  • Backpropagation - summary
  • uses a steepest descent algorithm for minimizing
    the mean squared error
  • Gradient descent (GD)
  • Standard GD is slow as it requires a small learning
    rate for stable learning
  • GD with momentum is faster as it allows a higher
    learning rate while maintaining stability
  • There are several variations of the
    backpropagation algorithm

48
Some Interesting NN Applications
  • A few examples of the many significant
    applications of NNs
  • You can use them for the paper presentation in
    w12 and 13!
  • Network design was the result of several months of
    trial-and-error experimentation
  • Moral: NNs are widely applicable but they cannot
    magically solve problems; wrong choices lead to
    poor performance
  • "NNs are the second best way of doing just about
    anything" - John Denker
  • NNs provide passable performance on many tasks
    that would be difficult to solve explicitly with
    other techniques

49
For interested students only, not examinable
NETtalk
  • Sejnowski and Rosenberg, 1987
  • Pronunciation of written English
  • Fascinating problem in linguistics
  • Task with high commercial profit
  • How?
  • Mapping the text stream to phonemes
  • Passing the phonemes to a speech generator
  • Task for the NN: learning to map the text to
    phonemes
  • Good task for a NN, as most of the rules are
    approximately correct
  • E.g. the "c" in cat → /k/, the "c" in century → /s/

50
For interested students only, not examinable
NETtalk -Architecture
  • 203 input neurons: 7 (sliding window - the
    character to be pronounced and the 3 characters
    before and after it) x 29 possible characters (26
    letters plus blank, period and other punctuation)
  • 80 hidden
  • 26 output, corresponding to the phonemes

51
For interested students only, not examinable
NETtalk - Performance
  • Training set
  • 1024 words hand-transcribed into phonemes
  • Accuracy on the training set: 90% after 50 epochs
  • Why not 100%?
  • A few dozen hours of training time; a few months
    of experimentation with different architectures
  • Testing
  • Accuracy: 78%
  • Importance
  • A good showpiece for the philosophy of NNs
  • The network appears to mimic the speech patterns
    of young children: incomprehensible babble at first (as
    the weights are random), then gradually improving
    until it becomes understandable

52
For interested students only, not examinable
Handwritten Character Recognition
  • Le Cun et al., 1989
  • Reading zip codes on hand-addressed envelopes
  • Task for the NN
  • A preprocessor is used to recognize the segments
    of the individual digits
  • Based on the segments, the network has to
    identify the digits
  • Network architecture
  • 256 input neurons: 16x16 array of pixels
  • 3 hidden layers: 768, 192, 30 neurons
    respectively
  • 10 output neurons: digits 0-9
  • Not a fully connected network
  • If it were a fully connected network: 200,000
    connections (impossible to train); instead only
    9,760 connections
  • Units in the hidden layers act as feature
    detectors, e.g. each unit in the 1st hidden
    layer is connected to 25 input neurons
    (a 5x5 pixel region)

53
For interested students only, not examinable
Handwritten Character Recognition cont.
  • Training: 7,300 examples
  • Testing: 2,000 examples
  • Accuracy: 99%
  • Hardware implementation (in VLSI)
  • enables letters to be sorted at high speed by
    zip code
  • One of the largest applications of NNs

54
For interested students only, not examinable
Driving Motor Vehicles
  • Pomerleau et al., 1993
  • ALVINN (Autonomous Land Vehicle In a Neural
    Network)
  • Learns to drive a van along a single lane on a
    highway
  • Once trained on a particular road, ALVINN can
    drive at speeds > 40 miles per hour
  • Chevy van and US Army HMMWV personnel carrier
  • computer-controlled steering, acceleration and
    braking
  • sensors: color stereo video camera, radar,
    positioning system, scanning laser range finders

55
For interested students only, not examinable
ALVINN - Architecture
  • Fully connected backpropagation NN with 1 hidden
    layer
  • 960 input neurons: the signal from the camera is
    preprocessed to yield a 30x32 image intensity grid
  • 5 hidden neurons
  • 32 output neurons, corresponding to directions
  • If the output node with the highest activation is
  • the left-most, then ALVINN turns sharply left
  • the right-most, then ALVINN turns sharply right
  • a node between them, then ALVINN directs the van
    in a proportionally intermediate direction
  • Smoothing the direction: it is calculated as the
    average suggested not only by the output node
    with the highest activation but also by that node's
    immediate neighbours
  • Training examples (image-direction pairs)
  • Recording such pairs while a human drives the
    vehicle
  • After collecting 5 minutes of such data and 10 minutes of
    training, ALVINN can drive on its own

56
For interested students only, not examinable
ALVINN - Training
  • Training examples (image-direction pairs)
  • Recording such pairs while a human drives the
    vehicle
  • After collecting 5 min of such data and 10 min of
    training, ALVINN can drive on its own
  • Potential problem: as the human is too good and
    (typically) does not stray from the lane, there
    are no training examples that show how to recover
    when you are misaligned with the road
  • Solution: ALVINN corrects this by creating
    synthetic training examples - it rotates each
    video image to create additional views of what
    the road would look like if the van were a little
    off course to the left or right

57
For interested students only, not examinable
ALVINN - Results
  • Impressive results
  • ALVINN has driven at speeds up to 70 miles per
    hour for up to 90 miles on public highways near
    Pittsburgh
  • Also at normal speeds on single-lane dirt roads,
    paved bike paths, and two-lane suburban streets
  • Limitations
  • Unable to drive on a road type for which it
    hasn't been trained
  • Not very robust to changes in lighting conditions
    and the presence of other vehicles
  • Comparison with traditional vision algorithms
  • These use image processing to analyse the scene and
    find the road, and then follow it
  • Most of them achieve 3-4 miles per hour

58
For interested students only, not examinable
ALVINN - Discussion
  • Why is ALVINN so successful?
  • Fast computation - once trained, the NN is able
    to compute a new steering direction 10 times a
    second => the computed direction can be off by
    10 from the ideal as long as the system is able
    to make a correction within a few tenths of a second
  • Learning from examples is very appropriate
  • There is no good theory of driving, but it is easy to
    collect examples - this motivated the use of learning
    algorithms (but not necessarily NNs)
  • Driving is a continuous, noisy domain in which
    almost all features contribute some information
    => NNs are a better choice than some other learning
    algorithms (e.g. DTs)