1
CENG 569 NEUROCOMPUTING
Erol Sahin
Dept. of Computer Engineering
Middle East Technical University
Inonu Bulvari, 06531, Ankara, TURKEY
  • Week 3: Generalization and applications of
    backpropagation

2
Today's Topics
  • Generalization
  • Toy and benchmarking problems
  • XOR (Rumelhart, Hinton, Williams 1986)
  • Parity (Minsky and Papert 1969)
  • Encoder (Ackley, Hinton, Sejnowski 1985)
  • Applications
  • NETtalk (Sejnowski, Rosenberg 1987)

3
Generalization
  • If we already know all the pattern associations to
    be learned, why learn at all? The point of
    learning is to generalize to new, unseen patterns.

4
Overtraining
  • Lower training error does not necessarily mean
    lower testing error. When to stop? (A common
    answer, early stopping on a validation set, is
    sketched below.)
  • How is the number of hidden units related to
    generalization performance?
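The "when to stop?" question is usually answered by early stopping: hold out a validation set and stop training once its score no longer improves. A minimal sketch, assuming scikit-learn and a synthetic dataset as stand-ins (the slides do not prescribe a tool or data):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stop when the score on a held-out validation split stops improving,
# rather than driving the training error as low as possible.
net = MLPClassifier(hidden_layer_sizes=(30,),
                    early_stopping=True,       # monitor a validation split
                    validation_fraction=0.2,   # 20% of the data held out
                    n_iter_no_change=10,       # patience, in epochs
                    max_iter=1000,
                    random_state=0)
net.fit(X, y)
print('stopped after', net.n_iter_, 'epochs')
```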

5
The XOR problem
  • Not linearly separable.
  • Requires hidden units.
  • The training time tends to be surprisingly long.
  • Weights tend to become rather large.
  • A good way to test a backpropagation
    implementation (a minimal training sketch
    follows).
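A minimal sketch of training a 2-2-1 sigmoid network on XOR with plain batch gradient descent (NumPy assumed; the learning rate and epoch count are illustrative). With unlucky initial weights it can stall in a local minimum, which is part of why training can be "surprisingly long":

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-2-1 network with small random weights and biases.
W1 = rng.normal(0, 0.5, (2, 2)); b1 = np.zeros(2)
W2 = rng.normal(0, 0.5, (2, 1)); b2 = np.zeros(1)

eta = 0.5
for epoch in range(20000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)              # hidden activations
    Y = sigmoid(H @ W2 + b2)              # network output
    # Backward pass for the squared-error loss.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # Gradient-descent weight updates.
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(np.round(Y, 2))   # close to [0, 1, 1, 0] when training succeeds
```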

6
Parity problem
  • Output is 1 if there is an odd number of 1's in
    the input, 0 otherwise.
  • Essentially a generalization of the XOR problem
    for N inputs.
  • At least N hidden neurons are required.
  • Output changes whenever any single input changes.
  • Not a typical real-world classification problem.
    Why not?
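A small sketch (NumPy assumed) that builds the N-bit parity dataset. Flipping any single input bit flips the target, which is exactly the sensitivity that makes parity unlike most real-world classification tasks:

```python
import itertools
import numpy as np

def parity_dataset(n):
    """All 2**n binary patterns, with target 1 if the number of 1's is odd."""
    X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    y = (X.sum(axis=1) % 2).astype(int)
    return X, y

X, y = parity_dataset(4)
print(X.shape)          # (16, 4)
print(y[:4])            # [0 1 1 0] -- changing one bit always flips the label
```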

7
Encoder
  • Auto-association: input = output.
  • Requires an efficient encoding at the hidden
    layer.
  • For binary patterns, at least log2 N hidden
    neurons are required.
  • A trivial solution is to use binary coding at the
    hidden layer (sketched below).
  • Can be used for code compression.
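A tiny illustration (NumPy assumed) of the 4-2-4 auto-association setup and the "trivial" solution: for N = 4 one-hot patterns, log2(4) = 2 hidden units suffice if the hidden layer simply carries the binary code of the pattern index. This only spells out the target encoding; it is not a trained network:

```python
import numpy as np

N = 4                                  # number of patterns = input = output units
H = int(np.ceil(np.log2(N)))           # minimum hidden units: log2(4) = 2

# Auto-association: each one-hot input pattern is also its own target.
patterns = np.eye(N)

# "Trivial" solution: the hidden layer carries the binary code of the index.
codes = np.array([[(i >> b) & 1 for b in range(H)] for i in range(N)], dtype=float)

for p, c in zip(patterns, codes):
    print(p, '->', c)                  # 4 patterns squeezed through 2 hidden units
```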

8
NETtalk
  • Learns to read English text aloud.
  • Inputs: seven consecutive characters of text.
  • 203 inputs: each of the 7 positions has 26 letter
    + 3 punctuation-mark detectors (7 × 29 = 203; see
    the encoding sketch below).
  • 26 outputs: 23 articulatory features + 3 markers.
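A sketch of how the 203 inputs come about: 7 window positions × (26 letter + 3 punctuation) detectors = 203 units. NumPy is assumed, and the particular choice of the three punctuation symbols below is a guess for illustration:

```python
import string
import numpy as np

# One detector per symbol and position: 26 letters + 3 punctuation marks = 29.
SYMBOLS = list(string.ascii_lowercase) + [' ', '.', ',']   # the 3 marks are a guess
INDEX = {s: i for i, s in enumerate(SYMBOLS)}
WINDOW = 7                                                 # seven consecutive characters

def encode_window(text, center):
    """One-hot encode the 7-character window at `center`: 7 * 29 = 203 inputs."""
    vec = np.zeros(WINDOW * len(SYMBOLS))
    for k in range(WINDOW):
        ch = text[center - WINDOW // 2 + k]
        vec[k * len(SYMBOLS) + INDEX[ch]] = 1.0
    return vec

x = encode_window('   hello world   ', center=5)
print(x.shape)   # (203,)
```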

9
  • Learning: gradient descent with momentum.
  • Trained on 1024 words from a transcription of a
    child's informal speech.
  • On the training set: 95% accuracy.
  • Graceful degradation.
  • Faster re-training.

10
Sonar Target Recognition
  • Work done by Gorman and Sejnowski (1988).
  • Discriminate between sonar return signals from a
    rock or a metal cylinder.
  • Inputs: 60 units tuned to frequency
    (preprocessing).
  • Hidden: variable number, from 0 to 24 units.
  • Output: two units, with (0,1) or (1,0) coding.
  • Learning: standard gradient descent with no
    momentum.

11
  • 208 returns hand-picked from 1200.
  • Temporal signal filtered and transformed to
    spectral information (why? see the sketch below).
  • Normalized envelope calculated for 60 samples
    (the inputs).
  • Two important contributions:
  • Vary the number of hidden units and analyze the
    performance.
  • Analyze the weights to see what they have
    learned.
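A rough sketch of the preprocessing idea (NumPy assumed; Gorman and Sejnowski's actual sampling apertures differ): the raw temporal return is moved to the frequency domain, because the spectral content of the echo is more informative and more invariant than the raw waveform, and is then reduced to a normalized 60-value envelope:

```python
import numpy as np

def sonar_input(signal, n_bins=60):
    """Temporal sonar return -> power spectrum -> normalized 60-value envelope."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2      # spectral information
    bands = np.array_split(spectrum, n_bins)         # collapse into 60 bands
    envelope = np.array([b.mean() for b in bands])
    return envelope / envelope.max()                 # normalize to [0, 1]

rng = np.random.default_rng(0)
x = sonar_input(rng.normal(size=1024))               # stand-in for a real return
print(x.shape, x.min() >= 0, x.max() == 1.0)         # (60,) True True
```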

12
Hidden Unit Analysis
  • Evaluate the performance using 192 training and 16
    test patterns (the 208 returns combined).
  • For each network (0 to 24 hidden units):
  • run 10 times (300 epochs each) and average;
  • select a different 16-pattern test set for each
    run and average (a protocol sketch follows).
  • Performance and training speed increase with the
    number of hidden units.
  • With no hidden units, the network can never
    achieve 100% accuracy.
  • What's coded in the weights? How do we analyze
    what's being learned?
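A sketch of the evaluation protocol, not the original code: for each hidden-layer size, average the test accuracy over 10 runs, each with a different random 16-pattern test split. scikit-learn and a synthetic 208 × 60 stand-in dataset are assumptions, since the sonar data itself is not included here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with the sonar task's shape: 208 patterns, 60 inputs.
X, y = make_classification(n_samples=208, n_features=60, n_informative=10,
                           random_state=0)

for n_hidden in [1, 2, 3, 6, 12, 24]:       # the 0-hidden case needs a separate model
    scores = []
    for run in range(10):                   # 10 runs, each with a different test set
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=16,
                                                  random_state=run)
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=300,
                            random_state=run)
        net.fit(X_tr, y_tr)
        scores.append(net.score(X_te, y_te))
    print(f'{n_hidden:2d} hidden units: mean test accuracy {np.mean(scores):.2f}')
```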

13
Temporal patterns
  • So far: static pattern recognition, with no
    temporal information.
  • What about dynamic patterns that carry sequence
    information?
  • Temporal association, e.g. plant modelling and
    control.
  • Sequence reproduction, e.g. financial forecasting.
  • Sequence recognition, e.g. speech recognition.
  • Several architectures developed to tackle such
    problems.

14
Recurrent backpropagation
  • Proposed by Pineada (1987, 1989), Almeida (1987,
    1988).
  • Extends back-prop to arbitrary networks, as long
    as they converge to stable states. Define the
    activation of the neurons in a time-dependent way
    as
  • One way to train is as follows
  • Relax the network to find all xis.
  • Calculate the errors.
  • Relax the error-propagation ?i's.
  • Update the weights using ? wis.
  • Note that recurrent back-propagation can be used
    as an associative memory, allowing the completion
    of noisy patterns.
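A minimal sketch of the dynamics and the four training steps, in the common Pineda/Almeida formulation (the notation here is an assumption; NumPy assumed): activations relax as dx_i/dt = -x_i + g(sum_j w_ij x_j + I_i), a second linear relaxation yields the error-propagation δ_i's, and the weight update is Δw_ij = η δ_i g'(h_i) x_j:

```python
import numpy as np

def g(h):  return np.tanh(h)
def gp(h): return 1.0 - np.tanh(h) ** 2          # derivative of g

rng = np.random.default_rng(0)
N = 5                                            # small fully recurrent network
W = rng.normal(0, 0.1, (N, N))                   # weights small enough to settle
I = rng.normal(0, 0.5, N)                        # external input pattern
out = np.array([0, 1])                           # which units are outputs
target = np.array([0.5, -0.5])                   # desired fixed-point values
eta, dt = 0.1, 0.1

for step in range(200):
    # 1) Relax the network to a fixed point of  x_i = g(sum_j w_ij x_j + I_i).
    x = np.zeros(N)
    for _ in range(500):
        x += dt * (-x + g(W @ x + I))
    h = W @ x + I
    # 2) Calculate the errors (zero on non-output units).
    e = np.zeros(N); e[out] = target - x[out]
    # 3) Relax the error-propagation delta_i's (a linear fixed-point equation).
    delta = np.zeros(N)
    for _ in range(500):
        delta += dt * (-delta + W.T @ (gp(h) * delta) + e)
    # 4) Update the weights:  delta_w_ij = eta * delta_i * g'(h_i) * x_j.
    W += eta * np.outer(delta * gp(h), x)

print(x[out], target)    # the output units' fixed point approaches the target
```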

15
Learning time sequences
  • Turn the problem into a static pattern recognition
    problem (a windowing sketch follows).
  • Time-Delay Neural Networks (TDNNs).
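A small sketch (NumPy assumed) of the "turn it into a static problem" trick: slide a fixed-width window over the series, so that each pattern is a vector of past samples, here paired with the next value as the target:

```python
import numpy as np

def sliding_windows(series, width):
    """Each input is `width` past samples; the target is the next value."""
    X = np.array([series[t:t + width] for t in range(len(series) - width)])
    y = series[width:]
    return X, y

series = np.sin(np.linspace(0, 20, 500))   # stand-in time series
X, y = sliding_windows(series, width=5)
print(X.shape, y.shape)                    # (495, 5) (495,)
```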

16
Plant identification
  • Nonlinear dynamical plant identification (forward
    dynamics); a data-collection sketch follows.
  • It is also possible to learn the reverse dynamics.
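A sketch of forward-dynamics identification on a toy plant (the plant, the use of scikit-learn, and all constants are illustrative assumptions): drive the plant with random inputs, record (x_t, u_t) -> x_{t+1} transitions, and fit a network to imitate them:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def plant(x, u):
    """Toy nonlinear plant: x_{t+1} = 0.8*x_t + 0.5*tanh(u_t)."""
    return 0.8 * x + 0.5 * np.tanh(u)

# Collect transitions by exciting the plant with random control inputs.
x, inputs, targets = 0.0, [], []
for _ in range(2000):
    u = rng.uniform(-2, 2)
    x_next = plant(x, u)
    inputs.append([x, u]); targets.append(x_next)
    x = x_next

# Forward model: predicts the next state from the current state and control.
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(np.array(inputs), np.array(targets))
print(model.predict([[0.5, 1.0]])[0], plant(0.5, 1.0))   # should be close
```

For the reverse dynamics one would instead fit (x_t, x_{t+1}) -> u_t, which is only well-posed when the control can actually be recovered from the observed transition; that is the "under which conditions?" question on the next slide.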

17
  • Once the plant is identified, it can be used for
    control.
  • It is also possible to learn the reverse
    dynamics. But under which conditions?

18
The truck backer-upper
  • Design a controller to back a truck into a loading
    dock.
  • The controller receives the observed state
    x = (x, y, θ_t, θ_c) and produces the steering
    angle θ_s.

19
  • The solution proposed:
  • Train a network to emulate the truck and trailer
    dynamics.
  • Use the emulator to train the controller, so that
    the error can be passed back through the emulator
    network (a toy sketch follows).
  • http://www.stanford.edu/class/ee373b/truckbackerup
    per.pdf
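A deliberately tiny, one-dimensional sketch of the two-stage idea (not Nguyen and Widrow's actual setup; the scalar "truck", the linear emulator, and all constants are assumptions): first fit an emulator to observed transitions, then adjust the controller by passing the docking error back through the frozen emulator:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_plant(s, u):
    """Stand-in 'truck': s is the offset from the dock, u the steering command."""
    return s + 0.5 * np.tanh(u)

# Stage 1: fit an emulator  s_next ~ a*s + b*u  from observed transitions.
S = rng.uniform(-2, 2, 500); U = rng.uniform(-2, 2, 500)
A = np.column_stack([S, U])
(a, b), *_ = np.linalg.lstsq(A, true_plant(S, U), rcond=None)

# Stage 2: controller u = c*s.  Train c so the next offset goes to zero,
# back-propagating the error through the frozen emulator (chain rule).
c, eta = 0.0, 0.05
for _ in range(500):
    s = rng.uniform(-2, 2)
    u = c * s
    s_next = a * s + b * u         # prediction by the emulator, not the real truck
    grad_c = s_next * b * s        # d(0.5*s_next**2)/dc, through the emulator
    c -= eta * grad_c

print('learned steering gain c =', round(c, 2))   # roughly -a/b, driving s toward 0
```

The same chain-rule idea, applied through a multi-step, multi-layer emulator, is what lets the error at the dock train the controller in the original work.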

20
  • Train the network to emulate the truck dynamics.

21
  • C: Controller
  • T: Truck and trailer emulator

22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
Protein secondary structure
  • Done by Qian and Sejnowski (1988). Similar to
    NETtalk. The network takes as input a moving
    window of 13 amino acids and produces a prediction
    of α-helix, β-sheet, or other for the central
    residue of the window.
  • After training, the network achieved 62% accuracy
    on the test set, compared to 53% for the best
    alternative method.

27
Hyphenation
  • Done by Brunak and Lautrup (1989). The network
    finds hyphenation points for words in languages
    that have context-dependent rules, such as Danish
    and German.

28
Navigation of a car (Pomerleau 1989)
  • Done by Pomerleau (1989). The network takes inputs
    from a 34×36 video image and a 7×36 range finder.
    Output units represent "drive straight", "turn
    left" or "turn right". After about 40 training
    passes over 1200 road images, the car drove around
    the CMU campus at 5 km/h (using a small
    workstation on the car). This was almost twice the
    speed of any other non-NN algorithm at the time.

29
Backgammon
  • Done by Tesauro and Sejnowski (1988). A version of
    "Neurogammon", trained by an expert player,
    defeated all other programs (but not the human
    champion) at the Computer Olympiad in London,
    1989.
  • TD-Gammon is a neural network that trains itself
    to be an evaluation function for the game of
    backgammon by playing against itself and learning
    from the outcome.
  • http://researchweb.watson.ibm.com/massive/tdl.html

30
Criticism of Back-propagation
  • Just a statistical method, and one that is not
    very well characterized.
  • Not biologically plausible.
  • Does not have much to say about how the brain
    works.
  • Only works for well-crafted, supervised problems.

31
Closing remarks
  • "One reason why progress has been so slow in this
    field is that researchers unfamiliar with its
    history have continued to make the same mistakes
    that others have made before them. We believe
    this realm of work to be immensely important and
    rich, but we expect its growth to require a
    degree of critical analysis that its more
    romantic advocates have always been reluctant to
    pursue - perhaps because the spirit of
    connectionism seems itself to go somewhat against
    the grain of analytic rigor."
  • Prologue to the 1988 edition of Perceptrons
    (Minsky and Papert)

32
What's coming next?
  • The Hopfield Model
  • Boltzmann Machine
  • Simulated Annealing
  • Competitive learning
  • Kohonen's self-organizing nets
  • Principal component analysis