1
CENG 569 NEUROCOMPUTING
Erol Sahin
Dept. of Computer Engineering
Middle East Technical University
Inonu Bulvari, 06531, Ankara, TURKEY
  • Week 3: Generalization and applications of
    backpropagation

2
Today's Topics
  • Generalization
  • Toy and benchmarking problems
  • XOR (Rumelhart, Hinton, Williams 1986)
  • Parity (Minsky and Papert 1969)
  • Encoder (Ackley, Hinton, Sejnowski 1985)
  • Applications
  • NETtalk (Sejnowski, Rosenberg 1987)

3
Generalization
  • If we already know all the pattern associations to
    be learned, why learn at all? The point of
    learning is to generalize to new, unseen patterns.

4
Overtraining
  • Lower training error does not necessarily mean
    lower testing error. When to stop? (A common
    answer, early stopping on a validation set, is
    sketched below.)
  • How is the number of hidden units related to
    generalization performance?
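The "when to stop?" question is usually answered by early stopping: hold out a validation set and stop training once its score no longer improves. A minimal sketch, assuming scikit-learn and a synthetic dataset as stand-ins (the slides do not prescribe a tool or data):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stop when the score on a held-out validation split stops improving,
# rather than driving the training error as low as possible.
net = MLPClassifier(hidden_layer_sizes=(30,),
                    early_stopping=True,       # monitor a validation split
                    validation_fraction=0.2,   # 20% of the data held out
                    n_iter_no_change=10,       # patience, in epochs
                    max_iter=1000,
                    random_state=0)
net.fit(X, y)
print('stopped after', net.n_iter_, 'epochs')
```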

5
The XOR problem
  • Not linearly separable.
  • Requires hidden units.
  • The training time tends to be surprisingly long.
  • Weights tend to become rather large.
  • A good way to test a backpropagation
    implementation (a minimal training sketch
    follows).
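A minimal sketch of training a 2-2-1 sigmoid network on XOR with plain batch gradient descent (NumPy assumed; the learning rate and epoch count are illustrative). With unlucky initial weights it can stall in a local minimum, which is part of why training can be "surprisingly long":

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-2-1 network with small random weights and biases.
W1 = rng.normal(0, 0.5, (2, 2)); b1 = np.zeros(2)
W2 = rng.normal(0, 0.5, (2, 1)); b2 = np.zeros(1)

eta = 0.5
for epoch in range(20000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)              # hidden activations
    Y = sigmoid(H @ W2 + b2)              # network output
    # Backward pass for the squared-error loss.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # Gradient-descent weight updates.
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(np.round(Y, 2))   # close to [0, 1, 1, 0] when training succeeds
```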

6
Parity problem
  • Output is 1 if there is an odd number of 1's in
    the input, 0 otherwise.
  • Essentially a generalization of the XOR problem
    for N inputs.
  • At least N hidden neurons are required.
  • Output changes whenever any single input changes.
  • Not a typical real-world classification problem.
    Why not?
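A small sketch (NumPy assumed) that builds the N-bit parity dataset. Flipping any single input bit flips the target, which is exactly the sensitivity that makes parity unlike most real-world classification tasks:

```python
import itertools
import numpy as np

def parity_dataset(n):
    """All 2**n binary patterns, with target 1 if the number of 1's is odd."""
    X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    y = (X.sum(axis=1) % 2).astype(int)
    return X, y

X, y = parity_dataset(4)
print(X.shape)          # (16, 4)
print(y[:4])            # [0 1 1 0] -- changing one bit always flips the label
```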

7
Encoder
  • Auto-association: input = output.
  • Requires an efficient encoding at the hidden
    layer.
  • For binary patterns, at least log2 N hidden
    neurons are required.
  • A trivial solution is to use binary coding at the
    hidden layer (sketched below).
  • Can be used for code compression.
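A tiny illustration (NumPy assumed) of the 4-2-4 auto-association setup and the "trivial" solution: for N = 4 one-hot patterns, log2(4) = 2 hidden units suffice if the hidden layer simply carries the binary code of the pattern index. This only spells out the target encoding; it is not a trained network:

```python
import numpy as np

N = 4                                  # number of patterns = input = output units
H = int(np.ceil(np.log2(N)))           # minimum hidden units: log2(4) = 2

# Auto-association: each one-hot input pattern is also its own target.
patterns = np.eye(N)

# "Trivial" solution: the hidden layer carries the binary code of the index.
codes = np.array([[(i >> b) & 1 for b in range(H)] for i in range(N)], dtype=float)

for p, c in zip(patterns, codes):
    print(p, '->', c)                  # 4 patterns squeezed through 2 hidden units
```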

8
NETtalk
  • Learns to read English text aloud.
  • Inputs: seven consecutive characters of text.
  • 203 inputs: each of the 7 positions has 26 letter
    + 3 punctuation-mark detectors (7 × 29 = 203; see
    the encoding sketch below).
  • 26 outputs: 23 articulatory features + 3 markers.
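A sketch of how the 203 inputs come about: 7 window positions × (26 letter + 3 punctuation) detectors = 203 units. NumPy is assumed, and the particular choice of the three punctuation symbols below is a guess for illustration:

```python
import string
import numpy as np

# One detector per symbol and position: 26 letters + 3 punctuation marks = 29.
SYMBOLS = list(string.ascii_lowercase) + [' ', '.', ',']   # the 3 marks are a guess
INDEX = {s: i for i, s in enumerate(SYMBOLS)}
WINDOW = 7                                                 # seven consecutive characters

def encode_window(text, center):
    """One-hot encode the 7-character window at `center`: 7 * 29 = 203 inputs."""
    vec = np.zeros(WINDOW * len(SYMBOLS))
    for k in range(WINDOW):
        ch = text[center - WINDOW // 2 + k]
        vec[k * len(SYMBOLS) + INDEX[ch]] = 1.0
    return vec

x = encode_window('   hello world   ', center=5)
print(x.shape)   # (203,)
```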

9
  • Learning: gradient descent with momentum.
  • Trained on 1024 words from a transcription of a
    child's informal speech.
  • On the training set: 95% accuracy.
  • Graceful degradation.
  • Faster re-training.

10
Sonar Target Recognition
  • Work done by Gorman and Sejnowski (1988).
  • Discriminate between sonar return signals from a
    rock or a metal cylinder.
  • Inputs: 60 units tuned to frequency
    (preprocessing).
  • Hidden: variable number, from 0 to 24 units.
  • Output: two units, with (0,1) or (1,0) coding.
  • Learning: standard gradient descent with no
    momentum.

11
  • 208 returns hand-picked from 1200.
  • Temporal signal filtered and transformed to
    spectral information (why? see the sketch below).
  • Normalized envelope calculated for 60 samples
    (the inputs).
  • Two important contributions:
  • Vary the number of hidden units and analyze the
    performance.
  • Analyze the weights to see what they have
    learned.
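A rough sketch of the preprocessing idea (NumPy assumed; Gorman and Sejnowski's actual sampling apertures differ): the raw temporal return is moved to the frequency domain, because the spectral content of the echo is more informative and more invariant than the raw waveform, and is then reduced to a normalized 60-value envelope:

```python
import numpy as np

def sonar_input(signal, n_bins=60):
    """Temporal sonar return -> power spectrum -> normalized 60-value envelope."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2      # spectral information
    bands = np.array_split(spectrum, n_bins)         # collapse into 60 bands
    envelope = np.array([b.mean() for b in bands])
    return envelope / envelope.max()                 # normalize to [0, 1]

rng = np.random.default_rng(0)
x = sonar_input(rng.normal(size=1024))               # stand-in for a real return
print(x.shape, x.min() >= 0, x.max() == 1.0)         # (60,) True True
```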

12
Hidden Unit Analysis
  • Evaluate the performance using 192 training and 16
    test patterns (the 208 returns combined).
  • For each network (0 to 24 hidden units):
  • run 10 times (300 epochs each) and average;
  • select a different 16-pattern test set for each
    run and average (a protocol sketch follows).
  • Performance and training speed increase with the
    number of hidden units.
  • With no hidden units, the network can never
    achieve 100% accuracy.
  • What's coded in the weights? How do we analyze
    what's being learned?
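A sketch of the evaluation protocol, not the original code: for each hidden-layer size, average the test accuracy over 10 runs, each with a different random 16-pattern test split. scikit-learn and a synthetic 208 × 60 stand-in dataset are assumptions, since the sonar data itself is not included here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with the sonar task's shape: 208 patterns, 60 inputs.
X, y = make_classification(n_samples=208, n_features=60, n_informative=10,
                           random_state=0)

for n_hidden in [1, 2, 3, 6, 12, 24]:       # the 0-hidden case needs a separate model
    scores = []
    for run in range(10):                   # 10 runs, each with a different test set
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=16,
                                                  random_state=run)
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=300,
                            random_state=run)
        net.fit(X_tr, y_tr)
        scores.append(net.score(X_te, y_te))
    print(f'{n_hidden:2d} hidden units: mean test accuracy {np.mean(scores):.2f}')
```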

13
Temporal patterns
  • So far: static pattern recognition, with no
    temporal information.
  • What about dynamic patterns that carry sequence
    information?
  • Temporal association, e.g. plant modelling and
    control.
  • Sequence reproduction, e.g. financial forecasting.
  • Sequence recognition, e.g. speech recognition.
  • Several architectures developed to tackle such
    problems.

14
Recurrent backpropagation
  • Proposed by Pineada (1987, 1989), Almeida (1987,
    1988).
  • Extends back-prop to arbitrary networks, as long
    as they converge to stable states. Define the
    activation of the neurons in a time-dependent way
    as
  • One way to train is as follows
  • Relax the network to find all xis.
  • Calculate the errors.
  • Relax the error-propagation ?i's.
  • Update the weights using ? wis.
  • Note that recurrent back-propagation can be used
    as an associative memory, allowing the completion
    of noisy patterns.
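A minimal sketch of the dynamics and the four training steps, in the common Pineda/Almeida formulation (the notation here is an assumption; NumPy assumed): activations relax as dx_i/dt = -x_i + g(sum_j w_ij x_j + I_i), a second linear relaxation yields the error-propagation δ_i's, and the weight update is Δw_ij = η δ_i g'(h_i) x_j:

```python
import numpy as np

def g(h):  return np.tanh(h)
def gp(h): return 1.0 - np.tanh(h) ** 2          # derivative of g

rng = np.random.default_rng(0)
N = 5                                            # small fully recurrent network
W = rng.normal(0, 0.1, (N, N))                   # weights small enough to settle
I = rng.normal(0, 0.5, N)                        # external input pattern
out = np.array([0, 1])                           # which units are outputs
target = np.array([0.5, -0.5])                   # desired fixed-point values
eta, dt = 0.1, 0.1

for step in range(200):
    # 1) Relax the network to a fixed point of  x_i = g(sum_j w_ij x_j + I_i).
    x = np.zeros(N)
    for _ in range(500):
        x += dt * (-x + g(W @ x + I))
    h = W @ x + I
    # 2) Calculate the errors (zero on non-output units).
    e = np.zeros(N); e[out] = target - x[out]
    # 3) Relax the error-propagation delta_i's (a linear fixed-point equation).
    delta = np.zeros(N)
    for _ in range(500):
        delta += dt * (-delta + W.T @ (gp(h) * delta) + e)
    # 4) Update the weights:  delta_w_ij = eta * delta_i * g'(h_i) * x_j.
    W += eta * np.outer(delta * gp(h), x)

print(x[out], target)    # the output units' fixed point approaches the target
```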

15
Learning time sequences
  • Turn the problem into a static pattern recognition
    problem (a windowing sketch follows).
  • Time-Delay Neural Networks (TDNNs).
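A small sketch (NumPy assumed) of the "turn it into a static problem" trick: slide a fixed-width window over the series, so that each pattern is a vector of past samples, here paired with the next value as the target:

```python
import numpy as np

def sliding_windows(series, width):
    """Each input is `width` past samples; the target is the next value."""
    X = np.array([series[t:t + width] for t in range(len(series) - width)])
    y = series[width:]
    return X, y

series = np.sin(np.linspace(0, 20, 500))   # stand-in time series
X, y = sliding_windows(series, width=5)
print(X.shape, y.shape)                    # (495, 5) (495,)
```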

16
Plant identification
  • Nonlinear dynamical plant identification (forward
    dynamics); a data-collection sketch follows.
  • It is also possible to learn the reverse dynamics.
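A sketch of forward-dynamics identification on a toy plant (the plant, the use of scikit-learn, and all constants are illustrative assumptions): drive the plant with random inputs, record (x_t, u_t) -> x_{t+1} transitions, and fit a network to imitate them:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def plant(x, u):
    """Toy nonlinear plant: x_{t+1} = 0.8*x_t + 0.5*tanh(u_t)."""
    return 0.8 * x + 0.5 * np.tanh(u)

# Collect transitions by exciting the plant with random control inputs.
x, inputs, targets = 0.0, [], []
for _ in range(2000):
    u = rng.uniform(-2, 2)
    x_next = plant(x, u)
    inputs.append([x, u]); targets.append(x_next)
    x = x_next

# Forward model: predicts the next state from the current state and control.
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(np.array(inputs), np.array(targets))
print(model.predict([[0.5, 1.0]])[0], plant(0.5, 1.0))   # should be close
```

For the reverse dynamics one would instead fit (x_t, x_{t+1}) -> u_t, which is only well-posed when the control can actually be recovered from the observed transition; that is the "under which conditions?" question on the next slide.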

17
  • Once the plant is identified, it can be used for
    control.
  • It is also possible to learn the reverse
    dynamics. But under which conditions?

18
The truck backer-upper
  • Design a controller to back a truck into a loading
    dock.
  • The controller receives the observed state
    x = (x, y, θ_t, θ_c) and produces the steering
    angle θ_s.

19
  • The solution proposed:
  • Train a network to emulate the truck and trailer
    dynamics.
  • Use the emulator to train the controller, so that
    the error can be passed back through the emulator
    network (a toy sketch follows).
  • http://www.stanford.edu/class/ee373b/truckbackerup
    per.pdf
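A deliberately tiny, one-dimensional sketch of the two-stage idea (not Nguyen and Widrow's actual setup; the scalar "truck", the linear emulator, and all constants are assumptions): first fit an emulator to observed transitions, then adjust the controller by passing the docking error back through the frozen emulator:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_plant(s, u):
    """Stand-in 'truck': s is the offset from the dock, u the steering command."""
    return s + 0.5 * np.tanh(u)

# Stage 1: fit an emulator  s_next ~ a*s + b*u  from observed transitions.
S = rng.uniform(-2, 2, 500); U = rng.uniform(-2, 2, 500)
A = np.column_stack([S, U])
(a, b), *_ = np.linalg.lstsq(A, true_plant(S, U), rcond=None)

# Stage 2: controller u = c*s.  Train c so the next offset goes to zero,
# back-propagating the error through the frozen emulator (chain rule).
c, eta = 0.0, 0.05
for _ in range(500):
    s = rng.uniform(-2, 2)
    u = c * s
    s_next = a * s + b * u         # prediction by the emulator, not the real truck
    grad_c = s_next * b * s        # d(0.5*s_next**2)/dc, through the emulator
    c -= eta * grad_c

print('learned steering gain c =', round(c, 2))   # roughly -a/b, driving s toward 0
```

The same chain-rule idea, applied through a multi-step, multi-layer emulator, is what lets the error at the dock train the controller in the original work.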

20
  • Train the network to emulate the truck dynamics.

21
  • C: Controller
  • T: Truck and trailer emulator

22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
Protein secondary structure
  • Done by Qian and Sejnowski (1988). Similar to
    NETtalk. The network takes as input a moving
    window of 13 amino acids and produces a prediction
    of α-helix, β-sheet, or other for the central
    residue of the window.
  • After training, the network achieved 62% accuracy
    on the test set, compared to 53% for the best
    alternative method.

27
Hyphenation
  • Done by Brunak and Lautrup (1989). The network
    finds hyphenation points for words in languages
    that have context-dependent rules, such as Danish
    and German.

28
Navigation of a car (Pomerleau 1989)
  • Done by Pomerleau (1989). The network takes inputs
    from a 34×36 video image and a 7×36 range finder.
    Output units represent "drive straight", "turn
    left" or "turn right". After about 40 training
    passes over 1200 road images, the car drove around
    the CMU campus at 5 km/h (using a small
    workstation on the car). This was almost twice the
    speed of any other non-NN algorithm at the time.

29
Backgammon
  • Done by Tesauro and Sejnowski (1988). A version of
    "Neurogammon", trained by an expert player,
    defeated all other programs (but not the human
    champion) at the Computer Olympiad in London,
    1989.
  • TD-Gammon is a neural network that trains itself
    to be an evaluation function for the game of
    backgammon by playing against itself and learning
    from the outcome.
  • http://researchweb.watson.ibm.com/massive/tdl.html

30
Criticism of Back-propagation
  • Just a statistical method, and one that is not
    very well characterized.
  • Not biologically plausible.
  • Does not have much to say about how the brain
    works.
  • Only works for well-crafted, supervised problems.

31
Closing remarks
  • "One reason why progress has been so slow in this
    field is that researchers unfamiliar with its
    history have continued to make the same mistakes
    that others have made before them. We believe
    this realm of work to be immensely important and
    rich, but we expect its growth to require a
    degree of critical analysis that its more
    romantic advocates have always been reluctant to
    pursue - perhaps because the spirit of
    connectionism seems itself to go somewhat against
    the grain of analytic rigor."
  • Prologue to the 1988 edition of Perceptrons
    (Minsky and Papert)

32
What's coming next?
  • The Hopfield Model
  • Boltzmann Machine
  • Simulated Annealing
  • Competitive learning
  • Kohonen's self-organizing nets
  • Principal component analysis