Connectionist Models - PowerPoint PPT Presentation

About This Presentation

Slides: 31

Provided by: gatew220

Transcript and Presenter's Notes

1
Connectionist Models

Motivated by the brain rather than the mind
A large number of very simple processing elements
A large number of weighted connections between
elements (network)
Parallel, distributed control
Emphasis on learning internal representations
automatically

2
Hopfield Networks

Undirected network w/ integer weights
(actually a bidirectional network: the weight
from node a to node b is the same as the weight
from b to a)
Nodes are units which are on or off
Given input, use Parallel Relaxation
For each node, sum up the weights of all adjacent
on nodes
If sum > 0 then turn the node on, else turn the
node off
Update all nodes at once (in parallel) so that
turning on a node will have an effect during the
next cycle
Continue until the network is stable
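Parallel relaxation as described above can be sketched in Python. The function name `parallel_relaxation`, the `max_cycles` guard and the 4-node weight matrix are hypothetical. Fully parallel (synchronous) updates can make a network flip between two states instead of settling, so the sketch stops with an error rather than looping forever.

```python
from typing import List, Sequence


def parallel_relaxation(
    weights: Sequence[Sequence[float]],
    state: Sequence[int],
    max_cycles: int = 100,
) -> List[int]:
    """Update all 0/1 nodes at once until no node changes (a steady state)."""
    n = len(state)
    # Hopfield weights are symmetric: the weight a -> b equals the weight b -> a.
    if any(weights[a][b] != weights[b][a] for a in range(n) for b in range(n)):
        raise ValueError("weights must be symmetric")
    current = list(state)
    for _ in range(max_cycles):
        # Every node reads only the *current* states, so a node turned on in
        # this cycle affects its neighbours in the next cycle.
        new = [
            1 if sum(weights[i][j] for j in range(n) if j != i and current[j] == 1) > 0 else 0
            for i in range(n)
        ]
        if new == current:  # nothing changed: the network is stable
            return current
        current = new
    # Synchronous updates can oscillate between two states indefinitely.
    raise RuntimeError("no steady state reached within max_cycles")


# Hypothetical 4-node network: nodes 0-2 support each other, node 3 inhibits them.
W = [
    [0, 1, 1, -2],
    [1, 0, 1, -2],
    [1, 1, 0, -2],
    [-2, -2, -2, 0],
]
print(parallel_relaxation(W, [1, 0, 0, 0]))  # [1, 1, 1, 0]
```

Starting from node 0 alone, the net passes through [0, 1, 1, 0] before settling in the steady state [1, 1, 1, 0].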

3
More on Hopfield Nets

A given network will have some steady states --
those states that will always be reached after
parallel relaxation
Any input will settle into a steady state
Each differing state might be a representation
for a value/class
The Hopfield net thus becomes an associative or
content-addressable memory with distributed
representation and some fault tolerance
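To illustrate content-addressable recall, the sketch below hand-installs weights for two stored patterns (nodes 0-2 on, and nodes 3-5 on) and then relaxes incomplete or noisy inputs. The +1/-1 weighting rule, the helper names `install_weights` and `relax`, and the patterns are hypothetical choices for illustration, not a method from the slides.

```python
from typing import List, Sequence, Set


def install_weights(memories: Sequence[Set[int]], n: int) -> List[List[int]]:
    """Hand-install symmetric weights: +1 between nodes of the same memory, -1 otherwise."""
    w = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                same = any(a in m and b in m for m in memories)
                w[a][b] = 1 if same else -1
    return w


def relax(w: Sequence[Sequence[int]], state: Sequence[int], max_cycles: int = 100) -> List[int]:
    """Parallel relaxation: a node turns on iff the weights from its 'on' neighbours sum > 0."""
    n = len(state)
    current = list(state)
    for _ in range(max_cycles):
        new = [1 if sum(w[i][j] for j in range(n) if j != i and current[j]) > 0 else 0
               for i in range(n)]
        if new == current:
            return current
        current = new
    raise RuntimeError("no steady state reached")


W = install_weights([{0, 1, 2}, {3, 4, 5}], 6)
print(relax(W, [1, 1, 0, 0, 0, 0]))  # missing bit -> [1, 1, 1, 0, 0, 0]
print(relax(W, [1, 1, 1, 0, 0, 1]))  # stray bit   -> [1, 1, 1, 0, 0, 0]
print(relax(W, [0, 0, 0, 1, 1, 0]))  # partial cue -> [0, 0, 0, 1, 1, 1]
```

A partial or slightly corrupted pattern is enough to retrieve the whole stored pattern, which is the fault tolerance the slide refers to.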

4
Learning in Neural Nets

Note that Hopfield Networks are built with the
weights installed; they do not learn these
weights
Other approaches allow a given network to learn
its own weights
Perceptrons (earliest attempt at NN)
Backpropagation Networks
Boltzmann Machines

5
Perceptrons

Earliest NN research, begun in 1962
A Perceptron has n inputs and n+1 edges/weights
(where input 0 is always a 1). Weights are real
numbers between -1 and 1
Each input is on or off
A Perceptron sums up input × weight over all its
inputs; if sum > 0 then the perceptron is on
(1); if sum ≤ 0 it is off (0)
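The firing rule can be sketched as a small Python function. It assumes, since the original slide text is cut off after "if sum", that a sum of 0 or less turns the perceptron off. The function name and the AND weights are hypothetical.

```python
from typing import Sequence


def perceptron_output(weights: Sequence[float], inputs: Sequence[int]) -> int:
    """Return 1 (on) if the weighted sum is positive, else 0 (off).

    weights has n+1 entries; weights[0] pairs with the constant input x0 = 1.
    """
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected n+1 weights for n inputs")
    # weights[0] multiplies the always-on input 0, so it acts as a bias.
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0


# Hypothetical weights (all within -1..1) that make a 2-input perceptron compute AND.
and_weights = [-0.8, 0.5, 0.5]
print([perceptron_output(and_weights, [a, b]) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```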

6
Combining Perceptrons

As neurons connect to other neurons so that the
output of one is an input to another, perceptrons
can be linked together
Some perceptrons might be elementary feature
detectors, others might be decision makers based
on individually detected features. See fig.
18.8 p. 495

7
Fixed-Increment Perceptron Learning Algorithm

Given a classification problem with n inputs and
two outputs (yes or no)
Determine the weights for each of the n inputs
such that a yes is provided if a given input is
in the class being learned and a no is provided
otherwise
Use a training set of positive and negative
examples
Alter weights so that the system performs better
and better

8
Algorithm

Given a perceptron of n+1 inputs/weights (0th
input is always 1)
Initialize weights to random reals in -1 .. 1
Iterate through the training set, accumulating all
misclassified examples
Compute a vector S of all misclassified inputs:
add to S all inputs for which the perceptron
failed to fire and subtract from S all inputs for
which it fired but shouldn't have
Modify weights by adding S × scale factor
Scale factor determines how quickly it might
learn (also how quickly it might err)
Repeat until no example is misclassified
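A minimal Python sketch of the batch fixed-increment procedure above. Assumptions not taken from the slides: targets are coded 1 for yes and 0 for no, the perceptron fires when its weighted sum is > 0, training stops when a pass finds no misclassified example, and `max_passes`, `scale=0.1`, the seed and the OR training set are illustrative choices. The names `train_perceptron` and `output` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def output(w: Sequence[float], x: Sequence[int]) -> int:
    # x already includes the constant input x[0] = 1
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train_perceptron(
    examples: Sequence[Tuple[Sequence[int], int]],
    scale: float = 0.1,
    max_passes: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Batch fixed-increment learning; examples are (inputs, target), target 1 = yes, 0 = no."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]  # w[0] weights the constant input
    for _ in range(max_passes):
        s = [0.0] * (n + 1)  # the vector S of accumulated misclassified inputs
        misclassified = 0
        for inputs, target in examples:
            x = [1, *inputs]
            if output(w, x) == target:
                continue
            misclassified += 1
            # add the input if it failed to fire, subtract it if it fired wrongly
            sign = 1 if target == 1 else -1
            s = [si + sign * xi for si, xi in zip(s, x)]
        if misclassified == 0:
            return w
        w = [wi + scale * si for wi, si in zip(w, s)]
    raise RuntimeError("did not converge (data may not be linearly separable)")


OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(OR)
print([output(w, [1, *x]) for x, _ in OR])  # [0, 1, 1, 1]
```

Because OR is linearly separable, the loop stops with every example classified correctly; on XOR it never finds an error-free pass and raises after `max_passes`.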

9
Perceptron Convergence Theorem

Guarantees that a perceptron will find a set of
weights that correctly classifies every training example, as long as
the inputs are linearly separable (see fig 18.11
p. 499)
Linear separability is not limited to two
dimensions but can work in any number of
dimensions: n dimensions require n+1 inputs/weights

10
Linear Separability

Are all problems linearly separable? No.
Consider XOR, which needs two linear decision
surfaces
No single perceptron can learn XOR, a very basic
function
Therefore a single perceptron cannot learn
many things, whether complex or simple
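The impossibility can be checked directly. Take a perceptron with weight $w_0$ on the constant input 1 and weights $w_1, w_2$ on the inputs, firing when $w_0 + w_1x_1 + w_2x_2 > 0$. Computing XOR would require all four of these conditions:

```latex
\begin{aligned}
(x_1,x_2)=(0,0)\mapsto 0:&\quad w_0 \le 0\\
(x_1,x_2)=(1,0)\mapsto 1:&\quad w_0 + w_1 > 0\\
(x_1,x_2)=(0,1)\mapsto 1:&\quad w_0 + w_2 > 0\\
(x_1,x_2)=(1,1)\mapsto 0:&\quad w_0 + w_1 + w_2 \le 0
\end{aligned}
```

Adding the second and third conditions gives $2w_0 + w_1 + w_2 > 0$, while adding the first and fourth gives $2w_0 + w_1 + w_2 \le 0$. The two cannot both hold, so no choice of weights works.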

11
Solving XOR with Perceptrons

We can have perceptrons solve the XOR problem by
combining two perceptrons together (fig 18.13 p.
500)
However, the combining weight -9.0 cannot be
learned by the previous algorithm
The perceptron learning algorithm gives no way to make
combined perceptrons learn
Therefore, perceptrons on their own are not very useful
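One way to wire two perceptrons into XOR is sketched below: a hidden perceptron detects "both inputs on", and the output perceptron computes x1 OR x2 unless that detector vetoes it through a large negative weight. Only the -9.0 combining weight echoes the slide; the other weights and the layout are hypothetical and are not read off fig 18.13.

```python
from typing import Sequence


def step(weights: Sequence[float], inputs: Sequence[float]) -> int:
    """Perceptron unit: weights[0] pairs with the constant input 1; on iff the sum is > 0."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0


# Hypothetical hand-set weights; only the -9.0 combining weight comes from the slide.
AND_UNIT = [-1.5, 1.0, 1.0]           # fires only when both inputs are on
OUTPUT_UNIT = [-0.5, 1.0, 1.0, -9.0]  # sees x1, x2 and the AND unit's output


def xor(x1: int, x2: int) -> int:
    h = step(AND_UNIT, [x1, x2])
    return step(OUTPUT_UNIT, [x1, x2, h])


print([xor(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

These weights were set by hand; the fixed-increment algorithm has no rule for assigning blame to the hidden unit, which is why it cannot find them.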

12
Backpropagation networks

The XOR problem killed Neural Network research
for about 15 years
In the 1980s, researchers revived NN research by
using a new learning algorithm and network known
as Backpropagation
Unlike perceptrons, these are multilayered
networks that seem to have no limit on what
they can learn

13
Multilayered Networks

Input layer
Hidden layer 1
Hidden layer 2 ...
Hidden layer N
Output layer
Each layer is fully connected to its preceding
and succeeding layers only
Every edge has its own weight

14
Feedforward Activation

Input: if the feature that the node represents
is present, the node is set to 1, else 0
Multiply each input value by its weight and send the value to the
next layer
Output of a node = $1/(1 + e^{-\mathrm{sum}})$ (sigmoid
function, see fig 18.16 p 503)
The sigmoid function permits a grey area where
a node can have some degree of uncertainty
(unlike the perceptron where all nodes were
either 0 or 1)
Each node at next layer will compute the sigmoid
function and propagate values to the next layer
Propagate these values forward until output is
achieved
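The forward pass can be sketched in Python. The nested-list layout of `layers`, the function names and the example weights are assumptions; like the slide, the sketch has no bias terms.

```python
import math
from typing import List, Sequence


def sigmoid(total: float) -> float:
    """Output of a node: 1 / (1 + e^(-sum))."""
    return 1.0 / (1.0 + math.exp(-total))


def feedforward(layers: Sequence[Sequence[Sequence[float]]], inputs: Sequence[float]) -> List[float]:
    """Propagate 0/1 inputs forward, layer by layer, to the output layer.

    layers[k][j] lists the incoming weights of node j in layer k + 1, one weight per
    node of the preceding layer (fully connected; no bias terms, as on the slide).
    """
    activations = list(inputs)
    for layer in layers:
        # Each node sums input * weight over the previous layer, then applies the sigmoid.
        activations = [
            sigmoid(sum(w * a for w, a in zip(node_weights, activations)))
            for node_weights in layer
        ]
    return activations


# Hypothetical 2 x 2 x 1 network.
net = [
    [[0.5, -0.4], [0.3, 0.8]],  # hidden layer: two nodes, two incoming weights each
    [[1.0, -1.0]],              # output layer: one node, two incoming weights
]
print(feedforward(net, [1, 0]))
```

Every output lies strictly between 0 and 1, which is the "grey area" the slide describes.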

15
Backpropagation

To train a multilayered network
randomly initialize all weights -1..1
choose a training example and use feedforward
if correct, backpropagate reward by increasing
weights that led to correct output
if incorrect, backpropagate punishment by
decreasing weights that led to incorrect output

16
Backpropagation continued

Continue this for each example in the training set
One pass through the training set is 1 epoch
After 1 complete epoch, repeat process
Repeat until network has reached a stable state
(i.e. changes to weights are always less than
some minimum amount that is trivial)
Training may take 1000s or more epochs! (even
millions)
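The training loop on these two slides can be sketched in Python for a network with one hidden layer. The slide phrases the updates informally as reward and punishment; the sketch uses the standard backpropagation rule instead, adjusting every weight in proportion to its share of the squared output error (gradient descent). Assumptions, all hypothetical: one sigmoid output, a bias weight on a constant input of 1 for every node, `rate=0.5`, three hidden nodes, a fixed random seed, and XOR as training data. Whether a given run reaches a low error depends on the starting weights.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], float]


def sigmoid(total: float) -> float:
    return 1.0 / (1.0 + math.exp(-total))


def train(
    examples: Sequence[Example],
    hidden: int = 3,
    rate: float = 0.5,
    max_epochs: int = 10000,
    tolerance: float = 1e-5,
    seed: int = 1,
) -> Tuple[float, float, int, Callable[[Sequence[float]], float]]:
    """Online backprop for an I x H x 1 sigmoid net; returns (initial error, final error, epochs, predict)."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Weights start in -1..1; each row's last weight pairs with a constant input of 1 (bias).
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x: Sequence[float]) -> Tuple[List[float], List[float], float]:
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden] + [1.0]
        return xb, hb, sigmoid(sum(w * v for w, v in zip(w_out, hb)))

    def total_error() -> float:
        return sum(0.5 * (t - forward(x)[2]) ** 2 for x, t in examples)

    initial = total_error()
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        biggest_change = 0.0
        for x, target in examples:  # one pass over the training set is one epoch
            xb, hb, y = forward(x)
            delta_out = (target - y) * y * (1 - y)  # output error times the sigmoid's slope
            # Blame each hidden node via its (pre-update) weight into the output node.
            delta_hidden = [delta_out * w_out[j] * hb[j] * (1 - hb[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                change = rate * delta_out * hb[j]
                w_out[j] += change
                biggest_change = max(biggest_change, abs(change))
            for j in range(hidden):
                for i in range(n_in + 1):
                    change = rate * delta_hidden[j] * xb[i]
                    w_hidden[j][i] += change
                    biggest_change = max(biggest_change, abs(change))
        if biggest_change < tolerance:  # "stable": every weight change is trivial
            break
    return initial, total_error(), epoch, lambda x: forward(x)[2]


XOR = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]
initial, final, epochs, predict = train(XOR)
print(f"error {initial:.3f} -> {final:.3f} after {epochs} epochs")
print([round(predict(x), 2) for x, _ in XOR])
```

Changing `seed` changes the starting weights, and with them how many epochs are needed and whether the run ends in a local minimum.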

17
Perceptron vs. Backprop learning

Perceptrons are guaranteed to reach a stable
state if the concept is linearly separable
Multilayered networks have no guarantee of
reaching the best state (the global
minimum); they may get stuck in a local minimum
(recall hill climbing)
However, Multilayered nets can learn a much
larger range of things

18
Variations to Backprop

Momentum Factor - add a fraction of the previous
weight change to each new change, so weights keep
moving in a consistent direction and oscillations
are damped
Simulated Annealing - change activation function
to $p = 1/(1 + e^{-\mathrm{sum}/T})$ where T is temperature
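Both variations can be sketched as small Python helpers. The sketch assumes the common formulation of momentum, in which a fraction (`momentum`, 0.9 by default, an assumed value) of the previous weight change is added to the current one; `gradient_step` stands for whatever change plain backprop would make. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def momentum_step(
    weights: Sequence[float],
    gradient_step: Sequence[float],
    previous_change: Sequence[float],
    momentum: float = 0.9,
) -> Tuple[List[float], List[float]]:
    """New change = this step's backprop change + momentum * previous change."""
    change = [g + momentum * p for g, p in zip(gradient_step, previous_change)]
    return [w + c for w, c in zip(weights, change)], change


def sigmoid_with_temperature(total: float, temperature: float) -> float:
    """p = 1 / (1 + e^(-sum/T)); high T flattens the curve, low T approaches a step."""
    return 1.0 / (1.0 + math.exp(-total / temperature))


w, c = momentum_step([0.0], [0.1], [0.0])
w, c = momentum_step(w, [0.1], c)  # same backprop step again: the change grows to about 0.19
print(w, c)
print(sigmoid_with_temperature(1.0, 100.0), sigmoid_with_temperature(1.0, 0.01))
```

Repeated steps in the same direction build up speed, while a reversal partly cancels the carried-over change.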

19
Sensitivity of initial conditions

It turns out that the initialized random weights
can play a dramatic role in learning
One initial set of weights may require many times
as many epochs as another
However, one set of weights that leads to quick
learning will probably not be useful for a
different network!
An IxHxO network will differ from an Ix(H+1)xO
network meaning the new network will require
training on its own

20
Generalization

Given a training set, a NN could learn to
generalize to the entire class
However, the more the system sees the training
set, the more it will learn just the training
set!
One must be careful that the system learns
but does not overlearn
How can one control this?

21
Boltzmann Machines

Hopfield networks can solve a variety of
constraint satisfaction (optimization) problems
However, Hopfield Nets reach a stable state
(a local minimum) instead of an optimal solution
(the global minimum)
Use simulated annealing instead of parallel
relaxation:
$p = 1/(1 + e^{-\Delta E/T})$, where $\Delta E$ is the node's energy gap
(the summed weights from its active neighbours) and T is temperature
This computes the probability that a node
should activate
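A sketch of the stochastic update with a cooling schedule. It assumes that a node's energy gap is the sum of the weights from its active neighbours (no threshold terms), that one randomly chosen node is updated at a time, and that the temperature schedule, update counts, seed and 4-node weights are illustrative; none of these values comes from the slides, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def activation_probability(energy_gap: float, temperature: float) -> float:
    """p = 1 / (1 + e^(-dE/T)), the probability that a node turns on."""
    # Clamp the exponent so math.exp cannot overflow at very low temperatures.
    z = max(-700.0, min(700.0, -energy_gap / temperature))
    return 1.0 / (1.0 + math.exp(z))


def anneal(
    weights: Sequence[Sequence[float]],
    state: Sequence[int],
    temperatures: Sequence[float] = (10.0, 5.0, 2.0, 1.0, 0.5, 0.1),
    updates_per_temperature: int = 100,
    seed: int = 0,
) -> List[int]:
    """Stochastic relaxation with a falling temperature (simulated annealing)."""
    rng = random.Random(seed)
    n = len(state)
    current = list(state)
    for t in temperatures:  # cool step by step from high T to low T
        for _ in range(updates_per_temperature):
            i = rng.randrange(n)  # update one randomly chosen node at a time
            gap = sum(weights[i][j] for j in range(n) if j != i and current[j] == 1)
            current[i] = 1 if rng.random() < activation_probability(gap, t) else 0
    return current


# Hypothetical network: nodes 0-2 excite each other, node 3 inhibits them.
W = [
    [0, 5, 5, -5],
    [5, 0, 5, -5],
    [5, 5, 0, -5],
    [-5, -5, -5, 0],
]
print(anneal(W, [0, 0, 0, 1]))
```

At high temperature nodes flip almost at random, which lets the network escape local minima; as T falls the updates become nearly deterministic, like parallel relaxation.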

22
Boltzmann Machines cont.

Boltzmann Machines can learn weights w/ a
variation of backprop (although much more
complicated)
In a Boltzmann Machine, we can actually assign
features or values to each node (that is, nodes
stand for something, unlike normal NNs)
Boltzmann machines combine all of these ideas to
solve optimization problems like travelling
salesman (see fig 18.21 p 517)

23
Uses of NNs

NNs are knowledge-poor and have internal
representations that are meaningless to us
However, NNs can learn classifications and
recognitions
Some useful applications include:
Pattern recognizers, associative memories,
pattern transformers, dynamic transformers

24
Particular Domains

Speech recognition (vowel distinction)
Visual Recognition
Combinatorial problems
Motor-type problems (including vehicular control)
Classification-type problems with reasonably
sized inputs
Game playing (backgammon)

25
Advantages of NN

Able to handle fuzziness
Able to handle degraded inputs and ambiguity
Able to learn their own internal representations
and to learn new things
Use distributed representations
Capable of supervised and unsupervised learning
Easy to build

26
Disadvantages of NNs

Lengthy training times
Unpredictable nature
Inability to handle temporal issues
Fixed-size input restricts dynamic changes in the
problem
Are not process-oriented, cannot solve
non-recognition problems
Cannot use symbolic knowledge
May not generalize if training set is biased

27
Recurrent Networks

In order to handle the problem of
temporal-dependent inputs, recurrent networks
have been tried
Essentially the output of the network is wrapped
around and used as additional inputs
Temporal dependence occurs in many problems;
consider driving: how hard should one press on the
brake? The answer might change from one moment
to the next
See figures 18.22, 18.23 p 518-519 for examples
These networks have other problems, notably
learning with backprop, where an output error can
be propagated to the next iteration
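The wrap-around idea can be sketched with a single sigmoid unit whose previous output is fed back as an extra input. The function name `run_recurrent` and the weights are hypothetical. With a constant input, the outputs still change from step to step because each step also sees the previous output; with a feedback weight of 0 they would stay the same.

```python
import math
from typing import List, Sequence


def sigmoid(total: float) -> float:
    return 1.0 / (1.0 + math.exp(-total))


def run_recurrent(
    inputs: Sequence[float], w_input: float, w_feedback: float, bias: float = 0.0
) -> List[float]:
    """Single-unit recurrent net: the previous output is fed back as an extra input."""
    outputs = []
    previous = 0.0  # no output exists before the first time step
    for x in inputs:
        y = sigmoid(w_input * x + w_feedback * previous + bias)
        outputs.append(y)
        previous = y  # wrap the output around as input for the next time step
    return outputs


print(run_recurrent([1.0, 1.0, 1.0], w_input=0.5, w_feedback=2.0))
```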

28
More on Recurrent Networks

These networks are often applied to problems that
not only have input-output pairs, but internal
states
These are sometimes referred to as mental models
Training is done for two different things, first
the models or internal states are learned, and
then the individual items within each state
Consider a recurrent network which is trained to
perform speech recognition: it might first learn
different kinds of speech phenomena that are not
apparent (visible) to us, and then learn to
interpret these as the different phonetic units
as the output (see figure 18.23)

29
NN/brain comparison - flaws

NN nodes are not neurons; NN nodes are much too
primitive
Lack of useful input (a few bits rather than
millions or billions of bits of input)
Lack of nodes (brain has billions of neurons, NN
usually have 10-100)
Evolution vs. Backprop learning -- both the type
of learning and the time taken

30
Genetic Algorithms

Features are stored as a vector
Backprop-like algorithm used to alter vector
values based on training examples
Uses techniques such as mutation and heredity
to alter new vectors in an attempt to evolve
better representations for the next iteration
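A minimal genetic algorithm over bit vectors, sketched in Python. It keeps the fitter half of the population, breeds replacements by one-point crossover (heredity) and random bit flips (mutation); the fitness function (number of 1 bits), population size, mutation rate, seed and generation count are hypothetical. Because the fitter half is carried over unchanged, the best fitness never decreases from one generation to the next.

```python
import random
from typing import Callable, List, Tuple


def evolve(
    fitness: Callable[[List[int]], float],
    length: int,
    population_size: int = 20,
    generations: int = 50,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Evolve bit vectors; returns the best vector and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    history = [max(fitness(v) for v in population)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # the fitter half survives unchanged
        children: List[List[int]] = []
        while len(parents) + len(children) < population_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # heredity: one-point crossover of two parents
            child = mom[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
        history.append(max(fitness(v) for v in population))
    best = max(population, key=fitness)
    return best, history


# Toy fitness (hypothetical): count of 1 bits, so the ideal vector is all ones.
best, history = evolve(sum, length=20)
print(sum(best), history[0], history[-1])
```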
