Connectionist Models - PowerPoint PPT Presentation

Slides: 31
Provided by: gatew220
Transcript and Presenter's Notes

Title: Connectionist Models


1
Connectionist Models
  • Motivated by the brain rather than the mind
  • A large number of very simple processing elements
  • A large number of weighted connections between
    elements (network)
  • Parallel, distributed control
  • Emphasis on learning internal representations
    automatically

2
Hopfield Networks
  • Undirected network w/ integer weights
  • (actually a bidirectional network: the weight
    from node a to node b is the same as the weight
    from b to a)
  • Nodes are units which are on or off
  • Given input, use Parallel Relaxation
  • For each node, sum up the weights of all adjacent
    nodes that are on
  • If sum > 0 then turn the node on, else turn the
    node off
  • Update all nodes at once (in parallel) so that
    turning on a node will have an effect during the
    next cycle
  • Continue until the network is stable
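
The relaxation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `relax`, the `max_cycles` guard, and the 4-node weight matrix `W` are all assumptions made for the example.

```python
def relax(weights, state, max_cycles=100):
    """Parallel relaxation of a small Hopfield-style network (sketch).

    weights: symmetric matrix, weights[a][b] == weights[b][a], zero diagonal.
    state:   list of 0/1 values, one per node.
    """
    n = len(state)
    for _ in range(max_cycles):
        # Every node reads the old state, so a change only takes effect
        # during the next cycle.
        new_state = [
            1 if sum(weights[i][j] for j in range(n) if state[j]) > 0 else 0
            for i in range(n)
        ]
        if new_state == state:   # nothing changed: the network is stable
            return state
        state = new_state
    # Assumption: give up after max_cycles, since simultaneous updates
    # can flip back and forth for some inputs.
    return state


# Hypothetical network: nodes 0 and 1 support each other, nodes 2 and 3
# support each other, and the two pairs inhibit one another.
W = [
    [0, 2, -1, -1],
    [2, 0, -1, -1],
    [-1, -1, 0, 2],
    [-1, -1, 2, 0],
]
result = relax(W, [1, 1, 1, 0])   # a "noisy" version of the pattern 1,1,0,0
print(result)
```

Starting from the noisy input, node 2 is switched off after one cycle and the network then stops changing.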

3
More on Hopfield Nets
  • A given network will have some steady states --
    those states that will always be reached after
    parallel relaxation
  • Any input will settle in a steady state
  • Each differing state might be a representation
    for a value/class
  • Hopfield net becomes an associative or
    content-addressable memory with distributed
    representation and some fault tolerance

4
Learning in Neural Nets
  • Note that Hopfield Networks are built with the
    weights installed; they do not learn these
    weights
  • Other approaches allow a given network to learn
    its own weights
  • Perceptrons (earliest attempt at NN)
  • Backpropagation Networks
  • Boltzmann Machines

5
Perceptrons
  • Earliest NN research, begun in 1962
  • A Perceptron has n inputs and n+1 edges/weights
    (where input 0 is always a 1). Weights are real
    numbers between -1 and 1
  • Each input is on or off
  • A Perceptron sums up the totals of each
    input × weight; if sum > 0 then the perceptron is
    on (1), otherwise it is off (0)
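
As a minimal sketch of the unit described above: the function name and the example weights are hypothetical, and the code assumes the unit is off (0) whenever the weighted sum is not greater than 0.

```python
def perceptron_output(weights, inputs):
    """weights has n+1 entries; weights[0] pairs with the constant input 1."""
    total = weights[0] * 1 + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0   # assumption: anything not above 0 is "off"


# Hypothetical weights that make a two-input unit behave like logical AND.
and_weights = [-0.75, 0.5, 0.5]
outputs = [perceptron_output(and_weights, [a, b]) for a in (0, 1) for b in (0, 1)]
print(outputs)
```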

6
Combining Perceptrons
  • As neurons connect to other neurons so that the
    output of one is an input to another, perceptrons
    can be linked together
  • Some perceptrons might be elementary feature
    detectors, others might be decision makers based
    on individually detected features. See fig.
    18.8 p. 495

7
Fixed-Increment Perceptron Learning Algorithm
  • Given a classification problem with n inputs and
    two outputs (yes or no)
  • Determine the weights for each of the n inputs
    such that a yes is provided if a given input is
    in the class being learned and a no is provided
    otherwise
  • Use a training set of positive and negative
    examples
  • Alter weights so that the system performs better
    and better

8
Algorithm
  • Given a perceptron of n+1 inputs/weights (0th
    input is always 1)
  • Initialize each weight to a random real value
    between -1 and 1
  • Iterate through a training set accumulating all
    misclassified examples
  • Compute a vector S of all misclassified inputs --
    add to S all inputs in which the perceptron
    failed to fire and subtract from S all inputs in
    which it fired but shouldn't have
  • Modify weights by adding S × scale factor
  • Scale factor determines how quickly it might
    learn (also how quickly it might err)
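
The algorithm above might look like the following sketch. The names `fires` and `train_perceptron`, the `max_passes` cap, the fixed random seed, and the OR training set are assumptions added for illustration.

```python
import random


def fires(weights, inputs):
    """True if the weighted sum (including the constant 0th input) exceeds 0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)) > 0


def train_perceptron(examples, scale=0.1, max_passes=1000, seed=0):
    """examples: list of (inputs, target) pairs with target 1 (yes) or 0 (no)."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n + 1)]
    for _ in range(max_passes):
        s = [0.0] * (n + 1)          # the vector S from the slide
        mistakes = 0
        for inputs, target in examples:
            x = [1] + list(inputs)   # prepend the always-1 input
            fired = fires(weights, inputs)
            if target and not fired:         # failed to fire: add to S
                s = [si + xi for si, xi in zip(s, x)]
                mistakes += 1
            elif fired and not target:       # fired wrongly: subtract from S
                s = [si - xi for si, xi in zip(s, x)]
                mistakes += 1
        if mistakes == 0:
            break
        weights = [w + scale * si for w, si in zip(weights, s)]
    return weights


# Logical OR is linearly separable, so it serves as a small training set.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
learned = train_perceptron(or_examples)
print([int(fires(learned, x)) for x, _ in or_examples])
```

A larger `scale` moves the weights further on each pass, which is the speed-versus-error trade-off the last bullet mentions.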

9
Perceptron Convergence Theorem
  • Guarantees that a perceptron will find a set of
    weights to properly classify anything as long as
    the inputs are linearly separable (see fig 18.11
    p. 499)
  • Linear separability is not limited to two
    dimensions but can work in any number of
    dimensions: n dimensions → n+1 inputs/weights

10
Linear Separability
  • Are all problems linearly separable? No.
  • Consider XOR, which requires two linear decision
    surfaces
  • No single perceptron can learn XOR - a very basic
    function
  • Therefore perceptrons will not be able to learn
    many things either complex or simple
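
One way to see the difference is a brute-force search over a small grid of weights. This is only an illustration, not a proof, and the helper `find_weights` and the grid are assumptions: the search finds weights for AND but none for XOR.

```python
from itertools import product


def find_weights(truth_table, grid):
    """Return the first (w0, w1, w2) on the grid that reproduces the table."""
    for w0, w1, w2 in product(grid, repeat=3):
        if all((1 if w0 + w1 * a + w2 * b > 0 else 0) == out
               for (a, b), out in truth_table.items()):
            return (w0, w1, w2)
    return None


grid = [i / 4 for i in range(-4, 5)]   # -1.0, -0.75, ..., 1.0
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_weights = find_weights(AND, grid)
xor_weights = find_weights(XOR, grid)
print(and_weights is not None, xor_weights is None)
```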

11
Solving XOR with Perceptrons
  • We can have perceptrons solve the XOR problem by
    combining two perceptrons together (fig 18.13 p.
    500)
  • However, the combining weight -9.0 cannot be
    learned by the previous algorithm
  • There is no way to make combined perceptrons
    learn
  • Therefore, perceptrons are not very useful
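
A hand-wired combination can be sketched as follows. The weights here are hypothetical values chosen for the example and are not the ones in the cited figure: a hidden unit detects "both inputs on", and a strong negative weight from it vetoes the output unit.

```python
def unit(weights, inputs):
    """Threshold unit: weights[0] pairs with the constant input 1."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0


# Hypothetical, hand-chosen weights (not learned).
HIDDEN_AND = [-1.5, 1.0, 1.0]        # fires only when both inputs are on
OUTPUT = [-0.5, 1.0, 1.0, -2.0]      # the last weight is the inhibitory link


def xor(a, b):
    h = unit(HIDDEN_AND, [a, b])
    return unit(OUTPUT, [a, b, h])


table = [xor(a, b) for a in (0, 1) for b in (0, 1)]
print(table)
```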

12
Backpropagation networks
  • The XOR problem killed Neural Network research
    for about 15 years
  • In the 1980s, researchers revived NN research by
    using a new learning algorithm and network known
    as Backpropagation
  • Unlike perceptrons, these are multilayered
    networks that seem to have no limitation on what
    they can learn

13
Multilayered Networks
  • Input layer
  • Hidden layer 1
  • Hidden layer 2 ...
  • Hidden layer N
  • Output layer
  • Each layer is fully connected to its preceding
    and succeeding layers only
  • Every edge has its own weight

14
Feedforward Activation
  • Input: if the feature that the node represents
    is present, the node is set to 1, else 0
  • Multiply input value × weight and send value to
    next layer
  • Output of a node = 1/(1 + e^(-sum)) (sigmoid
    function, see fig 18.16 p 503)
  • The sigmoid function permits a grey area where
    a node can have some degree of uncertainty
    (unlike the perceptron where all nodes were
    either 0 or 1)
  • Each node at next layer will compute the sigmoid
    function and propagate values to the next layer
  • Propagate these values forward until output is
    achieved
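
A minimal sketch of the forward pass, assuming each layer is stored as a list of per-node weight lists and that there are no bias inputs (the slides do not mention any for these networks). The layer layout and the example weights are hypothetical.

```python
import math


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def feedforward(layers, inputs):
    """layers[k][j] holds the weights on the edges into node j of layer k+1."""
    values = list(inputs)
    for layer in layers:
        # Each node sums input value x weight, then applies the sigmoid.
        values = [
            sigmoid(sum(w * v for w, v in zip(node_weights, values)))
            for node_weights in layer
        ]
    return values


# Hypothetical 2-input, 2-hidden, 1-output network.
layers = [
    [[0.5, -0.4], [0.9, 0.2]],   # input layer -> hidden layer
    [[-0.3, 0.8]],               # hidden layer -> output layer
]
output = feedforward(layers, [1, 0])
print(output)
```

Because of the sigmoid, every node's value lies strictly between 0 and 1, which is the grey area the slide refers to.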

15
Backpropagation
  • To train a multilayered network
  • randomly initialize all weights -1..1
  • choose a training example and use feedforward
  • if correct, backpropagate reward by increasing
    weights that led to correct output
  • if incorrect, backpropagate punishment by
    decreasing weights that led to incorrect output

16
Backpropagation continued
  • Continue this for each example in training set
  • This is 1 epoch
  • After 1 complete epoch, repeat process
  • Repeat until network has reached a stable state
    (i.e. changes to weights are always less than
    some minimum amount that is trivial)
  • Training may take thousands of epochs or more
    (even millions)!
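
The training loop above can be sketched as follows. Note the assumptions: the slides describe the update only as reward and punishment, so this sketch substitutes the standard gradient-based delta rule; it also adds a bias weight per node, a learning rate `rate`, and a `max_epochs` cap, none of which come from the slides. All function names are hypothetical.

```python
import math
import random


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def make_network(n_in, n_hidden, seed=0):
    """One hidden layer, one output node; weights drawn from -1..1."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(net, inputs):
    hidden, output = net
    x = [1.0] + list(inputs)                      # 1.0 feeds the bias weight
    hb = [1.0] + [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in hidden]
    y = sigmoid(sum(w * v for w, v in zip(output, hb)))
    return x, hb, y


def train_example(net, inputs, target, rate=0.5):
    """Feed forward, push the error backward; return the largest weight change."""
    hidden, output = net
    x, hb, y = forward(net, inputs)
    delta_out = (target - y) * y * (1 - y)
    # Hidden deltas are computed before the output weights are modified.
    delta_hidden = [delta_out * output[j + 1] * hb[j + 1] * (1 - hb[j + 1])
                    for j in range(len(hidden))]
    largest = 0.0
    for j in range(len(output)):
        change = rate * delta_out * hb[j]
        output[j] += change
        largest = max(largest, abs(change))
    for j, node_weights in enumerate(hidden):
        for i in range(len(node_weights)):
            change = rate * delta_hidden[j] * x[i]
            node_weights[i] += change
            largest = max(largest, abs(change))
    return largest


def train(net, examples, max_epochs=10000, min_change=1e-4, rate=0.5):
    """Repeat epochs until every weight change is below min_change."""
    for epoch in range(1, max_epochs + 1):
        largest = max(train_example(net, x, t, rate) for x, t in examples)
        if largest < min_change:
            break
    return epoch


xor_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = make_network(n_in=2, n_hidden=2)
epochs_used = train(net, xor_examples)
print(epochs_used, [round(forward(net, x)[2], 2) for x, _ in xor_examples])
```

Whether this particular run settles on a good XOR solution depends on the random starting weights, so no output is claimed here.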

17
Perceptron vs. Backprop learning
  • Perceptrons are guaranteed to reach a stable
    state if the concept is linearly separable
  • Multilayered networks have no guarantee of
    reaching the best stable state (known as the
    global minimum); they may get stuck in a local
    minimum (recall hill climbing)
  • However, Multilayered nets can learn a much
    larger range of things

18
Variations to Backprop
  • Momentum Factor - decrease the rate of change in
    weights as time goes on: large leaps early on,
    small changes later
  • Simulated Annealing - change activation function
    to p = 1/(1 + e^(-sum/T)) where T is temperature

19
Sensitivity to initial conditions
  • It turns out that the initialized random weights
    can play a dramatic role in learning
  • One set of initial weights may require many
    times more epochs than another
  • However, one set of weights that leads to quick
    learning will probably not be useful for a
    different network!
  • An I x H x O network will differ from an
    I x (H+1) x O network, meaning the new network
    will require training on its own

20
Generalization
  • Given a training set, a NN could learn to
    generalize to the entire class
  • However, the more the system sees the training
    set, the more it will learn just the training
    set!
  • One must be careful in that the system must learn
    but not overlearn
  • How can one control this?

21
Boltzmann Machines
  • Hopfield networks can solve a variety of
    constraint satisfaction (optimization) problems
  • However, Hopfield Nets reach a stable state
    (local minimum) instead of an optimal solution
    (global minimum)
  • Use simulated annealing instead of parallel
    relaxation.
  • p = 1/(1 + e^(-ΔE/T))
  • This computes a probability of whether a node
    should activate or not
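
A small sketch of the probabilistic update, assuming the formula is read as the logistic function of a node's summed input divided by the temperature T. The function names and the choice to pass in a random number generator are assumptions for the example.

```python
import math
import random


def activation_probability(delta_e, temperature):
    """Probability that a node turns on, given its summed input delta_e."""
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))


def stochastic_update(weights, state, node, temperature, rng):
    """Set one node on or off at random, weighted by the probability above."""
    delta_e = sum(weights[node][j] for j in range(len(state)) if state[j])
    p = activation_probability(delta_e, temperature)
    state[node] = 1 if rng.random() < p else 0
    return p


# High temperature: close to a coin flip. Low temperature: close to the
# deterministic rule "on if the sum is positive".
hot = activation_probability(2.0, 100.0)
cold = activation_probability(2.0, 0.1)
print(round(hot, 3), round(cold, 3))
```

Lowering T gradually is what lets the network make uphill moves early on and then settle.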

22
Boltzmann Machines cont.
  • Boltzmann Machines can learn weights w/ a
    variation of backprop (although much more
    complicated)
  • In a Boltzmann Machine, we can actually assign
    features or values to each node (that is, nodes
    stand for something, unlike normal NNs)
  • Boltzmann machines combine all of these ideas to
    solve optimization problems like travelling
    salesman (see fig 18.21 p 517)

23
Uses of NNs
  • NNs are knowledge-poor and have internal
    representations that are meaningless to us
  • However, NN can learn classifications and
    recognitions
  • Some useful applications include
  • Pattern recognizers, associative memories,
    pattern transformers, dynamic transformers

24
Particular Domains
  • Speech recognition (vowel distinction)
  • Visual Recognition
  • Combinatorial problems
  • Motor-type problems (including vehicular control)
  • Classification-type problems with reasonable
    sized inputs
  • Game playing (backgammon)

25
Advantages of NN
  • Able to handle fuzziness
  • Able to handle degraded inputs and ambiguity
  • Able to learn their own internal representations
    and learn new things
  • Use distributed representations
  • Capable of supervised and unsupervised learning
  • Easy to build

26
Disadvantages of NNs
  • Lengthy training times
  • Unpredictable nature
  • Inability to handle temporal issues
  • Fixed-size input restricts dynamic changes in the
    problem
  • Are not process-oriented, cannot solve
    non-recognition problems
  • Cannot use symbolic knowledge
  • May not generalize if training set is biased

27
Recurrent Networks
  • In order to handle the problem of
    temporal-dependent inputs, recurrent networks
    have been tried
  • Essentially the output of the network is wrapped
    around and used as additional inputs
  • Temporal dependence occurs in many problems.
    Consider driving: how hard to press on the
    brake? This might change between the first
    moment and the next
  • See figures 18.22, 18.23 p 518-519 for examples
  • These networks have other problems, notably
    learning using backprop where an output error can
    be propagated to the next iteration
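
The wrap-around idea can be sketched with a single sigmoid node whose previous output is fed back as an extra input. This is a deliberately tiny illustration: the single-node layout, the function names, and the weights are all assumptions and do not correspond to the cited figures.

```python
import math


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def recurrent_step(input_weights, feedback_weight, x, previous_output):
    """One time step: the previous output acts as an additional input."""
    total = sum(w * v for w, v in zip(input_weights, x))
    total += feedback_weight * previous_output
    return sigmoid(total)


def run_sequence(input_weights, feedback_weight, sequence):
    output = 0.0                 # assumption: no feedback before the first step
    outputs = []
    for x in sequence:
        output = recurrent_step(input_weights, feedback_weight, x, output)
        outputs.append(output)
    return outputs


# The same input at every step still yields a different output each time,
# because the network also sees what it produced a moment earlier.
outputs = run_sequence([1.0], 2.0, [[1], [1], [1]])
print([round(o, 3) for o in outputs])
```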

28
More on Recurrent Networks
  • These networks are often applied to problems that
    not only have input-output pairs, but internal
    states
  • These are sometimes referred to as mental models
  • Training is done for two different things, first
    the models or internal states are learned, and
    then the individual items within each state
  • Consider a recurrent network which is trained to
    perform speech recognition: it might first learn
    different kinds of speech phenomena that are not
    apparent (visible) to us, and then learn to
    interpret these as the different phonetic units
    at the output (see figure 18.23)

29
NN/brain comparison - flaws
  • NN nodes are not neurons; NN nodes are much too
    primitive
  • Lack of useful input (a few bits rather than
    millions or billions of bits of input)
  • Lack of nodes (brain has billions of neurons, NN
    usually have 10-100)
  • Evolution vs. Backprop learning -- both the type
    of learning and the time taken

30
Genetic Algorithms
  • Features are stored as a vector
  • Backprop-like algorithm used to alter vector
    values based on training examples
  • Uses techniques such as mutation and heredity
    to alter new vectors in attempts to evolve
    better representations for the next iteration
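
A minimal sketch of mutation and heredity on feature vectors. Several things here are assumptions rather than content from the slides: vectors are lists of bits, "training examples" are reduced to a single target vector, fitness is the number of matching positions, and the best vector is always carried over unchanged.

```python
import random


def fitness(vector, target):
    """Number of positions where the vector agrees with the target."""
    return sum(1 for v, t in zip(vector, target) if v == t)


def evolve(target, population_size=20, generations=50,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(target)
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(population_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=lambda v: fitness(v, target), reverse=True)
        best_history.append(fitness(population[0], target))
        parents = population[: population_size // 2]   # fitter half survives
        children = [population[0][:]]                  # keep the best as-is
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                  # heredity: crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = max(population, key=lambda v: fitness(v, target))
    return best, best_history


target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
best, history = evolve(target)
print(fitness(best, target), "of", len(target))
```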