Function Approximation With ANNs

Transcript and Presenter's Notes
1
Function Approximation With ANNs
  • Resources
  • Textbook: Chapter 20, Section 20.5
  • Fausett (1994), chapter on the delta rule
  • Supervised training
  • Delta rule: single-layer networks
  • Backpropagation: multi-layer networks

2
Single Layer Feed-Forward ANN
  • Network has n inputs and m outputs
  • One layer of weights
  • Training data
  • Pairs (input, output)
  • input is a vector of length n
  • output is a vector of length m
  • The function approximation problem is to find
    neural network weights defining a function f
    such that
  • f(input) ≈ output
  • This is sometimes called
  • association learning

[Figure: fully connected single-layer network with n inputs and m outputs]
3
Interpolation Problem
  • May be viewed as multidimensional interpolation
  • the number of dimensions corresponds to the
    number of input units and output units
  • Example
  • one-dimensional input, one-dimensional output
    problem
  • input vectors have only one component; output
    vectors also have only one component
  • this is now a curve-fitting problem; which curve
    we choose to fit the data will change the result
    for new inputs, i.e., for generalization
  • In general, there is no unique solution; the data
    alone do not determine which curve we should
    choose
  • Raises a sampling issue
  • We need data that adequately sample the
    underlying distribution

4
Sampling issue
If this were the underlying system we are
obtaining data from, how would we select the
samples?
5
Example of Association Learning
  • Images of three types
  • Converted to grayscale
  • Want to associate each image with a shape name
  • Dog shape with the audio representation of the
    word "dog"
  • Yellow pail with the audio representation of
    "pail"
  • Green bucket with the audio representation of
    "bucket"
  • Assume each image is 16x16 pixels (256 inputs)
  • Assume the output audio is represented by 10
    real-number values
  • How do we find neural network weights to
    approximate
  • f(image) ≈ audio

6
Simplify Problem
  • One output unit
  • Activation function: identity, f(x) = x
  • Output is defined just by the weighted sum of the
    inputs: y_in = x1 w1 + x2 w2 + ... + xn wn
  • Example problem, with n = 3

[Figure: fully connected network with n inputs and 1 output]
How do we establish weights for the ANN?
7
General Weight Update Algorithm
  • Initialize the weights to random values
  • Compute the errors
  • While the errors produced by the network are too
    great
  • For the current sample
  • Update the network weights using a weight update
    rule
  • Re-compute the error for the current sample

(Incremental weight update)
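A minimal Python sketch of this incremental weight-update loop, using the delta rule (next slide) as the update rule; the training pairs, stopping threshold, and learning rate below are placeholder assumptions, not the slides' actual data.

```python
import numpy as np

# Hypothetical training pairs (input vector, target output); not from the slides.
samples = [(np.array([1.0, 0.0, 1.0]), 2.0),
           (np.array([0.0, 1.0, 0.0]), -1.0)]

alpha = 0.5                                  # learning rate
w = np.random.uniform(-0.5, 0.5, size=3)     # initialize weights to random values

def total_error(w):
    # Sum squared error over all samples; identity activation, so y_in = w . x
    return sum((t - w.dot(x)) ** 2 for x, t in samples)

while total_error(w) > 1e-6:                 # "while errors are too great"
    for x, t in samples:
        y_in = w.dot(x)                      # net (summed, weighted) input
        w = w + alpha * (t - y_in) * x       # incremental weight update (delta rule)
```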
8
Weight Update Rule 1: The Delta Rule

Δw_I = α (t − y_in) x_I

where
  Δw_I   change in the I-th weight of the weight vector
  α      learning rate (scalar, constant)
  t      target (correct) output
  y_in   net (summed, weighted) input to the output unit
  x_I    I-th input value
9
Example
  • W = (W1, W2, W3)
  • Initially W = (0.5, 0.2, 0.4)
  • Let α = 0.5
  • Apply the delta rule

[Figure: three-input network with weights W1, W2, W3, and the delta-rule formula]
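For example, one delta-rule step starting from these initial weights might look as follows; the training sample (x, t) here is hypothetical, since the slides' actual data live in the linked spreadsheet rather than in this transcript.

```python
import numpy as np

w = np.array([0.5, 0.2, 0.4])             # initial weights from the slide
alpha = 0.5                               # learning rate from the slide
x, t = np.array([1.0, 0.0, 1.0]), 2.0     # hypothetical training sample

y_in = w.dot(x)                           # net input: 0.5*1 + 0.2*0 + 0.4*1 = 0.9
w = w + alpha * (t - y_in) * x            # delta rule: w += 0.5 * (2 - 0.9) * x
print(w)                                  # -> [1.05 0.2  0.95]
```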
10
One Epoch of Training
[Table of delta-rule updates over the full first epoch; shown on the slide]
11
Step 1 of Training
[Worked delta-rule update for the first training sample; shown on the slide]
12
Remaining Steps in First Epoch of Training
[Worked delta-rule updates for the remaining samples of the first epoch; shown on the slide]
13
Completing the Example
  • After 18 epochs
  • Weights
  • W1 = 0.990735
  • W2 = -0.970018005
  • W3 = 0.98147
  • Does this adequately approximate the training
    data?

[Figure: network with weights W1, W2, W3]
http://www.cprince.com/courses/cs5541fall03/lectures/neural-networks/delta-rule1.xls
14
Actual Outputs
So, we have one method to incrementally adjust
the network weights, based on a series of
training samples. This is typically called
training or learning.
15
What about
  • The following weights?
  • W1 = 1
  • W2 = -1
  • W3 = 1
  • Generalization?
  • (0 1 0)
  • (1 1 0)
  • (0 1 1)
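With the identity activation from slide 6, the output is just the net input w · x, so these weights can be checked directly on the three listed inputs:

```python
import numpy as np

w = np.array([1.0, -1.0, 1.0])
for x in ([0, 1, 0], [1, 1, 0], [0, 1, 1]):
    print(x, w.dot(np.array(x)))    # -> -1.0, 0.0, 0.0
```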

16
Why is the Delta-Rule Effective?
  • The delta rule implements a form of error
    minimization
  • Change weights to reduce the sum squared error, E
  • For a specific training pattern, the sum squared
    error is
  • E = (t − y_in)²
  • Recall
  • t = desired output of the network
  • y_in = actual output of the network
  • The derivative of E gives the slope or gradient
    of E
  • Gives both the direction of most rapid increase
    in E, the error, and the direction of most rapid
    decrease
  • We want the derivative with respect to the
    weights
  • We are adjusting the weights in an effort to
    decrease the error
  • Since y_in is a function of multiple weights, we
    will have partial derivatives
  • Adjusting weight w_I in the direction of −∂E/∂w_I
    will reduce the error

17
Delta Rule Approach
E = (t − y_in)²
  • E and y_in defined as before
  • Note that y_in is computed for one training
    sample
  • Define training as modifying the weights so that
    the error is reduced
  • Typically, this is done iteratively
  • E.g., weights are modified to reduce the error
    for the current training sample, then modified to
    reduce the error for another training sample, etc.
  • Approach
  • Take the derivative of E, the error, with respect
    to the weights
  • Tells us how to change weights so as to minimize
    E
  • Results in changes in weights that reduce E, the
    error

18
Partial Derivative of E, the Error

Since E = (t − y_in)², applying the chain rule
(t is a constant in this context):

∂E/∂w_I = 2 (t − y_in) ∂(t − y_in)/∂w_I = −2 (t − y_in) ∂y_in/∂w_I

Since y_in = x_1 w_1 + x_2 w_2 + ... + x_n w_n,

∂y_in/∂w_I = x_I, and therefore ∂E/∂w_I = −2 (t − y_in) x_I
19
Completing Derivation of Delta Rule
Because we are looking for the direction that decreases E,
we negate the gradient: Δw_I ∝ −∂E/∂w_I = 2 (t − y_in) x_I.
Incorporating the constant of 2 into the learning
rate, α, gives

Δw_I = α (t − y_in) x_I

Changing the weight by this amount will reduce
the error, E, for this data sample.
20
Delta Rule with Activation Functions
Δw_I = α (t − y) f'(y_in) x_I

where
  Δw_I   change in the I-th weight of the weight vector
  α      learning rate (scalar, constant)
  t      target (correct) output
  y_in   net (summed, weighted) input to the output unit
  x_I    I-th input value
  f      differentiable activation function (for example, a sigmoid)
  y      output of the network, y = f(y_in)
21
Derivation of Delta Rule for Use With Activation
Functions
  • f is a differentiable activation function
  • e.g., sigmoidal

Output of the network: y = f(y_in), so now E = (t − y)²
Now we need ∂E/∂w_I
22
Derivation of ∂E/∂w_I
In the derivation, we can apply the chain rule:
[f(g(x))]' = f'(g(x)) g'(x).
Since y = f(y_in) and y_in = x_1 w_1 + ... + x_n w_n, this gives
∂E/∂w_I = −2 (t − y) f'(y_in) x_I
23
Delta Rule With Activation Function
Because we are looking for the direction that decreases E, we negate:
−∂E/∂w_I = 2 (t − y) f'(y_in) x_I
Incorporating the constant into α gives Δw_I = α (t − y) f'(y_in) x_I
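A small sketch of this generalized update step; the logistic sigmoid used for f is an assumption (the slides only require a differentiable activation, e.g., sigmoidal), and the function names are mine.

```python
import numpy as np

def sigmoid(s):
    # Logistic sigmoid, one common differentiable activation function
    return 1.0 / (1.0 + np.exp(-s))

def delta_rule_step(w, x, t, alpha=0.5):
    # Generalized delta rule: dw_I = alpha * (t - y) * f'(y_in) * x_I
    y_in = w.dot(x)              # net (summed, weighted) input
    y = sigmoid(y_in)            # output of the unit, y = f(y_in)
    f_prime = y * (1.0 - y)      # derivative of the logistic sigmoid at y_in
    return w + alpha * (t - y) * f_prime * x
```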
24
How do we modify this to use the sigmoidal
activation function?
25
Delta Rule With Sigmoidal Activation Function
  • Need to take the derivative of the sigmoidal
    function
  • e.g., for the binary (logistic) sigmoid
    f(x) = 1 / (1 + e^(−x)), the derivative is
    f'(x) = f(x) (1 − f(x)), giving
    Δw_I = α (t − y) y (1 − y) x_I

26
Extension to Multiple Output Units
  • Have been dealing with only a single output unit
  • Need to generalize to multiple output units

[Figure: fully connected single-layer network with n inputs and m outputs]

Notation:
  w_ij     weight from i-th input unit to j-th output unit
  t_j      expected output from j-th output unit
  y_j      actual output from j-th output unit
  y_in_j   summed, weighted input to j-th output unit
  x_i      i-th input value
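Each output unit's weights are trained by the delta rule independently; with the identity activation of slide 6 this is Δw_IJ = α (t_J − y_in_J) x_I. A vectorized sketch (the matrix layout and names are my own choices):

```python
import numpy as np

def multi_output_delta_step(W, x, t, alpha=0.5):
    # W is n x m, W[i, j] = weight from input unit i to output unit j;
    # x is the input vector (length n), t the target vector (length m).
    y_in = x @ W                                # net input to each of the m output units
    return W + alpha * np.outer(x, t - y_in)    # dW[i, j] = alpha * (t_j - y_in_j) * x_i
```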
27
What about Multiple Layers?
  • This works when we have
  • known outputs (supervision)
  • a single-layer ANN
  • How can we train the weights of multi-layer ANNs?

28
Backpropagation
  • A method of training weights in a multi-layer,
    feedforward network
  • The problem is to establish correct or expected
    values for layers other than the output layer
  • Method
  • Start at the output layer
  • Work backwards from the output layer to
    successive layers to the left
  • Modify weights at each step

29
Problem
General form of connections between layers
  • First, compute the weight updates for the right
    (hidden-to-output) weight layer using the delta
    rule

30
Now, need to consider the input-to-hidden weight layer
  • Previously, in the delta rule, we needed the
    partial derivative ∂E/∂w_I for a weight into an
    output unit
  • We now need the partial derivative ∂E/∂v_IJ,
    where v_IJ is the weight from input unit I to
    hidden unit J
31
Previously, for a single output unit, E = (t − y)², where
t is the expected output and y is the actual output.
Now, we'll consider all p output units (error for
one training sample):

E = Σ_k (t_k − y_k)²,  for k = 1, ..., p

Our goal is to find ∂E/∂v_IJ
32
Taking the partial derivative of this, we'll find
it defined in terms of the v weights. Why?
Because the y outputs are indirectly generated,
in part, by the v weights. Since each v weight may
have an indirect effect on potentially each of
the y outputs, we need to consider each of the y
outputs in the formulation.
33
Since E = Σ_k (t_k − y_k)², taking ∂/∂v_IJ term by term and
collapsing back to the summation, we have

∂E/∂v_IJ = −2 Σ_k (t_k − y_k) ∂y_k/∂v_IJ
34
Recall, y_k = f(y_in_k).
Now, deriving ∂y_k/∂v_IJ (chain rule):
∂y_k/∂v_IJ = f'(y_in_k) ∂y_in_k/∂v_IJ
Substituting gives
∂E/∂v_IJ = −2 Σ_k (t_k − y_k) f'(y_in_k) ∂y_in_k/∂v_IJ
35
For convenience, let δ_k = (t_k − y_k) f'(y_in_k)
(We can calculate this directly)
Now, derive ∂y_in_k/∂v_IJ
36
Deriving ∂y_in_k/∂v_IJ
Recall, y_in_k = Σ_j z_j w_jk, and z_J is the single hidden
layer unit that weight v_IJ is affecting, so
∂y_in_k/∂v_IJ = w_Jk ∂z_J/∂v_IJ
Now, need ∂z_J/∂v_IJ
37
Deriving ∂z_J/∂v_IJ
Recall, z_J = f(z_in_J), where z_in_J = Σ_i x_i v_iJ.
(Chain rule) ∂z_J/∂v_IJ = f'(z_in_J) ∂z_in_J/∂v_IJ
Since ∂z_in_J/∂v_IJ = x_I,
∂z_J/∂v_IJ = f'(z_in_J) x_I
(We can calculate this directly)
Finally, we have an expression in terms of the v
weights!
38
Now, we have ∂y_in_k/∂v_IJ = w_Jk f'(z_in_J) x_I.
Putting it together:
∂E/∂v_IJ = −2 Σ_k δ_k w_Jk f'(z_in_J) x_I
39
Finally, the weight change for connections to
hidden layer units (negating the gradient and folding
the constant into α):

Δv_IJ = α x_I f'(z_in_J) Σ_k δ_k w_Jk
40
Application of Backpropagation
  • The equations so far have been general, for
    two-weight-layer networks
  • Specialize the weight update equations for the
    following network architecture

[Figure: network architecture with input units x, hidden units z, and output units y]
41
z-y weights (hidden to output): Δw_JK = α δ_K z_J, with δ_K = (t_K − y_K) f'(y_in_K)
x-z weights (input to hidden): Δv_IJ = α x_I f'(z_in_J) Σ_k δ_k w_Jk
These update rules are applied once per data
sample per epoch.
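Putting the two update rules together, here is a compact sketch of one backpropagation step for this two-weight-layer architecture; the logistic sigmoid on both layers, the learning rate, and all names are my own assumptions, and the constant 2 is folded into α as in the derivation.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(V, W, x, t, alpha=0.25):
    # V: n x h input-to-hidden (x-z) weights, W: h x m hidden-to-output (z-y) weights,
    # x: input vector (length n), t: target vector (length m).

    # Forward pass
    z_in = x @ V                  # net input to hidden units
    z = sigmoid(z_in)             # hidden unit outputs
    y_in = z @ W                  # net input to output units
    y = sigmoid(y_in)             # network outputs

    # Output-unit deltas: delta_k = (t_k - y_k) * f'(y_in_k)
    delta_out = (t - y) * y * (1.0 - y)
    # Hidden-unit deltas: f'(z_in_j) * sum_k delta_k * w_jk
    delta_hidden = (W @ delta_out) * z * (1.0 - z)

    # Weight updates, applied once per data sample per epoch
    W = W + alpha * np.outer(z, delta_out)      # z-y weights: dw_jk = alpha * delta_k * z_j
    V = V + alpha * np.outer(x, delta_hidden)   # x-z weights: dv_ij = alpha * x_i * f'(z_in_j) * sum_k delta_k w_jk
    return V, W
```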