Title: CS621: Artificial Intelligence, Lecture 18: Feedforward Network (contd.)

1. CS621 Artificial Intelligence, Lecture 18: Feedforward Network (contd.)
- Pushpak Bhattacharyya
- Computer Science and Engineering Department
- IIT Bombay
2. Pocket Algorithm
- The algorithm, which evolved in 1985, essentially uses the PTA (Perceptron Training Algorithm).
- Basic idea (see the sketch below):
  - Always preserve the best weights obtained so far "in the pocket".
  - Replace the pocketed weights whenever better weights are found (i.e., the changed weights result in reduced error).
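A minimal Python sketch of this idea, assuming sign-labeled data with a bias component and a fixed epoch budget (the function name, data, and hyperparameters are illustrative, not from the lecture):

```python
import numpy as np

def pocket_train(X, y, epochs=100, seed=0):
    """Pocket algorithm sketch: run PTA, but keep the best-so-far weights."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])           # current perceptron weights
    pocket_w = w.copy()                       # best weights found so far
    pocket_errors = np.inf                    # error count of pocketed weights

    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if np.sign(w @ xi) != yi:         # misclassified: PTA update
                w = w + yi * xi
        errors = np.sum(np.sign(X @ w) != y)  # evaluate the new weights
        if errors < pocket_errors:            # better? swap into the pocket
            pocket_errors, pocket_w = errors, w.copy()
    return pocket_w

# Usage: non-separable (XOR-like) data; first input component is the bias.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])
print("pocket weights:", pocket_train(X, y))
```

On non-separable data the plain PTA cycles forever; the pocket guarantees we keep the best hypothesis visited.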
3. XOR using 2 layers
- A non-linearly-separable (non-LS) function expressed as a linearly separable function of individual linearly separable functions.
4. Example - XOR
[Figure: calculation of XOR as a two-layer network. An output unit with weights w2 and w1 combines the two linearly separable functions x1·x̄2 and x̄1·x2; a second diagram shows the calculation of x1·x̄2 from the inputs x1 and x2.]
5. Example - XOR
[Figure: the same XOR network with concrete values filled in: output-unit weights w2 = 1 and w1 = 1, and hidden-unit weights/thresholds 1.5, -1, -1, and 1.5 on the inputs x1 and x2.]
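To make the construction concrete, here is a minimal Python sketch using threshold (perceptron) units; the particular weights and thresholds are one standard choice and may differ from the figure's values:

```python
def step(net, theta):
    """Threshold (perceptron) unit: fires iff net input exceeds theta."""
    return 1 if net > theta else 0

def xor(x1, x2):
    h1 = step(1 * x1 - 1 * x2, 0.5)    # linearly separable: x1 AND (NOT x2)
    h2 = step(-1 * x1 + 1 * x2, 0.5)   # linearly separable: (NOT x1) AND x2
    return step(1 * h1 + 1 * h2, 0.5)  # OR of the two: linearly separable

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))   # prints the XOR truth table
```

Each unit alone computes a linearly separable function; only their composition yields the non-LS XOR.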
6. Some Terminology
- A multilayer feedforward neural network has:
  - Input layer
  - Output layer
  - Hidden layer (assists computation)
- Output units and hidden units are called computation units.
7. Training of the MLP
- Multilayer Perceptron (MLP)
- Question: how to find weights for the hidden layers when no target output is available for them?
- This credit assignment problem is solved by Gradient Descent.
8. Gradient Descent Technique
- Let E be the error at the output layer:
  \( E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}(t_i - o_i)_j^2 \)
- t_i = target output, o_i = observed output
- i is the index going over the n neurons in the outermost layer
- j is the index going over the p patterns (1 to p)
- E.g., for XOR: p = 4 and n = 1
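For concreteness, a worked instance for XOR (single output neuron, four patterns; the targets 0, 1, 1, 0 follow from the truth table):

\( E = \frac{1}{2}\left[(0 - o_1)^2 + (1 - o_2)^2 + (1 - o_3)^2 + (0 - o_4)^2\right] \)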
9. Weights in a ff NN
- w_{mn} is the weight of the connection from the n-th neuron to the m-th neuron.
- The E vs. \( \vec{W} \) surface is a complex surface in the space defined by the weights w_{ij}.
- \( -\frac{\partial E}{\partial w_{mn}} \) gives the direction in which a movement of the operating point in the w_{mn} coordinate space will result in the maximum decrease in error.
[Figure: neurons n and m connected by the weight w_{mn}.]
10. Sigmoid neurons
- Gradient Descent needs a derivative computation; this is not possible in the perceptron due to the discontinuous step function used!
- ⇒ Sigmoid neurons, with easy-to-compute derivatives, are used!
- Computing power comes from the non-linearity of the sigmoid function.
11. Derivative of Sigmoid function
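The slide's derivation is not reproduced in this transcript; the standard result for the sigmoid is:

\( o = \frac{1}{1 + e^{-net}}, \qquad \frac{do}{d\,net} = \frac{e^{-net}}{(1 + e^{-net})^2} = o\,(1 - o) \)

The derivative is expressible in terms of the output o itself, which is what makes it easy to compute during training.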
12. Training algorithm
- Initialize weights to random values.
- For input x = ⟨x_n, x_{n-1}, ..., x_0⟩, modify weights as follows (target output t, observed output o).
- Iterate until E < ε (threshold).
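A minimal sketch of this loop for a single sigmoid neuron; the update rule anticipates the Δw_i derived on the next slide, and the names and hyperparameters are illustrative:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train(X, t, eta=0.5, eps=1e-3, max_iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])       # initialize weights to random values
    for _ in range(max_iters):
        o = sigmoid(X @ w)                # observed outputs, one per pattern
        E = 0.5 * np.sum((t - o) ** 2)    # total squared error over patterns
        if E < eps:                       # iterate until E < epsilon
            break
        # delta rule for a sigmoid unit, summed over all patterns
        w += eta * X.T @ ((t - o) * o * (1 - o))
    return w, E

# Usage: learn the identity of one input; first component is the bias.
X = np.array([[1, 0], [1, 1]], dtype=float)
w, E = train(X, t=np.array([0.0, 1.0]))
print(w, E)
```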
13. Calculation of Δw_i
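The derivation itself is not reproduced in this transcript; for a single sigmoid neuron with \( E = \frac{1}{2}(t - o)^2 \), the standard gradient-descent step is:

\( \Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta\,(t - o)\,o\,(1 - o)\,x_i \)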
14. Observations
- Does the training technique support our intuition?
- The larger the x_i, the larger is Δw_i.
- The error burden is borne by the weight values corresponding to large input values.
15. Backpropagation on feedforward network
16. Backpropagation algorithm
[Figure: a layered network with an output layer (m o/p neurons) at the top, hidden layers in between, and an input layer (n i/p neurons) at the bottom; neuron j in one layer is connected to neuron i in the layer below by the weight w_{ji}.]
- Fully connected feedforward network.
- Pure FF network (no jumping of connections over layers).
17. Gradient Descent Equations
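The slide's equations are not reproduced in this transcript; the standard chain-rule decomposition, with \( net_j = \sum_i w_{ji}\, o_i \), is:

\( \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = -\eta \frac{\partial E}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}} = \eta\,\delta_j\,o_i, \qquad \text{where } \delta_j = -\frac{\partial E}{\partial net_j} \)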
18. Backpropagation for outermost layer
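Again the slide's algebra is missing from the transcript; for an output neuron j with sigmoid activation and squared error, the standard result is:

\( \delta_j = -\frac{\partial E}{\partial net_j} = (t_j - o_j)\,o_j\,(1 - o_j), \qquad \Delta w_{ji} = \eta\,(t_j - o_j)\,o_j\,(1 - o_j)\,o_i \)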
19. Backpropagation for hidden layers
[Figure: a layered network with an output layer (m o/p neurons), a neuron k in the layer above a hidden neuron j, further hidden layers, a neuron i below, and an input layer (n i/p neurons).]
- δ_k is propagated backwards to find the value of δ_j.
20. Backpropagation for hidden layers
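The transcript omits the slide's derivation; applying the chain rule through every neuron k in the layer above j gives the standard hidden-layer delta:

\( \delta_j = o_j\,(1 - o_j)\sum_{k \in \text{next layer}} \delta_k\, w_{kj} \)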
21. General Backpropagation Rule
- General weight updating rule:
  \( \Delta w_{ji} = \eta\,\delta_j\,o_i \)
- where
  \( \delta_j = (t_j - o_j)\,o_j\,(1 - o_j) \) for the outermost layer,
  \( \delta_j = o_j\,(1 - o_j)\sum_k \delta_k\, w_{kj} \) for hidden layers.
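Putting the rule to work, here is a minimal end-to-end sketch on XOR; the 2-2-1 architecture, learning rate, and initialization are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)    # hidden layer (2 units)
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)    # output layer (1 unit)
eta = 0.5

for epoch in range(20000):
    # forward pass: input propagates forward
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # backward pass: error propagates backward
    delta_o = (t - o) * o * (1 - o)               # outermost-layer delta
    delta_h = h * (1 - h) * (delta_o @ W2.T)      # hidden-layer delta
    # general update rule: delta_w_ji = eta * delta_j * o_i
    W2 += eta * h.T @ delta_o; b2 += eta * delta_o.sum(axis=0)
    W1 += eta * X.T @ delta_h; b1 += eta * delta_h.sum(axis=0)

# typically approaches [0, 1, 1, 0]; a 2-unit hidden layer can
# occasionally stall in a local minimum depending on initialization
print(np.round(o.ravel(), 2))
```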
22. How does it work?
- Input propagates forward and error propagates backward (e.g., XOR), as traced in the sketch above.