Feed-Forward Neural Networks - PowerPoint PPT Presentation


Provided by: Tai129

1
Feed-Forward Neural Networks


2
Content

Introduction
Single-Layer Perceptron Networks
Learning Rules for Single-Layer Perceptron
Networks
Perceptron Learning Rule
Adaline Learning Rule
δ-Learning Rule
Multilayer Perceptron
Back-Propagation Learning Algorithm

3
Feed-Forward Neural Networks

Introduction

4
Historical Background

1943: McCulloch and Pitts proposed the first
computational model of a neuron.
1949: Hebb proposed the first learning rule.
1958: Rosenblatt's work on perceptrons.
1969: Minsky and Papert exposed the limitations of
the theory.
1970s: Decade of dormancy for neural networks.
1980s-90s: Neural networks return (self-organization,
back-propagation algorithms, etc.).

5
Nervous Systems

The human brain contains about $10^{11}$ neurons.
Each neuron is connected to about $10^4$ others.
Some scientists have compared the brain to a
complex, nonlinear, parallel computer.
The largest modern neural networks achieve a
complexity comparable to that of the nervous
system of a fly.

6
Neurons

The main purpose of neurons is to receive,
analyze, and pass on information in the
form of signals (electric pulses).
When a neuron sends information, we say that the
neuron fires.

7
Neurons
Acting through specialized projections known as
dendrites and axons, neurons carry information
throughout the neural network.
[Animation: the firing of a synapse from the
pre-synaptic terminal of one neuron to the soma
(cell body) of another neuron.]
8
A Model of Artificial Neuron
9
A Model of Artificial Neuron
10
Feed-Forward Neural Networks

Graph representation
nodes: neurons
arrows: signal flow directions
A neural network that does not contain cycles
(feedback loops) is called a feedforward network
(or perceptron).

11
Layered Structure
Hidden Layer(s)
12
Knowledge and Memory

The output behavior of a network is determined by
the weights.
Weights: the memory of an NN.
Knowledge: distributed across the network.
Large number of nodes
increases the storage capacity
ensures that the knowledge is robust
fault tolerance.
Store new information by changing weights.

13
Pattern Classification
Function: $x \to y$ (input pattern x, output pattern y)
The NN's output is used to distinguish between
and recognize different input patterns.
Different output patterns correspond to
particular classes of input patterns.
Networks with hidden layers can be used for
solving more complex problems than just linear
pattern classification.
14
Training
Training Set
Goal
[Figure: training set and goal; the slide equations are not recoverable]
15
Generalization

With proper training, a neural network may produce
reasonable answers for input patterns not seen
during training (generalization).
Generalization is particularly useful for the
analysis of noisy data (e.g., time series).

17
Applications

Pattern classification
Object recognition
Function approximation
Data compression
Time series analysis and forecasting
. . .

18
Feed-Forward Neural Networks

Single-Layer Perceptron Networks

19
The Single-Layered Perceptron
20
Training a Single-Layered Perceptron
Training Set
Goal
21
Learning Rules

Linear Threshold Units (LTUs): Perceptron
Learning Rule
Linearly Graded Units (LGUs): Widrow-Hoff
Learning Rule

Training Set
Goal
22
Feed-Forward Neural Networks

Learning Rules for
Single-Layered Perceptron Networks
Perceptron Learning Rule
Adaline Learning Rule
δ-Learning Rule

23
Perceptron
Linear Threshold Unit
sgn
24
Perceptron
Goal
Linear Threshold Unit
sgn
25
Example
Goal
Class 1
$g(x) = -2x_1 + x_2 + 2 = 0$
Class 2
26
Augmented input vector
Goal
Class 1 (+1)
Class 2 (-1)
27
Augmented input vector
Goal
28
Augmented input vector
Goal
A plane that passes through the origin of the
augmented input space.
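The augmented-input trick can be sketched in a few lines. This is an illustration only: it assumes the example boundary reads g(x) = -2*x1 + x2 + 2 = 0 and that the extra input component is fixed at -1, so the third weight becomes -2. The helper names `augment` and `ltu` and the sample points are hypothetical.

```python
import numpy as np

def augment(x, extra_input=-1.0):
    """Append the constant extra component (assumed to be -1 here)."""
    return np.append(np.asarray(x, dtype=float), extra_input)

def ltu(w, x_aug):
    """Linear threshold unit: sgn(w^T x) on an augmented input vector."""
    return 1 if float(np.dot(w, x_aug)) > 0 else -1

def g(x):
    """Assumed example discriminant g(x) = -2*x1 + x2 + 2."""
    return -2.0 * x[0] + x[1] + 2.0

# With x3 = -1, the constant term +2 is carried by the weight w3 = -2.
w = np.array([-2.0, 1.0, -2.0])

points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (3.0, 1.0)]
for p in points:
    # The augmented dot product reproduces g(x), with no separate threshold.
    assert np.isclose(g(p), np.dot(w, augment(p)))

labels = [ltu(w, augment(p)) for p in points]
print(labels)
```

Because the threshold is folded into the weight vector, the decision boundary becomes w^T x = 0, which is a plane through the origin of the augmented space.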
29
Linearly Separable vs. Linearly Non-Separable
AND: Linearly Separable
OR: Linearly Separable
XOR: Linearly Non-Separable
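A small search makes the distinction concrete. The sketch below tries every integer weight triple in a small grid (an arbitrary choice for illustration) and reports the first one that separates each Boolean function. Finding nothing on a grid is not a proof of non-separability, but for XOR no separating line exists for any real weights.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR":  [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def separates(w1, w2, b, targets):
    """True if w1*x1 + w2*x2 + b is > 0 for every 1 and < 0 for every 0."""
    for (x1, x2), t in zip(INPUTS, targets):
        s = w1 * x1 + w2 * x2 + b
        if (t == 1 and s <= 0) or (t == 0 and s >= 0):
            return False
    return True

def find_separator(targets, grid=range(-3, 4)):
    """Search a small integer grid of weights; None if nothing separates."""
    for w1, w2, b in product(grid, repeat=3):
        if separates(w1, w2, b, targets):
            return (w1, w2, b)
    return None

results = {name: find_separator(t) for name, t in TARGETS.items()}
for name, w in results.items():
    print(name, "->", w)
```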
30
Goal

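Given training sets $T_1 \subset C_1$ and $T_2 \subset C_2$ with
elements of the form $x = (x_1, x_2, \ldots, x_{m-1}, x_m)^T$,
where $x_1, x_2, \ldots, x_{m-1} \in \mathbb{R}$ and $x_m = -1$.
Assume $T_1$ and $T_2$ are linearly separable.
Find $w = (w_1, w_2, \ldots, w_m)^T$ such that
$w^T x > 0$ for $x \in T_1$ and $w^T x < 0$ for $x \in T_2$.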

31
Goal
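$w^T x = 0$ is a hyperplane that passes through the origin
of the augmented input space.

Given training sets $T_1 \subset C_1$ and $T_2 \subset C_2$ with
elements of the form $x = (x_1, x_2, \ldots, x_{m-1}, x_m)^T$,
where $x_1, x_2, \ldots, x_{m-1} \in \mathbb{R}$ and $x_m = -1$.
Assume $T_1$ and $T_2$ are linearly separable.
Find $w = (w_1, w_2, \ldots, w_m)^T$ such that
$w^T x > 0$ for $x \in T_1$ and $w^T x < 0$ for $x \in T_2$.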

32
Observation
Which w's correctly classify x?

What trick can be used?
33
Observation
Is this w ok?

$w_1 x_1 + w_2 x_2 = 0$
34
Observation
$w_1 x_1 + w_2 x_2 = 0$
Is this w ok?

35
Observation
$w_1 x_1 + w_2 x_2 = 0$
Is this w ok?

How to adjust w?
$\Delta w = ?$
36
Observation
Is this w ok?

How to adjust w?
$\Delta w = -\eta x$: reasonable?
> 0
< 0
37
Observation
Is this w ok?

How to adjust w?
$\Delta w = \eta x$: reasonable?
> 0
< 0
38
Observation
Is this w ok?
$\Delta w = ?$
$\eta x$ or $-\eta x$
39
Perceptron Learning Rule
Upon misclassification on
Define error
40
Perceptron Learning Rule
Define error
41
Perceptron Learning Rule
42
Summary: Perceptron Learning Rule
Based on the general weight learning rule.
correct
incorrect
43
Summary: Perceptron Learning Rule
Converge?
44
Perceptron Convergence Theorem

Exercise: Consult papers or textbooks to
prove the theorem.

If the given training set is linearly separable,
the learning process will converge in a finite
number of steps.
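The rule and the convergence claim can be tried out with a minimal sketch. It assumes bipolar targets (+1 for class 1, -1 for class 2), an augmented input whose last component is -1, and the update w <- w + eta*x or w <- w - eta*x on a misclassified pattern. The function name `train_perceptron` and the epoch cap are illustrative choices.

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Perceptron learning rule on augmented inputs with targets in {+1, -1}.

    Returns (w, epochs_used, converged).
    """
    w = np.zeros(X.shape[1])
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) > 0 else -1
            if y != target:
                # +eta*x for a missed class-1 pattern, -eta*x for class 2
                w += eta * target * x
                errors += 1
        if errors == 0:          # a full pass without mistakes: done
            return w, epoch, True
    return w, max_epochs, False

# Inputs (x1, x2) with the assumed augmented component -1 appended.
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d_and = np.array([-1, -1, -1, 1])   # linearly separable
d_xor = np.array([-1, 1, 1, -1])    # not linearly separable

w_and, epochs_and, ok_and = train_perceptron(X, d_and)
w_xor, epochs_xor, ok_xor = train_perceptron(X, d_xor)
print("AND converged:", ok_and, "weights:", w_and)
print("XOR converged:", ok_xor)
```

On the separable AND data the loop stops after a finite number of corrections, while on XOR it keeps cycling until the epoch cap is reached.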
45
The Learning Scenario
Linearly Separable.
46
The Learning Scenario
47
The Learning Scenario
48
The Learning Scenario
49
The Learning Scenario
50
The Learning Scenario
w4 w3
w3
51
The Learning Scenario
w
52
The Learning Scenario
The demonstration is in augmented space.
w
Conceptually, in augmented space, we adjust the
weight vector to fit the data.
53
Weight Space
A weight in the shaded area will give correct
classification for the positive example.
w
54
Weight Space
A weight in the shaded area will give correct
classification for the positive example.
$\Delta w = \eta x$
w
55
Weight Space
A weight not in the shaded area will give correct
classification for the negative example.
w
56
Weight Space
A weight not in the shaded area will give correct
classification for the negative example.
w
$\Delta w = -\eta x$
57
The Learning Scenario in Weight Space
58
The Learning Scenario in Weight Space
To correctly classify the training set, the
weight must move into the shaded area.
[Figure: weight space with axes w1 and w2; the iterates w0, w1, ..., w11 are added one per slide (slides 58-69) as the weight moves toward the shaded area]
70
The Learning Scenario in Weight Space
To correctly classify the training set, the
weight must move into the shaded area.
[Figure: axes w1 and w2 with the initial weight w0 and the final weight w11]
Conceptually, in weight space, we move the weight
into the feasible region.
71
Feed-Forward Neural Networks

Learning Rules for
Single-Layered Perceptron Networks
Perceptron Learning Rule
Adaline Learning Rule
δ-Learning Rule

72
Adaline (Adaptive Linear Element)
Widrow 1962
73
Adaline (Adaptive Linear Element)
Under what condition is the goal reachable?
Goal
Widrow 1962
74
LMS (Least Mean Square)
Minimize the cost function (error function)
75
Gradient Descent Algorithm
Our goal is to go downhill.
Contour Map
$\Delta w$
(w1, w2)
76
Gradient Descent Algorithm
Our goal is to go downhill.
How to find the steepest descent direction?
Contour Map
$\Delta w$
(w1, w2)
77
Gradient Operator
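Let $f(w) = f(w_1, w_2, \ldots, w_m)$ be a function over $\mathbb{R}^m$.
Define $\nabla f = \left( \frac{\partial f}{\partial w_1}, \frac{\partial f}{\partial w_2}, \ldots, \frac{\partial f}{\partial w_m} \right)^T$.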
78
Gradient Operator
df positive: go uphill
df zero: plain
df negative: go downhill
79
The Steepest Descent Direction
To minimize f, we choose $\Delta w = -\eta \nabla f$
df positive: go uphill
df zero: plain
df negative: go downhill
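As a rough illustration of stepping against the gradient, the sketch below minimizes a made-up bowl-shaped cost over (w1, w2). The cost function, the step size eta, and the use of a finite-difference gradient are all assumptions chosen for the example.

```python
import numpy as np

def numerical_gradient(f, w, h=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[i] = h
        grad[i] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

def gradient_descent(f, w0, eta=0.1, steps=200):
    """Repeatedly apply delta_w = -eta * grad f(w)."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        w -= eta * numerical_gradient(f, w)
    return w

def f(w):
    """Hypothetical cost with its minimum at (1, -2)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2

w_start = np.array([4.0, 3.0])
w_min = gradient_descent(f, w_start)
print("start cost:", f(w_start), "final cost:", f(w_min))
```

Each step moves perpendicular to the contour lines, which is the direction in which f decreases fastest locally.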
80
LMS (Least Mean Square)
Minimize the cost function (error function)
? (k)
81
Adaline Learning Rule
Minimize the cost function (error function)
82
Learning Modes

Batch Learning Mode
Incremental Learning Mode
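The two modes differ only in when the weights are updated. The sketch below assumes the usual Adaline setting of a linear output y = w^T x and a squared-error cost. The data, the step size, and the function names are hypothetical.

```python
import numpy as np

def adaline_batch(X, d, eta=0.02, epochs=1000):
    """Batch mode: accumulate the error over the whole set, then update once."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = d - X @ w                 # linear output y = w^T x
        w += eta * X.T @ errors
    return w

def adaline_incremental(X, d, eta=0.02, epochs=1000):
    """Incremental mode: update after every single training pattern."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            w += eta * (target - w @ x) * x
    return w

# Hypothetical noiseless data generated by d = 2*x1 - x2 + 0.5
rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(-1, 1, size=(20, 2)), np.ones((20, 1))])
true_w = np.array([2.0, -1.0, 0.5])
d = X @ true_w

w_batch = adaline_batch(X, d)
w_incr = adaline_incremental(X, d)
print("batch:      ", np.round(w_batch, 4))
print("incremental:", np.round(w_incr, 4))
```

With consistent data and a small enough step size, both modes approach the same weights. Incremental mode touches the weights 20 times per epoch here, batch mode once.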

83
Summary: Adaline Learning Rule
δ-Learning Rule, LMS Algorithm, Widrow-Hoff
Learning Rule
Converge?
84
LMS Convergence

Based on the independence theory (Widrow, 1976):
The successive input vectors are statistically
independent.
At time t, the input vector x(t) is statistically
independent of all previous samples of the
desired response, namely d(1), d(2), ..., d(t-1).
At time t, the desired response d(t) is dependent
on x(t), but statistically independent of all
previous values of the desired response.
The input vector x(t) and desired response d(t)
are drawn from Gaussian-distributed populations.

85
LMS Convergence
It can be shown that LMS is convergent if
$0 < \eta < 2 / \lambda_{\max}$,
where $\lambda_{\max}$ is the largest eigenvalue of the
correlation matrix $R_x$ for the inputs.
86
LMS Convergence
It can be shown that LMS is convergent if
$0 < \eta < 2 / \lambda_{\max}$,
where $\lambda_{\max}$ is the largest eigenvalue of the
correlation matrix $R_x$ for the inputs.
Since $\lambda_{\max}$ is hardly available, we commonly use
$0 < \eta < 2 / \mathrm{tr}(R_x)$.
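Both step-size limits can be estimated from data. The sketch below assumes the bounds take the common textbook form 2/lambda_max and 2/tr(R_x), with R_x estimated as the sample average of x x^T. The random inputs are placeholders.

```python
import numpy as np

def lms_step_bounds(X):
    """Return (2/lambda_max, 2/trace) for the sample correlation matrix of X."""
    R = X.T @ X / X.shape[0]                  # estimate of R_x = E[x x^T]
    lam_max = np.linalg.eigvalsh(R).max()     # R is symmetric, so eigvalsh applies
    return 2.0 / lam_max, 2.0 / np.trace(R)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                 # hypothetical input vectors
bound_eig, bound_trace = lms_step_bounds(X)
print("2/lambda_max:", bound_eig)
print("2/trace     :", bound_trace)
```

Since the trace is the sum of the non-negative eigenvalues, it is never smaller than the largest one, so the trace-based limit is the more conservative of the two and needs no eigenvalue computation.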
87
Comparisons
              Perceptron Learning Rule    Adaline Learning Rule (LMS)
Fundamental   Hebbian assumption          Gradient descent
Convergence   In finite steps             Converges asymptotically
Constraint    Linearly separable          Linear independence
88
Feed-Forward Neural Networks

Learning Rules for
Single-Layered Perceptron Networks
Perceptron Learning Rule
Adaline Learning Rule
δ-Learning Rule

89
Adaline
90
Unipolar Sigmoid
91
Bipolar Sigmoid
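The slide formulas are not preserved in the transcript, so the sketch below uses the common textbook definitions as an assumption: a unipolar sigmoid with outputs in (0, 1) and a bipolar sigmoid with outputs in (-1, 1). The steepness parameter `lam` is a hypothetical name.

```python
import numpy as np

def unipolar_sigmoid(net, lam=1.0):
    """Output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * net))

def bipolar_sigmoid(net, lam=1.0):
    """Output in (-1, 1); equal to tanh(lam * net / 2)."""
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

net = np.array([-5.0, 0.0, 5.0])
u = unipolar_sigmoid(net)
b = bipolar_sigmoid(net)
print("unipolar:", u)
print("bipolar: ", b)
```

Under these definitions the bipolar form is just a rescaled unipolar one, b = 2u - 1.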
92
Goal
Minimize
93
Gradient Descent Algorithm
Minimize
94
The Gradient
Minimize
Depends on the activation function used.
?
?
95
Weight Modification Rule
Minimize
Batch
Learning Rule
Incremental
96
The Learning Efficacy
Minimize
Sigmoid
Unipolar
Bipolar
Adaline
Exercise
97
Learning Rule: Unipolar Sigmoid
Minimize
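A minimal sketch of this rule follows, assuming the cost E = 1/2 * sum (d - y)^2, a unipolar sigmoid output y = 1/(1 + exp(-w^T x)), and batch updates. The OR data set, the augmented input of -1, and the step size are illustrative choices.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def sse(w, X, d):
    """Cost E(w) = 1/2 * sum_k (d_k - y_k)^2."""
    return 0.5 * np.sum((d - sigmoid(X @ w)) ** 2)

def delta_rule_batch(X, d, eta=0.5, epochs=2000):
    """Gradient descent on E for a single unipolar-sigmoid neuron."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = sigmoid(X @ w)
        # For the unipolar sigmoid, dy/dnet = y * (1 - y).
        w += eta * X.T @ ((d - y) * y * (1.0 - y))
    return w

# OR function with 0/1 targets; the last column is the assumed augmented input -1
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 1.0])

w = delta_rule_batch(X, d)
predictions = (sigmoid(X @ w) > 0.5).astype(int)
print("weights:", w)
print("predictions:", predictions)
```

The only difference from the Adaline update is the extra factor y * (1 - y), which comes from differentiating the sigmoid.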
98
Comparisons
Batch
Adaline
Incremental
Batch
Sigmoid
Incremental
99
The Learning Efficacy
Sigmoid: depends on output
Adaline: constant
100
The Learning Efficacy
Sigmoid: depends on output
Adaline: constant
The learning efficacy of Adaline is constant,
meaning that the Adaline will never get saturated.
101
The Learning Efficacy
Sigmoid: depends on output
Adaline: constant
The sigmoid will get saturated if its output
value nears either of the two extremes.
102
Initialization for Sigmoid Neurons
Before training, the weights must be sufficiently
small.
Why?
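One way to see the reason is to look at the slope of the sigmoid for different weight scales. The sketch below assumes a unipolar sigmoid, whose slope in terms of its output is y * (1 - y). The input vector and the weight scales are made up for illustration.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def efficacy(y):
    """Slope of the unipolar sigmoid written in terms of its output y."""
    return y * (1.0 - y)

x = np.array([1.0, 1.0, 1.0])      # hypothetical input pattern
for scale in (0.1, 1.0, 10.0):
    w = scale * np.ones(3)
    y = sigmoid(w @ x)
    print(f"weight scale {scale:5.1f}: output {y:.6f}, slope {efficacy(y):.2e}")

small = efficacy(sigmoid(0.1 * 3))   # small initial weights
large = efficacy(sigmoid(10.0 * 3))  # large initial weights
```

With large weights the output sits near an extreme, the slope is almost zero, and the weight updates (which are proportional to that slope) nearly vanish. Small initial weights keep the neuron in the region where it can still learn.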
103
Feed-Forward Neural Networks

Multilayer Perceptron

104
Multilayer Perceptron
Output Layer
Hidden Layer
Input Layer
105
Multilayer Perceptron
Where does the knowledge come from?
Classification
Output
Analysis
Learning
Input
106
How Does an MLP Work?
Example: XOR

Not linearly separable.
Is a single-layer perceptron workable?
107
How Does an MLP Work?
Example
[Figure: the XOR example is built up over slides 107-110; points labeled 00, 01, 11]
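One possible two-layer solution to XOR can be written down by hand. The weights below are hypothetical, not taken from the slides: the first hidden unit acts as AND, the second as OR, and the output unit fires when OR is on but AND is off. Threshold units with 0/1 outputs and a constant bias input of 1 are assumed.

```python
import numpy as np

def step(net):
    """Unipolar threshold unit: 1 if net > 0, else 0."""
    return np.where(np.asarray(net) > 0, 1, 0)

# Hypothetical hand-picked weights; columns are (w1, w2, bias).
W_hidden = np.array([[1.0, 1.0, -1.5],    # h1 fires when x1 + x2 > 1.5 (AND)
                     [1.0, 1.0, -0.5]])   # h2 fires when x1 + x2 > 0.5 (OR)
w_out = np.array([-1.0, 1.0, -0.5])       # output fires when h2 - h1 > 0.5

def xor_mlp(x1, x2):
    """Return the hidden code (h1, h2) and the network output."""
    x = np.array([x1, x2, 1.0])               # input plus constant bias input
    h = step(W_hidden @ x)                    # hidden-layer code
    y = step(w_out @ np.append(h, 1.0))       # output neuron
    return tuple(h.tolist()), int(y)

table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, (code, y) in table.items():
    print(inputs, "hidden code", code, "output", y)
```

With these weights the four inputs collapse onto three hidden codes, 00, 01, and 11, and those three points are linearly separable, so a single output unit can finish the job.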
111
Parity Problem
Is the problem linearly separable?
112
Parity Problem
x3
P1
P2
x2
P3
x1
113
Parity Problem
111
011
001
000
114
Parity Problem
111
011
001
000
115
Parity Problem
111
P4
011
001
000
116
Parity Problem
P4
117
General Problem
118
General Problem
119
Hyperspace Partition
120
Region Encoding
001
000
010
100
101
110
111
121
Hyperspace Partition and Region Encoding Layer
122
Region Identification Layer
123
Region Identification Layer
124
Region Identification Layer
125
Region Identification Layer
126
Region Identification Layer
127
Region Identification Layer
128
Region Identification Layer
129
Classification
0
-1
1
130
Feed-Forward Neural Networks

Back-Propagation Learning Algorithm

131
Activation Function: Sigmoid
Remember this
132
Supervised Learning
Training Set
Output Layer
Hidden Layer
Input Layer
133
Supervised Learning
Training Set
Sum of Squared Errors
Goal
Minimize
134
Back Propagation Learning Algorithm

Learning on Output Neurons
Learning on Hidden Neurons

135
Learning on Output Neurons
?
?
136
Learning on Output Neurons
depends on the activation function
137
Learning on Output Neurons
Using sigmoid,
138
Learning on Output Neurons
Using sigmoid,
139
Learning on Output Neurons
140
Learning on Output Neurons
How to train the weights connecting to output
neurons?
141
Learning on Hidden Neurons
?
?
142
Learning on Hidden Neurons
143
Learning on Hidden Neurons
?
144
Learning on Hidden Neurons
145
Learning on Hidden Neurons
146
Back Propagation
147
Back Propagation
148
Back Propagation
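The two learning steps can be sketched for a tiny network. This is a sketch under assumptions, not the slides' exact derivation: unipolar sigmoid units in both layers, the sum-of-squared-errors cost, one hidden layer, a bias only at the input, and hypothetical names throughout. A finite-difference gradient is included purely to check the back-propagated one.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(params, X):
    W1, W2 = params
    H = sigmoid(X @ W1)                      # hidden-layer outputs
    Y = sigmoid(H @ W2)                      # output-layer outputs
    return H, Y

def sse(params, X, D):
    """Sum of squared errors E = 1/2 * sum (d - y)^2."""
    _, Y = forward(params, X)
    return 0.5 * np.sum((D - Y) ** 2)

def backprop(params, X, D):
    """Return dE/dW1 and dE/dW2 using the sigmoid identity y' = y * (1 - y)."""
    W1, W2 = params
    H, Y = forward(params, X)
    delta_out = (D - Y) * Y * (1.0 - Y)              # output-neuron error signals
    delta_hid = (delta_out @ W2.T) * H * (1.0 - H)   # errors propagated backward
    return -X.T @ delta_hid, -H.T @ delta_out

def numeric_grad(params, X, D, which, h=1e-6):
    """Finite-difference gradient for one weight matrix, used only as a check."""
    grad = np.zeros_like(params[which])
    for idx in np.ndindex(*grad.shape):
        old = params[which][idx]
        params[which][idx] = old + h
        up = sse(params, X, D)
        params[which][idx] = old - h
        down = sse(params, X, D)
        params[which][idx] = old
        grad[idx] = (up - down) / (2 * h)
    return grad

# XOR patterns; the third input column is a constant bias input of 1
rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])
params = [rng.normal(scale=0.5, size=(3, 3)), rng.normal(scale=0.5, size=(3, 1))]

gW1, gW2 = backprop(params, X, D)
eta = 0.1
before = sse(params, X, D)
params[0] -= eta * gW1        # delta_w = -eta * dE/dw
params[1] -= eta * gW2
after = sse(params, X, D)
print("error before:", before, "after one step:", after)
```

The hidden-layer error signal reuses the output-layer one, which is why the errors are said to propagate backward through the weights.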
149
Learning Factors

Initial Weights
Learning Constant (η)
Cost Functions
Momentum
Update Rules
Training Data and Generalization
Number of Layers
Number of Hidden Nodes
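Of these factors, momentum is the easiest to show in isolation. The sketch below uses one common form of the momentum update, where each step adds a fraction alpha of the previous step. The quadratic cost and the values of eta and alpha are assumptions chosen for illustration.

```python
import numpy as np

def momentum_descent(grad_fn, w0, eta=0.05, alpha=0.9, steps=300):
    """Gradient descent with momentum: dw(t) = -eta * grad + alpha * dw(t-1)."""
    w = np.asarray(w0, dtype=float).copy()
    dw = np.zeros_like(w)
    for _ in range(steps):
        dw = -eta * grad_fn(w) + alpha * dw
        w += dw
    return w

def grad(w):
    """Gradient of the hypothetical cost E(w) = 0.5 * (w1^2 + 25 * w2^2)."""
    return np.array([1.0, 25.0]) * w

w_final = momentum_descent(grad, [5.0, 5.0])
print("final weights:", w_final)
```

The momentum term smooths the zig-zag across the steep direction of the cost surface while speeding up progress along the shallow one.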

150
Reading Assignments

Shi Zhong and Vladimir Cherkassky, "Factors
Controlling Generalization Ability of MLP
Networks," in Proc. IEEE Int. Joint Conf. on
Neural Networks, vol. 1, pp. 625-630, Washington,
DC, July 1999. (http://www.cse.fau.edu/zhong/pubs.htm)
Rumelhart, D. E., Hinton, G. E., and Williams, R.
J. (1986b). "Learning Internal Representations by
Error Propagation," in Parallel Distributed
Processing: Explorations in the Microstructure of
Cognition, vol. I, D. E. Rumelhart, J. L.
McClelland, and the PDP Research Group. MIT
Press, Cambridge (1986).
(http://www.cnbc.cmu.edu/plaut/85-419/papers/RumelhartETAL86.backprop.pdf)