Robert J. Marks II - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Robert J. Marks II


1

Artificial Neural Networks Supervised Models
  • Robert J. Marks II
  • CIA Lab
  • Baylor University
  • School of Engineering
  • CiaLab.org

2
Supervised Learning
  • Given
  • Input (Stimulus)/Output (Response) Data
  • Objective
  • Train a machine to simulate the input/output
    relationship
  • Types
  • Classification (Discrete Outputs)
  • Regression (Continuous Outputs)

3
Training a Classifier
[Figure: training images, each presented to the classifier with its label: image → classifier → "Marks" or "not Marks".]
4
Recall from a Trained Classifier

[Figure: a new test image → classifier → "Marks".]
Note: The test image does not appear in the training data.
Learning ≠ Memorization
5
Classifier In Feature Space, After Training
[Figure: feature space after training, showing the training data ("Marks" and "not Marks"), the learned representation boundary, the concept (truth), and a test data point (Marks).]
6
Supervised Regression (Interpolation)
  • Output data is continuous rather than discrete
  • Example - Load Forecasting
  • Training (from historical data)
  • Input: temperatures, current load, day of week,
    holiday(?), etc.
  • Output: next day's load
  • Test
  • Input: forecasted temperatures, current load,
    day of week, holiday(?), etc.
  • Output: tomorrow's load forecast

7
Properties of Good Classifiers and Regression
Machines
  • Good accuracy outside of the training set
  • Explanation Facility
  • Generate rules after training
  • Fast training
  • Fast testing

8
Some Classifiers and Regression Machines
  • Classification and Regression Trees (CART)
  • Nearest Neighbor Look-Up
  • Neural Networks
  • Layered Perceptron (or MLPs)
  • Recurrent Perceptrons
  • Cascade Correlation Neural Networks
  • Radial Basis Function Neural Networks

9
A Model of an Artificial Neuron
[Figure: a neuron with input states s1, ..., s5, weighted by w1, ..., w5, feeding a summing junction followed by the squashing function σ.]
sum = Σn wn sn
s = state = σ(sum), where σ(·) is the squashing function
10
Squashing Functions
[Figure: plot of σ(sum), rising from 0 to 1 as sum increases.]
sigmoid: σ(x) = 1 / (1 + e^(−x))
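As a concrete illustration of the neuron model and sigmoid squashing function on the two slides above, here is a minimal Python sketch; the function names and example numbers are our own, not from the presentation.

```python
import numpy as np

def sigmoid(x):
    # squashing function: sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def neuron_state(states, weights):
    # sum = sum_n w_n * s_n ; state = sigma(sum)
    return sigmoid(np.dot(weights, states))

# example: five input states s1..s5 and weights w1..w5
s = np.array([0.2, -1.0, 0.5, 0.0, 1.3])
w = np.array([0.4, 0.1, -0.7, 0.9, 0.3])
print(neuron_state(s, w))
```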
11
A Layered Perceptron
[Figure: a layered perceptron, from bottom to top: input, interconnects, a hidden layer of neurons, and the output.]
12
Training
  • Given Training Data,
  • input vector set { i_n : 1 ≤ n ≤ N }
  • corresponding output (target) vector set { t_n : 1 ≤ n ≤ N }
  • Find the weights of the interconnects using the
    training data to minimize error on the test data

13
Error
  • Input, target response
  • input vector set { i_n : 1 ≤ n ≤ N }
  • target vector set { t_n : 1 ≤ n ≤ N }
  • o_n = neural network output when the input is i_n
    (note that, before training, o_n ≠ t_n in general)
  • Error
  • E = ½ Σn || o_n − t_n ||²
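A short Python sketch of this error measure; the output and target vectors below are made up for illustration.

```python
import numpy as np

def training_error(outputs, targets):
    # E = 1/2 * sum_n || o_n - t_n ||^2
    return 0.5 * sum(np.sum((o - t) ** 2) for o, t in zip(outputs, targets))

# N = 3 output/target vector pairs
outputs = [np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.6, 0.4])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(training_error(outputs, targets))
```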
14
Error Minimization Techniques
  • The error is a function of the
  • fixed training and test data
  • neural network weights
  • Find weights that minimize error (Standard
    Optimization)
  • conjugate gradient descent
  • random search
  • genetic algorithms
  • steepest descent (error backpropagation)

15
Minimizing Error Using Steepest Descent
  • The main idea
  • Find the way downhill and take a step

[Figure: the error E plotted against x, with the downhill direction and the minimum marked.]
downhill direction: − dE/dx
x ← x − η dE/dx, where η is the step size
16
Example of Steepest Descent
E(x) = ½ x², with minimum at x = 0
dE/dx = x
x ← x − η dE/dx = (1 − η) x
The solution to the difference equation x_p = (1 − η) x_{p−1} is x_p = (1 − η)^p x_0.
For |1 − η| < 1, x_p → 0.
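The same example worked as a short Python sketch; the step size η and starting point x0 are our own choices.

```python
# Steepest descent on E(x) = 0.5 * x**2, whose gradient is dE/dx = x.
eta = 0.1   # step size; |1 - eta| < 1, so the iteration converges
x = 5.0     # starting point x0

for p in range(50):
    x = x - eta * x   # x_p = (1 - eta) * x_{p-1}

print(x)  # near the minimum at x = 0; exactly (1 - eta)**50 * 5.0
```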
17
Training the Perceptron
[Figure: a single-layer perceptron with inputs i1, ..., i4, weights w11, ..., w24, and outputs o1, o2.]
E = ½ Σ_{n=1..2} (o_n − t_n)² = ½ Σ_{n=1..2} ( Σ_{k=1..4} w_{nk} i_k − t_n )²
∂E/∂w_{mj} = i_j ( Σ_{k=1..4} w_{mk} i_k − t_m ) = i_j (o_m − t_m)
18
Weight Update
[Figure: the same single-layer perceptron, with weight w24 connecting input i4 to output o2.]
∂E/∂w_{mj} = i_j (o_m − t_m), so for m = 2 and j = 4:
w24 ← w24 − η i4 (o2 − t2)
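A small Python sketch of this rule applied to every weight of the single-layer perceptron drawn above (no hidden layer, no squashing); the learning rate and data values are illustrative only.

```python
import numpy as np

eta = 0.05                                # learning rate (step size)
i = np.array([0.5, 1.0, -0.3, 0.8])       # inputs i1..i4
t = np.array([1.0, 0.0])                  # targets t1, t2
W = np.zeros((2, 4))                      # weights w_mj: 2 outputs x 4 inputs

for _ in range(100):
    o = W @ i                             # o_m = sum_k w_mk * i_k
    W -= eta * np.outer(o - t, i)         # dE/dw_mj = i_j * (o_m - t_m)

print(W @ i)  # the outputs approach the targets t
```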
19
No Hidden Layer: Linear Separation
o = σ( Σn wn in )
For a classifier, threshold at ½:
If o > ½, announce class 1.
If o < ½, announce class 2.
Classification boundary: o = ½, or Σn wn in = 0. This is the equation of a plane!
[Figure: a single neuron with inputs i1, i2, i3 and weights w1, w2, w3 producing output o.]
20
Σn wn in = 0 is a line through the origin.
[Figure: the classification boundary drawn in the (i1, i2) plane, passing through the origin.]
21
Adding Bias Term
[Figure: a neuron with inputs i1, i2, i3 plus a constant input 1, weighted by w1, w2, w3 and bias weight w4, producing output o.]
The classification boundary is still a line, but it need not go through the origin.
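A short Python sketch of this bias trick (the numbers are illustrative): appending a constant 1 to the input vector turns the bias into an ordinary weight, so the boundary no longer has to pass through the origin.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])    # weights w1, w2, w3 on the inputs
w_bias = 0.75                     # bias weight w4 on the constant input 1
i = np.array([0.3, 0.9, -0.4])    # inputs i1, i2, i3

i_aug = np.append(i, 1.0)         # augmented input (i1, i2, i3, 1)
w_aug = np.append(w, w_bias)      # augmented weights (w1, w2, w3, w4)

# both formulations give the same weighted sum
print(np.dot(w, i) + w_bias, np.dot(w_aug, i_aug))
```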
22
The Minsky-Papert Objection
The simple operation of the exclusive or (XOR) cannot be resolved using a linear perceptron with bias. ⇒ More important problems can thus probably not be resolved with a linear perceptron with bias either.
[Figure: a single unit with inputs i1 and i2 (weights 1 and 1), illustrating the XOR problem.]
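A numerical illustration of the objection (not a proof), in Python: a brute-force search over many candidate lines w1·i1 + w2·i2 + b = 0 finds none whose thresholded output reproduces the XOR truth table.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])                      # XOR targets

found = False
grid = np.linspace(-2.0, 2.0, 41)
for w1, w2, b in itertools.product(grid, repeat=3):
    o = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if np.array_equal(o, t):
        found = True
        break

print("linear separator found:", found)  # expected: False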
23
The Layered Perceptron
[Figure: the layered perceptron labeled with this notation:]
output layer: l = L
neuron states: sj(l)
hidden layers: intermediate values of l
interconnect weights: wjk(l)
input layer: l = 0
24
Error Backpropagation
Problem: For an arbitrary weight wjk(l), perform the update
wjk(l) ← wjk(l) − η ∂E/∂wjk(l)
A Solution: Error Backpropagation, the chain rule for partial derivatives:
∂E/∂wjk(l) = [∂E/∂sj(l)] · [∂sj(l)/∂sumj(l)] · [∂sumj(l)/∂wjk(l)]
25
Each Partial is Evaluated(Beautiful Math!!!)
∂sj(l)/∂sumj(l) = d/dsumj(l) [ 1 / (1 + exp(−sumj(l))) ] = sj(l) [1 − sj(l)]
∂sumj(l)/∂wjk(l) = sk(l−1)
∂E/∂sj(l) ≡ δj(l) = Σn δn(l+1) sn(l+1) [1 − sn(l+1)] wnj(l+1)
26
Weight Update
wjk(l) ← wjk(l) − η ∂E/∂wjk(l)
∂E/∂wjk(l) = [∂E/∂sj(l)] · [∂sj(l)/∂sumj(l)] · [∂sumj(l)/∂wjk(l)] = δj(l) sj(l) [1 − sj(l)] sk(l−1)
27
Step 1: Input Data Feedforward
The states of all of the neurons are determined
by the states of the neurons below them and the
interconnect weights.
[Figure: inputs i1 = s1(0), i2 = s2(0) feed hidden states s1(1), s2(1), s3(1), which feed outputs o1 = s1(2), o2 = s2(2).]
28
Step 2: Evaluate the output error, backpropagate to
find the δ's for each neuron
[Figure: outputs o1, o2 are compared with targets t1, t2, giving δ1(2), δ2(2) at the output layer, δ1(1), δ2(1), δ3(1) at the hidden layer, and δ1(0), δ2(0) at the input layer.]
Each neuron now keeps track of two numbers. The
δ's for each neuron are determined by
backpropagating the output error towards the
input.
29
Step 3: Update Weights
[Figure: the same network, annotated with its states and δ's.]
Weight updates are performed within the neural
network architecture, e.g.
w32(1) ← w32(1) − η δ3(1) s3(1) [1 − s3(1)] s2(0)
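The three steps above, collected into a minimal Python sketch for the 2-input / 3-hidden-neuron / 2-output network drawn in these figures. The architecture follows the slides; the data, learning rate, and variable names are our own illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2))   # weights wjk(1): inputs -> hidden layer
W2 = rng.normal(scale=0.5, size=(2, 3))   # weights wjk(2): hidden layer -> outputs
eta = 0.5

i = np.array([0.1, 0.9])                  # inputs i1 = s1(0), i2 = s2(0)
t = np.array([1.0, 0.0])                  # targets t1, t2

for _ in range(1000):
    # Step 1: feedforward -- each state is set by the states below it
    s1 = sigmoid(W1 @ i)                  # hidden states s1(1), s2(1), s3(1)
    o = sigmoid(W2 @ s1)                  # outputs o1 = s1(2), o2 = s2(2)

    # Step 2: backpropagate the output error to get the deltas
    d2 = o - t                            # dE/ds at the output layer
    d1 = W2.T @ (d2 * o * (1 - o))        # dE/ds at the hidden layer

    # Step 3: weight updates, e.g. w32(1) -= eta * d3(1) * s3(1) * (1 - s3(1)) * s2(0)
    W2 -= eta * np.outer(d2 * o * (1 - o), s1)
    W1 -= eta * np.outer(d1 * s1 * (1 - s1), i)

print(o)  # close to the targets t after training
```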
30
Neural Smithing
  • Bias
  • Momentum
  • Batch Training
  • Learning Versus Memorization
  • Cross Validation
  • The Curse of Dimensionality
  • Variations

31
Bias
  • Bias is used with MLPs
  • at the input
  • at hidden layers (sometimes)

32
Momentum
  • Steepest descent:
  • wjk(l)^(m+1) = wjk(l)^(m) + Δwjk(l)^(m)
  • With momentum α:
  • wjk(l)^(m+1) = wjk(l)^(m) + Δwjk(l)^(m) + α Δwjk(l)^(m−1)
  • The new step is affected by the previous step
  • m is the iteration number
  • Convergence is improved
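A minimal sketch of the momentum bookkeeping following the equation above; the placeholder gradient, step size, and momentum coefficient are our own choices for illustration.

```python
def grad(w):
    # placeholder gradient dE/dw for the toy error E(w) = 0.5 * w**2
    return w

eta, alpha = 0.1, 0.5   # step size and momentum coefficient
w, prev_step = 5.0, 0.0

for m in range(100):
    step = -eta * grad(w)             # steepest-descent step, Delta w^(m)
    w = w + step + alpha * prev_step  # momentum adds alpha * Delta w^(m-1)
    prev_step = step

print(w)  # converges toward the minimum at w = 0
```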
33
Back Propagation Batch Training
  • Accumulate error from all training data prior to
    weight update
  • True steepest descent
  • Update weights each epoch
  • Training the layered perceptron one data pair at
    a time
  • Randomize data to avoid structure
  • The Widrow-Hoff Algorithm

34
Learning versus Memorization: Both have zero
training error
[Figure: training data and test data with two curves through the training points: one with good generalization (learning) that follows the concept (truth), and one with bad generalization (memorization).]
35
Alternate View
[Figure: an alternate view showing the concept, a learned fit, and a memorized (over-fitting) fit.]
36
Learning versus Memorization (cont.)
  • Successful Learning
  • Recognizing data outside the training set, e.g.
    data in the test set.
  • i.e. the neural network must successfully
    classify (interpolate) inputs it has not seen
    before.
  • How can we assure learning?
  • Cross Validation
  • Choosing neural network structure
  • Pruning
  • Genetic Algorithms

37
Cross Validation
[Figure: training error and test error plotted against training iterations (m); the training error keeps decreasing while the test error reaches a minimum and then rises.]
38
The Curse of Dimensionality
  • For many problems, the required number of training
    data pairs grows exponentially with the dimension
    of the input.
  • Example
  • For N = 2 inputs, suppose that
  • 100 = 10² training data pairs suffice
  • For N = 3 inputs,
  • 10³ = 1000 training data pairs are needed
  • In general, 10^N training data pairs are needed
    for many important problems.

39
Example Classifying a circle in a square
[Figure: a circle inscribed in a square in the (i1, i2) plane; a neural net computes o from i1, i2; 100 = 10² sample points are shown.]
40
Example Classifying a sphere in a cubeN3
[Figure: a sphere inscribed in a cube in (i1, i2, i3) space; a neural net computes o from i1, i2, i3.]
10 layers, each with 10² points: 10³ points = 10^N points.
41
Variations
  • Architecture variation for MLPs
  • Recurrent Neural Networks
  • Radial Basis Functions
  • Cascade Correlation
  • Fuzzy MLPs
  • Training Algorithms

42
Applications
  • Power Engineering
  • Finance
  • Bioengineering
  • Control
  • Industrial Applications
  • Politics

43
Political Applications
Robert Novak, syndicated column, Washington,
February 18, 1996: UNDECIDED BOWLERS. President
Clinton's pollsters have identified the voters
who will determine whether he will be elected to
a second term: two-parent families whose members
bowl for recreation. Using a technique they
call the neural network, Clinton advisors
contend that these family bowlers are the
quintessential undecided voters. Therefore,
these are the people who must be targeted by the
president.
44
Robert Novak, syndicated column, Washington,
February 18, 1996 (continued)
A footnote: Two decades ago, Illinois Democratic
Gov. Dan Walker campaigned heavily in bowling
alleys in the belief he would find swing voters
there. Walker had national political ambitions
but ended up in federal prison.
45
Finis