Robert J. Marks II - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Robert J. Marks II


1

Artificial Neural Networks Supervised Models
  • Robert J. Marks II
  • CIA Lab
  • Baylor University
  • School of Engineering
  • CiaLab.org

2
Supervised Learning
  • Given
  • Input (Stimulus)/Output (Response) Data
  • Objective
  • Train a machine to simulate the input/output
    relationship
  • Types
  • Classification (Discrete Outputs)
  • Regression (Continuous Outputs)

3
Training a Classifier
[Figure: training images, each presented to the classifier with its label: image → classifier → "Marks" or "not Marks".]
4
Recall from a Trained Classifier

[Figure: a new test image → classifier → "Marks".]
Note: The test image does not appear in the training data.
Learning ≠ Memorization
5
Classifier In Feature Space, After Training
[Figure: feature space after training, showing the training data ("Marks" and "not Marks"), the learned representation boundary, the concept (truth), and a test data point (Marks).]
6
Supervised Regression (Interpolation)
  • Output data is continuous rather than discrete
  • Example - Load Forecasting
  • Training (from historical data)
  • Input: temperatures, current load, day of week,
    holiday(?), etc.
  • Output: next day's load
  • Test
  • Input: forecasted temperatures, current load,
    day of week, holiday(?), etc.
  • Output: tomorrow's load forecast

7
Properties of Good Classifiers and Regression
Machines
  • Good accuracy outside of the training set
  • Explanation Facility
  • Generate rules after training
  • Fast training
  • Fast testing

8
Some Classifiers and Regression Machines
  • Classification and Regression Trees (CART)
  • Nearest Neighbor Look-Up
  • Neural Networks
  • Layered Perceptron (or MLPs)
  • Recurrent Perceptrons
  • Cascade Correlation Neural Networks
  • Radial Basis Function Neural Networks

9
A Model of an Artificial Neuron
[Figure: a neuron with input states s1, ..., s5, weighted by w1, ..., w5, feeding a summing junction followed by the squashing function σ.]
sum = Σn wn sn
s = state = σ(sum), where σ(·) is the squashing function
10
Squashing Functions
[Figure: plot of σ(sum), rising from 0 to 1 as sum increases.]
sigmoid: σ(x) = 1 / (1 + e^(−x))
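As a concrete illustration of the neuron model and sigmoid squashing function on the two slides above, here is a minimal Python sketch; the function names and example numbers are our own, not from the presentation.

```python
import numpy as np

def sigmoid(x):
    # squashing function: sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def neuron_state(states, weights):
    # sum = sum_n w_n * s_n ; state = sigma(sum)
    return sigmoid(np.dot(weights, states))

# example: five input states s1..s5 and weights w1..w5
s = np.array([0.2, -1.0, 0.5, 0.0, 1.3])
w = np.array([0.4, 0.1, -0.7, 0.9, 0.3])
print(neuron_state(s, w))
```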
11
A Layered Perceptron
[Figure: a layered perceptron, from bottom to top: input, interconnects, a hidden layer of neurons, and the output.]
12
Training
  • Given Training Data,
  • input vector set { i_n : 1 ≤ n ≤ N }
  • corresponding output (target) vector set { t_n : 1 ≤ n ≤ N }
  • Find the weights of the interconnects using the
    training data to minimize error on the test data

13
Error
  • Input, target response
  • input vector set { i_n : 1 ≤ n ≤ N }
  • target vector set { t_n : 1 ≤ n ≤ N }
  • o_n = neural network output when the input is i_n
    (note that, before training, o_n ≠ t_n in general)
  • Error
  • E = ½ Σn || o_n − t_n ||²
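A short Python sketch of this error measure; the output and target vectors below are made up for illustration.

```python
import numpy as np

def training_error(outputs, targets):
    # E = 1/2 * sum_n || o_n - t_n ||^2
    return 0.5 * sum(np.sum((o - t) ** 2) for o, t in zip(outputs, targets))

# N = 3 output/target vector pairs
outputs = [np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.6, 0.4])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(training_error(outputs, targets))
```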
14
Error Minimization Techniques
  • The error is a function of the
  • fixed training and test data
  • neural network weights
  • Find weights that minimize error (Standard
    Optimization)
  • conjugate gradient descent
  • random search
  • genetic algorithms
  • steepest descent (error backpropagation)

15
Minimizing Error Using Steepest Descent
  • The main idea
  • Find the way downhill and take a step

[Figure: the error E plotted against x, with the downhill direction and the minimum marked.]
downhill direction: − dE/dx
x ← x − η dE/dx, where η is the step size
16
Example of Steepest Descent
E(x) = ½ x², with minimum at x = 0
dE/dx = x
x ← x − η dE/dx = (1 − η) x
The solution to the difference equation x_p = (1 − η) x_{p−1} is x_p = (1 − η)^p x_0.
For |1 − η| < 1, x_p → 0.
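The same example worked as a short Python sketch; the step size η and starting point x0 are our own choices.

```python
# Steepest descent on E(x) = 0.5 * x**2, whose gradient is dE/dx = x.
eta = 0.1   # step size; |1 - eta| < 1, so the iteration converges
x = 5.0     # starting point x0

for p in range(50):
    x = x - eta * x   # x_p = (1 - eta) * x_{p-1}

print(x)  # near the minimum at x = 0; exactly (1 - eta)**50 * 5.0
```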
17
Training the Perceptron
[Figure: a single-layer perceptron with inputs i1, ..., i4, weights w11, ..., w24, and outputs o1, o2.]
E = ½ Σ_{n=1..2} (o_n − t_n)² = ½ Σ_{n=1..2} ( Σ_{k=1..4} w_{nk} i_k − t_n )²
∂E/∂w_{mj} = i_j ( Σ_{k=1..4} w_{mk} i_k − t_m ) = i_j (o_m − t_m)
18
Weight Update
[Figure: the same single-layer perceptron, with weight w24 connecting input i4 to output o2.]
∂E/∂w_{mj} = i_j (o_m − t_m), so for m = 2 and j = 4:
w24 ← w24 − η i4 (o2 − t2)
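A small Python sketch of this rule applied to every weight of the single-layer perceptron drawn above (no hidden layer, no squashing); the learning rate and data values are illustrative only.

```python
import numpy as np

eta = 0.05                                # learning rate (step size)
i = np.array([0.5, 1.0, -0.3, 0.8])       # inputs i1..i4
t = np.array([1.0, 0.0])                  # targets t1, t2
W = np.zeros((2, 4))                      # weights w_mj: 2 outputs x 4 inputs

for _ in range(100):
    o = W @ i                             # o_m = sum_k w_mk * i_k
    W -= eta * np.outer(o - t, i)         # dE/dw_mj = i_j * (o_m - t_m)

print(W @ i)  # the outputs approach the targets t
```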
19
No Hidden Layer: Linear Separation
o = σ( Σn wn in )
For a classifier, threshold at ½:
If o > ½, announce class 1.
If o < ½, announce class 2.
Classification boundary: o = ½, or Σn wn in = 0. This is the equation of a plane!
[Figure: a single neuron with inputs i1, i2, i3 and weights w1, w2, w3 producing output o.]
20
Σn wn in = 0 is a line through the origin.
[Figure: the classification boundary drawn in the (i1, i2) plane, passing through the origin.]
21
Adding Bias Term
[Figure: a neuron with inputs i1, i2, i3 plus a constant input 1, weighted by w1, w2, w3 and bias weight w4, producing output o.]
The classification boundary is still a line, but it need not go through the origin.
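A short Python sketch of this bias trick (the numbers are illustrative): appending a constant 1 to the input vector turns the bias into an ordinary weight, so the boundary no longer has to pass through the origin.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])    # weights w1, w2, w3 on the inputs
w_bias = 0.75                     # bias weight w4 on the constant input 1
i = np.array([0.3, 0.9, -0.4])    # inputs i1, i2, i3

i_aug = np.append(i, 1.0)         # augmented input (i1, i2, i3, 1)
w_aug = np.append(w, w_bias)      # augmented weights (w1, w2, w3, w4)

# both formulations give the same weighted sum
print(np.dot(w, i) + w_bias, np.dot(w_aug, i_aug))
```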
22
The Minsky-Papert Objection
The simple operation of the exclusive or (XOR) cannot be resolved using a linear perceptron with bias. ⇒ More important problems can thus probably not be resolved with a linear perceptron with bias either.
[Figure: a single unit with inputs i1 and i2 (weights 1 and 1), illustrating the XOR problem.]
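A numerical illustration of the objection (not a proof), in Python: a brute-force search over many candidate lines w1·i1 + w2·i2 + b = 0 finds none whose thresholded output reproduces the XOR truth table.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])                      # XOR targets

found = False
grid = np.linspace(-2.0, 2.0, 41)
for w1, w2, b in itertools.product(grid, repeat=3):
    o = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if np.array_equal(o, t):
        found = True
        break

print("linear separator found:", found)  # expected: False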
23
The Layered Perceptron
[Figure: the layered perceptron labeled with this notation:]
output layer: l = L
neuron states: sj(l)
hidden layers: intermediate values of l
interconnect weights: wjk(l)
input layer: l = 0
24
Error Backpropagation
Problem: For an arbitrary weight wjk(l), perform the update
wjk(l) ← wjk(l) − η ∂E/∂wjk(l)
A Solution: Error Backpropagation, the chain rule for partial derivatives:
∂E/∂wjk(l) = [∂E/∂sj(l)] · [∂sj(l)/∂sumj(l)] · [∂sumj(l)/∂wjk(l)]
25
Each Partial is Evaluated(Beautiful Math!!!)
∂sj(l)/∂sumj(l) = d/dsumj(l) [ 1 / (1 + exp(−sumj(l))) ] = sj(l) [1 − sj(l)]
∂sumj(l)/∂wjk(l) = sk(l−1)
∂E/∂sj(l) ≡ δj(l) = Σn δn(l+1) sn(l+1) [1 − sn(l+1)] wnj(l+1)
26
Weight Update
wjk(l) ← wjk(l) − η ∂E/∂wjk(l)
∂E/∂wjk(l) = [∂E/∂sj(l)] · [∂sj(l)/∂sumj(l)] · [∂sumj(l)/∂wjk(l)] = δj(l) sj(l) [1 − sj(l)] sk(l−1)
27
Step 1: Input Data Feedforward
The states of all of the neurons are determined
by the states of the neurons below them and the
interconnect weights.
[Figure: inputs i1 = s1(0), i2 = s2(0) feed hidden states s1(1), s2(1), s3(1), which feed outputs o1 = s1(2), o2 = s2(2).]
28
Step 2: Evaluate the output error, backpropagate to
find the δ's for each neuron
[Figure: outputs o1, o2 are compared with targets t1, t2, giving δ1(2), δ2(2) at the output layer, δ1(1), δ2(1), δ3(1) at the hidden layer, and δ1(0), δ2(0) at the input layer.]
Each neuron now keeps track of two numbers. The
δ's for each neuron are determined by
backpropagating the output error towards the
input.
29
Step 3: Update Weights
[Figure: the same network, annotated with its states and δ's.]
Weight updates are performed within the neural
network architecture, e.g.
w32(1) ← w32(1) − η δ3(1) s3(1) [1 − s3(1)] s2(0)
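The three steps above, collected into a minimal Python sketch for the 2-input / 3-hidden-neuron / 2-output network drawn in these figures. The architecture follows the slides; the data, learning rate, and variable names are our own illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2))   # weights wjk(1): inputs -> hidden layer
W2 = rng.normal(scale=0.5, size=(2, 3))   # weights wjk(2): hidden layer -> outputs
eta = 0.5

i = np.array([0.1, 0.9])                  # inputs i1 = s1(0), i2 = s2(0)
t = np.array([1.0, 0.0])                  # targets t1, t2

for _ in range(1000):
    # Step 1: feedforward -- each state is set by the states below it
    s1 = sigmoid(W1 @ i)                  # hidden states s1(1), s2(1), s3(1)
    o = sigmoid(W2 @ s1)                  # outputs o1 = s1(2), o2 = s2(2)

    # Step 2: backpropagate the output error to get the deltas
    d2 = o - t                            # dE/ds at the output layer
    d1 = W2.T @ (d2 * o * (1 - o))        # dE/ds at the hidden layer

    # Step 3: weight updates, e.g. w32(1) -= eta * d3(1) * s3(1) * (1 - s3(1)) * s2(0)
    W2 -= eta * np.outer(d2 * o * (1 - o), s1)
    W1 -= eta * np.outer(d1 * s1 * (1 - s1), i)

print(o)  # close to the targets t after training
```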
30
Neural Smithing
  • Bias
  • Momentum
  • Batch Training
  • Learning Versus Memorization
  • Cross Validation
  • The Curse of Dimensionality
  • Variations

31
Bias
  • Bias is used with MLPs
  • at the input
  • at hidden layers (sometimes)

32
Momentum
  • Steepest descent:
  • wjk(l)^(m+1) = wjk(l)^(m) + Δwjk(l)^(m)
  • With momentum α:
  • wjk(l)^(m+1) = wjk(l)^(m) + Δwjk(l)^(m) + α Δwjk(l)^(m−1)
  • The new step is affected by the previous step
  • m is the iteration number
  • Convergence is improved
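A minimal sketch of the momentum bookkeeping following the equation above; the placeholder gradient, step size, and momentum coefficient are our own choices for illustration.

```python
def grad(w):
    # placeholder gradient dE/dw for the toy error E(w) = 0.5 * w**2
    return w

eta, alpha = 0.1, 0.5   # step size and momentum coefficient
w, prev_step = 5.0, 0.0

for m in range(100):
    step = -eta * grad(w)             # steepest-descent step, Delta w^(m)
    w = w + step + alpha * prev_step  # momentum adds alpha * Delta w^(m-1)
    prev_step = step

print(w)  # converges toward the minimum at w = 0
```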
33
Back Propagation Batch Training
  • Accumulate error from all training data prior to
    weight update
  • True steepest descent
  • Update weights each epoch
  • Training the layered perceptron one data pair at
    a time
  • Randomize data to avoid structure
  • The Widrow-Hoff Algorithm

34
Learning versus Memorization: Both have zero
training error
[Figure: training data and test data with two curves through the training points: one with good generalization (learning) that follows the concept (truth), and one with bad generalization (memorization).]
35
Alternate View
[Figure: an alternate view showing the concept, a learned fit, and a memorized (over-fitting) fit.]
36
Learning versus Memorization (cont.)
  • Successful Learning
  • Recognizing data outside the training set, e.g.
    data in the test set.
  • i.e. the neural network must successfully
    classify (interpolate) inputs it has not seen
    before.
  • How can we assure learning?
  • Cross Validation
  • Choosing neural network structure
  • Pruning
  • Genetic Algorithms

37
Cross Validation
[Figure: training error and test error plotted against training iterations (m); the training error keeps decreasing while the test error reaches a minimum and then rises.]
38
The Curse of Dimensionality
  • For many problems, the required number of training
    data pairs grows exponentially with the dimension
    of the input.
  • Example
  • For N = 2 inputs, suppose that
  • 100 = 10² training data pairs suffice
  • For N = 3 inputs,
  • 10³ = 1000 training data pairs are needed
  • In general, 10^N training data pairs are needed
    for many important problems.

39
Example Classifying a circle in a square
[Figure: a circle inscribed in a square in the (i1, i2) plane; a neural net computes o from i1, i2; 100 = 10² sample points are shown.]
40
Example Classifying a sphere in a cubeN3
[Figure: a sphere inscribed in a cube in (i1, i2, i3) space; a neural net computes o from i1, i2, i3.]
10 layers, each with 10² points: 10³ points = 10^N points.
41
Variations
  • Architecture variation for MLPs
  • Recurrent Neural Networks
  • Radial Basis Functions
  • Cascade Correlation
  • Fuzzy MLPs
  • Training Algorithms

42
Applications
  • Power Engineering
  • Finance
  • Bioengineering
  • Control
  • Industrial Applications
  • Politics

43
Political Applications
Robert Novak, syndicated column, Washington,
February 18, 1996: UNDECIDED BOWLERS. President
Clinton's pollsters have identified the voters
who will determine whether he will be elected to
a second term: two-parent families whose members
bowl for recreation. Using a technique they
call the neural network, Clinton advisors
contend that these family bowlers are the
quintessential undecided voters. Therefore,
these are the people who must be targeted by the
president.
44
Robert Novak, syndicated column, Washington,
February 18, 1996 (continued)
A footnote: Two decades ago, Illinois Democratic
Gov. Dan Walker campaigned heavily in bowling
alleys in the belief he would find swing voters
there. Walker had national political ambitions
but ended up in federal prison.
45
Finis