Training Neural Networks - PowerPoint PPT Presentation

1
Training Neural Networks
  • Robert Turetsky
  • Columbia University, rjt72@columbia.edu
  • Systems, Man and Cybernetics Society
  • IEEE North Jersey Chapter
  • December 12, 2000

2
Objective
  • Introduce fundamental concepts in Artificial
    Neural Networks
  • Discuss methods of training ANNs
  • Explore some uses of ANNs
  • Assess the accuracy of artificial neurons as
    models for biological neurons
  • Discuss current views, ideas and research

3
Organization
  • Why Neural Networks?
  • Single TLUs
  • Training Neural Nets: Backpropagation
  • Working with Neural Networks
  • Modeling the neuron
  • The multi-agent architecture
  • Directions and destinations

4
Why Neural Networks?
5
The Von Neumann architecture
  • Memory for programs and data
  • CPU for math and logic
  • Control unit to steer program flow

6
Von Neumann
  • Follows rules
  • Solution can/must be formally specified
  • Cannot generalize
  • Not error tolerant
Neural Net
  • Learns from data
  • Rules on data are not visible
  • Able to generalize
  • Copes well with noise

7
Circuits that LEARN
  • Three types of learning:
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Hebbian networks: reward good paths, punish
    bad paths
  • Train a neural net by adjusting weights
  • PAC (Probably Approximately Correct) theory:
    Kearns and Vazirani 1994, Haussler 1990

8
Supervised Learning Concepts
  • Training set: a set of input/output pairs
  • Supervised learning because we know the correct
    action for every input in the training set
  • We want our neural net to act correctly on as
    many training vectors as possible
  • Choose the training set to be a typical set of inputs
  • The neural net will (hopefully) generalize to all
    inputs based on the training set
  • Validation set: check to see how well our
    training can generalize
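The training/validation split described above can be sketched in a few lines of Python (the helper name and the 80/20 split are illustrative choices, not from the slides):

```python
import random

def split_dataset(pairs, validation_fraction=0.2, seed=0):
    """Split input/output pairs into a training set and a validation set.

    The training set teaches the network; the held-out validation set
    checks how well the learned weights generalize to unseen inputs.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Toy dataset: (input vector, desired output) pairs.
data = [([i, i % 2], i % 2) for i in range(10)]
train, val = split_dataset(data)
print(len(train), len(val))  # 8 2
```

Fixing the seed makes the split reproducible, so repeated training runs are comparable.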

9
Neural Net Applications
  • Miros Corp.: face recognition
  • Handwriting recognition
  • BrainMaker: medical diagnosis
  • Bushnell: neural net for combinational automatic
    test pattern generation
  • ALVINN: Knight Rider in real life!
  • Getting rich: LBS Capital Management predicts the
    S&P 500

10
History of Neural Networks
  • 1943: McCulloch and Pitts - Modeling the Neuron
    for Parallel Distributed Processing
  • 1958: Rosenblatt - Perceptron
  • 1969: Minsky and Papert publish limits on the
    ability of a perceptron to generalize
  • 1970s and 1980s: ANN renaissance
  • 1986: Rumelhart, Hinton and Williams present
    backpropagation
  • 1989: Tsividis - Neural Network on a chip

11
Threshold Logic Units
  • The building blocks of Neural Networks

12
The TLU at a glance
  • TLU: Threshold Logic Unit
  • Loosely based on the firing of biological neurons
  • Many inputs, one binary output
  • Threshold: biasing function
  • Squashing function: compresses an infinite input
    range into the range 0 - 1
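A minimal sketch of a TLU in Python, assuming the weighted-sum-versus-threshold form described above (the AND weights in the demo are one illustrative choice):

```python
def tlu(x, w, theta):
    """Threshold Logic Unit: many inputs, one binary output.

    Fires (outputs 1) when the weighted sum of the inputs reaches
    the threshold theta, otherwise outputs 0 -- a crude model of a
    biological neuron firing.
    """
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= theta else 0

# A TLU computing logical AND of two binary inputs:
weights, theta = [1.0, 1.0], 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, tlu(x, weights, theta))  # fires only for [1, 1]
```

Only the weighted sum crosses 1.5 for input [1, 1], so the unit implements AND.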

13
The TLU in Action
14
Training TLUs Notation
  • θ: threshold of the TLU
  • X: input vector
  • W: weight vector
  • s = X·W; if s ≥ θ, output 1; if s < θ, output 0
  • d: desired output of the TLU
  • f: output of the TLU with current X and W

15
Augmented Vectors
  • Motivation: train the threshold θ at the same time
    as the input weights
  • X·W ≥ θ is the same as X·W - θ ≥ 0
  • Set the threshold of the TLU to 0
  • Augment W: W = [w1, w2, ..., wn, -θ]
  • Augment X: X = [x1, x2, ..., xn, 1]
  • New TLU equation: X·W ≥ 0 (for augmented X and W)
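The equivalence above is easy to check in code: folding -θ into the weight vector and appending a constant 1 to the input gives the same outputs as a TLU with an explicit threshold. A small sketch (weights and threshold are illustrative):

```python
def tlu(x, w, theta):
    """TLU with an explicit threshold."""
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= theta else 0

def augmented_tlu(x_aug, w_aug):
    """TLU with the threshold folded into the weights: fire when X.W >= 0."""
    s = sum(xi * wi for xi, wi in zip(x_aug, w_aug))
    return 1 if s >= 0 else 0

w, theta = [1.0, 1.0], 1.5
w_aug = w + [-theta]              # W = [w1, ..., wn, -theta]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    x_aug = x + [1]               # X = [x1, ..., xn, 1]
    assert tlu(x, w, theta) == augmented_tlu(x_aug, w_aug)
print("augmented form matches the thresholded form on all inputs")
```

Since the threshold is now just another weight, any training rule that adjusts weights trains the threshold for free.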

16
Gradient Descent Methods
  • Error function: how far off are we?
  • Example error function: ε = (d - f)²
  • ε depends on the weight values
  • Gradient descent: minimize the error by moving the
    weights along the decreasing slope of the error
  • The idea: iterate through the training set and
    adjust the weights to minimize the gradient of
    the error

17
Gradient Descent The Math
  • We have ε = (d - f)²
  • Gradient of ε: ∂ε/∂W = (∂ε/∂s)(∂s/∂W)
  • Using the chain rule: ∂ε/∂s = (∂ε/∂f)(∂f/∂s)
  • Since s = X·W, we have ∂s/∂W = X
  • Also ∂ε/∂f = -2(d - f)
  • Which finally gives ∂ε/∂W = -2(d - f)(∂f/∂s)X

18
Gradient Descent Back to reality
  • So we have ∂ε/∂W = -2(d - f)(∂f/∂s)X
  • The problem: ∂f/∂s does not exist, because the
    threshold function f is not differentiable
  • Three solutions:
  • Ignore it: the Error-Correction Procedure
  • Fudge it: Widrow-Hoff
  • Approximate it: the Generalized Delta Procedure
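The "ignore it" option is the classic perceptron rule: use the thresholded output f directly in the update W ← W + c(d - f)X, adjusting weights only on errors. A minimal sketch on an augmented TLU (learning rate and epoch count are illustrative choices):

```python
def train_error_correction(training_set, n_inputs, c=0.5, epochs=20):
    """Error-correction procedure on an augmented TLU.

    The non-differentiable threshold output f is used directly:
    weights change by c*(d - f)*X, and only when the output is wrong.
    """
    w = [0.0] * (n_inputs + 1)               # augmented weights, -theta last
    for _ in range(epochs):
        for x, d in training_set:
            x_aug = list(x) + [1]
            s = sum(xi * wi for xi, wi in zip(x_aug, w))
            f = 1 if s >= 0 else 0
            if f != d:                       # adjust only on error
                w = [wi + c * (d - f) * xi for wi, xi in zip(w, x_aug)]
    return w

# Linearly separable training set: logical AND.
ts = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_error_correction(ts, n_inputs=2)
for x, d in ts:
    x_aug = list(x) + [1]
    f = 1 if sum(xi * wi for xi, wi in zip(x_aug, w)) >= 0 else 0
    print(x, f, d)
```

For a linearly separable training set like this one, the rule is guaranteed to converge to a separating weight vector in finitely many updates.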

19
Training a TLU Example
  • Train a neural network to match the following
    linearly separable training set

20
Behind the scenes Planes and Hyperplanes
21
What can a TLU learn?
22
Linearly Separable Functions
  • A single TLU can implement any linearly separable
    function
  • A ∧ B is linearly separable
  • A ⊕ B (XOR) is not
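The contrast can be demonstrated by brute force: search a small grid of weights and thresholds for a TLU matching a target truth table. The grid search finds AND easily; for XOR it finds nothing, which theory confirms no TLU of any weights could compute (the grid itself is only an illustrative sample of weight space):

```python
from itertools import product

def tlu_output(x, w, theta):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

def some_tlu_computes(target):
    """Search a coarse grid of (w1, w2, theta) for a TLU matching target."""
    grid = [i / 2 for i in range(-4, 5)]      # -2.0 .. 2.0 in steps of 0.5
    inputs = list(product([0, 1], repeat=2))
    for w1, w2, theta in product(grid, repeat=3):
        if all(tlu_output(x, (w1, w2), theta) == target(*x) for x in inputs):
            return True
    return False

print(some_tlu_computes(lambda a, b: a & b))  # True: AND is separable
print(some_tlu_computes(lambda a, b: a ^ b))  # False: XOR is not
```

Geometrically, AND's single positive point can be cut off by one line, while XOR's two positive points sit on opposite corners and no single line separates them.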

23
NEURAL NETWORKS
  • An Architecture for Learning

24
Neural Network Fundamentals
  • Chain multiple TLUs together
  • Three layers:
  • Input layer
  • Hidden layers
  • Output layer
  • Two classifications:
  • Feed-forward
  • Recurrent

25
Neural Network Terminology
26
Training ANNs Backpropagation
  • Main idea: distribute the error function across
    the hidden layers, corresponding to their effect
    on the output
  • Works on feed-forward networks
  • Use sigmoid units to train; afterwards they can be
    replaced with threshold units

27
Back-Propagation: Bird's-eye view
  • Repeat
  • Choose training pair and copy it to input layer
  • Cycle that pattern through the net
  • Calculate error derivative between output
    activation and target output
  • Back propagate the summed product of the weights
    and errors in the output layer to calculate the
    error on the hidden units
  • Update weights according to the error on that
    unit
  • Until error is low or the net settles
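The loop above can be sketched in Python for a net with one hidden layer of sigmoid units. Layer sizes, learning rate, epoch count, and the XOR demo are illustrative choices, and this version updates weights after every pattern rather than in batches:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W1, W2):
    """Cycle a pattern through a net with one hidden layer of sigmoids."""
    x_aug = list(x) + [1]                                  # bias input
    h = [sigmoid(sum(xi * wi for xi, wi in zip(x_aug, row))) for row in W1]
    f = sigmoid(sum(hi * wi for hi, wi in zip(h + [1], W2)))
    return x_aug, h, f

def train_backprop(training_set, n_in, n_hidden, c=0.5, epochs=10000, seed=1):
    rng = random.Random(seed)
    # Augmented weights: bias folded in as the last component of each row.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, d in training_set:
            x_aug, h, f = forward(x, W1, W2)
            # Error derivative at the output unit.
            delta_out = (d - f) * f * (1 - f)
            # Back-propagate the weighted output error to the hidden units.
            delta_h = [delta_out * W2[i] * h[i] * (1 - h[i])
                       for i in range(n_hidden)]
            # Update weights according to the error on each unit.
            W2 = [wi + c * delta_out * hi for wi, hi in zip(W2, h + [1])]
            for i in range(n_hidden):
                W1[i] = [wi + c * delta_h[i] * xi
                         for wi, xi in zip(W1[i], x_aug)]
    return W1, W2

# XOR is not linearly separable, so it needs the hidden layer.
ts = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
W1, W2 = train_backprop(ts, n_in=2, n_hidden=3)
for x, d in ts:
    print(x, d, round(forward(x, W1, W2)[2], 2))
```

The sigmoid makes every unit differentiable, which is exactly what lets the output error be pushed back through the weights.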

28
Back-Prop Sharing the Blame
  • We want to assign blame for the error to each weight
  • Wij: weights of the i-th sigmoid in the j-th layer
  • X^(j-1): inputs to our TLU (the outputs from the
    previous layer)
  • cij: learning rate constant of the i-th sigmoid in
    the j-th layer
  • δij: sensitivity of the network output to
    changes in the input of our TLU
  • Important equation: Wij ← Wij + cij δij X^(j-1)

29
Back-Prop Calculating ?ij
  • For the output layer (layer k): δk = (d - f) ∂f/∂sk
  • δk = (d - f) f (1 - f) for the sigmoid
  • Therefore: Wk ← Wk + ck (d - f) f (1 - f) X^(k-1)
  • For the hidden layers:
  • See Nilsson 1998 for the calculation
  • Recursive formula with base case δk = (d - f) f (1 - f)

30
Back-Prop Example
  • Train a 2-layer neural net with the following
    input:
  • x1 = 1, x2 = 0, x3 = 1, d = 0
  • x1 = 0, x2 = 0, x3 = 1, d = 1
  • x1 = 0, x2 = 1, x3 = 1, d = 0
  • x1 = 1, x2 = 1, x3 = 1, d = 1

31
Back-Prop Problems
  • The learning rate is non-optimal
  • One solution: learn the learning rate
  • Network paralysis: weights grow so large that
    fij(1 - fij) → 0, and the net never learns
  • Local extrema: gradient descent is a greedy
    method
  • These problems are acceptable in many cases, even
    if workarounds can't be found

32
Back-Prop Momentum
  • We want to choose a learning rate that is as
    large as possible:
  • Speed up convergence
  • Avoid oscillations
  • Add a momentum term dependent on the past weight
    change
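A momentum update can be sketched as follows: the new weight change blends the gradient step with the previous change, which damps oscillations across steep directions while letting speed build up along consistent slopes (the coefficients and the toy quadratic error surface are illustrative choices):

```python
def momentum_step(w, grad, velocity, c=0.1, alpha=0.9):
    """One gradient-descent step with a momentum term.

    velocity carries a decaying memory of past weight changes
    (alpha is the momentum coefficient; alpha = 0 recovers plain
    gradient descent).
    """
    velocity = [alpha * v - c * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Minimize the error surface eps(w) = w1^2 + 10*w2^2 (gradient [2w1, 20w2]),
# a ravine-shaped surface where plain descent tends to oscillate.
w, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(300):
    w, v = momentum_step(w, [2 * w[0], 20 * w[1]], v)
print(w)  # both components driven close to the minimum at (0, 0)
```

On surfaces like this, the momentum term is what allows a usefully large effective step without divergence in the steep direction.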

33
Another Method ALOPEX
  • Used for visual receptive field mapping by
    Tzanakou and Harth, 1973
  • Originally developed for receptive field mapping
    in the visual pathway of frogs
  • The main ideas:
  • Use cross-correlation to determine a direction of
    movement in the gradient field
  • Add a random element to avoid local extrema
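The two ideas on this slide can be sketched as follows. Each weight moves in the direction whose recent change correlated with an improving response, plus random noise to escape local extrema. The parameter values and the exact update form here are illustrative assumptions, not the published receptive-field-mapping algorithm:

```python
import random

def alopex_sketch(response, w0, steps=2000, gamma=1.0, noise=0.05, seed=0):
    """Correlation-plus-noise hill climbing in the spirit of ALOPEX.

    Tracks and returns the best weight vector seen, since the noise
    keeps the walk hovering rather than settling exactly on a peak.
    """
    rng = random.Random(seed)
    prev_w = list(w0)
    prev_r = response(prev_w)
    best_w, best_r = list(prev_w), prev_r
    # Initial random perturbation so the first correlation is defined.
    w = [wi + rng.uniform(-noise, noise) for wi in prev_w]
    for _ in range(steps):
        r = response(w)
        if r > best_r:
            best_w, best_r = list(w), r
        # Cross-correlate the last weight change with the last change
        # in response to pick a direction of movement.
        delta_w = [wi - pwi for wi, pwi in zip(w, prev_w)]
        delta_r = r - prev_r
        prev_w, prev_r = list(w), r
        w = [wi + gamma * dwi * delta_r + rng.uniform(-noise, noise)
             for wi, dwi in zip(w, delta_w)]
    return best_w, best_r

# Toy "response" with a single peak at (1, -2).
resp = lambda w: -((w[0] - 1) ** 2 + (w[1] + 2) ** 2)
best_w, best_r = alopex_sketch(resp, [0.0, 0.0])
print(best_w, best_r)
```

Unlike gradient descent, this procedure never differentiates the response, so it also applies when only function evaluations are available.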

34
WORKING WITH NEURAL NETS
  • AI the easy way!

35
ANN Project Lifecycle
  • Task identification and design
  • Feasibility
  • Data Coding
  • Network Design
  • Data Collection
  • Data Checking
  • Training and Testing
  • Error Analysis
  • Network Analysis
  • System Implementation

36
ANN Design Tradeoffs
  • A good design will find a balance between these
    two extremes!

37
ANN Design Balance Depth
  • Too few hidden layers will cause errors in
    accuracy
  • Too many hidden layers will cause errors in
    generalization!

38
CLICK!
  • Modeling the neuron

39
Wetware Biological Neurons
40
The Process Neuron Firing
  • Each electrical signal received at a synapse
    causes neurotransmitter release
  • The neurotransmitter travels across the synaptic
    cleft and is received by the other neuron at a
    receptor site
  • The Post-Synaptic Potential (PSP) either increases
    (hyperpolarizes) or decreases (depolarizes) the
    polarization of the post-synaptic membrane (the
    receptors)
  • In hyperpolarization, the spike train is
    inhibited; in depolarization, the spike train is
    excited.

41
The Process Part 2
  • Each PSP travels along the dendrite of the new
    neuron and spreads itself over the cell body
  • When the effect of the PSP reaches the
    axon hillock, it is summed with other PSPs
  • If the sum is greater than a certain threshold,
    the neuron fires a spike along the axon
  • Once the spike reaches the synapse of an efferent
    neuron, the process starts in that neuron

42
From the neuron to the TLU
  • Cell body (soma): accumulator plus its threshold
    function
  • Dendrites: inputs to the TLU
  • Axon: output of the TLU
  • Information encoding:
  • Neurons use frequency
  • TLUs use value

43
Modeling the Neuron Capabilities
  • Humans and neural nets are both:
  • Good at pattern recognition
  • Bad at mathematical calculation
  • Good at compressing lots of information into a
    yes/no decision
  • Taught via a training period
  • TLUs win because neurons are slow
  • Wetware wins because we have a cheap source of
    billions of neurons

44
Do ANNs model neuron structures?
  • No: hundreds of types of specialized neurons, but
    only one kind of TLU
  • No: weights to the neural threshold are controlled
    by many neurotransmitters, not just one
  • Yes: most of the complexity in the neuron is
    devoted to sustaining life, not information
    processing
  • Maybe: there is no real mechanism for
    backpropagation in the brain; instead, firing of
    neurons increases connection strength

45
High Level Agent Architecture
  • Our minds are composed of a series of
    non-intelligent agents
  • The hierarchy, interconnections, and interactions
    between the agents creates our intelligence
  • There is no one agent in control
  • We learn by forming new connections between
    agents
  • We improve by dealing with agents at a higher
    level, i.e. creating mental scripts

46
Agent Hierarchy Playing with Blocks
From the outside, Builder knows how to build
towers. From inside, Builder just turns on other
agents.
47
How We Remember: K-Line Theory
48
New Knowledge Connections
  • Sandcastles in the sky: everything we know is
    connected to everything else we know
  • Knowledge is acquired by making new connections
    between things we already know

49
Learning Meaning
  • Uniframing: combining several descriptions into
    one
  • Accumulating: collecting incompatible
    descriptions
  • Reformulating: modifying a description's
    character
  • Transforming: bridging between structures and
    functions or actions

50
The Exception Principle
  • It rarely pays to tamper with a rule that nearly
    always works. It is better to complement it with
    an accumulation of exceptions.
  • Birds can fly
  • Birds can fly, unless they are penguins or
    ostriches

51
The Exception Principle Overfitting
  • Birds can fly, unless they are penguins and
    ostriches, or if they happen to be dead, or have
    broken wings, or are confined to cages, or have
    their feet stuck in cement, or have undergone
    experiences so dreadful as to render them
    psychologically incapable of flight
  • In real thought, finding exceptions to everything
    is usually unnecessary.

52
Minsky's Principles
  • Most new knowledge is simply finding a new way to
    relate things we already know
  • There is nothing wrong with circular logic or
    having imperfect rules
  • Any idea will seem self-evident... once you've
    forgotten learning it
  • Easy things are hard: we're least aware of what
    our minds do best

53
TO THE FUTURE AND BEYOND
  • Why you should be nice to your computer

54
I'm lonely and I'm bored. Come play with me!
55
Computers are Dumb
  • Deep Blue might be able to win at chess, but it
    won't know to come in from the rain
  • Computers can only know what they're told, or
    what they're told to learn
  • Computers lack a sense of mortality and a
    physical self to preserve
  • All of this will change when computers can reach
    consciousness

56
I, Silicon Consciousness
  • Kurzweil: By 2019, a $1,000 computer will be
    equivalent to the human brain
  • By 2029, machines will claim to be conscious, and
    we will believe them
  • By 2049, nanobot swarms will make virtual reality
    obsolete in real reality
  • By 2099, man and machine will have completely
    merged

57
You mean to tell me?????
  • We humans will gradually introduce machines into
    our bodies, as implants
  • Our machines will grow more human as they learn,
    and learn to design themselves
  • The Neo-Luddite scenarios:
  • AI succeeds in creating conscious beings; all
    life is at the mercy of the machines
  • Humans retain control, but workers are obsolete;
    the power to decide the fate of the masses is now
    completely in the hands of the elite

58
Neural Networks Conclusions
  • Neural networks are a powerful tool for:
  • Pattern recognition
  • Generalizing to a problem
  • Machine learning
  • Training neural networks:
  • Can be done, but exercise great care
  • Still has room for improvement
  • Understanding and creating consciousness?
  • Still working on it :)