Training Neural Networks - PowerPoint PPT Presentation

1
Training Neural Networks
  • Robert Turetsky
  • Columbia University, rjt72@columbia.edu
  • Systems, Man and Cybernetics Society
  • IEEE North Jersey Chapter
  • December 12, 2000

2
Objective
  • Introduce fundamental concepts in Artificial
    Neural Networks
  • Discuss methods of training ANNs
  • Explore some uses of ANNs
  • Assess the accuracy of artificial neurons as
    models for biological neurons
  • Discuss current views, ideas and research

3
Organization
  • Why Neural Networks?
  • Single TLUs
  • Training Neural Nets: Backpropagation
  • Working with Neural Networks
  • Modeling the neuron
  • The multi-agent architecture
  • Directions and destinations

4
Why Neural Networks?
5
The Von Neumann architecture
  • Memory for programs and data
  • CPU for math and logic
  • Control unit to steer program flow

6
Von Neumann
  • Follows rules
  • Solution can/must be formally specified
  • Cannot generalize
  • Not error tolerant
Neural Net
  • Learns from data
  • Rules on data are not visible
  • Able to generalize
  • Copes well with noise

7
Circuits that LEARN
  • Three types of learning:
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Hebbian networks: reward good paths, punish
    bad paths
  • Train a neural net by adjusting weights
  • PAC (Probably Approximately Correct) theory:
    Kearns and Vazirani 1994, Haussler 1990

8
Supervised Learning Concepts
  • Training set: a set of input/output pairs
  • Supervised learning because we know the correct
    action for every input in the training set
  • We want our neural net to act correctly on as
    many training vectors as possible
  • Choose the training set to be a typical set of inputs
  • The neural net will (hopefully) generalize to all
    inputs based on the training set
  • Validation set: check to see how well our
    training can generalize
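The training/validation split described above can be sketched in a few lines of Python (the helper name and the 80/20 split are illustrative choices, not from the slides):

```python
import random

def split_dataset(pairs, validation_fraction=0.2, seed=0):
    """Split input/output pairs into a training set and a validation set.

    The training set teaches the network; the held-out validation set
    checks how well the learned weights generalize to unseen inputs.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Toy dataset: (input vector, desired output) pairs.
data = [([i, i % 2], i % 2) for i in range(10)]
train, val = split_dataset(data)
print(len(train), len(val))  # 8 2
```

Fixing the seed makes the split reproducible, so repeated training runs are comparable.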

9
Neural Net Applications
  • Miros Corp.: face recognition
  • Handwriting recognition
  • BrainMaker: medical diagnosis
  • Bushnell: neural net for combinational automatic
    test pattern generation
  • ALVINN: Knight Rider in real life!
  • Getting rich: LBS Capital Management predicts the
    S&P 500

10
History of Neural Networks
  • 1943: McCulloch and Pitts - Modeling the Neuron
    for Parallel Distributed Processing
  • 1958: Rosenblatt - Perceptron
  • 1969: Minsky and Papert publish limits on the
    ability of a perceptron to generalize
  • 1970s and 1980s: ANN renaissance
  • 1986: Rumelhart, Hinton and Williams present
    backpropagation
  • 1989: Tsividis - Neural Network on a chip

11
Threshold Logic Units
  • The building blocks of Neural Networks

12
The TLU at a glance
  • TLU: Threshold Logic Unit
  • Loosely based on the firing of biological neurons
  • Many inputs, one binary output
  • Threshold: biasing function
  • Squashing function: compresses an infinite input
    range into the range 0 - 1
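A minimal sketch of a TLU in Python, assuming the weighted-sum-versus-threshold form described above (the AND weights in the demo are one illustrative choice):

```python
def tlu(x, w, theta):
    """Threshold Logic Unit: many inputs, one binary output.

    Fires (outputs 1) when the weighted sum of the inputs reaches
    the threshold theta, otherwise outputs 0 -- a crude model of a
    biological neuron firing.
    """
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= theta else 0

# A TLU computing logical AND of two binary inputs:
weights, theta = [1.0, 1.0], 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, tlu(x, weights, theta))  # fires only for [1, 1]
```

Only the weighted sum crosses 1.5 for input [1, 1], so the unit implements AND.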

13
The TLU in Action
14
Training TLUs Notation
  • θ: threshold of the TLU
  • X: input vector
  • W: weight vector
  • s = X·W; if s ≥ θ, output 1; if s < θ, output 0
  • d: desired output of the TLU
  • f: output of the TLU with current X and W

15
Augmented Vectors
  • Motivation: train the threshold θ at the same time
    as the input weights
  • X·W ≥ θ is the same as X·W - θ ≥ 0
  • Set the threshold of the TLU to 0
  • Augment W: W = [w1, w2, ..., wn, -θ]
  • Augment X: X = [x1, x2, ..., xn, 1]
  • New TLU equation: X·W ≥ 0 (for augmented X and W)
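The equivalence above is easy to check in code: folding -θ into the weight vector and appending a constant 1 to the input gives the same outputs as a TLU with an explicit threshold. A small sketch (weights and threshold are illustrative):

```python
def tlu(x, w, theta):
    """TLU with an explicit threshold."""
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= theta else 0

def augmented_tlu(x_aug, w_aug):
    """TLU with the threshold folded into the weights: fire when X.W >= 0."""
    s = sum(xi * wi for xi, wi in zip(x_aug, w_aug))
    return 1 if s >= 0 else 0

w, theta = [1.0, 1.0], 1.5
w_aug = w + [-theta]              # W = [w1, ..., wn, -theta]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    x_aug = x + [1]               # X = [x1, ..., xn, 1]
    assert tlu(x, w, theta) == augmented_tlu(x_aug, w_aug)
print("augmented form matches the thresholded form on all inputs")
```

Since the threshold is now just another weight, any training rule that adjusts weights trains the threshold for free.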

16
Gradient Descent Methods
  • Error function: how far off are we?
  • Example error function: ε = (d - f)²
  • ε depends on the weight values
  • Gradient descent: minimize the error by moving the
    weights along the decreasing slope of the error
  • The idea: iterate through the training set and
    adjust the weights to minimize the gradient of
    the error

17
Gradient Descent The Math
  • We have ε = (d - f)²
  • Gradient of ε: ∂ε/∂W = (∂ε/∂s)(∂s/∂W)
  • Using the chain rule: ∂ε/∂s = (∂ε/∂f)(∂f/∂s)
  • Since s = X·W, we have ∂s/∂W = X
  • Also ∂ε/∂f = -2(d - f)
  • Which finally gives ∂ε/∂W = -2(d - f)(∂f/∂s)X

18
Gradient Descent Back to reality
  • So we have ∂ε/∂W = -2(d - f)(∂f/∂s)X
  • The problem: ∂f/∂s does not exist, because the
    threshold function f is not differentiable
  • Three solutions:
  • Ignore it: the Error-Correction Procedure
  • Fudge it: Widrow-Hoff
  • Approximate it: the Generalized Delta Procedure
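The "ignore it" option is the classic perceptron rule: use the thresholded output f directly in the update W ← W + c(d - f)X, adjusting weights only on errors. A minimal sketch on an augmented TLU (learning rate and epoch count are illustrative choices):

```python
def train_error_correction(training_set, n_inputs, c=0.5, epochs=20):
    """Error-correction procedure on an augmented TLU.

    The non-differentiable threshold output f is used directly:
    weights change by c*(d - f)*X, and only when the output is wrong.
    """
    w = [0.0] * (n_inputs + 1)               # augmented weights, -theta last
    for _ in range(epochs):
        for x, d in training_set:
            x_aug = list(x) + [1]
            s = sum(xi * wi for xi, wi in zip(x_aug, w))
            f = 1 if s >= 0 else 0
            if f != d:                       # adjust only on error
                w = [wi + c * (d - f) * xi for wi, xi in zip(w, x_aug)]
    return w

# Linearly separable training set: logical AND.
ts = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_error_correction(ts, n_inputs=2)
for x, d in ts:
    x_aug = list(x) + [1]
    f = 1 if sum(xi * wi for xi, wi in zip(x_aug, w)) >= 0 else 0
    print(x, f, d)
```

For a linearly separable training set like this one, the rule is guaranteed to converge to a separating weight vector in finitely many updates.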

19
Training a TLU Example
  • Train a neural network to match the following
    linearly separable training set

20
Behind the scenes Planes and Hyperplanes
21
What can a TLU learn?
22
Linearly Separable Functions
  • A single TLU can implement any linearly separable
    function
  • A ∧ B is linearly separable
  • A ⊕ B (XOR) is not
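The contrast can be demonstrated by brute force: search a small grid of weights and thresholds for a TLU matching a target truth table. The grid search finds AND easily; for XOR it finds nothing, which theory confirms no TLU of any weights could compute (the grid itself is only an illustrative sample of weight space):

```python
from itertools import product

def tlu_output(x, w, theta):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

def some_tlu_computes(target):
    """Search a coarse grid of (w1, w2, theta) for a TLU matching target."""
    grid = [i / 2 for i in range(-4, 5)]      # -2.0 .. 2.0 in steps of 0.5
    inputs = list(product([0, 1], repeat=2))
    for w1, w2, theta in product(grid, repeat=3):
        if all(tlu_output(x, (w1, w2), theta) == target(*x) for x in inputs):
            return True
    return False

print(some_tlu_computes(lambda a, b: a & b))  # True: AND is separable
print(some_tlu_computes(lambda a, b: a ^ b))  # False: XOR is not
```

Geometrically, AND's single positive point can be cut off by one line, while XOR's two positive points sit on opposite corners and no single line separates them.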

23
NEURAL NETWORKS
  • An Architecture for Learning

24
Neural Network Fundamentals
  • Chain multiple TLUs together
  • Three layers:
  • Input layer
  • Hidden layers
  • Output layer
  • Two classifications:
  • Feed-forward
  • Recurrent

25
Neural Network Terminology
26
Training ANNs Backpropagation
  • Main idea: distribute the error function across
    the hidden layers, corresponding to their effect
    on the output
  • Works on feed-forward networks
  • Use sigmoid units to train; afterwards they can be
    replaced with threshold units

27
Back-Propagation: Bird's-eye view
  • Repeat
  • Choose training pair and copy it to input layer
  • Cycle that pattern through the net
  • Calculate error derivative between output
    activation and target output
  • Back propagate the summed product of the weights
    and errors in the output layer to calculate the
    error on the hidden units
  • Update weights according to the error on that
    unit
  • Until error is low or the net settles
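The loop above can be sketched in Python for a net with one hidden layer of sigmoid units. Layer sizes, learning rate, epoch count, and the XOR demo are illustrative choices, and this version updates weights after every pattern rather than in batches:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W1, W2):
    """Cycle a pattern through a net with one hidden layer of sigmoids."""
    x_aug = list(x) + [1]                                  # bias input
    h = [sigmoid(sum(xi * wi for xi, wi in zip(x_aug, row))) for row in W1]
    f = sigmoid(sum(hi * wi for hi, wi in zip(h + [1], W2)))
    return x_aug, h, f

def train_backprop(training_set, n_in, n_hidden, c=0.5, epochs=10000, seed=1):
    rng = random.Random(seed)
    # Augmented weights: bias folded in as the last component of each row.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, d in training_set:
            x_aug, h, f = forward(x, W1, W2)
            # Error derivative at the output unit.
            delta_out = (d - f) * f * (1 - f)
            # Back-propagate the weighted output error to the hidden units.
            delta_h = [delta_out * W2[i] * h[i] * (1 - h[i])
                       for i in range(n_hidden)]
            # Update weights according to the error on each unit.
            W2 = [wi + c * delta_out * hi for wi, hi in zip(W2, h + [1])]
            for i in range(n_hidden):
                W1[i] = [wi + c * delta_h[i] * xi
                         for wi, xi in zip(W1[i], x_aug)]
    return W1, W2

# XOR is not linearly separable, so it needs the hidden layer.
ts = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
W1, W2 = train_backprop(ts, n_in=2, n_hidden=3)
for x, d in ts:
    print(x, d, round(forward(x, W1, W2)[2], 2))
```

The sigmoid makes every unit differentiable, which is exactly what lets the output error be pushed back through the weights.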

28
Back-Prop Sharing the Blame
  • We want to assign blame for the error to each weight
  • Wij: weights of the i-th sigmoid in the j-th layer
  • X^(j-1): inputs to our TLU (the outputs from the
    previous layer)
  • cij: learning rate constant of the i-th sigmoid in
    the j-th layer
  • δij: sensitivity of the network output to
    changes in the input of our TLU
  • Important equation: Wij ← Wij + cij δij X^(j-1)

29
Back-Prop Calculating ?ij
  • For the output layer (layer k): δk = (d - f) ∂f/∂sk
  • δk = (d - f) f (1 - f) for the sigmoid
  • Therefore: Wk ← Wk + ck (d - f) f (1 - f) X^(k-1)
  • For the hidden layers:
  • See Nilsson 1998 for the calculation
  • Recursive formula with base case δk = (d - f) f (1 - f)

30
Back-Prop Example
  • Train a 2-layer neural net with the following
    input:
  • x1 = 1, x2 = 0, x3 = 1, d = 0
  • x1 = 0, x2 = 0, x3 = 1, d = 1
  • x1 = 0, x2 = 1, x3 = 1, d = 0
  • x1 = 1, x2 = 1, x3 = 1, d = 1

31
Back-Prop Problems
  • The learning rate is non-optimal
  • One solution: learn the learning rate
  • Network paralysis: weights grow so large that
    fij(1 - fij) → 0, and the net never learns
  • Local extrema: gradient descent is a greedy
    method
  • These problems are acceptable in many cases, even
    if workarounds can't be found

32
Back-Prop Momentum
  • We want to choose a learning rate that is as
    large as possible:
  • Speed up convergence
  • Avoid oscillations
  • Add a momentum term dependent on the past weight
    change
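A momentum update can be sketched as follows: the new weight change blends the gradient step with the previous change, which damps oscillations across steep directions while letting speed build up along consistent slopes (the coefficients and the toy quadratic error surface are illustrative choices):

```python
def momentum_step(w, grad, velocity, c=0.1, alpha=0.9):
    """One gradient-descent step with a momentum term.

    velocity carries a decaying memory of past weight changes
    (alpha is the momentum coefficient; alpha = 0 recovers plain
    gradient descent).
    """
    velocity = [alpha * v - c * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Minimize the error surface eps(w) = w1^2 + 10*w2^2 (gradient [2w1, 20w2]),
# a ravine-shaped surface where plain descent tends to oscillate.
w, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(300):
    w, v = momentum_step(w, [2 * w[0], 20 * w[1]], v)
print(w)  # both components driven close to the minimum at (0, 0)
```

On surfaces like this, the momentum term is what allows a usefully large effective step without divergence in the steep direction.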

33
Another Method ALOPEX
  • Used for visual receptive field mapping by
    Tzanakou and Harth, 1973
  • Originally developed for receptive field mapping
    in the visual pathway of frogs
  • The main ideas:
  • Use cross-correlation to determine a direction of
    movement in the gradient field
  • Add a random element to avoid local extrema
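The two ideas on this slide can be sketched as follows. Each weight moves in the direction whose recent change correlated with an improving response, plus random noise to escape local extrema. The parameter values and the exact update form here are illustrative assumptions, not the published receptive-field-mapping algorithm:

```python
import random

def alopex_sketch(response, w0, steps=2000, gamma=1.0, noise=0.05, seed=0):
    """Correlation-plus-noise hill climbing in the spirit of ALOPEX.

    Tracks and returns the best weight vector seen, since the noise
    keeps the walk hovering rather than settling exactly on a peak.
    """
    rng = random.Random(seed)
    prev_w = list(w0)
    prev_r = response(prev_w)
    best_w, best_r = list(prev_w), prev_r
    # Initial random perturbation so the first correlation is defined.
    w = [wi + rng.uniform(-noise, noise) for wi in prev_w]
    for _ in range(steps):
        r = response(w)
        if r > best_r:
            best_w, best_r = list(w), r
        # Cross-correlate the last weight change with the last change
        # in response to pick a direction of movement.
        delta_w = [wi - pwi for wi, pwi in zip(w, prev_w)]
        delta_r = r - prev_r
        prev_w, prev_r = list(w), r
        w = [wi + gamma * dwi * delta_r + rng.uniform(-noise, noise)
             for wi, dwi in zip(w, delta_w)]
    return best_w, best_r

# Toy "response" with a single peak at (1, -2).
resp = lambda w: -((w[0] - 1) ** 2 + (w[1] + 2) ** 2)
best_w, best_r = alopex_sketch(resp, [0.0, 0.0])
print(best_w, best_r)
```

Unlike gradient descent, this procedure never differentiates the response, so it also applies when only function evaluations are available.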

34
WORKING WITH NEURAL NETS
  • AI the easy way!

35
ANN Project Lifecycle
  • Task identification and design
  • Feasibility
  • Data Coding
  • Network Design
  • Data Collection
  • Data Checking
  • Training and Testing
  • Error Analysis
  • Network Analysis
  • System Implementation

36
ANN Design Tradeoffs
  • A good design will find a balance between these
    two extremes!

37
ANN Design Balance Depth
  • Too few hidden layers will cause errors in
    accuracy
  • Too many hidden layers will cause errors in
    generalization!

38
CLICK!
  • Modeling the neuron

39
Wetware Biological Neurons
40
The Process Neuron Firing
  • Each electrical signal received at a synapse
    causes neurotransmitter release
  • The neurotransmitter travels across the synaptic
    cleft and is received by the other neuron at a
    receptor site
  • The Post-Synaptic Potential (PSP) either increases
    (hyperpolarizes) or decreases (depolarizes) the
    polarization of the post-synaptic membrane (the
    receptors)
  • In hyperpolarization, the spike train is
    inhibited; in depolarization, the spike train is
    excited.

41
The Process Part 2
  • Each PSP travels along the dendrite of the new
    neuron and spreads itself over the cell body
  • When the effect of the PSP reaches the
    axon hillock, it is summed with other PSPs
  • If the sum is greater than a certain threshold,
    the neuron fires a spike along the axon
  • Once the spike reaches the synapse of an efferent
    neuron, the process starts in that neuron

42
From the neuron to the TLU
  • Cell body (soma): accumulator plus its threshold
    function
  • Dendrites: inputs to the TLU
  • Axon: output of the TLU
  • Information encoding:
  • Neurons use frequency
  • TLUs use value

43
Modeling the Neuron Capabilities
  • Humans and neural nets are both:
  • Good at pattern recognition
  • Bad at mathematical calculation
  • Good at compressing lots of information into a
    yes/no decision
  • Taught via a training period
  • TLUs win because neurons are slow
  • Wetware wins because we have a cheap source of
    billions of neurons

44
Do ANNs model neuron structures?
  • No: hundreds of types of specialized neurons, but
    only one kind of TLU
  • No: weights to the neural threshold are controlled
    by many neurotransmitters, not just one
  • Yes: most of the complexity in the neuron is
    devoted to sustaining life, not information
    processing
  • Maybe: there is no real mechanism for
    backpropagation in the brain; instead, firing of
    neurons increases connection strength

45
High Level Agent Architecture
  • Our minds are composed of a series of
    non-intelligent agents
  • The hierarchy, interconnections, and interactions
    between the agents creates our intelligence
  • There is no one agent in control
  • We learn by forming new connections between
    agents
  • We improve by dealing with agents at a higher
    level, i.e. creating mental scripts

46
Agent Hierarchy Playing with Blocks
From the outside, Builder knows how to build
towers. From inside, Builder just turns on other
agents.
47
How We Remember: K-Line Theory
48
New Knowledge Connections
  • Sandcastles in the sky: everything we know is
    connected to everything else we know
  • Knowledge is acquired by making new connections
    between things we already know

49
Learning Meaning
  • Uniframing: combining several descriptions into
    one
  • Accumulating: collecting incompatible
    descriptions
  • Reformulating: modifying a description's
    character
  • Transforming: bridging between structures and
    functions or actions

50
The Exception Principle
  • It rarely pays to tamper with a rule that nearly
    always works. It is better to complement it with
    an accumulation of exceptions.
  • Birds can fly
  • Birds can fly, unless they are penguins or
    ostriches

51
The Exception Principle Overfitting
  • Birds can fly, unless they are penguins and
    ostriches, or if they happen to be dead, or have
    broken wings, or are confined to cages, or have
    their feet stuck in cement, or have undergone
    experiences so dreadful as to render them
    psychologically incapable of flight
  • In real thought, finding exceptions to everything
    is usually unnecessary.

52
Minsky's Principles
  • Most new knowledge is simply finding a new way to
    relate things we already know
  • There is nothing wrong with circular logic or
    having imperfect rules
  • Any idea will seem self-evident... once you've
    forgotten learning it
  • Easy things are hard: we're least aware of what
    our minds do best

53
TO THE FUTURE AND BEYOND
  • Why you should be nice to your computer

54
I'm lonely and I'm bored. Come play with me!
55
Computers are Dumb
  • Deep Blue might be able to win at chess, but it
    won't know to come in from the rain
  • Computers can only know what they're told, or
    what they're told to learn
  • Computers lack a sense of mortality and a
    physical self to preserve
  • All of this will change when computers can reach
    consciousness

56
I, Silicon Consciousness
  • Kurzweil: By 2019, a $1,000 computer will be
    equivalent to the human brain
  • By 2029, machines will claim to be conscious, and
    we will believe them
  • By 2049, nanobot swarms will make virtual reality
    obsolete in real reality
  • By 2099, man and machine will have completely
    merged

57
You mean to tell me?????
  • We humans will gradually introduce machines into
    our bodies, as implants
  • Our machines will grow more human as they learn,
    and learn to design themselves
  • The Neo-Luddite scenarios:
  • AI succeeds in creating conscious beings; all
    life is at the mercy of the machines
  • Humans retain control, but workers are obsolete;
    the power to decide the fate of the masses is now
    completely in the hands of the elite

58
Neural Networks Conclusions
  • Neural networks are a powerful tool for:
  • Pattern recognition
  • Generalizing to a problem
  • Machine learning
  • Training neural networks:
  • Can be done, but exercise great care
  • Still has room for improvement
  • Understanding and creating consciousness?
  • Still working on it :)