1
Neural Networks
(Decision Support Systems I, MM5)
  • MM5: Introduction to Neural Networks.
  • Søren Plougmann

2
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

3
Neural Networks
  • A neural network (NN) is a machine learning
    approach inspired by the way in which the brain
    performs a particular learning task.
  • Knowledge about the learning task is given in the
    form of examples called training examples.
  • A NN can be viewed as a black box.
  • A NN is specified by
  • an architecture: a set of neurons and links
    connecting neurons, where each link has a weight,
  • a neuron model: the information processing unit
    of the NN,
  • a learning algorithm: used for training the NN by
    modifying the weights in order to model the
    particular learning task correctly on the
    training examples.
  • The aim is to obtain a NN that generalizes well,
    that is, that behaves correctly on new instances
    of the learning task.

4
Decision support systems
[Diagram: decision support approaches placed along a spectrum from reliance
on knowledge to reliance on data: knowledge-based, model-based, and data-based
systems, with rule-based systems, CPN, differential equations, and neural
networks as examples.]
5
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

6
Biological inspirations
  • Some numbers
  • The human brain contains about 10 billion nerve
    cells (neurons)
  • Each neuron is connected to other neurons through
    about 10,000 synapses
  • Properties of the brain
  • It can learn, reorganize itself from experience
  • It adapts to the environment
  • It is robust and fault tolerant

7
Biological neuron
  • A neuron has
  • A branching input (dendrites)
  • A branching output (the axon)
  • The information circulates from the dendrites to
    the axon via the cell body
  • Axon connects to dendrites via synapses
  • Synapses vary in strength
  • Synapses may be excitatory or inhibitory

8
The Neuron
9
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

10
General discussion
  • Pattern recognition
  • Patterns: images, personal records, driving
    habits, etc.
  • Represented as a vector of features (encoded as
    integers or real numbers in a NN)
  • Pattern classification
  • Classify a pattern into one of the given classes
  • Form pattern classes
  • Pattern associative recall
  • Using a pattern to recall a related pattern
  • Pattern completion: using a partial pattern to
    recall the whole pattern
  • Pattern recovery: dealing with noise, distortion,
    and missing information

11
General Architecture
  • Single layer: input units x1, ..., xn feed one output unit Y
  • net input to Y: net = b + x1 w1 + ... + xn wn
  • the bias b is treated as the weight from a special
    unit with constant output 1
  • a threshold θ is associated with Y
  • output: y = 1 if net ≥ θ, otherwise y = -1
  • this classifies a pattern into one of the
    two classes

12
Decision region/boundary
  • Take n = 2, b ≠ 0, θ = 0
  • The set of points where w1 x1 + w2 x2 + b = 0
    is a line, called the decision boundary, which
    partitions the plane into two decision regions
  • If a point/pattern is in the positive region, then
    w1 x1 + w2 x2 + b > 0, and the output is 1
    (belongs to class one)
  • Otherwise, w1 x1 + w2 x2 + b < 0, and the output is -1
    (belongs to class two)
  • n = 2, b = 0, θ ≠ 0 would result in a similar
    partition

13
Decision region/boundary
  • If n = 3 (three input units), then the decision
    boundary is a two-dimensional plane in a
    three-dimensional space
  • In general, a decision boundary b + x1 w1 + ... + xn wn = 0
    is an (n-1)-dimensional hyperplane in an
    n-dimensional space, which partitions the space
    into two decision regions
  • This simple network can thus classify a given
    pattern into one of the two classes, provided one
    of these two classes is entirely in one decision
    region (one side of the decision boundary) and
    the other class is in the other region.
  • The decision boundary is determined completely by
    the weights W and the bias b (or threshold θ), as
    the sketch below illustrates.
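To make the classification rule concrete, here is a minimal Python sketch of
such a single-layer threshold unit; the function name, the bipolar ±1 output
convention, and the example weights are illustrative assumptions, not taken
from the slides.

  # Minimal single-layer threshold unit (illustrative sketch).
  # Classifies an n-dimensional pattern into one of two classes depending on
  # which side of the hyperplane w.x + b = 0 it falls.
  def classify(x, w, b, theta=0.0):
      """Return +1 (class one) or -1 (class two) for pattern x."""
      net = b + sum(wi * xi for wi, xi in zip(w, x))
      return 1 if net >= theta else -1

  # Example: the boundary x1 + x2 - 1 = 0 in the plane (n = 2).
  print(classify([2.0, 1.5], w=[1.0, 1.0], b=-1.0))    # +1, positive region
  print(classify([-1.0, 0.5], w=[1.0, 1.0], b=-1.0))   # -1, negative region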

14
Linear Separability Problem
  • If two classes of patterns can be separated by a
    decision boundary, represented by the linear
    equation b + x1 w1 + ... + xn wn = 0,
  • then they are said to be linearly separable, and the
    simple network can correctly classify any
    pattern from the two classes.
  • The decision boundary (i.e., W, b or θ) of linearly
    separable classes can be determined either by
    some learning procedure or by solving linear
    equation systems based on representative patterns
    of each class
  • If such a decision boundary does not exist, then
    the two classes are said to be linearly
    inseparable.
  • Linearly inseparable problems cannot be solved by
    the simple network; a more sophisticated
    architecture is needed.

15
Examples of linearly separable classes
  • Logical AND function
    patterns (bipolar)        decision boundary
    x1   x2   y               w1 = 1
    -1   -1   -1              w2 = 1
    -1    1   -1              b  = -1
     1   -1   -1              θ  = 0
     1    1    1              boundary: -1 + x1 + x2 = 0
  • Logical OR function
    patterns (bipolar)        decision boundary
    x1   x2   y               w1 = 1
    -1   -1   -1              w2 = 1
    -1    1    1              b  = 1
     1   -1    1              θ  = 0
     1    1    1              boundary: 1 + x1 + x2 = 0
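As a quick sanity check, the stated weights and biases reproduce the bipolar
AND and OR truth tables; a small self-contained sketch (the helper name and
the "net ≥ 0" output convention are assumptions):

  # Check the stated AND and OR parameters on the bipolar truth tables.
  patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

  def out(x1, x2, w1, w2, b):
      return 1 if w1 * x1 + w2 * x2 + b >= 0 else -1

  # AND: w1 = 1, w2 = 1, b = -1  ->  [-1, -1, -1, 1]
  print([out(x1, x2, 1, 1, -1) for x1, x2 in patterns])
  # OR:  w1 = 1, w2 = 1, b =  1  ->  [-1, 1, 1, 1]
  print([out(x1, x2, 1, 1, 1) for x1, x2 in patterns])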

16
  • Examples of linearly inseparable classes
  • Logical XOR (exclusive OR) function
    patterns (bipolar)
    x1   x2   y
    -1   -1   -1
    -1    1    1
     1   -1    1
     1    1   -1
  • No line can separate these two classes, as can be
    seen from the fact that the following linear
    inequality system has no solution:
    (1)  -w1 - w2 + b < 0
    (2)  -w1 + w2 + b > 0
    (3)   w1 - w2 + b > 0
    (4)   w1 + w2 + b < 0
  • Adding (1) and (4) gives b < 0, while adding (2)
    and (3) gives b > 0, which is a contradiction.

17
Summary of these simple networks
  • Single-layer nets have limited representation
    power (the linear separability problem)
  • Error-driven learning seems a good way to train a net
  • Multi-layer nets (or nets with non-linear hidden
    units) may overcome the linear inseparability
    problem, but learning methods for such nets are
    needed
  • Threshold/step output functions hinder the
    effort to develop learning methods for
    multi-layered nets

18
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

19
Dimensions of a Neural Network
  • Various network architectures
  • Various types of neurons
  • Various learning algorithms
  • Various applications

20
Network architectures
  • Three different classes of network architectures
  • single-layer feed-forward
  • multi-layer feed-forward
    (in both, neurons are organized in acyclic layers)
  • recurrent (with feedback connections)
  • The architecture of a neural network is linked
    with the learning algorithm used to train it

21
Single Layer Feed-forward
Input layer of source nodes
Output layer of neurons
22
Multi layer feed-forward
3-4-2 Network
Output layer
Input layer
Hidden Layer
23
Recurrent network
  • Recurrent network with hidden neurons: the
    unit-delay operator z^-1 is used to model a
    dynamic system
    (figure: input, hidden and output layers with
    feedback links)
24
The Neuron
25
The Neuron
  • The neuron is the basic information processing
    unit of a NN. It consists of
  • A set of links, describing the neuron inputs,
    with weights W1, W2, ..., Wm
  • An adder function (linear combiner) for computing
    the weighted sum of
    the inputs (real numbers)
  • Activation function (squashing function)
    for limiting the amplitude of the neuron output.

26
Bias as extra input
  • The bias is an external parameter of the neuron.
    It can be modeled by adding an extra input.

27
Neuron Models
  • The choice of the activation function φ determines
    the neuron model. Examples:
  • step function
  • ramp function
  • sigmoid function
  • each is defined up to a few shape parameters
    (see the sketch below)
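A possible Python sketch of these three activation functions; the particular
saturation levels and slope parameter are illustrative choices, not prescribed
by the slide.

  import math

  def step(v, theta=0.0):
      # Step (threshold) function: jumps from -1 to 1 at v = theta.
      return 1.0 if v >= theta else -1.0

  def ramp(v, lo=-1.0, hi=1.0):
      # Ramp function: linear between the two saturation levels.
      return max(lo, min(hi, v))

  def sigmoid(v, a=1.0):
      # Sigmoid (squashing) function; a larger a makes it closer to a step.
      return 1.0 / (1.0 + math.exp(-a * v))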

28
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

29
Learning
  • Learning is the procedure of estimating the
    parameters (weights) of the neurons so that the
    whole network can perform a specific task
  • 2 types of learning
  • Supervised learning
  • Unsupervised learning
  • The learning process (supervised)
  • Present the network with a number of inputs and
    their corresponding outputs
  • See how closely the actual outputs match the
    desired ones
  • Modify the parameters to better approximate the
    desired outputs

30
Supervised learning
  • The desired response of the neural network as a
    function of particular inputs is well known.
  • A "professor" may provide examples and teach the
    neural network how to fulfill a certain task

31
Supervised learning
  • Minimises the error on the training set
  • Optimal: minimise the error on new cases
  • Danger of over-learning:
  • a larger network gives a lower training error,
    but not necessarily a lower error on new cases
  • Solution: split the data into
  • Training set
  • Selection set
  • Test set

32
Example

33
Unsupervised learning
  • Idea: group typical input data according to
    resemblance criteria unknown a priori
  • Data clustering
  • No need for a professor
  • The network finds the correlations between
    the data by itself
  • Examples of such networks
  • SOM (Kohonen feature maps)

34
Properties
  • Adaptivity
  • Adapts its weights to the environment and can be
    retrained easily
  • Generalization ability
  • May compensate for a lack of data
  • Fault tolerance
  • Graceful degradation of performance if damaged
    => the information is distributed within the
    entire net.

35
Learning
Supervised
  Data: labeled examples (input, desired output)
  Problems: classification, pattern recognition, regression
  NN models: perceptron, adaline, feed-forward NN, radial basis function

Unsupervised
  Data: unlabeled examples (different realizations of the input)
  Problems: clustering, content-addressable memory
  NN models: self-organizing maps (SOM), Hopfield networks

36
Today's Schedule
  • Introduction, Neural Networks (NN)
  • Biological inspiration
  • Pattern recognition (Simple NN)
  • NN concepts
  • Learning in NN
  • Different NN types

37
DIFFERENT NN
  • Functional classification
  • Classification models
  • Perceptron
  • Multilayer FF network
  • Radial Basis functions
  • Association models
  • Hopfield (Hamming)
  • (Optimisation models)
  • Self-organization models
  • Kohonen SOM

38
Perceptron Neuron Model
  • The (McCulloch-Pitts) perceptron is a single-layer
    NN with a non-linear activation function φ, the sign
    function: φ(v) = 1 if v ≥ 0, otherwise φ(v) = -1

39
Perceptron for Classification
  • The perceptron is used for binary classification.
  • Given training examples of classes C1 and C2, train
    the perceptron in such a way that it correctly
    classifies the training examples
  • If the output of the perceptron is 1 then the
    input is assigned to class C1
  • If the output is -1 then the input is assigned
    to C2

40
Perceptron Training
  • How can we train a perceptron for a
    classification task?
  • We try to find suitable values for the weights
    in such a way that the training examples are
    correctly classified.
  • Geometrically, we try to find a hyper-plane that
    separates the examples of the two classes.

41
Perceptron Geometric View
  • The equation below describes a (hyper-)plane in
    the input space consisting of real-valued
    m-dimensional vectors. The plane splits the input
    space into two regions, each of them describing
    one class.

[Figure: in the (x1, x2) plane, the decision boundary
w1 x1 + w2 x2 + w0 = 0 separates the decision region for C1
(where w1 x1 + w2 x2 + w0 > 0) from the region for C2.]
42
The fixed-increment learning algorithm
  • n = 1
  • initialize w(n) randomly
  • while (there are misclassified training examples)
  • select a misclassified augmented example
    (x(n), d(n))
  • w(n+1) = w(n) + η d(n) x(n)
  • n = n + 1
  • end-while
  • η: learning rate parameter (a real number);
    a runnable sketch follows below
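A runnable sketch of the fixed-increment rule above, trained on the bipolar
AND data from slide 15; the learning rate, random initialization, and the
cap on the number of updates are illustrative assumptions.

  import random

  def train_perceptron(examples, eta=1.0, max_updates=100):
      # examples: list of (augmented input x, desired output d), d in {-1, +1}.
      # The input is augmented with a constant 1 so the bias is learned as a weight.
      dim = len(examples[0][0])
      w = [random.uniform(-0.5, 0.5) for _ in range(dim)]
      for _ in range(max_updates):
          misclassified = [(x, d) for x, d in examples
                           if (1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1) != d]
          if not misclassified:
              break                 # all training examples correctly classified
          x, d = random.choice(misclassified)
          w = [wi + eta * d * xi for wi, xi in zip(w, x)]   # w(n+1) = w(n) + eta*d(n)*x(n)
      return w

  # Bipolar AND, with the constant 1 appended for the bias weight.
  and_data = [((-1, -1, 1), -1), ((-1, 1, 1), -1), ((1, -1, 1), -1), ((1, 1, 1), 1)]
  print(train_perceptron(and_data))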

43
Perceptron Limitations
  • The perceptron can only model linearly separable
    classes, like (those described by) the following
    Boolean functions
  • AND
  • OR
  • COMPLEMENT
  • It cannot model the XOR.

44
Multi-layer feed-forward NN (FFNN)
We consider a more general network architecture:
between the input and output layers there are
hidden layers, as illustrated below. Hidden
nodes do not directly receive inputs nor send
outputs to the external environment. FFNNs
overcome the limitation of single-layer NNs: they
can handle non-linearly separable learning
tasks.
Input layer
Output layer
Hidden Layer
45
XOR problem
A typical example of a non-linearly separable
function is the XOR. This function takes two
input arguments with values in {-1, 1} and
returns one output in {-1, 1}, as specified in the
following table: (-1, -1) -> -1, (-1, 1) -> 1,
(1, -1) -> 1, (1, 1) -> -1.
If we think of -1 and 1 as encodings of the truth
values false and true, respectively, then XOR
computes the logical exclusive or, which yields
true if and only if the two inputs have different
truth values.
46
XOR problem
In this graph of the XOR, input pairs giving
output equal to 1 and -1 are depicted with green
and blue circles, respectively. These two
classes (green and blue) cannot be separated
using a line. We have to use two lines, like
those depicted in blue. The following NN with two
hidden nodes realizes this non-linear
separation, where each hidden node describes one
of the two blue lines.
This NN uses the sign activation function. The
two green arrows indicate the directions of
the weight vectors of the two hidden nodes,
(1, -1) and (-1, 1). They indicate the regions
where the network output will be 1. The
output node is used to combine the outputs of the
two hidden nodes (see the sketch below).
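A small sketch of the two-hidden-node sign network described above; the hidden
weight vectors (1, -1) and (-1, 1) come from the slide, while the bias values
(-1 on each hidden node, +1 on the output node) are one possible choice that
makes the construction work.

  # Two-hidden-node sign network computing bipolar XOR (illustrative sketch).
  def sign(v):
      return 1 if v >= 0 else -1

  def xor_net(x1, x2):
      h1 = sign(1 * x1 + (-1) * x2 - 1)    # fires only for (x1, x2) = (1, -1)
      h2 = sign((-1) * x1 + 1 * x2 - 1)    # fires only for (x1, x2) = (-1, 1)
      return sign(h1 + h2 + 1)             # output combines the two half-planes

  for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
      print((x1, x2), xor_net(x1, x2))     # -1, 1, 1, -1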
47
Types of decision regions
[Figure: three networks and the decision regions they realize.
  • A network with a single node realizes a region bounded by a
    single hyperplane.
  • A one-hidden-layer network realizes a convex region; each hidden
    node realizes one of the lines bounding the convex region.
  • A two-hidden-layer network realizes the union of three convex
    regions; each box represents a one-hidden-layer network realizing
    one convex region.]
48
FFNN NEURON MODEL
  • The classical learning algorithm for FFNNs is based
    on the gradient descent method. For this reason
    the activation functions used in FFNNs are
    continuous functions of the weights,
    differentiable everywhere.
  • A typical activation function that can be viewed
    as a continuous approximation of the step
    (threshold) function is the sigmoid function. The
    activation function for node j is
    φ(v_j) = 1 / (1 + e^(-a v_j)), with a > 0.
  • When a tends to infinity, φ becomes the step
    function.
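To illustrate the forward pass of such a network, here is a minimal 2-2-1
FFNN with sigmoid hidden and output nodes; the weights are arbitrary
illustrative values, not taken from the slides.

  import math

  def sigmoid(v, a=1.0):
      return 1.0 / (1.0 + math.exp(-a * v))

  def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
      # One hidden layer; every node applies the sigmoid to its weighted input.
      h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
           for ws, b in zip(hidden_weights, hidden_biases)]
      return sigmoid(sum(w * hi for w, hi in zip(out_weights, h)) + out_bias)

  # 2-2-1 network with arbitrary example weights.
  print(forward([0.5, -1.0],
                hidden_weights=[[1.0, -1.0], [-1.0, 1.0]],
                hidden_biases=[0.0, 0.0],
                out_weights=[1.0, 1.0],
                out_bias=-0.5))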

49
Associative Memories
  • We consider now NN models for unsupervised
    learning problems called auto-association
    problems.
  • Association is the task of mapping patterns to
    patterns.
  • An associative memory is one in which the
    stimulus of an incomplete or corrupted pattern
    leads to the response of a stored pattern that
    corresponds in some manner to the input pattern.
  • A neural network model most commonly used for
    (auto-)association problems is the Hopfield
    network.

50
Example
51
HOPFIELD NETWORKS
  • The Hopfield network implements a so-called
    content addressable memory.
  • A collection of patterns called fundamental
    memories is stored in the NN by means of weights.
  • Each neuron represents a component of the input.
  • The weight of the link between two neurons
    measures the correlation between the two
    corresponding components over the fundamental
    memories. If the weight is high then the
    corresponding components are often equal in the
    fundamental memories.

52
ARCHITECTURE: recurrent
[Figure: recurrent architecture with unit-delay operators z^-1 on the
feedback links.]
53
Hopfield discrete NN
  • Input vector values are in {-1, 1} (or {0, 1}).
  • Every neuron has a link from every other neuron
    (recurrent architecture) except itself (no
    self-feedback).
  • The neuron state at time n is its output value.
  • The network state at time n is the vector of the
    neuron states.
  • The activation function used to update a neuron
    state is the sign function, except that if the input
    of the activation function is 0, then the new output
    (state) of the neuron is equal to the old one.
  • Weights are symmetric: w_ij = w_ji.

54
Notation
  • N: input dimension.
  • M: number of fundamental memories.
  • f_{µ,i}: i-th component of the µ-th
    fundamental memory.
  • x_i(n): state of neuron i at time n.

55
Weights computation
1. Storage. Let f1, f2, ..., fM denote a known
set of N-dimensional fundamental memories. The
synaptic weights of the network are
    w_ji = (1/N) Σ_{µ=1..M} f_{µ,j} f_{µ,i}  for j ≠ i,  and w_jj = 0,
where wji is the weight from neuron i to neuron j. The
elements of the vectors fµ are in {-1, 1}. Once
they are computed, the synaptic weights are kept
fixed.
56
NN Execution Retrieval
2. Initialisation. Let x_probe denote an input
vector (probe) presented to the network. The
algorithm is initialised by setting
    x_j(0) = x_probe,j,  j = 1, ..., N,
where x_j(0) is the state of neuron j at time n = 0, and
x_probe,j is the j-th element of the probe vector
x_probe.
3. Iteration Until Convergence. Update the
elements of the network state vector x(n)
asynchronously (i.e. randomly and one at a
time) according to the rule
    x_j(n+1) = sign( Σ_i w_ji x_i(n) ).
Repeat the iteration until the state vector x remains
unchanged.
4. Outputting. Let x_fixed denote the fixed
point (or stable state, i.e. such that
x(n+1) = x(n)) computed at the end of step 3. The
resulting output y of the network is y = x_fixed.
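A compact Python sketch of Hopfield storage and asynchronous retrieval along
the lines of steps 1-4; the example memories and the fixed number of random
updates (instead of an explicit convergence test) are simplifying assumptions.

  import random

  def store(memories):
      # Hebbian storage: w[j][i] = (1/N) * sum over memories of f[j]*f[i], zero diagonal.
      n = len(memories[0])
      w = [[0.0] * n for _ in range(n)]
      for f in memories:
          for j in range(n):
              for i in range(n):
                  if i != j:
                      w[j][i] += f[j] * f[i] / n
      return w

  def retrieve(w, probe, updates=200):
      x = list(probe)                    # initialise with the probe vector
      n = len(x)
      for _ in range(updates):
          j = random.randrange(n)        # asynchronous: one random neuron at a time
          net = sum(w[j][i] * x[i] for i in range(n))
          if net != 0:                   # if the net input is 0, keep the old state
              x[j] = 1 if net > 0 else -1
      return x

  mems = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
  w = store(mems)
  # Probe = first memory with its first component flipped; retrieval corrects it.
  print(retrieve(w, [-1, 1, 1, 1, -1, -1, -1, -1]))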
57
Unsupervised Learning
  • Neural networks for unsupervised learning attempt
    to discover interesting structure in the
    available data.
  • There is no information about the desired class
    (or output ) of an example.
  • Self Organizing Maps (SOM) combine a competitive
    learning principle with a topological structuring
    of neurons.

58
Biological Motivation
59
SOM ARCHITECTURE
  • The input is connected with each neuron of a
    lattice.
  • The topology of the lattice allows one to define
    a neighborhood structure on the neurons, like
    those illustrated below.


2-dimensional topology
and two possible neighborhoods
1-dimensional topology
with a small neighborhood
60
A 2-dimensional SOM
61
The idea
  • Upon repeated presentations of the training
    examples, the weight vectors of the neurons tend
    to follow the distribution of the examples.
  • This results in a topological ordering of the
    neurons, where neurons adjacent to each other
    tend to have similar weight vectors.
  • The input space of patterns is mapped onto a
    discrete output space of neurons.

62
The approach
  • One has to find values for the weight vectors of
    the links from the input layer to the nodes of
    the lattice, in such a way that adjacent neurons
    will have similar weight vectors.
  • For an input, the output of a SOM neural network
    will be the neuron with weight vector most
    similar (with respect to Euclidean distance) to
    that input.
  • Each neuron can be seen as representing the
    cluster containing all the input examples which
    are mapped to that neuron by the SOM NN.

63
Summary of the SOM algorithm
  • Initialization: n = 0. Choose random small values
    for the weight vector components.
  • Sampling: select an input example x(n).
  • Similarity matching: find the winning neuron i(x)
    at iteration n, i.e. the neuron whose weight vector
    is closest to the input: i(x) = argmin_j || x(n) - w_j(n) ||.
  • Updating: adjust the weight vectors of all
    neurons using the rule
    w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) ( x(n) - w_j(n) ),
    where h_{j,i(x)} is the neighborhood function around
    the winner (see the sketch below).
  • Continuation: n = n + 1. Go to the Sampling step
    until no noticeable changes in the weights are
    observed.
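A minimal sketch of one SOM iteration on a 1-D lattice; the Gaussian
neighborhood, the learning-rate value, and the lattice size are illustrative
assumptions.

  import math, random

  def som_step(weights, x, eta=0.1, sigma=1.0):
      # weights: list of weight vectors, one per neuron on a 1-D lattice.
      # Similarity matching: the winner is the neuron whose weight vector is closest to x.
      dist = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
      winner = min(range(len(weights)), key=lambda j: dist(weights[j]))
      # Updating: move every neuron towards x, weighted by lattice distance to the winner.
      for j, w in enumerate(weights):
          h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))   # neighborhood function
          weights[j] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
      return weights

  # 10 neurons on a 1-D lattice, 2-D inputs drawn uniformly from (-1,1) x (-1,1).
  weights = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(10)]
  for _ in range(1000):
      weights = som_step(weights, [random.uniform(-1, 1), random.uniform(-1, 1)])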

64
Example 1
  • A two-dimensional lattice driven by a
    two-dimensional distribution
  • 100 neurons arranged in a 2D lattice of 10 x 10
    nodes.
  • Input is two-dimensional, x = (x1, x2), drawn from a
    uniform distribution over the region
    (-1 < x1 < 1) × (-1 < x2 < 1)
  • Weights are initialized with small random values.

65
Visualization
  • Neurons are visualized as changing positions in
    the weight space as learning takes place. Each
    neuron is described by the corresponding weight
    vector consisting of the weights of the links
    from the input layer to that neuron.
  • Two neurons are connected by an edge if they are
    direct neighbors in the NN lattice.

66
Example 1
67
Example 2
  • A one dimensional lattice driven by a two
    dimensional distribution
  • 100 neurons arranged in one dimensional lattice.
  • Input space is the same as in Example 1.
  • Weights are initialized with random values (again
    like in example 1).

68
Example 2
69
Radial-Basis Function Networks
  • A function is a radial basis function (RBF) if its
    output depends on (is a non-increasing function of)
    the distance of the input from a given stored
    vector.
  • RBFs represent local receptors, as illustrated
    below, where each green point is a stored vector
    used in one RBF.
  • In a RBF network, one hidden layer uses neurons
    with RBF activation functions describing local
    receptors. Then one output node is used to
    combine linearly the outputs of the hidden
    neurons.

[Figure: the output for the blue vector is interpolated using the three
green vectors; each green vector gives a contribution that depends on its
weight (w1, w2, w3) and on its distance from the blue point.]
70
RBF ARCHITECTURE
  • One hidden layer with RBF activation functions
  • Output layer with linear activation function.

71
HIDDEN NEURON MODEL
  • Hidden units use radial basis functions
    φ_σ(|| x - t ||): the output depends on the distance
    of the input x from the center t.
  • t is called the center, σ is called the spread;
    center and spread are parameters of the hidden
    neuron.
72
Hidden Neurons
  • A hidden neuron is more sensitive to data points
    near its center.
  • For a Gaussian RBF this sensitivity may be tuned by
    adjusting the spread σ, where a larger spread
    implies less sensitivity.
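A small sketch of a Gaussian RBF network forward pass; the centers, spreads,
and output weights are arbitrary illustrative values.

  import math

  def gaussian_rbf(x, center, spread):
      # Output depends only on the distance of x from the center;
      # a larger spread means less sensitivity to that distance.
      d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
      return math.exp(-d2 / (2 * spread ** 2))

  def rbf_forward(x, centers, spreads, out_weights, out_bias=0.0):
      # One hidden layer of RBF units, linearly combined by the output node.
      phis = [gaussian_rbf(x, t, s) for t, s in zip(centers, spreads)]
      return out_bias + sum(w * p for w, p in zip(out_weights, phis))

  # Three stored centers, each contributing according to its distance from x.
  print(rbf_forward([0.2, 0.3],
                    centers=[[0, 0], [1, 0], [0, 1]],
                    spreads=[0.5, 0.5, 0.5],
                    out_weights=[1.0, 0.5, -0.5]))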

73
Comparison with multilayer NN
  • Architecture
  • RBF networks have one single hidden layer.
  • FFNN networks may have more hidden layers.
  • Neuron Model
  • In RBF the neuron model of the hidden neurons is
    different from the one of the output nodes.
  • Typically in FFNN hidden and output neurons
    share a common neuron model.
  • The hidden layer of RBF is non-linear, the output
    layer of RBF is linear.
  • Hidden and output layers of FFNN are usually
    non-linear.

74
Further reading
  • Very good (comprehensive):
    http://www.statsoftinc.com/textbook/stneunet.html
  • Good overview, no formulas:
    Matlab Neural Network toolbox