
1
  • Identification and Neural Networks

G. Horváth
I S R G
Department of Measurement and Information Systems
2
Modular networks
  • Why a modular approach?
  • Motivations
  • Biological
  • Learning
  • Computational
  • Implementation

3
Motivations
  • Biological
  • Biological systems are not homogeneous
  • Functional specialization
  • Fault tolerance
  • Cooperation, competition
  • Scalability
  • Extendibility

4
Motivations
  • Complexity of learning (divide and conquer)
  • Training of complex networks (many layers)
  • layer by layer learning
  • Speed of learning
  • Catastrophic interference, incremental learning
  • Mixing supervised and unsupervised learning
  • Hierarchical knowledge structure

5
Motivations
  • Computational
  • The capacity of a network
  • The size of the network
  • Catastrophic interference
  • Generalization capability vs network complexity

6
Motivations
  • Implementation (hardware)
  • The degree of parallelism
  • Number of connections
  • The length of physical connections
  • Fan out

7
Modular networks
  • What are the modules?
  • The modules disagree on some inputs
  • every module solves the same, whole problem,
  • different ways of solutions (different modules)
  • every module solves different tasks (sub-tasks)
  • task decomposition (input space, output space)

8
Modular networks
  • How to combine modules (a code sketch follows this slide)
  • Cooperative modules
  • simple average
  • weighted average (fixed weights)
  • optimal linear combination (OLC) of networks
  • Competitive modules
  • majority vote
  • winner takes all
  • Competitive/cooperative modules
  • weighted average (input-dependent weights)
  • mixture of experts (MOE)
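
A minimal sketch of the combination schemes listed above, assuming the module outputs are already collected into NumPy arrays; the function names and array shapes are illustrative, not from the slides.

    import numpy as np

    def simple_average(outputs):
        # outputs: one (scalar) output per module, shape (M,)
        return np.mean(outputs)

    def weighted_average(outputs, alphas):
        # fixed, input-independent weights (they should sum to one)
        return np.dot(alphas, outputs)

    def majority_vote(labels):
        # labels: one predicted class index per module
        values, counts = np.unique(labels, return_counts=True)
        return values[np.argmax(counts)]

    def winner_takes_all(outputs, confidences):
        # the most confident module alone gives the output
        return outputs[np.argmax(confidences)]

Input-dependent weights and the mixture of experts are covered on the later slides.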

9
Modular networks
  • Construction of modular networks
  • Task decomposition, subtask definition
  • Training modules for solving subtasks
  • Integration of the results
  • (cooperation and/or competition)

10
Modular networks
  • Cooperative networks
  • Ensemble (average)
  • Optimal linear combination of networks
  • Disjoint subtasks
  • Competitive networks
  • Ensemble (vote)
  • Competitive/cooperative networks
  • Mixture of experts

11
Cooperative networks
  • Ensemble of cooperating networks
    (classification/regression)
  • The motivation
  • Heuristic explanation
  • Different experts together can solve a problem
    better
  • Complementary knowledge
  • Mathematical justification
  • Accurate and diverse modules

12
Ensemble of networks
  • Mathematical justification
  • Ensemble output
  • Ambiguity (diversity)
  • Individual error
  • Ensemble error
  • Constraint

13
Ensemble of networks
  • Mathematical justification (contd)
  • Weighted error
  • Weighted diversity
  • Ensemble error
  • Averaging over the input distribution
  • Solution: an ensemble of accurate and diverse
    networks (the formulas are reconstructed below)
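
The formulas behind the last two slides did not survive the transcript; the standard ambiguity decomposition that matches the listed terms (ensemble output, ambiguity/diversity, individual error, ensemble error, weighted error, weighted diversity, constraint) is, as a reconstruction:

    \bar{y}(x) = \sum_k \alpha_k\, y_k(x), \qquad \sum_k \alpha_k = 1,\ \alpha_k \ge 0 \quad (\text{constraint})

    a_k(x) = \big(y_k(x) - \bar{y}(x)\big)^2 \quad (\text{ambiguity / diversity}), \qquad
    e_k(x) = \big(d(x) - y_k(x)\big)^2 \quad (\text{individual error})

    e(x) = \big(d(x) - \bar{y}(x)\big)^2
         = \underbrace{\sum_k \alpha_k e_k(x)}_{\text{weighted error}}
         - \underbrace{\sum_k \alpha_k a_k(x)}_{\text{weighted diversity}}

    E = \bar{E} - \bar{A} \quad (\text{after averaging over the input distribution})

Since the diversity term is subtracted, an ensemble of accurate and diverse networks has a smaller error than the weighted average of the individual errors.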

14
Ensemble of networks
  • How to get accurate and diverse networks?
  • different structures: more than one network
    structure (e.g. MLP, RBF, CCN, etc.)
  • different size, different complexity networks
    (number of hidden units, number of layers,
    nonlinear function, etc.)
  • different learning strategies (BP, CG, random
    search, etc.): batch learning, sequential learning
  • different training algorithms, sample order,
    learning samples
  • different training parameters
  • different starting parameter values
  • different stopping criteria

15
Linear combination of networks
16
Linear combination of networks
  • Computation of optimal coefficients (a numerical
    sketch follows this slide)
  • equal, fixed weights → simple average
  • input-dependent weights: for different input domains
    a different network (alone) gives the output
  • optimal values using the constraint that the weights
    sum to one
  • optimal values without any constraint →
    Wiener-Hopf equation
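
A minimal numerical sketch of the optimal linear combination, assuming the outputs of the M trained networks on N training samples are stacked into a matrix Y (N x M) and d holds the desired outputs; the function names are illustrative. The unconstrained optimum solves the Wiener-Hopf (normal) equations R a = p.

    import numpy as np

    def olc_weights(Y, d):
        # unconstrained optimal coefficients: R a = p (Wiener-Hopf equation)
        R = Y.T @ Y / len(d)        # correlation matrix of the network outputs
        p = Y.T @ d / len(d)        # cross-correlation with the desired output
        return np.linalg.solve(R, p)

    def olc_weights_constrained(Y, d):
        # optimal coefficients under the constraint that they sum to one,
        # solved with a single Lagrange multiplier
        M = Y.shape[1]
        R = Y.T @ Y / len(d)
        p = Y.T @ d / len(d)
        A = np.block([[R, np.ones((M, 1))],
                      [np.ones((1, M)), np.zeros((1, 1))]])
        b = np.concatenate([p, [1.0]])
        return np.linalg.solve(A, b)[:M]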

17
Task decomposition
  • Decomposition related to learning
  • before learning (subtask definition)
  • during learning (automatic task decomposition)
  • Problem space decomposition
  • input space (input space clustering, definition
    of different input regions)
  • output space (desired response)

18
Task decomposition
  • Decomposition into separate subproblems
  • K-class classification → K two-class
    problems (coarse decomposition)
  • Complex two-class problems → smaller
    two-class problems (fine decomposition)
  • Integration (module combination)

19
Task decomposition
  • A 3-class problem

20
Task decomposition
  • (figure: 3 classes → 2 small classes + 2 small classes)
21
Task decomposition
  • (figure: 3 classes → 2 small classes + 2 small classes → 2 classes)
22
Task decomposition
  • (figure: 3 classes → 2 small classes + 2 small classes)
23
Task decomposition
24
Task decomposition
  • A two-class problem decomposed into subtasks

25
Task decomposition
  • (figure: modules M11 and M12 combined by AND, modules M21 and M22 combined by AND, the two results combined by OR)
26
Task decomposition
  • (figure: the input feeds modules M11, M12 and M21, M22; M11 and M12 are combined by MIN, M21 and M22 by MIN, and the two MIN outputs are combined by MAX to give class C1)
27
Task decomposition
  • Training set decomposition
  • Original training set
  • Training set for each of the (K) two-class
    problems
  • Each of the two-class problems is divided into
    K-1 smaller two-class problems; using an inverter
    module, (K-1)/2 of them are actually enough
    (a sketch of the decomposition follows)
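
A minimal sketch of the training-set decomposition described above, assuming a NumPy feature matrix X and an integer label vector; the helper names are illustrative.

    import numpy as np

    def coarse_decomposition(X, labels, K):
        # one two-class problem per class: class k against all the others
        return [(X, (labels == k).astype(int)) for k in range(K)]

    def fine_decomposition(X, labels, K):
        # each two-class problem split into smaller two-class (pairwise)
        # problems; with an inverter module the (j, i) cases come for free
        subtasks = {}
        for i in range(K):
            for j in range(i + 1, K):
                mask = (labels == i) | (labels == j)
                subtasks[(i, j)] = (X[mask], (labels[mask] == i).astype(int))
        return subtasks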

28
Task decomposition
  • A practical example: zip code recognition

29
Task decomposition
  • Zip code recognition (handwritten character
    recognition): a modular solution

30
Mixture of Experts (MOE)
  • (figure: the input x feeds Expert 1, Expert 2, ..., Expert M, whose outputs µ1, ..., µM are weighted by the gating-network outputs g1, ..., gM and summed to give the overall output µ)
31
Mixture of Experts (MOE)
  • The output is the weighted sum of the outputs of
    the experts: y = Σ_i g_i(x) y_i(x, θ_i), where θ_i is
    the parameter of the i-th expert
  • The output of the gating network is a softmax
    function: g_i(x) = exp(v_i^T x) / Σ_j exp(v_j^T x),
    where v_i is the parameter of the gating network
    (a code sketch follows this slide)
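
A minimal sketch of the MOE forward pass with linear experts and a linear-softmax gating network, matching the formulas above; the parameter layout (one weight vector per expert, a gating matrix V) is an assumption for illustration.

    import numpy as np

    def moe_output(x, expert_params, V):
        # expert_params: list of (w_i, b_i); V: gating parameters, one row per expert
        mu = np.array([w @ x + b for w, b in expert_params])   # expert outputs
        scores = V @ x                                          # gating scores
        g = np.exp(scores - scores.max())                       # softmax (numerically stable)
        g = g / g.sum()
        return g @ mu, g                                        # weighted sum, gating weights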


32
Mixture of Experts (MOE)
  • Probabilistic interpretation
  • The probabilistic model with the true parameters:
    P(d | x) = Σ_i g_i(x) P_i(d | x)
  • g_i(x) is the a priori probability that the i-th
    expert generates the data

33
Mixture of Experts (MOE)
  • Training
  • Training data: input-output pairs (x(l), d(l)), l = 1, ..., N
  • Probability of generating the output from the input:
    P(d | x, Θ) = Σ_i g_i(x, v) P(d | x, θ_i)
  • The log likelihood function (maximum likelihood
    estimation): L(Θ) = Σ_l ln Σ_i g_i(x(l), v) P(d(l) | x(l), θ_i)

34
Mixture of Experts (MOE)
  • Training (contd)
  • Gradient method
  • The parameters of the expert networks
  • The parameters of the gating network

35
Mixture of Experts (MOE)
  • Training (contd)
  • A priori probability
  • A posteriori probability
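
The formulas for the two probabilities were lost in the transcript; the standard expressions used in MOE training are, as a reconstruction (g_i from the softmax gating, h_i the responsibility computed once the desired output d is known):

    g_i(x, v) = \frac{e^{v_i^T x}}{\sum_j e^{v_j^T x}} \qquad \text{(a priori probability)}

    h_i = \frac{g_i(x, v)\, P(d \mid x, \theta_i)}{\sum_j g_j(x, v)\, P(d \mid x, \theta_j)} \qquad \text{(a posteriori probability)}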

36
Mixture of Experts (MOE)
  • Training (contd)
  • EM (Expectation Maximization) algorithm
  • A general iterative technique for maximum
    likelihood estimation
  • Introducing hidden variables
  • Defining a log likelihood function
  • Two steps
  • Expectation of the hidden variables
  • Maximization of the log likelihood function

37
EM (Expectation Maximization) algorithm
  • A simple example: estimating the means of k (= 2)
    Gaussians

38
EM (Expectation Maximization) algorithm
  • A simple example: estimating the means of k (= 2)
    Gaussians
  • hidden variables for every observation:
    the complete data are (x(l), z_l1, z_l2)
  • likelihood function of the complete data
  • log likelihood function
  • expected value of the hidden variables given x(l)
    and the current estimates of the means

39
EM (Expectation Maximization) algorithm
  • A simple example: estimating the means of k (= 2)
    Gaussians (contd)
  • Expected log likelihood function
  • The estimate of the means: the observations averaged
    with the expected hidden variables as weights
    (a code sketch follows this slide)
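
A minimal EM sketch for the example above (means of two Gaussians with known, equal variances and equal mixing weights); the function name and the initialisation are assumptions.

    import numpy as np

    def em_two_means(x, sigma=1.0, n_iter=50):
        mu = np.array([x.min(), x.max()])            # crude initial guesses
        for _ in range(n_iter):
            # E step: expected hidden variables E[z_li] = P(component i | x(l))
            logp = -(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2)
            p = np.exp(logp)
            z = p / p.sum(axis=1, keepdims=True)
            # M step: means = responsibility-weighted averages of the samples
            mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)
        return mu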

40
Mixture of Experts (MOE)
  • Applications
  • Simple experts: linear experts
  • ECG diagnostics
  • Mixture of Kalman filters
  • Discussion: comparison to non-modular
    architectures

41
Support vector machines
  • A new approach
  • Gives answers to questions not solved by the
    classical approach
  • The size of the network
  • The generalization capability

42
Support vector machines
  • Classification

(figure: separating hyperplanes found by classical neural learning vs. the optimal hyperplane found by the Support Vector Machine)
43
VC dimension
44
Structural risk minimization
45
Support vector machines
  • Linearly separable two-class problem
  • separating hyperplane

Optimal hyperplane
46
Support vector machines
  • Geometric interpretation

47
Support vector machines
  • Criterion function, Lagrange function
  • a constrained optimization problem
  • conditions
  • dual problem
  • support vectors → optimal hyperplane
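
The equations referred to on this slide were lost in the transcript; for the linearly separable case the standard formulation is, as a reconstruction (d_i ∈ {−1, +1} are the class labels):

    \min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad d_i\,(w^T x_i + b) \ge 1 \ \ \forall i

    L(w,b,\alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\big[d_i(w^T x_i + b) - 1\big], \qquad \alpha_i \ge 0

    \max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j\, d_i d_j\, x_i^T x_j
    \quad \text{s.t.}\ \sum_i \alpha_i d_i = 0,\ \alpha_i \ge 0

    w = \sum_{i \in SV} \alpha_i d_i x_i, \qquad SV = \{i : \alpha_i > 0\}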

48
Support vector machines
  • Linearly nonseparable case
  • separating hyperplane
  • criterion function
  • Lagrange function
  • support vectors → optimal hyperplane

Optimal hyperplane
49
Support vector machines
  • Nonlinear separation
  • separating hyperplane
  • decision surface
  • kernel function
  • criterion function

50
Support vector machines
  • Examples of SVM
  • Polynomial
  • RBF
  • MLP
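
The kernel formulas were lost in the transcript; the usual choices behind these three bullets are, as a reconstruction (p, σ, β0, β1 are hyperparameters):

    K(x, x_i) = (x^T x_i + 1)^p \qquad \text{(polynomial)}

    K(x, x_i) = \exp\!\big(-\|x - x_i\|^2 / (2\sigma^2)\big) \qquad \text{(RBF)}

    K(x, x_i) = \tanh(\beta_0\, x^T x_i + \beta_1) \qquad \text{(MLP; a valid kernel only for some } \beta_0, \beta_1\text{)}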

51
Support vector machines
  • Example: polynomial
  • basis functions
  • kernel function

52
SVM (classification)
  • Separable samples: minimize ½||w||², subject to the
    constraint d_i (w^T x_i + b) ≥ 1 for every sample i
  • Not separable samples: minimize ½||w||² + C Σ_i ξ_i,
    subject to the constraint d_i (w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0
53
SVR (regression)
54
SVR (regression)

Constraints: d_i − (w^T x_i + b) ≤ ε + ξ_i,  (w^T x_i + b) − d_i ≤ ε + ξ_i*,  ξ_i, ξ_i* ≥ 0
Minimize: ½||w||² + C Σ_i (ξ_i + ξ_i*)
55
SVR (regression)
  • Lagrange function
  • dual problem
  • constraints
  • support vectors
  • solution
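
The dual form and the solution referred to above were lost in the transcript; the standard ε-insensitive SVR expressions are, as a reconstruction:

    \max_{\alpha,\alpha^*}\ \sum_i d_i(\alpha_i - \alpha_i^*)
      - \varepsilon \sum_i (\alpha_i + \alpha_i^*)
      - \frac{1}{2}\sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(x_i, x_j)

    \text{constraints:}\quad \sum_i (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C

    y(x) = \sum_{i \in SV} (\alpha_i - \alpha_i^*)\, K(x, x_i) + b
    \qquad \text{(support vectors: samples with } \alpha_i - \alpha_i^* \ne 0\text{)}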

56
SVR (regression)
57
SVR (regression)
58
SVR (regression)
59
SVR (regression)
60
Support vector machines
  • Main advantages
  • generalization
  • size of the network
  • centre parameters for RBF
  • linear-in-the-parameter structure
  • noise immunity

61
Support vector machines
  • Main disadvantages
  • computation intensive (quadratic optimization)
  • hyperparameter selection
  • VC dimension (classification)
  • batch processing

62
Support vector machines
  • Variants
  • LS SVM
  • basic criterion function
  • Advantages: easier to compute, adaptivity
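
The LS-SVM criterion itself was lost in the transcript; the usual form is, as a reconstruction (γ is the regularisation hyperparameter):

    \min_{w,b,e}\ \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\sum_i e_i^2
    \quad \text{s.t.}\quad d_i = w^T \varphi(x_i) + b + e_i

Because the inequality constraints are replaced by equalities and the loss is quadratic, the solution follows from a linear system instead of a quadratic program, which is what makes it easier to compute and suitable for adaptive updating.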

63
Mixture of SVMs
  • Problem of hyper-parameter selection for SVMs
  • Different SVMs, with different hyper-parameters
  • Soft separation of the input space

64
Mixture of SVMs
65
Boosting techniques
  • Boosting by filtering
  • Boosting by subsampling
  • Boosting by reweighting

66
Boosting techniques
  • Boosting by filtering

67
Boosting techniques
  • Boosting by subsampling

68
Boosting techniques
  • Boosting by reweighting
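
A minimal sketch of boosting by reweighting in the AdaBoost style, assuming a weak learner train_weak(X, d, w) that accepts per-sample weights and returns a predictor with outputs in {−1, +1}; all names are illustrative.

    import numpy as np

    def boost_by_reweighting(X, d, train_weak, n_rounds=10):
        N = len(d)
        w = np.full(N, 1.0 / N)                    # start with uniform sample weights
        models, betas = [], []
        for _ in range(n_rounds):
            h = train_weak(X, d, w)                # train on the reweighted sample
            pred = h(X)
            err = np.sum(w * (pred != d)) / np.sum(w)
            if err == 0 or err >= 0.5:             # weak learner no longer useful
                break
            beta = 0.5 * np.log((1 - err) / err)   # weight of this round's model
            w = w * np.exp(-beta * d * pred)       # emphasize the misclassified samples
            w = w / w.sum()
            models.append(h)
            betas.append(beta)
        return lambda Xq: np.sign(sum(b * m(Xq) for b, m in zip(betas, models)))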

69
Other modular architectures
70
Other modular architectures
71
Other modular architectures
  • Modular classifiers
  • Decoupled modules
  • Hierarchical modules
  • Network ensemble (linear combination)
  • Network ensemble (decision, voting)

72
Modular architectures