Title: Outline Lecture 1
EE459 Neural Networks: How to design a good-performance NN?
Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University
Glossary
- Pattern: a complete set of data inputs and outputs that provides a snapshot of the system being modelled; also called examples or cases
- Feature: an identifying characteristic in the data that the model will ideally capture
- Domain: a set of boundaries that define the range of expected/observed data for a particular model or problem
Backpropagation Modelling Heuristics
- Selection of model inputs and outputs
- Defining the model domain
- Data pre-processing
- Selection of training/testing cases
- Data scaling
- Number of hidden layers
- Number of neurons
- Activation function selection
- Initial weight values
- Learning rate and momentum
- Presentation of patterns
- Stopping criteria for training
- Improving model performance
 
Selection of Model Inputs and Outputs
- Start with a set of inputs that are KNOWN to affect the process
  - then add other inputs suspected of having a relationship with the process, one at a time
- Eliminate input variables that are redundant (high covariance)
- Eliminate data patterns that do not contribute to training (no new information)
- Identify/eliminate data patterns that have the same inputs but different outputs
Defining the Model Domain
- ANNs should be confined to a limited domain
  - develop separate models for contradictory areas of the domain
- Build models that predict a single output
  - link multiple models together, if required
 
Data Pre-processing
- Any time the dynamic range of an input spans more than a few orders of magnitude, a logarithmic transformation should be applied
- Transformations may also be useful for reconditioning input data when an ANN has trouble converging
- Non-numerical inputs need to be codified
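The two pre-processing steps above can be sketched in a few lines of Python (a minimal illustration; the variable names and example values are invented for this sketch):

```python
import math

# An input spanning several orders of magnitude: compress with a log transform
flow_rates = [0.5, 12.0, 300.0, 4500.0]          # hypothetical raw readings
log_flow = [math.log10(x) for x in flow_rates]   # now spans roughly -0.3 to 3.7

# A non-numerical input: codify it, e.g. with one-hot encoding
categories = ["low", "medium", "high"]

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the category's position."""
    return [1 if value == c else 0 for c in categories]

encoded = one_hot("medium", categories)  # [0, 1, 0]
```

The log transform keeps large readings from dominating the weight updates; one-hot coding avoids imposing a false ordering on categories.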
 
Selection of Training/Testing Cases
- Training cases should be representative of the problem domain
- For good generalization capacity, the training set must be complete
  - important variables must be measured
- The data set is randomly divided into training and testing sets
  - in a 70:30 ratio, for example
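A random 70:30 split can be sketched as follows (a minimal illustration; `patterns` stands in for the (input, output) pairs of a real data set):

```python
import random

def split_patterns(patterns, train_fraction=0.7, seed=42):
    """Randomly divide patterns into training and testing sets (e.g. 70:30)."""
    shuffled = patterns[:]                 # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)  # random division, as the slide advises
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

patterns = list(range(100))                # stand-ins for data patterns
train, test = split_patterns(patterns)     # 70 training, 30 testing cases
```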
 
- Rules of thumb
  - The number of training patterns should be at least 5 times the number of nodes in the network
  - The number of training cases should be roughly the number of weights times the inverse of the accuracy parameter (e, where e = 0.9 means 90% prediction accuracy is required)
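These two rules of thumb amount to simple arithmetic. The sketch below applies them, as stated on the slide, to a hypothetical 4-5-1 network (bias weights ignored for simplicity):

```python
def min_patterns_by_nodes(n_nodes):
    """Rule of thumb: at least 5 training patterns per node in the network."""
    return 5 * n_nodes

def min_cases_by_weights(n_weights, e):
    """Rule of thumb: cases ~ number of weights times the inverse of the
    accuracy parameter e (as stated on the slide)."""
    return n_weights / e

n_inputs, n_hidden, n_outputs = 4, 5, 1                  # hypothetical 4-5-1 network
n_nodes = n_inputs + n_hidden + n_outputs                # 10 nodes
n_weights = n_inputs * n_hidden + n_hidden * n_outputs   # 25 weights, biases ignored

by_nodes = min_patterns_by_nodes(n_nodes)                # 50 patterns
by_weights = min_cases_by_weights(n_weights, e=0.9)      # about 28 cases
```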
Data Scaling
- Output variables should be scaled to the 0.1 to 0.9 range
  - avoids operating in the saturation range of the sigmoid function
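A linear min-max mapping into [0.1, 0.9] is enough for this (a sketch; the example outputs are invented):

```python
def scale(values, lo=0.1, hi=0.9):
    """Linearly map values into [lo, hi] to stay out of sigmoid saturation."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

outputs = [2.0, 4.0, 6.0, 10.0]
scaled = scale(outputs)   # minimum maps to 0.1, maximum to 0.9
```

The inverse mapping must be applied to model predictions to recover engineering units.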
Number of Hidden Layers
- Increasing the number of hidden layers increases both the time and the number of patterns (examples) required for training
- For most problems, one hidden layer will suffice
  - problems can always be solved with two
- Multiple slabs (Ward Nets) supposedly increase processing power
  - each slab (group of neurons) acts as a detector for one or more input features
Number of Neurons
- One neuron in the input layer for each input
- One neuron in the output layer for each output
- The proper number of hidden neurons is often determined experimentally
  - too few: poor ability to capture features
  - too many: poor ability to generalize (the ANN simply memorizes the training data)
- Various rules of thumb have been reported
  - 0.75N
  - 2N + 1
  (N = number of inputs)
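The two reported rules of thumb give only starting points for the experiments; a sketch applying them to a hypothetical 8-input model:

```python
def hidden_neurons_rules(n_inputs):
    """Two reported rules of thumb for the number of hidden neurons:
    0.75N and 2N + 1, where N is the number of inputs."""
    return {"0.75N": round(0.75 * n_inputs), "2N+1": 2 * n_inputs + 1}

candidates = hidden_neurons_rules(8)   # {'0.75N': 6, '2N+1': 17}
```

The wide gap between the two estimates is exactly why the slide says the final count is determined experimentally.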
- In general, more weights introduce more local minima into the error surface
- Flat regions in the error surface can mislead gradient-based error-minimization methods (backpropagation)
- Start with a small network and then add connections as needed
  - avoids convergence problems as networks get too large
- The optimum ratio of hidden neurons in the first to the second hidden layer is 3:1
Activation Function Selection
- Sigmoidal (logistic) activation functions are the most widely used
- Thresholding functions are only useful for categorical outputs
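The contrast between the two function families is easy to see in code (a minimal sketch):

```python
import math

def sigmoid(x):
    """Logistic activation: smooth, differentiable, output in (0, 1).
    The smooth gradient is what backpropagation needs."""
    return 1.0 / (1.0 + math.exp(-x))

def threshold(x):
    """Hard-limiting activation: only a 0/1 output, so it suits
    categorical outputs but gives no usable gradient."""
    return 1.0 if x >= 0 else 0.0
```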
Initial Weight Values
- Weights need to be randomized initially
  - if all weights are set to the same number, the GDR (generalized delta rule) would never be able to leave the starting point
- The backpropagation algorithm may also have difficulty if the connection weights are prematurely saturated (> 0.9)
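A common way to satisfy both points is to draw small random values in a symmetric range (a sketch; the range ±0.5 is an illustrative choice, not prescribed by the slide):

```python
import random

def init_weights(n_weights, limit=0.5, seed=0):
    """Small random initial weights: randomization breaks the symmetry that
    would trap the GDR, and keeping |w| < limit avoids premature saturation."""
    rng = random.Random(seed)
    return [rng.uniform(-limit, limit) for _ in range(n_weights)]

w = init_weights(20)
```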
Learning Rate and Momentum
- The learning rate (η) affects the speed of convergence
  - if it is large (> 0.5), the weights are changed more drastically, but this may cause the optimum to be overshot
  - if it is small (< 0.2), the weights are changed in smaller increments, causing the system to converge more slowly, with little oscillation
- The best training occurs when the learning rate is as large as possible without leading to oscillation
- The learning rate can be increased as learning progresses, or a momentum term added, to improve network learning speed
- The momentum factor (α) can damp out the oscillations caused by increasing the learning rate
  - a momentum of 0.9 allows higher learning rates
 
Presenting Patterns to the ANN
- Present patterns in a random fashion
- If input patterns can be easily classified, do not train the ANN on all patterns in a class in succession
  - the ANN will forget information as it moves from class to class
- Shaping can be used to improve network training
  - involves starting with a very small, cohesive data set and then adding patterns with greater deviations from the variable means as training progresses
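Random presentation is usually implemented by re-shuffling the training set every epoch, so no class is ever seen in one long run (a sketch; `patterns` stands in for real training pairs):

```python
import random

def epochs_in_random_order(patterns, n_epochs, seed=1):
    """Yield the training patterns in a fresh random order each epoch."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        order = patterns[:]      # copy, then shuffle the copy
        rng.shuffle(order)
        yield order

all_orders = list(epochs_in_random_order(list(range(6)), n_epochs=3))
```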
Stopping Criteria for Training
- Training should be stopped when one of the following conditions is met
  - the testing error is sufficiently small
  - the testing error begins to increase
  - a set number of iterations has passed
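The three conditions translate directly into a stopping check run after each training iteration (a sketch; the tolerance and iteration budget are illustrative values):

```python
def should_stop(test_errors, tolerance=0.01, max_iterations=1000):
    """Stop when the testing error is small enough, starts to rise,
    or the iteration budget is exhausted."""
    if not test_errors:
        return False
    if test_errors[-1] <= tolerance:
        return True                                        # error sufficiently small
    if len(test_errors) >= 2 and test_errors[-1] > test_errors[-2]:
        return True                                        # testing error increasing
    return len(test_errors) >= max_iterations              # iteration limit reached

stop_small = should_stop([0.5, 0.1, 0.005])   # True: error below tolerance
stop_rising = should_stop([0.5, 0.1, 0.12])   # True: testing error went up
keep_going = should_stop([0.5, 0.3, 0.2])     # False: still improving
```

Stopping when the testing error turns upward is the classic guard against memorizing the training data.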
 
Improving Model Performance
- Re-initialize the network weights to a new set of random values
  - re-train the model
- Adjust the learning rate and momentum
- Modify the stopping criteria
- Prune network weights
- Use genetic algorithms to adjust the network topology
- Add noise to the training cases to decrease the chance of memorization
ANN Modelling Approach
Needs and Suitability Assessment
- What are the modelling needs?
- Is the ANN technique suitable to meet these needs?
- Can the following be met?
  - data requirements
  - software requirements
  - hardware requirements
  - personnel requirements
 
Data Collection and Analysis
- Successful models require careful attention to data details
- Recall: relevant historical data is a key requirement
- For data collection, investigate
  - data availability
    - parameters, time-frame, frequency, format
  - QA/QC protocols
  - data reliability
  - process changes
 
- Data requirements and guidelines
  - data for each of the parameters must be available
  - at least one full cycle of data must be available
  - appropriate QA/QC protocols must be in place
  - data collected prior to major process changes should generally not be used
- Data analysis involves
  - data characterization
  - a complete statistical analysis
 
- Data characterization for each parameter
  - qualitative assessment of hourly, daily, and seasonal trends (graphical examination of the data)
  - time-series analyses may be warranted
- Statistical analysis for each parameter
  - measures of central tendency
    - mean, median, mode
  - measures of variability
    - standard deviation, variance
  - percentile analyses
  - identification of outliers, erroneous entries, and non-entries
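The per-parameter statistics, and a crude outlier screen, can be computed with the standard library (a sketch; the readings and the 2-standard-deviation rule are invented for illustration):

```python
import statistics

data = [4.1, 4.4, 4.4, 5.0, 5.2, 6.8, 19.5]   # hypothetical readings; 19.5 is suspect

central = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}
variability = {
    "stdev": statistics.stdev(data),
    "variance": statistics.variance(data),
}
# A crude outlier screen: flag points more than 2 standard deviations from the mean
outliers = [x for x in data if abs(x - central["mean"]) > 2 * variability["stdev"]]
```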
Application of a Model-Building Protocol
- There is no accepted best method of developing ANN models
- An infinite number of distinct architectures is possible
- A protocol serves to reduce the number of architectures that are evaluated
- A sample five-step protocol
 
- Selection of model inputs and outputs
  - Why it's important
    - ANN models are based on process inputs and outputs
  - How it's done
    - first, select the model output
      - the best models have only one output parameter
    - next, select the model inputs from the available input parameters
      - selection is based on data availability, literature, and expert knowledge
- Selection and organization of data patterns
  - Why it's important
    - ANN models are only as good as the data used
    - separate, independent data sets are required to test and validate the model
  - How it's done
    - examine each data pattern for erroneous entries, outliers, and blank entries
    - delete questionable data patterns
    - sort and divide the data into training, testing, and production sets
    - perform a statistical analysis on each of the three data sets
- Determination of architecture characteristics
  - Why it's important
    - each modelling scenario has an optimal architecture
  - How it's done
    - initially, hold many factors at software defaults or at pre-determined values
    - use modelling heuristics from the literature along with expert knowledge
    - determine the number of hidden-layer neurons
    - compare the results of different runs
- Evaluation of model stability
  - Why it's important
    - ensures that model results are independent of the method of data sorting
  - How it's done
    - build new training, testing, and production sets from the original database
    - re-train the models on the new data sets
    - compare the results with the initial runs
- Model fine-tuning
  - Why it's important
    - some models require minor improvements in order to meet process operating criteria
  - How it's done
    - modelling parameters previously held constant can be varied to improve model results
    - the fine-tuning methodology is typically researcher-specific
Evaluating Model Performance
- In many situations, more than one good model can be developed
- The best model is the one that both
  - meets the modelling needs initially identified
  - offers the smallest prediction errors
- We therefore need to be able to evaluate model performance
- Prediction errors can be assessed
  - graphically
    - visual representation of missed predictions
 
- ...and using statistics
  - absolute measures of error
    - mean absolute error (MAE)
    - maximum absolute error
  - relative measures of error
    - mean absolute percent error (MAPE)
  - coefficients of correlation
    - coefficient of correlation (r)
    - coefficient of multiple correlation (R)
  - coefficients of determination
    - coefficient of determination (r²)
    - coefficient of multiple determination (R²)
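The absolute and relative error measures are one-liners; a sketch with invented actual/predicted values:

```python
def mae(actual, predicted):
    """Mean absolute error: an absolute measure of error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def max_abs_error(actual, predicted):
    """Maximum absolute error: the single worst miss."""
    return max(abs(a - p) for a, p in zip(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percent error: a relative measure (actuals must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 20.0, 40.0]
predicted = [12.0, 19.0, 40.0]
```

Absolute measures carry the output's units; relative measures allow comparison across outputs of different scales.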
 
- Model residuals should also be studied
  - residual = prediction error
  - residuals should
    - be normally distributed
      - plot a histogram of the residuals
    - have a mean of 0
    - be independent
      - plot the residuals (in time order, if applicable)
      - the plot should be free of obvious trends
    - have constant variance
      - plot the residuals against the predicted values
      - the plot should not show spreading, converging, or other trends
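The numerical part of these checks (mean near 0, spread of the residuals) is straightforward; the distribution and trend checks are done from the plots described above. A sketch with invented values:

```python
import statistics

actual = [10.0, 12.0, 11.0, 13.0, 12.5]
predicted = [10.4, 11.5, 11.2, 12.9, 12.6]      # hypothetical model outputs
residuals = [a - p for a, p in zip(actual, predicted)]   # residual = prediction error

mean_residual = statistics.mean(residuals)      # should be close to 0
spread = statistics.stdev(residuals)            # used when judging constant variance
```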
Model Evaluation Using Real-time Data
- Need to consider
  - changes in the frequency of data collection
  - changes in the methodology of data collection
  - the existence of QA/QC protocols to detect erroneous data
- The evaluation can take the form of
  - simulated real-time testing on a stand-alone PC
  - online testing in real time
 
- Methodology
  - select the time frame of the test
  - port the data to the developed models
  - process each data pattern and record the results
  - compare the model-predicted values to the actual values
  - determine the prediction errors