Title: Outline Lecture 1
EE459 Neural Networks: How to design a good-performance NN?
Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University
Glossary
- Pattern: a complete set of data inputs and outputs that provides a snapshot of the system being modelled; also called examples or cases
- Feature: an identifying characteristic in the data that the model will ideally capture
- Domain: a set of boundaries that define the range of expected/observed data for a particular model or problem
Backpropagation Modelling Heuristics
- Selection of model inputs and outputs
- Defining the model domain
- Data pre-processing
- Selection of training/testing cases
- Data scaling
- Number of hidden layers
- Number of neurons
- Activation function selection
- Initial weight values
- Learning rate and momentum
- Presentation of patterns
- Stopping criteria for training
- Improving model performance
 
Selection of Model Inputs and Outputs
- Start with a set of inputs that are KNOWN to affect the process
  - then add other inputs suspected of having a relationship with the process, one at a time
- Eliminate input variables that are redundant (high covariance)
- Eliminate data patterns that do not contribute to training (no new information)
- Identify/eliminate data patterns that have the same inputs but different outputs
Defining the Model Domain
- ANNs should be confined to a limited domain
  - develop separate models for contradictory areas of the domain
- Build models that predict a single output
  - link multiple models together, if required
 
Data Pre-processing
- Any time the dynamic range of an input spans more than a few orders of magnitude, a logarithmic transformation should be applied
- Transformations may also be useful for reconditioning input data when an ANN has trouble converging
- Non-numerical inputs need to be codified
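The two pre-processing steps above can be sketched in a few lines of Python (a minimal illustration; the variable names and example values are invented for this sketch):

```python
import math

# An input spanning several orders of magnitude: compress with a log transform
flow_rates = [0.5, 12.0, 300.0, 4500.0]          # hypothetical raw readings
log_flow = [math.log10(x) for x in flow_rates]   # now spans roughly -0.3 to 3.7

# A non-numerical input: codify it, e.g. with one-hot encoding
categories = ["low", "medium", "high"]

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the category's position."""
    return [1 if value == c else 0 for c in categories]

encoded = one_hot("medium", categories)  # [0, 1, 0]
```

The log transform keeps large readings from dominating the weight updates; one-hot coding avoids imposing a false ordering on categories.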
 
Selection of Training/Testing Cases
- Training cases should be representative of the problem domain
- For good generalization capacity, the training set must be complete
  - important variables must be measured
- The data set is randomly divided into training and testing sets
  - in a 70:30 ratio, for example
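A random 70:30 split can be sketched as follows (a minimal illustration; `patterns` stands in for the (input, output) pairs of a real data set):

```python
import random

def split_patterns(patterns, train_fraction=0.7, seed=42):
    """Randomly divide patterns into training and testing sets (e.g. 70:30)."""
    shuffled = patterns[:]                 # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)  # random division, as the slide advises
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

patterns = list(range(100))                # stand-ins for data patterns
train, test = split_patterns(patterns)     # 70 training, 30 testing cases
```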
 
- Rules of thumb
  - The number of training patterns should be at least 5 times the number of nodes in the network
  - The number of training cases should be roughly the number of weights times the inverse of the accuracy parameter (e, where e = 0.9 means 90% prediction accuracy is required)
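These two rules of thumb amount to simple arithmetic. The sketch below applies them, as stated on the slide, to a hypothetical 4-5-1 network (bias weights ignored for simplicity):

```python
def min_patterns_by_nodes(n_nodes):
    """Rule of thumb: at least 5 training patterns per node in the network."""
    return 5 * n_nodes

def min_cases_by_weights(n_weights, e):
    """Rule of thumb: cases ~ number of weights times the inverse of the
    accuracy parameter e (as stated on the slide)."""
    return n_weights / e

n_inputs, n_hidden, n_outputs = 4, 5, 1                  # hypothetical 4-5-1 network
n_nodes = n_inputs + n_hidden + n_outputs                # 10 nodes
n_weights = n_inputs * n_hidden + n_hidden * n_outputs   # 25 weights, biases ignored

by_nodes = min_patterns_by_nodes(n_nodes)                # 50 patterns
by_weights = min_cases_by_weights(n_weights, e=0.9)      # about 28 cases
```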
Data Scaling
- Output variables should be scaled to the 0.1 to 0.9 range
  - avoids operating in the saturation range of the sigmoid function
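A linear min-max mapping into [0.1, 0.9] is enough for this (a sketch; the example outputs are invented):

```python
def scale(values, lo=0.1, hi=0.9):
    """Linearly map values into [lo, hi] to stay out of sigmoid saturation."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

outputs = [2.0, 4.0, 6.0, 10.0]
scaled = scale(outputs)   # minimum maps to 0.1, maximum to 0.9
```

The inverse mapping must be applied to model predictions to recover engineering units.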
Number of Hidden Layers
- Increasing the number of hidden layers increases both the time and the number of patterns (examples) required for training
- For most problems, one hidden layer will suffice
  - problems can always be solved with two
- Multiple slabs (Ward Nets) supposedly increase processing power
  - each slab (group of neurons) acts as a detector for one or more input features
Number of Neurons
- One neuron in the input layer for each input
- One neuron in the output layer for each output
- The proper number of hidden neurons is often determined experimentally
  - too few: poor ability to capture features
  - too many: poor ability to generalize (the ANN simply memorizes the training data)
- Various rules of thumb have been reported
  - 0.75N
  - 2N + 1
  (N = number of inputs)
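The two reported rules of thumb give only starting points for the experiments; a sketch applying them to a hypothetical 8-input model:

```python
def hidden_neurons_rules(n_inputs):
    """Two reported rules of thumb for the number of hidden neurons:
    0.75N and 2N + 1, where N is the number of inputs."""
    return {"0.75N": round(0.75 * n_inputs), "2N+1": 2 * n_inputs + 1}

candidates = hidden_neurons_rules(8)   # {'0.75N': 6, '2N+1': 17}
```

The wide gap between the two estimates is exactly why the slide says the final count is determined experimentally.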
- In general, more weights introduce more local minima into the error surface
- Flat regions in the error surface can mislead gradient-based error-minimization methods (backpropagation)
- Start with a small network and then add connections as needed
  - avoids convergence problems as networks get too large
- The optimum ratio of hidden neurons in the first to the second hidden layer is 3:1
Activation Function Selection
- Sigmoidal (logistic) activation functions are the most widely used
- Thresholding functions are only useful for categorical outputs
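The contrast between the two function families is easy to see in code (a minimal sketch):

```python
import math

def sigmoid(x):
    """Logistic activation: smooth, differentiable, output in (0, 1).
    The smooth gradient is what backpropagation needs."""
    return 1.0 / (1.0 + math.exp(-x))

def threshold(x):
    """Hard-limiting activation: only a 0/1 output, so it suits
    categorical outputs but gives no usable gradient."""
    return 1.0 if x >= 0 else 0.0
```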
Initial Weight Values
- Weights need to be randomized initially
  - if all weights are set to the same number, the GDR (generalized delta rule) would never be able to leave the starting point
- The backpropagation algorithm may also have difficulty if the connection weights are prematurely saturated (> 0.9)
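A common way to satisfy both points is to draw small random values in a symmetric range (a sketch; the range ±0.5 is an illustrative choice, not prescribed by the slide):

```python
import random

def init_weights(n_weights, limit=0.5, seed=0):
    """Small random initial weights: randomization breaks the symmetry that
    would trap the GDR, and keeping |w| < limit avoids premature saturation."""
    rng = random.Random(seed)
    return [rng.uniform(-limit, limit) for _ in range(n_weights)]

w = init_weights(20)
```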
Learning Rate and Momentum
- The learning rate (η) affects the speed of convergence
  - if it is large (> 0.5), the weights are changed more drastically, but this may cause the optimum to be overshot
  - if it is small (< 0.2), the weights are changed in smaller increments, causing the system to converge more slowly, with little oscillation
- The best training occurs when the learning rate is as large as possible without leading to oscillation
- The learning rate can be increased as learning progresses, or a momentum term added, to improve network learning speed
- The momentum factor (α) can damp out the oscillations caused by increasing the learning rate
  - a momentum of 0.9 allows higher learning rates
 
Presenting Patterns to the ANN
- Present patterns in a random fashion
- If input patterns can be easily classified, do not train the ANN on all patterns in a class in succession
  - the ANN will forget information as it moves from class to class
- Shaping can be used to improve network training
  - involves starting with a very small, cohesive data set and then adding patterns with greater deviations from the variable means as training progresses
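Random presentation is usually implemented by re-shuffling the training set every epoch, so no class is ever seen in one long run (a sketch; `patterns` stands in for real training pairs):

```python
import random

def epochs_in_random_order(patterns, n_epochs, seed=1):
    """Yield the training patterns in a fresh random order each epoch."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        order = patterns[:]      # copy, then shuffle the copy
        rng.shuffle(order)
        yield order

all_orders = list(epochs_in_random_order(list(range(6)), n_epochs=3))
```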
Stopping Criteria for Training
- Training should be stopped when one of the following conditions is met
  - the testing error is sufficiently small
  - the testing error begins to increase
  - a set number of iterations has passed
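The three conditions translate directly into a stopping check run after each training iteration (a sketch; the tolerance and iteration budget are illustrative values):

```python
def should_stop(test_errors, tolerance=0.01, max_iterations=1000):
    """Stop when the testing error is small enough, starts to rise,
    or the iteration budget is exhausted."""
    if not test_errors:
        return False
    if test_errors[-1] <= tolerance:
        return True                                        # error sufficiently small
    if len(test_errors) >= 2 and test_errors[-1] > test_errors[-2]:
        return True                                        # testing error increasing
    return len(test_errors) >= max_iterations              # iteration limit reached

stop_small = should_stop([0.5, 0.1, 0.005])   # True: error below tolerance
stop_rising = should_stop([0.5, 0.1, 0.12])   # True: testing error went up
keep_going = should_stop([0.5, 0.3, 0.2])     # False: still improving
```

Stopping when the testing error turns upward is the classic guard against memorizing the training data.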
 
Improving Model Performance
- Re-initialize the network weights to a new set of random values
  - re-train the model
- Adjust the learning rate and momentum
- Modify the stopping criteria
- Prune network weights
- Use genetic algorithms to adjust the network topology
- Add noise to the training cases to decrease the chance of memorization
ANN Modelling Approach
Needs and Suitability Assessment
- What are the modelling needs?
- Is the ANN technique suitable to meet these needs?
- Can the following be met?
  - data requirements
  - software requirements
  - hardware requirements
  - personnel requirements
 
Data Collection and Analysis
- Successful models require careful attention to data details
- Recall: relevant historical data is a key requirement
- For data collection, investigate
  - data availability
    - parameters, time-frame, frequency, format
  - QA/QC protocols
  - data reliability
  - process changes
 
- Data requirements and guidelines
  - data for each of the parameters must be available
  - at least one full cycle of data must be available
  - appropriate QA/QC protocols must be in place
  - data collected prior to major process changes should generally not be used
- Data analysis involves
  - data characterization
  - a complete statistical analysis
 
- Data characterization for each parameter
  - qualitative assessment of hourly, daily, and seasonal trends (graphical examination of the data)
  - time-series analyses may be warranted
- Statistical analysis for each parameter
  - measures of central tendency
    - mean, median, mode
  - measures of variability
    - standard deviation, variance
  - percentile analyses
  - identification of outliers, erroneous entries, and non-entries
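The per-parameter statistics, and a crude outlier screen, can be computed with the standard library (a sketch; the readings and the 2-standard-deviation rule are invented for illustration):

```python
import statistics

data = [4.1, 4.4, 4.4, 5.0, 5.2, 6.8, 19.5]   # hypothetical readings; 19.5 is suspect

central = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}
variability = {
    "stdev": statistics.stdev(data),
    "variance": statistics.variance(data),
}
# A crude outlier screen: flag points more than 2 standard deviations from the mean
outliers = [x for x in data if abs(x - central["mean"]) > 2 * variability["stdev"]]
```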
Application of a Model-Building Protocol
- There is no accepted best method of developing ANN models
- An infinite number of distinct architectures is possible
- A protocol serves to reduce the number of architectures that are evaluated
- A sample five-step protocol
 
- Selection of model inputs and outputs
  - Why it's important
    - ANN models are based on process inputs and outputs
  - How it's done
    - first, select the model output
      - the best models have only one output parameter
    - next, select the model inputs from the available input parameters
      - selection is based on data availability, literature, and expert knowledge
- Selection and organization of data patterns
  - Why it's important
    - ANN models are only as good as the data used
    - separate, independent data sets are required to test and validate the model
  - How it's done
    - examine each data pattern for erroneous entries, outliers, and blank entries
    - delete questionable data patterns
    - sort and divide the data into training, testing, and production sets
    - perform a statistical analysis on each of the three data sets
- Determination of architecture characteristics
  - Why it's important
    - each modelling scenario has an optimal architecture
  - How it's done
    - initially, hold many factors at software defaults or at pre-determined values
    - use modelling heuristics from the literature along with expert knowledge
    - determine the number of hidden-layer neurons
    - compare the results of different runs
- Evaluation of model stability
  - Why it's important
    - ensures that model results are independent of the method of data sorting
  - How it's done
    - build new training, testing, and production sets from the original database
    - re-train the models on the new data sets
    - compare the results with the initial runs
- Model fine-tuning
  - Why it's important
    - some models require minor improvements in order to meet process operating criteria
  - How it's done
    - modelling parameters previously held constant can be varied to improve model results
    - the fine-tuning methodology is typically researcher-specific
Evaluating Model Performance
- In many situations, more than one good model can be developed
- The best model is the one that both
  - meets the modelling needs initially identified
  - offers the smallest prediction errors
- We therefore need to be able to evaluate model performance
- Prediction errors can be assessed
  - graphically
    - visual representation of missed predictions
 
- ...and using statistics
  - absolute measures of error
    - mean absolute error (MAE)
    - maximum absolute error
  - relative measures of error
    - mean absolute percent error (MAPE)
  - coefficients of correlation
    - coefficient of correlation (r)
    - coefficient of multiple correlation (R)
  - coefficients of determination
    - coefficient of determination (r²)
    - coefficient of multiple determination (R²)
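The absolute and relative error measures are one-liners; a sketch with invented actual/predicted values:

```python
def mae(actual, predicted):
    """Mean absolute error: an absolute measure of error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def max_abs_error(actual, predicted):
    """Maximum absolute error: the single worst miss."""
    return max(abs(a - p) for a, p in zip(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percent error: a relative measure (actuals must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 20.0, 40.0]
predicted = [12.0, 19.0, 40.0]
```

Absolute measures carry the output's units; relative measures allow comparison across outputs of different scales.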
 
- Model residuals should also be studied
  - residual = prediction error
  - residuals should
    - be normally distributed
      - plot a histogram of the residuals
    - have a mean of 0
    - be independent
      - plot the residuals (in time order, if applicable)
      - the plot should be free of obvious trends
    - have constant variance
      - plot the residuals against the predicted values
      - the plot should not show spreading, converging, or other trends
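The numerical part of these checks (mean near 0, spread of the residuals) is straightforward; the distribution and trend checks are done from the plots described above. A sketch with invented values:

```python
import statistics

actual = [10.0, 12.0, 11.0, 13.0, 12.5]
predicted = [10.4, 11.5, 11.2, 12.9, 12.6]      # hypothetical model outputs
residuals = [a - p for a, p in zip(actual, predicted)]   # residual = prediction error

mean_residual = statistics.mean(residuals)      # should be close to 0
spread = statistics.stdev(residuals)            # used when judging constant variance
```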
Model Evaluation Using Real-time Data
- Need to consider
  - changes in the frequency of data collection
  - changes in the methodology of data collection
  - the existence of QA/QC protocols to detect erroneous data
- The evaluation can take the form of
  - simulated real-time testing on a stand-alone PC
  - online testing in real time
 
- Methodology
  - select the time frame of the test
  - port the data to the developed models
  - process each data pattern and record the results
  - compare the model-predicted values to the actual values
  - determine the prediction errors