Outline Lecture 1

Transcript and Presenter's Notes
1
EE459 Neural Networks: How to design a good-performance NN?
Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University
2
Glossary
  • Pattern: a complete set of data inputs and
    outputs that provide a snapshot of the system
    being modelled
  • also called examples, cases
  • Feature: an identifying characteristic in the
    data that the model will ideally capture
  • Domain: a set of boundaries that define the range
    of expected / observed data for a particular
    model or problem

3
Backpropagation Modelling Heuristics
  • Selection of model inputs and outputs
  • Defining the model domain
  • Data pre-processing
  • Selection of training/testing cases
  • Data scaling
  • Number of hidden layers
  • Number of neurons
  • Activation function selection
  • Initial weight values
  • Learning rate and momentum
  • Presentation of patterns
  • Stopping criteria for training
  • Improving model performance

4
Selection of Model Inputs and Outputs
  • Start with a set of inputs that are KNOWN to
    affect the process
  • then add other inputs that are suspected of
    having a relationship in the process one at a
    time
  • Eliminate input variables that are redundant
    (high covariance)
  • Eliminate data patterns that do not contribute to
    training (no new information)
  • Identify / eliminate data patterns with the same
    inputs that have different outputs
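The redundancy check above (eliminating inputs with high covariance) can be sketched with a simple Pearson-correlation filter. This is a minimal illustration; the 0.95 cutoff is an assumed example threshold, not taken from the slides:

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

def redundant_pairs(columns, threshold=0.95):
    """Flag pairs of input variables whose high correlation makes
    one of each pair a candidate for elimination."""
    n = len(columns)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pearson(columns[i], columns[j])) > threshold]
```

Here `columns` is a list of per-variable value lists; which variable of a flagged pair to keep would be decided by domain knowledge.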

5
Defining the Model Domain
  • ANNs should be confined to a limited domain
  • develop separate models for contradictory areas
    of the domain
  • Build models that predict a single output
  • link multiple models together, if required

6
Data Pre-processing
  • Any time the dynamic range of an input is over a
    few orders of magnitude, a logarithmic
    transformation should be applied
  • Transformations may also be useful in
    reconditioning input data when an ANN has
    trouble converging
  • Non-numerical inputs need to be codified
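The two pre-processing heuristics above might be sketched as follows; the one-hot coding of non-numerical inputs is one common codification scheme, assumed here rather than prescribed by the slide:

```python
import math

def log_rescale(values):
    """Logarithmic transformation for an input whose dynamic range
    spans more than a few orders of magnitude."""
    return [math.log10(v) for v in values]

def one_hot(category, categories):
    """Codify a non-numerical input as a 0/1 indicator vector."""
    return [1.0 if category == c else 0.0 for c in categories]
```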

7
Selection of Training/Testing Cases
  • Training cases should be representative of the
    problem domain
  • For good generalization capacity, the training
    set must be complete
  • all important variables must be measured
  • Data set is randomly divided into training and
    testing sets
  • in a 70:30 ratio, for example
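A minimal sketch of the random 70:30 division described above (the fixed seed is an assumption added for reproducibility):

```python
import random

def split_patterns(patterns, train_frac=0.7, seed=42):
    """Randomly divide a data set into training and testing sets
    (70:30 by default, matching the slide's example)."""
    rng = random.Random(seed)        # fixed seed so the split is repeatable
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```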

8
  • Rules of thumb
  • The number of training patterns should be at
    least 5 times the number of nodes in the network
  • The number of training cases should be roughly
    the number of weights times the inverse of the
    accuracy parameter ε (ε = 0.9 means 90% accuracy
    of prediction is required)
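Read literally, the two rules of thumb above could be combined as below; taking the larger of the two estimates is my interpretation, not something the slide states:

```python
def min_training_patterns(n_nodes, n_weights, accuracy=0.9):
    """Larger of the two slide heuristics: at least 5 patterns per
    network node, and roughly weights / accuracy-parameter patterns."""
    return max(5 * n_nodes, round(n_weights / accuracy))
```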

9
Data Scaling
  • Output variables should be scaled in the 0.1 to
    0.9 range
  • avoids operating in the saturation range of the
    sigmoid function
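The 0.1-to-0.9 scaling above is a simple linear map; a minimal sketch:

```python
def scale_outputs(values, lo=0.1, hi=0.9):
    """Linearly map output values into [0.1, 0.9] so training targets
    stay out of the sigmoid's saturation range."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```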

10
Number of Hidden Layers
  • Increasing the number of hidden layers increases
    both the time and number of patterns (examples)
    required for training
  • For most problems, one hidden layer will suffice
  • problems can always be solved with two
  • Multiple slabs (Ward Nets) supposedly increase
    processing power
  • each slab (group of neurons) acts as a detector
    for one or more input features

11
Number of Neurons
  • One neuron in the input layer for each input
  • One neuron in the output layer for each output
  • Proper number of hidden neurons is often
    determined experimentally
  • too few: poor ability to capture features
  • too many: poor ability to generalize (the ANN
    simply memorizes the training data)
  • Various rules of thumb have been reported
  • 0.75N
  • 2N + 1

N = number of inputs
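The two rules of thumb above, evaluated for a given number of inputs (rounding 0.75N to the nearest integer is an assumption):

```python
def hidden_neuron_candidates(n_inputs):
    """The slide's two rules of thumb for hidden-layer size:
    0.75*N and 2*N + 1, where N is the number of inputs."""
    return round(0.75 * n_inputs), 2 * n_inputs + 1
```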
12
  • In general, more weights introduce more local
    minima to the error surface
  • Flat regions in the error surface can mislead
    gradient-based error minimization methods
    (backpropagation)
  • Start with a small network and then add
    connections as needed
  • avoids convergence problems as the networks get
    too large
  • The optimum ratio of hidden neurons in the first
    to the second hidden layer is 3:1

13
Activation Function Selection
  • Sigmoidal (logistic) activation functions are the
    most widely used
  • Thresholding functions are only useful for
    categorical outputs
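The two activation functions contrasted above, in a minimal sketch:

```python
import math

def sigmoid(x):
    """Logistic (sigmoidal) activation: smooth and differentiable,
    with output in (0, 1); the most widely used choice."""
    return 1.0 / (1.0 + math.exp(-x))

def hard_threshold(x):
    """Thresholding activation: only useful for categorical outputs."""
    return 1.0 if x >= 0.0 else 0.0
```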

14
Initial Weight Values
  • Weights need to be randomized initially
  • if all weights are set to the same value, the
    GDR (generalized delta rule) would never be able
    to leave the starting point
  • The backpropagation algorithm may also have
    difficulty if the connection weights are
    prematurely saturated (> 0.9)
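A minimal sketch of randomized initialization; the ±0.5 range and fixed seed are illustrative assumptions, not values from the slide:

```python
import random

def init_weights(n_weights, limit=0.5, seed=1):
    """Small random initial weights: identical values would leave
    gradient descent unable to break symmetry, and large magnitudes
    would start the sigmoids near saturation."""
    rng = random.Random(seed)
    return [rng.uniform(-limit, limit) for _ in range(n_weights)]
```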

15
Learning Rate and Momentum
  • Learning rate (η) affects the speed of
    convergence
  • If it is large (> 0.5), the weights will be
    changed more drastically, but this may cause the
    optimum to be overshot
  • If it is small (< 0.2), the weights will be
    changed in smaller increments, thus causing the
    system to converge more slowly, with little
    oscillation
  • The best training occurs when the learning rate
    is as large as possible without leading to
    oscillation

16
  • The learning rate can be increased as learning
    progresses, or a momentum term added, to improve
    network learning speed
  • The momentum factor (α) has the ability to damp
    out oscillations caused by increasing the
    learning rate
  • a momentum of 0.9 allows higher learning rates
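One common form of the momentum-augmented weight update described above, as a sketch (the symbols η and α follow convention; the specific update form is assumed, since the slides do not spell it out):

```python
def momentum_update(weight, gradient, velocity, lr=0.3, momentum=0.9):
    """One gradient-descent step with a momentum term:
    v <- momentum * v - lr * gradient;  w <- w + v."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity
```

With a repeated gradient the velocity term grows, accelerating progress; when the gradient flips sign, the same term damps the oscillation.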

17
Presenting Patterns to the ANN
  • Present patterns in a random fashion
  • If input patterns can be easily classified, do
    not train the ANN on all patterns in a class in
    succession
  • the ANN will forget information as it moves from
    class to class
  • Shaping can be used to improve network training
  • involves starting off on a very small cohesive
    data set and then adding more patterns that have
    greater deviations from variable means as
    training progresses
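Random presentation of patterns, as recommended above, can be sketched as a per-epoch reshuffle (the fixed seed is an added assumption for reproducibility):

```python
import random

def shuffled_epochs(patterns, n_epochs, seed=7):
    """Yield training patterns in a fresh random order each epoch,
    so no class's patterns are ever presented all in succession."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        order = list(patterns)
        rng.shuffle(order)
        yield from order
```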

18
Stopping Criteria for Training
  • Training should be stopped when one of the
    following conditions is met
  • the testing error is sufficiently small
  • the testing error begins to increase
  • a set number of iterations has passed
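The three stopping conditions above combine naturally into one check; the tolerance and iteration cap below are illustrative defaults, not values from the slide:

```python
def should_stop(test_errors, tol=1e-3, max_iters=1000):
    """Stop training when the testing error is sufficiently small,
    begins to increase, or the iteration cap is reached."""
    if not test_errors:
        return False
    if test_errors[-1] < tol:                 # error sufficiently small
        return True
    if len(test_errors) >= 2 and test_errors[-1] > test_errors[-2]:
        return True                           # testing error rising
    return len(test_errors) >= max_iters      # iteration cap reached
```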

19
Improving Model Performance
  • Re-initialize network weights to a new set of
    random values
  • re-train the model
  • Adjust learning rate and momentum
  • Modify stopping criteria
  • Prune network weights
  • Use genetic algorithms to adjust network topology
  • Add noise to training cases to decrease the
    chance of memorization

20
ANN Modelling Approach
21
Needs and Suitability Assessment
  • What are the modelling needs?
  • Is the ANN technique suitable to meet these
    needs?
  • Can the following be met?
  • data requirements
  • software requirements
  • hardware requirements
  • personnel requirements

22
Data Collection and Analysis
  • Successful models require careful attention to
    data details
  • Recall that relevant historical data is a key
    requirement
  • For data collection, investigate
  • data availability
  • parameters, time-frame, frequency, format
  • QA/QC protocols
  • data reliability
  • process changes

23
  • Data requirements and guidelines
  • data for each of the parameters must be available
  • at least one full cycle of data must be available
  • appropriate QA/QC protocols must be in place
  • data collected prior to major process changes
    should generally not be used
  • Data analysis involves
  • data characterization
  • a complete statistical analysis

24
  • Data characterization for each parameter
  • qualitative assessment of hourly, daily, seasonal
    trends (graphical examination of data)
  • time-series analyses may be warranted
  • Statistical analysis for each parameter
  • measures of central tendency
  • mean, median, mode
  • measures of variability
  • standard deviation, variance
  • percentile analyses
  • identification of outliers, erroneous entries,
    non-entries

25
Application of a Model-Building Protocol
  • There is no accepted best method of developing
    ANN models
  • An infinite number of distinct architectures are
    possible
  • A protocol is used to reduce the number of
    architectures that must be evaluated

26
  • A sample five-step protocol

27
  • Selection of model inputs and outputs
  • Why it's important
  • ANN models are based on process inputs and
    outputs
  • How it's done
  • first, select the model output
  • best models only have one output parameter
  • next, select model inputs from available input
    parameters
  • selection is based on data availability,
    literature, expert knowledge

28
  • Selection and organization of data patterns
  • Why it's important
  • ANN models are only as good as the data used
  • separate independent data sets are required to
    test and validate the model
  • How it's done
  • examine each data pattern for erroneous entries,
    outliers, blank entries
  • delete questionable data patterns
  • sort and divide data into training, testing, and
    production sets
  • perform a statistical analysis on each of the
    three data sets

29
  • Determination of architecture characteristics
  • Why it's important
  • each modeling scenario will have an optimal
    architecture
  • How it's done
  • initially, hold many factors at software defaults
    or at pre-determined values
  • use modelling heuristics from literature along
    with expert knowledge
  • determine the number of hidden layer neurons
  • compare results for different runs

30
  • Evaluation of model stability
  • Why it's important
  • ensure that model results are independent of the
    method of data sorting
  • How it's done
  • build new training, testing, and production sets
    from the original database
  • re-train models on the new data sets
  • compare results with initial runs

31
  • Model fine tuning
  • Why it's important
  • some models require minor improvements in order
    to meet process operating criteria
  • How it's done
  • modeling parameters previously held constant can
    be varied to improve model results
  • the model fine-tuning methodology is typically
    researcher-specific

32
Evaluating Model Performance
  • In many situations, more than one good model can
    be developed
  • The best model is the one that both
  • meets the modelling needs initially identified
  • offers the smallest prediction errors
  • Therefore, one needs to be able to evaluate
    model performance

33
  • Prediction errors can be assessed
  • graphically
  • visual representation of missed predictions

34
  • using statistics
  • absolute measures of error
  • mean absolute error (MAE)
  • maximum absolute error
  • relative measures of error
  • mean absolute percent error (MAPE)
  • coefficients of correlation
  • coefficient of correlation (r)
  • coefficient of multiple correlation (R)
  • coefficients of determination
  • coefficient of determination (r2)
  • coefficient of multiple determination (R2)
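Three of the error statistics listed above, in a minimal sketch:

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percent error (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination (r^2): 1 minus the ratio of
    residual to total sum of squares."""
    m = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```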

35
  • Model residuals should also be studied
  • residual = prediction error (actual value minus
    predicted value)
  • residuals should be
  • Normally distributed
  • plot a histogram of the residuals
  • have a mean of 0
  • independent
  • plot the residuals (in time order if applicable)
  • plot should be free of obvious trends
  • have constant variance
  • plot the residuals against the predicted values
  • plot should not show spreading, converging, or
    other trends

36
Model Evaluation Using Real-time Data
  • Need to consider
  • changes in the frequency of data collection
  • changes in the methodology of data collection
  • existence of QA/QC protocols to detect erroneous
    data
  • The evaluation can take the form of
  • simulated real-time testing on a stand-alone PC
  • online testing in real-time

37
  • Methodology
  • select the time frame of the test
  • the data is ported to the developed models
  • process each data pattern and record the results
  • the model predicted values are compared to the
    actual values
  • prediction errors are determined