1
Time Series Forecasting With Feed-Forward Neural
Networks: Guidelines And Limitations
  • Eric Plummer
  • Computer Science Department
  • University of Wyoming
  • June 6, 2018

2
Topics
  • Thesis Goals
  • Time Series Forecasting
  • Neural Networks
  • K-Nearest-Neighbor
  • Test-Bed Application
  • Empirical Evaluation
  • Data Preprocessing
  • Contributions
  • Future Work
  • Conclusion
  • Demonstration

3
Thesis Goals
  • Compare neural networks and k-nearest-neighbor
    for time series forecasting
  • Analyze the response of various configurations to
    data series with specific characteristics
  • Identify when neural networks and
    k-nearest-neighbor are inadequate
  • Evaluate the effectiveness of data preprocessing

4
Time Series Forecasting Description
  • What is it?
  • Given an existing data series, observe or model
    the data series to make accurate forecasts
  • Example data series
  • Financial (e.g., stocks, rates)
  • Physically observed (e.g., weather, sunspots)
  • Mathematical (e.g., Fibonacci sequence)

5
Time Series Forecasting Difficulties
  • Why is it difficult?
  • Limited quantity of data
  • Observed data series sometimes too short to
    partition
  • Noise
  • Erroneous data points
  • Obscuring component
  • Remedy: moving average
  • Nonstationarity
  • Fundamentals change over time
  • Nonstationary mean (e.g., an ascending data
    series)
  • Remedy: first-difference preprocessing
  • Forecasting method selection
  • Statistics
  • Artificial intelligence

6
Time Series Forecasting Importance
  • Why is it important?
  • Preventing undesirable events by forecasting the
    event, identifying the circumstances preceding
    the event, and taking corrective action so the
    event can be avoided (e.g., inflationary economic
    period)
  • Forecasting undesirable, yet unavoidable, events
    to preemptively lessen their impact (e.g., solar
    maximum w/ sunspots)
  • Profiting from forecasting (e.g., financial
    markets)

7
Neural Networks Background
  • Loosely based on the human brain's neuron
    structure
  • Timeline
  • 1940s: McCulloch and Pitts proposed neuron
    models in the form of binary threshold devices
    and stochastic algorithms
  • 1950s – 1960s: Rosenblatt's class of learning
    machines called perceptrons
  • Late 1960s: Minsky and Papert's discouraging
    analysis of perceptrons (limited to linearly
    separable classes)
  • 1980s: Rumelhart, Hinton, and Williams'
    generalized delta rule for learning by
    back-propagation for training multilayer
    perceptrons
  • Present: many new training algorithms and
    architectures, but nothing revolutionary

8
Neural Networks Architecture
  • A feed-forward neural network can have any number
    of
  • Layers
  • Units per layer
  • Network inputs
  • Network outputs
  • Hidden layers (A, B)
  • Output layer (C)

9
Neural Networks Units
  • A unit has
  • Connections
  • Weights
  • Bias
  • Activation function
  • Weights and bias are randomly initialized before
    training
  • A unit's input consists of
  • The sum of the products of each connection value
    and its associated weight
  • Plus the bias
  • The input is then fed into the unit's activation
    function
  • The unit's output is the output of the activation
    function
  • Hidden layers: sigmoid
  • Output layer: linear
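A unit's computation can be sketched as follows (an illustrative Python sketch of the slide's description, not FORECASTER's Visual C++ code; the function name is an assumption):

```python
import math

def unit_output(inputs, weights, bias, hidden=True):
    """One unit: sum of the products of each connection value and its
    associated weight, plus the bias, fed through the activation function."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    if hidden:
        # Hidden-layer units use a sigmoid activation
        return 1.0 / (1.0 + math.exp(-net))
    # Output-layer units use a linear activation
    return net
```

For example, `unit_output([0.0, 0.0], [1.0, 1.0], 0.0)` is the sigmoid of 0, i.e. 0.5.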

10
Neural Networks Training
  • Partition data series into
  • Training set
  • Validation set (optional)
  • Test set (optional)
  • Typically, the training procedure is
  • Perform backpropagation training with training
    set
  • After n epochs, compute total squared error on
    training set and validation set
  • If validation error consistently increases while
    training error decreases, stop training
  • Overfitting: training set learned too well
  • Generalization: able to forecast accurately given
    inputs not in the training and validation sets

11
Neural Networks Training
  • Backpropagation training
  • First, examples in the form of <input, output>
    pairs are extracted from the data series
  • Then, the network is trained with backpropagation
    on the examples
  • Present an example's input vector to the network
    inputs and run the network sequentially forward
  • Propagate the error sequentially backward from
    the output layer
  • For every connection, change the weight modifying
    that connection in proportion to the error
  • When all three steps have been performed for all
    examples, one epoch has occurred
  • Goal is to converge to a near-optimal solution
    based on the total squared error
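The three steps above can be sketched for a network with one sigmoid hidden layer and a linear output unit (an illustrative Python sketch, not the thesis implementation; the function names and learning rate are assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_epoch(examples, W1, b1, W2, b2, lr=0.1):
    """One backpropagation epoch: forward pass, backward error
    propagation, and weight changes for every <input, output> example."""
    total_sq_error = 0.0
    for x, target in examples:
        # 1. Run the network sequentially forward
        h = [sigmoid(sum(xi * w for xi, w in zip(x, ws)) + b)
             for ws, b in zip(W1, b1)]
        y = sum(hj * w for hj, w in zip(h, W2)) + b2
        # 2. Propagate the error sequentially backward from the output layer
        err = target - y
        total_sq_error += err * err
        deltas = [err * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        # 3. Change each weight in proportion to the error it carried
        for j in range(len(W2)):
            W2[j] += lr * err * h[j]
        b2 += lr * err
        for j, d in enumerate(deltas):
            for i in range(len(x)):
                W1[j][i] += lr * d * x[i]
            b1[j] += lr * d
    return W1, b1, W2, b2, total_sq_error
```

Each call is one epoch; repeated calls should drive the total squared error toward a near-optimal solution.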

12
Neural Networks Training
Backpropagation training cycle
13
Neural Networks Forecasting
  • Forecasting method depends on examples
  • Examples depend on step-ahead size

If step-ahead size is one: iterative forecasting
If step-ahead size is greater than one: direct
forecasting
14
Neural Networks Forecasting
Iterative forecasting
Can continue this indefinitely
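Iterative forecasting can be sketched as follows (illustrative Python; `model` stands for any one-step-ahead forecaster, e.g. a trained network mapping an input window to the next value):

```python
def iterative_forecast(history, model, steps):
    """Feed each one-step-ahead forecast back in as an input,
    so forecasting can continue indefinitely."""
    window = list(history)
    forecasts = []
    for _ in range(steps):
        y = model(window)             # one-step-ahead forecast
        forecasts.append(y)
        window = window[1:] + [y]     # slide the window, reusing the forecast
    return forecasts
```

With a toy model that adds 1 to the last value, `iterative_forecast([1, 2, 3], lambda w: w[-1] + 1, 3)` returns `[4, 5, 6]`.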
15
Neural Networks Forecasting
Directly forecasting n steps
This is the only forecast
16
K-Nearest-Neighbor Forecasting
  • No model to train
  • Simple linear search
  • Compare reference to candidates
  • Select k candidates with lowest error
  • Forecast is average of k next values
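The five bullets above map directly onto a short sketch (illustrative Python; the squared-error comparison is an assumption consistent with the surrounding slides):

```python
def knn_forecast(series, k, window_size):
    """Simple linear search: compare the most recent window (the
    reference) against every earlier candidate window, select the k
    candidates with the lowest error, and average the k next values."""
    reference = series[-window_size:]
    candidates = []
    for start in range(len(series) - window_size):
        window = series[start:start + window_size]
        err = sum((r - c) ** 2 for r, c in zip(reference, window))
        candidates.append((err, series[start + window_size]))
    candidates.sort(key=lambda pair: pair[0])
    return sum(next_val for _, next_val in candidates[:k]) / k
```

On a periodic series such as `[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3]` with `k=1` and `window_size=3`, the nearest candidates are earlier copies of `[1, 2, 3]`, so the forecast is `4.0`.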

17
Test-Bed Application FORECASTER
  • Written in Visual C++ with MFC
  • Object-oriented
  • Multithreaded
  • Wizard-based
  • Easily modified
  • Implements feed-forward neural networks and
    k-nearest-neighbor
  • Used for time series forecasting
  • Eventually will be upgraded for classification
    problems

18
Empirical Evaluation Data Series
(Plots of the five evaluation data series: original,
less noisy, more noisy, ascending, and sunspots.)
19
Empirical Evaluation Neural Network
Architectures
  • Number of network inputs based on data series
  • Need to make unambiguous examples
  • For sawtooths
  • 24 inputs are necessary
  • Test networks with 25 and 35 inputs
  • Test networks with 1 hidden layer with 2, 10, and
    20 hidden layer units
  • One output layer unit
  • For sunspots
  • 30 inputs
  • 1 hidden layer with 30 units
  • For real-world data series, selection may be
    trial-and-error!

20
Empirical Evaluation Neural Network Training
  • Heuristic method
  • Start with aggressive learning rate
  • Gradually lower learning rate as validation error
    increases
  • Stop training when learning rate cannot be
    lowered anymore
  • Simple method
  • Use conservative learning rate
  • Training stops when
  • Number of training epochs equals the epochs limit
    -or-
  • Training error is less than or equal to error
    limit

21
Empirical Evaluation Neural Network Forecasting
  • Metric to compare forecasts Coefficient of
    Determination
  • Value may be in (-∞, 1]
  • Want value between 0 and 1, where 0 is
    forecasting the mean of the data series and 1 is
    forecasting the actual value
  • Must have actual values to compare with
    forecasted values
  • For networks trained on original, less noisy, and
    more noisy data series, forecast will be compared
    to original series
  • For networks trained on ascending data series,
    forecast will be compared to continuation of
    ascending series
  • For networks trained on sunspots data series,
    forecast will be compared to test set
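The exact formula is given in the appendix slide; a common form of the coefficient of determination that matches the properties above (1 for a perfect forecast, 0 for forecasting the mean, unbounded below) can be sketched as:

```python
def coefficient_of_determination(actual, forecast):
    """1 - SSE/SST: compares forecast error against the error of
    always forecasting the mean of the actual values."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))  # forecast error
    sst = sum((a - mean) ** 2 for a in actual)                 # mean-forecast error
    return 1.0 - sse / sst
```

A perfect forecast scores 1.0, forecasting the mean scores 0.0, and forecasts worse than the mean go negative.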

22
Empirical Evaluation K-Nearest-Neighbor
  • Choosing window size analogous to choosing number
    of neural network inputs
  • For sawtooth data series
  • k = 2
  • Test window sizes of 20, 24, and 30
  • For sunspots data series
  • k = 3
  • Window size of 10
  • Compare forecasts via coefficient of determination

23
Empirical Evaluation Candidate Selection
  • Neural networks
  • For each training method, data series, and
    architecture, 3 candidates were trained
  • Also, the average of the 3 candidates' forecasts
    was taken ("forecasting by committee")
  • Best forecast was selected based on coefficient
    of determination
  • K-nearest-neighbor
  • For each data series, k, and window size, only
    one search was performed (only one needed)

24
Empirical Evaluation Original Data Series
(Forecast plots for the original data series: simple
NN, heuristic NN, smaller NN, and k-NN.)
25
Empirical Evaluation Less Noisy Data Series
(Forecast plots for the less noisy data series:
simple NN, heuristic NN, and k-NN.)
26
Empirical Evaluation More Noisy Data Series
(Forecast plots for the more noisy data series:
simple NN, heuristic NN, and k-NN.)
27
Empirical Evaluation Ascending Data Series
(Forecast plots for the ascending data series:
simple NN and heuristic NN.)
28
Empirical Evaluation Longer Forecast
(Longer forecast plot: heuristic NN.)
29
Empirical Evaluation Sunspots Data Series
(Forecast plots for the sunspots data series: simple
NN and k-NN.)
30
Empirical Evaluation Discussion
  • Heuristic training method observations
  • Networks train longer (more epochs) on smoother
    data series like the original and ascending data
    series
  • The total squared error and unscaled error are
    higher for noisy data series
  • Neither the number of epochs nor the errors
    appear to correlate well with the coefficient of
    determination
  • In most cases, the committee forecast is worse
    than the best candidate's forecast
  • When actual values are unavailable, choosing the
    best candidate is difficult!

31
Empirical Evaluation Discussion
  • Simple training method observations
  • The total squared error and unscaled error are
    higher for noisy data series with the exception
of the 35-10-1 network trained on the more noisy
    data series
  • The errors do not appear to correlate well with
    the coefficient of determination
  • In most cases, the committee forecast is worse
    than the best candidate's forecast
  • There are four networks whose coefficient of
    determination is negative, compared with two for
    the heuristic training method


32
Empirical Evaluation Discussion
  • General observations
  • One training method did not appear to be clearly
    better
  • Increasingly noisy data series increasingly
    degraded the forecasting performance
  • Nonstationarity in the mean degraded the
    performance
  • Networks with too few hidden units (e.g., 35-2-1)
    forecasted well on simpler data series, but failed
    on more complex ones
  • Excessive numbers of hidden units (e.g., 35-20-1)
    did not hurt performance
  • Twenty-five network inputs was not sufficient
  • K-nearest-neighbor was consistently better than
    the neural networks
  • Feed-forward neural networks are extremely
    sensitive to architecture and parameter choices,
    and making such choices is currently more art
    than science, more trial-and-error than absolute,
    more practice than theory!

33
Data Preprocessing
  • First-difference
  • For ascending data series, a neural network
    trained on first-difference can forecast near
    perfectly
  • In that case, it is better to train and forecast
    on first-difference
  • FORECASTER reconstitutes forecast from its
    first-difference
  • Moving average
  • For noisy data series, moving average would
    eliminate much of the noise
  • But would also smooth out peaks and valleys
  • Series may then be easier to learn and forecast
  • But in some series, the noise may be important
    data (e.g., utility load forecasting)
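Both preprocessing steps are simple transforms; a minimal sketch (illustrative Python on plain lists; FORECASTER's own reconstitution works analogously from the forecast's first-difference):

```python
def first_difference(series):
    """Remove a nonstationary mean (e.g. an ascending trend) by
    replacing each value with its change from the previous one."""
    return [b - a for a, b in zip(series, series[1:])]

def reconstitute(first_value, diffs):
    """Invert first-differencing: cumulatively add the differences
    back onto the first original value."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out

def moving_average(series, width):
    """Smooth noise, at the cost of also flattening peaks and valleys."""
    return [sum(series[i:i + width]) / width
            for i in range(len(series) - width + 1)]
```

`reconstitute(series[0], first_difference(series))` returns the original series.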

34
Contributions
  • Filled a void within the feed-forward neural
    network time series forecasting literature:
    knowing how networks respond to various data
    series characteristics in a controlled environment
  • Showed that k-nearest-neighbor is a better
    forecasting method for the data series used in
    this research
  • Reaffirmed that neural networks are very
    sensitive to architecture, parameter, and
    learning method changes
  • Presented some insight into neural network
    architecture selection: selecting the number of
    network inputs based on the data series
  • Presented a neural network training heuristic
    that produced good results

35
Future Work
  • Upgrade FORECASTER to work with classification
    problems
  • Add more complex network types, including wavelet
    networks for time series forecasting
  • Investigate k-nearest-neighbor further
  • Add other forecasting methods, (e.g., decision
    trees for classification)

36
Conclusion
  • Presented
  • Time series forecasting
  • Neural networks
  • K-nearest-neighbor
  • Empirical evaluation
  • Learned a lot about the implementation details of
    the forecasting techniques
  • Learned a lot about MFC programming

Thank You
37
Demonstration
Various files can be found at
http://w3.uwyo.edu/eplummer
38
Unit Output, Error, and Weight Change Formulas
39
Forecast Error Formulas
40
Related Work
  • Drossu and Obradovic (1996) hybrid stochastic
    and neural network approach to time series
    forecasting
  • Zhang and Thearling (1994) parallel
    implementations of neural networks and
    memory-based reasoning
  • Geva (1998) multiscale fast wavelet transform
    and an array of feed-forward neural networks
  • Lawrence, Tsoi, and Giles (1996) encodes the
    series with a self-organizing map and uses
    recurrent neural networks
  • Kingdon (1997) automated intelligent system for
    financial forecasting and uses neural networks
    and genetic algorithms