Chapter 9. Analysis of Time Series Data

Provided by: Agam151
Learn more at: http://auroenergy.com

1
Chapter 9. Analysis of Time Series Data
  • 9.1 Basic concepts
  • 9.2 General model formulations
  • 9.3 Smoothing methods
  • 9.4 OLS regression models
  • 9.5 Stochastic time series models
  • 9.6 ARMAX or transfer function models
  • 9.7 Quality control and process monitoring using
    control chart methods

2
9.1.1 Introduction
  • Time series data is not merely data collected
    over time. If that definition were adequate,
    almost any data set would qualify as time series
    data.
  • There must be some sort of ordering, i.e., a
    relation between successive observations.
    Successive observations in time series data are
    not independent, and their sequence must be
    maintained during the analysis.
  • Definition of time series data: a collection of
    numerical observations arranged in a natural
    order, with each observation associated with a
    particular instant of time that provides the
    ordering.
  • A practical way of ascertaining whether data
    should be treated as time series data is to
    determine whether the analysis results would
    change if the sequence of the observations were
    scrambled.
  • The importance of time series analysis is that it
    provides more insight into, and more accurate
    modeling and prediction of, time series data than
    classical statistical analysis does, because it
    treats the model residuals explicitly.

3
Fig. 9.1 Daily peak and minimum hourly loads over
several months for a large electric utility to
illustrate the diurnal, the weekday/weekend and
the seasonal fluctuations and trends.
4
9.1.2 Terminology
5
9.1.3. Basic behavior patterns
  • (a) constant process,
  • (b) linear trend,
  • (c) cyclic variation,
  • (d) impulse,
  • (e) step function,
  • (f) ramp

Much of the challenge in time series analysis is
distinguishing these basic behavior patterns when
they occur in conjunction. The problem is
compounded by the fact that processes may exhibit
these patterns at different times.
Fig. 9.3. Different characteristics of time series
(from Montgomery and Johnson 1976 by permission
of McGraw-Hill)
6
9.1.4 Illustrative Data Set
Fig. 9.4 Time series data of electric power
demand by quarter (Data from Table 9.1)
7
9.2 General Model Formulation
  • How does one model the behavior of the data shown
    in Example 9.1.1 and use it for extrapolation
    purposes?
  • There are three general time domain approaches:
  •  (a) Smoothing methods, which are really meant to
    filter the data in a computationally simple
    manner. However, they can also be used for
    extrapolation purposes.
  •  (b) OLS models, which treat time series data as
    cross-sectional data but account for the time
    variable explicitly as an independent variable.
  •  (c) Stochastic time series models, which
    explicitly treat the model residual errors of
    (b) by adding a layer of sophistication.

8
9.3 Smoothing methods
  • Two basic methods are used:
  • - Arithmetic Moving Average (AMA)
  • - Exponentially Weighted Moving Average (EWA)
  • These methods smooth out fluctuations in the
    data, making it easier to discern longer-term
    trends and thereby allowing future or trend
    predictions to be made.
  • However, though they are useful in predicting
    mean future values, they provide no information
    about the uncertainty of these predictions:
    since no modeling per se is involved, the
    standard errors needed to quantify forecast
    errors cannot be estimated.
  • The inability to quantify forecast errors is a
    serious deficiency.

9
9.3.1 Arithmetic Moving Average (AMA)
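The defining equation of this slide did not survive in the transcript. As a minimal sketch, assuming the common trailing form in which AMA(N) is the plain average of the N most recent observations, the smoothing could be written as follows. The function name and the sample series are hypothetical and are not the data of Table 9.1.

```python
import numpy as np

def arithmetic_moving_average(y, n):
    """Trailing moving average (assumed form of AMA(n)).

    Element i of the result is the mean of y[i], ..., y[i + n - 1],
    i.e. the smoothed value available once observation i + n - 1 is in.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= n <= len(y):
        raise ValueError("window length must be between 1 and len(y)")
    # A uniform kernel of weights 1/n gives the plain arithmetic mean.
    return np.convolve(y, np.ones(n) / n, mode="valid")

# Hypothetical series, for illustration only.
demand = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 17.0]
ama3 = arithmetic_moving_average(demand, 3)
ama5 = arithmetic_moving_average(demand, 5)
print(ama3)
print(ama5)
```

A longer window smooths more strongly but responds more slowly to genuine changes in level, which is the trade-off the two AMA curves in Fig. 9.5 illustrate.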
10
Fig. 9.5 Plots illustrating how two different AMA
smoothing methods capture the electric utility
load data denoted by MW (meas)
Fig. 9.6. Residuals
11
9.3.2 Exponentially Weighted Moving Average (EWA)
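The equation for this slide is also missing from the transcript. A minimal sketch, assuming the usual recursive form s(t) = alpha * y(t) + (1 - alpha) * s(t-1) with the first smoothed value set to the first observation, is given below. The name `alpha`, the initialization and the sample values are assumptions.

```python
def exponentially_weighted_average(y, alpha):
    """Recursive exponential smoothing (assumed form of the EWA).

    s[0] = y[0];  s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [float(y[0])]
    for value in y[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical series, for illustration only.
demand = [10.0, 12.0, 11.0, 13.0]
ewa = exponentially_weighted_average(demand, 0.5)
print(ewa)
```

In this form the last smoothed value doubles as the one-step-ahead forecast. A small `alpha` gives heavy smoothing, while `alpha = 1` simply returns the original series.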
12
Fig. 9.7 Plots illustrating how two different EWA
smoothing methods capture the electric utility
load data denoted by MW(meas)
Fig. 9.8. Residuals
13
9.4 OLS regression models
14
Fig. 9.10 Figure illustrating that residuals for
the linear trend model (eq. 9.4.1) are not random
(see Example 9.4.1). They exhibit both local
systematic scatter as well as an overall pattern
as shown by the quadratic trend line. They seem
to exhibit larger scatter than the AMA residuals
shown in Fig. 9.6.
15
9.4.2 Trend and seasonal models
16
Fig. 9.11 Residuals for the linear and seasonal
model
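A trend and seasonal model of this kind can be sketched as an OLS fit on a time variable plus quarterly indicator variables. The example below uses synthetic quarterly data, not the data of Table 9.1. The coefficient values, noise level and variable names are assumptions chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quarterly series: linear trend + quarter effect + noise.
n = 48
t = np.arange(1, n + 1)
quarter = (t - 1) % 4                      # 0, 1, 2, 3, 0, 1, ...
seasonal_effect = np.array([5.0, -3.0, 8.0, 0.0])
y = 100.0 + 2.0 * t + seasonal_effect[quarter] + rng.normal(0.0, 0.5, n)

# Linear trend model: y = b0 + b1 * t
X_trend = np.column_stack([np.ones(n), t])

# Trend + seasonal model: three indicators, fourth quarter as baseline.
indicators = np.column_stack([(quarter == k).astype(float) for k in range(3)])
X_seasonal = np.column_stack([X_trend, indicators])

beta_trend, *_ = np.linalg.lstsq(X_trend, y, rcond=None)
beta_seasonal, *_ = np.linalg.lstsq(X_seasonal, y, rcond=None)

resid_trend = y - X_trend @ beta_trend
resid_seasonal = y - X_seasonal @ beta_seasonal

print("trend-only residual std:   ", resid_trend.std())
print("trend+seasonal residual std:", resid_seasonal.std())
```

Only three indicators are used for four quarters because a fourth would be collinear with the intercept.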
17
(No Transcript)
18
9.4.3 Fourier series models for Periodic
Behavior
19
Fig. 9.12. Measured hourly whole building
electric use (excluding cooling and heating
related energy) for a large university building
in central Texas (from Dhar et al., 1999) from
January to June. The data shows distinct diurnal
and weekly periodicities but no seasonal trend.
Such behavior is referred to as
weather-independent data. The residual data
series using a pure sinusoidal model (Eq. 9.16)
are also shown.
20
Fig. 9.13 Measured hourly whole building cooling
thermal energy use for the same building as in
Fig. 9.12 (from Dhar et al., 1999) from January
to June. The data shows distinct diurnal and
weekly periodicities as well as
weather-dependency. The residual data series
using a sinusoidal model with weather variables
(Eq. 9.18) are also shown.
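Eqs. 9.16 and 9.18 are not reproduced in the transcript, so the following is only a generic sketch of a Fourier series regression: sine and cosine terms at harmonics of a known period are used as regressors in an ordinary least squares fit. The 24-hour period, the number of harmonics and the synthetic signal are assumptions, not the building data of Dhar et al.

```python
import numpy as np

def fourier_design_matrix(t, period, n_harmonics):
    """Columns: intercept, then sin and cos pairs for each harmonic."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        omega = 2.0 * np.pi * k / period
        cols.append(np.sin(omega * t))
        cols.append(np.cos(omega * t))
    return np.column_stack(cols)

rng = np.random.default_rng(1)

# Two weeks of synthetic hourly data with a purely diurnal cycle.
t = np.arange(24 * 14, dtype=float)
y = (50.0
     + 10.0 * np.sin(2.0 * np.pi * t / 24.0)
     + 4.0 * np.cos(2.0 * np.pi * t / 24.0)
     + rng.normal(0.0, 1.0, t.size))

X = fourier_design_matrix(t, period=24.0, n_harmonics=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

print("intercept, sin1, cos1, sin2, cos2:", np.round(coef, 2))
print("residual std:", residuals.std())
```

Because the model is linear in its coefficients, weather variables could be appended as extra columns of the same design matrix, which is one way to read the weather-dependent case.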
21
(No Transcript)
22
9.4.4 Interrupted time series
23
  • Fig. 9.15 Improvements in OLS model fit when an
    indicator variable is introduced to capture an
    abrupt one-time change in energy use in a
    building (from Ruch et al., 1999).
  • (a) Ordinary least squares (OLS) model
  • (b) Indicator variable model (IND)
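The effect of an indicator variable can be sketched with synthetic data. The example below assumes energy use that depends linearly on outdoor temperature and shifts by a fixed amount at a known observation. The model form, change point and magnitudes are assumptions and are not taken from Ruch et al.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 120
temp = rng.uniform(10.0, 35.0, n)                 # outdoor temperature
indicator = (np.arange(n) >= 60).astype(float)    # 0 before change, 1 after
energy = 20.0 + 1.5 * temp + 6.0 * indicator + rng.normal(0.0, 1.0, n)

# (a) Plain OLS model that ignores the interruption.
X_ols = np.column_stack([np.ones(n), temp])
b_ols, *_ = np.linalg.lstsq(X_ols, energy, rcond=None)

# (b) Indicator variable model: the extra column absorbs the step change.
X_ind = np.column_stack([np.ones(n), temp, indicator])
b_ind, *_ = np.linalg.lstsq(X_ind, energy, rcond=None)

rmse_ols = np.sqrt(np.mean((energy - X_ols @ b_ols) ** 2))
rmse_ind = np.sqrt(np.mean((energy - X_ind @ b_ind) ** 2))

print("RMSE, OLS:      ", rmse_ols)
print("RMSE, indicator:", rmse_ind)
print("estimated step: ", b_ind[2])
```

The coefficient on the indicator column is the estimated size of the one-time change.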

24
9.5 Stochastic Time Series
25
  • The systematic stochastic component is treated by
    stochastic time series models such as AR, MA,
    ARMA, ARIMA and ARMAX, which are linear in both
    model and parameters and hence simplify the
    parameter estimation process.
  • These models usually allow more accurate
    predictions than classical regression.
  • Once it is deemed that a time series modeling
    approach is appropriate for the situation at
    hand, three separate issues are involved, as in
    OLS modeling:
  • (i) identification of the order of the model
    (i.e., model structure),
  • (ii) estimation of the model parameters
    (parameter estimation), and
  • (iii) ascertaining uncertainty in the
    forecasts.
  • Note that time series models may not always be
    superior to the standard OLS methods.

26
(a) Autocorrelation function (ACF)
27
Usually, there is no need to fit a functional
equation; a graphical representation called the
correlogram is a useful means both of gaining
insight into model development and of evaluating
stationarity.
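The values plotted in a correlogram can be computed directly. The sketch below uses the standard sample autocorrelation estimator together with the approximate 95% bounds of plus or minus 1.96/sqrt(n) for a white noise series. The two test series are synthetic and the function name is hypothetical.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    n = len(y)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(4)
n = 100
white_noise = rng.normal(0.0, 1.0, n)
trending = 0.5 * np.arange(n) + white_noise   # non-stationary: has a trend

acf_noise = sample_acf(white_noise, 20)
acf_trend = sample_acf(trending, 20)
bound = 1.96 / np.sqrt(n)                     # approximate 95% limits

print("white noise, lags 1-5:", np.round(acf_noise[1:6], 2))
print("trending,    lags 1-5:", np.round(acf_trend[1:6], 2))
print("95% bound:", round(bound, 3))
```

For the trending series the autocorrelations stay large over many lags instead of dying out quickly, which is the visual signature of non-stationarity in a correlogram.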
28
Fig. 9.17 Sample correlogram for a time series
which is non-stationary since the ACF does not
asymptote to zero
9.5.2.3 Detrending data by differencing
Function | First differencing | Second differencing
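The effect of differencing can be checked on simple deterministic functions. In the sketch below (function coefficients chosen arbitrarily), first differencing removes a linear trend, while a quadratic trend needs second differencing.

```python
import numpy as np

t = np.arange(10, dtype=float)
linear = 3.0 + 2.0 * t
quadratic = 1.0 + 0.5 * t ** 2

# First differencing: y[t] - y[t-1]
first_diff_linear = np.diff(linear, n=1)      # constant: trend removed
first_diff_quad = np.diff(quadratic, n=1)     # still a linear trend

# Second differencing: difference of the first differences
second_diff_quad = np.diff(quadratic, n=2)    # constant: trend removed

print(first_diff_linear)
print(first_diff_quad)
print(second_diff_quad)
```

Each differencing pass shortens the series by one observation, so the second-differenced series has two fewer points than the original.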
29
(No Transcript)
30
(No Transcript)
31
9.5.3 ARIMA models
  • The ARIMA (p,d,q) (Auto Regressive Integrated
    Moving Average) model formulation is a general
    linear framework consisting of three sub-models:
  • the autoregressive (AR) part is meant to capture
    the memory of the system
  • (done via a linear model of p past model
    residuals)
  • the integrated (I) part is meant to make the
    series stationary by differencing
  • the moving average (MA) part is meant to capture
    the shocks on the system
  • (done by using a linear function of q past
    white noise errors)
  • Unlike OLS type models, ARMA models require
    relatively long data series for parameter
    estimation (a minimum of about 50 data points,
    and preferably 100 or more).

32
MA Models
33
Fig. 9.20a shows an example of an MA(1) process
with mean 10, where a set of 100 data points has
been generated in a spreadsheet program using the
model shown, with a random number generator for
the white noise term. Since this is a first
order model, the ACF should have only one
significant value (this is seen in Fig. 9.20b,
where the ACF values at greater lags fall inside
the 95% confidence intervals). Ideally, there
should be only one spike, at lag k = 1, but the
random noise in the synthetic data obscures the
estimation, and spikes appear at other lags;
these, however, are statistically insignificant.
Fig. 9.20. One realization of a MA(1) process
along with corresponding ACF and PACF with error
term being Normal(0,1).
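A comparable realization can be generated in a few lines. The model equation of the slide is not in the transcript, so the sketch below assumes the form Z(t) = mu + e(t) + theta * e(t-1) with mu = 10, Normal(0,1) errors and an arbitrarily chosen theta = 0.6. Some texts write the MA term with a minus sign, which flips the sign of the lag-1 autocorrelation.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    n = len(y)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
n = 100
mu = 10.0
theta = 0.6            # assumed value; not given in the transcript

eps = rng.normal(0.0, 1.0, n + 1)            # white noise, Normal(0, 1)
z = mu + eps[1:] + theta * eps[:-1]          # MA(1) realization

acf = sample_acf(z, 10)
bound = 1.96 / np.sqrt(n)                    # approximate 95% limits
significant_lags = [k for k in range(1, 11) if abs(acf[k]) > bound]

rho1_theory = theta / (1.0 + theta ** 2)     # theoretical lag-1 ACF of MA(1)
print("sample ACF, lags 1-5:", np.round(acf[1:6], 2))
print("theoretical lag-1 ACF:", round(rho1_theory, 3))
print("lags outside the 95% bounds:", significant_lags)
```

With only 100 points the sample ACF at lag 1 can differ noticeably from its theoretical value, and an occasional higher lag may stray just outside the bounds by chance.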
34
AR Models: often used in engineering
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
Example 9.5.3. Comparison of various models for
peak electric demand
40
(No Transcript)
41
(No Transcript)
42
  • Recommendations for model selection
  • Model type (ARIMA, AR or MA) is best identified
    from correlograms of the ACF and the PACF
  • The identification procedure can be summarized as
    follows:
  • For AR(1): ACF decays exponentially, PACF has a
    spike at lag 1, and other spikes are not
    statistically significant, i.e., are contained
    within the 95% confidence intervals
  • For AR(2): ACF decays exponentially (indicative
    of positive model coefficients) or with
    sinusoidal-exponential decay (indicative of a
    positive and a negative coefficient), and PACF
    has two statistically significant spikes
  • For MA(1): ACF has one statistically significant
    spike at lag 1 and PACF damps down exponentially
  • For MA(2): ACF has two statistically significant
    spikes (one at lag 1 and one at lag 2), and PACF
    has an exponential decay or a
    sinusoidal-exponential decay
  • For ARMA(1,1): ACF and PACF have spikes at lag 1
    with exponential decay.
  • - Usually, it is better to start with the lowest
    values of p and q for ARMA(p,q)
  • Increase model order until no systematic
    residual patterns are evident
  • Most time series data from engineering
    experiments or from physical systems or
    processes should be adequately modeled by low
    orders, i.e., about 1-3 terms.
  • - Cross-validation is strongly recommended to
    avoid over-fitting, and it better reflects the
    predictive capability of the model.
  • - The model selection is somewhat subjective.
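The AR(1) and MA(1) signatures listed above can be verified numerically. The sketch below computes the PACF from a given ACF with the Durbin-Levinson recursion and applies it to the theoretical ACFs of an AR(1) and an MA(1) process. The coefficient values 0.7 and 0.4 are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def pacf_from_acf(r):
    """Partial autocorrelations from autocorrelations r[0..K], r[0] = 1.

    Uses the Durbin-Levinson recursion; returns an array of length K + 1.
    """
    r = np.asarray(r, dtype=float)
    max_lag = len(r) - 1
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)          # coefficients of the order k-1 model
    for k in range(1, max_lag + 1):
        j = np.arange(1, k)
        num = r[k] - np.sum(phi_prev * r[k - j])
        den = 1.0 - np.sum(phi_prev * r[j])
        phi_kk = num / den
        phi_new = np.empty(k)
        phi_new[:k - 1] = phi_prev - phi_kk * phi_prev[::-1]
        phi_new[k - 1] = phi_kk
        phi_prev = phi_new
        pacf[k] = phi_kk
    return pacf

lags = np.arange(6)

# AR(1) with coefficient 0.7: ACF decays as 0.7**k.
acf_ar1 = 0.7 ** lags
pacf_ar1 = pacf_from_acf(acf_ar1)

# MA(1) with lag-1 autocorrelation 0.4: ACF is zero beyond lag 1.
acf_ma1 = np.array([1.0, 0.4, 0.0, 0.0, 0.0, 0.0])
pacf_ma1 = pacf_from_acf(acf_ma1)

print("AR(1) PACF:", np.round(pacf_ar1, 3))
print("MA(1) PACF:", np.round(pacf_ma1, 3))
```

The AR(1) case gives a single PACF spike at lag 1, while the MA(1) case gives a PACF that shrinks in magnitude from lag to lag, in line with the identification rules. With real data the sample ACF would be used as input, and small nonzero values inside the 95% bounds are to be expected.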

43
9.6 ARMAX or transfer function models
Fig. 9.25 Conceptual difference between the
single-variate ARMA approach and the multivariate
ARMAX approach applied to dynamic systems
44
9.6.2 Transfer function modeling of linear
dynamic systems
45
(No Transcript)
46
47
Example 9.6.1
  Coefficients: b = 0.00099, 0.00836, 0.00361, 0.00007
                d = 1, -0.9397, 0.04664, 0
                c = 0.01303
  Tin = 25

  time (t)   Tos,t   Qcond(t)   Qcond(t)   Qcond(t)   Qcond(t)
  hour   C   W/m2   W/m2   W/m2   W/m2
  -2 27.2   0.000   2.228   2.357   2.365
  -1 26.1   0.000   2.011   2.126   2.132
  0 25   0.000   1.803   1.905   1.911
  1 24.4   0.004   1.604   1.694   1.699
  2 24.4   -0.002   1.418   1.498   1.502
  3 23.8   -0.011   1.249   1.320   1.324
  4 23.3   -0.024   1.094   1.157   1.160
  5 23.3   -0.042   0.949   1.005   1.008
  6 25   -0.059   0.820   0.870   0.873
  7 27.7   -0.057   0.723   0.767   0.770
  8 30   -0.023   0.669   0.708   0.710
  9 32.7   0.040   0.654   0.688   0.690
  10 35   0.131   0.676   0.706   0.708
  11 37.7   0.246   0.729   0.756   0.758
  12 40   0.382   0.811   0.835   0.837
  13 53.3   0.548   0.928   0.950   0.951
  14 64.4   0.828   1.165   1.184   1.185
  15 72.7   1.232   1.531   1.548   1.549
  16 75.5   1.712   1.978   1.993   1.993
  17 72.2   2.195   2.431   2.444   2.445
  18 58.8   2.597   2.806   2.817   2.818
  19 30.5   2.800   2.985   2.995   2.996
  20 29.4   2.685   2.850   2.859   2.860
  21 28.3   2.455   2.601   2.609   2.610
  22 27.2   2.228   2.357   2.365   2.365
  23 26.1   2.011   2.126   2.132   2.132
  24 25   1.803   1.905   1.911   1.911
                     
  Average 37.94583               1.577
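The tabulated values can be reproduced with a short script. The transfer function equation itself is missing from the transcript, so the recursion below is inferred from the table: Qcond(t) = sum of b(n) * Tos(t-n) minus sum of d(n) * Qcond(t-n) for n >= 1, minus Tin * c. It matches the first-pass entries at hours 1 and 2 (0.004 and -0.002). Treating the 24 hourly temperatures as periodic and repeating the pass until the values stop changing is likewise read off the four Qcond columns. Variable names are hypothetical.

```python
import numpy as np

# Coefficients and inputs as listed in Example 9.6.1.
b = np.array([0.00099, 0.00836, 0.00361, 0.00007])
d = np.array([1.0, -0.9397, 0.04664, 0.0])
c = 0.01303
t_in = 25.0

# Tos for hours 1..24; hours 0, -1, -2 repeat hours 24, 23, 22.
t_os = np.array([24.4, 24.4, 23.8, 23.3, 23.3, 25.0, 27.7, 30.0,
                 32.7, 35.0, 37.7, 40.0, 53.3, 64.4, 72.7, 75.5,
                 72.2, 58.8, 30.5, 29.4, 28.3, 27.2, 26.1, 25.0])

def one_pass(q_prev):
    """One sweep over hours 1..24 (index i holds hour i + 1)."""
    q = q_prev.copy()
    for i in range(24):
        drive = sum(b[n] * t_os[(i - n) % 24] for n in range(4))
        memory = sum(d[n] * q[(i - n) % 24] for n in range(1, 4))
        q[i] = drive - memory - t_in * c
    return q

# Start from zero heat flow and repeat the daily cycle until it settles.
passes = [one_pass(np.zeros(24))]
for _ in range(9):
    passes.append(one_pass(passes[-1]))

q_first, q_final = passes[0], passes[-1]
print("first pass, hours 1-3:", np.round(q_first[:3], 3))
print("converged, hour 24:   ", round(q_final[23], 3))
print("converged daily mean: ", round(q_final.mean(), 3))
```

The start-up error shrinks quickly from one pass to the next, which is why the third and fourth Qcond columns of the table are almost identical.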
48
9.7 Quality control and process monitoring using
control chart methods
49
Fig. 9.26 The upper and lower three-sigma limits,
corresponding to the UCL and LCL, shown on a
normal distribution
Fig. 9.27 The Shewhart control chart with primary
limits
50
(No Transcript)
51
9.7.2 Shewhart control charts for variables and
attributes
  • (a) Shewhart chart for variables: for continuous
    measurements such as diameter, temperature, flow,
    as well as derived parameters or quantities such
    as overall heat loss coefficient, efficiency,

52
Table 9.8 Numerical values of the three
coefficients to be used in Eqs. 9.49 and 9.51 for
constructing the three-sigma limits for the mean
and range charts.
53
(No Transcript)
54
(ii) range or R charts, used to control variation,
i.e., to monitor the uniformity or consistency of
a process.
55
It is suggested that the mean and range charts be
used together since their complementary
properties allow better monitoring of a
process. Fig. 9.28 illustrates two instances
where using both charts reveals behavior that one
chart alone would have missed.
Fig. 9.28 The combined advantage provided by the
mean and range charts in detecting out-of-control
processes. Two instances are shown: (a) where the
variability is within limits but the mean is out
of control, which is detected by the mean chart,
and (b) where the mean is in control but not the
variability, which is detected by the range chart
56
Table 9.10 Data table for the 20 samples
consisting of four items and associated mean and
range statistics (Example 9.7.1)
Sample Item 1 Item 2 Item 3 Item 4 Mean (X-bar) Range (R)
1 1.405 1.419 1.377 1.400 1.400 0.042
2 1.407 1.397 1.377 1.393 1.394 0.030
3 1.385 1.392 1.399 1.392 1.392 0.014
4 1.386 1.419 1.387 1.417 1.402 0.033
5 1.382 1.391 1.390 1.397 1.390 0.015
6 1.404 1.406 1.404 1.402 1.404 0.004
7 1.409 1.386 1.399 1.403 1.399 0.023
8 1.399 1.382 1.389 1.410 1.395 0.028
9 1.408 1.411 1.394 1.388 1.400 0.023
10 1.399 1.421 1.400 1.407 1.407 0.022
11 1.394 1.397 1.396 1.409 1.399 0.015
12 1.409 1.389 1.398 1.399 1.399 0.020
13 1.405 1.387 1.399 1.393 1.396 0.018
14 1.390 1.410 1.388 1.384 1.393 0.026
15 1.393 1.403 1.387 1.415 1.400 0.028
16 1.413 1.390 1.395 1.411 1.402 0.023
17 1.410 1.415 1.392 1.397 1.404 0.023
18 1.407 1.386 1.396 1.393 1.396 0.021
19 1.411 1.406 1.392 1.387 1.399 0.024
20 1.404 1.396 1.391 1.390 1.395 0.014
Grand Mean 1.398 0.022
57
The X-bar and R charts are shown in Fig. 9.29.
Note that no point is beyond the control limits
in either plot indicating that the process is in
statistical control.
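The statistics of Table 9.10 and the in-control conclusion can be checked numerically. Eqs. 9.49 and 9.51 and Table 9.8 are not in the transcript, so the sketch below assumes the standard three-sigma chart factors for subgroups of size four (A2 = 0.729, D3 = 0, D4 = 2.282) and the usual limit formulas. The limits in Fig. 9.29 may have been obtained slightly differently.

```python
import numpy as np

# The 20 samples of four items from Table 9.10.
samples = np.array([
    [1.405, 1.419, 1.377, 1.400], [1.407, 1.397, 1.377, 1.393],
    [1.385, 1.392, 1.399, 1.392], [1.386, 1.419, 1.387, 1.417],
    [1.382, 1.391, 1.390, 1.397], [1.404, 1.406, 1.404, 1.402],
    [1.409, 1.386, 1.399, 1.403], [1.399, 1.382, 1.389, 1.410],
    [1.408, 1.411, 1.394, 1.388], [1.399, 1.421, 1.400, 1.407],
    [1.394, 1.397, 1.396, 1.409], [1.409, 1.389, 1.398, 1.399],
    [1.405, 1.387, 1.399, 1.393], [1.390, 1.410, 1.388, 1.384],
    [1.393, 1.403, 1.387, 1.415], [1.413, 1.390, 1.395, 1.411],
    [1.410, 1.415, 1.392, 1.397], [1.407, 1.386, 1.396, 1.393],
    [1.411, 1.406, 1.392, 1.387], [1.404, 1.396, 1.391, 1.390],
])

# Assumed standard factors for subgroup size n = 4.
A2, D3, D4 = 0.729, 0.0, 2.282

xbar = samples.mean(axis=1)
ranges = samples.max(axis=1) - samples.min(axis=1)
grand_mean = xbar.mean()
mean_range = ranges.mean()

xbar_lcl = grand_mean - A2 * mean_range
xbar_ucl = grand_mean + A2 * mean_range
r_lcl = D3 * mean_range
r_ucl = D4 * mean_range

xbar_in_control = bool(np.all((xbar >= xbar_lcl) & (xbar <= xbar_ucl)))
r_in_control = bool(np.all((ranges >= r_lcl) & (ranges <= r_ucl)))

print(f"grand mean = {grand_mean:.3f}, mean range = {mean_range:.3f}")
print(f"X-bar limits: {xbar_lcl:.3f} to {xbar_ucl:.3f}")
print(f"R limits:     {r_lcl:.3f} to {r_ucl:.3f}")
print("in control:", xbar_in_control and r_in_control)
```

Under these assumed factors every sample mean and range lies inside its limits, in agreement with the statement that the process is in statistical control.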
58
(b) Shewhart control charts for attributes
59
(No Transcript)
60
(c) Practical implementation issues
  • When a process is in control, the points from
    each sample plotted on the control chart should
    fluctuate in a random manner between the UCL and
    LCL.
  • Several rules have been proposed to increase
    the sensitivity of Shewhart charts. Besides
    checking that no points fall outside the control
    limits, one could check whether
  • (i) the numbers of points above and below the
    centerline are about equal, (ii) there is no
    steady rise or decrease in a sequence of points,
  • (iii) most of the points are close to the
    centerline rather than hugging the limits,
  • (iv) there is no sudden shift in the process
    mean,
  • (v) there is no cyclic behavior.
  •  Devore and Farnum (2005) and others present an
    extended list of out-of-control rules involving
    counting the number of points falling within
    different bounds corresponding to one, two and
    three sigma lines. However, using such types of
    extended rules also increases the possibility of
    false alarms (or type I errors), and so, rather
    than being ad hoc, these rules should have some
    statistical basis.
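Some of these checks are easy to automate. The sketch below counts points outside the limits, points on each side of the centerline, the longest run on one side, and the longest steadily rising or falling sequence. It does not reproduce the specific rules of Devore and Farnum. The function names, the sample values and the limits are hypothetical, and any threshold applied to the results would need its own statistical justification.

```python
import numpy as np

def longest_run(flags):
    """Length of the longest unbroken run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def chart_diagnostics(x, center, lcl, ucl):
    """Simple run statistics for points plotted on a control chart."""
    x = np.asarray(x, dtype=float)
    above = x > center
    below = x < center
    steps = np.diff(x)
    return {
        "outside_limits": int(np.sum((x > ucl) | (x < lcl))),
        "n_above": int(above.sum()),
        "n_below": int(below.sum()),
        "longest_run_one_side": max(longest_run(above), longest_run(below)),
        # A run of m consecutive rises (or falls) spans m + 1 points.
        "longest_monotone_run": max(longest_run(steps > 0),
                                    longest_run(steps < 0)) + 1,
    }

# Hypothetical chart values, centerline and limits.
values = [10.1, 9.8, 10.2, 10.4, 10.6, 10.9, 11.3, 9.9]
result = chart_diagnostics(values, center=10.0, lcl=8.5, ucl=11.5)
print(result)
```

In this example no point leaves the limits, yet six consecutive points rise steadily, the kind of pattern that the limits alone would miss.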

61
Examples of extended list of out-of-control rules
meant to improve the sensitivity of the
traditional Shewhart control chart (from Devore
and Farnum, 2005 with permission from Thomson
Brooks/Cole)
62
Examples of typical histograms used during
process capability analysis (from Devore and
Farnum, 2005 with permission from Thomson
Brooks/Cole)
63
(b) EWMA monitoring process
64
(No Transcript)
65
9.7.3 Statistical process control using time
weighted charts
Instead of mean residuals, one could also use
charts based on other statistics such as the
range, the variable itself, absolute differences,
or successive differences between observations.
66
(No Transcript)
67
(No Transcript)
68
There are several instances when certain products
and processes can be analyzed with more than one
method, and there is no clear-cut choice.
X-bar and R charts are quite robust: they yield
good results even if the data is not normally
distributed, while Cusum charts are adversely
affected by serial correlation in the data.
Table 9.12 Relative effectiveness of control
charts in detecting a change in a process (from
Himmelblau 1978)
69
Other Related Analysis Methods