# Chapter 9. Analysis of Time Series Data

Slides: 70
Provided by: Agam151
1
Chapter 9. Analysis of Time Series Data
• 9.1 Basic concepts
• 9.2 General model formulations
• 9.3 Smoothing methods
• 9.4 OLS regression models
• 9.5 Stochastic time series models
• 9.6 ARMAX or transfer function models
• 9.7 Quality control and process monitoring using control chart methods

2
9.1 Introduction
• Time series data is not merely data collected over time. If that definition were sufficient, almost any data set would qualify as time series data.
• There must be some sort of ordering, i.e., a relation between successive observations. Successive observations in time series data are not independent, and their sequence must be maintained during the analysis.
• Definition of time series data: a collection of numerical observations arranged in a natural order, with each observation associated with a particular instant of time which provides the ordering.
• A practical way of ascertaining whether data should be treated as time series data is to determine whether the analysis results would change if the sequence of the observations were scrambled.
• The importance of time series analysis is that it provides more insight, and more accurate modeling and prediction of time series data, than classical statistical analysis does, because it treats the model residuals explicitly.
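
The scrambling check described above can be tried numerically. The following is a minimal sketch, not part of the original slides: the series is hypothetical (a mild trend plus a 12-period cycle), and the lag-1 autocorrelation is used as one convenient statistic that depends on the ordering of the data.

```python
import math
import random


def lag1_autocorr(y):
    """Sample lag-1 autocorrelation coefficient of a sequence."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in y)
    return num / den


# Hypothetical series: mild upward trend plus a 12-period cycle.
ordered = [10 + 0.05 * t + 2 * math.sin(2 * math.pi * t / 12) for t in range(120)]

# Scramble the same values; the fixed seed only makes the run repeatable.
scrambled = ordered[:]
random.Random(0).shuffle(scrambled)

r_ordered = lag1_autocorr(ordered)
r_scrambled = lag1_autocorr(scrambled)
print(f"lag-1 autocorrelation, original order: {r_ordered:.2f}")
print(f"lag-1 autocorrelation, scrambled:      {r_scrambled:.2f}")
```

The mean and variance are identical for both versions, but the autocorrelation changes markedly once the order is destroyed, which is the sense in which such data must be treated as a time series.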

3
Fig. 9.1 Daily peak and minimum hourly loads over
several months for a large electric utility to
illustrate the diurnal, the weekday/weekend and
the seasonal fluctuations and trends.
4
9.1.2 Terminology
5
9.1.3 Basic behavior patterns
• (a) constant process,
• (b) linear trend,
• (c) cyclic variation,
• (d) impulse,
• (e) step function,
• (f) ramp

Much of the challenge in time series analysis is
distinguishing these basic behavior patterns when
they occur in conjunction. The problem is
compounded by the fact that processes may exhibit
these patterns at different times.
Fig. 9.3 Different characteristics of time series
(from Montgomery and Johnson 1976 by permission
of McGraw-Hill)
6
9.1.4 Illustrative Data Set
Fig. 9.4 Time series data of electric power
demand by quarter (Data from Table 9.1)
7
9.2 General Model Formulation
• How does one model the behavior of the data shown in Example 9.1.1 and use it for extrapolation purposes?
• There are three general time domain approaches:
• (a) Smoothing methods, which are really meant to filter the data in a computationally simple manner. However, they can also be used for extrapolation purposes.
• (b) OLS models, which treat time series data as cross-sectional data but with the time variable accounted for explicitly as an independent variable.
• (c) Stochastic time series models, which add a layer of sophistication to (b) by explicitly treating the model residual errors.

8
9.3 Smoothing methods
• Two basic methods are used:
• - Arithmetic Moving Average (AMA)
• - Exponentially Weighted Moving Average (EWA)
• These methods smooth out fluctuations in the data, making it easier to discern longer-term trends and thereby allowing future values or trends to be predicted.
• However, though they are useful in predicting mean future values, they do not provide any information about the uncertainty of these predictions: since no modeling per se is involved, standard errors (from which forecast errors are quantified) cannot be estimated.
• The inability to quantify forecast errors is a serious deficiency.

9
9.3.1 Arithmetic Moving Average (AMA)
10
Fig. 9.5 Plots illustrating how two different AMA
smoothing methods capture the electric utility
load data denoted by MW (meas)
Fig. 9.6. Residuals
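
The AMA equation itself did not survive in this transcript. As a minimal sketch, assuming the usual trailing form in which the smoothed value at time t is the mean of the last N observations, the smoothing and its residuals could be computed as follows. The function name and the demand values are hypothetical.

```python
def arithmetic_moving_average(y, n):
    """Trailing n-point moving average; the first n-1 entries are None
    because a full window is not yet available."""
    if n < 1 or n > len(y):
        raise ValueError("window length must be between 1 and len(y)")
    out = [None] * (n - 1)
    window_sum = sum(y[:n])
    out.append(window_sum / n)
    for t in range(n, len(y)):
        # Slide the window: add the newest value, drop the oldest.
        window_sum += y[t] - y[t - n]
        out.append(window_sum / n)
    return out


demand = [4, 6, 5, 9, 8, 10, 9, 13]  # hypothetical observations
ama3 = arithmetic_moving_average(demand, 3)

# Residuals (measured minus smoothed) wherever a smoothed value exists.
residuals = [y - s for y, s in zip(demand, ama3) if s is not None]
print(ama3)
print(residuals)
```

A longer window gives a smoother curve that lags the data more; a centered window is an alternative when the purpose is filtering rather than forecasting.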
11
9.3.2 Exponentially Weighted Moving Average (EWA)
12
Fig. 9.7 Plots illustrating how two different EWA
smoothing methods capture the electric utility load data
Fig. 9.8. Residuals
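
The EWA equation is also missing from the transcript. A minimal sketch, assuming the common recursive form s(t) = alpha * y(t) + (1 - alpha) * s(t-1) initialized with the first observation, is given below. The smoothing constant `alpha` and the data values are hypothetical.

```python
def exponentially_weighted_average(y, alpha):
    """Recursive exponential smoothing, started at the first observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [float(y[0])]
    for value in y[1:]:
        # New estimate = weighted blend of the new point and the old estimate.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


demand = [4, 6, 5, 9]  # hypothetical observations
ewa = exponentially_weighted_average(demand, 0.5)
print(ewa)
```

Values of alpha close to 1 track the data closely, while small values give heavier smoothing because older observations retain more weight.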
13
9.4 OLS regression models
14
Fig. 9.10 Figure illustrating that residuals for
the linear trend model (eq. 9.4.1) are not random
(see Example 9.4.1). They exhibit both local
systematic scatter as well as an overall pattern
as shown by the quadratic trend line. They seem
to exhibit larger scatter than the AMA residuals
shown in Fig. 9.6.
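
The kind of residual pattern described in this caption can be reproduced with a small sketch. Eq. 9.4.1 is not in the transcript, so the linear trend model is assumed here to be y = b0 + b1*t; the series is hypothetical and deliberately contains some curvature so that the straight-line fit leaves a systematic residual shape.

```python
def fit_linear_trend(y):
    """OLS fit of y = b0 + b1*t for t = 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


# Hypothetical series with mild curvature.
series = [50 + 3 * t + 0.5 * t * t for t in range(9)]
b0, b1 = fit_linear_trend(series)
residuals = [v - (b0 + b1 * t) for t, v in enumerate(series)]

for t, res in enumerate(residuals):
    print(f"t={t}  residual={res:+.3f}")
```

The residuals are positive at both ends and negative in the middle. They sum to zero, as OLS guarantees, yet they are clearly not random, which signals that the linear trend model is incomplete.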
15
9.4.2 Trend and seasonal models
16
Fig. 9.11 Residuals for the linear and seasonal
model
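
The model equation for this section is not in the transcript. One common way to combine a linear trend with a seasonal effect, assumed here, is to add indicator (dummy) variables for the quarters: y = b0 + b1*t + b2*Q2 + b3*Q3 + b4*Q4, with the first quarter as the baseline. The sketch below fits such a model by solving the normal equations; all names and data values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(t):
    """Regressors: intercept, time, and indicators for quarters 2, 3 and 4."""
    quarter = t % 4
    return [1.0, float(t)] + [1.0 if quarter == q else 0.0 for q in (1, 2, 3)]


# Hypothetical quarterly data: trend of 2 per quarter plus fixed seasonal offsets.
seasonal_offset = [0.0, 5.0, -3.0, 8.0]
y = [100 + 2 * t + seasonal_offset[t % 4] for t in range(12)]

coeffs = ols([design_row(t) for t in range(12)], y)
print([round(c, 3) for c in coeffs])
```

Because the hypothetical data contain no noise, the fit recovers the intercept, slope and three seasonal offsets used to generate them. With real data the residuals of this model would be examined in the same way as in Fig. 9.11.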
18
9.4.3 Fourier series models for periodic behavior
19
Fig. 9.12. Measured hourly whole building
electric use (excluding cooling and heating
related energy) for a large university building
in central Texas (from Dhar et al., 1999) from
January to June. The data shows distinct diurnal
and weekly periodicities but no seasonal trend.
Such behavior is referred to as
weather-independent data. The residual data
series using a pure sinusoidal model (Eq. 9.16)
are also shown.
20
Fig. 9.13 Measured hourly whole building cooling
thermal energy use for the same building as in
Fig. 9.12 (from Dhar et al., 1999) from January
to June. The data shows distinct diurnal and
weekly periodicities as well as
weather-dependency. The residual data series
using a sinusoidal model with weather variables
(Eq. 9.18) are also shown.
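
Eqs. 9.16 and 9.18 are not reproduced in the transcript. As a minimal sketch of a pure sinusoidal model, assume y(t) = mean + sum over k of [a_k cos(2*pi*k*t/P) + b_k sin(2*pi*k*t/P)] with a diurnal period P = 24 hours. When the hourly record spans a whole number of periods, the least-squares coefficients reduce to simple projections, which is what the code below uses; for other record lengths a general least-squares fit would be needed. The data are hypothetical.

```python
import math


def fourier_coefficients(y, period, n_harmonics):
    """Mean and (a_k, b_k) pairs for the first n_harmonics harmonics.

    Valid as a least-squares fit only when len(y) is a whole number of
    periods and n_harmonics < period / 2.
    """
    n = len(y)
    if n % period != 0:
        raise ValueError("series must span a whole number of periods")
    mean = sum(y) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / period
        a_k = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b_k = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        coeffs.append((a_k, b_k))
    return mean, coeffs


# Hypothetical week of hourly data with a 24 h and a 12 h component.
hours = range(24 * 7)
load = [50 + 10 * math.cos(2 * math.pi * t / 24) + 4 * math.sin(2 * math.pi * 2 * t / 24)
        for t in hours]

mean, coeffs = fourier_coefficients(load, period=24, n_harmonics=2)
print(round(mean, 3), [(round(a, 3), round(b, 3)) for a, b in coeffs])
```

A weather-dependent model of the kind mentioned for Fig. 9.13 would add weather variables as further regressors, which requires a general regression rather than these projections.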
22
9.4.4 Interrupted time series
23
• Fig. 9.15 Improvements in OLS model fit when an indicator variable is introduced to capture an abrupt one-time change in energy use in a building (from Ruch et al., 1999).
• (a) Ordinary least squares (OLS) model
• (b) Indicator variable model (IND)

24
9.5 Stochastic Time Series
25
• The systematic stochastic component is treated by stochastic time series models such as AR, MA, ARMA, ARIMA and ARMAX, which are linear in both model and parameters and hence simplify the parameter estimation process.
• These models usually allow more accurate predictions than classical regression.
• Once it is deemed that a time series modeling approach is appropriate for the situation at hand, three separate issues are involved, similar to OLS modeling:
• (i) identification of the order of the model (i.e., model structure),
• (ii) estimation of the model parameters (parameter estimation), and
• (iii) ascertaining uncertainty in the forecasts.
• Note that time series models may not always be superior to the standard OLS methods.

26
(a) Autocorrelation function (ACF)
27
Usually, there is no need to fit a functional
equation; a graphical representation called
the correlogram is a useful means both of gaining
insight into model development and of
evaluating stationarity
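
The values plotted in a correlogram can be computed directly. The sketch below uses the standard sample autocorrelation estimator and the approximate 95% band of plus or minus 1.96/sqrt(n) that applies to white noise; both are assumptions, since the slide's own equations are not in the transcript. The trending series is hypothetical.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelation coefficients r_0 ... r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical non-stationary series: a pure linear trend.
trend = [float(t) for t in range(100)]
r = acf(trend, 20)
band = 1.96 / math.sqrt(len(trend))

for lag in (1, 5, 10, 20):
    print(f"lag {lag:2d}: r = {r[lag]:.3f}  (95% band = +/-{band:.3f})")
```

For this trending series the coefficients stay well outside the band even at large lags. A slowly decaying ACF of this kind is the signature of a non-stationary series.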
28
Fig. 9.17 Sample correlogram for a time series
which is non-stationary since the ACF does not
asymptote to zero
9.5.2.3 Detrending data by differencing
Table columns: Function, First differencing, Second differencing
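
The effect of differencing on simple trend functions can be checked with a short sketch. The two functions below are hypothetical examples: first differencing turns a linear trend into a constant, and second differencing does the same for a quadratic trend.

```python
def difference(y, order=1):
    """Apply successive differencing y(t) - y(t-1) the given number of times."""
    for _ in range(order):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


linear = [3 + 2 * t for t in range(8)]         # hypothetical linear trend
quadratic = [1 + t + t * t for t in range(8)]  # hypothetical quadratic trend

print(difference(linear, 1))     # constant: the linear trend is removed
print(difference(quadratic, 1))  # still trending after one difference
print(difference(quadratic, 2))  # constant after two differences
```

Each differencing pass shortens the series by one observation, which is one reason not to difference more often than is needed to reach stationarity.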
31
9.5.3 ARIMA models
• The ARIMA(p,d,q) (AutoRegressive Integrated Moving Average) model formulation is a general linear framework consisting of three sub-models:
• the autoregressive (AR) part is meant to capture the memory of the system (done via a linear model of p past model residuals);
• the integrated (I) part is meant to make the series stationary by differencing;
• the moving average (MA) part is meant to capture the shocks on the system (done by using a linear function of q past white noise errors).
• Unlike OLS-type models, ARMA models require relatively long data series for parameter estimation (a minimum of about 50 data points, and preferably 100 or more).

32
MA Models
33
An example of a MA(1) process with mean 10 is shown in Fig. 9.20a, where a set of 100 data points has been generated in a spreadsheet program using the model shown, with a random number generator for the white noise term. Since this is a first-order model, the ACF should have only one significant value (this is seen in Fig. 9.20b, where the ACF values for greater lags fall inside the 95% confidence intervals). Ideally, there should be only one spike at lag k = 1, but the random noise introduced in the synthetic data obfuscates the estimation, and spikes appear at other lags which, however, are statistically insignificant.
Fig. 9.20 One realization of a MA(1) process, along with the corresponding ACF and PACF, with the error term being Normal(0,1).
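
A similar synthetic experiment can be run in a few lines. The model coefficient used for Fig. 9.20 is not given in the transcript, so this sketch assumes y(t) = 10 + e(t) + theta * e(t-1) with a hypothetical theta = 0.8 and Normal(0,1) white noise; note that some texts write the MA term with a minus sign.

```python
import math
import random


def simulate_ma1(n, mean=10.0, theta=0.8, seed=1):
    """One realization of y(t) = mean + e(t) + theta * e(t-1), e ~ Normal(0, 1)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        series.append(mean + e + theta * e_prev)
        e_prev = e
    return series


def acf(y, max_lag):
    """Sample autocorrelation coefficients r_0 ... r_max_lag."""
    n = len(y)
    m = sum(y) / n
    c0 = sum((v - m) ** 2 for v in y)
    return [sum((y[t] - m) * (y[t + k] - m) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


series = simulate_ma1(100)
r = acf(series, 5)
band = 1.96 / math.sqrt(len(series))  # approximate 95% band for white noise

# For this sign convention the theoretical lag-1 value is theta / (1 + theta^2).
theoretical_r1 = 0.8 / (1 + 0.8 ** 2)
print(f"theoretical r1 = {theoretical_r1:.3f}")
for lag in range(1, 6):
    status = "significant" if abs(r[lag]) > band else "within band"
    print(f"lag {lag}: r = {r[lag]:+.3f} ({status})")
```

With only 100 points the sample ACF scatters noticeably around its theoretical values, which is why spikes beyond lag 1 have to be judged against the confidence band rather than against zero.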
34
AR Models: often used in engineering
39
Example 9.5.3. Comparison of various models for
peak electric demand
42
• Recommendations for model selection
• The model type (ARIMA, AR or MA) is best identified from correlograms of the ACF and the PACF.
• The identification procedure can be summarized as follows:
• For AR(1): the ACF decays exponentially, the PACF has a spike at lag 1, and other spikes are not statistically significant, i.e., are contained within the 95% confidence intervals.
• For AR(2): the ACF decays exponentially (indicative of positive model coefficients) or with sinusoidal-exponential decay (indicative of a positive and a negative coefficient), and the PACF has two statistically significant spikes.
• For MA(1): the ACF has one statistically significant spike at lag 1, and the PACF damps down exponentially.
• For MA(2): the ACF has two statistically significant spikes (one at lag 1 and one at lag 2), and the PACF has an exponential decay or a sinusoidal-exponential decay.
• For ARMA(1,1): the ACF and the PACF have spikes at lag 1 with exponential decay.
• Increase the model order until no systematic residual patterns are evident.
• Most time series data from engineering experiments or from physical systems or processes should be adequately modeled by low values of p and q for ARMA(p,q).
• Cross-validation is strongly recommended to avoid over-fitting, and would better reflect the predictive capability of the model.
• The model selection is somewhat subjective.
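
The AR(1) and MA(1) signatures listed above can be illustrated without any random data by working from theoretical ACF values. The sketch below, which is not from the original slides, converts an ACF into a PACF with the Durbin-Levinson recursion; the coefficients phi = 0.7 and theta = 0.8 are hypothetical, and the MA(1) model is assumed to be written with a plus sign on the lagged error term.

```python
def pacf_from_acf(r):
    """Partial autocorrelations at lags 1..len(r) via Durbin-Levinson.

    r[k-1] must hold the autocorrelation at lag k.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} ... phi_{k-1,k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


phi, theta, n_lags = 0.7, 0.8, 5

# Theoretical ACF of AR(1): exponential decay phi**k.
acf_ar1 = [phi ** k for k in range(1, n_lags + 1)]
# Theoretical ACF of MA(1): a single non-zero value at lag 1.
acf_ma1 = [theta / (1 + theta ** 2)] + [0.0] * (n_lags - 1)

pacf_ar1 = pacf_from_acf(acf_ar1)
pacf_ma1 = pacf_from_acf(acf_ma1)
print("AR(1) PACF:", [round(v, 3) for v in pacf_ar1])
print("MA(1) PACF:", [round(v, 3) for v in pacf_ma1])
```

The AR(1) case gives a single PACF spike at lag 1 with a decaying ACF, while the MA(1) case gives a single ACF spike with a PACF that damps down in magnitude, matching the identification rules. With sample estimates the zeros become small values that must be compared against the confidence band.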

43
9.6 ARMAX or transfer function models
Fig. 9.25 Conceptual difference between the
single-variate ARMA approach and the multivariate
ARMAX approach applied to dynamic systems
44
9.6.2 Transfer function modeling of linear
dynamic systems
47
Example 9.6.1

Coefficients: b = 0.00099, 0.00836, 0.00361, 0.00007; d = 1, -0.9397, 0.04664, 0; c = 0.01303; Tin = 25

| Time t (hour) | Tos,t (C) | Qcond(t) (W/m2) | Qcond(t) (W/m2) | Qcond(t) (W/m2) | Qcond(t) (W/m2) |
|---|---|---|---|---|---|
| -2 | 27.2 | 0.000 | 2.228 | 2.357 | 2.365 |
| -1 | 26.1 | 0.000 | 2.011 | 2.126 | 2.132 |
| 0 | 25 | 0.000 | 1.803 | 1.905 | 1.911 |
| 1 | 24.4 | 0.004 | 1.604 | 1.694 | 1.699 |
| 2 | 24.4 | -0.002 | 1.418 | 1.498 | 1.502 |
| 3 | 23.8 | -0.011 | 1.249 | 1.320 | 1.324 |
| 4 | 23.3 | -0.024 | 1.094 | 1.157 | 1.160 |
| 5 | 23.3 | -0.042 | 0.949 | 1.005 | 1.008 |
| 6 | 25 | -0.059 | 0.820 | 0.870 | 0.873 |
| 7 | 27.7 | -0.057 | 0.723 | 0.767 | 0.770 |
| 8 | 30 | -0.023 | 0.669 | 0.708 | 0.710 |
| 9 | 32.7 | 0.040 | 0.654 | 0.688 | 0.690 |
| 10 | 35 | 0.131 | 0.676 | 0.706 | 0.708 |
| 11 | 37.7 | 0.246 | 0.729 | 0.756 | 0.758 |
| 12 | 40 | 0.382 | 0.811 | 0.835 | 0.837 |
| 13 | 53.3 | 0.548 | 0.928 | 0.950 | 0.951 |
| 14 | 64.4 | 0.828 | 1.165 | 1.184 | 1.185 |
| 15 | 72.7 | 1.232 | 1.531 | 1.548 | 1.549 |
| 16 | 75.5 | 1.712 | 1.978 | 1.993 | 1.993 |
| 17 | 72.2 | 2.195 | 2.431 | 2.444 | 2.445 |
| 18 | 58.8 | 2.597 | 2.806 | 2.817 | 2.818 |
| 19 | 30.5 | 2.800 | 2.985 | 2.995 | 2.996 |
| 20 | 29.4 | 2.685 | 2.850 | 2.859 | 2.860 |
| 21 | 28.3 | 2.455 | 2.601 | 2.609 | 2.610 |
| 22 | 27.2 | 2.228 | 2.357 | 2.365 | 2.365 |
| 23 | 26.1 | 2.011 | 2.126 | 2.132 | 2.132 |
| 24 | 25 | 1.803 | 1.905 | 1.911 | 1.911 |
| Average | 37.94583 | | | | 1.577 |
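
The governing equation for this example did not survive in the transcript. A minimal sketch, assuming the standard conduction transfer function recursion Qcond(t) = sum of b_n * Tos(t-n), minus sum of d_n * Qcond(t-n) for n >= 1, minus Tin * c, is shown below. It further assumes that the 24-hour Tos profile repeats day after day and that Qcond starts from zero, so that each of the four Qcond columns corresponds to one more pass through the same day. Under these assumptions the recursion reproduces the first tabulated values (0.004, -0.002 and -0.011 W/m2 at hours 1 to 3).

```python
B = [0.00099, 0.00836, 0.00361, 0.00007]  # b coefficients from the example
D = [1.0, -0.9397, 0.04664, 0.0]          # d coefficients (d_0 = 1)
C_SUM = 0.01303                           # c coefficient from the example
T_IN = 25.0                               # indoor temperature

# Outdoor surface temperature Tos for hours 1..24, taken from the table.
TOS = [24.4, 24.4, 23.8, 23.3, 23.3, 25, 27.7, 30, 32.7, 35, 37.7, 40,
       53.3, 64.4, 72.7, 75.5, 72.2, 58.8, 30.5, 29.4, 28.3, 27.2, 26.1, 25]


def conduction_gain(tos, n_days=4):
    """Hourly conduction gain for n_days repeated passes of the same day."""
    n = len(tos)
    q_hist = [0.0, 0.0, 0.0]  # Qcond at t-1, t-2, t-3 (most recent first)
    days = []
    for _ in range(n_days):
        day = []
        for h in range(n):
            # Periodic indexing: hours before hour 1 wrap to the previous day.
            drive = sum(B[k] * tos[(h - k) % n] for k in range(4))
            memory = sum(D[k] * q_hist[k - 1] for k in range(1, 4))
            q = drive - memory - T_IN * C_SUM
            q_hist = [q] + q_hist[:2]
            day.append(q)
        days.append(day)
    return days


days = conduction_gain(TOS)
print([round(q, 3) for q in days[0][:3]])          # first hours of the first pass
print(round(sum(days[-1]) / len(days[-1]), 3))     # daily average of the last pass
```

Repeating the daily cycle until successive passes agree is a simple way of removing the influence of the arbitrary zero starting values.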
48
9.7 Quality control and process monitoring using
control chart methods
49
Fig. 9.26 The upper and lower three-sigma limits
indicative of the UCL and LCL limits shown on a
normal distribution
Fig. 9.27 The Shewhart control chart with primary
limits
51
9.7.2 Shewhart control charts for variables and attributes
• (a) Shewhart chart for variables: for continuous measurements such as diameter, temperature, flow, as well as derived parameters or quantities such as overall heat loss coefficient, efficiency, ...

52
Table 9.8 Numerical values of the three
coefficients to be used in Eqs. 9.49 and 9.51 for
constructing the three-sigma limits for the mean
and range charts.
54
(ii) range or R charts to control variation to
detect uniformity or consistency of a process.
55
It is suggested that the mean and range charts be used together, since their complementary properties allow better monitoring of a process. Fig. 9.28 illustrates two instances where using both charts reveals behavior which one chart alone would have missed.
Fig. 9.28 The combined advantage provided by the mean and range charts in detecting out-of-control processes. Two instances are shown: (a) where the variability is within limits but the mean is out of control, which is detected by the mean chart, and (b) where the mean is in control but not the variability, which is detected by the range chart.
56
Table 9.10 Data table for the 20 samples consisting of four items and associated mean and range statistics (Example 9.7.1)

| Sample | Item 1 | Item 2 | Item 3 | Item 4 | X-bar | Range (R) |
|---|---|---|---|---|---|---|
| 1 | 1.405 | 1.419 | 1.377 | 1.400 | 1.400 | 0.042 |
| 2 | 1.407 | 1.397 | 1.377 | 1.393 | 1.394 | 0.030 |
| 3 | 1.385 | 1.392 | 1.399 | 1.392 | 1.392 | 0.014 |
| 4 | 1.386 | 1.419 | 1.387 | 1.417 | 1.402 | 0.033 |
| 5 | 1.382 | 1.391 | 1.390 | 1.397 | 1.390 | 0.015 |
| 6 | 1.404 | 1.406 | 1.404 | 1.402 | 1.404 | 0.004 |
| 7 | 1.409 | 1.386 | 1.399 | 1.403 | 1.399 | 0.023 |
| 8 | 1.399 | 1.382 | 1.389 | 1.410 | 1.395 | 0.028 |
| 9 | 1.408 | 1.411 | 1.394 | 1.388 | 1.400 | 0.023 |
| 10 | 1.399 | 1.421 | 1.400 | 1.407 | 1.407 | 0.022 |
| 11 | 1.394 | 1.397 | 1.396 | 1.409 | 1.399 | 0.015 |
| 12 | 1.409 | 1.389 | 1.398 | 1.399 | 1.399 | 0.020 |
| 13 | 1.405 | 1.387 | 1.399 | 1.393 | 1.396 | 0.018 |
| 14 | 1.390 | 1.410 | 1.388 | 1.384 | 1.393 | 0.026 |
| 15 | 1.393 | 1.403 | 1.387 | 1.415 | 1.400 | 0.028 |
| 16 | 1.413 | 1.390 | 1.395 | 1.411 | 1.402 | 0.023 |
| 17 | 1.410 | 1.415 | 1.392 | 1.397 | 1.404 | 0.023 |
| 18 | 1.407 | 1.386 | 1.396 | 1.393 | 1.396 | 0.021 |
| 19 | 1.411 | 1.406 | 1.392 | 1.387 | 1.399 | 0.024 |
| 20 | 1.404 | 1.396 | 1.391 | 1.390 | 1.395 | 0.014 |
| Grand Mean | | | | | 1.398 | 0.022 |
57
The X-bar and R charts are shown in Fig. 9.29.
Note that no point is beyond the control limits
in either plot indicating that the process is in
statistical control.
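
The in-control conclusion can be checked from the X-bar and R columns of Table 9.10. Table 9.8 and Eqs. 9.49 and 9.51 are not in the transcript, so this sketch assumes the usual three-sigma forms (mean chart limits = grand mean plus or minus A2 * R-bar; range chart limits = D3 * R-bar and D4 * R-bar) with the standard published constants for samples of four items, A2 = 0.729, D3 = 0 and D4 = 2.282.

```python
# Sample means and ranges copied from Table 9.10.
X_BAR = [1.400, 1.394, 1.392, 1.402, 1.390, 1.404, 1.399, 1.395, 1.400, 1.407,
         1.399, 1.399, 1.396, 1.393, 1.400, 1.402, 1.404, 1.396, 1.399, 1.395]
RANGE = [0.042, 0.030, 0.014, 0.033, 0.015, 0.004, 0.023, 0.028, 0.023, 0.022,
         0.015, 0.020, 0.018, 0.026, 0.028, 0.023, 0.023, 0.021, 0.024, 0.014]

# Standard control chart constants for subgroups of size 4 (assumed to match
# the values in Table 9.8, which is not reproduced in the transcript).
A2, D3, D4 = 0.729, 0.0, 2.282

grand_mean = sum(X_BAR) / len(X_BAR)
r_bar = sum(RANGE) / len(RANGE)

# Three-sigma limits for the mean chart and the range chart.
ucl_x, lcl_x = grand_mean + A2 * r_bar, grand_mean - A2 * r_bar
ucl_r, lcl_r = D4 * r_bar, D3 * r_bar

mean_in_control = all(lcl_x <= x <= ucl_x for x in X_BAR)
range_in_control = all(lcl_r <= r <= ucl_r for r in RANGE)

print(f"X-bar chart: LCL={lcl_x:.4f}  CL={grand_mean:.4f}  UCL={ucl_x:.4f}")
print(f"R chart:     LCL={lcl_r:.4f}  CL={r_bar:.4f}  UCL={ucl_r:.4f}")
print("in control:", mean_in_control and range_in_control)
```

All 20 sample means and ranges fall inside these limits, which agrees with the statement that the process is in statistical control.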
58
(b) Shewhart control charts for attributes
60
(c) Practical implementation issues
• When a process is in control, the points from each sample plotted on the control chart should fluctuate in a random manner between the UCL and LCL.
• Several rules have been proposed to increase the sensitivity of Shewhart charts. Besides verifying that no points fall outside the control limits, one could check that:
• (i) the number of points above and below the centerline are about equal,
• (ii) there is no steady rise or decrease in a sequence of points,
• (iii) most of the points are close to the centerline rather than hugging the limits,
• (iv) there is no sudden shift in the process mean,
• (v) there is no cyclic behavior.
• Devore and Farnum (2005) and others present an extended list of out-of-control rules which involve counting the number of points falling within different bounds corresponding to the one-, two- and three-sigma lines. However, using such extended rules also increases the possibility of false alarms (or type I errors), and so, rather than being ad hoc, these rules should have some statistical basis.

61
Examples of extended list of out-of-control rules
meant to improve the sensitivity of the
traditional Shewhart control chart (from Devore
and Farnum, 2005 with permission from Thomson
Brooks/Cole)
62
Examples of typical histograms used during
process capability analysis (from Devore and
Farnum, 2005 with permission from Thomson
Brooks/Cole)
63
(b) EWMA monitoring process
65
9.7.3 Statistical process control using time
weighted charts
Instead of mean residuals, one could also use
charts based on other statistics such as the
range, the variable itself, absolute differences,
or successive differences between observations.
68
There are several instances when certain products and processes can be analyzed with more than one method, and there is no clear-cut choice. X-bar and R charts are quite robust: they yield good results even if the data is not normally distributed, while Cusum charts are adversely affected by serial correlation in the data.
Table 9.12 Relative effectiveness of control charts in detecting a change in a process (from Himmelblau 1978)
69
Other Related Analysis Methods