Chapter 9. Analysis of Time Series Data

- 9.1 Basic concepts
- 9.2 General model formulations
- 9.3 Smoothing methods
- 9.4 OLS regression models
- 9.5 Stochastic time series models
- 9.6 ARMAX or transfer function models
- 9.7 Quality control and process monitoring using control chart methods

9.1 Introduction

- Time series data is not merely data collected over time. If this definition were true, then almost any data set would qualify as time series data.
- There must be some sort of ordering, i.e., a relation between successive data observations. Successive observations in time series data are not independent, and their sequence needs to be maintained during the analysis.
- Definition of time series data: a collection of numerical observations arranged in a natural order, with each observation associated with a particular instant of time which provides the ordering.
- A practical way of ascertaining whether the data is to be treated as time series data is to determine whether the analysis results would change if the sequence of the data observations were scrambled.
- The importance of time series analysis is that it provides more insight into time series data, and more accurate modeling and prediction, than classical statistical analysis does, because of the explicit manner in which it treats the model residuals.
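The scrambling check described in the list above can be sketched as follows. This is only an illustration, not the chapter's own procedure: the series is synthetic (an assumed first-order autoregressive process with coefficient 0.8), `lag1_autocorr` is a hypothetical helper name, and the lag-1 autocorrelation stands in for "the analysis result".

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic serially correlated data (assumed model, for illustration only):
# each value carries over 80% of the previous value plus fresh noise.
series = [rng.gauss(0.0, 1.0)]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

scrambled = series[:]   # same values, but the sequence is destroyed
rng.shuffle(scrambled)

r_ordered = lag1_autocorr(series)
r_scrambled = lag1_autocorr(scrambled)
print(f"ordered:   {r_ordered:.2f}")
print(f"scrambled: {r_scrambled:.2f}")
```

If the two values differ markedly, the ordering carries information and the data should be treated as a time series.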

Fig. 9.1 Daily peak and minimum hourly loads over several months for a large electric utility to illustrate the diurnal, the weekday/weekend and the seasonal fluctuations and trends.

9.1.2 Terminology

9.1.3. Basic behavior patterns

- (a) constant process,
- (b) linear trend,
- (c) cyclic variation,
- (d) impulse,
- (e) step function,
- (f) ramp

Much of the challenge in time series analysis is distinguishing these basic behavior patterns when they occur in conjunction. The problem is compounded by the fact that processes may exhibit these patterns at different times.

Fig. 9.3 Different characteristics of time series (from Montgomery and Johnson, 1976, by permission of McGraw-Hill)

9.1.4 Illustrative Data Set

Fig. 9.4 Time series data of electric power demand by quarter (data from Table 9.1)

9.2 General Model Formulation

- How does one model the behavior of the data shown in Example 9.1.1 and use it for extrapolation purposes?
- There are three general time domain approaches:
- (a) Smoothing methods, which are really meant to filter the data in a computationally simple manner. However, they can also be used for extrapolation purposes.
- (b) OLS models, which treat time series data as cross-sectional data but with the time variable accounted for in an explicit manner as an independent variable.
- (c) Stochastic time series models, which add a layer of sophistication by explicitly treating the model residual errors of (b).
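Approach (b) can be sketched with the simplest case, a straight-line trend fitted by OLS with time as the only regressor. The data values and the function name `fit_linear_trend` are hypothetical, chosen for illustration and not taken from Table 9.1.

```python
def fit_linear_trend(y):
    """OLS fit of y_t = a + b*t with t = 1, 2, ..., n.
    Returns the intercept a, the slope b and the residuals."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, residuals

# Hypothetical quarterly values: a rising trend plus a repeating seasonal swing.
y = [52, 48, 55, 60, 56, 52, 59, 64, 60, 56, 63, 68]
a, b, residuals = fit_linear_trend(y)
forecast_next = a + b * (len(y) + 1)   # extrapolate one period ahead
print(f"a = {a:.2f}, b = {b:.2f}, next-period forecast = {forecast_next:.2f}")
```

The residuals returned here are what approach (c) would go on to model; with seasonal data like this they retain a repeating pattern that a straight line cannot capture.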

9.3 Smoothing methods

- Two basic methods are used:
- Arithmetic Moving Average (AMA)
- Exponentially Weighted Moving Average (EWA)
- These methods smooth out the fluctuations in the data, making it easier to discern longer-term trends and thereby allowing future or trend predictions to be made.
- However, though they are useful in predicting mean future values, they do not provide any information about the uncertainty of these predictions since no modeling per se is involved, and so standard errors (which are the cause of forecast errors) cannot be estimated.
- The inability to quantify forecast errors is a serious deficiency.
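A minimal sketch of the two smoothing methods named above, using their common textbook forms since the slide equations are not part of this transcript. The window length, the weight `alpha`, the choice of starting the EWA at the first observation, and the data values are all assumptions made for illustration.

```python
def arithmetic_moving_average(y, window):
    """Trailing mean of the last `window` observations (one value per full window)."""
    return [sum(y[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(y))]

def exponential_moving_average(y, alpha):
    """s_t = alpha*y_t + (1 - alpha)*s_(t-1), started at the first observation."""
    smoothed = [float(y[0])]
    for value in y[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

demand = [52, 48, 55, 60, 56, 52, 59, 64]   # hypothetical values
ama3 = arithmetic_moving_average(demand, 3)
ewa = exponential_moving_average(demand, 0.5)
print(ama3)
print(ewa)
```

A longer window or a smaller `alpha` gives a smoother curve that reacts more slowly to recent changes. Neither function returns an uncertainty estimate, which is the deficiency noted in the list above.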

9.3.1 Arithmetic Moving Average (AMA)

Fig. 9.5 Plots illustrating how two different AMA smoothing methods capture the electric utility load data denoted by MW(meas)

Fig. 9.6. Residuals

9.3.2 Exponentially Weighted Moving Average (EWA)

Fig. 9.7 Plots illustrating how two different EWA smoothing methods capture the electric utility load data denoted by MW(meas)

Fig. 9.8. Residuals

9.4 OLS regression models

Fig. 9.10 Figure illustrating that residuals for the linear trend model (Eq. 9.4.1) are not random (see Example 9.4.1). They exhibit both local systematic scatter as well as an overall pattern as shown by the quadratic trend line. They seem to exhibit larger scatter than the AMA residuals shown in Fig. 9.6.

9.4.2 Trend and seasonal models

Fig. 9.11 Residuals for the linear and seasonal model

9.4.3 Fourier series models for periodic behavior

Fig. 9.12 Measured hourly whole building electric use (excluding cooling and heating related energy) for a large university building in central Texas (from Dhar et al., 1999) from January to June. The data shows distinct diurnal and weekly periodicities but no seasonal trend. Such behavior is referred to as weather-independent data. The residual data series using a pure sinusoidal model (Eq. 9.16) are also shown.

Fig. 9.13 Measured hourly whole building cooling thermal energy use for the same building as in Fig. 9.12 (from Dhar et al., 1999) from January to June. The data shows distinct diurnal and weekly periodicities as well as weather-dependency. The residual data series using a sinusoidal model with weather variables (Eq. 9.18) are also shown.

9.4.4 Interrupted time series

Fig. 9.15 Improvements in OLS model fit when an indicator variable is introduced to capture an abrupt one-time change in energy use in a building (from Ruch et al., 1999).
- (a) Ordinary least squares (OLS) model
- (b) Indicator variable model (IND)

9.5 Stochastic Time Series

- The systematic stochastic component is treated by stochastic time series models such as AR, MA, ARMA, ARIMA and ARMAX, which are linear in both model and parameters and hence simplify the parameter estimation process.
- These models usually allow more accurate predictions than classical regression.
- Once it is deemed that a time series modeling approach is appropriate for the situation at hand, three separate issues are involved, similar to OLS modeling:
- (i) identification of the order of the model (i.e., model structure),
- (ii) estimation of the model parameters (parameter estimation), and
- (iii) ascertaining the uncertainty in the forecasts.
- Note that time series models may not always be superior to the standard OLS methods.

(a) Autocorrelation function (ACF)

Usually, there is no need to fit a functional equation to the ACF. Instead, a graphical representation called the correlogram is a useful means both of gaining insight into model development and of evaluating stationarity.
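The correlogram is simply a plot of sample autocorrelation coefficients against lag. The sketch below computes those coefficients with the usual sample estimator and flags the ones outside an approximate 95% band of plus or minus 1.96 divided by the square root of n, a commonly used white-noise approximation. The data (a pure linear trend) and the function name `sample_acf` are assumptions for illustration.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Hypothetical non-stationary series: a pure linear trend.
trend = [float(t) for t in range(100)]
acf = sample_acf(trend, 10)
bound = 1.96 / math.sqrt(len(trend))   # approximate 95% band for white noise

for k, r in enumerate(acf):
    flag = "significant" if abs(r) > bound else ""
    print(f"lag {k:2d}  r = {r:5.2f}  {flag}")
```

For a trending series such as this one the coefficients stay large over many lags instead of dying out, which is the signature of non-stationarity in a correlogram.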

Fig. 9.17 Sample correlogram for a time series which is non-stationary since the ACF does not asymptote to zero

9.5.2.3 Detrending data by differencing

Table columns: Function, First differencing, Second differencing

9.5.3 ARIMA models

- The ARIMA(p,d,q) (AutoRegressive Integrated Moving Average) model formulation is a general linear framework consisting of three sub-models:
- the autoregressive (AR) part is meant to capture the memory of the system (done via a linear model of p past model residuals),
- the integrated (I) part is meant to make the series stationary by differencing,
- the moving average (MA) part is meant to capture the shocks on the system (done by using a linear function of q past white noise errors).
- Unlike OLS-type models, ARMA models require relatively long data series for parameter estimation (a minimum of about 50 data points, and preferably 100 data points or more).
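The role of the integrated (I) part can be illustrated with a short differencing sketch. The function name `difference` and the two trend series are hypothetical; the point is only that one round of differencing removes a linear trend and two rounds remove a quadratic one.

```python
def difference(x, order=1):
    """Apply first differencing `order` times: w_t = x_t - x_(t-1)."""
    for _ in range(order):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

linear = [3 + 2 * t for t in range(8)]   # hypothetical linear trend
quadratic = [t * t for t in range(8)]    # hypothetical quadratic trend

print(difference(linear, 1))      # constant: the linear trend is removed
print(difference(quadratic, 1))   # still trending after one differencing
print(difference(quadratic, 2))   # constant after second differencing
```

Each round of differencing shortens the series by one point, so d in ARIMA(p,d,q) is normally kept as small as possible.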

MA Models

Fig. 9.20a shows an example of a MA(1) process with mean 10, where a set of 100 data points has been generated in a spreadsheet program using the model shown, with a random number generator for the white noise term. Since this is a first-order model, the ACF should have only one significant value (this is seen in Fig. 9.20b, where the ACF values at greater lags fall inside the 95% confidence intervals). Ideally, there should be only one spike, at lag k = 1, but the random noise introduced in the synthetic data obfuscates the estimation, and spikes appear at other lags which, however, are statistically insignificant.
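The experiment described above can be reproduced in a few lines instead of a spreadsheet. The mean of 10, the 100 points and the Normal(0,1) noise come from the text; the MA coefficient of 0.8 and the sign convention Z_t = mean + e_t + theta*e_(t-1) are assumptions, since the model equation is not part of this transcript.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

MEAN = 10.0    # process mean stated in the text
THETA = 0.8    # assumed MA coefficient (not given in the transcript)
N = 100        # number of points stated in the text

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(N + 1)]   # Normal(0,1) white noise
# Assumed sign convention: Z_t = mean + e_t + theta * e_(t-1)
series = [MEAN + noise[t] + THETA * noise[t - 1] for t in range(1, N + 1)]

acf = sample_acf(series, 10)
bound = 1.96 / math.sqrt(N)                   # approximate 95% band
theoretical_r1 = THETA / (1.0 + THETA ** 2)   # lag-1 ACF of an MA(1) process
significant_lags = [k for k in range(1, 11) if abs(acf[k]) > bound]

print(f"theoretical r1 = {theoretical_r1:.3f}, sample r1 = {acf[1]:.3f}")
print("lags outside the 95% band:", significant_lags)
```

Changing the seed gives a different realization; occasional small spikes at lags other than 1 are the sampling noise the text refers to.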

Fig. 9.20 One realization of a MA(1) process, along with the corresponding ACF and PACF, with the error term being Normal(0,1).

AR Models: often used in engineering

Example 9.5.3. Comparison of various models for peak electric demand

- Recommendations for model selection
- The model type (ARIMA, AR or MA) is best identified from correlograms of the ACF and the PACF.
- The identification procedure can be summarized as follows:
- For AR(1): the ACF decays exponentially, the PACF has a spike at lag 1, and other spikes are not statistically significant, i.e., are contained within the 95% confidence intervals.
- For AR(2): the ACF decays exponentially (indicative of positive model coefficients) or with sinusoidal-exponential decay (indicative of a positive and a negative coefficient), and the PACF has two statistically significant spikes.
- For MA(1): the ACF has one statistically significant spike at lag 1 and the PACF damps down exponentially.
- For MA(2): the ACF has two statistically significant spikes (one at lag 1 and one at lag 2), and the PACF has an exponential decay or a sinusoidal-exponential decay.
- For ARMA(1,1): the ACF and PACF have spikes at lag 1 with exponential decay.
- Usually, it is better to start with the lowest values of p and q for ARMA(p,q).
- Increase the model order until no systematic residual patterns are evident.
- Most time series data from engineering experiments or from physical systems or processes should be adequately modeled by low orders, i.e., about 1-3 terms.
- Cross-validation is strongly recommended to avoid over-fitting, and it better reflects the predictive capability of the model.
- The model selection is somewhat subjective.
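The AR(1) and MA(1) signatures listed above can be checked numerically. The sketch below derives the PACF from a given ACF with the Durbin-Levinson recursion (a standard method, not necessarily the one used in the chapter) and applies it to the theoretical ACFs of an AR(1) and an MA(1) process. The coefficient values 0.6 and 0.8 and the function name `pacf_from_acf` are assumptions for illustration.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..K from autocorrelations r[0..K]
    (with r[0] = 1), using the Durbin-Levinson recursion."""
    max_lag = len(r) - 1
    pacf = []
    prev = []   # coefficients phi_(k-1,1) .. phi_(k-1,k-1) of the previous order
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.6: r_k = 0.6**k
acf_ar1 = [0.6 ** k for k in range(6)]
# Theoretical ACF of an MA(1) process with coefficient 0.8: one spike at lag 1
theta = 0.8
acf_ma1 = [1.0, theta / (1.0 + theta ** 2), 0.0, 0.0, 0.0, 0.0]

pacf_ar1 = pacf_from_acf(acf_ar1)
pacf_ma1 = pacf_from_acf(acf_ma1)
print("AR(1) PACF:", [round(v, 3) for v in pacf_ar1])
print("MA(1) PACF:", [round(v, 3) for v in pacf_ma1])
```

The AR(1) case shows a single PACF spike at lag 1, while the MA(1) case shows a single ACF spike with a PACF that damps down. With sample data the zeros become small values inside the confidence band.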

9.6 ARMAX or transfer function models

Fig. 9.25 Conceptual difference between the single-variate ARMA approach and the multivariate ARMAX approach applied to dynamic systems

9.6.2 Transfer function modeling of linear dynamic systems

Example 9.6.1

Tin = 25
b:  0.00099   0.00836   0.00361   0.00007
d:  1        -0.9397    0.04664   0
c:  0.01303

time (t)   Tos,t      Qcond(t)   Qcond(t)   Qcond(t)   Qcond(t)
hour       °C         W/m2       W/m2       W/m2       W/m2
-2         27.2        0.000      2.228      2.357      2.365
-1         26.1        0.000      2.011      2.126      2.132
0          25          0.000      1.803      1.905      1.911
1          24.4        0.004      1.604      1.694      1.699
2          24.4       -0.002      1.418      1.498      1.502
3          23.8       -0.011      1.249      1.320      1.324
4          23.3       -0.024      1.094      1.157      1.160
5          23.3       -0.042      0.949      1.005      1.008
6          25         -0.059      0.820      0.870      0.873
7          27.7       -0.057      0.723      0.767      0.770
8          30         -0.023      0.669      0.708      0.710
9          32.7        0.040      0.654      0.688      0.690
10         35          0.131      0.676      0.706      0.708
11         37.7        0.246      0.729      0.756      0.758
12         40          0.382      0.811      0.835      0.837
13         53.3        0.548      0.928      0.950      0.951
14         64.4        0.828      1.165      1.184      1.185
15         72.7        1.232      1.531      1.548      1.549
16         75.5        1.712      1.978      1.993      1.993
17         72.2        2.195      2.431      2.444      2.445
18         58.8        2.597      2.806      2.817      2.818
19         30.5        2.800      2.985      2.995      2.996
20         29.4        2.685      2.850      2.859      2.860
21         28.3        2.455      2.601      2.609      2.610
22         27.2        2.228      2.357      2.365      2.365
23         26.1        2.011      2.126      2.132      2.132
24         25          1.803      1.905      1.911      1.911

Average: Tos = 37.94583 °C, Qcond = 1.577 W/m2
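The numbers in Example 9.6.1 can be retraced with a short recursion. The governing equation did not survive in this transcript, so the form used below is an assumption: Qcond(t) = sum over n = 0..3 of b_n*Tos(t-n), minus sum over n = 1..3 of d_n*Qcond(t-n), minus Tin*c. With the tabulated coefficients this form gives values close to the first tabulated entries (0.004, -0.002 and -0.011 at hours 1 to 3). Repeating the 24-hour pass, with hours 22 to 24 of one pass seeding hours -2 to 0 of the next, appears to be what the four Qcond columns show. The function names are hypothetical.

```python
B = [0.00099, 0.00836, 0.00361, 0.00007]   # b_0 .. b_3 from the example
D = [1.0, -0.9397, 0.04664, 0.0]           # d_0 .. d_3 from the example
C = 0.01303                                # c value listed in the example
T_IN = 25.0                                # indoor temperature Tin, deg C

# Tos for hours -2 .. 24, copied from the table
T_OS = [27.2, 26.1, 25.0,
        24.4, 24.4, 23.8, 23.3, 23.3, 25.0, 27.7, 30.0, 32.7,
        35.0, 37.7, 40.0, 53.3, 64.4, 72.7, 75.5, 72.2, 58.8,
        30.5, 29.4, 28.3, 27.2, 26.1, 25.0]
OFFSET = 2   # list index of hour t is t + OFFSET

def one_pass(q_start):
    """One 24-hour pass of the assumed recursion.
    q_start holds Qcond at hours -2, -1, 0; returns Qcond for hours -2 .. 24."""
    q = list(q_start) + [0.0] * 24
    for hour in range(1, 25):
        i = hour + OFFSET
        drive = sum(B[n] * T_OS[i - n] for n in range(4))
        memory = sum(D[n] * q[i - n] for n in range(1, 4))
        q[i] = drive - memory - T_IN * C
    return q

def run_passes(n_passes):
    """Repeat the daily pass, seeding each pass with hours 22-24 of the last."""
    q_start = [0.0, 0.0, 0.0]
    q = []
    for _ in range(n_passes):
        q = one_pass(q_start)
        q_start = q[-3:]
    return q

first = run_passes(1)
converged = run_passes(8)
daily = converged[OFFSET + 1:]              # hours 1 .. 24
daily_mean = sum(daily) / len(daily)
peak_hour = 1 + daily.index(max(daily))
print(f"first pass, hours 1-3: {[round(v, 3) for v in first[3:6]]}")
print(f"converged daily mean = {daily_mean:.3f} W/m2, peak at hour {peak_hour}")
```

The delay between the peak in Tos and the peak in Qcond reflects the memory introduced by the d coefficients.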

9.7 Quality control and process monitoring using control chart methods

Fig. 9.26 The upper and lower three-sigma limits indicative of the UCL and LCL limits shown on a normal distribution

Fig. 9.27 The Shewhart control chart with primary limits

9.7.2 Shewhart control charts for variables and attributes

- (a) Shewhart chart for variables: for continuous measurements such as diameter, temperature, flow, as well as derived parameters or quantities such as overall heat loss coefficient, efficiency,

Table 9.8 Numerical values of the three coefficients to be used in Eqs. 9.49 and 9.51 for constructing the three-sigma limits for the mean and range charts.

(ii) Range or R charts are used to control variation, i.e., to detect the uniformity or consistency of a process.

It is suggested that the mean and range charts be used together since their complementary properties allow better monitoring of a process. Fig. 9.28 illustrates two instances where using both charts reveals behavior which one chart alone would have missed.

Fig. 9.28 The combined advantage provided by the mean and range charts in detecting out-of-control processes. Two instances are shown: (a) where the variability is within limits but the mean is out of control, which is detected by the mean chart, and (b) where the mean is in control but not the variability, which is detected by the range chart.

Table 9.10 Data table for the 20 samples consisting of four items and associated mean and range statistics (Example 9.7.1)

Sample       Item 1   Item 2   Item 3   Item 4   X-bar    Range (R)
1            1.405    1.419    1.377    1.400    1.400    0.042
2            1.407    1.397    1.377    1.393    1.394    0.030
3            1.385    1.392    1.399    1.392    1.392    0.014
4            1.386    1.419    1.387    1.417    1.402    0.033
5            1.382    1.391    1.390    1.397    1.390    0.015
6            1.404    1.406    1.404    1.402    1.404    0.004
7            1.409    1.386    1.399    1.403    1.399    0.023
8            1.399    1.382    1.389    1.410    1.395    0.028
9            1.408    1.411    1.394    1.388    1.400    0.023
10           1.399    1.421    1.400    1.407    1.407    0.022
11           1.394    1.397    1.396    1.409    1.399    0.015
12           1.409    1.389    1.398    1.399    1.399    0.020
13           1.405    1.387    1.399    1.393    1.396    0.018
14           1.390    1.410    1.388    1.384    1.393    0.026
15           1.393    1.403    1.387    1.415    1.400    0.028
16           1.413    1.390    1.395    1.411    1.402    0.023
17           1.410    1.415    1.392    1.397    1.404    0.023
18           1.407    1.386    1.396    1.393    1.396    0.021
19           1.411    1.406    1.392    1.387    1.399    0.024
20           1.404    1.396    1.391    1.390    1.395    0.014
Grand Mean                                       1.398    0.022

The X-bar and R charts are shown in Fig. 9.29. Note that no point is beyond the control limits in either plot, indicating that the process is in statistical control.
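The conclusion above can be checked directly from the data in Table 9.10. The sketch below uses the widely tabulated three-sigma factors for subgroups of four items (A2 = 0.729, D3 = 0, D4 = 2.282). Table 9.8 and Eqs. 9.49 and 9.51 are not reproduced in this transcript, so treating them as the usual X-bar and R chart limits with these factors is an assumption.

```python
SAMPLES = [
    [1.405, 1.419, 1.377, 1.400], [1.407, 1.397, 1.377, 1.393],
    [1.385, 1.392, 1.399, 1.392], [1.386, 1.419, 1.387, 1.417],
    [1.382, 1.391, 1.390, 1.397], [1.404, 1.406, 1.404, 1.402],
    [1.409, 1.386, 1.399, 1.403], [1.399, 1.382, 1.389, 1.410],
    [1.408, 1.411, 1.394, 1.388], [1.399, 1.421, 1.400, 1.407],
    [1.394, 1.397, 1.396, 1.409], [1.409, 1.389, 1.398, 1.399],
    [1.405, 1.387, 1.399, 1.393], [1.390, 1.410, 1.388, 1.384],
    [1.393, 1.403, 1.387, 1.415], [1.413, 1.390, 1.395, 1.411],
    [1.410, 1.415, 1.392, 1.397], [1.407, 1.386, 1.396, 1.393],
    [1.411, 1.406, 1.392, 1.387], [1.404, 1.396, 1.391, 1.390],
]

# Assumed standard three-sigma factors for a subgroup size of 4
A2, D3, D4 = 0.729, 0.0, 2.282

means = [sum(s) / len(s) for s in SAMPLES]
ranges = [max(s) - min(s) for s in SAMPLES]
grand_mean = sum(means) / len(means)
r_bar = sum(ranges) / len(ranges)

# Mean (X-bar) chart limits
ucl_x = grand_mean + A2 * r_bar
lcl_x = grand_mean - A2 * r_bar
# Range (R) chart limits
ucl_r = D4 * r_bar
lcl_r = D3 * r_bar

in_control = (all(lcl_x <= m <= ucl_x for m in means)
              and all(lcl_r <= r <= ucl_r for r in ranges))

print(f"grand mean = {grand_mean:.3f}, mean range = {r_bar:.3f}")
print(f"X-bar limits: {lcl_x:.3f} to {ucl_x:.3f}")
print(f"R limits:     {lcl_r:.3f} to {ucl_r:.3f}")
print("all points within limits:", in_control)
```

Under these assumed factors every sample mean and range falls inside its limits, which agrees with the statement that the process is in statistical control.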

(b) Shewhart control charts for attributes

(c) Practical implementation issues

- When a process is in control, the points from each sample plotted on the control chart should fluctuate in a random manner between the UCL and LCL.
- Several rules have been proposed to increase the sensitivity of Shewhart charts. Besides checking that no points fall outside the control limits, one could check whether:
- (i) the numbers of points above and below the centerline are about equal,
- (ii) there is no steady rise or decrease in a sequence of points,
- (iii) most of the points are close to the centerline rather than hugging the limits,
- (iv) there is a sudden shift in the process mean,
- (v) there is cyclic behavior.
- Devore and Farnum (2005) and others present an extended list of out-of-control rules involving counting the number of points falling within different bounds corresponding to the one, two and three sigma lines. However, using such extended rules also increases the possibility of false alarms (type I errors), and so, rather than being ad hoc, these rules should have some statistical basis.
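Checks (i) and (ii) in the list above lend themselves to simple counting. The sketch below applies them to the sample means of Table 9.10. The function names are hypothetical, and no alarm threshold is built in, because the text does not specify one.

```python
def centerline_counts(points, center):
    """Number of points strictly above and strictly below the centerline."""
    above = sum(1 for p in points if p > center)
    below = sum(1 for p in points if p < center)
    return above, below

def longest_monotone_run(points):
    """Length (in points) of the longest strictly rising or strictly falling stretch."""
    longest = 1
    run = 1
    direction = 0
    for prev, cur in zip(points, points[1:]):
        step = (cur > prev) - (cur < prev)   # +1 rising, -1 falling, 0 equal
        if step != 0 and step == direction:
            run += 1        # the stretch continues in the same direction
        elif step != 0:
            run = 2         # a new stretch starts with these two points
        else:
            run = 1         # equal values break the stretch
        direction = step
        longest = max(longest, run)
    return longest

# Sample means (X-bar column) and grand mean from Table 9.10
x_bar = [1.400, 1.394, 1.392, 1.402, 1.390, 1.404, 1.399, 1.395, 1.400, 1.407,
         1.399, 1.399, 1.396, 1.393, 1.400, 1.402, 1.404, 1.396, 1.399, 1.395]
center = 1.398

above, below = centerline_counts(x_bar, center)
longest = longest_monotone_run(x_bar)
print(f"above centerline: {above}, below: {below}")
print(f"longest steady rise or fall: {longest} points")
```

How lopsided a split or how long a run should raise an alarm is exactly the kind of choice that, as noted above, needs a statistical basis rather than an ad hoc threshold.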

Examples of extended list of out-of-control rules meant to improve the sensitivity of the traditional Shewhart control chart (from Devore and Farnum, 2005, with permission from Thomson Brooks/Cole)

Examples of typical histograms used during process capability analysis (from Devore and Farnum, 2005, with permission from Thomson Brooks/Cole)

(b) EWMA monitoring process

9.7.3 Statistical process control using time weighted charts

Instead of mean residuals, one could also use charts based on other statistics such as the range, the variable itself, absolute differences, or successive differences between observations.

There are several instances when certain products and processes can be analyzed with more than one method, and there is no clear-cut choice. X-bar and R charts are quite robust: they yield good results even if the data is not normally distributed, while Cusum charts are adversely affected by serial correlation in the data.

Table 9.12 Relative effectiveness of control charts in detecting a change in a process (from Himmelblau, 1978)

Other Related Analysis Methods