1
Chapter 5
  • Forecasting with Multiple Regression

2
Selecting Independent Variables
  • Ideally, we want each of our independent
    variables to be correlated with Y, but not
    highly correlated with the other independent
    variables.
  • The reason is that we do not want our RHS
    variables to overlap too much in what they measure.
  • If our RHS variables are highly correlated with one
    another, they end up measuring the same part of
    the variance in the dependent variable.
  • This causes a problem called multicollinearity.

3
Difficulties in finding RHS variables
  • Sometimes it's difficult to find an RHS variable
    that measures exactly what we want.
  • Average interest rates for all installment loans
    might be similar enough to serve as a stand-in for
    mortgage interest rates.
  • Sometimes it's impossible, and we need to be more
    creative.
  • Demand for housing would be good to know if you
    are trying to buy or sell a house, but how do we
    proxy for demand if we are trying to predict
    sales?

4
Looking at Our First Multi-Variate Regression
  • We are going to forecast seasonally adjusted new
    houses sold (NHS).
  • Let's first look at the bivariate model using
    interest rates (IR) for comparison later on.
  • NHS = b0 + b1(IR)
  • (+/-?)
  • Data

5
Our bivariate estimate of b
  • NHSF = 5,543.74 - 415.90(IR)
    (a fitting sketch follows below)
  • When we were looking at a bivariate model as a
    function of T, we just kept adding 1 to the last
    T for the forecast.
  • What do we do here?
  • We need to forecast interest rates (IR) to get
    our forecast of NHS.
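
A minimal sketch of how such a bivariate estimate could be reproduced outside
ForecastX, assuming a pandas DataFrame df with columns NHS and IR (illustrative
names, not part of the original exercise):

```python
# Sketch only: assumes a DataFrame with columns "NHS" and "IR".
import pandas as pd
import statsmodels.api as sm

def fit_bivariate(df: pd.DataFrame):
    """Estimate NHS = b0 + b1*IR by ordinary least squares."""
    X = sm.add_constant(df["IR"])        # adds the intercept column for b0
    return sm.OLS(df["NHS"], X).fit()

# Usage (df is assumed to hold the historical data):
# results = fit_bivariate(df)
# print(results.params)      # b0 (const) and b1 (IR)
# print(results.rsquared)    # in-sample fit
```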

6
Forecasting our RHS to Forecast our LHS
  • What model should we use to forecast our RHS
    variable?
  • In reality, you can choose from all the models
    that are available to forecast.
  • What does ForecastX use in the simple bivariate
    case?
  • Simple Naïve Model
  • This is probably not the best you can do.

7
Holt vs Naïve (bivariate)
Note: the in-sample estimates come from the
bivariate model and the actual IR. Only when the IR
forecasts differ do the forecasts for NHS differ.
8
Which is the better choice in this case?
  • The RMSE in the holdout period (2004) is lower
    for the Holt forecasted IR than for the Naïve
    forecasted IR.

9
Forecasting our RHS to Forecast our LHS
  • Typically, we rely on one of the time-trend
    methods, other forecasts, or an expert opinion.
  • In this exercise, we use Holt's exponential
    model, but we aren't limited to that. (See the
    sketch below.)
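
A hedged sketch of this step, using the Holt class from statsmodels as a
stand-in for ForecastX's Holt's exponential model (the series name ir and the
12-period horizon are assumptions):

```python
# Sketch only: `ir` is assumed to be a pandas Series of historical interest rates.
import pandas as pd
from statsmodels.tsa.holtwinters import Holt

def forecast_ir(ir: pd.Series, horizon: int = 12) -> pd.Series:
    """Forecast IR with Holt's exponential (multiplicative-trend) smoothing."""
    fit = Holt(ir, exponential=True).fit()   # smoothing parameters chosen by optimizer
    return fit.forecast(horizon)

# Usage: plug the forecasted IR into the fitted bivariate model,
# nhs_forecast = b0 + b1 * forecast_ir(ir), with b0 and b1 from the regression above.
```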

10
Multivariate Estimated Regression Model
  • Ŷ = b0 + b1X1 + b2X2 + .... + bkXk, where
  • Ŷ is the forecast value of the dependent variable
  • X1 . . . Xk are explanatory variables
  • b1 . . . bk show the change in Ŷ for a one-unit
    change in X1, X2, etc., when all other
    explanatory variables are held constant.

11
Now we look at NHS with 2 RHS variables
(multivariate)
  • Our second model incorporates per capita
    disposable personal income (DPIPC) and the
    interest rate (IR).
  • NHSF2 = b0 + b1(DPIPC) + b2(IR)
    (a fitting sketch follows below)
  • (+/-?)
  • Data
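
As with the bivariate case, a hedged sketch of how this two-variable model
could be estimated, assuming a DataFrame df with columns NHS, DPIPC, and IR
(illustrative names):

```python
# Sketch only: assumes a DataFrame with columns "NHS", "DPIPC", and "IR".
import pandas as pd
import statsmodels.api as sm

def fit_nhsf2(df: pd.DataFrame):
    """Estimate NHSF2 = b0 + b1*DPIPC + b2*IR by OLS."""
    X = sm.add_constant(df[["DPIPC", "IR"]])   # intercept plus the two RHS variables
    return sm.OLS(df["NHS"], X).fit()

# results = fit_nhsf2(df)
# print(results.params)          # b0, b1 (DPIPC), b2 (IR)
# print(results.tvalues)         # statistical significance of each coefficient
# print(results.rsquared_adj)    # adjusted R-sq for comparison with the bivariate model
```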

12
Results
NHSF2 = -324.33 + 0.17(DPIPC) - 168.13(IR)
13
The multivariate forecast
  • NHSF2 = -324.33 + 0.17(DPIPC) - 168.13(IR)

[Table: NHS forecasts using Holt-forecasted DPIPC and IR,
with the holdout RMSE compared against the bivariate model]
14
Evaluating the models (three questions)
  • Do the signs on the coefficients make sense?
  • Are they statistically significant?
  • How much of the variance is explained?

15
Evaluating the models
  • Both beta coefficients have the expected signs
    and are statistically significant.
  • Adding the new variable DPIPC adds to the
    explanatory power of the model.
  • Additionally:
  • RMSE decreased in-sample and in the holdout.
  • R-sq increased from 62.41% in the bivariate model
    to 92.03% in the multivariate model. But, more
    importantly, adjusted R-sq went up as well, from
    61.50% to 91.68%.
  • Conclusion: Adding the second RHS variable
    improved the model and increased its accuracy.
    We also know the relationship between IR, DPIPC,
    and seasonally adjusted NHS.
  • What do we do to get actual UN-adjusted sales?
    (Just checking.)

16
The Regression Plane
17
Multicollinearity at Work
  • Let's now consider a multivariate model with
    three RHS variables, but two are very similar.
  • NHSF = b0 + b1(DPIPC) + b2(GDP) + b3(IR)
  • (+/-?)
  • Data

18
Multicollinear Regression
GDP and DPIPC are highly correlated.
DPIPC changes sign and becomes insignificant when
GDP is added.
19
Checking the Correlation between the RHS variables
From a statistical standpoint, GDP and DPIPC are
essentially the same thing: they move together
99% of the time. (A correlation check is sketched below.)
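
A quick way to check this kind of overlap is a pairwise correlation matrix of
the candidate RHS variables; the sketch below assumes a DataFrame df holding
DPIPC, GDP, and IR:

```python
# Sketch only: assumes a DataFrame with columns "DPIPC", "GDP", and "IR".
import pandas as pd

def rhs_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlations among candidate RHS variables.

    Values near +/-1 (as with GDP and DPIPC here) warn of multicollinearity.
    """
    return df[["DPIPC", "GDP", "IR"]].corr()

# print(rhs_correlations(df))
```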
20
New stats to consider in evaluating a model
  • Adjusted R-Squared: Adding one more independent
    variable always increases R-sq, so adjusted R-sq
    factors in the loss in DF. For multivariate
    models, we typically use only the adjusted R-sq
    to evaluate fit.
  • F-test: tests the overall significance of the
    regression. It simultaneously tests the
    hypothesis that the regression has NO
    explanatory power.

21
Adjusted R-Sq
Adjusted R-Sq is scaled down by the loss in DF
(see the formula sketched below).
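
The adjustment itself is not written out on the slide; the usual formula, with
n observations and k RHS variables, is:

```latex
\bar{R}^2 \;=\; 1 - \left(1 - R^2\right)\frac{n - 1}{n - (k + 1)}
```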
22
Way of interpreting the Adjusted R-Sq
  • The way R-Sq is calculated, adding more RHS
    variables will always improve the fit, even if
    they are insignificant.
  • The adjusted R-sq declines if the added RHS
    variable does not predict well. It goes up if
    the added RHS var does predict well.
  • So, we can use the adjusted R-sq to determine
    whether adding a particular RHS variable is
    beneficial to our regression or not (i.e.,
    whether the added explanatory power offsets the
    loss in DF).

23
Rule of Thumb for getting rid of RHS vars that
could be causing MC problems
  • if |t-stat| < 1, removing the variable increases
    adjusted R-sq.
  • if |t-stat| > 1, removing the variable decreases
    adjusted R-sq.

24
One Technique for Selecting Independent
Variables for forecasting
  • Adjusted R-sq Criterion
  • Choose the set of variables that maximizes
    adjusted R-sq (minimizes root-mean-square error).
  • If removing the variable in question causes
    adjusted R-sq to increase, leave it out.
  • If removing the variable in question causes
    adjusted R-sq to decrease, leave it in.

25
The F-test
  • For the multiple regression with 2 RHS vars,
    K = 2. We used 48 observations, so
    n - (K + 1) = 45. We then go to our F-table.

26
The F-distribution
If our calculated F is larger than this critical
value, we reject the null that all the slope
coefficients (b's) are jointly zero.
27
F-statistic (F calculated) is related to R-sq
  • R-sq = Explained Variation in Y / Total Variation
    in Y
  • F = Explained Variation / Unexplained Variation
  • As the Explained Variation in Y rises, R-sq rises
    and F rises.
  • The F-test is used to test whether R-sq is
    statistically different from zero. (The formula
    is sketched below.)
  • H0: R-sq = 0
  • H1: R-sq ≠ 0
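
Written out, the calculated F in terms of R-squared, with k RHS variables and
n observations (so the denominator df matches the n - (K + 1) = 45 used on the
earlier slide), is:

```latex
F \;=\; \frac{R^2 / k}{\left(1 - R^2\right) / \left(n - (k + 1)\right)}
```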

28
Handling Seasonality in a MV Regression
  • Dummy Variables

29
What are Dummy Variables?
  • They are constructed RHS variables that act like
    switches. They turn on and off when something
    is true or false.
  • They often are used in situations when no
    continuous measure is available or where we
    expect there to be discrete differences in
    effects (like seasons).
  • To estimate a model with seasonality, we need to
    CONSTRUCT season dummy variables.

30
Constructing the Seasonal Dummy
  • In all the data sets we have seen thus far, we
    have had data on month and year.
  • To construct month dummies, we need to construct
    11 (NOT 12) new columns of data. Each month
    except one base month gets its own column. (A
    construction sketch follows below.)
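
A hedged sketch of that construction with pandas (the column name month is an
assumption); drop_first=True omits one month so the dummies are not perfectly
collinear with the intercept:

```python
# Sketch only: assumes a DataFrame with a "month" column (1-12 or month names).
import pandas as pd

def add_month_dummies(df: pd.DataFrame) -> pd.DataFrame:
    """Append 11 month dummies; one month is dropped as the base case."""
    dummies = pd.get_dummies(df["month"], prefix="m", drop_first=True)
    return pd.concat([df, dummies], axis=1)
```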

31
Data Construction
Here we have made dummies for all 12 months, but we
need to drop one month before using them in a
regression. Otherwise: perfect collinearity!
32
Ungraded Homework (Ch. 4, 7)
  • Provide a quarterly forecast of sales.
  • Prepare a time series plot of the data and
    explain what you see. Is a simple linear trend
    forecast useful in this case? Estimate the trend
    and address parts A-E.
  • Data

33
7, Part A
Although there's probably some seasonality, there
is also definitely a trend, and it looks fairly
linear and positive.
34
7, Parts B & C: Estimate the trend line; does it
have a significant trend?
C. Yes, significant.
35
7, Part D: Forecast 4 qtrs of 2004
  • T and the resulting forecast:
  • Sales = 88,741.01 + 5,362.62(41) = 308,608.43  (Mar-2004)
  • Sales = 88,741.01 + 5,362.62(42) = 313,971.16  (Jun-2004)
  • Sales = 88,741.01 + 5,362.62(43) = 319,333.78  (Sep-2004)
  • Sales = 88,741.01 + 5,362.62(44) = 324,696.41  (Dec-2004)

36
7, Part E: Accuracy for 2004
RMSE for the four quarters of 2004 is 19,571, or
about 5.9% of average quarterly sales for 2004.
(An RMSE sketch follows below.)
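
For reference, a short sketch of the holdout RMSE calculation (the actual and
forecast arrays for the four quarters of 2004 are assumed inputs):

```python
# Sketch only: `actual` and `forecast` are the holdout values for 2004.
import numpy as np

def rmse(actual, forecast) -> float:
    """Root-mean-square error over the holdout observations."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

# rmse(actual_2004, forecast_2004)  # about 19,571 for the four quarters of 2004
```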
37
Ungraded Homework (Ch. 4, 8)
  • Use the unemployment rate to estimate sales.
    Unemployment Data
  • Part A does not actually ask you to do
    anything; go figure!

38
8, Part B: Plot a scattergram of sales vs. the
regional unemployment rate
There might be some positive relationship, which
is kinda odd. Typically, you would expect sales
to fall if unemployment rises.
39
8, Part C: Bivariate regression of Sales as a
function of the Regional Unemployment Rate
How did we do here?
40
8, Part D: Take a memo
  • Dear Ms. Lynch,
  • The regression of Sales on the Northern Regional
    Unemployment Rate does not provide the kind of
    accuracy we were seeking. Although the estimated
    effect of the unemployment rate on sales was
    statistically significant, the model only
    explained about 10% (R-sq = 9.99%) of the variance
    in the dependent variable, and the sign was not
    what we expected. Furthermore, the MAPE was more
    than 31%, indicating a poor fit to the sample
    data. Finally, our Theil's U indicates that we
    would be much better off using a simple naïve
    forecast rather than one based on the
    unemployment rate.
  • Your humble servant,
  • Flippin Hades Turwilliger

41
8, Part E: Forecast using the model and the
forecast for the region's unemployment rate
(FNRUR)
  • Sales = 73,222.19 + 15,369.38(NRUR), applied to
    each quarter of 2004:
  • 190,029.50 = 73,222.19 + 15,369.38(7.6)  Mar-2004
  • 191,566.44 = 73,222.19 + 15,369.38(7.7)  Jun-2004
  • 188,492.56 = 73,222.19 + 15,369.38(7.5)  Sep-2004
  • 186,955.62 = 73,222.19 + 15,369.38(7.4)  Dec-2004

42
8, Part F: Calculate RMSE and compare with the
earlier model using the time trend
[Table: RMSE of the unemployment regression vs. the
trend regression]
43
8, Part G: Scattergram of Sales and Income
Now, that looks more like what we want! It looks
like there is a relationship between income and
sales, which makes sense, right?!
44
8, Parts H & I: Estimate the Bivariate Regression
of Sales on Income
  • Sales = 76,808.83 + 120.11(Income)
  • Dear Ms. Lynch,
  • Using income, rather than the unemployment rate,
    substantially improved our estimates of sales.
    We obtained the expected sign, with statistically
    significant results. Our t-stats give us a
    confidence level in excess of 99%. Our R-sq
    indicates that we explain about 86% of the
    variance in sales with income, and the MAPE
    decreased to 12.76%. Overall, this is a much
    better model than the one using the unemployment
    rate. Now, please promote me to a job that
    doesn't require me to do this anymore.
  • Your personal slave,
  • Flippin Hades Turwilliger