1
Chapter 5
  • Forecasting with Multiple Regression

2
Selecting Independent Variables
  • Ideally, we want each of our independent
    variables to be correlated with Y, but not
    highly correlated with the other independent
    variables.
  • The reason is that we do not want our RHS
    variables to overlap too much in what they measure.
  • If our RHS variables are highly correlated with one
    another, they end up measuring the same part of
    the variance in the dependent variable.
  • This causes a problem called multicollinearity.

3
Difficulties in finding RHS variables
  • Sometimes it's difficult to find an RHS variable
    that measures exactly what we want.
  • Average interest rates for all installment loans
    might be similar enough to serve as a stand-in for
    mortgage interest rates.
  • Sometimes it's impossible, and we need to be more
    creative.
  • Demand for housing would be good to know if you
    are trying to buy or sell a house, but how do we
    proxy for demand if we are trying to predict
    sales?

4
Looking at Our First Multi-Variate Regression
  • We are going to forecast seasonally adjusted new
    houses sold (NHS).
  • Let's first look at the bivariate model using
    interest rates (IR) for comparison later on.
  • NHS = b0 + b1(IR)
  • (+/-?)
  • Data

5
Our bivariate estimate of b
  • NHSF = 5,543.74 - 415.90(IR)
    (a fitting sketch follows below)
  • When we were looking at a bivariate model as a
    function of T, we just kept adding 1 to the last
    T for the forecast.
  • What do we do here?
  • We need to forecast interest rates (IR) to get
    our forecast of NHS.
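
A minimal sketch of how such a bivariate estimate could be reproduced outside
ForecastX, assuming a pandas DataFrame df with columns NHS and IR (illustrative
names, not part of the original exercise):

```python
# Sketch only: assumes a DataFrame with columns "NHS" and "IR".
import pandas as pd
import statsmodels.api as sm

def fit_bivariate(df: pd.DataFrame):
    """Estimate NHS = b0 + b1*IR by ordinary least squares."""
    X = sm.add_constant(df["IR"])        # adds the intercept column for b0
    return sm.OLS(df["NHS"], X).fit()

# Usage (df is assumed to hold the historical data):
# results = fit_bivariate(df)
# print(results.params)      # b0 (const) and b1 (IR)
# print(results.rsquared)    # in-sample fit
```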

6
Forecasting our RHS to Forecast our LHS
  • What model should we use to forecast our RHS
    variable?
  • In reality, you can choose from all the models
    that are available to forecast.
  • What does ForecastX use in the simple bivariate
    case?
  • Simple Naïve Model
  • This is probably not the best you can do.

7
Holt vs Naïve (bivariate)
Note: the in-sample estimates come from the
bivariate model and the actual IR. Only when the IR
forecasts differ do the forecasts for NHS differ.
8
Which is the better choice in this case?
  • The RMSE in the holdout period (2004) is lower
    for the Holt forecasted IR than for the Naïve
    forecasted IR.

9
Forecasting our RHS to Forecast our LHS
  • Typically, we rely on one of the time-trend
    methods, other forecasts, or an expert opinion.
  • In this exercise, we use Holt's exponential
    model, but we aren't limited to that. (See the
    sketch below.)
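
A hedged sketch of this step, using the Holt class from statsmodels as a
stand-in for ForecastX's Holt's exponential model (the series name ir and the
12-period horizon are assumptions):

```python
# Sketch only: `ir` is assumed to be a pandas Series of historical interest rates.
import pandas as pd
from statsmodels.tsa.holtwinters import Holt

def forecast_ir(ir: pd.Series, horizon: int = 12) -> pd.Series:
    """Forecast IR with Holt's exponential (multiplicative-trend) smoothing."""
    fit = Holt(ir, exponential=True).fit()   # smoothing parameters chosen by optimizer
    return fit.forecast(horizon)

# Usage: plug the forecasted IR into the fitted bivariate model,
# nhs_forecast = b0 + b1 * forecast_ir(ir), with b0 and b1 from the regression above.
```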

10
Multivariate Estimated Regression Model
  • Ŷ = b0 + b1X1 + b2X2 + .... + bkXk, where
  • Ŷ is the forecast value of the dependent variable
  • X1 . . . Xk are explanatory variables
  • b1 . . . bk show the change in Ŷ for a one-unit
    change in X1, X2, etc., when all other
    explanatory variables are held constant.

11
Now we look at NHS with 2 RHS variables
(multivariate)
  • Our second model incorporates per capita
    disposable personal income (DPIPC) and the
    interest rate (IR).
  • NHSF2 = b0 + b1(DPIPC) + b2(IR)
    (a fitting sketch follows below)
  • (+/-?)
  • Data
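
As with the bivariate case, a hedged sketch of how this two-variable model
could be estimated, assuming a DataFrame df with columns NHS, DPIPC, and IR
(illustrative names):

```python
# Sketch only: assumes a DataFrame with columns "NHS", "DPIPC", and "IR".
import pandas as pd
import statsmodels.api as sm

def fit_nhsf2(df: pd.DataFrame):
    """Estimate NHSF2 = b0 + b1*DPIPC + b2*IR by OLS."""
    X = sm.add_constant(df[["DPIPC", "IR"]])   # intercept plus the two RHS variables
    return sm.OLS(df["NHS"], X).fit()

# results = fit_nhsf2(df)
# print(results.params)          # b0, b1 (DPIPC), b2 (IR)
# print(results.tvalues)         # statistical significance of each coefficient
# print(results.rsquared_adj)    # adjusted R-sq for comparison with the bivariate model
```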

12
Results
NHSF2 = -324.33 + 0.17(DPIPC) - 168.13(IR)
13
The multivariate forecast
  • NHSF2 = -324.33 + 0.17(DPIPC) - 168.13(IR)

[Table: NHS forecasts using Holt-forecasted DPIPC and IR,
with the holdout RMSE compared against the bivariate model]
14
Evaluating the models (three questions)
  • Do the signs on the coefficients make sense?
  • Are they statistically significant?
  • How much of the variance is explained?

15
Evaluating the models
  • Both beta coefficients have the expected signs
    and are statistically significant.
  • Adding the new variable DPIPC adds to the
    explanatory power of the model.
  • Additionally:
  • RMSE decreased in-sample and in the holdout.
  • R-sq increased from 62.41% in the bivariate model
    to 92.03% in the multivariate model. But, more
    importantly, adjusted R-sq went up as well, from
    61.50% to 91.68%.
  • Conclusion: Adding the second RHS variable
    improved the model and increased its accuracy.
    We also know the relationship between IR, DPIPC,
    and seasonally adjusted NHS.
  • What do we do to get actual UN-adjusted sales?
    (Just checking.)

16
The Regression Plane
17
Multicollinearity at Work
  • Let's now consider a multivariate model with
    three RHS variables, but two are very similar.
  • NHSF = b0 + b1(DPIPC) + b2(GDP) + b3(IR)
  • (+/-?)
  • Data

18
Multicollinear Regression
GDP and DPIPC are highly correlated.
DPIPC changes sign and becomes insignificant when
GDP is added.
19
Checking the Correlation between the RHS variables
From a statistical standpoint, GDP and DPIPC are
essentially the same thing: they move together
99% of the time. (A correlation check is sketched below.)
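
A quick way to check this kind of overlap is a pairwise correlation matrix of
the candidate RHS variables; the sketch below assumes a DataFrame df holding
DPIPC, GDP, and IR:

```python
# Sketch only: assumes a DataFrame with columns "DPIPC", "GDP", and "IR".
import pandas as pd

def rhs_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlations among candidate RHS variables.

    Values near +/-1 (as with GDP and DPIPC here) warn of multicollinearity.
    """
    return df[["DPIPC", "GDP", "IR"]].corr()

# print(rhs_correlations(df))
```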
20
New stats to consider in evaluating a model
  • Adjusted R-Squared: Adding one more independent
    variable always increases R-sq, so adjusted R-sq
    factors in the loss in DF. For multivariate
    models, we typically use only the adjusted R-sq
    to evaluate fit.
  • F-test: tests the overall significance of the
    regression. It simultaneously tests the
    hypothesis that the regression has NO
    explanatory power.

21
Adjusted R-Sq
Adjusted R-Sq is scaled down by the loss in DF
(see the formula sketched below).
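
The adjustment itself is not written out on the slide; the usual formula, with
n observations and k RHS variables, is:

```latex
\bar{R}^2 \;=\; 1 - \left(1 - R^2\right)\frac{n - 1}{n - (k + 1)}
```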
22
Way of interpreting the Adjusted R-Sq
  • The way R-Sq is calculated, adding more RHS
    variables will always improve the fit, even if
    they are insignificant.
  • The adjusted R-sq declines if the added RHS
    variable does not predict well. It goes up if
    the added RHS var does predict well.
  • So, we can use the adjusted R-sq to determine
    whether adding a particular RHS variable is
    beneficial to our regression or not (i.e.,
    whether the added explanatory power offsets the
    loss in DF).

23
Rule of Thumb for getting rid of RHS vars that
could be causing MC problems
  • if |t-stat| < 1, removing the variable increases
    adjusted R-sq.
  • if |t-stat| > 1, removing the variable decreases
    adjusted R-sq.

24
One Technique for Selecting Independent
Variables for forecasting
  • Adjusted R-sq Criterion
  • Choose the set of variables that maximizes
    adjusted R-sq (minimizes root-mean-square error).
  • If removing the variable in question causes
    adjusted R-sq to increase, leave it out.
  • If removing the variable in question causes
    adjusted R-sq to decrease, leave it in.

25
The F-test
  • For the multiple regression with 2 RHS vars,
    K = 2. We used 48 observations, so
    n - (K + 1) = 45. We then go to our F-table.

26
The F-distribution
If our calculated F is larger than this critical
value, we reject the null that all the slope
coefficients (b's) are jointly zero.
27
F-statistic (F calculated) is related to R-sq
  • R-sq = Explained Variation in Y / Total Variation
    in Y
  • F = Explained Variation / Unexplained Variation
  • As the Explained Variation in Y rises, R-sq rises
    and F rises.
  • The F-test is used to test whether R-sq is
    statistically different from zero. (The formula
    is sketched below.)
  • H0: R-sq = 0
  • H1: R-sq ≠ 0
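
Written out, the calculated F in terms of R-squared, with k RHS variables and
n observations (so the denominator df matches the n - (K + 1) = 45 used on the
earlier slide), is:

```latex
F \;=\; \frac{R^2 / k}{\left(1 - R^2\right) / \left(n - (k + 1)\right)}
```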

28
Handling Seasonality in a MV Regression
  • Dummy Variables

29
What are Dummy Variables?
  • They are constructed RHS variables that act like
    switches. They turn on and off when something
    is true or false.
  • They often are used in situations when no
    continuous measure is available or where we
    expect there to be discrete differences in
    effects (like seasons).
  • To estimate a model with seasonality, we need to
    CONSTRUCT season dummy variables.

30
Constructing the Seasonal Dummy
  • In all the data sets we have seen thus far, we
    have had data on month and year.
  • To construct month dummies, we need to construct
    11 (NOT 12) new columns of data. Each month
    except one base month gets its own column. (A
    construction sketch follows below.)
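
A hedged sketch of that construction with pandas (the column name month is an
assumption); drop_first=True omits one month so the dummies are not perfectly
collinear with the intercept:

```python
# Sketch only: assumes a DataFrame with a "month" column (1-12 or month names).
import pandas as pd

def add_month_dummies(df: pd.DataFrame) -> pd.DataFrame:
    """Append 11 month dummies; one month is dropped as the base case."""
    dummies = pd.get_dummies(df["month"], prefix="m", drop_first=True)
    return pd.concat([df, dummies], axis=1)
```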

31
Data Construction
Here we have made dummies for all 12 months, but we
need to drop one month before using them in a
regression. Otherwise: perfect collinearity!
32
Ungraded Homework (Ch. 4, 7)
  • Provide a quarterly forecast of sales.
  • Prepare a time series plot of the data and
    explain what you see. Is a simple linear trend
    forecast useful in this case? Estimate the trend
    and address parts A-E.
  • Data

33
7, Part A
Although there's probably some seasonality, there
is also definitely a trend, and it looks fairly
linear and positive.
34
7, Parts B & C: Estimate the trend line; does it
have a significant trend?
C. Yes, significant.
35
7, Part D: Forecast 4 qtrs of 2004
  • T and the resulting forecast:
  • Sales = 88,741.01 + 5,362.62(41) = 308,608.43  (Mar-2004)
  • Sales = 88,741.01 + 5,362.62(42) = 313,971.16  (Jun-2004)
  • Sales = 88,741.01 + 5,362.62(43) = 319,333.78  (Sep-2004)
  • Sales = 88,741.01 + 5,362.62(44) = 324,696.41  (Dec-2004)

36
7, Part E: Accuracy for 2004
RMSE for the four quarters of 2004 is 19,571, or
about 5.9% of average quarterly sales for 2004.
(An RMSE sketch follows below.)
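
For reference, a short sketch of the holdout RMSE calculation (the actual and
forecast arrays for the four quarters of 2004 are assumed inputs):

```python
# Sketch only: `actual` and `forecast` are the holdout values for 2004.
import numpy as np

def rmse(actual, forecast) -> float:
    """Root-mean-square error over the holdout observations."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

# rmse(actual_2004, forecast_2004)  # about 19,571 for the four quarters of 2004
```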
37
Ungraded Homework (Ch. 4, 8)
  • Use the unemployment rate to estimate sales.
    Unemployment Data
  • Part A does not actually ask you to do
    anything; go figure!

38
8, Part B: Plot a scattergram of sales vs. the
regional unemployment rate
There might be some positive relationship, which
is kinda odd. Typically, you would expect sales
to fall if unemployment rises.
39
8, Part C: Bivariate regression of Sales as a
function of the Regional Unemployment Rate
How did we do here?
40
8, Part D: Take a memo
  • Dear Ms. Lynch,
  • The regression of Sales on the Northern Regional
    Unemployment Rate does not provide the kind of
    accuracy we were seeking. Although the estimated
    effect of the unemployment rate on sales was
    statistically significant, the model only
    explained about 10% (R-sq = 9.99%) of the variance
    in the dependent variable, and the sign was not
    what we expected. Furthermore, the MAPE was more
    than 31%, indicating a poor fit to the sample
    data. Finally, our Theil's U indicates that we
    would be much better off using a simple naïve
    forecast rather than one based on the
    unemployment rate.
  • Your humble servant,
  • Flippin Hades Turwilliger

41
8, Part E: Forecast using the model and the
forecast for the region's unemployment rate
(FNRUR)
  • Sales = 73,222.19 + 15,369.38(NRUR), applied to
    each quarter of 2004:
  • 190,029.50 = 73,222.19 + 15,369.38(7.6)  Mar-2004
  • 191,566.44 = 73,222.19 + 15,369.38(7.7)  Jun-2004
  • 188,492.56 = 73,222.19 + 15,369.38(7.5)  Sep-2004
  • 186,955.62 = 73,222.19 + 15,369.38(7.4)  Dec-2004

42
8, Part F: Calculate RMSE and compare with the
earlier model using the time trend
[Table: RMSE of the unemployment regression vs. the
trend regression]
43
8, Part G: Scattergram of Sales and Income
Now, that looks more like what we want! It looks
like there is a relationship between income and
sales, which makes sense, right?!
44
8, Parts H & I: Estimate the Bivariate Regression
of Sales on Income
  • Sales = 76,808.83 + 120.11(Income)
  • Dear Ms. Lynch,
  • Using income, rather than the unemployment rate,
    substantially improved our estimates of sales.
    We obtained the expected sign, with statistically
    significant results. Our t-stats give us a
    confidence level in excess of 99%. Our R-sq
    indicates that we explain about 86% of the
    variance in sales with income, and the MAPE
    decreased to 12.76%. Overall, this is a much
    better model than the one using the unemployment
    rate. Now, please promote me to a job that
    doesn't require me to do this anymore.
  • Your personal slave,
  • Flippin Hades Turwilliger