REGRESSION DIAGNOSTICS: Detecting problems in regression models; Treating them to obtain unbiased results
1
REGRESSION DIAGNOSTICS
Detecting problems in regression models
Treating them to obtain unbiased results
2
Assumptions of OLS Estimator
  • E(ei) = 0 (zero-mean errors, needed for unbiasedness)
  • Var(ei) is constant (homoscedasticity)
  • Cov(ei, ej) = 0 (independent error terms)
  • Cov(ei, Xi) = 0 (error terms unrelated to the Xs)
  • ei ~ iid(0, σ²)
  • Gauss-Markov Theorem: if these conditions hold,
    OLS is the best linear unbiased estimator (BLUE).
  • Additional assumption: the ei are normally
    distributed.

3
3 illnesses in Regression
  1. Multicollinearity: a strong relationship among
    the explanatory variables.
  2. Heteroscedasticity: changing error variance.
  3. Autocorrelated error terms: a symptom of
    specification error.

4
Multicollinearity (strong relationship among
explanatory variables themselves)
  • Variances of regression coefficients are
    inflated.
  • Regression coefficients may be different from
    their true values, even signs.
  • Adding or removing variables produces large
    changes in coefficients.
  • Removing a data point may cause large changes in
    coefficient estimates or signs. (inconsistency)
  • In some cases the F ratio may be significant and
    R2 very high even though all the individual t
    ratios are insignificant (each suggesting no
    significant relationship).
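Multicollinearity of this kind can be quantified with variance inflation factors (VIFs). The slides do not prescribe a tool, so the following is a minimal Python/numpy sketch on simulated data; the variable names, seed, and the common rule-of-thumb cutoff (VIF > 10) are illustrative assumptions.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on the remaining columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated illustration: x2 is nearly a copy of x1, so both get large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # almost collinear with x1
x3 = rng.normal(size=200)                    # independent regressor
X = np.column_stack([x1, x2, x3])
v = vif(X)   # VIFs for x1 and x2 far above 10; x3 near 1
```

A VIF far above 10 for a coefficient signals that its variance is being inflated by collinearity, matching the symptoms listed above.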

5
Solutions to the Multicollinearity Problem
  • Drop a collinear variable from the regression
  • Combine collinear variables (e.g. use their sum
    as one variable)

6
Heteroscedasticity
  • The variance of the error terms is used in
    computing t-tests of the β coefficients. If this
    variance is not constant, the t-tests are
    unreliable (the estimates are inefficient, i.e.
    the probability of a type II error is higher).
  • However, the coefficients remain unbiased, so
    heteroscedasticity is not a fatal illness.
  • Check with the White test or similar tests.
  • Use heteroscedasticity-adjusted t-statistics and
    p-values.
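The heteroscedasticity-adjusted statistics mentioned above are usually built from White's sandwich (HC0) covariance estimator. A minimal numpy sketch, assuming simulated data in which the error standard deviation grows with the regressor:

```python
import numpy as np

def ols_with_hc0(X, y):
    """OLS coefficients with classical and White (HC0) standard errors.
    HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (n - k)
    se_classic = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * e[:, None] ** 2)
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classic, se_hc0

# Simulated heteroscedastic data: error sd equals x, so variance changes.
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=500)
y = 2 + 3 * x + rng.normal(scale=x)
X = np.column_stack([np.ones_like(x), x])
beta, se_c, se_w = ols_with_hc0(X, y)
```

The slope estimate stays unbiased either way (the point made above); only the standard errors, and hence the t-statistics, change between the classical and robust versions.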

7
Autocorrelation in Error Terms
  • This is a fatal illness, because it indicates a
    specification error (a missing variable, variables
    used in an inappropriate form, etc.).
  • Under the current incorrect specification you
    cannot see the true coefficients, which you would
    see if you were estimating the correct model.
    Hence this is a serious problem.
  • Check: Durbin-Watson statistic, graph of the error terms.
  • DW ≈ 2(1 - ρ), where et = ρ·et-1 + vt
  • Limitation of the DW test statistic: it only checks
    for first-order serial correlation in the residuals.
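The DW statistic itself is a one-liner. A sketch with simulated residual series (an illustrative assumption, not the output of a real model): white-noise residuals give DW near 2, while AR(1) residuals with ρ = 0.8 give DW near 2(1 - 0.8) = 0.4.

```python
import numpy as np

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), approximately 2(1 - rho)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)
white = rng.normal(size=1000)              # no serial correlation -> DW near 2
ar = np.empty(1000)
ar[0] = 0.0
for t in range(1, 1000):
    ar[t] = 0.8 * ar[t - 1] + rng.normal() # rho = 0.8 -> DW well below 2
```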

8
  • Breusch-Godfrey Test
  • checks for higher-order autocorrelation AR(q) in
    the residuals
  • H0: no serial correlation
  • Solutions to the problem of serial correlation in
    et:
  • Find the correct specification.
  • In time series, use first differences.
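A sketch of the Breusch-Godfrey LM statistic in plain numpy (statistics libraries ship ready-made versions); the regression with AR(1) errors is a simulated, illustrative assumption:

```python
import numpy as np

def breusch_godfrey_lm(X, e, q):
    """LM test for AR(q) serial correlation: regress the OLS residuals e_t
    on the original regressors and e_{t-1},...,e_{t-q} (missing lags set
    to 0). LM = (n - q) * R^2, asymptotically chi-squared(q) under H0."""
    n = len(e)
    lags = np.column_stack([np.concatenate([np.zeros(j), e[:-j]])
                            for j in range(1, q + 1)])
    Z = np.column_stack([X, lags])
    beta, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ beta
    r2 = 1.0 - (u @ u) / (e @ e)   # e sums to ~0 since X has an intercept
    return (n - q) * r2

# Simulated regression whose errors follow an AR(1) with rho = 0.7:
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
u = np.empty(n)
u[0] = rng.normal()
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
lm = breusch_godfrey_lm(X, e, q=2)  # far above the chi2(2) 5% cutoff of 5.99
```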

9
Time Series Regressions
  • Lagged variable: Yt = β0 + β1Xt + β2Xt-1 + ut
  • Autoregressive model (AR(2)): Xt = β1Xt-1 + β2Xt-2 + ut
  • Time trend: Yt = β0 + β1Xt + β2Tt + ut

10
Spurious Regressions
  • As a general and very strict rule:
  • All variables in a time-series regression must be
    stationary.
  • Never run a regression with nonstationary
    variables!
  • The DW statistic will warn you; usually DW << 2.
  • Since most economic time series grow over time,
    running a regression with non-stationary variables
    will produce spurious positive relationships.
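A quick simulation of the warning described above: two independent random walks regressed on each other (the series, seed, and length are illustrative assumptions). Even though x and y are unrelated, the slope tends to look "significant" while DW collapses far below 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
y = np.cumsum(rng.normal(size=n))   # random walk 1
x = np.cumsum(rng.normal(size=n))   # random walk 2, independent of y

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
# Highly persistent residuals drive the Durbin-Watson statistic toward 0:
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```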

11
STATIONARITY
  • A variable is called stationary if it displays
    mean-reverting behavior (i.e., its mean and
    variance remain constant over time).
  • Any regression with nonstationary variables is
    invalid.
  • Hence, any time-series application must start
    with two preliminary steps:
  • Test the stationarity of the variables.
  • If they are not stationary, convert them into a
    stationary form.

12
  • A regression with nonstationary variables will
    typically reveal the problem through a
    Durbin-Watson (DW) statistic significantly
    smaller than 2.
  • The DW statistic measures first-order
    autocorrelation in the error term; DW << 2
    implies positive autocorrelation in the error
    term.
  • --------------------
  • Financial markets application: price series are
    typically nonstationary, therefore we use
    returns.
  • Rt = ln(Pt / Pt-1)
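This transformation is a one-liner; the price list below is an illustrative assumption:

```python
import numpy as np

def log_returns(prices):
    """R_t = ln(P_t / P_{t-1}): the stationary series used instead of prices."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1])

r = log_returns([100.0, 110.0, 99.0])
print(r)   # approximately [0.0953, -0.1054]
```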

13
TESTING STATIONARITY: UNIT ROOT TESTS
  • ADF Test
  • H0: the series is non-stationary
    (i.e., it has a unit root)
  • ADF test statistics must be compared with
    MacKinnon critical values. If H0 can be rejected
    (the test statistic is more negative than the
    critical value), the variable can be used in
    regression.

14
  • ADF (Augmented Dickey-Fuller) Test
  • Δyt = a + βt + γyt-1 + δ1Δyt-1 + ... + δpΔyt-p + εt
  • H0: γ = 0    HA: γ < 0
  • (The PP test makes a nonparametric adjustment for
    lagged changes.)
  • Test equation derived from the primitive form:
    Yt = βYt-1 + et
  • β < 1 → stationary; β = 1 → non-stationary;
    β > 1 → explosive
  • KPSS Test: H0: the series is stationary;
    HA: the series is non-stationary
    (use when the sample size is small)
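To make the mechanics concrete, here is a simplified Dickey-Fuller regression (no augmentation lags) in numpy. In practice one uses a full ADF implementation with MacKinnon critical values; the two series below are simulated, illustrative assumptions.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on gamma in: dy_t = a + gamma * y_{t-1} + e_t.
    Simplified Dickey-Fuller (no lagged differences); compare against
    MacKinnon critical values (about -2.87 at 5% with a constant)."""
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    u = dy - Z @ beta
    s2 = u @ u / (len(dy) - 2)
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(5)
stationary = rng.normal(size=300)        # white noise: clearly stationary
walk = np.cumsum(rng.normal(size=300))   # random walk: has a unit root
```

For the stationary series the t-statistic is strongly negative (reject H0); for the random walk it stays near the critical region or above (fail to reject).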

15
Treating Non-stationary Variables
  • Before using a non-stationary series in any
    regression, we first have to treat it.
  • Possible remedies:
  • 1) First-difference it: Δyt = yt - yt-1
  • A series is
  • I(0) if it is stationary
  • I(1) if it becomes stationary when differenced
    once
  • I(2) if it becomes stationary when differenced
    twice
  • 2) Adjust for trend
  • Sometimes a series becomes stationary after
    de-trending; it is then called trend-stationary.
  • 3) Field-specific treatments: use
    inflation-adjusted series; in financial
    time series, use log returns.
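The first-difference remedy in code: by construction, differencing an I(1) cumulative sum recovers its stationary I(0) increments (the simulated series is an illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(6)
eps = rng.normal(size=400)    # stationary I(0) increments
walk = np.cumsum(eps)         # integrating once makes the series I(1)

d1 = np.diff(walk)            # Delta y_t = y_t - y_(t-1)
# Differencing undoes the integration: d1 equals eps[1:] exactly.
```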

16
  • Sometimes a variable may be stationary but have
    strong persistence. To check this, obtain the
    ACF and PACF.
  • A variable yt is called white noise if
    yt ~ i.i.d.(0, σ²)
  • When Yt and Xt are both white noise, a regression
    of the form Yt = β0 + β1Xt + et is adequate;
    otherwise the problem of serial correlation in
    the residuals will arise.
  • Portmanteau Test
  • H0: Xt is white noise
  • HA: Xt is not white noise
  • Solution: include lags of the persistent
    dependent variable.
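The portmanteau test referred to above is commonly the Ljung-Box Q statistic. A numpy sketch on simulated series (white noise versus a persistent AR(1); both series and the lag count m = 10 are illustrative assumptions):

```python
import numpy as np

def ljung_box_q(x, m=10):
    """Ljung-Box portmanteau statistic:
    Q = n(n+2) * sum_{k=1..m} rho_k^2 / (n - k),
    asymptotically chi-squared(m) under H0: white noise."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = xc @ xc
    q = 0.0
    for k in range(1, m + 1):
        rho_k = (xc[k:] @ xc[:-k]) / denom   # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(7)
wn = rng.normal(size=400)                   # white noise
ar = np.empty(400)
ar[0] = 0.0
for t in range(1, 400):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()  # persistent, not white noise
```

A large Q rejects H0: the persistent AR(1) series yields a Q far beyond any chi-squared(10) critical value, while the white-noise series does not.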

17
Summary
  • Serial correlation in et may result from various
    causes, each signaling that the econometrician
    is doing something wrong:
  • 1) a missing variable (omitted variable bias)
  • 2) an incorrect functional form
  • 3) using nonstationary variables
  • 4) persistence in the variables
  • 5) a (linear deterministic) trend
  • These causes are interrelated (e.g., to correct
    for persistence in Xt, you add Xt-1, which was a
    missing variable in the original specification).

18
Diagnosis
  • 1) Always look at the time-series plots of all
    variables before you run any regression. Check
    for stationarity, persistence, trend, seasonality
    and outliers. Any unexpected result should remind
    you to follow all the steps on this page.
  • 2) Formal tests: before using any variable in a
    regression, always perform unit root tests and
    check persistence (autocorrelations).
  • 3) Post-regression: always perform the
    Durbin-Watson and Breusch-Godfrey tests for
    serial correlation in the residuals.

19
Treatment
  • 1) Try to find the reason. If it is due to a
    distortion (e.g., a missing variable, an
    inaccurate model specification, using nominal
    variables, a trend term, persistence), the
    first-best solution is to remove the distortion
    (find all relevant variables and the most
    appropriate model specification by reviewing the
    theory; use real variables; adjust for trend;
    add lags).
  • 2) First-differencing is the ultimate solution,
    especially in the case of nonstationary variables.
  • Take first differences of logs:
    Δln(yt) = ln(yt) - ln(yt-1)