Heteroskedasticity

- Lecture 17

Today's plan

- How to test for it: graphs, Park and Glejser tests
- What we can do if we find heteroskedasticity
- How to estimate in the presence of heteroskedasticity

Palm Beach County revisited

- How far is Palm Beach an outlier?
- Can the outlier be explained by heteroskedasticity?
- If so, what are the consequences?
- Heteroskedasticity will affect the variance of the regression line
- It will consequently affect the variance of the estimated coefficients and the estimated 95 percent confidence interval for the prediction (see Lecture 10).
- L17.xls provides an example of how to work through a problem like this using Excel

Palm Beach County revisited (2)

- Palm Beach is a good example to use since there are scale effects in the data
- The voting pattern shows that the voting behavior and number of registered voters are related to the population in each county
- As the county gets larger, voting patterns may diverge from what would be assumed given the number of registered voters
- Note from the graph that, as we move away from the origin, the difference between registered Reform voters and Reform votes cast increases
- We'll hypothesize that this scale effect produces heteroskedasticity

Notation

- Heteroskedasticity is observed as cross-section variability in the data (data across units at a point in time)
- In our notation, heteroskedasticity is
- $E(e_i^2) \neq \sigma^2$
- We can also write
- $E(e_i^2) = \sigma_i^2$
- This means that we expect a variable variance: the variance changes with each unit of observation

Consequences

- When heteroskedasticity is present
- 1) OLS estimator is still linear
- 2) OLS estimator is still unbiased
- 3) OLS estimator is not efficient: the minimum variance property no longer holds
- 4) Estimates of the variances are biased
- 5) The usual estimator of the error variance is not an unbiased estimator of $\sigma_{YX}^2$
- 6) We can't trust the confidence intervals or hypothesis tests (t-tests and F-tests): we may draw the wrong conclusions

Consequences (2)

- When BLUE holds and there is homoskedasticity, the first-order condition gives
- With heteroskedasticity, we have
- If we substitute the equation for $c_i$ into both equations, we find

where
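A sketch of the variance expressions these bullets point to, under assumptions that are not stated in the text: the bivariate model $Y_i = a + bX_i + e_i$, the OLS slope written as $b = \sum c_i Y_i$ with weights $c_i = x_i / \sum x_i^2$, and $x_i = X_i - \bar{X}$. The end results agree with the variance formulas on the Robust estimation slides.

```latex
\begin{align*}
\text{Homoskedasticity:}\quad
  \operatorname{Var}(b) &= \sigma^2 \sum c_i^2 = \frac{\sigma^2}{\sum x_i^2} \\
\text{Heteroskedasticity:}\quad
  \operatorname{Var}(b) &= \sum c_i^2 \sigma_i^2
    = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}
\end{align*}
```

The second line collapses to the first only when every $\sigma_i^2$ equals the same $\sigma^2$, which is why the usual variance formula is wrong under heteroskedasticity.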

Cases

- With homoskedasticity around each point, the variance around the regression line is constant
- With heteroskedasticity around each point, the variance around the regression line varies with each value of the independent variable (with each i)

Detecting heteroskedasticity

- There are three ways of detecting heteroskedasticity
- 1) Graphically
- 2) Park Test
- 3) Glejser Test

Graphical detection

- Graph the errors (or errors squared) against the independent variable(s). Note you can use either $e_i$ or $e_i^2$ on the y-axis.
- With homoskedasticity we have $E(e_i, X) = 0$
- The errors are independent of the independent variables
- With heteroskedasticity we can get a variety of patterns
- The errors show a systematic relationship with the independent variables
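As a numerical companion to the graph, the sketch below computes the $(X_i, e_i^2)$ pairs one would plot and a crude summary: the average squared residual in the lower and upper halves of X. The data are made up for illustration (they are not the Palm Beach data), the function names are hypothetical, and the half-split comparison is only an informal check, not one of the tests in this lecture.

```python
def ols_residuals(x, y):
    """Fit Y = a + bX by OLS and return the residuals e_i."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def mean_sq_residual_by_half(x, y):
    """Average e_i^2 for the lower and upper halves of X (informal summary)."""
    pairs = sorted(zip(x, ols_residuals(x, y)))  # order observations by X
    half = len(pairs) // 2
    low = [e * e for _, e in pairs[:half]]
    high = [e * e for _, e in pairs[half:]]
    return sum(low) / len(low), sum(high) / len(high)


# Made-up data: the size of the error grows in proportion to X.
x = list(range(1, 21))
y = [2 + 3 * xi + (-1) ** xi * 0.5 * xi for xi in x]

for xi, e in zip(x, ols_residuals(x, y)):
    print(xi, round(e * e, 2))  # the (X, e^2) pairs to put on the graph

low, high = mean_sq_residual_by_half(x, y)
print("mean e^2, lower half of X:", round(low, 2))
print("mean e^2, upper half of X:", round(high, 2))
```

If the errors were homoskedastic, the two averages would be of similar size; a much larger value in the upper half is the fan-shaped pattern the graph is meant to reveal.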

Graphical detection (2)

- Using the Palm Beach example (L17.xls), the estimated regression equation was

- The errors of this equation can be graphed against the number of registered Reform party voters (the independent variable)
- The graph shows the errors increasing with the number of registered Reform voters
- While the graphs may be convincing, we also want to use a test to confirm this. We have two: the Park Test and the Glejser Test

Park Test

- Procedure
- 1) Run regression $Y_i = a + bX_i + e_i$ despite the heteroskedasticity problem (it can also be multivariate)
- 2) Obtain residuals ($e_i$), square them ($e_i^2$), and take their logs ($\ln e_i^2$)
- 3) Run a spurious regression
- 4) Do a hypothesis test on $\gamma_1$ with $H_0: \gamma_1 = 0$
- 5) Look at the results of the hypothesis test
- reject the null: you have heteroskedasticity
- fail to reject the null: homoskedasticity, i.e. the variance is a constant
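The procedure above can be sketched in a few lines. This is a minimal illustration, assuming the textbook form of the Park regression, $\ln e_i^2 = \gamma_0 + \gamma_1 \ln X_i + v_i$ (the intercept symbol $\gamma_0$ and the use of $\ln X_i$ are assumptions here, since the slide's equation is not shown). The data are simulated, the function names are hypothetical, and the sketch needs $X_i > 0$ and no residual exactly equal to zero.

```python
import math
import random


def ols(x, y):
    """Bivariate OLS with intercept: returns (a, b, residuals, t-statistic of b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # residual variance
    t = b / math.sqrt(s2 / sxx)                # t-statistic for H0: slope = 0
    return a, b, resid, t


def park_test(x, y):
    """Return (g1, t) from regressing ln(e^2) on ln(X)."""
    _, _, resid, _ = ols(x, y)                 # step 1: original regression
    ln_e2 = [math.log(e * e) for e in resid]   # step 2: log of squared residuals
    ln_x = [math.log(xi) for xi in x]
    _, g1, _, t = ols(ln_x, ln_e2)             # step 3: auxiliary regression
    return g1, t                               # step 4: test H0: g1 = 0 with t


# Simulated data (an assumption for illustration): error sd proportional to X.
random.seed(17)
x = [random.uniform(1, 100) for _ in range(500)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

g1, t = park_test(x, y)
print(f"g1 = {g1:.2f}, t = {t:.2f}")
```

A t-statistic well beyond the usual critical value (about 2 in large samples) rejects $H_0: \gamma_1 = 0$ and points to heteroskedasticity.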

Glejser Test

- When we use the Glejser Test, we're looking for a scaling effect
- The procedure
- 1) Run the regression (it can also be multivariate)
- 2) Collect the $e_i$ terms
- 3) Take the absolute value of the errors
- 4) Regress $|e_i|$ against the independent variable(s)
- you can run different kinds of regressions

Glejser Test (2)

- 4) continued
- If heteroskedasticity takes one of these forms, this will suggest an appropriate transformation of the model
- The null hypothesis is still $H_0: \gamma_1 = 0$ since we're testing for a relationship between the errors and the independent variables
- We reach the same conclusions as in the Park Test
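A minimal sketch of the Glejser procedure follows. The three functional forms tried here ($X_i$, $\sqrt{X_i}$ and $1/X_i$) are common textbook choices and are an assumption, since the slide's own list of regressions is not shown. The data are simulated and the names `glejser_test` and `FORMS` are hypothetical.

```python
import math
import random


def ols(x, y):
    """Bivariate OLS with intercept: returns (a, b, residuals, t-statistic of b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    t = b / math.sqrt(s2 / sxx)
    return a, b, resid, t


def glejser_test(x, y, transform):
    """Regress |e_i| on transform(X_i); return (g1, t) for H0: g1 = 0."""
    _, _, resid, _ = ols(x, y)            # steps 1-2: run regression, collect e_i
    abs_e = [abs(e) for e in resid]       # step 3: absolute values
    z = [transform(xi) for xi in x]
    _, g1, _, t = ols(z, abs_e)           # step 4: auxiliary regression
    return g1, t


# Assumed candidate forms for the right-hand side of the auxiliary regression.
FORMS = {
    "X": lambda v: v,
    "sqrt(X)": math.sqrt,
    "1/X": lambda v: 1.0 / v,
}

# Simulated data (an assumption for illustration): error sd proportional to X.
random.seed(17)
x = [random.uniform(1, 100) for _ in range(500)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

for name, f in FORMS.items():
    g1, t = glejser_test(x, y, f)
    print(f"|e| on {name}: g1 = {g1:.3f}, t = {t:.2f}")
```

Comparing the forms gives a hint about which transformation of the model to try: the form that fits $|e_i|$ best suggests how the error spread scales with $X_i$.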

A cautionary note

- The errors in the Park Test ($v_i$) and the Glejser Test ($u_i$) might also be heteroskedastic.
- If this is the case, we cannot trust the hypothesis test $H_0: \gamma_1 = 0$ or the t-test
- If we find heteroskedastic disturbances in the data, what can we do?
- Estimate the model $Y_i = a + bX_i + e_i$ using weighted least squares
- We'll look at two examples of weighted least squares: one where we know the true variance, and one where we don't

Correction with known $\sigma_i^2$

- Given that the true variance is known and our model is
- $Y_i = a + bX_i + e_i$
- Consider the following transformation of the model

- In the transformed model, let
- So the expected value of the error squared is

Correction with known $\sigma_i^2$ (2)

- Given that there is heteroskedasticity, $E(e_i^2) = \sigma_i^2$
- thus
- In this simplistic example, we re-weighted the model by the constant $\sigma_i$
- What this example shows: when the variance is known, we must transform our model to obtain a homoskedastic error term.
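The known-variance correction can be sketched as follows, assuming the transformation divides the model through by the known $\sigma_i$. The symbol $e_i^{*}$ for the transformed error is introduced here for illustration and is not taken from the slides.

```latex
\begin{align*}
\frac{Y_i}{\sigma_i} &= a\,\frac{1}{\sigma_i} + b\,\frac{X_i}{\sigma_i} + \frac{e_i}{\sigma_i},
  \qquad e_i^{*} = \frac{e_i}{\sigma_i} \\
E\!\left(e_i^{*2}\right) &= \frac{E(e_i^2)}{\sigma_i^2}
  = \frac{\sigma_i^2}{\sigma_i^2} = 1
\end{align*}
```

The transformed error has the same variance for every observation, so OLS on the transformed variables satisfies the homoskedasticity assumption.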

Correction with unknown $\sigma_i^2$

- Given an unknown variance, we need to state ad hoc but plausible assumptions about our variance $\sigma_i^2$ (how the errors vary with the independent variable)
- For example, we can assert that $E(e_i^2) = \sigma^2 X_i$
- Remember: the Glejser Test allows us to choose a relationship between the errors and the independent variable

Correction with unknown $\sigma_i^2$ (2)

- In this example you would transform the

estimating equation by dividing through by

to get

- Letting
- The expected value of this error squared is

Correction with unknown $\sigma_i^2$ (3)

- Recalling an earlier assumption, we find

- When we don't know the true variance, we re-scale the estimating equation by a function of the independent variable
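A minimal sketch of this correction under the example assumption $E(e_i^2) = \sigma^2 X_i$. Dividing the model through by $\sqrt{X_i}$ gives $Y_i/\sqrt{X_i} = a\,(1/\sqrt{X_i}) + b\sqrt{X_i} + e_i^{*}$ with $e_i^{*} = e_i/\sqrt{X_i}$, so that $E(e_i^{*2}) = \sigma^2 X_i / X_i = \sigma^2$. The choice of $\sqrt{X_i}$ follows from the assumed variance form, the symbol $e_i^{*}$ and the function name are introduced here for illustration, and the data are simulated. The sketch requires $X_i > 0$.

```python
import math
import random


def wls_sqrt_x(x, y):
    """Estimate a and b after dividing Y = a + bX + e through by sqrt(X).

    Transformed model (no constant term):
        Y/sqrt(X) = a * (1/sqrt(X)) + b * sqrt(X) + e*
    """
    ys = [yi / math.sqrt(xi) for xi, yi in zip(x, y)]
    z1 = [1.0 / math.sqrt(xi) for xi in x]   # regressor multiplying a
    z2 = [math.sqrt(xi) for xi in x]         # regressor multiplying b
    # Normal equations for two regressors without a constant.
    s11 = sum(v * v for v in z1)
    s22 = sum(v * v for v in z2)
    s12 = sum(v1 * v2 for v1, v2 in zip(z1, z2))
    t1 = sum(v * w for v, w in zip(z1, ys))
    t2 = sum(v * w for v, w in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    a = (s22 * t1 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Simulated data (an assumption): Var(e_i) = 4 * X_i, i.e. sd = 2 * sqrt(X_i).
random.seed(17)
x = [random.uniform(1, 100) for _ in range(500)]
y = [2 + 3 * xi + random.gauss(0, 2 * math.sqrt(xi)) for xi in x]

a_hat, b_hat = wls_sqrt_x(x, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Note that the coefficient on $1/\sqrt{X_i}$ is the original intercept $a$ and the coefficient on $\sqrt{X_i}$ is the original slope $b$, so the estimates are read off directly.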

Returning to Palm Beach

- On L17.xls we have presidential election data by county in Florida
- To get a correct estimating equation, we can run a regression without Palm Beach if we think it's an outlier.
- Then we can see if we can obtain a prediction for the number of Reform votes cast in Palm Beach
- We can perform a Glejser Test for the regression excluding Palm Beach
- We run a regression of the absolute value of the errors ($|e_i|$) against registered Reform voters ($X_i$)

Returning to Palm Beach (2)

- The t-test rejects the null
- this indicates the presence of heteroskedasticity

- We can re-scale the model in different ways or introduce a new independent variable (such as the total number of registered voters by county)
- Keep transforming the model and running the Glejser Test
- When we fail to reject the null, there is no longer evidence of heteroskedasticity in the model

Robust estimation

- Heteroskedasticity tests are not used any more. Most software reports robust standard errors. Note that this is also the approach of the textbook.
- We have looked at tests for heteroskedasticity to get you used to weighted least squares. This is important for the topics to come.
- Robust standard errors report approximations to the variance of the estimated coefficient when there is a non-constant variance. The approximation only holds for large samples.
- Know that for a homoskedastic error term $\operatorname{Var}(u_i \mid X_i) = \sigma^2$
- $\operatorname{Var}(b) = \sigma^2 / \sum x_i^2$

Robust estimation (2)

- Using analogous arguments, we can state that for the heteroskedastic case $\operatorname{Var}(u_i \mid X_i) = \sigma_i^2$
- $\operatorname{Var}(b) = \sum \sigma_i^2 x_i^2 / (\sum x_i^2)^2$
- This can be approximated (in the bivariate model case) by
- $\operatorname{Var}(b) \approx \sum x_i^2 u_i^2 / (\sum x_i^2)^2$, using the squared residuals $u_i^2$ in place of $\sigma_i^2$
- See L17_robust.xls and hetero.pdf to compare the results from calculating the robust standard error on the spreadsheet using Excel and the results from Stata for robust estimation.
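The two variance formulas can be compared directly in code. The sketch below is a minimal bivariate illustration with a hypothetical function name and tiny made-up data chosen so the arithmetic can be checked by hand; it is not the calculation in L17_robust.xls. It implements the approximation exactly as written above, without any small-sample adjustment.

```python
import math


def slope_standard_errors(x, y):
    """Return (b, conventional SE, robust SE) for Y = a + bX + u."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dev = [xi - xbar for xi in x]                   # x_i = X_i - mean(X)
    sxx = sum(d * d for d in dev)                   # sum of x_i^2
    b = sum(d * (yi - ybar) for d, yi in zip(dev, y)) / sxx
    a = ybar - b * xbar
    u = [yi - a - b * xi for xi, yi in zip(x, y)]   # OLS residuals
    s2 = sum(e * e for e in u) / (n - 2)            # estimate of sigma^2
    se_conventional = math.sqrt(s2 / sxx)           # sqrt(s^2 / sum x_i^2)
    # Robust version: sum(x_i^2 * u_i^2) / (sum x_i^2)^2
    se_robust = math.sqrt(sum(d * d * e * e for d, e in zip(dev, u)) / sxx ** 2)
    return b, se_conventional, se_robust


b, se_c, se_r = slope_standard_errors([1, 2, 3], [1, 2, 4])
print(b, se_c, se_r)
```

For these three points the slope is 1.5, the conventional variance is 1/12 and the robust variance is 1/72. Statistical software may apply a small-sample correction (for example a factor of n/(n-k)) to the robust variance, so its reported values can differ slightly from this formula.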

Summary

- Even with re-weighted equations, we might still have heteroskedastic errors
- so we have to rerun the Glejser Test until we cannot reject the null
- If we still reject the null, we may have to rethink our model transformation
- if we suspect a scale effect, we may want to introduce new scaling variables
- Coefficients from the re-scaled equation are comparable with the coefficients from the original model