
Heteroskedasticity

Objectives

- What is heteroskedasticity?
- What are the consequences?
- How is heteroskedasticity identified?
- How is heteroskedasticity corrected?

Main empirical model for Unit 10:

$$foodexp_i = \beta_0 + \beta_1\, income_i + \varepsilon_i,$$

where foodexp is family food expenditure and income is family income.

Least squares estimates, US data (UE_Tab0301)

Is this the best estimated equation?

1. The Nature of Heteroskedasticity

- In a regression about firms, for the same mistake,

- Heteroskedasticity is a problem that occurs when the error term does not have a constant variance.

CLRM: each error term comes from the same probability distribution (in particular, with the same variance).

Under heteroskedasticity, Assumption CLRM.5 is violated!

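Regression Model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0 \qquad \text{(zero mean)}$$

$$\operatorname{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2 \qquad \text{(homoskedasticity)}$$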

[Figure: identical error distributions for observations i and j]

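Homoskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma^2$ for all $i$.

[Figure: conditional distribution of Y given X]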

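Heteroskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$, which can differ across $i$.

[Figure: conditional distribution of Y given X]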


Pure heteroskedasticity: different variances of the error term in a correctly specified PRF.

Impure heteroskedasticity: different variances of the error term caused by a specification error.

2. Detecting Heteroskedasticity

2.1 Graphical Method

Plotting foodexp against income (for one regressor)

Example 1 Food expenditure, US Data (UE_Tab0301)

Plotting e against income.

Plotting $e^2$ against income.
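The residual plots above only need the OLS residuals paired with income. The following is a minimal sketch in plain Python, assuming simple regression and made-up data (not UE_Tab0301) whose error spread grows with income; `ols_simple` is a hypothetical helper name. In practice the pairs would be passed to a plotting library; here a crude numeric summary (mean $e^2$ in the lower and upper income halves) stands in for the picture. It is a descriptive aid, not a formal test.

```python
import random

def ols_simple(x, y):
    """OLS fit of y = b0 + b1*x + e; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid

# Hypothetical data (not UE_Tab0301): the error sd is 5% of income,
# so the spread of the errors grows with income.
random.seed(1)
income = [random.uniform(250, 1000) for _ in range(200)]
foodexp = [40 + 0.13 * m + random.gauss(0, 0.05 * m) for m in income]

b0, b1, e = ols_simple(income, foodexp)

# Pairs to plot, ordered by income: (income, e); squaring e gives the e^2 plot.
points = sorted(zip(income, e))
e2_sorted = [ei ** 2 for _, ei in points]

# Crude numeric summary of the e^2 plot: mean e^2 in the lower and upper income halves.
half = len(points) // 2
low_mean = sum(e2_sorted[:half]) / half
high_mean = sum(e2_sorted[half:]) / (len(points) - half)
print(f"mean e^2, lower-income half: {low_mean:.1f}")
print(f"mean e^2, upper-income half: {high_mean:.1f}")
```

A fan shape in the plot corresponds to the upper-half mean being clearly larger than the lower-half mean.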

Example 2 textbook data (Woody3)

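2.2 Park Test

Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i$, $i = 1, \dots, N$. (*)

Suppose it is suspected that $\operatorname{var}(\varepsilon_i)$ depends on $Z_i$ in the form

$$\sigma_i^2 = \sigma^2 Z_i^{\alpha_1} e^{v_i} \quad\Longleftrightarrow\quad \ln \sigma_i^2 = \ln \sigma^2 + \alpha_1 \ln Z_i + v_i$$

H0: $\alpha_1 = 0$ (homoskedastic errors); HA: $\alpha_1 \neq 0$ (heteroskedastic errors).

Step 1 Estimate the equation (*) with OLS and obtain the residuals.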

Step 2 Regress the natural log of the squared residuals on the natural log of a possible proportionality factor $Z$:

$$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln Z_i + v_i,$$

where $v_i$ is an error term satisfying all classical assumptions.

Step 3 If the coefficient of $\ln Z$ is significantly different from zero, this suggests a heteroskedastic pattern in the residuals with respect to $Z$. Otherwise, homoskedastic errors cannot be rejected.

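Example 3 Park Test US data (UE_Tab0301)

$$\ln(e^2) = -7.46 + 2.07 \ln(income)$$

t = 2.28, p-value = 0.0284 for the coefficient of ln(income).

Since p < 0.05, homoskedasticity is rejected at the 5% level.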

- Advantages of the Park test
- The test is simple.
- It provides information about the variance structure.

- Limitations of the Park test
- The distribution of the dependent variable, $\ln(e_i^2)$, is problematic.
- It assumes a specific functional form.
- It does not work when the variance depends on two or more variables.
- The correct variable with which to order the observations must be identified first.
- It cannot handle partitioned data.

2.3 White's Test

Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, $i = 1, \dots, N$. (*)

Suppose it is suspected that there may be heteroskedasticity, but we are not sure of its functional form.

H0: The conditional variance of $\varepsilon_i$ is constant.
HA: The conditional variance of $\varepsilon_i$ is not constant.

Step 1 Estimate the equation (*) with OLS and obtain the residuals.

Step 2 Regress the squared residuals on all explanatory variables, all cross-product terms and the square of each explanatory variable:

$$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$$

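Step 3 Test the overall significance of the equation in Step 2 (df = number of regressors in that equation, excluding the intercept).

Statistic: $N R^2 \sim \chi^2_{df}$, where $R^2$ is from the Step 2 regression. Critical value: $cv = \chi^2_{df,\alpha}$.

Reject the hypothesis of homoskedasticity if $N R^2 > cv$.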
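The White-test steps can be sketched in plain Python with a small normal-equations OLS. This is a minimal sketch, not a library implementation: the helper names `solve`, `ols` and `white_test` are hypothetical, the data are made up (error sd proportional to $x$), and the auxiliary regression is built from the levels, squares and cross products of the supplied regressors, so one regressor gives df = 2 and two regressors give df = 5 as in Step 2.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, cols):
    """OLS of y on a constant plus the columns in cols; returns (coefs, residuals, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    b = solve(xtx, xty)
    resid = [y[i] - sum(X[i][p] * b[p] for p in range(k)) for i in range(n)]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b, resid, 1 - sum(r * r for r in resid) / sst

def white_test(y, regressors):
    """Return (N*R^2, df) from the regression of e^2 on levels, squares and cross products."""
    _, e, _ = ols(y, regressors)               # Step 1: OLS residuals
    e2 = [ei ** 2 for ei in e]
    aux = list(regressors)                     # levels
    k = len(regressors)
    for p in range(k):
        for q in range(p, k):                  # squares (p == q) and cross products (p < q)
            aux.append([a * b for a, b in zip(regressors[p], regressors[q])])
    _, _, r2 = ols(e2, aux)                    # Step 2: auxiliary regression
    return len(y) * r2, len(aux)               # Step 3: statistic and df

# Hypothetical heteroskedastic data: error sd proportional to x
random.seed(3)
x = [random.uniform(1, 10) for _ in range(500)]
y = [1 + 2 * xi + random.gauss(0, xi) for xi in x]
stat, df = white_test(y, [x])
print(f"N*R^2 = {stat:.2f} with df = {df}")
```

The statistic is then compared with $\chi^2_{df,\alpha}$ (9.21 for df = 2 at the 1% level). If a regressor is a 0/1 dummy, its square duplicates it and should be dropped from the auxiliary regression to keep the normal equations solvable.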

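Example 4 White test US data (UE_Tab0301)

Auxiliary regression of $e^2$ on income and income$^2$ (estimated coefficients 1924, 7.4 and 0.0088 for the intercept, income and income$^2$): $R^2 = 0.3646$, $N = 40$.

$N R^2 = 40 \times 0.3646 = 14.58 > cv = \chi^2_{2,\,0.01} = 9.21$, so homoskedasticity is rejected at the 1% level.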

- Advantages of the White test
- It does not assume a specific functional form.
- It is applicable when the variance depends on two or more variables.

- Limitations of the White test
- It is a large-sample test.
- It provides no information about the variance structure.
- It loses many degrees of freedom when there are many regressors.
- It cannot handle partitioned data.
- It also captures specification errors.

3. Consequences of Heteroskedasticity

If heteroskedasticity is present but OLS is used for estimation, how are the OLS estimates affected?

Unaffected: OLS estimators are still linear and unbiased, because the zero-mean assumption still holds, so on average overestimates are as likely as underestimates.

3.1 OLS estimators are inefficient.

Some fluctuations of the error term are attributed to variation in the independent variables.

There are other linear unbiased estimators (e.g., WLS, Section 4.2) that have smaller variances than the OLS estimator.

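3.2 Unreliable Hypothesis Testing

The usual OLS standard errors are biased, so t-tests, F-tests and confidence intervals based on them lead to unreliable conclusions.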
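Both consequences can be seen in a small Monte Carlo experiment. This is a minimal sketch under made-up assumptions: fixed $X_i = 1, \dots, 50$, true $\beta_0 = 1$, $\beta_1 = 2$, and normal errors with standard deviation $0.01 X_i^2$ (a design chosen here for illustration). Across repeated samples the OLS slopes average close to the true value (unbiasedness), while their actual spread is noticeably larger than the average conventional OLS standard error, which is what makes the usual tests unreliable.

```python
import math
import random

def slope_and_se(x, y):
    """OLS slope and its conventional (homoskedasticity-based) standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1, math.sqrt(s2 / sxx)

random.seed(42)
beta0, beta1 = 1.0, 2.0
x = [float(i) for i in range(1, 51)]  # fixed regressor values
reps = 2000
slopes, conv_se = [], []
for _ in range(reps):
    # Hypothetical heteroskedastic errors: sd_i = 0.01 * x_i**2
    y = [beta0 + beta1 * xi + random.gauss(0.0, 0.01 * xi ** 2) for xi in x]
    b1, se = slope_and_se(x, y)
    slopes.append(b1)
    conv_se.append(se)

mean_b1 = sum(slopes) / reps
sd_b1 = math.sqrt(sum((b - mean_b1) ** 2 for b in slopes) / (reps - 1))
mean_se = sum(conv_se) / reps
print(f"mean of slope estimates:      {mean_b1:.4f} (true value {beta1})")
print(f"actual sd of slope estimates: {sd_b1:.4f}")
print(f"average conventional OLS se:  {mean_se:.4f}")
```

Because the conventional se understates the true sampling spread in this design, t-statistics computed from it are too large on average.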

4. Remedies

4.1 Heteroskedasticity-Corrected Standard Errors

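$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad \operatorname{var}(\varepsilon_i) = \sigma_i^2 \quad \text{(heteroskedasticity)}$$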

OLS estimators are unbiased, but the usual OLS standard errors are biased.

A heteroskedasticity-consistent (HC) standard error of an estimated coefficient is the standard error of that coefficient adjusted for heteroskedasticity.

a. HC standard errors are consistent for any type of heteroskedasticity.
b. Hypothesis tests are valid with HC standard errors in large samples.
c. Typically, HC se > OLS se.

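Example 5 $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$.

Incorrect variance formula (the usual OLS formula, which assumes a constant $\sigma^2$):

$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{\sum_i (X_i - \bar X)^2}$$

Correct variance formula:

$$\operatorname{var}(\hat\beta_1) = \frac{\sum_i (X_i - \bar X)^2 \sigma_i^2}{\left[\sum_i (X_i - \bar X)^2\right]^2}$$

HC estimator of the variance of the slope coefficient in a simple regression model (each unknown $\sigma_i^2$ is replaced by the squared OLS residual $e_i^2$):

$$\widehat{\operatorname{var}}(\hat\beta_1) = \frac{\sum_i (X_i - \bar X)^2 e_i^2}{\left[\sum_i (X_i - \bar X)^2\right]^2}$$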

Example 6 HC Standard Errors, US data (UE_Tab0301)
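For a simple regression, the conventional and HC standard errors of the slope can be computed side by side. This is a minimal sketch: `ols_with_hc` is a hypothetical helper, the HC part uses White's basic form $\sum_i (X_i - \bar X)^2 e_i^2 / [\sum_i (X_i - \bar X)^2]^2$ (often called HC0), and the four observations are made up so the results can be checked by hand.

```python
import math

def ols_with_hc(x, y):
    """Simple regression y = b0 + b1*x + e.

    Returns (b0, b1, se_ols, se_hc):
      se_ols uses s^2 / Sxx, valid only under homoskedasticity;
      se_hc uses White's HC0 form sum((x - x_bar)^2 * e^2) / Sxx^2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    var_ols = s2 / sxx
    var_hc = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    return b0, b1, math.sqrt(var_ols), math.sqrt(var_hc)

# Four made-up observations: b1 = 1.1, var_ols = 0.27, var_hc = 0.0566
b0, b1, se_ols, se_hc = ols_with_hc([1, 2, 3, 4], [1, 3, 2, 5])
print(b0, b1, se_ols, se_hc)
```

HC se is not always larger than OLS se; with these four points it happens to be smaller. Some software also rescales HC0 by a degrees-of-freedom factor such as $n/(n-k)$ (the HC1 variant).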

4.2 Weighted Least Squares

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

The variance is assumed to be proportional to $Z_i^2$:

$$\sigma_i^2 = c\,Z_i^2$$

Step 1 Decide which variable is proportional to the heteroskedasticity.

Step 2 Divide all terms in the original model by that variable (divide by $Z_i$).

Step 3 Run least squares on the transformed model, which has new variables. Note that the transformed model has an intercept only if $Z$ is one of the explanatory variables.

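For example, if $Z_i = X_{2i}$, then

$$\frac{Y_i}{X_{2i}} = \beta_0 \frac{1}{X_{2i}} + \beta_1 \frac{X_{1i}}{X_{2i}} + \beta_2 + \frac{\varepsilon_i}{X_{2i}}, \qquad \operatorname{var}\!\left(\frac{\varepsilon_i}{X_{2i}}\right) = c.$$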
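The divide-by-$Z$ steps can be sketched in plain Python for the one-regressor model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with $\operatorname{var}(\varepsilon_i) = c X_i^2$, i.e. assuming $Z = X$. The helper names are hypothetical. Because $Z$ is a regressor, the transformed model $Y/X = \beta_0 (1/X) + \beta_1 + \varepsilon/X$ has an intercept, and the sketch shows how its coefficients map back to the original model's $\beta_0$ and $\beta_1$.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def wls_z_equals_x(x, y):
    """WLS for y = b0 + b1*x + e, assuming var(e_i) = c * x_i**2 (Z = X, x_i != 0).

    Dividing every term by x_i gives  y/x = b0 * (1/x) + b1 + e/x,
    whose error has constant variance c. In the OLS regression of y/x on 1/x,
    the transformed intercept estimates b1 and the slope on 1/x estimates b0.
    """
    y_star = [yi / xi for xi, yi in zip(x, y)]
    inv_x = [1.0 / xi for xi in x]
    intercept_star, slope_star = ols_simple(inv_x, y_star)
    return slope_star, intercept_star  # (b0, b1) of the original model

# Noise-free hypothetical check: y = 3 + 0.5x, so WLS should recover (3, 0.5)
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3 + 0.5 * xi for xi in x]
b0_wls, b1_wls = wls_z_equals_x(x, y)
print(b0_wls, b1_wls)
```

The swap in the return line is the point to watch when reading WLS output: the coefficients of the transformed model are the original model's coefficients, but attached to different variables.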

Example 7 WLS US data (UE_Tab0301)

What are the values of the estimated coefficients of the original model?

Has the problem of heteroskedasticity been solved?

Comparing different estimates, US data (UE_Tab0301)

                 β0       β1
OLS estimate     40.77    0.128
OLS se           22.14    0.031
HC se            24.32    0.039
WLS estimate     21.28    0.158
WLS se           14.03    0.023
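As a rough check on the table, dividing each estimate of $\beta_1$ by its standard error (the figures are rounded, so the ratios are approximate) gives

$$t_{OLS} = \frac{0.128}{0.031} \approx 4.13, \qquad t_{HC} = \frac{0.128}{0.039} \approx 3.28, \qquad t_{WLS} = \frac{0.158}{0.023} \approx 6.87.$$

The slope remains significant at conventional levels in every case; the HC correction lowers the t-ratio relative to OLS, while WLS raises it.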

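The WLS standard errors are smaller than both the OLS and HC standard errors, so the WLS estimates are more precise (provided the assumed variance form is correct).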

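Other possibilities

- $\operatorname{var}(\varepsilon_i) = c\,Z_i$
- $\operatorname{var}(\varepsilon_i) = c\,Z_i^{\gamma}$ (for some power $\gamma$)
- $\operatorname{var}(\varepsilon_i) = c\,(a_1 X_{1i} + a_2 X_{2i})$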
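The divide-through idea of the WLS steps covers each of these cases. As a sketch, write the assumed variance as $c\,h_i$, where $h_i > 0$ stands for $Z_i$, $Z_i^{\gamma}$ (some power $\gamma$) or $a_1 X_{1i} + a_2 X_{2i}$, and divide every term of the model by $\sqrt{h_i}$:

$$\frac{Y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{X_{1i}}{\sqrt{h_i}} + \beta_2 \frac{X_{2i}}{\sqrt{h_i}} + \frac{\varepsilon_i}{\sqrt{h_i}}, \qquad \operatorname{var}\!\left(\frac{\varepsilon_i}{\sqrt{h_i}}\right) = \frac{c\,h_i}{h_i} = c.$$

With $h_i = Z_i^2$ this reduces to dividing by $Z_i$. The sketch treats $h_i$ as known; when it contains unknown parameters (such as $a_1$, $a_2$ or $\gamma$), they would have to be estimated first.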

In large samples, HC standard errors are consistent for any type of heteroskedasticity, so confidence intervals and t-tests based on them are valid.

4.3 Re-specifying the Regression Model

The heteroskedasticity may be impure.

4.3.1 Use another functional form

E.g., a double-log form: taking logs compresses the scale of the variables, so the error term shows less variation.
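As a sketch, assuming both variables are strictly positive, the double-log version of the Unit 10 food expenditure model would be

$$\ln(foodexp_i) = \beta_0 + \beta_1 \ln(income_i) + \varepsilon_i,$$

where $\beta_1$ is now the income elasticity of food expenditure. Because the log compresses large values more than small ones, the extra spread of food expenditure at high incomes shrinks relative to the linear specification.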

Example 8 US data (UE_Tab0301)

The hypothesis of constant variance can be rejected.

Example 9 India data (Food_India55)

Empirical model: $foodexp_i = \beta_0 + \beta_1\, totexp_i + \varepsilon_i$.

The hypothesis of homoskedasticity can be rejected by the Park and White tests.

Which model is the best: double-log, OLS with HC standard errors, or WLS?

4.3.2 Other reformulations

E.g., express variables related to the size of the observed units as averages (such as per-capita values), or add more variables.

Example 10 Data set Concert: the concert tour of a singer in the US

$$revenue = \beta_0 + \beta_1\, adv + \beta_2\, stad + \beta_3\, cd + \beta_4\, radio + \beta_5\, weekend + \varepsilon$$

[Estimated equations (1), (2), (3)]

- Remarks
- The variable Z is difficult to identify, and the functional relationship between the error variance and Z is not known. Use WLS as a last resort.
- With correct WLS, we expect the standard errors of the regression coefficients to be smaller than their OLS counterparts.
- A log transformation usually reduces the degree of heteroskedasticity.
- The hypothesis of homoskedasticity should not be rejected in the new model.

5. A Complete Example

Sources: Section 8.2.2 (pp. 255-256); Section 10.5 (pp. 369-376)

Empirical regression model:

$$pcon_i = \beta_0 + \beta_1\, reg_i + \beta_2\, tax_i + \beta_3\, uhm_i + \varepsilon_i$$

- pcon_i: petroleum consumption in the ith state
- reg_i: motor vehicle registrations in the ith state (000)
- tax_i: the gasoline tax rate in the ith state (cents per gallon)
- uhm_i: urban highway miles within the ith state

Equation 1

Equation 2

Graphical investigation

Park test

White test

Checking for other specifications: double-log, quadratic

[Estimated equations (4), (5), (6)]

Selected Exercises

Ch. 10 Q. 1, 3, 4, 5, 8, 10, 12, 14
