
Heteroskedasticity

What is Heteroskedasticity?

- One of the assumptions of the CLR model (assumption V) is that the error term in the linear model is drawn from a distribution with a constant variance
- When this is the case, we say that the errors are homoskedastic
- If this assumption does not hold, we run into the problem of heteroskedasticity
- Heteroskedasticity, as a violation of the assumptions of the CLR model, causes the OLS estimates to lose some of their nice properties

What is Heteroskedasticity?

- Example: Suppose we take a cross-section sample of firms from a certain industry and we want to estimate a model with sales as the dependent variable
- We may suspect that sales of larger firms will vary more than those of smaller firms, implying that there will be heteroskedasticity
- Heteroskedasticity is common in cross-section data and needs to be identified and corrected (firms, banks, insurance companies, mutual funds, real estate, etc.)

The Heteroskedasticity Problem

- Heteroskedasticity comes in two versions
- Pure heteroskedasticity
- Impure heteroskedasticity
- Pure heteroskedasticity refers to heteroskedasticity that occurs in a correctly specified model
- This term distinguishes it from impure heteroskedasticity, which is caused by a specification error such as omitted variable bias
- Typically, when we refer to heteroskedasticity, we mean the pure version

The Heteroskedasticity Problem

- If heteroskedasticity is present in a correctly specified model, then the variance of the error term is not constant: Var(ε_i) = σ_i², for i = 1, 2, ..., n
- Implication: Rather than being constant across observations, the variance of the error term changes depending on the observation

The Heteroskedasticity Problem

- Heteroskedasticity is common in cross-section data where there is a wide disparity between large and small observed values of the variables
- The larger this disparity, the higher the chance that the error term will not have a constant variance
- In the most frequent model of heteroskedasticity, the variance of the error term (ε_i) depends on an exogenous variable (Z_i)

The Heteroskedasticity Problem

- We write Var(ε_i) = σ²Z_i²
- The variable Z, called the proportionality factor, may or may not be one of the explanatory variables in the regression model
- The higher the value of Z_i, the higher the variance of the error term for observation i
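As a quick illustration of this variance structure, the following is a minimal numpy sketch with simulated data (all variable names and parameter values are invented for the example): errors are drawn with Var(ε_i) = σ²Z_i², and the empirical spread of the errors is visibly larger among high-Z observations than low-Z ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sigma = 2.0
z = rng.uniform(1, 10, n)          # proportionality factor Z_i
eps = rng.normal(0, sigma * z)     # heteroskedastic errors: Var(eps_i) = sigma^2 * Z_i^2

# Compare the empirical error variance for low-Z vs high-Z observations
var_low = eps[z < 3].var()
var_high = eps[z > 8].var()
```

Because the standard deviation of each error is proportional to Z_i, `var_high` comes out many times larger than `var_low`.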

The Heteroskedasticity Problem

- Example: Trying to explain firm sales with a cross-section sample of firms, the variable Z_i may be the asset size of firm i
- This would imply that the larger the asset size of firm i, the more volatile will be that firm's sales
- This example shows that heteroskedasticity is likely to occur in a cross-section sample because of the larger variability in the observations of the dependent variable
- Heteroskedasticity may also occur in time-series data (more rarely) or when the quality of the data changes within the sample

The Heteroskedasticity Problem

- If there is a specification error in our model, such as omitted variable bias, the errors may exhibit impure heteroskedasticity
- Example: In the two-factor model of stock returns estimated without the inflation variable (INF), the errors may exhibit heteroskedasticity
- The error term now includes the variable (INF), and the error term may be larger for larger values of (INF)

The Impact of Heteroskedasticity on the OLS

Estimates

- Heteroskedasticity has the following implications for the OLS estimates
- OLS estimates are still unbiased
- OLS estimates no longer have the minimum variance (they are not efficient)
- Heteroskedasticity causes OLS to underestimate the variances and standard errors of the estimated coefficients
- This implies that the t-test and F-test are not reliable
- The t-statistics tend to be higher, leading us to reject a null hypothesis that should not be rejected
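The underestimation of the standard errors can be seen concretely with a small Monte Carlo sketch (simulated data, numpy only; the data-generating parameters are invented for the example). Across many samples with an error variance that grows with X, the actual sampling spread of the OLS slope exceeds the conventional standard error that OLS reports on average.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 500
x = rng.uniform(1, 10, n)                     # fixed regressor values
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, ses = [], []
for _ in range(reps):
    eps = rng.normal(0, 0.5 * x**2)           # heteroskedastic errors
    y = 1.0 + 2.0 * x + eps
    beta = XtX_inv @ X.T @ y                  # OLS estimates
    e = y - X @ beta
    s2 = e @ e / (n - 2)                      # conventional error-variance estimate
    slopes.append(beta[1])
    ses.append(np.sqrt(s2 * XtX_inv[1, 1]))   # conventional SE of the slope

true_sd = np.std(slopes)   # actual sampling spread of the slope estimate
avg_se = np.mean(ses)      # what the conventional OLS formula reports on average
```

In this setup `avg_se` falls short of `true_sd`: the conventional formula understates the uncertainty, which is exactly why the t-statistics come out too high.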

Testing for the Presence of Heteroskedasticity

- There are several tests that can be used to detect the presence of heteroskedasticity in the data
- We will cover two such tests
- The Breusch-Pagan Test
- The White Test
- Both test the null hypothesis of homoskedasticity, H0: σ_1² = σ_2² = ... = σ_n² = σ², against the alternative that it is not true

Testing for the Presence of Heteroskedasticity

- The steps of the Breusch-Pagan test are as follows
- Estimate the original regression model by OLS and obtain the squared OLS residuals (one for each observation)
- Run a new linear regression of the squared OLS residuals on all the explanatory variables
- Obtain the R² of this regression
- Calculate the following F-statistic using the above R²: F = (R²/k) / [(1 - R²)/(n - k - 1)], where k is the number of explanatory variables

Testing for the Presence of Heteroskedasticity

- The above F-statistic follows an F distribution with k degrees of freedom in the numerator and (n - k - 1) degrees of freedom in the denominator
- Reject the null hypothesis that there exists no heteroskedasticity if the F-statistic is greater than the critical F-value at the selected level of significance
- If the null is rejected, then there exists heteroskedasticity in the data and an alternative estimation method to OLS must be followed
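The steps above can be sketched with numpy alone (simulated, illustrative data; in applied work a library routine such as statsmodels' `het_breuschpagan` would normally be used, but the manual version makes the mechanics explicit):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
eps = rng.normal(0, 0.5 * x)             # error spread grows with x
y = 2.0 + 3.0 * x + eps

# Step 1: OLS on the original model; keep the squared residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Step 2: regress the squared residuals on the explanatory variables
g = np.linalg.lstsq(X, e2, rcond=None)[0]
fitted = X @ g
r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: F-statistic with k and (n - k - 1) degrees of freedom
k = 1                                    # one explanatory variable
F = (r2 / k) / ((1 - r2) / (n - k - 1))
```

With this data-generating process the F-statistic far exceeds the 5% critical value of F(1, 198) (about 3.89), so the null of homoskedasticity is rejected.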

Testing for the Presence of Heteroskedasticity

- The steps of the White test are as follows
- Estimate the original regression model by OLS and obtain the squared OLS residuals (one for each observation)
- Run a new regression of the squared OLS residuals on each explanatory variable X, the square of each explanatory variable X, and the product of each variable X times every other variable X
- Example: If our original model has two explanatory variables X_1 and X_2, then at the second stage we would run the regression e_i² = α_0 + α_1 X_1i + α_2 X_2i + α_3 X_1i² + α_4 X_2i² + α_5 X_1i X_2i + v_i

Testing for the Presence of Heteroskedasticity

- Obtain the R² of this regression
- Calculate the test statistic nR², where n is the sample size and R² is that obtained in the previous step
- This statistic follows a chi-square distribution with K degrees of freedom (K = the number of variables included in the second-stage regression)
- In our example above, there are five explanatory variables, so the test statistic nR² will follow a chi-square distribution with five degrees of freedom
- Reject the null hypothesis that there exists no heteroskedasticity if the test statistic is greater than the critical χ²-value at the selected level of significance
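The two-variable example above can be sketched in numpy (simulated data; statsmodels' `het_white` offers a ready-made version, but the manual second-stage regression shows exactly which terms enter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.uniform(1, 5, n)
x2 = rng.uniform(1, 5, n)
eps = rng.normal(0, x1)                  # error variance grows with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + eps

# Step 1: OLS residuals of the original model
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Step 2: auxiliary regression on levels, squares, and the cross product
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
fit = Z @ g
r2 = 1 - np.sum((e2 - fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: nR^2 follows chi-square with 5 degrees of freedom
stat = n * r2
```

Here `stat` comfortably exceeds the 5% critical χ²-value with five degrees of freedom (about 11.07), so the test detects the heteroskedasticity built into the simulated errors.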

Correcting the Problem of Heteroskedasticity

- If heteroskedasticity is detected, an alternative estimation approach to OLS must be used that corrects this problem
- Two commonly used approaches are
- The method of Weighted Least Squares (WLS)
- Obtaining heteroskedasticity-corrected standard errors
- The WLS method transforms the original regression model in order to make the errors have the same variance
- After this transformation takes place, the model can be estimated by OLS since the heteroskedasticity problem has been corrected

Correcting the Problem of Heteroskedasticity

- Example: Suppose we want to estimate the model Y_i = β_0 + β_1 X_i + ε_i
- and that the variance of the error term takes the form Var(ε_i) = σ²Z_i²
- where σ² is the assumed constant variance and Z_i is the proportionality factor

Correcting the Problem of Heteroskedasticity

- If we divide the model by the proportionality factor Z_i, we obtain Y_i/Z_i = β_0(1/Z_i) + β_1(X_i/Z_i) + u_i, where u_i = ε_i/Z_i
- The error term of the transformed model (u_i) now has a constant variance and, thus, the model can be estimated by OLS
- Note: If the proportionality factor (or weight variable) in the above regression is NOT one of the explanatory variables, then we must include a constant in the transformed model; otherwise a constant is already included
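The transformation can be sketched in numpy with simulated data (all parameter values are invented for the example). Here the proportionality factor equals the explanatory variable itself (Z_i = X_i), so the term β_1(X_i/Z_i) = β_1 becomes the constant of the transformed model, matching the note above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, n)            # here Z_i = X_i (the weight IS a regressor)
eps = rng.normal(0, 2.0 * x)         # Var(eps_i) = 4 * X_i^2
y = 1.0 + 0.5 * x + eps              # true beta0 = 1.0, beta1 = 0.5

# Transformed model: y/x = b0 * (1/x) + b1 + u, where Var(u) is constant
ys = y / x
Xs = np.column_stack([1.0 / x, np.ones(n)])  # b1's column is x/x = 1 (the constant)
b0_wls, b1_wls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

Running OLS on the transformed variables recovers estimates close to the true β_0 = 1.0 and β_1 = 0.5, because the transformed error u_i = ε_i/X_i has constant variance.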

Correcting the Problem of Heteroskedasticity

- The heteroskedasticity-corrected standard errors approach is the most popular method to correct for heteroskedasticity
- This approach improves the estimation of the model's standard errors (SE) without having to transform the estimated model
- Given that these SE are more accurate, they can be used for t-tests and other hypothesis tests
- Typically, the corrected SE will be larger, leading to lower t-statistics
- This approach works better in large samples, and some software packages do include it (such as SPSS)
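A numpy sketch of the idea (simulated data; this implements White's HC0 form of the corrected standard errors, which libraries such as statsmodels expose via `cov_type='HC0'`): the OLS coefficients are kept, but the variance matrix is rebuilt from the squared residuals observation by observation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, n)
eps = rng.normal(0, 0.5 * x**2)          # strongly heteroskedastic errors
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y) # OLS coefficients (unchanged)
e = y - X @ beta

# Conventional OLS standard errors (assume a constant error variance)
XtX_inv = np.linalg.inv(X.T @ X)
s2 = e @ e / (n - 2)
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) heteroskedasticity-corrected standard errors:
# sandwich form (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1
meat = X.T @ (X * (e**2)[:, None])
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

Consistent with the slide, the corrected SE of the slope comes out larger than the conventional one here, so the resulting t-statistic is smaller and the inference more honest.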