1
Analysis of Cross Section and Panel Data
  • Yan Zhang
  • School of Economics, Fudan University
  • CCER, Fudan University

2
Introductory Econometrics: A Modern Approach
  • Yan Zhang
  • School of Economics, Fudan University
  • CCER, Fudan University

3
Analysis of Cross Section and Panel Data
  • Part 1. Regression Analysis on Cross Sectional
    Data

4
Chap 2. The Simple Regression Model: Practice for Learning Multiple Regression
  • Bivariate linear regression model: y = β0 + β1x + u.
  • β1, the slope parameter in the relationship between y and x holding the other factors in u fixed, is of primary interest in applied economics.
  • β0, the intercept parameter, also has its uses, although it is rarely central to an analysis.

5
More Discussion
  • A one-unit change in x has the same effect
    on y, regardless of the initial value of x.
  • This rules out, e.g., increasing returns to education in a wage equation; capturing them requires a different functional form.
  • Can we draw ceteris paribus conclusions about how
    x affects y from a random sample of data, when we
    are ignoring all the other factors?
  • Only if we make an assumption restricting how the
    unobservable random variable u is related to the
    explanatory variable x

6
Classical Regression Assumptions
  • E(u) = 0 is a feasible (harmless) assumption as long as the intercept term is included.
  • Zero conditional expectation of u given x is stronger than u and x being merely linearly uncorrelated.
  • Meaning: the average value of the unobservables is the same at every value of x.
  • PRF (Population Regression Function): something fixed but unknown (the equations are reconstructed below).

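The assumption equations on this slide were images in the original deck; a reconstruction of the standard simple-regression setup from Wooldridge, Chap. 2:

```latex
% Simple regression model and its key assumptions (Wooldridge, Chap. 2)
\begin{align*}
y &= \beta_0 + \beta_1 x + u
  && \text{(bivariate model)} \\
E(u) &= 0
  && \text{(a normalization, feasible once an intercept is included)} \\
E(u \mid x) &= E(u) = 0
  && \text{(zero conditional mean; stronger than zero correlation)} \\
E(y \mid x) &= \beta_0 + \beta_1 x
  && \text{(PRF: fixed but unknown)}
\end{align*}
```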
7
OLS
  • Minimize the sum of squared residuals, û'û.
  • Sample regression function (SRF): ŷ = β̂0 + β̂1x.
  • The point (x̄, ȳ) is always on the OLS regression line.

(equation graphic: the SRF estimates the PRF, E(y|x) = β0 + β1x)
8
OLS
  • Coefficient of determination, R²:
  • the fraction of the sample variation in y that is explained by x;
  • the square of the sample correlation coefficient between the actual and fitted values of y.
  • Low R-squareds are common in cross-sectional work and do not by themselves invalidate a regression (a numerical sketch follows below).

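A minimal numpy sketch of the OLS and R-squared formulas from slides 7-8; the simulated data and variable names are purely illustrative:

```python
import numpy as np

# Illustrative data: any paired sample (x_i, y_i) would do here
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # true beta0 = 1, beta1 = 2

# OLS slope and intercept for the simple regression y = b0 + b1*x + u
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # sample cov / sample var
b0 = y.mean() - b1 * x.mean()                          # line passes through (x_bar, y_bar)

# Fitted values, residuals, and the coefficient of determination
y_hat = b0 + b1 * x
u_hat = y - y_hat
r_squared = 1 - u_hat.var() / y.var()                  # 1 - SSR/SST
assert np.isclose(r_squared, np.corrcoef(y, y_hat)[0, 1] ** 2)  # square of corr(y, y_hat)

print(b0, b1, r_squared)
```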
9
Units of Measurement
  • If the dependent variable is multiplied by a constant c (each value in the sample is multiplied by c), then the OLS intercept and slope estimates are also multiplied by c.
  • If one of the independent variables is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is, respectively, multiplied or divided by c.
  • The goodness-of-fit of the model, R-squared, does not depend on the units of measurement of our variables.

10
Functional Form
  • Linear vs. nonlinear relationships.
  • Logarithmic dependent variable (log-level): the slope gives the approximate percentage change in y for a one-unit change in x, a semi-elasticity.
  • This captures, e.g., an increasing return to education.
  • Other nonlinearity: diploma effects.
  • Bi-logarithmic (log-log): the slope is a constant elasticity.
  • Change of units of measurement affects only the intercept: β0' = β0 + log(c1) − β1 log(c2) (this corrects the expression on p. 45 of the text).
  • Be proficient at interpreting the coefficients (the interpretations are summarized below).

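The functional-form equations on slide 10 were images; a standard summary of how the slope is interpreted in each case (following Wooldridge), plus the units-change result flagged above:

```latex
% Interpretation of beta_1 under common functional forms
\begin{align*}
\text{level--level:} \quad & y = \beta_0 + \beta_1 x + u,
  & \Delta y &= \beta_1 \,\Delta x \\
\text{log--level:} \quad & \log(y) = \beta_0 + \beta_1 x + u,
  & \%\Delta y &\approx (100\,\beta_1)\,\Delta x \quad \text{(semi-elasticity)} \\
\text{log--log:} \quad & \log(y) = \beta_0 + \beta_1 \log(x) + u,
  & \%\Delta y &\approx \beta_1 \,\%\Delta x \quad \text{(constant elasticity)}
\end{align*}
% Rescaling units in the log--log model, y -> c_1 y and x -> c_2 x, changes only the
% intercept: beta_0' = beta_0 + log(c_1) - beta_1 log(c_2); the slope (elasticity) is unchanged.
```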
11
Unbiasedness of OLS Estimators
  • Statistical properties of the OLS estimators, viewed across repeated random samples.
  • Assumptions (SLR.1-SLR.4):
  • Linear in parameters (other functional forms require more advanced methods).
  • Random sampling (time series data involve nonrandom sampling).
  • Zero conditional mean (violations lead to bias, e.g. spurious correlation).
  • Sample variation in the independent variable (the xi are not all the same value).
  • Theorem (Unbiasedness): under the four assumptions above, E(β̂0) = β0 and E(β̂1) = β1.
12
Variance of OLS Estimators
  • Additional assumption:
  • Homoskedasticity: the error u has the same variance given any value of x.
  • Error variance σ².
  • A larger σ² means that the distribution of the unobservables affecting y is more spread out.
  • Theorem (Sampling variance of OLS estimators): under the five assumptions above, the variances take the form reconstructed below.

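The variance formulas on this slide were images; the standard result under the five simple-regression (Gauss-Markov) assumptions is:

```latex
\begin{align*}
\operatorname{Var}(\hat\beta_1) &= \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sigma^2}{SST_x}, \\
\operatorname{Var}(\hat\beta_0) &= \frac{\sigma^2 \, n^{-1}\sum_{i=1}^{n} x_i^2}{SST_x}.
\end{align*}
```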
13
Variance of y given x
  • Conditional mean E(y|x) = β0 + β1x and conditional variance Var(y|x) = σ².
  • Heteroskedasticity: Var(u|x) depends on x.

14
What does Var(β̂1) depend on?
  • More variation in the unobservables affecting y makes it more difficult to precisely estimate β1.
  • The more spread out the sample of xi's is, the easier it is to find the relationship between E(y|x) and x.
  • As the sample size increases, so does the total variation in the xi. Therefore, a larger sample size results in a smaller variance of the estimator.

15
Estimating Error Variance
  • Errors (disturbances) vs. residuals:
  • errors appear in the population model;
  • residuals come from the estimated equation.
  • Theorem (Unbiased estimation of σ²): under the five assumptions above, E(σ̂²) = σ² (see the reconstruction below).
  • σ̂ is the standard error of the regression (SER):
  • it estimates the standard deviation in y after the effect of x has been taken out.
  • Standard error of β̂1: replace σ with σ̂ in the sampling variance formula.

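The estimator formulas on this slide were images; the standard reconstruction for the simple regression case is:

```latex
\begin{align*}
\hat\sigma^2 &= \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2 = \frac{SSR}{n-2},
  \qquad E(\hat\sigma^2) = \sigma^2, \\
\operatorname{se}(\hat\beta_1) &= \frac{\hat\sigma}{\sqrt{SST_x}}
  \qquad \text{(the standard error of } \hat\beta_1\text{)}.
\end{align*}
```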
16
Regression through the Origin
  • The fitted line must pass through the origin (0, 0).
  • E.g., income tax revenue is zero when income is zero.
  • The OLS estimator β̃1 (reconstructed below) equals the usual slope estimate only in special cases, e.g. when x̄ = 0.
  • If the intercept β0 ≠ 0 in the population model, then β̃1 is a biased estimator of β1.

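The through-the-origin estimator on this slide was an image; its standard form (Wooldridge, Sec. 2.6) is:

```latex
% Regression through the origin: fitted line \tilde{y} = \tilde\beta_1 x
\tilde\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
% If the true intercept beta_0 differs from zero, \tilde\beta_1 is biased for beta_1.
```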
17
Chap 3. Multiple Regression Analysis: Estimation
  • Advantages of multiple regression analysis:
  • build better models for predicting the dependent variable;
  • e.g., a consumption function that is quadratic in income generalizes the functional form (the marginal propensity to consume can vary with income);
  • be more amenable to ceteris paribus analysis.
  • Chap 3.2
  • Key assumption: E(u | x1, ..., xk) = 0.
  • Implication: other factors affecting wage are not related, on average, to educ and exper.
  • Multiple linear regression model: y = β0 + β1x1 + ... + βkxk + u.

βj measures the ceteris paribus effect of xj on y.
18
Ordinary Least Squares Estimator
  • SRF: ŷ = β̂0 + β̂1x1 + ... + β̂kxk.
  • OLS: minimize the sum of squared residuals.
  • First-order conditions: k + 1 linear equations in the k + 1 unknowns β̂0, ..., β̂k.
  • Ceteris paribus interpretations:
  • holding x2, ..., xk fixed, Δŷ = β̂1Δx1;
  • thus, we have controlled for the variables x2, ..., xk when estimating the effect of x1 on y.

19
Holding Other Factors Fixed
  • The power of multiple regression analysis is that
    it provides this ceteris paribus interpretation
    even though the data have not been collected in a
    ceteris paribus fashion.
  • It allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.

20
OLS and Ceteris Paribus Effects
  • Steps of OLS ("partialling out"), as sketched numerically below:
  • (1) obtain the OLS residuals r̂1 from a multiple regression of x1 on x2, ..., xk;
  • (2) β̂1 is the OLS slope estimator from a simple regression of y on r̂1.
  • β̂1 measures the effect of x1 on y after x2, ..., xk have been partialled or netted out.
  • Two special cases in which the simple regression of y on x1 produces the same OLS estimate on x1 as the regression of y on x1 and x2: the OLS coefficient on x2 is zero, or x1 and x2 are uncorrelated in the sample.

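A small numpy sketch of the partialling-out steps above; the simulated data and variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients by least squares (X already includes a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Full multiple regression of y on (1, x1, x2): coefficient on x1
b_full = ols(np.column_stack([const, x1, x2]), y)[1]

# Partialling out: (1) residuals from regressing x1 on (1, x2)
g = ols(np.column_stack([const, x2]), x1)
r1 = x1 - np.column_stack([const, x2]) @ g
# (2) simple regression of y on the residuals r1 (r1 has mean ~0, so no constant needed)
b_partial = (r1 @ y) / (r1 @ r1)

assert np.isclose(b_full, b_partial)   # the two routes give the same estimate
print(b_full, b_partial)
```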
21
Goodness-of-fit
  • R² also equals the squared correlation coefficient between the actual and the fitted values of y.
  • R² never decreases, and it usually increases, when another independent variable is added to a regression.
  • The factor that should determine whether an
    explanatory variable belongs in a model is
    whether the explanatory variable has a nonzero
    partial effect on y in the population.

22
Regression through the origin
  • The properties of OLS derived earlier no longer hold for regression through the origin:
  • the OLS residuals no longer have a zero sample average;
  • R² as usually computed can actually be negative;
  • one remedy is to calculate R² as the squared correlation coefficient between the actual and fitted values of y.
  • If the intercept in the population model is different from zero, then the OLS estimators of the slope parameters will be biased.

23
The Expectation of OLS Estimator
  • Assumptions (MLR.1-MLR.4):
  • Linear in parameters.
  • Random sampling.
  • Zero conditional mean: E(u | x1, ..., xk) = 0.
  • No perfect collinearity:
  • none of the independent variables is constant,
  • and there are no exact linear relationships among the independent variables.
  • Theorem (Unbiasedness): under the four assumptions above, E(β̂j) = βj for j = 0, 1, ..., k.

rank(X) = K (full column rank)
24
Notice 1: Zero Conditional Mean
  • Exogenous vs. endogenous explanatory variables.
  • Ways the assumption can fail:
  • Misspecification of functional form (Chap 9):
  • omitting the quadratic term;
  • using the level rather than the log of a variable (or vice versa).
  • Omitting important factors that are correlated with any independent variable.
  • Measurement error (Chap 15, IV).
  • Simultaneously determining one or more x's with y (Chap 16, simultaneous equations).

25
Omitted Variable Bias The Simple Case
  • Problem: excluding a relevant variable, i.e. under-specifying the model.
  • Omitted variable bias (misspecification analysis), reconstructed below:
  • the true population model: y = β0 + β1x1 + β2x2 + u;
  • the underspecified OLS line: ỹ = β̃0 + β̃1x1;
  • the expectation of β̃1, and hence the omitted variable bias, depends on the relationship between x1 and x2.

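The bias expressions on this slide were images; the standard reconstruction (Wooldridge, Sec. 3.3) is:

```latex
% True model: y = beta_0 + beta_1 x_1 + beta_2 x_2 + u
% Underspecified fit: \tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1
\begin{align*}
E(\tilde\beta_1) &= \beta_1 + \beta_2\,\tilde\delta_1,
  \qquad
  \tilde\delta_1 = \frac{\widehat{\operatorname{Cov}}(x_1, x_2)}{\widehat{\operatorname{Var}}(x_1)}, \\
\operatorname{Bias}(\tilde\beta_1) &= \beta_2\,\tilde\delta_1
  \qquad \text{(zero if } \beta_2 = 0 \text{ or } x_1 \text{ and } x_2 \text{ are uncorrelated in the sample)}.
\end{align*}
```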
Table 3.2 in the text summarizes how the bias depends on the signs of β2 and Corr(x1, x2).
26
Omitted Variable Bias Nonexistence
  • Two cases where β̃1 is unbiased:
  • the true population model: y = β0 + β1x1 + β2x2 + u;
  • δ̃1 is the sample covariance between x1 and x2 over the sample variance of x1;
  • if β2 = 0, or if x1 and x2 are uncorrelated in the sample (δ̃1 = 0), then β̃1 is unbiased even though x2 is omitted.
  • Summary of omitted variable bias:
  • E(β̃1) = β1 + β2δ̃1, so the omitted variable bias is β2δ̃1.

27
The Size of Omitted Variable Bias
  • Direction and size of the bias:
  • a small bias of either sign need not be a cause for concern.
  • The bias is unknown, but we often have some idea about it:
  • we usually have a pretty good idea about the direction of the partial effect of x2 on y, that is, the sign of β2;
  • in many cases we can make an educated guess about whether x1 and x2 are positively or negatively correlated.
  • Terminology: upward bias, downward bias, biased toward zero.

28
Omitted Variable Bias More General Cases
  • Suppose the true model contains x1, x2, and x3, we omit x3, x2 and x3 are uncorrelated, but x1 is correlated with x3.
  • Both β̃1 and β̃2 will normally be biased. The only exception is when x1 and x2 are also uncorrelated.
  • It is difficult to obtain the direction of the bias in β̃1 and β̃2.
  • An approximation is available if x1 and x2 are also uncorrelated: the bias in β̃1 behaves as in the simple case, using the slope from regressing x3 on x1.

29
Notice 2: No Perfect Collinearity
  • This is an assumption only about the x's; it says nothing about the relationship between u and the x's.
  • Assumption MLR.4 does allow the independent variables to be correlated; they just cannot be perfectly correlated. That is what makes ceteris paribus effects estimable.
  • If we did not allow for any correlation among the
    independent variables, then multiple regression
    would not be very useful for econometric
    analysis.
  • Significance

30
Cases of Perfect Collinearity
  • When can independent variables be perfectly collinear? (software reports a singular matrix)
  • Nonlinear functions of the same variable are not exact linear functions, so e.g. x and x² can appear together.
  • Do not include the same explanatory variable measured in different units in the same regression equation.
  • More subtle ways:
  • one independent variable can be expressed as an exact linear function of some or all of the other independent variables; drop one of them.
  • Key

31
Notice 3: Unbiasedness
  • The meaning of unbiasedness:
  • an estimate cannot be unbiased; an estimate is a fixed number, obtained from a particular sample, which usually is not equal to the population parameter.
  • When we say that OLS is unbiased under
    Assumptions MLR.1 through MLR.4, we mean that the
    procedure by which the OLS estimates are obtained
    is unbiased when we view the procedure as being
    applied across all possible random samples.

32
Notice 4: Over-Specification
  • Inclusion of an irrelevant variable, i.e. over-specifying the model:
  • does not affect the unbiasedness of the OLS estimators;
  • but including irrelevant variables can have undesirable effects on the variances of the OLS estimators.

33
Variance of The OLS Estimators
  • Additional assumption:
  • Homoskedasticity: Var(u | x1, ..., xk) = σ².
  • Error variance σ²: a larger σ² means that the distribution of the unobservables affecting y is more spread out.
  • Gauss-Markov assumptions (for cross-sectional regression): Assumptions MLR.1-MLR.5.
  • Theorem (Sampling variance of OLS estimators): under the five assumptions above, the variance takes the form reconstructed below.

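The variance formula on this slide was an image; the standard statement of the theorem is:

```latex
% Under the Gauss-Markov assumptions MLR.1-MLR.5:
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)}, \qquad j = 1, \dots, k,
% where SST_j = \sum_i (x_{ij} - \bar{x}_j)^2 and R_j^2 is the R-squared from
% regressing x_j on all of the other independent variables.
```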
34
More about Var(β̂j)
  • The statistical properties of the regression of y on x1, x2, ..., xk.
  • Error variance σ²:
  • the only way to reduce the error variance is to add more explanatory variables, which is not always possible or desirable.
  • The total sample variation in xj, SSTj:
  • increasing the sample size increases SSTj.

35
Multicollinearity
  • Linear relationships among the independent variables.
  • Rj²: the proportion of the total variation in xj that can be explained by the other independent variables in the model.
  • If k = 2, R1² is simply the R-squared from regressing x1 on x2.
  • High (but not perfect) correlation between two or more of the independent variables is called multicollinearity (see the VIF sketch below).

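A hedged numpy sketch of computing Rj² and the variance inflation factor VIFj = 1/(1 − Rj²) referred to above; the data and names are made up for illustration:

```python
import numpy as np

def r_squared_j(X, j):
    """R^2 from regressing column j of X on the remaining columns (plus a constant)."""
    n = X.shape[0]
    xj = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef = np.linalg.lstsq(others, xj, rcond=None)[0]
    resid = xj - others @ coef
    return 1.0 - resid.var() / xj.var()

# Example: three regressors, two of them highly correlated
rng = np.random.default_rng(2)
z = rng.normal(size=300)
X = np.column_stack([z + 0.1 * rng.normal(size=300),   # x1
                     z + 0.1 * rng.normal(size=300),   # x2, highly correlated with x1
                     rng.normal(size=300)])            # x3, roughly independent

for j in range(X.shape[1]):
    r2 = r_squared_j(X, j)
    print(f"x{j+1}: R_j^2 = {r2:.3f}, VIF = {1 / (1 - r2):.1f}")
```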
36
Micronumerosity: the Problem of Small Sample Size
  • A high Rj² affects Var(β̂j) in the same way as a low SSTj.
  • One thing is clear: everything else being equal, for estimating βj it is better to have less correlation between xj and the other x's.
  • How to address multicollinearity?
  • Increase the sample size.
  • Drop some variables? Dropping a relevant variable to reduce multicollinearity can introduce omitted variable bias.

37
Notice: The Influence of Multicollinearity
  • A high degree of correlation between certain independent variables can be irrelevant to how well we can estimate the other parameters in the model.
  • E.g., high correlation between x2 and x3 has no direct effect on the precision of β̂1 if x1 is uncorrelated with x2 and x3.
  • Importance for economists: controlling variables.

38
Variances in Misspecified Models
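The formulas for this slide were images; the standard comparison (Wooldridge, Sec. 3.4) for the true model y = β0 + β1x1 + β2x2 + u is:

```latex
% \hat\beta_1: slope on x_1 from the regression of y on x_1 and x_2
% \tilde\beta_1: slope from the regression of y on x_1 only
\begin{align*}
\operatorname{Var}(\hat\beta_1) &= \frac{\sigma^2}{SST_1\,(1 - R_1^2)},
  &
\operatorname{Var}(\tilde\beta_1) &= \frac{\sigma^2}{SST_1},
\end{align*}
% so Var(\tilde\beta_1) <= Var(\hat\beta_1); but when beta_2 != 0, \tilde\beta_1 is biased.
```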
39
Whether or Not to Include x2: Two Favorable Reasons
  • The choice of whether or not to include a particular variable in a regression model can be made by analyzing the tradeoff between bias and variance.
  • However, when β2 ≠ 0, there are two favorable reasons for including x2 in the model:
  • any bias in β̃1 does not shrink as the sample size grows;
  • the variances of both estimators shrink to zero as n increases.
  • Therefore, the multicollinearity induced by adding x2 becomes less important as the sample size grows. In large samples, we would prefer β̂1 from the model that includes x2.

40
Estimating Standard Errors of the OLS
Estimators
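The formulas on this slide were images; the standard expressions in the multiple regression case are:

```latex
\hat\sigma^2 = \frac{SSR}{n - k - 1}, \qquad
\operatorname{se}(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{SST_j\,(1 - R_j^2)}}.
```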
41
EFFICIENCY OF OLS THE GAUSS-MARKOV THEOREM
  • BLUE: Best Linear Unbiased Estimator.
  • Best: smallest variance.
  • Linear: a linear function of the data on the dependent variable.
  • Unbiased: E(β̂j) = βj.

42
Classical Linear Model Assumptions: Inference
43
References
  • Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chap. 2-3.