Qualitative and Limited Dependent Variable Models

ECON 6002 Econometrics, Memorial University of Newfoundland. Adapted from Vera Tabakova's notes.

1
Chapter 16
ECON 6002 Econometrics, Memorial University of Newfoundland
  • Qualitative and Limited Dependent Variable Models

Adapted from Vera Tabakova's notes
2
Chapter 16: Qualitative and Limited Dependent Variable Models
  • 16.1 Models with Binary Dependent Variables
  • 16.2 The Logit Model for Binary Choice
  • 16.3 Multinomial Logit
  • 16.4 Conditional Logit
  • 16.5 Ordered Choice Models
  • 16.6 Models for Count Data
  • 16.7 Limited Dependent Variables

3
16.6 Models for Count Data
  • When the dependent variable in a regression
    model is a count of the number of occurrences of
    an event, the outcome variable is y = 0, 1, 2, 3, ...
    These numbers are actual counts, and thus
    different from the ordinal numbers of the
    previous section. Examples include:
  • The number of trips to a physician a person makes
    during a year.
  • The number of fishing trips taken by a person
    during the previous year.
  • The number of children in a household.
  • The number of automobile accidents at a
    particular intersection during a month.
  • The number of televisions in a household.
  • The number of alcoholic drinks a college student
    takes in a week.

4
16.6 Models for Count Data
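  • If Y is a Poisson random variable, then its
    probability function is
    $$f(y) = P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots \qquad (16.27)$$
  • The parameter $\lambda$ is the rate: it is the mean of Y
    and is also equal to the variance. In a regression
    we let it depend on x:
    $$E(Y) = \lambda = \exp(\beta_1 + \beta_2 x) \qquad (16.28)$$
  • This choice defines the Poisson regression model
    for count data.
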
5
16.6.1 Maximum Likelihood Estimation
Suppose we observe 3 individuals: one faces one event,
the other two face two events each.
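The likelihood for these three observations is the product P(Y = 1) · P(Y = 2) · P(Y = 2). As a minimal sketch, assuming no regressors (so a single rate lambda is estimated) and using illustrative function names that are not from the slides, the log-likelihood can be evaluated directly:

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def log_likelihood(lam, counts):
    """Sum of log P(Y = y_i) over independent observations."""
    return sum(math.log(poisson_pmf(y, lam)) for y in counts)

counts = [1, 2, 2]                   # the three individuals on the slide
lam_hat = sum(counts) / len(counts)  # without regressors the MLE is the sample mean

# Compare the log-likelihood at the MLE with two other candidate rates.
for lam in (1.0, lam_hat, 2.5):
    print(f"lambda = {lam:.3f}  logL = {log_likelihood(lam, counts):.4f}")
```

With regressors, lambda is replaced by exp(beta1 + beta2 * x_i) for each individual and the same sum is maximised numerically over the coefficients.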

6
16.6.2 Interpretation in the Poisson Regression Model

With the estimated coefficients you can now calculate
the predicted probability of a certain number y of
events for a given x.
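A short sketch of that calculation follows. The coefficient values and the value of x are hypothetical, chosen only for illustration; the rate is exp(b1 + b2 * x) and the probability comes from the Poisson probability function.

```python
import math

def predicted_prob(y, x, b1, b2):
    """P(Y = y | x) in a Poisson regression with rate exp(b1 + b2 * x)."""
    lam = math.exp(b1 + b2 * x)
    return math.exp(-lam) * lam**y / math.factorial(y)

b1, b2 = 0.2, 0.1   # hypothetical estimates, not taken from the slides
x0 = 3.0            # hypothetical regressor value

# Predicted probabilities of observing 0, 1, 2, 3 or 4 events at x0.
probs = [predicted_prob(y, x0, b1, b2) for y in range(5)]
for y, p in enumerate(probs):
    print(f"P(Y = {y} | x = {x0}) = {p:.4f}")
```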
7
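16.6.2 Interpretation in the Poisson Regression Model

The marginal effect of a continuous regressor on the
expected count, with $\lambda_i = E(y_i) = \exp(\beta_1 + \beta_2 x_i)$, is
$$\frac{\partial E(y_i)}{\partial x_i} = \beta_2 \lambda_i \qquad (16.29)$$
You may prefer to express this marginal effect as a
percentage: a one-unit change in $x_i$ changes the
expected count by approximately $100\beta_2\%$.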
8
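16.6.2 Interpretation in the Poisson Regression Model

If there is a dummy variable D involved, be careful.
With $E(y_i) = \exp(\beta_1 + \beta_2 x_i + \delta D_i)$, remember that the
percentage change in the expected count when D
switches from 0 to 1 is
$$100\left(e^{\delta} - 1\right)\%$$
which is identical to the effect of a dummy in the
log-linear model we saw under OLS.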
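The two effects discussed in this section can be checked numerically. This is a sketch with hypothetical coefficients (b1, b2 and delta are made-up values), assuming the conditional mean exp(b1 + b2 * x + delta * D):

```python
import math

def expected_count(x, d, b1, b2, delta):
    """E(y | x, D) = exp(b1 + b2 * x + delta * D)."""
    return math.exp(b1 + b2 * x + delta * d)

b1, b2, delta = 0.2, 0.1, 0.3   # hypothetical values for illustration
x0 = 3.0

lam0 = expected_count(x0, 0, b1, b2, delta)   # D = 0
lam1 = expected_count(x0, 1, b1, b2, delta)   # D = 1

marginal_effect = b2 * lam0                   # dE(y)/dx evaluated at x0, D = 0
pct_per_unit_x = 100 * b2                     # approximate % change per unit of x

dummy_pct_exact = 100 * (lam1 - lam0) / lam0  # computed from the two means
dummy_pct_formula = 100 * (math.exp(delta) - 1)
dummy_pct_naive = 100 * delta                 # what reading delta directly would give

print(f"marginal effect of x: {marginal_effect:.4f}")
print(f"dummy effect: {dummy_pct_formula:.2f}% (naive reading: {dummy_pct_naive:.2f}%)")
```

The gap between the exact and the naive figure grows with the size of delta, which is why the exponential formula matters for dummies.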
9
Extensions: overdispersion
  • Under a plain Poisson the variance of the count is
    assumed to be equal to its mean (equidispersion)
  • This will often not hold
  • Real-life data are often overdispersed: the
    variance exceeds the mean
  • For example:
  • a few women will have many affairs and many
    women will have few
  • a few travelers will make many trips to a park
    and many will make few
  • etc.
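A quick informal check for overdispersion is to compare the sample variance of the count with its sample mean. The trip counts below are invented for illustration (many small values and a few very large ones); this is a descriptive sketch, not a formal test.

```python
from statistics import mean, variance

# Hypothetical trip counts: most visitors make few trips, a few make many.
trips = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 12, 20]

m = mean(trips)
v = variance(trips)        # sample variance (n - 1 in the denominator)
dispersion_ratio = v / m   # close to 1 under equidispersion

print(f"mean = {m:.3f}, variance = {v:.3f}, variance/mean = {dispersion_ratio:.2f}")
```

A ratio well above 1 suggests that the equidispersion assumption of the plain Poisson is doubtful, although the comparison ignores the effect of regressors.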

Slide16-9
Principles of Econometrics, 3rd Edition
10
Extensions: overdispersion
use "C:\bbbECONOMETRICS\Rober\GRAD\GROSMORNE.dta", clear

11
Extensions: negative binomial

  • Under a plain Poisson the variance of the count is
    assumed to be equal to its mean (equidispersion)
  • If the data are overdispersed, the Poisson will
    inflate your t-ratios, making you think that your
    model works better than it actually does
  • You can use a Negative Binomial model instead
    (nbreg) or even a Generalised Negative Binomial
    (gnbreg), which will allow you to model the
    overdispersion parameter as a function of
    covariates of your choice
  • You can also test for overdispersion, to check
    whether the problem is significant
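To see how a negative binomial relaxes equidispersion, the sketch below uses one common parameterisation (often called NB2), in which the mean is mu and the variance is mu + alpha * mu^2. The values of mu and alpha are hypothetical, and the choice of this parameterisation is an assumption made for illustration.

```python
import math

def nb2_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and overdispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, alpha = 1.5, 0.8   # hypothetical mean and overdispersion parameter
ys = range(400)        # the upper tail beyond 400 is negligible here

total = sum(nb2_pmf(y, mu, alpha) for y in ys)
mean_nb = sum(y * nb2_pmf(y, mu, alpha) for y in ys)
var_nb = sum((y - mean_nb) ** 2 * nb2_pmf(y, mu, alpha) for y in ys)

print(f"mean = {mean_nb:.4f}, variance = {var_nb:.4f}")
```

As alpha approaches zero the variance approaches the mean and the model collapses towards the Poisson, which is what a test for overdispersion examines.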
12
Extensions: negative binomial

. sum visits

    Variable |   Obs       Mean   Std. Dev.   Min   Max
    ---------+------------------------------------------
      visits |   966   1.416149   1.718147      1    26
13
Extensions: negative binomial

14
Extensions: excess zeros

  • Often the number of zeros in the sample cannot
    be accommodated properly by a Poisson or Negative
    Binomial model: both would underpredict the zeros
  • There is said to be an "excess zeros" problem
  • You can then use hurdle models or zero-inflated
    or zero-augmented models to accommodate the
    extra zeros
15
Extensions: excess zeros

  • Often the number of zeros in the sample cannot
    be accommodated properly by a Poisson or Negative
    Binomial model: both would underpredict the zeros
  • nbvargr is a very useful command for checking this
16
Extensions: excess zeros
  • You can then use hurdle models or zero-inflated
    or zero-augmented models to accommodate the
    extra zeros
  • They will also allow you to have a different
    process driving the value of the strictly
    positive count and whether the value is zero or
    strictly positive
  • EXAMPLES
  • Number of extramarital affairs versus gender
  • Number of children before marriage versus
    religiosity
  • In the continuous case, we have similar models
    (e.g. Cragg's Model); an example is the size of
    insurance claims from fires versus the age of
    the building

17
Extensions: excess zeros

You can then use hurdle models or zero-inflated
or zero-augmented models to accommodate the
extra zeros.

Hurdle Models: A hurdle model is a modified count
model in which there are two processes, one
generating the zeros and one generating the
positive values. The two models are not
constrained to be the same. In the hurdle model a
binomial probability model governs the binary
outcome of whether a count variable has a zero or
a positive value. If the value is positive, the
"hurdle is crossed," and the conditional
distribution of the positive values is governed
by a zero-truncated count model.

Example: smokers versus non-smokers. If you are a
smoker you will smoke!
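The two-part structure just described can be written as a single probability function. This is a minimal sketch, assuming a Poisson for the positive counts and taking the zero probability p_zero as given (in practice it would come from a binary model such as a logit); all parameter values are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def hurdle_pmf(y, p_zero, lam):
    """Hurdle model: a binary part decides zero versus positive,
    a zero-truncated Poisson governs the positive counts."""
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

p_zero, lam = 0.6, 2.0   # hypothetical: 60% non-smokers, rate 2 for smokers

for y in range(5):
    print(f"P(Y = {y}) = {hurdle_pmf(y, p_zero, lam):.4f}")
```

In this model every zero comes from the binary part, which matches the smoker example: once the hurdle is crossed the count is strictly positive.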
18
Extensions: excess zeros

Hurdle Models: In Stata, Joseph Hilbe's
downloadable ado HPLOGIT will work, although it
does not allow for two different sets of
variables, just two different sets of
coefficients.

Example: smokers versus non-smokers. If you are a
smoker you will smoke!
19
Extensions: excess zeros

You can then use hurdle models or zero-inflated
or zero-augmented models to accommodate the
extra zeros.

Zero-inflated models (initially suggested by
D. Lambert) attempt to account for excess zeros
in a subtly different way. In this model there
are two kinds of zeros, "true zeros" and excess
zeros. Zero-inflated models also estimate two
equations, one for the count model and one for
the excess zeros. The key difference is that the
count model allows zeros now. It is not a
truncated count model, but allows for corner
solutions.

Example: meat eaters (who sometimes just did not
eat meat that week) versus vegetarians, who never
ever do.
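The contrast with the hurdle model shows up in the probability of a zero. A minimal sketch of a zero-inflated Poisson, assuming an inflation probability pi for the "always zero" group and a plain Poisson for everyone else (both values below are hypothetical):

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: with probability pi the outcome is always zero
    (vegetarians); otherwise it is a plain Poisson that can also give zero."""
    if y == 0:
        return pi + (1.0 - pi) * math.exp(-lam)
    return (1.0 - pi) * poisson_pmf(y, lam)

pi, lam = 0.3, 2.0   # hypothetical share of vegetarians and meat-eating rate

excess_zeros = pi                          # zeros from the always-zero group
count_zeros = (1.0 - pi) * math.exp(-lam)  # zeros produced by the count process
print(f"P(Y = 0) = {zip_pmf(0, pi, lam):.4f} "
      f"(excess {excess_zeros:.4f} + count-process {count_zeros:.4f})")
```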
20
Extensions: excess zeros

webuse fish

We want to model how many fish are being caught
by fishermen at a state park. Visitors are asked
how long they stayed, how many people were in the
group, whether there were children in the group,
and how many fish were caught. Some visitors do
not fish at all, but there is no data on whether
a person fished or not. Some visitors who did
fish did not catch any fish (and admitted it), so
there are excess zeros in the data because of the
people who did not fish.
21
Extensions: excess zeros

. histogram count, discrete freq
Lots of zeros!
22
Extensions: excess zeros

Vuong test
24
Extensions: truncation
  • Count data can be truncated too (usually at
    zero)
  • So ztp and ztnb can accommodate that
  • Example: you interview visitors at the
    recreational site, so they all made at least that
    one trip
  • In the continuous case we would have to use the
    truncreg command
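Truncation at zero rescales the count distribution so that the probabilities of 1, 2, 3, ... sum to one. The sketch below does this for a Poisson with a hypothetical rate; it illustrates the kind of density a zero-truncated Poisson estimator works with, not the Stata implementation itself.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def ztp_pmf(y, lam):
    """P(Y = y | Y > 0): zero counts can never appear in the sample."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

lam = 1.2   # hypothetical rate of the untruncated process

ztp_mean = sum(y * ztp_pmf(y, lam) for y in range(1, 100))
print(f"untruncated mean = {lam:.3f}, mean among observed visitors = {ztp_mean:.3f}")
```

The observed mean exceeds the underlying rate, which is why ignoring the truncation overstates how often people visit.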

25
Extensions: truncation
This model works much better and showcases the
bias in the previous estimates

The estimated Consumer Surplus is now smaller
26
Extensions: truncation
This model works much better and showcases the
bias in the previous estimates
  • Now accounting for overdispersion

27
Extensions: truncation and endogenous stratification
  • Example: you interview visitors at the
    recreational site, so they all made at least that
    one trip
  • You interview patients at the doctor's office
    about how often they visit the doctor
  • You ask people on George St. how often they go to
    George St.
  • Then you are oversampling frequent visitors and
    biasing your estimates, perhaps substantially

28
Extensions: truncation and endogenous stratification
  • Then you are oversampling frequent visitors and
    biasing your estimates, perhaps substantially
  • It turns out to be super easy to deal with a
    Truncated and Endogenously Stratified Poisson
    Model (as shown by Shaw, 1988)
  • Simply run a plain Poisson on Count - 1 and that
    will work (in Stata: poisson on the corrected
    count)
  • It is more complex if there is overdispersion,
    though
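Why subtracting one works can be verified numerically. The sketch assumes that the chance of being intercepted on site is proportional to the number of trips y, so the on-site density is y * f(y) / lambda; the rate used is hypothetical. Under that assumption the on-site density of y coincides with a plain Poisson density evaluated at y - 1.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def onsite_pmf(y, lam):
    """Density of trips among people sampled on site: truncated at zero and
    weighted by y because frequent visitors are more likely to be met."""
    return y * poisson_pmf(y, lam) / lam

lam = 1.7   # hypothetical population rate

for y in range(1, 6):
    print(f"y = {y}: on-site {onsite_pmf(y, lam):.5f}   "
          f"Poisson at y - 1: {poisson_pmf(y - 1, lam):.5f}")

onsite_mean = sum(y * onsite_pmf(y, lam) for y in range(1, 100))
```

The on-site mean is lambda + 1, so an uncorrected Poisson on the raw counts would overstate the visitation rate.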

29
Extensions: truncation and endogenous stratification
  • Super easy to deal with a Truncated and
    Endogenously Stratified Poisson Model

The estimated Consumer Surplus is now much smaller
30
Extensions: truncation and endogenous stratification
  • Endogenously Stratified Negative Binomial Model
    (Shaw, 1988; Englin and Shonkwiler, 1995)

Even after accounting for overdispersion, the CS
estimate is relatively low
31
Extensions: truncation and endogenous stratification
  • How do we calculate the pseudo-R2 for this
    model?

32
Extensions: truncation and endogenous stratification
  • GNBSTRAT will also allow you to model the
    overdispersion parameter in this case, just as
    gnbreg did for the plain case

33
NOTE: what is the exposure?
  • Count models often need to deal with the fact
    that the counts may be measured over different
    observation periods, which might be of different
    length (in terms of time or some other relevant
    dimension)
  • For example, the number of accidents is recorded
    for 50 different intersections. However, the
    number of vehicles that pass through the
    intersections can vary greatly. Five accidents
    for 30,000 vehicles is very different from five
    accidents for 1,500 vehicles.
  • Count models account for these differences by
    including the log of the exposure variable in
    the model with its coefficient constrained to be
    one.
  • The use of exposure is often superior to
    analyzing rates as response variables as such,
    because it makes use of the correct probability
    distributions
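A small sketch of how the exposure enters the conditional mean. The coefficients are hypothetical; the only point is that log(exposure) is added to the index with a coefficient fixed at one, so the expected count scales proportionally with exposure while the rate per vehicle stays the same.

```python
import math

def expected_count(exposure, x, b1, b2):
    """E(y) = exp(log(exposure) + b1 + b2 * x) = exposure * exp(b1 + b2 * x)."""
    return math.exp(math.log(exposure) + b1 + b2 * x)

# The two intersections from the slide: same count, very different exposure.
rate_busy = 5 / 30000 * 1000    # accidents per 1,000 vehicles
rate_quiet = 5 / 1500 * 1000
print(f"accidents per 1,000 vehicles: {rate_busy:.3f} versus {rate_quiet:.3f}")

b1, b2, x0 = -8.0, 0.5, 1.0     # hypothetical coefficients and regressor value
mu_busy = expected_count(30000, x0, b1, b2)
mu_quiet = expected_count(1500, x0, b1, b2)
print(f"expected counts: {mu_busy:.3f} versus {mu_quiet:.3f}")
```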

34
16.7 Limited Dependent Variables
  • 16.7.1 Censored Data
  • Figure 16.3 Histogram of Wife's Hours of Work in
    1975

35
16.7.1 Censored Data
  • Having censored data means that a substantial
    fraction of the observations on the dependent
    variable take a limit value. The regression
    function is no longer given by (16.30):
    $$E(y \mid x) = \beta_1 + \beta_2 x \qquad (16.30)$$
  • The least squares estimators of the regression
    parameters obtained by running a regression of y
    on x are biased and inconsistent: least squares
    estimation fails.

37
Censoring versus Truncation
  • With truncation, we only observe the value of the
    regressors when the dependent variable takes a
    certain value (usually a positive one instead of
    zero)
  • With censoring we observe in principle the value
    of the regressors for everyone, but not the value
    of the dependent variable for those whose
    dependent variable takes a value beyond the limit

38
16.7.2 A Monte Carlo Experiment
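  • We give the parameters of the latent regression
    specific values:
    $$y_i^* = \beta_1 + \beta_2 x_i + e_i \qquad (16.31)$$
  • Assume the errors $e_i$ are normally distributed
    with mean 0 and variance 16
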
39
16.7.2 A Monte Carlo Experiment
  • Create N = 200 random values of $x_i$ that are
    spread evenly (or uniformly) over the interval
    [0, 20]. These we will keep fixed in further
    simulations.
  • Obtain N = 200 random values $e_i$ from a normal
    distribution with mean 0 and variance 16.
  • Create N = 200 values of the latent variable $y_i^*$.
  • Obtain N = 200 values of the observed $y_i$ using
    $y_i = 0$ if $y_i^* \le 0$ and $y_i = y_i^*$ if $y_i^* > 0$.
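The four steps above can be sketched in a few lines. The values beta1 = -9 and beta2 = 1 are assumptions made for this illustration (the parameter values on the original slide were lost), while N = 200, the interval [0, 20] and the error variance of 16 follow the text. The random seed is arbitrary.

```python
import random

def ols(x, y):
    """Simple least squares: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

rng = random.Random(12345)              # arbitrary seed for reproducibility
N = 200
beta1, beta2, sigma = -9.0, 1.0, 4.0    # beta values are assumed; sigma^2 = 16

x = [rng.uniform(0, 20) for _ in range(N)]
e = [rng.gauss(0, sigma) for _ in range(N)]
y_star = [beta1 + beta2 * xi + ei for xi, ei in zip(x, e)]  # latent variable
y = [max(0.0, ys) for ys in y_star]                          # censored at zero

b_latent = ols(x, y_star)   # infeasible in practice: uses the latent values
b_censored = ols(x, y)      # least squares on the censored sample

print(f"slope using latent y*:  {b_latent[1]:.3f}")
print(f"slope using censored y: {b_censored[1]:.3f}")
```

The slope from the censored sample falls well short of the value used to generate the data, which is the failure of least squares that the following slides illustrate.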

40
16.7.2 A Monte Carlo Experiment
  • Figure 16.4 Uncensored Sample Data and Regression
    Function

41
16.7.2 A Monte Carlo Experiment
  • Figure 16.5 Censored Sample Data, and Latent
    Regression Function and Least Squares Fitted
    Line

42
16.7.2 A Monte Carlo Experiment

(16.32a)
(16.32b)
(16.33)
43
16.7.3 Maximum Likelihood Estimation
  • The maximum likelihood procedure is called Tobit
    in honor of James Tobin, winner of the 1981 Nobel
    Prize in Economics, who first studied this model.
  • The probit probability that $y_i = 0$ is
    $P(y_i = 0) = P(y_i^* \le 0) = 1 - \Phi\left[(\beta_1 + \beta_2 x_i)/\sigma\right]$
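The Tobit likelihood combines this probability for the limit observations with the normal density for the positive ones. The following is a minimal sketch of the log-likelihood, evaluated at fixed parameter values rather than maximised; the four data points and both sets of parameter values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def prob_zero(x, b1, b2, sigma):
    """P(y = 0 | x) = 1 - Phi((b1 + b2 * x) / sigma)."""
    return 1.0 - norm_cdf((b1 + b2 * x) / sigma)

def tobit_loglik(b1, b2, sigma, xs, ys):
    """Censored observations contribute log P(y = 0); positive observations
    contribute the log of the normal density of y."""
    ll = 0.0
    for x, y in zip(xs, ys):
        if y <= 0:
            ll += math.log(prob_zero(x, b1, b2, sigma))
        else:
            ll += math.log(norm_pdf((y - b1 - b2 * x) / sigma) / sigma)
    return ll

xs = [2.0, 8.0, 12.0, 18.0]   # hypothetical data
ys = [0.0, 0.0, 4.5, 8.0]

print(f"logL at (-9, 1, 4): {tobit_loglik(-9.0, 1.0, 4.0, xs, ys):.4f}")
print(f"logL at ( 0, 0, 4): {tobit_loglik(0.0, 0.0, 4.0, xs, ys):.4f}")
```

A maximum likelihood routine searches over b1, b2 and sigma for the values that make this sum as large as possible.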

44
16.7.3 Maximum Likelihood Estimation
  • The maximum likelihood estimator is consistent
    and asymptotically normal, with a known
    covariance matrix.
  • Using the artificial data the fitted values are

(16.34)
45
16.7.3 Maximum Likelihood Estimation

46
16.7.4 Tobit Model Interpretation
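  • The marginal effect of x on the observed y is
    $$\frac{\partial E(y \mid x)}{\partial x} = \beta_2 \Phi\left(\frac{\beta_1 + \beta_2 x}{\sigma}\right) \qquad (16.35)$$
  • Because the cdf values are positive, the sign of
    the coefficient does tell the direction of the
    marginal effect, just not its magnitude. If $\beta_2 > 0$,
    as x increases the cdf function approaches 1,
    and the slope of the regression function
    approaches that of the latent variable model.
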
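The behaviour described here can be traced numerically. The sketch assumes censoring at zero with normal errors, so that the marginal effect is b2 times the normal cdf evaluated at (b1 + b2 * x) / sigma; the parameter values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_y(x, b1, b2, sigma):
    """E(y | x) for a Tobit model censored at zero."""
    z = (b1 + b2 * x) / sigma
    return norm_cdf(z) * (b1 + b2 * x) + sigma * norm_pdf(z)

def tobit_marginal_effect(x, b1, b2, sigma):
    """dE(y | x)/dx = b2 * Phi((b1 + b2 * x) / sigma)."""
    return b2 * norm_cdf((b1 + b2 * x) / sigma)

b1, b2, sigma = -9.0, 1.0, 4.0   # hypothetical values for illustration

# The effect has the sign of b2 and rises towards b2 as x increases.
for x in (0.0, 5.0, 10.0, 15.0, 20.0):
    print(f"x = {x:4.1f}: marginal effect = {tobit_marginal_effect(x, b1, b2, sigma):.4f}")
```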
47
16.7.4 Tobit Model Interpretation
  • Figure 16.6 Censored Sample Data, and Regression
    Functions for Observed and Positive y values

48
16.7.5 An Example

(16.36)
49
16.7.5 An Example

50
16.7.6 Sample Selection
  • Problem: our sample is not a random sample. The
    data we observe are selected by a systematic
    process for which we do not account.
  • Solution: a technique called Heckit, named after
    its developer, Nobel Prize winning econometrician
    James Heckman.

51
16.7.6a The Econometric Model
  • The econometric model describing the situation is
    composed of two equations. The first is the
    selection equation that determines whether the
    variable of interest is observed.

(16.37)
(16.38)
52
16.7.6a The Econometric Model
  • The second equation is the linear model of
    interest. It is

(16.39)
(16.40)
(16.41)
53
16.7.6a The Econometric Model
  • The estimated Inverse Mills Ratio is
  • The estimating equation is

(16.42)
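The inverse Mills ratio is the standard normal pdf divided by the standard normal cdf, both evaluated at the fitted index of the first-step probit. A minimal sketch follows; the names g1, g2 and w for the probit estimates and the selection regressor are assumed notation, and their values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def inverse_mills(index):
    """Inverse Mills ratio: pdf(index) / cdf(index)."""
    return norm_pdf(index) / norm_cdf(index)

g1, g2 = -0.5, 0.25              # hypothetical first-step probit estimates
w_values = [0.0, 2.0, 4.0, 8.0]  # hypothetical selection regressor values

# One ratio per selected observation; it is then added as an extra
# regressor in the second-step least squares equation.
imr = [inverse_mills(g1 + g2 * w) for w in w_values]
for w, lam in zip(w_values, imr):
    print(f"w = {w:3.1f}: inverse Mills ratio = {lam:.4f}")
```

The ratio is largest for observations that were unlikely to be selected, which is where the selection correction matters most.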
54
16.7.6b Heckit Example: Wages of Married Women
(16.43)
55
16.7.6b Heckit Example: Wages of Married Women
  • The maximum likelihood estimated wage equation is
  • The standard errors based on the full
    information maximum likelihood procedure are
    smaller than those yielded by the two-step
    estimation method.

(16.44)
56
Keywords
  • binary choice models
  • censored data
  • conditional logit
  • count data models
  • feasible generalized least squares
  • Heckit
  • identification problem
  • independence of irrelevant alternatives (IIA)
  • index models
  • individual and alternative specific variables
  • individual specific variables
  • latent variables
  • likelihood function
  • limited dependent variables
  • linear probability model
  • logistic random variable
  • logit
  • log-likelihood function
  • marginal effect

57
Further models
  • Survival analysis (time-to-event data analysis)
  • Multivariate probit (biprobit, triprobit,
    mvprobit)

58
References
  • Hoffmann, 2004, for all topics
  • Long, S. and J. Freese, for all topics
  • Cameron and Trivedi's book, for count data
  • Agresti, A. (2001) Categorical Data Analysis (2nd
    ed). New York: Wiley.