Structure of the class

Transcript and Presenter's Notes

1
Structure of the class
  1. The linear probability model
  2. Maximum likelihood estimations
  3. Binary logit models and some other models
  4. Multinomial models

2
The Linear Probability Model
3
The linear probability model
  • When the dependent variable is binary (0/1; for
    example, Y = 1 if the firm innovates, 0 otherwise),
    OLS is called the linear probability model.
  • How should one interpret βj? Provided that
    E(u|X) = 0 holds true, then
  • βj measures the variation of the probability of
    success for a one-unit variation of Xj (ΔX = 1)
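As a minimal sketch, the interpretation of β and the LPM's "fallacious prediction" limit can be illustrated with a tiny OLS fit; the sample below is invented for illustration:

```python
# Linear probability model: plain OLS on a binary y (invented toy data).
x = [0.0, 1.0, 2.0, 3.0]   # e.g. R&D intensity (hypothetical)
y = [0.0, 0.0, 1.0, 1.0]   # 1 = the firm innovates, 0 = otherwise

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# OLS slope and intercept from the usual closed-form formulas
beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
       sum((xi - mx) ** 2 for xi in x)
alpha = my - beta * mx

def predict(xnew):
    """Fitted P(y = 1 | x): beta is the change in probability per unit of x."""
    return alpha + beta * xnew

print(beta)          # marginal effect on the probability of success
print(predict(3.0))  # can exceed 1 -- the 'fallacious prediction' limit
```

Here the fitted line gives predict(3.0) > 1, a probability outside [0, 1], which is exactly the third limit listed below.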

4
Limits of the linear probability model
  1. Non-normality of errors
  2. Heteroskedastic errors
  3. Fallacious predictions

5
Overcoming the limits of the LPM
  • Non-normality of errors
  • Increase sample size
  • Heteroskedastic errors
  • Use robust estimators
  • Fallacious predictions
  • Perform non-linear or constrained regressions

6
Persistent use of LPM
  • Although it has limits, the LPM is still used
  • In the process of data exploration (early stages
    of the research)
  • It is a good indicator of the marginal effect of
    the representative observation (at the mean)
  • When dealing with very large samples, least
    squares can overcome the complications imposed by
    maximum likelihood techniques.
  • Time of computation
  • Endogeneity and panel data problems

7
The LOGIT/PROBIT Model
8
Probability, odds and logit/probit
  • We need to explain the occurrence of an event:
    the LHS variable takes two values, y ∈ {0, 1}.
  • In fact, we need to explain the probability of
    occurrence of the event, conditional on X:
    P(Y = y | X) ∈ [0, 1].
  • OLS estimations are not adequate, because
    predictions can lie outside the interval [0, 1].
  • We need to transform a real number, say
    z ∈ ]−∞, +∞[, into P(Y = y | X) ∈ [0, 1].
  • The logit/probit transformation links a real
    number z ∈ ]−∞, +∞[ to P(Y = y | X) ∈ [0, 1]. It is
    also called the link function.
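A quick sketch of the logistic link function, which maps any real z into a probability:

```python
import math

def logit_link(z):
    """Logistic link: maps any real z into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Unlike the linear probability model, the output is always inside (0, 1).
for z in (-5.0, 0.0, 5.0):
    print(z, logit_link(z))
```

The function is monotone and bounded, so predicted probabilities can never escape the unit interval.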

9
Binary Response Models Logit - Probit
  • Link function approach

10
Maximum likelihood estimations
  • OLS cannot be of much help here. We will use Maximum
    Likelihood Estimation (MLE) instead.
  • MLE is an alternative to OLS. It consists of
    finding the parameter values which are the most
    consistent with the data we have.
  • The likelihood is defined as the joint
    probability of observing a given sample, given the
    parameters involved in the generating function.
  • One way to distinguish between OLS and MLE is as
    follows:

OLS adapts the model to the data you have: you
only have one model, derived from your data. MLE
instead supposes there is an infinity of models,
and chooses the model most likely to explain your
data.
11
Likelihood functions
  • Let us assume that you have a sample of n random
    observations. Let f(yi) be the probability that
    yi = 1 or yi = 0. The joint probability of
    observing n values of yi is given by the
    likelihood function
  • Logit likelihood

12
Likelihood functions
  • Knowing p (as the logit), having defined f(.), we
    come up with the likelihood function

13
Log likelihood (LL) functions
  • The log transform of the likelihood function (the
    log likelihood) is much easier to manipulate, and
    is written
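The logit log likelihood can be sketched directly from its definition, summing y·ln(p) + (1−y)·ln(1−p) over the sample; the data below are invented:

```python
import math

def p(z):
    """Logistic link."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys):
    """Logit log likelihood: sum over i of y*ln(p_i) + (1-y)*ln(1-p_i)."""
    return sum(yi * math.log(p(beta * xi)) +
               (1 - yi) * math.log(1 - p(beta * xi))
               for xi, yi in zip(xs, ys))

# Invented sample: y tends to be 1 when x is high.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]
print(log_likelihood(1.0, xs, ys))   # closer to 0: a more likely parameter
print(log_likelihood(-1.0, xs, ys))  # far more negative: a less likely one
```

Since each term is the log of a probability, the log likelihood is always negative, and "better fit" means "closer to zero", as the goodness-of-fit slides below stress.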

14
Maximum likelihood estimations
  • The LL function can yield an infinity of values
    for the parameters β.
  • Given the functional form of f(.) and the n
    observations at hand, which values of the parameters
    β maximize the likelihood of my sample?
  • In other words, what are the most likely values
    of my unknown parameters β given the sample I
    have?

15
Maximum likelihood estimations
The LL is globally concave and has a maximum. The
gradient is used to compute the parameters of
interest, and the hessian is used to compute the
variance-covariance matrix.
However, there is no analytical solution to
this non-linear problem. Instead, we rely on an
optimization algorithm (Newton-Raphson).
You need to imagine that the computer is going to
generate all possible values of β, and is going
to compute a likelihood value for each (vector of)
values, to then choose the (vector of) β such
that the likelihood is highest.
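The gradient/hessian machinery above can be sketched for a one-parameter logit (no constant, invented data), where the Newton-Raphson step has a simple closed form:

```python
import math

def p(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented sample; a single slope parameter keeps the sketch short.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0, 1.0]

beta = 0.0
for _ in range(25):
    # gradient of the logit LL: sum of (y_i - p_i) * x_i
    grad = sum((yi - p(beta * xi)) * xi for xi, yi in zip(xs, ys))
    # hessian: -sum of p_i * (1 - p_i) * x_i^2  (always negative: concave LL)
    hess = -sum(p(beta * xi) * (1 - p(beta * xi)) * xi ** 2 for xi in xs)
    beta -= grad / hess            # Newton-Raphson step toward the maximum

variance = -1.0 / hess             # variance from the hessian, as on the slide
print(beta, variance)
```

At convergence the gradient is (numerically) zero, which is how the algorithm knows it has reached the maximum of the concave log likelihood.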
16
Binary Dependent Variable Research questions
  • We want to explore the factors affecting the
    probability of being a successful innovator
    (inno = 1). Why?

17
Logistic Regression with STATA
→ Stata instruction: logit
logit y x1 x2 x3 … xk [if] [weight] [, options]
  • Options
  • noconstant: estimates the model without the
    constant
  • robust: estimates robust variances, also in case
    of heteroskedasticity
  • if: allows one to select the observations to
    include in the analysis
  • weight: allows one to weight different
    observations

18
Interpretation of Coefficients
  • A positive coefficient indicates that the
    probability of innovation success increases with
    the corresponding explanatory variable.
  • A negative coefficient implies that the
    probability to innovate decreases with the
    corresponding explanatory variable.
  • Warning! One of the problems encountered in
    interpreting probabilities is their
    non-linearity: the probabilities do not vary in
    the same way according to the level of the regressors.
  • This is the reason why it is usual in practice
    to calculate the probability of the event
    occurring at the average point of the sample.

19
Interpretation of Coefficients
  • Let's run the more complete model
  • logit inno lrdi lassets spe biotech

20
Interpretation of Coefficients
  • Using the sample mean values of rdi, lassets, spe
    and biotech, we compute the conditional
    probability

21
Marginal Effects
  • It is often useful to know the marginal effect of
    a regressor on the probability that the event
    occurs (innovation)
  • As the probability is a non-linear function of the
    explanatory variables, the change in probability
    due to a change in one of the explanatory
    variables is not identical if the other variables
    are at the average, median or first quartile,
    etc. level.
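For the logit, the marginal effect has the closed form dP/dx = p(1−p)·β, so it depends on where it is evaluated. A sketch with hypothetical coefficient values (not the slides' actual estimates):

```python
import math

def p(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit estimates, for illustration only.
beta0, beta1 = -1.0, 0.8
x_mean = 1.5   # assumed sample mean of the regressor

def marginal_effect(x):
    """dP/dx = p(1-p) * beta1 for the logit: varies with the evaluation point."""
    z = beta0 + beta1 * x
    return p(z) * (1 - p(z)) * beta1

print(marginal_effect(x_mean))  # evaluated at the sample mean
print(marginal_effect(4.0))     # smaller far from p = 0.5: non-linearity
```

Because p(1−p) peaks at p = 0.5, the same one-unit change in x moves the probability much less for observations whose predicted probability is already near 0 or 1.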

22
Goodness of Fit Measures
  • In ML estimations, there is no such measure as
    the R²
  • But the log likelihood measure can be used to
    assess the goodness of fit. Note the
    following:
  • The higher the number of observations, the lower
    the joint probability, and the more the LL measure
    goes towards −∞
  • Given the number of observations, the better the
    fit, the higher the LL measure (since it is
    always negative, the closer to zero it is)
  • The philosophy is to compare two models by looking
    at their LL values. One is meant to be the
    constrained model, the other one is the
    unconstrained model.

23
Goodness of Fit Measures
  • A model is said to be constrained when the
    observer sets the parameters associated with some
    variable to zero.
  • A model is said to be unconstrained when the
    observer releases this assumption and allows the
    parameters associated with some variable to be
    different from zero.
  • For example, we can compare two models, one with
    no explanatory variables, one with all our
    explanatory variables. The one with no
    explanatory variables implicitly assumes that all
    parameters are equal to zero. Hence it is the
    constrained model, because we (implicitly)
    constrain the parameters to be nil.

24
The likelihood ratio test (LR test)
  • The most used measure of goodness of fit in ML
    estimations is the likelihood ratio. The
    likelihood ratio is the difference between the
    LL of the unconstrained model and that of the
    constrained model. This difference is distributed χ².
  • If the difference in the LL values is (not)
    important, it is because the set of explanatory
    variables brings in (in)significant information.
    The null hypothesis H0 is that the model brings
    no significant information, as follows
  • High LR values will lead the observer to reject
    hypothesis H0 and accept the alternative
    hypothesis Ha that the set of explanatory
    variables does significantly explain the outcome.
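The mechanics of the LR test are simple enough to sketch; the two log likelihood values below are invented for illustration:

```python
# Sketch of the likelihood ratio test (both LL values are invented).
ll_constrained = -250.0    # intercept-only model
ll_unconstrained = -220.0  # model with k = 4 explanatory variables

lr = 2 * (ll_unconstrained - ll_constrained)  # ~ chi2(4) under H0
chi2_crit = 9.488                             # tabulated 5% critical value, 4 df

print(lr, lr > chi2_crit)   # LR well above the critical value: reject H0
```

Here LR = 60 far exceeds the 5% χ² critical value with 4 degrees of freedom, so the explanatory variables jointly bring significant information.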

25
The McFadden Pseudo R2
  • We also use the McFadden (1973) pseudo R². Its
    interpretation is analogous to the OLS R².
    However, it is biased downward and remains generally
    low.
  • The pseudo R² also compares the unconstrained model
    with the constrained model, and is comprised
    between 0 and 1.
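The McFadden pseudo R² is one line of arithmetic on the same two (invented) log likelihood values used above:

```python
# McFadden pseudo R2 = 1 - LL(unconstrained) / LL(constrained); invented LLs.
ll_unconstrained = -220.0
ll_constrained = -250.0

pseudo_r2 = 1 - ll_unconstrained / ll_constrained
print(pseudo_r2)   # lies between 0 and 1, and is typically low
```

Note that even a clearly significant model (LR = 60 above) yields a modest pseudo R², consistent with the slide's warning that the measure is biased downward.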

26
Goodness of Fit Measures
Constrained model
Unconstrained model
27
Other Binary Choice models
  • The Logit model is only one way of modeling
    binary choice models
  • The Probit model is another way of modeling
    binary choice models. It is actually more used
    than logit models and assume a normal
    distribution (not a logistic one) for the z
    values.
  • The complementary log-log models is used where
    the occurrence of the event is very rare, with
    the distribution of z being asymetric.

28
Other Binary Choice models
  • Probit model
  • Complementary log-log model
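The two alternative link functions can be sketched with the standard library alone (the probit link via the error function):

```python
import math

def probit(z):
    """Probit link: standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cloglog(z):
    """Complementary log-log link: P = 1 - exp(-exp(z)); asymmetric."""
    return 1 - math.exp(-math.exp(z))

for z in (-1.0, 0.0, 1.0):
    print(z, probit(z), cloglog(z))
```

The asymmetry is visible at z = 0: the probit (like the logit) gives exactly 0.5, while the cloglog gives 1 − e⁻¹ ≈ 0.63, which is why the cloglog suits rare events.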

29
Likelihood functions and Stata commands
  • Example
  • logit inno rdi lassets spe pharma
  • probit inno rdi lassets spe pharma
  • cloglog inno rdi lassets spe pharma

30
Probability Density Functions
31
Cumulative Distribution Functions
32
Comparison of models
                    OLS       Logit     Probit    C log-log
Ln(RD intensity)    0.110     0.752     0.422     0.354
                    (3.90)    (3.57)    (3.46)    (3.13)
ln(Assets)          0.125     0.997     0.564     0.493
                    (8.58)    (7.29)    (7.53)    (7.19)
Spe                 0.056     0.425     0.224     0.151
                    (1.11)    (1.01)    (0.98)    (0.76)
Biotech Dummy       0.442     3.799     2.120     1.817
                    (7.49)    (6.58)    (6.77)    (6.51)
Constant            -0.843    -11.634   -6.576    -6.086
                    (3.91)    (6.01)    (6.12)    (6.08)
Observations        431       431       431       431
Absolute t values in brackets for OLS, z values for
the other models. Significance levels: 10%, 5%, 1%
33
Comparison of marginal effects
                    OLS       Logit     Probit    C log-log
Ln(RD intensity)    0.110     0.082     0.090     0.098
ln(Assets)          0.125     0.110     0.121     0.136
Specialisation      0.056     0.046     0.047     0.042
Biotech Dummy       0.442     0.368     0.374     0.379
For the logit, probit and cloglog models,
marginal effects have been computed for a
one-unit variation (around the mean) of the
variable at stake, holding all other variables at
their sample mean values.
34
Multinomial LOGIT Models
35
Multinomial models
  • Let us now focus on the case where the dependent
    variable has several outcomes (i.e. is
    multinomial). For example, innovative firms may
    need to collaborate with other organizations. One
    can code this type of interaction as follows:
  • Collaborate with a university (modality 1)
  • Collaborate with large incumbent firms (modality
    2)
  • Collaborate with SMEs (modality 3)
  • Do it alone (modality 4)
  • Or, studying firm survival:
  • Survival (modality 1)
  • Liquidation (modality 2)
  • Merger & acquisition (modality 3)

36
Multinomial Logit - Idea
Multiple alternatives without obvious ordering
→ Choice of a single alternative out of a number
of distinct alternatives,
e.g. which means of transportation do you use to
get to work? Bus, car, bicycle, etc.
→ Example of an ordered structure: how do you feel
today? Very well, fairly well, not too well,
miserably.
37
(No Transcript)
38
Random Utility Model
  • RUM underlies the economic interpretation of discrete
    choice models. Developed by Daniel McFadden for
    econometric applications
  • see JoEL, January 2001, for the Nobel lecture; also
    Manski (2001), "Daniel McFadden and the Econometric
    Analysis of Discrete Choice", Scandinavian Journal
    of Economics, 103(2), 217-229
  • Preferences are functions of biological taste
    templates, experiences, other personal
    characteristics
  • Some of these are observed, others unobserved
  • Allows for taste heterogeneity
  • Discussion below is in terms of individual
    utility (e.g. migration, transport mode choice)
    but similar reasoning applies to firm choices

39
Random Utility Model
  • Individual i's utility from a choice j can be
    decomposed into two components:
  • Vij is deterministic: common to everyone, given
    the same characteristics and constraints
  • representative tastes of the population, e.g.
    effects of time and cost on travel mode choice
  • εij is random
  • reflects idiosyncratic tastes of i and unobserved
    attributes of choice j

40
Random Utility Model
  • Vij is a function of the attributes of alternative j
    (e.g. price and time) and observed consumer and
    choice characteristics.
  • We are interested in finding the parameters
  • Let's forget about z for now, for simplicity

41
RUM and binary choices
  • Consider two choices, e.g. bus or car
  • We observe whether an individual uses one or the
    other
  • Define
  • What is the probability that we observe an
    individual choosing to travel by bus?
  • Assume utility maximisation
  • The individual chooses bus (y = 1) rather than car
    (y = 0) if the utility of commuting by bus exceeds the
    utility of commuting by car

42
RUM and binary choices
  • So choose bus if
  • So the probability that we observe an individual
    choosing bus travel is

43
The linear probability model
  • Assume the probability depends linearly on observed
    characteristics (price and time)
  • Then you can estimate by linear regression
  • Where        is the dummy variable for mode
    choice (1 if bus, 0 if car)
  • Other consumer and choice characteristics can be
    included (the z's in the first slide of this
    section)

44
Probits and logits
  • Common assumptions:
  • Cumulative normal distribution function:
    Probit
  • Logistic function: Logit
  • Estimation by maximum likelihood

45
Multinomial Logit
  • A discrete choice underpinning
  • choice between M alternatives
  • the decision is determined by the utility level Uij
    an individual i derives from choosing alternative
    j
  • Let
  • where i = 1, …, N individuals; j = 0, …, J alternatives

(1)
The alternative providing the highest level of
utility will be chosen.
46
Multinomial Logit
  • The probability that alternative j will be chosen
    is
  • In general, this requires solving
    multidimensional integrals → analytical solutions
    do not exist

47
Multinomial Logit
Exception: if the error terms εij in (1) are assumed
to be independently, identically standard
extreme-value distributed, then an analytical
solution exists. In this case, similarly to the
binary logit, it can be shown that the choice
probabilities are
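Under the extreme-value assumption the choice probabilities take the familiar closed form P(j) = exp(Vj) / Σk exp(Vk), which can be sketched directly (the utility values are invented):

```python
import math

def choice_probs(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum over k of exp(V_k)."""
    expu = [math.exp(v) for v in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Invented deterministic utilities of three alternatives
v = [1.0, 0.5, 0.0]
probs = choice_probs(v)
print(probs, sum(probs))   # probabilities sum to 1; higher V, higher share
```

The probabilities always sum to one, and the alternative with the highest deterministic utility gets the largest choice probability, as the random utility model requires.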
48
Likelihood functions
  • Let us assume that you have a sample of n random
    observations. Let f(yi) be the probability that
    yi = j. The joint probability of observing
    n values of yi is given by the likelihood
    function
  • We need to specify the function f(.). It comes from
    the empirical discrete distribution of an event
    that can have several outcomes. This is the
    multinomial distribution. Hence

49
The maximum likelihood function
  • The maximum likelihood function reads

50
The maximum likelihood function
  • The log transform of the likelihood yields

51
Multinomial logit models
→ Stata instruction: mlogit
mlogit y x1 x2 x3 … xk [if] [weight] [, options]
  • Options: noconstant omits the constant
  • robust controls for heteroskedasticity
  • if selects observations
  • weight weights observations

52
Multinomial logit models
  • use mlogit.dta, clear
  • mlogit type_exit log_time log_labour entry_age
    entry_spin cohort_

Goodness of fit
Base outcome: chosen by Stata as the category with the
highest empirical frequency
53
Interpretation of coefficients
The interpretation of coefficients always refers
to the base category.
Does the probability of being bought out decrease
over time?
No! Relative to survival, the probability of being
bought out decreases over time.
54
Interpretation of coefficients
The interpretation of coefficients always refers
to the base category.
Is the probability of being bought out lower for
spinoffs?
No! Relative to survival, the probability of being
bought out is lower for spinoffs.
55
Multinomial Logit - Coefficients
Marginal Effects
Elasticities
→ relative change of pij if x increases
by 1 per cent
56
Independence of irrelevant alternatives - IIA
  • The model assumes that each pair of outcomes is
    independent of all other alternatives. In other
    words, the other alternatives are irrelevant.
  • From a statistical viewpoint, this is tantamount
    to assuming independence of the error terms
    across pairs of alternatives
  • A simple way to test the IIA property is to
    estimate the model leaving out one modality
    (called the restricted model), and to compare the
    parameters with those of the complete model
  • If IIA holds, the parameters should not change
    significantly
  • If IIA does not hold, the parameters should
    change significantly

57
Multinomial logit and IIA
  • Many applications in economics and geography
    journals (and other research areas)
  • The multinomial logit model is the workhorse of
    multiple-choice modelling in all disciplines.
    Easy to compute
  • But it has a drawback

58
Independence of Irrelevant Alternatives
  • Consider market shares:
  • Red bus: 20%
  • Blue bus: 20%
  • Train: 60%
  • IIA assumes that if the red bus company shuts down,
    the market shares become:
  • Blue bus: 20% + 5% = 25%
  • Train: 60% + 15% = 75%
  • Because the ratio of blue bus trips to train
    trips must stay at 1:3
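The IIA rescaling in the red-bus example is simple arithmetic: dropping an alternative leaves the remaining shares in the same ratio. A sketch:

```python
# IIA sketch with the bus/train market shares from the slide.
shares = {"red_bus": 0.20, "blue_bus": 0.20, "train": 0.60}

# Remove the red bus; under IIA the remaining shares are simply rescaled.
remaining = {k: v for k, v in shares.items() if k != "red_bus"}
total = sum(remaining.values())
rescaled = {k: v / total for k, v in remaining.items()}

print(rescaled)   # blue bus 25%, train 75%: the 1:3 ratio is preserved
```

This mechanical rescaling is exactly what the next slide argues is implausible when the blue bus is a much closer substitute for the red bus than the train is.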

59
Independence of Irrelevant Alternatives
  • The model assumes that unobserved attributes of all
    alternatives are perceived as equally similar
  • But will people unable to travel by red bus
    really switch to travelling by train?
  • The most likely outcome is (assuming the supply of bus
    seats is elastic):
  • Blue bus: 40%
  • Train: 60%
  • This failure of multinomial/conditional logit
    models is called the
  • Independence of Irrelevant Alternatives
    assumption (IIA)

60
Independence of irrelevant alternatives - IIA
  • H0: The IIA property is valid
  • H1: The IIA property is not valid
  • The H statistic (H stands for Hausman) follows a
    χ² distribution with M degrees of freedom (M being
    the number of parameters)

61
Stata application: the IIA test
  • H0: The IIA property is valid
  • H1: The IIA property is not valid

mlogtest, hausman
Omitted variable
62
IIA application
  • H0: The IIA property is valid
  • H1: The IIA property is not valid

mlogtest, hausman
We compare the parameters of the model
"liquidation relative to bought-out", estimated
simultaneously with "survival relative to
bought-out", with the parameters of the model
"liquidation relative to bought-out", estimated
without "survival relative to bought-out"
63
IIA application
  • H0: The IIA property is valid
  • H1: The IIA property is not valid

mlogtest, hausman
The conclusion is that the outcome "survival"
significantly alters the choice between
liquidation and bought-out. In fact, for a
company, being bought out must be seen as a way
to remain active, at the cost of losing control over
economic decisions, notably investment.
64
Multinomial Logit - IIA
  • Cramer-Ridder Test
  • Often you want to know whether certain
    alternatives can be merged into one
  • e.g., do you have to distinguish between
    employment states such as unemployment and
    non-employment?
  • The Cramer-Ridder test checks the null hypothesis that
    the alternatives can be merged. It has the form
    of a LR test:
  • 2(logL_U − logL_R) ~ χ²

65
Multinomial Logit - IIA
  • Derive the log likelihood value of the restricted
    model where two alternatives (here, A and N) have
    been merged

where the first log likelihood is that of the
restricted model, the second is that of the
pooled model, and nA and nN are the
number of times A and N have been chosen
66
Exercise
  • use http://www.stata-press.com/data/r8/sysdsn3
  • tabulate insure
  • mlogit insure age male nonwhite site2 site3