Structure of the class

- The linear probability model
- Maximum likelihood estimations
- Binary logit models and some other models
- Multinomial models

The Linear Probability Model

The linear probability model

- When the dependent variable is binary (0/1, for example, Y = 1 if the firm innovates, 0 otherwise), OLS estimation of the model is called the linear probability model (LPM).

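- How should one interpret βj? Provided that E(u|X) = 0 holds true, then
  P(Y = 1|X) = E(Y|X) = β0 + β1x1 + β2x2 + ... + βkxk
- βj measures the variation of the probability of success for a one-unit variation of xj (Δxj = 1), holding the other regressors constant.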

Limits of the linear probability model

- Non normality of errors
- Heteroskedastic errors
- Fallacious predictions
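The "fallacious predictions" problem can be reproduced on a tiny example. The sketch below fits OLS to a hypothetical 0/1 outcome (toy numbers, not the slides' data) using plain Python, and lists the sample points whose fitted "probability" falls outside [0, 1]. The slope is read exactly as in the LPM: the change in P(y = 1) for a one-unit change in x.

```python
# Hypothetical toy sample (not the slides' data): x is a single regressor,
# y = 1 if the firm innovates and 0 otherwise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def ols_fit(x, y):
    """Closed-form OLS for y = b0 + b1 * x + u with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = ols_fit(xs, ys)
# In the LPM, b1 is read as the change in P(y = 1) for a one-unit change in x.
fitted = [b0 + b1 * x for x in xs]
# Fitted "probabilities" that fall outside [0, 1], even within the sample.
outside = [x for x, p in zip(xs, fitted) if not 0.0 <= p <= 1.0]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, x values with p outside [0, 1]: {outside}")
```

Here the fitted line gives a negative probability at x = 0 and a probability above one at x = 7, although both points are in the sample.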

Overcoming the limits of the LPM

- Non normality of errors
  - Increase sample size
- Heteroskedastic errors
  - Use robust estimators
- Fallacious predictions
  - Perform nonlinear or constrained regressions

Persistent use of LPM

- Although it has limits, the LPM is still used:
  - in the process of data exploration (early stages of the research);
  - because it is a good indicator of the marginal effect for the representative observation (at the mean);
  - when dealing with very large samples, where least squares can overcome the complications imposed by maximum likelihood techniques:
    - computation time;
    - endogeneity and panel data problems.

The LOGIT/PROBIT Model

Probability, odds and logit/probit

- We need to explain the occurrence of an event: the LHS variable takes two values, y ∈ {0, 1}.
- In fact, we need to explain the probability of occurrence of the event, conditional on X: P(Y = y | X) ∈ [0, 1].
- OLS estimations are not adequate, because predictions can lie outside the interval [0, 1].
- We need to transform a real number, say z ∈ (−∞, +∞), into P(Y = y | X) ∈ [0, 1].
- The logit/probit transformation links a real number z ∈ (−∞, +∞) to P(Y = y | X) ∈ [0, 1]. It is also called the link function.

Binary Response Models Logit - Probit

- Link function approach
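As a concrete illustration of the link function approach, the Python sketch below writes the two transformations from z ∈ (−∞, +∞) to a probability in (0, 1): the logistic CDF (logit) and the standard normal CDF (probit), plus the log-odds ln(p / (1 − p)), which inverts the logistic CDF. The function names are illustrative only.

```python
import math


def logistic_cdf(z):
    """Logit: Lambda(z) = exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def normal_cdf(z):
    """Probit: standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_odds(p):
    """Inverse of the logistic CDF: ln(p / (1 - p)), the log of the odds."""
    return math.log(p / (1.0 - p))


for z in (-5.0, -2.0, 0.0, 2.0, 5.0):
    print(f"z = {z:+.1f}   logit P = {logistic_cdf(z):.4f}   probit P = {normal_cdf(z):.4f}")
```

Both curves equal 0.5 at z = 0; the normal CDF has thinner tails, so probit probabilities approach 0 and 1 faster than logit ones.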

Maximum likelihood estimations

- OLS cannot be of much help here. We will use Maximum Likelihood Estimation (MLE) instead.
- MLE is an alternative to OLS. It consists of finding the parameter values that are the most consistent with the data we have.
- The likelihood is defined as the joint probability of observing a given sample, given the parameters involved in the generating function.
- One way to distinguish between OLS and MLE is as follows:

OLS adapts the model to the data you have: you only have one model, derived from your data. MLE instead supposes there is an infinity of models, and chooses the model most likely to explain your data.

Likelihood functions

- Let us assume that you have a sample of n random observations. Let f(y_i) be the probability that y_i = 1 or y_i = 0. The joint probability of observing the n values of y_i is given by the likelihood function.

- Logit likelihood

Likelihood functions

- Knowing p (given by the logit) and having defined f(.), we obtain the likelihood function.
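The likelihood formulas did not survive extraction. A standard way to write them, assuming independent observations and the logistic CDF Λ as the link, is:

```latex
f(y_i) = p_i^{\,y_i}\,(1 - p_i)^{\,1 - y_i}, \qquad y_i \in \{0, 1\}, \qquad p_i = \Lambda(x_i\beta)

L(\beta) = \prod_{i=1}^{n} f(y_i)
         = \prod_{i=1}^{n} \Lambda(x_i\beta)^{\,y_i}\,\bigl[1 - \Lambda(x_i\beta)\bigr]^{\,1 - y_i},
\qquad \Lambda(z) = \frac{e^{z}}{1 + e^{z}}
```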

Log likelihood (LL) functions

- The log transform of the likelihood function (the

log likelihood) is much easier to manipulate, and

is written
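For the logit, taking logs of the likelihood gives the expression below, together with its gradient and Hessian (the two quantities a Newton-Raphson routine uses). This is a sketch in our own notation, with x_i a 1×K row vector of regressors including the constant:

```latex
\ln L(\beta) = \sum_{i=1}^{n} \Bigl[\, y_i \ln \Lambda(x_i\beta) + (1 - y_i) \ln\bigl(1 - \Lambda(x_i\beta)\bigr) \Bigr]

\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} \bigl(y_i - \Lambda(x_i\beta)\bigr)\, x_i'

\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} \Lambda(x_i\beta)\bigl(1 - \Lambda(x_i\beta)\bigr)\, x_i' x_i
```

Because the Hessian is negative definite, the log likelihood is globally concave in β.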

Maximum likelihood estimations

- The LL function can be evaluated at an infinity of values of the parameters β.
- Given the functional form of f(.) and the n observations at hand, which values of the parameters β maximize the likelihood of my sample?
- In other words, what are the most likely values of my unknown parameters β, given the sample I have?

Maximum likelihood estimations

The LL is globally concave and has a maximum. The gradient is used to compute the parameters of interest, and the Hessian is used to compute the variance-covariance matrix.

However, there is no analytical solution to this nonlinear problem. Instead, we rely on an optimization algorithm (Newton-Raphson).

You need to imagine that the computer is going to generate all possible values of β, compute a likelihood value for each (vector of) values, and then choose the (vector of) β such that the likelihood is highest.
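In practice the search is not exhaustive: Newton-Raphson climbs the concave LL directly. Below is a minimal Python sketch for a logit with a constant and one regressor, on a hypothetical toy sample. It iterates β ← β − H⁻¹g using the logit gradient g and Hessian H, and reads the variances off the diagonal of (−H)⁻¹. It only illustrates the idea; it is not how Stata's `logit` is implemented internally.

```python
import math

# Hypothetical toy sample (not the slides' data); y = 1 if the firm innovates.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1):
    """Logit log likelihood: sum of y ln p + (1 - y) ln(1 - p)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


def newton_raphson(tol=1e-10, max_iter=50):
    """Maximize the logit log likelihood for x_i = (1, x); return estimates and variances."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient g = sum (y - p) x_i' and Hessian H = -sum p (1 - p) x_i' x_i.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta - H^{-1} g, with the 2x2 inverse written out.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - step0, b1 - step1
        if abs(step0) + abs(step1) < tol:
            break
    # Variances: diagonal of (-H)^{-1}, using the Hessian from the last (converged) iteration.
    return (b0, b1), (-h11 / det, -h00 / det)


(b0_hat, b1_hat), (var_b0, var_b1) = newton_raphson()
print(f"b0 = {b0_hat:.4f} (se {math.sqrt(var_b0):.4f}), b1 = {b1_hat:.4f} (se {math.sqrt(var_b1):.4f})")
```

At the maximum the gradient is zero; with a constant in the model this implies that the fitted probabilities sum to the number of observed successes.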

Binary Dependent Variable Research questions

- We want to explore the factors affecting the probability of being a successful innovator (inno = 1). Why?

Logistic Regression with STATA

→ Stata command: logit

logit y x1 x2 x3 ... xk [if] [weight] [, options]

- Options and qualifiers
  - noconstant: estimates the model without the constant
  - robust: estimates robust variances, also in case of heteroskedasticity
  - if: selects the observations we want to include in the analysis
  - weight: weights the different observations

Interpretation of Coefficients

- A positive coefficient indicates that the probability of innovation success increases with the corresponding explanatory variable.
- A negative coefficient implies that the probability to innovate decreases with the corresponding explanatory variable.
- Warning! One of the problems encountered in interpreting probabilities is their non-linearity: the probabilities do not vary in the same way at every level of the regressors.
- This is why it is common practice to calculate the probability of the event occurring at the average point of the sample.
Interpretation of Coefficients

- Let's run the more complete model
- logit inno lrdi lassets spe biotech

Interpretation of Coefficients

- Using the sample mean values of rdi, lassets, spe

and biotech, we compute the conditional

probability

Marginal Effects

- It is often useful to know the marginal effect of a regressor on the probability that the event occurs (innovation).
- As the probability is a non-linear function of the explanatory variables, the change in probability due to a change in one of the explanatory variables is not the same when the other variables are at their average, median or first-quartile level, etc.
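For a continuous regressor in the logit, the marginal effect is ∂P/∂x_j = β_j·P·(1 − P), so it depends on where P is evaluated. The Python sketch below uses hypothetical coefficients and hypothetical "mean" and "first quartile" points (none of them from the slides) to show that the same coefficient gives different marginal effects at different evaluation points.

```python
import math

# Hypothetical logit coefficients: constant, then beta_1 and beta_2 (not the slides' estimates).
beta = [-2.0, 0.8, 0.5]


def prob(x):
    """P(y = 1 | x) = Lambda(b0 + b1 x1 + b2 x2) for the logit."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(x, j):
    """dP/dx_j = beta_j * P * (1 - P) for a continuous regressor x_j (j = 0 or 1 here)."""
    p = prob(x)
    return beta[j + 1] * p * (1.0 - p)


at_mean = [1.0, 2.0]  # hypothetical sample means
at_q1 = [0.2, 1.0]    # hypothetical first-quartile values
me_mean = marginal_effect(at_mean, 0)
me_q1 = marginal_effect(at_q1, 0)
print(f"effect of x1 at the mean: {me_mean:.4f}, at the first quartile: {me_q1:.4f}")
```

At a given point, the ratio of marginal effect to coefficient is the same for every continuous regressor (it is P(1 − P)), which is why the sign of β_j alone tells the direction of the effect.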

Goodness of Fit Measures

- In ML estimations, there is no such measure as the R².
- But the log likelihood can be used to assess the goodness of fit. Note the following:
  - The higher the number of observations, the lower the joint probability, and the more the LL measure goes towards −∞.
  - Given the number of observations, the better the fit, the higher the LL (since it is always negative, the closer to zero it is).
- The philosophy is to compare two models by looking at their LL values. One is meant to be the constrained model, the other one the unconstrained model.

Goodness of Fit Measures

- A model is said to be constrained when the observer sets the parameters associated with some variables to zero.
- A model is said to be unconstrained when the observer relaxes this assumption and allows the parameters associated with some variables to be different from zero.
- For example, we can compare two models, one with no explanatory variables and one with all our explanatory variables. The one with no explanatory variables implicitly assumes that all parameters (except the constant) are equal to zero. Hence it is the constrained model, because we (implicitly) constrain the parameters to be nil.

The likelihood ratio test (LR test)

- The most used measure of goodness of fit in ML estimations is the likelihood ratio. The likelihood ratio statistic is twice the difference between the log likelihoods of the unconstrained and the constrained models, LR = 2(LL_U − LL_C). Under H0 it is distributed χ², with degrees of freedom equal to the number of constrained parameters.
- If the difference in the LL values is (not) large, it is because the set of explanatory variables brings in (in)significant information. The null hypothesis H0 is that the model brings no significant information, i.e. that all the constrained parameters are equal to zero.
- High LR values will lead the observer to reject hypothesis H0 and accept the alternative hypothesis Ha that the set of explanatory variables does significantly explain the outcome.
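A small Python sketch of the LR test: it computes LR = 2(LL_U − LL_C) and its χ² p-value. To stay within the standard library, the χ² survival function is written with the closed forms that exist for integer degrees of freedom; the log likelihood values are hypothetical, not taken from the slides.

```python
import math


def chi2_sf(x, df):
    """P(chi2 with df degrees of freedom > x), for integer df >= 1 (closed forms)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=0}^{(df-3)/2} h^(i + 1/2) / Gamma(i + 3/2)
    term, total = math.sqrt(h) / math.gamma(1.5), 0.0
    for i in range((df - 1) // 2):
        total += term
        term *= h / (i + 1.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def lr_test(ll_unconstrained, ll_constrained, n_constraints):
    """LR = 2 (LL_U - LL_C), compared with a chi2 with n_constraints degrees of freedom."""
    lr = 2.0 * (ll_unconstrained - ll_constrained)
    return lr, chi2_sf(lr, n_constraints)


# Hypothetical log likelihoods: constant-only model vs. model with 4 regressors.
lr, p_value = lr_test(-200.0, -290.0, 4)
print(f"LR = {lr:.1f}, p-value = {p_value:.3g}")
```

A p-value this small leads to rejecting H0: the four regressors jointly bring significant information.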

The McFadden Pseudo R2

- We also use the McFadden pseudo-R² (1973). Its interpretation is analogous to the OLS R². However, it is biased downward and generally remains low.
- The pseudo-R² also compares the log likelihoods of the unconstrained model and the constrained model: pseudo-R² = 1 − LL_U / LL_C. It lies between 0 and 1.
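The McFadden pseudo-R² is a one-line computation once both log likelihoods are known. A minimal Python sketch with hypothetical values:

```python
def mcfadden_pseudo_r2(ll_unconstrained, ll_constrained):
    """McFadden pseudo-R2 = 1 - LL_U / LL_C, with LL_C from the constant-only model."""
    return 1.0 - ll_unconstrained / ll_constrained


# Hypothetical log likelihoods (not from the slides).
print(round(mcfadden_pseudo_r2(-200.0, -290.0), 4))  # 0.3103
```

If the regressors add nothing (LL_U = LL_C) the measure is 0; it would approach 1 only as LL_U approaches 0, i.e. a perfect fit.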

Goodness of Fit Measures

Constrained model

Unconstrained model

Other Binary Choice models

- The logit model is only one way of modeling binary choices.
- The probit model is another way of modeling binary choices. It is actually more widely used than the logit model and assumes a normal distribution (not a logistic one) for the z values.
- The complementary log-log model is used where the occurrence of the event is very rare, the distribution of z being asymmetric.

Other Binary Choice models

- Probit model
- Complementary log-log model
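The asymmetry of the complementary log-log model can be checked numerically. The Python sketch below compares F(−z) with 1 − F(z): the two are equal for a symmetric link such as the logit, but not for the cloglog CDF F(z) = 1 − exp(−exp(z)), which equals 1 − 1/e ≈ 0.632 (not 0.5) at z = 0. It illustrates only the shape of the link, not the rare-event use case itself.

```python
import math


def logistic_cdf(z):
    """Logit link: symmetric around z = 0, where P = 0.5."""
    return 1.0 / (1.0 + math.exp(-z))


def cloglog_cdf(z):
    """Complementary log-log: P = 1 - exp(-exp(z)); P = 1 - 1/e (about 0.632) at z = 0."""
    return 1.0 - math.exp(-math.exp(z))


for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    # A symmetric link satisfies F(-z) = 1 - F(z); the gap is zero only for the logit.
    gap_logit = logistic_cdf(-z) - (1.0 - logistic_cdf(z))
    gap_cloglog = cloglog_cdf(-z) - (1.0 - cloglog_cdf(z))
    print(f"z = {z:+.1f}  logit gap = {gap_logit:+.4f}  cloglog gap = {gap_cloglog:+.4f}")
```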

Likelihood functions and Stata commands

- Example
- logit inno rdi lassets spe pharma
- probit inno rdi lassets spe pharma
- cloglog inno rdi lassets spe pharma

Probability Density Functions

Cumulative Distribution Functions

Comparison of models

                    OLS       Logit      Probit     C log-log
Ln(RD intensity)    0.110     0.752      0.422      0.354
                    (3.90)    (3.57)     (3.46)     (3.13)
ln(Assets)          0.125     0.997      0.564      0.493
                    (8.58)    (7.29)     (7.53)     (7.19)
Spe                 0.056     0.425      0.224      0.151
                    (1.11)    (1.01)     (0.98)     (0.76)
Biotech dummy       0.442     3.799      2.120      1.817
                    (7.49)    (6.58)     (6.77)     (6.51)
Constant            -0.843    -11.634    -6.576     -6.086
                    (3.91)    (6.01)     (6.12)     (6.08)
Observations        431       431        431        431

Absolute t values (OLS) and z values (other models) in brackets. Significance levels: 10%, 5%, 1%.

Comparison of marginal effects

                    OLS       Logit     Probit    C log-log
Ln(RD intensity)    0.110     0.082     0.090     0.098
ln(Assets)          0.125     0.110     0.121     0.136
Specialisation      0.056     0.046     0.047     0.042
Biotech dummy       0.442     0.368     0.374     0.379

For all models logit, probit and cloglog,

marginal effects have been computed for a

one-unit variation (around the mean) of the

variable at stake, holding all other variables at

the sample mean values.
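The two tables can be cross-checked. If the reported marginal effects are (close to) derivatives at the mean, then ME_j / β_j should be the same for every continuous regressor within a model: P(1 − P) for the logit and φ(x̄β) for the probit. The Python sketch below copies the logit and probit figures from the tables above (the dictionary keys are our own labels) and computes these ratios.

```python
# Coefficients and marginal effects copied from the two comparison tables above
# (continuous regressors only; the biotech dummy is a discrete 0 -> 1 change).
coefs = {
    "logit": {"ln_rd_intensity": 0.752, "ln_assets": 0.997, "spe": 0.425},
    "probit": {"ln_rd_intensity": 0.422, "ln_assets": 0.564, "spe": 0.224},
}
mfx = {
    "logit": {"ln_rd_intensity": 0.082, "ln_assets": 0.110, "spe": 0.046},
    "probit": {"ln_rd_intensity": 0.090, "ln_assets": 0.121, "spe": 0.047},
}

# Logit: dP/dx_j = beta_j * p * (1 - p); probit: dP/dx_j = beta_j * phi(x_bar beta).
# Either way, marginal effect / coefficient should be (nearly) the same for every regressor.
ratios = {m: {v: mfx[m][v] / coefs[m][v] for v in coefs[m]} for m in coefs}
for model, r in ratios.items():
    print(model, {v: round(x, 3) for v, x in r.items()})
```

The logit ratios cluster around 0.11 and the probit ratios around 0.21, consistent with the note above up to the rounding of the reported figures.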

Multinomial LOGIT Models

Multinomial models

- Let us now focus on the case where the dependent variable has several outcomes (i.e. is multinomial). For example, innovative firms may need to collaborate with other organizations. One can code this type of interaction as follows:
  - Collaborate with universities (modality 1)
  - Collaborate with large incumbent firms (modality 2)
  - Collaborate with SMEs (modality 3)
  - Do it alone (modality 4)
- Or, studying firm survival:
  - Survival (modality 1)
  - Liquidation (modality 2)
  - Merger/acquisition (modality 3)

Multinomial Logit - Idea

Multiple alternatives without obvious ordering

→ Choice of a single alternative out of a number of distinct alternatives, e.g. which means of transportation do you use to get to work? Bus, car, bicycle, etc.

→ By contrast, an example of an ordered structure: how do you feel today? Very well, fairly well, not too well, miserably.


Random Utility Model

- RUM underlies the economic interpretation of discrete choice models. It was developed by Daniel McFadden for econometric applications.
  - See JoEL January 2001 for the Nobel lecture; see also Manski (2001), "Daniel McFadden and the Econometric Analysis of Discrete Choice", Scandinavian Journal of Economics, 103(2), 217-229.
- Preferences are functions of biological taste templates, experiences and other personal characteristics.
  - Some of these are observed, others unobserved.
  - This allows for taste heterogeneity.
- The discussion below is in terms of individual utility (e.g. migration, transport mode choice), but similar reasoning applies to firm choices.

Random Utility Model

- Individual i's utility from a choice j can be decomposed into two components: U_ij = V_ij + ε_ij
  - V_ij is deterministic: common to everyone, given the same characteristics and constraints
    - representative tastes of the population, e.g. effects of time and cost on travel mode choice
  - ε_ij is random
    - reflects the idiosyncratic tastes of i and unobserved attributes of choice j

Random Utility Model

- V_ij is a function of attributes of alternative j (e.g. price and time) and of observed consumer and choice characteristics (z).
- We are interested in estimating the parameters of V_ij.
- Let's forget about z now for simplicity.

RUM and binary choices

- Consider two choices, e.g. bus or car.
- We observe whether an individual uses one or the other.
- Define y_i = 1 if individual i travels by bus, and y_i = 0 if i travels by car.
- What is the probability that we observe an individual choosing to travel by bus?
- Assume utility maximisation.
- The individual chooses bus (y = 1) rather than car (y = 0) if the utility of commuting by bus exceeds the utility of commuting by car.

RUM and binary choices

- So choose bus if U_i1 > U_i0, i.e. V_i1 + ε_i1 > V_i0 + ε_i0
- So the probability that we observe an individual choosing bus travel is
  P(y_i = 1) = P(ε_i0 − ε_i1 < V_i1 − V_i0)

The linear probability model

- Assume the probability depends linearly on observed characteristics (price and time).
- Then you can estimate it by linear regression, where y_i is the dummy variable for mode choice (1 if bus, 0 if car).
- Other consumer and choice characteristics can be included (the zs in the first slide of this section).

Probits and logits

- Common assumptions:
  - Cumulative normal distribution function → Probit
  - Logistic function → Logit
- Estimation by maximum likelihood

Multinomial Logit

- A discrete choice underpinning:
  - choice between M alternatives;
  - the decision is determined by the utility level U_ij an individual i derives from choosing alternative j.
- Let
  U_ij = V_ij + ε_ij     (1)
  where i = 1, ..., N indexes individuals and j = 0, ..., J indexes alternatives (so M = J + 1).

The alternative providing the highest level of utility will be chosen.

Multinomial Logit

- The probability that alternative j will be chosen is
  P(y_i = j) = P(U_ij > U_ik for all k ≠ j)
- In general, this requires solving multidimensional integrals → analytical solutions do not exist.

Multinomial Logit

Exception: if the error terms ε_ij in (1) are assumed to be independently and identically distributed as standard (type I) extreme value, then an analytical solution exists. In this case, similar to the binary logit, it can be shown that the choice probabilities are

P(y_i = j) = exp(V_ij) / Σ_{k=0..J} exp(V_ik)
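The closed-form choice probabilities are a "softmax" of the deterministic utilities. A minimal Python sketch with hypothetical utilities for three alternatives:

```python
import math


def mnl_probabilities(v):
    """P(y_i = j) = exp(V_ij) / sum_k exp(V_ik), for utilities V_i0, ..., V_iJ."""
    m = max(v)  # subtracting a constant from every V_ij leaves the probabilities unchanged
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


# Hypothetical deterministic utilities for three alternatives (j = 0, 1, 2).
v = [0.0, 0.5, -1.0]
p = mnl_probabilities(v)
print([round(pj, 4) for pj in p])
```

Because only differences in utilities matter (adding the same constant to every V_ij changes nothing), the coefficients of one outcome must be normalized to zero in estimation; this is the base outcome against which all other coefficients are read.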

Likelihood functions

- Let us assume that you have a sample of n random observations. Let f(y_i) be the probability that y_i = j. The joint probability of observing the n values of y_i is given by the likelihood function.
- We need to specify the function f(.). It comes from the empirical discrete distribution of an event that can have several outcomes: the multinomial distribution. Hence f(y_i) = p_i0^d_i0 × p_i1^d_i1 × ... × p_iJ^d_iJ, where d_ij = 1 if y_i = j and 0 otherwise.

The likelihood function

- The likelihood function reads

The log likelihood function

- The log transform of the likelihood yields
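The multinomial likelihood and log likelihood can be written as follows, in our own notation, with the indicator d_ij = 1 if individual i chose alternative j and 0 otherwise, and with the logit choice probabilities p_ij:

```latex
L = \prod_{i=1}^{n} \prod_{j=0}^{J} p_{ij}^{\,d_{ij}},
\qquad
\ln L = \sum_{i=1}^{n} \sum_{j=0}^{J} d_{ij} \ln p_{ij},
\qquad
p_{ij} = \frac{\exp(V_{ij})}{\sum_{k=0}^{J} \exp(V_{ik})}
```

Since exactly one d_ij equals 1 for each individual, each observation contributes the log probability of the alternative actually chosen.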

Multinomial logit models

→ Stata command: mlogit

mlogit y x1 x2 x3 ... xk [if] [weight] [, options]

- Options and qualifiers
  - noconstant: omits the constant
  - robust: controls for heteroskedasticity
  - if: selects observations
  - weight: weights observations

Multinomial logit models

- use mlogit.dta, clear
- mlogit type_exit log_time log_labour entry_age entry_spin cohort_

Goodness of fit

Base outcome, chosen by Stata: the outcome with the highest empirical frequency

Interpretation of coefficients

The interpretation of coefficients always refers to the base category.

Does the probability of being bought-out decrease over time?

No! Relative to survival, the probability of being bought-out decreases over time.

Interpretation of coefficients

The interpretation of coefficients always refers to the base category.

Is the probability of being bought-out lower for spinoffs?

No! Relative to survival, the probability of being bought-out is lower for spinoffs.
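Why the "No!" matters: a negative coefficient relative to the base outcome does not imply that the outcome's own probability falls. The Python sketch below builds a hypothetical three-outcome model (survival as base, bought-out, liquidation; all numbers invented for illustration, not the slides' estimates) in which the bought-out coefficient is negative, yet the probability of being bought-out rises with x, because liquidation shrinks much faster.

```python
import math

# Hypothetical three-outcome model (not the slides' estimates). Survival is the
# base outcome, so its intercept and slope are fixed at zero.
intercepts = {"survival": 0.0, "bought_out": 0.0, "liquidation": 2.0}
slopes = {"survival": 0.0, "bought_out": -0.5, "liquidation": -3.0}


def probs(x):
    """Multinomial logit probabilities of each outcome at regressor value x."""
    expv = {k: math.exp(intercepts[k] + slopes[k] * x) for k in intercepts}
    total = sum(expv.values())
    return {k: e / total for k, e in expv.items()}


p0, p1 = probs(0.0), probs(1.0)
# The odds of bought-out relative to survival fall by a factor exp(-0.5) when x rises by one...
print(p1["bought_out"] / p1["survival"], p0["bought_out"] / p0["survival"])
# ...yet the probability of bought-out itself rises, because liquidation shrinks fast.
print(round(p0["bought_out"], 3), round(p1["bought_out"], 3))
```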

Multinomial Logit - Coefficients

Marginal Effects

Elasticities

→ relative change of p_ij if x increases by 1 per cent
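For an individual-level regressor x with outcome-specific coefficients β_j (β_0 = 0 for the base), the standard multinomial logit formulas are ∂p_j/∂x = p_j(β_j − Σ_k p_k β_k) and, for the elasticity, (∂p_j/∂x)·x/p_j = x(β_j − Σ_k p_k β_k). The Python sketch below implements both for hypothetical coefficients; they are illustrations, not the slides' estimates.

```python
import math

# Hypothetical MNL with one individual-level regressor x and three outcomes;
# outcome 0 is the base, so its intercept and slope are fixed at zero.
consts = [0.0, 0.3, -1.0]
betas = [0.0, -0.5, 0.8]


def probs(x):
    expv = [math.exp(a + b * x) for a, b in zip(consts, betas)]
    total = sum(expv)
    return [e / total for e in expv]


def marginal_effects(x):
    """dp_j/dx = p_j * (beta_j - sum_k p_k beta_k)."""
    p = probs(x)
    beta_bar = sum(pk * bk for pk, bk in zip(p, betas))
    return [pj * (bj - beta_bar) for pj, bj in zip(p, betas)]


def elasticities(x):
    """(dp_j/dx) * x / p_j = x * (beta_j - sum_k p_k beta_k)."""
    p = probs(x)
    beta_bar = sum(pk * bk for pk, bk in zip(p, betas))
    return [x * (bj - beta_bar) for bj in betas]


x0 = 1.2
print("marginal effects:", [round(m, 4) for m in marginal_effects(x0)])
print("elasticities:", [round(e, 4) for e in elasticities(x0)])
```

The marginal effects sum to zero across outcomes (probabilities always sum to one), and the sign of a marginal effect can differ from the sign of the corresponding coefficient.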

Independence of irrelevant alternatives - IIA

- The model assumes that the odds between each pair of outcomes are independent of all other alternatives. In other words, the other alternatives are irrelevant.
- From a statistical viewpoint, this is tantamount to assuming independence of the error terms across alternatives.
- A simple way to test the IIA property is to estimate the model leaving out one modality (the restricted model), and to compare the parameters with those of the complete model.
  - If IIA holds, the parameters should not change significantly.
  - If IIA does not hold, the parameters should change significantly.

Multinomial logit and IIA

- Many applications in economic and geographical journals (and other research areas).
- The multinomial logit model is the workhorse of multiple choice modelling in all disciplines. It is easy to compute.
- But it has a drawback.

Independence of Irrelevant Alternatives

- Consider market shares:
  - Red bus 20%
  - Blue bus 20%
  - Train 60%
- IIA assumes that if the red bus company shuts down, the market shares become:
  - Blue bus 20% + 5% = 25%
  - Train 60% + 15% = 75%
- Because the ratio of blue bus trips to train trips must stay at 1:3
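Under IIA, removing an alternative simply renormalizes the shares of the remaining ones, so their relative odds are unchanged. The Python sketch below reproduces the red bus / blue bus arithmetic with the shares given on the slide; the helper name is our own.

```python
def shares_after_removal(shares, removed):
    """Under IIA the remaining alternatives keep their relative odds, so their shares are
    simply renormalized after one alternative disappears."""
    remaining = {k: s for k, s in shares.items() if k != removed}
    total = sum(remaining.values())
    return {k: s / total for k, s in remaining.items()}


before = {"red_bus": 0.20, "blue_bus": 0.20, "train": 0.60}  # shares from the slide
after = shares_after_removal(before, "red_bus")
print({k: round(s, 2) for k, s in after.items()})
```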

Independence of Irrelevant Alternatives

- The model assumes that unobserved attributes of all alternatives are perceived as equally similar.
- But will people unable to travel by red bus really switch to travelling by train?
- The most likely outcome is (assuming the supply of bus seats is elastic):
  - Blue bus 40%
  - Train 60%
- This failure of multinomial/conditional logit models is called the Independence of Irrelevant Alternatives assumption (IIA).

Independence of irrelevant alternatives - IIA

- H0: the IIA property is valid
- H1: the IIA property is not valid

- The H statistic (H stands for Hausman) follows a χ² distribution with M degrees of freedom (M being the number of parameters).
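The Hausman statistic compares the coefficients of the restricted model (one outcome left out), b_R, with those of the full model, b_F, on their common parameters: H = (b_R − b_F)'(V_R − V_F)⁻¹(b_R − b_F). The Python sketch below computes it for hypothetical two-parameter estimates, with a small Gaussian-elimination solver so that no external library is needed; it is not the implementation used by `mlogtest`.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hausman_iia(b_restricted, b_full, v_restricted, v_full):
    """H = d' (V_R - V_F)^{-1} d with d = b_R - b_F."""
    d = [br - bf for br, bf in zip(b_restricted, b_full)]
    v_diff = [[vr - vf for vr, vf in zip(rr, rf)] for rr, rf in zip(v_restricted, v_full)]
    w = solve(v_diff, d)
    return sum(di * wi for di, wi in zip(d, w))


# Hypothetical estimates and covariance matrices (not from the slides).
H = hausman_iia([0.50, -1.20], [0.40, -1.00],
                [[0.020, 0.002], [0.002, 0.030]],
                [[0.010, 0.001], [0.001, 0.015]])
print(f"H = {H:.3f}; 5% critical value of chi2(2) = 5.991")
```

Here H is below 5.991, so IIA would not be rejected. In applied work V_R − V_F need not be positive definite, and H can even turn out negative.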

Stata application: the IIA test

- H0 The IIA property is valid
- H1 The IIA property is not valid

mlogtest, hausman

Omitted variable

Application of the IIA test

- H0 The IIA property is valid
- H1 The IIA property is not valid

mlogtest, hausman

We compare the parameters of the model "liquidation relative to bought-out", estimated simultaneously with "survival relative to bought-out", with the parameters of the model "liquidation relative to bought-out" estimated without "survival relative to bought-out".

Application of the IIA test

- H0 The IIA property is valid
- H1 The IIA property is not valid

mlogtest, hausman

The conclusion is that the survival outcome significantly alters the choice between liquidation and bought-out. In fact, for a company, being bought-out must be seen as a way to remain active, at the cost of losing control over economic decisions, notably investment.

Multinomial Logit - IIA

- Cramer-Ridder test
  - Often you want to know whether certain alternatives can be merged into one, e.g. do you have to distinguish between employment states such as unemployment and nonemployment?
  - The Cramer-Ridder test tests the null hypothesis that the alternatives can be merged. It has the form of an LR test: 2(log L_U − log L_R) ~ χ²

Multinomial Logit - IIA

- Derive the log likelihood value of the restricted model where two alternatives (here, A and N) have been merged:

log L_R = log L_P + n_A log(n_A / (n_A + n_N)) + n_N log(n_N / (n_A + n_N))

where log L_R is the log likelihood of the restricted model, log L_P is the log likelihood of the pooled model, and n_A and n_N are the number of times A and N have been chosen.
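A minimal Python sketch of the Cramer-Ridder computation, assuming the relation log L_R = log L_P + n_A log(n_A/(n_A + n_N)) + n_N log(n_N/(n_A + n_N)), where L_P is the likelihood of the model estimated with A and N pooled. All input values are hypothetical.

```python
import math


def cramer_ridder(ll_unrestricted, ll_pooled, n_a, n_n):
    """Restricted log likelihood when A and N are merged, and the LR statistic
    2 (log L_U - log L_R) of the Cramer-Ridder test."""
    n = n_a + n_n
    ll_restricted = ll_pooled + n_a * math.log(n_a / n) + n_n * math.log(n_n / n)
    return ll_restricted, 2.0 * (ll_unrestricted - ll_restricted)


# Hypothetical inputs: unrestricted and pooled-model log likelihoods,
# and how often A and N were chosen.
ll_r, lr = cramer_ridder(ll_unrestricted=-515.0, ll_pooled=-400.0, n_a=60, n_n=140)
print(f"log L_R = {ll_r:.3f}, LR = {lr:.3f}")
```

The added terms are the log likelihood of splitting the merged group back into A and N with constant shares, which is exactly what the null hypothesis (no distinction between A and N) implies.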

Exercise

- use http://www.stata-press.com/data/r8/sysdsn3
- tabulate insure
- mlogit insure age male nonwhite site2 site3