An Introduction to Regression with Binary Dependent Variables

Learn more at: http://people.wku.edu

1
An Introduction to Regression with Binary
Dependent Variables
  • Brian Goff
  • Department of Economics
  • Western Kentucky University

2
Introduction and Description
  • Examples of binary regression
  • Features of linear probability models
  • Why use logistic regression?
  • Interpreting coefficients
  • Evaluating the performance of the model

3
Binary Dependent Variables
  • In many regression settings, the Y variable is
    (0,1)
  • A Few Examples
  • Consumer chooses brand (1) or not (0)
  • A quality defect occurs (1) or not (0)
  • A person is hired (1) or not (0)
  • Evacuate home during hurricane (1) or not (0)
  • Other Examples?

4
Scatterplot with Y (0,1)
[Figure: scatterplot of Y (Hired = 1, Not Hired = 0) against X (Experience)]
5
The Linear Probability Model (LPM)
  • If we estimate the slope using OLS regression:
  • $\text{Hired} = a + \beta \cdot \text{Income} + e$
  • The result is called a Linear Probability Model
  • The predicted values are probabilities that Y
    equals 1
  • The equation is linear; the slope is constant
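
As a rough illustration of the LPM idea, the sketch below fits a straight line to a 0/1 outcome by ordinary least squares. The data and the names `ols_fit`, `x`, and `y` are hypothetical and are not taken from the presentation; it only assumes a single X-variable.

```python
# Hypothetical data: an X-variable (for example, income) and a 0/1 outcome
# (hired = 1, not hired = 0). These numbers are made up for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]


def ols_fit(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = ols_fit(x, y)

# In an LPM the fitted values are read as predicted probabilities that Y = 1.
predicted = [a + b * xi for xi in x]

for xi, pi in zip(x, predicted):
    print(f"X = {xi:2d}  predicted P(Y=1) = {pi:.3f}")
```

Because the fitted line has a single slope `b`, each one-unit increase in X changes the predicted probability by the same amount, whatever the starting value of X.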

6
Picture of LPM
[Figure: LPM regression line (constant slope coefficient), with Y from 0 to 1 on the vertical axis and X on the horizontal axis]
Points on the regression line represent predicted
probabilities for Y at each value of X
7
An Example: Loan Approvals
Data
Dependent variable: Loaned = 1 if loan approved, 0 if not approved, by Bank Z
Independent variables:
ROA = net income as % of total assets of applicant
Debt = debt as % of total assets of applicant
Officer = 1 if loan handled by loan officer A and 0 if handled by officer B
8
Scatterplot (Loaned vs. NITA)
9
LPM Results
Coefficient on NITA implies that a 1% increase in ROA
increases the probability of a loan by 2.2% (0.022)
10
LPM Weaknesses
  • The predicted probabilities can be greater than 1
    or less than 0
  • Probabilities, by definition, have max = 1 and min = 0
  • This is not a big issue if they are very close to
    0 and 1
  • The error terms vary based on size of X-variable
    (heteroskedastic)
  • There may be models that have lower variance
    (are more efficient)
  • The errors are not normally distributed because Y
    takes on only two values
  • Creates problems for
  • More of an issue for statistical theorists
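
The first two weaknesses can be seen with a few lines of arithmetic. The sketch below uses a hypothetical LPM intercept and slope (not the loan results) to show predictions falling outside the 0 to 1 range, and uses the fact that a 0/1 outcome with P(Y=1) = p has error variance p(1 - p), which changes with X.

```python
# Hypothetical LPM intercept and slope, chosen only to illustrate the problem.
a, b = -0.25, 0.15


def lpm_probability(x):
    """Predicted 'probability' from the straight-line LPM."""
    return a + b * x


def error_variance(p):
    """Variance of the error term for a 0/1 outcome with P(Y=1) = p."""
    return p * (1 - p)


x_values = list(range(10))

# X-values whose predicted probability is below 0 or above 1.
out_of_range = [x for x in x_values if not 0.0 <= lpm_probability(x) <= 1.0]

# Error variance at each in-range X-value; it is not constant (heteroskedastic).
variances = {
    x: error_variance(lpm_probability(x))
    for x in x_values
    if x not in out_of_range
}

print("Out-of-range X-values:", out_of_range)
for x, v in variances.items():
    print(f"X = {x}  P = {lpm_probability(x):.2f}  error variance = {v:.4f}")
```

The variance is largest where the predicted probability is near 0.5 and shrinks toward the ends of the line.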

11
Predicted Probabilities in LPM Loans Model
In the loan case, all of the predicted probabilities
fall within the (0,1) range
12
(Binary) Logistic Regression or Logit
  • Selects regression coefficients so that predicted
    values for Y are forced to lie between 0 and 1
  • Produces S-shaped regression predictions rather
    than a straight line
  • Selects these coefficients through the Maximum
    Likelihood estimation technique
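
A minimal sketch of how such coefficients might be found for one X-variable is shown below. It assumes Newton-Raphson iterations on the log-likelihood, which is one common way to carry out maximum likelihood for a logit; the presentation itself relies on SPSS and does not say which algorithm is used. The data and the names `logistic` and `fit_logit` are hypothetical.

```python
import math

# Hypothetical data (not the loan data): X-variable and 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]


def logistic(z):
    """S-shaped logistic function; its value always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations=25):
    """Maximum-likelihood (a, b) for P(Y=1) = logistic(a + b*x), by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to a and b.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, using weights w = p * (1 - p).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_hat, b_hat = fit_logit(x, y)
p_hat = [logistic(a_hat + b_hat * xi) for xi in x]

for xi, pi in zip(x, p_hat):
    print(f"X = {xi:2d}  predicted P(Y=1) = {pi:.3f}")
```

This sketch has no safeguards for perfectly separated data, where the likelihood has no finite maximum; statistical packages handle such cases with warnings or stopping rules.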

13
Picture of Logistic Regression
[Figure: logistic regression curve (non-linear slope coefficient), with Y from 0 to 1 on the vertical axis and X on the horizontal axis]
Points on the regression curve represent predicted
probabilities for Y at each value of X
14
LPM and Logit Regressions
  • LPM and Logit regressions in some cases provide
    similar answers
  • If there are few outlying X-values at the upper or
    lower ends, the LPM often produces predicted values
    within the (0,1) band
  • In such cases, the non-linear sections of the
    Logit regression are not needed
  • In such cases, simplicity of LPM may be reason
    for use
  • See following slide for an illustration

15
Example where LPM and Logit Results Are Similar
[Figure: LP model line, with Y from 0 to 1 on the vertical axis and X on the horizontal axis]
16
LPM and Logit: Loan Case
  • In the loan example the results are similar
  • R-square = 98% for the regression of LPM-predicted
    probabilities on Logit-predicted probabilities
  • Descriptive statistics for both probabilities
    appear below
  • The main difference is that the LPM max/min are
    closer to 0 and 1

17
SPSS Logistic Regression Output for Loan Approval
Note: Instead of t-statistics, Wald
statistics are used to test whether the
coefficients differ from zero; the associated
p-values (Sig.) have the same interpretation as in
any other regression output
18
Interpreting Logistic Regression (Logit)
Coefficients
  • The slope coefficient from a logistic regression
    ($\beta$) = the rate of change in the "log odds" of
    the event under study as X changes by one unit
  • What in the world does that mean?
  • We want to know the change in the probability of
    the event as X changes
  • In Logistic Regression, this value changes as
    X changes (S-shape instead of linear)
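
One way to unpack the "log odds" statement is to write the model out. The lines below are a sketch in standard logit notation, with $a$ as the intercept and $\beta$ as the slope; the value 0.11 is the NITA coefficient reported in the loan example, and the rounded result is simple arithmetic rather than a figure from the presentation.

```latex
\[
\ln\!\left(\frac{P}{1-P}\right) = a + \beta X
\quad\Longrightarrow\quad
\frac{P}{1-P} = e^{\,a + \beta X}
\]
% A one-unit increase in X adds beta to the log odds,
% which multiplies the odds by e^{beta}:
\[
\frac{\text{odds}(X+1)}{\text{odds}(X)} = e^{\beta},
\qquad e^{0.11} \approx 1.116
\]
```

So a coefficient of 0.11 means the odds rise by roughly 11.6% per unit of X at every value of X, while the change in the probability itself depends on where on the S-curve the observation sits.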

19
Loan Example: Effect of NITA on Probability of Loan
NITA coefficient (B) = 0.11
20
Meaning?
  • At moderate probabilities (around 0.5) of getting
    a loan (corresponding to an average NITA of about 5),
    the likelihood of getting a loan increases by
    2.75% (0.0275) for each 1% increase in NITA
  • This estimate is very close to the LPM estimate
    of 2.2%
  • At the lower and upper extremes (NITA values in the
    -/+ teens), the probability changes by only about
    0.9% for a 1-unit increase in NITA
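
These figures follow from the rule given in the appendix: effect = B × P × (1 - P). The sketch below reproduces them using the reported coefficient of 0.11. The probability values 0.09 and 0.91 for the extremes are assumptions chosen because they give an effect of about 0.9%; the slides do not state them.

```python
def marginal_effect(beta, p):
    """Change in probability for a one-unit change in X, at probability p."""
    return beta * p * (1 - p)


beta_nita = 0.11  # NITA coefficient (B) reported in the loan example

# At a moderate probability of 0.5 the effect is at its largest.
effect_mid = marginal_effect(beta_nita, 0.5)

# Assumed probabilities near the extremes (not stated in the slides).
effect_low = marginal_effect(beta_nita, 0.09)
effect_high = marginal_effect(beta_nita, 0.91)

print(f"Effect at P = 0.50: {effect_mid:.4f}")
print(f"Effect at P = 0.09: {effect_low:.4f}")
print(f"Effect at P = 0.91: {effect_high:.4f}")
```

The effect is symmetric around P = 0.5, which is why the lower and upper extremes give the same small change.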

21
Alternative Methods of Evaluating Logit
Regressions
  • Statistics for comparing alternative logit
    models
  • Model Chi-Square
  • Percent Correct Predictions
  • Pseudo-R2

22
Chi-Square Test for Fit
  • The Chi-Square statistic and associated p-value
    (Sig.) test whether the model coefficients as a
    group equal zero
  • Larger Chi-squares and smaller p-values indicate
    greater confidence in rejecting the null
    hypothesis that the coefficients as a group equal zero
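
A sketch of how a model chi-square can be computed is shown below. It assumes the usual likelihood-ratio form, 2 × (log-likelihood of the model minus log-likelihood of an intercept-only model); the presentation does not spell out the formula. The outcomes and predicted probabilities are hypothetical stand-ins for the output of a fitted one-variable logit, and the p-value helper covers only the 1-degree-of-freedom case.

```python
import math


def log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def model_chi_square(y, p_model):
    """Likelihood-ratio chi-square of the model against an intercept-only model."""
    n = len(y)
    p_null = sum(y) / n  # intercept-only model predicts the sample share of 1s
    ll_null = log_likelihood(y, [p_null] * n)
    ll_model = log_likelihood(y, p_model)
    return 2.0 * (ll_model - ll_null)


def p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical outcomes and hypothetical predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
p_model = [0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.85, 0.90]

chi_square = model_chi_square(y, p_model)
print(f"Model chi-square = {chi_square:.3f}, p-value = {p_value_df1(chi_square):.4f}")
```

With more than one X-variable the degrees of freedom equal the number of slope coefficients, and the p-value would come from the corresponding chi-square distribution rather than this shortcut.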

23
Percent Correct Predictions
  • The "Percent Correct Predictions" statistic
    assumes that if the estimated p is greater than
    or equal to .5 then the event is expected to
    occur and not occur otherwise.
  • By assigning these probabilities 0s and 1s and
    comparing these to the actual 0s and 1s, the
    correct Yes, correct No, and overall correct
    scores are calculated.
  • Note: the percent correctly predicted within each
    subgroup is also important, especially if most of
    the data are 0s or most are 1s
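
The procedure described above can be sketched as follows. The function name `classification_table`, the outcomes, and the predicted probabilities are hypothetical; only the 0.5 cutoff rule comes from the slide.

```python
def classification_table(y, p, cutoff=0.5):
    """Percent of correct Yes, correct No, and overall correct predictions."""
    # Predict the event (1) when the estimated probability is >= cutoff.
    predicted = [1 if pi >= cutoff else 0 for pi in p]
    correct_yes = sum(1 for yi, pr in zip(y, predicted) if yi == 1 and pr == 1)
    correct_no = sum(1 for yi, pr in zip(y, predicted) if yi == 0 and pr == 0)
    n_yes = sum(y)
    n_no = len(y) - n_yes
    return {
        "percent_correct_yes": 100.0 * correct_yes / n_yes,
        "percent_correct_no": 100.0 * correct_no / n_no,
        "percent_correct_overall": 100.0 * (correct_yes + correct_no) / len(y),
    }


# Hypothetical actual outcomes and estimated probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
p = [0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.85, 0.90]

table = classification_table(y, p)
for name, value in table.items():
    print(f"{name}: {value:.1f}%")
```

Reporting the Yes and No rates separately matters because a model can score well overall simply by predicting the more common outcome for every case.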

24
Percent Correct Results
35% of loan rejected cases (0) were correctly
predicted
75% of all cases (0,1) were correctly predicted
94% of loan accepted cases (1) were correctly
predicted
Note: The model is much better at predicting loan
acceptance than loan rejection; this may serve
as a basis for thinking about additional
variables to improve the model
25
R2 Problems
[Figure: regression line and actual observations, with Y from 0 to 1 on the vertical axis and X on the horizontal axis]
Notice that whether using LPM or logit, the
predicted values on the regression lines are not
near the actual observations (which are all
either 0 or 1). This makes the typical R-square
statistic of no value in assessing how well the
model fits the data.
26
Pseudo-R2 Values
  • There are pseudo-R2 statistics that make an
    adjustment for the (0,1) nature of the actual
    data; two are listed above
  • Their computation is somewhat complicated, but they
    yield measures that vary between 0 and (somewhat
    close to) 1, much like the R2 in an LP model.
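
The transcript does not name the two statistics. As an assumption, the sketch below uses the Cox & Snell and Nagelkerke measures, which SPSS commonly reports for logistic regression. Both are built from the log-likelihoods of the intercept-only model and the fitted model; the log-likelihood values and sample size here are hypothetical.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell pseudo-R2: 1 - (L_null / L_model) ** (2 / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2: Cox & Snell rescaled so that its maximum is 1."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical log-likelihoods for 10 observations.
ll_null, ll_model, n = -6.93, -4.95, 10

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(f"Cox & Snell R2 = {cs:.3f}, Nagelkerke R2 = {nk:.3f}")
```

The Cox & Snell measure cannot reach 1 even for a perfect model, which is consistent with the "somewhat close to 1" wording; the Nagelkerke version divides by that ceiling.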

27
Appendix: Calculating the Effect of an X-variable on the
Probability of Y
  • Effect on probability from a 1-unit change in X:
  • $(\beta)(P)(1 - P)$, where $P$ is the probability
  • Probability changes as the value of X changes
  • To calculate $(1 - P)$ for given X values:
  • $1 - P = 1 / \left(1 + \exp(a + \beta_1 X_1 + \beta_2 X_2)\right)$
  • With multiple X-variables it is common to focus
    on one at a time and use average values for all
    but one
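
A sketch of the calculation with two X-variables is given below. Only the 0.11 slope echoes the loan example; the intercept, the second slope, and the average values are hypothetical, chosen so that the probability at the averages is about 0.5.

```python
import math


def probability(a, betas, xs):
    """P(Y=1) from a logit with intercept a and slopes betas at X-values xs."""
    z = a + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


def effect_of_one_unit(a, betas, xs, index):
    """Effect on probability of a 1-unit change in X[index], others held at xs."""
    p = probability(a, betas, xs)
    return betas[index] * p * (1.0 - p)


# Hypothetical intercept, slopes, and average X-values.
a = -0.15
betas = [0.11, -0.02]
averages = [5.0, 20.0]

p_at_averages = probability(a, betas, averages)
effect_x1 = effect_of_one_unit(a, betas, averages, index=0)

print(f"P at average X-values   = {p_at_averages:.3f}")
print(f"1 - P                   = {1.0 - p_at_averages:.3f}")
print(f"Effect of 1 unit of X1  = {effect_x1:.4f}")
```

Repeating the calculation with `xs` set to other values of interest shows how the effect shrinks as the probability moves away from 0.5.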