Title: Testing and Interpreting Mediational and Moderational Models in Logistic Regression
1. Testing and Interpreting Mediational and Moderational Models in Logistic Regression
- A practical talk
- Kevin M. King, M.A.
2. Goals
- Provide a basic understanding of logistic regression
- Demonstrate how to use Excel to compute standardized coefficients, and from them the indirect and direct effects, for mediational models in logistic regression (e.g., MacKinnon & Dwyer, 1993)
- Demonstrate how to use Excel to create graphs of logistic regression interactions
3. A caveat
- I am not a quantitative expert.
- I'm just a guy who's worked with these models quite a bit and has some shortcuts to share.
- To really understand what's going on, you need to understand logistic regression, which you won't find here. Take a course or read a book.
- A great (and cheap) resource for learning logistic regression is the Sage primer by Pampel (2000).
4. Prediction of Dichotomous Variables
- Many variables are dichotomous (e.g., death, psychological diagnosis, participation in an intervention) and very relevant.
- It is desirable to apply the same modeling framework that we use to predict continuous outcomes to dichotomous outcomes.
5. Difficulties
- We often think in terms of continuous prediction, i.e.:
  - How does personality lead to drug use disorder?
  - How is income related to participation?
- Yet we can't talk about linear, unit-for-unit relations, where 5,000 more dollars of income leads to one more unit of participation, or one unit higher impulsivity leads to one more unit of drug use disorder.
6. Continuous Distribution
7. Risk and Odds
- We must translate our thinking to binary outcomes and talk about risk and odds, which describe the probability of being in one state or the other.
- These probabilities are not linearly related to the predictors, e.g.:
  - the relation of the number of publications to tenure
  - the relation of the number of alcoholic relatives to alcoholism
8. Binary Distributions
9. Probability Curve (Sigmoid Curve)
10. Estimation of Logistic Regression (and implications for mediation)
- Logistic regression uses a transformation of the probability curve, called the logit (logged odds), which linearizes the probability curve:
  - Logit = $\ln\left(P_i/(1-P_i)\right)$, Odds = $P_i/(1-P_i)$, Probability = $P_i$
- But since we observe the actual presence or absence of the outcome, rather than its probability, we can't use OLS procedures:
  - The error term is not normally distributed.
  - Error variances are unequal across values of the IV.
  - (i.e., the relation between the predictors and the probability is inherently nonlinear)
11. Probability vs. Odds vs. Logit
- Probability = $P_i$: the chance of occurrence
- Odds = $P_i/(1-P_i)$: the odds of occurring vs. not occurring (e.g., men have twice the odds of being diagnosed with alcoholism)
- Logit = $\ln\left(P_i/(1-P_i)\right)$: the log of the odds; not directly interpretable on its own
- Standardizing predictors (z-scores: subtracting the mean and dividing by the SD) helps interpretation
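The three metrics on this slide are simple transformations of one another. The minimal Python sketch below shows the conversions. The function names are hypothetical and are used only for illustration.

```python
import math

def odds_from_prob(p: float) -> float:
    """Odds = P / (1 - P)."""
    return p / (1.0 - p)

def logit_from_prob(p: float) -> float:
    """Logit = ln(P / (1 - P)), the log of the odds."""
    return math.log(odds_from_prob(p))

def prob_from_logit(logit: float) -> float:
    """Inverse of the logit: P = e^logit / (1 + e^logit)."""
    return math.exp(logit) / (1.0 + math.exp(logit))

if __name__ == "__main__":
    for p in (0.10, 0.50, 0.75):
        print(f"P = {p:.2f}  odds = {odds_from_prob(p):.3f}  logit = {logit_from_prob(p):.3f}")
```

A probability of .50 corresponds to odds of 1 and a logit of 0. Probabilities above .50 give positive logits, and probabilities below .50 give negative logits.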
12. Maximum Likelihood Estimation
- Maximum likelihood estimation (MLE) is used to estimate these unobserved logged odds.
  - MLE finds the coefficient values that make the observed data most likely, given the model.
- In predicting binary outcomes, the initial best guess (with no predictors) is the proportion of cases that have the outcome.
- Each explanatory variable added to the model can improve model fit and the accuracy of predicting binary class membership (e.g., correctly predicting diagnosis).
- Because Y is unobserved in logistic regression (we observe only the 0/1 outcome, not the underlying continuous variable), its variance is also unobserved. In order to estimate the model, the variance of the residual is fixed to $\pi^2/3$ in logistic regression.
- The scale in logistic regression depends on the extent of prediction provided by the variables in the model (MacKinnon & Dwyer, 1993, p. 150).
- Thus the coefficients in each model are scaled according to the explanatory power of the other variables in the model and $\pi^2/3$.
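The points about MLE can be made concrete with a minimal sketch. It is not what SPSS does internally. It assumes a model with an intercept and at most one predictor, and it fits that model by Newton-Raphson maximum likelihood. The function name `fit_logistic` and the toy data are hypothetical. For an intercept-only model, the estimate should equal the logit of the sample proportion, which is the "initial best guess" described above.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(P) = b0 + b1*x by Newton-Raphson maximum likelihood.

    Pass x=None for an intercept-only model. Returns (b0, b1).
    """
    n = len(y)
    xs = [0.0] * n if x is None else [float(v) for v in x]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix (h00, h01, h11) = X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xs, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        if x is None:
            b0 += g0 / h00  # intercept-only model: one-parameter Newton step
        else:
            det = h00 * h11 - h01 * h01
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    b0, _ = fit_logistic(None, y)
    print(b0, math.log(0.3 / 0.7))  # intercept-only MLE vs. logit of the sample proportion
    print(math.pi ** 2 / 3)         # fixed residual variance of the latent outcome
```

The last line prints $\pi^2/3 \approx 3.29$. This is the variance of the standard logistic distribution, and it is the residual variance the model fixes.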
13. Mediational Models: Application to Logistic Regression
- Mediation is where a predictor has an indirect effect on an outcome through a third variable. This indirect effect accounts for some or all of the main effect of the predictor on the outcome (see Baron & Kenny, 1986).
- Mediation can be tested by measuring the change in the predictor's effect when the mediator is added (i.e., $\tau - \tau'$, where $\tau$ is the predictor's effect without the mediator and $\tau'$ its effect with the mediator in the model) or by testing the significance of the indirect effect ($ab$).
- See Shrout & Bolger (2002) and MacKinnon, Krull, & Lockwood (2000) for good discussions of these methods.
[Figure: mediation path diagram. Predictor → Mediator ($a$); Mediator → Outcome ($b$); Predictor → Outcome ($\tau'$)]
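One common way to test the significance of $ab$ is the first-order (Sobel) standard error. This is one of the approaches discussed in the mediation literature cited above. The sketch below computes $z = ab / \sqrt{a^2 SE_b^2 + b^2 SE_a^2}$. The function name is hypothetical. As the next slides explain, when the mediator or outcome is binary, $a$, $b$, and their SEs must first be put on a common scale.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """First-order (Sobel) z statistic for the indirect effect ab."""
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return (a * b) / se_ab

if __name__ == "__main__":
    # Hypothetical path estimates and standard errors.
    print(round(sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1), 3))
```

Compare the resulting z to a standard normal distribution (e.g., |z| > 1.96 for p < .05, two-tailed).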
14. The Rub
- In logistic regression, $a$ and $b$ (and likewise $\tau$ and $\tau'$) don't come from the same equations, which means that when the outcome or mediator is binary, they are not on the same scale.
- Thus one cannot compute the mediation effect directly as either $ab$ or $\tau - \tau'$.
15. The Solution
- The solution is to standardize the coefficients, as MacKinnon and Dwyer (1993) recommend.
- Because the variance of the outcome depends on the variables in the model plus $\pi^2/3$, we can estimate the variance of the outcome and use it to standardize the coefficients.
16. The Formula
- Variance of Outcome:
  - $\sigma^2(O) = b^2\sigma^2(M) + \tau'^2\sigma^2(P) + 2b\tau'\sigma_{PM} + \pi^2/3$
- This means:
  - Variance of outcome = (coefficient of Mediator squared × variance of the Mediator) + (coefficient of Predictor squared × variance of the Predictor) + (2 × coeff. of med. × coeff. of pred. × covariance of P and M) + pi squared/3.
- This can be expanded to include any number of covariates. Each variable must be included as its coefficient squared times its variance, and each pair of variables must be included as 2 × the product of their coefficients × their covariance.
- To standardize coefficients:
  - $b^{*} = b/\sqrt{\sigma^2(O)}$
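The two-variable formula can be written out directly, as in the minimal sketch below. It follows MacKinnon and Dwyer (1993) in assuming that each coefficient is divided by the standard deviation of the latent outcome, $\sqrt{\sigma^2(O)}$. The function names and the variance and covariance values in the usage line are hypothetical. Only the two coefficients are taken from the example later in this talk.

```python
import math

PI_SQ_OVER_3 = math.pi ** 2 / 3  # fixed residual variance of the latent outcome

def outcome_variance(b, tau_prime, var_m, var_p, cov_pm):
    """sigma^2(O) = b^2 var(M) + tau'^2 var(P) + 2 b tau' cov(P,M) + pi^2/3."""
    return (b ** 2 * var_m
            + tau_prime ** 2 * var_p
            + 2 * b * tau_prime * cov_pm
            + PI_SQ_OVER_3)

def standardize(coef, var_o):
    """b* = b / sqrt(sigma^2(O)): express a logit coefficient in latent-SD units."""
    return coef / math.sqrt(var_o)

if __name__ == "__main__":
    # b and tau' as on the example slide; variances and covariance are made up.
    var_o = outcome_variance(b=0.61, tau_prime=0.54, var_m=1.0, var_p=0.25, cov_pm=0.1)
    print(var_o, standardize(0.61, var_o), standardize(0.54, var_o))
```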
17. An Example: The mediating effect of behavioral undercontrol on the relation between parental alcoholism and drug use disorder (from King & Chassin, 2004)
Table 2. Logistic Regression Predicting Drug Diagnosis from Parental Alcoholism and Behavioral Undercontrol
Note. p < .001.
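[Figure: unstandardized path model. Parent Alc. → Undercontrol ($a$ = 0.27); Undercontrol → Offspring Drug Disorder ($b$ = 0.61); Parent Alc. → Offspring Drug Disorder ($\tau'$ = 0.54)]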
18. Standardization of coefficients
- Using the coefficients and the variance-covariance matrix of the variables in the equation, we can easily fill in the values of the formula.
- Steps:
  - Get the variance-covariance matrix for all variables in the model and paste it into Excel.
  - Make a table of all coefficients and SEs from the model results.
  - Using the table of coefficients, make a variance-covariance-like table. Each on-diagonal cell is the coefficient squared, and each off-diagonal cell is twice the product of the two coefficients (fill only one triangle, so each pair is counted once).
  - Combine the variance-covariance table and the new table of coefficients by multiplying matching cells.
  - Sum the new combined table and add $\pi^2/3$.
  - Use this outcome variance to standardize $b$ and $\tau'$ by dividing each coefficient by the square root of the outcome variance (its standard deviation).
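The spreadsheet steps above can be mirrored in a short Python sketch. It builds the coefficient table, multiplies it cell by cell with the covariance matrix, sums the result, adds $\pi^2/3$, and divides each coefficient by the square root of that total. The function names are hypothetical. The covariance matrix is assumed to list the variables in the same order as the coefficients.

```python
import math

def outcome_variance_from_tables(coefs, cov):
    """Spreadsheet method: coefficient table x covariance table, summed, + pi^2/3.

    coefs: list of k logistic coefficients.
    cov:   k x k variance-covariance matrix of the same k variables, same order.
    """
    k = len(coefs)
    total = 0.0
    for i in range(k):
        for j in range(i, k):  # upper triangle only, so each pair is counted once
            if i == j:
                cell = coefs[i] ** 2              # on-diagonal: coefficient squared
            else:
                cell = 2 * coefs[i] * coefs[j]    # off-diagonal: twice the product
            total += cell * cov[i][j]             # multiply matching cells and sum
    return total + math.pi ** 2 / 3               # add the fixed residual variance

def standardized_coefficients(coefs, cov):
    """Divide every coefficient by the SD of the latent outcome."""
    sd_o = math.sqrt(outcome_variance_from_tables(coefs, cov))
    return [c / sd_o for c in coefs]

if __name__ == "__main__":
    # Hypothetical order: [mediator, predictor]; covariance values are made up.
    print(standardized_coefficients([0.61, 0.54], [[1.0, 0.1], [0.1, 0.25]]))
```

Adding covariates only means lengthening `coefs` and enlarging `cov`. The double loop produces the extra squared and cross-product terms automatically.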
19. Standardized Model
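[Figure: standardized path model. Parent Alc. → Undercontrol ($a$ = 0.27); Undercontrol → Offspring Drug Disorder ($b$ = 0.15); Parent Alc. → Offspring Drug Disorder ($\tau'$ = 0.12)]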
Proportion mediated = $ab/(ab + \tau')$
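The proportion mediated is the indirect effect divided by the total effect (indirect plus direct). The sketch below uses the rounded standardized values shown in the path diagram on this slide. Because those values are rounded, the result is only an illustration and is not a figure reported by the study. The function name is hypothetical.

```python
def proportion_mediated(a, b, tau_prime):
    """ab / (ab + tau'): share of the total effect carried through the mediator."""
    indirect = a * b
    return indirect / (indirect + tau_prime)

if __name__ == "__main__":
    # Rounded values from the standardized model diagram (illustration only).
    print(round(proportion_mediated(a=0.27, b=0.15, tau_prime=0.12), 3))
```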
20. Moderation in Logistic Regression: Interpreting Coefficients and Graphing Interactions
- Testing interactions in logistic regression is similar to OLS regression methods, in that one includes an interaction term in the model predicting a binary outcome.
- Interactions can also be probed using Aiken and West's method (testing the simple slope at 1 SD above and below the mean of the moderator).
- Centering is just as important as in OLS regression, and standardizing variables will also aid in the interpretation of coefficients.
- The present model shows maternal support moderating the relation between behavioral undercontrol and risk for young adult drug use disorder.
[Figure: moderation model. Support and Undercontrol predicting Offspring Drug Disorder; path values shown: -0.61, 0.40, 0.85]
21. Interpreting Coefficients and Graphing Interactions: An Example (from King & Chassin, 2004)
- An example: Behavioral undercontrol's effect on drug use disorder is moderated by parental support.
Table 5. Logistic Regression Predicting Drug Diagnosis From Behavioral Undercontrol and Parental Support
Note. B = the unstandardized logistic regression coefficient. *p < .05, **p < .01, ***p < .001.
22. The OLS Extension
- Probe the interaction at 1 SD above and below the mean of the moderator (e.g., support) to obtain simple slopes and intercepts.
- Plot these simple regression lines across a range of values of the moderated variable (remember, you've standardized your predictors for easy interpretation).
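With $\text{logit}(P) = b_0 + b_X X + b_Z Z + b_{XZ} XZ$, fixing the moderator at a value $z$ leaves a straight line in $X$. Its intercept is $b_0 + b_Z z$ and its slope is $b_X + b_{XZ} z$. The sketch below computes these at $z = -1, 0, +1$. It assumes that the moderator is standardized, so these values are 1 SD below the mean, the mean, and 1 SD above the mean. The coefficients in the usage line are hypothetical.

```python
def simple_slopes(b0, b_x, b_z, b_xz, z_values=(-1.0, 0.0, 1.0)):
    """Simple intercepts and slopes (logit metric) at fixed moderator values.

    Returns {z: (intercept, slope)} for logit(P) = b0 + b_x*X + b_z*Z + b_xz*X*Z.
    """
    return {z: (b0 + b_z * z, b_x + b_xz * z) for z in z_values}

if __name__ == "__main__":
    # Hypothetical coefficients for standardized X (predictor) and Z (moderator).
    for z, (intercept, slope) in simple_slopes(-2.0, 0.8, -0.6, 0.4).items():
        print(f"Z = {z:+.0f} SD: logit = {intercept:.2f} + {slope:.2f} * X")
```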
23. The Logistic Twist
- The previous graph is in terms of the logit. It's good for helping us understand the nature of the interaction (in this case, protective but reactive).
- However, it fails to give us a sense of what's really happening in terms of how the predicted probabilities differ across levels of the moderator.
- Thus, we need to transform the predicted logits into odds or probabilities to create interpretable graphs:
  - $P = e^{\text{logit}}/(1 + e^{\text{logit}})$, $\text{Odds} = e^{\text{logit}}$
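The transformation can be applied point by point along a grid of predictor values, once for each moderator level. The result gives the columns you would paste into Excel for the odds and probability graphs. This is a minimal sketch. The function name and the coefficients are hypothetical. The grid runs from -1.5 to 1.5 SD in steps of 0.5.

```python
import math

def implied_curves(b0, b_x, b_z, b_xz, x_grid, z_values=(-1.0, 0.0, 1.0)):
    """Model-implied (x, logit, odds, probability) rows at each moderator value."""
    curves = {}
    for z in z_values:
        rows = []
        for x in x_grid:
            logit = b0 + b_x * x + b_z * z + b_xz * x * z
            odds = math.exp(logit)        # Odds = e^logit
            prob = odds / (1.0 + odds)    # P = e^logit / (1 + e^logit)
            rows.append((x, logit, odds, prob))
        curves[z] = rows
    return curves

if __name__ == "__main__":
    x_grid = [i / 2 for i in range(-3, 4)]  # -1.5, -1.0, ..., 1.5 SD
    for z, rows in implied_curves(-2.0, 0.8, -0.6, 0.4, x_grid).items():
        print(z, [round(p, 3) for _, _, _, p in rows])
```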
24. Odds and Probability Metric
25. Interpreting with REAL values
- While we may see the shape of the probability or odds function in the above graphs, note that they extend out to 6.5 SD above the mean for undercontrol!
- It's important to display your interactions where there are real data.
- To do this, you can run your moderational model in SPSS and save out the predicted probabilities for each participant as a variable. See code below for an example.
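* Blocks are entered in order. /SAVE=PRED adds the predicted probability for each case as a new variable.
LOGISTIC REGRESSION VARIABLES=c4drugdx
  /METHOD=ENTER rgrp paranti rgen zc3age
  /METHOD=ENTER zunder zc3ss
  /METHOD=ENTER unbyks
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5)
  /SAVE=PRED
  /CLASSPLOT.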
26. Putting it all together
- Take the predicted probabilities from SPSS and move them next to the participants' scores on the moderated variable (e.g., undercontrol).
- Select both columns, copy, and paste into Excel.
- Select the predicted probabilities and the model-implied probabilities and graph them.
  - I use a scatter plot in Excel, made with the Chart Wizard.
  - For the X values of the predicted probabilities, select the actual values of the moderated variable.
  - For the model-implied probabilities, select the column of values used to make the simple slope graph (e.g., -1.5 SD to 1.5 SD).
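If you prepare the two scatter-plot series outside Excel, the pairing step looks like the minimal sketch below. The function name and the inputs are hypothetical. One design choice goes beyond the steps above: grid points outside the observed range of the moderated variable are dropped. This keeps the model-implied curve where there are real data, following the advice on the previous slide.

```python
def plotting_series(observed_x, predicted_p, grid, curve):
    """Return (observed, implied) lists of (x, probability) pairs for a scatter plot.

    observed_x / predicted_p: each participant's score on the moderated variable
    and their saved predicted probability, in the same order.
    grid / curve: model-implied X values and probabilities for one moderator level.
    """
    if len(observed_x) != len(predicted_p) or len(grid) != len(curve):
        raise ValueError("paired inputs must have the same length")
    lo, hi = min(observed_x), max(observed_x)
    observed = sorted(zip(observed_x, predicted_p))
    # Keep only model-implied points inside the observed range of X.
    implied = [(x, p) for x, p in zip(grid, curve) if lo <= x <= hi]
    return observed, implied

if __name__ == "__main__":
    obs, imp = plotting_series([0.9, -1.2, 0.3], [0.30, 0.10, 0.20],
                               [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5],
                               [0.05, 0.08, 0.12, 0.16, 0.22, 0.29, 0.37])
    print(obs)
    print(imp)
```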