Title: Workshop on Statistical Mediation and Moderation: Statistical Mediation
1Workshop on Statistical Mediation and Moderation: Statistical Mediation
- Paul Jose
- Victoria University of Wellington
- 27 March, 2008
- SASP Conference
2What do you want to know?
- Let's briefly have each person state what he or she would like to learn this morning.
- Also, what is your level of statistical knowledge/experience?
- Okay, let me tell you what I'm planning to cover.
3What am I doing today?
- I want to define mediation and moderation
- How are they similar or different?
- Basic mediation and moderation
- Advanced mediation and moderation
- Questions and answers
4Where does one start?
- I became interested in mediation and moderation because I found that I was increasingly using these approaches to understand processes among variables.
- I found that there was little about these techniques in traditional statistics textbooks; I mostly obtained information through word-of-mouth.
- . . . and I was confused. I don't like being confused, so I did something about it: I educated myself on these techniques. And now I can pass on what I've learned. Let me list what I consider to be the main sources of confusion.
5Five major sources of confusion
- First, moderation and mediation sound alike, which makes it seem that they are very similar and/or derive from the same origin. They are somewhat similar (cousins), but they don't come from the same place.
- Second, statistics textbooks typically do not do a very good job of explaining these two approaches. Exception: Howell (2006).
- Third, reports of moderation and mediation in the research literature are not always clear or accurately performed.
6More confusion
- Both are special cases of two separate broad statistical approaches: mediation is a special case of semi-partial correlations (path modelling) and moderation is a special case of statistical interactions (from ANOVA). Both are included under the general linear model (GLM), but this is not usually appreciated.
- It's not entirely clear what distinguishes a moderating variable from a mediating variable. Can one define mediating and moderating variables a priori?
7One last stumbling block
- Problem: there are no easy-to-use statistics programmes that compute mediation and moderation. One can do the analyses in SPSS and other programmes that do regression, but there is no graphing capability dedicated to either mediation or moderation (except ModGraph and MedGraph).
- What we have here is a case of the users getting ahead of the statisticians, in the sense that researchers frequently use mediation and moderation but many statisticians aren't even familiar with the terms.
8Background and history
- Most people's awareness of this area comes from this article:
- Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173-1182.
- Cited about 6,500 times by PsycINFO's count. And that's just in psychology.
- Most people are unclear about what they said and how to perform the techniques.
9Let's get started: Similarities and differences
- Similarities
  - They both involve three variables
  - You can use regression to compute both
  - You wish to see how a third variable affects a basic relationship (IV to DV).
- Differences
  - You create a product term in moderation, not in mediation
  - You don't have to centre anything in mediation
  - Moderation can be used on concurrent or longitudinal data, but mediation is best used on longitudinal data.
  - Graphing is critical for moderation, helpful for mediation.
10How do you know if you have a moderator or a mediator?
- What's the diff?
- Moderators tend to be variables that are relatively immune to change over time (personality trait, gender, ethnic group, etc.).
- Mediators tend to be variables that change in relation to other variables (anxiety, helpfulness, honesty, mood).
- However, there is a class of variables (e.g., coping efforts/strategies) that might be examined in both ways. These two categories are not mutually exclusive.
11So let's focus on mediation first
- Definition: A mediating variable is one which specifies how (or the mechanism by which) a given effect occurs between an independent variable (IV) and a dependent variable (DV) (Holmbeck, 1997, p. 599).
- The question you wish to answer is whether the effect of the IV on the DV is at least partially mediated by a third variable (MV).
- You can answer this question with two regressions (and a correlation matrix).
- Let's consider a specific example.
12An example from my research
[Diagram: Stressor intensity -> Depression, with Rumination as the proposed mediator between them]
13The theories
- Susan Nolen-Hoeksema believes that an individual who ruminates more ends up more depressed: X -> Y. Notice that it's a causal statement.
- I don't disagree with her, but I think that this simple effect should be embedded within the stress and coping context.
- We know that stress leads to depression. The question I want to ask is whether at least part of the effect of stress on depression occurs because certain individuals ruminate about stressful events, and this rumination leads to depression.
14The basic relationship
Stressor intensity -> Depression
One must have a significant correlation between the IV and DV (in fact among all 3 variables). The essential question is whether, by adding a third variable, one can at least partially explain the basic relationship. Let's look at some real data.
15The two steps
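Step 1
Stressor intensity -> Depression: .45
Step 2
Stressor intensity -> Rumination: .51
Rumination -> Depression: .46 (.32 with Stressor intensity in the equation)
Stressor intensity -> Depression: .45 (.29 with Rumination in the equation)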
16Baron & Kenny's 4 criteria
- IV to MV must be significant
- IV to DV must be significant
- MV to DV must be significant (when entered with the IV)
- The effect of the IV on the DV must be less in the third equation than the second. Perfect mediation holds if the IV has no effect when the mediator is controlled.
- "Must be less" is tested with the Sobel formula (see following pages)
- Perfect mediation occurs when the original relationship goes to zero. This never happens in psychology. I have a proposal for how to deal with this issue, presented below.
17What changed?
- Note that the beta weight from IV to DV changed from .45 to .29.
- What does that tell us?
- According to Baron and Kenny (1986), if one obtains a significant drop in beta for this relationship, then one has obtained significant mediation.
- How can one test whether this is significant or not? (It is not simply whether it goes from significant to non-significant.) One needs to compute Sobel's test:
- z-value = a*b / SQRT(b^2*sa^2 + a^2*sb^2), where a and sa are the unstandardised coefficient and its standard error for IV to MV, and b and sb are those for MV to DV with the IV in the equation.
18Who ya gonna call?
- Many people have been using a web-site by Preacher and Leonardelli, and it's quite useful for computing the Sobel statistic: http://www.psych.ku.edu/preacher/sobel/sobel.htm
- Let me show you how to use the site. It is generally very helpful.
- I have invented my own programme to do what P & L's site does, and MORE. Let's check it out too.
19Preparatory work
- Before we run off to use these, please know that you have to obtain some statistical information first:
- Compute a correlation matrix of the 3 variables
- Perform a regression of the IV on the mediating variable (the IV predicts the mediator), and
- Perform a multiple regression of the IV and mediator on the DV (simultaneous inclusion; the IV and mediator together predict the DV).
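The two regressions listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic (not the workshop dataset), and the function name `mediation_regressions` and its helpers are hypothetical, not part of SPSS, MedGraph, or the Preacher and Leonardelli site. It returns the unstandardised coefficients and standard errors that the Sobel test needs.

```python
import math
import random


def center(values):
    """Return the values as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(u, v):
    """Sum of cross-products of two centred lists."""
    return sum(p * q for p, q in zip(u, v))


def mediation_regressions(iv, mv, dv):
    """Run the two regressions and return (a, sa, b, sb, c_prime)."""
    n = len(iv)
    x, m, y = center(iv), center(mv), center(dv)
    sxx, smm, syy = cross(x, x), cross(m, m), cross(y, y)
    sxm, sxy, smy = cross(x, m), cross(x, y), cross(m, y)

    # 1st regression: the IV predicts the mediator.
    a = sxm / sxx
    resid_ss = max(0.0, smm - a * sxm)
    sa = math.sqrt(resid_ss / (n - 2) / sxx)

    # 2nd regression: the IV and mediator entered together predict the DV.
    det = sxx * smm - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det        # mediator, IV controlled
    c_prime = (smm * sxy - sxm * smy) / det  # IV, mediator controlled
    resid_ss = max(0.0, syy - c_prime * sxy - b * smy)
    sb = math.sqrt(resid_ss / (n - 3) * sxx / det)
    return a, sa, b, sb, c_prime


# Synthetic data (an assumption for illustration, NOT the workshop dataset).
rng = random.Random(1)
stress = [rng.gauss(0, 1) for _ in range(300)]
rumination = [0.5 * s + rng.gauss(0, 1) for s in stress]
depression = [0.3 * s + 0.4 * r + rng.gauss(0, 1)
              for s, r in zip(stress, rumination)]

a, sa, b, sb, c_prime = mediation_regressions(stress, rumination, depression)
print(f"a = {a:.3f} (se {sa:.3f}); b = {b:.3f} (se {sb:.3f}); c' = {c_prime:.3f}")
```

In practice one would read a, sa, b, and sb off the regression print-outs of a statistics package; the closed-form sums here just make the two steps visible.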
20Correlation matrix
21Results from the two regressions
1st regression (Stress on Rumination):
B = 7.501 (unstandardised regression coefficient)
se = .938 (standard error)
2nd regression (Stress, Rumination on Depression): you select the B and se for the mediating variable here.
B = .069
se = .016
new beta for Stress = .288
new beta for Rumination = .317
(P & L web-site needs the first four values.)
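As a check on the arithmetic, the four values above can be put straight into the Sobel formula, z = a*b / SQRT(b^2*sa^2 + a^2*sb^2). The sketch below is a hand calculation in Python, not the code behind either web-site; the function names are hypothetical, and the two-tailed p-value assumes the standard normal distribution.

```python
import math


def sobel_z(a, sa, b, sb):
    """Sobel z for the indirect effect a*b."""
    return (a * b) / math.sqrt(b ** 2 * sa ** 2 + a ** 2 * sb ** 2)


def two_tailed_p(z):
    """Two-tailed p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# a and sa come from the 1st regression, b and sb from the 2nd.
z = sobel_z(a=7.501, sa=0.938, b=0.069, sb=0.016)
print(f"Sobel z = {z:.2f}, p = {two_tailed_p(z):.5f}")
```

The z-value comes out at about 3.80, which is the figure quoted for the multiple regression result later in the workshop.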
22Okay, go to the programmes
- It is necessary to have written down the pertinent statistical output, or to have printed off the relevant sections.
- You can run both programmes on the internet.
- If you're away from the internet, you can download the Excel macro of MedGraph and run it whenever you want.
23MedGraph output
24Comparison of web-sites
- Preacher's site has been around longer, it allows variations on the Sobel formula, and it gives you an alternate way to compute Sobel's t.
- My site gives a graphical presentation of results, I think it's harder to make mistakes with my programme, and it has/will have information about the type of mediation.
25My criteria for type of mediation
- At present my programme stipulates:
- None: non-significant Sobel's z-value
- Partial: significant Sobel's z and significant basic relationship in the 2nd regression (IV to DV)
- Full: significant Sobel's z and non-significant basic relationship in the 2nd regression (IV to DV)
- Dave Kenny argues against this (see his web-site), and I tend to agree with him now. My new approach is on the following page.
26What kind of mediation?
- None: non-significant Sobel's z-value
- Partial: significant Sobel's z and ratio < .80 (ratio is indirect/total; in this case it's .161/.449)
- Full: significant Sobel's z and ratio > .80
----------
- In the present case we have a significant Sobel's z and ratio = .36. Thus, we have partial mediation. Notice that I don't use the term "perfect mediation". There is no consensus on the partial/full mediation issue.
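The decision rule on this slide is simple enough to write down as a function. This is a sketch of the rule as stated, not MedGraph's actual code: the function name is hypothetical, the 1.96 critical value (two-tailed .05) is an assumption, and a ratio of exactly .80, which the slide leaves undefined, is treated here as partial.

```python
def mediation_type(sobel_z, indirect, total, z_crit=1.96, cutoff=0.80):
    """Label mediation as 'none', 'partial', or 'full'."""
    # Non-significant Sobel z: no mediation, whatever the ratio.
    if abs(sobel_z) < z_crit:
        return "none"
    # Ratio of the indirect effect to the total effect.
    ratio = indirect / total
    return "full" if ratio > cutoff else "partial"


# Values from the present example: indirect = .161, total = .449, z = 3.80.
ratio = 0.161 / 0.449
print(f"ratio = {ratio:.2f}: {mediation_type(3.80, 0.161, 0.449)} mediation")
```

The same function applies to any pair of indirect and total effects, for example those from a latent variable model.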
27Causal finding?
- Many researchers would be keen to argue from this result that the experience of stress leads to rumination, which in turn partially leads to depressive symptoms, i.e., a causal argument. Is this merited?
- Cole and Maxwell (2003) argue strenuously that concurrent mediation CANNOT support a causal statement. They argue that few concurrent mediation results actually turn out to hold up in longitudinal data. What do they mean?
28Shared and unique variance
Stress
Depressive symptoms
Basic relationship is just a correlation between
two variables.
29Three variables mediation
[Venn diagram: Stress, Depressive symptoms, and Rumination, with the direct effect and indirect effect areas marked]
The green area indicates the degree of shared variance among the three variables; that's the size of the indirect effect. It is hard to argue that these relationships are causal with these data: they are the sizes of shared and unique variance.
30Warnings!
- One must have all three correlations be significant before launching this. Kenny now suggests that the 1st one may be optional.
- Be sure that you do the regressions correctly, and that you are taking the proper statistical information from the print-outs (B vs. beta).
- Some people make causal arguments from these results. They are shaky at best.
- Types of specification error: 1) ordering of variables, 2) variables with/without error, and 3) third variable problem
- Longitudinal data are best.
- Bootstrapping is best with small N samples.
- Path models involving more than three variables are the general case; don't do a bunch of three-variable mediation analyses when you can do one path model.
31Specification error
- Major boogeyman in path model analytic work: have you correctly specified your model?
- Several issues here:
  - Temporal order of variables
  - Variables measured with error
  - Missing variable?
32Why is your proposed model the best?
Rumination
Stress intensity
Depressive symptoms
There are exactly 6 orderings of any three variables; why is your proposed model the best? Why not test all of them? I have, and in the present case I find six instances of partial mediation. Which is correct? They all tell us something useful about shared and unique variance.
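The count of six comes from assigning the three roles (IV, mediator, DV) to the three variables in every possible order. A short sketch that lists the candidate models; it only enumerates them and does not run any of the regressions.

```python
from itertools import permutations

variables = ["Stress intensity", "Rumination", "Depressive symptoms"]

# Each ordering assigns the roles (IV, mediator, DV) to the three variables.
models = list(permutations(variables))
for iv, mv, dv in models:
    print(f"{iv} -> {mv} -> {dv}")
print(len(models), "candidate models")
```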
33Variables measured with error
- One can obtain biased estimates of the indirect effect if the MV is measured with significant error. (Same is true of the IV and DV too, by the way.)
- Answer? Do mediation in a latent variable path model in SEM. Possible but not easy. Also, a lot of the time one doesn't have a sufficient N or multiple indicators of the variables (3 indicators per variable). It would look like this:
34Latent variable path model
Stress intensity -> Depression: .30 (.20 with Rumination in the model)
Stress intensity -> Rumination: .40
Rumination -> Depression: .24
Indirect effect = .10, direct effect = .20, ratio = .33 (.36 in MR)
35Missing variable?
- This is the old "third variable problem", but in this case we might wish to call it the "fourth variable problem".
- My student, Kirsty Weir, suggests that anxiety/worry might explain the relationship between rumination and depression. Graph is on the following page.
- One can never completely resolve this question: include the likely candidates and try to reject them.
36The road from stress to depression
Note that the Rum to Dep path was removed because it was non-significant when we added the 4th variable (control). Is the 3-variable mediation pattern wrong then?
37Bootstrapping
- David MacKinnon and others have argued that typical multiple regression analysis is unbiased only for large samples (present case: N = 575).
- They suggest:
  - Large sample: use MR
  - Small sample: use bootstrapping
- What is bootstrapping?
38Wave of the future
- Bootstrapping is a compilation of regression results from many resamples of the original dataset.
- The programme draws a random resample of the data (in the standard bootstrap, N cases drawn with replacement from the original N participants), runs the regression analysis, stores the result, does it again and again up to a predetermined number of times, and then compiles the results of the repeated analyses.
- Baron & Kenny didn't mention this; it wasn't used very much at all in 1986. It is performed now, but infrequently. It is the wave of the future.
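The resample, analyse, store, repeat loop can be sketched in a few lines. This is a minimal illustration and not the Preacher and Hayes macro: it assumes the standard bootstrap (N cases drawn with replacement), a percentile confidence interval, and synthetic data; the function names and the default of 500 resamples are hypothetical choices.

```python
import random


def indirect_effect(rows):
    """a*b from the two mediation regressions on (iv, mv, dv) rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    mm = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows)
    smm = sum((r[1] - mm) ** 2 for r in rows)
    sxm = sum((r[0] - mx) * (r[1] - mm) for r in rows)
    sxy = sum((r[0] - mx) * (r[2] - my) for r in rows)
    smy = sum((r[1] - mm) * (r[2] - my) for r in rows)
    a = sxm / sxx  # IV -> mediator
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)  # mediator -> DV, IV controlled
    return a * b


def bootstrap_ci(rows, n_resamples=500, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(rows)
    # Resample n cases with replacement, re-estimate, and keep every result.
    estimates = sorted(
        indirect_effect([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Synthetic data (an assumption for illustration, NOT the workshop dataset).
rng = random.Random(42)
data = []
for _ in range(400):
    x = rng.gauss(0, 1)
    m = 0.6 * x + rng.gauss(0, 1)
    y = 0.2 * x + 0.5 * m + rng.gauss(0, 1)
    data.append((x, m, y))

estimate = indirect_effect(data)
low, high = bootstrap_ci(data)
print(f"indirect effect = {estimate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the indirect effect is taken to be significant, with no appeal to the normal distribution that the Sobel test relies on.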
39So how does one do this?
- If you toddle off to SPSS to do this, you will be disappointed. Although it can perform bootstrapping, it is not set up to do mediation bootstrapping.
- Preacher and Hayes (see the Preacher web-site on mediation) offer two different macros: SAS and SPSS. Download the SPSS one and use it within SPSS (not easy).
- Let's look at the results of the SPSS macro.
40Macro output
Run MATRIX procedure:

DIRECT AND TOTAL EFFECTS
           Coeff     s.e.        t   Sig(two)
b(YX)      .3934    .0288  13.6685     .0000
b(MX)     1.0412    .0691  15.0779     .0000
b(YM.X)    .1369    .0165   8.3200     .0000
b(YX.M)    .2508    .0322   7.8002     .0000

INDIRECT EFFECT AND SIGNIFICANCE USING NORMAL DISTRIBUTION
           Value     s.e.  LL 95 CI  UL 95 CI        Z   Sig(two)
Sobel      .1426    .0196     .1042     .1810   7.2723     .0000

BOOTSTRAP RESULTS FOR INDIRECT EFFECT
            Mean     s.e.  LL 95 CI  UL 95 CI  LL 99 CI  UL 99 CI
Effect     .1434    .0239     .1001     .1939     .0879     .2113

SAMPLE SIZE: 575
NUMBER OF BOOTSTRAP RESAMPLES: 2000

It's telling us that the indirect effect was significant; this agrees with the multiple regression result, but this is an unbiased estimate. (z = 3.80 before)
41Mediation with longitudinal data
- . . . is very complicated but is very illuminating.
- Much of structural equation modelling (SEM) is devoted to trying to understand mediational models.
- Path modelling with longitudinal data is hard to do but will generate very interesting and interpretable results.
- One should obtain the same variables at different times of measurement to allow residualisation.
42Hierarchical multiple regression
Time 1: Rum, Dep. Time 2: Dep.
1st step: Dep (Time 1) -> Dep (Time 2)
2nd step: Rum (Time 1) -> Dep (Time 2)
This is N-H's hypothesis: Rum1 should explain unique variance in Dep2 after Dep1 is entered, i.e., explaining new variance in the residual.
43Back to Venn diagrams, but with a difference
Dep2
Dep1
Stability coefficient typically medium to large.
The purple area is the residual variance. It
represents the change in depression over this
time period. The overlapping area refers to the
stability of depression over this time period.
44Does Rum1 predict any of the residual?
Dep1
Dep2
Rum1
The red area is the amount of variance in Dep2
explained by Rum1, i.e., the degree to which Rum1
explains change in depression over time.
45So what's the answer?
- Perform a hierarchical regression (DV: Dep2):
  - 1st step, IV Dep1: .72
  - 2nd step, IV Rum1: .05 (ns)
- I found that N-H's hypothesis was not supported: Rum1 did not explain any of the residual of Dep2 after Dep1 was entered.
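The logic of the two steps can be sketched as an R-squared change: how much variance in Dep2 does Rum1 explain beyond what Dep1 already explains? The code below is an illustration on synthetic data built so that Rum1 has no effect of its own (an assumption, not the workshop data), and the function names are hypothetical. It reports the size of the change only; a significance test of the change is left out.

```python
import random


def sscp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))


def r2_change(control, predictor, outcome):
    """R-squared at step 1 (control only) and the gain from adding predictor."""
    scc, spp, syy = sscp(control, control), sscp(predictor, predictor), sscp(outcome, outcome)
    scp, scy, spy = sscp(control, predictor), sscp(control, outcome), sscp(predictor, outcome)
    # Step 1: outcome regressed on the control (Time 1 score) alone.
    r2_step1 = scy ** 2 / (scc * syy)
    # Step 2: control and predictor entered together.
    det = scc * spp - scp ** 2
    b_control = (spp * scy - scp * spy) / det
    b_predictor = (scc * spy - scp * scy) / det
    r2_step2 = (b_control * scy + b_predictor * spy) / syy
    return r2_step1, r2_step2 - r2_step1


# Synthetic data in which Rum1 is correlated with Dep1 but adds nothing to Dep2.
rng = random.Random(7)
dep1 = [rng.gauss(0, 1) for _ in range(500)]
rum1 = [0.5 * d + rng.gauss(0, 1) for d in dep1]
dep2 = [0.7 * d + rng.gauss(0, 0.7) for d in dep1]

r2_step1, delta_r2 = r2_change(dep1, rum1, dep2)
print(f"Step 1 R2 = {r2_step1:.3f}; R2 change for Rum1 = {delta_r2:.4f}")
```

A near-zero change in R-squared corresponds to the pattern reported here: Rum1 is correlated with Dep1 but explains little of the residual of Dep2.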
46This is what it looks like
Dep1
Dep2
Rum1
Although Dep1 and Rum1 are significantly correlated, Rum1 doesn't explain much new variance in Dep2 above and beyond what Dep1 can do.
47The other direction
- Hierarchical regression in the other direction (DV: Rum2):
  - 1st step, IV Rum1: .64
  - 2nd step, IV Dep1: .08
- This result suggests that depression may contribute to rumination over a 3-month period of time, but not the other way around.
- It is recommended that you perform a path analysis in SEM for this type of analysis, because it allows for concurrent correlation (see next page).
48Two time points
Time 1
Time 2
Rum
Rum
Dep
Dep
SEM computes all of these relationships
simultaneously, allowing one to identify the
unique relationships. Enact in LISREL, EQS, AMOS,
etc. What did I find?
49Same basic results
Time 1
Time 2
.63
Rum
Rum
.08
.47
.43
Dep
Dep
.74
But you get model fit indices, modification
indices, and so forth . . . I deleted the Rum1 to
Dep2 path because it was non-significant.
50Three time points and three variables
Time 1 Time 2
Time 3
Stress
Stress
Stress
my hypoth
?
Rum.
Rum.
Rum.
N-H
MR
Dep.
Dep.
Dep.
51SEM yielded this result
.74
Stress
Stress
Stress
.59
.11
Rum.
Rum.
Rum.
.61
.59
.08
Dep.
Dep.
Dep.
.72
.51
52Powerful but hard to do
- Need to have three times of measurement reasonably spread out so that stability coefficients are not too high.
- Need to have good measures (small measurement error) or do latent variable longitudinal path modelling.
- This type of test of mediation is very stringent because it occurs over time and must be strong enough to exist against the backdrop of the stability coefficients, i.e., these residualised effects explain change in other variables.
53Back to types of mediation
- Why do I think in terms of null, partial, and full mediation?
- Because SEM-based path models yield those three possible patterns.
- Sociological point: basic mediation (e.g., B & K) is rooted in multiple regression, where issues of model specification are not salient. On the other hand, if you learn SEM, then you will think in terms like those I've enunciated above. Confusions occur because of the anachronisms in the field of mediation (harkening back to MR rather than embracing path modelling).
54Let's bring mediation to a close
- I've covered many powerful techniques that derive from the basic mediation paradigm.
- Remaining issues:
  - Logistic mediation
  - Mediation in other contexts: HLM
- Still much to learn and master, but this is a good start.