Title: Causal Diagrams: Directed Acyclic Graphs to Understand, Identify, and Control for Confounding
1 Causal Diagrams: Directed Acyclic Graphs to Understand, Identify, and Control for Confounding
- Maya Petersen
- PH 250B, 11/03/04
2 What is causation?
- Ex: We observe a high degree of association between carrying matches and lung cancer
- Can we infer that carrying matches causes lung cancer?
- The counterfactual definition of causation
  - Carrying matches is a cause of lung cancer if the risk of lung cancer is higher in people who carry matches than it would be if these exact same people did not carry matches
3 Causal diagrams
- An intuitive approach to representing our assumptions about causal relationships
- Provide a relatively straightforward tool for relating observed statistical associations to causal effects
- What do we need to know (or assume) before we can infer that an exposure causes a disease, and get an unbiased estimate of this effect?
4 Causal diagrams
- Today will focus on
  - How to draw a causal diagram
  - Use of causal diagrams to decide
    - Is confounding present?
    - What should we adjust for to get an unbiased estimate of effect?
  - Causal diagrams to illustrate a situation where the traditional approach to controlling confounding (i.e., multivariable adjustment) fails
5 Ex: Constructing a Causal Diagram
- We are interested in the effect of maternal multivitamin use on birth defects, and make the following causal assumptions
  - Prenatal care (PNC) leads to an increase in vitamin use (as a result of intervention and education)
  - Prenatal care protects against birth defects in ways other than by increasing vitamin use
  - Difficulty conceiving may cause a woman to seek out PNC once she becomes pregnant
  - Maternal genetics that lead to difficulty conceiving can also lead to birth defects
  - Socio-economic characteristics directly affect both access to PNC and use of vitamins
6 Ex: Constructing a Causal Diagram
[Figure: DAG with nodes Difficulty conceiving, SES, Maternal genetics, Pre-Natal Care, Vitamins, Birth Defects]
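One way to make the diagram concrete is to write it down as data. The sketch below encodes the five causal assumptions listed on the previous slide (numbered in the order they appear), plus the vitamins-to-birth-defects effect under study, as a mapping from each cause to its direct effects. The node names and the helper `edge_list` are shorthand chosen for this illustration; they are not part of the lecture.

```python
# Each key is a cause; its value is the set of direct effects (children).
# Node names are illustrative shorthand for the variables on the slide.
VITAMIN_DAG = {
    "SES": {"PNC", "Vitamins"},                            # assumption 5
    "Genetics": {"DifficultyConceiving", "BirthDefects"},  # assumption 4
    "DifficultyConceiving": {"PNC"},                       # assumption 3
    "PNC": {"Vitamins", "BirthDefects"},                   # assumptions 1 and 2
    "Vitamins": {"BirthDefects"},                          # effect of interest
    "BirthDefects": set(),
}


def edge_list(dag):
    """Return every arrow in the DAG as a sorted list of (cause, effect) pairs."""
    return sorted(
        (cause, effect) for cause, effects in dag.items() for effect in effects
    )


for cause, effect in edge_list(VITAMIN_DAG):
    print(f"{cause} -> {effect}")
```

Storing the graph as "cause -> set of effects" keeps each stated assumption visible as one entry, and any arrow that is absent from the mapping is itself an assumption of no direct effect.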
7 Directed Acyclic Graph (DAG) construction: Basics
- Direct causal relationships between variables are represented by arrows
- All causal relationships have a direction, because a variable cannot be simultaneously a cause and an effect of the same variable (Directed)
- There are no feedback loops (Acyclic)
  - There can be no feedback loops because causes always precede their effects
  - To avoid feedback loops, extend the graph over time
[Figure: the feedback loop between Malnutrition and Infection, redrawn over time with nodes Malnut. (t0), Malnut. (t1), Infect. (t0), Infect. (t1)]
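The acyclicity requirement can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm: a graph is acyclic exactly when every node can be removed in an order where each node is taken only after all of its causes. The two example graphs are assumptions made for illustration; in particular, the arrows of the time-extended graph are a plausible reading of the figure, not a transcription of it.

```python
def is_acyclic(dag):
    """Kahn's algorithm: True if the cause -> effects mapping has no directed cycle."""
    indegree = {node: 0 for node in dag}
    for children in dag.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1

    ready = [node for node, degree in indegree.items() if degree == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in dag.get(node, ()):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Any node that was never removed sits on (or downstream of) a cycle.
    return removed == len(indegree)


# Malnutrition and infection drawn as causing each other: a feedback loop.
FEEDBACK = {"Malnutrition": {"Infection"}, "Infection": {"Malnutrition"}}

# The same story extended over time (assumed arrows): each t0 state
# affects both t1 states, and nothing points backwards in time.
UNROLLED = {
    "Malnut_t0": {"Malnut_t1", "Infect_t1"},
    "Infect_t0": {"Infect_t1", "Malnut_t1"},
    "Malnut_t1": set(),
    "Infect_t1": set(),
}

print(is_acyclic(FEEDBACK))
print(is_acyclic(UNROLLED))
```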
8 Directed Acyclic Graph (DAG) construction: Terminology
- Parent / Child
  - Directly connected by an arrow (no intermediates)
  - Pre-natal care is a parent of birth defects
  - Birth defects is a child of pre-natal care
- Ancestor / Descendant
  - Connected by a directed path made up of a series of arrows
  - SES is an ancestor of birth defects
  - Birth defects is a descendant of SES
[Figure: DAG with nodes Difficulty conceiving, SES, Maternal genetics, Pre-Natal Care, Vitamins, Birth Defects]
9 Directed Acyclic Graph (DAG) construction: Assumptions
- Not all intermediate steps between two variables need to be represented (depends on the level of detail of the model)
  - Ex: we can represent the effect of smoking on lung cancer as
    - Smoking -> Cancer, or
    - Smoking -> tar -> mutations -> Cancer
- Absence of a directed path from X to Y implies that X has no effect on Y
10 Directed Acyclic Graph (DAG) construction: Assumptions
- DAGs assume that all common causes of the exposure and disease of interest are included in the causal diagram
- If common causes are unknown, or cannot be observed, they must still be included
- Ex:
[Figure: DAG with nodes Unmeasured characteristics (religious beliefs, culture, lifestyle, etc.), Alcohol Use, Heart Disease, Smoking]
11 Ex: What assumptions does the DAG we constructed make?
- SES has no effect on difficulty conceiving
- Difficulty conceiving has no effect on maternal vitamin use, other than through its effect on seeking prenatal care
- SES has no effect on birth defects other than via its effects on access to prenatal care and on vitamin use
- There are no additional common causes of vitamin use and birth defects
- Etc.
[Figure: DAG with nodes Difficulty conceiving, SES, Maternal genetics, Pre-Natal Care, Vitamins, Birth Defects]
12 Back to our basic problem
- What can we say about causal effects, based on the associations we observe in our data?
- Associations between exposure and disease in our crude data can arise in several ways
13 Crude (unadjusted) associations in our observational data: 1) Exposure causes disease
- A crude association between smoking and cancer could be due to
  - Smoking -> Cancer
  - Smoking -> tar -> mutations -> Cancer
- Adjusting for an intermediate in the causal pathway between exposure and disease removes any association that results from that pathway
  - In the DAG above, if we control for tar levels, we will block the association between smoking and cancer
  - Smoking -> tar -> mutations -> Cancer (pathway blocked at tar)
- By adjusting for the effects of the exposure, we will no longer be able to study them
14 Crude (unadjusted) associations in our observational data: 2) Exposure and disease share a common cause
- A crude association between matches and cancer could be due to
  - Matches have no causal effect on cancer, but the two are associated because they have a common cause (smoking)
  - This is a classic example of confounding
- By adjusting for the common cause, the association is eliminated
  - Matches are no longer associated with cancer after we stratify on smoking
  - This is what we do when we adjust for a confounder
[Figure: DAG with nodes Smoking, Matches, Cancer]
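The matches example can be worked through with exact arithmetic. The sketch below builds a small joint distribution in which smoking affects both match-carrying and cancer, while matches have no effect on cancer at all. All probabilities are made-up numbers chosen only for illustration. The crude risk ratio for matches is well above 1, yet within each smoking stratum it is exactly 1.

```python
from fractions import Fraction
from itertools import product

# Illustrative (invented) probabilities.
P_SMOKE = Fraction(3, 10)
P_MATCHES = {1: Fraction(8, 10), 0: Fraction(1, 10)}    # P(matches | smoking)
P_CANCER = {1: Fraction(15, 100), 0: Fraction(1, 100)}  # P(cancer | smoking) only


def joint():
    """Yield (smoke, matches, cancer, probability) for every combination."""
    for smoke, matches, cancer in product((0, 1), repeat=3):
        p = P_SMOKE if smoke else 1 - P_SMOKE
        p *= P_MATCHES[smoke] if matches else 1 - P_MATCHES[smoke]
        p *= P_CANCER[smoke] if cancer else 1 - P_CANCER[smoke]
        yield smoke, matches, cancer, p


def risk(matches, smoke=None):
    """P(cancer | matches), optionally restricted to one smoking stratum."""
    rows = [
        (c, p) for s, m, c, p in joint()
        if m == matches and (smoke is None or s == smoke)
    ]
    return sum(p for c, p in rows if c) / sum(p for _, p in rows)


crude_rr = risk(1) / risk(0)
stratum_rrs = [risk(1, smoke=s) / risk(0, smoke=s) for s in (0, 1)]

print("crude risk ratio:", float(crude_rr))
print("risk ratios within smoking strata:", [float(rr) for rr in stratum_rrs])
```

Using `Fraction` rather than floating point makes the stratum-specific risk ratios come out as exactly 1, so the disappearance of the association after stratifying is not blurred by rounding.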
15 Yet again: What is confounding?
- If the crude association between exposure and disease is unconfounded, then
  - All of the association we see between exposure and disease is due to the effect of exposure on disease
  - None of the association between exposure and disease is due to common causes that they share (confounding)
- In other words: If exposure has no effect on disease, would we still expect to observe an association in our data?
  - If yes -> confounding is present
16 How can we use a DAG to check for presence of confounding?
- Remove all direct effects of the exposure
  - These are the effects we are interested in. We want to see if, in their absence, an association is still present.
- Check whether disease and exposure share a common cause (ancestor)
  - Can any variable reach both E and D by following only forward-pointing arrows?
- If E and D have a common cause -> confounding is present
  - Any common cause they share will lead to an association between E and D that is not due to the effect of E on D
17 Vitamins and Birth Defects: Is confounding present?
- Remove all direct effects of vitamin use
- Do exposure and disease share a common cause (ancestor)?
[Figure: DAG with nodes Difficulty conceiving, SES, Maternal genetics, Pre-Natal Care, Vitamins, Birth Defects]
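The two-step check above can be sketched in a few lines: drop the arrows leaving the exposure, then intersect the ancestors of exposure and disease. The graph encoding and node names are shorthand assumed for this illustration, built from the causal assumptions stated when the diagram was constructed; `ancestors` and `common_causes` are hypothetical helper names.

```python
# cause -> set of direct effects; node names are illustrative shorthand.
VITAMIN_DAG = {
    "SES": {"PNC", "Vitamins"},
    "Genetics": {"DifficultyConceiving", "BirthDefects"},
    "DifficultyConceiving": {"PNC"},
    "PNC": {"Vitamins", "BirthDefects"},
    "Vitamins": {"BirthDefects"},
    "BirthDefects": set(),
}


def ancestors(dag, node):
    """Return every variable with a directed path into `node`."""
    found = set()
    frontier = {node}
    while frontier:
        parents = {p for p, children in dag.items() if children & frontier}
        frontier = parents - found   # only newly discovered nodes are expanded
        found |= parents
    return found


def common_causes(dag, exposure, disease):
    """Common ancestors of exposure and disease once the exposure's effects are removed."""
    pruned = {
        cause: (set() if cause == exposure else set(effects))
        for cause, effects in dag.items()
    }
    return ancestors(pruned, exposure) & ancestors(pruned, disease)


shared = common_causes(VITAMIN_DAG, "Vitamins", "BirthDefects")
print(sorted(shared))  # a non-empty result means confounding is present
```

Under these assumptions every other variable in the diagram turns out to be a common ancestor of vitamin use and birth defects, so the crude association is confounded.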
18 How can we use a DAG to decide what variables to control for in our analysis?
- We want to choose a set of variables that, when adjusted for, will give us an unconfounded estimate of the effect of exposure on disease
  - In other words, if the exposure had no effect on disease, after adjusting for these variables, exposure and disease will no longer be associated
19 How can two variables become associated?
- Review: A crude (unadjusted) association between exposure (E) and disease (D) can be due to
  - A causal pathway from E to D (or vice versa)
    - E -> D or E -> x -> y -> D
  - A common cause of E and D
    - E <- C -> D
- By adjusting for (or stratifying on) a third variable, it is possible to introduce a new source of non-causal association (confounding) between E and D
  - As we begin to adjust for variables in an attempt to control for confounding, we must take this potential source of association into account
20 Adjusting for a common effect of two variables will induce a new association between them (even if they were unassociated before adjusting)
- Ex:
  - Being on a diet does not cause cancer (or vice versa), and dieting and cancer share no common causes. In our crude data, diet and cancer will not be associated.
    - Whether or not an individual was on a diet does not tell us anything about whether or not he/she has cancer.
  - If we stratify on weight loss, we can create a new association between dieting and cancer
    - Within the stratum of people who lost weight, if we know an individual was on a diet, it tells us that he/she is less likely to have cancer (dieting provides an alternate explanation for weight loss).
[Figure: DAG with Weight-loss diet and Cancer both pointing into their common effect, Weight Loss]
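The diet example can also be checked with exact arithmetic. In the sketch below, dieting and cancer are independent by construction, and weight loss depends on both. All probabilities are invented for illustration. Crude cancer risk is identical in dieters and non-dieters, but among people who lost weight, dieters have a lower cancer risk.

```python
from fractions import Fraction
from itertools import product

# Illustrative (invented) probabilities; diet and cancer are independent.
P_DIET = Fraction(1, 5)
P_CANCER = Fraction(1, 20)
# P(weight loss | diet, cancer): either cause alone makes weight loss likely.
P_LOSS = {
    (0, 0): Fraction(1, 20),
    (1, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 2),
    (1, 1): Fraction(3, 4),
}


def joint():
    """Yield (diet, cancer, loss, probability) for every combination."""
    for diet, cancer, loss in product((0, 1), repeat=3):
        p = P_DIET if diet else 1 - P_DIET
        p *= P_CANCER if cancer else 1 - P_CANCER
        p *= P_LOSS[diet, cancer] if loss else 1 - P_LOSS[diet, cancer]
        yield diet, cancer, loss, p


def cancer_risk(diet, loss=None):
    """P(cancer | diet), optionally restricted to one weight-loss stratum."""
    rows = [
        (c, p) for d, c, w, p in joint()
        if d == diet and (loss is None or w == loss)
    ]
    return sum(p for c, p in rows if c) / sum(p for _, p in rows)


print("crude:", cancer_risk(1), "vs", cancer_risk(0))
print("among weight losers:", cancer_risk(1, loss=1), "vs", cancer_risk(0, loss=1))
```

The association appears only after conditioning on the common effect, which is why the adjustment procedure has to account for associations that adjustment itself creates.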
21 Using a DAG to decide what variable to adjust for in analysis
- Ex 1: Is adjusting for prenatal care sufficient to control for confounding of the effect of vitamin use on birth defects?
22 Using a DAG to decide what to adjust for in analysis
- Step 1: Is prenatal care caused by vitamin use? If yes, we should not adjust for it.
  - Do not adjust for an effect of the exposure of interest
[Figure: DAG with nodes Difficulty conceiving, SES, Pre-Natal Care, Maternal genetics, Vitamins, Birth Defects]
23 Using a DAG to decide what to adjust for in analysis
- Step 2: Delete all non-ancestors of vitamin use, birth defects, and pre-natal care
  - If a variable is not an ancestor of vitamin use or birth defects, it cannot be a common cause, and so cannot be a source of crude association between them
  - If a variable is not an ancestor of prenatal care, new associations with that variable cannot be created by adjusting for prenatal care
[Figure: DAG with nodes Difficulty conceiving, SES, Maternal genetics, Pre-Natal Care, Vitamins, Birth Defects]
24 Using a DAG to decide what to adjust for in analysis
- Step 3: Delete all direct effects of Vitamins
  - These are the effects we are interested in. We want to see if, in their absence, an association is still present. If it is, we still have confounding.
[Figure: DAG with nodes Difficulty conceiving, SES, Pre-Natal Care, Maternal genetics, Vitamins, Birth Defects]
25 Using a DAG to decide what to adjust for in analysis
- Step 4: Connect any two causes sharing a common effect
  - Adjustment for the common effect will result in an association between its causes
[Figure: DAG with nodes Difficulty conceiving, SES, Pre-Natal Care, Maternal genetics, Vitamins, Birth Defects]
26 Using a DAG to decide what to adjust for in analysis
- Step 5: Strip arrowheads from all edges
  - We are moving from a graph that represents causal effects to a graph that represents the associations we expect to observe (as a result of both causal effects and the adjustment process)
[Figure: graph with nodes Difficulty conceiving, SES, Pre-Natal Care, Maternal genetics, Vitamins, Birth Defects]
27 Using a DAG to decide what to adjust for in analysis
- Step 6: Delete prenatal care
  - This is equivalent to adjusting for prenatal care, now that we have added to the graph the new associations that will be created by adjusting
[Figure: graph with nodes Difficulty conceiving, SES, Maternal genetics, Vitamins, Birth Defects]
28 Using a DAG to decide what to adjust for in analysis
- Test: Are Vitamins and Birth Defects still connected?
  - Yes: adjusting for prenatal care is not sufficient for control of confounding
  - After adjusting for prenatal care, vitamin use and birth defects will still be associated in our data, even if vitamin use has no causal effect on birth defects
[Figure: graph with nodes Difficulty conceiving, SES, Maternal genetics, Vitamins, Birth Defects]
29 Using a DAG to decide what to adjust for in analysis
- Adjustment for which variables would result in control of confounding?
  - Our DAG shows that adjusting for any one or more of the three remaining variables, in addition to prenatal care, would be sufficient for control of confounding (e.g., SES and prenatal care)
[Figure: graph with nodes Difficulty conceiving, Maternal genetics, Vitamins, Birth Defects]
30 Vitamins and Birth Defects: Lessons learned
- It may not be immediately intuitive what variables we need to control for in our analysis
- The process of adjustment/stratification can introduce new sources of association in our data that must be accounted for in any attempt to control confounding
- Step-by-step analysis of a DAG provides a rigorous check of whether we have adequately controlled for confounding
- Adjustment for several different sets of confounders may each be sufficient to control confounding of the same exposure-disease relationship
  - Can inform study design
31 DAGs for control of confounding: Summary of Steps
- Problem: Is adjustment for/stratification on a set of confounders C sufficient to control for confounding of the relationship between E and D?
  - No variables in C should be descendants of E
  - Delete all non-ancestors of E, D, and C
  - Delete all arrows emanating from E
  - Connect any two parents with a common child
  - Strip arrowheads from all edges
  - Delete C
- Test: If E is disconnected from D in the remaining graph, then adjustment for C is sufficient to remove confounding
Pearl, J. Causality. Cambridge University Press, Cambridge, UK. 2001. pp. 355-57.
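The six steps and the final test translate almost line for line into code. The following is a minimal sketch, not a reference implementation: the function names, the node names, and the choice to return `False` when C contains a descendant of E are all assumptions made for illustration. The vitamin DAG is encoded from the causal assumptions stated earlier in the lecture.

```python
# cause -> set of direct effects; node names are illustrative shorthand.
VITAMIN_DAG = {
    "SES": {"PNC", "Vitamins"},
    "Genetics": {"DifficultyConceiving", "BirthDefects"},
    "DifficultyConceiving": {"PNC"},
    "PNC": {"Vitamins", "BirthDefects"},
    "Vitamins": {"BirthDefects"},
    "BirthDefects": set(),
}


def ancestors(dag, nodes):
    """Return `nodes` plus every variable with a directed path into one of them."""
    found = set(nodes)
    changed = True
    while changed:
        changed = False
        for parent, children in dag.items():
            if parent not in found and children & found:
                found.add(parent)
                changed = True
    return found


def descendants(dag, node):
    """Return every variable reachable from `node` along directed arrows."""
    found, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def sufficient(dag, exposure, disease, adjust):
    """Apply the six steps; True if adjusting for `adjust` disconnects E from D."""
    adjust = set(adjust)
    # Step 1: no variable in C may be a descendant of E.
    if adjust & descendants(dag, exposure):
        return False
    # Step 2: delete all non-ancestors of E, D, and C.
    keep = ancestors(dag, {exposure, disease} | adjust)
    # Step 3: delete all arrows emanating from E.
    arrows = [
        (p, c) for p, children in dag.items() for c in children
        if p in keep and c in keep and p != exposure
    ]
    # Steps 4 and 5: connect parents of a common child, then drop direction.
    links = {frozenset(arrow) for arrow in arrows}
    for child in keep:
        parents = [p for p, c in arrows if c == child]
        links |= {frozenset((a, b)) for a in parents for b in parents if a != b}
    # Step 6 and test: delete C, then see whether E can still reach D.
    reached, stack = {exposure}, [exposure]
    while stack:
        node = stack.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in adjust and other not in reached:
                    reached.add(other)
                    stack.append(other)
    return disease not in reached


print(sufficient(VITAMIN_DAG, "Vitamins", "BirthDefects", {"PNC"}))
print(sufficient(VITAMIN_DAG, "Vitamins", "BirthDefects", {"PNC", "SES"}))
```

Under these assumptions the first call reports that prenatal care alone is not sufficient, and the second that prenatal care together with SES is, matching the worked example in the lecture.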
32 Stratification has its limits
- Up till now, you have heard about one way to remove confounding: adjustment or stratification on certain variables
- But in some situations, there are no variables you can stratify on and successfully remove confounding
  - We will illustrate this using a DAG
  - In a future lecture, you will hear about a method you can use in these circumstances (Marginal Structural Models)
33 A DAG-based illustration of time-dependent confounding: a situation in which traditional methods to control for confounding (i.e., adjustment/stratification) break down
- Ex: What variables should we control for to estimate the effect of antiretroviral therapy on CD4 count?
34 Ex: Antiretroviral therapy and CD4 count
- Question of interest: What is the effect of antiretroviral therapy on CD4 count?
- Study population: A cohort of HIV-infected individuals
- Outcome: CD4 count at the end of the study
- Exposure: Antiretroviral therapy (ART) (treated or not for the entire study period)
35 Ex: Antiretroviral therapy and CD4 count
- Sicker individuals (those with lower CD4 counts at the beginning of the study) are more likely to be treated with ART
  - Low baseline CD4 count causes physicians to treat their patients
- CD4 count at baseline also affects CD4 count at the end of the study
36 Representing these relations in a DAG
[Figure: DAG with nodes CD4 count at beginning of study; Exposure: antiretroviral treatment; Outcome: CD4 count at the end of the study. The arrow from exposure to outcome is labeled as the causal effect of interest.]
37 Simple confounding
- CD4 count at baseline is a confounder
  - If we don't adjust for baseline CD4 count, we will underestimate the effect of ART on preserving final CD4 count
  - Sicker people (those with lower initial counts) will be overrepresented among those who get treated
- We can see this in the DAG: we must adjust for baseline CD4, or ART and final CD4 will still be connected once we delete our causal effect of interest
  - Final CD4 and ART share a common cause (baseline CD4)
38 Representing these relations in a DAG
[Figure: DAG with nodes Confounder: CD4 count at beginning of study; Exposure: antiretroviral treatment; Outcome: CD4 count at the end of the study]
39 Antiretroviral therapy and CD4 count: A more realistic example
- Same study population and outcome
  - Cohort of HIV-infected individuals
  - Outcome is final CD4 count
- Now, an individual can change treatment status during the course of follow-up
  - E.g., an individual who is not treated at the beginning of the study (t0) may go on treatment partway through the study (e.g., t1)
- CD4 is also measured during the course of follow-up
40 DAG: Expanded to incorporate changing treatment over time
[Figure: DAG with nodes CD4 count at beginning of study (t0; baseline confounder), Antiretroviral treatment at t0, CD4 count partway through study (t1), Antiretroviral treatment at t1, Y: final CD4 count; the causal effect of interest is marked]
41 Something is missing
- Our effect of interest is how antiretroviral treatment throughout the study (e.g., at t0 and t1) affects final CD4 count
- We have left out an important causal relationship in the previous DAG!
  - Antiretroviral treatment at baseline affects intermediate CD4 counts (e.g., CD4 measured at t1), which in turn affect final CD4 counts
  - This is part of our causal effect of interest!
42 Filling in the DAG
[Figure: DAG with nodes CD4 count at beginning of study (t0; baseline confounder), Antiretroviral treatment at t0, CD4 count partway through study (t1), Antiretroviral treatment at t1, Y: final CD4 count; the causal effect of interest is marked]
43 Something is still missing
- CD4 count at t1 will also affect subsequent treatment (ART at t1)
  - Note: we take the convention that CD4(t) is measured before ART(t)
- Patients with lower CD4 counts at t1 are more likely to start ART partway through the study
  - A patient getting sicker causes his/her physician to start them on treatment
44 Filling in the DAG
[Figure: DAG with nodes CD4 count at beginning of study (t0; baseline confounder), Antiretroviral treatment at t0, CD4 count partway through study (t1), Antiretroviral treatment at t1, Y: final CD4 count; the causal effect of interest is marked]
45 What does this DAG tell us about what we need to adjust for to control confounding?
46 Using the DAG to decide what we need to control for
- We can't adjust for anything that is a descendant of (caused by) ART
  - Rules out CD4 at t1
- Delete all non-ancestors of exposure, disease, and things we are considering adjusting for
  - N/A: everything in the current graph is an ancestor of outcome or exposure
[Figure: DAG with nodes Y: final CD4 count, CD4 count at beginning of study (t0), CD4 count partway through study (t1), Antiretroviral treatment at t0, Antiretroviral treatment at t1; the causal effect of interest is marked]
47 Using the DAG to decide what we need to control for
- Delete any arrows from ART
- Connect parents sharing a common child
  - N/A: already connected
[Figure: DAG with nodes Y: final CD4 count, CD4 count at beginning of study (t0), CD4 count partway through study (t1), Antiretroviral treatment at t0, Antiretroviral treatment at t1]
48 Using the DAG to decide what we need to control for
- Strip arrowheads
- What can we delete that will leave ART and final CD4 unconnected?
  - Remember: CD4 at t1 is not an option, since ART at t0 affects it
[Figure: graph with nodes Y: final CD4 count, CD4 count at beginning of study (t0), CD4 count partway through study (t1), Antiretroviral treatment at t0, Antiretroviral treatment at t1]
49 A Dilemma
- From our analysis of the DAG it is clear that if we don't adjust for CD4 at t1, we fail to control for confounding
- But we know we cannot adjust for a variable affected by our exposure of interest
  - Adjusting for CD4 at t1 would be equivalent to adjusting for part of our causal effect of interest
  - We would again fail to correctly estimate the total effect of ART on final CD4, because we would lose that component of the effect mediated by early changes in CD4
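The dilemma can be confirmed by running the graphical test over every candidate adjustment set. The sketch below treats ART at t0 and t1 jointly as the exposure. Its arrows are an assumed reading of the relationships described in the bullets (the figure itself is not reproduced here), and the node names and helper functions are hypothetical. Every node is an ancestor of the outcome, so the "delete non-ancestors" step removes nothing and is skipped.

```python
from itertools import combinations

# cause -> set of direct effects (assumed from the relationships described).
DAG = {
    "CD4_t0": {"ART_t0", "CD4_t1"},
    "ART_t0": {"CD4_t1", "CD4_final"},
    "CD4_t1": {"ART_t1", "CD4_final"},
    "ART_t1": {"CD4_final"},
    "CD4_final": set(),
}
EXPOSURES = {"ART_t0", "ART_t1"}
OUTCOME = "CD4_final"
COVARIATES = ["CD4_t0", "CD4_t1"]


def descendants(dag, starts):
    """Return every variable reachable from any node in `starts`."""
    found, stack = set(), list(starts)
    while stack:
        for child in dag[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def sufficient(dag, exposures, outcome, adjust):
    """Graphical test for a joint exposure; True if `adjust` removes confounding."""
    # We cannot adjust for anything caused by ART.
    if adjust & descendants(dag, exposures):
        return False
    # Delete arrows from ART, connect parents of a common child, strip arrowheads.
    arrows = [(p, c) for p, kids in dag.items() for c in kids if p not in exposures]
    links = {frozenset(arrow) for arrow in arrows}
    for child in dag:
        parents = [p for p, c in arrows if c == child]
        links |= {frozenset((a, b)) for a in parents for b in parents if a != b}
    # Delete the adjustment set and test whether ART still reaches the outcome.
    reached, stack = set(exposures), list(exposures)
    while stack:
        node = stack.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in adjust and other not in reached:
                    reached.add(other)
                    stack.append(other)
    return outcome not in reached


candidate_sets = [
    set(combo)
    for size in range(len(COVARIATES) + 1)
    for combo in combinations(COVARIATES, size)
]
working_sets = [c for c in candidate_sets if sufficient(DAG, EXPOSURES, OUTCOME, c)]
print(working_sets)  # empty: no adjustment set passes the test
```

Sets without CD4 at t1 leave ART connected to final CD4, and sets containing it are ruled out because CD4 at t1 is a descendant of ART at t0, which is the dilemma stated above.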
50 Adjusting for a variable on the causal pathway of interest
[Figure: DAG with nodes CD4 count at beginning of study (t0; baseline confounder, could be included in a traditional multivariable model), Antiretroviral treatment at t0, CD4 count partway through study (t1; time-dependent confounder), Antiretroviral treatment at t1, Y: final CD4 count; the causal effect of interest is marked]
51 Time-dependent confounding
- Time-dependent confounder: a covariate that is predictive of subsequent exposure, is an independent risk factor for the outcome, and is itself affected by prior exposure
  - If we don't adjust for the covariate, we get bias due to confounding
  - If we do adjust, we fail to estimate the causal effect we are interested in, because we are adjusting for part of our effect of interest
- You will see more of this problem, and hear about some ways to address it (e.g., Marginal Structural Models)
52 Conclusions
- Today we have outlined the steps to
  - Construct a DAG, based on knowledge/assumptions
  - Use a DAG to decide if confounding is present
  - Use a DAG to decide what variables to control for in analysis
- We have also used a DAG to illustrate a situation where traditional methods for controlling confounding are not adequate (time-dependent confounding)
53 References
- Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK. 2001.
- Jewell NP. Statistics for Epidemiology. Chapman & Hall/CRC, USA. 2004: 102-112.
- Greenland S. Causal diagrams for epidemiologic research. Epidemiology, 1999 Jan; 10(1): 37-48.
- Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology, 2001; 12(3): 313-320.
- Hernan M, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol, 2002; 155(2): 176-184.
54 Example DAG from Maya's research
55 Example from Maya's research
- Effect of interest: effect of observed viral mutation profile (presence of specific mutations) on viral load (i.e., response to treatment)
- DAG reveals that adjustment for treatment history is sufficient