Title: Extending OPUS to wider domains: Graphical models as a tool for combining multiple sources of information in the health and social sciences
1. Extending OPUS to wider domains: Graphical models as a tool for combining multiple sources of information in the health and social sciences
- Nicky Best
- Imperial College
- Thanks to
  - Sylvia Richardson, Chris Jackson, David Spiegelhalter, Dave Lunn
2. Outline
- Graphical modelling as a tool for building complex models
- Case study 1: Water disinfection byproducts and adverse birth outcomes
  - Modelling multiple sources of bias in observational studies
- Case study 2: Socioeconomic factors and heart disease
  - Combining individual and aggregate level data
  - Simulation study
- Case study 3: Population pharmacokinetic-dynamic models
  - Modelling drug dose → concentration → effect pathways
  - Cutting feedback in graphical models
3. Graphical modelling
[Diagram: Modelling, Mathematics, Inference, Algorithms]
4. Simple example of a graphical model: Mendelian inheritance
- Nodes represent variables
- (Absence of) links represent conditional independence assumptions
- Y, Z: genotypes of the parents
- W, X: genotypes of 2 children
- If we know the genotypes of the parents, then the children's genotypes are conditionally independent
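A minimal sketch of the Mendelian example, assuming a single biallelic locus with genotypes coded 0, 1, 2 (copies of allele "a") and a hypothetical allele frequency of 0.3 with Hardy-Weinberg proportions for the parents. It builds the joint distribution P(Y)P(Z)P(W | Y, Z)P(X | Y, Z) and checks numerically that W and X are independent given the parents but dependent once the parents are summed out.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # copies of allele 'a' at one (assumed) biallelic locus

def prior(g, p=0.3):
    """Hardy-Weinberg genotype frequency for a parent (allele frequency p is hypothetical)."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[g]

def child_given_parents(w, y, z):
    """P(child genotype w | parents y, z): each parent passes 'a' with probability g/2."""
    py, pz = y / 2, z / 2
    return {0: (1 - py) * (1 - pz), 1: py * (1 - pz) + (1 - py) * pz, 2: py * pz}[w]

# Joint factorised along the graph Y -> W <- Z and Y -> X <- Z; keys are (y, z, w, x)
joint = {(y, z, w, x): prior(y) * prior(z) * child_given_parents(w, y, z) * child_given_parents(x, y, z)
         for y, z, w, x in product(GENOTYPES, repeat=4)}

def marginal(keep):
    """Sum the joint over every variable whose position is not in `keep`."""
    out = {}
    for key, prob in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out

p_yz, p_yzw, p_yzx = marginal((0, 1)), marginal((0, 1, 2)), marginal((0, 1, 3))
# Given the parents: P(W, X | Y, Z) equals P(W | Y, Z) P(X | Y, Z)
cond_gap = max(abs(joint[(y, z, w, x)] / p_yz[(y, z)]
                   - (p_yzw[(y, z, w)] / p_yz[(y, z)]) * (p_yzx[(y, z, x)] / p_yz[(y, z)]))
               for y, z, w, x in product(GENOTYPES, repeat=4))

# Parents summed out: the siblings' genotypes are dependent
p_wx, p_w, p_x = marginal((2, 3)), marginal((2,)), marginal((3,))
max_gap = max(abs(p_wx[(w, x)] - p_w[(w,)] * p_x[(x,)]) for w, x in product(GENOTYPES, repeat=2))
print(f"conditional gap = {cond_gap:.1e}, marginal gap = {max_gap:.4f}")
```

The dependence between siblings comes entirely from their shared parents, which is exactly what the missing W to X link encodes.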
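5. Joint distributions and graphical models
- Use ideas from graph theory to
  - represent the structure of a joint probability distribution...
  - ...by encoding conditional independencies
[Figure: directed acyclic graph on nodes A, B, C, D, E, F; per the factorization below, D and E are parents of B, C and F, and C is the parent of A]
- Factorization theorem
  - Joint distribution $P(V) = \prod_{v \in V} P(v \mid \mathrm{parents}[v])$
$$P(A,B,C,D,E,F) = P(A \mid C)\,P(B \mid D,E)\,P(C \mid D,E)\,P(D)\,P(E)\,P(F \mid D,E)$$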
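A minimal sketch that builds the joint distribution above from its local factors, assuming all six variables are binary and using randomly generated (hypothetical) conditional probability tables. It checks that the product of local terms is a proper distribution and that one independence implied by the graph, A independent of B given C, holds numerically: every path from A to B runs through C.

```python
import random
from itertools import product

rng = random.Random(1)
# Parents of each node, read off the factorization P(A|C) P(B|D,E) P(C|D,E) P(D) P(E) P(F|D,E)
PARENTS = {"A": ("C",), "B": ("D", "E"), "C": ("D", "E"), "D": (), "E": (), "F": ("D", "E")}
ORDER = ("A", "B", "C", "D", "E", "F")

# Hypothetical tables: CPT[node][parent values] = P(node = 1 | parent values)
CPT = {node: {pa: rng.random() for pa in product((0, 1), repeat=len(parents))}
       for node, parents in PARENTS.items()}

def joint(assign):
    """P(V) = product over v of P(v | parents[v]) for a full 0/1 assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = CPT[node][tuple(assign[p] for p in parents)]
        prob *= p1 if assign[node] == 1 else 1.0 - p1
    return prob

table = {vals: joint(dict(zip(ORDER, vals))) for vals in product((0, 1), repeat=6)}
total = sum(table.values())

def prob(**fixed):
    """Marginal probability that each named node takes the given value."""
    return sum(p for vals, p in table.items()
               if all(vals[ORDER.index(k)] == v for k, v in fixed.items()))

# A independent of B given C  <=>  P(A,B,C) P(C) = P(A,C) P(B,C) for all values
gap = max(abs(prob(A=a, B=b, C=c) * prob(C=c) - prob(A=a, C=c) * prob(B=b, C=c))
          for a, b, c in product((0, 1), repeat=3))
print(f"sum of joint = {total:.6f}, max violation of A indep. B | C = {gap:.1e}")
```

Only six small tables are needed instead of one table with 2^6 entries, which is the practical payoff of the factorization.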
6. Building complex models
[Figure: the graph from slide 5 (nodes A, B, C, D, E, F)]
- Conditional independence provides the mathematical basis for splitting a large system into smaller components
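7. Building complex models
[Figure: the graph from slide 5 split into smaller components; nodes C, D and E appear in more than one component]
- Conditional independence provides the mathematical basis for splitting a large system into smaller components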
8. Building complex models
- Key idea: understand a complex system
  - through a global model
  - built from small pieces that are
    - comprehensible, each with only a few variables
    - modular
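A minimal sketch of "a global model built from small pieces", assuming a hypothetical `Node` record that holds one variable, its parents and its local log-density. The global log-density is just the sum of the local terms, so one piece can be swapped out without touching the others. The names `Node` and `log_joint` and the toy normal model are illustrative, not from the talk.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    """One small piece of the model: a variable, its parents and its local log-density."""
    name: str
    parents: Tuple[str, ...]
    log_density: Callable[..., float]  # log p(value | parent values)

def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def log_joint(nodes: List[Node], values: Dict[str, float]) -> float:
    """Global log-density = sum of local terms, each using only a node and its parents."""
    return sum(n.log_density(values[n.name], *(values[p] for p in n.parents)) for n in nodes)

# Hypothetical three-piece model: mu -> x1, mu -> x2
nodes = [
    Node("mu", (), lambda v: normal_logpdf(v, 0.0, 10.0)),
    Node("x1", ("mu",), lambda v, mu: normal_logpdf(v, mu, 1.0)),
    Node("x2", ("mu",), lambda v, mu: normal_logpdf(v, mu, 1.0)),
]
values = {"mu": 0.5, "x1": 0.3, "x2": 1.1}
print(log_joint(nodes, values))

# Modularity: replace one piece (here a noisier x2) without touching the others
nodes[2] = Node("x2", ("mu",), lambda v, mu: normal_logpdf(v, mu, 2.0))
print(log_joint(nodes, values))
```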
9. Case study 1
- Epidemiological study of low birth weight and mothers' exposure to water disinfection byproducts
- Background
  - Chlorine is added to the tap water supply for disinfection
  - It reacts with natural organic matter in the water to form unwanted byproducts (including trihalomethanes, THMs)
  - There is some evidence of adverse health effects (cancer, birth defects) associated with exposure to high levels of THMs
  - We are carrying out a study in Great Britain using routine data to investigate the risk of low birth weight associated with exposure to different THM levels
10. Data sources
- National postcoded births register
- Routinely monitored THM concentrations in tap water samples for each water supply zone within 14 different water company regions
- Census data: area-level socioeconomic factors
- Millennium Cohort Study (MCS): individual-level outcomes and confounder data on a sample of mothers
- Literature on factors affecting personal exposure (uptake factors, water consumption, etc.)
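11. Model for combining data sources
[Figure: graph linking the sub-models; nodes $\phi$, $\sigma^2$, $\mathrm{THM}^{\mathrm{true}}_{zt}$, $\mathrm{THM}^{\mathrm{raw}}_{ztj}$, $\mathrm{THM}^{\mathrm{mother}}_{ik}$, $\mathrm{THM}^{\mathrm{mother}}_{im}$, $\beta_T$, $\beta_c$, $y_{ik}$, $y_{im}$, $c_{ik}$, $c_{im}$, $\theta_i$]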
12. Regression sub-model (national data)
Regression model for the national data relating the risk of low birth weight ($y_{ik}$) to the mother's THM exposure and other confounders ($c_{ik}$)
[Figure: model graph, as on slide 11]
$k$ indexes mothers; $i$ indexes groups (areas)
13. Regression sub-model (MCS)
Regression model for the MCS data relating the risk of low birth weight ($y_{im}$) to the mother's THM exposure and other confounders ($c_{im}$)
[Figure: model graph, as on slide 11]
$m$ indexes mothers; $i$ indexes groups (areas)
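Assuming, as the shared nodes $\beta_T$ and $\beta_c$ in the graph suggest, that both regression sub-models use the same coefficients, the combined log-likelihood is simply the sum of the national and MCS log-likelihoods evaluated at the same parameters, so both data sets inform $\beta_T$ and $\beta_c$. A minimal sketch, assuming logistic regression, a single shared intercept and one fully observed confounder; the link, intercept and toy data are assumptions, not taken from the study.

```python
import math

def log_sigmoid(eta):
    """Numerically stable log(1 / (1 + exp(-eta)))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))

def logistic_loglik(y, thm, c, beta0, beta_t, beta_c):
    """Sum of log P(y | THM, c) with logit P(y = 1) = beta0 + beta_t * THM + beta_c * c."""
    total = 0.0
    for yi, ti, ci in zip(y, thm, c):
        eta = beta0 + beta_t * ti + beta_c * ci
        total += log_sigmoid(eta) if yi == 1 else log_sigmoid(-eta)  # log(1 - p) = log_sigmoid(-eta)
    return total

def combined_loglik(national, mcs, beta0, beta_t, beta_c):
    """National and MCS sub-models share beta_t and beta_c."""
    return (logistic_loglik(*national, beta0, beta_t, beta_c)
            + logistic_loglik(*mcs, beta0, beta_t, beta_c))

# Hypothetical toy data: (low birth weight 0/1, mother's THM exposure, binary confounder)
national = ([1, 0, 0, 1], [60.0, 20.0, 35.0, 80.0], [1, 0, 0, 1])
mcs = ([0, 1], [25.0, 70.0], [0, 1])
print(combined_loglik(national, mcs, beta0=-2.0, beta_t=0.01, beta_c=0.5))
```

In the full model the national $c_{ik}$ and both exposure terms are themselves unknowns supplied by the other sub-models rather than fixed inputs.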
14. Missing confounders sub-model
Missing data model to estimate the confounders ($c_{ik}$) for mothers in the national data, using information on the within-area distribution of confounders in the MCS
[Figure: model graph, as on slide 11]
$k$ indexes mothers (national data); $m$ indexes mothers (MCS); $i$ indexes groups (areas)
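A minimal sketch of the missing-confounder step, assuming a single binary confounder whose within-area prevalence is taken to be what $\theta_i$ represents (our reading of the graph, not confirmed by the slides) with a Beta(1, 1) prior. Each call draws $\theta_i$ from its posterior given the MCS mothers in area $i$, then draws the missing $c_{ik}$ for the national mothers in that area. The function name `impute_confounder` is hypothetical.

```python
import random

def impute_confounder(mcs_values, n_national, rng, a=1.0, b=1.0):
    """One draw of theta_i and of the missing binary confounders in one area.

    mcs_values: observed 0/1 confounder values for MCS mothers in the area.
    theta_i | MCS data ~ Beta(a + #ones, b + #zeros); c_ik | theta_i ~ Bernoulli(theta_i).
    """
    ones = sum(mcs_values)
    zeros = len(mcs_values) - ones
    theta = rng.betavariate(a + ones, b + zeros)
    return theta, [1 if rng.random() < theta else 0 for _ in range(n_national)]

rng = random.Random(42)
# Hypothetical area: 7 MCS mothers (2 with the confounder), 5 national mothers to impute
theta, c_national = impute_confounder([1, 0, 0, 1, 0, 0, 0], n_national=5, rng=rng)
print(theta, c_national)
```

Repeating the draw at every iteration of the sampler, rather than imputing once, carries the uncertainty from the small MCS sample through to $\beta_T$ and $\beta_c$.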
15. THM measurement error sub-model
Model to estimate the true tap water THM concentration from the raw data
$z$ indexes water zones; $t$ indexes seasons; $j$ indexes replicates
[Figure: model graph, as on slide 11]
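A minimal sketch of the measurement error idea, assuming $\mathrm{THM}^{\mathrm{raw}}_{ztj} \sim N(\mathrm{THM}^{\mathrm{true}}_{zt}, \sigma^2)$ with $\sigma^2$ treated as known and a normal prior on the true concentration (in the full model $\sigma^2$ is unknown and gets its own prior). The conjugate update shows how zone-seasons with fewer replicates are shrunk more towards the prior mean. Prior values and sample data are hypothetical.

```python
def true_thm_posterior(raw, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of THM_true for one zone z and season t.

    Assumes raw_j ~ N(THM_true, sigma2), sigma2 known, and THM_true ~ N(prior_mean, prior_var).
    """
    precision = 1.0 / prior_var + len(raw) / sigma2
    var = 1.0 / precision
    mean = var * (prior_mean / prior_var + sum(raw) / sigma2)
    return mean, var

# Hypothetical samples (ug/l): three replicates averaging 43, and a single sample of 50
print(true_thm_posterior([40.0, 44.0, 45.0], sigma2=25.0, prior_mean=30.0, prior_var=100.0))
print(true_thm_posterior([50.0], sigma2=25.0, prior_mean=30.0, prior_var=100.0))
```

With these numbers the three-replicate mean of 43 is pulled to 42, while the single sample of 50 is pulled further, to 46.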
16. THM personal exposure sub-model
Model to predict personal exposure using the estimated tap water THM level and literature on the distribution of factors affecting individual uptake of THMs
[Figure: model graph, as on slide 11]
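A Monte Carlo sketch of the personal exposure step, assuming exposure is the product of tap-water THM, daily water consumption and an uptake fraction. The consumption and uptake distributions below are placeholders chosen for illustration, not values from the literature the slide refers to, and the input mean and variance of THM stand in for the measurement error sub-model's output.

```python
import random
import statistics

def simulate_personal_exposure(thm_mean, thm_var, n_draws, rng):
    """Draws of a mother's THM uptake (hypothetical units: ug/day).

    thm_mean, thm_var: posterior summary of tap-water THM (e.g. from the measurement model).
    The consumption and uptake distributions are placeholders, NOT literature values.
    """
    draws = []
    for _ in range(n_draws):
        thm = max(rng.gauss(thm_mean, thm_var ** 0.5), 0.0)  # ug/l, truncated at 0
        litres = rng.lognormvariate(0.0, 0.4)                  # hypothetical L/day
        uptake = rng.uniform(0.5, 1.0)                         # hypothetical fraction absorbed
        draws.append(thm * litres * uptake)
    return draws

rng = random.Random(7)
draws = sorted(simulate_personal_exposure(42.0, 7.7, 10_000, rng))
summary = (statistics.mean(draws), draws[249], draws[9749])  # mean, approx. 2.5% and 97.5% points
print(summary)
```

The spread of the predicted exposure reflects both the uncertainty in the tap-water level and the between-person variation in uptake.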
17. Inference
[Figure: model graph, as on slide 11, with nodes labelled as Data or Unknowns]
18. Case study 2
- Socioeconomic factors affecting health
- Background
  - Interest in individual and contextual effects of socioeconomic determinants of health
  - Often investigated using multilevel studies (individuals within areas)
  - Ecological studies are also widely used in epidemiology and the social sciences because small-area data are available
    - they investigate relationships at the level of the group, rather than the individual
    - outcome and exposures are available as group-level summaries
    - the usual aim is to transfer inference to the individual level
19. Case study 2
- Example: socioeconomic risk factors for heart disease
- Health outcome
  - Hospital admissions for heart disease in adults living in London
- Socioeconomic risk factors
  - Ethnicity, social class, education, area deprivation
- Data sources
  - Individual: Health Survey for England (with ward identifier); 1 to 54 observations per ward (median 7)
  - Aggregate (ward) outcomes: Hospital Episode Statistics
  - Aggregate (ward) risk factors: 1991 Census
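20. Building the model
- Multilevel model for individual data
[Figure: graph with nodes $\beta_0$, $\beta_x$, $\beta_Z$, $\sigma^2$, $\alpha_i$, $x_{ik}$, $y_{ik}$, $Z_i$, repeated over person $k$ and area $i$]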
21. Building the model
- Multilevel model for individual data
$y_{ik} \sim \mathrm{Bernoulli}(p_{ik})$, person $k$, area $i$
$$\log p_{ik} = \beta_0 + \beta_x x_{ik} + \beta_Z Z_i + \alpha_i$$
[Figure: model graph, as on slide 20]
22. Building the model
- Multilevel model for individual data
$y_{ik} \sim \mathrm{Bernoulli}(p_{ik})$, person $k$, area $i$
$$\log p_{ik} = \beta_0 + \beta_x x_{ik} + \beta_Z Z_i + \alpha_i$$
$$\alpha_i \sim \mathrm{Normal}(0, \sigma^2)$$
[Figure: model graph, as on slide 20]
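23. Building the model
- Multilevel model for individual data
$y_{ik} \sim \mathrm{Bernoulli}(p_{ik})$, person $k$, area $i$
$$\log p_{ik} = \beta_0 + \beta_x x_{ik} + \beta_Z Z_i + \alpha_i$$
$$\alpha_i \sim \mathrm{Normal}(0, \sigma^2)$$
Priors on $\sigma^2$, $\beta_0$, $\beta_x$, $\beta_Z$
[Figure: model graph, as on slide 20]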
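A simulation sketch of the multilevel model, under hypothetical parameter values and covariate distributions (a uniform area-level $Z_i$ and a binary $x_{ik}$ whose prevalence rises with $Z_i$; both are assumptions). Because the model uses a log link, $p_{ik}$ could exceed 1 for extreme covariate values, so the sketch caps it at 1; a clearly negative $\beta_0$ keeps this rare. Seven people per area mirrors the median survey sample per ward on slide 19.

```python
import math
import random

def simulate_multilevel(n_areas, n_per_area, b0, bx, bz, sigma, rng):
    """Simulate y_ik ~ Bernoulli(p_ik), log p_ik = b0 + bx*x_ik + bz*Z_i + a_i, a_i ~ N(0, sigma^2)."""
    data = []
    for i in range(n_areas):
        z_i = rng.random()              # hypothetical area-level covariate in [0, 1]
        a_i = rng.gauss(0.0, sigma)     # area random effect alpha_i
        pi_i = 0.2 + 0.6 * z_i          # hypothetical within-area prevalence of binary x
        for _ in range(n_per_area):
            x_ik = 1 if rng.random() < pi_i else 0
            p_ik = min(1.0, math.exp(b0 + bx * x_ik + bz * z_i + a_i))  # log link, capped at 1
            y_ik = 1 if rng.random() < p_ik else 0
            data.append((i, x_ik, z_i, y_ik))
    return data

rng = random.Random(2024)
data = simulate_multilevel(n_areas=50, n_per_area=7, b0=-3.0, bx=0.5, bz=0.7, sigma=0.3, rng=rng)
print(sum(y for *_, y in data), "cases out of", len(data))
```

With only seven people per area, most areas contribute few or no cases, which illustrates why the survey data alone have little power.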
24. Building the model
[Figure: graph with nodes $\beta_0$, $\beta_x$, $\beta_Z$, $\sigma^2$, $\alpha_i$, $Z_i$, $Y_i$, $N_i$, repeated over area $i$]
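25. Building the model
Ecological model: $Y_i \sim \mathrm{Binomial}(q_i, N_i)$, area $i$
$$q_i = \int p_{ik}(x, Z_i, \alpha_i, \beta_0, \beta_x, \beta_Z)\, f_i(x)\, dx$$
where $f_i(x)$ is the distribution of the individual covariate $x$ within area $i$
[Figure: model graph, as on slide 24]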
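26. Building the model
Ecological model: $Y_i \sim \mathrm{Binomial}(q_i, N_i)$, area $i$
$$q_i = \int p_{ik}(x, Z_i, \alpha_i, \beta_0, \beta_x, \beta_Z)\, f_i(x)\, dx$$
$$\alpha_i \sim \mathrm{Normal}(0, \sigma^2)$$
[Figure: model graph, as on slide 24]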
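27. Building the model
Ecological model: $Y_i \sim \mathrm{Binomial}(q_i, N_i)$, area $i$
$$q_i = \int p_{ik}(x, Z_i, \alpha_i, \beta_0, \beta_x, \beta_Z)\, f_i(x)\, dx$$
$$\alpha_i \sim \mathrm{Normal}(0, \sigma^2)$$
Priors on $\sigma^2$, $\beta_0$, $\beta_x$, $\beta_Z$
[Figure: model graph, as on slide 24]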
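For a binary individual covariate $x$ with within-area proportion $\pi_i$ (for example taken from the Census; the symbol $\pi_i$ is ours), the integral for $q_i$ reduces under the log link to a two-term sum, $q_i = e^{\beta_0 + \beta_Z Z_i + \alpha_i}\{(1-\pi_i) + \pi_i e^{\beta_x}\}$. The sketch computes this and compares it with the naive plug-in that puts the area mean of $x$ into the individual model; by Jensen's inequality the plug-in underestimates $q_i$ whenever $\beta_x \neq 0$ and $0 < \pi_i < 1$. Continuous covariates would need numerical integration instead. Note the slides write $\mathrm{Binomial}(q_i, N_i)$ in BUGS argument order (probability first).

```python
import math

def q_integrated(pi_i, z_i, a_i, b0, bx, bz):
    """q_i = integral of p(x) f_i(x) dx for binary x with P(x = 1) = pi_i, log link."""
    base = math.exp(b0 + bz * z_i + a_i)
    return base * ((1.0 - pi_i) + pi_i * math.exp(bx))

def q_naive(pi_i, z_i, a_i, b0, bx, bz):
    """Plug-in alternative: area mean of x inserted into the individual model."""
    return math.exp(b0 + bx * pi_i + bz * z_i + a_i)

def binomial_logpmf(y_i, n_i, q_i):
    """log P(Y_i = y_i) for Y_i ~ Binomial with probability q_i and size n_i."""
    return (math.lgamma(n_i + 1) - math.lgamma(y_i + 1) - math.lgamma(n_i - y_i + 1)
            + y_i * math.log(q_i) + (n_i - y_i) * math.log1p(-q_i))

params = dict(z_i=0.0, a_i=0.0, b0=-3.0, bx=1.0, bz=0.7)  # hypothetical values
print(q_integrated(0.5, **params), q_naive(0.5, **params))
print(binomial_logpmf(3, 20, q_integrated(0.5, **params)))
```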
28. Combining individual and aggregate data
- Individual-level survey data often lack the power to inform about contextual and/or individual-level effects
- Even when the correct (integrated) model is used, ecological data often contain little information about some or all of the effects of interest
- Can we improve inference by combining both types of model/data?
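29. Combining individual and aggregate data
Multilevel model for individual data; ecological model
[Figure: the two graphs side by side; both contain $\beta_0$, $\beta_x$, $\beta_Z$, $\sigma^2$, $\alpha_i$ and $Z_i$; the individual model adds $x_{ik}$, $y_{ik}$ (person $k$, area $i$) and the ecological model adds $Y_i$, $N_i$ (area $i$)]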
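30. Combining individual and aggregate data
Hierarchical Related Regression (HRR) model
[Figure: single graph in which $\beta_0$, $\beta_x$, $\beta_Z$, $\sigma^2$ and $\alpha_i$ are shared by the individual data ($x_{ik}$, $y_{ik}$, person $k$) and the aggregate data ($Y_i$, $N_i$, area $i$), together with $Z_i$]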
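A minimal sketch of the HRR log-likelihood, assuming the log-link models on slides 21 and 25 with a binary individual covariate: for each area it adds the aggregate binomial term and the individual Bernoulli terms, both evaluated with the same $\beta_0$, $\beta_x$, $\beta_Z$ and $\alpha_i$. The data layout and the within-area proportion `pi` are hypothetical; priors on $\alpha_i$ and the other parameters would be added in a full Bayesian fit.

```python
import math

def log_bernoulli(y, p):
    return math.log(p) if y else math.log1p(-p)

def hrr_loglik(areas, b0, bx, bz, a):
    """Combined (HRR) log-likelihood for given parameters.

    areas: list of dicts with keys
        z   - area-level covariate Z_i
        pi  - within-area proportion with binary x = 1 (hypothetical input)
        Y,N - aggregate case count and population at risk
        ind - list of (x_ik, y_ik) pairs from the survey (may be empty)
    a: area random effects alpha_i, shared by both data sources
    """
    total = 0.0
    for area, a_i in zip(areas, a):
        base = b0 + bz * area["z"] + a_i
        # Aggregate part: integrated ecological model for binary x
        q = math.exp(base) * ((1 - area["pi"]) + area["pi"] * math.exp(bx))
        Y, N = area["Y"], area["N"]
        total += (math.lgamma(N + 1) - math.lgamma(Y + 1) - math.lgamma(N - Y + 1)
                  + Y * math.log(q) + (N - Y) * math.log1p(-q))
        # Individual part: same parameters, individual covariate value
        for x, y in area["ind"]:
            total += log_bernoulli(y, math.exp(base + bx * x))
    return total

areas = [{"z": 0.2, "pi": 0.4, "Y": 3, "N": 40, "ind": [(1, 0), (0, 1)]},
         {"z": 0.8, "pi": 0.7, "Y": 6, "N": 55, "ind": []}]
print(hrr_loglik(areas, b0=-3.0, bx=0.5, bz=0.7, a=[0.1, -0.2]))
```

Areas with no survey respondents still contribute through the aggregate term, while the few individual records help separate $\beta_x$ from $\beta_Z$.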
31. Simulation study
32. Comments
- Inference from aggregate data can be unbiased, provided exposure contrasts between areas are high (and the appropriate integrated model is used)
- Combining aggregate data with small samples of individual data can reduce bias when exposure contrasts are low
- Combining individual and aggregate data can reduce the MSE of estimates compared with using individual data alone
33. Case study 3
- Generic scenario linking together exposure sub-models and response sub-models
- Example: population pharmacokinetic (PK)/pharmacodynamic (PD) models
  - PK data provide information about a drug's concentration-time profile (exposure sub-model)
  - PD data provide information about the concentration-response profile (response sub-model)
- Graphical models provide a natural tool for combining PK and PD data
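The PK and PD models themselves appeared only as figures. As a stand-in, the sketch below uses two textbook choices, a one-compartment intravenous-bolus PK model and an Emax PD model, to show the dose → concentration → effect chain; these are assumptions, not necessarily the models used in the talk. Because the effect depends on the concentration, PD data carry information about the PK parameters too, which is the feedback discussed under cutting. In a population model each subject's parameters (V, CL, E0, Emax, EC50 here) would vary around the population means.

```python
import math

def concentration(t, dose, volume, clearance):
    """Assumed one-compartment IV-bolus PK model: C(t) = dose/V * exp(-(CL/V) * t)."""
    return dose / volume * math.exp(-clearance / volume * t)

def effect(conc, e0, emax, ec50):
    """Assumed Emax PD model: E = E0 + Emax * C / (EC50 + C)."""
    return e0 + emax * conc / (ec50 + conc)

# Dose -> concentration -> effect at a few time points (hypothetical values and units)
for t in (0.0, 1.0, 4.0, 12.0):
    c = concentration(t, dose=100.0, volume=10.0, clearance=2.0)
    print(f"t={t:5.1f} h  C={c:6.3f}  E={effect(c, e0=5.0, emax=20.0, ec50=2.0):6.3f}")
```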
34. PK model
35. PD model
36. Graph of PK-PD model
37. Posterior means and 95% intervals for population mean PK parameters
38. Posterior means and 95% intervals for population mean PD parameters
39. Cutting feedback in graphical models
- In situations with sparse PK data relative to abundant PD data
  - feedback from the PD data may have undue influence on the PK parameter estimates in the joint PK-PD model
- Can cut the feedback by ignoring the PD contribution when estimating the PK parameters
  - similar to a two-stage analysis, but with full propagation of uncertainty
  - note: no longer a full Bayesian probability model!
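A toy sketch of the cut, assuming a hypothetical conjugate-normal stand-in for the PK-PD model: one "PK" parameter theta with PK data y_pk ~ N(theta, 1), and one "PD" parameter phi with PD data y_pd ~ N(theta + phi, 1), priors theta ~ N(0, 100) and phi ~ N(0, 1). The cut sampler draws theta from its posterior given the PK data only, then draws phi given each theta draw, so uncertainty in theta propagates forward but the PD data cannot feed back. A plain Gibbs sampler for the full joint model is included for contrast.

```python
import random
import statistics

def normal_update(prior_mean, prior_var, residuals, noise_var=1.0):
    """Conjugate posterior (mean, var) for mu when each residual r_j ~ N(mu, noise_var)."""
    precision = 1.0 / prior_var + len(residuals) / noise_var
    var = 1.0 / precision
    return var * (prior_mean / prior_var + sum(residuals) / noise_var), var

def cut_sampler(y_pk, y_pd, n_draws, rng):
    """Cut model: theta uses PK data only; phi is drawn given each theta draw."""
    m_t, v_t = normal_update(0.0, 100.0, y_pk)                  # p(theta | y_pk): no PD feedback
    draws = []
    for _ in range(n_draws):
        theta = rng.gauss(m_t, v_t ** 0.5)
        m_p, v_p = normal_update(0.0, 1.0, [y - theta for y in y_pd])  # p(phi | theta, y_pd)
        draws.append((theta, rng.gauss(m_p, v_p ** 0.5)))
    return draws

def full_gibbs(y_pk, y_pd, n_draws, rng, phi=0.0):
    """Full joint model: the PD data also inform theta."""
    draws = []
    for _ in range(n_draws):
        m_t, v_t = normal_update(0.0, 100.0, list(y_pk) + [y - phi for y in y_pd])
        theta = rng.gauss(m_t, v_t ** 0.5)
        m_p, v_p = normal_update(0.0, 1.0, [y - theta for y in y_pd])
        phi = rng.gauss(m_p, v_p ** 0.5)
        draws.append((theta, phi))
    return draws

rng = random.Random(3)
y_pk = [2.0, 2.2, 1.8]   # sparse hypothetical PK data
y_pd = [5.0] * 30        # abundant hypothetical PD data
cut_theta = statistics.mean(t for t, _ in cut_sampler(y_pk, y_pd, 20_000, rng))
full_theta = statistics.mean(t for t, _ in full_gibbs(y_pk, y_pd, 20_000, rng))
print(f"posterior mean of theta: cut {cut_theta:.2f}, full {full_theta:.2f}")
```

With these toy numbers the cut posterior mean of theta stays near what the PK data support (about 2.0), while in the full model the abundant PD data pull it to roughly 2.7. The cut draws do not come from the posterior of any single joint probability model, matching the caveat on the slide.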
40. Graph of cut PK-PD model
41. Posterior means and 95% intervals for population mean PK parameters
42. Posterior means and 95% intervals for population mean PD parameters
43. Concluding remarks
- Graphical models are a powerful and flexible tool for building realistic statistical models for complex problems
- Applicable in many domains
- Allow subject-matter knowledge to be exploited
- Allow multiple data sources to be combined formally
- Built on rigorous mathematics
- Ideally suited to Bayesian inferential methods
Thank you for your attention!