Extending OPUS to wider domains: Graphical models as a tool for combining multiple sources of information in the health and social sciences

1
Extending OPUS to wider domains: Graphical models
as a tool for combining multiple sources of
information in the health and social sciences
  • Nicky Best
  • Imperial College
  • Thanks to
  • Sylvia Richardson, Chris Jackson, David
    Spiegelhalter, Dave Lunn

2
Outline
  • Graphical modelling as a tool for building
    complex models
  • Case study 1: Water disinfection byproducts and
    adverse birth outcomes
  • Modelling multiple sources of bias in
    observational studies
  • Case study 2: Socioeconomic factors and heart
    disease
  • Combining individual and aggregate level data
  • Simulation study
  • Case study 3: Population pharmacokinetic-dynamic
    models
  • Modelling drug dose → concentration → effect
    pathways
  • Cutting feedback in graphical models

3
Graphical modelling
Modelling
Mathematics
Inference
Algorithms
4
Simple example of graphical model
Mendelian inheritance
  • Nodes represent variables
  • (Absence of) links represent conditional
    independence assumptions
  • Y, Z: genotypes of the parents
  • W, X: genotypes of 2 children
  • If we know the genotypes of the parents, then the
    children's genotypes are conditionally independent
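
The conditional independence claim can be checked numerically. The sketch below is only an illustration under stated assumptions: a single locus with two alleles, genotypes coded as the number of copies of one allele, and parental genotypes drawn from Hardy-Weinberg proportions with an arbitrary allele frequency `Q`. None of these details come from the slides.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # copies of allele "a" carried at one locus
Q = 0.3                # assumed population frequency of allele "a"

def parent_prob(g):
    """Hardy-Weinberg genotype probability (an assumption for the parents)."""
    return {0: (1 - Q) ** 2, 1: 2 * Q * (1 - Q), 2: Q ** 2}[g]

def child_prob(c, y, z):
    """P(child = c | parents y, z): each parent passes on 'a' with prob g/2."""
    ty, tz = y / 2, z / 2
    return {0: (1 - ty) * (1 - tz),
            1: ty * (1 - tz) + (1 - ty) * tz,
            2: ty * tz}[c]

# Joint table built from the graph: P(Y) P(Z) P(W | Y, Z) P(X | Y, Z)
joint = {(y, z, w, x): parent_prob(y) * parent_prob(z)
                       * child_prob(w, y, z) * child_prob(x, y, z)
         for y, z, w, x in product(GENOTYPES, repeat=4)}

def marginal(keep):
    """Sum the joint table over every variable whose index is not in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_w, p_x, p_wx = marginal([2]), marginal([3]), marginal([2, 3])

# Marginally the siblings are dependent: P(W, X) differs from P(W) P(X).
marginal_gap = max(abs(p_wx[(w, x)] - p_w[(w,)] * p_x[(x,)])
                   for w, x in product(GENOTYPES, repeat=2))

# Given both parents they are independent:
# P(W, X | Y, Z) = P(W | Y, Z) P(X | Y, Z).
p_yz, p_yzw, p_yzx = marginal([0, 1]), marginal([0, 1, 2]), marginal([0, 1, 3])
conditional_gap = max(
    abs(joint[(y, z, w, x)] / p_yz[(y, z)]
        - (p_yzw[(y, z, w)] / p_yz[(y, z)]) * (p_yzx[(y, z, x)] / p_yz[(y, z)]))
    for y, z, w, x in product(GENOTYPES, repeat=4))

print(f"marginal gap: {marginal_gap:.4f}, conditional gap: {conditional_gap:.2e}")
```

The marginal gap is clearly non-zero (siblings resemble each other), while the conditional gap is zero up to floating-point error, which is what the missing link between W and X encodes.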

5
Joint distributions and graphical models
  • Use ideas from graph theory to
  • represent the structure of a joint probability
    distribution
  • by encoding conditional independencies

[Figure: directed graph on nodes A, B, C, D, E, F]
  • Factorization theorem
  • Joint distribution P(V) = ∏ P(v | parents[v])

P(A,B,C,D,E,F) = P(A|C) P(B|D,E) P(C|D,E) P(D)
P(E) P(F|D,E)
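
As a rough illustration of the factorization, the sketch below builds the joint distribution for this graph from one conditional table per node. The parent sets are read off the factorization above; the nodes are assumed binary and the table entries are arbitrary random numbers, purely for demonstration.

```python
import random
from itertools import product

# Parent sets read off the factorization on the slide.
PARENTS = {"A": ["C"], "B": ["D", "E"], "C": ["D", "E"],
           "D": [], "E": [], "F": ["D", "E"]}
NODES = sorted(PARENTS)

rng = random.Random(0)
# Hypothetical conditional probability tables for binary nodes:
# cpt[v][parent_values] = P(v = 1 | parents), filled with arbitrary numbers.
cpt = {v: {pa: rng.uniform(0.1, 0.9)
           for pa in product((0, 1), repeat=len(PARENTS[v]))}
       for v in NODES}

def joint_prob(state):
    """P(V) as the product over nodes of P(v | parents[v])."""
    p = 1.0
    for v in NODES:
        pa = tuple(state[u] for u in PARENTS[v])
        p1 = cpt[v][pa]
        p *= p1 if state[v] == 1 else 1.0 - p1
    return p

states = [dict(zip(NODES, vals)) for vals in product((0, 1), repeat=len(NODES))]
total = sum(joint_prob(s) for s in states)

# An unrestricted table over six binary nodes needs 2**6 - 1 = 63 free numbers;
# the factorized form needs one per parent configuration of each node.
n_free = sum(len(table) for table in cpt.values())
print(f"sum of joint = {total:.6f}, free parameters = {n_free}")
```

The local tables multiply to a valid joint distribution without any global normalisation, and far fewer numbers are needed than for an unrestricted joint table.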
6
Building complex models
[Figure: the example graph on nodes A, B, C, D, E, F]
  • Conditional independence provides mathematical
    basis for splitting up large system into smaller
    components

7
Building complex models
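[Figure: the example graph split into smaller components, with nodes C, D and E appearing in more than one component]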
  • Conditional independence provides mathematical
    basis for splitting up large system into smaller
    components

8
Building complex models
  • Key idea
  • understand complex system
  • through global model
  • built from small pieces
  • comprehensible
  • each with only a few variables
  • modular

9
Case study 1
  • Epidemiological study of low birth weight and
    mothers' exposure to water disinfection
    byproducts
  • Background
  • Chlorine added to tap water supply for
    disinfection
  • Reacts with natural organic matter in water to
    form unwanted byproducts (including
    trihalomethanes, THMs)
  • Some evidence of adverse health effects (cancer,
    birth defects) associated with exposure to high
    levels of THM
  • We are carrying out a study in Great Britain using
    routine data, to investigate the risk of low birth
    weight associated with exposure to different THM
    levels

10
Data sources
  • National postcoded births register
  • Routinely monitored THM concentrations in tap
    water samples for each water supply zone within
    14 different water company regions
  • Census data: area-level socioeconomic factors
  • Millennium Cohort Study (MCS): individual-level
    outcomes and confounder data on a sample of mothers
  • Literature relating to factors affecting personal
    exposure (uptake factors, water consumption, etc.)

11
Model for combining data sources
[Figure: graphical model linking the nodes f, s2, THMzt (true), THMztj (raw), THMik (mother), THMim (mother), bT, bc, yik, yim, cik, cim and qi]
12
Regression sub-model (national data)
Regression model for national data relating risk
of low birth weight (yik) to mother's THM
exposure and other confounders (cik)
k indexes mother; i indexes groups (areas)
[Figure: the graph from slide 11, repeated]
13
Regression sub-model (MCS)
Regression model for MCS data relating risk of
low birth weight (yim) to mother's THM exposure
and other confounders (cim)
m indexes mother; i indexes groups (areas)
[Figure: the graph from slide 11, repeated]
14
Missing confounders sub-model
Missing data model to estimate confounders (cik)
for mothers in national data, using information
on within-area distribution of confounders in MCS
k indexes mother (national); i indexes groups
(areas); m indexes mother (MCS)
[Figure: the graph from slide 11, repeated]
15
THM measurement error sub-model
Model to estimate true tap water THM
concentration from raw data
z indexes water zone; t indexes season; j indexes
replicates
[Figure: the graph from slide 11, repeated]
16
THM personal exposure sub-model
Model to predict personal exposure using
estimated tap water THM level and literature on
distribution of factors affecting individual
uptake of THM
[Figure: the graph from slide 11, repeated]
17
Inference
[Figure: the graph from slide 11, with a legend distinguishing Data nodes from Unknowns]
18
Case study 2
  • Socioeconomic factors affecting health
  • Background
  • Interested in individual and contextual effects
    of socioeconomic determinants of health
  • Often investigated using multi-level studies
    (individuals within areas)
  • Ecological studies also widely used in
    epidemiology and social sciences due to
    availability of small-area data
  • investigate relationships at level of group,
    rather than individual
  • outcome and exposures are available as
    group-level summaries
  • usual aim is to transfer inference to individual
    level

19
Case study 2
  • Example: Socioeconomic risk factors for heart
    disease
  • Health outcome
  • Hospital admissions for heart disease in adults
    living in London
  • Socioeconomic risk factors
  • ethnicity, social class, education, area
    deprivation
  • Data sources
  • Individual: Health Survey for England (with ward
    identifier)
  • 1-54 observations per ward (median 7)
  • Aggregate (ward) outcomes: Hospital Episode
    Statistics
  • Aggregate (ward) risk factors: 1991 Census

20
Building the model
  • Multilevel model for individual data

[Figure: graph with nodes b0, bx, bZ, s2, ai, xik, yik and Zi; labels: person k, area i]
21
Building the model
  • Multilevel model for individual data

yik ~ Bernoulli(pik), person k, area i
log pik = b0 + bx xik + bZ Zi + ai
[Figure: multilevel graph with nodes b0, bx, bZ, s2, ai, xik, yik and Zi; labels: person k, area i]
22
Building the model
  • Multilevel model for individual data

yik ~ Bernoulli(pik), person k, area i
log pik = b0 + bx xik + bZ Zi + ai
ai ~ Normal(0, s2)
[Figure: multilevel graph with nodes b0, bx, bZ, s2, ai, xik, yik and Zi; labels: person k, area i]
23
Building the model
  • Multilevel model for individual data

yik ~ Bernoulli(pik), person k, area i
log pik = b0 + bx xik + bZ Zi + ai
ai ~ Normal(0, s2)
Priors on s2, b0, bx, bZ
[Figure: multilevel graph with nodes b0, bx, bZ, s2, ai, xik, yik and Zi; labels: person k, area i]
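
To make the generative structure of the multilevel model concrete, here is a small simulation sketch. All numerical values, the binary form of xik and the standard normal Zi are illustrative assumptions, not taken from the talk; the priors are left out because the parameters are simply fixed for simulation.

```python
import math
import random

def simulate_multilevel(n_areas=50, n_per_area=7, b0=math.log(0.05),
                        bx=0.7, bZ=0.3, s2=0.1, seed=1):
    """Draw (area, x_ik, Z_i, y_ik) records from the multilevel model."""
    rng = random.Random(seed)
    records = []
    for i in range(n_areas):
        a_i = rng.gauss(0.0, math.sqrt(s2))   # a_i ~ Normal(0, s2)
        Z_i = rng.gauss(0.0, 1.0)             # area-level covariate (illustrative)
        for _ in range(n_per_area):
            x_ik = 1 if rng.random() < 0.3 else 0   # binary individual covariate
            # log p_ik = b0 + bx x_ik + bZ Z_i + a_i, capped so p_ik stays a probability
            p_ik = min(1.0, math.exp(b0 + bx * x_ik + bZ * Z_i + a_i))
            y_ik = 1 if rng.random() < p_ik else 0  # y_ik ~ Bernoulli(p_ik)
            records.append((i, x_ik, Z_i, y_ik))
    return records

data = simulate_multilevel()
print(len(data), "records;", sum(r[3] for r in data), "events")
```

With only a handful of people per area and a rare outcome, the simulated events are sparse, which hints at why survey data alone carry limited information.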
24
Building the model
  • Ecological model

[Figure: graph with nodes b0, bx, bZ, s2, ai, Zi, Yi and Ni; label: area i]
25
Building the model
Ecological model: Yi ~ Binomial(qi, Ni), area i
qi = ∫ pik(xik, Zi, ai, b0, bx, bZ) fi(x) dx
[Figure: ecological graph with nodes b0, bx, bZ, s2, ai, Zi, Yi and Ni; label: area i]
26
Building the model
Ecological model: Yi ~ Binomial(qi, Ni), area i
qi = ∫ pik(xik, Zi, ai, b0, bx, bZ) fi(x) dx
ai ~ Normal(0, s2)
[Figure: ecological graph with nodes b0, bx, bZ, s2, ai, Zi, Yi and Ni; label: area i]
27
Building the model
Ecological model: Yi ~ Binomial(qi, Ni), area i
qi = ∫ pik(xik, Zi, ai, b0, bx, bZ) fi(x) dx
ai ~ Normal(0, s2)
Priors on s2, b0, bx, bZ
[Figure: ecological graph with nodes b0, bx, bZ, s2, ai, Zi, Yi and Ni; label: area i]
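
The integral defining qi is easiest to see in a special case. The sketch below assumes x is binary with a within-area proportion `phi_i` exposed (a hypothetical name, as are all the numbers), so that the integral becomes a two-term weighted sum. It also contrasts the integrated risk with a naive plug-in of the area mean of x.

```python
import math

def individual_risk(x, Z_i, a_i, b0, bx, bZ):
    """p_ik under the log-linear individual model."""
    return math.exp(b0 + bx * x + bZ * Z_i + a_i)

def area_risk(phi_i, Z_i, a_i, b0, bx, bZ):
    """q_i: individual risk averaged over a binary x with P(x = 1) = phi_i."""
    return ((1.0 - phi_i) * individual_risk(0, Z_i, a_i, b0, bx, bZ)
            + phi_i * individual_risk(1, Z_i, a_i, b0, bx, bZ))

def naive_area_risk(phi_i, Z_i, a_i, b0, bx, bZ):
    """Plugs the area mean of x into the individual formula (not the integral)."""
    return individual_risk(phi_i, Z_i, a_i, b0, bx, bZ)

params = dict(b0=math.log(0.05), bx=1.0, bZ=0.2)
q_int = area_risk(0.4, Z_i=0.5, a_i=0.0, **params)
q_naive = naive_area_risk(0.4, Z_i=0.5, a_i=0.0, **params)
print(f"integrated q_i = {q_int:.4f}, naive plug-in = {q_naive:.4f}")
```

Because the exponential is convex, the integrated risk is at least as large as the plug-in value, so a regression of area rates on area means does not in general recover bx.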
28
Combining individual and aggregate data
  • Individual level survey data often lack power to
    inform about contextual and/or individual-level
    effects
  • Even when correct (integrated) model used,
    ecological data often contain little information
    about some or all effects of interest
  • Can we improve inference by combining both types
    of model / data?

29
Combining individual and aggregate data
Multilevel model for individual data
Ecological model
[Figure: the two graphs side by side. Multilevel model: b0, bx, bZ, s2, ai, xik, yik, Zi (person k, area i). Ecological model: b0, bx, bZ, s2, ai, Zi, Yi, Ni (area i)]
30
Combining individual and aggregate data
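Hierarchical Related Regression (HRR) model
[Figure: a single graph in which b0, bx, bZ, s2 and ai are shared by the individual data (xik, yik, person k) and the aggregate data (Yi, Ni), together with Zi, within area i]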
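
One way to read the combined graph is that both data sources contribute likelihood terms for the same parameters. The sketch below writes that joint log-likelihood for a binary x with within-area proportion `phi_i`; the data layout, the function names and the toy numbers are hypothetical, and the Normal(0, s2) term for ai and the priors are omitted for brevity.

```python
import math

def log_binomial_pmf(y, n, q):
    """Log of the Binomial(n, q) probability of y events."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(q) + (n - y) * math.log(1.0 - q))

def hrr_log_likelihood(b0, bx, bZ, a, individuals, areas):
    """
    individuals: list of (i, x_ik, y_ik) survey records
    areas:       dict i -> (Z_i, phi_i, Y_i, N_i) aggregate records
    a:           dict i -> a_i area random effects
    """
    ll = 0.0
    # Individual-level part: y_ik ~ Bernoulli(p_ik)
    for i, x, y in individuals:
        Z_i = areas[i][0]
        p = math.exp(b0 + bx * x + bZ * Z_i + a[i])
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    # Aggregate part: Y_i ~ Binomial with the integrated risk q_i,
    # using the same b0, bx, bZ and a_i as above
    for i, (Z_i, phi_i, Y_i, N_i) in areas.items():
        q = math.exp(b0 + bZ * Z_i + a[i]) * (1.0 - phi_i + phi_i * math.exp(bx))
        ll += log_binomial_pmf(Y_i, N_i, q)
    return ll

# Toy inputs, invented for illustration only
b0, bx, bZ = math.log(0.05), 0.7, 0.3
areas = {0: (0.5, 0.4, 12, 200), 1: (-0.2, 0.1, 5, 150)}
individuals = [(0, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 0)]
a = {0: 0.0, 1: 0.1}
ll = hrr_log_likelihood(b0, bx, bZ, a, individuals, areas)
print(f"joint log-likelihood = {ll:.3f}")
```

Since the two parts share every parameter, information from the aggregate counts and from the survey records is pooled in a single fit.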
31
Simulation Study
32
Comments
  • Inference from aggregate data can be unbiased
    provided exposure contrasts between areas are
    high (and appropriate integrated model used)
  • Combining aggregate data with small samples of
    individual data can reduce bias when exposure
    contrasts are low
  • Combining individual and aggregate data can
    reduce the MSE of estimated effects compared to
    individual data alone

33
Case Study 3
  • Generic scenario: linking together exposure
    sub-models and response sub-models
  • Example: population pharmacokinetic (PK)
    pharmacodynamic (PD) models
  • PK data provide information about a drug's
    concentration-time profile (exposure sub-model)
  • PD data provide information about the
    concentration-response profile (response sub-model)
  • Graphical models provide natural tool for
    combining PK and PD data

34
PK model
35
PD model
36
Graph of PK-PD model
37
Posterior mean and 95% intervals for population
mean PK parameters
38
Posterior mean and 95% intervals for population
mean PD parameters
39
Cutting feedback in graphical models
  • In situations with sparse PK data relative to
    abundant PD data
  • Feedback from PD data may be having undue
    influence on PK parameter estimates in joint
    PK-PD model
  • Can cut feedback by ignoring PD contribution
    when estimating PK parameters
  • Similar to a two-stage analysis, but with full
    propagation of uncertainty
  • Note: no longer a full Bayesian probability model!
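
The idea of cutting feedback can be sketched on a toy problem. The example below is not the PK-PD model from the talk: it assumes a scalar `theta` standing in for a PK parameter with three noisy observations, a scalar `gamma` standing in for a PD parameter with a Normal(0, 1) prior, and PD observations with mean theta + gamma. These choices are invented so that every distribution is available in closed form.

```python
import random
import statistics

rng = random.Random(42)

# Toy stand-ins (hypothetical): theta plays the PK parameter, gamma the PD one.
x = [rng.gauss(1.0, 1.0) for _ in range(3)]    # sparse "PK" data, x ~ N(theta, 1)
y = [rng.gauss(3.0, 1.0) for _ in range(200)]  # abundant "PD" data, y ~ N(theta + gamma, 1)
n, m = len(x), len(y)
xbar, ybar = statistics.fmean(x), statistics.fmean(y)

def cut_draws(n_draws=5000):
    """Sample (theta, gamma) with feedback from y to theta cut."""
    draws = []
    for _ in range(n_draws):
        # Stage 1: theta from the PK data only (flat prior), ignoring y.
        theta = rng.gauss(xbar, (1.0 / n) ** 0.5)
        # Stage 2: gamma from its conditional given this theta and y,
        # combining the N(0, 1) prior with the m PD observations.
        gamma = rng.gauss(m * (ybar - theta) / (m + 1), (1.0 / (m + 1)) ** 0.5)
        draws.append((theta, gamma))
    return draws

draws = cut_draws()
cut_theta_mean = statistics.fmean([t for t, _ in draws])

# Full-model posterior mean of theta, in closed form for this toy model:
# ybar carries precision 1 / (1 + 1/m) about theta once gamma is integrated out.
w_pd = 1.0 / (1.0 + 1.0 / m)
full_theta_mean = (n * xbar + w_pd * ybar) / (n + w_pd)

print(f"xbar = {xbar:.3f}, cut mean = {cut_theta_mean:.3f}, "
      f"full mean = {full_theta_mean:.3f}")
```

In the cut version the estimate of theta depends on the PK data alone, while each gamma draw still uses a fresh theta draw, so uncertainty in theta is carried forward rather than fixed at a point estimate. In the full model the PD data pull theta towards ybar.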

40
Graph of cut PK-PD model
41
Posterior mean and 95% intervals for population
mean PK parameters
42
Posterior mean and 95% intervals for population
mean PD parameters
43
Concluding Remarks
  • Graphical models are a powerful and flexible tool
    for building realistic statistical models for
    complex problems
  • Applicable in many domains
  • Allow subject-matter knowledge to be exploited
  • Allow formal combining of multiple data sources
  • Built on rigorous mathematics
  • Ideally suited to Bayesian inferential methods

Thank you for your attention!