Title: Monte Carlo Maximum Likelihood Methods for Estimating Uncertainty Arising from Shared Errors in Exposures in Epidemiological Studies
1. Monte Carlo Maximum Likelihood Methods for Estimating Uncertainty Arising from Shared Errors in Exposures in Epidemiological Studies
- Daniel O. Stram
- University of Southern California
2. Complex Dosimetry Systems: a Working Definition (my definition)
- A complex dosimetry system for the study of an environmental exposure is one in which no single best exposure estimate is provided
- Instead, a distribution of possible true exposures is developed, together with a computer program that generates exposure replications from this distribution
- Generates doses conditional on input data
- Both shared and unshared errors are incorporated into the dose replications
- The statistician/epidemiologist treats this system as a black box, i.e. one that (s)he can manipulate, but doesn't know (or care?) about its inner workings
3. Some examples of epidemiological studies (of radiation) that use a complex dosimetry system to estimate doses
- Utah Thyroid Disease Cohort Study
- Hanford Thyroid Disease Study
- Colorado Plateau Uranium Miners Study
- In such studies limited or no direct measurements of individual dose exist. Instead, a complex dose reconstruction (Utah, Hanford) or interpolation system (Colorado) is used to construct individual dose estimates or histories.
4.
- Even when all subjects in the study have (radiation) badge measurements, these may need adjustments to reflect temporal or geographical differences in monitoring technology
- Random errors and systematic biases exist for virtually any method
- Information about the size of the random errors and systematic biases for each dosimeter type comes from only a few experiments
- Therefore there may be considerable uncertainty in the systematic biases for any single dosimeter
- Systematic biases constitute shared error
5. Representation of uncertainty in complex dosimetry systems
- Uncertainty in the dose estimates produced by these systems is increasingly characterized using Monte-Carlo methods, which yield many realizations of possible dose rather than a single best estimate of dose for each subject.
- Part of the uncertainty of these estimates may be due to lack of knowledge of factors that simultaneously influence some or all of the subjects' doses
6. Dose estimation in the Hanford Thyroid Disease Study
- Reconstruction based on physical modeling and some measurements of:
- Releases of I-131
- Deposition and pasture retention of I-131
- Pasture practices
- Milk transfer coefficients
- Individual consumption of milk
- Note that errors in most of these will affect doses for all individuals simultaneously
7. Colorado Plateau Underground Miners Study
- Dose estimates created using a complex exposure history / job history matrix
- The PHS exposure history matrix consisted of interpolations of limited WLM measurements, temporally and geographically
- Stram et al. 1999 used a PHS-developed hierarchy of mines within localities within districts and used a multilevel model to mimic temporal and geographical variation in dose.
- The 1999 analysis was based upon the regression-substitution method, in which E(true dose | all measurements) was computed for each mine-year after fitting the lognormal multilevel model to the WLM measurements
- Errors in mine-year measurements are correlated by the interpolation system used, and many miners work in the same mines, leading to correlated errors in the exposure history of each miner.
8. Pooled Nuclear Workers
- Multi-facility, multi-year study
- Each worker had badge measurements, but the technologies changed through time and across facilities.
- The systematic errors in each badge type are shared by all subjects working at the time the badge was in use
- For many, but not all, types of personal monitor some limited work (using phantoms, etc.) has been done to assess the relationship between true exposure and the badge measurement
- One important issue is whether the low dose-rate exposures of the N workers produce risks that are in line with those seen for the A-bomb survivors
- Upper confidence intervals that take account of shared dosimetry error are needed
9. Monte-Carlo Dosimetry
- Adopts a Bayesian framework
- Is Bayesian about sampling error in the experimental work (with badges), interpreted as giving posterior distributions
- Prior distributions for uncertain parameters (for N workers, the likely size of biases for unmeasured badges) using expert opinion
- For each replication the uncertain parameters are sampled from their distribution and combined with samples of other random factors (e.g. local meteorology for the Hanford or Utah studies) and with all relevant individual data for each subject (location, milk consumption, age, etc.)
- Each set of random quantities is combined with individual data to form dose estimates for each individual
10.
- Let us assume that the dose replications really may be regarded as samples from the distribution of true dose given all the individual data
- For retrospective dose-reconstruction systems this assumption may be a very large leap of faith
- For other studies using badge calibration (Workers) or measurement interpolation this may be considerably more solidly founded.
- Consider the sampling characteristics of frequentist inference concerning risk estimation. We want to know the influence of uncertainty on:
- The power to detect an effect (of exposure on risk of disease) of a certain size
- Confidence limits on estimated risk parameters
11. An idealized dosimetry system
- Assume each replication of dose is a sample from the joint distribution
- f(X_1, X_2, ..., X_N | W_1, W_2, ..., W_N)
- of true dose given the input data W_i recorded for all subjects. Because many realizations from f(X | W) are available, we can calculate
- Z_i = E(X_i | W)
- as the average over a very large number of realizations X_i^r, where X^r ~ f(X | W), r = 1, ..., R
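This averaging step can be sketched in Python. The dosimetry "black box" below is a toy stand-in: the lognormal error components and their scales are illustrative assumptions, not the systems discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_reps = 100, 5000

# Stand-in for a per-subject summary of the input data W_i
nominal = rng.uniform(0.5, 2.0, size=n_subjects)

def draw_replication():
    """One realization X^r ~ f(X | W): a shared multiplicative factor
    (common to all subjects) times unshared per-subject factors,
    each lognormal with arithmetic mean 1."""
    eps_shared = rng.lognormal(mean=-0.02, sigma=0.2)
    eps_unshared = rng.lognormal(mean=-0.045, sigma=0.3, size=n_subjects)
    return nominal * eps_shared * eps_unshared

# Z_i = E(X_i | W), estimated by averaging a large number of realizations
X_reps = np.array([draw_replication() for _ in range(n_reps)])
Z = X_reps.mean(axis=0)
```

With mean-1 error factors, Z converges to the nominal per-subject doses as the number of replications grows.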
12. How should an epidemiologist deal with the uncertainty in the random replications of dose?
- We are interested in estimating parameters in the dose-response function for disease D_i given true dose X_i, specifically the relationship
- E(D_i | X_i)
- parameterized by b (the dose-response slope)
13. Simplifications of the disease model
- Assume a linear relation between D and X:
- E(D_i | X_i) = a + b X_i   (1)
- Linear models are of interest for at least two reasons:
- They may be important for radio-biological and radio-protection reasons even for binary disease outcomes (where logistic regression models are the standard)
- For small b it may be impossible to distinguish between linear and smooth nonlinear (e.g. logistic) dose-response shapes
- A study with good power to detect a dose-response relationship may have very poor power to fully define the shape of the response
14. Berkson error models
- If the errors in the Z_i's defined above are independent from one another, then fitting model (1) is done by replacement of true X_i with Z_i.
- This is a Berkson error model in the sense that the truth is distributed around the measured value. Regression-substitution yields unbiased estimates.
- The classical error model has the measurement distributed around the truth. This produces risk estimates that are biased towards the null.
15. Impact of independent measurement error
- For either Berkson or classical error models, the most important effect of random error is loss of power to detect nonzero risk estimates
- If R^2 is the squared correlation between true exposure X and measured exposure Z, then it will take 1/R^2 times as many subjects to detect the same risk using Z as using true X.
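A small simulation illustrates the contrast drawn in the two slides above: regression-substitution is unbiased under Berkson error but attenuated under classical error. All distributions and scales here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, a, b = 20000, 1.0, 2.0
s_err, s_d = 0.5, 0.5

# Berkson: the truth X scatters around the assigned value Z
Z_berk = rng.normal(0.0, 1.0, n)
X_berk = Z_berk + rng.normal(0.0, s_err, n)
D_berk = a + b * X_berk + rng.normal(0.0, s_d, n)

# Classical: the measurement Z scatters around the truth X
X_cls = rng.normal(0.0, 1.0, n)
Z_cls = X_cls + rng.normal(0.0, s_err, n)
D_cls = a + b * X_cls + rng.normal(0.0, s_d, n)

def slope(z, d):
    return np.polyfit(z, d, 1)[0]

b_berk = slope(Z_berk, D_berk)  # near b: regression-substitution is unbiased
b_cls = slope(Z_cls, D_cls)     # near b / (1 + s_err**2): attenuated to the null
```

Both fits lose power relative to using the true X, but only the classical fit is biased.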
16. Shared versus unshared dosimetry error
- A key distinction between the effects of shared versus unshared dosimetry error is their effect on the validity of sample variance estimates used to characterize the variability of the estimates
- Independent Berkson errors: the usual estimate of the standard error of the slope estimate remains valid despite the loss of power
- Independent classical errors: again the usual estimate of the standard error of the slope estimate generally remains valid despite
- The loss of power
- The attenuation in the dose-response parameter estimate
17. Dosimetry simplifications
- Adopt a generalization of the Berkson error model for the joint distribution of true X_i around its conditional mean Z_i which incorporates both shared and unshared errors:
- X_i = Z_i ε_SM ε_M,i + ε_SA + ε_A,i
- ε_SM is shared multiplicative error with mean 1
- ε_M,i is unshared multiplicative error with mean 1
- ε_SA is shared additive error with mean 0
- ε_A,i is unshared additive error with mean 0
18.
- Under this shared and unshared multiplicative and additive (SUMA) error model we have E(X | W) = Z (the usual Berkson property) over the distribution of all four ε
- What happens when we fit
- E(D_i | Z_i) = a + b Z_i ?
- If there are no measurement errors, Var(ε) = 0, we will have (for small values of b) the usual variance expression
- Var(b̂) = σ_D^2 / Σ_i (Z_i − Z̄)^2   (1)
19.
- Effects of shared and unshared errors on estimation
- We are interested in three questions regarding each error component in the SUMA model:
- What is its effect on study power?
- What is its effect on the validity of expression (1) for the variance of the estimate of b?
- How valid are the estimates of study power when they are based on expression (1)?
20.
- Shared additive error has little effect on either the estimation of b or on the variability of the estimate
- Unshared additive or multiplicative errors reduce the correlation, R, between X_i and Z_i, thereby reducing study power; the reduction in study efficiency due to unshared measurement error is roughly proportional to R^2
- However, the validity of expression (1) for the variance of the estimator remains appropriate. Further, the estimate of study power using (1) remains appropriate
21. Effect of multiplicative shared error
- Averaging over the distribution of random ε_SM we retain the Berkson property that E(X | W) = Z
- But with
- Var(b̂) = σ_D^2 / Σ_i (Z_i − Z̄)^2 + b^2 Var(ε_SM)   (2)
22.
- Notice that if b = 0, the naïve estimate of the variance of b̂ ignoring the shared error is equal to the true variance of this parameter
- If b > 0, the naïve estimate of the variance is biased downward by b^2 Var(ε_SM)
23.
- We conclude:
- Ignoring shared error does not affect the validity of the test of the null hypothesis that b = 0, because expression (2) = expression (1) when b = 0
- More generally, non-differential measurement error weakens the power of, but does not invalidate, a test of association between disease and exposure
- Ignoring shared error will overstate the power to detect a b > 0, because (1) < (2) in this case
24.
- Ignoring shared error will result in confidence limits that are too narrow
- However, it is the upper confidence limit that is most affected.
- If the lower confidence limit ignoring shared error does not include zero, correcting for shared error will not cause it to include zero (because of conclusion 1)
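The downward bias in the naive variance is easy to see in a short simulation with shared multiplicative error only (the distributions and scales are illustrative): under b = 0 the slope estimates have the usual small spread, while under b > 0 a single shared factor per study inflates their true variability far beyond what the naive formula reports.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims, s_sm = 500, 2000, 0.3
Z = rng.uniform(0.0, 2.0, n)

def fitted_slope(b_true):
    """Simulate one whole study: a single shared multiplicative factor
    perturbs every subject's dose, then regress D on Z."""
    e_sm = rng.lognormal(-s_sm**2 / 2, s_sm)   # one draw shared by all subjects
    X = Z * e_sm
    D = b_true * X + rng.normal(0.0, 0.5, n)
    return np.polyfit(Z, D, 1)[0]

slopes_null = np.array([fitted_slope(0.0) for _ in range(n_sims)])
slopes_alt = np.array([fitted_slope(1.0) for _ in range(n_sims)])
# Under b = 0 the spread matches the naive formula; under b = 1 the shared
# factor adds roughly b^2 * Var(e_sm) to the variance of the slope estimate.
```

This mirrors conclusion 1: the null-hypothesis test is unharmed, but confidence limits around a nonzero b̂ are too narrow if the shared error is ignored.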
25. How to incorporate shared measurement error directly into an analysis
- Multiple imputation
- Full Parametric Bootstrap
- Likelihood analysis with MCML
26. Multiple Imputation
- It is tempting to try to quantify the uncertainty in b̂ by regressing D_i on each set of X^r and using the quantiles of the resulting slope estimates as confidence limits for b
- This ignores the sampling variability of D
- Moreover, the distribution of the slope estimates can be badly biased towards the null value. Essentially there is a reintroduction of classical error into the problem
- True multiple imputation requires sampling X^r from the distribution of X given both the input data W and the outcomes D_i (not just W) to remove these biases
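The reintroduction of classical error is easy to demonstrate: regressing D on fresh draws X^r ~ f(X | W), ignoring D, attenuates the slope toward the null, because relative to any one X^r the truth looks like X^r plus classical noise. A toy Berkson setup with illustrative scales:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_reps, s = 2000, 200, 0.5
b_true = 1.0

Z = rng.normal(0.0, 1.0, n)
X_true = Z + rng.normal(0.0, s, n)            # Berkson: truth scatters around Z
D = b_true * X_true + rng.normal(0.0, 0.3, n)

# Naive "imputation": regress D on fresh replications X^r drawn given W only
slopes = [np.polyfit(Z + rng.normal(0.0, s, n), D, 1)[0] for _ in range(n_reps)]

b_berkson = np.polyfit(Z, D, 1)[0]   # regression-substitution: near b_true
b_naive = float(np.mean(slopes))     # near b_true / (1 + s**2): biased down
```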
27. Full Parametric Bootstrap
- A simulation experiment in which b̂ is used as the true value of the risk parameter and both doses and outcomes D_i are simulated from a complete model
28. Monte-Carlo maximum likelihood
- We can compute likelihood ratio tests as follows:
- For null values a_0 and b_0, generate n samples X^r from the distribution of X given W and D
- For any test values a and b, compute the log likelihood ratio as
- log[(1/n) Σ_r L(a, b; D, X^r) / L(a_0, b_0; D, X^r)]   (5)
- If we use b_0 = 0 then we don't have to condition on D (so that we can use the dosimetry system directly)
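A minimal sketch of the MCML computation under a toy normal-outcome model with b_0 = 0, so the replications come straight from the dosimetry system. All names, distributions, and scales here are illustrative assumptions, not the actual analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_reps, sigma_d = 300, 400, 0.5
b_true = 0.8

# Toy dosimetry system: Berkson-style replications with shared and
# unshared multiplicative error around per-subject means Z
Z = rng.uniform(0.5, 2.0, n)

def dose_replication():
    return Z * rng.lognormal(-0.02, 0.2) * rng.lognormal(-0.045, 0.3, n)

X_true = dose_replication()
D = b_true * X_true + rng.normal(0.0, sigma_d, n)
X_reps = [dose_replication() for _ in range(n_reps)]

def loglik(a, b, X):
    """Normal-outcome log likelihood (constants dropped)."""
    resid = D - a - b * X
    return -0.5 * np.sum(resid**2) / sigma_d**2

def mcml_loglr(a, b):
    """Log of the average likelihood ratio (5) against the null a0 = b0 = 0,
    averaged over dose replications; log-sum-exp avoids overflow."""
    terms = np.array([loglik(a, b, X) - loglik(0.0, 0.0, X) for X in X_reps])
    m = terms.max()
    return m + np.log(np.mean(np.exp(terms - m)))
```

In practice one evaluates mcml_loglr over a grid of risk-parameter values and reads off the maximizer and likelihood-based confidence limits.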
29. Once we compute the likelihood, what do we do with it?
- We have a funny mishmash: we are being
- Bayesian about the doses
- Frequentist about the dose-response parameter
- Moreover, we can't really expect standard frequentist asymptotic likelihood theory to hold
- Suppose the number of subjects → ∞; then the distribution of b̂ will be dominated by the distribution of the shared multiplicative errors in the dosimetry system, the distribution of which is arbitrary.
- Is it still reasonable to use chi-square approximations to the distribution of changes in log likelihood?
30. Other problems
- If shared multiplicative error is large, then as b − b_0 gets large the summands in (5)
- become extremely variable
- Convergence of the average is incredibly slow
- Round-off error dominates the performance of the algorithm
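One standard mitigation for the round-off problem is to average the summands of (5) on the log scale with the log-sum-exp trick; a minimal helper:

```python
import numpy as np

def log_mean_exp(v):
    """Stable log of the average of exp(v): shift by the max so the largest
    summand becomes exp(0) = 1 and nothing overflows."""
    v = np.asarray(v, dtype=float)
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

# Log-likelihood-ratio summands can span hundreds of log units; the naive
# np.log(np.mean(np.exp(terms))) would overflow to infinity here.
terms = np.array([900.0, 850.0, 800.0])
stable = log_mean_exp(terms)   # close to 900 - log(3): the largest term dominates
```

This removes the overflow, though it does not fix the slow Monte-Carlo convergence when the summands themselves are extremely variable.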
31. Application to the ORNL N-workers data (Stayner et al., in review)
- Estimate a single risk parameter using (time-dependent) total dose in a partial likelihood analysis
- Write a computer program that simulates the bias factors for the badges used in those facilities and re-links the risk sets
32. Three analyses
- 1. Compute the MCML likelihood
- For each replication of doses compute the partial likelihoods over a 1-dimensional grid of risk parameters
- Average the partial likelihoods over the replications
- Pretend that the asymptotics still hold and compute a confidence interval
33.
- 2. Compute FPB estimates of b̂
- Compare these to the MCML confidence intervals
- 3. For each set of D computed in 2, compute a separate MCML confidence interval (more simulations from the dose distribution)
- Count the number of times that the standard frequentist confidence interval contains the true value of the risk parameter
34. (No transcript)
35. FPB simulations
36. Some observations
- The MCML widens the confidence interval on the high side more than the low side
- The 90 percent asymptotic lower CI for the MCML does not include 0.
- This is good because (1) the uncorrected CI did not include 0 and (2) we claim that correcting for measurement error shouldn't affect the significance of a test of no association.
- Note that the two curves (uncorrected and MCML-corrected log likelihoods) are very close to parallel at b = 0
- This implies that a score test of b = 0 will be (nearly) identical using the corrected and uncorrected likelihoods using any significance criterion
- This observed result follows from Tosteson and Tsiatis 1988 on score tests for errors-in-variables problems
37.
- The FPB, on the other hand, puts significantly more than 5 percent of the estimates < 0 (68 of 1,000) and significantly fewer of the estimates (33 of 1,000) above the MCML UCI.
- This may actually be a promising observation for the validity of the MCML confidence intervals
- They tend to be skewed to the right (not symmetric around the MLE), so more (than 3.3 percent) of the upper confidence limits and fewer (than 6.8 percent) of the lower confidence limits should fail to contain the true value
- Simulations are now in progress
38. Validity of MCML CI
- Consider the limiting case when n → ∞: the slope estimate will be determined by the distribution of shared multiplicative errors
- Worst case would be the SUMA model
- Suppose that ε_SM is distributed as lognormal with arithmetic mean 1 (log-scale mean −σ^2/2)
- Then b̂ = b ε_SM is also distributed as lognormal with mean parameter log(b) − σ^2/2 and log variance σ^2
- Consider twice the change in log likelihood from true b to the MLE
- This will be (1/σ^2)[log(b̂) − (log(b) − σ^2/2)]^2, which is exactly chi-square with 1 df
- Consider next a normal distribution for the shared multiplicative error
- This would make sense if ε_SM was itself a sum of many components
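The exactness claim for the lognormal case can be checked numerically: with b̂ = b·ε_SM and ε_SM lognormal with arithmetic mean 1, the statistic (1/σ²)[log b̂ − (log b − σ²/2)]² should behave as a chi-square with 1 df (the values of σ and b below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
n_sims, sigma, b = 100_000, 0.4, 2.0

# Shared multiplicative error: lognormal with arithmetic mean 1,
# so the log-scale mean is -sigma^2 / 2
eps_sm = rng.lognormal(-sigma**2 / 2, sigma, n_sims)
b_hat = b * eps_sm

# Twice the change in log likelihood from the true b to the MLE:
# log(b_hat) ~ N(log(b) - sigma^2/2, sigma^2), so this is chi-square(1)
stat = (np.log(b_hat) - (np.log(b) - sigma**2 / 2))**2 / sigma**2
```

The simulated statistic should have mean near 1 and about 95 percent of its mass below the chi-square(1) critical value 3.84.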
39.
- For this model twice the change in log likelihood is of the form
- −2 log(ε_SM) + (1/σ^2)(ε_SM − 1)^2 + c
- where c = 2 log(1/2 + (1/2)√(1 + 4σ^2)) − [(1/2)√(1 + 4σ^2) − 1/2]^2 / σ^2
- What is the distribution of this random variable?
- How close is it to a chi-square with 1 df?
40. (No transcript)
41. Conclusions
- The MCML has promise, but it is complicated
- But other methods (multiple imputation, etc.) have complications of their own
- Score tests of b = 0 based on the average likelihood agree with analyses that ignore measurement errors
- Our application of the MCML method for partial likelihoods ignores the dilution effects described by Prentice (Biometrika 1982), but these are expected to be very small in most settings
- In shared-error settings the asymptotics are not correct for ordinary frequentist calculations, but it seems to be hard to come up with situations where they fail drastically
42. Acknowledgements
- Leslie Stayner, Stephen Gilbert (UIC/NIOSH)
- Elisabeth Cardis, Martine Vrijheid, Isabelle Deltour (IARC)
- Geoffrey Howe (Columbia)
- Terri Kang (USC)