
Monte Carlo Maximum Likelihood Methods for

Estimating Uncertainty Arising from Shared Errors

in Exposures in Epidemiological Studies

- Daniel O. Stram
- University of Southern California

Complex Dosimetry Systems: A Working Definition

(my definition)

- A complex dosimetry system for the study of an
  environmental exposure is one in which no single
  best exposure estimate is provided
- Instead a distribution of possible true exposures
  is developed, together with a computer program
  that generates exposure replications from this
  distribution
- Generates doses conditional on input data
- Both shared and unshared errors are incorporated
  into the dose replications
- The statistician/epidemiologist treats this
  system as a Black Box, i.e. one that (s)he can
  manipulate but doesn't know (or care?) about its
  inner workings

- Some examples of epidemiological studies (of
  radiation) that use a complex dosimetry system to
  estimate doses:
  - Utah Thyroid Disease Cohort Study
  - Hanford Thyroid Disease Study
  - Colorado Plateau Uranium Miners Study
- In such studies limited or no direct measurements
  of individual dose exist. Instead a complex dose
  reconstruction (Utah, Hanford) or interpolation
  system (Colorado) is used to construct individual
  dose estimates or histories.

- Even when all subjects in the study have
  (radiation) badge measurements, these may need
  adjustments to reflect temporal or geographical
  differences in monitoring technology
- Random errors and systematic biases exist for
  virtually any method
- Information about the size of random and
  systematic biases for each dosimeter type comes
  from only a few experiments
- Therefore there may be considerable uncertainty in
  the systematic biases for any single dosimeter
- Systematic biases constitute shared error

Representation of uncertainty in complex

dosimetry systems

- Uncertainty in the dose estimates produced by
  these systems is increasingly characterized using
  Monte-Carlo methods, which yield many
  realizations of possible dose rather than a
  single best estimate of dose for each subject.
- Part of the uncertainty of these estimates may be
  due to lack of knowledge of factors that
  simultaneously influence some or all of the
  subjects' doses

Dose estimation in the Hanford Thyroid Disease

Study

- Reconstruction based on physical modeling and
  some measurements of:
  - Releases of I-131
  - Deposition and pasture retention of I-131
  - Pasture practices
  - Milk transfer coefficients
  - Individual consumption of milk
- Note that errors in most of these will affect
  doses for all individuals simultaneously

Colorado Plateau Underground Miners Study

- Dose estimates created using a complex exposure
  history / job history matrix
- The PHS exposure history matrix consisted of
  interpolations of limited WLM measurements,
  temporally and geographically
- Stram et al. 1999 used a PHS-developed hierarchy
  of mines within localities within districts and
  used a multilevel model to mimic temporal and
  geographical variation in dose.
- The 1999 analysis was based upon the
  regression-substitution method, in which E(true
  dose | all measurements) was computed for each
  mine-year after fitting the lognormal multilevel
  model to the WLM measurements
- Errors in mine-year measurements are correlated
  by the interpolation system used, and many miners
  work in the same mines, leading to correlated
  errors in the exposure history of each miner.

Pooled Nuclear Workers

- Multi-facility, multi-year study
- Each worker had badge measurements, but the
  technologies changed through time and across
  facilities.
- The systematic errors in each badge type are
  shared by all subjects working at the time the
  badge was in use
- For many, but not all, types of personal monitor
  some limited work (using phantoms, etc.) has been
  done to assess the relationship between true
  exposure and the badge measurement
- One important issue is whether the low dose-rate
  exposures of the nuclear workers produce risks that
  are in line with those seen for the A-bomb survivors
- Upper confidence intervals that take account of
  shared dosimetry error are needed

Monte-Carlo Dosimetry

- Adopts a Bayesian framework
- Is Bayesian about sampling error in the
  experimental work (with badges), interpreted as
  giving posterior distributions
- Prior distributions for uncertain parameters (for
  the nuclear workers, the likely size of biases for
  unmeasured badges) are constructed using expert
  opinion
- For each replication the uncertain parameters are
  sampled from their distribution and combined with
  samples of other random factors (e.g. local
  meteorology for the Hanford or Utah studies) and
  with all relevant individual data for each
  subject (location, milk consumption, age, etc.)
- Each set of random quantities is combined with
  individual data to form dose estimates for each
  individual

- Let us assume that the dose replications really
  may be regarded as samples from the distribution
  of true dose given all the individual data
- For retrospective dose-reconstruction systems
  this assumption may be a very large leap of faith
- For other studies using badge calibration
  (Workers) or measurement interpolation this may
  be considerably more solidly founded.
- Consider the sampling characteristics of
  frequentist inference concerning risk estimation.
  We want to know the influence of uncertainty on:
- The power to detect an effect (of exposure on
  risk of disease) of a certain size
- Confidence limits on estimated risk parameters

An idealized dosimetry system

- Assume each replication of dose is a sample from
  the joint distribution
- f(X_1, X_2, ..., X_N | W_1, W_2, ..., W_N)
- of true dose given the input data W_i recorded
  for all subjects. Because many realizations
  from f(X | W) are available, we can calculate
- Z_i = E(X_i | W)
- as the average over a very large number of
  realizations X_i^r, where X^r ~ f(X | W), r = 1, ..., R
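As a concrete sketch (a hypothetical toy dosimetry system, not any of the studies above), the conditional means Z_i can be approximated by averaging many Monte Carlo dose replications produced by a black-box replication function:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([1.0, 2.0, 3.0])   # hypothetical input data for N = 3 subjects

def dose_replication(W, rng):
    """One draw X^r ~ f(X | W) from a toy system with a shared and an
    unshared multiplicative error, both with mean 1."""
    eps_shared = rng.lognormal(-0.3**2 / 2, 0.3)            # common to all subjects
    eps_unshared = rng.lognormal(-0.2**2 / 2, 0.2, len(W))  # one per subject
    return W * eps_shared * eps_unshared

R = 50_000
# Z_i = E(X_i | W), estimated as the average over R realizations
Z = np.mean([dose_replication(W, rng) for _ in range(R)], axis=0)
```

Because both error factors have mean 1, Z converges to W here; in a real system the replication function is the dosimetry code itself.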

How should an epidemiologist deal with the

uncertainty in the random replications of dose?

- We are interested in estimating parameters in the
  dose-response function for disease D_i given true
  dose X_i, specifically the relationship
- E(D_i | X_i)
- parameterized by b (the dose-response slope)

Simplifications of the disease model

- Assume a linear relation between D and X
- E(D_i | X_i) = a + b X_i   (1)
- Linear models are of interest for at least two
  reasons:
- They may be important for radio-biological and
  radio-protection reasons even for binary disease
  outcomes (where logistic regression models are
  the standard)
- For small b it may be impossible to distinguish
  between linear and smooth nonlinear (e.g.
  logistic) dose-response shapes
- A study with good power to detect a dose-response
  relationship may have very poor power to fully
  define the shape of the response

Berkson error models

- If the errors in the Z_i defined above are
  independent from one another, then fitting model
  (1) is done by replacing true X_i with Z_i.
- This is a Berkson error model in the sense that
  the truth is distributed around the measured
  value. Regression-substitution yields unbiased
  estimates.
- The classical error model has the measurement
  distributed around the truth. This produces risk
  estimates that are biased towards the null.
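The Berkson/classical contrast is easy to verify by simulation (a generic illustration with hypothetical values, not any of the dosimetry systems above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, b = 200_000, 2.0

# Berkson: true X scatters around the assigned value Z
Z = rng.normal(0.0, 1.0, n)
X = Z + rng.normal(0.0, 1.0, n)
D = b * X + rng.normal(0.0, 1.0, n)
b_berkson = np.polyfit(Z, D, 1)[0]      # unbiased: close to b = 2

# Classical: the measurement Zc scatters around true X
X = rng.normal(0.0, 1.0, n)
Zc = X + rng.normal(0.0, 1.0, n)
D = b * X + rng.normal(0.0, 1.0, n)
b_classical = np.polyfit(Zc, D, 1)[0]   # attenuated: close to b/2 = 1 here
```

With equal variances for truth and error, the classical-error slope is attenuated by the factor Var(X)/Var(Zc) = 1/2, while the Berkson fit on Z stays centered at b.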

Impact of independent measurement error

- For either Berkson or classical error models the
  most important effect of random error is loss of
  power to detect nonzero risk estimates
- If R^2 is the squared correlation between true
  exposure X and measured exposure Z, then it will
  take 1/R^2 as many subjects to detect the same risk
  using Z as using true X.
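The R^2 rule can be checked numerically: for small b, the squared test statistic for the slope using Z is roughly R^2 times the one using true X, so matching power requires about 1/R^2 as many subjects (generic simulation, hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 200_000, 0.2
X = rng.normal(0.0, 1.0, n)
Z = X + rng.normal(0.0, 1.0, n)     # classical error; R^2 = 0.5 here
D = b * X + rng.normal(0.0, 1.0, n)

R2 = np.corrcoef(X, Z)[0, 1] ** 2

def z_squared(expo, D):
    """Squared Wald statistic for the OLS slope of D on expo."""
    e = expo - expo.mean()
    slope = np.sum(e * D) / np.sum(e * e)
    resid = D - D.mean() - slope * e
    se2 = np.sum(resid**2) / (len(D) - 2) / np.sum(e * e)
    return slope**2 / se2

ratio = z_squared(Z, D) / z_squared(X, D)   # approximately R^2
```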

Shared versus unshared dosimetry error

- A key distinction between the effects of shared
  versus unshared dosimetry error is their effect
  on the validity of the sample variance estimates
  used to characterize the variability of the
  estimates
- Independent Berkson errors: the usual estimate
  of the standard error of the slope estimate remains
  valid despite the loss of power
- Independent classical errors: again the usual
  estimate of the standard error of the slope
  estimate generally remains valid despite
- The loss of power
- The attenuation in the dose-response parameter
  estimate

Dosimetry simplifications

- Adopt a generalization of the Berkson error model
  for the joint distribution of true X_i around its
  conditional mean Z_i which incorporates both
  shared and unshared errors:
- X_i = Z_i ε_SM ε_M,i + ε_SA + ε_A,i
- ε_SM is shared multiplicative error with mean 1;
  ε_M,i is unshared multiplicative error with mean 1;
  ε_SA is shared additive error with mean 0;
  ε_A,i is unshared additive error with mean 0
- Under this shared and unshared multiplicative and
  additive (SUMA) error model we have E(X | W) = Z
  (the usual Berkson property) over the
  distribution of all four ε
- What happens when we fit
- E(D_i | Z_i) = a + b Z_i ?
- If there are no measurement errors, Var(ε) = 0, we
  will have (for small values of b)
- Var(b̂) = Var(D_i | Z_i) / Σ_i (Z_i − Z̄)²   (1)

- Effects of shared and unshared errors on
  estimation
- We are interested in three questions regarding
  each error component in the SUMA model:
- What is its effect on study power?
- What is its effect on the validity of expression
  (1) for the variance of the estimate of b?
- How valid are the estimates of study power when
  they are based on expression (1)?

- Shared additive error has little effect on
  either the estimation of b or on the variability
  of the estimate
- Unshared additive or multiplicative errors
  reduce the correlation, R, between X_i and Z_i,
  thereby reducing study power; the reduction in
  study efficiency due to unshared measurement
  error is roughly proportional to R^2
- However, the validity of expression (1) for the
  variance of the estimator remains appropriate.
  Further, the estimate of study power using (1)
  remains appropriate

Effect of multiplicative shared error

- Averaging over the distribution of random ε_SM we
  retain the Berkson property that E(X | W) = Z
- But with
- Var(b̂) = Var(D_i | Z_i) / Σ_i (Z_i − Z̄)² + b² Var(ε_SM)   (2)
- Notice that if b = 0, the naïve estimate of
  the variance of b̂ ignoring the shared error is equal to
- the true variance of this parameter
- If b > 0, the naïve estimate of the variance is
  biased downward by b² Var(ε_SM)
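The downward bias of the naïve variance can be seen in a small simulation (hypothetical scales): across repeated studies, each with its own shared multiplicative error draw, the spread of slope estimates matches the usual OLS variance plus b² Var(ε_SM):

```python
import numpy as np

rng = np.random.default_rng(4)
n_subj, n_studies, b, s = 2000, 4000, 1.5, 0.3

Z = rng.uniform(0.5, 2.0, n_subj)                # fixed conditional-mean doses
Zc = Z - Z.mean()
Szz = np.sum(Zc**2)

slopes = np.empty(n_studies)
for k in range(n_studies):
    e_SM = rng.lognormal(-s**2 / 2, s)           # one shared error per study
    D = b * (Z * e_SM) + rng.normal(0.0, 1.0, n_subj)
    slopes[k] = np.sum(Zc * D) / Szz             # naive fit of D on Z

var_SM = np.exp(s**2) - 1.0                      # Var of the mean-1 lognormal
naive_var = 1.0 / Szz                            # usual OLS slope variance (unit noise)
true_var = naive_var + b**2 * var_SM             # inflated by shared error
emp_var = slopes.var()                           # empirical spread matches true_var
```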

- We conclude:
- Ignoring shared error does not affect the
  validity of the test of the null hypothesis that
  b = 0, because expression (2) = expression (1)
  when b = 0
- More generally, non-differential measurement error
  weakens the power, but not the validity, of a
  test of association between disease and exposure
- Ignoring shared error will overstate the power to
  detect a b > 0, because (1) < (2) in this case
- Ignoring shared error will result in confidence
  limits that are too narrow
- However it is the upper confidence limit that is
  most affected.
- If the lower confidence limit ignoring shared
  error does not include zero, correcting for
  shared error will not cause it to include zero
  (because of conclusion 1)

How to incorporate shared ME directly into an

analysis

- Multiple imputation
- Full Parametric Bootstrap
- Likelihood analysis with MCML

Multiple Imputation

- It is tempting to try to quantify the uncertainty
  in b̂ by regressing D_i on each set of X^r and
  using the quantiles of the resulting slope estimates
  as confidence limits for b
- This ignores the sampling variability of D
- Moreover, the distribution of the slope estimates
  can be badly biased towards the null value.
  Essentially there is a reintroduction of
  classical error into the problem
- True multiple imputation requires sampling X^r
  from the distribution of X given both the input
  data W and the outcomes D_i (not just W) to
  remove these biases
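The reintroduction of classical error is easy to demonstrate in a toy setup (hypothetical unshared Berkson error): regressing D on a fresh replication X^r drawn given W only attenuates the slope, while regressing on Z = E(X | W) does not:

```python
import numpy as np

rng = np.random.default_rng(5)
N, b = 100_000, 1.0
Z = rng.uniform(0.5, 2.0, N)                     # conditional-mean doses

def replication(rng):
    """X^r ~ f(X | W): unshared multiplicative Berkson error, mean 1."""
    return Z * rng.lognormal(-0.5**2 / 2, 0.5, N)

X_true = replication(rng)                        # one draw plays the role of truth
D = b * X_true + rng.normal(0.0, 1.0, N)

slope_on_Z = np.polyfit(Z, D, 1)[0]              # Berkson: unbiased, near b = 1
slope_on_Xr = np.polyfit(replication(rng), D, 1)[0]  # biased toward the null
```

The fresh replication X^r is conditionally independent of X_true given Z, so relative to D it behaves like a classically mismeasured exposure.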

Full Parametric Bootstrap

- A simulation experiment in which b̂ is used as the
  true value of the risk parameter and both doses
  and outcomes D_i are simulated from a complete
  model

Monte-Carlo maximum likelihood

- We can compute likelihood ratio tests as follows:
- For null values a_0 and b_0, generate n samples X^r from
  the distribution of X given W and D
- For any test values a and b, compute the log
  likelihood ratio as
- log LR(a, b) = log [ (1/n) Σ_r P(D | X^r; a, b) / P(D | X^r; a_0, b_0) ]
- If we use b_0 = 0 then we don't have to condition
  on D (so that we can use the dosimetry system
  directly), since D is independent of X given W
  when b = 0
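A minimal sketch of the MCML computation under b_0 = 0, using a toy linear-normal outcome model with intercept a = 0 and hypothetical error scales (not the partial-likelihood version used for the worker data): average the likelihoods, not the log likelihoods, over replications on a 1-d grid of slope values.

```python
import numpy as np

rng = np.random.default_rng(6)
N, R = 500, 400
Z = rng.uniform(0.5, 2.0, N)                      # conditional-mean doses

def replication(rng):
    """X^r ~ f(X | W): shared (sigma=0.1) and unshared (sigma=0.2) errors."""
    e_SM = rng.lognormal(-0.1**2 / 2, 0.1)
    e_M = rng.lognormal(-0.2**2 / 2, 0.2, N)
    return Z * e_SM * e_M

b_true = 1.0
D = b_true * replication(rng) + rng.normal(0.0, 1.0, N)  # outcomes from one "truth"

b_grid = np.linspace(0.0, 2.5, 51)
ll = np.empty((R, len(b_grid)))
for r in range(R):
    Xr = replication(rng)
    resid = D[None, :] - b_grid[:, None] * Xr[None, :]
    ll[r] = -0.5 * np.sum(resid**2, axis=1)       # Gaussian log likelihood

# log of the averaged likelihood, computed stably in log space
m = ll.max(axis=0)
mcml = m + np.log(np.mean(np.exp(ll - m), axis=0))
b_hat = b_grid[np.argmax(mcml)]                   # MCML point estimate
two_dll_null = 2.0 * (mcml.max() - mcml[0])       # MCML LRT of b = 0
```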

Once we compute the likelihood what do we do with
it?

- We have a funny mishmash: we are being
- Bayesian about the doses
- Frequentist about the dose-response parameter
- Moreover, we can't really expect standard
  frequentist asymptotic likelihood theory to hold
- Suppose the number of subjects → ∞; then the
  distribution of b̂ will be dominated by the
  distribution of the shared multiplicative errors
  in the dosimetry system, the distribution of which
  is arbitrary.
- Is it still reasonable to use chi-square
  approximations to the distribution of changes in
  log likelihood?

Other problems

- If shared multiplicative error is large, then as
  b − b_0 gets large the summands in (5)
- become extremely variable
- Convergence of the average is incredibly slow
- Round-off error dominates the performance of the
  algorithm
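One standard mitigation for the round-off problem is to do the averaging in log space with the log-sum-exp trick: a naïve average of likelihood ratios overflows double precision where the stabilized version stays exact (the summand values below are hypothetical):

```python
import numpy as np

# Hypothetical log-likelihood-ratio summands spanning hundreds of log units,
# as happens when shared error is large and b is far from b_0.
llr = np.array([-900.0, -450.0, 0.0, 350.0, 800.0])

with np.errstate(over="ignore"):
    naive = np.mean(np.exp(llr))                 # overflows to inf

m = llr.max()
log_avg = m + np.log(np.mean(np.exp(llr - m)))   # finite: 800 - log(5)
```

Subtracting the maximum before exponentiating guarantees every exponent is at most 0, so nothing overflows; terms far below the maximum underflow harmlessly to 0.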

Application to the ORNL nuclear workers data
(Stayner et al., in review)

- Estimate a single risk parameter using (time
  dependent) total dose in a partial likelihood
  analysis
- Write a computer program that simulates the bias
  factors for the badges used in those facilities
  and re-links the risk sets

Three analyses

- 1. Compute the MCML likelihood
- For each replication of doses, compute the partial
  likelihoods over a 1-dimensional grid of risk
  parameters
- Average the partial likelihoods over the
  replications
- Pretend that the asymptotics still hold and
  compute a confidence interval
- 2. Compute FPB estimates of b̂
- Compare these to the MCML confidence intervals
- 3. For each set of D computed in 2, compute a
  separate MCML confidence interval (more
  simulations from the dose distribution)
- Count the number of times that the standard
  frequentist confidence interval contains the true
  value of the risk parameter


FPB simulations

Some observations

- The MCML widens the confidence interval on the
  high side more than the low side
- The 90 percent asymptotic lower CI for the MCML
  does not include 0.
- This is good because (1) the uncorrected CI did
  not include 0 and (2) we claim that correcting
  for measurement error shouldn't affect the
  significance of a test of no association.
- Note that the two curves (uncorrected and MCML log
  likelihoods) are very close to parallel at b = 0
- This implies that a score test of b = 0 will be
  (nearly) identical using the corrected and
  uncorrected likelihoods using any significance
  criterion
- This observed result follows from Tosteson and
  Tsiatis 1988 on score tests for EIV problems

- The FPB, on the other hand, puts significantly
  more than 5 percent of the estimates < 0 (68 of
  1,000) and significantly fewer of the estimates
  (33 of 1,000) above the MCML UCI.
- This may actually be a promising observation for
  the validity of the MCML confidence intervals
- They tend to be skewed to the right (not
  symmetric around the MLE), so more (than 3.3
  percent) of the upper confidence limits and fewer
  (than 6.8 percent) of the lower confidence
  limits should fail to contain the true value
- Simulations are now in progress

Validity of MCML CI

- Consider the limiting case when n → ∞: the slope
  estimate will be determined by the distribution
  of shared multiplicative errors
- Worst case would be the SUMA model
- Suppose that ε_SM is distributed as log normal
  with arithmetic mean 1 (log-scale mean −σ²/2)
- Then b̂ = b ε_SM is also distributed as log normal
  with mean parameter log(b) − σ²/2 and log
  variance σ²
- Consider twice the change in log likelihood from
  the true b to the MLE
- This will be (1/σ²)[log(b̂) − (log(b) − σ²/2)]²,
  which is exactly chi-square with 1 df
which is exactly ?2 - Consider next a normal distribution for the

shared multiplicative error - Would make sense if the ?SM was itself a sum of

many components

- For this model twice the change in log likelihood

is of form - -2 log(?SM) 1/(?2)(?SM-1)2 c
- Where c 2 log(1/21/2?(14 ?2))
- - 1/2?(14 ?2)-1/22 / ?2
- What is the distribution of this random variable?

- How close is it to a Chi Square w 1 df?


Conclusions

- The MCML has promise, but it is complicated
- But other methods (multiple imputation, etc.) have
  complications of their own
- Score tests of b = 0 based on the average
  likelihood agree with analyses that ignore
  measurement errors
- Our application of the MCML method for partial
  likelihoods ignores the dilution effects
  described by Prentice (Biometrika 1982), but
  these are expected to be very small in most
  settings
- In shared error settings the asymptotics are not
  correct for ordinary frequentist calculations,
  but it seems to be hard to come up with
  situations where they fail drastically

Acknowledgements

- Leslie Stayner, Stephen Gilbert, UIC/NIOSH
- Elisabeth Cardis, Martine Vrijheid, Isabelle

Deltour, IARC - Geoffrey Howe, Columbia
- Terri Kang, USC