Title: Power of logistic regression with measurement error in predictor variable and varying number of obse
1. Power of logistic regression with measurement error in predictor variable and varying number of observations per subject
- Olga Melnichouk, Salomon Minkin, Lisa J. Martin, Norman F. Boyd
- Princess Margaret Hospital, Ontario Cancer Institute
- Department of Epidemiology and Statistics
April 2009
2. Outline
- What motivated our calculations?
  - Proposed case-control study of serum biomarkers and risk of breast cancer.
  - Canadian Diet and Breast Cancer Prevention Study → nested case-control study.
- Measurement error and its effect on estimated regression coefficient and power.
- Measurement error model and simulation study.
- Results and Discussion.
3. Canadian Diet and Breast Cancer Prevention Study
Design
[Flow diagram: eligible subjects identified → pre-randomization assessment → Intervention (n = 2,341) or Comparison (n = 2,349) → annual visits (demographic data, anthropometric data, 3-day food records, non-fasting serum) → follow-up until Dec 2005 (7-17 years per subject).]
4. Canadian Diet and Breast Cancer Prevention Study
- Randomized trial of intervention with a low-fat, high-carbohydrate diet.
- Begun in Toronto in 1988.
- Participating centers: Toronto, Hamilton, Kitchener-Waterloo, London, Sarnia, and Windsor (Ontario); Vancouver and Surrey (British Columbia).
- In total, 4690 women with extensive mammographic density were recruited and randomized to an intervention or comparison group.
- The intervention group received intensive individual counseling to reduce fat intake to a target of 15% of calories, and increase carbohydrate to 65% of calories.
5. Canadian Diet and Breast Cancer Prevention Study
Nested case-control study
- 251 (projected 320) case subjects
- Individually matched with 2 controls selected from all trial subjects
  - who had the same or longer follow-up time and
  - who had not within that time developed breast cancer.
- Additional matching criteria: age (within 1 year), date of randomization (within 1 year), center of randomization.
- In addition to epidemiologic data and food records, a blood sample was obtained at baseline and annually thereafter → varying number of blood samples per subject.
6. Proposed case-control study
- Most of the published studies of serum biomarkers and risk of breast cancer are based on biomarkers measured in one blood sample.
- Variability of a single measurement (Table 1).
7. Proposed case-control study
- To examine the association of long-term exposure to serum biomarkers and risk of breast cancer:
  → Measure biomarkers in multiple blood samples per subject.
  → Use their average as a surrogate of long-term exposure.
- What do we gain if we measure biomarkers in multiple blood samples? → Simulation study
8. Measurement error
- Regression analysis: the predictor of interest is assumed to be measured exactly.
- Regression analysis with measurement error: the available predictor is related to the predictor of interest but with additional variability.
- Two issues: 1) attenuated estimate of the regression coefficient (in the model with a single predictor); 2) power loss.
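The attenuation in issue 1 can be illustrated with a small sketch. It assumes the classical additive error model Z = X + U, with U independent of X; this model and the function names are assumptions for illustration, since the slide does not state a model at this point. Under that model the naive slope is the true slope multiplied by the reliability ratio var(X) / (var(X) + var(U)). The relation is exact for linear regression and only an approximation for logistic regression.

```python
def attenuation_factor(var_x: float, var_u: float) -> float:
    """Reliability ratio var(X) / (var(X) + var(U)) for Z = X + U.

    Assumes classical additive error with U independent of X.
    """
    if var_x <= 0 or var_u < 0:
        raise ValueError("var_x must be positive and var_u non-negative")
    return var_x / (var_x + var_u)


def naive_slope(beta_true: float, var_x: float, var_u: float) -> float:
    """Approximate slope obtained when Z is used in place of X."""
    return attenuation_factor(var_x, var_u) * beta_true


# Illustrative values only: equal true and error variances halve the slope.
lam = attenuation_factor(var_x=1.0, var_u=1.0)
beta_naive = naive_slope(beta_true=0.4, var_x=1.0, var_u=1.0)
```

With no measurement error (var_u = 0) the factor is 1 and the slope is unchanged; the larger the error variance, the closer the naive slope is pulled toward zero.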
9. Measurement error
- To correct for measurement error:
  a) conduct a validation study: regress the gold standard measure (no measurement error) on the exposure measured with error;
  b) obtain multiple measurements per subject of the error-prone exposure (reliability study).
- In both cases, we use predicted values as a surrogate of the true exposure.
- External or internal reliability study → the naive estimate of the regression coefficient can be corrected.
- If a reliability study is part of the main study (internal study) → we have more information about the true exposure → should power increase?
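One way to picture the "predicted values" from a reliability study is the following sketch. It assumes a classical additive error model with known population mean `mu`, between-subject variance `var_x`, and within-subject error variance `var_u`; in practice these would be estimated from the replicate data. The function names are hypothetical, and the shrinkage formula is the standard regression-calibration predictor, not necessarily the exact transformation used in this study. Because the reliability depends on the number of replicates k, subjects with more blood samples are shrunk less toward the mean.

```python
from statistics import fmean


def reliability_of_mean(var_x: float, var_u: float, k: int) -> float:
    """Reliability of the mean of k replicates: var_x / (var_x + var_u / k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return var_x / (var_x + var_u / k)


def transformed_average(measurements, mu, var_x, var_u):
    """Predicted true exposure given a subject's replicate measurements.

    Shrinks the subject's average toward the population mean mu by the
    reliability of a k-replicate mean (regression-calibration predictor).
    """
    k = len(measurements)
    z_bar = fmean(measurements)
    r_k = reliability_of_mean(var_x, var_u, k)
    return mu + r_k * (z_bar - mu)


# Illustrative values only: two subjects with different numbers of samples.
few = transformed_average([3.0], mu=1.0, var_x=1.0, var_u=1.0)
many = transformed_average([3.0, 3.0, 3.0, 3.0], mu=1.0, var_x=1.0, var_u=1.0)
```

Both subjects have the same raw average, but the subject with four samples keeps more of the deviation from the mean, reflecting the extra information about the true exposure.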
10. Measurement error model
- errors are non-differential
11. Number of blood samples per subject
Figure 1. (A) Distribution and (B) cumulative
distribution of the number of available blood
samples per subject for the projected 311 case
subjects.
12. Measurement error model: reliability
13. Measurement error model
14. Measurement error model
15. Measurement error model
16. Measurement error model
17. Simulation study plan
18. Simulation study plan
- In each repeat, significance testing: H0: β_Z = 0, H0: β_Z_bar = 0, H0: β_Z_bar_t = 0, H0: β_X = 0.
- To account for extra variability due to estimation of R, correct the SE of β_Z_bar_t.
- Power: probability that the null hypothesis is rejected.
- Empirical power: proportion of simulated data sets in which a regression coefficient was significantly different from zero at the 5% level of significance.
- Estimate of the regression coefficient: average of the estimated regression coefficients in 1000 repeats.
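The definitions of empirical power and of the averaged coefficient can be sketched in code. This is a simplified illustration, not the study's simulation: it draws an unmatched sample rather than matched case-control sets, assumes X ~ N(0, 1) and classical additive error Z = X + U, fits an ordinary (unconditional) logistic regression with a Wald test, and compares only the true exposure X with a single error-prone measurement Z. All function names and parameter values are hypothetical.

```python
import math
import random


def fit_logistic(x, y, n_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # SE from the inverse information evaluated at the last iterate.
    return b1, math.sqrt(h00 / det)


def empirical_power(beta, var_u, n=300, repeats=200, seed=1):
    """Return (power, mean_estimate) dicts for predictors X and Z = X + U."""
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided test at the 5% level
    rejections = {"X": 0, "Z": 0}
    sums = {"X": 0.0, "Z": 0.0}
    for _ in range(repeats):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-beta * xi)) else 0
             for xi in x]
        z = [xi + rng.gauss(0.0, math.sqrt(var_u)) for xi in x]
        for name, pred in (("X", x), ("Z", z)):
            b, se = fit_logistic(pred, y)
            sums[name] += b
            rejections[name] += abs(b / se) > z_crit
    power = {k: v / repeats for k, v in rejections.items()}
    mean_estimate = {k: v / repeats for k, v in sums.items()}
    return power, mean_estimate


power, mean_estimate = empirical_power(beta=0.5, var_u=1.0, n=300, repeats=100)
```

Here `power` is the proportion of simulated data sets in which the Wald test rejects H0, and `mean_estimate` is the average of the fitted coefficients over the repeats, mirroring the two summaries defined above.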
19. Table 2. Number of measured blood samples.
20. Simulation study parameters
21. Results: power
[Figure legend: true exposure measured without error, solid line; transformed average, long dashed line; average, dotted line; one observation measured with error, short dashed line.]
22. Results: regression coefficient
[Figure legend: true exposure measured without error, solid line; transformed average, long dashed line; average, dotted line; one observation measured with error, short dashed line.]
23. Results: biomarkers
April 7, 2009
24. Discussion
- Measuring serum biomarkers in repeated blood samples
  - reduces bias → valid estimates of the effect of these biomarkers on breast cancer risk can be obtained
  - increases power → a study with relatively moderate sample size will detect important associations → improved precision of the effect estimates.
25. References
- BG Armstrong, AS Whittemore, and GR Howe (1989) Analysis of case-control data with covariate measurement error: application to diet and colon cancer. Stat Med 8:1151-1163.
- AS Whittemore (1989) Errors-in-variables regression using Stein estimates. Am Statistician 43:226-228.
- D Thomas, D Stram, J Dwyer (1993) Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu Rev Public Health 14:69-93.
- MY Kim and A Zeleniuch-Jacquotte (1997) Correcting for measurement error in the analysis of case-control data with repeated measurements of exposure. Am J Epidemiol 145:1003-1010.
26. Thank you!