Title: Imputation Methods for Missing Quality of Life Data in the Adjuvant Breast Cancer Trials IBCSG Trial
1Imputation Methods for Missing Quality of Life
Data in the Adjuvant Breast Cancer Trials IBCSG
Trial VI and VII
2Aim
- Analysing a clinical trial looking at the effect
of quality of life (QOL) on disease-free survival
(DFS)
- QOL is measured repeatedly but there are many
missing values
- Is the conclusion about the effect of QOL on DFS
sensitive to the missing values?
3Outline
- IBCSG Trial VI and VII
- Imputing Coping Scores
- Time-dependent Cox models
- Missing data pattern
- Common imputation methods
- Results from time-dependent Cox models
- Comparing imputation methods
- Estimated difference between imputed coping score
and missing coping score
- Simulated dataset with relationship between
disease-free survival and coping score
- Conclusions
4IBCSG Trial VI/VII
- Between July 1986 and April 1993, 1475 pre- and
perimenopausal patients were randomized to Trial
VI
- Between March 1990 and April 1993, 1212
postmenopausal patients were randomized to Trial
VII
- Patients were followed up every 3 months for 2 years
5Quality of Life Assessment
- To assess coping / adjustment to disease, a
simple self-assessment scale was used
- Each patient was asked to rate the amount of
adjustment needed to cope with her illness on a
scale from 0 to 100 by marking a linear
analogue line
- High numbers reflect worse quality of life
6Quality of Life Objectives
- The objectives of assessing quality of life were
  - to evaluate the hypothesis that the level of early
  coping / well-being of the patient can be used as
  a prognostic factor of outcome
  - to investigate whether the level of early coping /
  well-being of the patient changes during the
  study (not discussed here)
7Missing Data Pattern
                    Baseline    6 months    12 months   24 months
Observed            2231        1870        1812        1501
Missing             456         751         662         666
Post-recurrence     0           57          173         340
Lost to follow-up   0           1           3           5
Dead                0           8           37          175
Total               2687        2687        2687        2687
8Imputing Coping Scores
- The aim of the analysis is to investigate the
relationship between quality of life and
disease-free survival (DFS)
- Coping scores were considered from quality of life
assessments up to 2 years (24 months) after
randomization and before recurrence
- Only 585 out of 2687 patients have all 9 quality
of life assessments up to 2 years before
recurrence
- Imputation of the quality of life scores is
therefore considered
9Time Dependent Cox Models
- Coping score as covariate in a time-dependent Cox
model for DFS
- In a time-dependent Cox model for DFS, the length
of time a patient spends in each time period and
whether a DFS event occurred in that time
period are calculated
- The coping score changes with time, and the coping
score during each period is a time-dependent
covariate
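The period-by-period layout described above can be sketched as follows. This is a minimal illustration, not the trial's actual code: the names `ASSESSMENT_MONTHS`, `Interval` and `split_follow_up` are hypothetical, the scores are assumed to be already observed or imputed, and carrying the 24-month score over any follow-up beyond 24 months is an assumption made only to close the last interval.

```python
from dataclasses import dataclass

# Assumed schedule: baseline plus an assessment every 3 months for 2 years.
ASSESSMENT_MONTHS = [0, 3, 6, 9, 12, 15, 18, 21, 24]


@dataclass
class Interval:
    start: float   # months from randomization at which the period begins
    stop: float    # months at which the period (or follow-up) ends
    coping: float  # coping score in force during the period
    event: int     # 1 if the DFS event occurred at `stop`, otherwise 0


def split_follow_up(scores, dfs_time, dfs_event):
    """Split one patient's follow-up into (start, stop] periods.

    scores    : one coping score per entry of ASSESSMENT_MONTHS
    dfs_time  : months to DFS event or censoring
    dfs_event : 1 if a DFS event was observed, 0 if censored
    """
    intervals = []
    for i, start in enumerate(ASSESSMENT_MONTHS):
        if start >= dfs_time:
            break  # follow-up ended before this period began
        is_last = i == len(ASSESSMENT_MONTHS) - 1
        next_start = float("inf") if is_last else ASSESSMENT_MONTHS[i + 1]
        stop = min(next_start, dfs_time)
        # The event flag is set only on the period in which follow-up ends.
        event = int(bool(dfs_event) and stop == dfs_time)
        intervals.append(Interval(start, stop, scores[i], event))
    return intervals
```

Each resulting row carries the time spent in the period, the coping score for that period, and the event indicator, which is the input shape a time-dependent Cox routine typically expects.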
10Common Imputation Methods
- Last Observation Carried Forward (LOCF)
- Median imputation
- Bootstrapping
- Linear regression
- Predicted mean matching
- Pattern mixture models
- Nearest neighbour imputation
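Two of the simpler methods in this list, LOCF and imputation by the median of a patient's own observed scores, can be sketched as below. The function names are hypothetical and missing scores are represented by `None`; this is an illustration under those assumptions rather than the analysis code used for the trial.

```python
from statistics import median


def locf(scores):
    """Last Observation Carried Forward.

    Missing values (None) are replaced by the most recent observed score.
    Values missing before the first observation stay missing.
    """
    filled, last = [], None
    for s in scores:
        if s is not None:
            last = s
        filled.append(last)
    return filled


def impute_patient_median(scores):
    """Replace missing values by the median of the patient's observed scores."""
    observed = [s for s in scores if s is not None]
    if not observed:
        return list(scores)  # nothing to base the median on
    m = median(observed)
    return [m if s is None else s for s in scores]
```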
11Bootstrapping
- Replace a missing coping score with an observed coping
score selected at random from the observed coping
scores of patients in the same subgroup
- Subgroups were defined by baseline coping score and
then by previous observed or imputed coping score
- The bootstrap procedure was run 150 times
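The random-donor step above can be sketched as follows for subgroups defined by baseline coping score. The banding of the 0 to 100 scale into 25-point subgroups, the record layout, and the function names are all assumptions made for illustration; the slides do not say how the subgroups were cut.

```python
import random


def baseline_band(baseline, width=25):
    """Hypothetical subgrouping: bands of `width` points on the 0-100 scale."""
    return min(int(baseline // width), int(100 // width) - 1)


def bootstrap_impute(records, rng):
    """Fill each missing score with a random observed score from the same band.

    records : list of dicts with keys 'baseline' and 'score' (None if missing)
    rng     : a random.Random instance, so runs are reproducible
    """
    donors = {}
    for r in records:
        if r["score"] is not None:
            donors.setdefault(baseline_band(r["baseline"]), []).append(r["score"])
    imputed = []
    for r in records:
        score = r["score"]
        if score is None:
            pool = donors.get(baseline_band(r["baseline"]))
            if pool:  # a subgroup without donors is left missing
                score = rng.choice(pool)
        imputed.append(score)
    return imputed


def repeated_imputations(records, n_runs=150, seed=0):
    """Run the bootstrap imputation n_runs times (150 in the slides)."""
    rng = random.Random(seed)
    return [bootstrap_impute(records, rng) for _ in range(n_runs)]
```

Each of the 150 completed datasets would then be analysed separately and the results summarised across runs.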
12Linear Regression with Concurrent Variables
- Concurrent variables considered in a linear
regression model for coping score included
  - UICC Performance Status
  - Menstrual status
  - Severity of adverse events
    - nausea and vomiting
    - diarrhoea
    - stomatitis / mucous membrane
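As a rough illustration of regression imputation, the sketch below fits a least-squares line of coping score on a single concurrent variable and predicts the missing scores. It is deliberately simplified: the slides describe a model with several concurrent variables, whereas this uses one hypothetical numeric severity grade, and clipping predictions to the 0 to 100 scale is an added assumption.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def regression_impute(severity, coping):
    """Impute missing coping scores (None) from a concurrent variable.

    severity : hypothetical numeric grade recorded at the same visit
    coping   : coping scores on the 0-100 scale, None where missing
    """
    pairs = [(s, c) for s, c in zip(severity, coping) if c is not None]
    slope, intercept = fit_line([s for s, _ in pairs], [c for _, c in pairs])
    return [
        # Assumption: predictions are clipped to the valid 0-100 range.
        min(100.0, max(0.0, intercept + slope * s)) if c is None else c
        for s, c in zip(severity, coping)
    ]
```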
13Hazard Ratios for Square Root of Coping Score
(S_Pacis)
- The hazard ratios for the square root of the
coping score (S_Pacis) from a time-dependent Cox
model stratified by trial are presented
- The hazard ratios for S_Pacis from a
time-dependent Cox model stratified by trial with
re-introduction of chemotherapy as an explanatory
variable are similar and are not presented
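For reference, a stratified time-dependent Cox model of this kind can be written as below. The notation is assumed for illustration and is not taken from the slides: s indexes the trial stratum (VI or VII), C_i(t) is the coping score of patient i in force at time t, and the Wald-type interval is the conventional choice rather than a confirmed detail of this analysis.

```latex
% Assumed notation: s = trial stratum, C_i(t) = coping score at time t,
% so that S_Pacis corresponds to \sqrt{C_i(t)}.
h_{is}(t) = h_{0s}(t)\,\exp\!\bigl\{\beta\,\sqrt{C_i(t)}\bigr\},
\qquad
\mathrm{HR} = e^{\beta},
\qquad
95\%\ \mathrm{CI} = \exp\!\bigl(\hat\beta \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta)\bigr)
```

Under this form the hazard ratio is the multiplicative change in hazard per one-unit increase in S_Pacis, with a separate baseline hazard for each trial.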
14Results from Time-Dependent Cox Model
(Stratified by Trial)
                                            Hazard Ratio   95% CI for HR
LOCF                                        0.993          (0.972, 1.015)
Median imputation
  Median of patient's observed scores       1.003          (0.983, 1.025)
  Median of time period                     0.994          (0.970, 1.018)
  Median of treatment arm by time period    0.991          (0.968, 1.015)
Linear regression
  Using previous coping scores              0.993          (0.970, 1.018)
  Using concurrent variables                1.008          (0.986, 1.030)
15Results from Time-Dependent Cox Model (Stratified by
Trial): Mean of 150 Simulations
                                            Hazard Ratio   95% CI for HR
Bootstrapping, subgroups defined by
  Baseline coping score                     0.992          (0.970, 1.015)
  Previous coping score                     0.994          (0.972, 1.016)
Predicted mean matching                     0.995          (0.972, 1.017)
Pattern mixture models                      0.993          (0.971, 1.015)
Nearest neighbour imputation                0.996          (0.974, 1.018)
16Comparing Imputation Methods
- Patients with a complete history of observed
coping scores were identified and some values
were removed to imitate the missing data pattern
in the full data
- 150 simulated datasets with artificially removed
coping scores were generated
- The differences between the imputed coping score
and the real coping score artificially removed
were calculated
17Comparing Imputation Methods
- For each imputation method, the mean and standard
deviation of the differences between the imputed
coping scores and the real coping scores that were
artificially removed were estimated
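The comparison procedure can be sketched as below: hide known scores, impute them, and summarise imputed minus real. Several details are assumptions for illustration only: values are removed independently at random with a fixed probability (the slides instead imitate the observed missing data pattern), the baseline value is always kept, the sign convention is imputed minus real, and LOCF stands in for whichever method is being assessed.

```python
import random
from statistics import mean, stdev


def locf(scores):
    """Last Observation Carried Forward; None marks a missing score."""
    filled, last = [], None
    for s in scores:
        if s is not None:
            last = s
        filled.append(last)
    return filled


def imputation_errors(history, mask, impute):
    """Differences (imputed - real) for the scores hidden by `mask`."""
    masked = [None if hide else s for s, hide in zip(history, mask)]
    imputed = impute(masked)
    return [imp - true for imp, true, hide in zip(imputed, history, mask) if hide]


def summarise(complete_histories, impute, missing_prob=0.3, n_datasets=150, seed=0):
    """Mean and SD of the differences over repeated artificial removals."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_datasets):
        for history in complete_histories:
            # Assumption: baseline is kept, later scores removed at random.
            mask = [False] + [rng.random() < missing_prob for _ in history[1:]]
            diffs.extend(imputation_errors(history, mask, impute))
    return mean(diffs), stdev(diffs)
```

A mean difference near zero suggests little systematic bias, while the standard deviation reflects how precisely individual scores are recovered.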
18Estimated Difference Between Imputed Coping Score
and Missing Coping Score
                                            Mean Difference   Standard Deviation of Difference
LOCF                                        -0.72             20.64
Median imputation
  Median of patient's observed scores       2.10              18.02
  Median of time period                     11.24             25.18
  Median of treatment arm by time period    10.21             25.01
Linear regression
  Using previous coping scores              5.36              18.37
  Using concurrent variables                9.40              25.62
19Estimated Difference Between Imputed Coping Score
and Missing Coping Score: Mean of 150 Simulations,
First Artificial Dataset
                                            Mean Difference   Standard Deviation of Difference
Bootstrapping, subgroups defined by
  Baseline coping score                     3.17              30.58
  Previous coping score                     2.54              27.41
Predicted mean matching                     2.07              25.36
Pattern mixture models                      -0.76             20.88
Nearest neighbour imputation                2.94              30.25
20Simulated Dataset with Relationship Between DFS
and Coping Score
- A simulated dataset was created in which good
quality of life is associated with longer DFS
- The simulated coping scores replacing the observed
coping scores were selected at random from a
range of possible values
- For each coping score assessment expected, a
simulated coping score was generated
21Simulated Dataset with Relationship Between DFS
and Coping Score
- The same missing data pattern was used as in the
original IBCSG data
- There is no relationship between time period and
coping score
- As expected, the square root of the coping score
(S_Pacis) was a significant parameter for all
common imputation methods
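One way to build a dataset with a known relationship between coping score and DFS is sketched below. It is a simplified stand-in for the simulation described in these slides: each patient gets a single fixed coping score rather than one per assessment, DFS times are exponential, and the values of `beta` and `baseline_hazard` are arbitrary illustrative choices, not estimates from the trial.

```python
import math
import random


def simulate_patient(rng, beta=0.3, baseline_hazard=0.01):
    """Draw one (coping score, DFS time in months) pair.

    The hazard rises with the square root of the coping score, so low
    scores (good quality of life) tend to give longer DFS.
    """
    score = rng.uniform(0, 100)
    hazard = baseline_hazard * math.exp(beta * math.sqrt(score))
    dfs_time = rng.expovariate(hazard)
    return score, dfs_time


def simulate_dataset(n_patients=2000, seed=1):
    rng = random.Random(seed)
    return [simulate_patient(rng) for _ in range(n_patients)]
```

The observed missing data pattern could then be applied to such a dataset before running each imputation method, so that the estimated hazard ratios can be compared with the one from the full simulated data.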
22Results from Time-Dependent Cox Model (Stratified
by Trial)
                                            Hazard Ratio   95% CI for HR
Full simulated data                         3.377          (3.216, 3.546)
LOCF                                        3.380          (3.205, 3.564)
Median imputation
  Median of patient's observed scores       3.398          (3.232, 3.573)
  Median of time period                     2.388          (2.295, 2.485)
  Median of treatment arm by time period    2.395          (2.302, 2.493)
Linear regression
  Using previous coping scores              3.462          (3.281, 3.654)
  Using concurrent variables                3.048          (2.903, 3.200)
23Results from Time-Dependent Cox Model (Stratified by
Trial): Mean of 150 Iterations
                                            Hazard Ratio   95% CI for HR
Bootstrapping, subgroups defined by
  Baseline coping score                     3.125          (2.970, 3.288)
  Previous coping score                     2.722          (2.602, 2.847)
Predicted mean matching                     3.068          (2.920, 3.225)
Pattern mixture models                      3.207          (3.048, 3.374)
Nearest neighbour imputation                3.207          (3.046, 3.376)
24Conclusions: Comparing Imputed and Missing Scores
- In the IBCSG data, there is large within-patient
and between-patient variability
- For all common methods the standard deviation of
the difference between the imputed coping score
and the missing coping score was high, indicating
a lack of precision in predicting the missing
coping score
- The estimated standard deviations of the
difference between the imputed coping score and
the missing coping score were similar for the
simple imputation methods and the multiple
imputation methods
25Conclusions: Hazard Ratios in Cox Model
- No evidence of a relationship between quality of
life and disease-free survival in the IBCSG data
- The multiple imputation methods showed hazard
ratios which were similar for each application
- In the IBCSG data, when imputing the explanatory
variable in a time-dependent Cox regression, the
imputation method had little effect on the hazard
ratio
- In the simulated data set with a strong
relationship between coping score and DFS but the
same missing data pattern as in the IBCSG data, there
is evidence that some imputation methods give
biased estimates (underestimates) of the
hazard ratio
- Among common methods, LOCF worked well in
imputing both the IBCSG data and the simulated
data set with a strong relationship between
coping score and DFS