Title: Visual displays for the comparison of two survival functions
1Visual displays for the comparison of two
survival functions
- John Reynolds
- Centre for Biostatistics and Clinical Trials
- Peter MacCallum Cancer Centre
- East Melbourne
2How to display statistical uncertainty in
survival plots?
- Plots should include some measure of statistical
uncertainty, otherwise any visual signs of
treatment differences might look more convincing
than they really are. Either SEs or CIs should
be displayed at regular time points, or an
overall estimate of treatment difference (e.g.
relative risk) with its 95% CI should be given.
- Whether SEs or 95% CIs should be plotted is
open to debate
- Pocock, S.J., Clayton, T.C. and Altman, D.G.
(2002) Survival plots of time-to-event outcomes
in clinical trials: good practice and pitfalls.
The Lancet 359: 1686-1689.
3Example: Gastric cancer
- Chemotherapy alone versus chemotherapy plus
radiation
- 45 patients in each arm
- Primary endpoint: overall survival time (from
registration to death from any cause)
- Therneau, T. and Grambsch, P. (2000) Modeling
survival data: Extending the Cox model.
Springer-Verlag, New York. Chapter 6.
4Gastric cancer - overall survival
5Gastric cancer - treatment difference?
- The log-rank (Mantel-Haenszel) test for the
total curve comparison has a p-value of 0.251
- The log-rank test is less powerful when the
hazards aren't proportional (it is powerful when
survival is exponential)
- The Peto-Peto modification of the Gehan-Wilcoxon
test has a p-value of 0.030
- The G-W test is more powerful when survival is
logistic and gives more weight to earlier
survival experience
- All this is well known; see, for example,
Friedman, L.M., Furberg, C.D. and DeMets, D.L.
(1998) Fundamentals of clinical trials (3rd Ed).
Springer-Verlag, New York.
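To see why the two tests can disagree, here is a minimal sketch of a two-sample weighted log-rank statistic. It is not the code behind the p-values quoted above. The function name `weighted_logrank` and its arguments are hypothetical. The Peto-Peto weight used here (a pooled, modified Kaplan-Meier estimate with n + 1 in the denominator, evaluated at the current event time) is one common convention and is an assumption; software packages differ in the details.

```python
import math

def weighted_logrank(times1, events1, times2, events2, weight="logrank"):
    """Two-sample weighted log-rank z statistic and two-sided p-value.

    events are 1 for a death and 0 for a censored time.
    weight="logrank": every event time gets weight 1.
    weight="peto":    weight is a pooled, modified Kaplan-Meier estimate
                      (assumed variant: product of 1 - d/(n + 1)), which
                      shrinks over time and so favours early differences.
    """
    pooled = [(t, e, 0) for t, e in zip(times1, events1)]
    pooled += [(t, e, 1) for t, e in zip(times2, events2)]
    pooled.sort()
    n1, n2 = len(times1), len(times2)  # numbers currently at risk
    s_tilde = 1.0                      # pooled modified survival estimate
    u = 0.0                            # weighted observed-minus-expected deaths in group 1
    v = 0.0                            # variance of u
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        d1 = d2 = r1 = r2 = 0
        # Gather everyone (deaths and censored) sharing this time.
        while i < len(pooled) and pooled[i][0] == t:
            _, e, g = pooled[i]
            if g == 0:
                d1 += e
                r1 += 1
            else:
                d2 += e
                r2 += 1
            i += 1
        d, n = d1 + d2, n1 + n2
        if d > 0:
            s_tilde *= 1.0 - d / (n + 1)
            w = s_tilde if weight == "peto" else 1.0
            u += w * (d1 - d * n1 / n)
            if n > 1:
                # Hypergeometric variance of d1 given the margins.
                v += w * w * d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
        n1 -= r1
        n2 -= r2
    if v == 0.0:
        return 0.0, 1.0  # no information to compare the groups
    z = u / math.sqrt(v)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p
```

Because the Peto-Peto weights decrease with time, a difference that is confined to early follow-up contributes relatively more to that statistic than to the unweighted log-rank statistic.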
6Gastric cancer - treatment difference?
- The survival curves look to be different in the
first 2 years
- In this case, reporting the p-value from the
log-rank test along with a graph of the two
curves is probably not a good summary of the data
7Question
- Should we help viewers of such graphs make
unplanned comparisons of aspects of the curves,
such as:
- Is there a significant difference in one-year
survival? (A vertical comparison)
- Is there a significant difference in median
survival time? (A horizontal comparison)
8Answer
- I think the answer is yes if the purpose of the
graph is to summarise the data in the study
rather than to lend support to an outcome of a
hypothesis test specified in a protocol
- Exploration and summary of data versus
confirmation and summary of a planned hypothesis
test
9General Approach
- Uncertainty envelopes around the curves
- Overlap indicates no significant difference of a
pointwise test
- Underlap indicates a significant difference of
a pointwise test
- ± SE, too anti-conservative (α ≈ 0.16)
- ± 95% CI, too conservative (α ≈ 0.006)
- ± LSD/2, just right? (α ≈ 0.05)
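The three significance levels quoted for the overlap rule can be checked with a short calculation. This sketch assumes two independent, approximately normal estimates with equal standard errors, and takes LSD to mean 1.96 × sqrt(2) × SE; the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def implied_alpha(half_width_in_se):
    """Two-sided level of the 'no overlap' rule for bars of estimate +/- k * SE.

    With equal SEs the bars fail to overlap when the difference exceeds
    2k * SE. The SE of the difference is sqrt(2) * SE, so the implied
    z threshold is 2k / sqrt(2) = k * sqrt(2).
    """
    z = half_width_in_se * math.sqrt(2.0)
    return 2.0 * (1.0 - normal_cdf(z))

for label, k in [("+/- SE", 1.0),
                 ("+/- 95% CI", 1.96),
                 ("+/- LSD/2", 1.96 / math.sqrt(2.0))]:
    print(label, round(implied_alpha(k), 4))
```

Under these assumptions only the LSD/2 half-width makes "no overlap" coincide with a 5% pointwise test.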
10General Approach (contd)
- Similar to the LSIs of Andrews, H.P., Snee, R.D.
and Sarner, M.H. (1980) Graphical display of
means. American Statistician 34: 195-199, and
Hannah, M. and Quigley, P. (1996) Presentation of
ordinal regression analysis on the original scale.
Biometrics 52: 771-775.
- Except we don't have to worry about the
approximation of k(k-1)/2 square roots of sums by
sums of k square roots
- Only need to worry about unequal SEs at each
point
- We plot estimate ± 1.96 × delta, where delta is
related to the standard errors (SE) of the
estimates as follows
11Derivation of deltas
12General Approach (contd)
- The pointwise comparison of estimates via the
overlap (and underlap) of these uncertainty
intervals will behave like pairwise t-tests (or
z-tests)
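The derivation of the deltas is not reproduced in this transcript. For overlap to behave like a pairwise z-test, the two deltas need to sum to the SE of the difference, sqrt(SE1^2 + SE2^2). The sketch below uses one choice that satisfies this, splitting that quantity in proportion to each SE. This is an assumption for illustration and not necessarily the split used in the talk; the function names are hypothetical.

```python
import math

def uncertainty_deltas(se1, se2):
    """Return (delta1, delta2) with delta1 + delta2 = sqrt(se1^2 + se2^2).

    Assumed split: proportional to each standard error (both must be > 0).
    With se1 == se2 this reduces to delta = SE / sqrt(2), i.e. LSD/2 bars.
    """
    se_diff = math.hypot(se1, se2)
    scale = se_diff / (se1 + se2)
    return se1 * scale, se2 * scale

def intervals_overlap(est1, se1, est2, se2, z=1.96):
    """True if the two uncertainty intervals est +/- z * delta overlap."""
    d1, d2 = uncertainty_deltas(se1, se2)
    lo1, hi1 = est1 - z * d1, est1 + z * d1
    lo2, hi2 = est2 - z * d2, est2 + z * d2
    return lo1 <= hi2 and lo2 <= hi1

def z_test_significant(est1, se1, est2, se2, z=1.96):
    """Ordinary two-sample z-test on the difference of the estimates."""
    return abs(est1 - est2) > z * math.hypot(se1, se2)
```

Away from exact boundary cases, `intervals_overlap` returns False exactly when `z_test_significant` returns True, which is the behaviour the slide describes.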
13Vertical Comparisons
- Comparing the proportions surviving at a given time
- Use the Kaplan-Meier approach to estimate the
cumulative hazard, H(t) = -log(S(t)), at each
event time for each group
- Compute the SE of this estimate in the usual way
(see, for example, Chapter 2 of Collett, D. (2003)
Modelling survival data in medical research (2nd
Ed). Chapman & Hall/CRC Press, Boca Raton)
- Compute the uncertainty interval,
estimate ± 1.96 × delta,
then back-transform (i.e. exponentiate) and plot
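The steps above can be sketched as follows. This is an illustration, not the S-PLUS function used for the plots. It assumes the Greenwood-type variance on the log scale, var(H) = sum of d / (n (n - d)), a proportional split of the SE of the difference into the two deltas, and truncation of the lower hazard limit at zero so that survival stays at or below 1. All function names are hypothetical.

```python
import math

def km_cumulative_hazard(times, events):
    """Return [(t, H, se_H)] at each distinct event time.

    H = -log(Kaplan-Meier S); se_H^2 = sum of d / (n * (n - d)).
    events are 1 for a death and 0 for a censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    log_s, var_h = 0.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        # Skip the step where everyone at risk dies: S = 0 and H is infinite.
        if 0 < deaths < n_at_risk:
            log_s += math.log(1.0 - deaths / n_at_risk)
            var_h += deaths / (n_at_risk * (n_at_risk - deaths))
            curve.append((t, -log_s, math.sqrt(var_h)))
        n_at_risk -= removed
    return curve

def step_value(curve, t):
    """Latest (H, se_H) at or before time t, or None before the first event."""
    latest = None
    for time, h, se in curve:
        if time <= t:
            latest = (h, se)
    return latest

def survival_band(h, delta, z=1.96):
    """Back-transform H +/- z * delta to the survival scale, S = exp(-H)."""
    lower = math.exp(-(h + z * delta))
    upper = math.exp(-max(h - z * delta, 0.0))  # assumed truncation at H = 0
    return lower, upper

def vertical_comparison(curve1, curve2, t, z=1.96):
    """Return (band1, band2, underlap) at time t, or None if not estimable."""
    a, b = step_value(curve1, t), step_value(curve2, t)
    if a is None or b is None:
        return None
    (h1, se1), (h2, se2) = a, b
    # Assumed split: deltas proportional to the SEs, summing to the SE of the difference.
    scale = math.hypot(se1, se2) / (se1 + se2)
    band1 = survival_band(h1, se1 * scale, z)
    band2 = survival_band(h2, se2 * scale, z)
    underlap = band1[0] > band2[1] or band2[0] > band1[1]
    return band1, band2, underlap
```

Plotting `survival_band` at every event time of each arm gives the cross-hatched envelopes; `underlap` marks the times at which daylight shows between them.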
14Vertical Comparisons (contd)
- We work on the scale of the cumulative hazard,
H = -log(S), for ease and other reasons (Link, C.
(1984) Confidence intervals for the survival
function using Cox's proportional-hazard model
with covariates. Biometrics 40: 601-610)
- Easy to write a function to do this in
S-PLUS 2000
- a plug for our conference sponsor
15Gastric cancer - vertical comparisons
16Gastric cancer - vertical comparisons
- Overlap of cross-hatched regions indicates no
significant differences at the associated time
points
- Daylight between the curves (i.e. underlap of
cross-hatched regions) indicates significant
differences at those time points (pointwise
comparisons on the scale of the cumulative
hazard)
- The difference in the early survival experience
of the two arms (from about day 144 to day 381)
is readily apparent in the graph
- Data-to-ink ratio uncomfortably low; see Tufte,
E.R. (1983) The visual display of quantitative
information. Graphics Press, Cheshire.
17Horizontal Comparisons
- Comparing percentiles of each group (e.g. median
of group 1 with median of group 2)
- Using the K-M estimated survivor function, the
estimated pth percentile is the smallest
observed event time t(p) for which
S(t(p)) < 1 - (p/100)
- The SE of the estimated pth percentile can be
found from the usual delta method formula (see
Collett op. cit.)
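The percentile definition above translates directly into code. This is a minimal sketch with hypothetical function names; it returns None when the Kaplan-Meier curve never falls below the target, in which case the percentile is not estimable from the data.

```python
def km_survival(times, events):
    """Return [(t, S)] at each distinct event time (Kaplan-Meier estimate).

    events are 1 for a death and 0 for a censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            s *= 1.0 - deaths / n_at_risk
            curve.append((t, s))
        n_at_risk -= removed
    return curve

def km_percentile(times, events, p):
    """Smallest observed event time t with S(t) < 1 - p/100, else None."""
    target = 1.0 - p / 100.0
    for t, s in km_survival(times, events):
        if s < target:
            return t
    return None
```

For example, `km_percentile(times, events, 50)` gives the estimated median survival time for one arm.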
18Horizontal Comparisons (contd)
- SE of the estimated pth percentile:
SE(t(p)) = SE(S(t(p))) / f(t(p))
- where the SE of the survival function estimate
uses Greenwood's formula and where the estimate
of the density function f (a ratio of differences)
can be very unstable!
19Gastric cancer - horizontal comparisons
20Gastric cancer - horizontal comparisons
- Overlap of cross-hatched regions indicates no
significant differences between treatment arms at
those survival proportions or percentages
- Evidently the times associated with the 55th
through to the 85th percentiles of survival are
significantly different between the two treatment
arms (as judged by pointwise tests)
- We have had to limit our comparisons to the 25th
through to the 95th percentiles
21Horizontal Comparisons - Issues
- Which percentile test to use: what are the
operating characteristics of various tests?
- We've used a crude asymptotic z-test on the scale
of the survival probability
- Where and how to automatically restrict
comparisons: estimation of the density function
of the survival distribution (required for the
variance estimate of the percentile) is the
problem
22Another example: Monoclonal gammopathy of
undetermined significance (MGUS)
- 241 patients diagnosed at the Mayo Clinic with an
apparently benign monoclonal gammopathy before
January 1971 were followed forward to 1992.
- 140 males, 101 females
- Example 8.4.1 in Therneau and Grambsch op. cit.
- We investigate the gender difference
23MGUS - Overall survival
24MGUS - Vertical Comparisons
25MGUS - Horizontal Comparisons
26MGUS - Smoothed (lowess, f = 0.2) Horizontal
Comparisons?
27Summary and Conclusions
- Could be a useful exploratory tool?
- But dangerous in some hands. How much daylight
shining between the curtains, which shroud the
curves, should cause us to take action?
- Characterising the data by emphasizing the
results of a collection of pointwise tests,
rather than the actual data (!) (cf. Tufte op.
cit.)
- Re the horizontal comparison, a more stable
estimation procedure for SEs of percentiles is
required
- Identifying and fitting models from a suitable
parametric family neatly avoids this whole issue:
the curves are everywhere different, except
at points of intersection, when one or more
parameters are significantly different