Title: Statistics 262: Intermediate Biostatistics
1. Statistics 262: Intermediate Biostatistics
Kaplan-Meier methods and Parametric Regression methods
2. More on the Kaplan-Meier estimator of S(t) (product-limit estimator or KM estimator)
- When there are no censored data, the KM estimator is simple and intuitive: estimated S(t) = proportion of observations with failure times > t.
- For example, if you are following 10 patients and 3 of them die by the end of the first year, then your best estimate of S(1 year) = 70%.
- When there are censored data, KM provides an estimate of S(t) that takes censoring into account (see last week's lecture).
- If the censored observation had actually been a failure: S(1 year) = 4/5 × 3/4 × 2/3 = 2/5 = 40%.
- The KM estimator is defined only at times when events occur (it is empirically defined)!
3. KM (product-limit) estimator, formally
This formula gives the product-limit estimate of survival at each time an event happens.
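For reference, the product-limit estimator is usually written as follows. The notation is assumed here rather than taken from the slide: t_j are the ordered distinct event times, d_j is the number of events at t_j, and n_j is the number still at risk just before t_j.

```latex
\hat{S}(t) \;=\; \prod_{j:\; t_j \le t} \left( 1 - \frac{d_j}{n_j} \right)
```

In the earlier 40% example, each factor has this form: one failure among 5, then 4, then 3 subjects at risk gives (1 - 1/5)(1 - 1/4)(1 - 1/3) = 4/5 × 3/4 × 2/3.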
5. Example 1: time-to-conception for subfertile women
"Failure" here is a good thing. 38 women (in 1982) were treated for infertility with laparoscopy and hydrotubation. All women were followed for up to 2 years to describe time-to-conception. The event is conception, and women "survived" until they conceived.
Example from BMJ, Dec 1998; 317: 1572-1580.
6. Raw data: time (months) to conception or censoring in 38 subfertile women after laparoscopy and hydrotubation (1982 study)
Data from Luthra P, Bland JM, Stanton SL. Incidence of pregnancy after laparoscopy and hydrotubation. BMJ 1982; 284: 1013-1014.
7. Corresponding Kaplan-Meier curve
S(t) is estimated at 9 event times (a step function).
10. Corresponding Kaplan-Meier curve
6 women conceived in the 1st month (1st menstrual cycle). Therefore, 32/38 = 84.2% survived pregnancy-free past 1 month.
12. Important detail of how the data were coded
Censoring at t = 2 indicates survival PAST the 2nd cycle (i.e., we know the woman survived her 2nd cycle pregnancy-free). Thus, for calculating the KM estimator at 2 months, this person should still be included in the risk set. Think of it as 2+ months, e.g., 2.1 months.
14. Corresponding Kaplan-Meier curve
5 women conceive in the 2nd month. The risk set at event time 2 included 32 women. Therefore, 27/32 = 84.4% survived event time 2 pregnancy-free.
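We can get an estimate of the hazard rate here: h(t = 2) = 5/32 = 15.6%. Given that you didn't get pregnant in month 1, you have an estimated 5/32 chance of conceiving in the 2nd month.
An estimate of the density (the marginal probability of conceiving in month 2) multiplies this hazard by the probability of still being at risk when month 2 begins: f(t) = h(t) × S(t - 1) = (.156)(.842) = 13.2%, which is just 5/38.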
17. Corresponding Kaplan-Meier curve
3 women conceive in the 3rd month. The risk set at event time 3 included 26 women. 23/26 = 88.5% survived event time 3 pregnancy-free.
18. Raw data (1982 study): the risk set at 4 months includes 22 women.
20. Corresponding Kaplan-Meier curve
3 women conceive in the 4th month, and 1 was censored between months 3 and 4. The risk set at event time 4 included 22 women. 19/22 = 86.4% survived event time 4 pregnancy-free.
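An estimate of the density (the marginal probability of conceiving in month 4) multiplies the month-4 hazard by the probability of being pregnancy-free past month 3: f(t) = h(t) × S(t - 1) = (.136)(.629) = 8.6%.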
21. Raw data (1982 study): the risk set at 6 months includes 18 women.
23. Corresponding Kaplan-Meier curve
2 women conceive in the 6th month of the study, and one was censored between months 4 and 6. The risk set at event time 5 (month 6) included 18 women. 16/18 = 88.9% survived event time 5 pregnancy-free.
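The month-by-month arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the (month, risk set, conceptions) triples are copied from the slides for months 1, 2, 3, 4 and 6, and the function name `product_limit` is made up for illustration.

```python
# Risk set size and number of conceptions at the first five event times,
# as read off the slides (months 1, 2, 3, 4 and 6).
event_table = [
    (1, 38, 6),
    (2, 32, 5),
    (3, 26, 3),
    (4, 22, 3),
    (6, 18, 2),
]


def product_limit(table):
    """Return [(time, hazard, survival)] using S(t) = product of (1 - d/n)."""
    survival = 1.0
    rows = []
    for time, at_risk, events in table:
        hazard = events / at_risk      # conditional probability of the event
        survival *= 1.0 - hazard       # multiply in this step's survival
        rows.append((time, hazard, survival))
    return rows


km_rows = product_limit(event_table)
for time, hazard, survival in km_rows:
    print(f"month {time}: h = {hazard:.3f}, S = {survival:.4f}")
```

The survival estimates at months 1 and 6 (0.8421 and 0.4825) agree with the Survival column of the SAS LIFETEST output for these data.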
24. Skipping ahead to the 9th and final event time (month 16)
25. Raw data (1982 study): 2 women remain at 16 months (the 9th event time).
26. The tail of the curve just represents that the final 2 women did not conceive (you cannot make many inferences from the end of a KM curve)!
27. Kaplan-Meier SAS output
The LIFETEST Procedure
Product-Limit Survival Estimates

                               Survival
                               Standard  Number   Number
    time   Survival  Failure   Error     Failed     Left
  0.0000   1.0000    0         0              0       38
  1.0000   .         .         .              1       37
  1.0000   .         .         .              2       36
  1.0000   .         .         .              3       35
  1.0000   .         .         .              4       34
  1.0000   .         .         .              5       33
  1.0000   0.8421    0.1579    0.0592         6       32
  2.0000   .         .         .              7       31
  2.0000   .         .         .              8       30
  2.0000   .         .         .              9       29
28. Kaplan-Meier SAS output (continued)

                               Survival
                               Standard  Number   Number
    time   Survival  Failure   Error     Failed     Left
  6.0000   .         .         .             18       17
  6.0000   0.4825    0.5175    0.0834        19       16
  7.0000   .         .         .             19       15
  7.0000   .         .         .             19       14
  8.0000   .         .         .             19       13
  8.0000   .         .         .             19       12
  9.0000   .         .         .             20       11
  9.0000   .         .         .             21       10
  9.0000   0.3619    0.6381    0.0869        22        9
  9.0000   .         .         .             22        8
  9.0000   .         .         .             22        7
  9.0000   .         .         .             22        6
 10.0000   0.3016    0.6984    0.0910        23        5
29. Monday Gut Check Problem
Calculate the product-limit estimate of survival for the following data (n = 9):

Time-to-event (months)   Survival (1 = died / 0 = censored)
10                       0
2                        1
4                        0
8                        1
12                       0
14                       0
10                       1
1                        0
3                        0
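One way to check a hand calculation for this problem is a small product-limit routine. The sketch below is an illustration, not the SAS algorithm: the function name `kaplan_meier` is hypothetical, and it assumes the coding convention described earlier, in which a subject censored at time t is still in the risk set at t.

```python
from collections import Counter


def kaplan_meier(times, status):
    """Product-limit estimate at each distinct event time.

    times  : follow-up times
    status : 1 = event (death), 0 = censored
    A subject censored at time t is still counted in the risk set at t.
    """
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    survival = 1.0
    estimate = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)   # everyone followed to t or later
        survival *= 1.0 - deaths[t] / at_risk
        estimate.append((t, at_risk, deaths[t], survival))
    return estimate


months = [10, 2, 4, 8, 12, 14, 10, 1, 3]
died = [0, 1, 0, 1, 0, 0, 1, 0, 0]
km = kaplan_meier(months, died)
for t, n, d, s in km:
    print(f"t = {t:>2}: at risk = {n}, deaths = {d}, S(t) = {s:.3f}")
```

Only the three event times (2, 8 and 10 months) produce a step; the censored times change the size of the risk set but not the estimate itself.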
30. Not so easy to get a plot of the actual hazard function!
In SAS, you need a complicated macro, and the result depends on assumptions. Here's what I get from Paul Allison's macro for these data.
31. At best, you can get the cumulative hazard function.
32. Cumulative Hazard Function
- If the hazard function is constant, e.g. h(t) = k, then the cumulative hazard function will be linear (and higher hazards will have steeper slopes).
- If the hazard function is increasing with time, e.g. h(t) = kt, then the cumulative hazard function will be curved up; for example, h(t) = kt gives a quadratic.
- If the hazard function is decreasing over time, e.g. h(t) = k/t, then the cumulative hazard function should be curved down.
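The three cases follow from integrating the hazard. A short worked version is given below as a sketch; for the decreasing case the integral is started at an arbitrary time t_0 > 0 (an assumption made here, because k/t cannot be integrated from zero).

```latex
\begin{align*}
H(t) &= \int_0^t h(u)\,du, \qquad S(t) = e^{-H(t)} \\
h(t) = k \;&\Rightarrow\; H(t) = kt \quad \text{(linear)} \\
h(t) = kt \;&\Rightarrow\; H(t) = \tfrac{1}{2}kt^2 \quad \text{(curved up)} \\
h(t) = k/t \;&\Rightarrow\; H(t) - H(t_0) = k\log(t/t_0), \quad t > t_0 > 0 \quad \text{(curved down)}
\end{align*}
```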
33. Kaplan-Meier example 2
Researchers randomized 44 patients with chronic active hepatitis to receive prednisolone or no treatment (control), then compared survival curves.
Example from BMJ 1998; 317: 468-469 (15 August).
34. Survival times (months) of 44 patients with chronic active hepatitis randomised to receive prednisolone or no treatment (censored observations flagged).
Data from BMJ 1998; 317: 468-469 (15 August).
35. Kaplan-Meier example 2
Are these two curves different?
Misleading to the eye: the curves appear to converge by the end of the study. But this is due to 6 controls who survived fairly long, and 3 events in the treatment group when the sample size was small.
36. Control group

                               Survival
                               Standard  Number   Number
    time   Survival  Failure   Error     Failed     Left
   0.000   1.0000    0         0              0       22
   2.000   0.9545    0.0455    0.0444         1       21
   3.000   0.9091    0.0909    0.0613         2       20
   4.000   0.8636    0.1364    0.0732         3       19
   7.000   0.8182    0.1818    0.0822         4       18
  10.000   0.7727    0.2273    0.0893         5       17
  22.000   0.7273    0.2727    0.0950         6       16
  28.000   0.6818    0.3182    0.0993         7       15
  29.000   0.6364    0.3636    0.1026         8       14
  32.000   0.5909    0.4091    0.1048         9       13
  37.000   0.5455    0.4545    0.1062        10       12
  40.000   0.5000    0.5000    0.1066        11       11
  41.000   0.4545    0.5455    0.1062        12       10
  54.000   0.4091    0.5909    0.1048        13        9
  61.000   0.3636    0.6364    0.1026        14        8

6 controls made it past 100 months.
37. Treated group

                               Survival
                               Standard  Number   Number
    time   Survival  Failure   Error     Failed     Left
   0.000   1.0000    0         0              0       22
   2.000   0.9545    0.0455    0.0444         1       21
   6.000   0.9091    0.0909    0.0613         2       20
  12.000   0.8636    0.1364    0.0732         3       19
  54.000   0.8182    0.1818    0.0822         4       18
  56.000   .         .         .              4       17
  68.000   0.7701    0.2299    0.0904         5       16
  89.000   0.7219    0.2781    0.0967         6       15
  96.000   .         .         .              7       14
  96.000   0.6257    0.3743    0.1051         8       13
 125.000   .         .         .              8       12
 128.000   .         .         .              8       11
 131.000   .         .         .              8       10
 140.000   .         .         .              8        9
38. Pointwise confidence intervals
We will not worry about the mathematical formula for confidence bands. The important point is that there is a confidence interval for each estimate of S(t). (SAS uses Greenwood's formula.)
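For readers who do want to see where the standard errors come from, Greenwood's formula can be illustrated on the first three event times of the control-group life table (2, 3 and 4 months, before any censoring). The sketch below assumes the usual form Var[S(t)] ≈ S(t)² Σ d_j / (n_j (n_j - d_j)) and a plain S ± 1.96·SE interval; the function name `greenwood` is illustrative, and SAS may construct its confidence limits differently.

```python
import math

# First event times in the control group (no censoring yet):
# (time in months, number at risk, deaths), read from the life table.
control_events = [(2, 22, 1), (3, 21, 1), (4, 20, 1)]


def greenwood(table, z=1.96):
    """Return [(time, S, se, lower, upper)] with Greenwood standard errors."""
    survival = 1.0
    var_sum = 0.0                      # running sum of d / (n * (n - d))
    rows = []
    for time, at_risk, deaths in table:
        survival *= 1.0 - deaths / at_risk
        var_sum += deaths / (at_risk * (at_risk - deaths))
        se = survival * math.sqrt(var_sum)
        lower = max(0.0, survival - z * se)   # simple interval, clipped to [0, 1]
        upper = min(1.0, survival + z * se)
        rows.append((time, survival, se, lower, upper))
    return rows


ci_rows = greenwood(control_events)
for time, s, se, lo, hi in ci_rows:
    print(f"t = {time}: S = {s:.4f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The standard errors come out at about 0.0444, 0.0613 and 0.0732, matching the Survival Standard Error column of the SAS output for the control group.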
39. Log-rank test
Test of Equality over Strata

Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     4.6599       1    0.0309
Wilcoxon     6.5435       1    0.0105
-2Log(LR)    5.4096       1    0.0200

Chi-square test (with 1 df) of the (overall) difference between the two groups. The groups appear significantly different.
40. Log-rank test
The log-rank test is just a Cochran-Mantel-Haenszel chi-square test! Anyone remember (know) what this is?
41. CMH test of conditional independence
K strata = unique event times; N_k = total number at risk in stratum k.
43. CMH test of conditional independence
How do you know that this is a chi-square with 1 df?
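The CMH statistic in its usual textbook form is sketched below. The symbols are assumptions introduced for this note, not the slide's own notation: in stratum k, a_k is the number of events in group 1, n_{1k} and n_{0k} are the numbers at risk in the two groups, d_k is the total number of events, and N_k = n_{1k} + n_{0k}.

```latex
\begin{align*}
\chi^2_{\mathrm{CMH}} &= \frac{\left[\sum_{k=1}^{K}\bigl(a_k - E(a_k)\bigr)\right]^2}{\sum_{k=1}^{K}\operatorname{Var}(a_k)} \\
E(a_k) &= \frac{n_{1k}\,d_k}{N_k} \\
\operatorname{Var}(a_k) &= \frac{n_{1k}\,n_{0k}\,d_k\,(N_k - d_k)}{N_k^{2}\,(N_k - 1)}
\end{align*}
```

One way to see the single degree of freedom: the numerator is one summed difference, so the statistic is the square of a single approximately standard normal quantity.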
44-45. Event time 1 (2 months), control and treated groups
(The life tables from slides 36 and 37, repeated; the relevant rows are those at 2.000 months.)
46. Stratum 1 = event time 1
Event time 1: 1 died from each group (22 at risk in each group).

           Died   Survived   At risk
Treated    1      21         22
Control    1      21         22
Total      2      42         44
47-48. Event time 2 (3 months), control and treated groups
(The life tables from slides 36 and 37, repeated; event time 2 is the 3.000-month row of the control-group table.)
49. Stratum 2 = event time 2
Event time 2: at 3 months, 1 died in the control group. At that time, 21 from each group were at risk.

           Died   Survived   At risk
Treated    0      21         21
Control    1      20         21
Total      1      41         42
50-51. Event time 3 (4 months), control and treated groups
(The life tables from slides 36 and 37, repeated; event time 3 is the 4.000-month row of the control-group table.)
52. Stratum 3 = event time 3 (4 months)
Event time 3: at 4 months, 1 died in the control group. At that time, 21 from the treated group and 20 from the control group were at risk.

           Died   Survived   At risk
Treated    0      21         21
Control    1      19         20
Total      1      40         41
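The three strata described above can be turned into observed, expected and variance terms with a short script. This is a sketch under stated assumptions: it uses the standard hypergeometric mean and variance for each 2x2 table, the helper name `stratum_terms` is made up, and it covers only the first three event times, so the result is a partial sum and not the 4.6599 reported by SAS (which needs every event time).

```python
# One 2x2 table per event time:
# (deaths treated, at risk treated, deaths control, at risk control)
strata = [
    (1, 22, 1, 22),   # event time 1 (2 months)
    (0, 21, 1, 21),   # event time 2 (3 months)
    (0, 21, 1, 20),   # event time 3 (4 months)
]


def stratum_terms(d_trt, n_trt, d_ctl, n_ctl):
    """Observed, expected and variance of treated-group deaths in one stratum."""
    n_total = n_trt + n_ctl
    d_total = d_trt + d_ctl
    expected = n_trt * d_total / n_total
    variance = (n_trt * n_ctl * d_total * (n_total - d_total)
                / (n_total ** 2 * (n_total - 1)))
    return d_trt, expected, variance


terms = [stratum_terms(*s) for s in strata]
observed_total = sum(t[0] for t in terms)
expected_total = sum(t[1] for t in terms)
variance_total = sum(t[2] for t in terms)

# Contribution of the first three strata only (not the full log-rank statistic).
partial_chi_square = (observed_total - expected_total) ** 2 / variance_total
print(f"O = {observed_total}, E = {expected_total:.3f}, Var = {variance_total:.3f}")
```

Continuing the same loop over all remaining event times ("etc.") and summing would give the full log-rank chi-square.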
53. Etc.
54. Log-rank test, et al.
Test of Equality over Strata

Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     4.6599       1    0.0309
Wilcoxon     6.5435       1    0.0105
-2Log(LR)    5.4096       1    0.0200
55. Estimated -log(S(t))
Maybe the hazard function decreases a little, then increases a little? Hard to say exactly.
56. Approximated h(t)
57. One more graph from SAS
log(-log(S(t))) = log(cumulative hazard). If the group plots are parallel, this indicates that the proportional hazards assumption is valid. This is a necessary assumption for the calculation of hazard ratios.
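The reason parallel curves indicate proportional hazards can be written out in one line. In this sketch, the subscripts 0 and 1 label the two groups and HR is the constant hazard ratio (notation assumed here for illustration).

```latex
\begin{align*}
h_1(t) = \mathrm{HR}\cdot h_0(t)
\;&\Rightarrow\; H_1(t) = \mathrm{HR}\cdot H_0(t) \\
\;&\Rightarrow\; \log\bigl(-\log S_1(t)\bigr) = \log(\mathrm{HR}) + \log\bigl(-\log S_0(t)\bigr)
\end{align*}
```

So under proportional hazards the two log(-log S) curves differ by the constant log(HR) at every time point, which is what "parallel" means on this plot.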
58. Uses of Kaplan-Meier
- Commonly used to describe survivorship of study population(s).
- Commonly used to compare two study populations.
- Intuitive graphical presentation.
59. Limitations of Kaplan-Meier
- Mainly descriptive.
- Doesn't control for covariates.
- Requires categorical predictors.
  - SAS does let you easily discretize continuous variables for KM methods, for exploratory purposes.
- Can't accommodate time-dependent variables.
60. Parametric models for the hazard/survival function
The class of regression models estimated by PROC LIFEREG is known as the accelerated failure time models.
61. Parameters of the Weibull distribution
Shape parameter (the inverse of the scale parameter): < 1 means the hazard rate is decreasing; > 1 means the hazard rate is increasing.
62. Constant hazard rate (special case of the Weibull where the shape parameter = 1.0)
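The role of the shape parameter can be seen numerically. The sketch below assumes the parameterization S(t) = exp(-rate · t^shape), so that h(t) = rate · shape · t^(shape - 1); this is one common way to write the Weibull and not necessarily the form shown on the slide. The function `weibull_hazard` and its argument names are illustrative.

```python
def weibull_hazard(t, shape, rate=1.0):
    """Hazard of a Weibull with S(t) = exp(-rate * t**shape)."""
    return rate * shape * t ** (shape - 1)


times = [0.5, 1.0, 2.0, 4.0]
for shape in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, shape) for t in times]
    print(f"shape = {shape}: " + ", ".join(f"{h:.3f}" for h in hazards))
```

With shape 0.5 the printed hazards fall over time, with shape 2.0 they rise, and with shape 1.0 they stay constant, which is the exponential special case.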
63. Recall: two parametric models
Components:
- A baseline hazard function (that may change over time).
- A linear function of a set of k fixed covariates that, when exponentiated (and a few other things), gives the relative risk.
64. To get hazard ratios (relative risk)
- Weibull (and thus exponential) models are proportional hazards models, so a hazard ratio can be calculated.
- For other parametric models, you cannot calculate a hazard ratio (hazards are not necessarily proportional over time).
It is trickier to get confidence intervals here!
65. What's a hazard ratio?
- Distinction between hazard/rate ratio and odds ratio/risk ratio:
- Hazard/rate ratio = ratio of incidence rates
- Odds/risk ratio = ratio of proportions
66. Example 1
- Using data from the pregnancy study.
- Recall that, roughly, hazard rates were similar over time (which implies an exponential model should be a good fit).
67. The LIFEREG Procedure: Analysis of Parameter Estimates

Parameter      DF  Estimate  Std Error  95% CL Lower  95% CL Upper  Chi-Square  Pr > ChiSq
Intercept      1   2.2636    0.2049     1.8621        2.6651        122.08      <.0001
Scale          1   1.0217    0.1638     0.7462        1.3987
Weibull Shape  1   0.9788    0.1569     0.7149        1.3401

A scale of 1.0 makes a Weibull an exponential, so this looks exponential.
68. Parametric estimates of the survival function based on a Weibull model (left) and an exponential model (right).
69. Example 2: 2 groups
Using data from the hepatitis trial, I fit exponential and Weibull models in SAS using LIFEREG (Weibull is the default in LIFEREG).
70. The LIFEREG Procedure

Dependent Variable         Log(time)
Right Censored Values      17
Left Censored Values       0
Interval Censored Values   0
Name of Distribution       Exponential
Log Likelihood             -68.03461345

Analysis of Parameter Estimates

Parameter      DF  Estimate  Std Error  95% CL Lower  95% CL Upper  Chi-Square  Pr > ChiSq
Intercept      1   4.4886    0.2500     3.9986        4.9786        322.37      <.0001
group          1   0.9008    0.3917     0.1332        1.6685        5.29        0.0214
Scale          0   1.0000    0.0000     1.0000        1.0000
Weibull Shape  0   1.0000    0.0000     1.0000        1.0000
Hazard ratio (treated vs. control) = e^(-0.9008) = 0.406.
Interpretation: the mortality rate is about 60% lower in the treated group or, equivalently under the exponential model, the median time to death is about 2.5 times longer in the treated group (e^(0.9008) = 2.46).
71. Model Information

Dependent Variable         Log(time)
Right Censored Values      17
Left Censored Values       0
Interval Censored Values   0
Name of Distribution       Weibull
Log Likelihood             -66.94904552

Analysis of Parameter Estimates

Parameter      DF  Estimate  Std Error  95% CL Lower  95% CL Upper  Chi-Square  Pr > ChiSq
Intercept      1   4.4811    0.3169     3.8601        5.1022        200.00      <.0001
group          1   1.0544    0.5096     0.0556        2.0533        4.28        0.0385
Scale          1   1.2673    0.2139     0.9103        1.7643
Weibull Shape  1   0.7891    0.1332     0.5668        1.0985
Comparison of models using the likelihood ratio test: [-2LogLikelihood(simpler model)] - [-2LogLikelihood(more complex model)] = chi-square with 1 df (1 extra parameter estimated for the Weibull model). 136 - 134 = 2, NS. No evidence that the Weibull model is much better than the exponential.
Hazard ratio (treated vs. control) = e^(-1.05/1.267) = 0.43.
The shape parameter is just 1/scale parameter!
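The hazard ratios and the likelihood ratio comparison can be reproduced from the printed estimates. This is a minimal sketch using only the numbers in the LIFEREG output; the log likelihoods are entered as negative values, consistent with -2 log L of about 136 and 134, and the chi-square tail probability for 1 df is computed with the identity P(X > x) = erfc(sqrt(x/2)).

```python
import math

# Exponential fit: group coefficient on the log(time) scale, scale fixed at 1.
hr_exponential = math.exp(-0.9008 / 1.0)

# Weibull fit: divide the coefficient by the estimated scale before exponentiating.
hr_weibull = math.exp(-1.0544 / 1.2673)

# Likelihood ratio test: exponential (simpler) vs. Weibull (one extra parameter).
loglik_exponential = -68.03461345
loglik_weibull = -66.94904552
lr_statistic = 2.0 * (loglik_weibull - loglik_exponential)

# Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_statistic / 2.0))

print(f"HR (exponential) = {hr_exponential:.3f}")
print(f"HR (Weibull)     = {hr_weibull:.3f}")
print(f"LR chi-square    = {lr_statistic:.3f}, p = {p_value:.3f}")
```

The two hazard ratios come out near 0.406 and 0.435, and the likelihood ratio statistic is about 2.17 on 1 df, which is not significant at the 0.05 level.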
72. Parametric estimates of cumulative survival based on a Weibull model (left) and an exponential model (right), by group.
73. Compare to Cox regression

Variable  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% Hazard Ratio Confidence Limits
group     1   -0.83230            0.39739         4.3865      0.0362      0.435         0.200  0.948