Title: Selecting the Appropriate Statistical Distribution for a Primary Analysis
1 Selecting the Appropriate Statistical Distribution for a Primary Analysis
2 A Study of Xeroderma Pigmentosum (XP)
- A characteristic of XP is the formation of actinic keratoses (AKs)
- Multiple lesions appear haphazardly on a patient's back
- The rate of appearance may not be the same for different patients
3 Background
- Analysis: rank-sum test.
- Late in the study, the Statistical Analysis Plan (SAP) was amended to use Poisson regression
- Unclear if stepwise selection of covariates was planned a priori
4 Study Results
- Poisson regression analysis showed a highly significant treatment difference (p = 0.009), adjusting for baseline AK, age, and age x treatment interaction (stepwise selection)
- All these effects were highly significant.
- Substantial outlier problem
5 Assumptions
- Each patient has the same incidence rate, λ, per unit area.
- The chance of more than one AK in a small unit of area is negligible.
- Non-overlapping lesions are independent; that is, lesions occurring in one area of the body are not affected by those occurring in another area.
6 Outliers
- Outliers are observations that are jarringly different from the remainder of the data
- There may be multiple outliers
- If their frequency is large, this may be evidence that we have a mixture distribution.
- Outliers can substantially affect the analysis
7 Analyses
- Two-sample Wilcoxon rank-sum (Mann-Whitney) test

         trt |   obs   rank sum   expected
    ---------+----------------------------
           0 |     9        158        135
           1 |    20        277        300
    ---------+----------------------------
    combined |    29        435        435

    unadjusted variance     450.00
    adjustment for ties     -15.07
                         ---------
    adjusted variance       434.93

    Ho: ak12tot(trt==0) = ak12tot(trt==1)
                 z =   1.103
        Prob > |z| =   0.2701
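The arithmetic behind this output can be checked by hand. The sketch below takes the group sizes, the observed rank sum and the tie adjustment straight from the output above (it does not recompute ranks from raw data) and applies the usual large-sample formulas for the rank-sum test. The variable names are illustrative only.

```python
import math

# Values copied from the rank-sum output above.
n0, n1 = 9, 20               # observations in trt 0 and trt 1
rank_sum_0 = 158             # observed rank sum, trt 0
tie_adjustment = -15.07      # reported adjustment for ties

n = n0 + n1
expected_0 = n0 * (n + 1) / 2            # expected rank sum under Ho
unadjusted_var = n0 * n1 * (n + 1) / 12  # variance when there are no ties
adjusted_var = unadjusted_var + tie_adjustment

z = (rank_sum_0 - expected_0) / math.sqrt(adjusted_var)
# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(expected_0, unadjusted_var, round(z, 3), round(p_two_sided, 4))
```

The expected rank sum (135), the unadjusted variance (450) and the z statistic agree with the values printed in the output.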
8 Distribution of AK Data at Baseline (Stem and Leaf) (Yarosh et al., Lancet)

    Lead | Trailing digits
       0 | 00000000000000000011223335
      // |
       4 | 27
      // |
      10 | 0    <- oops!
9 Distribution of 12 Month AK Total Data (Stem and Leaf)

    . stem ak12tot, w(10)

    Lead | Trailing digits
       0 | 000000001111222233457
       1 | 00345
       2 |
       3 | 7
      // |
       7 | 1
       8 |
       9 |
      // |
      19 | 3    <- same patient, in placebo group
10 Results of Poisson Analyses

    Poisson regression                 Number of obs  =       29
                                       LR chi2(3)     =  1044.65
                                       Prob > chi2    =   0.0000
    Log likelihood = -127.46684        Pseudo R2      =   0.8038

    ---------------------------------------------------------------------
     ak12tot |   Coef.   Std. Err.       z    P>|z|  [95% Conf. Interval]
    ---------+-----------------------------------------------------------
         age |    .017      .0056     3.00    0.003     .0058      .0276
         trt |    .532      .167      3.20    0.001     .2061      .859
         akb |    .045      .0019    23.10    0.000     .0409      .0485
       _cons |    .658      .219      3.00    0.003     .2282     1.0878
    ---------------------------------------------------------------------

- G-O-F in control group: χ² = 1222.5 with 8 d.f.
- G-O-F in treatment group: χ² = 682.5 with 19 d.f.
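The slides do not say which goodness-of-fit statistic was used. One common choice for count data, assumed here, is the index-of-dispersion statistic: the sum of squared deviations from the group mean divided by that mean, referred to a chi-square distribution with n - 1 degrees of freedom. The sketch below applies it to the nine control (trt = 0) 12-month counts listed on "The Data" slide. The function name is hypothetical.

```python
# AK counts at 12 months for the nine control (trt = 0) rows on the data slide.
control_ak12 = [5, 1, 1, 0, 15, 3, 193, 2, 13]

def poisson_dispersion_stat(counts):
    """Return (statistic, degrees of freedom) for the index-of-dispersion test.

    Under a Poisson model with a common rate, the variance equals the mean,
    so the statistic should be close to its degrees of freedom.
    """
    n = len(counts)
    mean = sum(counts) / n
    stat = sum((y - mean) ** 2 for y in counts) / mean
    return stat, n - 1

stat, df = poisson_dispersion_stat(control_ak12)
print(round(stat, 1), df)
```

On these nine rows the statistic comes out within about 0.1 of the reported 1222.5 on 8 d.f., which suggests, but does not confirm, that this is the test behind the slide. The treatment-group figure cannot be checked the same way, because the data slide lists only 15 of the 20 treated patients.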
11 Permutation Test
- Procedure: scramble the treatment codes and redo the analysis. Repeat many (5,000?) times.
- Count the number of times the coefficient for treatment exceeds the observed value.
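The procedure can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk: it uses a simple difference in group means as a stand-in for the Poisson regression coefficient, it counts ties (T >= T(obs)), and the function names and toy data are hypothetical.

```python
import random

def mean_difference(outcome, treatment):
    """Stand-in statistic: mean outcome in trt 1 minus mean outcome in trt 0."""
    treated = [y for y, t in zip(outcome, treatment) if t == 1]
    control = [y for y, t in zip(outcome, treatment) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def permutation_p_value(outcome, treatment, statistic=mean_difference,
                        reps=5000, seed=0):
    """One-sided Monte Carlo permutation p-value: share of shuffles with T >= T(obs)."""
    rng = random.Random(seed)
    observed = statistic(outcome, treatment)
    labels = list(treatment)
    count = 0
    for _ in range(reps):
        rng.shuffle(labels)  # scramble the treatment codes
        if statistic(outcome, labels) >= observed:
            count += 1
    return count / reps

# Hypothetical toy data, for illustration only (not from the study).
toy_outcome = [0, 0, 0, 0, 10, 10, 10, 10]
toy_treatment = [0, 0, 0, 0, 1, 1, 1, 1]
p = permutation_p_value(toy_outcome, toy_treatment, reps=2000)
print(p)
```

Any other statistic, such as a fitted regression coefficient, can be passed in through the `statistic` argument; the shuffling and counting logic stays the same.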
12 Command and Output

    . permute trt "permpois trt ak12tot age akb" rtrt=rtrt rage=rage rakb=rakb, reps(5000) d

    command:      permpois trt ak12tot age akb
    statistics:   rtrt = rtrt
                  rage = rage
                  rakb = rakb
    permute var:  trt

    Monte Carlo permutation statistics      Number of obs  =     30
                                            Replications   =   5000

    -----------------------------------------------------------
    T    |     T(obs)       c       n    p=c/n     SE(p)
    -----+-----------------------------------------------------
    rtrt |   .5324557    2660    5000   0.5320    0.0071
    rage |   .0167116    3577    5000   0.7154    0.0064
    rakb |   .0446938    1118    5000   0.2236    0.0059
    -----------------------------------------------------------
    Note: c = #{T >= T(obs)}
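The last two columns of the permutation output follow directly from the counts: p is the proportion c/n, and SE(p) is the binomial standard error of that proportion. A short check, using only the counts printed above (variable names are illustrative):

```python
import math

reps = 5000
counts = {"rtrt": 2660, "rage": 3577, "rakb": 1118}  # column c in the output

summary = {}
for name, c in counts.items():
    p = c / reps
    se = math.sqrt(p * (1 - p) / reps)  # binomial standard error of p
    summary[name] = (round(p, 4), round(se, 4))

print(summary)
```

The rounded values match the p=c/n and SE(p) columns of the output.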
13 Permutation Tests (2)
- Poisson with 5000 replications
- Treatment: p = 0.57
- Age: p = 0.62
- AK baseline: p = 0.28
- All significant results disappear
14 Results of Poisson Analysis
- Sponsor found that all terms were highly significant (including the treatment x age interaction).
- We reproduced this analysis.
- We also did a Poisson goodness-of-fit test that strongly rejected the assumption of a Poisson distribution.
- What does a highly significant result mean when the model is wrong?
15 Conclusions
- The data are poorly fit by both Poisson and negative binomial distributions
- Permutation tests suggest no treatment effect unless the treatment-by-age interaction is included
- Justification of the interaction term by a stepwise procedure is exploratory
- Outliers are a problem and can affect the conclusions.
16 Conclusions (2)
- The results of the study are based on exploratory data analysis.
- The analysis is based on wrong assumptions about the data.
- Our analyses based on distribution-free tests do not agree with the sponsor's results.
- The results based on appropriate assumptions do not support approval of the product.
17 Suggestions
- Conduct a phase II study to determine appropriate covariates.
- Need to use appropriate inclusion/exclusion criteria.
- Stratification.
- A priori specification of the full analysis.
18 Reference
- Yarosh D. et al., "Effect of topically applied T4 endonuclease V in liposomes on skin cancer in xeroderma pigmentosum: a randomised study." Lancet 357:926-929, 2001.
19 The End
20 Grid on Back
21 The Data

    sex   trt   akb   ak12tot
    -------------------------
    F       0     0         5
    M       0     0         1
    F       0     0         1
    F       0     0         0
    F       0     1        15
    M       0     0         3
    F       0   100       193
    M       0     0         2
    M       0     2        13
    M       1    47        71
    F       1     3         2
    F       1     0        10
    M       1     0         0
    F       1     0         2
    M       1     0         0
    F       1     0         0
    F       1     3        10
    F       1     1         0
    F       1     0         4
    F       1     5         3
    M       1     0         0
    F       1     0         2
    F       1     0         7
    F       1     3        14
22 Descriptive Statistics (1)
23 Descriptive Statistics (2)
24 Negative Binomial Model
- Need a model that allows for individual variability.
- The negative binomial distribution assumes that each patient's count is Poisson, but that the incidence rate varies across patients according to a gamma distribution.
- Treatment: p = 0.64
- Age: p = 0.45
- AK baseline: p = 0.0001
- Age x treatment: p < 0.001
- The main effect of treatment is not interpretable; need to look at effects separately by age.
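To see how a gamma-mixed Poisson allows for extra variability, the sketch below evaluates the negative binomial probabilities numerically and compares the resulting variance with the standard formula mu + mu^2/k, where mu is the mean and k is the gamma shape parameter. The values of mu and k are hypothetical, chosen only for illustration; they are not estimates from the study.

```python
import math

def neg_binomial_pmf(y, mu, k):
    """P(Y = y) for a gamma-mixed Poisson with mean mu and gamma shape k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

mu, k = 5.0, 0.5       # hypothetical values, for illustration only
support = range(3000)  # long enough that the truncated tail is negligible
probs = [neg_binomial_pmf(y, mu, k) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(round(mean, 4), round(variance, 4), mu + mu ** 2 / k)
```

A Poisson distribution with the same mean would have variance equal to mu; the extra mu^2/k term is what lets the negative binomial accommodate patient-to-patient differences in incidence rate.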
25 Negative Binomial Results
- This model shows only that the baseline AK and age x treatment effects are significant factors.
- It also gives a test for whether the data are Poisson: the test rejects the Poisson distribution (p < 0.0005).
- A chi-square test (observed - expected) suggests that these data are not negative binomial.