A Course in Multiple Comparisons and Multiple Tests

1
A Course in Multiple Comparisons and Multiple
Tests

Peter H. Westfall, Ph.D.
Professor of Statistics, Department of Information
Systems and Quantitative Sciences
Texas Tech University

2
Learning Outcomes

Elucidate reasons that multiple comparisons
procedures (MCPs) are used, as well as their
controversial nature
Know when and how to use classical interval-based
MCPs including Tukey, Dunnett, and Bonferroni
Understand how MCPs affect power
Elucidate the definition of closed testing
procedures (CTPs)
Understand specific types of CTPs, benefits and
drawbacks
Distinguish false discovery rate (FDR) from
familywise error rate (FWE)
Understand general issues regarding Bayesian MCPs

3
Outline of Material
Introduction: Overview of Problems, Issues, and Solutions; Regulatory and Ethical Perspectives; Families of Tests; Familywise Error Rate; Bonferroni (pp. 5-21)
Interval-Based Multiple Inferences in the standard linear models framework: One-way ANOVA and ANCOVA; Tukey, Dunnett, and Monte Carlo Methods; Adjusted p-values; general contrasts; Multivariate T distribution; Tight Confidence Bands; Treatment x Covariate Interaction; Subgroup Analysis (pp. 22-55)
Power and Sample Size Determinations for multiple comparisons (pp. 56-65)
Stepwise and Closed Testing Procedures I: P-Value-Based Methods. Closure Method; Global Tests; Holm, Hommel, Hochberg and Fisher combined methods for p-values (pp. 66-90)
Stepwise and Closed Testing Procedures II: Fixed Sequences, Gatekeepers and I-U tests. Fixed Sequence tests; Gatekeeper procedures; Multiple hypotheses in a gate; Intersection-union tests, with application to dose response, primary and secondary endpoints, bioequivalence and combination therapies (pp. 91-101)
4
Outline (Continued)
Stepwise and Closed Testing Procedures III: Methods that use logical constraints and correlations. Lehmacher et al. Method for Multiple endpoints; Range-Based and F-based ANOVA Tests; Fisher's protected LSD; Free and Restricted Combinations; Shaffer-Type Methods for dose comparisons and subgroup analysis (pp. 102-118)
Multiple nonparametric and semiparametric tests: Bootstrap and Permutation-based Closed testing. PROC MULTTEST; examples with multiple endpoints, genetic associations, gene expression, binary data and adverse events (pp. 119-139)
More complex models and FWE control: Heteroscedasticity, Repeated measures, and large sample methods. Applications: multiple treatment comparisons, crossover designs, logistic regression of cure rates (pp. 140-152)
False Discovery Rate: Benjamini and Hochberg's method, comparison with FWE controlling methods (pp. 153-158)
Bayesian methods: Simultaneous credible intervals, ranking probabilities and loss functions, PROC MIXED posterior sampling, Bayesian testing of multiple endpoints (pp. 159-178)
Conclusion, discussion, references (pp. 179-184)
5
Sources of Multiplicity

Multiple variables (endpoints)
Multiple timepoints
Subgroup analysis
Multiple comparisons
Multiple tests of the same hypothesis
Variable and Model selection
Interim analysis
Hidden multiplicity: file drawers, outliers

6
The Problem

Significant results may fail to replicate.
Documented cases
Ioannidis (JAMA 2005)

7
An Example

Phase III clinical trial
Three arms: Placebo, AC, Drug
Endpoints: Signs and symptoms
Measured at weekly visits
Baseline covariates

8
Example-Continued

Features displayed at trial conclusion
Trends
Baseline adjusted comparisons of raw data
Baseline adjusted changes
Nonparametric and parametric tests
Specific endpoints and combinations of endpoints
Particular week results
AC and Placebo comparisons
Fact: The features that look the best are biased.

9
Example Continued Feature Selection

Effect Size is a feature
Effect size = (mean difference)/sd
Dimensionless
.2 = small, .5 = medium, .8 = large
Estimated effect sizes $F_1, F_2, \ldots, F_k$
What if you select $\max\{F_1, F_2, \ldots, F_k\}$ and publish it?

10
The Scientific Concern
11
Feature Selection Model

Clinical Trials Simulation
Real data used
Conservative!
If you must know more:
$F_j = \mu_j + \varepsilon_j$, $j = 1, \ldots, 20$.
Error terms $\varepsilon_j$ are $N(0, .2^2)$
True effect sizes $\mu_j$ are $N(.3, .1^2)$
Features $F_j$ are highly correlated.
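The feature selection model above can be sketched in a few lines of Python. This is only an illustration, not the simulation used in the course: the slide says the features are "highly correlated" without giving a value, so the correlation `rho = 0.9` among error terms (induced by a shared error component), the number of replications, and the seed are all assumptions.

```python
import random
import statistics


def simulate_selection(k=20, reps=2000, rho=0.9, seed=1):
    """Return (mean published estimate, mean true effect of the published feature)."""
    rng = random.Random(seed)
    published, truths = [], []
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # assumed common error component -> correlation rho
        mu = [rng.gauss(0.3, 0.1) for _ in range(k)]  # true effect sizes, N(.3, .1^2)
        f = [
            m + 0.2 * (rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
            for m in mu
        ]  # F_j = mu_j + e_j with sd(e_j) = .2
        best = max(range(k), key=f.__getitem__)  # select the best-looking feature
        published.append(f[best])
        truths.append(mu[best])
    return statistics.mean(published), statistics.mean(truths)


mean_published, mean_truth = simulate_selection()
print(f"mean selected estimate {mean_published:.3f} vs. its true effect {mean_truth:.3f}")
```

Under these assumptions the selected estimate overstates the true effect of the selected feature on average, which is the selection effect the slide describes.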

12
Key Points: (i) Multiplicity invites Selection; (ii) Selection has an EFFECT

Just like effects due to
Treatment
Confounding
Learning
Nonresponse
Placebo

13
Published Guidelines

ICH Guidelines
CPMP Points to consider
CDRH Statistical Guidance
ASA Ethical Guidelines

14
Regulatory/Journal/Ethical/Professional Concerns

Replicability (good science)
Fairness
Regulatory report:
"The drug company reported efficacy at p = .047.
We repeated the analysis in several different
ways that the company might have done. In 20
re-analyses of the data, 18 produced p-values
greater than .05. Only one of the 20 re-analyses
produced a p-value smaller than .047."

15
Multiple Inferences Notation

There is a family of k inferences
Parameters are $\theta_1, \ldots, \theta_k$
Null hypotheses are
$H_{01}: \theta_1 = 0, \ldots, H_{0k}: \theta_k = 0$

16
Comparisonwise Error Rate (CER)

Intervals:
$\mathrm{CER}_j = P(\text{Interval}_j \text{ incorrect})$
Tests:
$\mathrm{CER}_j = P(\text{Reject } H_{0j} \mid H_{0j} \text{ is true})$
Usually $\mathrm{CER} = \alpha = .05$

17
Familywise Error Rate (FWE)
Intervals: FWE = 1 - P(all intervals are correct)
Tests: FWE = P(reject at least one true null)
18
False Discovery Rate

FDR = E(proportion of rejections that are incorrect)
Let R = total number of rejections
Let V = number of erroneous rejections
FDR = E(V/R) (0/0 defined as 0).
FWE = P(V > 0)
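To make the two definitions concrete, here is a minimal sketch that estimates FWE and FDR from a set of replicated experiments. The three toy replicates and the function names are made up for illustration only.

```python
def false_discovery_proportion(rejected, is_true_null):
    """V/R for one experiment, with 0/0 defined as 0."""
    r = sum(rejected)
    v = sum(1 for rej, null in zip(rejected, is_true_null) if rej and null)
    return v / r if r > 0 else 0.0


def error_rates(replicates):
    """replicates: list of (rejected, is_true_null) pairs of boolean lists."""
    any_false = [
        any(rej and null for rej, null in zip(rejected, nulls))
        for rejected, nulls in replicates
    ]
    fdps = [false_discovery_proportion(rejected, nulls) for rejected, nulls in replicates]
    fwe = sum(any_false) / len(replicates)  # estimate of P(V > 0)
    fdr = sum(fdps) / len(replicates)       # estimate of E(V/R)
    return fwe, fdr


# Hypothetical replicates: (which hypotheses were rejected, which nulls are true)
toy = [
    ([True, True, False, False], [True, False, False, True]),    # V=1, R=2
    ([False, False, False, False], [True, False, False, True]),  # V=0, R=0
    ([True, True, True, False], [False, False, False, True]),    # V=0, R=3
]
fwe, fdr = error_rates(toy)
print(fwe, fdr)
```

In this toy example the estimated FDR is smaller than the estimated FWE, because an experiment with one false rejection among two contributes 1 to the FWE count but only 1/2 to the FDR average.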

19
Bonferroni Method

Identify Family of inferences
Identify number of elements (k) in the Family
Use $\alpha/k$ for all inferences.
Ex: With k = 36, p-values must be less than
0.05/36 = 0.0014 to be significant
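The arithmetic of the example can be checked with a short sketch. The comparison with unadjusted testing assumes independent tests with all nulls true, which is an assumption made here for illustration and not a claim of the slide.

```python
def bonferroni_threshold(alpha, k):
    """Per-test significance level under the Bonferroni method."""
    return alpha / k


def fwe_if_independent(per_test_level, k):
    """FWE when all k nulls are true and the tests are independent (assumption)."""
    return 1 - (1 - per_test_level) ** k


threshold = bonferroni_threshold(0.05, 36)
print(round(threshold, 4))                        # per-test level, about 0.0014
print(round(fwe_if_independent(0.05, 36), 3))     # unadjusted testing at .05
print(round(fwe_if_independent(threshold, 36), 4))  # Bonferroni-adjusted testing
```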

20
FWE Control for Bonferroni

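$\mathrm{FWE} = P(p_{0j_1} \le .05/36 \text{ or } \cdots \text{ or } p_{0j_m} \le .05/36 \mid H_{0j_1}, \ldots, H_{0j_m} \text{ true})$
$\le P(p_{0j_1} \le .05/36) + \cdots + P(p_{0j_m} \le .05/36)$
$= (.05)m/36 \le .05$

[Venn diagram of two overlapping events A and B]
$P(A \cup B) \le P(A) + P(B)$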
21
Families in clinical trials [1]
Efficacy:
Main Interest (Primary and Secondary): Approval and labeling depend on these. Tight FWE control needed.
Lesser Interest: Depending on goals and reviewers, FWE controlling methods might be needed.
Supportive Tests: Mostly descriptive. FWE control not needed.
Exploratory Tests: Investigate new indications; future trials needed to confirm. Do what makes sense.
Safety:
Serious and known treatment-related AEs: FWE control not needed.
All other AEs: Reasonable to control FWE (or FDR).
[1] Westfall, P. and Bretz, F. (2003). Multiplicity in Clinical Trials. Encyclopedia of Biopharmaceutical Statistics, second edition, Shein-Chung Chow, ed., Marcel Dekker Inc., New York, pp. 666-673.
22
Classical Single-Step Testing and Interval
Methods to Control FWE

Simultaneous confidence intervals
Adjusted p-values
Dunnett method
Tukey's method
Simulation-based methods for general comparisons

23
Specificity and Sensitivity
If you want ...                                then use
Estimates of effect sizes and error margins    Simultaneous Confidence Intervals
Confident inequalities                         Stepwise or closed tests
Overall Test                                   F-test, O'Brien, etc.

24
The Model

$Y = X\beta + \varepsilon$
where $\varepsilon \sim N(0, \sigma^2 I)$
Includes ANOVA, ANCOVA, regression
For group comparisons, covariate adjustment
Not valid for survival analysis, binary data,
multivariate data

25
Example Pairwise Comparisons against Control
Goal: Estimate all mean differences from control
and provide simultaneous 95% error margins
What $c_\alpha$ to use?
26
Comparison of Critical Values
27
Results - Dunnett
The GLM Procedure
Dunnett's t Tests for gain
NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.
Alpha                              0.05
Error Degrees of Freedom             21
Error Mean Square              210.0048
Critical Value of Dunnett's t   2.78972
Minimum Significant Difference   28.586
Comparisons significant at the 0.05 level are indicated by ***.

g            Difference       Simultaneous 95%
Comparison   Between Means    Confidence Limits
1 - 0           -9.48         -38.07    19.11
4 - 0          -13.50         -42.09    15.09
5 - 0          -20.70         -49.29     7.89
2 - 0          -24.90         -53.49     3.69
6 - 0          -31.14         -59.73    -2.55  ***
3 - 0          -33.24         -61.83    -4.65  ***
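As a quick check on the listing, the minimum significant difference should equal the critical value times the standard error of a difference of two group means. The sketch below assumes four observations per group; that figure is taken from the N column of the Tukey grouping listing later in the transcript, not from the Dunnett output itself.

```python
import math

critical_value = 2.78972   # Dunnett's two-sided critical value from the listing
mse = 210.0048             # error mean square from the listing
n_per_group = 4            # assumption: balanced design with 4 animals per group

# Standard error of a difference between two group means in a balanced design
se_diff = math.sqrt(2 * mse / n_per_group)
msd = critical_value * se_diff
print(round(msd, 3))

# Simultaneous interval for the "6 - 0" comparison: difference +/- MSD
lower, upper = -31.14 - msd, -31.14 + msd
print(round(lower, 2), round(upper, 2))
```

An interval excludes zero exactly when the absolute difference exceeds the minimum significant difference, which is why only the last two comparisons are significant.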
28
$c_\alpha$ is the $1-\alpha$ quantile of the distribution of
$\max_i |Z_i - Z_0| / (2\chi^2_{df}/df)^{1/2}$, called Dunnett's
two-sided range distribution.
29
Adjusted p-Values
Definition: The adjusted p-value is the smallest FWE at
which the hypothesis is rejected, or equivalently, the FWE
for which the confidence interval has 0 as a
boundary.
30
Adjusted p-values for Dunnett
proc glm data=tox;
  class g;
  model gain=g;
  lsmeans g/adjust=dunnett pdiff;
run;
31
Example All Pairwise Comparisons
Goal: Estimate all mean differences and provide
simultaneous 95% error margins
What $c_\alpha$ to use?
32
Comparison of Critical Values
33
Tukey Comparisons
Alpha                                   0.05
df                                        21
MSE                                 210.0048
Critical Value of Studentized Range    4.597
Minimum Significant Difference        33.311
Means with the same letter are not significantly different.

Tukey Grouping     Mean    N   G
      A          105.38    4   0
      A           95.90    4   1
      A           91.88    4   4
      A           84.68    4   5
      A           80.48    4   2
      A           74.24    4   6
      A           72.14    4   3
34
Tukey Adjusted p-Values
General Linear Models Procedure
Least Squares Means
Adjustment for multiple comparisons: Tukey

G    GAIN LSMEAN   LSMEAN Number
0      105.380          1
1       95.900          2
2       80.480          3
3       72.140          4
4       91.880          5
5       84.680          6
6       74.240          7

Pr > |T| H0: LSMEAN(i)=LSMEAN(j)
i/j      1        2        3        4        5        6        7
1        .      0.9641   0.2351   0.0507   0.8364   0.4319   0.0769
2      0.9641     .      0.7391   0.2810   0.9996   0.9227   0.3806
3      0.2351   0.7391     .      0.9808   0.9172   0.9995   0.9958
4      0.0507   0.2810   0.9808     .      0.4860   0.8771   1.0000
5      0.8364   0.9996   0.9172   0.4860     .      0.9910   0.6102
6      0.4319   0.9227   0.9995   0.8771   0.9910     .      0.9438
7      0.0769   0.3806   0.9958   1.0000   0.6102   0.9438     .
35
Tukey Simultaneous Intervals
         Simultaneous                    Simultaneous
         Lower           Difference      Upper
         Confidence      Between         Confidence
i   j    Limit           Means           Limit
1   2    -23.831013       9.480000       42.791013
1   3     -8.411013      24.900000       58.211013
1   4     -0.071013      33.240000       66.551013
1   5    -19.811013      13.500000       46.811013
1   6    -12.611013      20.700000       54.011013
1   7     -2.171013      31.140000       64.451013
2   3    -17.891013      15.420000       48.731013
2   4     -9.551013      23.760000       57.071013
2   5    -29.291013       4.020000       37.331013
2   6    -22.091013      11.220000       44.531013
2   7    -11.651013      21.660000       54.971013
3   4    -24.971013       8.340000       41.651013
3   5    -44.711013     -11.400000       21.911013
3   6    -37.511013      -4.200000       29.111013
3   7    -27.071013       6.240000       39.551013
4   5    -53.051013     -19.740000       13.571013
4   6    -45.851013     -12.540000       20.771013
4   7    -35.411013      -2.100000       31.211013
5   6    -26.111013       7.200000       40.511013
5   7    -15.671013      17.640000       50.951013
6   7    -22.871013      10.440000       43.751013
36
$c_\alpha$ is $(1/\sqrt{2}) \times$ the $1-\alpha$ quantile of the
distribution of $\max_{i,i'} |Z_i - Z_{i'}| / (\chi^2_{df}/df)^{1/2}$,
which is called the Studentized range
distribution.
37
Unbalanced Designs and/or Covariates

Tukey method is conservative when the design is
unbalanced and/or there are covariates; otherwise
exact
Dunnett method is conservative when there are
covariates; otherwise exact
Conservative means
True FWE < Nominal FWE
It also means less powerful

38
Tukey-Kramer Method for all pairwise comparisons

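Let $c_\alpha$ be the critical value for the balanced
case using Tukey's method and the correct df.
Intervals are $\bar{y}_i - \bar{y}_{i'} \pm c_\alpha \hat{\sigma} \sqrt{1/n_i + 1/n_{i'}}$
Conservative (Hayter, 1984, Annals of Statistics)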

39
Exact Method for General Comparisons of Means
40
Multivariate T-Distribution Details
41
Calculation of Exact ca

Edwards and Berry: Simple simulation
Hsu and Nelson: Factor analytic control variate
(better)
Genz and Bretz: Integration using lattice methods
(best)
Even with simple simulation, the value ca can be
obtained
with reasonable precision.
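A minimal sketch of the "simple simulation" idea follows. It is not the Edwards and Berry algorithm as implemented in SAS; it simply simulates the statistic behind Tukey's critical value for the balanced one-way layout used earlier (7 groups, 21 error df) and takes the empirical 95th percentile. The replication count and seed are arbitrary choices.

```python
import math
import random


def simulate_tukey_critical_value(groups=7, df=21, alpha=0.05, reps=40000, seed=121011):
    """Empirical 1-alpha quantile of max|Z_i - Z_i'| / sqrt(2 * chi2_df / df)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(groups)]
        chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
        # max over pairs of |Z_i - Z_i'| equals max(z) - min(z)
        stats.append((max(z) - min(z)) / math.sqrt(2.0) / math.sqrt(chi2 / df))
    stats.sort()
    return stats[math.ceil((1 - alpha) * reps) - 1]


c_alpha = simulate_tukey_critical_value()
print(round(c_alpha, 3), "vs. tabled value", round(4.597 / math.sqrt(2), 3))
```

The simulated value should land close to the Studentized range critical value 4.597 divided by the square root of 2; more replications tighten the agreement.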

Edwards, D., and Berry, J. (1987). The efficiency of simulation-based multiple comparisons. Biometrics, 43, 913-928.
Hsu, J.C. and Nelson, B.L. (1998). Multiple comparisons in the general linear model. Journal of Computational and Graphical Statistics, 7, 23-41.
Genz, A. and Bretz, F. (1999). Numerical Computation of Multivariate t Probabilities with Application to Power Calculation of Multiple Contrasts. J. Stat. Comp. Simul., 63, pp. 361-378.
42
Example ANCOVA with two covariates
Y = Diastolic BP
Group = Therapy (Control, D1, D2, D3)
X1 = Baseline Diastolic BP
X2 = Baseline Systolic BP
Goal: Compare all therapies, controlling for baseline
proc glm data=research.bpr;
  class therapy;
  model dbp10 = therapy dbp7 sbp7;
  lsmeans therapy / pdiff cl adjust=simulate(nsamp=10000000 cvadjust seed=121011 report);
run; quit;
43
Results From ANCOVA
Source     DF    Type III SS    Mean Square    F Value    Pr > F
THERAPY     3     677.429172     225.809724       6.05    0.0006
DBP7        1    6832.878653    6832.878653     183.06    <.0001
SBP7        1      51.123459      51.123459       1.37    0.2435

Least Squares Means for Effect THERAPY
         Difference       Simultaneous 95%
         Between          Confidence Limits for
i   j    Means            LSMean(i)-LSMean(j)
1   2     2.832658        -0.424816     6.09013
1   3     1.328481        -2.099566     4.756527
1   4    -2.536262        -5.981471     0.908947
2   3    -1.504178        -4.846403     1.838047
2   4    -5.368920        -8.734744    -2.003097
3   4    -3.864743        -7.398994    -0.330491
Note: 4 is control
44
Details for Quantile Simulation
Random number seed            121011
Comparison type               All
Sample size                   9999938
Target alpha                  0.05
Accuracy radius (target)      0.0002
Accuracy radius (actual)      437E-7
Accuracy confidence           99%

Simulation Results
                95%         Estimated    99% Confidence
Method          Quantile    Alpha        Limits
Simulated       2.594159    0.0500       0.0500   0.0500
Tukey-Kramer    2.594637    0.0499       0.0499   0.0500
Bonferroni      2.669484    0.0411       0.0410   0.0411
Sidak           2.662029    0.0419       0.0418   0.0419
GT-2            2.660647    0.0420       0.0420   0.0421
Scheffe         2.823701    0.0270       0.0270   0.0270
T               1.974017    0.2017       0.2016   0.2018
NOTE: PROCEDURE GLM used real time 21.23 seconds
45
Results from ANCOVA-Dunnett
                            H0: LSMean=Control
THERAPY    DBP10 LSMEAN     Pr > |t|
Dose 1     88.8171113       0.1407
Dose 2     85.9844529       0.0002
Dose 3     87.4886307       0.0140
Placebo    91.3533732

Least Squares Means for Effect THERAPY
         Difference       Simultaneous 95%
         Between          Confidence Limits for
i   j    Means            LSMean(i)-LSMean(j)
1   4    -2.536262        -5.675847     0.603323
2   4    -5.368920        -8.436161    -2.301679
3   4    -3.864743        -7.085470    -0.644015
46
Details for Quantile Simulation-Dunnett
Random number seed            121011
Comparison type               Control, two-sided
Sample size                   9999938
Target alpha                  0.05
Accuracy radius (target)      0.0002
Accuracy radius (actual)      139E-7
Accuracy confidence           99%

Simulation Results
                          95%         Estimated    99% Confidence
Method                    Quantile    Alpha        Limits
Simulated                 2.364031    0.0500       0.0500   0.0500
Dunnett-Hsu, two-sided    2.364084    0.0500       0.0500   0.0500
Bonferroni                2.417902    0.0437       0.0437   0.0437
Sidak                     2.411491    0.0444       0.0444   0.0444
GT-2                      2.410664    0.0445       0.0445   0.0445
Scheffe                   2.823701    0.0145       0.0145   0.0145
T                         1.974017    0.1229       0.1229   0.1230
NOTE: PROCEDURE GLM used real time 19.00 seconds
47
More General Inferences
Question: For what values of the covariate
is treatment A better than treatment B?
48
Discussion of (Treatment x Covariate) Interaction
Example
49
The GLIMMIX Procedure
Computes MC-exact simultaneous confidence
intervals and adjusted p-values for any set
of linear functions in a linear model
50
GLIMMIX syntax
proc glimmix data=research.tire;
  class make;
  model cost = make mph make*mph;
  estimate "10" make 1 -1 make*mph 10 -10,
           "15" make 1 -1 make*mph 15 -15,
           "20" make 1 -1 make*mph 20 -20,
           "25" make 1 -1 make*mph 25 -25,
           "30" make 1 -1 make*mph 30 -30,
           "35" make 1 -1 make*mph 35 -35,
           "40" make 1 -1 make*mph 40 -40,
           "45" make 1 -1 make*mph 45 -45,
           "50" make 1 -1 make*mph 50 -50,
           "55" make 1 -1 make*mph 55 -55,
           "60" make 1 -1 make*mph 60 -60,
           "65" make 1 -1 make*mph 65 -65,
           "70" make 1 -1 make*mph 70 -70
           / adjust=simulate(nsamp=10000000 report) cl;
run;
51
Output from PROC GLIMMIX
Simultaneous intervals are Estimate ± 2.648 × StdErr

Label    Estimate    StdErr    tValue    AdjLower    AdjUpper
10       -4.1067     0.9143    -4.49     -6.5279     -1.6854
15       -3.4539     0.8084    -4.27     -5.5947     -1.3131
20       -2.8011     0.7101    -3.94     -4.6815     -0.9207
25       -2.1483     0.6230    -3.45     -3.7981     -0.4985
30       -1.4956     0.5524    -2.71     -2.9585     -0.03260
35       -0.8428     0.5054    -1.67     -2.1812      0.4956
40       -0.1900     0.4887    -0.39     -1.4842      1.1042
45        0.4628     0.5054     0.92     -0.8756      1.8012
50        1.1156     0.5524     2.02     -0.3474      2.5785
55        1.7683     0.6230     2.84      0.1185      3.4181
60        2.4211     0.7101     3.41      0.5407      4.3015
65        3.0739     0.8084     3.80      0.9331      5.2147
70        3.7267     0.9143     4.08      1.3054      6.1479

Bonferroni critical value is $t_{16,\,.05/(2 \cdot 13)} = 3.377$.
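The adjusted limits in the listing can be reproduced by hand from the estimates and standard errors. The sketch below does this for three of the thirteen rows, using the rounded simulated critical value 2.648, and compares the margin with the wider Bonferroni margin. Small discrepancies in the last digit come from that rounding.

```python
SIMULATED_CRITICAL_VALUE = 2.648   # rounded value quoted with the output
BONFERRONI_CRITICAL_VALUE = 3.377  # quoted Bonferroni alternative

# (label, estimate, standard error) copied from the listing
rows = [("10", -4.1067, 0.9143), ("40", -0.1900, 0.4887), ("55", 1.7683, 0.6230)]


def simultaneous_interval(estimate, std_err, critical_value):
    margin = critical_value * std_err
    return estimate - margin, estimate + margin


intervals = {
    label: simultaneous_interval(est, se, SIMULATED_CRITICAL_VALUE)
    for label, est, se in rows
}
for label, (lower, upper) in intervals.items():
    excludes_zero = lower > 0 or upper < 0
    print(label, round(lower, 4), round(upper, 4), "excludes 0" if excludes_zero else "")

# The Bonferroni margin for the same row is wider by the ratio of critical values
print(round(BONFERRONI_CRITICAL_VALUE / SIMULATED_CRITICAL_VALUE, 3))
```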
52
Other Applications of Linear Combinations

Multiple Trend Tests
(0,1,2,3), (0,1,2,4), (0,4,6,7)
(carcinogenicity)
(0,0,1), (0,1,1), (0,1,2) (recessive/dominant/ordinal
genotype effects)
Subgroup Analysis
Subgroups define linear combinations (more on
next slide)

53
Subgroup Analysis Example

Data: $Y_{ijkl}$, where $i$ = Trt, Cntrl; $j$ = Old, Yng; $k$ = GoodInit, PoorInit.
Model: $Y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl}$, where
$\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$
Subgroup Contrasts
             μ111   μ112   μ121   μ122   μ211   μ212   μ221   μ222
Overall       ¼      ¼      ¼      ¼     -¼     -¼     -¼     -¼
Older         ½      ½      0      0     -½     -½      0      0
Younger       0      0      ½      ½      0      0     -½     -½
GoodInit      ½      0      ½      0     -½      0     -½      0
PoorInit      0      ½      0      ½      0     -½      0     -½
OldGood       1      0      0      0     -1      0      0      0
OldPoor       0      1      0      0      0     -1      0      0
YoungGood     0      0      1      0      0      0     -1      0
YoungPoor     0      0      0      1      0      0      0     -1

54
Subgroup Analysis Results

Label             Estimate    StdErr    tValue    Probt     Adjp      AdjLower    AdjUpper
Overall           0.7075      0.1956    3.62      0.0002    0.0015     0.2460     I
Older             0.9952      0.2673    3.72      0.0002    0.0011     0.3646     I
Younger           0.4197      0.2847    1.47      0.0717    0.2605    -0.2521     I
GoodInitHealth    0.5871      0.2878    2.04      0.0219    0.0984    -0.09197    I
PoorInitHealth    0.8279      0.2644    3.13      0.0011    0.0068     0.2039     I
OldGood           0.8748      0.3387    2.58      0.0056    0.0295     0.07566    I
OldPoor           1.1157      0.3231    3.45      0.0004    0.0026     0.3532     I
YoungGood         0.2993      0.3562    0.84      0.2014    0.5494    -0.5413     I
YoungPoor         0.5401      0.3338    1.62      0.0544    0.2091    -0.2476     I
(SAS code available upon request)
55
Summary

Include only comparisons of interest.
Utilize correlations to be less conservative.
The critical values can be computed exactly only
in balanced ANOVA for all pairwise comparisons,
or in unbalanced ANOVA for comparisons with
control.
Simulation-based methods are exact if you let
the computer run for a while. This is my general
recommendation.

56
Power Analysis

Sample size - Design of study
Power is less when you use multiple comparisons ⇒
larger sample sizes
Many power definitions
Bonferroni and independence are convenient (but
conservative) starting points

57
Power Definitions
Complete Power: P(Reject all H0i that are false)
Minimal Power: P(Reject at least one H0i that is false)
Individual Power: P(Reject a particular H0i that is false)
Proportional Power: Average proportion of false H0i that are rejected
58
Power Calculations.
Example: H1 and H2 powered individually at 50%;
H3 and H4 powered individually at 80%; all tests
independent.
Complete Power = P(reject H1 and H2 and H3 and H4) = .5 × .5 × .8 × .8 = 0.16.
Minimal Power = P(reject H1 or H2 or H3 or H4) = 1 - P(accept H1 and H2 and H3 and H4) = 1 - (1-.5)(1-.5)(1-.8)(1-.8) = 0.99.
Individual Power = P(reject H3 (say)) = 0.80. (depends on the test)
Proportional Power = (.5 + .5 + .8 + .8)/4 = 0.65
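The four power definitions in this example reduce to simple arithmetic when the tests are independent. A short sketch, with function and variable names chosen here purely for illustration:

```python
import math

# Individual powers of the four independent tests in the example
individual_powers = [0.5, 0.5, 0.8, 0.8]

# Complete power: every false null is rejected
complete_power = math.prod(individual_powers)

# Minimal power: at least one false null is rejected
minimal_power = 1 - math.prod(1 - power for power in individual_powers)

# Proportional power: expected fraction of false nulls rejected
proportional_power = sum(individual_powers) / len(individual_powers)

print(round(complete_power, 2), round(minimal_power, 2), round(proportional_power, 2))
```

The product formulas rely on the independence stated in the example; with correlated tests, complete and minimal power would generally need simulation.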
59
Sample Size for Adequate Individual Power -
Conservative Estimate
60
Individual power of two-tail two-sample
Bonferroni t-tests
%let MuDiff = 5;     /* Smallest meaningful difference MUx-MUy that you want to detect */
%let Sigma = 10.0;   /* A guess of the population std. dev. */
%let alpha = .05;    /* Familywise Type I error probability of the test */
%let k = 4;          /* Number of tests */
options ls=76;
data power;
  cer = &alpha/&k;
  do n = 2 to 100 by 2;                       * n = sample size for each group;
    df = n + n - 2;
    ncp = (&MuDiff)/(&Sigma*sqrt(2/n));       * The noncentrality parameter;
    tcrit = tinv(1-cer/2, df);                * The critical t value;
    power = 1 - probt(tcrit, df, ncp) + probt(-tcrit, df, ncp);
    output;
  end;
proc print data=power; run;
proc plot data=power;
  plot power*n/vpos=30;
run;
61
Graph of Power Function
[Plot of power*n (SAS PROC PLOT output): power on the vertical axis from 0.0 to 1.0, n on the horizontal axis from 0 to 100. Annotation: n = 92 for 80% power.]
62
IndividualPower macro

Uses PROBMC and PROBT (noncentral)
Assumes that you want to use the single-step
(confidence interval based) Dunnett (one-
or two-sided) or Range (two-sided) test
Less conservative than Bonferroni
Conservative compared to stepwise procedures
%IndividualPower(MCP=DUNNETT2, g=4, d=5, s=10)

Westfall et al (1999), Multiple Comparisons and
Multiple Tests Using SAS
63
IndividualPower Output
64
More general Power- Simulate!
Invocation:
%SimPower(method = dunnett, TrueMeans = (10, 10, 13, 15, 15), s = 10, n = 87, seed = 12345)
Output:
Method=DUNNETT, Nominal FWE=0.05, nrep=1000
True means = (10, 10, 13, 15, 15), n=87, s=10

Quantity              Estimate    ---95% CI----
Complete Power        0.28800     (0.260,0.316)
Minimal Power         0.92900     (0.913,0.945)
Proportional Power    0.65133     (0.633,0.669)
True FWE              0.01900     (0.011,0.027)
Directional FWE       0.01900     (0.011,0.027)
65
Concluding Remarks - Power

Need a bigger n
Like to avoid bigger n (see sequential,
gatekeepers methods later)
Which definition?
Bonferroni and independence useful
Simulation useful especially for the more
complex methods that follow

66
Closed and Stepwise Testing Methods I: Standard
P-Value Based Methods

If you want ...                                then use
Estimates of effect sizes and error margins    Simultaneous Confidence Intervals
Confident inequalities                         Stepwise or closed tests: Holm's Method, Hommel's Method, Hochberg's Method, Fisher Combination Method
Overall Test                                   F-test, O'Brien, etc.

67
Closed Testing Method(s)

Form the closure of the family by including all
intersection hypotheses.
Test every member of the closed family by a
(suitable) α-level test. (Here, α refers to
comparison-wise error rate.)
A hypothesis can be rejected provided that
its corresponding test is significant at level α,
and
every other hypothesis in the family that implies
it is rejected by its α-level test.

68
Closed Testing Multiple Endpoints
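$H_0: \delta_1=\delta_2=\delta_3=\delta_4=0$
$H_0: \delta_1=\delta_2=\delta_3=0$
$H_0: \delta_1=\delta_2=\delta_4=0$
$H_0: \delta_1=\delta_3=\delta_4=0$
$H_0: \delta_2=\delta_3=\delta_4=0$
$H_0: \delta_1=\delta_4=0$
$H_0: \delta_2=\delta_4=0$
$H_0: \delta_1=\delta_2=0$
$H_0: \delta_1=\delta_3=0$
$H_0: \delta_2=\delta_3=0$
$H_0: \delta_3=\delta_4=0$
$H_0: \delta_1=0$, p = 0.0121
$H_0: \delta_2=0$, p = 0.0142
$H_0: \delta_3=0$, p = 0.1986
$H_0: \delta_4=0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control,
endpoint j.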
69
Closed Testing Multiple Comparisons
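$\mu_1=\mu_2=\mu_3=\mu_4$
$\mu_1=\mu_3,\ \mu_2=\mu_4$
$\mu_1=\mu_4,\ \mu_2=\mu_3$
$\mu_1=\mu_2,\ \mu_3=\mu_4$
$\mu_1=\mu_2=\mu_3$
$\mu_1=\mu_2=\mu_4$
$\mu_1=\mu_3=\mu_4$
$\mu_2=\mu_3=\mu_4$
$\mu_1=\mu_2$
$\mu_1=\mu_3$
$\mu_1=\mu_4$
$\mu_2=\mu_3$
$\mu_2=\mu_4$
$\mu_3=\mu_4$
Note: Logical implications imply that there are
only 14 nodes, not $2^6 - 1 = 63$ nodes.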
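The count of 14 can be confirmed by brute force: take every nonempty subset of the six pairwise equalities, work out which group means it forces to be equal (transitivity), and count the distinct resulting hypotheses. The sketch below is a small illustration of that idea, with helper names invented here.

```python
from itertools import combinations


def partition_from_equalities(n_groups, equalities):
    """Partition of the groups implied by a set of pairwise equalities (union-find)."""
    parent = list(range(n_groups))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equalities:
        parent[find(a)] = find(b)
    blocks = {}
    for g in range(n_groups):
        blocks.setdefault(find(g), []).append(g)
    return frozenset(frozenset(block) for block in blocks.values())


def count_closure_nodes(n_groups):
    """Number of distinct intersection hypotheses among all pairwise comparisons."""
    pairs = list(combinations(range(n_groups), 2))
    nodes = set()
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            nodes.add(partition_from_equalities(n_groups, subset))
    return len(nodes)


print(count_closure_nodes(4))  # distinct nodes for four groups
print(2 ** 6 - 1)              # subsets of the six pairwise hypotheses
```

Many of the 63 subsets collapse to the same node; for example, the two equalities between groups 1 and 2 and between groups 2 and 3 already force the equality between groups 1 and 3.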
70
Control of FWE with Closed Tests
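Suppose $H_{0j_1}, \ldots, H_{0j_m}$ all are true (unknown to
you which ones).
{Reject at least one of $H_{0j_1}, \ldots, H_{0j_m}$ using CTP} $\Rightarrow$ {Reject $H_{0j_1} \cap \cdots \cap H_{0j_m}$}
Thus,
P(reject at least one of $H_{0j_1}, \ldots, H_{0j_m}$ | $H_{0j_1}, \ldots, H_{0j_m}$ all are true)
$\le$ P(reject $H_{0j_1} \cap \cdots \cap H_{0j_m}$ | $H_{0j_1}, \ldots, H_{0j_m}$ all are true) $\le \alpha$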
71
Examples of Closed Testing Methods
When the Composite Test is ...    Then the Closed Method is ...
Bonferroni MinP                   Holm's Method
Resampling-Based MinP             Westfall-Young method
Simes                             Hommel's method
O'Brien                           Lehmacher's method
Simple or weighted test           Fixed sequence test (a-priori ordered)

72
P-value Based Methods

Test global hypotheses using p-value combination
tests
Benefit: Fewer model assumptions; only need to
say that the p-values are valid
Allows for models other than homoscedastic
normal linear models (like survival analysis).

73
Holm's Method is Closed Testing Using the
Bonferroni MinP Test

Reject $H_{0j_1} \cap H_{0j_2} \cap \cdots \cap H_{0j_m}$ if
$\min(p_{0j_1}, p_{0j_2}, \ldots, p_{0j_m}) \le \alpha/m$.
Or, reject $H_{0j_1} \cap H_{0j_2} \cap \cdots \cap H_{0j_m}$ if
$p^* = m \min(p_{0j_1}, p_{0j_2}, \ldots, p_{0j_m}) \le \alpha$.
(Note that $p^*$ is a valid p-value for the joint
null, comparable to the p-value for Hotelling's $T^2$
test.)

74
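Holm's Stepdown Method
$H_0: \delta_1=\delta_2=\delta_3=\delta_4=0$   minp = 0.0121, p* = 0.0484
$H_0: \delta_1=\delta_3=\delta_4=0$   minp = 0.0121, p* = 0.0363
$H_0: \delta_2=\delta_3=\delta_4=0$   minp = 0.0142, p* = 0.0426
$H_0: \delta_1=\delta_2=\delta_3=0$   minp = 0.0121, p* = 0.0363
$H_0: \delta_1=\delta_2=\delta_4=0$   minp = 0.0121, p* = 0.0363
$H_0: \delta_1=\delta_2=0$   minp = 0.0121, p* = 0.0242
$H_0: \delta_1=\delta_3=0$   minp = 0.0121, p* = 0.0242
$H_0: \delta_1=\delta_4=0$   minp = 0.0121, p* = 0.0242
$H_0: \delta_2=\delta_3=0$   minp = 0.0142, p* = 0.0284
$H_0: \delta_2=\delta_4=0$   minp = 0.0142, p* = 0.0284
$H_0: \delta_3=\delta_4=0$   minp = 0.0191, p* = 0.0382
$H_0: \delta_1=0$, p = 0.0121
$H_0: \delta_2=0$, p = 0.0142
$H_0: \delta_3=0$, p = 0.1986
$H_0: \delta_4=0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control,
endpoint j.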
75
Shortcut For Holm's Method

Let $H_{(1)}, \ldots, H_{(k)}$ be the hypotheses corresponding
to $p_{(1)} \le \cdots \le p_{(k)}$
If $p_{(1)} \le \alpha/k$, reject $H_{(1)}$ and continue, else
stop and retain all $H_{(1)}, \ldots, H_{(k)}$.
If $p_{(2)} \le \alpha/(k-1)$, reject $H_{(2)}$ and continue,
else stop and retain all $H_{(2)}, \ldots, H_{(k)}$.
...
If $p_{(k)} \le \alpha$, reject $H_{(k)}$

76
Adjusted p-values for Closed Tests

The adjusted p-value for $H_{0j}$ is the maximum of
all p-values over all relevant nodes
In the previous example,
$p_{A(1)} = 0.0484$, $p_{A(2)} = 0.0484$, $p_{A(3)} = 0.0484$,
$p_{A(4)} = 0.1986$.
General formula for Holm: $p_{A(j)} = \max_{i \le j} (k-i+1) p_{(i)}$.
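Both routes to the Holm adjusted p-values can be sketched in Python: the shortcut formula, and the brute-force maximum over every node of the closure that contains the hypothesis, using the Bonferroni MinP p-value at each node. The function names are illustrative; the four raw p-values are those of the multiple-endpoints example.

```python
from itertools import combinations


def holm_adjusted(pvalues):
    """Shortcut: running maximum of (k - i + 1) * p_(i) over the ordered p-values."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda j: pvalues[j])
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, j in enumerate(order):  # rank = i - 1
        running_max = max(running_max, (k - rank) * pvalues[j])
        adjusted[j] = min(1.0, running_max)
    return adjusted


def bonferroni_minp(node_pvalues):
    """Bonferroni MinP p-value for one intersection hypothesis."""
    return min(1.0, len(node_pvalues) * min(node_pvalues))


def closure_adjusted(pvalues, node_test):
    """Maximum node p-value over all intersections containing each hypothesis."""
    k = len(pvalues)
    adjusted = [0.0] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            p_node = node_test([pvalues[j] for j in subset])
            for j in subset:
                adjusted[j] = max(adjusted[j], p_node)
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]  # endpoints 1-4
holm = holm_adjusted(raw)
closed = closure_adjusted(raw, bonferroni_minp)
print([round(p, 4) for p in holm])
print([round(p, 4) for p in closed])
```

The two lists agree, which is the sense in which Holm's method is a shortcut for closed testing with Bonferroni tests. Note that the output is in endpoint order, whereas the slide lists the adjusted values in order of the sorted raw p-values.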

77
Worksheet For Holm's Method
78
Simes Test for Global Hypotheses

Uses all p-values $p_1, p_2, \ldots, p_m$, not just the
MinP
Simes' test rejects $H_{01} \cap H_{02} \cap \cdots \cap H_{0m}$ if
$p_{(j)} \le j\alpha/m$ for at least one $j$.
⇒ p-value for the joint test is $p^* = \min_j (m/j) p_{(j)}$
Never larger than the Bonferroni p-value $m \times$ MinP
Type I error at most α under independence or
positive dependence of p-values

79
Rejection Regions
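[Figure: rejection regions of the Simes and Bonferroni tests for two p-values, drawn in the unit square with axes $p_1$ and $p_2$ and tick marks at $\alpha/2$ and $\alpha$.]
P(Simes Reject) $= 1 - (1 - \alpha/2)^2 + (\alpha/2)^2 = \alpha$
P(Bonferroni Reject) $= 1 - (1 - \alpha/2)^2 = \alpha - (\alpha/2)^2$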
80
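Hommel's Method (Closed Simes)
$H_0: \delta_1=\delta_2=\delta_3=\delta_4=0$, p* = 0.0255
$H_0: \delta_1=\delta_2=\delta_3=0$, p* = 0.0213
$H_0: \delta_1=\delta_2=\delta_4=0$, p* = 0.0191
$H_0: \delta_1=\delta_3=\delta_4=0$, p* = 0.0287
$H_0: \delta_2=\delta_3=\delta_4=0$, p* = 0.0287
$H_0: \delta_1=\delta_2=0$, p* = 0.0142
$H_0: \delta_1=\delta_3=0$, p* = 0.0242
$H_0: \delta_1=\delta_4=0$, p* = 0.0191
$H_0: \delta_2=\delta_3=0$, p* = 0.0284
$H_0: \delta_2=\delta_4=0$, p* = 0.0191
$H_0: \delta_3=\delta_4=0$, p* = 0.0382
$H_0: \delta_1=0$, p = 0.0121
$H_0: \delta_2=0$, p = 0.0142
$H_0: \delta_3=0$, p = 0.1986
$H_0: \delta_4=0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control,
endpoint j.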
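The node p-values in this diagram can be reproduced by applying the Simes p-value to each intersection and then taking, for each endpoint, the maximum over the nodes that contain it. The following brute-force sketch does exactly that; it is fine for four hypotheses but is not an efficient Hommel algorithm for large families.

```python
from itertools import combinations


def simes_pvalue(node_pvalues):
    """Simes p-value: minimum over j of (m / j) * p_(j)."""
    m = len(node_pvalues)
    return min(m * p / rank for rank, p in enumerate(sorted(node_pvalues), start=1))


def hommel_adjusted(pvalues):
    """Closed testing with Simes tests, by enumerating the whole closure."""
    k = len(pvalues)
    adjusted = [0.0] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            p_node = simes_pvalue([pvalues[j] for j in subset])
            for j in subset:
                adjusted[j] = max(adjusted[j], p_node)
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]  # endpoints 1-4
global_simes = simes_pvalue(raw)
hommel = hommel_adjusted(raw)
print(round(global_simes, 4))
print(hommel)
```

For endpoints 1 and 2 the binding node is a three-endpoint intersection with Simes p-value 1.5 × 0.0191 = 0.02865, which appears in the transcript rounded as either 0.0286 or 0.0287.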
81
Adjusted P-values for Hommel's Method

Again, take the maximum p-value over all
hypotheses that imply the given one.
In the previous example, the Hommel adjusted
p-values are $p_{A(1)} = 0.0287$, $p_{A(2)} = 0.0287$,
$p_{A(3)} = 0.0382$, $p_{A(4)} = 0.1986$.
These adjusted p-values are never larger than
the Holm step-down adjusted p-values.

82
Adjusted P-values for Hommel's Method

They are maxima over relevant nodes
In example, Hommel adjusted p-values are
$p_{A(1)} = 0.0287$, $p_{A(2)} = 0.0287$, $p_{A(3)} = 0.0382$,
$p_{A(4)} = 0.1986$.
Hommel adjusted p-value ≤ Holm adjusted
p-value

83
Hochberg's Method

A conservative but simpler approximation to
Hommel's method
Hommel adjusted p-value
≤ Hochberg adjusted p-value
≤ Holm adjusted p-value

84
Hochberg's Shortcut Method

Let $H_{(1)}, \ldots, H_{(k)}$ be the hypotheses corresponding
to $p_{(1)} \le \cdots \le p_{(k)}$
If $p_{(k)} \le \alpha$, reject all $H_{(j)}$ and stop, else
retain $H_{(k)}$ and continue.
If $p_{(k-1)} \le \alpha/2$, reject $H_{(1)}, \ldots, H_{(k-1)}$ and stop,
else retain $H_{(k-1)}$ and continue.
...
If $p_{(1)} \le \alpha/k$, reject $H_{(1)}$
Adjusted p-values are $p_{A(j)} = \min_{j \le i} (k-i+1) p_{(i)}$.
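The step-up adjusted p-value formula translates directly into a running minimum taken from the largest p-value downward. A minimal sketch, using the four raw p-values from the multiple-endpoints example (the function name is illustrative):

```python
def hochberg_adjusted(pvalues):
    """Step-up adjustment: running minimum of (k - i + 1) * p_(i), largest p first."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda j: pvalues[j], reverse=True)
    adjusted = [0.0] * k
    running_min = 1.0
    for multiplier, j in enumerate(order, start=1):  # multiplier = k - i + 1
        running_min = min(running_min, multiplier * pvalues[j])
        adjusted[j] = running_min
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]  # endpoints 1-4
hochberg = hochberg_adjusted(raw)
print([round(p, 4) for p in hochberg])

alpha = 0.05
print([p <= alpha for p in hochberg])  # which endpoints are rejected at FWE .05
```

Rejecting every hypothesis whose adjusted p-value is at most α gives the same decisions as walking through the step-up rules by hand.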

85
Worksheet for Hochberg's Method
86
Comparison of Adjusted P-Values
p-Values
                  Stepdown
Test    Raw       Bonferroni    Hochberg    Hommel
1       0.0121    0.0484        0.0382      0.0286
2       0.0142    0.0484        0.0382      0.0286
3       0.1986    0.1986        0.1986      0.1986
4       0.0191    0.0484        0.0382      0.0382
87
Fisher Combination Test for Independent p-Values
Reject $H_{01} \cap H_{02} \cap \cdots \cap H_{0m}$ if $-2\sum \ln(p_i) > \chi^2(1-\alpha, 2m)$
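Because the reference distribution has an even number of degrees of freedom (2m), its upper tail probability has a closed form, so the Fisher combination p-value can be computed without a statistics library. The sketch below uses that identity; the function names are illustrative, and the test is only valid for independent p-values, as the slide title says.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combination_pvalue(pvalues):
    """p-value of the Fisher combination test for independent p-values."""
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_upper_tail_even_df(statistic, 2 * len(pvalues))


single = fisher_combination_pvalue([0.05])      # one p-value is returned unchanged
pair = fisher_combination_pvalue([0.05, 0.05])  # two moderate p-values reinforce each other
print(round(single, 4), round(pair, 4))
```

The joint null is rejected at level α when the combined p-value is at most α, which is equivalent to the chi-square comparison above.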
88
Example: Non-Overlapping Subgroup p-values
The Multtest Procedure
p-Values
                  Stepdown                              Fisher
Test    Raw       Bonferroni    Hochberg    Hommel      Combination
1       0.0784    0.3918        0.1550      0.1550      0.0784
2       0.0480    0.2883        0.1550      0.1441      0.0480
3       0.0041    0.0325        0.0305      0.0285      0.0053
4       0.0794    0.3918        0.1550      0.1550      0.0794
5       0.0044    0.0325        0.0305      0.0305      0.0056
6       0.0873    0.3918        0.1550      0.1550      0.0873
7       0.1007    0.3918        0.1550      0.1550      0.1007
8       0.1550    0.3918        0.1550      0.1550      0.1550
Non-overlapping is required by the independence
assumption.
89
Power Comparison
Liptak test stat: $T = \sum \Phi^{-1}(p_i) = \sum Z_i$
90
Concluding Notes

Closed testing more powerful than single-step
(α/m rather than α/k).
P-value based methods can be used whenever
p-values are valid
Dependence issues:
MinP (Holm): conservative
Simes (Hommel, Hochberg): less conservative,
rarely anti-conservative
Fisher combination, Liptak: require independence

91
Closed and Stepwise Testing Methods II: Fixed
Sequences and Gatekeepers

Methods Covered
Fixed Sequences (hierarchical endpoints, dose
response, non-inferiority/superiority)
Gatekeepers (primary and secondary analyses)
Multiple Gatekeepers (multiple endpoints and
multiple doses)
Intersection-Union tests (don't really belong in this section)
92
Fixed Sequence Tests

Pre-specify $H_1, H_2, \ldots, H_k$, and test in this
sequence, stopping as soon as you fail to reject.
No α-adjustment is necessary for individual
tests.
Applications:
Dose response: High vs. Control, then Mid vs.
Control, then Low vs. Control
Primary endpoint, then Secondary endpoint

93
Fixed Sequence as a Closed Procedure
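$H_{123}: \delta_1=\delta_2=\delta_3=0$   Rej if $p_1 \le .05$
$H_{12}: \delta_1=\delta_2=0$   Rej if $p_1 \le .05$
$H_{13}: \delta_1=\delta_3=0$   Rej if $p_1 \le .05$
$H_{23}: \delta_2=\delta_3=0$   Rej if $p_2 \le .05$
$H_1: \delta_1=0$   Rej if $p_1 \le .05$
$H_2: \delta_2=0$   Rej if $p_2 \le .05$
$H_3: \delta_3=0$   Rej if $p_3 \le .05$

Rej $H_1$ if $p_1 \le .05$
Rej $H_2$ if $p_1 \le .05$ and $p_2 \le .05$
Rej $H_3$ if $p_1 \le .05$ and $p_2 \le .05$ and $p_3 \le .05$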
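The three rejection rules amount to testing in the pre-specified order and stopping at the first non-significant result. A minimal sketch, with made-up p-values in the usage example:

```python
def fixed_sequence(pvalues_in_order, alpha=0.05):
    """Rejection decisions for hypotheses tested in a pre-specified order.

    Each hypothesis is tested at the full level alpha, but only while every
    earlier hypothesis in the sequence has been rejected.
    """
    decisions = []
    still_testing = True
    for p in pvalues_in_order:
        still_testing = still_testing and p <= alpha
        decisions.append(still_testing)
    return decisions


# Hypothetical p-values for H1, H2, H3 in the pre-specified order
all_rejected = fixed_sequence([0.01, 0.03, 0.04])
stopped_early = fixed_sequence([0.01, 0.20, 0.001])
print(all_rejected, stopped_early)
```

In the second call the third p-value is tiny, yet H3 is retained because the sequence stopped at H2. That is the price paid for testing every hypothesis at the unadjusted level.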

94
A Seemingly Reasonable But Incorrect Protocol

1. Test Dose 2 vs Pbo, and Dose 3 vs Pbo using
the Bonferroni method (0.025 level).
2. Test Dose 1 vs Pbo at the unadjusted 0.05
level only if at least one of the first two tests
is significant at the 0.025 level.

95
The problem: FWE can approach 0.075.
Moral: Caution needed when there are multiple
hypotheses at some point in the sequence.
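One scenario that produces an FWE near 0.075 for the protocol on the previous slide can be simulated. The scenario is an assumption made here, not taken from the slides: Dose 3 is so effective that its test always rejects, Dose 2 and Dose 1 have no effect, and the two null p-values are independent and uniform. In a real trial the tests share a placebo group and are correlated, so the exact figure would differ.

```python
import random


def flawed_protocol_fwe(reps=200_000, seed=7):
    """Estimated FWE of the two-step protocol under the assumed scenario."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(reps):
        p_dose2 = rng.random()  # true null: uniform p-value
        p_dose1 = rng.random()  # true null: uniform p-value
        p_dose3 = 0.0           # assumed overwhelmingly effective dose
        gate_open = p_dose2 <= 0.025 or p_dose3 <= 0.025
        false_rejection = p_dose2 <= 0.025 or (gate_open and p_dose1 <= 0.05)
        errors += false_rejection
    return errors / reps


simulated_fwe = flawed_protocol_fwe()
exact_if_independent = 1 - (1 - 0.025) * (1 - 0.05)
print(round(simulated_fwe, 4), round(exact_if_independent, 5))
```

Because the gate is always open in this scenario, Dose 1 is effectively tested at the full 0.05 level on top of the 0.025 spent on Dose 2. The Bonferroni bound on the total is 0.025 + 0.05 = 0.075.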
96
Correcting the Incorrect Protocol: Use Closure
Where $p_{ij} = 2\min(p_i, p_j)$
97
References Fixed Sequence and Gatekeeper Tests

Bauer, P. (1991). Multiple Testing in Clinical Trials. Statistics in Medicine, 10, 871-890.
O'Neill, R.T. (1997). Secondary endpoints cannot be validly analyzed if the primary endpoint does not demonstrate clear statistical significance. Controlled Clinical Trials, 18:550-556.
D'Agostino, R.B. (2000). Controlling alpha in clinical trials: the case for secondary endpoints. Statistics in Medicine, 19:763-766.
Chi, G.Y.H. (1998). Multiple testings: multiple comparisons and multiple endpoints. Drug Information Journal, 32:1347S-1362S.
Bauer, P., Röhmel, J., Maurer, W., Hothorn, L. (1998). Testing strategies in multi-dose experiments including active control. Statistics in Medicine, 17:2133-2146.
Westfall, P.H. and Krishen, A. (2001). Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference, 99, 25-40.
Chi, G. Clinical Benefits, Decision Rules, and Multiple Inferences. http://www.fda.gov/cder/Offices/Biostatistics/chi_1/sld001.htm
Dmitrienko, A., Offen, W. and Westfall, P. (2003). Gatekeeping strategies for clinical trials that do not require all effects to be significant. Stat Med., 22:2387-2400.
Chen, X., Luo, X., Capizzi, T. (2005). The application of enhanced parallel gatekeeping strategies. Stat Med., 24:1385-97.
Dmitrienko, A., Molenberghs, G., Chuang-Stein, C., and Offen, W. (2005). Analysis of Clinical Trials Using SAS: A Practical Guide. SAS Press.
Wiens, B. and Dmitrienko, A. (2005). The fallback procedure for evaluating a single family of hypotheses. J Biopharm Stat., 15(6):929-42.
Dmitrienko, A., Wiens, B. and Westfall, P. (2006). Fallback Tests in Dose Response Clinical Trials. J Biopharm Stat, 16, 745-755.

98
Intersection-Union (IU) Tests

Union-Intersection (UI): Nulls are intersections,
alternatives are unions.
$H_0: \delta_1=0$ and $\delta_2=0$ vs. $H_1: \delta_1 \ne 0$ or $\delta_2 \ne 0$
Intersection-Union (IU): Nulls are unions,
alternatives are intersections.
$H_0: \delta_1=0$ or $\delta_2=0$ vs. $H_1: \delta_1 \ne 0$ and $\delta_2 \ne 0$
IU is NOT a closed procedure. It is just a single
test of a different kind of null hypothesis.

99
Applications of I-U

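Bioequivalence: The TOST test
Test 1. $H_{01}: \delta \le -\delta_0$ vs. $H_{A1}: \delta > -\delta_0$
Test 2. $H_{02}: \delta \ge \delta_0$ vs. $H_{A2}: \delta < \delta_0$
Can test both at α = .05, but must reject both.
Combination Therapy
Test 1. $H_{01}: \mu_{12} \le \mu_1$ vs. $H_{A1}: \mu_{12} > \mu_1$
Test 2. $H_{02}: \mu_{12} \le \mu_2$ vs. $H_{A2}: \mu_{12} > \mu_2$
Can test both at α = .05, but must reject both.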

100
Control of Type I Error for IU tests
Suppose $\delta_1=0$ or $\delta_2=0$. Then
P(Type I error) = P(Reject $H_0$)   (1)
= P($p_1 \le .05$ and $p_2 \le .05$)   (2)
< min{P($p_1 \le .05$), P($p_2 \le .05$)}   (3)
$\le .05$.   (4)
Note: The inequality at (3) becomes an approximate
equality when $p_2$ is extremely noncentral.
101
Concluding Notes Fixed Sequences and
Gatekeepers

Many times, no adjustment is necessary at all!
Other times you can gain power by specifying
gatekeeping sequences
However, you must clearly state the method and
follow the rules
There are many incorrect "no adjustment"
methods:
use caution

102
Closed and Stepwise Testing Methods III: Methods
that Use Logical Constraints and Correlations
Methods                          Application
Lehmacher et al.                 Multiple endpoints
Westfall-Tobias-Shaffer-Royen    General contrasts
103
Lehmacher et al. Method

Use O'Brien test at each node (incorporates
correlations)
Do closed testing
Note: Possibly no adjustment whatsoever; possibly
big adjustment

104
Calculations for Lehmacher's Method
proc standard data=research.multend1 mean=0 std=1 out=stdzd;
  var Endpoint1-Endpoint4;
run;
data combine;
  set stdzd;
  H1234 = Endpoint1+Endpoint2+Endpoint3+Endpoint4;
  H123  = Endpoint1+Endpoint2+Endpoint3;
  H124  = Endpoint1+Endpoint2+Endpoint4;
  H134  = Endpoint1+Endpoint3+Endpoint4;
  H234  = Endpoint2+Endpoint3+Endpoint4;
  H12   = Endpoint1+Endpoint2;
  H13   = Endpoint1+Endpoint3;
  H14   = Endpoint1+Endpoint4;
  H23   = Endpoint2+Endpoint3;
  H24   = Endpoint2+Endpoint4;
  H34   = Endpoint3+Endpoint4;
  H1    = Endpoint1;
  H2    = Endpoint2;
  H3    = Endpoint3;
  H4    = Endpoint4;
run;
proc ttest;
  class treatment;
  var H1234 H123 H124 H134 H234
      H12 H13 H14 H23 H24 H34 H1 H2 H3 H4;
  ods output ttests=ttests;
run;
105
Output for Lehmacher's Method
Obs  Variable  Method  Variances  tValue   DF   Probt
  1  H1234     Pooled  Equal        2.69  109  0.0082
  3  H123      Pooled  Equal        2.59  109  0.0108
  5  H124      Pooled  Equal        3.03  109  0.0031
  7  H134      Pooled  Equal        2.36  109  0.0201
  9  H234      Pooled  Equal        2.51  109  0.0136
 11  H12       Pooled  Equal        3.03  109  0.0030
 13  H13       Pooled  Equal        2.12  109  0.0365
 15  H14       Pooled  Equal        2.68  109  0.0085
 17  H23       Pooled  Equal        2.22  109  0.0287
 19  H24       Pooled  Equal        2.88  109  0.0047
 21  H34       Pooled  Equal        2.03  109  0.0450
 23  H1        Pooled  Equal        2.55  109  0.0121
 25  H2        Pooled  Equal        2.49  109  0.0142
 27  H3        Pooled  Equal        1.29  109  0.1986
 29  H4        Pooled  Equal        2.38  109  0.0191
pA1 = max(0.0121, 0.0085, 0.0365, 0.0030, 0.0201, 0.0031, 0.0108, 0.0082) = 0.0365
pA2 = max(0.0142, 0.0047, 0.0287, 0.0030, 0.0136, 0.0031, 0.0108, 0.0082) = 0.0287
pA3 = max(0.1986, 0.0450, 0.0287, 0.0365, 0.0136, 0.0201, 0.0108, 0.0082) = 0.1986
pA4 = max(0.0191, 0.0450, 0.0047, 0.0085, 0.0136, 0.0201, 0.0031, 0.0082) = 0.0450
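
The closed-testing step that turns node p-values into adjusted p-values can be sketched in a few lines of Python. The node p-values below are copied from the PROC TTEST output above; the dictionary layout and the function name are illustrative choices, not part of the original SAS workflow.

```python
# Node p-values from the PROC TTEST output (key = endpoints in the node).
node_p = {
    (1, 2, 3, 4): 0.0082,
    (1, 2, 3): 0.0108, (1, 2, 4): 0.0031, (1, 3, 4): 0.0201, (2, 3, 4): 0.0136,
    (1, 2): 0.0030, (1, 3): 0.0365, (1, 4): 0.0085,
    (2, 3): 0.0287, (2, 4): 0.0047, (3, 4): 0.0450,
    (1,): 0.0121, (2,): 0.0142, (3,): 0.1986, (4,): 0.0191,
}

def closed_adjusted_p(node_p, endpoints):
    """Adjusted p-value of endpoint i = largest p-value among all
    closure-tree nodes whose intersection hypothesis includes H_i."""
    return {i: max(p for node, p in node_p.items() if i in node)
            for i in endpoints}

adj = closed_adjusted_p(node_p, endpoints=(1, 2, 3, 4))
print(adj)
```

Each endpoint belongs to 8 of the 15 nodes, which matches the eight terms inside each max() on the slide.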
106
Free and Restricted Combinations

If truth of some null hypotheses logically forces
other nulls to be true, the hypotheses are
restricted.
Examples
Multiple Endpoints, one test per endpoint - free
All Pairwise Comparisons - restricted

107
Pairwise Comparisons, 3 Groups
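$H_0: \mu_1 = \mu_2 = \mu_3$
$H_0: \mu_1 = \mu_3,\ \mu_2 = \mu_3$
$H_0: \mu_1 = \mu_2,\ \mu_1 = \mu_3$
$H_0: \mu_1 = \mu_2,\ \mu_2 = \mu_3$
$H_0: \mu_1 = \mu_2$
$H_0: \mu_1 = \mu_3$
$H_0: \mu_2 = \mu_3$
Note: The entire middle layer is not needed, because each
pair of equalities is equivalent to $\mu_1 = \mu_2 = \mu_3$.
Fisher's protected LSD is valid!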
108
Pairwise Comparisons, 4 Groups
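$\mu_1 = \mu_2 = \mu_3 = \mu_4$
$\mu_1 = \mu_3,\ \mu_2 = \mu_4$
$\mu_1 = \mu_4,\ \mu_2 = \mu_3$
$\mu_1 = \mu_2,\ \mu_3 = \mu_4$
$\mu_1 = \mu_2 = \mu_3$
$\mu_1 = \mu_2 = \mu_4$
$\mu_1 = \mu_3 = \mu_4$
$\mu_2 = \mu_3 = \mu_4$
$\mu_1 = \mu_2$
$\mu_1 = \mu_3$
$\mu_1 = \mu_4$
$\mu_2 = \mu_3$
$\mu_2 = \mu_4$
$\mu_3 = \mu_4$
Note: Logical implications imply that there are
only 14 nodes, not $2^6 - 1 = 63$ nodes. Also,
Fisher's protected LSD is not valid.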
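
The node counts can be verified by enumeration. The following Python sketch treats every distinct intersection of pairwise nulls as a partition of the groups into blocks of equal means; the function names are illustrative.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # or let `first` start a new block
        yield [[first]] + partition

def distinct_closure_nodes(n_groups):
    """Count distinct intersection hypotheses built from all pairwise
    nulls mu_i = mu_j among n_groups means."""
    groups = list(range(1, n_groups + 1))
    nodes = set()
    for partition in set_partitions(groups):
        # equalities implied by the partition: all pairs inside each block
        implied = frozenset(
            pair for block in partition for pair in combinations(sorted(block), 2)
        )
        if implied:  # skip the all-singletons partition (no equality asserted)
            nodes.add(implied)
    return len(nodes)

for k in (3, 4):
    n_pairs = k * (k - 1) // 2
    print(k, distinct_closure_nodes(k), 2 ** n_pairs - 1)
```

For three groups this gives 4 distinct nodes out of 7 subsets, and for four groups 14 out of 63, in agreement with the two slides.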
109
Restricted Combinations Multipliers
(Shaffer Method 1: Modified Holm)
Shaffer, J.P. (1986). Modified sequentially
rejective multiple test procedures. JASA 81,
826-831.
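
A minimal Python sketch of how such multipliers can be derived for all pairwise comparisons. It assumes the usual statement of Shaffer's Method 1: at step j, the Holm multiplier is replaced by the largest number of hypotheses that can be true simultaneously when at least j-1 are false. The function names and the example p-values are hypothetical.

```python
from math import comb

def integer_partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of block sizes."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield (part,) + rest

def possible_true_counts(n_groups):
    """Numbers of pairwise nulls that can be true at the same time."""
    return sorted({sum(comb(b, 2) for b in blocks)
                   for blocks in integer_partitions(n_groups)})

def shaffer_s1_multipliers(n_groups):
    m = comb(n_groups, 2)
    counts = possible_true_counts(n_groups)
    # Step j (1-based): j-1 hypotheses already rejected, so at most m-j+1 can be true.
    return [max(c for c in counts if c <= m - j + 1) for j in range(1, m + 1)]

def step_down_adjust(p_values, multipliers):
    """Step-down adjusted p-values with monotonicity enforced."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for step, idx in enumerate(order):
        running_max = max(running_max, min(1.0, multipliers[step] * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted

mult4 = shaffer_s1_multipliers(4)           # four groups, six pairwise tests
holm4 = list(range(6, 0, -1))               # ordinary Holm multipliers
raw = [0.004, 0.011, 0.020, 0.030, 0.200, 0.450]   # hypothetical p-values
print(mult4)
print(step_down_adjust(raw, mult4))
print(step_down_adjust(raw, holm4))
```

Because the restricted multipliers are never larger than Holm's, the resulting adjusted p-values are never larger either.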
110
Shaffer's (1) Adjusted p-values
111
Westfall/Tobias/Shaffer/Royen Method

Uses actual distribution of MinP instead of
conservative Bonferroni approximation
Closed testing incorporating logical constraints
Hard-coded in PROC GLIMMIX
Allows arbitrary linear functions

Westfall, P.H. and Tobias, R.D. (2007).
Multiple Testing of General Contrasts:
Truncated Closure and the Extended Shaffer-Royen
Method. Journal of the American Statistical
Association 102, 487-494.
112
Application of Truncated Closed MinP to Subgroup
Analysis

Compare Treatment with control as follows:
Overall
In the Older Patients subgroup
In the Younger Patients subgroup
In patients with better initial health subgroup
In patients with poorer initial health subgroup
In each of the four (old/young) x (better/poorer)
subgroups
9 tests overall (but better: 1 gatekeeper + 8
follow-up)

113
Analysis File
ods output estimates=estimates_logicaltests;
proc glimmix data=research.respiratory;
   class Treatment AgeGroup InitHealth;
   model score = Treatment AgeGroup InitHealth
                 Treatment*AgeGroup Treatment*InitHealth AgeGroup*InitHealth;
   estimate
      "Overall"        treatment 4 -4  treatment*Agegroup 2 2 -2 -2  treatment*InitHealth 2 2 -2 -2 (divisor=4),
      "Older"          treatment 2 -2  treatment*Agegroup 2 0 -2 0   treatment*InitHealth 1 1 -1 -1 (divisor=2),
      "Younger"        treatment 2 -2  treatment*Agegroup 0 2 0 -2   treatment*InitHealth 1 1 -1 -1 (divisor=2),
      "GoodInitHealth" treatment 2 -2  treatment*Agegroup 1 1 -1 -1  treatment*InitHealth 2 0 -2 0   (divisor=2),
      "PoorInitHealth" treatment 2 -2  treatment*Agegroup 1 1 -1 -1  treatment*InitHealth 0 2 0 -2   (divisor=2),
      "OldGood"        treatment 1 -1  treatment*Agegroup 1 0 -1 0   treatment*InitHealth 1 0 -1 0,
      "OldPoor"        treatment 1 -1  treatment*Agegroup 1 0 -1 0   treatment*InitHealth 0 1 0 -1,
      "YoungGood"      treatment 1 -1  treatment*Agegroup 0 1 0 -1   treatment*InitHealth 1 0 -1 0,
      "YoungPoor"      treatment 1 -1  treatment*Agegroup 0 1 0 -1   treatment*InitHealth 0 1 0 -1
      / adjust=simulate(nsamp=10000000 report seed=12321) upper stepdown(type=logical report);
run;
proc print data=estimates_logicaltests noobs;
   title "Subgroup Analysis Results Truncated Closure";
   var label estimate Stderr tvalue probt Adjp;
run;
114
Results: Truncated Closure
Subgroup Analysis Results

                                                        adjp_    adjp_
Label           Estimate  StdErr  tValue   Probt      logical  interval
Overall           0.7075  0.1956    3.62  0.0002       0.0011    0.0015
Older             0.9952  0.2673    3.72  0.0002       0.0011    0.0011
Younger           0.4197  0.2847    1.47  0.0717       0.1049    0.2605
GoodInitHealth    0.5871  0.2878    2.04  0.0219       0.0432    0.0984
PoorInitHealth    0.8279  0.2644    3.13  0.0011       0.0023    0.0068
OldGood           0.8748  0.3387    2.58  0.0056       0.0124    0.0295
OldPoor           1.1157  0.3231    3.45  0.0004       0.0011    0.0026
YoungGood         0.2993  0.3562    0.84  0.2014       0.2014    0.5494
YoungPoor         0.5401  0.3338    1.62  0.0544       0.1049    0.2091
The adjusted p-values for the stepdown tests are
mathematically no larger than those of the
simultaneous interval-based tests.
115
Example: Stepwise Pairwise vs. Control Testing

Teratology data set
Observations are litters
Response variable: litter weight
Treatments: 0, 5, 50, 500
Covariates: litter size, gestation time

116
Analysis File
proc glimmix data=research.litter;
   class dose;
   model weight = dose gesttime number;
   estimate "5 vs 0"   dose -1 1 0 0,
            "50 vs 0"  dose -1 0 1 0,
            "500 vs 0" dose -1 0 0 1
            / adjust=simulate(nsample=10000000 report)
              stepdown(type=logical);
run; quit;
117
Results
Estimates with Simulated Adjustment
                     Standard
Label      Estimate     Error  DF  t Value  Pr > |t|   Adj P
5 vs 0      -3.3524    1.2908  68    -2.60    0.0115  0.0316
50 vs 0     -2.2909    1.3384  68    -1.71    0.0915  0.0915
500 vs 0    -2.6752    1.3343  68    -2.00    0.0490  0.0907
Note: 50-0 and 500-0 are not significant at .10 with
regular Dunnett
118
Concluding Notes

More power is available when combinations are
restricted.
Power of closed tests can be improved using
correlation and other distributional
characteristics

119
Nonparametric Multiple Testing Methods

Overview: Use nonparametric tests at each node
of the closure tree
Bootstrap tests
Rank-based tests
Tests for binary data

120
Bootstrap MinP Test (Semi-Parametric Test)

The composite hypothesis $H_1 \cap H_2 \cap \cdots \cap H_k$ may be tested
using the p-value
$p = P(\mathrm{MinP} \le \mathrm{minp} \mid H_1 \cap H_2 \cap \cdots \cap H_k)$
Westfall and Young (1993) show
how to obtain p by bootstrapping the residuals
in a multivariate regression model.
how to obtain all p's in the closure tree
efficiently

121
Multivariate Regression Model (Next Five
slides are from Westfall and Young, 1993)
122
Hypotheses and Test Statistics
123
Joint Distribution of the Test Statistics
124
Testing Subset Intersection Hypotheses Using the
Extreme Pivotals
125
Exact Calculation of $p_K$
Bootstrap Approximation
126
Bootstrap Tests (PROC MULTTEST)
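$H_0: \delta_1=\delta_2=\delta_3=\delta_4=0$, min p = .0121, p = .0379
$H_0: \delta_1=\delta_3=\delta_4=0$, min p = .0121, p < .0379
$H_0: \delta_2=\delta_3=\delta_4=0$, min p = .0142, p = .0351
$H_0: \delta_1=\delta_2=\delta_3=0$, min p = .0121, p < .0379
$H_0: \delta_1=\delta_2=\delta_4=0$, min p = .0121, p < .0379
$H_0: \delta_1=\delta_2=0$, min p = .0121, p < .0379
$H_0: \delta_1=\delta_3=0$, min p = .0121, p < .0379
$H_0: \delta_3=\delta_4=0$, min p = .0191, p = .0355
$H_0: \delta_1=\delta_4=0$, min p = .0121, p < .0379
$H_0: \delta_2=\delta_3=0$, min p = .0142, p < .0351
$H_0: \delta_2=\delta_4=0$, min p = .0142, p < .0351
$H_0: \delta_4=0$, p = 0.0191, p < .0355
$H_0: \delta_1=0$, p = 0.0121, p < .0379
$H_0: \delta_2=0$, p = 0.0142, p < .0351
$H_0: \delta_3=0$, p = 0.1986, p = .1991
$p = P(\mathrm{MinP} \le \mathrm{minp} \mid H_0)$ (computed using
bootstrap resampling). (Recall, for Bonferroni, $p = k(\mathrm{MinP})$.)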
127
Permutation Tests for Composite Hypotheses $H_{0K}$
Joint p-value = proportion of the $n!/(n_T!\,n_C!)$
permutations for which $\min_{i \in K} P_i \le \min_{i \in K} p_i$.
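
The joint p-value just defined can be computed by complete enumeration when the sample is small. The Python sketch below assumes a one-sided test per endpoint based on the treatment-group sum; the eight subjects, three endpoints, and group labels are hypothetical, and exact fractions are used to avoid rounding issues in the comparisons.

```python
from fractions import Fraction
from itertools import combinations

def perm_min_p(data, treated):
    """Exact permutation MinP joint p-value for all endpoints in `data`.

    data    : list of per-subject tuples, one value per endpoint
    treated : set of subject indices in the treatment group
    Statistic per endpoint: treatment-group sum (large values favour HA).
    """
    n, k = len(data), len(data[0])
    assignments = [frozenset(c) for c in combinations(range(n), len(treated))]
    # statistic of every endpoint under every relabelling
    stats = {a: [sum(data[s][j] for s in a) for j in range(k)] for a in assignments}

    def p_values(a):
        # permutation p-value of each endpoint if `a` were the observed labelling
        return [Fraction(sum(stats[b][j] >= stats[a][j] for b in assignments),
                         len(assignments)) for j in range(k)]

    observed = p_values(frozenset(treated))
    min_obs = min(observed)
    hits = sum(min(p_values(a)) <= min_obs for a in assignments)
    return observed, Fraction(hits, len(assignments))

# Hypothetical data: 8 subjects, 3 endpoints; the first 4 subjects are treated.
data = [(5, 1, 3), (6, 2, 2), (7, 1, 4), (8, 3, 3),
        (2, 2, 3), (3, 1, 1), (4, 0, 2), (5, 1, 2)]
observed, joint = perm_min_p(data, treated={0, 1, 2, 3})
print([str(p) for p in observed], str(joint))
```

With 8 subjects and 4 treated there are 8!/(4!4!) = 70 relabellings, so the enumeration is instant; for realistic sample sizes the permutations would be sampled instead.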
128
Problem Simplification
Problem: There are $2^k - 1$ subsets K to be
tested. This might take a while...
Simplification: You need only test k of the $2^k - 1$
subsets! Why? Because $P(\min_{i \in K} P_i \le c) \le
P(\min_{i \in K'} P_i \le c)$ when $K \subset K'$. Significance
for most lower order subsets is determined by
significance of higher order subsets.
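
The simplification can be checked directly: adjusted p-values computed from all subsets agree with those computed from only k nested subsets. In the Python sketch below, the observed p-values are the four raw endpoint p-values shown earlier, while the matrix of resampled p-values is a uniform random stand-in (an assumption, not real resampling output) that is reused for every subset.

```python
import random
from itertools import combinations

def joint_p(perm_p, obs_p, subset):
    """Proportion of resamples with min_{i in subset} P_i <= min_{i in subset} p_i."""
    threshold = min(obs_p[i] for i in subset)
    hits = sum(min(row[i] for i in subset) <= threshold for row in perm_p)
    return hits / len(perm_p)

def full_closure(perm_p, obs_p):
    """Adjusted p-values using all 2^k - 1 subsets."""
    k = len(obs_p)
    adjusted = [0.0] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            p = joint_p(perm_p, obs_p, subset)
            for i in subset:
                adjusted[i] = max(adjusted[i], p)
    return adjusted

def step_down(perm_p, obs_p):
    """Adjusted p-values using only k nested subsets."""
    k = len(obs_p)
    order = sorted(range(k), key=lambda i: obs_p[i])   # most significant first
    adjusted = [0.0] * k
    running_max = 0.0
    for pos, i in enumerate(order):
        # subset = this hypothesis plus all less significant ones
        running_max = max(running_max, joint_p(perm_p, obs_p, order[pos:]))
        adjusted[i] = running_max
    return adjusted

rng = random.Random(12321)                         # arbitrary seed
obs_p = [0.0121, 0.0142, 0.1986, 0.0191]           # raw p-values of H1..H4
perm_p = [[rng.random() for _ in obs_p] for _ in range(2000)]  # stand-in resamples
closure_adj = full_closure(perm_p, obs_p)
shortcut_adj = step_down(perm_p, obs_p)
print(closure_adj == shortcut_adj, shortcut_adj)
```

The full version evaluates 15 subsets here and the shortcut only 4; the gap grows exponentially with the number of tests.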
129
MULTTEST PROCEDURE
Tests only the needed subsets (k, not $2^k - 1$).
Samples from the permutation distribution.
Only one sample is needed, not k
distinct samples, if the joint distribution of
minP is identical under $H_K$ and $H_S$. (Called
the "subset pivotality" condition by Westfall
and Young, 1993; valid under location shift and
other models.)
130
Great Savings are Possible with Exact Permutation
Tests!
Why? Suppose you test $H_{12 \ldots k}$ using MinP. The
joint p-value is
$p = P(\mathrm{MinP} \le \mathrm{minp})
\le P(P_1 \le \mathrm{minp}) + P(P_2 \le \mathrm{minp}) + \cdots + P(P_k \le \mathrm{minp})$
Many summands can be zero,
others much less than minp.
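
To see why summands vanish, consider one-sided Fisher exact tests on sparse binary endpoints: a test with very few events cannot produce a small p-value at all. The Python sketch below uses a hypothetical trial (50 patients per arm, five adverse events with made-up counts) and compares the sum of the summands with the ordinary Bonferroni bound.

```python
from fractions import Fraction
from math import comb

def upper_tail_p(x, n_t, n_c, events):
    """One-sided Fisher exact p-value P(X >= x) given the table margins."""
    total = comb(n_t + n_c, events)
    top = min(events, n_t)
    return Fraction(sum(comb(n_t, j) * comb(n_c, events - j)
                        for j in range(x, top + 1)), total)

def prob_p_at_most(c, n_t, n_c, events):
    """P(P_i <= c) under the hypergeometric (permutation) null distribution."""
    total = comb(n_t + n_c, events)
    low, top = max(0, events - n_c), min(events, n_t)
    return sum((Fraction(comb(n_t, x) * comb(n_c, events - x), total)
                for x in range(low, top + 1)
                if upper_tail_p(x, n_t, n_c, events) <= c), Fraction(0))

# Hypothetical trial: 50 patients per arm, five adverse events.
N_T = N_C = 50
event_totals = [12, 1, 2, 3, 20]     # patients with each event, both arms combined
treated_counts = [11, 1, 1, 2, 11]   # of those, patients in the treatment arm

raw = [upper_tail_p(x, N_T, N_C, e) for x, e in zip(treated_counts, event_totals)]
minp = min(raw)
summands = [prob_p_at_most(minp, N_T, N_C, e) for e in event_totals]
discrete_bound = min(Fraction(1), sum(summands))
bonferroni = min(Fraction(1), len(raw) * minp)
print([float(s) for s in summands])
print(float(discrete_bound), float(bonferroni))
```

The endpoints with only 1, 2, or 3 events contribute nothing to the sum, so the effective multiplier is far smaller than the number of tests.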
131
Multiple Binary Adverse Events

                               Stepdown    Stepdown
Variable  Contrast     Raw   Bonferroni  Permutation
ae1       t vs c    0.0008       0.0025       0.0020
ae2       t vs c    0.6955       1.0000       1.0000
ae3       t vs c    0.5000       1.0000       1.0000
ae4       t vs c    0.7525       1.0000       1.0000
ae5       t vs c    0.2213       1.0000       0.6274
ae6       t vs c    0.0601       0.3321       0.2608
ae7       t vs c    0.8165       1.0000       1.0000
ae8       t vs c    0.0293       0.1587       0.1328
ae9       t vs c    0.9399       1.0000       1.0000
ae10      t vs c    0.2484       1.0000       0.9273
ae11      t vs c    1.0000       1.0000       1.0000
ae12      t vs c    1.0000       1.0000       1.0000
ae13      t vs c    1.0000       1.0000       1.0000
ae14      t vs c    1.0000       1.0000       1.0000
ae15      t vs c    0.2484       1.0000       0.9273
ae16      t vs c    0.7516       1.0000       1.0000
ae17      t vs c    1.0000       1.0000       1.0000
ae18      t vs c    1.0000       1.0000       1.0000
ae19      t vs c    1.0000       1.0000       1.0000
ae20      t vs c    0.5000       1.0000       1.0000
ae21      t vs c    0.7516       1.0000       1.0000
ae22      t vs c    1.0000       1.0000       1.0000
ae23      t vs c    0.5000       1.0000       1.0000
ae24      t vs c    1.0000       1.0000       1.0000
ae25      t vs c    1.0000       1.0000       1.0000
ae26      t vs c    1.0000       1.0000       1.0000
ae27      t vs c    1.0000       1.0000       1.0000
ae28      t vs c    0.4344       1.0000       0.9400
132
Example: Genetic Associations
Phenotype: 0/1 (diseased or not). Sample $n_1$ from
diseased, $n_2$ from not diseased. Compare 100s of
genotype frequencies (using dominant and
recessive codings) for diseased and non-diseased
using multiple Fisher exact tests.
133
PROC MULTTEST Code
proc multtest data=research.gen stepperm n=20000
              out=pval hommel fdr;
   class y;
   test fisher(d1-d1
