A Course in Multiple Comparisons and Multiple Tests - PowerPoint PPT Presentation

Learn more at: https://askit.ttu.edu
1
A Course in Multiple Comparisons and Multiple
Tests
  • Peter H. Westfall, Ph.D.
  • Professor of Statistics, Department of Inf.
    Systems and Quant. Sci.
  • Texas Tech University

2
Learning Outcomes
  • Elucidate reasons that multiple comparisons
    procedures (MCPs) are used, as well as their
    controversial nature
  • Know when and how to use classical interval-based
    MCPs including Tukey, Dunnett, and Bonferroni
  • Understand how MCPs affect power
  • Elucidate the definition of closed testing
    procedures (CTPs)
  • Understand specific types of CTPs, benefits and
    drawbacks
  • Distinguish false discovery rate (FDR) from
    familywise error rate (FWE)
  • Understand general issues regarding Bayesian MCPs

3
Outline of Material
  • Introduction. Overview of problems, issues, and solutions; regulatory and ethical perspectives; families of tests; familywise error rate; Bonferroni. (pp. 5-21)
  • Interval-Based Multiple Inferences in the standard linear models framework. One-way ANOVA and ANCOVA; Tukey, Dunnett, and Monte Carlo methods; adjusted p-values; general contrasts; multivariate t distribution; tight confidence bands; treatment x covariate interaction; subgroup analysis. (pp. 22-55)
  • Power and Sample Size Determinations for multiple comparisons. (pp. 56-65)
  • Stepwise and Closed Testing Procedures I: P-Value-Based Methods. Closure method; global tests; Holm, Hommel, Hochberg, and Fisher combination methods for p-values. (pp. 66-90)
  • Stepwise and Closed Testing Procedures II: Fixed Sequences, Gatekeepers, and I-U Tests. Fixed sequence tests; gatekeeper procedures; multiple hypotheses in a gate; intersection-union tests with application to dose response, primary and secondary endpoints, bioequivalence, and combination therapies. (pp. 91-101)
4
Outline (Continued)
  • Stepwise and Closed Testing Procedures III: Methods that use logical constraints and correlations. Lehmacher et al. method for multiple endpoints; range-based and F-based ANOVA tests; Fisher's protected LSD; free and restricted combinations; Shaffer-type methods for dose comparisons and subgroup analysis. (pp. 102-118)
  • Multiple nonparametric and semiparametric tests: Bootstrap and permutation-based closed testing. PROC MULTTEST; examples with multiple endpoints, genetic associations, gene expression, binary data, and adverse events. (pp. 119-139)
  • More complex models and FWE control: Heteroscedasticity, repeated measures, and large sample methods. Applications: multiple treatment comparisons, crossover designs, logistic regression of cure rates. (pp. 140-152)
  • False Discovery Rate: Benjamini and Hochberg's method; comparison with FWE controlling methods. (pp. 153-158)
  • Bayesian methods: Simultaneous credible intervals; ranking probabilities and loss functions; PROC MIXED posterior sampling; Bayesian testing of multiple endpoints. (pp. 159-178)
  • Conclusion, discussion, references. (pp. 179-184)
5
Sources of Multiplicity
  • Multiple variables (endpoints)
  • Multiple timepoints
  • Subgroup analysis
  • Multiple comparisons
  • Multiple tests of the same hypothesis
  • Variable and Model selection
  • Interim analysis
  • Hidden Multiplicity: File Drawers, Outliers

6
The Problem
  • Significant results may fail to replicate.
  • Documented cases
  • Ioannidis (JAMA 2005)

7
An Example
  • Phase III clinical trial
  • Three arms: Placebo, AC, Drug
  • Endpoints: Signs and symptoms
  • Measured at weekly visits
  • Baseline covariates

8
Example-Continued
  • Features displayed at trial conclusion
  • Trends
  • Baseline adjusted comparisons of raw data
  • Baseline adjusted changes
  • Nonparametric and parametric tests
  • Specific endpoints and combinations of endpoints
  • Particular week results
  • AC and Placebo comparisons
  • Fact: The features that look the best are biased.

9
Example Continued: Feature Selection
  • Effect Size is a feature
  • Effect size = (mean difference)/sd
  • Dimensionless
  • .2 = small, .5 = medium, .8 = large
  • Estimated effect sizes $F_1, F_2, \ldots, F_k$
  • What if you select $\max(F_1, F_2, \ldots, F_k)$ and publish it?

10
The Scientific Concern
11
Feature Selection Model
  • Clinical Trials Simulation
  • Real data used
  • Conservative!
  • If you must know more
  • $F_j = \mu_j + \varepsilon_j$, $j = 1, \ldots, 20$.
  • Error terms $\varepsilon_j$ are $N(0, .2^2)$
  • True effect sizes $\mu_j$ are $N(.3, .1^2)$
  • Features Fj are highly correlated.
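
The selection effect in this model can be illustrated with a small simulation. The block below is only a sketch, not the course's own simulation: it draws true effect sizes from N(.3, .1^2) and errors from N(0, .2^2) as stated on the slide, but it assumes the errors are independent, whereas the slide says the features are highly correlated (correlation would shrink the bias). The function name and default arguments are hypothetical.

```python
import random


def simulate_selection_bias(n_features=20, n_trials=5000, mean_effect=0.3,
                            sd_effect=0.1, sd_error=0.2, seed=1):
    """Average estimated vs. true effect size of the best-looking feature.

    Assumption (not from the slides): errors are independent across features.
    """
    rng = random.Random(seed)
    sum_estimate = 0.0
    sum_truth = 0.0
    for _ in range(n_trials):
        # True effect sizes and their noisy estimates for one simulated trial.
        truths = [rng.gauss(mean_effect, sd_effect) for _ in range(n_features)]
        estimates = [m + rng.gauss(0.0, sd_error) for m in truths]
        # Select the feature with the largest estimated effect size.
        best = max(range(n_features), key=lambda j: estimates[j])
        sum_estimate += estimates[best]
        sum_truth += truths[best]
    return sum_estimate / n_trials, sum_truth / n_trials


selected_estimate, selected_truth = simulate_selection_bias()
print(f"mean estimate of selected feature: {selected_estimate:.3f}")
print(f"mean true effect of selected feature: {selected_truth:.3f}")
```

Under these assumptions the published (selected) estimate is, on average, well above the true effect size of the feature that was selected, which is the bias the example warns about.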

12
Key Points: (i) Multiplicity invites Selection; (ii) Selection has an EFFECT
  • Just like effects due to
  • Treatment
  • Confounding
  • Learning
  • Nonresponse
  • Placebo

13
Published Guidelines
  • ICH Guidelines
  • CPMP Points to consider
  • CDRH Statistical Guidance
  • ASA Ethical Guidelines

14
Regulatory/Journal/Ethical/Professional Concerns
  • Replicability (good science)
  • Fairness
  • Regulatory report
  • The drug company reported efficacy at p = .047.
    We repeated the analysis in several different
    ways that the company might have done. In 20
    re-analyses of the data, 18 produced p-values
    greater than .05. Only one of the 20 re-analyses
    produced a p-value smaller than .047.

15
Multiple Inferences Notation
  • There is a family of k inferences
  • Parameters are $\theta_1, \ldots, \theta_k$
  • Null hypotheses are
  • $H_{01}: \theta_1 = 0, \ldots, H_{0k}: \theta_k = 0$

16
Comparisonwise Error Rate (CER)
  • Intervals
  • $CER_j = P(\text{Interval}_j \text{ incorrect})$
  • Tests
  • $CER_j = P(\text{Reject } H_{0j} \mid H_{0j} \text{ is true})$
  • Usually $CER = \alpha = .05$

17
Familywise Error Rate (FWE)
Intervals: FWE = 1 - P(all intervals are correct)
Tests: FWE = P(reject at least one true null)
18
False Discovery Rate
  • FDR = E(proportion of rejections that are incorrect)
  • Let R = total number of rejections
  • Let V = number of erroneous rejections
  • FDR = E(V/R) (0/0 defined as 0).
  • FWE = P(V > 0)

19
Bonferroni Method
  • Identify Family of inferences
  • Identify number of elements (k) in the Family
  • Use $\alpha/k$ for all inferences.
  • Ex: With $k = 36$, p-values must be less than 0.05/36 = 0.0014 to be significant
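
As a quick check of the arithmetic, the per-test threshold and the equivalent Bonferroni-adjusted p-values can be computed directly. This is a minimal sketch; the function names are illustrative and not taken from the course materials.

```python
def bonferroni_threshold(alpha, k):
    """Per-inference significance level when the family has k elements."""
    return alpha / k


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: k * p, capped at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


threshold = bonferroni_threshold(0.05, 36)
print(round(threshold, 4))  # 0.0014, as in the example with k = 36
```

Comparing a raw p-value with the threshold is equivalent to comparing the adjusted p-value with the familywise level 0.05.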

20
FWE Control for Bonferroni
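  • FWE
  • $= P(p_{0j_1} \le .05/36 \text{ or } \ldots \text{ or } p_{0j_m} \le .05/36 \mid H_{0j_1}, \ldots, H_{0j_m} \text{ true})$
  • $\le P(p_{0j_1} \le .05/36) + \cdots + P(p_{0j_m} \le .05/36)$
  • $= (.05)m/36 \le .05$

[Figure: Venn diagram of two overlapping events A and B.] $P(A \cup B) \le P(A) + P(B)$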
21
Families in clinical trials [1]
Efficacy
  • Main Interest - Primary and Secondary: Approval and Labeling depend on these. Tight FWE control needed.
  • Lesser Interest - Depending on goals and reviewers, FWE controlling methods might be needed.
  • Supportive Tests - mostly descriptive; FWE control not needed.
  • Exploratory Tests - investigate new indications - future trials needed to confirm - do what makes sense.
Safety
  • Serious and known treatment-related AEs: FWE control not needed.
  • All other AEs: Reasonable to control FWE (or FDR).
[1] Westfall, P. and Bretz, F. (2003). Multiplicity in Clinical Trials. Encyclopedia of Biopharmaceutical Statistics, second edition, Shein-Chung Chow, ed., Marcel Dekker Inc., New York, pp. 666-673.
22
Classical Single-Step Testing and Interval
Methods to Control FWE
  • Simultaneous confidence intervals
  • Adjusted p-values
  • Dunnett method
  • Tukey's method
  • Simulation-based methods for general comparisons

23
Specificity and Sensitivity
If you want ... then use:
  • Estimates of effect sizes and error margins: Simultaneous Confidence Intervals
  • Confident inequalities: Stepwise or closed tests
  • Overall Test: F-test, O'Brien, etc.

24
The Model
  • $Y = X\beta + \varepsilon$
  • where $\varepsilon \sim N(0, \sigma^2 I)$
  • Includes ANOVA, ANCOVA, regression
  • For group comparisons, covariate adjustment
  • Not valid for survival analysis, binary data,
    multivariate data

25
Example: Pairwise Comparisons against Control
Goal: Estimate all mean differences from control and provide simultaneous 95% error margins
What $c_\alpha$ to use?
26
Comparison of Critical Values
27
Results - Dunnett
The GLM Procedure
Dunnett's t Tests for gain
NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.
  Alpha                             0.05
  Error Degrees of Freedom          21
  Error Mean Square                 210.0048
  Critical Value of Dunnett's t     2.78972
  Minimum Significant Difference    28.586
Comparisons significant at the 0.05 level are indicated by ***.
  g Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
  1 - 0           -9.48                     -38.07    19.11
  4 - 0          -13.50                     -42.09    15.09
  5 - 0          -20.70                     -49.29     7.89
  2 - 0          -24.90                     -53.49     3.69
  6 - 0          -31.14                     -59.73    -2.55   ***
  3 - 0          -33.24                     -61.83    -4.65   ***
28
$c_\alpha$ is the $1-\alpha$ quantile of the distribution of $\max_i |Z_i - Z_0| / (2\chi^2/df)^{1/2}$, called Dunnett's two-sided range distribution.
29
Adjusted p-Values
Definition: Adjusted p-value = smallest FWE at which the hypothesis is rejected,
or: the FWE for which the confidence interval has 0 as a boundary.
30
Adjusted p-values for Dunnett
proc glm data=tox;
  class g;
  model gain = g;
  lsmeans g / adjust=dunnett pdiff;
run;
31
Example: All Pairwise Comparisons
Goal: Estimate all mean differences and provide simultaneous 95% error margins
What $c_\alpha$ to use?
32
Comparison of Critical Values
33
Tukey Comparisons
  Alpha                                  0.05
  df                                     21
  MSE                                    210.0048
  Critical Value of Studentized Range    4.597
  Minimum Significant Difference         33.311
Means with the same letter are not significantly different.
  Tukey Grouping   Mean     N   G
  A                105.38   4   0
  A                 95.90   4   1
  A                 91.88   4   4
  A                 84.68   4   5
  A                 80.48   4   2
  A                 74.24   4   6
  A                 72.14   4   3
34
Tukey Adjusted p-Values
General Linear Models Procedure
Least Squares Means
Adjustment for multiple comparisons: Tukey
Pr > |T| H0: LSMEAN(i)=LSMEAN(j)
  G   GAIN LSMEAN   i/j   1        2        3        4        5        6        7
  0   105.380       1     .        0.9641   0.2351   0.0507   0.8364   0.4319   0.0769
  1    95.900       2     0.9641   .        0.7391   0.2810   0.9996   0.9227   0.3806
  2    80.480       3     0.2351   0.7391   .        0.9808   0.9172   0.9995   0.9958
  3    72.140       4     0.0507   0.2810   0.9808   .        0.4860   0.8771   1.0000
  4    91.880       5     0.8364   0.9996   0.9172   0.4860   .        0.9910   0.6102
  5    84.680       6     0.4319   0.9227   0.9995   0.8771   0.9910   .        0.9438
  6    74.240       7     0.0769   0.3806   0.9958   1.0000   0.6102   0.9438   .
35
Tukey Simultaneous Intervals
  i   j   Simultaneous Lower    Difference       Simultaneous Upper
          Confidence Limit      Between Means    Confidence Limit
  1   2   -23.831013             9.480000        42.791013
  1   3    -8.411013            24.900000        58.211013
  1   4    -0.071013            33.240000        66.551013
  1   5   -19.811013            13.500000        46.811013
  1   6   -12.611013            20.700000        54.011013
  1   7    -2.171013            31.140000        64.451013
  2   3   -17.891013            15.420000        48.731013
  2   4    -9.551013            23.760000        57.071013
  2   5   -29.291013             4.020000        37.331013
  2   6   -22.091013            11.220000        44.531013
  2   7   -11.651013            21.660000        54.971013
  3   4   -24.971013             8.340000        41.651013
  3   5   -44.711013           -11.400000        21.911013
  3   6   -37.511013            -4.200000        29.111013
  3   7   -27.071013             6.240000        39.551013
  4   5   -53.051013           -19.740000        13.571013
  4   6   -45.851013           -12.540000        20.771013
  4   7   -35.411013            -2.100000        31.211013
  5   6   -26.111013             7.200000        40.511013
  5   7   -15.671013            17.640000        50.951013
  6   7   -22.871013            10.440000        43.751013
36
$c_\alpha$ is $(1/\sqrt{2})$ times the $1-\alpha$ quantile of the distribution of $\max_{i,i'} |Z_i - Z_{i'}| / (\chi^2/df)^{1/2}$, which is called the Studentized range distribution.
37
Unbalanced Designs and/or Covariates
  • Tukey method is conservative when the design is unbalanced and/or there are covariates; otherwise exact
  • Dunnett method is conservative when there are covariates; otherwise exact
  • Conservative means
  • True FWE < Nominal FWE
  • also means less powerful

38
Tukey-Kramer Method for all pairwise comparisons
  • Let $c_\alpha$ be the critical value for the balanced case using Tukey's method and the correct df.
  • Intervals are $\bar{y}_i - \bar{y}_{i'} \pm c_\alpha \hat{\sigma} \sqrt{1/n_i + 1/n_{i'}}$
  • Conservative (Hayter, 1984 Annals)

39
Exact Method for General Comparisons of Means
40
Multivariate T-Distribution Details
41
Calculation of Exact ca
  • Edwards and Berry: Simple simulation
  • Hsu and Nelson: Factor analytic control variate (better)
  • Genz and Bretz: Integration using lattice methods (best)
  • Even with simple simulation, the value $c_\alpha$ can be obtained with reasonable precision.

Edwards, D., and Berry, J. (1987). The efficiency of simulation-based multiple comparisons. Biometrics, 43, 913-928.
Hsu, J.C. and Nelson, B.L. (1998). Multiple comparisons in the general linear model. Journal of Computational and Graphical Statistics, 7, 23-41.
Genz, A. and Bretz, F. (1999). Numerical Computation of Multivariate t Probabilities with Application to Power Calculation of Multiple Contrasts. J. Stat. Comp. Simul., 63, 361-378.
42
Example: ANCOVA with two covariates
  Y = Diastolic BP
  Group = Therapy (Control, D1, D2, D3)
  X1 = Baseline Diastolic BP
  X2 = Baseline Systolic BP
Goal: Compare all therapies, controlling for baseline

proc glm data=research.bpr;
  class therapy;
  model dbp10 = therapy dbp7 sbp7;
  lsmeans therapy / pdiff cl
          adjust=simulate(nsamp=10000000 cvadjust seed=121011 report);
run; quit;
43
Results From ANCOVA
  Source     DF   Type III SS    Mean Square    F Value   Pr > F
  THERAPY    3     677.429172     225.809724      6.05    0.0006
  DBP7       1    6832.878653    6832.878653    183.06    <.0001
  SBP7       1      51.123459      51.123459      1.37    0.2435

Least Squares Means for Effect THERAPY
  i   j   Difference Between Means   Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)
  1   2    2.832658                  -0.424816    6.09013
  1   3    1.328481                  -2.099566    4.756527
  1   4   -2.536262                  -5.981471    0.908947
  2   3   -1.504178                  -4.846403    1.838047
  2   4   -5.368920                  -8.734744   -2.003097
  3   4   -3.864743                  -7.398994   -0.330491
Note: 4 is control
44
Details for Quantile Simulation
  Random number seed          121011
  Comparison type             All
  Sample size                 9999938
  Target alpha                0.05
  Accuracy radius (target)    0.0002
  Accuracy radius (actual)    437E-7
  Accuracy confidence         99%

Simulation Results
  Method         95% Quantile   Estimated Alpha   99% Confidence Limits
  Simulated      2.594159       0.0500            0.0500   0.0500
  Tukey-Kramer   2.594637       0.0499            0.0499   0.0500
  Bonferroni     2.669484       0.0411            0.0410   0.0411
  Sidak          2.662029       0.0419            0.0418   0.0419
  GT-2           2.660647       0.0420            0.0420   0.0421
  Scheffe        2.823701       0.0270            0.0270   0.0270
  T              1.974017       0.2017            0.2016   0.2018
NOTE: PROCEDURE GLM used real time 21.23 seconds
45
Results from ANCOVA-Dunnett
  THERAPY    DBP10 LSMEAN   H0: LSMean = Control, Pr > |t|
  Dose 1     88.8171113     0.1407
  Dose 2     85.9844529     0.0002
  Dose 3     87.4886307     0.0140
  Placebo    91.3533732

Least Squares Means for Effect THERAPY
  i   j   Difference Between Means   Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)
  1   4   -2.536262                  -5.675847    0.603323
  2   4   -5.368920                  -8.436161   -2.301679
  3   4   -3.864743                  -7.085470   -0.644015
46
Details for Quantile Simulation-Dunnett
  Random number seed          121011
  Comparison type             Control, two-sided
  Sample size                 9999938
  Target alpha                0.05
  Accuracy radius (target)    0.0002
  Accuracy radius (actual)    139E-7
  Accuracy confidence         99%

Simulation Results
  Method                   95% Quantile   Estimated Alpha   99% Confidence Limits
  Simulated                2.364031       0.0500            0.0500   0.0500
  Dunnett-Hsu, two-sided   2.364084       0.0500            0.0500   0.0500
  Bonferroni               2.417902       0.0437            0.0437   0.0437
  Sidak                    2.411491       0.0444            0.0444   0.0444
  GT-2                     2.410664       0.0445            0.0445   0.0445
  Scheffe                  2.823701       0.0145            0.0145   0.0145
  T                        1.974017       0.1229            0.1229   0.1230
NOTE: PROCEDURE GLM used real time 19.00 seconds
47
More General Inferences
Question: For what values of the covariate is treatment A better than treatment B?
48
Discussion of (Treatment x Covariate) Interaction Example
49
The GLIMMIX Procedure
Computes MC-exact simultaneous confidence
intervals and adjusted p-values for any set
of linear functions in a linear model
50
GLIMMIX syntax
proc glimmix data=research.tire;
  class make;
  model cost = make mph make*mph;
  estimate "10" make 1 -1 make*mph 10 -10,
           "15" make 1 -1 make*mph 15 -15,
           "20" make 1 -1 make*mph 20 -20,
           "25" make 1 -1 make*mph 25 -25,
           "30" make 1 -1 make*mph 30 -30,
           "35" make 1 -1 make*mph 35 -35,
           "40" make 1 -1 make*mph 40 -40,
           "45" make 1 -1 make*mph 45 -45,
           "50" make 1 -1 make*mph 50 -50,
           "55" make 1 -1 make*mph 55 -55,
           "60" make 1 -1 make*mph 60 -60,
           "65" make 1 -1 make*mph 65 -65,
           "70" make 1 -1 make*mph 70 -70
           / adjust=simulate(nsamp=10000000 report) cl;
run;
51
Output from PROC GLIMMIX
Simultaneous intervals are Estimate +/- 2.648 x StdErr
  Label   Estimate   StdErr   tValue   AdjLower   AdjUpper
  10      -4.1067    0.9143   -4.49    -6.5279    -1.6854
  15      -3.4539    0.8084   -4.27    -5.5947    -1.3131
  20      -2.8011    0.7101   -3.94    -4.6815    -0.9207
  25      -2.1483    0.6230   -3.45    -3.7981    -0.4985
  30      -1.4956    0.5524   -2.71    -2.9585    -0.03260
  35      -0.8428    0.5054   -1.67    -2.1812     0.4956
  40      -0.1900    0.4887   -0.39    -1.4842     1.1042
  45       0.4628    0.5054    0.92    -0.8756     1.8012
  50       1.1156    0.5524    2.02    -0.3474     2.5785
  55       1.7683    0.6230    2.84     0.1185     3.4181
  60       2.4211    0.7101    3.41     0.5407     4.3015
  65       3.0739    0.8084    3.80     0.9331     5.2147
  70       3.7267    0.9143    4.08     1.3054     6.1479
Bonferroni critical value is $t_{16,\, .05/(2 \cdot 13)} = 3.377$.
52
Other Applications of Linear Combinations
  • Multiple Trend Tests
  • (0,1,2,3), (0,1,2,4), (0,4,6,7)
  • (carcinogenicity)
  • (0,0,1), (0,1,1), (0,1,2) (recessive/dominant/ordinal genotype effects)
  • Subgroup Analysis
  • Subgroups define linear combinations (more on
    next slide)

53
Subgroup Analysis Example
  • Data: $Y_{ijkl}$, where $i$ = Trt, Cntrl; $j$ = Old, Yng; $k$ = GoodInit, PoorInit.
  • Model: $Y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl}$, where $\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$
  • Subgroup Contrasts (coefficients on the cell means $\mu_{ijk}$)

  Contrast    μ111   μ112   μ121   μ122   μ211   μ212   μ221   μ222
  Overall     ¼      ¼      ¼      ¼      -¼     -¼     -¼     -¼
  Older       ½      ½      0      0      -½     -½     0      0
  Younger     0      0      ½      ½      0      0      -½     -½
  GoodInit    ½      0      ½      0      -½     0      -½     0
  PoorInit    0      ½      0      ½      0      -½     0      -½
  OldGood     1      0      0      0      -1     0      0      0
  OldPoor     0      1      0      0      0      -1     0      0
  YoungGood   0      0      1      0      0      0      -1     0
  YoungPoor   0      0      0      1      0      0      0      -1

54
Subgroup Analysis Results
  Label            Estimate   StdErr   tValue   Probt    Adjp     AdjLower   AdjUpper
  Overall          0.7075     0.1956   3.62     0.0002   0.0015    0.2460    Infty
  Older            0.9952     0.2673   3.72     0.0002   0.0011    0.3646    Infty
  Younger          0.4197     0.2847   1.47     0.0717   0.2605   -0.2521    Infty
  GoodInitHealth   0.5871     0.2878   2.04     0.0219   0.0984   -0.09197   Infty
  PoorInitHealth   0.8279     0.2644   3.13     0.0011   0.0068    0.2039    Infty
  OldGood          0.8748     0.3387   2.58     0.0056   0.0295    0.07566   Infty
  OldPoor          1.1157     0.3231   3.45     0.0004   0.0026    0.3532    Infty
  YoungGood        0.2993     0.3562   0.84     0.2014   0.5494   -0.5413    Infty
  YoungPoor        0.5401     0.3338   1.62     0.0544   0.2091   -0.2476    Infty
(SAS code available upon request)
55
Summary
  • Include only comparisons of interest.
  • Utilize correlations to be less conservative.
  • The critical values can be computed exactly only
    in balanced ANOVA for all pairwise comparisons,
    or in unbalanced ANOVA for comparisons with
    control.
  • Simulation-based methods are exact if you let
    the computer run for a while. This is my general
    recommendation.

56
Power Analysis
  • Sample size - Design of study
  • Power is less when you use multiple comparisons ⇒ larger sample sizes
  • Many power definitions
  • Bonferroni and independence are convenient (but conservative) starting points

57
Power Definitions
  • Complete Power = P(Reject all $H_{0i}$ that are false)
  • Minimal Power = P(Reject at least one $H_{0i}$ that is false)
  • Individual Power = P(Reject a particular $H_{0i}$ that is false)
  • Proportional Power = Average proportion of false $H_{0i}$ that are rejected
58
Power Calculations.
Example: H1 and H2 powered individually at 50%; H3 and H4 powered individually at 80%; all tests independent.
  • Complete Power = P(reject H1 and H2 and H3 and H4) = .5 x .5 x .8 x .8 = 0.16.
  • Minimal Power = P(reject H1 or H2 or H3 or H4) = 1 - P(accept H1 and H2 and H3 and H4) = 1 - (1-.5)(1-.5)(1-.8)(1-.8) = 0.99.
  • Individual Power = P(reject H3 (say)) = 0.80. (depends on the test)
  • Proportional Power = (.5 + .5 + .8 + .8)/4 = 0.65
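
The three summary quantities in this example can be reproduced in a few lines. This is a minimal sketch that assumes, as the example does, that the tests are independent and that every listed hypothesis is false; the function name is illustrative.

```python
from math import prod


def power_summaries(individual_powers):
    """Complete, minimal, and proportional power for independent tests."""
    complete = prod(individual_powers)                         # reject all
    minimal = 1.0 - prod(1.0 - p for p in individual_powers)   # reject at least one
    proportional = sum(individual_powers) / len(individual_powers)
    return complete, minimal, proportional


complete, minimal, proportional = power_summaries([0.5, 0.5, 0.8, 0.8])
print(round(complete, 2), round(minimal, 2), round(proportional, 2))
```

With dependent tests the products no longer apply, which is one reason simulation is suggested for more general power calculations.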
59
Sample Size for Adequate Individual Power -
Conservative Estimate
60
Individual power of two-tail two-sample
Bonferroni t-tests
%let MuDiff = 5;     /* Smallest meaningful difference MUx-MUy that you want to detect */
%let Sigma  = 10.0;  /* A guess of the population std. dev. */
%let alpha  = .05;   /* Familywise Type I error probability of the test */
%let k      = 4;     /* Number of tests */
options ls=76;
data power;
  cer = &alpha/&k;
  do n = 2 to 100 by 2;                    /* n = sample size for each group */
    df = n + n - 2;
    ncp = (&MuDiff)/(&Sigma*sqrt(2/n));    /* The noncentrality parameter */
    tcrit = tinv(1-cer/2, df);             /* The critical t value */
    power = 1 - probt(tcrit, df, ncp) + probt(-tcrit, df, ncp);
    output;
  end;
proc print data=power; run;
proc plot data=power;
  plot power*n / vpos=30;
run;
61
Graph of Power Function
Plot of power*n. [Figure: individual power (vertical axis, 0.0 to 1.0) against n (horizontal axis, 0 to 100); the curve is annotated "n = 92 for 80% power".]
62
IndividualPower macro
  • Uses PROBMC and PROBT (noncentral)
  • Assumes that you want to use the single-step (confidence interval based) Dunnett (one- or two-sided) or Range (two-sided) test
  • Less conservative than Bonferroni
  • Conservative compared to stepwise procedures
  • %IndividualPower(MCP=DUNNETT2, g=4, d=5, s=10)

Westfall et al (1999), Multiple Comparisons and
Multiple Tests Using SAS
63
IndividualPower Output
64
More general Power- Simulate!
Invocation:
  %SimPower(method    = dunnett,
            TrueMeans = (10, 10, 13, 15, 15),
            s         = 10,
            n         = 87,
            seed      = 12345)
Output:
  Method=DUNNETT, Nominal FWE=0.05, nrep=1000
  True means = (10, 10, 13, 15, 15), n=87, s=10
  Quantity             Estimate   ---95% CI----
  Complete Power       0.28800    (0.260,0.316)
  Minimal Power        0.92900    (0.913,0.945)
  Proportional Power   0.65133    (0.633,0.669)
  True FWE             0.01900    (0.011,0.027)
  Directional FWE      0.01900    (0.011,0.027)
65
Concluding Remarks - Power
  • Need a bigger n
  • Like to avoid bigger n (see sequential,
    gatekeepers methods later)
  • Which definition?
  • Bonferroni and independence useful
  • Simulation useful especially for the more
    complex methods that follow

66
Closed and Stepwise Testing Methods I: Standard P-Value Based Methods
If you want ... then use:
  • Estimates of effect sizes and error margins: Simultaneous Confidence Intervals
  • Confident inequalities: Stepwise or closed tests (Holm's Method, Hommel's Method, Hochberg's Method, Fisher Combination Method)
  • Overall Test: F-test, O'Brien, etc.

67
Closed Testing Method(s)
  • Form the closure of the family by including all
    intersection hypotheses.
  • Test every member of the closed family by a (suitable) $\alpha$-level test. (Here, $\alpha$ refers to comparison-wise error rate).
  • A hypothesis can be rejected provided that
  • its corresponding test is significant at level $\alpha$, and
  • every other hypothesis in the family that implies it is rejected by its $\alpha$-level test.

68
Closed Testing Multiple Endpoints
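$H_0: \delta_1 = \delta_2 = \delta_3 = \delta_4 = 0$
$H_0: \delta_1 = \delta_2 = \delta_3 = 0$
$H_0: \delta_1 = \delta_2 = \delta_4 = 0$
$H_0: \delta_1 = \delta_3 = \delta_4 = 0$
$H_0: \delta_2 = \delta_3 = \delta_4 = 0$
$H_0: \delta_1 = \delta_4 = 0$
$H_0: \delta_2 = \delta_4 = 0$
$H_0: \delta_1 = \delta_2 = 0$
$H_0: \delta_1 = \delta_3 = 0$
$H_0: \delta_2 = \delta_3 = 0$
$H_0: \delta_3 = \delta_4 = 0$
$H_0: \delta_1 = 0$, p = 0.0121
$H_0: \delta_2 = 0$, p = 0.0142
$H_0: \delta_3 = 0$, p = 0.1986
$H_0: \delta_4 = 0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control, endpoint j.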
69
Closed Testing Multiple Comparisons
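$\mu_1 = \mu_2 = \mu_3 = \mu_4$
$\mu_1 = \mu_3,\ \mu_2 = \mu_4$
$\mu_1 = \mu_4,\ \mu_2 = \mu_3$
$\mu_1 = \mu_2,\ \mu_3 = \mu_4$
$\mu_1 = \mu_2 = \mu_3$
$\mu_1 = \mu_2 = \mu_4$
$\mu_1 = \mu_3 = \mu_4$
$\mu_2 = \mu_3 = \mu_4$
$\mu_1 = \mu_2$
$\mu_1 = \mu_3$
$\mu_1 = \mu_4$
$\mu_2 = \mu_3$
$\mu_2 = \mu_4$
$\mu_3 = \mu_4$
Note: Logical implications imply that there are only 14 nodes, not $2^6 - 1 = 63$ nodes.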
70
Control of FWE with Closed Tests
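Suppose $H_{0j_1}, \ldots, H_{0j_m}$ all are true (unknown to you which ones).
{Reject at least one of $H_{0j_1}, \ldots, H_{0j_m}$ using CTP} ⇒ {Reject $H_{0j_1} \cap \ldots \cap H_{0j_m}$}
Thus,
P(reject at least one of $H_{0j_1}, \ldots, H_{0j_m}$ | $H_{0j_1}, \ldots, H_{0j_m}$ all are true)
$\le$ P(reject $H_{0j_1} \cap \ldots \cap H_{0j_m}$ | $H_{0j_1}, \ldots, H_{0j_m}$ all are true) $\le \alpha$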
71
Examples of Closed Testing Methods
When the Composite Test is ... Then the Closed Method is ...
  • Bonferroni MinP: Holm's Method
  • Resampling-Based MinP: Westfall-Young method
  • Simes: Hommel's method
  • O'Brien: Lehmacher's method
  • Simple or weighted test: Fixed sequence test (a-priori ordered)

72
P-value Based Methods
  • Test global hypotheses using p-value combination
    tests
  • Benefit: Fewer model assumptions; only need to say that the p-values are valid
  • Allows for models other than homoscedastic normal linear models (like survival analysis).

73
Holm's Method is Closed Testing Using the Bonferroni MinP Test
  • Reject $H_{0j_1} \cap H_{0j_2} \cap \ldots \cap H_{0j_m}$ if
  • $\min(p_{0j_1}, p_{0j_2}, \ldots, p_{0j_m}) \le \alpha/m$.
  • Or, reject $H_{0j_1} \cap H_{0j_2} \cap \ldots \cap H_{0j_m}$ if
  • $p = m \cdot \min(p_{0j_1}, p_{0j_2}, \ldots, p_{0j_m}) \le \alpha$.
  • (Note that p is a valid p-value for the joint null, comparable to p-value for Hotelling's $T^2$ test.)

74
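Holm's Stepdown Method
$H_0: \delta_1 = \delta_2 = \delta_3 = \delta_4 = 0$, minp = 0.0121, p = 0.0484
$H_0: \delta_1 = \delta_3 = \delta_4 = 0$, minp = 0.0121, p = 0.0363
$H_0: \delta_2 = \delta_3 = \delta_4 = 0$, minp = 0.0142, p = 0.0426
$H_0: \delta_1 = \delta_2 = \delta_3 = 0$, minp = 0.0121, p = 0.0363
$H_0: \delta_1 = \delta_2 = \delta_4 = 0$, minp = 0.0121, p = 0.0363
$H_0: \delta_1 = \delta_2 = 0$, minp = 0.0121, p = 0.0242
$H_0: \delta_1 = \delta_3 = 0$, minp = 0.0121, p = 0.0242
$H_0: \delta_1 = \delta_4 = 0$, minp = 0.0121, p = 0.0242
$H_0: \delta_2 = \delta_3 = 0$, minp = 0.0142, p = 0.0284
$H_0: \delta_2 = \delta_4 = 0$, minp = 0.0142, p = 0.0284
$H_0: \delta_3 = \delta_4 = 0$, minp = 0.0191, p = 0.0382
$H_0: \delta_1 = 0$, p = 0.0121
$H_0: \delta_2 = 0$, p = 0.0142
$H_0: \delta_3 = 0$, p = 0.1986
$H_0: \delta_4 = 0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control, endpoint j.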
75
Shortcut For Holm's Method
  • Let $H_{(1)}, \ldots, H_{(k)}$ be the hypotheses corresponding to $p_{(1)} \le \cdots \le p_{(k)}$
  • If $p_{(1)} \le \alpha/k$, reject $H_{(1)}$ and continue, else stop and retain all $H_{(1)}, \ldots, H_{(k)}$.
  • If $p_{(2)} \le \alpha/(k-1)$, reject $H_{(2)}$ and continue, else stop and retain all $H_{(2)}, \ldots, H_{(k)}$.
  • ...
  • If $p_{(k)} \le \alpha$, reject $H_{(k)}$

76
Adjusted p-values for Closed Tests
  • The adjusted p-value for H0j is the maximum of
    all p-values over all relevant nodes
  • In the previous example,
  • $p_{A(1)} = 0.0484$, $p_{A(2)} = 0.0484$, $p_{A(3)} = 0.0484$, $p_{A(4)} = 0.1986$.
  • General formula for Holm: $p_{A(j)} = \max_{i \le j} (k-i+1)\, p_{(i)}$.
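
The general formula can be checked against the four raw p-values of the running example (0.0121, 0.0142, 0.1986, 0.0191). The following is a minimal sketch of the step-down calculation; the function name is illustrative, and the cap at 1 is a common convention added here rather than something stated on the slide.

```python
def holm_adjusted(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    k = len(p_values)
    # Indices of the hypotheses from smallest to largest raw p-value.
    order = sorted(range(k), key=lambda j: p_values[j])
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, j in enumerate(order):
        # rank 0 is p(1), so the multiplier k - rank equals k - i + 1.
        candidate = min(1.0, (k - rank) * p_values[j])
        running_max = max(running_max, candidate)
        adjusted[j] = running_max
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]
print([round(p, 4) for p in holm_adjusted(raw)])
```

For these inputs the three smallest p-values all receive 0.0484 and the largest keeps 0.1986, in agreement with the maxima over the closure nodes.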

77
Worksheet For Holm's Method
78
Simes Test for Global Hypotheses
  • Uses all p-values $p_1, p_2, \ldots, p_m$, not just the MinP
  • Simes test rejects $H_{01} \cap H_{02} \cap \ldots \cap H_{0m}$ if
  • $p_{(j)} \le j\alpha/m$ for at least one $j$.
  • ⇒ p-value for the joint test is $p = \min_j\, (m/j)\, p_{(j)}$
  • Uniformly smaller p-value than $m \times$ MinP
  • Type I error at most $\alpha$ under independence or positive dependence of p-values
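
The Simes p-value for a joint null is a one-line calculation once the p-values are sorted. The sketch below uses the four raw p-values from the multiple-endpoints example; the function name is illustrative.

```python
def simes_p_value(p_values):
    """Simes p-value for the intersection of the given hypotheses."""
    m = len(p_values)
    ordered = sorted(p_values)
    # enumerate starts at 0, so the j-th smallest p-value has index j - 1.
    return min(m * p / (index + 1) for index, p in enumerate(ordered))


raw = [0.0121, 0.0142, 0.1986, 0.0191]
print(round(simes_p_value(raw), 4))   # Simes p-value for the global null
print(round(len(raw) * min(raw), 4))  # Bonferroni MinP p-value, for comparison
```

Here the minimum is attained at the third ordered p-value, (4/3)(0.0191), which is smaller than the Bonferroni MinP value 4(0.0121) = 0.0484.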

79
Rejection Regions
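[Figure: rejection regions in the unit square of $(p_1, p_2)$, with both axes marked at 0, $\alpha/2$, $\alpha$, and 1.]
$P(\text{Simes Reject}) = 1 - (1 - \alpha/2)^2 + (\alpha/2)^2 = \alpha$
$P(\text{Bonferroni Reject}) = 1 - (1 - \alpha/2)^2 = \alpha - (\alpha/2)^2$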
80
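Hommel's Method (Closed Simes)
$H_0: \delta_1 = \delta_2 = \delta_3 = \delta_4 = 0$, p = 0.0255
$H_0: \delta_1 = \delta_2 = \delta_3 = 0$, p = 0.0213
$H_0: \delta_1 = \delta_2 = \delta_4 = 0$, p = 0.0191
$H_0: \delta_1 = \delta_3 = \delta_4 = 0$, p = 0.0287
$H_0: \delta_2 = \delta_3 = \delta_4 = 0$, p = 0.0287
$H_0: \delta_1 = \delta_2 = 0$, p = 0.0142
$H_0: \delta_1 = \delta_3 = 0$, p = 0.0242
$H_0: \delta_1 = \delta_4 = 0$, p = 0.0191
$H_0: \delta_2 = \delta_3 = 0$, p = 0.0284
$H_0: \delta_2 = \delta_4 = 0$, p = 0.0191
$H_0: \delta_3 = \delta_4 = 0$, p = 0.0382
$H_0: \delta_1 = 0$, p = 0.0121
$H_0: \delta_2 = 0$, p = 0.0142
$H_0: \delta_3 = 0$, p = 0.1986
$H_0: \delta_4 = 0$, p = 0.0191
Where $\delta_j$ = mean difference, treatment - control, endpoint j.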
81
Adjusted P-values for Hommel's Method
  • Again, take the maximum p-value over all hypotheses that imply the given one.
  • In the previous example, the Hommel adjusted p-values are $p_{A(1)} = 0.0287$, $p_{A(2)} = 0.0287$, $p_{A(3)} = 0.0382$, $p_{A(4)} = 0.1986$.
  • These adjusted p-values are never larger than the Holm step-down adjusted p-values.
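
The "maximum over all hypotheses that imply the given one" rule can be sketched by brute-force enumeration of the closure, using the Simes test at every node. This is an illustrative sketch only: it enumerates all 2^k - 1 intersections, which is practical for small k, and it is not the shortcut algorithm used by statistical packages. The function names are hypothetical.

```python
from itertools import combinations


def simes_p_value(p_values):
    """Simes p-value for the intersection of the given hypotheses."""
    m = len(p_values)
    ordered = sorted(p_values)
    return min(m * p / (index + 1) for index, p in enumerate(ordered))


def closed_adjusted_p_values(p_values, local_test):
    """Adjusted p-value of H_j = max of node p-values over all subsets containing j."""
    k = len(p_values)
    adjusted = [0.0] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            node_p = local_test([p_values[j] for j in subset])
            for j in subset:
                adjusted[j] = max(adjusted[j], node_p)
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]
hommel = closed_adjusted_p_values(raw, simes_p_value)
print([round(p, 5) for p in hommel])
```

For the example, endpoints 1 and 2 get about 0.0287 (attained at the three-endpoint nodes containing endpoint 3), endpoint 3 keeps 0.1986, and endpoint 4 gets 0.0382. Passing a Bonferroni MinP function as `local_test` instead would give the Holm adjusted p-values.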

82
Adjusted P-values for Hommel's Method
  • They are maxima over relevant nodes
  • In example, Hommel adjusted p-values are $p_{A(1)} = 0.0287$, $p_{A(2)} = 0.0287$, $p_{A(3)} = 0.0382$, $p_{A(4)} = 0.1986$.
  • Hommel adjusted p-value $\le$ Holm adjusted p-value

83
Hochberg's Method
  • A conservative but simpler approximation to Hommel's method
  • Hommel adjusted p-value $\le$ Hochberg adjusted p-value $\le$ Holm adjusted p-value

84
Hochberg's Shortcut Method
  • Let $H_{(1)}, \ldots, H_{(k)}$ be the hypotheses corresponding to $p_{(1)} \le \cdots \le p_{(k)}$
  • If $p_{(k)} \le \alpha$, reject all $H_{(j)}$ and stop, else retain $H_{(k)}$ and continue.
  • If $p_{(k-1)} \le \alpha/2$, reject $H_{(1)}, \ldots, H_{(k-1)}$ and stop, else retain $H_{(k-1)}$ and continue.
  • ...
  • If $p_{(1)} \le \alpha/k$, reject $H_{(1)}$
  • Adjusted p-values are $p_{A(j)} = \min_{i \ge j} (k-i+1)\, p_{(i)}$.
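
The step-up adjusted p-value formula can be sketched as a running minimum taken from the largest p-value downward. The function name is illustrative, and the inputs are the four raw p-values of the running example.

```python
def hochberg_adjusted(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    k = len(p_values)
    # Indices of the hypotheses from largest to smallest raw p-value.
    order = sorted(range(k), key=lambda j: p_values[j], reverse=True)
    adjusted = [0.0] * k
    running_min = 1.0
    for position, j in enumerate(order):
        # position 0 is p(k), whose multiplier k - i + 1 equals 1.
        candidate = (position + 1) * p_values[j]
        running_min = min(running_min, candidate)
        adjusted[j] = running_min
    return adjusted


raw = [0.0121, 0.0142, 0.1986, 0.0191]
print([round(p, 4) for p in hochberg_adjusted(raw)])
```

For these inputs the three smallest p-values all receive 0.0382 and the largest keeps 0.1986, which sits between the Hommel and Holm results for the same data.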

85
Worksheet for Hochberg's Method
86
Comparison of Adjusted P-Values
p-Values
  Test   Raw      Stepdown Bonferroni   Hochberg   Hommel
  1      0.0121   0.0484                0.0382     0.0286
  2      0.0142   0.0484                0.0382     0.0286
  3      0.1986   0.1986                0.1986     0.1986
  4      0.0191   0.0484                0.0382     0.0382
87
Fisher Combination Test for Independent p-Values
Reject $H_{01} \cap H_{02} \cap \ldots \cap H_{0m}$ if $-2 \sum \ln(p_i) > \chi^2(1-\alpha,\, 2m)$
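
Because the reference distribution has an even number of degrees of freedom (2m), its upper-tail probability has a closed form, so the combined p-value can be computed without a statistics library. The sketch below relies on that closed form, $P(\chi^2_{2m} > x) = e^{-x/2} \sum_{i=0}^{m-1} (x/2)^i / i!$, and assumes independent p-values as the test requires. The function names are illustrative.

```python
from math import exp, log


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df exceeds x), using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half_x = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


def fisher_combination_p_value(p_values):
    """Fisher statistic -2 * sum(ln p_i) and its p-value on 2m degrees of freedom."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    return statistic, chi2_upper_tail_even_df(statistic, 2 * len(p_values))


statistic, combined_p = fisher_combination_p_value([0.05, 0.05])
print(round(statistic, 3), round(combined_p, 4))
```

With a single p-value the combined p-value equals the input, and two independent p-values of 0.05 combine to roughly 0.0175, showing how the test accumulates moderate evidence across independent tests.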
88
Example: Non-Overlapping Subgroup p-values
The Multtest Procedure
p-Values
  Test   Raw      Stepdown Bonferroni   Hochberg   Hommel   Fisher Combination
  1      0.0784   0.3918                0.1550     0.1550   0.0784
  2      0.0480   0.2883                0.1550     0.1441   0.0480
  3      0.0041   0.0325                0.0305     0.0285   0.0053
  4      0.0794   0.3918                0.1550     0.1550   0.0794
  5      0.0044   0.0325                0.0305     0.0305   0.0056
  6      0.0873   0.3918                0.1550     0.1550   0.0873
  7      0.1007   0.3918                0.1550     0.1550   0.1007
  8      0.1550   0.3918                0.1550     0.1550   0.1550
Non-overlapping is required by the independence assumption.
89
Power Comparison
Liptak test stat: $T = \sum \Phi^{-1}(p_i) = \sum Z_i$
90
Concluding Notes
  • Closed testing more powerful than single-step ($\alpha/m$ rather than $\alpha/k$).
  • P-value based methods can be used whenever
    p-values are valid
  • Dependence issues
  • MinP (Holm) conservative
  • Simes (Hommel, Hochberg) less conservative,
    rarely anti-conservative
  • Fisher combination, Liptak require independence

91
Closed and Stepwise Testing Methods II: Fixed Sequences and Gatekeepers
  • Methods Covered
  • Fixed Sequences (hierarchical endpoints, dose response, non-inferiority, superiority)
  • Gatekeepers (primary and secondary analyses)
  • Multiple Gatekeepers (multiple endpoints and multiple doses)
  • Intersection-Union tests

Doesn't really belong in this section
92
Fixed Sequence Tests
  • Pre-specify $H_1, H_2, \ldots, H_k$, and test in this sequence, stopping as soon as you fail to reject.
  • No $\alpha$-adjustment is necessary for individual tests.
  • Applications
  • Dose response: High vs. Control, then Mid vs. Control, then Low vs. Control
  • Primary endpoint, then Secondary endpoint

93
Fixed Sequence as a Closed Procedure
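$H_{123}: \delta_1 = \delta_2 = \delta_3 = 0$, Rej if $p_1 \le .05$
$H_{12}: \delta_1 = \delta_2 = 0$, Rej if $p_1 \le .05$
$H_{13}: \delta_1 = \delta_3 = 0$, Rej if $p_1 \le .05$
$H_{23}: \delta_2 = \delta_3 = 0$, Rej if $p_2 \le .05$
$H_1: \delta_1 = 0$, Rej if $p_1 \le .05$
$H_2: \delta_2 = 0$, Rej if $p_2 \le .05$
$H_3: \delta_3 = 0$, Rej if $p_3 \le .05$
  • Rej $H_1$ if $p_1 \le .05$
  • Rej $H_2$ if $p_1 \le .05$ and $p_2 \le .05$
  • Rej $H_3$ if $p_1 \le .05$ and $p_2 \le .05$ and $p_3 \le .05$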
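
The resulting decision rule is short enough to write out directly. This is a minimal sketch of a fixed-sequence test, assuming the p-values are supplied in the pre-specified order; the function name and the example p-values are illustrative.

```python
def fixed_sequence_test(p_values_in_order, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at the unadjusted level.

    Returns one True/False rejection decision per hypothesis. Testing stops
    at the first non-rejection, so every later hypothesis is retained.
    """
    decisions = []
    still_testing = True
    for p in p_values_in_order:
        reject = still_testing and p <= alpha
        decisions.append(reject)
        if not reject:
            still_testing = False
    return decisions


# Hypothetical p-values for H1, H2, H3 in the pre-specified order.
print(fixed_sequence_test([0.01, 0.20, 0.001]))
```

In the hypothetical call, H3 is retained even though its own p-value is tiny, because H2 was not rejected first. That loss is the price of needing no adjustment for the individual tests.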

94
A Seemingly Reasonable But Incorrect Protocol
  • 1. Test Dose 2 vs Pbo, and Dose 3 vs Pbo using
    the Bonferroni method (0.025 level).
  • 2. Test Dose 1 vs Pbo at the unadjusted 0.05
    level only if at least one of the first two tests
    is significant at the 0.025 level.

95
The problem: FWE can be as high as 0.075
Moral: Caution needed when there are multiple hypotheses at some point in the sequence.
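
One scenario that produces an error rate of this size can be simulated. The sketch below rests on assumptions not stated on the slide: Dose 3 is so effective that its test always passes the 0.025 gate, Dose 2 and Dose 1 have no effect, and their p-values are independent and uniform. Under independence the rate is 1 - (1 - 0.025)(1 - 0.05) = 0.07375, just under the bound 0.025 + 0.05 = 0.075. The function name is hypothetical.

```python
import random


def simulate_protocol_fwe(n_trials=200_000, seed=7):
    """Estimated FWE of the two-step protocol in one unfavorable scenario."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        p_dose3 = 0.0            # assumed: Dose 3 overwhelmingly effective
        p_dose2 = rng.random()   # true null, so p is uniform on (0, 1)
        p_dose1 = rng.random()   # true null, so p is uniform on (0, 1)
        reject_dose2 = p_dose2 <= 0.025
        gate_open = p_dose2 <= 0.025 or p_dose3 <= 0.025
        reject_dose1 = gate_open and p_dose1 <= 0.05
        # A familywise error occurs if either true null is rejected.
        if reject_dose2 or reject_dose1:
            errors += 1
    return errors / n_trials


print(round(simulate_protocol_fwe(), 4))
```

The gate is opened by a true effect, yet the two true nulls are then tested at levels that add up to more than 0.05.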
96
Correcting the Incorrect Protocol: Use Closure
Where $p_{ij} = 2\min(p_i, p_j)$
97
References: Fixed Sequence and Gatekeeper Tests
  1. Bauer, P. (1991). Multiple Testing in Clinical Trials. Statistics in Medicine, 10, 871-890.
  2. O'Neill, R.T. (1997). Secondary endpoints cannot be validly analyzed if the primary endpoint does not demonstrate clear statistical significance. Controlled Clinical Trials, 18, 550-556.
  3. D'Agostino, R.B. (2000). Controlling alpha in clinical trials: the case for secondary endpoints. Statistics in Medicine, 19, 763-766.
  4. Chi, G.Y.H. (1998). Multiple testings: multiple comparisons and multiple endpoints. Drug Information Journal, 32, 1347S-1362S.
  5. Bauer, P., Röhmel, J., Maurer, W., Hothorn, L. (1998). Testing strategies in multi-dose experiments including active control. Statistics in Medicine, 17, 2133-2146.
  6. Westfall, P.H. and Krishen, A. (2001). Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference, 99, 25-40.
  7. Chi, G. Clinical Benefits, Decision Rules, and Multiple Inferences. http://www.fda.gov/cder/Offices/Biostatistics/chi_1/sld001.htm
  8. Dmitrienko, A., Offen, W. and Westfall, P. (2003). Gatekeeping strategies for clinical trials that do not require all effects to be significant. Stat Med., 22, 2387-2400.
  9. Chen, X., Luo, X., Capizzi, T. (2005). The application of enhanced parallel gatekeeping strategies. Stat Med., 24, 1385-97.
  10. Dmitrienko, A., Molenberghs, G., Chuang-Stein, C., and Offen, W. (2005). Analysis of Clinical Trials Using SAS: A Practical Guide. SAS Press.
  11. Wiens, B. and Dmitrienko, A. (2005). The fallback procedure for evaluating a single family of hypotheses. J Biopharm Stat., 15(6), 929-42.
  12. Dmitrienko, A., Wiens, B. and Westfall, P. (2006). Fallback Tests in Dose Response Clinical Trials. J Biopharm Stat., 16, 745-755.

98
Intersection-Union (IU) Tests
  • Union-Intersection (UI): Nulls are intersections, alternatives are unions.
  • $H_0: \delta_1 = 0$ and $\delta_2 = 0$ vs. $H_1: \delta_1 \ne 0$ or $\delta_2 \ne 0$
  • Intersection-Union (IU): Nulls are unions, alternatives are intersections
  • $H_0: \delta_1 = 0$ or $\delta_2 = 0$ vs. $H_1: \delta_1 \ne 0$ and $\delta_2 \ne 0$
  • IU is NOT a closed procedure. It is just a single
    test of a different kind of null hypothesis.

99
Applications of I-U
  • Bioequivalence: The TOST test
  • Test 1. H01: δ ≤ -δ0 vs. HA1: δ > -δ0
  • Test 2. H02: δ ≥ δ0 vs. HA2: δ < δ0
  • Can test both at α = .05, but must reject both.
  • Combination Therapy
  • Test 1. H01: μ12 ≤ μ1 vs. HA1: μ12 > μ1
  • Test 2. H02: μ12 ≤ μ2 vs. HA2: μ12 > μ2
  • Can test both at α = .05, but must reject both.
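The "must reject both" rule can be sketched in a few lines of Python. This is only an illustration, not the course's SAS code: the function names are made up for the example, and the two one-sided tests use a normal (z) approximation with a known standard error, which is an assumption rather than anything the slides specify.

```python
from statistics import NormalDist


def iu_p_value(p_values):
    """Intersection-union rule: every component test must reject,
    so the overall p-value is the largest component p-value."""
    return max(p_values)


def tost_p_value(estimate, std_error, margin):
    """Two one-sided z-tests (TOST) for equivalence within (-margin, +margin).

    Assumes the estimate is approximately normal with known standard error.
    """
    z = NormalDist()
    # Test 1: H01 delta <= -margin vs. HA1 delta > -margin (upper-tailed)
    p_lower = 1.0 - z.cdf((estimate + margin) / std_error)
    # Test 2: H02 delta >= +margin vs. HA2 delta < +margin (lower-tailed)
    p_upper = z.cdf((estimate - margin) / std_error)
    return iu_p_value([p_lower, p_upper])


# Hypothetical numbers: estimated difference 0.5, standard error 1.0, margin 3.0
p_equivalence = tost_p_value(estimate=0.5, std_error=1.0, margin=3.0)
print(f"TOST p-value: {p_equivalence:.4f}")
```

The same `iu_p_value` helper applies to the combination-therapy case: pass it the two one-sided p-values for the combination versus each single component.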

100
Control of Type I Error for IU tests
Suppose δ1 = 0 or δ2 = 0. Then
P(Type I error) = P(Reject H0)                    (1)
                = P(p1 ≤ .05 and p2 ≤ .05)        (2)
                ≤ min{P(p1 ≤ .05), P(p2 ≤ .05)}   (3)
                ≤ .05.                            (4)
Note: The inequality at (3) becomes an approximate
equality when p2 is extremely noncentral.
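A small numerical check of this bound, written as a Python sketch. It assumes two independent one-sided z-tests with unit variance, which is a simplification chosen for the illustration (the bound itself does not need independence); `iu_rejection_probability` is a hypothetical helper name.

```python
from statistics import NormalDist


def iu_rejection_probability(shift1, shift2, alpha=0.05):
    """P(both one-sided z-tests reject) when Z_i ~ N(shift_i, 1) independently.

    shift_i = 0 means the i-th component null is true (on its boundary).
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha)          # one-sided critical value
    power1 = 1.0 - z.cdf(crit - shift1)    # P(test 1 rejects)
    power2 = 1.0 - z.cdf(crit - shift2)    # P(test 2 rejects)
    return power1 * power2                 # independence assumption


# delta1 = 0 throughout, so every rejection is a Type I error for the IU null.
for shift2 in (0.0, 2.0, 4.0, 8.0):
    print(shift2, iu_rejection_probability(0.0, shift2))
```

As the second statistic becomes more noncentral, the rejection probability climbs toward .05 but does not exceed it, which is the approximate equality mentioned in the note.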
101
Concluding Notes: Fixed Sequences and
Gatekeepers
  • Many times, no adjustment is necessary at all!
  • Other times you can gain power by specifying
    gatekeeping sequences
  • However, you must clearly state the method and
    follow the rules
  • There are many incorrect "no adjustment"
    methods - use caution

102
Closed and Stepwise Testing Methods III: Methods
that Use Logical Constraints and Correlations
Methods                          Application
Lehmacher et al.                 Multiple endpoints
Westfall-Tobias-Shaffer-Royen    General contrasts
103
Lehmacher et al. Method
  • Use O'Brien test at each node (incorporates
    correlations)
  • Do closed testing
  • Note: Possibly no adjustment whatsoever; possibly
    big adjustment

104
Calculations for Lehmacher's Method
/* Standardize each endpoint to mean 0, std 1 */
proc standard data=research.multend1 mean=0 std=1 out=stdzd;
  var Endpoint1-Endpoint4;
run;

/* One O'Brien-type sum score per node of the closure tree */
data combine;
  set stdzd;
  H1234 = Endpoint1+Endpoint2+Endpoint3+Endpoint4;
  H123  = Endpoint1+Endpoint2+Endpoint3;
  H124  = Endpoint1+Endpoint2+Endpoint4;
  H134  = Endpoint1+Endpoint3+Endpoint4;
  H234  = Endpoint2+Endpoint3+Endpoint4;
  H12   = Endpoint1+Endpoint2;
  H13   = Endpoint1+Endpoint3;
  H14   = Endpoint1+Endpoint4;
  H23   = Endpoint2+Endpoint3;
  H24   = Endpoint2+Endpoint4;
  H34   = Endpoint3+Endpoint4;
  H1    = Endpoint1;
  H2    = Endpoint2;
  H3    = Endpoint3;
  H4    = Endpoint4;
run;

/* Two-sample t-test of treatment effect at every node */
proc ttest;
  class treatment;
  var H1234 H123 H124 H134 H234
      H12 H13 H14 H23 H24 H34 H1 H2 H3 H4;
  ods output ttests=ttests;
run;
105
Output For Lehmacher's Method
Obs  Variable  Method  Variances  tValue   DF   Probt
  1  H1234     Pooled  Equal        2.69  109  0.0082
  3  H123      Pooled  Equal        2.59  109  0.0108
  5  H124      Pooled  Equal        3.03  109  0.0031
  7  H134      Pooled  Equal        2.36  109  0.0201
  9  H234      Pooled  Equal        2.51  109  0.0136
 11  H12       Pooled  Equal        3.03  109  0.0030
 13  H13       Pooled  Equal        2.12  109  0.0365
 15  H14       Pooled  Equal        2.68  109  0.0085
 17  H23       Pooled  Equal        2.22  109  0.0287
 19  H24       Pooled  Equal        2.88  109  0.0047
 21  H34       Pooled  Equal        2.03  109  0.0450
 23  H1        Pooled  Equal        2.55  109  0.0121
 25  H2        Pooled  Equal        2.49  109  0.0142
 27  H3        Pooled  Equal        1.29  109  0.1986
 29  H4        Pooled  Equal        2.38  109  0.0191

pA1 = max(0.0121, 0.0085, 0.0365, 0.0030, 0.0201,
          0.0031, 0.0108, 0.0082) = 0.0365
pA2 = max(0.0142, 0.0047, 0.0287, 0.0030, 0.0136,
          0.0031, 0.0108, 0.0082) = 0.0287
pA3 = max(0.1986, 0.0450, 0.0287, 0.0365, 0.0136,
          0.0201, 0.0108, 0.0082) = 0.1986
pA4 = max(0.0191, 0.0450, 0.0047, 0.0085, 0.0136,
          0.0201, 0.0031, 0.0082) = 0.0450
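The max calculations above follow the closure rule: the adjusted p-value for an endpoint is the largest p-value among all nodes that contain that endpoint. A short Python sketch of that bookkeeping, using the node p-values from the output on this slide (the dictionary layout and function name are just one convenient choice for the illustration):

```python
# Node p-values from the PROC TTEST output; keys list the endpoints in each node.
node_p = {
    "1234": 0.0082,
    "123": 0.0108, "124": 0.0031, "134": 0.0201, "234": 0.0136,
    "12": 0.0030, "13": 0.0365, "14": 0.0085,
    "23": 0.0287, "24": 0.0047, "34": 0.0450,
    "1": 0.0121, "2": 0.0142, "3": 0.1986, "4": 0.0191,
}


def closed_adjusted_p(node_p, endpoint):
    """Closed-testing adjusted p-value: the maximum p-value over every
    intersection hypothesis (node) that includes the given endpoint."""
    return max(p for node, p in node_p.items() if endpoint in node)


adjusted = {endpoint: closed_adjusted_p(node_p, endpoint) for endpoint in "1234"}
for endpoint, p in adjusted.items():
    print(f"pA{endpoint} = {p:.4f}")
```

Each endpoint belongs to 8 of the 15 nodes, which is why every max above has eight arguments.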
106
Free and Restricted Combinations
  • If truth of some null hypotheses logically forces
    other nulls to be true, the hypotheses are
    restricted.
  • Examples:
  • Multiple Endpoints, one test per endpoint - free
  • All Pairwise Comparisons - restricted

107
Pairwise Comparisons, 3 Groups
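H0: μ1 = μ2 = μ3
H0: μ1 = μ3, μ2 = μ3
H0: μ1 = μ2, μ1 = μ3
H0: μ1 = μ2, μ2 = μ3
H0: μ1 = μ2
H0: μ1 = μ3
H0: μ2 = μ3
Note: The entire middle layer is not needed, because each
of its hypotheses is equivalent to μ1 = μ2 = μ3.
Fisher's protected LSD is valid!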
108
Pairwise Comparisons, 4 Groups
μ1 = μ2 = μ3 = μ4
μ1 = μ3, μ2 = μ4
μ1 = μ4, μ2 = μ3
μ1 = μ2, μ3 = μ4
μ1 = μ2 = μ3
μ1 = μ2 = μ4
μ1 = μ3 = μ4
μ2 = μ3 = μ4
μ1 = μ2
μ1 = μ3
μ1 = μ4
μ2 = μ3
μ2 = μ4
μ3 = μ4
Note: Logical implications imply that there are
only 14 nodes, not 2^6 - 1 = 63 nodes. Also,
Fisher's protected LSD is not valid.
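The node counts can be checked by brute force. The Python sketch below (function name chosen for the illustration) forms every non-empty subset of the pairwise equalities, merges the groups that each subset forces to be equal, and counts how many distinct intersection hypotheses remain.

```python
from itertools import combinations


def distinct_nodes(groups):
    """Return (number of pairwise hypotheses, number of distinct
    intersection hypotheses) for all pairwise comparisons of `groups` means."""
    pairs = list(combinations(range(groups), 2))
    nodes = set()
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            # label[i] identifies the equality class of group i
            label = list(range(groups))
            for a, b in subset:
                old, new = label[b], label[a]
                label = [new if x == old else x for x in label]
            # Canonical form: the partition of groups into equal-mean blocks
            partition = frozenset(
                frozenset(i for i in range(groups) if label[i] == lab)
                for lab in set(label)
            )
            nodes.add(partition)
    return len(pairs), len(nodes)


for g in (3, 4):
    n_pairs, n_nodes = distinct_nodes(g)
    print(f"{g} groups: {2 ** n_pairs - 1} subsets, {n_nodes} distinct nodes")
```

With 3 groups the 7 subsets collapse to 4 nodes (the middle layer coincides with the global null); with 4 groups the 63 subsets collapse to 14.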
109
Restricted Combinations Multipliers
(Shaffer Method 1: Modified Holm)
Shaffer, J.P. (1986). Modified sequentially
rejective multiple test procedures. JASA 81,
826-831.
110
Shaffer's (1) Adjusted p-values
111
Westfall/Tobias/Shaffer/Royen Method
  • Uses actual distribution of MinP instead of
    conservative Bonferroni approximation
  • Closed testing incorporating logical constraints
  • Hard-coded in PROC GLIMMIX
  • Allows arbitrary linear functions

Westfall, P.H. and Tobias, R.D. (2007).
Multiple Testing of General Contrasts:
Truncated Closure and the Extended Shaffer-Royen
Method, Journal of the American Statistical
Association 102: 487-494.
112
Application of Truncated Closed MinP to Subgroup
Analysis
  • Compare Treatment with control as follows:
  • Overall
  • In the Older Patients subgroup
  • In the Younger Patients subgroup
  • In patients with better initial health subgroup
  • In patients with poorer initial health subgroup
  • In each of the four (old/young) x (better/poorer)
    subgroups
  • 9 tests overall (but better: 1 gatekeeper + 8
    follow-up)

113
Analysis File
ods output estimates=estimates_logicaltests;
proc glimmix data=research.respiratory;
  class Treatment AgeGroup InitHealth;
  model score = Treatment AgeGroup InitHealth Treatment*AgeGroup
                Treatment*InitHealth AgeGroup*InitHealth;
  /* One row per treatment-vs-control comparison */
  estimate
    "Overall"        treatment 4 -4 treatment*Agegroup 2 2 -2 -2 treatment*InitHealth 2 2 -2 -2 (divisor=4),
    "Older"          treatment 2 -2 treatment*Agegroup 2 0 -2 0  treatment*InitHealth 1 1 -1 -1 (divisor=2),
    "Younger"        treatment 2 -2 treatment*Agegroup 0 2 0 -2  treatment*InitHealth 1 1 -1 -1 (divisor=2),
    "GoodInitHealth" treatment 2 -2 treatment*Agegroup 1 1 -1 -1 treatment*InitHealth 2 0 -2 0  (divisor=2),
    "PoorInitHealth" treatment 2 -2 treatment*Agegroup 1 1 -1 -1 treatment*InitHealth 0 2 0 -2  (divisor=2),
    "OldGood"        treatment 1 -1 treatment*Agegroup 1 0 -1 0  treatment*InitHealth 1 0 -1 0,
    "OldPoor"        treatment 1 -1 treatment*Agegroup 1 0 -1 0  treatment*InitHealth 0 1 0 -1,
    "YoungGood"      treatment 1 -1 treatment*Agegroup 0 1 0 -1  treatment*InitHealth 1 0 -1 0,
    "YoungPoor"      treatment 1 -1 treatment*Agegroup 0 1 0 -1  treatment*InitHealth 0 1 0 -1
    / adjust=simulate(nsamp=10000000 report seed=12321)
      upper stepdown(type=logical report);
run;

proc print data=estimates_logicaltests noobs;
  title "Subgroup Analysis Results: Truncated Closure";
  var label estimate Stderr tvalue probt Adjp;
run;
114
Results: Truncated Closure
Subgroup Analysis Results

                                                     adjp_    adjp_
Label           Estimate  StdErr  tValue   Probt   logical  interval
Overall           0.7075  0.1956    3.62  0.0002    0.0011    0.0015
Older             0.9952  0.2673    3.72  0.0002    0.0011    0.0011
Younger           0.4197  0.2847    1.47  0.0717    0.1049    0.2605
GoodInitHealth    0.5871  0.2878    2.04  0.0219    0.0432    0.0984
PoorInitHealth    0.8279  0.2644    3.13  0.0011    0.0023    0.0068
OldGood           0.8748  0.3387    2.58  0.0056    0.0124    0.0295
OldPoor           1.1157  0.3231    3.45  0.0004    0.0011    0.0026
YoungGood         0.2993  0.3562    0.84  0.2014    0.2014    0.5494
YoungPoor         0.5401  0.3338    1.62  0.0544    0.1049    0.2091

The adjusted p-values for the stepdown tests are
mathematically never larger than those of the
simultaneous interval-based tests.
115
Example: Stepwise Pairwise vs. Control Testing
  • Teratology data set
  • Observations are litters
  • Response variable: litter weight
  • Treatments: 0, 5, 50, 500.
  • Covariates: Litter size, Gestation time

116
Analysis File
proc glimmix data=research.litter;
  class dose;
  model weight = dose gesttime number;
  /* Each dose compared with the zero-dose control */
  estimate "5 vs 0"   dose -1 1 0 0,
           "50 vs 0"  dose -1 0 1 0,
           "500 vs 0" dose -1 0 0 1
           / adjust=simulate(nsample=10000000 report)
             stepdown(type=logical);
run;
quit;
117
Results
Estimates with Simulated Adjustment
                     Standard
Label     Estimate      Error  DF  t Value  Pr > |t|   Adj P
5 vs 0     -3.3524     1.2908  68    -2.60    0.0115  0.0316
50 vs 0    -2.2909     1.3384  68    -1.71    0.0915  0.0915
500 vs 0   -2.6752     1.3343  68    -2.00    0.0490  0.0907
Note: 50-0 and 500-0 not significant at .10 with
regular Dunnett
118
Concluding Notes
  • More power is available when combinations are
    restricted.
  • Power of closed tests can be improved using
    correlation and other distributional
    characteristics

119
Nonparametric Multiple Testing Methods
  • Overview: Use nonparametric tests at each node
    of the closure tree
  • Bootstrap tests
  • Rank-based tests
  • Tests for binary data

120
Bootstrap MinP Test (Semi-Parametric Test)
  • The composite hypothesis H1 ∩ H2 ∩ … ∩ Hk may be
    tested using the p-value
  • p = P(MinP ≤ minp | H1 ∩ H2 ∩ … ∩ Hk)
  • Westfall and Young (1993) show
  • how to obtain p by bootstrapping the residuals
    in a multivariate regression model.
  • how to obtain all p's in the closure tree
    efficiently

121
Multivariate Regression Model (Next Five
slides are from Westfall and Young, 1993)
122
Hypotheses and Test Statistics
123
Joint Distribution of the Test Statistics
124
Testing Subset Intersection Hypotheses Using the
Extreme Pivotals
125
Exact Calculation of pK
Bootstrap Approximation
126
Bootstrap Tests (PROC MULTTEST)
H0: δ1 = δ2 = δ3 = δ4 = 0   min p = .0121   p = .0379
H0: δ1 = δ3 = δ4 = 0        min p = .0121   p < .0379
H0: δ2 = δ3 = δ4 = 0        min p = .0142   p = .0351
H0: δ1 = δ2 = δ3 = 0        min p = .0121   p < .0379
H0: δ1 = δ2 = δ4 = 0        min p = .0121   p < .0379
H0: δ1 = δ2 = 0             min p = .0121   p < .0379
H0: δ1 = δ3 = 0             min p = .0121   p < .0379
H0: δ3 = δ4 = 0             min p = .0191   p = .0355
H0: δ1 = δ4 = 0             min p = .0121   p < .0379
H0: δ2 = δ3 = 0             min p = .0142   p < .0351
H0: δ2 = δ4 = 0             min p = .0142   p < .0351
H0: δ4 = 0                  p = 0.0191      p < .0355
H0: δ1 = 0                  p = 0.0121      p < .0379
H0: δ2 = 0                  p = 0.0142      p < .0351
H0: δ3 = 0                  p = 0.1986      p = .1991
p = P(MinP ≤ min p | H0) (computed using
bootstrap resampling). (Recall, for Bonferroni,
p = k(MinP).)
127
Permutation Tests for Composite Hypotheses H0K
Joint p-value = proportion of the n!/(nT! nC!)
permutations for which min_{i∈K} Pi ≤ min_{i∈K} pi.
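For a tiny data set the joint p-value can be computed by complete enumeration. The Python sketch below is an illustration only: the eight subjects and two binary endpoints are invented, the per-endpoint test is a one-sided Fisher exact test (an assumption for the example), and none of the function names come from PROC MULTTEST.

```python
from itertools import combinations
from math import comb


def fisher_upper_p(events_treated, n_treated, events_total, n_total):
    """One-sided Fisher exact p-value: P(X >= events_treated), X hypergeometric."""
    hi = min(n_treated, events_total)
    tail = sum(
        comb(events_total, x) * comb(n_total - events_total, n_treated - x)
        for x in range(events_treated, hi + 1)
    )
    return tail / comb(n_total, n_treated)


def min_p(data, treated):
    """Smallest per-endpoint p-value when `treated` indexes the treated subjects."""
    n_total, k = len(data), len(data[0])
    p_values = []
    for j in range(k):
        total = sum(row[j] for row in data)
        in_treated = sum(data[i][j] for i in treated)
        p_values.append(fisher_upper_p(in_treated, len(treated), total, n_total))
    return min(p_values)


def joint_permutation_p(data, treated):
    """Proportion of all n!/(nT! nC!) relabelings whose minP is <= the observed minP."""
    observed = min_p(data, treated)
    splits = list(combinations(range(len(data)), len(treated)))
    hits = sum(1 for s in splits if min_p(data, s) <= observed + 1e-12)
    return observed, hits / len(splits)


# Hypothetical data: rows are subjects, columns are two binary endpoints.
data = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
treated = (0, 1, 2, 3)  # first four subjects received treatment
observed_min_p, joint_p = joint_permutation_p(data, treated)
print(observed_min_p, joint_p)
```

Here 8!/(4! 4!) = 70 relabelings are enumerated; real applications usually sample from the permutation distribution instead.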
128
Problem Simplification
Problem: There are 2^k - 1 subsets K to be
tested. This might take a while...
Simplification: You need only test k of the 2^k - 1
subsets! Why? Because P(min_{i∈K} Pi ≤ c) ≤
P(min_{i∈K'} Pi ≤ c) when K ⊂ K'. Significance
for most lower order subsets is determined by
significance of higher order subsets.
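The shortcut can be sketched as a step-down loop: order the hypotheses by raw p-value, test only the k nested subsets obtained by removing the most significant hypothesis each time, and carry the running maximum forward. The Python below is a sketch under that reading; it reuses the raw and joint bootstrap p-values shown on the earlier "Bootstrap Tests (PROC MULTTEST)" slide, and the resulting adjusted p-values are derived here for illustration, not quoted from the course.

```python
# Raw p-values for the four endpoint hypotheses (from the bootstrap slide)
raw_p = {"d1": 0.0121, "d2": 0.0142, "d3": 0.1986, "d4": 0.0191}

# Joint MinP p-values for the k = 4 nested subsets actually needed
joint_p = {
    frozenset({"d1", "d2", "d3", "d4"}): 0.0379,
    frozenset({"d2", "d3", "d4"}): 0.0351,
    frozenset({"d3", "d4"}): 0.0355,
    frozenset({"d3"}): 0.1991,
}


def stepdown_adjusted(raw_p, joint_p):
    """Step-down MinP adjusted p-values using only k nested subsets."""
    order = sorted(raw_p, key=raw_p.get)  # most significant first
    remaining = set(order)
    adjusted, running_max = {}, 0.0
    for name in order:
        # Joint test of all hypotheses not yet stepped past
        running_max = max(running_max, joint_p[frozenset(remaining)])
        adjusted[name] = running_max       # enforces monotonicity
        remaining.remove(name)
    return adjusted


adjusted = stepdown_adjusted(raw_p, joint_p)
for name, p in adjusted.items():
    print(name, p)
```

Only 4 of the 15 subsets are looked up; the other 11 are covered by the monotonicity argument above.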
129
MULTTEST PROCEDURE
Tests only the needed subsets (k, not 2^k - 1).
Samples from the permutation distribution.
Only one sample is needed, not k distinct samples,
if the joint distribution of minP is identical
under HK and HS. (Called the "subset pivotality"
condition by Westfall and Young, 1993; valid under
location shift and other models.)
130
Great Savings are Possible with Exact Permutation
Tests!
Why? Suppose you test H12…k using MinP. The
joint p-value is p = P(MinP ≤ minp)
≤ P(P1 ≤ minp) + P(P2 ≤ minp) + … +
P(Pk ≤ minp). Many summands can be zero,
others much less than minp.
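The summands can be computed exactly when each test is a one-sided Fisher exact test, because P(Pi ≤ c) is then the largest attainable p-value that does not exceed c, or zero if no outcome is that extreme. The Python sketch below illustrates this with invented margins (8 subjects, 4 treated, two endpoints with 4 and 3 total events); the function name and the numbers are assumptions for the example.

```python
from math import comb


def prob_p_at_most(c, events_total, n_treated, n_total):
    """P(P_i <= c) for a one-sided Fisher exact test with fixed margins.

    Accumulates the upper hypergeometric tail from the most extreme outcome
    and keeps the largest attainable p-value that is still <= c.
    """
    denom = comb(n_total, n_treated)
    hi = min(n_treated, events_total)
    lo = max(0, n_treated - (n_total - events_total))
    tail, best = 0, 0
    for x in range(hi, lo - 1, -1):
        tail += comb(events_total, x) * comb(n_total - events_total, n_treated - x)
        if tail / denom <= c + 1e-12:
            best = tail
        else:
            break
    return best / denom


# Hypothetical margins: observed minp = 1/70 from the first endpoint
n_total, n_treated = 8, 4
event_totals = [4, 3]
minp = 1 / 70

summands = [prob_p_at_most(minp, m, n_treated, n_total) for m in event_totals]
discrete_bound = sum(summands)
bonferroni_bound = len(event_totals) * minp
print(summands, discrete_bound, bonferroni_bound)
```

The second endpoint cannot produce a p-value as small as 1/70 under any relabeling, so its summand is zero and the bound is half the ordinary Bonferroni value.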
131
Multiple Binary Adverse Events

                                 Stepdown     Stepdown
Variable  Contrast     Raw     Bonferroni  Permutation
ae1       t vs c     0.0008      0.0025       0.0020
ae2       t vs c     0.6955      1.0000       1.0000
ae3       t vs c     0.5000      1.0000       1.0000
ae4       t vs c     0.7525      1.0000       1.0000
ae5       t vs c     0.2213      1.0000       0.6274
ae6       t vs c     0.0601      0.3321       0.2608
ae7       t vs c     0.8165      1.0000       1.0000
ae8       t vs c     0.0293      0.1587       0.1328
ae9       t vs c     0.9399      1.0000       1.0000
ae10      t vs c     0.2484      1.0000       0.9273
ae11      t vs c     1.0000      1.0000       1.0000
ae12      t vs c     1.0000      1.0000       1.0000
ae13      t vs c     1.0000      1.0000       1.0000
ae14      t vs c     1.0000      1.0000       1.0000
ae15      t vs c     0.2484      1.0000       0.9273
ae16      t vs c     0.7516      1.0000       1.0000
ae17      t vs c     1.0000      1.0000       1.0000
ae18      t vs c     1.0000      1.0000       1.0000
ae19      t vs c     1.0000      1.0000       1.0000
ae20      t vs c     0.5000      1.0000       1.0000
ae21      t vs c     0.7516      1.0000       1.0000
ae22      t vs c     1.0000      1.0000       1.0000
ae23      t vs c     0.5000      1.0000       1.0000
ae24      t vs c     1.0000      1.0000       1.0000
ae25      t vs c     1.0000      1.0000       1.0000
ae26      t vs c     1.0000      1.0000       1.0000
ae27      t vs c     1.0000      1.0000       1.0000
ae28      t vs c     0.4344      1.0000       0.9400
132
Example: Genetic Associations
Phenotype: 0/1 (diseased or not). Sample n1 from
diseased, n2 from not diseased. Compare 100s of
genotype frequencies (using dominant and
recessive codings) for diseased and non-diseased
using multiple Fisher exact tests.
133
PROC MULTTEST Code
proc multtest data=research.gen stepperm n=20000
  out=pval hommel fdr;
  class y;
  test fisher(d1-d1