Title: Sample-Size Analysis: Considering Traditional and Crucial Type I and Type II Error Rates
1Sample-Size Analysis: Considering Traditional and
Crucial Type I and Type II Error Rates
- Ralph O'Brien, PhD, Center for Clinical
Investigation, Case Western Reserve University
Edvard Munch, The Scream, 1893
2Objectives
- Explain Type I and II errors and error rates,
both classical and crucial.
- Understand what factors affect these error rates.
- Determine the sample size to achieve given
objectives.
- Use the software PASS and O'Brien's Excel program
to analyze power and sample size.
3Eric Topol, et al (1997)
- The calculation and justification of sample size
is at the crux of the design of a trial. Ideally,
clinical trials should have adequate power, 90%,
to detect a clinically relevant difference
between the experimental and control therapies.
Unfortunately, the power of clinical trials is
frequently influenced by budgetary concerns as
well as pure biostatistical principles
4Eric Topol, et al (1997)
- Yet an underpowered trial is, by definition,
unlikely to demonstrate a difference between the
interventions assessed and may ultimately be
considered of little or no clinical value. From
an ethical standpoint, an underpowered trial may
put patients needlessly at risk of a new therapy
without being able to come to a clear conclusion.
5Richard Feynman
Richard Feynman, 1918-1988, Nobel Laureate in
Physics. Adventuresome, joking, ever the
curious character
6The March of Science
- Scientific knowledge is a body of statements of
varying degrees of uncertainty,
- some mostly unsure,
- some nearly sure,
- none absolutely certain
Richard Feynman, 1918-1988, Nobel Laureate in
Physics. Adventuresome, joking, ever the
curious character
7March of Science in clinical research
8Peter Stacpoole
- Does DCA decrease mortality in children
with severe malaria?
9Sol Capote
- Does QCA decrease mortality in children
with severe malaria?
Pablo Picasso, Portrait of Max Jacob, 1907
10Capote's proposed study?
- Design? Allocation ratio?
- Subjects?
- Primary efficacy outcome measure?
- Primary analysis? One- or two-sided test?
- Scenario for the infinite data set?
11Design? Allocation ratio?
- Two groups
- Randomized, double blind.
- For every 1 patient who gets Usual Care, 2 patients get QCA (1:2 allocation).
12Subjects?
- Set inclusion and exclusion criteria.
- Total N? Consider 700, 1400, or 2100.
13Primary efficacy outcome measure?
- Death before 10 days (0 vs. 1)
- No censoring
- Disregard exact survival times
14Primary analysis? One- or two-sided test?
- Compare deaths
- Likelihood ratio test of 2 independent
proportions. (Others would be OK, too.)
- Two-sided (This is arguable.)
15Scenario for the infinite data set?
- Usual Care mortality rate = 0.15
- QCA cuts mortality by 25% or 33.33%
- ⇒ QCA mortality rate = 0.1125 or 0.10.
18- Do classical power analysis with PASS.
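The classical power analysis for this scenario can be roughed out by hand. The sketch below is not PASS's algorithm and not the likelihood ratio test named earlier; it assumes the ordinary pooled-variance normal approximation for two independent proportions, with the 1:2 allocation and the mortality rates from the scenario. The function name and its arguments are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_total, ratio=2.0, alpha=0.05):
    """Approximate two-sided power for comparing two independent proportions.

    ratio = n_treat / n_control, so ratio=2.0 is the 1:2 allocation.
    Normal approximation (an assumption of this sketch, not PASS's method);
    the negligible chance of rejecting in the wrong direction is ignored.
    """
    n_control = n_total / (1.0 + ratio)
    n_treat = n_total - n_control
    # Pooled rate gives the standard error under the null hypothesis.
    p_bar = (n_control * p_control + n_treat * p_treat) / n_total
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_control + 1 / n_treat))
    # Unpooled standard error under the conjectured alternative.
    se_alt = sqrt(p_control * (1 - p_control) / n_control
                  + p_treat * (1 - p_treat) / n_treat)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    delta = abs(p_control - p_treat)
    return NormalDist().cdf((delta - z_crit * se_null) / se_alt)


usual_care_rate = 0.15
for relative_cut in (0.25, 1 / 3):
    qca_rate = usual_care_rate * (1 - relative_cut)  # 0.1125 or 0.10
    for n_total in (700, 1400, 2100):
        power = power_two_proportions(usual_care_rate, qca_rate, n_total)
        print(f"QCA rate {qca_rate:.4f}, N total {n_total}: power ~ {power:.2f}")
```

Exact or likelihood-ratio-based software will give somewhat different values, especially at small N, so treat these numbers as a sanity check rather than a replacement for PASS.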
19Beyond α and β
- What are crucial Type I and Type II error rates?
20Crucial error rates
- Type I: If the test yields traditional
statistical significance (p ≤ α), what is the
chance this will be an incorrect inference?
- Type II: If the test does not yield traditional
statistical significance (p > α), what is the
chance this will be an incorrect inference?
21- How does this relate to statistical power?
22Which study has the strongest evidence that QCA
is effective?
Note: With UCO mortality of 0.15, QCA relative
risk of 0.67, and α = 0.05: Ntotal = 450 → power =
0.33; Ntotal = 2100 → power = 0.90
23Suppose you believe that
- Only 30% of hypotheses being investigated are
actually non-null.
24Lee and Zelen (2000)
Same logic as in O'Brien and Castelloe
(2006), but we switched the α* and β* notation (on
purpose).
25Lee and Zelen (2000)
27Of course, this is an example of Bayes' Theorem,
but you may want to say that quietly due to
Bayesphobia, a disease with still-high prevalence
in the scientific community (including
statistical scientists).
28Greater power reduces both crucial error rates.
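A minimal sketch of the Bayes' Theorem arithmetic behind this claim, using the numbers already on the slides (30% of hypotheses non-null, α = 0.05, power 0.33 versus 0.90). The function name and argument names are illustrative, and the sketch treats "significant" as a single event without separating the direction of a two-sided result.

```python
def crucial_error_rates(prior_non_null, alpha, power):
    """Return (crucial Type I rate, crucial Type II rate) via Bayes' Theorem.

    crucial Type I  = Pr[null true  | test significant]
    crucial Type II = Pr[null false | test not significant]
    """
    prior_null = 1.0 - prior_non_null
    beta = 1.0 - power
    # Overall chance of a significant / non-significant result.
    p_significant = alpha * prior_null + power * prior_non_null
    p_not_significant = (1.0 - alpha) * prior_null + beta * prior_non_null
    crucial_type1 = alpha * prior_null / p_significant
    crucial_type2 = beta * prior_non_null / p_not_significant
    return crucial_type1, crucial_type2


for power in (0.33, 0.90):
    type1, type2 = crucial_error_rates(prior_non_null=0.30, alpha=0.05, power=power)
    print(f"power {power:.2f}: crucial Type I ~ {type1:.3f}, crucial Type II ~ {type2:.3f}")
```

Under these inputs, moving from power 0.33 to 0.90 lowers the crucial Type I rate from about 0.26 to about 0.11 and the crucial Type II rate from about 0.23 to about 0.04.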
29- Study crucial error rates with Excel program.
31Does zinc gluconate glycine reduce the duration
of the common cold?
Dr. Macknin's study
35Mossad, Macknin, et al. (1996)
36Mossad, Macknin, et al. (1996)
- sample size of 100 patients
- detect a difference in number of days with cold
symptoms
- means: 8 days in the placebo group,
4 days in the zinc group
- standard deviation of 6 days
- two-sided P-value of 0.05
- approximate power of 90%
It looks like they assumed Normality.
37But wait! Does this scenario make sense?
38Checking (with SAS)
proc power;
  TwoSampleMeans
    GroupMeans = 4 | 8
    StdDev = 6
    alpha = 0.05
    NPerGroup = 50
    power = .;
run;
39The POWER Procedure
Two-sample t Test for Mean Difference
  Distribution             Normal
  Method                   Exact
  Alpha                    0.05
  Group 1 Mean             4
  Group 2 Mean             8
  Standard Deviation       6
  Sample Size Per Group    50
  Number of Sides          2
  Null Difference          0
  Computed Power           0.910
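The 0.910 can be sanity-checked without SAS. The sketch below assumes a large-sample normal approximation in place of the exact noncentral t calculation that PROC POWER uses, so it should land close to, but not exactly on, the SAS value. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sample_power(mean1, mean2, sd, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison of means.

    Assumes equal group sizes and a common standard deviation; this is an
    approximation, not the exact method reported by SAS.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    noncentrality = abs(mean1 - mean2) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(noncentrality - z_crit)


print(f"approximate power: {approx_two_sample_power(4, 8, 6, 50):.3f}")
```

The approximation runs slightly optimistic because it ignores the extra uncertainty from estimating the standard deviation.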
40Better way: assume logNormal
proc power;
  TwoSampleMeans
    test = ratio
    dist = logNormal
    MeanRatio = 0.5   /* 4/8 */
    /* Coef_Var = SD/mean = 6/4, 6/6, 6/8 */
    CV = 1.5 1.0 0.75
    alpha = 0.05
    NPerGroup = 50
    power = .;
run;
41Powers for logNormal
The POWER Procedure
Two-sample t Test for Mean Ratio
  Distribution                 Lognormal
  Method                       Exact
  Alpha                        0.05
  Geometric Mean Ratio         0.5
  Sample Size Per Group        50
  Number of Sides              2
  Null Geometric Mean Ratio    1
Computed Power
  Index    CV      Power
  1        1.50    0.885
  2        1.00    0.985
  3        0.75    >.999
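A rough check on these lognormal powers: with a common coefficient of variation in both groups, the log-scale standard deviation is sqrt(ln(1 + CV^2)) and the test compares log-scale means that differ by ln(0.5). The sketch below assumes a normal approximation on the log scale rather than SAS's exact method, and the function name is illustrative.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_lognormal_ratio_power(mean_ratio, cv, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided test of a lognormal mean ratio.

    Assumes equal group sizes and the same CV in both groups, so the ratio of
    means equals the ratio of geometric means.
    """
    sigma_log = sqrt(log(1.0 + cv ** 2))  # SD of log(Y)
    standard_error = sigma_log * sqrt(2.0 / n_per_group)
    noncentrality = abs(log(mean_ratio)) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(noncentrality - z_crit)


for cv in (1.5, 1.0, 0.75):
    power = approx_lognormal_ratio_power(0.5, cv, 50)
    print(f"CV {cv:.2f}: approximate power {power:.3f}")
```

The pattern matches the SAS table: the smaller the CV, the higher the power for the same mean ratio and sample size.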
42As-analyzed way: log-rank test
proc power;
  TwoSampleSurvival
    test = logrank
    alpha = 0.05
    AccrualTime = 30
    TotalTime = 90
    GroupMedSurvTimes = 4 | 8
    NPerGroup = 50
    power = .;
run;
43The POWER Procedure
Log-Rank Test for Two Survival Curves
  Method                          Lakatos normal approximation
  Form of Survival Curve 1        Exponential
  Form of Survival Curve 2        Exponential
  Accrual Time                    30
  Total Time                      90
  Alpha                           0.05
  Group 1 Median Survival Time    4
  Group 2 Median Survival Time    8
  Sample Size Per Group           50
  Computed Power                  0.917
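For comparison, here is a cruder hand calculation. It does not reproduce the Lakatos method; it swaps in the simpler Schoenfeld events-based approximation, assuming exponential survival, uniform accrual, equal allocation, and no dropout. Expect it to land in the same neighborhood as 0.917 rather than match it. All function names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def event_probability(median, accrual_time, total_time):
    """Chance a subject has the event before the study ends.

    Assumes exponential survival and uniform accrual over [0, accrual_time].
    """
    hazard = log(2.0) / median
    min_followup = total_time - accrual_time
    mean_survival_at_end = (exp(-hazard * min_followup)
                            - exp(-hazard * total_time)) / (hazard * accrual_time)
    return 1.0 - mean_survival_at_end


def schoenfeld_logrank_power(median1, median2, n_per_group,
                             accrual_time, total_time, alpha=0.05):
    """Schoenfeld approximation for a two-sided log-rank test, 1:1 allocation."""
    expected_events = n_per_group * (
        event_probability(median1, accrual_time, total_time)
        + event_probability(median2, accrual_time, total_time))
    log_hazard_ratio = log(median2 / median1)  # exponential: HR = ratio of medians
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(sqrt(expected_events) / 2.0 * abs(log_hazard_ratio) - z_crit)


print(f"approximate power: {schoenfeld_logrank_power(4, 8, 50, 30, 90):.3f}")
```

With medians this short relative to the follow-up, nearly every subject has the event, so the power is driven almost entirely by the total sample size and the hazard ratio of 2.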
44Results
P < 0.001, log-rank test
Placebo (n = 50)
Cold-Eeze (n = 49)
46Dr. Macknin
- I got goosebumps when we broke the code. I
didn't think it was going to work.
- here was something that actually looked like
it was helping the common cold.
- nothing had really worked like this before.
47What was that again?
- I didn't think it was going to work.
48So, what do you think?
49- Quickly Reducing Atherosclerosis
51Atheroma Volume
[Figure: EEM Area = 14.37 mm2; Atheroma Area = 8.1 mm2; atheroma = 56% of EEM area]
52Does SuperHDL reduce atheroma volume in
patients with atherosclerosis?
Dr. Nissen's study
53One critical thing
November 5, 2003. Cholesterol Study Offers Hope
for a Bold Therapy. By GINA KOLATA
- A small study of heart disease patients testing a
hypothesis so improbable its principal
investigator says he gave it a one-in-10,000
chance of succeeding
54What was that again?
- a one-in-10,000 chance of succeeding
56Power = 75%
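One way to think about a 75% power figure alongside a one-in-10,000 prior is to ask how likely a significant result would be a false positive across a range of prior beliefs. The sketch below assumes α = 0.05, which the slides do not state for this study, and reads "one-in-10,000" as a prior probability of 0.0001 that the hypothesis is non-null. Both are assumptions for illustration only, as is the function name.

```python
def prob_null_given_significant(prior_non_null, power, alpha=0.05):
    """Pr[null true | significant result], by Bayes' Theorem.

    alpha=0.05 is an assumed default, not a value taken from the study.
    """
    prior_null = 1.0 - prior_non_null
    false_positives = alpha * prior_null
    true_positives = power * prior_non_null
    return false_positives / (false_positives + true_positives)


for prior in (0.5, 0.3, 0.1, 0.01, 0.0001):
    rate = prob_null_given_significant(prior, power=0.75)
    print(f"prior Pr[non-null] = {prior:<7}: Pr[false positive | significant] ~ {rate:.4f}")
```

Under these assumptions, a prior as low as 0.0001 leaves a significant result almost certain to be a false positive, even with 75% power.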
57Change in Atheroma Volume
[Figure: labeled values 56 at baseline and 46 after 5 weeks]
58Primary analysis
SuperHDL vs. Placebo: p = 0.29
59ABC News World News Tonight, November 4, 2003
60Quotes from ABC News story
- Peter Jennings
- enormously promising treatment to stave off
heart disease.
- very much appears to be a real breakthrough
61Quotes from ABC News story
- Dr. Rader (wrote JAMA editorial)
- unprecedented. This study shows that plaque
regression occurred much faster and to a much
greater extent than we've ever seen
62Quotes from ABC News story
- ABC's John McKenzie
- After just five weekly treatments, researchers
saw an average of 4% reduction in the amount of
plaque on artery walls.
63Quotes from ABC News story
- Dr. Rader (wrote JAMA editorial)
- A regression of 4% actually represents several
years' worth of plaque build-up in the coronary
arteries.
64Dr. Nissen (Cleveland Clinic website, 11
November 2006)
- "This is an extraordinary and unprecedented
finding," said Cleveland Clinic cardiologist
Steven E. Nissen, M.D., who directed the
10-center nationwide study. "This is the first
convincing demonstration that targeting HDL, good
cholesterol, can benefit patients with heart
disease, the leading cause of death in the United
States."
65So, what do you think?
66Why is this statement wrong?
- As a result of this logic, if we are willing to
assert a difference when P < .05, we are tacitly
agreeing to accept the fact that, over the long
run, we expect 1 assertion of a difference in 20
to be wrong.
- From Glantz, SA (2002), Primer of Biostatistics,
p. 108
67This is a common misunderstanding of p-values.
- But the error rate it describes is what we should
care about.
68The crucial error rates
- Crucial False Positive Rate (α*): If a test
turns out to be significant (p ≤ α), what is the
chance it is a (Type I) error?
- Crucial False Negative Rate (β*): If a test turns
out to be non-significant (p > α), what is the
chance it is a (Type II) error?
70Same logic as common statistical methodology in
diagnostic testing
- sensitivity ↔ power = 1 − β
- specificity ↔ 1 − α
- positive predictive value ↔ 1 − α*
- negative predictive value ↔ 1 − β*
71Wacholder, et al. (2004)
- FPRP: False Positive Report Probability
Same logic as Lee and Zelen (2000).
72How many candidate drugs are studied for every
one that finally gets FDA approval?
74JAMA, 24 June 1998
77Needed!
- Sound way to use the data to compute the
posterior March of Science position.
- Stay within frequentist testing. (Sorry, even as
cool as they are, regular Bayesian methods
require a much fuller specification of prior
information than just position of March of
Science.)
- Stay tuned.
78Thanks!
Don't make the wrong mistake. - Yogi
Berra