Statistical Principles for Clinical Research
Transcript and Presenter's Notes

Title: Statistical Principles for Clinical Research


1
Statistical Principles for Clinical Research
Conducting Clinical Trials 2007
Sponsored by NIH General Clinical Research Center
Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center
November 1, 2007
Peter D. Christenson
2
Speaker Disclosure Statement
The speaker has no financial relationships
relevant to this presentation.
3
Recommended Textbook
  • Making Inference
  • Design issues
  • Biases
  • How to read papers
  • Meta-analyses
  • Dropouts
  • Non-mathematical
  • Many examples
4
Example Harbor Study Protocol
18 pages of Background and Significance, Preliminary Studies, and Research Design and Methods. Then:
"Pearson correlation, repeated measures of the general linear model, ANOVA analyses, and Student t-tests will be used where appropriate. The two main parameters of interest will be A and B. For A, using a t-test, 40 subjects provide 80% assurance that a XX% reduction will be detected, with p<0.05. Similar comparisons as for A and B will be carried out …"
5
Example Harbor Study Protocol
…the Good: "The two main parameters of interest will be A and B. For A, using a t-test, 40 subjects provide 80% assurance that a XX% reduction will be detected, with p<0.05."
  • Because:
  • Explicit: specifies the primary outcome of interest.
  • Explicit: justification for the number of subjects.

6
Example Harbor Study Protocol
…the Bad: "Pearson correlation, repeated measures of the general linear model, ANOVA analyses, and Student t-tests will be used where appropriate."
  • Because:
  • Boilerplate.
  • These methods are almost always used.
  • "Where appropriate"?
  • Tries to satisfy the reviewer, not science.

7
Example Harbor Study Protocol
…and the Ugly: "Similar comparisons as for A and B will be carried out …"
  • Because:
  • 1º is OK: the difference between 2 visits for 2 measures, A & B.
  • But: 15 measures are taken at each of 19 visits.
  • Torture the data long enough, and it will confess to something.

8
Goals of this Presentation
More good. Less bad. Less ugly.
9
Biostatistical Involvement in Studies
Off-site statistical design and analysis:
  • Multicenter studies: data coordinating center.
  • In-house drug company statisticians.
  • CRO, through NIH or drug company.
  • Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
  • Occasionally multicenter.
10
Studies with Off-Site Biostatistics
  • Not responsible for statistical design and analysis.
  • Are responsible for study conduct that may:
  • impact the analysis and the believability of results.
  • reduce the sensitivity (power) of the study to detect effects.

11
Review of Basic Method of Inference from
Clinical Studies
12
Typical Study Data Analysis
Large enough signal-to-noise ratio → proves an effect beyond a reasonable doubt. Often:

    Ratio = Signal / Noise = Observed Effect / (Natural Variation / √N)

For a t-test comparing two groups:

    t Ratio = Difference in Means / (SD / √N)

Degree of allowable doubt → how large t needs to be.
5% (p<0.05) → t > 2
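
As a minimal numeric sketch of this ratio (the values below are hypothetical, chosen only for illustration):

    import math

    # Hypothetical values: a 6-unit difference in means, SD = 12, N = 36.
    diff_in_means, sd, n = 6.0, 12.0, 36

    t = diff_in_means / (sd / math.sqrt(n))  # signal / noise
    print(t)  # 3.0 > 2, i.e., p < 0.05: effect beyond a reasonable doubt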
13
Meaning of p-value
p-value: the probability of a test statistic (ratio) at least as deviant as was observed, if there is really no effect. Smaller p-values → more evidence of effect.
  • Validity of the p-value interpretation typically requires:
  • Proper data generation, e.g., randomness.
  • Subjects provide independent information.
  • Data is not used in other statistical tests.
  • …or an accounting for not satisfying these criteria.

→ p-values are earned, by satisfying these appropriately.
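
A sketch of the definition (the t ratio and degrees of freedom below are hypothetical):

    from scipy.stats import t as t_dist

    t_ratio, df = 2.5, 38  # hypothetical observed ratio and degrees of freedom
    # Two-sided p-value: the chance of a ratio at least this deviant, in
    # either direction, if there is really no effect.
    p = 2 * t_dist.sf(abs(t_ratio), df)
    print(round(p, 3))  # ~0.017: smaller p-value -> more evidence of effect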
14
Analogy with Diagnostic Testing
Analogy: True Effect ↔ Disease; Study Claim ↔ Diagnosis

                              Truth
                       No Effect        Effect
  Study Claims:
    No Effect          Correct          Error
                       (Specificity)
    Effect             Error            Correct
                                        (Sensitivity)

Set p = 0.05 → Specificity = 95%.
Power (Sensitivity): maximize; choose N for 80% (typical).
15
Study Conduct Impacting Analysis
Decreased effect detectability (and decreased ratio) results from:
  • Non-adherence of study personnel to the protocol in general. Increases variation.
  • Enrolling subjects who do not satisfy inclusion or exclusion criteria. Can decrease the observed effect: e.g., with no effect in the 10% wrongly included and a real effect of 50%, the observed effect is 0.9(50%) = 45%.
  • Subjects not completing the entire study. May decrease N, or give potentially conflicting results.
16
Potentially Conflicting Results
Example Subjects not completing the entire study.
17
Tiagabine Study Results: How Believable?
[Figure: the trial's results under three different analyses (1, 2, 3) of the same data.]
Conclusions differ depending on how the non-completing subjects (24%) are handled in the analysis.
The primary analysis here is specified, but we would prefer robustness to the method of analysis (agreement among the three), which is more likely with more completing subjects.
18
Study Conduct Impacting Analysis
Intention-to-Treat (ITT)
ITT typically specifies that all subjects are included in the analysis, regardless of treatment compliance or whether lost to follow-up.
Purposes: Avoid bias from subjective exclusions, or from differential exclusion between treatment groups; sometimes argued to mimic non-compliance in a real-world setting.
More emphasis on policy implications of societal effectiveness than on scientific efficacy. Not appropriate for many studies.
Continued
19
Study Conduct Impacting Analysis
Intention-to-Treat (ITT)
Lost to follow-up: always minimize; there is no real-world analogy, as there is for treatment compliance.
Need to define outcomes for non-completing subjects.
Current Harbor study: N=1200 would need N=3000 if ITT were used, 20% were lost, and the lost were counted as treatment failures.
20
ITT Need to Impute Unknown Values
[Figure: change from baseline at the baseline, intermediate, and final visits for individual subjects. Top panel (observations): LOCF, last observation carried forward, ignores presumed progression. Bottom panel (ranks): LRCF, last rank carried forward, maintains expected relative progression.]
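
A minimal pandas sketch of LOCF imputation (the data are hypothetical; LRCF would carry each subject's last rank forward instead of the raw value):

    import numpy as np
    import pandas as pd

    # Hypothetical change-from-baseline data; NaN marks a visit missed
    # after the subject dropped out.
    data = pd.DataFrame(
        [[0.0, -4.0, np.nan],   # non-completer: no final visit
         [0.0, -2.0, -5.0]],    # completer
        columns=["baseline", "intermediate", "final"],
    )
    # LOCF: fill each missing visit with that subject's last observed value,
    # ignoring any presumed further progression.
    locf = data.ffill(axis=1)
    print(locf)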
21
Study Conduct Impacting Feasibility
Potential Effects of Slow Enrollment
  • Needed N may be impossible → study stopped.
  • Competitive site enrollment → local financial loss.
  • Insufficient person-years (PY) of observation for some studies, even if N is attained (see figure below).
[Figure: cumulative subjects enrolled over years 0-2 under planned, slower, and slower-yet enrollment. The area under each curve is the PY of observation, which determines whether an effect (e.g., 1.1 or 1.7) is detectable even when N is attained.]
22
Biostatistical Involvement in Studies
Off-site statistical design and analysis:
  • Multicenter studies: data coordinating center.
  • In-house drug company statisticians.
  • By CRO, through NIH or drug company.
  • Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
  • Occasionally multicenter.
23
Local Protocols and Data Analysis
  • Develop protocol and data analysis plan.
  • Have randomization and blinding strategy, if
    study requires.
  • Data management.
  • Perform data analyses.

24
Local Data Analysis Resources
Biostatistician: Peter Christenson, PChristenson@labiomed.org.
  • Develops study design and analysis plan.
  • Advises throughout, for any study.
  • Performs all non-basic analyses.
  • Full responsibility for studies with a funded FTE.
  • Reviews some protocols for committees.
Data Management: database development for GCRC studies, by the database manager.
25
Statistical Components of Protocols
  • Target population / source of subjects.
  • Quantification of aims, hypotheses.
  • Case definitions, endpoints quantified.
  • Randomization plan, if any.
  • Masking, if used.
  • Study size: numbers screened, enrolled, completing.
  • Use of data from non-completers.
  • Justification of study size (power, precision,
    other).
  • Methods of analysis.
  • Mid-study analyses.

26
Selected Statistical Components and Issues
27
Case Definitions and Endpoints
  • Primary case definitions and endpoints need
    careful thought.
  • Will need to report results based on these.

Example: Study at Harbor. The definition of cure was very strict. Data were analyzed with this definition. Cure rates were too low and would not be taken seriously. Scientific method → need to report them; otherwise, cherry-picking.
Publication: Use the primary definition and explain; also report with a secondary definition. Less credible.
28
Randomization
  • Helps assure attributability of treatment effects.
  • Blocked randomization assures approximate chronologic equality of the numbers of subjects in each treatment group.
  • Recruiters must not have access to the randomization list.
  • The list can be created with a random number generator in software, printed tables in stat texts, or even shuffled slips of paper (see the sketch below).
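
A minimal sketch of a blocked list from a software random number generator (the block size of 4 and the seed are assumptions for illustration):

    import random

    def blocked_randomization(n_blocks, block=("A", "A", "B", "B"), seed=2007):
        # Shuffle each block separately, so group sizes stay approximately
        # equal in chronological order of enrollment.
        rng = random.Random(seed)
        assignments = []
        for _ in range(n_blocks):
            b = list(block)
            rng.shuffle(b)
            assignments.extend(b)
        return assignments

    print(blocked_randomization(3))  # 12 assignments, 6 per group
    # The list stays with the statistician; recruiters never see it.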

29
Non-completing Subjects
  • Enrolled subjects are never dropouts.
  • Protocol should specify
  • Primary analysis set (e.g., ITT or per-protocol).
  • How final values will be assigned to
    non-completers.
  • Time-to-event (survival analysis) studies may not need final assignments; use time followed.
  • Study size estimates should incorporate the
    number of expected non-completers.

30
Study Size Power
  • Power: the probability of detecting real effects of a specified minimal (clinically relevant) magnitude.
  • Power will be different for each outcome.
  • Power depends on the statistical method.
  • Five factors, including power, are inter-related; fixing four of these specifies the fifth:
  • Study size
  • Heterogeneity among subjects (SD)
  • Magnitude of treatment effect to be detected
  • Power to detect this magnitude of effect
  • Acceptable chance of false positive conclusion,
    usually 0.05

31
Free Study Size Software
www.stat.uiowa.edu/~rlenth/Power
32
Free Study Size Software Example
Pilot data: SD = 8.19 in 36 subjects. "We propose N = 40 subjects/group in order to provide 80% power to detect (p<0.05) an effect Δ of 5.2."
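
A sketch of the underlying calculation, using the usual normal-approximation formula (Lenth's software performs an exact t-based version); it reproduces the slide's numbers:

    from math import ceil
    from scipy.stats import norm

    def n_per_group(sd, delta, alpha=0.05, power=0.80):
        # N per group = 2 * (z_{1-alpha/2} + z_{power})^2 * (SD / delta)^2
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return ceil(2 * z ** 2 * (sd / delta) ** 2)

    print(n_per_group(sd=8.19, delta=5.2))  # 39: N = 40/group gives >= 80% power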
33
Study Size May Not be Based on Power
Precision refers to how well a measure is estimated.
Margin of error = the value (half-width) of the 95% confidence interval. Smaller margin of error ↔ greater precision.
To achieve a specified margin of error, solve the CI formula for N (see the sketch below). Polls: N ≈ 1000 → margin of error on % ≈ 1/√N ≈ 3%.
Pilot studies, Phase I, and some Phase II: power is not relevant; the goal may be obtaining an SD for future studies.
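
A sketch of solving the CI half-width for N, assuming a 95% CI for a proportion with the worst case p = 0.5 (the polling situation above):

    from math import ceil, sqrt
    from scipy.stats import norm

    def n_for_margin(margin, p=0.5, conf=0.95):
        # CI half-width for a proportion: z * sqrt(p(1-p)/N) <= margin
        z = norm.ppf(1 - (1 - conf) / 2)
        return ceil(p * (1 - p) * (z / margin) ** 2)

    print(n_for_margin(0.03))        # ~1068 subjects for a 3% margin of error
    print(round(1 / sqrt(1000), 3))  # ~0.032: the 1/sqrt(N) shortcut above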
34
Mid-Study Analyses
  • Mid-study comparisons should not be made before
    study completion unless planned for (interim
    analyses). Early comparisons are unstable, and
    can invalidate final comparisons.
  • Interim analyses are planned comparisons at
    specific times, usually by an unmasked advisory
    board. They allow stopping the study early due to
    very dramatic effects, and final comparisons, if
    study continues, are adjusted to validly account
    for peeking.

Continued
35
Mid-Study Analyses
Too many analyses:
[Figure: estimated effect vs. number of subjects enrolled over time. Early estimates fluctuate widely around 0, so too many looks can yield a wrong early conclusion.]
Need to monitor, but also to account for many analyses.
36
Mid-Study Analyses
  • Mid-study reassessment of study size is advised
    for long studies. Only standard deviations to
    date, not effects themselves, are used to assess
    original design assumptions.
  • Feasibility analysis
  • may use the assessment noted above to decide
    whether to continue the study.
  • may measure effects, like interim analyses, by
    unmasked advisors, to project ahead on the
    likelihood of finding effects at the planned end
    of study.

Continued
37
Mid-Study Analyses
Examples: Studies at Harbor. Randomized but not masked; data available to the PI, who compared treatment groups repeatedly as more subjects were enrolled.
Study 1: Groups do not differ; plan to add more subjects. Consequence → the final p-value is not valid; its probability requires no prior knowledge of the effect.
Study 2: Groups differ significantly; plan to stop the study. Consequence → use of this p-value is not valid; its probability requires incorporating the later comparisons.
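
A minimal simulation (illustrative assumptions, not data from the studies above) showing how such repeated peeking inflates the false-positive rate far above the nominal 5%:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    n_trials, batch, n_batches = 2000, 10, 10
    false_pos = 0
    for _ in range(n_trials):
        # Two groups with NO true difference, enrolled 10 per group at a time.
        a = rng.normal(size=batch * n_batches)
        b = rng.normal(size=batch * n_batches)
        # "Peek" with a t-test after every batch; count the trial as a false
        # positive if ANY peek reaches p < 0.05.
        if any(ttest_ind(a[:k], b[:k]).pvalue < 0.05
               for k in range(batch, batch * n_batches + 1, batch)):
            false_pos += 1
    print(false_pos / n_trials)  # roughly 0.15-0.20, not 0.05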
38
Multiple Analyses at Study End
False Positive Conclusions
[Figure: the probability of at least one false-positive conclusion rises with the number of analyses performed (torturing data); replacing "subgroup" with "analysis" gives a similar problem. From Lagakos, NEJM 2006;354(16):1667-1669.]
39
Multiple Analyses at Study End
  • There are formal methods to incorporate the
    number of multiple analyses.
  • Bonferroni
  • Tukey
  • Dunnett
  • Transparency about what was done is most important.
  • Be aware of the number of analyses and report it with any conclusions (see the sketch below).
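
A sketch of the problem and of the Bonferroni correction (the count of analyses is hypothetical):

    # With k independent analyses, each at alpha = 0.05, the chance of at
    # least one false-positive conclusion is 1 - (1 - alpha)^k.
    alpha, k = 0.05, 15  # hypothetical: 15 analyses
    print(round(1 - (1 - alpha) ** k, 2))  # ~0.54: the data will "confess"

    # Bonferroni: test each analysis at alpha/k to keep the overall
    # false-positive rate near alpha.
    print(alpha / k)  # 0.0033 per-analysis threshold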

40
Summary: Bad Science That May Seem So Good
  • Re-examining data, or using many outcomes,
    seeming to be performing due diligence.
  • Adding subjects to a study that is showing marginal effects, or stopping early due to strong results.
  • Examining effects in subgroups. See NEJM 2006;354(16):1667-1669.
  • Actually bad? Could be negligent NOT to do these,
    but need to account for doing them.

41
Statistical Software
42
Professional Statistics Software Package
[Screenshot: code/syntax is entered; stored data are accessible; results appear in an output window.]
43
Microsoft Excel for Statistics
  • Primarily for descriptive statistics.
  • Limited output.

44
Almost Free On-Line Statistics Software
www.statcrunch.com
Runs from the browser, not locally. $5 for 6 months' usage. Potential HIPAA concerns.
Supported by NSF
45
Typical Statistics Software Package
Select Methods from Menus
www.ncss.com   www.minitab.com   www.stata.com
$100 - $500
[Screenshot: data in a spreadsheet; output appears after menu selection.]
46
http://gcrc.labiomed.org/biostat
This and other biostat talks posted
47
Conclusions
Don't put off slow enrollment; find the cause and solve it. I am available.
Do put off analyses of efficacy, but not of design assumptions. I am available.
P-values are earned by following the methods that are needed for them to be valid. I am available.
You may have to pay for lack of attention to protocol decisions, to satisfy the scientific method. I am available.
Software always takes more time than expected.
48
Thank You
Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research