# Basic Statistical Principles for the Clinical Research Scientist (Kristin Cobb, October 13 and October 20, 2004)


1
Basic Statistical Principles for the Clinical
Research Scientist
Kristin Cobb
October 13 and October 20, 2004
2
Statistics in Medical Research
• 1. Design phase
• Statistics starts in the planning stages of a
clinical trial or laboratory experiment to
• establish optimal sample size needed
• ensure sound study design
• 2. Analysis phase
• Make inferences about a wider population.

3
Common problems with statistics in medical
research
• Sample size too small to find an effect (design
phase problem)
• Sub-optimal choice of measurement for predictors
and outcomes (design phase problem)
• Inadequate control for confounders (design or
analysis problem)
• Incorrect statistical test used (analysis
problem)
• Incorrect interpretation of computer output
(analysis problem)
• Therefore, it is essential to collaborate with
a statistician both during planning and analysis!

4
• The statistical content of the paper is confusing
or misleading because the authors do not fully
understand the statistical techniques used by the
statistician.
• The statistician performs inadequate or
inappropriate analyses because she is unclear
about the questions the research is designed to
answer.
• Therefore, clinical research scientists need to
understand the basic principles of biostatistics.

5
Outline (today and next week)
• 1. Primer on hypothesis testing, p-values,
confidence intervals, statistical power.
• 2. Biostatistics in Practice: Applying statistics
to clinical research design

6
Quick review
• Standard deviation
• Histograms (frequency distributions)
• Normal distribution (bell curve)

7
Review Standard deviation
Standard deviation tells you how variable a
characteristic is in a population. For example,
how variable is height in the US? A standard
deviation of height represents the average
distance that a random person is away from the
mean height in the population.
8
Review Histograms
9
Review Histograms
1 inch bins
10
Review Normal Distribution
11
Review Normal Distribution
In fact, here, 101/150 (67%) subjects have
heights between 62.7 and 67.7 inches (1 standard
deviation below and above the mean).
A perfect, theoretical normal distribution
carries 68% of its area within 1 standard
deviation of the mean.
12
Review Normal Distribution
In fact, here, 146/150 (97%) subjects have
heights between 60.2 and 70.2 inches (2 standard
deviations below and above the mean).
A perfect, theoretical normal distribution
carries 95% of its area within 2 standard
deviations of the mean.
13
Review Normal Distribution
In fact, here, 150/150 (100%) subjects have
heights between 57.7 and 72.7 inches (3 standard
deviations below and above the mean).
A perfect, theoretical normal distribution
carries 99.7% of its area within 3 standard
deviations of the mean.
14
Review: Applying the normal distribution
• If women's heights in the US are normally
distributed with a mean of 65 inches and a
standard deviation of 2.5 inches, what percentage
of women do you expect to have heights above 6
feet (72 inches)?

From a standard normal chart or computer: a Z of
2.8 corresponds to a right-tail area of .0026, so we
expect 2-3 women per 1000 to have heights of 6
feet or greater.
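This calculation can be reproduced with Python's standard library (the mean of 65 inches and SD of 2.5 inches are the slide's assumptions):

```python
from statistics import NormalDist

# Women's heights, assumed normal: mean 65 in, SD 2.5 in (from the slide)
mean, sd = 65, 2.5
heights = NormalDist(mean, sd)

z = (72 - mean) / sd           # Z-score for 6 feet (72 inches)
p_above = 1 - heights.cdf(72)  # right-tail area above 72 inches

print(round(z, 1))        # 2.8
print(round(p_above, 4))  # 0.0026, i.e., 2-3 women per 1000
```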
15
Statistics Primer
• Statistical Inference
• Sample statistics
• Sampling distributions
• Central limit theorem
• Hypothesis testing
• P-values
• Confidence intervals
• Statistical power

16
Statistical Inference: The process of making
guesses about the truth from a sample.
17
• EXAMPLE: What is the average blood pressure of
US post-docs?
• We could go out and measure blood pressure in
every US post-doc (thousands).
• Or, we could take a sample and make inferences
about the truth from our sample.

Using what we observe, 1. We can test an a priori
guess (hypothesis testing). 2. We can estimate
the true value (confidence intervals).
18
Statistical Inference is based on Sampling
Variability
• Sample Statistic: we summarize a sample into one
number; e.g., it could be a mean, a difference in
means or proportions, an odds ratio, or a
correlation coefficient
• E.g.: average blood pressure of a sample of 50
American men
• E.g.: the difference in average blood pressure
between a sample of 50 men and a sample of 50
women
• Sampling Variability: If we could repeat an
experiment many, many times on different samples
with the same number of subjects, the resultant
sample statistic would not always be the same
(because of chance!).
• Standard Error: a measure of the sampling
variability

19
Examples of Sample Statistics
• Single population mean
• Difference in means (t-test)
• Difference in proportions (Z-test)
• Odds ratio/risk ratio
• Correlation coefficient
• Regression coefficient

20
Variability of a sample mean
Random Postdocs
The Truth (not knowable)
21
Variability of a sample mean
Random samples of 5 post-docs
The Truth (not knowable)
22
Variability of a sample mean
Samples of 50 Postdocs
The Truth (not knowable)
129 mmHg
134 mmHg
131 mmHg
130 mmHg
128 mmHg
130 mmHg
23
Variability of a sample mean
Samples of 150 Postdocs
The Truth (not knowable)
131.2 mmHg
130.2 mmHg
129.7 mmHg
130.9 mmHg
130.4 mmHg
129.5 mmHg
24
How sample means vary A computer experiment
• 1. Pick any probability distribution and specify
a mean and standard deviation.
• 2. Tell the computer to randomly generate 1000
observations from that probability distribution
• E.g., the computer is more likely to spit out
values with high probabilities
• 3. Plot the observed values in a histogram.
• 4. Next, tell the computer to randomly generate
1000 averages-of-2 (randomly pick 2 and take
their average) from that probability
distribution. Plot observed averages in
histograms.
• 5. Repeat for averages-of-5, and averages-of-100.
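The steps above can be sketched in a few lines of Python (plotting omitted; this just shows how the spread of the averages shrinks as n grows, using a Uniform(0,1) distribution as in the next slides):

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def sample_means(n_per_average, n_averages=1000):
    """Draw n_averages means, each an average of n_per_average Uniform(0,1) draws."""
    return [statistics.mean(random.random() for _ in range(n_per_average))
            for _ in range(n_averages)]

for n in (1, 2, 5, 100):
    means = sample_means(n)
    # The center stays near 0.5; the spread shrinks roughly like sigma/sqrt(n)
    print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 3))
```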

25
Uniform on [0,1]: average of 1 (original
distribution)
26
Uniform 1000 averages of 2
27
Uniform 1000 averages of 5
28
Uniform 1000 averages of 100
29
Exp(1) average of 1(original distribution)
30
Exp(1) 1000 averages of 2
31
Exp(1) 1000 averages of 5
32
Exp(1) 1000 averages of 100
33
Bin(40, .05) average of 1(original
distribution)
34
Bin(40, .05) 1000 averages of 2
35
Bin(40, .05) 1000 averages of 5
36
Bin(40, .05) 1000 averages of 100
37
The Central Limit Theorem
• If all possible random samples, each of size n,
are taken from any population with a mean ? and a
standard deviation ?, the sampling distribution
of the sample means (averages) will

3. be approximately normally distributed
regardless of the shape of the parent population
(normality improves with larger n)
38
Example 1: Weights of doctors
• Experimental question: Are practicing doctors
setting a good example for their patients in
their weights?
• Experiment: Take a sample of practicing doctors
and measure their weights.
• Sample statistic: mean weight for the sample
• IF weight is normally distributed in doctors
with a mean of 150 lbs and standard deviation of
15, how much would you expect the sample average
to vary if you could repeat the experiment over
and over?
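The expected variability of the sample average is the standard error, σ/√n. A quick check in Python, assuming the slide's σ = 15 lbs:

```python
import math

sigma = 15.0  # assumed SD of doctors' weights (lbs), from the slide
for n in (2, 10, 100):
    se = sigma / math.sqrt(n)  # standard error of the mean
    print(n, round(se, 2))
# n=10 gives SE = 4.74 and n=100 gives SE = 1.5, matching the
# sampling distributions shown on the later slides
```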

39
Relative frequency of 1000 observations of weight:
mean = 150 lbs, standard deviation = 15 lbs
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Using Sampling Variability
• In reality, we only get to take one sample!!
• But, since we have an idea about how sampling
variability works, we can make inferences about
the truth based on one sample.

44
Experimental results
• Let's say we take one sample of 100 doctors and
calculate their average weight.

45
Expected Sampling Variability for n=100 if the
true weight is 150 (and SD=15)
46
Expected Sampling Variability for n=100 if the
true weight is 150 (and SD=15)
47
P-value associated with this experiment
P-value (the probability of our sample average
being 160 lbs or more IF the true average weight
is 150) < .0001. This gives us evidence that 150 isn't
a good guess.
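As a check, the z-statistic and one-sided p-value can be computed directly from the numbers above:

```python
from statistics import NormalDist

mu0, sd, n = 150, 15, 100    # null mean, assumed SD, sample size (from the slides)
se = sd / n ** 0.5           # standard error = 1.5
z = (160 - mu0) / se         # observed sample mean of 160 lbs
p = 1 - NormalDist().cdf(z)  # one-sided p-value

print(round(z, 2), p)  # z = 6.67; p is far below .0001
```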
48
The P-value
• P-value is the probability that we would have
seen our data (or something more unexpected) just
by chance if the null hypothesis (null value) is
true.
• Small p-values mean the null value is unlikely
given our data.

49
The P-value
• By convention, p-values of <.05 are often
accepted as statistically significant in the
medical literature, but this is an arbitrary
cut-off.
• A cut-off of p<.05 means that in about 5 of 100
experiments, a result would appear significant
just by chance (Type I error).

50
Hypothesis Testing
• The Steps:
• 1. Define your hypotheses (null, alternative)
• The null hypothesis is the straw man that we
are trying to shoot down.
• Null here: mean weight of doctors = 150 lbs
• Alternative here: mean weight > 150 lbs
(one-sided)
• 2. Specify your sampling distribution (under the
null)
• If we repeated this experiment many, many times,
the sample average weights would be normally
distributed around 150 lbs with a standard error
of 1.5.
• 3. Do a single experiment (observed sample mean
= 160 lbs)
• 4. Calculate the p-value of what you observed
(p < .0001)
• 5. Reject or fail to reject the null hypothesis
(reject)

51
Errors in Hypothesis Testing
52
Errors in Hypothesis Testing
• Type-I Error (false positive)
• Concluding that the observed effect is real when
it's just due to chance.
• Type-II Error (false negative)
• Missing a real effect.
• POWER (the complement of type-II error)
• The probability of seeing a real effect (of
rejecting the null if the null is false).

53
Beyond Hypothesis Testing: Estimation (confidence
intervals)
We'd estimate based on these data that the
average weight is somewhere closer to 160 lbs.
And we could state the precision of this estimate
(a confidence interval).
54
Confidence Intervals
• (Sample statistic) ± (measure of how confident
we want to be) × (standard error)

55
• 95% CI for the mean:
• 160 ± 1.96(1.5) = (157, 163)
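The interval can be checked in a couple of lines of Python:

```python
from statistics import NormalDist

xbar, se = 160, 1.5                  # sample mean and standard error (from the slides)
z = NormalDist().inv_cdf(0.975)      # 1.96: the multiplier for a 95% CI
lo, hi = xbar - z * se, xbar + z * se

print(round(lo), round(hi))  # 157 163
```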

56
What Confidence Intervals do
• They indicate the un/certainty about the size
of a population characteristic or effect. Wider
CIs indicate less certainty.
• Confidence intervals can also answer the
question of whether or not an association exists
or a treatment is beneficial or harmful
(analogous to p-values).
• E.g., since the 95% CI of the mean weight does
not cross 150 lbs (the null value), we
reject the null at p<.05.

57
Expected Sampling Variability for n=2
58
Expected Sampling Variability for n=2
59

Expected Sampling Variability for n=10
60
Statistical Power
• We found the same sample mean (160 lbs) in our
100-doctor sample, 10-doctor sample, and 2-doctor
sample.
• But we only rejected the null based on the
100-doctor and 10-doctor samples.
• Larger samples give us more statistical power

61
Can we quantify how much power we have for given
sample sizes?
62
(No Transcript)
63
(No Transcript)
64
Null Distribution: mean=150, sd=4.74
Clinically relevant alternative: mean=160,
sd=4.74
65
(No Transcript)
66
Null Distribution: mean=150, sd=1.37
Nearly 100% power!
Clinically relevant alternative: mean=160,
sd=1.37
67
Factors Affecting Power
• 1. Size of the difference (10 pounds higher)
• 2. Standard deviation of the characteristic
(sd=15)
• 3. Bigger sample size
• 4. Significance level desired

68
1. Bigger difference from the null mean
69
2. Bigger standard deviation
70
3. Bigger Sample Size
71
4. Higher significance level
72
Examples of Sample Statistics
• Single population mean
• Difference in means (t-test)
• Difference in proportions (Z-test)
• Odds ratio/risk ratio
• Correlation coefficient
• Regression coefficient

73
Example 2: Difference in means
• Example: Rosenthal, R. and Jacobson, L. (1966).
Teachers' expectancies: Determinants of pupils'
I.Q. gains. Psychological Reports, 19, 115-118.

74
The Experiment (note: exact numbers have been
altered)
• Grade 3 at Oak School were given an IQ test at
the beginning of the academic year (n=90).
• Classroom teachers were given a list of names of
students in their classes who had supposedly
scored in the top 20 percent; these students were
expected to bloom academically.
• BUT the children on the teachers' lists had
actually been randomly assigned to the list.
• At the end of the year, the same I.Q. test was
re-administered.
75
The results
• Children who had been randomly assigned to the
top-20-percent list had a mean I.Q. increase of
12.2 points (sd=2.0), while children in the control
group had an increase of only 8.2 points (sd=2.0).
• Is this a statistically significant difference?
Give a confidence interval for this difference.

76
Difference in means
• Sample statistic: Difference in mean change in IQ
test score.
• Null hypothesis: no difference between academic
bloomers and normal students

77
Explore sampling distribution of difference in
means
• Simulate 1000 differences in mean IQ change under
the null hypothesis (both academic bloomers and
controls improve by, let's say, 8 points, with a
standard deviation of 2.0)
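A sketch of this simulation in Python, assuming group sizes of 18 bloomers and 72 controls (inferred from the top-20-percent split of 90 students; the slides do not state them explicitly):

```python
import random
import statistics

random.seed(1)
n_bloomers, n_controls = 18, 72  # assumed split: top 20% of 90 students vs the rest

def one_null_experiment():
    # Under the null, both groups' IQ changes come from the same N(8, 2) distribution
    bloomers = [random.gauss(8, 2) for _ in range(n_bloomers)]
    controls = [random.gauss(8, 2) for _ in range(n_controls)]
    return statistics.mean(bloomers) - statistics.mean(controls)

diffs = [one_null_experiment() for _ in range(1000)]
# Spread of the null differences: close to sqrt(2**2/18 + 2**2/72), about 0.53
print(round(statistics.stdev(diffs), 2))
```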

78
79
normal students
80
Notice that most experiments yielded a difference
value between -1.1 and 1.1 (wider than the above
sampling distributions!)
81
• 95% CI for the difference: 4.0 ± 1.99(.52) = (3.0,
5.0)

Does not cross 0; therefore, significant at .05.
82
95% confidence interval for the observed
difference: 4 ± 2(.52) ≈ 3-5
83
Clearly lots of power to detect a difference of 4!
84
• How much power to detect a difference of 1.0?

85
Power closer to 50% now.
86
Example 3: Difference in proportions
• Experimental question: Do men tend to prefer Bush
more than women?
• Experimental design: Poll representative samples
of men and women in the U.S. and ask them the
question: do you plan to vote for Bush in
November, yes or no?
• Sample statistic: The difference in the
proportion of men who are pro-Bush versus women
who are pro-Bush.
• Null hypothesis: the difference in proportions
= 0
• Observed results: women = .36, men = .46

87
Explore sampling distribution of difference in
proportions
• Simulate 1000 differences in proportion
preferring Bush under the null hypothesis (41%
overall prefer Bush, with no difference between
genders)
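A sketch of this simulation, assuming 50 respondents per group (the slide does not state the group size; 50 is chosen because it reproduces roughly the .27 to .55 spread shown on the next slides):

```python
import random
import statistics

random.seed(2)
p_null = 0.41     # overall proportion preferring Bush under the null
n_per_group = 50  # assumed group size, not stated on the slide

def null_diff():
    # Each respondent independently prefers Bush with probability p_null
    men = sum(random.random() < p_null for _ in range(n_per_group)) / n_per_group
    women = sum(random.random() < p_null for _ in range(n_per_group)) / n_per_group
    return men - women

diffs = [null_diff() for _ in range(1000)]
# SE of the difference is about sqrt(2 * .41 * .59 / 50), roughly 0.098
print(round(statistics.stdev(diffs), 2))
```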

88
men
89
women
Under the null hypothesis, most experiments
yielded a mean between .27 and .55
90
Difference men-women
Under the null hypothesis, most experiments
yielded difference values between -.20 (women
preferring Bush more than men) and .20 (men
preferring Bush more)
91
• What if we had 200 men and 200 women?

92
men
Most of 1000 simulated experiments yielded a mean
between .34 and .48
93
women
Most of 1000 simulated experiments yielded a mean
between .34 and .48
94
Difference men-women
Notice that most experiments will yield a
difference value between -.10 (women preferring
Bush more than men) and .10 (men preferring Bush
more)
95
• What if we had 800 men and 800 women?

96
men
Most experiments will yield a mean between .38
and .44
97
women
Most experiments will yield a mean between .38
and .44
98
Difference men-women
Notice that most experiments will yield a
difference value between -.05 (women preferring
Bush more than men) and .05 (men preferring Bush
more)
99
If we sampled 1600 per group, a 2.5% difference
would be statistically significant at a
significance level of .05. If we sampled 3200
per group, a 1.25% difference would be
statistically significant at a significance
level of .05. If we sampled 6400 per group, a
.625% difference would be statistically
significant at a significance level of
.05. BUT if we found a significant difference
of 1% between men and women, would we care if we
were Bush or Kerry?
100
Limits of hypothesis testing: Statistical vs.
Clinical Significance
Consider a hypothetical trial comparing death
rates in 12,000 patients with multi-organ failure
receiving a new inotrope, with 12,000 patients
receiving usual care. If there was a 1%
reduction in mortality in the treatment group
(49% deaths versus 50% in the usual care group),
this would be statistically significant (p<.05)
because of the large sample size. However, such
a small difference in death rates may not be
clinically important.
101
Example 4: The odds ratio
• Experimental question: Does smoking increase
fracture risk?
• Experiment: Ask 50 patients with fractures and 50
controls if they ever smoked.
• Sample statistic: Odds Ratio (a measure of relative
risk)
• Null hypothesis: There is no association between
smoking and fractures (odds ratio = 1.0).

102
The Odds Ratio (OR)
103
Example 4: Sampling Variability of the null Odds
Ratio (OR) (50 cases/50 controls/20 exposed)
If the Odds Ratio = 1.0, then with 50 cases and 50
controls, of whom 20 smoke, this is the expected
variability of the sample OR. Note the right skew.
104
The Sampling Variability of the natural log of
the OR (lnOR) is more Gaussian
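This sampling experiment can be sketched in Python (the 0.5 added to each cell is a common small-sample correction to avoid dividing by zero; it is an added convenience, not something from the slides):

```python
import math
import random
import statistics

random.seed(3)
p_smoke = 0.20            # exposure probability under the null (OR = 1)
n_cases = n_controls = 50

def one_log_or():
    a = sum(random.random() < p_smoke for _ in range(n_cases))     # exposed cases
    c = sum(random.random() < p_smoke for _ in range(n_controls))  # exposed controls
    b, d = n_cases - a, n_controls - c                             # unexposed
    # OR = (a*d)/(b*c); 0.5 added to each cell to avoid division by zero
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

log_ors = [one_log_or() for _ in range(1000)]
ors = [math.exp(x) for x in log_ors]
# The ORs are right-skewed around 1; the log-ORs are roughly symmetric around 0
print(round(statistics.median(ors), 2), round(statistics.mean(log_ors), 2))
```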
105
Statistical Power
• Statistical power here is the probability of
concluding that there is an association between
exposure and disease if an association truly
exists.
• The stronger the association, the more likely we
are to pick it up in our study.
• The more people we sample, the more likely we are
to conclude that there is an association if one
exists (because the sampling variability is
reduced).

106
Part II. Biostatistics in Practice: Applying
statistics to clinical research design
107
From concept to protocol
• Define your primary predictor and outcome
variables
• Decide on study type (cross-sectional,
case-control, cohort, RCT)
• Decide how you will measure your predictor and
outcome variables, balancing statistical power,
ease of measurement, and potential biases
• Decide on the main statistical tests that will be
used in analysis
• Calculate sample size needs for your chosen
statistical test/s
• Write a statistical analysis plan
• Briefly describe the descriptive statistics that you
plan to present
• Describe which statistical tests you will use to
test your primary and secondary hypotheses
• Describe how you will account for confounders and
test for interactions
• Describe any exploratory analyses that you might
perform

108
Powering a study: What is the primary hypothesis?
• Before you can calculate sample size, you need to
know the primary statistical analysis that you
will use in the end.
• What is your main outcome of interest?
• What is your main predictor of interest?
• Which statistical test will you use to test for
associations between your outcome and your
predictor?
• Do you need to adjust sample size needs upwards
to account for loss to follow-up, switching arms
of a randomized trial, accounting for
confounders?
• Seek guidance from a statistician

109
Overview of statistical tests
• The following table gives the appropriate choice
of a statistical test or measure of association
for various types of data (outcome variables and
predictor variables) by study design.

e.g., blood pressure, pounds, age; treatment
(1/0)
110
(No Transcript)
111
Comparing Groups
• T-test: compares two means
• (null hypothesis: difference in means = 0)
• ANOVA: compares means between >2 groups
• (null hypothesis: difference in means = 0)
• Non-parametric tests: used when normality
assumptions are not met
• (null hypothesis: difference in medians = 0)
• Chi-square test: compares proportions between
groups
• (null hypothesis: the categorical variables are
independent)

112
Simple sample size formulas/calculators available
• Sample size for a difference in means
• Sample size for a difference in proportions
• Can roughly be used if you plan to calculate risk
ratios, odds ratios, or to run logistic
regression or chi-square tests
• Sample size for a hazard ratio/log-rank test
• If you plan to do survival analysis: Kaplan-Meier
methods (log-rank test), Cox regression

113
(No Transcript)
114
The pay-off for sitting through the theoretical
part of these lectures!
• Here's where it pays to understand what's behind
sample size/power calculations!
• You'll have a much easier time using sample size
calculators if you aren't just putting numbers
into a black box!

115
(No Transcript)
116
(No Transcript)
117
(No Transcript)
118
(No Transcript)
119
If this looks complicated, don't panic!
• In reality, you're unlikely to have to derive
sample size formulas yourself
• But it's critical to understand where they come
from if you're going to apply them yourself.

120
Formula for difference in means
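(The formula on this slide was an image.) A common textbook version of the two-sample formula is sketched below in Python; which exact variant the slide showed is an assumption:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (standard approximation): n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return math.ceil(2 * sigma**2 * (z_alpha + z_power)**2 / delta**2)

# The doctors example: detect a 10-lb difference when SD = 15
print(n_per_group(delta=10, sigma=15))  # 36 per group
```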
121
Formula for difference in proportions
122
Formula for hazard ratio/log-rank test
123
Recommended sample size calculators!
• http://hedwig.mgh.harvard.edu/sample_size/size.html
• http://vancouver.stanford.edu:8080/clio/index.html
• Traverse protocol wizard

124
These sample size calculations are idealized
• We have not accounted for losses to follow-up
• We have not accounted for non-compliance (for
intervention trial or RCT)
• We have assumed that individuals are independent
observations (not true in clustered designs)
• Consult a statistician for these considerations!

125
Applying statistics to clinical research design
Example
• You want to study the relationship between
smoking and fractures.

126
Steps
• Define your primary predictor and outcome
variables
• Decide on study type

127
Applying statistics to clinical research design
Example
• Predictor: smoking (yes/no or continuous)
• Outcome: osteoporotic fracture (time-to-event)
• Study design: cohort

128
From concept to protocol
• Decide how you will measure your predictor and
outcome variables
• Decide on the main statistical tests that will
be used in analysis
• Calculate sample size needs for your chosen
statistical test/s

129
(No Transcript)
130
Formula for hazard ratio/log-rank test
131
Example sample size calculation
• Ratio of exposed to unexposed in your sample?
• 1:1
• Proportion of non-smokers who will fracture in the
period?
• 10%
• What is a clinically meaningful hazard ratio?
• 2.0
• Based on the hazard ratio, what proportion of smokers
will fracture?
• 1 - .90² = .19 (19%)
• What power are you targeting?
• 80%
• What significance level?
• .05

132
Formula for hazard ratio/log-rank test
You may want to adjust upwards for loss to
follow-up. E.g., if you expect to lose 10%,
divide the above estimate by .90.
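(The formula on this slide was also an image.) One common approximation is Schoenfeld's formula for a 1:1 log-rank comparison, sketched below; treat the exact variant as an assumption about what the lecture used:

```python
import math
from statistics import NormalDist

def n_per_group_logrank(hr, p_event_1, p_event_2, alpha=0.05, power=0.80):
    """Schoenfeld-style approximation for a 1:1 log-rank test:
    required events d = 4 * (z_{1-alpha/2} + z_{power})^2 / (ln HR)^2,
    converted to subjects via the average event probability."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    d = 4 * z**2 / math.log(hr)**2       # total events needed
    p_avg = (p_event_1 + p_event_2) / 2  # average probability of an event
    return math.ceil(d / p_avg / 2)      # subjects per group

# Slide's inputs: HR = 2.0, 10% of non-smokers and 19% of smokers fracture
n = n_per_group_logrank(2.0, 0.10, 0.19)
print(n, math.ceil(n / 0.90))  # per-group n, then inflated for 10% loss to follow-up
```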
133
From concept to protocol
• Write a statistical analysis plan

134
(No Transcript)
135
Statistical analysis plan
• Descriptive statistics
• E.g., of study population by smoking status
• Kaplan-Meier Curves (univariate)
• Describe exploratory analyses that may be used to
identify confounders and other predictors of
fracture
• Cox regression (multivariate)
• What confounders have you measured, and how will
you incorporate them into multivariate analysis?
• How will you explore for possible interactions?
• Describe potential exploratory analysis for other
predictors of fracture