Introduction to Clinical Investigation - PowerPoint PPT Presentation

Slides: 81

Provided by: JimPa3

Transcript and Presenter's Notes

Title: Introduction to Clinical Investigation

1
Introduction to Clinical Investigation
Analyzing the Data: Applied Biostatistics
November 2, 2004
James Patrie, MS, Senior Biostatistician
Department of Health Evaluation Sciences
University of Virginia Health Science Center
Charlottesville, Virginia
2
Presentation Outline
Case Study
Data Description
Data Analysis
Data Interpretation
3
Case Study

Data were collected from the websites of four nonrandomly selected marathons that were held in the United States during the fall of 2002.
The marathon times from the New York Marathon, the Chicago Marathon, the Twin Cities Marathon, and the Philadelphia Marathon will be analyzed.

4
Sample Populations
The sample populations will consist of those marathon runners who competed in either the 2002 New York Marathon, the 2002 Chicago Marathon, the 2002 Twin Cities Marathon, or the 2002 Philadelphia Marathon.
As the inclusion criteria, the runner must have been at least 18 years of age at the time the marathon was held, and the runner must have completed the marathon in a time of no more than 6 hours.
5
Four Sample Populations
New York Marathon (31184): 990 excluded, 30194 included
Chicago Marathon (31120): 866 excluded, 30254 included
Philadelphia Marathon (4565): 88 excluded, 4477 included
Twin Cities Marathon (6641): 41 excluded, 6600 included
Total included: 71525
6
Available Data
7
Additional Marathon Information
8
Study Populations
The study populations will consist of those marathon runners 18 years of age or older who compete in either the New York Marathon, the Chicago Marathon, the Twin Cities Marathon, or the Philadelphia Marathon, and who complete the marathon in a time of no more than 6 hours.
9
Questions of Interest

Is the average marathon completion time similar for the four different marathon study populations?
By how many minutes would we estimate the average marathon completion time to differ between the four study populations?
10
Some Definitions
A sample is a collection of observations taken from a subset of the study population (data set).
An element is an observational unit within the sample (runner).
A variable is an attribute that varies from one element of the sample to the next (marathon time).
A parameter is a characteristic of the study population (mean marathon completion time in the study population).
11
Additional Definitions
A statistic is a characteristic of the sample (mean marathon completion time in the sample population).
An empirical frequency distribution is a listing of the values or the range of values of the variable together with the frequencies with which the values or range of values occur in the sample.
A relative empirical frequency distribution is the empirical frequency distribution divided by the sample size.
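As a small illustration of the last two definitions, the sketch below tabulates an empirical frequency distribution and its relative version. The sample values and the helper name `frequency_distribution` are hypothetical and are not taken from the marathon data.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (counts, relative) dictionaries for a sample of values."""
    counts = Counter(values)
    n = len(values)
    # Relative frequency = frequency divided by the sample size.
    relative = {value: count / n for value, count in counts.items()}
    return dict(counts), relative


# Hypothetical sample of runner age classes (not the marathon data).
sample = ["18-34", "35-39", "18-34", "40-44", "18-34", "35-39", "45-49", "18-34"]
counts, relative = frequency_distribution(sample)
for age_class in sorted(counts):
    print(age_class, counts[age_class], f"{relative[age_class]:.3f}")
```

The relative frequencies always sum to 1, which makes samples of different sizes comparable.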
13
Goal of Data Analysis
Once a sample of data is collected from a defined study population, the goal of data analysis is to use the information in the sample to make valid statements about the study population.
This is accomplished by reducing the sample of data to a small number of summary measures that we call statistics.
Together, these summary measures retain sufficient information to allow characteristics of the study population to be estimated.
14
Steps in the Statistical Analysis Process

There are three basic steps in the statistical analysis process.
I. Data description.
II. Data analysis.
III. Data interpretation.
15
Step I Data Description
The goal of data description is to describe the empirical frequency distribution of the variable.
The manner of the description will depend on the class of the variable.
There are two distinct classes of variables: qualitative variables and quantitative variables.
16
Qualitative Variables
Qualitative (categorical) variables are intrinsically non-numeric. There are two types of qualitative variables: nominal variables and ordinal variables.
Nominal variables, such as gender (female, male) and marathon site (New York, Chicago, Twin Cities, and Philadelphia), have categories that have no natural order of rank.
Ordinal variables, such as the runner's age class (18-34, 35-39, 40-44, 45-49, 50-54, and 55 or older), have categories that have a natural order of rank.
17
Qualitative Variable Description
The empirical frequency distribution of a nominal or an ordinal variable is usually summarized as a list of:
Frequencies (counts).
Proportions.
Percentages.
The list can be displayed in tabular form or graphically as a barplot.
22
Quantitative Variables
Quantitative variables are intrinsically numeric. There are two distinct types of quantitative variables: discrete variables and continuous variables.
Discrete variables, such as the number of prior marathons that the runner had completed (count: 0, 1, 2, ...), take on a limited number of unique values.
Continuous variables, such as the runner's marathon finishing time, take on a large or an infinite number of unique values.
23
Quantitative Variable Description
The empirical distribution of a discrete or a continuous variable is generally described by one of the following types of statistics.
Statistics derived from the sample moments of the empirical frequency distribution.
Statistics derived from the percentiles of the empirical frequency distribution.
24
Statistics Derived from Sample Moments
The arithmetic mean is a statistic that is derived from the first sample moment of the empirical frequency distribution.
The arithmetic mean is computed by the following formula, where y_1, ..., y_n are the n observed values in the sample.

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

We can use the arithmetic mean of the empirical frequency distribution to estimate the mean value of the variable in the study population.

25
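The standard deviation is a statistic that is derived from the second sample moment of the empirical frequency distribution.
The standard deviation is computed by the following formula (shown with the usual n-1 divisor), where \bar{y} is the arithmetic mean of the n observed values y_1, ..., y_n.

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}

We can use the standard deviation of the empirical distribution to estimate the variation among the individual observations in the study population.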
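A minimal sketch of the two moment-based statistics follows. It assumes the n-1 divisor for the standard deviation, and the finishing times are made-up values in hours, not the marathon data.

```python
import math
import statistics


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n-1 divisor (an assumption)."""
    mean = sample_mean(values)
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_of_squares / (len(values) - 1))


# Hypothetical finishing times in hours.
times_h = [3.5, 4.0, 4.5, 5.0, 3.0]
mean_h = sample_mean(times_h)
sd_h = sample_sd(times_h)
print(f"mean = {mean_h:.2f} h, sd = {sd_h:.3f} h")

# Cross-check against the standard library implementations.
assert math.isclose(mean_h, statistics.mean(times_h))
assert math.isclose(sd_h, statistics.stdev(times_h))
```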
27
Statistics Derived from Sample Percentiles
The Pth percentile of a sample of n observations is the value of the variable that has ordered rank (P/100)(1 + n).
As an example, for P = 20 and n = 100, (20/100)(1 + 100) = 20.2. So the 20th smallest value of the variable is taken as the value at the 20th percentile.
The 50th percentile of a sample of n observations is referred to as the sample median.
We can use the median of the empirical frequency distribution to estimate the median value of the variable in the study population.
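The rank rule (P/100)(1 + n) can be sketched as follows. The slide simply takes the 20th smallest value for a rank of 20.2; this sketch instead resolves fractional ranks by linear interpolation between the two neighbouring ordered values, which is an assumption on my part. The function name `percentile` and the sample are hypothetical.

```python
def percentile(values, p):
    """Pth percentile using the ordered rank (P/100)(1 + n).

    Fractional ranks are resolved by linear interpolation (an assumption).
    Ranks that fall outside 1..n are clamped to the smallest or largest value.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank = (p / 100) * (1 + n)
    rank = min(max(rank, 1), n)
    lower = int(rank)        # whole part of the 1-based rank
    frac = rank - lower      # fractional part of the rank
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Hypothetical sample: the values 1, 2, ..., 100.
sample = list(range(1, 101))
p20 = percentile(sample, 20)     # rank (20/100)(1 + 100) = 20.2
median = percentile(sample, 50)  # rank 50.5
print(f"20th percentile = {p20:.1f}, median = {median:.1f}")
```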
28
The 25th percentile of a sample of n observations is referred to as the lower quartile, and the 75th percentile of a sample of n observations is referred to as the upper quartile.
The difference between the upper quartile value and the lower quartile value is referred to as the interquartile range of the empirical frequency distribution.
We can use the interquartile range of the empirical frequency distribution to estimate the interquartile range of the distribution of the values of the variable in the study population.
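As an illustration, the quartiles and the interquartile range can be computed with the Python standard library. The `"exclusive"` method of `statistics.quantiles` places the quartiles at ranks 0.25(n + 1), 0.50(n + 1), and 0.75(n + 1) with linear interpolation; the finishing times below are hypothetical.

```python
import statistics

# Hypothetical finishing times in hours (n = 9), not the marathon data.
times_h = [3.1, 3.4, 3.8, 4.0, 4.2, 4.5, 4.9, 5.3, 5.6]

# n=4 asks for the three cut points that split the data into quarters.
lower_q, median, upper_q = statistics.quantiles(times_h, n=4, method="exclusive")
iqr = upper_q - lower_q
print(f"lower quartile = {lower_q:.2f}, median = {median:.2f}, "
      f"upper quartile = {upper_q:.2f}, IQR = {iqr:.2f}")
```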

40
Choosing Between Statistics
Robustness: The robustness of a statistic is related to the statistic's resistance to being affected by extreme values. The arithmetic mean is a non-robust statistic, while the median is a robust statistic. If the empirical distribution is skewed or extreme values are present, the median will provide a better measure of central location than the arithmetic mean.
Summarizing Capability: The arithmetic mean is a more appropriate statistic if the data can be described by a particular mathematical model such as the Normal (Gaussian) distribution.
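To make the robustness point concrete, the sketch below adds one extreme value to a small made-up sample and compares how far the mean and the median move. The numbers are hypothetical; the 12.0 stands in for something like a recording error.

```python
import statistics

# Hypothetical finishing times in hours.
times_h = [3.2, 3.5, 3.8, 4.0, 4.3]
with_outlier = times_h + [12.0]  # one extreme value, e.g. a recording error

mean_shift = statistics.mean(with_outlier) - statistics.mean(times_h)
median_shift = statistics.median(with_outlier) - statistics.median(times_h)

print(f"mean moves by {mean_shift:.2f} h, median moves by {median_shift:.2f} h")
```

The mean is pulled strongly toward the extreme value, while the median barely changes.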
41
Case Study
Descriptive summary for the gender demographics.
42
Case Study
Descriptive summary for the age demographics.
43
Case Study
Descriptive summary for marathon completion time.
44
Step II Data Analysis
There are two distinct types of data analysis methods.
Nonparametric methods.
Parametric methods.

45
Nonparametric Methods
Nonparametric methods are typically utilized when the form of the distribution of the values of the variable within the study population is assumed unknown or thought not to be easily characterized by a few parameters such as the mean and variance.
Nonparametric procedures are generally used when the sample size is too small to make a reliable decision about the true shape of the distribution of the values of the variable within the study population.
46
Some Non-Parametric Methods
Wilcoxon Rank Sum Test.
Kruskal-Wallis Test.
Spearman's Rank Correlation.
Fisher's Exact Test.
47
Parametric Methods
Parametric methods are typically utilized when the form of the distribution of the values of the variable within the study population is assumed known and easily characterized by a few parameters.
Parametric methods are generally used when the sample size is large enough to make a reliable decision about the true shape of the distribution of the values of the variable within the study population.
48
Some Parametric Methods
Student's t-Test.
Analysis of Variance.
Linear Regression.
Logistic Regression.
49
Case Study
Since the sample size is extremely large (n = 71525), there is sound theoretical justification (the central limit theorem) to assume that the 4 sample means will be approximately normally distributed.
Therefore, a parametric data analysis method that utilizes the population mean as the parameter of comparison would be well suited to address the study objectives.
50
Parametric Data Analysis
For this study, a parametric data analysis will involve the following four steps.
Model formulation.
Hypothesis testing.
Mean separation.
Confidence interval construction.
51
Model Formulation
Analysis of variance (ANOVA) is a linear model in which the response variable is a continuous variable and all of the predictor variables are categorical.
Each categorical variable (marathon) is referred to as a factor, and each category within a factor is referred to as a level of the factor (New York).
The ANOVA model estimates a study population mean for each level of the factor and provides a global test for equal means across all levels of the factor.
52
One-Way ANOVA Linear Model
y_{ij} = \mu + \tau_j + \varepsilon_{ij}
53
Model Assumptions
[Figure: group effects τ1, τ2, τ3, τ4 and group means μ1, μ2, μ3, μ4 plotted around the overall mean μ.]
54
Case Study
Check of the ANOVA equal variance assumption.
55
Hypothesis Testing
The hypothesis testing procedure involves the following three steps.
Based on the study objective, we formulate a null hypothesis (Ho), which generally we wish to reject.
We specify the probability (type I error rate) of making an incorrect decision to reject the null hypothesis (Ho) when we should not.
We select a statistical test that will allow us to test the null hypothesis.
56
Case Study
The study objective is to determine whether the marathon completion times are similar across the four marathon study populations.
As the null hypothesis, we state that the mean marathon completion time is equal for the four study populations. Ho: μ1 = μ2 = μ3 = μ4.
As the alternative hypothesis, we state that the mean marathon completion time is not equal for at least two of the study populations. Ha: μj ≠ μj'.
57
Hypotheses
Ho: μ1 = μ2 = μ3 = μ4
Ha: μj ≠ μj' for at least one pair j ≠ j'
[Figure: group means μ1, μ2, μ3, μ4 shown relative to the overall mean μ.]
58
Type I Error Rate
The type I error rate (α) of a statistical test is the probability that the test leads to falsely rejecting the null hypothesis (Ho) when Ho is true.
The type I error rate should always be set at the study design stage.
Traditionally, α is set at either the 0.05 or the 0.01 level.
59
Test Statistic
If the assumptions of the ANOVA model are valid, we can formulate a test statistic F that is a function of the ratio of two chi-square random variables.
Under the null hypothesis μ1 = μ2 = ... = μt, the F-statistic has a known probability distribution: F with t-1 and N-t degrees of freedom.
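A minimal sketch of the one-way ANOVA F-statistic, written out by hand, may help here. It assumes t is the number of groups and N the total number of observations; the function name `one_way_anova_f` and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    t = len(groups)                          # number of groups (factor levels)
    n_total = sum(len(g) for g in groups)    # N, the total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between = t - 1
    df_within = n_total - t
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on {df_between} and {df_within} degrees of freedom")

# Degrees of freedom for the case study: t = 4 marathons, N = 71525 runners.
case_study_df = (4 - 1, 71525 - 4)
```

For the case study the same rule gives 3 and 71521 degrees of freedom.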
60
Null F(3 ,71521) Probability Distribution

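[Figure: density of the F(3, 71521) distribution over F-values from 0.0 to 5.0. The critical value Fc = F(1 - 0.05; 3, 71521) = 2.605 separates the acceptance region (probability 1 - 0.05) from the rejection region (probability 0.05).]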
61
Case Study One-Way ANOVA
ANOVA summary for marathon completion time.
The P-value is interpreted as the probability, under the null hypothesis μ1 = μ2 = μ3 = μ4, of observing an F-statistic more extreme than the one observed.
62
Mean Separation
If the null hypothesis that μ1 = μ2 = ... = μt is rejected, then it implies that the mean response differs between at least two of the groups.
The process of determining which group means differ is referred to as mean separation.
The objective of mean separation is to determine which pairs of means differ while still maintaining an overall type I error rate ≤ α.
63
Steps In Mean Separation
The process of mean separation includes the following three steps.
Selecting an appropriate multiple comparison test.
Selecting a multiple comparison type I error rate adjustment criterion.
Selecting the appropriate critical value of the test.
64
Multiple Comparison Test
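The multiple comparison test procedure is carried out by performing the equivalent of multiple Student t-tests, where the test statistic is defined as

t = \frac{\bar{y}_j - \bar{y}_{j'}}{\sqrt{MSE \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)}}

where \bar{y}_j and n_j are the sample mean and sample size of group j, and MSE is the ANOVA mean squared error.
Under the null hypothesis μj = μj', the test statistic t has a known probability distribution: t with N-t degrees of freedom.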
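The sketch below computes one such pairwise t-statistic. It assumes the standard error is built from the pooled within-group (ANOVA error) variance, which is consistent with the N-t degrees of freedom stated above; the function name `pairwise_t` and the data are hypothetical.

```python
import math


def pairwise_t(groups, j, k):
    """t-statistic comparing the means of groups j and k.

    The standard error uses the mean squared error pooled over all groups
    (an assumption), so the statistic has N - t degrees of freedom.
    """
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df = n_total - t
    mse = ss_within / df
    se = math.sqrt(mse * (1 / len(groups[j]) + 1 / len(groups[k])))
    return (means[j] - means[k]) / se, df


# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
t_stat, df = pairwise_t(groups, 0, 2)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```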
65
Multiple Comparison Adjustment

To maintain a pre-specified type I error rate (α) over interrelated hypothesis tests, a more stringent criterion is required to reject the null hypothesis μj = μj'.
The Bonferroni type I error rate adjustment is widely used when the number of hypothesis tests is small.
The Bonferroni criterion states that to maintain a pre-specified type I error rate of α over multiple hypothesis tests, the original type I error rate must be divided by the number of hypothesis tests, or equivalently by the total number of comparisons c (α/c).
67
Selecting the Appropriate Critical Value
A two-sided null hypothesis, μj = μj', has a single alternative hypothesis, μj ≠ μj'.
For a two-sided hypothesis test with an adjusted type I error rate (α/c), we can find a critical t-value (tc) such that if the absolute value of the observed t-statistic exceeds tc, we reject the null hypothesis μj = μj'.
68
Case Study Example
Since the study objective is to determine which of the mean marathon completion times differ, regardless of the direction of the difference, the alternative hypothesis is two-sided: μj ≠ μj'.
Since there are 4 marathons, 6 hypothesis tests will have to be conducted.
Maintaining an overall type I error rate of 0.05 will require the comparison type I error rate for any single test to be 0.05/6 ≈ 0.008.
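The count of 6 tests and the per-test error rate can be reproduced with a few lines. The marathon names and the overall rate of 0.05 come from the slides; the variable names are only illustrative.

```python
from itertools import combinations

marathons = ["New York", "Chicago", "Twin Cities", "Philadelphia"]
overall_alpha = 0.05

# Every unordered pair of marathons is one hypothesis test.
pairs = list(combinations(marathons, 2))

# Bonferroni criterion: divide the overall rate by the number of comparisons.
per_test_alpha = overall_alpha / len(pairs)

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} comparisons, per-test alpha = {per_test_alpha:.4f}")
```

In general, t groups give t(t-1)/2 pairwise comparisons, so the per-test rate shrinks quickly as groups are added.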
69
Null t(71521)-Probability Distribution
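Ho: μj = μj'
Ha: μj ≠ μj'
[Figure: density of the t(71521) distribution over t-values from -5 to 5. The critical value tc = t(1 - 0.05/12; 71521) = 2.64 bounds the acceptance region (probability 1 - 0.05/6); each tail beyond ±tc is a rejection region with probability 0.05/12.]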
70
Case Study
Mean separation for marathon completion time.
The P-value represents the probability, under the null hypothesis μj = μj', of obtaining a t-statistic more extreme than the one observed.
71
IV. Confidence Interval Construction
A confidence interval for a parameter θ represents a plausible range of values for the parameter θ.
An interval (L, U) is a 100(1-α)% confidence interval for the parameter θ if the probability P(L ≤ θ ≤ U) = 1-α.
The quantity 1-α is called the confidence coefficient or the confidence level.
72
Case Study
To compute a 95% confidence interval for the true difference (δ) between the mean marathon times of any two marathon study populations, we invert the t-test and evaluate the formula at t = tc.
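Inverting the t-test amounts to taking the observed difference plus or minus tc standard errors. The sketch below assumes the standard error is built from the pooled ANOVA mean squared error. The critical value 2.64 is the one shown for the case study; every other number, and the function name `difference_ci`, is hypothetical.

```python
import math


def difference_ci(mean_a, mean_b, mse, n_a, n_b, t_crit):
    """Confidence interval for the difference between two group means.

    The interval is (difference - t_crit * SE, difference + t_crit * SE),
    with SE based on the pooled mean squared error (an assumption).
    """
    diff = mean_a - mean_b
    margin = t_crit * math.sqrt(mse * (1 / n_a + 1 / n_b))
    return diff - margin, diff + margin


# Hypothetical group means (hours), error variance, and group sizes.
lower, upper = difference_ci(mean_a=4.5, mean_b=4.2, mse=0.64,
                             n_a=400, n_b=400, t_crit=2.64)
print(f"difference = 0.30 h, interval = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 0 corresponds to rejecting the null hypothesis of equal means at the same adjusted error rate.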
73
Case Study
Difference (δ) between mean marathon completion times.
75
Step III Data Interpretation
The following two features of study design play a major role in how the data analysis is interpreted.
How were the subjects sampled (selected)?
How were the subjects assigned to groups?
76
Study Design and Data Interpretation
Selected at random, assigned to groups by randomization (Randomized Exp.): A random sample is selected from one population; units are then randomly assigned to different groups.
Selected at random, not assigned to groups by randomization (Survey): Random samples are selected from existing distinct populations.
Not selected at random, assigned to groups by randomization (Clinical Trials): A group of study units is found; units are then randomly assigned to study groups.
Not selected at random, not assigned to groups by randomization (Observational Studies): A collection of available units from distinct groups is examined.
77
Data Analysis Summary
Statistical Methods: The marathon completion times from the 4 marathon sites were analyzed by one-way analysis of variance.
The response variable was the runner's completion time (h) and the independent factor was the marathon site, which had four levels (New York, Chicago, Twin Cities, and Philadelphia).
Multiple comparison adjustment was based on a Bonferroni criterion in which the overall type I error rate (α) was ≤ 0.05.

78
Data Analysis Summary
Results: There was an association between the marathon site and the mean marathon completion time (P [...] the mean marathon completion time (h) was between the New York and Philadelphia marathons (16.4 min; 95% CI: 14.6, 18.1; P [...] Chicago and Philadelphia marathons (11.2 min; 9.98, 12.92; P [...] Philadelphia marathon (10.4 min; 8.2, 12.5; P [...]
There was no statistical difference between the Chicago mean marathon completion time and the Twin Cities mean marathon completion time (0.78 min; -0.72, 2.3; P = 1.00).
79
Data Analysis Summary
Conclusions: Although we found an association between the marathon site and the mean marathon completion time, assigning cause to this relationship would only be speculative.
Since the runners were not selected at random, nor were they randomly assigned to participate in a particular marathon, there may be multiple confounding factors that contributed to the discrepancy between the average marathon completion times of these 4 marathons.
80
Statistical Resource Material
Rosner, B. Fundamentals of Biostatistics. 5th ed. 2000. Duxbury Press, Pacific Grove, CA.
Fisher, L.D., van Belle, G. Biostatistics: A Methodology for the Health Sciences. 1993. John Wiley & Sons, NY.
Campbell, M.J., Machin, D. Medical Statistics: A Common Sense Approach. 1993. John Wiley & Sons, NY.
Bailar, J.C., Mosteller, F. Medical Uses of Statistics. 2nd ed. 1992. NEJM Books, Boston, MA.
