Practical Applications of Statistical Methods in the Clinical Laboratory - PowerPoint PPT Presentation

About This Presentation
Title:

Practical Applications of Statistical Methods in the Clinical Laboratory

Description:

Practical Applications of Statistical Methods in the Clinical Laboratory ... difficulties that bars the path of those who pursue the Science of Man. ... – PowerPoint PPT presentation

Number of Views:214
Avg rating:3.0/5.0
Slides: 218
Provided by: rogerbe
Learn more at: http://www.bertholf.net
Transcript and Presenter's Notes

Title: Practical Applications of Statistical Methods in the Clinical Laboratory


1
Practical Applications of Statistical Methods in
the Clinical Laboratory
  • Roger L. Bertholf, Ph.D., DABCC
  • Associate Professor of Pathology
  • Director of Clinical Chemistry & Toxicology
  • UF Health Science Center/Jacksonville

2
Statistics are the only tools by which an
opening can be cut through the formidable thicket
of difficulties that bars the path of those who
pursue the Science of Man.
  • Sir Francis Galton (1822-1911)

3
There are three kinds of lies: lies, damned
lies, and statistics.
  • Benjamin Disraeli (1804-1881)

4
What are statistics, and what are they used for?
  • Descriptive statistics are used to characterize
    data
  • Statistical analysis is used to distinguish
    between random and meaningful variations
  • In the laboratory, we use statistics to monitor
    and verify method performance, and interpret the
    results of clinical laboratory tests

5
Do not worry about your difficulties in
mathematics, I assure you that mine are greater
  • Albert Einstein (1879-1955)

6
I don't believe in mathematics
  • Albert Einstein

7
Summation function
8
Product function
9
The Mean (average)
  • The mean is a measure of the centrality of a set
    of data.

10
Mean (arithmetical)
11
Mean (geometric)
12
Use of the Geometric mean
  • The geometric mean is primarily used to average
    ratios or rates of change.

13
Mean (harmonic)
14
Example of the use of Harmonic mean
  • Suppose you spend $6 on pills costing 30 cents
    per dozen, and $6 on pills costing 20 cents per
    dozen. What was the average price of the pills
    you bought?

15
Example of the use of Harmonic mean
  • You spent $12 on 50 dozen pills, so the average
    cost is $12/50 = $0.24, or 24 cents per dozen.
  • This also happens to be the harmonic mean of 20
    and 30
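
As a check on this arithmetic, the harmonic mean of the two prices can be written out directly. The symbol H is introduced here only for illustration.

```latex
H = \frac{2}{\frac{1}{20} + \frac{1}{30}} = \frac{2}{\frac{5}{60}} = 24 \text{ cents per dozen}
```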

16
Root mean square (RMS)
17
For the data set 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
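
The values shown on this slide did not survive in the transcript. A minimal sketch, assuming the slide compared the four means defined on the preceding slides, computes them for the same data set with Python's standard library (the variable names are illustrative):

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Arithmetic mean: sum of the values divided by their count
arithmetic = statistics.mean(data)

# Geometric mean: N-th root of the product of the values
geometric = statistics.geometric_mean(data)

# Harmonic mean: N divided by the sum of the reciprocals
harmonic = statistics.harmonic_mean(data)

# Root mean square: square root of the mean of the squares
rms = math.sqrt(sum(x * x for x in data) / len(data))

print(f"arithmetic={arithmetic:.3f} geometric={geometric:.3f} "
      f"harmonic={harmonic:.3f} rms={rms:.3f}")
```

For any set of positive values that are not all equal, the four results fall in the order harmonic < geometric < arithmetic < RMS.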
18
The Weighted Mean
19
Other measures of centrality
  • Mode

20
The Mode
  • The mode is the value that occurs most often

21
Other measures of centrality
  • Mode
  • Midrange

22
The Midrange
  • The midrange is the mean of the highest and
    lowest values

23
Other measures of centrality
  • Mode
  • Midrange
  • Median

24
The Median
  • The median is the value for which half of the
    remaining values are above and half are below it.
    I.e., in an ordered array of 15 values, the 8th
    value is the median. If the array has 16 values,
    the median is the mean of the 8th and 9th values.

25
Example of the use of median vs. mean
  • Suppose you're thinking about building a house in
    a certain neighborhood, and the real estate agent
    tells you that the average (mean) size house in
    that area is 2,500 sq. ft. Astutely, you ask,
    "What's the median size?" The agent replies,
    "1,800 sq. ft."
  • What does this tell you about the sizes of the
    houses in the neighborhood?

26
Measuring variance
  • Two sets of data may have similar means, but
    otherwise be very dissimilar. For example, males
    and females have similar baseline LH
    concentrations, but there is much wider variation
    in females.
  • How do we express quantitatively the amount of
    variation in a data set?

27
(No Transcript)
28
The Variance
29
The Variance
  • The variance is the mean of the squared
    differences between individual data points and
    the mean of the array.
  • Or, after simplifying, the mean of the squares
    minus the squared mean.
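
The equation itself is missing from the transcript. A reconstruction consistent with the two statements above, assuming N data points x_i with mean μ (population form), would be:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
         = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2
```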

30
The Variance
31
The Variance
  • In what units is the variance?
  • Is that a problem?

32
The Standard Deviation
33
The Standard Deviation
  • The standard deviation is the square root of the
    variance. Standard deviation is not the mean
    difference between individual data points and the
    mean of the array.

34
The Standard Deviation
In what units is the standard deviation? Is that
a problem?
35
The Coefficient of Variation
  • Sometimes called the Relative Standard Deviation
    (RSD or %RSD)

36
Standard Deviation (or Error) of the Mean
  • The standard deviation of an average decreases by
    the reciprocal of the square root of the number
    of data points used to calculate the average.

37
Exercises
  • How many measurements must we average to improve
    our precision by a factor of 2?

38
Answer
  • To improve precision by a factor of 2

39
Exercises
  • How many measurements must we average to improve
    our precision by a factor of 2?
  • How many to improve our precision by a factor of
    10?

40
Answer
  • To improve precision by a factor of 10

41
Exercises
  • How many measurements must we average to improve
    our precision by a factor of 2?
  • How many to improve our precision by a factor of
    10?
  • If an assay has a CV of 7%, and we decide to run
    samples in duplicate and average the
    measurements, what should the resulting CV be?

42
Answer
  • Improvement in CV by running duplicates
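
The worked answers to the three exercises were lost from the transcript. A sketch of the arithmetic, assuming only the rule stated earlier that the standard deviation of an average falls as the reciprocal of the square root of N:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}
\qquad
\frac{1}{\sqrt{N}} = \frac{1}{2} \Rightarrow N = 4
\qquad
\frac{1}{\sqrt{N}} = \frac{1}{10} \Rightarrow N = 100
\qquad
\mathrm{CV}_{\text{duplicates}} = \frac{7\%}{\sqrt{2}} \approx 4.9\%
```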

43
Population vs. Sample standard deviation
  • When we speak of a population, we're referring to
    the entire data set, which will have a mean μ

44
Population vs. Sample standard deviation
  • When we speak of a population, we're referring to
    the entire data set, which will have a mean μ
  • When we speak of a sample, we're referring to a
    subset of the population, whose mean is
    customarily designated x̄ (x-bar)
  • Which is used to calculate the standard deviation?

45
Sir, I have found you an argument. I am not
obliged to find you an understanding.
  • Samuel Johnson (1709-1784)

46
Population vs. Sample standard deviation
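
The formulas on this slide are missing from the transcript. As a hedged illustration of the usual distinction (population SD divides by N, sample SD divides by N - 1), the sketch below uses the standard library on hypothetical measurements that are not from the presentation:

```python
import statistics

# Hypothetical measurements, for illustration only
values = [4.0, 5.0, 6.0, 7.0, 8.0]

# Population standard deviation: squared deviations divided by N
population_sd = statistics.pstdev(values)

# Sample standard deviation: squared deviations divided by N - 1
sample_sd = statistics.stdev(values)

print(f"population SD = {population_sd:.4f}, sample SD = {sample_sd:.4f}")
```

The sample form is always the larger of the two, and the gap shrinks as N grows.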
47
Distributions
  • Definition

48
Statistical (probability) Distribution
  • A statistical distribution is a
    mathematically-derived probability function that
    can be used to predict the characteristics of
    certain applicable real populations
  • Statistical methods based on probability
    distributions are parametric, since certain
    assumptions are made about the data

49
Distributions
  • Definition
  • Examples

50
Binomial distribution
  • The binomial distribution applies to events that
    have two possible outcomes. The probability of r
    successes in n attempts, when the probability of
    success in any individual attempt is p, is given
    by

51
Example
  • What is the probability that 10 of the 12 babies
    born one busy evening in your hospital will be
    girls?

52
Solution
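
The worked solution is not in the transcript. A minimal sketch, assuming the probability of a girl at each birth is 0.5 (an assumption, since the slide's value is lost), evaluates the binomial probability of r successes in n attempts:

```python
from math import comb

def binomial_probability(r, n, p):
    """Probability of exactly r successes in n independent attempts,
    each with success probability p: C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Assumed p = 0.5 for a girl; 10 girls among 12 births
p_ten_girls = binomial_probability(10, 12, 0.5)
print(f"P(10 girls in 12 births) = {p_ten_girls:.4f}")
```

With p = 0.5 this is 66/4096, or about 1.6%.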
53
Distributions
  • Definition
  • Examples
  • Binomial

54
God does arithmetic
  • Karl Friedrich Gauss (1777-1855)

55
The Gaussian Distribution
  • What is the Gaussian distribution?

56
63 81 36 12 28 7 79 52 96 17 22 4 61 85
etc.
57
(No Transcript)
58
First set:  63 81 36 12 28 7 79 52 96 17 22 4 61 85
Second set: 22 73 54 33 99 5 61 28 58 24 16 77 43 8
Sum:        85 154 90 45 127 12 140 80 154 41 38 81 104 93


59
(No Transcript)
60
. . . etc.
61
(Figure: probability plotted against x)
62
The Gaussian Probability Function
  • The probability of x in a Gaussian distribution
    with mean μ and standard deviation σ is given by
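
The equation image is missing from the transcript. The standard Gaussian probability density, written with the same symbols, is:

```latex
P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
```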

63
The Gaussian Distribution
  • What is the Gaussian distribution?
  • What types of data fit a Gaussian distribution?

64
Like the ski resort full of girls hunting for
husbands and husbands hunting for girls, the
situation is not as symmetrical as it might seem.
  • Alan Lindsay Mackay (1926- )

65
Are these Gaussian?
  • Human height
  • Outside temperature
  • Raindrop size
  • Blood glucose concentration
  • Serum CK activity
  • QC results
  • Proficiency results

66
The Gaussian Distribution
  • What is the Gaussian distribution?
  • What types of data fit a Gaussian distribution?
  • What is the advantage of using a Gaussian
    distribution?

67
Gaussian probability distribution
(Figure: Gaussian probability curve; x-axis marked at μ, μ ± σ, μ ± 2σ, and μ ± 3σ; central areas labeled .67 and .95)
68
What are the odds of an observation . . .
  • more than 1 σ from the mean (±)
  • more than 2 σ greater than the mean
  • more than 3 σ from the mean

69
Some useful Gaussian probabilities
Range        Probability   Odds
± 1.00 σ     68.3%         1 in 3
± 1.64 σ     90.0%         1 in 10
± 1.96 σ     95.0%         1 in 20
± 2.58 σ     99.0%         1 in 100
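
These probabilities can be recomputed from the error function. The sketch below is illustrative only (the function name `prob_within` is not from the presentation); it also evaluates the three questions posed on the previous slide, reading the first and third as two-sided and the second as one-sided.

```python
import math

def prob_within(k):
    """Probability that a Gaussian value lies within k standard
    deviations of the mean (two-sided)."""
    return math.erf(k / math.sqrt(2))

for k in (1.00, 1.64, 1.96, 2.58):
    p = prob_within(k)
    print(f"+/- {k:.2f} sigma: {100 * p:.1f}% inside, "
          f"about 1 in {1 / (1 - p):.0f} outside")

# More than 1 sigma from the mean, in either direction
p_beyond_1_either = 1 - prob_within(1)
# More than 2 sigma greater than the mean (upper tail only)
p_above_2 = (1 - prob_within(2)) / 2
# More than 3 sigma from the mean, in either direction
p_beyond_3_either = 1 - prob_within(3)
```

Note that the odds column refers to observations falling outside the stated range.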
70
Example
That
This
71
On the Gaussian curve: "Experimentalists think
that it is a mathematical theorem while the
mathematicians believe it to be an experimental
fact."
  • Gabriel Lippmann (1845-1921)

72
Distributions
  • Definition
  • Examples
  • Binomial
  • Gaussian

73
"Life is good for only two things, discovering
mathematics and teaching mathematics"
  • Siméon Poisson (1781-1840)

74
The Poisson Distribution
  • The Poisson distribution predicts the frequency
    of r events occurring randomly in time, when the
    expected frequency is λ

75
Examples of events described by a Poisson
distribution
  • Lightning
  • Accidents
  • Laboratory?

76
A very useful property of the Poisson distribution
77
Using the Poisson distribution
  • How many counts must be collected in an RIA in
    order to ensure an analytical CV of 5% or less?

78
Answer
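
The answer slide is blank in the transcript. A sketch of the reasoning, assuming the "useful property" referred to above is that the variance of a Poisson distribution equals its mean, so that N collected counts have a standard deviation of √N:

```latex
P(r) = \frac{\lambda^{r} e^{-\lambda}}{r!}, \qquad \sigma^2 = \lambda
\qquad\Rightarrow\qquad
\mathrm{CV} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}} \le 0.05
\;\Rightarrow\; N \ge 400 \text{ counts}
```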
79
Distributions
  • Definition
  • Examples
  • Binomial
  • Gaussian
  • Poisson

80
The Student's t Distribution
  • When a small sample is selected from a large
    population, we sometimes have to make certain
    assumptions in order to apply statistical methods

81
Questions about our sample
  • Is the mean of our sample, x̄ (x-bar), the same as
    the mean of the population, μ?
  • Is the standard deviation of our sample, s, the
    same as the standard deviation for the
    population, σ?
  • Unless we can answer both of these questions
    affirmatively, we don't know whether our sample
    has the same distribution as the population from
    which it was drawn.

82
  • Recall that the Gaussian distribution is defined
    by the probability function
  • Note that the exponential factor contains both
    μ and σ, both population parameters. The factor
    is often simplified by making the substitution

83
  • The variable z in the equation
  • is distributed according to a unit gaussian,
    since it has a mean of zero and a standard
    deviation of 1

84
Gaussian probability distribution
(Figure: unit Gaussian probability curve; z-axis marked from -3 to 3; central areas labeled .67 and .95)
85
  • But if we use the sample mean and standard
    deviation instead, we get
  • and we've defined a new quantity, t, which is not
    distributed according to the unit Gaussian. It
    is distributed according to the Student's t
    distribution.
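
The substitution equations on the last few slides are missing from the transcript. A reconstruction consistent with the surrounding text, assuming z is formed from the population parameters and t from the sample statistics:

```latex
z = \frac{x - \mu}{\sigma}
\qquad\qquad
t = \frac{x - \bar{x}}{s}
```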

86
Important features of the Student's t distribution
  • Use of the t statistic assumes that the parent
    distribution is Gaussian
  • The degree to which the t distribution
    approximates a Gaussian distribution depends on N
    (the degrees of freedom)
  • As N gets larger (above 30 or so), the
    differences between t and z become negligible

87
Application of Student's t distribution to a
sample mean
  • The Student's t statistic can also be used to
    analyze differences between the sample mean and
    the population mean

88
Comparison of Student's t and Gaussian
distributions
  • Note that, for a sufficiently large N (> 30), t
    can be replaced with z, and a Gaussian
    distribution can be assumed

89
Exercise
  • The mean age of the 20 participants in one
    workshop is 27 years, with a standard deviation
    of 4 years. Next door, another workshop has 16
    participants with a mean age of 29 years and
    standard deviation of 6 years.
  • Is the second workshop attracting older
    technologists?

90
Preliminary analysis
  • Is the population Gaussian?
  • Can we use a Gaussian distribution for our
    sample?
  • What statistic should we calculate?

91
Solution
  • First, calculate the t statistic for the two
    means

92
Solution, cont.
  • Next, determine the degrees of freedom

93
Statistical Tables
94
Conclusion
  • Since 1.16 is less than 1.64 (the t value
    corresponding to the 90% confidence limit), the
    difference between the mean ages for the
    participants in the two workshops is not
    significant
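
The formula used on the solution slides is missing from the transcript. The sketch below is an assumption: it uses one textbook form of the pooled two-sample t statistic (each standard deviation treated as computed with N in the denominator), chosen because it reproduces the value of 1.16 quoted in the conclusion. The function name is illustrative.

```python
import math

def t_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic.

    Assumes sd1 and sd2 were computed with N (not N - 1) in the
    denominator, so the pooled variance is
    (n1*sd1^2 + n2*sd2^2) / (n1 + n2 - 2).
    """
    pooled_var = (n1 * sd1**2 + n2 * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / std_err

# Workshop 1: 20 participants, mean 27, SD 4
# Workshop 2: 16 participants, mean 29, SD 6
t = t_two_means(27, 4, 20, 29, 6, 16)
degrees_of_freedom = 20 + 16 - 2

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```

Other common forms (for example, pooling with N - 1 weights, or Welch's unpooled version) give slightly different values for the same data, but lead to the same conclusion here.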

95
The Paired t Test
  • Suppose we are comparing two sets of data in
    which each value in one set has a corresponding
    value in the other. Instead of calculating the
    difference between the means of the two sets, we
    can calculate the mean difference between data
    pairs.

96
  • Instead of
  • we use
  • to calculate t
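
The three expressions on this slide are missing from the transcript. A plausible reconstruction, assuming N data pairs (x_i, y_i) with differences d_i = x_i - y_i and s_d the standard deviation of those differences:

```latex
\text{instead of } \bar{x} - \bar{y}
\qquad \text{we use} \qquad
\bar{d} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)
\qquad \text{to calculate} \qquad
t = \frac{\bar{d}}{s_d / \sqrt{N}}
```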

97
Advantage of the Paired t
  • If the type of data permit paired analysis, the
    paired t test is much more sensitive than the
    unpaired t.
  • Why?

98
Applications of the Paired t
  • Method correlation
  • Comparison of therapies

99
Distributions
  • Definition
  • Examples
  • Binomial
  • Gaussian
  • Poisson
  • Student's t

100
The χ² (Chi-square) Distribution
  • There is a general formula that relates actual
    measurements to their predicted values

101
The χ² (Chi-square) Distribution
  • A special (and very useful) application of the χ²
    distribution is to frequency data

102
Exercise
  • In your hospital, you have had 83 cases of
    iatrogenic strep infection in your last 725
    patients. St. Elsewhere, across town, reports 35
    cases of strep in their last 416 patients.
  • Do you need to review your infection control
    policies?

103
Analysis
  • If your infection control policy is roughly as
    effective as St. Elsewhere's, we would expect
    that the rates of strep infection for the two
    hospitals would be similar. The expected
    frequency, then, would be the average

104
Calculating χ²
  • First, calculate the expected frequencies at your
    hospital (f1) and St. Elsewhere (f2)

105
Calculating χ²
  • Next, we sum the squared differences between
    actual and expected frequencies

106
Degrees of freedom
  • In general, when comparing k sample proportions,
    the degrees of freedom for χ² analysis are k - 1.
    Hence, for our problem, there is 1 degree of
    freedom.

107
Conclusion
  • A table of χ² values lists 3.841 as the χ²
    corresponding to a probability of 0.05.
  • So the variation (the calculated χ²) between strep
    infection rates at the two hospitals is within
    statistically-predicted limits, and therefore is
    not significant.
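
The slide's computed values are lost from the transcript. The sketch below is a hedged reconstruction that assumes the "average" rate is the pooled rate across both hospitals; it computes χ² from the infection counts alone, and also from the full 2 × 2 table (infected and not infected), since either could have been intended.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

cases = [83, 35]       # your hospital, St. Elsewhere
patients = [725, 416]

# Assumed expected rate: all cases divided by all patients
pooled_rate = sum(cases) / sum(patients)
expected_cases = [pooled_rate * n for n in patients]

# Chi-square from the infection counts only
chi2_cases = chi_square(cases, expected_cases)

# Chi-square from the full 2 x 2 table
non_cases = [n - c for n, c in zip(patients, cases)]
expected_non_cases = [n - e for n, e in zip(patients, expected_cases)]
chi2_full = chi2_cases + chi_square(non_cases, expected_non_cases)

CRITICAL_0_05 = 3.841  # 1 degree of freedom, from the slide
print(f"chi2 (cases only) = {chi2_cases:.2f}, chi2 (2x2) = {chi2_full:.2f}")
print("significant" if chi2_full > CRITICAL_0_05 else "not significant")
```

Both versions fall below 3.841, in agreement with the stated conclusion.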

108
Distributions
  • Definition
  • Examples
  • Binomial
  • Gaussian
  • Poisson
  • Student's t
  • χ²

109
The F distribution
  • The F distribution predicts the expected
    differences between the variances of two samples
  • This distribution has also been called Snedecor's
    F distribution, Fisher distribution, and variance
    ratio distribution

110
The F distribution
  • The F statistic is simply the ratio of two
    variances
  • (by convention, the larger V is the numerator)

111
Applications of the F distribution
  • There are several ways the F distribution can be
    used. Applications of the F statistic are part
    of a more general type of statistical analysis
    called analysis of variance (ANOVA). We'll see
    more about ANOVA later.

112
Example
  • You're asked to do a "quick and dirty"
    correlation between three whole blood glucose
    analyzers. You prick your finger and measure
    your blood glucose four times on each of the
    analyzers.
  • Are the results equivalent?

113
Data
114
Analysis
  • The mean glucose concentrations for the three
    analyzers are 70, 85, and 76.
  • If the three analyzers are equivalent, then we
    can assume that all of the results are drawn from
    an overall population with mean μ and variance σ².

115
Analysis, cont.
  • Approximate μ by calculating the mean of the
    means

116
Analysis, cont.
  • Calculate the variance of the means

117
Analysis, cont.
  • But what we really want is the variance of the
    population. Recall that

118
Analysis, cont.
  • Since we just calculated
  • we can solve for σ²

119
Analysis, cont.
  • So we now have an estimate of the population
    variance, which we'd like to compare to the real
    variance to see whether they differ. But what is
    the real variance?
  • We don't know, but we can calculate the variance
    based on our individual measurements.

120
Analysis, cont.
  • If all the data were drawn from a larger
    population, we can assume that the variances are
    the same, and we can simply average the variances
    for the three data sets.

121
Analysis, cont.
  • Now calculate the F statistic

122
Conclusion
  • A table of F values indicates that 4.26 is the
    limit for the F statistic at a 95% confidence
    level (when the appropriate degrees of freedom
    are selected). Our value of 10.6 exceeds that,
    so we conclude that there is significant
    variation between the analyzers.
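
The data slide is blank in the transcript, so the original F of 10.6 cannot be reproduced. The sketch below follows the steps described above (variance of the means, scaled by the group size, divided by the average within-analyzer variance) on hypothetical readings chosen only to match the stated means of 70, 85, and 76. The readings and the function name are assumptions.

```python
import statistics

def f_from_groups(groups):
    """One-way F statistic for equal-sized groups.

    Numerator: population variance estimated from the group means,
    n * variance(means). Denominator: mean of the within-group variances.
    """
    n = len(groups[0])
    means = [statistics.mean(g) for g in groups]
    between = n * statistics.variance(means)
    within = statistics.mean([statistics.variance(g) for g in groups])
    return between / within

# Hypothetical glucose readings (not the presenter's data)
hypothetical = [
    [66, 70, 72, 72],   # analyzer 1, mean 70
    [80, 84, 88, 88],   # analyzer 2, mean 85
    [72, 75, 77, 80],   # analyzer 3, mean 76
]
f_value = f_from_groups(hypothetical)
print(f"F = {f_value:.1f} (compare with 4.26 for 2 and 9 degrees of freedom)")
```

The degrees of freedom are k - 1 = 2 for the numerator and k(n - 1) = 9 for the denominator, which is where the tabulated limit of 4.26 comes from.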

123
Distributions
  • Definition
  • Examples
  • Binomial
  • Gaussian
  • Poisson
  • Student's t
  • χ²
  • F

124
Unknown or irregular distribution
  • Transform

125
Log transform
(Figure: two probability curves, one plotted against x and one against log x)
126
Unknown or irregular distribution
  • Transform
  • Non-parametric methods

127
Non-parametric methods
  • Non-parametric methods make no assumptions about
    the distribution of the data
  • There are non-parametric methods for
    characterizing data, as well as for comparing
    data sets
  • These methods are also called distribution-free,
    robust, or sometimes non-metric tests

128
Application to Reference Ranges
  • The concentrations of most clinical analytes are
    not usually distributed in a Gaussian manner.
    Why?
  • How do we determine the reference range (limits
    of expected values) for these analytes?

129
Application to Reference Ranges
  • Reference ranges for normal, healthy populations
    are customarily defined as the central 95%.
  • An entirely non-parametric way of expressing this
    is to eliminate the upper and lower 2.5% of data,
    and use the remaining upper and lower values to
    define the range.
  • NCCLS recommends 120 values, dropping the two
    highest and two lowest.
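
A minimal sketch of this rank-based approach, assuming exactly 120 reference values as in the recommendation above. The function name and the placeholder data are illustrative.

```python
def nonparametric_reference_range(values):
    """Central 95% limits for 120 reference values: drop the two lowest
    and the two highest, then report the remaining extremes."""
    if len(values) != 120:
        raise ValueError("this sketch assumes exactly 120 reference values")
    ordered = sorted(values)
    return ordered[2], ordered[-3]

# Placeholder data: the integers 1..120 stand in for real results
low, high = nonparametric_reference_range(list(range(1, 121)))
print(f"reference range: {low} to {high}")
```

No assumption is made about the shape of the distribution; only the rank order of the results matters.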

130
Application to Reference Ranges
  • What happens when we want to compare one
    reference range with another? This is precisely
    what CLIA '88 requires us to do.
  • How do we do this?

131
Everything should be made as simple as possible,
but not simpler.
  • Albert Einstein

132
Solution 1: Simple comparison
  • Suppose we just do a small internal reference
    range study, and compare our results to the
    manufacturer's range.
  • How do we compare them?
  • Is this a valid approach?

133
NCCLS recommendations
  • Inspection Method: Verify reference populations
    are equivalent
  • Limited Validation: Collect 20 reference
    specimens
  • No more than 2 exceed range
  • Repeat if failed
  • Extended Validation: Collect 60 reference
    specimens; compare ranges.

134
Solution 2: Mann-Whitney
  • Rank normal values (x1,x2,x3...xn) and the
    reference population (y1,y2,y3...yn)
  • x1, y1, x2, x3, y2, y3 ... xn, yn
  • Count the number of y values that follow each x,
    and call the sum Ux. Calculate Uy also.
  • Also called the U test, rank sum test, or
    Wilcoxon's test.

135
Mann-Whitney, cont.
  • It should be obvious that Ux + Uy = NxNy
  • If the two distributions are the same, then
  • Ux = Uy = 1/2 NxNy
  • Large differences between Ux and Uy indicate that
    the distributions are not equivalent
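
The counting procedure can be sketched directly. The data below are hypothetical and contain no ties (this sketch ignores ties), and the function name is illustrative.

```python
def u_statistics(x_values, y_values):
    """Ux: for each x, count the y values ranked after it, then sum.
    Uy: the same with the roles reversed. Ties are not handled."""
    ux = sum(1 for x in x_values for y in y_values if y > x)
    uy = sum(1 for y in y_values for x in x_values if x > y)
    return ux, uy

x = [1.1, 2.3, 2.9, 4.0]        # hypothetical in-house values
y = [1.8, 3.1, 3.5, 4.4, 5.2]   # hypothetical reference values

ux, uy = u_statistics(x, y)
print(f"Ux = {ux}, Uy = {uy}, Nx*Ny = {len(x) * len(y)}")
```

If the two sets came from the same distribution, Ux and Uy would each be expected to fall near half of Nx·Ny; significance is then judged against tabulated U values.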

136
Obvious is the most dangerous word in
mathematics.
  • Eric Temple Bell (1883-1960)

137
Solution 3: Run test
  • In the run test, order the values in the two
    distributions as before
  • x1, y1, x2, x3, y2, y3 ... xn, yn
  • Add up the number of runs (consecutive values
    from the same distribution). If the two data
    sets are randomly selected from one population,
    the values will be well mixed and there will be
    many runs; too few runs suggests that the
    distributions differ.
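
Counting runs is straightforward once the two sets are merged and sorted. In this sketch (hypothetical data, illustrative function name) the two cases show the extremes: fully interleaved values give the maximum number of runs, and fully separated values give only two.

```python
from itertools import groupby

def count_runs(x_values, y_values):
    """Merge and sort both sets, then count runs of consecutive values
    that come from the same set."""
    labelled = sorted([(v, "x") for v in x_values] +
                      [(v, "y") for v in y_values])
    labels = [label for _, label in labelled]
    return sum(1 for _ in groupby(labels))

interleaved = count_runs([1, 3, 5], [2, 4, 6])   # x y x y x y
separated = count_runs([1, 2, 3], [4, 5, 6])     # x x x y y y
print(interleaved, separated)
```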

138
Solution 4: The Monte Carlo method
  • Sometimes, when we don't know anything about a
    distribution, the best thing to do is
    independently test its characteristics.

139
The Monte Carlo method
(Figure: plot of y against x)
140
The Monte Carlo method
Reference population
141
The Monte Carlo method
  • With the Monte Carlo method, we have simulated
    the test we wish to apply--that is, we have
    randomly selected samples from the parent
    distribution, and determined whether our in-house
    data are in agreement with the randomly-selected
    samples.

142
Analysis of paired data
  • For certain types of laboratory studies, the data
    we gather is paired
  • We typically want to know how closely the paired
    data agree
  • We need quantitative measures of the extent to
    which the data agree or disagree
  • Examples?

143
Examples of paired data
  • Method correlation data
  • Pharmacodynamic effects
  • Risk analysis
  • Pathophysiology

144
Correlation
(Figure: scatter plot; both axes run from 0 to 50)
145
Linear regression (least squares)
  • Linear regression analysis generates an equation
    for a straight line
  • y = mx + b
  • where m is the slope of the line and b is the
    value of y when x = 0 (the y-intercept).
  • The calculated equation minimizes the sum of the
    squared differences between actual y values and
    the linear regression line.
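
A minimal least-squares sketch using only the standard library. The data are hypothetical points lying exactly on y = 2x + 1, so the fit should recover that slope and intercept; the function name is illustrative.

```python
def least_squares(xs, ys):
    """Return slope m and intercept b of the line y = m*x + b that
    minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

xs = [5, 10, 20, 30, 40]        # hypothetical x values
ys = [2 * x + 1 for x in xs]    # exactly linear, for checking

m, b = least_squares(xs, ys)
print(f"y = {m:.3f}x + {b:.3f}")
```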

146
Correlation
(Figure: scatter plot; both axes run from 0 to 50)
147
Covariance
  • Do x and y values vary in concert, or randomly?

148
  • What if y increases when x increases?
  • What if y decreases when x increases?
  • What if y and x vary independently?

149
Covariance
  • It is clear that the greater the covariance, the
    stronger the relationship between x and y.
  • But . . . what about units?
  • e.g., if you measure glucose in mg/dL, and I
    measure it in mmol/L, who's likely to have the
    higher covariance?

150
The Correlation Coefficient
151
The Correlation Coefficient
  • The correlation coefficient is a unitless
    quantity that roughly indicates the degree to
    which x and y vary in the same direction.
  • ρ is useful for detecting relationships between
    parameters, but it is not a very sensitive
    measure of the spread.
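
The defining equations are missing from the transcript. The usual definitions, assuming N pairs (x_i, y_i) with means x̄ and ȳ and standard deviations σ_x and σ_y, show why dividing by the standard deviations removes the units:

```latex
\mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})
\qquad\qquad
\rho = \frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}
```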

152
Correlation
(Figure: scatter plot with regression line; both axes run from 0 to 50; y = 1.031x - 0.024, ρ = 0.9986)
153
Correlation
(Figure: scatter plot with regression line; both axes run from 0 to 50; y = 1.031x - 0.024, ρ = 0.9894)
154
Standard Error of the Estimate
  • The linear regression equation gives us a way to
    calculate an estimated y for any given x value,
    given the symbol ŷ (y-hat)

155
Standard Error of the Estimate
  • Now what we are interested in is the average
    difference between the measured y and its
    estimate, ŷ
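
The equation is missing from the transcript. The conventional definition of the standard error of the estimate, assuming N data pairs and a fitted line with slope m and intercept b, is a root-mean-square residual rather than a simple average:

```latex
\hat{y}_i = m x_i + b
\qquad\qquad
s_{y/x} = \sqrt{\frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{N - 2}}
```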

156
Correlation
(Figure: scatter plot with regression line; both axes run from 0 to 50; y = 1.031x - 0.024, ρ = 0.9986, sy/x = 1.83)
157
Correlation
(Figure: scatter plot with regression line; both axes run from 0 to 50; y = 1.031x - 0.024, ρ = 0.9894, sy/x = 5.32)
158
Standard Error of the Estimate
  • If we assume that the errors in the y
    measurements are Gaussian (is that a safe
    assumption?), then the standard error of the
    estimate gives us the boundaries within which
    about 68% of the y values will fall.
  • ±2 sy/x defines the 95% boundaries.

159
Limitations of linear regression
  • Assumes no error in x measurement
  • Assumes that variance in y is constant throughout
    concentration range

160
Alternative approaches
  • Weighted linear regression analysis can
    compensate for non-constant variance among y
    measurements
  • Deming regression analysis takes into account
    variance in the x measurements
  • Weighted Deming regression analysis allows for
    both

161
Evaluating method performance
  • Precision

162
Method Precision
  • Within-run 10 or 20 replicates
  • What types of errors does within-run precision
    reflect?
  • Day-to-day NCCLS recommends evaluation over 20
    days
  • What types of errors does day-to-day precision
    reflect?

163
Evaluating method performance
  • Precision
  • Sensitivity

164
Method Sensitivity
  • The analytical sensitivity of a method refers to
    the lowest concentration of analyte that can be
    reliably detected.
  • The most common definition of sensitivity is the
    analyte concentration that will result in a
    signal two or three standard deviations above
    background.

165
(Figure: signal plotted against time)
166
Other measures of sensitivity
  • Limit of Detection (LOD) is sometimes defined as
    the concentration producing an S/N > 3.
  • In drug testing, LOD is customarily defined as
    the lowest concentration that meets all
    identification criteria.
  • Limit of Quantitation (LOQ) is sometimes defined
    as the concentration producing an S/N > 5.
  • In drug testing, LOQ is customarily defined as
    the lowest concentration that can be measured
    within ±20%.

167
Question
  • At an S/N ratio of 5, what is the minimum CV of
    the measurement?
  • If the S/N is 5, 20% of the measured signal is
    noise, which is random. Therefore, the CV must
    be at least 20%.

168
Evaluating method performance
  • Precision
  • Sensitivity
  • Linearity

169
Method Linearity
  • A linear relationship between concentration and
    signal is not absolutely necessary, but it is
    highly desirable. Why?
  • CLIA '88 requires that the linearity of
    analytical methods be verified on a periodic
    basis.

170
Ways to evaluate linearity
  • Visual/linear regression

171
(Figure: signal plotted against concentration)
172
Outliers
  • We can eliminate any point that differs from the
    next highest value by more than 0.765 (p = 0.05)
    times the spread between the highest and lowest
    values (Dixon test).
  • Example: 4, 5, 6, 13
  • (13 - 4) x 0.765 = 6.89
  • Since 13 - 6 = 7 exceeds 6.89, the value 13 can
    be rejected as an outlier.
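
The same check can be sketched in code. The critical ratio of 0.765 is taken from the slide and applies to this four-value example; critical values for other sample sizes come from a Dixon table and are not assumed here. The function name is illustrative.

```python
def dixon_ratio(values):
    """Gap between the highest value and its nearest neighbor,
    expressed as a fraction of the full spread (highest - lowest)."""
    ordered = sorted(values)
    gap = ordered[-1] - ordered[-2]
    spread = ordered[-1] - ordered[0]
    return gap / spread

CRITICAL_RATIO = 0.765  # from the slide, for four values

ratio = dixon_ratio([4, 5, 6, 13])
is_outlier = ratio > CRITICAL_RATIO
print(f"ratio = {ratio:.3f}, outlier: {is_outlier}")
```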

173
Limitation of linear regression method
  • If the analytical method has a high variance
    (CV), it is likely that small deviations from
    linearity will not be detected due to the high
    standard error of the estimate

174
(Figure: signal plotted against concentration)
175
Ways to evaluate linearity
  • Visual/linear regression
  • Quadratic regression

176
Quadratic regression
  • Recall that, for linear data, the relationship
    between x and y can be expressed as
  • y = f(x) = a + bx

177
Quadratic regression
  • A curve is described by the quadratic equation
  • y = f(x) = a + bx + cx²
  • which is identical to the linear equation except
    for the addition of the cx² term.

178
Quadratic regression
  • It should be clear that the smaller the x²
    coefficient, c, the closer the data are to linear
    (since the equation reduces to the linear form
    when c approaches 0).
  • What is the drawback to this approach?

179
Ways to evaluate linearity
  • Visual/linear regression
  • Quadratic regression
  • Lack-of-fit analysis

180
Lack-of-fit analysis
  • There are two components of the variation from
    the regression line
  • Intrinsic variability of the method
  • Variability due to deviations from linearity
  • The problem is to distinguish between these two
    sources of variability
  • What statistical test do you think is appropriate?

181
(Figure: signal plotted against concentration)
182
Lack-of-fit analysis
  • The ANOVA technique requires that method variance
    be constant at all concentrations. Cochran's
    test is used to test whether this is the case.

183
Lack-of-fit method calculations
  • Total sum of the squares (TSS) = the variance
    calculated from all of the y values
  • Linear regression sum of the squares (LSS) = the
    variance of y values from the regression line
  • Residual sum of the squares (RSS) = difference
    between TSS and LSS
  • Lack of fit sum of the squares (LOF) = the RSS
    minus the pure error (sum of variances)

184
Lack-of-fit analysis
  • The LOF is compared to the pure error to give the
    G statistic (which is actually F)
  • If the LOF is small compared to the pure error, G
    is small and the method is linear
  • If the LOF is large compared to the pure error, G
    will be large, indicating significant deviation
    from linearity

185
Significance limits for G
  • 90% confidence: 2.49
  • 95% confidence: 3.29
  • 99% confidence: 5.42

186
If your experiment needs statistics, you ought
to have done a better experiment.
  • Ernest Rutherford (1871-1937)

187
Evaluating Clinical Performance of laboratory
tests
  • The clinical performance of a laboratory test
    defines how well it predicts disease
  • The sensitivity of a test indicates the
    likelihood that it will be positive when disease
    is present

188
Clinical Sensitivity
  • If TP is the number of true positives, and FN
    is the number of false negatives, the
    sensitivity is defined as

189
Example
  • Of 25 admitted cocaine abusers, 23 tested
    positive for urinary benzoylecgonine and 2 tested
    negative. What is the sensitivity of the urine
    screen?
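
The defining equation and the answer are missing from the transcript. Using the definitions of TP and FN given on the previous slide, the worked example would be:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{23}{23 + 2} = 0.92 = 92\%
```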

190
Evaluating Clinical Performance of laboratory
tests
  • The clinical performance of a laboratory test
    defines how well it predicts disease
  • The sensitivity of a test indicates the
    likelihood that it will be positive when disease
    is present
  • The specificity of a test indicates the
    likelihood that it will be negative when disease
    is absent

191
Clinical Specificity
  • If TN is the number of true negative results,
    and FP is the number of falsely positive results,
    the specificity is defined as
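
The equation is missing from the transcript. With TN and FP as defined above, the standard expression is:

```latex
\text{Specificity} = \frac{TN}{TN + FP}
```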

192
Example
  • What would you guess is the specificity of any
    particular clinical laboratory test? (Choose any
    one you want)

193
Answer
  • Since reference ranges are customarily set to
    include the central 95% of values in healthy
    subjects, we expect 5% of values from healthy
    people to be abnormal--this is the false
    positive rate.
  • Hence, the specificity of most clinical tests is
    no better than 95%.

194
Sensitivity vs. Specificity
  • Sensitivity and specificity are inversely related.

195
(Figure: marker concentration plotted for disease-negative and disease-positive groups)
196
Sensitivity vs. Specificity
  • Sensitivity and specificity are inversely
    related.
  • How do we determine the best compromise between
    sensitivity and specificity?

197
Receiver Operating Characteristic
198
Evaluating Clinical Performance of laboratory
tests
  • The sensitivity of a test indicates the
    likelihood that it will be positive when disease
    is present
  • The specificity of a test indicates the
    likelihood that it will be negative when disease
    is absent
  • The predictive value of a test indicates the
    probability that the test result correctly
    classifies a patient

199
Predictive Value
  • The predictive value of a clinical laboratory
    test takes into account the prevalence of a
    certain disease, to quantify the probability that
    a positive test is associated with the disease in
    a randomly-selected individual, or alternatively,
    that a negative test is associated with health.

200
Illustration
  • Suppose you have invented a new screening test
    for Addison disease.
  • The test correctly identified 98 of 100 patients
    with confirmed Addison disease (What is the
    sensitivity?)
  • The test was positive in only 2 of 1000 patients
    with no evidence of Addison disease (What is the
    specificity?)

201
Test performance
  • The sensitivity is 98.0%
  • The specificity is 99.8%
  • But Addison disease is a rare disorder--incidence
    1:10,000
  • What happens if we screen 1 million people?

202
Analysis
  • In 1 million people, there will be 100 cases of
    Addison disease.
  • Our test will identify 98 of these cases (TP)
  • Of the 999,900 non-Addison subjects, the test
    will be positive in 0.2%, or about 2,000 (FP).

203
Predictive value of the positive test
  • The predictive value is the percentage of all
    positives that are true positives

204
What about the negative predictive value?
  • TN = 999,900 - 2,000 = 997,900
  • FN = 100 x 0.02 = 2

205
Summary of predictive value
  • Predictive value describes the usefulness of a
    clinical laboratory test in the real world.
  • Or does it?

206
Lessons about predictive value
  • Even when you have a very good test, it is
    generally not cost effective to screen for
    diseases which have low incidence in the general
    population. Exception?
  • The higher the clinical suspicion, the better the
    predictive value of the test. Why?

207
Efficiency
  • We can combine the PV+ and PV- to give a quantity
    called the efficiency
  • The efficiency is the percentage of all patients
    that are classified correctly by the test result.

208
Efficiency of our Addison screen
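
The computed values are missing from the transcript. The sketch below reworks the Addison screening example from the figures given earlier (1 million people screened, incidence 1 in 10,000, sensitivity 98%, specificity 99.8%). The function and variable names are illustrative, and the results are expected counts rather than whole numbers of patients.

```python
def screen(population, prevalence, sensitivity, specificity):
    """Expected true/false positives and negatives when screening."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity   # diseased, test positive
    fn = diseased - tp            # diseased, test negative
    tn = healthy * specificity    # healthy, test negative
    fp = healthy - tn             # healthy, test positive
    return tp, fn, tn, fp

tp, fn, tn, fp = screen(1_000_000, 1 / 10_000, 0.98, 0.998)

pv_positive = tp / (tp + fp)                  # PV+
pv_negative = tn / (tn + fn)                  # PV-
efficiency = (tp + tn) / (tp + fn + tn + fp)  # correctly classified

print(f"PV+ = {100 * pv_positive:.1f}%")
print(f"PV- = {100 * pv_negative:.4f}%")
print(f"Efficiency = {100 * efficiency:.1f}%")
```

Despite the high efficiency, fewer than 1 in 20 positive results would be a true case, which is the point of the illustration.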
209
To call in the statistician after the experiment
is done may be no more than asking him to
perform a postmortem examination: he may be able
to say what the experiment died of.
  • Ronald Aylmer Fisher (1890 - 1962)

210
Application of Statistics to Quality Control
  • We expect quality control to fit a Gaussian
    distribution
  • We can use Gaussian statistics to predict the
    variability in quality control values
  • What sort of tolerance will we allow for
    variation in quality control values?
  • Generally, we will question variations that have
    a statistical probability of less than 5%

211
He uses statistics as a drunken man uses lamp
posts -- for support rather than illumination.
  • Andrew Lang (1844-1912)

212
Westgard's rules
  • 1₂ₛ (1 in 20)
  • 1₃ₛ (1 in 300)
  • 2₂ₛ (1 in 400)
  • R₄ₛ (1 in 800)
  • 4₁ₛ (1 in 600)
  • 10ₓ (1 in 1000)
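
A hedged sketch of how these rules might be checked in software, assuming the commonly cited definitions: one control beyond 2 SD (warning) or 3 SD, two consecutive controls beyond 2 SD on the same side, two consecutive controls more than 4 SD apart on opposite sides, four consecutive controls beyond 1 SD on the same side, and ten consecutive controls on the same side of the mean. Input values are z-scores, (result - mean) / SD, and the rule names are written as plain strings such as "1_2s". Real QC software applies these per run and per control level, which this sketch does not attempt.

```python
def westgard_violations(z):
    """Return the set of rule names triggered by a sequence of control
    z-scores, oldest first."""
    found = set()
    for i, v in enumerate(z):
        if abs(v) > 2:
            found.add("1_2s")          # warning rule
        if abs(v) > 3:
            found.add("1_3s")
        if i >= 1:
            prev = z[i - 1]
            if (v > 2 and prev > 2) or (v < -2 and prev < -2):
                found.add("2_2s")
            if (v > 2 and prev < -2) or (v < -2 and prev > 2):
                found.add("R_4s")
        if i >= 3:
            last4 = z[i - 3:i + 1]
            if all(w > 1 for w in last4) or all(w < -1 for w in last4):
                found.add("4_1s")
        if i >= 9:
            last10 = z[i - 9:i + 1]
            if all(w > 0 for w in last10) or all(w < 0 for w in last10):
                found.add("10_x")
    return found

print(westgard_violations([0.5, -0.3, 3.2]))
print(westgard_violations([2.1, -2.3]))
```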

213
Some examples
(Figure: control chart with lines at the mean, ±1 SD, ±2 SD, and ±3 SD)
214
Some examples
(Figure: control chart with lines at the mean, ±1 SD, ±2 SD, and ±3 SD)
215
Some examples
(Figure: control chart with lines at the mean, ±1 SD, ±2 SD, and ±3 SD)
216
Some examples
(Figure: control chart with lines at the mean, ±1 SD, ±2 SD, and ±3 SD)
217
In science one tries to tell people, in such a
way as to be understood by everyone, something
that no one ever knew before. But in poetry, it's
the exact opposite.
  • Paul Adrien Maurice Dirac (1902- 1984)