CONFIDENCE INTERVALS AND PROBABILITY

Slides: 32

Provided by: shrpss

1

CONFIDENCE INTERVALS AND PROBABILITY
Basics of Confidence Interval - through example
As observed, hematocrit (Hct) values (measured
in %) for healthy patients are not all the same; they
range over an interval.
What is that interval?
Extreme values occasionally arise in healthy
patients.
It is not possible to specify an interval that
will always include healthy patients and exclude
unhealthy ones.
The best you can do is to find an interval that most
frequently includes healthy patients and excludes
unhealthy patients.
We might say that the interval should include 95%
of the healthy population, which is to say that a
randomly chosen healthy patient has 0.95
probability of falling in the interval.

This leads to the term Confidence Interval.
Error Rate
We let α represent the probability that a healthy
patient's Hct will fall outside the healthy
interval. In this case α = 0.05, or 5%.
When a patient's Hct does fall outside the
healthy interval, we think it is likely, but not
certain, that the patient has arisen from an
unhealthy population.
The confidence interval need not be constrained
to 95% probability, and it can apply to any type
of distribution (it need not be normal).
A general statement for the confidence interval
is:
The probability that a randomly chosen
observation from a given probability distribution
is contained in a specified interval is given by
the area under the distribution curve
over that interval.
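
The general statement above can be illustrated numerically. The following is a minimal sketch, not part of the original slides: it approximates the area under a density curve with Simpson's rule, once for the standard normal and once for an exponential distribution. The exponential is chosen here purely as an assumed example of a non-normal distribution, and the function names are hypothetical.

```python
import math

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b (composite Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    return total * h / 3

def standard_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def exponential_pdf(x, rate=1.0):
    # Illustrative non-normal distribution (an assumption, not from the slides).
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# Probability that an observation falls in the chosen interval
p_normal = area_under(standard_normal_pdf, -1.96, 1.96)
p_expon = area_under(exponential_pdf, 0.0, math.log(20))
print(round(p_normal, 3), round(p_expon, 3))
```

Both intervals are chosen to enclose about 95% of their distribution, which shows that the idea of "area over the interval" does not depend on normality.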

3
Fig 1: A standard normal distribution shown with
α = 2.5% and z = 1.96
4
The most commonly used distribution in
biostatistics is the normal, as many data
sets approximately follow the normal
distribution. In Figure 1 on the previous page, z
represents the number of standard deviations (SD)
away from the mean. The figure shows a standard
normal distribution with z = 1.96 and the
corresponding area α = 2.5% (under the curve to
the right). A similar area lies to the left,
making a total of α = 5%.
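
The tail areas quoted for z = 1.96 can be checked with Python's standard library. This is a small sketch using `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

z = 1.96
std_normal = NormalDist(mu=0.0, sigma=1.0)

right_tail = 1.0 - std_normal.cdf(z)   # area to the right of +z
left_tail = std_normal.cdf(-z)         # area to the left of -z
both_tails = right_tail + left_tail

print(f"right tail: {right_tail:.4f}")
print(f"both tails: {both_tails:.4f}")
```

By symmetry the two tails are equal, so the central area (1 − α) is about 95%.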
5
Example: Prostate volume has been posed as a
possible indicator of prostate cancer. Assume the
prostate volume distribution is normal with a mean
of 35.46 ml and SD of 16.35 ml. Two patients are
presented to you with prostate volumes of 59 ml
and 83 ml. How typical are they? They are about 1.5 and
3.0 SD above the mean. Look at the table on the
next transparency and note that 6.7% of the area
under the curve lies to the right of the z-value
of 1.5. This tells us that 6.7% of normal
patients have a prostate volume at least this
much greater than the mean, and 13.4% lie at least
this far from the mean in either direction (larger
or smaller). This indicates
a large prostate but not an obviously atypical
one.
6
Fig 2
7
On the other hand, the second patient's z = 3
yields α = 0.0013; only about 1 in 1000 normal
patients would have a prostate so large by chance
alone. We are likely to conclude that this
patient did not arise from a
clinically normal population.
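
The two patients can be placed on the normal curve with a few lines of code. This is a sketch under the example's stated mean and SD; the helper names are hypothetical. Note that the exact z-values come out slightly below the rounded 1.5 and 3.0 used in the slides, so the tail areas at the rounded values are computed separately.

```python
from statistics import NormalDist

MEAN_ML = 35.46   # population mean prostate volume, from the example
SD_ML = 16.35     # population SD, from the example

def z_score(volume_ml):
    """Number of SDs by which a volume lies above the mean."""
    return (volume_ml - MEAN_ML) / SD_ML

def upper_tail(z):
    """Proportion of a normal population lying above z."""
    return 1.0 - NormalDist().cdf(z)

for volume in (59, 83):
    z = z_score(volume)
    print(f"{volume} ml: z = {z:.2f}, upper tail = {upper_tail(z):.4f}")

# Tail areas at the rounded z-values used in the slides
tail_1_5 = upper_tail(1.5)
tail_3_0 = upper_tail(3.0)
print(f"z = 1.5: {tail_1_5:.3f}, z = 3.0: {tail_3_0:.4f}")
```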
Critical Value: If we want to select out as
abnormal all patients with prostate volumes in the
upper 1% of the tail, the actual critical value
will be
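
A critical value of this kind can be sketched as the mean plus z SDs, where z cuts off the required upper-tail area. The code below assumes the normal model and the mean and SD given in the prostate example; the function name is hypothetical and the result is a computed illustration, not a figure taken from the slides.

```python
from statistics import NormalDist

MEAN_ML = 35.46   # from the prostate volume example
SD_ML = 16.35

def upper_critical_value(upper_tail_area, mean=MEAN_ML, sd=SD_ML):
    """Volume above which the given fraction of a normal population lies."""
    z = NormalDist().inv_cdf(1.0 - upper_tail_area)
    return mean + z * sd

cutoff = upper_critical_value(0.01)
print(f"upper 1% critical value: {cutoff:.1f} ml")
```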
8
Confidence Interval on an Observation from an
Individual Patient: Suppose we find
that healthy Hct values arise from a
normal distribution with a mean of
47 and SD of 3.6. (1 − α) is the area excluding
both tails, which is 95%. For this requirement, the area
is enclosed by μ ± 1.96 SD, which runs from 47 − 1.96 ×
3.6 = 40 to 47 + 1.96 × 3.6 = 54 (rounded). Therefore,
the healthy Hct interval is 40 - 54. If we have
a healthy patient, we would bet 95 to 5 that his
Hct will fall in this interval, which is to say,
we are 95% confident that it will include a
healthy patient.
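
The healthy Hct interval can be reproduced as follows. This is a minimal sketch using the mean and SD stated above; the exact multiplier is obtained from the inverse normal CDF rather than typed in as 1.96.

```python
from statistics import NormalDist

hct = NormalDist(mu=47.0, sigma=3.6)   # healthy Hct distribution from the slide

alpha = 0.05
z = NormalDist().inv_cdf(1.0 - alpha / 2)   # about 1.96
low = hct.mean - z * hct.stdev
high = hct.mean + z * hct.stdev
coverage = hct.cdf(high) - hct.cdf(low)     # area enclosed by the interval

print(f"z = {z:.2f}, interval = {low:.1f} to {high:.1f}, coverage = {coverage:.2f}")
```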
9
We can say: The probability that a randomly drawn
Hct from this distribution is contained in the interval (40 - 54)
is 95%, that is, P(40 < Hct < 54) = 0.95. This is for a
normal distribution. A confidence interval is an
interval about an estimate, based on a probability
distribution, that expresses the confidence that
the interval contains the population parameter
being estimated. If we calculate means from
smaller samples, the values will differ
from the population mean μ, and there will be a
standard deviation of these sample means. This is
known as the standard error of the mean (SEM) and is
defined as σm = σ/√n. Here σm is known as the population SEM.
10
The statistical quantity standard error of
the mean (SEM) is defined as the population SD
divided by the square root of the sample
size. If σ = 18.04 and the sample size is 301,
the SEM is σm = 18.04/√301 = 1.0398. The
sample SEM, sm, is defined similarly. Normally, when the
population SD is not known, the sample standard
deviation (s) is used, and we estimate the value
of σm by sm: sm (estimated value) = s/√n, where n
is the sample size. Note that the population
mean (μ) and the population standard deviation
(σ) are important in getting the confidence
interval.
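
The SEM calculation is a one-line formula; a small sketch with a hypothetical helper name:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

sem = standard_error(18.04, 301)
print(f"SEM = {sem:.4f}")
```

The same function serves for the population SEM (pass σ) and for the estimated SEM (pass the sample SD s).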
11
Let us take 10 prostate volumes as our sample. We
calculate the mean m = 32.73 ml and the sample
standard deviation s = 15.92 ml. Now assume
that the population SD is not given. You can
estimate the population SEM (σm) by sm. It is
given by sm = s/√n, which will be
15.92/√10 = 5.03 ml. Now suppose that we do
not know μ, but want to describe it as well as
possible from the 10 sample values given. We would
estimate μ by calculating m (which is given as
32.73 ml). Taking the population SD as σ = 16 ml,
we will then find σm = σ/√n = 16/√10 = 5.06 ml,
and we will use these
numbers to put a confidence interval on μ.
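
The two standard errors for the 10 prostate volumes differ only in which SD is used. A brief sketch with the slide's numbers:

```python
import math

n = 10
s = 15.92        # sample SD (ml), from the slide
sigma = 16.0     # population SD (ml), as assumed on the slide

sample_sem = s / math.sqrt(n)          # s_m, used when sigma is unknown
population_sem = sigma / math.sqrt(n)  # sigma_m, used when sigma is known

print(f"s_m = {sample_sem:.2f} ml, sigma_m = {population_sem:.2f} ml")
```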
12
To find a confidence interval with 2.5% of
unusual cases in each tail, we can
calculate the end points of the interval as m ± 1.96
σm = 32.73 ± 1.96 × 5.06 = 22.81 to 42.65
ml. We are 95% confident that the population mean
μ is included in the interval 22.81 - 42.65.
Indeed, the population mean is 36.47 ml. Often we
need to look at a confidence interval on a
mean when we do not know the population standard
deviation and must estimate it from a small
sample. In this case, because of the small sample
size, the rules of the normal distribution do not
apply. In such a situation, we have to look at the
t distribution. This distribution is similar to the
normal distribution, but gives a wider confidence
interval to compensate for the lack of accuracy
in the standard deviation: the smaller the sample,
the wider the t distribution.
13
The t distribution depends upon the degrees of
freedom (df). df is defined as (n − 1), where n is
the number of observations in the sample. The smaller the df, the less
accurate is the estimated value of σm and the fatter
is the distribution. Figure 3 shows the t
distribution for n = 10 (df = 9) with 2.5% of
the area under the curve shaded. This
distribution is very similar to the normal
distribution, except that the 2.5% critical value lies at
2.262 s (s is the sample SD) rather than 1.96 σ.
In the table on the next transparency we have
shown some commonly selected distances (t), in
terms of the number of SDs away from the mean, for the most
commonly used one- and two-tailed α and (1 − α)
areas under the curve for various df values.
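
Critical values like those in a t table can be approximated without a statistics package. The sketch below is one possible approach, not the method used to build the slide's table: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names and the search range (0 to 50) are assumptions made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=1000):
    """Area to the right of t (for t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(upper_tail_area, df):
    """Find t whose upper-tail area equals upper_tail_area, by bisection."""
    low, high = 0.0, 50.0   # assumed wide enough for common tail areas, df >= 2
    for _ in range(50):
        mid = (low + high) / 2
        if t_upper_tail(mid, df) > upper_tail_area:
            low = mid    # tail still too large: move right
        else:
            high = mid
    return (low + high) / 2

# 2.5% upper-tail critical values for a few df values
table = {df: t_critical(0.025, df) for df in (4, 9, 29, 1000)}
for df, t in table.items():
    print(f"df = {df}: t = {t:.3f}")
```

As df grows, the critical value shrinks toward the normal value of 1.96, which is the "fatter for smaller df" behavior described above.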
14

Figure 3
15
Figure 4
16

The alternative is a one-sided hypothesis. It would
be two-sided if we believed that the antibiotic
could either lengthen or shorten the healing time
(H1: μa ≠ μ).
We alter the treatment only for significance in
the positive tail and not in the negative tail.
Therefore a one-tailed test is appropriate.

Confidence Interval on a Mean - Known Standard
Deviation
Assume that we are given the population mean μ = 35
ml and population SD σ = 16 ml.
Suppose that we did not know μ, but want to
describe it from 10 sample values.
We can calculate the sample mean m from the 10
values; however, it would be meaningful to know the
standard deviation of the sample mean (SEM) to
gauge our confidence in the sample mean.
The SEM σm is given by σ/√n.
We would estimate μ by calculating the sample mean,
m (= 32.73).
The SEM will be required, which is σm = σ/√n =
16/√10 = 5.06 ml.
We will use these numbers to put a confidence interval on
μ from the sample mean m.

18
To find a confidence interval with 2.5% of
unusual cases in each tail, we can
calculate the end points of the interval as m ± 1.96
σm = 32.73 ± 1.96 × 5.06 = 22.81 to 42.65
ml. We are 95% confident that the population mean
μ is included in the interval 22.81 - 42.65.
Indeed, the population mean is 36.47
ml. Example: An orthopedist is experimenting with
the use of nitrox as an anesthetic in the
treatment of children's arm fractures. He
anticipates that it may provide an attractive,
short procedure. He treats 50 children and
records the treatment time in minutes: m = 26.26
min and σ = 7.13 min. He wants to calculate a 95%
confidence interval on the population mean
treatment time.
19
He will require the SEM for the confidence interval
on the population mean: σm = σ/√n = 7.13/√50
= 1.008 min. The confidence interval will be
defined as follows: P[m − 1.96 σm < μ < m +
1.96 σm] = P[26.26 − 1.96 × 1.008 < μ <
26.26 + 1.96 × 1.008] = P[24.28 < μ < 28.24]
= 0.95. This means that the orthopedist is 95%
confident that the population mean treatment time, μ,
lies between 24 and 28 minutes. Next, if we
do not know the population standard deviation and
have to estimate it from a small sample, we have to
use the t distribution. Now we
will use the sample mean, m, and sample SD, s,
from the sample (rather than the population mean μ and
population SD σ), for a sample of 10.
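
Before moving on to the t distribution, the known-SD interval worked through above can be checked numerically. The following is a minimal sketch with a hypothetical helper, `z_interval`, applied to the treatment-time example and to the earlier prostate volume sample (σ = 16 ml, n = 10).

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, population_sd, n, confidence=0.95):
    """Confidence interval on a mean when the population SD is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * population_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = z_interval(26.26, 7.13, 50)
print(f"treatment time: {low:.2f} to {high:.2f} min")

p_low, p_high = z_interval(32.73, 16.0, 10)
print(f"prostate volume: {p_low:.2f} to {p_high:.2f} ml")
```

Passing a different `confidence` (for example 0.99) widens the interval accordingly.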
20
Now, for the sample of 10, m = 32.7 and s = 15.9 (say,
given), with df = 9. Now look into the table of the
t distribution for df = 9. Let us examine the
prostate volume of 59 ml. This yields t =
(59 − 32.7)/15.9 = 1.65, which places it 1.65 SD
above the mean. On the t distribution, it lies
between α = 0.10 and 0.05 (t between 1.383 and
1.833 in the df = 9 row), which means that between
5% and 10% of patients will have a prostate
volume greater than this patient's. Similarly, the 83
ml prostate will yield t = 3.16, which will fall
between α = 0.01 and 0.005, which implies that less
than 1% of patients will have a prostate volume
that large. The actual figures are α = 0.067 for t =
1.65 and α = 0.006 for t = 3.16.
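
Reading a tail area off a t table amounts to finding the two tabulated critical values that bracket the observed t. A small sketch of that lookup follows; the df = 9 row is typed in from standard one-tailed t tables, and the function name is hypothetical.

```python
# One-tailed critical values for df = 9, as found in standard t tables.
T_TABLE_DF9 = [
    (0.10, 1.383),
    (0.05, 1.833),
    (0.025, 2.262),
    (0.01, 2.821),
    (0.005, 3.250),
]

def bracket_tail_area(t, table=T_TABLE_DF9):
    """Return (lower, upper) bounds on the one-tailed area beyond t."""
    upper = 1.0   # loose bound used when t is below every tabulated value
    for alpha, critical in table:
        if t < critical:
            return alpha, upper
        upper = alpha
    return 0.0, upper   # t exceeds every tabulated critical value

mean_ml, sd_ml = 32.7, 15.9   # sample mean and SD from the slide
for volume in (59, 83):
    t = (volume - mean_ml) / sd_ml
    low, high = bracket_tail_area(t)
    print(f"{volume} ml: t = {t:.2f}, tail area between {low} and {high}")
```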
21
Suppose we want to use the 10 observations to put a 95%
confidence interval on the population mean
prostate volume, with m = 32.73 ml and s = 15.92
ml. Since σ is not available, first you have to
calculate sm from s: sm = s/√n = 15.92/√10
= 15.92/3.16 = 5.03 ml. To find the t value for
95% confidence, you have to look under 0.95 for
two-tailed (1 − α) in the table for df = 9 to
find t(1 − α/2) = 2.262. Because here you have
to use the sample SEM rather than the population SEM,
the confidence interval based on the t distribution
is wider and extends to 2.262 (instead of 1.96) standard
errors, and this interval for 95% confidence will be P[m
− t(1 − α/2) sm < μ < m + t(1 − α/2)
sm] = P[32.73 − 2.262 × 5.03 < μ < 32.73 +
2.262 × 5.03]
22
= P[21.35 < μ < 44.11] = 0.95. The actual value
of μ is 36.47 ml, which is included in this 95%
confidence interval. The Chi-Squared
Distribution: Suppose that the amount of active
ingredient in a medicinal capsule is crucial.
Less than μ − c mg fails to work and more than μ +
c mg damages the patient. We need a confidence
interval on c, in terms of multiples of the SD, to be
confident that the probable variability in content
is not too large. Normally, confidence
intervals on variability are found in terms of the
variance and afterwards converted to SD
units. The sample variance, s², drawn randomly
from the population, when
23
multiplied by the constant df/σ², follows a
chi-squared (χ²) distribution. So we need the χ²
distribution to find a confidence interval on the
variance. The table in Figure 6 provides the χ²
values that yield commonly used values of α, the
probability that a randomly drawn value from the
χ² distribution lies in
the tail demarcated by the tabulated
value. The χ² distribution is shown in
Figure 5. Example: Using the sample of 10 prostate
volumes and s, what is the 95% confidence
interval on the population standard deviation,
which is 16.35 ml? We have calculated the sample
SD, s = 15.92 ml, and df = 9. We shall find the
interval on the population variance σ²: s² =
15.92² = 253.45.
24
Figure 5: Chi-squared distribution for df = 9
25

Figure 6 Chi Squared Distribution, Right Tail
26
Look in the table of the chi-squared distribution
for df = 9. For the right tail area of 2.5% we
find 19.02, and for the left tail it is
2.70 (the distribution is not symmetric). By
substituting these values: P[253.45 ×
9/19.02 < σ² < 253.45 × 9/2.70] = P[119.93 < σ²
< 844.83] = 0.95. Taking the square root, P[10.95
< σ < 29.07] = 0.95. The population SD is
16.35 ml, and it falls within the confidence
interval.
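
The interval on the variance and its conversion to SD units can be sketched as below. The chi-squared critical values are passed in as looked-up table values (19.02 and 2.70 from the text), not computed; the function name is hypothetical.

```python
import math

def variance_interval(sample_sd, df, chi2_right, chi2_left):
    """Confidence interval on the population variance.

    chi2_right and chi2_left are the tabulated chi-squared critical values
    cutting off the right and left tails (looked up, not computed here).
    """
    s2 = sample_sd ** 2
    return s2 * df / chi2_right, s2 * df / chi2_left

# Values from the slide: s = 15.92 ml, df = 9, table values 19.02 and 2.70
var_low, var_high = variance_interval(15.92, 9, chi2_right=19.02, chi2_left=2.70)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"variance: {var_low:.2f} to {var_high:.2f}")
print(f"SD: {sd_low:.2f} to {sd_high:.2f} ml")
```

Because the chi-squared distribution is not symmetric, the resulting interval is not centered on s.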
27
Example 2: We are investigating the reliability of a
certain brand of thermometer. The standard
deviation will give an indication of its
precision. We want an upper 95% confidence limit
on this precision. This is the value that the SD
will exceed no more than 5 times in 100.
s = 1.23 degrees, and s² = 1.23² =
1.51. Sixteen readings were taken, and they form the sample
set. For the upper 95% limit, we will use the
right side of the equation.
28
which gives us P[σ² < 1.51 × 15 / 7.26] =
P[σ² < 3.12] = 0.95, or P[σ < 1.77] = 0.95. The
χ² value for the given distribution (df = 15) is 7.26.
This means that we are 95% sure that the
population SD of this thermometer will not
exceed 1.77 °F. Example 3: We know that for the
prostate volumes, the population SD is 16.35
ml. We ask the question: how large a standard
deviation s would we have to observe in a sample
of size 10 to have a 5% or less probability that
it could have been so large by chance alone?
29
From the table (Figure 6), the critical
chi-squared value for a 5% right-hand tail with n − 1 = 9 df is
16.9, which gives 16.9 = 9 s²/16.35². After
solving this, you get s² = 501.98, or s =
22.4 ml. This implies that a sample standard
deviation exceeding 22.4 ml would have less than a
5% probability of occurring by chance alone.
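
Solving the chi-squared relation for s is a one-step rearrangement. A brief sketch, using the tabulated value 16.9 from the text and a hypothetical function name:

```python
import math

def critical_sample_sd(population_sd, df, chi2_critical):
    """Sample SD at which df * s^2 / sigma^2 equals the chi-squared critical value."""
    s2 = chi2_critical * population_sd ** 2 / df
    return math.sqrt(s2)

# sigma = 16.35 ml, df = 9, tabulated 5% right-tail value 16.9 (from the slide)
s_crit = critical_sample_sd(16.35, 9, 16.9)
print(f"critical sample SD: {s_crit:.1f} ml")
```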
30

Concept and Practice in Hypothesis Testing
Suppose that we want to know whether a new
antibiotic reduces the time for a particular type of
lesion to heal.
We already know the mean time to heal in the
population without the antibiotic, say μ.
Our clinical hypothesis is that the antibiotic
reduces the healing time:
μa < μ
To be sure that our comparison gives us a
believable answer, we have to go through the
formal hypothesis testing procedure.
From the data, we calculate a value called a
statistic, which will answer the statistical
question implied by the hypothesis.
- To answer the time-to-heal question, compare μ
with μa.

Such statistics are calculated from data that
follow probability distributions. These
distributions are assumed to be normal.
In most cases we start with the null hypothesis (our
sample is no different from known information).
We hypothesize that the population mean time to
heal with the antibiotic, μa (which we estimate from
our sample mean), is no different from μ.
After forming the null hypothesis, we can form
the alternate hypothesis, stating the nature
of the difference, if it should appear.
The null hypothesis is represented by H0 and the
alternate by H1.
H0: μa = μ, and H1: μa < μ.