Recall the hypothesis test we did last time in Class Exercise

Provided by: genespr
1
Recall the hypothesis test we did last time in
Class Exercise 2(a) in Class Handout 4
2
2.
Measurements of body temperature (BT) in degrees
Fahrenheit and heart rate (HR) in beats per
minute were derived from a data set presented in
Mackowiak, P. A., Wasserman, S. S., and Levine,
M. M. (1992), "A Critical Appraisal of 98.6
Degrees F, the Upper Limit of the Normal Body
Temperature, and Other Legacies of Carl Reinhold
August Wunderlich," Journal of the American
Medical Association, 268, 1578-1580. The
resulting data is as follows:

Males
BT  96.3  96.7  96.9  97.0  97.1  97.1  97.1  97.2  97.3  97.4  97.4
HR    70    71    74    80    73    75    82    64    69    70    68

BT  97.4  97.4  97.5  97.5  97.6  97.6  97.6  97.7  97.8  97.8  97.8
HR    72    78    70    75    74    69    73    77    58    73    65

BT  97.8  97.9  97.9  98.0  98.0  98.0  98.0  98.0  98.0  98.1  98.1
HR    74    76    72    78    71    74    67    64    78    73    67

BT  98.2  98.2  98.2  98.2  98.3  98.3  98.4  98.4  98.4  98.4  98.5
HR    66    64    71    72    86    72    68    70    82    84    68

BT  98.5  98.6  98.6  98.6  98.6  98.6  98.6  98.7  98.7  98.8  98.8
HR    71    77    78    83    66    70    82    73    78    78    81

BT  98.8  98.9  99.0  99.0  99.0  99.1  99.2  99.3  99.4  99.5
HR    78    80    75    79    81    71    83    63    70    75
3
Females
BT  96.4  96.7  96.8  97.2  97.2  97.4  97.6  97.7  97.7  97.8  97.8
HR    69    62    75    66    68    57    61    84    61    77    62

BT  97.8  97.9  97.9  97.9  98.0  98.0  98.0  98.0  98.0  98.1  98.2
HR    71    68    69    79    76    87    78    73    89    81    73

BT  98.2  98.2  98.2  98.2  98.2  98.3  98.3  98.3  98.4  98.4  98.4
HR    64    65    73    69    57    79    78    80    79    81    73

BT  98.4  98.4  98.5  98.6  98.6  98.6  98.6  98.7  98.7  98.7  98.7
HR    74    84    83    82    85    86    77    72    79    59    64

BT  98.7  98.7  98.8  98.8  98.8  98.8  98.8  98.8  98.8  98.9  99.0
HR    65    82    64    70    83    89    69    73    84    76    79

BT  99.0  99.1  99.1  99.2  99.2  99.3  99.4  99.9  100.0  100.8
HR    81    80    74    77    66    68    77    79     78     77

A 0.10 significance level is selected to see if
there is any evidence that the mean heart rate
for males is different from 72 beats per minute.
4
2.-continued
This summarizes how we got the SPSS output last
class.
(a)
The 65 males in the data set will be treated as a
random sample. Use SPSS to do the calculations
necessary for the hypothesis test and to create
an appropriate graphical display. Then, complete
the four steps of the hypothesis test by
completing the table titled Hypothesis Test About
Mean Heart Rate of Males. The data is stored in
the SPSS data file metabolism. Before using the
Analyze > Compare Means > One Sample T Test
options in SPSS, we must first select only the
males in that data set as follows. Select the
Data > Select Cases options to display the Select
Cases dialogue box, and select the If condition
is satisfied option. Click on the If button to
display the Select Cases: If dialogue box. From
the list of variables on the left, select the
variable sex, and click on the arrow button
pointing to the right. Either by use of the
buttons in the dialog box or by direct typing,
finish the formula so that it reads sex = 1.
Click on the Continue button, and click on the
OK button, after which you will notice that a
new variable has been added to indicate which
cases are to be included and which are to be
excluded. In case we reject H0 and want to
estimate the mean with a confidence interval, set
the confidence level in SPSS to be 90%, since we
have α = 0.10.
5
This is the SPSS output we obtained
6
This is the hypothesis test we performed
Hypothesis Test About Mean Heart Rate of Males

Step 1
  H0: μ = 72
  H1: μ ≠ 72
  α = 0.10 (two sided)
Step 2
  n = 65, ȳ = 73.37, s = 5.875
  t = 1.879 (written t64, since df = 64)
  These statistics can all be obtained from the
  SPSS output.
Step 3
  t distribution with df = 64; t0.05 = 1.671
  p-value from the Student's t distribution table:
  0.05 < p < 0.10
  p-value from the SPSS output: p = 0.065
  Decision: reject H0
Step 4
  Since t64 = 1.879 and t64;0.05 = 1.671, we have
  sufficient evidence to reject H0. We conclude
  that the mean heart rate for males is different
  from 72 beats per minute (0.05 < p < 0.10), or
  (p = 0.065). The data suggest that the mean
  heart rate for males is larger than 72 beats
  per minute.
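The test statistic in Step 2 can be checked by hand from the summary statistics. The following is a minimal sketch in Python (not part of the SPSS workflow used in class); the variable names are illustrative, and the critical value 1.671 is the table value quoted on the slide.

```python
import math

# Summary statistics reported on the slide (from the SPSS output).
n = 65          # number of males
y_bar = 73.37   # sample mean heart rate (beats per minute)
s = 5.875       # sample standard deviation
mu_0 = 72.0     # hypothesized mean under H0

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
standard_error = s / math.sqrt(n)
t = (y_bar - mu_0) / standard_error

# A two-sided test at alpha = 0.10 puts 0.05 in each tail; 1.671 is the
# table value t_0.05 quoted on the slide.
t_critical = 1.671
reject_h0 = abs(t) > t_critical

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

Because 73.37 is itself a rounded value, the result can differ from the reported 1.879 in the third decimal place; the decision is the same.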
7
2.-continued
For today, you were supposed to try to answer
parts (b) to (e)
(b)
Considering the results of the hypothesis test,
decide which of the Type I or Type II errors is
possible, and describe this error.
Since H0 is rejected, the Type I error is
possible, which is concluding that μ ≠ 72 when
actually μ = 72.
(c)
Decide whether H0 would have been rejected or
would not have been rejected with each of the
following significance levels: (i) α = 0.05,
(ii) α = 0.01.
H0 would not have been rejected with either
α = 0.05 or α = 0.01, since p = 0.065 is larger
than both.
(d)
Does the difference between the sample mean heart
rate and the hypothesized mean heart rate
represent a clinically significant difference?
Why or why not?
It is a matter of judgment whether or not the
difference between 73.37 beats per minute and 72
beats per minute has practical impact.
(e)
Considering the results of the hypothesis test,
explain why a 90% confidence interval for the
mean heart rate for males would be of interest.
Then find and interpret the confidence interval.
Since rejecting H0 suggests that the hypothesized
value for the mean is not correct, a 90%
confidence interval will provide us with some
information about the mean.
Look at the SPSS output to obtain the limits of
the confidence interval.
8
Note that SPSS displays a confidence interval for
μ − μ0 instead of a confidence interval for μ.
To get the limits for a confidence interval for
μ, we must add the value of the hypothesized mean
μ0 to each of the displayed limits.
9
(e)
From the SPSS output, we obtain the confidence
interval limits (adding the hypothesized mean 72
to each displayed limit), and write the results:
We are 90% confident that the mean heart rate for
males is between 72.15 and 74.59 beats per minute.
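The interval can be reproduced from the summary statistics. This is a sketch only: it uses the table value 1.671 quoted earlier in the slides as the critical value (SPSS itself would use the exact value for df = 64), and the variable names are illustrative. It also shows the step of adding the hypothesized mean back to the limits for the difference.

```python
import math

n, y_bar, s = 65, 73.37, 5.875   # summary statistics from the slides
mu_0 = 72.0                      # hypothesized mean
t_crit = 1.671                   # table value t_0.05 used on the slides

margin = t_crit * s / math.sqrt(n)

# Limits in the form SPSS displays them: an interval for mu - mu_0.
diff_lower = (y_bar - mu_0) - margin
diff_upper = (y_bar - mu_0) + margin

# Adding mu_0 back gives the interval for mu itself.
lower = diff_lower + mu_0
upper = diff_upper + mu_0

print(f"{lower:.2f} to {upper:.2f}")
```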
10
3.
Two identical footballs, one air-filled and one
helium-filled, were used outdoors on a windless
day at the Ohio State University's athletic
complex to compare the distances traveled by the
ball with helium and with air. A novice punter,
who was not informed which football contained the
helium, was the kicker. He kicked each football
39 times and changed footballs after each kick so
that his leg would play no favorites if he tired
or improved with practice. The experimenter
recorded the distance in yards traveled by each
ball. The following data are reported by
Lafferty, M. B. (1993), "OSU scientists get a
kick out of sports controversy," The Columbus
Dispatch (November 21, 1993), B7:

Trial    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Air     25  23  18  16  35  15  26  24  24  28  25  19  27  25  34  26
Helium  25  16  25  14  23  29  25  26  22  26  12  28  28  31  22  29

Trial   17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
Air     20  22  33  29  31  27  22  29  28  29  22  31  25  20  27  26
Helium  23  26  35  24  31  34  39  32  14  28  30  27  33  11  26  32

Trial   33  34  35  36  37  38  39
Air     28  32  28  25  31  28  28
Helium  30  29  30  29  29  30  26
11
A 0.05 significance level is chosen to see if
there is any evidence that the mean distance
traveled by a football is larger with helium than
with air.
(a)
State whether a paired t test or a two sample t
test should be used and why.
Since the data consists of one sample of paired
measurements, a paired t test should be used.
Return to the definitions to see a summary of the
paired t test:

one-sample t test about a mean difference μd
(often called a paired t test)
The H0 states that a population mean difference
μd is equal to 0 (zero).
The H1 is a statement that μd = 0 is not correct.
The test statistic is
  t = (ȳd − 0) / (sd / √nd)
(sometimes written tn−1).
The data consists of one random sample of n
quantitative paired measurements.
(Note that this is essentially the same as the
one-sample t test where the hypothesized mean is
zero, and data consists of differences between
paired measurements.)

one sample confidence interval for a mean
difference μd
We can be (1 − α)100% confident that
the population mean difference μd is between
  ȳd − tα/2 (sd / √nd)
and
  ȳd + tα/2 (sd / √nd).
The data consists of one random sample of n
quantitative paired measurements.
12
3.-continued
(b)
Use SPSS to do the calculations necessary for the
hypothesis test and to create an appropriate
graphical display. Then, complete the four steps
of the hypothesis test by completing the table
titled Hypothesis Test About Mean Football
Distance. The data is stored in the SPSS data
file football. When using the Analyze > Compare
Means > Paired-Samples T Test options in SPSS,
two variables must be selected for the Paired
Variables section. In case we reject H0 and want
to estimate the mean difference with a confidence
interval, set the confidence level in SPSS to be
95%, since we have α = 0.05.
13
Two box plots, one for the distances with helium
and one for the distances with air, would be an
appropriate graphical display for one sample of
quantitative paired measurements; one box plot of
the differences would also be appropriate.
Since we notice that there are several outliers
among the differences, we might decide to perform
the paired t test with these outliers removed
from the data. (See part (f).)
Each outlier is labeled with its case number
(i.e., its line number in the SPSS data file).
14
Hypothesis Test About Mean Football Distance

Step 1
  H0: μH−A = 0
  H1: μH−A > 0
  α = 0.05 (one sided)
  Note that the order of subtraction on the SPSS
  output is the opposite of the order chosen in H0.
Step 2
  n = 39, ȳH−A = 0.462, sH−A = 6.867
  t = 0.420 (written t38, since df = 38)
  These statistics can all be obtained from the
  SPSS output.
Step 3
  t distribution with df = 38; t0.05 = 1.684
  p-value from the Student's t distribution table:
  0.10 < p
  p-value from the SPSS output: p = 0.677/2 = 0.3385
  Decision: do not reject H0
Step 4
  Since t38 = 0.420 and t38;0.05 = 1.684, we do not
  have sufficient evidence to reject H0. We
  conclude that the mean distance traveled by a
  football is not larger with helium than with air
  (0.10 < p), or (p = 0.3385).
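The Step 2 statistics can be recomputed directly from the 39 pairs of distances. Below is a minimal sketch in Python using only the standard library (not the SPSS procedure used in class); the list and variable names are illustrative, and the differences are taken as helium minus air to match H1.

```python
import math
import statistics

# Distances in yards, in trial order (1 to 39), as listed on the slide.
air = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27, 25, 34, 26,
       20, 22, 33, 29, 31, 27, 22, 29, 28, 29, 22, 31, 25, 20, 27, 26,
       28, 32, 28, 25, 31, 28, 28]
helium = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28, 31, 22, 29,
          23, 26, 35, 24, 31, 34, 39, 32, 14, 28, 30, 27, 33, 11, 26, 32,
          30, 29, 30, 29, 29, 30, 26]

# Paired differences, helium minus air, matching H1: mean difference > 0.
diffs = [h - a for h, a in zip(helium, air)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)            # sample standard deviation (n - 1)
t = (mean_d - 0) / (sd_d / math.sqrt(n))  # paired t statistic, df = n - 1

print(f"n = {n}, mean = {mean_d:.3f}, sd = {sd_d:.3f}, t = {t:.3f}")
```

Reversing the order of subtraction (air minus helium, as on the SPSS output) flips the sign of the mean difference and of t but leaves their magnitudes unchanged.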
15
3.-continued
(c)
Considering the results of the hypothesis test,
decide which of the Type I or Type II errors is
possible, and describe this error.
Since H0 is not rejected, the Type II error is
possible, which is concluding that μH−A = 0 when
actually μH−A > 0.
(d)
Decide whether H0 would have been rejected or
would not have been rejected with each of the
following significance levels: (i) α = 0.01,
(ii) α = 0.10.
H0 would not have been rejected with α = 0.01 nor
with α = 0.10.
(e)
Considering the results of the hypothesis test,
explain why a confidence interval for the mean
difference is not of interest.
Since H0 is not rejected, we have no reason to
doubt the hypothesized value for the mean
difference; in fact, the 95% confidence interval
will most likely contain the hypothesized value
for the mean difference.
Note that the limits of the confidence interval
(displayed on the SPSS output) contain the
hypothesized value for the mean difference, zero
(0).
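The claim that the interval contains zero can be checked from the summary statistics. This is a sketch under one stated assumption: the critical value t0.025 = 2.024 for df = 38 does not appear on the slides and is supplied here from a standard t table. Differences are taken as helium minus air, so the limits have the opposite signs from those on the SPSS output.

```python
import math

n, mean_d, sd_d = 39, 0.462, 6.867   # helium minus air, from the slides
t_crit = 2.024                       # assumed t_0.025 for df = 38 (not on the slides)

margin = t_crit * sd_d / math.sqrt(n)
lower, upper = mean_d - margin, mean_d + margin

contains_zero = lower < 0 < upper
print(f"95% CI: {lower:.2f} to {upper:.2f}; contains 0: {contains_zero}")
```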
16
(f)
If the paired t test is performed after removing
the outliers among the differences in this data,
the resulting p-value is 0.071. Would this
change the conclusion in the hypothesis test
(i) with the significance level α = 0.05
actually selected? (ii) if a significance
level α = 0.01 had been selected? (iii) if
a significance level α = 0.10 had been selected?
(i) H0 is not rejected with α = 0.05 both when the
outliers are removed from the data and when the
outliers are included in the data.
(ii) H0 is not rejected with α = 0.01 both when the
outliers are removed from the data and when the
outliers are included in the data.
(iii) H0 is rejected with α = 0.10 when the outliers
are removed from the data, but H0 is not rejected
when the outliers are included in the data.
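The three answers follow from comparing each p-value with each significance level. A small sketch of that comparison, using the two p-values given in the slides (the dictionary labels are illustrative):

```python
# One-sided p-values from the slides.
p_values = {"outliers included": 0.3385, "outliers removed": 0.071}
alphas = [0.01, 0.05, 0.10]

# H0 is rejected when the p-value is smaller than the significance level.
decisions = {
    (label, alpha): p < alpha
    for label, p in p_values.items()
    for alpha in alphas
}

for (label, alpha), reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"{label}, alpha = {alpha:.2f}: {verdict}")
```

Only the combination of outliers removed and α = 0.10 leads to rejection.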
17
4.
On the west coast of the United States is a chain
of restaurants known as McDoogle's. Information
about the past year is gathered for a random
sample of restaurants in the northern part of the
chain, and for a random sample of restaurants in
the southern part of the chain; however, for some
restaurants, the number of customers for the past
year was not available. The variables recorded
in the data set are a restaurant identification
number (ID), the part of the chain in which the
restaurant is located (LOCATION), millions of
dollars of expenses (EXPEN), millions of dollars
of sales (SALES), and millions of customers
(CUSTOMER). (Of course, the variable ID is
intended only as a label and not intended to be
part of any statistical analysis.) The resulting
data is as follows (n/a marks a customer count
that was not available):

ID  LOCATION  EXPEN  SALES  CUSTOMER
01  South     1.0    0.1    3.5
02  North     1.2    1.8    4.4
03  South     2.8    4.0    n/a
04  North     1.9    6.1    n/a
05  South     0.3    5.3    4.2
06  South     1.5    4.0    0.5
07  South     3.4    7.4    5.1
08  North     1.0    3.6    3.7
09  South     1.6    2.2    n/a
10  North     0.9    7.5    3.5
18
ID  LOCATION  EXPEN  SALES  CUSTOMER
11  North     1.6    4.6    3.1
12  North     1.9    8.0    n/a
13  North     0.6    3.3    3.8
14  North     1.0    2.5    3.5
15  South     2.5    2.6    2.4
16  North     1.0    8.1    3.9
17  South     1.9    1.7    1.5
18  North     1.4    6.7    3.5
19  North     0.7    3.8    n/a
20  North     1.0    5.2    4.4
21  North     1.3    7.8    n/a
22  North     2.4    5.1    4.7
23  North     1.2    7.8    3.7
24  North     1.7    7.7    4.5
25  South     1.4    2.8    3.7
26  North     1.0    4.9    3.5
27  North     1.2    8.0    n/a
28  South     2.3    2.0    2.9
29  North     2.2    2.5    4.1
30  North     0.8    5.1    4.2
19
4.-continued (a)
A 0.05 significance level is chosen to see if
there is any evidence that the mean number of
customers is larger for the northern chain than
for the southern chain.
State whether a paired t test or a two sample t
test should be used and why.
Since the data consists of two independent random
samples of measurements, a two sample t test
should be used.
Return to the definitions to see a summary of the
two sample t test
20
two sample t test about a difference between
means μ1 − μ2
The H0 states that the difference μ1 − μ2 is 0
(zero).
The H1 is a (one-sided or two-sided) statement
that the hypothesized difference 0 (zero) is not
correct.
A two sample t test statistic is of the form
  t = (ȳ1 − ȳ2) / (standard error)
The data consists of two independent random
samples, one with n1 quantitative measurements
and the other with n2 quantitative measurements.
The formula for estimated standard error and the
degrees of freedom given in Section 1.10 of the
textbook is based on the assumption that the two
sampled populations have the same standard
deviations.

two sample confidence interval for a difference
between means μ1 − μ2
We can be (1 − α)100% confident that
the difference between means μ1 − μ2 is between
  (ȳ1 − ȳ2) − tα/2 (standard error)
and
  (ȳ1 − ȳ2) + tα/2 (standard error).
The data consists of two independent random
samples, one with n1 quantitative measurements
and the other with n2 quantitative measurements.
21
The appropriate formula for estimated standard
error and the appropriate degrees of freedom are
different when the two sampled populations do not
have the same standard deviations; the formula
for degrees of freedom is very complicated but is
easily available from statistical software (such
as with SPSS or a TI-84 calculator).
The approach based on the assumption that the two
sampled populations have the same standard
deviations is called the pooled approach. The
approach that does not assume that the two
sampled populations have the same standard
deviations is called the separate approach. Since
both approaches give virtually identical results
when the population standard deviations are
indeed equal, one can always use the separate
approach (in place of the textbook approach).
22
4.-continued
(b)
Use SPSS to do the calculations necessary for the
hypothesis test and to create an appropriate
graphical display. Then, complete the four steps
of the hypothesis test by completing the table
titled Hypothesis Test About Difference in Mean
Number of Customers. The data is stored in the
SPSS data file chain. In case we reject H0 and
want to estimate the difference in mean number of
customers with a confidence interval, set the
confidence level in SPSS to be 95%, since we have
α = 0.05. When using the Analyze > Compare Means
> Independent-Samples T Test options in SPSS,
selecting the three variables expenses, sales,
and customrs for the Test Variable(s) section
will produce output for three hypothesis tests,
one for this hypothesis test and the other two
for hypothesis tests in a homework assignment.
Also, location must be selected for the Grouping
Variable slot, and the Define Groups button must
be used to enter 1 (one) for Group 1 and 2 (two)
for Group 2.
Two box plots, one for the northern chain and one
for the southern chain, would be appropriate.
23
4.-continued
25
Hypothesis Test About Difference in Mean Number
of Customers

Step 1
  H0: μN − μS = 0
  H1: μN − μS > 0
  α = 0.05 (one sided)
  We shall take the approach of always using the
  separate t test.
Step 2
  nN = 15, ȳN = 3.900, sN = 0.4629
  nS = 8, ȳS = 2.975, sS = 1.4859
  t = 1.717 (written t8, since df = 8)
  These statistics can all be obtained from the
  SPSS output.
Step 3
  t distribution with df = 8; t0.05 = 1.860
  p-value from the Student's t distribution table:
  0.05 < p < 0.10
  p-value from the SPSS output: p = 0.126/2 = 0.063
  Decision: do not reject H0
Step 4
  Since t8 = 1.717 and t8;0.05 = 1.860, we do not
  have sufficient evidence to reject H0. We
  conclude that the mean number of customers is not
  larger for the northern chain than for the
  southern chain (0.05 < p < 0.10), or (p = 0.063).
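The Step 2 statistics can be recomputed from the CUSTOMER column of the data, leaving out the restaurants whose customer count was not available. The following is a minimal sketch in Python (not the SPSS procedure used in class); the list names are illustrative, and the standard error shown is the separate (unpooled) form, assumed to be the one behind the t statistic on the slide.

```python
import math
import statistics

# CUSTOMER values (millions) for restaurants where the count was available.
north = [4.4, 3.7, 3.5, 3.1, 3.8, 3.5, 3.9, 3.5, 4.4, 4.7, 3.7, 4.5, 3.5, 4.1, 4.2]
south = [3.5, 4.2, 0.5, 5.1, 2.4, 1.5, 3.7, 2.9]

n_n, n_s = len(north), len(south)
mean_n, mean_s = statistics.mean(north), statistics.mean(south)
sd_n, sd_s = statistics.stdev(north), statistics.stdev(south)

# Separate (unpooled) standard error of the difference between the means.
standard_error = math.sqrt(sd_n ** 2 / n_n + sd_s ** 2 / n_s)
t = (mean_n - mean_s) / standard_error

print(f"North: n = {n_n}, mean = {mean_n:.3f}, sd = {sd_n:.4f}")
print(f"South: n = {n_s}, mean = {mean_s:.3f}, sd = {sd_s:.4f}")
print(f"t = {t:.3f}")
```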
26
4.-continued
(c)
Considering the results of the hypothesis test,
decide which of the Type I or Type II errors is
possible, and describe this error.
Since H0 is not rejected, the Type II error is
possible, which is concluding that μN − μS = 0
when actually μN − μS > 0.
(d)
Decide whether H0 would have been rejected or
would not have been rejected with each of the
following significance levels: (i) α = 0.01,
(ii) α = 0.10.
H0 would not have been rejected with α = 0.01 but
would have been rejected with α = 0.10.
(e)
Considering the results of the hypothesis test,
explain why a confidence interval for the
difference between means is not of interest.
Since H0 is not rejected, we have no reason to
believe that there is a difference between the
means; in fact, the 95% confidence interval will
most likely contain zero (0), the hypothesized
difference between means.
Note that the limits of the confidence interval
(displayed on the SPSS output) contain the
hypothesized difference between means, zero (0).
27
(f)
Using the SPSS output, compare the p-value
corresponding to the separate t test statistic
actually used in the hypothesis test with the
p-value corresponding to the pooled t test
statistic. Explain whether or not the pooled t
test statistic would have given different results
than the separate t test statistic actually used
and why.
Since the p-value corresponding to the separate t
test statistic is 0.126/2 = 0.063, and the
p-value corresponding to the pooled t test
statistic is 0.035/2 = 0.0175, we see that with
α = 0.05 the pooled t test statistic would have
led us to reject H0 even though we did not reject
H0 with the separate t test statistic actually
used. The pooled t test statistic is not reliable
when the two samples are selected from
populations with significantly different standard
deviations. It appears from the box plots that
the standard deviations are significantly
different.
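The two p-values can be approximated without SPSS. This is a sketch under stated assumptions: the separate approach is taken to be the unpooled standard error with the Welch-Satterthwaite degrees of freedom, and the tail area is found by numerically integrating the t density with Simpson's rule, which is a convenience for this example and not how SPSS computes it. The helper names t_pdf and upper_tail are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the slides (group 1 = north, group 2 = south).
n1, y1, s1 = 15, 3.900, 0.4629
n2, y2, s2 = 8, 2.975, 1.4859
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2

# Separate approach: unpooled standard error, Welch-Satterthwaite df.
t_sep = (y1 - y2) / math.sqrt(v1 + v2)
df_sep = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Pooled approach: one common variance estimate, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
t_pool = (y1 - y2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pool = n1 + n2 - 2

p_sep = upper_tail(t_sep, df_sep)      # one-sided p-values
p_pool = upper_tail(t_pool, df_pool)
print(f"separate: t = {t_sep:.3f}, df = {df_sep:.2f}, p = {p_sep:.3f}")
print(f"pooled:   t = {t_pool:.3f}, df = {df_pool}, p = {p_pool:.4f}")
```

The pooled standard error is dominated by the larger northern sample, whose standard deviation is much smaller, so the pooled t statistic comes out larger and its p-value falls below 0.05.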
Before submitting Homework 5, check some of the
answers (if you haven't done so already) from the
link on the course schedule:
http://srv2.lycoming.edu/sprgene/M214/Schedule214.htm