Title: ChE 253K Lecture 11
1 ChE 253K Lecture 11
Hypothesis Testing, Comparing Means,
Paired Testing, Analysis of Variance
4/6/04, 4/8/04
2 Hypothesis Testing
Proof by contradiction: you assume the
opposite of what you want to prove.
Demonstrate that the assumption is false because it
leads to a contradiction, and conclude that what you
wanted to prove must (might) therefore be true.
3 Example: The smoke alarm again
H0: There is no fire
Ha: There is a huge fire consuming your home
4 How-to Rules for Hypothesis Testing
The process sequence is as follows:
1. Formulate H0 and Ha.
2. State (choose) test criteria or decision point.
3. Draw a picture and find the statistic at the decision point.
4. Calculate that statistic from the data.
5. Compare the calculated statistic to the decision point.
6. Report the engineering implication.
5 Fast Example
Claim: My tires last at least 28K miles.
Test 40 tires, get \( \bar{x} = 27{,}463 \) miles, \( s = 1{,}348 \) miles.
What can we say at \( \alpha = 0.01 \) (99% confidence)?
a) H0: \( \mu \ge 28\text{K} \) miles, Ha: \( \mu < 28\text{K} \) miles
b) We choose \( \alpha = 0.01 \).
c) We choose 1 side (greater than or at least) and use z, as \( n \ge 30 \): \( z_{0.01} = 2.33 \), so the test point is \( -2.33 \).
d) \[ Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{27{,}463 - 28{,}000}{1348/\sqrt{40}} = -2.52 \]
\( -2.52 < \) test point \( (-2.33) \), so we can reject H0.
It is very unlikely that \( \mu \ge 28\text{K} \); the true value is very likely less than 28K, at 99% confidence.
This could be called a z-test, a large-sample
test for a population mean.
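The arithmetic of the tire example can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative and not from the lecture, and the decision point comes from the standard library's `statistics.NormalDist` rather than from a z table.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the tire example.
n = 40           # tires tested
x_bar = 27_463   # sample mean, miles
s = 1_348        # sample standard deviation, miles
mu_0 = 28_000    # mean claimed under H0, miles
alpha = 0.01     # significance level (99% confidence)

# One-sided, lower-tail test: reject H0 when z_obs < -z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Large-sample statistic; s stands in for sigma because n >= 30.
z_obs = (x_bar - mu_0) / (s / sqrt(n))

reject_h0 = z_obs < -z_alpha

print(f"z_alpha = {z_alpha:.2f}")
print(f"z_obs   = {z_obs:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The computed statistic and decision point should agree with the 2.52 and 2.33 quoted on the slide, up to rounding.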
7 Small-Sample Test for a Population Mean: a simple
t-test
The Righteous Insurance Co. will not insure your
car unless the mean repair cost after a 10 mph
collision is significantly less than $1K. They
have crashed 5 cars like yours and the cost of
repairs averaged $650.00 with s = $299.00.
8 Let's do this step by step
1. H0: \( \mu \ge 1000 \) (cost is too high), Ha: \( \mu < 1000 \) (cost is ok)
2. We choose 95% confidence (\( \alpha = 0.05 \)).
3. Draw and find \( t_{0.05,4} \).
We want the observed t to lie below the test
value. The difference, \( \bar{x} - \mu \), should be negative
to support Ha (\( \mu < 1\text{K} \)). Could we
do this the other way around???
9 Continuing step 3
A mini t table
Statistic at decision point: we see that \( t_{0.05,4} = 2.13 \),
so we reject H0 if \( t_{obs} < -t_{0.05,4} = -2.13 \).
Homework 1. Finish this!
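10 Calculate Statistic from Data (step 4)
\[ t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{650 - 1000}{299/\sqrt{5}} \approx -2.62 \]
5. Compare the calculated statistic to the decision point
\[ t_{obs} \approx -2.62 < -t_{0.05,4} = -2.13 \]
6. State conclusions: We reject H0. We are more
than 95% confident that the average cost is not
greater than $1K ... sell the kid some insurance!
You get to do this again at home!!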
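The same steps can be sketched in Python for the insurance example. The decision point is hard-coded from the slide's mini t table (the standard library has no t quantile function), and the variable names are illustrative assumptions rather than anything from the lecture.

```python
from math import sqrt

# Numbers from the insurance example.
n = 5            # cars crashed
x_bar = 650.0    # mean repair cost
s = 299.0        # sample standard deviation of repair cost
mu_0 = 1000.0    # boundary value under H0
t_crit = 2.13    # t_{0.05,4}, read from the t table (not computed here)

# Small-sample statistic with n - 1 = 4 degrees of freedom.
t_obs = (x_bar - mu_0) / (s / sqrt(n))

# Lower-tail test: reject H0 when t_obs falls below -t_crit.
reject_h0 = t_obs < -t_crit

print(f"t_obs = {t_obs:.3f}, decision point = {-t_crit:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Swapping in a different sample size requires looking up a new `t_crit` for the new degrees of freedom.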
11 Inferences from 2 means ... analysis of differences
between two samples.
Theorem 7.1: If the distributions of two
independent random variables have the means
\( \mu_1 \) and \( \mu_2 \) and the variances \( \sigma_1^2 \) and
\( \sigma_2^2 \), then the distribution of their sum (or
difference) has the mean \( \mu_1 + \mu_2 \) (or \( \mu_1 - \mu_2 \))
and the variance \( \sigma_1^2 + \sigma_2^2 \).
This allows us to test, for example, two
manufacturers of the same product ... a common
sort of study in our business.
[Figure: Normal Distribution. Curves for \( \bar{x}_1 \) and \( \bar{x}_2 \) with variances \( \sigma_1^2 \) and \( \sigma_2^2 \); the combined curve has mean \( \mu_1 + \mu_2 \) (or \( \mu_1 - \mu_2 \)).]
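12 If \( \mu_1, \mu_2 \) are population means, then
\( \bar{x}_1 - \bar{x}_2 \) should be normally distributed with
a mean
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \]
and with a standard deviation
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\( \bar{x}_1 - \bar{x}_2 \) has a normal distribution.
So ... as usual,
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]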
13 An Example: My bulbs last longer than your bulbs
\( n_1 = 40,\ \bar{x}_1 = 647,\ s_1 = 27 \)
\( n_2 = 40,\ \bar{x}_2 = 638,\ s_2 = 31 \)
Are they different at \( \alpha = 0.05 \)?
a) H0: \( \mu_1 = \mu_2 \) or \( \mu_1 - \mu_2 = 0 \) (they are the same)
   Ha: \( \mu_1 - \mu_2 > 0 \) (they are different; 1 is better)
b) Choose \( \alpha = 0.05 \) or 95% confidence.
c) Here n is big and we know s, so we use z. The z for an area of \( 1 - 0.05 = 0.95 \) is \( z_{0.05} = 1.65 \).
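14 d)
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(647 - 638) - 0}{\sqrt{\dfrac{27^2}{40} + \dfrac{31^2}{40}}} = 1.38 \]
\( 1.38 < 1.65 \), so we can NOT reject H0.
This guy's claim to be better is .. ?
Please work through the coal mine example on
pages 260-261 carefully. It is messy arithmetic but you need to
see the concept and design of the test.
[Figure: normal curve with the observed z = 1.38 below the decision point z = 1.65, which cuts off \( \alpha = 0.05 \) in the upper tail.]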
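As a quick check of the bulb comparison, the two-sample z statistic can be computed directly. This is a sketch under the slide's assumptions (large samples, so the sample standard deviations stand in for the population values); the names are illustrative, and the decision point comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the bulb example.
n1, x1, s1 = 40, 647.0, 27.0   # my bulbs
n2, x2, s2 = 40, 638.0, 31.0   # your bulbs
alpha = 0.05
delta_0 = 0.0                  # mu1 - mu2 under H0

# Upper-tail decision point for a one-sided test.
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Standard deviation of the difference between the two sample means.
se_diff = sqrt(s1**2 / n1 + s2**2 / n2)

z_obs = ((x1 - x2) - delta_0) / se_diff
reject_h0 = z_obs > z_alpha

print(f"se_diff = {se_diff:.2f}")
print(f"z_obs   = {z_obs:.2f}, z_alpha = {z_alpha:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

With these numbers the observed z stays below the decision point, so H0 is not rejected.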
15 This is ok if \( n > 30 \). We can even use s
if we don't know \( \sigma \), but if \( n < 30 \),
we have to do t, and t is messy in this
case ... it requires pooling the variance ... see pages
260-261.
When \( n < 30 \): Pooled Variance
\[ t_{mess} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}} \]
where the degrees of freedom are \( n_1 + n_2 - 2 \).
So ... what is the value of this messy stuff??!
16 n is small and I don't know \( \sigma \)!!
Options are:
Punt, cheat, fake it ... we pick fake it.
If we believe that \( \sigma_1 = \sigma_2 \), then we pool the variances,
or use the form in the book on page 260,
with \( \nu = n_1 + n_2 - 2 \).
If we do not know \( \sigma \) or if the standard deviations
are different, computing \( \nu \) gets messy ... see
options on page 273.
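17 Here is book form
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
where
\[ s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
Example
We test heat generation from coal from two mines.
Samples are taken from mines 1 and 2.
Mine 1: \( n_1 = 5,\ \bar{x}_1 = 8{,}230,\ s_1^2 = 15{,}750 \)
Mine 2: \( n_2 = 6,\ \bar{x}_2 = 7{,}940,\ s_2^2 = 10{,}920 \)
Are the mines different??
We choose 99% or \( t_{0.005} \) as our criterion and \( \nu = 5 + 6 - 2 = 9 \).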
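18 H0: \( \mu_1 - \mu_2 = 0 \), Ha: \( \mu_1 - \mu_2 \ne 0 \)
Criterion: \( \alpha = 0.01 \), \( \nu = 5 + 6 - 2 = 9 \), \( t_{0.005,9} = 3.250 \)
Calculating: \( s_{pooled} = 114.3 \)
\[ t = \frac{8{,}230 - 7{,}940}{114.3\sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = 4.19 \]
[Figure: t distribution with \( \alpha/2 = 0.005 \) in the tail beyond the test value 3.25; the observed t = 4.19 lies beyond it.]
Since the calculated statistic, 4.19, is bigger
than the test value, 3.25, we reject the null
hypothesis ... there is little likelihood that the
coal from the two mines is the same.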
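The coal-mine numbers can be reproduced with the pooled-variance form of the t statistic. This sketch assumes the two dispersion values given for the mines are sample variances (15,750 and 10,920), which is the reading that reproduces the pooled standard deviation of 114.3. The decision point is the tabulated value quoted on the slide, and the names are illustrative.

```python
from math import sqrt

# Numbers from the coal-mine example (variances, not standard deviations).
n1, x1, var1 = 5, 8230.0, 15750.0   # mine 1
n2, x2, var2 = 6, 7940.0, 10920.0   # mine 2
t_crit = 3.250                      # t_{0.005,9}, read from the t table

# Degrees of freedom for the pooled estimate.
nu = n1 + n2 - 2

# Pooled standard deviation, assuming both mines share one variance.
s_pooled = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / nu)

# H0 says mu1 - mu2 = 0, so nothing is subtracted in the numerator.
t_obs = (x1 - x2) / (s_pooled * sqrt(1 / n1 + 1 / n2))

# Two-sided test: reject H0 when |t_obs| exceeds the decision point.
reject_h0 = abs(t_obs) > t_crit

print(f"nu = {nu}, s_pooled = {s_pooled:.1f}, t_obs = {t_obs:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```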
19 NOTE: In the analysis above, the populations are
independent. They must be so for this analysis
to be valid. If the samples are paired, then a
different approach is required. The analysis
above can NOT be used for before-and-after type
analysis ... unfortunately, we often work with
paired samples. A paired comparison experiment
is one of the most effective ways to reduce
natural variability while comparing treatments.
For example, in comparing hand creams, the two
brands are randomly assigned to each subject's
right or left hand. This eliminates variability due to
skin differences ... the variance is at least as big
as the effect!
20 Please consider the example on pages 263-265.
In paired tests, we must concern ourselves with
the distribution of the differences between
pairs ... to remove pair-to-pair differences.
Testing effectiveness of safety class ... loss of
worker hrs/week. Here are results after the class:
Before   After    D
  45       36      9
  73       60     13
  46       44      2
 124      119      5
  33       35     -2
  57       51      6
  83       77      6
  34       29      5
  26       24      2
  17       11      6
\( \bar{x}_D = 5.2 \), \( s_D = 4.08 \)
21 1) H0: \( \mu_D = 0 \), Ha: \( \mu_D > 0 \)
2) \( \alpha = 0.05 \) (given)
3) \( \nu = 10 - 1 = 9 \), \( t_{0.05,9} = 1.83 \)
4) \[ t = \frac{\bar{x}_D - 0}{s_D/\sqrt{n}} = \frac{5.2 - 0}{4.08/\sqrt{10}} = 4.03 \]
5) \( 4.03 > 1.83 \) .. Reject H0
6) .. Looks like strong evidence for a change!
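The paired analysis of the safety-class data can be sketched with the standard library alone. The before/after values are the ten pairs from the table; the decision point is the tabulated \( t_{0.05,9} \) quoted in the steps, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Lost worker hours per week, before and after the safety class.
before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]
t_crit = 1.83  # t_{0.05,9}, read from the t table

# A paired test works on the within-pair differences only.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = mean(d)   # mean difference
s_d = stdev(d)    # sample standard deviation of the differences

# H0: mean difference is 0; Ha: mean difference is greater than 0.
t_obs = (d_bar - 0) / (s_d / sqrt(n))
reject_h0 = t_obs > t_crit

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t_obs = {t_obs:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Because only the differences enter the statistic, the large spread between pairs (17 to 124 hours) never appears in the calculation.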
22 Please consider this silly example: Cab
company wants to compare Gas A and Gas B for
best mileage ... owner randomly assigns 50 of her
cabs to A and 50 to B and ...
Gas    n    mean mpg    S
A      50   25          5.00
B      50   26          4.00
\( \bar{x}_2 - \bar{x}_1 = 26 - 25 = 1 \)
Is B better than A?
23 Here is a better way ... just 10 cabs.
Here is the data:
Cab                  Gas A    Gas B    Difference (A - B)
1                    27.01    26.95     0.06
2                    20.00    20.44    -0.44
3                    23.41    25.05    -1.64
4                    25.22    26.32    -1.10
5                    30.11    29.56     0.55
6                    25.55    26.60    -1.05
7                    22.23    22.93    -0.70
8                    19.78    20.23    -0.45
9                    33.45    33.95    -0.50
10                   25.22    26.01    -0.79
Mean                 25.20    25.80    -0.60
Standard Deviation    4.27     4.10     0.61
Note that the means and S are about the same as
before!
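The summary rows of the cab table can be recomputed from the raw mileage figures. This sketch only checks the means and standard deviations; it deliberately stops short of the test statistic. The list names are illustrative, and the differences are taken as Gas A minus Gas B.

```python
from statistics import mean, stdev

# Mileage (mpg) for the same 10 cabs on each gasoline.
gas_a = [27.01, 20.00, 23.41, 25.22, 30.11, 25.55, 22.23, 19.78, 33.45, 25.22]
gas_b = [26.95, 20.44, 25.05, 26.32, 29.56, 26.60, 22.93, 20.23, 33.95, 26.01]

# Within-cab differences, A minus B.
diff = [a - b for a, b in zip(gas_a, gas_b)]

# Column means and sample standard deviations.
mean_a, mean_b, mean_d = mean(gas_a), mean(gas_b), mean(diff)
sd_a, sd_b, sd_d = stdev(gas_a), stdev(gas_b), stdev(diff)

print(f"mean: A = {mean_a:.2f}, B = {mean_b:.2f}, D = {mean_d:.2f}")
print(f"s:    A = {sd_a:.2f}, B = {sd_b:.2f}, D = {sd_d:.2f}")
```

The spread of the differences comes out several times smaller than the spread within either gasoline column, which is the point of pairing.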
24 Fortunately, the D column has a very small s
... we can get something from this! Pairing
eliminates variability between drivers.
\[ t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} \]
... you try this ... see what you learn. Are
A and B different? This is your second
homework assignment.
Here is a cute way to do pairs.
[Figure: paired plot of Gas B and Gas A for each cab; axis: Miles per gallon, 20 to 34.]
Is the overall tilt left or right??
25 Two more tests ... for samples, see pages 215-217.
The point here is that if our sample comes
from the normal population, not only should it
have the same \( \mu \), it should also have the same
\( \sigma \). So ... we can test to see if this is true.
The first test is based on Theorem 6.4, which
asserts: If \( S^2 \) is the variance of a random
sample of size n taken from a normal population
having the variance \( \sigma^2 \), then
\[ \chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \]
is a random variable having the chi-square
distribution with the parameter \( \nu = n - 1 \).
26 \( \chi^2 \)
Table 5 at the end of the book contains selected
values of \( \chi^2_\alpha \) for various values of \( \nu \),
again called the number of degrees of
freedom, where \( \chi^2_\alpha \) is such that the area
under the chi-square distribution to its right
is equal to \( \alpha \). In this table the left-hand
column contains values of \( \nu \), the column
headings are areas \( \alpha \) in the right-hand
tail of the chi-square distribution, and the
entries are values of \( \chi^2_\alpha \) (see also Figure
6.7). Unlike the t-distribution, it is necessary
to tabulate values of \( \chi^2_\alpha \) for
\( \alpha > 0.50 \), because the chi-square distribution is not
symmetrical.
[Figure 6.7: Tabulated values of chi-square. Density \( f(\chi^2) \) for \( \nu \) degrees of freedom, starting at 0, with area \( \alpha \) to the right of \( \chi^2_\alpha \).]
28 Lithography Exposure Tool
180 nm Image
29 We find from Table 5 that for 19 degrees of
freedom, \( \chi^2_{0.05} = 30.1 \).
Substituting into the formula for the chi-square
statistic, \( \chi^2 = (n-1)S^2/\sigma^2 \), we get a value larger than 30.1.
Thus, the probability that a good shipment will
erroneously be rejected is less than 0.05.
30 Another Example
The machine that a pharmaceutical company uses to
fill bottles with a popular cough medicine has a
standard deviation of 0.18 ounces. The
manufacturer of another machine claims that his
product has a smaller standard deviation and
therefore produces a more consistent product. To
test this claim, the pharmaceutical company
samples 41 bottles on the new machine and finds a
standard deviation of 0.15 ounces. At a 0.05
level of significance, test the manufacturer's
claim.
Solution: Translate the claim into a
hypothesis statement. Label the claim and
figure appropriately.
H0: \( \sigma \ge 0.18 \)    Ha: \( \sigma < 0.18 \) (claim)
31 Use the level of significance, \( \alpha \), which is the
area of the H0 rejection region, to determine the
DECISION POINT or CRITICAL VALUE. The decision
point is the number that separates the "Fail to
Reject H0" region from the "Reject H0" region in the figure.
Label the CRITICAL VALUE on the drawing.
The rejection region is the left tail, so the area under
the chi-square curve to the right of the critical value is \( 1 - \alpha = 0.95 \).
The degrees of freedom are \( 41 - 1 = 40 \). Read row
40 (df) and column 0.95. The chi-square table
yields the critical value of 26.509.
32 Use the formula to calculate the TEST STATISTIC.
Label it on your figure.
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{40\,(0.15)^2}{(0.18)^2} = 27.778 \]
Note: All the \( \chi^2 \) values are positive, starting
with zero (0) on the left and extending to
infinity on the right. The test statistic
27.778 is larger than the critical value 26.509, so it
falls in the Fail to Reject H0 region.
Therefore, the data do not support the
claim at the 0.05 level: we cannot conclude that the
standard deviation of the new machine is less than 0.18.
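The bottle-filling test comes down to one line of arithmetic, sketched here in Python. The critical value is the tabulated chi-square entry for 40 degrees of freedom and a right-tail area of 0.95 quoted in the notes (it is not computed), and the variable names are illustrative.

```python
# Numbers from the cough-medicine example.
n = 41               # bottles sampled on the new machine
s = 0.15             # sample standard deviation, ounces
sigma_0 = 0.18       # standard deviation of the current machine, ounces
chi2_crit = 26.509   # chi-square table, 40 df, right-tail area 0.95

df = n - 1

# Test statistic from Theorem 6.4: (n - 1) s^2 / sigma_0^2.
chi2_obs = df * s**2 / sigma_0**2

# Left-tail test (Ha: sigma < 0.18): reject H0 only when the
# statistic falls below the critical value.
reject_h0 = chi2_obs < chi2_crit

print(f"df = {df}, chi2_obs = {chi2_obs:.3f}, critical value = {chi2_crit}")
print("reject H0" if reject_h0 else "fail to reject H0")
```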
33 Another analysis of variance
Theorem 6.5: If \( S_1^2 \) and \( S_2^2 \) are the
variances of independent random samples of size
\( n_1 \) and \( n_2 \), respectively, taken from two
normal populations having the same variance, then
\[ F = \frac{S_1^2}{S_2^2} \]
is a random variable having the F distribution
with the parameters \( \nu_1 = n_1 - 1 \) and \( \nu_2 = n_2 - 1 \).
Please work through the example on page 217! Note:
Both \( \chi^2 \) (chi-square) and F are NOT symmetrical
distributions. Please be careful with Tables 5
(\( \chi^2 \)) and 6 (F).
34 Class Schedule
No Class on Thursday, April 15th
4/13/04
35 Homework
Due before 331 on TUE. 4/14/04
Read again and again pages 215-217 and Chapter
7 carefully! Read pages 275-281. Work the
examples pointed out in the notes. Do the two
problems from the notes and 7.43, 7.46, 7.48,
7.52, 7.63, 7.68, 7.72, 8.8, 8.14