IE341: Introduction to Design of Experiments presentation

Last term we talked about testing the
difference between two independent means. For
means from a normal population, the test
statistic is the pooled two-sample t statistic
$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$$

We also covered the case where the two means
are not independent, and what we must do to
account for the fact that they are dependent.

And finally, we talked about the difference
between two variances, where we used the F ratio.
The F distribution is the distribution of a ratio
of two independent chi-square variables, each
divided by its df. So if $s_1^2$ and $s_2^2$ possess
independent chi-square distributions with $\nu_1$ and
$\nu_2$ df, respectively, then
$$F = \frac{s_1^2/\nu_1}{s_2^2/\nu_2}$$
has the F distribution with $\nu_1$ and $\nu_2$ df.

All of this is valuable if we are testing
only two means. But what if we want to test to
see if there is a difference among three means,
or four, or ten?
What if we want to know whether fertilizer
A or fertilizer B or fertilizer C is best? In
this case, fertilizer is called a factor, which
is the condition under test.
A, B, C, the three types of fertilizer
under test, are called levels of the factor
fertilizer.
Or what if we want to know if treatment A
or treatment B or treatment C or treatment D is
best? In this case, treatment is called a
factor.
A, B, C, D, the four types of treatment under
test, are called levels of the factor treatment.
It should be noted that the factor may be
quantitative or qualitative.

Enter the analysis of variance!
ANOVA, as it is usually called, is a way to
test the differences between means in such
situations.
Previously, we tested single-factor
experiments with only two treatment levels.
These experiments are called single-factor
because there is only one factor under test.
Single-factor experiments are more commonly
called one-way experiments.
Now we move to single-factor experiments with
more than two treatment levels.

Let's start with some notation.
$Y_{ij}$ = ith observation in the jth level
$N$ = total number of experimental observations
$\bar{Y}_{..}$ = the grand mean of all N experimental observations
$\bar{Y}_{.j}$ = the mean of the observations in the jth level
$n_j$ = number of observations in the jth level; these
observations are called replicates.
Replication of the design refers to using
more than one experimental unit for each level.

Designs are more powerful if they are
balanced, that is, if every level has the same
number of replicates, but balance is not always
possible.
Suppose you are doing an experiment and the
equipment breaks down on one of the tests. Now,
not by design but by circumstance, you have
unequal numbers of replicates for the levels.
In all the formulas, we use $n_j$ as the number
of replicates in treatment j, not n, so there is
no problem.

Notation continued
$\tau_j$ = the effect of the jth level
$J$ = number of treatment levels
$\epsilon_{ij}$ = the error associated with the ith
observation in the jth level. The errors are
assumed to be independent, normally distributed
random variables with mean 0 and variance
$\sigma^2$, which is constant for all levels of the
factor.

For all experiments, randomization is
critical. So to draw any conclusions from the
experiment, we must require that the treatments
be applied in random order.
We must also assign the experimental units to
the treatments randomly.
If all this randomization occurs, the design
is called a completely randomized design.

ANOVA begins with a linear statistical model
$$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$$

This model is for a one-way or single-factor
ANOVA. The goal of the model is to test
hypotheses about the treatment effects and to
estimate them.
If the treatments have been selected by the
experimenter, the model is called a fixed-effects
model. In this case, the conclusions will apply
only to the treatments under consideration.

Another type of model is the random effects
model or components of variance model.
In this situation, the treatments used are a
random sample from a large population of
treatments. Here the $\tau_j$ are random variables and
we are interested in their variability, not in
the differences among the means being tested.

First, we will talk about fixed effects,
completely randomized, balanced models.
In the model we showed earlier, the $\tau_j$ are
defined as deviations from the grand mean, so
$$\sum_{j=1}^{J} \tau_j = 0$$
It follows that the mean of the jth treatment
is
$$\mu_j = \mu + \tau_j$$

Now the hypothesis under test is
Ho: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_J$
Ha: $\mu_j \neq \mu_k$ for at least one j,k pair
The test procedure is ANOVA, which is a
decomposition of the total sum of squares into
its component parts according to the model.

The total SS is
$$SS_{total} = \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2$$
and ANOVA is about dividing it into its
component parts.
SStreatments = variability of the differences
among the J levels
SSerror = pooled variability of the random
error within levels

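This is easy to see because
$$\sum_{j}\sum_{i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{j}\sum_{i} \left[(\bar{Y}_{.j} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{.j})\right]^2$$
$$= \sum_{j} n_j (\bar{Y}_{.j} - \bar{Y}_{..})^2 + \sum_{j}\sum_{i} (Y_{ij} - \bar{Y}_{.j})^2 + 2\sum_{j}\sum_{i} (\bar{Y}_{.j} - \bar{Y}_{..})(Y_{ij} - \bar{Y}_{.j})$$
But the cross-product term vanishes because
$$\sum_{i} (Y_{ij} - \bar{Y}_{.j}) = 0 \quad \text{for every } j$$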

So SStotal = SStreatments + SSerror
Most of the time, this is called
SStotal = SSbetween + SSwithin
Each of these terms becomes an MS (mean
square) term when divided by the appropriate df.

The df for SSerror = N - J because
$$\sum_{j=1}^{J} (n_j - 1) = N - J$$
and the df for SSbetween = J - 1 because
there are J levels.

Now the expected values of each of these terms
are
$$E(MS_{error}) = \sigma^2$$
$$E(MS_{treatments}) = \sigma^2 + \frac{n\sum_{j} \tau_j^2}{J-1}$$

Now if there are no differences among the
treatment means, then $\tau_j = 0$ for all j.
So we can test for differences with our old
friend F,
$$F = \frac{MS_{treatments}}{MS_{error}}$$
with J - 1 and N - J df.
Under Ho, both numerator and denominator are
estimates of $\sigma^2$, so the ratio should be near 1
and the result should not be significant.
Under Ha, the result should be significant
because the numerator is estimating the treatment
effects as well as $\sigma^2$.

The results of an ANOVA are presented in an
ANOVA table. For this one-way, fixed-effects,
balanced model

Source   SS          df    MS          p
Model    SSbetween   J-1   MSbetween   p
Error    SSwithin    N-J   MSwithin
Total    SStotal     N-1

Let's look at a simple example.
A product engineer is investigating the
tensile strength of a synthetic fiber to make
men's shirts. He knows from prior experience
that the strength is affected by the weight
percent of cotton in the material. He also knows
that the percent should range between 10 and
40 so that the shirts can receive permanent
press treatment.

The engineer decides to test 5 levels of cotton
weight percent, 15, 20, 25, 30, 35,
and to have 5 replicates in this design.
His data are

Cotton %     Obs 1   Obs 2   Obs 3   Obs 4   Obs 5   Mean
15           7       7       15      11      9       9.8
20           12      17      12      18      18      15.4
25           14      18      18      19      19      17.6
30           19      25      22      19      23      21.6
35           7       10      11      15      11      10.8
Grand mean                                           15.04

In this tensile strength example, the
ANOVA table is

Source   SS       df   MS       p
Model    475.76   4    118.94   <0.01
Error    161.20   20   8.06
Total    636.96   24

In this case, we would reject Ho and declare
that there is an effect of the cotton weight
percent.
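The sums of squares in this table can be reproduced directly from the raw data. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Tensile strength data from the example: cotton weight percent -> 5 replicates
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total, n_levels = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    # Between-level SS: n_j * (level mean - grand mean)^2, summed over levels
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Within-level SS: squared deviations of each observation from its level mean
    ss_within = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )
    df_between, df_within = n_levels - 1, n_total - n_levels
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(data)
print(f"SS_between = {ss_b:.2f} (df {df_b})")
print(f"SS_within  = {ss_w:.2f} (df {df_w})")
print(f"F = {f_ratio:.2f}")
```

The p value is not computed here because that needs the F distribution, which is not in the standard library.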

We can estimate the treatment parameters by
subtracting the grand mean from the treatment
means. In this example,
$\hat\tau_1$ = 9.80 - 15.04 = -5.24
$\hat\tau_2$ = 15.40 - 15.04 = 0.36
$\hat\tau_3$ = 17.60 - 15.04 = 2.56
$\hat\tau_4$ = 21.60 - 15.04 = 6.56
$\hat\tau_5$ = 10.80 - 15.04 = -4.24
Clearly, treatment 4 is the best because it
provides the greatest tensile strength.

Now you could have computed these values from
the raw data yourself instead of doing the ANOVA.
You would get the same results, but you wouldn't
know if treatment 4 was significantly better.
But if you did a scatter diagram of the
original data, you would see that treatment 4 was
best, with no analysis whatsoever.
In fact, you should always look at the
original data to see if the results do make
sense. A scatter diagram of the raw data usually
tells as much as any analysis can.

How do you test the adequacy of the model?
The model rests on certain assumptions that must
hold for the ANOVA to be useful, most
importantly that the errors are distributed
normally and independently.
The estimated error for each observation,
sometimes called the residual, is
$$e_{ij} = Y_{ij} - \bar{Y}_{.j}$$

A residual check is very important for testing
for nonconstant variance. The residuals should
be structureless, that is, they should have no
pattern whatsoever, which, in this case, they do
not.

These residuals show no extreme differences in
variation because they all have about the same
spread.
They also do not show the presence of any
outlier. An outlier is a residual value that is
very much larger than any of the others. The
presence of an outlier can seriously jeopardize
the ANOVA, so if one is found, its cause should
be carefully investigated.

A histogram of residuals shows the
distribution is slightly skewed. Small
departures from symmetry are of less concern than
heavy tails.

Another check is for normality. If we do a
normal probability plot of the residuals, we can
see whether normality holds.

A normal probability plot is made with the
residuals in ascending order on the x-axis and
their cumulative probability points,
100(k - 0.5)/n, on the y-axis, where k is the rank
of the residual and n is the number of residuals.
There is no evidence of an outlier here.
The previous slide is not exactly a normal
probability plot because the y-axis is not
scaled properly. But it does give a pretty good
suggestion of linearity.
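The plotting positions can be sketched as follows, using the tensile strength data from the earlier example. This only computes the coordinates; drawing the plot and rescaling the y-axis to a normal probability scale are left out, and the variable names are illustrative.

```python
# Tensile strength data: cotton weight percent -> 5 replicates
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

# Residual = observation minus the mean of its own level
residuals = sorted(
    y - sum(ys) / len(ys) for ys in data.values() for y in ys
)
n = len(residuals)

# Cumulative probability point for the residual of rank k: 100(k - 0.5)/n
positions = [100 * (k - 0.5) / n for k in range(1, n + 1)]

for residual, position in zip(residuals, positions):
    print(f"{residual:6.1f}  {position:5.1f}")
```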

A plot of residuals vs run order is useful to
detect correlation between the residuals, a
violation of the independence assumption.
Runs of positive or of negative residuals
indicate correlation. None is observed here.

One of the goals of the analysis is to
estimate the level means. If the results of the
ANOVA show that the factor is significant, we
know that at least one of the means stands out
from the rest. But which one or ones?
The procedures for making these mean
comparisons are called multiple comparison
methods. These methods use linear combinations
called contrasts.

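A contrast is a particular linear combination
of level means, such as
$$C = \mu_5 - \mu_4$$
to test the difference between level 4 and level
5.
Or if one wished to test the average of levels
1 and 3 vs the average of levels 4 and 5, one
would use
$$C = \mu_1 + \mu_3 - \mu_4 - \mu_5$$
In general,
$$C = \sum_{j=1}^{J} c_j \mu_j \quad \text{where} \quad \sum_{j=1}^{J} c_j = 0$$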

An important case of contrasts is called
orthogonal contrasts. Two contrasts in a balanced
design with coefficients $c_j$ and $d_j$ are
orthogonal if
$$\sum_{j=1}^{J} c_j d_j = 0$$

There are many ways to choose the orthogonal
contrast coefficients for a set of levels. For
example, if level 1 is a control and levels 2 and
3 are two real treatments, a logical choice is to
compare the average of the two treatments with
the control,
$$C_1 = -2\mu_1 + \mu_2 + \mu_3$$
and then the two treatments against one
another,
$$C_2 = \mu_2 - \mu_3$$
These two contrasts are orthogonal because
$$(-2)(0) + (1)(1) + (1)(-1) = 0$$

Only J-1 orthogonal contrasts may be chosen
because the J levels have only J-1 df. So for
only three levels, the contrasts chosen exhaust
those available for this experiment.
Contrasts must be chosen before seeing the data
so that experimenters aren't tempted to contrast
the levels with the greatest differences.

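For the tensile strength experiment with 5
levels and thus 4 df, the 4 contrasts, computed
from the level totals (5 times the level means),
are
C1 = 0(5)(9.8) + 0(5)(15.4) + 0(5)(17.6) - 1(5)(21.6) + 1(5)(10.8) = -54
C2 = 1(5)(9.8) + 0(5)(15.4) + 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = -25
C3 = 1(5)(9.8) + 0(5)(15.4) - 1(5)(17.6) + 0(5)(21.6) + 0(5)(10.8) = -39
C4 = -1(5)(9.8) + 4(5)(15.4) - 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = 9
These 4 contrasts completely partition the
SStreatments. Then the SS for each contrast is
formed as
$$SS_C = \frac{C^2}{n\sum_{j} c_j^2}$$

So for the 4 contrasts we have
$$SS_{C1} = \frac{(-54)^2}{(5)(2)} = 291.60$$
$$SS_{C2} = \frac{(-25)^2}{(5)(4)} = 31.25$$
$$SS_{C3} = \frac{(-39)^2}{(5)(2)} = 152.10$$
$$SS_{C4} = \frac{(9)^2}{(5)(20)} = 0.81$$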

Now the revised ANOVA table is

Source   SS       df   MS       p
Weight   475.76   4    118.94   <0.001
  C1     291.60   1    291.60   <0.001
  C2     31.25    1    31.25    <0.06
  C3     152.10   1    152.10   <0.001
  C4     0.81     1    0.81     <0.76
Error    161.20   20   8.06
Total    636.96   24
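The contrast values and their sums of squares can be checked with a short script. This is a sketch under the assumption of a balanced design with n = 5 replicates per level; the helper name `contrast_ss` is illustrative.

```python
from itertools import combinations

# Level totals for 15, 20, 25, 30, 35 percent cotton (5 times each level mean)
totals = [49, 77, 88, 108, 54]
n = 5  # replicates per level

contrasts = {
    "C1": [0, 0, 0, -1, 1],
    "C2": [1, 0, 1, -1, -1],
    "C3": [1, 0, -1, 0, 0],
    "C4": [-1, 4, -1, -1, -1],
}


def contrast_ss(coeffs, level_totals, replicates):
    """Return (contrast value, sum of squares) for one contrast on level totals."""
    value = sum(c * t for c, t in zip(coeffs, level_totals))
    return value, value ** 2 / (replicates * sum(c ** 2 for c in coeffs))


results = {name: contrast_ss(c, totals, n) for name, c in contrasts.items()}

# Every pair of contrasts should have a zero sum of coefficient products
orthogonal = all(
    sum(a * b for a, b in zip(contrasts[p], contrasts[q])) == 0
    for p, q in combinations(contrasts, 2)
)

for name, (value, ss) in results.items():
    print(f"{name}: contrast = {value:4d}, SS = {ss:.2f}")
print("mutually orthogonal:", orthogonal)
print(f"sum of contrast SS = {sum(ss for _, ss in results.values()):.2f}")
```

Because the four contrasts are mutually orthogonal, their sums of squares add up to SStreatments.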

So contrast 1 (level 5 vs. level 4) and contrast
3 (level 1 vs. level 3) are significant.
Although the orthogonal contrast approach is
widely used, the experimenter may not know in
advance which levels to test or may be
interested in more than J-1 comparisons. A
number of other methods are available for such
testing.

These methods include
Scheffé's Method
Least Significant Difference Method
Duncan's Multiple Range Test
Newman-Keuls Test
There is some disagreement about which is the
best method, but it is best if all are applied
only after there is significance in the overall F
test.

Now let's look at the random effects model.
Suppose there is a factor of interest with an
extremely large number of levels. If the
experimenter selects J of these levels at random,
we have a random effects model or a components of
variance model.

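The linear statistical model is
$$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$$
as before, except that both $\tau_j$ and $\epsilon_{ij}$
are random variables instead of simply $\epsilon_{ij}$.
Because $\tau_j$ and $\epsilon_{ij}$ are independent, the
variance of any observation is
$$Var(Y_{ij}) = \sigma_\tau^2 + \sigma^2$$
These two variances are called variance
components, hence the name of the model.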

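The requirements of this model are that the
$\epsilon_{ij}$ are NID(0, $\sigma^2$), as before, and that the
$\tau_j$ are NID(0, $\sigma_\tau^2$) and that $\tau_j$ and
$\epsilon_{ij}$ are independent. The normality
assumption is not required for estimating the
variance components in the random effects model.
As before,
SStotal = SStreatments + SSerror
and E(MSerror) = $\sigma^2$.
But now E(MStreatments) = $\sigma^2 + n\sigma_\tau^2$.
So the estimate of $\sigma_\tau^2$ is
$$\hat\sigma_\tau^2 = \frac{MS_{treatments} - MS_{error}}{n}$$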
The computations and the ANOVA table are the
same as before, but the conclusions are quite
different.
Let's look at an example.
A textile company uses a large number of
looms. The process engineer suspects that the
looms are of different strength, and selects 4
looms at random to investigate this.

The results of the experiment are shown in the
table below.

Loom         Obs 1   Obs 2   Obs 3   Obs 4   Mean
1            98      97      99      96      97.5
2            91      90      93      92      91.5
3            96      95      97      95      95.75
4            95      96      99      98      97.0
Grand mean                                   95.44

The ANOVA table is

Source   SS       df   MS      p
Looms    89.19    3    29.73   <0.001
Error    22.75    12   1.90
Total    111.94   15

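In this case, the estimates of the variances
are
$$\hat\sigma^2 = MS_{error} = 1.90$$
$$\hat\sigma_\tau^2 = \frac{29.73 - 1.90}{4} = 6.96$$
$$\hat\sigma_Y^2 = \hat\sigma^2 + \hat\sigma_\tau^2 = 1.90 + 6.96 = 8.86$$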
Thus most of the variability in the
observations is due to variability in loom
strength. If you can isolate the causes of this
variability and eliminate them, you can reduce
the variability of the output and increase its
quality.
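The variance component estimates for the loom example can be recomputed from the raw data. This is a minimal sketch; the variable names are illustrative, and the estimator is the mean-square formula (MStreatments - MSerror)/n for a balanced design.

```python
# Loom strength data: loom -> 4 observations
looms = {
    1: [98, 97, 99, 96],
    2: [91, 90, 93, 92],
    3: [96, 95, 97, 95],
    4: [95, 96, 99, 98],
}

n = 4                      # observations per loom
n_looms = len(looms)
n_total = n * n_looms
grand_mean = sum(sum(ys) for ys in looms.values()) / n_total

ss_looms = sum(n * (sum(ys) / n - grand_mean) ** 2 for ys in looms.values())
ss_error = sum((y - sum(ys) / n) ** 2 for ys in looms.values() for y in ys)

ms_looms = ss_looms / (n_looms - 1)
ms_error = ss_error / (n_total - n_looms)

sigma2_error = ms_error                    # estimate of the error variance
sigma2_looms = (ms_looms - ms_error) / n   # estimate of the loom-to-loom variance
loom_share = sigma2_looms / (sigma2_looms + sigma2_error)

print(f"MS_looms = {ms_looms:.2f}, MS_error = {ms_error:.2f}")
print(f"error variance = {sigma2_error:.2f}, loom variance = {sigma2_looms:.2f}")
print(f"share of total variance due to looms = {loom_share:.0%}")
```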

When we studied the differences between two
treatment means, we considered repeated measures
on the same individual experimental unit.
With three or more treatments, we can still do
this. The result is a repeated measures design.

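Consider a repeated measures ANOVA partitioning
the SStotal, with n subjects each measured under
all J treatments:
$$\sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 = J\sum_{i=1}^{n} (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2$$
This is the same as
SStotal = SSbetween subjects + SSwithin subjects
The within-subjects SS may be further
partitioned into SStreatment + SSerror:
$$\sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{i.})^2 = n\sum_{j} (\bar{Y}_{.j} - \bar{Y}_{..})^2 + \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$

In this case, the first term on the RHS is the
differences between treatment effects and the
second term on the RHS is the random error.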

Now the ANOVA table looks like this.

Source             SS   df           MS   p
Between subjects        n-1
Within subjects         n(J-1)
  Treatments            J-1
  Error                 (J-1)(n-1)
Total                   Jn-1

The test for treatment effects is the usual
$$F = \frac{MS_{treatment}}{MS_{error}}$$
but now it is done entirely within subjects.
This design is really a randomized complete
block design with subjects considered to be the
blocks.

Now what is a randomized complete blocks
design?
Blocking is a way to eliminate the effect of a
nuisance factor on the comparisons of interest.
Blocking can be used only if the nuisance factor
is known and controllable.

Let's use an illustration. Suppose we want to
test the effect of four different tips on the
readings from a hardness testing machine.
The tip is pressed into a metal test coupon,
and from the depth of the depression, the
hardness of the coupon can be measured.

The only factor is tip type and it has four
levels. If 4 replications are desired for each
tip, a completely randomized design would seem to
be appropriate.
This would require assigning each of the 4 x 4 =
16 runs randomly to 16 different coupons.
The only problem is that the coupons need to
be all of the same hardness, and if they are not,
then the differences in coupon hardness will
contribute to the variability observed.
Blocking is the way to deal with this problem.

In the block design, only 4 coupons are used
and each tip is tested on each of the 4 coupons.
So the blocking factor is the coupon, with 4
levels.
In this setup, the block forms a homogeneous
unit on which to test the tips.
This strategy improves the accuracy of the tip
comparison by eliminating variability due to
coupons.

Because all 4 tips are tested on each coupon,
the design is a complete block design. The data
from this design are shown below.

             Test coupon
Tip type     1      2      3      4
1            9.3    9.4    9.6    10.0
2            9.4    9.3    9.8    9.9
3            9.2    9.4    9.5    9.7
4            9.7    9.6    10.0   10.2

Now we analyze these data the same way we did
for the repeated measures design. The model is
$$Y_{jk} = \mu + \tau_j + \beta_k + \epsilon_{jk}$$
where $\beta_k$ is the effect of the kth block and the
rest of the terms are those we already know.

Since the block effects are deviations from the
grand mean,
$$\sum_{k=1}^{K} \beta_k = 0$$
just as
$$\sum_{j=1}^{J} \tau_j = 0$$

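We can express the total SS as
$$\sum_{j}\sum_{k} (Y_{jk} - \bar{Y}_{..})^2 = K\sum_{j} (\bar{Y}_{j.} - \bar{Y}_{..})^2 + J\sum_{k} (\bar{Y}_{.k} - \bar{Y}_{..})^2 + \sum_{j}\sum_{k} (Y_{jk} - \bar{Y}_{j.} - \bar{Y}_{.k} + \bar{Y}_{..})^2$$
which is equivalent to
SStotal = SStreatments + SSblocks + SSerror
with df
N-1 = (J-1) + (K-1) + (J-1)(K-1)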

The test for equality of treatment means
is
$$F = \frac{MS_{treatments}}{MS_{error}}$$
and the ANOVA table is

Source       SS             df           MS             p
Treatments   SStreatments   J-1          MStreatments
Blocks       SSblocks       K-1          MSblocks
Error        SSerror        (J-1)(K-1)   MSerror
Total        SStotal        N-1

For the hardness experiment, the ANOVA table is

Source     SS       df   MS      p
Tip type   38.50    3    12.83   0.0009
Coupons    82.50    3    27.50
Error      8.00     9    0.89
Total      129.00   15

As is obvious, this is the same analysis as the
repeated measures design.
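The hardness table can be reproduced from the data, with one caveat: the sums of squares shown match the readings only after they are coded as (reading - 9.5) x 10. That coding is inferred here from the numbers and is an assumption, not something stated on the slides; the F ratio is the same with or without it.

```python
# Hardness readings: rows are tip types 1-4, columns are test coupons 1-4
hardness = [
    [9.3, 9.4, 9.6, 10.0],
    [9.4, 9.3, 9.8, 9.9],
    [9.2, 9.4, 9.5, 9.7],
    [9.7, 9.6, 10.0, 10.2],
]

# Assumed coding: subtract 9.5 and multiply by 10 (inferred, see the lead-in)
coded = [[round((y - 9.5) * 10) for y in row] for row in hardness]

n_tips, n_coupons = len(coded), len(coded[0])
grand_mean = sum(sum(row) for row in coded) / (n_tips * n_coupons)
tip_means = [sum(row) / n_coupons for row in coded]
coupon_means = [sum(col) / n_tips for col in zip(*coded)]

ss_total = sum((y - grand_mean) ** 2 for row in coded for y in row)
ss_tips = n_coupons * sum((m - grand_mean) ** 2 for m in tip_means)
ss_coupons = n_tips * sum((m - grand_mean) ** 2 for m in coupon_means)
ss_error = ss_total - ss_tips - ss_coupons

df_tips = n_tips - 1
df_error = (n_tips - 1) * (n_coupons - 1)
f_tips = (ss_tips / df_tips) / (ss_error / df_error)

print(f"SS tips = {ss_tips:.2f}, SS coupons = {ss_coupons:.2f}")
print(f"SS error = {ss_error:.2f}, SS total = {ss_total:.2f}")
print(f"F for tip type = {f_tips:.2f}")
```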

Now let's consider the Latin Square design.
We'll introduce it with an example.
The object of study is the effect of 5 different
formulations of a rocket propellant on the burning
rate of aircraft escape systems. Each formulation
comes from a batch of raw material large enough for
only 5 formulations. Moreover, the formulations
are prepared by 5 different operators, who differ
in skill and experience.

The way to test in this situation is with a
5x5 Latin Square, which allows for double
blocking and therefore the removal of two
nuisance factors. The Latin Square for this
example is

Batches of         Operators
raw material       1    2    3    4    5
1                  A    B    C    D    E
2                  B    C    D    E    A
3                  C    D    E    A    B
4                  D    E    A    B    C
5                  E    A    B    C    D

Note that each row and each column has all 5
letters, and each letter occurs exactly once in
each row and column.
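The statistical model for a Latin Square is
$$Y_{jkl} = \mu + \tau_j + \rho_k + \gamma_l + \epsilon_{jkl}$$
where $Y_{jkl}$ is the jth treatment observation in
the kth row and the lth column, $\rho_k$ is the kth
row effect, and $\gamma_l$ is the lth column effect.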
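The square used in this example is a cyclic one: each row is the previous row shifted one place to the left. A small sketch of that construction, together with a check of the once-per-row, once-per-column property, might look like this. The function names are illustrative, and in practice the rows, columns, and letters would also be randomized.

```python
def cyclic_latin_square(symbols):
    """Build a p x p Latin square by shifting the symbols one place per row."""
    p = len(symbols)
    return [[symbols[(row + col) % p] for col in range(p)] for row in range(p)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    symbols = set(square[0])
    rows_ok = all(
        len(row) == len(symbols) and set(row) == symbols for row in square
    )
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(square) == len(symbols) and rows_ok and cols_ok


square = cyclic_latin_square("ABCDE")
for row in square:
    print(" ".join(row))
print("valid Latin square:", is_latin_square(square))
```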

Again we have
SStotal = SSrows + SScolumns + SStreatments + SSerror
with df
N-1 = (R-1) + (C-1) + (J-1) + (R-2)(C-1)
The ANOVA table for propellant data is

Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0025
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.04
Error              128.00   12   10.67
Total              676.00   24

So both the formulations and the operators
were significantly different. The batches of raw
material were not, but it still is a good idea to
block on them because they often are different.
This design was not replicated, and Latin
Squares often are not, but it is possible to put
n replicates in each cell.

Now if you superimposed one Latin Square on
another Latin Square of the same size, you would
get a Graeco-Latin Square. In one Latin Square,
the treatments are designated by roman letters.
In the other Latin Square, the treatments are
designated by Greek letters.
Hence the name Graeco-Latin Square.

A 5x5 Graeco-Latin Square is

Batches of         Operators
raw material       1     2     3     4     5
1                  Aα    Bγ    Cε    Dβ    Eδ
2                  Bβ    Cδ    Dα    Eγ    Aε
3                  Cγ    Dε    Eβ    Aδ    Bα
4                  Dδ    Eα    Aγ    Bε    Cβ
5                  Eε    Aβ    Bδ    Cα    Dγ

Note that the five Greek treatments appear
exactly once in each row and column, just as the
Latin treatments did.

If Test Assemblies had been added as an
additional factor to the original propellant
experiment, the ANOVA table for propellant data
would be

Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0033
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.0329
Test Assemblies    62.00    4    15.50
Error              66.00    8    8.25
Total              676.00   24

The test assemblies turned out to be
nonsignificant.

Note that the ANOVA tables for the Latin Square
and the Graeco-Latin Square designs are
identical, except for the error term.
The SS(error) for the Latin Square design was
decomposed to be both Test Assemblies and error
in the Graeco-Latin Square. This is a good
example of how the error term is really a
residual. Whatever isn't controlled falls into
error.

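Before we leave one-way designs, we should look
at the regression approach to ANOVA. The model
is
$$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$$
Using the method of least squares, we choose the
estimates that minimize the sum of squared errors,
so we rewrite this as
$$L = \sum_{j=1}^{J}\sum_{i=1}^{n} \epsilon_{ij}^2 = \sum_{j=1}^{J}\sum_{i=1}^{n} (Y_{ij} - \mu - \tau_j)^2$$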

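Now to find the LS estimates of $\mu$ and $\tau_j$, we
minimize the sum of squared errors.
When we differentiate it with respect to
$\mu$ and $\tau_j$, and equate to 0, we obtain
$$-2\sum_{j}\sum_{i} (Y_{ij} - \hat\mu - \hat\tau_j) = 0$$
$$-2\sum_{i} (Y_{ij} - \hat\mu - \hat\tau_j) = 0 \quad \text{for all } j$$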

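After simplification, these reduce to
$$N\hat\mu + n\hat\tau_1 + n\hat\tau_2 + \dots + n\hat\tau_J = Y_{..}$$
$$n\hat\mu + n\hat\tau_j = Y_{.j}, \quad j = 1, \dots, J$$
In these equations, $Y_{..}$ is the total of all N
observations and $Y_{.j}$ is the total of the n
observations in the jth level.

These J + 1 equations are called the least
squares normal equations.
If we add the constraint
$$\sum_{j} \hat\tau_j = 0$$
we get a unique solution to these normal
equations:
$$\hat\mu = \bar{Y}_{..}, \qquad \hat\tau_j = \bar{Y}_{.j} - \bar{Y}_{..}$$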

It is important to see that ANOVA designs are
simply regression models. If we have a one-way
design with 3 levels, the regression model is
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_{ij}$$
where $X_{i1}$ = 1 if the observation is from level 1,
0 otherwise,
and $X_{i2}$ = 1 if the observation is from level 2,
0 otherwise.
Although the treatment levels may be
qualitative, they are coded as dummy
variables.

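For observations from level 1, $X_{i1}$ = 1 and
$X_{i2}$ = 0, so the model becomes
$$Y_{i1} = \beta_0 + \beta_1 + \epsilon_{i1}$$
so $\beta_0 + \beta_1 = \mu_1$.
Similarly, if the observations are from level
2,
$$Y_{i2} = \beta_0 + \beta_2 + \epsilon_{i2}$$
so $\beta_0 + \beta_2 = \mu_2$.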

Finally, consider observations from level 3,
for which $X_{i1} = X_{i2} = 0$. Then the regression
model becomes
$$Y_{i3} = \beta_0 + \epsilon_{i3}$$
so $\beta_0 = \mu_3$.
Thus in the regression model formulation of
this one-way ANOVA with 3 levels, the regression
coefficients describe comparisons of the first
two level means with the third.

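So
$$\beta_0 = \mu_3, \qquad \beta_1 = \mu_1 - \mu_3, \qquad \beta_2 = \mu_2 - \mu_3$$
Thus, testing $\beta_1 = \beta_2 = 0$ provides a test of
the equality of the three means.
In general, for J levels, the regression model
will have J-1 variables
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \dots + \beta_{J-1} X_{i,J-1} + \epsilon_{ij}$$
and
$$\beta_0 = \mu_J, \qquad \beta_j = \mu_j - \mu_J, \quad j = 1, \dots, J-1$$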
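To illustrate the dummy-variable formulation, the sketch below fits the 3-level regression by least squares, using the first three cotton levels of the tensile strength data as a stand-in 3-level experiment (that choice of data is ours, made only for illustration). It builds the normal equations and solves them with a small Gaussian elimination helper, `solve`, written here so the example needs no external library.

```python
# Three levels, 5 observations each (cotton 15%, 20%, 25% from the tensile data)
levels = {
    1: [7, 7, 15, 11, 9],
    2: [12, 17, 12, 18, 18],
    3: [14, 18, 18, 19, 19],
}

# Design rows: intercept, X1 (1 if level 1), X2 (1 if level 2); level 3 is the baseline
rows = []
for level, ys in levels.items():
    for y in ys:
        x = [1.0, 1.0 if level == 1 else 0.0, 1.0 if level == 2 else 0.0]
        rows.append((x, y))

p = 3
xtx = [[sum(x[a] * x[b] for x, _ in rows) for b in range(p)] for a in range(p)]
xty = [sum(x[a] * y for x, y in rows) for a in range(p)]


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


beta = solve(xtx, xty)
means = {level: sum(ys) / len(ys) for level, ys in levels.items()}

print("beta0, beta1, beta2 =", [round(b, 2) for b in beta])
print("level means         =", means)
```

The fitted intercept equals the level 3 mean, and the two slopes equal the differences of the level 1 and level 2 means from the level 3 mean.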

Now what if you have two factors under test?
Or three? Or four? Or more?
Here the answer is the factorial design. A
factorial design crosses all factors. Let's take
a two-way design. If there are J levels of
factor A and K levels of factor B, then all JK
treatment combinations appear in the experiment.
Most commonly, J = K = 2.

In a two-way design with two levels of each
factor, we have the following, where -1 and 1 are
codes for low and high levels, respectively.

Factor A          Factor B          Response
-1 (low level)    -1 (low level)    20
 1 (high level)   -1 (low level)    40
-1 (low level)     1 (high level)   30
 1 (high level)    1 (high level)   52

We can have as many replicates as we want in
this design. With n replicates, there are n
observations in each cell of the design.

SStotal = SSA + SSB + SSAB + SSerror
This decomposition should be familiar by now
except for SSAB. What is this term? Its
official name is interaction.
This is the magic of factorial designs. We
find out about not only the effect of factor A
and the effect of factor B, but the effect of the
two factors in combination.

How do we compute main effects? The main
effect of factor A is the difference between the
average response at A high and the average
response at A low,
$$A = \frac{40 + 52}{2} - \frac{20 + 30}{2} = 46 - 25 = 21$$
Similarly, the B effect is the difference
between the average response at B high and the
average response at B low,
$$B = \frac{30 + 52}{2} - \frac{20 + 40}{2} = 41 - 30 = 11$$

So the main effect of factor A is 21 and the
main effect of factor B is 11.
That is, changing the level of factor A from
the low level to the high level brings a response
increase of 21 units.
And changing the level of factor B from the low
level to the high level increases the response by
11 units.

The plots below show the main effects of
factors A and B.

Both A and B are significant, which you can
see by the fact that the slope is not 0.
A 0 slope in the effect line that connects the
response at the high level with the response at
the low level indicates that it doesn't matter to
the response whether the factor is set at its
high value or its low value, so the effect of
such a factor is not significant.
Of course, the p value from the F test gives
the significance of the factors precisely, but it
is usually evident from the effects plots.

Now how do you compute the interaction effect?
Interaction occurs when the difference in
response between the levels of one factor
differs across the levels of the other
factor. That is,
$$(Y_{A^+B^-} - Y_{A^-B^-}) \neq (Y_{A^+B^+} - Y_{A^-B^+})$$
The first term here is the difference between
the two levels of factor A at the low level of
factor B. That is, 40 - 20 = 20.
But the difference between the two levels of
factor A at the high level of factor B is
52 - 30 = 22.

Then the size of the interaction effect is
(22 - 20) / 2 = 1
and the interaction is not significant. The
interaction plot below shows almost parallel
lines, which indicates no interaction.
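The main and interaction effects for this 2x2 design can be computed from the coded levels with a few lines of code. The sketch below uses one observation per cell, as in the table; the helper name `effect` is illustrative.

```python
# (A code, B code, response) for the four runs of the 2x2 design
runs = [(-1, -1, 20), (1, -1, 40), (-1, 1, 30), (1, 1, 52)]


def effect(design_runs, sign):
    """Average response where sign is +1 minus average response where sign is -1."""
    half = len(design_runs) / 2
    return sum(sign(a, b) * y for a, b, y in design_runs) / half


effect_a = effect(runs, lambda a, b: a)        # main effect of A
effect_b = effect(runs, lambda a, b: b)        # main effect of B
effect_ab = effect(runs, lambda a, b: a * b)   # AB interaction effect

print(f"A = {effect_a}, B = {effect_b}, AB = {effect_ab}")
```

Using the product of the two codes as the sign for the interaction gives the same value as half the difference between the A effect at B high and the A effect at B low.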

Now suppose the two factors are quantitative,
like temperature, pressure, time, etc. Then you
could write a regression model version of the
design.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \epsilon$$
As before, X1 represents factor A and X2
represents factor B. X1X2 is the interaction
term, and $\epsilon$ is the error term.
The parameter estimates for this model turn out
to be related to the effect estimates.

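The parameter estimates are the grand mean and
half of each effect estimate:
$$\hat\beta_0 = \frac{20 + 40 + 30 + 52}{4} = 35.5, \quad \hat\beta_1 = \frac{21}{2} = 10.5, \quad \hat\beta_2 = \frac{11}{2} = 5.5, \quad \hat\beta_{12} = \frac{1}{2} = 0.5$$
So the model is
$$\hat{Y} = 35.5 + 10.5X_1 + 5.5X_2 + 0.5X_1X_2$$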

With this equation, you can find all the
effects of the design. For example, if you want
to know the mean when both A and B are at the
high (1) level, the equation is
$$\hat{Y} = 35.5 + 10.5(1) + 5.5(1) + 0.5(1)(1) = 52$$
Now if you want the mean when A is at the high
level and B is at the low level, the equation is
$$\hat{Y} = 35.5 + 10.5(1) + 5.5(-1) + 0.5(1)(-1) = 40$$
All you have to do is fill in the values of X1
and X2 with the appropriate codes, 1 or -1.
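As a rough check of the regression form, the sketch below builds the coefficients from the four responses of the first 2x2 example (each coefficient is half the corresponding effect, and the intercept is the grand mean) and evaluates the fitted equation at every cell. The function name `predict` is illustrative.

```python
# (X1 code, X2 code, response) for the first 2x2 example
runs = [(-1, -1, 20), (1, -1, 40), (-1, 1, 30), (1, 1, 52)]
n_runs = len(runs)

# With coded +/-1 levels, each coefficient is the signed average of the responses
b0 = sum(y for _, _, y in runs) / n_runs
b1 = sum(x1 * y for x1, _, y in runs) / n_runs
b2 = sum(x2 * y for _, x2, y in runs) / n_runs
b12 = sum(x1 * x2 * y for x1, x2, y in runs) / n_runs


def predict(x1, x2):
    """Fitted response at coded levels x1 and x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


print(f"Y-hat = {b0} + {b1} X1 + {b2} X2 + {b12} X1X2")
for x1, x2, y in runs:
    print(f"X1 = {x1:2d}, X2 = {x2:2d}: fitted {predict(x1, x2)}, observed {y}")
```

With four runs and four coefficients the model is saturated, so the fitted values reproduce the observed cell responses exactly.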

Now suppose the data in this experiment are

Factor A          Factor B          Response
-1 (low level)    -1 (low level)    20
 1 (high level)   -1 (low level)    50
-1 (low level)     1 (high level)   40
 1 (high level)    1 (high level)   12

Now let's look at the main and interaction
effects.

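The main effect of factor A is
$$A = \frac{50 + 12}{2} - \frac{20 + 40}{2} = 31 - 30 = 1$$
and the main effect of factor B is
$$B = \frac{40 + 12}{2} - \frac{20 + 50}{2} = 26 - 35 = -9$$
The effect of A is 50 - 20 = 30 at the low level
of B but 12 - 40 = -28 at the high level of B.
These differ by
(50 - 20) - (12 - 40) = 30 + 28 = 58
so the interaction effect, computed as before, is
(-28 - 30) / 2 = -29,
which is very large in magnitude and is significant.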

Now let's look at the main effects of the
factors graphically.

Clearly, factor A is not significant, which you
can see by the approximately 0 slope.
Factor B is probably significant because the
slope is not close to 0. The p value from the F
test gives the actual significance.

Now let's look at the interaction effect.
This is the effect of factors A and B in
combination, and is often the most important
effect.

Now these two lines are definitely not
parallel, so there is an interaction. It
probably is very significant because the two
lines cross.
Only the p value associated with the F
test can give the actual significance, but you
can see with the naked eye that there is no
question about significance here.

Interaction of factors is the key to the East,
as we say in the West.
Suppose you wanted the factor levels that give
the lowest possible response. If you picked by
main effects, you would pick A low and B high.
But look at the interaction plot and it will
tell you to pick A high and B high.

This is why, if the interaction term is
significant, you never, never, never interpret
the corresponding main effects. They are
meaningless in the presence of interaction.
And it is because factorial designs provide
the ability to test for interactions that they
are so popular and so successful.

You can get response surface plots for these
regression equations. If there is no
interaction, the response surface is a plane in
the 3rd dimension above the X1,X2 Cartesian
space. The plane may be tilted, but it is still
a plane.
If there is interaction, the response surface
is a twisted plane representing the curvature in
the model.

The simplest factorials are two-factor
experiments.
As an example, a battery must be designed to
be robust to extreme variations in temperature.
The engineer has three possible choices for the
plate material. He decides to test all three
plate materials at three temperatures,
-15°F, 70°F, and 125°F. He tests four batteries
at each combination of material type and
temperature. The response variable is battery
life.

Here are the data he got. Each row is one
battery at each of the three temperatures.

Plate material     Temperature (°F)
type               -15     70      125
1                  130     34      20
1                  74      40      70
1                  155     80      82
1                  180     75      58
2                  150     136     25
2                  159     122     70
2                  188     106     58
2                  126     115     45
3                  138     174     96
3                  110     120     104
3                  168     150     82
3                  160     139     60

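The model here is
$$Y_{ijk} = \mu + \tau_j + \beta_k + (\tau\beta)_{jk} + \epsilon_{ijk}$$
Both factors are fixed so we have the same
constraints as before
$$\sum_{j=1}^{J} \tau_j = 0$$
and
$$\sum_{k=1}^{K} \beta_k = 0$$
In addition,
$$\sum_{j=1}^{J} (\tau\beta)_{jk} = \sum_{k=1}^{K} (\tau\beta)_{jk} = 0$$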

The experiment has n = 4 replicates, so there
are nJK = (4)(3)(3) = 36 total observations.

The total sum of squares can be partitioned
into four components
SStotal = SSA + SSB + SSAB + SSe

and the ANOVA table is

Source   SS        df           MS   p
A        SSA       J-1
B        SSB       K-1
AB       SSAB      (J-1)(K-1)
Error    SSe       JK(n-1)
Total    SStotal   JKn-1

For the battery life experiment, the cell and
marginal means are

Material         Temperature (°F)
type             -15       70        125       Mean
1                134.75    57.25     57.50     83.17
2                155.75    119.75    49.50     108.33
3                144.00    145.75    85.50     125.08
Mean             144.83    107.58    64.17

The ANOVA table is

Source        SS          df   MS          p
Material      10,683.72   2    5,341.86    0.0020
Temperature   39,118.72   2    19,559.36   0.0001
Interaction   9,613.78    4    2,403.44    0.0186
Error         18,230.75   27   675.21
Total         77,646.97   35
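The two-way decomposition can be verified from the raw battery data. This is a sketch for a balanced two-factor design using totals and a correction term; the variable and function names are illustrative.

```python
# Battery life: (material type, temperature in deg F) -> 4 replicate lifetimes
life = {
    (1, -15): [130, 74, 155, 180], (1, 70): [34, 40, 80, 75], (1, 125): [20, 70, 82, 58],
    (2, -15): [150, 159, 188, 126], (2, 70): [136, 122, 106, 115], (2, 125): [25, 70, 58, 45],
    (3, -15): [138, 110, 168, 160], (3, 70): [174, 120, 150, 139], (3, 125): [96, 104, 82, 60],
}

materials = sorted({m for m, _ in life})
temps = sorted({t for _, t in life})
n = 4  # replicates per cell

all_obs = [y for ys in life.values() for y in ys]
correction = sum(all_obs) ** 2 / len(all_obs)  # (grand total)^2 / N


def material_total(m):
    return sum(sum(life[m, t]) for t in temps)


def temp_total(t):
    return sum(sum(life[m, t]) for m in materials)


ss_total = sum(y * y for y in all_obs) - correction
ss_material = sum(material_total(m) ** 2 for m in materials) / (len(temps) * n) - correction
ss_temp = sum(temp_total(t) ** 2 for t in temps) / (len(materials) * n) - correction
ss_cells = sum(sum(ys) ** 2 for ys in life.values()) / n - correction
ss_interaction = ss_cells - ss_material - ss_temp
ss_error = ss_total - ss_cells

for name, ss in [("Material", ss_material), ("Temperature", ss_temp),
                 ("Interaction", ss_interaction), ("Error", ss_error),
                 ("Total", ss_total)]:
    print(f"{name:12s} {ss:10.2f}")
```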

Because the interaction is significant, the
only plot of interest is the interaction plot.

Although it is not the best at the lowest
temperature, Type 3 is much better than the other
two at normal and high temperatures. Its life at
the lowest temperature averages only about 12
hours less than the life with Type 2.
Type 3 would probably provide the design most
robust to temperature differences.

Suppose you have a factorial design with more
than two factors. Take, for example, a three-way
factorial design, where the factors are A, B, and
C.
All the theory is the same, except that now you
have three 2-way interactions, AB, AC, BC, and
one 3-way interaction, ABC.

Consider the problem of soft-drink bottling.
The idea is to get each bottle filled to a
uniform height, but there is variation around
this height. Not every bottle is filled to the
same height.
The process engineer can control three
variables during the filling process: percent
carbonation (A), operating pressure (B), and
number of bottles produced per minute or line
speed (C).

The engineer chooses three levels of
carbonation (factor A), two levels of pressure
(factor B), and two levels for line speed (factor
C). This is a fixed effects design. He also
decides to run two replicates.
The response variable is the average deviation
from the target fill height in a production run
of bottles at each set of conditions. Positive
deviations are above the target and negative
deviations are below the target.

The data are

                     Operating pressure (B)
                     25 psi              30 psi
Percent              Line speed (C)      Line speed (C)
carbonation (A)      200      250        200      250
10                   -3       -1         -1       1
10                   -1       0          0        1
12                   0        2          2        6
12                   1        1          3        5
14                   5        7          7        10
14                   4        6          9        11

The 3-way means are

                     Operating pressure (B)
                     25 psi              30 psi
Percent              Line speed (C)      Line speed (C)
carbonation (A)      200      250        200      250
10                   -2.0     -0.5       -0.5     1.0
12                   0.5      1.5        2.5      5.5
14                   4.5      6.5        8.0      10.5

The 2-way means are

         B (low)    B (high)
A        25 psi     30 psi
10       -1.25      0.25
12       1.00       4.00
14       5.50       9.25

         C (low)    C (high)
A        200        250
10       -1.25      0.25
12       1.50       3.50
14       6.25       8.50

         C (low)    C (high)
B        200        250
25 psi   1.00       2.50
30 psi   3.33       5.67

The main effect means are

Factor A   Mean
10         -0.500
12         2.500
14         7.375

Factor B   Mean
25 psi     1.75
30 psi     4.50

Factor C   Mean
200        2.167
250        4.083

The ANOVA table is

Source   SS        df   MS        p
A        252.750   2    126.375   <0.0001
B        45.375    1    45.375    <0.0001
C        22.042    1    22.042    0.0001
AB       5.250     2    2.625     0.0557
AC       0.583     2    0.292     0.6713
BC       1.042     1    1.042     0.2485
ABC      1.083     2    0.542     0.4867
Error    8.500     12   0.708
Total    336.625   23
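As a partial check of this table, the sketch below recomputes the three main-effect sums of squares and the total sum of squares from the raw fill-height data. It deliberately stops short of the interaction terms to stay brief; the helper name `main_effect_ss` is illustrative.

```python
# Fill-height deviation: (carbonation %, pressure psi, line speed) -> 2 replicates
fill = {
    (10, 25, 200): [-3, -1], (10, 25, 250): [-1, 0],
    (10, 30, 200): [-1, 0],  (10, 30, 250): [1, 1],
    (12, 25, 200): [0, 1],   (12, 25, 250): [2, 1],
    (12, 30, 200): [2, 3],   (12, 30, 250): [6, 5],
    (14, 25, 200): [5, 4],   (14, 25, 250): [7, 6],
    (14, 30, 200): [7, 9],   (14, 30, 250): [10, 11],
}

all_obs = [y for ys in fill.values() for y in ys]
correction = sum(all_obs) ** 2 / len(all_obs)  # (grand total)^2 / N


def main_effect_ss(position):
    """SS for the factor stored at the given position of the cell key."""
    totals, counts = {}, {}
    for key, ys in fill.items():
        level = key[position]
        totals[level] = totals.get(level, 0) + sum(ys)
        counts[level] = counts.get(level, 0) + len(ys)
    return sum(totals[lv] ** 2 / counts[lv] for lv in totals) - correction


ss_a = main_effect_ss(0)  # percent carbonation
ss_b = main_effect_ss(1)  # operating pressure
ss_c = main_effect_ss(2)  # line speed
ss_total = sum(y * y for y in all_obs) - correction

print(f"SS_A = {ss_a:.3f}, SS_B = {ss_b:.3f}, SS_C = {ss_c:.3f}")
print(f"SS_total = {ss_total:.3f}")
```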

So the only significant effects are those for
A, B, C, and AB. The AB interaction is only
marginally significant (p = 0.0557), so
interpretation must be tempered
by what we see in the A and B main effects. The
plots are shown next.

The plots are

Our goal is to minimize the response. Given
the ANOVA table and these plots, we would choose
the low level of factor A, 10% carbonation, and
the low level of factor B, 25 psi. This is true
whether we look at the two main effects plots or
the interaction plot. This is because the
interaction is barely significant.
We would also choose the slower line speed, 200
bottles per minute.

Now suppose you do an experiment where you
suspect nonlinearity and want to test for both
linear and quadratic effects.
Consider a tool life experiment, where the life
of a tool is thought to be a function of cutting
speed and tool angle. Three levels of each
factor are used. So this is a 2-way factorial
fixed effects design.

The three levels of cutting speed are 125,
150, and 175 in/min. The three levels of tool
angle are 15, 20, and 25 degrees. Two replicates
are used and the data are shown below.

Tool Angle       Cutting Speed (in/min)
(degrees)        125     150     175
15               -2      -3      2
15               -1      0       3
20               0       1       4
20               2       3       6
25               -1      5       0
25               0       6       -1

The ANOVA table for this experiment is
Source        SS       df   MS      p
Tool Angle     24.33    2   12.17   0.0086
Cut Speed      25.33    2   12.67   0.0076
TC             61.34    4   15.34   0.0018
Error          13.00    9    1.44
Total         124.00   17
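
The sums of squares in this table can be reproduced from the raw data. The following is a minimal Python sketch under the usual balanced two-way fixed effects formulas. The variable names are our own, and p-values are left out because they need an F distribution routine that the standard library does not provide.

```python
# data[angle][speed] = the two replicate observations from the data table
data = {
    15: {125: [-2, -1], 150: [-3, 0], 175: [2, 3]},
    20: {125: [0, 2], 150: [1, 3], 175: [4, 6]},
    25: {125: [-1, 0], 150: [5, 6], 175: [0, -1]},
}
angles = sorted(data)
speeds = sorted(data[angles[0]])
a, b, n = len(angles), len(speeds), 2

all_obs = [y for t in angles for c in speeds for y in data[t][c]]
cf = sum(all_obs) ** 2 / len(all_obs)  # correction factor, (grand total)^2 / N

# Totals for each tool angle, each cutting speed, and each cell
t_tot = {t: sum(sum(data[t][c]) for c in speeds) for t in angles}
c_tot = {c: sum(sum(data[t][c]) for t in angles) for c in speeds}
cell_tot = {(t, c): sum(data[t][c]) for t in angles for c in speeds}

ss_total = sum(y * y for y in all_obs) - cf
ss_t = sum(v * v for v in t_tot.values()) / (b * n) - cf
ss_c = sum(v * v for v in c_tot.values()) / (a * n) - cf
ss_cells = sum(v * v for v in cell_tot.values()) / n - cf
ss_tc = ss_cells - ss_t - ss_c      # interaction = cells minus main effects
ss_error = ss_total - ss_cells      # within-cell variation

ms_error = ss_error / (a * b * (n - 1))
f_ratios = {
    "Tool Angle": (ss_t / (a - 1)) / ms_error,
    "Cut Speed": (ss_c / (b - 1)) / ms_error,
    "TC": (ss_tc / ((a - 1) * (b - 1))) / ms_error,
}

for name, ss in [("Tool Angle", ss_t), ("Cut Speed", ss_c), ("TC", ss_tc),
                 ("Error", ss_error), ("Total", ss_total)]:
    print(f"{name:<11} SS = {ss:7.2f}")
```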

129

The table of cell and marginal means is

            Factor C
Factor T    125      150      175      Mean
15          -1.5     -1.5      2.5     -0.167
20           1.0      2.0      5.0      2.667
25          -0.5      5.5     -0.5      1.500
Mean        -0.33     2.0      2.33
130

Clearly there is reason to suspect quadratic
effects here. So we can break down each factor's
2 df into a linear and a quadratic component,
with 1 df each.
We do this by using orthogonal contrasts. For
three equally spaced levels, the contrast for
linear is -1, 0, 1 and the contrast for
quadratic is 1, -2, 1.

131

We need a table of factor totals to proceed.
For factor T,

Factor T   Sum of Obs
15         -1
20         16
25          9

Now applying the linear and quadratic
contrasts to these sums,

Factor T   Sum of Obs   Linear   Quadratic
15         -1           -1        1
20         16            0       -2
25          9            1        1
Contrast                10      -24
132

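Now to find the SS due to these two new
contrasts, we square each contrast and divide by
the number of observations in each factor total
(6) times the sum of the squared coefficients:

\[ SS_{T,\mathrm{lin}} = \frac{(10)^2}{6\,[(-1)^2 + 0^2 + 1^2]} = \frac{100}{12} = 8.33 \]

\[ SS_{T,\mathrm{quad}} = \frac{(-24)^2}{6\,[1^2 + (-2)^2 + 1^2]} = \frac{576}{36} = 16.00 \]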

133

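Now we can do the same thing for factor C. The
table of sums with the contrasts included is

Factor C   Sum of Obs   Linear   Quadratic
125        -2           -1        1
150        12            0       -2
175        14            1        1
Contrast                16      -12

Now for the SS due to each contrast,

\[ SS_{C,\mathrm{lin}} = \frac{(16)^2}{6\,[(-1)^2 + 0^2 + 1^2]} = \frac{256}{12} = 21.33 \]

\[ SS_{C,\mathrm{quad}} = \frac{(-12)^2}{6\,[1^2 + (-2)^2 + 1^2]} = \frac{144}{36} = 4.00 \]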
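
The contrast arithmetic for both factors can be sketched in a few lines of Python. This is an illustration only. The helper name contrast_ss is our own, and it assumes each factor total is a sum of 6 observations, as in this experiment.

```python
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)


def contrast_ss(totals, coeffs, n_per_total):
    """Return (contrast value, sum of squares) for one orthogonal contrast.

    SS = (sum of c_i * total_i)^2 / (n_per_total * sum of c_i^2)
    """
    value = sum(c * t for c, t in zip(coeffs, totals))
    ss = value ** 2 / (n_per_total * sum(c * c for c in coeffs))
    return value, ss


t_totals = (-1, 16, 9)    # tool angle 15, 20, 25
c_totals = (-2, 12, 14)   # cutting speed 125, 150, 175
n_per_total = 6           # 3 levels of the other factor x 2 replicates

t_lin = contrast_ss(t_totals, LINEAR, n_per_total)
t_quad = contrast_ss(t_totals, QUADRATIC, n_per_total)
c_lin = contrast_ss(c_totals, LINEAR, n_per_total)
c_quad = contrast_ss(c_totals, QUADRATIC, n_per_total)

for label, (value, ss) in [("T linear", t_lin), ("T quadratic", t_quad),
                           ("C linear", c_lin), ("C quadratic", c_quad)]:
    print(f"{label:<12} contrast = {value:4d}   SS = {ss:6.2f}")
```

Because the two contrasts are orthogonal, the linear and quadratic SS for a factor add up to that factor's SS in the ANOVA table.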
134

Now we can write the new ANOVA table
Source        SS       df   MS      p
Tool angle     24.33    2   12.17   0.0086
  Linear        8.33    1    8.33   0.0396
  Quad         16.00    1   16.00   0.0088
Cut Speed      25.33    2   12.67   0.0076
  Linear       21.33    1   21.33   0.0039
  Quad          4.00    1    4.00   0.1304
TC             61.34    4   15.34   0.0018
Error          13.00    9    1.44
Total         124.00   17

135

Now see how the df for each of the factors has
been split into its two components, linear and
quadratic. It turns out that everything except
the quadratic for Cutting Speed is significant.
Now guess what! There are 4 df for the
interaction term, so why not split them into
linear and quadratic components as well? It
turns out that you can get T_lin x C_lin,
T_lin x C_quad, T_quad x C_lin, and
T_quad x C_quad.
These 4 components, with 1 df each, use up the
4 df for the interaction term.
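
The slides do not work out these four components, so the following Python sketch is only an illustration of one standard way to get them. Each interaction contrast is taken as the product of a T coefficient and a C coefficient, applied to the 9 cell totals. The function name and layout are our own.

```python
# Cell totals (sum of the 2 replicates) from the tool life data:
# rows = tool angle 15, 20, 25; columns = cutting speed 125, 150, 175.
cell_totals = [
    [-3, -3, 5],
    [2, 4, 10],
    [-1, 11, -1],
]
n = 2  # replicates per cell

LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)


def interaction_component_ss(row_coeffs, col_coeffs):
    """SS for one 1-df interaction component.

    The coefficient for cell (i, j) is row_coeffs[i] * col_coeffs[j];
    SS = contrast^2 / (n * sum of squared coefficients).
    """
    contrast = 0
    sum_sq = 0
    for i, r in enumerate(row_coeffs):
        for j, c in enumerate(col_coeffs):
            coeff = r * c
            contrast += coeff * cell_totals[i][j]
            sum_sq += coeff * coeff
    return contrast ** 2 / (n * sum_sq)


components = {
    "Tlin x Clin": interaction_component_ss(LINEAR, LINEAR),
    "Tlin x Cquad": interaction_component_ss(LINEAR, QUADRATIC),
    "Tquad x Clin": interaction_component_ss(QUADRATIC, LINEAR),
    "Tquad x Cquad": interaction_component_ss(QUADRATIC, QUADRATIC),
}

for name, ss in components.items():
    print(f"{name:<14} SS = {ss:6.2f}")
print(f"{'Sum':<14} SS = {sum(components.values()):6.2f}")
```

The four pieces should add back up to the TC sum of squares in the ANOVA table, which is a useful check on the arithmetic.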

136

There is reason to believe there is a quadratic
component in the interaction, as the interaction
plot shows, but we'll pass on this for now.

137

Now let's talk about blocking in a factorial
design. The concept is identical to blocking in
a 1-way design. Blocking is used when there is a
nuisance factor or when it is not possible to
completely randomize all the runs in the design.
For example, there simply may not be enough
time to run the entire experiment in one day, so
perhaps the experimenter could run one complete
replicate on one day and another complete
replicate on the second day, etc. In this case,
days would be a blocking factor.

138

Let's look at an example. An engineer is
studying ways to improve the detection of targets
on a radar scope. The two factors of importance
are background clutter and the type of filter
placed over the screen.
Three levels of background clutter and two
filter types are selected to be tested. This is
a fixed effects 2 x 3 factorial design.

139

To get the response, a signal is introduced
into the scope and its intensity is increased
until an operator sees it. Intensity at
detection is the response variable.
Because of operator availability, an operator
must sit at the scope until all necessary runs
have been made. But operators differ in skill and
ability to use the scope, so it makes sense to
use operators as a blocking variable.

140

Four operators are selected for use in the
experiment. So each operator receives the
2 x 3 = 6 treatment combinations in random order,
and the design is a randomized complete block
design. The data are

Operator          1          2          3          4
Filter type       1    2     1    2     1    2     1    2
Ground clutter
Low               90   86    96   84   100   92    92   81
Medium           102   87   106   90   105   97    96   80
High             114   93   112   91   108   95    98   83
141

Since each operator (block) represents the
complete experiment, all effects are within
operators. The ANOVA table is
Source             SS        df   MS        p
Within blocks      1479.33    5    295.87   <0.000001
  Ground clutter    335.58    2    167.79   <0.0003
  Filter type      1066.67    1   1066.67   <0.0001
  GF interaction     77.08    2     38.54   0.0573
Between blocks      402.17    3    134.06   <0.0002
Error               166.33   15     11.09
Total              2047.83   23
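
For readers who want to check the partition, here is a minimal Python sketch of the sums of squares for this blocked factorial. It is not from the original slides. The helper ss_between and the data layout are our own, and the error SS is obtained by subtraction, as is usual for a randomized complete block design.

```python
from collections import defaultdict

# obs[clutter][operator] = (response with filter type 1, with filter type 2)
obs = {
    "low":    {1: (90, 86), 2: (96, 84), 3: (100, 92), 4: (92, 81)},
    "medium": {1: (102, 87), 2: (106, 90), 3: (105, 97), 4: (96, 80)},
    "high":   {1: (114, 93), 2: (112, 91), 3: (108, 95), 4: (98, 83)},
}

# Flatten to rows of (clutter, filter, operator, response)
rows = [
    (clutter, filt, op, y)
    for clutter, by_op in obs.items()
    for op, pair in by_op.items()
    for filt, y in enumerate(pair, start=1)
]
cf = sum(r[3] for r in rows) ** 2 / len(rows)  # correction factor


def ss_between(key):
    """SS among the groups defined by key(row): sum(total^2 / count) - CF."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[key(r)] += r[3]
        counts[key(r)] += 1
    return sum(totals[g] ** 2 / counts[g] for g in totals) - cf


ss_clutter = ss_between(lambda r: r[0])
ss_filter = ss_between(lambda r: r[1])
ss_treat = ss_between(lambda r: (r[0], r[1]))   # the "within blocks" line
ss_gf = ss_treat - ss_clutter - ss_filter
ss_blocks = ss_between(lambda r: r[2])          # operators
ss_total = sum(r[3] ** 2 for r in rows) - cf
ss_error = ss_total - ss_treat - ss_blocks

for name, ss in [("Ground clutter", ss_clutter), ("Filter type", ss_filter),
                 ("GF interaction", ss_gf), ("Between blocks", ss_blocks),
                 ("Error", ss_error), ("Total", ss_total)]:
    print(f"{name:<15} SS = {ss:8.2f}")
```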

142

The effects of both the background clutter and
the filter type are highly significant. Their
interaction is marginally significant.
As suspected, the operators are significantly
different in their ability to detect the signal,
so it is good that they were used as blocks.

143

Now let's look at the 2^k factorial design.
This notation means that there are k factors,
each at 2 levels, usually a high and a low level.
These factors may be qualitative or
quantitative.
This is a very important class of designs and
is widely used in screening experiments. Because
there are only 2 levels, it is assumed that the
response is linear over the range of values
chosen.

144

Let's look at an example of the simplest of
these designs, the 2^2 factorial design.
Consider the effect of reactant concentration
(factor A) and amount of catalyst (factor B) on
the yield in a chemical process. The 2 levels of
factor A are 15% and 25%. The 2 levels of
factor B are 1 pound and 2 pounds. The
experiment is replicated three times.

145

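Here are the data.

Factor A    Factor B    Replicate 1   Replicate 2   Replicate 3
-1 (low)    -1 (low)    28            25            27
 1 (high)   -1 (low)    36            32            32
-1 (low)     1 (high)   18            19            23
 1 (high)    1 (high)   31            30            29

This design can be pictured as a rectangle, with
factor A on the horizontal axis and factor B on
the vertical axis, each running from -1 to 1.
The replicate averages at the four corners are
26.67 at (A = -1, B = -1), 33.33 at (1, -1),
20 at (-1, 1), and 30 at (1, 1).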
146

The interaction effect can also be derived from
this table.
Multiplying the A and B factor level codes
gets the AB interaction. This is always the way
interactions are derived. Now averaging
according to the AB coefficients gives the
interaction effect.

Factor A    Factor B    AB interaction
-1 (low)    -1 (low)    (-1)(-1) = 1
 1 (high)   -1 (low)    (1)(-1) = -1
-1 (low)     1 (high)   (-1)(1) = -1
 1 (high)    1 (high)   (1)(1) = 1
147

Now we can find the effects easily from the
table below.

A B AB Replicate average
-1 -1 1 26.67
1 -1 -1 33.33
-1 1 -1 20
1 1 1 30
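
Each effect is the average response where its column is +1 minus the average response where its column is -1. A minimal Python sketch of that calculation follows. It works from the raw replicates rather than the rounded averages, and the names runs and effect are our own.

```python
# (A code, B code, replicate observations) for the 2^2 yield experiment
runs = [
    (-1, -1, (28, 25, 27)),
    (1, -1, (36, 32, 32)),
    (-1, 1, (18, 19, 23)),
    (1, 1, (31, 30, 29)),
]


def effect(column):
    """Mean of the replicate averages at +1 minus the mean at -1.

    `column` maps the (A, B) codes of a run to that run's -1/+1 code,
    e.g. lambda a, b: a * b for the AB interaction.
    """
    high = [sum(reps) / len(reps) for a, b, reps in runs if column(a, b) == 1]
    low = [sum(reps) / len(reps) for a, b, reps in runs if column(a, b) == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {
    "A": effect(lambda a, b: a),
    "B": effect(lambda a, b: b),
    "AB": effect(lambda a, b: a * b),
}

for name, value in effects.items():
    print(f"{name:<2} effect = {value:6.2f}")
```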
148

Because there are only first-order effects,
the response surface is a plane. Yield increases
with increasing reactant concentration (factor A)
and decreases with increasing catalyst amount
(factor B).

149

The ANOVA table is
Source   SS       df   MS       p
A        208.33    1   208.33   <0.0001
B         75.00    1    75.00   <0.0024
AB         8.33    1     8.33   0.1826
Error     31.34    8     3.92
Total    323.00   11

150

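It is clear that both main effects are
significant and that there is no AB interaction.
The regression model is

\[ \hat{y} = 27.5 + \frac{8.33}{2}\,x_1 + \frac{-5.00}{2}\,x_2 = 27.5 + 4.167\,x_1 - 2.50\,x_2 \]

where x_1 and x_2 are the coded levels (-1 or 1)
of factors A and B, and the β coefficients are
½ the effects, as before. 27.5 is the grand
average of all 12 observations.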
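
As a small illustration (our own sketch, not from the slides), the coefficients can be computed from the data and the fitted model evaluated at the four corners of the design. The names b0, b1, b2 and predict are assumptions made for this example.

```python
# (A code, B code, replicate observations) for the 2^2 yield experiment
runs = [
    (-1, -1, (28, 25, 27)),
    (1, -1, (36, 32, 32)),
    (-1, 1, (18, 19, 23)),
    (1, 1, (31, 30, 29)),
]
all_obs = [y for _, _, reps in runs for y in reps]


def effect(column):
    """Contrast of the run totals divided by half the number of observations."""
    contrast = sum(column(a, b) * sum(reps) for a, b, reps in runs)
    return contrast / (len(all_obs) / 2)


b0 = sum(all_obs) / len(all_obs)       # grand average
b1 = effect(lambda a, b: a) / 2        # half the A effect
b2 = effect(lambda a, b: b) / 2        # half the B effect


def predict(x1, x2):
    """Fitted yield at coded levels x1 (factor A) and x2 (factor B)."""
    return b0 + b1 * x1 + b2 * x2


for a, b, reps in runs:
    observed = sum(reps) / len(reps)
    print(f"A={a:2d} B={b:2d}  observed mean {observed:6.2f}  fitted {predict(a, b):6.2f}")
```

The fitted values differ slightly from the observed cell averages because the non-significant AB term has been left out of the model.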

151

Now let's look at the 2^3 factorial design. In
this case, there are three factors, each at 2
levels. The design is

Run   A    B    C    AB              AC              BC              ABC
1    -1   -1   -1    (-1)(-1) = 1    (-1)(-1) = 1    (-1)(-1) = 1    (-1)(-1)(-1) = -1
2     1   -1   -1    (1)(-1) = -1    (1)(-1) = -1    (-1)(-1) = 1    (1)(-1)(-1) = 1
3    -1    1   -1    (-1)(1) = -1    (-1)(-1) = 1    (1)(-1) = -1    (-1)(1)(-1) = 1
4     1    1   -1    (1)(1) = 1      (1)(-1) = -1    (1)(-1) = -1    (1)(1)(-1) = -1
5    -1   -1    1    (-1)(-1) = 1    (-1)(1) = -1    (-1)(1) = -1    (-1)(-1)(1) = 1
6     1   -1    1    (1)(-1) = -1    (1)(1) = 1      (-1)(1) = -1    (1)(-1)(1) = -1
7    -1    1    1    (-1)(1) = -1    (-1)(1) = -1    (1)(1) = 1      (-1)(1)(1) = -1
8     1    1    1    (1)(1) = 1      (1)(1) = 1      (1)(1) = 1      (1)(1)(1) = 1
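
The same table can be generated for any number of factors. Below is a minimal Python sketch. The function name full_factorial is our own, and it assumes the run order used above, where the first factor changes fastest.

```python
from itertools import combinations, product
from math import prod


def full_factorial(names):
    """Rows of a 2^k design with the first factor changing fastest.

    Each row is a dict holding the -1/+1 code of every factor and of
    every interaction, where an interaction code is the product of the
    codes of the factors it involves.
    """
    rows = []
    # product() varies its LAST position fastest, so reverse each tuple
    # to make the first factor the one that alternates on every run.
    for levels in product((-1, 1), repeat=len(names)):
        row = dict(zip(names, reversed(levels)))
        for size in range(2, len(names) + 1):
            for combo in combinations(names, size):
                row["".join(combo)] = prod(row[f] for f in combo)
        rows.append(row)
    return rows


design = full_factorial("ABC")
columns = list(design[0])
print("Run  " + "  ".join(f"{c:>3}" for c in columns))
for run, row in enumerate(design, start=1):
    print(f"{run:>3}  " + "  ".join(f"{row[c]:>3}" for c in columns))
```

Calling full_factorial("ABCD") gives the 16 runs and 15 columns of a 2^4 design in the same way.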
152

Remember the beverage filling study we talked
about earlier? Now assume that each of the 3
factors has only two levels.
So we have factor A (% carbonation) at levels
10% and 12%.
Factor B (operating pressure) is at levels 25
psi and 30 psi.
Factor C (line speed) is at levels 200 and 250
bottles per minute.

153

Now our experimental matrix becomes

Run   A (percent     B (operating   C (line   Replicate 1   Replicate 2   Total of obs
      carbonation)   pressure)      speed)
1     10             25             200       -3            -1            -4
2     12             25             200        0             1             1
3     10             30             200       -1             0            -1
4     12             30             200        2             3             5
5     10             25             250       -1             0            -1
6     12             25             250        2             1             3
7     10             30             250        1             1             2
8     12             30             250        6             5            11
154

And our design matrix is

Run   A    B    C    AB   AC   BC   ABC   Replicate 1   Replicate 2   Total of obs
1    -1   -1   -1    1    1    1   -1    -3            -1            -4
2     1   -1   -1   -1   -1    1    1     0             1             1
3    -1    1   -1   -1    1   -1    1    -1             0            -1
4     1    1   -1    1   -1   -1   -1     2             3             5
5    -1   -1    1    1   -1   -1    1    -1             0            -1
6     1   -1    1   -1    1   -1   -1     2             1             3
7    -1    1    1   -1   -1    1   -1     1             1             2
8     1    1    1    1    1    1    1     6             5            11

From this matrix, we can determine all our
effects by applying the codes in each column to
the totals and dividing by 8, the number of
observations in each of the two averages being
compared.
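
The calculation just described can be sketched in Python as follows. This is our own illustration, not code from the course. With n replicates of a 2^k design, it uses effect = contrast / (n 2^(k-1)) and SS = contrast^2 / (n 2^k), and it gets the error SS from the spread of the replicates within each run.

```python
from itertools import combinations
from math import prod

# The two replicates for runs 1-8, in the order of the design matrix
replicates = [(-3, -1), (0, 1), (-1, 0), (2, 3), (-1, 0), (2, 1), (1, 1), (6, 5)]
names = "ABC"
k = len(names)
n = 2  # replicates per run

totals = [sum(reps) for reps in replicates]

# -1/+1 code of each factor in each run; bit i of the run index sets
# factor i, so factor A alternates fastest, as in the design matrix.
codes = [
    {name: (1 if (run >> i) & 1 else -1) for i, name in enumerate(names)}
    for run in range(2 ** k)
]

results = {}
for size in range(1, k + 1):
    for combo in combinations(names, size):
        # Interaction codes are products of the factor codes
        contrast = sum(prod(c[f] for f in combo) * t for c, t in zip(codes, totals))
        results["".join(combo)] = {
            "effect": contrast / (n * 2 ** (k - 1)),
            "ss": contrast ** 2 / (n * 2 ** k),
        }

# Error SS: squared deviations of each observation from its run mean
ss_error = sum(
    sum((y - sum(reps) / n) ** 2 for y in reps) for reps in replicates
)

for label, r in results.items():
    print(f"{label:<3} effect = {r['effect']:5.2f}   SS = {r['ss']:6.2f}")
print(f"Error SS = {ss_error:.2f}")
```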
155

The effects are
A = 3.00, B = 2.25, C = 1.75,
AB = 0.75, AC = 0.25, BC = 0.50, ABC = 0.50

156

The ANOVA table is
Source           SS      df   MS       p
A Percent carb   36.00    1   36.00    <0.0001
B Op Pressure    20.25    1   20.25    <0.0005
C Line speed     12.25    1   12.25    0.0022
AB                2.25    1    2.25    0.0943
AC                0.25    1    0.25    0.5447
BC                1.00    1    1.00    0.2415
ABC               1.00    1    1.00    0.2415
Error             5.00    8    0.625
Total            78.00   15

There are only 3 significant effects, factors
A, B, and C. None of the interactions is
significant.

157

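The regression model for soft-drink fill height
deviation is

\[ \hat{y} = 1.00 + \frac{3.00}{2}\,x_1 + \frac{2.25}{2}\,x_2 + \frac{1.75}{2}\,x_3 \]

where x_1, x_2, and x_3 are the coded levels of
A, B, and C, and 1.00 is the grand average of
the 16 observations.
Because the interactions are not significant,
they are not included in the regression model.
So the response surface here is a plane at each
level of line speed.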

158

All along we have had at least 2 replicates
for each design so we can get an error term.
Without the error term, how do we create the
F-ratio to test for significance?
But think about it. A 2^4 design has 16 runs.
With 2 replicates, that doubles to 32 runs. The
resources needed for so many runs are often not
available, so some large designs are run with
only 1 replicate.

159

Now what do we do for an error term to test for
effects?
The idea is to pool some high-level
interactions under the assumption that they are
not significant anyway and use them as an error
term. If indeed they are not significant, this
is OK. But what if you pool them as error and
they are significant? This is not OK.

160

So it would be nice to know, before we pool,
which terms are actually poolable. Thanks to
Cuthbert Daniel, we can do this. Daniel's idea is
to do a normal probability plot of the effects.
All negligible effects will fall along a line
and those that do not fall along the line are
significant. So we may pool all effects that are
on the line. The reasoning is that the
negligible effects, like error, are normally
distributed with mean 0 and variance σ^2 and so
will fall along the line.

161

Let's look at an example of a chemical
product. The purpose of this experiment is to
maximize the filtration rate of this product, and
it is thought to be influenced by 4 factors:
temperature (A), pressure (B), concentration of
formaldehyde (C), and stirring rate (D).

162

The design matrix and response are

Run A B C D AB AC BC AD BD CD ABC ABD ACD BCD ABCD Filt rate
1 -1 -1 -1 -1 1 1 1 1 1 1 -1 -1 -1 -1 1 45
2 1 -1 -1 -1 -1 -1 1 -1 1 1 1 1 1 -1 -1 71
3 -1 1 -1 -1 -1 1 -1 1 -1 1 1 1 -1 1 -1 48
4 1 1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 1 65
5 -1 -1 1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 68
6 1 -1 1 -1 -1 1 -1 -1 1 -1 -1 1 -1 1 1 60
7 -1 1 1 -1 -1 -1 1 1 -1 -1 -1 1 1 -1 1 80
8 1 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 65
9 -1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 -1 43
10 1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 100
11 -1 1 -1 1 -1 1 -1 -1 1 -1 1 -1 1 -1 1 45
12 1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 -1 -1 104
13 -1 -1 1 1 1 -1 -1 -1 -1 1 1 1 -1 -1 1 75
14 1 -1 1 1 -1 1 -1 1 -1 1 -1 -1 1 -1 -1 86
15 -1 1 1 1 -1 -1 1 -1 1 1 -1 -1 -1 1 -1 70
16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 96
163

From this matrix, we can estimate all the
effects and then do a normal probability plot of
them. The effects are
A   21.625    AB    0.125    ABC    1.875
B    3.125    AC  -18.125    ABD    4.125
C    9.875    AD   16.625    ACD   -1.625
D   14.625    BC    2.375    BCD   -2.625
              BD   -0.375    ABCD   1.375
              CD   -1.125
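
These 15 estimates can be reproduced from the filtration rates alone. The Python sketch below is our own illustration. It rebuilds the -1/+1 codes from the run order, with A changing fastest, and divides each contrast by 8, since an unreplicated 2^4 design compares two averages of 8 runs each.

```python
from itertools import combinations
from math import prod

# Filtration rates for runs 1-16, in the order of the design matrix
rates = [45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96]
names = "ABCD"
k = len(names)

# Bit i of the run index gives the level of factor i (A alternates fastest)
codes = [
    {name: (1 if (run >> i) & 1 else -1) for i, name in enumerate(names)}
    for run in range(2 ** k)
]

effects = {}
for size in range(1, k + 1):
    for combo in combinations(names, size):
        contrast = sum(prod(c[f] for f in combo) * y for c, y in zip(codes, rates))
        effects["".join(combo)] = contrast / 2 ** (k - 1)

for label, value in effects.items():
    print(f"{label:<4} {value:8.3f}")
```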

164

The best stab at a normal probability plot is

165

There are only 5 effects that are off the line.
These are, in the upper right corner C, D, AD,
A, and in the lower left corner, AC. All of the
points on the line are negligible, behaving like
residuals.
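
The coordinates for such a plot can be sketched in Python. This is our own illustration. The plotting position (i - 0.5)/m is one common convention and is an assumption here, since the slides do not say which was used. The code only produces the points. Judging which ones are off the line is still done by eye.

```python
from statistics import NormalDist

# Effect estimates as listed for the filtration rate experiment
effects = {
    "A": 21.625, "B": 3.125, "C": 9.875, "D": 14.625,
    "AB": 0.125, "AC": -18.125, "AD": 16.625,
    "BC": 2.375, "BD": -0.375, "CD": -1.125,
    "ABC": 1.875, "ABD": 4.125, "ACD": -1.625, "BCD": -2.625,
    "ABCD": 1.375,
}

ordered = sorted(effects.items(), key=lambda item: item[1])
m = len(ordered)

# Pair the i-th smallest effect with the standard normal quantile at
# cumulative probability (i - 0.5) / m  (assumed plotting position).
points = [
    (label, value, NormalDist().inv_cdf((i - 0.5) / m))
    for i, (label, value) in enumerate(ordered, start=1)
]

for label, value, z in points:
    print(f"{label:<4} effect = {value:8.3f}   normal score = {z:6.3f}")
```

Plotting effect against normal score puts AC at the lower left and C, D, AD, A at the upper right, with the remaining effects clustered near zero in between.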

166

Because we drop factor B and all its
interactions, we now get an ANOVA table with the
extra observations as error.
Source   SS        df   MS        p
A        1870.56    1   1870.56   <0.0001
C         390.06    1    390.06   <0.0001
D         855.56    1    855.56   <0.0001
AC       1314.06    1   1314.06   <0.0001
AD       1105.56    1   1105.56   <0.0001
CD          5.06    1      5.06
ACD        10.56    1     10.56
Error     179.52    8     22.44
Total    5730.94   15

167

Essentially, we have changed the design from a
2^4 design with only 1 replicate to a 2^3 design
with two replicates.
This is called projecting a higher-level design
into a lower-level design. If you start with an
unreplicated 2^k design, then drop h of the
factors, you can continue with a 2^(k-h) design
with 2^h replicates.
In this case, we started with a 2^4 design,
dropped h = 1 factor, and ended up with a
2^(4-1) = 2^3 design with 2^1 = 2 replicates.
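
To see the projection at work, here is a minimal Python sketch (our own, with assumed variable names) that ignores factor B, groups the 16 filtration rates into the 8 cells of the A, C, D design, and gets the error SS from the two observations in each cell.

```python
from collections import defaultdict
from math import prod

# Filtration rates for runs 1-16 of the 2^4 design (A alternates fastest)
rates = [45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96]

# Group by the (A, C, D) codes only; runs differing just in B share a cell
cells = defaultdict(list)
for run, y in enumerate(rates):
    a, b, c, d = (1 if (run >> i) & 1 else -1 for i in range(4))
    cells[(a, c, d)].append(y)

n_reps = 2  # 2^h replicates with h = 1 dropped factor

# Error SS: squared deviations from each cell mean
ss_error = sum(
    sum((y - sum(obs) / len(obs)) ** 2 for y in obs) for obs in cells.values()
)

# SS for each term of the 2^3 design: contrast^2 / (n_reps * number of cells).
# Each tuple lists positions in the (A, C, D) cell key.
terms = {"A": (0,), "C": (1,), "D": (2,), "AC": (0, 1),
         "AD": (0, 2), "CD": (1, 2), "ACD": (0, 1, 2)}
ss = {}
for label, idx in terms.items():
    contrast = sum(prod(key[i] for i in idx) * sum(obs) for key, obs in cells.items())
    ss[label] = contrast ** 2 / (n_reps * len(cells))

for label, value in ss.items():
    print(f"{label:<5} SS = {value:8.2f}")
print(f"Error SS = {ss_error:8.2f} on {len(cells) * (n_reps - 1)} df")
```

The error SS computed this way comes to 179.50; the 179.52 in the ANOVA table is what subtracting the rounded table entries from the rounded total gives.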

168

The main effects plots are

169

The two significant interaction plots are

170

Now we are going to talk about the addition of
center points to a 2^k design. In this case, we
are looking for quadratic curvature, so we must
have quantitative factors.
The center points are run at 0 for each of the
k factors in the design. So now the coefficients
are -1, 0, 1. We have the same n replicates at
the center points as at the -1 and 1 points.

171

Now let's go back to the box we used earlier to
describe a 2^2 design, with factor A on the
horizontal axis and factor B on the vertical
axis, each running from -1 to 1.
At each corner, we have a point of the design,
for example, (A-,B-), (A-,B+), (A+,B-), and
(A+,B+).
