Title: Analysis of Variance
1 Chapter 15
- Analysis of Variance (ANOVA)
2 Analysis of Variance
- Analysis of variance is a technique that allows us to compare two or more populations of interval data.
- Analysis of variance is
  - an extremely powerful and widely used procedure,
  - a procedure that determines whether differences exist between population means, and
  - a procedure that works by analyzing sample variance.
3 One-Way Analysis of Variance
- Independent samples are drawn from k populations.
- Note: These populations are referred to as treatments.
- It is not a requirement that n1 = n2 = ... = nk.
4 Table 15.01: Notation for the One-Way Analysis of Variance
5 Notation
Independent samples are drawn from k populations (treatments):
- Treatment 1: x11, x21, . . . , x_{n1,1}
- Treatment 2: x12, x22, . . . , x_{n2,2}
- ...
- Treatment k: x1k, x2k, . . . , x_{nk,k}
Each sample j has a sample size nj and a sample mean.
X is the response variable; the variable's values are called responses.
6 One-Way Analysis of Variance
- New Terminology
- x is the response variable, and its values are responses.
- xij refers to the i-th observation in the j-th sample.
- E.g. x35 is the third observation of the fifth sample.
- The mean of the j-th sample is
  $\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$
  where $n_j$ is the number of observations in the sample taken from the j-th population.
7 One-Way Analysis of Variance
The grand mean, $\bar{\bar{x}}$, is the mean of all the observations:
  $\bar{\bar{x}} = \frac{1}{n}\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}$
where n = n1 + n2 + ... + nk and k is the number of populations.
8 One-Way Analysis of Variance
- More New Terminology
- The unit that we measure is called an experimental unit.
- The population classification criterion is called a factor.
- Each population is a factor level.
9 Example 15.1
- An apple juice company has a new product featuring
  - more convenience,
  - similar or better quality, and
  - lower price
  when compared with existing juice products.
- Which factor should an advertising campaign focus on?
- Before going national, test markets are set up in three cities, each with its own campaign, and data are recorded.
- Do differences in sales exist between the test markets?
10 Data (Xm15-01)
Weekly sales over 20 weeks in each test market:
City 1 (Convenience): 529 658 793 514 663 719 711 606 461 529 498 663 604 495 485 557 353 557 542 614
City 2 (Quality): 804 630 774 717 679 604 620 697 706 615 492 719 787 699 572 523 584 634 580 624
City 3 (Price): 672 531 443 596 602 502 659 689 675 512 691 733 698 776 561 572 469 581 679 532
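A minimal Python sketch (an addition of these notes, not part of the original slides): the Xm15-01 data keyed in from the table above, with the sample means and grand mean that the following slides rely on. The array names are our own.

```python
import numpy as np

# Weekly sales from the table above (Xm15-01).
city1 = np.array([529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                  498, 663, 604, 495, 485, 557, 353, 557, 542, 614])  # convenience
city2 = np.array([804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                  492, 719, 787, 699, 572, 523, 584, 634, 580, 624])  # quality
city3 = np.array([672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                  691, 733, 698, 776, 561, 572, 469, 581, 679, 532])  # price

samples = [city1, city2, city3]
means = [s.mean() for s in samples]        # sample means: 577.55, 653.00, 608.65
grand = np.concatenate(samples).mean()     # grand mean: about 613.07
print(means, grand)
```

The later sketches in these notes reuse `samples`, `means`, and `grand`.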
11 Example 15.1
Terminology
- x is the response variable, and its values are responses.
  - Weekly sales is the response variable; the actual sales figures are the responses in this example.
- xij refers to the i-th observation in the j-th sample.
  - E.g. x42 is the fourth week's sales in city 2: 717 pkgs.
  - x20,3 is the last week of sales for city 3: 532 pkgs. (comma added for clarity)
12 Example 15.1
Terminology
- The unit that we measure is called an experimental unit.
  - Here the experimental unit is a week in one of the three cities, and the response variable is weekly sales.
- The population classification criterion is called a factor.
  - The advertising strategy is the factor we're interested in. It is the only factor under consideration (hence the term one-way analysis of variance).
- Each population is a factor level.
  - In this example, there are three factor levels: convenience, quality, and price.
13 Terminology
In the context of this problem:
- Response variable: weekly sales
- Responses: actual sales values
- Experimental unit: the weeks in the three cities when we record sales figures
- Factor: the criterion by which we classify the populations (the treatments); in this problem the factor is the marketing strategy
- Factor levels: the population (treatment) names; in this problem the factor levels are the marketing strategies
14 Example 15.1
IDENTIFY
- The null hypothesis in this case is
  H0: µ1 = µ2 = µ3
  i.e. there are no differences between the population means.
- Our alternative hypothesis becomes
  H1: at least two means differ
- Now we need a test statistic.
15 The Rationale of the Test Statistic
Two types of variability are employed when testing for the equality of the population means.
16 Graphical Demonstration: Employing Two Types of Variability
17 [Figure: dot plots of Treatments 1-3 under large and small within-sample variability]
The sample means are the same in both cases, but larger within-sample variability makes it harder to draw a conclusion about the population means, while small variability within the samples makes it easier to draw a conclusion about the population means.
18 The Rationale Behind the Test Statistic I
- If the null hypothesis is true, we would expect all the sample means to be close to one another (and, as a result, close to the grand mean).
- If the alternative hypothesis is true, at least some of the sample means would differ.
- Thus, we measure the variability between sample means.
19 Variability Between Sample Means
- The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean.
- This sum is called the Sum of Squares for Treatments (SST).
In our example, treatments are represented by the different advertising strategies.
20 Sum of Squares for Treatments (SST)
  $SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$
where there are k treatments, $\bar{x}_j$ is the mean of sample j, and $n_j$ is the size of sample j.
Note: When the sample means are close to one another, their distances from the grand mean are small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.
21 Test Statistics
- Since µ1 = µ2 = µ3 is of interest to us, a statistic that measures the proximity of the sample means to each other would also be of interest.
- Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for sum of squares for treatments. It is calculated as
  $SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$
  (a sum across the k treatments of squared distances between each sample mean and the grand mean).
A large SST indicates large variation between sample means, which supports H1.
22 Example 15.1
COMPUTE
- If it were the case that $\bar{x}_1 = \bar{x}_2 = \bar{x}_3$, then SST = 0 and our null hypothesis, H0: µ1 = µ2 = µ3, would be supported.
- More generally, a small value of SST supports the null hypothesis. The question is, how small is small enough?
23 Example 15.1
COMPUTE
- The following sample statistics and grand mean were computed:
  $\bar{x}_1 = 577.55$, $\bar{x}_2 = 653.00$, $\bar{x}_3 = 608.65$, $\bar{\bar{x}} = 613.07$
- Hence, the between-treatments variation, the sum of squares for treatments, is
  SST = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)² = 57,512.23
- Is SST = 57,512.23 large enough to indicate that the population means differ?
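A short sketch (our addition) computing SST from its definition, reusing the arrays defined in the sketch after slide 10:

```python
# SST: sum over treatments of n_j * (sample mean - grand mean)^2.
sst = sum(len(s) * (s.mean() - grand) ** 2 for s in samples)
print(round(sst, 2))  # about 57512.23, matching the slide
```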
24 The Rationale Behind the Test Statistic II
- Large variability within the samples weakens the ability of the sample means to represent their corresponding population means.
- Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the within-samples variability.
25 Within-Samples Variability
- The variability within samples is measured by adding all the squared distances between observations and their sample means.
- This sum is called the Sum of Squares for Error (SSE).
In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).
26 Test Statistics
- SST gave us the between-treatments variation. A second statistic, SSE (Sum of Squares for Error), measures the within-treatments variation.
- SSE is given by
  $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$
  or, equivalently,
  $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$
- In the second formulation, it is easier to see that it provides a measure of the amount of variation we can expect from the random variable we've observed.
27 Example 15.1
COMPUTE
- We calculate the sample variances as
  $s_1^2 \approx 10{,}775.0$, $s_2^2 \approx 7{,}238.1$, $s_3^2 \approx 8{,}670.2$
and from these calculate the within-treatments variation (the sum of squares for error) as
  SSE = (20 − 1)s1² + (20 − 1)s2² + (20 − 1)s3² = 506,983.50
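A companion sketch (our addition) for SSE, again reusing the earlier arrays:

```python
# SSE: pooled within-sample variation, (n_j - 1) * s_j^2 summed over samples.
variances = [s.var(ddof=1) for s in samples]   # about 10775.0, 7238.1, 8670.2
sse = sum((len(s) - 1) * v for s, v in zip(samples, variances))
print(round(sse, 2))  # about 506983.50, matching the slide
```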
28 Sum of Squares for Error (SSE)
Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal? We still need a couple more quantities in order to relate SST and SSE in a meaningful way.
29 Mean Squares
- The mean square for treatments (MST) is given by
  MST = SST / (k − 1)
- The mean square for error (MSE) is given by
  MSE = SSE / (n − k)
- The test statistic
  F = MST / MSE
  is F-distributed with k − 1 and n − k degrees of freedom.
Here ν1 = 3 − 1 = 2 and ν2 = 60 − 3 = 57.
30 Example 15.1
COMPUTE
- We can calculate the mean square for treatments and the mean square for error as
  MST = 57,512.23 / 2 = 28,756.12
  MSE = 506,983.50 / 57 = 8,894.45
31 Example 15.1
COMPUTE
Giving us our F-statistic of
  F = MST / MSE = 28,756.12 / 8,894.45 = 3.23
Does F = 3.23 fall into a rejection region or not? How does it compare to a critical value of F?
Note these required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
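A sketch (our addition) of the mean squares and the F statistic, with the critical value taken from scipy.stats.f rather than a printed table; `sst` and `sse` come from the earlier sketches:

```python
from scipy import stats

k, n = 3, 60
mst = sst / (k - 1)                       # about 28756.12
mse = sse / (n - k)                       # about 8894.45
F = mst / mse                             # about 3.23
f_crit = stats.f.ppf(0.95, k - 1, n - k)  # about 3.16 (the slides' table value is 3.15)
print(F, f_crit)
```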
32 Example 15.1
INTERPRET
- Since the purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis, if SST is large, F will be large.
- Hence our rejection region is F > F_{α, k−1, n−k}.
- Our value for FCritical is F_{.05, 2, 57} ≈ 3.15.
33 Example 15.1
INTERPRET
- Since F = 3.23 is greater than FCritical = 3.15, we reject the null hypothesis (H0: µ1 = µ2 = µ3) in favor of the alternative hypothesis (H1: at least two population means differ).
- That is, there is enough evidence to infer that the mean weekly sales differ between the three cities.
- Stated another way: we are quite confident that the strategy used to advertise the product will produce different sales figures.
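The whole test can also be cross-checked with SciPy's built-in one-way ANOVA (our addition; the slides themselves use SPSS):

```python
from scipy.stats import f_oneway

F, p = f_oneway(city1, city2, city3)  # city arrays from the earlier sketch
print(F, p)  # F about 3.23, p about .047 < .05, so reject H0
```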
34-35 Summary of Techniques (so far)
36 ANOVA Table
- The results of an analysis of variance are usually reported in an ANOVA table:

Source of Variation | d.f. | Sum of Squares | Mean Square
Treatments | k − 1 | SST | MST = SST/(k − 1)
Error | n − k | SSE | MSE = SSE/(n − k)
Total | n − 1 | SS(Total) |

F-statistic = MST/MSE
37 Table 15.2: ANOVA Table for the One-Way Analysis of Variance
38 Table 15.3: ANOVA Table for Example 15.1
39 SPSS Output
40-41 Checking Required Conditions
Figure 15.3a: Histogram of Sales, City 1 (Convenience)
42 Figure 15.3b: Histogram of Sales, City 2 (Quality)
43 Figure 15.3c: Histogram of Sales, City 3 (Price)
44 Can We Use a t-Test Instead of ANOVA?
- We can't, for two reasons:
- We would need to perform many more calculations. With six samples there are C(6,2) = (6 × 5)/2 = 15 pairs to test.
- It would increase the probability of making a Type I error from 5% to about 54%.
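The 54% figure follows from the usual independence approximation: running 15 tests, each at α = .05,
  P(at least one Type I error) = 1 − (1 − .05)^15 = 1 − .95^15 ≈ 1 − .46 = .54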
45 Relationship Between t and F Statistics
For two populations, the F statistic is exactly the square of the t statistic: F = t².
Hence we will draw exactly the same conclusion using analysis of variance as we did when we applied the t-test of µ1 − µ2.
46 Identifying Factors
- Factors that identify the one-way analysis of variance
47 Analysis of Variance: Experimental Designs
- Experimental design is one of the factors that determines which technique we use.
- In the previous example we compared three populations on the basis of one factor: advertising strategy.
- One-way analysis of variance is only one of many different experimental designs of the analysis of variance.
48 Analysis of Variance: Experimental Designs
- A multifactor experiment is one where two or more factors define the treatments.
- For example, if instead of just varying the advertising strategy for our new apple juice product we also vary the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance situation.
- The first factor, advertising strategy, still has three levels (convenience, quality, and price), while the second factor, advertising medium, has two levels (TV or print).
49 [Figure: one-way ANOVA vs. two-way ANOVA]
One-way ANOVA (single factor): the response is plotted against Treatments 1-3 (levels 1-3).
Two-way ANOVA (two factors): the response is plotted against Factor A (levels 1-3) crossed with Factor B (levels 1-2).
50 Independent Samples and Blocks
- Similar to the matched pairs experiment, a randomized block design experiment reduces the variation within the samples, making it easier to detect differences between populations.
- The term block refers to a matched group of observations from each population.
- We can also perform a blocked experiment by using the same subject for each treatment in a "repeated measures" experiment.
51 Independent Samples and Blocks
- The randomized block experiment is also called the two-way analysis of variance, not to be confused with the two-factor analysis of variance. To illustrate where we're headed, we'll do this first.
52 Models of Fixed and Random Effects
- Fixed effects
  - If all possible levels of a factor are included in our analysis, we have a fixed-effects ANOVA.
  - The conclusions of a fixed-effects ANOVA apply only to the levels studied.
- Random effects
  - If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effects ANOVA.
  - The conclusions of a random-effects ANOVA apply to all the levels (not only those studied).
53 Models of Fixed and Random Effects
- In some ANOVA models the test statistic of the fixed-effects case may differ from the test statistic of the random-effects case.
- Fixed and random effects: examples
  - Fixed effects: in the advertisement example (15.1), all the levels of the marketing strategies were included.
  - Random effects: to determine whether there is a difference in the production rate of 50 machines, four machines are randomly selected and their production recorded.
54 Randomized Block Analysis of Variance
- The purpose of designing a randomized block experiment is to reduce the within-treatments variation so as to more easily detect differences between the treatment means.
- In this design, we partition the total variation into three sources of variation:
  SS(Total) = SST + SSB + SSE
- where SSB, the sum of squares for blocks, measures the variation between the blocks.
55 Randomized Blocks
[Figure: observations arranged by Treatments 1-4 and Blocks 1-3]
A block contains all the observations with some commonality across treatments.
56 Randomized Blocks
- In addition to k treatments, we introduce notation for b blocks in our experimental design:
  $\bar{x}[B]_i$ = mean of the observations in the i-th block (e.g. $\bar{x}[B]_1$ for the 1st block)
  $\bar{x}[T]_j$ = mean of the observations of the j-th treatment (e.g. $\bar{x}[T]_2$ for the 2nd treatment)
57 Sum of Squares: Randomized Block
- Squaring the distances from the grand mean leads to the following set of formulae:
  $SS(\text{Total}) = \sum_{j=1}^{k}\sum_{i=1}^{b}(x_{ij} - \bar{\bar{x}})^2$
  $SST = b\sum_{j=1}^{k}(\bar{x}[T]_j - \bar{\bar{x}})^2$
  $SSB = k\sum_{i=1}^{b}(\bar{x}[B]_i - \bar{\bar{x}})^2$
  $SSE = SS(\text{Total}) - SST - SSB$
- Test statistic for treatments: F = MST / MSE
- Test statistic for blocks: F = MSB / MSE
58 ANOVA Table
- We can summarize this new information in an analysis of variance (ANOVA) table for the randomized block analysis of variance as follows:

Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Treatments | k − 1 | SST | MST = SST/(k − 1) | F = MST/MSE
Blocks | b − 1 | SSB | MSB = SSB/(b − 1) | F = MSB/MSE
Error | n − k − b + 1 | SSE | MSE = SSE/(n − k − b + 1) |
Total | n − 1 | SS(Total) | |
59 Test Statistics and Rejection Regions
60 Example 15.2
IDENTIFY
- Are there differences in the effectiveness of four new cholesterol drugs? 25 groups of men were matched according to age and weight, and the results were recorded.
- The hypotheses to test in this case are
  H0: µ1 = µ2 = µ3 = µ4
  H1: At least two means differ
61 Example 15.2: The Data
(The raw scores are shown in readable form in the table on slide 63.)
62 Example 15.2
IDENTIFY
- Each of the four drugs can be considered a treatment.
- Each group can be treated as a block, because its members are matched by age and weight.
- By setting up the experiment this way, we eliminate the variability in cholesterol reduction related to different combinations of age and weight. This helps detect differences in the mean cholesterol reduction attributed to the different drugs.
63 Example 15.2
The Data
Group Drug 1 Drug 2 Drug 3 Drug 4 Group Drug 1 Drug 2 Drug 3 Drug 4
1 6.6 12.6 2.7 8.7 14 19.6 17.0 19.2 21.9
2 7.1 3.5 2.4 9.3 15 20.7 21.0 18.7 22.1
3 7.5 4.4 6.5 10.0 16 18.4 27.2 18.9 19.4
4 9.9 7.5 16.2 12.6 17 21.5 26.8 7.9 25.4
5 13.8 6.4 8.3 10.6 18 20.4 28.0 23.8 26.5
6 13.9 13.5 5.4 15.4 19 21.9 31.7 8.8 22.2
7 15.9 16.9 15.4 16.3 20 22.5 11.9 26.7 23.5
8 14.3 11.4 17.1 18.9 21 21.5 28.7 25.2 19.6
9 16.0 16.9 7.7 13.7 22 25.2 29.5 27.3 30.1
10 16.3 14.8 16.1 19.4 23 23.0 22.2 17.6 26.6
11 14.6 18.6 9.0 18.5 24 23.7 19.5 25.6 24.5
12 18.7 21.2 24.3 21.1 25 28.4 31.2 26.1 27.4
13 17.3 10.0 9.3 19.3
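A sketch (our addition) of the randomized block computation for Example 15.2, keyed in from the table above and following the partition SS(Total) = SST + SSB + SSE from slide 54:

```python
import numpy as np
from scipy import stats

# Rows = blocks (matched groups of men), columns = treatments (Drugs 1-4).
data = np.array([
    [ 6.6, 12.6,  2.7,  8.7], [ 7.1,  3.5,  2.4,  9.3], [ 7.5,  4.4,  6.5, 10.0],
    [ 9.9,  7.5, 16.2, 12.6], [13.8,  6.4,  8.3, 10.6], [13.9, 13.5,  5.4, 15.4],
    [15.9, 16.9, 15.4, 16.3], [14.3, 11.4, 17.1, 18.9], [16.0, 16.9,  7.7, 13.7],
    [16.3, 14.8, 16.1, 19.4], [14.6, 18.6,  9.0, 18.5], [18.7, 21.2, 24.3, 21.1],
    [17.3, 10.0,  9.3, 19.3], [19.6, 17.0, 19.2, 21.9], [20.7, 21.0, 18.7, 22.1],
    [18.4, 27.2, 18.9, 19.4], [21.5, 26.8,  7.9, 25.4], [20.4, 28.0, 23.8, 26.5],
    [21.9, 31.7,  8.8, 22.2], [22.5, 11.9, 26.7, 23.5], [21.5, 28.7, 25.2, 19.6],
    [25.2, 29.5, 27.3, 30.1], [23.0, 22.2, 17.6, 26.6], [23.7, 19.5, 25.6, 24.5],
    [28.4, 31.2, 26.1, 27.4],
])
b, k = data.shape
n = b * k
grand = data.mean()
sst = b * ((data.mean(axis=0) - grand) ** 2).sum()  # between treatments
ssb = k * ((data.mean(axis=1) - grand) ** 2).sum()  # between blocks
sse = ((data - grand) ** 2).sum() - sst - ssb       # what remains is error
mst, msb, mse = sst / (k - 1), ssb / (b - 1), sse / (n - k - b + 1)
p_drugs  = stats.f.sf(mst / mse, k - 1, n - k - b + 1)  # about .009 (slide 66)
p_blocks = stats.f.sf(msb / mse, b - 1, n - k - b + 1)  # essentially 0
print(p_drugs, p_blocks)
```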
64-65 SPSS Output
(Annotations: the Blocks row shows b − 1 degrees of freedom and MSB; the Treatments row shows k − 1 degrees of freedom and MST.)
66 The p-value used to determine whether differences exist between the four drugs (treatments) is .009. Thus we reject H0 in favor of the research hypothesis: at least two means differ. The p-value for groups (approximately 0) indicates that there are differences between the groups of men (blocks); that is, age and weight have an impact, but our experimental design accounts for that.
67-69 Identifying Factors
- Factors that identify the randomized block design of the analysis of variance
70 Two-Factor Analysis of Variance
- The original set-up for Example 15.1 examined one factor, namely the effect of the marketing strategy on sales:
  - emphasis on convenience,
  - emphasis on quality, or
  - emphasis on price.
- Suppose we introduce a second factor, the effect of the selected medium on sales, that is:
  - advertise on television, or
  - advertise in newspapers.
- To which factor(s), or to the interaction of factors, can we attribute any differences in mean sales of apple juice?
71 More Terminology
- A complete factorial experiment is an experiment in which the data for all possible combinations of the levels of the factors are gathered. This is also known as a two-way classification.
- The two factors are usually labeled A and B, with the number of levels of each factor denoted by a and b respectively.
- The number of observations for each combination is called a replicate, and is denoted by r. For our purposes, the number of replicates will be the same for each treatment; that is, they are balanced.
72 Example 15.3: Test Marketing of Advertising Strategies and Advertising Media
- Strategy: convenience, quality, or price
- Media: television or newspaper
- City 1: Convenience - Television
- City 2: Convenience - Newspaper
- City 3: Quality - Television
- City 4: Quality - Newspaper
- City 5: Price - Television
- City 6: Price - Newspaper
73 Sales Data
Weekly sales in the six test cities (10 weeks each):
C-1 (Convenience/Television): 491 712 558 447 479 624 546 444 582 672
C-2 (Convenience/Newspaper): 464 559 759 557 528 670 534 657 557 474
C-3 (Quality/Television): 677 627 590 632 683 760 690 548 579 644
C-4 (Quality/Newspaper): 689 650 704 652 576 836 628 798 497 841
C-5 (Price/Television): 575 614 706 484 478 650 583 536 579 795
C-6 (Price/Newspaper): 803 584 525 498 812 565 708 546 616 587
74 The Data
Factor A: Strategy (Convenience, Quality, Price); Factor B: Medium (Television, Newspaper)

Television - Convenience: 491 712 558 447 479 624 546 444 582 672
Television - Quality: 677 627 590 632 683 760 690 548 579 644
Television - Price: 575 614 706 484 478 650 583 536 579 795
Newspaper - Convenience: 464 559 759 557 528 670 534 657 557 474
Newspaper - Quality: 689 650 704 652 576 836 628 798 497 841
Newspaper - Price: 803 584 525 498 812 565 708 546 616 587
75 Example 15.3
The Data
Factor A: Strategy; Factor B: Medium
There are a = 3 levels of factor A and b = 2 levels of factor B, yielding 3 × 2 = 6 treatments (factor-level combinations); each combination has r = 10 replicates.
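A sketch (our addition) of the two-factor ANOVA using statsmodels' ols and anova_lm (the slides themselves use SPSS); the data are keyed in from slide 74:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

cells = {  # (strategy, medium): 10 weekly sales figures from slide 74
    ("convenience", "television"): [491, 712, 558, 447, 479, 624, 546, 444, 582, 672],
    ("quality",     "television"): [677, 627, 590, 632, 683, 760, 690, 548, 579, 644],
    ("price",       "television"): [575, 614, 706, 484, 478, 650, 583, 536, 579, 795],
    ("convenience", "newspaper"):  [464, 559, 759, 557, 528, 670, 534, 657, 557, 474],
    ("quality",     "newspaper"):  [689, 650, 704, 652, 576, 836, 628, 798, 497, 841],
    ("price",       "newspaper"):  [803, 584, 525, 498, 812, 565, 708, 546, 616, 587],
}
rows = [(s, m, y) for (s, m), ys in cells.items() for y in ys]
df = pd.DataFrame(rows, columns=["strategy", "medium", "sales"])

# Full factorial model: main effects for A and B plus the A x B interaction.
model = ols("sales ~ C(strategy) * C(medium)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```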
76 Possible Outcomes (Fig. 15.5)
- This figure illustrates the case where there are differences between the levels of A, but no differences between the levels of B and no interaction between A and B.
77 Possible Outcomes (Fig. 15.6)
- This figure illustrates the case where there are differences between the levels of B, but no differences between the levels of A and no interaction between A and B.
78 Possible Outcomes (Fig. 15.4)
- This figure illustrates the case where there are differences between the levels of A and differences between the levels of B, but no interaction between A and B (i.e. the factors affect sales independently).
79 Possible Outcomes (Fig. 15.7)
- This figure shows the levels of A and B interacting.
80 ANOVA Table (Table 15.8)

Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Factor A | a − 1 | SS(A) | MS(A) = SS(A)/(a − 1) | F = MS(A)/MSE
Factor B | b − 1 | SS(B) | MS(B) = SS(B)/(b − 1) | F = MS(B)/MSE
Interaction | (a − 1)(b − 1) | SS(AB) | MS(AB) = SS(AB)/[(a − 1)(b − 1)] | F = MS(AB)/MSE
Error | n − ab | SSE | MSE = SSE/(n − ab) |
Total | n − 1 | SS(Total) | |
81 Two-Factor ANOVA
- Test for differences between the levels of Factor A:
  H0: The means of the a levels of Factor A are equal
  H1: At least two means differ
  Test statistic: F = MS(A) / MSE
- Example 15.3: Are there differences in mean sales caused by different marketing strategies?
  H0: µconvenience = µquality = µprice
  H1: At least two means differ
82 Two-Factor ANOVA
- Test for differences between the levels of Factor B:
  H0: The means of the b levels of Factor B are equal
  H1: At least two means differ
  Test statistic: F = MS(B) / MSE
- Example 15.3: Are there differences in mean sales caused by different advertising media?
  H0: µtelevision = µnewspaper
  H1: At least two means differ
83 Two-Factor ANOVA
- Test for interaction between Factors A and B:
  H0: Factors A and B do not interact to affect the mean responses
  H1: Factors A and B do interact to affect the mean responses
  Test statistic: F = MS(AB) / MSE
- Example 15.3: Are there differences in mean sales caused by interaction between marketing strategy and advertising medium?
84 COMPUTE
SPSS Output
(Annotated rows: Factor A - marketing strategy; Factor B - media; the A×B interaction; error.)
85 Example 15.3
INTERPRET
There is evidence at the 5% significance level to infer that differences in weekly sales exist between the different marketing strategies (Factor A).
86 Example 15.3
INTERPRET
There is insufficient evidence at the 5% significance level to infer that differences in weekly sales exist between television and newspaper advertising (Factor B).
87 Example 15.3
INTERPRET
There is not enough evidence to conclude that there is an interaction between marketing strategy and advertising medium that affects mean weekly sales (interaction of Factor A and Factor B).
88-89 See for Yourself
- There are differences between the levels of factor A, no differences between the levels of factor B, and no interaction is apparent.
90 See for Yourself
INTERPRET
- These results indicate that emphasizing quality produces the highest sales and that television and newspapers are equally effective.
91-92 Identifying Factors
- Independent samples: two-factor analysis of variance
93 Multiple Comparisons
- When we conclude from the one-way analysis of variance that at least two treatment means differ (i.e. we reject the null hypothesis H0: µ1 = µ2 = ... = µk), we often need to know which treatment means are responsible for these differences.
- We will examine three statistical inference procedures that allow us to determine which population means differ:
  - Fisher's least significant difference (LSD) method,
  - the Bonferroni adjustment, and
  - Tukey's multiple comparison method.
94 Multiple Comparisons
- Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The general case for this is:
  IF $|\bar{x}_i - \bar{x}_j| > \text{NCritical}$
  THEN we conclude that µi and µj differ.
- The larger sample mean is then believed to be associated with a larger population mean.
95 Fisher's Least Significant Difference
- What is this critical number, NCritical?
- One measure is the Least Significant Difference, given by
  $LSD = t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
  with n − k degrees of freedom.
- LSD will be the same for all pairs of means if all k sample sizes are equal. If some sample sizes differ, LSD must be calculated for each combination.
96 Back to Example 15.1
- With k = 3 treatments (marketing strategy based on convenience, quality, or price), we perform three comparisons based on the sample means:
  |x̄1 − x̄2| = |577.55 − 653.00| = 75.45
  |x̄1 − x̄3| = |577.55 − 608.65| = 31.10
  |x̄2 − x̄3| = |653.00 − 608.65| = 44.35
We compare these to the Least Significant Difference, calculated (at 5% significance) as
  LSD = t_{.025, 57} √(8,894.45 (1/20 + 1/20)) ≈ 59.72
97 Example 15.1: Fisher's LSD
Only |x̄1 − x̄2| = 75.45 exceeds the LSD, so we conclude that only the means for convenience and quality differ.
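A sketch (our addition) of the LSD comparisons, reusing `means` and `mse` from the earlier sketches:

```python
from itertools import combinations
from scipy import stats

t_crit = stats.t.ppf(1 - 0.05 / 2, 57)           # t_{.025} with n - k = 57 d.f.
lsd = t_crit * (mse * (1 / 20 + 1 / 20)) ** 0.5  # about 59.72
for (i, xi), (j, xj) in combinations(enumerate(means, start=1), 2):
    print(i, j, round(abs(xi - xj), 2), abs(xi - xj) > lsd)
# Only cities 1 and 2 (convenience vs. quality) differ: 75.45 > lsd.
```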
98 Bonferroni Adjustment to the LSD Method
- Fisher's method may result in an increased probability of committing a Type I error.
- We can adjust Fisher's LSD calculation by using the Bonferroni adjustment.
- Where we used alpha (α), say .05, previously, we now use an adjusted value:
  α* = α / C, where C = k(k − 1)/2 is the number of pairwise comparisons.
99 Example 15.1: Bonferroni Adjustment
- Since we have k = 3 treatments, C = k(k − 1)/2 = 3(2)/2 = 3; hence we set our new alpha value to
  α* = .05 / 3 = .0167
- Thus, instead of using t.05/2 in our LSD calculation, we use t.0167/2.
100 Bonferroni
(SPSS output: a similar result as before, but with a different standard error and significance level.)
101 Tukey's Multiple Comparison Method
- As before, we are looking for a critical number to compare the differences of the sample means against. In this case:
  $\omega = q_{\alpha}(k, \nu)\sqrt{\frac{MSE}{n_g}}$
- Note: ω is a lowercase omega, not a w.
- $n_g$ is the harmonic mean of the sample sizes.
- $q_{\alpha}(k, \nu)$ is the critical value of the Studentized range with ν = n − k degrees of freedom (Table 7, Appendix B).
102 Example 15.1: Tukey's Method
(SPSS output: a similar result as before, but with a different standard error and significance level.)
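For comparison, SciPy ships a Tukey HSD routine (scipy.stats.tukey_hsd, available in SciPy 1.8 and later); a sketch, our addition, reusing the city arrays:

```python
from scipy.stats import tukey_hsd

# Pairwise comparisons of the three city samples with Tukey's method.
res = tukey_hsd(city1, city2, city3)
print(res)  # pairwise statistics, p-values, and confidence intervals
```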
103 Which Method to Use?
- In Example 15.1, all three multiple comparison methods yielded the same results. This will not always be the case! Generally speaking:
- If you have identified two or three pairwise comparisons that you wish to make before conducting the analysis of variance, use the Bonferroni method.
- If you plan to compare all possible combinations, use Tukey's comparison method.
104 Nonparametric Tests for Two or More Populations
105 Kruskal-Wallis Test
- So far we've been comparing the locations of two populations; now we'll look at comparing two or more populations.
- The Kruskal-Wallis test is applied to problems where we want to compare two or more populations of ordinal or interval (but nonnormal) data from independent samples.
- Our hypotheses will be:
  H0: The locations of all k populations are the same.
  H1: At least two population locations differ.
106 Test Statistic
- In order to calculate the Kruskal-Wallis test statistic, we need to:
- Rank all the observations from smallest (1) to largest (n), averaging the ranks in the case of ties.
- Calculate the rank sums for each sample: T1, T2, ..., Tk.
- Lastly, calculate the test statistic (denoted H):
  $H = \frac{12}{n(n+1)}\sum_{j=1}^{k}\frac{T_j^2}{n_j} - 3(n+1)$
107 Sampling Distribution of the Test Statistic
- For sample sizes greater than or equal to 5, the test statistic H is approximately chi-squared distributed with k − 1 degrees of freedom.
- Our rejection region is H > χ²_{α, k−1}
- And our p-value is P(χ² > H)
109Example 21.5
IDENTIFY
- Can we compare customer ratings (4good 1poor)
for speed of service across three shifts in a
fast food restaurant? Our hypotheses will be - H0 The locations of all 3 populations are
the same. - (that is, there is no difference in service
between shifts), - and
- H1 At least two population locations
differ. - Customer ratings for service were recorded
110 Example 21.5
- 10 customers were selected at random from each shift:

4:00 P.M. to midnight: 4 4 3 4 3 3 3 3 2 3
Midnight to 8:00 A.M.: 3 4 2 2 3 4 3 3 2 3
8:00 A.M. to 4:00 P.M.: 3 1 3 2 1 3 4 2 4 1
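A sketch (our addition) that reproduces the hand calculation on the next slides and cross-checks it against SciPy. Note that scipy.stats.kruskal applies a tie correction, so its H can differ slightly from the uncorrected 2.64:

```python
from scipy import stats

shift1 = [4, 4, 3, 4, 3, 3, 3, 3, 2, 3]   # 4:00 P.M. to midnight
shift2 = [3, 4, 2, 2, 3, 4, 3, 3, 2, 3]   # midnight to 8:00 A.M.
shift3 = [3, 1, 3, 2, 1, 3, 4, 2, 4, 1]   # 8:00 A.M. to 4:00 P.M.

ranks = stats.rankdata(shift1 + shift2 + shift3)          # average ranks for ties
T = [ranks[i * 10:(i + 1) * 10].sum() for i in range(3)]  # rank sums per shift
n = 30
H = 12 / (n * (n + 1)) * sum(t ** 2 / 10 for t in T) - 3 * (n + 1)
print(H)                                   # about 2.64 (no tie correction)
print(stats.kruskal(shift1, shift2, shift3))
```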
111 Example 21.5
COMPUTE
- One way to solve the problem is to take the original data, stack it, sort by customer response, and then rank from bottom to top.
112 Example 21.5
COMPUTE
- Once it's in stacked format, assign straight rankings from 1 to 30, average the rankings for tied responses, then parse them out by shift to come up with the rank sum totals.
113 Example 21.5
COMPUTE
  H = 2.64
Our critical value of chi-squared (5% significance and k − 1 = 2 degrees of freedom) is 5.99147; hence there is not enough evidence to reject H0.
114 Example 21.5
INTERPRET
- There is not enough evidence to infer that a difference in speed of service exists between the three shifts, i.e. all three of the shifts are equally rated, and any action to improve service should be applied to all three shifts.
115 Example 21.5
COMPUTE
(Same conclusion, reached by comparing the p-value to α.)
116 SPSS Output
(Same conclusion as above.)
117 Identifying Factors
- Factors that identify the Kruskal-Wallis test
118 Friedman Test
- The Friedman test is a technique used to compare two or more populations of ordinal or interval (nonnormal) data that are generated from a matched pairs experiment.
- The hypotheses are the same as before:
  H0: The locations of all k populations are the same.
  H1: At least two population locations differ.
119 Friedman Test: Test Statistic
- Since this is a matched pairs experiment, we first rank each observation within each of the b blocks from smallest to largest (i.e. from 1 to k), averaging any ties. We then compute the rank sums T1, T2, ..., Tk. Then we calculate our test statistic:
  $F_r = \frac{12}{bk(k+1)}\sum_{j=1}^{k}T_j^2 - 3b(k+1)$
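A sketch (our addition) implementing this formula directly; the two-block demo matrix is hypothetical, not the slides' applicant data (which these slides do not reproduce in full):

```python
import numpy as np
from scipy import stats

def friedman_fr(scores):
    # scores: b blocks (rows) by k treatments (columns); returns Fr.
    b, k = scores.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, scores)  # rank within block
    T = ranks.sum(axis=0)                                   # rank sums T1..Tk
    return 12 / (b * k * (k + 1)) * (T ** 2).sum() - 3 * b * (k + 1)

demo = np.array([[4, 2, 3, 2],    # hypothetical block (candidate-2 style scores)
                 [3, 1, 4, 2]])   # hypothetical block
print(friedman_fr(demo))
# SciPy's built-in adds a tie correction, so it can differ when blocks have ties:
print(stats.friedmanchisquare(*demo.T))
```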
120 Friedman Test: Test Statistic
- This test statistic is approximately chi-squared distributed with k − 1 degrees of freedom (provided either k or b is at least 5). Our rejection region and p-value follow.
121 Sampling Distribution of the Test Statistic
The test statistic is approximately chi-squared distributed with k − 1 degrees of freedom provided either k or b is greater than or equal to 5. The rejection region is Fr > χ²_{α, k−1} and the p-value is P(χ² > Fr). The figure on the next slide depicts the sampling distribution and p-value.
122 Figure 21.11 Sampling Distribution of Fr
123 Example 21.6
IDENTIFY
- Four managers evaluate and score job applicants on a scale from 1 (good) to 5 (not so good). There have been complaints that the process isn't fair. Is it the case that all managers score the candidates equally or not? That is:
124 Example 21.6
IDENTIFY
- H0: The locations of all 4 populations are the same.
  (i.e. all managers score like candidates alike)
- H1: At least two population locations differ.
  (i.e. there is some disagreement between managers on scores)
- The rejection region is
  Fr > χ²_{α, k−1} = χ²_{.05, 3} = 7.81473
125 Example 21.6
COMPUTE
There are k = 4 populations (managers) and b = 8 blocks (applicants) in this set-up.
126 Example 21.6
COMPUTE
Applicant 1, for example, received a top score from manager v and next-to-top scores from the other three. Applicant 7 received a top score from manager v as well, but the other three scored this candidate very low.
127 Example 21.6
COMPUTE
- Rank each observation within its block from smallest to largest (i.e. from 1 to k), averaging any ties. For example, consider the case of candidate 2:

Manager | u | v | w | x | checksum
Original scores | 4 | 2 | 3 | 2 |
Straight ranking | 4 | 1 | 3 | 2 | 10
Averaged ranking | 4 | (1+2)/2 = 1.5 | 3 | (1+2)/2 = 1.5 | 10

(The checksum is 1 + 2 + ... + k = k(k+1)/2 = 10 for k = 4.)
128 Example 21.6
COMPUTE
- Compute the rank sums T1, T2, ..., Tk and our test statistic:
  $F_r = \frac{12}{bk(k+1)}\sum_{j=1}^{k}T_j^2 - 3b(k+1) = 10.61$
129 Example 21.6
COMPUTE
The rejection region is Fr > χ²_{α, k−1} = χ²_{.05, 3} = 7.81473.
130 Example 21.6
INTERPRET
- The value of our Friedman test statistic is 10.61, compared to a critical value of chi-squared (at 5% significance and 3 d.f.) of 7.81473.
- Thus, there is sufficient evidence to reject H0 in favor of H1.
It appears that the managers' evaluations of applicants do indeed differ.
131 SPSS Output
132 Identifying Factors
- Factors that identify the Friedman test