AOV Assumption Checking and Transformations (8.4, 8.5)

Provided by: KMPor

1
AOV Assumption Checking and Transformations
(8.4, 8.5)

How do we check the Normality assumption in AOV?
How do we check the Homogeneity of variances
assumption in AOV? (7.4)
What to do if these assumptions are not met?

2
Model Assumptions

Homoscedasticity (common group variances).
Normality of responses (or of residuals).
Independence of responses (or of residuals).
(Hopefully achieved through randomization)
Effect additivity. (Only an issue in multi-way
AOV later).

3
Checking the Equal Variance Assumption
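$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$
$H_A$: some of the variances are different from each other.
Hartley's Test: little work but little power.
A logical extension of the F test for t = 2.
Requires equal replication, n, among groups.
Requires normality.
T.S. $F_{max} = s_{max}^2 / s_{min}^2$
R.R. Reject $H_0$ if $F_{max} > F_{\alpha, t, n-1}$, tabulated in Table 12.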
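As a rough illustration of Hartley's statistic (the ratio of the largest to the smallest sample variance), here is a minimal Python sketch. The function name `hartley_fmax` and the data are made up for illustration; the critical value still has to come from the Fmax table.

```python
from statistics import variance

def hartley_fmax(groups):
    """Fmax = largest sample variance / smallest sample variance."""
    # Hartley's test assumes equal replication n in every group.
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's test requires equal replication")
    variances = [variance(g) for g in groups]  # unbiased s_i^2 (divisor n - 1)
    return max(variances) / min(variances)

# Hypothetical data: t = 3 groups, n = 4 observations each
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 24.0, 22.0, 26.0],
    [15.0, 15.5, 16.0, 16.5],
]
fmax = hartley_fmax(groups)
print(f"Fmax = {fmax:.3f}  (compare with the tabulated F_(alpha, t, n-1))")
```

For these made-up numbers the sample variances are 5/3, 20/3 and 1.25/3, so Fmax is 16 up to floating-point rounding.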
4
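Bartlett's Test
More work but better power.
Allows unequal replication, but requires normality.
T.S. $C = (n_T - t)\ln(s_p^2) - \sum_{i=1}^{t}(n_i - 1)\ln(s_i^2)$, where $s_p^2 = \sum_{i=1}^{t}(n_i - 1)s_i^2 / (n_T - t)$ is the pooled variance.
If $C > \chi^2_{(t-1),\alpha}$, then apply the correction term
$CF = 1 + \frac{1}{3(t-1)}\left[\sum_{i=1}^{t}\frac{1}{n_i - 1} - \frac{1}{n_T - t}\right]$
R.R. Reject $H_0$ if $C/CF > \chi^2_{(t-1),\alpha}$.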
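The two-step Bartlett procedure (compute C, then divide by the correction factor CF) can be sketched as follows. This assumes the usual textbook form of Bartlett's statistic based on the pooled variance; the function name `bartlett_statistic` and the data are hypothetical, and the chi-square critical value must still be looked up.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Return (C, CF, C/CF) for a list of samples, one list per group."""
    t = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    s2 = [variance(g) for g in groups]  # sample variances s_i^2
    # Pooled variance: each s_i^2 weighted by its degrees of freedom n_i - 1
    pooled = sum((n - 1) * v for n, v in zip(sizes, s2)) / (n_total - t)
    c = (n_total - t) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, s2)
    )
    # Correction factor (assumed standard form)
    cf = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - t)) / (3 * (t - 1))
    return c, cf, c / cf

# Made-up data; unequal replication is allowed
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 24.0, 22.0],
    [15.0, 15.5, 16.0, 16.5, 17.0],
]
c, cf, corrected = bartlett_statistic(groups)
print(f"C = {c:.4f}, CF = {cf:.4f}, C/CF = {corrected:.4f}")
```

Because CF is always greater than 1, the corrected statistic is smaller than C, which is why the correction only needs to be applied when C already exceeds the critical value.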
5
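Levene's Test
More work but powerful result.
Let $z_{ij} = |y_{ij} - \tilde{y}_i|$, where $\tilde{y}_i$ is the sample median of the i-th group.
T.S. $L = \dfrac{\sum_{i=1}^{t} n_i(\bar{z}_{i.} - \bar{z}_{..})^2 / (t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_{i.})^2 / (n_T - t)}$
R.R. Reject $H_0$ if $L \ge F_{\alpha, df_1, df_2}$, with $df_1 = t - 1$ and $df_2 = n_T - t$.
Essentially an AOV on the $z_{ij}$.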
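Since the test is essentially a one-way AOV on the absolute deviations from the group medians, it can be sketched in a few lines of Python. The function name `levene_median` and the data are made up for illustration; the F critical value (or p-value) is left to a table or a statistics package.

```python
from statistics import mean, median

def levene_median(groups):
    """One-way AOV F statistic on z_ij = |y_ij - median_i|; returns (L, df1, df2)."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    t = len(z)
    n_total = sum(len(zi) for zi in z)
    grand_mean = mean([v for zi in z for v in zi])
    # Between-group and within-group sums of squares of the z_ij
    ss_between = sum(len(zi) * (mean(zi) - grand_mean) ** 2 for zi in z)
    ss_within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    df1, df2 = t - 1, n_total - t
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Hypothetical data: t = 3 groups of 4 observations
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 24.0, 22.0, 26.0],
    [15.0, 15.5, 16.0, 16.5],
]
stat, df1, df2 = levene_median(groups)
print(f"L = {stat:.3f} on ({df1}, {df2}) df")
```

Replacing `median(g)` with `mean(g)` would give Levene's original mean-centered version of the test.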
6
Minitab
Test for Equal Variances
Response   Resist
Factors    Sand
ConfLvl    95.0000

Bonferroni confidence intervals for standard deviations
  Lower      Sigma     Upper     N   Factor Levels
1.70502    3.28634   14.4467     5   15
1.89209    3.64692   16.0318     5   20
1.07585    2.07364    9.1157     5   25
1.07585    2.07364    9.1157     5   30
1.48567    2.86356   12.5882     5   35

Bartlett's Test (normal distribution)
Test Statistic 1.890   P-Value 0.756

Levene's Test (any continuous distribution)
Test Statistic 0.463   P-Value 0.762
Stat > ANOVA > Test for Equal Variances
Minitab Help: Use Bartlett's test when the data
come from normal distributions; Bartlett's test
is not robust to departures from normality. Use
Levene's test when the data come from continuous,
but not necessarily normal, distributions. The
computational method for Levene's test is a
modification of Levene's procedure [10] developed
by [2]. This method considers the distances of
the observations from their sample median rather
than their sample mean. Using the sample median
rather than the sample mean makes the test more
robust for smaller samples.
Do not reject H0, since both p-values are > 0.05
(the traditional α).
8
SAS Program
proc glm data=stress;
  class sand;
  model resistance = sand / solution;
  means sand / hovtest=bartlett;
  means sand / hovtest=levene(type=abs);
  means sand / hovtest=levene(type=square);
  means sand / hovtest=bf;  /* Brown and Forsythe mod of Levene */
  title1 'Compression resistance in concrete beams as';
  title2 ' a function of percent sand in the mix';
run;
HOVTEST only works when there is a single factor on the
right-hand side of the model.
9
SAS
hovtest=bartlett
Bartlett's Test for Homogeneity of resistance Variance
Source    DF    Chi-Square    Pr > ChiSq
sand       4        1.8901        0.7560

hovtest=levene(type=abs)
Levene's Test for Homogeneity of resistance Variance
ANOVA of Absolute Deviations from Group Means
                  Sum of       Mean
Source    DF     Squares     Square    F Value    Pr > F
sand       4      8.8320     2.2080       0.95    0.4573
Error     20     46.6080     2.3304

hovtest=levene(type=square)
Levene's Test for Homogeneity of resistance Variance
ANOVA of Squared Deviations from Group Means
                  Sum of       Mean
Source    DF     Squares     Square    F Value    Pr > F
sand       4       202.2    50.5504       0.85    0.5076
Error     20      1182.8    59.1400

hovtest=bf
Brown and Forsythe's Test for Homogeneity of resistance Variance
ANOVA of Absolute Deviations from Group Medians
                  Sum of       Mean
Source    DF     Squares     Square    F Value    Pr > F
sand       4      7.4400     1.8600       0.46    0.7623
Error     20     80.4000     4.0200
10
SPSS
Test of Homogeneity of Variances
Since the p-value (0.457) is greater than our
(typical) α = 0.05 Type I error risk level, we do
not reject the null hypothesis.
This is Levene's original test, in which the zij
are centered on group means and not medians.
11
R
Tests of Homogeneity of Variances
bartlett.test(): Bartlett's test.
fligner.test(): Fligner-Killeen test (nonparametric).
12
Checking for Normality
Reminder: normality of the RESIDUALS is assumed.
The original data are assumed normal also, but
each group may have a different mean if HA is
true. Practice is to first fit the model, THEN
output the residuals, then test for normality of
the residuals. This APPROACH is always correct.
TOOLS

Histogram and/or boxplot of all residuals (eij).
Normal probability (Q-Q) plot.
Formal test for normality.
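For a one-way AOV, "fit the model, then output the residuals" amounts to subtracting each group's mean from its observations. A minimal Python sketch, with a hypothetical function name and made-up data:

```python
from statistics import mean

def one_way_residuals(groups):
    """Residuals e_ij = y_ij - ybar_i from the one-way AOV model."""
    residuals = []
    for g in groups:
        ybar = mean(g)  # fitted value for every observation in this group
        residuals.extend(y - ybar for y in g)
    return residuals

# Made-up data: two groups with clearly different means
groups = [[10.0, 12.0, 11.0, 13.0], [20.0, 24.0, 22.0, 26.0]]
e = one_way_residuals(groups)
print(e)
```

Pooling the residuals removes the group-mean differences, so a single histogram, probability plot, or formal test can be applied to all of them at once.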

13
Histogram of Residuals
proc glm data=stress;
  class sand;
  model resistance = sand / solution;
  output out=resid r=r_resis p=p_resis;
  title1 'Compression resistance in concrete beams as';
  title2 ' a function of percent sand in the mix';
run;
proc capability data=resid;
  histogram r_resis / normal;
  ppplot r_resis / normal square;
run;
14
Probability Plots (QQ-Plots)
A scatter plot of the percentiles of the
residuals against the percentiles of a standard
normal distribution. The basic idea is that if
the residuals came from a normal distribution,
values for these percentiles should lie on a
straight line.

Compute and sort the residuals e(1), e(2), ...,
e(n).
Associate with each residual a standard normal
percentile $z_{(i)} = \Phi^{-1}((i - 0.5)/n)$.
Plot z(i) versus e(i). Compare to a straight line
(don't care so much about which line).
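The three steps above can be sketched with Python's standard library, which provides the inverse standard normal CDF as `statistics.NormalDist().inv_cdf`. The function name `normal_qq_points` and the residuals are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Return (z_(i), e_(i)) pairs for a normal probability plot."""
    e_sorted = sorted(residuals)  # e_(1) <= e_(2) <= ... <= e_(n)
    n = len(e_sorted)
    std_normal = NormalDist()     # mean 0, standard deviation 1
    # Standard normal percentile attached to the i-th smallest residual
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, e_sorted))

# Made-up residuals
points = normal_qq_points([0.4, -1.2, 0.1, 1.5, -0.8])
for z_i, e_i in points:
    print(f"{z_i:8.4f}  {e_i:6.2f}")
```

Plotting the second column against the first gives the Q-Q plot; roughly linear points are consistent with normal residuals.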

15
Software
EXCEL: Use the AddLine option. Percentile: p_i =
(i - 0.5)/n. Normal percentile: NORMSINV(p_i).
MTB: Graph > Probability Plot
R: with residuals in y, qqnorm(y); qqline(y)
16
Excel Probability Plot
17
Probability Plot
Minitab
SAS (note axes changed)
These look normal!
18
Formal Tests of Normality
Many, many tests (a favorite pastime of
statisticians is developing new tests for
normality).

Kolmogorov-Smirnov test, Anderson-Darling test
(both based on the empirical CDF).
Shapiro-Wilk test, Ryan-Joiner test (both are
correlation-based tests applicable for n < 50).
D'Agostino's test (n > 50).

All are quite conservative: they fail to reject the
null hypothesis of normality more often than they
should.
19
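Shapiro-Wilk W test
$e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ represent the data ranked from smallest
to largest.
$H_0$: The population has a normal distribution.
$H_A$: The population does not have a normal
distribution.
T.S. $W = \frac{1}{d}\left[\sum_{i=1}^{k} a_i\,(e_{(n-i+1)} - e_{(i)})\right]^2$, where $d = \sum_{j=1}^{n}(e_j - \bar{e})^2$,
with $k = n/2$ if n is even and $k = (n-1)/2$ if n is odd.
Coefficients $a_i$ come from a table.
R.R. Reject $H_0$ if $W < W_{0.05}$.
Critical values of $W_\alpha$ come from a table.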
20
Shapiro-Wilk Coefficients
21
Shapiro-Wilk Coefficients
22
Shapiro-Wilk W Table
23
D'Agostino's Test
$e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ represent the data ranked from smallest
to largest.
$H_0$: The population has a normal distribution.
$H_A$: The population does not have a normal
distribution.
T.S.
R.R. (two-sided test) Reject $H_0$ if $Y < Y_{0.025}$ or $Y > Y_{0.975}$.
$Y_{0.025}$ and $Y_{0.975}$ come from a table of
percentiles of the Y statistic.
25
Transformations to Achieve Homoscedasticity

What can we do if the homoscedasticity (equal
variances) assumption is rejected?
Declare that the AOV model is not an adequate
model for the data. Look for alternative models.
(Later.)
Try to cheat by forcing the data to be
homoscedastic through a transformation of the
response variable Y. (Variance Stabilizing
Transformations.)

26
Square Root Transformation
Response is positive and continuous.
This transformation works when we notice the
variance changes as a linear function of the mean:
$\sigma^2 = k\mu$, $k > 0$, with transformation $Z = \sqrt{Y}$.

Useful for count data (Poisson distributed).
For small values of Y, use $\sqrt{Y + 0.5}$.

Typical use: counts of items when counts are
between 0 and 10.
27
Logarithmic Transformation
Response is positive and continuous.
This transformation tends to work when the
variance is a linear function of the square of
the mean:
$\sigma^2 = k\mu^2$, $k > 0$, with transformation $Z = \ln(Y)$.

Replace Y by Y + 1 if zero occurs.
Useful if effects are multiplicative (later).
Useful if there is considerable heterogeneity in
the data.

Typical use: growth over time; concentrations;
counts of items when counts are greater than 10.
28
ARCSINE SQUARE ROOT
Response is a proportion.
With proportions, the variance is a linear
function of the mean times (1 - mean), where the
sample mean is the expected proportion:
$\sigma^2 = k\mu(1 - \mu)$, with transformation $Z = \arcsin(\sqrt{Y})$.

Y is a proportion (decimal between 0 and 1).
Zero counts should be replaced by 1/4, and
counts of N by N - 1/4, before converting to percentages.

Typical use: proportion of seeds
germinating; proportion responding.
29
Reciprocal Transformation
Response is positive and continuous.
This transformation works when the variance is a
linear function of the fourth power of the mean:
$\sigma^2 = k\mu^4$, $k > 0$, with transformation $Z = 1/Y$.

Use 1/(Y + 1) if zero occurs.
Useful if the reciprocal of the original
scale has meaning.

Typical use: survival time.
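The four traditional variance-stabilizing transformations (square root, logarithm, arcsine square root, reciprocal) are easy to apply by hand. The sketch below is only illustrative: the function names and flags are made up, and the zero-count adjustment for proportions reflects one reading of the rule of thumb above (a count of 0 becomes 1/4 and a count of N becomes N - 1/4).

```python
import math

def sqrt_transform(y, small_counts=False):
    """sqrt(Y), or sqrt(Y + 0.5) when the counts are small."""
    return math.sqrt(y + 0.5) if small_counts else math.sqrt(y)

def log_transform(y, has_zeros=False):
    """ln(Y), or ln(Y + 1) when zeros occur in the data."""
    return math.log(y + 1) if has_zeros else math.log(y)

def arcsine_sqrt_transform(count, n):
    """arcsin(sqrt(count / n)), with 0 -> 1/4 and n -> n - 1/4 (assumed reading)."""
    if count == 0:
        count = 0.25
    elif count == n:
        count = n - 0.25
    return math.asin(math.sqrt(count / n))

def reciprocal_transform(y, has_zeros=False):
    """1/Y, or 1/(Y + 1) when zeros occur in the data."""
    return 1.0 / (y + 1) if has_zeros else 1.0 / y

# Made-up responses
print(sqrt_transform(9.0))              # count data
print(log_transform(0.0, True))         # data containing zeros
print(arcsine_sqrt_transform(5, 10))    # 5 of 10 seeds germinating
print(reciprocal_transform(4.0))        # e.g. a survival time
```

The AOV is then run on the transformed values, and the homoscedasticity checks are repeated on the new residuals.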
30
Power Family of Transformations (1)
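Suppose we apply the power transformation $Z = Y^p$.
Suppose the true situation is that the standard deviation
is proportional to the k-th power of the mean, $\sigma_Y \propto \mu^k$.
In the transformed variable we will have, to first order,
$\sigma_Z \approx |p|\,\mu^{p-1}\sigma_Y \propto \mu^{p + k - 1}$.
If p is taken as 1 - k, then the variance of Z will
not depend on the mean, i.e. the variance will be
constant. This is a variance-stabilizing
transformation. (For example, variance proportional to the
mean gives k = 1/2 and p = 1/2, the square root.)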
31
Power Family of Transformations (2)
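With replicated data, k can sometimes be found
empirically by fitting
$\log s_i = \log c + k \log \bar{y}_i$, where $\bar{y}_i$ and $s_i$ are the mean and standard deviation of group i.
Estimate: $\hat{p} = 1 - \hat{k}$.
k can be estimated by least squares (regression:
next unit).
If $\hat{p}$ is zero, use the logarithmic
transformation.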
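A minimal sketch of this empirical approach, assuming the fitted relationship is log(standard deviation) against log(mean) across groups, so that the slope estimates k and the suggested power is 1 - k. The function name `estimate_power` and the data are made up; the slope is computed with the ordinary least-squares formula.

```python
import math
from statistics import mean, stdev

def estimate_power(groups):
    """Least-squares slope of log(s_i) on log(ybar_i); returns (k_hat, p_hat)."""
    x = [math.log(mean(g)) for g in groups]   # log group means
    y = [math.log(stdev(g)) for g in groups]  # log group standard deviations
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    k_hat = sxy / sxx
    return k_hat, 1 - k_hat

# Made-up data in which the spread grows in proportion to the mean
groups = [[m * f for f in (0.9, 1.0, 1.1)] for m in (10.0, 20.0, 40.0)]
k_hat, p_hat = estimate_power(groups)
print(f"k_hat = {k_hat:.3f}, p_hat = {p_hat:.3f}")
```

Here the standard deviation is proportional to the mean, so the estimated k is close to 1 and the estimated power is close to 0, which points to the logarithmic transformation.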
32
Box/Cox Transformations (advanced)
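Suggested transformation:
$Z = \dfrac{Y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}}$ for $\lambda \ne 0$, and $Z = \dot{y}\,\ln Y$ for $\lambda = 0$,
where $\dot{y}$ is the geometric mean of the original data.
The exponent, $\lambda$, is unknown. Hence the model can be
viewed as having an additional parameter which
must be estimated (choose the value of $\lambda$ that
minimizes the residual sum of squares).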
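One simple way to choose the exponent is a grid search: transform the data for each candidate value, fit the one-way model, and keep the value with the smallest residual sum of squares. The sketch below assumes the geometric-mean-scaled form of the Box/Cox transformation, which makes residual sums of squares comparable across exponents. All function names, the grid, and the data are hypothetical.

```python
import math
from statistics import mean

def box_cox(y, lam, gm):
    """Scaled Box/Cox transform; gm is the geometric mean of all responses."""
    if abs(lam) < 1e-12:
        return gm * math.log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))

def residual_ss(groups, lam):
    """Within-group (residual) sum of squares after transforming with lam."""
    all_y = [y for g in groups for y in g]
    gm = math.exp(mean([math.log(y) for y in all_y]))
    rss = 0.0
    for g in groups:
        z = [box_cox(y, lam, gm) for y in g]
        zbar = mean(z)
        rss += sum((v - zbar) ** 2 for v in z)
    return rss

def best_lambda(groups, grid):
    """Grid value of lam with the smallest residual sum of squares."""
    return min(grid, key=lambda lam: residual_ss(groups, lam))

# Made-up positive responses in three groups
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 24.0, 22.0, 26.0],
    [15.0, 15.5, 16.0, 16.5],
]
grid = [x / 4 for x in range(-8, 9)]  # -2.00, -1.75, ..., 2.00
lam_hat = best_lambda(groups, grid)
print(f"lambda_hat = {lam_hat}")
```

In practice the chosen exponent is usually rounded to a nearby interpretable value (for example 0 for the log or 0.5 for the square root) before refitting.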
33
Handling Heterogeneity
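[Flowchart: Regression? If no (ANOVA), fit the effect model, then test for homoscedasticity: accept is OK, reject leads to Transform. If yes, fit the linear model, then plot the residuals: OK, or not OK leads to Transform. Transform options: Box/Cox family, power family, or traditional, giving the transformed data.]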
34
Transformations to Achieve Normality
[Flowchart: Regression? If no (ANOVA), estimate the group means. If yes, fit the linear model. Check the residuals with a probability plot and formal tests. Residuals normal? If yes, OK. If no, transform or use a different model.]
