Title: Short Course in Biostatistics 2007, Lesson 5: Introduction to ANOVA
1. Short Course in Biostatistics 2007, Lesson 5: Introduction to ANOVA
2. Review
- Many scientific questions can be viewed as questions about the unknown probability distributions of variables.
- These distributions can often be characterized by the value of one or two parameters.
- Therefore, many scientific questions can be specified in terms of the values of parameters.
3. Review (continued)
- Statistical methods provide a formal approach for doing two things:
  1. Determine what the data say regarding the values of these parameters (estimation and confidence intervals).
  2. Quantify the strength of the evidence against specified hypotheses about the unknown parameters (p-values).
4. ANOVA
- A general set of methods to be used when there is
a quantitative outcome and one or more
categorical predictors.
5. One-way ANOVA
- Often we might be interested in the distribution of a quantitative outcome in 3 or more groups.
- Example research question:
  - What is the effect of various medications on the change in blood pressure in lupus patients with hypertension?
  - Quantitative outcome: change in systolic blood pressure (SBP)
  - Categorical predictor: type of medication
6. Parameterizing one-way ANOVA
- Statistical methods primarily focus on the means of the distributions.
- Lupus example:
  - Suppose there are three medications (ACE, CCB, or DI).
  - Then, if we focus on the means, there are three parameters: $\mu_{ACE}$, $\mu_{CCB}$, $\mu_{DI}$.
7. Parameterizing one-way ANOVA
- There is scientific interest in estimating these means and their differences, e.g. the mean differences $MD_{ACE-CCB}$, $MD_{ACE-DI}$, $MD_{CCB-DI}$.
- We may also be interested in assessing hypotheses. Here are a few we might be interested in:
  - $H_{01}$: $\mu_{ACE} = \mu_{CCB}$
  - $H_{02}$: $\mu_{ACE} = \mu_{DI}$
  - $H_{03}$: $\mu_{CCB} = \mu_{DI}$
  - $H_{0G}$: $\mu_{ACE} = \mu_{CCB} = \mu_{DI}$ (Global Null)
8. Parameterizing one-way ANOVA
- We also have to consider the variances of the distributions. These can be denoted with three parameters: $\sigma^2_{ACE}$, $\sigma^2_{CCB}$, $\sigma^2_{DI}$.
9. Additional assumptions of ANOVA
- In ANOVA models we usually assume that the variances in each group are the same, i.e. $\sigma^2_{ACE} = \sigma^2_{CCB} = \sigma^2_{DI}$.
- In addition, we assume that the distributions of interest are normal.
10. Graphical Representation of One-Way ANOVA with 3 groups
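11. How do we estimate the parameters?
- Data: independent realizations of the random variable in each group.
  - $Y_{group1,i}$, $i = 1$ to $n_{group1}$
  - $Y_{group2,i}$, $i = 1$ to $n_{group2}$
  - $Y_{group3,i}$, $i = 1$ to $n_{group3}$
- Estimating the means: use the sample mean of each group, e.g. $\bar{Y}_{group1} = \frac{1}{n_{group1}} \sum_{i=1}^{n_{group1}} Y_{group1,i}$.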
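12. How do we estimate the parameters?
- Estimating the common variance: use the average squared distance from the sample means, i.e.
  $s^2 = \frac{1}{N - 3} \sum_{g=1}^{3} \sum_{i=1}^{n_{g}} (Y_{g,i} - \bar{Y}_{g})^2$,
  where $g$ indexes the three groups and $N = n_{group1} + n_{group2} + n_{group3}$ is the total sample size.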
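The two estimation steps (a sample mean per group, then one pooled variance) can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical, and the divisor N - k (total sample size minus number of groups) is the usual pooled-variance convention, assumed here rather than taken from the slide.

```python
from statistics import mean

# Hypothetical change-in-SBP values for three small groups (illustrative only).
groups = {
    "ACE": [-20.0, -15.0, -25.0, -10.0],
    "CCB": [-12.0, -8.0, -16.0],
    "DI": [-22.0, -30.0, -18.0, -26.0, -24.0],
}


def group_means(data):
    """Sample mean of each group."""
    return {name: mean(values) for name, values in data.items()}


def pooled_variance(data):
    """Squared deviations from each group's own mean, summed and divided by N - k."""
    means = group_means(data)
    sse = sum(
        (y - means[name]) ** 2 for name, values in data.items() for y in values
    )
    n_total = sum(len(values) for values in data.values())
    return sse / (n_total - len(data))


means = group_means(groups)
s2 = pooled_variance(groups)
print(means)
print(round(s2, 3))
```

Pooling only makes sense under the equal-variance assumption stated earlier; if the group variances differ a lot, a single common estimate can be misleading.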
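13. Confidence interval for the means
- Formula for a 95% confidence interval for $\mu_{Group1}$:
  $\bar{Y}_{group1} \pm t_{0.975,\,N-3} \cdot \frac{s}{\sqrt{n_{group1}}}$,
  where $s$ is the square root of the estimated common variance and $N$ is the total sample size.
- Note: $s / \sqrt{n_{group1}}$ is referred to as the Standard Error of the Mean.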
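As a rough illustration of the interval, the sketch below computes mean ± quantile × standard error. It makes two assumptions that are not from the slide: the inputs are hypothetical, and the t quantile is replaced by a standard normal quantile (the Python standard library has no t quantile), which is only reasonable when the error degrees of freedom are large.

```python
from math import sqrt
from statistics import NormalDist


def approx_ci_mean(sample_mean, pooled_sd, n, level=0.95):
    """Large-sample interval: mean +/- z * pooled_sd / sqrt(n).

    Uses a normal quantile in place of the t quantile (an approximation).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sem = pooled_sd / sqrt(n)  # standard error of the mean
    return sample_mean - z * sem, sample_mean + z * sem


# Hypothetical inputs: group mean, pooled standard deviation, group size.
low, high = approx_ci_mean(sample_mean=-18.0, pooled_sd=23.0, n=100)
print(round(low, 2), round(high, 2))
```

With small samples the t quantile is noticeably larger than the normal one, so a statistics package should be used there instead of this shortcut.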
14. Assessing simple pair-wise hypotheses
- To assess $H_{01}$: $\mu_{ACE} = \mu_{CCB}$, simply perform a two-sample t-test.
- Nothing new so far!
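For reference, a minimal sketch of the test statistic behind such a pair-wise comparison. It assumes the classical equal-variance (pooled) two-sample t-test using only the two groups being compared; the data are hypothetical, and the p-value step (comparing to a t distribution with n1 + n2 - 2 degrees of freedom) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


# Hypothetical change-in-SBP values for two groups.
ace = [-20.0, -15.0, -25.0, -10.0]
ccb = [-12.0, -8.0, -16.0]
t_stat, df = pooled_t_statistic(ace, ccb)
print(round(t_stat, 3), df)
```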
15. But how to quantify the evidence against the Global Null Hypothesis?
- Global Null: $H_{0G}$: $\mu_{ACE} = \mu_{CCB} = \mu_{DI}$
- Use an F-test based on an ANOVA table (inference based on sums of squares).
16. Sums of Squares
- Total Sum of Squares (SSTO)
  - Sum of squared deviations between the observed values and the overall sample mean, ignoring group membership.
- Error Sum of Squares (SSE)
  - Sum of squared deviations between the observed values and the group-specific means.
- SSTO - SSE is the Treatment Sum of Squares (SSTR)
  - The bigger the SSTR relative to the SSE, the stronger the relationship between group and the outcome.
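The three sums of squares can be computed directly from their definitions. The sketch below uses hypothetical data and obtains SSTR as SSTO minus SSE, as described above.

```python
from statistics import mean

# Hypothetical change-in-SBP values for three small groups (illustrative only).
groups = {
    "ACE": [-20.0, -15.0, -25.0, -10.0],
    "CCB": [-12.0, -8.0, -16.0],
    "DI": [-22.0, -30.0, -18.0, -26.0, -24.0],
}


def sums_of_squares(data):
    """Return (SSTO, SSE, SSTR) for a one-way layout."""
    all_values = [y for values in data.values() for y in values]
    grand_mean = mean(all_values)
    # SSTO: deviations from the overall mean, ignoring group membership.
    ssto = sum((y - grand_mean) ** 2 for y in all_values)
    # SSE: deviations from each observation's own group mean.
    sse = sum((y - mean(values)) ** 2 for values in data.values() for y in values)
    return ssto, sse, ssto - sse


ssto, sse, sstr = sums_of_squares(groups)
print(round(ssto, 3), round(sse, 3), round(sstr, 3))
```

SSTR computed this way equals the sum over groups of (group size) × (group mean − overall mean)², which is why it grows as the group means spread apart.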
17. Illustration to explain the Sums of Squares
18. Sums of Squares are summarized on an Analysis of Variance Table
- Under the Global Null, the F-statistic has an F distribution.
- Strategy: calculate F and compare it to the F distribution.
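The table itself is not reproduced in this text. As a reminder of the standard one-way definition (stated here as the textbook convention, with $k$ groups and $N$ subjects in total, not copied from the slide), the F-statistic is the ratio of the two mean squares:

```latex
\[
F \;=\; \frac{SSTR / (k - 1)}{SSE / (N - k)},
\qquad
F \sim F_{k-1,\; N-k} \ \text{under the Global Null.}
\]
```

Large values of $F$ mean the group means vary more than within-group noise would explain, so the p-value is the upper-tail probability of the $F_{k-1,\,N-k}$ distribution.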
19. One-way ANOVA, Example
- Example data:
  - 110 lupus patients started on ACE inhibitors
  - 82 lupus patients started on diuretics
  - 50 lupus patients started on calcium channel blockers
  - All were followed for 90 days and their change in SBP was measured.
20. Always look at your data before using fancy methods!
- Box plot of the changes in SBP by treatment group (ACE, DI, CCB)
21. Sample Means and Standard Deviations
               ------------chsbp------------
group      N           Mean        Std Dev
1        110    -18.3727273     23.8319264
2         82    -21.2195122     24.0092700
3         50    -13.5200000     19.6263463
22. Analysis of Variance Table
P = .1801 for the global null hypothesis. This is not strong evidence against it.
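The reported p-value can be checked from the summary statistics alone. The sketch below rebuilds SSTR and SSE from the group sizes, means, and standard deviations on the "Sample Means and Standard Deviations" slide. The p-value uses a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2 (three groups), so it is not a general-purpose routine. Function names are illustrative.

```python
# (N, mean, standard deviation) for groups 1, 2, 3, as printed on the slide.
summary = [
    (110, -18.3727273, 23.8319264),
    (82, -21.2195122, 24.0092700),
    (50, -13.5200000, 19.6263463),
]


def anova_from_summary(rows):
    """One-way ANOVA F statistic from per-group (n, mean, sd)."""
    n_total = sum(n for n, _, _ in rows)
    k = len(rows)
    grand_mean = sum(n * m for n, m, _ in rows) / n_total
    sstr = sum(n * (m - grand_mean) ** 2 for n, m, _ in rows)
    sse = sum((n - 1) * sd ** 2 for n, _, sd in rows)
    df_treat, df_error = k - 1, n_total - k
    f_stat = (sstr / df_treat) / (sse / df_error)
    return f_stat, df_treat, df_error


def f_upper_tail_df1_2(f_stat, df_error):
    """P(F > f) when the numerator df is exactly 2: (1 + 2f/df2) ** (-df2/2)."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


f_stat, df_treat, df_error = anova_from_summary(summary)
p_value = f_upper_tail_df1_2(f_stat, df_error)
print(round(f_stat, 3), df_treat, df_error, round(p_value, 4))
```

The result should land very close to the P = .1801 quoted above, which suggests the table was produced from these same three groups with 2 and 239 degrees of freedom.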
23. Multi-way ANOVA
- Often we might be interested in the distribution of a quantitative outcome in groups cross-classified by two or more categorical variables.
- Example research question:
  - What is the effect of various medications and an exercise regimen on change in SBP among patients with hypertension?
  - Quantitative outcome: change in SBP
  - Categorical predictors: type of medication, exercise
24. Parameterizing Multi-Way ANOVA
- We can parameterize Multi-Way ANOVA using a mean for each group. For 2-way ANOVA, these means can be concisely displayed in a table.
- Example:
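One way such a table of cell means might be laid out is sketched below. The layout is an assumption: it borrows the indexing used later in this lesson, with $i$ for exercise (taken here as 1 = yes, 2 = no) and $j$ for medication (taken here as 1 = ACE, 2 = DI, 3 = CCB).

```latex
\[
\begin{array}{l|ccc}
 & \text{ACE} & \text{DI} & \text{CCB} \\ \hline
\text{Exercise: yes} & \mu_{11} & \mu_{12} & \mu_{13} \\
\text{Exercise: no}  & \mu_{21} & \mu_{22} & \mu_{23}
\end{array}
\]
```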
25. Alternative Ways to Parameterize ANOVA Models
- First, let's revisit the one-way ANOVA setting.
- Previously, we parameterized one-way ANOVA using $\mu_{ACE}$, $\mu_{CCB}$, $\mu_{DI}$.
26. Alternative parameterization for one-way ANOVA
- An alternative way to parameterize it would be
  - $\mu_{ACE} = \mu_{overall} + \alpha_{ACE}$
  - $\mu_{DI} = \mu_{overall} + \alpha_{DI}$
  - $\mu_{CCB} = \mu_{overall} + \alpha_{CCB}$
- where
  - $\mu_{overall}$ = the overall mean, and
  - $\alpha_{ACE}$ = the effect of ACE relative to the other groups, etc.
- Or more generally, $\mu_j = \mu_{overall} + \alpha_j$, $j = 1,2,3$.
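A small sketch of moving between the two parameterizations. It assumes one common convention that the slides do not spell out: the overall mean is the unweighted average of the group means, so the effects sum to zero. The group means are hypothetical.

```python
from statistics import mean


def to_effects(group_means):
    """Split group means into an overall mean plus effects that sum to zero."""
    overall = mean(list(group_means.values()))
    effects = {name: m - overall for name, m in group_means.items()}
    return overall, effects


# Hypothetical group means for change in SBP.
mu = {"ACE": -18.0, "DI": -21.0, "CCB": -12.0}
overall, alpha = to_effects(mu)
print(overall, alpha)

# Going back: each group mean is the overall mean plus its effect.
rebuilt = {name: overall + a for name, a in alpha.items()}
```

Some constraint of this kind is needed because three means cannot determine four free parameters; other conventions (for example, setting one effect to zero) describe the same model.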
27. Alternative parameterization for Multi-way ANOVA
- Using this general approach, we might parameterize the two-way ANOVA described above as follows:
  - $\mu_{ij} = \mu_{overall} + \alpha_j + \beta_i$, $j = 1,2,3$ and $i = 1,2$
- where
  - $\alpha_j$, $j = 1,2,3$, stand for the effects of ACE, DI, CCB, and
  - $\beta_i$, $i = 1,2$, stand for the effects of exercise (yes/no).
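28. Alternative multi-way parameterization
- $\mu_{ij} = \mu_{overall} + \alpha_j + \beta_i$, $j = 1,2,3$ and $i = 1,2$
- Note that this model implicitly entails the assumption that the effect of exercise is the same for each type of medication. (No effect modification on the MD scale.)
- To see this, note that for every medication $j$:
  $\mu_{1j} - \mu_{2j} = (\mu_{overall} + \alpha_j + \beta_1) - (\mu_{overall} + \alpha_j + \beta_2) = \beta_1 - \beta_2$,
  which does not depend on $j$.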
29. Alternative multi-way parameterization
- The model again: $\mu_{ij} = \mu_{overall} + \alpha_j + \beta_i$, $j = 1,2,3$ and $i = 1,2$
- Similarly, this model entails the assumption that the effect of each medication is the same among those who do or do not exercise.
- This is an additive model: the effects simply add.
- This is referred to as a Main Effects model.
30. Alternative multi-way parameterization
- We can allow for effect modification by including more parameters. This is usually denoted as follows:
  - $\mu_{ij} = \mu_{overall} + \alpha_j + \beta_i + (\alpha\beta)_{ij}$, $j = 1,2,3$ and $i = 1,2$
- These latter parameters, $(\alpha\beta)_{ij}$, are interaction effects.
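To make the interaction terms concrete, the sketch below splits a 2 × 3 table of cell means into an overall mean, medication effects, exercise effects, and interaction effects. The cell means are hypothetical, and the sum-to-zero constraints (unweighted averages over rows and columns) are an assumed convention rather than something the slides specify.

```python
from statistics import mean

# Hypothetical cell means: rows = exercise (yes, no), columns = ACE, DI, CCB.
cell_means = [
    [-24.0, -27.0, -15.0],
    [-16.0, -17.0, -9.0],
]


def decompose(cells):
    """Return (overall, alpha_j, beta_i, interaction_ij) under sum-to-zero constraints."""
    n_rows, n_cols = len(cells), len(cells[0])
    overall = mean([v for row in cells for v in row])
    beta = [mean(row) - overall for row in cells]  # exercise effects
    alpha = [
        mean([cells[i][j] for i in range(n_rows)]) - overall for j in range(n_cols)
    ]  # medication effects
    interaction = [
        [cells[i][j] - (overall + alpha[j] + beta[i]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return overall, alpha, beta, interaction


overall, alpha, beta, interaction = decompose(cell_means)
print(overall, alpha, beta)
print(interaction)
```

In this made-up table the exercise difference is -8 for ACE, -10 for DI, and -6 for CCB, so the interaction terms are not all zero; under the main-effects model they would all vanish.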
31. How do we estimate the parameters?
- Data: independent realizations of the random variable in each group.
- For the main-effects model, we cannot estimate the parameters simply by calculating the sample means, because they may not satisfy the assumption of no effect modification.
- The computer finds the best estimates of the main effects using Least Squares Estimation.
- The computer will also provide standard errors, confidence intervals, etc.
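A rough sketch of what least squares estimation does for a main-effects model with two medications and exercise yes/no. Several things here are assumptions made for illustration: the data are hypothetical, the model is written in reference-cell form (intercept for ACE without exercise, plus one shift for DI and one for exercise) rather than the overall-mean form used on the slides, and the normal equations are solved with a small hand-written elimination routine instead of a statistics package.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_main_effects(rows):
    """Least squares fit of y = b0 + b_di * di + b_exer * exer.

    rows: (di, exer, y) tuples with 0/1 indicators.
    """
    design = [[1.0, float(di), float(exer)] for di, exer, _ in rows]
    y = [obs for _, _, obs in rows]
    p = 3
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * obs for r, obs in zip(design, y)) for j in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical, unbalanced data: (di, exer, change in SBP).
rows = [
    (0, 0, -16.0), (0, 0, -14.0), (0, 0, -15.0),
    (0, 1, -22.0), (0, 1, -24.0),
    (1, 0, -20.0), (1, 0, -18.0),
    (1, 1, -30.0),
]
b0, b_di, b_exer = fit_main_effects(rows)
fitted = {(di, ex): b0 + b_di * di + b_exer * ex for di in (0, 1) for ex in (0, 1)}
print(fitted)
```

Because the sample cell means in this toy data are not exactly additive, the fitted cell means differ from them, which is the point made above about not simply using sample means.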
32. Possible Hypotheses of Interest
- There is a semi-global null hypothesis corresponding to each predictor. For example:
  - $H_{0,treatment}$: $\alpha_1 = \alpha_2 = \alpha_3 = 0$
  - $H_{0,exercise}$: $\beta_1 = \beta_2 = 0$
- And the true Global Null:
  - $H_{0,global}$: $\alpha_1 = \alpha_2 = \alpha_3 = \beta_1 = \beta_2 = 0$
- Each can be assessed with an F test.
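The slides do not show how these F tests are built. One standard construction, given here as background and not as the lesson's own formula, compares the error sum of squares of a reduced model (with the hypothesized effects set to zero) against the full model:

```latex
\[
F \;=\;
\frac{\left(SSE_{reduced} - SSE_{full}\right) / \left(df_{reduced} - df_{full}\right)}
     {SSE_{full} / df_{full}}
\]
```

Here $df_{reduced}$ and $df_{full}$ denote the error degrees of freedom of the two models (notation introduced for this sketch). Under the null hypothesis being tested, $F$ follows an F distribution with $df_{reduced} - df_{full}$ and $df_{full}$ degrees of freedom.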
33. 2-way ANOVA, Example
- Example data:
  - 60 with ACE and no exercise
  - 50 with ACE and exercise
  - 43 with DI and no exercise
  - 37 with DI and exercise
34. Looking at the Data
[Figure: plot of the data by group: ACE, Exer; ACE, No Exer; DI, Exer; DI, No Exer]
35. Analysis of Variance Results (Main Effects Model)
36. Analysis of Variance Results (Including Interaction Terms)
37. ANOVA has lots of jargon.
- The categorical predictors are referred to as Treatments or Factors.
- The categories within each treatment are referred to as Levels.
- If some subjects are observed in every cross-classification of all factors, we have a factorial design.
- If there are the same number of subjects in each level of each factor, we have a balanced design.
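The last two definitions can be checked mechanically from a table of cell counts. The sketch below applies them, as worded above, to the cell counts from the 2-way example in this lesson. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Cell counts from the 2-way example: (medication, exercise) -> number of subjects.
cell_counts = {
    ("ACE", "no exercise"): 60,
    ("ACE", "exercise"): 50,
    ("DI", "no exercise"): 43,
    ("DI", "exercise"): 37,
}
factor_levels = [("ACE", "DI"), ("no exercise", "exercise")]


def is_factorial(counts, levels):
    """True if every combination of factor levels has at least one subject."""
    return all(counts.get(combo, 0) > 0 for combo in product(*levels))


def level_totals(counts, position):
    """Number of subjects in each level of the factor at the given position."""
    totals = defaultdict(int)
    for combo, n in counts.items():
        totals[combo[position]] += n
    return dict(totals)


def is_balanced(counts, n_factors):
    """True if, within each factor, every level has the same number of subjects."""
    return all(
        len(set(level_totals(counts, pos).values())) == 1 for pos in range(n_factors)
    )


print(is_factorial(cell_counts, factor_levels))
print(level_totals(cell_counts, 0), level_totals(cell_counts, 1))
print(is_balanced(cell_counts, 2))
```

Many texts use a stricter definition of balance (equal counts in every cell); the example data fail both versions, since the four cells hold 60, 50, 43, and 37 subjects.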
38. Extensions and Complications
- Repeated Measures ANOVA
- Incomplete Designs
- MANOVA
- Random Effects Models
- Adjustments for multiple comparisons and post-hoc comparisons