Title: Free Advice for Power-Hungry Researchers: Do Not Categorize Continuous Variables
1Free Advice for Power-Hungry Researchers: Do Not
Categorize Continuous Variables!
- Bruce Weaver
- Northern Health Research Conference
- Lakehead University, Thunder Bay
- May 29-30, 2009
2OR
How to Analyze Data Without Being Discrete
3The Objective
To demonstrate some of the undesirable
consequences of carving continuous variables into
categories prior to statistical analysis.
4GO SEE MY POSTER!
6Warnings from Statisticians
- Articles warn of the consequences of categorizing
continuous variables prior to statistical
analysis
- E.g., "Breaking Up is Hard to Do: The Heartbreak
of Dichotomizing Continuous Data" (Streiner DL.
Can J Psychiatry 2002;47:262-266, with apologies
to Neil Sedaka)
7Deaf Ears?
- Despite these warnings, researchers continue to
carve continuous variables into categories
- Why?
- Perhaps the warnings are too abstract; more
concrete examples may be needed
8A Concrete Example: Age & BMI
Carving both variables into categories suggests
analysis with the chi-square test of association.
BMI categories: Obese; Overweight; Ideal weight & underweight
9Contingency Table
10Bar Chart Within Age Group
The percentage of people in the ideal BMI group
decreases as age increases.
(Chart axes: percentage within age group, by age group)
11Bar Chart Within Age Group
The percentage of people in the Overweight group
increases as age increases.
(Chart axes: percentage within age group, by age group)
12Bar Chart Within Age Group
The percentage of people in the Obese group
increases as age increases at first, but then
remains fairly stable.
(Chart axes: percentage within age group, by age group)
13Chi-square Test Results
No evidence of an association between Age and BMI.
14Example 2: Treat Age as Categorical, but Treat
BMI as Continuous
Treating Age as categorical but BMI as continuous
suggests the use of one-way ANOVA.
p = .242
15Example 3: Treat Age as Continuous, but Treat
BMI as Categorical
Treating Age as continuous but BMI as categorical
suggests the use of multinomial (or ordinal)
logistic regression.
BMI categories: Obese; Overweight; Ideal weight & underweight
p = .131
16Example 4: Treat Both Variables as Continuous
Treating BOTH variables as continuous suggests
use of simple linear regression.
BMI = b0 + b1(Age) + error
p = .048
Statistically significant!
17ANOVA Summary Table from the Linear Regression
Model
When we treat both variables as continuous, the
association between them is statistically
significant.
Carving either or both variables into categories
prior to analysis results in loss of power.
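The loss of power can be illustrated with a small simulation. The sketch below is not the analysis from this talk: it assumes two standard normal variables with a true correlation of 0.25, a sample size of 100, and a median split of both variables (all hypothetical choices). The same large-sample test of correlation (n times r squared, referred to a chi-square distribution with 1 df) is applied to the raw data and to the median-split data; for a 2x2 table this statistic equals the ordinary Pearson chi-square without continuity correction.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(x, y):
    """Large-sample test of zero correlation: n * r^2 vs chi-square(1)."""
    stat = len(x) * pearson_r(x, y) ** 2
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail, chi-square with 1 df


def median_split(values):
    """Recode a continuous variable as 0/1 around its sample median."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0
    else:
        median = ordered[mid]
    return [1 if v > median else 0 for v in values]


def estimate_power(rho=0.25, n=100, reps=1000, alpha=0.05, seed=1):
    """Share of simulated samples in which each analysis gives p < alpha."""
    rng = random.Random(seed)
    hits_continuous = hits_split = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has correlation rho with x (hypothetical effect size)
        y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
        hits_continuous += p_value(x, y) < alpha
        hits_split += p_value(median_split(x), median_split(y)) < alpha
    return hits_continuous / reps, hits_split / reps


power_continuous, power_split = estimate_power()
print(f"continuous: {power_continuous:.2f}, median split: {power_split:.2f}")
```

Under these assumptions the continuous analysis should reject the null hypothesis noticeably more often than the median-split analysis. The exact proportions depend on the assumed effect size, sample size, and cut-points.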
18The Regression Coefficients
BMI = 25.445 + .454 × Age + error
- NOTE: Age was centered on 35, and then divided
by 10
- Fitted value of BMI for a 35-year-old = 25.445
(the constant)
- Fitted value increases by .454 for every 10-yr
increase in age
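As a worked illustration of how the centering and scaling of Age enter the fitted values, here is a minimal sketch using the two coefficients reported on this slide. The function name and the example ages are hypothetical.

```python
INTERCEPT = 25.445  # fitted BMI at age 35 (the constant reported on the slide)
SLOPE = 0.454       # change in fitted BMI per 10-year increase in age


def fitted_bmi(age_years):
    """Fitted BMI from the slide's model: Age centered on 35, divided by 10."""
    age_scaled = (age_years - 35) / 10
    return INTERCEPT + SLOPE * age_scaled


# Example ages chosen only for illustration
for age in (25, 35, 45, 55):
    print(age, round(fitted_bmi(age), 3))
```

Because Age is centered on 35, the constant is directly interpretable as the fitted BMI of a 35-year-old; dividing by 10 makes the slope a per-decade change.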
19The Moral of the Story
- Streiner (2002) argues that "The purpose of most
research is to discover relations: relations
between or among variables or between treatment
interventions and outcomes."
- If that is true, one ought to use the most
powerful test that is appropriate for the data
- Carving continuous variables into categories
decreases power, so don't do it!
20Some Common Objections to the Use of Continuous
Variables
21Ultimately, doctors have to categorize
folks, e.g., treat or not treat, diseased or not,
etc.
- True. But categorization of folks can be done
after statistical analysis; it does not have to be
done before analysis.
- As Streiner says, if the point is to look for
associations between variables, one ought to use
the most powerful test that is appropriate.
22Doctors are much more familiar with methods that
use categorical measures (e.g., proportions, odds
ratios & relative risks).
23Streiner's Rejoinder
- As for the argument that physicians are more
comfortable with statistics based on categorical
measures, we are likely dealing with both a base
canard that they, like old dogs, cannot learn new
tricks and a vicious circle.
- As long as the belief persists, studies will be
designed, analyzed, and reported using
proportions and ORs, meaning that physicians will
not have the opportunity to become more
comfortable with other approaches.
Streiner (2002, p. 263)
24GO SEE MY POSTER!
25So, in case you missed it earlier...
- The aim of most research is to look for
associations between variables
- One ought to use the most powerful test
appropriate for the data
- Categorizing continuous variables reduces power.
Therefore...
26Don't do it!
QUESTIONS?
27Contact Information
Go see my poster!
- Bruce Weaver
- Assistant Professor, NOSM
- MSW-2006 (West Campus)
- E-mail: bweaver_at_lakeheadu.ca
- Tel: 807-346-7704
28Extra Slides
29Another illustration: Age & cholesterol
Fit-line from simple linear regression
Age group boundaries
Group means from one-way ANOVA
30Two Problems with Age Groups
- People on either side of a cut-point can have
very different fitted values of Y despite tiny
differences in X
- People at opposite ends of a category have the
same fitted value despite sizeable differences on
X
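Both problems can be seen in a small sketch. It is an illustration under stated assumptions, not the cholesterol analysis from the previous slide: it borrows the BMI regression line reported earlier in the talk as the "continuous" model, uses hypothetical 10-year age bands, and lets each band's fitted value be the line's value at the band midpoint (a stand-in for the group mean from a one-way ANOVA).

```python
def line_fit(age):
    """Fitted value when age is kept continuous (BMI line from the talk)."""
    return 25.445 + 0.454 * (age - 35) / 10


def band_fit(age, width=10):
    """Fitted value when age is carved into bands (hypothetical 10-year bands).

    Everyone in a band gets the same value: the line evaluated at the
    band midpoint, used here as a stand-in for the group mean.
    """
    lower = (age // width) * width
    return line_fit(lower + width / 2)


# Problem 1: ages 49 and 50 straddle a cut-point
print("49 vs 50, continuous:", round(line_fit(50) - line_fit(49), 4))
print("49 vs 50, banded:    ", round(band_fit(50) - band_fit(49), 4))

# Problem 2: ages 40 and 49 sit at opposite ends of the same band
print("40 vs 49, continuous:", round(line_fit(49) - line_fit(40), 4))
print("40 vs 49, banded:    ", round(band_fit(49) - band_fit(40), 4))
```

With the banded model, a one-year difference across the cut-point is treated like a full decade, while a nine-year difference within a band is treated as no difference at all.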
31A Skeleton in My Closet
- I used to work for the McMaster research group
headed by Gord Guyatt & Deb Cook
- That is not the skeleton; there's more!
- One paper examined factors affecting CPR
directives upon entry into the ICU:
- Explicit decision: Resuscitate
- Explicit decision: Do not resuscitate (DNR)
- No explicit decision: Resuscitate (the reference
category for a multinomial logistic regression
analysis)
32A Skeleton in My Closet (2)
- One explanatory variable was age
- But it was not treated as a continuous variable
- Age was carved into 4 groups:
- Under 50 (reference category)
- 50-64
- 65-74
- 75+
D'oh!
33Odds Ratios for Age
Reference Category
- The odds of an explicit DNR directive (relative
to no explicit decision) are 3.4 times greater in
the 50-64 group than in the under 50 group
34Why I remember that particular case
- That was certainly not the only time I was
complicit in the categorization of continuous
variables
- Why do I remember that case specifically?
- Guyatt's 50th birthday was imminent, and he was
not thrilled about that one odds ratio
35What's the point?
- Apart from getting a cheap laugh, what is the
point of that story?
- It illustrates 2 consequences of carving
continuous variables into categories prior to
statistical analysis:
- Two people with nearly identical values for the
predictor variable (X) can have very different
fitted values of Y if they are in different
groups
- Everyone within the same group has the same
fitted value of Y, even if there is substantial
variation in X when it is treated as continuous
36Something People Often Overlook
- The chi-square test of association treats both
variables as if they are nominal
- I.e., the ordinal nature of the Age and BMI
groups was ignored
- We can demonstrate this by re-ordering the
categories (from the earlier example), and
running the test again
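The re-ordering demonstration can be sketched in a few lines. The counts below are hypothetical (the talk's contingency table is not reproduced in this text), and the Pearson chi-square statistic is computed by hand rather than with a statistics package, so that the example has no dependencies.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def reorder(table, row_order, col_order):
    """Return the table with its rows and columns rearranged."""
    return [[table[i][j] for j in col_order] for i in row_order]


# Hypothetical counts: rows = age groups, columns = BMI groups,
# both in natural (ascending) order
natural = [
    [30, 20, 10],
    [25, 25, 15],
    [20, 30, 20],
]
haphazard = reorder(natural, row_order=[2, 0, 1], col_order=[1, 2, 0])

chi_natural = pearson_chi_square(natural)
chi_haphazard = pearson_chi_square(haphazard)
print(round(chi_natural, 3), round(chi_haphazard, 3))
```

The two statistics agree (up to floating-point rounding) because each cell contributes the same term no matter where it sits in the table: the test uses no information about category order.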
37Two Contingency Tables
Categories in natural (ascending) order
Categories in haphazard order
38Results for the Original Table
No evidence of an association between Age and BMI.
39Results with Re-ordered Categories
In previous analysis, χ² = 3.084, p = .079
Same results as before!
The test of linear-by-linear association takes
into account the ordinal nature of the
categories, so is more powerful than the ordinary
chi-square test (when appropriate!).
40The Original Results Again!
The test of linear-by-linear association can be
used when the categories are ordinal, and is more
powerful than the ordinary chi-square test. But
in this case, the association between variables
is still not statistically significant.
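For readers who want to see how such a test uses the ordering, here is a minimal sketch of one common form of the linear-by-linear statistic, M² = (N - 1) × r², where r is the correlation between row and column scores and M² is referred to a chi-square distribution with 1 df. It assumes equally spaced integer scores (1, 2, 3, ...) and hypothetical counts, so it does not reproduce the results on the slides, and software may use different scores.

```python
import math


def linear_by_linear(table):
    """Return (M^2, p) with M^2 = (N - 1) * r^2 and scores 1, 2, 3, ..."""
    cells = [
        (i + 1, j + 1, count)
        for i, row in enumerate(table)
        for j, count in enumerate(row)
    ]
    n = sum(count for _, _, count in cells)
    mean_u = sum(u * c for u, _, c in cells) / n
    mean_v = sum(v * c for _, v, c in cells) / n
    s_uv = sum(c * (u - mean_u) * (v - mean_v) for u, v, c in cells)
    s_uu = sum(c * (u - mean_u) ** 2 for u, _, c in cells)
    s_vv = sum(c * (v - mean_v) ** 2 for _, v, c in cells)
    r = s_uv / math.sqrt(s_uu * s_vv)
    stat = (n - 1) * r ** 2
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail, chi-square with 1 df
    return stat, p


# Hypothetical counts with categories in natural (ascending) order
natural = [
    [30, 20, 10],
    [25, 25, 15],
    [20, 30, 20],
]
# Same rows in a haphazard order
haphazard = [natural[2], natural[0], natural[1]]

m2_natural, p_natural = linear_by_linear(natural)
m2_haphazard, p_haphazard = linear_by_linear(haphazard)
print(round(m2_natural, 3), round(p_natural, 3))
print(round(m2_haphazard, 3), round(p_haphazard, 3))
```

Unlike the ordinary chi-square statistic, M² changes when the categories are re-ordered, because the scores carry the ordinal information. That is the source of its extra power when the trend really is roughly linear.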
41More on Linear-by-Linear Association
- If anyone wants more information about the
chi-square test of linear-by-linear association,
see Dave Howell's website
- Do a Google search on:
- David Howell chi-square ordinal data