Free Advice for Power-Hungry Researchers: Do Not Categorize Continuous Variables

1
Free Advice for Power-Hungry Researchers: Do Not
Categorize Continuous Variables!
  • Bruce Weaver
  • Northern Health Research Conference
  • Lakehead University, Thunder Bay
  • May 29-30, 2009

2
OR
How to Analyze Data Without Being Discrete

3
The Objective
To demonstrate some of the undesirable
consequences of carving continuous variables into
categories prior to statistical analysis.
4
GO SEE MY POSTER!
5
The Objective
To demonstrate some of the undesirable
consequences of carving continuous variables into
categories prior to statistical analysis.
6
Warnings from Statisticians
  • Articles warn of the consequences of categorizing
    continuous variables prior to statistical
    analysis
  • E.g., "Breaking Up is Hard to Do: The Heartbreak
    of Dichotomizing Continuous Data" (Streiner DL.
    Can J Psychiatry 2002;47:262-266, with apologies
    to Neil Sedaka)

7
Deaf Ears?
  • Despite these warnings, researchers continue to
    carve continuous variables into categories
  • Why?
  • Perhaps the warnings are too abstract; more
    concrete examples may be needed

8
A Concrete Example: Age & BMI
Carving both variables into categories suggests
analysis with the chi-square test of association.
[Figure labels: Obese; Overweight; Ideal weight & underweight]
9
Contingency Table
10
Bar Chart: Within Age Group
The percentage of people in the ideal BMI group
decreases as age increases.
[Bar chart axis labels: Within Age Group; Age Group]
11
Bar Chart: Within Age Group
The percentage of people in the Overweight group
increases as age increases.
[Bar chart axis labels: Within Age Group; Age Group]
12
Bar Chart: Within Age Group
The percentage of people in the Obese group
increases as age increases at first, but then
remains fairly stable.
[Bar chart axis labels: Within Age Group; Age Group]
13
Chi-square Test Results
No evidence of an association between Age and BMI.
14
Example 2: Treat Age as Categorical, but Treat
BMI as Continuous
Treating Age as categorical but BMI as continuous
suggests the use of one-way ANOVA.
p = .242
15
Example 3: Treat Age as Continuous, but Treat
BMI as Categorical
Treating Age as continuous but BMI as categorical
suggests the use of multinomial (or ordinal)
logistic regression.
[Figure labels: Obese; Overweight; Ideal weight & underweight]
p = .131
16
Example 4: Treat Both Variables as Continuous
Treating BOTH variables as continuous suggests
use of simple linear regression.
Statistically significant!
BMI = b0 + b1(Age) + error
p = .048
17
ANOVA Summary Table from the Linear Regression
Model
When we treat both variables as continuous, the
association between them is statistically
significant.
Carving either or both variables into categories
prior to analysis results in loss of power.
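
The loss of power can be illustrated with a small simulation. What follows is a minimal sketch, not the analysis from the talk: the true correlation (0.2), the sample size (100), the use of median splits, and the number of replications are all assumptions chosen for illustration, and significance is judged with the Fisher z normal approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist, mean, median


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y, alpha=0.05):
    """Two-sided test of zero correlation (Fisher z approximation)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def median_split(values):
    """Carve a continuous variable into two groups at its median."""
    m = median(values)
    return [1.0 if v > m else 0.0 for v in values]


def estimate_power(rho=0.2, n=100, reps=1000, seed=1):
    """Share of simulated samples in which the association is detected."""
    rng = random.Random(seed)
    hits_continuous = hits_split = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
        hits_continuous += is_significant(x, y)
        hits_split += is_significant(median_split(x), median_split(y))
    return hits_continuous / reps, hits_split / reps


power_continuous, power_split = estimate_power()
print(f"continuous: {power_continuous:.2f}  median split: {power_split:.2f}")
```

Under these assumptions the same data detect the association less often once both variables have been dichotomized, which is the pattern the examples above show with real data.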
18
The Regression Coefficients
BMI = 25.445 + .454 x Age + error
  • NOTE: Age was centered on 35, and then divided
    by 10
  • Fitted value of BMI for a 35-year-old = 25.445
    (the constant)
  • Fitted value increases by .454 for every 10-year
    increase in age
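
The centering and scaling of Age can be made concrete with a short calculation. This is only a sketch: the coefficients are the ones reported on the slide, while the function name `fitted_bmi` and its parameter names are hypothetical.

```python
def fitted_bmi(age_years, intercept=25.445, slope_per_decade=0.454):
    """Fitted BMI from the slide's equation (age centered on 35, in decades)."""
    age_scaled = (age_years - 35) / 10
    return intercept + slope_per_decade * age_scaled


for age in (35, 45, 55):
    print(age, round(fitted_bmi(age), 3))
```

At age 35 the scaled age is 0, so the fitted value is just the constant; each additional decade adds .454.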

19
The Moral of the Story
  • Streiner (2002) argues that, "The purpose of most
    research is to discover relations, relations
    between or among variables or between treatment
    interventions and outcomes."
  • If that is true, one ought to use the most
    powerful test that is appropriate for the data
  • Carving continuous variables into categories
    decreases power, so don't do it!

20
Some Common Objections to the Use of Continuous
Variables
21
"Ultimately, doctors have to categorize
folks, e.g., treat or not treat, diseased or not,
etc."
  • True. But categorization of folks can be done
    after statistical analysis; it does not have to be
    done before analysis.
  • As Streiner says, if the point is to look for
    associations between variables, one ought to use
    the most powerful test that is appropriate.

22
Doctors are much more familiar with methods that
use categorical measures (e.g., proportions, odds
ratios & relative risks).
23
Streiner's Rejoinder
  • As for the argument that physicians are more
    comfortable with statistics based on categorical
    measures, we are likely dealing with both a base
    canard that they, like old dogs, cannot learn new
    tricks and a vicious circle.
  • As long as the belief persists, studies will be
    designed, analyzed, and reported using
    proportions and ORs, meaning that physicians will
    not have the opportunity to become more
    comfortable with other approaches.

Streiner (2002, p. 263)
24
GO SEE MY POSTER!
25
So in case you missed it earlier
  • The aim of most research is to look for
    associations between variables
  • One ought to use the most powerful test
    appropriate for the data
  • Categorizing continuous variables reduces power.
    Therefore

26
Don't do it!
QUESTIONS?
27
Contact Information
Go see my poster!
  • Bruce Weaver
  • Assistant Professor, NOSM
  • MSW-2006 (West Campus)
  • E-mail: bweaver_at_lakeheadu.ca
  • Tel: 807-346-7704

28
Extra Slides
29
Another Illustration: Age & Cholesterol
[Figure annotations: fit line from simple linear regression; age group boundaries; group means from one-way ANOVA]
30
Two Problems with Age Groups
  • People on either side of a cut-point can have
    very different fitted values of Y despite tiny
    differences in X
  • People at opposite ends of a category have the
    same fitted value despite sizeable differences on
    X
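
Both problems can be reproduced with a few lines of arithmetic. The sketch below uses made-up, noise-free data (one person per year of age, an outcome that rises linearly) and assumed cut-points at 40 and 60; none of these numbers come from the cholesterol example on the slide.

```python
from statistics import mean

# Hypothetical data: ages 20-79, outcome rising by 0.03 per year of age.
ages = list(range(20, 80))
outcome = [4.0 + 0.03 * a for a in ages]

CUT_POINTS = (40, 60)  # assumed group boundaries: 20-39, 40-59, 60-79


def age_group(age):
    """Index of the age group (0, 1 or 2) that an age falls into."""
    return sum(age >= cut for cut in CUT_POINTS)


# Fitted values from a one-way ANOVA are simply the group means.
group_means = {
    g: mean(y for a, y in zip(ages, outcome) if age_group(a) == g)
    for g in range(len(CUT_POINTS) + 1)
}


def fitted_grouped(age):
    """Fitted value when age is treated as categorical."""
    return group_means[age_group(age)]


def fitted_continuous(age):
    """Fitted value when age is treated as continuous."""
    return 4.0 + 0.03 * age


# Problem 1: ages 39 and 40 sit on either side of a cut-point.
print(fitted_grouped(40) - fitted_grouped(39),
      fitted_continuous(40) - fitted_continuous(39))

# Problem 2: ages 40 and 59 sit at opposite ends of one category.
print(fitted_grouped(59) - fitted_grouped(40),
      fitted_continuous(59) - fitted_continuous(40))
```

In the grouped model the one-year step across the cut-point produces a jump as large as twenty years of change on the fitted line, while the nineteen-year gap within a category produces no difference at all.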

31
A Skeleton in My Closet
  • I used to work for the McMaster research group
    headed by Gord Guyatt & Deb Cook
  • That is not the skeleton; there's more!
  • One paper examined factors affecting CPR
    directives upon entry into the ICU:
  • Explicit decision: Resuscitate
  • Explicit decision: Do not resuscitate (DNR)
  • No explicit decision: Resuscitate (reference
    category for the multinomial logistic
    regression analysis)
32
A Skeleton in My Closet (2)
  • One explanatory variable was age
  • But it was not treated as a continuous variable
  • Age was carved into 4 groups:
  • Under 50 (reference category)
  • 50-64
  • 65-74
  • 75+

Doh!
33
Odds Ratios for Age
Reference Category
  • The odds of an explicit DNR directive (relative
    to no explicit decision) are 3.4 times greater in
    the 50-64 group than in the under 50 group

34
Why I remember that particular case
  • That was certainly not the only time I was
    complicit in the categorization of continuous
    variables
  • Why do I remember that case specifically?
  • Guyatt's 50th birthday was imminent, and he was
    not thrilled about that one odds ratio

35
What's the point?
  • Apart from getting a cheap laugh, what is the
    point of that story?
  • It illustrates 2 consequences of carving
    continuous variables into categories prior to
    statistical analysis
  • Two people with nearly identical values for the
    predictor variable (X) can have very different
    fitted values of Y if they are in different
    groups
  • Everyone within the same group has the same
    fitted value of Y, even if there is substantial
    variation in X when it is treated as continuous

36
Something People Often Overlook
  • The chi-square test of association treats both
    variables as if they are nominal
  • I.e., the ordinal nature of the Age and BMI
    groups was ignored
  • We can demonstrate this by re-ordering the
    categories (from the earlier example), and
    running the test again
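
The re-ordering demonstration can be sketched with a hand-rolled Pearson chi-square statistic. The counts below are hypothetical (they are not the Age by BMI table from the talk), and `chi_square` is an illustrative helper, not a library function.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = age groups, columns = BMI groups, ascending order.
natural = [
    [20, 12, 8],
    [16, 14, 10],
    [12, 15, 13],
]

# The same counts with rows and columns shuffled into a haphazard order.
row_order, col_order = [2, 0, 1], [1, 2, 0]
haphazard = [[natural[i][j] for j in col_order] for i in row_order]

print(chi_square(natural), chi_square(haphazard))
```

Because each cell contributes the same term wherever it sits in the table, the statistic (and the degrees of freedom) cannot change when categories are re-ordered.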

37
Two Contingency Tables
Categories in natural (ascending) order
Categories in haphazard order
38
Results for the Original Table
No evidence of an association between Age and BMI.
39
Results with Re-ordered Categories
In previous analysis, χ² = 3.084, p = .079
Same results as before!
The test of linear-by-linear association takes
into account the ordinal nature of the
categories, so is more powerful than the ordinary
chi-square test (when appropriate!).
40
The Original Results Again!
The test of linear-by-linear association can be
used when the categories are ordinal, and is more
powerful than the ordinary chi-square test. But
in this case, the association between variables
is still not statistically significant.
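
A common form of the linear-by-linear statistic is M² = (N - 1) r², where r is the correlation between row and column scores weighted by the cell counts, referred to a chi-square distribution with 1 df. The sketch below assumes equally spaced integer scores (1, 2, 3, ...) and uses hypothetical counts; it is not a reproduction of the software output shown in the talk.

```python
import math


def p_value_1df(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(statistic / 2))


def linear_by_linear(table, row_scores=None, col_scores=None):
    """M^2 = (N - 1) * r^2, using equally spaced scores by default."""
    n_rows, n_cols = len(table), len(table[0])
    u = row_scores or list(range(1, n_rows + 1))
    v = col_scores or list(range(1, n_cols + 1))
    n = sum(sum(row) for row in table)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    mean_u = sum(u[i] * table[i][j] for i, j in cells) / n
    mean_v = sum(v[j] * table[i][j] for i, j in cells) / n
    s_uv = sum(table[i][j] * (u[i] - mean_u) * (v[j] - mean_v) for i, j in cells)
    s_uu = sum(table[i][j] * (u[i] - mean_u) ** 2 for i, j in cells)
    s_vv = sum(table[i][j] * (v[j] - mean_v) ** 2 for i, j in cells)
    r = s_uv / math.sqrt(s_uu * s_vv)
    m2 = (n - 1) * r * r
    return m2, p_value_1df(m2)


# Hypothetical counts in ascending order, then shuffled into a haphazard order.
natural = [[20, 12, 8], [16, 14, 10], [12, 15, 13]]
haphazard = [[natural[i][j] for j in (1, 2, 0)] for i in (2, 0, 1)]

m2_natural, p_natural = linear_by_linear(natural)
m2_haphazard, p_haphazard = linear_by_linear(haphazard)
print(m2_natural, p_natural)
print(m2_haphazard, p_haphazard)
```

Unlike the ordinary chi-square statistic, M² depends on the order of the categories, which is why it only makes sense when the categories are genuinely ordinal. As a check on the 1-df p-value helper, `p_value_1df(3.084)` rounds to .079, which agrees with the values quoted on the slide.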
41
More on Linear-by-Linear Association
  • If anyone wants more information about the
    chi-square test of linear-by-linear association,
    see Dave Howell's website
  • Do a Google search on:
  • "David Howell chi-square ordinal data"