LSP 121 - PowerPoint PPT Presentation

Provided by: sgu87
1
LSP 121
  • Statistics That Deceive

2
Simpson's Paradox
  • It is widely accepted that the larger the
    data set, the better the results
  • Simpson's Paradox demonstrates that great care
    must be taken when combining smaller
    data sets into a larger one
  • Sometimes the conclusion drawn from the larger
    data set is the opposite of the conclusions
    drawn from the smaller data sets
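
The reversal described above can be checked mechanically once each smaller data set is summarized as counts. This is a minimal sketch, assuming every group is a (successes, trials) pair; the names `rate`, `pooled_rate`, and `shows_simpsons_paradox` are hypothetical, and the check covers only one direction (A wins every group but loses after pooling).

```python
from fractions import Fraction

# Hypothetical representation: one (successes, trials) pair per smaller data set,
# e.g. (hits, at_bats) for one half of a season.
Group = tuple[int, int]


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate, kept as a Fraction so rounding cannot hide a tie."""
    return Fraction(successes, trials)


def pooled_rate(groups: list[Group]) -> Fraction:
    """Rate of the combined data set: total successes over total trials."""
    return rate(sum(s for s, _ in groups), sum(t for _, t in groups))


def shows_simpsons_paradox(a: list[Group], b: list[Group]) -> bool:
    """True if A has the higher rate in every paired group but the lower pooled rate."""
    if len(a) != len(b):
        raise ValueError("a and b must cover the same groups")
    wins_every_group = all(rate(*ga) > rate(*gb) for ga, gb in zip(a, b))
    return wins_every_group and pooled_rate(a) < pooled_rate(b)
```

With the at-bat counts from the baseball example, `shows_simpsons_paradox([(4, 10), (25, 100)], [(35, 100), (2, 10)])` returns `True`.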

3
Example: Simpson's Paradox
Baseball batting statistics for two players

             First Half   Second Half   Total Season
  Player A   .400         .250          .264
  Player B   .350         .200          .336

How could Player A beat Player B in both halves
individually, yet have a lower batting average
for the total season?
4
Example Continued
We weren't told how many at bats each player had:

             First Half      Second Half     Total Season
  Player A   4/10 (.400)     25/100 (.250)   29/110 (.264)
  Player B   35/100 (.350)   2/10 (.200)     37/110 (.336)

Player A's dismal second half and Player B's great
first half each cover 100 at bats, so they carry ten
times the weight of the other two halves (10 at bats
each) in the season totals.
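
The season averages can be reproduced from the hit and at-bat counts in the table: each season average is the two half-season averages weighted by each half's share of at bats. A minimal sketch; the variable and function names are illustrative assumptions.

```python
from fractions import Fraction

# (hits, at_bats) for each half of the season, from the table above.
player_a = {"first": (4, 10), "second": (25, 100)}
player_b = {"first": (35, 100), "second": (2, 10)}


def season_average(halves: dict[str, tuple[int, int]]) -> Fraction:
    """Weight each half's average by that half's share of the season's at bats."""
    total_at_bats = sum(ab for _, ab in halves.values())
    return sum(
        (Fraction(ab, total_at_bats) * Fraction(hits, ab) for hits, ab in halves.values()),
        Fraction(0),
    )


for name, halves in [("Player A", player_a), ("Player B", player_b)]:
    parts = ", ".join(f"{half} {hits / ab:.3f}" for half, (hits, ab) in halves.items())
    print(f"{name}: {parts}, season {float(season_average(halves)):.3f}")
```

Running it prints `Player A: first 0.400, second 0.250, season 0.264` and `Player B: first 0.350, second 0.200, season 0.336`; the weighting reduces to total hits over total at bats (29/110 and 37/110).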
5
Another Example
Average college physics grades for students in an
engineering program:

                        Took HS physics   No HS physics
  Number of Students    50                5
  Average Grade         80                70

Average college physics grades for students in a
liberal arts program:

                        Took HS physics   No HS physics
  Number of Students    5                 50
  Average Grade         95                85

It appears that in both programs, taking high school
physics improves your college physics grade by 10 points.
6
Example continued
To get better results, let's combine our datasets.
In particular, let's combine all the students who
took high school physics. More precisely, combine
the engineering students who took high school
physics with the liberal arts students who took
high school physics. Likewise, combine the
engineering students who did not take high school
physics with the liberal arts students who did not
take high school physics. But be careful! You can't
just take the average of the two averages, because
each dataset has a different number of values!
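
Written as a formula, the combined average weights each group's average by its size. Using the engineering (50 students, average 80) and liberal arts (5 students, average 95) figures for the students who took high school physics, the naive average of the two averages overstates the combined grade:

```latex
\bar{x}_{\text{combined}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2},
\qquad
\frac{50 \cdot 80 + 5 \cdot 95}{50 + 5} = \frac{4475}{55} \approx 81.4
\quad\text{versus}\quad
\frac{80 + 95}{2} = 87.5
```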
7
Example continued
Average college physics grades for students who
took high school physics

  Program       Students   Avg Grade   Weighted Grade
  Engineering   50         80          50/55 × 80 ≈ 72.73
  Lib Arts      5          95          5/55 × 95 ≈ 8.64
  Total         55                     Average ≈ 72.73 + 8.64 ≈ 81.4

Average college physics grades for students who
did not take high school physics

  Program       Students   Avg Grade   Weighted Grade
  Engineering   5          70          5/55 × 70 ≈ 6.36
  Lib Arts      50         85          50/55 × 85 ≈ 77.27
  Total         55                     Average ≈ 6.36 + 77.27 ≈ 83.6

Did the students who did not take high school
physics actually do better?
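
The two weighted averages can be recomputed exactly from the head counts and program averages in the earlier tables. A minimal sketch, using Python's `fractions.Fraction` so the intermediate weighted grades are not rounded; the variable and function names are illustrative.

```python
from fractions import Fraction

# (number_of_students, average_grade) per program, from the tables above.
took_hs_physics = {"Engineering": (50, 80), "Lib Arts": (5, 95)}
no_hs_physics = {"Engineering": (5, 70), "Lib Arts": (50, 85)}


def combined_average(groups: dict[str, tuple[int, int]]) -> Fraction:
    """Weight each program's average by its head count, then divide by the total."""
    total_students = sum(n for n, _ in groups.values())
    total_points = sum(n * avg for n, avg in groups.values())
    return Fraction(total_points, total_students)


took_avg = combined_average(took_hs_physics)    # 4475/55
skipped_avg = combined_average(no_hs_physics)   # 4600/55
print(f"took HS physics: {float(took_avg):.1f}")     # 81.4
print(f"no HS physics:   {float(skipped_avg):.1f}")  # 83.6
```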
8
The Problem
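  • Two problems with combining the data:
      • A larger percentage of one type of student
        ended up in each combined group: 50 of the 55
        students who took high school physics are
        engineering students, while 50 of the 55 who
        did not are liberal arts students
      • The engineering students had a more rigorous
        physics class than the liberal arts students,
        so the program is a hidden variable: engineering
        students score 15 points lower whether or not
        they took high school physics
  • So be very careful when you combine data into a
    larger set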
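
Holding the hidden variable fixed shows where the reversal comes from. The sketch below uses the same slide figures with hypothetical helper names: it compares the effect of high school physics within each program against the pooled comparison, and reports how much of each pooled group is made up of engineering students.

```python
# (number_of_students, average_grade) from the slides, split by the hidden variable: program.
grades = {
    "Engineering": {"took": (50, 80), "skipped": (5, 70)},
    "Lib Arts": {"took": (5, 95), "skipped": (50, 85)},
}


def gain_within(program: str) -> int:
    """Grade difference from HS physics with the program held fixed."""
    return grades[program]["took"][1] - grades[program]["skipped"][1]


def pooled_average(group: str) -> float:
    """Head-count-weighted average of one group across both programs."""
    pairs = [by_group[group] for by_group in grades.values()]
    total = sum(n for n, _ in pairs)
    return sum(n * avg for n, avg in pairs) / total


def engineering_share(group: str) -> float:
    """Fraction of a pooled group made up of engineering students."""
    total = sum(by_group[group][0] for by_group in grades.values())
    return grades["Engineering"][group][0] / total


for program in grades:
    print(f"{program}: HS physics changes the grade by {gain_within(program):+d} points")
pooled_gain = pooled_average("took") - pooled_average("skipped")
print(f"Pooled: HS physics changes the grade by {pooled_gain:+.1f} points")
print(f"Engineering share: {engineering_share('took'):.0%} of 'took', "
      f"{engineering_share('skipped'):.0%} of 'skipped'")
```

It reports a +10 point gain within each program but a -2.3 point difference after pooling, with engineering students making up 91% of one pooled group and 9% of the other.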