# SIMPSON - PowerPoint PPT Presentation


## SIMPSON


### SIMPSON'S PARADOX. Any statistical relationship between two variables may be ... Overall data showed a higher rate of admission among male applicants, but, ... – PowerPoint PPT presentation

Provided by: kaorumu
Transcript and Presenter's Notes


1
SIMPSON'S PARADOX
(Pearson et al. 1899; Yule 1903; Simpson 1951)
• Any statistical relationship between two
variables may be reversed by including additional
factors in the analysis.
• Which factors should be included in the analysis?

2
EXAMPLES OF SIMPSON'S REVERSAL
• e.g., UC Berkeley's alleged sex bias in graduate
admission (Science - 1975). Overall data showed a
higher rate of admission among male applicants,
but, broken down by departments, data showed a
slight bias in favor of admitting female
applicants.
• e.g., "reverse regression" (1970-80): Should
one, in salary discrimination cases, compare
salaries of equally qualified men and women, or,
instead, compare qualifications of equally paid
men and women? (Opposite conclusions.)
• Practical dilemma: Why break down by department?
• How about by some other variable Z?
• Find Z such that $P(y \mid do(x)) = \sum_z P(y \mid x, z)\,P(z)$
• Solution: The back-door algorithm (Chapter 3).
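
The adjustment formula in the last bullets can be evaluated directly once a suitable Z is chosen. The sketch below is illustrative only: the counts are invented, the variable names (`COUNTS`, `adjusted`, `unadjusted`) are hypothetical, and it simply assumes that Z is a valid adjustment set rather than checking this with the back-door algorithm.

```python
from fractions import Fraction

# Invented counts for illustration: (z, x) -> (number with y = 1, total).
COUNTS = {
    ("z0", "x1"): (7, 10),
    ("z0", "x0"): (18, 30),
    ("z1", "x1"): (9, 30),
    ("z1", "x0"): (2, 10),
}


def adjusted(counts, x):
    """Return sum_z P(y=1 | x, z) * P(z), assuming Z is a valid adjustment set."""
    n_total = sum(total for _, total in counts.values())
    result = Fraction(0)
    for z in sorted({z for z, _ in counts}):
        # P(z): share of all units that fall in stratum z.
        n_z = sum(total for (zz, _), (_, total) in counts.items() if zz == z)
        y1, total = counts[(z, x)]
        result += Fraction(y1, total) * Fraction(n_z, n_total)
    return result


def unadjusted(counts, x):
    """Return the pooled P(y=1 | x), ignoring z."""
    y1 = sum(v[0] for (_, xx), v in counts.items() if xx == x)
    total = sum(v[1] for (_, xx), v in counts.items() if xx == x)
    return Fraction(y1, total)


for x in ("x1", "x0"):
    print(x, "adjusted:", adjusted(COUNTS, x), "unadjusted:", unadjusted(COUNTS, x))
```

With these made-up counts the adjusted and unadjusted comparisons point in opposite directions, which is why the choice of Z matters.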

4
PEARSON'S SHOCK: "SPURIOUS CORRELATION"
"We are thus forced to the conclusion that a
mixture of heterogeneous groups, each of which
exhibits in itself no organic correlation, will
exhibit a greater or less amount of correlation.
This correlation may properly be called spurious,
yet as it is almost impossible to guarantee the
absolute homogeneity of any community, our
results for correlation are always liable to an
error, the amount of which cannot be foretold.
To those who persist in looking upon all
correlation as cause and effect, the fact that
correlation can be produced between two quite
uncorrelated characters A and B by taking an
artificial mixture of two closely allied races,
must come as rather a shock."
Pearson, Lee & Bramley-Moore (1899)
1. Causation = perfect correlation
2. Not all correlations are correlations (Aldrich 1994)
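
The mixture effect Pearson describes can be reproduced with a tiny example. The sketch below uses invented points, not data from the 1899 paper: each group has zero correlation between characters A and B, yet pooling the two groups produces a strong correlation because the groups differ in their means.

```python
from math import sqrt


def pearson(points):
    """Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in points)
    var_a = sum((a - mean_a) ** 2 for a, _ in points)
    var_b = sum((b - mean_b) ** 2 for _, b in points)
    return cov / sqrt(var_a * var_b)


# Invented data: A and B are uncorrelated within each group.
group_1 = [(0, 0), (0, 1), (1, 0), (1, 1)]
# The second group is the first shifted by 5 in both characters.
group_2 = [(a + 5, b + 5) for a, b in group_1]

print("group 1:", pearson(group_1))
print("group 2:", pearson(group_2))
print("mixture:", pearson(group_1 + group_2))
```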
5

T: treated; ¬T: not treated; R: recovered; ¬R
• Easy question (1950-1994):
• When / why the reversal?
• Harder questions (1994):
• Is the treatment useful? Which table to consult?
• Why is Simpson's reversal a paradox?

6
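SIMPSON'S REVERSAL
Group behavior
$\Pr(\text{recovery} \mid \text{drug}, \text{male}) > \Pr(\text{recovery} \mid \text{no-drug}, \text{male})$
$\Pr(\text{recovery} \mid \text{drug}, \text{female}) > \Pr(\text{recovery} \mid \text{no-drug}, \text{female})$
Overall behavior
$\Pr(\text{recovery} \mid \text{drug}) < \Pr(\text{recovery} \mid \text{no-drug})$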
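
A small numeric check shows that these three inequalities can hold at once. The counts below are invented for illustration (they are not from the slides), and the names `TABLE`, `rate`, and `pooled` are hypothetical. The reversal arises here because most males are in the no-drug arm and most females are in the drug arm.

```python
from fractions import Fraction

# Invented (recovered, total) counts, for illustration only.
TABLE = {
    "male": {"drug": (7, 10), "no-drug": (18, 30)},
    "female": {"drug": (9, 30), "no-drug": (2, 10)},
}


def rate(recovered, total):
    """Recovery rate as an exact fraction."""
    return Fraction(recovered, total)


def pooled(table, arm):
    """Recovery rate for one arm after merging all groups."""
    recovered = sum(group[arm][0] for group in table.values())
    total = sum(group[arm][1] for group in table.values())
    return rate(recovered, total)


# Within each group, does the drug arm recover at a higher rate?
group_favors_drug = {
    name: rate(*arms["drug"]) > rate(*arms["no-drug"])
    for name, arms in TABLE.items()
}
# In the merged table, does the drug arm recover at a higher rate?
overall_favors_drug = pooled(TABLE, "drug") > pooled(TABLE, "no-drug")

print(group_favors_drug)
print(overall_favors_drug)
```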
7
[Figure: two causal diagrams over Treatment (X), Recovery (Y), and a third variable Z. In one diagram Z is Gender; in the other Z is a mediating factor.]
8
THE INEVITABLE CONCLUSION: THE PARADOX STEMS
FROM CAUSAL INTERPRETATION
TWO PROOFS
• Surprise surfaces only when we speak about
efficacy, not ...
• When two causal models generate the same
statistical data, and in one we decide to use the
drug yet in the other not to use it, our decision
must be driven by causal and not by statistical
considerations.
• Thus, there is no statistical criterion to warn
us against consulting the wrong table.
• Can temporal information help?
• No! See Figure 6.3 (c).

9
WHY TEMPORAL INFORMATION DOES NOT HELP
[Figure: causal diagrams (a)-(d) relating Treatment (C), Recovery (E), and a third factor F; the panel labels for F include Gender and Blood Pressure.]
• In (c), F may occur before or after C, and the
correct answer is to consult the combined table.
• In (d), F may occur before or after C, and the
correct answer is to consult the F-specific tables.

10
1. People think causes, not proportions.
2. "Reversal" is possible in the calculus of
proportions but impossible in the calculus
of causes.
11
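CAUSAL CALCULUS PROHIBITS REVERSAL
Group behavior
$\Pr(\text{recovery} \mid do(\text{drug}), \text{male}) > \Pr(\text{recovery} \mid do(\text{no-drug}), \text{male})$
$\Pr(\text{recovery} \mid do(\text{drug}), \text{female}) > \Pr(\text{recovery} \mid do(\text{no-drug}), \text{female})$
Assumption
$\Pr(\text{male} \mid do(\text{drug})) = \Pr(\text{male} \mid do(\text{no-drug}))$
Overall behavior
$\Pr(\text{recovery} \mid do(\text{drug})) > \Pr(\text{recovery} \mid do(\text{no-drug}))$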
12
THE SURE THING PRINCIPLE
Theorem 6.1.1: An action C that increases the
probability of an event E in each subpopulation
must also increase the probability of E in the
population as a whole, provided that the action
does not change the distribution of the
subpopulations.
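
One way to see why the theorem holds is to expand the population-level effect over subpopulations. The derivation below is a sketch rather than the proof given in the source; the index $f$ over subpopulations and the notation $\neg C$ for withholding the action are assumptions introduced here.

```latex
\begin{align*}
P(E \mid do(C))
  &= \sum_f P(E \mid do(C), f)\, P(f \mid do(C)) \\
  &= \sum_f P(E \mid do(C), f)\, P(f \mid do(\neg C))
     && \text{(the action does not change the distribution of } f\text{)} \\
  &> \sum_f P(E \mid do(\neg C), f)\, P(f \mid do(\neg C))
     && \text{(C raises the probability of E in every subpopulation)} \\
  &= P(E \mid do(\neg C)).
\end{align*}
```

The inequality is strict because the weights $P(f \mid do(\neg C))$ are non-negative and sum to one, so at least one strictly larger term carries positive weight.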