
Introduction to Statistics Political Science

(Class 9)

- Review

Probability of having cardiovascular disease

- Purpose of statistics
- Inferences about populations using samples
- We draw a random sample of 1,000 adults and 405 have some form of CVD
- Based on our sample, if we randomly select one adult from the population, what is the probability that they have cardiovascular disease?
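As a minimal sketch of the question above, the sample proportion serves as the estimate of the population probability. The variable names are illustrative only.

```python
# Sample counts taken from the slide above.
n_sample = 1000   # adults in the random sample
n_cvd = 405       # adults in the sample with some form of CVD

# The sample proportion is our estimate of the population probability.
p_cvd = n_cvd / n_sample
print(f"Estimated P(CVD) = {p_cvd:.3f}")
```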

Conditional Probability

                                           No CVD (%)   CVD (%)
Exercise less than 3 days/week (N = 602)   30.3         28.9
Exercise 3 or more days/week (N = 398)     30.2         10.6

- Probability of exercising < 3 days/week?
- Probability of CVD among those who exercise < 3 days/week?
- Probability of CVD among those who exercise 3 or more days/week?
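One way to work through these three questions is sketched below, assuming the table entries are percentages of the whole sample. The dictionary keys and function names are hypothetical. The row total from the cells (59.2) differs slightly from the stated group size (602 of 1,000); this sketch uses the cell percentages.

```python
# Cell percentages (share of the whole sample) from the table above.
cells = {
    ("lt3", "no_cvd"): 30.3,
    ("lt3", "cvd"): 28.9,
    ("ge3", "no_cvd"): 30.2,
    ("ge3", "cvd"): 10.6,
}

def row_total(group):
    """Percentage of the sample in an exercise group (marginal)."""
    return cells[(group, "no_cvd")] + cells[(group, "cvd")]

def p_cvd_given(group):
    """Conditional probability of CVD within an exercise group."""
    return cells[(group, "cvd")] / row_total(group)

p_lt3 = row_total("lt3") / 100     # P(exercise < 3 days/week)
p_cvd_lt3 = p_cvd_given("lt3")     # P(CVD | exercise < 3 days/week)
p_cvd_ge3 = p_cvd_given("ge3")     # P(CVD | exercise 3+ days/week)

print(f"P(<3 days)       = {p_lt3:.3f}")
print(f"P(CVD | <3 days) = {p_cvd_lt3:.3f}")
print(f"P(CVD | 3+ days) = {p_cvd_ge3:.3f}")
```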

Association between exercise and CVD?

                                           No CVD (%)   CVD (%)
Exercise less than 3 days/week (N = 602)   30.3         28.9
Exercise 3 or more days/week (N = 398)     30.2         10.6

- p1 = 28.9/(30.3 + 28.9) = 0.488
- p2 = 10.6/(30.2 + 10.6) = 0.260
- Difference = 0.488 - 0.260 = .228
- Those who exercise less than 3 days/week are .228 (22.8 percentage points) more likely to have CVD

Specifying and testing hypotheses

- Difference of proportions = .228
- What's our null hypothesis?
- Why a null hypothesis? Why not test whether the difference is .228?
- Central limit theorem
- In repeated sampling, the distribution of our estimates of the mean (or difference of means or slope) will be normally distributed and centered over the true population value
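The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch: the "true" proportion of 0.405, the number of repetitions, and the seed are assumptions chosen for illustration, not values from the lecture.

```python
import random
import statistics

TRUE_P = 0.405   # assumed "true" population proportion (illustrative)
N = 1000         # sample size, as in the CVD example
REPS = 2000      # number of repeated samples (hypothetical choice)

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_proportion(p, n):
    """Draw one random sample of size n and return the share with the trait."""
    return sum(rng.random() < p for _ in range(n)) / n

# Repeat the sampling many times and collect the estimates.
estimates = [sample_proportion(TRUE_P, N) for _ in range(REPS)]

mean_est = statistics.mean(estimates)          # center of the estimates
sd_est = statistics.stdev(estimates)           # spread of the estimates
theory_se = (TRUE_P * (1 - TRUE_P) / N) ** 0.5 # standard error from the formula

print(f"Mean of estimates: {mean_est:.4f} (true value {TRUE_P})")
print(f"SD of estimates:   {sd_est:.4f} (formula SE {theory_se:.4f})")
```

The estimates should cluster around the true value, and their spread should be close to the standard error given by the formula.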

Central limit theorem

[Figure: sampling distribution centered at 0, the proposed true value, with 1 standard error marked]

Comparing proportions

- Difference of proportions = .228
- p1 = 28.9/(30.3 + 28.9) = 0.488 (N = 602)
- p2 = 10.6/(30.2 + 10.6) = 0.260 (N = 398)
- Standard error of this difference

Comparing proportions

- So, standard error of difference is the square root of (.488(1-.488)/602) + (.260(1-.260)/398)
- Which is .0299
- Difference of proportions = .228
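The standard error calculation above can be sketched as follows. The function name is hypothetical. With the rounded proportions .488 and .260, the result agrees with the slide's .0299 up to rounding.

```python
import math

# Proportions and group sizes from the slides.
p1, n1 = 0.488, 602   # exercise less than 3 days/week
p2, n2 = 0.260, 398   # exercise 3 or more days/week

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se = se_diff_proportions(p1, n1, p2, n2)
print(f"SE of difference = {se:.4f}")
```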

Hypotheses

- Null hypothesis
- There is no difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week
- Alternative hypothesis
- There is a difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week
- (i.e., the difference is not 0)

If 0 were the true difference, it would be very unlikely that we would find a difference 7.63 (.228/.0299) standard errors from that value by chance

[Figure: sampling distribution centered at 0, the proposed true value, with 1 standard error marked]
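A rough sketch of the test itself, using the proportions .488 and .260 from the earlier slides and the standard error of .0299. The two-sided p-value uses the normal approximation; the variable names are illustrative.

```python
import math

diff = 0.488 - 0.260   # observed difference of proportions
se = 0.0299            # standard error of the difference (from the slide)
null_value = 0.0       # difference proposed by the null hypothesis

# How many standard errors the observed difference lies from the null value.
z = (diff - null_value) / se

# Two-sided p-value under a standard normal: P(|Z| >= |z|).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}")
print(f"two-sided p-value = {p_value:.2e}")
```

A difference this many standard errors from 0 would almost never arise by chance if the null hypothesis were true, so we reject the null.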

Does exercise cause lower CVD?

- Reverse causation? Might CVD cause exercise?
- Failure to account for confounds
- Typically leads to over-estimating the strength of a relationship (not always but usually)


Specification and Interpretation

- Multivariate Regression

Does exercise make CVD less likely?

- Regression (predict CVD)
- Estimated likelihood of CVD if exercise 4 days/week?
- What might confound our estimate of the relationship between exercise and CVD?

                       Coef.   SE     T    P-value
Days Exercise (0-7)    -0.06   .001   ?    0.000
Constant                0.56   .002   ?    0.000
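The predicted-likelihood question can be answered by plugging a value into the estimated equation. A minimal sketch using the coefficients from the table above (the function name is illustrative):

```python
# Coefficients from the bivariate regression table above.
constant = 0.56
b_exercise = -0.06   # change in predicted CVD per additional day of exercise

def predicted_cvd(days_exercise):
    """Predicted likelihood of CVD for a given number of exercise days (0-7)."""
    return constant + b_exercise * days_exercise

pred4 = predicted_cvd(4)
print(f"Predicted likelihood of CVD at 4 days/week: {pred4:.2f}")
```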

Controlling for confounds

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.04   .002    2.0   0.048
Constant                 0.42   .002   21.0   0.000

[Figure: Chance of CVD by Days per Week Exercise]

Controlling for dichotomous confounds

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.04   .002    2.0   0.048
Smoker (1 = yes)         0.11   .001   11.0   0.000
Constant                 0.38   .002   19.0   0.000

- Predicted probability of CVD for
- 2 days exercise, 2 days Fast food, smoker
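The predicted probability for this profile is the constant plus each coefficient times its value. A sketch using the table's coefficients (the dictionary keys and function name are hypothetical):

```python
# Coefficients from the regression with the smoker indicator.
constant = 0.38
coefs = {
    "days_exercise": -0.03,
    "days_fast_food": 0.04,
    "smoker": 0.11,          # 1 = yes, 0 = no
}

def predicted_cvd(profile):
    """Constant plus the sum of coefficient * value for each variable."""
    return constant + sum(coefs[name] * value for name, value in profile.items())

profile = {"days_exercise": 2, "days_fast_food": 2, "smoker": 1}
pred = predicted_cvd(profile)
print(f"Predicted probability of CVD: {pred:.2f}")
```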

Nominal Variables

- Variable that does not have an order to it
- Nothing is higher or lower
- Create set of dichotomous variables
- Always interpret coefficients with respect to the reference category
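As an illustration of turning a nominal variable into a set of dichotomous variables, the sketch below uses region with Midwest as the reference (excluded) category. The function name and category list are assumptions for the example.

```python
# A nominal variable: the categories have no natural order.
REGIONS = ["Midwest", "South", "West", "Northeast"]
REFERENCE = "Midwest"   # excluded category; it gets no indicator of its own

def region_indicators(region):
    """Return one 0/1 indicator for each non-reference category."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {r: int(region == r) for r in REGIONS if r != REFERENCE}

print(region_indicators("South"))     # South indicator is 1, others 0
print(region_indicators("Midwest"))   # all indicators are 0
```

Someone in the reference category has every indicator equal to 0, which is why each coefficient is read as a difference from that category.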


Controlling for nominal confounds

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.03   .002    1.5   0.135
Smoker (1 = yes)         0.09   .001    9.0   0.000
South (1 = yes)          0.03   .002    1.5   0.137
West (1 = yes)          -0.01   .002   -0.5   0.642
Northeast (1 = yes)      0.02   .002    1.0   0.410
Constant                 0.34   .002   17.0   0.000

(Midwest is excluded category)

What if we wanted to test whether including region indicators improves the fit of the model?

Non-linear relationships

Logarithms

Why use a logarithmic transformation? You think the relationship looks like this

Logarithms

Squared term: U (or n)-shaped relationship

Age and political ideology (-2 = very conservative, 2 = very liberal)

              Coef.    SE      T        P
Age           -0.007   0.004   -1.740   0.082
Constant       0.122   0.209    0.580   0.561

              Coef.    SE      T        P
Age           -0.065   0.025   -2.630   0.009
Age-squared    0.001   0.000    2.390   0.017
Constant       1.554   0.635    2.450   0.015

Age and Political Ideology

              Coef.    SE      T        P
Age           -0.065   0.025   -2.630   0.009
Age-squared    0.001   0.000    2.390   0.017
Constant       1.554   0.635    2.450   0.015

Age   Age^2   -0.065 × Age   .0005574 × Age^2   Constant   Predicted Value
18     324    -1.178         0.181              1.554       0.557
28     784    -1.832         0.437              1.554       0.159
38    1444    -2.487         0.805              1.554      -0.128
48    2304    -3.141         1.284              1.554      -0.303
58    3364    -3.795         1.875              1.554      -0.366
68    4624    -4.450         2.577              1.554      -0.319
78    6084    -5.104         3.391              1.554      -0.159
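The predicted values and the bottom of the U can be sketched as below. This uses the rounded Age coefficient (-0.065) and the Age-squared coefficient shown in the column header (.0005574). The table appears to be built from unrounded coefficients, so the predictions here differ from it slightly.

```python
# Coefficients from the quadratic model (Age coefficient is rounded).
constant = 1.554
b_age = -0.065
b_age_sq = 0.0005574

def predicted_ideology(age):
    """Predicted ideology (-2 = very conservative, 2 = very liberal)."""
    return constant + b_age * age + b_age_sq * age ** 2

# For a quadratic a + b*x + c*x^2, the curve turns at x = -b / (2c).
turning_point = -b_age / (2 * b_age_sq)

for age in range(18, 79, 10):
    print(f"Age {age}: predicted ideology {predicted_ideology(age):.3f}")
print(f"Predicted ideology is lowest at about age {turning_point:.1f}")
```

Because the Age-squared coefficient is positive, the curve is U-shaped: predicted ideology falls with age up to the turning point and rises after it.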


Create indicators from an ordered variable

Party Identification (-3 to 3)

Seven variables:
- Strong Republican (1 = yes)
- Weak Republican (1 = yes)
- Lean Republican (1 = yes)
- Pure Independent (1 = yes)
- Lean Democrat (1 = yes)
- Weak Democrat (1 = yes)
- Strong Democrat (1 = yes)

Predict Obama Favorability (1-4)

                    Coef.    SE      T         P
Strong Republican   -1.632   0.161   -10.160   0.000
Weak Republican     -0.707   0.198    -3.580   0.000
Lean Republican     -1.235   0.181    -6.810   0.000
Lean Democrat        0.674   0.197     3.430   0.001
Weak Democrat        0.494   0.187     2.640   0.009
Strong Democrat      0.595   0.159     3.750   0.000
Constant             2.940   0.134    21.870   0.000

Excluded category: Pure Independents

Obama Favorability

Predict Obama Favorability (1-4)

                    Coef.    SE      T         P
Strong Republican   -0.397   0.150    -2.650   0.008
Weak Republican      0.528   0.189     2.790   0.006
Pure Independent     1.235   0.181     6.810   0.000
Lean Democrat        1.909   0.188    10.150   0.000
Weak Democrat        1.729   0.179     9.680   0.000
Strong Democrat      1.831   0.148    12.360   0.000
Constant             1.705   0.122    14.010   0.000

New excluded category: Leaning Republicans
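Changing the excluded category does not change the model's predictions, only what each coefficient is compared against. The sketch below starts from the first table (Pure Independents excluded) and re-expresses it with Leaning Republicans excluded. Variable names are illustrative, and results match the second table up to rounding.

```python
# First model: Pure Independents are the excluded category.
old_constant = 2.940
old_coefs = {
    "Strong Republican": -1.632,
    "Weak Republican": -0.707,
    "Lean Republican": -1.235,
    "Lean Democrat": 0.674,
    "Weak Democrat": 0.494,
    "Strong Democrat": 0.595,
}
OLD_REFERENCE = "Pure Independent"
NEW_REFERENCE = "Lean Republican"

# The new constant is the predicted value for the new reference group.
shift = old_coefs[NEW_REFERENCE]
new_constant = round(old_constant + shift, 3)

# Every other group is now measured relative to the new reference group.
new_coefs = {
    name: round(coef - shift, 3)
    for name, coef in old_coefs.items()
    if name != NEW_REFERENCE
}
# The old reference group had an implicit coefficient of 0.
new_coefs[OLD_REFERENCE] = round(0.0 - shift, 3)

print(f"Constant: {new_constant}")
for name, coef in new_coefs.items():
    print(f"{name}: {coef}")
```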

Interactions

- One variable moderates the effect of another, i.e., the relationship between one variable and an outcome depends on the value of another variable

Regression estimates an equation

                                                  Coef.    SE      T        P
Party Affiliation (-3 = strong R, 3 = strong D)    1.286   0.878    1.460   0.143
Voted in 2008                                     -1.138   1.484   -0.770   0.443
Party Affiliation x Voted in 2008                  3.575   0.918    3.900   0.000
Constant                                          61.100   1.358   44.980   0.000

61.100 + 1.286 × Party - 1.138 × Voted + 3.575 × Party × Voted + u

61.100 + Party × 1.286 + Party × Voted × 3.575 - 1.138 × Voted + u

OR

61.100 + Party × 1.286 + Voted × Party × 3.575 - Voted × 1.138 + u

Party Aff.   Voted   1.286 × Party Aff.   -1.138 × Voted   3.575 × Party x Voted   Constant   Predicted Value
-3           0       -3.858               0                0                       61.100     57.242
-2           0       -2.572               0                0                       61.100     58.528
-1           0       -1.286               0                0                       61.100     59.814
 0           0        0.000               0                0                       61.100     61.100
 1           0        1.286               0                0                       61.100     62.386
 2           0        2.572               0                0                       61.100     63.672
 3           0        3.858               0                0                       61.100     64.959

Party Aff.   Voted   1.286 × Party Aff.   -1.138 × Voted   3.575 × Party x Voted   Constant   Predicted Value
-3           1       -3.858               -1.13775         -10.7258                61.100     45.378
-2           1       -2.572               -1.13775          -7.1505                61.100     50.240
-1           1       -1.286               -1.13775          -3.57525               61.100     55.101
 0           1        0.000               -1.13775           0                     61.100     59.962
 1           1        1.286               -1.13775           3.575252              61.100     64.824
 2           1        2.572               -1.13775           7.150504              61.100     69.685
 3           1        3.858               -1.13775          10.72576               61.100     74.547
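The two tables above can be reproduced from the estimated equation. The sketch below uses the rounded coefficients from the regression table, so the results match the tables only up to rounding. The function names are illustrative.

```python
# Coefficients from the interaction model.
CONSTANT = 61.100
B_PARTY = 1.286         # Party Affiliation (-3 = strong R, 3 = strong D)
B_VOTED = -1.138        # Voted in 2008 (1 = yes)
B_INTERACTION = 3.575   # Party Affiliation x Voted in 2008

def predicted(party, voted):
    """Predicted outcome for a given party affiliation and voting status."""
    return (CONSTANT + B_PARTY * party + B_VOTED * voted
            + B_INTERACTION * party * voted)

def party_slope(voted):
    """Effect of a one-unit change in party affiliation, given voting status."""
    return B_PARTY + B_INTERACTION * voted

for voted in (0, 1):
    print(f"Voted = {voted}: slope on party = {party_slope(voted):.3f}")
    for party in range(-3, 4):
        print(f"  party = {party:2d}: predicted = {predicted(party, voted):.3f}")
```

The slope on party affiliation is 1.286 among non-voters and 1.286 + 3.575 = 4.861 among voters, which is what it means for voting to moderate the relationship.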


Establishing causality

Dealing with confounds

- Theory + multivariate regression
- Experiments

Dealing with reverse causation

- Theory
- Experiments

Experiments

- What is the key characteristic of an experiment?
- How does this address reverse causality?
- How does it address confounds?
- Weaknesses/limitations of experiments?

Exam Expectations

- Describe probabilities / conditional probabilities
- Write hypotheses
- Demonstrate understanding of how null hypotheses relate to the central limit theorem
- Test difference of proportions (formula for SE will be provided)
- Interpreting multivariate regression
- Relationships (slopes)
- Predicted values
- Sketch graphs of relationships
- Discuss strengths and limitations of analyses
- Why an estimated slope might be biased
- Benefits and limitations of experiments

Notes

- Homework 3 graded
- Homework 4 due Thursday 12/9
- Office hours next week: email to come
- Exam December 14 at 2pm