Introduction to Statistics: Political Science (Class 9) - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Statistics: Political Science (Class 9)

Description:

Title: Introduction to Statistics: Political Science (Class 7) Author: David Doherty Last modified by: David Doherty Created Date: 10/14/2010 12:28:54 PM – PowerPoint PPT presentation

Date added: 9 May 2020
Slides: 41
Provided by: DavidDo155
Learn more at: http://orion.luc.edu
Transcript and Presenter's Notes



1
Introduction to Statistics: Political Science
(Class 9)
  • Review

2
Probability of having cardiovascular disease
  • Purpose of statistics
  • Inferences about populations using samples
  • We draw a random sample of 1,000 adults and 405
    have some form of CVD
  • Based on our sample, if we randomly select one
    adult from the population what is the
    probability that they have cardiovascular disease?

3
Conditional Probability
                                          No CVD     CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%
  • Probability of exercising <3 days/week?
  • Probability of CVD among those who exercise <3
    days/week?
  • Probability of CVD among those who exercise 3 or
    more days/week?
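
The three questions above can be worked through directly from the table. A minimal Python sketch, assuming the four cells are percentages of the full sample of 1,000; the dictionary keys and function names are illustrative and do not come from the slides.

```python
# Joint percentages from the table (assumed to be shares of all respondents).
joint = {
    ("lt3", "no_cvd"): 30.3,
    ("lt3", "cvd"): 28.9,
    ("ge3", "no_cvd"): 30.2,
    ("ge3", "cvd"): 10.6,
}

def row_total(group):
    """Share of the sample in one exercise group (CVD plus no CVD)."""
    return joint[(group, "no_cvd")] + joint[(group, "cvd")]

def prob_exercise(group):
    """Marginal probability of being in an exercise group."""
    return row_total(group) / sum(joint.values())

def prob_cvd_given(group):
    """Conditional probability of CVD within an exercise group."""
    return joint[(group, "cvd")] / row_total(group)

print(round(prob_exercise("lt3"), 3))   # exercising <3 days/week
print(round(prob_cvd_given("lt3"), 3))  # CVD among those exercising <3 days/week
print(round(prob_cvd_given("ge3"), 3))  # CVD among those exercising 3+ days/week
```

The conditional probabilities divide by the row total rather than the grand total, which is the only difference from the marginal calculation.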

4
Association between exercise and CVD?
                                          No CVD     CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%
  • p1 = 28.9/(30.3 + 28.9) = 0.488
  • p2 = 10.6/(30.2 + 10.6) = 0.260
  • Difference = 0.488 - 0.260 = .228
  • Those who exercise less than 3 days/week are .228
    (22.8 percentage points) more likely to have CVD

5
Specifying and testing hypotheses
  • Difference of proportions = .228
  • What's our null hypothesis?
  • Why a null hypothesis? Why not test whether the
    difference is .228?
  • Central limit theorem
  • In repeated sampling, the distribution of our
    estimates of the mean (or difference of means or
    slope) will be normally distributed and centered
    over the true population value
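
The repeated-sampling claim can be checked by simulation. A minimal sketch, assuming purely for illustration that the true population share with CVD equals the 40.5% found in the sample on slide 2; the seed, the number of draws, and all names are arbitrary choices rather than anything from the slides.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_P = 0.405           # assumed true population proportion (illustrative)
N = 1000                 # respondents per sample
DRAWS = 2000             # number of repeated samples

def sample_proportion():
    """Draw one random sample of N adults and return the share with CVD."""
    return sum(rng.random() < TRUE_P for _ in range(N)) / N

estimates = [sample_proportion() for _ in range(DRAWS)]

mean_est = statistics.mean(estimates)          # should sit near TRUE_P
sd_est = statistics.stdev(estimates)           # empirical standard error
theory_se = (TRUE_P * (1 - TRUE_P) / N) ** 0.5 # standard error from the formula

# Share of estimates within 2 standard errors of the true value.
within_2se = sum(abs(e - TRUE_P) <= 2 * theory_se for e in estimates) / DRAWS

print(round(mean_est, 3), round(sd_est, 4), round(theory_se, 4), round(within_2se, 3))
```

If the theorem holds, the estimates center on the true value, their spread matches the formula-based standard error, and roughly 95% of them fall within two standard errors.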

6
Central limit theorem
[Figure: normal sampling distribution centered on the proposed true value (0), with 1 standard error marked]
7
Comparing proportions
  • Difference of proportions = .228
  • p1 = 28.9/(30.3 + 28.9) = 0.488 (N = 602)
  • p2 = 10.6/(30.2 + 10.6) = 0.260 (N = 398)
  • Standard error of this difference

8
Comparing proportions
  • So, the standard error of the difference is the square
    root of (.488(1 - .488)/602) + (.260(1 - .260)/398)
  • Which is .0299
  • Difference of proportions = .237
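
The standard error calculation above can be written as a short function. A minimal sketch using the rounded proportions and group sizes shown on the slide; the function name is illustrative, and because the inputs are rounded the result may differ from the slide's .0299 in the fourth decimal place.

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Inputs as printed on the slide.
se = se_diff_proportions(0.488, 602, 0.260, 398)
print(round(se, 4))
```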

9
Hypotheses
  • Null hypothesis
  • There is no difference in the rate of CVD between
    those who exercise less than 3 days/week and
    those who exercise 3 or more days/week
  • Alternate hypothesis
  • There is a difference in the rate of CVD between
    those who exercise less than 3 days/week and
    those who exercise 3 or more days/week
  • (i.e., the difference is not 0)

10
If 0 were the true difference, it would be very
unlikely that we would find a difference 7.93
(.237/.0299) standard errors from that value by
chance
[Figure: normal sampling distribution centered on the proposed true value (0), with 1 standard error marked]
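
To put a number on "very unlikely", the distance in standard errors can be converted to a two-sided p-value under the normal curve. A minimal sketch using only the Python standard library; the function name is illustrative, and the inputs are the .237 and .0299 quoted on this slide.

```python
import math

def two_sided_p(z):
    """Chance of landing at least |z| standard errors from the null value."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 0.237 / 0.0299   # difference divided by its standard error
p = two_sided_p(z)

print(round(z, 2))
print(p)
```

For comparison, `two_sided_p(1.96)` is about 0.05, the conventional cutoff, so a value near 8 standard errors is far beyond it.
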
11
Does exercise cause lower CVD?
  • Reverse causation? Might CVD cause exercise?
  • Failure to account for confounds
  • Typically leads to over-estimating the strength
    of a relationship (not always but usually)

12
(No Transcript)
13
Specification and Interpretation
  • Multivariate Regression

14
Does exercise make CVD less likely?
  • Regression (predict CVD)
  • Estimated likelihood of CVD if exercise 4
    days/week?
  • What might confound our estimate of the
    relationship between exercise and CVD?

                        Coef.    SE      T      P-value
Days Exercise (0-7)     -0.06   .001     ?      0.000
Constant                 0.56   .002     ?      0.000
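
Reading the table as a linear equation gives the estimated likelihood for any number of exercise days. A minimal sketch, assuming the outcome is coded 0/1 so that predicted values can be read as probabilities; the names are illustrative.

```python
# Coefficients from the bivariate regression table above.
CONSTANT = 0.56
B_EXERCISE = -0.06

def predicted_cvd(days_exercise):
    """Predicted chance of CVD for a given number of exercise days (0-7)."""
    return CONSTANT + B_EXERCISE * days_exercise

for days in range(8):
    print(days, round(predicted_cvd(days), 2))
```
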
15
Controlling for confounds
                        Coef.    SE      T      P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)     0.04   .002     2.0    0.048
Constant                 0.42   .002    21.0    0.000
16
[Figure: chance of CVD (vertical axis) by days per week of exercise (horizontal axis)]
17
Controlling for dichotomous confounds
                        Coef.    SE      T      P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)     0.04   .002     2.0    0.048
Smoker (1 = yes)         0.11   .001    11.0    0.000
Constant                 0.38   .002    19.0    0.000
  • Predicted probability of CVD for
  • 2 days exercise, 2 days Fast food, smoker
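
With several predictors, the predicted value is the constant plus each coefficient times that person's value. A minimal sketch for the profile asked about above, using the coefficients from this slide's table; the dictionary keys are illustrative names.

```python
# Coefficients from the table with the smoker indicator.
COEFS = {"days_exercise": -0.03, "days_fast_food": 0.04, "smoker": 0.11}
CONSTANT = 0.38

def predicted_cvd(profile):
    """Constant plus the sum of coefficient x value for each predictor."""
    return CONSTANT + sum(COEFS[name] * value for name, value in profile.items())

# 2 days exercise, 2 days fast food, smoker (1 = yes).
person = {"days_exercise": 2, "days_fast_food": 2, "smoker": 1}
prob = predicted_cvd(person)
print(round(prob, 2))
```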

18
Nominal Variables
  • Variable that does not have an order to it
  • Nothing is higher or lower
  • Create set of dichotomous variables
  • Always interpret coefficients with respect to the
    reference category
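
Creating the set of dichotomous variables can be sketched in a few lines. This is an illustrative example, not the author's procedure: the region values and the helper name `make_indicators` are hypothetical, and the reference category simply gets no column of its own.

```python
def make_indicators(values, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical respondents' regions.
regions = ["South", "Midwest", "West", "Northeast", "South"]
dummies = make_indicators(regions, reference="Midwest")

for name, column in dummies.items():
    print(name, column)
```

A respondent in the reference category has 0 on every indicator, which is why each coefficient is read as a difference from that category.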

19
(No Transcript)
20
Controlling for nominal confounds
                        Coef.    SE      T      P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)     0.03   .002     1.5    0.135
Smoker (1 = yes)         0.09   .001     9.0    0.000
South (1 = yes)          0.03   .002     1.5    0.137
West (1 = yes)          -0.01   .002    -0.5    0.642
Northeast (1 = yes)      0.02   .002     1.0    0.410
Constant                 0.34   .002    17.0    0.000
(Midwest is the excluded category)
What if we wanted to test whether including
region indicators improves the fit of the model?
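
One common way to answer this kind of question is an F-test that compares the model with and without the region indicators. The slides do not say which test was intended and do not report sums of squared residuals, so every number below is made up purely for illustration.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for dropping q predictors at once.

    ssr_restricted:   sum of squared residuals without the q predictors
    ssr_unrestricted: sum of squared residuals with all k predictors
    n:                number of observations
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values: 3 region indicators, 6 predictors in total, 1,000 cases.
F = f_statistic(ssr_restricted=210.0, ssr_unrestricted=208.5, q=3, n=1000, k=6)
print(round(F, 3))
```

The statistic is then compared with an F distribution with q and n - k - 1 degrees of freedom; a large value suggests the indicators jointly improve fit.
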
21
Non-linear relationships
22
Logarithms
Why use a logarithmic transformation? You think
the relationship looks like this
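
A logarithmic transformation is often used when an outcome changes quickly at low values of a predictor and then levels off. The sketch below assumes that shape; the intercept and slope are hypothetical values chosen only to show the arithmetic.

```python
import math

A, B = 1.0, 0.5   # hypothetical intercept and slope on ln(x)

def predict(x):
    """Predicted outcome when the predictor enters as its natural log."""
    return A + B * math.log(x)

change_1_to_2 = predict(2) - predict(1)       # doubling from a low value
change_10_to_20 = predict(20) - predict(10)   # doubling from a high value
change_10_to_11 = predict(11) - predict(10)   # one extra unit at a high value

print(round(change_1_to_2, 3), round(change_10_to_20, 3), round(change_10_to_11, 3))
```

Each doubling of the predictor shifts the prediction by the same amount, while a one-unit increase matters less the higher the starting value.
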
23
Logarithms
24
Squared term: U (or n)-shaped relationship
Age and political ideology (-2 = very conservative,
2 = very liberal)

               Coef.     SE       T        P
Age           -0.007    0.004   -1.740   0.082
Constant       0.122    0.209    0.580   0.561

               Coef.     SE       T        P
Age           -0.065    0.025   -2.630   0.009
Age-squared    0.001    0.000    2.390   0.017
Constant       1.554    0.635    2.450   0.015
25
Age and Political Ideology
               Coef.     SE       T        P
Age           -0.065    0.025   -2.630   0.009
Age-squared    0.001    0.000    2.390   0.017
Constant       1.554    0.635    2.450   0.015

Age   Age^2   -0.065 x Age   .0005574 x Age^2   Constant   Predicted Value
18     324      -1.178            0.181          1.554         0.557
28     784      -1.832            0.437          1.554         0.159
38    1444      -2.487            0.805          1.554        -0.128
48    2304      -3.141            1.284          1.554        -0.303
58    3364      -3.795            1.875          1.554        -0.366
68    4624      -4.450            2.577          1.554        -0.319
78    6084      -5.104            3.391          1.554        -0.159
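
The predicted values in the table, and the age at which the curve bottoms out, can be computed from the coefficients. A minimal sketch using the rounded coefficients printed on the slide (-0.065 and .0005574); the table appears to use unrounded coefficients, so results may differ in the second or third decimal.

```python
B_AGE = -0.065        # coefficient on age, as printed
B_AGE_SQ = 0.0005574  # coefficient on age squared, from the table header
CONSTANT = 1.554

def predicted_ideology(age):
    """Predicted ideology score (-2 to 2) from the squared-term model."""
    return CONSTANT + B_AGE * age + B_AGE_SQ * age ** 2

# A quadratic a + b*x + c*x^2 turns at x = -b / (2c).
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in range(18, 79, 10):
    print(age, round(predicted_ideology(age), 3))
print(round(turning_point, 1))
```

Because the squared term is positive, the curve is U-shaped: predictions fall until the turning point and rise afterward.
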
26
(No Transcript)
27
Create indicators from an ordered variable
Party Identification (-3 to 3)
Seven variables:
  • Strong Republican (1 = yes)
  • Weak Republican (1 = yes)
  • Lean Republican (1 = yes)
  • Pure Independent (1 = yes)
  • Lean Democrat (1 = yes)
  • Weak Democrat (1 = yes)
  • Strong Democrat (1 = yes)
28
Predict Obama Favorability (1-4)
                     Coef.     SE        T        P
Strong Republican   -1.632    0.161   -10.160   0.000
Weak Republican     -0.707    0.198    -3.580   0.000
Lean Republican     -1.235    0.181    -6.810   0.000
Lean Democrat        0.674    0.197     3.430   0.001
Weak Democrat        0.494    0.187     2.640   0.009
Strong Democrat      0.595    0.159     3.750   0.000
Constant             2.940    0.134    21.870   0.000
Excluded category: Pure Independents
29
Obama Favorability
30
Predict Obama Favorability (1-4)
                     Coef.     SE        T        P
Strong Republican   -0.397    0.150    -2.650   0.008
Weak Republican      0.528    0.189     2.790   0.006
Pure Independent     1.235    0.181     6.810   0.000
Lean Democrat        1.909    0.188    10.150   0.000
Weak Democrat        1.729    0.179     9.680   0.000
Strong Democrat      1.831    0.148    12.360   0.000
Constant             1.705    0.122    14.010   0.000
New excluded category: Leaning Republicans
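
The coefficients in the two Obama favorability tables are linked by simple arithmetic: switching the excluded category subtracts the new reference group's old coefficient from every other coefficient and adds it to the constant. A minimal sketch using the values from the first table; the function name `rebase` is illustrative, and standard errors cannot be recovered this way.

```python
# Coefficients with Pure Independents excluded (first table).
old_coefs = {
    "Strong Republican": -1.632,
    "Weak Republican": -0.707,
    "Lean Republican": -1.235,
    "Lean Democrat": 0.674,
    "Weak Democrat": 0.494,
    "Strong Democrat": 0.595,
}
old_constant = 2.940

def rebase(coefs, constant, old_ref, new_ref):
    """Re-express indicator coefficients relative to a new excluded category."""
    shift = coefs[new_ref]
    new = {name: round(b - shift, 3) for name, b in coefs.items() if name != new_ref}
    new[old_ref] = round(0.0 - shift, 3)  # old reference had an implicit coefficient of 0
    return new, round(constant + shift, 3)

new_coefs, new_constant = rebase(old_coefs, old_constant,
                                 old_ref="Pure Independent",
                                 new_ref="Lean Republican")
for name, b in new_coefs.items():
    print(name, b)
print("Constant", new_constant)
```

Predicted favorability for each group is identical under either coding; only the comparison group changes.
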
31
Interactions
  • One variable moderates the effect of another
    i.e., the relationship between one variable and
    an outcome depends on the value of another
    variable

32
Regression estimates an equation
                                                  Coef.     SE       T        P
Party Affiliation (-3 = strong R, 3 = strong D)   1.286    0.878    1.460   0.143
Voted in 2008                                    -1.138    1.484   -0.770   0.443
Party Affiliation x Voted in 2008                 3.575    0.918    3.900   0.000
Constant                                         61.100    1.358   44.980   0.000

Y = 61.100 + 1.286 x Party - 1.138 x Voted
    + 3.575 x Party x Voted + u
Y = 61.100 + Party x 1.286 + Party x Voted x 3.575
    - 1.138 x Voted + u
OR
Y = 61.100 + Party x 1.286 + Voted x Party x 3.575
    - Voted x 1.138 + u
33
Coefficients:          1.286        -1.138       3.575          61.100

Party Aff.   Voted   Party Aff.    Voted      Party x Voted   Constant   Predicted Value
   -3          0      -3.858        0            0            61.100       57.242
   -2          0      -2.572        0            0            61.100       58.528
   -1          0      -1.286        0            0            61.100       59.814
    0          0       0.000        0            0            61.100       61.100
    1          0       1.286        0            0            61.100       62.386
    2          0       2.572        0            0            61.100       63.672
    3          0       3.858        0            0            61.100       64.959

Party Aff.   Voted   Party Aff.    Voted      Party x Voted   Constant   Predicted Value
   -3          1      -3.858       -1.13775    -10.7258       61.100       45.378
   -2          1      -2.572       -1.13775     -7.1505       61.100       50.240
   -1          1      -1.286       -1.13775     -3.57525      61.100       55.101
    0          1       0.000       -1.13775      0            61.100       59.962
    1          1       1.286       -1.13775      3.575252     61.100       64.824
    2          1       2.572       -1.13775      7.150504     61.100       69.685
    3          1       3.858       -1.13775     10.72576      61.100       74.547
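
The predicted values for nonvoters and voters follow directly from the estimated equation, and the interaction shows up as a different slope on party affiliation for each group. A minimal sketch using the rounded coefficients from slide 32; the slides appear to use unrounded values, so the last decimal may differ.

```python
# Rounded coefficients from the interaction model.
B_PARTY = 1.286
B_VOTED = -1.138
B_INTERACT = 3.575
CONSTANT = 61.100

def predict(party, voted):
    """Predicted value for a party score (-3 to 3) and voted (0 or 1)."""
    return CONSTANT + B_PARTY * party + B_VOTED * voted + B_INTERACT * party * voted

def party_slope(voted):
    """Change in the prediction per one-point change in party affiliation."""
    return B_PARTY + B_INTERACT * voted

for voted in (0, 1):
    print("voted =", voted, "slope on party =", round(party_slope(voted), 3))
    for party in range(-3, 4):
        print(party, round(predict(party, voted), 3))
```

Among nonvoters the slope is just the party coefficient; among voters it is the party coefficient plus the interaction coefficient, which is what "moderates" means here.
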
34
(No Transcript)
35
Establishing causality
36
Dealing with confounds
  • Theory + multivariate regression
  • Experiments

37
Dealing with reverse causation
  • Theory
  • Experiments

38
Experiments
  • What is the key characteristic of an experiment?
  • How does this address reverse causality?
  • How does it address confounds?
  • Weaknesses/limitations of experiments?
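
How random assignment addresses confounds can be illustrated with a small simulation. Everything here is hypothetical: a made-up confounder ("health conscious") that makes people both more likely to exercise and less likely to have CVD, a made-up true effect of exercise of -0.10, and an arbitrary seed and sample size.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable
N = 50_000
TRUE_EFFECT = -0.10      # hypothetical effect of exercise on the chance of CVD

def simulate(randomize):
    """Return the CVD rate among exercisers minus the rate among non-exercisers."""
    treated, control = [], []
    for _ in range(N):
        health_conscious = rng.random() < 0.5
        if randomize:
            # Experiment: a coin flip decides who exercises.
            exercises = rng.random() < 0.5
        else:
            # Observational data: health-conscious people exercise more often.
            exercises = rng.random() < (0.8 if health_conscious else 0.2)
        p_cvd = 0.40 + TRUE_EFFECT * exercises - 0.20 * health_conscious
        has_cvd = rng.random() < p_cvd
        (treated if exercises else control).append(has_cvd)
    return sum(treated) / len(treated) - sum(control) / len(control)

observational = simulate(randomize=False)
experimental = simulate(randomize=True)
print(round(observational, 3), round(experimental, 3))
```

In the observational comparison the difference is larger in magnitude than the true effect because exercisers are also more health conscious; under random assignment the two groups are alike on average, so the difference lands near the true effect.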

39
Exam Expectations
  • Describe probabilities / conditional
    probabilities
  • Write hypotheses
  • Demonstrate understanding of how null hypotheses
    relate to the central limit theorem
  • Test difference of proportions (formula for SE
    will be provided)
  • Interpreting multivariate regression
  • Relationships (slopes)
  • Predicted values
  • Sketch graphs of relationships
  • Discuss strengths and limitations of analyses
  • Why an estimated slope might be biased
  • Benefits and limitations of experiments

40
Notes
  • Homework 3 graded
  • Homework 4 due Thursday 12/9
  • Office hours next week: email to come
  • Exam December 14 at 2pm