Title: Measures of Association and Regression Introduction

Transcript and Presenter's Notes

1
Measures of Association and Regression
Introduction
2
Measures of Association
  • Ask the question, "How strong is the
    relationship?" or "How well can we predict y
    using x?"
  • What is the difference between a MoA and...
  • Chi-Square or other hypothesis test results?
  • Recall the difference between regression and
    correlation

3
How Strong is the Relationship?
4
How Strong is the Relationship?
  • Without an independent variable, our best
    prediction is a measure of central tendency
    (mode)
  • With an independent variable, we might be able to
    improve our ability to predict the dependent
    variable.
  • How much better can we predict the dependent
    variable with another variable than when we only
    had the mode?

5
Gender and Guns
  • Mode = Favor Gun Ban
  • If all we knew was this, our best prediction
    would be that everyone favors the gun ban.
  • We would be right 807 times and we would be in
    error 707 times

6
Gender and Guns
  • When we add gender to the mix we would predict
    that all females favor (449 right, 226 wrong) and
    all males oppose (481 right, 358 wrong)
  • That gives us a total of 930 right predictions
    and 584 errors

7
How much better do we do?
  • Without knowing gender, we made 707 errors
  • Knowing gender, we reduced that to 584 errors
  • We made 123 fewer errors by knowing gender.
  • We can show, as a proportion, how big the
    reduction in error is

We make about 18% fewer errors than we did
knowing the mode alone.
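
(A minimal sketch of this proportional-reduction-in-error arithmetic, in plain Python rather than the deck's SPSS output; the counts are the ones from slides 5-7.)

# PRE arithmetic for the gun-ban example (counts from slides 5-7)
e1 = 707                      # errors using the mode alone
e2 = 584                      # errors after predicting within gender
pre = (e1 - e2) / e1          # proportional reduction in error
print(round(pre, 3))          # 0.174, i.e. roughly an 18% reduction
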
8
Proportional Reduction of Error
  • This framework is called PRE
  • This measure, lambda (λ), is appropriate for
    crosstabs that involve a nominal variable
  • Similar logic can be applied to analyses of two
    ordinal variables

9
Ordinal PRE Measures
  • Basic Logic
  • Look at each case
  • Did the measure of central tendency alone get it
    right?
  • Did adding the independent variable get it right?
  • Compare to see which is better
  • Criterion for good choices?
  • Fits your research needs
  • Conservative estimate

10
Choices
  • Gamma
  • Drops cases that are tied
  • Overestimates strength
  • Kendall's tau-b (see the sketch below)
  • Allows for ties, best for square tables
  • Kendall's tau-c
  • Allows for ties, best for non-square tables
  • Somers' d
  • A variant of the tau statistics, not quite as common

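(An illustrative sketch of the tau statistics using SciPy; the ordinal codes below are hypothetical, and this is not the deck's SPSS workflow.)

from scipy.stats import kendalltau

# Hypothetical ordinal codes, e.g. 1 = low, 2 = medium, 3 = high
education = [1, 1, 2, 2, 2, 3, 3, 3, 1, 2]
interest  = [1, 2, 2, 2, 3, 2, 3, 3, 1, 1]

tau_b, p_b = kendalltau(education, interest, variant="b")  # allows ties; square tables
tau_c, p_c = kendalltau(education, interest, variant="c")  # better for non-square tables
print(round(tau_b, 3), round(tau_c, 3))
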
11
Nominal Measures
  • If one or both variables are nominal, you should
    use lambda (λ).
  • Sometimes lambda doesn't work well
  • Because we predict that all women approve and
    that all men approve, we get the same prediction
    as the mode alone (lambda = 0)! We need something
    else

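(Lambda is not built into SciPy; a minimal hand-rolled sketch of the lambda calculation, applied to the gender-and-guns counts from slide 6, might look like this.)

def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    e1 = n - max(col_totals)                        # errors using the overall mode alone
    e2 = sum(sum(row) - max(row) for row in table)  # errors using each row's mode
    return (e1 - e2) / e1                           # 0 when every row's mode matches the overall mode

# Gender (rows) by opinion on the gun ban (columns: favor, oppose), from slide 6
table = [[449, 226],   # female
         [358, 481]]   # male
print(round(goodman_kruskal_lambda(table), 3))      # 0.174, matching the PRE result above
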
12
Cramér's V (a.k.a. phi, φ)
  • We know that the χ² statistic said there is a
    real relationship, so we know that its strength
    is greater than 0.
  • Cramér's V is a mathematical manipulation of the
    χ² statistic that turns it into a measure of
    association.
  • V = sqrt(χ² / (n · m)), where m is the lesser of
    (r-1) and (c-1)
  • This is not a PRE measure of association, though,
    so its interpretation is a little different.

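(A sketch of the Cramér's V calculation, using SciPy only to obtain the χ² statistic and then applying the formula above to the slide 6 table.)

from scipy.stats import chi2_contingency

table = [[449, 226],
         [358, 481]]
chi2, p, dof, expected = chi2_contingency(table, correction=False)  # uncorrected chi-square

n = sum(sum(row) for row in table)
m = min(len(table) - 1, len(table[0]) - 1)   # the lesser of (r-1) and (c-1)
cramers_v = (chi2 / (n * m)) ** 0.5
print(round(cramers_v, 3))
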
13
Interpreting PRE MoAs
  • All lie on the interval [-1, 1]
  • SPSS will calculate them all for you.
  • General Guidelines

14
Interpreting PRE MoAs
  • SPSS may or may not get the direction of the
    association right (positive or negative). It
    depends on how the variables are coded.
  • You should place the appropriate sign by your
    measure of association (but not by Cramér's V;
    it's always positive)
  • Tells you the proportion of errors made by the
    measure of central tendency alone that are no
    longer errors when we add the independent variable

15
Interpreting Cramér's V
  • For our purposes, you may use the same table as
    for PRE results (weak, moderate)
  • You cannot interpret it in terms of reduction of
    error.

16
Directionality
  • Some tests are directional while others are
    symmetric
  • In directional tests, you get different answers
    depending on which variable is the D.V. and which
    is the I.V.
  • In symmetric tests, the answer is the same
  • Given the option, choose the directional test and
    specify the right D.V.

17
Finally,
  • We only observe a sample; Cramér's V or any of
    the PRE MoAs could be 0 in the population, and we
    might observe a value above 0 through sampling
    error alone!
  • We can do tests of significance on many of these
  • SPSS will do this for you; I won't make you do it
    by hand.

18
Regression
19
Multiple Variables
  • So far, we have learned
  • How to analyze one variable at a time (CI)
  • How to compare two means or proportions (a
    relationship between one variable measured at any
    level and a nominal or ordinal variable with two
    categories)
  • How to examine relationships between two nominal-
    or ordinal-level variables with any number of
    categories (using simple crosstabs)
  • Today, we learn about relationships between two
    interval-level variables.

20
Two Interests
  • Magnitude of relationship between the independent
    variable and the dependent variable (how much
    change in one yields how much change in the
    other).
  • Correlation: the predictive power of one variable
    on another. This is a Measure of Association
    (but not a PRE Measure of Association)

21
Correlation
22
Types of Correlation
  • Positive Correlation: An increase in one variable
    results in an increase in the other
  • Negative Correlation: An increase in one variable
    results in a decrease in the other.

23
Correlation analysis asks
  • How good a predictor is the independent
    variable of the dependent variable?
  • How good a predictor of income is education?
  • How accurate is our prediction of the effect of
    education on income?
  • How close are the dots to the line?
  • It is a Measure of Association

24
Computing the Correlation Coefficient, Pearson's r
  • Assess how much X and Y move together
    (covariance) out of the amount they move
    individually (variance): r = cov(X, Y) / (sX · sY)

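(A minimal Python sketch of the same idea: the covariance of X and Y divided by the product of their standard deviations, checked against SciPy's built-in pearsonr. The data are illustrative.)

import numpy as np
from scipy.stats import pearsonr

x = np.array([8, 10, 12, 14, 16], dtype=float)   # e.g., years of education
y = np.array([20, 26, 25, 33, 40], dtype=float)  # e.g., income in $1,000s

r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = pearsonr(x, y)
print(round(r_manual, 4), round(r_scipy, 4))     # the two agree
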
25
Computing r in STATA
  • corr var1 var2 ... var100
  • Output looks like this

. corr unemplyd mdnincm flood age65 black
(obs=427)

             | unemplyd  mdnincm    flood    age65    black
-------------+---------------------------------------------
    unemplyd |   1.0000
     mdnincm |  -0.4960   1.0000
       flood |   0.0827   0.0083   1.0000
       age65 |   0.0319  -0.1634  -0.0272   1.0000
       black |   0.5037  -0.3065   0.0703  -0.1038   1.0000
26
Interpreting the Correlation Coefficient
27
Note: Correlation is Linear
[Scatterplots: r = 0, r = .9, r = 0, r = .8]
28
Correlation and Regression
  • Regression effects are depicted by the slope of
    the line.
  • Correlation can be seen as the spread of points
    around the regression line. The greater the
    spread of points around the regression line, the
    less predictive X is of Y and, consequently, the
    weaker the correlation.

29
Correlation = 1, Slope = 1
30
Correlation = 1, Slope = -2
31
Imperfect Correlation and Relationships
  • We rarely see perfect correlation
  • However, even with imperfect correlation, we can
    have some expectation of what will happen on
    average.
  • Even when correlation is not perfect, we can draw
    a line to summarize the trend in the data points.
    This is the regression line

32
Formula for a line
  • y = mx + b (algebraic)
  • y = a + bx (statistical)
  • It is the same thing. We'll add one more thing:
    error
  • yi = a + bxi + ei is called the sample regression
    function
  • Yi = α + βxi + εi is called the population
    regression function

33
Making Predictions
34
Establishing Relationships
35
Establishing Relationships
Now Add 5 years of education
10 Years of Education Means about $12,000 Income
It adds an Additional $4,000 of Income!
36
Where do we Draw the Line?
  • Least Squares Principle
  • Under the Gauss-Markov assumptions, the Ordinary
    Least Squares estimator is the Best Linear
    Unbiased Estimator
  • OLS is BLUE

37
What is the estimator?
  • In the bivariate case (1 dependent, 1
    independent), the least squares principle gives
    us these equations for calculating the slope and
    intercept:
  • b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
  • a = ȳ - b·x̄

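(A minimal Python sketch of those least-squares formulas; the data are illustrative, not the deck's example.)

import numpy as np

def ols_bivariate(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

a, b = ols_bivariate([8, 10, 12, 14, 16], [20, 26, 25, 33, 40])
print(round(a, 2), round(b, 2))   # intercept, slope
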
38
(No Transcript)
39
Calculating a and b
  • b = 54.73 / 4.19 = 13.06
  • a = 235.7 - b(11.2)
  •   = 235.7 - 13.06(11.2)
  •   = 235.7 - 146.27
  •   = 89.43

40
STATA command for Regression
  • regress y x
  • Output looks like this

. regress turnout diplomau

      Source |       SS           df       MS      Number of obs   =       426
-------------+----------------------------------   F(1, 424)       =     55.40
       Model |  1.2806e+11         1  1.2806e+11   Prob > F        =    0.0000
    Residual |  9.8018e+11       424  2.3117e+09   R-squared       =    0.1156
-------------+----------------------------------   Adj R-squared   =    0.1135
       Total |  1.1082e+12       425  2.6076e+09   Root MSE        =     48081

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   2164.063   290.7549     7.44   0.000     1592.563    2735.564
       _cons |   172999.4   6300.284    27.46   0.000     160615.7    185383.1
------------------------------------------------------------------------------
41
How do we interpret it?
  • For now, we look at one key thing: the coefficients
    (slope and intercept)
  • ŷ = 172999.4 + 2164.063x
  • Every 1-unit increase in the percentage of
    university diploma holders increases voter
    turnout by 2,164 votes, on average.
  • If there were a district with no university
    diploma holders, we would expect 172,999 people
    to turn out, on average.
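
(A quick check of that interpretation with the fitted equation; the 30 below is an arbitrary illustrative percentage, not a value from the deck.)

a, b = 172999.4, 2164.063        # intercept and slope from the regression output

def predicted_turnout(pct_diploma):
    return a + b * pct_diploma

print(predicted_turnout(0))      # 172999.4: expected turnout with no diploma holders
print(predicted_turnout(30))     # each additional percentage point adds about 2,164 votes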