Title: Measures of Association and Regression Introduction

Transcript and Presenter's Notes

1
Measures of Association and Regression
Introduction
2
Measures of Association
  • Ask the question, "How strong is the
    relationship?" or "How well can we predict y
    using x?"
  • What is the difference between a MoA and...
  • Chi-Square or other hypothesis test results?
  • Recall the difference between regression and
    correlation

3
How Strong is the Relationship?
4
How Strong is the Relationship?
  • Without an independent variable, our best
    prediction is a measure of central tendency
    (mode)
  • With an independent variable, we might be able to
    improve our ability to predict the dependent
    variable.
  • How much better can we predict the dependent
    variable with another variable than when we only
    had the mode?

5
Gender and Guns
  • Mode = Favor Gun Ban
  • If all we knew was this, our best prediction
    would be that everyone favors the gun ban.
  • We would be right 807 times and we would be in
    error 707 times

6
Gender and Guns
  • When we add gender to the mix we would predict
    that all females favor (449 right, 226 wrong) and
    all males oppose (481 right, 358 wrong)
  • That gives us a total of 930 right predictions
    and 584 errors

7
How much better do we do?
  • Without knowing gender, we made 707 errors
  • Knowing gender, we reduced that to 584 errors
  • We made 123 fewer errors by knowing gender.
  • We can show, as a proportion, how big the
    reduction in error is

We make about 18% fewer errors than we did
knowing the mode alone.
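
(A minimal sketch of this proportional-reduction-in-error arithmetic, in plain Python rather than the deck's SPSS output; the counts are the ones from slides 5-7.)

# PRE arithmetic for the gun-ban example (counts from slides 5-7)
e1 = 707                      # errors using the mode alone
e2 = 584                      # errors after predicting within gender
pre = (e1 - e2) / e1          # proportional reduction in error
print(round(pre, 3))          # 0.174, i.e. roughly an 18% reduction
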
8
Proportional Reduction of Error
  • This framework is called PRE
  • This measure, lambda (λ), is appropriate for
    crosstabs that involve a nominal variable
  • Similar logic can be applied to analyses of two
    ordinal variables

9
Ordinal PRE Measures
  • Basic Logic
  • Look at each case
  • Did the measure of central tendency alone get it
    right?
  • Did adding the independent variable get it right?
  • Compare to see which is better
  • Criterion for good choices?
  • Fits your research needs
  • Conservative estimate

10
Choices
  • Gamma
  • Drops cases that are tied
  • Overestimates strength
  • Kendall's tau-b (see the sketch below)
  • Allows for ties, best for square tables
  • Kendall's tau-c
  • Allows for ties, best for non-square tables
  • Somers' d
  • A variant of the tau statistics, not quite as common

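(An illustrative sketch of the tau statistics using SciPy; the ordinal codes below are hypothetical, and this is not the deck's SPSS workflow.)

from scipy.stats import kendalltau

# Hypothetical ordinal codes, e.g. 1 = low, 2 = medium, 3 = high
education = [1, 1, 2, 2, 2, 3, 3, 3, 1, 2]
interest  = [1, 2, 2, 2, 3, 2, 3, 3, 1, 1]

tau_b, p_b = kendalltau(education, interest, variant="b")  # allows ties; square tables
tau_c, p_c = kendalltau(education, interest, variant="c")  # better for non-square tables
print(round(tau_b, 3), round(tau_c, 3))
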
11
Nominal Measures
  • If one or both variables are nominal, you should
    use lambda (λ).
  • Sometimes lambda doesn't work well
  • Because we predict that all women approve and
    that all men approve, we get the same prediction
    as the mode alone (lambda = 0)! We need something
    else

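(Lambda is not built into SciPy; a minimal hand-rolled sketch of the lambda calculation, applied to the gender-and-guns counts from slide 6, might look like this.)

def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    e1 = n - max(col_totals)                        # errors using the overall mode alone
    e2 = sum(sum(row) - max(row) for row in table)  # errors using each row's mode
    return (e1 - e2) / e1                           # 0 when every row's mode matches the overall mode

# Gender (rows) by opinion on the gun ban (columns: favor, oppose), from slide 6
table = [[449, 226],   # female
         [358, 481]]   # male
print(round(goodman_kruskal_lambda(table), 3))      # 0.174, matching the PRE result above
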
12
Cramér's V (a.k.a. phi, φ)
  • We know that the χ² statistic said there is a
    real relationship, so we know that its strength
    is greater than 0.
  • Cramér's V is a mathematical manipulation of the
    χ² statistic that turns it into a measure of
    association.
  • V = sqrt(χ² / (n · m)), where m is the lesser of
    (r-1) and (c-1)
  • This is not a PRE measure of association, though,
    so its interpretation is a little different.

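(A sketch of the Cramér's V calculation, using SciPy only to obtain the χ² statistic and then applying the formula above to the slide 6 table.)

from scipy.stats import chi2_contingency

table = [[449, 226],
         [358, 481]]
chi2, p, dof, expected = chi2_contingency(table, correction=False)  # uncorrected chi-square

n = sum(sum(row) for row in table)
m = min(len(table) - 1, len(table[0]) - 1)   # the lesser of (r-1) and (c-1)
cramers_v = (chi2 / (n * m)) ** 0.5
print(round(cramers_v, 3))
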
13
Interpreting PRE MoAs
  • All lie on the interval [-1, 1]
  • SPSS will calculate them all for you.
  • General Guidelines

14
Interpreting PRE MoAs
  • SPSS may or may not get the direction of the
    association right (positive or negative). It
    depends on how the variables are coded.
  • You should place the appropriate sign by your
    measure of association (but not by Cramér's V;
    it's always positive)
  • Tells you the proportion of errors made by the
    measure of central tendency alone that are no
    longer errors when we add the independent variable

15
Interpreting Cramér's V
  • For our purposes, you may use the same table as
    for PRE results (weak, moderate)
  • You cannot interpret it in terms of reduction of
    error.

16
Directionality
  • Some tests are directional while others are
    symmetric
  • In directional tests, you get different answers
    depending on which variable is the D.V. and which
    is the I.V.
  • In symmetric tests, the answer is the same
  • Given the option, choose the directional test and
    specify the right D.V.

17
Finally,
  • We only observe a sample; Cramér's V or any of
    the PRE MoAs could be 0 in the population, and we
    might observe a value above 0 through sampling
    error alone!
  • We can do tests of significance on many of these
  • SPSS will do this for you; I won't make you do it
    by hand.

18
Regression
19
Multiple Variables
  • So far, we have learned
  • How to analyze one variable at a time (CI)
  • How to compare two means or proportions (a
    relationship between one variable measured at any
    level and a nominal or ordinal variable with two
    categories)
  • How to examine relationships between two nominal-
    or ordinal-level variables with any number of
    categories (using simple crosstabs)
  • Today, we learn about relationships between two
    interval-level variables.

20
Two Interests
  • Magnitude of relationship between the independent
    variable and the dependent variable (how much
    change in one yields how much change in the
    other).
  • Correlation: the predictive power of one variable
    on another. This is a Measure of Association
    (but not a PRE Measure of Association)

21
Correlation
22
Types of Correlation
  • Positive Correlation: An increase in one variable
    results in an increase in the other
  • Negative Correlation: An increase in one variable
    results in a decrease in the other.

23
Correlation analysis asks
  • How good a predictor is the independent
    variable of the dependent variable?
  • How good a predictor of income is education?
  • How accurate is our prediction of the effect of
    education on income?
  • How close are the dots to the line?
  • It is a Measure of Association

24
Computing the Correlation Coefficient, Pearson's r
  • Assess how much X and Y move together
    (covariance) out of the amount they move
    individually (variance): r = cov(X, Y) / (sX · sY)

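(A minimal Python sketch of the same idea: the covariance of X and Y divided by the product of their standard deviations, checked against SciPy's built-in pearsonr. The data are illustrative.)

import numpy as np
from scipy.stats import pearsonr

x = np.array([8, 10, 12, 14, 16], dtype=float)   # e.g., years of education
y = np.array([20, 26, 25, 33, 40], dtype=float)  # e.g., income in $1,000s

r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = pearsonr(x, y)
print(round(r_manual, 4), round(r_scipy, 4))     # the two agree
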
25
Computing r in STATA
  • corr var1 var2 ... var100
  • Output looks like this

. corr unemplyd mdnincm flood age65 black
(obs=427)

             | unemplyd  mdnincm    flood    age65    black
-------------+---------------------------------------------
    unemplyd |   1.0000
     mdnincm |  -0.4960   1.0000
       flood |   0.0827   0.0083   1.0000
       age65 |   0.0319  -0.1634  -0.0272   1.0000
       black |   0.5037  -0.3065   0.0703  -0.1038   1.0000
26
Interpreting the Correlation Coefficient
27
Note: Correlation is Linear
[Scatterplots: r = 0, r = .9, r = 0, r = .8]
28
Correlation and Regression
  • Regression effects are depicted by the slope of
    the line.
  • Correlation can be seen as the spread of points
    around the regression line. The greater the
    spread of points around the regression line, the
    less predictive X is of Y and, consequently, the
    weaker the correlation.

29
Correlation = 1, Slope = 1
30
Correlation = 1, Slope = -2
31
Imperfect Correlation and Relationships
  • We rarely see perfect correlation
  • However, even with imperfect correlation, we can
    have some expectation of what will happen on
    average.
  • Even when correlation is not perfect, we can draw
    a line to summarize the trend in the data points.
    This is the regression line

32
Formula for a line
  • y = mx + b (algebraic)
  • y = a + bx (statistical)
  • It is the same thing. We'll add one more thing:
    error
  • yi = a + bxi + ei is called the sample regression
    function
  • Yi = α + βxi + εi is called the population
    regression function

33
Making Predictions
34
Establishing Relationships
35
Establishing Relationships
Now Add 5 years of education
10 Years of Education Means about $12,000 Income
It adds an Additional $4,000 of Income!
36
Where do we Draw the Line?
  • Least Squares Principle
  • Under the Gauss-Markov assumptions, the Ordinary
    Least Squares estimator is the Best Linear
    Unbiased Estimator
  • OLS is BLUE

37
What is the estimator?
  • In the bivariate case (1 dependent, 1
    independent), the least squares principle gives
    us these equations for calculating the slope and
    intercept:
  • b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
  • a = ȳ - b·x̄

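(A minimal Python sketch of those least-squares formulas; the data are illustrative, not the deck's example.)

import numpy as np

def ols_bivariate(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

a, b = ols_bivariate([8, 10, 12, 14, 16], [20, 26, 25, 33, 40])
print(round(a, 2), round(b, 2))   # intercept, slope
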
38
(No Transcript)
39
Calculating a and b
  • b = 54.73 / 4.19 = 13.06
  • a = 235.7 - b(11.2)
  •   = 235.7 - 13.06(11.2)
  •   = 235.7 - 146.27
  •   = 89.43

40
STATA command for Regression
  • regress y x
  • Output looks like this

. regress turnout diplomau

      Source |       SS           df       MS      Number of obs   =       426
-------------+----------------------------------   F(1, 424)       =     55.40
       Model |  1.2806e+11         1  1.2806e+11   Prob > F        =    0.0000
    Residual |  9.8018e+11       424  2.3117e+09   R-squared       =    0.1156
-------------+----------------------------------   Adj R-squared   =    0.1135
       Total |  1.1082e+12       425  2.6076e+09   Root MSE        =     48081

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   2164.063   290.7549     7.44   0.000     1592.563    2735.564
       _cons |   172999.4   6300.284    27.46   0.000     160615.7    185383.1
------------------------------------------------------------------------------
41
How do we interpret it?
  • For now, we look at one key thing: the coefficients
    (slope and intercept)
  • ŷ = 172999.4 + 2164.063x
  • Every 1-unit increase in the percentage of
    university diploma holders increases voter
    turnout by 2,164 votes, on average.
  • If there were a district with no university
    diploma holders, we would expect 172,999 people
    to turn out, on average.
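
(A quick check of that interpretation with the fitted equation; the 30 below is an arbitrary illustrative percentage, not a value from the deck.)

a, b = 172999.4, 2164.063        # intercept and slope from the regression output

def predicted_turnout(pct_diploma):
    return a + b * pct_diploma

print(predicted_turnout(0))      # 172999.4: expected turnout with no diploma holders
print(predicted_turnout(30))     # each additional percentage point adds about 2,164 votes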