Bivariate Regression and Correlation - PowerPoint PPT Presentation

Provided by: homeUc

1
Bivariate Regression and Correlation
  • Lecture 5

2
Analytic Tool
  • Answer the following:
  • Suppose that you have a sample of 64 individuals,
    the sample mean is 20, the sample standard
    deviation is 16.
  • Can you reject the null hypothesis that the
    population mean is less than or equal to zero at
    the .05 significance level?
  • Can you reject the null hypothesis that the
    population mean is less than or equal to 18 at
    the .05 significance level?

3
Answer to the Analytic Tool
  • Suppose that you have a sample of 64 individuals,
    the sample mean is 20, the sample standard
    deviation is 16.
  • The critical value of the t-statistic at the .05
    significance level (one-tailed) with 63 degrees of
    freedom is about 1.67.
  • To determine whether the sample mean is
    significantly greater than zero, we calculate
  • $t = \frac{20 - 0}{16 / \sqrt{64}} = \frac{20}{2} = 10$
  • Since $t > 1.67$, we can reject the null
    hypothesis.
  • To determine whether the sample mean is
    significantly greater than 18, we calculate
  • $t = \frac{20 - 18}{16 / \sqrt{64}} = \frac{2}{2} = 1$
  • Since $t < 1.67$, we cannot reject the null
    hypothesis.
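
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `one_sample_t` is made up for illustration, and the critical value is hard-coded from a t-table (roughly 1.67 for a one-tailed test at .05 with 63 degrees of freedom) rather than computed.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic for H0: population mean <= hypothesized_mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Assumption: value read from a t-table, one-tailed, alpha = .05, 63 df.
CRITICAL_T = 1.67

t_zero = one_sample_t(20, 0, 16, 64)       # test against 0
t_eighteen = one_sample_t(20, 18, 16, 64)  # test against 18

reject_zero = t_zero > CRITICAL_T
reject_eighteen = t_eighteen > CRITICAL_T

print(t_zero, reject_zero)
print(t_eighteen, reject_eighteen)
```

The standard error is the same in both tests (16 / 8 = 2), so only the distance between the sample mean and the hypothesized value changes the outcome.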

4
Agenda
  • Today we will begin to learn how to investigate
    the relationship between two continuous
    variables.
  • You will learn
  • 1) how to graphically present the relationship
    between two variables
  • 2) how to measure the correlation between two
    variables

5
Review
  • Thus far, we have learned how to conduct three
    general types of hypothesis tests.
  • 1) Hypothesis tests concerning the sample mean
    of a continuous random variable.
  • e.g. Is the mean income in the U.S. greater
    than 40,000?
  • 2) Hypothesis tests concerning the difference in
    the means of two samples of a continuous random
    variable.
  • e.g. Is the mean income for men greater than
    the mean income for women?
  • 3) Hypothesis tests concerning the independence
    of two categorical variables.
  • e.g. Are race and vote choice independent?

6
Introduction
  • Suppose that you have two continuous variables
    measured on each observation in a sample.
  • You would like to know whether these two
    variables are related.
  • How would you proceed based on what you've been
    taught so far?

7
Possible Methods (based on what we've covered so far)
  • Option 1. Split one of the variables at some
    critical value (say the median). Then test for a
    difference in the means of the second variable
    between the observations above and below the
    critical value.
  • e.g. Test whether cities with large percent
    increases in government expenditures had higher
    or lower percent changes in unemployment than
    cities with small percent changes in government
    expenditures.
  • Option 2. Divide both variables into a small
    number of categories (say quartiles or
    quintiles). Then perform a chi-squared test to
    see if the categorized variables are independent.
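
Option 1 might look like the following sketch. The city data are invented purely for illustration, and the unequal-variance (Welch) form of the t statistic is an assumption; the slides do not say which two-sample test was taught.

```python
import math
import statistics

# Made-up illustrative data, one tuple per city:
# (percent change in government expenditures, percent change in unemployment)
cities = [(1.0, 0.5), (2.0, 0.1), (3.5, -0.2), (4.0, -0.4),
          (6.0, -0.9), (7.5, -1.1), (8.0, -1.0), (9.0, -1.6)]

# Split the first variable at its median.
median_x = statistics.median(x for x, _ in cities)
low = [y for x, y in cities if x <= median_x]   # small spending changes
high = [y for x, y in cities if x > median_x]   # large spending changes

def welch_t(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

t_stat = welch_t(high, low)
print(median_x, statistics.mean(low), statistics.mean(high), t_stat)
```

Note what the split discards: every city above the median is treated the same, whether its spending rose a little more than the median or a great deal more. That loss of information is one motivation for the measures introduced on the following slides.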

8
Graphical Representation of the Data
  • One way to analyze relationships between two
    continuous variables is with a scatterplot.
  • A scatterplot is a type of diagram that displays
    the covariation of two continuous variables as a
    set of points on a Cartesian coordinate system.

9
Interpretation of the Scatterplot
  • A positive relationship between the two variables
    occurs when an increase in the variable
    represented on the x-axis corresponds to an
    increase in the variable represented on the
    y-axis.
  • A negative relationship between the two variables
    exists when an increase in the variable
    represented on the x-axis corresponds to a
    decrease in the variable represented on the
    y-axis.

10
Interpretation of the Scatterplot
  • A curvilinear relationship exists if a change in
    the variable on the x-axis has a different effect
    on the variable represented along the y-axis,
    depending on the value of x (or y).
  • No relationship exists if a change in the
    variable represented along the x-axis does not
    correspond to a change in the variable along the
    y-axis.

11
Group Analytic Tool
  • How would you devise a statistic to determine
    whether there was a positive or negative
    relationship?

12
Covariance
  • Covariance is a statistical measure of the
    relationship between two samples of two
    variables.
  • $\mathrm{Cov}(X, Y) = \frac{\sum_i (Y_i - \mathrm{Mean}(Y))(X_i - \mathrm{Mean}(X))}{N - 1}$
  • If your relationship is positive, then the
    covariance will be positive: large values of X
    will be associated with large values of Y and
    small values of X will be associated with small
    values of Y.
  • If your relationship is negative, then the
    covariance will be negative: large values of X
    will be associated with small values of Y and
    small values of X will be associated with large
    values of Y.
  • If there is no relationship, then the covariance
    will be close to zero: large values of X will be
    associated with both large and small values of Y
    and small values of X will be associated with
    both large and small values of Y.

13
Computing the Covariance
  • Note that the equation for covariance can be
    written in multiple ways.
  • The intuitive expression
  • $\mathrm{Cov}(X, Y) = \frac{\sum_i (Y_i - \mathrm{Mean}(Y))(X_i - \mathrm{Mean}(X))}{N - 1}$
  • is equivalent to
  • $\mathrm{Cov}(X, Y) = \frac{N \sum_i X_i Y_i - (\sum_i X_i)(\sum_i Y_i)}{(N - 1) N}$
  • The second expression may be easier to use for
    calculations in Excel.
  • Note: It is useful to calculate results yourself
    rather than with Excel's canned function
    because (at least with my version) Excel assumes
    that it is computing the covariance of two
    populations and uses n as the denominator rather
    than n-1.
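
As a quick check that the two expressions agree, here is a minimal Python sketch. The function names and the five data points are illustrative assumptions, not part of the lecture; `cov_population` shows the n-denominator variant that the note above warns about.

```python
def cov_deviation(xs, ys):
    """Sample covariance from deviations about the means (N - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def cov_computational(xs, ys):
    """Same quantity from raw sums: [N*sum(XY) - sum(X)*sum(Y)] / [(N - 1)*N]."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum(xs) * sum(ys)) / ((n - 1) * n)

def cov_population(xs, ys):
    """Population version: divides by N instead of N - 1."""
    n = len(xs)
    return cov_deviation(xs, ys) * (n - 1) / n

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(cov_deviation(xs, ys))      # 1.5
print(cov_computational(xs, ys))  # 1.5
print(cov_population(xs, ys))     # smaller, because it divides by N
```

With only five observations the gap between the N and N - 1 versions is large (a factor of 4/5); it shrinks as the sample grows.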

14
Comments on Covariance
  • Covariance is a very good indicator of the
    direction of the relationship between two
    variables.
  • Covariance is not a good measure of the
    'magnitude' of the relationship. This is because
    covariance is sensitive to the scale of the
    variables under investigation.
  • Note: you can see this if you simply multiply
    both variables by a constant and compare the
    covariances.
  • So, it is not proper to compare the covariances
    from two different data sets to see if the
    relationship is stronger in one case than the
    other.
  • How would you improve covariance to make your
    findings less sensitive to scale?
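
The scale sensitivity is easy to see numerically. In this sketch (made-up data, hypothetical function name) both variables are multiplied by 10, which is what happens when, say, a unit of measurement changes.

```python
def sample_covariance(xs, ys):
    """Sample covariance with an N - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

base = sample_covariance(xs, ys)
scaled = sample_covariance([10 * x for x in xs], [10 * y for y in ys])

print(base)           # 1.5
print(scaled)         # 150.0
print(scaled / base)  # 100.0
```

The relationship between the variables is unchanged, yet the covariance grows by a factor of 10 × 10 = 100. More generally, Cov(aX, bY) = ab Cov(X, Y).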

15
Correlation
  • Correlation is a statistical measure of
    association closely related to covariance.
  • The correlation coefficient, denoted $R_{XY}$ or
    just R, is defined as
  • $R_{XY} = \frac{\sum_i (Y_i - \mathrm{Mean}(Y))(X_i - \mathrm{Mean}(X))}{\sqrt{\sum_i (X_i - \mathrm{Mean}(X))^2 \sum_i (Y_i - \mathrm{Mean}(Y))^2}} = \frac{\mathrm{Cov}(X, Y)}{\text{Standard Deviation}(X) \cdot \text{Standard Deviation}(Y)}$
  • $R_{XY}$ by definition can only take values
    between -1 and 1.
  • The larger the absolute value of $R_{XY}$, the
    stronger the relationship between X and Y. If
    $R_{XY} = 1$, then X and Y are positively related
    and X is a perfect predictor of Y.
  • If $R_{XY} = -1$, then X and Y are negatively
    related and X is a perfect predictor of Y.
  • If $R_{XY} = 0$, then X and Y have no linear
    relationship.
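
The definition translates directly into code. The sketch below uses made-up data and a hypothetical `correlation` helper to illustrate three points: the extremes of +1 and -1 for perfectly linear data, the fact that rescaling both variables leaves R unchanged, and a curvilinear case where R is zero even though Y is completely determined by X.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: sum of cross-deviations over the square root
    of the product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

r_pos = correlation(xs, [2 * x + 1 for x in xs])  # perfect positive line
r_neg = correlation(xs, [-3 * x for x in xs])     # perfect negative line

# Rescaling both variables does not change the correlation.
ys = [2, 4, 5, 4, 5]
r_base = correlation(xs, ys)
r_scaled = correlation([10 * x for x in xs], [10 * y for y in ys])

# Curvilinear case: y = x^2 on values of x symmetric about zero.
curve_x = [-2, -1, 0, 1, 2]
r_curve = correlation(curve_x, [x ** 2 for x in curve_x])

print(r_pos, r_neg, r_base, r_scaled, r_curve)
```

The last case is the reason a correlation of zero should be read as "no linear relationship" rather than "no relationship at all"; a scatterplot would reveal the curve immediately.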

16
Correlation cont.
  • $R_{XY}$ by definition can only take values
    between -1 and 1.
  • The larger the absolute value of $R_{XY}$, the
    stronger the relationship between X and Y. If
    $R_{XY} = 1$, then X and Y are positively related
    and X is a perfect predictor of Y (and Y is a
    perfect predictor of X).
  • If $R_{XY} = -1$, then X and Y are negatively
    related and X is a perfect predictor of Y (and Y
    is a perfect predictor of X).
  • If $R_{XY} = 0$, then X and Y have no linear
    relationship.

17
Comments on Correlation
  • The correlation coefficient provides a very
    useful summary of the relationship between X and
    Y.
  • But it takes real effort to use knowledge of
    the correlation coefficient and the value of X
    (or Y) to make predictions about the value of Y
    (or X).
  • Additionally, correlation does not imply
    causation.
  • e.g. In our original example, do changes in
    government expenditures cause changes in
    unemployment, or do changes in unemployment cause
    changes in government expenditures?

18
Getting at Causation
  • When we do statistical analyses, we generally
    have to make assumptions about what constitutes a
    cause and what constitutes an effect.
  • That is, we make a formal statement about our
    hypothesized relationship like
  • $Y_i = f(\text{stuff})$, where Y is the dependent
    variable and stuff is the set of independent
    variables.
  • If we are clever, we can estimate the effect of
    stuff (and that's what we will be talking about
    for the next few weeks) to test whether it has a
    statistically significant influence on Y.
  • If we are really clever, can we test for
    causality as well? How?

19
Getting at Causation cont.
  • In order for a variable to be a cause, it is
    necessary (but not sufficient) for the variable
    to occur prior to the effect.
  • Possible Research Designs to Examine Causality.
  • - For a dependent variable that doesn't change
    much:
  • Measure a stable set of individual-level
    characteristics (e.g. race, gender, parents' value
    for the dependent variable), then examine which
    stable characteristics explain variation in your
    sample.
  • - For a dependent variable that does change:
  • Measure the dependent variable and the
    independent variables at time t, retake the
    measurements for the same sample at time t+1,
    then examine whether changes (stability) in the
    independent variables led to changes (stability)
    in the dependent variable. (Note: ideally, you'd
    show that changes in X occurred before changes in
    Y.)
  • - For a dependent variable that does change
  • Cohort Analysis