Title: Correlation


1
Correlation
  • Chapter 9

2
Correlation
  • Relationship or association between variables
  • If two variables are related, knowing something
    about one of them tells us something about the
    other
  • Ex. The relationship between height and weight
  • Correlation coefficient
  • A measure of the relationship between variables
  • Pearson product-moment correlation coefficient
    (r)
  • The most common correlation coefficient

3
Graphing Relationships
  • Scatterplot (scatter diagram)
  • A figure in which the individual data points are
    plotted in two-dimensional space (Xi , Yi)
  • The coordinates (Xi, Yi) are the individual's
    scores on X and Y

[Figure: scatterplot axes labeled X and Y]
4
Correlation Terms
  • The idea with correlation is that we want to be
    able to predict something about one of the
    variables by knowing something about the other
  • For correlation we call X the predictor variable
  • The variable from which a prediction is made
  • We call Y the criterion variable
  • The variable to be predicted

5
Correlation Terms
  • Correlations are standardized using the
    standard deviation
  • So they range from -1 to 1
  • A correlation of -1 or 1 means perfect prediction
    or relationship and a correlation of 0 means no
    relationship
  • The relationships are strongest near the extremes
    and weakest near zero
  • Negative Relationships
  • As the values of one variable go up the other
    goes down
  • Positive Relationships
  • As the values of one variable go up the other
    goes up

6
Strong Positive Relationship
7
Hours studying and problems missed
Strong Negative Relationship
[Figure: scatterplot of Hours Studying against Problems missed]
Correlations
                                Hours    Missed
Hours    Pearson Correlation    1.000    -.973
         Sig. (2-tailed)        .        .000
         N                      10       10
Missed   Pearson Correlation    -.973    1.000
         Sig. (2-tailed)        .000     .
         N                      10       10
Correlation is significant at the 0.01 level (2-tailed).
8
Hours I studied and problems you missed
No Relation
[Figure: scatterplot of Problems you missed against Hours Studied]
Correlations
                                      Hours    Missed
Hours stud.     Pearson Correlation   1.000    .(a)
                Sig. (2-tailed)       .        .
                N                     10       10
Prob. missed    Pearson Correlation   .(a)     .(a)
                Sig. (2-tailed)       .        .
                N                     10       10
a. Cannot be computed because at least one of the variables is constant.
9
Types of Relationships
  • The relationship between X and Y can be linear or
    curvilinear
  • We usually are dealing with linear relationships
  • Linear relationships
  • A situation in which the line that best fits the
    points (in a scatterplot) is a straight line
  • Curvilinear relationship
  • A situation in which the line that best fits the
    points (in a scatterplot) is not a straight line

10
Covariance
  • Correlations are based on covariation between X
    and Y
  • Covariance is a statistic representing the degree
    to which 2 variables vary together
  • $\mathrm{cov}_{XY} = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{n}}{n - 1}$
  • The covariance is based on how far an observation
    deviates from the mean on EACH variable
  • The covariation can be negative

11
Example
  • Compute the covariance for the following data
  • Hours Studied    Score on Exam
  • 6                90
  • 8                95
  • 2                70
  • 4                80
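
One way to work this example is sketched below. It applies the computational formula from the Covariance slide, cov = (ΣXY − ΣXΣY/n) / (n − 1), to the four data points above and checks the result against the equivalent deviation-score form. The function names are illustrative and are not part of the original slides.

```python
# Data from the slide: hours studied (X) and exam score (Y).
hours = [6, 8, 2, 4]
scores = [90, 95, 70, 80]


def covariance(x, y):
    """Sample covariance via the computational formula."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)


def covariance_from_deviations(x, y):
    """Same quantity, written as summed cross-products of deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


cov_xy = covariance(hours, scores)
print(round(cov_xy, 2))  # 28.33
```

The positive value says that hours studied and exam score tend to sit on the same side of their means together.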

12
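Pearson's r
  • Because the covariance depends on the standard
    deviations of X and Y, we use the correlation, which
    is standardized by the standard deviations
  • $r = \dfrac{\mathrm{cov}_{XY}}{s_X s_Y}$
  • $r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}}$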

13
Example
  • Compute the correlation using the following data
  • Undergrad GPA    GRE Total
  • 3.8              2350
  • 2.8              1740
  • 3.2              2100
  • 3.5              2230
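
A minimal sketch of this computation follows. It uses the computational form of Pearson's r, with ΣXY − ΣXΣY/n in the numerator and the square root of (ΣX² − (ΣX)²/n)(ΣY² − (ΣY)²/n) in the denominator. The function name `pearson_r` is an illustrative choice, not something defined in the slides.

```python
from math import sqrt

# Data from the slide: undergraduate GPA (X) and GRE total (Y).
gpa = [3.8, 2.8, 3.2, 3.5]
gre = [2350, 1740, 2100, 2230]


def pearson_r(x, y):
    """Pearson's r via the computational (raw-score) formula."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = sum_xy - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return numerator / sqrt(ss_x * ss_y)


r = pearson_r(gpa, gre)
print(round(r, 3))  # 0.977
```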

14
Factors that affect the correlation
  • The correlation can be affected by three factors
  • Restriction of range in X and Y
  • Nonlinearity of the relationship between X and Y
  • Heterogeneous sub-samples

15
Restriction of range
  • Range restrictions
  • Cases where the range over which X and Y vary
    is artificially limited
  • Ex. College GPA and SAT scores
  • The problem is that usually only people with high
    SATs are allowed into college, thus restricting
    the possible values of SAT scores we can use
  • We want to know if SAT scores have a relationship
    with how suitable one is for college, but we are
    really only answering that question using people
    who got into college
  • May cause r to increase or decrease; normally it
    reduces r
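
The effect can be illustrated with simulated data. The sketch below is not from the slides: the population correlation of .60, the sample size, the admission cutoff of one standard deviation above the mean, and the variable names are all assumptions chosen for illustration. It compares r in the full sample with r among "admitted" cases only.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


random.seed(0)   # fixed seed so the run is repeatable
true_r = 0.6     # assumed population correlation (hypothetical)

# Simulated standardized SAT scores and college GPAs for all applicants.
sat = [random.gauss(0, 1) for _ in range(5000)]
gpa = [true_r * s + sqrt(1 - true_r ** 2) * random.gauss(0, 1) for s in sat]

# Keep only applicants above a hypothetical admission cutoff.
admitted = [(s, g) for s, g in zip(sat, gpa) if s > 1.0]

r_full = pearson_r(sat, gpa)
r_restricted = pearson_r([s for s, _ in admitted], [g for _, g in admitted])
print(round(r_full, 2), round(r_restricted, 2))
```

With these settings the restricted correlation comes out well below the full-sample one, which is the usual direction of the effect.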

16
Nonlinearity
  • A straight line doesn't best fit our data
  • Ex. Height with age
  • Height goes up with age, but only to a certain
    point; it then levels off or decreases, thus
    reducing our r (which measures linear relations)
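
A small sketch of this point, using made-up numbers: height is assumed to rise by a constant amount per year until age 18 and then stay flat. The starting height, growth rate, and age range are hypothetical and serve only to show how the plateau pulls r below 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


ages = list(range(0, 41))
# Hypothetical heights in cm: linear growth to age 18, then a plateau.
heights = [50 + 7 * min(age, 18) for age in ages]

r_all = pearson_r(ages, heights)                 # whole age range
r_growing = pearson_r(ages[:19], heights[:19])   # ages 0 to 18 only
print(round(r_growing, 2), round(r_all, 2))
```

Over the growth years alone the relationship is perfectly linear, so r is 1; including the plateau gives a smaller r even though height still depends on age in a perfectly predictable way.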

[Figure: Height plotted against Age]
17
Heterogeneous Sub-Samples
  • Data from the sample of observations could be
    subdivided into 2 distinct sets on the basis of
    some other variable
  • Ex. Height and weight of U.S. adults
  • A possible sub-group would be males and females
  • If we collapse across sex we might get a correlation
    of .78, but if we look at the correlations for
    males (.60) and females (.49) separately we
    find a different pattern
  • Be careful when combining data from various
    sources
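
The pattern can be reproduced with a toy data set. The heights and weights below are invented for illustration and are not the data behind the .78, .60, and .49 figures on the slide. Each group has a moderate correlation on its own, but the difference in group means stretches the pooled scatter and inflates the pooled r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical heights (cm) and weights (kg) for two sub-samples.
male_height, male_weight = [170, 175, 180, 185], [75, 70, 85, 80]
female_height, female_weight = [155, 160, 165, 170], [55, 50, 65, 60]

r_male = pearson_r(male_height, male_weight)
r_female = pearson_r(female_height, female_weight)
r_pooled = pearson_r(male_height + female_height,
                     male_weight + female_weight)
print(round(r_male, 2), round(r_female, 2), round(r_pooled, 2))  # 0.6 0.6 0.87
```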

18
Not all correlations are meaningful
  • Not all results you find are meaningful even if
    they are strong
  • There is a significant positive correlation
    between ice cream consumption and the number of
    deaths due to drowning
  • Does this mean ice cream consumption causes
    drowning? No!
  • There is a third variable that is responsible for
    this relationship: hot weather
  • Correlations usually don't explain causation

19
Hypothesis testing with r
  • Population correlation coefficient rho ($\rho$)
  • The correlation coefficient for the population
  • The null hypothesis
  • $H_0: \rho = 0$
  • The alternative hypothesis
  • $H_1: \rho \neq 0$
  • Use Table E.2 to get the critical value (CV)
  • Use alpha and df (df = n - 2) to get the CV
  • If the absolute value of the correlation exceeds
    the critical value, reject the null
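
As a rough sketch of the decision rule, the code below tests the r = -.973, n = 10 result from the hours-studied example on slide 7. Table E.2 is not reproduced in this transcript, so the sketch makes an assumption: it starts from the standard two-tailed critical t for alpha = .05 and df = 8 (2.306) and converts it to a critical r with r = t / sqrt(t² + df). A critical-r table should give the same cutoff, but that has not been checked against Table E.2 itself.

```python
from math import sqrt

r = -0.973   # sample correlation from the slide 7 output
n = 10
df = n - 2

# Two-tailed critical t for alpha = .05 and df = 8, taken from a standard
# t table (an assumption standing in for Table E.2).
t_crit = 2.306

# Equivalent test statistic and critical value of r.
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)
r_crit = t_crit / sqrt(t_crit ** 2 + df)

reject_null = abs(r) > r_crit
print(round(t_stat, 2), round(r_crit, 3), reject_null)  # -11.92 0.632 True
```

Comparing |r| with the critical r and comparing |t| with the critical t lead to the same decision, because each is a monotonic transformation of the other for fixed df.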

20
Example
  • We are interested in whether the number of
    Friends episodes you have watched is related to
    the number of hours you have studied. Test the
    hypothesis that these variables are related. Set
    $\alpha = .05$

21
Intercorrelation Matrix
  • A table (matrix) showing the pairwise
    correlations between all variables

Correlations
                                  HOURS    ICECREAM   DROWNING   MISSED
HOURS      Pearson Correlation    1.000    1.000      .968       -.973
           Sig. (2-tailed)        .        .000       .000       .000
           N                      10       10         10         10
ICECREAM   Pearson Correlation    1.000    1.000      .968       -.973
           Sig. (2-tailed)        .000     .          .000       .000
           N                      10       10         10         10
DROWNING   Pearson Correlation    .968     .968       1.000      -.908
           Sig. (2-tailed)        .000     .000       .          .000
           N                      10       10         10         10
MISSED     Pearson Correlation    -.973    -.973      -.908      1.000
           Sig. (2-tailed)        .000     .000       .000       .
           N                      10       10         10         10
Correlation is significant at the 0.01 level (2-tailed).
22
Correlations with Ranked data
  • Data for which the observations have been
    replaced with their numerical ranks from lowest
    to highest
  • Ex. Rank these applications in terms of
    acceptability, rank them in terms of resume
    clarity, and correlate clarity with acceptability
  • To correlate ranked data we use Spearman's
    correlation coefficient for ranked data (rs)
  • Also called Spearman's rho
  • This is not the best technique for ranked data
    but it is the most common one

23
Computing Spearman's Rho
  • To compute the correlation for ranked data, you
    can use the Pearson formula
  • Spearman's rho measures the linearity between the
    ranks
  • Monotonic relationship
  • A relationship represented by a line that is
    continually increasing or decreasing but perhaps
    not in a straight line
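
A minimal sketch of this idea: convert each variable to ranks (lowest = 1), then apply the Pearson formula to the ranks. The tie handling (tied values share their average rank) is a common convention assumed here, not something stated on the slide, and the data are hypothetical. The example uses a curved but monotonic relationship, so rho is 1 while Pearson's r on the raw scores is below 1.

```python
from math import sqrt


def average_ranks(values):
    """Rank from lowest (1) to highest; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson formula applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # hypothetical: always increasing, but curved

r_raw = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r_raw, 3), round(rho, 3))
```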

24
Other Correlation Coefficients
  • Point biserial correlation (rpb)
  • The correlation coefficient when one of the
    variables is dichotomous and the other is
    continuous
  • Dichotomous variables can only have 2 different
    values (e.g. Yes/No)
  • Compute using the Pearson formula
  • Phi ($\phi$)
  • The correlation coefficient used when both of the
    variables are measured as dichotomies
  • Compute using the Pearson formula
  • See the table on p.164
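
Because both coefficients are computed with the Pearson formula, one helper covers them once the dichotomous variable is coded 0/1. The sketch below uses invented data (pass/fail, hours studied, attended/did not attend); none of it comes from the slides or from the table on p.164. As a cross-check, phi is also computed from the 2x2 cell counts with the usual (ad − bc) / sqrt((a+b)(c+d)(a+c)(b+d)) formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


# Point biserial: one dichotomous (0 = fail, 1 = pass), one continuous.
passed = [0, 0, 0, 1, 1, 1]
hours = [2, 4, 3, 6, 8, 7]
r_pb = pearson_r(passed, hours)

# Phi: both variables dichotomous (hypothetical attendance and pass data).
attended = [1, 1, 1, 0, 0, 0, 1, 0]
passed_exam = [1, 1, 0, 0, 0, 1, 1, 0]
phi = pearson_r(attended, passed_exam)

# Cross-check phi against the 2x2 cell-count formula.
pairs = list(zip(attended, passed_exam))
a, b = pairs.count((1, 1)), pairs.count((1, 0))
c, d = pairs.count((0, 1)), pairs.count((0, 0))
phi_cells = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(r_pb, 3), round(phi, 3))  # 0.926 0.5
```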

25
Final Example
  • The following are the scores of job applicants on
    a cognitive ability test (CA) and ratings from an
    interview (IR); both are out of 100
  • CA    IR
  • 75    80
  • 98    96
  • 89    87
  • 67    72
  • Using Pearson's r, test the hypothesis that there
    is a relationship between cognitive ability and
    interview rating