Pearson's correlation

Diane S. Mendoza

- It is named after Karl Pearson, who developed the correlational method to do agricultural research.
- It is designated by the Greek letter rho (ρ).
- The "product moment" part of the name comes from the way in which it is calculated: by summing up the products of the deviations of the scores from the mean.
- A correlation is a number between -1 and 1 that measures the degree of association between two variables (call them X and Y).
- A positive value for the correlation implies a positive association.
- A negative value for the correlation implies a negative or inverse association.

The formula for the Pearson correlation

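Suppose we have two variables X and Y, with means XBAR and YBAR respectively and standard deviations SX and SY respectively. The correlation is computed as the sum of the products of the Z-scores for the two variables divided by the number of scores, N:

$$ r = \frac{\sum z_X z_Y}{N}, \qquad z_X = \frac{X - \bar{X}}{S_X}, \quad z_Y = \frac{Y - \bar{Y}}{S_Y} $$

Because the sum is divided by N, SX and SY here are the standard deviations computed with N (not N - 1) in the denominator.

If we substitute the formulas for the Z-scores into this formula, we get the following formula for the Pearson Product Moment Correlation Coefficient, which we will use as a definitional formula:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \, S_X \, S_Y} $$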

The numerator of this formula says that we sum up the products of the deviation of each subject's X score from the mean of the Xs and the deviation of that subject's Y score from the mean of the Ys. This sum of the products of the deviation scores is divided by the number of subjects times the standard deviation of the X variable times the standard deviation of the Y variable.
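As a minimal sketch of the definitional formula described above, the following Python function sums the products of the deviation scores and divides by N times the two standard deviations. The function name `pearson_definitional` and the scores in `x` and `y` are hypothetical, chosen only for illustration; they are not the data from these slides. The standard deviations are assumed to be computed with N in the denominator.

```python
import math

def pearson_definitional(xs, ys):
    """Pearson r from the definitional (deviation-score) formula."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations with N in the denominator (assumption stated above).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    # Numerator: sum of the products of the paired deviation scores.
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_products / (n * sd_x * sd_y)

# Hypothetical scores for five subjects (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_definitional(x, y)
print(round(r, 3))
```

For these made-up scores the sum of products is 6 and N times the two standard deviations is the square root of 60, so r comes out at roughly 0.77.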

- When will a correlation be positive?
- Suppose that an X value was above average, and that the associated Y value was also above average. Then the product would be the product of two positive numbers, which would be positive.
- If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive.
- Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.

- When will a correlation be negative?
- Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product would be the product of a positive and a negative number, which would make the product negative.
- If the X value was below average and the Y value was above average, then the product above would also be negative.
- Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.
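The sign argument above can be checked with a short sketch that lists the product of deviations for each subject. The helper name `deviation_products` and both data sets are hypothetical, invented for illustration: in one, Y rises with X; in the other, Y falls as X rises.

```python
def deviation_products(xs, ys):
    """Return (X - mean of X) * (Y - mean of Y) for each paired score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

# Hypothetical data: large X goes with large Y.
rising = deviation_products([1, 2, 3, 4], [10, 20, 30, 40])
# Hypothetical data: large X goes with small Y.
falling = deviation_products([1, 2, 3, 4], [40, 30, 20, 10])

print(rising)   # every product is positive, so the sum is positive
print(falling)  # every product is negative, so the sum is negative
```

Real data usually mix positive and negative products; the sign of the correlation follows whichever kind dominates the sum.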

Interpretation of the correlation coefficient

The correlation coefficient measures the strength of a linear relationship between two variables. The correlation coefficient is always between -1 and 1. The closer the correlation is to +1 or -1, the closer the relationship is to a perfect linear relationship. Here is how to interpret correlations.

- -1.0 to -0.7: strong negative association.
- -0.7 to -0.3: weak negative association.
- -0.3 to 0.3: little or no association.
- 0.3 to 0.7: weak positive association.
- 0.7 to 1.0: strong positive association.
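The interpretation scale above can be written as a small lookup, sketched here under one assumption: the ranges in the scale share their endpoints (for example, 0.7 ends one range and starts the next), so this sketch arbitrarily assigns a boundary value to the stronger category. The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the verbal scale given above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    strength = abs(r)
    # Assumption: boundary values (0.3, 0.7) fall in the stronger category.
    if strength < 0.3:
        return "little or no association"
    direction = "positive" if r > 0 else "negative"
    label = "strong" if strength >= 0.7 else "weak"
    return f"{label} {direction} association"

print(describe_correlation(-0.36))
print(describe_correlation(0.85))
print(describe_correlation(0.1))
```

These cut-offs are a rule of thumb for reading a coefficient, not a statistical test.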

- Let's calculate the correlation between Reading (X) and Spelling (Y) for the 10 students. There is a fair amount of calculation required, as you can see from the table below. First we have to sum up the X values (55) and then divide this number by the number of subjects (10) to find the mean for the X values (5.5). Then we have to do the same thing with the Y values to find their mean (10.3).

Formula

We then calculate

The correlation we obtained was -0.36, showing us that there is a weak negative correlation between reading and spelling. The correlation coefficient is a number that can range from -1 (perfect negative correlation) through 0 (no correlation) to 1 (perfect positive correlation).

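The computational (raw score) formula for the Pearsonian r is

$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} $$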

- By looking at the formula we can see that we need the following items to calculate r using the raw score formula:
- The number of subjects, N
- The sum of each subject's X score times the Y score, summation XY
- The sum of the X scores, summation X
- The sum of the Y scores, summation Y
- The sum of the squared X scores, summation X squared
- The sum of the squared Y scores, summation Y squared


If we plug each of these sums into the raw score formula, we can calculate the correlation coefficient.
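A minimal sketch of the raw score calculation, assuming the standard computational form of the formula, is given below. It builds the six quantities listed above (N, summation XY, summation X, summation Y, summation X squared, summation Y squared) and combines them. The function name `pearson_raw_score` and the scores are hypothetical; the table for the 10 students is not reproduced here, so this example does not reproduce the -0.36 result.

```python
import math

def pearson_raw_score(xs, ys):
    """Pearson r from the raw score (computational) formula."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_x = sum(xs)                               # summation X
    sum_y = sum(ys)                               # summation Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # summation XY
    sum_x2 = sum(x * x for x in xs)               # summation X squared
    sum_y2 = sum(y * y for y in ys)               # summation Y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator

# Hypothetical scores for five subjects (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_raw_score(x, y)
print(round(r, 3))
```

With these made-up scores the sums are N = 5, summation XY = 66, summation X = 15, summation Y = 20, summation X squared = 55 and summation Y squared = 86, giving a numerator of 30 and a denominator equal to the square root of 1500.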

We can see that we got the same answer for the correlation coefficient (-0.36) with the raw score formula as we did with the definitional formula.

Thank you!

