Title: Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
1Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
2Variation
- Variability: the extent to which the numbers in a data set are dissimilar (different) from each other
- When all elements measured receive the same score (e.g., everyone in the data set is the same age, in years), there is no variability in the data set
- As the scores in a data set become more dissimilar, variability increases
3Variation Range
- The range tells us the span over which the data are distributed, and is only a very rough measure of variability
- Range: the difference between the maximum and minimum scores
- Example: the youngest student in a class is 19 and the oldest is 46. Therefore, the age range of the class is 46 - 19 = 27 years.
4Variation
- This is an example of data with NO variability:
  $X$    $X - \bar{X}$
  5      0.00
  5      0.00
  5      0.00
  5      0.00
  5      0.00
- $\sum X = 25$, $n = 5$, $\bar{X} = 5$
5Variation
- This is an example of data with low variability:
  $X$    $X - \bar{X}$
  6      1.00
  4      -1.00
  6      1.00
  5      0.00
  4      -1.00
- $\sum X = 25$, $n = 5$, $\bar{X} = 5$
6Variation
- X
- 8 3.00 This is an example of data
- 1 -4.00 with higher variability
- 9 4.00
- 5 0.00
- 2 -3.00
- 25 n 5 5
7Note
- Let's say we wanted to figure out the average deviation from the mean. Normally, we would want to sum all deviations from the mean and then divide by n, i.e., $\frac{\sum (X - \bar{X})}{n}$
- (recall: look at your formula for the mean from last lecture)
- BUT we have a problem: $\sum (X - \bar{X})$ will always add up to zero
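The problem described above can be checked numerically. The following is a minimal Python sketch, using the three example data sets from the earlier slides; the names `datasets`, `deviations`, and `deviation_sums` are illustrative choices, not part of the lecture.

```python
# The three example data sets from the slides (each has a mean of 5).
datasets = {
    "no variability": [5, 5, 5, 5, 5],
    "low variability": [6, 4, 6, 5, 4],
    "higher variability": [8, 1, 9, 5, 2],
}

def deviations(scores):
    """Return each score's deviation from the mean of the scores."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

# Sum the deviations for each data set; every total comes out as zero.
deviation_sums = {label: sum(deviations(scores)) for label, scores in datasets.items()}

for label, total in deviation_sums.items():
    print(f"{label}: sum of deviations = {total:.2f}")
```

Whatever the spread of the data, the positive and negative deviations cancel, which is why the plain average deviation cannot serve as a measure of variability.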
8Variation
- However, if we square each of the deviations from the mean, we obtain a sum that is not equal to zero (unless every score is identical)
- This is the basis for the measures of variance and standard deviation, the two most common measures of variability of data
9Variation
  $X$    $X - \bar{X}$    $(X - \bar{X})^2$
  8      3.00             9.00
  1      -4.00            16.00
  9      4.00             16.00
  5      0.00             0.00
  2      -3.00            9.00
  25     0.00             50.00    (column totals)
- Note: $\sum (X - \bar{X})^2$ is called the Sum of Squares
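The Sum of Squares for this table can be reproduced in a few lines. This is a small sketch in Python; the variable names are illustrative only.

```python
scores = [8, 1, 9, 5, 2]  # the higher-variability data set from the slide

mean = sum(scores) / len(scores)

# Square each deviation from the mean, then add the squares together.
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

print(squared_deviations)
print(sum_of_squares)
```

Squaring removes the signs, so the deviations no longer cancel and the total reflects how spread out the scores are.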
10Variance of a Population
- VARIANCE OF A POPULATION: the sum of squared deviations from the mean divided by the number of scores, written $\sigma^2$ (sigma squared)
- $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
11Population Standard Deviation
- Square root of the population variance: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
12Sample Variance
- SAMPLE VARIANCE: the sum of squared deviations from the mean divided by the number of degrees of freedom, n - 1 (an estimate of the population variance), written $s^2$
- $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
13Sample Standard Deviation
- Square root of the sample variance: $s = \sqrt{s^2}$
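The four definitions above differ only in the denominator and in whether a square root is taken. A minimal Python sketch, assuming the five-score data set used earlier in the lecture; the function names are hypothetical and chosen for readability.

```python
import math

def population_variance(scores):
    """Sum of squared deviations from the mean, divided by the number of scores."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

scores = [8, 1, 9, 5, 2]

# Standard deviation is the square root of the matching variance.
population_sd = math.sqrt(population_variance(scores))
sample_sd = math.sqrt(sample_variance(scores))

print(population_variance(scores), round(population_sd, 2))
print(sample_variance(scores), round(sample_sd, 2))
```

For the same data the sample versions are always the larger of the two, because the same Sum of Squares is divided by a smaller number.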
14Why use Standard Deviation and not Variance!??!
- Normally, you will only calculate variance in order to calculate standard deviation, as standard deviation is what we typically want
- Why? Because standard deviation expresses variability in the same units as the data (variance is in squared units)
- Example: the standard deviation of ages in a class is 3.7 years
15Variance
- The above formulae are definitional: they are the mathematical representation of the concepts of variance and standard deviation
- When calculating variance and standard deviation (especially when doing so by hand), the following computational formulae are easiest to use (trust us, they really are easier to use). You should, however, have a good understanding of the definitional formulae
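16Population Variance
- $\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
17Population Standard Deviation
- $\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$
18Sample Variance
- $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$
19Sample Standard Deviation
- $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$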
20Sample Standard Deviation Example
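- Data:
  $X$    $X^2$
  8      64
  1      1
  9      81
  5      25
  2      4
- $\sum X = 25$, $\sum X^2 = 175$, $n = 5$, $\bar{X} = 5$
- $s^2 = \frac{175 - \frac{(25)^2}{5}}{4} = \frac{175 - 125}{4} = 12.50$
- $s = \sqrt{s^2} = \sqrt{12.50} = 3.54$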
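The worked example can be retraced in code. This Python sketch applies the computational form (sum of squared scores minus the squared sum over n, all divided by n - 1); the function name `sample_variance_computational` is an illustrative choice.

```python
import math

def sample_variance_computational(scores):
    """Sample variance from the totals of X and X squared, without computing deviations."""
    n = len(scores)
    sum_x = sum(scores)                          # total of the X column
    sum_x_squared = sum(x ** 2 for x in scores)  # total of the X^2 column
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

scores = [8, 1, 9, 5, 2]
s2 = sample_variance_computational(scores)
s = math.sqrt(s2)

print(f"s^2 = {s2:.2f}, s = {s:.2f}")  # prints "s^2 = 12.50, s = 3.54"
```

Only two running totals are needed, which is what makes this form convenient for hand calculation.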
21Computing Standard Deviation
- When calculating standard deviation, create a table that looks like this:
  $X$         $X^2$
  $X_1$       $X_1^2$
  $X_2$       $X_2^2$
  $X_3$       $X_3^2$
  $X_4$       $X_4^2$
  $\sum X$    $\sum X^2$
- For example:
  $X$              $X^2$
  2                4
  4                16
  7                49
  9                81
  $\sum X = 22$    $\sum X^2 = 150$
22Computing Standard Deviation
- The values are then entered into the formula as follows:
- $\sum X^2 = 150$
- $(\sum X)^2 = 22^2 = 484$
- $n = 4$
- $n - 1 = 3$
24Computing Standard Deviation
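- The values are then entered into the formula as follows:
- $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{150 - \frac{484}{4}}{3}} = \sqrt{\frac{150 - 121}{3}} = \sqrt{9.67} = 3.11$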
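The table-and-formula procedure for the scores 2, 4, 7, 9 can be sketched in Python as follows. The cross-check against `statistics.stdev` (the standard library's sample standard deviation) is an addition for illustration, not part of the lecture.

```python
import math
import statistics

scores = [2, 4, 7, 9]

n = len(scores)
sum_x = sum(scores)                         # total of the X column
sum_x_squared = sum(x * x for x in scores)  # total of the X^2 column

# Computational formula for the sample standard deviation.
s = math.sqrt((sum_x_squared - sum_x ** 2 / n) / (n - 1))

print(sum_x, sum_x_squared)
print(round(s, 2))

# The standard library's sample standard deviation should agree.
print(math.isclose(s, statistics.stdev(scores)))
```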
25Degrees of Freedom
- Degrees of Freedom: the number of independent observations, or, the number of observations that are free to vary
- In our data example above, there are 5 numbers that total 25 ($\sum X = 25$, $n = 5$)
26Degrees of Freedom
- Many combinations of numbers can total 25, but only the first 4 can be any value
- The 5th number cannot vary if $\sum X = 25$
- This example has 4 degrees of freedom, as four of the five numbers are free to vary
- A standard deviation calculated from a sample with n in the denominator usually underestimates the population standard deviation. Using n - 1 in the denominator corrects for this and gives us a better estimate of the population standard deviation.
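Both ideas on this slide can be illustrated with a short Python sketch. The first part shows that once the total is fixed, the fifth score is forced. The second part uses a small hypothetical population (the values 1 to 5, chosen only for illustration) and lists every possible sample of size 3 drawn with replacement, then averages the variance computed with n and with n - 1 in the denominator.

```python
import math
from itertools import product

# Part 1: with a fixed total of 25, the last score is not free to vary.
first_four = [8, 1, 9, 5]       # any four values could be chosen here
fifth = 25 - sum(first_four)    # the fifth value is forced by the total

# Part 2: compare the two denominators over every possible sample.
population = [1, 2, 3, 4, 5]    # hypothetical population, for illustration only
mu = sum(population) / len(population)
population_variance = sum((x - mu) ** 2 for x in population) / len(population)

def variance(sample, denominator):
    """Sum of squared deviations from the sample mean, over the given denominator."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / denominator

n = 3
samples = list(product(population, repeat=n))  # all ordered samples, with replacement
mean_var_n = sum(variance(s, n) for s in samples) / len(samples)
mean_var_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(fifth)
print(population_variance, round(mean_var_n, 4), round(mean_var_n_minus_1, 4))
```

Under these assumptions, dividing by n gives variances that average below the population variance, while dividing by n - 1 averages out to it. The exact match holds for the variance; for the standard deviation, n - 1 gives a better estimate rather than an exact one.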
27Degrees of Freedom
- Degrees of freedom are usually n - 1 (the total number of data points minus one)
28Time for an example
- Seven people were asked to rate the taste of McDonald's french fries on a scale of 1 to 10. Their ratings are as follows: 8, 4, 6, 2, 5, 7, 7
- Calculate the population standard deviation
- Calculate the sample variance
- Class discussion: when would this be a population, and when would it be a sample?
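One way to check a hand calculation for this exercise is with Python's standard `statistics` module, where `pstdev` divides by n (population) and `variance` divides by n - 1 (sample). This is a sketch for verification only; the by-hand method from the previous slides is still what the exercise asks for.

```python
import statistics

ratings = [8, 4, 6, 2, 5, 7, 7]

population_sd = statistics.pstdev(ratings)  # population standard deviation (divides by n)
sample_var = statistics.variance(ratings)   # sample variance (divides by n - 1)

print(f"population standard deviation = {population_sd:.2f}")
print(f"sample variance = {sample_var:.2f}")
```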
29Why is Standard Deviation so Important?
- What does the standard deviation really tell us?
- Why would a sample's standard deviation be small?
- Why would a sample's standard deviation be large?
30An Example
- You're sitting in the CAW Student Centre with 4 of your friends. A member of the opposite sex walks by, and you and your friends rate this person's attractiveness on a scale from 1 to 10 (where 1 = very unattractive and 10 = drop dead gorgeous)
31Food for thought
- 1) What would it mean if all five of you rated this person a 9 on 10?
- 2) What would it mean if all five of you rated this person a 5 on 10?
- 3) What would it mean if the five of you produced the following ratings: 1, 10, 2, 9, and 3 (note that the mean rating would be 5)?
- Why would scenario 3 happen instead of scenario 2? What factors would lead to these different ratings?
- These questions form the basis of why statisticians like to explain variability
32An In-Depth Look at Scenario 3
- So if the five of you produced the following ratings: 1, 10, 2, 9, and 3, what is the standard deviation of these ratings? Calculate!
- What is the standard deviation in Scenario 2? Calculate!
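The two scenarios can be compared side by side in Python. Both versions of the standard deviation are printed, since the slide does not say whether the five raters should be treated as a population or a sample; the dictionary name `scenarios` is illustrative.

```python
import statistics

scenarios = {
    "Scenario 2": [5, 5, 5, 5, 5],
    "Scenario 3": [1, 10, 2, 9, 3],
}

for label, ratings in scenarios.items():
    mean = statistics.mean(ratings)
    population_sd = statistics.pstdev(ratings)  # divides by n
    sample_sd = statistics.stdev(ratings)       # divides by n - 1
    print(f"{label}: mean = {mean}, population SD = {population_sd:.2f}, sample SD = {sample_sd:.2f}")
```

Both scenarios share a mean of 5, so only the standard deviation reveals that the raters agree completely in one and disagree sharply in the other.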
33Normal Distribution
- The normal distribution is a theoretical distribution
- "Normal" does not mean typical or average; it is a technical term given to this mathematical function
- The normal distribution is unimodal and symmetrical, and is often referred to as the Bell Curve
34Normal Distribution
(Figure: the normal curve, with the Mean, Median, and Mode marked at its centre)
35Normal Distribution
- We study the normal distribution because many
naturally occurring events yield a distribution
that approximates the normal distribution
36Properties of Area Under the Normal Distribution
- One of the properties of the Normal Distribution is the fixed area under the curve
- If we split the distribution in half, 50% of the scores of the sample lie to the left of the mean (or median, or mode), and 50% of the scores lie to the right of the mean (or median, or mode)
37Properties of Area Under the Normal Distribution
- The mean, median, and mode always cut the Normal
Distribution in half, and are equal since the
Normal Distribution is unimodal and symmetrical
38Properties of Area Under the Normal Distribution
(Figure: the normal curve split at the Mean, Median, Mode, with 50% of scores on each side)
39Properties of Area Under the Normal Distribution
- The entire area under the normal curve can be considered to be a proportion of 1.0000
- Thus, half, or .5000, of the scores lie in the bottom half (i.e., left of the mean) of the distribution, and half, or .5000, of the scores lie in the top half (i.e., right of the mean)
40Properties of Area Under the Normal Distribution
(Figure: the normal curve split at the Mean, Median, Mode, with .5000 of scores on each side)
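The half-and-half split can be confirmed with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The second distribution, with a mean of 89 and a standard deviation of 5.2, is an arbitrary choice to show that the split does not depend on the particular mean or spread.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

area_left = standard_normal.cdf(0)  # proportion of the area below the mean
area_right = 1 - area_left          # the remaining proportion lies above the mean

# Any other normal distribution is split the same way at its own mean.
other = NormalDist(mu=89, sigma=5.2)
other_left = other.cdf(89)

print(f"{area_left:.4f} {area_right:.4f} {other_left:.4f}")
```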
41Z-scores
- Z-Scores (or standard scores) are a way of expressing a raw score's place in a distribution
- $z = \frac{X - \mu}{\sigma}$
42Z-scores
- The mean ($\mu$) and standard deviation ($\sigma$) are always notated in Greek letters
- Z-scores only reflect the data point's position relative to the overall data set (so you're now considering the data as a population, as you're not looking to infer to a greater population)
- This means: use the population formula for standard deviation rather than the sample formula whenever you calculate Z
43Z-scores
- A z-score is a better indicator of where your score falls in a distribution than a raw score
- A student could get a 75/100 on a test (75%) and consider this to be a very high score
44Z-scores
- If the average of the class marks is 89 and the (population) standard deviation is 5.2, then the z-score for a mark of 75 would be:
- $\mu = 89$, $\sigma = 5.2$
- z = (75 - 89)/5.2
- z = (-14)/5.2
- z = -2.69
45Z-scores
- This means that a mark of 75 is actually 2.69 standard deviations BELOW the mean
- The student would have done poorly on this test, as compared to the rest of the class
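The calculation for the mark of 75 can be written as a small Python function. The name `z_score` and its parameter names are illustrative; the formula is the raw score minus the mean, divided by the population standard deviation.

```python
def z_score(x, mu, sigma):
    """Distance of the raw score x from the mean mu, in standard deviation units."""
    return (x - mu) / sigma

z = z_score(75, mu=89, sigma=5.2)

print(f"z = {z:.2f}")  # prints "z = -2.69"
```

The negative sign carries the interpretation: the mark sits below the class mean.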
46Z-scores
- z = 0 represents the mean score (which would be 89 in this example)
- z < 0 represents a score less than the mean (which would be less than 89)
- z > 0 represents a score greater than the mean (which would be greater than 89)
47Z-scores
- For any set of scores, the z-scores will:
- sum to zero ($\sum z = 0.00$)
- have a mean equal to zero ($\mu_z = 0.00$)
- and have a standard deviation equal to one ($\sigma_z = 1.00$)
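These three properties can be checked on any data set with some variability. The Python sketch below reuses the five scores from the variance slides and converts them with the population mean and population standard deviation; the variable names are illustrative.

```python
import math
import statistics

scores = [8, 1, 9, 5, 2]

mu = statistics.mean(scores)        # population mean
sigma = statistics.pstdev(scores)   # population standard deviation

# Convert every raw score to a z-score.
z_scores = [(x - mu) / sigma for x in scores]

z_sum = sum(z_scores)
z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

# Floating-point arithmetic may leave tiny rounding errors, so compare with a tolerance.
print(math.isclose(z_sum, 0, abs_tol=1e-9))
print(math.isclose(z_mean, 0, abs_tol=1e-9))
print(math.isclose(z_sd, 1))
```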
48Z-scores
- A z-score expresses the position of the raw score above or below the mean in standard deviation sized units
- E.g.,
- z = 1.50 means that the raw score is 1 and one-half standard deviations above the mean
- z = -2.00 means that the raw score is 2 standard deviations below the mean
49Z-score Example
- If you write two exams, in Math and English, and get the following scores:
- Math: 70 (class $\mu = 55$, $\sigma = 10$)
- English: 60 (class $\mu = 50$, $\sigma = 5$)
- Which test mark represents the better performance (relative to the class)?
50Z-score Example cont.
- Math mark
- z = (70 - 55)/10
- z = 1.50
- English mark
- z = (60 - 50)/5
- z = 2.00
51Z-score Example Illustration
(Figure: the normal curve, marking the mean at Z = 0.00, the Math mark at Z = 1.50, and the English mark at Z = 2.00)
52The Answer
- Because Z = 2.00 is greater than Z = 1.50, the English class mark of 60 reflects a better performance relative to that class than does the Math class mark of 70
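The comparison between the two exams can be sketched in Python as follows. The dictionary layout (mark, class mean, class standard deviation) is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Distance of the raw score x from the mean mu, in standard deviation units."""
    return (x - mu) / sigma

# Each entry holds (your mark, class mean, class standard deviation).
marks = {
    "Math": (70, 55, 10),
    "English": (60, 50, 5),
}

z_by_exam = {exam: z_score(x, mu, sigma) for exam, (x, mu, sigma) in marks.items()}

# The better relative performance is the exam with the larger z-score.
better = max(z_by_exam, key=z_by_exam.get)

print(z_by_exam)
print(better)
```

Although the Math mark is higher as a raw score, converting both marks to z-scores puts them on a common scale where the English mark comes out ahead.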
53Z-score Solving for X
- The z-score formula can be rearranged to solve for X: $X = z\sigma + \mu$
54Z-scores Solving for X
- This formula is used when you know the z-score of
a data point, and want to solve for the raw score.
55Example
- E.g., if a class midterm exam has $\mu = 65$ and $\sigma = 5$, what exam mark has a z-score value of 1.25?
- X = (1.25)(5) + 65
- X = 6.25 + 65
- X = 71.25
- So, a person whose test is 1.25 standard deviations above the mean obtained a score of 71.25
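Solving for X is the z-score calculation run in reverse. A minimal Python sketch, with the hypothetical function names `raw_score` and `z_score`, shows the two directions undoing each other.

```python
def raw_score(z, mu, sigma):
    """Raw score that lies z standard deviations from the mean mu."""
    return z * sigma + mu

def z_score(x, mu, sigma):
    """Distance of the raw score x from the mean mu, in standard deviation units."""
    return (x - mu) / sigma

x = raw_score(1.25, mu=65, sigma=5)

print(x)                           # prints 71.25
print(z_score(x, mu=65, sigma=5))  # converting back recovers 1.25
```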
56Z-scores
- Z-score problems ask you to solve for X or solve for z
- Review both types of problems!