Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) - PowerPoint PPT Presentation

Slides: 57
Provided by: hpcus150

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)
  • 02-250-01
  • Lecture 3

2
Variation
  • Variability: The extent to which the numbers in a
    data set are dissimilar (different) from each other
  • When all elements measured receive the same
    scores (e.g., everyone in the data set is the
    same age, in years), there is no variability in
    the data set
  • As the scores in a data set become more
    dissimilar, variability increases

3
Variation: Range
  • The range tells us the span over which the data
    are distributed, and is only a very rough measure
    of variability
  • Range: The difference between the maximum and
    minimum scores
  • Example: The youngest student in a class is 19
    and the oldest is 46. Therefore, the age range of
    the class is $46 - 19 = 27$ years.

4
Variation
  • This is an example of data with NO variability
  • X    $X - \bar{X}$
  • 5    0.00
  • 5    0.00
  • 5    0.00
  • 5    0.00
  • 5    0.00
  • $\sum X = 25$, $n = 5$, $\bar{X} = 5$

5
Variation
  • This is an example of data with low variability
  • X    $X - \bar{X}$
  • 6    1.00
  • 4    -1.00
  • 6    1.00
  • 5    0.00
  • 4    -1.00
  • $\sum X = 25$, $n = 5$, $\bar{X} = 5$

6
Variation
  • This is an example of data with higher variability
  • X    $X - \bar{X}$
  • 8    3.00
  • 1    -4.00
  • 9    4.00
  • 5    0.00
  • 2    -3.00
  • $\sum X = 25$, $n = 5$, $\bar{X} = 5$

7
Note
  • Let's say we wanted to figure out the average
    deviation from the mean. Normally, we would want
    to sum all deviations from the mean and then
    divide by n, i.e., $\frac{\sum (X - \bar{X})}{n}$
  • (recall: look at your formula for the mean from
    last lecture)
  • BUT: We have a problem. $\sum (X - \bar{X})$ will
    always add up to zero

8
Variation
  • However, if we square each of the deviations from
    the mean, we obtain a sum that is not equal to
    zero (unless every score is identical)
  • This is the basis for the measures of variance
    and standard deviation, the two most common
    measures of variability of data

9
Variation
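  • X    $X - \bar{X}$    $(X - \bar{X})^2$
  • 8    3.00    9.00
  • 1    -4.00    16.00
  • 9    4.00    16.00
  • 5    0.00    0.00
  • 2    -3.00    9.00
  • $\sum X = 25$    $\sum (X - \bar{X}) = 0.00$    $\sum (X - \bar{X})^2 = 50.00$
  • Note: The $\sum (X - \bar{X})^2$ is called the Sum of
    Squares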
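
As a minimal sketch of the table above, the following Python recomputes the deviations and the Sum of Squares for the five scores on the slide. The variable names are illustrative assumptions, not part of the lecture.

```python
# Scores from the "higher variability" example on the slide.
scores = [8, 1, 9, 5, 2]

mean = sum(scores) / len(scores)                    # 25 / 5 = 5.0
deviations = [x - mean for x in scores]             # 3, -4, 4, 0, -3
squared_deviations = [d ** 2 for d in deviations]   # 9, 16, 16, 0, 9

sum_of_deviations = sum(deviations)       # deviations from the mean cancel out
sum_of_squares = sum(squared_deviations)  # squaring removes the signs

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 50.0
```

Squaring is what keeps the positive and negative deviations from cancelling, which is why the Sum of Squares can serve as the starting point for variance.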

10
Variance of a Population
  • VARIANCE OF A POPULATION: the sum of squared
    deviations from the mean divided by the number of
    scores ($\sigma^2$, sigma squared)
  • $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

11
Population Standard Deviation
  • Square root of the variance:
    $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

12
Sample Variance
  • the sum of squared deviations from the mean
    divided by the number of degrees of freedom,
    $n - 1$ (an estimate of the population variance)
  • $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

13
Sample Standard Deviation
  • Square root of the sample variance $s^2$:
    $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
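
The four definitions above (population and sample, variance and standard deviation) can be checked with Python's standard `statistics` module. This is a sketch under the assumption that the five scores from the earlier example are treated first as a whole population and then as a sample.

```python
import statistics

scores = [8, 1, 9, 5, 2]  # Sum of Squares = 50

pop_variance = statistics.pvariance(scores)     # SS / N       = 50 / 5
pop_sd = statistics.pstdev(scores)              # square root of the above
sample_variance = statistics.variance(scores)   # SS / (n - 1) = 50 / 4
sample_sd = statistics.stdev(scores)            # square root of the above

print(f"{pop_variance:.2f} {pop_sd:.2f}")        # 10.00 3.16
print(f"{sample_variance:.2f} {sample_sd:.2f}")  # 12.50 3.54
```

The only difference between the two pairs is the denominator, so the sample values are always at least as large as the population values for the same data.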

14
Why use Standard Deviation and not Variance!??!
  • Normally, you will only calculate variance in
    order to calculate standard deviation, as
    standard deviation is what we typically want
  • Why? Because standard deviation expresses
    variability in the same units as the data
  • Example: Standard deviation of ages in a class is
    3.7 years

15
Variance
  • The above formulae are definitional - they are
    the mathematical representation of the concepts
    of variance and standard deviation
  • When calculating variance and standard deviation
    (especially when doing so by hand), the following
    computational formulae are easiest to use (trust
    us, they really are easier to use; you should,
    however, have a good understanding of the
    definitional formulae)

16
Population Variance
  • Computational Formula:
    $\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$

17
Population Standard Deviation
  • Computational Formula:
    $\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$

18
Sample Variance
  • Computational Formula:
    $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

19
Sample Standard Deviation
  • Computational Formula:
    $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$

20
Sample Standard Deviation Example
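  • Data: $n = 5$, $\bar{X} = 5$
  • X    $X^2$
  • 8    64
  • 1    1
  • 9    81
  • 5    25
  • 2    4
  • $\sum X = 25$, $\sum X^2 = 175$
  • $s^2 = \frac{175 - \frac{(25)^2}{5}}{4} = \frac{175 - 125}{4} = 12.50$
  • $s = \sqrt{12.50} = 3.54$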
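
The worked example above can be sketched in Python to confirm that the computational route and the definitional route give the same sample variance. The function names are hypothetical helpers written for this illustration.

```python
import math


def sample_variance_computational(scores):
    """Sample variance via (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


def sample_variance_definitional(scores):
    """Sample variance via the sum of squared deviations / (n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


scores = [8, 1, 9, 5, 2]
s2 = sample_variance_computational(scores)  # (175 - 625 / 5) / 4
s = math.sqrt(s2)

print(s2, round(s, 2))  # 12.5 3.54
```

The computational version needs only two running totals (the sum of X and the sum of X squared), which is what makes it convenient for hand calculation.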

21
Computing Standard Deviation
  • When calculating standard deviation, create a
    table that looks like this

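  • General layout
  • X    $X^2$
  • $X_1$    $X_1^2$
  • $X_2$    $X_2^2$
  • $X_3$    $X_3^2$
  • $X_4$    $X_4^2$
  • $\sum X$    $\sum X^2$
  • Example
  • X    $X^2$
  • 2    4
  • 4    16
  • 7    49
  • 9    81
  • $\sum X = 22$    $\sum X^2 = 150$
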
22
Computing Standard Deviation
  • The values are then entered into the formula as
    follows
  • $\sum X^2 = 150$
  • $(\sum X)^2 = 22^2 = 484$
  • $n = 4$
  • $n - 1 = 3$

24
Computing Standard Deviation
  • The values are then entered into the formula as
    follows
  • $s = \sqrt{\frac{150 - \frac{484}{4}}{3}} = \sqrt{\frac{150 - 121}{3}} = \sqrt{9.67} = 3.11$

25
Degrees of Freedom
  • Degrees of Freedom: The number of independent
    observations, or, the number of observations that
    are free to vary
  • In our data example above, there are 5 numbers
    that total 25 ($\sum X = 25$, $n = 5$)

26
Degrees of Freedom
  • Many combinations of numbers can total 25, but
    only the first 4 can be any value
  • The 5th number cannot vary if $\sum X = 25$
  • This example has 4 degrees of freedom, as four of
    the five numbers are free to vary
  • A standard deviation computed from a sample with
    n in the denominator usually underestimates the
    population standard deviation. Using n-1 in the
    denominator corrects for this and gives us a
    better estimate of the population standard
    deviation.
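
The effect of the n-1 denominator can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is simulated as normal with variance 1, the sample size of 5, the number of repetitions, and the seed are arbitrary choices, and the comparison is made on the variance (the quantity the n-1 correction targets directly) rather than on the standard deviation.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 5        # assumed small sample size
REPETITIONS = 20_000   # assumed number of simulated samples
TRUE_VARIANCE = 1.0    # variance of the simulated population

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(REPETITIONS):
    sample = [random.gauss(0, TRUE_VARIANCE ** 0.5) for _ in range(SAMPLE_SIZE)]
    mean = sum(sample) / SAMPLE_SIZE
    sum_of_squares = sum((x - mean) ** 2 for x in sample)
    total_div_n += sum_of_squares / SAMPLE_SIZE               # divide by n
    total_div_n_minus_1 += sum_of_squares / (SAMPLE_SIZE - 1)  # divide by n - 1

avg_div_n = total_div_n / REPETITIONS
avg_div_n_minus_1 = total_div_n_minus_1 / REPETITIONS
print(round(avg_div_n, 2), round(avg_div_n_minus_1, 2))
```

Averaged over many samples, dividing by n should land noticeably below the true variance of 1, while dividing by n-1 should land close to it.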

27
Degrees of Freedom
  • Degrees of freedom are usually
  • $n - 1$
  • (the total number of data points minus one)

28
Time for an example
  • Seven people were asked to rate the taste of
    McDonald's french fries on a scale of 1 to 10.
    Their ratings are as follows:
  • 8, 4, 6, 2, 5, 7, 7
  • Calculate the population standard deviation
  • Calculate the sample variance
  • Class discussion: When would this be a
    population, and when would it be a sample?
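
One way to check a hand calculation for this exercise is with Python's `statistics` module. This is a sketch, not the lecture's own solution: `pstdev` treats the seven ratings as the whole population, while `variance` treats them as a sample.

```python
import statistics

ratings = [8, 4, 6, 2, 5, 7, 7]

population_sd = statistics.pstdev(ratings)      # divides by N = 7
sample_variance = statistics.variance(ratings)  # divides by n - 1 = 6

print(round(population_sd, 2))    # 1.92
print(round(sample_variance, 2))  # 4.29
```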

29
Why is Standard Deviation so Important?
  • What does the standard deviation really tell us?
  • Why would a sample's standard deviation be small?
  • Why would a sample's standard deviation be large?

30
An Example
  • You're sitting in the CAW Student Centre with 4
    of your friends. A member of the opposite sex
    walks by, and you and your friends rate this
    person's attractiveness on a scale from 1 to 10
    (where 1 = very unattractive and 10 = drop-dead
    gorgeous)

31
Food for thought
  • 1) What would it mean if all five of you rated
    this person a 9 on 10?
  • 2) What would it mean if all five of you rated
    this person a 5 on 10?
  • 3) What would it mean if the five of you produced
    the following ratings: 1, 10, 2, 9, and 3 (note
    that the mean rating would be 5)?
  • Why would scenario 3 happen instead of scenario
    2? What factors would lead to these different
    ratings?
  • These questions form the basis of why
    statisticians like to explain variability

32
An In-Depth Look at Scenario 3
  • So if the five of you produced the following
    ratings: 1, 10, 2, 9, and 3, what is the standard
    deviation of these ratings?
  • Calculate!
  • What is the standard deviation in Scenario 2?
    Calculate!
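
The contrast between the two scenarios can be sketched in Python. Both sets of ratings have the same mean, so only the standard deviation separates them. Since the slide does not say whether the five raters count as a population or a sample, the sketch reports both versions; the dictionary layout is an illustrative assumption.

```python
import statistics

scenarios = {
    "Scenario 2": [5, 5, 5, 5, 5],
    "Scenario 3": [1, 10, 2, 9, 3],
}

spread = {}
for name, ratings in scenarios.items():
    spread[name] = {
        "mean": statistics.mean(ratings),
        "population_sd": statistics.pstdev(ratings),  # divides by N
        "sample_sd": statistics.stdev(ratings),       # divides by n - 1
    }
    print(name, spread[name])
```

Scenario 2 has no variability at all, while Scenario 3 has a Sum of Squares of 70, giving a standard deviation of roughly 3.74 (population) or 4.18 (sample).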

33
Normal Distribution
  • The normal distribution is a theoretical
    distribution
  • "Normal" does not mean "typical" or "average"; it
    is a technical term given to this mathematical
    function
  • The normal distribution is unimodal and
    symmetrical, and is often referred to as the Bell
    Curve

34
Normal Distribution
  • Figure: normal curve with Mean = Median = Mode
    at the centre

35
Normal Distribution
  • We study the normal distribution because many
    naturally occurring events yield a distribution
    that approximates the normal distribution

36
Properties of Area Under the Normal Distribution
  • One of the properties of the Normal Distribution
    is the fixed area under the curve
  • If we split the distribution in half, 50% of the
    scores of the sample lie to the left of the mean
    (or median, or mode), and 50% of the scores lie
    to the right of the mean (or median, or mode)

37
Properties of Area Under the Normal Distribution
  • The mean, median, and mode always cut the Normal
    Distribution in half, and are equal since the
    Normal Distribution is unimodal and symmetrical

38
Properties of Area Under the Normal Distribution
  • Figure: normal curve split at the Mean, Median,
    Mode, with 50% of scores on each side

39
Properties of Area Under the Normal Distribution
  • The entire area under the normal curve can be
    considered to be a proportion of 1.0000
  • Thus, half, or .5000 of the scores lie in the
    bottom half (i.e., left of the mean) of the
    distribution, and half, or .5000 of the scores
    lie in the top half (i.e., right of the mean)
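
The half-and-half split can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a minimal sketch that assumes a standard normal curve (mean 0, standard deviation 1); the same split holds for any mean and standard deviation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

area_left_of_mean = standard_normal.cdf(0)   # proportion of scores below the mean
area_right_of_mean = 1 - area_left_of_mean   # remainder of the total area of 1.0000

print(area_left_of_mean, area_right_of_mean)  # 0.5 0.5
```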

40
Properties of Area Under the Normal Distribution
  • Figure: normal curve split at the Mean, Median,
    Mode, with .5000 of scores on each side

41
Z-scores
  • Z-Scores (or standard scores) are a way of
    expressing a raw score's place in a distribution
  • Z-score formula: $z = \frac{X - \mu}{\sigma}$

42
Z-scores
  • The mean ($\mu$) and standard deviation ($\sigma$)
    are always notated in Greek letters
  • Z-scores only reflect the data point's position
    relative to the overall data set (so you're now
    considering the data as a population, as you're
    not looking to infer to a greater population)
  • This means use the population formula for
    standard deviation rather than the sample formula
    whenever you calculate Z

43
Z-scores
  • A z-score is a better indicator of where your
    score falls in a distribution than a raw score
  • A student could get a 75/100 on a test (75%) and
    consider this to be a very high score

44
Z-scores
  • If the average of the class marks is 89 and the
    (population) standard deviation is 5.2, then the
    z-score for a mark of 75 would be
  • $\mu = 89$, $\sigma = 5.2$
  • $z = (75 - 89)/5.2$
  • $z = (-14)/5.2$
  • $z = -2.69$
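
The calculation above can be sketched as a small Python function. The name `z_score` and its parameter names are assumptions made for illustration.

```python
def z_score(raw_score, mean, sd):
    """Return how many standard deviations raw_score lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - mean) / sd


z = z_score(75, mean=89, sd=5.2)
print(round(z, 2))  # -2.69
```

The sign carries the direction: a negative result means the mark is below the class mean.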

45
Z-scores
  • This means that a mark of 75 is actually 2.69
    standard deviations BELOW the mean
  • The student would have done poorly on this test,
    as compared to the rest of the class

46
Z-scores
  • $z = 0$ represents the mean score (which would be
    89 in this example)
  • $z < 0$ represents a score less than the mean
    (which would be less than 89)
  • $z > 0$ represents a score greater than the mean
    (which would be greater than 89)

47
Z-scores
  • For any set of scores, the z-scores will
  • sum to zero
  • ($\sum z = 0.00$)
  • have a mean equal to zero
  • ($\mu_z = 0.00$)
  • and have a standard deviation equal to one
  • ($\sigma_z = 1.00$)
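
These three properties can be verified on any data set. The sketch below reuses the five scores from the earlier variance example (an arbitrary choice) and uses the population standard deviation, as the slides recommend for z-scores.

```python
import statistics

scores = [8, 1, 9, 5, 2]
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population formula

z_scores = [(x - mu) / sigma for x in scores]

z_sum = sum(z_scores)
z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

# Rounded to absorb tiny floating-point error.
print(round(z_sum, 6), round(z_mean, 6), round(z_sd, 6))
```

Up to floating-point rounding, the sum and mean come out as zero and the standard deviation as one, whatever raw scores are used (as long as they are not all identical).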

48
Z-scores
  • A z-score expresses the position of the raw score
    above or below the mean in standard deviation
    sized units
  • E.g.,
  • $z = 1.50$ means that the raw score is 1 and
    one-half standard deviations above the mean
  • $z = -2.00$ means that the raw score is 2 standard
    deviations below the mean

49
Z-score Example
  • If you write two exams, in Math and English, and
    get the following scores
  • Math: 70 (class $\mu = 55$, $\sigma = 10$)
  • English: 60 (class $\mu = 50$, $\sigma = 5$)
  • Which test mark represents the better performance
    (relative to the class)?

50
Z-score Example cont.
  • Math mark
  • $z = (70 - 55)/10$
  • $z = 1.50$
  • English mark
  • $z = (60 - 50)/5$
  • $z = 2.00$

51
Z-score Example Illustration
  • Figure: distribution marking the mean ($z = 0.00$),
    $z = 1.50$, and $z = 2.00$

52
The Answer
  • Because $z = 2.00$ is greater than $z = 1.50$, the
    English class mark of 60 reflects a better
    performance relative to that class than does the
    Math class mark of 70
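
The comparison can be sketched in Python by converting each mark to a z-score and picking the larger one. The dictionary layout and key names are illustrative assumptions.

```python
exams = {
    "Math": {"mark": 70, "mean": 55, "sd": 10},
    "English": {"mark": 60, "mean": 50, "sd": 5},
}

# Convert each raw mark to standard-deviation units for its own class.
z_by_exam = {
    name: (exam["mark"] - exam["mean"]) / exam["sd"]
    for name, exam in exams.items()
}
best_exam = max(z_by_exam, key=z_by_exam.get)

print(z_by_exam)  # {'Math': 1.5, 'English': 2.0}
print(best_exam)  # English
```

Comparing raw marks would favour Math (70 against 60); standardizing each mark against its own class reverses the ranking.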

53
Z-score: Solving for X
  • The z-score formula can be rearranged to solve
    for X
  • $X = z\sigma + \mu$

54
Z-scores: Solving for X
  • This formula is used when you know the z-score of
    a data point, and want to solve for the raw score.

55
Example
  • E.g., if a class midterm exam has $\mu = 65$ and
    $\sigma = 5$, what exam mark has a z-score value of 1.25?
  • $X = (1.25)(5) + 65$
  • $X = 6.25 + 65$
  • $X = 71.25$
  • So, a person whose test is 1.25 standard
    deviations above the mean obtained a score of
    71.25
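
Solving for X is the inverse of computing z, which a short Python sketch can confirm with a round trip. The function name `raw_score_from_z` is a hypothetical helper.

```python
def raw_score_from_z(z, mean, sd):
    """Convert a z-score back to a raw score: X = z * sd + mean."""
    return z * sd + mean


x = raw_score_from_z(1.25, mean=65, sd=5)
round_trip_z = (x - 65) / 5  # converting back should recover the original z

print(x)             # 71.25
print(round_trip_z)  # 1.25
```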

56
Z-scores
  • Z-score problems ask you to solve for X or solve
    for z
  • Review both types of problems!