Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

Slides: 54
Provided by: dhow4

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)
  • 02-250-01
  • Lecture 10

2
Quantitative vs. Frequency Data
  • Recall from our first lecture that data can
    take the form of:
  • Quantitative data (AKA measurement data), whereby
    each observation represents a score on a
    continuum
  • Most common statistics: mean and SD
  • Examples: height, weight, IQ, rating of Chretien
    on a scale of 1-10
  • Categorical data (AKA frequency data), whereby
    observations fall into one of two or more
    categories and we count the frequency in each
  • Examples: female vs. male, brown eyes vs. blue
    eyes, opposed to Chretien vs. support Chretien
  • This type of data is not a measurement of
    anything per se; it is simply frequencies of
    occurrence in the nominal classes (e.g., of
    males vs. females)

3
What if we want to know whether the observed
frequencies were what we had expected?
  • When it comes to frequency data, once we have
    counted our observations, we have:
  • Observed frequencies: frequencies that have
    actually occurred
  • Expected frequencies: frequencies that we
    expect to occur if some assumption is true
  • The null hypothesis (H0) is that the observed
    frequencies do not differ from the expected
    frequencies

4
Chi-Square
  • Examines the difference between the observed and
    the expected frequencies among groups
  • Both variables are nominal (therefore all we can
    measure is observed frequencies)
  • E.g., we know how many males are in this class
    (the observed frequency), and how many we would
    expect to be in this class (the expected frequency)

5
Non-parametric tests
  • The Chi-Square test is a non-parametric test.
  • With non-parametric tests, we do not need to
    assume that the population data are normally
    distributed.
  • Non-parametric tests allow for nominal and
    ordinal scales of measurement.
  • Although non-parametric tests are still
    inferential, they are less powerful than
    parametric tests (e.g., t-tests, correlations).
    This means that non-parametric tests are not as
    likely to find significant results when the
    effect size is small.

6
  • Cindy was hired at a fitness club in town. She
    was told by the boss that 68% of people who come
    in and inquire about becoming a member end up
    joining after they are shown around the club.
  • Three months later, Cindy is fired because the
    boss feels that she is not good at selling the
    club to people inquiring about memberships. Over
    the three months, Cindy gave tours to 75 people.
    44 of them ended up joining.
  • Cindy thinks that the boss just doesn't like her.

7
Is the Boss's Accusation True?
  • The boss originally said that 68% of people
    typically join. Therefore, of the 75 people Cindy
    gave tours to, 51 of them (0.68 × 75) should have
    joined if Cindy is on par with the other employees.

            Join    Don't Join
Observed    44      31
Expected    51      24
8
The Chi-Square Goodness-of-Fit Test
  • The Chi-Square Goodness-of-Fit Test is used when
    you have one classification variable (but it has
    2 or more categories).
  • H0: The assumption that Cindy is on par with the
    other employees is true.
  • Ha: The assumption that Cindy is on par with the
    other employees is not true.
  • The Chi-Square test allows a decision about
    whether observed and expected frequencies differ
    significantly.
  • Rejection of H0 suggests that our assumption that
    led to the expected frequencies is wrong (or in
    this example, Cindy is not on par with the other
    club employees).

9
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where $\chi$ = Greek letter chi, O = observed frequencies, E =
expected frequencies
H0: O - E = 0 (no difference between observed and
expected frequencies); therefore, $\chi^2 = 0$.
10
Calculating Chi-Square
  • Create a table with a column for each category (so
    here, join and not join)
  • Create a row for each of the following:
  • Observed frequencies (O)
  • Expected frequencies (E)
  • $O - E$
  • $(O - E)^2$
  • $(O - E)^2 / E$
  • The chi-square statistic is then the sum of this
    final row of $(O - E)^2 / E$
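The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `goodness_of_fit_table` is made up for this example, and the numbers are Cindy's observed and expected frequencies from the earlier slide.

```python
def goodness_of_fit_table(observed, expected):
    """Build the rows O, E, O - E, (O - E)^2, (O - E)^2 / E and their sum."""
    diff = [o - e for o, e in zip(observed, expected)]
    diff_sq = [d ** 2 for d in diff]
    contrib = [d2 / e for d2, e in zip(diff_sq, expected)]
    return {
        "O": list(observed),
        "E": list(expected),
        "O - E": diff,
        "(O - E)^2": diff_sq,
        "(O - E)^2 / E": contrib,
        "chi_square": sum(contrib),  # sum of the final row
    }


# Cindy's data, in the order: join, don't join.
observed = [44, 31]
expected = [51, 24]  # 68% and 32% of the 75 people she toured

table = goodness_of_fit_table(observed, expected)
for name, row in table.items():
    print(name, row)
```

Each dictionary entry corresponds to one row of the table described above, so the printed output can be checked line by line against a hand calculation.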

11
Chi-Square Goodness-of-Fit Table
                  Join        Not Join
O                 n join      n not join
E
O - E
(O - E)^2
(O - E)^2 / E
Sum of (O - E)^2 / E = chi-square statistic value
12
Calculating Chi-Square ($\chi^2$)
                  Join        Don't Join
Observed          44          31
Expected          51          24
O - E             -7          7
(O - E)^2         49          49
(O - E)^2 / E     0.9608      2.0417
Sum               0.9608 + 2.0417 = 3.0025
13
Testing the Significance of $\chi^2$
  • df = k - 1, where k = the number of outcome
    categories.
  • Table E.1 in the text (p. 439): at the .05
    level of significance, df = 1
  • $\chi^2_{\text{crit}} = 3.84$
  • Since $\chi^2_{\text{obt}} = 3.00$ is less than 3.84, we retain H0
    (NOTE: give the obt value to 2 decimal places)
  • Therefore, the result is not significant; Cindy's
    recruiting performance at the fitness club is not
    significantly different from that of the other
    employees.
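As a rough check on the table lookup, the df = 1 critical value can be reproduced with Python's standard library, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The helper names below are illustrative assumptions, not part of the course material, and the shortcut applies only when df = 1.

```python
from math import erfc, sqrt
from statistics import NormalDist


def chi2_crit_df1(alpha=0.05):
    """Critical value for df = 1: the square of the two-tailed z cutoff."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


def chi2_p_value_df1(chi2_obt):
    """Upper-tail probability of a chi-square variable with df = 1."""
    return erfc(sqrt(chi2_obt / 2))


chi2_obt = 49 / 51 + 49 / 24       # Cindy's statistic, about 3.00
crit = chi2_crit_df1(0.05)         # about 3.84, in line with the tabled value
reject_h0 = chi2_obt > crit        # retain H0 when this is False

print(round(chi2_obt, 2), round(crit, 2), reject_h0)
print(round(chi2_p_value_df1(chi2_obt), 3))
```

Comparing the obtained value with the critical value and comparing the p-value with .05 are two views of the same decision.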

14
  • It is easy to see that in the numerator of the
    formula, observed frequencies are compared to
    expected frequencies to assess how well the
    sample data match the hypothesized data. Why must
    we divide the numerator by the expected frequency
    for each category?
  • Suppose you were going to throw a party and you
    expected 1000 people to show up. However, at the
    party, you counted the number of guests and
    observed that 1040 actually showed up. Forty more
    guests than expected are no major problem when
    all along you were planning for 1000. There will
    probably still be enough beer and chips for
    everyone.

15
  • On the other hand, suppose you had a party and
    you expected 10 people to attend but instead 50
    actually showed up. Forty more guests in this
    case spell big trouble. How significant the
    discrepancy is depends in part on what you were
    originally expecting.
  • With very large expected frequencies, larger
    discrepancies between observed and expected
    frequencies are tolerated. This is accomplished in the
    chi-square formula by dividing the squared
    discrepancy for each category by its expected
    frequency.
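As a worked illustration, the two party examples can be put through the same term of the formula. Treating each party as a single category is a simplification made here only to show the effect of the denominator.

```latex
\frac{(1040 - 1000)^2}{1000} = \frac{1600}{1000} = 1.6
\qquad\text{versus}\qquad
\frac{(50 - 10)^2}{10} = \frac{1600}{10} = 160
```

The squared discrepancy is 1600 in both cases, but once it is divided by the expected frequency the second party contributes one hundred times as much.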

16
What About When There are More Than Two
Categories?
  • In the preceding example, observed frequencies
    fell into one of two categories: joined or did
    not join.
  • What if there are more than two categories?

17
Example
  • Suppose a study showed that of 90 people in
    trauma-induced comas who were treated with
    traditional medicine, 30 died, 30 woke up and
    fully recovered, and 30 remained comatose
    indefinitely. (Note: these data were made up.)
  • Dr. X, a naturopathic doctor who works with
    patients with trauma-induced comas, claims that
    alternative approaches result in superior
    recovery rates. To test his claim, 90 comatose
    people were treated with his alternative approach
    and were then observed. 40 of them woke up and
    were fully recovered, 30 died, and 20 remained
    comatose indefinitely.

18
Chi-Square
       Woke    Died    Stayed in Coma    Total
O      40              20                90
E      30              30
What's H0?
19
Chi-Square
       Woke    Died    Stayed in Coma    Total
O      40      30      20                90
E      30      30      30                90
n = 30 + 30 + 30 = 90
20
Chi-Square
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Figure for each cell
21
Chi-Square
Set up (O - fe)^2 / fe for each cell (fe = the expected frequency, E):
Woke, Died, Stayed in Coma

22
Chi-Square
Stayed in Coma: (20 - 30)^2 / fe

23
Chi-Square
Stayed in Coma: (-10)^2 / fe

24
Chi-Square
Stayed in Coma: 100 / fe

25
Chi-Square
Woke: 100 / 30    Died: 0 / 30    Stayed in Coma: 100 / 30

26
Chi-Square
Woke: 3.3333    Died: 0    Stayed in Coma: 3.3333

27
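Chi-Square
$\chi^2_{\text{obt}} = 6.6666 \approx 6.67$; df = k - 1 = 2; $\chi^2_{\text{crit}} = 5.99$.
Therefore, we reject H0: the outcomes under Dr. X's
alternative approach differ significantly from those under
traditional medicine, with more recoveries (40 vs. 30) and
fewer patients remaining comatose (20 vs. 30).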
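The three-category calculation can be sketched the same way. The function names are hypothetical. The df = 2 critical value uses the fact that, for two degrees of freedom, the upper-tail probability of chi-square is exp(-x / 2), so the cutoff at level alpha is -2 ln(alpha); this shortcut applies to df = 2 only.

```python
from math import log


def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_crit_df2(alpha=0.05):
    """Critical value for df = 2, where P(chi2 > x) = exp(-x / 2)."""
    return -2 * log(alpha)


# Category order: woke, died, stayed in coma.
observed = [40, 30, 20]   # Dr. X's 90 patients
expected = [30, 30, 30]   # outcomes of 90 patients under traditional medicine

chi2_obt = chi_square_stat(observed, expected)
df = len(observed) - 1
crit = chi2_crit_df2(0.05)

print(round(chi2_obt, 2), df, round(crit, 2), chi2_obt > crit)
```

The "died" category contributes nothing to the statistic because its observed and expected frequencies are equal.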

28
Distribution of violent crimes in the United
States, 1995
29
Sample results for 500 randomly selected
violent-crime reports from last year
30
Expected frequencies if last year's violent-crime
distribution is the same as the 1995 distribution
31
Calculating the goodness of fit (Chi-Square)
$\chi^2_{\text{obt}} = 4.219$. With df = k - 1 = 3, at .05, $\chi^2_{\text{crit}} = 7.81$.
Therefore, we retain H0: last year's crime
distribution is not significantly different from
that in 1995.
32
The Chi-Square Test for Independence
  • The $\chi^2$ statistic can also be used to test whether
    or not there is a relationship between two
    categorical (nominal) variables.
  • Each individual in the sample is measured or
    classified on two separate variables.
  • Also known as contingency table analysis

33
Do people with cell phones have more car
accidents than people without cell phones?
  • The Department of Transportation wanted to see if
    cell phone users have more car accidents than
    non-cell phone users. The following data are a
    sample of 50 people who have had car accidents
    over the past month, and 50 randomly sampled
    drivers who have not had car accidents over the
    past month

34
Chi-Square
[Table: Car Accident / No Car Accident (rows) by Cell Phone / No Cell Phone (columns), observed frequencies]
35
So what now?
  • Notice that here, only observed frequencies are
    given to you. You have to calculate the expected
    frequencies.
  • First, total your rows and columns.

36
Chi-Square
[Table: the same table with totals added]
Row totals: Car Accident 50, No Car Accident 50
Column totals: 55 and 45
Grand total: 100
37
$E_{ij} = \frac{R_i C_j}{N}$
where $E_{ij}$ = the expected frequency at row i, column j;
$R_i$ = row i's total; $C_j$ = column j's total; N = grand total (all
cells included)
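A minimal sketch of this formula in Python follows. The function name `expected_frequencies` is illustrative. The observed counts are the ones that appear in the later hand calculation, and the order of the two columns is an assumption, since the original table layout is not preserved here.

```python
def expected_frequencies(observed):
    """Return E[i][j] = R_i * C_j / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total over all cells
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: car accident, no car accident (column order assumed).
observed = [[29, 21],
            [16, 34]]

print(expected_frequencies(observed))
```

Because both row totals are 50, the two rows of expected frequencies come out identical; only the column totals (45 and 55) make the cells differ.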
38
Chi-Square
Car Accident row (row total 50), cell with O = 29 (column total 45):
E = 50 × 45 / 100 = 22.5
39
Chi-Square
Car Accident row (row total 50), other cell (column total 55):
E = 50 × 55 / 100 = 27.5
40
Chi-Square
Expected frequencies for all four cells:
Car Accident: 22.5 and 27.5 (row total 50)
No Car Accident: 22.5 and 27.5 (row total 50)
Column totals: 45 and 55; grand total: 100
41
And then...
You now have four cells with expected and
observed frequencies. Now use the chi-square
formula!
(29 - 22.5)^2 / 22.5 = 1.8778
(21 - 27.5)^2 / 27.5 = 1.5364
(16 - 22.5)^2 / 22.5 = 1.8778
(34 - 27.5)^2 / 27.5 = 1.5364
$\chi^2_{\text{obt}} = 6.8284 \approx 6.83$
42
To test this statistic
  • Let's use the .05 level of significance.
  • Variable 1: Cell phone?
  • Variable 2: Car accident?
  • Because we are dealing with frequency data for two
    categorical variables, we will perform a
    chi-square test of independence.
  • Because it is a chi-square, it is a non-directional
    (two-tailed) test: it detects any departure from
    independence, in either direction.
  • H0: Cell phones and car accidents are independent
  • Ha: Cell phones and car accidents are not
    independent

43
Chi-Square
  • df (for test of independence):
    df = (R - 1)(C - 1)
  • where R = number of rows
  • where C = number of columns
  • df = (2 - 1)(2 - 1) = 1, $\chi^2_{\text{crit}} = 3.84$ (from Table E.1,
    p. 439)
  • $\chi^2_{\text{obt}} = 6.83$ is greater than 3.84, so reject H0.
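For readers who want to verify the whole test at once, here is a hedged sketch that computes the statistic and the degrees of freedom for any table of counts. The function name `chi_square_independence` is made up for this example, the column order of the data is assumed, and the p-value line relies on a shortcut that holds only for df = 1.

```python
from math import erfc, sqrt


def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: car accident, no car accident (column order assumed).
chi2_obt, df = chi_square_independence([[29, 21], [16, 34]])
p_value = erfc(sqrt(chi2_obt / 2))  # upper-tail probability, df = 1 only

print(round(chi2_obt, 2), df, round(p_value, 4))
```

A p-value below .05 leads to the same decision as an obtained value above the critical value of 3.84.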

44
So...
  • Results are significant. The frequency of being
    in a car accident depends on whether or not one
    uses a cell phone.

45
Note
  • When the expected frequencies are too small,
    chi-square may not be a valid test, therefore all
    expected frequencies should be at least 5
    (dependent on sample size).
  • The chi-square test is also only valid when the
    observations are independent from each other,
    therefore N should be equal to the number of
    subjects (every subject should only be measured
    once).

46
What about when you have more than two
categories?
  • A fast-food marketing consultant wanted to know
    whether men and women had different preferences
    for fast-food restaurants. She randomly sampled
    150 men and 100 women and asked each to declare
    his or her preference among four fast-food
    restaurants. Here are her data:

47
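[Table: Women / Men (rows) by restaurant (columns: Burger King, Subway, Harveys, McDonalds), observed frequencies]
Row totals: Women 100, Men 150; grand total 250
Column totals: 55, 55, 85, 55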
48
Now remember...
  • To calculate expected frequencies for each cell,
    $E_{ij} = R_i C_j / N$
  • So: (row sum)(column sum) / N
  • Do this for each cell to get expected
    frequencies.

49
[Table: expected frequencies, listed in the same column order as the column totals]
Women: 22, 22, 34, 22 (row total 100)
Men: 33, 33, 51, 33 (row total 150)
Column totals: 55, 55, 85, 55; grand total 250
50
You now have what you need to calculate $\chi^2$:
(35 - 22)^2 / 22 = 7.6818
(25 - 22)^2 / 22 = 0.4091
(15 - 34)^2 / 34 = 10.6176
(25 - 22)^2 / 22 = 0.4091
(20 - 33)^2 / 33 = 5.1212
(30 - 33)^2 / 33 = 0.2727
(70 - 51)^2 / 51 = 7.0784
(30 - 33)^2 / 33 = 0.2727
Add these up! $\chi^2_{\text{obt}} = 31.8626 \approx 31.86$
df = (R - 1)(C - 1) = (2 - 1)(4 - 1) = 3; $\chi^2_{\text{crit}} = 7.82$ (at .05)
51
So...
  • H0: Gender and fast-food preference are
    independent.
  • Ha: Gender and fast-food preference are
    dependent.
  • Since $\chi^2_{\text{obt}}$ exceeds $\chi^2_{\text{crit}}$, we reject the null hypothesis.
  • So gender and preference for fast food are
    dependent.
  • Stated differently, men and women do not prefer
    the same fast-food joints.
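To see which cells drive this result, the per-cell contributions can be listed before summing. The sketch below assumes the observed counts as they appear in the hand calculation; the function name `cell_contributions` is hypothetical, and which restaurant each column stands for cannot be recovered from this transcript.

```python
def cell_contributions(observed):
    """Return the (O - E)^2 / E contribution of every cell, row by row."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(observed, row_totals)
    ]


# Rows: women (n = 100), men (n = 150). Columns follow the order of the
# hand calculation; their restaurant labels are not known.
observed = [[35, 25, 15, 25],
            [20, 30, 70, 30]]

contrib = cell_contributions(observed)
chi2_obt = sum(sum(row) for row in contrib)
chi2_crit = 7.82  # df = 3 at .05, as quoted in the lecture

for row in contrib:
    print([round(x, 4) for x in row])
print(round(chi2_obt, 2), chi2_obt > chi2_crit)
```

Most of the statistic comes from the first and third columns, where the women's and men's observed counts depart furthest from their expected frequencies.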

52
Time for Some Review
53
  • Final Exam Info
  • Wednesday, June 25 from 7:00 pm to 10:00 pm
  • REMINDERS: Bring pencils, calculator, TEXTBOOK,
    student ID