Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

About This Presentation

Slides: 54

Provided by: dhow4

Transcript and Presenter's Notes

Title: Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)

02-250-01
Lecture 10

2
Quantitative vs. Frequency Data

Recall from our first lecture that data can
take the form of:
Quantitative data (AKA measurement data), whereby
each observation represents a score on a
continuum
Most common statistics: mean and SD
Examples: height, weight, IQ, rating of Chretien
on a scale of 1-10.
Categorical data (AKA frequency data), whereby
frequencies of observations fall into one of two
or more categories.
Examples: female vs. male, brown eyes vs. blue
eyes, opposed to Chretien vs. support Chretien
This type of data is not a measurement of
anything per se; it is simply frequencies of
occurrence in the nominal classes (e.g., of
males vs. females, etc.)

3
What if we want to know whether the observed
frequencies were what we had expected?

When it comes to frequency data, once we have
counted our observations, we have:
Observed Frequencies: frequencies that have
actually occurred
Expected Frequencies: frequencies that we
expected to occur if some assumption is true
The Null Hypothesis (H0) is that the observed
frequencies do not differ from what we had
expected (the expected frequencies)

4
Chi-Square

Examines the difference between the Observed and
the Expected frequencies among groups
Both variables are Nominal (therefore all we can
measure is observed frequencies)
E.g., we know how many males are in this class
(the observed frequency), and how many we would
expect to be in this class

5
Non-parametric tests

The Chi-Square test is a non-parametric test.
With non-parametric tests, we do not need to
assume that the population data are normally
distributed.
Non-parametric tests allow for nominal and
ordinal scales of measurement.
Although non-parametric tests are still
inferential, they are less powerful than are
parametric tests (e.g., t-tests, correlations).
This means that non-parametric tests are not as
likely to find significant results when the
effect size is small.

Cindy was hired at a fitness club in town. She
was told by the boss that 68% of people who come
in and inquire about becoming a member end up
joining after they are shown around the club.
Three months later, Cindy is fired because the
boss feels that she is not good at selling the
club to people inquiring about memberships. Over
the three months, Cindy gave tours to 75 people.
44 of them ended up joining.
Cindy thinks that the boss just doesn't like her.

7
Is the Boss's Accusation True?

The boss originally said that 68% of people
typically join. Therefore, of the 75 people Cindy
gave tours to, 51 of them (0.68 x 75) should have
joined if Cindy is on par with the other employees.

            Join    Don't Join
Observed    44      31
Expected    51      24
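
As a rough check on the expected row above (not part of the original slides), the expected frequencies can be derived from the boss's stated joining rate. The helper name `expected_counts` is made up for this illustration.

```python
# Hypothetical helper: turn hypothesized proportions into expected counts.
def expected_counts(n, proportions):
    """Return the expected frequency for each category: n * p."""
    return [n * p for p in proportions]

join_rate = 0.68     # the boss's claim, from the slide
n_tours = 75         # people Cindy showed around the club

expected = expected_counts(n_tours, [join_rate, 1 - join_rate])
print([round(e) for e in expected])   # [51, 24]
```

The two expected counts always add up to the number of tours, just as the observed counts do.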
8
The Chi-Square Goodness-of-Fit Test

The Chi-Square Goodness-of-Fit Test is used when
you have one classification variable (but it has
2 or more categories).
H0: The assumption that Cindy is on par with the
other employees is true.
Ha: The assumption that Cindy is on par with the
other employees is not true.
The Chi-Square test allows a decision about
whether observed and expected frequencies differ
significantly.
Rejection of H0 suggests that our assumption that
led to the expected frequencies is wrong (or in
this example, Cindy is not on par with the other
club employees).

9
$\chi$ = Greek letter chi, O = observed frequencies, E =
expected frequencies
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
H0: O - E = 0 (no difference between observed and
expected frequencies); therefore, $\chi^2 = 0$.
10
Calculating Chi Square

Create a table with columns for each category (so
here, join and not join)
Create a row for each of the following:
Observed frequencies (O)
Expected frequencies (E)
O - E
(O - E)^2
(O - E)^2 / E
The Chi-Square statistic is then the sum of this
final row of (O - E)^2 / E
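
The row-by-row recipe above could be sketched in Python as follows. This is only an illustration: the function name `chi_square_table` is invented here, and the counts are Cindy's observed and expected frequencies from the earlier slide.

```python
def chi_square_table(observed, expected):
    """Build the rows O - E, (O - E)^2, (O - E)^2 / E, plus their sum."""
    diff = [o - e for o, e in zip(observed, expected)]
    squared = [d ** 2 for d in diff]
    scaled = [s / e for s, e in zip(squared, expected)]
    return diff, squared, scaled, sum(scaled)

# Columns: join, not join
diff, squared, scaled, chi_sq = chi_square_table([44, 31], [51, 24])
print(diff)               # [-7, 7]
print(squared)            # [49, 49]
print(round(chi_sq, 2))   # 3.0
```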

11
Chi-Square Goodness-of-Fit Table
                 Join      Not Join
O                n join    n not join
E
O - E
(O - E)^2
(O - E)^2 / E
Sum of (O - E)^2 / E = Chi-Square statistic value
12
Calculating Chi-Squared ($\chi^2$)
                 Join      Don't Join
Observed         44        31
Expected         51        24
O - E            -7        7
(O - E)^2        49        49
(O - E)^2 / E    0.9608    2.0417
Sum              0.9608 + 2.0417 = 3.0025
13
Testing the Significance of $\chi^2$

df = k - 1, where k = the number of outcome
categories.
Table E.1 in the text (p. 439): at the .05
level of significance, df = 1,
$\chi^2_{crit} = 3.84$
Since $\chi^2_{obt} = 3.00$, we retain H0 (NOTE: give the obt
value to 2 decimal places)
Therefore, the result is not significant; Cindy's
recruiting performance at the fitness club is not
significantly different from that of the other
employees.
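
For readers without the table at hand, the df = 1 critical value can be reproduced with Python's standard library. This sketch relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal; that shortcut does not carry over to larger df, and the variable names are illustrative only.

```python
from math import erfc, sqrt
from statistics import NormalDist

alpha = 0.05

# df = 1 only: the critical value is the square of the z cutoff
# that leaves alpha / 2 in each tail of the standard normal.
chi_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2

chi_obt = 49 / 51 + 49 / 24           # Cindy's statistic, about 3.00
p_value = erfc(sqrt(chi_obt / 2))     # upper-tail probability, df = 1 only

reject_h0 = chi_obt > chi_crit
print(round(chi_crit, 2), round(chi_obt, 2), reject_h0)   # 3.84 3.0 False
```

The p-value comes out a little above .08, which agrees with the decision to retain H0 at the .05 level.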

It is easy to see that in the numerator of the
formula, observed frequencies are compared to
expected frequencies to assess how well the
sample data match the hypothesized data. Why must
we divide the numerator by the expected frequency
for each category?
Suppose you were going to throw a party and you
expected 1000 people to show up. However, at the
party, you counted the number of guests and
observed that 1040 actually showed up. Forty more
guests than expected are no major problem when
all along you were planning for 1000. There will
probably still be enough beer and chips for
everyone.

On the other hand, suppose you had a party and
you expected 10 people to attend but instead 50
actually showed up. Forty more guests in this
case spell big trouble. How significant the
discrepancy is depends in part on what you were
originally expecting.
With very large expected frequencies, allowances
are made for more errors between observed and
expected frequencies. This is accomplished in the
chi-square formula by dividing the squared
discrepancy for each category by its expected
frequency.
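
The two parties can be put side by side in a few lines of Python. The function name `scaled_discrepancy` is invented for this sketch; it computes a single (O - E)^2 / E term.

```python
def scaled_discrepancy(observed, expected):
    """One term of the chi-square sum: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

big_party = scaled_discrepancy(1040, 1000)   # 1600 / 1000
small_party = scaled_discrepancy(50, 10)     # 1600 / 10
print(big_party, small_party)                # 1.6 160.0
```

The squared discrepancy is 1600 in both cases; only the division by the expected frequency tells the two situations apart.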

16
What About When There are More Than Two
Categories?

In the preceding example, observed frequencies
fell into one of two categories: joined or did
not join.
What if there are more than two categories?

17
Example

Suppose a study showed that of 90 people in
trauma-induced comas who were treated with
traditional medicine, 30 died, 30 woke up and
fully recovered, and 30 remained comatose
indefinitely. (Note: These data were made up.)
Dr. X, a naturopathic doctor who works with
patients with trauma-induced comas, claims that
alternative approaches result in superior
recovery rates. To test his claim, 90 comatose
people were treated with his alternative approach
and were then observed. 40 of them woke up and
were fully recovered, 30 died, and 20 remained
comatose indefinitely.

18
Chi-Square
     Woke    Died    Stayed in Coma    Total
O    40      30      20                90
E    30      30      30
What's H0?
19
Chi-Square
     Woke    Died    Stayed in Coma    Total
O    40      30      20                90
E    30      30      30                n = 30 + 30 + 30 = 90
20
Chi-Square
$\chi^2$ Figure for Each Cell
21
Chi-Square
Woke    Died    Stayed in Coma
22
Chi-Square
Stayed in Coma: (20 - 30)^2 / fe
23
Chi-Square
Stayed in Coma: (-10)^2 / fe
24
Chi-Square
Stayed in Coma: 100 / fe
25
Chi-Square
Woke        Died      Stayed in Coma
100 / 30    0 / 30    100 / 30
26
Chi-Square
Woke      Died    Stayed in Coma
3.3333    0       3.3333

27
Chi-Square
$\chi^2_{obt} = 3.3333 + 0 + 3.3333 = 6.6666 \approx 6.67$; df = k - 1 = 2; $\chi^2_{crit} = 5.99$.
Therefore, we reject H0. Dr. X's
alternative approach does indeed generate
significantly more recoveries.
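
The three-category example can be checked the same way. The sketch below is an illustration, not the lecture's own method: it uses the closed form exp(-x / 2) for the upper tail of a chi-square distribution with 2 degrees of freedom, which is specific to df = 2.

```python
from math import exp, log

observed = [40, 30, 20]    # woke, died, stayed in coma
expected = [30, 30, 30]

chi_obt = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1     # k - 1

alpha = 0.05
# df = 2 only: the upper-tail probability is exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
chi_crit = -2 * log(alpha)
p_value = exp(-chi_obt / 2)

print(round(chi_obt, 2), df, round(chi_crit, 2))   # 6.67 2 5.99
```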

28
Distribution of violent crimes in the United
States, 1995
29
Sample results for 500 randomly selected
violent-crime reports from last year
30
Expected frequencies if last year's violent-crime
distribution is the same as the 1995 distribution
31
Calculating the goodness of fit (Chi-Square)
$\chi^2_{obt} = 4.219$. With df = k - 1 = 3, at $\alpha = .05$, $\chi^2_{crit} = 7.81$.
Therefore, we retain H0: last year's crime
distribution is not significantly different from
that in 1995.
32
The Chi-Square Test for Independence

The ?2 statistic can also be used to test whether
or not there is a relationship between two
categorical (nominal) variables.
Each individual in the sample is measured or
classified on two separate variables.
Also known as contingency table analysis

33
Do people with cell phones have more car
accidents than people without cell phones?

The Department of Transportation wanted to see if
cell phone users have more car accidents than
non-cell phone users. The following data are from
a sample of 50 people who have had car accidents
over the past month, and 50 randomly sampled
drivers who have not had car accidents over the
past month.

34
Chi-Square
                   Cell Phone    No Cell Phone
Car Accident       29            21
No Car Accident    16            34
35
So what now?

Notice that here, only observed frequencies are
given to you. You have to calculate the expected
frequencies.
First, total your rows and columns.

36
Chi-Square
                   Cell Phone    No Cell Phone    Total
Car Accident       29            21               50
No Car Accident    16            34               50
Total              45            55               100
37
$E_{ij} = \frac{R_i C_j}{N}$, where $E_{ij}$ = the expected
frequency at row i, column j; $R_i$ = row i's
total; $C_j$ = column j's total; N = grand total (all
cells included)
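
A minimal sketch of this formula in Python, assuming the table is stored as a list of rows. The function name `expected_frequencies` is invented here, and the cell counts are the observed frequencies used in the calculation slide later in the lecture.

```python
def expected_frequencies(table):
    """E_ij = R_i * C_j / N for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: car accident, no car accident
# Columns: cell phone, no cell phone
observed = [[29, 21],
            [16, 34]]
print(expected_frequencies(observed))   # [[22.5, 27.5], [22.5, 27.5]]
```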
38
Chi-Square
E for Car Accident / Cell Phone: (50 x 45) / 100 = 22.5
                   Cell Phone          No Cell Phone    Total
Car Accident       O = 29, E = 22.5                     50
No Car Accident                                         50
Total              45                  55               100
39
Chi-Square
E for Car Accident / No Cell Phone: (50 x 55) / 100 = 27.5
                   Cell Phone          No Cell Phone       Total
Car Accident       O = 29, E = 22.5    O = 21, E = 27.5    50
No Car Accident                                            50
Total              45                  55                  100
40
Chi-Square
                   Cell Phone          No Cell Phone       Total
Car Accident       O = 29, E = 22.5    O = 21, E = 27.5    50
No Car Accident    O = 16, E = 22.5    O = 34, E = 27.5    50
Total              45                  55                  100
41
And then...
You now have four cells with expected and
observed frequencies. Now use the Chi-Square
formula!
(29 - 22.5)^2 / 22.5 = 1.8778
(21 - 27.5)^2 / 27.5 = 1.5364
(16 - 22.5)^2 / 22.5 = 1.8778
(34 - 27.5)^2 / 27.5 = 1.5364
$\chi^2_{obt} = 6.8284 \approx 6.83$
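
The whole calculation for a contingency table fits in one short function. This is a sketch under the assumption that the table is a list of rows of observed counts; `chi_square_independence` is a name made up for the example.

```python
def chi_square_independence(table):
    """Sum (O - E)^2 / E over all cells, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi_sq += (o - e) ** 2 / e
    return chi_sq

cell_phone_data = [[29, 21],    # car accident
                   [16, 34]]    # no car accident
print(round(chi_square_independence(cell_phone_data), 2))   # 6.83
```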
42
To test this statistic

Let's use the .05 level of significance.
Variable 1: Cell phone?
Variable 2: Car accident?
Because we are dealing with frequency data of two
categorical variables, we will perform a
chi-square test of independence.
Because it is a chi-square, the test is
non-directional: a departure from independence in
either direction makes $\chi^2$ larger.
H0: Cell phones and car accidents are independent
Ha: Cell phones and car accidents are not
independent

43
Chi-Square

df (for test of independence):
df = (R - 1)(C - 1)
where R = number of rows
and C = number of columns
df = (2 - 1)(2 - 1) = 1, $\chi^2_{crit} = 3.84$ (from Table E.1,
p. 439)
$\chi^2_{obt} = 6.83$, so reject H0.
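
The degrees-of-freedom rule and the decision step could be written out as below. The function name `df_independence` is hypothetical, and the critical value is simply the tabled figure quoted on the slide rather than something computed here.

```python
def df_independence(n_rows, n_cols):
    """Degrees of freedom for a test of independence: (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)

CHI_CRIT_DF1 = 3.84    # .05 critical value for df = 1, as quoted from Table E.1
chi_obt = 6.83         # obtained value from the cell phone example

df = df_independence(2, 2)
reject_h0 = df == 1 and chi_obt > CHI_CRIT_DF1
print(df, reject_h0)   # 1 True
```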

44
SO.

Results are significant. The frequency of being
in a car accident depends on whether or not one
uses a cell phone.

45
Note

When the expected frequencies are too small,
chi-square may not be a valid test, therefore all
expected frequencies should be at least 5
(dependent on sample size).
The chi-square test is also only valid when the
observations are independent from each other,
therefore N should be equal to the number of
subjects (every subject should only be measured
once).
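
The minimum expected frequency is easy to check before running the test. In this sketch the helper names are invented, the first table is the cell phone data from this lecture, and the second table is made up purely to show a failing case.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def all_expected_at_least(table, minimum=5):
    """True when every expected frequency reaches the minimum."""
    return all(e >= minimum
               for row in expected_frequencies(table)
               for e in row)

cell_phone_data = [[29, 21], [16, 34]]   # counts from the lecture
tiny_sample = [[3, 2], [4, 6]]           # made-up counts, for contrast

print(all_expected_at_least(cell_phone_data))   # True
print(all_expected_at_least(tiny_sample))       # False
```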

46
What about when you have more than two
categories?

A fast-food marketing consultant wanted to know
whether men and women had different preferences
for fast-food restaurants. She randomly sampled
150 men and 100 women and asked each to declare
his or her preference among four fast-food
restaurants. Here are her data:

47
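[Table: observed frequencies by gender (rows: Women, Men) and restaurant (columns: Burger King, Subway, Harveys, McDonalds). Row totals: Women 100, Men 150. Column totals: 55, 55, 85, 55. Grand total: 250.]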
48
Now remember...

To calculate expected frequencies for each cell,
$E_{ij} = \frac{R_i C_j}{N}$
So (Row sum) x (Column sum) / N
Do this for each cell to get expected
frequencies.

49
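[Table: expected frequencies by gender and restaurant (Burger King, Subway, Harveys, McDonalds). Women: 22, 22, 34, 22 (row total 100). Men: 33, 33, 51, 33 (row total 150). Column totals: 55, 55, 85, 55. Grand total: 250.]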
50
You now have what you need to calculate $\chi^2$
(35 - 22)^2 / 22 = 7.6818
(25 - 22)^2 / 22 = 0.4091
(15 - 34)^2 / 34 = 10.6176
(25 - 22)^2 / 22 = 0.4091
(20 - 33)^2 / 33 = 5.1212
(30 - 33)^2 / 33 = 0.2727
(70 - 51)^2 / 51 = 7.0784
(30 - 33)^2 / 33 = 0.2727
Add these up! $\chi^2_{obt} = 31.8626 \approx 31.86$
df = (R - 1)(C - 1) = (2 - 1)(4 - 1) = 3, $\chi^2_{crit} = 7.81$ (at .05)
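
As a cross-check on the arithmetic, the same figures can be reproduced in Python. The observed counts are read off the calculation above, in the order the terms are listed; the p-value helper `upper_tail_df3` is an invented name and uses a closed form that holds for 3 degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

observed = [[35, 25, 15, 25],    # women (row total 100)
            [20, 30, 70, 30]]    # men (row total 150)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with E = row total * column total / N
chi_obt = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def upper_tail_df3(x):
    """P(chi-square with 3 df > x); this closed form holds for df = 3 only."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = upper_tail_df3(chi_obt)
print(round(chi_obt, 2), df)     # 31.86 3
```

The p-value is far below .05, in line with an obtained value well beyond the critical value.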
51
So.

H0: Gender and fast food preference are
independent.
Ha: Gender and fast food preference are
dependent.
We reject the null hypothesis.
So: Gender and preference for fast food are
dependent.
Stated differently, men and women do not prefer
the same fast food joints.
