Title: Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
1 Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
2 Quantitative vs. Frequency Data
- Recall from our first lecture that data can take the form of:
- Quantitative data (AKA measurement data), whereby each observation represents a score on a continuum
  - Most common statistics: mean and SD
  - Examples: height, weight, IQ, rating of Chretien on a scale of 1-10
- Categorical data (AKA frequency data), whereby frequencies of observations fall into one of two or more categories
  - Examples: female vs. male, brown eyes vs. blue eyes, opposed to Chretien vs. support Chretien
  - This type of data is not a measurement of anything per se; it is simply frequencies of occurrence in the nominal classes (e.g., of males vs. females)
3 What if we want to know whether the observed frequencies were what we had expected?
- When it comes to frequency data, once we have counted our observations, we have:
  - Observed frequencies: frequencies that have actually occurred
  - Expected frequencies: frequencies that we expected to occur if some assumption is true
- The null hypothesis (H0) is that the observed frequencies do not differ from what we had expected (the expected frequencies)
4 Chi-Square
- Examines the difference between the observed and the expected frequencies among groups
- Both variables are nominal (therefore all we can measure is observed frequencies)
- E.g., we know how many males are in this class (the observed frequency), and how many we would expect to be in this class
5 Non-parametric tests
- The Chi-Square test is a non-parametric test.
- With non-parametric tests, we do not need to assume that the population data are normally distributed.
- Non-parametric tests allow for nominal and ordinal scales of measurement.
- Although non-parametric tests are still inferential, they are less powerful than parametric tests (e.g., t-tests, correlations). This means that non-parametric tests are not as likely to find significant results when the effect size is small.
6
- Cindy was hired at a fitness club in town. She was told by the boss that 68% of people who come in and inquire about becoming a member end up joining after they are shown around the club.
- Three months later, Cindy is fired because the boss feels that she is not good at selling the club to people inquiring about memberships. Over the three months, Cindy gave tours to 75 people. 44 of them ended up joining.
- Cindy thinks that the boss just doesn't like her.
7 Is the Boss's Accusation True?
- The boss originally said that 68% of people typically join. Therefore, of the 75 people Cindy gave tours to, 51 of them (0.68 x 75) should have joined if Cindy is on par with the other employees.
             Join    Don't Join
Observed     44      31
Expected     51      24
8 The Chi-Square Goodness-of-Fit Test
- The Chi-Square Goodness-of-Fit Test is used when you have one classification variable (but it has 2 or more categories).
- H0: The assumption that Cindy is on par with the other employees is true.
- Ha: The assumption that Cindy is on par with the other employees is not true.
- The Chi-Square test allows a decision about whether observed and expected frequencies differ significantly.
- Rejection of H0 suggests that the assumption that led to the expected frequencies is wrong (in this example, that Cindy is not on par with the other club employees).
9 χ = Greek letter chi, O = observed frequencies, E = expected frequencies
χ² = Σ (O - E)² / E
H0: O - E = 0 (no difference between observed and expected frequencies); therefore, χ² = 0.
10 Calculating Chi-Square
- Create a table with a column for each category (so here, join and not join)
- Create a row for each of the following:
  - Observed frequencies (O)
  - Expected frequencies (E)
  - O - E
  - (O - E)²
  - (O - E)² / E
- The chi-square statistic is then the sum of this final row of (O - E)² / E
11 Chi-Square Goodness-of-Fit Table
               Join      Not Join
O              n join    n not join
E
O - E
(O - E)²
(O - E)² / E
Sum of (O - E)² / E = chi-square statistic value
12 Calculating Chi-Square (χ²)
               Join      Don't Join
Observed       44        31
Expected       51        24
O - E          -7        7
(O - E)²       49        49
(O - E)² / E   0.9608    2.0417
Sum            0.9608 + 2.0417 = 3.0025
13 Testing the Significance of χ²
- df = k - 1, where k = the number of outcome categories (here, df = 2 - 1 = 1).
- Table E.1 in the text (p. 439): at the .05 level of significance with df = 1, χ²crit = 3.84
- Since χ²obt = 3.00 is less than χ²crit = 3.84, we retain H0 (NOTE: give the obt value to 2 decimal places)
- Therefore, the result is not significant: Cindy's recruiting performance at the fitness club is not significantly different from that of the other employees.
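The Cindy example can be retraced in a few lines of Python. This is only a sketch of the hand calculation on slides 10 to 13: the variable names are made up for illustration, and the critical value 3.84 is copied from Table E.1 rather than computed.

```python
# Goodness-of-fit sketch for the fitness-club example (slides 6-13).
# Variable names are illustrative; the data come from the slides.
observed = {"join": 44, "dont_join": 31}

n = sum(observed.values())   # 75 tours in total
p_join = 0.68                # the boss's claimed joining rate
expected = {"join": p_join * n, "dont_join": (1 - p_join) * n}

# Chi-square: sum of (O - E)^2 / E over all categories
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1       # k - 1 categories
chi_crit = 3.84              # Table E.1, df = 1, .05 level (copied, not computed)

reject_h0 = chi_sq > chi_crit
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic lands at about 3.00, below the critical value, so H0 is retained, as on the slide.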
14
- It is easy to see that in the numerator of the formula, observed frequencies are compared to expected frequencies to assess how well the sample data match the hypothesized data. Why must we divide the numerator by the expected frequency for each category?
- Suppose you were going to throw a party and you expected 1000 people to show up. However, at the party, you counted the number of guests and observed that 1040 actually showed up. Forty more guests than expected are no major problem when all along you were planning for 1000. There will probably still be enough beer and chips for everyone.
15
- On the other hand, suppose you had a party and you expected 10 people to attend but instead 50 actually showed up. Forty more guests in this case spell big trouble. How significant the discrepancy is depends in part on what you were originally expecting.
- With very large expected frequencies, larger discrepancies between observed and expected frequencies are tolerated. This is accomplished in the chi-square formula by dividing the squared discrepancy for each category by its expected frequency.
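To see the scaling at work, the two parties can be put through the (O - E)² / E term directly. A small sketch; the function name is hypothetical and the numbers are the ones from the party example.

```python
# The same 40-guest discrepancy, scaled by two different expectations.
def cell_contribution(observed, expected):
    """One category's contribution to chi-square: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

big_party = cell_contribution(1040, 1000)   # 1600 / 1000
small_party = cell_contribution(50, 10)     # 1600 / 10

print(f"expected 1000, observed 1040: {big_party}")
print(f"expected 10, observed 50: {small_party}")
```

The squared discrepancy is 1600 in both cases; only the division by E separates the party that went fine from the one that ran out of chips.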
16 What About When There Are More Than Two Categories?
- In the preceding example, observed frequencies fell into one of two categories: joined or did not join.
- What if there are more than two categories?
17 Example
- Suppose a study showed that of 90 people in trauma-induced comas who were treated with traditional medicine, 30 died, 30 woke up and fully recovered, and 30 remained comatose indefinitely. (Note: these data were made up.)
- Dr. X, a naturopathic doctor who works with patients with trauma-induced comas, claims that alternative approaches result in superior recovery rates. To test his claim, 90 comatose people were treated with his alternative approach and were then observed. 40 of them woke up and were fully recovered, 30 died, and 20 remained comatose indefinitely.
18 Chi-Square
Observed (O): Woke 40, Died 30, Stayed in Coma 20 (total O = 90)
What's H0?
19 Chi-Square
       Woke    Died    Stayed in Coma    Total
O      40      30      20                90
E      30      30      30                90
n = 30 + 30 + 30 = 90
20 Chi-Square
χ² = Σ (O - E)² / E: figure (O - E)² / E for each cell
21-26 Chi-Square (worked one step per slide)
                 Woke               Died               Stayed in Coma
(O - E)² / E     (40 - 30)² / 30    (30 - 30)² / 30    (20 - 30)² / 30
                 = (10)² / 30       = (0)² / 30        = (-10)² / 30
                 = 100 / 30         = 0 / 30           = 100 / 30
                 = 3.3333           = 0                = 3.3333
27 Chi-Square
χ²obt = 3.3333 + 0 + 3.3333 = 6.6666 ≈ 6.67
df = k - 1 = 2, χ²crit = 5.99
Therefore, we reject H0: Dr. X's alternative approach does indeed generate significantly more recoveries.
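The same recipe works for any number of categories. Below is a minimal sketch of a reusable goodness-of-fit function, applied to the made-up coma data; the function name is an assumption for illustration, and the critical value 5.99 is the tabled one quoted above.

```python
# Goodness-of-fit with k = 3 categories (coma example, slides 17-27).
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, df) for matched lists of O and E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Order: woke, died, stayed in coma
observed = [40, 30, 20]   # Dr. X's patients
expected = [30, 30, 30]   # traditional-medicine outcomes

chi_sq, df = chi_square_gof(observed, expected)
chi_crit = 5.99           # tabled value for df = 2 at .05 (copied, not computed)

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > chi_crit}")
```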
28 Distribution of violent crimes in the United States, 1995
29 Sample results for 500 randomly selected violent-crime reports from last year
30 Expected frequencies if last year's violent-crime distribution is the same as the 1995 distribution
31 Calculating the goodness of fit (Chi-Square)
χ²obt = 4.219. With df = k - 1 = 3, at .05, χ²crit = 7.81.
Therefore, we retain H0: last year's crime distribution is not significantly different from that in 1995.
32 The Chi-Square Test for Independence
- The χ² statistic can also be used to test whether or not there is a relationship between two categorical (nominal) variables.
- Each individual in the sample is measured or classified on two separate variables.
- Also known as contingency table analysis
33 Do people with cell phones have more car accidents than people without cell phones?
- The Department of Transportation wanted to see if cell phone users have more car accidents than non-cell phone users. The following data are a sample of 50 people who have had car accidents over the past month, and 50 randomly sampled drivers who have not had car accidents over the past month.
34 Chi-Square
Table of observed frequencies: Car Accident vs. No Car Accident (rows) by Cell Phone vs. No Cell Phone (columns)
35 So what now?
- Notice that here, only observed frequencies are given to you. You have to calculate the expected frequencies.
- First, total your rows and columns.
36 Chi-Square
Row totals: Car Accident 50, No Car Accident 50
Column totals: 55 and 45
Grand total: N = 100
37 Eij = (Ri x Cj) / N, where
Eij = the expected frequency at row i, column j
Ri = row i's total
Cj = column j's total
N = grand total (all cells included)
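As a sketch of this formula, the function below builds the whole table of expected frequencies from the row and column totals alone. The function name is hypothetical; the totals are the ones from the cell phone example, with the column totals ordered (45, 55) so that the first cell matches the 22.5 worked out on the following slides.

```python
# Expected frequencies from the margins: E_ij = R_i * C_j / N
def expected_frequencies(row_totals, col_totals):
    """Return a rows-by-columns table of expected frequencies."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row totals and column totals must add up to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]

row_totals = [50, 50]   # car accident, no car accident
col_totals = [45, 55]   # column order is an assumption (see note above)

expected = expected_frequencies(row_totals, col_totals)
for row in expected:
    print(row)
```

Because both row totals are 50, both rows get the same pair of expected frequencies.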
38 Chi-Square
Car Accident row, cell with O = 29: E = (50 x 45) / 100 = 22.5
39 Chi-Square
Car Accident row, other cell: E = (50 x 55) / 100 = 27.5
40 Chi-Square
Expected frequencies for all four cells: 22.5 and 27.5 in the Car Accident row, 22.5 and 27.5 in the No Car Accident row (row totals 50 and 50, column totals 45 and 55, N = 100)
41 And then...
You now have four cells with expected and observed frequencies. Now use the chi-square formula!
(29 - 22.5)² / 22.5 = 1.8778
(21 - 27.5)² / 27.5 = 1.5364
(16 - 22.5)² / 22.5 = 1.8778
(34 - 27.5)² / 27.5 = 1.5364
χ²obt = 6.8284 ≈ 6.83
42 To test this statistic
- Let's use the .05 level of significance.
- Variable 1: Cell phone?
- Variable 2: Car accident?
- Because we are dealing with frequency data of two categorical variables, we will perform a chi-square test of independence.
- Because it is a chi-square test, it is non-directional (two-tailed).
- H0: Cell phones and car accidents are independent
- Ha: Cell phones and car accidents are not independent
43 Chi-Square
- df (for test of independence) = (R - 1)(C - 1)
  - where R = number of rows
  - where C = number of columns
- df = (2 - 1)(2 - 1) = 1, χ²crit = 3.84 (from Table E.1, p. 439)
- χ²obt = 6.83, so reject H0.
44 So...
- Results are significant. The frequency of being
in a car accident depends on whether or not one
uses a cell phone.
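The whole cell phone analysis, from observed counts to decision, fits in one short script. Treat it as a sketch: which count sits in which named column is an assumption here (the value of χ² does not depend on the labels), and 3.84 is the tabled critical value for df = 1.

```python
# Test of independence for the 2 x 2 cell phone example (slides 33-44).
# Assumed layout: rows = car accident / no car accident,
# columns in the order used in the slide 41 calculation.
observed = [
    [29, 21],   # car accident
    [16, 34],   # no car accident
]

row_totals = [sum(row) for row in observed]         # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]   # [45, 55]
n = sum(row_totals)                                 # 100

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n       # E_ij = R_i * C_j / N
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R - 1)(C - 1)
chi_crit = 3.84                                     # Table E.1, df = 1, .05 level

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > chi_crit}")
```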
45 Note
- When the expected frequencies are too small, chi-square may not be a valid test; therefore all expected frequencies should be at least 5 (dependent on sample size).
- The chi-square test is also only valid when the observations are independent from each other; therefore N should be equal to the number of subjects (every subject should only be measured once).
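The first condition is easy to check before running the test. A minimal sketch, assuming the rule of thumb of 5 stated above; the function name and the second, made-up table are purely illustrative.

```python
# Check the "all expected frequencies at least 5" rule of thumb.
def small_expected_cells(expected, minimum=5):
    """Return (row, column, value) for every expected frequency below `minimum`."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]

cell_phone_expected = [[22.5, 27.5], [22.5, 27.5]]   # from the cell phone example
tiny_sample_expected = [[2.4, 3.6], [5.6, 8.4]]      # made-up table, for contrast

print(small_expected_cells(cell_phone_expected))     # nothing flagged
print(small_expected_cells(tiny_sample_expected))    # two cells flagged
```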
46 What about when you have more than two categories?
- A fast-food marketing consultant wanted to know whether men and women had different preferences for fast-food restaurants. She randomly sampled 150 men and 100 women and asked each to declare his or her preference among four fast-food restaurants. Here are her data:
47 Observed frequencies
Restaurants (columns): Burger King, Subway, Harvey's, McDonald's
Row totals: Women 100, Men 150, Total 250
Column totals: 55, 55, 85, 55
48 Now remember...
- To calculate expected frequencies for each cell, Eij = (Ri x Cj) / N
- So (Row sum) x (Column sum) / N
- Do this for each cell to get expected frequencies.
49 Expected frequencies
Restaurants (columns): Burger King, Subway, Harvey's, McDonald's
Women: 22, 22, 34, 22 (row total 100)
Men: 33, 33, 51, 33 (row total 150)
Column totals: 55, 55, 85, 55 (N = 250)
50 You now have what you need to calculate χ²
Women:
(35 - 22)² / 22 = 7.6818
(25 - 22)² / 22 = 0.4091
(15 - 34)² / 34 = 10.6176
(25 - 22)² / 22 = 0.4091
Men:
(20 - 33)² / 33 = 5.1212
(30 - 33)² / 33 = 0.2727
(70 - 51)² / 51 = 7.0784
(30 - 33)² / 33 = 0.2727
Add these up! χ²obt = 31.8626 ≈ 31.86
df = (R - 1)(C - 1) = (2 - 1)(4 - 1) = 3, χ²crit = 7.81 (at .05)
51 So...
- H0: Gender and fast-food preference are independent.
- Ha: Gender and fast-food preference are dependent.
- We reject the null hypothesis.
- SO: Gender and preference for fast food are dependent.
- Stated differently, men and women do not prefer the same fast-food joints.
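Nothing in the procedure is specific to 2 x 2 tables, so it can be wrapped in one function that takes any R x C table. The sketch below uses a hypothetical function name and the observed counts from the calculation on slide 50; the columns are left unnamed because only their order in that calculation is used, and 7.81 is the tabled critical value for df = 3 at .05 (as on slide 31).

```python
# General R x C test of independence, applied to the fast-food data.
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # E_ij = R_i * C_j / N
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

observed = [
    [35, 25, 15, 25],   # women (row total 100)
    [20, 30, 70, 30],   # men (row total 150)
]

chi_sq, df = chi_square_independence(observed)
chi_crit = 7.81   # tabled value for df = 3 at .05 (copied, not computed)

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > chi_crit}")
```

Passing the 2 x 2 cell phone table to the same function should give back the 6.83 found earlier, which is a handy way to check hand calculations.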
52 Time for Some Review
53
- Final Exam Info
- Wednesday, June 25 from 7:00 pm to 10:00 pm
- REMINDERS: Bring pencils, calculator, TEXTBOOK, student ID