Title: Making Inferences for Associations Between Categorical Variables: Chi Square Chapter 12
1Making Inferences for Associations Between
Categorical Variables: Chi Square, Chapter 12
- Reading Assignment
- pp. 463-482, 485
2Elements of a test of hypotheses 3
- Hypothesis testing: Process for finding out
whether we can generalize about an association
from a sample to a population
- Null hypothesis (H_0): Represents the status quo
to the party performing the sampling experiment,
i.e., will be accepted unless the data provide
convincing evidence that it is false.
- Research hypothesis (H_1) (aka alternative
hypothesis): Will be accepted only if the data
provide convincing evidence of its truth
- Homework: Skills 1, p. 464
3Process of Hypothesis Testing 5
- Step 1: Specify a research hypothesis and a null
hypothesis
- Step 2: Compute the value of a test statistic for
the relationship
- Step 3: Calculate the degrees of freedom for the
variables involved
- Step 4: Look up the distribution for the test
statistic to find its critical value at a
specified level of probability (to determine the
likelihood that a test statistic of a particular
value could have occurred by chance alone)
- Step 5: Decide whether to reject the null
hypothesis
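The five steps can be sketched in Python. This is a minimal illustration, not the textbook's procedure: the die-roll counts are hypothetical, the helper name `chi_square_statistic` is made up for this example, and only a few rows of the standard chi-square table (0.05 level) are hard-coded.

```python
# Critical values of chi-square at the 0.05 level, keyed by degrees of freedom.
# Only a few rows of the standard table are hard-coded here.
CRITICAL_05 = {1: 3.84, 5: 11.07, 9: 16.92}


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Step 1: H_0 = the die is fair; H_1 = the die is not fair.
observed = [8, 9, 12, 11, 6, 14]  # hypothetical counts for faces 1..6 in 60 rolls
expected = [10] * 6               # a fair die gives 60 / 6 = 10 per face

# Step 2: compute the test statistic.
chi2 = chi_square_statistic(observed, expected)

# Step 3: degrees of freedom = number of categories - 1.
df = len(observed) - 1

# Step 4: look up the critical value at the chosen level of probability.
critical = CRITICAL_05[df]

# Step 5: decide whether to reject the null hypothesis.
reject_h0 = chi2 > critical
print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical}, reject H_0: {reject_h0}")
```

With these made-up counts the statistic (4.2) stays below the critical value (11.07), so the null hypothesis of a fair die is not rejected.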
4Null Hypothesis 3
- Null hypothesis (H_0): speculates that there is no
association between the two variables. Examples:
- H_0: Men are no different from women in their
political affiliations
- H_0: There is no relationship between a
respondent's educational level and that of his or
her parents
- H_0: Older people are no more likely to be happy
than younger people
- This is the only hypothesis that can actually be
tested; we either reject or fail to reject the
null hypothesis
- Ex.: H_0: There is no association between age and
happiness among American adults
- Homework: read p. 466
52 Statistical Independence
- Statistical independence: Two variables are
statistically independent when changes in one
variable (age of respondents) have nothing to do
with changes in a second (happiness), i.e., they
vary independently of one another
- Conversely, when two variables are statistically
dependent on one another, changes in one variable
are associated with changes in a second
variable, i.e., changes in age (older respondents)
are associated with changes in levels of
happiness (more happiness)
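One way to see the difference is to compare row percentages. The sketch below uses two hypothetical age-by-happiness tables (the counts are invented for illustration): when the variables are independent, the happiness percentages are the same in every age group.

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Hypothetical counts. Rows: younger, older. Columns: not happy, happy.
independent = [[40, 60], [20, 30]]
dependent = [[40, 60], [10, 40]]

# Same percentages in both rows: age tells us nothing about happiness.
print(row_percentages(independent))

# Different percentages by row: happiness changes with age.
print(row_percentages(dependent))
```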
62 Statistical Independence and hypothesis testing
- Ex.: Null hypothesis: Age is statistically
independent of happiness, i.e., differences among
respondents on the variable age are unrelated to
any differences in their levels of reported
happiness
- Hypothesis testing can assess the likelihood that
the degree of statistical dependence found in the
sample is due to chance
- If we find that the degree of statistical
dependence found in the sample is not likely to be
due to chance, the null hypothesis is rejected
- If it is likely due to chance, we fail to reject
the null hypothesis
73 Type I and Type II Errors
- Mistakes arising because a given sample
may or may not be representative of a population
- If a null hypothesis assumes there is no
association between two variables, rejecting
it even though there is no association is a Type
I error, i.e., we call someone a liar when he is
telling the truth
- If a null hypothesis assumes there is no
association between two variables, failing to
reject it even though there is an association is a
Type II error, i.e., we say someone is truthful
when he is lying
83 Type I and Type II Errors
                 True state
Conclusion       H_0 true            H_1 true
H_0 true         Correct decision    Type II error
H_1 true         Type I error        Correct decision
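The four cells of the table can be written as a small lookup. This is only an illustrative sketch; the function name `classify_outcome` is made up for this example.

```python
def classify_outcome(reject_h0: bool, h0_is_true: bool) -> str:
    """Name the outcome of a test given the decision and the true state."""
    if reject_h0 and h0_is_true:
        return "Type I error"      # rejected a true null hypothesis
    if not reject_h0 and not h0_is_true:
        return "Type II error"     # failed to reject a false null hypothesis
    return "Correct decision"


for reject in (False, True):
    for h0_true in (True, False):
        print(f"reject H_0={reject}, H_0 true={h0_true}: "
              f"{classify_outcome(reject, h0_true)}")
```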
93 Elements of a Test of Hypothesis
- Null hypothesis (H_0): a theory about one of the
population parameters. The theory generally
represents the status quo, which must be proven
false
- Research hypothesis (H_1): a theory that
contradicts the null hypothesis. The theory
generally represents the truth that will be
accepted only if there is evidence
- Test statistic: Sample statistic used to decide
whether to reject the null hypothesis
103 Elements of a Test of Hypothesis (cont)
- Rejection region: The numerical values of the
test statistic for which the null hypothesis will
be rejected. The rejection region is chosen so
that the probability is $\alpha$ that it will contain
the test statistic when the null hypothesis is
true, thereby leading to a Type I error. The
value of $\alpha$ chosen is usually small (e.g.,
0.01, 0.05, or 0.1), and is referred to as the
level of significance of the test. A 0.05 (or 5%)
level of significance indicates that there is a
5% chance that we would reject the null hypothesis
when it is in fact true, or a 95% chance that we
correctly retain a true null hypothesis
- Assumptions: Clear statement(s) of any
assumptions made about the population(s) being
sampled
11 Elements of a Test of Hypothesis (cont)
- Experiment and calculation of test statistic
- Conclusion
- If the numerical value of the test statistic
falls in the rejection region, we reject the null
hypothesis and conclude that the research
hypothesis is true. We know that hypothesis
testing will lead to this conclusion incorrectly
(Type I error) $100\alpha\%$ of the time when H_0 is
true.
- If the test statistic does not fall in the
rejection region, we do not reject H_0. Thus we
reserve judgment about which hypothesis is true.
We do not conclude that the null hypothesis is
true because we do not, in general, know the
probability that our test procedure will lead to
an incorrect failure to reject H_0 (Type II error)
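The claim about Type I errors can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the course material: it tosses a truly fair coin 200 times, repeats that 2000 times, and counts how often chi-square exceeds 3.84, the 0.05 critical value for 1 degree of freedom. The function names, the number of trials, and the seed are arbitrary choices.

```python
import random


def chi_square_fair_coin(heads, tosses):
    """Chi-square statistic for heads/tails counts against a 50/50 expectation."""
    expected = tosses / 2
    tails = tosses - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


def type_i_error_rate(trials=2000, tosses=200, critical=3.84, seed=12):
    """Fraction of samples from a fair coin in which H_0 (fair coin) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(tosses))
        if chi_square_fair_coin(heads, tosses) > critical:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Rejected a true H_0 in {rate:.1%} of samples")
```

The observed rate should land near 5%, though not exactly on it, because head counts are whole numbers and the number of trials is finite.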
125 Chi-Square
- Formula 12.1
- Observed vs. expected: Roll a die 6 times and get
three 3s: observed = three 3s, expected = one 3
- Pp. 469-71, skills: Filling in the table of
expected values
- Skills 3, 4: Excel
- Generally, the greater the value of chi-square,
the stronger the evidence of statistical
dependence between two variables
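Filling in a table of expected values can be sketched as follows. This assumes the standard rule that each expected count is (row total × column total) / N and the usual Pearson statistic, the sum of (observed − expected)² / expected over all cells; the textbook's Formula 12.1 is not reproduced here, and the age-by-happiness counts are hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts. Rows: younger, older.
# Columns: not too happy, pretty happy, very happy.
observed = [[20, 30, 50], [30, 30, 40]]

print(expected_counts(observed))  # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
print(f"chi-square = {chi_square(observed):.2f}")  # chi-square = 3.11
```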
133 Chi-Square /degrees of freedom
- We are using observations from a sample as well
as certain population parameters. If these
parameters are unknown, they must be estimated
from the sample.
- Degrees of freedom ($\nu$): the number N of
independent observations in the sample (i.e.,
sample size) minus the number k of population
parameters which must be estimated from sample
observations
- $\nu = N - k$
- When working with a contingency table,
$df = (r-1)(c-1)$, where r and c are the number of
rows and columns (resp.) in the contingency table
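As a quick worked illustration of the contingency-table rule, take a hypothetical table with 2 rows and 3 columns. The second line shows the usual special case of a single row of m categories whose total is fixed (the symbol m is introduced here only for illustration), which is the form used in one-way examples such as counts of digits or coin faces.

```latex
\nu = (r-1)(c-1), \qquad r = 2,\ c = 3 \;\Rightarrow\; \nu = (2-1)(3-1) = 2
\\
\nu = m - 1 \quad \text{for a one-way table of } m \text{ categories}
```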
14Chi squared example: generate random digits 250
times
Digit      0   1   2   3   4   5   6   7   8   9
Obs freq  17  31  29  18  14  20  35  30  20  36
Exp freq  25  25  25  25  25  25  25  25  25  25
15Chi squared example: generate random digits 250
times
- Question: Does the observed frequency differ from
the expected distribution in a significant way?
163 Chi-Square random digit example
- $\chi^2 = (17-25)^2/25 + (31-25)^2/25 + (29-25)^2/25 + \dots
+ (36-25)^2/25 = 23.3$ (Excel)
- Degrees of freedom: $10 - 1 = 9$
- Table, p. 545
- $\chi^2$ at .99 is 21.7; 23.3 > 21.7, so the observed
frequency differs from the expected frequency at
the 0.01 level of significance, so the table of
random numbers is somewhat doubtful
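The arithmetic can be checked with a few lines of Python instead of Excel. This sketch uses the observed frequencies from the slides and the critical value 21.7 quoted above; the variable names are illustrative only.

```python
# Observed frequencies of digits 0..9 in 250 random digits (from the slides).
observed = [17, 31, 29, 18, 14, 20, 35, 30, 20, 36]

n = sum(observed)                               # 250 digits in total
expected = [n / len(observed)] * len(observed)  # 25 per digit if all are equally likely

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # 10 categories - 1
critical_01 = 21.7       # table value quoted on the slide for 9 degrees of freedom

print(f"chi-square = {chi2:.2f}, df = {df}")                         # chi-square = 23.28, df = 9
print(f"Differs at the 0.01 level: {chi2 > critical_01}")            # True
```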
173 Chi-Square question
- 200 tosses of a coin: 115 heads, 85 tails.
Test the hypothesis that the coin is fair using
(a) 0.05, (b) 0.01 levels of significance
- Answer:
- $df = 2 - 1 = 1$ (2 categories: H, T)
- $O_1 = 115$, $O_2 = 85$; $E_1 = E_2 = 100$
- $\chi^2 = (115-100)^2/100 + (85-100)^2/100 = 4.5$
- (a) $\chi^2$ table for .95 is 3.84; 4.5 > 3.84, so reject
the hypothesis that the coin is fair at the 0.05
level of significance
- (b) $\chi^2$ table for .99 is 6.63; 4.5 < 6.63, so cannot
reject the hypothesis that the coin is fair at the
0.01 level of significance
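The coin answer can be reproduced in Python. As an addition not on the slides, the sketch also computes a p value, using the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(chi2 / 2)). That shortcut only holds for 1 degree of freedom, and the function name is made up for this example.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail probability P(X > chi2) for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [115, 85]   # heads, tails
expected = [100, 100]  # a fair coin in 200 tosses

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = chi_square_p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

# Critical values for 1 degree of freedom, as quoted on the slide.
for alpha, critical in [(0.05, 3.84), (0.01, 6.63)]:
    decision = "reject" if chi2 > critical else "cannot reject"
    print(f"alpha = {alpha}: {decision} the hypothesis that the coin is fair")
```

The p value falls between 0.01 and 0.05, which matches the two table lookups: reject at the 0.05 level, but not at the 0.01 level.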
18Interpreting Chi Square 4
- When hypothesizing about an association between
two variables, chi-square tells the likelihood
that the degree of statistical dependence
observed is simply the luck of the draw
- A p value of 0.05 tells that, if the null
hypothesis were true, there would be no more than
5 chances in 100 of seeing statistical dependence
this strong by chance alone. Because that is
unlikely, the null hypothesis, i.e., no
association between variables, is rejected
- The smaller the p value (i.e., the stricter the
significance level), the less likely we are to
make a Type I error
20Interpreting Chi Square 4
- Pp. 480-81: Table 12.4 (p. 472) has $\chi^2 = 15.487$,
$\nu = 6$
- The higher the $\chi^2$ value, the less likely it is
that the value obtained is due to chance. (Read
Table 12.9, p. 481)
- Rule of thumb: reject the null hypothesis when
$\chi^2$ reaches the 0.05 level of significance, i.e.,
only 5 chances in 100 that the dependence is due
to chance
- Skills 7, p. 481
- Skills 8, p. 485 (following their example, p.
484)
214
- Homework: p. 492: 1, 3
- p. 494: SPSS 1, 2