Making Inferences for Associations Between Categorical Variables: Chi Square Chapter 12

About This Presentation

Slides: 22

Provided by: setonhallu

Learn more at: http://pirate.shu.edu

Transcript and Presenter's Notes

Title: Making Inferences for Associations Between Categorical Variables: Chi Square Chapter 12

1
Making Inferences for Associations Between
Categorical Variables: Chi Square, Chapter 12

Reading Assignment
pp. 463-482, 485

2
Elements of a test of hypotheses 3

Hypothesis testing: Process for finding out
whether we can generalize about an association
from a sample to a population
Null hypothesis (H_0): Represents the status quo
to the party performing the sampling experiment,
i.e., it will be accepted unless the data provide
convincing evidence that it is false.
Research hypothesis (H_1) (aka alternative
hypothesis): Will be accepted only if the data
provide convincing evidence of its truth
Homework: Skills 1, p. 464

3
Process of Hypothesis Testing 5

Step 1: Specify a research hypothesis and a null
hypothesis
Step 2: Compute the value of a test statistic for
the relationship
Step 3: Calculate the degrees of freedom for the
variables involved
Step 4: Look up the distribution for the test
statistic to find its critical value at a
specified level of probability (to determine the
likelihood that a test statistic of a particular
value could have occurred by chance alone)
Step 5: Decide whether to reject the null
hypothesis

4
Null Hypothesis 3

Null Hypothesis (H_0): speculates that there is no
association between the two variables. Examples:
H_0: Men are no different from women in their
political affiliations
H_0: There is no relationship between a
respondent's educational level and that of his or
her parents
H_0: Older people are no more likely to be happy
than younger people
This is the only hypothesis that can actually be
tested: we either reject or fail to reject the
null hypothesis
Example: H_0: There is no association between age
and happiness among American adults
Homework: read p. 466

5
2 Statistical Independence

Statistical Independence: Two variables are
statistically independent when changes in one
variable (age of respondents) have nothing to do
with changes in a second (happiness), i.e., they
vary independently of one another
Conversely, when two variables are statistically
dependent on one another, changes in one variable
are associated with changes in a second
variable, i.e., changes in age (older respondents)
are associated with changes in levels of
happiness (more happiness)

6
2 Statistical Independence and hypothesis testing

Example null hypothesis: Age is statistically
independent of happiness, i.e., differences among
respondents on the variable age are unrelated to
any differences in their levels of reported
happiness
Hypothesis testing can assess the likelihood that
the degree of statistical dependence found in the
sample is due to chance
If we find that the degree of statistical
dependence found in the sample is not likely to
be due to chance, the null hypothesis is rejected
If it is likely due to chance, the null hypothesis
is not rejected

7
3 Type I and Type II Errors

Mistakes arising because a given sample
may or may not be representative of a population
If a null hypothesis assumes there is no
association between two variables, rejecting it
even though there is no association is a Type I
error, i.e., we call someone a liar when he is
telling the truth
If a null hypothesis assumes there is no
association between two variables, failing to
reject it even though there is an association is
a Type II error, i.e., we say someone is truthful
when he is lying

8
3 Type I and Type II Errors

                 True state
Conclusion       H_0 true           H_1 true
H_0 true         Correct decision   Type II error
H_1 true         Type I error       Correct decision
9
3 Elements of a Test of Hypothesis

Null Hypothesis (H_0): a theory about one of the
population parameters. The theory generally
represents the status quo, which must be proven
false
Research Hypothesis (H_1): a theory that
contradicts the null hypothesis. The theory
generally represents the claim that will be
accepted only if there is evidence for it
Test statistic: Sample statistic used to decide
whether to reject the null hypothesis

10
3 Elements of a Test of Hypothesis (cont)

Rejection region: The numerical values of the
test statistic for which the null hypothesis will
be rejected. The rejection region is chosen so
that the probability is α that it will contain
the test statistic when the null hypothesis is
true, thereby leading to a Type I error. The
value of α chosen is usually small (e.g.,
0.01, 0.05, or 0.1), and is referred to as the
level of significance of the test. A 0.05 (or 5%)
level of significance indicates that, when the
null hypothesis is true, there is a 5% chance
that we would reject it anyway; equivalently, a
true null hypothesis is correctly retained 95%
of the time
Assumptions: Clear statement(s) of any
assumptions made about the population(s) being
sampled

11

Elements of a Test of Hypothesis (cont)

Experiment and calculation of test statistic
Conclusion
If the numerical value of the test statistic
falls in the rejection region, we reject the null
hypothesis and conclude that the research
hypothesis is true. We know that hypothesis
testing will lead to this conclusion incorrectly
(Type I Error) 100α% of the time when H_0 is
true.
If the test statistic does not fall in the
rejection region, we do not reject H_0. Thus we
reserve judgment about which hypothesis is true.
We do not conclude that the null hypothesis is
true because we do not, in general, know the
probability that our test procedure will lead to
an incorrect failure to reject H_0 (Type II Error)
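
The statement that a test at level α rejects a true H_0 about 100α% of the time can be checked by simulation. The sketch below is not part of the original slides: it assumes a fair coin tossed 200 times per experiment and the χ² cutoff of 3.84 (df = 1, α = 0.05). The function names, the number of trials, and the seed are illustrative choices.

```python
import random

def chi_square_coin(heads, n):
    """Chi-square statistic for heads/tails counts against a fair coin."""
    expected = n / 2
    tails = n - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected

def type_i_error_rate(trials=2000, tosses=200, critical=3.84, seed=12):
    """Fraction of experiments on a truly fair coin in which H_0 is rejected."""
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(tosses))
        if chi_square_coin(heads, tosses) > critical:
            rejections += 1  # H_0 is true here, so every rejection is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

Because head counts are whole numbers, the simulated rate should land near 0.05 rather than exactly on it.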

12
5 Chi-Square

Formula 12.1
Observed vs. expected: Roll a die 6 times and get
three 3s (observed); the expected count is one 3
pp. 469-71, skills: Filling in the table of
expected values
Skills 3, 4: Excel
Generally, the greater the value of chi-square,
the more statistical dependence between two
variables
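
Filling in a table of expected values can be sketched in a few lines. The transcript does not reproduce Formula 12.1, so this sketch assumes the usual Pearson form, the sum over cells of (observed - expected)²/expected, which matches the worked examples later in the deck. Each expected count is taken as row total times column total divided by the grand total. The counts are hypothetical, not from the textbook.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical counts: rows = age group, columns = happiness level
observed = [
    [30, 50, 20],  # younger respondents
    [20, 50, 30],  # older respondents
]
expected = expected_counts(observed)
stat = chi_square(observed, expected)
print(expected)        # expected count for every cell
print(round(stat, 2))  # chi-square statistic
```

With these made-up counts every row has the same expected values, so the statistic comes entirely from the two outer columns.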

13
3 Chi-Square /degrees of freedom

We are using observations from a sample as well
as certain population parameters. If these
parameters are unknown, they must be estimated
from the sample.
Degrees of Freedom (ν): the number N of
independent observations in the sample (i.e.,
sample size) minus the number k of population
parameters which must be estimated from sample
observations
ν = N - k
When working with a contingency table,
df = (r - 1)(c - 1), where r and c are the number
of rows and columns (resp.) in the contingency table
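
The contingency-table rule is easy to express in code. This is a small sketch with a hypothetical helper name; only the shape of the table matters, so the example uses placeholder zeros rather than real counts.

```python
def contingency_df(table):
    """Degrees of freedom (r - 1)(c - 1) for an r-by-c contingency table."""
    rows = len(table)
    cols = len(table[0])
    if any(len(row) != cols for row in table):
        raise ValueError("every row must have the same number of columns")
    return (rows - 1) * (cols - 1)

# Placeholder tables: a 2-by-3 and a 3-by-4 layout
two_by_three = [[0, 0, 0], [0, 0, 0]]
three_by_four = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(contingency_df(two_by_three))   # (2 - 1)(3 - 1) = 2
print(contingency_df(three_by_four))  # (3 - 1)(4 - 1) = 6
```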

14
Chi squared example: generate random digits 250
times
digit      0   1   2   3   4   5   6   7   8   9
obs freq  17  31  29  18  14  20  35  30  20  36
exp freq  25  25  25  25  25  25  25  25  25  25
15
Chi squared example: generate random digits 250
times

Question: Does the observed frequency differ from
the expected distribution in a significant way?

digit      0   1   2   3   4   5   6   7   8   9
obs freq  17  31  29  18  14  20  35  30  20  36
exp freq  25  25  25  25  25  25  25  25  25  25
16
3 Chi-Square random digit example

χ² = (17 - 25)²/25 + (31 - 25)²/25 + (29 - 25)²/25
+ ... + (36 - 25)²/25 (computed in Excel)
= 23.3
Degrees of freedom: 10 - 1 = 9
Table, p. 545
χ² at .99 is 21.7; 23.3 > 21.7, so the observed
frequency differs from the expected frequency at
the 0.01 level of significance, so the table of
random numbers is somewhat doubtful
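
The arithmetic for the random digit example can be reproduced as follows. This is a sketch using the observed frequencies from the slide; the critical value 21.7 is taken from the slide's table lookup rather than computed, and the variable names are illustrative.

```python
# Observed frequencies of digits 0-9 in 250 generated digits (from the slide)
observed = [17, 31, 29, 18, 14, 20, 35, 30, 20, 36]

n = sum(observed)                               # 250 digits in total
expected = [n / len(observed)] * len(observed)  # 25 per digit if all are equally likely

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                          # 10 categories - 1 = 9

critical_99 = 21.7  # table value quoted on the slide for 9 df at .99
print(round(chi_sq, 2), df, chi_sq > critical_99)  # 23.28 9 True
```

The unrounded statistic is 23.28, which the slide reports as 23.3.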

17
3 Chi-Square question

200 tosses of a fair coin, 115 heads, 85 tails.
Test the hypothesis that the coin is fair using
(a) 0.05, (b) 0.01 levels of significance
Answer:
df = 2 - 1 = 1 (2 categories: H, T)
O_1 = 115, O_2 = 85; E_1 = E_2 = 100
χ² = (115 - 100)²/100 + (85 - 100)²/100 = 4.5
(a) χ² table for .95 is 3.84; 4.5 > 3.84, so reject
the hypothesis that the coin is fair at the 0.05
level of significance
(b) χ² table for .99 is 6.63; 4.5 < 6.63, so cannot
reject the hypothesis that the coin is fair at the
0.01 level of significance
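
The coin question can also be answered with a p value instead of a table lookup. The sketch below relies on a standard fact that is not in the slides: with 1 degree of freedom, a χ² variable is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(χ²/2)). The helper name is hypothetical, and the shortcut only applies when df = 1.

```python
import math

def chi_square_p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

observed = {"heads": 115, "tails": 85}  # counts from the slide
n = sum(observed.values())              # 200 tosses
expected = n / 2                        # 100 of each for a fair coin

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
p_value = chi_square_p_value_df1(chi_sq)

print(f"chi-square = {chi_sq:.2f}, p = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"alpha = {alpha}: {decision} the hypothesis that the coin is fair")
```

The p value falls between 0.01 and 0.05, which agrees with the two table-based decisions above.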

18
Interpreting Chi Square 4

When hypothesizing about an association between
two variables, chi-square tells the likelihood
that the degree of statistical dependence
observed is simply the luck of the draw
A p value of 0.05 tells that, if there were no
association, there would be no more than 5
chances in 100 of seeing this much statistical
dependence by chance alone. Because such a result
would be unusual under the null hypothesis, the
null hypothesis, i.e., no association between
variables, is rejected
The smaller the level of significance required to
reject (e.g., 0.01 rather than 0.05), the less
likely we are to make a Type I error

20
Interpreting Chi Square 4

pp. 480-81: Table 12.4 (p. 472) has χ² = 15.487,
ν = 6
The higher the χ² value, the less likely it is
that the value obtained is due to chance. (read
table 12.9, p. 481)
Rule of thumb: reject the null hypothesis when the
p value for χ² reaches 0.05 or less, i.e., only 5
chances in 100 that dependence this strong would
arise by chance alone
Skills 7, p. 481
Skills 8, p. 485 (following their example, p.
484)
