Chapter 9.1: Using Chi-square analysis to analyze the fit of a proposed model

Slides: 31

Provided by: jw590

Transcript and Presenter's Notes

1
Chapter 9.1 Using Chi-square analysis to analyze
the fit of a proposed model

Goals
1. For a Chi-square Goodness of Fit test
a. Know when to use it.
b. Know the assumptions.
c. Be able to perform all 9 steps.
d. Know how to make a statistical conclusion and
an interpretation of the results.
2. Be able to calculate the expected counts
for a cell.
3. Be able to calculate the chi-square value for
a cell as well as the chi-square value for the
model.

2
How to test one sample proportion vs. some
expected proportion

Research question:
We are interested in simultaneously estimating
multiple unknown proportions for a population
based on a sample.
Example:
From newspaper articles, we have read that the US
is approximately 55% Caucasian, 20% Hispanic, 15%
African-American, and 10% Other races. We want to
test if the University of Wyoming follows these
percentages.

3
Multiple comparisons?

We could perform 4 studies, each testing one of
the proportions:
Test 1: Is the proportion Caucasian .55?
Test 2: Is the proportion Hispanic .20? Etc.
But this introduces the problem of multiple
comparisons, and it doesn't really test what we
want.
We want one OVERALL test to answer: "Does this
model (in its entirety) fit the data?"
We want an overall test to look for any
differences among all the parameters in which we
are interested. We want something to analyze the
"goodness of fit" of the model.

4
To answer this question, we will use the
following ideas

We will set up our actual data in a table with
r rows and c columns.
Using the null hypothesis, we will compute
another table for the expected counts for each
cell.
We will compare each actual value vs. its
corresponding expected value.
If these differences (in total) are large, then
we will reject the model.
If these differences (in total) are small, then
we will not reject the model.

5
Calculating expected counts

Using our problem above, we have heard that the
population proportions should be:
Caucasian   Hispanic   African-American   Other
55%         20%        15%                10%
Let our sample size be equal to 500. Our sample
results (in the same order) are 440, 15, 20, and
25. For each race, how many people do we expect
if our null hypothesis is true?
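The expected count for each cell is the sample size times the proportion claimed under the null hypothesis. A minimal Python sketch of that arithmetic, using the sample size and proportions from this example (the variable names are illustrative and do not come from the slides):

```python
# Proportions claimed by the null hypothesis (from the example above).
null_proportions = {
    "Caucasian": 0.55,
    "Hispanic": 0.20,
    "African-American": 0.15,
    "Other": 0.10,
}
sample_size = 500

# Expected count for a cell = sample size * hypothesized proportion.
expected_counts = {
    group: sample_size * p for group, p in null_proportions.items()
}

for group, count in expected_counts.items():
    print(f"{group}: {count:.1f}")
```

With these inputs the expected counts come to 275, 100, 75, and 50, which add up to the sample size of 500.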

6
Observed and expected Frequencies
7
Chi Square
8
Chi Square cont.
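For reference, the chi-square value for a single cell and for the whole model follow the standard Pearson form. In the notation assumed here (not necessarily the one used on the slides), O_i is the observed count and E_i the expected count for cell i, and k is the number of cells:

```latex
\chi^2_{\text{cell } i} = \frac{(O_i - E_i)^2}{E_i},
\qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```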
9
Example: Conduct a hypothesis test to see if the
students at UW follow the percentages 55/20/15/10,
using alpha = .05.

Step 1. Hypotheses
H0: The population proportions for Caucasian/
Hispanic/African-American/Other are .55, .20,
.15, and .10
H1: There is at least one difference in the
population proportions
Step 2. Find the critical chi-square value for alpha = .05
Step 3. Collect data

10
Checking assumptions

Step 4. Assumptions
a. Data is collected with an SRS (simple random sample)
Reason: we want our data to be representative.
b. Population (N) is at least 10 times sample
size (n)
Reason: we want the probability of selecting a
"yes" to be independent from person to person,
thus the probability of a "yes" is constant.
c. No more than 20% of the expected cell counts
are less than 5. All expected cell counts are at
least 1.
Reason: we want to get an accurate estimate of the
true proportion.
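Assumption (c) can be checked mechanically once the expected counts are known. The sketch below is one possible way to do it; the function name `expected_counts_ok` is hypothetical, and the expected counts are the ones implied by the example (n = 500 times .55, .20, .15, .10):

```python
def expected_counts_ok(expected):
    """Check assumption (c): at most 20% of the expected counts are
    below 5, and every expected count is at least 1."""
    share_below_5 = sum(1 for e in expected if e < 5) / len(expected)
    return share_below_5 <= 0.20 and min(expected) >= 1

# Expected counts for the race example.
race_expected = [275.0, 100.0, 75.0, 50.0]
print(expected_counts_ok(race_expected))
```

Assumptions (a) and (b) concern how the data were collected, so they have to be judged from the study design and cannot be checked from the counts alone.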
Step 5. Calculate the test statistic.

11
Chi Square Calculations
12
Chi Square Calculations, cont.
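The calculation for this example can be sketched in Python as follows. The observed counts are the sample results given earlier; the expected counts are 500 times the hypothesized proportions. The variable names are illustrative only:

```python
observed = [440, 15, 20, 25]            # Caucasian, Hispanic, African-American, Other
expected = [275.0, 100.0, 75.0, 50.0]   # 500 * (.55, .20, .15, .10)

# Chi-square value for each cell: (observed - expected)^2 / expected.
cell_chi_square = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Chi-square value for the model: the sum over all cells.
chi_square_total = sum(cell_chi_square)

for value in cell_chi_square:
    print(f"{value:.3f}")
print(f"total: {chi_square_total:.3f}")
```

With these inputs the cell values are 99.000, 72.250, 40.333, and 12.500, for a total of about 224.083.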
13
Further Steps

Step 6. p-value: from the table,
the p-value is < .0005
(since it is off the chart)
Step 7. Comparison
p < alpha
Step 8. Statistical conclusion
Reject the null hypothesis
Step 9. Interpretation
There is significant evidence that the true model
for race is different from 55% Caucasian/20%
Hispanic/15% African-American/10% Other in the
population, alpha = .05.
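Steps 6 through 8 can also be carried out numerically instead of with a table. The sketch below uses only the Python standard library; `chi_square_p_value` is a hypothetical helper written for this note, and it assumes df = 4 - 1 = 3 for a goodness of fit test with four categories, a value the steps above do not state explicitly:

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square
    variable with a positive integer number of degrees of freedom."""
    half_x = statistic / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-x) * sum of x**j / j!.
        term = math.exp(-half_x)
        total = term
        for j in range(1, df // 2):
            term *= half_x / j
            total += term
        return total
    # Odd df: start from Q(1/2, x) = erfc(sqrt(x)) and apply the recurrence
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    total = math.erfc(math.sqrt(half_x))
    a = 0.5
    while a < df / 2.0:
        total += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return total

observed = [440, 15, 20, 25]
expected = [275.0, 100.0, 75.0, 50.0]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
df = len(observed) - 1  # assumed: number of categories minus 1
p_value = chi_square_p_value(statistic, df)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)
```

The computed p-value is far below .0005, which agrees with the table lookup being "off the chart".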

14
Final notes

Note: If our data supported a "fail to reject
the null hypothesis" decision, then our
interpretation would be the same but with NOT
added.
Specifically, we would say: There is not
significant evidence that the true model for race
is different from 55% Caucasian/20% Hispanic/15%
African-American/10% Other in the population,
alpha = .05.
Note: The actual counts for a table must be
integers.
The expected counts for a table DO NOT have to be
integers.
If we expected 55% of the sample to be Caucasian,
and our sample size were 50 people, then our
expected count for that cell would be
(.55)(50) = 27.5 people.

15
Chi square test of independence

New situation:
We are interested in simultaneously estimating
multiple unknown proportions and comparing these
multiple proportions to see if they are all the
same or if at least one of them is different.
Example: Cancer rates for different cities in the
US (testing if multiple proportions are the
same)
We want to investigate the claim that different
cities have different rates of cancer. Below is a
breakdown of the number of households affected by
cancer for six cities. A household is either
affected or not affected. We would like to know
if the proportions of affected households in the
different cities are the same or if at least one
of them is different from the rest.

16
Data
17
What does this mean?

We want one OVERALL test to answer: "Are all of
these proportions the same?"
We want an overall test to look for any
differences among all the parameters in which we
are interested. We want something to analyze the
"goodness of fit" of the model of all
proportions being equal.

18
To answer this question, we will use the
following ideas

We will set up our actual data in a table with
r rows and c columns.
Using the null hypothesis, we will compute
another table for the expected counts for each
cell.
We will compare each actual value vs. its
corresponding expected value.
If these differences (in total) are large, then
we will reject the model.
If these differences (in total) are small, then
we will not reject the model.

19
Differences from goodness of fit test

Using our problem above, we know that if cancer
rates are the same, then the population
proportions should be the same for each of the
six cities.
Note: We are not saying that we know that the
population proportion is equal to some value like
p = .08. We are only saying that whatever that
value is, it is the same for all cities.
Also note the difference between this statement
and the one from last lecture:
(The proportions are equal to .55, .20, .15, .10
for Caucasian, Hispanic, African-American and
Other)

20
Row and column totals
21
Calculating expected values
22
Calculating Chi Square
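For a table with r rows and c columns, the expected count for a cell under the null hypothesis is (row total × column total) / grand total, and the chi-square value is again the sum of (observed - expected)^2 / expected over all cells. The sketch below illustrates this with HYPOTHETICAL counts for three cities; these are not the data from the slides, and the function names are made up for this note:

```python
def expected_table(observed):
    """Expected counts under H0: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# HYPOTHETICAL counts, not the data from the slides:
# one row per city, columns are (affected, not affected).
hypothetical_observed = [
    [10, 90],
    [20, 80],
    [30, 70],
]

hypothetical_expected = expected_table(hypothetical_observed)
statistic = chi_square_statistic(hypothetical_observed, hypothetical_expected)
print(hypothetical_expected)
print(statistic)
```

In this made-up table every city has 100 households and 60 of the 300 households are affected, so each city is expected to have 20 affected and 80 not affected, and the chi-square value is 12.5.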
23
Example: Conduct a hypothesis test to see if the
cancer rates for the six cities are the same,
using alpha = .05.

Step 1. Hypotheses
H0: The population proportions for being
affected by cancer are the same for the six
cities
H1: There is at least one difference in the
population proportions
These hypotheses can also be written with
symbols:
H0: p1 = p2 = p3 = p4 = p5 = p6
H1: at least one p is different from the rest
24
How do we calculate degrees of freedom?

df = number of cells that are free and not
pre-determined
So for us, df = (r-1)(c-1) = (6-1)(2-1)
df = 5
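The degrees of freedom rule is simple enough to write as a one-line function. The sketch below also includes the goodness of fit case, where df is the number of categories minus 1; that rule is standard but is not stated on this slide, and both function names are hypothetical:

```python
def df_independence(n_rows, n_cols):
    """Degrees of freedom for an r x c table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def df_goodness_of_fit(n_categories):
    """Degrees of freedom for a goodness of fit test: k - 1 (assumed rule)."""
    return n_categories - 1

print(df_independence(6, 2))   # six cities, affected / not affected
print(df_goodness_of_fit(4))   # four race categories
```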

25
Data for cancer study
Observed
Expected
26
Calculations
27
Calculations, cont.
28
Deciding if result is significant

6. p-value: From the table, the p-value is
between .01 and .02
7. Comparison
p < alpha
8. Statistical conclusion
Reject the null hypothesis
9. Interpretation
There is significant evidence that the
proportions for getting cancer in the 6 cities
are different in the population, alpha = .05.
Note: If our data supported a "fail to reject
the null hypothesis" decision, then our
interpretation would be the same but with NOT
added.
Specifically, we would say: There is not
significant evidence that the proportions for
getting cancer in the 6 cities are different in
the population, alpha = .05.

29
Analyzing which cell contributes the most to a
significant overall chi-square value

Look at the individual chi-square values for each
cell and find the largest.
This is the cell in which observed and expected
are the most different.
Interpretation for our data above:
The chi-square value for the cell corresponding
to city 4 and "affected with cancer" is 11.338.
For this cell, the observed value of 12 was much
greater than the expected value of 4.7 people.
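The cell value quoted above can be reproduced directly from the observed and expected counts. The sketch below does that and then shows one possible way to locate the largest contribution in a list of cells. Since the full cancer table is not reproduced here, the second part reuses the race example from earlier in the chapter, and the function names are hypothetical:

```python
def cell_chi_square(observed, expected):
    """Chi-square value for one cell: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected

# The one cancer-study cell reported in the text: city 4, affected.
city4_affected = cell_chi_square(12, 4.7)
print(round(city4_affected, 3))

def largest_cell(observed, expected):
    """Return (index, chi-square value) of the cell that contributes most."""
    contributions = [cell_chi_square(o, e) for o, e in zip(observed, expected)]
    index = max(range(len(contributions)), key=contributions.__getitem__)
    return index, contributions[index]

# Race example: observed sample counts vs. expected counts under H0.
index, value = largest_cell([440, 15, 20, 25], [275.0, 100.0, 75.0, 50.0])
print(index, value)
```

For the race example the largest contribution comes from the first cell (Caucasian), where the observed count of 440 is far above the expected 275.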

30
Testing if 2 variables are independent

We want to test if the variables "city" and
"cancer status" are related.
That is, are the variables "city" and "cancer
status" independent or dependent?
Does knowing the value for city help us in
determining the proportion of individuals
affected with cancer?
If the variables are independent, then the
proportion of people with cancer should be the
same for all cities. Thus, knowing the city does
NOT help us out.
If the variables are dependent, then the
proportion of people with cancer should NOT be
the same for all cities. Thus, knowing the
city DOES help us out.
