Chapter 9.1: Using Chi-square analysis to analyze the fit of a proposed model

Slides: 31
Provided by: jw590
Transcript and Presenter's Notes

1
Chapter 9.1 Using Chi-square analysis to analyze
the fit of a proposed model
  • Goals
  • 1. For a Chi-square Goodness of Fit test
  • a. Know when to use it.
  • b. Know the assumptions.
  • c. Be able to perform all 9 steps.
  • d. Know how to make a statistical conclusion and
    an interpretation of the results.
  • 2. Be able to calculate the expected counts
    for a cell.
  • 3. Be able to calculate the chi-square value for
    a cell as well as the chi-square value for the
    model.

2
How to test one sample proportion vs. some
expected proportion
  • Research question
  • We are interested in simultaneously estimating
    multiple unknown proportions for a population
    based on a sample.
  • Example
  • From newspaper articles, we have read that the US
    is approximately 55% Caucasian, 20% Hispanic, 15%
    African-American, and 10% Other races. We want to
    test if the University of Wyoming follows these
    percentages.

3
Multiple comparisons?
  • We could perform 4 studies, each testing one of
    the proportions
  • Test 1: Is the proportion Caucasian .55?
  • Test 2: Is the proportion Hispanic .20? Etc.
  • But this introduces the problem of multiple
    comparisons, and it doesn't really test what we
    want.
  • We want one OVERALL test to answer: Does this
    model (in its entirety) fit the data?
  • We want an overall test to look for any
    differences among all the parameters in which we
    are interested. We want something to analyze a
    "goodness of fit" for the model.

4
To answer this question, we will use the
following ideas
  • We will set up our actual data in a table with
    r rows and c columns.
  • Using the null hypothesis, we will compute
    another table for the expected counts for each
    cell.
  • We will compare each actual value vs. its
    corresponding expected value.
  • If these differences (in total) are large, then
    we will reject the model.
  • If these differences (in total) are small, then
    we will not reject the model.

5
Calculating expected counts
  • Using our problem above, we have heard that the
    population proportions should be
    Caucasian   Hispanic   African-American   Other
    55%         20%        15%                10%
  • Let our sample size be equal to 500. Our sample
    results are 440, 15, 20, and 25. For each race,
    how many people do we expect if our null
    hypothesis is true?
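
As a minimal sketch of the calculation this slide asks for, each expected count is the sample size multiplied by the proportion claimed under the null hypothesis. The names `null_proportions` and `expected_counts` are illustrative and do not come from the slides.

```python
# Proportions claimed under the null hypothesis (taken from the slide).
null_proportions = {
    "Caucasian": 0.55,
    "Hispanic": 0.20,
    "African-American": 0.15,
    "Other": 0.10,
}
sample_size = 500


def expected_counts(proportions, n):
    """Expected count for each cell = sample size * hypothesized proportion."""
    return {group: n * p for group, p in proportions.items()}


expected = expected_counts(null_proportions, sample_size)
for group, count in expected.items():
    print(f"{group}: {count:.1f}")
```

For n = 500 this gives 275, 100, 75, and 50 expected people, to be compared with the observed 440, 15, 20, and 25.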

6
Observed and expected Frequencies
7
Chi Square
8
Chi Square cont.
9
Example: Conduct a hypothesis test to see if the
students at UW follow the percentages 55/20/15/10,
using alpha = .05.
  • Step 1. Hypotheses
  • H0: The population proportions for Caucasian/
    Hispanic/African-American/Other are .55, .20,
    .15, and .10
  • H1: There is at least one difference in the
    population proportions
  • Step 2. Find critical Chi Square for alpha = .05
  • Step 3. Collect data

10
Checking assumptions
  • Step 4. Assumptions
  • a. Data is collected with an SRS (simple random
    sample)
  • Reason: we want our data to be representative.
  • b. Population (N) is at least 10 times sample
    size (n)
  • Reason: we want the probability of selecting a
    "yes" to be independent from person to person,
    so that the probability of a "yes" is constant.
  • c. No more than 20% of the expected cell counts
    are less than 5. All expected cell counts are at
    least 1.
  • Reason: we want to get an accurate estimate of
    the true proportion.
  • Step 5. Calculate the test statistic.
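
Assumption (c) is easy to check mechanically. The sketch below is one possible way to code the rule as stated on the slide. The function name `check_expected_counts` is hypothetical, and the expected counts are the ones implied by n = 500 and the 55/20/15/10 model.

```python
def check_expected_counts(expected):
    """Return True when the slide's rule holds:
    at most 20% of expected counts are below 5, and every count is at least 1."""
    n_cells = len(expected)
    n_small = sum(1 for e in expected if e < 5)
    share_small = n_small / n_cells
    all_at_least_one = all(e >= 1 for e in expected)
    return share_small <= 0.20 and all_at_least_one


# Expected counts for the UW example (500 * .55, .20, .15, .10).
uw_expected = [275.0, 100.0, 75.0, 50.0]
print(check_expected_counts(uw_expected))
```

Assumptions (a) and (b) concern how the data were collected, so they cannot be checked from the counts alone.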

11
Chi Square Calculations
12
Chi Square Calculations, cont.
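
The calculation slides did not survive in this transcript. As a hedged reconstruction, the sketch below applies the standard chi-square formula, the sum over cells of (observed - expected)^2 / expected, to the observed counts from slide 5 and the expected counts implied by the null hypothesis. The variable and function names are illustrative.

```python
# Observed counts from the sample of 500 (slide 5) and the
# expected counts under H0 (500 * .55, .20, .15, .10).
observed = [440, 15, 20, 25]
expected = [275.0, 100.0, 75.0, 50.0]


def cell_chi_square(obs, exp):
    """Chi-square contribution of a single cell."""
    return (obs - exp) ** 2 / exp


cell_values = [cell_chi_square(o, e) for o, e in zip(observed, expected)]
chi_square = sum(cell_values)

for value in cell_values:
    print(f"cell contribution: {value:.3f}")
print(f"chi-square for the model: {chi_square:.3f}")
```

The four cell contributions are 99.0, 72.25, about 40.333, and 12.5, for a total of about 224.08.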
13
Further Steps
  • Step 6. p-Value: from the table,
  • the p-value is < .0005
  • (since it is off the chart)
  • Step 7. Comparison
  • p < alpha
  • Step 8. Statistical Conclusion
  • Reject the null hypothesis
  • Step 9. Interpretation
  • There is significant evidence that the true model
    for race is different from 55% Caucasian/20%
    Hispanic/15% African-American/10% Other in the
    population, alpha = .05.
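
The slides read the p-value from a printed table. The sketch below is an alternative that computes the upper-tail probability directly with the Python standard library, using the closed-form expressions that exist for integer degrees of freedom. The function `chi_square_sf` is illustrative. The inputs are assumptions not stated on the slides: 224.08 is the statistic obtained from the slide 5 counts, and df = 3 is the usual goodness-of-fit value (number of categories minus 1).

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with integer degrees of freedom df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y^j / j!
        term = math.exp(-y)
        total = term
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return total
    # Odd df: start from the df = 1 tail, erfc(sqrt(y)),
    # then add one term for each additional 2 degrees of freedom.
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range((df - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return total


alpha = 0.05
p_value = chi_square_sf(224.0833, 3)
print(p_value < 0.0005)   # consistent with "off the chart"
print(p_value < alpha)    # the comparison made in Step 7
```

A quick sanity check on the function is that the familiar table value 7.815 for df = 3 gives a tail probability of about .05.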

14
Final notes
  • Note: If our data supported a "Fail to reject
    the null hypothesis" decision, then our
    interpretation would be the same but with NOT
    added.
  • Specifically, we would say: There is not
    significant evidence that the true model for race
    is different from 55% Caucasian/20% Hispanic/15%
    African-American/10% Other in the population,
    alpha = .05.
  • Note: The actual counts for a table must be
    integers.
  • The expected counts for a table DO NOT have to be
    integers.
  • If we expected 55% of the sample to be Caucasian,
    and our sample size is 50 people, then our
    expected count for that cell would be
    (.55)(50) = 27.5 people.

15
Chi square test of independence
  • New Situation
  • We are interested in simultaneously estimating
    multiple unknown proportions and comparing these
    multiple proportions to see if they are all the
    same or if at least one of them is different.
  • Example: Cancer rates for different cities in the
    US (testing if multiple proportions are the
    same)
  • We want to investigate the claim that different
    cities have different rates of cancer. Below is a
    breakdown of the number of households affected by
    cancer for six cities. A household is either
    affected or not affected. We would like to know
    if the proportions of affected households in the
    different cities are the same or if at least one
    of them is different than the rest.

16
Data
17
What does this mean?
  • We want one OVERALL test to answer: Are all of
    these proportions the same?
  • We want an overall test to look for any
    differences among all the parameters in which we
    are interested. We want something to analyze a
    "goodness of fit" for the model of all
    proportions being equal.

18
To answer this question, we will use the
following ideas
  • We will set up our actual data in a table with
    r rows and c columns.
  • Using the null hypothesis, we will compute
    another table for the expected counts for each
    cell.
  • We will compare each actual value vs. its
    corresponding expected value.
  • If these differences (in total) are large, then
    we will reject the model.
  • If these differences (in total) are small, then
    we will not reject the model.

19
Differences from goodness of fit test
  • Using our problem above, we know that if cancer
    rates are the same, then the population
    proportions should be the same for each of the
    six cities.
  • Note: We are not saying that we know that the
    population proportion is equal to some value like
    p = .08. We are only saying that whatever that
    value is, it is the same for all cities.
  • Also note the difference between this statement
    and the one from last lecture
  • (The proportions are equal to .55, .20, .15, .10
    for Caucasian, Hispanic, African-American and
    Other)

20
Row and column totals
21
Calculating expected values
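
The data table for the six cities is not preserved in this transcript, so the sketch below uses a small table of hypothetical counts purely for illustration. It applies the standard rule for a two-way table under the null hypothesis of equal proportions: expected count = (row total)(column total) / (grand total). The function name `expected_table` is illustrative.

```python
# Hypothetical counts, NOT the study's data.
# Rows: cities. Columns: affected, not affected.
observed = [
    [10, 90],
    [20, 80],
    [30, 70],
]


def expected_table(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_table(observed)
for row in expected:
    print(row)
```

In this made-up table every city has 100 households and 60 of the 300 households overall are affected, so each city is expected to have 20 affected and 80 unaffected households.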
22
Calculating Chi Square
23
Example: Conduct a hypothesis test to see if the
cancer rates for the six cities are the same.
  • Step 1. Hypotheses
  • H0: The population proportions for being
    affected by cancer are the same for the six
    cities
  • H1: There is at least one difference in the
    population proportions
  • These hypotheses can also be written with
    symbols
  • H0: p1 = p2 = p3 = p4 = p5 = p6
  • H1: at least one p is different from the rest

24
How do we calculate Degrees of freedom?
  • df = number of cells that are free and not
    pre-determined
  • So for us, df = (r-1)(c-1) = (6-1)(2-1)
  • df = 5
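
The degrees-of-freedom rule above can be written as a one-line helper. This is a minimal sketch and the function name is illustrative.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df for an r x c table: the number of cells that remain free
    once the row and column totals are fixed."""
    return (n_rows - 1) * (n_cols - 1)


# Six cities (rows) by two outcomes, affected / not affected (columns).
print(degrees_of_freedom(6, 2))
```

For the 6 x 2 cancer table this returns 5, matching the slide.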

25
Data for cancer study
Observed
Expected
26
Calculations
27
Calculations, cont.
28
Deciding if result is significant
  • Step 6. p-Value: From the table, the p-value is
    between .01 and .02
  • Step 7. Comparison
  • p < alpha
  • Step 8. Statistical Conclusion
  • Reject the null hypothesis
  • Step 9. Interpretation
  • There is significant evidence that the
    proportions for getting cancer in the 6 cities
    are different in the population, alpha = .05.
  • Note: If our data supported a "Fail to reject
    the null hypothesis" decision, then our
    interpretation would be the same but with NOT
    added.
  • Specifically, we would say: There is not
    significant evidence that the proportions for
    getting cancer in the 6 cities are different in
    the population, alpha = .05.

29
Analyzing which cell contributes the most to a
significant overall chi-square value
  • Look at the individual chi-square values for each
    cell and find the largest.
  • This is the cell in which observed and expected
    are the most different.
  • Interpretation for our data above
  • The chi-square value for the cell corresponding
    to city 4 and "affected with cancer" is 11.338.
  • For this cell, the observed value of 12 was much
    greater than the expected value of 4.7 people.
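
The quoted cell value can be reproduced from the two numbers on the slide. The sketch below uses the per-cell formula (observed - expected)^2 / expected. Only the city 4 "affected" cell is checked, because the other cells are not preserved in this transcript, and the function name is illustrative.

```python
def cell_chi_square(observed, expected):
    """Chi-square contribution of a single cell."""
    return (observed - expected) ** 2 / expected


# City 4, "affected with cancer": observed 12, expected 4.7 (from the slide).
city4_affected = cell_chi_square(12, 4.7)
print(round(city4_affected, 3))
```

This prints 11.338, agreeing with the slide. Running the same function over every cell and taking the maximum identifies the cell that contributes most to the overall statistic.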

30
Testing if 2 variables are independent
  • We want to test if the variables "city" and
    "cancer status" are related.
  • That is, are the variables "city" and "cancer
    status" independent or dependent?
  • Does knowing the value for "city" help us in
    determining the proportion of individuals
    affected with cancer?
  • If the variables are independent, then the
    proportion of people with cancer should be the
    same for all cities. Thus, knowing the city
    does NOT help us.
  • If the variables are dependent, then the
    proportion of people with cancer should NOT be
    the same for all cities. Thus, knowing the
    city DOES help us.