Chi-Square Procedures

- Chi-Square Test for Goodness of Fit, Independence

of Variables, and Homogeneity of Proportions

The chi-square goodness of fit test: you have only one set of data on a single characteristic, and you want to know whether it matches an expected distribution based on the laws of probability (1 variable, 1 population).

In a chi-square goodness of fit test, the null hypothesis is always

Ho: The data follow a specified distribution.

The alternative hypothesis is always

Ha: The data do not follow the specified distribution.

The idea behind testing these types of claims is

to compare actual counts to the counts we would

expect if the null hypothesis were true. If a

significant difference between the actual counts

and expected counts exists, we would take this as

evidence against the null hypothesis.

The method for obtaining the expected counts

requires that we determine the number of

observations within each cell under the

assumption the null hypothesis is true.

Test Statistic for the Test of Goodness of Fit

Let Oi represent the observed count in the ith cell and Ei represent the expected count in the ith cell. Then

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with (number of cells in the contingency table - 1) degrees of freedom.
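A minimal Python sketch of the statistic defined above. The function name `chi_square_statistic` and the die-roll counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, 10 expected per face if the die is fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of cells minus 1

print(statistic, degrees_of_freedom)
```

A larger statistic means the observed counts sit farther from what the null hypothesis predicts.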

The Chi-Square Test for Goodness of Fit

If a claim is made regarding the data following a certain distribution, we can use the following steps to test the claim, provided

1. the data are randomly selected,

2. all expected frequencies are greater than or equal to 1, and

3. at least 80% of the expected cell counts are greater than or equal to 5.
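Requirements 2 and 3 can be checked mechanically once the expected counts are known. The sketch below assumes a hypothetical helper name, `meets_expected_count_requirements`; requirement 1 (random selection) depends on how the data were collected and cannot be checked from the counts.

```python
def meets_expected_count_requirements(expected):
    """True if every expected count is >= 1 and at least 80% of them are >= 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_at_least_five = sum(1 for e in expected if e >= 5) / len(expected)
    return all_at_least_one and share_at_least_five >= 0.80


# Five cells with an expected count of 80 each: both requirements hold.
print(meets_expected_count_requirements([80, 80, 80, 80, 80]))

# Hypothetical counts: one cell is below 1, so requirement 2 fails.
print(meets_expected_count_requirements([12, 9, 6, 5, 0.8]))

# Hypothetical counts: only 2 of 5 cells reach 5, so requirement 3 fails.
print(meets_expected_count_requirements([12, 9, 3, 2, 1.5]))
```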

EXAMPLE Testing for Goodness of Fit

In consumer marketing, a common problem that any

marketing manager faces is the selection of

appropriate colors for package design. Assume

that a marketing manager wishes to compare five

different colors of package design. He is interested in knowing whether consumers prefer any of the five colors, so that the preferred design can be introduced in the market. A random sample of 400 consumers reveals the following. Do the consumer preferences for package colors show any significant difference?

Package Color         Red   Blue   Green   Pink   Orange   Total

Customer Preference    70    106      80     70       74     400

Step 1. A claim is made regarding the fit of the data to a certain distribution.

Ho: the numbers of customers who prefer each color are the same.

Ha: the numbers of customers who prefer each color are not all the same.

Step 2. Calculate the expected frequencies (counts) for each cell in the contingency table.

Observed Counts

Package Color         Red   Blue   Green   Pink   Orange   Total

Customer Preference    70    106      80     70       74     400

Expected Counts

Package Color         Red   Blue   Green   Pink   Orange   Total

Customer Preference    80     80      80     80       80     400
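The expected counts above follow from the null hypothesis: if no color is preferred, each of the five colors should receive an equal share of the 400 consumers. A short Python sketch of that arithmetic (the variable names are illustrative):

```python
observed = {"Red": 70, "Blue": 106, "Green": 80, "Pink": 70, "Orange": 74}

total = sum(observed.values())               # 400 consumers in the sample
expected_per_color = total / len(observed)   # equal share under the null hypothesis

expected = {color: expected_per_color for color in observed}
print(expected)
```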

Step 3. Verify that the requirements for the chi-square test for goodness of fit are satisfied: (1) the data are randomly selected, (2) all expected frequencies are greater than or equal to 1, and (3) at least 80% of the expected cell counts are greater than or equal to 5. Here every expected count is 80, so (2) and (3) hold.

Step 4. Select a proper level of significance α.

Step 5. Compute the test statistic and P-value.

χ² = 11.4 with df = 5 - 1 = 4

P-value = cdf(min, max, df) = 0.0224, where min is the test statistic, max is a very large number, and df is the degrees of freedom.

If P-value < α, reject the null hypothesis.

11.4 > 9.49 (the critical value for α = 0.05 with 4 degrees of freedom) and 0.0224 < 0.05. Therefore I would reject the null hypothesis. The result is statistically significant, and I am led to believe that there is a difference in preference of package color.
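The numbers in this example can be reproduced with the Python standard library. This is a sketch, not a general-purpose routine: the helper `chi_square_p_value_even_df` is a hypothetical name, and it uses a closed form for the right-tail area that is only valid when the degrees of freedom are even (here df = 4).

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area of the chi-square distribution; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


observed = [70, 106, 80, 70, 74]   # Red, Blue, Green, Pink, Orange
expected = [80] * 5                # equal preference under the null hypothesis
alpha = 0.05

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1
p_value = chi_square_p_value_even_df(statistic, df)
reject_null = p_value < alpha

print(round(statistic, 2), round(p_value, 4), reject_null)
```

For odd degrees of freedom a statistics library would be needed instead of this closed form.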

The chi-square independence test: you have two characteristics of a population, and you want to see whether there is any association between the characteristics (2 variables, 1 population).

In a chi-square independence test, the null hypothesis is always

Ho: the variables are independent.

The alternative hypothesis is always

Ha: the variables are dependent.

The idea behind testing these types of claims is

to compare actual counts to the counts we would

expect if the null hypothesis were true (if the

variables are independent). If a significant

difference between the actual counts and expected

counts exists, we would take this as evidence

against the null hypothesis.

The method for obtaining the expected counts

requires that we determine the number of

observations within each cell under the

assumption the null hypothesis is true.

Expected Frequencies in a Chi-Square Independence Test

To find the expected frequency in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,

$$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$

Test Statistic for the Test of Independence

Let Oi represent the observed count in the ith cell and Ei represent the expected count in the ith cell. Then

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

The Chi-Square Test for Independence

If a claim is made regarding the association between (or independence of) two variables in a contingency table, we can use the following steps to test the claim, provided

1. the data are randomly selected,

2. all expected frequencies are greater than or equal to 1, and

3. at least 80% of the expected cell counts are greater than or equal to 5.

EXAMPLE Testing for Independence

         Money   Health   Love

Men         82      446    355

Women       46      574    273

Step 1. A claim is made regarding the independence of the data.

Ho: there is no association between gender and lifestyle choice; the variables are independent.

Ha: there is an association between gender and lifestyle choice; the variables are dependent.

Step 2. Calculate the expected frequencies (counts) for each cell in the contingency table.

Observed Counts

         Money   Health   Love

Men         82      446    355

Women       46      574    273

Expected Counts

         Money    Health     Love

Men      63.64    507.13   312.23

Women    64.36    512.87   315.77
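The expected counts above come from the rule (row total)(column total) / (table total). A minimal Python sketch, with `expected_counts` as a hypothetical helper name:

```python
def expected_counts(table):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]


# Rows: Men, Women. Columns: Money, Health, Love.
observed = [[82, 446, 355],
            [46, 574, 273]]

expected = expected_counts(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(value, 2) for value in row])
```

For example, the Men/Money cell is 883 × 128 / 1776, which rounds to 63.64.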

Step 3. Verify that the requirements for the chi-square test for independence are satisfied: (1) the data are randomly selected, (2) all expected frequencies are greater than or equal to 1, and (3) at least 80% of the expected cell counts are greater than or equal to 5. Here the smallest expected count is 63.64, so (2) and (3) hold.

Step 4. Select a proper level of significance α.

Step 5. Compute the test statistic and P-value.

χ² = 36.84 with df = (2 - 1)(3 - 1) = 2

P-value = cdf(min, max, df) ≈ 0.00000001, where min is the test statistic and max is a very large number.

If P-value < α, reject the null hypothesis.

36.84 > 5.99 (the critical value for α = 0.05 with 2 degrees of freedom) and 0.00000001 < 0.05. Therefore I would reject the null hypothesis. The result is statistically significant, and I am led to believe that there is an association between gender and lifestyle choice and that these variables are dependent.
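The full independence test for this table can be sketched in a few lines of standard-library Python. The function names are hypothetical, and the P-value line relies on the fact that with exactly 2 degrees of freedom the right-tail area of the chi-square distribution is exp(-χ²/2); other degrees of freedom would need a different calculation.

```python
import math


def expected_counts(table):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of a two-way table."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )


# Rows: Men, Women. Columns: Money, Health, Love.
observed = [[82, 446, 355],
            [46, 574, 273]]
alpha = 0.05

statistic = chi_square_statistic(observed, expected_counts(observed))
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
p_value = math.exp(-statistic / 2)                   # exact only because df == 2
reject_null = p_value < alpha

print(round(statistic, 2), df, p_value, reject_null)
```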

In a chi-square test for homogeneity, you take samples from different populations, and you want to test whether the proportions in various categories are the same for each population (1 variable, multiple populations).

In a chi-square homogeneity test, the null hypothesis is always

Ho: the populations have the same proportion of individuals with some characteristic.

The alternative hypothesis is always

Ha: the populations have different proportions of individuals with some characteristic.

The idea behind testing these types of claims is

to compare actual counts to the counts we would

expect if the null hypothesis were true

(proportions are equal). If a significant

difference between the actual counts and expected

counts exists, we would take this as evidence

against the null hypothesis.

The method for obtaining the expected counts

requires that we determine the number of

observations within each cell under the

assumption the null hypothesis is true.

Expected Frequencies in a Chi-Square Homogeneity Test

To find the expected frequency in a cell when performing a chi-square homogeneity test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,

$$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$

Test Statistic for the Test of Homogeneity

Let Oi represent the observed count in the ith cell and Ei represent the expected count in the ith cell. Then

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

The Chi-Square Test for Homogeneity

If a claim is made that different populations have the same proportion of individuals with some characteristic, we can use the following steps to test the claim, provided

1. the data are randomly selected,

2. all expected frequencies are greater than or equal to 1, and

3. at least 80% of the expected cell counts are greater than or equal to 5.

EXAMPLE A Test of Homogeneity of Proportions

The following question was asked of a random sample of individuals in 1992, 1998, and 2001: "Would you tell me if you feel being a teacher is an occupation of very great prestige?" The results of the survey are presented below.

       1992   1998   2001

Yes     549    539    570

No      522    578    599

Step 1. A claim is made regarding the homogeneity of the data.

Ho: the proportions of individuals who feel teaching is an occupation of very great prestige are equal in each year.

Ha: the proportions of individuals who feel teaching is an occupation of very great prestige are not equal in each year.

Step 2. Calculate the expected frequencies (counts) for each cell in the contingency table.

Observed Counts

       1992   1998   2001

Yes     549    539    570

No      522    578    599

Expected Counts

         1992     1998     2001

Yes    528.96   551.68   577.36

No     542.04   565.32   591.64

Step 3. Verify that the requirements for the chi-square test for homogeneity are satisfied: (1) the data are randomly selected, (2) all expected frequencies are greater than or equal to 1, and (3) at least 80% of the expected cell counts are greater than or equal to 5. Here the smallest expected count is 528.96, so (2) and (3) hold.

Step 4. Select a proper level of significance α.

Step 5. Compute the test statistic and P-value.

χ² = 2.26 with df = (2 - 1)(3 - 1) = 2

P-value = cdf(min, max, df) = 0.3228, where min is the test statistic and max is a very large number.

If P-value < α, reject the null hypothesis.

2.26 < 9.21 (the critical value for α = 0.01 with 2 degrees of freedom) and 0.323 > 0.01. Therefore I would fail to reject the null hypothesis. The result is not statistically significant, and I cannot conclude that the proportion of individuals who feel teaching is an occupation of very great prestige differs from year to year.
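One way to see what this test compares is to work from the proportions directly. Under the null hypothesis every year shares one pooled "Yes" proportion, and the expected counts are each year's sample size times that pooled proportion, which gives the same values as the row-total times column-total rule. The Python sketch below uses illustrative variable names, and its P-value line is exact only because df = 2 here.

```python
import math

years = ["1992", "1998", "2001"]
yes = [549, 539, 570]
no = [522, 578, 599]
alpha = 0.01

sample_sizes = [y + n for y, n in zip(yes, no)]
sample_proportions = [y / size for y, size in zip(yes, sample_sizes)]
pooled_yes = sum(yes) / sum(sample_sizes)   # common proportion assumed under Ho

statistic = 0.0
for y, n, size in zip(yes, no, sample_sizes):
    expected_yes = size * pooled_yes
    expected_no = size * (1 - pooled_yes)
    statistic += (y - expected_yes) ** 2 / expected_yes
    statistic += (n - expected_no) ** 2 / expected_no

df = (2 - 1) * (len(years) - 1)       # (r - 1)(c - 1) with 2 rows and 3 columns
p_value = math.exp(-statistic / 2)    # right-tail area; exact only when df == 2
reject_null = p_value < alpha

for year, proportion in zip(years, sample_proportions):
    print(year, round(proportion, 3))
print(round(statistic, 2), round(p_value, 4), reject_null)
```

The three sample proportions are close to one another, which is consistent with failing to reject the null hypothesis.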