
Chapter 3 Producing Data

- Inferential Statistics
- Sampling
- Designing Experiments

Inferential Statistics

- We start with a question about a group or groups.
- The group(s) we are interested in is (are) called the population(s).
- Examples
- What is the average number of car accidents for a person over 65 in the United States?
- For the entire world, is the IQ of women the same as the IQ of men?
- How many times a day should I feed my goldfish?
- Which is more effective at lowering the heart rate of mice: no drug (control), drug A, drug B, or drug C?

Inferential Statistics

- Example 1: What is the average number of car accidents for a person over 65 in the United States?
- How many populations are of interest?
- One
- What is the population of interest?
- All people in the U.S. over age 65.

Inferential Statistics

- Example 2: For the entire world, is the IQ of women the same as the IQ of men?
- How many populations are of interest?
- Two
- What are the populations of interest?
- All women and all men

Inferential Statistics

- Example 3: How many times a day should I feed my goldfish?
- How many populations are of interest?
- One
- What is the population of interest?
- All pet goldfish

Inferential Statistics

- Example 4: Which is more effective at lowering the heart rate of mice: no drug (control), drug A, drug B, or drug C?
- How many populations are of interest?
- Four
- What are the populations of interest?
- All mice taking no drug, all mice taking drug A, all mice taking drug B, and all mice taking drug C

Inferential Statistics

- Suppose we have no previous information about these questions. How could we answer them?
- Census
- Advantages
- We get everyone, so we know the truth.
- Disadvantages
- Expensive, difficult to obtain, and possibly impossible.
- Sample
- Advantages
- Less expensive and feasible.
- Disadvantages
- Uncertainty about the truth: instead of certainty, we may have error.

Inferential Statistics

- Suppose we have no previous information about these questions. How could we answer them?
- If we take a census, we have everyone and have no need for inference. We know.
- If we take a sample, we make inference from the sample to the whole population.
- For these four questions, we are unlikely to be able to take a census. We will need to use a sample.
- For each population we are interested in, we must get a separate sample.

Inferential Statistics

- General Idea of Inferential Statistics
- We take a sample from the whole population.
- We summarize the sample using important statistics.
- We use those summaries to make inference about the whole population.
- We realize there may be some error involved in making inference.
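As a rough illustration of these steps, the Python sketch below builds a made-up population (the values, sizes, and seed are all assumptions for illustration), draws a sample, summarizes it with the sample mean, and compares that estimate with the population mean. In real problems the population mean is unknown; it can be computed here only because the population is simulated.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up values (e.g., accident counts 0-6).
rng = random.Random(2024)
population = [rng.randint(0, 6) for _ in range(10_000)]

# Take a sample of 100 individuals from the whole population.
sample = rng.sample(population, k=100)

# Summarize the sample with a statistic (the sample mean).
estimate = statistics.mean(sample)

# Use the statistic to make inference about the population mean.
# Because this population is simulated, the error can also be measured.
parameter = statistics.mean(population)
error = estimate - parameter

print(f"sample mean = {estimate:.2f}, population mean = {parameter:.2f}, error = {error:+.2f}")
```

Rerunning with a different seed gives a different sample and a different error, which is the uncertainty the last bullet warns about.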

Inferential Statistics

- Example (1988, the Steering Committee of the Physicians' Health Study Research Group)
- Question: Can aspirin reduce the risk of heart attack in humans?
- Sample: 22,071 male physicians between the ages of 40 and 84, randomly assigned to one of two groups. One group took an ordinary aspirin tablet every other day (headache or not). The other group took a placebo every other day; this group is the control group.
- Summary statistic: The rate of heart attacks in the group taking aspirin was only 55% of the rate of heart attacks in the placebo group.
- Inference to population: Taking aspirin causes a lower rate of heart attacks in humans.
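The summary statistic here is a ratio of two rates. A minimal sketch of that arithmetic follows; the counts are hypothetical round numbers chosen so that the ratio comes out to 55%, not the study's actual data, and the helper names `rate` and `rate_ratio` are illustrative.

```python
def rate(events: int, group_size: int) -> float:
    """Proportion of a group that experienced the event (e.g., a heart attack)."""
    return events / group_size


def rate_ratio(events_treated: int, n_treated: int,
               events_control: int, n_control: int) -> float:
    """Rate in the treatment group divided by the rate in the control group."""
    return rate(events_treated, n_treated) / rate(events_control, n_control)


# Hypothetical counts for illustration only (not the Physicians' Health Study data).
ratio = rate_ratio(events_treated=110, n_treated=10_000,
                   events_control=200, n_control=10_000)
print(f"treatment rate is {ratio:.0%} of the control rate")
```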

Sampling a Single Population

- Basics for sampling
- Sampling should not be biased: no individual in the population should be favored.
- Example of a biased sample
- Selecting goldfish from a particular store
- The selection of one individual in the population should not affect the selection of the next individual (independence).
- Example of a non-independent sample
- Choosing cards from a deck without replacement
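Exact fractions make the lack of independence concrete. This small Python sketch compares the chance that the second card is an ace, given that the first card was an ace, when drawing with and without replacement from a standard 52-card deck.

```python
from fractions import Fraction

# A standard deck has 52 cards, 4 of which are aces.
ACES, DECK = 4, 52

# Probability that the first card drawn is an ace.
p_first_ace = Fraction(ACES, DECK)

# Without replacement: after an ace is drawn, 51 cards and 3 aces remain,
# so the second draw depends on the first.
p_second_ace_given_first_ace = Fraction(ACES - 1, DECK - 1)

# With replacement: the first card goes back, so the deck is unchanged.
p_second_ace_with_replacement = Fraction(ACES, DECK)

print(p_first_ace, p_second_ace_given_first_ace, p_second_ace_with_replacement)
# prints: 1/13 1/17 1/13
```

Without replacement the chance drops from 1/13 to 1/17, so the first selection changes the second; with replacement it stays at 1/13.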

Sampling a Single Population

- Basics for sampling
- The sample should be large enough to adequately cover the population.
- Example of a small sample
- Suppose only 20 physicians were used in the aspirin study.
- Sampling should have the smallest variability possible.
- We know there is some error; we want to minimize it.

Sampling a Single Population

- Sampling Techniques
- Simple Random Sample (SRS): every member of the population has an equal chance of being selected; more precisely, every possible set of n individuals has an equal chance of being the sample chosen.
- Assign every individual a number and randomly select 30 numbers using a random number table (or computer-generated random numbers).
- Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Using a random number table, select 50 of them.
- Table B at the back of the book is a table of random digits.
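Computer-generated random numbers can stand in for the random number table. A minimal sketch, assuming a hypothetical list of ID numbers in place of real SSNs: Python's `random.sample` draws without replacement, and every subset of the requested size is equally likely, which is the SRS property.

```python
import random

# Hypothetical sampling frame: ID numbers 1..5000 stand in for the list of
# individuals over 65 (placeholder values, not real SSNs).
frame = list(range(1, 5001))

rng = random.Random(65)        # fixed seed so the draw can be reproduced
srs = rng.sample(frame, k=50)  # 50 distinct IDs; every 50-subset equally likely

print(sorted(srs)[:5])         # first few selected IDs
```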

Sampling a Single Population

- Sampling Techniques
- Stratified Random Sample: Divide the population into several strata. Then take an SRS from each stratum.

Sampling a Single Population

- Sampling Techniques
- Stratified Random Sample
- Advantage: Each stratum is guaranteed to be randomly sampled.
- Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Divide up the SSNs by region of the country (time zones). Then randomly sample 30 from each time zone.
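The stratified example can be sketched by keeping the frame grouped by stratum and drawing a separate SRS inside each group. This assumes hypothetical placeholder IDs and only the four contiguous-U.S. time zones, with made-up stratum sizes.

```python
import random

# Hypothetical frame grouped into strata (time zones); IDs and sizes are made up.
strata = {
    "Eastern": list(range(0, 4000)),
    "Central": list(range(4000, 7000)),
    "Mountain": list(range(7000, 8000)),
    "Pacific": list(range(8000, 10000)),
}

rng = random.Random(30)

# Take a separate SRS of 30 from every stratum.
stratified_sample = {zone: rng.sample(ids, k=30) for zone, ids in strata.items()}

for zone, ids in stratified_sample.items():
    print(zone, len(ids))
```

Because every stratum is sampled, even the smallest zone is guaranteed its 30 individuals, which a single SRS of 120 would not promise.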

Sampling a Single Population

- Sampling Techniques
- Cluster Sample: Divide the population into several groups, or clusters. Then take an SRS of clusters and use all the observations in each selected cluster.

Sampling a Single Population

- Sampling Techniques
- Cluster Sample
- Advantage: May be the only feasible method, given resources.
- Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Sort the SSNs by their last 4 digits, making each set of 100 a cluster. Use a random number table to pick the clusters. You may get the 4100s, 5600s, and 8200s, for example.
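A sketch of the cluster example, assuming hypothetical random 4-digit endings in place of real SSN digits: values are grouped into clusters of 100 (0000-0099, 0100-0199, and so on), an SRS of 3 clusters is drawn, and every observation in the chosen clusters is kept.

```python
import random
from collections import defaultdict

rng = random.Random(100)

# Hypothetical 4-digit endings standing in for the last 4 SSN digits.
endings = [rng.randrange(10_000) for _ in range(20_000)]

# Group into clusters of 100 consecutive endings; key 4100 means 4100-4199.
clusters = defaultdict(list)
for value in endings:
    clusters[value // 100 * 100].append(value)

# Take an SRS of 3 clusters, then keep every observation in those clusters.
chosen = rng.sample(sorted(clusters), k=3)
cluster_sample = [v for key in chosen for v in clusters[key]]

print(sorted(chosen), len(cluster_sample))
```

Unlike the stratified sketch, randomness enters only when picking clusters; inside a chosen cluster nothing is sampled.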

Sampling a Single Population

- Sampling Techniques
- Multi-Stage Sample: Divide the population into several strata. Randomly select a subset of the strata, then take an SRS within each selected stratum.

Sampling a Single Population

- Sampling Techniques
- Multi-Stage Sample
- Advantage: May be the only feasible method, given resources.
- Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Divide up the SSNs into the 50 states. Randomly select 10 states. Then randomly sample 40 individuals from each of the selected states.
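The two stages of the example can be sketched as two nested SRS draws. The state names and the 500 placeholder IDs per state are hypothetical; only the 50 / 10 / 40 counts come from the example.

```python
import random

rng = random.Random(50)

# Hypothetical frame: 50 states, each with 500 placeholder IDs.
frame = {
    f"state_{i:02d}": [f"state_{i:02d}-{j}" for j in range(500)]
    for i in range(1, 51)
}

# Stage 1: SRS of 10 states.
states = rng.sample(sorted(frame), k=10)

# Stage 2: SRS of 40 individuals within each selected state.
multistage_sample = {s: rng.sample(frame[s], k=40) for s in states}

print(len(states), sum(len(ids) for ids in multistage_sample.values()))
```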

Sampling a Single Population

- Sampling Problems
- Voluntary response
- Internet surveys
- Call-in surveys
- Convenience sampling
- Sampling friends
- Sampling at the mall
- Dishonesty
- Asking personal questions
- Not enough time to respond honestly

Sampling a Single Population

- Undercoverage: Some groups in the population are left out when the sample is taken.
- Nonresponse: An individual chosen for the sample can't be contacted or does not cooperate.
- Response Bias: Results are influenced by the behavior of the respondent or interviewer.
- For example, the wording of questions can influence the answers.
- Respondents may not want to give truthful answers to sensitive questions.

Sampling More than One Population

- We sample from more than one population when we are interested in more than one variable.
- As previously discussed, one variable is chosen to be the response variable and the other is selected as the explanatory variable.
- Examples
- Comparing decibel levels of 4 different brands of speakers
- Determining time to failure of 3 different types of lightbulbs
- Comparing GRE scores for students from 5 different majors

Sampling More than One Population

- Example 1: Comparing decibel levels of 4 different brands of speakers
- What is the explanatory variable?
- Brand
- What is the response variable?
- Decibel Level
- Number of Populations?
- Four
- Number of Samples needed?
- Four

Sampling More than One Population

- Example 2: Determining time to failure of 3 different types of lightbulbs
- What is the explanatory variable?
- Type
- What is the response variable?
- Time to Failure
- Number of Populations?
- Three
- Number of Samples needed?
- Three

Sampling More than One Population

- Example 3: Comparing GRE scores for students from 5 different majors
- What is the explanatory variable?
- Major
- What is the response variable?
- GRE score
- Number of Populations?
- Five
- Number of Samples needed?
- Five

Sampling More than One Population

- Important Considerations
- Each sample should represent its corresponding population well.
- Samples from more than one population should be as similar to each other as possible in every respect except the explanatory variable. Otherwise we may have confounding variables.
- Two variables are confounded if we cannot determine which one caused the differences in the response.

Sampling More than One Population

- Important Considerations
- Examples of Confounding
- Suppose we compared the decibel levels of the four different speaker brands, each with a different measuring instrument.
- We wouldn't know if the differences were due to the different brands or the different instruments.
- Brand and Instrument are then confounded.
- Suppose we compared the time to failure of the three different types of lightbulbs, each in a different light socket.
- We wouldn't know if the differences were due to the different types of lightbulbs or the different light sockets.
- Type and Socket are then confounded.

Sampling More than One Population

- Important Considerations
- Examples of Confounding
- Suppose we obtained GRE scores for each major, each from a different university.
- We wouldn't know if the differences were due to the different majors or the different universities.
- Major and University are then confounded.
- Confounding can be avoided by using good sampling techniques, which will be explained shortly.

Sampling More than One Population

- Important Considerations
- It is also possible that more than one (possibly several) explanatory variable influences a given response variable.
- Example
- Perhaps both the type of lightbulb and the type of light socket influence the time to failure of a lightbulb.
- It is likely that different types of lightbulbs work better in different sockets.
- This concept is known as interaction.
- Interaction: The differences in response across the levels of one variable change over the levels of another variable.

Sampling More than One Population

- Randomized Experiment
- The key to a randomized experiment: the treatment (explanatory variable) is randomly assigned to the experimental units or subjects.

[Diagram: Random Assignment, then Compare]

Sampling More than One Population

- Randomized Experiment
- Example: Suppose that before we test the effect of aspirin on the physicians, we wish to study the effect of aspirin on the heart rates of mice.
- We obtain a random sample of 100 mice.
- We randomly assign 50 mice to receive a placebo.
- The remaining 50 mice receive aspirin.
- After 20 days of administering the placebo and aspirin, we measure the heart rates and obtain summary statistics for comparison.
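One common way to carry out the random assignment is to shuffle the subjects and split the shuffled list in half. The sketch below assumes hypothetical mouse labels and a fixed seed; it performs only the assignment step, not the measurements.

```python
import random

rng = random.Random(20)

# Hypothetical labels for the 100 mice in the random sample.
mice = [f"mouse_{i:03d}" for i in range(1, 101)]

# Shuffle a copy, then split it: the first 50 get the placebo and the
# remaining 50 get aspirin, so chance alone decides each mouse's group.
order = mice[:]
rng.shuffle(order)
placebo, aspirin = order[:50], order[50:]

print(len(placebo), len(aspirin))
```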

Sampling More than One Population

- Randomized Experiment
- The single greatest advantage of a randomized experiment is that we can infer causation.
- Through random assignment to groups, all other factors tend to be balanced across the groups, which protects against confounding variables.
- Unfortunately (or perhaps fortunately), we cannot always use a randomized experiment.
- It is often impossible or unethical, particularly with humans.

Sampling More than One Population

- Observational Study
- We are forced to select samples from different pre-existing populations.

[Diagram: Simple Random Sample, then Compare]

Sampling More than One Population

- Observational Study
- Advantage: The data are much easier to obtain.
- Disadvantages
- We cannot say the explanatory variable caused the response.
- There may be lurking or confounding variables.
- Observational studies are better suited to describing the past than to predicting the future.
- Case-Control Study: A study in which cases having a particular condition are compared to controls who do not. The purpose is to find out whether one or more explanatory variables are related to a certain disease.
- Although you usually can't determine cause and effect, these studies are more efficient, and they can reduce the potential confounding variables.

Sampling More than One Population

- Observational Study
- Example 1: Suppose we are interested in comparing GRE scores for students in five different majors.
- We cannot do a randomized experiment because we cannot randomly assign individuals to a specific major. The individuals decide that for themselves.
- Thus, we observe students from 5 different pre-existing populations: the five majors.
- We obtain a random sample of size 15 from each of the five majors.
- We calculate statistics and compare the 5 groups.
- Can we say being in a specific major causes someone to get a higher GRE score?
- What are some possible confounding variables?
- How might we reduce the effect of these confounding variables?

Sampling More than One Population

- Observational Study
- Example 2: Suppose we are interested in finding out which age group talks the most on the telephone: 0-10 years, 10-20 years, 20-30 years, or 30-40 years.
- We cannot do a randomized experiment because we cannot randomly assign individuals to an age group.
- Thus, we observe (through polling or wire tapping) individuals from 4 different pre-existing populations: the four age groups.
- We obtain a random sample of size 25 from each of the four age groups.
- We calculate statistics and compare the 4 groups.
- Can we say being in a specific age group causes someone to talk more on the telephone?
- What are some possible confounding variables?
- How might we control these confounding variables?

Inference Overview

- Recall that inference is using statistics from a sample to talk about a population.
- We need some background in how we talk about populations and how we talk about samples.

Inference Overview

- Describing a Population
- It is common practice to use Greek letters when talking about a population.
- We call the mean of a population μ.
- We call the standard deviation of a population σ and the variance σ².
- When we are talking about percentages, we call the population proportion p (or π).
- It is important to know that for a given population there is only one true mean, one true standard deviation and variance, and one true proportion.
- There is a special name for these values: parameters.

Inference Overview

- Describing a Sample
- It is common practice to use Roman letters when talking about a sample.
- We call the mean of a sample x̄.
- We call the standard deviation of a sample s and the variance s².
- When we are talking about percentages, we call the sample proportion p̂.
- There are many different possible samples that could be taken from a given population. Each sample may have a different mean, standard deviation, variance, or proportion.
- There is a special name for these values: statistics.
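The parameter/statistic distinction can be illustrated by computing both from a simulated population. In this sketch the population (1,000 made-up IQ-like scores), the sample size of 40, and the cutoff of 115 used for the proportion are all assumptions. Note that `statistics.pstdev` divides by N (population formula) while `statistics.stdev` divides by n - 1, the usual convention for the sample standard deviation s.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 1,000 made-up IQ-like scores.
population = [rng.gauss(100, 15) for _ in range(1_000)]

# Parameters: one fixed value each for the whole population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)                    # divides by N
p = sum(x > 115 for x in population) / len(population)  # proportion above 115

# Statistics: computed from one sample; they change if the sample changes.
sample = rng.sample(population, k=40)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)                             # divides by n - 1
p_hat = sum(x > 115 for x in sample) / len(sample)

print(f"mu={mu:.1f} x_bar={x_bar:.1f} sigma={sigma:.1f} s={s:.1f} p={p:.2f} p_hat={p_hat:.2f}")
```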

Inference Overview

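- We use sample statistics to make inference about population parameters:
- The population mean μ is estimated by the sample mean x̄.
- The population standard deviation σ is estimated by the sample standard deviation s.
- The population proportion p (or π) is estimated by the sample proportion p̂.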

Sampling Variability

- There are many different samples that you can take from the population.
- Statistics can be computed on each sample.
- Since different members of the population are in each sample, the value of a statistic varies from sample to sample.

Sampling Distribution

- The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
- We can then examine the shape, center, and spread of the sampling distribution.
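Listing every possible sample is impractical, so a common workaround is to approximate the sampling distribution by simulation: draw many samples of the same size and record the statistic each time. The sketch below does this for a sample proportion; the population (30% ones), the sample size of 50, the 5,000 repetitions, and the helper name `sample_proportion` are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population of 0/1 indicators: 30% ones (e.g., "owns a goldfish").
population = [1] * 3_000 + [0] * 7_000

def sample_proportion(n: int) -> float:
    """Draw one SRS of size n and return its sample proportion p-hat."""
    return sum(rng.sample(population, k=n)) / n

# Approximate the sampling distribution of p-hat for n = 50 by repetition.
p_hats = [sample_proportion(50) for _ in range(5_000)]

center = statistics.fmean(p_hats)  # should land close to the true p = 0.30
spread = statistics.stdev(p_hats)  # variability of p-hat from sample to sample
print(f"center ~ {center:.3f}, spread ~ {spread:.3f}")
```

A histogram of `p_hats` would show the shape; `center` and `spread` summarize the other two features.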

Bias and Variability

- Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
- To reduce bias, use random sampling. The values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter.
- Variability is described by the spread of the sampling distribution.
- To reduce the variability of a statistic from an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample.
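Both ideas on this slide can be checked in a simulation: for SRSs of increasing size from one made-up population, the average of the sample means stays close to the population mean (little bias), while their standard deviation (variability) shrinks. The uniform population, the sample sizes, the 1,000 repetitions, and the helper name `simulate_means` are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population with a known mean, so bias can be checked directly.
population = [rng.uniform(0, 100) for _ in range(20_000)]
mu = statistics.fmean(population)

def simulate_means(n: int, reps: int = 1_000) -> list[float]:
    """Sample means from `reps` independent SRSs of size n."""
    return [statistics.fmean(rng.sample(population, k=n)) for _ in range(reps)]

results = {}
for n in (10, 100, 1000):
    means = simulate_means(n)
    bias = statistics.fmean(means) - mu  # stays near 0 for every n
    spread = statistics.stdev(means)     # shrinks as n grows
    results[n] = (bias, spread)
    print(f"n={n:5d}  bias ~ {bias:+.2f}  spread ~ {spread:.2f}")
```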

Bias and Variability