Chapter 12

- Sample Surveys
- Producing Valid Data
- If you don't believe in random sampling, the next time you have a blood test, tell the doctor to take it all.

The election of 1948: the predictions and the results (percent of the vote)

Candidate   Crossley   Gallup   Roper   Result
Truman      45         44       38      50
Dewey       50         50       53      45

Beyond the Data at Hand to the World at Large

- We have learned ways to display, describe, and summarize data, but have been limited to examining the particular batch of data we have.
- We'd like (and often need) to stretch beyond the data at hand to the world at large.
- Let's investigate three major ideas that will allow us to make this stretch.

3 Key Ideas That Enable Us to Make the Stretch

Idea 1: Examine a Part of the Whole

- The first idea is to draw a sample.
- We'd like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible.
- We settle for examining a smaller group of individuals (a sample) selected from the population.

Examples

- 1. Think about sampling something you are cooking: you taste (examine) a small part of what you're cooking to get an idea about the dish as a whole.
- 2. Opinion polls are examples of sample surveys, designed to ask questions of a small group of people in the hope of learning something about the entire population.

Sampling methods

- Convenience sampling: just ask whoever is around.
- Example: "man on the street" survey (cheap, convenient, often quite opinionated or emotional, so now very popular with TV journalism).
- Which men, and on which street?
- Ask about gun control or legalizing marijuana on the street in Berkeley or in some small town in Idaho and you would probably get totally different answers.
- Even within an area, answers would probably differ if you did the survey outside a high school or a country western bar.
- Bias: opinions are limited to the individuals present.

- Voluntary response sampling
- Individuals choose to be involved. These samples are very susceptible to bias because different people are motivated to respond or not. They are often called "public opinion polls," but they are not considered valid or scientific.
- Bias: the sample design systematically favors a particular outcome.

Ann Landers, summarizing responses of readers: 70% of (10,000) parents wrote in to say that having kids was not worth it; if they had to do it over again, they wouldn't.

Bias: most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.

CNN on-line surveys

Bias: people have to care enough about an issue to bother replying. This sample is probably a combination of people who hate wasting the taxpayers' money and animal lovers.

Example: hospital employee drug use

Administrators at a hospital are concerned about the possibility of drug abuse by people who work there. They decide to check on the extent of the problem by having a random sample of the employees undergo a drug test. The administrators randomly select a department (say, radiology) and test all the people who work in that department: doctors, nurses, technicians, clerks, custodians, etc.

- Why might this result in a biased sample?
- The department might not represent the full range of employee types, experiences, stress levels, or access to the hospital's drug supply.

Example (cont.)

- Name the kind of bias that might be present if the administration decides that instead of subjecting people to random testing they'll just
- a. interview employees about possible drug abuse.
- Response bias: people will feel threatened and won't answer truthfully.
- b. ask people to volunteer to be tested.
- Voluntary response bias: only those who are clean would volunteer.

Bias

- Bias is the bane of sampling, the one thing above all to avoid.
- There is usually no way to fix a biased sample and no way to salvage useful information from it.
- The best way to avoid bias is to select individuals for the sample at random.
- The value of deliberately introducing randomness is one of the great insights of Statistics: Idea 2.

Idea 2: Randomize

- Randomization can protect you against factors that you know are in the data.
- It can also help protect against factors you are not even aware of.
- Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about.
- Randomizing makes sure that on average the sample looks like the rest of the population.

Idea 2: Randomize (cont.)

Individuals are randomly selected. No one group should be over-represented.

Sampling randomly gets rid of bias.

Random samples rely on the absolute objectivity of random numbers. There are tables and books of random digits available for random sampling.

Statistical software can generate random digits (e.g., the Excel RAND() function, or the random-number button on a calculator).
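As a rough illustration of letting software do the random selection, here is a minimal Python sketch. The function name simple_random_sample and the list of 200 employee labels are made up for this example; only random.Random and its sample method come from Python's standard library.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals from the sampling frame, without replacement."""
    rng = random.Random(seed)      # a seed makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: labels for 200 hospital employees.
employees = [f"employee_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(employees, 10, seed=12)
print(sample)
```

Fixing the seed only makes the example repeatable; in a real survey you would record the seed, or leave it unset, rather than choose it to obtain a preferred sample.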

Idea 2: Randomize (cont.)

- Not only does randomizing protect us from bias, it actually makes it possible for us to draw inferences about the population when we see only a sample.

Hospital example (cont.)

- Listed in the table are the names of the 20 pharmacists on the hospital staff. Use the random numbers listed below to select three of them to be in the sample.
- 04905 83852 29350 91397 19994 65142 05087 11232

Idea 3: It's the Sample Size!!

- How large a random sample do we need for the sample to be reasonably representative of the population?
- It's the size of the sample, not the size of the population, that makes the difference in sampling.
- Exception: if the population is small enough and the sample is more than 10% of the whole population, the population size can matter.
- The fraction of the population that you've sampled doesn't matter. It's the sample size itself that's important.

Example

- i) In the city of Chicago, Illinois, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Chicago mayoral race.
- ii) In the state of Illinois, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Illinois governor's race.
- iii) In the United States, 1,000 likely voters are randomly selected and asked who they are going to vote for in the presidential election.
- Which survey has more accuracy?
- All the surveys have the same accuracy.
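One way to see why the three surveys are equally accurate is to compute an approximate 95% margin of error for a sample proportion, with and without the finite population correction. This is a sketch under stated assumptions: the formula is the standard large-sample one for a simple random sample, the function name margin_of_error is made up here, and the population figures are rough round numbers chosen only for illustration.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from an SRS of size n.

    If a population size is given, apply the finite population correction.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Rough, illustrative population sizes (assumptions, not survey data).
populations = {
    "Chicago": 2_700_000,
    "Illinois": 12_500_000,
    "United States": 330_000_000,
}

for place, size in populations.items():
    print(place, round(margin_of_error(1000, size), 4))

# The exception: a sample that is a large fraction of a small population.
print("3,000 people", round(margin_of_error(1000, 3000), 4))
```

With n = 1,000 the margin of error is about 3 percentage points for all three populations. It only shrinks noticeably in the last line, where the sample is a third of the population.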

Idea 3: It's the Sample Size!!

- Chicken soup
- Blood samples

Does a Census Make Sense?

- Why bother worrying about the sample size?
- Wouldn't it be better to just include everyone and sample the entire population?
- Such a special sample is called a census.

Does a Census Make Sense? (cont.)

- There are problems with taking a census:
- Practicality: it can be difficult to complete a census; there always seem to be some individuals who are hard to locate or hard to measure.
- Timeliness: populations rarely stand still. Even if you could take a census, the population changes while you work, so it's never possible to get a perfect measure.
- Expense: taking a census may be more complex than sampling.
- Accuracy: a census may not be as accurate as a good sample due to data entry error, inaccurate (made-up?) data, tedium.

Population versus sample

- Population: the entire group of individuals in which we are interested but can't usually assess directly.
- Example: all humans, all working-age people in California, all crickets.
- A parameter is a number describing a characteristic of the population.

- Sample: the part of the population we actually examine and for which we do have data.
- How well the sample represents the population depends on the sample design.
- A statistic is a number describing a characteristic of a sample.

Population

Sample

Sample Statistics Estimate Parameters

- Values of population parameters are unknown; in addition, they are unknowable.
- Example: the distribution of heights of adult females (at least 18 yrs of age) in the United States is approximately symmetric and mound-shaped with mean µ. µ is a population parameter whose value is unknown and unknowable.
- The heights of 1500 females are obtained from a sample of government records. The sample mean x̄ of the 1500 heights is calculated to be 64.5 inches.
- The sample mean x̄ is a sample statistic that we use to estimate the unknown population parameter µ.

We typically use Greek letters to denote parameters and Latin letters to denote statistics.

Various claims are often made for surveys. Why is each of the following claims not correct?

- It is always better to take a census than a sample.
- Timeliness, expense, complexity, accuracy.
- Stopping students on their way out of the cafeteria is a good way to sample if we want to know the quality of the food in the cafeteria.
- Bias: they chose to eat at the cafeteria.
- We drew a sample of 100 from the 3,000 students at a small college. To get the same level of precision for a town of 30,000 residents, we'll need a sample of 1,000 residents.
- It's the sample size, not the size of the population or the fraction of the population that we sample, that is important.

Survey claims (cont.)

- An internet poll taken at the web site www.statsisfun.org garnered 12,357 responses. The majority said they enjoy doing statistics homework. With a sample size that large, we can be pretty sure that most Statistics students feel this way, too.
- Voluntary response bias: the size of the sample does not remove the bias.
- The true percentage of all Statistics students who enjoy the homework is called a "population statistic."
- The true percentage is a population parameter.

Simple Random Sample

- A simple random sample (SRS) of size n consists of n units from the population chosen in such a way that every set of n units has an equal chance to be the sample actually selected.

Simple Random Samples (cont.)

- To select a sample at random, we first need to define where the sample will come from.
- The sampling frame is a list of individuals from which the sample is drawn.
- E.g., to select a random sample of students from a college, we might obtain a list of all registered full-time students.
- When defining the sampling frame, we must deal with the details of defining the population: are part-time students included? How about current study-abroad students?
- Once we have our sampling frame, the easiest way to choose an SRS is with random numbers.

Warning!

- If some members of the population are not included in the sampling frame, they cannot be part of the sample!! (e.g., using a telephone book as the sampling frame)
- Population: Wal-Mart shoppers
- Sampling frame?

Example: simple random sample

- An academic department wishes to randomly choose a 3-member committee from the 28 members of the department.

00 Abbott      07 Goodwin      14 Pillotte     21 Theobald
01 Cicirelli   08 Haglund      15 Raman        22 Vader
02 Crane       09 Johnson      16 Reimann      23 Wang
03 Dunsmore    10 Keegan       17 Rodriguez    24 Wieczoreck
04 Engle       11 Lechtenberg  18 Rowe         25 Williams
05 Fitzpatk    12 Martinez     19 Sommers      26 Wilson
06 Garcia      13 Nguyen       20 Stone        27 Zink

Solution

- Use a random number table: read 2-digit pairs until you have chosen 3 committee members, skipping pairs that match no label (28 to 99) and any repeats.
- For example, start in row 21:
- 76509 47069 86378 41797 11910 49672 88575
- Rodriguez (17), Lechtenberg (11), Engle (04)
- Your calculator generates random numbers; you can also generate random numbers using Excel.

Sampling Variability

- Suppose we had started in line 22?
- 19689 90332 04315 21358 97248 11188 39062
- Our sample would have been
- 19 Sommers, 03 Dunsmore, 04 Engle
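The table-reading procedure used for the committee example can be sketched in a few lines of Python. The function name select_by_digit_pairs is made up for this illustration; the digit rows and the labels 00 to 27 are the ones from the slides.

```python
def select_by_digit_pairs(digits, n_labels, k):
    """Read 2-digit pairs left to right and keep the first k distinct
    pairs that match a label in 00 .. n_labels-1."""
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - 1, 2):
        label = int(stream[i:i + 2])
        if label < n_labels and label not in chosen:   # skip non-labels and repeats
            chosen.append(label)
            if len(chosen) == k:
                break
    return chosen

row_21 = "76509 47069 86378 41797 11910 49672 88575"
row_22 = "19689 90332 04315 21358 97248 11188 39062"

print(select_by_digit_pairs(row_21, 28, 3))   # [17, 11, 4]
print(select_by_digit_pairs(row_22, 28, 3))   # [19, 3, 4]
```

Starting in a different row gives a different committee, which is exactly the sample-to-sample difference this slide is pointing at.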

Sampling Variability

- Samples drawn at random generally differ from one another.
- Each draw of random numbers selects different people for our sample.
- These differences lead to different values for the variables we measure.
- We call these sample-to-sample differences sampling variability.
- Variability is OK; bias is bad!!

Stratified Random Sampling

- This sampling procedure separates the population into mutually exclusive sets (strata), and then selects simple random samples from each stratum.

Stratified Random Sampling

- With this procedure we can acquire information about
- the whole population
- each stratum
- the relationships among strata.

Stratified Random Sampling

- There are several ways to build the stratified sample. For example, keep the proportion of each stratum in the population.

[Table not in transcript: a sample of size 1,000 is to be drawn, allocated across the strata in proportion to their sizes; total 1,000.]
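Proportional allocation can be sketched as follows. The strata names and sizes are hypothetical (the slide's original table did not survive in the transcript), and the function name proportional_allocation is made up for this example; the rounding step simply hands any leftover units to the strata with the largest fractional parts so the total stays at n.

```python
def proportional_allocation(strata_sizes, n):
    """Split a total sample of size n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}   # round down first
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical strata (age groups) and population counts.
strata = {"under_30": 25_000, "30_to_59": 45_000, "60_plus": 30_000}

allocation = proportional_allocation(strata, 1000)
print(allocation)   # {'under_30': 250, '30_to_59': 450, '60_plus': 300}
```

A simple random sample of the allocated size would then be drawn separately within each stratum.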

Cluster Sampling

- Sometimes stratifying isn't practical and simple random sampling is difficult.
- Splitting the population into similar parts or clusters can make sampling more practical.
- Then we could select one or a few clusters at random and perform a census within each cluster.
- This sampling design is called cluster sampling.
- If each cluster fairly represents the full population, cluster sampling will give us an unbiased sample.

Cluster Sampling Useful When

- it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure)
- e.g., all items sold in a grocery store
- the population members are widely dispersed geographically
- e.g., all Toyota dealerships in North Carolina

Mean length of sentences in our course text

- We would like to assess the reading level of our course text based on the length of the sentences.
- Simple random sampling would be awkward: number each sentence in the book?
- Better way:
- choose a few pages at random (the pages are the clusters, and it's reasonable to assume that each page is representative of the entire text).
- count the length of the sentences on those pages

Cluster sampling - not the same as stratified sampling!!

- We stratify to ensure that our sample represents different groups in the population, and sample randomly within each stratum.
- Clusters are more or less alike, each heterogeneous and resembling the overall population.
- We select clusters to make sampling more practical or affordable.
- We conduct a census on or select an SRS from each selected cluster.

Strata are homogeneous (e.g., male, female) but differ from one another.

Multistage Sampling

- Sometimes we use a variety of sampling methods together.
- Sampling schemes that combine several methods are called multistage samples.

Most surveys conducted by professional polling organizations and government agencies use some combination of stratified and cluster sampling as well as simple random sampling.

Mean length of sentences in our course text, cont.

- In attempting to assess the reading level of our course text
- we might worry that it starts out easy and gets harder as the concepts become more difficult
- we want to avoid samples that select too heavily from early or from late chapters
- Suppose our course text has 5 sections, with several chapters in each section.

Mean length of sentences in our course text, cont.

- We could
- i) randomly select 1 chapter from each section
- ii) randomly select a few pages from each of the selected chapters
- iii) if altogether this makes too many sentences, we could randomly select a few sentences from each page.
- So what is our sampling strategy?
- i) we stratify by section of the book
- ii) we randomly choose a chapter to represent each stratum (section)
- iii) within each chapter we randomly choose pages as clusters
- iv) finally, we choose an SRS of sentences within each cluster
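The four-step strategy for the course text can be sketched in Python. The book here is entirely synthetic: build_book invents 5 sections of 4 chapters, each with 20 pages of 12 sentence lengths, purely so the sampling code has something to run on. Both function names are made up for this illustration.

```python
import random

def build_book(rng, sections=5, chapters=4, pages=20, sentences=12):
    """Hypothetical text: book[section][chapter] is a list of pages,
    and each page is a list of sentence lengths (in words)."""
    return {
        s: {
            c: [[rng.randint(8, 20 + 2 * s) for _ in range(sentences)]
                for _ in range(pages)]
            for c in range(1, chapters + 1)
        }
        for s in range(1, sections + 1)
    }

def multistage_sample(book, rng, pages_per_chapter=3, sentences_per_page=4):
    lengths = []
    for section, chapters in book.items():                        # i) stratify by section
        chapter = rng.choice(sorted(chapters))                    # ii) one random chapter per section
        pages = rng.sample(chapters[chapter], pages_per_chapter)  # iii) pages as clusters
        for page in pages:
            lengths.extend(rng.sample(page, sentences_per_page))  # iv) SRS of sentences
    return lengths

rng = random.Random(7)
book = build_book(rng)
sample = multistage_sample(book, rng)
print(len(sample), sum(sample) / len(sample))
```

Because one chapter is drawn from every section, no sample can come entirely from the early or the late part of the book.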

Systematic Sampling

- Sometimes we draw a sample by selecting individuals systematically.
- For example, you might survey every 10th person on an alphabetical list of students.
- To make it random, you must still start the systematic selection from a randomly selected individual.
- When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample.
- Systematic sampling can be much less expensive than true random sampling.
- When you use a systematic sample, you need to justify the assumption that the systematic method is not associated with any of the measured variables.

Systematic Sampling: example

- You want to select a sample of 50 students from a college dormitory that houses 500 students.
- On a list of all students living in the dorm, number the students from 001 to 500.
- Generate a random number between 001 and 010, and start with that student.
- Every 10th student in the list becomes part of your sample.
- Questions
- 1) does each student have an equal chance to be in the sample?
- 2) what is the chance that a student is included in the sample?
- 3) is this an SRS?
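The three questions can be checked by listing every sample this procedure can produce. This is a minimal sketch; the function name systematic_sample is made up for the illustration, and students are represented by their list numbers 1 to 500.

```python
from math import comb

def systematic_sample(population_size, step, start):
    """Every step-th individual on the list, beginning at position start."""
    return list(range(start, population_size + 1, step))

# One possible sample for each random starting point 1..10.
possible_samples = [systematic_sample(500, 10, start) for start in range(1, 11)]

# Count how many of the possible samples contain each student.
appearances = {student: 0 for student in range(1, 501)}
for sample in possible_samples:
    for student in sample:
        appearances[student] += 1

print(all(len(s) == 50 for s in possible_samples))   # True
print(set(appearances.values()))                      # {1}
print(len(possible_samples), comb(500, 50) > len(possible_samples))
```

Each student appears in exactly one of the 10 equally likely samples, so every student has the same 1-in-10 chance of being selected. It is still not an SRS: only 10 of the many possible sets of 50 students can ever be chosen, and students 1 and 2, for instance, can never be sampled together.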

End of Chapter 12