Sample Surveys - PowerPoint PPT Presentation

About This Presentation
Title: Sample Surveys
Description: Sample Surveys: Producing Valid Data. "If you don't believe in random sampling, the next time you have a blood test tell the doctor to take it all."

Slides: 49
Provided by: Depto70
Learn more at: http://www.stat.ncsu.edu
Transcript and Presenter's Notes

Title: Sample Surveys


1
Chapter 12
  • Sample Surveys
  • Producing Valid Data
  • "If you don't believe in random sampling, the
    next time you have a blood test tell the doctor
    to take it all."

2
The election of 1948: the predictions and the results (% of the vote)

Candidate   Crossley   Gallup   Roper   Result
Truman         45        44       38      50
Dewey          50        50       53      45
3
Beyond the Data at Hand to the World at Large
  • We have learned ways to display, describe, and
    summarize data, but have been limited to
    examining the particular batch of data we have.
  • We'd like (and often need) to stretch beyond the
    data at hand to the world at large.
  • Let's investigate three major ideas that will
    allow us to make this stretch.

4
3 Key Ideas That Enable Us to Make the Stretch
5
Idea 1: Examine a Part of the Whole
  • The first idea is to draw a sample.
  • We'd like to know about an entire population of
    individuals, but examining all of them is usually
    impractical, if not impossible.
  • We settle for examining a smaller group of
    individuals (a sample) selected from the
    population.

6
Examples
  • 1. Think about sampling something you are
    cooking: you taste (examine) a small part of what
    you're cooking to get an idea about the dish as a
    whole.
  • 2. Opinion polls are examples of sample surveys,
    designed to ask questions of a small group of
    people in the hope of learning something about
    the entire population.

7

Sampling methods
  • Convenience sampling: Just ask whoever is around.
  • Example: "Man on the street" survey (cheap,
    convenient, often quite opinionated or emotional,
    and so now very popular with TV journalism)
  • Which men, and on which street?
  • Ask about gun control or legalizing marijuana on
    the street in Berkeley or in some small town in
    Idaho and you would probably get totally
    different answers.
  • Even within an area, answers would probably
    differ if you did the survey outside a high
    school or a country western bar.
  • Bias: Opinions limited to individuals present.

8
  • Voluntary Response Sampling
  • Individuals choose to be involved. These samples
    are very susceptible to being biased because
    different people are motivated to respond or not.
    Often called "public opinion polls." These are
    not considered valid or scientific.
  • Bias: Sample design systematically favors a
    particular outcome.

Ann Landers, summarizing responses of readers: 70%
of (10,000) parents wrote in to say that having
kids was not worth it; if they had to do it over
again, they wouldn't.
Bias: Most letters to newspapers are written by
disgruntled people. A random sample showed that
91% of parents WOULD have kids again.
9
CNN on-line surveys
Bias: People have to care enough about an issue
to bother replying. This sample is probably a
combination of people who hate wasting the
taxpayers' money and animal lovers.
10
Example: hospital employee drug use
Administrators at a hospital are concerned about
the possibility of drug abuse by people who work
there. They decide to check on the extent of the
problem by having a random sample of the
employees undergo a drug test. The
administrators randomly select a department (say,
radiology) and test all the people who work in
that department: doctors, nurses, technicians,
clerks, custodians, etc.
  • Why might this result in a biased sample?
  • The department might not represent the full range
    of employee types, experiences, stress levels, or
    the hospital's drug supply.

11
Example (cont.)
  • Name the kind of bias that might be present if
    the administration decides that instead of
    subjecting people to random testing they'll just
  • a. interview employees about possible drug abuse.
  • Response bias: people will feel threatened and won't
    answer truthfully.
  • b. ask people to volunteer to be tested.
  • Voluntary response bias: only those who are
    clean would volunteer.

12
Bias
  • Bias is the bane of sampling, the one thing above
    all to avoid.
  • There is usually no way to fix a biased sample
    and no way to salvage useful information from it.
  • The best way to avoid bias is to select
    individuals for the sample at random.
  • The value of deliberately introducing randomness
    is one of the great insights of Statistics: Idea
    2.

13
Idea 2: Randomize
  • Randomization can protect you against factors
    that you know are in the data.
  • It can also help protect against factors you are
    not even aware of.
  • Randomizing protects us from the influences of
    all the features of our population, even ones
    that we may not have thought about.
  • Randomizing makes sure that on the average the
    sample looks like the rest of the population

14
Idea 2: Randomize (cont.)
Individuals are randomly selected. No one group
should be over-represented.
Sampling randomly gets rid of bias.
Random samples rely on the absolute objectivity
of random numbers. There are tables and books of
random digits available for random sampling.
Statistical software can generate random digits
(e.g., Excel's RAND() function or the
random-number key on a calculator).
15
Idea 2: Randomize (cont.)
  • Not only does randomizing protect us from bias,
    it actually makes it possible for us to draw
    inferences about the population when we see only
    a sample.

16
Hospital example (cont.)
  • Listed in the table are the names of the 20
    pharmacists on the hospital staff. Use the random
    numbers listed below to select three of them to
    be in the sample.
  • 04905 83852 29350 91397 19994 65142 05087
    11232

17
Idea 3: It's the Sample Size!!
  • How large a random sample do we need for the
    sample to be reasonably representative of the
    population?
  • It's the size of the sample, not the size of the
    population, that makes the difference in
    sampling.
  • Exception: If the population is small enough and
    the sample is more than 10% of the whole
    population, the population size can matter.
  • Otherwise, the fraction of the population that
    you've sampled doesn't matter. It's the sample
    size itself that's important.

18
Example
  • i) In the city of Chicago, Illinois, 1,000 likely
    voters are randomly selected and asked who they
    are going to vote for in the Chicago mayoral
    race.
  • ii) In the state of Illinois, 1,000 likely voters
    are randomly selected and asked who they are
    going to vote for in the Illinois governor's
    race.
  • iii) In the United States, 1,000 likely voters
    are randomly selected and asked who they are
    going to vote for in the presidential election.
  • Which survey has more accuracy?
  • All the surveys have the same accuracy
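
A rough way to check this claim is to compute the approximate 95% margin of error for a sample proportion, including the finite population correction. The sketch below is only an illustration: the function name and the population counts are hypothetical round numbers, not figures from the slides, and the formula assumes a simple random sample with the worst-case proportion p = 0.5.

```python
import math


def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion from an SRS.

    The finite population correction (fpc) only matters when the sample
    is a sizable fraction of the population.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc


# Hypothetical round numbers of likely voters, for illustration only.
populations = {"city": 1_000_000, "state": 5_000_000, "country": 150_000_000}

for name, size in populations.items():
    # Same sample size (1,000) for every population.
    print(name, round(margin_of_error(1000, size), 4))
```

Under these assumptions all three margins come out at roughly 3 percentage points, because the correction factor is almost exactly 1 whenever the sample is a tiny fraction of the population.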

19
Idea 3: It's the Sample Size!!
  • Chicken soup
  • Blood samples

20
Does a Census Make Sense?
  • Why bother worrying about the sample size?
  • Wouldn't it be better to just include everyone
    and sample the entire population?
  • Such a special sample is called a census.

21
Does a Census Make Sense? (cont.)
  • There are problems with taking a census:
  • Practicality: It can be difficult to complete a
    census. There always seem to be some individuals
    who are hard to locate or hard to measure.
  • Timeliness: Populations rarely stand still. Even
    if you could take a census, the population
    changes while you work, so it's never possible to
    get a perfect measure.
  • Expense: Taking a census may be more complex than
    sampling.
  • Accuracy: A census may not be as accurate as a
    good sample due to data entry error, inaccurate
    (made-up?) data, tedium.

22
Population versus sample
  • Population: The entire group of individuals in
    which we are interested but can't usually assess
    directly.
  • Example: All humans, all working-age people in
    California, all crickets
  • A parameter is a number describing a
    characteristic of the population.
  • Sample: The part of the population we actually
    examine and for which we do have data.
  • How well the sample represents the population
    depends on the sample design.
  • A statistic is a number describing a
    characteristic of a sample.

23
Sample Statistics Estimate Parameters
  • Values of population parameters are unknown; in
    addition, they are unknowable.
  • Example: The distribution of heights of adult
    females (at least 18 yrs of age) in the United
    States is approximately symmetric and
    mound-shaped with mean µ. µ is a population
    parameter whose value is unknown and unknowable.
  • The heights of 1500 females are obtained from a
    sample of government records. The sample mean x̄
    of the 1500 heights is calculated to be 64.5
    inches.
  • The sample mean x̄ is a sample statistic that we
    use to estimate the unknown population parameter µ.

24
We typically use Greek letters to denote
parameters and Latin letters to denote statistics.
25
Various claims are often made for surveys. Why
is each of the following claims incorrect?
  • It is always better to take a census than a
    sample
  • Timeliness, expense, complexity, accuracy
  • Stopping students on their way out of the
    cafeteria is a good way to sample if we want to
    know the quality of the food in the cafeteria.
  • Bias: they chose to eat at the cafeteria.
  • We drew a sample of 100 from the 3,000 students
    at a small college. To get the same level of
    precision for a town of 30,000 residents, we'll
    need a sample of 1,000 residents.
  • It's the sample size, not the size of the
    population or the fraction of the population that
    we sample, that is important.

26
Survey claims (cont.)
  • An internet poll taken at the web site
    www.statsisfun.org garnered 12,357 responses.
    The majority said they enjoy doing statistics
    homework. With a sample size that large, we can
    be pretty sure that most Statistics students feel
    this way, too.
  • Voluntary response bias: the size of the sample
    does not remove the bias.
  • The true percentage of all Statistics students
    who enjoy the homework is called a population
    statistic.
  • The true percentage is a population parameter
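
The point that a large voluntary-response sample stays biased can be illustrated with a small simulation. Everything in the sketch below is invented for illustration: a hypothetical population of 100,000 students in which exactly 30% enjoy the homework, and assumed reply probabilities of 0.30 for those who enjoy it and 0.05 for everyone else.

```python
import random


def simulate(seed=0):
    rng = random.Random(seed)

    # Hypothetical population: True means "enjoys the homework" (exactly 30%).
    population = [True] * 30_000 + [False] * 70_000

    # Voluntary response: each student decides whether to reply.
    # The reply probabilities are assumptions, not data from the slides.
    volunteers = [likes for likes in population
                  if rng.random() < (0.30 if likes else 0.05)]

    # A simple random sample of only 500 students, for comparison.
    srs = rng.sample(population, 500)

    voluntary_estimate = sum(volunteers) / len(volunteers)
    srs_estimate = sum(srs) / len(srs)
    return len(volunteers), voluntary_estimate, srs_estimate


n_volunteers, voluntary_estimate, srs_estimate = simulate()
print(n_volunteers, round(voluntary_estimate, 3), round(srs_estimate, 3))
```

With these assumed probabilities the voluntary poll collects thousands of replies yet lands far above the true 30%, while the much smaller random sample stays close to it.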

27
(No Transcript)
28
Simple Random Sample
  • A simple random sample (SRS) of size n consists
    of n units from the population chosen in such a
    way that every set of n units has an equal chance
    to be the sample actually selected.

29
Simple Random Samples (cont.)
  • To select a sample at random, we first need to
    define where the sample will come from.
  • The sampling frame is a list of individuals from
    which the sample is drawn.
  • E.g., To select a random sample of students from
    a college, we might obtain a list of all
    registered full-time students.
  • When defining the sampling frame, we must deal
    with the details of defining the population: are
    part-time students included? How about current
    study-abroad students?
  • Once we have our sampling frame, the easiest way
    to choose an SRS is with random numbers.

30
Warning!
  • If some members of the population are not
    included in the sampling frame, they cannot be
    part of the sample!! (e.g., using a telephone
    book as the sampling frame)
  • Population: Wal-Mart shoppers
  • Sampling frame?

31
Example: simple random sample
  • An academic department wishes to randomly choose
    a 3-member committee from the 28 members of the
    department.
    00 Abbott      07 Goodwin     14 Pillotte    21 Theobald
    01 Cicirelli   08 Haglund     15 Raman       22 Vader
    02 Crane       09 Johnson     16 Reimann     23 Wang
    03 Dunsmore    10 Keegan      17 Rodriguez   24 Wieczoreck
    04 Engle       11 Lechtenbg   18 Rowe        25 Williams
    05 Fitzpatk    12 Martinez    19 Sommers     26 Wilson
    06 Garcia      13 Nguyen      20 Stone       27 Zink

32
Solution
  • Use a random number table: read 2-digit pairs
    until you have chosen 3 committee members.
  • For example, start in row 21:
  • 76509 47069 86378 41797 11910 49672 88575
  • Rodriguez (17), Lechtenberg (11), Engle (04)
  • Your calculator generates random numbers; you can
    also generate random numbers using Excel.

33
Sampling Variability
  • Suppose we had started in line 22?
  • 19689 90332 04315 21358 97248 11188 39062
  • Our sample would have been:
  • 19 Sommers, 03 Dunsmore, 04 Engle
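
The table-reading procedure can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the roster is typed in from the committee example (with Lechtenberg spelled out), and pairs that fall outside 00 to 27 or repeat an earlier pick are skipped.

```python
def select_from_digits(digit_row, labels, sample_size):
    """Read two-digit pairs left to right, keeping valid, unused labels."""
    digits = digit_row.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = digits[i:i + 2]
        # Skip pairs with no matching person and pairs already chosen.
        if label in labels and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return [labels[label] for label in chosen]


names = [
    "Abbott", "Cicirelli", "Crane", "Dunsmore", "Engle", "Fitzpatk",
    "Garcia", "Goodwin", "Haglund", "Johnson", "Keegan", "Lechtenberg",
    "Martinez", "Nguyen", "Pillotte", "Raman", "Reimann", "Rodriguez",
    "Rowe", "Sommers", "Stone", "Theobald", "Vader", "Wang",
    "Wieczoreck", "Williams", "Wilson", "Zink",
]
# Labels 00 through 27, as in the roster.
labels = {f"{i:02d}": name for i, name in enumerate(names)}

row_21 = "76509 47069 86378 41797 11910 49672 88575"
row_22 = "19689 90332 04315 21358 97248 11188 39062"

print(select_from_digits(row_21, labels, 3))  # Rodriguez, Lechtenberg, Engle
print(select_from_digits(row_22, labels, 3))  # Sommers, Dunsmore, Engle
```

Starting from a different row gives a different committee, which is exactly the sample-to-sample difference discussed here.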

34
Sampling Variability
  • Samples drawn at random generally differ from one
    another.
  • Each draw of random numbers selects different
    people for our sample.
  • These differences lead to different values for
    the variables we measure.
  • We call these sample-to-sample differences
    sampling variability.
  • Variability is OK; bias is bad!!
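
A quick simulation can make sampling variability visible. The sketch below assumes a synthetic population of 10,000 heights (generated around 64.5 inches purely for illustration, not real data) and draws several simple random samples of 100 from it.

```python
import random
import statistics

rng = random.Random(12)

# Hypothetical population: 10,000 synthetic heights in inches.
population = [rng.gauss(64.5, 2.5) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sample_means(population, n, repeats, rng):
    """Mean of each of `repeats` independent simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]


means = sample_means(population, 100, 5, rng)

print("population mean:", round(population_mean, 2))
print("sample means:   ", [round(m, 2) for m in means])
```

Each sample mean differs a little from the others, but because the samples are random the means scatter around the population mean rather than drifting to one side.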

35
(No Transcript)
36
Stratified Random Sampling
  • This sampling procedure separates the population
    into mutually exclusive sets (strata), and then
    selects simple random samples from each stratum.

37
Stratified Random Sampling
  • With this procedure we can acquire information
    about
  • the whole population
  • each stratum
  • the relationships among strata.

38
Stratified Random Sampling
  • There are several ways to build the stratified
    sample. For example, keep the proportion of each
    stratum in the population.

[Table: a sample of size 1,000 is to be drawn, allocated across the strata in proportion to their share of the population; total 1,000]
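
Proportional allocation can be sketched as follows. The stratum names and sizes are hypothetical, since the original table is not in the transcript, and handing leftover units to the largest remainders is one possible rounding rule, not necessarily the one used on the slide.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Give any leftover units to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample from each stratum."""
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}


# Hypothetical strata: ID numbers grouped into three invented groups.
strata = {
    "group A": list(range(0, 4000)),
    "group B": list(range(4000, 7500)),
    "group C": list(range(7500, 10000)),
}
sample = stratified_sample(strata, 1000, random.Random(1))
print({name: len(members) for name, members in sample.items()})
```

Here the strata make up 40%, 35% and 25% of the population, so they receive 400, 350 and 250 of the 1,000 sample slots.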
39
Cluster Sampling
  • Sometimes stratifying isn't practical and simple
    random sampling is difficult.
  • Splitting the population into similar parts or
    clusters can make sampling more practical.
  • Then we could select one or a few clusters at
    random and perform a census within each cluster.
  • This sampling design is called cluster sampling.
  • If each cluster fairly represents the full
    population, cluster sampling will give us an
    unbiased sample.

40
Cluster Sampling Useful When
  • it is difficult and costly to develop a complete
    list of the population members (making it
    difficult to develop a simple random sampling
    procedure).
  • e.g., all items sold in a grocery store
  • the population members are widely dispersed
    geographically.
  • e.g., all Toyota dealerships in North Carolina
41
Mean length of sentences in our course text
  • We would like to assess the reading level of our
    course text based on the length of the sentences.
  • Simple random sampling would be awkward: number
    each sentence in the book?
  • Better way:
  • choose a few pages at random (the pages are the
    clusters, and it's reasonable to assume that each
    page is representative of the entire text).
  • count the length of the sentences on those pages

42
Cluster sampling - not the same as stratified
sampling!!
  • We stratify to ensure that our sample represents
    different groups in the population, and sample
    randomly within each stratum.
  • Clusters are more or less alike, each
    heterogeneous and resembling the overall
    population.
  • We select clusters to make sampling more
    practical or affordable.
  • We conduct a census on or select an SRS from each
    selected cluster.

Strata are homogeneous (e.g., male, female)
but differ from one another.
43
Multistage Sampling
  • Sometimes we use a variety of sampling methods
    together.
  • Sampling schemes that combine several methods are
    called multistage samples.

Most surveys conducted by professional polling
organizations and government agencies use some
combination of stratified and cluster sampling as
well as simple random sampling.
44
Mean length of sentences in our course text, cont.
  • In attempting to assess the reading level of our
    course text
  • we might worry that it starts out easy and gets
    harder as the concepts become more difficult
  • we want to avoid samples that select too heavily
    from early or from late chapters
  • Suppose our course text has 5 sections, with
    several chapters in each section.

45
Mean length of sentences in our course text, cont.
  • We could
  • i) randomly select 1 chapter from each section
  • ii) randomly select a few pages from each of the
    selected chapters
  • iii) if altogether this makes too many sentences,
    we could randomly select a few sentences from
    each page.
  • So what is our sampling strategy?
  • i) we stratify by section of the book
  • ii) we randomly choose a chapter to represent
    each stratum (section)
  • iii) within each chapter we randomly choose pages
    as clusters
  • iv) finally, we choose an SRS of sentences within
    each cluster
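
The four-step strategy above can be sketched as code. The book structure here is entirely synthetic (5 sections, 3 chapters per section, 20 pages per chapter, 15 sentences per page with random lengths), and the function and parameter names are hypothetical.

```python
import random


def multistage_sample(book, pages_per_chapter, sentences_per_page, rng):
    """book maps section -> chapter -> page -> list of sentence lengths."""
    lengths = []
    for section, chapters in book.items():
        # i) every section is a stratum, so each one is visited
        # ii) one randomly chosen chapter represents the section
        chapter = rng.choice(sorted(chapters))
        pages = chapters[chapter]
        # iii) randomly chosen pages act as clusters
        for page in rng.sample(sorted(pages), pages_per_chapter):
            sentences = pages[page]
            # iv) simple random sample of sentences within the page
            k = min(sentences_per_page, len(sentences))
            lengths.extend(rng.sample(sentences, k))
    return lengths


rng = random.Random(7)
# Synthetic book: sentence lengths (in words) are random placeholders.
book = {
    section: {
        chapter: {
            page: [rng.randint(5, 40) for _ in range(15)]
            for page in range(1, 21)
        }
        for chapter in range(1, 4)
    }
    for section in range(1, 6)
}

lengths = multistage_sample(book, pages_per_chapter=3, sentences_per_page=4, rng=rng)
mean_length = sum(lengths) / len(lengths)
print(len(lengths), round(mean_length, 1))
```

With these settings the sample contains 5 sections x 3 pages x 4 sentences = 60 sentence lengths, spread evenly across early and late parts of the book.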

46
Systematic Sampling
  • Sometimes we draw a sample by selecting
    individuals systematically.
  • For example, you might survey every 10th person
    on an alphabetical list of students.
  • To make it random, you must still start the
    systematic selection from a randomly selected
    individual.
  • When there is no reason to believe that the order
    of the list could be associated in any way with
    the responses sought, systematic sampling can
    give a representative sample.
  • Systematic sampling can be much less expensive
    than true random sampling.
  • When you use a systematic sample, you need to
    justify the assumption that the systematic method
    is not associated with any of the measured
    variables.

47
Systematic Sampling: Example
  • You want to select a sample of 50 students from a
    college dormitory that houses 500 students.
  • On a list of all students living in the dorm,
    number the students from 001 to 500.
  • Generate a random number between 001 and 010, and
    start with that student.
  • Every 10th student in the list becomes part of
    your sample.
  • Questions
  • 1) does each student have an equal chance to be
    in the sample?
  • 2) what is the chance that a student is included
    in the sample?
  • 3) is this an SRS?
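
One way to explore these questions is to write the procedure out and enumerate every sample it can produce. The sketch below assumes students are simply numbered 1 to 500; the function name is hypothetical.

```python
import random


def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` positions, then every step-th."""
    start = rng.randrange(step)
    return population[start::step]


students = list(range(1, 501))  # students numbered 001 to 500
sample = systematic_sample(students, 10, random.Random(3))

# Every sample this procedure can ever produce: one per starting position.
all_possible = [students[start::10] for start in range(10)]

print(len(sample), sample[:3])
print(len(all_possible))
```

Each student appears in exactly one of the 10 equally likely samples, so every student has a 1 in 10 chance of selection. Only 10 of the many possible sets of 50 students can ever be chosen, though, so not every set of 50 is equally likely and this is not an SRS.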

48
End of Chapter 12