Topic VI: Sampling Theory

Provided by: mona1

1
Topic VI: Sampling Theory and Sampling Methods
  • Concepts and Definitions
  • Sampling With and Without Replacement
  • Probability and Non-Probability Sampling
  • Types of Sampling Methods
  • Determining Sample Size

2
Concepts and Definitions
  • Population
  • Sampling Frame
  • Unit of Analysis
  • Sampling Units
  • Principal Information
  • Auxiliary Information
  • Sampling Error
  • Non-Sampling Error
  • Sampling Fraction
  • Bias

3
Population
  • This is a collection of all the units of a specified
    type defined over a given space or time
  • It is defined by
  • Content: this refers to who or what exactly are
    the subjects of interest. E.g. all persons aged
    18 and over
  • Units: this refers to how the subjects are
    grouped. E.g. within households
  • Extent: this refers to the spatial feature of
    the population. E.g. the subjects can only be
    living in Jamaica
  • Time: this refers to the period of time during
    which your subjects must possess the particulars
    named above. E.g. June to October 1998

4
Sampling Frame
  • This is a list of all the units in the target
    population from which the sample is to be chosen
  • A subset of subjects for a survey should only be
    taken from a sampling frame

5
Sampling Frames II
  • When conducting a national survey there are two
    types of sampling frames that can be used (note
    there are others)
  • Electoral registers: these list electors in each
    polling district by street, with streets in
    alphabetical order
  • Postcode-based frames: these use postcode sectors
    as their primary sampling unit

6
Sampling Frames III
  • Postal Sectors are determined from a Postcode
    Address File
  • Postal Areas (e.g. CT): 121
  • Postal Districts (e.g. CT2): 2,700
  • Postal Sectors (e.g. CT2 7): 8,900
  • Postcodes (e.g. CT2 7PE): 1.6 million
  • Delivery Points: 26 million

7
Sampling Frames IV
8
Sampling Frames - Notes
Electoral Register
  • not all adults are electors (excludes felons,
    citizens of neither the EU nor the Commonwealth, ...)
  • limited corrections
  • includes institutions (colleges)
  • coverage high but incomplete
  • compiled annually
  • compiled locally with a variety of software
  • in force until 18 months from data collection
Post Code Sectors
  • does not give names
  • no indication of household size
  • multi-occupancy indicator available
  • will contain some small business addresses
  • updated quarterly
  • national system
  • computerised format, so lower selection cost

9
Unit of Analysis
  • Sometimes referred to as Sampling Units
  • These are the items/units being investigated
  • E.g. individuals, households, hospitals

10
Sampling Units
  • This refers to the items/units selected for
    inclusion in the sample
  • Eg If John Brown was selected to be included in
    the sample then he is a sampling unit

11
Principal Information
  • This refers to information on the central
    variable of the study
  • Also known as principal variable or principal
    data
  • Eg. For a household budget survey, the principal
    variable would be considered to be expenditure on
    food

12
Auxiliary Information
  • This refers to any information other than
    the principal data
  • In the example on the previous slide

13
Sampling Error I
  • This is a measure of the departure of all the
    possible estimates of a probability sampling
    procedure from the population quantity being
    measured
  • In other words, it refers to the difference
    between the estimate derived from a sample survey
    and the 'true' value that would result if the
    whole population was studied (using the same
    conditions)
  • It is sometimes loosely called the bias, although
    bias strictly refers to a systematic (average)
    departure from the true value

14
Sampling Error II
  • The standard error, variance and coefficient of
    variation (C.V.) are all measures of the sampling
    error
  • Note: a census does not have a sampling error
  • The sampling error is therefore equal to

Sampling Error = Sample Statistic - Population Parameter
15
Sampling Error III - Characteristics
  • The sampling error
  • generally decreases as the sample size increases
    (but not proportionally)
  • depends on the size of the population under study
  • depends on the variability of the characteristic
    of interest in the population
  • can be accounted for and reduced by an
    appropriate sample plan
  • can be measured and controlled in probability
    sample surveys
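
The first three characteristics can be illustrated with the usual standard error of a sample mean under simple random sampling. This is a minimal sketch: the function name is hypothetical, and the population standard deviation s is assumed to be known.

```python
import math

def standard_error_of_mean(s, n, N=None):
    """Standard error of a sample mean under simple random sampling.

    s: standard deviation of the characteristic in the population (assumed known)
    n: sample size
    N: population size; if given, the finite population correction is applied
    """
    se = s / math.sqrt(n)
    if N is not None:
        # Sampling without replacement from a finite population of N units
        se *= math.sqrt(1 - n / N)
    return se

# Quadrupling the sample size only halves the standard error
print(standard_error_of_mean(10, 100))   # 1.0
print(standard_error_of_mean(10, 400))   # 0.5

# Same n, but a small population: the correction shrinks the error further
print(standard_error_of_mean(10, 400, N=1000))
```

A larger s (more variability in the characteristic) raises the standard error in direct proportion, which is the third characteristic in the list.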

16
Sampling Error - IV
  • Although the Margin of Error and Sampling Error
    are sometimes used interchangeably they are TWO
    different concepts
  • Sampling Error
  • Margin of Error

17
Non-Sampling Errors
  • These are errors resulting from some imperfection
    in the research design that causes response error,
    or from a mistake in the execution of the research
  • Examples
  • sample bias
  • errors in recording responses
  • non-response
  • Also known as Systematic Errors

18
Sampling Fraction
  • This is the size of the sample as a proportion of
    the population from which it was drawn
  • It is equal to n/N
  • If n/N > 1 then the sampling must have been with
    replacement

19
Bias I
  • This means that results based on the sample do
    not (even on average) reflect the same answers as
    would come from a census
  • It is caused by both sampling and non-sampling
    factors

20
Bias II
21
Bias III
This is an adaptation of the diagram in Kish pg
519
22
Sampling with and without Replacement
  • Sampling with Replacement
  • Occurs when a unit sampled is placed back into
    the population
  • A particular unit can be included more than
    once in the sample
  • It is possible that n > N
  • Sampling without Replacement
  • Occurs when a unit sampled is not placed back
    into the population
  • A particular unit can only occur ONCE in the
    sample
  • Sampling without replacement from an infinite
    (or very large) population is, in effect,
    equivalent to sampling with replacement from a
    small population

23
Probability and Non-Probability Sampling I
  • Probability Samples
  • Aka Random Samples (though sample units are not
    chosen haphazardly)
  • The probabilities for selecting different samples
    are specified
  • For each unit of the population the probability
    of it appearing in any sample is known
  • It provides an estimate for the unknown
    population quantity
  • It also allows for the assessment of the standard
    error which can be used to obtain confidence
    intervals

24
  • There are 3 main steps involved in choosing a
    probability sample
  • Decide on the population of interest
  • Establish a sampling frame
  • Select units from the frame using a probabilistic
    algorithm

25
Probability and Non-Probability Sampling II
  • Non-Probability Sampling
  • This involves the selection of units by
    arbitrary methods
  • The probability of selection for each unit is
    unknown
  • It is dangerous to make inferences about the
    target population
  • It is often used to test aspects of a survey such
    as questionnaire design, processing systems etc.
    rather than make inferences about the target
    population

26
Probability and Non-Probability Sampling III
  • Choosing between the two types depends on
  • the objectives and scope of the survey
  • the method of data collection suitable to those
    objectives
  • the precision required of the results and whether
    that precision needs to be able to be measured
  • the availability of a sampling frame
  • the resources required to maintain the frame
  • the availability of extra information about the
    units in the population

27
Sampling Methods
Probability Sampling
  • Simple Random
  • Stratified
  • Systematic
  • Cluster
  • Multi-stage
Non-Probability Sampling
  • Purposive
  • Quota
  • Snowball
  • Convenience

28
Simple Random Sampling I
  • Each member of the population has the same
    probability of being a part of the sample
    independent of whether another subject is in the
    sample
  • AKA equal probability selection method
  • It is the simplest sampling method

29
Simple Random Sampling II
  • n units are selected from N possible units in the
    population
  • Every combination of n units is equally likely to
    be the sample selected
  • The selection process can either be
  • sampling without replacement, which is more
    common
  • unrestricted sampling / sampling with
    replacement

30
Simple Random Sampling III
  • It is important because it possesses simple
    mathematical properties which are useful for
    statistical theory and the computations are
    relatively easy
  • All other probability sampling methods are
    restrictions of SRS (usually where some
    combinations of population elements are
    suppressed)
  • For the mathematical properties to hold we must
    assume an infinite population

31
Simple Random Sampling IV
  • There are three main ways of choosing a simple
    random sample
  • Table of Random Numbers
  • Lottery Method
  • Computer Generated Numbers
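
The computer-generated option can be sketched with Python's standard `random` module. The frame of 500 numbered units, the sample size and the seed are illustrative assumptions, not part of the original slides.

```python
import random

rng = random.Random(2024)  # fixed seed so the draw can be reproduced

# Hypothetical sampling frame: unit identifiers 1..N
N = 500
frame = list(range(1, N + 1))
n = 20

# Without replacement: each unit can appear at most once,
# and every combination of n units is equally likely
srs_without = rng.sample(frame, n)

# With replacement (unrestricted sampling): a unit may be drawn repeatedly
srs_with = rng.choices(frame, k=n)

print(sorted(srs_without))
print(sorted(srs_with))
print("sampling fraction n/N =", n / N)
```

`random.sample` corresponds to the lottery method without returning tickets to the drum; `random.choices` corresponds to returning each ticket before the next draw.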

32
Stratified Random Sampling I
  • A population is subdivided or partitioned
  • Each subdivision is called a stratum
  • All the subdivisions are the strata
  • The idea is to ensure that the observations of
    the units of a stratum are closer to each other
    than to units of another stratum

33
Stratified Random Sampling II
  • SRS does not produce good results in cases where
    the population to be sampled contains easily
    recognisable subpopulations or strata
  • Strata do not overlap and any member of the
    population can belong to only ONE stratum

34
Stratified Random Sampling III
35
Stratified Random Sampling IV
  • Examples
  • Household income or expenditure surveys: urban /
    rural
  • Business surveys: employee size, production,
    sales, industrial classification
  • Agricultural surveys
  • Stratification depends on the purpose of the survey

36
Stratified Random Sampling V
  • Why Stratify?
  • In situations where there is foreknowledge of
    some non-homogeneity in the population,
    proportional stratified sampling ensures a
    representative sampling across the non-homogenous
    population
  • Used for administrative or scientific reasons,
    where each stratum needs to be reported
    separately
  • E.g. crop yield in each agro-climatic stratum,
    where the results for each stratum have their
    own meaning

37
Stratified Random Sampling VI
  • Why Stratify? Contd
  • Stratification has the advantage of
    administrative convenience
  • May be due to practical constraints of access to
    the population or cost
  • it may be easier to have each province/parish
    conduct the survey
  • Survey problems may be different in different
    strata
  • E.g. a financial survey of businesses may be done
    differently for small companies that are not
    required to pay a certain tax, in comparison with
    larger companies that are

38
Stratified Random Sampling VII
  • Why Stratify? Contd
  • Each stratum is more homogenous than the
    population when taken as a whole
  • stratified sample would provide relatively
    precise estimates within each stratum
  • yield more precise population estimates than if
    simple random sampling was used

39
Stratification The Procedure I
  • The population is divided into H strata
  • The strata do not overlap and are exhaustive
  • An SRS of size n_h is taken from each stratum with
    population N_h

40
Stratification The Procedure II
  • There are three types of Allocation
  • Equal Allocation
  • used when the main interest is to compare strata
    parameters
  • and/or the population is thought to have a
    homogeneous variance across strata (i.e. the
    stratum variances are similar)
  • The same number of elements is taken from each
    stratum

41
Stratification The Procedure III
  • Proportional Allocation
  • Used in cases where the sample is supposed to
    reflect the population with respect to the
    stratification variable
  • The number of units sampled within a given
    stratum is proportional to the size of the
    stratum
  • It is best to use proportional allocation in
    situations where the variances of each stratum
    are approximately equal

42
Stratification The Procedure IV
  • Optimal Allocation
  • Used in cases where the variances for the strata
    differ greatly
  • Also used when
  • primary interests are the estimates for the
    entire population
  • the variances are assumed to differ between
    strata
  • It produces estimates for the population mean or
    total with the lowest variance for a fixed total
    sample size, n
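
The three allocation rules can be compared in a short sketch. It assumes the stratum sizes N_h are known and, for optimal allocation, that the stratum standard deviations S_h are known too. Optimal allocation is taken here in its equal-cost form (often called Neyman allocation), with n_h proportional to N_h × S_h. The function names and figures are illustrative only.

```python
def equal_allocation(n, sizes):
    """Same number of units from every stratum."""
    H = len(sizes)
    return [n / H for _ in sizes]

def proportional_allocation(n, sizes):
    """n_h proportional to the stratum size N_h."""
    N = sum(sizes)
    return [n * Nh / N for Nh in sizes]

def neyman_allocation(n, sizes, sds):
    """n_h proportional to N_h * S_h (equal sampling cost per unit assumed)."""
    weights = [Nh * Sh for Nh, Sh in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes N_h and standard deviations S_h
sizes = [5000, 3000, 2000]
sds = [10.0, 20.0, 40.0]
n = 600

print(equal_allocation(n, sizes))
print(proportional_allocation(n, sizes))
print([round(x, 1) for x in neyman_allocation(n, sizes, sds)])
```

The results are fractional and would be rounded to whole units in practice. In this example the smallest stratum has the largest variance, so the optimal rule gives it more units than proportional allocation does.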

43
Stratification - Advantages
  • Estimates for each stratum can be evaluated
    separately
  • Differences among the strata can be evaluated
  • Totals, means and proportions can be estimated with
    high precision using appropriate weights
  • Savings in time and cost (convenience)

44
Stratification - Disadvantages
  • The proportion of the total population that
    belongs to each stratum needs to be known
  • It may be complex and time consuming

45
Systematic Random Sampling I
  • Most widely known method of selection
  • Simple to apply
  • Consists of taking every kth sampling unit after
    a random start
  • AKA Pseudo-Random selection
  • Often used jointly with stratification and with
    cluster sampling

46
Systematic Random Sampling II
  • The first element is based on random selection
    but subsequent elements are not
  • Procedure
  • The population is divided into k groups of size
    n = N/k each
  • One unit is chosen randomly from the first k
    units
  • Every kth unit following is included in the
    sample
  • It is possible that N/k is not an integer
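
The procedure can be sketched as follows, assuming the frame is an ordered list and k is the sampling interval. The function name and the frames are hypothetical.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Take every kth unit of an ordered frame after a random start."""
    start = rng.randrange(k)   # random start among the first k units
    return frame[start::k]     # then every kth unit that follows

# N = 500, k = 10: every start gives exactly 50 units
frame = list(range(1, 501))
sample = systematic_sample(frame, 10, random.Random(1))
print(len(sample), sample[:5])

# N = 503 is not a multiple of k, so the size depends on the start
uneven = systematic_sample(list(range(1, 504)), 10, random.Random(5))
print(len(uneven))
```

When N/k is not an integer the sample size varies by one unit depending on the random start, which is the situation noted in the last bullet.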

47
Systematic Random Sampling III
  • Examples
  • Agricultural Survey: selection of every 10th
    farm from 500 farms in an area (would produce 50
    farms)
  • Industrial Quality Control: every 30 minutes or
    every 10th batch
  • Marketing or Political Surveys: every 10th
    person passing a particular location
  • Surveys to supplement censuses
  • Large multistage surveys: samples are selected
    systematically at the different stages

48
Systematic Random Sampling - Advantages
  • Operationally convenient
  • Flexible
  • Convenient to use when the sampling frame is not
    available
  • It is spread out more evenly over the population
    so that it is more likely to produce a more
    representative sample

49
Systematic Random Sampling - Disadvantages
  • More precise than SRS when units within the
    sample are heterogeneous and imprecise when the
    units are homogeneous
  • Generally, it is not possible to obtain a suitable
    estimate of the variance of the estimator from
    a single sample; only an approximate variance can
    be calculated

50
Cluster Sampling I
  • Cluster sampling divides the population into
    groups, or clusters
  • A number of clusters are selected randomly to
    represent the population, and then all units
    within selected clusters are included in the
    sample
  • No units from non-selected clusters are included
    in the sample. They are represented by those from
    selected clusters
  • This differs from stratified sampling, where some
    units are selected from each group
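
A minimal sketch of one-stage cluster sampling, assuming a hypothetical population of eight villages with five households each. All names are made up for illustration.

```python
import random

# Hypothetical population: 8 clusters (villages) of 5 households each
clusters = {
    f"village_{i}": [f"v{i}_hh{j}" for j in range(1, 6)]
    for i in range(1, 9)
}

rng = random.Random(7)

# Select 3 clusters at random...
chosen = rng.sample(sorted(clusters), 3)

# ...and include every unit from the selected clusters, none from the rest
sample = [unit for name in chosen for unit in clusters[name]]

print(chosen)
print(len(sample))  # 15
```

A stratified design over the same villages would instead draw some households from each of the eight groups.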

51
Cluster Sampling II
  • The unit of selection contains more than one
    population element
  • Examples of possible clusters

52
Cluster Sampling III
  • Advantages
  • The cost per element is lower due to the lower
    cost of listing or of location. Cost is also
    lower because sampling is done within clusters
  • Because all the elements are in one cluster, it
    is convenient to reach each member
  • Disadvantages
  • Combining the variance from two separately
    homogenous clusters may cause the variance of the
    entire sample to be higher when compared with SRS
  • less accurate results are often obtained due to
    higher sampling error than for simple random
    sampling with the same sample size

53
Multi-Stage Sampling I
  • Multi-stage sampling is like cluster sampling
  • It involves selecting a sample within each chosen
    cluster, rather than including all units in the
    cluster
  • It is sometimes referred to as sub-sampling

54
Multi-Stage Sampling II
  • Multi-stage sampling involves selecting a sample
    in at least two stages
  • 1st stage, large groups or clusters are selected
  • These clusters are designed to contain more
    population units than are required for the final
    sample
  • 2nd stage, population units are chosen from
    selected clusters to derive a final sample
  • This is called TWO-STAGE SAMPLING
  • If more than two stages are used, the process of
    choosing population units within clusters
    continues until the final sample is achieved;
    this would be considered MULTI-STAGE SAMPLING
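
Two-stage sampling can be sketched as below, assuming simple random sampling is used at both stages (the slides do not require this). The function name and the data are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        m = min(n_per_cluster, len(units))  # cannot take more than exist
        sample[name] = rng.sample(units, m)
    return sample

# Hypothetical population: 10 districts of 40 households each
clusters = {
    f"district_{i}": [f"d{i}_hh{j}" for j in range(1, 41)]
    for i in range(1, 11)
}

result = two_stage_sample(clusters, n_clusters=4, n_per_cluster=5,
                          rng=random.Random(11))
for name, units in result.items():
    print(name, units)
```

Only the four selected districts need a household list, which is why the method remains workable without a complete frame.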

55
Multi-Stage Sampling - Advantages
  • Useful when there is no sampling frame
  • If sub-units within a selected unit give similar
    results, it is uneconomical to measure all the
    second stage units
  • Lists are prepared for a small portion of the
    total populations of second stage units so it is
    considered economical
  • No need for sampling procedures at each stage to
    be the same

56
Multi-Stage Sampling - Disadvantages
  • The sampling of compact clusters may present
    practical difficulties

57
Summary
58
Purposive/Judgemental Sampling
  • The sample is hand-picked
  • The researcher exercises deliberate subjective
    choice in drawing what he/she regards as a
    representative sample
  • Often used for case study research
  • It may also be used to eliminate anticipated
    sources of distortion

59
Quota Sampling
  • Participants are selected from certain subgroups
    in the population
  • In most cases, participants are chosen just
    before the interview begins although the aim is
    to be as random as possible
  • Usually used in market surveys and opinion polls
  • A proper statistical design is used to determine
    what numbers are needed for each subgroup

60
Snowball
  • Members of the sample name other persons who
    can be (and usually are) included in the sample
  • Used mainly for populations which do not have a
    proper or adequate sampling frame
  • Researcher identifies a few key participants who
    then identify other relevant participants

61
Convenience
  • Participants are selected because they are
    readily available
  • Considered to be the most unreliable method of
    sampling

62
Sampling Rare Populations I
  • Problems arise if there is no relevant, accurate
    sampling frame for a rare group
  • Special methods need to be used to estimate
  • Prevalence / incidence of occurrence
  • Characteristics of the population
  • Population Means, Totals etc
  • Examples
  • Medical Conditions
  • Social Conditions

63
Sampling Rare Populations II
  • There are 6 methods used to sample rare
    populations
  • Screening
  • Disproportionate Sampling
  • Multiplicity Sampling
  • Snowballing
  • Multiple Frames
  • Sequential Sampling

64
Screening I
  • This involves double or two phase sampling
  • Procedure
  • Survey general population
  • Identify potential members of the group
  • Detailed survey of the potential members

65
Screening II
  • Problems
  • High cost
  • Should apparent non-members also be sampled?
  • High costs can be cut by
  • Telephone interviews (which can be
    unrepresentative)
  • Postal questionnaires (which can have low
    response)
  • Sharing costs with other surveys
  • Sampling more intensively in clusters with a
    relatively high concentration of the rare group
    (e.g. sample a cluster only if the 1st selected
    element is a member of the rare population)

66
Disproportionate Sampling
  • Sampling more intensively in clusters with a
    relatively high concentration of the rare group
  • E.g. sample a cluster only if the 1st selected
    element is a member of the rare population
  • Gains are only high when stratum to be
    oversampled does have a high prevalence relative
    to other strata
  • Optimal allocation theory can be used to
    determine sampling fractions in each stratum

67
Multiplicity
  • Sample all close neighbours and / or relatives of
    selected sample members
  • May use proxy information to estimate prevalence

68
Snowballing
  • Creation of a sampling frame through other
    relevant contacts as suggested by group members

69
Multiple Frame
  • This involves using many sampling frames
  • Overlaps are dealt with by
  • merging the files
  • cleaning the data file
  • use of weights related to the probability of
    selection

70
Sequential Sampling
  • Continue sampling until a large enough sample of
    the rare population is achieved
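
A minimal sketch of the idea, assuming units are screened one at a time in random order and that a screening test for membership of the rare group exists. The function, the frame and the 2% prevalence are hypothetical.

```python
import random

def sequential_sample(frame, is_rare, target, rng):
    """Screen units in random order until `target` rare members are found."""
    order = frame[:]          # copy so the frame itself is not reordered
    rng.shuffle(order)
    found = []
    screened = 0
    for unit in order:
        screened += 1
        if is_rare(unit):
            found.append(unit)
            if len(found) == target:
                break
    return found, screened

def is_rare(unit):
    # Hypothetical screening rule giving a prevalence of 2%
    return unit % 50 == 0

frame = list(range(1, 10001))
found, screened = sequential_sample(frame, is_rare, 30, random.Random(3))
print(len(found), "rare members found after screening", screened, "units")
```

The number of units screened is not fixed in advance, so the cost of the survey is uncertain until sampling stops.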

71
Choice of Sample Size I
  • An increase in sample size leads to an increase
    in the precision of the sample mean as an
    estimator of the population mean
  • An increase in sample size leads typically to an
    increase in sampling costs

72
Choice of Sample Size II
  • The trade off between cost and precision is key
  • Sample too large waste of resources
  • Sample too small an estimator with inadequate
    precision
  • Choose either precision required OR maximum cost
    which can be expended and then choose sample size

73
Choice of Sample Size III
  • The main method presupposes that the population
    variance is known
  • In practice, most times the population variance
    is unknown
  • Usually the sample variance is used in place of
    the population variance, but at the planning
    stage no sample has yet been drawn
  • Solution: an estimate of the variance can be
    obtained using one of the following methods

74
Choice of Sample Size IV
  • From Pilot Studies
  • If the pilot study uses SRS then its results may
    give some indication of the value of the
    population variance
  • NB: a pilot study is limited to a certain part of
    the population, so the estimate of the variance
    will be biased
  • From Previous Surveys
  • Usually a study of a population with similar
    characteristics has previously been conducted
  • The measure of variability from earlier surveys
    can be used to estimate the variance of the
    population currently under study
  • NB: caution must be taken in using the
    information

75
Choice of Sample Size V
  • From a Preliminary Sample
  • Most reliable approach
  • May not be feasible because of administration or
    cost
  • A preliminary SRS is taken and used to estimate
    the population variance
  • Procedure
  • A preliminary sample of size n1 is chosen and used
    to estimate the population variance by the sample
    variance s1^2
  • If n1 is inadequate in producing the necessary
    precision, a further sample of (n - n1) units is
    chosen, using s1^2 as the preliminary estimate of
    the population variance
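
The two-step procedure can be sketched as below. It assumes the required total n comes from the common large-population rule n = Z^2 × s1^2 / C^2, where C is the acceptable half-width of the confidence interval. The slides do not state this rule at this point, so treat it, the function name and the data as assumptions.

```python
import math
import statistics

def additional_units_needed(preliminary, z, c):
    """Return (total n required, extra units beyond the preliminary sample)."""
    n1 = len(preliminary)
    s1_sq = statistics.variance(preliminary)   # sample variance s1^2
    n = math.ceil(z ** 2 * s1_sq / c ** 2)     # assumed sample size rule
    return n, max(n - n1, 0)

# Hypothetical preliminary SRS of n1 = 10 measurements
preliminary = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]

# 95% confidence (Z = 1.96), mean wanted to within C = 1 unit
n, extra = additional_units_needed(preliminary, z=1.96, c=1.0)
print("total required:", n, "additional units:", extra)
```

If the preliminary sample already meets the required size, no further units are needed and the second value is zero.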

76
Choice of Sample Size VI
  • From Practical Considerations of the Structure of
    the Population
  • It may be possible to determine what kind of
    distribution an event may have
  • The variance is estimated using the formulas for
    the specified distribution
  • E.g. if it is assumed that a specific event
    follows a Poisson distribution then we can assume
    that the mean and the variance are equal

77
Choice of Sample Size VII - Large Populations
Table 1
Source: Parker and Rea, Designing and Conducting
Research
78
Choice of Sample Size VIII - Small Populations
Table 2
Source: Parker and Rea, Designing and Conducting
Research
79
Choice of Sample Size IX - Interval Variables
  • Large Populations
  • Small Populations
  • N = population size
  • n = sample size
  • C = confidence interval (Z times std deviation)
  • Z = Z score for level of confidence
  • s = standard error for the distribution of sample
    means
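
The formulas for this slide are not present in the transcript. As a hedged illustration only, the sketch below uses a common textbook form: n0 = (Z × s / C)^2 for a large population, reduced by a finite population correction when N is small. Here s is assumed to be an estimate of the population standard deviation and C the acceptable half-width of the interval. These choices and the function name are assumptions and may differ from the presenter's formulas.

```python
import math

def sample_size_for_mean(z, s, c, N=None):
    """Sample size needed to estimate a mean to within +/- c.

    z: Z score for the chosen level of confidence
    s: estimated standard deviation of the variable (assumption)
    c: acceptable half-width of the confidence interval
    N: population size; if given, a finite population correction is applied
    """
    n0 = (z * s / c) ** 2                 # large-population sample size
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)      # correction for a small population
    return math.ceil(n0)

# 95% confidence, s = 15, mean wanted to within 2 units
print(sample_size_for_mean(1.96, 15, 2))           # 217
print(sample_size_for_mean(1.96, 15, 2, N=1000))   # 178
```

The correction matters only when the uncorrected sample would be a sizeable fraction of the population.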

80
Choice of Sample Size X
  • If interested in BOTH proportions and intervals
    then choose the higher sample size
  • Tables 1 and 2 should be adequate to cover sample
    sizes for both