Topic VI: Sampling Theory

Slides: 81

Provided by: mona1

1
Topic VI: Sampling Theory - Sampling Methods

Concepts & Definitions
Sampling With & Without Replacement
Probability & Non-Probability Sampling
Types of Sampling Methods
Determining Sample Size

2
Concepts & Definitions

Population
Sampling Frame
Unit of Analysis
Sampling Units
Principal Information

Auxiliary Information
Sampling Error
Non-Sampling Error
Sampling Fraction
Bias

3
Population

This is a collection of all the units of a specified
type defined over a given space or time
It is defined by:
Content: this refers to who or what exactly are
the subjects of interest. E.g. all persons aged
18 and over
Units: this refers to how the subjects are
grouped. E.g. within households
Extent: this refers to the spatial feature of
the population. E.g. the subjects can only be
living in Jamaica
Time: this refers to the period of time during
which your subjects must possess the particulars
named above. E.g. June to October 1998

4
Sampling Frame

This is a list of all the units in the target
population from which the sample is to be chosen
A subset of subjects for a survey should only be
taken from a sampling frame

5
Sampling Frames II

When conducting a national survey there are two
types of sampling frames that can be used (note
that there are others):
Electoral registers: these list electors in each
polling district by street, with streets in
alphabetical order
Postcode sectors, used as the primary sampling
unit

6
Sampling Frames III

Postal Sectors are determined from a Postcode
Address File
Level (example): approximate number
Postal Areas (CT): 121
Postal Districts (CT2): 2,700
Postal Sectors (CT2 7): 8,900
Postcodes (CT2 7PE): 1.6mn
Delivery Points: 26mn

7
Sampling Frames IV

8
Sampling Frames - Notes

Electoral Register
not all adults are electors (exclude felons, non
EU or Commonwealth, ...)
limited corrections
includes institutions (colleges)
coverage high but incomplete
compiled annually
compiled locally with a variety of software
in force until 18 months from data collection

Post Code Sectors
does not give names
no indication of household size
multi occupancy indicator available
will contain some small business addresses
updated quarterly
national system
computerised format, so lower selection cost

9
Unit of Analysis

Sometimes referred to as Sampling Units
These are the items/units being investigated
E.g. individuals, households, hospitals

10
Sampling Units

This refers to the items/units selected for
inclusion in the sample
E.g. if John Brown was selected to be included in
the sample, then he is a sampling unit

11
Principal Information

This refers to information on the central
variable of the study
Also known as principal variable or principal
data
Eg. For a household budget survey, the principal
variable would be considered to be expenditure on
food

12
Auxiliary Information

This refers to any information other than
the principal data
In the example on the previous slide

13
Sampling Error I

This is a measure of the departure of all the
possible estimates of a probability sampling
procedure from the population quantity being
measured
In other words, it refers to the difference
between the estimate derived from a sample survey
and the 'true' value that would result if the
whole population was studied (under the same
conditions)
It is sometimes loosely called the bias, although
bias strictly refers to a systematic departure
that persists even on average

14
Sampling Error II

The standard error, variance and coefficient of
variation (C.V.) are all measures of the sampling
error
Note: A census does not have a sampling error
The sampling error is therefore equal to the
difference between the Sample Statistic and the
Population Parameter

15
Sampling Error III - Characteristics

The sampling error
generally decreases as the sample size increases
(but not proportionally)
depends on the size of the population under study
depends on the variability of the characteristic
of interest in the population
can be accounted for and reduced by an
appropriate sample plan
can be measured and controlled in probability
sample surveys
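
The first three characteristics can be illustrated with the standard error of a sample mean under simple random sampling. This is only a sketch: the function names and the figures (s = 20, mean = 100) are hypothetical and not taken from the slides, and the finite population correction is the usual textbook one.

```python
import math

def standard_error(s, n, N=None):
    """Standard error of a sample mean under simple random sampling.

    s: sample standard deviation (variability of the characteristic)
    n: sample size
    N: population size; if given, the finite population correction is applied
    """
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(1 - n / N)  # finite population correction
    return se

def coefficient_of_variation(se, estimate):
    """Standard error expressed as a fraction of the estimate."""
    return se / estimate

# Hypothetical figures: s = 20, estimated mean = 100
se_100 = standard_error(20, 100)                 # n = 100
se_400 = standard_error(20, 400)                 # four times the sample, half the error
cv_100 = coefficient_of_variation(se_100, 100)
se_census = standard_error(20, 500, N=500)       # whole population sampled: no sampling error
```

Quadrupling the sample size only halves the standard error, which is why the decrease is not proportional to n.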

16
Sampling Error - IV

Although the Margin of Error and Sampling Error
are sometimes used interchangeably, they are TWO
different concepts
Sampling Error
Margin of Error
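
The slide's own definitions did not survive in the transcript. Under the common convention (an assumption here, not the presenter's wording), the margin of error is the half-width of a confidence interval: a critical value multiplied by the standard error. A minimal sketch with a hypothetical function name:

```python
from statistics import NormalDist

def margin_of_error(standard_error, confidence=0.95):
    """Half-width of a normal-approximation confidence interval."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * standard_error

moe_95 = margin_of_error(2.0)         # roughly 3.92
moe_99 = margin_of_error(2.0, 0.99)   # wider interval for higher confidence
```

The same standard error therefore gives different margins of error depending on the confidence level chosen.
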
17
Non-Sampling Errors

These are errors resulting from some imperfection
in the research design that causes response error,
or from a mistake in the execution of the research
Examples:
sample bias
errors in recording responses
nonresponses
Also known as Systematic Errors

18
Sampling Fraction

This is the size of the sample as a proportion of
the population from which it was drawn
It is equal to n/N
If n/N > 1 then the sampling was done with
replacement

19
Bias I

This means that results based on the sample do
not (even on average) reflect the same answers as
would come from a census
Bias is caused by both sampling and non-sampling
factors

20
Bias II

21
Bias III
This is an adaptation of the diagram in Kish, p.
519

22
Sampling With & Without Replacement

Sampling with Replacement
Occurs when a unit sampled is placed back into
the population
A particular unit can be included more than
once in the sample
It is possible that n > N
Sampling without Replacement
Occurs when a unit sampled is not placed back
into the population
A particular unit can only occur ONCE in the
sample
In some cases sampling without replacement from
an infinite population can be treated as
equivalent to sampling with replacement from a
small population, since in both cases the draws
are independent
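
The two schemes can be sketched with Python's standard `random` module. The population of ten numbered units and the fixed seed are hypothetical, chosen only to make the example repeatable.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 11))    # hypothetical population, N = 10

# Without replacement: a unit can appear at most once, so n cannot exceed N
without_replacement = rng.sample(population, k=5)

# With replacement: units go back after each draw, so repeats and n > N are possible
with_replacement = rng.choices(population, k=15)
```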

23
Probability & Non-Probability Sampling I

Probability Samples
Aka Random Samples (though sample units are not
chosen haphazardly)
The probabilities for selecting different samples
are specified
For each unit of the population the probability
of it appearing in any sample is known
They provide an estimate of the unknown
population quantity
They also allow for the assessment of the standard
error, which can be used to obtain confidence
intervals

There are 3 main steps involved in choosing a
probability sample
Decide on the population of interest
Establish a sampling frame
Select units from the frame using a probabilistic
algorithm

25
Probability & Non-Probability Sampling II

Non-Probability Sampling
This involves the selection of units by
arbitrary methods
The probability of selection for each unit is
unknown
It is dangerous to make inferences about the
target population
It is often used to test aspects of a survey such
as questionnaire design, processing systems etc.
rather than make inferences about the target
population

26
Probability & Non-Probability Sampling III

Choosing between the two types depends on
the objectives and scope of the survey
the method of data collection suitable to those
objectives
the precision required of the results and whether
that precision needs to be able to be measured
the availability of a sampling frame
the resources required to maintain the frame
the availability of extra information about the
units in the population

27
Sampling Methods
Probability Sampling
Simple Random
Stratified
Systematic
Cluster
Multi-stage

Non-Probability Sampling
Purposive
Quota
Snowball
Convenience

28
Simple Random Sampling I

Each member of the population has the same
probability of being a part of the sample
independent of whether another subject is in the
sample
AKA equal probability selection method
It is the simplest sampling method

29
Simple Random Sampling II

n units are selected from N possible units in the
population
Every combination of n units is equally likely to
be the sample selected
The selection process can either be
sampling without replacement, which is more
common
unrestricted sampling / sampling with
replacement
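
For a very small population, the statement that every combination of n units is equally likely can be checked by listing all the combinations. This is an illustrative sketch only; the five lettered units are hypothetical.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]   # hypothetical units, N = 5
n = 2

# Every possible sample of size n drawn without replacement
all_samples = list(combinations(population, n))
n_possible = comb(len(population), n)    # number of possible samples

# Share of the equally likely samples that contain unit "A": equals n/N
inclusion_prob = sum("A" in s for s in all_samples) / len(all_samples)

# Picking one combination uniformly at random is a simple random sample
rng = random.Random(0)
chosen = rng.choice(all_samples)
```

In practice the combinations are never listed; drawing n units without replacement with equal chances at each draw gives the same result.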

30
Simple Random Sampling III

It is important because it possesses simple
mathematical properties which are useful for
statistical theory and the computations are
relatively easy
All other probability sampling methods are
restrictions of SRS (usually where some
combinations of population elements are
suppressed)
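For the simplest mathematical properties to hold
exactly we must assume an infinite population (for
a finite population sampled without replacement, a
finite population correction is applied)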

31
Simple Random Sampling IV

There are three main ways of choosing a simple
random sample
Table of Random Numbers
Lottery Method
Computer Generated Numbers

32
Stratified Random Sampling I

A population is subdivided or partitioned
Each subdivision is called a stratum
All the subdivisions are the strata
The idea is to ensure that the observations of
the units of a stratum are closer to each other
than to units of another stratum

33
Stratified Random Sampling II

SRS does not produce good results in cases where
the population to be sampled contains easily
recognisable subpopulations or strata
Strata do not overlap and any member of the
population can belong to only ONE stratum

34
Stratified Random Sampling III

35
Stratified Random Sampling IV

Examples
Household income or expenditure surveys: urban /
rural
Business surveys: employee size, production,
sales, industrial classification
Agricultural surveys
Stratification depends on the purpose of the survey

36
Stratified Random Sampling V

Why Stratify?
In situations where there is foreknowledge of
some non-homogeneity in the population,
proportional stratified sampling ensures a
representative sample across the non-homogeneous
population
Used for administrative or scientific reasons,
where each stratum needs to be reported
separately
E.g. crop yield in each agro-climatic stratum,
where the results for each stratum will have their
own meaning

37
Stratified Random Sampling VI

Why Stratify? (Cont'd)
Stratification has the advantage of
administrative convenience
May be due to practical constraints of access to
the population or cost
It may be easier to have each province/parish
conduct the survey
Survey problems may be different in different
strata
E.g. a financial survey of businesses may be done
differently for small companies that are not
required to pay a certain tax in comparison with
larger companies that are

38
Stratified Random Sampling VII

Why Stratify? (Cont'd)
Each stratum is more homogeneous than the
population when taken as a whole, so a
stratified sample would
provide relatively precise estimates within each
stratum
yield more precise population estimates than if
simple random sampling was used

39
Stratification: The Procedure I

The population is divided into H strata
The strata do not overlap and are exhaustive
An SRS of size n_h is taken from each stratum h,
which has population size N_h
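
The procedure can be sketched as follows: draw an SRS in each stratum, then combine the stratum sample means using the population shares N_h / N as weights. The stratum names, values and sample sizes below are hypothetical.

```python
import random
from statistics import mean

def stratified_sample(strata, sizes, rng):
    """Draw an SRS (without replacement) of size n_h from each stratum."""
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

def stratified_mean(strata, samples):
    """Weight each stratum's sample mean by its population share N_h / N."""
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[h]) / N * mean(samples[h]) for h in strata)

# Hypothetical household expenditure values in two non-overlapping strata
strata = {
    "urban": [50, 52, 55, 58, 60, 61],   # N_h = 6
    "rural": [20, 22, 25, 27],           # N_h = 4
}
sizes = {"urban": 3, "rural": 2}         # n_h for each stratum

rng = random.Random(1)
samples = stratified_sample(strata, sizes, rng)
estimate = stratified_mean(strata, samples)
```

If every unit in every stratum is sampled, the weighted estimate reproduces the population mean exactly.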

40
Stratification: The Procedure II

There are three types of Allocation
Equal Allocation
used when the main interest is to compare strata
parameters
and/or the population is thought to have a
homogeneous variance within each stratum (i.e. the
variances are similar)
The same number of elements is taken from each
stratum

41
Stratification: The Procedure III

Proportional Allocation
Used in cases where the sample is supposed to
reflect the population with respect to the
stratification variable
The number of units sampled within a given
stratum is proportional to the size of the
stratum
It is best to use proportional allocation in
situations where the variances of each stratum
are approximately equal

42
Stratification: The Procedure IV

Optimal Allocation
Used in cases where the variances for the strata
differ greatly
Also used when
the primary interest is in the estimates for the
entire population
the strata are assumed to have unequal variances
It produces estimates for the population mean or
total with the lowest variance for a fixed total
sample size, n
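
The three allocation rules can be compared side by side. The sketch below assumes equal survey cost per unit in every stratum, in which case optimal allocation reduces to Neyman allocation (n_h proportional to N_h times S_h). The function names, stratum sizes and standard deviations are hypothetical, and the results would still need rounding to whole units.

```python
def equal_allocation(n, N_h):
    """Same sample size in every stratum."""
    return {h: n / len(N_h) for h in N_h}

def proportional_allocation(n, N_h):
    """n_h proportional to the stratum population size N_h."""
    N = sum(N_h.values())
    return {h: n * N_h[h] / N for h in N_h}

def neyman_allocation(n, N_h, S_h):
    """n_h proportional to N_h * S_h (optimal when unit costs are equal)."""
    total = sum(N_h[h] * S_h[h] for h in N_h)
    return {h: n * N_h[h] * S_h[h] / total for h in N_h}

# Hypothetical business survey: many small firms, few but highly variable large firms
N_h = {"small": 800, "large": 200}
S_h = {"small": 5.0, "large": 40.0}   # assumed stratum standard deviations

equal = equal_allocation(100, N_h)
proportional = proportional_allocation(100, N_h)
neyman = neyman_allocation(100, N_h, S_h)
```

With these figures the large-firm stratum gets 20 units under proportional allocation but about 67 under Neyman allocation, because its variance is much higher.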

43
Stratification - Advantages

Estimates for each stratum can be evaluated
separately
Differences among the strata can be evaluated
Totals, means and proportions can be estimated with
high precision using appropriate weights
Savings in time and cost (convenience)

44
Stratification - Disadvantages

The proportion of the total population that
belongs to each stratum needs to be known
It may be complex and time consuming

45
Systematic Random Sampling I

Most widely known method of selection
Simple to apply
Consists of taking every kth sampling unit after
a random start
AKA Pseudo-Random selection
Often used jointly with stratification and with
cluster sampling

46
Systematic Random Sampling II

The first element is based on random selection
but subsequent elements are not
Procedure
The population is divided into k groups of size
n = N/k each
One unit is chosen randomly from the first k
units
Every kth unit following is included in the
sample
It is possible that N/k is not an integer
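
The procedure can be sketched in a few lines. The frame of 500 numbered farms and the interval k = 10 mirror the agricultural example in this deck; the function name and seed are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Take every kth unit of the frame after a random start."""
    start = rng.randrange(k)   # random position among the first k units
    return frame[start::k]     # then every kth unit that follows

frame = list(range(1, 501))    # 500 farms, numbered 1 to 500
sample = systematic_sample(frame, 10, random.Random(7))

# When N/k is not an integer the sample size depends on the random start
short_frame = list(range(25))
uneven = systematic_sample(short_frame, 10, random.Random(0))  # 2 or 3 units
```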

47
Systematic Random Sampling III

Examples
Agricultural Survey: selection of every 10th
farm from 500 farms in an area (would produce 50
farms)
Industrial Quality Control: every 30 minutes or
every 10th batch
Marketing or Political Surveys: every 10th
person passing a particular location
Surveys to supplement censuses
Large multistage surveys: samples are selected
systematically at the different stages

48
Systematic Random Sampling - Advantages

Operationally convenient
Flexible
Convenient to use when the sampling frame is not
available
It is spread out more evenly over the population
so that it is more likely to produce a more
representative sample

49
Systematic Random Sampling - Disadvantages

More precise than SRS when units within the
sample are heterogeneous and imprecise when the
units are homogeneous
Generally, it is not possible to gain suitable
estimates of the variance of the estimator from
one sample. Only an approximate variance can be
calculated

50
Cluster Sampling I

Cluster sampling divides the population into
groups, or clusters
A number of clusters are selected randomly to
represent the population, and then all units
within selected clusters are included in the
sample
No units from non-selected clusters are included
in the sample. They are represented by those from
selected clusters
This differs from stratified sampling, where some
units are selected from each group
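
A one-stage cluster sample can be sketched as: choose some clusters at random, then keep every unit inside them. The village names and unit labels are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng):
    """Select m clusters at random and include ALL units within them."""
    chosen = rng.sample(sorted(clusters), m)
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units

# Hypothetical population grouped into four villages (the clusters)
clusters = {
    "village_1": ["a1", "a2", "a3"],
    "village_2": ["b1", "b2"],
    "village_3": ["c1", "c2", "c3", "c4"],
    "village_4": ["d1", "d2"],
}
chosen, units = cluster_sample(clusters, 2, random.Random(3))
```

Units in the villages that were not chosen never enter the sample, whereas a stratified design would draw some units from every village.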

51
Cluster Sampling II

The unit of selection contains more than one
population element
Examples of possible clusters

52
Cluster Sampling III

Advantages
The cost per element is lower due to the lower
cost of listing or of location. Cost is also
lower because sampling is done within clusters
Because all sampled elements are in the same
cluster, each member is convenient to reach
Disadvantages
Combining the variance from two separately
homogeneous clusters may cause the variance of the
entire sample to be higher when compared with SRS
Less accurate results are often obtained due to
higher sampling error than for simple random
sampling with the same sample size

53
Multi-Stage Sampling I

Multi-stage sampling is like cluster sampling
It involves selecting a sample within each chosen
cluster, rather than including all units in the
cluster
It is sometimes referred to as sub-sampling

54
Multi-Stage Sampling II

Multi-stage sampling involves selecting a sample
in at least two stages
1st stage, large groups or clusters are selected
These clusters are designed to contain more
population units than are required for the final
sample
2nd stage, population units are chosen from
selected clusters to derive a final sample
This is called TWO-STAGE SAMPLING
If more than two stages are used, the process of
choosing population units within clusters
continues until the final sample is achieved.
This would be considered MULTI-STAGE SAMPLING
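
Two-stage sampling can be sketched as a first-stage draw of clusters followed by a second-stage SRS inside each selected cluster. The district and household labels, the sizes and the function name are hypothetical, and SRS is assumed at both stages.

```python
import random

def two_stage_sample(clusters, m, n_per_cluster, rng):
    """Stage 1: select m clusters. Stage 2: SRS of units within each one."""
    stage1 = rng.sample(sorted(clusters), m)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in stage1
    }

# Hypothetical frame: 6 districts with 20 households each
clusters = {
    f"district_{i}": [f"d{i}_hh{j}" for j in range(20)] for i in range(6)
}
sample = two_stage_sample(clusters, m=3, n_per_cluster=5, rng=random.Random(11))
```

Only the three selected districts need a household listing, which is where the saving in frame preparation comes from.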

55
Multi-Stage Sampling - Advantages

Useful when there is no sampling frame
If sub-units within a selected unit give similar
results, it is uneconomical to measure all the
second stage units
Lists are prepared for a small portion of the
total populations of second stage units so it is
considered economical
No need for sampling procedures at each stage to
be the same

56
Multi-Stage Sampling - Disadvantages

The sampling of compact clusters may present
practical difficulties

57
Summary

58
Purposive/Judgemental Sampling

The sample is hand-picked
The researcher exercises deliberate subjective
choice in drawing what he/she regards as a
representative sample
Often used for case study research
It may also be used to eliminate anticipated
sources of distortion

59
Quota Sampling

Participants are selected from certain subgroups
in the population
In most cases, participants are chosen just
before the interview begins although the aim is
to be as random as possible
Usually used in market surveys and opinion polls
A proper statistical design is used to determine
what numbers are needed for each subgroup

60
Snowball

Members of the sample name other persons who
can be (and usually are) included in the sample
Used mainly for populations which do not have a
proper or adequate sampling frame
Researcher identifies a few key participants who
then identify other relevant participants

61
Convenience

Participants are selected because they are
readily available
Considered to be the most unreliable method of
sampling

62
Sampling Rare Populations I

Problems arise if there is no relevant, accurate
sampling frame for a rare group
Special methods need to be used to estimate
Prevalence / incidence of occurrence
Characteristics of the population
Population Means, Totals etc
Examples
Medical Conditions
Social Conditions

63
Sampling Rare Populations II

There are 6 methods used to sample rare
populations
Screening
Disproportionate Sampling
Multiplicity Sampling
Snowballing
Multiple Frames
Sequential Sampling

64
Screening I

This involves double or two phase sampling
Procedure
Survey general population
Identify potential members of the group
Detailed survey of the potential members

65
Screening II

Problems
High cost
Should apparent non-members also be sampled?
High costs can be cut by
Telephone interviews (which can be
unrepresentative)
Postal questionnaires (which can have low
response)
Sharing costs with other surveys
Sampling more intensively in clusters with a
relatively high concentration of the rare group
(e.g. sample a cluster only if the 1st selected
element is a member of the rare population)

66
Disproportionate Sampling

Sampling more intensively in clusters with a
relatively high concentration of the rare group
E.g. sample a cluster only if the 1st selected
element is a member of the rare population
Gains are only high when stratum to be
oversampled does have a high prevalence relative
to other strata
Optimal allocation theory can be used to
determine sampling fractions in each stratum

67
Multiplicity

Sample all close neighbours and / or relatives of
selected sample members
May use proxy information to estimate prevalence

68
Snowballing

Creation of a sampling frame through other
relevant contacts as suggested by group members

69
Multiple Frame

This involves using many sampling frames
Overlaps are dealt with by
merging the files
cleaning the data file
use of weights related to the probability of
selection

70
Sequential Sampling

Continue sampling until a large enough sample of
the rare population is achieved
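
One way to picture sequential sampling is as a loop that keeps screening randomly ordered units until the target number of rare-group members is reached. This is a sketch under assumed conditions: the frame, the 2% prevalence rule and the function names are hypothetical.

```python
import random

def sequential_sample(frame, is_rare, target, rng):
    """Screen units in random order until `target` rare members are found."""
    order = frame[:]
    rng.shuffle(order)
    found, screened = [], 0
    for unit in order:
        screened += 1
        if is_rare(unit):
            found.append(unit)
            if len(found) == target:
                break
    return found, screened

def is_rare(unit):
    """Hypothetical membership rule: 1 unit in 50 belongs to the rare group."""
    return unit % 50 == 0

frame = list(range(1000))   # 20 rare members among 1000 units
found, screened = sequential_sample(frame, is_rare, 5, random.Random(5))
```

The number of units screened is not fixed in advance, which makes the cost of the survey harder to predict when the group is very rare.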

71
Choice of Sample Size I

An increase in sample size leads to an increase
in the precision of the sample mean as an
estimator of the population mean
An increase in sample size leads typically to an
increase in sampling costs

72
Choice of Sample Size II

The trade off between cost and precision is key
Sample too large waste of resources
Sample too small an estimator with inadequate
precision
Choose either precision required OR maximum cost
which can be expended and then choose sample size
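
Both routes to a sample size can be sketched briefly. The precision route uses the standard normal-approximation formula for a mean, n0 = (z * sigma / e)^2, with an optional finite population correction; the cost route simply divides the available budget by the cost per unit. Function names and figures are hypothetical, and sigma is assumed known.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95, N=None):
    """Smallest n giving the required margin of error for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = (z * sigma / margin) ** 2
    if N is not None:
        n0 = n0 / (1 + n0 / N)   # finite population correction
    return math.ceil(n0)

def sample_size_for_budget(budget, cost_per_unit, fixed_cost=0):
    """Largest n that the budget allows after fixed costs are paid."""
    return (budget - fixed_cost) // cost_per_unit

n_precision = sample_size_for_mean(sigma=20, margin=2)        # 385
n_small_pop = sample_size_for_mean(sigma=20, margin=2, N=1000)
n_budget = sample_size_for_budget(10000, 25, fixed_cost=2500)  # 300
```

If the budget allows fewer units than the precision target requires, one of the two requirements has to be relaxed.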

73
Choice of Sample Size III

The main method presupposes that the population
variance is known
In practice, most times the population variance
is unknown
Usually the sample variance is used to replace
the population variance, but at the planning
stage there is no sample yet
Solution: a value for the variance can be chosen
by using one of the following methods

74
Choice of Sample Size IV

From Pilot Studies
If the pilot study uses SRS then its results may
give some indication of the value of the
population variance
NB A pilot study is limited to a certain part of
the population so the estimate of the variance
will be biased
From Previous Surveys
Usually a study of similar characteristics in a
similar population has been previously conducted
The measure of variability from earlier surveys
can be used to estimate the variance of the
population currently under study
NB Caution must be taken in using the
information

75
Choice of Sample Size V

From a Preliminary Sample
Most reliable approach
May not be feasible because of administration or
cost
A preliminary SRS is taken and used to estimate
the population variance
Procedure
A preliminary sample of size n1 is chosen and used
to estimate the population variance by the sample
variance s1^2
If n1 is inadequate to produce the necessary
precision, another sample of (n - n1) units is
chosen, using s1^2 as the preliminary estimate of
the population variance
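
The two-step procedure can be sketched as follows, assuming the precision target is a margin of error for the mean at a given confidence level and using the normal approximation. The simulated population, the margin of 3 and the function name are hypothetical.

```python
import math
import random
from statistics import NormalDist, variance

def additional_units_needed(preliminary, margin, confidence=0.95):
    """Return (n, n - n1): total size required and the top-up still to draw."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    s1_sq = variance(preliminary)                  # sample variance s1^2
    n = math.ceil(z ** 2 * s1_sq / margin ** 2)    # size needed for the target
    n1 = len(preliminary)
    return n, max(n - n1, 0)

# Hypothetical population and a preliminary SRS of n1 = 30
rng = random.Random(2)
population = [rng.gauss(100, 15) for _ in range(5000)]
preliminary = rng.sample(population, 30)

n, extra = additional_units_needed(preliminary, margin=3)
```

If the preliminary sample already meets the target, the top-up is zero and no second sample is drawn.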

76
Choice of Sample Size VI

From Practical Considerations of the Structure of
the Population
It might be possible to determine what kind of
distribution an event may have
The variance is estimated using the formulas for
the specified distribution
E.g. if it is assumed that a specific event might
follow a Poisson distribution then we can assume
that the mean and the variance are equal

77
Choice of Sample Size VII - Large Populations
Table 1
Source: Parker & Rea, Designing and Conducting
Research

78
Choice of Sample Size VIII - Small Populations
Table 2
Source: Parker & Rea, Designing and Conducting
Research

79
Choice of Sample Size IX - Interval Variables