Populations and Sampling - PowerPoint PPT Presentation

Provided by: drinase

Transcript and Presenter's Notes

Title: Populations and Sampling

1
Lecture 2

Populations and Sampling
Types of variables and scales of measurement

2
Populations and Sampling: a. Reasons for using
samples

There are many good reasons for studying a sample
instead of an entire population:
Samples can be studied more quickly than
populations. Speed can be important if a
physician needs to determine something quickly,
such as a vaccine or treatment for a new disease.
A study of a sample is less expensive than a
study of an entire population because a smaller
number of items or subjects are examined. This
consideration is especially important in the
design of large studies that require a long
follow-up.
A study of the entire population is impossible
in most situations.
Sample results are often more accurate than
results based on a population, because more time
and resources can be spent on measuring each
subject carefully.
If samples are properly selected, probability
methods can be used to estimate the error in the
resulting statistics.

3
Types of Sampling Methods
Samples
Probability Samples: Simple Random, Systematic,
Stratified, Cluster
Non-Probability Samples: Consecutive, Judgmental,
Convenience
4
Sampling Methods: Non-probability samples

Depend on expert opinion.
Probabilities of selection are not considered.
Advantages include convenience, speed, and lower
cost.
Disadvantages:
Lack of accuracy,
lack of generalizability of results.

5
Sampling Methods: Non-probability samples (cont)

Consecutive sampling:
It involves taking every patient who meets the
selection criteria over a specified time interval
or until a specified number of patients is reached.
It is the best of the nonprobability techniques
and one that is very often practical.
Judgmental sampling:
It involves hand-picking from the accessible
population those individuals judged most
appropriate for the study.
Convenience sampling:
It is the process of taking those members of the
accessible population who are easily available.
It is widely used in clinical research because of
its obvious advantages in cost and logistics.

6
Probability Samples
Subjects of the sample are chosen based on known
probabilities: every element in the population of
interest has a known, nonzero probability of being
chosen for the sample (random selection). In the
simplest designs, this probability is the same for
every element in the population.
Probability Samples
Simple Random
Systematic
Stratified
Cluster
7
Advantages of Probability sampling methods

The population of interest is clear (because it
must be identified before sampling from it.)
Possible sources of bias are removed, such as
self-selection and interviewer selection effects.
The general size of the sampling error can be
estimated.
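
As a rough illustration of the last point, the sketch below estimates the sampling error of a mean from a simple random sample. The blood pressure values are simulated and purely hypothetical, and the formula s / sqrt(n) ignores the finite population correction, which is an assumption of this sketch rather than something stated in the slides.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the sample mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical population: 1,000 simulated systolic blood pressures (mmHg).
rng = random.Random(0)
population = [rng.gauss(120, 15) for _ in range(1000)]

# Simple random sample of 50 subjects, drawn without replacement.
sample = rng.sample(population, 50)
mean = statistics.mean(sample)
se = standard_error_of_mean(sample)
print(f"sample mean = {mean:.1f}, estimated standard error = {se:.1f}")
```

A smaller standard error means the sample mean is expected to lie closer to the population mean.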

8
Simple Random Sampling

Every individual or item from the target
population has an equal chance of being selected.
One may use a table of random numbers or a
computer program to obtain samples.

9
How to select a simple random sample

Define the population
Determine the desired sample size
List all members of the population or the
potential subjects
For example:
4th grade boys who have demonstrated problem
behaviors
Let's select 10

10
Potential Subject Pool (figure: numbered list of 30 potential subjects)
11
So our selected subjects are numbers 10, 22, 24,
15, 6, 1, 25, 11, 13, 16.
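
The selection step above can be sketched with a computer program instead of a random number table. This is a minimal example, assuming the pool is simply numbered 1 to 30; the function name and the seed are illustrative, and the numbers it draws will not match the ones on the slide.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, n)

# Assumed numbering of the potential subject pool: 1..30.
subject_ids = list(range(1, 31))
selected = simple_random_sample(subject_ids, 10, seed=42)
print(sorted(selected))
```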
12
Systematic Sampling

Decide on sample size: n
Divide the population of N individuals into groups
of k individuals: k = N/n
Randomly select one individual from the 1st
group.
Select every k-th individual thereafter.

Example: N = 64, n = 8, k = 8
First Group
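
The steps above can be sketched as follows for the N = 64, n = 8 case. This is a minimal sketch that assumes N is an exact multiple of n; the function name and the numbering of individuals are illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first group of k, then every k-th member."""
    N = len(population)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = N // n                                  # group size, k = N/n
    start = random.Random(seed).randrange(k)    # random position in the 1st group
    return population[start::k]                 # every k-th individual thereafter

# Assumed numbering of the population: individuals 1..64.
individuals = list(range(1, 65))
sample = systematic_sample(individuals, 8, seed=1)
print(sample)
```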
13
Systematic Sampling (cont)

Advantage: The sample usually will be easier to
identify than it would be if simple random
sampling were used.
Example: Selecting every 100th listing in a
telephone book after the first randomly selected
listing.

14
Stratified Random Sampling

The population is first divided into groups of
elements called strata.
Each element in the population belongs to one
and only one stratum.
Best results are obtained when the elements
within each stratum are as much alike as possible
(i.e. homogeneous group).
A simple random sample is taken from each
stratum.
Formulas are available for combining the stratum
sample results into one population parameter
estimate.
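
One common way to combine stratum results, shown here only as a sketch, is to weight each stratum's sample mean by that stratum's share of the population. The strata, values, and function names below are hypothetical and not taken from the slides.

```python
import random
import statistics

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(values, n_per_stratum[name])
            for name, values in strata.items()}

def stratified_mean(strata, samples):
    """Weight each stratum's sample mean by its share of the population."""
    total = sum(len(values) for values in strata.values())
    return sum(len(strata[name]) / total * statistics.mean(samples[name])
               for name in strata)

# Hypothetical strata: ages of clinic patients grouped by location.
data_rng = random.Random(0)
strata = {
    "urban": [data_rng.gauss(40, 5) for _ in range(300)],
    "rural": [data_rng.gauss(55, 5) for _ in range(100)],
}
samples = stratified_sample(strata, {"urban": 30, "rural": 10}, seed=1)
print(f"estimated population mean age = {stratified_mean(strata, samples):.1f}")
```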

15
Stratified Random Sampling (cont)

Advantage: If strata are homogeneous, this
method is as precise as simple random sampling
but with a smaller total sample size.
Example: The basis for forming the strata might
be sex, occupation, location, age, industry type,
etc.

16
Cluster Sampling

The population is first divided into separate
groups of elements called clusters.
Ideally, each cluster is a representative
small-scale version of the population (i.e.
heterogeneous group).
A simple random sample of the clusters is then
taken.
All elements within each sampled (chosen)
cluster form the sample.
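
A minimal sketch of the procedure above: clusters are sampled at random, and every element of each chosen cluster enters the sample. The city blocks and household labels are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; all their elements form the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    sample = [element for name in chosen for element in clusters[name]]
    return chosen, sample

# Hypothetical clusters: households grouped by city block.
blocks = {
    "block-1": ["h1", "h2", "h3"],
    "block-2": ["h4", "h5"],
    "block-3": ["h6", "h7", "h8", "h9"],
    "block-4": ["h10", "h11"],
}
chosen, sample = cluster_sample(blocks, 2, seed=5)
print(chosen, sample)
```

Because cluster sizes can differ, the total sample size is not fixed in advance in this sketch.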

17
Cluster Sampling (cont)

Advantage: The close proximity of elements can
be cost effective (i.e. many sample observations
can be obtained in a short time).
Disadvantage: This method generally requires a
larger total sample size than simple or
stratified random sampling.
Example: A primary application is area
sampling, where clusters are city blocks or other
well-defined areas.

18
Random . . .

Random Selection vs. Random Assignment
Random Selection: every member of the population
has an equal chance of being selected for the
sample.
Random Assignment: every member of the sample
(however chosen) has an equal chance of being
placed in the experimental group or the control
group.
Random assignment allows for individual
differences among test participants to be
averaged out.

19
Subject Selection (Random Selection)
Choosing which potential subjects will actually
participate in the study
20
Subject Assignment (Random Assignment)
Deciding which group or condition each subject
will be part of
Group B
Group A
21
Population: 200 8th Graders
Stratum               | Random Selection | Random Assignment
40 High IQ students   | 30 students      | Group A: 15 students, Group B: 15 students
120 Avg. IQ students  | 30 students      | Group A: 15 students, Group B: 15 students
40 Low IQ students    | 30 students      | Group A: 15 students, Group B: 15 students
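
The two-stage example on this slide can be sketched as follows: 30 students are randomly selected from each IQ stratum, then randomly split into Group A and Group B. The student labels, function name, and seed are hypothetical.

```python
import random

def select_and_assign(strata, n_select, seed=None):
    """Randomly select n_select members per stratum, then split them A/B."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for members in strata.values():
        selected = rng.sample(members, n_select)  # random selection
        rng.shuffle(selected)                     # random assignment order
        half = n_select // 2
        groups["A"].extend(selected[:half])
        groups["B"].extend(selected[half:])
    return groups

# Hypothetical labels for the 200 8th graders, grouped by stratum.
strata = {
    "high": [f"high-{i}" for i in range(1, 41)],
    "avg": [f"avg-{i}" for i in range(1, 121)],
    "low": [f"low-{i}" for i in range(1, 41)],
}
groups = select_and_assign(strata, 30, seed=7)
print(len(groups["A"]), len(groups["B"]))
```

Selection decides who takes part in the study; assignment decides which condition each participant receives.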
22
Randomization (Random assignment to two
treatments)

Randomization tends to produce study groups
comparable with respect to known and unknown risk
factors,
removes investigator bias in the allocation of
participants,
and guarantees that statistical tests will have
valid significance levels.
Trialists' most powerful weapon against bias.

23
Randomization (Cont)

Simple randomization: Toss a coin
AAABBAAAAABABABBAAAABAA
Random permuted blocks (Block Randomization)
AABB-ABBA-BBAA-BAAB-ABAB-AABB-

24
Block Randomization

Each block contains all conditions of the
experiment in a randomized order.

E, C, C, E
C, E, C, E
E, E, C, C
Control Group: N = 6
Experimental Group: N = 6
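
A minimal sketch of permuted-block randomization for the layout above: three blocks of four, each holding two experimental (E) and two control (C) slots in random order. The function name and parameters are illustrative, and the order it produces will differ from the slide.

```python
import random

def block_randomization(n_blocks, conditions=("E", "C"), per_condition=2, seed=None):
    """Build blocks that each contain every condition equally often, shuffled."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(conditions) * per_condition  # e.g. E, C, E, C
        rng.shuffle(block)                        # randomize order within block
        schedule.append(block)
    return schedule

schedule = block_randomization(3, seed=3)
for block in schedule:
    print(", ".join(block))
```

Because every block is balanced, the groups stay equal in size after each completed block (6 and 6 after three blocks), which simple coin tossing does not guarantee.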
25
Several ways to classify the variables

They may be defined as:
quantitative variables
qualitative (categorical) variables

26
Quantitative variables

Measured in the usual sense:
heights of adult males,
weights,
ages of patients seen in a clinic.
Measurements made on quantitative variables
convey information regarding amount.

27
Quantitative variables are either
Discrete:
only take values from some discrete set of
possible values (whole numbers),
e.g. number of patients admitted to the hospital
Continuous:
values from a continuous range of possible
values, although the recorded measurements are
rounded,
e.g. weight,
height,
hemoglobin levels, etc.

28
Qualitative (categorical) variables

Some characteristics are not capable of being
measured in the sense that height, weight, and
age are measured.
These characteristics are categorized only:
an ill person is given a medical diagnosis
(hepatitis, cancer, etc.),
a person is designated as belonging to an ethnic
group
(black, white, Hispanic, etc.).

29
Scales of measurement

Another way to classify the variables is to
assign numbers to the objects or events according
to a set of rules.
These rules are the scales of measurement.
They are commonly broken down into four types:
Nominal
Ordinal
Interval (numerical)
Ratio (numerical)

30
Nominal scale

Simplest level of measurement
Data values fit into categories.
No ordering:
it makes no sense to state that M > F
Arbitrary labels:
m/f, 0/1, etc.
Many classifications in medical research are
evaluated on a nominal scale:
Outcomes of a medical treatment occurring or not
occurring
Types of surgical procedures
Presence of possible risk or exposure factors.

31
Nominal scale (cont)

Dichotomous variables
take on only one of two values:
presence of pain (yes/no),
sex (male/female)
Nominal data can also take on more than two
values. Anemia, for example, may be classified as
microcytic anemia, including iron deficiency,
macrocytic or megaloblastic anemia, including
vitamin B12 deficiency,
normocytic anemia, often associated with chronic
disease.
A study examining the prognosis for patients with
lung cancer might sort the type of cancer into
several categories, such as
small cell,
large cell,
squamous cell.

32
Nominal scale (cont)

The easiest way to determine whether observations
are measured on a nominal scale is to ask whether
the observations are classified or placed into
categories.
Data evaluated on a nominal scale are also called
qualitative observations, because the values fit
into categories.
Nominal or qualitative data are generally
described in terms of percentages or proportions.
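
As a small illustration of describing nominal data with proportions, the sketch below counts how often each category occurs. The ten anemia classifications are invented for the example.

```python
from collections import Counter

def category_proportions(observations):
    """Return the proportion of observations falling in each category."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical classifications for ten patients with anemia.
anemia_types = [
    "microcytic", "normocytic", "microcytic", "macrocytic", "microcytic",
    "normocytic", "microcytic", "normocytic", "macrocytic", "microcytic",
]
for category, proportion in category_proportions(anemia_types).items():
    print(f"{category}: {proportion:.0%}")
```

Only counts and proportions are computed here; an average of category labels would have no meaning on a nominal scale.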

33
Ordinal scale

There is an inherent order among the categories.
Tumors, for example, are staged according to
their degree of development.
The international classification for staging of
carcinoma of the cervix is an ordinal scale from
0 to IV:
0: Carcinoma in situ (localized)
I: Cancer is confined to the cervix
II: Cancer extends to the upper third of the
vagina, or the tissue around the uterus, but not
the pelvic wall
III: The lower third of the vagina and/or the
pelvic sidewall and possibly the kidneys are
diseased
IV: Cancer has spread beyond the reproductive
tract involving the bladder or rectum, and has
invaded distant organs (most often the lungs or
liver), the bones, or other systems in the body

34
Ordinal scale (cont)

Stage IV is worse than stage 0 with respect to
prognosis.
This is an inherent order.
An important characteristic of ordinal scales is
that although order exists among categories, the
difference between two adjacent categories is not
necessarily the same throughout the scale.
To illustrate, consider Apgar scores, which
describe the condition of newborn infants on a
scale of 0 to 10:
lower scores indicate depression of
cardiorespiratory and neurologic functioning,
higher scores indicate good cardiorespiratory
and neurologic functioning.
The difference between a score of 8 and a score
of 10 is probably not of the same magnitude
as the difference between a score of 0 and a
score of 2.
As with nominal scales, percentages and
proportions are often used with ordinal scales.

35
Numerical scale

Observations in which the difference between
numbers has meaning are measured on a numerical
scale.
Also called quantitative observations, because
they measure the quantity of something.
There are two types of numerical scales:
A continuous scale has values on a continuum,
e.g. age
A discrete scale has values equal to integers,
e.g. number of fractures, number of admissions

36
Numerical scale (cont)

If only a certain level of precision is required,
continuous data may be reported to the closest
integer. The important point, however, is that
more precise measurement is possible, at least
theoretically.
For example, the age of a group of patients can
be any value between zero and the age of the
oldest patient, i.e. age can be specified as
precisely as necessary.
In studies of adults, age to the nearest year
will generally suffice.
For younger children, age to the nearest month is
better.
In infants, age to the nearest hour or even
minute may be appropriate, depending on the
purpose of the study.
Other examples of continuous data include height,
weight, length of survival time, range of joint
motion, and many laboratory values, such as
serum glucose, sodium, potassium, or uric acid.