Sampling Distributions - PowerPoint PPT Presentation

Sampling Distributions Stat 515 Lecture – PowerPoint PPT presentation

Slides: 32
Provided by: Eds149

1
Sampling Distributions
  • Stat 515 Lecture

2
Inching Towards Inference
  • Recall that one of our main goals is to make
    inference about the unknown parameters of the
    population or the distribution, such as the mean
    μ, the standard deviation σ, or some other
    summary measures such as the median, etc.
  • We now have possible models for the population,
    which are provided by the probability
    distributions (Binomial, Poisson, Normal,
    Uniform, others).
  • We also know how to compute sample statistics
    such as the sample mean and the sample standard
    deviation. These sample statistics will be used
    for making inference about the parameters.

3
Sampling as a Random Experiment
  • To understand the notion of a sampling
    distribution of a sample statistic, it is
    important to realize that the process of taking
    a sample from a population could be viewed as a
    random experiment.
  • To illustrate this idea, consider a population
    taking 3 values 2, 4, 5 according to the
    following probability distribution.
  • Probability function: p(2) = 0.4, p(4) = 0.5,
    p(5) = 0.1
  • You may imagine that 40% of all the values in
    the population equal 2, 50% equal 4, and 10%
    equal 5.

4
The Population
  • [Figure: the population, pictured as a
    collection of 2s, 4s, and 5s]

5
Characteristics of the Population
  • For this population, we have the parameters
  • μ = (2)(0.4) + (4)(0.5) + (5)(0.1) = 0.8 + 2 +
    0.5 = 3.3
  • σ² = (2 - 3.3)²(0.4) + (4 - 3.3)²(0.5) + (5 -
    3.3)²(0.1) = 1.21
  • σ = √1.21 = 1.1
  • Its shape is given by the bar graph below
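
The parameter calculations on this slide can be checked with a few lines of Python. This is only a sketch: the dictionary name `population` and the variable names are choices made here, not part of the lecture.

```python
# Population from the slides: value -> probability.
population = {2: 0.4, 4: 0.5, 5: 0.1}

# Mean: probability-weighted sum of the values.
mu = sum(x * p for x, p in population.items())

# Variance: probability-weighted squared deviation from the mean.
sigma2 = sum((x - mu) ** 2 * p for x, p in population.items())

# Standard deviation: square root of the variance.
sigma = sigma2 ** 0.5

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
```

The printed values should agree with the 3.3, 1.21, and 1.1 obtained by hand above.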

6
Possible Outcomes of Sampling Process
  • Now, consider the sampling process of taking n =
    2 observations (with replacement) from this
    population or distribution. Below is a table of
    possibilities.

7
Some Points about the Preceding Table
  • Since we are sampling with replacement, to obtain
    the probability of each possible sample, we
    simply multiply the probabilities of each of the
    observations (Think of a tree diagram!).
  • The 9 possible samples represent the elementary
    events of the experiment of taking a sample of
    size 2 from the population or distribution.
  • The sample mean (X̄) is obtained the usual way.
  • The sample variance is computed the usual way.
    For example, for the second sample, (2, 4), we
    have
  • S² = [(2 - 3)² + (4 - 3)²]/(2 - 1) = (1 + 1)/1 =
    2
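
The table of possible samples can be regenerated by enumeration. The sketch below assumes the samples are listed in the order (2, 2), (2, 4), (2, 5), (4, 2), and so on, which matches the slide's remark that the second sample is (2, 4). The row layout is a choice made here and may differ from the original table.

```python
from itertools import product
from statistics import mean, variance

population = {2: 0.4, 4: 0.5, 5: 0.1}

# Each row: (sample, probability, sample mean, sample variance).
table = []
for x1, x2 in product(population, repeat=2):
    # Sampling with replacement: the draws are independent,
    # so the sample's probability is the product of the two.
    prob = population[x1] * population[x2]
    sample = (x1, x2)
    # statistics.variance uses the n - 1 divisor, as on the slide.
    table.append((sample, prob, mean(sample), variance(sample)))

for sample, prob, xbar, s2 in table:
    print(f"{sample}  P = {prob:.2f}  mean = {xbar:.1f}  S^2 = {s2:.1f}")
```

The nine probabilities should sum to 1, and the row for (2, 4) should show a sample mean of 3 and a sample variance of 2.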

8
Sample Statistics as Random Variables
  • Since the sample mean and the sample variance are
    numerical characteristics of each of the possible
    samples, they can be viewed as random variables
    in this sampling experiment.
  • Therefore, we could obtain the probability
    distributions of the sample mean and sample
    variance.
  • These probability distributions are called
    sampling distributions.
  • Thus we will have the sampling distribution of
    the sample mean, as well as that of the sample
    variance.

9
Sampling Distribution of the Sample Mean
  • From the earlier table, we could construct the
    probability distribution of the sample mean, now
    called the sampling distribution of the sample
    mean.
  • This is given by the following table.
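
One way to build this sampling distribution is to add up the probabilities of all samples that share the same mean. The sketch below does that for n = 2. The name `sampling_dist` is introduced here for illustration.

```python
from collections import defaultdict
from itertools import product

population = {2: 0.4, 4: 0.5, 5: 0.1}

# Accumulate the probability of every sample that yields a given mean.
sampling_dist = defaultdict(float)
for x1, x2 in product(population, repeat=2):
    sampling_dist[(x1 + x2) / 2] += population[x1] * population[x2]

for xbar in sorted(sampling_dist):
    print(f"P(sample mean = {xbar}) = {sampling_dist[xbar]:.2f}")
```

Six distinct sample means arise (2, 3, 3.5, 4, 4.5, 5), and their probabilities sum to 1.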

10
Graph of the Sampling Distribution of the Sample
Mean
  • Note that it has become more concentrated near
    the population mean of 3.3, compared to the
    original distribution.

11
Parameters of the Sampling Distribution
  • Because the sampling distribution is just like
    any other probability distribution, we are also
    able to obtain its mean, variance, and standard
    deviation.
  • Thus, for the sampling distribution of the sample
    mean, we find the mean to be 3.3, which coincides
    with the original population mean.
  • The variance of the sampling distribution of the
    sample mean turns out to be 0.605, which equals
    (1.21)/2, the population variance divided by the
    sample size.
  • The standard deviation of the sample mean, now
    called the standard error (SE), is √0.605 ≈
    0.7778.
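
These three numbers can be verified directly from the enumeration of all samples of size 2. The following is a minimal sketch, with variable names chosen here.

```python
from itertools import product

population = {2: 0.4, 4: 0.5, 5: 0.1}
n = 2

# (sample mean, probability) for every ordered sample of size n.
outcomes = []
for sample in product(population, repeat=n):
    prob = 1.0
    for x in sample:
        prob *= population[x]
    outcomes.append((sum(sample) / n, prob))

# Mean, variance, and standard error of the sampling distribution.
mean_of_xbar = sum(xbar * p for xbar, p in outcomes)
var_of_xbar = sum((xbar - mean_of_xbar) ** 2 * p for xbar, p in outcomes)
se = var_of_xbar ** 0.5

print(f"mean = {mean_of_xbar:.4f}, variance = {var_of_xbar:.4f}, SE = {se:.4f}")
```

The output should reproduce the mean of 3.3, the variance of 0.605, and the SE of about 0.7778 quoted on the slide.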

12
Recapitulation
  • Sampling from a probability distribution or
    population could be viewed as a random
    experiment, and the elementary outcomes are the
    possible samples.
  • Sample statistics, such as the sample mean, could
    be viewed as random variables, and as such have
    their associated probability distributions, which
    are called sampling distributions.
  • The sampling distribution also has a mean.
  • And it also has a variance.
  • The standard deviation of the sampling
    distribution is called the standard error (SE).

13
Sampling Distribution of the Sample Mean
  • The mean of the sampling distribution of the
    sample mean equals the population mean.
  • The variance of the sampling distribution of the
    sample mean equals the population variance
    divided by the sample size.
  • These two characteristics are always true for the
    sampling distribution of the sample mean when
    sampling with replacement.

14
Obtaining Sampling Distributions
  • In the example considered, we obtained the
    sampling distribution of the sample mean by
    enumerating all the possible samples that could
    arise.
  • However, such a method is not feasible if the
    sample size is large. For instance, if n = 10,
    then there will be a total of (3)(3)···(3) = 3^10
    = 59,049 possible samples, and complete
    enumeration is no longer practical.
  • How do we obtain sampling distributions?

15
Some Methods for Obtaining Sampling Distributions
of Statistics
  • Complete enumeration, if possible.
  • Computer simulation or via the Monte Carlo
    method. In this method the computer generates
    many, many samples, and then constructs the
    probability histogram of the values of the
    statistic of interest. This will provide an
    empirical approximation.
  • Using theoretical results. For instance, when
    sampling from a Bernoulli population, the number
    of successes is binomially distributed.
  • Using theoretical approximations such as the
    Central Limit Theorem or the de Moivre
    approximation.

16
Illustrating the Monte Carlo Method
  • We illustrate the use of the simulation or Monte
    Carlo method by approximating the sampling
    distribution of the sample mean based on n = 10
    observations from the population considered
    earlier, which has
  • p(2) = 0.4, p(4) = 0.5, p(5) = 0.1
  • We generate 500 samples of size n = 10 from this
    population, and for each sample we compute the
    sample mean.
  • This simulation was done using Minitab.
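
The lecture's simulation was run in Minitab. As a stand-in, the same experiment can be sketched with Python's standard library. The seed and variable names are arbitrary choices made here, so the resulting numbers will not match the Minitab run exactly.

```python
import random
from statistics import mean, median, stdev

values, probs = [2, 4, 5], [0.4, 0.5, 0.1]
n, num_samples = 10, 500

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(515)

# Draw 500 samples of size 10 with replacement and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = rng.choices(values, weights=probs, k=n)
    sample_means.append(sum(sample) / n)

print(f"mean of sample means   = {mean(sample_means):.4f}")
print(f"median of sample means = {median(sample_means):.2f}")
print(f"SD of sample means     = {stdev(sample_means):.4f}")
```

A relative frequency histogram of `sample_means` then serves as the empirical approximation to the sampling distribution.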

17
First 10 of the 500 Generated Samples
  • The table below shows the first 10 samples of
    size n = 10 that were generated from the
    population.
  • Also included are their corresponding sample
    means.

18
Relative Frequency Histogram of the 500 Sample
Means

19
Points to Ponder
  • This relative frequency histogram of the
    simulated sample means serves as an approximation
    to the sampling distribution of the sample mean
    when n = 10 and when sampling from the given
    population.
  • Notice that the values of the sample means are
    now clustered around the population mean of 3.3,
    and furthermore, the histogram is almost
    bell-shaped.
  • The histogram also shows that the chance of
    getting a sample of size n = 10 whose sample mean
    is less than 2.5 or greater than 4.5 is rather
    small.

20
  • When the mean of the 500 sample means is
    computed, it turns out to be 3.3094. Their
    median is exactly 3.30!
  • Recall that the population mean is 3.30.
  • The standard deviation of the 500 sample means
    turns out to be 0.3497.
  • Recall that the population standard deviation is
    σ = √1.21 = 1.1, so σ/√n = 1.1/√10 ≈ 0.3479.

21
  • We therefore note that the mean of the simulated
    sample means is very close to the population
    mean, and
  • the standard deviation of the simulated sample
    means is also very close to the population
    standard deviation divided by the square root of
    the sample size.
  • Indeed, we always have the theoretical results:
    the sampling distribution of the sample mean has
    mean μ and standard error σ/√n.

22
An Important Result About the Sampling
Distribution of the Sample Mean
  • When the population being sampled is a normal
    population with mean μ and standard deviation σ,
    then the sampling distribution of the sample mean
    is also normal with mean μ and standard error of
    σ/√n, for any sample size n.
  • When the population is not normal, however, then
    the sampling distribution of the sample mean need
    not be normal. But we have the following result.

23
Central Limit Theorem
  • If a random sample of size n is taken from a
    population or distribution with mean μ and
    standard deviation σ, and if the sample size is
    large (n > 30), then the sampling distribution of
    the sample mean is approximately normal with mean
    μ and standard deviation (or standard error) of
    σ/√n. That is, Z = (X̄ - μ)/(σ/√n) is
    approximately N(0,1).

24
Uses of the Central Limit Theorem
  • Because of this approximation, when computing
    probabilities associated with the sample mean, we
    can use the approximation given below, which uses
    the standard normal distribution.
  • P(a ≤ X̄ ≤ b) ≈ P((a - μ)/(σ/√n) ≤ Z ≤ (b -
    μ)/(σ/√n))
  • Note: Z ~ N(0,1), the standard normal variable.
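
The standardization step can be wrapped in a small helper. This is a sketch only: the function name `clt_prob_between` is hypothetical, and it relies on `statistics.NormalDist` from the Python standard library for the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist


def clt_prob_between(a, b, mu, sigma, n):
    """Approximate P(a <= sample mean <= b) using the CLT."""
    se = sigma / sqrt(n)   # standard error of the sample mean
    z = NormalDist()       # standard normal, N(0, 1)
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)


# Sanity check with illustrative values: for a standard normal
# population and n = 1, the interval (-1.96, 1.96) has probability
# close to 0.95.
check = clt_prob_between(-1.96, 1.96, mu=0, sigma=1, n=1)
print(round(check, 3))
```

For one-sided probabilities, the same idea applies with a single call to the CDF.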

25
Applications of the CLT
  • Situation 1: Suppose we take a sample of size n =
    30 from the population described by the
    probability function p(2) = 0.4, p(4) = 0.5, p(5)
    = 0.1. This is the population we were using
    earlier.
  • Question 1: We seek the approximate probability
    that the sample mean is between 3.1 and 3.5.
  • Question 2: Find the approximate probability that
    the sample mean is less than 2.6.
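
Both questions can be worked with the CLT approximation, using the population values μ = 3.3 and σ = 1.1 found earlier. The sketch below is one way to do the arithmetic. The variable names are chosen here, and the answers are approximations, not exact probabilities.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.3, 1.1, 30

# CLT: the sample mean is approximately normal with mean mu
# and standard error sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Question 1: P(3.1 < sample mean < 3.5)
p_q1 = xbar_dist.cdf(3.5) - xbar_dist.cdf(3.1)

# Question 2: P(sample mean < 2.6)
p_q2 = xbar_dist.cdf(2.6)

print(f"Question 1: {p_q1:.4f}")
print(f"Question 2: {p_q2:.6f}")
```

Question 1 comes out at roughly 0.68, since 3.1 and 3.5 lie about one standard error on either side of the mean. Question 2 is a very small probability, since 2.6 is about 3.5 standard errors below the mean.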

26
Applications continued
  • Situation 2: The systolic blood pressure
    population data set has mean μ = 114.58 and
    standard deviation σ = 14.06. Its distribution is
    not normal, as it is right-skewed. Suppose we
    take a random sample of n = 50 people and obtain
    the sample mean of their systolic blood
    pressures.
  • Question 1: What is the approximate probability
    that this sample mean will exceed 120?
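
Because n = 50 is large, the CLT approximation applies even though the population is right-skewed. A minimal sketch of the calculation follows, with variable names chosen here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50

# Approximate sampling distribution of the sample mean (CLT).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Upper-tail probability: P(sample mean > 120).
p_exceeds_120 = 1 - xbar_dist.cdf(120)
print(f"P(sample mean > 120) = {p_exceeds_120:.4f}")
```

The value 120 lies about 2.7 standard errors above the mean, so the probability is small, on the order of 0.003.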

27
Continued ...
  • Question 2: What would be the value of A such
    that the probability that the sample mean of the
    systolic blood pressures of a sample of size 50
    is greater than A is 0.95?
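
This question runs the approximation in reverse: if P(X̄ > A) = 0.95, then A is the 5th percentile of the approximate sampling distribution. The sketch below uses `NormalDist.inv_cdf` from the standard library, with variable names chosen here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50

# Approximate sampling distribution of the sample mean (CLT).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# P(sample mean > A) = 0.95 means P(sample mean <= A) = 0.05,
# so A is the 5th percentile.
A = xbar_dist.inv_cdf(0.05)
print(f"A = {A:.2f}")
```

Equivalently, A = μ + z·σ/√n with z ≈ -1.645, which gives a value of about 111.3.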

28
Sampling a Bernoulli Population
  • A Bernoulli population is one where there are
    only two possible values or outcomes: a Success,
    denoted by the value X = 1, and a Failure,
    denoted by the value X = 0. The probability of a
    Success is denoted by p.
  • For such a population we have
  • Mean: μ = p
  • Variance: σ² = p(1 - p).
  • Consider now taking a sample of size n from this
    population and letting p̂ equal the proportion
    of successes in the sample. That is, p̂ = (number
    of successes in the sample)/n.

29
Sample Proportion
  • Because the Bernoulli observations are either 0
    or 1 (with 1 representing success), the sample
    proportion could be defined via p̂ = (X₁ + X₂ +
    ... + Xₙ)/n = X̄.

30
Sampling Distribution of the Sample Proportion
  • Since the sample proportion is the sample mean of
    the observations from a Bernoulli population, by
    the Central Limit Theorem, it follows that the
    sampling distribution of the sample proportion,
    when the sample size is large (that is, n > 30),
    is approximately normal with mean p and SE of
    √(p(1 - p)/n).
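
The normal approximation for the sample proportion can be packaged as a small helper. This is a sketch: the function name `sample_proportion_dist` is hypothetical, and the values p = 0.3 and n = 200 are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def sample_proportion_dist(p, n):
    """Normal approximation to the sampling distribution of p-hat."""
    # Mean p and standard error sqrt(p(1 - p)/n), as given by the CLT.
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Hypothetical values, chosen only to illustrate the call.
approx = sample_proportion_dist(p=0.3, n=200)
print(f"mean = {approx.mean:.3f}, SE = {approx.stdev:.4f}")
```

Note that the square root covers the whole ratio p(1 - p)/n, not just n.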

31
An Application
  • Situation: One of the ways most Americans
    relieve stress is to reward themselves with
    sweets. According to one study, 46% admit to
    overeating sweet foods when stressed. Suppose
    that the 46% figure is correct and we take a
    random sample of n = 100 Americans and ask
    them if they overeat sweets when they are
    stressed out.
  • Question 1: What is the probability that the
    proportion who overeat sweets in this sample
    exceeds 0.50?
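
With p = 0.46 and n = 100, the question can be answered approximately through the normal approximation for the sample proportion. The sketch below shows the arithmetic, with variable names chosen here. It gives an approximation, not the exact binomial probability.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 100

# Standard error of the sample proportion: sqrt(p(1 - p)/n).
se = sqrt(p * (1 - p) / n)

# Approximate sampling distribution of the sample proportion (CLT).
phat_dist = NormalDist(p, se)

# Upper-tail probability: P(sample proportion > 0.50).
p_exceeds_half = 1 - phat_dist.cdf(0.50)
print(f"SE = {se:.4f}, P(p-hat > 0.50) = {p_exceeds_half:.4f}")
```

The standard error is about 0.05, so 0.50 lies roughly 0.8 standard errors above 0.46 and the probability is about 0.21.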