1
Interval estimates of the mean for small n, σ
unknown. Estimates of sample size.
  • ASW, 8.2-8.4

Economics 224 Notes for October 15
2
Interval estimates using the t distribution
  • When a random sample of size n is drawn from a
    normally distributed population whose standard
    deviation σ is unknown, the sampling distribution
    of the sample mean has a t distribution.
  • The interval estimate has the same format as
    earlier, but with a t value replacing the Z
    value, and the sample standard deviation s
    replacing σ. The interval estimate of the
    population mean µ is
        \[ \bar{x} \pm t \, \frac{s}{\sqrt{n}} \]
  • A confidence level must be specified for each
    interval estimate; the confidence level and the
    sample size determine the t value.

3
t distribution
  • The shape of the t distribution is similar to
    that of the normal distribution: bell-shaped,
    peaked in the centre, symmetrical about the mean,
    and asymptotic to the horizontal axis.
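  • The table of the t distribution is a standardized
    t distribution, that is, the t values have mean 0
    and are measured in units of the estimated
    standard error. The standard deviation is somewhat
    greater than 1 for small df and approaches 1 as
    df increases.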
  • There are many different t distributions, one for
    each degree of freedom (see explanation on a
    later slide).
  • For small degrees of freedom, the t distribution
    is very dispersed. As the degrees of freedom
    increase, the t distribution becomes more
    concentrated around the mean.
  • The limiting distribution for the t distribution
    is the normal distribution. That is, as the
    degrees of freedom increase, the t distribution
    is approximated by the normal distribution.
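
The tail behaviour described above can be checked numerically. The following is a minimal sketch, not part of the original notes: it evaluates the standard formula for the t density (the helper name `t_pdf` is an illustrative choice) and compares the height of the density three units from the centre with that of the standard normal.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so that large df do not overflow the gamma function.
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# A larger density far from the centre means a more dispersed distribution.
for df in (2, 5, 30, 100, 1000):
    print(f"df = {df:>4}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"standard normal density at 3 = {normal.pdf(3.0):.5f}")
```

The printed densities should shrink toward the normal value as df grows, which is the sense in which the normal distribution is the limiting distribution.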

4
The concept of degrees of freedom (df)
  • Degrees of freedom refers to how many sample
    values can vary freely. In many statistical
    procedures, some sample values are constrained by
    the parameters to be estimated.
  • When a t distribution describes the sampling
    distribution of the mean, one degree of freedom
    is lost since s is used as an estimate of σ (ASW,
    304). In this case, the t distribution has n - 1
    degrees of freedom, where n is the sample size.
  • Degrees of freedom are used in the chi-square
    test and distribution and in regression and
    analysis of variance models. The type and number
    of constraints and degrees of freedom differ from
    model to model.

5
t table (ASW, 303 and Appendix B, Table 2)
  • In ASW, the t table gives areas in the upper tail
    of the distribution. The t distribution is
    symmetric about a mean of t = 0, so the same area
    for the lower tail is given by the negative of
    the t value in the table.
  • Each degree of freedom defines a different t
    distribution. For degrees of freedom above 100,
    use Z values from the standard normal
    distribution, since the normal distribution
    closely approximates the t distribution for large
    df.
  • t values are given in the body of the table, with
    areas under the curve given at the top of each
    column, and df at the start of each row.
  • The values in the table state the t value, or
    number of standard deviations, to the right of
    the centre of the distribution that is required
    to include all but the area in the right tail of
    the distribution.
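
For readers without the printed table, a table entry can be approximated numerically. This is a rough sketch under stated assumptions, not the method used to build the ASW table: it integrates the t density with Simpson's rule and finds the t value by bisection. The function names are illustrative, and the search range of 0 to 100 is an assumption that covers the usual table areas but not extreme cases such as df = 1 with a very small tail area.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail_area(t: float, df: int, steps: int = 2000) -> float:
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the area lies to the right of 0.
    return 0.5 - total * h / 3


def t_table_value(tail_area: float, df: int) -> float:
    """t value that leaves tail_area in the upper tail, found by bisection."""
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail_area:
            lo = mid  # too much area remains, so move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"df = 11, upper tail 0.05: t = {t_table_value(0.05, 11):.3f}")
print(f"df =  8, upper tail 0.01: t = {t_table_value(0.01, 8):.3f}")
```

The row is selected by df and the column by the upper-tail area, mirroring the layout of the printed table.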

6
General notation for interval estimates
  • The confidence coefficient is given the symbol
    (1 - α). α is the first letter of the Greek
    alphabet and is termed alpha. For interval
    estimates, α is merely a symbol used to denote
    the area in the two tails of a distribution. The
    area in the middle of the distribution is (1 - α)
    or (1 - α) × 100%, and there is α/2 of the area in
    each of the tails of the distribution.
  • When a sample mean is normally distributed, the Z
    values for the 95% interval estimate, or the
    (1 - 0.05) × 100% = 95% interval, are ±1.96. That
    is, if α = 0.05,
        \[ Z_{\alpha/2} = Z_{0.025} = 1.96 \]
  • and the interval estimate of the population
    mean µ is
        \[ \bar{x} \pm 1.96 \, \frac{\sigma}{\sqrt{n}} \]
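
The link between a confidence level and its Z value can be sketched with Python's standard library. This is an illustrative sketch only; the function name `z_value` is an assumption and does not come from the notes. It places α/2 in each tail and reads off the matching point of the standard normal distribution.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Z value with area alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    # inv_cdf returns the point with the given area to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_value(level):.3f}")
```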

7
Notation for interval estimates using the t
distribution
  • For the t distribution, the notation is the same
    as in the last slide, with t replacing Z. The
    only addition is that for each df, there is a
    different t value.
  • For a sample mean with a t distribution, the t
    values for the (1 - α) × 100% interval are
    ±t_{α/2} for the appropriate df.
  • If a sample mean has a t distribution, df = n - 1,
    that is, the sample size minus 1.
  • For a 98% interval estimate of a population mean,
    where the sample size is n = 9, or df = n - 1 =
    8, t = 2.896.
  • In this case, the interval estimate of the
    population mean µ is
        \[ \bar{x} \pm 2.896 \, \frac{s}{\sqrt{9}} \]

8
Example of wages of workers employed at new jobs
after a plant shutdown I
  • Prior to the shutdown of an Ontario manufacturing
    plant in the 1990s, male workers were paid $13.76
    per hour and female workers $11.80 per hour.
  • Two years after the workers were laid off,
    researchers located some of these laid off
    workers. For twelve male workers who found new
    jobs, the mean hourly wage was $12.20, with a
    standard deviation of $3.27. For twelve female
    workers who found new jobs, the mean hourly wage
    was $8.11, with a standard deviation of $3.53.
  • Obtain 90% interval estimates of the mean wage
    for all laid off male workers who found new jobs
    and for all laid off female workers who found new
    jobs. What do you conclude from these results?

Source: Data and research from Belinda Leach and
Anthony Winson, "Bringing Globalization Down to
Earth: Restructuring and Labour in Rural
Communities," Canadian Review of Sociology and
Anthropology, 32:3, August 1995.
9
Example of wages II
  • These are small samples, and for neither males
    nor females is the standard deviation of wages
    known. While male wages in the new jobs may not
    be exactly normally distributed, assume that they
    are symmetrically distributed or close to normal.
    Assume the same for the distribution of wages of
    female workers. Assume that each sample is a
    random sample of all laid off workers who had new
    jobs at the time of the study.
  • From the above assumptions, for each of male and
    female workers who found new jobs, the
    distribution of sample mean pay is a t
    distribution. In each case, the sample has size
    n = 12, so the df associated with each interval
    estimate is df = 12 - 1 = 11. For a 90% interval
    estimate, there is 90% or 0.90 of the area in the
    middle of the distribution and 5% or 0.05 of the
    area in each of the two tails of the
    distribution. The appropriate t value for 11 df
    and 90% confidence is 1.796.

10
Example of wages III
  • For males, let µ be the mean wage of all those
    laid off male workers who found new jobs. The
    90% interval estimate for the population mean µ
    is
        \[ 12.20 \pm 1.796 \times \frac{3.27}{\sqrt{12}} \]
  • or $12.20 ± $1.70 or ($10.50 , $13.90).
  • For females, the procedure is the same and the
    resulting 90% interval estimate is
        \[ 8.11 \pm 1.796 \times \frac{3.53}{\sqrt{12}} \]
  • or $8.11 ± $1.83 or ($6.28 , $9.94).

11
Example of wages IV
  • These interval estimates suggest that the mean
    hourly pay of laid off workers has declined,
    with stronger evidence for females than for
    males.
  • For males, the 90% confidence interval is $10.50
    to $13.90. Most of the interval lies below the
    pre-layoff level of $13.76, but that level falls
    just inside the upper limit, so a decline for
    re-employed males is likely but not established
    at 90% confidence.
  • For females the situation appears worse. The 90%
    interval estimate of mean pay of re-employed
    females is from $6.28 to $9.94, an interval well
    below the pre-layoff level of $11.80.
  • Neither result is certain but evidence at the
    time of the study is that workers, especially
    females, experienced a decline in mean pay, as
    compared with the pre-layoff situation.
  • Cautions: samples may not be random,
    distribution of pay may not be normal,
    uncertainty with only 90% confidence.
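
The interval calculation in this example can be sketched in a few lines. This is a minimal sketch using only the summary statistics quoted in the example (n = 12, the two means and standard deviations) and the table value t = 1.796; the function name `t_interval` is an illustrative choice. Note that the margin of error is the t value times the standard error s/√n, not the standard error alone.

```python
import math


def t_interval(mean: float, sd: float, n: int, t_value: float):
    """Return (lower, upper, margin) for the interval mean ± t * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_value * standard_error
    return mean - margin, mean + margin, margin


T_90_DF11 = 1.796  # t value for 11 df with 0.05 in each tail, from the t table

male = t_interval(mean=12.20, sd=3.27, n=12, t_value=T_90_DF11)
female = t_interval(mean=8.11, sd=3.53, n=12, t_value=T_90_DF11)

for label, (lower, upper, margin) in (("male", male), ("female", female)):
    print(f"{label}: margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Comparing each interval with the pre-layoff wage ($13.76 for males, $11.80 for females) shows whether the earlier pay level is a plausible value for the current mean.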

12
Using the t distribution
  • Small n often occurs in practice.
  • σ unknown is the usual situation.
  • Normal distribution of population. This is
    unlikely but so long as the population
    distribution is close to symmetric, this should
    not produce unreliable results.
  • In these situations, when estimating a population
    mean, it is advisable to use the t distribution
    as the sampling distribution of the sample mean,
    rather than the normal distribution.
  • For larger sample sizes, the sampling
    distribution of sample means can be approximated
    by the normal distribution.
  • All of the above assume that the sample is a
    random sample, or is equivalent to a random
    sample.

13
t distribution in economic analysis
  • Much economic analysis uses very large sample
    sizes. But there are situations where n is small
    and the population standard deviation is unknown.
    Then the sample mean has a t distribution. When
    n > 100, the normal provides a good approximation
    to the t distribution.
  • Experiments, administrative data, or other
    situations with a small number of cases.
  • Measurement error is often close to normally
    distributed.
  • Regression analysis, especially with time series
    data, where the number of observations across
    time is not large. Regression coefficients have
    a t distribution (ASW, Ch. 12).
  • Economic variables are sometimes assumed to be
    normally distributed, but with unknown
    variability, so the t distribution is used for
    the distribution of the sample mean.

14
Estimating sample size (ASW, 310-313)
  • Prior to conducting a research study, it is often
    useful to estimate the sample size required to
    achieve a particular margin of error, specifying
    a confidence level.
  • While this may not be the final sample size a
    researcher obtains, the following calculations
    provide an estimate of the number of population
    elements from which a researcher should attempt
    to obtain data. This, in turn, can be used to
    plan the research study and estimate the time and
    cost that will be required to conduct it.
  • The final sample size may differ from this
    estimate: cost may be too great, respondents may
    refuse to participate, there may be nonresponse
    to some questions, or time may be insufficient.
  • The method examined here provides the required
    sample size for a random sample from a
    population, given the margin of error and
    confidence level.

15
Formula
  • Margin of error E.
  • Confidence level (1 - α) × 100% and the
    corresponding normal value is Z_{α/2}.
  • Population standard deviation is σ.
  • n is the required sample size when random
    sampling from this population.
        \[ n = \frac{(Z_{\alpha/2})^2 \, \sigma^2}{E^2} \]

16
Rationale for formula
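  • Formula for interval estimate is
        \[ \bar{x} \pm Z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}} \]
  • Researcher wishes an interval
        \[ \bar{x} \pm E \]
  • Let
        \[ E = Z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}} \]
  • And solve this expression for n, giving
        \[ n = \frac{(Z_{\alpha/2})^2 \, \sigma^2}{E^2} \]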

17
Example of sample size I
  • A manager at Access Communications wants to know
    whether it is worthwhile to target university
    students for a promotion. In order to do this
    she would like to know how many minutes of TV
    students watch each day, accurate to within five
    minutes, with 99% confidence. The upper limit
    of budget expenditures for the study is $1,000.
  • You have been hired as a consultant to the
    manager and your task is to conduct a sample of
    university students to obtain the required
    information. What sample size is required?
    What would you recommend to the manager?

18
Example of sample size II
  • Fortunately you have kept the Excel worksheet
    from Economics 224 and when you check it, you
    determine that the standard deviation of the
    hours students in Economics 224 reported watching
    TV daily was 1.298 hours.
  • From this, you use the requirements specified by
    the manager, that is, a margin of error of 5
    minutes or E = 5/60 = 0.0833 hours and 99%
    confidence. The Z value is 2.576, so the
    required sample size is
        \[ n = \frac{(2.576)^2 (1.298)^2}{(5/60)^2} \approx 1609.9 \]

19
Example III
  • You report that the required sample size is at
    least n = 1,610.
  • You also report that a larger sample size might
    be required, since you have an estimate of the
    variability of the population that may be low.
    That is, σ for all university students might
    exceed 1.3 hours.
  • If this is a random sample, to obtain a sampling
    frame and then contact each student by telephone,
    email, or Canada Post, you estimate that the cost
    of sampling is approximately $10 per student, for
    a total cost of well over $1,000.
  • From this, you might recommend a smaller sample
    size, with relaxed requirements, say E = 15
    minutes, 95% confidence, for a sample of around
    100 students. You might note that the required
    margin of error of five minutes is very difficult
    to obtain and too demanding.
  • Explore less expensive methods of conducting the
    survey.
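
The two scenarios in this example can be compared with a short calculation. This is a sketch under the example's own figures (a standard deviation of 1.298 hours and a cost of about $10 per student); the function name `required_sample_size` is illustrative, and the Z values are computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, margin: float, confidence: float) -> int:
    """n = (Z * sigma / E)^2, rounded up to the next whole element."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


SIGMA_HOURS = 1.298    # standard deviation from the Economics 224 worksheet
COST_PER_STUDENT = 10  # dollars, the rough estimate given in the example

scenarios = {
    "manager's request (5 min, 99%)": (5 / 60, 0.99),
    "relaxed request (15 min, 95%)": (15 / 60, 0.95),
}

for label, (margin_hours, confidence) in scenarios.items():
    n = required_sample_size(SIGMA_HOURS, margin_hours, confidence)
    print(f"{label}: n = {n}, approximate cost = ${n * COST_PER_STUDENT:,}")
```

Keeping the margin of error in hours matches the units of the standard deviation, which the formula requires.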

20
Estimating σ
  • To obtain an estimate of the required n, some
    estimate of the population standard deviation σ
    is required.
  • Use s from previous studies or similar
    populations.
  • Pilot study. Obtain a preliminary estimate of σ.
  • Judgment or best guess. Dividing the range by 4
    can produce a reasonable provisional estimate of
    σ. If there are outliers, it may be best to
    eliminate these. For example, what is the σ of
    income for Saskatchewan residents? Minimum $0
    and maximum might be $10 million plus. But make
    range from $0 to $100,000 and this may include 99%
    plus of the population. Rough estimate of σ for
    Saskatchewan income would be around $25,000.
    (For 2001, s = $23,000, from Census).
  • Structure sampling procedure so that the sample
    size can later be increased, if necessary.
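
The range-divided-by-4 guess can be written out directly. This is a small sketch of that rule of thumb using the income figures from the slide; the function name is illustrative. The rule rests on the observation that, for a roughly normal population, most values lie within about two standard deviations on either side of the mean.

```python
def sigma_from_range(low: float, high: float) -> float:
    """Provisional standard deviation: the (trimmed) range divided by 4."""
    return (high - low) / 4


# Trimmed income range from the slide, ignoring the few very high incomes.
rough_sigma = sigma_from_range(0, 100_000)
census_value = 23_000  # 2001 figure quoted on the slide

print(f"rough estimate: ${rough_sigma:,.0f}")
print(f"difference from the quoted census figure: ${rough_sigma - census_value:,.0f}")
```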

21
Additional notes about sample size I
  • When obtaining n from the formula, round up.
  • Make sure units for σ and E in the formula are
    the same.
  • n is larger with:
      • Greater variability σ in the population.
      • Larger confidence level.
      • Smaller margin of error E.
  • Trade off between costs and accuracy of results.
  • For a random sample, n does not depend on
    population size if the proportion of the
    population sampled (n/N) is small.
  • It may not be possible to obtain the required n,
    so researcher will have to settle for a larger
    margin of error or reduced confidence level, or
    both. For example, time series data on Internet
    use may only be available for n = 10 years.
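
How strongly n responds to each input can be seen by varying one at a time. This is an illustrative sketch: the baseline values (a standard deviation of 1.3 hours, a margin of 0.25 hours, 95% confidence) are assumptions chosen to resemble the TV example, and the function returns the unrounded value so that the ratios are easy to read.

```python
from statistics import NormalDist


def sample_size(sigma: float, margin: float, confidence: float) -> float:
    """Unrounded n = (Z * sigma / E)^2; round up before using it in practice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2


base = sample_size(sigma=1.3, margin=0.25, confidence=0.95)

variations = {
    "baseline": base,
    "double the standard deviation": sample_size(2.6, 0.25, 0.95),
    "halve the margin of error": sample_size(1.3, 0.125, 0.95),
    "raise confidence to 99%": sample_size(1.3, 0.25, 0.99),
}

for label, n in variations.items():
    print(f"{label}: n = {n:.1f} ({n / base:.2f} times the baseline)")
```

Because the standard deviation and the margin of error enter the formula squared, doubling the former or halving the latter multiplies n by four.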

22
Additional notes about sample size II
  • Sample size given by above formula indicates the
    number of population elements actually required
    in the study. If individuals or firms to be
    surveyed are reluctant to participate, cannot be
    found, or are unwilling to answer some questions,
    expand the required number of elements in the
    hope that the n indicated can be obtained. For
    example, if the formula indicates a required n =
    500 and 25% nonresponse is expected, expand
    sample size to 650 or 700.
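
One simple way to arrive at a figure in that range is to divide the required n by the expected response rate. This adjustment is an assumption for illustration, not a rule stated in the notes, and the function name is hypothetical.

```python
import math


def elements_to_contact(required_n: int, nonresponse_rate: float) -> int:
    """Number of elements to approach so that about required_n respond."""
    response_rate = 1 - nonresponse_rate
    return math.ceil(required_n / response_rate)


# Figures from the slide: n = 500 required, 25% nonresponse expected.
print(elements_to_contact(500, 0.25))
```

The result falls between the 650 and 700 suggested above; choosing the higher end leaves a cushion in case nonresponse is worse than expected.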
  • Sampling procedure affects required sample size.
    Cluster samples might need to have larger n but
    stratification of a sample might reduce the
    required sample size. Different formulae for
    more complex sampling procedures.

23
Weighting of sample elements
  • This issue is not discussed in the text but is
    one that needs consideration in much survey
    sampling.
  • Sampling procedure may be designed so each
    element selected in the sample represents a
    different number of population elements (e.g.
    cluster, stratified, multistage sampling).
    Research methodology should report the weighting
    procedures to be used when conducting data
    analysis. Statistics Canada often includes a
    weight in the data set.
  • Weighting may occur after data are obtained, to
    estimate characteristics of population. For
    example, if males and females are about equal in
    number in a population but a sample has fewer
    males than females, data from males may be more
    heavily weighted when analyzing and reporting
    results.
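
The male and female example can be made concrete with a small calculation. Every number below is hypothetical and chosen only for illustration; none comes from the notes or from a real survey. The sketch weights each group by its population share divided by its sample share, one common form of after-the-fact weighting.

```python
# Hypothetical figures for illustration only.
population_share = {"male": 0.5, "female": 0.5}
sample_count = {"male": 40, "female": 60}
sample_mean = {"male": 12.0, "female": 9.0}  # e.g. mean hourly wage per group

n = sum(sample_count.values())

# Weight = population share / sample share, so under-sampled groups count more.
weights = {g: population_share[g] / (sample_count[g] / n) for g in sample_count}

unweighted = sum(sample_count[g] * sample_mean[g] for g in sample_count) / n
weighted = (
    sum(weights[g] * sample_count[g] * sample_mean[g] for g in sample_count)
    / sum(weights[g] * sample_count[g] for g in sample_count)
)

print("weights:", {g: round(w, 3) for g, w in weights.items()})
print(f"unweighted mean = {unweighted:.2f}, weighted mean = {weighted:.2f}")
```

With these made-up figures the weighted mean moves toward the male group's mean, because males are under-represented in the sample relative to the population.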

24
Conclusion about interval estimates and sample
size
  • Formulae are precise but approximations are often
    used
  • Random sample?
  • Standard deviation of population?
  • Confidence level arbitrary.
  • Nonresponse and other nonsampling errors.
  • When data come from samples, there is usually
    sampling error. Interval estimates and estimates
    of sample size are necessary but remember above
    cautions about their accuracy.
  • Replication of studies, similar research on
    related topics and comparable populations.
  • Careful sample and research design and data
    analysis.

25
Later on Wednesday or on Monday
  • Normal approximation to the binomial (ASW, 6.3).
  • Sampling distribution of the sample proportion
    (ASW, 7.6).
  • Interval estimate of a population proportion
    (ASW, 8.4).
  • Sample size for estimation of a population
    proportion (ASW, 315-316).
  • Review Monday during class and Tuesday at
    review session