Computation Details: Confidence Intervals for the Center of Data

Provided by: leona6

1
Computation Details - Confidence Intervals for
the Center of Data
  • When the median (typical value) is of interest, use
    either a non-parametric (NP) or a parametric interval.
  • An NP interval estimate for the true population
    median is computed using the cumulative
    probabilities from the binomial distribution.
  • 1. The desired α is stated: the acceptable risk of
    not including the true median.
  • 2. Half of this risk, α/2, is assigned to each end of
    the interval.

2
  • 3. Table A of the cdf of the binomial
    distribution with parameters n and p = 0.5
    provides the lower and upper critical values x'
    and x at one-half the desired α level. This
    table is identical to the one used for the Sign
    Test, to be discussed later.
  • 4. These critical values are transformed to ranks
    $R_l$ and $R_u$ corresponding to data points $C_l$ and $C_u$
    at the ends of the confidence interval:
  • $R_l = x' + 1$
  • $R_u = n - x' = x$
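
Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, not the routine used in the lecture: it computes the binomial cdf directly instead of reading Table A, the function names `binom_cdf` and `np_median_ci` are hypothetical, and the 13 data values are borrowed from the bootstrap example later in this deck.

```python
from math import comb


def binom_cdf(x, n, p=0.5):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def np_median_ci(data, alpha=0.05):
    """Return (R_l, R_u, C_l, C_u, achieved confidence) for the median."""
    xs = sorted(data)
    n = len(xs)
    # Lower critical value x': the largest x with P(X <= x) <= alpha/2.
    x_lower = -1
    while binom_cdf(x_lower + 1, n) <= alpha / 2:
        x_lower += 1
    if x_lower < 0:
        raise ValueError("sample too small for the requested alpha")
    r_l = x_lower + 1          # R_l = x' + 1
    r_u = n - x_lower          # R_u = n - x' = x
    achieved = 1 - 2 * binom_cdf(x_lower, n)
    return r_l, r_u, xs[r_l - 1], xs[r_u - 1], achieved


data = [19.2, 16.2, 10.7, 16.6, 3.6, 18.1, 8.6,
        15.3, 14.0, 14.2, 16.9, 13.4, 5.7]
print(np_median_ci(data, alpha=0.05))
```

Because the ranks are integers, the achieved confidence level is usually somewhat higher than the nominal 1 - α; the function returns it so that the difference is visible.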

4
  • The resulting confidence interval will reflect
    the shape (skewed or symmetric) of the original
    data.
  • The NP interval cannot always produce exactly the
    desired confidence level when the sample size
    is small, because the ranks are whole numbers.

5
  • Minitab (SINTERVAL) uses non-linear interpolation
    to get exact (1 - α) levels.
  • For n > 20, a normal approximation can be used to
    compute the intervals.
  • Computed ranks are rounded to the nearest integer
    when necessary.

6
Parametric Interval Estimate for the Median
  • As mentioned in the first lecture, the geometric mean
    of X (GMX) is an estimate of the median of X when
    y = ln(X) is normal or fairly symmetric.
  • The mean of y and confidence interval on the mean
    of y become the geometric mean with its
    (asymmetric) confidence interval after being
    transformed back to original units by
    exponentiation.
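
A minimal sketch of this back-transformation is given below. It assumes the critical t value is looked up by the caller in a t-table (the Python standard library has no t-distribution); the function name `geometric_mean_ci` and the three sample values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import mean, stdev


def geometric_mean_ci(data, t_crit):
    """CI for the median of X via the mean of y = ln(X).

    t_crit is the t-table value for the chosen alpha/2 and n - 1
    degrees of freedom (supplied by the caller; an assumption here).
    Returns (lower, geometric mean, upper) in original units.
    """
    logs = [log(x) for x in data]
    n = len(logs)
    y_bar = mean(logs)
    half_width = t_crit * stdev(logs) / sqrt(n)   # symmetric in log units
    return exp(y_bar - half_width), exp(y_bar), exp(y_bar + half_width)


# Hypothetical data; 4.303 is the tabled t value for 2 df, alpha/2 = 0.025.
lower, gm, upper = geometric_mean_ci([1.0, 10.0, 100.0], t_crit=4.303)
print(lower, gm, upper)
```

The interval is symmetric around the mean in log units, so after exponentiation the limits sit at equal ratios, not equal distances, from the geometric mean.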

7
Confidence Intervals for the Mean
  • Intervals may also be computed for the true
    population mean μ. These are appropriate if the
    center of mass of the data is the statistic of
    interest.
  • Symmetric intervals around the sample mean are
    computed most often. For large sample sizes a
    symmetric interval adequately describes the
    variation of the mean regardless of the shape of
    the data distribution.
  • Note: the C.I. on the geometric mean is not an
    interval estimate of the mean.

8
Prediction Intervals to Evaluate a Future New
Observation
  • The question is often asked whether a new
    observation is likely to have come from the same
    distribution as previously collected data.
  • This can be evaluated by determining whether the
    new observation is outside the prediction
    interval computed from existing data.
  • E.g. Calculate prediction interval from
    background data (or same well data), check new
    compliance data (or new observation in same well)
    with prediction interval.

9
  • PIs are wider than CIs because an individual
    observation is more variable than a summary
    statistic.
  • NP prediction interval - valid for all data.
  • Symmetric prediction interval - valid only for
    symmetric data
  • Asymmetric prediction interval - valid only when
    logs are symmetric.
  • For a normal population, table is available for
    prediction interval of k new observations.

10
Two-Sided NP PI
  • The NP PI of confidence level (1 - α) is simply
    the interval between the 100(α/2) and 100(1 - α/2)
    percentiles of the distribution.

11
  • The interval contains 100(1 - α) percent of the
    data.
  • Therefore, if the new observation comes from the
    same distribution as the previously measured
    data, there is a 100α percent chance that it will lie
    outside this interval.
  • PI = $X_{(\alpha/2)(n+1)}$ to $X_{(1-\alpha/2)(n+1)}$, where the subscript is the rank of the data value.

One-Sided NP PI
One-sided PI are appropriate if the interest is
in whether a new observation is larger than
existing data, or smaller than existing data, but
not both.
12
  • One-sided PI:
  • New < $X_{\alpha(n+1)}$ or New > $X_{(1-\alpha)(n+1)}$ (one or the
    other, but not both).
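
The rank-based NP prediction interval can be sketched as below. The ranks (α/2)(n+1) and (1 - α/2)(n+1) are generally not whole numbers, so this sketch interpolates linearly between neighbouring order statistics; that interpolation rule, and the names `order_stat` and `np_prediction_interval`, are assumptions made for illustration. The data are the 13 values from the bootstrap example in this deck.

```python
from math import floor


def order_stat(xs_sorted, rank):
    """Value at a (possibly fractional) 1-based rank, by linear interpolation."""
    n = len(xs_sorted)
    if rank < 1 or rank > n:
        raise ValueError("rank falls outside the data; use a larger alpha or n")
    lo = floor(rank)
    frac = rank - lo
    if frac == 0 or lo == n:
        return xs_sorted[lo - 1]
    return xs_sorted[lo - 1] + frac * (xs_sorted[lo] - xs_sorted[lo - 1])


def np_prediction_interval(data, alpha):
    """Two-sided NP PI: X at rank (alpha/2)(n+1) to X at rank (1-alpha/2)(n+1)."""
    xs = sorted(data)
    n = len(xs)
    return (order_stat(xs, (alpha / 2) * (n + 1)),
            order_stat(xs, (1 - alpha / 2) * (n + 1)))


data = [19.2, 16.2, 10.7, 16.6, 3.6, 18.1, 8.6,
        15.3, 14.0, 14.2, 16.9, 13.4, 5.7]
low, high = np_prediction_interval(data, alpha=0.2)
print(low, high)
```

For a one-sided interval the same helper can be called once, with rank α(n+1) for a lower limit or (1 - α)(n+1) for an upper limit. With only 13 observations, small values of α put the ranks outside the data, which is why the sketch raises an error in that case.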

13
Parametric PI - Symmetric PI
  • Assumes the data follow a normal distribution. PIs
    are symmetric around the sample
    mean and wider than the CI on the mean.
  • The equation for the PI differs from that for the CI
    by including the term s² under the square root, where s
    is the std. dev. of individual observations around their mean.
  • PI = $\bar{X} - t_{\alpha/2,\,n-1}\sqrt{s^2 + s^2/n}$ to $\bar{X} + t_{\alpha/2,\,n-1}\sqrt{s^2 + s^2/n}$

14
  • Difference in length between the PI and CI is $2\,t_{\alpha/2,\,n-1}\left(\sqrt{s^2 + s^2/n} - \sqrt{s^2/n}\right)$
  • Therefore the PI can be computed from the CI and the sample size
    n. This is useful when using Minitab or other
    software that does not have a routine to compute
    the PI.
  • One-sided PIs are computed as before, using α
    rather than α/2 and comparing new data to only
    one end of the PI.
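
The relation between the two intervals can be checked numerically. The sketch below assumes the usual normal-theory forms, CI half-width t·s/√n and PI half-width t·√(s² + s²/n), so that the PI half-width equals the CI half-width times √(n + 1). The function names are hypothetical, the t value 2.776 is the tabled 97.5th percentile for 4 degrees of freedom, and the summary statistics are those of the n = 5 example used later in this deck.

```python
from math import sqrt


def symmetric_pi(x_bar, s, n, t_crit):
    """Two-sided PI for one new observation from a normal population."""
    half = t_crit * sqrt(s**2 + s**2 / n)
    return x_bar - half, x_bar + half


def pi_from_ci(ci_low, ci_high, n):
    """Recover the PI from a CI on the mean computed by other software."""
    center = (ci_low + ci_high) / 2
    half = (ci_high - ci_low) / 2 * sqrt(n + 1)   # CI half-width * sqrt(n + 1)
    return center - half, center + half


n, x_bar, s, t_crit = 5, 50.1, 1.31, 2.776
ci_half = t_crit * s / sqrt(n)
print(symmetric_pi(x_bar, s, n, t_crit))
print(pi_from_ci(x_bar - ci_half, x_bar + ci_half, n))
```

Both calls should print the same interval up to rounding error, which is the point of the shortcut: only the CI and n are needed.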

15
Asymmetric PI
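  • This is based on the logs of the skewed data.
  • PI = $\exp\left(\bar{y} - t_{\alpha/2,\,n-1}\sqrt{s_y^2 + s_y^2/n}\right)$ to $\exp\left(\bar{y} + t_{\alpha/2,\,n-1}\sqrt{s_y^2 + s_y^2/n}\right)$
  • where y = ln(X), and $\bar{y}$ and $s_y$ are the mean and
    standard deviation of the logs.
  • Use parametric intervals only when the data are normal or
    lognormal.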
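
A short sketch of the asymmetric PI follows, assuming the interval is computed on y = ln(X) with the normal-theory PI and then exponentiated. The function name `lognormal_pi` and the sample values are hypothetical, and the t value is supplied by the caller from a t-table.

```python
from math import exp, log, sqrt
from statistics import mean, stdev


def lognormal_pi(data, t_crit):
    """PI for one new observation when ln(X) is approximately normal."""
    logs = [log(x) for x in data]
    n = len(logs)
    y_bar, s_y = mean(logs), stdev(logs)
    half = t_crit * sqrt(s_y**2 + s_y**2 / n)     # PI half-width in log units
    return exp(y_bar - half), exp(y_bar + half)   # back to original units


# Hypothetical skewed data; 4.303 is the tabled t value for 2 df.
low, high = lognormal_pi([1.0, 10.0, 100.0], t_crit=4.303)
print(low, high)
```

The lower limit can never fall below zero, which is one practical reason for working in logs with skewed, strictly positive data.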

16
Confidence Intervals for Quantiles (Percentiles)
  • E.g. what is the CI of the 100 yr. flood?
  • The 100 yr. flood is the 99th percentile (0.99
    quantile) of the distribution of annual flood
    data.
  • Similarly, the 2 yr. flood is the median or 50th
    percentile of annual floods.
  • In environmental monitoring, the median, 95th, or
    some other percentile should not exceed (or be
    below) a standard (e.g. a water quality standard).

17
Valid for All Data - NP Interval
  • The procedure is similar to that for the median, except that the
    binomial probability uses parameters n and p = quantile
    of interest.
  • For n > 20, the normal approximation can be used.
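  • $R_l = np + z_{\alpha/2}\sqrt{np(1-p)} + 0.5$ and $R_u = np + z_{1-\alpha/2}\sqrt{np(1-p)} + 0.5$,
    where $z_q$ is the q quantile of the standard normal distribution.
  • The 0.5 is a continuity correction term.
  • The computed ranks $R_l$ and $R_u$ are rounded to the
    nearest integer.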
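
The large-sample ranks can be sketched as follows, assuming the usual normal approximation to the binomial, np ± z·√(np(1 - p)), with 0.5 added as the continuity correction. The function name `quantile_ci_ranks` is hypothetical, and clamping the ranks to the range 1..n is an added safeguard, not something stated on the slide.

```python
from math import sqrt
from statistics import NormalDist


def quantile_ci_ranks(n, p, alpha=0.05):
    """Approximate ranks (R_l, R_u) bounding a two-sided CI for quantile p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal quantile
    spread = z * sqrt(n * p * (1 - p))
    r_l = round(n * p - spread + 0.5)         # 0.5 = continuity correction
    r_u = round(n * p + spread + 0.5)
    # Assumption: keep the ranks inside the sample.
    return max(r_l, 1), min(r_u, n)


print(quantile_ci_ranks(25, 0.5))    # median, n = 25
print(quantile_ci_ranks(50, 0.9))    # 90th percentile, n = 50
```

The data values at the returned ranks, counted from the smallest observation, are the interval end points. Note that Python's `round` rounds exact halves to the nearest even integer, which rarely matters here.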

18
NP Test Whether a Percentile Differs From Xo
(2-sided test)
NP Tests for Percentiles
  • E.g. A water quality standard Xo could be set
    such that the median of daily concentrations
    should not exceed Xo ppb.
  • Compute the interval for the percentile. If Xo falls
    within this interval, the percentile is not
    significantly different from Xo at the α level.

20
NP Test for Whether a Percentile Exceeds Xo
(One-sided Test)
  • Compute a one-sided CI for the percentile.
    Remember that the entire error level α is placed
    on the side below the percentile point estimate.

22
NP Test for Whether a Percentile is Less than Xo
(One-sided Test)
  • Compute a one-sided CI for the percentile, placing
    all of the error α on the side above the estimated
    percentile.

24
Intervals for Normal Population
  • Factors for calculating two-sided 95% intervals
    for a normal distribution (or data transformed to
    normal) are available in a table.
  • The 95% CI to contain the true mean is
    given by $\bar{x} \pm c_M(n)\,s$,
  • where the factor $c_M(n)$ is obtained from the
    table, $\bar{x}$ is the sample mean, and s is the
    sample standard deviation.
  • The factor is the 97.5th percentile of
    the t-distribution with n - 1 degrees of freedom,
    divided by $\sqrt{n}$.
  • E.g. if n = 5, the calculated mean = 50.1, and the
    standard deviation = 1.31, the factor is 1.24.
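
The tabled factor in this example can be reproduced from a t-table. The sketch below assumes the factor is the 97.5th percentile of the t-distribution divided by √n; the function name `mean_ci_factor` is hypothetical, and 2.776 is the tabled t value for 4 degrees of freedom.

```python
from math import sqrt


def mean_ci_factor(n, t_crit):
    """Factor c such that the CI on the mean is x_bar +/- c * s."""
    return t_crit / sqrt(n)


n, x_bar, s = 5, 50.1, 1.31
c_m = mean_ci_factor(n, t_crit=2.776)   # t for 4 df, from a t-table
print(round(c_m, 2))
print(x_bar - c_m * s, x_bar + c_m * s)
```

Rounded to two decimals, the computed factor agrees with the value 1.24 quoted in the example.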

26
  • To compute the 95% tolerance interval to contain
    a specific percentage, e.g. 90%, of a normal
    population, we use $\bar{x} \pm c_{T,90}(n)\,s$,
  • where the factor is $c_{T,90}(n)$ for the given
    sample size n, and s is the standard deviation.
  • E.g. we have 5 observations, the sample mean
    is 50.1, and the standard deviation is 1.31.
    From the table, the factor to contain 90% of a
    normal distribution with 95% confidence when n = 5
    is 4.28. Hence, the tolerance interval is from
    44.49 to 55.71.
  • Factors for other percentages, 95% and 99%, are also given in
    the table.

27
  • Prediction intervals for future observations (a PI
    to contain all k future observations) are
    computed similarly using the factors from the
    table. With 95% confidence, all k future
    observations from a previously sampled normal
    population will be located in the interval given
    by the sample mean ± (tabled factor) × s,
  • for k = 1, 2, 5, 10, and 20.
  • E.g. for a random sample of n = 5 observations, a 95% PI
    to contain the values of k = 2 further randomly
    selected observations from the same population is
    50.1 ± 3.70 (1.31).
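
The three worked examples share the form "sample mean ± factor × s", so their lengths can be compared with a few lines of Python. This is only an arithmetic check using the factors quoted in the examples (1.24, 3.70 and 4.28); the function name `factor_interval` is hypothetical.

```python
def factor_interval(x_bar, s, factor):
    """Interval of the form x_bar +/- factor * s."""
    half = factor * s
    return x_bar - half, x_bar + half


x_bar, s = 50.1, 1.31          # n = 5 example from the slides
factors = {
    "95% CI for the mean": 1.24,
    "95% PI for k = 2 future observations": 3.70,
    "95% tolerance interval for 90% of the population": 4.28,
}
for name, c in factors.items():
    low, high = factor_interval(x_bar, s, c)
    print(f"{name}: {low:.2f} to {high:.2f} (length {high - low:.2f})")
```

The ordering of the lengths follows the ordering of the factors: the interval for the mean is the shortest and the tolerance interval is the longest.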

28
Comparison of length of intervals
Comparison of lengths of statistical intervals
for examples used.
29
Bootstrap Methods for Standard Errors and
Confidence Intervals
  • The bootstrap method for estimating the standard
    error of a statistic is one of the most
    significant developments in the field of
    statistics. The method has also been called the
    Computer Intensive method or a Resampling method.
    It can be used for hypothesis testing and
    probability evaluation. The method is completely
    nonparametric. The essential features of the
    bootstrap approach are best illustrated by an
    example.

30
  • Suppose we wish to estimate the median m from a
    sample of 13 data points.
  • 19.2 16.2 10.7 16.6 3.6 18.1 8.6
  • 15.3 14.0 14.2 16.9 13.4 5.7
  • and that we want the standard error of the estimate,
    or a CI for m. The sample median of these data is
    m = 14.2.
  • Step 1
  • Draw a random sample of size 13, with
    replacement, from the original sample, e.g.
  • 3.6 19.2 8.6 15.3 8.6 3.6 14.2 10.7 10.7
    10.7 16.9 14.2

31
  • This is the first bootstrap sample. Note that
    some of the original sample values occur more
    than once, and others not at all. The key phrase
    here is "with replacement".
  • Step 2
  • Calculate the median for the bootstrap sample:
  • m = 10.7
  • Step 3
  • Carry out Steps 1 and 2 a large number B (say
    200) of times, to obtain 200 bootstrap estimates
    m1, ... , m200.

32
  • Step 4
  • The standard deviation of the 200 m values gives
    the standard error of m.
  • To obtain the confidence interval, first sort the
    200 m values in ascending order (lowest to highest).
    For a 90% confidence interval for m, choose the
    bootstrap values of rank 10 and rank 191 as the lower
    and upper confidence limits.
  • The above steps can be easily implemented in
    MINITAB by writing a short simple macro. The
    macro can be written during the MINITAB session
    or can be written using a text editor.

33
  • Assume the data are in column C1. Using Notepad or another
    text editor, type:
      gmacro
      bootmed
      do k1 = 1:200
        sample 13 c1 c2;
          replace.
        let c3(k1) = median(c2)
      enddo
      endmacro
  • Save the file as bootmed.mtb in the macro directory of
    Minitab.
  • To run, type during a Minitab session:
      MTB > %bootmed
  • The 200 values of the bootstrap medians will be
    in C3.
  • DESCRIBE C3 will give summary statistics of the 200 medians.
34
Summary
  • 1. Probability vs. Statistics (Deductive vs.
    Inductive)
  • 2. Things to consider in an estimator.
  • 3. Parametric and Non-parametric interval
    estimates.
  • 4. Types of interval estimates
  • confidence - mean or median
  • prediction - one or more future values
  • confidence - on a percentile (tolerance)