Concepts of Probability and Statistics Statistical Intervals - PowerPoint PPT Presentation


Provided by: leona6


Transcript and Presenter's Notes


1
Concepts of Probability and Statistics: Statistical Intervals
2
Probability and Statistics: Best Illustrated with an Example

Suppose I flip a coin 10 times and get 8 H, 2 T, but I don't know whether the coin is really fair. What is my chance of tossing heads on the next throw?
A) If I only have sample information at my disposal, I estimate 80%.
B) If I consider the total population of possible tosses of a fair coin, I would say 50%.
These two cases illustrate the basic difference between probability and statistics.
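The Python sketch below is not part of the original slides. It puts the two viewpoints side by side. The first number is the sample-based estimate of P(H). The second is the probability of seeing 8 or more heads in 10 tosses, assuming a fair coin (p = 0.5 is the assumption here).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses, n_heads = 10, 8

# Statistics: estimate P(heads) from the sample alone.
p_hat = n_heads / n_tosses

# Probability: assume the population is known (a fair coin, p = 0.5)
# and ask how likely a result at least this extreme is.
p_at_least_8 = sum(binomial_pmf(k, n_tosses, 0.5) for k in range(n_heads, n_tosses + 1))

print(f"sample estimate of P(H): {p_hat:.2f}")
print(f"P(>= 8 heads | fair coin): {p_at_least_8:.4f}")
```

A fair coin gives 8 or more heads in about 5.5% of 10-toss experiments (56/1024). So 8 H and 2 T on their own are fairly weak evidence that the coin is unfair.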

Probability deals with the likelihood of observing an event arising from a known process (model).
i) Based on deductive reasoning.
ii) Begins with knowing something about the population (e.g. that the coin is fair).
iii) Reasons from the population to the sample (e.g. how likely is the next toss to be H?).
Statistics approaches the problem backwards: given a collection of observed data (the sample), which is one of many possible subsets of data, what can be said about the process (the entire population of data)?
i) Based on inductive reasoning.
ii) Begins with knowing only the sample results, not the population (e.g. 8 H and 2 T).
iii) Reasons from the sample to the population (e.g. is the coin fair, based on the sample results?).

Since we usually don't know the behaviour of the overall population, only the sample results we collect, we are forced to make statistical inferences. These inferences estimate typical behaviour in the population from the sample results.
As the coin-tossing example illustrates, even if the coin lands on H 50% of the time on average, specific sample results will vary from experiment to experiment.
So to adequately estimate or predict the coin's true average behaviour, one must carefully account for between-sample variability.
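Between-sample variability can be made concrete by repeating the 10-toss experiment many times. This is a minimal simulation sketch. It assumes a fair coin (p = 0.5), and the number of experiments and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def simulate_heads_proportions(n_experiments: int, n_tosses: int, p: float, seed: int = 1) -> list[float]:
    """Repeat a coin-tossing experiment and return the proportion of heads in each run."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    proportions = []
    for _ in range(n_experiments):
        heads = sum(rng.random() < p for _ in range(n_tosses))
        proportions.append(heads / n_tosses)
    return proportions

props = simulate_heads_proportions(n_experiments=10_000, n_tosses=10, p=0.5)

# Relative frequency of each observed proportion of heads across experiments.
counts = Counter(props)
for value in sorted(counts):
    print(f"{value:.1f}: {counts[value] / len(props):.3f}")
```

The proportions cluster around 0.5, yet individual experiments routinely give 0.3 or 0.7, even though the coin is fair by construction.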

6

In an environmental pollution setting, sample results will vary from period to period even if no contamination has occurred. Why?
- Variation in lab measurements of the concentration of individual samples.
- Sampling variability from field collection and handling.
- Natural variation in background levels of pollutants.
These factors combine to give random variation in sample results, seen whether or not contamination has occurred.
Despite these random fluctuations, we still want to know, for example, whether compliance-point concentrations are significantly higher than background concentrations on average.

Sample fluctuation in background and compliance-point data matters relative to the difference between their average concentrations. This ratio plays a crucial role in distinguishing background behaviour from compliance-point behaviour.
Only by carefully measuring sample variability can we make accurate statistical inferences about the behaviour of the overall population.
For example, consider the question:
Is the long-term average concentration at the compliance point greater than the background level?

One way to answer the above question is to set up a hypothesis test using the results of sample groundwater analyses.
A hypothesis test decides which of two competing notions is closer to reality.
It is a type of statistical inference that reasons from sample results back to the population.
It is used in environmental monitoring because samples are costly to analyze, so only limited data are typically available for statistical purposes.

Example: Flip a coin 100 times and get all heads.
What do we decide about the coin? What is the chance of getting heads on the next toss?
Answer: The chance is essentially 100%. Why? Because the coin is almost certainly two-headed!
The notion being tested is whether or not the coin is fair.
If we say it is fair, what evidence can we use to support our claim?
$P(\text{100 H in 100 tosses of a fair coin}) = (1/2)^{100} \approx 7.9 \times 10^{-31}$, essentially zero.
However, the alternative notion, that the coin is two-headed, is well supported by the evidence at hand.
The key is to determine which notion is best supported by the evidence.
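One way to quantify "which notion is best supported by the evidence" is to compare each notion's likelihood of producing 100 heads. The likelihood ratio below is used only as an illustration; the slides do not name a specific measure. Exact fractions avoid floating-point underflow in the comparison.

```python
from fractions import Fraction

n = 100
# Likelihood of observing 100 heads in 100 tosses under each competing notion.
likelihood_fair = Fraction(1, 2) ** n      # fair coin: (1/2)^100
likelihood_two_headed = Fraction(1) ** n   # two-headed coin: every toss is heads

print(f"P(100 H | fair coin)       = {float(likelihood_fair):.3e}")
print(f"P(100 H | two-headed coin) = {float(likelihood_two_headed):.1f}")
print(f"likelihood ratio (two-headed : fair) = {float(likelihood_two_headed / likelihood_fair):.3e}")
```

The ratio equals $2^{100}$, roughly $1.27 \times 10^{30}$, which is why the two-headed explanation wins so decisively.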

In an environmental setting (monitoring and remediation), first make sure that the hypothesis being tested is appropriate.
In detection monitoring, this becomes $H_0$: no contamination versus $H_A$: evidence of contamination, i.e. "innocent until proven guilty".
In remediation, the null hypothesis changes to $H_0$: "guilty until proven innocent", or "dirty until proven clean".
Then choose a statistical test that measures whether the sample data side better with $H_0$ or $H_A$.
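The slides leave the choice of test open. As one distribution-free illustration (not necessarily the author's method), here is a sketch of an exact one-sided permutation test of the detection-monitoring hypotheses. $H_0$ is no contamination; $H_A$ is that compliance concentrations are higher on average. The concentration values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_test_greater(background: list[float], compliance: list[float]) -> float:
    """Exact one-sided permutation test.

    H0: compliance and background values come from the same distribution.
    HA: compliance values are higher on average.
    Returns the fraction of all relabelings of the pooled data whose
    mean difference (compliance - background) is at least the observed one.
    """
    pooled = background + compliance
    n_c = len(compliance)
    observed = mean(compliance) - mean(background)
    total = extreme = 0
    for idx in combinations(range(len(pooled)), n_c):
        chosen = set(idx)
        group_c = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(group_c) - mean(group_b) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total

# Hypothetical concentrations (ppm), invented for illustration only.
background = [2.1, 1.8, 2.5, 2.0, 2.3]
compliance = [2.9, 3.4, 2.7, 3.1, 2.6]
p_value = permutation_test_greater(background, compliance)
print(f"one-sided permutation p-value: {p_value:.4f}")
```

Here every compliance value exceeds every background value, so only 1 of the 252 possible labelings is as extreme (p ≈ 0.004). That is strong evidence against $H_0$. Full enumeration is only practical for small samples.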

11
Point Estimation

The sample median and sample mean estimate the corresponding center points of a population.
Such estimates are called point estimates.
Consider the 100-year flood, the discharge with a 1% chance of being exceeded in any year, $Q_{0.99}$. Point estimators for it might be:
a) the largest flood that occurred during 100 years of record;
b) $Q_{0.99} = \bar{Q} + s_Q \, z_{0.99}$, using the mean and standard deviation of the flood record (assumes the Qs are normally distributed);
c) $Q_{0.99} = \exp\left(\overline{\log Q} + s_{\log Q} \, K_{0.99}\right)$, where $K_{0.99}$ is the Pearson type III (P3) frequency factor for skewness $g$;
d) $Q_{0.99}$ from regional equations.
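Estimators b) and c) can be sketched as follows. The peak flows are hypothetical. For c) the sketch assumes zero skew (g = 0), where the P3 frequency factor $K_{0.99}$ reduces to the standard normal quantile $z_{0.99}$. Nonzero skew would need a frequency-factor table, which is not reproduced here. Estimator a) needs a 100-year record, and d) needs regional equations, so neither is shown.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical annual peak flows (m^3/s), invented for illustration only.
annual_peaks = [310.0, 450.0, 280.0, 520.0, 390.0, 610.0, 350.0, 470.0, 300.0, 800.0]

z99 = NormalDist().inv_cdf(0.99)  # standard normal quantile, about 2.326

# (b) Normal assumption: Q0.99 = mean(Q) + std(Q) * z0.99
q_normal = mean(annual_peaks) + stdev(annual_peaks) * z99

# (c) Log-space estimate: Q0.99 = exp(mean(log Q) + std(log Q) * K0.99).
# ASSUMPTION: skew g = 0, so K0.99 = z0.99 (equivalent to a lognormal fit).
logs = [math.log(q) for q in annual_peaks]
q_log = math.exp(mean(logs) + stdev(logs) * z99)

print(f"(b) normal:      {q_normal:.0f} m^3/s")
print(f"(c) log, g = 0:  {q_log:.0f} m^3/s")
```

The two estimates differ on the same record, which is one reason the choice of estimator matters.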

12
Things to Consider in Selecting an Estimator

It should have little or no BIAS.
It should have a low MEAN SQUARE ERROR (MSE).
It should be RESISTANT, i.e. not strongly affected by a few unusual values.
It should be ROBUST, i.e. its MSE should compare favourably across a wide range of assumptions (e.g. about the underlying distribution).
It should be REPRODUCIBLE: others should be able to repeat the calculation and get the same results.
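Resistance is easy to see by adding one unusual value to a small data set. The concentrations below are hypothetical.

```python
from statistics import mean, median

# Hypothetical concentrations (ppm); the second list adds one unusual value.
clean = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3]
with_outlier = clean + [40.0]

# The mean is pulled far from the bulk of the data; the median barely moves.
print(f"mean:   {mean(clean):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(clean):.2f} -> {median(with_outlier):.2f}")
```

One value moves the mean from 4.10 to about 8.59 but the median only from 4.10 to 4.15, so the median is the more resistant estimator of the center.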

13
Interval Estimation

We want to estimate a statistical interval because a point estimate tells us nothing about the variability of the statistic. Since any statistic is itself a random variable, it is very important to know how it might fluctuate.
E.g. there is a big difference between 20 ppm ± 10 ppm and 20 ppm ± 2 ppm.
Interval estimates are intervals that have a stated probability of containing the true population value.
The intervals are wider for data sets having greater variability.

Interval estimates can provide three pieces of information that point estimates cannot:
1. A statement of the probability or likelihood that the interval contains the true population value (its reliability).
- confidence intervals.
2. A statement of the likelihood that a single data point of a specified magnitude comes from the population under study.
- prediction intervals.
3. A statement of the likelihood that the interval contains a certain proportion of all population values.
- tolerance intervals.
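Here is a sketch of all three interval types on one sample. It assumes normally distributed data, n = 10, and 95% levels throughout. The data are hypothetical. The t and chi-square quantiles are hard-coded table values for 9 degrees of freedom so that only the standard library is needed. The tolerance factor uses Howe's approximation, swapped in here for exact tables.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical background concentrations (ppm), n = 10; invented for illustration.
data = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 4.6, 3.7, 4.0]
n = len(data)
xbar, s = mean(data), stdev(data)

# Table values for n - 1 = 9 degrees of freedom (ASSUMPTION: n = 10, 95% levels).
T_975_DF9 = 2.262157      # upper 2.5% point of Student's t, 9 df
CHI2_05_DF9 = 3.325113    # lower 5% point of chi-square, 9 df

# 1. Confidence interval for the population mean.
ci_half = T_975_DF9 * s / math.sqrt(n)

# 2. Prediction interval for one future observation.
pi_half = T_975_DF9 * s * math.sqrt(1 + 1 / n)

# 3. Tolerance interval for 95% of the population with 95% confidence,
#    using Howe's approximation to the two-sided normal tolerance factor k.
z = NormalDist().inv_cdf((1 + 0.95) / 2)
k = z * math.sqrt((n - 1) * (1 + 1 / n) / CHI2_05_DF9)
ti_half = k * s

for name, half in [("confidence", ci_half), ("prediction", pi_half), ("tolerance", ti_half)]:
    print(f"{name:>10}: {xbar - half:.2f} to {xbar + half:.2f}  (half-width {half:.2f})")
```

On the same data the half-widths differ by a factor of more than four between the confidence and tolerance intervals. The choice of interval type therefore matters.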

15
Differences Between the Interval Types

Statistical intervals have different uses depending on the purpose in mind.
Astronaut example:
An astronaut awaiting a tour of duty on the space shuttle is not concerned with what happens on average during such flights (confidence interval). Nor is he or she concerned with what happens on 95% of all flights (tolerance interval). The concern is what will happen on his or her specific flight (prediction interval).

Casino example:
A player is concerned with what he or she will win on the next few bets (prediction interval). The casino owners care about their average winnings in order to make a profit (confidence interval). The roulette operator, who makes a commission on each bet lost by a player, is concerned with the long-run proportion of lost bets (tolerance interval).
Remember, the type of interval used can make a huge difference in the resulting decision. In general, the widths of confidence, tolerance, and prediction intervals will be very different on the same sample data.

The width of the interval indicates the amount of potential error or variability associated with the sample average.
The width depends on three factors:
i. the estimated standard deviation of the sample data, s;
ii. the level of confidence chosen beforehand;
iii. the sample size, n.
To reduce the width of a statistical interval, either
i. increase the sample size, or
ii. lower the acceptable confidence level (e.g. from 95% to 90%).
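The effect of n and of the confidence level on width can be sketched with a large-sample (normal-approximation) confidence interval for the mean. The standard deviation s = 2.0 ppm is an assumed value. For small n the t-distribution gives somewhat wider intervals than this approximation.

```python
import math
from statistics import NormalDist

def ci_half_width(s: float, n: int, confidence: float) -> float:
    """Normal-approximation half-width of a two-sided CI for the mean: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * s / math.sqrt(n)

s = 2.0  # ASSUMED sample standard deviation (ppm)
for confidence in (0.95, 0.90):
    for n in (10, 40):
        print(f"{confidence:.0%} confidence, n = {n:>2}: +/- {ci_half_width(s, n, confidence):.2f} ppm")
```

Because the width scales with $1/\sqrt{n}$, quadrupling the sample size only halves the interval. Dropping from 95% to 90% confidence shrinks it by a smaller, fixed factor (about 1.645/1.960).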

18
Parametric, Non-parametric, Symmetric and
Asymmetric Intervals

Parametric:
assumes the data (or transformed data) are normally distributed.
Non-parametric:
distribution-free (based on ranks).
Symmetric:
the interval extends equally on either side of the estimate. Usually used with parametric intervals.
Asymmetric:
the interval does not extend equally on either side of the estimate. Used with skewed data (e.g. lognormal data).
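A non-parametric (rank-based) interval that also comes out asymmetric is the order-statistic confidence interval for the population median. It takes the r-th smallest and r-th largest values. Its coverage follows from a Binomial(n, 0.5) count and does not depend on the data's distribution. The sketch below uses hypothetical skewed data.

```python
from math import comb

def median_ci_ranks(sample: list[float], confidence: float = 0.95) -> tuple[float, float, float]:
    """Distribution-free CI for the population median from order statistics.

    The interval [x_(r), x_(n-r+1)] has coverage 1 - 2 * P(B <= r - 1),
    with B ~ Binomial(n, 0.5). Returns the narrowest such interval whose
    coverage is at least `confidence`, as (lower, upper, achieved_coverage).
    """
    x = sorted(sample)
    n = len(x)

    def binom_cdf(k: int) -> float:
        return sum(comb(n, i) for i in range(k + 1)) / 2**n

    best = None
    for r in range(1, n // 2 + 1):
        coverage = 1 - 2 * binom_cdf(r - 1)
        if coverage < confidence:
            break  # coverage only decreases as r grows
        best = (x[r - 1], x[n - r], coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence")
    return best

# Hypothetical, positively skewed concentrations (ppm).
data = [1.2, 0.8, 3.5, 1.0, 0.9, 7.8, 1.5, 2.2, 1.1, 4.0]
lo, hi, cov = median_ci_ranks(data)
print(f"median CI: {lo} to {hi} (coverage {cov:.3f})")
```

The sample median is 1.35, and the interval (0.9 to 4.0) extends much farther above it than below, following the skew of the data.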

19
Sampling Distributions

The probability distribution of a statistic (e.g. the mean, median, or standard deviation) over repeated samples. Sampling distributions are used to construct statistical intervals.
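A sampling distribution can be approximated by simulation: draw many samples, compute the statistic on each, and look at the spread. This sketch assumes samples of size 25 from a standard normal population. The number of repetitions and the seed are arbitrary.

```python
import random
from statistics import mean, median, stdev
from typing import Callable

def sampling_distribution(statistic: Callable[[list[float]], float], n: int, reps: int, seed: int = 7) -> list[float]:
    """Draw `reps` samples of size n from N(0, 1) and return the statistic for each."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]

n, reps = 25, 5000
means = sampling_distribution(mean, n, reps)
medians = sampling_distribution(median, n, reps)

print(f"SD of sample means:   {stdev(means):.3f}  (theory: sigma/sqrt(n) = {1 / n**0.5:.3f})")
print(f"SD of sample medians: {stdev(medians):.3f}")
```

The spread of the sample means matches $\sigma/\sqrt{n} = 0.2$. For normal data the sample median varies more than the sample mean, which is the price of its resistance.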

38

While the previous statement is true, environmental data are usually positively skewed. In these cases, use of the t-distribution may not be appropriate.
Sometimes the t-distribution can be used after a suitable transformation of the data, e.g. a log transform.
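Here is a sketch of the log-transform approach: compute a t-interval on the log scale, then back-transform. The data are hypothetical, and the t quantile is a hard-coded table value for n = 10. The back-transformed interval estimates the geometric mean (the median of a lognormal population), not the arithmetic mean. It is also asymmetric.

```python
import math
from statistics import mean, stdev

# Hypothetical positively skewed concentrations (ppm), invented for illustration.
conc = [0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.2, 3.5, 4.0, 7.8]
n = len(conc)
T_975_DF9 = 2.262157  # Student's t upper 2.5% point, 9 df (ASSUMES n = 10)

# Symmetric t-interval on the log scale.
logs = [math.log(c) for c in conc]
m, s = mean(logs), stdev(logs)
half = T_975_DF9 * s / math.sqrt(n)

# Back-transforming gives an asymmetric interval around the geometric mean.
lower, upper = math.exp(m - half), math.exp(m + half)
center = math.exp(m)

print(f"geometric mean: {center:.2f}")
print(f"95% interval:   {lower:.2f} to {upper:.2f}")
print(f"distance below: {center - lower:.2f}, above: {upper - center:.2f}")
```

The interval is symmetric on the log scale but stretches farther above the geometric mean than below it on the original scale, matching the positive skew.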
