Hypothesis testing using bootstrap resampling

Provided by: brucek64

Transcript and Presenter's Notes

1
Hypothesis testing using bootstrap resampling

May 15, 2008
ESM 206C

2
What we've done so far

Used bootstrap resampling to understand the
pattern of variability of the sample statistic if
the population parameter was actually the value
we estimated from our data
Used to construct confidence intervals, look for
bias
What about hypothesis testing?

3
What if we bootstrap P values?

Resample the data, and calculate test statistic
on this sample
E.g., F statistic for regression model
Calculate P value for that statistic
Call this P*
Look at the distribution of P*
What does this tell us?

4
What we want to do

P is the probability of observing data at least as
extreme as ours if the null hypothesis is true
Find a way to simulate the process of resampling
from a population for which H0 is true

5
Recall the TcCB problem

An industrial site has been found to be
contaminated with a toxic chemical called TcCB,
and the company responsible has performed a
cleanup operation. The EPA has determined that
concentrations in the soil above 6 parts per
million (ppm) are unsafe
Your job is to determine whether the cleanup has
been successful. The company has taken a set of
soil samples from the site and sent them to an
independent lab for analysis. The results (in
ppm) are contained in the file tccb.xls

6
H0: µx = 6 ppm; HA: µx < 6 ppm
7
Conceptual approach

Create a population which has a mean of 6 ppm
(the null hypothesis) but otherwise has all the
characteristics of our sample
Do this by subtracting the quantity (x̄ − 6)
from every observation, where x̄ is the sample mean
Bootstrap this dataset, and calculate bootstrap
means
How often are bootstrap means more extreme than
observed mean?
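
The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `null_shift_bootstrap_p` and the concentrations in `sample_ppm` are hypothetical (they are not the values in tccb.xls), and the test is one-sided to match HA: mean < 6 ppm.

```python
import random
from statistics import mean

def null_shift_bootstrap_p(data, mu0, n_boot=1000, seed=0):
    """One-sided bootstrap P value for HA: population mean < mu0."""
    rng = random.Random(seed)
    observed = mean(data)
    # Subtract (sample mean - mu0) so the shifted data have mean mu0,
    # i.e. the null hypothesis is true for this artificial population.
    shifted = [x - (observed - mu0) for x in data]
    n = len(shifted)
    count = 0
    for _ in range(n_boot):
        resample = rng.choices(shifted, k=n)  # sampling with replacement
        # Count bootstrap means at least as low as the observed mean.
        if mean(resample) <= observed:
            count += 1
    return count / n_boot

# Hypothetical concentrations in ppm (assumption, not the real data).
sample_ppm = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 7.5, 9.8, 12.0, 9.5]
print(null_shift_bootstrap_p(sample_ppm, mu0=6.0))
```

The returned proportion plays the same role as the 219/1000 figure on the next slide: the fraction of bootstrap means, drawn from a population where H0 holds, that fall at or below the observed mean.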

8
Bootstrap mean < data mean 219/1000 times, even
though null hypothesis is true (µ = 6): P = 219/1000
≈ 0.22
9
For single-sample tests there's an easier way

Use bootstrap of original data to calculate 90%
or 95% CI of mean
Two-tailed tests: if 95% CI includes µ0, P > 0.05
One-tailed tests: if sample mean is in same
direction as HA, and 90% CI includes µ0, P > 0.05
(if sample mean in same direction as H0, P > 0.5)
TcCB 90% BCa CI is (3.61, 13.01)
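
As a rough sketch of the CI shortcut, the Python below builds a simple percentile bootstrap interval and checks whether it includes the null value. Note the substitution: the slide reports a BCa interval, which adds bias and acceleration corrections that this sketch does not implement. The names `percentile_ci` and `ci_includes` and the data in `sample_ppm` are hypothetical.

```python
import random
from statistics import mean

def percentile_ci(data, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the mean (a simpler stand-in for BCa)."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

def ci_includes(ci, mu0):
    """True if the null value mu0 lies inside the interval."""
    lower, upper = ci
    return lower <= mu0 <= upper

# The 90% interval reported on the slide includes 6 ppm, so for the
# one-tailed test the slide's rule gives P > 0.05.
print(ci_includes((3.61, 13.01), 6.0))

# Hypothetical concentrations in ppm (assumption, not the real data).
sample_ppm = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 7.5, 9.8, 12.0, 9.5]
print(percentile_ci(sample_ppm, level=0.90))
```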

10
Comparing means of two samples

t-test assuming equal variances
H0: µ1 = µ2, σ1 = σ2, both populations normal
The two samples come from identical populations
Any given observation could just as easily be
from either population
Now we assume identical populations, but not
necessarily normal
Permutation test

11
Permutation test

Create a new dataset that has all the original
observations, but with assignments to groups 1
and 2 randomized (permuted)
Each group has same sample size as in original
data
Calculate difference in sample means of the two
groups
Repeat many times, and compare distribution of
resampled differences to difference in original
data
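
A minimal Python sketch of this procedure follows. It assumes a two-tailed comparison (the slide does not say which tail is used), and the function name `permutation_test_means` and the two groups of numbers are hypothetical.

```python
import random
from statistics import mean

def permutation_test_means(group1, group2, n_perm=5000, seed=0):
    """Two-tailed permutation test for a difference in sample means."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)  # each permuted group keeps its original size
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomize group assignments
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical measurements for two groups (assumption).
group_a = [4.1, 5.3, 6.0, 5.8, 4.9, 6.4]
group_b = [6.9, 7.2, 5.9, 8.1, 7.5, 6.6]
diff, p_value = permutation_test_means(group_a, group_b)
print(diff, p_value)
```

The P value is the share of permuted datasets whose difference in means is at least as large, in absolute value, as the difference in the original data.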

12
P = 0.0183
13
Permutation test for regression

Null hypothesis: there is no relationship between
x and y
If true, then any observed value of y could just as
easily occur at any x
If we scramble the pairing of x's and y's, we should
get a similar result

For each permutation sample
Take the full set of x's
To each x, assign a y at random (sampling w/o
replacement)
Run regression
Calculate F
Look at distribution of F
Compare with observed value
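
The loop above might look like the following in Python, using only the standard library. This is a sketch for simple (one-predictor) regression; the helper names `f_statistic` and `permutation_test_regression` and the x and y values are hypothetical, and the code assumes the fit is never perfect (a zero residual sum of squares would cause a division by zero).

```python
import random
from statistics import mean

def f_statistic(x, y):
    """F statistic for simple linear regression of y on x (1 and n-2 DF)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = (sxy / sxx) * sxy      # regression sum of squares
    ss_res = syy - ss_reg           # residual sum of squares
    return ss_reg / (ss_res / (n - 2))

def permutation_test_regression(x, y, n_perm=2000, seed=0):
    """Share of permuted datasets with F at least as large as observed."""
    rng = random.Random(seed)
    observed = f_statistic(x, y)
    count = 0
    for _ in range(n_perm):
        y_perm = rng.sample(y, k=len(y))  # assign y's w/o replacement
        if f_statistic(x, y_perm) >= observed:
            count += 1
    return observed, count / n_perm

# Hypothetical data (assumption, not the chlorophyll dataset).
x_vals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y_vals = [12, 20, 18, 30, 24, 35, 33, 41, 38, 47]
print(permutation_test_regression(x_vals, y_vals))
```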

14
Bootstrapping the chlorophyll regression
15
Bootstrapping a regression

Call:
lm(formula = Chlorophyll.a ~ Phosphorus, data = chlor)

Residuals:
    Min      1Q  Median      3Q     Max
-36.148 -13.901  -5.022   5.254  61.037

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.34093    6.72380   1.687    0.105
Phosphorus   0.30241    0.03512   8.610 1.19e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.86 on 23 degrees of freedom
Multiple R-Squared: 0.7632, Adjusted R-squared: 0.7529
F-statistic: 74.13 on 1 and 23 DF, p-value: 1.189e-08
16
Permutation tests assume homoskedasticity

Residuals assumed to be random draws from the
same distribution, regardless of x
Solution involves bootstrapping residuals, but
this is not a generic process
Consult a real statistician
