Hypothesis testing using bootstrap resampling - PowerPoint PPT Presentation

Slides: 17
Provided by: brucek64
Transcript and Presenter's Notes

1
Hypothesis testing using bootstrap resampling
  • May 15, 2008
  • ESM 206C

2
What we've done so far
  • Used bootstrap resampling to understand the
    pattern of variability of the sample statistic if
    the population parameter was actually the value
    we estimated from our data
  • Used to construct confidence intervals, look for
    bias
  • What about hypothesis testing?

3
What if we bootstrap P values?
  • Resample the data, and calculate test statistic
    on this sample
  • E.g., F statistic for regression model
  • Calculate P value for that statistic
  • Call this P
  • Look at the distribution of P
  • What does this tell us?

4
What we want to do
  • P is the probability of observing data at least
    as extreme as ours if the null hypothesis is true
  • Find a way to simulate the process of resampling
    from a population for which H0 is true

5
Recall the TcCB problem
  • An industrial site has been found to be
    contaminated with a toxic chemical called TcCB,
    and the company responsible has performed a
    cleanup operation. The EPA has determined that
    concentrations in the soil above 6 parts per
    million (ppm) are unsafe
  • Your job is to determine whether the cleanup has
    been successful. The company has taken a set of
    soil samples from the site and sent them to an
    independent lab for analysis. The results (in
    ppm) are contained in the file tccb.xls

6
H0: µx = 6 ppm    HA: µx < 6 ppm

7
Conceptual approach
  • Create a population which has a mean of 6 ppm
    (the null hypothesis) but otherwise has all the
    characteristics of our sample
  • Do this by subtracting the quantity
    (sample mean - 6 ppm) from every observation
  • Bootstrap this dataset, and calculate bootstrap
    means
  • How often are bootstrap means more extreme than
    observed mean?
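
The conceptual approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name shifted_bootstrap_p and the data values are hypothetical (they are not the tccb.xls measurements), the quantity subtracted is taken to be (sample mean - 6), and "more extreme" is read as "at or below the observed mean" for the one-sided alternative.

```python
import random
from statistics import mean

def shifted_bootstrap_p(data, mu0, n_boot=1000, seed=0):
    """One-sided bootstrap P value for HA: population mean < mu0."""
    rng = random.Random(seed)
    observed = mean(data)
    # Shift every observation so the dataset has mean mu0 (H0 is true)
    # while keeping the spread and shape of the original sample.
    shifted = [x - (observed - mu0) for x in data]
    count = 0
    for _ in range(n_boot):
        # Resample with replacement from the shifted "null" dataset.
        resample = rng.choices(shifted, k=len(shifted))
        # Count bootstrap means at or below the observed mean.
        if mean(resample) <= observed:
            count += 1
    return count / n_boot

# Hypothetical concentrations in ppm, made up for illustration only.
sample = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 6.3, 7.9, 9.5, 12.0]
p = shifted_bootstrap_p(sample, mu0=6.0)
print(f"one-sided bootstrap P = {p:.3f}")
```

A large P here would mean that sample means as low as the observed one are common even when the true mean is 6 ppm.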

8
Bootstrap mean < data mean 219/1000 times, even
though the null hypothesis is true (µ = 6). P = 219/1000
≈ 0.22

9
For single-sample tests there's an easier way
  • Use bootstrap of original data to calculate 90%
    or 95% CI of mean
  • Two-tailed tests: if 95% CI includes µ0, P > 0.05
  • One-tailed tests: if sample mean is in same
    direction as HA, and 90% CI includes µ0, P > 0.05
    (if sample mean is in same direction as H0, P > 0.5)
  • TcCB: 90% BCa CI is (3.61, 13.01)
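
As a rough illustration of the confidence-interval shortcut, the sketch below computes a simple percentile bootstrap interval and checks whether it contains the null value. Note the swap: this is the plain percentile method, not the BCa interval quoted on the slide, so it will generally give somewhat different endpoints. The function name percentile_ci and the data are hypothetical.

```python
import random
from statistics import mean

def percentile_ci(data, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the mean (not BCa)."""
    rng = random.Random(seed)
    # Bootstrap means from resampling the ORIGINAL (unshifted) data.
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    low_index = round((alpha / 2) * n_boot)
    high_index = round((1 - alpha / 2) * n_boot) - 1
    return means[low_index], means[high_index]

# Hypothetical concentrations in ppm, made up for illustration only.
sample = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 6.3, 7.9, 9.5, 12.0]
mu0 = 6.0
ci_low, ci_high = percentile_ci(sample, level=0.90)
includes_mu0 = ci_low <= mu0 <= ci_high
print(f"90% CI = ({ci_low:.2f}, {ci_high:.2f}); includes mu0: {includes_mu0}")
```

For a one-tailed test with the sample mean in the direction of HA, an interval that includes µ0 corresponds to P > 0.05, as the slide states.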

10
Comparing means of two samples
  • t-test assuming equal variances
  • H0: µ1 = µ2, σ1 = σ2, both populations normal
  • The two samples come from identical populations
  • Any given observation could just as easily be
    from either population
  • Now we assume identical populations, but not
    necessarily normal
  • Permutation test

11
Permutation test
  • Create a new dataset that has all the original
    observations, but with assignments to groups 1
    and 2 randomized (permuted)
  • Each group has same sample size as in original
    data
  • Calculate difference in sample means of the two
    groups
  • Repeat many times, and compare distribution of
    resampled differences to difference in original
    data
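
The permutation steps above might look like this in Python. It is a minimal sketch, assuming a two-sided comparison of the difference in means; the function name permutation_test and the example data are hypothetical.

```python
import random
from statistics import mean

def permutation_test(group1, group2, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        # Randomize group assignments, keeping the original group sizes.
        rng.shuffle(pooled)
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        # Count permuted differences at least as large as the observed one.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical measurements for two groups, made up for illustration only.
group_a = [4.1, 5.3, 6.0, 6.8, 7.2, 8.5]
group_b = [2.0, 3.1, 3.3, 4.0, 4.6, 5.2]
obs_diff, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {obs_diff:.2f}, P = {p_value:.4f}")
```

Because every permuted dataset contains exactly the original observations, the only assumption is that the two populations are identical under H0, not that they are normal.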

12
P = 0.0183

13
Permutation test for regression
  • Null hypothesis: there is no relationship between
    x and y
  • If true, then any possible value of y can occur
    at any x
  • If we scramble the x's and y's, should get same
    result
  • For each bootstrap sample:
  • Take the full set of x's
  • To each x, assign a y at random (sampling w/o
    replacement)
  • Run regression
  • Calculate F
  • Look at distribution of F
  • Compare with observed value
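
A minimal sketch of the regression permutation test follows, assuming a simple linear regression with one predictor so that F can be computed directly from sums of squares. The function names f_statistic and permutation_f_test and the data are hypothetical (they are not the chlorophyll dataset).

```python
import random
from statistics import mean

def f_statistic(x, y):
    """F statistic for simple linear regression of y on x (1 and n-2 DF)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx   # sum of squares explained by the regression
    ss_res = syy - ss_reg     # residual sum of squares
    return ss_reg / (ss_res / (n - 2))

def permutation_f_test(x, y, n_perm=2000, seed=0):
    """P value: share of permuted F values at least as large as observed."""
    rng = random.Random(seed)
    observed = f_statistic(x, y)
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        # Keep the full set of x's; assign the y's at random
        # (sampling without replacement, i.e. a shuffle).
        rng.shuffle(y_perm)
        if f_statistic(x, y_perm) >= observed:
            count += 1
    return observed, count / n_perm

# Hypothetical predictor and response values, made up for illustration only.
xs = [10, 25, 40, 60, 85, 110, 140, 170]
ys = [12.0, 20.5, 19.0, 33.2, 35.1, 48.7, 50.3, 66.0]
f_obs, p_perm = permutation_f_test(xs, ys)
print(f"observed F = {f_obs:.2f}, permutation P = {p_perm:.4f}")
```

The sketch does not guard against a perfect fit, where the residual sum of squares is zero and F is undefined.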

14
Bootstrapping the chlorophyll regression
15
Bootstrapping a regression
Call:
lm(formula = Chlorophyll.a ~ Phosphorus, data = chlor)

Residuals:
    Min      1Q  Median      3Q     Max
-36.148 -13.901  -5.022   5.254  61.037

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.34093    6.72380   1.687    0.105
Phosphorus   0.30241    0.03512   8.610 1.19e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.86 on 23 degrees of freedom
Multiple R-Squared: 0.7632, Adjusted R-squared: 0.7529
F-statistic: 74.13 on 1 and 23 DF, p-value: 1.189e-08

16
Permutation tests assume homoskedasticity
  • Residuals assumed to be random draws from the
    same distribution, regardless of x
  • Solution involves bootstrapping residuals, but
    this is not a generic process
  • Consult a real statistician