Nonparametrics.Zip (a compressed version of nonparamtrics)
Tom Hettmansperger
Department of Statistics, Penn State University

1
  • References
  • Higgins (2004) An Introduction to Modern Nonparametric Statistics
  • Hollander and Wolfe (1999) Nonparametric Statistical Methods
  • Arnold Notes
  • Johnson, Morrell, and Schick (1992) Two-Sample
    Nonparametric Estimation and Confidence Intervals
    Under Truncation, Biometrics, 48, 1043-1056.
  • Website http://www.stat.wmich.edu/slab/RGLM/

2
Single Sample Methods
3
  • Robust Data Summaries
  • Graphical Displays
  • Inference: Confidence Intervals and Hypothesis
    Tests
  • Location, Spread, Shape
  • CI-Boxplots (notched boxplots)
  • Histograms, dotplots, kernel density estimates.

4
Absolute Magnitude, Planetary Nebulae, Milky Way
Abs Mag (n = 81)
17.537 15.845 15.449 12.710 15.499 16.450 14.695 14.878
15.350 12.909 12.873 13.278 15.591 14.550 16.078 15.438 14.741
7
But don't be too quick to accept normality
10
Null Hyp: Pop distribution, F(x), is normal
The Kolmogorov-Smirnov Statistic
The Anderson-Darling Statistic
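The two statistics named on this slide can be computed as below. This is a minimal sketch, assuming the normal's mean and standard deviation are estimated from the sample (the slide does not say). With estimated parameters the usual KS critical values no longer apply (the Lilliefors setting), so the sketch returns only the statistics, not p-values. `ks_and_ad_normal` is our own name.

```python
import math
import statistics

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_and_ad_normal(data):
    """Return (D, A2) comparing the sample with a fitted normal.

    Assumption: mean and stdev are estimated from the same sample.
    """
    x = sorted(data)
    n = len(x)
    mu = statistics.mean(x)
    sd = statistics.stdev(x)
    u = [normal_cdf((xi - mu) / sd) for xi in x]  # fitted F at each order statistic
    # Kolmogorov-Smirnov: largest vertical gap between the step ECDF and F
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    d = max(d_plus, d_minus)
    # Anderson-Darling: weighted squared gap that emphasizes the tails
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return d, a2
```

The Anderson-Darling weighting makes it more sensitive than KS to departures in the tails, which is where heavy-tailed data usually betray themselves.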
12
Anatomy of a 95% CI-Boxplot
  • Box formed by quartiles and median
  • IQR (interquartile range) = Q3 - Q1
  • Whiskers extend from the end of the box to the
    farthest point within 1.5xIQR.
  • For a normal benchmark distribution,
    IQR ≈ 1.348 Stdev and 1.5xIQR ≈ 2 Stdev.
  • Outliers beyond the whiskers are more than
    2.7 stdevs from the median. For a normal
    distribution this should happen about .7% of the
    time.
  • Pseudo Stdev = .75xIQR
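The box, whisker and outlier rules above can be sketched in Python as follows, on made-up numbers. `boxplot_summary` and `normal_outlier_rate` are our own names. Quartile conventions differ between packages, so Q1 and Q3 from Python's default `statistics.quantiles` method may not match other software exactly; the notch (the CI for the median) is not computed here. The second function checks the normal-benchmark figures (about 2.7 stdevs, about .7%).

```python
import statistics

def boxplot_summary(data):
    """Quartiles, IQR, whisker ends, outliers and pseudo-stdev for a boxplot."""
    x = sorted(data)
    q1, med, q3 = statistics.quantiles(x, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # whiskers stop at the most extreme observations still inside the fences
    lo_whisker = min(v for v in x if v >= lo_fence)
    hi_whisker = max(v for v in x if v <= hi_fence)
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    return {
        "Q1": q1, "median": med, "Q3": q3, "IQR": iqr,
        "whiskers": (lo_whisker, hi_whisker),
        "outliers": outliers,
        "pseudo_stdev": 0.75 * iqr,  # about IQR/1.348 for a normal population
    }

def normal_outlier_rate():
    """Fence distance (in sd) and outlier probability for a normal population."""
    z = statistics.NormalDist()
    q3 = z.inv_cdf(0.75)          # about 0.674, so IQR is about 1.349 sd
    fence = q3 + 1.5 * (2 * q3)   # about 2.698 sd from the median
    return fence, 2 * z.cdf(-fence)  # two-sided tail, about 0.007
```

For example, `boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])` flags 30 as the only outlier, with whiskers ending at 1 and 8.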

13
The confidence interval and hypothesis test
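The slides with the interval and test themselves were not transcribed. As a stand-in, here is a minimal sketch of the standard binomial construction for the sign test and the distribution-free confidence interval for the median, using exact binomial arithmetic (our own function names). The interval runs between two order statistics, and its actual coverage, at least the nominal level, is returned alongside it.

```python
import math

def binom_cdf(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2); returns 0 for k < 0."""
    return sum(math.comb(n, i) for i in range(0, k + 1)) / 2 ** n

def sign_test(data, theta0=0.0):
    """Two-sided sign test of H0: median = theta0 (values equal to theta0 dropped)."""
    x = [v for v in data if v != theta0]
    n = len(x)
    s = sum(v > theta0 for v in x)  # number of observations above theta0
    tail = min(binom_cdf(s, n), 1 - binom_cdf(s - 1, n))
    return s, min(1.0, 2 * tail)

def median_ci(data, conf=0.95):
    """Distribution-free CI [x_(k), x_(n-k+1)] for the median, coverage >= conf."""
    x = sorted(data)
    n = len(x)
    alpha = 1 - conf
    k = 0
    # largest k with P(B <= k-1) <= alpha/2, so coverage 1 - 2 P(B <= k-1) >= conf
    while k + 1 <= n // 2 and binom_cdf(k, n) <= alpha / 2:
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * binom_cdf(k - 1, n)
    return x[k - 1], x[n - k], coverage
```

With n = 10 the 95% request yields the interval from the 2nd to the 9th order statistic, with exact coverage 1002/1024 ≈ 97.9%; the discreteness of the binomial is why the confidence coefficient is rarely exactly 95%.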
19
Additional Remarks: The median is a robust
measure of location; it is not sensitive to
outliers. It is efficient when the population has
heavier tails than a normal population. The sign
test is also robust and insensitive to outliers,
and it is efficient when the tails are heavier than
those of a normal population. The same holds for
the confidence interval. In addition, the test
and the confidence interval are distribution free:
they do not depend on the shape of the underlying
population to determine critical values or
confidence coefficients. However, they are only 64%
efficient relative to the mean and t-test when
the population is normal. If the population is
symmetric, then the Wilcoxon Signed Rank statistic
can be used; it is robust against outliers
and 95% efficient relative to the t-test for a
normal population.
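For the symmetric case, a minimal sketch of the Wilcoxon signed-rank statistic: rank the absolute differences |x - θ0|, then sum the ranks belonging to positive differences. The p-value uses the large-sample normal approximation without a tie correction, which is a simplification; `signed_rank` is our own name.

```python
import math

def signed_rank(data, theta0=0.0):
    """Wilcoxon signed-rank T+ and an approximate two-sided p-value.

    Zeros are dropped; tied |differences| get average (mid) ranks.
    """
    d = [v - theta0 for v in data if v != theta0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # give each run of tied absolute values its average rank
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24  # null variance, no tie correction
    z = (t_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return t_plus, p
```

Unlike the sign test, which only uses which side of θ0 each point falls on, T+ also uses how far away it is (through the ranks), which is where the extra efficiency at the normal comes from.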
20
Two-Sample Methods
21
  • Two-Sample Comparisons
  • 85% CI-Boxplots
  • Mann-Whitney-Wilcoxon Rank Sum Statistic
  • Estimate of difference in locations
  • Test of difference in locations
  • Confidence Interval for difference in locations
  • Levene's Rank Statistic for differences in scale
    or variance.

28
Why 85% Confidence Intervals?
  • We have the following test of
  • Rule: reject the null hyp if the 85% confidence
    intervals do not overlap.
  • The significance level is close to 5% provided
    the ratio of sample sizes is less than 3.
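A rough check of the 85% figure, under assumptions the slide does not state: both location estimates are approximately normal with the same standard error σ_e (roughly equal sample sizes and spreads), and each interval is the estimate ± z σ_e.

```latex
\text{intervals do not overlap} \iff |\hat\theta_1-\hat\theta_2| > 2z\,\sigma_e,
\qquad
\text{5\% test rejects} \iff |\hat\theta_1-\hat\theta_2| > 1.96\sqrt{2}\,\sigma_e

2z = 1.96\sqrt{2} \;\Rightarrow\; z = \frac{1.96}{\sqrt{2}} \approx 1.386,
\qquad
2\Phi(1.386)-1 \approx 0.834
```

So intervals with confidence in the mid 80s give a test near the 5% level. When the sample sizes, and hence the standard errors, differ a lot, the non-overlap width z(σ_1 + σ_2) no longer tracks 1.96√(σ_1² + σ_2²), which is consistent with the slide's restriction on the ratio of sample sizes.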

29
Mann-Whitney-Wilcoxon Statistic: The sign
statistic on the pairwise differences.
Unlike the sign test (64% efficiency relative to
the t-test for a normal population), the MWW test
has 95.5% efficiency for a normal population. And
it is robust against outliers in either sample.
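The "sign statistic on the pairwise differences" description can be made concrete: form every difference y_j - x_i, count the positive ones, and take the median of the differences as the estimate of the shift d (the Hodges-Lehmann estimate, which we believe is what the Minitab "point estimate for d" reports). This is a brute-force sketch with our own function name; it forms all n·m differences, which is fine for samples of a few hundred.

```python
import statistics

def mww_pairwise(x, y):
    """MWW as a sign statistic on all pairwise differences y_j - x_i.

    Returns (count of positive differences, count of zero differences,
    median of the differences = Hodges-Lehmann estimate of the shift).
    """
    diffs = [yj - xi for xi in x for yj in y]
    s_plus = sum(dv > 0 for dv in diffs)   # the "sign statistic"
    ties = sum(dv == 0 for dv in diffs)
    shift_estimate = statistics.median(diffs)
    return s_plus, ties, shift_estimate
```

Under no shift, about half of the n·m differences should be positive; a count far from n·m/2 is evidence of a location difference.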
31
Mann-Whitney Test and CI: App Mag, Abs Mag
                  N    Median
App Mag (M-31)  360    14.540
Abs Mag (MW)     81   -10.557
Point estimate for d is 24.900
95.0 Percent CI for d is (24.530, 25.256)
W = 94140.0
Test of d = 0 vs d not equal 0 is significant at 0.0000
What is W?
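One way to answer the question: W here appears to be the rank sum of the first sample (App Mag) within the pooled sample of 360 + 81 = 441. Its range checks out against the output: with n = 360 and m = 81, W runs from 360·361/2 = 64980 to 64980 + 360·81 = 94140, so the reported W = 94140.0 is the maximum, meaning every App Mag value ranks above every Abs Mag value. A minimal sketch with our own function names:

```python
def rank_sum_w(first, second):
    """W: sum of the (mid)ranks of `first` within the pooled sample."""
    pooled = list(first) + list(second)

    def midrank(v):
        below = sum(u < v for u in pooled)
        equal = sum(u == v for u in pooled)
        return below + (equal + 1) / 2  # average rank for tied values

    return sum(midrank(v) for v in first)

def w_range(n, m):
    """Smallest and largest possible W for sizes n (first) and m (second), no ties."""
    smallest = n * (n + 1) / 2          # first sample holds ranks 1..n
    return smallest, smallest + n * m   # first sample holds ranks m+1..m+n
```

W minus n(n+1)/2 equals the number of pairs in which the first-sample value is larger (ties counting one half), which ties W back to the sign statistic on pairwise differences.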
33
What about spread or scale differences between
the two populations? Below we shift the MW
observations to the right by 24.9 to line up
with M-31.
Variable  StDev  IQR    PseudoStdev
MW        1.804  2.420  1.815
M-31      1.195  1.489  1.117
34
Levene's Rank Test
Compute |Y - Med(Y)| and |X - Med(X)|, called
absolute deviations. Apply MWW to the absolute
deviations (rank the absolute deviations). The
test rejects equal spreads in the two populations
when the difference in average ranks of the
absolute deviations is too large.
Idea: After we have centered the data, if the
null hypothesis of no difference in spreads is
true, all permutations of the combined data are
roughly equally likely (Permutation Principle). So
randomly select a large set of the permutations,
say B permutations. Assign the first n to the Y
sample and the remaining m to the X sample, and
compute MWW on the absolute deviations. The
approximate p-value is the number of permutations
with MWW > original MWW, divided by B.
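A minimal sketch of the permutation recipe above, with our own function names. Assumptions not fixed by the slide: the statistic is the difference in mean ranks of the absolute deviations (equivalent to MWW), the permuted centered values are not re-centered at new medians, and the p-value counts permutations whose |difference| is at least the observed one (the slide writes >).

```python
import random
import statistics

def midranks(values):
    """Ranks 1..N in input order, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_mean_diff(absdev, n):
    """Mean rank of the first n absolute deviations minus that of the rest."""
    r = midranks(absdev)
    return statistics.mean(r[:n]) - statistics.mean(r[n:])

def levene_rank_test(y, x, b=2000, seed=1):
    """Permutation p-value for equal spreads based on ranked absolute deviations."""
    # center each sample at its own median (the "centered data")
    my, mx = statistics.median(y), statistics.median(x)
    centered = [v - my for v in y] + [v - mx for v in x]
    n = len(y)
    observed = rank_mean_diff([abs(c) for c in centered], n)
    rng = random.Random(seed)
    count = 0
    for _ in range(b):
        rng.shuffle(centered)  # first n -> Y sample, remaining m -> X sample
        stat = rank_mean_diff([abs(c) for c in centered], n)
        if abs(stat) >= abs(observed):  # at least as extreme, either direction
            count += 1
    return observed, count / b
```

The seeded generator makes the approximate p-value reproducible; larger B reduces its Monte Carlo error.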
35
Difference of rank means of absolute deviations = 51.9793
So we easily reject the null hypothesis of no
difference in spreads and conclude that the two
populations have significantly different spreads.
36
One Sample Methods
k-Sample Methods
Two Sample Methods
38
Variable     Mean    StDev  Median  .75IQR  Skew   Kurtosis
Messier 31   22.685  0.969  23.028  1.069   -0.67  -0.67
Messier 81   24.298  0.274  24.371  0.336   -0.49  -0.68
NGC 3379     26.139  0.267  26.230  0.317   -0.64  -0.48
NGC 4494     26.654  0.225  26.659  0.252   -0.36  -0.55
NGC 4382     26.905  0.201  26.974  0.208   -1.06   1.08
All one-sample and two-sample methods can be
applied one at a time or two at a time. Plots,
summaries, inferences. We begin k-sample methods
by asking if the location differences between the
NGC nebulae are statistically significant. We
will briefly discuss issues of truncation.
40
Kruskal-Wallis Test on NGC
sub       N   Median  Ave Rank      Z
1        45    26.23      29.6  -9.39
2       101    26.66     104.5   0.36
3        59    26.97     156.4   8.19
Overall 205              103.0
KW = 116.70  DF = 2  P = 0.000
This test can be followed by multiple comparisons. For example, if we assign
a family error rate of .09, then we would conduct 3 MWW tests, each at a
level of .03 (Bonferroni).
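The KW statistic and the Z column can be reproduced from the group sizes and average ranks alone. A minimal sketch with our own function names; it omits the tie correction. Plugging in sizes 45, 101, 59 and average ranks 29.6, 104.5, 156.4 gives H ≈ 116.76 and Z ≈ -9.40, 0.36, 8.19, matching the printed output up to the rounding of the average ranks. (The group medians 26.23, 26.66, 26.97 match NGC 3379, NGC 4494 and NGC 4382 in the earlier table, which suggests that is the group coding.) The Bonferroni helper divides the family rate by the number of pairwise comparisons.

```python
import math

def kw_from_summary(sizes, avg_ranks):
    """Kruskal-Wallis H (no tie correction) and per-group z from summary data."""
    big_n = sum(sizes)
    center = (big_n + 1) / 2  # average rank of all N observations
    h = 12 / (big_n * (big_n + 1)) * sum(
        n_i * (r_i - center) ** 2 for n_i, r_i in zip(sizes, avg_ranks))
    # standardized average rank for each group, like the output's Z column
    z = [(r_i - center) / math.sqrt((big_n + 1) * (big_n / n_i - 1) / 12)
         for n_i, r_i in zip(sizes, avg_ranks)]
    return h, z

def kruskal_wallis(groups):
    """H and z computed from raw groups, using midranks in the pooled sample."""
    pooled = [v for g in groups for v in g]

    def midrank(v):
        below = sum(u < v for u in pooled)
        equal = sum(u == v for u in pooled)
        return below + (equal + 1) / 2

    sizes = [len(g) for g in groups]
    avg = [sum(midrank(v) for v in g) / len(g) for g in groups]
    return kw_from_summary(sizes, avg)

def bonferroni_level(family_rate, n_groups):
    """Per-comparison level when all pairs of n_groups groups are compared."""
    pairs = n_groups * (n_groups - 1) // 2
    return family_rate / pairs
```

With k = 3 groups there are 3 pairwise MWW comparisons, so a family rate of .09 gives .03 per test, as on the slide.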
41
  • What to do about truncation.
  • See a statistician.
  • Read the Johnson, Morrell, and Schick reference,
    and then
  • see a statistician.
  • Here is the problem: Suppose we want to estimate
    the difference in locations
    between two populations F(x) and G(y) = F(y - d).
  • But (with right truncation at a) the observations
    come from F(x)/F(a), x ≤ a, and G(y)/G(a) = F(y - d)/F(a - d), y ≤ a.

Suppose d > 0, so we want to shift the
X-sample to the right toward the truncation
point. As we shift the Xs, some will pass the
truncation point and will be eliminated from the
data set. This changes the sample sizes and
requires an adjustment when computing the
corresponding MWW to see whether it equals its
expectation. See the reference for details.
42
  • What more can we do?
  • Multiple regression
  • Analysis of designed experiments (AOV)
  • Analysis of covariance
  • Multivariate analysis

These analyses can be carried out using the
website http://www.stat.wmich.edu/slab/RGLM/
43
Professor Lundquist, in a seminar on compulsive
thinkers, illustrates his brain stapling
technique.
44
The End