Ch.14 Nonparametric Statistical Method - PowerPoint PPT Presentation

Slides: 136
Provided by: youn2
Transcript and Presenter's Notes



1
Ch.14 Nonparametric Statistical Method
2
  • Ahmad Yusuf
  • Yoojung Chung
  • Malek Deib
  • Mihir Shah
  • Jung Yeon Lee
  • Hyun Yoon
  • Nadia Saleh
  • Se Hyun Ji
  • Chan Min Park
  • Kyunghyun Ma
  • Mi Jeong Kim
  • Wonchang Chio
  • Tarique Jawed
  • Hyun Keun Cho

3
What is NONPARAMETRIC Statistics?
  • Normality doesn't hold for all data.
  • Similarly, some data may not follow any particular
    fixed distribution such as the binomial or Poisson.
  • Such sets of data are called nonparametric or
    distribution-free.
  • We use nonparametric tests for these populations.

4
When do we use NONPARAMETRIC Statistics?
  • The population distribution is highly skewed or
    very heavy-tailed.
  • The median is then a better measure of the center
    than the mean.
  • The sample size is small (usually less than 30)
    and the data are not normal
  • (we can check that using SAS or other
    statistical programs).

(Figure: a skewed distribution with the mean and the median marked.)
5
14.1.1 Sign Test and Confidence Interval
Sign test for a single sample: we want to
test, at a significance level α, whether
the true median is above a certain known value µ0.
6
14.1.1 Sign Test and Confidence Interval
Example (THERMOSTAT DATA): Perform the
sign test to determine if the median
setting is different from the design setting of 200°F.
202.2 203.4
200.5 202.5
206.3 198.0
203.7 200.8
201.3 199.0
7
14.1.1 Sign Test and Confidence Interval
STEP 1: We find the sign of each sample value by
comparing it with 200.
STEP 2:
202.2 > 200   203.4 > 200
200.5 > 200   202.5 > 200
206.3 > 200   198.0 < 200
203.7 > 200   200.8 > 200
201.3 > 200   199.0 < 200
This gives s+ = 8 and s− = 2.
8
14.1.1 Sign Test and Confidence Interval

What do we do if there is a tie? Observations exactly
equal to 200 carry no sign, so they are dropped and
n is reduced accordingly.
9
14.1.1 Sign Test and Confidence Interval
STEP 3: Why binomial? S+ and S− count the only two
possible outcomes (+ or −) among the n observations,
and under H0 each sign occurs independently with
probability ½.
10
14.1.1 Sign Test and Confidence Interval
STEP 4: Since S+ and S− both have the same binomial
distribution, we can denote a common r.v. S, where
S ~ Bin(n, ½).
11
14.1.1 Sign Test and Confidence Interval
Now we can calculate the p-value using the
binomial distribution; alternatively, a normal
approximation to the binomial can be used.
12
14.1.1 Sign Test and Confidence Interval
STEP 5: We compare our p-value with the
significance level α.
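The five steps can be checked with a short stdlib-Python sketch (not part of the original slides; the readings and the null median 200 come from the thermostat example):

```python
from math import comb

readings = [202.2, 203.4, 200.5, 202.5, 206.3, 198.0,
            203.7, 200.8, 201.3, 199.0]
mu0 = 200.0

# Steps 1-2: count positive and negative signs (ties would be dropped)
s_plus = sum(1 for x in readings if x > mu0)
s_minus = sum(1 for x in readings if x < mu0)
n = s_plus + s_minus

# Steps 3-5: under H0, S ~ Bin(n, 1/2); two-sided p-value
tail = sum(comb(n, k) for k in range(max(s_plus, s_minus), n + 1)) / 2 ** n
p_value = 2 * tail
print(s_plus, s_minus, round(p_value, 4))  # 8 2 0.1094
```

With a p-value of about 0.11, the sign test does not reject the design median of 200°F at α = .05.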
13
14.1.1 Sign Test and Confidence Interval
For large samples, S is approximately normal with mean n/2 and variance n/4, so the test can be based on z = (s − 0.5 − n/2)/(√n/2), using a continuity correction.
14
14.1.1 Sign Test for matched pairs
Sign test for matched pairs: when
observations are matched, then
- S+ = # of positive differences
- S− = # of negative differences
Note: the magnitude of the
differences is not used.
When pairs are matched, P(A > B) = P(B > A) under H0.
15
14.1.1 Sign Test for matched pairs
No. Method A Method B Difference | No. Method A Method B Difference
i xi yi di | i xi yi di
1 6.3 5.2 1.1 14 7.7 7.4 0.3
2 6.3 6.6 -0.3 15 7.4 7.4 0
3 3.5 2.3 1.2 16 5.6 4.9 0.7
4 5.1 4.4 0.7 17 6.3 5.4 0.9
5 5.5 4.1 1.4 18 8.4 8.4 0
6 7.7 6.4 1.3 19 5.6 5.1 0.5
7 6.3 5.7 0.6 20 4.8 4.4 0.4
8 2.8 2.3 0.5 21 4.3 4.3 0
9 3.4 3.2 0.2 22 4.2 4.1 0.1
10 5.7 5.2 0.5 23 3.3 2.2 1.1
11 5.6 4.9 0.7 24 3.8 4 -0.2
12 6.2 6.1 0.1 25 5.7 5.8 -0.1
13 6.6 6.3 0.3 26 4.1 4 0.1
16
14.1.1 Sign Test for matched pairs
Note that for the matched-pairs test all tied
entries (xi = yi) are disregarded. Then n = 23,
since xi = yi for i = 15, 18, 21.
S+ = 20
S− = 3
Using the large-sample normal approximation with continuity correction:
17
14.1.1 Sign Test for matched pairs
Two-sided p-value =
2(1 − Φ(3.336)) = 0.0008. This indicates a
significant difference between Method A
and Method B.
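That normal approximation can be verified numerically (a stdlib-Python sketch using the counts above; erf supplies the normal c.d.f.):

```python
from math import sqrt, erf

# Matched-pairs sign test counts from the example (ties dropped, n = 23)
s_plus, n = 20, 23

# Under H0, S+ ~ Bin(n, 1/2): mean n/2, SD sqrt(n)/2.
# z uses a continuity correction of 0.5.
z = (s_plus - 0.5 - n / 2) / (sqrt(n) / 2)

def phi(x):  # standard normal c.d.f.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

p_two_sided = 2 * (1 - phi(z))
print(round(z, 3), round(p_two_sided, 4))  # 3.336 0.0008
```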
18
14.1.2 Wilcoxon Signed Rank Test
Who is Frank Wilcoxon? Born September 2, 1892,
to American parents in County Cork, Ireland,
Frank Wilcoxon grew up in Catskill, New York,
although he received part of his education in
England. In 1917 he graduated
from Pennsylvania Military College with a B.S. He
then received his M.S. in Chemistry in 1921 from
Rutgers University. In 1924 he received his PhD
from Cornell University in Physical Chemistry.
In 1945 he published a paper containing two
tests he is most remembered for, the Wilcoxon
signed-rank test and the Wilcoxon rank-sum test.
His interest in statistics can be accredited to
R.A. Fisher's text, Statistical Methods for
Research Workers (1925). Over the
course of his career Wilcoxon published 70 papers.
19
14.1.2 Wilcoxon Signed Rank Test
Who is Frank Wilcoxon?
Born to be wild! And a statistician.
20
14.1.2 Wilcoxon Signed Rank Test
An alternative method to the sign test: the
Wilcoxon signed rank test improves on the sign
test. Unlike the sign test, the Wilcoxon signed
rank test looks not only at whether xi > µ0 or
xi < µ0 but also at the magnitude of the
difference di = xi − µ0.
21
14.1.2 Wilcoxon Signed Rank Test
Note: the Wilcoxon signed rank test
assumes that the observed population
distribution is symmetric. (Symmetry
is not required for the sign test.)
22
14.1.2 Wilcoxon Signed Rank Test
Step 1: Rank-order the differences di in terms
of their absolute values.
Step 2: w+ = sum of the ranks ri of the positive
differences; w− = sum of the ranks ri of the
negative differences. If we assume no ties, then
w+ + w− = r1 + r2 + … + rn = 1 + 2 + 3 + … + n = n(n + 1)/2.

23
14.1.2 Wilcoxon Signed Rank Test
Step 3: Reject H0 if w+ is large or, equivalently,
if w− is small!!
24
14.1.2 Wilcoxon Signed Rank Test
The size of w+ or w− needed to reject H0 at
level α is determined using the distributions of the
corresponding r.v.'s W+ and W− when H0 is true.
Since the null distributions are identical and
symmetric, the common r.v. is denoted by W.
p-value = P(W ≥ w+) = P(W ≤ w−).
Reject H0 if the p-value is ≤ α.
25
14.1.2 Wilcoxon Signed Rank Test
Zi ~ Bernoulli(p), p = P(xi > µ0),
p = 1/2 under H0;
Zi = 1 if the ith rank corresponds to a positive sign,
0 if the ith rank corresponds to a negative sign.
E(W+) = E(Σ i·Zi)
= E(1·Z1 + 2·Z2 + … + n·Zn)
= 1·E(Z1) + 2·E(Z2) + … + n·E(Zn); since E(Z1) = E(Z2) = … = E(Zn),
= (1 + 2 + 3 + … + n)·E(Z1)
= n(n + 1)/2 · ½ = n(n + 1)/4.
26
14.1.2 Wilcoxon Signed Rank Test
Var(W+) = Var(Σ i·Zi)
= Var(1·Z1 + 2·Z2 + … + n·Zn)
= Var(1·Z1) + Var(2·Z2) + … + Var(n·Zn)   (independence)
= 1²·Var(Z1) + 2²·Var(Z2) + … + n²·Var(Zn); since the Var(Zi) are equal,
= (1² + 2² + … + n²)·Var(Z1)
= n(n + 1)(2n + 1)/6 · ¼ = n(n + 1)(2n + 1)/24.
27
14.1.2 Wilcoxon Signed Rank Test
Then a z-test is based on the statistic
z = (w+ − E(W+)) / √Var(W+).
H0: µ = µ0 vs. Ha: µ > µ0: reject H0 if z ≥ zα.

28
14.1.2 Wilcoxon Signed Rank Test
H0: µ = µ0 vs. Ha: µ < µ0:
reject H0 if z ≤ −zα.
H0: µ = µ0 vs. Ha: µ ≠ µ0:
reject H0 if
(1) z ≥ zα/2 or (2) z ≤ −zα/2.
The two-sided p-value is 2P(W ≥ wmax) = 2P(W ≤ wmin),
where wmax = max(w+, w−) and wmin = min(w+, w−).
29
14.1.2 Summary
Signed Rank Test vs. Sign Test:
The sign test counts the number of positive and
negative differences; the signed rank test weighs
each signed difference by its rank.
If the positive differences are greater in
magnitude than the negative differences, they get
higher ranks, resulting in a larger value of w+.
This improves the power of the signed rank
test, but it also affects the type I
error if the population distribution is NOT
symmetric.
30
YOU WOULDN'T WANT THIS TO HAPPEN!
31
14.1.2 Summary
Sign Rank Test VS Sign Test Preferred Test
And the winner is
32
14.1.2 Summary

I pity the Fu that messes with the Wilcoxon
Signed Rank Test !!!
33
14.1.2 Wilcoxon Signed Rank Test

No. Method A Method B Difference Rank | No. Method A Method B Difference Rank
i xi yi di ri | i xi yi di ri
1 6.3 5.2 1.1 19.5 | 14 7.7 7.4 0.3 8
2 6.3 6.6 -0.3 8 | 15 7.4 7.4 0 -
3 3.5 2.3 1.2 21 | 16 5.6 4.9 0.7 16
4 5.1 4.4 0.7 16 | 17 6.3 5.4 0.9 18
5 5.5 4.1 1.4 23 | 18 8.4 8.4 0 -
6 7.7 6.4 1.3 22 | 19 5.6 5.1 0.5 12
7 6.3 5.7 0.6 14 | 20 4.8 4.4 0.4 10
8 2.8 2.3 0.5 12 | 21 4.3 4.3 0 -
9 3.4 3.2 0.2 5.5 | 22 4.2 4.1 0.1 2.5
10 5.7 5.2 0.5 12 | 23 3.3 2.2 1.1 19.5
11 5.6 4.9 0.7 16 | 24 3.8 4 -0.2 5.5
12 6.2 6.1 0.1 2.5 | 25 5.7 5.8 -0.1 2.5
13 6.6 6.3 0.3 8 | 26 4.1 4 0.1 2.5
34
14.1.2 Wilcoxon Signed Rank Test

w− = 8 + 5.5 + 2.5 = 16, so w+ = n(n + 1)/2 − w− = 276 − 16 = 260.
z = (260 − 0.5 − 138)/√1081 = 3.695.
Two-sided p-value = 2(1 − Φ(3.695)) =
0.0002
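The arithmetic behind z = 3.695 (a stdlib-Python sketch using n = 23 and w− = 16 from the example):

```python
from math import sqrt

# Signed rank sums from the example: n = 23 nonzero differences, w- = 16
n, w_minus = 23, 16
w_plus = n * (n + 1) // 2 - w_minus          # 276 - 16 = 260

# Null mean and variance of W+ (the small tie correction is ignored)
mean_w = n * (n + 1) / 4                     # 138
var_w = n * (n + 1) * (2 * n + 1) / 24       # 1081

# z with continuity correction, as on the slide
z = (w_plus - 0.5 - mean_w) / sqrt(var_w)
print(w_plus, round(z, 3))  # 260 3.695
```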
35
14.1.2 Wilcoxon Signed Rank Test
  • If di = 0 then the observations are dropped and
    only the nonzero differences are retained.
  • When di's are tied for the same rank, a new
    rank is assigned to them, called the midrank.

36
14.1.2 Wilcoxon Signed Rank Test

No. A B Diff R | No. A B Diff R
i xi yi di ri | i xi yi di ri
15 7.4 7.4 0 - | 8 2.8 2.3 0.5 12
18 8.4 8.4 0 - | 10 5.7 5.2 0.5 12
21 4.3 4.3 0 - | 19 5.6 5.1 0.5 12
12 6.2 6.1 0.1 2.5 | 7 6.3 5.7 0.6 14
22 4.2 4.1 0.1 2.5 | 4 5.1 4.4 0.7 16
25 5.7 5.8 -0.1 2.5 | 11 5.6 4.9 0.7 16
26 4.1 4 0.1 2.5 | 16 5.6 4.9 0.7 16
9 3.4 3.2 0.2 5.5 | 17 6.3 5.4 0.9 18
24 3.8 4 -0.2 5.5 | 1 6.3 5.2 1.1 19.5
2 6.3 6.6 -0.3 8 | 23 3.3 2.2 1.1 19.5
13 6.6 6.3 0.3 8 | 3 3.5 2.3 1.2 21
14 7.7 7.4 0.3 8 | 6 7.7 6.4 1.3 22
20 4.8 4.4 0.4 10 | 5 5.5 4.1 1.4 23
37
14.1.2 Wilcoxon Signed Rank Test

In the new table we see that for
i = 12, 22, 25, 26 we have di = 0.1. These four
differences are tied, so rather than the ranks
1, 2, 3, 4 they each receive the midrank
(1 + 2 + 3 + 4)/4 = 2.5.
38
14.2 Inferences for independent samples
1. Wilcoxon rank sum test
  • Assumption: there are no ties in the two samples.
  • Hypothesis: H0: F1 = F2 vs. the alternative that
    one population tends to produce larger values.
  • Step 1: Rank all N = n1 + n2 observations together.
  • Step 2: Sum the ranks of the two samples
    separately (w1 = sum of the ranks of the x's,
    w2 = sum of the ranks of the y's).
  • Step 3: Reject the null hypothesis if w1 is large
    or if w2 is small.
  • Problem: the distributions of W1 and W2 are not the
    same when n1 ≠ n2.
39
14.2.1 Wilcoxon-Mann-Whitney Test
1. Mann-Whitney test
  • Step 1: Compare each xi with each yj
  • (u1 = # of pairs in which xi > yj, u2 = # of pairs
    in which xi < yj)
  • Step 2: Reject H0 if u1 is large or u2 is small
  • Relation to the rank sum statistic:
    u1 = w1 − n1(n1 + 1)/2
  • P-value: P(U1 ≥ u1) under H0
  • For large samples, we approximate U1 by a normal
    distribution
  • Rejection rule: if z ≥ zα,
    then reject H0
40
14.2.1 Wilcoxon-Mann-Whitney Test
Example: Failure Times of Capacitors
Table 1: Times to Failure
  • F1 = c.d.f. of the control group and F2 =
    c.d.f. of the stressed group
  • Test statistics: w1 = 95, w2 = 76, u1 = 59, u2 = 21
  • P-value = .051 from Table A.11
  • Compare with the large-sample normal approximation:
  • P-value = .052

Control Group | Stressed Group
5.2 17.1 | 1.1 7.2
8.5 17.9 | 2.3 9.1
9.8 23.7 | 3.2 15.2
12.3 29.8 | 6.3 18.3
         | 7.0 21.1
Table 2: Ranks of Times to Failure
Control Group | Stressed Group
4 13 | 1 7
8 14 | 2 9
10 17 | 3 12
11 18 | 5 15
     | 6 16
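Both the rank-sum and Mann-Whitney statistics for this example can be reproduced in a few lines (stdlib-Python sketch; there are no ties, so plain ranks suffice):

```python
# Capacitor failure times from Table 1 (control vs. thermally stressed)
control = [5.2, 17.1, 8.5, 17.9, 9.8, 23.7, 12.3, 29.8]
stressed = [1.1, 7.2, 2.3, 9.1, 3.2, 15.2, 6.3, 18.3, 7.0, 21.1]

# Rank all N = n1 + n2 observations together (no ties in this data set)
combined = sorted(control + stressed)
rank = {v: i + 1 for i, v in enumerate(combined)}

w1 = sum(rank[v] for v in control)    # rank sum of the control group
w2 = sum(rank[v] for v in stressed)
u1 = w1 - len(control) * (len(control) + 1) // 2    # Mann-Whitney u1
u2 = w2 - len(stressed) * (len(stressed) + 1) // 2  # Mann-Whitney u2
print(w1, w2, u1, u2)  # 95 76 59 21
```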
41
14.2.1 Wilcoxon-Mann-Whitney Test
Null Distribution of the Wilcoxon-Mann-Whitney
Test Statistic
Assumption: under H0, all N = n1 + n2 observations
come from the common distribution
F1 = F2.
All possible orderings of these
observations, with n1 coming
from F1 and n2 coming from
F2, are equally likely.
42
14.2.1 Wilcoxon-Mann-Whitney Test
Example: Find the null distribution of W1
and U1 when n1 = 2 and n2 = 2.
Ranks 1 2 3 4 → w1, u1:
x x y y → 3, 0
x y x y → 4, 1
x y y x → 5, 2
y x x y → 5, 2
y x y x → 6, 3
y y x x → 7, 4
Null distribution of W1 and U1:
w1 u1 p
3 0 1/6
4 1 1/6
5 2 2/6
6 3 1/6
7 4 1/6
43
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval

44
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval

45
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
  • Example
  • Find a 95% CI for the difference between the
    median failure times of the control group and the
    thermally stressed group of capacitors, using the
    data from Example 14.7.

46
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
Table A.11 (pg. 684)
n1  n2  u1 = upper critical point (80 − u1 = lower critical point)  P(W > w1) upper-tail probability
8   10  59 (80 − 59 = 21)  0.051
8   10  62 (80 − 62 = 18)  0.027
8   10  63 (80 − 63 = 17)  0.022
8   10  66 (80 − 66 = 14)  0.010
8   10  68 (80 − 68 = 12)  0.006
47
14.3 Inferences for Several Independent Samples
One-way layout experiment:
The data are classified according to the level of a
single treatment factor.
Completely Randomized Design
  • Comparing a > 2 treatments.
  • The available experimental units are randomly
    assigned to each treatment.
  • The number of experimental units in the different
    treatment groups does not have to be the same.

48
14.3 Inferences for Several Independent Samples
Examples of one-way layout experiments:
  • Comparing the effectiveness of different pills on
    migraine.
  • Comparing the durability of different tires.
  • etc.
Treatment:      1                2                ...  a
Observations:   x11 x12 ... x1n1 | x21 x22 ... x2n2 | ... | xa1 xa2 ... xana
Sample medians: T1               T2               ...  Ta
Sample SDs:     s1               s2               ...  sa
49
14.3 Inferences for Several Independent Samples
Assumptions:
1. The data on each treatment form a random sample
   from a continuous c.d.f. Fi.
2. The random samples are independent.
3. Fi(y) = F(y − Ti),
   where Ti is the location parameter of Fi;
   Ti = median of Fi.
(Figure: the c.d.f.'s F1, F2, …, Fa as shifted copies of a common F with locations T1, T2, …, Ta.)
50
14.3 Inferences for Several Independent Samples
Hypotheses: H0: F1 = F2 = … = Fa vs.
H1: Fi < Fj for some i ≠ j. This can be
restated as H0: T1 = T2 = … = Ta vs.
H1: Ti > Tj for some i ≠ j.
Can we say that all the Fi's are the same?
51
14.3.1 Kruskal Wallis Test
STEP 1: Rank all N = Σ ni observations in ascending
order, assigning midranks in case of ties. Let
rij = rank(yij). The ranks sum to
Σ rij = 1 + 2 + … + N = N(N + 1)/2,
so the expected rank is E(r) = (N + 1)/2.
STEP 2: Calculate the rank sums ri = Σj rij and the
averages r̄i = ri / ni, i = 1, 2, …, a.
52
14.3.1 Kruskal Wallis Test
STEP 3: Calculate the Kruskal-Wallis test statistic
kw = [12/(N(N + 1))] Σ ni (r̄i − (N + 1)/2)²
   = [12/(N(N + 1))] Σ ri²/ni − 3(N + 1).
STEP 4: Reject H0 for large values of kw.
If the ni's are large, kw follows a chi-square
distribution with a − 1 degrees of freedom under H0.
53
14.3.1 Kruskal Wallis Test
Example
NRMA, the world's biggest car insurance company,
has decided to test the durability of tires from
4 major companies.
54
14.3.1 Kruskal Wallis Test
Example



Average Test Scores: Tires from 4 Major Companies
Co. 1: 14.59 23.44 25.43 18.15 20.82 14.06 14.26
Co. 2: 20.27 26.84 14.71 22.34 19.49 24.92 20.20
Co. 3: 27.82 24.92 28.68 23.32 32.85 33.90 23.42
Co. 4: 33.16 26.93 30.43 36.43 37.04 29.76 33.88
Ranks of Average Test Scores
Co. 1: 3 13 16 5 9 1 2 (rank sum 49)
Co. 2: 8 17 4 10 6 14.5 7 (rank sum 66.5)
Co. 3: 19 14.5 20 11 23 26 12 (rank sum 125.5)
Co. 4: 24 18 22 27 28 21 25 (rank sum 165)
(The two tied observations of 24.92 each receive the midrank 14.5.)
55
14.3.1 Kruskal Wallis Test
Example
Can we say that the tires from the 4 different
companies have the same median?
56
14.3.1 Kruskal Wallis Test
Example
kw = [12/(28·29)] (49² + 66.5² + 125.5² + 165²)/7 − 3(29) = 105.13 − 87 = 18.13
57
14.3.1 Kruskal Wallis Test
Example
According to the chi-square distribution, there is a
significant difference between the tires!!!
kw = 18.13 > χ²_{3,.005} = 12.837
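The kw statistic can be recomputed from the raw scores in the table above (stdlib-Python sketch; midranks handle the tie at 24.92):

```python
# Tire test scores for the 4 companies (rows of the slide's table)
groups = [
    [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
]

# Midranks over the pooled sample (24.92 appears twice)
pooled = sorted(v for g in groups for v in g)
def midrank(v):
    positions = [i + 1 for i, x in enumerate(pooled) if x == v]
    return sum(positions) / len(positions)

N = len(pooled)
rank_sums = [sum(midrank(v) for v in g) for g in groups]

kw = 12.0 / (N * (N + 1)) * sum(
    r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (N + 1)
print(rank_sums, round(kw, 2))  # [49.0, 66.5, 125.5, 165.0] 18.13
```

Either way, kw far exceeds the chi-square critical value 12.837, so the conclusion of a significant difference stands.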
58
14.3.2 Pairwise Comparisons
Comparing 2 groups among treatments
H0 E ( Ri Rj ) 0 and Var( Ri
Rj )
N ( N 1 ) 12
1 ni
1 nj
( )
For large nis, Ri Rj follows approximately
normally distributed.
ri - rj
zij
N ( N 1 ) 12
1 ni
1 nj
( )

59
14.3.2 Pairwise Comparisons
To control the familywise type I error rate at
level α,
the |zij| statistics should be referred to the
appropriate Studentized range distribution:
the Tukey method (Chapter 12).
60
14.3.2 Pairwise Comparisons
  • Number of treatment groups compared: a
  • Degrees of freedom: ∞
  • (assumption: the samples are large)
Declare treatments i and j different if |zij|
exceeds the critical constant q_{a,∞,α}/√2.
61
14.3.2 Pairwise Comparison
Example
Ranks of Average Test Scores: Tires from 4 Major Companies
Co. 1: 3 13 16 5 9 1 2 (rank sum 49)
Co. 2: 8 17 4 10 6 14.5 7 (rank sum 66.5)
Co. 3: 19 14.5 20 11 23 26 12 (rank sum 125.5)
Co. 4: 24 18 22 27 28 21 25 (rank sum 165)
62
14.3.2 Pairwise Comparison
Example
Let α be .05. The critical difference for two rank
averages is (q_{4,∞,.05}/√2) √{[N(N + 1)/12](1/ni + 1/nj)}
= (3.633/√2)(4.40) = 11.29.
|r̄1 − r̄4| = |7.0 − 23.6| = 16.6 > 11.29.
We differ from GOODYEAR!!!
63
14.4 Inferences for Several Matched Samples
Randomized block design; Friedman test:
a treatment groups and b blocks.
A distribution-free rank-based test for comparing
the treatments in the randomized block design.
Hypotheses: H0: F1j = F2j = … = Faj (for all j) vs.
H1: Fij < Fkj for some i ≠ k. This can be changed
to H0: T1 = T2 = … = Ta vs. H1: Ti > Tk for some
i ≠ k.
64
14.4.1 Friedman Test
STEP 1: Within each block j, rank the a treatment
observations in ascending order, assigning midranks
in case of ties: rij = rank(yij) within block j.
STEP 2: Calculate the rank sums ri = Σ_{j=1}^{b} rij,
i = 1, 2, …, a.
65
14.4.1 Friedman Test
STEP 3 STEP 4
Calculate the Friedman test statistic
12 ab( a 1 )
b ( a 1 ) 2
a
2
fr ? ( ri -
)
i1
12 ab( a 1 )
a
2
? - 3b( a 1 )
ri
i1
Reject H0 for large value of fr.
If nis are large, fr follows chi-square dist.
with a-1 degrees of freedom.
66
14.4.1 Friedman Test
Example
Drip Loss in Meat Loaves
Oven Position | Batch 1 (Rank) | Batch 2 (Rank) | Batch 3 (Rank) | Rank sum
1 | 7.33 (8)   | 8.11 (8) | 8.06 (7) | 23
2 | 3.22 (1)   | 3.72 (1) | 4.28 (1) | 3
3 | 3.28 (2.5) | 5.11 (4) | 4.56 (2) | 8.5
4 | 6.44 (7)   | 5.78 (6) | 8.61 (8) | 21
5 | 3.83 (4)   | 6.50 (7) | 7.72 (5) | 16
6 | 3.28 (2.5) | 5.11 (4) | 5.56 (3) | 9.5
7 | 5.06 (6)   | 5.11 (4) | 7.83 (6) | 16
8 | 4.44 (5)   | 4.28 (2) | 6.33 (4) | 11
67
14.4.1 Friedman Test
Example
The Friedman test statistic equals
fr = [12/(ab(a + 1))] Σ ri² − 3b(a + 1)
   = [12/(8·3·9)] (23² + 3² + 8.5² + 21² + 16² + 9.5² + 16² + 11²) − 3·3·9
   = 17.583 > 16.012
→ significant differences between the oven positions.
However, the number of blocks is only 3, so the large-
sample chi-square approximation may not be
accurate.
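Using the within-batch ranks from the table above, the statistic is reproduced by a short stdlib-Python sketch:

```python
# Drip-loss ranks within each batch, from the slide's table
ranks = {
    1: [8, 8, 7], 2: [1, 1, 1], 3: [2.5, 4, 2], 4: [7, 6, 8],
    5: [4, 7, 5], 6: [2.5, 4, 3], 7: [6, 4, 6], 8: [5, 2, 4],
}
a, b = 8, 3            # a treatments (oven positions), b blocks (batches)
rank_sums = [sum(r) for r in ranks.values()]

# Friedman statistic, textbook form (no tie correction applied)
fr = 12.0 / (a * b * (a + 1)) * sum(r * r for r in rank_sums) - 3 * b * (a + 1)
print(rank_sums, round(fr, 3))  # [23, 3, 8.5, 21, 16, 9.5, 16, 11] 17.583
```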
68
14.4.2 Pairwise Comparisons
Comparing 2 groups among the treatments:
Under H0, E(R̄i − R̄j) = 0 and
Var(R̄i − R̄j) = a(a + 1)/(6b).
As in the case of the Kruskal-Wallis test, treatments
i and j can be declared different at familywise
significance level α if |r̄i − r̄j| exceeds
(q_{a,∞,α}/√2) √{a(a + 1)/(6b)}.
69
14.5 Rank Correlation Methods
What is correlation? Correlation
indicates the strength and direction of a linear
relationship between two random variables.
In general statistical usage, correlation refers
to the departure of two variables from
independence. Correlation does not
imply causation.
70
14.5.1 Spearmans Rank Correlation Coefficient
Charles Edward Spearman
BTW, he looks like Sean Connery
  • Born September 10, 1863
  • Died September 7, 1945 (aged 81)
  • An English psychologist known for work in
    statistics, as a pioneer of factor analysis, and
    for Spearman's rank correlation coefficient.

71
14.5.1 Spearmans Rank Correlation Coefficient
What are we correlating?
  • Yearly alcohol consumption from wine
  • Yearly heart disease (Per 100,000)
  • 19 Country Study

72
DATA
73
14.5.1 Spearmans Rank Correlation Coefficient
Spearmans Rank Correlation Coefficient
  • A nonparametric (distribution-free) rank
    statistic proposed in 1904 as a measure of the
    strength of the associations between two
    variables.
  • The Spearman rank correlation coefficient can be
    used to give a measure of monotone association,
    useful when the distribution of the data makes
    Pearson's correlation coefficient
    undesirable.

74
14.5.1 Spearmans Rank Correlation Coefficient
Relevant formulas: rs is Pearson's correlation
coefficient applied to the ranks of the data.
If there are no ties, so that each Di (the
difference between the ranks of xi and yi) is an
integer, then
rs = 1 − 6 Σ Di² / (n(n² − 1)).
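The shortcut formula and the "Pearson on ranks" definition agree when there are no ties, which a tiny stdlib-Python sketch can verify (the rank vectors here are made up for illustration; the wine/heart-disease data are not reproduced in this transcript):

```python
# Spearman's rs computed two ways on a small illustrative sample
x_ranks = [1, 2, 3, 4, 5]
y_ranks = [2, 1, 4, 3, 5]
n = len(x_ranks)

# Shortcut formula (valid when there are no ties)
d_sq = sum((xr - yr) ** 2 for xr, yr in zip(x_ranks, y_ranks))
rs = 1 - 6 * d_sq / (n * (n * n - 1))

# Equivalent: Pearson correlation of the ranks
mx, my = sum(x_ranks) / n, sum(y_ranks) / n
cov = sum((u - mx) * (v - my) for u, v in zip(x_ranks, y_ranks))
sx = sum((u - mx) ** 2 for u in x_ranks) ** 0.5
sy = sum((v - my) ** 2 for v in y_ranks) ** 0.5
pearson_on_ranks = cov / (sx * sy)

print(rs, round(pearson_on_ranks, 6))  # 0.8 0.8
```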
75
14.5.1 Spearmans Rank Correlation Coefficient
Examples
  • From previous data we calculate

76
14.5.1 Spearmans Rank Correlation Coefficient
Hypothesis Testing Using Spearman
  • Ho X and Y are independent
  • Ha X and Y are positively associated

77
14.5.1 Spearmans Rank Correlation Coefficient
  • For large values of N (> 10), rs is approximately
    normally distributed under H0, with mean 0 and
    variance 1/(N − 1).
Using the test statistic z = rs √(N − 1):
78
14.5.1 Spearmans Rank Correlation Coefficient
Examples
  • From previous data we calculate
  • P-value 0.0004

79
14.5.2 Kendalls Rank Correlation Coefficient
  • Born September 6, 1907
  • Died March 29, 1983 (aged 75)
  • Maurice Kendall was born in Kettering,
    Northamptonshire
  • He studied mathematics at St. John's College,
    Cambridge, where he played cricket and chess
  • After graduating as a Mathematics Wrangler in
    1929, he joined the British Civil Service in the
    Ministry of Agriculture. In this position he
    became increasingly interested in using
    statistics.
  • Proposed his rank correlation coefficient in 1938
    and published the book Rank Correlation Methods
    in 1948.

80
14.5.2 Kendalls Rank Correlation Coefficient
Kendalls Rank Correlation Coefficient
  • Consider a pair of bivariate random variables
    (X1, Y1) and (X2, Y2).
  • Concordant: the orderings agree,
  • which implies
(X1 < X2 AND Y1 < Y2) or (X1 > X2 AND Y1 > Y2)
81
14.5.2 Kendalls Rank Correlation Coefficient
Kendalls Rank Correlation Coefficient
  • Discordant: the orderings disagree,
  • which implies
(X1 < X2 AND Y1 > Y2) or (X1 > X2 AND Y1 < Y2)
82
14.5.2 Kendalls Rank Correlation Coefficient
Kendalls Rank Correlation Coefficient
  • Tied pair:
  • which implies
X1 = X2
OR Y1 = Y2
OR both
83
14.5.2 Kendalls Rank Correlation Coefficient
Relevant Formula
84
14.5.2 Kendalls Rank Correlation Coefficient
Relevant formula:
τ̂ = (Nc − Nd) / (n(n − 1)/2), where
Nc = # of concordant pairs, Nd = # of discordant pairs
85
14.5.2 Kendalls Rank Correlation Coefficient
Formula Continued
If there are ties the formula is modified.
Suppose there are g groups of tied xi's with aj
tied observations in the jth group, and h groups
of tied yi's with bj tied observations in the jth
group. Then
τ̂ = (Nc − Nd) / √{ [n(n − 1)/2 − Σ aj(aj − 1)/2] [n(n − 1)/2 − Σ bj(bj − 1)/2] }.
86
14.5.2 Kendalls Rank Correlation Coefficient
Formula Explanation
  • Five pairs of observations(x,y)
  • (1,3)
  • (1,4)
  • (1,5)
  • (2,5)
  • (3,4)
  • There is g = 1 group of a1 = 3 tied x's equal to 1,
    and there are h = 2 groups of tied y's:
  • group 1 has b1 = 2 tied y's equal to 4, and group 2
    has b2 = 2 tied y's equal to 5.

87
14.5.2 Kendalls Rank Correlation Coefficient
Formula Example continued
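A small sketch (stdlib Python, using the five pairs above) that counts concordant, discordant, and tied pairs and applies the tie-corrected formula:

```python
from math import sqrt

# The five (x, y) pairs from the tie example on the slides
pts = [(1, 3), (1, 4), (1, 5), (2, 5), (3, 4)]
n = len(pts)
n0 = n * (n - 1) // 2          # total number of pairs = 10

nc = nd = tx = ty = 0
for i in range(n):
    for j in range(i + 1, n):
        dx = pts[i][0] - pts[j][0]
        dy = pts[i][1] - pts[j][1]
        if dx == 0:
            tx += 1            # pair tied on x
        if dy == 0:
            ty += 1            # pair tied on y
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                nc += 1        # concordant
            else:
                nd += 1        # discordant

# Ties reduce the denominator, as in the modified formula above
tau = (nc - nd) / sqrt((n0 - tx) * (n0 - ty))
print(nc, nd, tx, ty, round(tau, 4))  # 3 2 3 2 0.1336
```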
88
Data
89
14.5.2 Kendalls Rank Correlation Coefficient
Testing Example
90
14.5.2 Kendalls Rank Correlation Coefficient
Hypothesis Testing
91
14.5.2 Kendalls Rank Correlation Coefficient
Testing Example
  • P-value < 0.0001

92
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
Q: Why do we need Kendall's coefficient of
concordance? A: It is a measure of association
between several matched samples. Q: Why
not use Kendall's rank correlation
coefficient instead? A: Because it only works
for two samples.
93
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • How can you apply this to real life?
  • A common, interesting example:
  • A taste-testing experiment used four tasters to
    rank eight recipes, with the following results.
    Are the tasters in agreement?
  • Hmm, let's find out!

94
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
Recipe  Taster 1  Taster 2  Taster 3  Taster 4  Rank sum
1       5         4         5         4         18
2       7         5         7         5         24
3       1         2         1         3         7
4       3         3         2         1         9
5       4         6         4         6         20
6       2         1         3         2         8
7       8         7         8         8         31
8       6         8         6         7         27

95
14.5.3 Kendalls Coefficient of Concordance
How does it work?
  • It is closely related to Friedman's test
    statistic (mentioned in 14.4).
  • The a treatments are candidates (recipes).
  • The b blocks are judges (Tasters).
  • Each judge ranks the a candidates.

96
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • The discrepancy of the actual rank sums Ri from
    their common expected value b(a + 1)/2 under H0,
    D = Σ (Ri − b(a + 1)/2)²,
  • is a measure of agreement between the judges.

97
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • The maximum value of this measure is attained
    when there is perfect agreement.
  • It is given by Dmax = b² a(a² − 1)/12.

98
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • Kendall's w statistic
  • is an estimate of the variance of the row sums of
    ranks Ri divided by the maximum possible value
    this variance can take,
  • which occurs when all judges are in agreement.
  • Hence w = D/Dmax, with 0 ≤ w ≤ 1.

99
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • What relationship do w and fr, Friedman's
    statistic, have? fr = b(a − 1)w.
  • Does Kendall's w statistic relate to Spearman's
    rank correlation coefficient?
  • Only when b = 2 judges: then rs = 2w − 1.

100
14.5.3 Kendalls Coefficient of Concordance
Kendalls Coefficient of Concordance
  • Q: How can we perform statistical tests on w?
  • What distribution does it follow?
  • To test w for statistical significance,
  • refer fr = b(a − 1)w to the chi-square distribution
  • with a − 1 degrees of freedom.

101
Kendall's Coefficient of Concordance
  • To find out whether or not the tasters are in
    agreement, we calculate Kendall's
    coefficient of concordance.
  • Friedman's statistic: fr = 24.667.
  • Therefore,
  • w = fr/(b(a − 1)) = 24.667/((4)(7)) = 0.881.
  • Comparing fr = 24.667 with χ²_{7,.05} = 14.067:
    since fr exceeds this critical value,
  • we conclude that the tasters agree.
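Using the rank sums from the table above, fr and w are a two-line computation (stdlib-Python sketch):

```python
# Rank sums over the b = 4 tasters for the a = 8 recipes
rank_sums = [18, 24, 7, 9, 20, 8, 31, 27]
a, b = 8, 4

# Friedman statistic, then Kendall's w = fr / (b * (a - 1))
fr = 12.0 / (a * b * (a + 1)) * sum(r * r for r in rank_sums) - 3 * b * (a + 1)
w = fr / (b * (a - 1))
print(round(fr, 3), round(w, 3))  # 24.667 0.881
```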

102
14.6.1 Permutation Tests
Permutation Test
1) General Idea
A permutation test is a type of statistical
significance test in which a reference
distribution is obtained by calculating all
possible values of the test statistic under
rearrangements of the labels on the observed data
points. Confidence intervals can also be derived from
these tests.
2) Inventor
The theory has evolved from the works of R.A.
Fisher and E.J.G. Pitman in the 1930s.
103
14.6.1 Permutation Tests
Major Theory Derivation
  • The permutation test finds a p-value as the
    proportion of regroupings that would lead to a
    test statistic as extreme as the one observed.
    We'll consider the permutation test based on
    sample averages, although one could compute and
    compare other test statistics.
  • We have two samples that we wish to compare.
  • Hypotheses:
  • Ho: differences between the two samples are due to
    chance
  • Ha: sample 2 tends to have higher values than
    sample 1, not due simply to chance
  • Ha: sample 2 tends to have smaller values than
    sample 1, not due simply to chance
  • Ha: there are differences between the two
    samples, not due simply to chance

104
14.6.1 Permutation Tests
To see whether the observed difference d from our data
supports Ho or one of the selected alternatives,
carry out the following steps of a permutation test:
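The steps can be sketched as follows (stdlib Python, with a made-up tiny data set so that every regrouping can be enumerated exactly; nothing here is from the original slides):

```python
from itertools import combinations

# Tiny two-sample example; small enough to enumerate all regroupings
sample1 = [12, 14, 15]
sample2 = [16, 18, 21]
pooled = sample1 + sample2
n1 = len(sample1)

def mean(v):
    return sum(v) / len(v)

observed = mean(sample2) - mean(sample1)

# All ways to relabel the pooled data into groups of the original sizes
count = total = 0
for idx in combinations(range(len(pooled)), n1):
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if mean(g2) - mean(g1) >= observed:
        count += 1

# One-sided p-value for "sample 2 tends to be larger than sample 1"
p_value = count / total
print(round(observed, 3), p_value)  # 4.667 0.05
```

With larger samples one would draw random regroupings instead of enumerating all of them.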
105
Ms. Merry Huilin Ma
106
14.6.2 Bootstrap Method
1) General Idea
Bootstrapping is a statistical method for
estimating the sampling distribution of an
estimator by sampling with replacement from the
original sample, most often with the purpose of
deriving robust estimates of standard errors and
confidence intervals of a population parameter
like a mean, median, proportion, odds ratio,
correlation coefficient or regression
coefficient.
2) Inventor
Bradley Efron (1938-present)'s work has spanned
both theoretical and applied topics, including
empirical Bayes analysis, applications of
differential geometry to statistical inference,
the analysis of survival data, and inference for
microarray gene expression data.
Homepage: http://stat.stanford.edu/brad/
E-mail: brad@stat.stanford.edu
107
14.6.2 Bootstrap Method
3) Major Theory and Derivation
Consider the case where a random sample of size
n is drawn from an unspecified probability
distribution. The basic steps in the bootstrap
procedure are the following:
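The basic resampling loop can be sketched in a few lines (stdlib Python; the data vector is just an illustrative sample, reusing the capacitor failure times from Section 14.2):

```python
import random
import statistics

random.seed(0)                      # reproducible illustration
data = [5.2, 17.1, 8.5, 17.9, 9.8, 23.7, 12.3, 29.8]
B = 2000                            # number of bootstrap resamples

# Bootstrap standard error of the sample mean:
# resample WITH replacement, recompute the statistic each time
boot_means = []
for _ in range(B):
    resample = [random.choice(data) for _ in data]
    boot_means.append(statistics.fmean(resample))

se_boot = statistics.stdev(boot_means)
se_formula = statistics.stdev(data) / len(data) ** 0.5  # classical s/sqrt(n)
print(round(se_boot, 2), round(se_formula, 2))
```

For the mean the bootstrap SE should land close to the classical s/√n; the payoff of the method is that the same loop works for medians, ratios, and other statistics with no closed-form SE.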
108
14.6.3 Jackknife Method
1) General Idea
Jackknife is a statistical method for estimating
and compensating for bias and for deriving robust
estimates of standard errors and confidence
intervals. Jackknifed statistics are created by
systematically dropping out subsets of data one
at a time and assessing the resulting variation
in the studied parameter.
2) Inventors
The jackknife was introduced by Maurice Quenouille
(1949, 1956) as a bias-reduction technique; John
Tukey (1958) coined the name and extended the
method to the estimation of standard errors and
confidence intervals.
109
14.6.3 Jackknife Method
3) Major Theory and Derivation
Now we briefly describe how it is possible to
obtain the standard deviation of a generic
estimator using the jackknife method. For
simplicity we consider the average estimator. Let
us consider the leave-one-out averages x̄(i),
where x̄ is the sample average and x̄(i) is the
sample average of the data set after deleting the
ith point. Then we can define the average of the
x̄(i): x̄(.) = (1/n) Σ x̄(i).
The jackknife estimate of the standard deviation is
then defined as
SE_jack = √{ [(n − 1)/n] Σ (x̄(i) − x̄(.))² }.
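A minimal sketch of this recipe for the sample mean (stdlib Python, illustrative data; for the mean the jackknife reproduces the classical s/√n exactly, which makes a handy check):

```python
import statistics

# Jackknife standard error of the sample mean
data = [5.2, 17.1, 8.5, 17.9, 9.8, 23.7, 12.3, 29.8]
n = len(data)

# Leave-one-out averages x_bar(i) and their mean x_bar(.)
loo_means = [(sum(data) - x) / (n - 1) for x in data]
loo_center = sum(loo_means) / n

# SE_jack = sqrt( (n-1)/n * sum( (x_bar(i) - x_bar(.))^2 ) )
se_jack = ((n - 1) / n * sum((m - loo_center) ** 2 for m in loo_means)) ** 0.5

# For the mean, this equals the usual s / sqrt(n) exactly
se_classic = statistics.stdev(data) / n ** 0.5
print(round(se_jack, 4), round(se_classic, 4))
```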
110
%MACRO _SASTASK_DROPDS(dsname);
  %IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
    DROP TABLE &dsname;
  %END;
  %IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
    DROP VIEW &dsname;
  %END;
%MEND _SASTASK_DROPDS;
%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
PROC SQL;
  CREATE VIEW WORK.SORTTempTableSorted AS
    SELECT ScoreChange FROM MIHIR.AMS572;
QUIT;
TITLE; TITLE1 "Distribution analysis of ScoreChange";
TITLE2 "Wilcoxon Rank Sum Test";
ODS EXCLUDE CIBASIC BASICMEASURES EXTREMEOBS MODES MOMENTS QUANTILES;
PROC UNIVARIATE DATA=WORK.SORTTempTableSorted MU0=0;
  VAR ScoreChange;
  HISTOGRAM / NOPLOT;
RUN; QUIT;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
111
SAS program
Distribution analysis of ScoreChange: Wilcoxon Rank Sum Test
The UNIVARIATE Procedure
Variable: ScoreChange (Change in Test Scores)
Tests for Location: Mu0=0
Test         Statistic     p Value
Student's t  t  -0.80079   Pr > |t|   0.4402
Sign         M  -1         Pr >= |M|  0.7744
Signed Rank  S  -8.5       Pr >= |S|  0.5278
112
SAS program
/* Kruskal-Wallis Test and Wilcoxon-Mann-Whitney Test */
%MACRO _SASTASK_DROPDS(dsname);
  %IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
    DROP TABLE &dsname;
  %END;
  %IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
    DROP VIEW &dsname;
  %END;
%MEND _SASTASK_DROPDS;
%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;
PROC SQL;
  %_SASTASK_DROPDS(WORK.TMP0TempTableInput);
QUIT;
PROC SQL;
  CREATE VIEW WORK.TMP0TempTableInput AS
    SELECT PreTest, Gender FROM MIHIR.AMS572;
QUIT;
TITLE; TITLE1 "Nonparametric One-Way ANOVA";
PROC NPAR1WAY DATA=WORK.TMP0TempTableInput WILCOXON;
  VAR PreTest;
  CLASS Gender;
RUN; QUIT;
PROC SQL;
  %_SASTASK_DROPDS(WORK.TMP0TempTableInput);
QUIT;
113
SAS program
Nonparametric One-Way ANOVA
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable PreTest, Classified by Variable Gender
Gender  N  Sum of Scores  Expected Under H0  Std Dev Under H0  Mean Score
F       7  40.0           45.50              6.146877          5.714286
M       5  38.0           32.50              6.146877          7.600000
Average scores were used for ties.
Wilcoxon Two-Sample Test
Statistic           38.0000
Normal Approximation
Z                   0.8134
One-Sided Pr > Z    0.2080
Two-Sided Pr > |Z|  0.4160
t Approximation
One-Sided Pr > Z    0.2166
Two-Sided Pr > |Z|  0.4332
Z includes a continuity correction of 0.5.
Kruskal-Wallis Test
Chi-Square          0.8006
DF                  1
Pr > Chi-Square     0.3709
114
/* Wilcoxon Signed Rank Test */
%MACRO _SASTASK_DROPDS(dsname);
  %IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
    DROP TABLE &dsname;
  %END;
  %IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
    DROP VIEW &dsname;
  %END;
%MEND _SASTASK_DROPDS;
%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
PROC SQL;
  CREATE VIEW WORK.SORTTempTableSorted AS
    SELECT ScoreChange FROM MIHIR.AMS572;
QUIT;
TITLE; TITLE1 "Distribution analysis of ScoreChange";
TITLE2 "Wilcoxon Signed Rank Test";
ODS EXCLUDE CIBASIC BASICMEASURES EXTREMEOBS MODES MOMENTS QUANTILES;
PROC UNIVARIATE DATA=WORK.SORTTempTableSorted MU0=0;
  VAR ScoreChange;
  HISTOGRAM / NOPLOT;
RUN; QUIT;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
115
       
SAS program
Distribution analysis of ScoreChange: Wilcoxon Signed Rank Test
The UNIVARIATE Procedure
Variable: ScoreChange (Change in Test Scores)
Tests for Location: Mu0=0
Test         Statistic     p Value
Student's t  t  -0.80079   Pr > |t|   0.4402
Sign         M  -1         Pr >= |M|  0.7744
Signed Rank  S  -8.5       Pr >= |S|  0.5278

116
/* Friedman Test */
%MACRO _SASTASK_DROPDS(dsname);
  %IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
    DROP TABLE &dsname;
  %END;
  %IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
    DROP VIEW &dsname;
  %END;
%MEND _SASTASK_DROPDS;
%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
PROC SQL;
  CREATE VIEW WORK.SORTTempTableSorted AS
    SELECT Emotion, Subject, SkinResponse
    FROM WORK.HYPNOSIS1493;
QUIT;
TITLE; TITLE1 "Table Analysis";
TITLE2 "Results";
PROC FREQ DATA=WORK.SORTTempTableSorted ORDER=INTERNAL;
  TABLES Subject*Emotion*SkinResponse / NOROW NOPERCENT NOCUM CMH
    SCORES=RANK ALPHA=0.05;
RUN; QUIT;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
117
SAS program
Table Analysis: Results
The FREQ Procedure
Summary Statistics for Emotion by SkinResponse, Controlling for Subject
Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)
Statistic  Alternative Hypothesis  DF  Value   Prob
1          Nonzero Correlation     1   0.2400  0.6242
2          Row Mean Scores Differ  3   6.4500  0.0917
3          General Association     84  .       .
At least 1 statistic not computed (singular covariance matrix). Total Sample Size = 32

118
/* Spearman correlation */
%MACRO _SASTASK_DROPDS(dsname);
  %IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
    DROP TABLE &dsname;
  %END;
  %IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
    DROP VIEW &dsname;
  %END;
%MEND _SASTASK_DROPDS;
%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;
PROC SQL;
  %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
PROC SQL;
  CREATE VIEW WORK.SORTTempTableSorted AS
    SELECT Arts, Economics FROM WORK.WESTERNRATES5171;
QUIT;
TITLE1 "Correlation Analysis";
/* Spearman Method */
PROC CORR DATA=WORK.SORTTempTableSorted SPEARMAN VARDEF=DF NOSIMPLE NOPROB;
  VAR Arts;
  WITH Economics;
RUN;
SAS program
119
SAS program
  • /* Kendall Method */
  • PROC CORR DATA=WORK.SORTTempTableSorted
  •   KENDALL
  •   VARDEF=DF
  •   NOSIMPLE
  •   NOPROB;
  •   VAR Arts;
  •   WITH Economics;
  • RUN; QUIT;
  • PROC SQL;
  •   %_SASTASK_DROPDS(WORK.SORTTempTableSorted);
  • QUIT;

120
       
SAS program
Correlation Analysis
The CORR Procedure
1 With Variables: Economics
1 Variables: Arts
Spearman Correlation Coefficients, N = 52
           Arts
Economics  0.27926
Kendall Tau-b Correlation Coefficients, N = 52
           Arts
Economics  0.18854

121
How did we work as a group!!
122
I don't really believe in peace
What happened to his eyes!!!!!
123
buddies
124
Statistics is funny!
How?
125
They are going to kill me. HELP!
126
Are you still taking the picture?
127
I don't know but I am looking.
Is it safe to look at the camera?
128
We love statistics
129
Losers!
130
Warm-up
131
(No Transcript)
132
This is what we do!
133
(No Transcript)
134
This is what Prof. Zhu does!!
135
(No Transcript)