Comparing Two Samples - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Comparing Two Samples
  • Harry R. Erwin, PhD
  • School of Computing and Technology
  • University of Sunderland

2
Resources
  • Crawley, MJ (2005) Statistics An Introduction
    Using R. Wiley.
  • Gentle, JE (2002) Elements of Computational
    Statistics. Springer.
  • Gonick, L., and Woollcott Smith (1993) A Cartoon
    Guide to Statistics. HarperResource (for fun).
  • Freund and Wilson (1998) Regression Analysis.
    Academic Press.

3
Don't Complicate Things
  • Use
  • var.test to compare two variances (Fisher's F)
  • t.test to compare two means (Student's t)
  • wilcox.test to compare two means with non-normal
    errors (Wilcoxon's rank test)
  • prop.test (binomial test) to compare two
    proportions
  • cor.test (Pearson's or Spearman's rank
    correlation) to correlate two variables
  • chisq.test (chi-square test) or fisher.test
    (Fisher's exact test) to test for independence in
    contingency tables

4
Comparing Two Variances
  • Before comparing means, verify that the variances
    are not significantly different.
  • var.test(set1, set2)
  • This performs Fisher's F test
  • If the variances are significantly different, you
    can transform the output (y) variable, or you can
    still use the t.test (Welch's modified test).
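A minimal sketch of this check, using simulated (hypothetical) data:

```r
## Two samples with the same mean but different spread (made-up data)
set.seed(1)
set1 <- rnorm(30, mean = 5, sd = 1)
set2 <- rnorm(30, mean = 5, sd = 2)

## Fisher's F test: a small p-value suggests the variances differ,
## in which case transform y or fall back on Welch's t-test
var.test(set1, set2)
```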

5
Comparing Two Means
  • Student's t-test (t.test) assumes the samples are
    independent, the variances equal, and the errors
    normally distributed. By default it uses the
    Welch-Satterthwaite approximation (less power),
    which does not require equal variances. This test
    can also be used for paired data.
  • Wilcoxon rank sum test (wilcox.test) is used for
    independent samples, errors not normally
    distributed. If you do a transform to get
    constant variance, you will probably have to use
    this test.
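The two tests side by side, on simulated (hypothetical) samples:

```r
set.seed(1)
a <- rnorm(25, mean = 10, sd = 2)
b <- rnorm(25, mean = 12, sd = 2)

t.test(a, b)                    # Welch's test (the default, var.equal = FALSE)
t.test(a, b, var.equal = TRUE)  # classical Student's t, if variances are equal
wilcox.test(a, b)               # rank sum test for non-normal errors
```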

6
Paired Observations
  • The measurements will not be independent.
  • Use the t.test with paired = TRUE.
  • When you can do a paired t.test, you should
    always do the paired test.
  • Deals with blocking, spatial correlation, and
    temporal correlation.
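For example, with before/after measurements on the same subjects (made-up numbers):

```r
before <- c(140, 152, 138, 145, 160, 149)
after  <- c(135, 150, 134, 140, 158, 147)

## Pairing tests the within-subject differences, removing
## between-subject variation from the error term
t.test(before, after, paired = TRUE)
```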

7
Sign Test
  • Used when you can't measure a difference but can
    see it.
  • Use the binomial test (binom.test) for this.
  • Binomial tests can also be used to compare
    proportions (prop.test).
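A sketch with hypothetical counts: say 9 of 12 subjects visibly improved, and the null hypothesis is that improvement is a coin flip.

```r
## Sign test: 9 "improved" out of 12, H0: p = 0.5
binom.test(9, 12, p = 0.5)

## Comparing two proportions, e.g. 45/100 vs 60/110 successes
prop.test(c(45, 60), c(100, 110))
```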

8
Chi-square Contingency Tables
  • Deals with count data.
  • Suppose there are two characteristics (hair
    colour and eye colour). The null hypothesis is
    that they are uncorrelated.
  • Create a matrix that contains the data and apply
    chisq.test(matrix).
  • This will give you a p-value for matrix values
    given the assumption of independence.
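With made-up hair/eye colour counts, the matrix and test look like:

```r
## Hypothetical 2x2 table of counts
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("fair", "dark"),
                                 eye  = c("blue", "brown")))

## Small p-value => reject independence of the two characteristics
chisq.test(counts)
```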

9
Fisher's Exact Test
  • Used for analysis of contingency tables when one
    or more of the expected frequencies is less than
    5.
  • Use fisher.test(x)
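For instance, on a small table where some expected frequencies fall below 5 (hypothetical counts):

```r
small <- matrix(c(6, 4, 3, 10), nrow = 2)
fisher.test(small)   # exact p-value, no large-sample approximation
```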

10
Correlation and Covariance
  • Are two parameters correlated significantly?
  • Create and attach the data.frame
  • Apply cor(data.frame)
  • To determine the significance of a correlation,
    apply cor.test(data.frame)
  • You have three options: Kendall's tau (method
    "k"), Spearman's rank (method "s"), or
    (default) Pearson's product-moment correlation
    (method "p")
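A sketch on simulated (hypothetical) data:

```r
set.seed(1)
x <- rnorm(40)
y <- 0.6 * x + rnorm(40)   # y correlated with x by construction

cor(x, y)                           # sample correlation coefficient
cor.test(x, y)                      # Pearson (the default)
cor.test(x, y, method = "spearman") # Spearman's rank
cor.test(x, y, method = "kendall")  # Kendall's tau
```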

11
Kolmogorov-Smirnov Test
  • Are two sample distributions significantly
    different?
  • or
  • Does a sample distribution arise from a specific
    distribution?
  • ks.test(A,B)
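Both uses, on simulated (hypothetical) samples:

```r
set.seed(1)
A <- rnorm(50)
B <- runif(50)

ks.test(A, B)              # two-sample: are the distributions different?
ks.test(A, "pnorm", 0, 1)  # one-sample: does A arise from N(0, 1)?
```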

12
Statistical Problems
  • Influential observations
  • Outliers
  • Unequal variances
  • Correlated errors

13
Outliers and Influential Observations
  • Extreme responses are called outliers and extreme
    inputs are called leverage points.
  • An observation that has great influence on the
    estimates is usually an outlier and a leverage
    point.
  • Use the residual plot to detect them.
  • Remediate by verifying the correctness of the
    observation. It may also reflect a factor not
    present in any of the other observations.

14
Unequal variances
  • Mentioned earlier.
  • Use
  • non-parametric statistics (usually not effective
    for regression)
  • robust methods
  • rescaling
  • live with it

15
Correlated errors
  • The measurements are not independent; selection
    of the sample units was not strictly random. This
    is a frequent problem with time series data but
    can also reflect spatial correlation.
  • Try special models, e.g. an autoregressive model.