Comparing Two Samples - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Comparing Two Samples
  • Harry R. Erwin, PhD
  • School of Computing and Technology
  • University of Sunderland

2
Resources
  • Crawley, MJ (2005) Statistics An Introduction
    Using R. Wiley.
  • Gentle, JE (2002) Elements of Computational
    Statistics. Springer.
  • Gonick, L., and Woollcott Smith (1993) A Cartoon
    Guide to Statistics. HarperResource (for fun).
  • Freund and Wilson (1998) Regression Analysis.
    Academic Press.

3
Don't Complicate Things
  • Use
  • var.test to compare two variances (Fisher's F)
  • t.test to compare two means (Student's t)
  • wilcox.test to compare two means with non-normal
    errors (Wilcoxon's rank test)
  • prop.test (binomial test) to compare two
    proportions
  • cor.test (Pearson's or Spearman's rank
    correlation) to correlate two variables
  • chisq.test (chi-square test) or fisher.test
    (Fisher's exact test) to test for independence in
    contingency tables

4
Comparing Two Variances
  • Before comparing means, verify that the variances
    are not significantly different.
  • var.test(set1, set2)
  • This performs Fisher's F test
  • If the variances are significantly different, you
    can transform the output (y) variable, or you can
    still use the t.test (Welch's modified test).
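A minimal sketch of this check, using simulated (hypothetical) data:

```r
## Two samples with the same mean but different spread (made-up data)
set.seed(1)
set1 <- rnorm(30, mean = 5, sd = 1)
set2 <- rnorm(30, mean = 5, sd = 2)

## Fisher's F test: a small p-value suggests the variances differ,
## in which case transform y or fall back on Welch's t-test
var.test(set1, set2)
```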

5
Comparing Two Means
  • Student's t-test (t.test) assumes the samples are
    independent, the variances equal, and the errors
    normally distributed. By default it uses the
    Welch-Satterthwaite approximation (less power),
    which does not require equal variances. This test
    can also be used for paired data.
  • Wilcoxon rank sum test (wilcox.test) is used for
    independent samples, errors not normally
    distributed. If you do a transform to get
    constant variance, you will probably have to use
    this test.
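The two tests side by side, on simulated (hypothetical) samples:

```r
set.seed(1)
a <- rnorm(25, mean = 10, sd = 2)
b <- rnorm(25, mean = 12, sd = 2)

t.test(a, b)                    # Welch's test (the default, var.equal = FALSE)
t.test(a, b, var.equal = TRUE)  # classical Student's t, if variances are equal
wilcox.test(a, b)               # rank sum test for non-normal errors
```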

6
Paired Observations
  • The measurements will not be independent.
  • Use the t.test with paired = TRUE.
  • When you can do a paired t.test, you should
    always do the paired test.
  • Deals with blocking, spatial correlation, and
    temporal correlation.
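For example, with before/after measurements on the same subjects (made-up numbers):

```r
before <- c(140, 152, 138, 145, 160, 149)
after  <- c(135, 150, 134, 140, 158, 147)

## Pairing tests the within-subject differences, removing
## between-subject variation from the error term
t.test(before, after, paired = TRUE)
```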

7
Sign Test
  • Used when you can't measure a difference but can
    see it.
  • Use the binomial test (binom.test) for this.
  • Binomial tests can also be used to compare
    proportions (prop.test).
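A sketch with hypothetical counts: say 9 of 12 subjects visibly improved, and the null hypothesis is that improvement is a coin flip.

```r
## Sign test: 9 "improved" out of 12, H0: p = 0.5
binom.test(9, 12, p = 0.5)

## Comparing two proportions, e.g. 45/100 vs 60/110 successes
prop.test(c(45, 60), c(100, 110))
```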

8
Chi-square Contingency Tables
  • Deals with count data.
  • Suppose there are two characteristics (hair
    colour and eye colour). The null hypothesis is
    that they are uncorrelated.
  • Create a matrix that contains the data and apply
    chisq.test(matrix).
  • This will give you a p-value for matrix values
    given the assumption of independence.
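With made-up hair/eye colour counts, the matrix and test look like:

```r
## Hypothetical 2x2 table of counts
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("fair", "dark"),
                                 eye  = c("blue", "brown")))

## Small p-value => reject independence of the two characteristics
chisq.test(counts)
```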

9
Fisher's Exact Test
  • Used for analysis of contingency tables when one
    or more of the expected frequencies is less than
    5.
  • Use fisher.test(x)
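For instance, on a small table where some expected frequencies fall below 5 (hypothetical counts):

```r
small <- matrix(c(6, 4, 3, 10), nrow = 2)
fisher.test(small)   # exact p-value, no large-sample approximation
```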

10
Correlation and Covariance
  • Are two parameters correlated significantly?
  • Create and attach the data.frame
  • Apply cor(data.frame)
  • To determine the significance of a correlation,
    apply cor.test(data.frame)
  • You have three options: Kendall's tau (method
    "k"), Spearman's rank (method "s"), or
    (default) Pearson's product-moment correlation
    (method "p")
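A sketch on simulated (hypothetical) data:

```r
set.seed(1)
x <- rnorm(40)
y <- 0.6 * x + rnorm(40)   # y correlated with x by construction

cor(x, y)                           # sample correlation coefficient
cor.test(x, y)                      # Pearson (the default)
cor.test(x, y, method = "spearman") # Spearman's rank
cor.test(x, y, method = "kendall")  # Kendall's tau
```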

11
Kolmogorov-Smirnov Test
  • Are two sample distributions significantly
    different?
  • or
  • Does a sample distribution arise from a specific
    distribution?
  • ks.test(A,B)
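Both uses, on simulated (hypothetical) samples:

```r
set.seed(1)
A <- rnorm(50)
B <- runif(50)

ks.test(A, B)              # two-sample: are the distributions different?
ks.test(A, "pnorm", 0, 1)  # one-sample: does A arise from N(0, 1)?
```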

12
Statistical Problems
  • Influential observations
  • Outliers
  • Unequal variances
  • Correlated errors

13
Outliers and Influential Observations
  • Extreme responses are called outliers and extreme
    inputs are called leverage points.
  • An observation that has great influence on the
    estimates is usually an outlier and a leverage
    point.
  • Use the residual plot to detect them.
  • Remediate by verifying the correctness of the
    observation. It may also reflect a factor not
    present in any of the other observations.

14
Unequal variances
  • Mentioned earlier.
  • Use
  • non-parametric statistics (usually not effective
    for regression)
  • robust methods
  • rescaling
  • live with it

15
Correlated errors
  • The measurements are not independent; selection
    of the sample units was not strictly random. This
    is a frequent problem with time series data but
    can also reflect spatial correlation.
  • Try special models, e.g. an autoregressive model.