Comparing multiple tests for separating populations

Paper presented at the Fifth International Conference on Multiple Comparisons, ... With the collaboration of Rhonda Kowalchek and Harvey Keselman, we are ...

1
Comparing multiple tests for separating
populations
  • Juliet Popper Shaffer
  • Paper presented at the Fifth International
    Conference on Multiple Comparisons, Vienna, July
    10, 2007

2
Outline
  • Background
  • Original separation concepts
  • Revised separation concepts
  • Planned comparisons of different FDR and
    FWER-controlling methods
  • Selected examples with FDR-controlling methods
  • Summary and description of further planned work.

3
Background
  • I began thinking about this problem in the early
    1970s, when I was approached by a faculty member
    with a rather common situation.
  • He had compared means of three treatments in an
    analysis of variance followed by pairwise tests.
    He found treatments 1 and 3 significantly
    different, but neither 1 and 2 nor 2 and 3
    significantly different.

4
  • I pointed out that this was a rather common
    outcome. His response was:
  • "What am I supposed to do with that?"
  • A good question. No clear interpretation.
  • The pattern of results of pairwise tests is
    important.

5
  • Consider four treatments.
  • Suppose the outcome of pairwise tests is
  • (a) 1-4 significant, 1-3 significant, 2-4
    significant.
  • (b) 1-4 significant, 1-3 significant, 1-2
    significant.
  • (b) is clearly interpretable, (a) is not.

6
  • (Treatments joined by a common underline are not
    significantly different.)
  • (a) 1  2  3  4
        ----
           ----
              ----
  • (b) 1  2  3  4
           -------

7
Original separation concepts
  • I developed a measure of interpretability of the
    outcome of pairwise tests and published the
    description, with a comparison of FWER-controlling
    methods including a new one for comparing three
    treatments, as
  • "Complexity: An interpretability criterion for
    multiple comparisons" (JASA, 1981).

8
  • A pattern was defined as simple if it consisted
    of distinct groupings.
  • The measure was the number of additional
    rejections necessary to make the pattern simple.
    For 3 treatments, this is a reasonable measure.

9
  • 3 treatments: either no rejections or at least
    two rejections are necessary to achieve a simple
    pattern.
  • Complexity:
  • 2 if the overall test is significant but no
    pairwise differences are significant
  • 1 if one pairwise difference is significant
  • 0 if two or three pairwise differences are
    significant or nothing is significant.
  • I.e., given that overall equality is rejected and
    1-3 would be rejected before 1-2 or 2-3, the
    simple patterns (means joined by an underline are
    not declared different) are
  • 1  2  3   (2 rejections),
    ----
  • 1  2  3   (2 rejections), or
       ----
  • 1  2  3   (3 rejections)
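
The complexity values listed above can be written as a small lookup. This is only a sketch of the slide's table for three treatments; the function name and argument names are made up for illustration and are not from the 1981 paper.

```python
def complexity_three_treatments(overall_significant, pairwise_rejections):
    """Complexity of a three-treatment outcome, following the slide's table.

    overall_significant: True if the overall test rejected equality.
    pairwise_rejections: number of significant pairwise differences (0 to 3).
    Hypothetical helper; names are illustrative only.
    """
    if not 0 <= pairwise_rejections <= 3:
        raise ValueError("three treatments give at most three pairwise tests")
    if pairwise_rejections >= 2:
        return 0  # already a simple pattern
    if pairwise_rejections == 1:
        return 1  # one more rejection would make the pattern simple
    # No pairwise rejections: simple only if nothing at all is significant.
    return 2 if overall_significant else 0


print(complexity_three_treatments(True, 1))   # 1
print(complexity_three_treatments(True, 0))   # 2
print(complexity_three_treatments(False, 0))  # 0
```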

10
  • The results were interesting, and the F test
    followed by individual t tests resulted in
    greater average simplicity than the range test,
    when both controlled FWER.
  • The study was limited to three treatments.

11
  • For more than 3 treatments, there is a
    multiplicity of patterns (e.g. 15 for four
    groups).
  • It is also less clear that the measure used is
    best with more than three groups, and average
    complexity is certainly harder to interpret in
    that case.
  • Furthermore, it seems desirable to distinguish
    true patterns from false patterns. If a pattern
    is false, a complex pattern is arguably more
    desirable than a simple pattern.

12
  • Since that time, the issue has been raised
    occasionally by others, so I decided to try again
    with a simpler way of dividing patterns.
  • Also, with new concepts of error control,
    especially FDR, it seemed interesting to see
    whether clearer patterns would emerge with
    FDR-controlling methods.

13
Revised separation concepts
  • I'll discuss patterns of treatment means,
    although this can be generalized to other
    parameters.
  • Following Hartley (1955), I'll call sets of
    populations with equal means (usually assumed
    identical) clusters.

14
  • True pattern: a set of K clusters of sizes
    n_1, n_2, ..., n_K, of n true means. (If exact
    equality is considered impossible, think of
    virtual equality.)
  • Observed outcome: set of rejections of subset
    equality hypotheses.
  • Outcome clusters: subsets of sample means
    declared significantly different from all other
    means, with no subclusters within them.

15
  • True outcome clusters: outcome clusters in which
    all true means within the cluster are greater
    than all true means below it and smaller than all
    true means above it.
  • False outcome clusters: outcome clusters that are
    not true.
  • If there is no separation into clusters, the
    number of outcome clusters is defined as zero.
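
One way to read the outcome-cluster definition is sketched below. It rests on assumptions not stated on the slides: means are labelled 1..n in order of their sample means, clusters are contiguous blocks in that order, and "no subclusters" is taken to mean that no smaller separated block lies inside. The function name `outcome_clusters` is hypothetical. The example reuses the four-treatment outcomes (a) and (b) from earlier in the talk, reading each significant pair as a tuple such as (1, 4).

```python
def outcome_clusters(n, rejected):
    """Return outcome clusters as (lo, hi) blocks of ordered sample means.

    n: number of means, labelled 1..n in sample-mean order (assumption).
    rejected: iterable of pairs (i, j) declared significantly different.
    A block qualifies if every mean inside it is declared different from
    every mean outside it; only blocks with no qualifying sub-block are kept.
    An empty list stands for "zero outcome clusters".
    """
    pairs = {frozenset(p) for p in rejected}

    def separated(lo, hi):
        outside = [k for k in range(1, n + 1) if k < lo or k > hi]
        # The full set has nothing outside it, so it is never a cluster.
        return bool(outside) and all(
            frozenset((i, k)) in pairs
            for i in range(lo, hi + 1)
            for k in outside
        )

    blocks = [
        (lo, hi)
        for lo in range(1, n + 1)
        for hi in range(lo, n + 1)
        if separated(lo, hi)
    ]
    # Keep only minimal blocks: those containing no other separated block.
    return [
        b for b in blocks
        if not any(o != b and b[0] <= o[0] and o[1] <= b[1] for o in blocks)
    ]


print(outcome_clusters(4, [(1, 4), (1, 3), (2, 4)]))  # outcome (a): []
print(outcome_clusters(4, [(1, 4), (1, 3), (1, 2)]))  # outcome (b): [(1, 1), (2, 4)]
```

Under this reading, outcome (a) yields no clusters at all, while outcome (b) separates treatment 1 from the block 2-4.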

16
  • Note that there may be rejections within a
    cluster, as long as they don't separate it into
    subclusters.
  • Pure true outcome clusters: true outcome
    clusters with no false rejections.

17
  • Note that there can be true rejections within a
    pure true outcome cluster if it contains true
    subclusters as long as there are no false
    rejections within it.
  • "False rejection" refers to either rejecting
    equality when a pair is equal (Type I error), or
    rejecting equality when a pair is unequal, but
    deciding the difference is in the wrong direction
    (Type III error).
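
The two kinds of false rejection can be told apart with a short check. This is an illustrative sketch only: the function name, the argument names, and the string labels are assumptions, and it presumes every rejection comes with a declared direction.

```python
def classify_rejection(mu_i, mu_j, declared_i_below_j):
    """Classify a directional rejection of H: mu_i = mu_j.

    mu_i, mu_j: the true means (known only in a simulation).
    declared_i_below_j: True if the test concluded mu_i < mu_j,
                        False if it concluded mu_i > mu_j.
    Hypothetical helper; labels are illustrative.
    """
    if mu_i == mu_j:
        return "Type I"    # rejected equality although the pair is equal
    if (mu_i < mu_j) == declared_i_below_j:
        return "correct"   # unequal pair, direction declared correctly
    return "Type III"      # unequal pair, direction declared wrongly


print(classify_rejection(0.0, 0.0, True))   # Type I
print(classify_rejection(0.0, 1.0, True))   # correct
print(classify_rejection(0.0, 1.0, False))  # Type III
```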

18
  • False cluster rate: expected value of the ratio
    of false observed clusters to total observed
    clusters, defined as zero if there are no
    observed clusters.
  • Various measures of cluster power.
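
In a simulation, the false cluster rate could be estimated by averaging the per-outcome ratio over replications. The sketch below assumes clusters are given as (lo, hi) blocks of positions in sample-mean order, that the true means are listed in that same order, and that "greater" and "smaller" are strict; all function names are hypothetical.

```python
def is_true_cluster(block, true_means):
    """True if every true mean in the block exceeds every true mean below it
    and is smaller than every true mean above it (strict inequalities assumed).

    block: (lo, hi), 1-based inclusive positions in sample-mean order.
    true_means: true means listed in sample-mean order.
    """
    lo, hi = block
    inside = true_means[lo - 1:hi]
    below = true_means[:lo - 1]
    above = true_means[hi:]
    return (all(b < x for b in below for x in inside)
            and all(x < a for a in above for x in inside))


def false_cluster_proportion(clusters, true_means):
    """Ratio of false to total observed clusters; zero if there are none."""
    if not clusters:
        return 0.0
    n_false = sum(not is_true_cluster(c, true_means) for c in clusters)
    return n_false / len(clusters)


def estimated_false_cluster_rate(outcomes):
    """Average the proportion over simulated (clusters, true_means) outcomes."""
    return sum(false_cluster_proportion(c, m) for c, m in outcomes) / len(outcomes)


# Illustrative true pattern (2)(2): two clusters of two equal means each.
means = [0.0, 0.0, 1.0, 1.0]
print(false_cluster_proportion([(1, 2), (3, 4)], means))  # 0.0
print(false_cluster_proportion([(1, 1), (2, 4)], means))  # 1.0
```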

19
Comparisons of different FDR- and
FWER-controlling methods
  • Note that it isn't clear that more liberal
    methods will produce more true outcome clusters,
    more pure true outcome clusters, or a smaller
    false cluster rate.
  • With the collaboration of Rhonda Kowalchek and
    Harvey Keselman, we are conducting a large study
    of cluster measures as well as standard error and
    power measures with several methods, all at
    nominal level .05 for either FWER or FDR control.

20
True mean configurations
  • We're looking at true configurations in which one
    mean is different from all K-1 others, and at
    various other cluster configurations of 3, 4, 8
    and possibly 12 means. The work is still in
    progress.

21
Methods
  • FWER-controlling:
      • Tukey-Welsch multiple range test
      • Modified Peritz multiple range test
  • FDR-controlling:
      • Benjamini-Hochberg original step-up method (BH)
      • Yekutieli-revised BH method with proven FDR
        control
      • Newman-Keuls method with empirical evidence and
        limited proofs of FDR control (NK)

22
  • The Newman-Keuls method (NK) is little used these
    days. It is a multiple range method.
  • Let M_1 < M_2 < ... < M_n be the sample means of
    populations P_1, P_2, ..., P_n with true means
    µ_1, µ_2, ..., µ_n, respectively.
  • For simplicity I'll describe the method assuming
    the populations are identical except for possible
    location shift.

23
  • Let r_{j-i+1, α} be the α-critical value of the
    range of j-i+1 sample means. Then
  • H: µ_i = µ_j (i < j) is rejected if
  • M_{j'} - M_{i'} > r_{j'-i'+1, α} for all j' ≥ j, i' ≤ i.
  • In other words, it is identical in form to the
    Tukey-Welsch multiple range method, but every
    subrange is tested for significance at the same
    level α.
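
The rejection rule can be sketched in a few lines: a pair is declared different only if every range of ordered sample means that contains it exceeds the critical value for a range of that size. The critical values are supplied by the caller through `crit`, a hypothetical callable; the numbers in the usage example are made up for illustration and are not real studentized-range quantiles.

```python
def newman_keuls_rejections(means, crit):
    """Pairs (i, j), i < j, rejected by a Newman-Keuls-style range rule.

    means: sample means; they are sorted and labelled 1..n in that order.
    crit:  callable mapping a range size p to the level-alpha critical value
           for the range of p sample means (assumed to be given).
    """
    m = sorted(means)
    n = len(m)
    rejected = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Every enclosing range [ii, jj] with ii <= i and jj >= j must be
            # significant, each at the critical value for its own size.
            if all(
                m[jj] - m[ii] > crit(jj - ii + 1)
                for ii in range(i + 1)
                for jj in range(j, n)
            ):
                rejected.add((i + 1, j + 1))
    return rejected


# Made-up critical values, for illustration only.
toy_crit = lambda p: 1.0 + 0.1 * p
print(sorted(newman_keuls_rejections([0.0, 0.1, 0.2, 3.0], toy_crit)))
# [(1, 4), (2, 4), (3, 4)]
```

Replacing `crit` with critical values that shrink the level for smaller subranges would give a rule of the same form with different thresholds, which is the contrast with Tukey-Welsch drawn on the slide.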

24
BH and NK
  • I'll present some comparisons of these two
    FDR-controlling methods.
  • Significant pairwise comparisons are ordered
    differently in these, since BH is based on
    individual pairwise p-values, and NK is a
    multiple-range-based method. This makes the
    comparison of cluster outcomes especially
    interesting.
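
For contrast with the range-based rule, here is a minimal sketch of the standard BH step-up procedure applied to a list of pairwise p-values. The function name and the example p-values are illustrative; the level q = 0.05 matches the nominal level used in the study.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Standard BH step-up: return a list of booleans (True = rejected).

    Finds the largest rank k with p_(k) <= k * q / m and rejects the
    hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Illustrative p-values: the third-smallest passes its threshold,
# so the two smaller ones are rejected as well (the step-up property).
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
# [True, True, True, False]
```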

25
  • BH: The FWER increases with the number of
    populations being compared.
  • NK: Besides apparent FDR control, NK has the
    property that the FWER is
    controlled at the nominal level α within each
    cluster. Thus either method can have the larger
    FWER, depending on the number of populations and
    the number of clusters.

26
True clusters (K-1)(1)
  • BH apparently controls FDR according to
    simulation results. NK controls FDR, since it
    controls FWER in this case.
  • With one true outcome cluster, it must be a pure
    true cluster. With two true outcome clusters,
    there may be one or two pure true clusters.

27
True clusters (K-1)(1)
  • Simulation results indicate that there are more
    true outcome clusters and pure true outcome
    clusters with NK than with BH through most of the
    range, and the difference is greater with pure
    true outcome clusters. (When there are 1 or 2
    means in each cluster, every true outcome cluster
    is a pure true outcome cluster.)

28-30
(Figure slides; no transcript available)

31
Two clusters, more than 1 mean in each
  • The following slides show results for clusters
    (2)(2) and (2)(4).

32-34
(Figure slides; no transcript available)

35
False cluster rate
  • The false cluster rate seems to be generally
    higher for NK than for BH, and in fact can get
    higher than might be desired for both. The worst
    case is that in which there are two means in each
    cluster, since then one Type I error may result
    in two false clusters, while that can't happen
    with more than two means in a cluster.

36-40
(Figure slides; no transcript available)

41
Summary
  • Gave the background for an interest in separating
    populations into clusters and previous ways of
    formulating the problem.
  • Described new measures of population separation.
  • Compared Newman-Keuls and Benjamini-Hochberg
    methods on these measures in two-cluster examples.

42
Further work
  • More combinations of numbers of clusters and
    numbers of means within clusters will be
    examined.
  • FWER-controlling methods will be compared among
    themselves and with FDR-controlling methods.
  • F-type measures will be added.
  • Nonparametric versions of the various methods
    will be examined.
  • Proofs of properties will be extended if possible.