Phylogenetic supertrees: seeing the data for the trees presentation

1
Phylogenetic supertrees: seeing the data for the trees

Olaf R. P. Bininda-Emonds
Technische Universität München

2
Outline

the fundamental issue: characters versus trees
open questions: are trees data?
loss of contact with primary character data
loss of information
novel solutions
data duplication
the nature of supertrees
analytical issues
conclusions
are supertrees a valid phylogenetic technique?

3
The fundamental issues
4
The basic distinction

Supertrees
source data: phylogenies
basic unit: membership criterion / statement of relationship
at best, can be viewed as a proxy for a shared derived character

Conventional studies
source data: measurable attribute of an organism
basic unit: character
can be viewed as a putative statement of relationship

5
The fundamental issue

supertrees combine trees, not real data
has led to many criticisms of supertree
construction
but also lends advantages to the approach

6
Supertree construction
7
Supertree methods

Direct
strict consensus supertrees
MinCutSupertree (and variants)
semi-strict supertrees
Lanyon (1993)
Goloboff and Pol (2002)

Indirect
most matrix representation (MR) supertrees
parsimony (MRP and variants)
compatibility (MRC)
minimum flip supertrees (MRF)
average consensus (MRD)
gene tree parsimony
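
The matrix representation step shared by the indirect methods can be sketched as follows. This is a minimal illustration of standard clade-membership coding, not the presenter's own code: each non-root clade of a source tree becomes one binary character, scored 1 for members, 0 for other taxa on that tree, and ? for taxa the tree lacks. The nested-tuple tree format and the function names are assumptions made for the example, and the all-zero outgroup that MRP analyses usually add is omitted.

```python
from typing import Dict, List, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def leaves(tree: Tree) -> Set[str]:
    """Collect the taxon names found at the tips of a tree."""
    if isinstance(tree, str):
        return {tree}
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree: Tree) -> List[Set[str]]:
    """Return every internal clade except the root, which is uninformative."""
    found: List[Set[str]] = []

    def walk(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def matrix_representation(source_trees: List[Tree]) -> Dict[str, str]:
    """Code each clade of each source tree as one column of 1/0/? states."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    matrix[taxon] += "?"   # taxon missing from this source tree
                elif taxon in clade:
                    matrix[taxon] += "1"   # member of the clade
                else:
                    matrix[taxon] += "0"   # on the tree, outside the clade
    return matrix


example = matrix_representation([(("A", "B"), "C"), (("B", "C"), "D")])
```

For the two small trees above, `example` maps A to `1?`, B to `11`, C to `01`, and D to `?0`. The resulting matrix would then be handed to a parsimony, compatibility, or flipping analysis, depending on the method.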

8
Are trees data?
9
Open questions

loss of contact with raw (character) data
loss of information
novel solutions
data duplication
the nature of supertrees: consensus or phylogenetic hypothesis?
analytical issues

10
Loss of information

a tree is a graphical representation of the
primary signal in a character-based data set
strength of primary signal can be measured (e.g.,
bootstrap frequencies)
but information regarding nature of any
conflicting subsignals lost

11
Potential problems

all trees and clades on them have equal support a
priori
prevents signal enhancement (sensu de Queiroz
et al., 1995) in combined data sets
coherent subsignals in different data partitions,
when combined, outweigh conflicting primary
signals
throwing away of information should cause a
supertree analysis to be less accurate than a
total evidence one, where primary data are
combined

12
No loss of accuracy

simulation studies indicate loss of information
is not detrimental
MRP (and variants) (Bininda-Emonds and Sanderson,
2001)
average consensus (Lapointe and Levasseur, 2001)
both methods perform about on a par with total
evidence analyses of primary character data
and show similar behaviour to total evidence
analyses

13
Maximizing contact

weighting according to evidential support in
source trees
possible for all MR methods, average consensus,
and MinCutSupertree (and gene tree parsimony?)
causes MRP to outperform total evidence analyses
of primary character data in simulation
(Bininda-Emonds and Sanderson, 2001)
bootstrapping of primary character data
both non-parametric (Moore et al., in prep) and
parametric versions (Huelsenbeck et al., in prep)
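
One way to carry source-tree support into a matrix representation analysis is to attach a weight to each coded clade. The sketch below is only an assumption about how such weighting could be set up: the input format (a taxon set plus a mapping from clades to support values such as bootstrap frequencies) and the function name are hypothetical, and the weighted characters would still need to be passed to a parsimony program that honours character weights.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

SourceTree = Tuple[Set[str], Dict[FrozenSet[str], float]]


def weighted_mr_characters(
    source_trees: List[SourceTree], all_taxa: List[str]
) -> List[Tuple[Dict[str, str], float]]:
    """Return one (column, weight) pair per supported clade."""
    characters = []
    for tree_taxa, clade_support in source_trees:
        for clade, support in clade_support.items():
            column = {}
            for taxon in all_taxa:
                if taxon not in tree_taxa:
                    column[taxon] = "?"   # taxon absent from this source tree
                elif taxon in clade:
                    column[taxon] = "1"
                else:
                    column[taxon] = "0"
            # the clade's support value becomes the character weight
            characters.append((column, support))
    return characters


demo = weighted_mr_characters(
    [({"A", "B", "C"}, {frozenset({"A", "B"}): 0.95})],
    ["A", "B", "C", "D"],
)
```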

14
Non-parametric bootstrapped supertrees
[Figure: flow diagram with the labels "original data", "bootstrapped source trees", "consensus of supertrees", and "bootstrapped supertree"]

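
The labels on this slide suggest a workflow along the following lines. This is a hedged sketch of one plausible reading, not necessarily the procedure of the cited in-prep studies: the characters of each primary data set are resampled with replacement, a source tree is built from each pseudoreplicate, and a supertree is built from each set of source trees. The callables `build_tree` and `build_supertree` are hypothetical placeholders for whatever tree-building and supertree software is used, and the final consensus step is left to the caller.

```python
import random
from typing import Callable, Dict, List

Alignment = Dict[str, str]  # taxon name -> character string, all of equal length


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample character columns with replacement (non-parametric bootstrap)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


def bootstrapped_supertrees(
    datasets: List[Alignment],
    build_tree: Callable,        # hypothetical: alignment -> source tree
    build_supertree: Callable,   # hypothetical: list of source trees -> supertree
    replicates: int,
    seed: int = 0,
) -> List:
    """Return one supertree per bootstrap replicate."""
    rng = random.Random(seed)
    supertrees = []
    for _ in range(replicates):
        # every data set is resampled independently within a replicate
        source_trees = [build_tree(bootstrap_alignment(d, rng)) for d in datasets]
        supertrees.append(build_supertree(source_trees))
    return supertrees
```

The frequency with which a clade appears among the returned supertrees could then serve as its support value on a consensus of those supertrees.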
15
Open questions

loss of contact with raw (character) data
loss of information
novel solutions
data duplication
the nature of supertrees: consensus or phylogenetic hypothesis?
analytical issues

16
Novel clades

all supertree methods have the potential to yield
novel statements
relationships between taxa that do not co-exist
on any single source tree (sensu Sanderson et
al., 1998)
defining characteristic of method

17
Unsupported clades

some supertree methods have the potential to make
statements that are not only novel, but also
contradicted (unsupported) by every source tree
violation of a weaker form of co-Pareto property
co-Pareto: a relationship of a given kind in the consensus is present in at least one input tree
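
A rough check in the spirit of this property can be sketched as follows. It rests on a simplifying assumption that is not taken from the slides: a supertree clade counts as supported if, after pruning it to the taxa of some source tree, the remaining group of at least two taxa appears as a clade on that tree. The data layout (each source tree as a taxon set plus a set of clades) and the function name are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

SourceTree = Tuple[FrozenSet[str], Set[FrozenSet[str]]]  # (taxa, clades)


def is_supported(clade: FrozenSet[str], source_trees: List[SourceTree]) -> bool:
    """True if at least one source tree displays the clade on its own taxa."""
    for taxa, tree_clades in source_trees:
        restricted = frozenset(clade & taxa)
        if len(restricted) < 2 or restricted == taxa:
            continue  # this source tree says nothing about the clade
        if restricted in tree_clades:
            return True
    return False


trees = [
    (frozenset("ABC"), {frozenset("AB")}),
    (frozenset("BCD"), {frozenset("BC")}),
]
novel_but_supported = is_supported(frozenset("ABD"), trees)
unsupported = not is_supported(frozenset("AC"), trees)
```

In the example, the clade {A, B, D} is novel (no source tree contains all three taxa) yet supported, because pruning it to the first tree's taxa gives {A, B}. The clade {A, C} finds no support on either tree.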

from Goloboff and Pol (2002)

19
Comparing supertree methods

indirect, optimization-based methods seem more
prone to producing unsupported clades

apparently prone:
MRP (and variants)
MRF?
average consensus?
MinCutSupertree (and variants)?
gene tree parsimony?

apparently not prone:
strict consensus supertrees
semi-strict supertrees
MRC

20
Questions: unsupported clades

how should they be treated?
how common are they?

21
Appropriateness

Conventional studies
unsupported clades (at level of resulting trees)
arise via signal enhancement
have direct character support in the combined
matrix

Supertrees
subsignals are invisible
unsupported clades lack any support among source trees → should be regarded as spurious (Pisani and Wilkinson, 2002)
not equivalent to signal enhancement

23
Incidence of unsupported clades

circumstantial evidence hints that they are rare
only a few reported in the literature
theoretical: Goloboff and Pol (2002); Wilkinson et al. (2001)
empirical: Bininda-Emonds and Bryant (1998); Wilkinson et al. (2001)
estimated that 8 of the 198 clades in the carnivore MRP supertree (about 4%) had no support among the source trees (Bininda-Emonds et al., 1999)
dinosaur MRP supertree (Pisani et al., 2002) has
no unsupported clades

24
Unsupported clades are very rare

simulation results (MRP only)
occur most often with source trees that are
few in number (n 5)
large in size (up to 50 taxa)
possess identical taxon sets (consensus
setting)
"most often" means < 0.21% of all simulated clades
overall incidence was 131 of 282 137 clades (< 0.05%)
empirical results
both the carnivore and lagomorph MRP supertrees
have no unsupported clades whatsoever
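
The incidence figures quoted on this slide and the previous one can be checked with simple arithmetic. The snippet assumes the bracketed values in the transcript are percentages; the function name is made up for the example.

```python
def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# 8 of 198 clades in the carnivore MRP supertree: roughly 4%
carnivore = percent(8, 198)

# 131 of 282 137 simulated clades: below 0.05%
simulated = percent(131, 282137)
```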

25
Open questions

loss of contact with raw (character) data
loss of information
novel solutions
data duplication
the nature of supertrees: consensus or phylogenetic hypothesis?
analytical issues

26
Data duplication

character data are often recycled between
phylogenetic analyses
e.g., total evidence analyses, molecular studies
of the same gene
the same character data may contribute to more
than one source tree
overrepresented in a supertree analysis → data duplication
also violates assumption of data independence

[Figure: data duplication among cetartiodactyl source trees in the Liu et al. (2001) mammal order MRP supertree; from Gatesy et al. (2002)]

28
Minimizing duplication

data duplication a potential problem for all
supertree methods
use of trees does not reveal directly source of
underlying data set
but can be minimized / avoided with careful data
collection protocols
e.g., supertrees of Daubin et al. (2001) and
Kennedy and Page (2002) lack data duplication

29
Is data duplication unavoidable?

no phylogenies are independent given a single
Tree of Life
all characters and data sources have been subject
to the same evolutionary processes and history
want to combine phylogenetic hypotheses that can
reasonably be viewed as being independent

30
Is the problem overrated?

supertrees combine phylogenetic hypotheses
emergent property → composed of more than their raw character data
manipulation of data (weighting, alignment,
recoding)
method and assumptions of analysis
for example
strongly conflicting molecular phylogenies for
whales can be explained largely by the choice of
outgroup (Messenger and McGuire, 1998)
alignment and weighting of primary data also
important

31
Is data duplication overrated?

data duplication is often only partial
most combined data sets represent unique
combinations of individual data sets
easy to deal with data sets that are supersets of
others
signal enhancement means that each unique
combination could justifiably be viewed as an
independent hypothesis
also independent from constituent data sets
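
A data collection protocol of this kind could be screened automatically once each source tree is annotated with the data sets behind it. The sketch below is an assumption about how that might look, not a described protocol: the labels such as `cytb` are invented, and the functions simply drop source trees whose data are a proper subset of another tree's data and list the remaining partial overlaps.

```python
from typing import Dict, FrozenSet, List, Tuple

Sources = Dict[str, FrozenSet[str]]  # source tree name -> underlying data sets


def drop_subsumed(sources: Sources) -> Sources:
    """Keep only trees whose data are not a proper subset of another tree's data."""
    kept = {}
    for name, data in sources.items():
        subsumed = any(
            data < other                      # proper subset test
            for other_name, other in sources.items()
            if other_name != name
        )
        if not subsumed:
            kept[name] = data
    return kept


def overlapping_pairs(sources: Sources) -> List[Tuple[str, str, FrozenSet[str]]]:
    """List every pair of trees that share at least one data set."""
    names = sorted(sources)
    return [
        (a, b, sources[a] & sources[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if sources[a] & sources[b]
    ]


studies = {
    "tree1": frozenset({"cytb"}),
    "tree2": frozenset({"cytb", "12S"}),
    "tree3": frozenset({"morphology"}),
}
independent = drop_subsumed(studies)
shared = overlapping_pairs(studies)
```

Trees built from identical data sets are both kept by `drop_subsumed`, since neither is a proper subset of the other; choosing between them remains a judgment call.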

32
Are supertrees unfairly singled out?

data duplication also exists in conventional
studies (but less obviously so and to a lesser
known extent)
morphological: single features often described by multiple characters
molecular: secondary structure (e.g., stems in tRNA, protein folding) and codon structure mean primary mutations may require secondary compensatory ones
total evidence: mixing of phenotypic and genotypic data must represent data duplication at some level

33
Open questions

loss of contact with raw (character) data
loss of information
novel solutions
data duplication
the nature of supertrees: consensus or phylogenetic hypothesis?
analytical issues

34
The nature of supertrees

is the supertree itself a legitimate phylogenetic
hypothesis?
many would say no, arguing instead that they
are a
form of consensus
historical summary of systematic effort
therefore, supertrees should not be used to
answer biological questions

35
Supertrees as consensus

association derives from
similar methodology (combining trees rather than
data)
both containing polytomies
resulting topologies may be suboptimal given
underlying data
why are consensus trees not valid phylogenetic
hypotheses?
especially if polytomies viewed as soft rather
than hard

36
Dealing with incongruence

all supertree methods must somehow deal with
incongruence among source trees
ignore it: strict consensus, semi-strict, MinCutSupertree, MRC
fix it: MRF
explain it biologically: gene tree parsimony
optimize it: average consensus and MRP

37
Incongruence as homoplasy

a repeated criticism of MRP is that inferred
homoplasy on supertree has no biological meaning
convergence and reversals meaningless with
respect to a membership criterion
but why is MRP singled out?
similar arguments should apply at least to
average consensus

38
Parsimony and parsimony

Principle of parsimony
a criterion for deciding among scientific
theories or explanations
"Plurality should not be posited without necessity" → choose the simplest explanation of a phenomenon

Cladistic parsimony
specific application of principle of parsimony
prefer the tree with the fewest number of
evolutionary steps (i.e., character state
changes)
additional changes over minimum number represent
homoplasy
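
Counting steps and extra steps for a single character can be illustrated with the Fitch algorithm for unordered states on a rooted binary tree. This is a minimal sketch under stated assumptions: the nested-tuple tree format and function names are invented for the example, and the minimum number of steps is taken to be the number of observed states minus one.

```python
from typing import Dict, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a pair of subtrees


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a binary tree."""
    steps = 0

    def state_set(node: Tree) -> Set[str]:
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:
            return shared        # children agree: no change needed here
        steps += 1               # children disagree: one change is required
        return left | right

    state_set(tree)
    return steps


def extra_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Steps beyond the minimum possible for the observed states."""
    minimum = len(set(states.values())) - 1
    return fitch_steps(tree, states) - minimum


tree = ((("A", "B"), "C"), "D")
congruent = {"A": "1", "B": "1", "C": "0", "D": "0"}
incongruent = {"A": "1", "B": "0", "C": "1", "D": "0"}
```

On this tree the first character needs one step and has no extra steps, while the second needs two steps, one more than the minimum. In an MRP analysis the same count would apply to matrix elements coding source-tree clades rather than to organismal characters.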

39
Homoplasy and supertrees

notions of homoplasy, convergence, and reversals
have nothing to do with parsimony per se
or really even with cladistic parsimony
post hoc biological interpretation of
incongruence
incongruence on an MRP supertree is simply
incongruence
idea of homoplasy in this context is
epistemologically, not biologically meaningless

40
Open questions

loss of contact with raw (character) data
loss of information
novel solutions
data duplication
the nature of supertrees: consensus or phylogenetic hypothesis?
analytical issues

41
Limitations of total evidence

analytical limitations of combined primary data
sets also result in a loss of information
data must be compatible
use of a single optimization criterion → usually MP, but ML now also possible
some data still not analyzable under either
framework (e.g., DNA-DNA hybridization,
morphometric data)
use of simplistic models of evolution
MP: differential weighting (including ti/tv ratio)
ML: same model for every partition
alignment problems

42
Advantages to supertrees

no loss of information: all phylogenetic hypotheses can be combined
even those that aren't based on any data
process amounts to partitioned analyses
each partition can be analyzed according to most
appropriate model of evolution, and optimization
criterion
can be done in parallel
results then combined with little loss of
accuracy
or hopefully less than the loss of information that a total evidence analysis entails

43
A phylogeny of mammals

The superteam
have complete supertrees for
Carnivora
Chiroptera
Insectivora
Lagomorpha
Marsupialia
Primates
total of 1923 species (41.5%)

Molecular data
Murphy et al. (2001a)
9779 bp from 18 genes for 64 species
Madsen et al. (2001)
8655 bp from 4 genes for 82 species
Murphy et al. (2001b)
16 397 bp from 22 genes for 44 species (< 1%)

44
Summary
45
Whither supertrees?

criticisms of supertree construction have been
launched at two levels
at the supertree approach as a whole
at individual supertree methods

46
Of approaches

supertree problem inherently difficult because of
missing data
results in the lack of a single right answer

47
Of approaches

trees are data
potential loss of information not detrimental
key is to think in terms of phylogenetic
hypotheses
still awaiting a response from the cladistic
community

48
and methods

any method will go astray if its assumptions are violated
e.g., parsimony and long-branch attraction,
likelihood and wrong model, regression and data
non-independence
for supertrees, key is to try and establish
what each method's boundary conditions are
how robust each method is to violations of its
assumptions
what the properties of each method are (in
relation to our desired objective)
