Title: Tricks for trees: Having reconstructed phylogenies what can we do with them?
1Tricks for trees: Having reconstructed phylogenies, what can we do with them?
Mike Steel
Allan Wilson Centre for Molecular Ecology and Evolution
Biomathematics Research Centre
University of Canterbury, Christchurch, New Zealand
DIMACS, June 2006
2Where are phylogenetic trees used?
- Evolutionary biology: species relationships, dating divergences, speciation processes, molecular evolution.
- Ecology: classifying new species, biodiversity, co-phylogeny, migration of populations.
- Epidemiology: systematics, processes, dynamics.
- Extras: linguistics, stemmatology, psychology.
3Phylogenetic trees
- Definition: A phylogenetic X-tree is a tree T = (V, E) with a set X of labelled leaves, and all other vertices unlabelled and of degree ≥ 3.
- If all non-leaf vertices have degree 3, then T is binary.
4Trees and splits
[Figure: tree with leaves labelled 1 to 6]
Partial order
Buneman's Theorem
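The slide does not spell the theorem out. One common reading is that a collection of splits of X comes from a tree exactly when the splits are pairwise compatible, where A|B and C|D are compatible if at least one of A∩C, A∩D, B∩C, B∩D is empty. The following is a minimal Python sketch of the easy direction under that reading: it lists the splits of a tree and checks pairwise compatibility. The edge-list representation, the function names, and the example tree are assumptions made for illustration only.

```python
from itertools import combinations


def splits_of_tree(edges):
    """Return the splits induced by deleting each edge of an unrooted tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Leaves are the vertices of degree 1.
    leaves = frozenset(v for v, nbrs in adj.items() if len(nbrs) == 1)

    def leaves_on_side(start, blocked):
        # Leaves reachable from `start` without crossing the edge to `blocked`.
        seen, stack, found = {start, blocked}, [start], set()
        while stack:
            v = stack.pop()
            if v in leaves:
                found.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return frozenset(found)

    result = set()
    for u, v in edges:
        side = leaves_on_side(u, v)
        result.add(frozenset({side, leaves - side}))
    return result


def compatible(s1, s2):
    """Splits A|B and C|D are compatible if some intersection of sides is empty."""
    return any(not (a & c) for a in s1 for c in s2)


def pairwise_compatible(splits):
    return all(compatible(s, t) for s, t in combinations(splits, 2))


# Hypothetical tree on leaves 1..6 with interior vertices u, v, w, x.
edges = [(1, "u"), (2, "u"), ("u", "v"), (3, "v"),
         ("v", "w"), (4, "w"), ("w", "x"), (5, "x"), (6, "x")]
tree_splits = splits_of_tree(edges)
print(pairwise_compatible(tree_splits))
```

In this example the split 13|2456 is not a split of the tree, and it is incompatible with the tree's split 12|3456.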
5Quartet trees
- A quartet tree is a binary phylogenetic tree on 4 leaves (say, x, y, w, z), written xy|wz.
[Figure: quartet tree xy|wz]
- A phylogenetic X-tree T displays xy|wz if there is an edge in T whose deletion separates {x, y} from {w, z}.
[Figure: tree with leaves x, y, w, r, z, u, s]
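The "displays" condition can be checked directly from the splits of a tree: xy|wz is displayed when some split has x, y on one side and w, z on the other. A minimal Python sketch follows; the function name, the pair-of-sets encoding of a split, and the five-leaf example tree are assumptions made for illustration.

```python
def displays(splits, quartet):
    """True if some split A|B separates {x, y} from {w, z}."""
    (x, y), (w, z) = quartet
    for a, b in splits:
        if ({x, y} <= a and {w, z} <= b) or ({x, y} <= b and {w, z} <= a):
            return True
    return False


# Non-trivial splits of a hypothetical caterpillar tree on leaves a, b, c, d, e.
caterpillar = [
    ({"a", "b"}, {"c", "d", "e"}),
    ({"a", "b", "c"}, {"d", "e"}),
]

print(displays(caterpillar, (("a", "b"), ("d", "e"))))
print(displays(caterpillar, (("a", "d"), ("b", "e"))))
```

Only the non-trivial splits matter here, because a trivial split has a single leaf on one side and cannot separate two pairs.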
6Corresponding notions for rooted trees
- Clusters (in place of splits)
- Triples (in place of quartets)
7How are trees useful in epidemiology?
- Systematics and reconstruction
- How are different types/strains of a virus related?
- When, where, and how did they arise?
- What is their likely future evolution?
- What was the ancestral sequence?
8How are trees useful in epidemiology?
- Processes and dynamics (Phylodynamics)
- How do viruses change with time in a population? Population size, etc.
- What is their rate of mutation, recombination, selection?
- Within-host dynamics
- How do viruses evolve in a single patient?
- How is this related to the progression of the disease?
- How much compartmental variation exists?
10What do the shapes of these trees tell us about the processes governing their evolution? E.g. population dynamics, selection
Coalescent prediction
11Tree shapes (non-metric)
George Yule
13Why do trees on the same taxa disagree?
- Model violation
- true model differs from assumed model
- true model = assumed model, but estimation method not appropriate to model
- model true but too parameter-rich (non-identifiability)
- Sampling error (and factors that make it worse!)
- Alignment error
- Evolutionary processes
- Lineage sorting
- Recombination
- Horizontal gene transfer, hybrid taxa
- Gene duplication and loss
14Sampling error that's hard to deal with
[Figure: tree over time with sequences at the leaves; labels T1, T2, T3, T4, Time, e, ?]
15Example: Deep divergence in the Metazoan phylogeny
From Huson and Bryant, 2006
16Models
[Figure: two four-leaf trees on leaves 1, 2, 3, 4, compared (vs)]
Finite state Markov process
17Models
[Figure: two four-leaf trees on leaves 1, 2, 3, 4, compared (vs)]
- site saturation
- subdividing long edges only offers a partial remedy (trade-off).
18Why do trees on the same taxa disagree?
- Model violation
- true model differs from assumed model
- true model = assumed model, but estimation method not appropriate to model
- model true but too parameter-rich (non-identifiability)
- Sampling error (and factors that make it worse!)
- Alignment
- Evolutionary processes
- Lineage sorting
- Recombination
- Horizontal gene transfer, hybrid taxa
- Gene duplication and loss
19Gene trees vs species trees
[Figure: gene tree and species tree on taxa a, b, c]
- Theorem (J. H. Degnan and N. A. Rosenberg, 2006): For n ≥ 5, for any tree, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree.
- Degnan, J. H., and N. A. Rosenberg. 2006. Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5): e68.
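For contrast with the theorem, the three-taxon case is easy to compute. Under the standard coalescent (a textbook result, not something stated on the slide), a species tree ((a,b),c) with internal branch length t in coalescent units gives the matching gene tree with probability 1 - (2/3)e^(-t), and each of the two discordant gene trees with probability (1/3)e^(-t). A minimal sketch, with a hypothetical function name:

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree probabilities for species tree ((a,b),c).

    t is the internal branch length in coalescent units (an assumption of
    this sketch; the slide gives no units).
    """
    discordant = math.exp(-t) / 3.0
    return {
        "((a,b),c)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((a,c),b)": discordant,
        "((b,c),a)": discordant,
    }


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, three_taxon_gene_tree_probs(t))
```

With three taxa the matching gene tree is never less probable than a discordant one, so the behaviour described in the theorem needs more taxa.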
20Example
[Figure: tree of Orangutan, Human, Gorilla, Chimpanzee with a node marked "?"]
Adapted from the Tree of Life Website, University of Arizona
21Distinguishing between signals
- Lineage sorting vs sampling error vs HGT
[Figure: three trees on taxa A, B, C, with leaf orders A B C, A B C, and A C B]
22Why do trees on the same taxa disagree?
- Model violation
- true model differs from assumed model
- true model = assumed model, but estimation method not appropriate to model
- model true but too parameter-rich (non-identifiability)
- Sampling error (and factors that make it worse!)
- Alignment
- Evolutionary processes
- Lineage sorting
- Recombination
- Horizontal gene transfer, hybrid taxa
- Gene duplication and loss
23Given a tree, what questions might we want to answer?
- How reliable is a split?
- Where is the root of the tree? Relative ranking of vertices? Dating?
- How well supported (or resolved) is some deep divergence?
- What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.)
- Statistical approaches
- Non-parametric bootstrap
- Parametric bootstrap
- Likelihood ratio tests
- Bayesian posterior probabilities
- Tests (KH, SH, SOWH)
- Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652-670.
24From Steve Thompson, Florida State Uni
25Example
26Non-parametric bootstrap
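The slide text is not in the transcript. In the usual formulation, the non-parametric bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment, and reports how often each split reappears. A minimal sketch of the resampling step only; the dictionary representation and the toy alignment are assumptions, and the tree-building step is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment maps taxon name -> aligned sequence (all of equal length).
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Hypothetical toy alignment.
alignment = {"a": "ACGTAC", "b": "ACGTTC", "c": "TCGAAC"}
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)
```

Support for a split would then be the fraction of replicates whose reconstructed tree contains it, using whichever tree-building method was applied to the original data.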
28Dealing with incompatibility: Consensus trees
- Strict
- Majority rule
- Semistrict consensus
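Strict and majority-rule consensus can both be described by counting splits: strict keeps the splits found in every input tree, majority rule keeps those found in more than half. A minimal sketch, assuming each tree is given as a set of non-trivial splits and each split is encoded by the side that omits a fixed reference taxon (here "e"); the encoding, names, and example trees are illustrative assumptions. Semistrict consensus is not covered.

```python
from collections import Counter


def splits_above(trees, threshold):
    """Splits occurring in more than `threshold` (a fraction) of the trees."""
    counts = Counter(s for tree in trees for s in tree)
    return {s for s, c in counts.items() if c > threshold * len(trees)}


def strict_consensus(trees):
    """Splits present in every tree."""
    return set.intersection(*(set(t) for t in trees))


def majority_rule(trees):
    return splits_above(trees, 0.5)


fs = frozenset
# Three hypothetical trees on taxa a..e.
t1 = {fs("ab"), fs("abc")}
t2 = {fs("ab"), fs("abd")}
t3 = {fs("ab"), fs("abc")}
trees = [t1, t2, t3]

print(strict_consensus(trees))
print(majority_rule(trees))
```

Splits found in more than half of the trees are pairwise compatible, so the majority-rule output always fits on a tree. With a lower threshold that is no longer guaranteed.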
29Consensus networks
- Take the splits that are in at least x% of the trees and represent them by a graph
- Splits graph G(S) (Dress and Huson)
- Each split is represented by a class of parallel edges
- Simplest example (n = 4).
30[Figure: chloroplast JSA tree; leaves annotated (NS), (SS), (A), (N), (C,S), (N,NS), (NS,N)]
31[Figure: nuclear ITS tree; leaves annotated (SS), (NS), (A), (N), (NS,N), (SS,NS)]
32Consensus network (ITS tree + JSA tree)
[Figure: consensus network with groups I, II, III and R. nivicola labelled]
33Maximum agreement subtrees
- Concept
- Computational complexity
34Comparing trees
- Splits metric (Robinson-Foulds)
- Statistical aspects.
- Tree rearrangement operations: the graph of trees (rSPR).
- Cophylogeny
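The splits (Robinson-Foulds) metric counts the splits that are in exactly one of two trees on the same leaf set. A minimal sketch, assuming each tree is given as a set of splits encoded by the side that omits a fixed reference taxon (here "e"); the encoding and the example trees are illustrative assumptions.

```python
def robinson_foulds(splits1, splits2):
    """Number of splits found in exactly one of the two trees."""
    return len(set(splits1) ^ set(splits2))


fs = frozenset
# Non-trivial splits of two hypothetical trees on taxa a..e.
tree1 = {fs("ab"), fs("abc")}
tree2 = {fs("ab"), fs("abd")}

print(robinson_foulds(tree1, tree2))
```

Some authors halve this count. Trivial splits are shared by all trees on the same leaves, so leaving them out does not change the distance.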
35Co-phylogeny (M. Charleston)
36Supertrees
- Compatibility concept
- Compatibility of rooted trees (BUILD)
- Why do we want to do this?
- Extension: higher-order taxa, dates
- Methods for handling incompatible trees
- (MRP, mincut variants, minflip)
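BUILD decides whether a set of rooted triples is compatible and, if so, returns a tree that displays them. For the current leaf set it joins a and b for every triple ab|c that lies inside the set. If the resulting graph is connected there is no tree; otherwise it recurses on the connected components. A minimal sketch; the nested-tuple output, the (a, b, c) encoding of ab|c, and the example triples are assumptions made for illustration.

```python
def build(leaves, triples):
    """Return a nested-tuple rooted tree displaying all triples, or None.

    Each triple (a, b, c) stands for ab|c: a and b are closer to each
    other than either is to c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triples whose three leaves all lie in the current leaf set.
    relevant = [t for t in triples if set(t) <= leaves]
    # Connected components of the graph with an edge a-b for each ab|c.
    comp = {x: {x} for x in leaves}
    for a, b, _ in relevant:
        if comp[a] is not comp[b]:
            merged = comp[a] | comp[b]
            for x in merged:
                comp[x] = merged
    components = {frozenset(c) for c in comp.values()}
    if len(components) == 1:
        return None  # graph is connected: the triples are incompatible
    subtrees = []
    for c in sorted(components, key=sorted):
        sub = build(c, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


compatible_triples = [("a", "b", "c"), ("c", "d", "a")]
conflicting_triples = [("a", "b", "c"), ("a", "c", "b")]

print(build("abcd", compatible_triples))
print(build("abc", conflicting_triples))
```

The tree returned is one of possibly many that display the input. Methods such as mincut supertrees modify the step where this sketch gives up.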
37Compatibility
A set Q of quartets is compatible if there is a phylogenetic X-tree T that displays each quartet of Q.
- Example: Q = {12|34, 13|45, 14|26}
Complexity?
38Supertrees
- Compatibility concept
- Compatibility of rooted trees (BUILD)
- Why do we want to do this?
- Extension: higher-order taxa, dates
- Methods for handling incompatible trees
- (MRP, mincut variants, minflip)
39Phylogenetic networks
- Consensus setting: consensus networks
- Minimizing hybrid/reticulate vertices
- Supernetworks: Z-closure, filtering
40[Figure: trees and a network on leaves a, b, c, d]
- Networks can represent
- Reticulate evolution (e.g. hybrid species)
- Phylogenetic uncertainty (i.e. possible alternative trees)
- Z-closure: Given T1, ..., Tk on overlapping sets of species,
- let S be the set of (partial) splits induced by T1, ..., Tk,
- construct spcl2(S) and construct the splits graph of the resulting splits that are full.
41Split closure operation (Meacham 1986)
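The formula for this slide is not in the transcript. Assuming the operation meant is the Z-rule used in Z-closure, it acts on two partial splits A1|B1 and A2|B2: if A1∩A2, A2∩B1 and B1∩B2 are all non-empty and A1∩B2 is empty, the pair is replaced by A1|(B1∪B2) and (A1∪A2)|B2. A minimal sketch of one application; the pair-of-frozensets encoding and the function name are assumptions.

```python
def z_rule(s1, s2):
    """Apply the Z-rule to two partial splits, trying every orientation.

    Each split is a pair (A, B) of disjoint frozensets. Returns the two
    extended splits, or None when the rule does not apply.
    """
    for a1, b1 in (s1, s1[::-1]):
        for a2, b2 in (s2, s2[::-1]):
            if a1 & a2 and a2 & b1 and b1 & b2 and not (a1 & b2):
                return (a1, b1 | b2), (a1 | a2, b2)
    return None


fs = frozenset
s1 = (fs("ab"), fs("cd"))  # ab|cd, from a tree without taxon e
s2 = (fs("bc"), fs("de"))  # bc|de, from a tree without taxon a

print(z_rule(s1, s2))
```

In this example the two partial splits extend to ab|cde and abc|de, both full splits of {a, b, c, d, e}. A closure would repeat the rule over all pairs until nothing changes.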
44Reconstructing ancestral sequences
- Methods (MP, Likelihood, Bayesian)
- Quiz: For a balanced tree, does MP return the majority state?
- Information-theoretic considerations
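For the MP method, a standard way to obtain the candidate root states for one character on a rooted binary tree is the bottom-up pass of Fitch's algorithm. A minimal sketch; the slide does not name an algorithm, so the choice of Fitch, the nested-tuple tree, and the example states are assumptions.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass for one character.

    tree is a leaf name or a pair (left, right); states maps leaf -> state.
    Returns (state set at the root of `tree`, parsimony score).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Balanced four-leaf tree with hypothetical binary states.
balanced = (("a", "b"), ("c", "d"))
print(fitch(balanced, {"a": 0, "b": 0, "c": 0, "d": 1}))
print(fitch(balanced, {"a": 0, "b": 0, "c": 1, "d": 1}))
```

Running the same function on larger balanced trees is one way to experiment with the quiz question.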
45Statistics of parsimony (clustering on a tree)