Title: The multispecies coalescent: implications for inferring species trees
Slide 1: The multispecies coalescent: implications for inferring species trees
- James Degnan
- 21 February 2008
Slide 2: Outline
- 1. Background
  - gene trees vs. species trees
  - coalescence and incomplete lineage sorting
- 2. Inferring species trees
  - concatenation
  - consensus trees
- 3. Conclusions
Slide 3: Population Genetics and Phylogenetics
- Population genetics has traditionally been used to analyze single populations.
Slide 4: Desirable properties of species tree estimators
1. Statistical consistency (as the number of sampled genes increases)
2. Efficiency
3. Robustness to violations of assumptions
Slide 5: Bridging the popgen/phylo divide
- "Incorporation of explicit models of lineage sorting will be needed for continued development of phylogenetic inference near the species level." (Maddison and Knowles, 2006)
- "Closer integration of population-genetic factors in phylogenetics, including further insights into gene-tree/species tree, and horizontal gene transfer." From Mike Steel's website, "My pick for five directions in phylogenetics that will grow in the next five years" (2006).
Slide 6: The coalescent process
Slide 7: One population
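The single-population coalescent can be sketched in a few lines. This is a minimal illustration, assuming time is measured in coalescent units so that each pair of lineages coalesces at rate 1. With k lineages the waiting time to the next coalescence is then exponential with rate k(k-1)/2, and the expected time to the most recent common ancestor of n lineages is 2(1 - 1/n). The function names are hypothetical.

```python
import random

def kingman_waiting_times(n, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain (coalescent units)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # each of the k-choose-2 pairs coalesces at rate 1
        times.append(rng.expovariate(rate))
    return times

def mean_tmrca(n, reps=20000, seed=0):
    """Monte Carlo estimate of the expected time to the MRCA of n lineages."""
    rng = random.Random(seed)
    return sum(sum(kingman_waiting_times(n, rng)) for _ in range(reps)) / reps

n = 5
print(mean_tmrca(n), "theory:", 2 * (1 - 1 / n))
```

With n = 5 the simulated mean should be close to the theoretical value 1.6; most of that time is spent waiting for the last two lineages to coalesce.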
Slide 8: Multiple populations/species
Slide 9: Gene tree in a species tree
Slide 10: Model species tree with gene tree
[Figure: species tree with an embedded gene tree; tips A, B, C, D]
The gene tree is a random variable. The gene tree
distribution is parameterized by the species tree
topology and internal branch lengths.
Slide 11: How can we compute probabilities of gene trees given species trees?
- Under a coalescent model, probabilities for gene trees with three species were derived by Nei (1987): the gene tree matches the species tree with probability $1 - \tfrac{2}{3}e^{-T}$, where $T$ is the internal branch length of the species tree in coalescent units.
- Probabilities for the gene tree to match the species tree topology for 4 and 5 species were given by Pamilo and Nei (1988).
- All 30 species tree/gene tree combinations for 4 species were given by Rosenberg (2002).
- The general case was solved by Degnan and Salter (2005) and implemented in the program COAL, which also allows multiple individuals to be sampled in species i.
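The three-species probabilities follow from a short argument, sketched here under the usual assumptions: one lineage sampled per species, species tree ((A,B),C), and internal branch length T in coalescent units. The A and B lineages fail to coalesce within the internal branch with probability e^{-T}. In that case all three lineages enter the root population, where each of the three pairs is equally likely to coalesce first.

```latex
\begin{aligned}
P\big(((A,B),C)\big) &= (1 - e^{-T}) + e^{-T}\cdot\tfrac{1}{3} = 1 - \tfrac{2}{3}e^{-T},\\
P\big(((A,C),B)\big) &= P\big(((B,C),A)\big) = \tfrac{1}{3}e^{-T}.
\end{aligned}
```

Both discordant gene trees are always less probable than the matching one, so three species never produce an anomalous gene tree.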
Slide 12: Definition. A coalescent history is a list of the populations in which each coalescent event occurs.
[Figure: gene tree within a four-species tree; tips A, B, C, D]
This coalescent history: (1,3,3)
Other coalescent histories: (2,3,3), (3,3,3)
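Coalescent histories can be enumerated recursively: each gene tree node must be placed in a population that is ancestral to (or equal to) the populations of both of its child lineages. This is an illustration only, assuming one lineage per species and a hypothetical pair of trees chosen so that the output reproduces the three histories listed on the slide. The species tree is (((A,B),C),D) with internal populations labelled 1 (ancestor of A and B), 2 (ancestor of A, B, C) and 3 (root). The gene tree is ((A,B),(C,D)), and histories are listed in the order (AB node, CD node, root).

```python
from itertools import product

# Hypothetical species tree (((A,B),C),D): each population maps to its parent.
SPECIES_PARENT = {"A": 1, "B": 1, "C": 2, "D": 3, 1: 2, 2: 3, 3: None}
# Hypothetical gene tree ((A,B),(C,D)) as nested tuples.
GENE_TREE = (("A", "B"), ("C", "D"))

def ancestors(pop):
    """Return pop and every population above it, from the bottom up."""
    chain = []
    while pop is not None:
        chain.append(pop)
        pop = SPECIES_PARENT[pop]
    return chain

def enumerate_histories(node):
    """Return (history, pop) pairs for the subtree rooted at node.

    history lists the population of each coalescence in post-order;
    pop is the population in which the subtree's root coalescence occurs."""
    if isinstance(node, str):  # leaf: a single lineage sampled from its species
        return [([], node)]
    left, right = node
    results = []
    for (h_left, p_left), (h_right, p_right) in product(
            enumerate_histories(left), enumerate_histories(right)):
        # the node must sit in a population ancestral to both child lineages
        allowed = set(ancestors(p_left)) & set(ancestors(p_right))
        for pop in ancestors(p_left):  # bottom-up, so output is ordered
            if pop in allowed:
                results.append((h_left + h_right + [pop], pop))
    return results

for history, _ in enumerate_histories(GENE_TREE):
    print(tuple(history))
```

The (C,D) coalescence can only happen in the root population 3, while (A,B) may occur in population 1, 2 or 3, which is why only the first entry varies.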
Slide 13: Gene tree probabilities
Slide 14: Gene tree probabilities
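Gene tree probabilities for larger trees are built from a basic quantity: the probability g_ij(T) that i lineages entering the bottom of a branch of length T (coalescent units) have coalesced to exactly j lineages by the top of the branch. This sketch uses a classical closed-form expression for that probability; the function names are hypothetical and it is not the COAL implementation.

```python
import math

def rising(a, k):
    """Rising factorial a(a+1)...(a+k-1), with rising(a, 0) = 1."""
    out = 1
    for m in range(k):
        out *= a + m
    return out

def falling(a, k):
    """Falling factorial a(a-1)...(a-k+1), with falling(a, 0) = 1."""
    out = 1
    for m in range(k):
        out *= a - m
    return out

def g(i, j, T):
    """P(i lineages coalesce to exactly j lineages within time T).

    g_ij(T) = sum_{k=j}^{i} exp(-k(k-1)T/2) (2k-1) (-1)^(k-j)
              * rising(j, k-1) * falling(i, k) / (j! (k-j)! rising(i, k))
    """
    if not 1 <= j <= i:
        return 0.0
    total = 0.0
    for k in range(j, i + 1):
        coef = ((2 * k - 1) * (-1) ** (k - j) * rising(j, k - 1) * falling(i, k)
                / (math.factorial(j) * math.factorial(k - j) * rising(i, k)))
        total += math.exp(-k * (k - 1) * T / 2) * coef
    return total

print(g(2, 1, 1.0), 1 - math.exp(-1.0))  # two lineages: 1 - e^{-T}
```

For two lineages the formula reduces to 1 - e^{-T}, the quantity used in the three-species calculation; for any i, the values g_ij(T) over j = 1, ..., i sum to 1.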
Slide 15: Data from Ebersberger et al. 2007. Mol. Biol. Evol. 24:2266-2276.
Slide 37: Definition. A gene tree that is more probable than the gene tree matching the species tree is called an anomalous gene tree (Degnan and Rosenberg, 2006).
Theorem 1. For the asymmetric species tree topology with four species, and for any species tree topology with more than four species, there exist branch lengths such that at least one gene tree is anomalous (Degnan and Rosenberg, 2006).
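Theorem 1 can be illustrated by Monte Carlo simulation. This is a minimal sketch, assuming one lineage per species, the asymmetric species tree (((A,B),C),D), and two very short internal branches named x (ancestor of A and B) and y (ancestor of A, B, C) here for illustration, with hypothetical values x = y = 0.01 coalescent units. When both branches are nearly zero, almost all coalescence happens in the root population, where the symmetric gene tree ((A,B),(C,D)) has probability close to 1/9 while the matching tree has probability close to 1/18.

```python
import random
from collections import Counter

def coalesce(lineages, duration, rng, clades):
    """Run the coalescent inside one population for `duration` coalescent units.

    lineages: list of frozensets of tips; duration=None means run to one lineage.
    Every clade formed is appended to `clades`."""
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        if duration is not None and t > duration:
            break  # remaining lineages pass into the parent population
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        merged = lineages.pop(i) | lineages.pop(j)  # pop larger index first
        lineages.append(merged)
        clades.append(merged)
    return lineages

def simulate_gene_tree(x, y, rng):
    """Sample one gene tree topology from species tree (((A,B):x,C):y,D)."""
    clades = []
    lineages = coalesce([frozenset("A"), frozenset("B")], x, rng, clades)
    lineages = coalesce(lineages + [frozenset("C")], y, rng, clades)
    coalesce(lineages + [frozenset("D")], None, rng, clades)
    # the root clade {A,B,C,D} is shared by every tree, so drop it
    return frozenset(c for c in clades if len(c) < 4)

def topology_frequencies(x, y, reps=50_000, seed=1):
    rng = random.Random(seed)
    counts = Counter(simulate_gene_tree(x, y, rng) for _ in range(reps))
    return {tree: n / reps for tree, n in counts.items()}

MATCHING = frozenset({frozenset("AB"), frozenset("ABC")})   # (((A,B),C),D)
SYMMETRIC = frozenset({frozenset("AB"), frozenset("CD")})   # ((A,B),(C,D))

freqs = topology_frequencies(0.01, 0.01)
print("matching :", freqs.get(MATCHING, 0.0))
print("symmetric:", freqs.get(SYMMETRIC, 0.0))
```

Lengthening the internal branches makes the matching tree the most frequent again; anomalous gene trees only arise in a region of short branch lengths.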
Slide 38: Is species tree inference consistent in this setting?
1. Concatenation?
2. Consensus?
Slide 39: Species tree inference: concatenation
- Species trees are often estimated by concatenating several gene sequences and analyzing them as a single alignment (data from Chen and Li, 2001).

          Gene 1            Gene 2              Gene 3
Human     CTTGAATAATTTTTAC  TAGAGTTTCCTTGTGGTG  CGGTTT
Chimp     CTTCAATAATTTTTAC  TAGAGTTTCCTTGTGGTA  TGGTTT
Gorilla   TTTGAATAATTTTTAC  TAGAGTTTCCTTGTGGTA  TGGTTT
Orang     CTTGAATAATTTTTAT  CAGAGTTTCCTTGTGGTC  CRGTTT
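Concatenation itself is a simple operation: each taxon's gene sequences are joined into one row of a supermatrix, which is then analyzed as if it came from a single tree. This sketch uses the short fragments shown on the slide and computes uncorrected pairwise p-distances as a simple stand-in for a real tree-building analysis; the p-distance choice and the function names are assumptions, not the method used by Chen and Li (2001) or Kubatko and Degnan (2007).

```python
from itertools import combinations

# Fragments transcribed from the slide; R is the IUPAC ambiguity code (A or G).
GENES = {
    "Gene 1": {"Human": "CTTGAATAATTTTTAC", "Chimp": "CTTCAATAATTTTTAC",
               "Gorilla": "TTTGAATAATTTTTAC", "Orang": "CTTGAATAATTTTTAT"},
    "Gene 2": {"Human": "TAGAGTTTCCTTGTGGTG", "Chimp": "TAGAGTTTCCTTGTGGTA",
               "Gorilla": "TAGAGTTTCCTTGTGGTA", "Orang": "CAGAGTTTCCTTGTGGTC"},
    "Gene 3": {"Human": "CGGTTT", "Chimp": "TGGTTT",
               "Gorilla": "TGGTTT", "Orang": "CRGTTT"},
}

def concatenate(genes):
    """Join each taxon's sequences, gene by gene, into one supermatrix row."""
    taxa = next(iter(genes.values())).keys()
    return {t: "".join(aln[t] for aln in genes.values()) for t in taxa}

def p_distance(s1, s2):
    """Proportion of differing sites, skipping sites with an ambiguity code."""
    sites = [(a, b) for a, b in zip(s1, s2) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in sites) / len(sites)

supermatrix = concatenate(GENES)
distances = {(a, b): p_distance(supermatrix[a], supermatrix[b])
             for a, b in combinations(supermatrix, 2)}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the genes are simply pooled, any disagreement among their individual gene trees is averaged away, which is exactly the issue raised when gene trees differ from the species tree.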
Slide 40: Concatenation and gene tree discordance
- How does concatenation perform when sequences are generated from different gene tree topologies?
[Figure: concatenated sequence]
Slide 41: Trees inferred from concatenated sequences (Kubatko and Degnan, 2007)
Slide 42: Is species tree inference consistent in this setting?
1. Concatenation? No.
2. Consensus?
Slide 43: Consensus (majority-rule)
Slide 44: Types of consensus trees
Majority rule: the consensus tree has all clades that were observed in more than 50% of trees.
Greedy: sort clades by their proportions. Accept the most frequently observed clades one at a time, provided each is compatible with the clades already accepted. Continue until the tree is fully resolved.
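Both rules can be sketched directly from clade proportions. In this minimal sketch each gene tree is represented as a set of clades (frozensets of taxon names), and two clades are compatible when they are disjoint or one contains the other. The ten example gene trees are hypothetical, chosen so that majority rule leaves the tree unresolved while the greedy rule resolves it. Ties in proportion are broken by first appearance here, which is an arbitrary choice.

```python
from collections import Counter

def clade_frequencies(trees):
    """Proportion of gene trees containing each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def majority_rule(freqs):
    """Clades observed in more than 50% of the trees."""
    return {c for c, f in freqs.items() if f > 0.5}

def compatible(c1, c2):
    """Clades fit on one tree if they are disjoint or nested."""
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1

def greedy(freqs):
    """Accept clades in decreasing frequency if compatible with those accepted.

    Once the tree is fully resolved, no further clade can be compatible,
    so scanning the whole list is equivalent to stopping there."""
    accepted = []
    for clade in sorted(freqs, key=freqs.get, reverse=True):
        if all(compatible(clade, a) for a in accepted):
            accepted.append(clade)
    return set(accepted)

# Hypothetical sample of 10 gene trees on taxa A, B, C, D.
caterpillar = {frozenset("AB"), frozenset("ABC")}   # (((A,B),C),D)
symmetric = {frozenset("AB"), frozenset("CD")}      # ((A,B),(C,D))
other = {frozenset("AC"), frozenset("ACD")}         # (((A,C),D),B)
trees = [caterpillar] * 4 + [symmetric] * 3 + [other] * 3

freqs = clade_frequencies(trees)
print(majority_rule(freqs))   # only (A,B) exceeds 50%
print(greedy(freqs))
```

Here (A,B) appears in 70% of trees and (A,B,C) in 40%, so majority rule keeps only (A,B), while the greedy rule also accepts (A,B,C) and returns a fully resolved tree.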
Slide 45: Asymptotic consensus trees
Consensus trees are usually statistics, i.e., functions of the data, like the sample mean x̄.
Definition. An asymptotic consensus tree is the tree obtained by computing the consensus tree from the topology probabilities of the multispecies coalescent model rather than from observed proportions.
Motivation: if there are a large number of independent loci, the observed proportions of gene trees, clades, and rooted triples should approximate their theoretical probabilities.
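For three species every gene tree contains exactly one clade, so clade probabilities equal the gene tree probabilities of Nei (1987) and the asymptotic consensus trees can be computed directly. This minimal sketch assumes species tree ((A,B),C) with internal branch T in coalescent units; the function names are hypothetical. The clade (A,B) has probability above 1/2 only when 1 - (2/3)e^{-T} > 1/2, i.e. T > ln(4/3) ≈ 0.288, so shorter internal branches give an unresolved asymptotic majority-rule tree.

```python
import math

def clade_probs_three_taxa(T):
    """Clade probabilities for species tree ((A,B),C) with internal branch T.

    With three taxa each gene tree has one clade, so these equal the gene
    tree probabilities: 1 - (2/3)e^{-T} for (A,B), (1/3)e^{-T} otherwise."""
    discord = math.exp(-T) / 3
    return {frozenset("AB"): 1 - 2 * discord,
            frozenset("AC"): discord,
            frozenset("BC"): discord}

def asymptotic_majority_rule(T):
    """Clades with probability above 1/2 (asymptotic majority-rule tree)."""
    return {c for c, p in clade_probs_three_taxa(T).items() if p > 0.5}

def asymptotic_greedy(T):
    """Most probable clade; with three taxa one clade fully resolves the tree."""
    probs = clade_probs_three_taxa(T)
    return {max(probs, key=probs.get)}

THRESHOLD = math.log(4 / 3)  # T at which 1 - (2/3)e^{-T} = 1/2

for T in (0.1, 0.5, 1.0):
    print(T, asymptotic_majority_rule(T), asymptotic_greedy(T))
```

With three taxa the greedy rule always returns the species tree clade (A,B), even inside the majority-rule unresolved zone; misleading greedy trees require more taxa.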
Slide 49: Majority-rule unresolved zone
Slide 50: Too-greedy zone
Slide 51: Is species tree inference consistent in this setting?
1. Concatenation? No.
2. Consensus? Yes for R consensus; no for greedy and majority-rule consensus.
Slide 52: Are consensus trees inconsistent estimators of species trees?
- Theorem 2. (i) Majority-rule asymptotic consensus trees (MACTs) do not have any clades that are absent from the species tree. (ii) Majority-rule unresolved zones exist for any species tree topology with n ≥ 3 species.
Theorem 3. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-species asymmetric tree and for any species tree with n > 4 species.
Theorem 4. R asymptotic consensus trees (RACTs) always match the species tree.
Slide 53: What about finite samples?
- If you sample 10 loci, you could have:
  - all 10 matching the species tree,
  - 9 matching the species tree and 1 disagreeing,
  - 8 matching the species tree and 2 disagreeing, etc.
- You can treat gene trees as categories and use multinomial probabilities to compute the probability of your sample.
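The multinomial calculation can be sketched for three species, where the three possible gene trees are the categories. This minimal sketch assumes species tree ((A,B),C) with a hypothetical internal branch T = 0.5 coalescent units and 10 loci; the function names are hypothetical. It computes the probability that more than half of the loci match the species tree, which for three taxa is the probability that a majority-rule consensus of the sample contains the clade (A,B).

```python
import math

def three_taxon_probs(T):
    """[P(((A,B),C)), P(((A,C),B)), P(((B,C),A))] for species tree ((A,B),C)."""
    d = math.exp(-T) / 3
    return [1 - 2 * d, d, d]

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` of each gene tree among sum(counts) loci."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact integer multinomial coefficient
    return coef * math.prod(p ** c for p, c in zip(probs, counts))

def prob_majority_matches(n_loci, T):
    """P(more than half of the n_loci gene trees match the species tree)."""
    probs = three_taxon_probs(T)
    total = 0.0
    for k in range(n_loci + 1):            # loci matching ((A,B),C)
        for j in range(n_loci - k + 1):    # loci showing ((A,C),B)
            if k > n_loci / 2:
                total += multinomial_pmf([k, j, n_loci - k - j], probs)
    return total

print(prob_majority_matches(10, 0.5))
```

Since only the matching count matters here, the same answer can be obtained from a binomial distribution; the multinomial form is needed when the discordant categories must be distinguished.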
Slide 54: R consensus, y = 0.4, x = 0.6
Slide 55: Conclusion
- Coalescent gene tree probabilities can be used to
prove or disprove the statistical consistency of
species tree estimators.
Slide 57: R consensus, y = x = 0.1