The multispecies coalescent: implications for inferring species trees

1
The multispecies coalescent: implications for
inferring species trees
  • James Degnan
  • 21 February 2008

2
Outline
  • 1. Background
  • --gene trees vs. species trees
  • --coalescence and incomplete lineage sorting
  • 2. Inferring species trees
  • --Concatenation
  • --Consensus Trees
  • 3. Conclusions

3
Population Genetics and Phylogenetics
  • Population genetics has traditionally been used to
    analyze single populations.

4
Desirable properties of species tree estimators
  • 1. Statistical consistency (sample size = number of
    genes)
  • 2. Efficiency
  • 3. Robustness to violations of assumptions
5
Bridging the popgen/phylo divide
"Incorporation of explicit models of lineage
sorting will be needed for continued development
of phylogenetic inference near the species
level." Maddison and Knowles (2006).
  • "Closer integration of population-genetic factors
    in phylogenetics, including further insights into
    gene-tree/species tree, and horizontal gene
    transfer." From Mike Steel's website, "My pick
    for five directions in phylogenetics that will
    grow in the next five years" (2006).

6
The coalescent process
7
One population
8
Multiple populations/species
9
Gene tree in a species tree
10
Model species tree with gene tree
[Figure: species tree with tips A, B, C, D containing a gene tree]
The gene tree is a random variable. The gene tree
distribution is parameterized by the species tree
topology and internal branch lengths.
11
How can we compute probabilities of gene trees
given species trees?
  • Under a coalescent model, probabilities for gene
    trees with three species were derived by Nei
    (1987): the gene tree matches the species tree
    with probability $1 - \frac{2}{3}e^{-T}$, where T is
    the internal branch length in coalescent units.
  • Probabilities for the gene tree to match the
    species tree topology for 4 and 5 species were
    given by Pamilo and Nei (1988).
  • All 30 species tree/gene tree combinations for 4
    species were given by Rosenberg (2002).
  • The general case was solved by Degnan and Salter
    (2005) and implemented in the program COAL, which
    also allows multiple individuals sampled per species.
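The three-species case is small enough to compute directly. The sketch below assumes the species tree ((A,B),C), one sampled lineage per species, and an internal branch of length T in coalescent units. The function name and the topology labels are illustrative and are not taken from COAL.

```python
import math


def three_taxon_gene_tree_probs(T):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    T is the internal branch length in coalescent units (assumed T >= 0).
    The matching topology has probability 1 - (2/3) * exp(-T); the two
    discordant topologies share the remainder equally.
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    p_discordant = math.exp(-T) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_discordant,  # matches the species tree
        "((A,C),B)": p_discordant,
        "((B,C),A)": p_discordant,
    }


probs = three_taxon_gene_tree_probs(1.0)
for topology, p in probs.items():
    print(f"{topology}: {p:.4f}")
```

At T = 0 all three topologies are equally likely, and as T grows the matching topology approaches probability 1.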

12
Definition: a coalescent history is a list of the
populations in which each coalescent event occurs.
[Figure: gene tree within a species tree with tips A, B, C, D]
This coalescent history: (1,3,3)
Other coalescent histories: (2,3,3), (3,3,3)
13
Gene tree probabilities
14
Gene tree probabilities
15
Data from Ebersberger et al. 2007. Mol. Biol.
Evol. 24:2266-2276.
16-36
(No Transcript)
37
Definition: a gene tree which is more probable
than the gene tree matching the species tree is
called an anomalous gene tree (Degnan and
Rosenberg, 2006).
Theorem 1. For the asymmetric species tree
topology with four species and for any species
tree topology with more than four species, there
exist branch lengths such that at least one gene
tree is anomalous (Degnan and Rosenberg, 2006).
38
Is species tree inference consistent in this
setting?
1. Concatenation?
2. Consensus?
39
Species tree inference: concatenation
  • Species trees are often estimated by
    concatenating several gene sequences and
    analyzing as one (data from Chen and Li, 2001).

         Gene 1            Gene 2              Gene 3
Human    CTTGAATAATTTTTAC  TAGAGTTTCCTTGTGGTG  CGGTTT
Chimp    CTTCAATAATTTTTAC  TAGAGTTTCCTTGTGGTA  TGGTTT
Gorilla  TTTGAATAATTTTTAC  TAGAGTTTCCTTGTGGTA  TGGTTT
Orang    CTTGAATAATTTTTAT  TAGAGTTTCCTTGTGGTC  CRGTTT
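The concatenation step itself can be sketched in a few lines. This is only an illustration using the sequence fragments shown on the slide; it assumes the rows for Gene 2 and Gene 3 follow the same species order as Gene 1 (Human, Chimp, Gorilla, Orang), and the names `alignments` and `concatenate` are made up for the example.

```python
# Per-gene alignments, keyed by gene and then by species.
# Species order for Gene 2 and Gene 3 is assumed to match Gene 1.
alignments = {
    "Gene 1": {
        "Human": "CTTGAATAATTTTTAC",
        "Chimp": "CTTCAATAATTTTTAC",
        "Gorilla": "TTTGAATAATTTTTAC",
        "Orang": "CTTGAATAATTTTTAT",
    },
    "Gene 2": {
        "Human": "TAGAGTTTCCTTGTGGTG",
        "Chimp": "TAGAGTTTCCTTGTGGTA",
        "Gorilla": "TAGAGTTTCCTTGTGGTA",
        "Orang": "CAGAGTTTCCTTGTGGTC",
    },
    "Gene 3": {
        "Human": "CGGTTT",
        "Chimp": "TGGTTT",
        "Gorilla": "TGGTTT",
        "Orang": "CRGTTT",  # R is the IUPAC ambiguity code for A or G
    },
}


def concatenate(alignments, taxa):
    """Join each species' sequences across genes, in dictionary order."""
    return {
        taxon: "".join(gene[taxon] for gene in alignments.values())
        for taxon in taxa
    }


taxa = ["Human", "Chimp", "Gorilla", "Orang"]
supermatrix = concatenate(alignments, taxa)
for taxon in taxa:
    print(f"{taxon:8s}{supermatrix[taxon]}")
```

The resulting "supermatrix" is then analyzed as if it were a single locus, which is exactly the step that ignores gene tree discordance.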
40
Concatenation and gene tree discordance
  • How does concatenation perform when sequences are
    generated from different topologies?

concatenated sequence
41
Trees inferred from concatenated sequences
(Kubatko and Degnan, 2007)
42
Is species tree inference consistent in this
setting?
1. Concatenation? No.
2. Consensus?
43
Consensus (majority-rule)
44
Types of consensus trees
Majority rule: the consensus tree has all clades that
were observed in > 50% of trees.
Greedy: sort clades by their proportions. Accept
clades one at a time, most frequent first, whenever
they are compatible with the clades already accepted.
Do this until you have a fully resolved tree.
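Both rules can be sketched on a toy sample. The code below assumes each rooted gene tree is represented by its set of non-trivial clades (as `frozenset`s of taxon names); the helper names, the tie-breaking order, and the ten sample trees are illustrative assumptions, not data from the talk.

```python
from collections import Counter


def clade_proportions(trees):
    """Fraction of input trees that contain each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def compatible(a, b):
    """Two clades can share a tree if they are nested or disjoint."""
    return a.isdisjoint(b) or a <= b or b <= a


def majority_rule(trees):
    """Clades observed in more than 50% of the trees."""
    return {c for c, p in clade_proportions(trees).items() if p > 0.5}


def greedy(trees, n_taxa):
    """Accept clades in decreasing frequency while they stay compatible.

    A fully resolved rooted tree on n_taxa tips has n_taxa - 2 non-trivial
    clades. Ties are broken alphabetically here (an arbitrary choice).
    """
    props = clade_proportions(trees)
    accepted = []
    for clade in sorted(props, key=lambda c: (-props[c], sorted(c))):
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
        if len(accepted) == n_taxa - 2:
            break
    return set(accepted)


AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
ABC = frozenset("ABC")
# Hypothetical sample of 10 gene trees on taxa A, B, C, D.
sample = [{AB, ABC}] * 4 + [{AC, ABC}] * 3 + [{BC, ABC}] * 3

print(majority_rule(sample))
print(greedy(sample, n_taxa=4))
```

On this toy sample, majority rule keeps only the clade {A,B,C} because no two-taxon clade exceeds 50%, while greedy also accepts {A,B} and returns a fully resolved tree.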
45
Asymptotic consensus trees
Consensus trees are usually statistics, functions
of data like x-bar.
Definition: an asymptotic consensus tree is the
tree that is obtained by computing the consensus
tree using topology probabilities from the
multispecies coalescent model.
Motivation: if there are a large number of
independent loci, observed gene tree, clade, and
rooted triple proportions should approximate
their theoretical probabilities.
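A small simulation illustrates the motivation. This sketch assumes a three-species tree ((A,B),C) with internal branch length T = 1 coalescent unit, uses the three-species probabilities due to Nei (1987), and treats loci as independent draws; the function name, the seed, and the number of loci are arbitrary choices for the example.

```python
import math
import random
from collections import Counter


def simulate_topology_proportions(probs, n_loci, seed=0):
    """Draw n_loci independent gene tree topologies and return proportions."""
    rng = random.Random(seed)
    topologies = list(probs)
    weights = [probs[t] for t in topologies]
    draws = rng.choices(topologies, weights=weights, k=n_loci)
    counts = Counter(draws)
    return {t: counts[t] / n_loci for t in topologies}


T = 1.0  # assumed internal branch length, in coalescent units
p_discordant = math.exp(-T) / 3.0
probs = {
    "((A,B),C)": 1.0 - 2.0 * p_discordant,
    "((A,C),B)": p_discordant,
    "((B,C),A)": p_discordant,
}

observed = simulate_topology_proportions(probs, n_loci=50_000)
max_error = max(abs(observed[t] - probs[t]) for t in probs)
for t in probs:
    print(f"{t}: theoretical {probs[t]:.4f}, observed {observed[t]:.4f}")
```

With tens of thousands of loci the observed proportions sit close to the theoretical values, which is why the asymptotic consensus tree describes what a consensus method converges to.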
46-48
(No Transcript)
49
Majority-rule unresolved zone
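For three species the unresolved zone can be worked out by hand. The following is a sketch, assuming the species tree ((A,B),C) with internal branch length T in coalescent units and the Nei (1987) probability for the matching gene tree. The asymptotic majority-rule tree leaves the clade {A,B} unresolved when its probability is at most 1/2:

```latex
1 - \tfrac{2}{3}e^{-T} \le \tfrac{1}{2}
\;\iff\; e^{-T} \ge \tfrac{3}{4}
\;\iff\; T \le \ln\tfrac{4}{3} \approx 0.288
```

So under these assumptions any internal branch shorter than about 0.288 coalescent units yields an unresolved asymptotic majority-rule tree, even though the clade {A,B} is still the single most probable one.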
50
Too-greedy zone
51
Is species tree inference consistent in this
setting?
1. Concatenation? No.
2. Consensus? Yes (R*); no for greedy and
majority-rule.
52
Are consensus trees inconsistent estimators of
species trees?
  • Theorem 2. (i) Majority-rule asymptotic
    consensus trees (MACTs) do not have any clades
    not on the species tree. (ii) Majority-rule
    unresolved zones exist for any species tree
    topology with n ≥ 3 species.
  • Theorem 3. Greedy asymptotic consensus trees
    (GACTs) can be misleading estimators of species
    trees for the 4-species asymmetric tree and for
    any species tree with n > 4 species.
  • Theorem 4. R* asymptotic consensus trees (RACTs)
    always match the species tree.
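The transcript does not define the rooted-triple (R*) consensus, so the sketch below relies on the usual definition as an assumption: for every three taxa, keep the rooted triple that appears in strictly more input trees than either alternative, then keep every clade whose members are grouped together by a kept triple against every outside taxon. Trees are again sets of non-trivial clades; all names and the toy sample are illustrative, and the subset enumeration is only practical for a handful of taxa.

```python
from itertools import combinations


def displays(tree, a, b, c):
    """True if some clade of the rooted tree contains a and b but not c."""
    return any(a in cl and b in cl and c not in cl for cl in tree)


def favored_triples(trees, taxa):
    """Rooted triples ab|c seen in strictly more trees than ac|b or bc|a."""
    favored = set()
    for x, y, z in combinations(sorted(taxa), 3):
        counts = {
            (x, y, z): sum(displays(t, x, y, z) for t in trees),
            (x, z, y): sum(displays(t, x, z, y) for t in trees),
            (y, z, x): sum(displays(t, y, z, x) for t in trees),
        }
        best = max(counts.values())
        winners = [k for k, v in counts.items() if v == best]
        if len(winners) == 1:  # ties leave the triple unresolved
            a, b, c = winners[0]
            favored.add((frozenset((a, b)), c))
    return favored


def r_star(trees, taxa):
    """Clades whose every inside pair is favored against every outsider."""
    favored = favored_triples(trees, taxa)
    clades = set()
    for size in range(2, len(taxa)):
        for subset in combinations(sorted(taxa), size):
            outside = set(taxa) - set(subset)
            if all((frozenset(pair), c) in favored
                   for pair in combinations(subset, 2) for c in outside):
                clades.add(frozenset(subset))
    return clades


AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
ABC = frozenset("ABC")
# Hypothetical sample of 10 gene trees on taxa A, B, C, D.
sample = [{AB, ABC}] * 4 + [{AC, ABC}] * 3 + [{BC, ABC}] * 3

print(r_star(sample, taxa="ABCD"))
```

On this toy sample the clade {A,B} appears in only 40% of the trees, yet the triple AB|C still beats both alternatives, so this sketch recovers both {A,B} and {A,B,C}.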
53
What about finite samples?
  • If you sample 10 loci, you could have
  • All 10 match the species tree
  • 9 match the species tree, 1 disagrees
  • 8 match the species tree, 2 disagree, etc.
  • You can consider gene trees as categories and use
    multinomial probabilities for the probability of
    your sample
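The multinomial calculation can be sketched as follows. The function name is illustrative, and the probability 0.75 used in the example is a hypothetical value for "gene tree matches the species tree", not a figure from the talk.

```python
import math


def multinomial_probability(counts, probs):
    """Probability of observing the given category counts.

    counts[i] is the number of sampled loci whose gene tree falls in
    category i; probs[i] is that category's probability.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)  # exact integer division
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p ** c
    return probability


# Hypothetical: each locus matches the species tree with probability 0.75.
p_match = 0.75
for n_match in (10, 9, 8):
    p = multinomial_probability([n_match, 10 - n_match], [p_match, 1 - p_match])
    print(f"{n_match} of 10 loci match: {p:.4f}")
```

With more than two gene tree categories the same function applies unchanged; summing these probabilities over all samples for which an estimator returns the species tree gives its finite-sample accuracy.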

54
R* consensus, y = 0.4, x = 0.6
55
Conclusion
  • Coalescent gene tree probabilities can be used to
    prove or disprove the statistical consistency of
    species tree estimators.

56
(No Transcript)
57
R* consensus, y = x = 0.1
58-59
(No Transcript)