Phylogeny - PowerPoint PPT Presentation

Slides: 41
Provided by: ibisT

Transcript and Presenter's Notes

1
Phylogeny
2
Reconstructing a phylogeny
  • The phylogenetic tree (phylogeny) describes the
    evolutionary relationships between the studied
    data
  • The data must consist of homologous types
  • In molecular evolution, the studied data are
    homologous DNA/AA sequences
  • Phylogeny reconstruction explicitly assumes that
    the sequences are aligned

INPUT: MSA
3
Reminder: MSA and phylogeny are dependent
[Figure labels: Unaligned sequences, Sequence alignment, MSA, Phylogeny reconstruction, Inaccurate guide tree]
4
Phylogeny representation
Textual representation (Newick format): ((A,C),(B,D));
Visual representation: [Figure: tree with leaves A, C, B, and D]
  • Each pair of parentheses () encloses a clade in
    the tree
  • A comma , separates the members of the
    corresponding clade
  • A semicolon ; is always the last character
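
The three Newick rules above can be sketched as a small recursive parser. This is a minimal sketch that assumes a topology-only string (no branch lengths, support values, or quoted names); the function names parse_newick, leaves, and clades are illustrative and do not come from any library.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with a semicolon")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "(": a new clade opens
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ",": next member of the same clade
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")": the clade closes
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        if pos == start:
            raise ValueError(f"expected a leaf name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected text before the final semicolon")
    return tree


def leaves(node):
    """Return the leaf names below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def clades(node):
    """Yield the leaf list of every parenthesized group (clade)."""
    if isinstance(node, str):
        return
    yield leaves(node)
    for child in node:
        yield from clades(child)


tree = parse_newick("((A,C),(B,D));")
clade_list = list(clades(tree))
```

Here tree is the nested tuple (("A", "C"), ("B", "D")), and clade_list holds one entry per pair of parentheses.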

5
Some terminology
[Figure labels:]
Monophyletic group (clade)
Root
External branches
Internal branches (splits)
Neighbors
Internal nodes
External nodes (leaves)
6
Swapping neighbors is meaningless

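[Figure: the same tree over Human, Chimp, and Gorilla, drawn twice with neighbors swapped]
All four Newick strings describe the same tree:
(Gorilla,(Human,Chimp));
(Gorilla,(Chimp,Human));
((Human,Chimp),Gorilla);
((Chimp,Human),Gorilla);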
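
One way to check that swapping neighbors does not change the tree is to compare trees with the child order removed. The sketch below assumes trees are written as nested Python tuples with unique leaf names; the helper name canonical is illustrative.

```python
def canonical(node):
    """Replace every ordered group of children with an unordered frozenset."""
    if isinstance(node, str):
        return node
    return frozenset(canonical(child) for child in node)


# The four orderings shown on the slide, written as nested tuples.
orderings = [
    ("Gorilla", ("Human", "Chimp")),
    ("Gorilla", ("Chimp", "Human")),
    (("Human", "Chimp"), "Gorilla"),
    (("Chimp", "Human"), "Gorilla"),
]
same_tree = len({canonical(t) for t in orderings}) == 1

# A genuinely different topology (hypothetical, for contrast only).
other = (("Gorilla", "Human"), "Chimp")
differs = canonical(other) != canonical(orderings[0])
```

All four orderings collapse to one canonical form, while regrouping the leaves produces a different one.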
7
Rooted vs. unrooted
[Figure: tree with leaves A, B, and C; positions labeled 1, 2, and 3, with "?" marks]
8
In Newick format
[Figure: tree with leaves A, B, and C; positions labeled 1, 2, and 3, with "?" marks]
9
How can we root a tree?
10
Rooting the tree based on a priori knowledge:
using an outgroup
The outgroup should be close enough to detect
sequence homology, but distant enough to be a clear
outgroup
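
Outgroup rooting can be sketched as placing the root on the branch that connects the outgroup to the rest of an unrooted tree. This is a minimal sketch, assuming the unrooted tree is stored as an adjacency dictionary; the internal node names x and y, the function name root_on_outgroup, and the choice of Orangutan as outgroup are all illustrative assumptions.

```python
def root_on_outgroup(adjacency, outgroup):
    """Return a rooted Newick string with the root on the outgroup's branch."""
    (neighbor,) = adjacency[outgroup]  # a leaf has exactly one neighbor

    def subtree(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # a leaf: just its name
        return "(" + ",".join(subtree(c, node) for c in children) + ")"

    return f"({outgroup},{subtree(neighbor, outgroup)});"


# Hypothetical unrooted tree: x joins Human and Chimp, y joins x, Gorilla, Orangutan.
unrooted = {
    "Human": ["x"],
    "Chimp": ["x"],
    "Gorilla": ["y"],
    "Orangutan": ["y"],
    "x": ["Human", "Chimp", "y"],
    "y": ["x", "Gorilla", "Orangutan"],
}
rooted = root_on_outgroup(unrooted, "Orangutan")
```

With these inputs, rooted is the string (Orangutan,((Human,Chimp),Gorilla)); so the ingroup appears as one clade opposite the outgroup.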
11
The gene tree is not always identical to the
species tree
[Figure: a gene tree and a species tree, with a "?"]
12
Phylogeny reconstruction approaches
Distance-based methods: Neighbor Joining

      A    B    C    D    E
A     0    2    3    4    4
B          0    3    4    5
C               0    3    4
D                    0    5
E                         0

      A,B  C    D    E
A,B   0    2.5  4.5  3.5
C          0    3    4
D               0    5
E                    0

The Minimum Evolution (ME) criterion: in each
iteration, we separate the two sequences that
result in the minimal sum of branch lengths
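
The pair-selection step can be sketched in Python. This is a minimal sketch, assuming the standard Neighbor Joining Q-criterion and distance update; the function name nj_step and the dictionary layout are illustrative. On the first matrix above it selects A and B. The reduced distances it returns follow the standard NJ update and need not equal the values shown in the second matrix.

```python
from itertools import combinations


def nj_step(dist, taxa):
    """Pick the pair to join and compute distances from the new node."""
    n = len(taxa)

    def d(a, b):
        return 0.0 if a == b else dist[frozenset((a, b))]

    row_sum = {a: sum(d(a, b) for b in taxa) for a in taxa}
    # Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b); the smallest Q is joined.
    q = {
        (a, b): (n - 2) * d(a, b) - row_sum[a] - row_sum[b]
        for a, b in combinations(taxa, 2)
    }
    a, b = min(q, key=q.get)
    # Distance from the new internal node to every remaining taxon.
    reduced = {
        k: (d(a, k) + d(b, k) - d(a, b)) / 2
        for k in taxa
        if k not in (a, b)
    }
    return (a, b), q, reduced


# Upper triangle of the first distance matrix.
pairs = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 5,
    ("C", "D"): 3, ("C", "E"): 4,
    ("D", "E"): 5,
}
dist = {frozenset(key): value for key, value in pairs.items()}
joined, q, reduced = nj_step(dist, ["A", "B", "C", "D", "E"])
```

Repeating the step on the reduced matrix until two nodes remain yields the full tree.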
13
Phylogeny reconstruction approaches
Topology search methods: MP, ML
Maximum Parsimony finds the most parsimonious
topology
Maximum Likelihood finds the most likely topology
P(Data | T)
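
How parsimony scores a candidate topology can be sketched with Fitch's algorithm for a single alignment column. This is a minimal sketch, assuming a rooted binary tree written as nested tuples; the column of states is a hypothetical example, and the function name fitch is illustrative.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum number of changes)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_states, left_cost = fitch(left, states)
    right_states, right_cost = fitch(right, states)
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this pair of branches.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical column: A and C carry T, B and D carry G.
column = {"A": "T", "C": "T", "B": "G", "D": "G"}
_, score_1 = fitch((("A", "C"), ("B", "D")), column)
_, score_2 = fitch((("A", "B"), ("C", "D")), column)
```

For this column the topology ((A,C),(B,D)) needs one change and ((A,B),(C,D)) needs two, so the first is more parsimonious. A full search sums such scores over all columns and compares many topologies.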
14
Phylogeny reconstruction approaches: summary
  • Distance-based methods
      • Neighbor Joining (e.g., using ClustalX)
          • Fast
          • Inaccurate
  • Topology search methods
      • Maximum parsimony (e.g., using MEGA)
          • Crude
          • Questionable statistical basis
      • Maximum likelihood (e.g., using RAxML, PhyML)
          • Accurate
          • Slow
  • Bayesian methods
      • Markov chain Monte Carlo (MCMC) (e.g., using
        MrBayes)
          • Most accurate
          • Very slow

15
How robust is our tree?
16
Bootstrap for estimating robustness
  • We need some statistical way to estimate the
    confidence in the tree topology
  • But we don't know anything about the distribution
    of tree topologies
  • The only data source we have is our data (MSA)
  • So, we must rely on our own resources: "pull
    yourself up by your own bootstraps"

17
Bootstrap
1. Create n (100-1000) new MSAs (pseudo-MSAs) by
randomly sampling K positions from our original
MSA with replacement

Original MSA (columns 1 2 3 4 5 ... K):
  1  ATCTGA
  2  ATCTGC
  3  ACTTAC
  4  ACCTAT

Pseudo-MSA (sampled columns 1 1 2 4 4 3):
  1  AATTTC
  2  AATTTC
  3  AACTTT
  4  AACTTC

Pseudo-MSA (sampled columns 9 7 4 7 8 10):
  1  TTTTAT
  2  CATACA
  3  CATACT
  4  AGTGGA

Pseudo-MSA (sampled columns 5 1 5 7 8 12):
  1  GAGTAT
  2  GAGACG
  3  AAAACA
  4  AAAGGC
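
The column resampling in step 1 can be sketched in Python. This is a minimal sketch that treats only the six visible characters of each sequence as the whole MSA, which is an assumption, since the slide's alignment has K columns. The function names resample_columns and bootstrap_msas and the fixed seed are illustrative.

```python
import random


def resample_columns(msa, columns):
    """Build a pseudo-MSA from 1-based column indices (repeats allowed)."""
    return {
        name: "".join(seq[c - 1] for c in columns)
        for name, seq in msa.items()
    }


def bootstrap_msas(msa, n, seed=0):
    """Yield n pseudo-MSAs, each sampling K columns with replacement."""
    rng = random.Random(seed)
    length = len(next(iter(msa.values())))
    for _ in range(n):
        columns = [rng.randint(1, length) for _ in range(length)]
        yield resample_columns(msa, columns)


# First six columns of the slide's MSA (the full alignment is longer).
msa = {"1": "ATCTGA", "2": "ATCTGC", "3": "ACTTAC", "4": "ACCTAT"}
first_example = resample_columns(msa, [1, 1, 2, 4, 4, 3])
pseudo_msas = list(bootstrap_msas(msa, 100))
```

Sampling columns 1 1 2 4 4 3 gives the sequences AATTTC, AATTTC, AACTTT, and AACTTC, one per row of the MSA. Whole columns are sampled together, so each site keeps its pattern across all sequences.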
18
Bootstrap
2. Reconstruct a pseudo-tree from each pseudo-MSA
with the same method used for reconstructing the
original tree
[Figure: the pseudo-MSAs from step 1, each mapped to a pseudo-tree over Sp1, Sp2, Sp3, and Sp4]
19
Bootstrap
3. For each split in our original tree, we count
the number of times it appeared in the
pseudo-trees
[Figure: pseudo-trees and the original tree over Sp1, Sp2, Sp3, and Sp4, with support values 67 and 100 on the original tree]
In 67% of the pseudo-trees, the split between
Sp1+Sp2 and the rest of the tree was found
In general, bootstrap (bp) support < 80% is considered low
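
The counting in step 3 can be sketched as follows. This is a minimal sketch, assuming trees are nested tuples and a split is stored as the unordered pair of leaf sets on its two sides; the pseudo-trees here are fabricated toy input chosen only to reproduce the 67% figure, and the function names are illustrative.

```python
def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Return the non-trivial splits (bipartitions of the leaves) of a tree."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        other = all_leaves - side
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset((side, other)))
        for child in node:
            visit(child)

    visit(tree)
    return found


def support(original, pseudo_trees):
    """Percentage of pseudo-trees that contain each split of the original."""
    pseudo_splits = [splits(t) for t in pseudo_trees]
    return {
        split: 100 * sum(split in s for s in pseudo_splits) / len(pseudo_trees)
        for split in splits(original)
    }


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
# Toy input: 67 pseudo-trees agree with the original, 33 do not.
pseudo_trees = [(("Sp1", "Sp2"), ("Sp3", "Sp4"))] * 67
pseudo_trees += [(("Sp1", "Sp3"), ("Sp2", "Sp4"))] * 33
result = support(original, pseudo_trees)
```

With four species there is a single non-trivial split, Sp1+Sp2 against Sp3+Sp4, and its support in this toy input is 67.0.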
20
ClustalX NJ phylogeny reconstruction
21
ClustalX NJ phylogeny reconstruction
22
http://phylobench.vital-it.ch/raxml-bb/
23
(No Transcript)
24
Viewing the tree with njPlot
25
Note: unrooted tree
26
Defining an outgroup
27
Swapping nodes
28
Bootstrap support
29
FigTree: tree visualization and figure creation
http://tree.bio.ed.ac.uk/software/figtree/
30
Reconstructing the tree of life
31
Darwin's vision of the tree of life from the
Origin of Species
32
The three-domain tree of life based on SSU rRNA
MSA
33
But the branching of several kingdoms remains in
dispute
34
Lateral Gene Transfer (LGT) challenges the
conceptual basis of phylogenetic classification
35
(No Transcript)
36
Methodology
  • Started with 36 genes universally present in 191
    species (spanning all 3 domains of life), for
    which orthologs could be unambiguously identified
  • Eliminated 5 genes that are LGT suspects (mostly
    tRNA synthetases)
  • Constructed an MSA for each of the 31 orthogroups
  • Concatenated all 31 MSAs into a super-MSA of 8090
    columns
  • The phylogeny was reconstructed based on the
    super-MSA using the maximum likelihood approach
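
The concatenation step can be sketched in Python. This is a minimal sketch, assuming every per-gene MSA is a dictionary from species name to aligned sequence and covers the same species; the two toy alignments and the function name concatenate are illustrative, not data from the study.

```python
def concatenate(msas):
    """Join per-gene MSAs side by side into one super-MSA."""
    species = list(msas[0])
    for msa in msas:
        if set(msa) != set(species):
            raise ValueError("every MSA must cover the same species")
        if len({len(seq) for seq in msa.values()}) != 1:
            raise ValueError("sequences within an MSA must have equal length")
    return {sp: "".join(msa[sp] for msa in msas) for sp in species}


# Two toy orthogroup alignments for three hypothetical species.
gene_1 = {"sp1": "MKV-", "sp2": "MKVL", "sp3": "MRVL"}
gene_2 = {"sp1": "AT", "sp2": "A-", "sp3": "GT"}
super_msa = concatenate([gene_1, gene_2])
```

The super-MSA has as many columns as the per-gene alignments combined, here 4 + 2 = 6.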

37
http://itol.embl.de
38
Tree support
  • 81.7% of the splits show bootstrap support of
    over 80%
  • 65% of the splits show bootstrap support of 100%
  • However, several deep splits show low support

39
Still, the debate goes on
40
Tree of one percent of life
  • Ciccarelli et al. on the one hand favor the claim
    that bacteria adhere to a bifurcating tree of
    life, provided that the small number of LGT genes
    is filtered out
  • On the other hand, their filtering process left
    only 31 proteins, which represent 1% of an
    average prokaryotic proteome and 0.1% of a large
    eukaryotic proteome
  • If throwing out all non-universally distributed
    genes and all LGT suspects leaves a 1% tree, then
    we should probably abandon the tree as a working
    hypothesis