Molecular Phylogenetics and the Tree of Life

About This Presentation
Title:

Molecular Phylogenetics and the Tree of Life

Description:

Technion Israel Institute of Technology. 5th Lecture, November 22nd 2009 ... In 1866, Ernst Haeckel coined the word 'phylogeny' and presented phylogenetic ...

Slides: 66
Provided by: michal76
Transcript and Presenter's Notes

Title: Molecular Phylogenetics and the Tree of Life


1
Molecular Phylogenetics and the Tree of Life
Genetic Tales by Andrea Branzi
5th Lecture, November 22nd, 2009
Itai Yanai, Department of Biology
Technion - Israel Institute of Technology
2
Molecular Phylogenetics and the Tree of Life
  • Tree types and terminology
  • A tree based upon the molecular clock
  • UPGMA
  • Human history
  • A tree based upon minimum evolution
  • Neighbor Joining
  • The tree of life
  • Rooting the tree of life
  • Gene trees
  • Margulis and the mitochondria
  • Character-based method: Maximum parsimony
  • Bootstrapping

3
Life has an historical narrative which can be
represented as a tree
Summary of Darwin's theory of evolution
  • Species are not fixed
  • Common descent
  • Multiplication of species
  • Gradualism
  • Natural selection

(Mayr, 1991)
The only figure from The Origin of Species
4
In 1866, Ernst Haeckel coined the word
'phylogeny' and presented phylogenetic trees for
most known groups of living organisms.
Ernst Haeckel (1834-1919)
5
The Tree of Life project
Surf the tree of life at http://tolweb.org/tree/phylogeny.html
6
Types of Trees and Terminology
7
What is a tree?
A tree is a mathematical structure that
represents a model of the actual evolutionary
history of a group of sequences or organisms. In
other words, it is an evolutionary hypothesis.
A tree consists of nodes connected by branches.
One unique internal node is the root of the tree:
the ancestor of all the sequences.
Internal nodes represent hypothetical ancestors.
Terminal nodes represent sequences or organisms
for which we have data. Each is typically called
an Operational Taxonomic Unit (OTU).
8
Types of Trees
Rooted vs. Unrooted
M is the number of OTUs
9
The number of rooted and unrooted trees
Number of OTUs
OTU: Operational Taxonomic Unit
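The counting formulas on this slide did not survive in the transcript. As a hedged sketch, the standard results for strictly bifurcating trees are (2M - 5)!! unrooted trees and (2M - 3)!! rooted trees for M OTUs; the function names below are illustrative and not taken from the lecture.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(m):
    """Number of unrooted bifurcating trees for m >= 3 OTUs: (2m - 5)!!"""
    if m < 3:
        raise ValueError("need at least 3 OTUs")
    return double_factorial(2 * m - 5)


def count_rooted_trees(m):
    """Number of rooted bifurcating trees for m >= 2 OTUs: (2m - 3)!!"""
    if m < 2:
        raise ValueError("need at least 2 OTUs")
    return double_factorial(2 * m - 3)


# Print M, the unrooted count and the rooted count for a few sizes.
for m in (3, 4, 5, 10):
    print(m, count_unrooted_trees(m), count_rooted_trees(m))
```

The counts grow so fast that examining every tree quickly becomes impractical, which is one reason heuristic methods are needed.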
 
10
Types of Trees
Multifurcating
Bifurcating
Polytomy
  • Polytomies: soft vs. hard
  • Soft: designates a lack of information about the
    order of divergence.
  • Hard: the hypothesis that multiple divergences
    occurred simultaneously.

11
Types of Trees
Trees: only one path between any pair of nodes
Networks: more than one path can connect a pair of nodes
12
A shorthand for trees: the Newick format
((1,2),(3,4))
((1,2),((3,4),5),6)
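To make the bracket notation concrete, here is a minimal sketch of a reader for topology-only Newick strings like the two on this slide. It assumes there are no branch lengths, quoted labels or comments, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek():
        # Current character, or "" once the end of the string is reached.
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a leaf label up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """List the leaf names of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


print(parse_newick("((1,2),(3,4))"))
print(leaves(parse_newick("((1,2),((3,4),5),6)")))
```

The second string has three children at its outermost level, which is how Newick commonly writes an unrooted tree.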
13
Different kinds of trees can be used to depict
different aspects of evolutionary history
  • 1. Cladogram: simply shows the relative order of
    common ancestry
  • 2. Additive trees: a cladogram with branch lengths;
    also called phylograms and metric trees
  • 3. Ultrametric trees (dendrograms): a special kind of
    additive tree in which the tips of the tree are all
    equidistant from the root

14
Making trees according to morphological features
The trouble with numerical taxonomy is that it is
not possible to take the subjectivity out of the
analysis: what is more significant, tail length
or skin color?
Ridley, New Scientist (Dec. 1983) 100, 647-51
15
Why is it possible to make trees with genetic
sequences?
'Molecules as Documents of Evolutionary History', Zuckerkandl and Pauling (1965)
Human      - ACTTGACCCTTACGAT
Orangutan  - AGCTGGCCCTGATTAC
Chimpanzee - AGTTGACCATTACGAT
Gorilla    - AGCTGGTCCTGATGAC
The Cheshire cat from Lewis Carroll's Alice in
Wonderland
16
'All right,' said the Cat; and this time it
vanished quite slowly, beginning with the end of
the tail, and ending with the grin, which remained
some time after the rest of it had gone.
The Cheshire cat from Lewis Carroll's Alice in
Wonderland
17
Constructing trees with the molecular clock
18
Given a multiple alignment, how do we construct
the tree?
A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG
?
19
Issues in phylogenetic inference
  • Algorithms versus optimality criteria
  • Use of evolutionary models and assumptions
    (reciprocal illumination; Hennig, 1966)
  • Types of data: discrete, unordered, and
    independent.
  • Distance methods vs. character-based methods.

20
Distance methods
Logic: evolutionary distance is a tree metric and
hence defines a tree
  • General method:
  • Evolutionary distances are computed for all pairs
    of taxa.
  • A phylogenetic tree is constructed by considering
    the relationships among these distance data
    (fitting a tree to the matrix).
  • Methods we'll talk about:
  • UPGMA
  • Neighbor Joining

21
Construction of a distance tree using clustering
with the Unweighted Pair Group Method with
Arithmetic Mean (UPGMA)
First, construct a distance matrix
A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
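One simple way to fill such a matrix is to count mismatching positions between each pair of aligned sequences. The sketch below does that for the six sequences on this slide. The transcript does not say which distance measure the original figure used, so treating it as a raw mismatch count is an assumption, and these counts need not reproduce the matrix in the figure.

```python
from itertools import combinations

# The six aligned sequences shown on the slide.
SEQS = {
    "A": "GCTTGTCCGTTACGAT",
    "B": "ACTTGTCTGTTACGAT",
    "C": "ACTTGTCCGAAACGAT",
    "D": "ACTTGACCGTTTCCTT",
    "E": "AGATGACCGTTTCGAT",
    "F": "ACTACACCCTTATGAG",
}


def mismatches(s, t):
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


def distance_matrix(seqs):
    """Map each pair of names (in input order) to its mismatch count."""
    return {
        (x, y): mismatches(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


for pair, d in distance_matrix(SEQS).items():
    print(pair, d)
```

In practice the raw count is usually corrected for multiple substitutions at the same site before a tree is built.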
22
Ultrametric Trees
Metric distances (for additive trees) must obey 4 rules:
  • Non-negativity: $d(a,b) \ge 0$
  • Distinctness: $d(a,b) = 0$ if and only if $a = b$
  • Symmetry: $d(a,b) = d(b,a)$
  • Triangle inequality: $d(a,c) \le d(a,b) + d(b,c)$
Ultrametric distances must obey one additional rule:
  • Three-point condition: $d(a,b) \le \max(d(a,c), d(b,c))$
[Figure: two example trees on the taxa a, b and c with labelled branch lengths]
23
UPGMA
First round
$d_{(A,B),C} = (d_{AC} + d_{BC}) / 2 = 4$
$d_{(A,B),D} = (d_{AD} + d_{BD}) / 2 = 6$
$d_{(A,B),E} = (d_{AE} + d_{BE}) / 2 = 6$
$d_{(A,B),F} = (d_{AF} + d_{BF}) / 2 = 8$
Choose the most similar pair, cluster them
together and calculate the new distance matrix.
24
UPGMA
Second round
Third round
25
UPGMA
Fourth round
Fifth round
Note that this method identifies the root of the
tree.
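The rounds above can be sketched in a few lines. The full distance matrix appears only in the original figure, so the pairwise values below are an assumption chosen to agree with the first-round averages quoted on the slide (4, 6, 6, 8). The function name and the nested-tuple tree representation are illustrative.

```python
from itertools import combinations

# Assumed distances, consistent with the first-round averages on the slide.
DIST = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
    ("A", "E"): 6, ("B", "E"): 6, ("C", "E"): 6, ("D", "E"): 4,
    ("A", "F"): 8, ("B", "F"): 8, ("C", "F"): 8, ("D", "F"): 8, ("E", "F"): 8,
}


def upgma(names, dist):
    """Cluster with UPGMA; return (rooted tree as nested tuples, node heights)."""
    size = {name: 1 for name in names}      # cluster -> number of OTUs inside
    height = {name: 0.0 for name in names}  # cluster -> height above the tips
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        # Choose the most similar pair of current clusters.
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        height[merged] = d[frozenset((x, y))] / 2
        # Distance to every other cluster is the size-weighted average.
        for z in size:
            if z not in (x, y):
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))]
                    + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size[x] + size[y]
        del size[x], size[y]
    (root,) = size
    return root, height


tree, heights = upgma("ABCDEF", DIST)
print(tree)
print(heights[tree])
```

The last merge is the root, which is why UPGMA returns a rooted tree without needing an outgroup.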
26
A tree of human mitochondria sequences
http://www.genpat.uu.se/mtDB/
  • The mitochondrial genome has about 16,500 base pairs.
  • In 2000, Gyllensten and colleagues sequenced the
    mitochondrial genomes of 86 people of diverse
    geographical, racial and linguistic backgrounds.
  • A molecular clock seems to hold: these sequences
    diverge at a rate of $1.7 \times 10^{-8}$
    substitutions per site per year.

Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
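As a rough worked example of what a constant rate buys us: under a strict clock, two lineages separated for t years accumulate about 2 × rate × t substitutions per site, so t can be estimated from the observed distance. The sketch below uses the rate and genome length quoted on the slide. It ignores multiple hits at the same site, and the function name is hypothetical.

```python
RATE = 1.7e-8           # substitutions per site per year (from the slide)
GENOME_LENGTH = 16_500  # base pairs (from the slide, rounded)


def divergence_time(differences, sites=GENOME_LENGTH, rate=RATE):
    """Years since two sequences shared an ancestor, assuming a strict clock.

    Both lineages accumulate changes, hence the factor of 2.
    """
    per_site_distance = differences / sites
    return per_site_distance / (2 * rate)


# Expected substitutions per lineage per year across the whole genome.
print(RATE * GENOME_LENGTH)

# Estimated divergence time for a pair differing at 50 sites (made-up count).
print(divergence_time(50))
```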
27
The deepest branches lead exclusively to
sub-Saharan mtDNAs, with the second branch
containing both Africans and non-Africans.
sub-Sahara mtDNA
A tree of 86 mitochondrial sequences, downloaded
from http://www.genpat.uu.se/mtDB/sequences.html
and analyzed with MEGA using the UPGMA method.
28
Phylogeny based upon the molecular clock
  • Evidence for a human mitochondrial origin in
    Africa: African sequence diversity is twice as
    large as that of non-Africans.
  • Gyllensten and colleagues estimate that the
    divergence of Africans and non-Africans occurred
    52,000 to 28,000 years ago.

Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
29
UPGMA assumes a molecular clock
  • The UPGMA clustering method is very sensitive to
    unequal evolutionary rates (assumes that the
    evolutionary rate is the same for all branches).
  • Clustering works only if the data are ultrametric
  • Ultrametric distances are defined by the
    satisfaction of the 'three-point condition'.

The three-point condition: for any three taxa A, B and C,
the two greatest distances are equal.
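The three-point condition is easy to test directly on a distance matrix. This is a minimal sketch; the two small matrices are made-up illustrations and not data from the lecture.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """True if every triple of taxa has its two greatest distances equal."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    for a, b, c in combinations(names, 3):
        # Sort the three pairwise distances; compare the two largest.
        _, middle, largest = sorted(
            [d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]]
        )
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical clock-like distances: A and B are equally far from C.
clock_like = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4}
# Hypothetical unequal rates: B has changed faster than A since they split.
unequal_rates = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 5}

print(is_ultrametric("ABC", clock_like))
print(is_ultrametric("ABC", unequal_rates))
```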
30
UPGMA fails when rates of evolution are not
constant
A tree in which the evolutionary rates are not
equal
(Neighbor joining will get the right tree in this
case.)
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
31
Additive Trees by Neighbor Joining
32
Neighbors
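[Figure: unrooted four-taxon tree. Terminal branches a and b lead to A and B, terminal branches c and d lead to C and D, and the internal branch x connects the two internal nodes.]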
A and B are neighbors because they are connected
through a single internal node. C and D are
also neighbors, but A and D are not neighbors.
33
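If the tree is additive, the 4-point condition
will hold.
The Four Point Condition
[Figure: the four-taxon tree with terminal branches a, b, c, d and internal branch x]
$d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2x = d_{AB} + d_{CD} + 2x$
The 4-point condition:
$d_{AB} + d_{CD} < d_{AC} + d_{BD}$
$d_{AB} + d_{CD} < d_{AD} + d_{BC}$
(left-hand sides: neighbors; right-hand sides: non-neighbors)
Basically states that neighbors are closer than
non-neighbors.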
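A minimal sketch of the four-point test for one quartet: compute the three pairwise sums, check that the two largest are equal, and read the neighbor pairs off the smallest. The branch lengths a=1, b=2, c=3, d=4, x=5 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def four_point(dist, a, b, c, d, tol=1e-9):
    """Return (is_additive, neighbor_pairing) for the quartet a, b, c, d."""
    def get(x, y):
        return dist[frozenset((x, y))]

    # The three ways of splitting four taxa into two pairs.
    sums = {
        ((a, b), (c, d)): get(a, b) + get(c, d),
        ((a, c), (b, d)): get(a, c) + get(b, d),
        ((a, d), (b, c)): get(a, d) + get(b, c),
    }
    ranked = sorted(sums.items(), key=lambda item: item[1])
    (pairing, _), (_, second), (_, third) = ranked
    # Additive distances make the two larger sums equal.
    return abs(third - second) <= tol, pairing


# Path lengths on a tree with branches a=1, b=2, c=3, d=4 and x=5.
QUARTET = {
    frozenset("AB"): 1 + 2,
    frozenset("CD"): 3 + 4,
    frozenset("AC"): 1 + 5 + 3,
    frozenset("AD"): 1 + 5 + 4,
    frozenset("BC"): 2 + 5 + 3,
    frozenset("BD"): 2 + 5 + 4,
}

print(four_point(QUARTET, "A", "B", "C", "D"))
```

Here the neighbor sum is 10 and both non-neighbor sums are 20, a gap of exactly 2x.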
34
Neighbor Joining: an algorithm for finding the
shortest tree
Start with a star (no hierarchical structure)
[Figure: star tree joining a, b, c and d, with a formula giving the length of the tree in terms of the pair-wise distances and the number of OTUs]
35
Neighbor Joining
The following can be used to calculate the length
of this tree
(Saitou and Nei, 1987)
36
Neighbor Joining
At each step, each possible pair of neighbors is
considered and the one producing the shortest
tree is chosen (minimum evolution criterion).
(Saitou and Nei, 1987)
37
Neighbor Joining
As in UPGMA, a new internal branch is added at
each step.
(Saitou and Nei, 1987)
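The tree-length formula from the slide is not in the transcript. The sketch below therefore uses the widely used Q-matrix formulation of neighbor joining as a stand-in for the pair-selection step, which is an assumption about presentation and not the lecture's own notation. The example distances are the hypothetical additive quartet with branch lengths a=1, b=2, c=3, d=4, x=5.

```python
from itertools import combinations

# Path lengths on a hypothetical tree with branches a=1, b=2, c=3, d=4, x=5.
EXAMPLE = {
    ("A", "B"): 3, ("C", "D"): 7,
    ("A", "C"): 9, ("A", "D"): 10,
    ("B", "C"): 10, ("B", "D"): 11,
}


def neighbor_joining(names, dist):
    """Return an unrooted tree as a list of (node, node, branch_length) edges."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        total = {
            i: sum(d[frozenset((i, k))] for k in nodes if k != i)
            for i in nodes
        }
        # Join the pair with the smallest Q value.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        dij = d[frozenset((i, j))]
        new = f"node{next_id}"  # label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        length_i = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        edges.append((i, new, length_i))
        edges.append((j, new, dij - length_i))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


for edge in neighbor_joining("ABCD", EXAMPLE):
    print(edge)
```

Unlike UPGMA, the result is unrooted and the branch lengths to the two joined nodes may differ, so unequal rates are tolerated.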
38
Rooting a neighbor joining tree with an outgroup
Root
Outgroup
Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
39
The Tree of Life
16S ribosomal RNA
40
Relationships between 16S ribosomal RNAs
Distant relationships
Close relationships
41
The three domains of Life as identified by
phylogenetic analysis of the highly conserved
16S ribosomal RNA
16S ribosomal RNA
(see Woese and Fox 1977)
42
Where is the root of the tree of life?
(by definition there is no outgroup)
43
An ancient gene duplication can root a tree
Speciation of 3 and 1-2
Gene duplication
Speciation of 1 and 2
Outgroups for A2
Outgroups for A1
Root of 1,2,3
Graur & Li, Fundamentals of Molecular Evolution
(1999)
44
The root of the tree of life as inferred from
EF-Tu and EF-G
Both trees show Archaea and Eucarya as sister taxa
Graur & Li, Fundamentals of Molecular Evolution
(1999)
45
Gene Trees
46
A gene tree may contradict the species tree
Horizontal Gene Transfer can explain this. This
phenomenon is so important that we will soon devote
an entire lecture to it.
archaea
eubacteria
Mn-dependent transcriptional regulator
(Tatusov, 1996)
47
What is the origin of the mitochondria?
http://www.mitomap.org/
48
The mitochondrion appears out of place
  • Both mitochondria and chloroplasts can arise only
    from preexisting mitochondria and chloroplasts.
    They cannot be formed in a cell that lacks them
    because nuclear genes encode only some of the
    proteins of which they are made.
  • Both mitochondria and chloroplasts have their own
    genome.
  • Both genomes consist of a single circular
    molecule of DNA.
  • There are no histones associated with the DNA.

49
Lynn Margulis and the Endosymbiotic Theory of
Eukaryote Evolution
http://www.mrs.umn.edu/goochv/CellBio/lectures/endo/endo.html
50
The genome sequence of Rickettsia prowazekii and
the origin of mitochondria.
Andersson SG et al. Nature. 1998 Nov 12;396(6707):133-40.
51
Mitochondria sit with the proteobacteria in
the tree of life
mitochondrial (MT)
Small-subunit (SSU) ribosomal RNA tree
Gray MW. Nature. 1998 Nov 12;396(6707):109-10.
52
Mitochondria derive from α-purple bacteria.
Chloroplasts derive from cyanobacteria.
Conclusion: the mitochondria and chloroplasts are
slaves!
Graur & Li, Fundamentals of Molecular Evolution
(1999)
53
The tree of life with mitochondria and
chloroplast endosymbiotic events
(Doolittle, 1999)
54
Maximum Parsimony
55
Character state methods
Logic: examine each column in the multiple
alignment of the sequences. Examine all possible
trees and choose among them according to some
optimality criterion.
  • Method we'll talk about:
  • Maximum parsimony

56
Maximum Parsimony
Simpler hypotheses are preferable to more
complicated ones, and ad hoc hypotheses
should be avoided whenever possible (Occam's
Razor). Thus, find the tree that requires the
smallest number of evolutionary changes.
    0123456789012345
W - ACTTGACCCTTACGAT
X - AGCTGGCCCTGATTAC
Y - AGTTGACCATTACGAT
Z - AGCTGGTCCTGATGAC
[Figure: a tree relating W, X, Y and Z]
57
Maximum Parsimony
Start by classifying the sites
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
Sites are either invariant or variant.
Variant sites are either informative or non-informative.
58
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
[Figure: the three possible trees for the four sequences, drawn for Site 5, Site 2 and Site 3 with the bases at the tips and at the internal nodes]
59
Maximum Parsimony
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
Informative sites supporting each of the three possible trees: 3, 1, 0
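The classification can be sketched directly on the four-sequence alignment above. A site is treated as informative when at least two different bases each appear in at least two sequences. For four taxa, each informative site then favours exactly one of the three possible trees. Function names are illustrative.

```python
from collections import Counter

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_sites(alignment):
    """Label each 1-based position as invariant, informative or non-informative."""
    classes = {}
    for position, column in enumerate(zip(*alignment.values()), start=1):
        counts = Counter(column)
        if len(counts) == 1:
            classes[position] = "invariant"
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            classes[position] = "informative"
        else:
            classes[position] = "non-informative"
    return classes


def tree_support(alignment):
    """For exactly four taxa, count informative sites favouring each pairing."""
    n0, n1, n2, n3 = alignment  # taxon names in input order
    support = Counter()
    for w, x, y, z in zip(*alignment.values()):
        if w == x and y == z and w != y:
            support[((n0, n1), (n2, n3))] += 1
        elif w == y and x == z and w != x:
            support[((n0, n2), (n1, n3))] += 1
        elif w == z and x == y and w != x:
            support[((n0, n3), (n1, n2))] += 1
    return support


classes = classify_sites(ALIGNMENT)
print([p for p, label in classes.items() if label == "informative"])
print(tree_support(ALIGNMENT))
```

The tree grouping Mouse with Rat and Dog with Human collects the most informative sites, so it is the most parsimonious of the three.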
60
Maximum Parsimony
The situation is more complicated when there
are more than four units.
[Figure: a tree whose tips carry single bases and whose internal nodes carry sets of candidate bases such as (AT), (TAG) and (TAGC)]
Problems with maximum parsimony:
  • Only uses informative sites
  • Long branches attract
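The sets of candidate bases at internal nodes resemble the bookkeeping of Fitch's small-parsimony algorithm. The slide does not name the method, so using Fitch here is an assumption. The sketch scores one fixed tree per call: an internal node takes the intersection of its children's sets if that is non-empty, otherwise the union at a cost of one change. It is shown on the four-sequence alignment from the earlier slides, with trees written as nested pairs; where the root is placed does not change the score.

```python
ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}

# The three possible groupings of four taxa, each given an arbitrary root.
TREES = [
    (("Mouse", "Rat"), ("Dog", "Human")),
    (("Mouse", "Dog"), ("Rat", "Human")),
    (("Mouse", "Human"), ("Rat", "Dog")),
]


def fitch(tree, states):
    """Return (candidate_state_set, minimum_changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared candidate: one substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


for tree in TREES:
    print(tree, parsimony_score(tree, ALIGNMENT))
```

With more than four units the same scoring still works for any single tree; the hard part is that the number of trees to score grows very quickly.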
61
Bootstrapping
62
How confident are we in this tree?
63
Bootstrapping
A statistical method that can be used to place
confidence intervals on phylogenies
64
Resampling from the Data
human_myoglobin        -GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL ...
pig_myoglobin          -GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL ...
horse_myoglobin        -GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHL ...
common_seal_myoglobin  -GLSEGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHL ...
sperm_whale_myoglobin  MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL ...
sea_hare_myoglobin     -SLSAAEADLAGKSWAPVFANKDANGDAFLVALFEKFPDSANFFADFKG- ...
Pick with replacement:
human_myoglobin        LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
pig_myoglobin          LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
horse_myoglobin        LQKWDQTHNVHTEFGAEELQGDKLSWKTLDQGKKVVTKELGQDEDEWLGE
common_seal_myoglobin  LQKWEQKHNVHTEFGADELQGDKLSWKKLDQGKKVVKKELGLDEDDWLGE
sperm_whale_myoglobin  LQRWEQKHHVHTEFAADELQGDKLSWKKLDQGRKVVKKELGLDEDDWLGE
sea_hare_myoglobin     LDDWADENKSNSNFAAAELDANFASAPELNDGDKVAEKFAALNNAAWAAN
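Picking with replacement means drawing alignment columns at random, keeping each column intact across all sequences, until the pseudo-alignment is as long as the original. Here is a minimal sketch using the short primate alignment from earlier in the lecture. The function name and the fixed seed are illustrative choices.

```python
import random

ALIGNMENT = {
    "Human":      "ACTTGACCCTTACGAT",
    "Orangutan":  "AGCTGGCCCTGATTAC",
    "Chimpanzee": "AGTTGACCATTACGAT",
    "Gorilla":    "AGCTGGTCCTGATGAC",
}


def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement to build one pseudo-alignment."""
    length = len(next(iter(alignment.values())))
    # Some columns will be drawn several times and others not at all.
    picks = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


rng = random.Random(2009)  # fixed seed so the run is repeatable
replicate = bootstrap_alignment(ALIGNMENT, rng)
for name, seq in replicate.items():
    print(f"{name:<10} {seq}")
```

Each replicate is then run through the same tree-building method, and the fraction of replicates in which a grouping appears is reported as its bootstrap value.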
65
Estimating Confidence from the Resamplings
[Figure: three alternative trees for Human, Chimpanzee, Gorilla, Orang-utan and Gibbon, recovered in 41/100, 28/100 and 31/100 of the resamplings; the summary tree carries bootstrap values of 41 and 100]
The End, Thanks