Phylogenies and the Tree of Life - PowerPoint PPT Presentation

Slides: 26
Provided by: stati3
Learn more at: http://www.stats.ox.ac.uk
Transcript and Presenter's Notes

1
Phylogenies and the Tree of Life
Basic Principles of Phylogenetics: Parsimony - Distance - Likelihood; Topologies - Super Trees - Testing; Networks; Challenges
Empirical Investigations: Molecular Clock; Biochemical rates; Selection Strength
Tree shapes: Branching Patterns; Rootings
Open Questions
2
Central Principles of Phylogeny Reconstruction
TTCAGT TCCAGT GCCAAT GCCAAT
3
From Distance to Phylogenies
What is the relationship of a, b, c, d and e?
4
Enumerating Trees: unrooted, valency 3
Recursion: T_n = (2n-5) T_{n-1}
Initialisation: T_1 = T_2 = T_3 = 1
n     4   5    6     7     8       9            10           15            20
T_n   3   15   105   945   10395   1.4 × 10^5   2.0 × 10^6   7.9 × 10^12   2.2 × 10^20
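The recursion for the number of unrooted trees can be checked with a few lines of Python. This is a minimal sketch; the function name unrooted_tree_count is an illustrative choice, not something from the slides.

```python
def unrooted_tree_count(n: int) -> int:
    """Number of unrooted trees with n labelled leaves and valency-3 internal nodes.

    Uses T_n = (2n - 5) * T_{n-1} with T_1 = T_2 = T_3 = 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


for n in (4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(n, unrooted_tree_count(n))
```

For n = 8 the recursion gives 945 × 11 = 10395.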
5
Heuristic Searches in Tree Space
Nearest Neighbour Interchange
Subtree regrafting
Subtree rerooting and regrafting
6
Assignment to internal nodes: the simple way.
What is the cheapest assignment of nucleotides to internal nodes, given some (symmetric) distance function d(N1,N2)?
If there are k leaves, there are k-2 internal nodes and 4^(k-2) possible assignments of nucleotides. For k = 22, this is more than 10^12.
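The exhaustive approach can be sketched for a tiny tree. The example below is an illustration under assumptions: the four-leaf topology, the node names and the 0/1 distance function are hypothetical, and the leaf states T, T, G, G are the first column of the four sequences on slide 2.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def unit_distance(x, y):
    """Hypothetical symmetric distance: 0 for identical nucleotides, 1 otherwise."""
    return 0 if x == y else 1


def brute_force_cost(edges, leaf_states, internal_nodes, d=unit_distance):
    """Try every assignment of nucleotides to the internal nodes and keep the cheapest."""
    best = float("inf")
    for combo in product(NUCLEOTIDES, repeat=len(internal_nodes)):
        states = dict(leaf_states)
        states.update(zip(internal_nodes, combo))
        total = sum(d(states[u], states[v]) for u, v in edges)
        best = min(best, total)
    return best


# Assumed unrooted tree: leaves l1, l2 hang from x; leaves l3, l4 hang from y.
edges = [("x", "l1"), ("x", "l2"), ("x", "y"), ("y", "l3"), ("y", "l4")]
leaves = {"l1": "T", "l2": "T", "l3": "G", "l4": "G"}

print(brute_force_cost(edges, leaves, ["x", "y"]))  # 1
print(4 ** (22 - 2))  # 1099511627776 assignments for k = 22 leaves
```

With 2 internal nodes there are only 16 assignments to try; the count grows as 4^(k-2), which is why this only works for very small trees.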

7
5S RNA Alignment and Phylogeny (Hein, 1990)
Weights: transitions 2, transversions 5. Total weight: 843.
10  tatt-ctggtgtcccaggcgtagaggaaccacaccgatccatctcgaacttggtggtgaaactctgccgcggt--aaccaatact-cg-gg-gggggccct-gcggaaaaatagctcgatgccagga--ta
17  t--t-ctggtgtcccaggcgtagaggaaccacaccaatccatcccgaacttggtggtgaaactctgctgcggt--ga-cgatact-tg-gg-gggagcccg-atggaaaaatagctcgatgccagga--t-
9   t--t-ctggtgtctcaggcgtggaggaaccacaccaatccatcccgaacttggtggtgaaactctattgcggt--ga-cgatactgta-gg-ggaagcccg-atggaaaaatagctcgacgccagga--t-
14  t----ctggtggccatggcgtagaggaaacaccccatcccataccgaactcggcagttaagctctgctgcgcc--ga-tggtact-tg-gg-gggagcccg-ctgggaaaataggacgctgccag-a--t-
3   t----ctggtgatgatggcggaggggacacacccgttcccataccgaacacggccgttaagccctccagcgcc--aa-tggtact-tgctc-cgcagggag-ccgggagagtaggacgtcgccag-g--c-
11  t----ctggtggcgatggcgaagaggacacacccgttcccataccgaacacggcagttaagctctccagcgcc--ga-tggtact-tg-gg-ggcagtccg-ctgggagagtaggacgctgccag-g--c-
4   t----ctggtggcgatagcgagaaggtcacacccgttcccataccgaacacggaagttaagcttctcagcgcc--ga-tggtagt-ta-gg-ggctgtccc-ctgtgagagtaggacgctgccag-g--c-
15  g----cctgcggccatagcaccgtgaaagcaccccatcccat-ccgaactcggcagttaagcacggttgcgcccaga-tagtact-tg-ggtgggagaccgcctgggaaacctggatgctgcaag-c--t-
8   g----cctacggccatcccaccctggtaacgcccgatctcgt-ctgatctcggaagctaagcagggtcgggcctggt-tagtact-tg-gatgggagacctcctgggaataccgggtgctgtagg-ct-t-
12  g----cctacggccataccaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgagcccagt-tagtact-tg-gatgggagaccgcctgggaatcctgggtgctgtagg-c--t-
7   g----cttacgaccatatcacgttgaatgcacgccatcccgt-ccgatctggcaagttaagcaacgttgagtccagt-tagtact-tg-gatcggagacggcctgggaatcctggatgttgtaag-c--t-
16  g----cctacggccatagcaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgcgcccagt-tagtact-tg-ggtgggagaccgcctgggaatcctgggtgctgtagg-c--t-
1   a----tccacggccataggactctgaaagcactgcatcccgt-ccgatctgcaaagttaaccagagtaccgcccagt-tagtacc-ac-ggtgggggaccacgcgggaatcctgggtgctgt-gg-t--t-
18  a----tccacggccataggactctgaaagcaccgcatcccgt-ccgatctgcgaagttaaacagagtaccgcccagt-tagtacc-ac-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
2   a----tccacggccataggactgtgaaagcaccgcatcccgt-ctgatctgcgcagttaaacacagtgccgcctagt-tagtacc-at-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
5   g---tggtgcggtcataccagcgctaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagaa-cagtact-gg-gatgggtgacctcccgggaagtcctggtgccgcacc-c--c-
13  g----ggtgcggtcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagcc-tagtact-ag-gatgggtgacctcctgggaagtcctgatgctgcacc-c--t-
6   g----ggtgcgatcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggttggag-tagtact-ag-gatgggtgacctcctgggaagtcctaatattgcacc-c-tt-
8
Cost of a history - minimizing over internal
states
A C G T
9
Cost of a history: leaves (initialisation).
A C G T
Initialisation at leaves: Cost(N) = 0 if N is the nucleotide at the leaf, otherwise infinity.
[Figure: leaves labelled G and A]
Empty: Cost = 0
10
Fitch-Hartigan-Sankoff Algorithm
(A,C,G,T) = (9,7,7,7)
(A,C,G,T) = (10,2,10,2)
The cost of the cheapest tree hanging from this node, given there is a C at this node.
(A,C,G,T) 0
(A,C,G,T) 0
(A,C,G,T) 0
[Figure labels: 5, C, A, 2, T, G]
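The dynamic programming step can be sketched as below. Several things here are assumptions rather than content of the slide: the nested-tuple tree representation, the function names, and the topology ((C,T),G), which was chosen only because, with the weights transitions 2 and transversions 5, it reproduces the two cost vectors shown above.

```python
import math

NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}


def substitution_cost(x, y):
    """Weights as on the 5S RNA slide: transitions cost 2, transversions cost 5."""
    if x == y:
        return 0
    is_transition = (x in PURINES) == (y in PURINES)
    return 2 if is_transition else 5


def sankoff(tree):
    """Return {nucleotide: cost of the cheapest subtree given that nucleotide at its root}.

    A tree is either a leaf (a one-letter string) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        # Initialisation: cost 0 for the observed nucleotide, infinity otherwise.
        return {n: (0 if n == tree else math.inf) for n in NUCLEOTIDES}
    child_costs = [sankoff(child) for child in tree]
    return {
        parent: sum(
            min(costs[c] + substitution_cost(parent, c) for c in NUCLEOTIDES)
            for costs in child_costs
        )
        for parent in NUCLEOTIDES
    }


cherry = sankoff(("C", "T"))        # assumed subtree with leaves C and T
root = sankoff((("C", "T"), "G"))   # assumed topology, see lead-in
print([cherry[n] for n in NUCLEOTIDES])  # [10, 2, 10, 2]
print([root[n] for n in NUCLEOTIDES])    # [9, 7, 7, 7]
```

The minimum over the root vector (here 7) is the parsimony cost of the column. Each node is visited once, so the work is linear in the number of nodes instead of exponential in the number of internal nodes.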
11
The Felsenstein Zone (Felsenstein-Cavender, 1979)
Patterns (16; only 8 shown), one column per pattern, one row per leaf:
0 1 0 0 0 0 0 0
0 0 1 0 0 1 0 1
0 0 0 1 0 1 1 0
0 0 0 0 1 0 1 1
12
Bootstrapping (Felsenstein, 1985)
ATCTGTAGTCT
ATCTGTAGTCT
ATCTGTAGTCT
ATCTGTAGTCT
Column counts in one resample: 1 0 2 3 0 1 0 1 2 0 1
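Column resampling can be sketched as follows. The function names are illustrative assumptions; the counts used in the last lines are the eleven digits on the slide, read as one count per alignment column (they sum to 11, the alignment length).

```python
import random
from collections import Counter


def bootstrap_column_counts(n_columns, rng):
    """Draw n_columns columns with replacement; return how often each column was drawn."""
    draws = [rng.randrange(n_columns) for _ in range(n_columns)]
    tally = Counter(draws)
    return [tally[i] for i in range(n_columns)]


def resample_alignment(alignment, counts):
    """Build the pseudo-replicate: column i is repeated counts[i] times in every row."""
    return ["".join(ch * k for ch, k in zip(row, counts)) for row in alignment]


alignment = ["ATCTGTAGTCT"] * 4
rng = random.Random(1985)  # fixed seed only to make the sketch reproducible
counts = bootstrap_column_counts(len(alignment[0]), rng)
replicate = resample_alignment(alignment, counts)

slide_counts = [1, 0, 2, 3, 0, 1, 0, 1, 2, 0, 1]
print(resample_alignment(alignment, slide_counts)[0])  # ACCTTTTGTTT
```

A tree is then rebuilt from each replicate, and the support of a branch is the fraction of replicates in which it appears.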
13
Assignment to internal nodes: the simple way.
If the branch lengths and the evolutionary process are known, what is the probability of the nucleotides at the leaves?
Cctacggccatacca a ccctgaaagcaccccatcccgt
Cttacgaccatatca c cgttgaatgcacgccatcccgt
Cctacggccatagca c ccctgaaagcaccccatcccgt
Cccacggccatagga c ctctgaaagcactgcatcccgt
Tccacggccatagga a ctctgaaagcaccgcatcccgt
Ttccacggccatagg c actgtgaaagcaccgcatcccg
Tggtgcggtcatacc g agcgctaatgcaccggatccca
Ggtgcggtcatacca t gcgttaatgcaccggatcccat
14
Probability of leaf observations - summing over
internal states
A C G T
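Summing over internal states follows the same recursion as the parsimony cost, with min replaced by sum and addition by multiplication. The sketch below makes assumptions the slides do not state: it uses the Jukes-Cantor model with branch lengths in expected substitutions per site, a uniform root distribution, and a hypothetical tree representation of (subtree, branch length) pairs.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_transition(x, y, t):
    """Jukes-Cantor probability of y at the end of a branch of length t, given x at the start."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial_likelihoods(tree):
    """Return {nucleotide: P(leaves below | nucleotide at this node)}.

    A tree is a leaf (one-letter string) or a tuple of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {n: (1.0 if n == tree else 0.0) for n in NUCLEOTIDES}
    result = {n: 1.0 for n in NUCLEOTIDES}
    for child, t in tree:
        below = partial_likelihoods(child)
        for parent in NUCLEOTIDES:
            result[parent] *= sum(
                jc69_transition(parent, c, t) * below[c] for c in NUCLEOTIDES
            )
    return result


def site_likelihood(tree):
    """Probability of the leaf nucleotides, with a uniform distribution at the root."""
    at_root = partial_likelihoods(tree)
    return sum(0.25 * at_root[n] for n in NUCLEOTIDES)


# Hypothetical three-leaf tree with made-up branch lengths.
example = ((("C", 0.1), ("T", 0.1)), 0.05), ("G", 0.2)
print(site_likelihood(example))
```

The likelihood of an alignment is the product of the site likelihoods over its columns, usually accumulated as a sum of logarithms.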
15
Output from Likelihood Method.
Likelihood: 7.9 × 10^-14 ?? ? 0.31 0.18
Likelihood: 6.2 × 10^-12 ?? ? 0.34 0.16
2 [ln(6.2 × 10^-12) - ln(7.9 × 10^-14)] is χ² distributed with (n-2) degrees of freedom
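The usual likelihood ratio statistic is twice the difference of the log-likelihoods, larger model minus smaller model. A minimal sketch with the two likelihoods from the slide; the function names are illustrative, and the number of sequences n is not given on the slide, so the example n = 18 below is only an assumption for showing the degrees of freedom.

```python
import math


def likelihood_ratio_statistic(likelihood_restricted, likelihood_general):
    """2 * (ln L_general - ln L_restricted), compared against a chi-square distribution."""
    return 2.0 * (math.log(likelihood_general) - math.log(likelihood_restricted))


def clock_test_degrees_of_freedom(n):
    """(2n - 3) free branch lengths without a clock minus (n - 1) node times with one."""
    return (2 * n - 3) - (n - 1)


statistic = likelihood_ratio_statistic(7.9e-14, 6.2e-12)
print(round(statistic, 2))                 # about 8.73
print(clock_test_degrees_of_freedom(18))   # n - 2 = 16 for an assumed n = 18
```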
16
The Molecular Clock
First noted by Zuckerkandl and Pauling (1964) as an empirical fact. How can one detect it?
17
Rootings
Purpose:
1) To give a time direction in the phylogeny and locate its most ancient point.
2) To be able to define concepts such as a monophyletic group.
Methods:
1) Outgroup: Enhance the data set with a sequence from a species definitely distant to all of them. It will be joined at the root of the original data.
2) Midpoint: Find the midpoint of the longest path in the tree.
3) Assume a molecular clock.
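Midpoint rooting can be sketched on an unrooted tree stored as a list of weighted edges. The representation, the function names and the example branch lengths are assumptions for illustration. The longest path is found with two traversals: the node farthest from any start node is one end of the longest path, and the node farthest from that end is the other.

```python
from collections import defaultdict


def build_adjacency(edges):
    """edges: iterable of (node, node, branch_length) for an unrooted tree."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj


def farthest(adj, start):
    """Return (farthest node, its distance from start, parent map of the traversal)."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, length in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + length
                parent[v] = u
                stack.append(v)
    end = max(dist, key=dist.get)
    return end, dist[end], parent


def midpoint_root(edges):
    """Return (node, neighbour, offset): the root lies on edge node-neighbour, offset from node."""
    adj = build_adjacency(edges)
    a, _, _ = farthest(adj, next(iter(adj)))
    b, longest, parent = farthest(adj, a)
    remaining = longest / 2.0
    node = b
    while True:  # walk from b towards a until half the longest path is covered
        up = parent[node]
        length = adj[node][up]
        if remaining <= length:
            return node, up, remaining
        remaining -= length
        node = up


# Hypothetical tree: the longest path is B-x-y-D with length 3 + 1 + 5 = 9.
edges = [("A", "x", 1.0), ("B", "x", 3.0), ("x", "y", 1.0), ("C", "y", 2.0), ("D", "y", 5.0)]
print(midpoint_root(edges))  # ('y', 'D', 0.5)
```

In the example the root is placed on the branch leading to D, 4.5 from both B and D.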
18
Rooting the 3 kingdoms
3 billion years ago: no reliable clock, no outgroup.
Given 2 sets of homologous proteins, such as MDH and LDH, can the archaea, prokaryotes and eukaryotes be rooted?
19
The generation/year-time clock (Langley-Fitch, 1973)
20
The generation/year-time clock (Langley-Fitch, 1973)
Can the generation time clock be tested?
21
The generation/year-time clock (Langley-Fitch, 1973)
k3, t2 dg4 k, t dg (2k-3)-(t-1)
22
  • β-globin, cytochrome c, fibrinopeptide A: generation time clock
  • Langley-Fitch, 1973
  • Relative rates:
  • α-globin 0.342
  • β-globin 0.452
  • cytochrome c 0.069
  • fibrinopeptide A 0.137

23
Almost Clocks
References: MJ Sanderson (1997) A Nonparametric Approach to Estimating Divergence Times in the Absence of Rate Constancy. Mol. Biol. Evol. 14(12):1218-31; JL Thorne et al. (1998) Estimating the Rate of Evolution of the Rate of Evolution. Mol. Biol. Evol. 15(12):1647-57; JP Huelsenbeck et al. (2000) A Compound Poisson Process for Relaxing the Molecular Clock. Genetics 154:1879-92.
I. Smoothing a non-clock tree onto a clock tree (Sanderson).
II. Rate of evolution of the rate of evolution (Thorne et al.). The rate of evolution can change at each bifurcation.
III. Relaxed molecular clock (Huelsenbeck et al.). At random points in time, the rate changes by multiplication with a gamma-distributed random variable.
Comment: Makes perfect sense. Testing no clock versus a perfect clock is choosing between two unrealistic extremes.
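Approach III can be illustrated with a small simulation along a single lineage. This is a sketch under assumptions, not the model of Huelsenbeck et al. as published: the parameter names are hypothetical, and the gamma multiplier is given mean one here purely for convenience.

```python
import random


def simulate_rate_trajectory(total_time, event_rate, gamma_shape, initial_rate=1.0, rng=None):
    """Return [(time, rate), ...]: the rate is constant between Poisson-distributed change points."""
    rng = rng or random.Random()
    t = 0.0
    rate = initial_rate
    trajectory = [(0.0, rate)]
    while True:
        t += rng.expovariate(event_rate)  # waiting time to the next rate change
        if t >= total_time:
            return trajectory
        # Assumed multiplier: gamma with shape k and scale 1/k, so its mean is 1.
        rate *= rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
        trajectory.append((t, rate))


def expected_substitutions(trajectory, total_time):
    """Integrate the piecewise-constant rate over [0, total_time]."""
    ends = [time for time, _ in trajectory[1:]] + [total_time]
    return sum(rate * (end - start) for (start, rate), end in zip(trajectory, ends))


path = simulate_rate_trajectory(10.0, event_rate=0.5, gamma_shape=2.0, rng=random.Random(7))
print(len(path) - 1, "rate changes; branch length", expected_substitutions(path, 10.0))
```

Under a perfect clock the trajectory would contain only the initial rate, and the branch length would be proportional to elapsed time.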
24
Spannoids
Advantage: Decomposes large trees into small trees.
Questions: How to find the optimal spannoid? How well do they approximate?
25
Profiloids and Staroids
Questions:
Parameter changes on edges relating HMMs
Choosing optimal staroids