Phylogenetic Trees Lecture 12 - PowerPoint PPT Presentation

Provided by: NirF99
Learn more at: http://www.cs.technion.ac.il
1
Phylogenetic Trees: Lecture 12
Based on pages 160-176 in Durbin et al. (the black
textbook).
This class has been edited from Nir Friedman's
lecture, which was available at www.cs.huji.ac.il/~nir.
Pictures from Tal Pupko's slides. Changes by
Dan Geiger and Shlomo Moran.
2
Evolution
  • Evolution of new organisms is driven by:
  • Diversity: different individuals carry different
    variants of the same basic blueprint
  • Mutations: the DNA sequence can change through
    single-base changes, deletion/insertion of DNA
    segments, etc.
  • Selection bias

3
The Tree of Life
Source Alberts et al
4
Tree of life: a better picture
After Ernst Haeckel, 1891
5
Primate evolution
A phylogeny, also called a phylogenetic tree, is a
tree that describes the sequence of speciation
events that led to the formation of a set of
present-day species.
6
Morphological vs. Molecular
  • Classical phylogenetic analysis uses morphological
    features: number of legs, lengths of legs, etc.
  • Modern biological methods allow the use of
    molecular features:
  • Gene sequences
  • Protein sequences
  • Analysis is based on homologous sequences (e.g.,
    globins) in different species
  • Important for many aspects of biology:
  • Classification
  • Understanding biological mechanisms

7
Morphological topology
(Based on McKenna and Bell, 1997)
Archonta
Ungulata
8
From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
Many types of sequences can be used
(e.g., mitochondrial vs. nuclear proteins).
9
Mitochondrial topology
(Based on Pupko et al.)
10
Nuclear topology
(Based on Pupko et al. slide)
(tree by Madsen)
11
Theory of Evolution
  • Basic idea:
  • Speciation events lead to the creation of
    different species.
  • Speciation is caused by physical separation into
    groups where different genetic variants become
    dominant
  • Any two species share a (possibly distant) common
    ancestor

12
Phylogenetic trees
  • Leaves: current-day species
  • Nodes: hypothetical most recent common ancestors
  • Edge lengths: time from one speciation to the
    next

13
Dangers in Molecular Phylogenies
  • Gene and protein sequences can be homologous for
    various reasons:
  • Orthologs: sequences that diverged after a
    speciation event. Indicative of a new species.
  • Paralogs: sequences that diverged after a
    duplication event.
  • Xenologs: sequences that diverged after a
    horizontal transfer (e.g., by a virus).

14
Gene Phylogenies
Phylogenies can also be constructed to describe
the evolution of genes.
Three species, termed 1, 2, 3. Two paralogous
genes, A and B.
15
Dangers of Paralogs
  • If we happen to consider only the genes 1A, 2B,
    and 3A, we get a wrong tree that does not
    represent the phylogeny of the host species of
    the given sequences, because duplication does not
    create new species.

[Figure: gene tree with a gene duplication followed
by speciation events; leaves 2B, 1B, 3A, 3B, 2A, 1A.]
In the sequel we assume all given sequences are
orthologs.
16
Types of Trees
  • A natural model to consider is that of rooted
    trees

Common Ancestor
17
Types of trees
  • An unrooted tree represents a phylogeny without
    the root node

Depending on the model, data from current-day
species may not distinguish between different
placements of the root. In this example there
are seven possible ways to place a root.
18
Rooted versus unrooted trees
[Figure: the unrooted tree on leaves a, b, c
represents the three rooted trees on these leaves.]
Slide by Tal Pupko
19
Positioning Roots in Unrooted Trees
  • We can estimate the position of the root by
    introducing an outgroup:
  • a set of species that are definitely distant from
    all the species of interest

[Figure: tree with the proposed root; leaves Falcon,
Aardvark, Bison, Chimp, Dog, Elephant.]
20
Type of Data
  • Distance-based
  • Input is a matrix of distances between species
  • Can be the fraction of residues they disagree on,
    or the alignment score between them, etc.
  • Character-based
  • Examine each character (e.g., residue) separately
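
As a minimal sketch of the distance-based input, the following computes the fraction of residues on which two aligned sequences disagree, using the four fragments from the "From sequences to a phylogenetic tree" slide. The function names and the dictionary layout are assumptions made for illustration; real analyses usually apply a correction for multiple substitutions, which is not shown here.

```python
from itertools import combinations

# Aligned protein fragments taken from the earlier slide.
SEQS = {
    "Rat": "QEPGGLVVPPTDA",
    "Rabbit": "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat": "REPGGLVVPPTEG",
}


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Symmetric distance matrix stored as {(name1, name2): distance}."""
    d = {(s, s): 0.0 for s in seqs}
    for s, t in combinations(seqs, 2):
        d[s, t] = d[t, s] = p_distance(seqs[s], seqs[t])
    return d


D = distance_matrix(SEQS)
for (s, t), value in D.items():
    if s < t:
        print(f"{s:8s}{t:8s}{value:.3f}")
```

On these fragments Rat and Gorilla are identical, Rat and Rabbit differ at one of 13 positions, and Cat differs from Rat at three.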

21
Three Methods of Tree Construction
  • Distance: a tree that recursively combines the
    two nodes with the smallest distance.
  • Parsimony: a tree with the minimum total number
    of character changes between nodes.
  • Maximum likelihood: finding the best Bayesian
    network with a tree shape. The method of choice
    nowadays. A widely used software package, PHYLIP,
    implements this method:
    http://evolution.genetics.washington.edu/phylip.html

22
Distance-Based Methods (1st type of method)
  • Input: a distance matrix between species
  • Outline:
  • Cluster species together
  • Initially, clusters are singletons
  • At each iteration, combine the two closest
    clusters to get a new one

23
UPGMA Clustering
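  • Let $C_i$ and $C_j$ be clusters; define the
    distance between them to be the average distance
    between their members:
    $d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$
  • When we combine two clusters, $C_i$ and $C_j$, to
    form a new cluster $C_k$, then for every other
    cluster $C_l$:
    $d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$
  • Define a node k whose daughter nodes are i and j,
    and place it at height $d(C_i, C_j)/2$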
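
The clustering steps above can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes the standard UPGMA definitions (cluster distance is the average pairwise distance between members, the merged cluster's distance to any other cluster is the size-weighted average of its parts, and the new node sits at half the distance between the merged clusters). The input format and all names are hypothetical.

```python
from itertools import combinations


def upgma(dist):
    """Build a UPGMA tree from {(leaf_a, leaf_b): distance}.

    Returns (root, height). A cluster is a nested tuple of its two
    children; height[c] is the height of cluster c (0 for leaves).
    """
    d, size, height = {}, {}, {}
    for (a, b), value in dist.items():
        d[frozenset((a, b))] = float(value)
        size[a] = size[b] = 1
        height[a] = height[b] = 0.0

    while len(size) > 1:
        # Pick the two closest clusters.
        ci, cj = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        ck = (ci, cj)
        # Distance from the merged cluster to every other cluster:
        # size-weighted average of the two parts.
        for cl in size:
            if cl != ci and cl != cj:
                d[frozenset((ck, cl))] = (
                    size[ci] * d[frozenset((ci, cl))]
                    + size[cj] * d[frozenset((cj, cl))]
                ) / (size[ci] + size[cj])
        height[ck] = d[frozenset((ci, cj))] / 2
        size[ck] = size[ci] + size[cj]
        del size[ci], size[cj]

    root = next(iter(size))
    return root, height


# Hypothetical clock-like distances on four leaves.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4,
}
root, height = upgma(example)
print(root, height[root])
```

In this example A and B merge first at height 1, then C and D at height 2, and the root joins both clusters at height 3.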

24
Example
UPGMA construction on five objects. The length of
an edge is its (vertical) height.
[Figure: UPGMA tree on leaves 1-5 with internal
nodes 6-9; marked heights 0.5 d(2,3) and 0.5 d(7,8).]
25
Molecular clock
This phylogenetic tree has all leaves at the same
level. When this property holds, the
phylogenetic tree is said to satisfy a molecular
clock. Namely, the time from a speciation event
to the formation of the current species is identical
for all paths (an assumption that is wrong in reality).
26
Molecular Clock
UPGMA constructs trees that satisfy a molecular
clock, even if the true tree does not satisfy a
molecular clock.
27
Restrictive Correctness of UPGMA
Proposition: If the distance function is derived
by adding edge distances in a tree T with a
molecular clock, then UPGMA will reconstruct T.
28
Additivity
  • A molecular clock defines additive distances;
    namely, the distances between objects can be
    realized by a tree

29
Basic property of Additivity
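  • Suppose input distances are additive
  • For any three leaves i, j, m there is a node k
    where the paths between them meet, so that
    $d(i,m) = d(i,k) + d(k,m)$, $d(j,m) = d(j,k) + d(k,m)$, $d(i,j) = d(i,k) + d(k,j)$
  • Thus
    $d(k,m) = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$

[Figure: leaves i, j, m joined at an internal node k;
edge labels a, b, c.]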
30
Constructing additive trees: The neighbor finding
problem
  • Can we use this fact to construct trees assuming
    only additivity (but not a molecular clock)?

Yes. The formula shows that if we knew that i
and j are neighboring leaves, then we could
construct their parent node k and compute the
distances of k to all other leaves m. We then
remove nodes i,j and add k.
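
A small numeric check of this step, assuming the three-point relation d(k,m) = (d(i,m) + d(j,m) - d(i,j)) / 2 for additive distances. The tree and its edge lengths are hypothetical, chosen only so the result can be verified by hand.

```python
def distance_to_parent(d_im: float, d_jm: float, d_ij: float) -> float:
    """d(k, m), where k is the parent of the neighboring leaves i and j."""
    return 0.5 * (d_im + d_jm - d_ij)


# Hypothetical tree: edge i-k has length 2, j-k has length 3, and the
# path from k to leaf m has length 4. The leaf-to-leaf distances are then:
d_ij, d_im, d_jm = 5.0, 6.0, 7.0

d_km = distance_to_parent(d_im, d_jm, d_ij)
d_ik = d_im - d_km  # remaining part of the path i -> k -> m
d_jk = d_jm - d_km  # remaining part of the path j -> k -> m
print(d_km, d_ik, d_jk)  # 4.0 2.0 3.0
```

Only leaf-to-leaf distances go in, yet the internal edge lengths come out, which is what makes the "remove i,j and add k" step possible.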
31
Neighbor Finding
  • How can we find, from distances alone, that a
    pair of nodes i,j are neighboring leaves?
  • Closest nodes aren't necessarily neighbors.

Next we show one way to find neighbors from
additive distances.
32
Neighbor Finding
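Theorem (Saitou & Nei): Assume all edge weights are
positive. If D(i,j) is minimal (among all pairs
of leaves), then i and j are neighboring leaves
in the tree. Here $D(i,j) = (L-2)\,d(i,j) - (r_i + r_j)$,
where L is the number of leaves and
$r_i = \sum_k d(i,k)$ is the sum of the distances
from i to all other leaves.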
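
The following sketch illustrates both points on a hypothetical four-leaf tree: the closest pair of leaves is not a pair of neighbors, while the pair minimizing D is. It assumes D(i,j) = (L-2) d(i,j) - (r_i + r_j) with r_i the sum of distances from i to all other leaves, which is the form the proof on the following slides manipulates. The tree and its edge lengths are made up for illustration.

```python
from itertools import combinations

# Additive distances from a hypothetical tree in which A and B hang off
# one internal node (edges 1 and 4), C and D hang off another (edges 1
# and 4), and the internal edge has length 1.
leaves = ["A", "B", "C", "D"]
d = {
    frozenset("AB"): 5, frozenset("AC"): 3, frozenset("AD"): 6,
    frozenset("BC"): 6, frozenset("BD"): 9, frozenset("CD"): 5,
}


def r(i):
    """Sum of distances from leaf i to all other leaves."""
    return sum(d[frozenset((i, k))] for k in leaves if k != i)


def D(i, j):
    """Neighbor-finding criterion (assumed form, see lead-in)."""
    return (len(leaves) - 2) * d[frozenset((i, j))] - r(i) - r(j)


pairs = list(combinations(leaves, 2))
closest = min(pairs, key=lambda p: d[frozenset(p)])
best = min(pairs, key=lambda p: D(*p))
print("closest pair:", closest)   # A and C, which are not neighbors
print("minimal D:", best, D(*best))
```

Here d(A,C) = 3 is the smallest distance although A and C lie on opposite sides of the internal edge. D takes the value -24 on the two neighbor pairs (A,B) and (C,D) and -22 on every other pair.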
33
Neighbor Joining Algorithm
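  • Set L to contain all leaves
  • Iteration:
  • Choose i,j such that D(i,j) is minimal
  • Create new node k, and set, for each remaining
    node m in L,
    $d(k,m) = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$
  • Remove i,j from L, and add k
  • Terminate: when |L| = 2, connect the two
    remaining nodes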
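
A minimal sketch of the iteration, under stated assumptions: D(i,j) = (n-2) d(i,j) - (r_i + r_j) with r_i the sum of distances from i to the other current nodes, the update d(k,m) = (d(i,m) + d(j,m) - d(i,j)) / 2, and the usual neighbor-joining edge length d(i,k) = d(i,j)/2 + (r_i - r_j) / (2(n-2)), which the slide itself does not state. The node names k0, k1, ... and the input layout are hypothetical.

```python
from itertools import combinations


def neighbor_joining(leaves, dist):
    """Return the tree as a list of (node, node, edge_length) triples.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    d = dict(dist)
    nodes = list(leaves)
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[frozenset((i, m))] for m in nodes if m != i)
             for i in nodes}
        # Choose the pair minimizing D(i, j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        k = f"k{new_id}"
        new_id += 1
        d_ij = d[frozenset((i, j))]
        d_ik = 0.5 * d_ij + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, k, d_ik))
        edges.append((j, k, d_ij - d_ik))
        # Distances from the new node k to every remaining node m.
        for m in nodes:
            if m != i and m != j:
                d[frozenset((k, m))] = 0.5 * (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij
                )
        nodes = [m for m in nodes if m != i and m != j] + [k]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Additive distances from a hypothetical tree: A (edge 1) and B (edge 4)
# share a parent, C (edge 1) and D (edge 4) share a parent, and the two
# parents are joined by an edge of length 1.
example = {
    frozenset("AB"): 5, frozenset("AC"): 3, frozenset("AD"): 6,
    frozenset("BC"): 6, frozenset("BD"): 9, frozenset("CD"): 5,
}
edges = neighbor_joining(["A", "B", "C", "D"], example)
for edge in edges:
    print(edge)
```

On this input the sketch recovers the five edge lengths of the generating tree, even though the closest pair of leaves (A and C) are not neighbors.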

34
Neighbor Finding
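Notations used in the proof:
p(i,j) = the path from vertex i to vertex j.
Example: p(D,C) = (e1,e2,e3) = (D,E,F,C).
For a vertex i and an edge e:
$N_i(e) = |\{k : e \text{ is on } p(i,k)\}|$.
Example: $N_D(e_1) = 3$, $N_D(e_2) = 2$,
$N_D(e_3) = 1$, $N_C(e_1) = 1$.
[Figure: example tree containing the path D, E, F, C.]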
35
Neighbor Finding
Notation: For an edge e = (i,m), we denote d(i,m) by d(e).
[Figure: tree T with leaves i and j, internal nodes
l and k, and the rest of T.]
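
With this notation, the row sums of the distance matrix can be written edge by edge. The identity below is presumably the lemma that the proof cites; that is an assumption, since the lemma's statement did not survive in the transcript. It takes $r_i = \sum_k d(i,k)$, summed over the leaves k, and counts leaves in $N_i(e)$.

```latex
\[
  r_i \;=\; \sum_{k} d(i,k)
      \;=\; \sum_{k} \sum_{e \in p(i,k)} d(e)
      \;=\; \sum_{e} N_i(e)\, d(e)
\]
```

The last step swaps the order of summation: each edge e contributes d(e) once for every leaf k whose path from i crosses e, and there are $N_i(e)$ such leaves.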
36
Neighbor Finding
Proof of Theorem: Assume by contradiction that
D(i,j) is minimal for i,j which are not
neighboring leaves. Let (i,l,...,k,j) be the path
from i to j. Let T1 and T2 be the subtrees
rooted at l and k. Let |T| denote the number
of leaves in T.
37
Neighbor Finding
Case 1: i or j has a neighboring leaf. WLOG j and
m are such leaves.
A. $D(i,j) - D(m,j) = (L-2)\bigl(d(i,j) - d(j,m)\bigr) - (r_i + r_j) + (r_m + r_j)$ [Definition]
   $= (L-2)\bigl(d(i,k) - d(k,m)\bigr) + r_m - r_i$ [Figure]
B. $r_m - r_i \ge (L-2)\bigl(d(k,m) - d(i,l)\bigr) + (4-L)\,d(k,l)$ [Lemma + Figure]
   (since for each edge $e \in p(k,l)$, $N_m(e) \ge 2$
   and $N_i(e) \le L-2$, so $N_m(e) - N_i(e) \ge 4-L$)
Substituting B in A:
$D(i,j) - D(m,j) \ge (L-2)\bigl(d(i,k) - d(i,l)\bigr) + (4-L)\,d(k,l) = 2\,d(k,l) > 0$,
contradicting the minimality assumption.
38
Neighbor Finding
Case 2: Not case 1. Then both T1 and T2 contain 2
neighboring leaves. We show that if D(i,j) is
minimal, then we must have both |T1| > |T2| and
|T2| > |T1|, which is a contradiction; hence
D(i,j) is not minimal.
We prove that |T1| > |T2| by assuming that |T1| ≤
|T2| and reaching a contradiction. The proof
that |T2| > |T1| is similar. Let n,m be
neighboring leaves in T1.
39
Neighbor Finding
A. $0 \le D(m,n) - D(i,j) = (L-2)\bigl(d(m,n) - d(i,j)\bigr) + (r_i + r_j) - (r_m + r_n)$
B. $r_j - r_m < (L-2)\bigl(d(j,k) - d(m,p)\bigr) + (|T_1| - |T_2|)\,d(k,p)$
   (because $N_j(e) - N_m(e) < |T_1| - |T_2|$).
C. $r_i - r_n < (L-2)\bigl(d(i,k) - d(n,p)\bigr) + (|T_1| - |T_2|)\,d(l,p)$
Adding B and C, noting that $d(l,p) > d(k,p)$ and
using the assumption $|T_1| - |T_2| \le 0$:
D. $(r_i + r_j) - (r_m + r_n) < (L-2)\bigl(d(i,j) - d(n,m)\bigr) + 2\,(|T_1| - |T_2|)\,d(k,p)$
Substituting D in the right-hand side of A:
$0 \le D(m,n) - D(i,j) < 2\,(|T_1| - |T_2|)\,d(k,p)$, hence
$|T_1| - |T_2| > 0$, a contradiction.