Class 9: Phylogenetic Trees - PowerPoint PPT Presentation

Slides: 31
Provided by: NirFri

Transcript and Presenter's Notes



1
Class 9 Phylogenetic Trees
2
The Tree of Life
[Figure: the tree of life, after Ernst Haeckel, 1891]
3
Evolution
  • Many theories of evolution
  • Basic idea:
    • Speciation events lead to the creation of
      different species
    • Speciation is caused by physical separation into
      groups in which different genetic variants become
      dominant
    • Any two species share a (possibly distant) common
      ancestor

4
Phylogenies
  • A phylogeny is a tree that describes the sequence
    of speciation events that led to the formation of
    a set of present-day species
  • Leaves - present-day species
  • Internal nodes - hypothetical most recent common
    ancestors
  • Edge lengths - time from one speciation event to
    the next

[Figure: example phylogeny with leaves Aardvark, Bison, Chimp, Dog, and Elephant]
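A minimal sketch of how such a tree might be stored, assuming a simple Python `Node` class and a `to_newick` helper (both hypothetical names, not from the slides) in which each node records the length of the edge to its parent. The topology and edge lengths below are made up for illustration; they are not the tree in the slide figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""                    # species name for leaves; empty for ancestors
    length: float = 0.0               # length of the edge to the parent node
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> list[str]:
        """Present-day species below this node, left to right."""
        if self.is_leaf():
            return [self.name]
        return [name for child in self.children for name in child.leaves()]


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialize the tree in Newick format, with edge lengths after ':'."""
    if node.is_leaf():
        label = node.name
    else:
        label = "(" + ",".join(to_newick(c, False) for c in node.children) + ")"
    return label + ";" if is_root else f"{label}:{node.length:g}"


# Hypothetical topology and edge lengths (not the tree from the slide figure).
root = Node(children=[
    Node(length=2, children=[Node("Aardvark", 3), Node("Bison", 3)]),
    Node(length=3, children=[
        Node("Chimp", 2),
        Node(length=1, children=[Node("Dog", 1), Node("Elephant", 1)]),
    ]),
])
print(root.leaves())
print(to_newick(root))  # ((Aardvark:3,Bison:3):2,(Chimp:2,(Dog:1,Elephant:1):1):3);
```

Newick is a common plain-text format for trees; this small writer does not quote names containing special characters.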
5
Primate evolution
6
  • Until the mid-1950s, phylogenies were constructed
    by experts based on their opinion (subjective
    criteria)
  • The Linnaean classification scheme implicitly
    assumes a tree structure
  • Since then, the focus has been on objective
    criteria for constructing phylogenetic trees
  • Thousands of articles in the last decades
  • Important for many aspects of biology:
    • Classification (systematics)
    • Understanding biological mechanisms

7
Morphological vs. Molecular
  • Classical phylogenetic analysis uses morphological
    features
    • Number of legs, lengths of legs, etc.
  • Modern biological methods allow the use of
    molecular features
    • Gene sequences
    • Protein sequences
  • Analysis is based on homologous sequences (e.g.,
    globins) in different species

8
Dangers in Molecular Phylogenies
  • We have to remember that gene/protein sequences
    can be homologous for different reasons:
    • Orthologs: sequences that diverged after a
      speciation event
    • Paralogs: sequences that diverged after a
      duplication event
    • Xenologs: sequences that diverged after a
      horizontal transfer (e.g., by a virus)

9
Dangers of Paralogs
[Figure: gene tree with a gene duplication and subsequent speciation events, producing genes 1A, 2A, 3A and 1B, 2B, 3B]
10
Dangers of Paralogs
  • If we only consider 1A, 2B, and 3A...

[Figure: the same gene tree (gene duplication, speciation events; genes 1A, 2A, 3A, 1B, 2B, 3B)]
11
Types of Trees
  • A natural model to consider is that of rooted
    trees

[Figure: rooted tree, with the common ancestor at the root]
12
Types of Trees
  • Depending on the model, data from present-day
    species may not distinguish between different
    placements of the root

[Figure: two rooted trees compared ("vs") that differ in the placement of the root]
13
Types of Trees
  • An unrooted tree represents the same phylogeny
    without the root node

14
Positioning Roots in Unrooted Trees
  • We can estimate the position of the root by
    introducing an outgroup:
    • A set of species that are known to be distant
      from all the species of interest

[Figure: tree of Aardvark, Bison, Chimp, Dog, and Elephant with Falcon as the outgroup, showing the proposed root]
15
Types of Data
  • Distance-based
    • Input is a matrix of distances between species
    • Can be the fraction of residues on which they
      disagree, their alignment score, etc.
  • Character-based
    • Examine each character (e.g., residue) separately
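As a sketch of the first distance option (the fraction of residues on which two species disagree), assuming the sequences are already aligned to equal length. The helper names `p_distance` and `distance_matrix` are hypothetical; the five sequences are the ones used in the parsimony examples later in this deck.

```python
def p_distance(x: str, y: str) -> float:
    """Fraction of aligned positions at which two sequences disagree."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(x, y) if a != b)
    return diffs / len(x)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise p-distances, keyed by (name, name)."""
    names = list(seqs)
    return {(p, q): p_distance(seqs[p], seqs[q]) for p in names for q in names}


seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
d = distance_matrix(seqs)
for p in seqs:
    print(p, " ".join(f"{d[p, q]:.2f}" for q in seqs))
```

This ignores gaps and multiple substitutions at the same site; real pipelines often apply a correction on top of the raw fraction.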

16
Simple Distance-Based Method
  • Input: distance matrix between species
  • Outline:
    • Cluster species together
    • Initially, clusters are singletons
    • At each iteration, combine the two closest
      clusters to get a new one

17
UPGMA Clustering
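  • Let $C_i$ and $C_j$ be clusters; define the
    distance between them to be
    $d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$
  • When we combine two clusters, $C_i$ and $C_j$, to
    form a new cluster $C_k = C_i \cup C_j$, then for
    any other cluster $C_l$
    $d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$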
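A minimal UPGMA sketch in Python. The distance from a merged cluster to any other cluster is the size-weighted average of the two old distances, which is the same as averaging over all leaf pairs. Two details are assumptions not spelled out in this text: each new node is placed at height d(C_i, C_j)/2, and edge lengths are differences of node heights. The function name and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA on a dict-of-dicts distance matrix; returns a Newick string."""
    # cluster id -> (Newick label, number of leaves, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[names[i]][names[j]] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # the two closest clusters
        (label_i, n_i, h_i), (label_j, n_j, h_j) = clusters.pop(i), clusters.pop(j)
        h = d.pop((i, j)) / 2     # assumed: new node sits at half the distance
        k, next_id = next_id, next_id + 1
        for m in clusters:
            # size-weighted average of the two old distances
            d_im = d.pop((min(i, m), max(i, m)))
            d_jm = d.pop((min(j, m), max(j, m)))
            d[m, k] = (n_i * d_im + n_j * d_jm) / (n_i + n_j)
        clusters[k] = (f"({label_i}:{h - h_i:g},{label_j}:{h - h_j:g})", n_i + n_j, h)
    label, _, _ = next(iter(clusters.values()))
    return label + ";"


names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
dist = {p: {} for p in names}
for (p, q), value in pairs.items():
    dist[p][q] = dist[q][p] = value
print(upgma(names, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

The toy distances are ultrametric, so every leaf ends up at the same depth (3) below the root; with non-clock data the same code still runs but the tree can be wrong.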

18
Molecular Clock
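  • UPGMA implicitly assumes that all distances
    measure time in the same way (a molecular clock:
    the same rate of change along every lineage)

[Figure: example trees over leaves 1, 2, 3, and 4]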
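One way to make the molecular-clock assumption testable, offered as a sketch: if all lineages evolve at the same rate, the distance between two leaves is twice the height of their most recent common ancestor, so for every three leaves the two largest pairwise distances must be equal (the ultrametric, or three-point, condition). The function names and the tolerance are illustrative choices.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a dict-of-dicts distance matrix from {(p, q): distance}."""
    dist = {}
    for (p, q), value in pairs.items():
        dist.setdefault(p, {})[q] = value
        dist.setdefault(q, {})[p] = value
    return dist


def is_ultrametric(dist, tol=1e-9):
    """Check the three-point condition on every triple of leaves."""
    for x, y, z in combinations(dist, 3):
        _, second, largest = sorted((dist[x][y], dist[x][z], dist[y][z]))
        if largest - second > tol:  # the two largest distances must coincide
            return False
    return True


clock = symmetric({("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4})
no_clock = symmetric({("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 5})
print(is_ultrametric(clock), is_ultrametric(no_clock))  # True False
```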
19
Additivity
  • A weaker requirement is additivity
  • In a real tree, the distance between two species
    is the sum of the edge lengths along the path
    between them

[Figure: leaves i, j, k joined at a common internal node by edges of length a, b, c]
20
Consequences of Additivity
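  • Suppose input distances are additive
  • For any three leaves i, j, k, let m be the node
    where their paths meet, and let a, b, c be the
    path lengths from m to i, j, k:
    $d(i,j) = a + b, \quad d(i,k) = a + c, \quad d(j,k) = b + c$
  • Thus
    $d(k,m) = c = \frac{d(i,k) + d(j,k) - d(i,j)}{2}$

[Figure: leaves i, j, k joined at internal node m by edges of length a, b, c]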
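A small worked instance, assuming made-up additive distances d(i,j) = 5, d(i,k) = 7, d(j,k) = 8. Because d(i,j) = a + b, d(i,k) = a + c, and d(j,k) = b + c, each path length to the meeting node m can be read off from the three distances, e.g. c = (d(i,k) + d(j,k) - d(i,j))/2. The function name is hypothetical.

```python
def star_edge_lengths(d_ij, d_ik, d_jk):
    """Path lengths a, b, c from leaves i, j, k to their meeting node m,
    assuming the three distances are additive (d_ij = a + b, etc.)."""
    a = (d_ij + d_ik - d_jk) / 2
    b = (d_ij + d_jk - d_ik) / 2
    c = (d_ik + d_jk - d_ij) / 2
    return a, b, c


print(star_edge_lengths(5, 7, 8))  # (2.0, 3.0, 5.0)
```

Adding the results back (2 + 3 = 5, 2 + 5 = 7, 3 + 5 = 8) reproduces the input distances.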
21
Neighbor Joining
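  • Can we use this fact to construct trees?
  • Let
    $D(i,j) = d(i,j) - (r_i + r_j)$
  • where
    $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d(i,k)$
    and L is the current set of leaves
  • Theorem: if D(i,j) is minimal (among all pairs of
    leaves), then i and j are neighbors in the tree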

22
Neighbor Joining
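  • Set L to contain all leaves
  • Iteration:
    • Choose i, j such that D(i,j) is minimal
    • Create a new node k, and set
      $d(k,m) = \frac{d(i,m) + d(j,m) - d(i,j)}{2}$ for all other m in L
      $d(i,k) = \frac{d(i,j) + r_i - r_j}{2}, \quad d(j,k) = d(i,j) - d(i,k)$
    • Remove i, j from L, and add k
  • Terminate when |L| = 2: connect the two remaining
    nodes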
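A compact sketch of this iteration in Python, using $r_i = \frac{1}{|L|-2}\sum_{k \in L} d(i,k)$, $D(i,j) = d(i,j) - (r_i + r_j)$, and the usual textbook neighbor-joining updates $d(k,m) = (d(i,m) + d(j,m) - d(i,j))/2$ and $d(i,k) = (d(i,j) + r_i - r_j)/2$. Internal nodes get hypothetical names n1, n2, ...; ties in D are broken by taking the first pair in iteration order. The distances in the usage example come from a made-up additive tree.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a dict-of-dicts distance matrix from {(p, q): distance}."""
    dist = {}
    for (p, q), value in pairs.items():
        dist.setdefault(p, {})[q] = value
        dist.setdefault(q, {})[p] = value
    return dist


def neighbor_joining(dist):
    """Return the unrooted tree as a list of (node, node, edge_length) triples."""
    d = {p: dict(row) for p, row in dist.items()}  # working copy; grows with new nodes
    L = list(d)
    edges = []
    counter = 0
    while len(L) > 2:
        n = len(L)
        r = {p: sum(d[p][q] for q in L if q != p) / (n - 2) for p in L}
        # Pick the pair minimizing D(i,j) = d(i,j) - (r_i + r_j); ties go to the first pair.
        i, j = min(combinations(L, 2), key=lambda pq: d[pq[0]][pq[1]] - r[pq[0]] - r[pq[1]])
        counter += 1
        k = f"n{counter}"  # hypothetical names for new internal nodes
        d_ik = (d[i][j] + r[i] - r[j]) / 2
        edges += [(i, k, d_ik), (j, k, d[i][j] - d_ik)]
        d[k] = {}
        for m in L:
            if m not in (i, j):
                d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        L = [m for m in L if m not in (i, j)] + [k]
    a, b = L
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


# Made-up additive distances from the tree A-u:1, B-u:2, u-v:5, v-C:3, v-D:4.
dist = symmetric({("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
                  ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7})
for edge in neighbor_joining(dist):
    print(edge)
```

On these additive distances the sketch recovers leaf edges of length 1, 2, 3, and 4 and an internal edge of length 5, matching the tree the distances were built from.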

23
Distance Based Methods
  • If we make strong assumptions about the distances,
    we can reconstruct trees
  • In real life, distances are usually not additive
  • Sometimes they are close to additive

24
Parsimony
  • Character-based method
  • Assumptions:
    • Independence of characters (no interactions)
    • The best tree is the one in which the fewest
      changes take place

25
Simple Example
  • Suppose we have five species, such that three
    have C and two have T at a specified position
  • A minimal tree has one evolutionary change

[Figure: tree with leaves labeled C, T, C, T, C and ancestral labels C, C, T, showing a single T → C change]
26
Another Example
  • What is the parsimony score of a tree whose leaves
    are labeled with these sequences?

A  CAGGTA
B  CAGACA
C  CGGGTA
D  TGCACT
E  TGCGTA
27
Evaluating Parsimony Scores
  • How do we compute the parsimony score for a given
    tree?
  • Weighted parsimony:
    • Each change from a to b is weighted by the cost
      c(a,b)

28
Evaluating Parsimony Scores
  • Dynamic programming on the tree
  • Initialization:
    • For each leaf i, set $S(i,a) = 0$ if i is
      labeled by a; otherwise $S(i,a) = \infty$
  • Iteration:
    • If k is a node with children i and j, then
      $S(k,a) = \min_b \left( S(i,b) + c(a,b) \right) + \min_b \left( S(j,b) + c(a,b) \right)$
  • Termination:
    • The cost of the tree is $\min_a S(r,a)$, where r
      is the root
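A direct transcription of this dynamic program, offered as a sketch. Assumptions not stated on the slide: unit costs c(a,b) = 0 if a = b and 1 otherwise, a DNA alphabet, trees written as nested pairs, and characters scored independently and summed. The tree ((A,B),(C,(D,E))) is a hypothetical topology chosen for illustration, since the tree from the example slides is not recoverable here; the sequences are the ones from those slides.

```python
import math

ALPHABET = "ACGT"


def unit_cost(a, b):
    """Assumed cost c(a, b): 0 for no change, 1 for any substitution."""
    return 0 if a == b else 1


def sankoff(tree, column, cost=unit_cost):
    """Return {a: S(node, a)} for the subtree `tree` at one character.

    `tree` is either a leaf name or a pair (left_subtree, right_subtree);
    `column` maps each leaf name to its residue at this position.
    """
    if isinstance(tree, str):  # leaf: 0 for the observed residue, infinity otherwise
        return {a: 0 if column[tree] == a else math.inf for a in ALPHABET}
    left, right = (sankoff(child, column, cost) for child in tree)
    return {
        a: min(left[b] + cost(a, b) for b in ALPHABET)
        + min(right[b] + cost(a, b) for b in ALPHABET)
        for a in ALPHABET
    }


def parsimony_score(tree, seqs, cost=unit_cost):
    """Sum over characters of min_a S(root, a)."""
    length = len(next(iter(seqs.values())))
    total = 0
    for pos in range(length):
        column = {name: s[pos] for name, s in seqs.items()}
        total += min(sankoff(tree, column, cost).values())
    return total


seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
tree = (("A", "B"), ("C", ("D", "E")))  # hypothetical topology
print(parsimony_score(tree, seqs))  # 8
```

Passing a different `cost` function gives weighted parsimony, for example a cheaper cost for transitions than for transversions.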

29
Example
A  CAGGTA
B  CAGACA
C  CGGGTA
D  TGCACT
E  TGCGTA
30
Cost of Evaluating Parsimony
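  • If there are n nodes, m characters, and k
    possible values for each character, then the
    complexity is $O(nmk^2)$: for each node,
    character, and value a, the minimum over b takes
    k steps
  • By recording which b attains each minimum, we can
    reconstruct the most parsimonious values at each
    ancestral node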