1
Phylogenetic trees
  • Jana Sperschneider
  • Institute for Computer Science
  • Albert-Ludwigs-Universität, Freiburg

2
Review from last talk
  • All organisms share a common ancestor
  • Phylogenetic tree displays evolutionary distances
    between objects
  • Molecular phylogenetic analysis
  • Choose homologous sequences
  • Build distance matrix
  • UPGMA or Neighbor-Joining algorithm

3
Contents
  • 1. Properties of phylogenetic trees
  • Ultrametric trees
  • Additive trees
  • 2. Distance based methods
  • Least-squares method
  • UPGMA
  • Neighbor-Joining
  • Maximum Parsimony methods
  • Maximum Likelihood methods
  • 3. Tree construction using partial distance
    matrices
  • 4. Summary

4
Metric spaces
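  • A metric space is a set of objects X such that to
    every pair i, j in X we associate a nonnegative
    real number d(i,j) with the following properties:
    d(i,j) = 0 if and only if i = j
    d(i,j) = d(j,i)
    d(i,j) ≤ d(i,k) + d(k,j) (triangle inequality)
  • An ultrametric space additionally satisfies the
    ultrametric inequality
    d(i,j) ≤ max{d(i,k), d(k,j)}
  • An additive space additionally satisfies the
    additive inequality
    d(i,j) + d(k,l) ≤ max{d(i,k) + d(j,l), d(i,l) + d(j,k)}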

5
Ultrametric trees
  • An ultrametric tree is characterized by the
    3-point condition
  • For all objects i, j, k, two of the distances
    d(i,j), d(i,k), d(j,k) are equal and the third
    one is smaller or equal.
  • [Figure: the three possible rooted trees (1., 2., 3.) on i, j, k]
  • An ultrametric tree assumes a constant molecular
    clock.

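
The 3-point condition can be checked directly on a distance matrix. The following is a minimal sketch, assuming the matrix is a symmetric list of lists; the function name `is_ultrametric` and the tolerance `tol` are illustrative choices, not part of the talk.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Check the 3-point condition on a symmetric distance matrix d."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        # The two largest distances must be equal; the third may be smaller.
        if largest - middle > tol:
            return False
    return True

clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
not_clock_like = [[0, 2, 6],
                  [2, 0, 5],
                  [6, 5, 0]]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```
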
6
Additive trees
  • An additive tree is characterized by the 4-point
    condition
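  • Given any four objects of X we can label them i,
    j, k, l such that
    d(i,j) + d(k,l) ≤ d(i,k) + d(j,l) = d(i,l) + d(j,k)
  • Two of the sums are equal and the third sum is
    smaller than or equal to the first two.
  • [Figure: unrooted tree on the four leaves i, j, k, l]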
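
The 4-point condition can be tested in the same way as the 3-point condition. This is a sketch under the assumption that the matrix is symmetric and given as a list of lists; `is_additive` and the example matrices are made up for illustration. The example distances come from a four-leaf tree with leaf branches 1, 2, 3, 4 and an internal branch of length 5.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Check the 4-point condition on a symmetric distance matrix d."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        # The two largest sums must be equal; the third may be smaller.
        if sums[2] - sums[1] > tol:
            return False
    return True

tree_like = [[0, 3, 9, 10],
             [3, 0, 10, 11],
             [9, 10, 0, 7],
             [10, 11, 7, 0]]
perturbed = [[0, 3, 9, 10],
             [3, 0, 10, 12],
             [9, 10, 0, 7],
             [10, 12, 7, 0]]

print(is_additive(tree_like))  # True
print(is_additive(perturbed))  # False
```
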

8
Distance based methods
  • In reality, input data is usually neither
    additive nor ultrametric.
  • Find the tree T that minimizes the squared error
    between the distances d(i,j) from the input matrix
    and the distances d_T(i,j) in the tree
  • This optimization problem is NP-hard!
  • Least-squares criterion:
    SSQ(T) = Σ_{i<j} (d(i,j) − d_T(i,j))²
  • Weighted least-squares criterion:
    SSQ(T) = Σ_{i<j} w(i,j) (d(i,j) − d_T(i,j))²
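
Both criteria compare the input matrix with the path lengths measured in a candidate tree. A minimal sketch follows, assuming the tree distances have already been collected in a matrix `t` of the same shape as the input matrix `d`; the function name `ssq` and the example numbers are hypothetical.

```python
def ssq(d, t, w=None):
    """Sum of squared differences between input distances d and tree distances t.

    If a weight matrix w is given, each squared difference is multiplied
    by w[i][j]; otherwise all weights are 1.
    """
    n = len(d)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            weight = 1.0 if w is None else w[i][j]
            total += weight * (d[i][j] - t[i][j]) ** 2
    return total

d = [[0, 3, 9], [3, 0, 10], [9, 10, 0]]   # observed distances
t = [[0, 3, 8], [3, 0, 10], [8, 10, 0]]   # distances in a candidate tree
# Example weighting that gives greater distances less weight.
w = [[0 if i == j else 1 / d[i][j] ** 2 for j in range(3)] for i in range(3)]

print(ssq(d, t))     # 1.0
print(ssq(d, t, w))  # weighted value, smaller than the unweighted one
```
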

9
Distance based methods
  • Polynomial heuristics
  • UPGMA
  • + very fast
  • - always produces an ultrametric, rooted tree
  • Neighbor-Joining
  • + fast
  • + works well in practice
  • + does not assume a molecular clock
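
To make the UPGMA heuristic concrete, here is a short sketch of the usual average-linkage procedure: repeatedly merge the two closest clusters and replace their distances by the size-weighted average. The nested-tuple tree representation and the function name `upgma` are assumptions made for this example only.

```python
def upgma(names, d):
    """Return (tree, root_height); tree is a nested tuple of leaf names."""
    n = len(names)
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): float(d[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    height = 0.0
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)          # closest pair of clusters
        height = dist[(a, b)] / 2.0             # height of the new node
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_dist = {}
        for c in clusters:
            dac = dist[(min(a, c), max(a, c))]
            dbc = dist[(min(b, c), max(b, c))]
            # Size-weighted average distance to the merged cluster.
            new_dist[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        # Drop all distances that involve the two merged clusters.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height

names = ["A", "B", "C", "D"]
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(names, d))  # (('D', ('C', ('A', 'B'))), 5.0)
```

Because every merge places the new node at half the cluster distance, the result is always rooted and ultrametric, whether or not the input data is.
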

10
Maximum Parsimony
  • Choose tree that explains data using the minimal
    number of substitutions.
  • Two computational subproblems
  • 1. Find the parsimony cost of a given tree
    (easy).
  • 2. Search through all the tree topologies
    (NP-hard).
  • Number of possible unrooted binary trees for n
    objects: (2n − 5)!! = 3 · 5 · … · (2n − 5)
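
The size of the search space grows very quickly. As a small illustration, the sketch below counts unrooted binary tree topologies on n labelled leaves using the standard double-factorial formula (2n − 5)!!; the function name is hypothetical.

```python
def count_unrooted_trees(n):
    """Number of unrooted binary tree topologies on n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count

print(count_unrooted_trees(4))   # 3
print(count_unrooted_trees(12))  # 654729075
```
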

11
Maximum Parsimony
  • Example: sequences for 4 objects, 3 possible
    unrooted trees
  • 1. AC
  • 2. TC
  • 3. TG
  • 4. TG
  • Output: most parsimonious tree (Tree 1)
  • Tree 1 groups (AC, TC) and (TG, TG): 2 mutations
  • Trees 2 and 3 group (AC, TG) and (TC, TG): 3
    mutations each

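
The parsimony cost of a given tree (the easy subproblem) can be computed with Fitch's small-parsimony algorithm; the talk does not name the algorithm, so this choice is an assumption. The sketch roots each unrooted tree at an arbitrary internal branch, which does not change the cost, and the numbering of Trees 2 and 3 is also assumed.

```python
def parsimony_cost(tree, seqs):
    """Minimum number of substitutions on a tree, summed over all sites."""
    def site_cost(node, site):
        if isinstance(node, str):                  # leaf: its own character
            return {seqs[node][site]}, 0
        left_set, left_cost = site_cost(node[0], site)
        right_set, right_cost = site_cost(node[1], site)
        common = left_set & right_set
        if common:                                 # children agree: no change
            return common, left_cost + right_cost
        # Children disagree: one substitution is needed at this node.
        return left_set | right_set, left_cost + right_cost + 1

    length = len(next(iter(seqs.values())))
    return sum(site_cost(tree, s)[1] for s in range(length))

seqs = {"1": "AC", "2": "TC", "3": "TG", "4": "TG"}
trees = {
    "Tree 1": (("1", "2"), ("3", "4")),
    "Tree 2": (("1", "3"), ("2", "4")),
    "Tree 3": (("1", "4"), ("2", "3")),
}
for name, tree in trees.items():
    print(name, parsimony_cost(tree, seqs))
# Tree 1 2
# Tree 2 3
# Tree 3 3
```
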
12
Maximum Parsimony
  • Exhaustive search through all tree topologies is
    not feasible for more than about 12 objects
  • Heuristic methods
  • + useful for small sequences that are quite
    similar
  • - results often not optimal
  • Branch and Bound
  • + does not have to evaluate every possible tree
  • - method is limited

13
Maximum Likelihood
  • Search through all trees to find the one with
    highest probability
  • Finding the maximum-likelihood tree is NP-hard
  • For a given tree, we can calculate its
    likelihood score
  • Markov model
  • Dynamic programming
  • Very time-consuming

14
Which method to use?
  • Distance based
  • fast
  • Maximum Parsimony
  • Strong sequence similarity
  • Maximum Likelihood
  • Very slow
  • Use only for small number of sequences
  • Most packages (PHYLIP, TRex) provide programs for
    all three methods.

16
Incomplete distance matrix
  • How do we construct a phylogenetic tree from an
    incomplete distance matrix?
  • 1. Estimate missing cells
  • Ultrametric procedure (De Soete, 1984)
  • Additive procedure (Landry, 1996)
  • 2. Constructive tree algorithms
  • Triangle (Guénoche, 2001)
  • Fitch (Felsenstein, 1997)
  • Method of weights (MW) (Makarenkov, 2004)

17
1. Estimate missing cells
  • Ultrametric procedure
  • Use ultrametric inequality
  • Given a missing entry d(i,j), look for an index
    k with known entries d(i,k) and d(j,k).
  • Then d(i,j) is set to the greater of d(i,k) and
    d(j,k) if and only if they are different.
  • does not always return a complete distance
    matrix
  • time complexity O(n³)
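
The ultrametric procedure can be sketched as a single pass over the missing cells. This sketch assumes missing entries are stored as `None` and that only originally known entries are used as evidence (the slide does not say whether newly estimated values are reused); the function name is hypothetical.

```python
def fill_ultrametric(d):
    """Estimate missing entries (None) with the ultrametric inequality.

    Returns a new matrix; cells that cannot be estimated stay None.
    """
    n = len(d)
    out = [row[:] for row in d]
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] is not None:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                dik, djk = d[i][k], d[j][k]
                # Usable only if both are known and they differ.
                if dik is not None and djk is not None and dik != djk:
                    out[i][j] = out[j][i] = max(dik, djk)
                    break
    return out

resolved = fill_ultrametric([[0, 2, None], [2, 0, 6], [None, 6, 0]])
unresolved = fill_ultrametric([[0, 4, None], [4, 0, 4], [None, 4, 0]])
print(resolved[0][2])    # 6
print(unresolved[0][2])  # None
```

The second call shows why the procedure does not always return a complete matrix: when the two known distances are equal, the inequality gives only an upper bound.
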

18
1. Estimate missing cells
  • Additive procedure
  • Use additive inequality
  • Given a missing entry d(i,j), look for indices
    k,l such that d(i,k), d(j,k), d(i,l), d(j,l), and
    d(k,l) are known entries.
  • The entry d(i,j) is set to the greater of the two
    sums d(i,k) + d(j,l) and d(i,l) + d(j,k), minus
    d(k,l), if and only if the two sums are different.
  • does not always return a complete distance
    matrix
  • time complexity O(n⁴)
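
The additive procedure follows the same pattern with pairs of reference objects k, l. The sketch below assumes missing entries are `None` and uses only originally known entries; the function name and the example matrix (distances from a four-leaf tree with leaf branches 1, 2, 3, 4 and internal branch 5) are made up for illustration.

```python
from itertools import combinations

def fill_additive(d, tol=1e-9):
    """Estimate missing entries (None) with the four-point condition."""
    n = len(d)
    out = [row[:] for row in d]
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] is not None:
                continue
            others = [k for k in range(n) if k not in (i, j)]
            for k, l in combinations(others, 2):
                needed = (d[i][k], d[j][k], d[i][l], d[j][l], d[k][l])
                if any(v is None for v in needed):
                    continue
                s1 = d[i][k] + d[j][l]
                s2 = d[i][l] + d[j][k]
                if abs(s1 - s2) > tol:   # sums differ: d(i,j) is determined
                    out[i][j] = out[j][i] = max(s1, s2) - d[k][l]
                    break
    return out

full = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]

missing_02 = [row[:] for row in full]
missing_02[0][2] = missing_02[2][0] = None
print(fill_additive(missing_02)[0][2])  # 9

missing_01 = [row[:] for row in full]
missing_01[0][1] = missing_01[1][0] = None
print(fill_additive(missing_01)[0][1])  # None
```

In the second case the two sums are equal, so the entry between the two neighbouring leaves cannot be recovered from this quadruple.
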

19
2. Constructive tree algorithms
  • PHYLIP package contains Fitch algorithm
  • Triangle method
  • TRex package contains Method of weights (MW)

20
2. Constructive tree algorithms: Fitch
  • Branch lengths are estimated by minimizing the
    weighted SSQ for a given tree topology
  • Fitch-Margoliash criterion: weights
    w(i,j) = 1 / d(i,j)²
  • Greater distances are given less weight

21
2. Constructive tree algorithms: Fitch
  • The algorithm
  • Start with three species in an unrooted tree.
  • Calculate least squares branch lengths.
  • Where can the fourth species be added?
  • 3 possible places, try them all
  • Calculate least squares branch lengths for each
    topology → choose the one with the smallest
    SSQ
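
The first two steps can be worked through in closed form for the unweighted case. This is a simplified sketch, not the weighted fit described for Fitch: it assumes all weights are 1 and allows negative branch lengths. For three species the three branch lengths fit the three distances exactly. For four species, six distances are fitted by five branch lengths, and the remaining residual gives the SSQ of each of the three placements. All function names are hypothetical.

```python
def three_taxon_branches(d12, d13, d23):
    """Exact branch lengths of the unrooted tree on three species."""
    a = (d12 + d13 - d23) / 2   # branch leading to species 1
    b = (d12 + d23 - d13) / 2   # branch leading to species 2
    c = (d13 + d23 - d12) / 2   # branch leading to species 3
    return a, b, c

def quartet_ssq(d, i, j, k, l):
    """Unweighted least-squares SSQ of the four-species topology ij|kl."""
    return (d[i][k] + d[j][l] - d[i][l] - d[j][k]) ** 2 / 4

def best_placement(d):
    """Try the three places for species 3 and keep the smallest SSQ."""
    topologies = {"01|23": (0, 1, 2, 3),
                  "02|13": (0, 2, 1, 3),
                  "03|12": (0, 3, 1, 2)}
    scores = {name: quartet_ssq(d, *idx) for name, idx in topologies.items()}
    return min(scores, key=scores.get), scores

d = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(three_taxon_branches(d[0][1], d[0][2], d[1][2]))  # (1.0, 2.0, 8.0)
print(best_placement(d))
# ('01|23', {'01|23': 0.0, '02|13': 25.0, '03|12': 25.0})
```
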

22
2. Constructive tree algorithms: Fitch
  • Continue in this fashion, add each species to all
    possible places. Pick the placement that
    minimizes the SSQ.
  • After adding a species, a series of
    rearrangements is carried out, where branch
    lengths are recalculated.
  • Complexity: O(n⁴)

23
2. Constructive tree algorithms: Fitch
  • What if we have missing entries in the distance
    matrix?
  • Use the following weight option
  • n(i,j) ∈ [0, 1]
  • Missing entries: n(i,j) = 0
  • Known entries: n(i,j) = 1
  • Known entry i, unknown entry j: n(i,j) = 0.5
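
One way to read the weight option is that each cell's squared error is multiplied by n(i,j), so missing cells drop out of the sum. The sketch below assumes an objective of the form Σ n(i,j) (d(i,j) − d_T(i,j))² / d(i,j)², with missing distances stored as `None`; this form and the function name are assumptions for illustration, not a statement of how PHYLIP implements it.

```python
def weighted_ssq_with_missing(d, t, n):
    """Fitch-Margoliash style SSQ in which cells with n[i][j] == 0 are ignored.

    d: observed distances (None where missing)
    t: distances measured in the candidate tree
    n: per-cell weights between 0 and 1
    """
    size = len(d)
    total = 0.0
    for i in range(size):
        for j in range(i + 1, size):
            if n[i][j] == 0 or d[i][j] is None:
                continue  # missing entry contributes nothing
            # Assumes d[i][j] > 0 for distinct species.
            total += n[i][j] * (d[i][j] - t[i][j]) ** 2 / d[i][j] ** 2
    return total

d = [[0, 2, None], [2, 0, 4], [None, 4, 0]]
t = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
n = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(weighted_ssq_with_missing(d, t, n))  # 0.25
```
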

24
2. Constructive tree algorithms: Triangle
  • Uses only a subset of the distance matrix
  • A new element is placed in the tree according to
    two previously examined elements (triangle)
  • Quite complicated algorithm
  • Time complexity for partial distance matrices:
    O(n³)

25
2. Constructive tree algorithms: Method of weights (MW)
  • Constructs a tree from a distance matrix with
    missing entries
  • Step A: Use the ultrametric or additive procedure
    to estimate missing entries.
  • Compute the weight matrix W.
  • Step B: Apply the weighted least-squares fitting
    algorithm.

26
2. Constructive tree algorithms: Method of weights (MW)
  • Step A: Use the ultrametric or additive procedure
    to estimate missing entries.
  • Ultrametric: for high percentages of missing
    entries and small distance matrices
  • Additive: for low numbers of missing entries and
    bigger distance matrices
  • 24x24 matrix, < 40 missing entries → additive
    procedure

27
2. Constructive tree algorithms: Method of weights (MW)
  • Step A: Compute the weight matrix W.
  • Neither the ultrametric nor the additive procedure
    necessarily returns a complete distance matrix.
  • Compute the weight matrix W

28
2. Constructive tree algorithms: Method of weights (MW)
  • Step B: Apply the weighted least-squares fitting
    algorithm, given distance matrix D and weight
    matrix W.
  • 1. Choose taxa i and j such that d(i,j) is a known
    distance in D. This gives the tree T2.
  • 2. Place the taxon k that maximizes the sum of
    weights w(i,k) + w(j,k) in the tree, giving T3.
    If there is more than one candidate, choose the
    one that minimizes the SSQ. If there is still
    more than one candidate, choose the one that has
    the greatest score.
  • 3. Continue in this manner.
  • [Figure: tree T2 with taxa i, j and tree T3 with taxa i, j, k]

29
2. Constructive tree algorithms: Method of weights (MW)
  • No fixed weighting function
  • Time complexity O(n³)
  • If we carry out the algorithm for all possible
    pairs of taxa in the first step, we get O(n⁵).

30
Data set and experiment
  • Two distance matrices D
  • 20 mammals (20x20 matrix)
  • 34 species (34x34 matrix)
  • Random deletion of fixed number of entries from D
  • 50 to 100 known values
  • Apply
  • Triangle
  • Fitch
  • MW
  • Ultrametric procedure followed by MW with weights
    set to 1
  • Additive procedure followed by MW with weights
    set to 1

31
Results: Computational time, 34 x 34 matrix

32
Results: Tree construction, 20 x 20 matrix
  • Lower RF (Robinson-Foulds distance) value
    indicates better tree recovery
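
The Robinson-Foulds (RF) distance counts the bipartitions (splits) of the leaf set that occur in one tree but not in the other, so 0 means identical topologies. The sketch below assumes trees are given as nested tuples of leaf names and counts the plain symmetric difference without normalization; the helper names are hypothetical.

```python
def splits(tree, leaves):
    """Non-trivial bipartitions induced by the internal branches of a tree."""
    leaves = frozenset(leaves)
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        rest = leaves - below
        if len(below) > 1 and len(rest) > 1:
            # Store one canonical side so the root position does not matter.
            found.add(min(below, rest, key=sorted))
        return below

    leaves_below(tree)
    return found

def robinson_foulds(tree1, tree2, leaves):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree1, leaves) ^ splits(tree2, leaves))

leaves = ["A", "B", "C", "D", "E"]
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(true_tree, true_tree, leaves))  # 0
print(robinson_foulds(true_tree, inferred, leaves))   # 2
```
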

33
Results: Tree construction, 34 x 34 matrix
  • Lower RF (Robinson-Foulds distance) value
    indicates better tree recovery

34
Results
  • Triangle
  • + very fast
  • - worst results, never recovered the correct tree
  • Fitch
  • + best results for 34 x 34 matrix
  • - slow
  • MW
  • + best results for 20 x 20 matrix
  • + fast for a high percentage of missing entries
  • Additive procedure followed by MW
  • + good results for 20 x 20 matrix
  • - worst performance for high number of missing
    cells
  • Ultrametric procedure followed by MW
  • + good results for higher number of missing cells

35
Summary
  • Phylogenetic trees are an important tool in
    understanding how objects evolve through time
  • Real data not perfect, leads to optimization
    problems
  • Can be NP-hard, use heuristics
  • Methods
  • Distance based
  • Maximum Parsimony
  • Maximum Likelihood
  • No method is superior under all conditions

36
References
  • Böckenhauer, Bongartz. (2003). Algorithmische
    Grundlagen der Bioinformatik. Teubner.
  • Makarenkov, Lapointe. (2004). A weighted
    least-squares approach for inferring phylogenies
    from incomplete distance matrices. Bioinformatics,
    20, 2113-2121.
  • Felsenstein. (1997). An alternating least squares
    approach to inferring phylogenies from pairwise
    distances. Syst. Biol., 46, 101-111.
  • Guénoche, Leclerc. (2001). The triangle method to
    build X-trees from incomplete distance matrices.
    RAIRO Oper. Res., 35, 283-300.
  • Ron Shamir, Phylogenetic Trees,
    www.math.tau.ac.il/rshamir/algmb/01/scribe08/lec08.pdf
  • Mona Singh, Phylogenetics,
    http://www.cs.princeton.edu/mona/Lecture/phylogeny-slides.pdf
  • Phylip Software Package,
    http://evolution.genetics.washington.edu/phylip.html

37
The end
  • Thanks for listening!
  • Any questions?