Methods for Determining Trees - PowerPoint PPT Presentation

About This Presentation
Title:

Methods for Determining Trees

Description:

Fitch's Algorithm for Small Parsimony Problem ... Fitch's Algorithm for Small Parsimony Problem. Exhaustive Search: Number of Trees. 2N - 2 ... – PowerPoint PPT presentation

Slides: 51
Provided by: gg184

Transcript and Presenter's Notes

1
Methods for Determining Trees
  • Sequence-based methods
      • Maximum Parsimony
      • Maximum Likelihood
  • Distance-based methods
      • UPGMA
      • Neighbor Joining

2
Maximum Parsimony
  • A phylogeny constructed with the method of
    maximum parsimony explains evolution with the
    fewest evolutionary changes.
  • Multiple sequence alignment must first be
    obtained. Each aligned column is a site.

3
Maximum Parsimony Example
  • 1 A A G A G T G C A
  • 2 A G C C G T G C G
  • 3 A G A T A T C C A
  • 4 A G A G A T C C G
  • four sequences, nine sites, three possible
    unrooted trees

4
Maximum Parsimony Example
  • Possible Trees I

[Figure: candidate tree I.]
Number of mutations: 10

5
Maximum Parsimony Example
  • Possible Trees II

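[Figure: candidate tree II. Leaves: (1) AAGAGTGCA, (2) AGCCGTGCG,
(3) AGATATCCA, (4) AGAGATCCG. Internal nodes as drawn: AGGAGTGCA and
AGAGGTCCG. Mutation counts labeled on the branches: 1, 3, 5, 4, 1.]
Number of mutations: 14
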
6
Maximum Parsimony Example
  • Possible Trees III

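[Figure: candidate tree III. Leaves: (1) AAGAGTGCA, (2) AGCCGTGCG,
(3) AGATATCCA, (4) AGAGATCCG. Internal nodes as drawn: AGGAGTGCA and
AGATGTCCG. Mutation counts labeled on the branches: 1, 3, 5, 5, 2.]
Number of mutations: 16
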
Tree I has the topology with the least number of
mutations and thus is the most parsimonious tree.
7
Maximum Parsimony Example
  • Some sites are informative, others are not.
  • Informative site: a site with at least two
    different kinds of nucleotides, each of which
    is represented in at least two of the
    sequences under study.
  • Only informative sites are considered.

8
Maximum Parsimony Example
  • 1 A A G A G T G C A
  • 2 A G C C G T G C G
  • 3 A G A T A T C C A
  • 4 A G A G A T C C G
  • Three informative columns
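
The informative-site definition can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `informative_sites` is made up for illustration. It reports 0-based column indices, so the three informative columns of the example appear as 4, 6, and 8 (columns 5, 7, and 9 when counting from 1).

```python
from collections import Counter

def informative_sites(seqs):
    """Return 0-based indices of parsimony-informative columns.

    A column is informative when at least two different nucleotides
    each occur in at least two sequences.
    """
    sites = []
    for index, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        frequent = [base for base, count in counts.items() if count >= 2]
        if len(frequent) >= 2:
            sites.append(index)
    return sites

# The four aligned sequences from the example slides.
alignment = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]
print(informative_sites(alignment))
```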

9
Maximum Parsimony Example
  • Informative columns only:

         Column 1  Column 2  Column 3
    1       G         G         A
    2       G         G         G
    3       A         C         A
    4       A         C         G

  • Substitutions required on these columns:
    Tree 1: 4, Tree 2: 5, Tree 3: 6

10
Maximum Parsimony Problems
  • Small Parsimony Problem
  • Given the phylogeny topology, compute the
    internal nodes to minimize the total number of
    mutations
  • Used to evaluate the phylogeny
  • Polynomial time solvable.
  • Large Parsimony Problem
  • Given that we have a way of determining the score
    of a given phylogeny, search through all possible
    phylogenies to find the best one
  • Proved to be NP-complete.

11
Fitch's Algorithm for Small Parsimony Problem
  • Consider each site separately
  • Dynamic programming style
  • Constructs a set of possible states (possible
    nucleotides) for each internal node
  • Start at the leaves of the phylogeny. Each leaf
    is labeled with the singleton set containing the
    nucleotide at that particular site.
  • Traverse in a postorder manner (all of the
    children of the current node have been visited
    before the current node).

12
Fitch's Algorithm for Small Parsimony Problem
  • If m is an internal node with children l and r
    having states $S_l$ and $S_r$ respectively, the
    state of m, $S_m$, is computed as follows:
  • $S_m = S_l \cup S_r$ if $S_l \cap S_r$ is empty
  • $S_m = S_l \cap S_r$ otherwise
  • Each application of the first rule
    contributes one count to the number of changes.
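
The two rules above can be sketched in a few lines. This is an illustrative sketch rather than the presenter's code: trees are written as nested pairs of 0-based sequence indices (an assumed representation), and the names `fitch_site` and `fitch_score` are hypothetical. It computes only the minimum number of changes, not the final internal labels.

```python
def fitch_site(tree, column):
    """Postorder pass for one site: return (state set, change count)."""
    if not isinstance(tree, tuple):          # leaf: singleton set
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:                               # second rule: intersection
        return common, left_cost + right_cost
    # first rule: union, which counts as one change
    return left_set | right_set, left_cost + right_cost + 1

def fitch_score(tree, seqs):
    """Sum the per-site change counts over all aligned columns."""
    return sum(fitch_site(tree, column)[1] for column in zip(*seqs))

alignment = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]
# The three unrooted topologies for four sequences, rooted arbitrarily.
for tree in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
    print(tree, fitch_score(tree, alignment))
```

Restricted to the three informative columns of the example, the same function gives 4, 5, and 6 changes for the three topologies.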

13
Fitch's Algorithm for Small Parsimony Problem
14
Exhaustive Search: Number of Trees
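
The body of this slide is not in the transcript. As a hedged illustration of why exhaustive search becomes infeasible, the sketch below uses the standard count of unrooted binary trees on n labeled leaves, (2n - 5)!!, and of rooted binary trees, (2n - 3)!!. The function names are made up for this example.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!"""
    count = 1
    for odd in range(3, 2 * n - 4, 2):   # product 3 * 5 * ... * (2n - 5)
        count *= odd
    return count

def num_rooted_trees(n):
    """Number of rooted binary trees on n >= 2 labeled leaves: (2n - 3)!!"""
    return num_unrooted_trees(n + 1)

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

For four sequences this gives the three unrooted trees used in the parsimony example.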
15
Branch and Bound for Large Parsimony Problem
  • Consider trees of increasing size (starting from
    3 species).
  • Branch: add one species and check all possible
    phylogeny topologies.
  • Bound: take one solution as the first bound, and
    update the bound whenever a better one is found.
  • Abort an extension if its score already exceeds
    the current best.
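
A minimal sketch of this search, under stated assumptions: trees are nested pairs of 0-based sequence indices with sequence 0 kept as an outgroup, partial trees are scored with Fitch's rules, and the first complete tree found serves as the initial bound. All function names are hypothetical. The pruning is safe because adding a species can never lower the parsimony score.

```python
def fitch_site(tree, column):
    """Fitch postorder pass for one site: (state set, change count)."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def fitch_score(tree, seqs):
    return sum(fitch_site(tree, column)[1] for column in zip(*seqs))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one edge of tree."""
    yield (tree, leaf)                      # edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)

def branch_and_bound(seqs):
    """Return (best score, best tree) for at least three sequences."""
    n = len(seqs)
    best = {"score": float("inf"), "tree": None}

    def extend(subtree, next_leaf):
        score = fitch_score((0, subtree), seqs)
        if score >= best["score"]:          # bound: abort this extension
            return
        if next_leaf == n:                  # complete tree: new bound
            best["score"], best["tree"] = score, (0, subtree)
            return
        for candidate in insertions(subtree, next_leaf):
            extend(candidate, next_leaf + 1)   # branch: add one species

    extend((1, 2), 3)                       # the only 3-species topology
    return best["score"], best["tree"]

alignment = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]
print(branch_and_bound(alignment))
```

Because ties are pruned, the sketch returns one most parsimonious tree rather than all of them.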

16
Branch and Bound for Large Parsimony Problem
  • The worst case time complexity is the same as the
    complexity of exhaustive search
  • With a wisely chosen bound, many subtrees will be
    cut and therefore the running time will decrease
  • Sometimes a special traversal order finds better
    solutions faster
  • A* algorithm?

17
Maximum Parsimony
  • Time consuming algorithm
  • Only works well if the sequences have a strong
    sequence similarity

18
Maximum Likelihood
  • Evaluates the topologies of different trees and
    picks the best one according to an optimality
    criterion, the likelihood score.
  • Requires a specified model of the evolutionary
    process that can account for the conversion of
    one sequence into another.

19
Maximum Likelihood Model
  • The model is composed of the composition and the
    substitution process.
  • Composition: frequencies of the character states
  • Substitution process: rate of change from one
    character state to another character state

20
Maximum Likelihood Model
  • For DNA sequences, a simple model is that the
    rate of change from a to c or vice versa is 0.4,
    the composition of a is 0.25, and the composition
    of c is 0.25.

21
Maximum Likelihood Model
  • For nucleotide sequences, there are 16 possible
    ways to describe substitutions - a 4x4 matrix.
    Each entry in the matrix represents the
    substitution rate from nucleotide i to nucleotide
    j (rows, and columns, follow the order A, C, G,
    T).

22
Maximum Likelihood Model
  • In this matrix, the probability of an a changing
    to a c is 0.01 and the probability of a c
    remaining the same is 0.979, etc.
  • The rows of this matrix sum to 1 - meaning that
    for every nucleotide, we have covered all the
    possibilities of what might happen to it. The
    columns do not sum to anything in particular

23
Maximum Likelihood Model
  • This matrix corresponds to one particular
    evolutionary distance.
  • To compute the likelihood, we need a matrix
    for each branch length.
  • Normally, a model instead gives a rate matrix, Q,
    scaled so that branch lengths are measured in
    substitutions per site. For a branch length of v,
    the probability matrix is $P(v) = e^{Qv}$.
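
A standard way to obtain substitution probabilities for a branch of length v from a rate matrix Q is the matrix exponential $P(v) = e^{Qv}$. The sketch below evaluates it with a truncated Taylor series in plain Python. The Jukes-Cantor rate matrix `JC_Q` is an assumption chosen for illustration, since the matrix from the slides is not in the transcript, and the function names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, v, terms=30):
    """Approximate P(v) = exp(Q v) by summing the first Taylor terms."""
    n = len(q)
    qv = [[rate * v for rate in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]        # (Qv)^0 / 0!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qv)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Assumed example model: Jukes-Cantor, scaled to one substitution per
# site per unit branch length (rows and columns in the order A, C, G, T).
JC_Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

P = transition_matrix(JC_Q, 0.1)
print([round(sum(row), 6) for row in P])     # each row sums to 1
# Closed form for this model, as a cross-check on the diagonal entries.
print(P[0][0], 0.25 + 0.75 * math.exp(-4.0 * 0.1 / 3.0))
```

The truncated series is adequate for short branches; for long branches or production use, a numerical library routine for the matrix exponential would be the safer choice.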

24
Maximum Likelihood Example
  • A ccat
  • B ccgt
  • C gcat

What is the likelihood of the alignment,
given a tree topology with branch lengths,
a rate matrix,
and a composition?
25
Maximum Likelihood Example
26
Maximum Likelihood Example
  • If we calculate the other three sites in a
    similar way, we get site likelihoods 0.245,
    0.00368, and 0.166. If we multiply them together
    with the first site likelihood, we get a
    likelihood for the tree of $3.04 \times 10^{-6}$.
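
The structure of this calculation, a site likelihood summed over the unknown ancestral state and then a product over sites, can be sketched as follows. The numbers will not match the slide: the slide's rate matrix and branch lengths are not in the transcript, so this sketch assumes a Jukes-Cantor model, equal base frequencies of 0.25, a three-taxon star tree, and made-up branch lengths. All names are hypothetical.

```python
import math

def jc_prob(x, y, v):
    """Jukes-Cantor probability of state x becoming y over branch length v."""
    decay = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def site_likelihood(states, branch_lengths):
    """Sum over the ancestral state at the single internal node."""
    total = 0.0
    for ancestor in "acgt":
        prob = 0.25                       # assumed composition
        for state, v in zip(states, branch_lengths):
            prob *= jc_prob(ancestor, state, v)
        total += prob
    return total

def alignment_likelihood(seqs, branch_lengths):
    """Multiply the site likelihoods of all aligned columns."""
    likelihood = 1.0
    for column in zip(*seqs):
        likelihood *= site_likelihood(column, branch_lengths)
    return likelihood

seqs = ["ccat", "ccgt", "gcat"]           # sequences A, B, C from the example
print(alignment_likelihood(seqs, [0.1, 0.1, 0.1]))
```

In practice the logarithms of the site likelihoods are summed instead, because the product underflows for long alignments.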

27
Maximum Likelihood Example
28
Maximum Likelihood Search
  • Input
  • Alignment of sequences
  • Model: rate matrix, base frequencies
  • Search
  • Go through all possible trees. For each tree,
    calculate the branch lengths and then the
    likelihood value.
  • Output the tree with the maximum likelihood value

29
Heuristic Search in PAUP
  • Stepwise addition
      • As is
      • Random
      • Closest
      • Simple
  • Branch swapping
      • Nearest Neighbor Interchange (NNI)
      • Subtree Pruning Regrafting (SPR)
      • Tree Bisection Reconnection (TBR)

30
Heuristic Search in PAUP
  • Stepwise addition

31
Addition Order
  • As is: the order of the input sequences
  • Random: in each step, select a random taxon
  • Closest: in each step, all remaining taxa are
    considered
  • Simple: choose a reference taxon, calculate the
    distance between this reference taxon and all the
    other taxa, and add the taxa in increasing order
    of distance
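
The "Simple" order as described on this slide amounts to a sort by distance from the reference taxon. The sketch below follows that description only; it is not PAUP's implementation, and the function name and the distance values are illustrative assumptions.

```python
def simple_addition_order(taxa, dist, reference):
    """Order taxa by increasing distance from the reference taxon.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    others = [taxon for taxon in taxa if taxon != reference]
    others.sort(key=lambda taxon: dist[frozenset((reference, taxon))])
    return [reference] + others

# Hypothetical pairwise distances for four taxa.
dist = {
    frozenset("AB"): 3, frozenset("AC"): 7, frozenset("AD"): 8,
    frozenset("BC"): 6, frozenset("BD"): 7, frozenset("CD"): 3,
}
print(simple_addition_order(["D", "C", "B", "A"], dist, reference="A"))
```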

32
Nearest Neighbor Interchange (NNI)
33
Subtree Pruning Regrafting (SPR)
34
Tree Bisection Reconnection (TBR)
35
Heuristic Search in PHYLIP
  • Stepwise addition in As Is or Random order
  • In each step, do Local Rearrangement by using
    Nearest Neighbor Interchange (NNI)
  • After all taxa have been added, do Global
    Rearrangement by using Subtree Pruning
    Regrafting (SPR)

36
Distance Method
37
Distance Method
  • Uses a distance table: a symmetric matrix M that
    gives the pairwise distances
  • Goal: build an edge-weighted tree where each
    leaf (external node) corresponds to one object of
    M, so that the distance measured on the tree
    between leaves i and j corresponds to $M_{ij}$

38
Example of Distance Analysis
  • Distances can be shown as a table
  • A ACGCGTTGGGCGATGGCAAC
  • B ACGCGTTGGGCGACGGTAAT
  • C ACGCATTGAATGATGATAAT
  • D ACACATTGAGTGATAATAAT
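
The table itself is not in the transcript. One simple way to fill it, assumed here because the slide does not say which distance measure it uses, is to count the mismatched positions between each pair of aligned sequences. The function names are hypothetical.

```python
from itertools import combinations

def mismatches(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_table(seqs):
    """Map each unordered pair of names to its mismatch count."""
    return {frozenset((p, q)): mismatches(seqs[p], seqs[q])
            for p, q in combinations(seqs, 2)}

seqs = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}
table = distance_table(seqs)
for pair in combinations(seqs, 2):
    print(pair, table[frozenset(pair)])
```

Under this measure A and B differ at 3 positions, C and D at 3, and the cross-pair distances range from 6 to 8.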

39
Example of Distance Analysis
  • Using this information, a tree can be drawn
  • A ACGCGTTGGGCGATGGCAAC
  • B ACGCGTTGGGCGACGGTAAT
  • C ACGCATTGAATGATGATAAT
  • D ACACATTGAGTGATAATAAT

40
UPGMA
  • Unweighted Pair Group Method Using Arithmetic
    Averages
  • Works by clustering sequences

41
UPGMA
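  • The distance $d_{ij}$ between clusters $C_i$ and
    $C_j$ is the average distance between pairs of
    sequences from each cluster:
  • $d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\ q \in C_j} d_{pq}$
  • $|C_i|$ and $|C_j|$ are the numbers of sequences
    in clusters i and j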

42
UPGMA
  • Initialization steps
  • 1. Assign each sequence i to its own cluster $C_i$
  • 2. Define one leaf of the tree T for each sequence

43
UPGMA
  • Iteration steps
  • 3. Determine the two clusters i and j for which
    $d_{ij}$ is minimal
  • 4. Define a new cluster k by $C_k = C_i \cup C_j$,
    and define $d_{kl}$ for all l
  • 5. Add k to the current clusters and remove i and
    j.
  • 6. Repeat steps 3-5 until only two clusters i
    and j remain.
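
The steps above can be sketched as follows. This is an illustrative sketch, not the presenter's code: cluster distances are recomputed directly from the average-distance definition rather than updated incrementally, the final two clusters are simply merged as the root, node heights are set to half the cluster distance, and the names are hypothetical.

```python
from itertools import combinations

def cluster_distance(ci, cj, dist):
    """Average leaf-to-leaf distance between two clusters of names."""
    total = sum(dist[frozenset((p, q))] for p in ci for q in cj)
    return total / (len(ci) * len(cj))

def upgma(names, dist):
    """Return (tree as nested pairs, root height).

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each cluster is (member names, subtree); start with one per sequence.
    clusters = [((name,), name) for name in names]
    height = 0.0
    while len(clusters) > 1:
        # Pick the pair of clusters with minimal average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0][0], pair[1][0], dist))
        height = cluster_distance(a[0], b[0], dist) / 2
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append((a[0] + b[0], (a[1], b[1])))
    return clusters[0][1], height

# Hypothetical pairwise distances for four sequences.
dist = {
    frozenset("AB"): 3, frozenset("AC"): 7, frozenset("AD"): 8,
    frozenset("BC"): 6, frozenset("BD"): 7, frozenset("CD"): 3,
}
print(upgma(["A", "B", "C", "D"], dist))
```

With these distances A joins B, C joins D, and the two pairs meet at a root of height 3.5, half of their average distance of 7.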

44
UPGMA
  • Example using 5 Sequences

45
Neighbor Joining
  • Begin with a star topology: no neighbors have
    been joined

46
Neighbor Joining
  • Tree modified by joining pairs of sequences
  • Pair is chosen by calculating sum of branch
    lengths for the corresponding tree

47
Neighbor Joining
  • If A and B are joined

48
Neighbor Joining
  • The pair giving the smallest sum of branch
    lengths is chosen to be joined
  • Calculate new branch lengths

49
Neighbor Joining
  • A new distance table is created with the joined
    sequences entered as a composite
  • Repeat the process to select the next pair to join
  • The process continues until the tree is fully
    resolved and all branch lengths are determined
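
A compact sketch of the whole procedure follows. It is an assumption-laden illustration rather than the slides' exact formulation: the pair to join is chosen with the widely used Q criterion, $Q_{ij} = (n-2)\,d_{ij} - r_i - r_j$ with $r_i$ the row sum of the distance table, which is the usual computational form of the branch-length-sum criterion. The function name and the internal node labels `U0`, `U1`, ... are made up.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return a list of edges (node, node, branch length).

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Choose the pair minimizing the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        u = f"U{next_id}"                 # new internal (composite) node
        next_id += 1
        dij = d[frozenset((i, j))]
        # New branch lengths from i and j to the composite node.
        length_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, length_i))
        edges.append((j, u, dij - length_i))
        # New distance table entries for the composite node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = (d[frozenset((i, k))]
                                        + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

# Hypothetical pairwise distances for four sequences.
dist = {
    frozenset("AB"): 3, frozenset("AC"): 7, frozenset("AD"): 8,
    frozenset("BC"): 6, frozenset("BD"): 7, frozenset("CD"): 3,
}
for edge in neighbor_joining(["A", "B", "C", "D"], dist):
    print(edge)
```

These example distances happen to fit a tree exactly, so the path lengths in the resulting tree reproduce every entry of the table.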

50
Comparison of Methods