BCB 444/544

About This Presentation
Title:

BCB 444/544

Description:

BCB 444/544 F07 ISU Terribilini #31- Phylogenetics - Character-Based Methods ... Kishino-Hasegawa Test (paired t-test) Shimodaira-Hasegawa Test (χ² test) 39 ...

Slides: 46
Provided by: publicI

Transcript and Presenter's Notes


1
BCB 444/544
  • Lecture 31
  • Phylogenetics: Character-Based Methods
  • 31_Nov05

2
Required Reading (before lecture)
  • Fri Oct 30 - Lecture 30
  • Phylogenetic Distance-Based Methods
  • Chp 11 - pp 142-169
  • Mon Nov 5 - Lecture 31
  • Phylogenetics: Parsimony and ML
  • Chp 11 - pp 142-169
  • Wed Nov 7 - Lecture 32
  • Machine Learning
  • Fri Nov 9 - Lecture 33
  • Functional and Comparative Genomics
  • Chp 17 and Chp 18

3
Assignments & Announcements
  • Mon Oct 29 - HW5
  • HW5: Hands-on exercises with phylogenetics
    and tree-building software
  • Due Mon Nov 5 (not Fri Nov 1 as previously
    posted)

4
BCB 544 Only: New Homework Assignment
  • 544 Extra2
  • Due: PART 1 - ASAP
  • PART 2 - meeting prior to 5 PM Fri Nov 2
  • Part 1 - Brief outline of project, email to Drena &
    Michael
  • After response/approval, then
  • Part 2 - More detailed outline of project
  • Read a few papers and summarize status of
    problem
  • Schedule meeting with Drena & Michael to
    discuss ideas

5
Seminars this Week
  • BCB List of URLs for Seminars related to
    Bioinformatics
  • http://www.bcb.iastate.edu/seminars/index.html
  • Nov 7 Wed - BBMB Seminar 4:10 in 1414 MBB
  • Sharon Roth Dent, MD Anderson Cancer Center
  • Role of chromatin and chromatin modifying
    proteins in regulating gene expression
  • Nov 8 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Jianzhi George Zhang, U. Michigan
  • Evolution of new functions for proteins
  • Nov 9 Fri - BCB Faculty Seminar 2:10 in 102
    SciI
  • Amy Andreotti, ISU
  • Something about NMR

6
Chp 11: Phylogenetic Tree Construction Methods
and Programs
  • SECTION IV: MOLECULAR PHYLOGENETICS
  • Xiong, Chp 11: Phylogenetic Tree Construction
    Methods and Programs
  • Distance-Based Methods
  • Character-Based Methods
  • Phylogenetic Tree Evaluation
  • Phylogenetic Programs

7
Tree Construction
  • Two main categories of tree building methods
  • Distance-based
  • Overall similarity between sequences
  • Character-based
  • Consider the entire MSA

8
Summary of Distance-Based Methods
  • Clustering-based methods
  • Computationally very fast and can handle large
    datasets that other methods cannot
  • Not guaranteed to find the best tree
  • Optimality-based methods
  • Better overall accuracies
  • Computationally slow
  • All distance-based methods reduce the alignment
    to pairwise distances, so site-level sequence
    information is lost and they cannot infer the
    most likely state at an internal node

9
Character-Based Methods
  • Based directly on the sequence characters in the
    MSA rather than overall distances
  • Count mutational events accumulated on sequences
  • Evolutionary dynamics of each character can be
    studied and ancestral sequences inferred
  • Two popular approaches
  • Parsimony
  • Maximum Likelihood (ML)

10
Parsimony
  • Parsimony is based on Occam's Razor: the
    simplest explanation is most likely correct
  • Goal: Find the tree that allows evolution of the
    sequences with the fewest changes

11
Parsimony
  • Parsimony score of a tree: The smallest
    (weighted) number of steps required by the tree
  • Two parsimony problems
  • Large Parsimony problem: Find the tree with the
    lowest parsimony score
  • Small Parsimony problem: Given a tree, find its
    parsimony score
  • Use the small parsimony problem to solve the
    large parsimony problem

12
Algorithms for Small Parsimony
  • Fitch's algorithm
  • Based on set operations
  • Evolutionary steps have the same weight
  • Sankoff's algorithm
  • Based on dynamic programming
  • Allows steps to have different weights
  • Both algorithms compute the minimum (weighted)
    number of steps a tree requires at a given site
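
As a rough sketch of the set-based idea behind Fitch's algorithm, the Python below scores a single site and then a whole alignment on a rooted binary tree. The nested-tuple tree encoding and the names fitch_site and fitch_score are assumptions made for illustration; they do not come from the lecture.

```python
def fitch_site(tree, column):
    """Return (state_set, steps) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    column -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        # A leaf contributes its observed state and no steps.
        return {column[tree]}, 0
    left_set, left_steps = fitch_site(tree[0], column)
    right_set, right_steps = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step.
        return common, left_steps + right_steps
    # The children disagree: take the union and count one step.
    return left_set | right_set, left_steps + right_steps + 1


def fitch_score(tree, alignment):
    """Sum the per-site step counts (all steps weighted equally)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    alignment = {"A": "AG", "B": "AG", "C": "CG", "D": "CT"}
    print(fitch_score(tree, alignment))
```

Because every step costs the same, only the count of set unions matters; weighted steps need the dynamic programming approach instead.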

13
Fitch's Algorithm: Example
14
Sankoff's Algorithm
  • Allows for different weights for different
    evolutionary steps
  • Transitions (A <-> G or C <-> T) are more
    probable than transversions, so give a lower
    weight to transitions
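
The dynamic programming idea can be sketched as follows for one site. The weights (1 for a transition, 2 for a transversion), the nested-tuple tree encoding, and the function names are illustrative assumptions, not values given in the lecture.

```python
INF = float("inf")
BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a, b, transition=1, transversion=2):
    """Cost of changing base a into base b (weights are assumed values)."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def sankoff_site(tree, column):
    """Return {base: minimum cost of the subtree if its root has that base}.

    tree   -- a leaf name (str) or a tuple of child subtrees
    column -- dict mapping each leaf name to its observed base
    """
    if isinstance(tree, str):
        # A leaf costs nothing for its observed base, infinity otherwise.
        return {b: (0 if b == column[tree] else INF) for b in BASES}
    child_tables = [sankoff_site(child, column) for child in tree]
    table = {}
    for parent in BASES:
        # For each child, pick the cheapest child base given the parent base.
        table[parent] = sum(
            min(step_cost(parent, b) + child[b] for b in BASES)
            for child in child_tables
        )
    return table


def sankoff_score(tree, column):
    """Minimum weighted number of steps the tree requires at this site."""
    return min(sankoff_site(tree, column).values())


if __name__ == "__main__":
    tree = (("w", "x"), ("y", "z"))
    print(sankoff_score(tree, {"w": "A", "x": "G", "y": "C", "z": "T"}))
```

Keeping a full cost table per node is what makes a traceback possible: once the best root base is chosen, the minimizing child bases can be read back down the tree.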

15
Sankoff's Algorithm: Example
16
Sankoff's Algorithm: Traceback
17
Searching for a Most Parsimonious Tree
  • Solving the large parsimony problem requires
    searching all possible trees (or does it?)
  • Exhaustive search (exact)
  • Branch-and-Bound (exact)
  • Heuristic search methods (not exact)

18
Exhaustive Search
  • Build the only possible unrooted tree for three
    taxa (can be randomly chosen)
  • Try all possible places to add the fourth taxon
    and score each tree
  • Try all places to add the fifth taxon to the
    trees and score again
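
The stepwise enumeration above can be sketched in Python. This is only an illustration under assumed conventions: an unrooted tree is written as (first_taxon, rooted_rest) with nested tuples, and the names insertions and all_unrooted_trees are hypothetical. Scoring each tree is left out; any small-parsimony scorer could be applied to the result.

```python
def insertions(subtree, taxon):
    """Yield every tree made by attaching taxon to one edge of subtree."""
    # Attach on the edge leading into this subtree.
    yield (subtree, taxon)
    if not isinstance(subtree, str):
        left, right = subtree
        # Attach somewhere inside the left or the right child.
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def all_unrooted_trees(taxa):
    """List all unrooted binary trees for three or more taxa."""
    first, second, third = taxa[:3]
    # The only unrooted tree for three taxa.
    subtrees = [(second, third)]
    for taxon in taxa[3:]:
        # Try every possible place to add the next taxon.
        subtrees = [new for old in subtrees for new in insertions(old, taxon)]
    return [(first, subtree) for subtree in subtrees]


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, len(all_unrooted_trees(list("ABCDEFG")[:n])))
```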

19
Why Finding a True Tree is Difficult
Number of rooted trees
  • The number of possible trees grows faster than
    exponentially with the number of species (or
    sequences), n
  • Rooted: $N_r = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
  • Unrooted: $N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
  • To find the best tree, you must explore all
    possibilities (or must you?)
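
To get a feel for how quickly these counts grow, the two formulas for rooted and unrooted binary trees can be evaluated directly. The function names below are chosen for illustration only.

```python
from math import factorial


def num_rooted_trees(n):
    """Number of rooted binary trees for n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n):
    """Number of unrooted binary trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20):
        print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

Already at 10 taxa there are more than two million unrooted trees, which is why exhaustive search is only practical for small datasets.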

20
Adding the Fourth Taxon
21
Adding the Fifth Taxon
23
Branch and Bound
  • Similar to exhaustive search except that we
    maintain the score of best tree obtained so far
  • If score of current tree exceeds the current best
    score, backtrack and take next available path
  • Main idea: The parsimony score of a tree can
    only increase (or stay the same) as we add
    another taxon

24
Branch and Bound
  • When a tip of the search tree is reached the tree
    is either optimal (and retained) or suboptimal
    (and rejected)
  • When all paths leading from the initial 3 taxon
    tree have been explored, the algorithm
    terminates, and all most parsimonious trees will
    have been identified
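
A compact sketch of branch and bound for parsimony follows, assuming equal-weight (Fitch-style) scoring and a nested-tuple tree encoding in which an unrooted tree is written as (first_taxon, rooted_rest). All function names and the upper_bound parameter are illustrative assumptions; the starting bound could be the score of any complete tree, for example one built by a fast distance method.

```python
def fitch_site(tree, column):
    """Return (state_set, steps) for one site; leaves are taxon names."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_steps = fitch_site(tree[0], column)
    right_set, right_steps = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_score(tree, alignment):
    """Equal-weight parsimony score, using only the taxa present in tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def insertions(subtree, taxon):
    """Yield every tree made by attaching taxon to one edge of subtree."""
    yield (subtree, taxon)
    if not isinstance(subtree, str):
        left, right = subtree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(alignment, upper_bound=float("inf")):
    """Return (best_score, list of all most parsimonious trees)."""
    taxa = sorted(alignment)
    best = {"score": upper_bound, "trees": []}

    def extend(subtree, next_index):
        score = parsimony_score((taxa[0], subtree), alignment)
        if score > best["score"]:
            return  # bound: adding taxa cannot lower the score, so backtrack
        if next_index == len(taxa):
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append((taxa[0], subtree))
            return
        for candidate in insertions(subtree, taxa[next_index]):
            extend(candidate, next_index + 1)

    extend((taxa[1], taxa[2]), 3)
    return best["score"], best["trees"]


if __name__ == "__main__":
    alignment = {"a": "AA", "b": "AA", "c": "CC", "d": "CC"}
    print(branch_and_bound(alignment))
```

Pruning only when the partial score strictly exceeds the bound keeps ties, so every most parsimonious tree is retained.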

25
Branch and Bound
26
Branch and Bound
  • One way to find a reasonable starting bound quickly
  • Use UPGMA or NJ to build a complete tree
  • Calculate the parsimony score of this tree and
    use it as the initial best score (an upper bound
    on the optimal score) in our search

27
Heuristic Search
  • Shortcuts have been designed to reduce the search
    space
  • Idea: Build a tree quickly (by NJ or some other
    fast method) and rearrange parts of it to explore
    some of the possible trees
  • Branch swapping
  • Nearest neighbor interchange
  • Subtree pruning and regrafting
  • Tree bisection and reconnection

28
Nearest-Neighbor Interchange
29
Subtree Pruning and Regrafting
30
Tree Bisection and Reconnection
31
Stepwise Addition: Another Heuristic
  • A greedy method
  • Start with a 3-taxon tree
  • Add one taxon at a time
  • Keep only the best tree found so far
  • No guarantee of optimality, but may provide a
    good starting point for a search

32
Maximum Likelihood Method
  • ML is based on a Markov model of evolution
  • Observed: The species labeling the leaves
  • Hidden: The ancestral states
  • Transition probabilities: The mutation
    probabilities
  • Assumptions
  • Only mutations are allowed
  • Sites are independent

33
Models of Evolution at a Site
  • Transition probability matrix
  • $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$
  • Where
  • $m_{ij} = \Pr(i \to j \text{ mutation in 1 time unit})$
  • Branches may have different lengths
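
One simple way to picture such a matrix, and how a longer branch could be handled, is sketched below. It assumes a uniform model in which every substitution is equally likely and assumes branch lengths are whole numbers of time units, so the matrix for a branch is a matrix power; the lecture does not specify either choice, and the function names are hypothetical.

```python
BASES = "ACTG"


def uniform_matrix(p_change):
    """One-time-unit matrix: each of the 3 possible changes has p_change / 3."""
    return [
        [1 - p_change if i == j else p_change / 3 for j in range(4)]
        for i in range(4)
    ]


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def matrix_for_branch(m, length):
    """Transition matrix for a branch of `length` whole time units (m^length)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(length):
        result = mat_mul(result, m)
    return result


if __name__ == "__main__":
    m = uniform_matrix(0.3)
    for base, row in zip(BASES, matrix_for_branch(m, 2)):
        print(base, [round(x, 3) for x in row])
```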

34
The Probability of an Assignment
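(Tree figure: root T with internal children G and T; leaves A and G below G, leaves C and T below T)
Probability $= m_{TG}\, m_{GA}\, m_{GG}\, m_{TT}\, m_{TC}\, m_{TT}$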
35
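Ancestral Reconstruction: Most Likely Assignment
(Tree figure: root X with internal children Y and Z; leaves A and G below Y, leaves C and T below Z)
$L = \max_{X,Y,Z}\; m_{XY}\, m_{YA}\, m_{YG}\, m_{XZ}\, m_{ZC}\, m_{ZT}$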
Compute using Viterbi algorithm
36
Likelihood of a Tree
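(Tree figure: root X with internal children Y and Z; leaves A and G below Y, leaves C and T below Z)
$L = \sum_{X,Y,Z}\; m_{XY}\, m_{YA}\, m_{YG}\, m_{XZ}\, m_{ZC}\, m_{ZT}$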
Compute using forward algorithm
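
Both quantities can be computed with the same bottom-up recursion, combining over the hidden states with a sum (forward) or a max (Viterbi). The sketch below follows the slide formulas as written, so it applies no prior on the root state. The uniform matrix, the nested-tuple encoding with observed bases as leaves, and all function names are assumptions for illustration.

```python
BASES = "ACGT"


def uniform_matrix(p_change):
    """Assumed model: m[i][j] = p_change / 3 for i != j, else 1 - p_change."""
    return {
        i: {j: (1 - p_change if i == j else p_change / 3) for j in BASES}
        for i in BASES
    }


def conditional(tree, m, combine):
    """Return {state: value of the subtree given that state at its root}.

    tree    -- an observed base (str) or a tuple of child subtrees
    combine -- sum for the forward recursion, max for the Viterbi recursion
    """
    if isinstance(tree, str):
        # A leaf is compatible only with its observed base.
        return {s: (1.0 if s == tree else 0.0) for s in BASES}
    result = {s: 1.0 for s in BASES}
    for child in tree:
        below = conditional(child, m, combine)
        for s in BASES:
            result[s] *= combine(m[s][c] * below[c] for c in BASES)
    return result


def tree_likelihood(tree, m):
    """Sum over all assignments of hidden states (forward recursion)."""
    return sum(conditional(tree, m, sum).values())


def most_likely_assignment_probability(tree, m):
    """Probability of the best single assignment (Viterbi recursion)."""
    return max(conditional(tree, m, max).values())


if __name__ == "__main__":
    m = uniform_matrix(0.3)
    tree = (("A", "G"), ("C", "T"))
    print(tree_likelihood(tree, m))
    print(most_likely_assignment_probability(tree, m))
```

The recursion visits each node once, so the work grows linearly with the number of taxa instead of with the number of possible hidden-state assignments.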
37
Maximum Likelihood Comments
  • ML is robust
  • ML converges to the correct answer as more data
    is added (provided the model of evolution is
    adequate)
  • Can put in a Bayesian statistical framework to
    obtain a distribution of possible phylogenies
  • ML can be slow

38
Phylogenetic Tree Evaluation
  • Bootstrapping
  • Jackknifing
  • Bayesian Simulation
  • Statistical difference tests (are two trees
    significantly different?)
  • Kishino-Hasegawa Test (paired t-test)
  • Shimodaira-Hasegawa Test (χ² test)

39
Bootstrapping
  • A bootstrap sample is obtained by sampling sites
    randomly with replacement
  • Obtain a data matrix with same number of taxa and
    number of characters as original one
  • Construct trees for samples
  • For each branch in original tree, compute
    fraction of bootstrap samples in which that
    branch appears
  • Assigns a bootstrap support value to each branch
  • Idea: If a grouping has a lot of support, it
    will be supported by at least some positions in
    most of the bootstrap samples
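
The resampling procedure can be sketched as below. The tree-building step is deliberately left as a caller-supplied function: build_splits is a hypothetical callable that takes an alignment and returns the set of groupings (here, frozensets of taxon names) found in the tree built from it. The other names and the default of 100 replicates are also assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Sample sites (columns) randomly with replacement.

    The result has the same taxa and the same number of sites as the input.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_support(alignment, build_splits, n_replicates=100, seed=0):
    """Fraction of bootstrap samples in which each original grouping appears.

    build_splits -- hypothetical tree builder: alignment -> set of groupings
    """
    rng = random.Random(seed)
    original = build_splits(alignment)
    counts = {split: 0 for split in original}
    for _ in range(n_replicates):
        replicate = build_splits(bootstrap_alignment(alignment, rng))
        for split in original:
            if split in replicate:
                counts[split] += 1
    return {split: count / n_replicates for split, count in counts.items()}


if __name__ == "__main__":
    alignment = {"a": "ACGTAC", "b": "ACGTTC", "c": "TCGAAC"}
    print(bootstrap_alignment(alignment, random.Random(1)))
```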

40
Bootstrapping Comments
  • Bootstrapping doesn't really assess the accuracy
    of a tree; it only indicates the consistency of
    the data
  • To get reliable statistics, bootstrapping needs
    to be done on your tree 500-1000 times; this is
    a big problem if your tree took a few days to
    construct

41
Jackknifing
  • Another resampling technique
  • Randomly delete half of the sites in the dataset
  • Construct new tree with this smaller dataset, see
    how often taxa are grouped
  • Advantage: sites aren't duplicated
  • Disadvantage: again, really only measuring
    consistency of the data
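
For comparison with bootstrapping, the site deletion step can be sketched like this. The function name is hypothetical, and rounding the number of deleted sites down for odd lengths is an assumed convention.

```python
import random


def jackknife_alignment(alignment, rng):
    """Randomly delete half of the sites, keeping the rest in original order."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = n_sites - n_sites // 2
    # Sampling without replacement means no site is duplicated.
    kept = sorted(rng.sample(range(n_sites), n_keep))
    return {
        name: "".join(seq[i] for i in kept)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    alignment = {"a": "ACGTACGT", "b": "ACGTTCGA"}
    print(jackknife_alignment(alignment, random.Random(0)))
```

A tree is then built from each reduced alignment, and groupings are tallied across replicates in the same way as for bootstrap samples.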

42
Bayesian Simulation
  • Using a Bayesian ML method to produce a tree
    automatically calculates the probability of many
    trees during the search
  • Most trees sampled in the Bayesian ML search are
    near an optimal tree

43
Phylogenetic Programs
  • Huge list at
  • http://evolution.genetics.washington.edu/phylip/software.html
  • PAUP - one of the most popular programs,
    commercial, Mac and Unix only, nice user
    interface
  • PHYLIP - free, multiplatform, a bit difficult to
    use but web servers make it easier
  • WebPhylip - another interface for PHYLIP online

44
Phylogenetic Programs
  • TREE-PUZZLE - uses a heuristic to allow ML on
    large datasets, also available as a web server
  • PHYML - web based, uses genetic algorithm
  • MrBayes - Bayesian program, fast and can handle
    large datasets, multiplatform download
  • BAMBE - web based Bayesian program

45
Final Comments on Phylogenetics
  • No method is perfect
  • Different methods make very different assumptions
  • If multiple methods using different assumptions
    come up with similar results, we should trust the
    results more than any single method