BCB 444/544 - PowerPoint PPT Presentation

About This Presentation

Title:

BCB 444/544

Description:

BCB 444/544 F07 ISU Terribilini #31 - Phylogenetics - Character-Based Methods ... Kishino-Hasegawa Test (paired t-test) Shimodaira-Hasegawa Test (χ2 test) ... - PowerPoint PPT presentation


Slides: 46

Provided by: publicI

Learn more at: http://www.public.iastate.edu


Transcript and Presenter's Notes


1
BCB 444/544

Lecture 31
Phylogenetics: Character-Based Methods
31_Nov05

2
Required Reading (before lecture)

Fri Oct 30 - Lecture 30
Phylogenetic Distance-Based Methods
Chp 11 - pp. 142-169
Mon Nov 5 - Lecture 31
Phylogenetics: Parsimony and ML
Chp 11 - pp. 142-169
Wed Nov 7 - Lecture 32
Machine Learning
Fri Nov 9 - Lecture 33
Functional and Comparative Genomics
Chp 17 and Chp 18

3
Assignments & Announcements

Mon Oct 29 - HW5
HW5: Hands-on exercises with phylogenetics and tree-building software
Due Mon Nov 5 (not Fri Nov 1 as previously posted)

4
BCB 544 Only: New Homework Assignment

544 Extra2
Due: PART 1 - ASAP
PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael
after response/approval, then
Part 2 - More detailed outline of project
Read a few papers and summarize status of problem
Schedule meeting with Drena & Michael to discuss ideas

5
Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics
http://www.bcb.iastate.edu/seminars/index.html
Nov 7 Wed - BBMB Seminar 4:10 in 1414 MBB
Sharon Roth Dent, MD Anderson Cancer Center
Role of chromatin and chromatin modifying proteins in regulating gene expression
Nov 8 Thurs - BBMB Seminar 4:10 in 1414 MBB
Jianzhi George Zhang, U. Michigan
Evolution of new functions for proteins
Nov 9 Fri - BCB Faculty Seminar 2:10 in 102 SciI
Amy Andreotti, ISU
Something about NMR

6
Chp 11: Phylogenetic Tree Construction Methods and Programs

SECTION IV: MOLECULAR PHYLOGENETICS
Xiong Chp 11: Phylogenetic Tree Construction Methods and Programs
Distance-Based Methods
Character-Based Methods
Phylogenetic Tree Evaluation
Phylogenetic Programs

7
Tree Construction

Two main categories of tree-building methods
Distance-based
Use overall (pairwise) similarity between sequences
Character-based
Use every character (aligned column) of the entire MSA

8
Summary of Distance-Based Methods

Clustering-based methods
Computationally very fast and can handle large
datasets that other methods cannot
Not guaranteed to find the best tree
Optimality-based methods
Better overall accuracy
Computationally slow
All distance-based methods reduce the MSA to pairwise distances, so they lose the site-by-site sequence information and cannot infer the most likely state at an internal node

9
Character-Based Methods

Based directly on the sequence characters in the
MSA rather than overall distances
Count mutational events accumulated on sequences
Evolutionary dynamics of each character can be
studied and ancestral sequences inferred
Two popular approaches
Parsimony
Maximum Likelihood (ML)

10
Parsimony

Parsimony is based on Occam's Razor: the simplest explanation is most likely correct
Goal: Find the tree that allows evolution of the sequences with the fewest changes

11
Parsimony

Parsimony score of a tree: the smallest (weighted) number of steps required by the tree, summed over all sites
Two parsimony problems
Large parsimony problem: find the tree with the lowest parsimony score
Small parsimony problem: given a tree, find its parsimony score
Solutions to the small parsimony problem are used to score candidate trees when solving the large parsimony problem

12
Algorithms for Small Parsimony

Fitch's algorithm
Based on set operations
All evolutionary steps have the same weight
Sankoff's algorithm
Based on dynamic programming
Allows steps to have different weights
Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site
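A minimal Python sketch of Fitch's bottom-up pass for one site, assuming a rooted binary tree written as nested tuples (a leaf is a taxon name, an internal node is a pair). The names `fitch` and `fitch_score` are illustrative, not from the lecture. Each internal node gets the intersection of its children's state sets if it is non-empty, otherwise their union plus one step; summing the steps over sites gives the parsimony score.

```python
# Fitch's small-parsimony algorithm (unweighted steps), one site at a time.
# Assumed representation: a leaf is a taxon name (str); an internal node is a
# 2-tuple (left_subtree, right_subtree).


def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for `tree`.

    `states` maps each leaf name to its observed character at this site.
    """
    if isinstance(tree, str):                      # leaf: observed state, no cost
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                     # children can agree: no extra step
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # disagreement: one step


def fitch_score(tree, alignment):
    """Parsimony score of `tree`: Fitch changes summed over all aligned sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


tree = (("s1", "s2"), ("s3", "s4"))
alignment = {"s1": "AC", "s2": "GC", "s3": "CA", "s4": "TA"}
print(fitch_score(tree, alignment))  # 3 changes at site 1 + 1 at site 2 -> 4
```

The root set holds only candidate states; picking one ancestral state per node needs a second, top-down pass, which this sketch leaves out.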

13
Fitch's Algorithm Example
14
Sankoff's Algorithm

Allows for different weights for different
evolutionary steps
Transitions (A <-> G or C <-> T) are more probable than transversions, so give a lower weight to transitions
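A sketch of Sankoff's dynamic program in Python, assuming a nested-tuple tree (leaf = taxon name, internal node = pair) and hypothetical weights of 1 for a transition and 2 for a transversion (the slide says only that transitions get the lower weight). Each node stores, for every base, the cheapest cost of its subtree given that base; the illustrative `traceback` function then walks down from the cheapest root base to recover one most parsimonious set of ancestral states.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a, b, transition=1, transversion=2):
    """Hypothetical weights: 0 for no change; transitions cheaper than transversions."""
    if a == b:
        return 0
    is_transition = (a in PURINES) == (b in PURINES)  # A<->G or C<->T
    return transition if is_transition else transversion


def sankoff(tree, states, cost=step_cost):
    """Return {base: minimum weighted cost of the subtree if its root has `base`}.

    A leaf is a taxon name (str); an internal node is a pair (left, right).
    """
    if isinstance(tree, str):  # a leaf may only take its observed base
        return {b: 0 if b == states[tree] else math.inf for b in BASES}
    left = sankoff(tree[0], states, cost)
    right = sankoff(tree[1], states, cost)
    return {
        b: min(left[c] + cost(b, c) for c in BASES)
        + min(right[c] + cost(b, c) for c in BASES)
        for b in BASES
    }


def traceback(tree, states, base, cost=step_cost):
    """Give this node `base`, then choose each child's cheapest compatible state.

    Returns (internal node, base) pairs in preorder. Child tables are
    recomputed instead of cached to keep the sketch short.
    """
    if isinstance(tree, str):
        return []
    assignments = [(tree, base)]
    for child in tree:
        table = sankoff(child, states, cost)
        best = min(BASES, key=lambda c: table[c] + cost(base, c))
        assignments += traceback(child, states, best, cost)
    return assignments


tree = (("s1", "s2"), ("s3", "s4"))
site = {"s1": "A", "s2": "G", "s3": "C", "s4": "T"}
root = sankoff(tree, site)
print(min(root.values()))  # 4: two transitions (1 each) + one transversion (2)
print([b for _, b in traceback(tree, site, min(root, key=root.get))])  # ['A', 'A', 'C']
```

Ties are broken by base order, so other equally parsimonious reconstructions may exist.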

15
Sankoff's Algorithm Example
16
Sankoff's Algorithm Traceback
17
Searching for a Most Parsimonious Tree

Solving the large parsimony problem requires
searching all possible trees (or does it?)
Exhaustive search (exact)
Branch-and-Bound (exact)
Heuristic search methods (not exact)

18
Exhaustive Search

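Build the only possible unrooted tree for three taxa (the three taxa can be chosen arbitrarily)
Try all possible places to add the fourth taxon and score each tree
Try all places to add the fifth taxon to each of these trees and score again
Continue until all taxa have been added; the lowest-scoring trees are the most parsimonious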
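The exhaustive procedure can be sketched in Python, assuming an unrooted tree stored as an edge list whose leaves are taxon-name strings and whose internal nodes are integers (an illustrative format; all function names are hypothetical). Every tree is built by inserting each new taxon on every edge of every tree from the previous round, then scored with Fitch's algorithm rooted on an arbitrary leaf edge, which gives the same unweighted score as any other rooting.

```python
from collections import defaultdict


def add_taxon(edges, edge, taxon, new_node):
    """Insert `taxon` on `edge` by splitting it with a new internal node."""
    u, v = edge
    rest = [e for e in edges if e != edge]
    return rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def all_unrooted_trees(taxa):
    """Enumerate unrooted binary trees (edge lists) by stepwise insertion."""
    trees = [[(0, taxa[0]), (0, taxa[1]), (0, taxa[2])]]  # the only 3-taxon tree
    for k, taxon in enumerate(taxa[3:], start=1):
        trees = [add_taxon(t, e, taxon, k) for t in trees for e in t]
    return trees


def fitch_unrooted(edges, column):
    """Fitch changes at one site, rooting the tree on an arbitrary leaf edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def visit(node, parent):
        if isinstance(node, str):                        # leaf
            return {column[node]}, 0
        (a, ca), (b, cb) = [visit(n, node) for n in adj[node] if n != parent]
        common = a & b
        return (common, ca + cb) if common else (a | b, ca + cb + 1)

    root_leaf = next(n for n in adj if isinstance(n, str))
    (inner,) = adj[root_leaf]
    states, cost = visit(inner, root_leaf)
    return cost if column[root_leaf] in states else cost + 1


def exhaustive_search(alignment):
    """Score every tree; return (best score, list of all best trees)."""
    length = len(next(iter(alignment.values())))

    def score(edges):
        return sum(fitch_unrooted(edges, {t: s[i] for t, s in alignment.items()})
                   for i in range(length))

    scored = [(score(t), t) for t in all_unrooted_trees(list(alignment))]
    best = min(s for s, _ in scored)
    return best, [t for s, t in scored if s == best]


aln = {"A": "AA", "B": "AA", "C": "GG", "D": "GG"}
best, trees = exhaustive_search(aln)
print(best, len(trees))                          # 2 1  (the AB|CD tree)
print(len(all_unrooted_trees(list("ABCDEF"))))   # 105
```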

19
Why Finding a True Tree is Difficult
Number of rooted trees

The number of possible trees grows faster than exponentially with the number of species (or sequences) n
Rooted trees: $N_r = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Unrooted trees: $N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
To find the best tree, you must explore all possibilities (or must you?)
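A quick numerical check of the two counting formulas in Python (the function names are illustrative). Both are double factorials, $N_r = (2n-3)!!$ and $N_u = (2n-5)!!$, so the integer division is exact. The check $N_r(n) = N_u(n+1)$ reflects that a rooted tree on n taxa corresponds to an unrooted tree on n + 1 taxa, with the extra taxon marking the root.

```python
from math import factorial


def n_rooted(n):
    """Number of rooted, bifurcating, labeled trees on n >= 2 taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of unrooted, bifurcating, labeled trees on n >= 3 taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 10):
    print(n, n_rooted(n), n_unrooted(n))  # 4 15 3, then 10 34459425 2027025
```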

20
Adding the Fourth Taxon
21
Adding the Fifth Taxon
22
23
Branch and Bound

Similar to exhaustive search, except that we keep the score of the best complete tree obtained so far
If the score of the current (partial) tree exceeds the current best score, backtrack and take the next available path
Main idea: the parsimony score of a tree can only increase (never decrease) as we add another taxon

24
Branch and Bound

When a tip of the search tree is reached, the tree is either optimal (and retained) or suboptimal (and rejected)
When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most parsimonious trees will have been identified

25
Branch and Bound
26
Branch and Bound

One way to find a reasonable bound quickly
Use UPGMA or NJ to build a complete tree
Calculate the parsimony score of this tree and use it as the initial best score in our search (it is an upper bound: the optimal score can be no higher)
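A compact branch-and-bound sketch in Python, assuming an edge-list tree (taxa are strings, internal nodes are integers) scored with Fitch parsimony; the function names are hypothetical. Partial trees whose score already exceeds the best complete score are pruned, which is safe because adding taxa never lowers a parsimony score. The optional `bound` argument stands in for the score of a quickly built NJ or UPGMA tree; it must come from a real complete tree, because a bound below the optimum would prune every path.

```python
import math
from collections import defaultdict


def add_taxon(edges, edge, taxon, new_node):
    """Insert `taxon` on `edge` by splitting it with a new internal node."""
    u, v = edge
    rest = [e for e in edges if e != edge]
    return rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def fitch_unrooted(edges, column):
    """Fitch changes at one site, rooting the tree on an arbitrary leaf edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def visit(node, parent):
        if isinstance(node, str):
            return {column[node]}, 0
        (a, ca), (b, cb) = [visit(n, node) for n in adj[node] if n != parent]
        common = a & b
        return (common, ca + cb) if common else (a | b, ca + cb + 1)

    root_leaf = next(n for n in adj if isinstance(n, str))
    (inner,) = adj[root_leaf]
    states, cost = visit(inner, root_leaf)
    return cost if column[root_leaf] in states else cost + 1


def parsimony(edges, alignment):
    """Score summed over sites; taxa not yet in the tree are simply ignored."""
    length = len(next(iter(alignment.values())))
    return sum(fitch_unrooted(edges, {t: s[i] for t, s in alignment.items()})
               for i in range(length))


def branch_and_bound(alignment, bound=math.inf):
    """Return (best score, all most parsimonious trees) by branch and bound."""
    taxa = list(alignment)
    best = {"score": bound, "trees": []}

    def extend(edges, k):
        score = parsimony(edges, alignment)
        if score > best["score"]:
            return                       # prune: more taxa can only add steps
        if k == len(taxa):               # tip of the search tree: complete tree
            if score < best["score"]:
                best["score"], best["trees"] = score, [edges]
            else:
                best["trees"].append(edges)
            return
        for edge in edges:               # branch: every place to add taxon k
            extend(add_taxon(edges, edge, taxa[k], k), k + 1)

    extend([(0, taxa[0]), (0, taxa[1]), (0, taxa[2])], 3)
    return best["score"], best["trees"]


aln = {"A": "AAA", "B": "AAA", "C": "GGA", "D": "GGC", "E": "GGC"}
score, trees = branch_and_bound(aln)
print(score, len(trees))  # 3 1
```

Pruning uses `>` rather than `>=` so that ties are kept and all most parsimonious trees are reported.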

27
Heuristic Search

Shortcuts have been designed to reduce the search space
Idea: build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees
Branch swapping
Nearest neighbor interchange
Subtree pruning and regrafting
Tree bisection and reconnection
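A Python sketch of the first rearrangement, nearest neighbor interchange, plus a greedy search that keeps moving to the best-scoring neighbor. It assumes an edge-list tree (taxa are strings, internal nodes are integers) and takes the scoring function as a parameter, so a parsimony or likelihood score could be plugged in; `nni_neighbors` and `hill_climb` are illustrative names. SPR and TBR follow the same pattern with larger moves and are not shown.

```python
from collections import defaultdict


def nni_neighbors(edges):
    """All trees one nearest-neighbor interchange away from `edges`.

    Each internal edge (u, v) separates four subtrees a, b | c, d; swapping
    b with c, or b with d, gives the two NNI rearrangements across that edge.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    neighbors = []
    for u, v in edges:
        if isinstance(u, str) or isinstance(v, str):
            continue                                   # skip terminal branches
        _, b = [n for n in adj[u] if n != v]
        for x in (n for n in adj[v] if n != u):        # x is c, then d
            drop = {frozenset((u, b)), frozenset((v, x))}
            kept = [e for e in edges if frozenset(e) not in drop]
            neighbors.append(kept + [(u, x), (v, b)])
    return neighbors


def hill_climb(tree, score_fn):
    """Greedy NNI search: move to the best neighbor while the score improves."""
    current, current_score = tree, score_fn(tree)
    while True:
        candidates = [(score_fn(t), t) for t in nni_neighbors(current)]
        if not candidates:
            return current, current_score
        best_score, best_tree = min(candidates, key=lambda pair: pair[0])
        if best_score >= current_score:
            return current, current_score              # local optimum reached
        current, current_score = best_tree, best_score


start = [(0, "A"), (0, "B"), (0, 1), (1, "C"), (1, "D")]   # AB|CD
print(len(nni_neighbors(start)))  # 2: the other two 4-taxon topologies
```

Like any greedy search, `hill_climb` can stop at a local optimum, which is why starting trees and move types matter.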

28
Nearest-Neighbor Interchange
29
Subtree Pruning and Regrafting
30
Tree Bisection and Reconnection
31
Stepwise Addition: Another Heuristic

A greedy method
Start with a 3-taxon tree
Add one taxon at a time, trying every possible position
Keep only the best tree found so far
No guarantee of optimality, but may provide a good starting point for a search
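A greedy stepwise-addition sketch in Python, assuming an edge-list tree (taxa are strings, internal nodes are integers) and a caller-supplied score to minimize, such as a parsimony score. The names are hypothetical, and the toy `cd_cherry_score` below is purely illustrative; it only makes the example runnable.

```python
def add_taxon(edges, edge, taxon, new_node):
    """Insert `taxon` on `edge` by splitting it with a new internal node."""
    u, v = edge
    rest = [e for e in edges if e != edge]
    return rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def stepwise_addition(taxa, score_fn):
    """Greedy stepwise addition: keep only the best tree after each insertion.

    `score_fn(edges)` is any score to minimize, e.g. a parsimony score that
    ignores taxa not yet placed. Ties go to the first edge tried.
    """
    tree = [(0, taxa[0]), (0, taxa[1]), (0, taxa[2])]
    for k, taxon in enumerate(taxa[3:], start=1):
        candidates = [add_taxon(tree, edge, taxon, k) for edge in tree]
        tree = min(candidates, key=score_fn)
    return tree


def cd_cherry_score(edges):
    """Toy score for illustration only: 0 if C and D share a parent, else 1."""
    parents = {v: u for u, v in edges if isinstance(v, str)}
    return 0 if "C" in parents and parents.get("C") == parents.get("D") else 1


print(stepwise_addition(["A", "B", "C", "D"], cd_cherry_score))
# [(0, 'A'), (0, 'B'), (0, 1), (1, 'C'), (1, 'D')]
```

Because only one tree survives each round, the result depends on the order in which taxa are added.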

32
Maximum Likelihood Method

ML is based on a Markov model of evolution
Observed: the species (sequences) labeling the leaves
Hidden: the ancestral states
Transition probabilities: the mutation probabilities
Assumptions
Only substitutions are allowed (no insertions or deletions)
Sites evolve independently

33
Models of Evolution at a Site

Transition probability matrix
$M = (m_{ij}),\quad i, j \in \{A, C, T, G\}$
where
$m_{ij} = \Pr(i \to j \text{ mutation in 1 time unit})$
Branches may have different lengths
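Assuming branch lengths are whole numbers of the slide's time units, the probability of an i -> j change along a branch of length t is the (i, j) entry of $M^t$. A small Python sketch; the uniform matrix (0.97 to stay, 0.01 to each other base) is a hypothetical Jukes-Cantor-style example, not a fitted model, and the function names are illustrative.

```python
BASES = "ACGT"


def uniform_model(p_change):
    """Hypothetical M: each base mutates to each other base with prob p_change per unit."""
    return [[1 - 3 * p_change if i == j else p_change for j in range(4)] for i in range(4)]


def mat_mult(x, y):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def branch_matrix(m, length):
    """M ** length: entry [i][j] = Prob(i -> j over `length` time units)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(length):
        result = mat_mult(result, m)
    return result


m = uniform_model(0.01)
p2 = branch_matrix(m, 2)
a, g = BASES.index("A"), BASES.index("G")
print(round(p2[a][a], 4), round(p2[a][g], 4))  # 0.9412 0.0196
```

Longer branches give the matrix more time to mix, so off-diagonal entries grow with branch length.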

34
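The Probability of an Assignment
[Figure: tree with root T; internal node G above leaves A and G; internal node T above leaves C and T]
$\text{Probability} = m_{TG}\, m_{GA}\, m_{GG}\, m_{TT}\, m_{TC}\, m_{TT}$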
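The product above can be computed for any fully labeled tree by multiplying $m_{\text{parent},\text{child}}$ over all branches. A Python sketch with a hypothetical matrix (0.97 to stay, 0.01 to each other base) and an illustrative `(state, [children])` node format; like the slide, it attaches no probability to the root state.

```python
BASES = "ACGT"
# Hypothetical branch matrix: 0.97 to stay the same, 0.01 to each other base.
M = {a: {b: 0.97 if a == b else 0.01 for b in BASES} for a in BASES}


def assignment_probability(node, m):
    """Product of m[parent][child] over every branch of a fully labeled tree.

    A node is (state, [children]); leaves have an empty child list.
    """
    state, children = node
    p = 1.0
    for child in children:
        p *= m[state][child[0]] * assignment_probability(child, m)
    return p


# The slide's tree: root T; internal G over leaves A, G; internal T over C, T.
tree = ("T", [("G", [("A", []), ("G", [])]), ("T", [("C", []), ("T", [])])])
print(assignment_probability(tree, M))  # equals 0.01**3 * 0.97**3 up to rounding
```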
35
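Ancestral Reconstruction: Most Likely Assignment
[Figure: tree with root X; internal node Y above leaves A and G; internal node Z above leaves C and T]
$L = \max_{X,Y,Z}\, m_{XY}\, m_{YA}\, m_{YG}\, m_{XZ}\, m_{ZC}\, m_{ZT}$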
Compute using Viterbi algorithm
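Instead of trying all $4^3$ assignments of X, Y, Z, a Viterbi-style max-product recursion works upward from the leaves, keeping for each node and each base the best probability of its subtree; a traceback then recovers the maximizing states. A Python sketch with a hypothetical matrix; the tree format and function names are illustrative.

```python
import math

BASES = "ACGT"
# Hypothetical branch matrix: 0.97 to stay the same, 0.01 to each other base.
M = {a: {b: 0.97 if a == b else 0.01 for b in BASES} for a in BASES}


def max_table(tree, leaves, m):
    """{s: highest probability of the subtree below a node in state s}.

    A leaf is a taxon name looked up in `leaves`; an internal node is a pair.
    """
    if isinstance(tree, str):
        return {s: 1.0 if s == leaves[tree] else 0.0 for s in BASES}
    child_tables = [max_table(child, leaves, m) for child in tree]
    return {
        s: math.prod(max(m[s][c] * t[c] for c in BASES) for t in child_tables)
        for s in BASES
    }


def most_likely_assignment(tree, leaves, m):
    """Return (max probability, internal-node states in preorder)."""
    root_table = max_table(tree, leaves, m)
    root_state = max(BASES, key=root_table.get)

    def trace(node, state):
        if isinstance(node, str):
            return []
        states = [state]
        for child in node:
            t = max_table(child, leaves, m)
            states += trace(child, max(BASES, key=lambda c: m[state][c] * t[c]))
        return states

    return root_table[root_state], trace(tree, root_state)


tree = (("s1", "s2"), ("s3", "s4"))
leaves = {"s1": "A", "s2": "G", "s3": "C", "s4": "T"}
print(most_likely_assignment(tree, leaves, M))
```

As on the slide, the root state carries no prior; with ties, several assignments can reach the same maximum.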
36
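Likelihood of a Tree
[Figure: tree with root X; internal node Y above leaves A and G; internal node Z above leaves C and T]
$L = \sum_{X,Y,Z} m_{XY}\, m_{YA}\, m_{YG}\, m_{XZ}\, m_{ZC}\, m_{ZT}$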
Compute using forward algorithm
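Replacing max with sum gives Felsenstein's pruning (sum-product) recursion, the tree counterpart of the forward algorithm. A Python sketch with a hypothetical matrix and illustrative function names; as on the slide, root states are summed without a prior (standard formulations weight each root state by its base frequency). Because sites are assumed independent, log-likelihoods add across alignment columns.

```python
import math

BASES = "ACGT"
# Hypothetical branch matrix: 0.97 to stay the same, 0.01 to each other base.
M = {a: {b: 0.97 if a == b else 0.01 for b in BASES} for a in BASES}


def partial_likelihoods(tree, leaves, m):
    """{s: P(observed leaves below this node | node is in state s)} for one site.

    A leaf is a taxon name looked up in `leaves`; an internal node is a pair.
    """
    if isinstance(tree, str):
        return {s: 1.0 if s == leaves[tree] else 0.0 for s in BASES}
    child_tables = [partial_likelihoods(child, leaves, m) for child in tree]
    return {
        s: math.prod(sum(m[s][c] * t[c] for c in BASES) for t in child_tables)
        for s in BASES
    }


def site_likelihood(tree, leaves, m):
    """Sum over root states with no root prior, matching the slide's formula."""
    return sum(partial_likelihoods(tree, leaves, m).values())


def log_likelihood(tree, alignment, m):
    """Sites are assumed independent, so per-site log-likelihoods add up."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {t: s[i] for t, s in alignment.items()}, m))
        for i in range(length)
    )


tree = (("s1", "s2"), ("s3", "s4"))
print(site_likelihood(tree, {"s1": "A", "s2": "G", "s3": "C", "s4": "T"}, M))
print(log_likelihood(tree, {"s1": "AA", "s2": "GA", "s3": "CA", "s4": "TA"}, M))
```

Finding the ML tree still means searching over topologies (and branch lengths), which is where the cost of ML comes from.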
37
Maximum Likelihood Comments

ML is robust
ML is statistically consistent: if the model of evolution is correct, it converges to the correct tree as more data (sites) are added
Can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
ML can be slow

38
Phylogenetic Tree Evaluation

Bootstrapping
Jackknifing
Bayesian Simulation
Statistical difference tests (are two trees
significantly different?)
Kishino-Hasegawa Test (paired t-test)
Shimodaira-Hasegawa Test (χ2 test)
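A sketch of the Kishino-Hasegawa statistic as a paired t-test on per-site log-likelihood differences, assuming the per-site log-likelihoods of two trees have already been computed; the numbers below are made up and `kh_statistic` is an illustrative name. The p-value would come from a t distribution with n - 1 degrees of freedom, which this sketch does not compute. The KH test is meant for trees specified in advance; comparing the ML tree against others is the case the Shimodaira-Hasegawa test was designed to handle.

```python
import math
import statistics


def kh_statistic(site_lnl_1, site_lnl_2):
    """Paired t statistic on per-site log-likelihood differences (tree 1 - tree 2)."""
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Made-up per-site log-likelihoods, for illustration only.
tree1 = [-1.0, -2.0, -3.0]
tree2 = [-2.0, -4.0, -6.0]
print(round(kh_statistic(tree1, tree2), 4))  # 3.4641; positive values favor tree 1
```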

39
Bootstrapping

A bootstrap sample is obtained by sampling sites (alignment columns) randomly with replacement
Obtain a data matrix with the same number of taxa and number of characters as the original one
Construct a tree for each sample
For each branch in the original tree, compute the fraction of bootstrap trees in which that branch appears
This assigns a bootstrap support value to each branch
Idea: if a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples
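The bootstrap loop can be sketched in Python as below. `build_splits` is a hypothetical stand-in for whatever tree-building method is used; it is assumed to return the tree's internal branches as a set of frozensets (the taxa on one side of each branch, chosen consistently across calls). The default of 100 replicates only keeps the example fast; this lecture recommends 500-1000 for reliable values.

```python
import random


def bootstrap_sample(alignment, rng):
    """Resample alignment columns with replacement: same taxa, same length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def bootstrap_support(alignment, build_splits, replicates=100, seed=0):
    """Fraction of bootstrap trees that contain each branch of the original tree."""
    rng = random.Random(seed)
    original = build_splits(alignment)         # branches of the original tree
    counts = dict.fromkeys(original, 0)
    for _ in range(replicates):
        found = build_splits(bootstrap_sample(alignment, rng))
        for split in original:
            if split in found:
                counts[split] += 1
    return {split: n / replicates for split, n in counts.items()}


aln = {"A": "AAGT", "B": "AAGT", "C": "GGCT", "D": "GGCA"}
print(bootstrap_sample(aln, random.Random(1)))
```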

40
Bootstrapping Comments

Bootstrapping doesn't really assess the accuracy of a tree; it only indicates the consistency of the data
To get reliable statistics, bootstrapping needs to be done on your tree 500-1000 times; this is a big problem if your tree took a few days to construct

41
Jackknifing

Another resampling technique
Randomly delete half of the sites in the dataset
Construct a new tree with this smaller dataset and see how often taxa are grouped
Advantage: sites aren't duplicated
Disadvantage: again, really only measures the consistency of the data
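A jackknife sketch in Python: each replicate keeps a random half of the sites without replacement (in their original order) and records how often each grouping of the original tree reappears. `build_splits` is a hypothetical stand-in for whatever tree-building method is used; it is assumed to return each internal branch as a frozenset of taxa, chosen consistently across calls.

```python
import random


def jackknife_sample(alignment, rng, fraction=0.5):
    """Keep a random `fraction` of sites, without replacement, in original order."""
    length = len(next(iter(alignment.values())))
    keep = sorted(rng.sample(range(length), int(length * fraction)))
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


def jackknife_support(alignment, build_splits, replicates=100, seed=0, fraction=0.5):
    """Fraction of jackknife trees that contain each branch of the original tree."""
    rng = random.Random(seed)
    original = build_splits(alignment)
    counts = dict.fromkeys(original, 0)
    for _ in range(replicates):
        found = build_splits(jackknife_sample(alignment, rng, fraction))
        for split in original:
            if split in found:
                counts[split] += 1
    return {split: n / replicates for split, n in counts.items()}


aln = {"A": "AAGTAC", "B": "AAGTAC", "C": "GGCTAC", "D": "GGCAAT"}
print(jackknife_sample(aln, random.Random(3)))
```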

42
Bayesian Simulation

Producing a tree with a Bayesian (likelihood-based) method automatically calculates the posterior probability of many trees during the search
Most trees sampled in the Bayesian search are near an optimal tree

43
Phylogenetic Programs

Huge list at
http://evolution.genetics.washington.edu/phylip/software.html
PAUP: one of the most popular programs; commercial, Mac and Unix only, nice user interface
PHYLIP: free, multiplatform, a bit difficult to use, but web servers make it easier
WebPhylip: another interface for PHYLIP online

44
Phylogenetic Programs

TREE-PUZZLE: uses a heuristic to allow ML on large datasets; also available as a web server
PHYML: web based; ML program that uses a fast hill-climbing search
MrBayes: Bayesian program, fast and can handle large datasets; multiplatform download
BAMBE: web-based Bayesian program

45
Final Comments on Phylogenetics

No method is perfect
Different methods make very different assumptions
If multiple methods using different assumptions come up with similar results, we should trust those results more than the results of any single method
