# Phylogenetic Trees Lecture 3 - PowerPoint PPT Presentation


Slides: 22
Provided by: aleph0
Transcript and Presenter's Notes


1
Phylogenetic Trees, Lecture 3
Based on Durbin et al., Chapter 8
2
Phylogenetic Tree Assumptions
• Topology $T$: bifurcating
• Leaves: $1, \dots, N$
• Internal nodes: $N+1, \dots, 2N-2$
• Lengths $t = \{t_i\}$ for each branch
• Phylogenetic tree = (Topology, Lengths) = $(T, t)$
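A minimal sketch of how the pair $(T, t)$ could be stored. With $N$ leaves and the $N-2$ internal nodes $N+1, \dots, 2N-2$ listed above, the tree is an unrooted bifurcating tree, so it has $2N-3$ branches, every leaf has degree 1 and every internal node has degree 3. The class name `PhyloTree` and its methods are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhyloTree:
    """Hypothetical container for a phylogenetic tree (T, t).

    Leaves are numbered 1..N and internal nodes N+1..2N-2 (unrooted,
    bifurcating). Each undirected branch maps to its length t_i.
    """
    n_leaves: int
    edges: dict = field(default_factory=dict)

    def add_branch(self, u: int, v: int, length: float) -> None:
        # Store each undirected branch once, keyed by its sorted endpoints.
        self.edges[(min(u, v), max(u, v))] = length

    def degree(self, node: int) -> int:
        return sum(node in edge for edge in self.edges)

    def is_valid(self) -> bool:
        n = self.n_leaves
        leaves = range(1, n + 1)
        internal = range(n + 1, 2 * n - 1)          # N+1 .. 2N-2
        return (
            len(self.edges) == 2 * n - 3            # branches of an unrooted binary tree
            and all(self.degree(k) == 1 for k in leaves)
            and all(self.degree(k) == 3 for k in internal)
            and all(length >= 0 for length in self.edges.values())
        )


# Example: 4 leaves, internal nodes 5 and 6, topology ((1,2),(3,4)).
tree = PhyloTree(n_leaves=4)
for u, v, length in [(1, 5, 0.1), (2, 5, 0.2), (5, 6, 0.05), (3, 6, 0.1), (4, 6, 0.3)]:
    tree.add_branch(u, v, length)
print(tree.is_valid())  # True
```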

3
Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic
process.
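[Figure: a tree whose internal nodes carry unobserved ancestral sequences and whose leaves carry the observed sequences (three-letter sequences such as AAA, AGA, GGA, AAG).]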
The probability of a transition from character $a$ to character $b$ is given by the parameters $\theta_{ba}$. The probability of letter $a$ at the root is $q_a$. These parameters are defined via rates of change per unit time multiplied by the elapsed time (the branch length). Given the complete tree, the probability of the data is determined by the values of the $\theta_{ba}$'s and the $q_a$'s.
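The generative view can be sketched as sampling: draw the root letter from $q$, then draw each child's letter from $\theta$ given its parent's letter, independently at every site. This sketch assumes that $\theta$ is given by Jukes-Cantor probabilities with rate $\alpha = 1$ and that $q$ is uniform; the tree layout and the helper names `jc_prob` and `simulate` are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    """Assumed theta_{ba}: Jukes-Cantor probability that a becomes b in time t."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def simulate(tree: dict, root: str, q: dict, n_sites: int, rng: random.Random) -> dict:
    """Sample a sequence at every node, top-down from the root.

    tree maps each internal node to a list of (child, branch_length) pairs;
    q maps each letter a to its root probability q_a.
    """
    root_weights = [q[a] for a in ALPHABET]
    seqs = {root: [rng.choices(ALPHABET, weights=root_weights)[0] for _ in range(n_sites)]}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            # Each site mutates independently along the branch of length t.
            seqs[child] = [
                rng.choices(ALPHABET, weights=[jc_prob(a, b, t) for b in ALPHABET])[0]
                for a in seqs[parent]
            ]
            stack.append(child)
    return {node: "".join(s) for node, s in seqs.items()}


tree = {
    "root": [("u", 0.1), ("v", 0.2)],
    "u": [("x1", 0.1), ("x2", 0.1)],
    "v": [("x3", 0.05), ("x4", 0.3)],
}
q = {a: 0.25 for a in ALPHABET}
sequences = simulate(tree, "root", q, n_sites=3, rng=random.Random(0))
leaves = {k: v for k, v in sequences.items() if k.startswith("x")}  # the observed part
```

Only the leaf sequences would be observed; `root`, `u` and `v` play the role of the unobserved ancestors.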
4
Maximum Likelihood Approach
Assume each site evolves independently of the others:
$\Pr(D \mid \text{Tree}, \theta) = \prod_i \Pr(D^{(i)} \mid \text{Tree}, \theta)$
Write down the likelihood of the data (the leaf sequences) given each tree. When the tree is not given, search for the tree that maximizes this product.
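Site independence turns the likelihood into a product over sites, which in practice is evaluated as a sum of logarithms because a product of many small per-site probabilities underflows. A minimal sketch for the simplest tree, two aligned sequences joined by one branch of length $t$, assuming Jukes-Cantor probabilities with $\alpha = 1$ and uniform $q$ (function names are hypothetical):

```python
import math


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    """Jukes-Cantor probability that letter a becomes b after time t."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def site_likelihood(a: str, b: str, t: float) -> float:
    # Pr(D(i) | Tree): root letter a drawn with q_a = 1/4, then a -> b along t.
    return 0.25 * jc_prob(a, b, t)


def log_likelihood(x: str, y: str, t: float) -> float:
    """log Pr(D | Tree) = sum_i log Pr(D(i) | Tree), by site independence."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned")
    return sum(math.log(site_likelihood(a, b, t)) for a, b in zip(x, y))


x, y = "ACGTAC" * 400, "ACGTTC" * 400          # 2400 aligned sites
direct = math.prod(site_likelihood(a, b, 0.1) for a, b in zip(x, y))
print(direct)                     # 0.0: the raw product underflows
print(log_likelihood(x, y, 0.1))  # finite log-likelihood
```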
5
Probabilistic Methods
• The phylogenetic tree represents a generative
probabilistic model (like HMMs) for the observed
sequences.
• Background probabilities $q(a)$
• Mutation probabilities $P(a \mid b, t)$
• Models for evolutionary mutations:
  • Jukes-Cantor
  • Kimura 2-parameter model
• Such models are used to derive the mutation probabilities $P(a \mid b, t)$.

6
Jukes Cantor model
• A model for mutation rates
• Mutation occurs at a constant rate
• Each nucleotide is equally likely to mutate into any other nucleotide, with rate $\alpha$.

7
The Jukes-Cantor model (1969)
We need to develop a formula for DNA evolution via $\Pr(y \mid x, t)$, where $x$ and $y$ are taken from $\{A, C, G, T\}$ and $t$ is the time length. Jukes-Cantor assumes an equal rate of change between any two distinct nucleotides.
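With the nucleotides ordered A, C, G, T, equal rates of change can be written as a rate matrix $R$ whose off-diagonal entries all equal $\alpha$; each diagonal entry is $-3\alpha$ so that every row sums to zero. This is the standard Jukes-Cantor rate matrix, given here as a sketch in the notation of these slides:

```latex
R =
\begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
```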
8
The Jukes-Cantor model (Cont.)
We denote by $S(t)$ the matrix of transition probabilities, with entries $S(t)_{xy} = \Pr(y \mid x, t)$. We assume the matrix is multiplicative in the sense that $S(t+s) = S(t)\,S(s)$ for any time lengths $s$ and $t$.
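Multiplicativity can be checked numerically. This sketch assumes $S(t) = \exp(tR)$ with $R$ the Jukes-Cantor rate matrix ($-3\alpha$ on the diagonal, $\alpha$ elsewhere), which is the solution of $S'(t) = S(t)R$ with $S(0) = I$, and evaluates the exponential with a truncated Taylor series in plain Python. The helper names are hypothetical, and the series is only adequate for small matrices with moderate $tR$.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def jc_rate_matrix(alpha):
    # Off-diagonal rate alpha, diagonal -3*alpha: each row sums to zero.
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def expm_series(M, terms=60):
    """exp(M) = sum_k M^k / k!, truncated after `terms` terms."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]   # M^k / k!
        result = [[r + s for r, s in zip(r_row, s_row)] for r_row, s_row in zip(result, term)]
    return result


def S(t, alpha=1.0):
    """Transition matrix S(t) = exp(tR); row x holds Pr(y | x, t)."""
    return expm_series([[t * x for x in row] for row in jc_rate_matrix(alpha)])


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


print(max_abs_diff(S(0.5), mat_mul(S(0.2), S(0.3))))  # close to 0: S(t+s) = S(t)S(s)
```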
9
The Jukes-Cantor model (Cont.)
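For a short time period $\epsilon$, we write $S(\epsilon) \approx I + R\epsilon$, where $R$ is the matrix of substitution rates.
By multiplicativity, $S(t+\epsilon) = S(t)\,S(\epsilon) \approx S(t)(I + R\epsilon)$.
Hence $\dfrac{S(t+\epsilon) - S(t)}{\epsilon} \approx S(t)\,R$, and letting $\epsilon \to 0$ gives the differential equation $S'(t) = S(t)\,R$.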
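Read backwards, the same approximation is a numerical scheme: repeatedly applying $S(t+\epsilon) \approx S(t)(I + R\epsilon)$ from $S(0) = I$ is forward Euler integration of $S'(t) = S(t)R$. A sketch with the Jukes-Cantor rate matrix, compared against the closed-form entries $(1 + 3e^{-4\alpha t})/4$ and $(1 - e^{-4\alpha t})/4$ of the standard Jukes-Cantor solution; the step count and function names are assumptions.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def jc_rate_matrix(alpha):
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def integrate_S(t, alpha=1.0, steps=2000):
    """Forward Euler: S <- S + eps * S R, repeated `steps` times from S(0) = I."""
    R = jc_rate_matrix(alpha)
    eps = t / steps
    S = [[float(i == j) for j in range(4)] for i in range(4)]
    for _ in range(steps):
        SR = mat_mul(S, R)                                   # S(t) R approximates dS/dt
        S = [[s + eps * d for s, d in zip(s_row, d_row)] for s_row, d_row in zip(S, SR)]
    return S


t, alpha = 0.5, 1.0
S_num = integrate_S(t, alpha)
r_exact = 0.25 * (1 + 3 * math.exp(-4 * alpha * t))
s_exact = 0.25 * (1 - math.exp(-4 * alpha * t))
print(abs(S_num[0][0] - r_exact), abs(S_num[0][1] - s_exact))  # small; shrinks as steps grows
```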
10
The Jukes-Cantor model (Cont.)
Substituting $S(t)$ into the differential equation yields a system of linear differential equations for its entries. With $S(0) = I$, this system has a unique solution, which is known as the Jukes-Cantor model.
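By symmetry, $S(t)$ has a single value $r(t)$ on the diagonal and a single value $s(t)$ off the diagonal. A sketch of the substitution, assuming the Jukes-Cantor rate matrix $R$ with $-3\alpha$ on the diagonal and $\alpha$ elsewhere: $S'(t) = S(t)R$ gives the two equations below with $r(0) = 1$, $s(0) = 0$, and since $r(t) + 3s(t) = 1$ the first reduces to $r'(t) = \alpha - 4\alpha\, r(t)$, whose solution is the last line.

```latex
S(t) =
\begin{pmatrix}
r(t) & s(t) & s(t) & s(t) \\
s(t) & r(t) & s(t) & s(t) \\
s(t) & s(t) & r(t) & s(t) \\
s(t) & s(t) & s(t) & r(t)
\end{pmatrix}

r'(t) = -3\alpha\, r(t) + 3\alpha\, s(t), \qquad s'(t) = \alpha\, r(t) - \alpha\, s(t)

r(t) = \tfrac{1}{4}\left(1 + 3e^{-4\alpha t}\right), \qquad
s(t) = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)
```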
11
Kimura 2-parameter model
• Allows a different rate for transitions and
transversions.

12
Kimura's K2P model (1980)
The Jukes-Cantor model does not take into account that transition rates (between purines, A↔G, and between pyrimidines, C↔T) differ from transversion rates (A↔C, A↔T, C↔G, G↔T). Kimura used a different rate matrix.
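A sketch of that rate matrix, assuming $\alpha$ denotes the transition rate and $\beta$ the transversion rate, with the nucleotides ordered A, G, C, T so that the purines come first; each diagonal entry makes its row sum to zero, and setting $\alpha = \beta$ recovers the Jukes-Cantor matrix.

```latex
R =
\begin{pmatrix}
-(\alpha + 2\beta) & \alpha & \beta & \beta \\
\alpha & -(\alpha + 2\beta) & \beta & \beta \\
\beta & \beta & -(\alpha + 2\beta) & \alpha \\
\beta & \beta & \alpha & -(\alpha + 2\beta)
\end{pmatrix}
```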
13
Kimura's K2P model (Cont.)
Where
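The K2P transition probabilities have a closed form with three distinct values: $r(t)$ for no change, $u(t)$ for a transition and $s(t)$ for each of the two transversions. A sketch of the standard closed form, assuming $\alpha$ is the transition rate and $\beta$ the transversion rate; the names `k2p_prob` and `is_transition` are illustrative.

```python
import math

PURINES = {"A", "G"}


def is_transition(a: str, b: str) -> bool:
    # A<->G or C<->T: both purines or both pyrimidines.
    return a != b and ((a in PURINES) == (b in PURINES))


def k2p_prob(a: str, b: str, t: float, alpha: float, beta: float) -> float:
    """Kimura 2-parameter probability that a becomes b in time t."""
    e_tv = math.exp(-4 * beta * t)
    e_ts = math.exp(-2 * (alpha + beta) * t)
    if a == b:
        return 0.25 * (1 + e_tv + 2 * e_ts)     # r(t)
    if is_transition(a, b):
        return 0.25 * (1 + e_tv - 2 * e_ts)     # u(t)
    return 0.25 * (1 - e_tv)                    # s(t), per transversion


for b in "ACGT":
    print(b, round(k2p_prob("A", b, t=0.1, alpha=2.0, beta=0.5), 4))
```

With $\alpha > \beta$ the transition A→G comes out more likely than either transversion from A, which is exactly the asymmetry Jukes-Cantor cannot express.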
14
Mutation Probabilities
• Both models satisfy the following properties:
• Lack of memory
• Reversibility
• There exist stationary probabilities $p_a$ such that $\sum_a p_a\, P(b \mid a, t) = p_b$
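The three properties can be written as equations and checked numerically. Lack of memory: $P(c \mid a, t+s) = \sum_b P(b \mid a, t)\,P(c \mid b, s)$. Reversibility: $p_a P(b \mid a, t) = p_b P(a \mid b, t)$. Stationarity: $\sum_a p_a P(b \mid a, t) = p_b$. A sketch for the Jukes-Cantor model, whose stationary distribution is uniform; the helper names are hypothetical.

```python
import math
from itertools import product

ALPHABET = "ACGT"
p = {a: 0.25 for a in ALPHABET}   # Jukes-Cantor stationary probabilities (uniform)


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    """P(b | a, t): probability that a becomes b in time t."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def lack_of_memory(t: float, s: float, tol: float = 1e-12) -> bool:
    return all(
        abs(jc_prob(a, c, t + s) - sum(jc_prob(a, b, t) * jc_prob(b, c, s) for b in ALPHABET)) < tol
        for a, c in product(ALPHABET, repeat=2)
    )


def reversible(t: float, tol: float = 1e-12) -> bool:
    return all(
        abs(p[a] * jc_prob(a, b, t) - p[b] * jc_prob(b, a, t)) < tol
        for a, b in product(ALPHABET, repeat=2)
    )


def stationary(t: float, tol: float = 1e-12) -> bool:
    return all(
        abs(sum(p[a] * jc_prob(a, b, t) for a in ALPHABET) - p[b]) < tol
        for b in ALPHABET
    )


print(lack_of_memory(0.2, 0.3), reversible(0.5), stationary(0.5))  # True True True
```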

15
Probabilistic Approach
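• Given $P$, $q$, the tree topology and the branch lengths, we can compute the probability of any assignment of letters to the nodes. For the tree below:
[Figure: root $x_5$ with children $x_4$ (branch $t_4$) and $x_3$ (branch $t_3$); $x_4$ has children $x_1$ (branch $t_1$) and $x_2$ (branch $t_2$).]
$P(x_1, \dots, x_5 \mid T, t) = q_{x_5}\, P(x_4 \mid x_5, t_4)\, P(x_3 \mid x_5, t_3)\, P(x_1 \mid x_4, t_1)\, P(x_2 \mid x_4, t_2)$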
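For a tree with root $x_5$, children $x_4$ (branch $t_4$) and $x_3$ (branch $t_3$), and $x_4$ the parent of $x_1$ (branch $t_1$) and $x_2$ (branch $t_2$), the complete-data probability factorizes as $q_{x_5} P(x_4 \mid x_5, t_4) P(x_3 \mid x_5, t_3) P(x_1 \mid x_4, t_1) P(x_2 \mid x_4, t_2)$. A sketch of that factorization, assuming Jukes-Cantor probabilities with $\alpha = 1$ and uniform $q$ (function names hypothetical). As a sanity check, summing over all $4^5$ assignments must give 1.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    """Assumed mutation model: P(b | a, t) under Jukes-Cantor."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def joint_prob(x1, x2, x3, x4, x5, t1, t2, t3, t4, q=None):
    """P(x1..x5 | T, t): root x5 has children x4 (t4) and x3 (t3);
    x4 has children x1 (t1) and x2 (t2)."""
    q = q or {a: 0.25 for a in ALPHABET}
    return (q[x5]
            * jc_prob(x5, x4, t4) * jc_prob(x5, x3, t3)
            * jc_prob(x4, x1, t1) * jc_prob(x4, x2, t2))


branch = dict(t1=0.1, t2=0.2, t3=0.3, t4=0.05)
total = sum(joint_prob(*letters, **branch) for letters in product(ALPHABET, repeat=5))
print(total)  # 1.0 up to rounding
```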
16
1. Calculate the likelihood of each site on a specific tree.
2. Sum the log-likelihood values of all sites on the tree (equivalently, multiply the site likelihoods).
3. Compare this value across all possible trees.
4. Choose the tree with the highest value.
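The four steps can be sketched for four leaves, which have exactly three unrooted topologies (the splits 12|34, 13|24 and 14|23). This sketch assumes Jukes-Cantor probabilities with $\alpha = 1$, uniform $q$, and every branch fixed at length 0.1 rather than optimized; the sequences and helper names are made up for illustration. Each site likelihood is computed by summing over internal letters recursively.

```python
import math

ALPHABET = "ACGT"


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def partial(node):
    """{a: P(leaves below node | node = a)}; leaf = letter, internal = ((child, t), (child, t))."""
    if isinstance(node, str):
        return {a: float(a == node) for a in ALPHABET}
    (ci, ti), (cj, tj) = node
    Li, Lj = partial(ci), partial(cj)
    return {a: sum(jc_prob(a, b, ti) * Li[b] for b in ALPHABET)
               * sum(jc_prob(a, c, tj) * Lj[c] for c in ALPHABET)
            for a in ALPHABET}


# The three unrooted topologies of four leaves, written as splits ij|km.
SPLITS = {"12|34": (0, 1, 2, 3), "13|24": (0, 2, 1, 3), "14|23": (0, 3, 1, 2)}


def log_likelihood(seqs, split, t=0.1):
    i, j, k, m = split
    total = 0.0
    for site in zip(*seqs):
        # Step 1: likelihood of one site. The tree is rooted halfway along the
        # central branch, which does not change the value for a reversible model.
        left, right = ((site[i], t), (site[j], t)), ((site[k], t), (site[m], t))
        L = partial(((left, t / 2), (right, t / 2)))
        total += math.log(sum(0.25 * L[a] for a in ALPHABET))
    return total                                   # Step 2: sum of log values


def best_topology(seqs, t=0.1):
    scores = {name: log_likelihood(seqs, split, t) for name, split in SPLITS.items()}  # Step 3
    return max(scores, key=scores.get), scores                                        # Step 4


seqs = ["AAAAGGGGCC", "AAAAGGGGCT", "CCCCTTTTCC", "CCCCTTTTCT"]
best, scores = best_topology(seqs)
print(best)  # 12|34
```

Exhaustive comparison is only practical here because there are three candidates; the number of topologies grows very quickly with the number of leaves.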
17
Computing the Tree Likelihood
• We are interested in the probability of observed
data given tree and branch lengths
• Computed by summing over all possible letters at the internal nodes
• This can be done efficiently using an upward (leaves-to-root) traversal of the tree.
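Summing over the internal nodes can be written directly as a brute-force sum. This sketch uses the three-leaf tree whose root $x_5$ has children $x_4$ (branch $t_4$) and $x_3$ (branch $t_3$), with $x_4$ the parent of $x_1$ (branch $t_1$) and $x_2$ (branch $t_2$); Jukes-Cantor probabilities with $\alpha = 1$ and uniform $q$ are assumptions, and the function name is hypothetical. The cost grows as 4 to the power of the number of internal nodes, which is what the upward pass avoids.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def leaves_likelihood_bruteforce(x1, x2, x3, t1, t2, t3, t4):
    """P(x1, x2, x3 | T, t): sum the joint probability over letters at x4 and x5."""
    return sum(
        0.25                                  # q(x5), uniform
        * jc_prob(x5, x4, t4) * jc_prob(x5, x3, t3)
        * jc_prob(x4, x1, t1) * jc_prob(x4, x2, t2)
        for x4, x5 in product(ALPHABET, repeat=2)
    )


branch = dict(t1=0.1, t2=0.2, t3=0.3, t4=0.05)
print(leaves_likelihood_bruteforce("A", "A", "G", **branch))
```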

18
Tree Likelihood Computation
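• Define $P(L_k \mid a)$: the probability of the leaves below node $k$, given that $x_k = a$.
• Initialization, for leaves: $P(L_k \mid a) = 1$ if $x_k = a$, and $0$ otherwise.
• Iteration: if $k$ is a node with children $i$ and $j$, then
$P(L_k \mid a) = \Big(\sum_b P(b \mid a, t_i)\, P(L_i \mid b)\Big) \Big(\sum_c P(c \mid a, t_j)\, P(L_j \mid c)\Big)$,
where $t_i$ and $t_j$ are the lengths of the branches from $k$ to $i$ and to $j$.
• Termination: the likelihood is $P(x_1, \dots, x_N \mid T, t) = \sum_a q_a\, P(L_{\text{root}} \mid a)$.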
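The initialization, iteration and termination steps map onto a short recursion. This sketch assumes Jukes-Cantor probabilities with $\alpha = 1$ and uniform $q$, and encodes a leaf as a single letter and an internal node as a pair `((child_i, t_i), (child_j, t_j))`; that encoding and the function names are hypothetical.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def jc_prob(a: str, b: str, t: float, alpha: float = 1.0) -> float:
    """Assumed P(b | a, t) under Jukes-Cantor."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def partial_likelihoods(node):
    """Return {a: P(L_k | a)} for the subtree rooted at node k.

    A leaf is a single letter; an internal node is ((child_i, t_i), (child_j, t_j)).
    """
    if isinstance(node, str):                                   # initialization at a leaf
        return {a: 1.0 if a == node else 0.0 for a in ALPHABET}
    (child_i, t_i), (child_j, t_j) = node
    L_i, L_j = partial_likelihoods(child_i), partial_likelihoods(child_j)
    return {                                                     # iteration
        a: sum(jc_prob(a, b, t_i) * L_i[b] for b in ALPHABET)
           * sum(jc_prob(a, c, t_j) * L_j[c] for c in ALPHABET)
        for a in ALPHABET
    }


def tree_likelihood(root, q=None):
    """Termination: sum_a q_a P(L_root | a)."""
    q = q or {a: 0.25 for a in ALPHABET}
    L_root = partial_likelihoods(root)
    return sum(q[a] * L_root[a] for a in ALPHABET)


# Leaves x1 = A, x2 = A, x3 = G; x4 joins x1 and x2; the root joins x4 and x3.
site = (((("A", 0.1), ("A", 0.2)), 0.05), ("G", 0.3))
print(tree_likelihood(site))
```

Each internal node does a constant amount of work per letter pair, so the cost is linear in the number of nodes instead of exponential.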

19
Maximum Likelihood (ML)
• Score each tree by its likelihood $P(D \mid T, t) = \prod_{u=1}^{m} P(x^{u} \mid T, t)$
• Assumption: the $m$ positions are independent
• Branch lengths $t$ can be optimized (e.g. by EM)
• We look for the highest-scoring tree:
  • Exhaustive search
  • Sampling methods (Metropolis)
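Optimizing a branch length can be illustrated on the smallest case: two aligned sequences joined by one branch. The slide names EM; as a simpler stand-in, this sketch maximizes the log-likelihood over $t$ with a one-dimensional golden-section search and compares the result with the closed-form Jukes-Cantor estimate $\alpha \hat t = -\tfrac14 \ln(1 - \tfrac43 p)$, where $p$ is the fraction of differing sites. Jukes-Cantor with $\alpha = 1$, uniform $q$, the counts and the function names are all assumptions.

```python
import math


def jc_log_likelihood(t: float, n_same: int, n_diff: int, alpha: float = 1.0) -> float:
    """Log-likelihood of two aligned sequences joined by one branch of length t
    (Jukes-Cantor, uniform q), given the counts of matching and differing sites."""
    p_same = 0.25 * 0.25 * (1 + 3 * math.exp(-4 * alpha * t))   # q_a * P(a | a, t)
    p_diff = 0.25 * 0.25 * (-math.expm1(-4 * alpha * t))        # q_a * P(b | a, t), b != a
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def jc_distance(n_same: int, n_diff: int, alpha: float = 1.0) -> float:
    """Closed-form ML branch length: alpha * t = -(1/4) ln(1 - 4p/3)."""
    p = n_diff / (n_same + n_diff)
    return -math.log(1 - 4 * p / 3) / (4 * alpha)


n_same, n_diff = 80, 20
t_hat = golden_section_max(lambda t: jc_log_likelihood(t, n_same, n_diff), 1e-9, 10.0)
print(t_hat, jc_distance(n_same, n_diff))  # the two estimates agree closely
```

The search works here because the log-likelihood is unimodal in $t$; on larger trees all branch lengths interact, which is why iterative schemes such as EM are used.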

20
Optimal Tree Search
• Perform search over possible topologies

[Figure: likelihood over the parameter space; parametric optimization (EM) for each candidate topology can end in local maxima.]
21
Computational Problem
• Such procedures are computationally expensive!
• Computing the optimal parameters for each candidate requires a non-trivial optimization step.
• Non-negligible computation is spent on every candidate, even a low-scoring one.
• In practice, such learning procedures can only consider small sets of candidate structures.
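The size of the search space shows why only small candidate sets are feasible: the number of unrooted bifurcating topologies on $n$ labelled leaves is $(2n-5)!! = 3 \cdot 5 \cdots (2n-5)$. A small sketch (the function name is hypothetical):

```python
def num_unrooted_topologies(n: int) -> int:
    """(2n - 5)!! unrooted bifurcating topologies for n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # multiply 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_topologies(n))
```

For $n = 10$ this already gives 2,027,025 topologies, each of which would need its own branch-length optimization.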