Perfect Phylogeny MLE for Phylogeny Lecture 14 - PowerPoint PPT Presentation

Provided by: Shlo2
1
Perfect Phylogeny MLE for Phylogeny Lecture 14
Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1
2
Some Announcements
  • The final exam will take place on Friday,
    17.2.04, 09:00, at Taub 8.
  • Allowed material: course/tutorial slides and the
    textbooks of the course (Durbin et al.,
    Setubal & Meidanis, Gusfield).
  • Lab offered next semester:
      • algorithms for constructing phylogenetic trees
      • http://www.cs.technion.ac.il/moran/lab06.htm

3
2. The perfect phylogeny problem
  • A character is assumed to be a property that
    distinguishes between species (e.g., dental
    structure).
  • A character's state is a value of the character
    (e.g., human dental structure).
  • Problem: Given a set of species, specified by their
    characters, reconstruct their evolutionary tree.

4
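The Perfect Phylogeny Problem (pure graph-theoretic setting)
Input: Partial colorings (C1, ..., Ck) of a set of
vertices U (in the example, 3 total colorings:
left, center and right, each using two colors).
Problem: Is there a tree T = (V, E) s.t. U ⊆ V
and, for i = 1, ..., k, Ci is a convex (partial)
coloring of T? (A coloring of T is convex if the
vertices of each color induce a connected subtree;
a partial coloring is convex if it can be extended
to such a coloring of all of V.)
NP-hard in general; in P for some special cases.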
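A minimal Python sketch of the convexity test, assuming the standard characterization that a partial coloring of a tree is convex iff the smallest subtrees spanning the different color classes are pairwise vertex-disjoint. The adjacency-dict encoding and the function names (`tree_from_edges`, `spanning_subtree`, `is_convex`) are hypothetical, chosen for illustration.

```python
from itertools import combinations


def tree_from_edges(edges):
    """Adjacency dict of an undirected tree given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def spanning_subtree(adj, terminals):
    """Vertices of the smallest subtree containing all `terminals`,
    found by repeatedly pruning leaves that are not terminals."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already pruned via an earlier push
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and u not in terminals:
                    stack.append(u)
    return alive


def is_convex(adj, coloring):
    """coloring maps some vertices to colors (a partial coloring).
    Assumed characterization: convex iff the spanning subtrees of
    different colors share no vertex."""
    classes = {}
    for v, color in coloring.items():
        classes.setdefault(color, set()).add(v)
    subtrees = [spanning_subtree(adj, vs) for vs in classes.values()]
    return all(a.isdisjoint(b) for a, b in combinations(subtrees, 2))


path = tree_from_edges([(1, 2), (2, 3), (3, 4), (4, 5)])
print(is_convex(path, {1: "red", 2: "red", 4: "blue", 5: "blue"}))  # True
print(is_convex(path, {1: "red", 3: "blue", 5: "red"}))             # False
```

In the second call the red vertices 1 and 5 force the whole path into the red subtree, which then contains the blue vertex 3.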
5
Perfect Phylogeny for directed binary characters
  • Input: a matrix where rows correspond to objects
    (species) and columns to characters.
  • Each character has two states: 0 (does not exist)
    or 1 (exists).
  • Question: Is there a directed perfect phylogeny
    tree for the given species in which all the
    characters have value 0 at the root?

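C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 0 1
D 0 0 1 1 0
E 0 1 0 0 0
[Figure: a directed perfect phylogeny for this matrix, with the root labeled (00000) and each leaf labeled by its row: E (01000), B (00100), D (00110), A (11000), C (11001).]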
6
Perfect Phylogeny for directed binary characters
  • By definition, for each character C there is
    exactly one edge on which it changes from 0 to 1.
    In the tree below, the edge on which character C2
    changes to 1 is marked. The resulting tree is
    convex for this character.

[Figure: the tree with the edge on which character C2 is converted to 1 marked; column C2 of the matrix is A = 1, B = 0, C = 1, D = 0, E = 1.]
7
Perfect Phylogeny for directed binary characters
  • A tree is a directed perfect phylogeny for a
    given 0-1 matrix M iff we can map each character
    to an edge s.t. the edge labeled by Ci represents
    changing character Ci's state from 0 to 1. Below
    we show such a tree for the given matrix.

C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 0 1
D 0 0 1 1 0
E 0 1 0 0 0
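The definition can be checked mechanically: under the mapping, a leaf's row has a 1 exactly for the characters labeling its root-to-leaf path. The Python sketch below is illustrative only; since the slide's tree is not reproduced here, `TREE` is one edge-labeled tree reconstructed to be consistent with the matrix above, and its node names (`root`, `u1`, ...) and the helper names are hypothetical.

```python
CHARS = ["C1", "C2", "C3", "C4", "C5"]
M = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0],
    "E": [0, 1, 0, 0, 0],
}

# Hypothetical tree: node -> list of (child, edge label); None = unlabeled edge.
TREE = {
    "root": [("u1", "C2"), ("u3", "C3")],
    "u1": [("E", None), ("u2", "C1")],
    "u2": [("A", None), ("C", "C5")],
    "u3": [("B", None), ("D", "C4")],
}


def leaf_rows(children, root, chars):
    """Row of every leaf: 1 for each character labeling its root-to-leaf path
    (all characters are 0 at the root)."""
    rows = {}
    stack = [(root, frozenset())]
    while stack:
        node, gained = stack.pop()
        kids = children.get(node, [])
        if not kids:
            rows[node] = [1 if c in gained else 0 for c in chars]
        for child, label in kids:
            stack.append((child, (gained | {label}) if label else gained))
    return rows


def is_directed_perfect_phylogeny(children, root, matrix, chars):
    labels = [label for kids in children.values() for _, label in kids if label]
    if sorted(labels) != sorted(chars):  # each character labels exactly one edge
        return False
    return leaf_rows(children, root, chars) == matrix


print(is_directed_perfect_phylogeny(TREE, "root", M, CHARS))  # True
```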
8
Efficient algorithm for the Binary Perfect
Phylogeny Problem
  • Definition: Given a 0-1 matrix M, Ok = {j : Mjk = 1},
    i.e., Ok is the set of objects that have character
    Ck.
  • Theorem: M has a perfect phylogenetic tree iff
    the sets Oi are laminar, i.e., for all i, j,
    either Oi and Oj are disjoint, or one includes
    the other.

Laminar:
C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 0 1
D 0 0 1 1 0
E 0 1 0 0 0
Not laminar (O3 = {B, D} and O5 = {B, C, E} intersect, but neither includes the other):
C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 1
C 1 1 0 0 1
D 0 0 1 1 0
E 0 1 0 0 1
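A direct Python sketch of the definition of Ok and of the laminarity condition, run on the two matrices above. It compares every pair of columns, so it is quadratic in the number of characters; the function names are hypothetical, and the slides that follow describe a faster method.

```python
from itertools import combinations


def object_sets(matrix):
    """O_k = set of objects j with M[j][k] == 1, for every column k."""
    ncols = len(next(iter(matrix.values())))
    return [{obj for obj, row in matrix.items() if row[k] == 1} for k in range(ncols)]


def is_laminar(sets):
    """True iff every pair of sets is either disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a for a, b in combinations(sets, 2))


LAMINAR = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
           "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 0]}
NOT_LAMINAR = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 1], "C": [1, 1, 0, 0, 1],
               "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 1]}

print(is_laminar(object_sets(LAMINAR)))      # True
print(is_laminar(object_sets(NOT_LAMINAR)))  # False
```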
9
Proof
  • ⇒ Assume M has a perfect phylogeny, and let Ci,
    Cj be given.
  • Consider the edges labeled Ci and Cj.
  • Case 1: There is a root-to-leaf path containing
    both edges. Then every leaf below the lower edge
    is also below the upper edge, so one of Oi, Oj
    includes the other (C2 and C1 below).
  • Case 2: Otherwise, no leaf lies below both edges,
    so Oi and Oj are disjoint (C2 and C3).

[Figure: a perfect phylogeny with edges labeled C1, C2, C3, C4, C5 and leaves A, B, C, D, E.]
10
Proof (cont.)
  • ⇐ Assume for all i, j, either Oi and Oj are
    disjoint, or one includes the other. We prove by
    induction on the number of characters that M has
    a perfect phylogenetic tree.
  • Basis: one character. Then there are at most two
    distinct objects (rows), one with and one without
    this character.

C1
A 1
B 0
11
Proof (cont.)
  • ⇐ Induction step: Assume correctness for n-1
    characters, and consider a matrix with n
    characters (non-zero columns). WLOG assume that
    O1 is not properly contained in Oj for j > 1.
  • Let S1 be the set of objects j for which Mj1 = 1,
    and S2 be the remaining objects. Then each
    character other than C1 belongs to objects in S1
    or S2, but not both (prove!). By induction (on the
    remaining n-1 characters) there are trees T1 and
    T2 for S1 and S2. Combining them as below gives
    the desired tree.

C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 0 1
D 0 0 1 1 0
E 1 0 0 0 0
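S1 = {A, C, E}, S2 = {B, D}
[Figure: the combined tree, with T1 hanging below an edge labeled C1 and T2 attached at the same root.]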
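The induction is constructive, so it can be turned into a recursive procedure. The Python sketch below is one illustrative reading of the proof, not an efficient implementation: it splits on a column with the largest Ok (which cannot be a proper subset of another Oj), recurses on S1 and S2, and raises an error if some character spans both sides (i.e., the columns are not laminar). The nested-list tree encoding and the names `build` and `leaf_rows` are hypothetical.

```python
def build(objects, cols, matrix):
    """Tree for `objects` using columns `cols`, following the induction proof.
    A node is a list of (edge_label, child); edge_label is a column index or None,
    and a child is either an object name (leaf) or another node (list)."""
    cols = [k for k in cols if any(matrix[o][k] for o in objects)]  # drop all-zero columns
    if not cols:
        return [(None, o) for o in objects]  # no characters left: leaves hang here
    # A column with the largest O_k is not a proper subset of any other O_j.
    k1 = max(cols, key=lambda k: sum(matrix[o][k] for o in objects))
    s1 = [o for o in objects if matrix[o][k1] == 1]
    s2 = [o for o in objects if matrix[o][k1] == 0]
    rest = [k for k in cols if k != k1]
    for k in rest:  # laminarity: each remaining character lies within S1 or within S2
        if any(matrix[o][k] for o in s1) and any(matrix[o][k] for o in s2):
            raise ValueError("columns are not laminar")
    t1 = build(s1, rest, matrix)  # T1: objects with character k1, other characters only
    t2 = build(s2, rest, matrix)  # T2: objects without character k1
    return [(k1, t1)] + t2  # T1 below a new edge labeled k1; T2's edges leave this node


def leaf_rows(node, ncols, gained=frozenset()):
    """Recover each leaf's row from the characters on its root-to-leaf path."""
    rows = {}
    for label, child in node:
        path = (gained | {label}) if label is not None else gained
        if isinstance(child, str):
            rows[child] = [1 if k in path else 0 for k in range(ncols)]
        else:
            rows.update(leaf_rows(child, ncols, path))
    return rows


M = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
     "D": [0, 0, 1, 1, 0], "E": [1, 0, 0, 0, 0]}
tree = build(list(M), range(5), M)
print(leaf_rows(tree, 5) == M)  # True
```

On this matrix the first split is on column 0 (C1), giving S1 = {A, C, E} and S2 = {B, D} as on the slide.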
12
Efficient Implementation
  • 1. Sort the columns (characters) by decreasing
    value when considered as binary numbers (time
    complexity O(mn), using radix sort).
  • Claim: If the binary value of column i is larger
    than that of column j, then Oi is not a proper
    subset of Oj.
  • Proof: If Oi ⊆ Oj, every 1 in column i is matched
    by a 1 in column j, so the binary value of column i
    is at most that of column j.

Before sorting:
C1 C2 C3 C4 C5
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 0 1
D 0 0 1 1 0
E 0 1 0 0 0
After sorting:
C2 C1 C3 C5 C4
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 1 0
D 0 0 1 0 1
E 1 0 0 0 0
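Step 1 can be realized as an LSD radix sort with one stable two-bucket pass per row, processing rows from the last (least significant bit) to the first, so the work is linear in the size of the matrix. A minimal Python sketch, assuming the first row is the most significant bit (this reproduces the column order C2 C1 C3 C5 C4 above); the function name is hypothetical.

```python
def sort_columns_desc(matrix):
    """Column indices by decreasing binary value, reading each column top to
    bottom (first row = most significant bit). LSD radix sort: one stable
    two-bucket pass per row, from the last row up to the first."""
    order = list(range(len(matrix[0])))
    for row in reversed(matrix):
        ones = [k for k in order if row[k] == 1]   # bit 1 sorts before bit 0 (descending)
        zeros = [k for k in order if row[k] == 0]
        order = ones + zeros
    return order


names = ["C1", "C2", "C3", "C4", "C5"]
M = [[1, 1, 0, 0, 0],   # A
     [0, 0, 1, 0, 0],   # B
     [1, 1, 0, 0, 1],   # C
     [0, 0, 1, 1, 0],   # D
     [0, 1, 0, 0, 0]]   # E
order = sort_columns_desc(M)
print([names[k] for k in order])  # ['C2', 'C1', 'C3', 'C5', 'C4']
sorted_matrix = [[row[k] for k in order] for row in M]  # the "after sorting" matrix
```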
13
Efficient Implementation (2)
  • 2. Make a backwards linked list of the 1s in
    each row: each 1 points at the nearest 1 to its
    left in the same row, and the leftmost 1 in each
    row points at itself. Time complexity O(mn).

C2 C1 C3 C5 C4
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 1 0
D 0 0 1 0 1
E 1 0 0 0 0
Claim: If the columns are sorted, then the set of
columns is laminar iff, for each column i, all the
links leaving column i point at the same column.
This can be checked in O(mn) time.
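A minimal Python sketch of step 2 and of the claim, assuming the columns are already sorted as in step 1. It records, for every 1, the column of the previous 1 in its row, and then checks that each column's links agree. The two test matrices are the examples on the next slide; the function names are hypothetical.

```python
def back_links(matrix):
    """For each 1 at (row r, column k): the column of the previous 1 in row r,
    or k itself if it is the leftmost 1 in that row."""
    links = {}
    for r, row in enumerate(matrix):
        prev = None
        for k, bit in enumerate(row):
            if bit == 1:
                links[(r, k)] = k if prev is None else prev
                prev = k
    return links


def is_laminar_sorted(matrix):
    """Assumes the columns are already sorted by decreasing binary value.
    Laminar iff all links leaving each column point at one and the same column."""
    target = {}
    for (_, k), dest in back_links(matrix).items():
        if target.setdefault(k, dest) != dest:
            return False
    return True


LAMINAR = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
           [0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
NOT_LAMINAR = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
               [0, 0, 1, 0, 1], [1, 0, 1, 1, 0]]
print(is_laminar_sorted(LAMINAR))      # True
print(is_laminar_sorted(NOT_LAMINAR))  # False
```

Each cell is visited a constant number of times, matching the O(mn) bound stated above.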
14
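Examples
Laminar:
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 1 0
D 0 0 1 0 1
E 1 0 0 0 0
Not laminar (in the third column, the 1s in rows B and D point at themselves, but the 1 in row E points at the first column):
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 1 0
D 0 0 1 0 1
E 1 0 1 1 0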
15
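Efficient Implementation (3)
  • 3. When the matrix is laminar, the tree edges
    corresponding to characters are defined by the
    backwards links in the matrix: the edge of
    character i hangs below the edge of the column
    that its links point at, or below the root if
    they point at i itself.

The remaining edges and leaves are determined by the
characters of each object: each object hangs below the
edge of its rightmost 1 (or from the root if its row is
all 0). This needs O(mn) time.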
C2 C1 C3 C5 C4
A 1 1 0 0 0
B 0 0 1 0 0
C 1 1 0 1 0
D 0 0 1 0 1
E 1 0 0 0 0
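[Figure: the resulting tree. Edges C2 and C3 leave the root; C1 hangs below C2, C5 below C1, and C4 below C3. Leaf E hangs below C2, A below C1, C below C5, B below C3, and D below C4.]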
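A minimal Python sketch of step 3, assuming a laminar matrix with sorted columns: the back link of each 1 gives the character edge directly above that character's edge (the root for a leftmost 1), and each object becomes a leaf below the edge of its rightmost 1. The output encoding (two parent maps) and the function name are hypothetical; on the matrix above it yields the tree shown on this slide.

```python
def tree_from_back_links(matrix, names, objects):
    """Assumes a laminar matrix with columns sorted by decreasing binary value.
    char_parent[c]: the character edge directly above edge c (None = the root).
    leaf_parent[o]: the character edge that leaf o hangs below (None = the root)."""
    char_parent, leaf_parent = {}, {}
    for obj, row in zip(objects, matrix):
        prev = None
        for k, bit in enumerate(row):
            if bit == 1:
                # the back link of this 1 gives the edge above character k
                char_parent[names[k]] = None if prev is None else names[prev]
                prev = k
        # the leaf hangs below the edge of its rightmost 1
        leaf_parent[obj] = None if prev is None else names[prev]
    return char_parent, leaf_parent


names = ["C2", "C1", "C3", "C5", "C4"]
M = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
char_parent, leaf_parent = tree_from_back_links(M, names, "ABCDE")
print(char_parent)  # {'C2': None, 'C1': 'C2', 'C3': None, 'C5': 'C1', 'C4': 'C3'}
print(leaf_parent)  # {'A': 'C1', 'B': 'C3', 'C': 'C5', 'D': 'C4', 'E': 'C2'}
```

Because the matrix is assumed laminar, every 1 in a column yields the same parent, so overwriting `char_parent[names[k]]` on later rows does not change it.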
16
A scenario where Maximum Parsimony (and Perfect
Phylogeny) are misleading
Consider a model with 4 letters (DNA), where the
probability of a substitution is proportional to
time.
In the following topology, leaves 2 and 3 are likely to
be the same as the origin, but leaves 1 and 4 are likely
to be different. In this case, the Maximum Parsimony
principle may be useless or misleading.
[Figure: a four-leaf tree with leaves 1, 2, 3, 4; the origin and leaves 2 and 3 carry the letter A.]
17
Parsimony may be useless/misleading
[Figure: the four possible cases for leaves 1 and 4; cases I, II and III are uninformative.]
Assume the (likely) scenario where leaves 2 and 3
are the same (A). There are 4 combinations of
substitutions for leaves 1 and 4: no change, a change
in only one of them, changes to two different letters,
and changes to the same letter. In the first three,
all three topologies obtain the same parsimony score.
In the fourth, a wrong topology scores best.
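The scores in the four cases can be checked with Fitch's small-parsimony algorithm (a standard method; the slides do not say how the scores were computed). The Python sketch below scores the three possible unrooted topologies on leaves 1 to 4, rooting each on its central edge, which does not change the parsimony score. The topology names and the case encoding are hypothetical.

```python
def fitch(tree, leaf_state):
    """Fitch's algorithm on a rooted binary tree given as nested 2-tuples.
    Returns (candidate state set, minimum number of substitutions)."""
    if not isinstance(tree, tuple):
        return {leaf_state[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, leaf_state)
    s2, c2 = fitch(right, leaf_state)
    if s1 & s2:
        return s1 & s2, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disjoint sets cost one substitution


# The three unrooted topologies on four leaves, each rooted on its central edge.
TOPOLOGIES = {"12|34": ((1, 2), (3, 4)), "13|24": ((1, 3), (2, 4)), "14|23": ((1, 4), (2, 3))}

# Leaves 2 and 3 keep the ancestral letter A; leaves 1 and 4 vary.
CASES = {
    "I": {1: "A", 2: "A", 3: "A", 4: "A"},
    "II": {1: "G", 2: "A", 3: "A", 4: "A"},
    "III": {1: "G", 2: "A", 3: "A", 4: "C"},
    "IV": {1: "C", 2: "A", 3: "A", 4: "C"},
}

for name, leaves in CASES.items():
    scores = {t: fitch(tree, leaves)[1] for t, tree in TOPOLOGIES.items()}
    print(name, scores)
# I {'12|34': 0, '13|24': 0, '14|23': 0}
# II {'12|34': 1, '13|24': 1, '14|23': 1}
# III {'12|34': 2, '13|24': 2, '14|23': 2}
# IV {'12|34': 2, '13|24': 2, '14|23': 1}
```

In case IV the topology pairing leaves 1 and 4 wins with one substitution, which is the misleading outcome described above.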
18
Case I: Parsimony is useless
Leaves 1 = A, 4 = A; leaves 2 = A, 3 = A; origin A.
Score = 0 for each of the three topologies.
19
Case II: Parsimony is useless
Leaves 1 = G, 4 = A; leaves 2 = A, 3 = A; origin A.
Score = 1 for each of the three topologies.
20
Case III: Parsimony is useless
Leaves 1 = G, 4 = C; leaves 2 = A, 3 = A; origin A.
Score = 2 for each of the three topologies.
21
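Case IV: Parsimony is misleading
Leaves 1 = C, 4 = C; leaves 2 = A, 3 = A; origin A.
Scores 1, 2, 2: the topology that pairs leaves 1 and 4
scores 1, and the other two topologies score 2.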
22
Parsimony is correct only in rare cases
It infers the tree correctly only in the rare case of a
change on the central edge, or
in the even rarer case of a parallel change
from A to C on the pendant edges to 1 and 2.
23
3. Maximum Likelihood Approach
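Consider the phylogenetic tree to be a stochastic
process.
A simple model assumes that on each edge, the
probability of a transition from character a to
character b is given by parameters θba. The
probability of letter a at the root is qa. Given
the complete tree (with letters at all nodes, including
internal ones), its probability is defined by the
values of the θba's and the qa's: it is the product of
qa for the root's letter a and of θba for every edge
whose parent has letter a and whose child has letter b.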
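A minimal Python sketch of the complete-tree probability under this simple model, assuming one parameter table shared by all edges, a uniform root distribution, and illustrative parameter values (a 0.7 chance of no change, other letters equally likely) that are not from the slides. The tree, node names, and function names are hypothetical; `theta[(b, a)]` stands for θba, the probability of letter b at a child given letter a at its parent.

```python
LETTERS = "ACGT"


def symmetric_theta(p_same):
    """Hypothetical parameters: theta[(b, a)] = P(child letter b | parent letter a),
    the same on every edge; all changes equally likely."""
    p_diff = (1 - p_same) / 3
    return {(b, a): p_same if a == b else p_diff for a in LETTERS for b in LETTERS}


def complete_tree_probability(children, letter, root, q, theta):
    """q[root letter] times theta[(child letter, parent letter)] over every edge."""
    prob = q[letter[root]]
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            prob *= theta[(letter[child], letter[parent])]
            stack.append(child)
    return prob


q = {a: 0.25 for a in LETTERS}           # uniform root distribution (assumption)
theta = symmetric_theta(0.7)             # illustrative value, not from the slides
children = {"r": ["u", 3], "u": [1, 2]}  # hypothetical 3-leaf tree
letter = {"r": "A", "u": "A", 1: "G", 2: "A", 3: "A"}
print(complete_tree_probability(children, letter, "r", q, theta))  # about 0.008575
```

Here the product is 0.25 · 0.7 · 0.7 · 0.1 · 0.7: one factor for the root letter and one per edge.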
24
Maximum Likelihood Approach (2)
When the data consists only of the leaf
sequences (but the topology is fixed):
write down the likelihood of the data (leaf
sequences) given the tree, i.e., the complete-tree
probability summed over all assignments of letters
to the internal nodes, and use EM to estimate the
θba parameters.
When the tree is not given:
search for the tree that maximizes
Prob(data | Tree, θEM).
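For a fixed topology, the likelihood of the leaf data sums the complete-tree probability over all letters at the internal nodes; Felsenstein's pruning algorithm (named here as a standard technique, not one the slides spell out) computes this sum bottom-up. A minimal Python sketch, assuming one shared θ table for all edges as in the simple model, independent sites, and illustrative parameter values; all names are hypothetical, and neither the EM step nor the tree search is shown.

```python
LETTERS = "ACGT"


def symmetric_theta(p_same):
    """Hypothetical parameters: theta[(b, a)] = P(child letter b | parent letter a)."""
    p_diff = (1 - p_same) / 3
    return {(b, a): p_same if a == b else p_diff for a in LETTERS for b in LETTERS}


def partial_likelihood(node, children, leaf_letter, theta):
    """L[a] = P(letters at the leaves below node | node has letter a)."""
    if node not in children:  # leaf: indicator of the observed letter
        return {a: 1.0 if a == leaf_letter[node] else 0.0 for a in LETTERS}
    result = {a: 1.0 for a in LETTERS}
    for child in children[node]:
        lc = partial_likelihood(child, children, leaf_letter, theta)
        for a in LETTERS:
            result[a] *= sum(theta[(b, a)] * lc[b] for b in LETTERS)
    return result


def site_likelihood(root, children, leaf_letter, q, theta):
    """Prob(leaf letters | tree, theta, q), summed over all internal letters."""
    lr = partial_likelihood(root, children, leaf_letter, theta)
    return sum(q[a] * lr[a] for a in LETTERS)


def sequences_likelihood(root, children, leaf_seqs, q, theta):
    """Product over alignment columns, assuming sites evolve independently."""
    length = len(next(iter(leaf_seqs.values())))
    prob = 1.0
    for i in range(length):
        column = {leaf: seq[i] for leaf, seq in leaf_seqs.items()}
        prob *= site_likelihood(root, children, column, q, theta)
    return prob


q = {a: 0.25 for a in LETTERS}
theta = symmetric_theta(0.7)
children = {"r": ["u", 3], "u": [1, 2]}  # hypothetical fixed topology
print(site_likelihood("r", children, {1: "G", 2: "A", 3: "A"}, q, theta))
print(sequences_likelihood("r", children, {1: "GA", 2: "AA", 3: "AC"}, q, theta))
```

A quick sanity check on the design: summed over all possible leaf letters, `site_likelihood` gives 1, since it is a probability distribution over the leaf data.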