Title: Convex Recoloring of Strings and Trees
1 Convex Recoloring of Strings and Trees
- Shlomo Moran, Sagi Snir, Technion, Israel Institute of Technology
2 Phylogenetic Trees
- Represent evolutionary history.
- Leaves represent living species.
- Internal nodes represent extinct species.
- Edges represent evolutionary relationships.
3 Evolutionary (Phylogenetic) tree
[Figure: an evolutionary tree. Source: Alberts et al.]
4 Characters in Species
- A (discrete) character is a property that distinguishes between species (e.g., form of movement).
- A character state is a value of the character (e.g., crawling).
5 Characters in Phylogenetic Trees should avoid reversal transitions
- A species regains a state its direct ancestor has lost.
- Famous examples:
  - Teeth in birds.
  - Legs in snakes.
6 ...and also avoid convergence transitions
- Two species possess the same state while their least common ancestor possesses a different state.
- Famous example: the marsupials.
8
- Assumption: significant characters exhibit neither reversals nor convergence (i.e., they are homoplasy-free).
9 Graph theoretic terminology
- Species: vertices
- Character: coloring
- States: colors
10 Is this a reasonable evolutionary tree (of a significant character)?
11
- Definition: a character is convex on a (given) tree if, in the corresponding coloring, each color induces a block (a single connected monochromatic subtree).
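To make the definition concrete, here is a minimal Python sketch for a totally colored tree. It assumes the tree is given as an edge list and the coloring as a dict from vertex to color; these representations and the function name `is_convex_total` are our own illustration, not part of the original work. A color induces a block exactly when its vertices form one connected piece, which the sketch checks with a depth-first search restricted to that color.

```python
from collections import defaultdict


def is_convex_total(edges, coloring):
    """Return True if every color of a totally colored tree induces one block.

    edges:    list of (u, v) pairs forming a tree (assumed connected, acyclic)
    coloring: dict mapping every vertex to its color
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    classes = defaultdict(set)
    for vertex, color in coloring.items():
        classes[color].add(vertex)

    for members in classes.values():
        # Depth-first search that only walks through vertices of this color.
        start = next(iter(members))
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != members:  # the color class splits into several blocks
            return False
    return True


path = [(1, 2), (2, 3), (3, 4)]
print(is_convex_total(path, {1: "a", 2: "a", 3: "b", 4: "b"}))  # True
print(is_convex_total(path, {1: "a", 2: "b", 3: "a", 4: "b"}))  # False
```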
12 Usually, only a partial coloring of the tree is
given. A partial coloring is convex if it can be
completed to a (total) convex coloring.
13 Alternative definition for convex (partial or total) coloring
- A d-carrier is the minimal subtree containing all vertices colored d.
- A partial coloring is convex if all the d-carriers (one for each color d) are pairwise disjoint.
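The carrier-based definition can be sketched in Python as follows, assuming an edge-list tree and a dict coloring in which uncolored vertices are absent or `None` (assumed representations; `carrier` and `is_convex_partial` are hypothetical helper names). The carrier is computed by rooting the tree at one d-colored vertex and keeping every vertex whose rooted subtree contains a d-colored vertex.

```python
from collections import defaultdict, deque


def carrier(adj, terminals):
    """Vertices of the minimal subtree containing all of `terminals`.

    Root the tree at one terminal; a vertex belongs to the carrier exactly
    when its rooted subtree contains some terminal.
    """
    terminals = set(terminals)
    if not terminals:
        return set()
    root = next(iter(terminals))
    parent = {root: None}
    order = [root]                      # BFS order: parents before children
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    marked = set()
    for v in reversed(order):           # children are processed before parents
        if v in terminals or v in marked:
            marked.add(v)
            if parent[v] is not None:
                marked.add(parent[v])
    return marked


def is_convex_partial(edges, coloring):
    """A partial coloring is convex iff the d-carriers are pairwise disjoint."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    by_color = defaultdict(set)
    for vertex, color in coloring.items():
        if color is not None:           # uncolored vertices impose nothing
            by_color[color].add(vertex)
    used = set()
    for members in by_color.values():
        c = carrier(adj, members)
        if c & used:                    # two carriers intersect
            return False
        used |= c
    return True


path = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(is_convex_partial(path, {1: "a", 3: "a", 5: "b"}))  # True
print(is_convex_partial(path, {1: "a", 3: "b", 5: "a"}))  # False
```

In the second call the a-carrier is the whole path {1, ..., 5}, which contains the b-colored vertex 3, so the carriers intersect.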
14 The Perfect Phylogeny Problem
- Input: a set of species and many characters, each assigning states (colors) to the species.
- Question: is there a tree T containing the species as vertices in which all the characters (colorings) are convex?
15 The Perfect Phylogeny Problem (combinatorial setting)
Input: colorings $(C_1, \dots, C_k)$ of a set of vertices (in the example, 3 colorings, left, center, and right, each using the same two colors).
Problem: is there a tree T that includes these vertices such that $(T, C_i)$ is convex for $i = 1, \dots, k$?
NP-hard in general; in P for some special cases.
16 Assumption: significant characters are convex on the tree.
Goal: assign scores to (partial) colorings that measure their distance from convexity on the given tree.
17 Some relevant parameters (for totally colored trees):
- $n_c$: number of colors (5 in the tree below)
- $n_b$: number of blocks, i.e., monochromatic subtrees (7 in the tree below)
18 Bad Colors and Blocks
A color is bad, and its blocks are bad, if it has more than a single block.
In the example: number of bad colors = 2, number of bad blocks = 4.
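The parameters $n_c$, $n_b$ and the bad-color counts can be computed directly. In the Python sketch below (the edge list, dict coloring, and the name `block_stats` are illustrative assumptions), the blocks of a color are counted as connected components of the forest induced by that color, using the fact that a forest with m vertices and e edges has m - e components. The example path is hypothetical, not the tree in the figure.

```python
from collections import Counter


def block_stats(edges, coloring):
    """Color and block statistics of a totally colored tree.

    Each color class induces a forest in the tree; a forest with m vertices
    and e edges has m - e connected components (here: blocks).
    """
    size = Counter(coloring.values())  # vertices per color
    mono_edges = Counter(coloring[u] for u, v in edges if coloring[u] == coloring[v])
    blocks = {c: size[c] - mono_edges[c] for c in size}
    bad = [c for c, b in blocks.items() if b > 1]
    return {
        "n_c": len(size),                      # number of colors
        "n_b": sum(blocks.values()),           # number of blocks
        "bad_colors": len(bad),
        "bad_blocks": sum(blocks[c] for c in bad),
    }


# Hypothetical path 1-2-...-7 colored a a b a c c b.
path = [(i, i + 1) for i in range(1, 7)]
colors = dict(zip(range(1, 8), "aabaccb"))
print(block_stats(path, colors))
# {'n_c': 3, 'n_b': 5, 'bad_colors': 2, 'bad_blocks': 4}
```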
19 Common score: the parsimony score, $n_b - 1$ (= 5 in the example) [Fitch 71, Sankoff 75 present efficient algorithms that complete leaf colorings to total colorings with minimum parsimony score].
Another natural known score [Fernandez-Baca, Lagergren 03]: violations, $n_b - n_c$ (= 2 in the example).
Shortcoming: these scores don't measure the distance to a desired convex tree.
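For intuition about the parsimony score, the Python sketch below follows the unit-cost form of Sankoff's dynamic program on a rooted tree: it completes given leaf colors to a total coloring with the fewest bichromatic edges. In a totally colored tree, deleting the bichromatic edges leaves exactly the blocks, so this count equals $n_b - 1$. The input format (children dict, leaf-color dict, explicit list of states) and the name `min_parsimony_completion` are assumptions of the sketch; it does not claim to reproduce the exact formulations of [Fitch 71] or [Sankoff 75].

```python
import math


def min_parsimony_completion(children, root, leaf_color, states):
    """Unit-cost Sankoff-style DP on a rooted tree.

    Completes the leaf colors to a total coloring that minimizes the number of
    bichromatic edges (= n_b - 1 for the resulting totally colored tree).
    children:   dict vertex -> list of children (leaves may be omitted)
    leaf_color: dict leaf -> color, each color one of `states`
    """
    cost = {}  # cost[v][s]: best score inside the subtree of v if v gets color s

    def solve(v):
        kids = children.get(v, [])
        if not kids:
            cost[v] = {s: 0 if s == leaf_color[v] else math.inf for s in states}
            return
        for c in kids:
            solve(c)
        cost[v] = {
            s: sum(min(cost[c][t] + (s != t) for t in states) for c in kids)
            for s in states
        }

    solve(root)

    coloring = {}

    def assign(v, s):  # traceback: each child takes its best color given s
        coloring[v] = s
        for c in children.get(v, []):
            assign(c, min(states, key=lambda t: cost[c][t] + (s != t)))

    best = min(states, key=lambda s: cost[root][s])
    assign(root, best)
    return cost[root][best], coloring


children = {"r": ["x", "y"], "x": [1, 2], "y": [3, 4]}
score, coloring = min_parsimony_completion(
    children, "r", {1: "a", 2: "a", 3: "b", 4: "a"}, ["a", "b"]
)
print(score, coloring["r"], coloring["y"])  # 1 a a
```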
20 Parsimony Illness
- The score of a colored tree (T, C) studied here:
- the minimum cost of converting (T, C) to a convex colored tree (T, C′).
21 Cost 1: number of recolored vertices
Input: (T, C)
Convex recoloring: C′
$\mathrm{cost}_C(C') = 2$
- Opt(T, C): the minimum number of vertices that must be recolored to change C into a convex coloring.
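Opt(T, C) can be made concrete, for tiny trees only, by exhaustive search. The Python sketch below is a hypothetical illustration (`brute_force_opt` is our name, and the edge-list/dict representation is assumed): it enumerates every assignment of the colors already present in C (a restriction the sketch assumes), keeps the convex ones, and returns the fewest recolored vertices. Its running time grows as (number of colors)^n.

```python
from collections import defaultdict
from itertools import product


def _is_convex(adj, coloring):
    """Each color class must induce a connected subtree (a single block)."""
    classes = defaultdict(set)
    for v, c in coloring.items():
        classes[c].add(v)
    for members in classes.values():
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != members:
            return False
    return True


def brute_force_opt(edges, coloring):
    """Exhaustive Opt(T, C): fewest recolored vertices over all convex recolorings.

    Only tries the colors already used by C; feasible on tiny trees only.
    Returns (cost, one optimal recoloring).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    vertices = list(coloring)
    palette = sorted(set(coloring.values()))
    best = None
    for choice in product(palette, repeat=len(vertices)):
        candidate = dict(zip(vertices, choice))
        if _is_convex(adj, candidate):
            cost = sum(candidate[v] != coloring[v] for v in vertices)
            if best is None or cost < best[0]:
                best = (cost, candidate)
    return best


path = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(brute_force_opt(path, dict(zip(range(1, 6), "ababa")))[0])  # 2
```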
22 Cost 2: number of recolored blocks
Input: (T, C)
Convex recoloring: C′
- Opt(T, C): the minimum number of blocks that must be recolored to change C into a convex coloring.
23 Motivation
- The phylogenetic tree T is given.
- A new character is introduced, which assigns states to some species.
- Opt(T, C): the minimum number of exceptional species (or exceptional mutations), relative to a closest convex coloring.
24 Refinements
- Uniform weighted recoloring: each vertex v has a weight w(v), and the cost is the total weight of the recolored vertices.
- Non-uniform weighted recoloring: the cost of coloring vertex v by color d is given by an arbitrary nonnegative value cost(v, d).
- The cost of the recoloring C′ is $\mathrm{cost}(C') = \sum_{v} \mathrm{cost}(v, C'(v))$.
25 Algorithmic Challenges
- Compute efficiently the minimum cost of a convex recoloring of a given colored tree (T, C).
- Or at least decide efficiently whether (T, C) is nearly convex.
26 Complexity of computing an optimal recoloring of (T, C) on n vertices
- NP-hard for all models.
- Can be computed in $\mathrm{poly}(n)\exp(n_c)$ time (efficient for a small number of colors).
- In the uniform weight model:
  - can be computed in $\mathrm{poly}(n)\exp(k)$ time, where k is the number of vertices in an optimal cover (efficient for a small number of color changes);
  - can be approximated in poly(n) time.
27 Optimal convex recoloring is NP-hard even for strings
- Input: a colored string (S, C) and an integer k.
- Question: is there a convex recoloring C′ of C such that $\mathrm{cost}_C(C') \le k$?
- Theorem: minimum convex recoloring of strings is NP-hard.
- Reduction from 3-SAT.
28 Proof sketch
- Given a CNF formula F with m clauses on n variables, we construct a colored string composed of two segment types: informative segments, separated by junk segments, each of length k + 1.
29 Proof sketch
Informative segments, type 1: clause segments.
The clause segment of $C_i = (\neg x \lor y \lor z)$ is composed of A triplets $d_{\neg x,i}, d_{y,i}, d_{z,i}$.
30 Proof sketch
[Figure: a clause segment with colors $d_{\neg x,i}, d_{y,i}, d_{z,i}$.]
Claim 1: a convex recoloring of a clause segment of small cost must use one of the literals' colors.
31 Proof sketch
[Figure, labeled $d_{y,i}$.]
Claim 1: a convex recoloring of a clause segment of cost at most k must use one of the literals' colors.
32 Proof sketch
Informative segments, types 2 and 3: for each variable x, an x-segment and a ¬x-segment.
[Figure: the ¬x-segment and the x-segment.]
Blocks are of length $B \gg A$.
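33 Proof sketch
[Figure: the ¬x-segment with blocks $d_{\neg x,1}, \dots, d_{\neg x,m}$ and the x-segment with blocks $d_{x,1}, \dots, d_{x,m}$.]
The ¬x-segment contains blocks of all the $d_{\neg x,i}$ colors, separated by blocks of the x color (black). The x-segment contains blocks of all the $d_{x,i}$ colors, separated by blocks of the x color (black).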
34 Proof sketch
[Figure: overwriting the x color costs $Bm$ in one segment and $B(m+1)$ in the other.]
Claim 2: a convex recoloring of cost at most k must overwrite the x color in exactly one of the two segments.
Claim 3: only the colors of the other segment can be used for recoloring the clause segments.
(End of proof; details in the paper.)
35 A $\mathrm{poly}(n)\exp(n_c)$ algorithm for optimal convex recoloring of strings
Input: a colored weighted string (S, C, w).
Output: an optimal convex recoloring of C.
Method: scan from left to right, changing colors when needed.
36 For each subset of colors $D \subseteq C$ and each $d \in D$:
$\mathrm{Opt}_{D,d}(i)$ = minimal cost of a convex recoloring of $v_1, \dots, v_i$ by colors from D, such that $v_i$ is colored by d.
Input: (S, C)
38 Dynamic Programming
- For i = 1 to n:
  - for each subset of colors $D \subseteq C$ and each $d \in D$, compute $\mathrm{Opt}_{D,d}(i)$.
40 Optimal recoloring of $v_1, \dots, v_{i-1}$ by colors from D, such that $v_{i-1}$ is colored by d:
- $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{Opt}_{D,d}(i-1), \dots\}$
41 Optimal recoloring of $v_1, \dots, v_{i-1}$ by colors from $D \setminus \{d\}$:
- $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{Opt}_{D,d}(i-1),\ \mathrm{Opt}_{D \setminus \{d\}}(i-1)\}$
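42 The complete algorithm
- For i = 1 to n:
  - for each subset of colors $D \subseteq C$ and each $d \in D$:
    - $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{Opt}_{D,d}(i-1),\ \mathrm{Opt}_{D \setminus \{d\}}(i-1)\}$
    - $\mathrm{Opt}_D(i) = \min_{d \in D} \mathrm{Opt}_{D,d}(i)$
- Return $\mathrm{Opt}_C(n)$.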
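A runnable Python sketch of this recurrence for the uniform weight model, where cost(v, d) is 0 if d is v's original color and w(v) otherwise. Assumptions of the sketch rather than of the slides: color subsets D are encoded as bitmasks, the base cases are $\mathrm{Opt}_D(0) = 0$ for every D (empty prefix) and $\mathrm{Opt}_{D,d}(0) = \infty$, and the name `convex_recoloring_string` is hypothetical. A traceback over the stored tables recovers one optimal recoloring.

```python
import math


def convex_recoloring_string(colors, weights=None):
    """Exact O(n * n_c * 2**n_c) DP for min-cost convex recoloring of a string.

    colors[i]:  original color of v_{i+1}
    weights[i]: w(v_{i+1}); recoloring a vertex costs its weight (default 1)
    Returns (optimal cost, recolored list).
    """
    n = len(colors)
    if n == 0:
        return 0, []
    weights = weights or [1] * n
    palette = sorted(set(colors))
    k = len(palette)
    subsets = 1 << k                # subset D of colors encoded as a bitmask
    full = subsets - 1

    def cost(i, d):                 # cost(v_{i+1}, d) in the uniform weight model
        return 0 if colors[i] == palette[d] else weights[i]

    # opt_set[i][D] = Opt_D(i); opt_col[i][D][d] = Opt_{D,d}(i).
    # Base case i = 0: the empty prefix costs 0, and no vertex is colored d yet.
    opt_set = [[0] * subsets]
    opt_col = [[[math.inf] * k for _ in range(subsets)]]
    for i in range(1, n + 1):
        row_set = [math.inf] * subsets
        row_col = [[math.inf] * k for _ in range(subsets)]
        for D in range(subsets):
            for d in range(k):
                if not (D >> d) & 1:
                    continue
                # Either v_{i-1} is also colored d, or d's block starts at v_i.
                prev = min(opt_col[i - 1][D][d], opt_set[i - 1][D & ~(1 << d)])
                row_col[D][d] = cost(i - 1, d) + prev
                row_set[D] = min(row_set[D], row_col[D][d])
        opt_set.append(row_set)
        opt_col.append(row_col)

    # Trace back from Opt_C(n) to recover one optimal recoloring.
    D = full
    d = min(range(k), key=lambda c: opt_col[n][D][c])
    best = opt_col[n][D][d]
    result = [None] * n
    for i in range(n, 0, -1):
        result[i - 1] = palette[d]
        if i == 1:
            break
        if opt_col[i - 1][D][d] <= opt_set[i - 1][D & ~(1 << d)]:
            continue                # v_{i-1} keeps color d
        D &= ~(1 << d)              # d's block starts at v_i; d is never reused
        d = min(range(k), key=lambda c: opt_col[i - 1][D][c])
    return best, result


print(convex_recoloring_string(list("ababa")))
```

For the string ababa the sketch returns cost 2 together with a recoloring in which each color occupies one contiguous segment. Storing every row of the tables is only needed for the traceback; if just the optimal cost is wanted, keeping the previous row suffices.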
43 Complexity
- n: length of the string
- $n_c$: number of colors
- Time complexity: $O(n \cdot n_c \cdot 2^{n_c}) = \mathrm{poly}(n)\exp(n_c)$
44 A $\mathrm{poly}(n)\exp(n_c)$ algorithm for trees
Let T(v) be the subtree rooted at v.
$\mathrm{Opt}_{D,d}(v)$ = value of a min-cost convex recoloring of T(v) using colors from D, such that v is colored by d.
45 A $\mathrm{poly}(n)\exp(n_c)$ algorithm for trees
Recursive formula for computing $\mathrm{Opt}_{D,d}(v)$.
50 Running time: (number of vertices) × (time per vertex).
51 Reducing complexity
- Characters may have many colors ($n_c$ is large),
- but significant characters should have:
  - only a few bad colors ($\bar{n}_c$, the number of bad colors, is small);
  - only a few vertices that need recoloring (k is small).
52 Questions
- Can we bound k in terms of $\bar{n}_c$ (the number of bad colors)?
- Is there an algorithm that is efficient when any of these values is small?
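53 Answers
1. $k = \Omega(\bar{n}_c)$, where $\bar{n}_c$ is the number of bad colors.
2. For strings and totally colored trees: a $\mathrm{poly}(n)\exp(\bar{n}_c)$ algorithm.
3. For partially colored trees: a $\mathrm{poly}(n)\exp(\bar{n}_c)$ algorithm (with a stricter definition of $\bar{n}_c$).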
54 A $\mathrm{poly}(n)\exp(k)$ algorithm for the weighted (uniform) model, k = number of recolored vertices
Intuition: use variants of the dynamic programming algorithms to compute $\mathrm{Opt}_{D,d}(v)$, where D ranges only over subsets of the bad colors. This reduces the complexity to $\mathrm{poly}(n)\exp(\bar{n}_c)$, where $\bar{n}_c$ is the number of bad colors.
Then use the fact that $k = \Omega(\bar{n}_c)$.
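55 Case 1: strings
Observation: overwriting one vertex reduces $\bar{n}_c$ (the number of bad colors) by at most 2. Since a convex coloring has no bad colors, any convex recoloring overwrites at least $\bar{n}_c / 2$ vertices, so $k \ge \bar{n}_c / 2$.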
57 Observation: in an optimal convex recoloring of a string, a sequence of good blocks is either completely retained or completely overwritten by a single bad color.
58 Observation: in an optimal convex recoloring of a string, good blocks are either completely retained or completely overwritten by a single bad color.
[Figure: an optimal convex recoloring.]
59 A $\mathrm{poly}(n)\exp(k)$ algorithm for strings
Treat all good colors as a special single color, which is allowed to have more than one block in the output coloring.
60 A $\mathrm{poly}(n)\exp(k)$ algorithm for totally colored trees
As in strings, in a totally colored tree $k = \Omega(\bar{n}_c)$, where $\bar{n}_c$ is the number of bad colors: recoloring a single vertex reduces the number of bad colors by at most 2.
[Figure: a vertex v with neighbors $u_1, u_2, \dots, u_k$.]
(Bounding k in partially colored trees seems more tricky.)
61 Can we reduce the time complexity from $O(n \exp(n_c))$ to $O(n \exp(\bar{n}_c))$, where $\bar{n}_c$ is the number of bad colors?
62 Problem: good colors in trees do not behave as nicely as in strings.
The "completely retained or completely overwritten" property does not hold in trees.
Solution: use DP to compute an optimal cover rather than an optimal recoloring.
63 Weighted recoloring as a min-cost cover
- A cover of a weighted partially colored tree (T, C, w) is a set $X \subseteq V(T)$ such that erasing the colors of X's vertices makes C a convex partial coloring.
[Figure: $X = \{u, v\}$ is a cover.]
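The cover definition can be checked mechanically. The Python sketch below assumes an edge-list tree, a dict partial coloring (uncolored vertices absent or `None`), and a dict of vertex weights; all function names are hypothetical. It erases the colors of X, tests whether the carriers of the remaining colors are pairwise disjoint, and, for tiny inputs only, finds a minimum-weight cover by exhaustive search. That search is a stand-in for illustration, not the dynamic program referred to in the slides.

```python
from collections import defaultdict, deque
from itertools import combinations


def _carrier(adj, terminals):
    """Minimal subtree containing `terminals` (root at a terminal and keep
    every vertex whose rooted subtree contains a terminal)."""
    if not terminals:
        return set()
    root = next(iter(terminals))
    parent, order, queue = {root: None}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    marked = set()
    for v in reversed(order):           # children before parents
        if v in terminals or v in marked:
            marked.add(v)
            if parent[v] is not None:
                marked.add(parent[v])
    return marked


def is_cover(edges, coloring, X):
    """True if erasing the colors of X leaves a convex partial coloring,
    i.e. the carriers of the remaining colors are pairwise disjoint."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    by_color = defaultdict(set)
    for v, c in coloring.items():
        if c is not None and v not in X:
            by_color[c].add(v)
    used = set()
    for members in by_color.values():
        car = _carrier(adj, members)
        if car & used:
            return False
        used |= car
    return True


def min_weight_cover(edges, coloring, weight):
    """Exhaustive minimum-weight cover; exponential, tiny inputs only.
    Only colored vertices are tried: erasing an uncolored vertex changes nothing."""
    colored = [v for v, c in coloring.items() if c is not None]
    best = None
    for r in range(len(colored) + 1):
        for X in combinations(colored, r):
            w = sum(weight[v] for v in X)
            if (best is None or w < best[0]) and is_cover(edges, coloring, set(X)):
                best = (w, set(X))
    return best


path = [(1, 2), (2, 3), (3, 4)]
col = {1: "a", 2: "b", 3: "a", 4: "b"}
print(min_weight_cover(path, col, {1: 1, 2: 5, 3: 5, 4: 1}))  # (2, {1, 4})
```

On this path, {2} alone is also a cover (erasing vertex 2 lets it be completed with color a), but with these weights the cheaper cover is {1, 4}.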
64 Weighted recoloring as a min-cost cover
- Input: a weighted partially colored tree (T, C, w).
- Output: a minimum-weight cover X.
- Very involved dynamic programming.
65 Weighted recoloring as a min-cost cover
- Claim:
  - If C′ is a convex recoloring of C, then $X_C(C')$, the set of vertices recolored by C′, is a cover of (T, C).
  - If X is a cover of (T, C), then there is a convex coloring C′ such that $X_C(C') \subseteq X$.
66
- In the uniform cost model, the cover $X_C(C')$ determines the cost of the recoloring C′.
This enables a dynamic program that keeps track only of the bad colors and treats the good colors one at a time.
67
- Again, we root the tree.
- For each subset D of bad colors and each color $d \in D \cup \{\mathrm{nil}\}$:
  - $\mathrm{Opt}_{D,d}(v)$ = value of a min-cost convex cover of T(v) that uses only bad colors from D and can be completed to a convex coloring in which v is colored by d (if d = nil, there is no such coloring).
[Figure: an example vertex v with $\mathrm{Opt}_{\ldots,\mathrm{nil}}(v) = 3$.]
69 Approximation results for the uniform cost model
Polynomial-time algorithms for:
- 2-approximation for strings
- 3-approximation for trees
70 2-approximation for convex recoloring of strings via penalties
[Figure: a colored string C and a convex recoloring C′.]
- Penalty of a blue block in a recoloring = (# non-blue vertices in the block) + (# blue vertices outside the block)
- In the example: 2 + 1 = 3.
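71 Penalty of a convex recoloring C′
- $\mathrm{penalty}_C(C')$ = the sum of the penalties of its blocks, one block per color (in the example, 1 + 2 + 2 + 3 = 8).
- Observation: for any convex recoloring C′, $\mathrm{penalty}_C(C') \le 2\,\mathrm{cost}_C(C')$.
- (Note: empty blocks are also penalized!)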
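A Python sketch of $\mathrm{penalty}_C(C')$ for strings, assuming C and C′ are equal-length sequences and C′ is convex (the names `penalty` and `cost` are ours). In this sketch each recolored vertex is counted at most twice: once for its old color (it lies outside that color's block) and once for its new color if that color occurs in C (it is a foreign vertex inside that block). Unchanged vertices are never counted, which is the intuition behind the observation.

```python
def penalty(original, recolored):
    """penalty_C(C') of a convex recoloring of a string.

    For each color c of the original coloring C, the c-block of C' is the
    (possibly empty) interval of positions colored c in C'. Its penalty is
    (# originally non-c vertices inside it) + (# originally c vertices outside it).
    """
    n = len(original)
    total = 0
    for c in set(original):
        positions = [i for i in range(n) if recolored[i] == c]
        # (0, -1) encodes the empty block: nothing inside, everything outside.
        lo, hi = (positions[0], positions[-1]) if positions else (0, -1)
        inside = sum(original[i] != c for i in range(lo, hi + 1))
        outside = sum(original[i] == c for i in range(n) if not lo <= i <= hi)
        total += inside + outside
    return total


def cost(original, recolored):
    """Number of recolored vertices (uniform unit-cost model)."""
    return sum(a != b for a, b in zip(original, recolored))


for new in ("aaabb", "aaaaa"):
    print(new, penalty("ababa", new), 2 * cost("ababa", new))
# prints "aaabb 4 4", then "aaaaa 4 4"
```

In the second recoloring the b-block is empty, and it is still penalized for the two b vertices it leaves outside.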
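72 Minimizing penalties
- Denote by $s_{i,j}$ the substring from position i to j.
- Given a colored string (S, C), let $\mathrm{penalty}^*(C) = \sum_{c} \min_{s} \mathrm{penalty}_c(s)$, where s ranges over all substrings $s_{i,j}$ and the empty block, and $\mathrm{penalty}_c(s)$ is the penalty of s as the block of color c.
- Clearly $\mathrm{penalty}^*(C) \le \mathrm{penalty}_C(C')$ for any convex coloring C′.
- Problem: there is not necessarily a recoloring that realizes $\mathrm{penalty}^*(C)$.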
73 Step 1: find, for each color, a segment that minimizes its penalty
- This is the block that maximizes (# blue vertices) - (# non-blue vertices).
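The equivalence follows from a one-line computation. Writing $N_c$ for the number of c-colored vertices in the whole string, and $\#_c(s)$, $\#_{\neg c}(s)$ for the numbers of c-colored and other vertices in a block s (notation introduced here for the derivation):

```latex
\mathrm{penalty}_c(s)
  = \underbrace{\#_{\neg c}(s)}_{\text{non-}c\text{ inside}}
  + \underbrace{\bigl(N_c - \#_c(s)\bigr)}_{c\text{ outside}}
  = N_c - \bigl(\#_c(s) - \#_{\neg c}(s)\bigr)
```

Since $N_c$ does not depend on the block, minimizing the penalty is the same as maximizing $\#_c(s) - \#_{\neg c}(s)$; the empty block has difference 0 and penalty $N_c$.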
74 Step 1: find the minimum-penalty blocks, that is, for every color, the block that minimizes its penalty.
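Step 1 reduces to a maximum-sum segment problem on the sequence that is +1 at c-colored positions and -1 elsewhere, so Kadane's algorithm finds each color's block in O(n) time, consistent with the $O(n_c n)$ bound stated for this step. The Python sketch below illustrates only that reduction (the name `min_penalty_blocks` and the 0-based (i, j) output are assumptions); Step 2 is not included.

```python
def min_penalty_blocks(s):
    """Step 1: for every color c, a block s[i..j] with minimum penalty.

    Maximizes (#c) - (#non-c) over segments with Kadane's algorithm:
    O(n) per color, O(n_c * n) overall. The penalty is then N_c minus that
    maximum. Returns {c: ((i, j), penalty)} with 0-based inclusive indices.
    """
    result = {}
    for c in sorted(set(s)):
        total_c = s.count(c)
        best_sum, best_block = 0, None   # the empty block scores 0
        run_sum, run_start = 0, 0
        for j, x in enumerate(s):
            if run_sum <= 0:             # a non-positive prefix never helps
                run_sum, run_start = 0, j
            run_sum += 1 if x == c else -1
            if run_sum > best_sum:
                best_sum, best_block = run_sum, (run_start, j)
        result[c] = (best_block, total_c - best_sum)
    return result


print(min_penalty_blocks("aabab"))
# {'a': ((0, 1), 1), 'b': ((2, 2), 1)}
```

Because every color of the string occurs at least once, a single occurrence already scores 1, so the returned block is never the empty one here.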
75 Step 2: scan from left to right; keep assigning a color as long as you neither exit its block nor enter a block of a different color.
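76 Analysis
- Suppose a blue vertex was recolored outside the blue block. Then the blue color paid for it in $\mathrm{penalty}_C$.
- Suppose a blue vertex was recolored inside the blue block. Then the recoloring color paid for it in $\mathrm{penalty}_C$.
- Conclusion: the cost of the output is at most the total penalty of the blocks chosen in Step 1, and hence at most $2\,\mathrm{Opt}(S, C)$.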
77 Time complexity
- $O(n_c n)$ for finding the minimum-penalty blocks
- $O(n)$ for computing the approximate recoloring
78 3-approximation for convex recoloring of trees
- Solve the corresponding min-weight cover problem:
- given a weighted partially colored tree (T, C, w), find a minimum-weight cover X.
[Figure: colorings C and C′ with vertices u and v marked; $w(\{u, v\}) = 2$.]
79 Recursive approximation algorithm
- If the set of vertices of zero weight is a cover, take X to be this set and return.
- Reduce the input (T, C, w) to (T′, C′, w′), where T′ has fewer vertices of positive weight.
- Find (recursively) a cover X′ that is a 3-approximation for (T′, C′, w′).
- Use X′ to construct a cover X that is a 3-approximation for (T, C, w), and return.
- (Based on the local ratio technique [BE].)
80 Reduction type 0: removable subtrees
- Remove leaf carriers that do not intersect any other carrier.
81 Reduction of type I (local ratio)
- If there is a triplet x, y, z as below, all of weight > 0:
- subtract $\varepsilon = \min\{w(x), w(y), w(z)\}$ from each of their weights (one must become 0).
- Any cover pays at least $\varepsilon$; we pay $3\varepsilon$.
- Approximation ratio: 3.
82 Reduction of type II (local ratio)
- If there is a vertex v in 3 carriers:
- every cover contains at least two vertices out of the six; we take all six (ratio 6/2 = 3).
[Figure: vertex v lying in three carriers.]
83 Reduction type 3: none of cases 0, I, or II applies
[Figure: the tree split into $T_H$ and $T_L$.]
- Root the tree.
- The lowest carrier (blue) intersects only one other carrier (violet).
- Let $T_L$ be the subtree rooted at the root of the blue carrier; $T_L$ has at least 3 vertices.
- Let $T_H = T - T_L$.
84 Three colorings of $T_L$
- In any optimal recoloring of T, $T_L$ may have one of three forms, depending on the coloring of $T_H$.
[Figure: $T_H$, the vertex s, and $T_L$.]
85 Three colorings of $T_L$
- $C_{\mathrm{HIGH}}$: if the coloring has a violet block on $T_H$ disjoint from s, then $T_L$ is totally blue.
86 Three colorings of $T_L$
- $C_{\mathrm{MED}}$: if the coloring has a violet block on $T_H$ containing s, then $T_L$ has a minimum-cost coloring that is either totally blue or blue/violet with a violet root.
87 Three colorings of $T_L$
- $C_{\mathrm{MIN}}$: if the coloring has no violet block on $T_H$, then it has an optimal blue/violet coloring on $T_L$.
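88 Replace $T_L$ by T′, which encodes the differences of the costs of the 3 colorings into 2 colored vertices u and v:
[Figure: $T_L$ replaced by T′.]
- $w(u) = C_{\mathrm{MED}} - C_{\mathrm{MIN}}$
- $w(v) = C_{\mathrm{HIGH}} - C_{\mathrm{MIN}}$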
89 Reduction type 3: replace $T_L$ by T′
[Figure: $T_H$ with $T_L$ (before) and $T_H$ with T′ (after).]
Claim: the new tree has a cover of weight k if and only if the old one has a cover of weight $C_{\mathrm{MIN}} + k$.
90 Time complexity
- At most n iterations.
- Each iteration: $O(n_c n)$.
- Total: $O(n_c n^2)$.
91 Convexity decision problem on general graphs
92 Further Research
- Improve running times of the algorithms.
- Algorithm for partially colored trees.
- Algorithms for the non-uniform cost model.
- Improve approximation ratios.
- Optimal convex recoloring of general graphs: motivations, results.
93 Convexity decision problem