Convex Recoloring of Strings and Trees - PowerPoint PPT Presentation

Provided by: rlem6
1
Convex Recoloring of Strings and Trees
  • Shlomo Moran Sagi Snir Technion, Israel
    Institute of Technology

2
Phylogenetic Trees
  • Represent evolutionary history.
  • Leaves represent living species.
  • Internal nodes represent extinct species.
  • Edges represent evolutionary relationship.

3
Evolutionary (Phylogenetic) tree
Source: Alberts et al.
4
Characters in Species
  • A (discrete) character is a property which
    distinguishes between species (e.g. form of
    movement)
  • A character state is a value of the character
    (e.g. crawling).

5
Characters in Phylogenetic Trees should avoid
reversal transitions
  • A species regains a state its direct ancestor has
    lost.
  • Famous examples:
    Teeth in birds.
    Legs in snakes.

6
and also avoid convergence transitions
  • Two species possess the same state while their
    least common ancestor possesses a different
    state.
  • Famous example: the marsupials.

7
(No Transcript)
8
  • Assumption: Significant characters exhibit
    neither reversals nor convergence (that is, they
    are homoplasy-free).

9
Graph-theoretic terminology
  • Species = Vertices
  • Character = Coloring
  • States = Colors

10
Is this a reasonable evolutionary tree (of a
significant character)?






11
  • Definition: A character is convex on a (given)
    tree if, in the corresponding coloring, each
    color induces a block.
















12












Usually, only a partial coloring of the tree is
given. A partial coloring is convex if it can be
completed to a (total) convex coloring.
13
Alternative definition for convex (partial or
total) coloring
  • A d-carrier is the minimal subtree containing all
    vertices colored d.

A partial coloring is convex if all d-carriers
are pairwise disjoint.
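The carrier-based definition translates directly into a check. The following is a minimal sketch, not taken from the slides: it assumes the tree is given as an adjacency dict `adj` and the (possibly partial) coloring as a dict `coloring` that simply omits uncolored vertices. The function names `carrier` and `is_convex` are illustrative.

```python
def carrier(adj, coloring, d):
    """Vertices of the minimal subtree containing every vertex colored d.

    Computed by repeatedly pruning leaves that are not colored d.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] <= 1 and coloring.get(v) != d]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # a vertex may have been pushed twice
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and coloring.get(u) != d:
                    stack.append(u)
    return alive


def is_convex(adj, coloring):
    """True if the carriers of distinct colors are pairwise disjoint."""
    claimed = set()
    for d in set(coloring.values()):
        c = carrier(adj, coloring, d)
        if claimed & c:
            return False
        claimed |= c
    return True


# A star with center 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(is_convex(star, {1: "a", 2: "a", 3: "b"}))          # one carrier uses the center
print(is_convex(star, {1: "a", 2: "a", 3: "b", 4: "b"}))  # both carriers need the center
```

This sketch runs in O(n) per color; it favors clarity over speed.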
14
The Perfect Phylogeny Problem
  • Input: a set of species and many characters,
    each assigning states (colors) to the species.
  • Question: Is there a tree T containing the
    species as vertices, in which all the characters
    (colorings) are convex?

15
The Perfect Phylogeny Problem (combinatorial
setting)
Input: Some colorings $(C_1, \dots, C_k)$ of a set of
vertices (in the example, 3 colorings: left,
center, right, each by the same two colors).
Problem: Is there a tree T which includes these
vertices, s.t. $(T, C_i)$ is convex for $i = 1, \dots, k$?
NP-hard in general; in P for some special cases.
16
Assumption: Significant characters are convex on
the tree.
Goal: Assign scores to (partial) colorings which
measure their distance from convexity on the
given tree.
17

Some relevant parameters (for totally colored
trees):
  • $n_c$ = number of colors (5 in the tree below)
  • $n_b$ = number of blocks (monochromatic subtrees): 7
18

Bad Colors and Blocks
A color is bad, and its blocks are bad, if it
has more than a single block.
  • number of bad colors: 2
  • number of bad blocks: 4
19

Common score: the parsimony score, $n_b - 1 = 5$
(Fitch71 and Sankoff75 present efficient
algorithms that complete leaf colorings to total
colorings with minimum parsimony score).
Another natural known score (Fernandez-Baca,
Lagergren03): violations, $n_b - n_c = 2$.
Shortcoming: These scores don't measure the
distance to a desired convex tree.
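The parameters and scores above can be computed for any totally colored tree by merging endpoints of monochromatic edges. This is an illustrative sketch only; the function name `block_statistics`, the edge-list input, and the returned dictionary keys are assumptions and do not come from the slides.

```python
from collections import Counter


def block_statistics(edges, coloring):
    """Count colors, blocks, bad colors/blocks, and the two scores
    (parsimony n_b - 1, violations n_b - n_c) for a totally colored tree."""
    parent = {v: v for v in coloring}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Vertices joined by a monochromatic edge belong to the same block.
    for u, v in edges:
        if coloring[u] == coloring[v]:
            parent[find(u)] = find(v)

    blocks_per_color = Counter(coloring[v] for v in coloring if find(v) == v)
    n_c = len(blocks_per_color)
    n_b = sum(blocks_per_color.values())
    bad = sorted(c for c, b in blocks_per_color.items() if b > 1)
    return {
        "n_c": n_c,
        "n_b": n_b,
        "bad_colors": bad,
        "bad_blocks": sum(blocks_per_color[c] for c in bad),
        "parsimony": n_b - 1,
        "violations": n_b - n_c,
    }


# A path colored a, b, a, c: color a has two blocks, so it is bad.
stats = block_statistics([(0, 1), (1, 2), (2, 3)],
                         {0: "a", 1: "b", 2: "a", 3: "c"})
print(stats)
```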
20
Parsimony Illness
  • Score of a colored tree (T,C) studied here:
  • The minimum cost of converting (T,C) to a
    convex colored tree $(T,C')$.

21
Cost 1: # of recolored vertices
Input: (T, C)
Convex recoloring: $C'$
$\mathrm{cost}_C(C') = 2$
  • Opt(T,C): The minimum number of vertices that
    need to be recolored to change C into a convex
    coloring.

22
Cost 2: # of recolored blocks
Input: (T, C)
Convex recoloring: $C'$
  • Opt(T,C): The minimum number of blocks that need
    to be recolored to change C into a convex coloring.

23
Motivation
  • The phylogenetic tree T is given.
  • A new character is introduced, which assigns
    states to some species.
  • Opt(T,C): the minimum number of exceptional
    species (or exceptional mutations), relative to a
    closest convex coloring.

24
Refinements
  • Uniform Weighted Recoloring: Each vertex has a
    weight w(v), and the cost is the total weight of
    the recolored vertices.
  • Non-Uniform Weighted Recoloring: The cost of
    coloring vertex v by color d is given by an
    arbitrary nonnegative value cost(v,d).
  • The cost of the recoloring $C'$ is
    $\sum_{v \in V(T)} \mathrm{cost}(v, C'(v))$.
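The two cost models can be compared on a small example. The sketch below is illustrative only: colorings are assumed to be dicts from vertices to colors, and the names `uniform_weighted_cost` and `nonuniform_cost` are hypothetical. The uniform weighted model is the special case where cost(v,d) is 0 when d is the original color of v and w(v) otherwise.

```python
def uniform_weighted_cost(original, recolored, weight):
    """Total weight of the vertices whose color changed."""
    return sum(weight[v] for v in original if recolored[v] != original[v])


def nonuniform_cost(recolored, cost):
    """Sum of cost[v][d] over all vertices v, where d is the new color of v."""
    return sum(cost[v][recolored[v]] for v in recolored)


original = {1: "a", 2: "b", 3: "a"}
recolored = {1: "a", 2: "a", 3: "a"}
weight = {1: 2, 2: 5, 3: 3}

# The uniform weighted model expressed as a non-uniform cost table.
table = {v: {d: (0 if d == original[v] else weight[v]) for d in "ab"}
         for v in original}

print(uniform_weighted_cost(original, recolored, weight))
print(nonuniform_cost(recolored, table))
```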

25
Algorithmic Challenges
  • Compute efficiently the minimum cost of a convex
    recoloring of a given colored tree (T,C).
  • Or at least decide efficiently whether (T,C) is
    nearly convex.

26
Complexity of computing optimal recoloring of
(T,C) on n vertices
  • NP-hard for all models.
  • Can be computed in poly(n)·exp($n_c$) time
    (efficient for a small number of colors).
  • In the uniform weight model:
  • can be computed in poly(n)·exp(k) time,
    where k = # of vertices in an optimal cover
    (efficient for a small number of color changes);
  • can be approximated in poly(n) time.

27
Optimal convex recoloring is NP-Hard even for
strings
  • Input: A colored string (S,C) and an integer k.
  • Question: Is there a convex recoloring $C'$ of C
    such that $\mathrm{cost}_C(C') \le k$?
  • Theorem: Minimum convex recoloring of strings is
    NP-hard.
  • Reduction from 3-SAT.

28
Proof sketch
  • Given a CNF formula F, with m clauses on n vars,
    we construct a colored string composed of two
    segment types: informative segments, separated by
    junk segments, each of length k+1.

29
Proof sketch
Informative segments, type 1: Clause segments.
$C_i = (\neg x \vee y \vee z)$ is composed of A triplets
$d_{\neg x,i}, d_{y,i}, d_{z,i}$.
30
Proof sketch
$d_{\neg x,i}, d_{y,i}, d_{z,i}$
Claim 1: A convex recoloring of a clause segment
of small cost must use one of the literals' colors.
31
Proof sketch
$d_{y,i}$
Claim 1: A convex recoloring of a clause segment
of cost k must use one of the literals' colors.
32
Proof sketch
Informative segments, types 2 and 3: For each
variable x, an x-segment and a ¬x-segment.
Blocks are of length B >> A.
33
Proof sketch
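(Figure: the ¬x-segment with blocks $d_{\neg x,1}, \dots, d_{\neg x,m}$
and the x-segment with blocks $d_{x,1}, \dots, d_{x,m}$.)
The ¬x-segment contains blocks of all the $d_{\neg x,i}$
colors, separated by blocks of the x color
(black). The x-segment contains blocks of all the
$d_{x,i}$ colors, separated by blocks of the x color (black).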
34
Proof sketch
cost Bm
¬x
x
cost B(m1)
Claim 2: A convex recoloring of cost k must
overwrite the x color in exactly one of the
segments.
Claim 3: Only the colors of the other segment can
be used for recoloring the clause segments.
(End of proof; details in paper.)
35
Poly(n)·exp($n_c$) algorithm for optimal convex
recoloring of strings
Input: a colored weighted string (S,C,w).
Output: an optimal convex recoloring of C.
Method: scan from left to right, change colors
when needed.
36
For each subset of colors $D \subseteq C$ and for each
$d \in D$: $\mathrm{Opt}_{D,d}(i)$ = minimal cost of a convex
recoloring of $v_1, \dots, v_i$ by colors from D, so that
$v_i$ is colored by d.
Input: (S,C)
38
Dynamic Programming
  • For i = 1 to n
  • For each subset of colors $D \subseteq C$ and for each
    $d \in D$
  • compute $\mathrm{Opt}_{D,d}(i)$

39
  • $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \dots$

(Figure: positions i-1 and i of the string.)
40
Optimal recoloring of $v_1, \dots, v_{i-1}$ by colors from
D, so that $v_{i-1}$ is colored by d:
  • $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{Opt}_{D,d}(i-1), \dots\}$

41
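Optimal recoloring of $v_1, \dots, v_{i-1}$ by colors from
$D \setminus \{d\}$:
  • $\mathrm{Opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{Opt}_{D,d}(i-1),\ \mathrm{Opt}_{D \setminus \{d\}}(i-1)\}$

(Figure: positions i-1 and i of the string.)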
42
The complete algorithm
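  • For i = 1 to n
  • For each subset of colors $D \subseteq C$ and for each
    $d \in D$
  • $\mathrm{opt}_{D,d}(i) = \mathrm{cost}(v_i, d) + \min\{\mathrm{opt}_{D,d}(i-1),\ \mathrm{opt}_{D \setminus \{d\}}(i-1)\}$
  • $\mathrm{opt}_D(i) = \min_{d \in D} \mathrm{opt}_{D,d}(i)$

Return $\mathrm{opt}_C(n)$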
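The recurrence can be sketched with bitmask subsets. This is a minimal illustration under assumptions not stated on the slides: it uses the uniform weighted model (cost(v, d) is 0 if v already has color d, otherwise w(v)), treats the empty prefix as having cost 0, and returns only the optimal cost, not the recoloring itself. The function name `min_convex_recoloring_cost` is hypothetical.

```python
from math import inf


def min_convex_recoloring_cost(colors, weights=None):
    """Minimum weight of vertices to recolor so each color forms one block."""
    n = len(colors)
    if n == 0:
        return 0
    palette = sorted(set(colors))
    m = len(palette)
    w = weights if weights is not None else [1] * n
    size = 1 << m

    # prev[D][d]  ~ opt_{D,d}(i-1);  prev_set[D] ~ opt_D(i-1).
    # Base case (empty prefix): every table entry with d in D costs 0.
    prev = [[0 if (D >> d) & 1 else inf for d in range(m)] for D in range(size)]
    prev_set = [0] * size

    for i in range(n):
        cur = [[inf] * m for _ in range(size)]
        cur_set = [inf] * size
        for D in range(size):
            for d in range(m):
                if not (D >> d) & 1:
                    continue
                step = 0 if palette[d] == colors[i] else w[i]
                # Either extend the current d-block, or start it here,
                # in which case d must not appear earlier.
                best = min(prev[D][d], prev_set[D & ~(1 << d)])
                cur[D][d] = step + best
            cur_set[D] = min(cur[D])
        prev, prev_set = cur, cur_set

    return prev_set[size - 1]


print(min_convex_recoloring_cost("abcabc"))
```

The table has $2^{n_c} \cdot n_c$ entries per position, so this sketch is practical only for a small number of colors.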
43
Complexity
  • n = length of string
  • $n_c$ = number of colors
  • Time complexity: $O(n \cdot n_c \cdot 2^{n_c})$ = poly(n)·exp($n_c$)

44
Poly(n)exp(nc) Algorithm for Trees
Let T(v) be the subtree rooted at v.
$\mathrm{Opt}_{D,d}(v)$ = value of a min-cost convex recoloring
of T(v), using colors from D, so that v is
colored by d.
45
Poly(n)exp(nc) Algorithm for Trees
Recursive formula for computing optD,d(v)
49
(No Transcript)
50
(Number of vertices) × (time per vertex)
51
Reducing Complexity
  • Characters may have many colors ($n_c$ is large),
  • But significant characters should have
  • Only few bad colors ($n_c^*$ is small).
  • Only few vertices which need recoloring (k is
    small).

52
Questions
  • Can we estimate k by $n_c^*$?
  • Is there an algorithm which is efficient when any
    of these values is small?

53
Answers
  • 1. $k = \Omega(n_c^*)$.
  • 2. For strings and totally colored trees:
    poly(n)·exp($n_c^*$) algorithm.
  • 3. For partially colored trees: poly(n)·exp($n_c^*$)
    algorithm (stricter definition of $n_c^*$).

54
poly(n) exp(k) algorithm for the weighted
(uniform) model, k = # of recolored vertices
Intuition: Use variants of the dynamic
programming algorithms to compute $\mathrm{Opt}_{D,d}(v)$,
where D ranges only over subsets of the bad colors.
This reduces the complexity to poly(n)·exp($n_c^*$).
Then use the fact that $k = \Omega(n_c^*)$.
55
Case 1: strings
Observation: Overwriting one vertex reduces $n_c^*$
(the number of bad colors) by at most 2.
57
Observation: In an optimal convex recoloring of a
string, a sequence of good blocks is either
completely retained or completely overwritten by
a single bad color.
58
Observation: In an optimal convex recoloring of a
string, good blocks are either completely
retained or completely overwritten by a single
bad color.
(Figure: an optimal convex recoloring.)
59
Poly(n)exp(k) algorithm for strings
Treat all good colors as a special single color,
which is allowed to have more than one block in
the output coloring.
60
poly(n) exp(k) algorithm for totally colored
trees
As in strings, in a totally colored tree
$k = \Omega(n_c^*)$: recoloring a single vertex reduces the
number of bad colors by at most 2.
(Figure labels: v, u1, u2, u3, u4, ..., uk)
(Bounding k in partially colored trees seems more
tricky.)
61
Can we reduce the time complexity from O(n
exp($n_c$)) to O(n exp($n_c^*$))?
62
Problem: Good colors in trees do not behave as
nicely as in strings.
The "completely retained or completely
overwritten" property does not hold in trees.
Solution: Use DP to compute an optimal cover rather
than an optimal recoloring.
63
Weighted Recoloring as a Min-Cost Cover
  • A cover of a weighted partially colored tree
    (T,C,w) is a set $X \subseteq V(T)$ such that erasing the
    colors from the vertices of X makes C a convex
    partial coloring.

(Figure: $X = \{u, v\}$ is a cover.)
v
64
Weighted Recoloring as a Min-Cost Cover
  • Input: A weighted partially colored tree (T,C,w).
  • Output: A minimum weight cover X.
  • Very involved dynamic programming.

65
Weighted Recoloring as a Min-Cost Cover
  • Claim:
  • If $C'$ is a convex recoloring of C, then $X_C(C')$ is
    a cover of (T,C).
  • If X is a cover of (T,C), then there is a convex
    coloring $C'$ s.t. $X_C(C') \subseteq X$.

66
  • In the uniform cost model, the cover $X(C')$
    determines the cost of the recoloring $C'$.

This enables dynamic programming which keeps track
only of bad colors, and treats good ones one at a
time.
67
  • Again we root the tree.
  • For each subset of bad colors D, and for each
    color $d \in D \cup \{\mathrm{nil}\}$:
  • $\mathrm{Opt}_{D,d}(v)$ = value of a min-cost convex cover of
    T(v), which uses only bad colors from D, and can
    be completed to a convex coloring in which v is
    colored by d (if d = nil, there is no such
    coloring).

v
Opt nill (v)3
,
68
(No Transcript)
69
Approximation Results for the Uniform Cost Model
Polynomial time algorithms:
  • 2-approximation for strings
  • 3-approximation for trees
70
2-approximation for convex recoloring of strings
via penalties
  • Penalty of a blue block in a recoloring $C'$ =
    (# non-blue vertices in the block)
    + (# blue vertices outside the block)
    = 2 + 1 = 3

71

Penalty of a convex recoloring $C'$
  • $\mathrm{penalty}_C(C')$ = 1 + 2 + 2 + 3 = 8
  • Observation: For any convex recoloring $C'$,
    $\mathrm{penalty}_C(C') = 2\,\mathrm{cost}_C(C')$.
  • (Note: empty blocks are also penalized!)
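The relation between penalty and cost follows from a short counting argument. The derivation below is a sketch under assumptions not spelled out on the slide: unit vertex weights, total colorings, and the block of a color d in $C'$ taken to be $B_d = C'^{-1}(d)$ (possibly empty). The symbol $B_d$ is introduced here for illustration.

```latex
\begin{align*}
\mathrm{penalty}_C(C')
  &= \sum_{d} \Bigl( \bigl|\{v \in B_d : C(v) \neq d\}\bigr|
                   + \bigl|\{v \notin B_d : C(v) = d\}\bigr| \Bigr) \\
  &= \bigl|\{v : C(v) \neq C'(v)\}\bigr| + \bigl|\{v : C'(v) \neq C(v)\}\bigr| \\
  &= 2\,\mathrm{cost}_C(C').
\end{align*}
```

The first sum counts each recolored vertex once, in the block of its new color; the second counts it once, outside the block of its old color. A color that disappears entirely has an empty block, and all of its original vertices are charged to the second term, which is why empty blocks must be penalized.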

72
Minimizing Penalties
  • Denote by $s_{i,j}$ the substring from position i to j.
  • Given a colored string (S,C) let
  • Clearly for
    any convex coloring C.
  • Problem There is not necessarily a recoloring
    realizes .

73
Step 1: Find for each color a segment which
minimizes the penalty
  • This is the block which maximizes
  • (# blue vertices) - (# non-blue vertices)

74
Step 1: Find, for every color, the block that
minimizes its penalty.
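Step 1 amounts to a maximum-sum segment search with +1 for a vertex of the color and -1 for any other vertex. The sketch below covers Step 1 only and is an assumption about one way to implement it (a Kadane-style scan); the function name `min_penalty_block` and the half-open `(start, end)` convention are illustrative. An empty block is allowed, in which case the penalty is the number of vertices of that color.

```python
def min_penalty_block(colors, d):
    """Return (penalty, start, end) of a block colors[start:end] that
    minimizes (# non-d vertices inside) + (# d vertices outside)."""
    total_d = sum(1 for c in colors if c == d)
    best_gain, best = 0, (0, 0)  # the empty block has gain 0
    run, run_start = 0, 0
    for i, c in enumerate(colors):
        if run <= 0:
            # A non-positive prefix never helps; restart the segment here.
            run, run_start = 0, i
        run += 1 if c == d else -1
        if run > best_gain:
            best_gain, best = run, (run_start, i + 1)
    # penalty = total_d - ((# d inside) - (# non-d inside))
    return total_d - best_gain, best[0], best[1]


for color in "ab":
    print(color, min_penalty_block("aabaabbb", color))
```

Running the scan once per color gives the $O(n_c n)$ bound for finding all minimum-penalty blocks.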
75
Step 2: Scan from left to right, assign a color
as long as you don't exit its block or enter a
block of a different color.
76
Analysis
  • Suppose a blue vertex was recolored outside the
    blue block
  • Then the blue color paid for it in penaltyc.
  • Suppose a blue vertex was recolored inside the
    blue block
  • Then the recoloring color paid for it in
    penaltyc.
  • Conclusion

77
Time complexity
  • $O(n_c n)$ for finding the minimum-penalty blocks
  • O(n) for computing the approximate recoloring

78
3-approximation for convex recoloring of trees
  • Solve the corresponding min-weight cover problem
  • Given a weighted partially colored tree (T,C,w),
    find a minimum weight cover X.

(Figure: colorings C and C' with vertices u, v; $w(\{u, v\}) = 2$.)
79
Recursive Approximation Algorithm
  • If the set of vertices of zero weight is a cover,
    take X to be this set and return.
  • Reduce the input (T,C,w) to $(T',C',w')$, where in
    $T'$ there are fewer vertices of positive weight.
  • Find a cover $X'$ which is a 3-approximation for
    $(T',C',w')$.
  • Use $X'$ to construct a cover X which is a
    3-approximation for (T,C,w) and return.
  • (Based on the local ratio technique [BE].)

80
Reduction type 0: removable subtrees
  • Remove leaf carriers that do not intersect any
    other carrier.

81
Reduction of Type I (local ratio)
  • If there is a triplet x, y, z as below, all having
    weight > 0:
  • subtract $\varepsilon = \min\{w(x), w(y), w(z)\}$ from each
    (one must become 0).
  • Any cover contains weight at least $\varepsilon$. We take $3\varepsilon$.
  • Approximation ratio: 3.

82
Reduction of Type II (local ratio)
  • If there is a vertex v in 3 carriers
  • Every cover contains at least two vertices out of
    the six
  • We take all six.

v
  • Approximation ratio 3.

83
Reduction type 3: Not cases 0, I or II.
  • Root the tree.
  • The lowest carrier (blue) intersects only one
    other carrier (violet).
  • Let $T_L$ be the subtree rooted at the blue root.
    $T_L$ has at least 3 vertices.
  • Let $T_H$ be $T - T_L$.

(Figure labels: $T_H$, $T_L$)
84
Three colorings of $T_L$
  • In any optimal recoloring of T, $T_L$ may have one
    of three forms, depending on the coloring on $T_H$.

(Figure labels: $T_H$, s, $T_L$)
85
Three colorings of $T_L$
(Figure labels: $T_H$, s, $T_L$)
  • $C_{HIGH}$: if on $T_H$ it has a violet block disjoint
    from s, then $T_L$ is totally blue.

86
Three colorings of $T_L$
(Figure labels: $T_H$, s, $T_L$)
  • $C_{MED}$: if on $T_H$ it has a violet block containing
    s, then $T_L$ has a minimal cost coloring which is
    either totally blue or blue/violet in which the
    root is violet.

87
Three colorings of $T_L$
(Figure labels: $T_H$, s, $T_L$)
  • $C_{MIN}$: if on $T_H$ it has no violet block, then on $T_L$
    it has an optimal blue/violet coloring.

88
Replace $T_L$ by $T'$, which encodes the differences
of the costs of the 3 colorings into 2 colored
vertices u and v:
  • $w(u) = C_{MED} - C_{MIN}$
  • $w(v) = C_{HIGH} - C_{MIN}$
89
Reduction type 3: Replace $T_L$ by $T'$
(Figure: $T_H$ with $T_L$, and $T_H$ with $T'$.)
Claim: The new tree has a cover of weight k if
and only if the old one has a cover of weight
$C_{MIN} + k$.
90
Time complexity
  • At most n iterations
  • Each iteration: $O(n_c n)$.
  • Total: $O(n_c n^2)$.

91
Convexity decision problem on General Graphs
92
Further Research
  • Improve running times of the algorithms.
  • Algorithm for partially colored trees.
  • Algorithms for the non-uniform cost model.
  • Improve approximation ratios.
  • Optimal convex recoloring of general graphs:
    motivations, results.

93
Convexity decision problem