An Algorithm for the Consecutive Ones Property - PowerPoint PPT Presentation

About This Presentation
Title:

An Algorithm for the Consecutive Ones Property

Description:

Title: The consecutive ones property (CP1) Author: Unit applicativa di informatica medica e telemedi Last modified by: Unit applicativa di informatica medica e ... – PowerPoint PPT presentation

Slides: 44
Provided by: Unita

Transcript and Presenter's Notes

Title: An Algorithm for the Consecutive Ones Property


1
An Algorithm for the Consecutive Ones Property
Claudio Eccher
2
Outline
  • C1P definition
  • Biological background
  • Hybridization mapping
  • An algorithm for the C1P problem
  • Dividing in components
  • Taking care of a component
  • Joining the components together

3
The consecutive ones property
Definition: A binary matrix is said to have
the consecutive ones property (C1P) if a
permutation of its columns can be found such that
all 1s in each row are consecutive
    A B C D
1   1 0 0 1
2   0 1 0 1
3   1 0 1 0
Reordering the columns as C A D B makes the 1s of every row consecutive:
    C A D B
1   0 1 1 0
2   0 0 1 1
3   1 1 0 0
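The definition can be checked directly on small matrices by trying every column order. The sketch below is only a brute-force illustration of the definition (exponential in the number of columns), not the algorithm presented in these slides; the function names are made up for the example.

```python
from itertools import permutations

def ones_are_consecutive(row):
    """True if the 1s of the row form one contiguous block (or the row has no 1s)."""
    positions = [i for i, bit in enumerate(row) if bit == 1]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)

def find_c1p_order(matrix):
    """Return a column order making every row's 1s consecutive, or None if none exists."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(ones_are_consecutive([row[c] for c in order]) for row in matrix):
            return order
    return None

labels = "ABCD"
slide_matrix = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

order = find_c1p_order(slide_matrix)
print([labels[c] for c in order])  # one valid order; C A D B is another

# The order shown on the slide (C A D B) also passes the check.
cadb = [labels.index(x) for x in "CADB"]
print(all(ones_are_consecutive([row[c] for c in cadb]) for row in slide_matrix))
```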
4
The consecutive ones property
Observation: the C1P is closed under taking
submatrices
A bad matrix:
    C A D
1   0 1 1
2   1 0 1
3   1 1 0
Whichever column x is put in the middle, there is a
row in which x is 0 and the other two columns are 1,
so the 1s of that row cannot be consecutive
Hence, every matrix containing this submatrix is
bad
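Both claims can be confirmed by exhaustive search on small inputs. In the sketch below, `has_c1p` is a hypothetical brute-force helper and `bigger` is an invented matrix that contains the bad matrix as a submatrix; neither comes from the slides.

```python
from itertools import permutations

def has_c1p(matrix):
    """Brute-force C1P test: try every column order (small matrices only)."""
    def consecutive(bits):
        ones = [i for i, b in enumerate(bits) if b]
        return not ones or ones[-1] - ones[0] + 1 == len(ones)
    return any(
        all(consecutive([row[c] for c in order]) for row in matrix)
        for order in permutations(range(len(matrix[0])))
    )

bad = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

# Invented example: the bad matrix with one extra column and one extra row.
bigger = [row + [1] for row in bad] + [[1, 0, 0, 0]]

print(has_c1p(bad))     # False
print(has_c1p(bigger))  # False
```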
5
Hybridization mapping (1)
  • Copies of a DNA molecule are broken into several
    fragments (10^4 bases) and replicated by cloning
    (clones)
  • The possible binding of small sequences (probes)
    to a clone is checked; the subset of the probes
    bound (hybridized) to a clone becomes its
    fingerprint
  • Clone overlaps, and thus the clones' relative order,
    are determined by comparing fingerprints

6
Hybridization mapping (2)
Two clones sharing part of their respective
fingerprints are likely to have come from
overlapping DNA regions
[Figure: Clone 1 and Clone 2, with probes A, D, C, B]
7
Assumptions
  • Probes are unique
  • There are no errors
  • All clone × probe hybridization experiments
    have been done

8
Model
  • n clones and m probes
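The matrix M used on the following slides is not spelled out in this transcript. A minimal sketch, assuming M has one row per clone and one column per probe, with M[i][k] = 1 when probe k is in the fingerprint of clone i (consistent with the later definition of Si); the clone names and fingerprints are invented for illustration.

```python
def build_hybridization_matrix(fingerprints, probes):
    """Return (clone_names, M), where M[i][k] is 1 iff probe k is in clone i's fingerprint."""
    clone_names = sorted(fingerprints)
    matrix = [
        [1 if probe in fingerprints[clone] else 0 for probe in probes]
        for clone in clone_names
    ]
    return clone_names, matrix

# Hypothetical fingerprints: the set of probes that hybridized to each clone.
fingerprints = {
    "clone1": {"A", "D"},
    "clone2": {"B", "D"},
    "clone3": {"A", "C"},
}
probes = ["A", "B", "C", "D"]

names, M = build_hybridization_matrix(fingerprints, probes)
for name, row in zip(names, M):
    print(name, row)
```

With these invented fingerprints the result is the 3 × 4 matrix from the C1P definition slide.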

9
Problem
Obtaining a physical map from M
10
An algorithm for the C1P problem
  • The problem belongs to P
  • The algorithm is from Fulkerson and Gross (1965)

11
Algorithm sketch
Separation of the rows into components (subsets
of rows)
Permutation of the columns of each component
Join of the components together
12
Row relations
Definition: ∀ row i ∈ M, Si = {columns k | Mi,k = 1}
  • Given two rows i and j
  • Si ∩ Sj = ∅ or
  • Si ⊆ Sj or Sj ⊆ Si or
  • Si ∩ Sj ≠ ∅ and neither of them is a subset of the
    other
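The three cases map directly onto set operations. The sketch below uses 1-based column indices and three rows of the example matrix that appears later in the slides; the relation names returned by `relation` are labels chosen for this example.

```python
def row_set(row):
    """S_i: the 1-based indices of the columns where the row holds a 1."""
    return {k + 1 for k, bit in enumerate(row) if bit == 1}

def relation(s_i, s_j):
    """Classify two row sets as 'disjoint', 'contained' or 'overlap'."""
    if not (s_i & s_j):
        return "disjoint"
    if s_i <= s_j or s_j <= s_i:
        return "contained"
    return "overlap"  # non-empty intersection, neither is a subset of the other

l1 = row_set([1, 1, 0, 1, 1, 0, 1, 0, 1])
l2 = row_set([0, 1, 1, 1, 1, 1, 1, 1, 1])
l3 = row_set([0, 1, 0, 1, 1, 0, 1, 0, 1])
l4 = row_set([0, 0, 1, 0, 0, 0, 0, 1, 0])

print(relation(l1, l2))  # overlap
print(relation(l3, l1))  # contained
print(relation(l4, l3))  # disjoint
```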

13
Dividing in components (1)
Let's initially lump together in the same
component the rows with nonempty intersection
14
Dividing in components (2)
The components we want are the connected
components of Gc
15
Building Gc: an example
     c1 c2 c3 c4 c5 c6 c7 c8 c9
l1    1  1  0  1  1  0  1  0  1
l2    0  1  1  1  1  1  1  1  1
l3    0  1  0  1  1  0  1  0  1
l4    0  0  1  0  0  0  0  1  0
l5    0  0  1  0  0  1  0  0  0
l6    0  0  0  1  0  0  1  0  0
l7    0  1  0  0  0  0  1  0  0
l8    0  0  0  1  1  0  0  0  1
[Figure: Gc with edge (l1, l2); l1 and l2 form component α]
16
Building Gc: an example (continued)
[Same matrix as on the previous slide]
[Figure: Gc with edge (l4, l5) added; component α holds l1 and l2, component γ holds l4 and l5]
17
Building Gc: an example (continued)
[Same matrix as on the previous slide]
[Figure: Gc with edge (l6, l7) added; component α (l1, l2) and component δ (l6, l7, l8)]
18
Building Gc: an example (continued)
[Same matrix as on the previous slide]
[Figure: Gc with edge (l6, l8) added; component α (l1, l2) and component δ (l6, l7, l8)]
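The definition of Gc did not survive in this transcript. The sketch below assumes that two rows are joined by an edge when their column sets intersect and neither contains the other, which is consistent with the four edges shown in the example. The quadratic pair scan is chosen for clarity, not to match the running time quoted later.

```python
MATRIX = [
    [1, 1, 0, 1, 1, 0, 1, 0, 1],  # l1
    [0, 1, 1, 1, 1, 1, 1, 1, 1],  # l2
    [0, 1, 0, 1, 1, 0, 1, 0, 1],  # l3
    [0, 0, 1, 0, 0, 0, 0, 1, 0],  # l4
    [0, 0, 1, 0, 0, 1, 0, 0, 0],  # l5
    [0, 0, 0, 1, 0, 0, 1, 0, 0],  # l6
    [0, 1, 0, 0, 0, 0, 1, 0, 0],  # l7
    [0, 0, 0, 1, 1, 0, 0, 0, 1],  # l8
]

def strictly_overlap(a, b):
    """Assumed edge rule: non-empty intersection and no containment either way."""
    return bool(a & b) and not a <= b and not b <= a

def components(matrix):
    """Connected components of Gc, as sorted lists of 1-based row numbers."""
    sets = [{k for k, bit in enumerate(row) if bit} for row in matrix]
    n = len(sets)
    adjacent = {
        i: [j for j in range(n) if j != i and strictly_overlap(sets[i], sets[j])]
        for i in range(n)
    }
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            v = stack.pop()
            component.append(v + 1)
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(sorted(component))
    return result

print(components(MATRIX))  # [[1, 2], [3], [4, 5], [6, 7, 8]]
```

Under this assumption, row l3 forms a component of its own because it is contained in every row it intersects.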
19
Taking care of a component (1)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
The 1s of the first row must be placed
consecutively. The possible solutions can be
represented as follows:
         2,7,8  2,7,8  2,7,8
l1    0    1      1      1     0
The second row is adjacent to the first one.
Hence, for the second row (l2) there are 2
choices: its 1s can be placed to the left or to
the right of those of row l1. In any case the
direction does not really matter
            5    2,7   2,7    8
l1    0     0     1     1     1     0
l2    0     1     1     1     0     0
20
Taking care of a component (2)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
For the third row (l3) we have to consider the
relations with the rows connected by edges to l3
Let's place l3 with respect to l2: we cannot
freely choose the direction (left or right)
because of the relation of l3 with l1
To take into account the relation between l1 and
l3, it is necessary to consider the number of
elements in the intersections between S1, S2 and
S3
21
Taking care of a component (3)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
Definition: Let x · y = |Sx ∩ Sy| be the internal
product of rows x and y
If we have equality, it isn't possible to have the
1s of l3 consecutive
22
Taking care of a component (4)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
For l3: S3 = {1,4,7,8}, l1 · l3 = 2, l1 · l2 = 2,
l2 · l3 = 1, so l3 has to be put to the right of
l2
            5     2     7     8    1,4   1,4
l1    0     0     1     1     1     0     0     0
l2    0     1     1     1     0     0     0     0
l3    0     0     0     1     1     1     1     0
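The numbers above can be reproduced with a few set intersections. The direction rule in the sketch is the comparison used by procedure Place later in these slides (u · w < min(u · v, v · w)); the function names and the "left"/"right" strings are illustrative choices.

```python
# Column sets of the three rows of the component (1-based column indices).
S = {
    1: {2, 7, 8},
    2: {2, 5, 7},
    3: {1, 4, 7, 8},
}

def dot(x, y):
    """Internal product of rows x and y: the size of the intersection of their sets."""
    return len(S[x] & S[y])

def direction_for(u, v, w, v_w_direction):
    """Side on which u goes relative to v, given the side v took relative to w."""
    opposite = {"left": "right", "right": "left"}
    if dot(u, w) < min(dot(u, v), dot(v, w)):
        return v_w_direction
    return opposite[v_w_direction]

print(dot(1, 3), dot(1, 2), dot(2, 3))  # 2 2 1
# l2 was placed to the left of l1; l3 is placed with respect to l2.
print(direction_for(3, 2, 1, "left"))   # right
```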
23
Taking care of a component (5)
The only choice made was in the placement of l2
with respect to l1 and both possibilities result
in the same solutions up to reversal.
24
String generator
We have seen the following examples of string
generators:
2,7,8
5 2,7 8
5 2 7 8 1,4
A permutation π of the probes is compatible with
a string generator if, whenever A, B, C appear in
this order in π and A and C are in a group G,
B is also included in G
An invariant of the algorithm is that, after
considering rows 1..k, a permutation π
certifies the C1P of the submatrix on rows
1..k iff either π or its reversal is compatible
with the string generator
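One way to make a string generator concrete is to enumerate the column orders it describes. The sketch below assumes a representation that the slides do not define: an ordered list of groups, where the columns inside a group may appear in any order. It then checks that every generated order, and its reversal, keeps the 1s of rows l1 and l2 consecutive.

```python
from itertools import permutations, product

def expand(generator):
    """Yield every column order a string generator describes (reversals not included)."""
    per_group = [list(permutations(group)) for group in generator]
    for choice in product(*per_group):
        yield [col for block in choice for col in block]

def rows_consecutive(order, row_sets):
    """True if every row's columns occupy a contiguous block of `order`."""
    position = {col: i for i, col in enumerate(order)}
    for cols in row_sets:
        spots = sorted(position[c] for c in cols)
        if spots[-1] - spots[0] + 1 != len(spots):
            return False
    return True

generator = [[5], [2, 7], [8]]   # the example "5 2,7 8"
rows = [{2, 7, 8}, {2, 5, 7}]    # S1 and S2

orders = list(expand(generator))
print(orders)  # [[5, 2, 7, 8], [5, 7, 2, 8]]
print(all(rows_consecutive(o, rows) and rows_consecutive(o[::-1], rows)
          for o in orders))  # True
```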
25
Taking care of a component: a bad component
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 1 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
The relations between the rows are the same as
in the preceding component
            5    2,7   8,3
l1    0     0     1     1     0
l2    0     1     1     0     0
            5     2     7     8     3    1,4   1,4
l1    0     0     1     1     1     1     0     0
l2    0     1     1     1     0     0     0     0
l3    0     0     0     1     1     0     1     1
Column 3 now separates the 1s of l3, so they are not consecutive
26
Taking care of a component (6)
For a new row k in the same component, find two
previously placed rows i and j such that (k,i) and
(i,j) are edges of Gc, and proceed as for the three-row
case. Also check the consistency with the
solution generator
The algorithm gives all possible permutations of
a component having the C1P, up to reversal
27
Algorithm implementation
Construct Gc and traverse it using depth-first
search
When visiting a vertex invoke procedure Place
Algorithm Place
  input: u, v, w vertices of Gc(V,E) s.t. (u,v) ∈ E and (v,w) ∈ E
  output: A placement for row u, if possible
  if v = nil and w = nil then
    Place all 1s of u consecutively
  else if w = nil then
    Left- or right-place the 1s of u with respect to the 1s of v
    Record direction used
  else if u · w < min(u · v, v · w) then
    Place u with respect to v in the same direction used in v, w placement
    Record direction used
  else
    Place u with respect to v in the opposite direction used in v, w placement
    Record direction used
  Check consistency of column set
If column sets are not consistent then the
component doesn't have the C1P
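The traversal that feeds procedure Place can be sketched as a depth-first search that hands each new row u its already placed neighbour v and, when one exists, a placed neighbour w of v. How w is chosen is an assumption here (the first placed neighbour of v); the slides only require that (u,v) and (v,w) are edges of Gc.

```python
def place_calls(adjacency, root):
    """Return the (u, v, w) argument triples in the order a DFS would issue them."""
    placed, calls = [], []

    def visit(u, v):
        # w: some row already placed that is adjacent to v (None if there is none yet).
        w = None
        if v is not None:
            w = next((x for x in placed if x != v and x in adjacency[v]), None)
        calls.append((u, v, w))
        placed.append(u)
        for neighbour in adjacency[u]:
            if neighbour not in placed:
                visit(neighbour, u)

    visit(root, None)
    return calls

# Three-row component from the preceding slides: every pair of rows is adjacent.
triangle = {"l1": ["l2", "l3"], "l2": ["l1", "l3"], "l3": ["l1", "l2"]}
print(place_calls(triangle, "l1"))
# [('l1', None, None), ('l2', 'l1', None), ('l3', 'l2', 'l1')]

# Component with edges (l6, l7) and (l6, l8) from the Gc example.
star = {"l6": ["l7", "l8"], "l7": ["l6"], "l8": ["l6"]}
print(place_calls(star, "l6"))
# [('l6', None, None), ('l7', 'l6', None), ('l8', 'l6', 'l7')]
```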
28
Algorithm running time
For an n × m matrix, building the graph Gc takes O(nm)
time
Checking consistency of column sets requires O(m)
time per row, and there are n rows to process
Total time is thus O(nm)
29
Joining components together (1)
GM tells us how the components of M fit together
30
GM for the example matrix
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: the graph GM, with the components α, β, γ, δ as vertices]
31
Joining components together (2)
For two sets Si ∈ β, Sj ∈ α, if Si ⊆ Sj then there
is no row k ∈ α s.t. Si ⊄ Sk and Si ∩ Sk ≠ ∅
The exact same containment and disjointness relations hold
for all other sets from β
GM is acyclic
32
Joining components together (3)
The joining of components depends on the way sets
in one component contain or are contained in sets
from other components
Components having sets not contained anywhere
else should be processed first
Containment is specified by the directed edges in
GM
33
Joining components together (4)
GM has to be processed in topological order
Remove all sources from GM (e.g. α) and make the
union of their string generators
While GM is not empty take the next source β,
remove β from GM, and refine the current string
generator with the string generator of β
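Processing sources one after another is a topological sort. The sketch below shows the source-removal loop on a small directed acyclic graph. The edge list is an assumption made for illustration, since the figure of GM is not preserved in this transcript, and the refinement of string generators is not modelled.

```python
from collections import deque

def topological_order(nodes, edges):
    """Repeatedly remove a source (in-degree 0) and return the removal order."""
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for s in successors[v]:
            indegree[s] -= 1
            if indegree[s] == 0:  # s has become a source
                queue.append(s)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order

nodes = ["alpha", "beta", "gamma", "delta"]
# Assumed containment edges, for illustration only.
edges = [("alpha", "beta"), ("alpha", "gamma"), ("beta", "delta")]

print(topological_order(nodes, edges))  # ['alpha', 'beta', 'gamma', 'delta']
```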
34
Example (1)
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: the graph GM on the components α, β, γ, δ]
One topological order is α, β, γ, δ
35
Example (2)
Component α:
      1       2,4,5,7,9         3,6,8
l1    1   1   1   1   1   1   0   0   0
l2    0   1   1   1   1   1   1   1   1
Component β:
          2,4,5,7,9
l3    1   1   1   1   1
Component δ:
       9,5    4   7   2
l6    0   0   1   1   0
l7    0   0   0   1   1
l8    1   1   1   0   0
Component γ:
      6   3   8
l4    0   1   1
l5    1   1   0
36
Example (3)
      1       2,4,5,7,9         3,6,8
l1    1   1   1   1   1   1   0   0   0
l2    0   1   1   1   1   1   1   1   1
l3    0   1   1   1   1   1   0   0   0
       9,5    4   7   2
l6    0   0   1   1   0
l7    0   0   0   1   1
l8    1   1   1   0   0
37
Example (4)
      1    9,5    4   7   2     3,6,8
l1    1   1   1   1   1   1   0   0   0
l2    0   1   1   1   1   1   1   1   1
l3    0   1   1   1   1   1   0   0   0
l6    0   0   0   1   1   0   0   0   0
l7    0   0   0   0   1   1   0   0   0
l8    0   1   1   1   0   0   0   0   0
      6   3   8
l4    0   1   1
l5    1   1   0
38
Example (5)
      1    9,5    4   7   2   6   3   8
l1    1   1   1   1   1   1   0   0   0
l2    0   1   1   1   1   1   1   1   1
l3    0   1   1   1   1   1   0   0   0
l6    0   0   0   1   1   0   0   0   0
l7    0   0   0   0   1   1   0   0   0
l8    0   1   1   1   0   0   0   0   0
l4    0   0   0   0   0   0   0   1   1
l5    0   0   0   0   0   0   1   1   0
In this particular case there are two solutions
corresponding to the permutation of identical
columns (5 and 9)
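Identical columns can be found by grouping the columns of the matrix by their contents. A minimal sketch on the example matrix; the helper name is made up for the example.

```python
from collections import defaultdict

EXAMPLE = [
    [1, 1, 0, 1, 1, 0, 1, 0, 1],  # l1
    [0, 1, 1, 1, 1, 1, 1, 1, 1],  # l2
    [0, 1, 0, 1, 1, 0, 1, 0, 1],  # l3
    [0, 0, 1, 0, 0, 0, 0, 1, 0],  # l4
    [0, 0, 1, 0, 0, 1, 0, 0, 0],  # l5
    [0, 0, 0, 1, 0, 0, 1, 0, 0],  # l6
    [0, 1, 0, 0, 0, 0, 1, 0, 0],  # l7
    [0, 0, 0, 1, 1, 0, 0, 0, 1],  # l8
]

def identical_column_groups(matrix):
    """Return the groups of 1-based column indices whose columns are equal."""
    groups = defaultdict(list)
    for k in range(len(matrix[0])):
        column = tuple(row[k] for row in matrix)
        groups[column].append(k + 1)
    return [cols for cols in groups.values() if len(cols) > 1]

print(identical_column_groups(EXAMPLE))  # [[5, 9]]
```

Swapping the columns of such a group never changes any row, which is why both orders are solutions.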
39
Algorithm solution is not unique
In general multiple solutions may exist because
  • Each component may on its own have several
    solutions
  • Each solution can be used in two ways: the
    permutation and its reversal

40
Algorithm running time
Topological sorting of GM takes time O(nm)
If the entries of M are preprocessed the queries
needed for traversing GM can take constant time
Preprocessing takes at most O(nm)
Total time for processing each component ci is
O(n_i m)
Algorithm running time is O(nm)
41
Concluding remarks (1)
Even if a C1P permutation exists, this is not
necessarily the true permutation
  • The solution is not unique
  • In general errors do exist, so the true
    permutation is not the C1P one

42
Concluding remarks (2)
Generalizations to account for errors yield
NP-hard problems
Also relaxing the assumption of unique probes
yields NP-hard problems
43
Related works
A considerably more complicated algorithm by
Booth and Lueker (1976) exists that takes
O(n + m + r) time (r is the total number of 1s)
Quite recently a simple O(n + m + r)-time algorithm
has been presented by Hsu, J. Algorithms 43
(2002), no. 1, 1-16