Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky - PowerPoint PPT Presentation

1
Homomorphism Mapping in Metabolic Pathways
  • Qiong Cheng, Dipendra Kaur, Robert Harrison,
    Alexander Zelikovsky
  • Department of Computer Science, Georgia State University

Dec. 1, 2007, RECOMB Satellite Conference on
Systems Biology 2007
2
Outline
  • Concept of metabolic pathway comparison
  • Enzyme similarity
  • Graph mappings: embeddings and homomorphisms
  • Min cost homomorphism problem for trees
    • Optimal DP algorithm for trees
  • Min cost homomorphism problem for arbitrary
    graphs
    • Minimum feedback vertex set (MFVS)
  • Searching metabolic networks for
    • pathway motifs
    • pathway holes
  • Web tool
    • Architecture and brief interface
  • Future work

3
Metabolic pathway and pathways model
  • Metabolic pathway
  • Metabolic pathways model

4
Comparison of metabolic pathways
  • Enzyme similarity and pathway topology together
    represent the similarity of pathway functionality.
  • Enzyme Similarity
  • Pathway topology
  • Similarity

5
Related work
  • Linear topology
    (Forst & Schulten 1999; Chen & Hofestaedt 2004)
  • Tree topology
    (Pinter 2005: $O(|V_G|^2 |V_T| / \log|V_G| + |V_G||V_T| \log|V_T|)$)
  • Arbitrary topology
    Mapping: linear pattern → graph (Kelly et al. 2004: $O(|V_T|^{i+2} |V_G|^2)$)
    Exhaustive search (Sharan et al. 2005: $O(i!)$, $O(|V_T|^{i+2} |V_G|^2)$; Yang et al. 2007: $O(2^{|V_G|} |V_G|^2)$)
6
Enzyme mapping cost
  • EC (Enzyme Commission) notation: enzyme D = d1.d2.d3.d4
  • Measure the enzyme similarity score Δ by the lowest
    common upper class distribution
  • Measure Δ by the tight reaction property

Enzyme X = x1.x2.x3.x4
Enzyme Y = y1.y2.y3.y4
Δ[X, Y] = 1
Δ[X, Y] = 10
Δ[X, Y] = ∞ otherwise
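
As a rough illustration of the "lowest common upper class" idea, the sketch below counts how many leading EC fields two enzymes share. The function name and the treatment of unspecified fields written as "-" are assumptions made for this example; the mapping from the shared level to a cost Δ is deliberately left out, since the slide's conditions are not given here.

```python
def common_ec_level(ec_x: str, ec_y: str) -> int:
    """Return how many leading EC fields (0..4) two EC numbers share.

    Assumption: a field written as "-" (unspecified) never counts as a match.
    """
    level = 0
    for a, b in zip(ec_x.split("."), ec_y.split(".")):
        if a != b or a == "-":
            break
        level += 1
    return level
```

For instance, two enzymes that differ only in the fourth field share three levels, while enzymes from different top-level classes share none.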
7
Graph mappings: embeddings and homomorphisms
  • Isomorphism
  • Isomorphic embedding
  • Homeomorphic embedding
  • Homomorphism

Homomorphism $f: T \to G$, with $f_v: V_T \to V_G$ and $f_e: E_T \to$ paths of G
Edge-to-path cost: $\lambda (|f_e(e)| - 1)$

Homomorphism cost: $\lambda \sum_{e \in E_T} (|f_e(e)| - 1)$
We allow different enzymes to be mapped to the same enzyme.
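
The gap part of the homomorphism cost can be sketched as follows. This is a minimal example, assuming the length of an edge image is measured in text edges (hops), so that an edge mapped to a single text edge costs nothing. The function name and the dictionary layout are hypothetical.

```python
def homomorphism_gap_cost(edge_paths, gap_penalty):
    """Sum gap_penalty * (path length - 1) over all pattern edges.

    edge_paths maps each pattern edge to the list of text vertices
    visited by its image path, endpoints included (an assumed layout).
    """
    total = 0
    for path in edge_paths.values():
        hops = len(path) - 1  # number of text edges on the image path
        total += gap_penalty * (hops - 1)
    return total
```

With a penalty of 2, a pattern edge mapped to a two-hop path contributes 2, and an edge mapped to a single text edge contributes 0.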
8
Min cost homomorphism of a multi-source tree to an
arbitrary graph
  • A multi-source tree is a directed graph whose
    underlying undirected graph is a tree.
  • Given a multi-source tree $T = \langle V_T, E_T \rangle$ (pattern)
    and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text),
  • find a min cost homomorphism of the multi-source tree
    to the arbitrary graph, $f: T \to G$

9
Preprocessing of text graph
The transitive closure of G is the graph $G^* = (V, E^*)$,
where $E^* = \{(i, j) : \text{there is an } i\text{-}j \text{ path in } G\}$
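
One way to obtain the transitive closure together with hop counts is a breadth-first search from every text vertex. The sketch below is an illustration under assumed names (`hop_distances`, an adjacency-list dictionary), not the tool's actual preprocessing code. It does not record a pair (v, v), even when v lies on a cycle.

```python
from collections import deque

def hop_distances(graph):
    """Map every reachable ordered pair (i, j), i != j, to its minimum hop count.

    graph: dict mapping a vertex to a list of its successors.
    """
    hops = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for target, d in dist.items():
            if d > 0:
                hops[(source, target)] = d
    return hops
```

The keys of the returned dictionary are exactly the edges of the transitive closure (self-pairs excluded), and the values are the hop counts usable as gap lengths.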
10
Pattern graph ordering
  • Construct the ordered pattern T
    • DFS traversal
    • Process in the opposite order

Ordered pattern T
  • Each edge e_i in T is the unique edge connecting
    v_i with the previous vertices in the order
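
A minimal sketch of such an ordering: a depth-first traversal over the underlying undirected tree, in which every vertex after the first is joined to the earlier vertices by exactly one edge (the one to its DFS parent). The function name and the choice of root are assumptions for illustration.

```python
def dfs_order(tree_edges, root):
    """Return (order, parent) for a DFS over the underlying undirected tree.

    tree_edges: list of directed (tail, head) pairs whose underlying
    undirected graph is a tree. Edge directions are ignored here.
    """
    neighbors = {}
    for a, b in tree_edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    order, parent = [], {root: None}
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in neighbors.get(v, []):
            if w not in parent:
                parent[w] = v
                stack.append(w)
    return order, parent
```

Processing `reversed(order)` then visits every vertex after all vertices that come later in the order, which is the opposite-order processing the slide refers to.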

11
DP table
DT[a, u_j]: cost of the min cost homomorphism mapping from T's subgraph
induced by the previous vertices in the order into G, with a mapped to u_j
12
Filling DP table
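  • Recursive function

$$DT[i, j] = \begin{cases}
\Delta(v_i, u_j) & \text{if } v_i \text{ is a leaf in } T \\
\Delta(v_i, u_j) + \sum_{l=1}^{|adj(v_i)|} \min_{1 \le j_l \le |V_G|} C[i_l, j_l] & \text{otherwise}
\end{cases}$$

$C[i_l, j_l] = DT[i_l, j_l] + \lambda (h(j, j_l) - 1)$
$\lambda$ is the penalty for gaps
$h(j, j_l)$ is the number of hops between $u_j$ and $u_{j_l}$ in G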
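
The recursion can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: it assumes the pattern is a rooted tree with every edge directed from parent to child (a general multi-source tree would also need edges pointing toward the root), and it takes the hop counts of the text graph as a precomputed dictionary. All names are hypothetical.

```python
import math

def min_cost_tree_homomorphism(children, root, text_vertices, delta, hops, gap_penalty):
    """Return (min cost, DT table) for mapping a rooted pattern tree into a text graph.

    children: dict pattern vertex -> list of child vertices (edges parent -> child)
    delta:    function (pattern vertex, text vertex) -> enzyme mapping cost
    hops:     dict (u, w) -> hop count of a shortest u-w path in the text graph
    """
    # Preorder traversal; its reverse visits children before their parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    table = {}
    for v in reversed(order):
        for u in text_vertices:
            cost = delta(v, u)  # a leaf contributes only its mapping cost
            for child in children.get(v, []):
                best = math.inf
                for w in text_vertices:
                    h = hops.get((u, w))
                    if h is not None:  # the pattern edge needs a u-w path
                        best = min(best, table[(child, w)] + gap_penalty * (h - 1))
                cost += best
            table[(v, u)] = cost
    return min(table[(root, u)] for u in text_vertices), table
```

An entry of infinity in the table means the corresponding subtree cannot be mapped with that image, because some pattern edge has no path to follow in the text graph.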
13
Runtime Analysis for mapping trees
  • Transitive closure takes $O(|V_G||E_G|)$.
  • Pattern graph ordering takes $O(|V_T| + |E_T|)$.
  • Dynamic programming

- Calculating the min contribution of all child pairs
of a node pair $(v_i \in T, u_j \in G)$ takes $t_{ij} = \deg_T(v_i) \deg_G(u_j)$
- Filling DT takes $\sum_{j=1}^{|V_G|} \sum_{i=1}^{|V_T|} t_{ij} = \sum_{j=1}^{|V_G|} \deg_G(u_j) \sum_{i=1}^{|V_T|} \deg_T(v_i) = 2|E_G||E_T|$
The total runtime for mapping trees is
$O(|V_G||E_G| + |V_G||V_T|)$.
14
MFVS
  • Minimum feedback vertex set (MFVS)
    • Given an undirected graph G = (V, E) and a
      nonnegative weight function w on V
    • Find a minimum weight subset of V whose removal
      leaves an acyclic graph.
  • Bad news: the MFVS problem is NP-complete.
  • Good news: 2-approximation
  • Greedy algorithm
    • Delete degree 1/0 vertices from V and set the
      remaining vertices to V'
    • MFVS ← ∅
    • while V' ≠ ∅ do
      • pick the set S of maximal degree
        vertices and remove it from V'
      • MFVS ← MFVS ∪ S
      • Delete degree 1/0 vertices from V'
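
A minimal sketch of the greedy heuristic, under two simplifying assumptions that differ from the slide: vertex weights are ignored, and a single maximal-degree vertex is removed per round instead of the whole set S. The function name is hypothetical, and no approximation guarantee is claimed for this variant.

```python
def greedy_feedback_vertex_set(edges):
    """Return a vertex set whose removal leaves an acyclic (forest) graph.

    edges: iterable of undirected (a, b) pairs of a simple graph (no self-loops).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def remove(v):
        for w in adj.pop(v):
            adj[w].discard(v)

    def prune():
        # Repeatedly delete vertices of degree 0 or 1; they lie on no cycle.
        low = [v for v in adj if len(adj[v]) <= 1]
        while low:
            v = low.pop()
            if v in adj:
                nbrs = list(adj[v])
                remove(v)
                low.extend(w for w in nbrs if len(adj[w]) <= 1)

    fvs = set()
    prune()
    while adj:
        v = max(adj, key=lambda x: len(adj[x]))  # one maximal-degree vertex
        fvs.add(v)
        remove(v)
        prune()
    return fvs
```

On two triangles that share a vertex, the shared vertex has the highest degree, so removing it alone already breaks both cycles.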

15
Min cost homomorphism of arbitrary graphs
  • Given an arbitrary graph $P = \langle V_P, E_P \rangle$ (pattern)
    and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text),
  • find a min cost homomorphism $f: P \to G$
  • Algorithm
    • Find a minimum feedback vertex set F(P) of P
    • Construct a multi-source tree $P' = \langle V_P - F(P), E_P(V_P - F(P)) \rangle$
    • for every sub-mapping $f'_v: F(P) \to V_G$ do
      • obtain the min cost homomorphism of the multi-source
        tree P' to the arbitrary graph G under sub-mapping
        $f'_v$
    • choose the min cost homomorphism over all sub-mappings
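
The outer loop over sub-mappings can be sketched as an enumeration of all assignments of text vertices to the feedback vertices. In this illustration, `solve_tree` is a hypothetical callback standing in for the tree algorithm run under a fixed sub-mapping; it is an assumption of the example, not part of the original tool.

```python
from itertools import product

def best_over_fvs_mappings(fvs, text_vertices, solve_tree):
    """Try every assignment of text vertices to FVS vertices and keep the cheapest.

    solve_tree(fixed) is assumed to return the min cost of mapping the
    remaining tree when the FVS vertices have the images given in `fixed`.
    """
    fvs = list(fvs)
    best_cost, best_fixed = float("inf"), None
    for images in product(text_vertices, repeat=len(fvs)):
        fixed = dict(zip(fvs, images))
        cost = solve_tree(fixed)
        if cost < best_cost:
            best_cost, best_fixed = cost, fixed
    return best_cost, best_fixed
```

The loop runs once per assignment, that is, the number of text vertices raised to the size of the feedback vertex set, which is why a small feedback vertex set matters.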

16
Runtime Analysis for mapping arbitrary graphs
  • Finding a min feedback vertex set takes $O(|V_P| + |E_P|)$
  • $O(|V_G|^{|F(P)|})$ possible mappings for the MFVS
  • Finding the min homomorphism mapping of a multi-source
    tree to an arbitrary graph takes $O(|V_G||E_G| + |V_G||V_T|)$.

The total runtime is $O(|V_G|^{|F(P)|} (|V_G||E_G| + |V_G||V_T|))$.
17
Statistical significance
  • Random degree-conserved graph generation
    • Reshuffle nodes
    • Reshuffle edges
  • Randomized P-value computation
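
One common way to reshuffle edges while conserving degrees is the pairwise edge swap, followed by an empirical p-value over the costs obtained on the randomized graphs. The sketch below is an assumption about how such a test could look: the function names, the directed edge-list layout, and the add-one correction in the p-value are choices made for this example.

```python
import random

def reshuffle_edges(edges, swaps, rng):
    """Degree-preserving randomization of a directed edge list by edge swaps.

    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    vertex keeps its in-degree and out-degree.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

def empirical_p_value(observed_cost, random_costs):
    """Share of random costs at least as low as the observed one (add-one corrected)."""
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return (hits + 1) / (len(random_costs) + 1)
```

In use, the mapping cost would be recomputed on many reshuffled text graphs and passed as `random_costs`; a small p-value then suggests the observed mapping is unlikely to arise by chance.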

18
Experiments and applications
  • All-against-all mappings among S. cerevisiae, B.
    subtilis, T. thermophilus, and E. coli
    Hallobacterium
  • Identifying conserved pathways
    • 24 pathways that are conserved across all 4
      species
    • 18 more pathways that are conserved across at
      least three of these species
  • Resolving ambiguity
  • Discovering pathway holes

19
Mappings with cycles
20
Resolving Ambiguity
21
Pathway holes
  • Check whether there is such an enzyme in the pattern
  • Find the closest protein in the same group
  • If the identity is high (> 80%), we expect a good
    filling
  • Align to the previous and next enzymes: their functions
    may be taken over

22
Filling pathway holes
23
Web Service Architecture
24
Web Interface
26
Future work
  • Approximation algorithm to handle the
    comparison of general graphs
  • Mining protein interaction networks
  • Discovery of critical elements or modules based
    on graph comparison
  • Discovery of evolutionary relations of organisms by
    pathway comparison of different organisms at
    different time points
  • Integration with genome database

27
Reference
  • R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, and
    M. Ziv-Ukelson. Alignment of metabolic
    pathways. Bioinformatics 21(16):3401-8, Aug 2005.
    LNCS 3109, Springer-Verlag.
  • S. Wernicke. Combinatorial Algorithms to
    Cope with the Complexity of Biological Networks.
    Dissertation, December 2006.
  • J. Ellson, E. Gansner, E. Koutsofios, S. North,
    and G. Woodhull. Graphviz and dynagraph: static
    and dynamic graph drawing tools. In M. Junger and
    P. Mutzel, editors, Graph Drawing Software, pages
    127-148. Springer-Verlag, 2003.
  • X. Yan and J. Han. gSpan: Graph-based substructure
    pattern mining. In ICDM, pages 721-724, 2002.
  • N. Ketkar, L. Holder, D. Cook, R. Shah, and J.
    Coble. Subdue: Compression-based Frequent Pattern
    Discovery in Graph Data. Proceedings of the ACM
    KDD Workshop on Open-Source Data Mining, August
    2005.
  • K. Borgwardt, S. Bottger, and H. Kriegel. VGM: visual
    graph mining. Proceedings of the
    2006 ACM SIGMOD International Conference on
    Management of Data.
  • Q. Cheng, D. Kaur, R. Harrison, and A.
    Zelikovsky. Mapping and Filling Metabolic
    Pathways. RECOMB Satellite Conference on
    Systems Biology 2007.
  • Q. Cheng, R. Harrison, and A. Zelikovsky.
    Homomorphisms of Multisource Trees into Networks with
    Applications to Metabolic Pathways. Proc. of
    IEEE 7th International Symposium on
    BioInformatics and BioEngineering (BIBE'07).

28
Question?
Thanks!
29
Handling Cycles
  • Sort the pattern so that children can
    communicate only through their parent
  • Fix images for some pattern vertices →
    interrupt communication through cycles
  • Feedback vertex set F(T): $V_T - F(T)$ is acyclic
  • Runtime is increased by a factor of $O(|V_G|^{|F(T)|})$
  • t(v) = number of reasonable text images of v
  • $\prod t(v) \to \min$, equivalently $\sum \log(t(v)) \to \min$
  • 2-approximation algorithm

30
Software architecture of the service-oriented pathway
mining tool
[Figure: architecture diagram. Labels: Services Container (Ambiguity pairs, Potential holes, Rule based mining, AI, DB); Pathway Modeling Comparison; Storage Indexing; Data-Control-View; Browsers; PDC; SW; Visualized Outputs; Additional Value Service; Simulation]