Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks

1
Efficient Algorithms for Detecting Signaling
Pathways in Protein Interaction Networks
  • Jacob Scott, Trey Ideker,
  • Richard M. Karp, Roded Sharan

RECOMB 2005
2
Outline
  • Motivation
  • Theoretical foundations
  • Biological extensions
  • Implementation
  • Validation techniques
  • Results from yeast

3
Motivation
  • Post-genomics: want to understand organisms'
    protein-protein interaction networks
  • Model network as a probabilistic graph, with edge
    weights representing probabilities
  • Interested in protein signaling cascades
  • Show up as simple paths in the graph
  • Want to find biologically interesting paths
    efficiently
  • Score paths, with high scores reflecting
    importance
  • Extensions of a fast graph algorithm
    (color-coding) provide speed
  • Baseline: automated modelling of signal
    transduction networks (Steffen et al., 2002)

4
Theoretical Foundation
  • Finding long, simple paths is NP-hard
  • Proof: reduce from TSP
  • Once we find these paths, want the best
    (lightest) ones
  • Need for paths to be simple is what drives
    hardness
  • Color-Coding is a randomized, dynamic-programming
    based algorithm for finding paths of fixed length
  • Developed by Alon et al. (1995)
  • Randomly color graph and require paths be
    colorful (exactly one vertex of each color)
  • Number of colors = path length (k)
  • A colorful path is always simple

5
Color-Coding
  • Colorful paths can be found with dynamic
    programming
  • Key point: a colorful path of length k contains a
    colorful path of length k-1
  • Store path information at each node for each
    subset of the k colors
  • Only $2^k$ color subsets, rather than $O(n^k)$ node
    subsets
  • Runtime is $O(2^k k m) \ll O(k n^k)$ for brute force
  • Space is $O(2^k n) \ll O(k n^k)$ for brute force
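The recurrence on this slide can be illustrated with a minimal Python sketch. It assumes an undirected graph stored as a dict from each vertex to its neighbors' edge weights (for example negative log probabilities), and it counts "length k" as k vertices. The names `best_colorful_path` and `color_coding` are hypothetical and are not taken from the authors' implementation. For readability, each table entry keeps the whole path, which uses more memory than the $O(2^k n)$ bound; a tighter version would store back-pointers instead.

```python
import random
from typing import Dict, List, Optional, Tuple

# Undirected graph: vertex -> {neighbor: edge weight}; lighter paths are better.
Graph = Dict[str, Dict[str, float]]
Result = Tuple[float, List[str]]


def best_colorful_path(graph: Graph, coloring: Dict[str, int], k: int) -> Optional[Result]:
    """Lightest colorful path with k vertices under one fixed coloring, or None.

    table[(v, S)] = (weight, path) of the lightest colorful path that ends at v
    and uses exactly the colors in bitmask S.
    """
    table: Dict[Tuple[str, int], Result] = {
        (v, 1 << coloring[v]): (0.0, [v]) for v in graph
    }
    for _ in range(k - 1):
        nxt: Dict[Tuple[str, int], Result] = {}
        for (u, colors), (weight, path) in table.items():
            for v, w_uv in graph[u].items():
                bit = 1 << coloring[v]
                if colors & bit:
                    continue  # v's color is already on the path: it would not stay colorful
                key = (v, colors | bit)
                cand = weight + w_uv
                if key not in nxt or cand < nxt[key][0]:
                    nxt[key] = (cand, path + [v])
        table = nxt
    return min(table.values(), key=lambda r: r[0]) if table else None


def color_coding(graph: Graph, k: int, trials: int, seed: int = 0) -> Optional[Result]:
    """Repeat with independent random k-colorings and keep the lightest path found."""
    rng = random.Random(seed)
    best: Optional[Result] = None
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in graph}
        found = best_colorful_path(graph, coloring, k)
        if found is not None and (best is None or found[0] < best[0]):
            best = found
    return best


# Toy graph: edges a-b (1.0), b-c (2.0), c-d (0.5), a-c (4.0).
toy: Graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 0.5, "a": 4.0},
    "d": {"c": 0.5},
}
print(color_coding(toy, k=3, trials=200))
```

On this toy graph the lightest 3-vertex simple path is b, c, d (in either direction) with weight 2.5. With 200 trials the sketch should find it, because each trial makes that path colorful with probability 3!/3^3 = 2/9.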

6
Coloring Example
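[Figure: colorings I and II of a toy graph]
  • Two different colorings on a toy graph, k = 3
  • W(A, RGB): weight of the lightest path that ends at
    A and uses colors R, G and B
  • In coloring I, W(A, RGB) is built C → BC → ABC
  • In coloring II, W(A, RGB) is built G → BG → ABG
  • ABC is not colorful in coloring II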

7
Monte Carlo Details
  • A colorful path is simple, but a simple path may
    not be colorful under a given coloring
  • Solution: run multiple independent trials
  • After one trial, a fixed simple path of length k is
    colorful with probability $k!/k^k > e^{-k}$
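The number of independent trials follows from a standard color-coding fact. Under a uniform random k-coloring, a fixed simple path with k vertices is colorful with probability $k!/k^k$, which exceeds $e^{-k}$. All N trials therefore miss it with probability $(1 - k!/k^k)^N$. The sketch below picks the smallest N that meets a target success probability, assuming independent uniform colorings. The helper names are hypothetical.

```python
import math


def colorful_probability(k: int) -> float:
    """Probability that a fixed k-vertex simple path is colorful under a uniform random k-coloring."""
    return math.factorial(k) / k ** k


def trials_needed(k: int, success: float) -> int:
    """Smallest N with 1 - (1 - p)^N >= success, where p = k!/k^k."""
    p = colorful_probability(k)
    if p >= 1.0:
        return 1  # k = 1: a single vertex is trivially colorful
    return math.ceil(math.log(1.0 - success) / math.log(1.0 - p))


print(trials_needed(8, 0.999))  # 2871
```

Because $k!/k^k > e^{-k}$, N grows roughly like $e^k \ln(1/\varepsilon)$. This is why runtimes climb steeply with path length.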

8
Adding Biology
  • Color-Coding gives an algorithmic basis, now
    introduce biologically motivated extensions
  • Can set the start or end of path by type
  • E.g. screening by Gene Ontology categories
  • Can force the inclusion of a protein on the path
    by giving it a unique color
  • Using counters, can specify path must contain
    between x and y proteins of a given type
  • Each counter multiplies the computational cost
    by a factor proportional to y
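The extensions on this slide can be sketched as small changes to the color-coding table. The sketch makes several assumptions: "type" is modeled as a plain set of protein names, start and end constraints filter the first and last vertex, and a counter is one extra key in the table state. To force a protein onto the path, it receives a color that no other vertex has; a colorful k-vertex path uses all k colors, so it must pass through that protein. All function names are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def coloring_with_forced(vertices: Iterable[str], k: int, forced: str,
                         rng: random.Random) -> Dict[str, int]:
    """Random coloring in which `forced` alone gets color k-1 (needs k >= 2).

    A colorful k-vertex path uses all k colors, so it must include `forced`.
    """
    coloring = {v: rng.randrange(k - 1) for v in vertices}
    coloring[forced] = k - 1
    return coloring


def best_path_with_counter(
    graph: Graph,
    coloring: Dict[str, int],
    k: int,
    starts: Iterable[str],
    ends: Set[str],
    typed: Set[str],
    x: int,
    y: int,
) -> Optional[float]:
    """Weight of the lightest colorful k-vertex path that starts in `starts`, ends in
    `ends` and contains between x and y proteins from `typed`, or None.

    The table is keyed by (vertex, color set, counter); the counter never exceeds y,
    so it enlarges the table by a factor of at most y + 1.
    """
    table: Dict[Tuple[str, int, int], float] = {}
    for v in starts:
        count = 1 if v in typed else 0
        if count <= y:
            table[(v, 1 << coloring[v], count)] = 0.0
    for _ in range(k - 1):
        nxt: Dict[Tuple[str, int, int], float] = {}
        for (u, colors, count), weight in table.items():
            for v, w_uv in graph[u].items():
                bit = 1 << coloring[v]
                new_count = count + (1 if v in typed else 0)
                if colors & bit or new_count > y:
                    continue  # not colorful, or already too many typed proteins
                key = (v, colors | bit, new_count)
                if weight + w_uv < nxt.get(key, math.inf):
                    nxt[key] = weight + w_uv
        table = nxt
    finals = [w for (v, _, count), w in table.items() if v in ends and count >= x]
    return min(finals) if finals else None


toy: Graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 0.5, "a": 4.0},
    "d": {"c": 0.5},
}
fixed = {"a": 0, "b": 1, "c": 2, "d": 0}
everyone = set(toy)
print(best_path_with_counter(toy, fixed, 3, everyone, everyone, {"a"}, 1, 1))  # 3.0 (a-b-c)
print(best_path_with_counter(toy, fixed, 3, everyone, everyone, {"a"}, 0, 0))  # 2.5 (b-c-d)
```

Requiring exactly one protein from {a} excludes the overall lightest path b-c-d. The lightest remaining option is a-b-c.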

9
Adding Biology - Segmented Paths
  • Pathways may be ordered
  • E.g. signaling pathways run from membrane
    proteins to nuclear proteins and finally to
    transcription factors
  • Assign each protein an integer label based on
    biological information, build path out of ordered
    sequences of labeled proteins
  • Now only need to constrain color collisions among
    proteins with the same label
  • If the path's proteins are split roughly equally
    among labels, the probability of a correct
    coloring rises
  • Modifications handle proteins that cannot be
    assigned a unique label
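The gain from segmenting can be made concrete. The sketch assumes that each label's proteins on the path (a segment of size $k_i$) are colored independently with their own palette of $k_i$ colors, so only collisions within a segment matter. The success probability then becomes $\prod_i k_i!/k_i^{k_i}$ instead of $k!/k^k$. The exact coloring scheme in the paper may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def p_colorful(k: int) -> float:
    """Chance that a fixed k-vertex path is colorful under a uniform random k-coloring."""
    return math.factorial(k) / k ** k


def p_segmented(segment_sizes: Sequence[int]) -> float:
    """Chance that every label segment is colorful, assuming each segment of size k_i
    is colored independently with its own palette of k_i colors."""
    prob = 1.0
    for k_i in segment_sizes:
        prob *= p_colorful(k_i)
    return prob


print(p_colorful(9))            # about 0.00094: one unsegmented path of 9 proteins
print(p_segmented([3, 3, 3]))   # about 0.011: three equal segments of 3
print(p_segmented([1, 1, 7]))   # about 0.0061: a lopsided split
```

Under this assumption the equal split is more than ten times as likely to be colored correctly as the unsegmented path. It also needs proportionally fewer trials.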

10
Adding Biology - More Structures
  • Modifications to the Color-Coding recurrence
    allow discovery of structures beyond simple paths
  • Example: two-terminal series-parallel graphs
  • Capture parallel signaling pathways

[Figure: example two-terminal series-parallel graph]
11
Generating Edge Weights
  • So far, have glossed over how weights
    (probabilities) on the protein graph are assigned
  • Here, we use our previous work: a logistic
    function of three variables for each pair of
    proteins
  • Number of times the interaction between them was
    experimentally observed
  • Pearson correlation coefficient of the expression
    profiles of the corresponding genes
  • Their small-world clustering coefficient
  • Trained the relative weighting on MIPS
    interactions (gold standard)
  • Taking the negative log of the weights makes the
    path score additive (lighter is better)
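The weighting scheme can be sketched as a logistic function of the three pair features followed by a negative log. The coefficients in `BETA` are hypothetical placeholders, because the fitted values are not given here. The raw features are also used untransformed, which is an assumption. The sketch shows how the negative log turns a product of edge probabilities into a sum of edge weights.

```python
import math
from typing import Iterable, Tuple

# Hypothetical coefficients (intercept, observations, expression correlation,
# clustering coefficient); the fitted values are not given on the slide.
BETA: Tuple[float, float, float, float] = (-2.0, 0.8, 1.5, 2.0)


def interaction_probability(n_obs: int, expr_corr: float, clustering: float,
                            beta: Tuple[float, float, float, float] = BETA) -> float:
    """Logistic function of the three pair features."""
    b0, b1, b2, b3 = beta
    z = b0 + b1 * n_obs + b2 * expr_corr + b3 * clustering
    return 1.0 / (1.0 + math.exp(-z))


def edge_weight(p: float) -> float:
    """Negative log probability: multiplying probabilities becomes adding weights."""
    return -math.log(p)


def path_score(edge_probs: Iterable[float]) -> float:
    """Additive path score; lower means a more reliable path."""
    return sum(edge_weight(p) for p in edge_probs)


p1 = interaction_probability(n_obs=3, expr_corr=0.6, clustering=0.4)
p2 = interaction_probability(n_obs=1, expr_corr=-0.1, clustering=0.0)
print(path_score([p1, p2]), -math.log(p1 * p2))  # the two numbers agree up to rounding
```

The additive score allows the color-coding dynamic program to sum edge weights and keep the minimum. That minimum corresponds to the most probable path.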

12
Application
  • Tested our simple path implementation with the
    yeast interaction network
  • 4,500 vertices, 14,500 edges
  • Based on interaction data from Database of
    Interacting Proteins (Feb 2004)
  • Runtimes varied from minutes (length 8) to under
    two hours (length 10)
  • Much faster than brute force for longer paths
    (14× faster for paths of length 9)
  • Focus on paths from membrane proteins to
    transcription factors

13
Validation Techniques
  • Three methods of validation
  • Two are statistical:
  • Functional enrichment p-value, based on how many
    proteins on the path share a GO category
  • Weight p-value, comparing path weights to those
    found after random degree-preserving shuffling
    of the protein graph
  • Third, search for expected pathways:
  • MAP-Kinase, ubiquitin-ligation
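The two statistical checks can be sketched in Python. Degree-preserving shuffling is implemented below with double-edge swaps, one standard method that is not necessarily the authors' procedure. The weight p-value is taken as the empirical fraction of random-network paths that are at least as light as the observed one. The slide does not state the exact enrichment statistic, so a hypergeometric tail probability is used as a common stand-in. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def degree_preserving_shuffle(edges: Sequence[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Randomize an undirected simple graph with double-edge swaps.

    Each accepted swap rewires a-b, c-d into a-d, c-b, so every vertex keeps its degree.
    Swaps that would create a self-loop or a duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of pairing the endpoints
        if len({a, b, c, d}) < 4:
            continue
        new_ad, new_cb = frozenset((a, d)), frozenset((c, b))
        if new_ad in present or new_cb in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_ad, new_cb}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def empirical_p_value(observed: float, random_scores: Sequence[float]) -> float:
    """Share of random-network scores at least as light as the observed one (+1 smoothing)."""
    hits = sum(1 for s in random_scores if s <= observed)
    return (hits + 1) / (len(random_scores) + 1)


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P[X >= k] for X ~ Hypergeometric(N proteins, K with a GO term, n on the path)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def degrees(edges: Sequence[Edge]) -> Counter:
    """Degree of every vertex in an edge list."""
    return Counter(v for e in edges for v in e)


ring = [(f"v{i}", f"v{(i + 1) % 10}") for i in range(10)] + [("v0", "v5"), ("v2", "v7")]
shuffled = degree_preserving_shuffle(ring, n_swaps=100)
print(degrees(ring) == degrees(shuffled))  # True
```

In practice one would rerun the path search on many shuffled graphs. The lightest path weights from those runs form `random_scores` for `empirical_p_value`.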

14
MAP-Kinase and Ubiquitin-Ligation
  • Concentrated on three MAPK pathways (the same
    ones as Steffen et al.)
  • Pheromone response
  • Filamentous growth
  • Cell wall integrity
  • Looked for shorter (length 4-6)
    ubiquitin-ligation pathways
  • Paths start at a cullin and end at an F-box
    protein
  • Found high functional enrichment for the
    ubiquitin GO category

15
Statistical Results (CDFs)
  • 100 best paths of length 8, found with 99.9%
    success probability
  • 100 real-network and 2,000 random-network paths
    used for the weight p-value

16
MAPK Recovery Results
A) Cell wall integrity pathway in yeast:
MID2 → RHO1 → PKC1 → BCK1 → MKK1/2 → SLT2 → RLM1
B) Best path of length 7 found from MID2 to RLM1:
MID2 → ROM2 → RHO1 → PKC1 → MKK1 → SLT2 → RLM1
C) Pheromone response signaling pathway in yeast
D) Best path of length 9 found from STE2/3 to STE12
17
Additional MAPK Recovery Results
[Figure: pheromone response pathway assembly network; proteins shown: REM1, STE50, FAR1, GPA1, CDC24, STE3, STE4/18, STE12, FUS3, STE7, DIG1/2, CDC42, STE11, AKR1, KSS1, STE5]
[Figure: pheromone response signaling pathway in yeast]
18
Conclusion
  • Presented efficient, color-coding based
    algorithms for finding simple paths
  • Added biological extensions, other structures
  • Integrated our well-founded reliability scores
  • Applied our algorithms to yeast
  • Showed that 60% of discovered pathways were
    significantly functionally enriched
  • Recovered known MAP-Kinase, ubiquitin-ligation
    pathways

19
Simple vs. Segmented CDFs
[Figure: CDFs of functional-enrichment p-values for segmented vs. simple paths; legend: Segmented 72, Simple 54]
20
References
  • Steffen, M., Petti, A., Aach, J., D'haeseleer,
    P., Church, G.: Automated modelling of signal
    transduction networks. BMC Bioinformatics 3
    (2002) 34–44
  • Alon, N., Yuster, R., Zwick, U.: Color-coding.
    J. ACM 42 (1995) 844–856