Title: Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks
1. Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks
- Jacob Scott, Trey Ideker, Richard M. Karp, Roded Sharan
RECOMB 2005
2. Outline
- Motivation
- Theoretical foundations
- Biological extensions
- Implementation
- Validation techniques
- Results from yeast
3. Motivation
- Post-genomics, want to understand an organism's protein-protein interaction network
- Model network as a probabilistic graph, with edge weights representing probabilities
- Interested in protein signaling cascades
  - Show up as simple paths in the graph
- Want to find biologically interesting paths efficiently
  - Score paths, with high scores reflecting importance
  - Extended graph algorithms provide speed
- Automated modelling of signal transduction networks as baseline (Steffen et al 2002)
4. Theoretical Foundation
- Finding long, simple paths is NP-Hard
  - Reduce from TSP
- Once we find these paths, want the best (lightest) ones
- Need for paths to be simple is what drives hardness
- Color-Coding is a randomized, dynamic-programming based algorithm for finding paths of fixed length
  - Developed by Alon et al (1995)
- Randomly color graph and require paths be colorful (exactly one vertex of each color)
  - Number of colors = length of paths
  - A colorful path is always simple
5. Color-Coding
- Colorful paths can be found with dynamic programming
- Key point: a colorful path of length k contains a colorful path of length k-1
- Store path information at each node for each subset of k colors
- Only 2^k color subsets, rather than O(n^k) node subsets
- Runtime is O(2^k k m) << O(k n^k) brute force
- Space is O(2^k n) << O(k n^k) brute force
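The recurrence above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an undirected graph, counts path length in vertices, treats smaller weight as better, and stores color subsets as bitmasks. The names `best_colorful_path` and `color_coding` are hypothetical.

```python
import random


def best_colorful_path(n, edges, k, coloring):
    """Lightest colorful path on k vertices under one fixed coloring.

    n        -- number of vertices, labeled 0..n-1
    edges    -- dict {(u, v): weight}, read as undirected (assumption)
    coloring -- list giving each vertex a color in 0..k-1
    Returns (weight, path), or (inf, None) if no colorful path exists.
    """
    adj = {v: [] for v in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    # W[(v, S)] = lightest path ending at v whose vertices use exactly color set S.
    W = {(v, 1 << coloring[v]): (0.0, (v,)) for v in range(n)}
    for _ in range(k - 1):
        nxt = {}
        for (u, S), (wt, path) in W.items():
            for v, w in adj[u]:
                bit = 1 << coloring[v]
                if S & bit:  # color already used, so the extension is not colorful
                    continue
                key = (v, S | bit)
                cand = (wt + w, path + (v,))
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        W = nxt
    # After k-1 extensions every surviving color set contains all k colors.
    return min(W.values(), default=(float("inf"), None))


def color_coding(n, edges, k, trials, seed=0):
    """Repeat the DP over independent random colorings and keep the best path."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(trials):
        coloring = [rng.randrange(k) for _ in range(n)]
        cand = best_colorful_path(n, edges, k, coloring)
        if cand[0] < best[0]:
            best = cand
    return best
```

Each DP round touches every (edge, color subset) pair once, which is where the O(2^k k m) runtime comes from; the table itself holds at most 2^k entries per vertex.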
6. Coloring Example
[Figure: toy graph under two colorings, I and II]
- Two different colorings on toy graph, k=3
- In coloring I, W(A,RGB) is built C -> BC -> ABC
- In coloring II, W(A,RGB) is built G -> BG -> ABG
- ABC is not colorful in coloring II
7. Monte Carlo Details
- A colorful path is simple, but a simple path may not be colorful under a given coloring
- Solution: run multiple independent trials
- After one trial, a given path of length k is colorful with probability k!/k^k > e^(-k)
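To make the trial count concrete: under a uniform random coloring with k colors, a fixed simple path on k vertices is colorful with probability k!/k^k, so independent trials can be repeated until the chance of missing it drops below a target. The sketch below computes that count; the function names and the default target of 0.999 are illustrative assumptions.

```python
import math


def colorful_probability(k):
    """Chance that a fixed path on k vertices gets k distinct colors in one trial."""
    return math.factorial(k) / k**k


def trials_needed(k, success=0.999):
    """Smallest t such that 1 - (1 - p)^t >= success, with p = k!/k^k."""
    p = colorful_probability(k)
    if p >= 1.0:  # k = 1: every coloring works
        return 1
    return math.ceil(math.log(1.0 - success) / math.log(1.0 - p))
```

Since p shrinks roughly like e^(-k), the number of trials grows exponentially in k but does not depend on the size of the graph.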
8. Adding Biology
- Color-Coding gives an algorithmic basis, now introduce biologically motivated extensions
- Can set the start or end of path by type
  - E.g. screening by Gene Ontology categories
- Can force the inclusion of a protein on the path by giving it a unique color
- Using counters, can specify path must contain between x and y proteins of a given type
  - Computational cost multiplicative in y per counter
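One way to picture the start/end screening and the counter extension is to add a count to the dynamic-programming state. The sketch below is an assumption about how this could be arranged, not the authors' code: the state becomes (vertex, color set, count), extensions that push the count above y are pruned, and only final paths with count at least x are accepted. All names are hypothetical.

```python
def constrained_colorful_path(adj, coloring, k, typed, x, y, starts=None, ends=None):
    """Lightest colorful path on k vertices with between x and y vertices in `typed`.

    adj      -- dict {u: [(v, weight), ...]} (adjacency list)
    coloring -- dict {vertex: color in 0..k-1}
    typed    -- set of vertices of the counted type (e.g. one GO category)
    starts   -- allowed first vertices (None means any)
    ends     -- allowed last vertices (None means any)
    """
    starts = set(adj) if starts is None else set(starts)
    W = {}
    for v in starts:
        c = 1 if v in typed else 0
        if c <= y:
            W[(v, 1 << coloring[v], c)] = (0.0, (v,))
    for _ in range(k - 1):
        nxt = {}
        for (u, S, c), (wt, path) in W.items():
            for v, w in adj[u]:
                bit = 1 << coloring[v]
                c2 = c + (1 if v in typed else 0)
                if S & bit or c2 > y:  # repeated color or too many typed vertices
                    continue
                key = (v, S | bit, c2)
                cand = (wt + w, path + (v,))
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        W = nxt
    finals = [val for (v, S, c), val in W.items()
              if c >= x and (ends is None or v in ends)]
    return min(finals, default=(float("inf"), None))
```

The table now has up to y+1 count values per (vertex, color set) pair, which matches the multiplicative cost in y noted above.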
9. Adding Biology - Segmented Paths
- Pathways may be ordered
  - Signaling pathways going from the membrane, to nuclear proteins and finally transcription factors
- Assign each protein an integer label based on biological information, build path out of ordered sequences of labeled proteins
- Now only need to constrain color collisions among proteins with the same label
- If path length is about equally split among labels, probability of correct coloring rises
- Modifications allow for inability to assign proteins to unique labels
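A rough calculation shows why splitting the path among labels helps. The sketch assumes each label gets its own palette with as many colors as its segment has vertices, and that proteins are colored uniformly within their label's palette; under that assumption the segments are independent and the success probabilities multiply. The function names are hypothetical.

```python
import math


def colorful_probability(k):
    """Chance that k vertices receive k distinct colors from a palette of size k."""
    return math.factorial(k) / k**k


def segmented_probability(segment_lengths):
    """Chance that every segment is colorful within its own palette (assumed independent)."""
    return math.prod(colorful_probability(s) for s in segment_lengths)
```

For example, `segmented_probability([4, 4])` can be compared against `colorful_probability(8)` to see the gain for a path on 8 vertices split evenly between two labels.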
10. Adding Biology - More Structures
- Modifications to the Color-Coding recurrence allow discovery of structures beyond simple paths
- Example: two-terminal series-parallel graphs
  - Capture parallel signaling pathways
[Figure: example two-terminal series-parallel graph]
11. Generating Edge Weights
- So far, have glossed over how weights (probabilities) on the protein graph are assigned
- Here, use our previous work: generate a logistic function of three variables (for a pair of proteins)
  - Number of times interaction between them was experimentally observed
  - Pearson correlation coefficient of expressions (for corresponding genes)
  - Their small world clustering coefficient
- Used training data from MIPS (gold standard) to fit the relative weighting
- Taking log of weights makes path score additive
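The weighting scheme can be illustrated with a small sketch. The coefficients passed in `beta` are placeholders, not the values fitted on MIPS, and the choice of negative log (so that the most probable path is the lightest one) is an assumption made to match the "lightest path" convention used earlier.

```python
import math


def interaction_reliability(n_observed, expr_corr, clustering, beta):
    """Logistic function of the three features for one protein pair.

    beta -- (b0, b1, b2, b3): intercept and per-feature coefficients.
            Hypothetical values; the real ones come from training.
    """
    b0, b1, b2, b3 = beta
    z = b0 + b1 * n_observed + b2 * expr_corr + b3 * clustering
    return 1.0 / (1.0 + math.exp(-z))


def path_score(probabilities):
    """Sum of -log(p) over the edges; equals -log of the product of probabilities."""
    return sum(-math.log(p) for p in probabilities)
```

Because the score is a sum over edges, it fits directly into the dynamic program, which extends a path one edge at a time.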
12. Application
- Tested our simple path implementation with the yeast interaction network
  - 4,500 vertices, 14,500 edges
  - Based on interaction data from Database of Interacting Proteins (Feb 2004)
- Runtimes varied from minutes (length 8) to under two hours (length 10)
- Much faster than brute force for longer paths (14x for paths of length 9)
- Focus on paths from membrane proteins to transcription factors
13. Validation Techniques
- Three methods of validation
- Two statistical
  - Functional enrichment p-value based on how many proteins in the path are similar (by GO category)
  - Weight p-value compares weights of paths to those found when the protein graph undergoes random degree-preserving shuffling
- Lastly, search for expected pathways
  - MAP-Kinase, ubiquitin-ligation
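The weight p-value relies on a null model with the same degree sequence. A common way to build one is repeated edge swapping; the sketch below uses that technique as an assumption about the shuffling step, ignores edge weights, and adds a simple empirical p-value with a +1 correction. All names are hypothetical.

```python
import random


def degrees(edges):
    """Degree of each vertex in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by swaps (a-b, c-d) -> (a-d, c-b)."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:  # swap would create a self-loop
            continue
        e1, e2 = frozenset((a, d)), frozenset((c, b))
        if e1 in present or e2 in present:  # swap would create a duplicate edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {e1, e2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def empirical_p_value(observed, random_scores):
    """Fraction of random path weights at least as light as the observed one."""
    as_good = sum(1 for s in random_scores if s <= observed)
    return (as_good + 1) / (len(random_scores) + 1)
```

In use, the path search would be rerun on each shuffled graph, and the weights of the best random paths collected into `random_scores`.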
14. MAP-Kinase and Ubiquitin-Ligation
- Concentrated on three MAPK pathways (same as Steffen et al)
  - Pheromone response
  - Filamentous growth
  - Cell wall integrity
- Looked for shorter (length 4-6) ubiquitin-ligation pathways
  - Started at a cullin, ended at an F-Box
  - High functional enrichment under ubiquitin GO category
15. Statistical Results (CDFs)
- 100 best paths of length 8 at 99.9% success
- 100 normal, 2000 random paths used for weight p-value
16. MAPK Recovery Results
A) Cell wall integrity pathway in yeast
MID2 -> RHO1 -> PKC1 -> BCK1 -> MKK1/2 -> SLT2 -> RLM1
B) Best path of length 7 found from MID2 to RLM1
MID2 -> ROM2 -> RHO1 -> PKC1 -> MKK1 -> SLT2 -> RLM1
C) Pheromone response signaling pathway in yeast
D) Best path of length 9 found from STE2/3 to STE12
17. Additional MAPK Recovery Results
[Figure: pheromone response pathway assembly network, shown with the pheromone response signaling pathway in yeast. Proteins labeled in the figure: REM1, STE50, FAR1, GPA1, CDC24, STE3, STE4/18, STE12, FUS3, STE7, DIG1/2, CDC42, STE11, AKR1, KSS1, STE5]
18. Conclusion
- Presented efficient, color-coding based algorithms for finding simple paths
- Added biological extensions, other structures
- Integrated our well-founded reliability scores
- Applied our algorithms to yeast
  - Showed that 60% of discovered pathways were significantly enriched
  - Recovered known MAP-Kinase, ubiquitin-ligation pathways
19. Simple vs. Segmented CDFs
[Figure: CDFs of functional enrichment p-value. Segmented: 72%, Simple: 54%]
20. References
- Steffen, M., Petti, A., Aach, J., D'haeseleer, P., Church, G. Automated modelling of signal transduction networks. BMC Bioinformatics 3 (2002) 34-44
- Alon, N., Yuster, R., Zwick, U. Color-coding. J. ACM 42 (1995) 844-856