Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks

Provided by: jacob4

Learn more at: http://people.csail.mit.edu

Transcript and Presenter's Notes

1
Efficient Algorithms for Detecting Signaling
Pathways in Protein Interaction Networks

Jacob Scott, Trey Ideker,
Richard M. Karp, Roded Sharan

RECOMB 2005
2
Outline

Motivation
Theoretical foundations
Biological extensions
Implementation
Validation techniques
Results from yeast

3
Motivation

Post-genomics, want to understand organisms'
protein-protein interaction networks
Model network as a probabilistic graph, with edge
weights representing probabilities
Interested in protein signaling cascades
Show up as simple paths in the graph
Want to find biologically interesting paths
efficiently
Score paths, with high scores reflecting
importance
Extended graph algorithms provide speed
Automated modelling of signal transduction
networks as baseline (Steffen et al 2002)

4
Theoretical Foundation

Finding long, simple paths is NP-Hard
Reduce from TSP
Once we find these paths, want the best
(lightest) ones
Need for paths to be simple is what drives
hardness
Color-Coding is a randomized, dynamic-programming
based algorithm for finding paths of fixed length
Developed by Alon et al (1995)
Randomly color graph and require paths be
colorful (exactly one vertex of each color)
Number of colors = length of paths
A colorful path is always simple

5
Color-Coding

Colorful paths can be found with dynamic
programming
Key point: a colorful path of length k contains a
colorful path of length k-1.
Store path information at each node for each
subset of k colors
Only 2^k color subsets, rather than O(n^k) node
subsets
Runtime is O(2^k k m) << O(k n^k) brute force
Space is O(2^k n) << O(k n^k) brute force
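
The recurrence on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the bitmask encoding of color subsets, and the convention that lower weight is better are all assumptions made for the example. "Length k" is taken to mean k vertices, matching the one-color-per-vertex rule.

```python
import math
import random


def best_colorful_path(n, edges, k, colors):
    """Lightest path on k vertices whose colors are all distinct.

    n      -- number of vertices, labeled 0..n-1
    edges  -- dict mapping (u, v) to a weight; treated as undirected
    colors -- list of length n with values in range(k)
    Returns (weight, path), or (inf, None) if no colorful path exists.
    """
    adj = {v: [] for v in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    # W[(v, S)] = lightest colorful path ending at v that uses exactly
    # the colors in bitmask S. Start with single-vertex paths.
    W = {(v, 1 << colors[v]): (0.0, (v,)) for v in range(n)}

    # Extend paths one vertex at a time; a colorful path on `size + 1`
    # vertices always contains a colorful path on `size` vertices.
    for size in range(1, k):
        layer = [(key, val) for key, val in W.items()
                 if bin(key[1]).count("1") == size]
        for (v, S), (wt, path) in layer:
            for u, w in adj[v]:
                bit = 1 << colors[u]
                if S & bit:
                    continue  # color already used, so u cannot be added
                key = (u, S | bit)
                cand = (wt + w, path + (u,))
                if key not in W or cand[0] < W[key][0]:
                    W[key] = cand

    full = (1 << k) - 1
    finals = [val for (v, S), val in W.items() if S == full]
    return min(finals, default=(math.inf, None))


def color_coding(n, edges, k, trials, seed=0):
    """Repeat the DP under independent random colorings; keep the best."""
    rng = random.Random(seed)
    best = (math.inf, None)
    for _ in range(trials):
        colors = [rng.randrange(k) for _ in range(n)]
        result = best_colorful_path(n, edges, k, colors)
        best = min(best, result, key=lambda r: r[0])
    return best
```

The table is keyed by (vertex, color subset), which is where the 2^k factor in the space bound comes from. The sketch also stores the path itself in each entry for readability, which costs more memory than keeping back-pointers would.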

6
Coloring Example
[Figure: toy graph shown under two colorings, I and II]

Two different colorings on toy graph, k = 3
In coloring I, W(A,RGB) is built C -> BC -> ABC
In coloring II, W(A,RGB) is built G -> BG -> ABG
ABC is not colorful in coloring II

7
Monte Carlo Details

A colorful path is simple, but a simple path may
not be colorful under a given coloring
Solution: run multiple independent trials
After one trial, for paths of length k,
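
For reference, in the standard color-coding analysis a fixed path on k vertices is colorful under a uniformly random k-coloring with probability k!/k^k, which is greater than e^-k. Independent trials then drive the failure probability down geometrically. The sketch below, with hypothetical function names, computes that probability and the number of trials needed to reach a target success probability for one fixed path.

```python
import math


def colorful_probability(k):
    """Chance that k fixed vertices all receive distinct colors when each
    is colored uniformly at random with one of k colors: k! / k^k."""
    return math.factorial(k) / k ** k


def trials_needed(k, success):
    """Smallest t with 1 - (1 - p)^t >= success, where p = k!/k^k."""
    p = colorful_probability(k)
    return math.ceil(math.log(1.0 - success) / math.log(1.0 - p))


if __name__ == "__main__":
    for k in (6, 8, 10):
        print(k, colorful_probability(k), trials_needed(k, 0.999))
```

The trial count grows roughly like e^k, which is why the approach targets paths of fixed, moderate length.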

8
Adding Biology

Color-Coding gives an algorithmic basis, now
introduce biologically motivated extensions
Can set the start or end of path by type
E.g. screening by Gene Ontology categories
Can force the inclusion of a protein on the path
by giving it a unique color
Using counters, can specify path must contain
between x and y proteins of a given type
Computational cost multiplicative in y per counter

9
Adding Biology - Segmented Paths

Pathways may be ordered
Signaling pathways going from the membrane, to
nuclear proteins and finally transcription
factors
Assign each protein an integer label based on
biological information, build path out of ordered
sequences of labeled proteins
Now only need to constrain color collisions among
proteins with the same label
If path length is about equally split among
labels, probability of correct coloring rises
Modifications allow for inability to assign
proteins to unique labels

10
Adding Biology - More Structures

Modifications to the Color-Coding recurrence
allow the discovery of structures beyond simple paths
Example: two-terminal series-parallel graphs
Capture parallel signaling pathways

Example two-terminal series-parallel graph
11
Generating Edge Weights

So far, have glossed over how weights
(probabilities) on the protein graph are assigned
Here, using our previous work, generate a logistic
function of three variables (for a pair of
proteins)
Number of times interaction between them was
experimentally observed
Pearson correlation coefficient of expressions
(for corresponding genes)
Their small world clustering coefficient
Used data from MIPS (gold standard) to train
the relative weighting of the three variables
Taking log of weights makes path score additive

12
Application

Tested our simple path implementation with the
yeast interaction network
4,500 vertices, 14,500 edges
Based on interaction data from Database of
Interacting Proteins (Feb 2004)
Runtimes varied from minutes (length 8) to under
two hours (length 10)
Much faster than brute force for longer paths
(14x for paths of length 9)
Focus on paths from membrane proteins to
transcription factors

13
Validation Techniques

Three methods of validation
Two statistical
Functional enrichment p-value based on how many
proteins in the path are similar (by GO category)
Weight p-value compares weights of paths to those
found when the protein graph undergoes random
degree-preserving shuffling
Lastly, search for expected pathways
MAP-Kinase, ubiquitin-ligation

14
MAP-Kinase and Ubiquitin-Ligation

Concentrated on three MAPK pathways (same as
Steffen et al)
Pheromone response
Filamentous growth
Cell wall integrity
Looked for shorter (length 4-6)
ubiquitin-ligation pathways
Started at a cullin, ended at an F-Box
High functional enrichment under ubiquitin GO
category

15
Statistical Results (CDFs)

100 best paths of length 8 @ 99.9% success
100 normal, 2000 random paths used for weight
p-value

16
MAPK Recovery Results

A) Cell wall integrity pathway in yeast
MID2 RHO1 PKC1 BCK1 MKK1/2 SLT2 RLM1
B) Best path of length 7 found from MID2 to RLM1
MID2 ROM2 RHO1 PKC1 MKK1 SLT2 RLM1
C) Pheromone response signaling pathway in yeast
D) Best path of length 9 found from STE2/3 to STE12
17
Additional MAPK Recovery Results
[Figure: pheromone response pathway assembly network, shown with the pheromone response signaling pathway in yeast. Node labels: REM1, STE50, FAR1, GPA1, CDC24, STE3, STE4/18, STE12, FUS3, STE7, DIG1/2, CDC42, STE11, AKR1, KSS1, STE5]
18
Conclusion

Presented efficient, color-coding based
algorithms for finding simple paths
Added biological extensions, other structures
Integrated our well-founded reliability scores
Applied our algorithms to yeast
Showed that 60% of discovered pathways were
significantly enriched
Recovered known MAP-Kinase, ubiquitin-ligation
pathways

19
Simple vs. Segmented CDFs
[Figure: CDFs of functional enrichment p-values for segmented and simple paths; plot labels: Segmented 72, Simple 54]
20
References

Steffen, M., Petti, A., Aach, J., D'haeseleer,
P., Church, G. Automated modelling of signal
transduction networks. BMC Bioinformatics 3
(2002) 34–44
Alon, N., Yuster, R., Zwick, U. Color-coding. J.
ACM 42 (1995) 844–856
