Title: Prediction of Synthetic Genetic Interactions In Silico Using Protein Network Data
1Prediction of Synthetic Genetic Interactions In
Silico Using Protein Network Data
- Sri R. Paladugu
- Keck Graduate Institute
- July 07, 2007
2Outline
- What?
- Prediction of Synthetic sick or lethal (SSLs)
- Why?
- Goals of our study
- How?
- Computational Methods (SVMs)
- Results and Conclusion
3Protein Interaction Networks
Nodes represent proteins; edges represent
observed interactions
4Synthetic sick or lethal (SSL)
- Extension of single gene dispensability
Dispensability of one gene versus dispensability
of a pair of genes.
Cells live(wild type)
Cells live
Cells live
Cells die or grow slowly
5SSL Interaction Networks
Nodes represent non-essential genes; edges
represent synthetic lethality or synthetic slow
growth
6Scenarios giving rise to SSL interactions
Wong SL, Trends in Genetics. 21, 8 (2005)
7SSL hubs might be good cancer targets
Cancer cells w/ random mutations
Normal cell
Alive
Dead
Dead
A. Kamb, J. Theor. Biol. 223, 205 (2003)
8SSL Interactions and Chemical Genetics
- SSL Interactions together with chemical genetics
can help us predict the target pathways of magic
bullets.
B. Stockwell, Nature Biotechnology. 22, 1 (2003)
9Goals of our study
- Study the predictive power of protein interaction
networks for synthetic genetic interactions
(SGIs) in S. cerevisiae using machine learning
methods
- Predict novel SGIs in yeast
- Validate the method and carry out novel
predictions in other organisms
10How?
- Experimental Methods
- Synthetic Genetic Array (SGA) analysis
- dSLAM
- Computational Methods
- Use machine learning analysis to predict
synthetic lethal (SL) interactions
- This has been done previously by Wong et al. (2004),
who used a lot of functional information
(e.g. common upstream regulator, correlated mRNA
expression, same MIPS function) to predict SL
interactions in Saccharomyces
- Data for most of the predictor variables
used in that approach are not readily
available for other species.
11Can We Predict ?
Protein Interaction Network in S. cerevisiae
Synthetic Lethal/Sick Interaction Network in
S. cerevisiae
http://www.visualcomplexity.com/vc/project_details.cfm?index=19&id=184&domain=Biology
12Network Data
- PIN data and the synthetic lethal/sick
interactions for Saccharomyces were obtained from
Reguly et al. (2006).
- Non-synthetic-lethal/non-synthetic-sick
interactions (negatives) were constructed by
generating all pairwise combinations of the SGA baits
(227) with all other genes in yeast and then
subtracting the lethal/sick interactions
confirmed by SGA and dSLAM methods. Pairs in which
either member is not part of the PIN data were also
removed.
- The genes that were used in SGA analysis were
obtained from Tong et al. (2001).
Ref 1: Reguly et al. Comprehensive curation and
analysis of global interaction networks in
Saccharomyces cerevisiae. Journal of Biology 5:11
(2006).
Ref 2: Tong et al. Systematic Genetic
Analysis with Ordered Arrays of Yeast Deletion
Mutants. Science 294, 2364-2368 (2001).
13Input properties for SGI prediction
- For every pair of genes we computed the
following PIN properties, which were supplied as
input to the SVM-based classifier.
- Degree
- Clustering Coefficient
- Betweenness Centrality
- Closeness Centrality
- Eigenvector Centrality
- Information Centrality
- Current-Flow Betweenness Centrality
- Bridge Centrality
- Stress Centrality
- Shortest Distance
- Mutual Neighborhood
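A few of the properties listed above can be sketched in plain Python. This is only an illustration on a toy network, not the code used in the study; the function names are hypothetical, and "mutual neighborhood" is assumed here to mean the number of neighbors two proteins share, which the slides do not define.

```python
from collections import deque

def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of interaction partners of a node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def shortest_distance(adj, source, target):
    """Breadth-first search distance; None if the nodes are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

def mutual_neighborhood(adj, a, b):
    """Assumed definition: count of neighbors shared by a and b."""
    return len(adj[a] & adj[b])

# Toy network (hypothetical protein names).
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(degree(toy, "C"), clustering_coefficient(toy, "C"),
      shortest_distance(toy, "A", "E"), mutual_neighborhood(toy, "A", "B"))
```

The centrality measures in the list (betweenness, closeness, eigenvector and so on) would normally come from a graph library rather than being written by hand.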
14Graph Properties
SSL Pairs
Non-SSL Pairs
15Graph Properties
SSL Pairs
Non-SSL Pairs
16Graph Properties
SSL Pairs
Non-SSL Pairs
17Support Vector Machines
SVM Classifier
Input: graph properties of a pair of proteins P1, P2
- Average of Betweenness Centralities of pair P1-P2
- Average of Closeness Centralities of pair P1-P2
- Absolute difference between Betweenness
Centralities of pair P1-P2
- Absolute difference between Closeness
Centralities of pair P1-P2
- Shortest Distance P1-P2
- Mutual Neighborhood P1-P2
Classifier: optimal hyperplane, defined by the
support vectors
Output: a value between 0 and 1 that measures the
propensity for SL interaction
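The pairwise inputs on this slide (average and absolute difference of each per-protein property, plus properties defined directly on the pair) could be assembled roughly as follows. This is a minimal sketch under stated assumptions: the function name, the dictionary layout and the feature names are hypothetical, and no classifier is included.

```python
def pair_features(p1, p2, node_props, pair_props):
    """Combine per-protein and per-pair properties into one feature dict.

    node_props: {property name: {protein: value}}
    pair_props: {property name: {frozenset({p1, p2}): value}}
    Both layouts are assumptions made for this illustration.
    """
    features = {}
    for name, values in node_props.items():
        a, b = values[p1], values[p2]
        features["avg_" + name] = (a + b) / 2.0      # average over the pair
        features["absdiff_" + name] = abs(a - b)     # absolute difference
    key = frozenset((p1, p2))
    for name, values in pair_props.items():
        features[name] = values[key]                 # already pair-level
    return features

# Made-up values for two hypothetical proteins.
node_props = {
    "betweenness": {"P1": 0.4, "P2": 0.1},
    "closeness": {"P1": 0.5, "P2": 0.7},
}
pair_props = {
    "shortest_distance": {frozenset(("P1", "P2")): 2},
    "mutual_neighborhood": {frozenset(("P1", "P2")): 3},
}
feats = pair_features("P1", "P2", node_props, pair_props)
print(feats)
```

Using the average and the absolute difference makes each feature independent of the order in which the two proteins are listed.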
18Results
- Receiver operating characteristic (ROC) curves
for SVM classifier trained using various
predictor variables.
- The accuracy of SVM classification can be
measured by the Area Under the ROC Curve (AUC).
- AUC for classifier trained using all features:
0.7960
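For reference, AUC can be computed directly from classifier scores as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The sketch below uses made-up labels and scores, not the study's data, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = SSL pair, 0 = non-SSL pair.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for small examples; for millions of pairs a rank-based computation would be preferable.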
19Synthetic lethal network has many ?s
202Hop Characteristics
- 2Hop S-S is 1 if there is an SL interaction between
the pair (Gene A, Gene C) and also between the
pair (Gene B, Gene C)
- 2Hop S-P is 1 if there is an SL interaction between
the pair (Gene A, Gene C) and a physical
interaction between the pair (Gene B, Gene C), or
vice versa.
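Diagram: 2Hop Y-Z. Gene A is linked to Gene C by an
interaction of type Y, and Gene B is linked to
Gene C by an interaction of type Z. The interaction
between Gene A and Gene B is marked "?".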
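The 2Hop definitions above can be sketched as a single function parameterized by two networks. This is an illustration only: the function names and the toy edges are hypothetical, and the choice to exclude Gene A and Gene B themselves as the intermediate Gene C is an assumption.

```python
def adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def two_hop(a, b, net_y, net_z):
    """Return 1 if some gene C has a Y-type link to one gene of the pair
    and a Z-type link to the other (either orientation), else 0."""
    y_a, y_b = net_y.get(a, set()), net_y.get(b, set())
    z_a, z_b = net_z.get(a, set()), net_z.get(b, set())
    shared = (y_a & z_b) | (y_b & z_a)
    shared -= {a, b}            # assumption: C must be a third gene
    return 1 if shared else 0

# Toy networks with hypothetical gene names.
ssl = adjacency([("A", "C"), ("B", "C"), ("A", "D")])
physical = adjacency([("B", "D")])

s_s = two_hop("A", "B", ssl, ssl)            # 2Hop S-S
s_p = two_hop("A", "B", ssl, physical)       # 2Hop S-P
p_p = two_hop("A", "B", physical, physical)  # same idea for P-P
print(s_s, s_p, p_p)
```

Passing the same network twice gives the S-S case; passing the SL network and the physical network gives the S-P case, with the "or vice versa" clause handled by the union of both orientations.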
212Hop Characteristics
- Addition of 2Hop input properties to the SVM
Classifier improves the ROC curves significantly.
22Positive Predictive Value (PPV)
Positive predictive value
Threshold (Cut-Off)
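Positive predictive value at a given cut-off is TP / (TP + FP), computed over the pairs whose score is at or above the threshold. A minimal sketch with made-up labels and scores follows; the function name is hypothetical, and treating a score equal to the threshold as a positive call is an assumption.

```python
def ppv_at_threshold(labels, scores, threshold):
    """PPV = TP / (TP + FP) among pairs scoring at or above the threshold.

    Returns None when no pair reaches the threshold.
    """
    called = [y for y, s in zip(labels, scores) if s >= threshold]
    if not called:
        return None
    return sum(called) / len(called)

# Made-up example: 1 = true SSL pair, 0 = non-SSL pair.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
for cutoff in (0.5, 0.85):
    print(cutoff, ppv_at_threshold(labels, scores, cutoff))
```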
23Robustness Analysis
- We added some random edges to the protein network
data and recomputed the input properties for each
gene pair.
- The SVM classifier seems to be robust to bias in
PIN data
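The perturbation step, adding random edges to the network before recomputing the input properties, could look roughly like this. It is a sketch under assumptions: the slides do not say how many edges were added or how they were sampled, so uniform sampling of node pairs, the function name and the fixed seed are all illustrative choices.

```python
import random

def add_random_edges(edges, n_new, seed=0):
    """Return the edge set plus n_new random edges between existing nodes.

    Assumes an undirected network without self-loops; new edges are
    sampled uniformly among node pairs that are not yet connected.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    nodes = sorted({n for e in edges for n in e})
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if len(existing) + n_new > max_edges:
        raise ValueError("not enough unconnected node pairs")
    added = set()
    while len(added) < n_new:
        u, v = rng.sample(nodes, 2)   # two distinct nodes
        edge = frozenset((u, v))
        if edge not in existing:
            added.add(edge)
    return existing | added

original = [("A", "B"), ("B", "C"), ("C", "D")]
perturbed = add_random_edges(original, n_new=2, seed=1)
print(len(perturbed))
```

The graph properties would then be recomputed on the perturbed edge set and the classifier re-evaluated.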
24Novel Predictions
- Trained the SVM classifier using all the known
positives and an equal number of known negatives.
- Carried out predictions on the 2 million
non-essential pairs that were never screened for
synthetic lethality.
- Repeated the training and testing 5 times, each
time using a different training set.
- Based on a threshold level of 1, we selected the
pairs that were predicted to be lethal in all 5
runs.
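The final selection step, keeping only pairs predicted lethal in every run, can be sketched as a set intersection. The scores below are made up, the function name is hypothetical, and treating "predicted lethal" as a score at or above the threshold is an assumption.

```python
def consensus_predictions(run_scores, threshold):
    """Pairs whose score reaches the threshold in every run.

    run_scores: list of {pair: score} dicts, one per training run.
    """
    selected = None
    for scores in run_scores:
        hits = {pair for pair, s in scores.items() if s >= threshold}
        selected = hits if selected is None else selected & hits
    return selected if selected is not None else set()

# Made-up scores for three runs (the study used five).
runs = [
    {("g1", "g2"): 1.0, ("g3", "g4"): 1.0, ("g5", "g6"): 0.4},
    {("g1", "g2"): 1.0, ("g3", "g4"): 0.9, ("g5", "g6"): 1.0},
    {("g1", "g2"): 1.0, ("g3", "g4"): 1.0, ("g5", "g6"): 1.0},
]
consensus = consensus_predictions(runs, threshold=1.0)
print(consensus)
```

Requiring agreement across runs trades recall for confidence: a pair that drops below the threshold in any single run is discarded.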
25New Predictions
26Graph Properties
True Positives
False Positives
27Graph Properties
True Positives
False Positives
28Graph Properties
True Positives
False Positives
29Experimental validation of SSL pairs
30New Predictions
Pairs not being validated, because single-gene
knockout strains are not available for one of the
two genes in the pair
Pairs that are being validated experimentally
31Genes shown: rad1, dep1, end3, ras2, apn2, cti6,
thi3, sap30, apn1, atx1, erf4/shr5, cis3, pir1,
ccc2, erf2, thi2, spt4, hsp150, YPL183C, YKL050C,
bug1, trm7
Legend: Known Interactions, Predicted Interactions
SS-SL interactions were color coded based on the
localization information.
32Genes shown: trm82, pus7, dus3, ncl1, smm1/dus2,
dus1, trm8
Legend: Known Interactions, Predicted Interactions
33Genes shown: tsl1, rvs167, fps1, hsp82, tps1,
rvs161, gpd1, tps3, npt1, rim15, ypt6, ric1
Localization legend: Nucleus, Cytoplasm, ER,
Cell wall / Plasma Membrane, Golgi Apparatus,
Peroxisome, Vacuole, Other
Legend: Known Interactions, Predicted Interactions
34Genes shown: rad1, dep1, end3, ras2, apn1, apn2,
cti6, thi3, sap30, atx1, erf4/shr5, cis3, ccc2,
erf2, pir1, thi2, spt4, hsp150, YPL183C, YKL050C,
bug1, trm7
Legend: Known Interactions, Predicted Interactions
SS-SL interactions were color coded based on the
biological process.
35Genes shown: trm82, pus7, dus3, ncl1, smm1/dus2,
dus1, trm8
Legend: Known Interactions, Predicted Interactions
36Genes shown: tsl1, rvs167, fps1, hsp82, tps1,
rvs161, gpd1, tps3, npt1, rim15, ypt6, ric1
Legend: Known Interactions, Predicted Interactions
37Future Work
- Train the model on Saccharomyces synthetic
genetic interaction data and test on
Caenorhabditis elegans data.
- Quantify the role of graph centrality properties
in the prediction task.
- Identify misclassified gene pairs and understand
where our methods go wrong.
38Acknowledgements
- Shan Zhao (University of Rochester)
- Dr. Animesh Ray
- Dr. Alpan Raval
- NSF
39References
- Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA and
Bader JS. Gene function prediction from congruent
synthetic lethal interactions in yeast. Molecular
Systems Biology 1 (Nov 22), 2005.
- Newman M. A measure of betweenness centrality
based on random walks. Social Networks, 27(1),
39-54, 2005.
- Wong SL et al. Combining biological networks to
predict genetic interactions. Proc. Natl. Acad.
Sci. USA, 101, 15682-15687, 2004.
- Reguly T et al. Comprehensive curation and
analysis of global interaction networks in
Saccharomyces cerevisiae. Journal of Biology,
5:11, 2006.
40In Pursuit of Modularity
- Sri R Paladugu
- May 08, 2007
41Modular structure
42Where are the modules?
43Algorithm
- 1. Choose a centrality measure (e.g., Edge
Betweenness, Edge Information Centrality)
- 2. Calculate the centrality score for each
edge
- 3. Remove the edge with the highest centrality score
- 4. Perform an analysis of the components
generated
- 5. Repeat steps 2-4 until the desired number of
components is generated
S. Fortunato, V. Latora, and M. Marchiori,
cond-mat/0402522 (2004)
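The steps above can be sketched with shortest-path edge betweenness as the chosen centrality. This is a minimal illustration, not the implementation from the cited paper (which uses information centrality): the function names are hypothetical, the graph is assumed unweighted, undirected and free of self-loops, and the "analysis of the components" in step 4 is reduced to counting them.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.
    Each node pair is counted from both endpoints, which scales all
    scores by 2 and leaves the ranking unchanged."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {n: 0.0 for n in adj}   # number of shortest paths from s
        sigma[s] = 1.0
        preds = {n: [] for n in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):       # accumulate dependencies backwards
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_into_modules(edges, n_modules):
    """Remove the highest-scoring edge until n_modules components exist."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < n_modules:
        score = edge_betweenness(adj)           # step 2: rescore every edge
        if not score:
            break                               # no edges left to remove
        u, v = max(score, key=score.get)        # step 3: top-scoring edge
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)                      # step 4: inspect components

# Two triangles joined by a single bridge C-D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("D", "F")]
modules = split_into_modules(toy_edges, 2)
print(sorted(sorted(m) for m in modules))
```

Scores are recomputed after every removal because deleting one edge reroutes shortest paths through the remaining ones.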
44Modular structure
Figure: network with numeric labels ranging from
0.01 to 0.61.