Proteins, interactions, complexes: A computational approach - PowerPoint PPT Presentation

1
Proteins, interactions, complexes: A computational approach
  • Haidong Wang
  • Department of Computer Science
  • Stanford University

2
Motivation: from protein to pathway
[Figure: protein sequence (DKPALAKPPKV), protein-protein interaction, complex, pathway]
3
Challenge: noisy data and their integration
  • Large amount of proteomic data available
      • Localization, microarray expression,
        protein-protein interaction, transcription
        regulation, sequence, genetic interaction, Gene
        Ontology, trans-membrane, growth fitness, protein
        abundance, ...
  • High-throughput data are noisy
      • Measurements correlate only weakly with the objective
  • Integrate multiple datasets in a principled way
      • Reduce noise
      • Combine weak signals

4
Outline
[Figure: roadmap from protein-protein interaction (sequence DKPALAKPPKV) to complex to pathway]
5
Outline
[Figure: roadmap from protein-protein interaction (sequence DKPALAKPPKV) to complex to pathway; citations shown: Wang et al. 2004, Wang et al. 2007]
6
Proteins interact at small regions
  • Physical binding between amino acids
      • Chemically attractive
      • Structurally complementary
  • Targeted by mutations, viruses, and drugs
      • -> Disrupt interaction
      • -> Disease or cure

7
Challenge: few data for interaction sites
  • Co-crystallization
      • Costly
      • Time-consuming
      • Many proteins do not crystallize
  • Physics-based simulation
      • Computationally intensive
      • Low throughput
  • Docking
      • Requires a known structure

8
Our approach, InSite: the intuition
[Figure: proteins A to E carrying motifs a to e; the sequences DKPALAKPPKV and GAPDKLLPPKAK share the motif PPK]
  • Sequence motifs from existing databases
      • Evolutionarily conserved
      • Cover more than 70% of interaction sites
  • Explain PPIs by interactions between motifs

9
Bayesian network to integrate data
[Figure: protein A with motifs a and b, protein C with motifs c and d]
  • Evidence for protein-protein interactions
      • Y2H
      • TAP-MS (Gavin, Krogan)
      • Co-expression
      • Same function
  • Evidence for motif-motif interactions
      • Domain fusion
      • Same function

Probability of fusion: P(Ef = 1)
  • Motif-motif interaction is non-deterministic
  • Interaction site may lie outside the motifs
  • Small-scale reliable interactions
      • Sparse

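[Figure: Bayesian network with evidence nodes Fusion (Ef) and GO (Eg), binding variables Bac, Bad, Bbc, Bbd, nodes S, OR, AC, I, and observations Oe, Og (Gavin score, co-expression); annotation: P(B = 1) = 0.15]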
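
The role of the OR node can be illustrated with a small numeric sketch. It assumes that two proteins interact if at least one motif pair binds, that the binding variables are independent, and that a hypothetical `leak` term stands in for interaction sites outside the motifs. The function name and the leak value are illustrative; only the 0.15 prior appears on the slide.

```python
from math import prod

def noisy_or(binding_probs, leak=0.0):
    """Probability that at least one cause is active.

    binding_probs: P(B = 1) for each motif pair (assumed independent).
    leak: hypothetical probability of interacting through a site
          outside any motif.
    """
    p_no_cause = (1.0 - leak) * prod(1.0 - p for p in binding_probs)
    return 1.0 - p_no_cause

# Four motif pairs (a,c), (a,d), (b,c), (b,d), each with the prior 0.15.
p_interact = noisy_or([0.15, 0.15, 0.15, 0.15], leak=0.01)
print(round(p_interact, 4))
```

More motif pairs between two proteins raise the interaction probability even when each individual pair is unlikely to bind.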
10
Sharing of the parameters
[Figure: Bayesian network with nodes Fusion (Ef), GO (Eg), Bac, Bad, Bbc, Bbd, S, OR, AC, I, Oe, Og (Gavin score, co-expression), illustrating shared parameters]
11
Learning with EM
[Figure: EM loop alternating between E-step and M-step]
  • In the E-step, we exploit the OR structure
      • Closed-form solution in a large and dense network
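
A simplified view of why an OR node permits a closed-form E-step: if the binding variables have independent priors and the OR output is observed directly, each posterior is a single ratio. This is a sketch under those assumptions only; the model on the slides also has noisy observations, so its actual E-step is more involved, and the function name is illustrative.

```python
from math import prod

def posterior_given_or(priors, or_is_true):
    """P(B_i = 1 | OR of all B) for independent binary B_i.

    If the OR is false, every B_i must be 0.
    If the OR is true, P(B_i = 1 | OR = 1) = p_i / (1 - prod_j (1 - p_j)).
    """
    if not or_is_true:
        return [0.0] * len(priors)
    p_any = 1.0 - prod(1.0 - p for p in priors)
    if p_any == 0.0:
        raise ValueError("OR observed true but every prior is zero")
    return [p / p_any for p in priors]

# Two motif pairs with prior 0.5 each, and the proteins are known to interact.
print(posterior_given_or([0.5, 0.5], True))
```

No enumeration over joint states of the binding variables is needed, which is what keeps the step cheap in a large, dense network.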

12
Predict interaction site: intuition
  • Is a given motif on protein C the interaction site
    with protein A?
  • Target for disrupting the C-A interaction

[Figure: proteins A to E with their motifs a to e]
  • Do not allow that motif on C to bind to A
  • Evaluate how well the protein-protein interaction
    network is explained now

13
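Predict interaction site: model
[Figure: the Bayesian network for proteins A (motifs a, b) and C (motifs c, d), with evidence nodes Eg and Ef, binding variables B, nodes S, OR, AC, I, and observations Oe, Og; motif d on C is marked "Interaction site?"]
  • Do not allow motif d on C to bind protein A
  • Re-train the model
  • Compute the change in likelihood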
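
The scoring idea can be sketched as a log-likelihood drop. This is a simplified illustration, not the procedure on the slide: it skips re-training, treats the A-C interaction as observed, and uses hypothetical binding probabilities plus a small `leak` so the logarithm stays finite.

```python
from math import log, prod

def interaction_prob(pair_probs, leak=1e-6):
    """P(A and C interact) under an OR over independent motif-pair bindings."""
    return 1.0 - (1.0 - leak) * prod(1.0 - p for p in pair_probs.values())

def log_likelihood_drop(pair_probs, motif_on_c):
    """Loss in log-likelihood of the observed interaction when
    motif_on_c is not allowed to bind any motif on A."""
    allowed = {k: p for k, p in pair_probs.items() if k[1] != motif_on_c}
    return log(interaction_prob(pair_probs)) - log(interaction_prob(allowed))

# Hypothetical binding probabilities, keyed by (motif on A, motif on C).
pairs = {("a", "c"): 0.05, ("a", "d"): 0.60, ("b", "c"): 0.05, ("b", "d"): 0.10}
drop_d = log_likelihood_drop(pairs, "d")
drop_c = log_likelihood_drop(pairs, "c")
print(drop_d > drop_c)
```

A larger drop means the interaction is explained much worse without that motif, so the motif is a stronger candidate for the interaction site.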
14
Predicting an interaction and its site
  • Protein-protein interaction predictions
  • Interaction site predictions

[Figure: two copies of the diagram of protein A (motifs a, b) and protein C (motifs c, d), one per prediction type]
15
Related work
  • Predict protein-protein interactions using motifs
      • Graphical model (Deng et al. 2002; Liu et al.
        2005)
      • Attraction-repulsion model (Gomez et al. 2003)
  • Affinities between motif types, not specific to
    protein pairs
      • Graphical model, exclusion analysis (DPEA) (Riley
        et al. 2005)
      • LP formulation of a parsimonious explanation of
        PPI by MMI (Guimaraes et al. 2006)
      • Expected number of MMI integrated with domain
        fusion, etc. (Lee et al. 2006)
  • Our improvement
      • Predict interaction sites specific to protein
        pairs
      • Integrate evidence for proteins and motifs

16
Better interaction prediction
  • 10-fold cross-validation on yeast to predict PPIs
  • Compare with Gavin/Krogan for proteins in their
    assays

[Figure: two panels, TAP-MS (Gavin) and TAP-MS (Krogan); axes: true interactions in top pairs vs. false interactions in top pairs (scale x 10^4); area under ROC also reported]
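
For readers unfamiliar with the metric, the area under the ROC curve can be computed as the fraction of (true, false) pairs that the scores rank correctly. The sketch below uses made-up scores and labels; it is a generic illustration of the metric, not the evaluation code behind these plots.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted scores for four protein pairs; 1 marks a true interaction.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to ranking every true interaction above every false one.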
17
Better interaction prediction
  • Prediction over all proteins
  • Pfam works better than Prosite

[Figure: true interactions in top pairs vs. false interactions in top pairs]
18
Better interaction site prediction
  • Verify interaction site predictions against PDB
  • PDB co-crystallized proteins -> known interaction
    sites

[Figure: area under ROC for separating PDB interaction sites from PDB non-interaction sites, using Pfam motifs]
19
Cancer mutation in interaction site
  • Top prediction: SH2 on FYN binds to VAV1
      • Verified (Michel et al. 2007)
      • VAV1 and FYN both implicated in carcinoma
  • Hypothesis
      • FYN mutation
      • -> disrupts FYN-VAV1 interaction
      • -> cancer

[Figure: SH2 (green) on FYN, the interaction site with VAV1; a cancer mutation is marked]
20
OMIM, human genetic disorder: top 10 predictions
  • OMIM: database of mutations in human genes that
    are related to genetic diseases

Protein  Partner  Binding site  OMIM disease                   Status
PROC     PROS1    PS01187       Protein C deficiency           Validated
PROC     PROS1    PS50026       Protein C deficiency           Validated
BAX      BCL2L1   PS01259       Leukemia                       Validated
MMP2     BCAN     PS00142       Winchester syndrome            Consistent
STAT1    SRC      PS50001       STAT1 deficiency               Consistent
VAPB     VAMP2    PS50202       Amyotrophic lateral sclerosis  Consistent
VAPB     VAMP1    PS50202       Amyotrophic lateral sclerosis  Consistent
MMP2     BCAN     PS00546       Multicentric osteolysis,       Wrong
PLAU     PLAT     PS50070       Alzheimer disease              No info
UCHL1    S100A7   PS00140       Parkinson disease              No info
21
Conclusion
  • Probabilistic method for prediction of
      • Protein-protein interactions
      • Interaction sites from sequence motifs
      • High quality
      • Genome-wide
  • Generate testable hypotheses for disease
    mechanisms based on interaction site predictions
      • How does a disruption of an interaction lead to
        disease?

22
Outline
[Figure: roadmap from protein-protein interaction (sequence DKPALAKPPKV) to complex to pathway; citation shown: Wang et al., in preparation]
23
TAP-MS detects complexes
  • Tandem Affinity Purification with Mass
    Spectrometry (TAP-MS)
      • Gavin et al. 2006
      • Krogan et al. 2006
      • Relatively high quality, genome-wide
  • Purifications -> pairwise Purification Enrichment
    (PE) score
      • Collins et al. 2007
      • Likelihood of being in the same complex

[Figure: a bait protein pulling down prey proteins; protein pairs are labeled with a PE score]
24
Prior work: identify complexes from PE scores
  • Purifications -> PE scores -> clusters as complexes
  • Clustering algorithm
      • Hierarchical agglomerative clustering (HAC)
        (Collins et al. 2007)
      • Markov Clustering (MCL)
        (Hart et al. 2007; Pu et al. 2007)
      • No overlap

But PE scores are still noisy
[Figure: a bait protein pulling down prey proteins; protein pairs are labeled with a PE score]
25
Complex prediction: our contribution
  • Integrate PE score with indirect evidence
      • Yeast two-hybrid, co-expression, localization, ...
  • Geared toward complex identification
  • Overlapping complexes to improve accuracy

26
Related work: data integration
  • Identify pathways, functional modules,
    co-expression clusters
      • Chen and Yuan 2006, Lee et al. 2007, Marcotte et
        al. 1999, Schlitt et al. 2003, Strong et al.
        2003, von Mering et al. 2003, Yanai and DeLisi
        2002, Yellaboina et al. 2007
      • Indirect evidence correlated with pathway or
        functional similarity
  • Predict pairwise affinity scores
      • Zhang et al. 2004, Jensen et al. 2003
      • Do not reconstruct complexes
      • Non-scalable algorithm
  • Our method is geared toward complex reconstruction

27
Our approach
  • Larger reference set for training and validation
      • 340 complexes, double the size of others
  • LogitBoost to learn affinity
      • Integrate evidence into co-complex likelihood
  • Complex identification
      • Cluster pairwise affinity graph by HAC

[Figure: pairwise evidence between proteins (PE score, Y2H, co-expression, co-localization, membrane?) combined into an affinity]

28
Limitation of HAC
  • RAD23 merges with PNG1 first,
  • but has only slightly lower affinity with RAD4
  • RAD23 and RAD4 form a reference complex, NEF2
  • In HAC, RAD23 is stuck with PNG1
  • Solution: reuse RAD23
      • Overlapping clusters

[Figure: RAD23, PNG1, and RAD4; HAC groups RAD23 with PNG1, while NEF2 consists of RAD23 and RAD4]
29
HAC overlap (HACO) algorithm
  • Merge the 2 non-overlapping sets with the least
    distance (d)
  • Add the merged set to the pool
  • HAC: remove the 2 sets
      • Universal cutoff, single granularity
      • Stuck with early mistakes
      • No overlap
  • HACO: remove any set A from the pool if d - M(A)
    exceeds a threshold
      • M(A): distance when A was first merged
  • Cut tree into clusters
      • Cutoff level chosen by cross-validation
      • Optimized for complexes
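
The merge-and-remove rule above can be sketched in a few lines. This is a minimal illustration rather than the released HACO code: it assumes average linkage (the slides do not name the linkage), calls the threshold `theta`, adds a `max_dist` stopping distance, and uses made-up distances for RAD23, PNG1, and RAD4.

```python
from itertools import combinations

def haco(items, dist, theta, max_dist=float("inf")):
    """Agglomerative clustering in which a set may be reused after merging.

    dist maps frozenset({x, y}) to a distance. Returns the merges as
    (merged_set, distance) pairs in the order they happen.
    """
    def linkage(a, b):
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    pool = [frozenset([i]) for i in items]
    seen = set(pool)      # every set created so far, to avoid duplicates
    first_merge = {}      # M(A): distance at which A was first merged
    merges = []
    while True:
        candidates = [
            (linkage(a, b), a, b)
            for a, b in combinations(pool, 2)
            if not (a & b) and (a | b) not in seen
        ]
        if not candidates:
            break
        d, a, b = min(candidates, key=lambda c: c[0])
        if d > max_dist:
            break
        merged = a | b
        pool.append(merged)
        seen.add(merged)
        merges.append((merged, d))
        first_merge.setdefault(a, d)
        first_merge.setdefault(b, d)
        # Keep a merged set in the pool until d exceeds M(A) by more than theta.
        pool = [s for s in pool
                if s not in first_merge or d - first_merge[s] <= theta]
    return merges

items = ["RAD23", "PNG1", "RAD4"]
dist = {
    frozenset(("RAD23", "PNG1")): 0.10,  # hypothetical distances
    frozenset(("RAD23", "RAD4")): 0.12,
    frozenset(("PNG1", "RAD4")): 0.90,
}
for merged, d in haco(items, dist, theta=0.05, max_dist=0.3):
    print(sorted(merged), d)
```

In this sketch a negative `theta` removes both sets immediately after every merge, which reduces to plain HAC; with `theta=0.05` RAD23 stays in the pool long enough to merge with RAD4 as well.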
30
HACO vs. HAC
  • RAD23 merges with PNG1 first,
  • but has only slightly lower affinity with RAD4
  • RAD23 and RAD4 form a reference complex, NEF2
  • In HAC, RAD23 is stuck with PNG1
  • In HACO, RAD23 is reused to merge with RAD4

[Figure: RAD23, PNG1, and RAD4; HAC groups RAD23 with PNG1 only, while HACO also groups RAD23 with RAD4, recovering NEF2]
31
HACO recovers more reference complexes
  • 5-fold cross-validation

[Figure: reference complexes recovered by Hart, Pu, HAC (PE), HAC (all), and HACO]
32
More biologically coherent
  • Compare proteins within the same complex

[Figure: three panels (regulator overlap, fitness correlation, abundance) comparing Hart, Pu, HACO, Reference, and Random; axis ends are labeled Good and Bad]
33
Newly discovered complex?
  • Discovered a previously uncharacterized six-protein
    complex, involving four phosphatases
  • Consistent with genetic interaction data
      • The six proteins cluster together
      • They have positive genetic interactions

34
Essential proteins are hubs?
  • Jeong et al., Nature 2001: essential proteins
    are hubs in the protein-protein interaction network

35
It's all about complexes
  • Larger complexes are more likely to be essential
  • Complex size is a better predictor of essentiality
    than hubness

36
Conclusion
  • Our contribution
      • Integrated PE score with indirect evidence
      • Geared toward identifying complexes
      • Developed HACO to allow overlap
      • Applicable to other clustering problems
  • Our predicted set of complexes
      • Matches the reference set better
      • More biologically coherent
      • Identifies unknown complexes
      • Provides biological insight

37
Outline
[Figure: roadmap from protein-protein interaction (sequence DKPALAKPPKV) to complex to pathway]
38
Complexes interact to coordinate a pathway
  • Signaling pathways
      • Activate, deactivate
      • Modification, e.g. phosphorylation
  • Protein degradation pathway
      • 11S activator + 20S proteasome -> active unit
      • Degrades short peptides
  • More transient
      • Specific time, location, condition

[Figure: protein degradation]
39
Few data and studies
  • Difficult to measure
      • Interactions are more transient in nature
      • Lack of a comprehensive set of complexes
  • Prior work on protein-protein interactions
      • Graphical model (Deng et al. 2002; Liu et al.
        2005)
      • Attraction-repulsion model (Gomez et al. 2003)
      • SVM (Bock and Gough 2001)
  • Our work: predict interactions at the level of
    complexes

40
Extract signals for protein pairs between two
complexes
  • Features used for predicting complexes
      • Genetic interactions
      • Sharing of transcription factors
  • InSite interaction probability
      • Integrates multiple evidence
      • Correlates well with complex-complex interaction
  • Co-expression across active conditions
      • Best for predicting complex-complex interaction

[Figure: 11S and 20S form the active proteasome; shown with stimulus and without stimulus]
41
Experiments
  • Feature: aggregate protein-level signal between
    complexes
      • E.g. between complexes X and Y: min P(A, B) over
        A in X, B in Y
  • Complexes: our predictions
      • More comprehensive
      • More biologically coherent than the reference set
  • Reference complex-complex interactions (CCIs)
      • 59 complex pairs hand-labeled by biologists
      • 82 complex pairs enriched for reliable PPIs
      • 133 total unique CCIs
  • Naïve Bayes with hidden variables for unknown
    pairs
      • Learned using EM
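
The aggregation step can be sketched as follows. The function name, the `default` score for unmeasured pairs, and the protein names are hypothetical; the slide gives `min` as one example aggregator, and other reducers such as `max` can be passed the same way.

```python
def aggregate(pair_score, complex_x, complex_y, how=min, default=0.0):
    """Collapse protein-pair scores P(A, B), A in X and B in Y,
    into a single feature for the complex pair (X, Y)."""
    scores = [
        pair_score.get(frozenset((a, b)), default)
        for a in complex_x
        for b in complex_y
        if a != b  # overlapping complexes may share a protein
    ]
    if not scores:
        raise ValueError("no protein pairs between the two complexes")
    return how(scores)

# Hypothetical protein-level interaction probabilities.
pair_score = {
    frozenset(("x1", "y1")): 0.9,
    frozenset(("x1", "y2")): 0.6,
    frozenset(("x2", "y1")): 0.8,
    frozenset(("x2", "y2")): 0.2,
}
feature_min = aggregate(pair_score, {"x1", "x2"}, {"y1", "y2"})
feature_max = aggregate(pair_score, {"x1", "x2"}, {"y1", "y2"}, how=max)
print(feature_min, feature_max)
```

Each protein-level signal can be aggregated this way to give one feature per complex pair for the classifier.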

42
Accuracy on reference CCIs
  • 10-fold cross-validation

[Figure: area under ROC curve for separating interacting from non-interacting reference CCIs]
43
Interacting complexes likely in the same
functional category
  • Top 500 predicted pairs
  • More than half of the proteins in a complex are in
    a category
      • -> the complex is assigned to that category

[Figure: proportion of complex pairs in the same category]
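
The majority rule for assigning a complex to a functional category can be written down directly. The annotations and names below are made up for illustration, and a protein is allowed to carry several categories, which is an assumption.

```python
from collections import Counter

def complex_categories(proteins, protein_categories):
    """Categories held by more than half of the proteins in a complex."""
    counts = Counter(
        c for p in proteins for c in set(protein_categories.get(p, ()))
    )
    return {c for c, n in counts.items() if 2 * n > len(proteins)}

def share_category(complex_x, complex_y, protein_categories):
    """True if the two complexes are assigned at least one common category."""
    cats_x = complex_categories(complex_x, protein_categories)
    cats_y = complex_categories(complex_y, protein_categories)
    return bool(cats_x & cats_y)

# Hypothetical annotations.
annotations = {
    "p1": {"degradation"},
    "p2": {"degradation", "transport"},
    "p3": {"transport"},
    "q1": {"degradation"},
    "q2": {"degradation"},
}
print(complex_categories(["p1", "p2", "p3"], annotations))
print(share_category(["p1", "p2", "p3"], ["q1", "q2"], annotations))
```

The proportion plotted on this slide would then be the fraction of predicted complex pairs for which `share_category` returns true.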
44
Conclusion
  • Our predicted set of complex-complex interactions
      • High accuracy
      • Functionally coherent
  • Builds upon the previous two stages
      • InSite interaction probability as a feature
      • Our predicted complexes as interaction candidates

45
Summary of the talk
[Figure: protein-protein interaction stage (sequence DKPALAKPPKV)]
  • Unsupervised learning
      • Bayesian network with EM

46
Summary of the talk
  • Supervised learning
      • LogitBoost
      • Clustering

[Figure: complex stage]
47
Summary of the talk
  • Semi-supervised learning
      • Naïve Bayes with EM

[Figure: complex and pathway stages]
48
Summary of the talk
[Figure: data integration across protein-protein interaction (sequence DKPALAKPPKV), complex, and pathway]
49
Contributions and resources
  • List of predicted PPIs and interaction sites
      • http://dags.stanford.edu/InSite/
  • List of predicted complexes
      • http://dags.stanford.edu/HACO/
  • List of predicted CCIs
      • http://dags.stanford.edu/CCI/
  • InSite code
      • http://dags.stanford.edu/InSite/software.html
  • HACO for clustering with overlap
      • http://dags.stanford.edu/HACO/software.html

50
Future work
  • Reconstruct pathways and functional modules
  • Different types of interactions
  • Phosphorylation, ubiquitination
  • Specific time, location, and condition

51
Acknowledgment
  • Daphne Koller
  • Serafim Batzoglou, Douglas Brutlag, Jean-Claude
    Latombe, Andrew Ng
  • DAGS members
  • Collaborators: Eran Segal, Asa Ben-Hur, Qianru
    Li, Marc Vidal, Sean Collins, Nevan Krogan
  • My family and friends