1
Interactions and Ontologies

CBW Bioinformatics Workshop, February 23rd, 2006, Toronto
Christopher Hogue, The Blueprint Initiative
2
About this talk
  • Interoperability, Standards and Systems - A
    Historic Perspective
  • Understanding Biomolecular Function
  • A BIND Interaction Record
  • Interaction and Reaction Data Models
  • Interaction Experiments
  • Yeast Two Hybrid, Affinity Purification and False
    Positives
  • Spoke and Matrix models for complexes of Unknown
    Topology
  • Ontologies

3
Interaction Databases
  • Aminoacyl-tRNA Synthetases Database
  • ASEdb - Alanine Scanning Energetics database
  • BBID - Biological Biochemical Image Database
  • BIND - Biomolecular Interaction Network Database
  • BindingDB - The Binding Database
  • Biocarta
  • Biocatalysis/Biodegradation Database
  • BioPathways Consortium
  • BRENDA
  • BRITE - Biomolecular Relations in Information
    Transmission and Expression
  • COMPEL (Composite Regulatory Elements)
  • COPE - Cytokines Online Pathfinder Encyclopaedia
  • CSNDB - Cell Signaling Networks Database / CSNDB
    Paper
  • Curagen Pathcalling
  • DIP - Database of Interacting Proteins
  • DPInteract - DNA-protein interactions
  • DRC - Database of Ribosomal Crosslinks
  • Ecocyc and Metacyc
  • Dynamic Signaling Maps
  • JenPep Immunology MHC-peptide database
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
  • Kohn Molecular Interaction Maps
  • MDB - Metalloprotein Database and Browser
  • MHCPEP - A database of MHC binding peptides
  • MINT - a database of Molecular INTeractions
  • MIPS Yeast Genome Database
  • MMDB - Molecular Modeling Database
  • NetBiochem Welcome Page
  • ooTFD - object-oriented Transcription Factors
    Database
  • ORDB - Olfactory Receptor Database
  • PATIKA - Pathway Analysis Tool for Integration
    and Knowledge Acquisition
  • PFBP - Protein Function and Biochemical Pathways
    Project
  • PhosphoBase - A database of phosphorylation sites
  • PIM (Protein Interaction Map)
  • PIMdb - Drosophila Protein Interaction Map
    database
  • PKR - Protein Kinase Resource
  • ProChart Database (at AxCell Biosciences)
  • ProNet Online - Protein Interactions on the Web
    (Myriad)

4
Over 50? Why So Many?
  • Easy to build a simple Interaction Database.
  • A Simple Abstraction: many projects cut their
    teeth in bioinformatics.
  • Conceptually, this list includes Biochemical
    Pathways (reactions and interactions).
  • Also includes transcription factors, tRNA
    synthetases, etc., all of which can fall into a
    general biomolecular binding description.
  • Many Niches to Fill:
      • Kinetics
      • Organism-centric
      • Protein-protein-centric
  • Most are not funded for a large-scale service.

5
How do we make things interoperate? What is in a
Standard? A Historical Perspective
  • Standards emerge from successful implementations
    of complete systems.
  • Which one is the standard: the light bulb
    or the electrical grid?
  • Lamps were the original killer app
    (bye-bye candles, gas lamps, oil lamps).
  • Other apps: motors, heaters, toasters.
  • Unexpected apps: radio, TV, transformers,
    computers, rechargeables.
  • Entire systems become standards via ad hoc and
    popular use: a snowball effect.

6
Emergence and evolution of technological systems
  • Systems emerge across broad frontiers
  • Lots of small inventions are responsible for
    emerging technologies.
  • Portions of the frontier that are held back
    become the focus of intense innovation
  • Called a reverse salient by students of
    technology
  • An inadequately functioning or inaccessible
    component in a complex system of components
  • Opportunities for invention and replacement

7
Reverse Salient AC/DC Example
  • 1882: Edison's DC standard lit up Wall Street.
  • High-level buy-in for DC.
  • AC was too complicated, could kill a person!
  • Edison's DC system only worked over short
    distances.
  • This flaw is the reverse salient.
  • Westinghouse/Stanley/Tesla saw the flaw in this
    standard.
  • AC technology raced to fill the gap.
  • Light bulbs work with either AC or DC.
  • Motors required re-invention.
  • E.S. Rogers' batteryless radio (1925)
8
Reverse Salient AC/DC Example
  • Result: cars and battery-based devices emerged
    with DC.
  • Result: the electrical grid emerged with AC.

NOT A WINNER-TAKE-ALL (zero-sum game) RESULT!
9
A few reverse salients in Bioinformatics
  • Inadequately Functioning
      • Integration of structure and sequence
      • Integration of chemoinformatics with
        bioinformatics
      • Mapping of microarray data to pathways
      • Integration of interactions and pathways
  • Inaccessible
      • Carbohydrate representation and analysis tools
      • Advanced, ad hoc text-mining tools

10
Reverse Salient Attitudes
  • What holds us back?
      • Oversights (didn't think of that!)
      • Shortsightedness (won't ever need that!)
      • Inability (can't do it!)
      • Stubbornness (won't do it!)
      • Prescriptivism (do it like this!)
      • Nationalism, Continentalism, Colonialism
        (because that's the way we do it here!)
      • 110 V vs 220 V

11
Understanding Biomolecular Function
  • "I yam what I yam and that's all that I yam."
    - Popeye the Sailor Man, the world's first comic
    book superhero

12
Biomolecular function
E + S → E + P
  • This is a generalization of how a biochemist
    might represent the function of enzymes.

13
Biomolecular function
E + S → E + P
kinase-ATP complex + inactive enzyme → kinase + ADP + active enzyme
[Diagram: the same reaction drawn with the labels K, P, ATP and ADP]
  • Here is an example of the generalization
    represented two different ways.

14
Biomolecular function
[Diagram: kinase-ATP complex + inactive enzyme → active enzyme + ADP]
  • This is another representation.

15
Biomolecular function
[Diagram: generalized interaction among molecules A to F]
  • This is a generalization of the representation.

16
Biomolecular function
[Diagram: generalized interaction among molecules A to F]
  • A biomolecule's function can be defined by the
    things that it interacts with and the new (or
    altered) molecules that result from that
    interaction.

17
Biomolecular function
[Diagram: molecules A to E joined by interaction n]
  • This representation makes it easy to focus on the
    interaction part.

18
Biomolecular function
[Diagram: molecules A to E joined by interaction n]
  • This also happens to represent the BIND data
    model.

19
A simple BIND record
[Diagram: interaction between molecules A and B]
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • The minimal BIND record has 9 pieces of
    information.

20
A curated BIND record
[Diagram: interaction between molecules A and B]
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • The curated BIND record may have many more pieces
    of information.

21
An example BIND record
[Diagram: interaction between A (INAD) and B (TRP)]
1. INAD
2. TRP
3. Protein
4. Protein
5. GenBank GI 3641615
6. GenBank GI 7301861
7. GenBank Taxonomy ID 7227
8. GenBank Taxonomy ID 7227
9. PubMed ID 8630257
  • You can view this record in BIND
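The nine pieces of a minimal record map naturally onto a small data structure. The sketch below is an illustration only: the class names, field names and the `count_leaf_fields` helper are hypothetical and are not BIND's actual schema or API. It encodes the INAD-TRP example (values as read from the slide) and counts its scalar fields to confirm there are nine.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class Molecule:
    """One interacting molecule (hypothetical field names, not BIND's schema)."""
    short_label: str      # fields 1/2: short label
    molecule_type: str    # fields 3/4: molecule type
    db_reference: str     # fields 5/6: database reference
    taxonomy_id: int      # fields 7/8: where the molecule comes from


@dataclass(frozen=True)
class MinimalInteraction:
    """A minimal interaction record: two molecules plus a publication."""
    a: Molecule
    b: Molecule
    pubmed_id: int        # field 9: publication reference


def count_leaf_fields(obj) -> int:
    """Count scalar fields, recursing into nested dataclass instances."""
    total = 0
    for f in fields(obj):
        value = getattr(obj, f.name)
        total += count_leaf_fields(value) if is_dataclass(value) else 1
    return total


inad_trp = MinimalInteraction(
    a=Molecule("INAD", "Protein", "GenBank GI 3641615", 7227),
    b=Molecule("TRP", "Protein", "GenBank GI 7301861", 7227),
    pubmed_id=8630257,
)
print(count_leaf_fields(inad_trp))  # 9
```

A curated record would add further optional fields (experimental method, binding sites, and so on) without changing this minimal core.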

22
BIND stores molecular interaction data
24
http://bind.ca
  • Enter 188 (the BIND record number) in the
    Identifier search box

26
BIND records are observations
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • All BIND records will have a publication
    reference, and most will specifically list the
    method(s) used to demonstrate the interaction.

28
Methods used to detect interactions.
  • A great deal of interaction data in BIND
    originates from high-throughput experiments
    designed to detect interactions between
    proteins.
  • The most common methods are:
      • Two-hybrid assay
      • Affinity purification

29
Interaction Experimental Evidence in BIND
[Chart: experimental evidence types for interactions in BIND; only the label "Remaining 1" survives]
30
Two-hybrid assay
[Diagram: two-hybrid assay, steps 1-4]
31
Two-hybrid assay
[Diagram: two-hybrid assay, steps 1-4]
32
Two-hybrid assay
[Diagram: two-hybrid assay, steps 1-4, with proteins A and B]
33
Two-hybrid assay
[Diagram: two-hybrid assay, steps 1-4, with proteins A and B]
34
Two-hybrid assay
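[Diagram: two-hybrid assay, steps 1-4. Labels: SNF1 (A) fused to the GAL4 DNA-binding domain (GAL4-DBD); SNF4 (B) fused to a transcription activation domain; UASG; the GAL1 reporter ("Allows growth on galactose")]
Fields S, Song O. Nature. 1989 Jul 20;340(6230):245-6. PMID: 2547163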
35
Some Two-hybrid caveats
[Diagram: two-hybrid assay, steps 1-4, with protein A only]
Does the DBD-fusion have activity by itself?
36
Some Two-hybrid caveats
[Diagram: two-hybrid assay, steps 1-4, with proteins A and B]
Is the interaction bi-directional?
37
Some Two-hybrid caveats
[Diagram: two-hybrid assay, steps 1-4, with proteins A, B and C]
Is the interaction mediated by some other
protein?
38
Some Two-hybrid questions
[Diagram: two-hybrid assay, steps 1-4, with proteins A and B]
Are the proteins expressed?
Are they over-expressed?
Are they in-frame?
Are the interacting domains defined?
Was the observation reproducible?
Was the strength of interaction significant?
Was another method used to back up the conclusion?
Are the two proteins from the same compartment?
39
Two-hybrid assay
[Diagram: two-hybrid assay, steps 1-4, with proteins A and B]
Negative results don't mean a lot.
40
Affinity purification
[Diagram: protein of interest (A) carrying a tag modification (e.g. HA/GST/His); a second molecule will bind the tag]
41
Affinity purification
[Diagram: tagged protein A inside the cell]
42
Affinity purification
[Diagram: the cell, containing tagged A, a naturally binding protein B, and lots of other untagged proteins]
43
Affinity purification
[Diagram: ruptured membranes release a cell extract containing A and B]
44
Affinity purification
[Diagram: A and B on the column; untagged proteins go through fastest (flow-through)]
45
Affinity purification
[Diagram: A and B; tagged complexes are slower and come out later (eluate)]
46
Some affinity purification questions
Is the bait protein expressed and in frame?
Is the bait protein observed?
Is the bait protein over-expressed?
Are the interacting domains defined?
Was the observation reproducible?
Was the interactor found in the background?
Was the strength of interaction significant?
Was the interaction saturable?
Was the interactor stoichiometric with the bait protein?
Was another method used to back up the conclusion?
Was tandem-affinity purification (TAP) used?
Was the interaction shown using an extract or a purified protein?
Is the inverse interaction observable?
Are the two proteins from the same compartment?
Are the two proteins known to be involved in the same process?
Is the interactor likely to be physiologically significant?
[Diagram: bait A and interactor B]
47
Some affinity purification caveats
First and most importantly, this is only a
representation of the observation. You can only
tell which proteins are in the eluate; you can't
tell how they are connected to one another. If
there is only one other protein present (B), then
it's likely that A and B are directly
interacting. But what if I told you that
two other proteins (B and C) were present along
with A?
[Diagram: A with B; A with B and C]
48
Complexes with unknown topology
[Diagram: three alternative arrangements of A, B and C]
Which of these models is correct? The complex
described by this experimental result is said to
have an Unknown Topology.
49
Complexes with unknown stoichiometry
[Diagram: a complex containing two copies of A together with B and C]
Here's another possibility. The complex described
by this experimental result is also said to have
Unknown Stoichiometry.
50
High throughput data in BIND
  • Affinity purification: Systematic identification
    of protein complexes in Saccharomyces cerevisiae
    by mass spectrometry (2002). PMID 11805837
  • Two-hybrid: A protein interaction map of
    Drosophila melanogaster (2003). PMID 14605208
  • Two-hybrid and affinity purification: A map of
    the interactome network of the metazoan
    C. elegans (2004). PMID 14704431
  • Data from these examples can be retrieved from
    BIND using a PMID search.

51
How complex data are stored in BIND.
[Diagram: proteins A, B and C; connections marked "?"]
Three interaction records.
52
How complex data are stored in BIND.
[Diagram: proteins A, B and C; connections marked "?"]
A complex record in BIND is simply a collection
of interaction records.
53
How complex data are stored in BIND.
[Diagram: proteins A, B and C; connections marked "?"]
A complex record in BIND is simply a collection
of interaction records.
54
Alternate representations.
[Diagram: proteins A, B and C; connections marked "?"]
The matrix model (a clique).
55
Alternate representations.
[Diagram: proteins A, B and C; connections marked "?"]
The spoke model. Which model to use?
56
Spoke and Matrix Models
  • Vrp1 (bait), Las17, Rad51, Sla1, Tfp1, Ypt7

[Diagram panels: Possible actual topology | Spoke | Matrix]
Matrix: theoretical maximum number of interactions,
but many false positives (FPs).
Spoke: simple model; intuitive, more accurate, but
can misrepresent.
Bader & Hogue, Nature Biotech. 2002 Oct;20(10):991-7
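The two models differ only in how one co-purification result is expanded into binary interaction records. A minimal sketch using the Vrp1 purification listed above; the function names are hypothetical, and real data would also need to record which protein was the bait in each purification.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Spoke model: link the bait to each co-purified prey only."""
    return {frozenset((bait, prey)) for prey in preys}


def matrix_model(bait, preys):
    """Matrix model: link every pair of co-purified proteins (a clique)."""
    members = [bait, *preys]
    return {frozenset(pair) for pair in combinations(members, 2)}


bait = "Vrp1"
preys = ["Las17", "Rad51", "Sla1", "Tfp1", "Ypt7"]
spoke = spoke_model(bait, preys)
matrix = matrix_model(bait, preys)
print(len(spoke), len(matrix))  # 5 15
```

For a bait with n preys, the spoke model yields n pairs while the matrix model yields n(n+1)/2, so the matrix grows quadratically with complex size. Every spoke pair is also a matrix pair; the extra prey-prey pairs are where most of the matrix model's false positives come from.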
57
A view on real data: matrix model (seems hopeless)
[Diagram: interaction network showing a group of 6 redox enzymes, a group of 7 redox enzymes, and Old Yellow Enzyme (function?)]
58
OYE has little small-molecule specificity,
unlike all other redox enzymes.
The crystal structure shows a large surface near
its reactive site, unlike other similar
proteins. So is its substrate a protein? Other
redox enzymes? Solution: go do an experiment!
59
Predicting Interaction Information
  • Very often the best result of a bioinformatics
    investigation is the suggestion of a specific
    experiment that wasn't previously considered.
  • It is often very hard to get a scientist to try
    an experiment.
  • Negative results aren't publishable, so the
    experimentalist risks wasting their
    time and resources!
  • Narrowing down the vast space of possible
    interactions is important.
      • Approx. 36,000,000 pairs of testable
        protein-protein interactions in yeast.
  • It is important to use all the information at
    hand and to demonstrate to the experimentalist
    that you have reduced (not increased or left
    unchanged) their risk of failure.
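The 36,000,000 figure is consistent with squaring a yeast proteome of roughly 6,000 proteins, i.e. counting every bait-prey orientation separately (two-hybrid results need not be bi-directional). The proteome size and this reading of the figure are assumptions; counting each unordered pair only once roughly halves the space.

```python
from math import comb

N_PROTEINS = 6000  # assumption: approximate number of yeast proteins

# Every bait-prey orientation, self-pairs included.
ordered_pairs = N_PROTEINS ** 2
# Each distinct pair of different proteins counted once.
unordered_pairs = comb(N_PROTEINS, 2)

print(f"{ordered_pairs:,} {unordered_pairs:,}")  # 36,000,000 17,997,000
```

Either way the space is far too large to test exhaustively, which is why prediction has to narrow it before an experimentalist commits resources.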

60
1. How do we predict/validate interactions?
2. How do we locate specific binding sites?
  • Functional annotation (imprecise for 2)
  • Matching sequence features to patterns
      • PSSMs (position-specific scoring matrices)
      • Domain-small molecule interactions (SMID-BLAST)
      • Domain-motif interactions
  • 3D docking
      • Slow
      • Needs 3D models
      • Energy scoring functions are imprecise

61
Motif-Domain Interactions
  • Protein interactions play a crucial role in
    driving many important cellular processes such as
    intra-cellular signaling, transcription
    regulation, cell cycle regulation, and metabolic
    activities.
  • Many of the interactions are mediated by
    conserved domains binding to short sequence
    motifs that form peptide recognition modules.
  • Only a small number of domains have known binding
    motifs.

62
SH3 domain and Pro-rich Motif
63
High-throughput protein complex identification
Ho et al., Nature 415, 180-183 (10 Jan 2002): HMS-PCI dataset
Gavin et al., Nature 415, 141-147 (10 Jan 2002): TAP dataset
64
Rho family GTPase Interactions
Extract motifs from 3D structures
Criteria: non-domain polypeptides
65
Gibbs Sampling
  • Gibbs sampling is a stochastic Markov chain Monte
    Carlo algorithm.
  • Used for motif discovery in proteins.
  • Widely used for the identification of transcription
    factor binding sites (Lawrence et al., 1993;
    Neuwald et al., 1995).
  • Gibbs sampling allows for the incorporation of
    prior knowledge about the motif composition.
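The sampling loop can be sketched as a minimal Gibbs site sampler in the spirit of Lawrence et al. (1993). This is written for illustration and is not the implementation used in this work. Assumptions: exactly one motif occurrence per sequence, a fixed motif width, only the 20 standard amino acids, and uniform pseudocounts (the pseudocounts are where prior knowledge about motif composition would enter). Each iteration holds out one sequence, builds a profile from the others, and resamples the held-out site in proportion to its motif-versus-background likelihood ratio.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def background(seqs, pseudo=1.0):
    """Residue frequencies over all input sequences (with pseudocounts)."""
    counts = {a: pseudo for a in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def profile(seqs, starts, width, skip, pseudo=1.0):
    """Motif column frequencies from every sequence except index `skip`."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for i, (s, st) in enumerate(zip(seqs, starts)):
        if i != skip:
            for j in range(width):
                counts[j][s[st + j]] += 1
    total = (len(seqs) - 1) + pseudo * len(ALPHABET)
    return [{a: c / total for a, c in col.items()} for col in counts]


def gibbs_motif(seqs, width, iterations=500, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Random initial alignment: one candidate site per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        q = profile(seqs, starts, width, skip=i)
        s = seqs[i]
        # Log-odds of each window under the motif model vs. background.
        logw = [sum(math.log(q[j][s[p + j]] / bg[s[p + j]]) for j in range(width))
                for p in range(len(s) - width + 1)]
        top = max(logw)
        weights = [math.exp(x - top) for x in logw]
        # Sample the new start in proportion to its likelihood ratio.
        starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return starts


seqs = ["MKTAYIAKQEDYWRLAGL", "GSHMQEDYARPLVNNT", "PPLQEDYKRSTTAGWE"]
starts = gibbs_motif(seqs, width=6, iterations=200, seed=1)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

The three toy sequences each carry a planted QEDY-R site. With a fixed seed the run is reproducible, but as a stochastic method it is not guaranteed to settle on the planted sites in a short run.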

66
Seed and Focus Procedure
  • Gibbs sampling is sensitive to database size.
  • On a sufficiently large database, almost any
    motif could be found.
  • Most motifs found with this approach were found
    before databases grew large through genomics.
  • SEED the Gibbs sampler with the 3D structure
    motif.
  • FOCUS the Gibbs sampler on groups of interacting
    sequences found in complexes with the domain
    (a smaller database).
  • If the motif is real, it should be enriched;
    otherwise, it should disappear.
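The slides do not say how enrichment was measured. One common choice, assumed here, is a hypergeometric tail probability: if `successes` of `population` sequences carry the motif, how likely is it that a focused set of `draws` sequences contains at least `k` of them by chance? The helper name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from
    `population` items, of which `successes` carry the motif."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


# Hypothetical numbers: 5 of 20 sequences carry the motif, and a focused
# set of 4 sequences contains 3 of them.
p = hypergeom_tail(3, population=20, successes=5, draws=4)
print(round(p, 4))  # 0.032
```

A small tail probability in the focused set is the "enriched" case; a motif that is merely an artifact of a large database should show no such excess.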

67
Focused sequences from yeast complexes
containing RhoGAP.
Input to Gibbs sampler:
  • SEED: motifs from 3D structure
  • Database: all proteins from HTP complexes in
    yeast that have RhoGAP domains
68
4 motif descriptions, 4 PSSMs:
  • QEDYXR
  • YVPXVP
  • QEDYXRLXXL
  • YXPXXF
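In these motif descriptions, X stands for any residue. Before any PSSM scoring, the descriptions can be used directly as exact patterns, a cruder filter than a PSSM. A minimal sketch with Python's `re` module; the function names are hypothetical, and the lookahead is there so that overlapping occurrences are all reported.

```python
import re

MOTIFS = ["QEDYXR", "YVPXVP", "QEDYXRLXXL", "YXPXXF"]


def motif_to_regex(motif):
    """Translate a motif in which X means 'any residue' into a regex string."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in motif)


def find_motifs(seq, motifs=MOTIFS):
    """Return (motif, start, matched text) for every occurrence, overlaps included."""
    hits = []
    for motif in motifs:
        # A zero-width lookahead lets overlapping matches all be reported.
        pattern = f"(?=({motif_to_regex(motif)}))"
        for m in re.finditer(pattern, seq):
            hits.append((motif, m.start(), m.group(1)))
    return hits


print(find_motifs("AAQEDYWRLAGLKK"))
# [('QEDYXR', 2, 'QEDYWR'), ('QEDYXRLXXL', 2, 'QEDYWRLAGL')]
```

Exact patterns treat every allowed residue as equally good and every other residue as forbidden; a PSSM replaces that yes/no decision with a graded score per position.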
69
Use PSSMs to Identify Motifs
  • Constrain to the HTP complexes (next slide).
      • Good enough to get the attention of an
        experimentalist!
  • Try on all yeast genes:
      • 18,459 raw PSSM-based predictions (scores vary)
      • No compartmentalization or other information
        considered
      • 623 predictions match literature-validated
        interactions
      • Probability of predicting these by random
        chance is 1.6e-53.
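Scanning a proteome with a PSSM amounts to sliding a window along each sequence and summing per-position log-odds scores. A minimal sketch, assuming a uniform background, a pseudocount of 0.5, log base 2, and hypothetical training sites; the function names are illustrative, and this does not reproduce the scoring behind the 18,459 predictions.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudo=0.5, background=None):
    """Log2-odds PSSM from equal-length aligned sites (one dict per column)."""
    bg = background or {a: 1 / len(ALPHABET) for a in ALPHABET}
    denom = len(sites) + pseudo * len(ALPHABET)
    pssm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        pssm.append({a: math.log2((column.count(a) + pseudo) / denom / bg[a])
                     for a in ALPHABET})
    return pssm


def scan(seq, pssm, threshold):
    """Score every window of `seq`; keep those scoring at least `threshold`."""
    width = len(pssm)
    hits = []
    for p in range(len(seq) - width + 1):
        score = sum(pssm[j][seq[p + j]] for j in range(width))
        if score >= threshold:
            hits.append((p, seq[p:p + width], round(score, 2)))
    return hits


# Hypothetical training sites for the QEDYXR motif.
pssm = build_pssm(["QEDYAR", "QEDYKR", "QEDYWR"])
print(scan("AAQEDYWRLAGL", pssm, threshold=5.0))
```

The threshold trades sensitivity for specificity; running this over every yeast gene with no compartment or complex information is what produces a large raw prediction list whose scores vary.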

70
Predicted RhoGAP interactions
M. Tyers did the validation, using a standard
FLAG pull-down and then a more sensitive Myc
double-tag pull-down. 11 validated interactions
(colored) match the 4 motifs.
71
High-throughput protein complex identification
Ho et al., Nature 415, 180-183 (10 Jan 2002): HMS-PCI dataset
Gavin et al., Nature 415, 141-147 (10 Jan 2002): TAP dataset
72
Domain-Motif TAP network hits.
73
Domain-motif HMS-PCI network hits.
Significantly more domain-motif hits than in the
TAP dataset. The over-expressed proteins used in
this approach may be more sensitive to transient
or low-copy-number domain-motif interactions. Or
the selected baits may contain more domain-motif
interactions in their respective networks.
74
A tea cup in a rainstorm
  • 2000 elemental observations (facts) about
    molecular assembly published in the literature
    every month
  • 2600 High Throughput Interactions published
    every month with high rates of false positives.
  • 200,000 facts sitting in the literature on
    library shelves, not validated.

75
Ontologies for Pathways, Interactions and
Signaling
  • An emerging consensus that may help you
    (someday)

76
The domain: biological pathways
Main categories:
  • Metabolic Pathways
  • Molecular Interaction Networks
  • Signaling Pathways
77
Ontology
  • <philosophy> A systematic account of Existence.
  • <artificial intelligence> (From philosophy) An
    explicit formal specification of how to represent
    the objects, concepts and other entities that are
    assumed to exist in some area of interest and the
    relationships that hold among them.
  • <information science> The hierarchical
    structuring of knowledge about things by
    subcategorising them according to their essential
    (or at least relevant and/or cognitive)
    qualities. This is an extension of the previous
    senses of "ontology" (above) which has become
    common in discussions about the difficulty of
    maintaining subject indices. The philosophy of
    indexing everything in existence?

78
Ontology redux
  • An ontology is a choice of a system of data
    grammar together with specific controlled
    vocabularies and an organizational framework to
    contain data.
  • Ontologies are used in practice to describe how
    to exchange data faithfully between computers,
    not how to compute with them!
  • An Ontology may be used to Archive information or
    to make information available to applications
    (API).

79
Parsing - Summary
  • Parsing flatfiles is instructive to understand
    how biological data is stored and used.
  • Most bioinformaticians in small academic groups
    write their own parsers and work with small
    batches of computations.
  • Data Grammars and automatically generated parsers
    are efficient and often error free.
  • Most database organizations and software
    developers with large audiences use data grammar
    approaches.
  • Semantic approaches (OWL) are beginning to emerge.

80
BioPAX
  • BioPAX: Biological PAthway eXchange
  • A data exchange ontology and format for semantic
    integration, aggregation and inference of
    biological pathway data
  • Open-source community effort: the community
    agreed upon and built this!
  • www.biopax.org

81
BioPAX Ontology Overview
Level 1 v1.0 (July 7th, 2004)
82
The domain: biological pathways
Main categories:
  • Metabolic Pathways
  • Molecular Interaction Networks
  • Signaling Pathways
83
Aggregation, Integration, Inference
  • Multiple kinds of pathway databases
      • Metabolic
      • Molecular interactions
      • Signal transduction
      • Gene regulatory
  • Constructs designed for integration
      • DB references
      • XRefs (Publication, Unification, Relationship)
      • Synonyms
      • Provenance (not yet implemented)
  • OWL DL to enable reasoning

84
BioPAX uses other ontologies
  • Conceptual framework based upon existing DB
    schemas
      • aMAZE, BIND, EcoCyc, WIT, KEGG, Reactome, etc.
  • Allows a wide range of detail and multiple levels
    of abstraction
  • Uses pointers to existing ontologies to provide
    supplemental annotation where appropriate
      • Cellular location → GO Component
      • Cell type → Cell.obo
      • Organism → NCBI taxon DB
  • Incorporates other standards where appropriate
      • Chemical structure → SMILES, CML, InChI
  • Interoperates with existing standards (RDF/OWL,
    LSID, SBML, PSI, CellML Metadata Standard)

85
Case study: BioPAX in SBML facilitates SBML
integration
  • Addresses SBML's nasty data integration issues:
      • Different data types, same representation
      • Same data, different representations
      • External references
      • Synonyms
      • Provenance

86
BioPAX Ontology Overview
[Diagram labels: species, reaction, modifier]
Level 1 v1.0 (July 7th, 2004)
87
Different data types, same representation
Protein-Protein Interaction
    <reaction id="pyruvate_dehydrogenase_cplx">
      <listOfReactants>
        <speciesReference species="PdhA"/>
        <speciesReference species="PdhB"/>
      </listOfReactants>
      <listOfProducts>
        <speciesReference species="Pyruvate_dehydrogenase_E1"/>
      </listOfProducts>
    </reaction>

Biochemical Reaction
    <reaction id="pyruvate_dehydrogenase_rxn">
      <listOfReactants>
        <speciesReference species="NADP"/>
        <speciesReference species="CoA"/>
        <speciesReference species="pyruvate"/>
      </listOfReactants>
      <listOfProducts>
        <speciesReference species="NADPH"/>
        <speciesReference species="acetyl-CoA"/>
        <speciesReference species="CO2"/>
      </listOfProducts>
      <listOfModifiers>
        <modifierSpeciesReference species="pyruvate_dehydrogenase_E1"/>
      </listOfModifiers>
    </reaction>
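To a program, both records above look like the same kind of thing: lists of reactants, products and (optionally) modifiers. A minimal sketch with Python's standard-library `xml.etree.ElementTree`. Assumptions: the snippets are parsed without the SBML namespace, as on the slide, and use SBML's `speciesReference`/`modifierSpeciesReference` element names. The helper `reaction_roles` is hypothetical.

```python
import xml.etree.ElementTree as ET

COMPLEX_XML = """
<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants>
    <speciesReference species="PdhA"/>
    <speciesReference species="PdhB"/>
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="Pyruvate_dehydrogenase_E1"/>
  </listOfProducts>
</reaction>
"""

REACTION_XML = """
<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants>
    <speciesReference species="NADP"/>
    <speciesReference species="CoA"/>
    <speciesReference species="pyruvate"/>
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="NADPH"/>
    <speciesReference species="acetyl-CoA"/>
    <speciesReference species="CO2"/>
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesReference species="pyruvate_dehydrogenase_E1"/>
  </listOfModifiers>
</reaction>
"""

ROLES = ("listOfReactants", "listOfProducts", "listOfModifiers")


def reaction_roles(xml_text):
    """Map each SBML role list to the species ids referenced inside it."""
    root = ET.fromstring(xml_text.strip())
    return {role: [ref.get("species") for ref in root.findall(f"./{role}/*")]
            for role in ROLES}


for name, xml_text in [("complex", COMPLEX_XML), ("reaction", REACTION_XML)]:
    print(name, reaction_roles(xml_text))
```

Nothing in the parsed structure says that the first record is a complex assembly and the second a catalyzed biochemical reaction. That semantic gap is what the BioPAX annotation fills.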
88
BioPAX solution metadata
    <sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
          xmlns:owl="http://www.w3.org/2002/07/owl#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <listOfSpecies>
        <species id="PdhA" metaid="PdhA">
          <annotation>
            <bp:protein rdf:ID="PdhA"/>
          </annotation>
        </species>
        <species id="NADP" metaid="NADP">
          <annotation>
            <bp:smallMolecule rdf:ID="NADP"/>
          </annotation>
        </species>
      </listOfSpecies>
      <listOfReactions>
        <reaction id="pyruvate_dehydrogenase_cplx">
          <annotation>
            <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
          </annotation>

89
BioPAX External References
    <species id="pyruvate" metaid="pyruvate">
      <annotation
          xmlns:bp="http://biopax.org/release1/biopax-release1.owl">
        <bp:smallMolecule rdf:ID="pyruvate">
          <bp:Xref>
            <bp:unificationXref rdf:ID="unificationXref119">
              <bp:DB>LIGAND</bp:DB>
              <bp:ID>c00022</bp:ID>
            </bp:unificationXref>
          </bp:Xref>
        </bp:smallMolecule>
      </annotation>
    </species>

90
BioPAX Synonyms
    <species id="pyruvate" metaid="pyruvate">
      <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl">
        <bp:smallMolecule rdf:ID="pyruvate">
          <bp:SYNONYMS>pyroracemic acid</bp:SYNONYMS>
          <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
          <bp:SYNONYMS>alpha-ketopropionic acid</bp:SYNONYMS>
          <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
          <bp:SYNONYMS>2-oxopropanoic acid</bp:SYNONYMS>
          <bp:SYNONYMS>BTS</bp:SYNONYMS>
          <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
        </bp:smallMolecule>
      </annotation>
    </species>
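Because the BioPAX annotation is namespaced RDF/XML embedded inside an ordinary SBML species element, generic XML tooling can pull out cross-references and synonyms without any SBML-specific parser. A minimal sketch with the standard library; the namespace URIs below are illustrative assumptions (the slides spell the BioPAX URI slightly differently from one slide to the next), and `biopax_summary` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release1/biopax-release1.owl#"  # assumed URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"bp": BP, "rdf": RDF}

SPECIES_XML = f"""<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="{BP}" xmlns:rdf="{RDF}">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
      <bp:SYNONYMS>pyroracemic acid</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>"""


def biopax_summary(xml_text):
    """Extract the RDF id, unification xrefs and synonyms of a small molecule."""
    root = ET.fromstring(xml_text)
    mol = root.find("./annotation/bp:smallMolecule", NS)
    xrefs = [(x.findtext("bp:DB", namespaces=NS), x.findtext("bp:ID", namespaces=NS))
             for x in mol.findall("./bp:Xref/bp:unificationXref", NS)]
    synonyms = [s.text for s in mol.findall("bp:SYNONYMS", NS)]
    # Namespaced attributes are keyed as "{namespace-uri}localname".
    return {"rdf_id": mol.get(f"{{{RDF}}}ID"), "xrefs": xrefs, "synonyms": synonyms}


print(biopax_summary(SPECIES_XML))
```

The unification xref (LIGAND c00022) is what lets two databases agree that their "pyruvate" records are the same molecule, while the synonym list catches names that a plain string match would miss.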
