Integrating Ontological Prior Knowledge into Relational Learning for Protein Function Prediction

1
Integrating Ontological Prior Knowledge into
Relational Learning for Protein Function
Prediction
Stefan Reckow, Max Planck Institute of Psychiatry
Volker Tresp, Siemens, Corporate Technology
2
Proteins and Protein Ontologies
3
Proteins and Protein Functions
  • Motivation
    • proteins are the molecular machines of every organism
    • understanding protein function is essential for
      all areas of the biosciences
    • diverse sources of knowledge about proteins exist
  • Challenges
    • experimental determination of functions is difficult
      and expensive
    • homologies can be misleading
    • most proteins have several functions

4
Protein function prediction
What function does this protein have?
[Figure: function hierarchy, specificity increasing from top to bottom]
catalytic activity (catalyzes a reaction)
→ isomerase activity
→ intramolecular oxidoreductase activity
→ intramolecular oxidoreductase activity, interconverting aldoses and ketoses
→ triose-phosphate isomerase activity (catalyzes a very specific reaction)
5
Function Ontologies
  • ontologies are a way of bringing order into the
    functions of proteins
  • an ontology is a description of the concepts of a
    domain and their relationships
  • hierarchical representation (subclass relationships)
    • tree
    • directed acyclic graph
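As a minimal sketch (hypothetical helper names, Python's standard `graphlib`), the subclass relation can be stored as a child-to-parents mapping. A DAG allows a concept to have several parents, a tree at most one, and the depth of a concept is a simple proxy for its specificity. The toy fragment reuses the chain of terms from the function prediction example.

```python
from graphlib import CycleError, TopologicalSorter

# Toy ontology fragment: child -> set of direct parents (subclass links).
PARENTS = {
    "catalytic activity": set(),
    "isomerase activity": {"catalytic activity"},
    "intramolecular oxidoreductase activity": {"isomerase activity"},
    "intramolecular oxidoreductase activity, interconverting aldoses and ketoses": {
        "intramolecular oxidoreductase activity"
    },
    "triose-phosphate isomerase activity": {
        "intramolecular oxidoreductase activity, interconverting aldoses and ketoses"
    },
}


def is_dag(parents):
    """True if the subclass links contain no cycle."""
    try:
        # TopologicalSorter expects node -> predecessors; parents play that role.
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def is_tree(parents):
    """Acyclic and no concept has more than one parent (a tree or forest)."""
    return is_dag(parents) and all(len(p) <= 1 for p in parents.values())


def depth(concept, parents):
    """Longest path from a root; larger means more specific (assumes a DAG)."""
    ps = parents.get(concept, set())
    return 0 if not ps else 1 + max(depth(p, parents) for p in ps)


print(depth("triose-phosphate isomerase activity", PARENTS))  # 4
```

Storing parents rather than children makes upward traversal, the direction needed when reasoning about superconcepts, a direct lookup.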

6
Complex Ontology
  • complex: a structure formed by a group of two or
    more proteins to perform certain functions
    in concert

7
Ontologies as a Great Source of Prior Knowledge in
Machine Learning
  • A considerable amount of community effort is
    invested in designing ontologies
  • Typically this prior knowledge is deterministic
    (logical constraints)
  • Machine Learning should be able to exploit this
    knowledge
  • Protein interactions are important information
    for predicting function → statistical
    relational learning

8
Statistical Relational Learning with the IHRM
9
Statistical Relational Learning (SRL)
  • SRL generalizes standard Machine Learning to
    domains where relations between entities (and not
    just entity attributes) play a significant role
  • Examples: PRM, DAPER, MLN, RMN, RDN
  • The IHRM is an easily applicable general model,
    performs a cluster analysis of relational domains
    and requires no structural learning
  • Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel.
    Infinite hidden relational models. In Proc. 22nd
    UAI, 2006
  • C. Kemp, J. B. Tenenbaum, T. L. Griffiths,
    T. Yamada, and N. Ueda. Learning systems of
    concepts with an infinite relational model. In
    Proc. AAAI, 2006

10
Standard Latent Model for Protein Mixture Models
[Figure: latent model with nodes Protein1 and Protein2]
  • In a Bayesian approach, we can permit an infinite
    number of states in the latent variables, which
    yields a Dirichlet Process Mixture Model (DPM)
  • Advantage: the model only uses a finite number of
    those states, so no time-consuming structural
    optimization is required
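The DPM idea can be illustrated with the Chinese restaurant process (CRP), the distribution over cluster assignments that a Dirichlet process induces. This is a generic sketch with a hypothetical function name and concentration parameter `alpha`; it is not the IHRM's inference procedure, which the slides do not spell out. An unbounded number of clusters is allowed, yet only finitely many are ever occupied.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Sample cluster assignments for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items already in cluster k
    assignments = []  # assignments[i] = cluster index of item i
    for _ in range(n_items):
        # Unnormalized weights: existing clusters first, then one "new cluster" slot.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = sample_crp(n_items=1243, alpha=1.0)
print(len(counts), "occupied clusters for", len(assignments), "items")
```

With `alpha = 1.0` and 1243 items, the expected number of occupied clusters is the harmonic sum 1 + 1/2 + ... + 1/1243, roughly 7.7, so the number of clusters follows from the data and the prior instead of being searched over.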

11
Infinite Hidden Relational Model (IHRM)
  • Permits us to include protein-protein
    interactions into the model

[Figure: Protein1, Protein2 and Protein3, pairwise connected by "interact" relations]
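In IHRM-style models (Xu et al., 2006), a relation between two entities depends on the latent states of both endpoints. A minimal sketch of that relational term only, assuming a symmetric binary interaction relation, a Bernoulli link probability `theta[a][b]` per cluster pair, and that unobserved pairs count as non-interactions (a closed-world assumption). The function name and data layout are hypothetical; the attribute terms and Dirichlet process priors of the full model are omitted.

```python
import math


def interaction_log_likelihood(z, theta, interactions, n_proteins):
    """Log-likelihood of a symmetric binary interaction graph given clusters.

    z[i]         -- cluster index of protein i
    theta[a][b]  -- probability that a protein in cluster a interacts with
                    a protein in cluster b (assumed symmetric, 0 < theta < 1)
    interactions -- set of frozenset({i, j}) pairs observed to interact;
                    every other pair is treated as a non-interaction
    """
    ll = 0.0
    for i in range(n_proteins):
        for j in range(i + 1, n_proteins):
            p = theta[z[i]][z[j]]
            if frozenset((i, j)) in interactions:
                ll += math.log(p)
            else:
                ll += math.log(1.0 - p)
    return ll


theta = [[0.9, 0.1], [0.1, 0.5]]
observed = {frozenset((0, 1))}  # only proteins 0 and 1 interact
good = interaction_log_likelihood([0, 0, 1], theta, observed, 3)
bad = interaction_log_likelihood([0, 1, 1], theta, observed, 3)
print(good > bad)  # True: putting the interacting pair in one cluster fits better
```

Assignments that place interacting proteins in a cluster pair with high link probability score higher, which is how interactions can shape the clustering.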
12
Ground Network
[Figure: ground network; latent variables Z1, Z2, Z3, each with function, motif and complex attributes, linked by interact relations]
13
Experimental Results
  • KDD Cup 2001
    • yeast genome data
    • 1243 genes/proteins: 862 (training) / 381 (test)
  • Attributes (number of possible values; values per gene)
    • Chromosome
    • Motif (351; 1-6): a gene might contain one or
      more characteristic motifs (information about the
      amino acid sequence of the protein)
    • Essential
    • Structural class (24; 1-2): the protein coded by
      the gene might belong to one or more structural
      categories
    • Phenotype (11; 1-6): observed phenotypes in the
      organism
    • Interaction
    • Complex (56; 1-3): the protein expressed by the
      gene can form a complex with others to build a
      larger protein structure
    • Function (14; 1-4): cell growth, cell
      organization, transport, ...
  • genes were anonymous

14
Results
Comparison with Supervised Models
[Figure: ROC curves and accuracy per model]
15
IHRM Result
[Figure: node = gene, link = interaction, color = cluster]
16
Integrating Ontological Prior Knowledge into the
IHRM
17
Integration of ontologies
Deductive closure
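Under subclass relations, deductive closure means that a protein annotated with a concept is also annotated with every superconcept of it. A minimal sketch of that upward propagation, with a hypothetical function name; the two-level hierarchy is an illustrative assumption built from concept names on the slides, not the actual structure of the MIPS complex catalogue.

```python
def deductive_closure(annotations, parents):
    """Return the annotations plus all of their ancestors in the ontology.

    parents maps each concept to the set of its direct superconcepts.
    """
    closed = set()
    stack = list(annotations)
    while stack:
        concept = stack.pop()
        if concept not in closed:
            closed.add(concept)
            # Every superconcept of an annotated concept also applies.
            stack.extend(parents.get(concept, ()))
    return closed


# Illustrative (assumed) hierarchy, not the MIPS complex catalogue.
PARENTS = {
    "actin filaments": {"cytoskeleton"},
    "microtubules": {"cytoskeleton"},
}

print(sorted(deductive_closure({"actin filaments"}, PARENTS)))
# ['actin filaments', 'cytoskeleton']
```

Applied to every protein before learning, this propagation turns one specific annotation into one annotation per level on the path to the root, so the label sets become consistent with the ontology without changing the learner itself.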
18
Integration of ontologies
[Figure: latent variable Zi with function, motif and complex attributes; complex concepts split into independent and dependent concepts; example concepts: cytoskeleton, translocon, actin filaments, microtubules, signal peptidase]
19
Experiments Including Complex Ontology
  • Data collected from the CYGD of MIPS
  • 1000 genes/proteins: 800 (training) / 200 (test)
  • Attributes
    • chromosome, motif, essential, structural class,
      phenotype, interaction, complex, function
    • interactions from DIP
  • Use of ontological knowledge on complexes
    • five hierarchy levels
    • in our model: 258 nodes (concepts) with 66
      top-level categories
    • every protein has at least one complex annotation
    • after including the ontological constraints: about
      three annotations per protein on average

20
Results
AUC              800 (training) / 200 (test)   200 (training) / 200 (test)
w/o ontology     0.895                         0.832
with ontology    0.928                         0.894
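The AUC values above are areas under the ROC curve. As a sketch (hypothetical function name), the AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The slides do not say how AUCs were aggregated over classes, so this shows only the single-class computation.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as one half. O(P * N), fine for small test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise form makes the meaning of the numbers concrete: an AUC of 0.928 means about 93% of positive/negative pairs are ordered correctly by the predicted scores.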
21
Results
explicit modeling of dependencies
22
Results
  • Grey: proteins in the test set
  • Proteins concerned with secretion and transport
    • The "Golgi apparatus" works together with the
      "endoplasmic reticulum (ER)" as the transport
      and delivery system of the cell.
    • "SNARE" proteins help to direct material to the
      correct destination
    • The test proteins are also related to "cellular
      transport"
  • Proteins acting in cell division
    • control proteins
    • "Septins": septins have several roles throughout
      the cell cycle and carry out essential functions
      in cytokinesis
    • The three highlighted proteins fit into this
      cluster ("cell fate" and "cell type
      differentiation")

23
Results
sampling convergence
24
Results
Distribution of proteins in the clusters
25
Results
  • Grey: former singletons
  • Cellular transport cluster
    • The former singleton "Clathrin light chain", a
      major constituent of coated vesicles (a component
      for transport), fits into this cluster quite well
  • Tasks occurring during DNA replication
    • The former singleton "DNA polymerase", a main
      actor in replication, is clearly assigned to the
      correct cluster here

26
Conclusion
  • application of the IHRM to function prediction
    • competitive with supervised learning methods
    • insights into the solution
  • advantages of integrating ontological knowledge
    • improvement of the clustering structure
    • robustness: stable results under varying
      parameterization
    • deductive closure prior to learning is a general
      and powerful principle
  • future challenges
    • use of several or more complex ontologies
    • further analysis of dependent vs. independent
      concepts
  • Acknowledgements
    • Karsten Borgwardt (MPIs Tübingen)
    • Hans-Peter Kriegel (LMU)