Formal Structuring of Genomic Knowledge

1
Formal Structuring of Genomic Knowledge
  • Nigam Shah
  • Postdoctoral Fellow, SMI
  • nigam_at_stanford.edu

2
The Understanding cycle
Evaluate for consistency with known information
Formulate hypothesis
Identify conflicts and suggest corrections
Get best possible match with data
Store validated hypotheses
Design experiment to test hypothesis
HyBrow assists in the tasks bound by the red outline.
3
Walking along this cycle is hard
  • The way much of biology works is by applying
    prior knowledge (what is known) for
    interpreting datasets rather than the application
    of a set of axioms that will elicit knowledge.
    (Stevens et al., 2000)
  • We need to explicitly articulate what is known;
    that is a problem with the current information
    overload.
  • If we explicitly articulate what is known, in
    an organizing framework, it serves as a reference
    for integrating new data with prior knowledge
    and increases our ability to fit the results into
    the big picture.

4
How can we make it easier?
  • If we design a framework for making statements,
    or sets of statements comprising a hypothesis,
    about biological processes, and systematically
    examine a wide variety of datasets to evaluate
    them, we can speed up the understanding cycle.

5
Events and Implicit claims
  • A hypothesis is a statement about relationships
    (among objects) within a biological system.
  • Example: Protein P induces transcription of gene X.
  • An event is a relationship between two
    biological entities, which we call agents.

[Figure: protein P and the promoter of gene X]
  • Implicit claims that can be tested:
  • P is a transcription factor.
  • P is a transcriptional activator.
  • P is localized to the nucleus.
  • P can bind to the promoter of gene X.
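
The expansion of an event into its implicit claims can be sketched in a few lines of Python. This is only an illustration, not HyBrow's implementation: the IMPLICIT_CLAIMS table, the Event class and the implicit_claims function are hypothetical names, and the single table entry simply restates the example on this slide.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lookup: which testable claims a relationship implies.
# The single entry restates the slide's example; {a} and {b} are the agents.
IMPLICIT_CLAIMS = {
    "induces transcription of": [
        "{a} is a transcription factor",
        "{a} is a transcriptional activator",
        "{a} is localized to the nucleus",
        "{a} can bind to the promoter of {b}",
    ],
}


@dataclass(frozen=True)
class Event:
    agent_a: str   # first agent, e.g. a protein
    relation: str  # the relationship between the two agents
    agent_b: str   # second agent, e.g. a gene


def implicit_claims(event: Event) -> List[str]:
    """Expand an event into the implicit claims that can be tested."""
    templates = IMPLICIT_CLAIMS.get(event.relation, [])
    return [t.format(a=event.agent_a, b=event.agent_b) for t in templates]


event = Event("P", "induces transcription of", "gene X")
claims = implicit_claims(event)
for claim in claims:
    print(claim)
```

Each generated claim is a separate statement that could be checked against a different kind of data.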

6
Components of a formal representation
Formal representation
  • Conceptual framework
  • Domain knowledge model (Ontology)
  • Establish a correspondence between the conceptual
    framework and the ontology
Knowledgebase
  • Domain information and knowledge structured into
    the knowledge model
Database
  • Data generated by researchers. Not always
    accessible or available in a Model Organism
    Database (except sequence and microarray data).
  • Curated data and information. A large amount of
    information is created and stored by model
    organism databases.
7
The conceptual framework
Event → Subject.Verb.Object
Event → Subject.Verb.Object.Context
Event → Subject.Verb.Object.Context.AssocCond
Subject → (Actor | Context | Event)
Verb → (Physical | Biochemical | Logical)
Object → (Actor | Context | Event)
Actor → (Gene | Protein | Complex | ...)
Context → (Physical | Genetic | Temporal)
AssocCond → (Presence of | Absence of).Agent
  • The terminal symbols, which cannot be further
    decomposed in the grammar, are supplied by the
    hypothesis ontology.
  • This grammar, together with the hypothesis
    ontology, allows us to represent hypotheses in a
    formal language.

We have specified methods to evaluate formal
language hypotheses for internal consistency and
agreement with existing knowledge.
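
As a rough sketch of how the grammar and the ontology work together, the Python below checks that every terminal in an event is a term the ontology supplies. The ONTOLOGY contents, the ASSOC_KINDS set and the check_event function are assumptions made for illustration. For brevity the sketch lets Subject and Object be Actors only, although the grammar also allows a Context or a nested Event.

```python
from typing import List, Optional, Tuple

# Toy hypothesis ontology supplying the terminal symbols.
# All terms below are illustrative assumptions, not HyBrow's ontology.
ONTOLOGY = {
    "Actor": {"Gal4p", "Gal80p", "GAL1"},
    "Verb": {"binds_to_promoter", "induces", "represses"},
    "Context": {"nucleus", "cytoplasm", "wild_type"},
}
ASSOC_KINDS = {"presence_of", "absence_of"}


def check_event(
    subject: str,
    verb: str,
    obj: str,
    context: Optional[str] = None,
    assoc_cond: Optional[Tuple[str, str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the event is well formed."""
    problems = []
    if subject not in ONTOLOGY["Actor"]:
        problems.append(f"unknown subject: {subject}")
    if verb not in ONTOLOGY["Verb"]:
        problems.append(f"unknown verb: {verb}")
    if obj not in ONTOLOGY["Actor"]:
        problems.append(f"unknown object: {obj}")
    if context is not None and context not in ONTOLOGY["Context"]:
        problems.append(f"unknown context: {context}")
    if assoc_cond is not None:
        kind, agent = assoc_cond  # (Presence of | Absence of).Agent
        if kind not in ASSOC_KINDS:
            problems.append(f"unknown condition: {kind}")
        if agent not in ONTOLOGY["Actor"]:
            problems.append(f"unknown agent: {agent}")
    return problems


print(check_event("Gal4p", "binds_to_promoter", "GAL1", "nucleus"))
print(check_event("Gal4p", "phosphorylates", "GAL1"))
```

The optional arguments mirror the three Event productions: Subject.Verb.Object alone, with a Context, and with a Context plus an associated condition.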
8
The conceptual framework
  • Consistency of a hypothesis with prior knowledge
    is evaluated by applying constraints and rules.
  • A constraint is a statement specifying the
    evidence that contradicts or supports an event.
  • Example: a protein must be in the nucleus to bind
    to a promoter.
  • A rule comprises the steps for deciding whether
    a constraint is satisfied or violated.

Binds_to_promoter(P, g)
Annotation constraints:
  • If the cellular location of P is not nucleus, give a penalty.
  • If the biological process is not transcription, give a penalty.
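
The two annotation constraints can be written as a small scoring function. This is a minimal sketch under stated assumptions: the ANNOTATIONS table, the PENALTY value of 1 and the function name are hypothetical, and a protein with no annotation is treated as failing both checks.

```python
# Toy annotation table keyed by protein name (hypothetical data).
ANNOTATIONS = {
    "P": {"cellular_location": "nucleus", "biological_process": "transcription"},
    "Q": {"cellular_location": "cytoplasm", "biological_process": "transport"},
}

PENALTY = 1  # assumed penalty per violated constraint


def binds_to_promoter_penalty(protein: str, gene: str) -> int:
    """Total penalty for the event Binds_to_promoter(protein, gene).

    Only the annotation constraints from the slide are checked, and both
    concern the protein, so `gene` is accepted but not inspected here.
    """
    annotation = ANNOTATIONS.get(protein, {})
    penalty = 0
    # Constraint 1: the protein must be located in the nucleus.
    if annotation.get("cellular_location") != "nucleus":
        penalty += PENALTY
    # Constraint 2: the protein's biological process must be transcription.
    if annotation.get("biological_process") != "transcription":
        penalty += PENALTY
    return penalty


print(binds_to_promoter_penalty("P", "g"))
print(binds_to_promoter_penalty("Q", "g"))
```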
10
Hypothesis Ontology
  • Expressive enough to describe the galactose
    system at a coarse level of detail.
  • It is compatible with other ontology efforts,
    e.g. the Gene Ontology (GO), so that GO
    annotations can be used directly in HyBrow.
  • We have also developed a grammar to write
    hypotheses using events from this ontology.

11
Grammar for a hypothesis
  • A hypothesis consists of at least one event
    stream.
  • An event stream is a sequence of one or more
    events or event streams with logical joints (or
    operators) between them.
  • An event has exactly one agent_a, exactly one
    agent_b and exactly one operator (i.e. a
    relationship between the two agents). It also has
    a physical location that denotes where the event
    happened, the genetic context of the organism and
    associated experimental perturbations when the
    event happened.
  • A logical joint is the conjunction between two
    event streams.
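
The nesting described above (events inside event streams inside a hypothesis) maps naturally onto a recursive data structure. The sketch below is one possible encoding, not HyBrow's own; the class names, field names and example agents are assumptions, and the only rules enforced are the cardinalities stated on this slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    agent_a: str                 # exactly one agent_a
    operator: str                # exactly one relationship between the agents
    agent_b: str                 # exactly one agent_b
    location: str = ""           # where the event happened
    genetic_context: str = ""    # genetic context of the organism
    perturbations: List[str] = field(default_factory=list)


@dataclass
class EventStream:
    items: List[Union[Event, "EventStream"]]  # events or nested streams
    joints: List[str]                         # logical joints between items

    def __post_init__(self) -> None:
        if not self.items:
            raise ValueError("an event stream needs at least one item")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need one logical joint between consecutive items")


@dataclass
class Hypothesis:
    streams: List[EventStream]

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


def count_events(node: Union[Event, EventStream, Hypothesis]) -> int:
    """Count the events in a hypothesis, descending into nested streams."""
    if isinstance(node, Event):
        return 1
    children = node.items if isinstance(node, EventStream) else node.streams
    return sum(count_events(child) for child in children)


binds = Event("Gal4p", "binds_to_promoter", "GAL1", location="nucleus")
induces = Event("Gal4p", "induces", "GAL1", location="nucleus")
hypothesis = Hypothesis([EventStream([binds, induces], joints=["and"])])
print(count_events(hypothesis))
```

Checking the cardinalities in the constructors means a malformed hypothesis is rejected before any data are consulted.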
13
Constraints
  • A constraint is a statement specifying the
    evidence that supports or contradicts an event.
  • Types of constraints: Ontology, Data, Existence,
    Temporal.
  • Example event: X binds to promoter of Y.
  • Ontology constraints: X must be a protein or
    complex; Y must be a gene.
  • Data constraints: X must be annotated as
    localized to the nucleus; the promoter of Y must
    have a binding site for X.
  • Existence constraint: the gene for X must be
    present.
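
The constraints for the example event can be evaluated against a small knowledge base and tallied as support or conflict. This is a hedged sketch: the KB layout, its contents and the check_binds_to_promoter function are invented for illustration, and the Temporal constraint type is left out because the slide gives no example of one.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy knowledge base (hypothetical layout and contents).
KB = {
    "types": {"X": "protein", "Y": "gene"},
    "localization": {"X": "nucleus"},
    "binding_sites": {"Y": {"X"}},  # gene -> factors with a site in its promoter
    "genes_present": {"X", "Y"},
}

Result = Tuple[str, str, str]  # (constraint type, outcome, explanation)


def check_binds_to_promoter(x: str, y: str, kb: Dict) -> List[Result]:
    """Evaluate the constraints for the event 'x binds to promoter of y'."""
    results: List[Result] = []

    def record(kind: str, satisfied: bool, text: str) -> None:
        results.append((kind, "support" if satisfied else "conflict", text))

    record("ontology", kb["types"].get(x) in {"protein", "complex"},
           f"{x} must be a protein or complex")
    record("ontology", kb["types"].get(y) == "gene",
           f"{y} must be a gene")
    record("data", kb["localization"].get(x) == "nucleus",
           f"{x} must be annotated as localized to the nucleus")
    record("data", x in kb["binding_sites"].get(y, set()),
           f"the promoter of {y} must have a binding site for {x}")
    record("existence", x in kb["genes_present"],
           f"the gene for {x} must be present")
    return results


results = check_binds_to_promoter("X", "Y", KB)
tally = Counter(outcome for _, outcome, _ in results)
print(tally["support"], tally["conflict"])
```

Keeping the explanation text with each outcome makes it possible to report why an event was supported or contradicted, not just how often.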

14
Rules
A rule decides whether a constraint is satisfied
or violated.
The first layer of rules enforces the constraints
to decide support or conflict based on the data
we have.
A second layer of rules checks the logical
structure of the hypothesis.
16
The knowledgebase
[Figure: the HyBrow KB, shown with proteomics, microarray, sequence and literature data]
17
User interfaces
Hypothesis described in Natural Language
Biological process described in a formal language
18
Evaluating a hypothesis
20
Screen shot of the output
  • A list of events in the submitted hypothesis
  • A plot of the counts of support and conflicts
  • An explanation for each support / conflict with a
    link to the data source
21
HyBrow take-home
  • The minimum requirements for a formal
    representation:
  • Ability to represent data → information →
    knowledge
  • A language to express your thought experiment
    (your model, hypothesis, theory, theorem, etc.)
  • A reasoning framework to evaluate the outcome,
    validity or accuracy of your thought experiment
  • We should not aim to use all the data and come up
    with ONE model that explains everything.
  • It is much better to propose a model and examine
    whether your data support or contradict it.

22
A clinical example
  • Autism is a developmental disability
    characterized by severe and pervasive impairment
    in several areas of development.
  • Nutrigenomics is attracting a lot of attention in
    autism treatment.
  • DAN! (Defeat Autism Now!) researchers sometimes
    refer to this as biomedical treatment.
  • Tests for deciding the optimal nutrigenomics
    therapy are costly and hard to interpret.

23
Excerpt from a parent's email
  • right now, that is a manual process to relate
    the genetic (mutation info...) and any microbial
    inputs to a biochemical pathway diagram and
    relate the mutations to specific supplement or
    enzyme therapies. It costs > 1000 and 6-8 months
    for someone to manually interpret the results.
  • I was wondering if it would be helpful to develop
    a model to contain the static/known information
    and some dynamic models to help answer some
    interesting questions relevant to the person's
    data.
  • This might make it possible to develop tools for
    a physician or motivated individual to use
    nutrigenomic information.

24
Credits and acknowledgements
  • Stephen Racunas
  • Co-developer of HyBrow
  • Funding
  • NIH

25
Organon
  • An Organon is an instrument for the proper conduct
    and representation of scientific research.
  • The first Organon was written by the Ancient
    Greek philosopher Aristotle in the 4th century
    B.C., and included his works on logic and the
    theory of science. [1]
  • The second great Organon, the Novum Organum
    (1620) of Francis Bacon, was written as an update,
    extension and correction of the Aristotelian
    Organon in light of the success and experimental
    methods of post-Galilean modern natural science
    almost 2000 years later. [2]
  • [1] The works known as Aristotle's Organon can be
    found in The Complete Works of Aristotle, Two
    Volumes (Jonathan Barnes, ed.). Princeton:
    Princeton University Press, 1984.
  • [2] Bacon, F. Novum Organum (Urbach, P. and
    Gibson, J., transl. and eds.). Chicago: Open
    Court, 1994.