Introduction to Ontology

1
Introduction to Ontology
  • Barry Smith
  • http://ontology.buffalo.edu/smith

2
Who am I?
  • NCBO National Center for Biomedical Ontology
    (NIH Roadmap Center)
  • Stanford Medical Informatics
  • University of San Francisco Medical Center
  • Berkeley Drosophila Genome Project
  • Cambridge University Department of Genetics
  • The Mayo Clinic
  • University at Buffalo Department of Philosophy

3
Who am I?
  • NYS Center of Excellence in Bioinformatics and
    Life Sciences Ontology Research Group
  • Buffalo Clinical and Translational Science
    Institute (CTSI)

4
Who am I?
  • Cleveland Clinic Semantic Database
  • Gene Ontology
  • Ontology for Biomedical Investigations
  • Open Biomedical Ontologies Consortium
  • Institute for Formal Ontology and Medical
    Information Science
  • BIRN Ontology Task Force
  • ...

6
natural language labels
to make the data cognitively accessible to human
beings
and algorithmically tractable to computers
7
compare legends for maps
8
compare legends for maps
common legends allow (cross-border) integration
9
ontologies are legends for data
10
legends
  • help human beings use and understand complex
    representations of reality
  • help human beings create useful complex
    representations of reality
  • help computers process complex representations
    of reality
  • help glue data together

11
annotations using common ontologies can yield
integration of image data
12
computationally tractable legends
  • help human beings find things in very large
    complex representations of reality

13
where in the body? where in the cell?
what kind of organism?
what kind of disease process?
14
Creating broad-coverage semantic annotation
systems for biomedicine
to yield
  • distributed accessibility of the data to
    humans
  • reasoning with the data
  • cumulation for purposes of research
  • incrementality and evolvability
  • integration with clinical data
17
The Gene Ontology
18
The Gene Ontology
21
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
22
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
23
Multiple kinds of data in multiple kinds of silos
  • Lab / pathology data
  • Electronic Health Record data
  • Clinical trial data
  • Patient histories
  • Medical imaging
  • Microarray data
  • Protein chip data
  • Flow cytometry
  • Mass spec
  • Genotype / SNP data

24
How to find your data?
  • How to find other people's data?
  • How to reason with data when you find it?
  • How to work out what data does not yet exist?

25
Multiple kinds of standardization for data
  • Terminologies (SNOMED, UMLS)
  • CDEs (Clinical research)
  • Information Exchange Standards (HL7 RIM)
  • LIMS (LOINC)
  • MGED standards for microarray data, etc.

26
how to solve the problem of making such data
queryable and re-usable by others to address NIH
mandates?
part of the solution must involve standardized
terminologies and coding schemes
27
most successful thus far: UMLS
  • collection of separate terminologies built by
    trained experts
  • massively useful for information retrieval and
    information integration
  • UMLS Metathesaurus: a system of post hoc mappings
    between overlapping source vocabularies

28
for UMLS
  • local usage respected
  • regimentation frowned upon
  • cross-framework consistency not important
  • no concern to establish consistency with basic
    science
  • different grades of formal rigor, different
    degrees of completeness, different update policies

29
caBIG approach: BRIDG (top-down imposition)
31
A new approach
for science
  • where do you find scientifically validated
    information linking gene products and other
    entities represented in biochemical databases to
    semantically meaningful terms pertaining to
    disease, anatomy, development in different model
    organisms?
32
  • caBIG
  • BRIDG

33
SNOMED
  • Ultimately as data become attached to the
    samples (e.g., pathology data, genotypes) these
    will be linked to the patient records.

34
where in the body? where in the cell?
what kind of organism?
what kind of disease process?
35
ontologies: high-quality controlled structured
vocabularies for the annotation (description) of
data
36
compare legends for diagrams
37
or chemistry diagrams
legends for chemistry diagrams
Prasanna et al., Chemical Compound Navigator: A
Web-Based Chem-BLAST, Chemical Taxonomy-Based
Search Engine for Browsing Compounds. PROTEINS:
Structure, Function, and Bioinformatics
63:907-917 (2006)
38
Ramirez et al., Linking of Digital Images to
Phylogenetic Data Matrices Using a Morphological
Ontology. Syst. Biol. 56(2):283-294, 2007
39
The Network Effects of Synchronization
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
40
Five bangs for your GO buck
  • based in biological science
  • incremental approach (evidence-based evolutionary
    pathway)
  • cross-species data comparability (human, mouse,
    yeast, fly ...)
  • cross-granularity data integration (molecule,
    cell, organ, organism)
  • cumulation of scientific knowledge in
    algorithmically tractable form, links people to
    software

41
The methodology of annotations
  • Model organism databases employ scientific
    curators who use the experimental observations
    reported in the biomedical literature to
    associate GO terms with entries in gene product
    and other molecular biology databases
  • (4 mill. p.a. NIH funding)
42
How to extend the GO methodology to other domains
of clinical and translational medicine?

43
the problem
  • existing clinical vocabularies are of variable
    quality and low mutual consistency
  • current proliferation of tiny ontologies by
    different groups with urgent annotation needs
44
the solution
  • establish common rules governing best practices
    for creating ontologies in coordinated fashion,
    with an evidence-based pathway to incremental
    improvement

45
How to build an ontology
  • work with scientists to create an initial
    top-level classification
  • find the 50 most commonly used terms corresponding
    to types in reality
  • arrange these terms into an informal is_a
    hierarchy according to the universality principle
  • A is_a B: every instance of A is an instance of
    B
  • fill in missing terms to give a complete
    hierarchy
  • (leave it to domain scientists to populate the
    lower levels of the hierarchy)
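The universality principle can be checked mechanically once some instance data is available. The following is a minimal sketch, not part of any OBO tool: the term names, the instance identifiers, and the helpers `check_is_a` and `invalid_edges` are all hypothetical and chosen only for illustration.

```python
# Hypothetical asserted hierarchy: child term -> parent term.
IS_A = {
    "osteoblast": "bone cell",
    "bone cell": "cell",
    "neuron": "cell",
}

# Hypothetical instance data: term -> individuals known to instantiate it.
INSTANCES = {
    "cell": {"c1", "c2", "c3"},
    "bone cell": {"c1", "c2"},
    "osteoblast": {"c1"},
    "neuron": {"c3"},
}


def check_is_a(child, parent, instances=INSTANCES):
    """True if every known instance of `child` is also an instance of `parent`."""
    return instances[child] <= instances[parent]


def invalid_edges(is_a=IS_A, instances=INSTANCES):
    """Return asserted (child, parent) pairs that break the universality principle."""
    return [(c, p) for c, p in is_a.items() if not check_is_a(c, p, instances)]


print(invalid_edges())                   # []
print(check_is_a("cell", "osteoblast"))  # False
```

An empty result means every asserted is_a link is consistent with the instance data at hand; it does not prove the link holds for instances not yet recorded.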

46
First step (2003)
  • a shared portal for (so far) 58 ontologies
  • (low regimentation)
  • http://obo.sourceforge.net → NCBO BioPortal

48
OBO now the principal entry point for creation of
web-accessible biomedical data
  • OBO and OBOEdit: low-tech to encourage users
  • Simple (web-service-based) tools created to
    support the work of biologists in creating
    annotations (data entry)
  • OBO → OWL DL converters make OBO Foundry
    annotated data immediately accessible to Semantic
    Web data integration projects

49
Second step (2004): reform efforts initiated,
e.g. linking GO formally to other ontologies and
data sources
GO
Cell type
New Definition
Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest
cell acquires the specialized features of an
osteoblast, a bone-forming cell which secretes
extracellular matrix.
50
Third step (2006)
The OBO Foundry
http://obofoundry.org/
52
Building out from the original GO
53
initial OBO Foundry coverage
54
  • Continuants (aka endurants)
      • have continuous existence in time
      • preserve their identity through change
      • exist in toto whenever they exist at all
  • Occurrents (aka processes)
      • have temporal parts
      • unfold themselves in successive phases
      • exist only in their phases

55
You are a continuant
  • Your life is an occurrent
  • You are 3-dimensional
  • Your life is 4-dimensional

56
Dependent entities
  • require independent continuants as their bearers
  • There is no run without a runner
  • There is no grin without a cat

57
Dependent vs. independent continuants
  • Independent continuants (organisms, buildings,
    environments)
  • Dependent continuants (quality, shape, role,
    propensity, function, status, power, right)

58
All occurrents are dependent entities
  • They are dependent on those independent
    continuants which are their participants (agents,
    patients, media ...)

59
BFO Top-Level Ontology
Continuant
Occurrent (always dependent on one or more
independent continuants)
Independent Continuant
Dependent Continuant
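One way to picture this top level is as a small class hierarchy. The sketch below is only an illustration in Python, not BFO's formal definition: the attribute names `bearer` and `participants`, the validation rules, and the example individuals are assumptions chosen to mirror the preceding slides.

```python
from dataclasses import dataclass


class Continuant:
    """Exists in full at every time at which it exists."""


@dataclass
class IndependentContinuant(Continuant):
    name: str


@dataclass
class DependentContinuant(Continuant):
    name: str
    bearer: IndependentContinuant  # e.g. a grin requires a cat

    def __post_init__(self):
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError("a dependent continuant needs an independent bearer")


@dataclass
class Occurrent:
    name: str
    participants: list  # the independent continuants it depends on

    def __post_init__(self):
        if not self.participants or not all(
            isinstance(p, IndependentContinuant) for p in self.participants
        ):
            raise ValueError("an occurrent needs independent continuants as participants")


cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", bearer=cat)
runner = IndependentContinuant("runner")
run = Occurrent("run", participants=[runner])
```

Constructing `Occurrent("run", participants=[])` raises a `ValueError` in this sketch, which is the code-level counterpart of "there is no run without a runner".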
60
A representation of top-level types
Continuant
Occurrent
biological process
Independent Continuant
Dependent Continuant
cell component
molecular function
61
Top-Level Ontology
Continuant
Occurrent
Independent Continuant
Dependent Continuant
Side-Effect, Stochastic Process, ...
Functioning
Function
62
Top-Level Ontology
Continuant
Occurrent
Independent Continuant
Dependent Continuant
Functioning
Side-Effect, Stochastic Process, ...
Function
63
Top-Level Ontology
instances (in space and time)
64
CRITERIA
  • The ontology is open and available to be used by
    all.
  • The ontology is in, or can be instantiated in, a
    common formal language.
  • The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.

65
CRITERIA
  • UPDATE: The developers of each ontology commit to
    its maintenance in light of scientific advance,
    and to soliciting community feedback for its
    improvement.
  • ORTHOGONALITY: They commit to working with other
    Foundry members to ensure that, for any
    particular domain, there is community convergence
    on a single controlled vocabulary.
66
ORTHOGONALITY
  • communities must work together to ensure
    consistency → orthogonality → modular development
    plus additivity of annotations
  • if we annotate a database or body of literature
    with one OBO Foundry ontology, we should be able
    to add annotations from a second such ontology
    without conflicts
  • ontologies do not need to create tiny theories of
    anatomy or chemistry within themselves
67
CRITERIA
  • IDENTIFIERS: The ontology possesses a unique
    identifier space within OBO.
  • VERSIONING: The ontology provider has procedures
    for identifying distinct successive versions.
  • The ontology includes textual definitions for all
    terms.
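A unique identifier space is what allows annotations drawn from different ontologies to be combined and later separated again. The sketch below assumes identifiers written as `PREFIX:local_id`; the record name, the `CL:0000000` value (used here purely as a placeholder), and the helpers `id_space`, `merge_annotations` and `split_by_space` are hypothetical.

```python
from collections import defaultdict


def id_space(term_id):
    """Return the identifier-space prefix of a 'PREFIX:local_id' term id."""
    prefix, sep, local = term_id.partition(":")
    if not (prefix and sep and local):
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    return prefix


def merge_annotations(*annotation_sets):
    """Union several {record: set of term ids} mappings into one."""
    merged = defaultdict(set)
    for annotations in annotation_sets:
        for record, terms in annotations.items():
            merged[record] |= set(terms)
    return dict(merged)


def split_by_space(terms):
    """Group term ids by identifier space, recovering each ontology's share."""
    grouped = defaultdict(set)
    for term in terms:
        grouped[id_space(term)].add(term)
    return dict(grouped)


go_annotations = {"gene_product_1": {"GO:0004872"}}
cl_annotations = {"gene_product_1": {"CL:0000000"}}  # placeholder id

merged = merge_annotations(go_annotations, cl_annotations)
print(split_by_space(merged["gene_product_1"]))
```

Because no two ontologies share a prefix, the merged set loses no information about which ontology contributed which annotation.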

68
CRITERIA
  • CLEARLY BOUNDED: The ontology has a clearly
    specified and clearly delineated content.
  • DOCUMENTATION: The ontology is well-documented.
  • USERS: The ontology has a plurality of
    independent users.
69
CRITERIA
  • COMMON ARCHITECTURE: The ontology uses relations
    which are unambiguously defined following the
    pattern of definitions laid down in the OBO
    Relation Ontology
70
Consequences
  • OBO Foundry is serving as a benchmark for
    improvements in discipline-focused terminology
    resources
  • yielding calibration of existing terminologies
    and data resources and alignment of different
    views
71
Foundry ontologies all work in the same way
  • all are built to represent the types existing in
    a pre-existing domain and the relations between
    these types in a way which can support reasoning
  • we have data
  • we need to make this data available for semantic
    search and algorithmic processing
  • we create a consensus-based ontology for
    annotating the data
  • and ensure that it can interoperate with Foundry
    ontologies for neighboring domains

72
Mature OBO Foundry ontologies (now undergoing
reform)
  • Cell Ontology (CL)
  • Chemical Entities of Biological Interest (ChEBI)
  • Foundational Model of Anatomy (FMA)
  • Gene Ontology (GO)
  • Phenotypic Quality Ontology (PaTO)
  • Relation Ontology (RO)
  • Sequence Ontology (SO)

73
Ontologies being built to satisfy Foundry
principles ab initio
  • Ontology for Clinical Investigations (OCI)
  • Common Anatomy Reference Ontology (CARO)
  • Ontology for Biomedical Investigations (OBI)
  • Protein Ontology (PRO)
  • RNA Ontology (RnaO)
  • Subcellular Anatomy Ontology (SAO)

74
Ontologies in planning phase
  • Biobank/Biorepository Ontology (BrO, part of OBI)
  • Environment Ontology (EnvO)
  • Immunology Ontology (ImmunO)
  • Infectious Disease Ontology (IDO)
  • Mouse Adult Neurogenesis Ontology (MANGO)

75
OBO Foundry provides a method for handling legacy
databases
76
Senselab/NeuronDB
  • NeuronDB comprehends three types of neuronal
    properties
      • voltage gated conductances
      • neurotransmitter receptors
      • neurotransmitter substances
  • Many questions immediately arise: what are
    receptors? Proteins? Protein complexes? The
    Foundry framework provides an opportunity to
    evaluate such choices.

http://senselab.med.yale.edu/
77
Senselab/NeuronDB
  • The GO Molecular Function (MF) ontology already
    has classes such as receptor activity
    (GO_0004872) plus subclasses describing receptor
    activities already referred to in NeuronDB.
  • This provides a roadmap for further development.
    Review the 130 receptor classes to see if they
    exist in MF; where not, create subclasses and
    submit to GO for future inclusion. We can then
    e.g. take advantage of GO Annotations to find
    the proteins that correspond to these receptor
    classes in different species.
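The review step described here amounts to a set comparison between two lists of class labels. A minimal sketch follows, assuming both lists are available as plain strings; the example labels and the function `review_receptor_classes` are hypothetical and do not reflect the actual contents of NeuronDB or GO.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def review_receptor_classes(legacy_classes, go_mf_classes):
    """Split legacy classes into those already in GO MF and those to submit."""
    known = {normalise(label) for label in go_mf_classes}
    existing = sorted(c for c in legacy_classes if normalise(c) in known)
    to_submit = sorted(c for c in legacy_classes if normalise(c) not in known)
    return existing, to_submit


# Hypothetical labels, for illustration only.
legacy = ["Receptor A activity", "receptor B  activity", "Receptor C activity"]
go_mf = ["receptor A activity", "receptor B activity"]

existing, to_submit = review_receptor_classes(legacy, go_mf)
print(existing)   # ['Receptor A activity', 'receptor B  activity']
print(to_submit)  # ['Receptor C activity']
```

Exact label matching is the simplest possible criterion; a real review would also need to consider synonyms and definitions before deciding that a class is missing.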

78
OBO Foundry Success Story
  • Model organism research seeks results valuable
    for the understanding of human disease.
  • This requires the ability to make reliable
    cross-species comparisons, and for this anatomy
    is crucial.
  • But different MOD communities have developed
    their anatomy ontologies in uncoordinated
    fashion.

79
Multiple axes of classification
  • Functional: cardiovascular system, nervous
    system
  • Spatial: head, trunk, limb
  • Developmental: endoderm, germ ring, lens placode
  • Structural: tissue, organ, cell
  • Stage: developmental staging series
80
  • Developmental terms are often lumped together for
    lack of a way to categorize them
  • Stages are represented in a variety of ways.
    Terms can be children of superstages, stages can
    be integrated into each term, or stages can be
    assigned to terms from a separate ontology

81
Ontologies facilitate grouping of annotations
  • brain: 20
  • hindbrain: 15
  • rhombomere: 10
Query "brain" without ontology: 20
Query "brain" with ontology: 45
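The arithmetic on this slide can be reproduced with a short sketch. It assumes, as the counts imply, that rhombomere is a part of hindbrain and hindbrain a part of brain; the dictionary layout and the function names are illustrative only.

```python
# Assumed parthood links: child term -> parent term.
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Annotation counts attached directly to each term (from the slide).
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def descendants(term):
    """All terms that are, directly or indirectly, part of `term`."""
    children = [child for child, parent in PART_OF.items() if parent == term]
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found


def count_without_ontology(term):
    """Only annotations made to the query term itself."""
    return DIRECT_COUNTS.get(term, 0)


def count_with_ontology(term):
    """Annotations to the term plus those to all of its descendants."""
    return DIRECT_COUNTS.get(term, 0) + sum(
        DIRECT_COUNTS.get(d, 0) for d in descendants(term)
    )


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```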
82
CARO: Common Anatomy Reference Ontology
  • for the first time provides guidelines for model
    organism researchers who wish to achieve
    comparability of annotations
  • for the first time provides guidelines for those
    new to ontology work
  • See Haendel et al., CARO: The Common Anatomy
    Reference Ontology, in Burger (ed.), Anatomy
    Ontologies for Bioinformatics, Springer, in press.

83
CARO-conformant ontologies already in development
  • Fish Multi-Species Anatomy Ontology (NSF funding
    received)
  • Ixodidae and Argasidae (Tick) Anatomy Ontology
  • Mosquito Anatomy Ontology (MAO)
  • Spider Anatomy Ontology
  • Xenopus Anatomy Ontology (XAO)
  • undergoing reform: Drosophila and Zebrafish
    Anatomy Ontologies

84
OBI / OCI
  • Ontology for Biomedical Investigations
      • overarching terminology resource for MIBBI
        Foundry
  • Ontology for Clinical Investigations
      • collaboration with EPOCH ontology for clinical
        trial management
      • and with CDISC (FDA mandated vocabulary for
        clinical trial reports)

85
next step: repertoire of disease ontologies built
out of OBO Foundry elements
86
Scope of Draft Ontology for Multiple Sclerosis
