How Ontologies Create Research Communities

1
How Ontologies Create Research Communities
  • Barry Smith
  • University at Buffalo
  • http://ontology.buffalo.edu/smith

2
Who am I?
  • NCBO National Center for Biomedical Ontology
    (NIH Roadmap Center)
  • Stanford Medical Informatics
  • University of San Francisco Medical Center
  • Berkeley Drosophila Genome Project
  • Cambridge University Department of Genetics
  • The Mayo Clinic
  • University at Buffalo Department of Philosophy

3
Who am I?
  • NYS Center of Excellence in Bioinformatics and
    Life Sciences Ontology Research Group
  • Buffalo Clinical and Translational Science
    Institute (CTSI)
  • Duke/Dallas/Houston CTSA Ontology Consortium

4
Who am I?
  • Cleveland Clinic Semantic Database
  • Gene Ontology
  • Ontology for Biomedical Investigations
  • Open Biomedical Ontologies Consortium
  • Institute for Formal Ontology and Medical
    Information Science
  • BIRN Ontology Task Force
  • ...

5
Multiple kinds of data in multiple kinds of silos
  • Lab / pathology data
  • Electronic Health Record data
  • Clinical trial data
  • Patient histories
  • Medical imaging
  • Microarray data
  • Protein chip data
  • Flow cytometry
  • Mass spec
  • Genotype / SNP data

6
How to find your data?
  • How to find other people's data?
  • How to reason with data when you find it?
  • How to work out what data does not yet exist?

7
Multiple kinds of standardization for data
  • Terminologies (SNOMED, UMLS)
  • CDEs (Clinical research)
  • Information Exchange Standards (HL7 RIM)
  • LIMS (LOINC)
  • MGED standards for microarray data, etc.

8
how to solve the problem of making such data
queryable and re-usable by others to address NIH
mandates?
part of the solution must involve standardized
terminologies and coding schemes
9
most successful thus far: UMLS
  • collection of separate terminologies built by
    trained experts
  • massively useful for information retrieval and
    information integration
  • UMLS Metathesaurus: a system of post hoc mappings
    between overlapping source vocabularies

10
for UMLS
  • local usage respected
  • regimentation frowned upon
  • cross-framework consistency not important
  • no concern to establish consistency with basic
    science
  • different grades of formal rigor, different
    degrees of completeness, different update policies

11
caBIG approach: BRIDG (top-down imposition)
13
A new approach
for science
  • where do you find scientifically validated
    information linking gene products and other
    entities represented in biochemical databases to
    semantically meaningful terms pertaining to
    disease, anatomy, development in different model
    organisms?

15
where in the body? where in the cell?
16
where in the body? where in the cell?
what kind of organism?
17
where in the body? where in the cell?
what kind of organism?
what kind of disease process?
18
we need semantic annotation of data
we need ontologies
19
natural language labels designed for use
in annotations
to make the data cognitively accessible to human
beings
and algorithmically tractable to computers
20
compare legends for maps
21
compare legends for maps
common legends allow (cross-border) integration
22
ontologies are legends for data
23
ontologies: high-quality controlled structured
vocabularies for the annotation (description) of
data
24
compare legends for diagrams
25
or chemistry diagrams
legends for chemistry diagrams
Prasanna et al., Chemical Compound Navigator: A
Web-Based Chem-BLAST, Chemical Taxonomy-Based
Search Engine for Browsing Compounds. PROTEINS:
Structure, Function, and Bioinformatics
63:907-917 (2006)
26
Ramirez et al., Linking of Digital Images to
Phylogenetic Data Matrices Using a Morphological
Ontology. Syst. Biol. 56(2):283-294, 2007
27
computationally tractable legends
  • help integrate complex representations of
    reality
  • help human beings find things in complex
    representations of reality
  • help computers reason with complex
    representations of reality

30
The Gene Ontology
31
what cellular component?
what molecular function?
what biological process?
32
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
33
The Network Effects of Synchronization
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
34
Five bangs for your GO buck
  • based in biological science
  • incremental approach (evidence-based evolutionary
    pathway)
  • cross-species data comparability (human, mouse,
    yeast, fly ...)
  • cross-granularity data integration (molecule,
    cell, organ, organism)
  • cumulation of scientific knowledge in
    algorithmically tractable form, links people to
    software

35
The methodology of annotations
  • Model organism databases employ scientific
    curators who use the experimental observations
    reported in the biomedical literature to
    associate GO terms with entries in gene product
    and other molecular biology databases
  • (4 mill. p.a. NIH funding)

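The curation step described on this slide can be pictured as a small record type. This is only a sketch under stated assumptions: the field names, the `make_annotation` and `terms_for` helpers, and all identifiers (`DB:gene1`, `EX:0000001`, `REF:1`) are hypothetical placeholders, not real database entries or the format any model organism database actually uses.

```python
from dataclasses import dataclass

# The three questions a GO annotation answers (see the neighboring slides).
ASPECTS = {"cellular_component", "molecular_function", "biological_process"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # database entry being annotated (hypothetical ID)
    term_id: str       # ontology term chosen by the curator (hypothetical ID)
    aspect: str        # which of the three questions the term answers
    evidence: str      # kind of supporting evidence (free-text label here)
    reference: str     # literature source the curator relied on (placeholder)


def make_annotation(gene_product, term_id, aspect, evidence, reference):
    """Build an annotation, rejecting unknown aspects and missing sources."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect}")
    if not reference:
        raise ValueError("an annotation must cite its source")
    return Annotation(gene_product, term_id, aspect, evidence, reference)


def terms_for(annotations, gene_product):
    """Return the sorted term IDs associated with one gene product."""
    return sorted({a.term_id for a in annotations if a.gene_product == gene_product})


annotations = [
    make_annotation("DB:gene1", "EX:0000001", "molecular_function", "experimental", "REF:1"),
    make_annotation("DB:gene1", "EX:0000002", "biological_process", "experimental", "REF:1"),
    make_annotation("DB:gene2", "EX:0000002", "biological_process", "experimental", "REF:2"),
]
print(terms_for(annotations, "DB:gene1"))
```

Requiring a reference on every record mirrors the slide's point that annotations are grounded in experimental observations reported in the literature.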
36
what cellular component?
what molecular function?
what biological process?
37
How to extend the GO methodology to other domains
of clinical and translational medicine?

38
the problem
existing clinical vocabularies are of variable
quality and low mutual consistency current
proliferation of tiny ontologies by different
groups with urgent annotation needs
40
the solution
  • establish common rules governing best practices
    for creating ontologies in coordinated fashion,
    with an evidence-based pathway to incremental
    improvement

41
First step (2003)
  • a shared portal for (so far) 58 ontologies
  • (low regimentation)
  • http://obo.sourceforge.net → NCBO BioPortal

43
OBO now the principal entry point for creation of
web-accessible biomedical data
  • OBO and OBOEdit: low-tech to encourage users
  • Simple (web-service-based) tools created to
    support the work of biologists in creating
    annotations (data entry)
  • OBO → OWL DL converters make OBO Foundry
    annotated data immediately accessible to Semantic
    Web data integration projects
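To illustrate why a low-tech flat-file format lowers the barrier for users, here is a minimal sketch of reading `[Term]` stanzas written as `tag: value` lines, which is the general shape of the OBO flat-file format. The sample terms, their `EX:` identifiers, and the `parse_obo_terms` helper are hypothetical, and the parser deliberately ignores much of the real format (trailing `!` comments on a line, escapes, header tags, typedefs).

```python
SAMPLE = """\
[Term]
id: EX:0000001
name: cell differentiation
def: "Hypothetical definition text." []

[Term]
id: EX:0000002
name: osteoblast differentiation
is_a: EX:0000001
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into {term_id: {tag: [values, ...]}}."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            # Start a new stanza; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None:
            continue  # header lines or stanzas this sketch does not handle
        tag, _, value = line.partition(": ")
        current.setdefault(tag, []).append(value)
    return {s["id"][0]: s for s in stanzas if "id" in s}


terms = parse_obo_terms(SAMPLE)
print(terms["EX:0000002"]["name"], terms["EX:0000002"]["is_a"])
```

A converter to a richer language such as OWL DL would start from a structure like `terms`; that step is not shown here.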

44
Second step (2004): reform efforts initiated,
e.g. linking GO formally to other ontologies and
data sources
GO
Cell type
New Definition
Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest
cell acquires the specialized features of an
osteoblast, a bone-forming cell which secretes
extracellular matrix.
45
Third step (2006)
The OBO Foundry: http://obofoundry.org/
46
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
47
Building out from the original GO
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
48
initial OBO Foundry coverage
| CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
49
CRITERIA
  • The ontology is open and available to be used by
    all.
  • The ontology is in, or can be instantiated in, a
    common formal language.
  • The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.

50
CRITERIA
  • UPDATE: The developers of each ontology commit to
    its maintenance in light of scientific advance,
    and to soliciting community feedback for its
    improvement.
  • ORTHOGONALITY: They commit to working with other
    Foundry members to ensure that, for any
    particular domain, there is community convergence
    on a single controlled vocabulary.

51
ORTHOGONALITY
for science
  • communities must work together to ensure
    consistency → orthogonality → modular development
    plus additivity of annotations
  • if we annotate a database or body of literature
    with one OBO Foundry ontology, we should be able
    to add annotations from a second such ontology
    without conflicts
  • ontologies do not need to create tiny theories of
    anatomy or chemistry within themselves

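One simplified reading of "additivity of annotations" is that two orthogonal ontologies never claim the same identifier space, so their annotations on a record can be combined by plain union. The sketch below assumes that reading; the `ANAT:` and `CHEM:` prefixes, the record names, and the `add_annotations` helper are hypothetical and stand in for any two non-overlapping ontologies.

```python
def id_space(term_id):
    """Return the prefix of a 'PREFIX:local' identifier."""
    prefix, sep, local = term_id.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"malformed identifier: {term_id!r}")
    return prefix


def spaces(annotations):
    """Collect every ID space used in a {record: {term_id, ...}} mapping."""
    return {id_space(t) for terms in annotations.values() for t in terms}


def add_annotations(existing, new):
    """Merge two annotation sets, refusing to merge overlapping ID spaces."""
    overlap = spaces(existing) & spaces(new)
    if overlap:
        raise ValueError(f"ID spaces overlap: {sorted(overlap)}")
    merged = {record: set(terms) for record, terms in existing.items()}
    for record, terms in new.items():
        merged.setdefault(record, set()).update(terms)
    return merged


anatomy = {"record1": {"ANAT:0000001"}, "record2": {"ANAT:0000002"}}
chemistry = {"record1": {"CHEM:0000007"}}
merged = add_annotations(anatomy, chemistry)
print(sorted(merged["record1"]))
```

Because each domain is covered by a single vocabulary, the merge needs no mapping table between overlapping terms; the inputs are left unmodified.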
52
CRITERIA
  • IDENTIFIERS: The ontology possesses a unique
    identifier space within OBO.
  • VERSIONING: The ontology provider has procedures
    for identifying distinct successive versions.
  • The ontology includes textual definitions for all
    terms.

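Two of these criteria lend themselves to mechanical checking: that every term lives in the ontology's own identifier space, and that every term carries a textual definition. The following is a sketch only; the `PREFIX:digits` identifier pattern, the `check_ontology` helper, and the sample terms are assumptions made for illustration, not an official OBO Foundry validator.

```python
import re

# Assumed identifier shape: letters, a colon, then digits (e.g. EX:0000001).
ID_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def check_ontology(prefix, terms):
    """Return a list of problems found in {term_id: {"name", "definition"}}."""
    problems = []
    for term_id, term in terms.items():
        match = ID_PATTERN.match(term_id)
        if match is None or match.group(1) != prefix:
            problems.append(f"{term_id}: identifier outside the {prefix} ID space")
        if not term.get("definition", "").strip():
            problems.append(f"{term_id}: missing textual definition")
    return problems


sample_terms = {
    "EX:0000001": {"name": "term a", "definition": "A hypothetical definition."},
    "EX:0000002": {"name": "term b", "definition": ""},
    "OTHER:0000003": {"name": "term c", "definition": "Another hypothetical definition."},
}
for problem in check_ontology("EX", sample_terms):
    print(problem)
```

Versioning is not covered here, since the slide does not say how successive versions are identified.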
53
CRITERIA
  • CLEARLY BOUNDED: The ontology has a clearly
    specified and clearly delineated content.
  • DOCUMENTATION: The ontology is well-documented.
  • USERS: The ontology has a plurality of
    independent users.

54
CRITERIA
  • COMMON ARCHITECTURE: The ontology uses relations
    which are unambiguously defined following the
    pattern of definitions laid down in the OBO
    Relation Ontology

55
Consequences
  • OBO Foundry is serving as a benchmark for
    improvements in discipline-focused terminology
    resources
  • yielding calibration of existing terminologies
    and data resources and alignment of different
    views

56
Foundry ontologies all work in the same way
  • all are built to represent the types existing in
    a pre-existing domain and the relations between
    these types in a way which can support reasoning
  • we have data
  • we need to make this data available for semantic
    search and algorithmic processing
  • we create a consensus-based ontology for
    annotating the data
  • and ensure that it can interoperate with Foundry
    ontologies for neighboring domains

57
Mature OBO Foundry ontologies (now undergoing
reform)
  • Cell Ontology (CL)
  • Chemical Entities of Biological Interest (ChEBI)
  • Foundational Model of Anatomy (FMA)
  • Gene Ontology (GO)
  • Phenotypic Quality Ontology (PaTO)
  • Relation Ontology (RO)
  • Sequence Ontology (SO)

58
Ontologies being built to satisfy Foundry
principles ab initio
  • Ontology for Clinical Investigations (OCI)
  • Common Anatomy Reference Ontology (CARO)
  • Ontology for Biomedical Investigations (OBI)
  • Protein Ontology (PRO)
  • RNA Ontology (RnaO)
  • Subcellular Anatomy Ontology (SAO)

59
Ontologies in planning phase
  • Biobank/Biorepository Ontology (BrO, part of OBI)
  • Environment Ontology (EnvO)
  • Immunology Ontology (ImmunO)
  • Infectious Disease Ontology (IDO)
  • Mouse Adult Neurogenesis Ontology (MANGO)

60
OBO Foundry Success Story
  • Model organism research seeks results valuable
    for the understanding of human disease.
  • This requires the ability to make reliable
    cross-species comparisons, and for this anatomy
    is crucial.
  • But different MOD communities have developed
    their anatomy ontologies in uncoordinated
    fashion.

61
Ontologies facilitate grouping of annotations
brain: 20
hindbrain: 15
rhombomere: 10
Query brain without ontology: 20
Query brain with ontology: 45
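The arithmetic on this slide can be reproduced with a short sketch. It assumes that the grouping comes from a part_of hierarchy (rhombomere part_of hindbrain part_of brain), that each term has at most one parent, and that the hierarchy has no cycles; the `PART_OF` table and the `query` helper are illustrative names, not part of any particular ontology tool.

```python
# Assumed hierarchy: child term -> the term it is part of.
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Annotation counts from the slide, attached directly to each term.
COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def is_within(term, ancestor, part_of):
    """True if term equals ancestor or is (transitively) part of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = part_of.get(term)
    return False


def query(term, counts, part_of=None):
    """Count annotations for a term, optionally including its parts."""
    if part_of is None:
        return counts.get(term, 0)  # plain string match on the term only
    return sum(n for t, n in counts.items() if is_within(t, term, part_of))


print(query("brain", COUNTS))           # 20, without the ontology
print(query("brain", COUNTS, PART_OF))  # 45, with the ontology
```

Without the hierarchy, a query for "brain" sees only the 20 direct annotations; with it, the 15 hindbrain and 10 rhombomere annotations are grouped in as well.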
62
CARO: Common Anatomy Reference Ontology
  • for the first time provides guidelines for model
    organism researchers who wish to achieve
    comparability of annotations
  • for the first time provides guidelines for those
    new to ontology work
  • See Haendel et al., CARO: The Common Anatomy
    Reference Ontology, in Burger (ed.), Anatomy
    Ontologies for Bioinformatics, Springer, in press.

63
CARO-conformant ontologies already in development
  • Fish Multi-Species Anatomy Ontology (NSF funding
    received)
  • Ixodidae and Argasidae (Tick) Anatomy Ontology
  • Mosquito Anatomy Ontology (MAO)
  • Spider Anatomy Ontology
  • Xenopus Anatomy Ontology (XAO)
  • undergoing reform: Drosophila and Zebrafish
    Anatomy Ontologies

64
Minimal Information Checklists
  • June 2006: establishment of MICheck
  • reflects growing need for prescriptive
    checklists specifying the key information to
    include when reporting experimental results
    (concerning methods, data, analyses and results).

65
The vision is spreading
  • MIBBI: a common resource for minimum information
    checklists analogous to OBO / NCBO BioPortal
  • MIBBI Foundry will create a suite of
    self-consistent, clearly bounded, orthogonal,
    integrable checklist modules
  • Taylor CF, et al. Nature Biotech, in press

66
MIBBI Foundry communities
  • Transcriptomics (MIAME Working Group / MGED)
  • Proteomics (Proteomics Standards Initiative)
  • Metabolomics (Metabolomics Standards Initiative)
  • Genomics and Metagenomics (Genomic Standards
    Consortium)
  • In Situ Hybridization and Immunohistochemistry
    (MISFISHIE Working Group)
  • Phylogenetics (Phylogenetics Community)
  • RNA Interference (RNAi Community)
  • Toxicogenomics (Toxicogenomics WG)
  • Environmental Genomics (Environmental Genomics
    WG)
  • Nutrigenomics (Nutrigenomics WG)
  • Flow Cytometry (Flow Cytometry Community)

67
OBI / OCI
  • Ontology for Biomedical Investigations
  • overarching terminology resource for MIBBI
    Foundry
  • Ontology for Clinical Investigations
  • collaboration with EPOCH ontology for clinical
    trial management
  • and with CDISC (FDA mandated vocabulary for
    clinical trial reports)

68
INDEPENDENT CONTINUANTS
organism
system
organ
organ part
tissue
cell
acellular anatomical structure
biological molecule
genome
DEPENDENT CONTINUANTS
physiology (functions) | pathology: acute stage | pathology: progressive stage | pathology: resolution stage
next step: repertoire of disease ontologies built
out of OBO Foundry elements
69
Draft Ontology for Multiple Sclerosis
to apprehend what is unknown requires a complete
demarcation of the relevant space of alternatives


70
CTSA Ontology Consortium
  • Duke Clinical Research Institute (DCRI)
  • Dallas University of Texas Southwestern Medical
    Center Clinical and Translational Science
    Initiative Division of Biomedical Informatics
  • University of Texas Health Science Center at
    Houston Center for Clinical and Translational
    Sciences

71
Multiple kinds of standardization for data
  • Terminologies (SNOMED, UMLS)
  • CDEs (Clinical research)
  • Ontologies (Biology, Disease Models)
  • Information Exchange Standards (HL7 RIM)
  • LIMS (LOINC)
  • Duke DCRI project to deal with 3 of these

72
Houston CTSA Biomedical Informatics
  • Specific aim 1: To design and implement the
    biological data interface ... based on existing
    biological ontologies, specifically those
    included in the NIH Roadmap funded Open
    Biomedical Ontologies (OBO) project, and to
    leverage previous informatics research in
    ontology management.

73
Houston CTSA proposal
  • providing a coherent and integrated framework
    for CTSI investigators to integrate disparate
    sources of data, improve the communication among
    researchers, and establish better contact between
    researchers and the community. Of critical
    importance, by combining isolated data clusters
    the biomedical informatics component will empower
    investigators to redefine human disease and the
    response to diagnostic and therapeutic strategies
    through the use of combined clinical and
    molecular profiling.

74
PAR-07-425: Data Ontologies for Biomedical
Research (R01)
  • Adoption of ontologies also depends on the
    ontology being in a format that is broadly
    supported, fully machine interpretable and not
    subject to intellectual property restrictions.
    ... Another determinate of ontology acceptance is
    the degree to which the ontology conforms to best
    practices governing ontology design and
    construction. Criteria have been developed, and
    are undergoing empirical validation, by the
    Vocabulary and Common Data Element Work Group of
    caBIG. Other criteria have been specified by the
    OBO Foundry (http://obofoundry.org).

75
| Top-down (master-model-based) | Bottom-up (evidence-based)
prospective standardization | caBIG, SNOMED, HL7 | OBO Foundry
retrospective mapping | UMLS (multiple authorities) | NLP / data text-mining
  • caBIG
  • BRIDG

76
  • SNOMED
  • Ultimately as data become attached to the
    samples (e.g., pathology data, genotypes) these
    will be linked to the patient records.