Pax Terminologica - PowerPoint PPT Presentation

Slides: 59
Provided by: barr214
Transcript and Presenter's Notes


1
Pax Terminologica
  • Barry Smith
  • Institute for Formal Ontology and Medical
    Information Science (IFOMIS), Saarland University
    / University at Buffalo

2
Overview
  • systems for semantic annotation
  • linguistics vs. science
  • semantic annotation in biomedical informatics
  • improving systems for semantic annotation
  • conclusions

3
The Penn Treebank Project
  • annotates naturally occurring text for linguistic
    structure, producing skeletal parses showing
    syntactic and semantic information in tree form

4
Automatic Content Extraction Program (ACE)
  • develops text corpora in English, Chinese and
    Arabic annotated for entities, the relations
    among them and the events in which they
    participate.

5
High Accuracy Retrieval from Documents (HARD)
  • creates corpora and annotations including topics,
    metadata and relevance judgements

6
Annotation Graph Toolkit (AGTK)
  • formal framework for representing linguistic
    annotations of time series data.

7
TimeML
  • robust specification language for markup of
    natural language to support:
  • time stamping of events (identifying and
    anchoring in time)
  • ordering events with respect to one another
  • reasoning about persistence

8
SpaceML
  • provides facilities for annotating:
  • category attributions to spatial regions
    (self-connected, bounded, regular, etc.)
  • ascription to regions of topological, distance,
    morphological and orientation relations
  • the definition of a region in terms of its
    boundary.

9
WordNet
  • annotates English nouns, verbs, adjectives and
    adverbs to synonym sets, each representing one
    underlying lexical concept.

10
FrameNet
  • documents the range of semantic and syntactic
    combinatory possibilities (valences) of each word
    in each of its senses

11
is there order in this chaos?
12
ISO/TC 37 / SC 4 N 076
  • Ide, N., Romary, L., de la Clergerie, E. (2003).
    International Standard for a Linguistic
    Annotation Framework.
  • HLT-NAACL 2003 (Edmonton)

13
OntoGloss (influenced by ISO Linguistic
Annotation Framework)
  • an ontology based annotation tool that uses
    predefined terms in an ontology to mark-up a
    document
  • No standard portal for semantic annotation
    tools/projects (?)

14
Purposes of semantic annotation
  • information retrieval (incl. semantic indexing:
    answering queries that use words not used in the
    text, including words from other languages)
  • automatic translation
  • disambiguation
  • topic extraction and text summarization
  • information integration
  • reasoning
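
As a rough illustration of the first purpose, semantic indexing, the following minimal Python sketch indexes texts by concept identifier rather than by surface word, so that a query can match a text that never uses the query word. The lexicon, the concept identifiers (C1, C2), the sample texts and the function names are all invented for this illustration and do not come from any of the systems named above.

```python
# Toy semantic index: words are mapped to concept identifiers, and texts are
# indexed by concept, not by surface word. All data here is hypothetical.
from collections import defaultdict

# Hypothetical lexicon: surface word -> concept identifier.
LEXICON = {
    "heart": "C1", "cardiac": "C1", "herz": "C1",  # "Herz" is German for heart
    "kidney": "C2", "renal": "C2",
}

def annotate(text):
    """Return the set of concept identifiers mentioned in a text."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return {LEXICON[w] for w in words if w in LEXICON}

def build_index(docs):
    """Map each concept identifier to the ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that share at least one concept with the query."""
    hits = set()
    for concept in annotate(query):
        hits |= index.get(concept, set())
    return hits

DOCS = {
    "d1": "Cardiac output fell after surgery.",
    "d2": "Renal function was normal.",
}
INDEX = build_index(DOCS)
```

Here `search(INDEX, "heart")` and `search(INDEX, "Herz")` both find document d1, although d1 contains neither word; the match goes through the shared concept identifier.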

15
for linguistics
  • fiction no less important than fact
  • English has no privileged status
  • regimentation not allowed
  • annotation frameworks may be competitive
  • cross-framework consistency is not important

16
for science
  • factual discourse alone important
  • English is language par excellence
  • regimentation is allowed
  • goal of truth: to create a single
    computer-processable map of reality
  • truth is one → must strive for consistency of
    annotations and additivity of annotation
    frameworks

17
for science
  • must end the terminology wars
  • Plant Ontology (PO)
  • cell (def.): structural and physiological unit of a
    plant
  • what should PO do when it needs to study bacteria
    in plants?
  • answer: all shall use the word 'cell' to mean the
    same thing!
  • (all in biology)

18
the ideal (of additivity)
  • WordNet for single word forms
  • FrameNet for valencies/combination forms
  • SpaceNet for spatial structures
  • TimeNet for temporal structures
  • ChemNet for chemical structures
  • CellNet for cellular structures
  • etc.

19
a scientific problem: huge swarms of biomedical
data at different granularities, from molecule to
clinic
  • methods for data integration needed to enable
    reasoning across data at multiple granularities
  • (genomic medicine ...)

20
orthodox solutions to this problem
  • dumb statistical number-crunching
  • or
  • Semantic Web, Unified Medical Language System
    (UMLS), Moby, etc.
  • let a million flowers bloom
  • and rely on mappings between already existing
    controlled vocabularies/annotation systems

21
an alternative solution
  • use the peer-reviewed biomedical literature
  • contains both textual descriptions of biological
    functions (incl. diseases) and references to
    entities represented in the biochemical databases
  • use high-quality semantic annotations of the
    former to integrate across the latter → the Gene
    Ontology

24
The methodology of annotations
  • Model organism databases employ scientific
    curators, who use the experimental observations
    reported in the biomedical literature to link
    gene products (such as proteins) with GO terms in
    annotations.
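
The curation step described above can be pictured as producing records that tie a gene product to an ontology term and to the publication that supports the link. The following is a minimal Python sketch under that assumption; the record layout, the field names, the identifiers (including the `EX:` prefix) and the helper function are invented for illustration and are not the format used by any model organism database.

```python
# Toy annotation records linking gene products to ontology terms.
# Every identifier below is hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical gene product identifier
    term_id: str       # hypothetical ontology term identifier
    reference: str     # publication reporting the supporting observation

ANNOTATIONS = [
    Annotation("protein-A", "EX:0000001", "paper-1"),
    Annotation("protein-B", "EX:0000001", "paper-2"),
    Annotation("protein-A", "EX:0000002", "paper-3"),
]

def products_by_term(annotations):
    """Group gene products by the ontology term they are annotated with."""
    grouped = defaultdict(set)
    for a in annotations:
        grouped[a.term_id].add(a.gene_product)
    return dict(grouped)
```

Grouping by term is what makes the annotations usable for integration: two gene products recorded in different sources become comparable once they are linked to the same term.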

25
The process of annotations
  • leads to improvements and extensions of the
    ontology, which in turn leads to better
    annotations
  • a virtuous cycle of improvement in the quality
    and reach of both future annotations and the
    ontology itself,
  • yielding a slowly growing computer-interpretable
    map of biological reality within which major
    databases are automatically integrated in
    semantically searchable form

26
need to extend GO by means of other ontologies,
e.g. Cell Ontology, via integrated definitions
  • GO + Cell type = New Definition
  • Osteoblast differentiation: Processes whereby an
    osteoprogenitor cell or a cranial neural crest
    cell acquires the specialized features of an
    osteoblast, a bone-forming cell which secretes
    extracellular matrix.

27
need to extend GO also to semantic annotation of
clinical literature
unfortunately, available (UMLS) clinical
vocabularies are of variable quality and low
mutual consistency
28
→ need for prospective standards to assure
consistency and high quality
  • create rules for high-quality controlled
    vocabularies for the annotation of scientific
    literature
  • make everyone follow these rules
  • regimentation !

29
first step
  • a shared portal for (so far) 58 ontologies
  • (low regimentation)
  • http://obo.sourceforge.net

31
Second step: The OBO Foundry
http://obofoundry.org/
32
The OBO Foundry
  • scientific standards and principles-based
    coordination of systems for semantic annotation
    of biomedical literature to create a single
    interoperable family of gold standard reference
    ontologies

33
  • A subset of OBO ontologies, whose developers
    have agreed in advance to accept a common set of
    principles designed to ensure:
  • formal robustness
  • stability
  • compatibility
  • interoperability
  • support for logic-based reasoning

34
  • Custodians
  • Michael Ashburner (Cambridge)
  • Suzanna Lewis (Berkeley)
  • Barry Smith (Buffalo/Saarbrücken)

35
A prospective standard
  • designed to guarantee interoperability of
    ontologies from the very start
  • established March 2006; already 13 OBO
    ontologies have joined the Foundry and are being
    correspondingly reformed; three new ontologies are
    being constructed ab initio in its terms

36
  • Initial Candidate Members
  • GO: Gene Ontology
  • CL: Cell Ontology
  • SO: Sequence Ontology
  • ChEBI: Chemical Entities of Biological Interest
  • PATO: Phenotype (Quality) Ontology
  • FuGO: Functional Genomics Investigation Ontology
  • FMA: Foundational Model of Anatomy
  • RO: Relation Ontology
  • CARO: Common Anatomy Reference Ontology
  • PrO: Protein Ontology
  • RnaO: RNA Ontology

37
  • Under development
  • Disease Ontology
  • Mammalian Phenotype Ontology
  • OBO-UBO / Ontology of Biomedical Reality
  • Organism (Species) Ontology
  • Plant Trait Ontology
  • Environment Ontology
  • Behavior Ontology
  • Biomedical Image Ontology
  • Clinical Trial Ontology

39
CRITERIA
The OBO Foundry
  • The ontology is open and available to be used by
    all.
  • The ontology is in, or can be instantiated in, a
    common formal language.
  • The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.

40
CRITERIA
  • The developers of each ontology commit to its
    maintenance in light of scientific advance, and
    to soliciting community feedback for its
    improvement.
  • They commit to working with other Foundry members
    to ensure that, for any particular domain, there
    is community convergence on a single controlled
    vocabulary

41
CRITERIA
  • The ontology possesses a unique identifier space
    within OBO.
  • The ontology provider has procedures for
    identifying distinct successive versions.
  • The ontology includes textual definitions for all
    terms.
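
Some of these criteria lend themselves to mechanical checking. The Python sketch below tests two of them on a toy ontology: that all term identifiers sit in a single identifier space, and that every term carries a textual definition. The data layout, the `EX` prefix, the sample terms and the function name are assumptions made for this example and are not part of any OBO tooling.

```python
# Toy check of two Foundry-style criteria. All data here is hypothetical.

def check_criteria(terms, prefix):
    """Return a list of problems found in a dict of term_id -> term record."""
    problems = []
    for term_id, term in terms.items():
        # Criterion: a unique identifier space (modelled here as a shared prefix).
        if not term_id.startswith(prefix + ":"):
            problems.append(f"{term_id}: outside identifier space {prefix}")
        # Criterion: textual definitions for all terms.
        if not term.get("definition", "").strip():
            problems.append(f"{term_id}: missing textual definition")
    return problems

TERMS = {
    "EX:0000001": {"name": "term one", "definition": "A placeholder definition."},
    "EX:0000002": {"name": "term two", "definition": ""},
    "ZZ:0000003": {"name": "term three", "definition": "Another placeholder."},
}
```

Calling `check_criteria(TERMS, "EX")` reports the empty definition of the second term and the foreign prefix of the third.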

42
CRITERIA
  • The ontology has a clearly specified and clearly
    delineated content.
  • The ontology is well-documented.
  • The ontology has a plurality of independent users.

43
CRITERIA
  • The ontology uses relations which are
    unambiguously defined following the pattern of
    definitions laid down in the OBO Relation
    Ontology.
  • Genome Biology 2005, 6:R46

44
OBO Relation Ontology
  • Foundational: is_a, part_of
  • Spatial: located_in, contained_in, adjacent_to
  • Temporal: transformation_of, derives_from,
    preceded_by
  • Participation: has_participant, has_agent

45
analogy with FrameNet
  • the constituent ontologies in the OBO Foundry are
    focused overwhelmingly on single nouns
  • the OBO Relation Ontology is designed to ensure a
    common structure of relations shared by all
    Foundry ontologies comparable to SpaceML,
    TimeML ...
  • need something like (Bio)FrameNet to pull the
    different levels of granularity together

46
CRITERIA
The OBO Foundry
  • Further criteria will be added over time in order
    to bring about a gradual improvement in the
    quality of the ontologies in the Foundry

47
GOALS
  • semantic alignment of OBO Foundry ontologies
    through a common system of formally defined
    relations
  • to enable reasoning both within and across
    ontologies, and thus also within and between the
    literature annotated in its terms
  • and thus also to support reasoning across
    associated data
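
A small Python sketch of what reasoning over formally defined relations can look like. It assumes, for the purposes of the example, that is_a and part_of are transitive and that any chain mixing the two yields part_of; the asserted facts and the function name are illustrative and are not taken from any Foundry ontology file.

```python
# Toy reasoner over two relations, is_a and part_of. Illustrative data only.

ASSERTED = {
    ("mitochondrial membrane", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("mitochondrion", "part_of", "cytoplasm"),
    ("cytoplasm", "part_of", "cell"),
}

def infer(assertions):
    """Return the asserted facts plus everything implied by chaining them."""
    facts = set(assertions)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (b2, r2, c) in list(facts):
                if b != b2:
                    continue
                # Assumed rules: is_a followed by is_a gives is_a;
                # any chain that includes part_of gives part_of.
                rel = "is_a" if r1 == r2 == "is_a" else "part_of"
                new_fact = (a, rel, c)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts
```

From four asserted facts, `infer(ASSERTED)` also yields, for example, that the mitochondrial membrane is part_of the cell. The same chaining works when the asserted facts come from different ontologies, provided they use the relations in the same sense.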

48
GOALS
  • to promote re-usability of data
  • if data-schemas are formulated using a single
    well-integrated framework for semantic annotation
    in widespread use, then this data will to this
    degree itself become more widely accessible and
    usable

49
GOALS
  • to help in creating better mappings e.g. between
    human and model organism phenotypes
  • S Zhang, O Bodenreider, Alignment of Multiple
    Ontologies of Anatomy: Deriving Indirect Mappings
    from Direct Mappings to a Reference Ontology,
    AMIA 2005
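
The idea named in the cited title, deriving an indirect mapping between two ontologies from their direct mappings to a shared reference ontology, can be sketched in Python as follows. This illustrates the general idea only and is not the method of the cited paper; the term identifiers, the sample mappings and the function name are invented.

```python
# Toy derivation of indirect mappings via a shared reference ontology.
# All term identifiers are hypothetical.
from collections import defaultdict

def indirect_mappings(a_to_ref, b_to_ref):
    """Pair each term of A with each term of B mapped to the same reference term."""
    ref_to_b = defaultdict(set)
    for b_term, ref_term in b_to_ref.items():
        ref_to_b[ref_term].add(b_term)
    pairs = set()
    for a_term, ref_term in a_to_ref.items():
        for b_term in ref_to_b.get(ref_term, ()):
            pairs.add((a_term, b_term))
    return pairs

HUMAN_TO_REF = {"human:heart": "ref:heart", "human:lung": "ref:lung"}
MOUSE_TO_REF = {"mouse:heart": "ref:heart", "mouse:tail": "ref:tail"}
```

With these inputs, `indirect_mappings(HUMAN_TO_REF, MOUSE_TO_REF)` pairs the two heart terms and nothing else, since no other pair shares a reference term.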

50
GOALS
  • to introduce the scientific method into the
    development of semantic annotation frameworks
  • to introduce some of the features of scientific
    peer review into biomedical ontology development

51
GOALS
  • to aid literature search
  • http://www.gopubmed.org/
  • to subvert the current policy of ad hoc creation
    of new annotation schemas by each clinical
    research group by providing a common shared
    framework

52
GOALS
  • to use the Foundry ontologies as benchmark for
    improving existing terminologies
  • to create controlled vocabularies for semantic
    annotation of clinical trial records, scientific
    journal articles, ...

53
GOALS
  • to create an evolving map-like computable
    representation of the entire domain of biomedical
    reality
  • to create the conditions for a step-by-step
    evolution towards high quality ontologies in the
    biomedical domain
  • which will serve as stable attractors for
    clinical and biomedical researchers in the future

54
GOALS
  • to end the terminology wars and to advance
    regimentation of clinical and other vocabularies
    in a scientific spirit

55
Conclusion 1
  • existing linguistic resources for semantic
    annotation are scattered to the four winds
  • → need for something like the OBO Library to
    ensure that the different available tools are
    available for comparison and alignment

56
Conclusion 2
  • linguists developing tools for semantic
    annotation with scientific purposes need
    something like the Foundry to ensure a complete
    set of interoperable tools which allow for
    additivity of annotations

57
the ideal
  • BioWordNet for single word forms
  • SpaceNet for spatial structures
  • TimeNet for temporal structures
  • ChemNet for chemical structures
  • CellNet for cellular structures
  • BioFrameNet for valencies/combination forms
