Title: Pax Terminologica
1Pax Terminologica
- Barry Smith
- Institute for Formal Ontology and Medical
Information Science (IFOMIS), Saarland University
/ University at Buffalo
2Overview
- systems for semantic annotation
- linguistics vs. science
- semantic annotation in biomedical informatics
- improving systems for semantic annotation
- conclusions
3The Penn Treebank Project
- annotates naturally occurring text for linguistic
structure, producing skeletal parses showing
syntactic and semantic information in tree form
4Automatic Content Extraction Program (ACE)
- develops text corpora in English, Chinese and
Arabic annotated for entities, the relations
among them and the events in which they
participate.
5High Accuracy Retrieval from Documents (HARD)
- creates corpora and annotations including topics,
metadata and relevance judgements
6Annotation Graph Toolkit (AGTK)
- formal framework for representing linguistic
annotations of time series data.
7TimeML
- robust specification language for markup of
natural language to support
- time stamping of events (identifying and
anchoring in time)
- ordering events with respect to one another
- reasoning about persistence
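A minimal sketch of what anchoring events in time and ordering them against one another could look like in code. It does not use TimeML syntax; the `Event` class, the event labels and the dates are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    label: str    # hypothetical event identifier
    anchor: date  # the time stamp the event is anchored to


def order_events(events):
    """Return event labels sorted by their anchoring time."""
    return [e.label for e in sorted(events, key=lambda e: e.anchor)]


def before(a, b):
    """True if event a is anchored strictly earlier than event b."""
    return a.anchor < b.anchor


# Invented example events, deliberately listed out of order.
events = [
    Event("discharge", date(2006, 3, 9)),
    Event("admission", date(2006, 3, 2)),
    Event("surgery", date(2006, 3, 4)),
]
timeline = order_events(events)
```

Once every event carries an anchor, the ordering relation between any two events follows from comparing the anchors.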
8SpaceML
- provides facilities for annotating
- category attributions to spatial regions
(self-connected, bounded, regular, etc.)
- ascription to regions of topological, distance,
morphological and orientation relations
- the definition of a region in terms of its
boundary.
9WordNet
- annotates English nouns, verbs, adjectives and
adverbs to synonym sets, each representing one
underlying lexical concept.
10FrameNet
- documents the range of semantic and syntactic
combinatory possibilities (valences) of each word
in each of its senses
11is there order in this chaos?
12ISO/TC 37 / SC 4 N 076
- Ide, N., Romary, L., de la Clergerie, E. (2003).
International Standard for a Linguistic
Annotation Framework.
- HLT-NAACL 2003 (Edmonton)
13OntoGloss (influenced by ISO Linguistic
Annotation Framework)
- an ontology based annotation tool that uses
predefined terms in an ontology to mark-up a
document
- No standard portal for semantic annotation
tools/projects (?)
14Purposes of semantic annotation
- information retrieval (incl. semantic indexing:
answering queries that use words not used in the
text, including words from other languages)
- automatic translation
- disambiguation
- topic extraction and text summarization
- information integration
- reasoning
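A minimal sketch of semantic indexing, assuming documents have already been annotated with concept identifiers. The lexicon, the concept IDs (`C001`, `C002`) and the document names are invented for illustration. The point is that a query word that never occurs in a text, or that comes from another language, can still retrieve that text through the shared concept.

```python
# Hypothetical lexicon mapping surface words (in any language) to concept IDs.
LEXICON = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "herzinfarkt": "C001",  # German
    "osteoblast": "C002",
}

# Hypothetical documents, each annotated with the concepts it mentions.
ANNOTATIONS = {
    "doc1": {"C001"},
    "doc2": {"C002"},
    "doc3": {"C001", "C002"},
}


def search(query):
    """Return the sorted IDs of documents annotated with the query's concept."""
    concept = LEXICON.get(query.lower())
    if concept is None:
        return []
    return sorted(doc for doc, concepts in ANNOTATIONS.items() if concept in concepts)


hits = search("Herzinfarkt")
```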
15for linguistics
- fiction no less important than fact
- English has no privileged status
- regimentation not allowed
- annotation frameworks may be competitive
- cross-framework consistency is not important
16for science
- factual discourse alone important
- English is language par excellence
- regimentation is allowed
- goal of truth: to create a single
computer-processable map of reality
- truth is one → must strive for consistency of
annotations and additivity of annotation
frameworks
17for science
- must end the terminology wars
- Plant Ontology (PO)
- cell (def.): structural and physiological unit of a
plant
- what should PO do when it needs to study bacteria
in plants?
- answer: all shall use the word "cell" to mean the
same thing!
- (all in biology)
18the ideal (of additivity)
- WordNet for single word forms
- FrameNet for valencies/combination forms
- SpaceNet for spatial structures
- TimeNet for temporal structures
- ChemNet for chemical structures
- CellNet for cellular structures
- etc.
19a scientific problem: huge swarms of biomedical
data at different granularities, from molecule to
clinic
- methods for data integration needed to enable
reasoning across data at multiple granularities
- (genomic medicine ...)
20orthodox solutions to this problem
- dumb statistical number-crunching
- or
- Semantic Web, Unified Medical Language System
(UMLS), Moby, etc.
- let a million flowers bloom
- and rely on mappings between already existing
controlled vocabularies/annotation systems
21an alternative solution
- use the peer-reviewed biomedical literature
- contains both textual descriptions of biological
functions (incl. diseases) and references to
entities represented in the biochemical databases
- use high-quality semantic annotations of the
former to integrate across the latter → the Gene
Ontology
24The methodology of annotations
- Model organism databases employ scientific
curators, who use the experimental observations
reported in the biomedical literature to link
gene products (such as proteins) with GO terms in
annotations.
25The process of annotations
- leads to improvements and extensions of the
ontology, which in turn leads to better
annotations
- a virtuous cycle of improvement in the quality
and reach of both future annotations and the
ontology itself,
- yielding a slowly growing computer-interpretable
map of biological reality within which major
databases are automatically integrated in
semantically searchable form
26need to extend GO by means of other ontologies,
e.g. Cell Ontology, via integrated definitions
- New definition (GO term defined using Cell type terms):
- Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest
cell acquires the specialized features of an
osteoblast, a bone-forming cell which secretes
extracellular matrix.
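A rough sketch of how an integrated definition could be assembled from cell type terms instead of being written as free text. The `CL:`-style keys are placeholders rather than real Cell Ontology identifiers, and the template wording simply follows the osteoblast example on this slide.

```python
# Placeholder cell type terms; the keys are NOT real Cell Ontology IDs.
CELL_TYPES = {
    "CL:osteoprogenitor": "osteoprogenitor cell",
    "CL:cranial_neural_crest": "cranial neural crest cell",
    "CL:osteoblast": "osteoblast",
}


def with_article(name):
    """Prefix a term name with 'a' or 'an'."""
    return ("an " if name[0].lower() in "aeiou" else "a ") + name


def differentiation_definition(source_ids, target_id):
    """Compose a process definition from source and target cell type terms."""
    sources = " or ".join(with_article(CELL_TYPES[s]) for s in source_ids)
    target = with_article(CELL_TYPES[target_id])
    return f"Processes whereby {sources} acquires the specialized features of {target}."


definition = differentiation_definition(
    ["CL:osteoprogenitor", "CL:cranial_neural_crest"], "CL:osteoblast"
)
```

Because the cell type names come from one shared table, a correction to a cell type term propagates to every definition that uses it.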
27need to extend GO also to semantic annotation of
clinical literature
unfortunately, available (UMLS) clinical
vocabularies are of variable quality and low
mutual consistency
28→ need for prospective standards to assure
consistency and high quality
- create rules for high-quality controlled
vocabularies for the annotation of scientific
literature
- make everyone follow these rules
- regimentation!
29first step
- a shared portal for (so far) 58 ontologies
- (low regimentation)
- http://obo.sourceforge.net
31Second step: The OBO Foundry
- http://obofoundry.org/
32The OBO Foundry
- scientific standards and principles-based
coordination of systems for semantic annotation
of biomedical literature to create a single
interoperable family of gold standard reference
ontologies
33- A subset of OBO ontologies, whose developers
have agreed in advance to accept a common set of
principles designed to ensure
- formal robustness
- stability
- compatibility
- interoperability
- support for logic-based reasoning
34- Custodians
- Michael Ashburner (Cambridge)
- Suzanna Lewis (Berkeley)
- Barry Smith (Buffalo/Saarbrücken)
35A prospective standard
- designed to guarantee interoperability of
ontologies from the very start
- established March 2006; already 13 OBO
ontologies have joined the Foundry and are being
correspondingly reformed; three new ontologies are
being constructed ab initio in its terms
36- Initial Candidate Members
- GO Gene Ontology
- CL Cell Ontology
- SO Sequence Ontology
- ChEBI Chemical Ontology
- PATO Phenotype (Quality) Ontology
- FuGO Functional Genomics Investigation Ontology
- FMA Foundational Model of Anatomy
- RO Relation Ontology
- ChEBI Chemical Entities of Biological Interest
- CARO Common Anatomy Reference Ontology
- PrO Protein Ontology
- RnaO RNA Ontology
37- Under development
- Disease Ontology
- Mammalian Phenotype Ontology
- OBO-UBO / Ontology of Biomedical Reality
- Organism (Species) Ontology
- Plant Trait Ontology
- Environment Ontology
- Behavior Ontology
- Biomedical Image Ontology
- Clinical Trial Ontology
39CRITERIA
The OBO Foundry
- The ontology is open and available to be used by
all.
- The ontology is in, or can be instantiated in, a
common formal language.
- The developers of the ontology agree in advance
to collaborate with developers of other OBO
Foundry ontologies where domains overlap.
40CRITERIA
- The developers of each ontology commit to its
maintenance in light of scientific advance, and
to soliciting community feedback for its
improvement.
- They commit to working with other Foundry members
to ensure that, for any particular domain, there
is community convergence on a single controlled
vocabulary
41CRITERIA
- The ontology possesses a unique identifier space
within OBO.
- The ontology provider has procedures for
identifying distinct successive versions.
- The ontology includes textual definitions for all
terms.
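Two of these criteria lend themselves to automatic checking. The sketch below is illustrative only: the `PREFIX:number` identifier shape, the function name `check_ontology` and the toy terms are assumptions, not an official Foundry tool or format.

```python
import re

# Assumed identifier shape "PREFIX:number"; not taken from an official spec.
ID_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def check_ontology(prefix, terms):
    """Return a list of problems; an empty list means both checks passed.

    terms maps each term identifier to its textual definition.
    """
    problems = []
    for term_id, definition in terms.items():
        match = ID_PATTERN.match(term_id)
        if match is None or match.group(1) != prefix:
            problems.append(f"{term_id}: identifier outside the {prefix} space")
        if not definition.strip():
            problems.append(f"{term_id}: missing textual definition")
    return problems


# Toy ontology with an invented prefix "XX".
toy_terms = {
    "XX:0000001": "A toy definition.",
    "XX:0000002": "",
    "YY:0000003": "Another toy definition.",
}
report = check_ontology("XX", toy_terms)
```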
42CRITERIA
- The ontology has a clearly specified and clearly
delineated content.
- The ontology is well-documented.
- The ontology has a plurality of independent users.
43CRITERIA
- The ontology uses relations which are
unambiguously defined following the pattern of
definitions laid down in the OBO Relation
Ontology.
- Genome Biology 2005, 6:R46
44OBO Relation Ontology
- Foundational: is_a, part_of
- Spatial: located_in, contained_in, adjacent_to
- Temporal: transformation_of, derives_from, preceded_by
- Participation: has_participant, has_agent
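Unambiguously defined relations are what make automatic reasoning possible. As a minimal sketch, the code below treats is_a and part_of as transitive and derives the assertions that follow by chaining them. The triples are invented examples and the function is an illustration, not part of the Relation Ontology itself.

```python
from collections import defaultdict

# Relations this sketch treats as transitive (an assumption of the example).
TRANSITIVE = {"is_a", "part_of"}


def closure(assertions, relation):
    """Return all (x, y) pairs implied by chaining a transitive relation."""
    if relation not in TRANSITIVE:
        raise ValueError(f"{relation} is not treated as transitive here")
    graph = defaultdict(set)
    for subject, rel, obj in assertions:
        if rel == relation:
            graph[subject].add(obj)
    implied = set()
    for start in list(graph):
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if (start, node) not in implied:
                implied.add((start, node))
                stack.extend(graph.get(node, ()))
    return implied


# Invented example assertions as (subject, relation, object) triples.
ASSERTIONS = [
    ("nucleus", "part_of", "cell"),
    ("cell", "part_of", "tissue"),
    ("nucleus", "adjacent_to", "cytoplasm"),
]
inferred = closure(ASSERTIONS, "part_of")
```

The pair (nucleus, tissue) is never asserted directly; it follows from the two part_of assertions, while the adjacent_to assertion is ignored because it is not chained.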
45analogy with FrameNet
- the constituent ontologies in the OBO Foundry are
focused overwhelmingly on single nouns
- the OBO Relation Ontology is designed to ensure a
common structure of relations shared by all
Foundry ontologies comparable to SpaceML,
TimeML ...
- need something like (Bio)FrameNet to pull the
different levels of granularity together
46CRITERIA
The OBO Foundry
- Further criteria will be added over time in order
to bring about a gradual improvement in the
quality of the ontologies in the Foundry
47GOALS
- semantic alignment of OBO Foundry ontologies
through a common system of formally defined
relations
- to enable reasoning both within and across
ontologies, and thus also within and between the
literature annotated in its terms
- and thus also to support reasoning across
associated data
48GOALS
- to promote re-usability of data
- if data-schemas are formulated using a single
well-integrated framework for semantic annotation
in widespread use, then this data will to this
degree itself become more widely accessible and
usable
49GOALS
- to help in creating better mappings e.g. between
human and model organism phenotypes
- S Zhang, O Bodenreider, Alignment of Multiple
Ontologies of Anatomy: Deriving Indirect Mappings
from Direct Mappings to a Reference Ontology,
AMIA 2005
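The basic idea of deriving indirect mappings through a reference ontology can be sketched as a join on the shared reference term. This is a simplified illustration and not the procedure of the cited paper; the ontology prefixes and term names are invented.

```python
def indirect_mappings(a_to_ref, b_to_ref):
    """Pair terms of ontology A and ontology B that map to the same reference term."""
    ref_to_b = {}
    for b_term, ref in b_to_ref.items():
        ref_to_b.setdefault(ref, []).append(b_term)
    return sorted(
        (a_term, b_term)
        for a_term, ref in a_to_ref.items()
        for b_term in ref_to_b.get(ref, [])
    )


# Hypothetical direct mappings from two anatomies to one reference ontology.
mouse_to_ref = {"mouse:heart": "REF:heart", "mouse:tail": "REF:tail"}
human_to_ref = {"human:heart": "REF:heart", "human:lung": "REF:lung"}

derived = indirect_mappings(mouse_to_ref, human_to_ref)
```

With n ontologies, each needs only one direct mapping to the reference ontology instead of a separate mapping to each of the other n - 1.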
50GOALS
- to introduce the scientific method into the
development of semantic annotation frameworks
- to introduce some of the features of scientific
peer review into biomedical ontology development
51GOALS
- to aid literature search
- http://www.gopubmed.org/
- to subvert the current policy of ad hoc creation
of new annotation schemas by each clinical
research group by providing a common shared
framework
52GOALS
- to use the Foundry ontologies as a benchmark for
improving existing terminologies
- to create controlled vocabularies for semantic
annotation of clinical trial records, scientific
journal articles, ...
53GOALS
- to create an evolving map-like computable
representation of the entire domain of biomedical
reality
- to create the conditions for a step-by-step
evolution towards high quality ontologies in the
biomedical domain
- which will serve as stable attractors for
clinical and biomedical researchers in the future
54GOALS
- to end the terminology wars and to advance
regimentation of clinical and other vocabularies
in a scientific spirit
55Conclusion 1
- existing linguistic resources for semantic
annotation are scattered to the four winds
- → need for something like the OBO Library to
ensure that the different available tools are
available for comparison and alignment
56Conclusion 2
- linguists developing tools for semantic
annotation with scientific purposes need
something like the Foundry to ensure a complete
set of interoperable tools which allow for
additivity of annotations
57the ideal
- BioWordNet for single word forms
- SpaceNet for spatial structures
- TimeNet for temporal structures
- ChemNet for chemical structures
- CellNet for cellular structures
- BioFrameNet for valencies/combination forms