Title: Ontologies in Biomedicine: The Good, The Bad and The Ugly
1 Ontologies in Biomedicine: The Good, The Bad
and The Ugly
- Barry Smith
- http://ontology.buffalo.edu/smith
2 The Good
- Foundational Model of Anatomy (FMA)
- Pro
- Very clear statement of scope: structural human
anatomy, at all levels of granularity, from the
whole organism to the biological macromolecule
- Powerful treatment of definitions, from which
the entire FMA hierarchy is generated; can serve
as basis for formal reasoning
- Con
- Some unfortunate artifacts in the ontology
deriving from its specific computer
representation (Protégé)
3 Intermediate
- GALEN
- Pro
- Allows formal representation of clinical
information
- Allows multiple views of relevant detail as
needed
- Uses powerful Description Logic (DL)-based formal
structure
- Con
- Remains only partially developed
- Contains errors: Vomitus contains carrot
- which DLs did not prevent
4 Intermediate
- The Gene Ontology
- Con
- Poor formal architecture
- Full of errors
- menopause part_of death
- Poor support for automatic reasoning and
error-checking
- Poor treatment of definitions
- Not trans-granular
- No relation to time or instances
5 The Gene Ontology
- Pro
- Open Source
- Cross-Species
- ... has recognized the need for reform,
including explicit representation of granular
levels
6 Problem of Circularity
- GO:0042270
- Protection from natural killer cell mediated
cytolysis
- Definition: The process of protecting a cell from
cytolysis by natural killer cells.
7 GO:0019836 hemolysis
- Definition: The processes that cause hemolysis
- X =def. the Y of X
- this is worse than circular
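A rough way to screen for this kind of circularity is to measure how many content words of a term reappear in its own definition. The sketch below is only an illustration: the function name term_overlap and the stopword list are assumptions introduced here, not part of GO or any GO tooling, and exact word matching will miss variants such as "protection" versus "protecting".

```python
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "from", "by", "that", "to", "in"}

def content_words(text):
    """Lower-cased alphabetic words of `text`, minus the stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def term_overlap(term, definition):
    """Fraction of the term's content words that recur in its definition."""
    term_words = content_words(term)
    if not term_words:
        return 0.0
    return len(term_words & content_words(definition)) / len(term_words)

# The two GO definitions quoted on the slides above.
hemolysis = term_overlap("hemolysis", "The processes that cause hemolysis")
cytolysis = term_overlap(
    "protection from natural killer cell mediated cytolysis",
    "The process of protecting a cell from cytolysis by natural killer cells.",
)
print(hemolysis, cytolysis)
```

A score of 1.0 means the definition repeats every content word of the term; a high score is a hint to review the definition by hand, not proof of circularity.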
8 The Bad
- Reactome
- Pro
- Rich catalogue of biological processes
- Con
- Incoherent treatment of categories
- ReferentEntity (embracing e.g. small molecules)
is a sibling of PhysicalEntity (embracing
complexes, molecules, ions and particles).
- Similarly CatalystActivity is a sibling of
Event.
9 The Bad
- National Cancer Institute Thesaurus
- Pro
- Open source; ambitiously broad coverage;
DL-based
- Con
- Poor realization of DL formalism
- Full of mistakes (many inherited from its UMLS
sources)
- three disjoint classes of plants: Vascular
Plant, Non-vascular Plant, Other Plant
- three disjoint kinds of cells: Cell, Normal
Cell, Abnormal Cell
- Normal Cell is_a Microanatomy
- See http://ontology.buffalo.edu/medo/NCIT_Smith.html
10 National Cancer Institute Thesaurus
- Duratec, Lactobutyrin and Stilbene Aldehyde
classified as Unclassified Drugs and Chemicals
- Pro
- NCIT, too, has recognized the need for reform
- (NCIT is part of the OBO library)
11 The Ugly: UMLS Semantic Network
- Pros
- Broad coverage; no multiple inheritance
- Cons
- Incoherent use of conceptual entities
- (e.g. the digestive system as a conceptual part
of the organism)
- Full of errors
12 UMLS Semantic Network
- Edges in the graph represent merely possible
significant relations
- Bacterium causes Experimental Model of Disease
- Experimental Model of Disease affects Fungus
- Experimental Model of Disease is_a Pathologic
Function
13 UMLS Semantic Network
- Unclear what the nodes of the graph are
- Drug Delivery Device contains Clinical Drug
- Drug Delivery Device narrower_in_meaning_than
Manufactured Object
- The use-mention confusion
- Swimming is healthy and has 8 letters
14 The Ugly: Clinical Terms Version 2 (The Read
Codes)
- Classifies chemicals into
- chemicals whose name begins with A,
- chemicals whose name begins with B,
- chemicals whose name begins with C, ...
15 The Astonishingly (Criminally?) Ugly
- Health Level 7
- HL7 is a UML-based standard for exchange of
information between clinical information systems
- has proved very crumbly as a standard
- The HL7 Reference Information Model (RIM) is
supposed to overcome this problem by defining the
universe of healthcare data in a rigorous way
16 HL7-RIM
- Animal
- Definition: A subtype of Living Subject
representing any animal-of-interest to the
Personnel Management domain.
- Person
- A subtype of Living Subject representing single
human being [sic] who, in the context of the
Personnel Management domain, must also be
uniquely identifiable through one or more legal
documents.
- LivingSubject
- Definition: A subtype of Entity representing an
organism or complex animal, alive or not.
17 HL7 RIM: The Problem of Circularity
- Person =def. Person with documents
- has the form: An A is an A which is B
- definitions of this form are useless in practical
terms, since neither we nor the machine can use
them to find out what A means
- they incorporate a vicious infinite regress
- they have the effect of making it impossible to
refer to As which are not Bs, for example to an
undocumented person
18 HL7: Logically Incoherent
- act =def. the record of an act
- This has the form: An X is the Y of an X
- again worse than circular
19 HL7-RIM: Logically Contradictory Definitions
- Definition of Act: An Act is an action of
interest that has happened, can happen, is
happening, is intended to happen, or is
requested/demanded to happen.
- Definition of Act: An Act is the record of
something that is being done, has been done, can
be done, or is intended or requested to be done.
20 HL7 RIM: Ontologically Incoherent
- The truth about the real world is constructed
through a combination and arbitration of
attributed statements ...
- As such, there is no distinction between an
activity and its documentation.
21 HL7: Incredibly Successful
- embraced as US federal standard
- central part of 15 billion program to integrate
all UK hospital information systems
- made mandatory by Canada Health Infoway
- adopted by Oracle as basis for its EHR support
programs
22 HL7 Merchandizing
23 From molecules to diseases
- A good ontology should enable us to organize our
information resources in such a way that we can
bridge the granularity gap between genomics and
proteomics data and phenotype (clinical,
pharmacological, patient-centered) data
24 Good ontologies require
- Coherent upper level taxonomy distinguishing
- continuants (cells, molecules, organisms ...)
- occurrents (events, processes)
- dependent entities (qualities, functions ...)
- independent entities (their bearers)
- universals (types, kinds)
- instances (tokens, individuals)
- Coherent relation ontology supporting inference
both within and between ontologies.
25 Good ontologies require
- Consistent use of terms, supported by logically
coherent (non-circular) definitions, in both
human-readable and computable formats
26 Open Biomedical Ontologies (OBO) Upper Biomedical
Ontology (UBO)
- root UBO:0000001 top
- subclass BFO:continuant continuant
- subclass BFO:dependent_entity dependent_entity
- subclass UBO:0000023 quality
- subclass UBO:0000026 phenotype
- subclass UBO:0000025 state
- subclass UBO:0000027 disease
- subclass UBO:0000005 function
- subclass GO:0003674 molecular_function
- subclass BFO:disposition disposition
- subclass BFO:independent_entity independent_entity
- subclass UBO:0000002 substance
- subclass UBO:0000019 protein
- subclass GO:0005575 cellular_component
- subclass UBO:0000006 anatomical_entity
- subclass UBO:0000008 gross_anatomical_entity
- subclass UBO:0000007 organism
- subclass UBO:0000015 microbe
- subclass UBO:0000014 plant
27 OBO Relation Ontology (RO)
- Clear distinction between universals (classes,
kinds, types) and instances (individuals, tokens)
- Precise formal definitions of relations
- Automatic applicability to time-indexed
instance-data, e.g. in Electronic Health Record
- Consistency with the Relation Ontology now a
criterion for admission to the OBO ontology
library
- see Genome Biology, Apr. 2006
28 Three types of relations
- between instances
- Mary's heart part_of Mary
- between an instance and a universal
- Mary instance_of Homo sapiens
- between universals
- gastrulation part_of embryonic development
29 A suite of primitive instance-level relations
- identical_to
- part_of
- located_in
- adjacent_to
- earlier
- derives_from
- ...
30 A suite of defined relations between universals
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
31 GALEN: Vomitus contains carrot
- All portions of vomit contain all portions of
carrot
- All portions of vomit contain some portion of
carrot
- Some portions of vomit contain some portion of
carrot
- Some portions of vomit contain all portions of
carrot
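The four candidate readings differ only in how the two quantifiers are combined, which a few lines of Python can make concrete. This is a minimal sketch over invented toy data: the instance names v1, v2, c1, c2 and the helper function names are assumptions made for illustration, not anything taken from GALEN.

```python
# Toy instance data (invented): portions of vomit, portions of carrot,
# and which portion of vomit contains which portion of carrot.
vomit = {"v1", "v2"}
carrot = {"c1", "c2"}
contains = {("v1", "c1")}

def all_all(xs, ys, rel):
    """Every x is related to every y."""
    return all((x, y) in rel for x in xs for y in ys)

def all_some(xs, ys, rel):
    """Every x is related to at least one y."""
    return all(any((x, y) in rel for y in ys) for x in xs)

def some_some(xs, ys, rel):
    """At least one x is related to at least one y."""
    return any((x, y) in rel for x in xs for y in ys)

def some_all(xs, ys, rel):
    """At least one x is related to every y."""
    return any(all((x, y) in rel for y in ys) for x in xs)

for reading in (all_all, all_some, some_some, some_all):
    print(reading.__name__, reading(vomit, carrot, contains))
```

On this toy data only the some-some reading comes out true, which illustrates why an unqualified assertion such as "Vomitus contains carrot" needs its quantifier structure fixed before a reasoner can check it.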
32 all-some structure
- A part_of B =def. given any instance a of A there
is some instance b of B such that a part_of b on
the instance level
- Allows automatic ontology integration via
cascading reasoning
- A R1 B
- B R2 C
- ? A R3 C
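The cascading pattern (from A R1 B and B R2 C, derive A R3 C) can be sketched as a lookup in a composition table followed by a closure computation. The table below covers only is_a and part_of under the all-some reading and is an assumption of this sketch, as are the names COMPOSE and cascade; it is not a complete statement of the rules of the Relation Ontology.

```python
# Assumed composition table: (R1, R2) -> R3, for relations between
# universals read in the all-some way.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

def cascade(assertions):
    """Close a set of (subject, relation, object) triples under COMPOSE."""
    facts = set(assertions)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (b2, r2, c) in list(facts):
                r3 = COMPOSE.get((r1, r2))
                # Chain only when the object of the first triple is the
                # subject of the second and the table has an entry.
                if b == b2 and r3 and (a, r3, c) not in facts:
                    facts.add((a, r3, c))
                    changed = True
    return facts

derived = cascade({("A", "is_a", "B"), ("B", "part_of", "C")})
print(("A", "part_of", "C") in derived)
```

Because every relation in the table is defined with the same all-some structure, assertions drawn from different ontologies can be chained in this way without inspecting the ontologies' internals.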
33 adjacent_to
- cell wall adjacent_to cytoplasm
- intron adjacent_to exon
- Golgi apparatus adjacent_to endoplasmic
reticulum
- periplasm adjacent_to plasma membrane
- presynaptic membrane adjacent_to synaptic cleft
34 A adjacent_to B
- every instance of A stands in the instance-level
adjacent_to relation to some instance of B
35 adjacent_to as a relation between universals is
not symmetric
- nucleus adjacent_to cytoplasm
- Not: cytoplasm adjacent_to nucleus
- seminal vesicle adjacent_to urinary bladder
- Not: urinary bladder adjacent_to seminal vesicle
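The asymmetry follows from the all-some definition even when the instance-level relation is symmetric. A minimal sketch, using invented toy instances (n1, cy1, cy2 are assumptions, with cy2 standing for the cytoplasm of a cell that has no nucleus):

```python
# Toy instance data (invented for illustration).
nucleus = {"n1"}
cytoplasm = {"cy1", "cy2"}  # cy2: cytoplasm of a cell with no nucleus (assumed)

# Instance-level adjacency, made symmetric by adding each reversed pair.
pairs = {("n1", "cy1")}
adjacent = pairs | {(b, a) for (a, b) in pairs}

def holds_all_some(xs, ys, rel):
    """Universal-level reading: every x is related to some y."""
    return all(any((x, y) in rel for y in ys) for x in xs)

forward = holds_all_some(nucleus, cytoplasm, adjacent)
backward = holds_all_some(cytoplasm, nucleus, adjacent)
print(forward, backward)
```

Every nucleus instance has an adjacent cytoplasm instance, so the forward assertion holds; cy2 has no adjacent nucleus, so the reverse assertion fails although the instance-level relation itself is symmetric.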
36 The Granularity Gulf
- most existing data-sources are of fixed, single
granularity
- many (all?) clinical phenomena cross
granularities
37 Main obstacle to integrating genetic and EHR data
No facility for dealing with time and instances
(particulars, individuals) in current ontologies
38 Key idea
- To define ontological relations like
- part_of, develops_from
- it is not enough to look just at universals /
classes / types / concepts
- we need also to take account of instances and time
39 transformation_of
- A transformation_of B
- =def. any instance of A was at some earlier time
an instance of B
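This definition can be checked directly against time-indexed instance data of the kind found in an EHR. The sketch below is a simplified illustration under stated assumptions: the record layout (individual, universal, time), the integer time stamps, the individuals and the function name are all invented here, and the check is closed-world, so a missing earlier record counts as a failure.

```python
# Time-indexed instance data: (individual, universal, time). Invented toy data.
COMPLETE = {
    ("mary", "child", 1), ("mary", "adult", 2),
    ("john", "child", 1), ("john", "adult", 3),
}
# Same data plus one adult with no earlier child record (hypothetical).
INCOMPLETE = COMPLETE | {("rover", "adult", 2)}

def transformation_of(a, b, facts):
    """A transformation_of B: each instance of A was earlier an instance of B."""
    a_instances = {(x, t) for (x, u, t) in facts if u == a}
    return all(
        any(y == x and u == b and t0 < t for (y, u, t0) in facts)
        for (x, t) in a_instances
    )

print(transformation_of("adult", "child", COMPLETE))
print(transformation_of("child", "adult", COMPLETE))
print(transformation_of("adult", "child", INCOMPLETE))
```

The first call succeeds, while the other two fail: no child was earlier an adult, and one adult in the second data set lacks an earlier child record. The same universal-level assertion can thus serve as a consistency check over instance data.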
40 transformation_of
- mature RNA transformation_of pre-RNA
- adult transformation_of child
- carcinomatous colon transformation_of colon
41 transformation_of relations cross both time and
granularity
42 Advantages of the methodology of enforcing
commonly accepted coherent definitions
- promotes quality assurance (better coding)
- guarantees automatic reasoning across ontologies
and across data at different granularities
- yields direct connection to times and instances
in the EHR