Ontologies in Biomedicine: The Good, The Bad and The Ugly

1
Ontologies in Biomedicine: The Good, The Bad and The Ugly
  • Barry Smith
  • http://ontology.buffalo.edu/smith

2
The Good
  • Foundational Model of Anatomy (FMA)
  • Pro
  • Very clear statement of scope: structural human
    anatomy, at all levels of granularity, from the
    whole organism to the biological macromolecule
  • Powerful treatment of definitions, from which
    the entire FMA hierarchy is generated; can serve
    as a basis for formal reasoning
  • Con
  • Some unfortunate artifacts in the ontology
    deriving from its specific computer
    representation (Protégé)

3
Intermediate
  • GALEN
  • Pro
  • Allows formal representation of clinical
    information
  • Allows multiple views of relevant detail as
    needed
  • Uses powerful Description Logic (DL)-based formal
    structure
  • Con
  • Remains only partially developed
  • Contains errors (e.g. Vomitus contains carrot)
    that DLs did not prevent

4
Intermediate
  • The Gene Ontology
  • Con
  • Poor formal architecture
  • Full of errors
  • menopause part_of death
  • Poor support for automatic reasoning and
    error-checking
  • Poor treatment of definitions
  • Not trans-granular
  • No relation to time or instances

5
The Gene Ontology
  • Pro
  • Open Source
  • Cross-Species
  • ... has recognized the need for reform,
    including explicit representation of granular
    levels

6
Problem of Circularity
  • GO:0042270
  • Protection from natural killer cell mediated
    cytolysis
  • Definition: The process of protecting a cell from
    cytolysis by natural killer cells.

7
GO:0019836 hemolysis
  • Definition: The processes that cause hemolysis
  • X =def. the Y of X
  • this is worse than circular
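
The circularity shown in slides 6 and 7 can be flagged mechanically, at least crudely. The sketch below is a hypothetical lexical heuristic, not part of any GO tooling: it stems the words of a term (dropping a plural "s" and truncating to five letters, an arbitrary assumption) and reports what fraction of them reappear in the term's own definition. The stopword list and the 0.8 threshold are likewise assumptions.

```python
import re

# Hypothetical stopword list; not taken from any GO tooling.
STOPWORDS = {"a", "an", "the", "of", "from", "by", "to", "that", "and", "in"}


def stems(text: str) -> set:
    """Lower-case content words, crudely stemmed: drop a plural 's', keep 5 letters."""
    out = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        out.add(word[:5])
    return out


def circularity_score(term: str, definition: str) -> float:
    """Fraction of the term's stems that reappear in its own definition."""
    term_stems = stems(term)
    if not term_stems:
        return 0.0
    return len(term_stems & stems(definition)) / len(term_stems)


def looks_circular(term: str, definition: str, threshold: float = 0.8) -> bool:
    """Flag a definition whose definiens mostly repeats the defined term."""
    return circularity_score(term, definition) >= threshold


print(looks_circular("hemolysis", "The processes that cause hemolysis"))  # True
print(looks_circular(
    "Protection from natural killer cell mediated cytolysis",
    "The process of protecting a cell from cytolysis by natural killer cells."))  # True
```

A lexical check like this only surfaces candidates for review; it cannot tell whether a definition is informative, and a circular definition worded with synonyms will slip through.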

8
The Bad
  • Reactome
  • Pro
  • Rich catalogue of biological processes
  • Con
  • Incoherent treatment of categories
  • ReferentEntity (embracing e.g. small molecules)
    is a sibling of PhysicalEntity (embracing
    complexes, molecules, ions and particles).
  • Similarly CatalystActivity is a sibling of
    Event.

9
The Bad
  • National Cancer Institute Thesaurus
  • Pro
  • Open source; ambitiously broad coverage;
    DL-based
  • Con
  • Poor realization of DL formalism
  • Full of mistakes (many inherited from its UMLS
    sources)
  • three disjoint classes of plants: Vascular
    Plant, Non-vascular Plant, Other Plant
  • three disjoint kinds of cells: Cell, Normal
    Cell, Abnormal Cell
  • Normal Cell is_a Microanatomy
  • See http://ontology.buffalo.edu/medo/NCIT_Smith.html

10
National Cancer Institute Thesaurus
  • Duratec, Lactobutyrin and Stilbene Aldehyde
    classified as Unclassified Drugs and Chemicals
  • Pro
  • NCIT, too, has recognized the need for reform
  • (NCIT is part of the OBO library)

11
The Ugly: UMLS Semantic Network
  • Pros
  • Broad coverage; no multiple inheritance
  • Cons
  • Incoherent use of conceptual entities
  • (e.g. the digestive system as a conceptual part
    of the organism)
  • Full of errors

12
UMLS Semantic Network
  • Edges in the graph represent merely possible
    significant relations
  • Bacterium causes Experimental Model of Disease
  • Experimental Model of Disease affects Fungus
  • Experimental model of disease is_a Pathologic
    Function

13
UMLS Semantic Network
  • Unclear what the nodes of the graph are
  • Drug Delivery Device contains Clinical Drug
  • Drug Delivery Device narrower_in_meaning_than
    Manufactured Object
  • The use-mention confusion
  • Swimming is healthy and has 8 letters

14
The Ugly: Clinical Terms Version 2 (The Read Codes)
  • Classifies chemicals into
  • chemicals whose name begins with A,
  • chemicals whose name begins with B,
  • chemicals whose name begins with C, ...

15
The Astonishingly (Criminally?) Ugly
  • Health Level 7
  • HL7 is a UML-based standard for exchange of
    information between clinical information systems
  • has proved very crumbly as a standard
  • The HL7 Reference Information Model (RIM) is
    supposed to overcome this problem by defining the
    universe of healthcare data in a rigorous way

16
HL7-RIM
  • Animal
  • Definition: A subtype of Living Subject
    representing any animal-of-interest to the
    Personnel Management domain.
  • Person
  • A subtype of Living Subject representing single
    human being [sic] who, in the context of the
    Personnel Management domain, must also be
    uniquely identifiable through one or more legal
    documents.
  • LivingSubject
  • Definition: A subtype of Entity representing an
    organism or complex animal, alive or not.

17
HL7 RIM: The Problem of Circularity
  • Person =def. Person with documents
  • Such definitions have the form: An A is an A
    which is B
  • they are useless in practical terms, since
    neither we nor the machine can use them to find
    out what A means
  • they incorporate a vicious infinite regress
  • they have the effect of making it impossible to
    refer to As which are not Bs, for example to an
    undocumented person
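
The regress in slide 17 can be made concrete by treating definitions as a graph in which each defined term points to the defined terms used in its definiens; a definition that never bottoms out then shows up as a cycle. The sketch below is a minimal, hypothetical illustration: the dependency map is hand-built from the slide's wording, and any term without an entry is treated as an undefined primitive.

```python
from typing import Dict, List, Optional, Set


def find_definitional_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search for a cycle in a 'term -> terms used in its definiens' graph.

    Terms with no entry in `deps` are treated as undefined primitives.
    Returns the cycle as a list of terms (first == last), or None.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {term: UNSEEN for term in deps}
    path: List[str] = []

    def visit(term: str) -> Optional[List[str]]:
        state[term] = ACTIVE
        path.append(term)
        for used in sorted(deps[term]):  # sorted for a deterministic result
            if state.get(used, DONE) == ACTIVE:  # back edge: the definition loops
                return path[path.index(used):] + [used]
            if state.get(used) == UNSEEN:
                cycle = visit(used)
                if cycle:
                    return cycle
        path.pop()
        state[term] = DONE
        return None

    for term in deps:
        if state[term] == UNSEEN:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Person =def. Person with documents: Person depends on itself.
print(find_definitional_cycle({"Person": {"Person", "document"}}))
# ['Person', 'Person']
```

The same check flags "act =def. the record of an act", since act again depends on act, while a chain of definitions that ends in primitives returns None.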

18
HL7: Logically Incoherent
  • act =def. the record of an act
  • This has the form: An X is the Y of an X
  • again worse than circular

19
HL7-RIM: Logically Contradictory Definitions
  • Definition of Act: An Act is an action of
    interest that has happened, can happen, is
    happening, is intended to happen, or is
    requested/demanded to happen.
  • Definition of Act: An Act is the record of
    something that is being done, has been done, can
    be done, or is intended or requested to be done.

20
HL7 RIM: Ontologically Incoherent
  • The truth about the real world is constructed
    through a combination and arbitration of
    attributed statements ...
  • As such, there is no distinction between an
    activity and its documentation.

21
HL7: Incredibly Successful
  • embraced as US federal standard
  • central part of 15 billion program to integrate
    all UK hospital information systems
  • made mandatory by Canada Health Infoway
  • adopted by Oracle as basis for its EHR support
    programs

22
HL7 Merchandizing
23
From molecules to diseases
  • A good ontology should enable us to organize our
    information resources in such a way that we can
    bridge the granularity gap between genomics and
    proteomics data and phenotype (clinical,
    pharmacological, patient-centered) data

24
Good ontologies require
  • Coherent upper level taxonomy distinguishing:
  • continuants (cells, molecules, organisms ...)
  • occurrents (events, processes)
  • dependent entities (qualities, functions ...)
  • independent entities (their bearers)
  • universals (types, kinds)
  • instances (tokens, individuals)
  • Coherent relation ontology supporting inference
    both within and between ontologies.
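
One way to see what the upper-level distinctions above buy is to encode them as types, so that mistakes such as a quality with no proper bearer are rejected. The sketch below is an analogy under stated assumptions, not BFO or any OBO artifact: Python classes stand in for universals, objects stand in for instances, and the class names Cell and Quality are illustrative.

```python
from dataclasses import dataclass

# Analogy only: Python classes play the role of universals, objects of instances.


class Entity:
    pass


class Occurrent(Entity):
    """Events and processes, which unfold in time."""


class Continuant(Entity):
    """Entities that endure through time."""


class IndependentContinuant(Continuant):
    """Bearers: cells, molecules, organisms ..."""


@dataclass
class DependentContinuant(Continuant):
    """Qualities, functions ...: exist only in some independent bearer."""

    bearer: IndependentContinuant

    def __post_init__(self) -> None:
        # Runtime check, since Python does not enforce type hints.
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError("a dependent entity needs an independent bearer")


class Cell(IndependentContinuant):
    pass


class Quality(DependentContinuant):
    pass


a_cell = Cell()                     # an instance of the universal Cell
its_shape = Quality(bearer=a_cell)  # a quality inhering in that cell
```

Passing an occurrent, or another quality, as the bearer raises TypeError, which mirrors the requirement that dependent entities inhere in independent ones.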

25
Good ontologies require
  • Consistent use of terms, supported by logically
    coherent (non-circular) definitions, in both
    human-readable and computable formats

26
Open Biomedical Ontologies (OBO) Upper Biomedical
Ontology (UBO)
  • root UBO:0000001 top
  • subclass BFO:continuant continuant
  • subclass BFO:dependent_entity dependent_entity
  • subclass UBO:0000023 quality
  • subclass UBO:0000026 phenotype
  • subclass UBO:0000025 state
  • subclass UBO:0000027 disease
  • subclass UBO:0000005 function
  • subclass GO:0003674 molecular_function
  • subclass BFO:disposition disposition
  • subclass BFO:independent_entity independent_entity
  • subclass UBO:0000002 substance
  • subclass UBO:0000019 protein
  • subclass GO:0005575 cellular_component
  • subclass UBO:0000006 anatomical_entity
  • subclass UBO:0000008 gross_anatomical_entity
  • subclass UBO:0000007 organism
  • subclass UBO:0000015 microbe
  • subclass UBO:0000014 plant

27
OBO Relation Ontology (RO)
  • Clear distinction between universals (classes,
    kinds, types) and instances (individuals, tokens)
  • Precise formal definitions of relations
  • Automatic applicability to time-indexed
    instance data, e.g. in the Electronic Health Record
  • Consistency with the Relation Ontology is now a
    criterion for admission to the OBO ontology
    library
  • See Genome Biology, Apr. 2006

28
Three types of relations
  • between instances
  • Mary's heart part_of Mary
  • between an instance and a universal
  • Mary instance_of Homo sapiens
  • between universals
  • gastrulation part_of embryonic development
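
The three kinds of relation above differ in what they connect, and a machine can enforce that. A minimal sketch, assuming a hand-written signature table (hypothetical, not the Relation Ontology's own encoding): an assertion is accepted only if its subject and object are of the kinds the relation expects. As on the slide, part_of appears twice, once between instances and once between universals.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Instance:
    name: str


@dataclass(frozen=True)
class Universal:
    name: str


Entity = Union[Instance, Universal]

# Hypothetical signature table: (relation, subject kind, object kind).
SIGNATURES = {
    ("part_of", Instance, Instance),       # Mary's heart part_of Mary
    ("instance_of", Instance, Universal),  # Mary instance_of Homo sapiens
    ("part_of", Universal, Universal),     # gastrulation part_of embryonic development
}


def well_typed(subject: Entity, relation: str, obj: Entity) -> bool:
    """True if the assertion connects the kinds of entity the relation expects."""
    return (relation, type(subject), type(obj)) in SIGNATURES


print(well_typed(Instance("Mary"), "instance_of", Universal("Homo sapiens")))  # True
print(well_typed(Universal("Homo sapiens"), "instance_of", Instance("Mary")))  # False
```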

29
A suite of primitive instance-level relations
  • identical_to
  • part_of
  • located_in
  • adjacent_to
  • earlier
  • derives_from
  • ...

30
A suite of defined relations between universals
  • Foundational: is_a, part_of
  • Spatial: located_in, contained_in, adjacent_to
  • Temporal: transformation_of, derives_from, preceded_by
  • Participation: has_participant, has_agent

31
GALEN: Vomitus contains carrot
  • All portions of vomit contain all portions of
    carrot
  • All portions of vomit contain some portion of
    carrot
  • Some portions of vomit contain some portion of
    carrot
  • Some portions of vomit contain all portions of
    carrot
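
The four readings above differ only in their quantifiers, which becomes obvious once they are evaluated against instance data. In the sketch below the portions v1, v2, c1, c2 and the contains facts are invented toy data, chosen so that only the some-some reading holds.

```python
from typing import Callable, Dict, Set, Tuple

Pairs = Set[Tuple[str, str]]


def all_all(As: Set[str], Bs: Set[str], rel: Pairs) -> bool:
    return all((a, b) in rel for a in As for b in Bs)


def all_some(As: Set[str], Bs: Set[str], rel: Pairs) -> bool:
    return all(any((a, b) in rel for b in Bs) for a in As)


def some_some(As: Set[str], Bs: Set[str], rel: Pairs) -> bool:
    return any((a, b) in rel for a in As for b in Bs)


def some_all(As: Set[str], Bs: Set[str], rel: Pairs) -> bool:
    return any(all((a, b) in rel for b in Bs) for a in As)


READINGS: Dict[str, Callable[[Set[str], Set[str], Pairs], bool]] = {
    "all-all": all_all,
    "all-some": all_some,
    "some-some": some_some,
    "some-all": some_all,
}

# Toy data (invented): two portions of vomit, two portions of carrot.
vomit = {"v1", "v2"}
carrot = {"c1", "c2"}
contains = {("v1", "c1")}  # only v1 contains any carrot, and only c1

results = {name: check(vomit, carrot, contains) for name, check in READINGS.items()}
print(results)
# {'all-all': False, 'all-some': False, 'some-some': True, 'some-all': False}
```

Note that all() over an empty set is vacuously true, so an all-some claim about a universal with no recorded instances passes; real data would need a separate check for that.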

32
  • all-some structure
  • A part_of B =def. given any instance a of A, there
    is some instance b of B such that a part_of b on
    the instance level
  • Allows automatic ontology integration via
    cascading reasoning
  • A R1 B
  • B R2 C
  • ? A R3 C
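
The all-some definition above can be checked directly against instance data, and it is what licenses the cascading step: if instance-level part_of is transitive, then A part_of B and B part_of C together yield A part_of C. A minimal sketch, with invented instance identifiers (nl1, n1, c1 ...) standing in for real data:

```python
from typing import Set, Tuple

Pairs = Set[Tuple[str, str]]


def transitive_closure(pairs: Pairs) -> Pairs:
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def universal_part_of(A: str, B: str, instance_of: Pairs, part_of: Pairs) -> bool:
    """A part_of B =def. every instance of A is part_of some instance of B."""
    a_insts = {x for x, u in instance_of if u == A}
    b_insts = {x for x, u in instance_of if u == B}
    return all(any((a, b) in part_of for b in b_insts) for a in a_insts)


# Invented instance data: one nucleolus, two nuclei, two cells.
instance_of = {("nl1", "nucleolus"), ("n1", "nucleus"), ("n2", "nucleus"),
               ("c1", "cell"), ("c2", "cell")}
part_of = transitive_closure({("nl1", "n1"), ("n1", "c1"), ("n2", "c2")})

# Universal-level assertions that the data supports ...
asserted = {(A, B) for A, B in [("nucleolus", "nucleus"), ("nucleus", "cell")]
            if universal_part_of(A, B, instance_of, part_of)}
# ... and the cascading inference: (nucleolus, cell) follows by transitivity.
inferred = transitive_closure(asserted)
print(("nucleolus", "cell") in inferred)                             # True
print(universal_part_of("nucleolus", "cell", instance_of, part_of))  # True
```

With a transitive instance-level part_of, the agreement between the last two lines holds in general, not only for this toy data.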

33
adjacent_to
  • cell wall adjacent_to cytoplasm
  • intron adjacent_to exon
  • Golgi apparatus adjacent_to endoplasmic
    reticulum
  • periplasm adjacent_to plasma membrane
  • presynaptic membrane adjacent_to synaptic cleft

34
A adjacent_to B
  • every instance of A stands in the instance-level
    adjacent_to relation to some instance of B

35
adjacent_to as a relation between universals is
not symmetric
  • nucleus adjacent_to cytoplasm
  • Not cytoplasm adjacent_to nucleus
  • seminal vesicle adjacent_to urinary bladder
  • Not urinary bladder adjacent_to seminal vesicle
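
The asymmetry in slide 35 follows from the all-some reading in slide 34, even though adjacent_to between instances is symmetric. A minimal sketch with invented instance data; the one substantive assumption is that some cytoplasm has no nucleus next to it, as in a mature mammalian red blood cell, which lacks a nucleus.

```python
from typing import Set, Tuple

Pairs = Set[Tuple[str, str]]


def all_some(A: str, B: str, instance_of: Pairs, rel: Pairs) -> bool:
    """A rel B =def. every instance of A stands in rel to some instance of B."""
    a_insts = {x for x, u in instance_of if u == A}
    b_insts = {x for x, u in instance_of if u == B}
    return all(any((a, b) in rel for b in b_insts) for a in a_insts)


def symmetric(pairs: Pairs) -> Pairs:
    """Instance-level adjacency holds in both directions."""
    return pairs | {(b, a) for a, b in pairs}


# Invented data: cytoplasm cy1 surrounds nucleus n1; cy2 belongs to a cell with no nucleus.
instance_of = {("n1", "nucleus"), ("cy1", "cytoplasm"), ("cy2", "cytoplasm")}
adjacent = symmetric({("n1", "cy1")})

print(all_some("nucleus", "cytoplasm", instance_of, adjacent))  # True
print(all_some("cytoplasm", "nucleus", instance_of, adjacent))  # False
```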

36
The Granularity Gulf
  • most existing data-sources are of fixed, single
    granularity
  • many (all?) clinical phenomena cross
    granularities

37
Main obstacle to integrating genetic and EHR data

No facility for dealing with time and instances
(particulars, individuals) in current ontologies
38
Key idea
  • To define ontological relations like
  • part_of, develops_from
  • it is not enough to look just at universals /
    classes / types / concepts
  • we need also to take account of instances and time

39
transformation_of
  • A transformation_of B =def. any instance of A
    was at some earlier time an instance of B
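
The definition above only makes sense against time-indexed instance data. A minimal sketch, assuming instance_of facts recorded as (instance, universal, time) triples with times as plain integers; the identifiers mary and rna1 are invented, and the universals are taken from the examples in slide 40.

```python
from typing import Set, Tuple

# (instance, universal, time): "instance was an instance of universal at time"
Facts = Set[Tuple[str, str, int]]


def transformation_of(A: str, B: str, facts: Facts) -> bool:
    """A transformation_of B =def. any instance of A was at some earlier time an instance of B."""
    return all(
        any(y == x and u == B and earlier < t for (y, u, earlier) in facts)
        for (x, v, t) in facts
        if v == A
    )


# Invented, time-indexed instance data.
facts = {
    ("mary", "child", 2000),
    ("mary", "adult", 2020),
    ("rna1", "pre-RNA", 1),
    ("rna1", "mature RNA", 2),
}

print(transformation_of("adult", "child", facts))         # True
print(transformation_of("mature RNA", "pre-RNA", facts))  # True
print(transformation_of("child", "adult", facts))         # False
```

Because the check runs over every recorded time at which something is an A, a later record of mary as an adult leaves the result unchanged, while an adult with no earlier record as a child would make it fail.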

40
transformation_of
  • mature RNA transformation_of pre-RNA
  • adult transformation_of child
  • carcinomatous colon transformation_of colon

41
transformation_of relations cross both time and
granularity
42
Advantages of the methodology of enforcing
commonly accepted coherent definitions
  • promote quality assurance (better coding)
  • guarantee automatic reasoning across ontologies
    and across data at different granularities
  • yield a direct connection to times and instances
    in the EHR