Medical Ontologies: An Overview

Medical Ontologies: An Overview. Barry Smith, http://ifomis.de, January 2004

Slides: 86
Transcript and Presenter's Notes


1
Medical Ontologies An Overview
  • Barry Smith
  • http://ifomis.de
  • January 2004

2
IFOMIS
  • Institute for Formal Ontology and
  • Medical Information Science
  • Faculty of Medicine
  • University of Leipzig

3
Partners
  • Laboratory for Applied Ontology, Trento and Rome
  • Language & Computing nv, Zonnegem, Belgium
  • Ontology Works, Baltimore
  • Structural Informatics Group, Department of
    Biological Structure, University of Washington,
    Seattle, USA
  • Cognitive Science Laboratory, Princeton University

4
Three levels of ontology
  • 1) formal (top-level) ontology dealing with
    categories employed in every domain
  • object, event, whole, part, instance, class
  • 2) domain ontology, applies top-level system to
    a particular domain
  • cell, gene, drug, disease, therapy
  • 3) terminology-based ontology
  • large, lower-level system
  • Dupuytren's disease of palm, nodules with no
    contracture

7
IFOMIS
  • Institute for Formal Ontology and Medical
    Information Science
  • Leipzig
  • http://ifomis.de
  • philosophers and medical informaticians
    attempting to build and test a Basic Formal
    Ontology for applications in biomedical and
    related domains

8
IFOMIS
  • use basic principles of philosophical ontology
  • for quality assurance and alignment of
    biomedical ontologies

9
Compare
  1. pure mathematics (theories of structures such as
    order, set, function, mapping) employed in every
    domain
  2. applied mathematics, applications of these
    theories re-using the same definitions,
    theorems, proofs in new application domains
  3. physical chemistry, biophysics, etc. adding
    detail

10
Three levels of ontology
?????
  • 1) formal (top-level) ontology
  • medical ontology has nothing like the technology
    of definitions, theorems and proofs provided by
    pure mathematics
  • 2) domain ontology
  • UMLS Semantic Network, GALEN CORE
  • 3) terminology-based ontology
  • UMLS, SNOMED-CT, GALEN, FMA

11
Strategy
  • Part 1: Provide an overview of medical ontologies
    and of the top-level ontologies which they
    implicitly define
  • Part 2: Show how principles of classification and
    definition derived from top-level ontology can
    help in quality assurance of terminology-based
    ontologies and in ontology alignment
  • Part 3: The Gene Ontology
  • Part 4: Medical Fact Net

13
UMLS Semantic Network
  • top level: entity, event
  • entity: physical object, conceptual entity

15
conceptual entity
  • Organism Attribute
  • Finding
  • Idea or Concept
  • Occupation or Discipline
  • Organization
  • Group
  • Group Attribute
  • Intellectual Product
  • Language

16

  • conceptual entity
  • idea or concept
  • functional concept
  • body system

17
  • entity
  • physical object, conceptual entity
  • idea or concept
  • functional concept
  • body system

confusion of entity and concept
18
Functional Concept
  • Body system is_a Functional Concept.
  • but
  • Concepts do not perform functions or have
    physical parts.

19
This
is not a concept
20
The Hydraulic Equation
  • BP = CO × PVR
  • arterial blood pressure is directly proportional
    to the product of blood flow (cardiac output, CO)
    and peripheral vascular resistance (PVR)

21
Confusion of Ontology and Epistemology
  • blood pressure is an Organism Function,
  • cardiac output is a Laboratory or Test Result or
    Diagnostic Procedure
  • BP = CO × PVR thus asserts that
  • blood pressure is proportional either to a
    laboratory or test result or to a diagnostic
    procedure

22
entities
  • independent continuants: ORGANISMS, CELLS,
    MOLECULES
  • dependent continuants: ROLES, FUNCTIONS,
    CONDITIONS (diseases)
  • occurrents (always dependent): PROCESSES,
    HISTORIES, LIVES (courses of diseases)

23
entities
  • independent continuants: ORGANISMS, CELLS,
    MOLECULES
  • dependent continuants: ROLES, FUNCTIONS,
    CONDITIONS (diseases)
  • occurrents (always dependent): PROCESSES,
    HISTORIES, LIVES (courses of diseases)

classes
instances
24
A three-category ontology along these lines is
accepted by
  • DOLCE: first module of the Semantic Web WonderWeb
    Foundational Ontologies Library
  • BFO: IFOMIS Basic Formal Ontology
  • L&C LinKBase
  • UMLS-SN
  • Gene Ontology

26
Principles for Building Medical Ontologies
27
Examples
  • Don't confuse entities with concepts
  • Don't confuse domain entities with logical or
    computational structures
  • Don't confuse ontology with epistemology
  • Don't confuse is_a with has_role

28
Further Principles
  • univocity: terms should have the same meanings
    (and thus point to the same referents) on every
    occasion of use
  • UMLS-SN:
  • organization = body plan
  • organization = social organization

29
univocity
  • Gene Ontology:
  • part_of = "can be part of" (flagellum part_of
    cell)
  • part_of = "is sometimes part of" (replication
    fork part_of the nucleoplasm)
  • part_of = "is included as a sublist in"

30
Don't forget instances
  • part_of as a relation between classes
  • vs. part as a relation between instances
  • A part_of B:
  • every instance of A is part of some instance of B
  • every instance of B has some instance of A as part
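
The two instance-level readings of class-level part_of listed on this slide can be checked mechanically. The following is a minimal sketch only: the instance names, the `instance_part_of` pairs and the `instances_of` table are hypothetical toy data, not content taken from GO or any other ontology.

```python
# Hypothetical instance-level facts: each pair (x, y) says "x is part of y".
instance_part_of = {
    ("flagellum_1", "cell_1"),
    ("membrane_1", "cell_1"),
    ("membrane_2", "cell_2"),
}

# Hypothetical class extensions: which instances belong to which class.
instances_of = {
    "flagellum": {"flagellum_1"},
    "membrane": {"membrane_1", "membrane_2"},
    "cell": {"cell_1", "cell_2"},
}


def every_a_in_some_b(a, b):
    """Reading 1: every instance of A is part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instances_of[b])
        for x in instances_of[a]
    )


def every_b_has_some_a(a, b):
    """Reading 2: every instance of B has some instance of A as part."""
    return all(
        any((x, y) in instance_part_of for x in instances_of[a])
        for y in instances_of[b]
    )


print(every_a_in_some_b("flagellum", "cell"))   # True
print(every_b_has_some_a("flagellum", "cell"))  # False: cell_2 has no flagellum
print(every_b_has_some_a("membrane", "cell"))   # True
```

On this toy data "flagellum part_of cell" holds under the first reading but fails under the second, which is why the two readings need to be kept apart.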

31
Part_of as a relation between classes is more
problematic than is standardly supposed
  • testis part_of human being ?
  • heart part_of human being ?

32
objectivity
  • which classes exist is not a function of our
    biological knowledge.
  • (Terms such as "unknown" or "unclassified" or
    "unlocalized" do not designate biological natural
    kinds.)
  • GO:
  • aminoadipate-semialdehyde dehydrogenase complex
    is_a unlocalized

33
rules for definitions
  • intelligibility: the terms used in a definition
    should be simpler (more intelligible) than the
    term to be defined
  • definitions: do not confuse definitions with the
    communication of new knowledge

34
substitutability
  • in all so-called extensional contexts a defined
    term should be substitutable by its definition in
    such a way that the result is both grammatically
    correct and has the same truth-value as the
    sentence with which we begin
  • GO:0015070 toxin activity
  • Definition: "Acts as to cause injury to other
    living organisms."

35
substitutability
  • "There is toxin activity here"
  • "There is acts as to cause injury to other living
    organisms here"
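
The substitution test on this slide can be reproduced with a few lines of code. This is a sketch under stated assumptions: `substitute` is a hypothetical helper, and it performs only the textual replacement. Judging whether the result is grammatical and truth-preserving is still left to the reader.

```python
def substitute(sentence: str, term: str, definition: str) -> str:
    """Replace a defined term by its definition.

    The definition's final period is dropped and its first letter is
    lower-cased so that it can sit in the middle of a sentence.
    """
    phrase = definition.rstrip(".")
    phrase = phrase[0].lower() + phrase[1:]
    return sentence.replace(term, phrase)


sentence = "There is toxin activity here"
definition = "Acts as to cause injury to other living organisms."

print(substitute(sentence, "toxin activity", definition))
# There is acts as to cause injury to other living organisms here
```

The output is ungrammatical, so the definition fails the substitutability rule.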

37
GO: the Gene Ontology
  • 3 large telephone directories of standardized
    designations for gene functions and products
  • organized into hierarchies via is_a and part_of

38
GO
  • can in practice be used only by trained
    biologists (with know-how)
  • whether a GO-term truly stands in the is_a
    relation depends e.g. on the type of organism
    involved
  • glycosome is part-of cytoplasm only for
    Kinetoplastidae
  • Computers have no counterpart of such
    context-dependent know-how

39
GO divided into three disjoint term hierarchies
  • the cellular component ontology,
  • e.g. flagellum, chromosome, cell
  • the molecular function ontology,
  • e.g. ice nucleation, binding, protein
    stabilization
  • the biological process ontology,
  • e.g. glycolysis, death

40
Primary aim of GO
  • not rigorous definition and principled
    classification
  • but rather providing a practically useful
    framework for keeping track of the biological
    annotations that are applied to gene products

41
Thesis 1
  • With increasing size, GO will be required to
    increase the degree to which it is a controlled
    vocabulary which satisfies not merely the needs
    of human biologists but also the needs of
    automatic consistency-checking and updating
    systems

42
Thesis 2
  • GO can realize its goal more adequately (and
    avoid many coding errors) by taking ontology
    (especially the logic of classifications and
    definitions) seriously

43
GO: the Gene Ontology
  • GO divided into 3 separate hierarchies each
    organized via is_a and part_of

44
Problems with is_a
  • A is_a B: every instance of A is an instance of B

45
Problems with is_a
  • Holliday junction helicase complex is_a
    unlocalized
  • protein storage vacuole is_a vacuole (sensu
    Streptophyta)
  • R7 differentiation is_a eye photoreceptor
    differentiation (sensu Drosophila).
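
One kind of problematic is_a link, a parent term that records a gap in knowledge rather than a biological kind, is easy to flag automatically. The sketch below is an illustration only: `PLACEHOLDER_TERMS` and `placeholder_parents` are hypothetical names, and the link list contains just the two examples from this slide, not a GO export.

```python
# Terms that describe the state of our knowledge, not a biological kind.
PLACEHOLDER_TERMS = {"unknown", "unclassified", "unlocalized"}

# (child, parent) pairs standing for "child is_a parent".
is_a_links = [
    ("Holliday junction helicase complex", "unlocalized"),
    ("protein storage vacuole", "vacuole (sensu Streptophyta)"),
]


def placeholder_parents(links):
    """Return the is_a links whose parent is a placeholder term."""
    return [
        (child, parent)
        for child, parent in links
        if parent.lower() in PLACEHOLDER_TERMS
    ]


for child, parent in placeholder_parents(is_a_links):
    print(f"suspicious: {child} is_a {parent}")
# suspicious: Holliday junction helicase complex is_a unlocalized
```

A check like this catches only the placeholder case. The sensu examples need organism-specific knowledge that a string match cannot supply.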

46
Uses of part_of
  • "membrane part-of cell", intended to mean "a
    membrane is a part-of any cell"
  • "flagellum part-of cell", intended to mean "a
    flagellum is part-of some cells"
  • "replication fork part-of cell cycle", intended
    to mean "a replication fork is part-of the
    nucleoplasm only during certain times of the cell
    cycle"
  • "regulation of sleep part-of sleep", should be
    corrected to "regulation of sleep is co-located
    with and is causally involved with the sleep
    process".

47
Problems with part_of
  • part_of = "can be part of" (flagellum part_of
    cell)
  • part_of = "is sometimes part of" (replication
    fork part_of the nucleoplasm)
  • part_of = "is included as a sublist in"

48
Problems with GO Molecular Functions
  • anti-coagulant activity (defined as a
    substance that retards or prevents coagulation)
  • enzyme activity (defined as a substance that
    catalyzes)
  • structural molecule (defined as the action of
    a molecule that contributes to structural
    integrity)

49
GO:0005199 structural constituent of cell wall
  • Definition: "The action of a molecule that
    contributes to the structural integrity of a cell
    wall."
  • confuses actions, which GO includes in its
    function ontology, with constituents, which GO
    includes in its cellular component ontology

50
  • extracellular matrix structural constituent
  • puparial glue (sensu Diptera)
  • structural constituent of bone
  • structural constituent of chorion (sensu Insecta)
  • structural constituent of chromatin
  • structural constituent of cuticle
  • structural constituent of cytoskeleton
  • structural constituent of epidermis
  • structural constituent of eye lens
  • structural constituent of muscle
  • structural constituent of myelin sheath
  • structural constituent of nuclear pore
  • structural constituent of peritrophic membrane
    (sensu Insecta)
  • structural constituent of ribosome
  • structural constituent of tooth enamel
  • structural constituent of vitelline membrane
    (sensu Insecta)

51
Why do these problems arise?
  • Because GO has no clear formal understanding of
    the role of temporal relations in organizing an
    ontology
  • (thus also no clear understanding of the
    difference between a function and the activity
    which is the realization of a function; GO runs
    these two together)

52
As GO increases in size and scope
  • it will be increasingly difficult to maintain
    the semantic consistency we desire without
    software tools that perform consistency checks
    and controlled updates.
  • The addition of each new term will require the
    curator to understand the entire structure of GO
    in order to avoid redundancy and to ensure that
    all appropriate linkages are made with other
    terms.

53
Problems with GO's compositionality
sensu, /, with, from, in, resulting, regulating, regulation of, complex, constituting, constitution
54
/
  • GO:0008608 microtubule/kinetochore interaction
  • Definition: Physical interaction between
    microtubules and chromatin via proteins making up
    the kinetochore complex.
  • GO:0001539 ciliary/flagellar motility
  • Definition: Locomotion due to movement of cilia or
    flagella.

55
/
  • GO:0045798 negative regulation of chromatin
    assembly/disassembly
  • Definition: Any process that stops, prevents or
    reduces the rate of chromatin assembly and/or
    disassembly
  • GO:0000082 G1/S transition of mitotic cell cycle
  • Definition: Progression from G1 phase to S phase
    of the standard mitotic cell cycle.

56
/
  • GO:0001559 interpretation of nuclear/cytoplasmic
    to regulate cell growth
  • Definition: The process where the size of the
    nucleus with respect to its cytoplasm signals the
    cell to grow or stop growing.

57
/
  • GO:0015539 hexuronate (glucuronate/galacturonate)
    porter activity
  • Definition: Catalysis of the reaction
    hexuronate(out) + cation(out) = hexuronate(in) +
    cation(in)

58
Problems with GO's consistency
  • GO:0030430 host cell cytoplasm part-of GO:0018995
    host
  • host cell cytoplasm, Definition: The cytoplasm of
    a host cell.
  • host, Definition: Any organism in which another
    organism, especially a parasite or symbiont,
    spends part or all of its life cycle and from
    which it obtains nourishment and/or protection.

59
Cellular Component
  • Another problem with "host":
  • It is not a cellular component (and not a
    molecular function, and not a biological process,
    either)
  • GO has "adult walking behavior"
  • but not "adult" or "walking"
  • GO has "eye pigmentation" but not "eye"
60
Solution
  • Link GO to external ontologies
  • of organism types (to solve the sensu problem)
  • of anatomy (to solve the eye problem)
  • of coarse medical reality (to solve the adult
    walking behavior problem; see MFN below)

61
note that such linkages are possible
  • only if GO itself has a coherent formal
    architecture

63
Medical Fact Net
  • Medical Belief Net (MBN):
  • large, heterogeneous, open-source corpus of
    medical sentences in the English language
    expressed in the form of grammatically complete
    statements and assessed by the degree to which
    they are understandable and assented to by
    typical non-expert human subjects.
  • Medical Fact Net (MFN): subclass of MBN
    receiving high marks on the scale of correctness
    from medical experts
  • MFN = intersection of non-expert beliefs about
    medical phenomena and truths validated by medical
    experts.

64
Medical Word Net
  • lexical database extending the Princeton
    WordNet by all the medical terms encountered in
    MBN
  • First in (US) English
  • Then in German
  • First for adults, then for children
  • First for medicine, then for

65
MBN/MFN/MWN Formal Architecture
  • Semi-automatically generated graph-based parsing
    of each sentence
  • formal ontology of all MFN entities and
    relationships
  • mapping into the UMLS Metathesaurus.

66
Evaluation
  • MFN will be integrated into an existing
    term-search-based on-line consumer health portal
    in such a way that MFN sentences are used
    to direct users to information sources. We will
    then measure the degree to which this results in
    greater user satisfaction by setting up an
    experiment in which customers of the portal are
    randomly assigned to one of two groups: one to
    which access to MFN is offered, and another for
    which simple term-searching is used.

67
Significance
  • Non-expert language of family members, advisors,
    administrators, nurses, paramedics, lawyers
  • Research on differences between everyday language
    and technical language

68
Mismatches in Doctor-Patient Communication
  • Question Text: My seven-year-old son developed a
    rash today that I believe to be chickenpox. My
    concern is that a friend of mine had her
    10-day-old baby at my home last evening before we
    were aware of the illness. Is there cause for
    concern at this point?
  • Answer Text: Chickenpox is the common name for
    varicella infection. ... You are correct in
    that a person with chickenpox can be contagious
    for 48 hours before the first vesicle is seen.
    ...

69
Non-Expert Language in Online Communication
  • Need to integrate free text and structured data.
  • E-health services need automatic ways to respond
    to questions in standard forms, and to provide
    internet-accessible medical knowledge that is
    both reliable and accessible to the non-expert.

70
Diagnostic decision support
  • we might associate collections of utterances
    stored in MBN describing symptoms sourced to
    single patients with metadata recording
    subsequent diagnosis. Trained on this corpus, the
    system could establish patterns of association
    between specific sequences of utterances and
    specific diseases; one could then test whether
    such associations are strong enough to produce
    usable automatic diagnosis on the basis of
    patient inputs.

71
Medical education/medical literacy
  • Use MBN to evaluate the reliability of the
    medical knowledge of different non-expert
    communities.
  • Use MFN to develop tools to support face-to-face
    education of lay people in the fields of medicine
    and health care
  • MBN provides opportunities for a new type of
    research in the field of consumer health.
  • e.g. on basic kinds in the medical domain à la
    Eleanor Rosch

72
Medical Coverage in WordNet 2.0
  • WordNet's coverage of domains like medicine,
    physics, and geology is very limited.
  • coverage of medical terms represents a mixture
    of folk and expert vocabulary.

73
MFN: From Words to Facts
  • Do for (non-expert) medicine what Beilstein's Fact
    Database does for (expert) Biochemistry
  • Relation to CYC
  • Relation to FrameNet
  • Botany Knowledge Base
  • DARPA's Rapid Knowledge Formation project.

74
Sources
  • Lexical knowledge bases, such as
  • a. the relevant general lexical information
    contained in WordNet
  • b. lexical knowledge-bases of lay medical
    vocabulary
  • c. medical dictionaries and large medical
    terminology and ontology systems such as the UMLS
    Specialist Lexicon, the Foundational Model of
    Anatomy
  • Statement or fact knowledge bases, such as
  • d. open-source linguistic corpora, public health
    documents, internet resources
  • e. the relevant example sentences in the FrameNet
    and WordNet corpora
  • f. free text sources
  • g. the results of transforming the content of
    lexical knowledge bases (especially WordNet) into
    statements

75
Generation from lexical databases
  • treat a database like WordNet or LinKBase as a
    set of links tLt' between terms (where L ranges
    over 'is-a', 'part-of', 'is-caused-by', etc.).
  • We form the subset of this set by restricting
    the values of t and t' to those terms which occur
    in MWN
  • Some members of the resulting class of tLt'
    formulas can then be transformed into English
    sentences automatically. For example, each "t is-a
    t'" formula can be transformed into a sentence of
    the form "a t is a type of t'"
  • Other tLt' formulas can be converted by hand into
    English sentences, for example "forearm
    HAS-PARTIAL-MATERIAL-OVERLAP wrist" can be
    transformed into "the forearm overlaps with the
    wrist" and "the wrist overlaps with the forearm".
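
The generation steps on this slide can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the `links` triples, the `mwn_terms` set and the helper names are hypothetical, and only the is-a template ("a t is a type of t'") comes from the slide itself.

```python
# Hypothetical tLt' links as (t, L, t') triples.
links = [
    ("wrist", "is-a", "joint"),
    ("forearm", "HAS-PARTIAL-MATERIAL-OVERLAP", "wrist"),
    ("kinase", "is-a", "enzyme"),
]

# Hypothetical set of terms occurring in MWN.
mwn_terms = {"wrist", "joint", "forearm"}


def restrict(all_links, terms):
    """Keep only the links whose two terms both occur in the term set."""
    return [(t, rel, t2) for t, rel, t2 in all_links if t in terms and t2 in terms]


def to_sentence(link):
    """Turn an is-a link into a sentence; return None for other relations."""
    t, rel, t2 = link
    if rel == "is-a":
        return f"a {t} is a type of {t2}"
    return None  # left for conversion by hand


automatic, by_hand = [], []
for link in restrict(links, mwn_terms):
    sentence = to_sentence(link)
    if sentence is None:
        by_hand.append(link)
    else:
        automatic.append(sentence)

print(automatic)  # ['a wrist is a type of joint']
print(by_hand)    # [('forearm', 'HAS-PARTIAL-MATERIAL-OVERLAP', 'wrist')]
```

The kinase link is dropped by the restriction step because neither of its terms is in the assumed MWN term set.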

76
Problems to be Addressed
  • generic medical knowledge of (non-expert) adults

77
Genericity
  • Much generic medical knowledge relates to what
    holds for the most part or in most cases or in a
    statistically significant fraction of cases
    (consider "smoking causes cancer").

78
Medical knowledge
  • is intertwined with knowledge of other domains
  • (things that can be involved in an accident )

79
Knowledge
  • Much medical knowledge of experts and
    non-experts alike takes the form of knowledge of
    specific cases ("Aunt Mary's arthritis is always
    worse in the winter").
  • MFN should be a repository of medical knowledge
    that is generic and context-independent, the
    counterpart of the theoretical knowledge of the
    sciences.
  • Note that lexical knowledge of the sort stored
    in WordNet, too, is both generic and
    context-independent.

80
Expertise
  • a crisp separation of expert and non-expert
    sentences is impossible.
  • Viagra, anthrax, HIV, Prozac, SARS
  • → experimental design needed to avoid artifacts

81
Completeness
  • Problem: elementary facts ("People have two
    eyes." "Babies are born." "Arms move.")
  • WordNet contains some coverage particularly of
    elementary facts of the A is type/part of B form
    in virtue of their specific formal architectures
  • WordNet synsets can be used to generate long
    lists of elementary facts from single starting
    points

82
Six
  • Transform MWN into a large corpus of generic
    beliefs by turning WordNet on its side; that is,
    we transform a relation such as t1, ..., tn IS-A
    t'1, ..., t'm into n × m sentences of the form
    ti IS-A t'k
  • and impose filters
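
The n × m expansion can be written as a Cartesian product over two synsets. This is a sketch under stated assumptions: the synsets and the `lay_terms` filter are hypothetical examples. The slide does not say which filters are intended, so the one shown is only a placeholder for that step.

```python
from itertools import product


def expand(hyponyms, hypernyms):
    """Turn one synset-level IS-A relation into n x m term-level sentences."""
    return [f"{t} IS-A {u}" for t, u in product(hyponyms, hypernyms)]


# Hypothetical synsets: n = 2 hyponym terms, m = 2 hypernym terms.
hyponyms = ["chickenpox", "varicella"]
hypernyms = ["infection", "disease"]

sentences = expand(hyponyms, hypernyms)
print(len(sentences))  # 4

# Placeholder filter (an assumption): keep sentences built from lay terms only.
lay_terms = {"chickenpox", "infection", "disease"}
filtered = [
    f"{t} IS-A {u}"
    for t, u in product(hyponyms, hypernyms)
    if t in lay_terms and u in lay_terms
]
print(filtered)  # ['chickenpox IS-A infection', 'chickenpox IS-A disease']
```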

83
A New Kind of Linguistics
  • MFN is part and parcel of recent attempts in the
    biomedical sciences to confront problems of
    similar scope in the development of large
    fact-repositories such as KEGG or Swiss-Prot.
  • In its final form it should be consistent with
    the knowledge that is contained also in other
    fact repositories both at the expert and the
    non-expert level and serve to integrate them
    together in a federated database.

84
Adult walking behavior
  • will be freed from its lonely status inside GO

85
  • The End