Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium) - PowerPoint PPT Presentation

Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium) Werner Ceusters European Centre for Ontological Research – PowerPoint PPT presentation



1
Basic Introduction to Ontology-based Language
Technology (LT) (2nd year Ms in Social Medicine,
UG, Belgium)
  • Werner Ceusters
  • European Centre for Ontological Research
  • Universität des Saarlandes
  • Saarbrücken, Germany

2
Lecture overview
  • Problem description: patient eligibility for
    clinical trial
  • Meaning theories
  • Medical language and terminologies
  • Realist ontology for medical natural language
    understanding
  • Natural language understanding today

3
The Medical Informatics Dogma
  • Everything should be structured
  • Fact: computers can only deal with structured
    representations of reality
  • structured data: relational databases,
    spreadsheets
  • structured information: XML simulates context
  • structured knowledge: rule-based knowledge
    systems
  • Typical conclusion (dogma?): there is a need for
    structured data, hence there is a need for
    structured data entry

4
Structured data entry
  • Current technical solutions
  • rigid data entry forms
  • coding and classification systems
  • But
  • the description of biological variability
    requires the flexibility of natural language and
    it is generally desirable not to interfere with
    the traditional manner of medical recording
    (Wiederhold, 1980)
  • Initiatives to facilitate the entry of narrative
    data have focused on the control rather than the
    ease of data entry (Tanghe, 1997)

5
Drawbacks of structured data entry
  • Loss of information
  • qualitatively
  • limited expressiveness and inherent defects of
    coding and classification systems, controlled
    vocabularies, and traditional medical
    terminologies
  • use of purpose-oriented systems
  • don't use data for a purpose other than the one
    originally foreseen (J VDL)
  • quantitatively
  • too time-consuming to code all information
    manually
  • Speech recognition and forms for structured data
    entry are not best friends

6
Areas for application of medical natural language
understanding
  • Coding patient data
  • Structured information extraction from
    unstructured clinical notes
  • Clinical protocols and guidelines
  • Assessing patient eligibility for clinical trial
    entry
  • Triggering and alerts
  • Linking case descriptions to scientific
    literature
  • Easy access to content
  • ... towards a medical semantic web

7
Clinical history description
  • Mr. Kovács is an 83-year-old man with a past
    medical history of hypertension, congestive heart
    failure, atrial fibrillation, hypercholesterolemia,
    and a history of CVA who presented himself to
    Budapest Emergency Room on April 25 with primary
    complaint of right-sided chest pain since April
    24. The patient was in his usual state of health
    until April 24 when he experienced right-sided
    chest pain after 10 minutes of bicycling exercise
    at the YMCA. He described the chest pain as a
    dull ache in the right side of his chest
    radiating posteriorly to the right scapular area.
    He rated the intensity as 7 out of 10. The chest
    pain lasted about 3 minutes and resolved with
    rest. That same night, the patient once again
    experienced right-sided chest pain while lying in
    bed just before he went to sleep. He describes
    the pain as right-sided chest pain with same
    radiation to posterior at an intensity of 6-7 out
    of 10. The chest pain lasted about 10 minutes and
    resolved spontaneously.

8
Inclusion criteria of the INVEST study
  • 1. Male or female
  • 2. Age 50 to no upper limit
  • 3. a) Hypertension documented according to the
    6th report of the Joint National Committee on
    Detection and Evaluation of the treatment of high
    BP (JNC VI), and b) the need for drug therapy
    (previously documented hypertension in patients
    currently taking antihypertensive agents is
    acceptable)
  • 4. Documented CAD (e.g., classic angina pectoris
    (stable angina pectoris, Heberden angina
    pectoris), myocardial infarction three or more
    months ago, abnormal coronary angiography, or
    concordant abnormalities on two different types
    of stress tests)
  • 5. Willingness to sign informed consent
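
As a minimal sketch of what a machine needs before it can apply such criteria, the five inclusion criteria can be written as a check over an already-structured patient record. The record type, its field names and the three-valued answers (True, False, None for "not stated") are assumptions made for illustration; they are not part of the INVEST protocol or of the lecture.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PatientRecord:
    # Hypothetical structured view of a patient; None means "not stated".
    age: int
    sex: str
    hypertension_documented: Optional[bool] = None
    needs_drug_therapy: Optional[bool] = None
    cad_documented: Optional[bool] = None
    consent_given: Optional[bool] = None


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: False wins, then unknown (None), then True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def check_inclusion(p: PatientRecord) -> Dict[str, Optional[bool]]:
    """Evaluate the five inclusion criteria; None marks an open question."""
    return {
        "1 male or female": p.sex in ("male", "female"),
        "2 age 50 or older": p.age >= 50,
        "3 hypertension needing drug therapy": and3(
            p.hypertension_documented, p.needs_drug_therapy
        ),
        "4 documented CAD": p.cad_documented,
        "5 informed consent": p.consent_given,
    }


# Values a human reader might extract from a narrative about an 83-year-old
# man with a history of hypertension (hand-coded, not produced by an NLU system).
example = PatientRecord(age=83, sex="male", hypertension_documented=True)
result = check_inclusion(example)
for criterion, answer in result.items():
    print(criterion, "->", answer)
```

In this sketch criteria 1 and 2 come out True while 3 to 5 stay None: the check itself is trivial, the hard part is filling the record from free text.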

9
Do they match ?
  • Mr. Kovács is an 83-year-old man with past
    medical history of hypertension, congestive heart
    failure, atrial fibrillation, hypercholesterolemia,
    history of CVA who presented to Budapest
    Emergency Room on April 25 with chief complaint
    of right-sided chest pain since April 24. The
    patient was in his usual state of health until
    April 24 when he experienced right-sided chest
    pain after 10 minutes of bicycling exercise at
    YMCA. He described the chest pain as a dull ache
    in the right side of his chest radiating
    posteriorly to the right scapular area. He rated
    the intensity as 7 out of 10. The chest pain
    lasted about 3 minutes and resolved with rest.
    That same night, the patient once again
    experienced right-sided chest pain while lying in
    bed right before he went to sleep. He describes
    the pain as right-sided chest pain with same
    radiation to posterior at an intensity of 6-7 out
    of 10. The chest pain lasted about 10 minutes and
    resolved spontaneously.
  • 1. Male or female
  • 2. Age 50 to no upper limit
  • 3. Hypertension documented according to the 6th
    report of the Joint National Committee on
    Detection and Evaluation of the treatment of high
    BP (JNC VI) and the need for drug therapy
    (previously documented hypertension in patients
    currently taking antihypertensive agents is
    acceptable)
  • 4. Documented CAD (e.g., classic angina pectoris
    (stable angina pectoris, Heberden angina
    pectoris), myocardial infarction three or more
    months ago, abnormal coronary angiography, or
    concordant abnormalities on two different types
    of stress tests)
  • 5. Willingness to sign informed consent

??
10
If the computer is to make this deduction ...
  • 1. Male or female
  • 2. Age 50 to no upper limit
  • 3. Hypertension documented according to the 6th
    report of the Joint National Committee on
    Detection and Evaluation of the treatment of high
    BP (JNC VI) and the need for drug therapy
    (previously documented hypertension in patients
    currently taking antihypertensive agents is
    acceptable)
  • 4. Documented CAD (e.g., classic angina pectoris
    (stable angina pectoris, Heberden angina
    pectoris), myocardial infarction three or more
    months ago, abnormal coronary angiography, or
    concordant abnormalities on two different types
    of stress tests)
  • 5. Willingness to sign informed consent
  • Mr. Kovács is an 83-year-old man with past
    medical history of hypertension, congestive heart
    failure, atrial fibrillation, hypercholesterolemia,
    history of CVA who presented to Budapest
    Emergency Room on April 25 with chief complaint
    of right-sided chest pain since April 24. The
    patient was in his usual state of health until
    April 24 when he experienced right-sided chest
    pain after 10 minutes of bicycling exercise at
    YMCA. He described the chest pain as a dull ache
    in the right side of his chest radiating
    posteriorly to the right scapular area. He rated
    the intensity as 7 out of 10. The chest pain
    lasted about 3 minutes and resolved with rest.
    That same night, the patient once again
    experienced right-sided chest pain while lying in
    bed right before he went to sleep. He describes
    the pain as right-sided chest pain with same
    radiation to posterior at an intensity of 6-7 out
    of 10. The chest pain lasted about 10 minutes and
    resolved spontaneously.

... it must be able to understand !
11
What is understanding ?
  • To understand something is to know what its
    significance is.
  • What 'knowing significance' amounts to may be
    very different in different contexts; thus
    understanding a piece of music requires different
    things of us than understanding a sentence in a
    language we are learning, for instance. It would
    be useful, then, for theorists to look at the
    different kinds of understanding that there are,
    and examine them in detail and without prejudice,
    rather than looking for the essence of
    understanding.
  • (Tim Crane, philosopher of mind)
  • The significance of a single sentence, too, can
    vary from context to context.

12
The etymology of understanding
  • understanding: from Latin substare
  • literally: to stand under
  • Webster's Dictionary (1961): understanding is the
    power to render experience intelligible by
    bringing perceived particulars under appropriate
    concepts.
  • particulars: what is NOT SAID of a subject
    (Aristotle)
  • substances: this patient, that tumor, ...
  • qualities: the red of that patient's skin, his
    body temperature, blood pressure, ...
  • processes: that incision made by that surgeon,
    the rise of that patient's temperature, ...
  • concepts: may be taken in the above definition
    as Aristotle's universals, i.e. what is SAID OF a
    subject
  • Substantial concepts: patient, tumor, ...
  • Quality concepts: white, temperature
  • ...

13
What is natural language understanding?
  • NLU is constructing meaning from written
    language; the degree of understanding depends on
    a multifaceted meaning-making process that draws
    on knowledge about language and knowledge about
    the world.
  • (cf. reading comprehension by humans)
  • But then: what is meaning?

14
Dyadic models of meaning
  • Saussure (language philosopher)
  • signifiant / signifié (sign / concept)
  • Ron Stamper (information scientist)
  • thing-A STANDS-FOR thing-B
  • Major drawback
  • excludes the referent from the model, i.e. that
    which the sign/symbol/word/... denotes

15
Current state of the art on meaning in
healthcare informatics
  • A pervasive bias towards concepts
  • Content-wise
  • Work based on ISO/TC37 that advocates the
    Ogden-Richards theory of meaning
  • Corresponds with a linguistic reading of
    "concept"
  • Architecture-wise
  • In Europe: work based on CEN/TC251 WG1 & WG2 that
    follows ISO/TC37
  • In the US: HL7, inspired by Speech Act Theory
  • Concepts used as elements of information
    models, hence mixing a linguistic and an
    engineering reading.

16
Triadic models of meaning: the Semiotic/Semantic
triangle
Reference: Concept / Sense / Model / View /
Partition
Sign: Language / Term / Symbol
Referent: Reality / Object
17
Aristotle's triadic meaning model
Words spoken are signs or symbols (symbola) of
affections or impressions (pathemata) of the soul
(psyche); written words (graphomena) are the
signs of words spoken (phoné). As writing
(grammata), so also is speech not the same for
all races of men. But the mental affections
themselves, of which these words are primarily
signs (semeia), are the same for the whole of
mankind, as are also the objects (pragmata) of
which those affections are representations or
likenesses, images, copies (homoiomata).
Aristotle, 'On Interpretation', 1.16.a.4-9,
translated by Cooke & Tredennick, Loeb Classical
Library, William Heinemann, London, UK, 1938.
pathema
semeia ? gramma/ phoné
pragma
18
Richards' semantic triangle
  • Reference (concept): indicates the realm of
    memory where recollections of past experiences
    and contexts occur.
  • Hence, as with Aristotle, the reference is
    mind-related: thought.
  • But it is not the same for all; rather, it is
    related to the individual mind

reference
symbol
referent
19
Don't confuse with homonymy!
mole
20
Different thoughts: homonymy
R2
R3
R1
mole (skin lesion)
mole (unit)
mole
mole (animal)
21
And by the way, synonymy...
the Aristotelian view
Richards' view
sweat
sweat
perspiration
perspiration
22
Frege's view
  • sense is an objective feature of how words are
    used and not a thought or concept in somebody's
    head
  • 2 names with the same reference can have
    different senses
  • 2 names with the same sense have the same
    reference (synonyms)
  • a name with a sense does not need to have a
    reference (Beethoven's 10th symphony)

sense
name
reference (referent)
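
The three claims of Frege's view can be sketched as two lookup tables, name to sense and sense to referent. The sense labels and referent identifiers below are hypothetical, and treating "Mr. Kovács" and "the patient" as co-referring assumes the context of a single clinical note; the sketch only illustrates the structure of the model, not a theory of how senses are assigned.

```python
from typing import Dict, Optional

# name -> sense. Sense labels are hypothetical identifiers for illustration.
SENSE_OF: Dict[str, str] = {
    "sweat": "SENSE_SWEAT",
    "perspiration": "SENSE_SWEAT",
    "Mr. Kovács": "SENSE_MAN_NAMED_KOVACS",
    "the patient": "SENSE_PERSON_UNDER_CARE",
    "Beethoven's 10th symphony": "SENSE_BEETHOVEN_SYMPHONY_10",
}

# sense -> referent in the world; None means the sense picks out nothing.
REFERENT_OF: Dict[str, Optional[str]] = {
    "SENSE_SWEAT": "referent:sweat",
    "SENSE_MAN_NAMED_KOVACS": "referent:person-1",
    "SENSE_PERSON_UNDER_CARE": "referent:person-1",
    "SENSE_BEETHOVEN_SYMPHONY_10": None,
}


def referent(name: str) -> Optional[str]:
    """Follow name -> sense -> referent."""
    return REFERENT_OF[SENSE_OF[name]]


def same_sense(a: str, b: str) -> bool:
    """True if both names carry the same sense (synonyms)."""
    return SENSE_OF[a] == SENSE_OF[b]


def same_referent(a: str, b: str) -> bool:
    """True only if both names denote something and denote the same thing."""
    ra, rb = referent(a), referent(b)
    return ra is not None and ra == rb


# Same sense implies same reference.
print(same_sense("sweat", "perspiration"), same_referent("sweat", "perspiration"))
# Same reference, different senses.
print(same_sense("Mr. Kovács", "the patient"), same_referent("Mr. Kovács", "the patient"))
# A sense without a reference.
print(referent("Beethoven's 10th symphony"))
```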
23
Tetrahedric extensions
CEN/TC251 ENV 12264
FRISCO model (information science)
24
Requirements for NLU
  • Knowledge about terms and how they are used in
    valid constructions within natural language
  • Knowledge about the world, i.e. how the referents
    denoted by the terms interrelate in reality and
    in given types of context
  • An algorithm that
  • is able to calculate a language user's
    representation of that part of the world
    described in the utterances that are the subject
    of the analysis.
  • can track the ways in which people express what
    does NOT represent anything in reality (e.g. for
    medico-legal reasons)

25
The medical language
26
Some figures about the estimated size of
clinical language
  • number of unique medical expressions: 10^7
  • In one domain (AIDS): 150.000 candidate term
    phrases of 1 to 5 words found
  • 100-200 subdomains in medicine
  • estimated 2-word expressions: 4 x 10^6
  • assumes 20.000 meaningful single words
  • assumes 10 combination rate

(Evans & Patel 91)
27
Some figures about the estimated size of
clinical language
  • 0.5 x 10^6 entries in Oxford Dictionary of
    English
  • 0.3 x 10^6 word occurrences in Snomed 3.1
  • 0.15 x 10^6 meanings in Meta-1.3
  • 0.10 x 10^6 entries in Dorland's Medical
    Dictionary
  • 0.05 x 10^6 entries in Webster's Collegiate
    Dict.
  • 0.01 x 10^6 words in average human recognition
    voc.
  • 0.005 x 10^6 words in basic English

(Tuttle & Nelson 94)
28
Specificities of the medical sublanguage
  • Extensive use of acronyms
  • reasons
  • consequence of sublanguage shaping and use by a
    relatively closed community
  • efficient and economical in use
  • forms
  • simple: NIDDM (non insulin-dependent diabetes
    mellitus)
  • compound: GABAuria (GABA in the urine)
  • Combined use of numerals and letters
  • for types, stage, severity, position, measures
  • examples: IgG, IQ 50-70, type A1, ...

29
Specificities of the medical sublanguage
  • compounding and complex nouns
  • extensive use of affixing
  • embedded affixes: -pathy, -osis, ...
  • linked affixes: -related, -induced, -linked, ...
  • also outside the medical domain: pseudo-, -like,
    ...
  • foreign language importation
  • words/expressions in Latin: kyphosis dorsalis
    juvenilis
  • Latin/Greek based words with English
    lexicalisation
  • headache, cephalgia, cephalgic
  • tooth, dens, dentis, dental, dente

30
Specificities of the medical sublanguage
  • abundance of synonyms (and pseudo-synonyms)
  • abundance of proper nouns
  • toponyms: Thogoto virus, Rio Bravo Fever
  • eponyms: Laennec's cirrhosis, de Quervain's
    disease
  • use of ellipsis
  • Ottos fever
  • parachute mitral valve
  • abundance of uncountable nouns
  • substances: paracetamol, antibiotic
  • mass nouns: acne, prurigo, air, materia alba
  • process-describing nouns: calcification,
    amelogenesis
  • state-describing nouns: hypoglycemia, anemia

31
Specificities of the medical sublanguage
  • large noun phrase structures
  • congenital absence of auricle with stenosis of
    auditory canal
  • acute necrotising cutaneous leishmaniasis
  • explicit use of prepositions
  • density of information
  • multiple (pseudo-)synonymous entries (e.g. ICD)
  • 487.1 Influenza, NOS
  • 487.1 Flu
  • 487.1 Grippe
  • CAVE! Same category does not imply same semantics

32
The sublanguage of the clinical narrative:
syntactic incompleteness
  • Deleted verb and object / subject:
    stiff neck and fever
  • Deleted tense and verb "be":
    brain scan negative
  • Deleted subject, tense, and verb "be":
    positive for heart disease and diabetes
  • Deleted subject:
    was seen by local doctor

(Sager 1982)
33
Taming medical language ...
  • Classification systems
  • Clinical vocabularies
  • Coding systems
  • Nomenclatures
  • Thesauri
  • ...

34
About nomenclatures and other strange animals (1)
  • nomenclature: system of terms which is elaborated
    according to pre-established naming rules. In
    principle, there is a one-to-one relationship
    with the concepts of the subject field.
  • terminology: set of terms representing the
    concept system of a particular subject field
  • vocabulary: list of terms in a specific subject
    field, with their definitions
  • terminological system: system that includes at
    least one concept set and one or more
    terminologies and / or coding schemes
  • thesaurus: set of terms formally organised so
    that relationships between concepts (for example
    as 'broader' and 'narrower') are made explicit.
35
About nomenclatures and other strange animals (2)
  • coding scheme: collection of rules to represent
    items of one set with the elements of another
    set
  • coding system: terminological system consisting
    of a combination of a concept system, a
    terminology, a set of code values, and a coding
    scheme to relate the codes to the concepts and/
    or the terms.
  • classification: terminological system whose
    concept system is connected by generic relations

36
Coding systems and nomenclatures in healthcare
  • Main purpose: to stabilise the terminology
  • Mechanism: assign a code to every single term
  • Uses
  • EDI
  • data storage and archiving
  • NLP
  • Disadvantages
  • no internal structure
  • difficulties in finding specific terms
  • does not account for synonyms

37
Characteristics of an ideal medical knowledge
system?
  • a unique code for each term (word, phrase) ?
  • each code-term being defined
  • each term independent, not defined as the result
    of other terms in the system ?
  • synonyms recognisable through the codes
  • to each code could be attached the codes of
    related terms ?
  • the system would encompass all of medicine
  • the system would be in the public domain
  • the format of the KB should be functionally
    described, independent of hardware or software

(C. Bishop, 1989)
38
Main problems associated with Bishop's view
  • A unique code for each term: unaware of the
    difference between terms and concepts
  • each term independent: he probably ran into
    problems with compositionality due to
    misperception of the real issues
  • attachment of related codes: this approach lacks
    a formal foundation

39
Requirements for clinical vocabularies (1)
  • Domain completeness: coverage of all possible
    terms that lie within a vocabulary's domain
  • Non-vagueness: the term should represent the
    concept behind it as closely as possible
  • Non-ambiguity: the same term cannot refer to more
    than one concept
  • Non-redundancy: each concept must be represented
    by one unique identifier

(Cimino, 1989)
40
Requirements for clinical vocabularies (2)
  • Synonymy: multiple ways of expressing a word (or
    concept) must be allowed
  • Multiple classification: concepts must be allowed
    to be classified in multiple hierarchies
  • Consistency of view: concepts must have the same
    relationships in all views
  • Explicit relationships: all relationships (e.g.
    class, synonymy, ...) must be explicitly labelled.
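
Two of these requirements, non-ambiguity and synonymy, can be checked mechanically once a vocabulary is stored as (term, concept identifier) pairs. The sketch below is only an illustration under that assumption: the identifiers C1 to C4 are made up, and the entries reuse the "mole" and "sweat / perspiration" examples from the meaning-theory slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (term, concept identifier) pairs. Identifiers C1..C4 are made up.
ENTRIES: List[Tuple[str, str]] = [
    ("sweat", "C1"),
    ("perspiration", "C1"),  # synonymy: two terms, one concept (allowed)
    ("mole", "C2"),          # skin lesion
    ("mole", "C3"),          # unit
    ("mole", "C4"),          # animal
]


def ambiguous_terms(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Terms that refer to more than one concept (violates non-ambiguity)."""
    concepts_by_term: Dict[str, Set[str]] = defaultdict(set)
    for term, concept in entries:
        concepts_by_term[term].add(concept)
    return {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}


def synonym_sets(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Concepts expressed by more than one term (synonymy, which is allowed)."""
    terms_by_concept: Dict[str, Set[str]] = defaultdict(set)
    for term, concept in entries:
        terms_by_concept[concept].add(term)
    return {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}


print(ambiguous_terms(ENTRIES))
print(synonym_sets(ENTRIES))
```

The bare term "mole" is flagged as ambiguous, while "sweat" and "perspiration" are reported as legitimate synonyms of one concept. Requirements such as domain completeness or non-vagueness cannot be checked this way, since they depend on the world and not on the table.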

41
MeSH: Medical Subject Headings
  • Designed for bibliographic indexing, e.g. Index
    Medicus
  • Basis for MEDLINE
  • focuses on biomedicine and other basic healthcare
    sciences
  • clinically very impoverished
  • Consistency amongst indexers
  • 60% for headings
  • 30% for sub-headings

42
MeSH Tree Structures - 2004
  •  Anatomy A
  •  Organisms B
  •  Diseases C
  •  Chemicals and Drugs D
  •  Analytical, Diagnostic and Therapeutic
    Techniques and Equipment E
  •  Psychiatry and Psychology F
  •  Biological Sciences G
  •  Physical Sciences H
  •  Anthropology, Education, Sociology and Social
    Phenomena I
  •  Technology and Food and Beverages J
  •  Humanities K
  •  Information Science L
  •  Persons M
  •  Health Care N
  • Geographic Locations Z

43
MeSH Tree Structures - 2004
  • Cardiovascular Diseases C14
  • Heart Diseases C14.280
  • Arrhythmia C14.280.067
  • Carcinoid Heart Disease C14.280.129
  • Cardiomegaly C14.280.195
  • Endocarditis C14.280.282
  • Heart Aneurysm C14.280.358
  • Heart Arrest C14.280.383
  • Heart Defects, Congenital C14.280.400
  • Aortic Coarctation C14.280.400.090
  • Arrhythmogenic Right Ventricular Dysplasia
    C14.280.400.145
  • Cor Triatriatum C14.280.400.200
  • Coronary Vessel Anomalies C14.280.400.210
  • Crisscross Heart C14.280.400.220
  • Dextrocardia C14.280.400.280

44
MeSH Tree Structures - 2004
  • Body Regions A01
  • Extremities A01.378
  • Lower Extremity A01.378.610
  • Buttocks A01.378.610.100
  • Foot A01.378.610.250
  • Ankle A01.378.610.250.149
  • Forefoot, Human A01.378.610.250.300
  • Heel A01.378.610.250.510
  • Hip A01.378.610.400
  • Knee A01.378.610.450
  • Leg A01.378.610.500
  • Thigh A01.378.610.750

45
MeSH Tree Structures - 2004
  • Body Regions A01
  • Abdomen A01.047
  • Back A01.176
  • Breast A01.236
  • Extremities A01.378
  • Amputation Stumps A01.378.100
  • Lower Extremity A01.378.610
  • Upper Extremity A01.378.800
  • Head A01.456
  • Neck A01.598
  • Pelvis A01.673
  • Perineum A01.719
  • Thorax A01.911
  • Viscera A01.960
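
The tree numbers on these slides encode the hierarchy in the identifier itself: each dot-separated segment is one level. A minimal sketch of how a program could recover the path of a heading from its tree number, using only tree numbers listed above (the function names are illustrative, not part of any MeSH tooling):

```python
from typing import Dict, List

# Tree numbers copied from the 2004 MeSH slides.
MESH: Dict[str, str] = {
    "A01": "Body Regions",
    "A01.378": "Extremities",
    "A01.378.610": "Lower Extremity",
    "A01.378.610.250": "Foot",
    "A01.378.610.250.149": "Ankle",
    "A01.378.800": "Upper Extremity",
}


def ancestors(tree_number: str) -> List[str]:
    """Return the tree numbers of all ancestors, nearest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_descendant(child: str, parent: str) -> bool:
    """True if `child` lies strictly below `parent` in the tree."""
    return child.startswith(parent + ".")


path = [MESH[t] for t in ancestors("A01.378.610.250.149")]
print(path)
```

For "Ankle" this yields Foot, Lower Extremity, Extremities, Body Regions. Whether each step means is-a, part-of or something else is not encoded in the number.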

46
SNOMED International (1995)
  • Multi-axial coding system
  • morphology, disease, function, procedure, ...
  • Each axis has a hierarchical structure
  • Translations into languages other than English
    only for older versions
  • Informal internal structuring
  • Being translated into CG formalism, but with only
    internal consistency
  • Possibility to generate meaningless concepts
  • Mixing of hierarchies
  • Bone
  • Long Bone
  • Periosteum
  • Shaft

47
Snomed International: number of records (V3.1)
  • T Topography: 12,385
  • M Morphology: 4,991
  • F Function: 16,352
  • L Living Organisms: 24,265
  • C Drugs & Biological Products: 14,075
  • A Physical Agents, Forces and Activities: 1,355
  • D Disease / Diagnosis: 28,623
  • P Procedures: 27,033
  • S Social Context: 433
  • J Occupations: 1,886
  • G General Modifiers: 1,176
  • TOTAL RECORDS: 132,641

48
Snomed International: knowledge in the codes
  • posterior
  • anatomic leaflet
  • mitral
  • cardiac valve
  • cardiovascular
  • CAVE ! This scheme is not consistently used
    throughout the system.

49
Snomed International: multiple ways to express
the same thing
  • D5-46210 Acute appendicitis, NOS
  • D5-46100 Appendicitis, NOS
  • G-A231 Acute
  • M-41000 Acute inflammation, NOS
  • G-C006 In
  • T-59200 Appendix, NOS
  • G-A231 Acute
  • M-40000 Inflammation, NOS
  • G-C006 In
  • T-59200 Appendix, NOS
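
Why this is a problem for a machine can be sketched as follows. The grouping of the codes above into four expressions is a reading of the flattened slide, and the expansion table is hypothetical: it is inferred only from the equivalences the slide implies and is not part of SNOMED International. Treating an expression as an unordered set of primitive codes is a further simplification.

```python
from typing import Dict, FrozenSet, List, Tuple

# Four encodings of "acute appendicitis", as grouped on the slide.
EXPRESSIONS: List[Tuple[str, ...]] = [
    ("D5-46210",),
    ("D5-46100", "G-A231"),
    ("M-41000", "G-C006", "T-59200"),
    ("G-A231", "M-40000", "G-C006", "T-59200"),
]

# Hypothetical expansion rules, inferred from the slide only.
EXPANSIONS: Dict[str, Tuple[str, ...]] = {
    "D5-46210": ("G-A231", "D5-46100"),            # acute + appendicitis
    "D5-46100": ("M-40000", "G-C006", "T-59200"),  # inflammation in appendix
    "M-41000": ("G-A231", "M-40000"),              # acute + inflammation
}


def normalise(expr: Tuple[str, ...]) -> FrozenSet[str]:
    """Expand codes until only unexpandable codes remain; order is ignored."""
    primitives = set()
    stack = list(expr)
    while stack:
        code = stack.pop()
        if code in EXPANSIONS:
            stack.extend(EXPANSIONS[code])
        else:
            primitives.add(code)
    return frozenset(primitives)


print(len(set(EXPRESSIONS)))                     # distinct as written
print(len({normalise(e) for e in EXPRESSIONS}))  # distinct after expansion
```

As written, the four expressions are all different; only after expansion do they collapse into one. A system that compares codes literally will therefore miss three of the four records.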

50
The International Classification of diseases
(WHO).
  • ...
  • Chapter II Neoplasms (C00-D48)
  • Chapter III Diseases of the Blood and
    Blood-forming organs and certain disorders
    involving the immune mechanism (D50-D89)
  • Excludes auto-immune disease (systemic) NOS
    (M35.9)
  • ....
  • Nutritional Anemias (D50-D53)
  • D50 Iron deficiency anaemia
  • Includes ...
  • D50.0 Iron deficiency anaemia secondary
    to blood loss (chronic)
  • Excludes ...
  • D50.1 ...
  • D51 Vit B12 deficiency anaemia
  • Haemolytic Anemias (D55-D59)
  • ...
  • Chapter IV ...

51
Specificities of the sublanguage used in
classification systems
  • Significance of punctuation, especially the comma
  • references to the source expression
  • Cholera, NOS (not otherwise specified)
  • external references
  • pedophilia, same sex
  • mixed internal/external references to the
    classification
  • other protozoal intestinal disease, NEC (not
    elsewhere classified)

52
The search for internal formal consistency
medSORT-II
  • no pin-prick sensation in calf >
  • <neuro-sensation-mx>
  • <method> <pin-prock-test> pin-prick
  • <locus> <body-region> calf
  • <result> <eval-attr> <attr>
    sensation
  • <value> absent
  • CAVE! Monk's work with limited coverage

(Evans & Hersh, 93)
53
UMLS Unified Medical Language System (NLM)
  • Tool for information retrieval, with 4 components
  • Metathesaurus: contains information about
    biomedical concepts and how they are represented
    in diverse terminological systems.
  • Semantic Network: contains information about
    concept categories and the permissible
    relationships among them
  • Information Sources Map: contains both
    human-readable and machine-processable
    information about all kinds of biomedical
    terminological systems
  • SPECIALIST Lexicon: English words with POS
  • The tool from and for the U.S. :-)

54
UMLS Semantic Network
55
Semantic Network Relationships
  • Is_a
  • physically related to
  • spatially related to
  • temporally related to
  • functionally related to
  • conceptually related to

56
Semantic Network Biologic Function Hierarchy
57
Semantic Network "affects" Hierarchy 
58
Ontology
59
There is ontology and ontology
  • Ontology in Information Science
  • An ontology is a description (like a formal
    specification of a program) of the concepts and
    relationships that can exist for an agent or a
    community of agents.
  • Ontology in Philosophy
  • Ontology is the science of what is, of the kinds
    and structures of objects, properties, events,
    processes and relations in every area of reality.

60
If, later, you can remember just one thing from
this presentation, then make sure it is this
one
  • If you use the word "ontology", ALWAYS be
    specific about what you mean by it.

61
My definition of an ontology
  • a computer-understandable representation of
    some pre-existing domain of REALITY, reflecting
    the properties of the objects within its domain
    in such a way that there obtain substantial and
    systematic correlations between reality and the
    ontology itself.
  • (modified from Barry Smith)

62
A visit to the operating theatre
A lot of objects present
63
A visit to the operating theatre
A lot of processes going on
Haydom Lutheran Hospital, Tanzania
64
Axiom 1
  • If the picture is not a fake, we (i.e., me and
    this audience) KNOW that that hand, that surgeon,
    ... EXIST(ed), i.e. ARE (were) REAL.
  • But importantly: that hand, surgeon, kocher,
    mask, ... EXIST(ed) independently of our knowledge
    about them, and the part-relationship
    between that hand and that surgeon, and the
    processes going on, are (were) equally real.

65
But there is also communication
He wants me to remove that blood
I must get rid of that blood
Suction, please !
66
Issues in communication
Give me a kocher, please.
67
Concept-based Terminology
kocher
68
Axiom 2
  • Concept-based terminology (and standardisation
    thereof) is there as a mechanism to improve
    understanding of messages, originally by humans,
    now also by machines.
  • It is NOT the right device to explain why reality
    is what it is, how it is organised, etc.,
    (although it is needed to allow us to communicate
    on insights thereof).

69
Why not ?
  • Does not take care of universals and particulars
    appropriately
  • Concepts do not necessarily correspond to
    something that exists (existed, will exist)
  • Sorcerer, unicorn, leprechaun, ...
  • Definitions set the conditions under which terms
    may be used, and may not be abused as conditions
    an entity must satisfy to be what it is
  • Language can make strings of words look as if
    they were terms
  • Middle lobe of left lung
  • ...

70
Borders' classification of medicine
  • Medicine
  • Mental health
  • Internal medicine
  • Endocrinology
  • Oversized endocrinology
  • Gastro-enterology
  • ...
  • Pediatrics
  • ...
  • Oversized medicine

71
SNOMED-CT (2004)
72
NCI Thesaurus
  • a biomedical thesaurus created specifically to
    meet the needs of the NCI
  • semantically modeled cancer-related terminology
    built using description logic

73
Why description logics are not enough
SNOMED-RT (2000)
74
Underspecification
75
Use of description logics does not guarantee
correct representations !
76
It's not just a problem in healthcare
Ontologies for Legal Information Serving and
Knowledge Management Joost Breuker, Abdullatif
Elhag, Emil Petkov and Radboud Winkels
77
Ontology versus Description Logics
  • In the Description Logic world:
  • terms and definitions come first,
  • the job is to validate them and reason with them
  • In the realist ontology world:
  • robust ontology (with all its reasoning power)
    comes first
  • and terms and term-hierarchies must be subjected
    to the constraints of ontological coherence

78
Search for cancer
79
NCI Thesaurus Root concepts
80
Conceptual entity
  • Definition: none
  • Semantic type
  • Conceptual entity
  • Classification
  • Subconcepts
  • Action
  • definition: action, a thing done
  • And
  • Definition: an article which expresses the
    relation of connection or addition, used to
    conjoin a word with a word, ...
  • Classification
  • Definition: the grouping of things into classes
    or categories

81
Definition of cancer gene
82
NCI Thesaurus architecture
Findings-And-Disorders-Kind
Anatomy-Kind
Disease
Formal subsumption or inheritance
Associative relationships providing
differentiae
Kinds restrict the domain and range of
associative relationships
ISA
Breast
Breast neoplasm
Disease-has-associated-anatomy
83
Problems with C - rel - C
  • Ad hoc readings of statements of the type
    C1-relationship-C2
  • Human has-part head // Human has-part
    finger
  • California is-part-of United States //
    California isa name
  • labial vein isa vein of head // labial vein
    isa vulval vein
  • Concepts do not necessarily correspond to
    something that exists (existed, will exist)
  • Sorcerer, unicorn, leprechaun, ...
  • Definitions set the conditions under which terms
    may be used, and may not be abused as conditions
    an entity must satisfy to be what it is
  • Language can make strings of words look as if
    they were terms
  • Middle lobe of left lung

84
What do we need then ?
85
Ontological theories
  • theories between reality and the ontology
    (ontology as a representation)
  • Granular Partition Theory (T. Bittner & B. Smith)
  • Logic of Classes (B. Smith)

86
Theory of granular partitions (B. Smith)
Think of it as Alberti's grid
87
Granular partitions: main principles
  • a partition is the drawing of a (typically
    complex) fiat boundary over a certain domain
  • a partition typically comes with labels and/or an
    address system
  • partitions are artefacts of our cognition
  • a partition is transparent (veridical)
  • bona fide objects exist independently of our
    partitions, fiat objects are determined by
    partitions
  • different partitions may represent cuts through
    the same reality which are skew to each other
  • entities (existing in reality) located in the
    same cell of a partition share common
    characteristics

88
(Simplified) Logic of classes
  • primitive
  • entities: particulars versus universals
  • relation inst such that
  • all classes are universals; all instances are
    particulars
  • some universals are not classes, hence have no
    instances: pet, adult, physician
  • some particulars are not instances, e.g. some
    mereological sums
  • subsumption: defined resorting to instances
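
Defining subsumption through instances can be sketched as follows: A is_a B holds when every instance of A is also an instance of B. The particulars v1 and v2 are invented for illustration, a finite list stands in for all instances in reality, time is ignored, and requiring at least one instance is a choice made in this sketch. The "labial vein" case is taken from the C-rel-C slide.

```python
from typing import Set, Tuple

# inst(particular, universal): the primitive instantiation relation.
# v1 is a labial vein of the lip, v2 a labial vein of the vulva (both invented).
INST: Set[Tuple[str, str]] = {
    ("v1", "labial vein"),
    ("v1", "vein of head"),
    ("v1", "vein"),
    ("v2", "labial vein"),
    ("v2", "vulval vein"),
    ("v2", "vein"),
}


def instances_of(universal: str) -> Set[str]:
    """All particulars that instantiate the given universal."""
    return {p for p, u in INST if u == universal}


def is_a(a: str, b: str) -> bool:
    """a is_a b: a has instances, and all of them are instances of b."""
    inst_a = instances_of(a)
    return bool(inst_a) and inst_a <= instances_of(b)


print(is_a("labial vein", "vein"))          # holds for every instance
print(is_a("labial vein", "vein of head"))  # fails: v2 is not in the head
print(is_a("labial vein", "vulval vein"))   # fails: v1 is not vulval
```

Under this definition the two conflicting statements "labial vein isa vein of head" and "labial vein isa vulval vein" are both rejected, because each holds only for some instances.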

89
Reference Ontology
  • a theory of a domain of entities in the world
  • based on realizing the goals of maximal
    expressiveness and adequacy to reality
  • sacrificing computational tractability for the
    sake of representational adequacy

90
Basic Ontological Notions
  • Identity
  • How are instances of a class distinguished from
    each other
  • Unity
  • How are all the parts of an instance isolated
  • Essence
  • Can a property change over time
  • Dependence
  • Can an entity exist without some others

91
Basic Formal Ontology
  • Basic Formal Ontology consists in a
    series of sub-ontologies (most properly conceived
    as a series of perspectives on reality), the most
    important of which are
  • SnapBFO, a series of snapshot ontologies (O_ti),
    indexed by times: continuants
  • SpanBFO, a single videoscopic ontology (O_v):
    occurrents
  • Each O_ti is an inventory of all entities
    existing at a time. O_v is an inventory
    (processory) of all processes unfolding through
    time.

92
Occurrents and continuants
Picture by Vladimir Brajic
94
Take home message: Language Technology requires a
clean separation of knowledge AND (the right sort
of) ontology
Pragmatic knowledge: what users usually say or
think, what they consider important, how to
integrate in software
Knowledge of classification and coding systems:
how an expression has been classified by such a
system
Knowledge of definitions and criteria: how to
determine if a concept applies to a particular
instance
Surface linguistic knowledge: how to express the
concepts in any given language
Conceptual knowledge: the knowledge of sensible
domain concepts
Ontology: what exists and how what exists relates
to each other
95
Meaning in the machine
96
Understanding content (1)
John Doe has a pyogenic granuloma of the left
thumb
97
Understanding content (2)
<record>
  <patient>John Doe</patient>
  <diagnosis>pyogenic granuloma of the left thumb</diagnosis>
</record>
<record>
  <subject>John Doe</subject>
  <diagnosis>pyogenic granuloma of the left thumb</diagnosis>
</record>
98
The XML misconception
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cr-radio.xsl"?>
<CR-RADIOLOGIE>
  <ENTETE>
    <INFORMATION-SERVICE>
      <HOPITAL>Groupe hospitalier Léonard Devintscie</HOPITAL>
      <SERVICE>Radiologie Centrale</SERVICE>
      <MEDECIN>Dr. Bouaud</MEDECIN>
      <TITRE-EXAMEN>Phlébographie des membres inférieurs</TITRE-EXAMEN>
    </INFORMATION-SERVICE>
    <INFORMATION-DEMANDE>
      <SERVICE>Sce Pr. Charlet</SERVICE>
      <MEDECIN>Dr. Brunie</MEDECIN>
      <DATE>29-10-99</DATE>
    </INFORMATION-DEMANDE>
    <INFORMATION-PATIENT ID="236784020">
      <NOM>Donald</NOM>
      <PRENOM>Duck</PRENOM>
    </INFORMATION-PATIENT>
  </ENTETE>
  <BODY>
    <INDICATION>Suspicion de phlébite de jambe gauche</INDICATION>
    <TECHNIQUE>Ponction bilatérale d'une veine du dos du pied et injection de 180cc de produit de contraste</TECHNIQUE>
    <RESULTATS>image lacunaire endoluminale visible au niveau des veines péronières gauche. Absence d'opacification des veines tibiales antérieures et postérieures gauches. Les veines illiaques et la veine cave inférieure sont libres.</RESULTATS>
    <CONCLUSION>Trombophlébite péronière et probablement tibiale antérieure et postérieure gauche.</CONCLUSION>
  </BODY>
</CR-RADIOLOGIE>
99
Towards Machine Readable Semantics
Aspects: Form, Structure, Meaning, Function, Usage
Type definitions: Document Type Definition,
Knowledge Type Definition, Workflow Type Definition,
Style Type Definition, Information Type Definition
Formalism: XML, CSS, RDF, OWL, ?
Cases (static and dynamic), one line per aspect:
  Form: Bold, Centred, Align Left, Blink
  Structure: Title, Paragraph, Heading1, Play
  Meaning: Subject, isPartOf, Date, After_value
  Function: Utility, affectedBy, Receive, Protect
  Usage: Actor, Receival, Maintenance, Archival
Data about / Standard: Layout, Outline, Content,
Behaviour, Process
Hao Ding, Ingeborg T. Sølvberg
100
Understanding content (3)
<129465004>
  <116154003>John Doe</116154003>
  <8319008> 17372009
    <finding site> 76505004
      <laterality>7771000</laterality>
    </finding site>
  </8319008>
</129465004>
101
Text-based knowledge discovery
  • Goal
  • Finding new biomedical scientific knowledge
    through the combination of existing knowledge as
    represented in the medical literature
  • Motivation
  • Avoiding reinvention of the wheel; reuse of
    specific knowledge outside the original domain of
    discovery

102
Swanson
Effects B
Substance A
Disease C
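
Swanson's A-B-C scheme can be sketched in a few lines: if the literature links substance A to effects B, and separately links effects B to disease C, then the pair (A, C) becomes a candidate hypothesis even when no paper states it directly. The function name and the two small link lists below are illustrative assumptions; the terms echo Swanson's well-known fish oil and Raynaud example but are toy data, not mined results.

```python
from collections import defaultdict

def swanson_candidates(a_to_b, b_to_c, known_a_c=frozenset()):
    """Return {(a, c): sorted linking b terms} for A-C pairs connected only via B."""
    candidates = defaultdict(set)
    for a, b in a_to_b:
        for b2, c in b_to_c:
            # Keep the pair when the intermediate term matches and A-C is not yet known.
            if b == b2 and (a, c) not in known_a_c:
                candidates[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in candidates.items()}

# Hypothetical toy links standing in for co-occurrences found in two literatures.
a_to_b = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
b_to_c = [("blood viscosity", "Raynaud's disease"),
          ("platelet aggregation", "Raynaud's disease")]

result = swanson_candidates(a_to_b, b_to_c)
print(result)
```

Passing already documented pairs in `known_a_c` filters them out, so only links that are new relative to that set are reported.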
103
Protein-Protein Interaction extracted from texts
by C. Blaschke
104
Steps of Knowledge Discovery
  • Training data gathering
  • Feature generation
  • k-grams, domain know-how, ...
  • Feature selection
  • Entropy, χ², CFS, t-test, domain know-how, ...
  • Feature integration
  • SVM, ANN, PCL, CART, C4.5, kNN, ...
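
The first steps above can be sketched as follows: character k-grams serve as generated features, and a χ² score on a 2x2 presence/class table ranks them for selection. This is a minimal sketch using only the standard library; the function names, the choice of character (rather than word) k-grams, and the four toy documents are assumptions, and the feature integration step (SVM, kNN, ...) is left out.

```python
from collections import Counter

def kgrams(text, k=3):
    """Feature generation: all overlapping character k-grams of `text`."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]

def chi_square(n11, n10, n01, n00):
    """Chi-square for a 2x2 table.

    n11: feature present, positive class   n10: present, negative class
    n01: feature absent, positive class    n00: absent, negative class
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_features(docs, labels, k=3, top=5):
    """Feature selection: rank k-grams by chi-square against binary labels."""
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for doc, label in zip(docs, labels):
        for gram in set(kgrams(doc, k)):  # presence per document, not raw counts
            (pos if label else neg)[gram] += 1
    scores = {}
    for gram in set(pos) | set(neg):
        n11, n10 = pos[gram], neg[gram]
        scores[gram] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]

# Hypothetical toy training data: 1 = protein-related, 0 = other.
docs = ["kinase binds", "kinase acts", "patient rests", "patient sleeps"]
labels = [1, 1, 0, 0]
print(select_features(docs, labels, k=3, top=11))
```

The selected k-grams would then be turned into feature vectors and handed to one of the classifiers listed in the last bullet.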

Some classifiers/learning methods
Limsoon Wong
105
Functional components for text-based feature
generation system
  • Basic use components (end-user)
  • Corpus Management tool
  • Parser
  • Export module
  • Management components
  • Corpus editor (super user)
  • Grammar building workbench (super user)
  • Domain Ontology editor (super user)
  • Parser generator exporter
  • Linguistic ontology (multi-lingual use)
    exporter

106
What does it take to build such a system?
  • Short term: single domain
  • Corpus collection and analysis
  • Domain model design and implementation
  • Grammar Development
  • Corpus Manipulation Engine
  • Integration in Biomining package
  • Long term: generic system
  • Grammar Building Workbench
  • Parser Generator
  • Documentation

107
A statistics-only system
108
Relative Concept/Node identification (real)
Statistical analysis is powerful, but not enough
concepts
nodes
109
One word, multiple meanings
  • Abbreviation Extraction (Schwartz 2003)
  • Extracts short and long form pairs

Short form   Long form
AA           Alcoholic Anonymous
             American
             Americans
             Arachidonic acid
             arachidonic acid
             amino acid
             amino acids
             anaemia
             anemia
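
How short and long forms can be paired is sketched below. The code is a simplified Python rendering of the right-to-left character matching idea published by Schwartz and Hearst (2003): each character of the short form must be found, in order, in the text preceding the parenthesis, and the first character must start a word. Several validity checks of the published algorithm are omitted, and the function names, the regular expression and the example sentence are assumptions made for illustration.

```python
import re

def find_best_long_form(short_form, candidate):
    """Match short-form characters right to left inside `candidate`.

    Returns the shortest suffix of `candidate` (cut at a word start) that
    contains all characters of `short_form` in order, or None.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or \
              (s == 0 and l > 0 and candidate[l - 1].isalnum()):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]

def extract_pairs(sentence):
    """Find 'long form (SHORT)' patterns and return (short, long) pairs."""
    pairs = []
    for match in re.finditer(r"\(([A-Za-z0-9][^()\s]{1,9})\)", sentence):
        short = match.group(1)
        words = sentence[:match.start()].split()
        # Search window size as a function of the short form length.
        max_words = min(len(short) + 5, len(short) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short, candidate)
        if long_form:
            pairs.append((short, long_form))
    return pairs

print(extract_pairs("Release of arachidonic acid (AA) was measured."))
```

Because the matching is purely character based, very different long forms can be accepted for the same short form, which is one way a single abbreviation ends up with many readings.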

110
Syntactic variant detection
  • Corpus
  • MEDLINE: the largest collection of abstracts in
    the biomedical domain
  • Rule learning
  • 83,142 abstracts
  • Obtained rules: 14,158
  • Evaluation
  • 18,930 abstracts
  • Count the occurrences of each generated variant.

Tsuruoka et al., SIGIR '03
111
Results: antiinflammatory effect
Generation Probability   Generated Variants          Frequency
1.0 (input)              antiinflammatory effect     7
0.462                    anti-inflammatory effect    33
0.393                    antiinflammatory effects    6
0.356                    Antiinflammatory effect     0
0.286                    antiinflammatory-effect     0
0.181                    anti-inflammatory effects   23

112
Results: tumour necrosis factor alpha
Generation Probability   Generated Variants             Frequency
1.0 (input)              tumour necrosis factor alpha   15
0.492                    tumor necrosis factor alpha    126
0.356                    tumour necrosis factor-alpha   30
0.235                    Tumour necrosis factor alpha   2
0.175                    tumor necrosis factor alpha    182
0.115                    Tumor necrosis factor alpha    8

113
Biomedical NE Task (Collier, COLING '00; Kazama,
ACL '02; Kim, ISMB '02)
  • Recognize names in the text
  • Technical terms expressing proteins, genes,
    cells, etc.

Thus, CIITA not only activates the expression of
class II genes but recruits another B
cell-specific coactivator to increase
transcriptional activity of class II promoters in
B cells .
Junichi Tsujii
114
Kohonen clustering
115
Kohonen clustering
116
Domain-specific CUE-words
    if (domain.equals("PROTEINS"))
        subjObjVerbs_ar = new Object[] {
            "abolish", "abolishes", "abolished", "abolishing",
            "accompany", "accompanies", "accompanied", "accompanying",
            "acetylate", "acetylates", "acetylated", "acetylating",
            "activate", "activates", "activated", "activating",
            "affect", "affects", "affected", "affecting",
            ....
    if (domain.equals("PROTEINS"))
        ofByNouns_ar = new Object[] {
            "acetylation", "activation", "affection",
            "aggregation", "altering",
            "amelioration", "antagonization",
            "association", "augmentation", "binding",
            "blocking", "blockage", ....

117
Inter-protein relationship discovery
  • Leptin rapidly inhibits hypothalamic neuropeptide
    Y secretion and stimulates corticotropin-releasing
    hormone secretion in adrenalectomized mice .
  • (leptin)-INHIBITS-(hypothalamic neuropeptide Y
    secretion)
  • (leptin)-INHIBITS-(neuropeptide Y)
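
A cue-word driven extraction of the first triple above can be sketched as follows. The sketch assumes whitespace-tokenised input, treats the words before the first cue verb as the subject, and reads each object up to a boundary word. The cue list, the boundary list and the "-ly" adverb filter are tiny hypothetical stand-ins for the domain-specific word lists; the reduction to the head term, as in (leptin)-INHIBITS-(neuropeptide Y), is not attempted.

```python
# Hypothetical cue verbs mapped to relation labels (a real list is much longer).
CUE_VERBS = {"inhibits": "INHIBITS", "stimulates": "STIMULATES"}
# Hypothetical tokens that end an object phrase.
BOUNDARIES = {"and", "in", "by", "to", ",", "."}

def extract_relations(sentence):
    """Return (subject, RELATION, object) triples found around cue verbs."""
    tokens = sentence.split()
    cue_positions = [i for i, tok in enumerate(tokens) if tok.lower() in CUE_VERBS]
    if not cue_positions:
        return []
    # Subject: everything before the first cue verb, minus '-ly' adverbs.
    subject = " ".join(
        tok for tok in tokens[:cue_positions[0]] if not tok.lower().endswith("ly")
    ).lower()
    relations = []
    for i in cue_positions:
        obj = []
        for tok in tokens[i + 1:]:
            if tok.lower() in BOUNDARIES:
                break
            obj.append(tok)
        if subject and obj:
            relations.append((subject, CUE_VERBS[tokens[i].lower()], " ".join(obj)))
    return relations

sentence = ("Leptin rapidly inhibits hypothalamic neuropeptide Y secretion and "
            "stimulates corticotropin-releasing hormone secretion in "
            "adrenalectomized mice .")
for subj, rel, obj in extract_relations(sentence):
    print(f"({subj})-{rel}-({obj})")
```

Sharing one subject across coordinated verbs is a deliberate simplification; sentences with several subjects or passive constructions would need a parser rather than a token scan.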

118
... special patterns
  • These results indicate that oTP-1 may prevent
    luteolysis by inhibiting development of
    endometrial responsiveness to oxytocin and ,
    therefore , reduce oxytocin-induced synthesis of
    IP3 and PGF2 alpha .
  • (oxytocin)-CAUSES-(synthesis of IP3 and PGF2
    alpha)
  • (oxytocin)-CAUSES-(pgf2 alpha)

119
From syntactic modification to subsumption
  • (adj)-(noun): C_adj-noun IS_A C_noun
  • steroid hormone IS_A hormone
  • fetal liver IS_A liver
  • BUT not
  • binding factor IS_A factor
  • total protein IS_A protein
  • two domain IS_A domain
  • Usefulness ?
  • relationship with the C_adj
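
The modifier-head heuristic above can be sketched as a small function: a two-word term is proposed as a subtype of its head noun unless the modifier is on an exclusion list. The exclusion list here contains only the three modifiers from the counterexamples on this slide and is an assumption; how such modifiers would be recognised in general is not covered.

```python
# Hypothetical exclusion list, taken from the slide's counterexamples only.
NON_SUBSUMING_MODIFIERS = {"binding", "total", "two"}

def propose_is_a(term):
    """Propose (term, 'IS_A', head) for a two-word modifier-head term, else None."""
    words = term.split()
    if len(words) != 2:
        return None
    modifier, head = words
    if modifier.lower() in NON_SUBSUMING_MODIFIERS:
        return None
    return (term, "IS_A", head)

for term in ["steroid hormone", "fetal liver", "binding factor", "total protein"]:
    print(term, "->", propose_is_a(term))
```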

120
Text mining and classification