Title: Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium)
1Basic Introduction to Ontology-based Language
Technology (LT) (2nd year Ms in Social Medicine,
UG, Belgium)
- Werner Ceusters
- European Centre for Ontological Research
- Universität des Saarlandes
- Saarbrücken, Germany
2Lecture overview
- Problem description: patient eligibility for clinical trial
- Meaning theories
- Medical language and terminologies
- Realist ontology for medical natural language understanding
- Natural language understanding today
3The Medical Informatics Dogma
- Everything should be structured
- Fact: computers can only deal with structured representations of reality
- structured data
- relational databases, spreadsheets
- structured information
- XML simulates context
- structured knowledge
- rule-based knowledge systems
- Typical conclusion (Dogma?)
- there is a need for structured data, hence
- there is a need for structured data entry
4Structured data entry
- Current technical solutions
- rigid data entry forms
- coding and classification systems
- But
- the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980)
- Initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)
5Drawbacks of structured data entry
- Loss of information
- qualitatively
- limited expressiveness and inherent defects of coding and classification systems, controlled vocabularies, and traditional medical terminologies
- use of purpose-oriented systems
- don't use data for another purpose than originally foreseen (J VDL)
- quantitatively
- too time-consuming to code all information manually
- Speech recognition and forms for structured data entry are not best friends
6Areas for application of medical natural language
understanding
- Coding patient data
- Structured information extraction from unstructured clinical notes
- Clinical protocols and guidelines
- Assessing patient eligibility for clinical trial entry
- Triggering and alerts
- Linking case descriptions to scientific literature
- Easy access to content
- ... towards a medical semantic web
7Clinical history description
- Mr. Kovács is an 83-year-old man with a past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
and a history of CVA who presented himself to
Budapest Emergency Room on April 25 with primary
complaint of right-sided chest pain since April
24. The patient was in his usual state of health
until April 24 when he experienced right-sided
chest pain after 10 minutes of bicycling exercise
at the YMCA. He described the chest pain as a
dull ache in the right side of his chest
radiating posteriorly to the right scapular area.
He rated the intensity as 7 out of 10. The chest
pain lasted about 3 minutes and resolved with
rest. That same night, the patient once again
experienced right-sided chest pain while lying in
bed just before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
8Inclusion criteria of the INVEST study
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. a) Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI), b) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
9Do they match ?
- Mr. Kovács is an 83-year-old man with past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
history of CVA who presented to Budapest
Emergency Room on April 25 with chief complaint
of right-sided chest pain since April 24. The
patient was in his usual state of health until
April 24 when he experienced right-sided chest
pain after 10 minutes of bicycling exercise at
YMCA. He described the chest pain as a dull ache
in the right side of his chest radiating
posteriorly to the right scapular area. He rated
the intensity as 7 out of 10. The chest pain
lasted about 3 minutes and resolved with rest.
That same night, the patient once again
experienced right-sided chest pain while lying in
bed right before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
??
10If the computer is to make this deduction ...
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
- Mr. Kovács is an 83-year-old man with past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
history of CVA who presented to Budapest
Emergency Room on April 25 with chief complaint
of right-sided chest pain since April 24. The
patient was in his usual state of health until
April 24 when he experienced right-sided chest
pain after 10 minutes of bicycling exercise at
YMCA. He described the chest pain as a dull ache
in the right side of his chest radiating
posteriorly to the right scapular area. He rated
the intensity as 7 out of 10. The chest pain
lasted about 3 minutes and resolved with rest.
That same night, the patient once again
experienced right-sided chest pain while lying in
bed right before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
... it must be able to understand !
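A minimal sketch of why the deduction is hard for a machine, assuming the narrative has already been reduced to a few explicit facts. The class `PatientFacts`, the keyword set `CAD_EVIDENCE` and the function `check_invest_criteria` are hypothetical names introduced only for this illustration; they are not part of the INVEST protocol or of any existing system. Criteria 1 and 2 can be checked directly, while criteria 3 to 5 stay undecided because the facts needed are not stated in so many words.

```python
from dataclasses import dataclass, field


@dataclass
class PatientFacts:
    """Hypothetical structured summary of what the narrative states explicitly."""
    age: int
    sex: str
    history: set = field(default_factory=set)


# Hypothetical keyword list for criterion 4; a real system would need far more.
CAD_EVIDENCE = {"angina pectoris", "myocardial infarction", "abnormal coronary angiography"}


def check_invest_criteria(p: PatientFacts) -> dict:
    """Return True, False or None per criterion (None = cannot be decided from these facts)."""
    return {
        "1_sex": p.sex in {"male", "female"},
        "2_age": p.age >= 50,
        # Even when hypertension is mentioned, JNC VI documentation and the need
        # for drug therapy are not, so this sketch leaves the criterion undecided.
        "3_hypertension": None,
        # Decided only if the history literally names one of the listed findings.
        "4_cad": True if p.history & CAD_EVIDENCE else None,
        # Willingness to consent is never stated in a clinical history.
        "5_consent": None,
    }


kovacs = PatientFacts(
    age=83,
    sex="male",
    history={"hypertension", "congestive heart failure", "atrial fibrillation",
             "hypercholesterolemia", "CVA"},
)
result = check_invest_criteria(kovacs)
for criterion, verdict in result.items():
    print(criterion, verdict)
```

Whether exertional chest pain that resolves with rest counts as documented CAD is exactly the kind of judgement that keyword matching cannot make.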
11What is understanding ?
- To understand something is to know what its significance is.
- What 'knowing significance' amounts to may be very different in different contexts: thus understanding a piece of music requires different things of us than understanding a sentence in a language we are learning, for instance. It would be useful, then, for theorists to look at the different kinds of understanding that there are, and examine them in detail and without prejudice, rather than looking for the essence of understanding.
- (Tim Crane, philosopher of mind)
- The significance of a single sentence, too, can vary from context to context.
12The etymology of understanding
- understanding, cf. Latin substare
- literally: to stand under
- Webster's Dictionary (1961): understanding = the power to render experience intelligible by bringing perceived particulars under appropriate concepts.
- particulars: what is NOT SAID of a subject (Aristotle)
- substances: this patient, that tumor, ...
- qualities: the red of that patient's skin, his body temperature, blood pressure, ...
- processes: that incision made by that surgeon, the rise of that patient's temperature, ...
- concepts: may be taken in the above definition as Aristotle's universals: what is SAID OF a subject
- Substantial concepts: patient, tumor, ...
- Quality concepts: white, temperature
- ...
13What is natural language understanding?
- NLU is the construction of meaning from written language; the degree of understanding reached depends on a multifaceted meaning-making process that draws on knowledge about language and knowledge about the world.
- (cf. reading comprehension by humans.)
- But then: what is meaning?
14Dyadic models of meaning
- Saussure (language philosopher)
- signifiant / signifié (signifier / concept)
- Ron Stamper (information scientist)
- thing-A STANDS-FOR thing-B
- Major drawback
- excludes the referent from the model, i.e. that which the sign/symbol/word/... denotes
15Current state of the art on meaning in
healthcare informatics
- A pervasive bias towards concepts
- Content-wise
- Work based on ISO/TC37 that advocates the Ogden-Richards theory of meaning
- Corresponds with a linguistic reading of concept
- Architecture-wise
- In Europe: work based on CEN/TC251 WG1 and WG2 that follow ISO/TC37
- In the US: HL7, inspired by Speech Act Theory
- Concepts used as elements of information models, hence mixing a linguistic and an engineering reading.
16Triadic models of meaning The Semiotic/Semantic
triangle
Reference: Concept / Sense / Model / View / Partition
Sign: Language / Term / Symbol
Referent: Reality / Object
17Aristotle's triadic meaning model
Words spoken are signs or symbols (symbola) of
affections or impressions (pathemata) of the soul
(psyche); written words (graphomena) are the
signs of words spoken (phoné). As writing
(grammata), so also is speech not the same for
all races of men. But the mental affections
themselves, of which these words are primarily
signs (semeia), are the same for the whole of
mankind, as are also the objects (pragmata) of
which those affections are representations or
likenesses, images, copies (homoiomata).
Aristotle, 'On Interpretation', 1.16.a.4-9,
Translated by Cooke & Tredennick, Loeb Classical
Library, William Heinemann, London, UK, 1938.
pathema
semeia ? gramma/ phoné
pragma
18Richards' semantic triangle
- Reference (concept) indicates the realm of memory where recollections of past experiences and contexts occur.
- Hence, as with Aristotle, the reference is mind-related: thought.
- But not the same for all: rather individual mind-related
reference
symbol
referent
19Don't confuse with homonymy !
mole
20Different thoughts: Homonymy
R2
R3
R1
mole skin lesion
mole unit
mole
mole animal
21And by the way, synonymy...
the Aristotelian view
Richards view
sweat
sweat
perspiration
perspiration
22Frege's view
- sense is an objective feature of how words are used and not a thought or concept in somebody's head
- 2 names with the same reference can have different senses
- 2 names with the same sense have the same reference (synonyms)
- a name with a sense does not need to have a reference (Beethoven's 10th symphony)
sense
name
reference (referent)
23Tetrahedric extensions
CEN/TC251 ENV 12264
FRISCO model (information science)
24Requirements for NLU
- Knowledge about terms and how they are used in valid constructions within natural language
- Knowledge about the world, i.e. how the referents denoted by the terms interrelate in reality and in given types of context
- An algorithm that
- is able to calculate a language user's representation of that part of the world described in the utterances that are the subject of the analysis.
- can track the ways in which people express what does NOT represent anything in reality (e.g. for medico-legal reasons)
25The medical language
26Some figures about the estimated size of
clinical language
- number of unique medical expressions: 10^7
- In one domain (AIDS): 150.000 candidate term phrases of 1 to 5 words found
- 100-200 subdomains in medicine
- estimated 2-word expressions: 4 x 10^6
- assumes 20.000 meaningful single words
- assumes 10 combination rate
(Evans & Patel, 91)
27Some figures about the estimated size of
clinical language
- 0.5 x 10^6 entries in Oxford Dictionary of English
- 0.3 x 10^6 word occurrences in Snomed 3.1
- 0.15 x 10^6 meanings in Meta-1.3
- 0.10 x 10^6 entries in Dorland's Medical Dictionary
- 0.05 x 10^6 entries in Webster's Collegiate Dict.
- 0.01 x 10^6 words in average human recognition voc.
- 0.005 x 10^6 words in basic English
(Tuttle & Nelson, 94)
28Specificities of the medical sublanguage
- Extensive use of acronyms
- reasons
- consequence of sublanguage shaping and use by a relatively closed community
- efficient and economical in use
- forms
- simple: NIDDM = non insulin-dependent diabetes mellitus
- compound: GABAuria = GABA in the urine
- Combined use of numerals and letters
- for types, stage, severity, position, measures
- e.g. IgG, IQ 50-70, type A1, ...
29Specificities of the medical sublanguage
- compounding and complex nouns
- extensive use of affixing
- embedded affixes: -pathy, -osis, ...
- linked affixes: -related, -induced, -linked, ...
- also outside the medical domain: pseudo-, -like, ...
- foreign language importation
- words/expressions in Latin: kyphosis dorsalis juvenilis
- Latin/Greek based words with English lexicalisation
- headache, cephalgia, cephalgic
- tooth, dens, dentis, dental, dente
30Specificities of the medical sublanguage
- abundance of synonyms (and pseudo-synonyms)
- abundance of proper nouns
- toponyms: Thogoto virus, Rio Bravo Fever
- eponyms: Laennec's cirrhosis, de Quervain's disease
- use of ellipsis
- Otto's fever
- parachute mitral valve
- abundance of uncountable nouns
- substances: paracetamol, antibiotic
- mass nouns: acne, prurigo, air, materia alba
- process describing nouns: calcification, amelogenesis
- state describing nouns: hypoglycemia, anemia
31Specificities of the medical sublanguage
- large noun phrase structures
- congenital absence of auricle with stenosis of auditory canal
- acute necrotising cutaneous leishmaniasis
- explicit use of prepositions
- density of information
- multiple (pseudo-)synonymous entries (e.g. ICD)
- 487.1 Influenza, NOS
- 487.1 Flu
- 487.1 Grippe
- CAVE ! Same category does not imply same semantics
32The sublanguage of the clinical narrative:
syntactic incompleteness.
- Deleted verb and object / subject
- stiff neck and fever
- Deleted tense and verb "be"
- brain scan negative
- Deleted subject, tense, and verb "be"
- positive for heart disease and diabetes
- Deleted subject
- was seen by local doctor
(Sager 1982)
33Taming medical language ...
- Classification systems, clinical vocabularies, coding systems, nomenclatures, thesauri, ...
34About nomenclatures and other strange animals (1)
- nomenclature: system of terms which is elaborated according to pre-established naming rules. In principle, there is a one-to-one relationship with the concepts of the subject field.
- terminology: set of terms representing the concept system of a particular subject field
- vocabulary: list of terms in a specific subject field, with their definitions
- terminological system: system that includes at least one concept set and one or more terminologies and/or coding schemes
- thesaurus: set of terms formally organised so that relationships between concepts (for example as 'broader' and 'narrower') are made explicit.
35About nomenclatures and other strange animals (2)
- coding scheme: collection of rules to represent items of one set with the elements of another set
- coding system: terminological system consisting of a combination of a concept system, a terminology, a set of code values, and a coding scheme to relate the codes to the concepts and/or the terms.
- classification: terminological system whose concept system is connected by generic relations
36Coding systems and nomenclatures in healthcare
- Main purpose: to stabilise the terminology
- Mechanism: assign a code to every single term
- Uses
- EDI
- data storage and archiving
- NLP
- Disadvantages
- no internal structure
- difficulties in finding specific terms
- does not account for synonyms
37Characteristics of an ideal medical knowledge
system?
- a unique code for each term (word, phrase) ?
- each code-term being defined
- each term independent, not defined as the result of other terms in the system ?
- synonyms recognisable through the codes
- to each code could be attached codes of related terms ?
- the system would encompass all of medicine
- the system would be in the public domain
- the format of the KB should be functionally described, independent from hard- or software
(C. Bishop, 1989)
38Main problems associated with Bishop's view
- A unique code for each term: unaware of the difference between terms and concepts
- each term independent: he probably ran into problems with compositionality due to misperception of the real issues
- attachment of related codes: this approach misses a formal ground
39Requirements for clinical vocabularies (1)
- Domain completeness: coverage of all possible terms that lie within a vocabulary's domain
- Non-vagueness: the term should represent the concept behind it as closely as possible
- Non-ambiguity: the same term cannot refer to more than one concept
- Non-redundancy: each concept must be represented by one unique identifier
(Cimino, 1989)
40Requirements for clinical vocabularies (2)
- Synonymy: multiple ways of expressing a word (or concept) must be allowed
- Multiple classification: concepts must be allowed to be classified in multiple hierarchies
- Consistency of view: concepts must have the same relationships in all views
- Explicit relationships: all relationships (e.g. class, synonymy, ...) must be explicitly labelled.
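Two of these requirements, non-ambiguity and non-redundancy, can be checked mechanically once a vocabulary is available as a table. The sketch below assumes a toy vocabulary of (identifier, concept, term) triples; the identifiers `C1` to `C4`, the entry "diaphoresis" and both function names are invented for illustration and do not come from any real terminology.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: (identifier, concept, term) triples.
ENTRIES = [
    ("C1", "skin lesion (naevus)", "mole"),
    ("C2", "unit of amount of substance", "mole"),
    ("C3", "sweating", "sweat"),
    ("C3", "sweating", "perspiration"),   # synonymy: allowed, same identifier
    ("C4", "sweating", "diaphoresis"),    # same concept under a second identifier
]


def ambiguous_terms(entries):
    """Terms that refer to more than one concept (violates non-ambiguity)."""
    concepts_by_term = defaultdict(set)
    for _ident, concept, term in entries:
        concepts_by_term[term].add(concept)
    return {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}


def redundant_concepts(entries):
    """Concepts carried by more than one identifier (violates non-redundancy)."""
    ids_by_concept = defaultdict(set)
    for ident, concept, _term in entries:
        ids_by_concept[concept].add(ident)
    return {c: sorted(i) for c, i in ids_by_concept.items() if len(i) > 1}


print(ambiguous_terms(ENTRIES))
print(redundant_concepts(ENTRIES))
```

Domain completeness and non-vagueness cannot be tested this way, since they depend on what lies outside the table.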
41MeSH: Medical Subject Headings
- Designed for bibliographic indexing, e.g. Index Medicus
- Basis for MEDLINE
- focuses on biomedicine and other basic healthcare sciences
- clinically very impoverished
- Consistency amongst indexers
- 60% for headings
- 30% for sub-headings
42MeSH Tree Structures - 2004
- Anatomy A
- Organisms B
- Diseases C
- Chemicals and Drugs D
- Analytical, Diagnostic and Therapeutic Techniques and Equipment E
- Psychiatry and Psychology F
- Biological Sciences G
- Physical Sciences H
- Anthropology, Education, Sociology and Social Phenomena I
- Technology and Food and Beverages J
- Humanities K
- Information Science L
- Persons M
- Health Care N
- Geographic Locations Z
43MeSH Tree Structures - 2004
- Cardiovascular Diseases C14
- Heart Diseases C14.280
- Arrhythmia C14.280.067
- Carcinoid Heart Disease C14.280.129
- Cardiomegaly C14.280.195
- Endocarditis C14.280.282
- Heart Aneurysm C14.280.358
- Heart Arrest C14.280.383
- Heart Defects, Congenital C14.280.400
- Aortic Coarctation C14.280.400.090
- Arrhythmogenic Right Ventricular Dysplasia C14.280.400.145
- Cor Triatriatum C14.280.400.200
- Coronary Vessel Anomalies C14.280.400.210
- Crisscross Heart C14.280.400.220
- Dextrocardia C14.280.400.280
44MeSH Tree Structures - 2004
- Body Regions A01
- Extremities A01.378
- Lower Extremity A01.378.610
- Buttocks A01.378.610.100
- Foot A01.378.610.250
- Ankle A01.378.610.250.149
- Forefoot, Human A01.378.610.250.300
- Heel A01.378.610.250.510
- Hip A01.378.610.400
- Knee A01.378.610.450
- Leg A01.378.610.500
- Thigh A01.378.610.750
45MeSH Tree Structures - 2004
- Body Regions A01
- Abdomen A01.047
- Back A01.176
- Breast A01.236
- Extremities A01.378
- Amputation Stumps A01.378.100
- Lower Extremity A01.378.610
- Upper Extremity A01.378.800
- Head A01.456
- Neck A01.598
- Pelvis A01.673
- Perineum A01.719
- Thorax A01.911
- Viscera A01.960
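In these listings the hierarchy is carried entirely by the dotted tree numbers: each extra segment is one level deeper. A small sketch of how ancestors can be derived from a tree number, using a subset of the entries shown on the slides; the function names `ancestors` and `is_descendant` are hypothetical helpers, not part of any MeSH tooling.

```python
# Tree numbers copied from the MeSH 2004 listings above (subset).
TREE = {
    "A01": "Body Regions",
    "A01.378": "Extremities",
    "A01.378.610": "Lower Extremity",
    "A01.378.610.250": "Foot",
    "A01.378.610.250.149": "Ankle",
    "A01.378.610.250.510": "Heel",
    "A01.378.610.450": "Knee",
    "A01.456": "Head",
}


def ancestors(tree_number):
    """All proper prefixes of a tree number, from the root downwards."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_descendant(child, parent):
    """True if `child` lies strictly below `parent` in the tree."""
    return child.startswith(parent + ".")


path = [TREE[a] for a in ancestors("A01.378.610.250.149")]
print(" > ".join(path + [TREE["A01.378.610.250.149"]]))
print(is_descendant("A01.378.610.450", "A01.378.610.250"))
```

The code says nothing about what kind of relation a level expresses: Ankle under Foot is a part-of link, which the tree number alone does not reveal.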
46SNOMED International (1995)
- Multi-axial coding system
- morphology, disease, function, procedure, ...
- Each axis has a hierarchical structure
- Translations in languages other than English only for older versions
- Informal internal structuring
- Being translated into CG formalism, but with only internal consistency
- Possibility to generate meaningless concepts
- Mixing of hierarchies
- Bone
- Long Bone
- Periosteum
- Shaft
47Snomed International Number of records (V3.1)
- T Topography 12,385
- M Morphology 4,991
- F Function 16,352
- L Living Organisms 24,265
- C Drugs & Biological Products 14,075
- A Physical Agents, Forces and Activities 1,355
- D Disease/ Diagnosis 28,623
- P Procedures 27,033
- S Social Context 433
- J Occupations 1,886
- G General Modifiers 1,176
- TOTAL RECORDS 132,641
48Snomed International: knowledge in the codes.
- posterior
- anatomic leaflet
- mitral
- cardiac valve
- cardiovascular
-
- CAVE ! This scheme is not consistently used
throughout the system.
49Snomed International multiple ways to express
the same thing
- D5-46210 Acute appendicitis, NOS
- D5-46100 Appendicitis, NOS
- G-A231 Acute
- M-41000 Acute inflammation, NOS
- G-C006 In
- T-59200 Appendix, NOS
- G-A231 Acute
- M-40000 Inflammation, NOS
- G-C006 In
- T-59200 Appendix, NOS
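Reading the flattened list above as four alternative encodings of acute appendicitis (an interpretation of the slide layout), a comparison at code level sees four unrelated expressions. The sketch below shows this, and then shows that they only collapse into one once definitions that unfold the pre-coordinated codes are supplied. The `DEFINITIONS` table is hand-written for this illustration and is an assumption, not content shipped with SNOMED International; treating an expression as an unordered set of codes is also a simplification.

```python
# The four encodings, using the codes listed on the slide.
EXPRESSIONS = [
    ("D5-46210",),                                   # Acute appendicitis, NOS
    ("D5-46100", "G-A231"),                          # Appendicitis + Acute
    ("M-41000", "G-C006", "T-59200"),                # Acute inflammation In Appendix
    ("G-A231", "M-40000", "G-C006", "T-59200"),      # Acute Inflammation In Appendix
]

# Hypothetical unfolding rules (assumption, written by hand for this sketch).
DEFINITIONS = {
    "D5-46210": ("D5-46100", "G-A231"),
    "D5-46100": ("M-40000", "G-C006", "T-59200"),
    "M-41000": ("G-A231", "M-40000"),
}


def normalise(expression):
    """Unfold defined codes until only undefined (primitive) codes remain."""
    primitives = set()
    stack = list(expression)
    while stack:
        code = stack.pop()
        if code in DEFINITIONS:
            stack.extend(DEFINITIONS[code])
        else:
            primitives.add(code)
    return frozenset(primitives)


naive = {frozenset(e) for e in EXPRESSIONS}
unfolded = {normalise(e) for e in EXPRESSIONS}
print(len(naive), "distinct encodings before unfolding")
print(len(unfolded), "distinct encoding after unfolding")
```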
50The International Classification of diseases
(WHO).
- ...
- Chapter II Neoplasms (C00-D48)
- Chapter III Diseases of the Blood and Blood-forming organs and certain disorders involving the immune mechanism (D50-D89)
- Excludes: auto-immune disease (systemic) NOS (M35.9)
- ....
- Nutritional Anemias (D50-D53)
- D50 Iron deficiency anaemia
- Includes: ...
- D50.0 Iron deficiency anaemia secondary to blood loss (chronic)
- Excludes: ...
- D50.1 ...
- D51 Vit B12 deficiency anaemia
- Haemolytic Anemias (D55-D59)
- ...
- Chapter IV ...
51Specificities of the sublanguage used in
classification systems
- Significance of punctuation, especially the comma
- references to the source expression
- Cholera, NOS: not otherwise specified
- external references
- pedophilia, same sex
- mixed internal/external references to the classification
- other protozoal intestinal disease, NEC: not elsewhere classified
52The search for internal formal consistency:
medSORT-II
- no pin-prick sensation in calf >
- <neuro-sensation-mx>
- <method> <pin-prick-test> pin-prick
- <locus> <body-region> calf
- <result> <eval-attr> <attr> sensation
- <value> absent
- CAVE! Monk's work with limited coverage
(Evans & Hersh, 93)
53UMLS: Unified Medical Language System (NLM)
- Tool for information retrieval, consisting of 4 components
- Metathesaurus: contains information about biomedical concepts and how they are represented in diverse terminological systems.
- Semantic Network: contains information about concept categories and the permissible relationships among them
- Information Sources Map: contains both human-readable and machine-processable information about all kinds of biomedical terminological systems
- Specialist lexicon: English words with POS
- The tool from and for the U.S. :-)
54UMLS Semantic Network
55Semantic Network Relationships
- Is_a
- physically related to
- spatially related to
- temporally related to
- functionally related to
- conceptually related to
56Semantic Network Biologic Function Hierarchy
57Semantic Network "affects" Hierarchy
58Ontology
59There is ontology and ontology
- Ontology in Information Science
- An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.
- Ontology in Philosophy
- Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.
60If, later, you can remember just one thing of
this presentation, then make sure it is this
one
- If you use the word ontology, ALWAYS be
specific about what you mean by it.
61My definition of an ontology
- a representation, understandable for a computer, of some pre-existing domain of REALITY, reflecting the properties of the objects within its domain in such a way that there obtain substantial and systematic correlations between reality and the ontology itself.
- modified from Barry Smith
62A visit to the operating theatre
A lot of objects present
63A visit to the operating theatre
A lot of processes going on
Haydom Lutheran Hospital, Tanzania
64Axiom 1
- If the picture is not a fake, we (i.e., me and this audience) KNOW that that hand, that surgeon, ... EXIST(ed), i.e. ARE (were) REAL.
- But importantly: that hand, surgeon, kocher, mask, ... EXIST(ed) independently of our knowledge about them, and also the part-relationship between that hand and that surgeon, and the processes going on, are (were) equally real.
65But there is also communication
He wants me to remove that blood
I must get rid of that blood
Suction, please !
66Issues in communication
Give me a kocher, please.
67Concept-based Terminology
kocher
68Axiom 2
- Concept-based terminology (and standardisation thereof) is there as a mechanism to improve understanding of messages, originally by humans, now also by machines.
- It is NOT the right device to explain why reality is what it is, how it is organised, etc. (although it is needed to allow us to communicate on insights thereof).
69Why not ?
- Does not take care of universals and particulars appropriately
- Concepts do not necessarily correspond to something that (will) exist(ed)
- Sorcerer, unicorn, leprechaun, ...
- Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is
- Language can make strings of words look as if they were terms
- Middle lobe of left lung
- ...
70Borders classification of medicine
- Medicine
- Mental health
- Internal medicine
- Endocrinology
- Oversized endocrinology
- Gastro-enterology
- ...
- Pediatrics
- ...
- Oversized medicine
71SNOMED-CT (2004)
72NCI Thesaurus
- a biomedical thesaurus created specifically to meet the needs of the NCI
- semantically modeled cancer-related terminology built using description logic
73Why description logics are not enough
SNOMED-RT (2000)
74Underspecification
75Use of description logics does not guarantee
correct representations !
76It's not just a problem in Healthcare
Ontologies for Legal Information Serving and
Knowledge Management. Joost Breuker, Abdullatif
Elhag, Emil Petkov and Radboud Winkels
77Ontology versus Description Logics
- In the Description Logic world
- terms and definitions come first,
- the job is to validate them and reason with them
- In the realist ontology world
- robust ontology (with all its reasoning power) comes first
- and terms and term-hierarchies must be subjected to the constraints of ontological coherence
78Search for cancer
79NCI Thesaurus Root concepts
80Conceptual entity
- Definition: none
- Semantic type
- Conceptual entity
- Classification
- Subconcepts
- Action
- definition: action; a thing done
- And
- Definition: an article which expresses the relation of connection or addition, used to conjoin a word with a word, ...
- Classification
- Definition: the grouping of things into classes or categories
81Definition of cancer gene
82NCI Thesaurus architecture
Findings-And- Disorders-Kind
Anatomy-Kind
Disease
Formal subsumption or inheritance
Associative relationships providing
differentiae
Kinds restrict the domain and range of
associative relationships
ISA
Breast
Breast neoplasm
Disease-has-associated-anatomy
83Problems with C - rel - C
- Ad hoc readings of statements of the type C1-relationship-C2
- Human has-part head // Human has-part finger
- California is-part-of United States // California isa name
- labial vein isa vein of head // labial vein isa vulval vein
- Concepts do not necessarily correspond to something that (will) exist(ed)
- Sorcerer, unicorn, leprechaun, ...
- Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is
- Language can make strings of words look as if they were terms
- Middle lobe of left lung
84What do we need then ?
85Ontological theories
- theories between reality and the ontology (ontology as a representation)
- Granular Partition Theory (T. Bittner & B. Smith)
- Logic of Classes (B. Smith)
86Theory of granular partitions (B. Smith)
Think of it as Alberti's grid
87Granular partitions main principles
- a partition is the drawing of a (typically complex) fiat boundary over a certain domain
- a partition typically comes with labels and/or an address system
- partitions are artefacts of our cognition
- a partition is transparent (veridical)
- bona fide objects exist independently of our partitions, fiat objects are determined by partitions
- different partitions may represent cuts through the same reality which are skew to each other
- entities (existing in reality) located in the same cell of a partition share common characteristics
88(Simplified) Logic of classes
- primitive
- entities: particulars versus universals
- relation inst such that
- all classes are universals; all instances are particulars
- some universals are not classes, hence have no instances: pet, adult, physician
- some particulars are not instances, e.g. some mereological sums
- subsumption defined resorting to instances
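The last point, subsumption defined by resorting to instances, can be sketched as follows: one universal is subsumed by another when every particular that instantiates the first also instantiates the second. The `INST` data, the names `instances_of` and `is_a`, and the extra requirement that the subsumed universal has at least one instance are assumptions made for this illustration, not a rendering of the formal theory.

```python
# inst(particular, universal): hypothetical toy data.
INST = {
    ("tumor#1", "tumor"),
    ("tumor#1", "pathological structure"),
    ("tumor#2", "tumor"),
    ("tumor#2", "pathological structure"),
    ("scar#1", "pathological structure"),
}


def instances_of(universal):
    """All particulars standing in the inst relation to `universal`."""
    return {p for p, u in INST if u == universal}


def is_a(sub, sup):
    """sub is_a sup iff sub has instances and each of them is also an instance of sup.

    The non-emptiness guard is an assumption of this sketch; it avoids an
    uninstantiated universal being subsumed by everything.
    """
    sub_instances = instances_of(sub)
    return bool(sub_instances) and sub_instances <= instances_of(sup)


print(is_a("tumor", "pathological structure"))
print(is_a("pathological structure", "tumor"))
```

Defined this way, subsumption is answerable to what exists rather than to how terms happen to be arranged in a hierarchy.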
89Reference Ontology
- a theory of a domain of entities in the world
- based on realizing the goals of maximal expressiveness and adequacy to reality
- sacrificing computational tractability for the sake of representational adequacy
90Basic Ontological Notions
- Identity
- How are instances of a class distinguished from each other?
- Unity
- How are all the parts of an instance isolated?
- Essence
- Can a property change over time?
- Dependence
- Can an entity exist without some others?
91Basic Formal Ontology
- Basic Formal Ontology consists in a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are
- SnapBFO, a series of snapshot ontologies (O_ti), indexed by times: continuants
- SpanBFO, a single videoscopic ontology (O_v): occurrents.
- Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.
92Occurrents and continuants
Picture by Vladimir Brajic
94Take home message: Language Technology requires a
clean separation of knowledge AND (the right sort
of) ontology
Pragmatic knowledge: what users usually say or
think, what they consider important, how to
integrate in software
Knowledge of classification and coding systems:
how an expression has been classified by such a
system
Knowledge of definitions and criteria: how to
determine if a concept applies to a particular
instance
Surface linguistic knowledge: how to express the
concepts in any given language
Conceptual knowledge: the knowledge of sensible
domain concepts
Ontology: what exists and how what exists relates
to each other
95Meaning in the machine
96Understanding content (1)
John Doe has a pyogenic granuloma of the left
thumb
John Doe has a pyogenic granuloma of the left
thumb
97Understanding content (2)
<record> <patient>John Doe</patient> <diagnosis>pyogenic granuloma of the left thumb</diagnosis> </record>
<record> <subject>John Doe</subject> <diagnosis>pyogenic granuloma of the left thumb</diagnosis> </record>
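What a program actually obtains from such records can be shown with a standard XML parser. In the sketch below the two records of the slide are parsed with Python's `xml.etree.ElementTree`; the helper name `fields` is hypothetical. The parser returns tag names and character strings, nothing more: it has no basis for deciding that `patient` and `subject` denote the same role, and the diagnosis remains one unanalysed string.

```python
import xml.etree.ElementTree as ET

RECORD_A = (
    "<record><patient>John Doe</patient>"
    "<diagnosis>pyogenic granuloma of the left thumb</diagnosis></record>"
)
RECORD_B = (
    "<record><subject>John Doe</subject>"
    "<diagnosis>pyogenic granuloma of the left thumb</diagnosis></record>"
)


def fields(xml_text):
    """Map each child tag of the root element to its (stripped) text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


a = fields(RECORD_A)
b = fields(RECORD_B)

print(sorted(a))                          # tag names found in record A
print(sorted(b))                          # tag names found in record B
print(set(a) == set(b))                   # the two records do not share a schema
print(a["diagnosis"] == b["diagnosis"])   # equal only as character strings
```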
98The XML misconception
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cr-radio.xsl"?>
<CR-RADIOLOGIE>
<ENTETE>
<INFORMATION-SERVICE>
<HOPITAL>Groupe hospitalier Léonard Devintscie</HOPITAL>
<SERVICE>Radiologie Centrale</SERVICE>
<MEDECIN>Dr. Bouaud</MEDECIN>
<TITRE-EXAMEN>Phlébographie des membres inférieurs</TITRE-EXAMEN>
</INFORMATION-SERVICE>
<INFORMATION-DEMANDE>
<SERVICE>Sce Pr. Charlet</SERVICE>
<MEDECIN>Dr. Brunie</MEDECIN>
<DATE>29-10-99</DATE>
</INFORMATION-DEMANDE>
<INFORMATION-PATIENT ID="236784020">
<NOM>Donald</NOM>
<PRENOM>Duck</PRENOM>
</INFORMATION-PATIENT>
</ENTETE>
<BODY>
<INDICATION>Suspicion de phlébite de jambe gauche</INDICATION>
<TECHNIQUE>Ponction bilatérale d'une veine du dos du pied et injection de 180cc de produit de contraste</TECHNIQUE>
<RESULTATS>image lacunaire endoluminale visible au niveau des veines péronières gauches. Absence d'opacification des veines tibiales antérieures et postérieures gauches. Les veines iliaques et la veine cave inférieure sont libres.</RESULTATS>
<CONCLUSION>Thrombophlébite péronière et probablement tibiale antérieure et postérieure gauche.</CONCLUSION>
</BODY>
</CR-RADIOLOGIE>
99Towards Machine Readable Semantics
 | Data about | Formalism | Cases (static, dynamic) | Standard
Form | Style Type Definition | CSS | Bold, Centred, Align Left, Blink | Layout
Structure | Document Type Definition | XML | Title, Paragraph, Heading1, Play | Outline
Meaning | Information Type Definition | RDF | Subject, isPartOf, Date, After_value | Content
Function | Knowledge Type Definition | OWL | Utility, affectedBy, Receive, Protect | Behaviour
Usage | Workflow Type Definition | ? | Actor, Receival, Maintenance, Archival | Process
Hao Ding, Ingeborg T. Sølvberg
100Understanding content (3)
<129465004>
  <116154003>John Doe</116154003>
  <8319008>
    17372009
    <finding site>
      76505004
      <laterality>7771000</laterality>
    </finding site>
  </8319008>
</129465004>
101Text-based knowledge discovery
- Goal
- Finding new biomedical scientific knowledge by combining existing knowledge as represented in the medical literature
- Motivation
- Avoid re-inventing the wheel; re-use specific knowledge outside its original domain of discovery
102Swanson
Substance A -> Effects B -> Disease C
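Swanson's A-B-C scheme can be sketched as a join over two sets of literature statements: if A is reported to affect B, and B is reported to be involved in C, then A-C becomes a candidate hypothesis when no direct A-C link is known. The term pairs below are a toy illustration modelled on Swanson's well-known fish oil example, not data extracted from a corpus, and the function name is hypothetical.

```python
from collections import defaultdict

# Toy statements (assumption: hand-written for illustration only).
A_AFFECTS_B = {
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
}
B_INVOLVED_IN_C = {
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
}
KNOWN_A_C = set()  # direct A-C links already stated in the literature

def abc_hypotheses(a_b, b_c, known_a_c):
    """Map each unknown (A, C) pair to the sorted B terms that connect them."""
    c_by_b = defaultdict(set)
    for b, c in b_c:
        c_by_b[b].add(c)
    links = defaultdict(set)
    for a, b in a_b:
        for c in c_by_b.get(b, ()):
            if (a, c) not in known_a_c:
                links[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in links.items()}

hypotheses = abc_hypotheses(A_AFFECTS_B, B_INVOLVED_IN_C, KNOWN_A_C)
print(hypotheses)
```

The number of connecting B terms gives a crude ranking signal; a real system would need the A-B and B-C statements to come from text analysis in the first place.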
103Protein-Protein Interaction extracted from texts
by C. Blaschke
104Steps of Knowledge Discovery
- Training data gathering
- Feature generation
- k-grams, domain know-how, ...
- Feature selection
- Entropy, χ², CFS, t-test, domain know-how, ...
- Feature integration
- SVM, ANN, PCL, CART, C4.5, kNN, ... (some classifiers/learning methods)
Limsoon Wong
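Two of these steps, feature generation with k-grams and feature selection with χ², can be sketched in a few lines. This is a minimal illustration under stated assumptions: character 3-grams as features, a 2x2 χ² statistic per feature, and four invented toy documents with binary labels; none of this is taken from the original system.

```python
from collections import Counter

def kgrams(text, k=3):
    """Set of character k-grams of a lower-cased text."""
    s = text.lower()
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def chi_square(n11, n10, n01, n00):
    """Chi-square of a 2x2 table: feature present/absent (first index)
    versus class positive/negative (second index)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def score_kgrams(docs, labels, k=3):
    """Score every k-gram by how unevenly it is spread over the two classes."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label else neg).update(kgrams(doc, k))
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    return {g: chi_square(pos[g], neg[g], n_pos - pos[g], n_neg - neg[g])
            for g in set(pos) | set(neg)}

# Toy training data (assumption): 1 = interaction sentence, 0 = other.
DOCS = ["leptin inhibits secretion", "insulin inhibits lipolysis",
        "patient has fever", "patient has cough"]
LABELS = [1, 1, 0, 0]

scores = score_kgrams(DOCS, LABELS)
selected = sorted(scores, key=lambda g: (-scores[g], g))[:5]
print(selected)
```

The selected features would then be handed to one of the listed learning methods for the integration step.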
105Functional components for text-based feature generation system
- Basic use components (end-user)
- Corpus Management tool
- Parser
- Export module
- Management components
- Corpus editor (super user)
- Grammar building workbench (super user)
- Domain Ontology editor (super user)
- Parser generator (exporter)
- Linguistic ontology, multi-lingual use (exporter)
106What does it take to build such a system?
- Short term: single domain
- Corpus collection and analysis
- Domain model design and implementation
- Grammar Development
- Corpus Manipulation Engine
- Integration in Biomining package
- Long term: generic system
- Grammar Building Workbench
- Parser Generator
- Documentation
107A statistics only system
108Relative Concept/Node identification (real)
- Statistical analysis is powerful, but not enough
(Figure: concepts versus nodes)
109One word, multiple meanings
- Abbreviation Extraction (Schwartz 2003)
- Extracts short and long form pairs
Short form | Long form
AA | Alcoholic Anonymous
 | American
 | Americans
 | Arachidonic acid
 | arachidonic acid
 | amino acid
 | amino acids
 | anaemia
 | anemia
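How such short form / long form pairs can be found is sketched below for the common pattern "long form (SF)". The matching loop follows the core idea usually attributed to Schwartz and Hearst: walk the short form from right to left and match each character in the preceding text, requiring the first character to start a word. This is a simplified sketch, not their implementation: the validity filters are omitted, and the regular expression and function names are assumptions.

```python
import re

# Assumption: a short form is 2-10 letters/digits/hyphens inside parentheses.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")

def best_long_form(short, candidate):
    """Match `short` right-to-left inside `candidate`; return the long form or None."""
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        ch = short[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first character of the
        # short form must also sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or \
              (s == 0 and l > 0 and candidate[l - 1].isalnum()):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]

def extract_pairs(sentence):
    """Return (short form, long form) pairs for every 'long form (SF)' pattern."""
    pairs = []
    for match in PAREN.finditer(sentence):
        short = match.group(1)
        words = sentence[:match.start()].split()
        window = min(len(short) + 5, len(short) * 2)  # words to search in
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form:
            pairs.append((short, long_form))
    return pairs

pairs_1 = extract_pairs("The release of arachidonic acid (AA) was measured")
pairs_2 = extract_pairs("He attended Alcoholics Anonymous (AA) meetings")
print(pairs_1, pairs_2)
```

Run over a large corpus, the same short form collects many long forms, which is exactly the ambiguity the AA table illustrates.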
110Syntactic variant detection
- Corpus
- MEDLINE, the largest collection of abstracts in the biomedical domain
- Rule learning
- 83,142 abstracts
- Obtained rules: 14,158
- Evaluation
- 18,930 abstracts
- Count the occurrences of each generated variant
Tsuruoka et al., SIGIR 2003
111Results: antiinflammatory effect
Generation Probability | Generated Variants | Frequency
1.0 (input) | antiinflammatory effect | 7
0.462 | anti-inflammatory effect | 33
0.393 | antiinflammatory effects | 6
0.356 | Antiinflammatory effect | 0
0.286 | antiinflammatory-effect | 0
0.181 | anti-inflammatory effects | 23
112Results: tumour necrosis factor alpha
Generation Probability | Generated Variants | Frequency
1.0 (input) | tumour necrosis factor alpha | 15
0.492 | tumor necrosis factor alpha | 126
0.356 | tumour necrosis factor-alpha | 30
0.235 | Tumour necrosis factor alpha | 2
0.175 | tumor necrosis factor-alpha | 182
0.115 | Tumor necrosis factor alpha | 8
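The listed generation probabilities are consistent with multiplying per-rule probabilities: 0.492 x 0.356 is about 0.175, and 0.492 x 0.235 is about 0.115. A minimal sketch of variant generation on that basis follows. The three probabilities are read off the table; the rule functions are hypothetical stand-ins for the learned rules, and the multiplication itself is an inference from the numbers rather than something the slide states.

```python
from itertools import combinations

# (name, probability from the table, hypothetical rewrite function)
RULES = [
    ("us_spelling", 0.492, lambda t: t.replace("tumour", "tumor")),
    ("hyphenate_last_word", 0.356, lambda t: "-".join(t.rsplit(" ", 1))),
    ("capitalise", 0.235, lambda t: t[0].upper() + t[1:]),
]

def generate_variants(term, rules=RULES, max_rules=2):
    """Apply every combination of up to `max_rules` rules; a variant's
    probability is the product of the probabilities of the rules used."""
    variants = {}
    for n in range(1, max_rules + 1):
        for combo in combinations(rules, n):
            text, prob = term, 1.0
            for _name, p, rewrite in combo:
                text = rewrite(text)
                prob *= p
            if text != term:
                variants[text] = max(prob, variants.get(text, 0.0))
    return variants

variants = generate_variants("tumour necrosis factor alpha")
for text, prob in sorted(variants.items(), key=lambda kv: -kv[1]):
    print(f"{prob:.3f}  {text}")
```

Counting each generated variant in an evaluation corpus, as in the Frequency column, then shows which low-probability variants are in fact the common ones.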
113Biomedical NE Task (Collier, COLING 2000; Kazama, ACL 2002; Kim, ISMB 2002)
- Recognize names in the text
- Technical terms expressing proteins, genes,
cells, etc.
Thus, CIITA not only activates the expression of
class II genes but recruits another B
cell-specific coactivator to increase
transcriptional activity of class II promoters in
B cells .
Junichi Tsujii
114Kohonen clustering
115Kohonen clustering
116Domain-specific CUE-words
if (domain.equals("PROTEINS"))
    subjObjVerbs_ar = new Object[] {
        "abolish", "abolishes", "abolished", "abolishing",
        "accompany", "accompanies", "accompanied", "accompanying",
        "acetylate", "acetylates", "acetylated", "acetylating",
        "activate", "activates", "activated", "activating",
        "affect", "affects", "affected", "affecting",
        ....
if (domain.equals("PROTEINS"))
    ofByNouns_ar = new Object[] {
        "acetylation", "activation", "affection", "aggregation", "altering",
        "amelioration", "antagonization", "association", "augmentation", "binding",
        "blocking", "blockage", ....
117Inter-protein relationship discovery
- Leptin rapidly inhibits hypothalamic neuropeptide Y secretion and stimulates corticotropin-releasing hormone secretion in adrenalectomized mice.
- (leptin)-INHIBITS-(hypothalamic neuropeptide Y secretion)
- (leptin)-INHIBITS-(neuropeptide Y)
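A cue-verb approach to such relationships can be sketched as follows: find a cue verb, take what precedes the first verb as subject, and take what follows each verb up to a boundary word as object. This is a deliberately crude toy under stated assumptions: the cue-verb table, the boundary words and the "-ly" adverb filter are all hypothetical, and the sketch does not produce the reduced form (leptin)-INHIBITS-(neuropeptide Y), which needs knowledge that secretion of a substance implicates the substance itself.

```python
# Hypothetical cue verbs mapped to relation names.
CUE_VERBS = {
    "inhibit": "INHIBITS", "inhibits": "INHIBITS",
    "stimulate": "STIMULATES", "stimulates": "STIMULATES",
    "activate": "ACTIVATES", "activates": "ACTIVATES",
}
# Hypothetical words that end an object phrase.
BOUNDARIES = {"and", "in", "by", "to", ",", "."}

def extract_relations(sentence):
    """Return (subject, RELATION, object) triples found via cue verbs."""
    tokens = sentence.split()
    verb_positions = [i for i, t in enumerate(tokens) if t.lower() in CUE_VERBS]
    if not verb_positions:
        return []
    # Subject: everything before the first cue verb, minus "-ly" adverbs.
    subject = " ".join(t for t in tokens[:verb_positions[0]]
                       if not t.lower().endswith("ly")).lower()
    relations = []
    for pos in verb_positions:
        obj = []
        for token in tokens[pos + 1:]:
            if token.lower() in BOUNDARIES:
                break
            obj.append(token)
        if subject and obj:
            relations.append((subject, CUE_VERBS[tokens[pos].lower()], " ".join(obj)))
    return relations

SENTENCE = ("Leptin rapidly inhibits hypothalamic neuropeptide Y secretion and "
            "stimulates corticotropin-releasing hormone secretion in "
            "adrenalectomized mice .")
relations = extract_relations(SENTENCE)
print(relations)
```

A real system would use a parser rather than word boundaries, which is why the grammar and the domain ontology appear as separate components in the architecture.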
118... special patterns
- These results indicate that oTP-1 may prevent luteolysis by inhibiting development of endometrial responsiveness to oxytocin and, therefore, reduce oxytocin-induced synthesis of IP3 and PGF2 alpha.
- (oxytocin)-CAUSES-(synthesis of IP3 and PGF2 alpha)
- (oxytocin)-CAUSES-(pgf2 alpha)
119From syntactic modification to subsumption
- (adj)-(noun): C_adj-noun IS_A C_noun
- steroid hormone IS_A hormone
- fetal liver IS_A liver
- BUT not
- binding factor IS_A factor
- total protein IS_A protein
- two domain IS_A domain
- Usefulness?
- relationship with the C_adj
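The modifier rule and its exceptions can be sketched as a function that proposes an IS_A link for a two-word term unless the modifier is known not to restrict the head noun. The exception list below holds only the three counter-examples from the slide and is an assumption; in practice it would have to come from the ontology, not from a stoplist.

```python
# Modifiers that do not yield a subtype of the head noun
# (assumption: just the counter-examples listed above).
NON_RESTRICTIVE = {"binding", "total", "two"}

def is_a_candidate(term):
    """Propose (term, 'IS_A', head) for a two-word modifier-noun term, else None."""
    words = term.lower().split()
    if len(words) != 2 or words[0] in NON_RESTRICTIVE:
        return None
    return (" ".join(words), "IS_A", words[1])

for term in ["steroid hormone", "fetal liver", "binding factor",
             "total protein", "two domain"]:
    print(term, "->", is_a_candidate(term))
```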
120Text mining and classification