Knowledge Engineering and Semi-Automatic Population of Medical Ontologies Using NLP Methodologies

1
Knowledge Engineering and Semi-Automatic
Population of Medical Ontologies Using NLP
Methodologies

Munich 11.06.2007
Pinar Oezden Wennerberg
pinar.oezden_at_jrc.it

2
Agenda

Knowledge Engineering and Ontology
Definitions, methodologies, guidelines
Medical Terminology and Natural Language Processing (NLP)
The problem of medical terminology
The context: users, tasks and types of information in the medical domain
The role of NLP and knowledge engineering
Motivation for Semi-Automatic Ontology Population
The knowledge acquisition bottleneck
Vast amounts of knowledge available in unstructured and semi-structured text, the WWW, databases, etc.
One example approach
Ontology population via Supervised Machine Learning (ML)
Challenges

3
Knowledge Engineering and Ontologies

Some Definitions
Humans and software agents need knowledge about the world in order to reach good decisions
Such knowledge is typically stored in knowledge bases
Knowledge engineering is the process of building a knowledge base
A knowledge engineer is someone who
investigates a particular domain,
determines what concepts and relations are important in that domain,
and creates a formal representation of the objects and relations in that domain.
(Russell & Norvig, 1995)

4
Knowledge Engineering and Ontologies

An ontology specifies a finite, controlled, extensible and machine-processable vocabulary for a given knowledge base
It consists of concepts, properties, relations and axioms
Knowledge engineering guidelines
Decide what to talk about and choose a vocabulary
Encode general knowledge about the domain and a description of the specific problem case
Execute queries and verify the inferences
(Russell & Norvig, 1995)
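To make the four ingredients and the guideline steps concrete, here is a minimal Python sketch of a toy ontology: a set of concepts, subclass axioms, relations typed by domain and range, and asserted instances, followed by the guideline steps (vocabulary, general knowledge, problem case, query). The class and method names and the top concept Entity are hypothetical illustrations; properties are left out, and a real system would rely on a dedicated ontology language and reasoner.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology (hypothetical): concepts, subclass axioms, typed relations, instances, facts."""
    concepts: set[str] = field(default_factory=set)
    subclass_of: dict[str, str] = field(default_factory=dict)            # axiom: child is-a parent
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)  # name -> (domain, range)
    instances: dict[str, str] = field(default_factory=dict)              # individual -> concept
    facts: set[tuple[str, str, str]] = field(default_factory=set)        # (subject, relation, object)

    def ancestors(self, concept: str) -> list[str]:
        """Follow subclass axioms upward (is-a is transitive); stop on cycles."""
        chain = [concept]
        while chain[-1] in self.subclass_of and self.subclass_of[chain[-1]] not in chain:
            chain.append(self.subclass_of[chain[-1]])
        return chain

    def add_instance(self, individual: str, concept: str) -> None:
        """Only individuals of concepts in the controlled vocabulary are accepted."""
        if concept not in self.concepts:
            raise KeyError(f"unknown concept {concept!r}")
        self.instances[individual] = concept

    def is_a(self, individual: str, concept: str) -> bool:
        """Inferred membership: the asserted concept or any of its ancestors."""
        asserted = self.instances.get(individual)
        return asserted is not None and concept in self.ancestors(asserted)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        """Store a fact only if it respects the relation's domain and range."""
        domain, rng = self.relations[relation]
        if not (self.is_a(subject, domain) and self.is_a(obj, rng)):
            raise ValueError(f"{(subject, relation, obj)} violates {relation}: {domain} -> {rng}")
        self.facts.add((subject, relation, obj))


# 1. Decide on a vocabulary (Entity is a hypothetical top concept).
onto = Ontology(concepts={"Entity", "Disease", "DiseaseCause"})
onto.relations["causes"] = ("DiseaseCause", "Disease")
# 2. Encode general knowledge as axioms.
onto.subclass_of.update({"Disease": "Entity", "DiseaseCause": "Entity"})
# 3. Encode the specific problem case.
onto.add_instance("smoking", "DiseaseCause")
onto.add_instance("cancer", "Disease")
onto.add_fact("smoking", "causes", "cancer")
# 4. Execute queries and verify the inferences.
print(onto.is_a("cancer", "Entity"))        # True: inferred via Disease is-a Entity
print(onto.is_a("cancer", "DiseaseCause"))  # False
```

Rejecting a fact whose arguments do not fit the relation's domain and range is one simple way to verify inferences: asserting that cancer causes smoking would raise a ValueError in this sketch.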

5
Medical Terminologies and Natural Language Processing (NLP)

Problem statement
Numerous heterogeneous medical terminologies and coding schemes exist that need to interoperate
e.g. Systematized Nomenclature of Medicine (SNOMED) for coding patient notes, ICD (International Classification of Diseases), ICD-9-CM for billing purposes, RIZIV, IDEWE, ICPC-2, ATC, etc.
Existing efforts: UMLS, Galen, MeSH, etc.
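The interoperability problem can be pictured as a crosswalk: codes from different schemes that denote the same clinical idea are mapped onto one shared concept identifier, loosely in the spirit of the concept identifiers used by UMLS. In this sketch every code and identifier is an invented placeholder, not a real SNOMED, ICD-9-CM or ICPC-2 code, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk: (scheme, local code) -> shared concept identifier.
# All codes are placeholders, NOT real SNOMED / ICD-9-CM / ICPC-2 codes.
CROSSWALK = {
    ("SNOMED", "snomed-code-1"): "CONCEPT-asthma",
    ("ICD-9-CM", "icd9-code-1"): "CONCEPT-asthma",
    ("ICPC-2", "icpc-code-1"): "CONCEPT-asthma",
}


def to_concept(scheme, code):
    """Map a scheme-specific code to the shared concept, or None if unmapped."""
    return CROSSWALK.get((scheme, code))


def group_records(records):
    """Group (record_id, scheme, code) rows by shared concept across schemes."""
    groups = defaultdict(list)
    for record_id, scheme, code in records:
        concept = to_concept(scheme, code)
        groups[concept if concept else "UNMAPPED"].append(record_id)
    return dict(groups)


records = [
    ("note-1", "SNOMED", "snomed-code-1"),   # clinical note
    ("claim-7", "ICD-9-CM", "icd9-code-1"),  # billing claim
    ("gp-3", "ICPC-2", "icpc-code-1"),       # primary-care encounter
    ("claim-8", "ICD-9-CM", "icd9-code-9"),  # no mapping known
]
print(group_records(records))
# {'CONCEPT-asthma': ['note-1', 'claim-7', 'gp-3'], 'UNMAPPED': ['claim-8']}
```

Records without a mapping are collected separately rather than silently dropped, since an unmapped code is exactly where manual curation is needed.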

6
Medical Terminologies and Natural Language Processing (NLP)

Definition of context
Information types to be collected concern
Individuals (e.g. medical records)
Groups (e.g. epidemiology and public health data)
Institutions (e.g. planning and management in hospitals and clinics)
Domain-specific knowledge (e.g. state-of-the-art publications, proceedings)
Domain-relevant tasks
Data entry, querying and retrieval of patient information
Information sharing and integration across different applications and medical records

7
Medical Terminologies and Natural Language Processing (NLP)
Information Extraction
Knowledge Representation and Reasoning
Question Answering
Natural Language Processing
Machine Learning
Information Retrieval
Knowledge Discovery, Text Mining
Ontology Engineering
Adapted from Jena University, www.julielab.de

8
Motivation for Semi-Automatic Ontology Population

The knowledge acquisition bottleneck
Ideally, the knowledge engineer interviews the domain expert to learn about the domain, i.e. to acquire knowledge
→ expensive in time and resources
→ domain experts are not always available
Availability of vast amounts of knowledge
In resources such as medical databases, journals, publications, conference proceedings, medical reports, etc.
On the World Wide Web

9
Ontology Population via Supervised Machine Learning

Problem statement
Identify and extract relevant knowledge (terms, phrases, relations, facts) in text, e.g.
Terms: health disorder, malfunction, sickness, illness, maladie, Krankheit → Disease
Smoking causes cancer → <Smoking, Cancer>
Goal
Assign them to the appropriate concepts of the ontology as instances
Concept: Disease
Relation: causes
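The target of population can be made concrete with a minimal sketch. Here the term-to-concept assignment is a hand-written lexicon lookup built from the slide's example terms (a stand-in assumption; in the presented approach the assignment is learned), and extracted facts are stored as pairs under their relation. The function names and the unmatched term "aspirin" are hypothetical.

```python
import unicodedata

# Hypothetical lexicon from the slide's examples: surface forms in several
# languages that should all populate the concept Disease.
LEXICON = {
    "health disorder": "Disease",
    "malfunction": "Disease",
    "sickness": "Disease",
    "illness": "Disease",
    "maladie": "Disease",    # French
    "krankheit": "Disease",  # German
}


def normalize(term: str) -> str:
    """Unicode-normalize, case-fold and trim a surface form before lookup."""
    return unicodedata.normalize("NFKC", term).casefold().strip()


def populate(terms, facts):
    """Assign known terms to concepts and (subject, relation, object) facts to relations."""
    ontology = {"concepts": {}, "relations": {}}
    for term in terms:
        concept = LEXICON.get(normalize(term))
        if concept is not None:  # unknown terms are left for a human to assign
            ontology["concepts"].setdefault(concept, []).append(term)
    for subj, rel, obj in facts:
        ontology["relations"].setdefault(rel, []).append((subj, obj))
    return ontology


result = populate(["Illness", "Krankheit", "aspirin"], [("Smoking", "causes", "Cancer")])
print(result)
# {'concepts': {'Disease': ['Illness', 'Krankheit']}, 'relations': {'causes': [('Smoking', 'Cancer')]}}
```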

10
Ontology Population via Supervised Machine Learning

Processes
Annotate (i.e. supervised)
<CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>
CAU: DiseaseCause, CAU-R: causalRelation, DIS: Disease
Learn and extract from a training set (i.e. ideal world)
Extract from the test set (i.e. unknown world)
Apply the learned rules to new documents to discover and extract new knowledge
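The annotation step can be sketched as parsing inline tags into (label, text) pairs, reading the slide's example as <CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS> with the tag meanings listed there. The regular expression and function names are assumptions made for illustration; the slide does not specify the annotation tool or file format.

```python
import re

# Tag meanings as listed on the slide.
TAGS = {"CAU": "DiseaseCause", "CAU-R": "causalRelation", "DIS": "Disease"}
# An opening tag, its text, and the matching closing tag (backreference \1).
ANNOTATION = re.compile(r"<([A-Z-]+)>(.*?)</\1>")


def parse_annotations(text: str) -> list[tuple[str, str]]:
    """Return (ontology label, annotated text) pairs in document order."""
    pairs = []
    for tag, span in ANNOTATION.findall(text):
        if tag not in TAGS:
            raise ValueError(f"unknown tag {tag!r}")
        pairs.append((TAGS[tag], span))
    return pairs


def strip_annotations(text: str) -> str:
    """Remove the markup, yielding the raw sentence an unannotated document would contain."""
    return ANNOTATION.sub(r"\2", text)


example = "<CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>"
print(parse_annotations(example))
# [('DiseaseCause', 'Smoking'), ('causalRelation', 'causes'), ('Disease', 'cancer')]
print(strip_annotations(example))
# Smoking causes cancer
```

The labeled pairs form the training set; the stripped sentence shows what the learned rules will see in new, unannotated documents.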

11
Ontology Population via Supervised Machine Learning

Learn and extract from a training set (i.e. ideal world)
Recognize syntactic constructs such as NPs, VPs, PPs
Generate extraction rules
Rule for concept Disease
Disease :- <NP smoking><VP causes><NP DIS>
Rule for concept DiseaseCause
DiseaseCause :- <NP CAU><VP causes><NP cancer>
Rule for relation causalRelation
causalRelation :- <NP smoking><VP CAU-R><NP cancer>
Classify
Disease: cancer
DiseaseCause: smoking
causalRelation: causes
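A minimal sketch of rule generation and classification, assuming the sentence has already been chunked into NPs and VPs (the chunker itself is not shown) and that a rule is simply the training sentence with one annotated chunk turned into a slot, mirroring the three rules above. This is a simplified exact-match reading; the presented system's rule learner may generalize differently. Chunk, SLOT and the function names are hypothetical, as is the unseen test sentence.

```python
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    kind: str  # syntactic construct, e.g. "NP" or "VP"
    text: str


SLOT = None  # marks the rule position whose filler gets extracted


def generate_rules(chunks, labels):
    """One rule per annotated chunk: turn that chunk into a slot, keep the rest literal."""
    rules = {}
    for i, label in enumerate(labels):
        rules[label] = tuple(
            (c.kind, SLOT if j == i else c.text.lower()) for j, c in enumerate(chunks)
        )
    return rules


def apply_rule(pattern, chunks) -> Optional[str]:
    """Return the slot filler if every chunk type and literal word matches, else None."""
    if len(pattern) != len(chunks):
        return None
    filler = None
    for (kind, word), chunk in zip(pattern, chunks):
        if kind != chunk.kind:
            return None
        if word is SLOT:
            filler = chunk.text
        elif word != chunk.text.lower():
            return None
    return filler


def classify(rules, chunks):
    """Apply every rule; collect each extracted filler under its ontology label."""
    found = {}
    for label, pattern in rules.items():
        filler = apply_rule(pattern, chunks)
        if filler is not None:
            found[label] = filler
    return found


train = [Chunk("NP", "Smoking"), Chunk("VP", "causes"), Chunk("NP", "cancer")]
rules = generate_rules(train, ["DiseaseCause", "causalRelation", "Disease"])
print(classify(rules, train))
# {'DiseaseCause': 'Smoking', 'causalRelation': 'causes', 'Disease': 'cancer'}

# Unseen sentence: only the Disease rule's literal context still matches.
test = [Chunk("NP", "smoking"), Chunk("VP", "causes"), Chunk("NP", "bronchitis")]
print(classify(rules, test))
# {'Disease': 'bronchitis'}
```

The second call shows why such rules can populate the ontology with new instances: the Disease rule keeps "smoking causes" as fixed context and accepts any NP in the slot.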

12
Ontology Population via Supervised Machine Learning

Possible problems
More than one value was extracted for a given relation
Entities from different classes were extracted (multiple concept assignment, i.e. ambiguity)
Nothing was extracted
Possible solutions
Present the user with all possible values and let the user decide
Assist the user in the decision process by assigning confidence scores to possible values, i.e. how strongly the system believes that what it suggests is relevant/true
Provide context information via text highlighting to justify the system's confidence
Provide empty data entry slots for users to enter their own knowledge
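The three problem cases and the suggested remedies can be sketched as follows, assuming each extraction arrives as a (value, concept) pair for one slot. The frequency-based confidence score, the [[...]] highlighting and the function names are illustrative choices, not the presented system's method, and the example sentence is hypothetical.

```python
from collections import Counter


def rank_candidates(extractions):
    """Score each candidate value by the share of extractions supporting it.

    The frequency-based score is an illustrative assumption, not a calibrated probability.
    """
    if not extractions:
        return []  # nothing extracted: the caller shows an empty slot for manual entry
    counts = Counter(value for value, _ in extractions)
    total = sum(counts.values())
    return sorted(((v, n / total) for v, n in counts.items()), key=lambda p: -p[1])


def diagnose(extractions):
    """Report which of the three problem cases applies, if any."""
    if not extractions:
        return "nothing extracted"
    if len({concept for _, concept in extractions}) > 1:
        return "ambiguous concept assignment"
    if len({value for value, _ in extractions}) > 1:
        return "multiple values"
    return "ok"


def highlight(text, value):
    """Mark the first occurrence of a suggested value so the user sees its context."""
    start = text.find(value)
    if start < 0:
        return text
    return f"{text[:start]}[[{value}]]{text[start + len(value):]}"


sentence = "Smoking causes cancer and heart disease."
extractions = [("cancer", "Disease")] * 3 + [("heart disease", "Disease")]
print(diagnose(extractions))         # multiple values
print(rank_candidates(extractions))  # [('cancer', 0.75), ('heart disease', 0.25)]
print(highlight(sentence, "cancer")) # Smoking causes [[cancer]] and heart disease.
```

All candidates are returned, ranked rather than filtered, so the final decision stays with the user.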

13
Challenges

General challenges
It is difficult to eliminate the knowledge acquisition problem entirely
Due to the sensitivity of the domain (human health), domain experts cannot be avoided completely
Computer scientists need to work together with domain experts to a certain extent
Systems should be usable by non-technicians
Multilinguality
Healthcare workers, patients and administrators should have access to information in their own language

14
Challenges

Knowledge/ontology engineering-specific challenges
Implicit information (typical for natural language), i.e. not machine-processable (not explicit)
Different levels of detail (granularity) are required to meet different expectations
i.e. provide sufficient detail but abstract away irrelevancies
Poly-hierarchies to support multiple views
may lead to ambiguities and contradictions
Adaptability and extensibility for changing user demands and standards
Expressivity vs. computational tractability
Achieving consensus between practitioners

15
Questions?

Evaluation
How do we know if we have a good system?
Practitioners to evaluate the efficiency and reliability of the developed systems?

Thank You!
