Knowledge Engineering and Semi-Automatic Population of Medical Ontologies Using NLP Methodologies

1
Knowledge Engineering and Semi-Automatic
Population of Medical Ontologies Using NLP
Methodologies
  • Munich 11.06.2007
  • Pinar Oezden Wennerberg
  • pinar.oezden_at_jrc.it

2
Agenda
  • Knowledge Engineering and Ontology
  • Definitions, methodologies, guidelines
  • Medical Terminology and Natural Language
    Processing (NLP)
  • The problem of medical terminology
  • The context: users, tasks, types of information
    in the medical domain
  • The role of NLP and knowledge engineering
  • Motivation for Semi-Automatic Ontology Population
  • The knowledge acquisition bottleneck
  • Vast amount of knowledge available in (un- /
    semi-)structured text, WWW, databases etc.
  • One example approach
  • Ontology population via Supervised Machine
    Learning (ML)
  • Challenges

3
Knowledge Engineering and Ontologies
  • Some Definitions
  • Humans and software agents need knowledge about
    the world in order to reach good decisions
  • Such knowledge is typically stored in
    knowledge bases
  • Knowledge engineering is the process of building
    a knowledge base
  • A knowledge engineer is someone who
  • investigates a particular domain,
  • determines what concepts and relations are
    important in that domain,
  • and creates a formal representation of objects
    and relations in that domain.
  • (Russell & Norvig, 1995)

4
Knowledge Engineering and Ontologies
  • An ontology specifies a finite, controlled,
    extensible and machine-processable vocabulary for
    a given knowledge base
  • Consists of concepts, properties, relations,
    axioms
  • Knowledge engineering guidelines
  • Decide what to talk about and on the vocabulary,
  • Encode general knowledge and a specific problem
    case
  • Execute queries and verify inference
  • (Russell & Norvig, 1995)
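
The guidelines above can be illustrated with a minimal sketch. The `Ontology` class and all of its method names are hypothetical and invented for this example; a real system would normally use an ontology language and a reasoner. The sketch fixes a vocabulary, encodes general knowledge and one specific case, then runs queries to check the inference.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology (hypothetical): concepts, relations, instances, facts."""
    parents: dict = field(default_factory=dict)    # concept -> parent concept or None
    relations: dict = field(default_factory=dict)  # relation -> (domain, range)
    instances: dict = field(default_factory=dict)  # instance -> concept
    facts: set = field(default_factory=set)        # (subject, relation, object)

    def add_concept(self, name, parent=None):
        self.parents[name] = parent

    def add_relation(self, name, domain, range_):
        self.relations[name] = (domain, range_)

    def add_instance(self, name, concept):
        if concept not in self.parents:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[name] = concept

    def is_a(self, instance, concept):
        """Simple inference: walk up the concept hierarchy."""
        current = self.instances.get(instance)
        while current is not None:
            if current == concept:
                return True
            current = self.parents.get(current)
        return False

    def add_fact(self, subject, relation, obj):
        """Axiom check: subject and object must fit the relation's domain and range."""
        domain, range_ = self.relations[relation]
        if not (self.is_a(subject, domain) and self.is_a(obj, range_)):
            raise ValueError(f"{subject} {relation} {obj} violates domain/range")
        self.facts.add((subject, relation, obj))


# Decide on the vocabulary and encode general knowledge ...
onto = Ontology()
onto.add_concept("Entity")
onto.add_concept("Disease", parent="Entity")
onto.add_concept("DiseaseCause", parent="Entity")
onto.add_relation("causes", domain="DiseaseCause", range_="Disease")

# ... then encode a specific problem case.
onto.add_instance("smoking", "DiseaseCause")
onto.add_instance("cancer", "Disease")
onto.add_fact("smoking", "causes", "cancer")

# Execute queries and verify the inference.
print(onto.is_a("cancer", "Entity"))
print(onto.is_a("cancer", "DiseaseCause"))
```

The domain/range check in `add_fact` stands in for the axioms: a statement such as "cancer causes smoking" is rejected before it reaches the knowledge base.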

5
Medical Terminologies and Natural Language
Processing (NLP)
  • Problem statement
  • Numerous heterogeneous medical terminologies and
    coding schemes exist that need to interoperate
  • e.g. Systematized Nomenclature of Medicine (SNOMED)
    for coding patient notes, ICD (International
    Classification of Diseases), ICD-9-CM for billing
    purposes, RIZIV, IDEWE, ICPC-2, ATC etc.
  • Existing efforts: UMLS, GALEN, MeSH, etc.

6
Medical Terminologies and Natural Language
Processing (NLP)
  • Definition of context
  • Information types to be collected are about:
  • Individuals (e.g. medical records)
  • Groups (e.g. data about epidemiology, public
    health)
  • Institutions (e.g. planning, management in
    hospitals, clinics)
  • Domain specific knowledge (e.g. state-of-the-art
    publications, proceedings)
  • Domain relevant tasks
  • Data entry, query and retrieval about patients
  • Information sharing and integration from
    different applications and medical records

7
Medical Terminologies and Natural Language
Processing (NLP)
  • Figure (labels only): Information Extraction,
    Knowledge Representation and Reasoning, Question
    Answering, Natural Language Processing, Machine
    Learning, Information Retrieval, Knowledge
    Discovery and Text Mining, Ontology Engineering
  • Adapted from Jena University, www.julielab.de

8
Motivation for Semi-Automatic Ontology Population
  • The knowledge acquisition bottleneck
  • Ideally the knowledge engineer interviews the
    knowledge expert to get educated about the domain
    i.e. to acquire knowledge
  • → expensive in time and resources
  • → domain experts not always available
  • Availability of vast amounts of knowledge
  • In resources such as medical databases, journals,
    publications, conference proceedings, medical
    reports etc.
  • World Wide Web

9
Ontology Population via Supervised Machine
Learning
  • Problem statement
  • Identify and extract relevant knowledge (terms,
    phrases, relations, facts) in text e.g.
  • Terms: health disorder, malfunction,
    sickness, illness, maladie, Krankheit →
    Disease
  • Smoking causes cancer → <Smoking, Cancer>
  • Goal
  • Assign them to the appropriate concepts of the
    ontology as instances
  • Concept: Disease
  • Relation: causes
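
A minimal sketch of the two tasks in the problem statement: mapping surface terms to a concept and turning a causal sentence into a pair. The lookup table and both function names are hypothetical. The terms are the ones listed on the slide, and the single "X causes Y" pattern is a simplifying assumption, not the method of the talk.

```python
import re

# Hypothetical synonym table built from the terms on the slide.
TERM_TO_CONCEPT = {
    term: "Disease"
    for term in ["health disorder", "malfunction", "sickness",
                 "illness", "maladie", "krankheit"]
}


def concept_for(term):
    """Return the ontology concept for a term, or None if the term is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def extract_causal_pair(sentence):
    """Turn 'X causes Y' into the pair (X, Y); return None if the pattern is absent."""
    match = re.fullmatch(r"\s*(.+?)\s+causes\s+(.+?)\s*\.?\s*",
                         sentence, flags=re.IGNORECASE)
    if match is None:
        return None
    return (match.group(1).capitalize(), match.group(2).capitalize())


print(concept_for("Krankheit"))
print(extract_causal_pair("Smoking causes cancer"))
```

A fixed table like this only covers known terms. The supervised approach on the following slides is meant to learn such assignments from annotated text instead.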

10
Ontology Population via Supervised Machine
Learning
  • Processes
  • Annotate (i.e. supervised)
  • <CAU>Smoking</CAU> <CAU-R>causes</CAU-R>
    <DIS>cancer</DIS>
  • CAU = DiseaseCause, CAU-R = causalRelation, DIS =
    Disease
  • Learn and extract from a training set (i.e. ideal
    world)
  • Extract from the test set (i.e. unknown world)
  • Apply the learned rules on new documents to
    discover and extract new knowledge
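
The annotation step can be sketched as follows, assuming the training text uses XML-like inline tags of the form <CAU>...</CAU>, <CAU-R>...</CAU-R> and <DIS>...</DIS>. The function names are hypothetical. The code reads each annotated span with its ontology label and recovers the plain sentence.

```python
import re

# Tag abbreviations and the ontology elements they stand for (from the slide).
TAG_TO_LABEL = {
    "CAU": "DiseaseCause",
    "CAU-R": "causalRelation",
    "DIS": "Disease",
}

# Matches <TAG>text</TAG>; the closing tag must repeat the opening tag name.
ANNOTATION = re.compile(r"<(?P<tag>[A-Z-]+)>(?P<text>.*?)</(?P=tag)>")


def read_annotations(annotated):
    """Return (text, label) pairs for every known tag in the annotated string."""
    return [
        (m.group("text"), TAG_TO_LABEL[m.group("tag")])
        for m in ANNOTATION.finditer(annotated)
        if m.group("tag") in TAG_TO_LABEL
    ]


def strip_tags(annotated):
    """Recover the plain sentence by dropping the annotation tags."""
    return ANNOTATION.sub(lambda m: m.group("text"), annotated)


example = "<CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>"
print(read_annotations(example))
print(strip_tags(example))
```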

11
Ontology Population via Supervised Machine
Learning
  • Learn and extract from a training set (i.e. ideal
    world)
  • Recognize syntactic constructs such as NPs, VPs,
    PPs
  • Generate extraction rules
  • Rule for concept Disease
  • Disease :- <NP smoking><VP causes><NP DIS>
  • Rule for concept DiseaseCause
  • DiseaseCause :- <NP CAU><VP causes><NP cancer>
  • Rule for relation causalRelation
  • causalRelation :- <NP smoking><VP CAU-R><NP
    cancer>
  • Classify
  • Disease: cancer
  • DiseaseCause: smoking
  • causalRelation: causes
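
A rough sketch of rule generation and classification, under strong simplifying assumptions: each phrase is a single word, sentences are split on whitespace rather than chunked into NPs, VPs and PPs, and a rule only matches sentences of the same length. All names are hypothetical. For each annotated token, the rule keeps the other tokens literal and turns that token into the slot to be filled.

```python
import re

TAG_TO_LABEL = {"CAU": "DiseaseCause", "CAU-R": "causalRelation", "DIS": "Disease"}
ANNOTATION = re.compile(r"<([A-Z-]+)>(.*?)</\1>")


def generate_rules(annotated):
    """Build one rule per annotated token; None marks the slot to extract."""
    tokens = [(text.lower(), TAG_TO_LABEL[tag])
              for tag, text in ANNOTATION.findall(annotated)]
    rules = {}
    for slot_index, (_, label) in enumerate(tokens):
        rules[label] = tuple(
            None if i == slot_index else text
            for i, (text, _) in enumerate(tokens)
        )
    return rules


def apply_rules(rules, sentence):
    """Return {label: extracted word} for every rule that matches the sentence."""
    words = sentence.lower().rstrip(".").split()
    found = {}
    for label, pattern in rules.items():
        if len(words) != len(pattern):
            continue
        if all(p is None or p == w for p, w in zip(pattern, words)):
            found[label] = words[pattern.index(None)]
    return found


training = "<CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>"
rules = generate_rules(training)
print(rules["Disease"])
print(apply_rules(rules, "Smoking causes cancer"))
# Illustrative unseen sentence, not taken from the slides:
print(apply_rules(rules, "Asbestos causes cancer"))
```

On the unseen sentence only the DiseaseCause rule matches, because its two literal words ("causes", "cancer") are present and the slot picks up the new word.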

12
Ontology Population via Supervised Machine
Learning
  • Possible problems
  • More than one value was extracted for a given
    relation
  • Entities from different classes were extracted
    (multiple concept assignment i.e. ambiguity)
  • Nothing was extracted
  • Possible solutions
  • Present the user all possible values, let the
    user decide
  • Assist the user in the decision process by
    assigning confidence scores to possible values
  • i.e. how strongly the system believes that what
    it suggests is relevant/true
  • Provide context information via text highlighting
    to justify the system's confidence
  • Provide empty data entry slots for users to enter
    their knowledge
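
One way these solutions could fit together is sketched below. The scoring scheme is an assumption made for illustration (each candidate's share of all extractions for a slot), since the slides do not say how confidence is computed. The function names and the sample values are hypothetical.

```python
from collections import Counter


def rank_candidates(extracted_values):
    """Score each candidate by its share of all extractions, highest first."""
    counts = Counter(extracted_values)
    total = sum(counts.values())
    ranked = [(value, count / total) for value, count in counts.items()]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


def suggestions_for_user(extracted_values):
    """Present all candidates with scores, or an empty slot if nothing was found."""
    ranked = rank_candidates(extracted_values)
    if not ranked:
        return {"candidates": [], "needs_manual_entry": True}
    return {"candidates": ranked, "needs_manual_entry": False}


# Several values extracted for the same relation slot (illustrative data).
print(suggestions_for_user(["cancer", "cancer", "bronchitis", "cancer"]))
# Nothing extracted: the user gets an empty data entry slot.
print(suggestions_for_user([]))
```

The system never picks a value by itself here. It only orders the candidates, which leaves the final decision with the user.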

13
Challenges
  • General challenges
  • It is difficult to eliminate the knowledge
    acquisition problem entirely
  • Due to the sensitivity of the domain (human
    health) the knowledge experts cannot be
    completely avoided
  • Computer scientists need to work together with
    domain experts to a certain extent
  • Systems should be able to be used by
    non-technicians
  • Multilinguality
  • Healthcare workers, patients, administrators
    should be able to have access to information in
    their own language

14
Challenges
  • Knowledge/ontology engineering specific
    challenges
  • Implicit information (typical for natural
    language) i.e. not machine-processable (not
    explicit)
  • Different levels of detail (granularity) are
    required to meet different expectations
  • i.e. provide sufficient detail but abstract away
    irrelevancies
  • Poly-hierarchies to support multiple views
  • may lead to ambiguities, contradictions
  • Adaptability, extensibility for changing user
    demands and for standards
  • Expressivity vs. computational tractability
  • Achieving consensus between practitioners

15
Questions?
  • Evaluation
  • How do we know if we have a good system?
  • Should practitioners evaluate the efficiency and
    reliability of the developed systems?

16
  • Thank You!