Converting Semi-structured Clinical Medical Records into Information and Knowledge - PowerPoint PPT Presentation

Provided by: NanZ1

1
Converting Semi-structured Clinical Medical
Records into Information and Knowledge
  • Dr. Hyoil Han and Xiaohua Zhou
  • College of Information Science and Technology,
    Drexel University

2
Agenda
  • Problem Addressed
  • Methods
  • Approach to numeric values
  • Approach to medical terms
  • Approach to text classification
  • Implementation
  • Evaluation
  • Future Work

3
Problem Addressed
  • Descriptions
    • Automatically extract information from
      semi-structured patient records.
    • Three types of information
      • Numbers: blood pressure, weight, pulse, etc.
      • Medical terms: medical history, surgical history
      • Text classification: smoking behavior, alcohol
        use, appearance, etc.
    • Each record consists of multiple sections
      beginning with fixed strings. Each section is
      written in natural language.

4
Problem Addressed (cont.)
  • Examples

5
Problem Addressed (cont.)
  • Examples

6
Approach to Numeric Values (1)
  • Number Identification
    • Tokenization
    • Named Entity Recognition
  • Concept Identification
    • String Match
    • Synonym Expansion
  • Association
    • Pattern-based association approach
    • Linkage-based association approach
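
The first two steps above (number identification, then concept identification by string match with synonym expansion) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synonym table, the number pattern, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical synonym table: surface form -> canonical concept name.
SYNONYMS = {
    "blood pressure": "blood_pressure",
    "bp": "blood_pressure",
    "weight": "weight",
    "wt": "weight",
    "pulse": "pulse",
    "heart rate": "pulse",
}

# Assumed number shapes: integers, decimals, and ratios such as 120/80.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?(?:/\d+)?")


def find_numbers(sentence):
    """Return (start_offset, text) for each number-like token."""
    return [(m.start(), m.group()) for m in NUMBER_RE.finditer(sentence)]


def find_concepts(sentence):
    """Return (start_offset, canonical_name) for each known surface form."""
    hits = []
    lowered = sentence.lower()
    for surface, canonical in SYNONYMS.items():
        # Whole-word string match against every synonym of every concept.
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            hits.append((m.start(), canonical))
    return sorted(hits)
```

Both functions return character offsets so that a later association step can relate concepts and numbers found in the same sentence.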

7
Approach to Numeric Values (2)
  • Pattern-based Approach
    • Examples
      • CONCEPT is NUMBER
      • CONCEPT of NUMBER
      • CONCEPT, NUMBER
      • CONCEPT NUMBER
    • Very simple, but generalizes poorly to phrasings
      outside the listed patterns.
  • Linkage-based Association Approach
    • Convert the linkage diagram (produced by the link
      grammar parser) to a graph.
    • Calculate the shortest distance between every
      concept-number pair in a sentence.
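
The four CONCEPT/NUMBER patterns listed above can be sketched as regular expressions. The number pattern, the separator list, and the function name `associate` are assumptions for illustration only; the slides do not say how the patterns were encoded.

```python
import re

# Assumed number shapes: integers, decimals, and ratios such as 120/80.
NUMBER = r"(?P<number>\d+(?:\.\d+)?(?:/\d+)?)"

# One separator per pattern, tried in order:
# "CONCEPT is NUMBER", "CONCEPT of NUMBER", "CONCEPT, NUMBER", "CONCEPT NUMBER".
SEPARATORS = [r"\s+is\s+", r"\s+of\s+", r"\s*,\s*", r"\s+"]


def associate(sentence, concepts):
    """Map each concept found in `sentence` to the number that follows it."""
    found = {}
    for concept in concepts:
        for sep in SEPARATORS:
            pattern = r"\b" + re.escape(concept) + sep + NUMBER
            m = re.search(pattern, sentence, flags=re.IGNORECASE)
            if m:
                found[concept] = m.group("number")
                break  # first matching pattern wins
    return found
```

A sentence such as "Pulse was measured at 72." matches none of the four patterns, which illustrates the generalization problem noted above.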

8
Approach to Numeric Values (3)
  • Link Grammar Parser
    • Convert each word to a node and each link to a
      (weighted) edge.
    • Assumption: if a number is the value of a certain
      concept, the number's shortest distance from that
      concept must be less than its distance from any
      other concept in the sentence.

9
Approach to Medical Terms (1)
  • State of the Art
    • Current NER algorithms do not work well for
      medical term identification.
    • An ontology is important for achieving high
      accuracy in medical term extraction.
    • Searching the ontology for every possible word
      sequence in a sentence is inefficient.
  • Solution
    • POS-based Ordered Patterns Search

10
Approach to Medical Terms (2)
  • Flow
    • Part-of-speech tagging
    • Ordered pattern matching, for example:
      • JJ NN NN
      • NN NN
      • JJ NN
      • NN
    • Normalization of the candidate term
    • Search for the candidate term in the ontology
      (e.g., UMLS)
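
The flow above can be sketched on a pre-tagged sentence. Everything here is an assumption made for illustration: the input is taken to be already POS-tagged, `ONTOLOGY` is a tiny hypothetical stand-in for a real ontology lookup such as UMLS, and normalization is reduced to lowercasing.

```python
# Patterns are tried in this order, longest first, as on the slide.
PATTERNS = [("JJ", "NN", "NN"), ("NN", "NN"), ("JJ", "NN"), ("NN",)]

# Hypothetical stand-in for an ontology lookup.
ONTOLOGY = {"congestive heart failure", "heart failure", "hypertension", "diabetes"}


def normalize(words):
    """Simplified normalization: lowercase and join with single spaces."""
    return " ".join(word.lower() for word in words)


def extract_terms(tagged):
    """Scan (word, tag) pairs left to right and keep ontology-confirmed terms."""
    terms = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) != pattern:
                continue
            candidate = normalize(word for word, _ in window)
            if candidate in ONTOLOGY:
                terms.append(candidate)
                i += len(pattern)  # skip past the matched term
                break
        else:
            i += 1  # no pattern produced an ontology hit at this position
    return terms
```

Only tag sequences that match a pattern are looked up, which is what avoids searching the ontology for every possible word sequence.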

11
Approach to Text Classification (1)
  • Available Methods
    • Analytic approach
    • Machine learning
    • Decision trees are frequently used in natural
      language understanding.
  • Examples
    • Each patient is either a current smoker, a former
      smoker, or a nonsmoker.
    • Texts
      • "She quit smoking five years ago" (former smoker)
      • "She is currently a smoker" (current smoker)
      • "None" (nonsmoker)

12
Approach to Text Classification (2)
  • Word-based Boolean Feature Extraction
    • Choose one or more parts of speech: verb, noun,
      adjective, and adverb.
    • Choose one or more sentence constituents:
      subject, verb, object, and supplement.
    • Head noun or head adjective only. If this option
      is enabled, only the head word of a noun phrase
      or adjective phrase is extracted.
    • Use the lemma (uninflected form) of each word. If
      this option is enabled, "denies", "denied", and
      "deny" are treated as the same feature.
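
The part-of-speech and lemma options above can be sketched as follows. The input is assumed to be already tagged with Penn Treebank style tags, and `LEMMAS` is a small hypothetical lookup table standing in for a real lemmatizer; the constituent and head-word options are not modeled.

```python
# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {
    "denies": "deny",
    "denied": "deny",
    "smoking": "smoke",
    "smokes": "smoke",
}


def extract_features(tagged, keep_pos=("VB", "NN", "JJ", "RB"), use_lemma=True):
    """Return the set of word features for one text.

    `keep_pos` holds tag prefixes for verb, noun, adjective, and adverb.
    """
    features = set()
    for word, tag in tagged:
        if not tag.startswith(keep_pos):
            continue  # drop words whose part of speech was not selected
        word = word.lower()
        features.add(LEMMAS.get(word, word) if use_lemma else word)
    return features


def to_boolean_vector(features, vocabulary):
    """One Boolean per vocabulary word: is the word present in the text?"""
    return [word in features for word in vocabulary]
```

With the lemma option on, "She denies smoking" and "She denied smoking" yield the same feature set, so a classifier sees them as identical.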

13
Approach to Text Classification (3)
  • ID3-based Decision Tree
    • The criterion for feature selection is maximum
      information gain (mutual information).
    • ID3 yields fewer features than other algorithms.
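
The selection criterion above can be sketched for Boolean word features: at each node, ID3 picks the feature whose presence or absence most reduces label entropy. The function names and the tiny example data are assumptions for illustration; this shows a single split, not a full tree builder.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on presence of `feature`.

    `rows` is a list of feature sets, one per example.
    """
    remainder = 0.0
    for present in (True, False):
        subset = [label for row, label in zip(rows, labels)
                  if (feature in row) == present]
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """The ID3 choice at one node: the feature with maximum information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Made-up examples, loosely modeled on the smoking texts.
ROWS = [{"quit", "smoke"}, {"currently", "smoke"}, set(), {"quit"}]
LABELS = ["former", "current", "non", "former"]
```

On this toy data, splitting on "quit" isolates both former smokers and has the highest gain of the three candidate features.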

14
Approach to Text Classification (4)
  • Example ID3-based Decision Tree for
    Classification of Smoking Behavior.

15
Implementation

16
Evaluation
  • 50 semi-structured patient records
  • The goal is to extract 24 attributes (18 fields):
    4 medical-term attributes, 8 numeric attributes,
    and 12 categorical attributes.
  • Measures
    • Precision is the proportion of extracted
      instances that are correct.
    • Recall is the proportion of all true instances
      that are correctly extracted.

17
Evaluation of Numeric Attributes
  • Precision and recall for all eight numeric
    attributes are 100%.
  • After examining all 50 records manually, we find
    that the extremely high precision is partly
    attributable to the very consistent writing style.
  • If the data set grows and more diverse writing
    styles are introduced, performance may degrade.

18
Evaluation of Smoking Behavior
  • 45 cases: 5 former smokers, 12 current smokers,
    and 28 nonsmokers.
  • 5-fold cross-validation
  • Experiments are run for 10 rounds. (For each
    round, the data set is randomly shuffled.)
  • Average precision (recall) is 92.2%.
  • The number of features used ranges from 4 to 7.
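
The evaluation protocol above (shuffle, split into 5 folds, repeat for 10 rounds, average) can be sketched as follows. The function names, the seed, and the `evaluate` callback are assumptions; the callback stands in for training a decision tree on the training indices and scoring it on the test indices.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv(n, k, rounds, evaluate, seed=0):
    """Average `evaluate(train, test)` over `rounds` reshuffled k-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        for train, test in k_fold_indices(n, k, rng):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

For the setting on the slide, `repeated_cv(45, 5, 10, evaluate)` would average 50 scores, each from a model trained on 36 cases and tested on 9.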

19
Evaluation of Medical Terms (1)
  • Each attribute can have multiple values (medical
    terms).
  • Where
    • ETrue_i: number of extracted true terms in the
      i-th subject.
    • ETotal_i: number of extracted terms in the i-th
      subject.
    • TInst_i: number of total true terms in the i-th
      subject.
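
One way the three counts defined above could combine into precision and recall is sketched below. It assumes the counts are pooled across subjects before dividing (micro-averaging); the transcript does not show which averaging the authors used, so the pooling, the function name, and the set-based input format are all assumptions.

```python
def micro_precision_recall(subjects):
    """Pooled precision and recall over (extracted_terms, true_terms) set pairs."""
    e_true = e_total = t_inst = 0
    for extracted, truth in subjects:
        e_true += len(extracted & truth)   # ETrue_i: extracted and correct
        e_total += len(extracted)          # ETotal_i: everything extracted
        t_inst += len(truth)               # TInst_i: everything that is true
    precision = e_true / e_total if e_total else 0.0
    recall = e_true / t_inst if t_inst else 0.0
    return precision, recall
```

Pooling weights every term equally, so subjects with many terms influence the result more than subjects with few; averaging per-subject ratios instead would weight every subject equally.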

20
Evaluation of Medical Terms (2)
  • Extracted false terms and unextracted true terms
    are mainly caused by the incompleteness of the
    domain ontology.
  • The low recall for predefined past surgical
    history and the low precision for other past
    surgical history share one cause: synonyms of
    predefined surgical terms are not recognized and
    are instead improperly extracted as other surgical
    terms.

Attribute Name                     Precision (%)  Recall (%)
Predefined Past Medical History    96.7           96.7
Other Past Medical History         76.1           86.4
Predefined Past Surgical History   77.8           35
Other Past Surgical History        62.0           75

21
Future Work
  • Test our work on a larger data set
  • A generic framework for any concept associations
  • Medical Terms Extraction
    • Ontology selection
    • The use of synonyms and semantic types
  • Text Classification
    • How to deal with categories containing numeric
      threshold information

22
Questions
  • ?