A Comparative Study of Supervised Learning as Applied to Acronym Expansion in Clinical Reports

1
A Comparative Study of Supervised Learning as
Applied to Acronym Expansion in Clinical
Reports
  • Mahesh Joshi, Serguei Pakhomov,
  • Ted Pedersen, Christopher G. Chute
  • University of Minnesota, Duluth
  • Mayo College of Medicine, Rochester

2
Overview
  • Acronyms are ambiguous
    • in general, and in more specialized domains
  • Acronyms can be disambiguated by expansion
    • expansions act as senses or definitions
  • Acronym expansion can be viewed as word sense
    disambiguation
    • supervised learning from annotated examples
  • Features trump learning algorithms
    • unigrams dominant

3
AMIA - Top Google Results
  • American Medical Informatics Association
  • Association of Moving Image Archivists
  • Anglican Mission in America
  • Asociación Mutual Israelita Argentina

4
RN in Wikipedia
  • Registered Nurse
  • Royal Navy
  • Radio National
  • Radio Nederland
  • Richard Nixon
  • Registered Identification Number
  • Renovación Nacional

5
Acronym Ambiguity not just a problem for General
English
  • 33% of acronyms in UMLS are ambiguous
    • Liu et al., AMIA 2001
  • 81% of acronyms in MEDLINE abstracts are
    ambiguous, with an average of 16 expansions
    • Liu et al., AMIA 2002

6
We view AE as WSD
  • AE
    • sense 1: American Eagle
    • sense 2: Arab Emirates
    • sense 3: acronym expansion
  • WSD
    • sense 1: Washington School for the Deaf
    • sense 2: web server director
    • sense 3: word sense disambiguation

7
Methodology
  • Identify 16 ambiguous acronyms
    • 9 from Pakhomov et al., AMIA 2005
    • 7 newly annotated for this study
  • Manually annotate in clinical notes
    • 7,738 total instances from Mayo Clinic database
      of clinical notes
  • Use as training data for supervised learning

8
Acronyms (majority < 50%)
  • AC
    • Acromioclavicular
    • Antitussive with Codeine
    • Acid Controller
    • 10 more expansions
  • APC
    • Argon Plasma Coagulation
    • Adenomatous Polyposis Coli
    • Atrial Premature Contraction
    • 10 more expansions
  • LE
    • Limited Exam
    • Lower Extremity
    • Initials
    • 5 more expansions
  • PE
    • Pulmonary Embolism
    • Pressure Equalizing
    • Patient Education
    • 12 more expansions

    9
    Acronyms (50% < majority < 80%)
    • CP
      • Chest Pain
      • Cerebral Palsy
      • Cerebellopontine
      • 19 more expansions
    • HD
      • Huntington's Disease
      • Hemodialysis
      • Hospital Day
      • 9 more expansions
    • CF
      • Cystic Fibrosis
      • Cold Formula
      • Complement Fixation
      • 6 more expansions
    • MCI
      • Mild Cognitive Impairment
      • Methylchloroisothiazolinone
      • Microwave Communications, Inc.
      • 5 more expansions
    • ID
      • Infectious Disease
      • Identification
      • Idaho Identified
      • 4 more expansions
    • LA
      • Long Acting
      • Person
      • Left Atrium
      • 5 more expansions

    10
    Acronyms (majority > 80%)
    • MI
      • Myocardial Infarction
      • Michigan
      • Unknown
      • 2 more expansions
    • ACA
      • Adenocarcinoma
      • Anterior Cerebral Artery
      • Anterior Communication Artery
      • 3 more expansions
    • GE
      • Gastroesophageal
      • General Exam
      • Generose
      • General Electric
    • HA
      • Headache
      • Hearing Aid
      • Hydroxyapatite
    • FEN
      • Fluids, Electrolytes and Nutrition
      • Drug Fen Phen
      • Unknown
    • NSR
      • Normal Sinus Rhythm
      • Nasoseptal Reconstruction
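
The acronym slides group acronyms by how dominant their most frequent expansion is, with cut points at 50 and 80. A minimal sketch of how such a grouping could be computed from annotated instances is shown here. The function names, the toy labels, and the handling of values exactly on a cut point are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter


def majority_share(senses):
    """Return (majority sense, its share of all annotated instances)."""
    sense, count = Counter(senses).most_common(1)[0]
    return sense, count / len(senses)


def bucket(share):
    """Assign a distribution group using cut points at 50% and 80%.

    How exact boundary values are treated is an assumption.
    """
    if share < 0.5:
        return "majority < 50%"
    if share <= 0.8:
        return "50% to 80%"
    return "majority > 80%"


# Toy annotations for one acronym (invented counts, not the study's data).
labels = ["Myocardial Infarction"] * 9 + ["Michigan"] * 1
sense, share = majority_share(labels)
print(sense, share, bucket(share))
```

The majority share is also the accuracy of a baseline that always predicts the most frequent expansion, which is why it is a natural yardstick for the supervised results.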

    11
    Experimental Objectives
    • Compare performance of ML methods
      • Naïve Bayesian classifier
      • J48/C4.5 Decision Tree Learner
      • Support Vector Machine (SMO)
    • Compare four different feature sets
      • POS tags from Brill-Hepple Tagger
      • Unigrams that occur 5 or more times
        • flexible window of size 5 around target
      • Bigrams that occur 5 or more times
        • flexible window of size 5 around target
      • Unigrams + Bigrams + POS Tags
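
The "occur 5 or more times" cutoff can be illustrated with a short sketch. It assumes that occurrences are counted over all training instances of an acronym and that each instance is then encoded as a binary presence vector; the slides do not state either detail, and the function names and toy data are hypothetical.

```python
from collections import Counter


def select_features(instances, min_count=5):
    """Keep features that occur at least `min_count` times across all instances."""
    counts = Counter(f for feats in instances for f in feats)
    return sorted(f for f, c in counts.items() if c >= min_count)


def to_vector(feats, vocabulary):
    """Binary presence/absence vector over the selected vocabulary."""
    present = set(feats)
    return [1 if f in present else 0 for f in vocabulary]


# Toy instances (invented): each list holds the unigrams of one context window.
instances = (
    [["carrier", "testing"]] * 5
    + [["sweat", "test"]] * 2
    + [["carrier", "sweat"]] * 3
)
vocab = select_features(instances, min_count=5)
print(vocab)
print(to_vector(["carrier", "husband"], vocab))
```

Vectors of this kind could then be passed to any of the three learners being compared; the same selection step would apply to bigrams by treating each word pair as one feature.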

    12
    Feature Extraction
    • Horizon up to 5 content words to left and right
      of target
    • Boundaries cross sentences, but not clinical
      notes
    • Skip stop words
    • Bigrams are pairs of contiguous content words
    • Example (CF is target)
      • Unigrams: If she is found to be a carrier, then
        they will follow with CF carrier testing in her
        husband.
      • Bigrams: If she is found to be a carrier, then
        they will follow with CF carrier testing in her
        husband.
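
The window described on this slide can be sketched in a few lines, using the example sentence from the slide. The stop list is a tiny illustrative one, the tokenizer is a simple regular expression, and bigrams are read as pairs of content words that become adjacent once stop words are skipped, excluding the target itself. All three are assumptions, since the slide does not specify them.

```python
import re

# Tiny illustrative stop list (an assumption; the actual list is not given).
STOP_WORDS = {"a", "an", "and", "be", "her", "if", "in", "is", "of", "she",
              "the", "then", "they", "to", "will", "with"}


def tokenize(text):
    """Lowercase word tokenizer; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_window(tokens, target_index, horizon=5):
    """Return up to `horizon` content words on each side of the target."""
    left = [t for t in tokens[:target_index] if t not in STOP_WORDS][-horizon:]
    right = [t for t in tokens[target_index + 1:] if t not in STOP_WORDS][:horizon]
    return left, right


def extract_features(text, target, horizon=5):
    """Return (unigrams, bigrams) around the first occurrence of `target`."""
    tokens = tokenize(text)
    idx = tokens.index(target.lower())
    left, right = content_window(tokens, idx, horizon)
    unigrams = left + right
    # Pair neighbouring content words on each side; the target is not included.
    bigrams = list(zip(left, left[1:])) + list(zip(right, right[1:]))
    return unigrams, bigrams


sentence = ("If she is found to be a carrier, then they will follow "
            "with CF carrier testing in her husband.")
unigrams, bigrams = extract_features(sentence, "CF")
print(unigrams)
print(bigrams)
```

Because stop words are skipped before counting, the window reaches as far as needed to collect up to five content words per side, which is one way to read "flexible window". A real pipeline would also need to stop at the boundary of the clinical note, which this single-sentence sketch does not model.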

    13
    Results (majority < 50%)

    14
    Results (50% < majority < 80%)

    15
    Results (majority > 80%)

    16
    Results (flexible window)

    17
    Conclusions
    • Overall expansion accuracy at or above 90%
      regardless of distribution
    • Differences in accuracy are largely due to
      features, not ML algorithms
    • Addition of bigrams and POS tags helps
      performance, but unigrams dominant
    • Flexible window improves upon fixed window
      feature selection

    18
    Future Work
    • Expand all acronyms in a text, not just a select
      few
      • expand based on prior expansions
      • utilize one sense per discourse constraint
    • Integrate supervised methods with knowledge-based
      approaches and clustering methods to reduce need
      for annotated examples
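
One way the one-sense-per-discourse constraint might be applied is as a post-processing step: within a single clinical note, every occurrence of an acronym is reassigned to the expansion predicted most often in that note. The sketch below assumes all predictions are for the same acronym and that each carries a note identifier; the function name and toy data are hypothetical, and this is not presented as the authors' planned method.

```python
from collections import Counter


def one_sense_per_discourse(predictions):
    """predictions: list of (note_id, predicted_sense) for one acronym.

    Returns the same list with each sense replaced by the most frequent
    predicted sense within its note.
    """
    by_note = {}
    for note_id, sense in predictions:
        by_note.setdefault(note_id, Counter())[sense] += 1
    majority = {n: c.most_common(1)[0][0] for n, c in by_note.items()}
    return [(n, majority[n]) for n, _ in predictions]


# Toy classifier output for "CF" across two notes (invented).
raw = [
    ("note1", "Cystic Fibrosis"),
    ("note1", "Cystic Fibrosis"),
    ("note1", "Cold Formula"),
    ("note2", "Complement Fixation"),
]
smoothed = one_sense_per_discourse(raw)
print(smoothed)
```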

    19
    Acknowledgments
    • We would like to thank our annotators Barbara
      Abbott, Debra Albrecht and Pauline Funk.
    • This work was supported in part by the NLM
      Training Grant (T15 LM07041-19) and the NIH
      Roadmap Multidisciplinary Clinical Research
      Career Development Award (K12/NICHD)-HD49078.
    • Dr. Pedersen has been partially supported by a
      National Science Foundation Faculty Early CAREER
      Development Award (0092784).

    20
    Software Resources
    • GATE (General Architecture for Text Engineering)
    • http://gate.ac.uk/
    • NSPGate
    • http://nspgate.sourceforge.net/
    • Ngram Statistics Package
    • http://ngram.sourceforge.net/
    • WSDGate
    • http://wsdgate.sourceforge.net/
    • WEKA (Waikato Environment for Knowledge Analysis)
    • http://www.cs.waikato.ac.nz/ml/weka/