Using the UMLS MetaMap as a Cause of Death Analyzer

1
Using the UMLS MetaMap as a Cause of Death
Analyzer
  • Michael Hogarth, MD; Michael Resendez, MS; Univ. of
    California, Davis

2
Overview
  • Causes of Death: A Historical Perspective
  • Overview of the California EDRS
  • Cause of Death Analysis tool (BECA)
  • NLM MetaMap and the UMLS
  • BECA-MetaMap experiment
  • Discussion

3
Historical Perspectives on causes of death
  • Bills of Mortality (1532)
  • Arose from the need to better understand death
    rates in medieval England -- plague
    epidemics (1361, 1368, 1375, 1390, 1406, ...)
  • John Graunt (1620-74)
  • Used the Bills of Mortality and found an infant
    death rate of 36% in England -- not previously
    known or understood
  • London Bills of Mortality classification
  • Used by Dr. John Snow to characterize a cholera
    outbreak traced to a water source in London
  • Evolved to become the Intl. Classification of
    Disease (1850s)
  • International Classification of Disease(ICD) --
    used for the last 150 years

4
CA-EDRS Causes of Death
5
Causes of Death
  • Importance
  • key epidemiological information is contained in
    the cause of death
  • Issues and Challenges
  • absolutely correct versus close to correct
  • absolute correctness requires significant time
    and manual effort
  • is close to correct in an automated fashion
    still useful?
  • Typical process in California
  • COD --> SuperMICAR --> Stat Master File
  • turnaround for entire process can be lengthy (2
    years)
  • could have a trend in causes of death and it
    would not be known by local jurisdictions for 2
    years.
  • Today in California
  • a significant number of jurisdictions today don't
    wait for the final statistical files from the
    State office to look at trends -- they
    manually code (if they have the staff), which
    takes time and funding

6
Preliminary COD classification
  • Possible uses of a preliminary COD classification
    using automated methods that are close to
    correct
  • early identification of trends in a local
    jurisdiction
  • disease vs. injury/poisoning -- coroner referral
    cross-checking
  • identify specific infectious causes
    (encephalitis, cholera, etc.)
  • What it is not
  • not for absolutely correct cause of death
    classification
  • will not replace the nosologist's expertise in
    understanding the sequence of events leading to
    death, nor their understanding of ICD-10, with its
    includes/excludes

7
How to analyze causes of death?
  • Challenges
  • text is verbatim and thus arbitrary (free text)
  • need to go beyond simple keyword matching
  • biomedical knowledge and content is vast -- and
    constantly changing!
  • A possible approach - text mining and
    computational linguistic techniques

8
BECA
  • We built BECA, a generic concept analyzer
    framework that can incorporate any concept
    identifier engine such as NLM MetaMap and other
    text processing tools
  • BECA = BECA Enables Concept Analysis
  • Supports a plug-in design for the concept
    matcher and other components (e.g., a spell checker)
  • Designed to support multiple transformations of
    the text in step-by-step fashion
  • transformations -- strip special characters,
    lower case, run it through the concept matcher
    engine (MetaMap or other), run it through an
    available spell checker (Jazzy, etc.)
  • example transformations
  • convert to lowercase, remove all punctuation, map
    string using concept mapper, etc.
  • First version of BECA uses the NLM MetaMap as a
    concept mapper
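
The step-by-step transformation design described on this slide can be sketched in Python as a list of functions applied in order. This is only an illustration, not BECA's actual code: the names `strip_special_characters`, `collapse_whitespace`, `to_lowercase`, and `run_pipeline` are hypothetical, and the concept matcher and spell checker plug-ins are left out.

```python
import re
from typing import Callable, List

# A transformation takes a string and returns a transformed string.
Transformation = Callable[[str], str]


def strip_special_characters(text: str) -> str:
    """Replace every run of characters that are not letters, digits, or spaces with one space."""
    return re.sub(r"[^A-Za-z0-9 ]+", " ", text)


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def to_lowercase(text: str) -> str:
    return text.lower()


def run_pipeline(text: str, steps: List[Transformation]) -> List[str]:
    """Apply each step in order and keep every intermediate result."""
    results = [text]
    for step in steps:
        results.append(step(results[-1]))
    return results


# Hypothetical plug-in order; a concept matcher would be appended as a later step.
steps = [strip_special_characters, collapse_whitespace, to_lowercase]

for stage in run_pipeline("LUNF CARCINOMA, METASTATIC", steps):
    print(repr(stage))
```

Keeping every intermediate string makes it possible to inspect what each transformation contributed before the text reaches the concept mapper.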

9
BECA system design
10
Example transformations
11
What is NLM MetaMap?
  • The National Library of Medicine's MetaMap
  • a free, open source software component built by
    the NLM Lister Hill Laboratory
  • uses computational linguistic techniques to map
    biomedical text to a large corpus of biomedical
    content (the NLM Unified Medical Language System)
  • Provides a number of text processing functions
  • Includes a concept mapper that attempts to
    match phrases with concepts in the UMLS
    Metathesaurus
  • Includes a UMLS concept-to-code mapping for
    multiple coding systems (ICD, SNOMED, etc.)

12
How does MetaMap work?
  • Takes text as input and attempts to identify
    concepts in the text and match them to concepts
    in a large corpus of phrases and concepts in
    biomedicine (UMLS Metathesaurus)
  • The retrieved candidate matches include a score
    that reflects how sure it believes the match is
    correct
  • The candidates retrieved include their semantic
    type
  • Disease or Syndrome, Injury or Poisoning,
    etc.
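
Because each candidate carries a score and a semantic type, a caller can filter and rank candidates before using them. The sketch below assumes a hypothetical `Candidate` record and `best_candidates` helper; it does not reproduce MetaMap's real output format, and the concept names and scores are invented placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    concept_name: str
    semantic_type: str
    score: int  # match score; higher means a stronger match


def best_candidates(
    candidates: List[Candidate],
    min_score: int,
    excluded_types: FrozenSet[str] = frozenset(),
) -> List[Candidate]:
    """Keep candidates scoring above min_score whose semantic type is not excluded,
    sorted from highest to lowest score."""
    kept = [
        c for c in candidates
        if c.score > min_score and c.semantic_type not in excluded_types
    ]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Placeholder data for illustration only (not actual MetaMap output).
candidates = [
    Candidate("example concept A", "Disease or Syndrome", 1000),
    Candidate("example concept B", "Neoplastic Process", 861),
    Candidate("example concept C", "Injury or Poisoning", 790),
    Candidate("example concept D", "Disease or Syndrome", 901),
]

for c in best_candidates(candidates, min_score=800):
    print(c.score, c.semantic_type, c.concept_name)
```

Passing a set of semantic types as `excluded_types` drops those candidates entirely, which trades fewer matches for higher precision.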

13
The UMLS
  • Developed by the National Library of Medicine
  • Derived from over 100 sources (ICD, SNOMED, ...)
  • The Unified Medical Language System
  • A system built to support information retrieval
    in biomedicine
  • Used in PubMed, ClinicalTrials.gov, etc.
  • Consists of
  • (1) UMLS Metathesaurus
  • (2) UMLS Semantic Network
  • (3) UMLS SPECIALIST Lexicon

14
UMLS in detail
  • UMLS Metathesaurus -- the world's largest
    repository of biomedical phrases
  • 1.3 million concepts, 6.4 million unique phrases
    (concept names)
  • over 100 source vocabularies (ICD, SNOMED, CPT,
    etc.)
  • UMLS SPECIALIST Lexicon
  • a file that provides individual words found in
    the UMLS Metathesaurus and their linguistic
    information, including grammatical type (noun,
    verb, adjective, adverb, etc.)
  • UMLS Semantic Network
  • a set of files that classify each Metathesaurus
    concept into a particular type
  • Examples -- Disease, Injury/Poisoning,
    Neoplasm, ...

15
MetaMap Algorithm
  • MetaMap's algorithm consists of four steps
  • (1) Parsing
  • using a part-of-speech tagger, text is decomposed
    into one or more noun phrases
  • "ocular complications of myasthenia gravis" -->
    "ocular complications" and "myasthenia gravis"
  • noun phrases are processed independently by
    decomposing them into their grammatical components
  • "ocular complications" --> modifier "ocular" and
    head of the phrase "complications"
  • (2) Variant Generation -- variants for each
    phrase are generated using SPECIALIST
  • variants -- all synonyms of the term, acronyms
    containing the term, abbreviations,
    plural/singular variants
  • each variant has a distance score obtained
    from SPECIALIST
  • "ocular" --> eye, eyes, optic, ophthalmic,
    ophthalmia, oculus, oculi

16
MetaMap Algorithm
  • MetaMap Algorithm continued
  • (3) Candidate Retrieval from Metathesaurus
  • all Metathesaurus strings that contain at least
    one of the variants are retrieved
  • can exclude those where the variant is present in
    a large number of strings (i.e., a very common
    string)
  • (4) Candidate evaluation -- the MMTX score
  • each Metathesaurus candidate is evaluated by
    calculating the strength of the similarity
    between the original input phrase and the
    candidate phrase from the Metathesaurus
  • the calculation involves a weighted average of
    four metrics: distance scores for the variants
    of the input noun phrase (variation), whether
    the match involves the head of the phrase
    (centrality), coverage, and cohesiveness
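
The weighted average in step (4) can be written out as a short function. This is a sketch only: the slide does not state the weights, so the default weights below (coverage and cohesiveness counted twice as heavily as centrality and variation) are an assumption for illustration, as is the scaling of the result to a 0-1000 range. The four metric values are assumed to be already computed on a 0-1 scale, and `match_score` is a hypothetical name.

```python
from typing import Sequence


def match_score(
    centrality: float,
    variation: float,
    coverage: float,
    cohesiveness: float,
    weights: Sequence[float] = (1.0, 1.0, 2.0, 2.0),  # assumed weights, not from the slide
) -> int:
    """Weighted average of four 0-1 metrics, scaled to an integer in 0..1000."""
    metrics = (centrality, variation, coverage, cohesiveness)
    if len(weights) != len(metrics):
        raise ValueError("expected one weight per metric")
    for value in metrics:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each metric must lie between 0 and 1")
    weighted_sum = sum(w * m for w, m in zip(weights, metrics))
    return round(1000 * weighted_sum / sum(weights))


# A candidate that matches the head with no variation but covers only half
# of the phrase scores lower than a perfect match.
print(match_score(1.0, 1.0, 1.0, 1.0))
print(match_score(1.0, 1.0, 0.5, 0.5))
```

Exposing the weights as a parameter keeps the assumption visible and lets a different weighting be tried without changing the function.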

17
Example
  • BECA MetaMap output
  • Input phrase: "ocular complications"

18
The question
  • Can BECA using the NLM MetaMap be useful in:
  • identifying biomedical concepts in a cause of
    death literal, which is narrative text?
  • auto-coding literals into ICD-10 codes?

19
Cause of Death Literals in CA-EDRS
  • CA-EDRS data is a combination of records
    initiated in EDRS (EDRS counties) and those
    submitted on paper (non EDRS counties)
  • Causes of death are verbatim from the certifier
    and are typically entered into EDRS or typed on a
    paper certificate by funeral home staff or
    hospital staff
  • Overall COD statistics for CA-EDRS
  • 462,564 registered death certificates
  • 985,330 unique literals (phrases) in all COD
    fields
  • 88,719 unique literals (phrases) in the Immediate
    Cause of Death field

20
Experiment
  • We randomly selected 1,000 literals from the
    88,719 unique literals in the Immediate Cause of
    Death field
  • We submitted these as-is to BECA (MetaMap, no
    spell checking component)
  • BECA returned about 7.8 candidate matches per
    literal (7,791 candidates for 1,000 strings)
  • Candidate scores ranged from 517 to 1000
  • Match score distribution for the 7,791 candidates

21
Example Output
22
Literals with high score matches (>800)
23
High Score Candidate Matches
  • 3,017 (38.7%) of the 7,791 candidates had a score
    >800
  • 95.3% of the original literals (953/1000) had at
    least one candidate with a match score >800
  • 54.5% of the original literals (545/1000) had at
    least one candidate with a match score >900
  • 30.7% of the original literals (307/1000) had at
    least one candidate with a match score = 1000
  • Note: only 7.5% were the exact string as found
    in the UMLS Metathesaurus
  • Match score distribution for the 3,017 candidates
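
The percentages on this slide follow directly from the reported counts. A short check in Python, where the variable names and the `percent` helper are ours:

```python
total_candidates = 7_791
total_literals = 1_000
high_score_candidates = 3_017

# Number of literals with at least one candidate at each score level.
literals_by_threshold = {"> 800": 953, "> 900": 545, "= 1000": 307}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage with one decimal."""
    return round(100 * part / whole, 1)


print("candidates scoring > 800:", percent(high_score_candidates, total_candidates))
for threshold, count in literals_by_threshold.items():
    print("literals with a candidate", threshold, ":", percent(count, total_literals))
```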

24
Semantic Type correct matches
  • BECA with MetaMap correctly categorized 720 (72%)
    of the literals by semantic type
  • Of these, Neoplastic Process had the highest
    reliability

25
Wrong matches
  • Semantic types most frequently in error

26
ICD-10 Coding
  • 252 of the 1,000 (25.2%) literals had an ICD-10
    code matched by BECA-MetaMap
  • Categories
  • 1 = good match
  • 2 = approximate match (within ICD category)
  • 0 = incorrect code
  • Results - 97% were good or approximate
  • 82.5% good match
  • 14.3% approximate match
  • 3.2% incorrect match
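
The grading scheme above (1, 2, 0) can be tallied with a few lines of Python. The slide gives only percentages, so the per-grade counts used here (208, 36, and 8 out of 252) are an assumption: they are a whole-number split of the 252 coded literals that reproduces the reported percentages, not figures taken from the presentation. `grade_shares` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable

GRADE_LABELS = {1: "good match", 2: "approximate match", 0: "incorrect code"}


def grade_shares(grades: Iterable[int]) -> Dict[str, float]:
    """Percentage of graded literals falling in each category, to one decimal."""
    counts = Counter(grades)
    total = sum(counts.values())
    return {
        label: round(100 * counts[grade] / total, 1)
        for grade, label in GRADE_LABELS.items()
    }


# Assumed counts, back-calculated from the reported percentages.
grades = [1] * 208 + [2] * 36 + [0] * 8
shares = grade_shares(grades)
print(shares)
print("good or approximate:", round(shares["good match"] + shares["approximate match"], 1))
```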

27
ICD-10 Autocoding data
28
Some interesting challenges
  • CSTFIOTRDPIRATORY FAILURE
  • CHRONIC ALCOHOLISHM
  • ESOPHAGELA VARICES
  • END STAGE RENAL DOSEASE
  • HEAR FAILURE
  • OVARION CANCER WITH METASTASES
  • LUNF CARCINOMA, METASTATIC
  • PENDING TOX MICRO
  • SEPTIC SHOCK
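
Several of these literals are misspellings, and the experiment ran without a spell checking component. A minimal sketch of what a spelling-normalization step could look like, using Python's standard `difflib` module: the tiny `VOCABULARY` list, the 0.7 similarity cutoff, and the function names are assumptions for illustration; a real component would draw its word list from a medical lexicon.

```python
import difflib
from typing import Sequence

# Hypothetical vocabulary; a real one would be far larger.
VOCABULARY = [
    "CHRONIC", "ALCOHOLISM", "ESOPHAGEAL", "VARICES", "END", "STAGE",
    "RENAL", "DISEASE", "HEART", "FAILURE", "OVARIAN", "CANCER", "WITH",
    "METASTASES", "LUNG", "CARCINOMA", "METASTATIC",
]


def correct_word(word: str, vocabulary: Sequence[str] = VOCABULARY,
                 cutoff: float = 0.7) -> str:
    """Return the closest vocabulary word, or the word unchanged if none is close enough."""
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_literal(literal: str) -> str:
    """Correct a literal word by word; commas are dropped before matching."""
    words = literal.replace(",", " ").split()
    return " ".join(correct_word(w) for w in words)


for literal in ["CHRONIC ALCOHOLISHM", "ESOPHAGELA VARICES", "HEAR FAILURE"]:
    print(literal, "->", correct_literal(literal))
```

Note that "HEAR" is corrected here only because it is absent from the toy vocabulary. With a general English dictionary it would pass as a valid word, which is one reason a spell checker for cause-of-death text benefits from a domain-specific word list.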


29
Discussion
  • MetaMap may be useful for preliminary
    categorization of causes of death by semantic
    type
  • Excluding certain semantic types would improve
    match precision (at the cost of fewer
    matches)
  • BECA-MetaMap only assigned an ICD-10 code 25.2%
    of the time
  • If BECA-MetaMap assigned an ICD-10 code, it was
    correct in about 83% of cases, and correct or
    near correct in 97% of cases
  • We found that MetaMap was confused if
  • there are multiple concepts (noun phrases) in a
    single string
  • the phrase has a compound statement ("metastasis
    to brain and bone" or "gunshot wounds of the head
    and right arm")
  • the phrases begin with certain words (e.g.,
    "complications")

30
Future Directions for BECA
  • Build a new concept mapper to replace MetaMap,
    and specifically design it to analyze causes of
    death phrases
  • include a spell checker
  • disambiguation for phrases that have compound
    statements
  • match SNOMED first, then match to ICD-10
    (increases the hit rate for ICD-10 autocoding)
  • improve performance
  • implement support for ICD-10 includes/excludes
    using an open source rules engine (JBoss Rules
    Engine)

31
Credits
  • National Library of Medicine, Lister Hill Lab
  • University of California
  • Michael Resendez, MS
  • Cecil Lynch, MD, MS
  • California Department of Health (California
    Department of Public Health)
  • Terry Trinidad
  • David Fisher
  • Debbie McDowell

32
California EDRS
  • Developed by the University of California and
    California DHS (2004-2005)
  • Implementation (2005 - 2008)
  • all death certificates entered into EDRS since
    Jan 1, 2005
  • full EDRS (implemented counties) -- DC originates
    in EDRS and is electronically completed locally
  • KDE EDRS (non-EDRS counties) -- DC completed in
    standard paper fashion, eventually entered by
    State office into EDRS
  • June 2007 - where are we?
  • today --> 510,000 certificates (2005 - present)
  • Originate locally (EDRS records) or are entered
    later into EDRS (non-EDRS records)
  • Today, June 2007, 65% originate locally as EDRS
    electronic
  • By Nov 2007 over 90% of all CA records will
    originate in EDRS

33
Cause of Death Workflow with CA-EDRS
  • CA-EDRS does not provide electronic support for
    gathering of the COD today

  • certifier and funeral home exchange a worksheet
    (by fax)
  • once the COD is finalized by the certifier,
    funeral home staff create the EDRS record and
    enter the causes of death