Title: Determining the Syntactic Structure of Medical Terms in Clinical Notes


1
Determining the Syntactic Structure of Medical Terms in Clinical Notes
  • Bridget T. McInnes
  • Ted Pedersen
  • Serguei V. Pakhomov
  • bthomson_at_cs.umn.edu

2
Goal
  • This presentation describes a simple but
    effective approach for identifying the
    syntactic structure of three-word terms

3
Importance
  • Potentially improve the analysis of unrestricted
    medical text
  • Mapping of medical text to standardized
    terminologies
  • Unsupervised syntactic parsing

4
Syntactic Structure of Terms
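  • Monolithic: (w1 w2 w3), all three words form a
    single unit
  • Non-branching: w1 w2 w3, each word independent
    of the others
  • Right-branching: w1 (w2 w3)
  • Left-branching: (w1 w2) w3
Figure legend: blue = independence, green = dependence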
5
Example
  • small bowel obstruction

6
Syntactic Structure of Example
  • small bowel obstruction

  • Monolithic: (small bowel obstruction)
  • Non-branching: small bowel obstruction, with no
    grouping
  • Right-branching: small (bowel obstruction)
  • Left-branching: (small bowel) obstruction
7
Method used to determine the structure of a term
  • The Log Likelihood Ratio is based on the ratio
    between the observed probability of a term and
    the probability with which it would be expected
    to occur

$$\frac{\text{Probability of term occurring}}{\text{Expected probability of term}}$$
8
Log Likelihood Ratio
  • The expected probability of a term is often based
    on the Non-branching (Independence) Model

$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$
Numerator: observed probability. Denominator: expected probability.
9
Extended Log Likelihood Ratio
  • The expected probability can also be calculated
    under two other hypotheses (models)

  • Non-branching: P(small) P(bowel) P(obstruction)
  • Right-branching: P(small) P(bowel obstruction)
  • Left-branching: P(small bowel) P(obstruction)
10
Three Log Likelihood Ratio Equations
Non-branching
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$
Right-branching
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$$
Left-branching
$$\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$$
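The three ratios share the same numerator and differ only in their denominators. The sketch below computes all of them from unigram and bigram counts, assuming probabilities are estimated as count / n; the function name `model_ratios` and the counts are hypothetical and serve only to illustrate the formulas.

```python
def model_ratios(n_term, unigrams, bigrams, n):
    """Observed / expected probability of the term w1 w2 w3 under each model.

    unigrams: counts of (w1, w2, w3); bigrams: counts of (w1 w2, w2 w3).
    All probabilities are estimated as count / n (an assumption of this sketch).
    """
    def p(count):
        return count / n

    c1, c2, c3 = unigrams
    c12, c23 = bigrams
    expected = {
        "non-branching": p(c1) * p(c2) * p(c3),
        "right-branching": p(c1) * p(c23),  # w1 (w2 w3)
        "left-branching": p(c12) * p(c3),   # (w1 w2) w3
    }
    return {model: p(n_term) / e for model, e in expected.items()}


# Hypothetical counts for "small", "bowel", "obstruction",
# "small bowel", and "bowel obstruction".
ratios = model_ratios(50, (2_000, 1_000, 500), (400, 80), 1_000_000)
for model, r in ratios.items():
    print(f"{model}: {r:.1f}")
```

With these made-up counts the left-branching ratio is the smallest, so its expected probability comes closest to the observed one. The presentation's method goes a step further and scores each model with the Log Likelihood Ratio statistic rather than this single ratio.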
11
Expected Probability
  • The expected probability of a term differs from
    model to model, and so does the Log Likelihood Ratio

  • Non-branching: P(small) P(bowel) P(obstruction)
  • Right-branching: P(small) P(bowel obstruction)
  • Left-branching: P(small bowel) P(obstruction)
LL values shown for the three models: 5,169.81; 8,532.90; 11,635.45
12
Model Fitting
  • The model with the lowest Log Likelihood Ratio
    best describes the underlying structure of the
    term

  • Non-branching: P(small) P(bowel) P(obstruction)
  • Right-branching: P(small) P(bowel obstruction)
  • Left-branching: P(small bowel) P(obstruction)
LL values shown for the three models: 5,169.81; 8,532.90; 11,635.45
13
Recap
  • The Log Likelihood Ratio is calculated for each
    possible model:
      • Non-branching
      • Right-branching
      • Left-branching
  • The probabilities for each model are obtained
    from a corpus
  • The term is assigned the structure whose model
    has the lowest Log Likelihood Ratio

14
Test Set
  • Contains 708 three-word terms from SNOMED-CT

Monolithic
Non-branching
Right-branching
Left-branching
73 terms
6 terms
378 terms
251 terms
15
Test Set (cont)
  • The syntactic structure of each term was
    determined through the consensus of two medical
    text indexing experts (kappa = 0.704)
  • The probabilities were obtained from over 10,000
    Mayo Clinic clinical notes
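The slide reports inter-annotator agreement as a kappa value without naming the variant. The sketch below assumes Cohen's kappa for two annotators, which corrects the raw (percentage) agreement for the agreement expected by chance. The function name and the four labels are hypothetical, used only to show the calculation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:  # both annotators used one and the same label for every item
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical structure labels from two annotators for four terms.
expert_1 = ["left", "left", "right", "right"]
expert_2 = ["left", "right", "right", "right"]
print(cohens_kappa(expert_1, expert_2))  # 0.5
```

Here the annotators agree on 3 of 4 terms (p_o = 0.75), but chance alone would give p_e = 0.5, so kappa is 0.5.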

16
Monolithic Results
[Chart: percentage agreement with human experts, by technique: 74.8, 53.4, 35.5]
17
Results without Monolithic Terms
[Chart: percentage agreement with human experts, by technique: 83.5, 59.5, 39.5]
18
Limitations
  • Monolithic structures
      • could possibly be identified through collocation
        extraction or dictionary lookup
  • As the number of words in a term grows, so does
    the number of hypotheses (models) to be evaluated
      • only consider adjacent models
      • limit the length of terms to 5 or 6 words
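The slide does not spell out what an "adjacent model" is. One reading, assumed here, is a model that only groups contiguous words, so each candidate structure is a way of cutting the term into runs of adjacent words. For three words this yields exactly four models, which line up with monolithic, right-branching, left-branching, and non-branching. The function name `adjacent_models` is hypothetical, and the enumeration does not distinguish internal bracketings within a group, so it is only a rough picture of how the hypothesis space grows.

```python
from itertools import combinations


def adjacent_models(words):
    """Every way to split a term into contiguous groups of adjacent words.

    Each model is a list of word tuples; words inside a tuple are treated as
    dependent, and separate tuples are treated as independent.
    """
    n = len(words)
    models = []
    for k in range(n):  # k = number of cut points between words
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            models.append([tuple(words[i:j]) for i, j in zip(bounds, bounds[1:])])
    return models


for model in adjacent_models(["small", "bowel", "obstruction"]):
    print(model)

# The number of candidate models doubles with each extra word: 2 ** (len(words) - 1).
print([len(adjacent_models(["w"] * k)) for k in range(3, 7)])  # [4, 8, 16, 32]
```

Under this reading, capping terms at 5 or 6 words keeps the search to at most 16 or 32 models per term.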

19
Conclusions
  • Presented a simple but effective method to identify
    the structure of three-word terms
  • The method uses the Log Likelihood Ratio
  • Could be extended to identify the structure of
    four-, five-, and six-word terms

20
Future Work
  • Improve the accuracy of the method
      • explore other measures of association
        (Chi-squared, Phi, Dice coefficient, ...)
      • combine multiple measures
  • Extend our method to four- and five-word terms
      • difficulty finding a test set
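Of the alternative measures listed, Pearson's chi-squared is the most direct substitute, because it can be computed from the same observed and expected cell counts that the Log Likelihood Ratio uses. The sketch below shows only that statistic; the function name and the counts are hypothetical, and whether swapping it in would improve accuracy is exactly the open question on this slide.

```python
def pearson_chi_squared(observed, expected):
    """Pearson's X^2 = sum over cells of (n - m)^2 / m.

    `observed` and `expected` are parallel sequences of cell counts; every
    expected count must be positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))


# Hypothetical counts for the eight cells of a 2x2x2 table, with the
# expected counts a model might assign to them.
observed = [20, 20, 5, 5, 5, 5, 20, 20]
expected = [12.5] * 8
print(pearson_chi_squared(observed, expected))  # 36.0
```

Used for model fitting, the term would be assigned the structure whose model gives the lowest X², mirroring the lowest-LL rule.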

21
Thank you
  • Software
      • Ngram Statistics Package (NSP):
        www.d.umn.edu/tpederse/nsp.html
      • Log Likelihood Ratio Models:
        www.cs.umn.edu/bthomson/mti.html

22
Log Likelihood Equation
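  • $LL = 2 \sum_{x,y,z} n_{xyz} \log\left(\frac{n_{xyz}}{m_{xyz}}\right)$, where
    $n_{xyz}$ is the observed count and $m_{xyz}$ the expected count of cell $xyz$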

23
Expected Values
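  • $LL = 2 \sum_{x,y,z} n_{xyz} \log\left(\frac{n_{xyz}}{m_{xyz}}\right)$
  • Non-branching: $m_{xyz} = \frac{n_x\, n_y\, n_z}{n^2}$
  • Left-branching: $m_{xyz} = \frac{n_{xy}\, n_z}{n}$
  • Right-branching: $m_{xyz} = \frac{n_x\, n_{yz}}{n}$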
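The formulas above can be sketched end to end in Python. This assumes a 2x2x2 contingency table per term, where each cell (x, y, z) counts three-word windows in the corpus and 1 means "this position holds w1 / w2 / w3" (for example small / bowel / obstruction) while 0 means "some other word"; n_x, n_xy and so on are read as marginal totals of that table, and the natural logarithm is used. NSP builds similar tables, but this is not NSP's code, and all function names and counts are hypothetical. Note that the non-branching expected count divides the product of three marginals by n², so that the expected counts sum to n.

```python
import math
from itertools import product

# Cells of the 2x2x2 table: (x, y, z), 1 = position holds w1/w2/w3, 0 = other word.
CELLS = list(product((0, 1), repeat=3))


def marginal(table, keep):
    """Sum the table over every position not listed in `keep`."""
    totals = {}
    for cell, count in table.items():
        key = tuple(cell[i] for i in keep)
        totals[key] = totals.get(key, 0) + count
    return totals


def expected_counts(table):
    """Expected cell counts m_xyz under the three models."""
    n = sum(table.values())
    nx, ny, nz = (marginal(table, (i,)) for i in range(3))
    nxy, nyz = marginal(table, (0, 1)), marginal(table, (1, 2))
    return {
        "non-branching": {c: nx[c[:1]] * ny[c[1:2]] * nz[c[2:]] / n**2 for c in CELLS},
        "right-branching": {c: nx[c[:1]] * nyz[c[1:]] / n for c in CELLS},
        "left-branching": {c: nxy[c[:2]] * nz[c[2:]] / n for c in CELLS},
    }


def log_likelihood_ratio(table, expected):
    """G^2 = 2 * sum n_xyz * log(n_xyz / m_xyz); empty cells contribute 0."""
    return 2 * sum(n * math.log(n / expected[c]) for c, n in table.items() if n > 0)


def best_model(table):
    """Score every model and return the one with the lowest LL, plus all scores."""
    scores = {name: log_likelihood_ratio(table, m)
              for name, m in expected_counts(table).items()}
    return min(scores, key=scores.get), scores


# Hypothetical counts (all eight cells present) in which w3 is
# independent of the pair (w1 w2).
table = {(1, 1, 1): 20, (1, 1, 0): 20, (1, 0, 1): 5, (1, 0, 0): 5,
         (0, 1, 1): 5, (0, 1, 0): 5, (0, 0, 1): 20, (0, 0, 0): 20}
model, scores = best_model(table)
print(model)  # left-branching
```

The left-branching model reproduces these counts exactly, so its LL is 0 and it is selected; the table must list all eight cells, including zero counts, for the marginal lookups to succeed.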