Infrastructure for Semantic Expansion and Curation of the RadLex Ontology - PowerPoint PPT Presentation

About This Presentation
Slides: 22
Provided by: Reb108
Transcript and Presenter's Notes

1
Infrastructure for Semantic Expansion and
Curation of the RadLex Ontology
  • Rebecca Hazen, Alexander van Esbroeck
  • Northwestern University
  • Dr. David Channin, Mentor

2
Background
  • RadLex - Radiology Lexicon
  • Reduce variation and improve clarity in radiology
    reports
  • 11,962 terms over 12 categories

3
Establishing the need
  • Missing many terms
  • Imaging Observations
  • Imaging Observation Characteristics
  • Committee-dependent development process
  • Manual, time-consuming, expensive
  • Larger lexicons are harder to manage
  • Difficult to sustain

4
Proposed Solution
  • Develop an automatic term extraction system
  • Focusing on Imaging Observations and Imaging
    Observation Characteristics
  • Accelerate the expansion of RadLex
  • Decrease the demands on committees
  • Propose lists of strong candidates for inclusion
  • Reduce development costs

5
Processing System Description
  • Collect free full-text articles from medical
    journals
  • Identify new terms using LexEVS and NLP
    techniques
  • Create ranked lists of imaging observations and
    characteristics

6
Processing System Overview
[Diagram components: LexEVS; Concepts/Relationships; Article Text;
Article Finder; Candidate Term Identification; Data/Annotations;
Context Processing; Ranked Lists of Imaging Observations and
Characteristics]
7
LexEVS
  • LexEVS was developed by NCI, NIH, caBIG, and Mayo
    Clinic
  • Designed to fulfill a community need for
    standards in storing, accessing, managing and
    distributing controlled vocabularies
  • Combination of LexBIG, LexGrid, and EVS
  • Programmable interfaces for accessing and
    distributing controlled vocabularies
  • Provides a common API

8
UIMA Architecture
  • Framework for processing large collections of
    documents
  • Processing modules can be connected into
    pipelines
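
As a rough illustration of how processing modules can be connected into a pipeline, here is a minimal Python sketch. It is not UIMA code (UIMA itself is a Java framework); the `Document` class, the two stage functions, and `run_pipeline` are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Text plus the annotations each stage adds (hypothetical stand-in for a shared analysis structure)."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


Stage = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Whitespace tokenization keeps the sketch simple.
    doc.annotations["tokens"] = doc.text.split()
    return doc


def count_tokens(doc: Document) -> Document:
    # Depends on the output of the previous stage.
    doc.annotations["token_count"] = [len(doc.annotations["tokens"])]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage receives the document as left by the stage before it.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(Document("increased renal enhancement"), [tokenize, count_tokens])
```

Each stage only reads and writes the shared document, so stages can be reordered or replaced without touching the others.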

9
Article Finder
  • Locates and retrieves scientific articles
  • Searches PubMed
  • Returns free full-text, English, HTML articles.
  • Removes tags and extracts the article text
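
The tag-removal step could look roughly like the sketch below, which uses only Python's standard `html.parser` module. The class name `TextExtractor` and the function `extract_text` are assumptions for illustration; the slides do not say how the original Article Finder was implemented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = (
    "<html><head><style>p{}</style></head>"
    "<body><p>Focal <b>confluent</b> fibrosis</p><script>x=1</script></body></html>"
)
article_text = extract_text(sample)
```

Joining fragments with a space is a simplification: a word split by inline markup would gain a space, so a production version would need to treat inline and block elements differently.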

10
Articles Processed
  • 1,128 Documents
  • Imaging, CT, MR, PET, X-ray, US, angiography, tomography,
    findings [Title]

11
Candidate Phrase Identification
  • Identifies a list of candidate phrases from the
    articles
  • Tokenizer
  • Part-of-speech Tagger
  • Linguistic Filter
  • Extracts sequences of words matching a specific
    pattern
  • Example: "increased renal enhancement"
  • Pattern: -ed verb, adjective, noun
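
A linguistic filter of this kind can be sketched as a sliding window over part-of-speech-tagged tokens. The sketch below assumes the text has already been tokenized and tagged; mapping the slide's pattern to the Penn Treebank tags `VBN`, `JJ`, `NN` is an assumption, the example sentence is hand-tagged, and `match_pattern` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)

# Pattern from the slide: -ed verb, adjective, noun.
# Expressing it as Penn Treebank tags is an assumption of this sketch.
PATTERN: Sequence[str] = ("VBN", "JJ", "NN")


def match_pattern(tagged: List[TaggedToken], pattern: Sequence[str] = PATTERN) -> List[str]:
    """Return every run of words whose tags equal the pattern exactly."""
    n = len(pattern)
    phrases = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(tag == want for (_, tag), want in zip(window, pattern)):
            phrases.append(" ".join(word for word, _ in window))
    return phrases


# Hand-tagged example sentence (not output from a real tagger).
sentence = [
    ("CT", "NN"), ("showed", "VBD"), ("increased", "VBN"),
    ("renal", "JJ"), ("enhancement", "NN"), (".", "."),
]
candidates = match_pattern(sentence)
```

A real filter would likely accept several patterns and variable-length noun phrases; a single fixed pattern is used here to keep the idea visible.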

12
LexEVS Annotator
  • Use LexEVS to access vocabularies
  • RadLex 2.0, NCI Thesaurus, HL7, CTCAE
  • Determine if phrases exist in RadLex as a single
    concept
  • Retrieve vocabulary metadata
  • What is that?
  • Annotate the document
  • Build database of annotations
  • Develop inclusion/exclusion criteria
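
The lookup-and-annotate steps above can be sketched with an in-memory dictionary standing in for the vocabulary service. This does not call LexEVS; the `VOCAB` entries, the scheme name, and the codes are placeholders invented for illustration, and `lookup` and `annotate` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder vocabulary: scheme name and codes are invented, not real RadLex data.
VOCAB: Dict[str, Tuple[str, str]] = {
    "fibrosis": ("ExampleScheme", "EX:0001"),
    "soft tissue": ("ExampleScheme", "EX:0002"),
}


def lookup(phrase: str) -> Optional[Tuple[str, str]]:
    """Exact match on the normalized phrase; returns (scheme, code) or None."""
    return VOCAB.get(phrase.strip().lower())


def annotate(phrases: List[str]) -> Tuple[List[dict], List[str]]:
    """Split phrases into annotated known concepts and unmatched candidates."""
    known, new = [], []
    for phrase in phrases:
        hit = lookup(phrase)
        if hit is None:
            new.append(phrase)
        else:
            known.append({"phrase": phrase, "scheme": hit[0], "code": hit[1]})
    return known, new


known, new = annotate(["Fibrosis", "interlobular septal thickening"])
```

Phrases that match are annotated with vocabulary metadata, while unmatched phrases remain as candidates for later ranking.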

13
LexEVS Annotator
14
Context Processing
  • Find indicator words that are associated with
    existing RadLex terms
  • Assign weights to those words as a function of
    the number of RadLex terms with which they are
    associated.

Focal confluent fibrosis can occur in the
cirrhotic liver as a hepatic mass in
approximately 14% of cases. This fibrosis
is accompanied by atrophy of the affected liver
parenchyma and retraction of the overlying
liver capsule (Figure 9).
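
One way to weight indicator words, in the spirit of the NC-value method cited in the references, is to divide the number of distinct known terms a word appears with by the total number of known terms. The slides do not give the exact function used, so the formula below is an assumption, `indicator_weights` is a hypothetical name, and the sample data are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def indicator_weights(occurrences: Iterable[Tuple[str, List[str]]]) -> Dict[str, float]:
    """occurrences: (known term, words found near it).

    weight(word) = distinct known terms seen with the word / all distinct known terms.
    """
    terms_by_word: Dict[str, Set[str]] = defaultdict(set)
    all_terms: Set[str] = set()
    for term, context in occurrences:
        all_terms.add(term)
        for word in context:
            terms_by_word[word.lower()].add(term)
    if not all_terms:
        return {}
    total = len(all_terms)
    return {word: len(terms) / total for word, terms in terms_by_word.items()}


# Made-up observations loosely modeled on the example passage.
weights = indicator_weights([
    ("fibrosis", ["accompanied", "by"]),
    ("atrophy", ["accompanied", "by", "of"]),
    ("retraction", ["of"]),
    ("mass", ["as", "hepatic"]),
])
```

Words that appear near many different known terms receive weights close to 1, and words tied to a single term receive low weights.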
15
Context Processing
  • Use those indicator words to identify new
    phrases
  • Score new phrases as a function of the strength
    of their association with the indicator words.

Less extensive findings included interlobular
septal thickening. Interlobular
septal thickening was seen in 32 patients
(89%). A luminal mass was considered to be
present if there was a soft-tissue mass in the
lumen that arose from the
bowel wall.
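
Scoring a new phrase from its surrounding words might be sketched as a weighted sum: each nearby indicator word contributes its weight times the number of times it occurs. The exact scoring function is not given in the slides, so this is an assumption; `context_score` is a hypothetical name and the weights are invented.

```python
from collections import Counter
from typing import Dict, List


def context_score(context_words: List[str], weights: Dict[str, float]) -> float:
    """Sum of (occurrences * weight) over the words seen near a candidate phrase."""
    counts = Counter(word.lower() for word in context_words)
    return sum(freq * weights.get(word, 0.0) for word, freq in counts.items())


# Invented weights for two indicator words.
example_weights = {"seen": 0.5, "included": 0.25}

# Words observed near "interlobular septal thickening" across two sentences.
score = context_score(["included", "was", "seen", "seen"], example_weights)
```

Words without a weight contribute nothing, so a phrase that never appears near an indicator word scores zero.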
16
Phrase Ranking
  • Calculate a termhood(1) value for each phrase
  • Termhood is based on a combination of:
  • Nesting
  • Context Scores
  • Length
  • Orthography
  • Stop List

(1) Termhood refers to the likelihood that a
candidate is a real term [2].
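
The nesting and length components can be illustrated with the C-value measure from reference 2 (Frantzi et al.): a phrase's frequency is reduced by the average frequency of the longer candidates that contain it, then scaled by the log of its length in words. How the presenters combined nesting with the other factors is not stated, so this sketch covers C-value only; the function names and the frequencies are made up.

```python
import math
from typing import Dict, List


def contains(longer: List[str], shorter: List[str]) -> bool:
    """True if `shorter` occurs as a contiguous run inside the strictly longer phrase."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_values(freq: Dict[str, int]) -> Dict[str, float]:
    """C-value for each candidate phrase, given phrase -> corpus frequency."""
    tokens = {phrase: phrase.split() for phrase in freq}
    scores = {}
    for phrase, words in tokens.items():
        longer = [q for q, q_words in tokens.items() if contains(q_words, words)]
        adjusted = float(freq[phrase])
        if longer:
            # Subtract the mean frequency of the longer candidates containing it.
            adjusted -= sum(freq[q] for q in longer) / len(longer)
        # log2 of the length in words; single-word phrases therefore score 0.
        scores[phrase] = math.log2(len(words)) * adjusted
    return scores


# Made-up frequencies for illustration.
scores = c_values({
    "soft tissue": 10,
    "soft tissue mass": 4,
    "soft tissue infiltration": 2,
})
```

Here "soft tissue" keeps most of its score because it also occurs often outside the longer phrases that contain it.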
17
Term Splitting
  • Phrases typically consist of an observation
    accompanied by one or more characteristics of
    that observation
  • Term splitting splits phrases into component
    characteristics and observations
  • Based on frequency ratios
  • Makes two new ranked lists

Candidate Term: mediastinal soft tissue
infiltration
  • mediastinal
  • soft tissue
  • infiltration

18
Results
  • imaging observations
  • imaging observation characteristics
  • precision
  • Precision is defined as the number of correct terms
    divided by the number of terms proposed.
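
In the standard sense, precision is the share of proposed terms that a reviewer accepts as correct. A minimal sketch follows; the function name and the sample lists are illustrative and are not the study's data.

```python
from typing import Iterable


def precision(proposed: Iterable[str], accepted: Iterable[str]) -> float:
    """Fraction of proposed terms that appear in the accepted set."""
    proposed = list(proposed)
    if not proposed:
        return 0.0
    accepted_set = set(accepted)
    correct = sum(1 for term in proposed if term in accepted_set)
    return correct / len(proposed)


# Illustrative lists only.
proposed_terms = ["septal thickening", "luminal mass", "seen in", "capsular retraction"]
accepted_terms = ["septal thickening", "luminal mass", "capsular retraction"]
p = precision(proposed_terms, accepted_terms)
```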

19
Conclusions
  • LexEVS is a powerful tool for exploiting a
    variety of controlled vocabularies
  • Automatic term extraction can identify new
    imaging observations and observation
    characteristics
  • Adjusting context and processing can lead to
    other kinds of terms
  • Broader searches for articles will lead to larger
    collections of terms

20
Future Work
  • Use syntactic structure to improve extraction
  • Automatic identification of relationships
  • Infrastructure for distributed editing
  • Semantic Wiki

21
Selected References
  • 1. Langlotz CP. RadLex: a new method for indexing
    online educational materials. Radiographics. 2006
    Nov-Dec;26(6):1595-7.
  • 2. Frantzi K, Ananiadou S, Mima H. Automatic
    recognition of multi-word terms: the
    C-value/NC-value method. International Journal on
    Digital Libraries. 2000;3(2):115-130.
  • 3. Baneyx A, Charlet J, Jaulent M. Building an
    ontology of pulmonary diseases with natural
    language processing tools using textual corpora.
    International Journal of Medical Informatics.
    2007;76(2-3):208-215.
  • 4. Zhou L, Tao Y, Cimino J, Chen E, Liu H,
    Lussier Y, Hripcsak G, Friedman C. Terminology
    model discovery using natural language processing
    and visualization techniques. Journal of
    Biomedical Informatics. 2006;39(6):626-636.
  • 5. Church K, Hanks P. Word association norms,
    mutual information, and lexicography.
    Computational Linguistics. 1990;16(1):22-29.
  • 6. Snow R, Jurafsky D, Ng A. Learning syntactic
    patterns for automatic hypernym discovery.
    Advances in Neural Information Processing Systems.
    2005;17:1297-1304.

22
Example Query
  • NCI MetaThesaurus,
  • Cronkhite-Canada Syndrome,
  • exactMatch

23
Structure
24
References
  • Retrieve Candidate Terms
  • Query LexEVS
  • Selecting Scheme, Search Algorithm, Restrictions
  • Add Tags
  • Coding Scheme, Concept Code, etc
  • Store in database

25
UIMA Pipeline: Lexicons
  • LexEVS Annotator
  • Marks existing observations and characteristics,
    as well as anatomic parts, treatments, etc.
  • Can be used for context information later on.
  • Filters out existing terms.

Mock-up of Annotation Results
26
Context Detection
  • Looks for the identified context words
    before/after candidate terms.
  • Calculates a context value for each term based on
    the frequency of context terms and their weights.

Sample Context Terms
27
Term Ranking Results
28
Room for Improvement
  • Improve precision of candidate term selection
  • Use context-term groups as classifiers.
  • Segment the article and identify term-rich
    areas
  • Normalize the context and inverse document
    frequency distributions.
  • Context phrases
  • Improve linguistic filter and use more POS
    information
  • Stop list additions