Transcript and Presenter's Notes

Title: An NLP Ecosystem for Development and Use of Natural Language Processing in the Clinical Domain


1
An NLP Ecosystem for Development and Use of
Natural Language Processing in the Clinical
Domain
Integrating Data for Analysis, Anonymization, and
Sharing
  • Wendy W. Chapman, PhD

Division of Biomedical Informatics, University of
California, San Diego
2
Overview
  • The promise of natural language processing (NLP)
  • Challenges of developing NLP in the clinical
    domain
  • Challenges in applying NLP in the clinical domain
  • iDASH
  • Opportunities for sharing and collaboration in NLP

3
NLP Success
  • "Fresh off its butt-kicking performance on
    Jeopardy!, IBM's supercomputer 'Watson' has
    enrolled in medical school at Columbia
    University." New York Daily News, February 18,
    2011
  • "IBM's computer could very well herald a whole
    new era in medicine." ComputerWorld, February 17,
    2011
Dr. Watson??
4
Clinical NLP Since the 1960s
  • Why has clinical NLP had little impact on
    clinical care?

5
Barriers to Development
  • Sharing clinical data difficult
  • Have not had shared datasets for development and
    evaluation
  • Modules trained on general English not sufficient
  • Insufficient common conventions and standards for
    annotations
  • Data sets are unique to a lab
  • Not easily interchangeable

6
  • Limited collaboration
  • Clinical NLP applications are silos and black boxes
  • Have not had open source applications
  • Reproducibility is formidable
  • Open source release not always sufficient
  • Software engineering quality not always great
  • Mechanisms for reproducing results are sparse

7
Overview
  • The promise of natural language processing (NLP)
  • Challenges of developing NLP in the clinical
    domain
  • Challenges in applying NLP in the clinical domain
  • Developing an NLP ecosystem on iDASH

8
Security & Privacy Concerns
  • Clinical texts have many patient identifiers
  • 18 HIPAA identifiers
      • Names
      • Addresses
  • Items not regulated by HIPAA
      • "tight end for the Steelers"
  • Unique cases
      • 50-year-old woman who is pregnant
  • Sensitive information
      • HIV status

Institutions are reluctant to share data
9
  • Lack of user-centered development and scalability
  • Perceived cost of applying NLP outweighs the
    perceived benefit (Len D'Avolio)

10
Overview
  • The promise of natural language processing (NLP)
  • Challenges of developing NLP in the clinical
    domain
  • Challenges in applying NLP in the clinical domain
  • Developing an NLP ecosystem on iDASH

11
iDASH
  • integrating Data
  • Analysis
  • Anonymization
  • Sharing

Data
Software/Tools
Computational Resources
12
Disincentives to Share
  • Scooping by faster analysts
  • Exposure of potential errors in data
  • Resources for preparing data submissions
  • Maintaining data
  • Interacting with potential users takes time
  • Threat of privacy breach when human subjects are
    involved
  • Do not have policies in place
  • Fallible de-identification, anonymization
    algorithms

13
nlp-ecosystem.ucsd.edu
14
HIPAA and/or FISMA Compliant Cloud
Digital Informed Consent
  • Access control
  • De-identification
  • Query counts
  • Artificial data generators

Privacy preserving
Informed Consent Registry
Customizable DUAs
Researcher access
15
Bibliography
Schemas
Tutorials
Research
Guidelines
Resources
Education
NLP Ecosystem
UCSD Clinical Data
Data
Evaluation Workbench
De-Identification
MT Samples
Tools & Services
Collaborative Development Tools
TextVect
Virtual Machines
Annotation Admin & eHOST
Registry
16
Collaborative Effort to Build Ecosystem
Evaluation Workbench
De-Identification
Tools & Services
Collaborative Knowledge Authoring
TextVect
Increase access to NLP
Virtual Machines
Annotation Environment
Decrease Burden of Developing NLP
Registry
17
orbit
  • Increase ability to find NLP tools

18
Registry orbit.nlm.nih.gov
Len D'Avolio, Dina Demner-Fushman
19
De-identification service
  • Increase access to clinical text

20
De-identification
  • Several available de-identification modules
  • Need to adapt to local text
  • Efficient
  • Secure
  • Customizable ensemble de-identification system
  • Build a de-identified corpus
  • Incorporate existing de-id modules
  • Launch as virtual machine
  • Iterative training, evaluation, and modification
    by user
  • Correct mistakes
  • Add regular expressions

Brett South, Stephane Meystre, Oscar Fernandez,
Danielle Mowery
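The "Add regular expressions" step on the slide above could look like the following minimal Python sketch. The patterns, the placeholder tags, and the `deidentify` function are illustrative assumptions and are not taken from the iDASH de-identification service. A real system would need far broader coverage of the HIPAA identifier types and would combine rules with trained modules.

```python
import re

# Hypothetical, user-supplied patterns: placeholder tag -> compiled regex.
# These are examples only and will miss many real identifier formats.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}


def deidentify(text, patterns=PATTERNS):
    """Replace every match of each pattern with its bracketed tag."""
    for tag, pattern in patterns.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


note = "Seen 03/14/2011, MRN: 00123456, call 619-555-0100."
print(deidentify(note))
```

In an iterative workflow like the one listed on the slide, a user would inspect missed identifiers, add or adjust a pattern, and rerun the evaluation.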
21
TextVect
  • Increase access to textual features

22
TextVect
NLM: Abhishek Kumar
23
collaborative Knowledge Authoring Support Service
(cKASS)
  • Decrease the Burden of Customizing an NLP
    Application

24
Customizing an IE App
User's Concepts: Cough, Dyspnea, Infiltrate on
CXR, Wheezing, Fever, Cervical Lymphadenopathy
IE Output
Map
25
Customizing an IE App
User's Concepts: Cough, Dyspnea, Infiltrate on
CXR, Wheezing, Fever, Cervical Lymphadenopathy
IE Output: Dry cough, Productive cough, Cough,
Hacking cough, Bloody cough
Which concepts?
26
Customizing an IE App
User's Concepts: Cough, Dyspnea, Infiltrate on
CXR, Wheezing, Fever, Cervical Lymphadenopathy
IE Output: Temp 38.0C, Low-grade temperature
What is a fever?
27
Customizing an IE App
User's Concepts: Cough, Dyspnea, Infiltrate on
CXR, Wheezing, Fever, Cervical Lymphadenopathy
IE Output: "NECK: no adenopathy"
Disorder: adenopathy
Negation: negated
Section mapping
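To make the last example concrete, here is a minimal Python sketch of how a report line such as "NECK: no adenopathy" might yield a disorder, a negation status, and a section. The trigger list, the assumed `SECTION: text` line format, and the `extract` function are made up for illustration. This is a toy rule, not the algorithm of any particular IE application.

```python
import re

# Assumed, deliberately tiny trigger list; real negation lexicons are larger.
NEGATION_TRIGGERS = ("no", "denies", "without")


def extract(line, concept):
    """Find a single-word concept in a line and report section and negation."""
    # Split an assumed "SECTION: text" line into section header and body.
    section, _, body = line.partition(":")
    if not body:
        section, body = "", line
    tokens = re.findall(r"[a-z]+", body.lower())
    if concept.lower() not in tokens:
        return None
    idx = tokens.index(concept.lower())
    # Toy rule: the concept is negated if any trigger precedes it in the line.
    negated = any(tok in NEGATION_TRIGGERS for tok in tokens[:idx])
    return {
        "section": section.strip(),
        "disorder": concept,
        "negation": "negated" if negated else "affirmed",
    }


print(extract("NECK: no adenopathy", "adenopathy"))
```

Mapping the extracted section ("NECK") and disorder ("adenopathy") onto a user's concept such as Cervical Lymphadenopathy is the customization step that the slide calls section mapping.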
28
KOS-IE: Knowledge Organization Systems for
Information Extraction
29
Compile information helpful for IE
30
Collaborative Knowledge Base Development: cKASS
NLP Tools
  • Physician
  • Radiologist
  • Nurse
  • Clinical Researcher
  • Knowledge Engineer

Decision Support System
User KB
Shared KB
External KB
LQ Wang, M Conway, F Fana, M Tharp, D Hillert
31
Knowledge Authoring
  • Augment user KB with lexical variants, synonyms,
    and related concepts
  • User-driven authoring
  • Top-down: Provide access to external knowledge
    sources
  • UMLS, Specialist Lexicon, BioPortal
  • Bottom-up: Annotate to derive synonyms
  • Recommendation-based authoring
  • Generate lexical variants
  • Mine external knowledge sources
  • Mine patient records

32
Evaluation workbench
  • Decrease the Burden of Evaluation & Error Analysis

33
Evaluation Workbench
  • Compare the output of two NLP annotators on
    clinical text
  • NLP system vs human annotation
  • View annotations
  • Calculate outcome measures
  • Drill down to all levels of annotation
  • Document-level
  • Perform error analysis
  • Future versions will support formal error
    analysis
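As one possible reading of "Calculate outcome measures", the Python sketch below compares system annotations with human reference annotations and reports precision, recall, and F1. The exact-match criterion on `(start, end, label)` tuples and the `outcome_measures` function are assumptions for illustration. The slides do not state which measures or matching rules the Evaluation Workbench uses.

```python
def outcome_measures(reference, system):
    """Precision, recall, and F1 using exact match on (start, end, label)."""
    ref, hyp = set(reference), set(system)
    tp = len(ref & hyp)   # annotations both sides agree on
    fp = len(hyp - ref)   # system annotations absent from the reference
    fn = len(ref - hyp)   # reference annotations the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical character-offset annotations for one report.
human = [(0, 5, "cough"), (10, 15, "fever")]
nlp = [(0, 5, "cough"), (20, 28, "dyspnea")]
print(outcome_measures(human, nlp))
```

The false positives and false negatives counted here are the annotations a reviewer would drill into during error analysis.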

34
Levels of Annotation
  • Document
      • Report classified as Shigellosis
  • Group
      • Section classified as Past Medical History
        Section
  • Utterance
      • Group of text classified as Sentence
  • Snippet
      • "chest pain" classified as CUI 058273
  • Word
      • "pain" classified as noun
  • Token
      • "." classified as EOS marker

35
Select Classifications to View
Document annotations
Outcome Measures for Selected Annotations
Report List
Attributes for Selected Annotation
Relationships for Selected Annotation
VA and ONC SHARP: Christensen, Murphy, Frabetti,
Rodriguez, Savova
36
Annotation Environment
  • Decrease the Burden of Annotation

37
Challenges to Annotating
  • Time consuming
  • Recruiting & training annotators for high
    agreement
  • Expensive
  • Domain experts especially expensive
  • Need for annotation by multiple people
  • Challenging to design annotation task
  • How many annotators?
  • How should I quantify quality of annotations?
  • Logistically challenging
  • Managing files and batches of reports
  • Setting up annotation tool
  • Reinventing the wheel
  • Hasn't someone created a schema for this before?
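One common answer to "How should I quantify quality of annotations?" when two annotators label the same items is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The slides do not name a specific measure, so the choice of kappa and the `cohens_kappa` function below are assumptions offered as a minimal Python sketch.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[lab] * count_b[lab] for lab in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical document-level labels from two annotators.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa suits categorical labels on a fixed set of items. For span annotation, where there is no fixed item set, pairwise F1 between annotators is a frequently used alternative.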

38
How can we reduce the burden of annotation?
39
iDASH Annotation Environment
Goal: provide an environment to decrease
the burden of annotation for research and
application
Annotator Registry
eHOST
Annotation Admin
Web application iDASH cloud
Client app on your computer
VA, SHARP, and NIGMS: S Duvall, B South, G
Savova, N Elhadad, H Hochheiser
40
Annotator Registry
  • Enlist for annotation
  • Certify for annotation tasks
  • Personal health information
  • Part-of-speech tagging
  • UMLS mapping
  • Set pay rate
  • Searchable
  • Available for inclusion in new annotation task
  • http://idash.ucsd.edu/nlp-annotator-registry

41
Annotation Admin: Intended Users & Uses
  • Users
  • NLP researchers
  • Annotation administrators
  • Uses
  • Manage annotation projects: who annotates what
  • Currently done with hundreds of files on hard
    drive
  • Integrate with annotation tool (eHOST)
  • Download batches of raw reports to annotators
  • Upload and store annotated reports
  • Manage simple annotation projects
  • Facilitate distributed annotation

42
Annotation Admin
1. Assign annotators to a task
43
2. Create a Schema
44
3. Assign users and set time expectations
45
4. Keep track of progress
46
Collaborative Effort to Build Resources
Evaluation Workbench
De-Identification
Tools & Services
Collaborative Knowledge Authoring
TextVect
Increase access to NLP
Virtual Machines
Annotation Environment
Decrease Burden of Developing NLP
Registry
47
Conclusion
  • More demand for EHR data
  • NLP has potential to extend value of narrative
    clinical reports
  • There have been many barriers
  • To development
  • To deployment
  • Recent developments facilitate collaboration &
    sharing
  • Common annotation conventions
  • Privacy algorithms
  • Shared datasets
  • Hosted environments
  • iDASH hopes to facilitate
  • Development of NLP
  • Application of NLP

48
Questions & Discussion
Integrating Data for Analysis, Anonymization, and
Sharing
iDASH/ShARe Workshop on Annotation, September 29,
2012, La Jolla, CA
Division of Biomedical Informatics, University of
California, San Diego
  • wwchapman_at_ucsd.edu