Carol Friedman, PhD - PowerPoint PPT Presentation

Slides: 41
Provided by: carolfr
Learn more at: http://aimedicine.info
1
Discovering Novel Adverse Drug Events Using
Natural Language Processing and Mining
of Electronic Health Records
  • Carol Friedman, PhD
  • Department of Biomedical Informatics
  • Columbia University

2
Motivation: Severity of Problem
  • Clinical trials do not test a broad population
  • Adverse Drug Events (ADEs) are a world-wide problem
  • Expense from ADEs is $5.6 billion annually
  • Estimated that over 2 million patients are
    hospitalized due to ADEs
  • ADEs are the fourth leading cause of death

In US alone
3
Motivation: Limitations of Approaches
  • Manual review of case reports (Venulet J 1988)
  • Spontaneous reporting to designated agency (Evans
    JM 2001; Eland IA 1999; Wysowski DK 2005)
  • Serious ADEs reported less than 1-10% of the time
  • Reporting is voluntary for physicians/patients
  • Recognition of ADEs is highly subjective
  • Difficult to determine cause of ADE
  • Biased by length of time on market and other
    factors
  • Cannot determine number of patients on drug or
    percent at risk
  • Drug prescribing/claims data (Hershman D 2007;
    Ray WA 2009)

4
Severity of Underreporting
  • Study showed that 87% of the time physicians ignored
    patient reports of known ADEs
  • (Golomb et al. Physician response to
    patient reports of adverse drug effects. Drug
    Safety 2007)

5
Related Work
  • Automated methods mainly based on spontaneous
    reporting databases
  • Most methods use (Evans SJ 2001; Szarfman A 2002):
  • Surrogate observed-to-expected ratios
  • Incidence of drug-event reporting compared to
    background reporting across all drugs and events
  • Some research aimed at improving effectiveness of
    SPR databases
  • Create ontology of higher order adverse events
  • MedDRA
  • Avoid fragmentation of signal

6
Related Work
  • Pharmacoepidemiology databases used to confirm
    suspicions
  • General practice research database (GPRD) (Wood
    and Martinez 2004)
  • New Zealand Intensive Medicines Monitoring (IMMP)
    (Coulter 1998)
  • Medicine Monitoring Unit (MEMO) (Evans et al.
    2001)
  • EHR databases used to find signals (Brown JS et
    al. 2007; Berlowitz DR et al. 2006; Wang X et al.
    2009)
  • Mainly coded data used
  • Has potential for active real time surveillance
  • Should reduce biased reporting

7
Related Work
  • Consortiums involving multiple EHRs
  • EU-ADR project (http://www.alert-project.org/)
  • eHealth initiative (http://www.ehealthinitiative.org/drugSafety/)
  • Related work using EHR to detect known ADEs, not
    aimed at discovering novel ADEs
  • (Bates DW 2003; Honigman B 2001)

8
Exploiting the Electronic Health Record
[Diagram; surviving labels: Data, NLP, Integration]
9
The Electronic Health Record (EHR)
  • Rich source of patient information
  • Mostly untapped
  • Primary use for EHR
  • Documenting care in multi-provider environment
  • Manual review by providers
  • More complete than coded ICD-9 codes
  • Symptoms
  • Clinical conditions not beneficial for billing
  • Fragmented
  • Heterogeneous
  • Noisy

10
Research Opportunities: NLP Issues
  • Occurrence of clinical events in natural language
  • Drugs, diseases, symptoms
  • Temporal information is critical
  • Irregularity of reports
  • Section headings important but abbreviated/missing
  • Use of indentation, lists, run on sentences
  • Tables: semi-structured data in reports
  • Abbreviations
  • "2/2" meaning "secondary to"
  • "co" meaning "cardiac output" or "complaining of"
  • Mapping terms in text to an ontology/controlled
    vocabulary
  • "infiltrate" in a chest x-ray report means "chest infiltrate"
  • ontology terms more limited than language
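
The abbreviation ambiguity described above can be illustrated with a minimal sketch. The sense inventory holds only the two examples from this slide, and the disambiguation cue (a number after "co" suggests a measurement) is a toy assumption, not the method used by any particular NLP system.

```python
import re

# Hypothetical sense inventory; both entries are the slide's own examples.
SENSES = {
    "2/2": ["secondary to"],
    "co": ["cardiac output", "complaining of"],
}

def expand(token: str, next_token: str) -> str:
    """Expand an abbreviation, using the following token as a crude context cue."""
    senses = SENSES.get(token.lower())
    if not senses:
        return token                      # not a known abbreviation
    if len(senses) == 1:
        return senses[0]                  # unambiguous in this toy inventory
    # Toy cue (assumption): a numeric value after "co" suggests a measurement.
    if re.fullmatch(r"\d+(\.\d+)?", next_token):
        return "cardiac output"
    return "complaining of"

print(expand("co", "4.5"))
print(expand("co", "chest"))
```

A real system would need far richer context (section, neighbouring terms, report type); the point is only that the same token maps to different concepts.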

11
Research Opportunities: Statistical Issues
  • Find associations between drug, symptoms, and
    diseases
  • Not explicit in EHR
  • Large volumes of data
  • Statistical significance vs. clinical
    significance
  • Statistical associations not relationships
  • Drug treats condition / Drug causes condition
  • Integrating time sequences is important
  • For "treats": condition must precede drug event
  • For "causes": drug event must precede condition
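
The temporal rule above can be sketched as a small function. The labels and the function name are hypothetical, and the sketch assumes each event already has a date, which later slides note is often not the case.

```python
from datetime import date
from typing import Optional

def candidate_relation(drug_date: Optional[date], condition_date: Optional[date]) -> str:
    """Label a drug-condition pair by which event was recorded first."""
    if drug_date is None or condition_date is None:
        return "unknown"            # a date is missing for one of the events
    if condition_date < drug_date:
        return "possible_treats"    # condition precedes drug event
    if drug_date < condition_date:
        return "possible_causes"    # drug event precedes condition
    return "unknown"                # same day: order cannot be decided

print(candidate_relation(date(2004, 3, 1), date(2004, 3, 9)))
```

Temporal order only narrows the candidates; it does not establish that the drug treats or causes the condition.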

12
Research Opportunities: Statistical Issues
  • Confounding (indirect associations)
  • Metolazone treats heart failure (HF)
  • HF is manifested by shortness of breath (SOB)
  • Metolazone and SOB indirectly related
  • Higher order associations
  • Drug interactions: drug1, drug2, condition
  • Drug contraindications: drug, disease, condition
  • Rare ADEs

13
Other Research Opportunities: Knowledge Acquisition
  • Structured Knowledge bases
  • UMLS relations (may_be_treated_by)
  • Proprietary ones usually unavailable
  • Text/Semi-Structured Knowledge (need NLP)
  • Spontaneous reporting databases: indications,
    drugs, adverse events
  • Literature (Medline)
  • Web sites (WebMD, Micromedex)
  • Online medical textbooks
  • Claims Data (Health IT payors)

14
Text Mining for Knowledge Acquisition
  • Statistical methods: co-occurrences
  • Discovered associations between diseases and
    diets from literature (Weeber M 2002)
  • Identified disease candidate genes (Hristovski D
    2005)
  • NLP systems
  • Trends in medications based on the literature and
    narrative clinical reports (Chen ES 2007, 2008)
  • Semantic relations in the literature (Hristovski
    D 2006)

15
Overview of Our NLP-EHR based Pharmacovigilance
System
16
Natural Language Processing of EHR
17
Meds: Tegretol xr, Zocor
All: Several sz meds
PMHx: sz d/o - well controlled on tegretol
      high chol - on zocor
      CAD - 60% lesion in LADM by cath
      MR - secondary to mitral prolapse
PSHx: rib fx in 2001, shoulder fx secondary to trauma
Vitals: 130/80 12 80
A/P: 54 y/o m with mult med problems, all relatively well controlled. Pt sz free, not anemic as of 2/2003. Concerned of MR and its possible long term effects.
18
Coded Output from NLP
  • med: tegretol xr
  • sectname>> report medication item
  • code>> UMLS:C0592163_Tegretol XR
  • med: zocor
  • sectname>> report medication item
  • code>> UMLS:C0678181_Zocor
  • .........
  • problem: mitral valve regurgitation
  • sectname>> report past history item
  • code>> UMLS:C0026266_Mitral Valve
    Insufficiency
  • ..
  • problem: rib fracture
  • date>> 2001
  • sectname>> report past history item

19
Coding Issues
  • Not all conditions have codes
  • Non-communicative
  • Some conditions are combinations of codes
  • Difficulty sleeping
  • Vascular injury
  • Granularity of coding system
  • Many different codes for a concept
  • Asthma: asthma exacerbation, asthma disturbing
    sleep, moderate asthma, suspected asthma,

20
Standardizing Coded Data
[Pipeline diagram; surviving labels in order: MedLEE NLP; C0744727 low hematocrit; Standardize, integrate; HCT20; Selecting, filtering; Detect associations; Eliminate confounding; ADE Signals; Medical knowledge]
21
Standardizing Coded EHR Data: Laboratory Tests
and Medications
  • Lab values denoting normal/abnormal vary
  • Abnormal range may depend on age, sex, ethnicity,
    weight
  • Change in lab values and duration must be
    considered
  • Standardizing medications is complex and requires
    additional knowledge
  • Tradename to generic (Avandia → rosiglitazone)
  • Handling of combination medications
  • 1.5% Lidocaine with 1:200,000 Epinephrine
  • Handling of dose, route
  • Diazepam 2 MG Oral Tablet
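
A minimal sketch of medication standardization follows, under strong assumptions. The brand-to-generic table is a tiny hand-written stand-in for a real drug knowledge source, and the regular expression covers only simple single-ingredient strings such as the diazepam example. Combination products are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical lookup table; a real system would draw on a drug vocabulary.
BRAND_TO_GENERIC = {
    "avandia": "rosiglitazone",
    "tegretol": "carbamazepine",
    "zocor": "simvastatin",
}

class Medication(NamedTuple):
    ingredient: str
    dose: Optional[float]
    unit: Optional[str]
    route: Optional[str]
    form: Optional[str]

# Illustrative pattern: name, optional dose + unit, optional route, optional form.
PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+)"
    r"(?:\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>MG|MCG|G|ML))?"
    r"(?:\s+(?P<route>Oral|Topical|Injectable))?"
    r"(?:\s+(?P<form>Tablet|Capsule|Solution))?$",
    re.IGNORECASE,
)

def standardize(text: str) -> Optional[Medication]:
    """Return a normalized medication, or None if the string is not understood."""
    m = PATTERN.match(text.strip())
    if m is None:
        return None   # e.g. combination products need separate handling
    name = m.group("name").lower()
    lower = lambda s: s.lower() if s else None
    return Medication(
        ingredient=BRAND_TO_GENERIC.get(name, name),
        dose=float(m.group("dose")) if m.group("dose") else None,
        unit=lower(m.group("unit")),
        route=lower(m.group("route")),
        form=lower(m.group("form")),
    )

print(standardize("Diazepam 2 MG Oral Tablet"))
print(standardize("Avandia"))
```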

22
Selecting and Filtering
  • Select using UMLS classes
  • (diseases, medications)
  • Filter out
  • negations, past info,
  • wrong time order
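
The select-then-filter step can be sketched as below. The `Finding` record and its fields are hypothetical simplifications of NLP output, and the coarse class names stand in for UMLS semantic classes. The "wrong time order" filter is a pairwise check between a drug and a condition, so it is omitted here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Finding:
    term: str
    semantic_class: str      # hypothetical coarse class, e.g. "disease"
    negated: bool = False    # e.g. "not anemic"
    past: bool = False       # e.g. an item from past surgical history

KEEP_CLASSES = {"disease", "medication"}

def select_and_filter(findings: List[Finding]) -> List[Finding]:
    """Keep diseases and medications that are neither negated nor past."""
    kept = []
    for f in findings:
        if f.semantic_class not in KEEP_CLASSES:
            continue              # selection by semantic class
        if f.negated or f.past:
            continue              # filtering of negations and past info
        kept.append(f)
    return kept

findings = [
    Finding("warfarin", "medication"),
    Finding("cardiomegaly", "disease"),
    Finding("anemia", "disease", negated=True),
    Finding("rib fracture", "disease", past=True),
    Finding("divorce", "finding"),
]
print([f.term for f in select_and_filter(findings)])
```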

23
Selecting and Filtering
  • Dependence on accuracy of semantic classification
  • UMLS classification errors
  • - Finding: birth history, cardiac output,
    divorce
  • Finding: cardiomegaly, fever
  • Temporal information difficult to obtain
  • An adverse drug event should only follow drug
    event
  • Processing of explicit time information is
    complex and vague
  • Yesterday, last admission, 2/5
  • Information typically occurs in reports without
    dates

24
Detect Associations
  • Obtain event frequencies
  • Co-occurrence frequencies
  • Form 2x2 tables
  • Calculate associations
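
The four steps above can be sketched on toy data. Each report is assumed to be reduced to a set of coded terms, and the chi-square statistic is used only as one common association measure. The slides do not say which measure was calculated, so that choice is an assumption.

```python
from typing import Iterable, Set, Tuple

def two_by_two(reports: Iterable[Set[str]], drug: str, event: str) -> Tuple[int, int, int, int]:
    """Count reports with (drug & event, drug only, event only, neither)."""
    a = b = c = d = 0
    for codes in reports:
        has_drug, has_event = drug in codes, event in codes
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                 # an empty margin: statistic is undefined
    return n * (a * d - b * c) ** 2 / denom

# Toy reports, invented purely for illustration.
reports = [
    {"warfarin", "bleeding"},
    {"warfarin", "bleeding"},
    {"warfarin"},
    {"bleeding"},
    {"ibuprofen"},
    set(),
]
table = two_by_two(reports, "warfarin", "bleeding")
print(table, chi_square(*table))
```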

25
Detect Associations
  • Correct temporal sequence is critical
  • Drug event should precede adverse event
  • Dates are not usually stated along with events
  • Section of reports helpful surrogate
  • Statistical associations correspond to different
    clinical relations
  • For pharmacovigilance
  • Want drug causes adverse event
  • Confounding caused by dependencies in data

26
(No Transcript)
27
Confounding Interdependencies
[Diagram; nodes: ML, HD, SOB]
ML = Metolazone; HD = Hypertensive Disease; SOB = Shortness of Breath
28
Drug Associations Network
[Network diagram; nodes: Rx, Rx1-n, Sx, Sx1-n, Dx, Dx1-n; edge labels: treatment, association, ADE, process]
29
Reduce Confounding
Eliminate confounding
Medical knowledge
30
Reduce Confounding
  • Collect knowledge from external sources and
    associations
  • Drug-treat-disease
  • Disease-manifested by-symptom
  • Drug-interacts with-drug
  • Use Information theory
  • Mutual Information (MI)
  • Data processing inequality
  • MI3 < min(MI1, MI2)

[Diagram; nodes: Drug, Disease, Adverse Event; edges labeled MI1, MI2, MI3]
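
A minimal sketch of the mutual information and data processing inequality check, assuming binary presence/absence variables summarized as 2x2 counts. It also assumes that MI3 labels the drug to adverse event edge and that MI1 and MI2 label the two edges through the disease; the transcript does not confirm which edge carries which label.

```python
import math

def mutual_information(a: int, b: int, c: int, d: int) -> float:
    """Mutual information in bits of two binary variables from 2x2 counts.

    a = both present, b = first only, c = second only, d = neither.
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    # (joint count, row margin, column margin) for each cell
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    mi = 0.0
    for joint, row, col in cells:
        if joint > 0:   # empty cells contribute nothing
            mi += (joint / n) * math.log2(joint * n / (row * col))
    return mi

def looks_indirect(mi_drug_event: float, mi_drug_disease: float, mi_disease_event: float) -> bool:
    """Data processing inequality: if drug -> disease -> event is a chain,
    the drug-event MI cannot exceed either leg, so a weaker drug-event edge
    is treated as a candidate indirect association."""
    return mi_drug_event < min(mi_drug_disease, mi_disease_event)

print(mutual_information(5, 0, 0, 5))   # perfectly dependent variables
print(mutual_information(5, 5, 5, 5))   # independent variables
print(looks_indirect(0.02, 0.30, 0.25))
```

In the metolazone example from the earlier slide, the drug and shortness of breath edge would be the one tested against the drug to heart failure and heart failure to shortness of breath edges.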
31
Initial Study: Methods
  • 6 drugs chosen
  • Ibuprofen, Morphine, Warfarin: long time on market
    with known ADEs
  • Bupropion, Paroxetine, Rosiglitazone: ADEs
    discovered after 2004
  • 1 drug class: ACE inhibitors
  • 25,074 textual discharge summaries in 2004 from
    NYPH processed using MedLEE NLP
  • Reference standard created using expert knowledge
    sources
  • Drug-potential ADE pairs determined
  • Recall/precision calculated
  • Qualitative analysis performed to classify
    drug-potential ADE pairs detected

32
Initial Study: Results
  • Quantitative
  • recall (.75), precision (.30)
  • Qualitative analysis of potential drug-ADE pairs
  • Known drug-ADEs: 30%
  • Drug-indication pairs: 30%
  • Remote drug-indication pairs: 33%
  • Unknown clinical associations: 6%
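
For reference, recall and precision against a reference standard can be computed as sketched below. The pairs are invented toy data, not the study's drug-ADE pairs, and the resulting numbers are unrelated to the figures reported above.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]   # (drug, potential ADE)

def recall_precision(detected: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Recall = hits / reference size; precision = hits / detected size."""
    hits = len(detected & reference)
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(detected) if detected else 0.0
    return recall, precision

# Toy example (hypothetical pairs).
reference = {("drugA", "rash"), ("drugA", "nausea"), ("drugB", "bleeding"), ("drugB", "headache")}
detected = {("drugA", "rash"), ("drugA", "nausea"), ("drugB", "bleeding"),
            ("drugA", "fever"), ("drugB", "cough")}
print(recall_precision(detected, reference))
```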

33
Confounding Interdependencies
[Diagram; nodes: Drug, Disease, Disease2, Adverse Event; edge labels: Treats, Manifested by, Cause_ADE]
34
Study 2: Reduction of Confounding
  • Evaluation set
  • 14 associations related to 2 drugs from Study 1
  • Reference standard
  • Drug-ADE associations determined and MI, DPI used
    to automatically classify them

Drug-ADE Relation
  • Direct: side effects of the drug (Rosiglitazone-headache)
  • Indirect: conditions related to the disease/symptoms the drug treats (Metolazone-shortness of breath)
  • Either: conditions in both direct and indirect categories (Rosiglitazone-chest pain)
35
Results
  • Precision
  • 0.86 when handling confounding
  • 0.31 without handling confounding

36
Discussion: Limitations and Future Directions
  • Mutual information was the only strategy used to
    handle confounding
  • More complex MI strategy will be explored
  • Other statistical/knowledge based methods will be
    explored
  • Inpatient data only/sicker patient population
  • The same methods could be used for outpatient
    data as well - possibly more noisy
  • Drug dosage, drug-drug and more complex
    interactions should be explored

37
Discussion: Limitations and Future Directions
  • Small evaluation data set
  • More comprehensive evaluation
  • Limitations inherent from NLP, coding,
    association detection
  • Limitations due to fragmented/incomplete patient
    data

38
Summary
  • Need for more pharmacovigilance research
  • Based on the EHR
  • Using available databases and text
  • Studies demonstrated promising results
  • Many interesting research opportunities
  • Natural language processing
  • Statistical methods
  • Integrating different sources of data
  • Gathering knowledge from different sources
  • Automated knowledge acquisition for evidence
    based medicine

39
Acknowledgement
  • NLP Data Mining group at DBMI at Columbia
  • George Hripcsak
  • Marianthi Markatou
  • Herb Chase
  • Xiaoyan Wang
  • David Albers
  • Jung-wei Fan
  • Lyudmila Shagina
  • Noemie Elhadad
  • Grants
  • R01 LM007659 from NLM
  • R01 LM008635 from NLM
  • R01 LM06910 from NLM
  • 5T15LM007079 from NLM training grant

40
QUESTIONS
THANK YOU!