Literature-Based Knowledge Discovery using Natural Language Processing

Transcript and Presenter's Notes

1
Literature-Based Knowledge Discovery using
Natural Language Processing
Dimitar Hristovski,1 PhD, Carol Friedman,2 PhD,
Thomas C Rindflesch,3 PhD, Borut Peterlin,4 MD PhD
1 Institute of Biomedical Informatics, Medical Faculty, University of Ljubljana, Slovenia
2 Department of Biomedical Informatics, Columbia University, New York
3 National Library of Medicine, Bethesda, Maryland
4 Division of Medical Genetics, UMC, Slajmerjeva 3, Ljubljana, Slovenia
e-mail: dimitar.hristovski_at_mf.uni-lj.si
2
Part 1 Co-occurrence based LBD
3
Motivation
  • Overspecialization
  • Information overload
  • Large databases
  • Need and opportunity for computer supported
    knowledge discovery

4
Literature-based Discovery (LBD)
  • A method for automatically generating hypotheses
    (discoveries) from literature
  • Hypotheses have the form: Concept1 Relation
    Concept2
  • Example: Fish oil Treats Raynaud's disease

5
Background
  • Swanson's LBD paradigm

[Diagram: Concept X (Disease), e.g. Raynaud's;
Concepts Y (Pathological or Cell Function, ...), e.g. Blood viscosity;
Concepts Z (Drugs, ...), e.g. Fish oil;
New Relation? between X and Z, e.g. Treats]
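The X, Y, Z paradigm on this slide can be sketched as a small search over co-occurrence data. This is only an illustration, assuming that two concepts count as related whenever they appear in the same document. The toy documents (including the "Vasoconstriction" and "Aspirin" entries) and the function names `build_cooccurrence` and `open_discovery` are hypothetical and are not taken from BITOLA.

```python
from collections import defaultdict

# Toy data: each set holds the concepts mentioned together in one document.
# These documents are illustrative assumptions, not real Medline records.
documents = [
    {"Raynaud's", "Blood viscosity"},
    {"Raynaud's", "Vasoconstriction"},
    {"Blood viscosity", "Fish oil"},
    {"Vasoconstriction", "Fish oil"},
    {"Blood viscosity", "Aspirin"},
    {"Raynaud's", "Aspirin"},
]


def build_cooccurrence(docs):
    """Map every concept to the set of concepts it co-occurs with."""
    neighbours = defaultdict(set)
    for doc in docs:
        for a in doc:
            for b in doc:
                if a != b:
                    neighbours[a].add(b)
    return dict(neighbours)


def open_discovery(x, docs):
    """Return {Z: linking Y concepts} for every Z reachable from X via some Y
    but never co-occurring with X directly (a candidate new relation)."""
    neighbours = build_cooccurrence(docs)
    known = neighbours.get(x, set())
    candidates = defaultdict(set)
    for y in known:
        for z in neighbours.get(y, set()):
            if z != x and z not in known:
                candidates[z].add(y)
    return dict(candidates)


hypotheses = open_discovery("Raynaud's", documents)
for z, linking in hypotheses.items():
    print(z, "via", sorted(linking))
```

In this toy data "Aspirin" is dropped because it already co-occurs with the starting concept, so only the indirect link to "Fish oil" is proposed.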
6
Biomedical Discovery Support System (BITOLA)
  • Goal
    • discover potentially new relations (knowledge)
      between biomedical concepts
    • to be used as a research idea generator and/or as
      an alternative way to search Medline
  • System user (researcher or intermediary)
    • interactively guides the discovery process
    • evaluates the proposed relations

7
Extending and Enhancing Literature Based
Discovery
  • Goal
    • Make literature based discovery more suitable for
      disease candidate gene discovery
    • Decrease the number of candidate relations
  • Method
    • Integrate background knowledge
      • Chromosomal location of diseases and genes
      • Gene expression location
      • Disease manifestation location

8
System Overview
[Diagram components: Databases (Medline, LocusLink, HUGO, OMIM, ...); Knowledge Extraction; Knowledge Base; Concepts; Association Rules; Background Knowledge (Chromosomal Locations, ...); Discovery Algorithm; User Interface]
9
Terminology Problems during Knowledge Extraction
  • Gene names
  • Gene symbols
  • MeSH and genetic diseases

10
Detected Gene Symbols by Frequency
  • type: 666548
  • II: 552584
  • III: 201776
  • component: 179643
  • CT: 175973
  • AT: 151337
  • ATP: 147357
  • IV: 123429
  • CD4: 99657
  • p53: 89357
  • MR: 88682
  • SD: 85889
  • GH: 84797
  • LPS: 68982
  • 59: 67272
  • E2: 64616
  • 82: 63521
  • AMP: 61862
  • TNF: 59343
  • RA: 58818
  • CD8: 57324
  • O2: 56847
  • ACTH: 54933
  • CO2: 53171
  • PKC: 51057
  • EGF: 50483
  • T3: 49632
  • MS: 46813
  • A2: 44896
  • ER: 43212
  • upstream: 41820
  • PRL: 41599

11
Gene Symbol Disambiguation
  • Find MEDLINE docs in which we can expect to find
    gene symbols
  • Example of a false positive
    • Ethics in a twist: "Life Support", BBC1. BMJ 1999
      Aug 7;319(7206):390
    • breast basic conserved 1 (BBC1) gene vs. BBC1
      television station featuring the new drama series
      Life Support
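One way to picture the idea of looking for gene symbols only in documents where they can be expected is a simple context check. The slides do not say how such documents are selected, so the cue-word list, the function names and the second example sentence below are all assumptions made for illustration.

```python
import re

# Hypothetical context cues; the slides do not specify the selection criteria.
GENETIC_CUES = {"gene", "genes", "mutation", "chromosome", "allele", "expression"}


def has_genetic_context(text):
    """True if the text contains at least one of the assumed cue words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & GENETIC_CUES)


def detect_gene_symbols(text, known_symbols):
    """Report known symbols only when the text has a genetic context."""
    if not has_genetic_context(text):
        return set()
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & known_symbols


known = {"BBC1", "TNF"}
tv_review = 'Ethics in a twist: "Life Support", BBC1.'
# Illustrative sentence written for this sketch, not a real citation.
gene_text = "The breast basic conserved 1 (BBC1) gene is expressed in breast tissue."

print(detect_gene_symbols(tv_review, known))  # no genetic cue, nothing reported
print(detect_gene_symbols(gene_text, known))  # the cue word "gene" is present
```

A cue list this small would miss many genuine mentions; it only shows where such a filter would sit in the extraction step.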

12
Binary Association Rules
  • X → Y (confidence, support)
  • If X Then Y (confidence, support)
  • Confidence: % of docs containing Y within the X
    docs
  • Support: number (or %) of docs containing both X
    and Y
  • The relation between X and Y is not known.
  • Examples
    • Multiple Sclerosis → Optic Neuritis (2.02%, 117)
    • Multiple Sclerosis → Interferon-beta (5.17%, 300)
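The two rule measures defined on this slide can be computed directly from document sets. The sketch below assumes each document is represented as a set of concepts, takes support as a raw document count and confidence as a percentage; the function name `association_rule` and the toy corpus are hypothetical.

```python
def association_rule(docs, x, y):
    """Return (confidence in %, support as a document count) for the rule x -> y."""
    x_docs = [d for d in docs if x in d]
    support = sum(1 for d in x_docs if y in d)  # docs containing both x and y
    confidence = 100.0 * support / len(x_docs) if x_docs else 0.0
    return confidence, support


# Toy corpus (illustrative counts, not Medline figures): 10 documents mention
# Multiple Sclerosis, 3 of them also mention Interferon-beta.
docs = (
    [{"Multiple Sclerosis", "Interferon-beta"}] * 3
    + [{"Multiple Sclerosis", "Optic Neuritis"}] * 1
    + [{"Multiple Sclerosis"}] * 6
    + [{"Optic Neuritis"}] * 2
)

confidence, support = association_rule(docs, "Multiple Sclerosis", "Interferon-beta")
print(confidence, support)
```

The rule is directional: Optic Neuritis → Multiple Sclerosis has the same support as Multiple Sclerosis → Optic Neuritis but a different confidence, because the denominator changes. If the first number in the slide's examples is read as a percentage, both examples imply roughly the same number of Multiple Sclerosis documents (300 / 0.0517 ≈ 5,800 and 117 / 0.0202 ≈ 5,790).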

13
Discovery Algorithm
[Diagram: Concept X (Disease); Concepts Y (Pathological or Cell Function, ...); Concepts Z (Genes); Candidate Gene? relation between X and Z; Match between Chromosomal Region and Chromosomal Location; Match between Manifestation Location and Expression Location]
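The two "Match" checks in this diagram can be read as filters applied to the genes Z reached through the intermediate concepts Y. The following is a minimal sketch under strong assumptions: locations are compared by simple string prefix, tissues by set overlap, and the `Gene` record, the function `candidate_genes` and the gene names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    location: str            # chromosomal location, e.g. "Xq28"
    expressed_in: frozenset  # tissues in which the gene is expressed


def candidate_genes(disease_region, manifestation_tissues, linked_genes):
    """Keep genes that pass both matches: chromosomal and tissue.

    Prefix comparison of cytogenetic bands is a simplification assumed here.
    """
    result = []
    for gene in linked_genes:
        location_match = gene.location.startswith(disease_region)
        tissue_match = bool(gene.expressed_in & manifestation_tissues)
        if location_match and tissue_match:
            result.append(gene.symbol)
    return result


# Hypothetical genes assumed to be already linked to the disease via concepts Y.
linked = [
    Gene("GENE_A", "Xq28", frozenset({"brain"})),
    Gene("GENE_B", "Xq28", frozenset({"liver"})),
    Gene("GENE_C", "7q31", frozenset({"brain"})),
]

candidates = candidate_genes("Xq28", frozenset({"brain"}), linked)
print(candidates)
```

Applying the background-knowledge filters after the co-occurrence step keeps the sketch close to the stated goal of decreasing the number of candidate relations.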
14
Ranking Concepts Z
15
Problem Size
  • Full Medline analyzed (ca. 15,000,000 records)
  • 87,000,000 association rules between 180,000
    biomedical concepts

16
Bilateral Perisylvian Polymicrogyria - BPP (OMIM
300388)
  • Polymicrogyria of the cerebral cortex is a
    developmental abnormality characterized by
    excessive surface convolution
  • Clinical characteristics
    • Mental retardation
    • Epilepsy
    • Pseudobulbar palsy (paralysis of the face,
      throat, tongue and the chewing process)
  • X-linked dominant inheritance

17
  • 237 genes in Xq28
  • Relation between semantic types Cell Movement and
    Gene or Gene Products: 18 gene candidates
  • Sublocalisation in Xq28: 15 gene candidates
  • Tissue-specific expression: 2 gene candidates,
    L1CAM and FLNA
18
User Interface cgi-bin version
19
Automatically search for supporting Medline
Citations
20
Part 1 Summary and Conclusions
  • Discovery support system (BITOLA) presented
  • The system can be used as
    • Research idea generator, or
    • Alternative method of searching Medline
  • Genetic knowledge about the chromosomal locations
    of diseases and genes included to make BITOLA
    more suitable for disease candidate gene discovery

21
System Availability
  • URL: www.mf.uni-lj.si/bitola/

22
Part 2 Exploring Semantic Relations for LBD
23
Current LBD Systems
  • Co-occurrence based
  • Concepts
  • Title/Abstract Words/Phrases
  • MeSH
  • UMLS
  • Genes ...
  • UMLS Semantic types used for filtering
  • Semantic relations between concepts NOT used

24
Drawbacks of Current LBD
  • Not all co-occurrences represent a relation
  • Users have to read many Medline citations when
    reviewing candidate relations
  • Many spurious (false-positive) relations and
    hypotheses produced
  • No explanation of proposed hypotheses

25
Enhancing the LBD paradigm
  • Use semantic relations obtained from two NLP
    systems (BioMedLee and SemRep) to augment a
    co-occurrence based LBD system (BITOLA)

26
Methods
27
Discovery Patterns
  • Discovery pattern: set of conditions to be
    satisfied for the generation of new hypotheses
  • Conditions are combinations of semantic relations
    between concepts
  • The Maybe_Treats pattern in this research has two
    forms
    • Maybe_Treats1
    • Maybe_Treats2

28
Maybe_Treats Discovery Pattern
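[Diagram of the two pattern forms.
Maybe_Treats1: Disease X is associated with Change1 in Substance Y1 (or Body meas., Body funct.); Drug Z1 (or substance) is associated with Opposite_Change1 in Y1; hypothesis: Z1 Maybe_Treats1 X.
Maybe_Treats2: Disease X and Disease X2 are associated with the same change (Change2) in Substance Y2 (or Body meas., Body funct.); Drug Z2 (or substance) Treats X2; hypothesis: Z2 Maybe_Treats2 X.]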
29
Maybe_Treats1 and Maybe_Treats2
  • Goal: Propose potentially new treatments
  • Can work in concert
    • Propose different treatments (complementary)
    • Propose same treatments using different discovery
      reasoning (reinforcing)

30
Multiple Usages of Maybe_Treats
  • Given Disease X as input
    • find new treatments Z
  • Given Drug Z as input
    • find diseases X that can be treated
  • Given Disease X and Drug Z as input
    • test whether Z can be used to treat X

31
Semantic Relations Used
  • Associated_with_change and Treats used to extract
    known facts from the literature
  • Then Maybe_Treats1 and Maybe_Treats2 predict new
    treatments based on the known extracted facts
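How the two pattern forms could operate on extracted facts is sketched below for the case where a disease X is given as input. This is not the authors' implementation: the data layout, the `TYPE` lookup standing in for semantic types, and the function names are assumptions, and the facts are illustrative tuples in the style of the Huntington examples in these slides.

```python
OPPOSITE = {"increase": "decrease", "decrease": "increase"}

# Assumed semantic types for the concepts used in this sketch.
TYPE = {
    "Huntington": "disease",
    "Diabetes mellitus": "disease",
    "Capsaicin": "drug",
    "Insulin regulation therapy": "drug",
}

# Known facts: (concept, changed concept Y, direction) and (drug, disease).
ASSOC_WITH_CHANGE = [
    ("Huntington", "Substance P", "decrease"),
    ("Capsaicin", "Substance P", "increase"),
    ("Huntington", "Insulin", "decrease"),
    ("Diabetes mellitus", "Insulin", "decrease"),
]
TREATS = {("Insulin regulation therapy", "Diabetes mellitus")}


def maybe_treats1(disease):
    """Drugs that change some Y in the direction opposite to the disease."""
    hits = set()
    for subject, y, change in ASSOC_WITH_CHANGE:
        if subject != disease:
            continue
        for drug, y_drug, drug_change in ASSOC_WITH_CHANGE:
            if (TYPE.get(drug) == "drug" and y_drug == y
                    and drug_change == OPPOSITE[change]
                    and (drug, disease) not in TREATS):
                hits.add((drug, y))
    return hits


def maybe_treats2(disease):
    """Drugs that treat another disease showing the same change in some Y."""
    hits = set()
    for subject, y, change in ASSOC_WITH_CHANGE:
        if subject != disease:
            continue
        for other, y_other, other_change in ASSOC_WITH_CHANGE:
            if (TYPE.get(other) == "disease" and other != disease
                    and y_other == y and other_change == change):
                for drug, treated in TREATS:
                    if treated == other and (drug, disease) not in TREATS:
                        hits.add((drug, other, y))
    return hits


print(maybe_treats1("Huntington"))
print(maybe_treats2("Huntington"))
```

Each hit carries the linking concept Y (and, for the second form, the similar disease X2), so a proposed treatment comes with the facts that produced it. Drugs already recorded in `TREATS` for the input disease are excluded in both forms.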

32
Associated_with_change
  • One concept associated with a change in another
    concept, for example
    • Associated_with(Raynaud's, Blood viscosity,
      increase)
    • "Local increase of blood viscosity during
      cold-induced Raynaud's phenomenon."
    • "Increased viscosity might be a causal factor in
      secondary forms of Raynaud's disease, ..."
  • BioMedLee (Friedman et al) used to extract
    Associated_with_change

33
Treats
  • Used to extract drugs known to treat a disease
  • Major purpose in our approach
    • Eliminate drugs already known to be used to treat
      a disease
    • Find existing treatments for similar diseases
  • TREATS(Amantadine,Huntington)
    • "... treatment of Huntington's disease with
      amantadine ..."
  • Treats extracted by SemRep (Rindflesch et al)

34
Results
35
Huntington Disease
  • Inherited neurodegenerative disorder
  • All 5511 Huntington citations (Jan. 2006)
    processed with BioMedLee and SemRep
  • 35 interesting concepts assoc. with change
    selected and corresponding citations (250,000)
    processed

36
Insulin for Huntington Disease
  • Assoc_with(Huntington,Insulin,decrease)
    • "Huntington's disease transgenic mice develop an
      age-dependent reduction of insulin mRNA
      expression and diminished expression of key
      regulators of insulin gene transcription, ..."
  • Insulin also decreased in diabetes mellitus
  • Therapies used to regulate insulin in diabetes
    might be used for Huntington

37
Capsaicin for Huntington
  • Assoc_with(Huntington,Substance P,decrease)
    • "In Huntington's disease brains decreased
      Substance P staining was found in ..."
  • Assoc_with(Capsaicin,Substance P,increase)
    • "Capsaicin also attenuated the increase in
      Substance P content in sciatic nerve, ..."
  • Capsaicin maybe treats Huntington because
    Substance P is decreased in Huntington and
    Capsaicin increases Substance P.

38
Huntington Results - Summary
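[Diagram of the two results.
Maybe_Treats1: Huntington (Disease X) is associated with a Decrease in Substance P (Substance Y1); Capsaicin (Drug Z1) is associated with an Increase in Substance P.
Maybe_Treats2: Huntington (Disease X) and Diabetes M (Disease X2) are both associated with a Decrease in Insulin (Substance Y2); Insulin regulation ther. (Z2) Treats Diabetes M.]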
39
Example: Parkinson disease as starting concept.
Shown below are some related concepts changed in
association with Parkinson
40
Potential Treatments for Parkinson (e.g.
gabapentin)
41
Showing Supporting Sentences with highlighted
concepts and relations
42
Gabapentin for Parkinson
  • Assoc_with(Parkinson,gamma-aminobutyric
    acid(GABA),decrease)
    • "... studies indicate that patients with Parkinson's
      disease have decreased basal ganglia
      gamma-aminobutyric acid function ..."
  • Assoc_with(Gabapentin,GABA,increase)
    • "Gabapentin, probably through the activation of
      glutamic acid decarboxylase, leads to the
      increase in synaptic GABA."
  • Explanation: Gabapentin maybe treats Parkinson
    because GABA is decreased in Parkinson and
    Gabapentin increases GABA.

43
Part 2 Conclusions
  • A new method to improve LBD presented
  • Based on discovery patterns and semantic
    relations extracted by BioMedLee and SemRep,
    coupled with BITOLA LBD
  • Easier for the user to evaluate a smaller number
    of hypotheses
  • Two potentially new therapeutic approaches for
    Huntington proposed and one for Parkinson
  • Raynaud's - Fish oil discovery replicated

44
The future of Literature-based Discovery
  • Development of specific discovery patterns based
    on semantic relations and further integrated with
    co-occurrence-based LBD

45
Link, References and some propaganda
  • http://www.mf.uni-lj.si/bitola
  • Hristovski D, Peterlin B, Mitchell JA and
    Humphrey SM. Using literature-based discovery to
    identify disease candidate genes. Int. J. Med.
    Inform. 2005. Vol. 74(2-4), pp. 289-298.
    (Selected for Yearbook of Medical Informatics 2006)
  • Hristovski D, Friedman C, Rindflesch TC, Peterlin
    B. Exploiting semantic relations for
    literature-based discovery. In Proc AMIA 2006
    Symp 2006. p. 349-53.
  • Ahlers C, Hristovski D, Kilicoglu H, Rindflesch
    TC. Using the Literature-Based Discovery Paradigm
    to Investigate Drug Mechanisms. In Proc AMIA 2007
    Symp 2007. p. 6-10. (Distinguished Paper Award,
    AMIA 2007)
  • Hristovski D, Friedman C, Rindflesch TC, Peterlin
    B. Literature-Based Knowledge Discovery using
    Natural Language Processing. (To appear as a
    chapter in the first LBD book in 2008)