Mining Biological/Chemical Relationships from Literature: A Survey of Bio/Chem Text Mining

1
Mining Biological/Chemical Relationships from
Literature - A Survey of Bio/Chem Text Mining
  • Dazhi Jiao

2
Outline
  • I keep six honest serving-men
  • (They taught me all I knew)
  • Their names are What and Why and When
  • And How and Where and Who.
  • Rudyard Kipling, "The Elephant's Child"

3
What
  • What to extract?

4
Named Entity Recognition
  • Locate and classify elements in text into
    predefined categories (genes, proteins,
    compounds, drugs, diseases, etc.)

Settles, B. ABNER: an open source tool for
automatically tagging genes, proteins and other
entity names in text. Bioinformatics 21,
3191-3192 (2005).
5
Bio NER Performance
6
Chem NER
Kolarik, C.K., Klinger, R., Friedrich, C.M.,
Hofmann-Apitius, M. & Fluck, J. Chemical Names:
Terminological Resources and Corpora Annotation.
Workshop on Building and evaluating resources for
biomedical text mining (6th edition of the
Language Resources and Evaluation Conference)
51-58 (2008).
7
Chem NER (Dictionary Based)
Kolarik, C.K., Klinger, R., Friedrich, C.M.,
Hofmann-Apitius, M. & Fluck, J. Chemical Names:
Terminological Resources and Corpora Annotation.
Workshop on Building and evaluating resources for
biomedical text mining (6th edition of the
Language Resources and Evaluation Conference)
51-58 (2008).
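A dictionary-based tagger can be sketched in a few lines. This is only an illustration of the general idea, not the method of the cited paper: the toy dictionary, the function name `tag_chemicals` and the `CHEMICAL` label are assumptions made for this example, and a real system would load names from a terminological resource.

```python
import re

# Hypothetical toy dictionary; a real system would load names from a
# terminological resource rather than hard-code them.
CHEMICAL_DICTIONARY = {
    "rutaecarpine": "CHEMICAL",
    "limonin": "CHEMICAL",
    "acetylsalicylic acid": "CHEMICAL",
    "aspirin": "CHEMICAL",
}


def tag_chemicals(text, dictionary=CHEMICAL_DICTIONARY):
    """Return (start, end, surface form, label) for each dictionary hit.

    Longer names are tried first so that a multi-word name wins over
    any shorter name contained in it.
    """
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        label = dictionary[match.group(0).lower()]
        hits.append((match.start(), match.end(), match.group(0), label))
    return hits


sentence = ("Furthermore, rutaecarpine and limonin were identified as "
            "mechanism-based inhibitors of CYP3A4.")
for hit in tag_chemicals(sentence):
    print(hit)
```

Exact matching like this misses spelling variants and systematic names that are not in the dictionary, which is one reason statistical taggers are used alongside it.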
8
Chem NER (Statistical NLP based)
Kolarik, C.K., Klinger, R., Friedrich, C.M.,
Hofmann-Apitius, M. & Fluck, J. Chemical Names:
Terminological Resources and Corpora Annotation.
Workshop on Building and evaluating resources for
biomedical text mining (6th edition of the
Language Resources and Evaluation Conference)
51-58 (2008).
9
Relationship Extraction
  • Goal: mining relationships between biological
    entities. The relationships can be event-driven
    relationships or conceptual associations.
  • Bio-event: a change in the state of a
    bio-molecule or bio-molecules, e.g.
    phosphorylation of IkB involves a change in the
    protein IkB [1].
  • Extracting bio-events in a single piece of text,
    using NLP methods
  • Extracting interesting statistical relationships
    from a large set of text

1. Kim, J., Ohta, T., Pyysalo, S., Kano, Y. &
Tsujii, J. Overview of BioNLP'09 shared task on
event extraction. Proceedings of the Workshop on
BioNLP: Shared Task, 1-9 (2009).
10
Events/Relationships in Life Sciences
  • Protein-Protein Interaction
  • Gene Regulation
  • Ligand-Protein Interaction
  • Drug-Disease Association
  • Drug-Side-effect Association
  • Gene-Disease Association
  • ...

11
NLP Based Bio-Events
  • BioNLP Shared Task 2009 (Event Extraction)
  • BioCreative Shared Task 2009 (PPI Extraction)
  • TRADD was the only protein that interacted with
    wild-type TES2 and not with isoleucine-mutated
    TES2.

12
NLP Based Protein-Ligand Interaction
  • automatically extracts CYP protein and chemical
    interactions from journal article abstracts,
    using natural language processing (NLP) and text
    mining methods
  • Furthermore, rutaecarpine and limonin were
    identified as mechanism-based inhibitors of
    CYP3A4 from the following observations.

Jiao, D. & Wild, D.J. Extraction of CYP Chemical
Interactions from Biomedical Literature Using
Natural Language Processing Methods. Journal of
Chemical Information and Modeling 49, 263-269
(2009).
13
Statistically Induced Relationships: Molecular
Connectivity Map
From PubMed abstracts, generate associations
between drugs and proteins based on occurrences
Li, J., Zhu, X. & Chen, J.Y. Building
Disease-Specific Drug-Protein Connectivity Maps
from Molecular Interaction Networks and PubMed
Abstracts. PLoS Comput Biol 5, e1000450 (2009).
14
Gene Phenotype Association
  • an unsupervised, systematic approach for
    associating genes and phenotypic characteristics
    that combines literature mining with comparative
    genome analysis.
  • (A) Traits and genes related to plant constituent
    degradation.
  • (B) Traits and genes related to food spoilage and
    poisoning.

Korbel, J.O. et al. Systematic Association of
Genes to Phenotypes by Genome and Literature
Mining. PLoS Biol 3, e134 (2005).
15
How
  • NLP-based method
    • Analysis of an individual piece of text to find
      targeted relationships or events
  • Statistical method
    • Statistical analysis of a large set of texts to
      induce the relationships or associations

16
NLP-based methods
  • Usually based on certain machine learning
    methods, sometimes with the help of dictionary or
    rule-based methods

Björne, J. et al. Extracting complex biological
events with rich graph-based feature sets.
Proceedings of the Workshop on BioNLP: Shared
Task, 10-18 (2009).
Jiao, D. & Wild, D.J. Extraction of CYP Chemical
Interactions from Biomedical Literature Using
Natural Language Processing Methods. Journal of
Chemical Information and Modeling 49, 263-269
(2009).
17
Björne, J. et al. Extracting complex biological
events with rich graph-based feature sets.
Proceedings of the Workshop on BioNLP: Shared
Task, 10-18 (2009).
18
NLP Based method
  • Pro
    • It can extract the semantics of the relationship
    • It can be extended to extract more useful
      information, such as numerical measures of
      certain relationships
    • It can detect the sense of a relationship
      (positive or negative)
  • Con
    • Low precision or recall
    • Lack of training data

19
Statistical Method
  • Assumption
    • Relationships of entities also obey Zipf's law:
      the distribution of co-occurrences correlates
      with the relationship in the real world (the
      more people publish on something, the more
      likely it happens)
  • Using co-occurrences as an indication of the
    association of entities [1]
  • Using contextual information as an indication of
    relationships [2]

1. Korbel, J.O. et al. Systematic Association of
Genes to Phenotypes by Genome and Literature
Mining. PLoS Biol 3, e134 (2005).
2. Abi-Haidar, A. et al. Uncovering protein
interaction in abstracts and text using a novel
linear model and word proximity networks. Genome
Biology 9, S11 (2008).
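As a rough sketch of how co-occurrence can be turned into an association score, the snippet below counts how often a drug and a protein are mentioned in the same abstract and scores each pair with pointwise mutual information (PMI). PMI is an assumption chosen for illustration, not the scoring used in the cited papers; the function name `cooccurrence_pmi` and the four toy "abstracts" are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_pmi(documents):
    """Score drug-protein pairs by co-occurrence.

    documents: list of (set_of_drugs, set_of_proteins), one entry per
    abstract, as produced by an entity recognition step.
    Returns {(drug, protein): (co-occurrence count, PMI in bits)}.
    """
    n = len(documents)
    drug_df, protein_df, pair_df = Counter(), Counter(), Counter()
    for drugs, proteins in documents:
        drug_df.update(drugs)            # abstracts mentioning each drug
        protein_df.update(proteins)      # abstracts mentioning each protein
        pair_df.update(product(drugs, proteins))
    scores = {}
    for (drug, protein), count in pair_df.items():
        # PMI = log2( P(drug, protein) / (P(drug) * P(protein)) )
        pmi = math.log2((count * n) / (drug_df[drug] * protein_df[protein]))
        scores[(drug, protein)] = (count, pmi)
    return scores


# Hypothetical entity sets for four abstracts (illustration only).
docs = [
    ({"clozapine"}, {"CYP1A2"}),
    ({"clozapine"}, {"CYP1A2"}),
    ({"warfarin"}, {"CYP2C9"}),
    ({"warfarin", "clozapine"}, {"CYP2C9"}),
]
scores = cooccurrence_pmi(docs)
for pair, (count, pmi) in sorted(scores.items()):
    print(pair, count, round(pmi, 3))
```

A positive score means the pair appears together more often than chance would predict. As the slides note, the score carries no semantics: it does not say whether the relationship is inhibition, regulation or something else.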
20
Example: Connectivity Map
From PubMed abstracts, generate disease-specific
(Alzheimer's disease) associations between drugs
and proteins based on occurrences
Li, J., Zhu, X. & Chen, J.Y. Building
Disease-Specific Drug-Protein Connectivity Maps
from Molecular Interaction Networks and PubMed
Abstracts. PLoS Comput Biol 5, e1000450 (2009).
21
Example: Connectivity Map
Li, J., Zhu, X. & Chen, J.Y. Building
Disease-Specific Drug-Protein Connectivity Maps
from Molecular Interaction Networks and PubMed
Abstracts. PLoS Comput Biol 5, e1000450 (2009).
22
Statistical Method
  • Pro
    • No requirement for annotated training data
    • It can give a weighted score
  • Con
    • No semantics (inhibition? regulation?)
    • Not convincing to scientists: does that score
      really mean anything?

23
Why
  • NLP-based methods
    • Human-curated databases can't keep up with the
      rate of publication
    • Curators need help to automatically constrain
      the size of the work to be annotated (for
      example, one database's publications)
    • Systems biology research needs up-to-date
      published discoveries
  • Statistical method
    • Some relationships are hard to obtain directly
      from databases
    • Ab initio methods are not powerful enough

24
Yearly trends in PubMed (Medline) indexing by
language (absolute numbers)
http://dan.corlan.net/medline-trend/language
25
TM Helps!
  • Databases
    • CTD (Comparative Toxicogenomics Database) [1]
    • HuGE Navigator (an integrated knowledge base of
      genetic associations and human genome
      epidemiology) [2]
  • Systems biology research
    • Constructing signal transduction networks [3]

1. Wiegers, T.C., Davis, A.P., Cohen, K.B.,
Hirschman, L. & Mattingly, C.J. Text mining and
manual curation of chemical-gene-disease networks
for the comparative toxicogenomics database
(CTD). BMC Bioinformatics 10, 326 (2009).
2. Yu, W. et al. GAPscreener: An automatic tool for
screening human genetic association literature in
PubMed using the support vector machine
technique. BMC Bioinformatics 9, 205 (2008).
3. Helikar, T., Konvalina, J., Heidel, J. &
Rogers, J.A. PNAS 105 (6), 1913 (12 February
2008).
26
Relationships Hard to Predict
  • Lack of databases
    • Side-effect-gene relationships
  • Current computational or statistical prediction
    methods are not good enough (TM can help!)

27
Can TM really help? What does the score in the
statistical methods mean again?
28
Protein Structure Prediction and Functional
Annotation in the Absence of Sequence Homology
29
Connectivity Maps Prediction
Diltiazem, Prazosin and Quinidine were clustered
together due to their similar drug-protein
connectivity profiles. The three drugs are
previously known to treat vascular diseases.
Among them, Diltiazem is an antihypertensive
agent with vasodilating actions due to its
antagonism of the actions of the calcium ion in
membrane function; Prazosin is an
alpha-adrenergic blocking agent used in the
treatment of heart failure and hypertension;
Quinidine is an anti-arrhythmia agent with
actions on sodium channels on the neuronal cell
membrane. Recent population-based epidemiological
studies suggested that vascular risk factors,
such as vascular disease gene ApoE, hypertension,
atherosclerosis, and heart failure, may impair
cognitive functions and are related to the
development of AD. Not too surprisingly, when
we look into clinical trial databases, we found
that Prazosin is currently under a double-blind
and placebo-controlled clinical study on the
treatment of agitation and aggression in persons
with AD...
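Grouping drugs by "similar drug-protein connectivity profiles" can be illustrated by comparing profiles with cosine similarity. This is a minimal sketch under stated assumptions: the similarity measure, the function names, and the profiles for `drug_A`, `drug_B` and `drug_C` are all hypothetical and are not taken from the cited study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse profiles (protein -> weight)."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(profiles, drug):
    """Rank all other drugs by similarity to `drug`, most similar first."""
    scored = [
        (cosine(profiles[drug], profile), name)
        for name, profile in profiles.items()
        if name != drug
    ]
    return sorted(scored, reverse=True)


# Hypothetical connectivity profiles: association score per protein.
profiles = {
    "drug_A": {"P1": 1.0, "P2": 1.0},
    "drug_B": {"P1": 2.0, "P2": 2.0},
    "drug_C": {"P3": 1.0},
}
for score, name in most_similar(profiles, "drug_A"):
    print(name, round(score, 3))
```

Drugs whose profiles point in the same direction score close to 1 regardless of overall magnitude, so a heavily published drug and a rarely published one can still be grouped together.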
30
Outline
  • I keep six honest serving-men
  • (They taught me all I knew)
  • Their names are What and Why and When
  • And How and Where and Who.
  • Rudyard Kipling, "The Elephant's Child"

31
What Next?
  • Improve the validity of co-occurrence
    • Adding constraints
  • Combine statistical methods with NLP-based
    methods
    • Use statistical methods to identify interesting
      relationships for mining
    • Classify abstracts (with or without targeted
      relationships)
    • Use NLP-based methods to extract the semantic
      relationships
  • Extraction of numeric values [1]
  • Compare and combine text mining methods with
    other methods

1. Wang, Z. et al. Literature mining on
pharmacokinetics numerical data: A feasibility
study. Journal of Biomedical Informatics 42,
726-735 (2009).
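Extraction of numeric values can be approximated with a pattern that links a parameter name to the nearest following number and unit. The sketch below is an assumption-laden illustration, not the approach of the cited feasibility study: the parameter names, the unit list, the 30-character window and the example sentence are all made up for this example.

```python
import re

# Hypothetical parameter names and units, chosen only for illustration.
PARAMETER = r"(?P<param>clearance|half-life|Cmax|AUC)"
NUMBER = r"(?P<value>\d+(?:\.\d+)?)"
UNIT = r"(?P<unit>L/h|mL/min|ng/mL|h)"

# Allow up to 30 non-digit characters between the parameter and its value.
PATTERN = re.compile(
    PARAMETER + r"\D{0,30}?" + NUMBER + r"\s*" + UNIT + r"\b",
    re.IGNORECASE,
)


def extract_values(text):
    """Return (parameter, value, unit) triples found in the text."""
    return [
        (m.group("param").lower(), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(text)
    ]


# Made-up sentence, not taken from any publication.
sentence = "The clearance of drug X was 24.5 L/h and the half-life was 3.2 h."
print(extract_values(sentence))
```

A pattern like this cannot tell which drug or patient group a value refers to, so in practice it would need to be combined with the entity and relationship extraction steps discussed earlier.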
32
Drug Interaction
  • Pharmacokinetic Drug Interactions
    • Inhibition of Absorption
    • Enzyme Inhibition Increasing Risk of Toxicity
    • Enzyme Inhibitors Resulting in Reduced Drug
      Effect
    • Enzyme Induction Resulting in Reduced Drug Effect
    • Enzyme Induction Resulting in Toxic Metabolites
  • Pharmacodynamic Drug Interactions
    • Additive Pharmacodynamic Effects
    • Antagonistic Pharmacodynamic Effects
    • May also hit two parallel pathways, leading to
      major blockage vs. minor blockage of overall
      flux through the pathway

33
Drug Interaction Example
  • Inhibitors of CYP1A2 can increase the risk of
    toxicity from clozapine or theophylline.
    Inhibitors of CYP2C9 can increase the risk of
    toxicity from phenytoin, tolbutamide, and oral
    anticoagulants such as warfarin. Inhibitors of
    CYP3A4 can increase the risk of toxicity from
    many drugs, including carbamazepine, cisapride,
    cyclosporine, ergot alkaloids, lovastatin,
    pimozide, protease inhibitors, rifabutin,
    simvastatin, tacrolimus, and vinca alkaloids.

34
Drug Interaction Prediction
  • Network-based methods [1]
  • Probabilistic model [2]
  • Machine learning [3, 4]
  • Ontology-driven method [5]
  • Methods are based on drug-target interaction data
    from databases

1. Yamanishi, Y., Araki, M., Gutteridge, A.,
Honda, W. & Kanehisa, M. Prediction of
drug-target interaction networks from the
integration of chemical and genomic spaces.
Bioinformatics 24, i232-240 (2008).
2. Zhou, J. et al. A new probabilistic rule for
drug-drug interaction prediction. J Pharmacokinet
Pharmacodyn 36, 1-18 (2009).
3. Li, L. et al. Drug-drug interaction prediction:
a Bayesian meta-analysis approach. Stat Med 26,
3700-3721 (2007).
4. Yu, M., Kim, S., Wang, Z., Hall, S. & Li, L.
A Bayesian meta-analysis on published sample mean
and variance pharmacokinetic data with
application to drug-drug interaction prediction.
J Biopharm Stat 18, 1063-1083 (2008).
5. Arikuma, T. et al. Drug interaction
prediction using ontology-driven hypothetical
assertion framework for pathway generation
followed by numerical simulation. BMC
Bioinformatics 9 Suppl 6, S11 (2008).
35
Drug Interaction Prediction
  • Build connectivity maps (drug-target, drug-drug)
  • Identify targeted drug-target interactions
  • Classify PubMed abstracts, constraining the search
    space to abstracts with drug-target interactions
  • Mine the semantic relationships using NLP-based
    methods
  • Build an RDF network or a bipartite network
  • Use ontology-based reasoning, network analysis,
    or path finding to identify drug-drug
    interactions
  • Combine the results with the drug-drug
    interactions from the drug-drug connectivity map
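The network-building and path-finding steps above can be sketched as follows. The sketch assumes the NLP step has already produced (drug, relation, target) triples; the triples below are hand-written stand-ins based on statements elsewhere in these slides, the relation names `inhibits` and `metabolized_by` are hypothetical, and the pairs it returns are only candidates for review, not established interactions.

```python
from collections import defaultdict


def candidate_interactions(triples):
    """Find drug pairs linked through a shared target.

    triples: iterable of (drug, relation, target).
    A pair (a, b, target) is proposed when drug `a` inhibits a target
    that drug `b` is metabolized by: a two-step path in the bipartite
    drug-target network.
    """
    inhibitors = defaultdict(set)
    substrates = defaultdict(set)
    for drug, relation, target in triples:
        if relation == "inhibits":
            inhibitors[target].add(drug)
        elif relation == "metabolized_by":
            substrates[target].add(drug)
    candidates = set()
    for target, inhibiting_drugs in inhibitors.items():
        for a in inhibiting_drugs:
            for b in substrates.get(target, ()):
                if a != b:
                    candidates.add((a, b, target))
    return sorted(candidates)


# Hand-written stand-ins for the output of the NLP extraction step.
triples = [
    ("rutaecarpine", "inhibits", "CYP3A4"),
    ("limonin", "inhibits", "CYP3A4"),
    ("simvastatin", "metabolized_by", "CYP3A4"),
    ("clozapine", "metabolized_by", "CYP1A2"),
]
for a, b, target in candidate_interactions(triples):
    print(f"{a} may raise exposure to {b} via {target}")
```

Keeping the relation type on each edge is what distinguishes this from a pure co-occurrence map: the same two-step path means something different if both drugs are substrates of the shared enzyme.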

36
Connectivity Map for Drug Drug Interaction
  • Two connectivity maps
    • Drug-CYP (and possibly other proteins)
    • Drug-Drug
  • Mining semantic relationships on focused
    interactions (using previous work, Jiao & Wild)

37
Classification
  • Classify whether an abstract contains the
    targeted relationship
  • Previous work
    • Classify paragraphs of PPI (with Luis Rocha)
    • Classify texts of organisms to constrain PPI
      (with Luis Rocha)
  • Current work (with Huijun): semi-supervised
    classification
    • TB compound efficacy, toxicity
    • Drug side-effects (hepatic necrosis)
    • CYP-Drug
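To make the classification step concrete, here is a small supervised multinomial Naive Bayes classifier with add-one smoothing. It is a stand-in for illustration only: the slides mention semi-supervised classification, which this sketch does not implement, and the training snippets and labels are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_texts):
    """labeled_texts: list of (text, label). Returns the fitted model."""
    doc_counts = Counter()   # number of training texts per label
    word_counts = {}         # label -> Counter of token frequencies
    vocabulary = set()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training snippets (illustration only).
training = [
    ("drug inhibits cyp3a4 activity", "relevant"),
    ("compound is a substrate of cyp2c9", "relevant"),
    ("patients were recruited from clinics", "irrelevant"),
    ("survey of hospital staff attitudes", "irrelevant"),
]
model = train(training)
print(classify(model, "the drug inhibits cyp2c9"))
```

In the workflow described on the slides, a classifier like this only narrows the set of abstracts; the semantic relationship itself is still extracted by the NLP-based step.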

38
Drug Interaction Prediction
  • Ontology-based: DIO (Drug Interaction Ontology)
  • Network analysis: bipartite network, PPI,
    drug-target interactions, supervised learning
  • Subgraph mining: finding paths from one drug to
    another drug
    • How to rank paths? [1, 2]
    • How to find the k top-ranked paths
      efficiently? [3]
    • How to interpret the paths found?

1. Azevedo, J., Santos Costa, M.E.O., Silvestre
Madeira, J.J.E.R. & Vieira Martins, E.Q. An
algorithm for the ranking of shortest paths.
European Journal of Operational Research 69,
97-106 (1993).
2. Aleman-Meza, B., Halaschek-Wiener, C.,
Arpinar, I.B., Ramakrishnan, C. & Sheth, A.P.
Ranking Complex Relationships on the Semantic
Web. IEEE Internet Computing 9, 37-44 (2005).
3. Eppstein, D. Finding the k Shortest Paths.
(1997). At
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.3705>
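The question of finding the k top-ranked paths can be illustrated with a simple best-first enumeration of loop-free paths. This is deliberately not Eppstein's algorithm or the ranking algorithm of Azevedo et al.: it is a naive sketch that is correct for non-negative edge weights but can take exponential time on large networks. The graph, node names and weights are hypothetical.

```python
import heapq


def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) edges."""
    graph = {}
    for u, v, weight in edges:
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def k_shortest_paths(graph, source, target, k):
    """Return up to k loop-free paths as (cost, path), cheapest first.

    Partial paths are expanded in order of cost, so with non-negative
    weights complete paths are found in non-decreasing cost order.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found


# Hypothetical network: two drugs connected through proteins P1..P3.
graph = undirected([
    ("drug_A", "P1", 1.0),
    ("P1", "drug_B", 1.0),
    ("drug_A", "P2", 1.0),
    ("P2", "P3", 1.0),
    ("P3", "drug_B", 1.0),
])
for cost, path in k_shortest_paths(graph, "drug_A", "drug_B", k=3):
    print(cost, " -> ".join(path))
```

Path length is only one possible ranking criterion; weights derived from association scores or relation types would plug into the same search through the edge weights.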
39
Thank you!