A knowledge based approach for representing, reasoning and hypothesizing about biochemical networks


Slides: 86

Provided by: namt


1
A knowledge based approach for representing,
reasoning and hypothesizing about biochemical
networks

Chitta Baral
Arizona State University

2
Three parts to the talk

Prediction, Explanation and Planning with respect
to biochemical networks
Hypothesis Generation with respect to biochemical
networks
Collaborative BioCuration (CBioC)

3
Motivation: purpose of interaction databases?

Suppose we have an almost exhaustive database of
the intracellular interactions (protein-protein,
metabolic, etc.) of particular cells.
What next?
How will we use this database?
What if our knowledge is incomplete?

4
Motivation: uses of networks and pathways

Visualize the pathways
Analyze the graphs of the networks
Compare graphs of the networks
Use pathway data in conjunction with micro-array
data analysis
Do system level simulation
Is that all?

5
Motivation: ultimate uses!

Prediction/System Simulation (Systems Biology?)
Impact of particular perturbations (say caused by
a drug that introduces certain proteins to the
cell membrane or into the cell)
Do the perturbations have the desired impact?
Do they mess up something else? (side effects!)
But that's not all!

6
Motivation: explaining observations

A phenotypical observation (leading to) OR
an observation that a particular protein or
chemical has abnormally high concentration
What is wrong? What is out of the ordinary?
The cause/explanation will give us approaches to
fix the problem.
How deep should the explanations go?
How do we compare explanations?

7
Motivation: designing drugs and therapies

What perturbations (when and where) need to be
made so as to make the cell behave in a
particular way?
In the case of cancer: prevent proliferation,
induce apoptosis, prevent migration, etc.

8
What if knowledge is incomplete?

What kind of useful reasoning can we do with
incomplete knowledge?
Drug makers don't wait until full knowledge is
available.
Answer: hypothesis formation

9
Motivation: use summary

The ultimate uses of signaling (metabolic, etc.)
interaction databases are to do
Prediction: therapy verification, determining
side effects.
Explanation -- diagnosing what is wrong.
Planning: therapy and drug design.
Intermediate or immediate use:
Generate hypotheses

10
Initial goal of our research

Use knowledge representation and reasoning
techniques to
Represent interactions
Reason about these interactions: prediction,
explanation, planning and hypothesis formation.

11
Some questions

Isn't it a little premature?
We know very little about the networks
New knowledge is being constantly added
Why knowledge representation and reasoning?
Why not simulation?
Why not use Petri nets, π-calculus?
Why a knowledge-based approach? Why not a
database approach? What's the difference?

12
Our approach: present and future

Yes, prediction is much the same as simulation.
Incompleteness of information is an issue though!
But it is hard to do explanation generation, or
design of therapies (planning), using simulation.
Guesses can be verified using simulation, though.
The core database query languages cannot express
explanation or planning queries.
Dealing with incompleteness!

13
Dealing with incompleteness: ongoing and future
work

Dealing with incompleteness is one of the key
criteria for a good knowledge representation
language when building AI systems.
The language needs to be non-monotonic.
The language needs to be elaboration tolerant.
Proper analysis leads to hypothesizing:
If certain observations cannot be satisfactorily
explained by the existing knowledge about the
network, then use general biological knowledge to
hypothesize.

14
Motivation -- summary

Goal: to emulate the abstract reasoning done by
biologists, medical researchers, and pharmacology
researchers.
Types of reasoning: prediction, explanation,
planning and hypothesis formation.
Current systems biology approaches: mostly
prediction.
Ongoing issues: dealing with incomplete knowledge
and elaboration tolerance.

15
Related Works

Quantitative approaches. (hybrid systems, use of
differential equations)
Graphical representations.
Other qualitative approaches.
Petri Nets
π-calculus
Pathway Logic
Model Checking

16
Overview of our approach

Represent signal network as a knowledge base that
describes
actions/events (biological interactions,
processes).
effect of these actions/events.
triggering conditions of the actions/events.
Query the knowledge base for
prediction, explanation, planning, hypothesis
generation
BioSigNet-RR (Biological Signal Network -
Representation and Reasoning) and BioSigNet-RRH
systems.

17
Foundation behind our approach

Research on representing and reasoning about
dynamic systems (space shuttles, mobile robots,
software agents)
causal relations between properties of the world
effects of actions (when can they be executed)
goal specification
action-plans
Research on knowledge representation, reasoning
and declarative problem solving: the AnsProlog
language.

18
An NFkB signaling pathway
19
An NFkB signaling pathway
20
Syntax by example

bind(TNF-α,TNFR1) causes trimerized(TNFR1)
trimerized(TNFR1) triggers bind(TNFR1,TRADD)

21
General syntax to represent networks

e causes f if f1, ..., fk
g1, ..., gk causes g
h1, ..., hm n_triggers e
k1, ..., kl triggers e
r1, ..., rl inhibits e
e is an event (also referred to as an action) and
the rest are fluents (properties of the cell)
For metabolic interactions:
e converts g1, ..., gk to f1, ..., fk if h1, ..., hm
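
A minimal sketch of how statements in this syntax could be held as data, written in Python rather than AnsProlog. The `Statement` class and the `parse` helper are hypothetical names introduced here for illustration; they are not part of BioSigNet-RR, and the "converts" form is left out.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Statement:
    kind: str                    # "causes", "triggers", "n_triggers" or "inhibits"
    lhs: Tuple[str, ...]         # the event (or fluents) on the left-hand side
    rhs: str                     # effect fluent for "causes", otherwise the event
    cond: Tuple[str, ...] = ()   # optional "if" preconditions ("causes" only)

def _split_terms(text):
    """Split 'a(x,y), b(z)' on top-level commas only."""
    terms, depth, cur = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            terms.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        terms.append(cur.strip())
    return tuple(terms)

def parse(line):
    """Parse one statement; n_triggers is tested before triggers."""
    for kind in ("n_triggers", "triggers", "inhibits", "causes"):
        key = f" {kind} "
        if key in line:
            left, right = line.split(key, 1)
            cond = ()
            if kind == "causes" and " if " in right:
                right, cond_text = right.split(" if ", 1)
                cond = _split_terms(cond_text)
            return Statement(kind, _split_terms(left), right.strip(), cond)
    raise ValueError(f"unrecognized statement: {line!r}")

print(parse("trimerized(TNFR1) triggers bind(TNFR1,TRADD)"))
```

Keeping each statement as an immutable record means a network is just a list of statements, so new knowledge can be appended without touching the existing entries.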

22
Semantics: queries and entailment

Observation part of queries
f at t
a occurs_at t
Given the Network N and observation O
Predict if a temporal expression holds.
Explain a set of observations.
Plan to achieve a goal.

23
Importance of a formal semantics

Besides defining prediction, explanation and
planning, it is also useful in identifying
Under what restrictions the answer given by a
given (graph based) algorithm will be correct.
(soundness!)
Under what restrictions a given (graph based)
algorithm will find a correct answer if one
exists. (completeness!)

24
Utility of declarative programming languages
(such as AnsProlog)

Allows for quick implementation of the semantics
The specification or the definition of what is an
explanation, or what is a plan becomes a program
that finds explanations and plans respectively.

25
Prediction

Given some initial conditions and observations,
to predict how the world would evolve or predict
the outcome of (hypothetical) interventions.

26
Back to the example

Binding of TNF-α with TNFR1 leads to TRADD
binding with one or more of TRAF2, FADD, RIP.
TRADD binding with TRAF2 leads to over-expression
of FLIP provided NIK is phosphorylated on the
way.
TRADD binding with RIP inhibits phosphorylation
of NIK.
TRADD binding with FADD in the absence of FLIP
leads to cell death.

27
Prediction 1.

Binding of TNF-α with TNFR1 leads to TRADD
binding with one or more of TRAF2, FADD, RIP.
TRADD binding with TRAF2 leads to over-expression
of FLIP provided NIK is phosphorylated on the
way.
TRADD binding with RIP inhibits phosphorylation
of NIK.
TRADD binding with FADD in the absence of FLIP
leads to cell death.

Initial Condition
bind(TNF-α,TNFR1) occurs_at t0
Query
predict eventually apoptosis
Answer
Unknown!
Incomplete knowledge about TRADD's bindings.
Depends on whether bind(TRADD, RIP) happened or not!
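
The "Unknown!" answer can be reproduced with a small possible-worlds sketch in Python. It reads the four pathway statements literally and adds one labeled assumption: NIK is phosphorylated unless TRADD has bound RIP. The function names are hypothetical and this is not the AnsProlog encoding used by the system.

```python
from itertools import combinations

PARTNERS = ("TRAF2", "FADD", "RIP")  # possible binding partners of TRADD

def apoptosis(bound):
    """Outcome in one world, given the set of partners TRADD has bound."""
    # ASSUMPTION: NIK is phosphorylated unless RIP binding inhibits it.
    nik_phosphorylated = "RIP" not in bound
    # TRADD-TRAF2 leads to FLIP over-expression provided NIK is phosphorylated.
    flip = "TRAF2" in bound and nik_phosphorylated
    # TRADD-FADD in the absence of FLIP leads to cell death.
    return "FADD" in bound and not flip

def possible_worlds(observed=frozenset()):
    """TRADD binds one or more partners; keep worlds containing the observations."""
    for r in range(1, len(PARTNERS) + 1):
        for combo in combinations(PARTNERS, r):
            if observed <= set(combo):
                yield frozenset(combo)

def predict_eventually_apoptosis(observed=frozenset()):
    """'yes' if apoptosis holds in every world, 'no' if in none, else 'unknown'."""
    outcomes = {apoptosis(w) for w in possible_worlds(observed)}
    if outcomes == {True}:
        return "yes"
    if outcomes == {False}:
        return "no"
    return "unknown"

print(predict_eventually_apoptosis())                     # no binding observed
print(predict_eventually_apoptosis(frozenset(PARTNERS)))  # all three observed
```

With no observation about TRADD's bindings the worlds disagree, so the query cannot be answered either way; observing all three bindings leaves a single world in which apoptosis holds.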

28
Prediction 2

Binding of TNF-α with TNFR1 leads to TRADD
binding with one or more of TRAF2, FADD, RIP.
TRADD binding with TRAF2 leads to over-expression
of FLIP provided NIK is phosphorylated on the
way.
TRADD binding with RIP inhibits phosphorylation
of NIK.
TRADD binding with FADD in the absence of FLIP
leads to cell death.

Initial Condition
bind(TNF-α,TNFR1) occurs_at t0
Observation
TRADD's binding with TRAF2, FADD, RIP
Query
predict eventually apoptosis
Answer: Yes!

29
Explanation

Given initial condition and observations, to
explain why final outcome does not match
expectation.

30
Explanation 1

Binding of TNF-α with TNFR1 leads to TRADD
binding with one or more of TRAF2, FADD, RIP.
TRADD binding with TRAF2 leads to over-expression
of FLIP provided NIK is phosphorylated on the
way.
TRADD binding with RIP inhibits phosphorylation
of NIK.
TRADD binding with FADD in the absence of FLIP
leads to cell death.

Initial condition
bound(TNF-α,TNFR1) at t0
Observation
bound(TRADD, TRAF2) at t1
Query: explain apoptosis
One explanation:
Binding of TRADD with RIP
Binding of TRADD with FADD

31
Planning

Given initial conditions, to plan interventions
to achieve a goal.
Application in drug and therapy design.

32
Planning requirements

In addition to the knowledge about the pathway we
need additional information about possible
interventions such as
What proteins can be introduced
What mutations can be forced.

33
Planning example

Defining possible interventions
intervention intro(DN-TRAF2)
intro(DN-TRAF2) causes present(DN-TRAF2)
present(DN-TRAF2) inhibits bind(TRAF2,TRADD)
present(DN-TRAF2) inhibits interact(TRAF2,NIK)
Initial condition
bound(NFκB,IκB) at 0
bind(TNF-α,TNFR1) at 0
Goal: keep NFκB inactive.
Query
plan always bound(NFκB,IκB) from 0

34
Conclusion of part 1

From paper in ISMB 2004
Our goal in this paper was to make progress
towards developing a system (and the necessary
representation language and reasoning algorithms)
that can be used to represent signal networks and
pathways associated with cells and reason with
them.
A start was made.
Defined a simple language (syntax and semantics)
Defined prediction, planning and explanation
A prototype implementation using AnsProlog
Illustration of its applicability with respect to
an NFkB pathway.

35
Issues with incomplete knowledge

Often one may not be able to do much prediction,
explanation or planning.
What then?
Can reasoning help in obtaining new knowledge?
Yes, through hypothesis generation!
In fact, hypothesis generation needs reasoning!

36
Part II: Hypothesis Generation
37
Hypothesis generation

Our observations cannot be explained by our
existing knowledge, OR the explanations given by
our existing knowledge are invalidated by
experiments.
Conclusion: our knowledge needs to be augmented
or revised.
How?
Can we use a reasoning system to propose
hypotheses that one can verify through
experimentation?
Automate the reasoning in the mind of a
biologist, especially helpful when the background
knowledge is very large.

38
Knowledge base
UV leads_to cancer
High UV
Hypothesis space
(K,I) O
p53
Cancer
No cancer
39
Issues in this tiny example

Hypothesis formation
Theory: UV leads to cancer.
Observation: wild-type p53 resists the UV
effect.
Hypothesis: p53 is a tumor suppressor.
Elaboration tolerance
How do we update/revise "UV leads to cancer"?
Default (non-monotonic) reasoning
Normally UV leads to cancer.
UV does not lead to cancer if p53 is present.
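
A minimal Python sketch of the default rule and its exception. The `Default` class and the fluent names `high(UV)` and `present(p53)` are illustrative assumptions; AnsProlog would express the same idea with negation as failure rather than an explicit exception set.

```python
from dataclasses import dataclass, field

@dataclass
class Default:
    premise: str                                  # e.g. "high(UV)"
    conclusion: str                               # e.g. "cancer"
    exceptions: set = field(default_factory=set)  # facts that block the default

    def applies(self, facts):
        """The default fires if its premise holds and no exception is known."""
        return self.premise in facts and not (self.exceptions & facts)

def conclusions(defaults, facts):
    """Collect the conclusions of all defaults that fire on the given facts."""
    return {d.conclusion for d in defaults if d.applies(facts)}

# Normally UV leads to cancer.
uv_rule = Default("high(UV)", "cancer")
print(conclusions([uv_rule], {"high(UV)"}))

# Elaboration: UV does not lead to cancer if p53 is present.
uv_rule.exceptions.add("present(p53)")
print(conclusions([uv_rule], {"high(UV)"}))
print(conclusions([uv_rule], {"high(UV)", "present(p53)"}))
```

The reasoning is non-monotonic because adding the fact `present(p53)` withdraws a conclusion that held before, and it is elaboration tolerant in the sense that the exception is added without rewriting the original rule.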

40
Related Works: some prior mention of hypothesis
formation

HYPGENE (Karp, 1991)
TRANSGENE (Darden, 1997)
GenePath (Zupan et al., 2003)
Robot Scientist (King et al., 2004)
Database (Doherty et al., 2004)
BIOCHAM (Calzone et al., 2005)
PathLogic (Karp et al. 2002)
Cytoscape (Shannon et al., 2003)
Integrative Scheme (Su et al., 2003)
Pathway Analysis (Ingenuity?)
These do not use the latest advances in knowledge
representation and reasoning (e.g., they lack ways
to express defaults, non-monotonicity,
elaboration tolerance, problem solving rules,
etc.)

41
Hypothesis formation

Knowledge base K
Set of initial conditions I
Set of (experimental) observations O
(K,I) does not entail O
Expand (K,I) to (K', I') such that (K', I')
entails O
How to expand (hypothesis space):
Explanation: expand only I
Diagnosis: make normality assumptions about I,
and minimally abandon the normality assumptions
Hypothesis formation: expand K

42
Construction of hypothesis space

Present: manual construction, using research
literature
Future: integration of multiple data sources
Protein interactions
Pathway databases
Biological ontologies
...
Provide cues, hunches such as
A may interact with B: action interact(A,B)
A-B interaction may have effect C:
interact(A,B) causes C

43
Generation of hypotheses

Enumeration of hypotheses
Search: computing with Smodels (an implementation
of AnsProlog)
Heuristics
A trigger statement is selected only if it is the
only cause of some action occurrence that is
needed to explain the novel observations.
An inhibition statement is selected only if it is
the only blocker of some triggered action at some
time.
Maximizing preferences of selected statements

44
Generation (cont.): heuristics

Knowledge base K
a causes g
b causes g
Initial condition I = { initially f }
Observation O = { eventually g }
(K,I) does not entail O
Hypothesis space: expand K with rules among
f triggers a
f triggers b
Hypotheses: { f triggers a } or
{ f triggers b }
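
This small example can be worked through by enumeration, sketched here in Python. The sketch stands in for the Smodels search and makes simplifying assumptions: fluents persist once true, every triggered action occurs, and inhibition is not modeled. The names `eventually` and `minimal_hypotheses` are hypothetical.

```python
from itertools import combinations

CAUSES = {"a": "g", "b": "g"}           # K: a causes g, b causes g
INITIAL = {"f"}                         # I: initially f
CANDIDATES = [("f", "a"), ("f", "b")]   # hypothesis space: f triggers a, f triggers b

def eventually(goal, triggers, max_steps=10):
    """Forward-simulate K plus the chosen trigger rules; does goal become true?"""
    state = set(INITIAL)
    for _ in range(max_steps):
        if goal in state:
            return True
        events = {event for fluent, event in triggers if fluent in state}
        new_state = state | {CAUSES[e] for e in events if e in CAUSES}
        if new_state == state:
            return False  # fixpoint reached without the goal
        state = new_state
    return goal in state

def minimal_hypotheses(goal):
    """Return the minimal subsets of CANDIDATES that make (K', I) entail the goal."""
    found = []
    for r in range(len(CANDIDATES) + 1):
        for subset in combinations(CANDIDATES, r):
            if any(set(h) <= set(subset) for h in found):
                continue  # a smaller hypothesis already suffices
            if eventually(goal, subset):
                found.append(subset)
    return found

print(eventually("g", ()))        # (K,I) alone does not entail O
print(minimal_hypotheses("g"))    # each single trigger rule suffices
```

The empty extension fails, each single trigger rule succeeds, and the two-rule extension is skipped as non-minimal, which matches the two alternative hypotheses on the slide.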

45
Case study: p53 network
46
Tumor suppression by p53

p53 has 3 main functional domains
N terminal transactivator domain
Central DNA-binding domain
C terminal domain that recognizes DNA damage
Appropriate binding of N terminal activates
pathways that lead to protection of cell from
cancer.
Inappropriate binding (say to Mdm2) inhibits p53
induced tumor suppression.

47
p53 knowledge base

Stress
high(UV) triggers upregulate(mRNA(p53))
Upregulation of p53
upregulate(mRNA(p53)) causes high(mRNA(p53))
high(mRNA(p53)) triggers translate(p53)
translate(p53) causes high(p53)

48
p53 knowledge base (cont.)

Tumor suppression by p53
high(p53) inhibits growth(tumor)

49
p53 knowledge base (cont)

Interaction between Mdm2 and p53
high(p53), high(mdm2) triggers bind(p53,mdm2)
bind(p53,mdm2) causes bound(dom(p53,N))
bind(p53,mdm2) causes high(p53 mdm2),
bind(p53,mdm2) causes high(p53),high(mdm2)

50
Hypothesis formation

Experimental observation
I = { initially high(UV), high(mdm2), high(ARF) }
O = { eventually tumorous }
(K,I) does not entail O
Need to hypothesize the role of ARF.

51
Constructing hypothesis space

Levels of ARF and p53 correlate
high(ARF) triggers upregulate(mRNA(p53))
high(p53) triggers upregulate(mRNA(ARF))

52
Constructing (cont)

Interactions of ARF with the known proteins
bind(p53,ARF) causes bound(dom(p53,N))

53
Constructing (cont)

Influence of X (ARF) on other interactions
high(ARF) triggers upreg(mRNA(p53))
high(ARF) triggers translate(p53)
high(ARF) triggers bind(p53,mdm2)

54
Twelve generated hypotheses, such as

high(UV) triggers upregulate(mRNA(ARF))
high(ARF), high(mdm2) triggers bind(ARF,mdm2)

55
Conclusion of part 2

Goal: automation of hypothesis formation (with
respect to interactions and pathways)
Approach: viewed known qualitative aspects of
cell activities as a knowledge base
Used a knowledge representation language that
Can express defaults
Allows reasoning with incomplete knowledge
Can express reasoning as well as problem solving
rules
Developed a system: BioSigNet-RRH
Formalizing and reasoning about hypotheses
Illustration: hypothesizing the role of the ARF
protein in the p53 network.

56
Future Work on Reasoning about Biochemical
Networks (Part I and II)

Further development of the language
Validation with respect to larger networks
Kohn's map
Networks in Reactome and other repositories
Going from prototype to deployable systems
Scaling up challenges
Recent advances in automatic planning
Integration with BioPAX

57
Part III: CBioC

http://cbioc.org

58
Do we have enough knowledge in the various
databases?

Some have been curated into databases.
But there is much more in the literature.
So what do we do?

59
Current status of curation from text

About 15 million abstracts in PubMed
3 million published by US and EU researchers
during 1994-2004 (800 articles per day)
300K articles published so far reporting
protein-protein interactions in human, yeast and
mouse.
BIND (in 7 yrs) -- 23K; DIP 3K; MINT 2.4K.

60
Premise: high cost of human curation

The overwhelming cost of large curation efforts may
be unsustainable for long periods
BIND: bad news in Nov 2005.
Operated for 7 years
Listed over 100 curators and programmers
CAD 29 million received in 2003, plus other
funding
Curation efforts of AFCS have recently stopped.
Lack of funding for some genome annotation
projects.

61
Premise summary

Human curation of text is expensive.
Human curation of text is not scalable.
Human curation of text is not sustainable.

62
Why not resort to computers and do automatic
extraction?

Lessons from the DARPA-funded MUCs (Message
Understanding Conferences), run for a decade in
the 90s at a cost of tens of millions of dollars:
Getting to 60% recall and precision is quick.
Then every 5% improvement is about a year's work.
Even when we get to 90% for an individual entity
extraction,
for recognizing 4 related entities: (0.9)^4 ≈ 0.66
Lessons from biomedical text extraction:
No proper evaluation.
Recognized that recall and precision are not very
good even in the best systems.
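
The compounding argument for related entities can be checked with a few lines of Python. The sketch assumes that the per-entity extractions succeed independently, which is the assumption behind simply multiplying the accuracies; `joint_accuracy` is a hypothetical helper name.

```python
def joint_accuracy(per_entity, n_entities):
    """Probability that all n entities are right, assuming independence."""
    return per_entity ** n_entities

# Per-entity accuracy of 90%, for facts involving 1 to 4 related entities.
for n in range(1, 5):
    print(n, round(joint_accuracy(0.9, n), 4))
```

Under that assumption, even a 90% per-entity extractor gets all four entities of a fact right only about two times in three.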

63
What do we do?

How do we curate not only the existing articles,
but also the future articles?
Too important to give up!
Need to think of a new way to do it.
Faster computers, better sequencing technology
and better algorithms came to the rescue of the
Human Genome project.
Hmm. What resources are we overlooking?

64
Key Idea

If lots of articles are being written, then a lot
of people are writing them and a lot of people are
reading them.
If only we could make these people (the authors
and the readers) contribute to the curation
effort!
Especially the readers, the ones who need the
curated data!

65
Mass collaboration has worked in

Wikipedia
Project Gutenberg
Netflix rating
Amazon rating
Etc.

66
Mass collaborative curation: initial hurdles

An average reader
(S)he is not normally interested in filling in a
blank curation form.
We cannot make an average reader go through
curation training.
So it has to be very different from just making
the existing curation tools available to the
masses and expecting them to contribute.

67
Mass collaborative curation: key initial ideas

Make it very easy
The user need not remember where (which database,
which web page) to put the curated knowledge.
Curation opportunity should present itself
seamlessly.
Curation should not be a burden to an average
user
Make the curated knowledge thin.
There should be immediate rewards
Do not start with a blank slate.

68
Realization of the key ideas: a biologist with a
gene name

Goes to PubMed, types the gene name, clicks on
one of the abstracts
Curation panel presents itself automatically
Our approach calls for researchers to contribute
to the curation of facts as they read and
research over the web
But not with a blank slate
No one wants to be the first one!
Automatic extraction jump-starts the process, and
then researchers improve upon the extracted data,
ironing out inconsistencies by subsequent edits
on a massive scale.
Thin Schemas
Average users turned off by traditional wide
schemas
Wide schemas need to be broken down.

69-78
(No Transcript)
79
Summary

Information/curation window pops up
automatically.
Automatic extraction is used as a bootstrap so
that no user is working on a blank slate.
Users vote on correctness, make corrections, add
facts.
Suppose 60% precision and recall for the automatic
extraction system.
A person will have an easier time discarding the
40% of wrongly extracted text than identifying the
60% of correct entries and entering them!

80
Very useful byproducts

Avoids some problems with existing human curation
approach
Curators' bias
Curators miss things
Curators have disagreements
Slow access to newest findings
Researchers at large have little or no control
over what gets curated and when
A large curated corpus of text gets created
Very useful to evaluate and improve automated
extraction systems.

81
Current status of CBioC and future plans

Basic system, as described, is ready
Being populated with
Facts from existing databases (BIND etc.)
Facts extracted using our extraction system
Querying mechanism
Answer display
Future work
Voter confidence issues

82
Conclusion

Collecting what is known
Reasoning with what is known
Hypothesizing what is unknown (based
on observations)

83
Open Invitation

We are building, and are eager to help other
groups build, knowledge bases in particular
domains to
Predict impact of interventions
Plan (therapy design) to make a pathway behave in
a desired way
Explain observations
Hypothesize new knowledge
Further improvements to and adaptation of CBioC

84
Acknowledgements

BioSigNet
Nam Tran, Ph.D thesis on this, Postdoc at Yale
Karen Chancellor, Ph.D student
Michael Berens and his group (Ana Joy, Nhan Tran)
Lokesh Joshi and his group (Vinay Nagraj)
CBioC: Graciela Gonzalez, Lian Yu, Luis Tari,
Tony Gitter, Amanda Ziegler, Ryan Wendt,
Prabhdeep Singh.
Other projects
BioQA
Biogenenet

85
Thank you!
