Title: The Biochemical Abstract Machine BIOCHAM: Logic programming steps towards formal biology
François Fages, INRIA Rocquencourt
http://contraintes.inria.fr/
- Joint work with Nathalie Chabrier-Rivier and Sylvain Soliman
- ARC CPBIO "Process Calculi and Biology of Molecular Networks"
- Alexander Bockmayr, LORIA Nancy
- Vincent Danos, CNRS Paris PPS
- Vincent Schächter, Genoscope Evry, et al.
- http://contraintes.inria.fr/cpbio/
Current Revolution in Systems Biology
- Elucidation of high-level biological processes in terms of their biochemical basis at the molecular level.
- Mass production of genomic and post-genomic data: RNA expression, protein synthesis, protein-protein interactions, ...
- Need for a strong parallel effort on the formal representation of biological processes: Systems Biology.
- Need for formal tools for modeling and reasoning about their global behavior.
Formalisms for Modeling Biochemical Systems
- Diagrammatic notation
- Boolean networks [Thomas 73]
- Milner's pi-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]
- Concurrent transition systems [Chabrier-Chiaverini-Danos-Fages-Schächter 03]
- Biochemical abstract machine BIOCHAM [Chabrier-Fages-Soliman 03]
- Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]
- Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]
- Differential equations
- Hybrid Petri nets [Hofestädt-Thelen 98, Matsuno et al. 00]
- Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]
- Hybrid concurrent constraint languages [Bockmayr-Courtois 01]
Our Goal
- Beyond simulation, provide formal tools for querying, validating and completing biological models.
- Our proposal:
- Use of temporal logic CTL as a query language for models of biological processes
- Use of concurrent transition systems for their modeling
- Use of symbolic and constraint-based model checkers for automatically evaluating CTL queries in qualitative and quantitative models
- Use of inductive logic programming for learning models
- Along the way, learn and teach bits of biology with logic programs.
References
- A wonderful textbook: Molecular Cell Biology. 5th Edition, 1100 pages + CD. Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Freeman Publ. Nov. 2003.
- Genes and Signals. Ptashne, Gann. CSHL Press. 2002.
- Modeling dynamic phenomena in molecular and cellular biology. Segel. Cambridge Univ. Press. 1987.
- Modeling and querying bio-molecular interaction networks. Chabrier, Chiaverini, Danos, Fages, Schächter. To appear in TCS. 2004.
- The biochemical abstract machine BIOCHAM. Chabrier, Fages, Soliman. http://contraintes.inria.fr/BIOCHAM
Plan of the Talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical reactions
- Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning reaction rules
- An experiment with inductive logic programming
- Kinetics models
- Simulation with differential equations
- Hybrid systems
- Conclusion
2. A Simple Algebra of Cell Molecules
- Small molecules: covalent bonds (outer electrons shared), 50-200 kcal/mol
- 70% water
- 1% ions
- 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, ...
- Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals, 1-5 kcal/mol
- Stability and bindings determined by the number of weak bonds: 3D shape
- 20% proteins (50-10^4 amino acids)
- RNA (10^2-10^4 nucleotides AGCU)
- DNA (10^2-10^6 nucleotides AGCT)
Structure Levels of Proteins
- 1) Primary structure: word of n amino acid residues (20^n possibilities), linked with C-N bonds
- ICLP: Isoleucine Cysteine Leucine Proline
- 2) Secondary structure: word of m α-helices, β-strands, random coils, ... (3^m-10^m possibilities), stabilized by hydrogen bonds H---O
- 3) Tertiary structure: 3D spatial folding, stabilized by hydrophobic interactions
Formal proteins
- Cyclin dependent kinase 1: Cdk1 (free, inactive)
- Complex Cdk1-Cyclin B: Cdk1-CycB (low activity)
- Phosphorylated form: Cdk1~{thr161}-CycB, phosphorylated at site threonine 161 (high activity)
- (BIOCHAM syntax)
Abstraction of gene expression: DNA → RNA → protein
- DNA: word over 4 nucleotides Adenine, Guanine, Cytosine, Thymine
- Double helix of pairs A--T and C---G
- Replication: DNA synthesis
- Genes: parts of DNA
- Transcription: RNA copying from a gene
- ERCC1-(PRB-JUN-CFOS)
BIOCHAM Algebra of Cell Molecules
- E ::= Name | E-E | E~{E,...,E} | (E)        S ::= _ | E+S
- Names: molecules, proteins, gene binding sites, abstract @processes, ...
- "-": binding operator for protein complexes, gene binding sites, ... Associative and commutative.
- "~{...}": modification operator for phosphorylated sites, ... Set of modified sites (Associative, Commutative, Idempotent).
- "+": solution operator, "soup aspect". Associative, Commutative, Idempotent, Neutral element _.
- No membranes, no transport formalized. Bitonal calculi [Cardelli 03].
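The algebraic laws above can be mirrored with ordinary immutable containers. The following is a minimal Python sketch, not BIOCHAM's implementation: the names `Molecule`, `complex_of` and `solution` are hypothetical, complexes are kept flat (which is how associativity is modeled here), and binding is treated as a multiset because it is commutative but not idempotent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """A named molecule with a set of modified sites (hypothetical helper type)."""
    name: str
    sites: frozenset = frozenset()  # a set, so order and repetition do not matter


def complex_of(*parts):
    """Binding operator '-': a flat multiset of parts, so the order is irrelevant."""
    return frozenset(Counter(parts).items())


def solution(*objects):
    """Solution operator '+': a plain set, so it is also idempotent; _ is the empty set."""
    return frozenset(objects)


cdk1 = Molecule("Cdk1")
cycb = Molecule("CycB")
active = Molecule("Cdk1", frozenset({"thr161"}))

same_complex = complex_of(cdk1, cycb) == complex_of(cycb, cdk1)   # commutativity
dimer_differs = complex_of(cdk1, cdk1) != complex_of(cdk1)        # binding is not idempotent
same_soup = solution(cdk1, cdk1, cycb) == solution(cycb, cdk1)    # idempotent solution
```

Using hashable values throughout means a whole solution can serve directly as a state in a transition system.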
3. Concurrent Transition Systems of Biochemical Reactions
- Enzymatic reactions: R ::= S=>S | S=[E]=>S | S=[R]=>S | S<=>S | S<=[E]=>S
- (where A<=>B stands for A=>B and B=>A, and A=[C]=>B stands for A+C=>B+C, etc.)
- These rules define a concurrent transition system over integers denoting the multiplicity of the molecules (multiset rewriting).
- One can associate a finite abstract CTS over boolean state variables denoting the presence/absence of molecules, which correctly over-approximates the set of all possible behaviors.
- A reaction A+B=>C+D is translated into 4 rules, one for each possible consumption of the reactants:
- A+B → A+B+C+D        A+B → ¬A+B+C+D
- A+B → ¬A+¬B+C+D      A+B → A+¬B+C+D
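The boolean abstraction of a reaction can be sketched in a few lines of Python. This is an illustrative sketch under the assumption that a state is the set of molecules currently present; the function names `boolean_rules` and `successors` are hypothetical and not part of BIOCHAM.

```python
from itertools import product


def boolean_rules(reactants, products):
    """Enumerate the boolean rules abstracting reactants => products.

    Each reactant may or may not be entirely consumed, so n reactants give
    2**n rules. Products are always present afterwards.
    """
    rules = []
    for consumed in product([False, True], repeat=len(reactants)):
        post = {p: True for p in products}
        for molecule, gone in zip(reactants, consumed):
            if molecule not in post:  # a reactant that is also a product stays present
                post[molecule] = not gone
        rules.append(post)
    return rules


def successors(state, reactants, products):
    """All boolean successor states of `state` (a frozenset of present molecules)."""
    if not set(reactants) <= state:
        return set()  # the reaction is not enabled
    result = set()
    for post in boolean_rules(reactants, products):
        present = set(state)
        for molecule, is_present in post.items():
            if is_present:
                present.add(molecule)
            else:
                present.discard(molecule)
        result.add(frozenset(present))
    return result


rules = boolean_rules(["A", "B"], ["C", "D"])
succ = successors(frozenset({"A", "B"}), ["A", "B"], ["C", "D"])
```

For A+B=>C+D this yields the four successor states listed on the slide, which is why the boolean system over-approximates the multiset one: every real consumption pattern is covered by one of the four rules.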
Six Rule Schemas
- Complexation: A + B => A-B        Decomplexation: A-B => A + B
- Cdk1+CycB => Cdk1-CycB
- Phosphorylation: A =[C]=> A~{p}        Dephosphorylation: A~{p} =[C]=> A
- Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB
- Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB
- Synthesis: _ =[C]=> A.
- _ =[Ge2-E2f13-Dp12]=> CycA
- Degradation: A =[C]=> _.
- CycE =[@UbiPro]=> _ (not for CycE-Cdk2, which is stable)
MAPK Signaling Pathway
- RAF + RAFK <=> RAF-RAFK.
- RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
- MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
- MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
- MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
- MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
- RAF-RAFK => RAFK + RAF~{p1}.
- RAF~{p1}-RAFPH => RAF + RAFPH.
- MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
- MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
- MEK~{p1}-MEKPH => MEK + MEKPH.
- MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
- MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
- MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.
- MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
- MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
Cell Cycle: G1 → DNA Synthesis → G2 → Mitosis
- G1: Cdk4-CycD
- Cdk6-CycD
- Cdk2-CycE
- S: Cdk2-CycA
- G2
- M: Cdk1-CycA
- Cdk1-CycB
Mammalian Cell Cycle Control Map [Kohn 99]
Kohn's map: detail for Cdk2
- Complexation with CycA and CycE
- Phosphorylation sites PY15 and P
- BIOCHAM rules:
- cdk2~$P + cycA-$C => cdk2~$P-cycA-$C where $C in {_, cks1}.
- cdk2~$P + cycE~$Q-$C => cdk2~$P-cycE~$Q-$C where $C in {_, cks1}.
- p57 + cdk2~$P-cycA-$C => p57-cdk2~$P-cycA-$C where $C in {_, cks1}.
- cycE-$C =[cdk2~{p2}-cycE-$S]=> cycE~{T380}-$C where $S in {_, cks1} and $C in {_, cdk2~?, cdk2~?-cks1}
- 147-2733 rules, 165 proteins and genes, 500 variables, 2^500 states.
4. Temporal Logic CTL as a Query Language
Kripke Structures
- A Kripke structure K is a triple (S, R, L) where S is a set of states, and R ⊆ S×S is a total relation.
- s ⊨ f if f is true in s,
- s ⊨ E f if there is a path π from s such that π ⊨ f,
- s ⊨ A f if for every path π from s, π ⊨ f,
- π ⊨ f if s ⊨ f where s is the starting state of π,
- π ⊨ X f if π^1 ⊨ f,
- π ⊨ F f if there exists k ≥ 0 such that π^k ⊨ f,
- π ⊨ G f if for every k ≥ 0, π^k ⊨ f,
- π ⊨ f1 U f2 iff there exists k ≥ 0 such that π^k ⊨ f2 and for all j < k, π^j ⊨ f1.
- Following [Emerson 90] we identify a formula f with the set of states which satisfy it: f ≡ {s ∈ S : s ⊨ f}.
Symbolic Model Checking
- Model checking is an algorithm for computing, in a given finite Kripke structure, the set of states satisfying a CTL formula: {s ∈ S : s ⊨ f}.
- Basic algorithm: represent K as a graph and iteratively label the nodes with the subformulas of f which are true in that node.
- Add f to the states satisfying f
- Add EF f (EX f) to the (immediate) predecessors of states labeled by f
- Add E(f1 U f2) to the predecessor states of f2 while they satisfy f1
- Add EG f to the states for which there exists a path leading to a non-trivial strongly connected component of the subgraph of states satisfying f
- Symbolic model checking: use OBDDs to represent states and transitions as boolean formulas (S is finite).
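The labeling algorithm can be illustrated with an explicit-state sketch in Python. It is only an illustration on a hypothetical four-state Kripke structure: sets of states stand for formulas, EG is computed here by a greatest-fixpoint iteration rather than by the strongly connected component search described above, and no OBDDs are involved.

```python
def pre_exists(transitions, targets):
    """States having at least one successor in `targets` (the EX operator)."""
    return {s for s, succs in transitions.items() if succs & targets}


def check_eu(transitions, f1, f2):
    """E(f1 U f2): start from f2 and add predecessors while they satisfy f1."""
    result = set(f2)
    while True:
        new = result | (pre_exists(transitions, result) & f1)
        if new == result:
            return result
        result = new


def check_ef(transitions, f):
    """EF f is E(true U f)."""
    return check_eu(transitions, set(transitions), f)


def check_eg(transitions, f):
    """EG f: repeatedly drop states of f with no successor left in the set."""
    result = set(f)
    while True:
        new = result & pre_exists(transitions, result)
        if new == result:
            return result
        result = new


# Hypothetical total transition relation over states 0..3.
transitions = {0: {1}, 1: {2}, 2: {2}, 3: {0, 3}}
p = {2}          # states where the atom p holds
q = {0, 1, 3}    # states where the atom q holds

ex_p = pre_exists(transitions, p)
ef_p = check_ef(transitions, p)
eg_q = check_eg(transitions, q)
eu_qp = check_eu(transitions, q, p)
```

Here EF p holds in every state, while EG q holds only in state 3, which can loop on itself forever without leaving q.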
Biological Queries (1/3)
- About reachability:
- Given an initial state init, can the cell produce some protein P? init ⇒ EF(P)
- Which are the states from which a set of products P1, ..., Pn can be produced simultaneously? EF(P1 ∧ ... ∧ Pn)
- About pathways:
- Can the cell reach a state s while passing by another state s2? init ⇒ EF(s2 ∧ EF s)
- Is state s2 a necessary checkpoint for reaching state s? ¬E(¬s2 U s)
- Is it possible to produce P without using nor creating Q? E(¬Q U P)
- Can the cell reach a state s without violating some constraints c? init ⇒ E(c U s)
Biological Queries (2/3)
- About stability:
- Is a certain (partially described) state s a stable state? s ⇒ AG(s) (s denotes both the state and the formula describing it).
- Is s a steady state (with possibility of escaping)? s ⇒ EG(s)
- Can the cell reach a stable state? init ⇒ EF(AG(s)) (not an LTL formula).
- Must the cell reach a stable state? init ⇒ AF(AG(s))
- What are the stable states? Not expressible in CTL [Chan 00].
- Can the system exhibit a cyclic behavior w.r.t. the presence of P? init ⇒ EG((P ⇒ EF ¬P) ∧ (¬P ⇒ EF P))
Biological Queries (3/3)
- About the correctness of the model:
- Can one see the inaccuracies of the model and correct them?
- Exhibit a counterexample pathway or a witness. Suggest refinements of the model or biological experiments to validate/invalidate the property of the model.
- About durations:
- How long does it take for a molecule to become activated?
- In a given time, how many Cyclins A can be accumulated?
- What is the duration of a given cell cycle phase?
- CTL operators abstract from durations. Time intervals can be modeled in first-order logic (FO) by adding numerical arguments for start times and durations.
MAPK Signaling Pathway
- MEK~{p1} is a checkpoint for producing MAPK~{p1,p2}
- biocham: !E(!MEK~{p1} U MAPK~{p1,p2})
- True
- The PH complexes are not compulsory for the cascade
- biocham: !E(!MEK~{p1}-MEKPH U MAPK~{p1,p2})
- false
- Step 1 rule 15
- Step 2 rule 1 RAF-RAFK present
- Step 3 rule 21 RAF~{p1} present
- Step 4 rule 5 MEK-RAF~{p1} present
- Step 5 rule 24 MEK~{p1} present
- Step 6 rule 7 MEK~{p1}-RAF~{p1} present
- Step 7 rule 23 MEK~{p1,p2} present
- Step 8 rule 13 MAPK-MEK~{p1,p2} present
- Step 9 rule 27 MAPK~{p1} present
- Step 10 rule 15 MAPK~{p1}-MEK~{p1,p2} present
- Step 11 rule 28 MAPK~{p1,p2} present
Mammalian Cell Cycle Control Benchmark
- 700 rules, 165 proteins and genes, 500 variables, 2^500 states.
- BIOCHAM with the NuSMV model-checker: times in seconds
5. Learning Reaction Weights and Rules
- Idea 1: learn reaction weights from temporal properties
- Reaction weights restrict the non-determinism (Markov models)
- Idea 2: learn reaction rules from temporal properties of the system.
- Learning of cell cycle reaction rules from reachability properties and counterexamples with Progol [Muggleton 00].
- reaction([m_CP,m_Y],[m_pM]).
- reaction([m_CP],[m_C2]).
- reaction([m_pM],[m_M]).
- reaction([m_M],[m_C2,m_YP]).
- reaction([m_C2],[m_CP]).
- reaction([m_YP],[]).
- reaction([],[m_Y]).
- pathway(S1,S2) :- same(S1,S2).
- pathway(S1,S2) :- reaction(L1,L2), transition(S1,L1,S3,L2), pathway(S3,S2).
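The `pathway/2` predicate is a reachability search over the reaction facts. The Python sketch below mirrors it under stated assumptions: the slides do not define `transition/4` or `same/2`, so here a transition consumes the reactants and adds the products (one of the possible boolean behaviors), and the goal is considered reached when all of its molecules are present. The bracketing of the reaction lists is reconstructed as well.

```python
from collections import deque

# Reaction facts as (reactants, products) pairs of molecule sets.
REACTIONS = [
    (frozenset({"m_CP", "m_Y"}), frozenset({"m_pM"})),
    (frozenset({"m_CP"}), frozenset({"m_C2"})),
    (frozenset({"m_pM"}), frozenset({"m_M"})),
    (frozenset({"m_M"}), frozenset({"m_C2", "m_YP"})),
    (frozenset({"m_C2"}), frozenset({"m_CP"})),
    (frozenset({"m_YP"}), frozenset()),
    (frozenset(), frozenset({"m_Y"})),
]


def transition(state, reactants, products):
    """Assumed semantics: reactants are consumed and products appear."""
    if not reactants <= state:
        return None  # reaction not enabled in this state
    return (state - reactants) | products


def pathway(start, goal, reactions=REACTIONS):
    """Breadth-first search: is a state containing `goal` reachable from `start`?"""
    start, goal = frozenset(start), frozenset(goal)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if goal <= state:
            return True
        for reactants, products in reactions:
            nxt = transition(state, reactants, products)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


with_rule = pathway({"m_C2"}, {"m_M"})
# Drop reaction([m_pM],[m_M]) to see the reachability property fail.
without_rule = pathway({"m_C2"}, {"m_M"}, [r for r in REACTIONS if r[1] != frozenset({"m_M"})])
```

A reachability property that holds with a rule and fails without it is the kind of example and counterexample from which a missing rule can be induced.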
Inductive Logic Programming
- reaction([m_pM],[m_M]) learned
- 6th Framework Programme (PCRD) project APrIL 2: Applications of Probabilistic Inductive Logic Programming. Luc de Raedt, Univ. Freiburg; Stephen Muggleton, Imperial College London.
6. Kinetics Models
- Enzymatic reactions with rates k1, k2, k3:
- E + S →(k1) C →(k2) E + P
- E + S ←(k3) C
- can be compiled by the law of mass action into a system of (non-linear) Michaelis-Menten Ordinary Differential Equations:
- dE/dt = -k1·E·S + (k2+k3)·C
- dS/dt = -k1·E·S + k3·C
- dC/dt = k1·E·S - (k2+k3)·C
- dP/dt = k2·C
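The four equations above can be integrated numerically to see the enzyme being recycled while the substrate is turned into product. The following Python sketch uses a plain forward Euler scheme; the rate constants, initial concentrations and step size are arbitrary illustrative values, not taken from any model in this talk.

```python
def derivatives(state, k1, k2, k3):
    """Right-hand side of the mass-action ODE system for E + S <-> C -> E + P."""
    E, S, C, P = state
    binding = k1 * E * S
    dE = -binding + (k2 + k3) * C
    dS = -binding + k3 * C
    dC = binding - (k2 + k3) * C
    dP = k2 * C
    return (dE, dS, dC, dP)


def simulate(state, k1, k2, k3, dt=0.001, steps=20000):
    """Forward Euler integration; returns the state after steps*dt time units."""
    for _ in range(steps):
        d = derivatives(state, k1, k2, k3)
        state = tuple(x + dt * dx for x, dx in zip(state, d))
    return state


E0, S0 = 1.0, 10.0  # illustrative initial concentrations
E, S, C, P = simulate((E0, S0, 0.0, 0.0), k1=1.0, k2=1.0, k3=0.5)
```

Two conservation laws follow directly from the equations and give a quick sanity check of any simulation: E + C stays equal to the initial amount of enzyme, and S + C + P stays equal to the initial amount of substrate.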
MAPK kinetics model
Gene Interaction Networks
- Gene interaction example [Bockmayr-Courtois 01]
- Hybrid Concurrent Constraint Programming HCC [Saraswat et al.]
- 2 genes x and y.
- Hybrid linear approximation:
- dx/dt = 0.01 - 0.02x if y < 0.8
- dx/dt = -0.02x if y ≥ 0.8
- dy/dt = 0.01x
Concurrent Transition System
- Time discretization using Euler's method:
- y < 0.8 → x' = x + dt·(0.01 - 0.02x), y' = y + dt·0.01x
- y ≥ 0.8 → x' = x - dt·0.02x, y' = y + dt·0.01x
- Initial condition x=0, y=0.
- Translation into a CLP(R) program (dt=1):
- init :- X=0, Y=0, p(X,Y).
- p(X,Y) :- X>=0, Y>=0, Y<0.8, X1=X-0.02*X+0.01, Y1=Y+0.01*X, p(X1,Y1).
- p(X,Y) :- X>=0, Y>=0, Y>=0.8, X1=X-0.02*X, Y1=Y+0.01*X, p(X1,Y1).
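The discretized system can also be run forward as an ordinary simulation. The Python sketch below follows the two clauses of the CLP(R) program with dt=1; it explores a single concrete trajectory from x=0, y=0, whereas the constraint-based approach reasons about sets of states, so it is only a complement to it. The function names are hypothetical.

```python
def step(x, y, dt=1.0):
    """One Euler step of the two-gene hybrid system."""
    if y < 0.8:
        dx = 0.01 - 0.02 * x   # x is produced and degraded
    else:
        dx = -0.02 * x         # production of x is switched off
    return x + dt * dx, y + dt * 0.01 * x


def trajectory(steps, x=0.0, y=0.0, dt=1.0):
    """States visited from the initial condition; index n is the state after n steps."""
    states = [(x, y)]
    for _ in range(steps):
        x, y = step(x, y, dt)
        states.append((x, y))
    return states


states = trajectory(1000)
max_x = max(x for x, _ in states)
first_above_02 = next(n for n, (x, _) in enumerate(states) if x > 0.2)
```

On this trajectory x stays below 0.6 and first exceeds 0.2 after 26 steps, since x follows 0.5·(1 - 0.98^n) as long as y < 0.8.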
Proving CTL properties by computing fixpoints of Constraint Logic Programs
- Theorem [Delzanno-Podelski 99]: EF(f) = lfp(T_{P ∪ {p(x) :- f}}), EG(f) = gfp(T_{P ∧ f}).
- Safety property: AG(¬f) iff ¬EF(f) iff init ∉ lfp(T_{P ∪ {p(x) :- f}})
- Liveness property: AG(f1 ⇒ AF(f2)) iff init ∉ lfp(T_{P ∪ {f1 ∧ gfp(T_{P ∧ ¬f2})}})
- Implementation in Sicstus-Prolog CLP(R,B) [Delzanno 00]
Deductive Model Checker DMC: Gene Interaction
- r(init, p(s_s,A,B), {A=0,B=0}).
- r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0.8,C=A-0.02*A,D=B+0.01*A}).
- r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0,B<0.8,C=A-0.02*A+0.01,D=B+0.01*A}).
- ?- prop(P,S).
- P = unsafe, S = ps(x>0.6)
- ?- ti.
- Property satisfied. Execution time 0.0
- ?- ls.
- s(0, p(s_s,A,_), {A>0.6}, 1, (0,0)).
- ?- prop(P,S).
- P unsafe, S ps(xgt0.2) ?
- ?- ti.
- Property NOT satisfied. Execution time 1.5
- ?- ls.
- s(0, p(s_s,A,_), Agt0.2, 1, (0,0)).
- s(1, p(s_s,A,B), Blt0.8,Bgt-0.0,Agt0.1938775510204
0816, 2, (2,1)). -
- s(26, p(s_s,A,B), Bgt0.0,Agt0.0,
- B0.1982676351105516Alt0.7741338175552753,
27, (2,26)). - s(27, init, , 28, (1,27)).
-
Conclusion
- The biochemical abstract machine BIOCHAM provides:
- A first-order rule-based language for modeling biochemical systems
- A powerful query language based on temporal logic CTL
- Models of complex biochemical processes:
- intracellular and extracellular signaling,
- cell-cycle control,
- a repository of models: http://contraintes.inria.fr/CMBSlib
- Implementation in Prolog with the model-checkers NuSMV and DMC
- Learning techniques investigated in APrIL 2:
- PILP-based learning of reaction weights from temporal properties
- PILP-based learning of reaction rules from temporal properties
Perspectives
- Collaboration with biologists on BIOCHAM models of the cell-cycle control:
- Colon cancer therapies, Domenjoud, UHP Nancy
- Chronotherapies, Clairambault, INSERM
- Hybrid concurrent constraint logic programming [Bockmayr-Courtois 01, Saraswat 04]
- Multi-scale molecular-electro-physiological models [Sorine et al. 03], http://www-rocq.inria.fr/sosso/icema2
- http://www.sci.sdsu.edu/movies