Title: Symbolic model checking of biochemical systems Logic programming steps towards formal biology Fran
1Symbolic model checking of biochemical
systemsLogic programming steps towards formal
biologyFrançois Fages, INRIA Rocquencourt
http//contraintes.inria.fr/
- Joint work with
and - Nathalie Chabrier-Rivier
Sylvain Soliman - In collaboration with ARC CPBIO
http//contraintes.inria.fr/cpbio - Alexander Bockmayr, Vincent Danos, Vincent
Schächter et al.
2Current revolution in Biology
- Elucidation of high-level biological processes
- in terms of their biochemical basis at the
molecular level. - Mass production of genomic and post-genomic data
- ARN expression, protein synthesis,
protein-protein interactions, - Need for a strong parallel effort on the formal
representation of biological processes. - Need for formal tools for modeling and reasoning
about their global behavior.
3Formalisms for modeling biochemical systems
- Diagrammatic notation
- Boolean networks Thomas 73
- Milners pcalculus Regev-Silverman-Shapiro
99-01, Nagasali et al. 00 - Concurrent transition systems Chabrier-Chiaverini
-Danos-Fages-Schachter 03 - Biochemical abstract machine BIOCHAM
Chabrier-Fages-Soliman 03 - Pathway logic Eker-Knapp-Laderoute-Lincoln-Me
seguer-Sonmez 02 - Bio-ambients Regev-Panina-Silverman-Cardelli-Shap
iro 03 - Differential equations
- Hybrid Petri nets Hofestadt-Thelen 98, Matsuno
et al. 00 - Hybrid automata Alur et al. 01, Ghosh-Tomlin 01
- Hybrid concurrent constraint languages
Bockmayr-Courtois 01
4Our goal
- Beyond simulation, provide formal tools for
querying, validating and completing biological
models. - Our proposal
- Use of temporal logic CTL as a query language for
models of biological processes - Use of concurrent transition systems for their
modeling - Use of symbolic and constraint-based model
checkers for automatically evaluating CTL queries
in qualitative and quantitative models. - Use of inductive logic programming for learning
models EU APRIL 2 - In course, learn and teach bits of biology with
logic programs.
5Plan of the talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical
reactions - Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning models
- An experiment with inductive logic programming
- Quantitative models
- Simulation with differential equations
- Constraint-based model checking
- Conclusion
6References
- A wonderful textbook
- Molecular Cell Biology. 5th Edition, 1100
pagesCD, Freeman Publ. - Lodish, Berk, Zipursky, Matsudaira, Baltimore,
Darnell. Nov. 2003. - Genes and signals. Ptashne, Gann. CSHL Press.
2002. - Modeling dynamic phenomena in molecular and
cellular biology. - Segel. Cambridge Univ. Press. 1987.
- Modeling and querying bio-molecular interaction
networks. - Chabrier, Chiaverini, Danos, Fages, Schächter. To
appear in TCS. 2003. - The biochemical abstract machine BIOCHAM.
Chabrier, Fages, Soliman. http//contraintes.inria
.fr/BIOCHAM
72. A Simple Algebra of Cell Molecules
- Small molecules covalent bonds (outer electrons
shared) 50-200 kcal/mol - 70 water
- 1 ions
- 6 amino acids (20), nucleotides (5),
- fats, sugars, ATP, ADP,
- Macromolecules hydrogen bonds, ionic,
hydrophobic, Waals 1-5 kcal/mol - Stability and bindings determined by the number
of weak bonds 3D shape - 20 proteins (50-104 amino acids)
- RNA (102-104 nucleotides AGCU)
- DNA (102-106 nucleotides AGCT)
8Structure levels of proteins
- 1) Primary structure word of n amino acids
residues (20n possibilities) - linked with C-N bonds
- ICLP
- Isoleucine Cysteine Leucine Proline
- 2) Secondary word of m a-helix, b-strands,
random coils, (3m-10m) - stabilized by hydrogen
bonds H---O - 3) Tertiary 3D structure spatial folding
-
stabilized by -
hydrophobic -
interactions
9Formal proteins
- Cyclin dependent kinase 1 Cdk1
- (free, inactive)
- Complex Cdk1-Cyclin A Cdk1CycB
- (low activity)
- Phosphorylated Cdk1thr161-CycB
- at site threonine 161
- (high activity)
-
(BIOCHAM syntax)
10Gene expression DNA ? RNA ? protein
- DNA word over 4 nucleotides Adenine, Guanine,
Cytosine, Thymine - double helix of pairs A--T and C---G
- Replication DNA synthesis
- Genes parts of DNA
- Transcription RNA copying from a gene
-
- ERCC1-(PRB-JUN-CFOS)
11Genome Size
Species Genome size Chromosomes Coding DNA
E. Coli (bacteria) 5 Mb 1 circular 100
S. Cerevisae (yeast) 12 Mb 16 70
Mouse, Human 3 Gb 20, 23 15
15 Gb
140 Gb
3,200,000,000 pairs of nucleotides single
nucleotide polymorphism 1 / 2kb
12Genome Size
Species Genome size Chromosomes Coding DNA
E. Coli (bacteria) 4 Mb 1 100
S. Cerevisae (yeast) 12 Mb 16 70
Mouse, Human 3 Gb 20, 23 15
Onion 15 Gb 8 1
140 Gb
13Genome Size
Species Genome size Chromosomes Coding DNA
E. Coli (bacteria) 4 Mb 1 100
S. Cerevisae (yeast) 12 Mb 16 70
Mouse, Human 3 Gb 20, 23 15
Onion 15 Gb 8 1
Lungfish 140 Gb 0.7
14Algebra of Cell Molecules
- E NameE-EEE,,E(E) S _ESS
- Names proteins, gene binding sites, molecules,
abstract processes - - binding operator for protein complexes, gene
binding sites, - Non associative, non commutative (could be in
most cases) - modification operator for phosphorylated
sites, - Associative, Commutative, Idempotent.
- solution operator, soup aspect, Assoc.
Comm. Idempotent, Neutral _ - No membranes, no transport formalized. Bitonal
calculi Cardelli 03.
15Plan of the talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical
reactions - Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning models
- An experiment with inductive logic programming
- Quantitative models
- Simulation with differential equations
- Constraint-based model checking
- Conclusion
163. Concurrent Transition Syst. of Biochemical
Reactions
- Enzymatic reactions
- R SgtS SEgtS SRgtS SltgtS
SltEgtS - (where AltgtB stands for AgtB BgtA and ACgtB
for ACgtBC, etc.) - define a concurrent transition system over
integers denoting the multiplicity of the
molecules (multiset rewriting). - One can associate a finite abstract CTS over
boolean state variables denoting the
presence/absence of molecules - which correctly over-approximates the set of all
possible behaviors - If we translate a reaction ABgtCD by 4 rules
for possible consumption - AB?ABCD AB??AB CD
- AB??A?BCD AB?A?BCD
17Four Rule Schemas
- Complexation A B gt A-B
- Cdk1CycB gt Cdk1CycB
- Phosphorylation A Cgt Ap
- Cdk1CycB Myt1gt Cdk1thr161-CycB
- Cdk1thr14,tyr15-CycB Cdc25Ntermgt
Cdk1-CycB - Synthesis _ Cgt A.
- _ Ge2-E2f13-Dp12gt CycA
- Degradation A Cgt _.
- CycE UbiProgt _ (not for CycE-Cdk2 which
is stable)
18An Actin-Myosin Engine with ATP fuel
- A
two-stroke nano-engine - Myosin ATP gt Myosin-ATP
- Myosin-ATP gt Myosin ADP
- http//www.sci.sdsu.edu/movies
http//www-rocq.inria.fr/sosso/icem
a2
19Cell Cycle G1 ? DNA Synthesis ? G2 ? Mitosis
- G1 CdK4-CycD
- Cdk6-CycD
- Cdk2-CycE
- S Cdk2-CycA
- G2
- M Cdk1-CycA
- Cdk1-CycB
20Mammalian Cell Cycle Control Map Kohn 99
21Kohns map detail for Cdk2
- Complexation with CycA and CycE
Phosphorylation sites PY15 and P - Concurrent Transition Rules
- cdk2cycA gt cdk2-cycA.
- cdk2p2cycA gt cdk2p2-cycA.
- cdk2p1cycA gt cdk2p1-cycA.
- cdk2p1,p2cycA gt cdk2p1,p2-cycA.
- cdk2cycE gt cdk2-cycE.
- cdk2cycEp1 gt cdk2-cycEp1.
- cdk2p2cycE gt cdk2p2-cycE.
-
- 700 rules, 165 proteins and genes, 500 variables,
2500 states.
22Translation in Prolog
- Encode states with a single predicate
p(A,B,C,D,E) - AB?CD.
p(1,1,_,_,E)-p(_,_,1,1,E). - C? A.
p(_,B,1,D,E)- p(1,B,_,D,E). - Thm. Delzanno-Podelski 99 Predecessor(S)
TP(S) - Backward analysis by computing lfp(TP?p(x)-s).
- CLP-based Deductive Model Checker DMC
Delzanno-Podelski 99 - More efficient implementation using
state-of-the-art symbolic model-checker NuSMV
Cimatti Clarke Giunchiglia Giunchiglia Pistore
02.
23Plan of the talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical
reactions - Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning models
- An experiment with inductive logic programming
- Quantitative models
- Simulation with differential equations
- Constraint-based model checking
- Conclusion
244. Temporal Logic CTL as a Query Language
Choice Time E exists A always
X next time EX(f) AX(f)
F finally EF(f) ? AG(?f) AF(f) liveness
G globally EG(f) ? AF(? f) AG(f) safety
U until E (f1 U f2) A (f1 U f2)
25Kripke Structures
- A Kripke structure K is a triple (S R L) where
S is a set of states, and R?SxS is a total
relation. - s f if f is true in s,
- s E f if there is a path ? from s such that ?
f, - s A f if for every path ? from s, ? f,
- ? f if s f where s is the starting state
of ?, - ? X f if ?1 f,
- ? F f if there exists k gt0 such that ?k f,
- ? G f if for every k gt0, ?k f,
- ? f1 U f2 iff there exists kgt0 such that ?k
f for all j lt k ?j f. - Following Emerson 90 we identify a formula f
to the set of states which satisfy it f s?S
s f .
26Symbolic Model Checking
- Model Checking is an algorithm for computing, in
a given finite Kripke structure the set of states
satisfying a CTL formula s?S s f . - Basic algorithm represent K as a graph and
iteratively label the nodes with the subformulas
of f which are true in that node. - Add f to the states satisfying f
- Add EF f (EX f) to the (immediate) predecessors
of states labeled by f - Add E(f1 U f2 ) to the predecessor states of f2
while they satisfy f1 - Add EG f to the states for which there exists a
path leading to a non trivial strongly connected
component of the subgraph of states satisfying f - Symbolic model checking use OBDDs to represent
states and transitions as boolean formulas (S is
finite).
27Biological Queries (1/3)
- About reachability
- Given an initial state init, can the cell produce
some protein P? init ? EF(P) - Which are the states from which a set of products
P1,. . . , Pn can be produced simultaneously?
EF(P1Pn) - About pathways
- Can the cell reach a state s while passing by
another state s2? init ? EF(s2EFs) - Is state s2 a necessary checkpoint for reaching
state s? ?EF(?s2U s) - Is it possible to produce P without using nor
creating Q? EF(?Q U s) - Can the cell reach a state s without violating
some constraints c? init ? EF(cUs)
28Biological Queries (2/3)
- About stability
- Is a certain (partially described) state s a
stable state? s?AG(s) s?AG(s) (s denotes both the
state and the formula describing it). - Is s a steady state (with possibility of
escaping) ? s?EG(s) - Can the cell reach a stable state?
init?EF(AG(s))not a LTL formula. - Must the cell reach a stable state?
init?AF(AG(s)) - What are the stable states? Not expressible in
CTL Chan 00. - Can the system exhibit a cyclic behavior w.r.t.
the presence of P ? init ? EG((P ? EF ?P) (?P ?
EF P))
29Biological Queries (3/3)
- About the correctness of the model
- Can one see the inaccuracies of the model and
correct them? - Exhibit a counterexample pathway or a
witness. Suggest refinements of the model or
biological experiments to validate/invalidate the
property of the model. - About durations
- How long does it take for a molecule to become
activated? - In a given time, how many Cyclins A can be
accumulated? - What is the duration of a given cell cycles
phase? - CTL operators abstract from durations. Time
intervals can be modeled in FO by adding
numerical arguments for start times and durations.
30Cell to Cell Signaling by Hormones and Receptors
- Receptor Tyrosine Kinase RTK
- RAF RAFK -gt RAF-RAFK
- RAFp RAFPH -gt RAFp-RAFPH
- MEKp RAFp -gt MEKp-RAFp
-
- RAF-RAFK -gt RAF RAFK.
- RAFp-RAFPH -gt RAFp RAFPH.
- MEKp-RAFp -gt MEKp RAFp.
-
- RAF-RAFK -gt RAFK RAFp.
- RAFp-RAFPH -gt RAF RAFPH.
- MEKp-RAFp -gt MEKpp RAFp.
31Cell to Cell Signaling by Hormones and Receptors
- Receptor Tyrosine Kinase RTK
- RAF RAFK -gt RAF-RAFK
- RAFp RAFPH -gt RAFp-RAFPH
- MEKp RAFp -gt MEKp-RAFp
-
- RAF-RAFK -gt RAF RAFK.
- RAFp-RAFPH -gt RAFp RAFPH.
- MEKp-RAFp -gt MEKp RAFp.
-
- RAF-RAFK -gt RAFK RAFp.
- RAFp-RAFPH -gt RAF RAFPH.
- MEKp-RAFp -gt MEKpp RAFp.
MEKp is a checkpoint for the cascade (producing
MAPKpp) ?- nusmv(!(E(!(MEKp) U MAPKpp))). true The
PH complexes are only here to "slow down" the
cascade ?- nusmv(E(!(MEKp-MEKPH) U MAPKpp)). true
32Cell Cycle G1 ? DNA Synthesis ? G2 ? Mitosis
- G1 CdK4-CycD
- Cdk6-CycD
- Cdk2-CycE
- S Cdk2-CycA
- G2
- M Cdk1-CycA
- Cdk1-CycB
33Mammalian Cell Cycle Control Benchmark
- 700 rules, 165 proteins and genes, 500 variables,
2500 states. - BIOCHAM NuSMV model-checker time in seconds
Initial state G2 Query Time
compiling 29
Reachability G1 EF CycE 2
Reachability G1 EF CycD 1.9
Reachability G1 EF PCNA-CycD 1.7
Checkpoint for mitosis complex ?EF (? Cdc25Nterm U Cdk1Thr161-CycB) 2.2
Cycle EG ( (CycA ? EF ? CycA) ? (? CycA ? EF CycA)) 31.8
34Plan of the talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical
reactions - Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning models
- An experiment with inductive logic programming
- Quantitative models
- Simulation with differential equations
- Constraint-based model checking
- Conclusion
355. Learning Models
- Basic idea learn reaction rules from temporal
properties of the system. - Learning of yeast cell cycle rules from
reachability properties and counterexamples with
Progol Muggleton 00. - reaction(m_CP,m_Y,m_pM).
- reaction(m_CP,m_C2).
- reaction(m_pM,m_M).
- reaction(m_M,m_C2,m_YP).
- reaction(m_C2,m_CP).
- reaction(m_YP,).
- reaction(,m_Y).
- pathway(S1,S2) - same(S1,S2).
- pathway(S1,S2) - reaction(L1,L2),
transition(S1,L1,S3,L2),
pathway(S3,S2).
36Inductive Logic Programming
pathway(m_CP,m_Y,m_M). pathway(m_CP,m_Y,m_M,m_pM). pathway(m_CP,m_Y,m_M,m_Y). pathway(m_CP,m_Y,m_M,m_Y,m_pM). pathway(m_CP,m_Y,m_M,m_CP). pathway(m_CP,m_Y,m_M,m_CP,m_Y). pathway(m_CP,m_Y,m_M,m_CP,m_pM). pathway(m_CP,m_Y,m_M,m_CP,m_Y,m_pM). pathway(m_pM,m_C2,m_YP). pathway(m_pM,m_M,m_C2,m_YP). pathway(m_pM,m_pM,m_C2,m_YP). pathway(m_pM,m_M,m_pM,m_C2,m_YP). -pathway(,m_C2). -pathway(,m_CP). -pathway(,m_C2,m_CP). -pathway(,m_M). -pathway(,m_YP). -pathway(,m_YP, m_Y). -pathway(,m_Y,m_pM). -pathway(,m_CP,m_pM). -pathway(,m_Y,m_M). -pathway(m_CP, m_C2,m_YP). -pathway(m_CP,m_YP). -pathway(m_C2,m_YP). -pathway(m_Y,).
- reaction(m_pM,m_M) learned
- 6th PCRD APRIL 2 Applications of Probabilistic
Inductive Logic Progr. Luc de Raedt, Univ.
Freiburg, Stephen Muggleton, Univ. London.
37Plan of the talk
- Introduction
- A simple algebra of cell molecules
- Concurrent transition systems of biochemical
reactions - Example of the mammalian cell cycle control
- Temporal logic CTL as a query language
- Computational results with BIOCHAM
- Learning models
- An experiment with inductive logic programming
- Quantitative models
- Simulation with differential equations
- Constraint-based model checking
- Conclusion
386. Quantitative Models
- Enzymatic reactions with rates k1 k2 k3
-
- ES ?k1 C ?k2 EP
- ES ?k3 C
- can be compiled by the law of mass action into a
system of - Ordinary Differential Equations
- dE/dt -k1ES(k2k3)C
- dS/dt -k1ESk3C
- dC/dt k1ES-(k2k3)C
- dP/dt k2C
39Circadian Cycle Model
- C' -(k1C)-k4C-kdCC k2CNk3P2T2
- CN' k1C-k2CN-kdNCN
- MP' (KIPnnusP)/(KIPnCNn)
- -kd MP-(numPMP)/(KmPMP)
- MT' (KITnnusT)/(KITnCNn)
- -MT t(kdnumT/(KmTMT))
- P0' ksPMP-kdP0-(V1PP0)/( K1PP0)
- (V2PP1)/(K2PP1)
- P1' (V1PP0)/(K1PP0)-kdP1 -(V2PP1)/(K2PP1)
- -(V3PP1)/( K3PP1)(V4PP2)/(K4PP2)
- P2' k4C(V3PP1)/(K3PP1) -kdP2-(V4PP2)/(K4P
P2) - -(nudPP2)/(KdPP2)-k3P2T2
- T0' ksTMT-kdT0-(V1TT0)/( K1TT0)(V2TT1)/(K2
TT1) - T1' (V1TT0)/(K1TT0)-kdT1 -(V2TT1)/(K2TT1)-(
V3TT1)/( K3TT1)(V4TT2)/(K4TT2) - T2' k4C(V3TT1)/(K3TT1) -k3P2T2-(V4TT2)/(K
4TT2) -T2(kdnudT/(KdTT2))
40Gene Interaction Networks
- Gene interaction example Bockmayr-Courtois 01
- Hybrid Concurrent Constraint Programming HCC
Saraswat et al. - 2 genes x and y.
- dx/dt 0.01 0.02x if y lt 0.8
- dx/dt 0.02x if y 0.8
- dy/dt 0.01x
41Concurrent Transition System
- Time discretized using Eulers method
(Runge-Kutta method in HCC) - y lt 0.8 ? x x dt(0.01-0.02x) , y y
dt0.01x - y 0.8 ? x x dt(0.01-0.02x) , y y
dt0.01x - Initial condition x0, y0.
- CLP(R) program
- Init - X0, Y0, p(X,Y).
- p(X,Y)-Xgt0, Ygt0, Ylt0.8,
- X1X-0.02X0.01, Y1Y0.01X,
p(X1,Y1). - p(X,Y)-Xgt0, Ygt0, Ygt0.8,
- X1X-0.02X, Y1Y0.01X,
p(X1,Y1).
42Proving CTL properties by computing fixpoints of
CLP programs
Theorem Delzanno Podelski 99
EF(f)lfp(TP?p(x)-f), EG(f)gfp(TP?f ).
Safety property AG(?f) iff ?EF(f) iff
init?lfp(TP?f) Liveness property AG(f1?AF(f2))
iff init?lfp(TP?f1?gfp(TP?f2) ) Prolog-based
implementation in CLP(R,B) Delzanno
00 Applications to life in silico Proof of
protocols, cache consistency, etc. Delzanno 01
43Deductive Model Checker DMC Gene Interaction
- r(init, p(s_s,A,B), A0,B0).
- r(p(s_s,A,B), p(s_s,C,D), Agt0,Bgt0.8,CA-0.02A,
DB0.01A). - r(p(s_s,A,B), p(s_s,C,D), Agt0,Bgt0,Blt0.8,
-
CA-0.02A0.01,DB0.01A). - ?- prop(P,S).
- P unsafe, S ps(xgt0.6)
- ?- ti.
- Property satisfied. Execution time 0.0
- ?- ls.
- s(0, p(s_s,A,_), Agt0.6, 1, (0,0)).
44Demonstration DMC (continued)
- ?- prop(P,S).
- P unsafe, S ps(xgt0.2) ?
- ?- ti.
- Property NOT satisfied. Execution time 1.5
- ?- ls.
- s(0, p(s_s,A,_), Agt0.2, 1, (0,0)).
- s(1, p(s_s,A,B), Blt0.8,Bgt-0.0,Agt0.1938775510204
0816, 2, (2,1)). -
- s(26, p(s_s,A,B), Bgt0.0,Agt0.0,
- B0.1982676351105516Alt0.7741338175552753,
27, (2,26)). - s(27, init, , 28, (1,27)).
-
457. Conclusion
- The great ambition of logic programming is to
make of programming a modeling task in the first
place, with equations, constraints and logical
formulae. - In this respect, computational molecular biology
offers numerous challenges to the logic
programming community at large. - Besides combinatorial search and optimization
problems coming from molecular biology (DNA and
protein sequence comparison, protein structure
prediction,) there is a need to model globally
the system at hand and automate reasoning on all
its possible behaviors.
46Conclusion
- The biochemical abstract machine BIOCHAM project
aims at developing - Qualitative models of complex biochemical
processes - Intracellular and extracellular signaling,
cell-cycle control, http//contraintes.inria.fr/
CMBSlib - Prolog-based implementation BDD symbolic
model-checking - ILP-based learning of models from temporal
properties 6thPCRD APRIL 2 - Membranes and transportation not modeled
- Bitonal algebras Cardelli et al. 03
BioAmbients, Brane calculi Cardelli et al. 03
47Perspectives for LP
- Quantitative models
- Differential equations
- Hybrid discrete-continuous time models
- Hybrid concurrent constraint programming
Bockmayr-Courtois-Eveillard 03 - CLP-based model-checking Delzanno-Podelski 99
Chabrier-Fages 03 - Multi-scale molecular-electro-physiological
models Sorine et al. 03 -
http//www-rocq.inria.fr/sosso/icema2 -
- http//www.sci.sdsu.edu/movies