Title: Special Topics in Computational Biology: Formal Methods in Systems Biology
1Special Topics in Computational Biology: Formal
Methods in Systems Biology
Spring, 2008
- Chris Langmead
- Department of Computer Science
- Carnegie Mellon University
- James Faeder
- Department of Computational Biology
- University of Pittsburgh School of Medicine
2General Info
- Course Numbers
- CMU 15-872(A)
- CMU 02-730
- Pitt CMPBIO 2045 (Arts & Sciences)
- Pitt MSCBIO 2045 (School of Medicine)
- Location: Newell-Simon Hall (NSH) 3002 - OK?
- Time: Tu, Th 1:30-2:50 PM
- Instructors
- Chris Langmead (cjl_at_cs.cmu.edu)
- Jim Faeder (faeder_at_pitt.edu)
- Office Hours: By appointment (please email)
- Course Wiki: http://bionetgen.org/index.php/Formal_Methods_in_Systems_Biology
(email Jim for account)
3Course Format: An Informal Course about Formal
Methods
- Introductory lectures (two weeks)
- Students will read and present research papers
- Sign up for open dates on the wiki (25 -
projects)
- Students will design and complete a course
project on a subject of special interest
- Grading is based on completion of work
- Flexibility depending on course enrollment
- Journal club
- Focused project
- Review article
4Encouragement
- Opportunity to learn about new areas and methods
that will be of direct interest in your research.
- (True for the instructors as well)
- We will operate as a multi-disciplinary team
- Computer Scientists, Physicists, Chemists,
Engineers, Mathematicians, ..., Biologists
- Good communication essential
5Products of the Course
- Comprehensive bibliography in wiki format
- Research projects leading to publishable results
in the field
- Review article (?)
- Improved organization and presentation skills
- Participation on a multi-disciplinary team
6Introductions
- Your name
- Your university, department, research area(s) and
research advisor
- Your educational background
- Computer Science, Math, Physics, etc.
- Goals in taking the course
7Outline of Today's Lecture
- Definition of terms
- Goals
- Examples of Successful Abstractions
- Flux Balance Analysis
- Mass Action Kinetics
- Brief survey of topics
8Importance of Symbols
- Invention of a symbol for zero and of the decimal
system for writing numbers is among the greatest
human inventions.
- 3 known independent inventions
- In each case, development took centuries
- Major impact on trade, culture, and philosophy.
- Celebration of the zero dot in Sanskrit poetry
- "The dot on her forehead / Increases her beauty
tenfold, / Just as a zero dot (sunya-bindu) /
Increases a number tenfold." - Biharilal
9Key Definitions - Formal Methods
- In computer science and software engineering,
formal methods are mathematically-based
techniques for the specification, development and
verification of software and hardware systems.
- The use of formal methods for software and
hardware design is motivated by the expectation
that, as in other engineering disciplines,
performing appropriate mathematical analyses can
contribute to the reliability and robustness of a
design.
- However, the high cost of using formal methods
means that they are usually only used in the
development of high-integrity systems, where
safety or security is important.
- WIKIPEDIA
10Expanded View of Formal Methods
- Formal abstractions that may be used to model
a system of interest
- In addition to systems that can be formally
analyzed, we will consider representations that
can only be fully explored by simulations.
11Key Definitions - Systems Biology
- Systems biology is a relatively new biological
study field that focuses on the systematic study
of complex interactions in biological systems,
thus using a new perspective (integration instead
of reduction) to study them.
- Particularly from 2000 onwards, the term is used
widely in the biosciences, and in a variety of
contexts.
- Because the scientific method has been used
primarily toward reductionism, one of the goals
of systems biology is to discover new emergent
properties that may arise from the systemic view
used by this discipline in order to understand
better the entirety of processes that happen in a
biological system.
- WIKIPEDIA
12Origin of Systems Biology
- Completion of genome projects is a major
inspiration
- Provided parts list for the cell
- Next obvious step is to ask how parts work
together to carry out function
13Vision for Role of Computer Science in Systems
Biology
- Computer science could provide the
abstractions needed for consolidating knowledge
of biomolecular systems
- ...the abstractions, tools and methods used to
specify and study computer systems should
illuminate our accumulated knowledge about
biomolecular systems.
Regev and Shapiro, Cells as Computation, Nature
(2002).
14Abstract Representations in Biology
- DNA sequence represented by strings with 4 letter
alphabet (ATGC)
- Protein sequence and structure
- Strings with 20 letter alphabet
- Set of 3D atomic coordinates (PDB file)
The KaiC hexamer, a Circadian clock protein. From
pdb.org.
15(Some) Desirable Properties of an Abstract
Representation
- Relevant / accurate
- Computable
- Understandable
- Extensible
- Scalable
- Modular
- Hierarchical
First four properties from Regev and Shapiro, Cells as
Computation, Nature (2002).
16An Irony
- CS community aims to provide powerful abstract
representations to improve understanding of
systems.
- Manner of reporting results - technical reports
in conference proceedings - presents major
barrier to wider adoption by science and
engineering communities.
- There is a need for better communication among
disciplines!
17Sometimes formalism creates a barrier
19Example: Red blood cell model
20Agenda
- We are looking for useful abstractions that can
improve our understanding of how biological
systems behave
21Goals
- Language(s) for constructing whole-cell models
(comprehensive, system-wide)
- Formal analysis (reasoning) of such models
- Simulation of models on distributed systems
- Combination of analysis and simulation to predict
behavior of models
- genotype -> phenotype
22Challenges
- Accuracy
- Missing interactions
- Computability
- Requirement to perform simulations for many
properties of interest
- Poor scaling of simulations
- Understanding
- Problem of network visualization
- Extensibility
- Missing biophysics
- Scalability
- Need to compute behavior on multiple scales, e.g.
tissue -> cell -> cytoplasm -> nucleus
23Mathematical vs. Computational Models
Consider an elementary chemical reaction
  r1: A + B -> C
  module
    A : [0..N] init N;
    [r1] (A > 0) -> k*A*B : (A' = A - 1);
  endmodule
How important is this distinction?
Fisher & Henzinger, Nat. Biotechnol. (2007).
24Tension between Accuracy and Computability
- Application of formal methods requires that
elements of representation be relatively simple.
- For example, a representation that includes all
analytical functions in mathematics might not be
useful - impossible to make predictions.
- In general, increasing the complexity of the
representation limits ability for analysis.
- Representations are sometimes chosen for
amenability to analysis rather than realism -
e.g. Boolean networks.
- Computational (executable) models tend to make
restrictions explicit.
25Some successful abstractions in systems biology
- Flux Balance Analysis
- Genome-wide models of metabolism
- Mass Action Kinetics
- Cell-cycle model
- Growth factor signaling model
26Network Reconstruction (2D Annotation)
B. O. Palsson, Nature Biotechnology 22, 1218 -
1219 (2004)
27Network Reconstruction (cont.)
- Wiring diagram for the components in a cell
- Elements are
- Molecular Components (Species)
- Interactions (Reactions)
- Additional detail can be added.
- Genome-wide reconstructions for metabolism are
available for many model organisms (including
Homo sapiens!)
- All such interactions are ultimately represented
by a genome-scale stoichiometric matrix: a
two-dimensional genome annotation.
B. O. Palsson, Nature Biotechnology 22, 1218 -
1219 (2004)
28Overview of Flux Balance Analysis
- Genome-wide reconstruction of metabolic network
- Assume steady state
- Assume optimal growth (biomass production)
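The three ingredients above can be sketched on a toy problem: a stoichiometric matrix S, the steady-state constraint S v = 0, and a growth objective to maximize. Everything in the example is an assumption made for illustration: the two-metabolite network, the flux bounds, and the names `S`, `BOUNDS`, `OBJECTIVE`, and `toy_fba`. An exhaustive search over integer fluxes stands in for the linear-programming solver that a real flux balance analysis would use.

```python
from itertools import product

# Hypothetical toy network (not from the lecture): metabolites A and B.
# Columns of S are reactions: uptake (-> A), r1 (A -> 2 B), r2 (A -> B),
# biomass (B ->). Rows are metabolites.
S = [
    [1, -1, -1, 0],   # A
    [0, 2, 1, -1],    # B
]
BOUNDS = [(0, 10), (0, 3), (0, 10), (0, 20)]  # assumed flux bounds per reaction
OBJECTIVE = [0, 0, 0, 1]                      # maximize the biomass flux


def is_steady_state(S, v):
    """True when S . v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(sum(s * f for s, f in zip(row, v)) == 0 for row in S)


def toy_fba(S, bounds, objective):
    """Exhaustive integer search standing in for a linear-programming solver."""
    best_v, best_value = None, None
    for v in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if not is_steady_state(S, v):
            continue  # violates the steady-state assumption
        value = sum(c * f for c, f in zip(objective, v))
        if best_value is None or value > best_value:
            best_v, best_value = v, value
    return best_v, best_value


best_flux, best_growth = toy_fba(S, BOUNDS, OBJECTIVE)
print("flux distribution:", best_flux, "biomass flux:", best_growth)
```

The search only works because the toy network is tiny. For a genome-wide reconstruction the same constraints and objective would be passed to a linear-programming solver.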
29Genome-Wide Reconstruction of Haemophilus
influenzae
Edwards, J. S. et al. J. Biol. Chem. 274,
17410-17416 (1999)
30Single and double deletion in the central
metabolic pathways of H. influenzae
Edwards, J. S. et al. J. Biol. Chem. 274,
17410-17416 (1999)
31What Accounts for Success?
- Knowledge Base
- Metabolic chemistry known from >50 years of
biochemistry and genome sequence
- Simple Abstraction
- Biochemistry reduced to list of reaction
stoichiometries
- Powerful Computation Method
- Highly optimized solvers for Linear Programming
problem
- Extensibility
- Non-optimal growth in mutants
- Constraints arising from molecular crowding
32Cellular Signal Transduction
[Figure labels: signaling complex, plasma membrane, adaptor, SH3 domain]
33Mass Action Kinetics
Differential Equations
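As a minimal sketch of how mass action kinetics turns a reaction into differential equations, assume the single reaction A + B -> C with rate constant k. The rate is then k[A][B], so d[A]/dt = d[B]/dt = -k[A][B] and d[C]/dt = +k[A][B]. The initial concentrations, the value of k, the step size, and the function names are illustrative assumptions. A hand-written fixed-step Runge-Kutta integrator is used only to keep the example free of dependencies.

```python
def mass_action_rhs(state, k):
    """Right-hand side of the ODEs for A + B -> C with rate k*[A]*[B]."""
    a, b, c = state
    rate = k * a * b
    return (-rate, -rate, rate)


def rk4_step(f, state, dt, k):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(state, k)
    k2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)), k)
    k3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)), k)
    k4 = f(tuple(s + dt * d for s, d in zip(state, k3)), k)
    return tuple(
        s + dt / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
        for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def simulate(state, k, dt, n_steps):
    """Integrate from the initial state and return every intermediate state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(mass_action_rhs, state, dt, k)
        trajectory.append(state)
    return trajectory


# Assumed illustrative values: [A]0 = 1.0, [B]0 = 0.5, [C]0 = 0.0, k = 2.0
trajectory = simulate((1.0, 0.5, 0.0), k=2.0, dt=0.01, n_steps=500)
a_end, b_end, c_end = trajectory[-1]
print(f"[A]={a_end:.4f} [B]={b_end:.4f} [C]={c_end:.4f}")
```

The sums [A] + [C] and [B] + [C] stay constant along the trajectory, which gives a quick sanity check on any mass action model of this reaction.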
34Reaction Network Model of Signaling
Kholodenko et al., J. Biol. Chem. 274, 30169
(1999)
35Comparing Model and Experiment
Experimental Data
Simulation Results
36Benefits of Mass Action Kinetic Modeling
- Large knowledge base of signaling biochemistry
- Models dynamical behavior
- Computational Methods Well Established
- ODE solvers for continuous systems
- Nonlinear Dynamics Theory
- Extensibility
- Stochastic Simulation Algorithm for discrete
systems
- Spatially-resolved models can be built on same
mass action equations
37Limitations of Mass Action Kinetic Modeling
- Rapidly expanding knowledge base
- Many components and interactions unknown
- Lack of precision
- ad hoc assumptions to limit combinatorial
explosion (next lecture)
- Large sets of nonlinear ODEs are difficult to
simulate or analyze
- No comprehensive models yet
38Map of Signaling Initiated by a Single Family of
Receptors
Oda and Kitano (2006) Mol. Syst. Biol.
39Map of Signaling Initiated by a Single Family of
Receptors
Analysis is limited to simple graph theoretic
measures and qualitative discussions of
architecture.
Oda and Kitano (2006) Mol. Syst. Biol.
40(Partial) List of Topics
- Boolean Networks
- Petri Nets
- Statecharts
- Process Algebras
- Agent-Based Modeling
- Hybrid Systems
- Model Checking
- Simulation Algorithms
41Brief Overview of Two Useful Abstractions
- Boolean Networks
- Petri Nets
- Statecharts
- Process Algebras
- Agent-Based Modeling
- Hybrid Systems
- Model Checking
- Simulation Algorithms
42Boolean Networks
BN model of cell cycle in budding yeast
G1
Li, F., et al. PNAS 101, 4781-4786 (2004).
43Boolean Networks
BN model of cell cycle in budding yeast
G1
Update
Li, F., et al. PNAS 101, 4781-4786 (2004).
44Boolean Networks
BN model of cell cycle in budding yeast
G1
Update
Blue arrows form stable basin of attraction
Li, F., et al. PNAS 101, 4781-4786 (2004).
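The ideas of a synchronous update and a basin of attraction can be sketched on a small example. The three-node wiring below is hypothetical and is not the budding yeast network. The threshold rule is also an assumption of the sketch: a node switches on when its summed input is positive, switches off when it is negative, and keeps its value when the input is zero. The code enumerates all 2^3 states, follows each one to its attractor, and counts how many states drain into each attractor.

```python
from itertools import product

# Hypothetical wiring: WEIGHTS[i][j] is the effect of node j on node i
# (+1 activation, -1 inhibition, 0 no edge).
WEIGHTS = [
    [0, 0, -1],   # node 0 is inhibited by node 2
    [1, 0, 0],    # node 1 is activated by node 0
    [0, 1, 0],    # node 2 is activated by node 1
]


def update(state, weights):
    """Synchronous threshold update of every node."""
    new_state = []
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, state))
        if total > 0:
            new_state.append(1)
        elif total < 0:
            new_state.append(0)
        else:
            new_state.append(state[i])  # no net input: keep current value
    return tuple(new_state)


def find_attractor(state, weights):
    """Iterate until a state repeats; return the cycle as a sorted tuple."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = update(state, weights)
    cycle = seen[seen.index(state):]
    return tuple(sorted(cycle))


def basin_sizes(weights):
    """Map each attractor to the number of initial states that reach it."""
    basins = {}
    for state in product((0, 1), repeat=len(weights)):
        attractor = find_attractor(state, weights)
        basins[attractor] = basins.get(attractor, 0) + 1
    return basins


basins = basin_sizes(WEIGHTS)
for attractor, size in sorted(basins.items(), key=lambda item: -item[1]):
    print("attractor", attractor, "basin size", size)
```

Exhaustive enumeration is feasible only for small networks, since the state space doubles with every added node.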
45Balance Sheet for BNs
- Pro
- Models may be constructed on basis of scant data
- Fast computation
- Strong analysis tools (?)
- Good for reasoning about stability and robustness
- Con
- Two levels may not be enough
- Lack of compositionality
- Not hierarchical, but may be embedded in more
complex models.
Li S, Assmann SM, Albert R (2006) Predicting
Essential Components of Signal Transduction
Networks: A Dynamic Model of Guard Cell Abscisic
Acid Signaling. PLoS Biol 4(10): e312
46Petri Nets
Chaouiya, C. Petri net modelling of biological
networks. Brief. Bioinform. 8, 210-219 (2007).
47Petri Nets
Time Evolution
Chaouiya, C. Petri net modelling of biological
networks. Brief. Bioinform. 8, 210-219 (2007).
48Petri Nets Generalize Network Reconstruction
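[Figure: Petri net fragment with places p3, p4 and transition t2]
The incidence matrix C of the Petri net corresponds to the
stoichiometric matrix S of the reaction network
Chaouiya, C. Brief. Bioinform. 8, 210-219 (2007).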
49Some useful formal properties of PNs
- P-invariants ($x^{T} C = 0$): mass conservation
- T-invariants ($C y = 0$): loops / elementary modes
- Reachability - whether a state can be reached
- Liveness - whether a transition can be fired
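These properties can be checked mechanically on a small net. The sketch below assumes a hypothetical two-transition net (t1: A + B -> C, t2: C -> A + B) and uses C for the incidence matrix, so a P-invariant is a place weighting x with x^T C = 0 and a T-invariant is a firing-count vector y with C y = 0. All names (`PRE`, `POST`, `fire`, and so on) are illustrative and do not come from any Petri net library.

```python
# Hypothetical net: t1 is A + B -> C, t2 is C -> A + B.
PLACES = ["A", "B", "C"]
TRANSITIONS = ["t1", "t2"]
# PRE[p][t]: tokens transition t consumes from place p.
# POST[p][t]: tokens transition t produces in place p.
PRE = [[1, 0], [1, 0], [0, 1]]
POST = [[0, 1], [0, 1], [1, 0]]


def incidence(pre, post):
    """Incidence matrix C = POST - PRE (rows: places, columns: transitions)."""
    return [[post[p][t] - pre[p][t] for t in range(len(pre[0]))]
            for p in range(len(pre))]


def enabled(marking, pre, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= pre[p][t] for p in range(len(marking)))


def fire(marking, pre, post, t):
    """Return the marking reached by firing transition t once."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t} is not enabled")
    return [marking[p] - pre[p][t] + post[p][t] for p in range(len(marking))]


def is_p_invariant(x, c):
    """Check x^T C = 0: the weighted token count never changes."""
    return all(sum(x[p] * c[p][t] for p in range(len(c))) == 0
               for t in range(len(c[0])))


def is_t_invariant(y, c):
    """Check C y = 0: firing each transition y[t] times restores the marking."""
    return all(sum(c[p][t] * y[t] for t in range(len(y))) == 0
               for p in range(len(c)))


C = incidence(PRE, POST)
marking = [2, 1, 0]                       # two tokens in A, one in B
after_t1 = fire(marking, PRE, POST, 0)    # fire t1
print("C =", C)
print("marking after t1:", after_t1)
print("A + C conserved:", is_p_invariant([1, 0, 1], C))
print("t1 then t2 is a loop:", is_t_invariant([1, 1], C))
```

In this net the P-invariant [1, 0, 1] says that tokens in A plus tokens in C are conserved, and the T-invariant [1, 1] says that firing t1 and t2 once each returns the net to its starting marking.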
50Overview of PNs
- PNs are graphs, and provide tight connection
between visualization and modeling
- PN formalism is isomorphic to network
reconstruction formalism (reaction networks)
- Many extensions are possible to overcome
limitations
- Colored Petri Nets, Hierarchical CPNs,
Multi-level PN, Stochastic PNs, etc.
- Extensions provide further modeling capabilities
at the expense of analysis.
51Concluding Remarks
- Goal of course is to explore various
representations from CS literature that can be
used to model biomolecular systems.
- What opportunities do these representations offer
in terms of analysis, simulation, understanding,
and scalability?
52Simulation Algorithms
- Requirements
- Asynchronous
- Stochastic
- (Modular)
- (Hierarchical)
- Methods
- ODEs
- Model reduction
- Kinetic Monte Carlo (aka Gillespie's method)
For distributed computation
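As a minimal sketch of kinetic Monte Carlo in the form of Gillespie's direct method, assume a reversible reaction A + B <-> C with molecule counts instead of concentrations. The rate constants, initial counts, seed, and function name are illustrative assumptions. Each iteration draws an exponentially distributed waiting time from the total propensity, then picks which reaction fires in proportion to its propensity.

```python
import math
import random


def gillespie(a, b, c, k_forward, k_reverse, t_end, rng):
    """Direct-method stochastic simulation of A + B <-> C up to time t_end."""
    t = 0.0
    trajectory = [(t, a, b, c)]
    while True:
        # Propensities: forward A + B -> C and reverse C -> A + B.
        propensities = (k_forward * a * b, k_reverse * c)
        total = sum(propensities)
        if total == 0.0:
            break  # nothing can fire any more
        # Exponential waiting time until the next reaction event.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * total < propensities[0]:
            a, b, c = a - 1, b - 1, c + 1
        else:
            a, b, c = a + 1, b + 1, c - 1
        trajectory.append((t, a, b, c))
    return trajectory


# Assumed illustrative values; the seed only makes the run repeatable.
rng = random.Random(1)
trajectory = gillespie(a=50, b=30, c=0, k_forward=0.01, k_reverse=0.1,
                       t_end=10.0, rng=rng)
t_last, a_last, b_last, c_last = trajectory[-1]
print(f"{len(trajectory) - 1} events, final counts "
      f"A={a_last} B={b_last} C={c_last} at t={t_last:.3f}")
```

Events occur at irregular times, one reaction at a time, which matches the asynchronous and stochastic requirements listed on this slide. Each run with a different seed gives a different trajectory.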
53Comparison of two models of vulval development in
C. elegans
Giurumescu, et al. PNAS 103, 1331-1336 (2006).
Fisher, et al. PNAS 102, 1951-1956 (2005).