Title: Boolean and Probabilistic Boolean Networks as Models of Genomic Regulation
1Boolean and Probabilistic Boolean Networks as
Models of Genomic Regulation
- Ivan Ivanov
- Department of Veterinary Physiology and Pharmacology
- Genomic Signal Processing Lab
- Texas A&M University
- gsp.tamu.edu/people/ivan.html
2The Central dogma in cell biology
3Model based scientific approach
- Mathematical models allow for a formal and
unified description of physical phenomena
(Diagram: a cycle linking Biology, Experiment design, Experiment, Data, Inference, Mathematical Model and Prediction)
- Any model that allows prediction could be considered a mathematical model
4Goals
- Must incorporate rule-based dependencies between genes
  - Rule-based dependencies may constitute important biological information
- Must allow systematic study of global network dynamics
  - In particular, individual gene effects on long-run network behavior
- Must be able to cope with uncertainty
  - Small sample size, noisy measurements, robustness
- Must permit quantification of the relative influence and sensitivity of genes in their interactions with other genes
  - This allows us to focus on individual genes or groups of genes
5Regulatory diagram for the activation of the
tumor-suppressor protein p53
Vogelstein, B., Lane, D. & Levine, A. Surfing the p53 network. Nature 408, 307-310 (2000)
6Challenges
- Biological systems function in an exceedingly parallel, nonlinear and extraordinarily integrated fashion
  - Presence of protein-DNA feedback loops (negative or positive)
- Availability and quality of data
- Model selection
  - Fine/Continuous or Coarse/Discrete
7cDNA microarray
8- Given
  - Genes communicate/interact via the proteins they encode
  - Protein production (transcription and translation) is controlled by a multitude of biochemical reactions, which are in turn influenced by many factors internal or external to the cell
- Assumption
  - The expression Xj of a particular gene j is a random function xj(t, ω) of the cell's internal and external environment
- Goal
  - A good mathematical model for the dynamical behavior of the genes
9Biochemical interactions network
(Figure: biochemical model spanning the gene space (Genes 1-4), the protein space (Proteins 1-4, Complex 3-4) and the metabolic space (Metabolites 1-2); other labels: Microarrays, Biological phenomena, Relationship, Variable)
From Brazhnik et al., Gene networks: how to put the function in genomics, Trends in Biotechnology 20(11), 2002
10Discrete Models
- Faithful representation of upregulated/expressed and downregulated/repressed gene activity
- Filter the noise in data
- Dynamical behavior can be clearly related to some underlying biological phenomena
- Fine details like protein concentrations or kinetics of reactions cannot be captured
11Gene Networks Inference
(Figure: the biochemical interaction network is projected onto the gene space (Genes 1-4), giving the Gene Regulatory Network Model)
From Brazhnik et al., Gene networks: how to put the function in genomics, Trends in Biotechnology 20(11), 2002
12Considerations
- Can it explain all the biological processes?
  - NO
  - (Definition of context: adding back the other layers)
- Can we better understand the physical phenomena?
  - YES
  - (Kauffman, attractors as phenotypes, logical rules, etc.)
- Is it a useful model?
  - YES
  - (we can make some good predictions)
13Example of Cell Cycle Regulation
Logic diagram: an AND gate outputs cdk2; p21/WAF1 is the input for a NOT gate; a NAND gate outputs Rb
14Discussion
- How can a discrete representation of the biological process be derived in a consistent way?
- How should the quality of a mathematical model as a description of the biological model be defined?
- Note: the word "data" is deliberately not used here. What is important is the model; the data are a way to estimate its parameters.
(Diagram labels: Biology, Experiments, Data, Parameter estimation, Model)
15Boolean Network (BN)
16Model Boolean functions
- Activity of gene 1 (promoter) promotes the
activation of gene 3, unless gene 2 is active
(repressor).
(Diagram: Gene 1 and Gene 2 are the inputs of a Boolean function whose output is Gene 3)
G1 G2 Y(G1,G2)
0 0 0
0 1 0
1 0 1
1 1 0
A possible Boolean function to represent this
biological relationship
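As a minimal sketch, the truth table above coincides with the logical rule "G1 AND NOT G2". The function name `y` below is only an illustrative choice that mirrors the column header Y(G1,G2); it is not taken from the slides.

```python
def y(g1: int, g2: int) -> int:
    """Gene 3 is active when the promoter G1 is active and the repressor G2 is not."""
    return int(bool(g1) and not bool(g2))


# Enumerate all input configurations to rebuild the truth table.
truth_table = {(g1, g2): y(g1, g2) for g1 in (0, 1) for g2 in (0, 1)}

for (g1, g2), out in truth_table.items():
    print(g1, g2, out)
```

Other Boolean functions could represent the same verbal description differently; this is simply the one that reproduces the table row by row.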
17Note
- The Boolean function model is for the biological model, NOT for the observed data!
- Each binary function mimics the biological behavior with some degree of fitness.
- The quality of this fitness can be measured via an error measure
- There is always an optimal binary function that best fits the biological model.
18Inference of Boolean Functions
- Boolean relationship between genes can be
estimated from microarray data.
(Figure: six microarray experiments, each measuring genes A, B, C and D)
Binarized expression values
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
A 0 0 1 1 1 0
B 0 1 1 1 1 0
C 1 0 0 1 1 1
D 1 0 0 0 1 1
Boolean function fC for C, with A and B as inputs (X denotes a non-observed configuration)
A B fC(A,B)
0 0 1
0 1 0
1 0 X
1 1 1
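The inference step above can be sketched as a majority vote over the observed experiments. This is an illustrative assumption about how the table for fC is obtained; the helper name `infer_function` and the tie-breaking choice are hypothetical and not stated on the slide.

```python
from collections import Counter, defaultdict

# Binarized observations (A, B, C) for Experiments 1-6, as listed on this slide.
experiments = [(0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1)]


def infer_function(samples):
    """For each (A, B) configuration, keep the most frequent observed value of C."""
    votes = defaultdict(Counter)
    for a, b, c in samples:
        votes[(a, b)][c] += 1
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            counts = votes.get((a, b))
            if not counts:
                table[(a, b)] = "X"  # configuration never observed
            else:
                # Assumption: a tie is resolved in favour of 1.
                table[(a, b)] = 1 if counts[1] >= counts[0] else 0
    return table


f_c = infer_function(experiments)
for (a, b), c in f_c.items():
    print(a, b, c)
```

With the six experiments shown, configuration (1, 1) is observed three times with C equal to 0, 1 and 1, so the vote gives 1, and (1, 0) is never observed.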
19Error measure for binary functions
- How good is the function ψ as a model of the relationship between G1, G2 and G3?
- The quality of the function ψ is measured by its error ε[ψ], which depends on the joint distribution of G1, G2 and G3
- In the same way, an error ε[ψ0] is defined for the constant function ψ0 = c
20Optimal Function
- Among all possible Boolean functions ψ, one has the minimal error as a predictor of G3 from G1 and G2. This function is called ψopt.
  - ε[ψopt] ≤ ε[ψ] for any other Boolean function ψ
- If G1 and G2 are good predictors of G3, then the relationship between them will be captured by ψopt and ε[ψopt] will be small.
- The optimal constant predictor is called ψ0-opt (there are only 2 possible constant predictors, 0 and 1).
- If G3 is almost constant, then ε[ψ0-opt] will be small.
21Coefficient of determination
- The Coefficient of Determination (CoD) of the pair of genes G1 and G2 as predictors of the gene G3 is given by the relative improvement in the prediction when using the optimal predictor ψopt over the optimal constant predictor ψ0-opt.
- The CoD depends ONLY on the joint distribution of G1, G2 and G3.
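A small sketch of this definition, under two stated assumptions: the error of a predictor is taken to be its probability of misprediction (the slides do not preserve the exact error formula), and the "relative improvement" is read as (ε0 - ε) / ε0, where ε0 is the error of the best constant predictor and ε the error of the best predictor using G1 and G2. The joint distribution in the example is hypothetical.

```python
from itertools import product


def best_error(joint, use_predictors=True):
    """Minimal misprediction probability for G3, given joint[(g1, g2, g3)] = probability."""
    if not use_predictors:
        # Best constant predictor: guess the more probable value of G3.
        p_one = sum(p for (_, _, g3), p in joint.items() if g3 == 1)
        return min(p_one, 1.0 - p_one)
    # Best predictor from (G1, G2): guess the more probable G3 for each configuration.
    error = 0.0
    for g1, g2 in product((0, 1), repeat=2):
        p0 = joint.get((g1, g2, 0), 0.0)
        p1 = joint.get((g1, g2, 1), 0.0)
        error += min(p0, p1)
    return error


def cod(joint):
    """Relative error reduction; defined as 0 here when the constant predictor is already perfect."""
    e0 = best_error(joint, use_predictors=False)
    e = best_error(joint)
    return 0.0 if e0 == 0 else (e0 - e) / e0


# Hypothetical joint distribution: uniform inputs, G3 = G1 AND NOT G2 with 10% noise.
joint = {}
for g1, g2 in product((0, 1), repeat=2):
    target = int(g1 == 1 and g2 == 0)
    joint[(g1, g2, target)] = 0.25 * 0.9
    joint[(g1, g2, 1 - target)] = 0.25 * 0.1

print(cod(joint))
```

Because both errors are computed from the same joint distribution, the result depends only on that distribution, which is the point made in the second bullet.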
22Probabilistic Boolean Network (PBN)
PBN = (BN1, ..., BNk, p1, ..., pk, p, q)
- 0 < p < 1: probability of switching context
- 0 < pi < 1: probability of BNi being used
- 0 < q < 1: probability of gene flipping
- Context: which BN is used for the next transition; the regime in which the cell operates/functions
- Gene flipping: mutation rate
23Basic Building Block of a PBN
24Truth tables of two constituent Boolean networks (diagram labels: p1, q, p2)
x1 x2 x3 f1 f2 f3
0 0 0 1 0 0
0 0 1 0 0 0
0 1 0 1 1 1
0 1 1 0 1 1
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 1 0 0
1 1 1 0 0 0
x1 x2 x3 f1 f2 f3
0 0 0 0 0 1
0 0 1 0 0 1
0 1 0 1 0 0
0 1 1 0 1 0
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 0 1 0
1 1 1 0 0 1
p
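One transition of such a PBN can be sketched as follows, using the two truth tables above. Several details are assumptions and not taken from the slides: the labels `BN_A` and `BN_B`, the helper names, and the update order (a context switch is attempted first; if any gene is flipped by a perturbation, the flipped state replaces the regular Boolean update).

```python
import random


def table(rows):
    """Map states (x1, x2, x3) in the order 000, 001, ..., 111 to their (f1, f2, f3) rows."""
    states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    return dict(zip(states, rows))


# Rows copied from the two truth tables above.
BN_A = table([(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1),
              (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)])
BN_B = table([(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0),
              (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)])


def pbn_step(state, context, networks, probs, p, q, rng):
    """Return the next (state, context) of the PBN."""
    # With probability p, draw a new context according to the selection probabilities.
    if rng.random() < p:
        context = rng.choices(range(len(networks)), weights=probs)[0]
    # Each gene flips independently with probability q.
    flips = [rng.random() < q for _ in state]
    if any(flips):
        state = tuple(x ^ int(f) for x, f in zip(state, flips))
    else:
        state = networks[context][state]
    return state, context


rng = random.Random(0)
state, context = (0, 1, 0), 0
for _ in range(20):
    state, context = pbn_step(state, context, [BN_A, BN_B], [0.5, 0.5], p=0.1, q=0.01, rng=rng)
print(state, context)
```

Setting p = 0 and q = 0 reduces the sketch to a single deterministic Boolean network, which is convenient for checking the tables.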
25Context Switching
(Figure: context-switching diagram over genes X1, X2 and X3 in two contexts; labels: p, p1, p2, q)
26Attractors in PBNs
- Attractors in the Boolean Networks should
correspond to cellular types (Kauffman)
- PBNs are formed by a family of Boolean Networks
- Steady-state analysis of the PBN may be meaningful for classification based on gene-expression data
- Relationships between the steady-state distribution and the attractors of the Boolean Networks allow structural analysis of the network
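To make the notion of an attractor concrete, the sketch below finds the attractors of one deterministic Boolean network by following every state until it revisits a state. The transition table is the first truth table from the "Basic Building Block of a PBN" slides; the function name `attractors` and the cycle-canonicalization step are illustrative choices.

```python
# State (x1, x2, x3) -> next state (f1, f2, f3), from the first truth table of the PBN example.
bn = {
    (0, 0, 0): (1, 0, 0), (0, 0, 1): (0, 0, 0),
    (0, 1, 0): (1, 1, 1), (0, 1, 1): (0, 1, 1),
    (1, 0, 0): (1, 0, 0), (1, 0, 1): (0, 0, 0),
    (1, 1, 0): (1, 0, 0), (1, 1, 1): (0, 0, 0),
}


def attractors(next_state):
    """Return the set of attractor cycles, each as a tuple of states."""
    found = set()
    for start in next_state:
        path = []
        s = start
        while s not in path:
            path.append(s)
            s = next_state[s]
        cycle = path[path.index(s):]  # the part of the path that repeats
        i = cycle.index(min(cycle))   # rotate so equal cycles compare equal
        found.add(tuple(cycle[i:] + cycle[:i]))
    return found


for cycle in sorted(attractors(bn)):
    print(cycle)
```

For this table the search yields two single-state attractors, (0, 1, 1) and (1, 0, 0); every other state lies in the basin of (1, 0, 0).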
27Dynamics of PBNs with perturbations
- Perturbations are added to the model to ensure the existence of a steady-state distribution
- Perturbations move the system from the current state to a nearby state
- The system behaves like a deterministic Boolean Network until a perturbation or change of function occurs
28Dynamics of PBNs with perturbations
(Timeline figure: between one change of function or perturbation and the next, the same Boolean Network is used; the system first moves through a basin, then reaches the attractor and stays in it)
29Steady-state analysis
- In the long run, the system is expected to stay
in the attractors of the Boolean Networks
From the same initial point the system can
transition to two different regions (attractors)
depending on the Boolean Function being used
30Modeling of real genetic regulatory systems using
PBNs
- PBNs with p = 1 and q = 0 are equivalent to Dynamic Bayesian Networks (DBNs)
  - Lähdesmäki, H., Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks, in Workshop on Discrete Models for Genetic Regulatory Networks, Texas A&M University, College Station, TX, November 5-6, 2003.
- Bayesian optimization of connectivity
  - X. Zhou, X. Wang, R. Pal, I. Ivanov, Michael Bittner and E. Dougherty, A Bayesian Connectivity-based Approach to Constructing PGRs, Bioinformatics 20(17), pp. 2918-2927, 2004.
31Applications of PBNs
- Problem (A): Study the long-run characteristics of a given dynamical system
- Inverse Problem (B): Generate a BN/PBN with a prescribed dynamical behavior
- Control policies for reaching a desirable steady-state distribution
32Melanoma Application
- Microarray data
  - 31 malignant melanoma samples
  - 6971 unique genes on the array
  - 7 genes of interest: WNT5A, pirin, S100P, RET1, MART1, HADHB, STC2
- Binarization of the gene expression profiles
  - 18 distinct data states
- Suboptimal PGRN generation (using MSE distance)
  - 10 attractor sets selected according to the data frequency
  - 2 < size of each attractor set < 5
  - 100 BNs generated for each attractor set
  - 10 BNs selected to form the PGRN, p = q = 0.001
34"Mathematics is biology's next microscope, only better; biology is mathematics' next physics, only better."
- J.E. Cohen, Mathematics is biology's next microscope, only better; biology is mathematics' next physics, only better, PLoS Biology 2 (2004), No. 12.
CAN BIOLOGY LEAD TO NEW THEOREMS? B. Sturmfels, Department of Mathematics, Univ. of California, Berkeley, CA 94720, USA
36Estimation of the CoD for G1,G2 and G3.
Microarray
Example of Ternary Expression Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1 1 1 0 1 1 1 1 1
G2 0 1 1 1 0 1 0 1
G3 0 1 0 1 0 1 0 0
Estimation of the optimal functions ψopt and ψ0-opt for G1, G2 as predictors of G3
Estimated CoD for G1,G2 as predictors of G3
37Estimation of ε[ψopt] for G1, G2 and G3 from the data
Ternary Expression Matrix for G1,G2 and G3
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1 1 1 0 1 1 1 1 1
G2 0 1 1 1 0 1 0 1
G3 0 1 0 1 0 1 0 0
Splitting of the matrix in Training and Test sets
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
38Estimation of ε[ψopt] for G1, G2 and G3 from the data
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
Statistical inference of the optimal function ψopt. The first ψopt column gives the most frequent value computed from the data (X denotes a non-observed configuration); the second gives the generalization that fills the non-observed configurations.
G1 G2 ψopt(G1,G2) ψopt(G1,G2)
0 0 X 0
0 1 0 0
1 0 0 0
1 1 1 1
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
Estimation of the error ε[ψopt] from the test set
1 mistake out of 4 ⇒ ε[ψopt] = 0.25
39Estimation of ε[ψ0-opt] for G1, G2 and G3 from the data
Frequencies of the possible values of G3 on the training data
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
Statistical inference of the optimal constant function ψ0-opt.
G3 Frequency
0 2
1 2
ψ0-opt = 1 (normally the most frequently observed value of G3; here the two frequencies tie, so a heuristic picks 1)
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
Estimation of the error ε[ψ0-opt] from the test set
3 mistakes out of 4 ⇒ ε[ψ0-opt] = 0.75
40Estimation of the CoD for G1, G2 and G3 from the data
ε[ψ0-opt] = 0.75
ε[ψopt] = 0.25
θ = (0.75 - 0.25) / 0.75 = 2/3: the error is reduced by about 66.7%
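The worked example of the last four slides can be reproduced with a short sketch. Two choices are assumptions made here to match the slide tables: a non-observed configuration is filled with 0, and ties are resolved in favour of 1. All function and variable names are illustrative.

```python
from collections import Counter, defaultdict

# Expression values for Exp1-Exp8 as used on the train/test slides.
G1 = [1, 1, 0, 1, 1, 1, 1, 1]
G2 = [0, 1, 1, 1, 0, 1, 0, 1]
G3 = [0, 1, 0, 1, 0, 1, 0, 0]
samples = list(zip(G1, G2, G3))
train, test = samples[:4], samples[4:]


def fit_predictor(data, default=0):
    """Most frequent G3 per (G1, G2); non-observed configurations get `default`."""
    votes = defaultdict(Counter)
    for g1, g2, g3 in data:
        votes[(g1, g2)][g3] += 1
    table = {}
    for key in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        counts = votes.get(key)
        table[key] = default if not counts else int(counts[1] >= counts[0])
    return table


def fit_constant(data):
    """Most frequent G3 value; a tie gives 1, as on the slide."""
    ones = sum(g3 for _, _, g3 in data)
    return int(2 * ones >= len(data))


def error_rate(predict, data):
    return sum(predict(g1, g2) != g3 for g1, g2, g3 in data) / len(data)


psi_opt = fit_predictor(train)
psi_0 = fit_constant(train)
eps_opt = error_rate(lambda a, b: psi_opt[(a, b)], test)
eps_0 = error_rate(lambda a, b: psi_0, test)
cod = (eps_0 - eps_opt) / eps_0
print(eps_opt, eps_0, cod)
```

On this particular split the sketch gives ε[ψopt] = 0.25 and ε[ψ0-opt] = 0.75, so the estimated CoD is 2/3.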
41Estimation of the CoD for G1,G2 and G3.
- The previous process is repeated 1000 times, with a different random split of the set into training and test sets each time.
- The estimated value of the CoD is the average of the 1000 values of θ.
- If we want to know the predictive power of another pair of genes, say G4, G5, over G3, we must repeat the whole process
  - G1, G2 → G3 ⇒ θ312
  - G4, G5 → G3 ⇒ θ345
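The repeated random splitting can be sketched as below. This is a hedged illustration, not the procedure's reference implementation: the names `split_cod` and `estimate_cod`, the fixed seed, the training-set size of 4, the fill value 0 for non-observed configurations, the tie-breaking towards 1, and the convention that a split with zero constant-predictor error contributes a CoD of 0 are all assumptions.

```python
import random
from collections import Counter, defaultdict

G1 = [1, 1, 0, 1, 1, 1, 1, 1]
G2 = [0, 1, 1, 1, 0, 1, 0, 1]
G3 = [0, 1, 0, 1, 0, 1, 0, 0]
samples = list(zip(G1, G2, G3))


def split_cod(train, test):
    """CoD estimate for one train/test split."""
    votes = defaultdict(Counter)
    for g1, g2, g3 in train:
        votes[(g1, g2)][g3] += 1

    def psi(g1, g2):
        counts = votes.get((g1, g2))
        return 0 if not counts else int(counts[1] >= counts[0])

    const = int(2 * sum(g3 for _, _, g3 in train) >= len(train))
    eps_opt = sum(psi(a, b) != y for a, b, y in test) / len(test)
    eps_0 = sum(const != y for _, _, y in test) / len(test)
    # Assumed convention: no improvement is possible over a perfect constant predictor.
    return 0.0 if eps_0 == 0 else (eps_0 - eps_opt) / eps_0


def estimate_cod(data, rounds=1000, train_size=4, seed=0):
    """Average the per-split CoD over `rounds` random splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        shuffled = list(data)
        rng.shuffle(shuffled)
        total += split_cod(shuffled[:train_size], shuffled[train_size:])
    return total / rounds


print(estimate_cod(samples))
```

A single split can give a negative value when the fitted predictor does worse on the test set than the constant one, which is one reason the estimate is averaged over many splits. Estimating the CoD for another predictor pair, such as G4 and G5, would mean calling `estimate_cod` again on the corresponding samples.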
42Result
- Determination of the predictive genetic network