Boolean and Probabilistic Boolean Networks as Models of Genomic Regulation

Ivan Ivanov, Department of Veterinary Physiology and Pharmacology, Genomic Signal Processing Lab

1
Boolean and Probabilistic Boolean Networks as
Models of Genomic Regulation

Ivan Ivanov
Department of Veterinary Physiology and
Pharmacology,
Genomic Signal Processing Lab
Texas A&M University
gsp.tamu.edu/people/ivan.html

2
The Central dogma in cell biology
3
Model-based scientific approach

Mathematical models allow for a formal and
unified description of physical phenomena.
Any model that allows prediction could be
considered a mathematical model.

[Diagram: Biology, Experiment design, Experiment, Data, Inference, Mathematical Model, Prediction]
4
Goals

Must incorporate rule-based dependencies between genes
  Rule-based dependencies may constitute important biological information
Must allow the systematic study of global network dynamics
  In particular, individual gene effects on long-run network behavior
Must be able to cope with uncertainty
  Small sample size, noisy measurements, robustness
Must permit quantification of the relative influence and sensitivity of genes in their interactions with other genes
  This allows us to focus on individual genes or groups of genes

5
Regulatory diagram for the activation of the
tumor-suppressor protein p53
Vogelstein, B., Lane, D. & Levine, A. Surfing the
p53 network. Nature 408, 307-310 (2000)
6
Challenges

Biological systems function in an exceedingly
parallel, nonlinear and extraordinarily
integrated fashion
Presence of protein-DNA feedback loops (negative
or positive)
Availability and quality of data
Model selection: fine/continuous or
coarse/discrete

7
cDNA microarray
8

Given
Genes communicate/interact via the proteins they
encode.
Protein production (transcription and
translation) is controlled by a multitude of
biochemical reactions, which are in turn
influenced by many factors internal or external
to the cell.
Assumption
The expression xj of a particular gene j is a
random function xj(t, ω) of the cell's internal
and external environment.
Goal
A good mathematical model for the dynamical
behavior of the genes

9
Biochemical interactions network
[Figure: a biochemical model drawn on three levels: metabolic space (Metabolite 1, Metabolite 2), protein space (Protein 1, Protein 2, Protein 3, Protein 4, Complex 3-4) and gene space (Gene 1, Gene 2, Gene 3, Gene 4). Other labels: Microarrays, Biological phenomena, Relationship, Variable.]
From Brazhnik et al., Gene networks: how to put
the function in genomics, Trends in
Biotechnology 20 (11), 2002
10
Discrete Models

Faithful representation of upregulated/expressed
and downregulated/repressed gene activity
Filter the noise in data
Dynamical behavior can be clearly related to some
underlying biological phenomena
Fine details like protein concentrations or
kinetics of reactions cannot be captured

11
Gene Networks Inference
[Figure: the biochemical interaction network is projected to the gene space (Gene 1, Gene 2, Gene 3, Gene 4), giving the Gene Regulatory Network Model.]
From Brazhnik et al., Gene networks: how to put
the function in genomics, Trends in
Biotechnology 20 (11), 2002
12
Considerations

Can it explain the whole biological process?
NO
(Definition of context: adding back the other
layers)
Can we better understand the physical phenomena?
YES
(Kauffman, attractors as phenotypes, logical rules,
etc.)
Is it a useful model?
YES
(We can make some good predictions)

13
Example of Cell Cycle Regulation
Logic diagram: the AND gate outputs cdk2;
p21/WAF1 is the input for a NOT gate; the NAND
gate outputs Rb
14
Discussion

How to derive a discrete representation of the
biological process in a consistent way?
How to define the quality of a mathematical model
as a description of the biological model?
Note: here I don't use the word "data". What is
important is the model; the data are a way to
estimate its parameters.

[Diagram: Biology, Experiments, Data, Parameter estimation, Model]
15
Boolean Network (BN)
16
Model Boolean functions

Activity of gene 1 (promoter) promotes the
activation of gene 3, unless gene 2 (repressor)
is active.

[Diagram: Gene 1 and Gene 2 are the inputs of a Boolean function whose output is Gene 3]

G1  G2  Y(G1,G2)
0   0   0
0   1   0
1   0   1
1   1   0
A possible Boolean function to represent this
biological relationship
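
The truth table on this slide corresponds to "G1 AND NOT G2". A minimal Python sketch of that rule follows; the function name `y` and the dictionary `truth_table` are illustrative names, not part of the slides.

```python
from itertools import product


def y(g1: int, g2: int) -> int:
    """Gene 3 is on when the promoter (gene 1) is on and the repressor (gene 2) is off."""
    return int(g1 == 1 and g2 == 0)


# Enumerate all four input combinations in the same order as the slide.
truth_table = {(g1, g2): y(g1, g2) for g1, g2 in product((0, 1), repeat=2)}

for (g1, g2), out in truth_table.items():
    print(g1, g2, out)
```

Other Boolean functions could encode the same verbal description differently; this is only the one tabulated on the slide.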
17
Note

The Boolean function is a model of the biological
system, NOT of the observed data!
Each binary function mimics the biological
behavior with some degree of fit.
The quality of this fit can be measured via
an error measure.
There is always an optimal binary function that
best fits the biological model.

18
Inference of Boolean Functions

Boolean relationship between genes can be
estimated from microarray data.

[Figure: six microarray experiments (Experiment 1 to Experiment 6), binarized for genes A, B, C and D]

Examples      A  B  C  D
Experiment 1  0  0  1  1
Experiment 2  0  1  0  0
Experiment 3  1  1  0  0
Experiment 4  1  1  1  0
Experiment 5  1  1  1  1
Experiment 6  0  0  1  1

Boolean function fC for C, with inputs A and B (X marks an input pair that was not observed)
A  B  C
0  0  1
0  1  0
1  0  X
1  1  1
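
One plausible way to obtain fC from the six experiments is a majority vote over the observed values of C for each (A, B) pair. The sketch below assumes that rule; the helper name `infer_function` and the use of `None` for the slide's X are choices made here for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# (A, B, C) for Experiments 1 to 6, as listed on this slide.
samples = [(0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1)]


def infer_function(samples):
    """Majority vote of C for each observed (A, B); None marks an unobserved pair."""
    counts = defaultdict(Counter)
    for a, b, c in samples:
        counts[(a, b)][c] += 1
    table = {}
    for inputs in product((0, 1), repeat=2):
        seen = counts.get(inputs)
        # Ties (none occur in this data) would go to the value seen first.
        table[inputs] = seen.most_common(1)[0][0] if seen else None
    return table


f_c = infer_function(samples)
for inputs, output in f_c.items():
    print(inputs, "X" if output is None else output)
```

The pair (A, B) = (1, 0) never occurs in the data, which is why it stays undetermined.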
19
Error measure for binary functions

How good is the function ψ as a model of the
relationship between G1, G2 and G3?
The quality of the function ψ, measured by its
error ε(ψ), depends on the joint distribution of
G1, G2 and G3.
In the same way, if the constant function is
defined by ψ0 = c, its error ε(ψ0) depends on the
distribution of G3.

20
Optimal Function

Among all possible Boolean functions ψ, one has
the minimal error as a predictor of G3
from G1 and G2. This function is called ψopt.
ε(ψopt) ≤ ε(ψ) for any other Boolean function ψ
If G1 and G2 are good predictors of G3, then the
relationship between them will be captured by
ψopt and ε(ψopt) will be small.
The optimal constant predictor is called ψ0-opt
(there are only 2 possible constant predictors, 0
and 1).
If G3 is almost constant, then ε(ψ0-opt) will be
small.
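
With two binary inputs there are only 16 Boolean functions, so the optimal one can be found by exhaustive search. The sketch below does this on the eight (G1, G2, G3) observations taken from the expression matrix on slide 37, treating their empirical frequencies as the joint distribution. That substitution, and the names `error`, `candidates`, `psi_opt`, `eps_opt` and `eps_const`, are assumptions made for illustration.

```python
from itertools import product

# (G1, G2, G3) for Exp1..Exp8, read column by column from the matrix on slide 37.
samples = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1),
           (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]

inputs = list(product((0, 1), repeat=2))


def error(table, samples):
    """Fraction of samples on which the truth table mispredicts G3."""
    return sum(table[(g1, g2)] != g3 for g1, g2, g3 in samples) / len(samples)


# All 2**4 = 16 truth tables over (G1, G2).
candidates = [dict(zip(inputs, outputs)) for outputs in product((0, 1), repeat=4)]

psi_opt = min(candidates, key=lambda table: error(table, samples))
eps_opt = error(psi_opt, samples)

# Best constant predictor: always 0 or always 1.
eps_const = min(sum(g3 != c for _, _, g3 in samples) / len(samples) for c in (0, 1))

print(psi_opt, eps_opt, eps_const)
```

Because the input pair (0, 0) is never observed, two truth tables tie for the minimum; `min` returns the first one in enumeration order. The error computed this way is measured on the same data used for the search, so it is optimistic compared with the train/test estimate described later in the deck.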

21
Coefficient of determination

The Coefficient of Determination (CoD) of the
pair of genes G1 and G2 as predictors of the gene
G3 is given by the relative improvement in the
prediction when using the optimal predictor ψopt
over the optimal constant predictor ψ0-opt.
The CoD depends ONLY on the joint distribution of
G1, G2 and G3.
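
The verbal definition above can be written as a formula. The symbol θ for the CoD, and ε_opt and ε_0 for the errors of the optimal predictor and of the optimal constant predictor, are notation assumed here.

```latex
\theta = \frac{\varepsilon_{0} - \varepsilon_{\mathrm{opt}}}{\varepsilon_{0}},
\qquad \varepsilon_{0} > 0
```

Since the constant predictors are themselves Boolean functions of G1 and G2, ε_opt ≤ ε_0 under the true joint distribution, so θ lies between 0 and 1 in that case. Estimates from finite data can fall outside this range.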

22
Probabilistic Boolean Network (PBN)
PBN = (BN1, ..., BNk, p1, ..., pk, p, q)
0 < p < 1: probability of switching context
0 < pi < 1: probability of BNi being used
0 < q < 1: probability of gene flipping
Context: which BN is used for the next
transition (the regime in which the cell
operates/functions)
Gene flipping: mutation rate
23
Basic Building Block of a PBN
24
[Diagram labels: p1, p2, p, q]

x1  x2  x3  f1  f2  f3
0   0   0   1   0   0
0   0   1   0   0   0
0   1   0   1   1   1
0   1   1   0   1   1
1   0   0   1   0   0
1   0   1   0   0   0
1   1   0   1   0   0
1   1   1   0   0   0

x1  x2  x3  f1  f2  f3
0   0   0   0   0   1
0   0   1   0   0   1
0   1   0   1   0   0
0   1   1   0   1   0
1   0   0   1   0   0
1   0   1   0   0   0
1   1   0   0   1   0
1   1   1   0   0   1
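
Each of the two truth tables on this slide defines a deterministic Boolean network on three genes, and its attractors can be found by following every state until it repeats. The sketch below assumes synchronous updating (all three genes updated at once, which is how the tables read); the names `BN_FIRST`, `BN_SECOND` and `attractors` are illustrative.

```python
from itertools import product

# States (x1, x2, x3) in the same row order as the tables: 000, 001, ..., 111.
STATES = list(product((0, 1), repeat=3))

# Next state (f1, f2, f3) for each row, copied from the two truth tables.
BN_FIRST = dict(zip(STATES, [(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1),
                             (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)]))
BN_SECOND = dict(zip(STATES, [(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0),
                              (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]))


def attractors(step):
    """Return the set of attractor cycles of a synchronous Boolean network."""
    found = set()
    for start in step:
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step[state]
        # The first repeated state marks where the cycle begins.
        found.add(frozenset(seen[seen.index(state):]))
    return found


print(attractors(BN_FIRST))
print(attractors(BN_SECOND))
```

Under these assumptions both networks have only fixed-point attractors: 100 and 011 for the first table, 001 and 100 for the second.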
25
Context Switching
[Diagram: two copies of the three-gene network X1, X2, X3, with labels p, p1, p2 and q]
26
Attractors in PBNs

Attractors in the Boolean Networks should
correspond to cellular types (Kauffman)

PBNs are formed by a family of Boolean Networks
Steady-state analysis of the PBN may be
meaningful for classification based on
gene-expression data
Relationships between steady-state distribution
and the attractors of the Boolean Networks allow
structural analysis of the Network

27
Dynamics of PBNs with perturbations

Perturbations are added to the model to ensure
the existence of a steady-state distribution
Perturbations move the system from the current
state to a nearby state
The system behaves like a deterministic Boolean
Network until a perturbation or change of
function occurs
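
The behavior described above can be sketched as a simulation. The details below are assumptions, since the slides do not spell out the update order: at each step the context is re-drawn with probability p, each gene flips independently with probability q, and if no gene flips the current Boolean network is applied. The two networks are the truth tables from slide 24. The selection probabilities 0.5/0.5 and the names `NETWORKS` and `simulate` are hypothetical.

```python
import random
from itertools import product

STATES = list(product((0, 1), repeat=3))

# The two constituent Boolean networks (truth tables from slide 24).
NETWORKS = [
    dict(zip(STATES, [(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1),
                      (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)])),
    dict(zip(STATES, [(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0),
                      (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)])),
]


def simulate(state, steps, p, q, probs, rng, context=0):
    """Run a context-sensitive PBN with gene perturbation; return the trajectory."""
    trajectory = [state]
    for _ in range(steps):
        if rng.random() < p:
            # Context switch: pick the network for the coming transitions.
            context = rng.choices(range(len(NETWORKS)), weights=probs)[0]
        flips = [rng.random() < q for _ in state]
        if any(flips):
            # Perturbation: flip the selected genes instead of applying the network.
            state = tuple(x ^ f for x, f in zip(state, flips))
        else:
            state = NETWORKS[context][state]
        trajectory.append(state)
    return trajectory


rng = random.Random(0)
print(simulate((0, 1, 0), 20, p=0.1, q=0.05, probs=[0.5, 0.5], rng=rng))
```

With p = 0 and q = 0 the simulation reduces to a single deterministic Boolean network, which makes it easy to check against the truth tables by hand.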

28
Dynamics of PBNs with perturbations
[Timeline diagram, along a Time axis: after a change of function or perturbation the system is in a basin; while the same Boolean Network is being used, the system reaches the attractor and stays in the attractor until the next change of function or perturbation.]
29
Steady-state analysis

In the long run, the system is expected to stay
in the attractors of the Boolean Networks

From the same initial point the system can
transition to two different regions (attractors)
depending on the Boolean Function being used
30
Modeling of real genetic regulatory systems using
PBNs

PBNs with p = 1 and q = 0 are equivalent to
Dynamic Bayesian Networks (DBNs): Lähdesmäki, H.,
Relationships between probabilistic Boolean
networks and dynamic Bayesian networks as models
of gene regulatory networks, in Workshop on
Discrete Models for Genetic Regulatory Networks,
Texas A&M University, College Station, TX,
November 5-6, 2003.
Bayesian optimization of connectivity: X. Zhou,
X. Wang, R. Pal, I. Ivanov, M. Bittner and
E. Dougherty, A Bayesian Connectivity-based
Approach to Constructing PGRs, Bioinformatics
20 (17), pp. 2918-2927, 2004.

31
Applications of PBNs

Problem (A): Study the long-run characteristics
of a given dynamical system
Inverse Problem (B): Generate a BN/PBN with a
prescribed dynamical behavior
Control policies for reaching a desirable
steady-state distribution

32
Melanoma Application

Microarray data
  31 malignant melanoma samples
  6971 unique genes on the array
  7 genes of interest: WNT5A, pirin, S100P,
  RET1, MART1, HADHB, STC2
Binarization of the gene expression profiles
  18 distinct data states
Suboptimal PGRN generation (using MSE distance)
  10 attractor sets selected according to the
  data frequency
  2 < size of each attractor set < 5
  100 BNs generated for each attractor set
  10 BNs selected to form the PGRN, p = q = 0.001

34
"Mathematics is biology's next microscope, only
better; biology is mathematics' next physics,
only better." - J.E. Cohen, PLoS Biology 2 (2004),
No. 12.
"Can biology lead to new theorems?" - B. Sturmfels,
Department of Mathematics, Univ. of California,
Berkeley, CA 94720, USA
36
Estimation of the CoD for G1,G2 and G3.
Microarray
Example of Ternary Expression Matrix
    Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1  1    1    0    1    1    1    1    1
G2  0    1    1    1    0    1    0    1
G3  0    1    0    1    0    1    0    0

Estimation of the optimal functions ψopt and
ψ0-opt for G1,G2 as predictors of G3
Estimated CoD for G1,G2 as predictors of G3
37
Estimation of ε(ψopt) for G1,G2 and G3 from the
data
Ternary Expression Matrix for G1,G2 and G3
    Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1  1    1    0    1    1    1    1    1
G2  0    1    1    1    0    1    0    1
G3  0    1    0    1    0    1    0    0
Splitting of the matrix into Training and Test sets
TRAIN  Exp1 Exp2 Exp3 Exp4
G1     1    1    0    1
G2     0    1    1    1
G3     0    1    0    1
TEST   Exp5 Exp6 Exp7 Exp8
G1     1    1    1    1
G2     0    1    0    1
G3     0    1    0    0
38
Estimation of ε(ψopt) for G1,G2 and G3 from the
data
TRAIN  Exp1 Exp2 Exp3 Exp4
G1     1    1    0    1
G2     0    1    1    1
G3     0    1    0    1
Statistical inference of the optimal function
ψopt.
First output column: most frequent value computed from the data (X
denotes a non-observed configuration).
Second output column: after generalization to fill non-observed configurations.
G1  G2  ψopt(G1,G2)  ψopt(G1,G2)
0   0   X            0
0   1   0            0
1   0   0            0
1   1   1            1
TEST   Exp5 Exp6 Exp7 Exp8
G1     1    1    1    1
G2     0    1    0    1
G3     0    1    0    0
Estimation of the error of ψopt from the test set:
1 mistake out of 4 ⇒ ε(ψopt) = 0.25
39
Estimation of ε(ψ0-opt) for G1,G2 and G3 from the
data
Frequencies of possible values of G3 on train data
TRAIN  Exp1 Exp2 Exp3 Exp4
G1     1    1    0    1
G2     0    1    1    1
G3     0    1    0    1
Statistical inference of the optimal constant function
ψ0-opt.
G3  Frequency
0   2
1   2
ψ0-opt = 1 (most frequently observed value for G3;
the tie here is broken by a heuristic)
TEST   Exp5 Exp6 Exp7 Exp8
G1     1    1    1    1
G2     0    1    0    1
G3     0    1    0    0
Estimation of the error of ψ0-opt from the test set:
3 mistakes out of 4 ⇒ ε(ψ0-opt) = 0.75
40
Estimation of the CoD for G1,G2 and G3 from the
data
ε(ψ0-opt) = 0.75
ε(ψopt) = 0.25
The error is reduced by about 66.7%: (0.75 - 0.25) / 0.75 ≈ 0.667
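
The worked example on slides 37 to 40 can be reproduced in a few lines. The sketch below uses the same train/test split (Exp1 to Exp4 versus Exp5 to Exp8), fills the unobserved input (0, 0) with 0 as on slide 38, and takes the constant predictor 1 as on slide 39. The function names `fit_psi` and `error_rate` are illustrative.

```python
from collections import Counter

# (G1, G2, G3) for Exp1..Exp8, read column by column from the matrix on slide 37.
samples = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1),
           (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]
train, test = samples[:4], samples[4:]


def fit_psi(train, default=0):
    """Majority vote per (G1, G2); unobserved inputs fall back to `default`."""
    votes = {}
    for g1, g2, g3 in train:
        votes.setdefault((g1, g2), Counter())[g3] += 1
    table = {(a, b): default for a in (0, 1) for b in (0, 1)}
    for key, counter in votes.items():
        table[key] = counter.most_common(1)[0][0]
    return table


def error_rate(predict, data):
    """Fraction of columns in `data` where the prediction differs from G3."""
    return sum(predict(g1, g2) != g3 for g1, g2, g3 in data) / len(data)


psi = fit_psi(train)
eps_opt = error_rate(lambda g1, g2: psi[(g1, g2)], test)

# G3 is 0 twice and 1 twice on the training set; the slide breaks the tie with 1.
eps_const = error_rate(lambda g1, g2: 1, test)

cod = (eps_const - eps_opt) / eps_const
print(eps_opt, eps_const, round(cod, 3))
```

Had the tie been broken in favour of 0 instead, the constant predictor would make only one mistake on this test set and the estimated CoD for this split would be 0, which shows how sensitive a single small split is.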
41
Estimation of the CoD for G1,G2 and G3.

The previous process is repeated 1000 times, with
a different random split of the set into training
and test sets each time.
The estimated value of the CoD is the average of
the 1000 values of θ.
If we want to know the predictive power of another
pair of genes, say G4,G5, over G3, we must repeat
the whole process.
G1,G2 → G3 ⇒ θ_312
G4,G5 → G3 ⇒ θ_345
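
A minimal sketch of the repeated random-split procedure follows. Several details are assumptions not stated on the slides: the split is half/half, ties for the constant predictor go to 1, unobserved inputs are filled with 0, and splits where the constant predictor makes no test error (so the ratio is undefined) are skipped. The names `cod_for_split` and `estimate_cod` and the seed are illustrative.

```python
import random
from collections import Counter

# (G1, G2, G3) for Exp1..Exp8, read column by column from the matrix on slide 37.
samples = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1),
           (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]


def fit_psi(train, default=0):
    """Majority vote per (G1, G2); unobserved inputs fall back to `default`."""
    votes = {}
    for g1, g2, g3 in train:
        votes.setdefault((g1, g2), Counter())[g3] += 1
    table = {(a, b): default for a in (0, 1) for b in (0, 1)}
    for key, counter in votes.items():
        table[key] = counter.most_common(1)[0][0]
    return table


def cod_for_split(train, test):
    """CoD estimate for one split, or None when the constant error is zero."""
    psi = fit_psi(train)
    ones = sum(g3 for _, _, g3 in train)
    const = 1 if 2 * ones >= len(train) else 0  # ties go to 1 (assumed heuristic)
    eps_opt = sum(psi[(g1, g2)] != g3 for g1, g2, g3 in test) / len(test)
    eps_const = sum(const != g3 for _, _, g3 in test) / len(test)
    if eps_const == 0:
        return None
    return (eps_const - eps_opt) / eps_const


def estimate_cod(samples, repeats=1000, seed=0):
    """Average the per-split CoD over `repeats` random half/half splits."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        value = cod_for_split(shuffled[:half], shuffled[half:])
        if value is not None:
            values.append(value)
    return sum(values) / len(values)


print(estimate_cod(samples))
```

Individual splits can give negative values when the fitted function does worse than the constant predictor on the test set, so the average over eight samples should be read as a rough illustration only.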
