Boolean and Probabilistic Boolean Networks as Models of Genomic Regulation

About This Presentation
Description:

Boolean and Probabilistic Boolean Networks as Models of Genomic Regulation. Ivan Ivanov, Department of Veterinary Physiology and Pharmacology, Genomic Signal Processing Lab.


Transcript and Presenter's Notes



1
Boolean and Probabilistic Boolean Networks as
Models of Genomic Regulation
  • Ivan Ivanov
  • Department of Veterinary Physiology and
    Pharmacology,
  • Genomic Signal Processing Lab
  • Texas A&M University
  • gsp.tamu.edu/people/ivan.html

2
The Central dogma in cell biology
3
Model-based scientific approach
  • Mathematical models allow for a formal and
    unified description of physical phenomena

[Diagram linking Biology, Experiment design, Experiment, Data, Inference, Mathematical Model and Prediction]
Any model that allows prediction can be
considered a mathematical model
4
Goals
  • Must incorporate rule-based dependencies between
    genes
  • Rule-based dependencies may constitute important
    biological information
  • Must allow a systematic study of global network
    dynamics
  • In particular, individual gene effects on
    long-run network behavior
  • Must be able to cope with uncertainty
  • Small sample size, noisy measurements, robustness
  • Must permit quantification of the relative
    influence and sensitivity of genes in their
    interactions with other genes
  • This allows us to focus on individual genes or
    groups of genes

5
Regulatory diagram for the activation of the
tumor-suppressor protein p53
Vogelstein, B., Lane, D. & Levine, A. Surfing the
p53 network. Nature 408, 307-310 (2000).
6
Challenges
  • Biological systems function in an exceedingly
    parallel, nonlinear and highly integrated
    fashion
  • Presence of protein-DNA feedback loops (negative
    or positive)
  • Availability and quality of data
  • Model selection
  • Fine/Continuous or Coarse/Discrete

7
cDNA microarray
8
  • Given
  • Genes communicate/interact via the proteins they
    encode
  • Protein production (transcription and
    translation) is controlled by a multitude of
    biochemical reactions which are in turn
    influenced by many factors internal or external
    to the cell.
  • Assumption
  • The expression $X_j$ of a particular gene $j$ is a
    random function $x_j(t, \omega)$ of the cell's internal and
    external environment.
  • Goal
  • A good mathematical model for the dynamical
    behavior of the genes

9
Biochemical interactions network
[Diagram: three layers, the metabolic space (Metabolites 1 and 2), the protein space (Proteins 1-4 and Complex 3-4) and the gene space (Genes 1-4). Labels: Microarrays, Biochemical model, Biological phenomena, Relationship, Variable.]
From Brazhnik et al., Gene networks: how to put
the function in genomics, TRENDS in
Biotechnology, 20 (11), 2002
10
Discrete Models
  • Faithful representation of upregulated/expressed
    and downregulated/repressed gene activity
  • Filter the noise in data
  • Dynamical behavior can be clearly related to some
    underlying biological phenomena
  • Fine details like protein concentrations or
    kinetics of reactions cannot be captured

11
Gene Networks Inference
[Diagram: the biochemical interaction network is projected onto the gene space, giving the gene regulatory network model over Genes 1-4.]
From Brazhnik et al., Gene networks: how to put
the function in genomics, TRENDS in
Biotechnology, 20 (11), 2002
12
Considerations
  • Can it explain all biological processes?
  • NO
  • (Definition of context: adding back the other
    layers)
  • Can we better understand the physical phenomena?
  • YES
  • (Kauffman: attractors correspond to phenotypes;
    logical rules, etc.)
  • Is it a useful model?
  • YES
  • (We can make some good predictions.)

13
Example of Cell Cycle Regulation
[Logic diagram: an AND gate outputs cdk2; p21/WAF1 is the input to a NOT gate; a NAND gate outputs Rb.]
14
Discussion
  • How can we derive a discrete representation of
    the biological process in a consistent way?
  • How should we define how well a mathematical
    model describes the biological model?
  • Note: I do not use the word "data" here. What is
    important is the model; the data are a way to
    estimate its parameters!

[Diagram linking Biology, Model, Experiments, Data and Parameter estimation]
15
Boolean Network (BN)
16
Model: Boolean functions
  • Activity of gene 1 (promoter) promotes the
    activation of gene 3, unless gene 2 is active
    (repressor).

[Diagram: Gene 1 and Gene 2 are the inputs of a Boolean function $\psi$ whose output is Gene 3]

G1  G2 | Y(G1,G2)
 0   0 |    0
 0   1 |    0
 1   0 |    1
 1   1 |    0

A possible Boolean function to represent this
biological relationship
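The truth table is the Boolean function Y(G1, G2) = G1 AND (NOT G2): gene 3 is on only when the promoter is on and the repressor is off. A minimal Python sketch (the function name `y` is illustrative) rebuilds the table from that rule:

```python
from itertools import product


def y(g1: int, g2: int) -> int:
    """Gene 3 is ON only when gene 1 (promoter) is ON and gene 2 (repressor) is OFF."""
    return int(g1 == 1 and g2 == 0)


# Enumerate the four input combinations in the same order as the table.
truth_table = {(g1, g2): y(g1, g2) for g1, g2 in product((0, 1), repeat=2)}

for (g1, g2), out in truth_table.items():
    print(g1, g2, out)
```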
17
Note
  • The Boolean function is a model for the
    biological model, NOT for the observed data!
  • Each binary function mimics the biological
    behavior with some degree of fit.
  • The quality of this fit can be measured via
    an error measure.
  • There is always an optimal binary function that
    best fits the biological model.

18
Inference of Boolean Functions
  • Boolean relationship between genes can be
    estimated from microarray data.

Expression of genes A, B, C and D across six experiments:

Experiment  A  B  C  D
    1       0  0  1  1
    2       0  1  0  0
    3       1  1  0  0
    4       1  1  1  0
    5       1  1  1  1
    6       0  0  1  1

Boolean function fC for C, with A and B as inputs (X denotes a configuration not observed in the data):

A  B | fC(A,B)
0  0 |    1
0  1 |    0
1  0 |    X
1  1 |    1
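The table for fC can be read off the data by taking, for each observed (A, B) pair, the most frequent value of C, which is the rule the deck later states explicitly ("more frequent value computed from data"). A short Python sketch, with illustrative names `experiments` and `estimate_truth_table`:

```python
from collections import Counter

# (A, B, C) for Experiments 1-6, copied from the table above.
experiments = [(0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1)]


def estimate_truth_table(rows):
    """Estimate fC(A, B) as the most frequent C value seen for each (A, B) input."""
    votes = {}
    for a, b, c in rows:
        votes.setdefault((a, b), Counter())[c] += 1
    table = {}
    for key in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        if key in votes:
            table[key] = votes[key].most_common(1)[0][0]
        else:
            table[key] = "X"  # this input configuration never appears in the data
    return table


f_c = estimate_truth_table(experiments)
print(f_c)
```

For (1, 1) the votes are 0 once and 1 twice, so the estimate is 1. This data set has no ties; with ties, `Counter.most_common` simply returns the value counted first, so an explicit tie-break rule would be needed.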
19
Error measure for binary functions
  • How good is the function $\psi$ at modeling the
    relationship between G1, G2 and G3?
  • The quality of the function $\psi$ depends on the
    joint distribution of G1, G2 and G3
  • In the same way, the error $\varepsilon_{\psi_0}$ is
    defined for a constant function $\psi_0 \equiv c$
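The error formula from this slide is not preserved in the transcript. For binary genes, one standard way to write it, consistent with the mistake counting used in the worked example at the end of the deck, is the probability that the predictor disagrees with the target gene; the constant case follows by setting $\psi \equiv c$:

```latex
\varepsilon_{\psi} = P\bigl(\psi(G_1, G_2) \neq G_3\bigr),
\qquad
\varepsilon_{\psi_0} = P(G_3 \neq c) \quad \text{for } \psi_0 \equiv c,\; c \in \{0, 1\}
```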

20
Optimal Function
  • Among all possible Boolean functions $\psi$, one
    has minimal error as a predictor of G3 from G1
    and G2. This function is called $\psi_{opt}$.
  • $\varepsilon_{\psi_{opt}} \le \varepsilon_{\psi}$ for any other Boolean function $\psi$
  • If G1 and G2 are good predictors of G3, then the
    relationship between them will be captured by
    $\psi_{opt}$ and $\varepsilon_{\psi_{opt}}$ will be small.
  • The optimal constant predictor is called $\psi_{0\text{-}opt}$
    (there are only 2 possible constant predictors: 0
    and 1).
  • If G3 is almost constant, then $\varepsilon_{\psi_{0\text{-}opt}}$ will be
    small.
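With two binary inputs there are only $2^4 = 16$ Boolean functions, so $\psi_{opt}$ can be found by exhaustive search. The Python sketch below is a plug-in illustration: it uses the empirical distribution of the eight experiments (Exp1 to Exp8) as given on the training/test slides later in the deck, and the names `samples`, `error` and `all_functions` are illustrative. It is a resubstitution estimate on all eight samples, not the train/test procedure the deck uses for the CoD.

```python
from itertools import product

# (G1, G2, G3) for Exp1..Exp8, as listed on the training/test slides.
samples = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1),
           (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]
INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def error(psi, rows):
    """Empirical error: fraction of rows where psi(G1, G2) != G3.

    psi is an output column: psi[k] is the output for input INPUTS[k].
    """
    return sum(psi[INPUTS.index((g1, g2))] != g3 for g1, g2, g3 in rows) / len(rows)


# All 2**4 = 16 Boolean functions of two inputs, written as output columns.
all_functions = list(product((0, 1), repeat=4))
psi_opt = min(all_functions, key=lambda psi: error(psi, samples))

# The only two constant predictors: always 0 and always 1.
psi_0opt = min([(0, 0, 0, 0), (1, 1, 1, 1)], key=lambda psi: error(psi, samples))

print(psi_opt, error(psi_opt, samples))
print(psi_0opt, error(psi_0opt, samples))
```

Input (0, 0) never occurs in these samples, so two functions tie for the minimum; `min` returns the first one in enumeration order, (0, 0, 0, 1), which is G1 AND G2.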

21
Coefficient of determination
  • The Coefficient of Determination (CoD) of the
    pair of genes G1 and G2 as predictors of the gene
    G3 is given by the relative improvement in the
    prediction when using the optimal predictor $\psi_{opt}$
    over the optimal constant predictor $\psi_{0\text{-}opt}$.
  • The CoD depends ONLY on the joint distribution of
    G1,G2 and G3.
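Written as a formula, the relative improvement is the usual definition of the CoD $\theta$. It lies between 0 and 1 whenever $\varepsilon_{\psi_{0\text{-}opt}} > 0$, because the constant predictors are themselves Boolean functions. The second line plugs in the error estimates from the worked example at the end of the deck:

```latex
\theta = \frac{\varepsilon_{\psi_{0\text{-}opt}} - \varepsilon_{\psi_{opt}}}{\varepsilon_{\psi_{0\text{-}opt}}}, \qquad 0 \le \theta \le 1

\theta = \frac{0.75 - 0.25}{0.75} = \frac{2}{3} \approx 0.67
```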

22
Probabilistic Boolean Network (PBN)
$\text{PBN} = (BN_1, \ldots, BN_k, p_1, \ldots, p_k, p, q)$
  • $0 < p < 1$: probability of switching context
  • $0 < p_i < 1$: probability of $BN_i$ being used
  • $0 < q < 1$: probability of gene flipping
  • Context: which BN is used for the next
    transition (the regime in which the cell
    operates/functions)
  • Gene flipping: mutation rate
23
Basic Building Block of a PBN
24
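[Diagram: two Boolean networks over genes x1, x2, x3, used with probabilities p1 and p2; context switching with probability p; gene flipping with probability q]

Boolean network 1
x1 x2 x3 | f1 f2 f3
 0  0  0 |  1  0  0
 0  0  1 |  0  0  0
 0  1  0 |  1  1  1
 0  1  1 |  0  1  1
 1  0  0 |  1  0  0
 1  0  1 |  0  0  0
 1  1  0 |  1  0  0
 1  1  1 |  0  0  0

Boolean network 2
x1 x2 x3 | f1 f2 f3
 0  0  0 |  0  0  1
 0  0  1 |  0  0  1
 0  1  0 |  1  0  0
 0  1  1 |  0  1  0
 1  0  0 |  1  0  0
 1  0  1 |  0  0  0
 1  1  0 |  0  1  0
 1  1  1 |  0  0  1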
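Each truth table defines a deterministic network on the eight states of (x1, x2, x3), with all genes updated synchronously. A short Python sketch (helper names are illustrative) follows every state until it repeats and collects the attractor cycles of each network:

```python
from itertools import product

# Next state (f1, f2, f3) for x1x2x3 = 000, 001, ..., 111, copied from the two tables.
BN1 = [(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)]
BN2 = [(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
STATES = list(product((0, 1), repeat=3))  # same row order as the tables


def step(table, state):
    """Synchronous update: all three genes take their new values at once."""
    return table[STATES.index(state)]


def attractors(table):
    """Return the set of attractor cycles, each as a tuple of states."""
    found = set()
    for start in STATES:
        path, state = [], start
        while state not in path:  # iterate until a state repeats
            path.append(state)
            state = step(table, state)
        cycle = path[path.index(state):]
        k = cycle.index(min(cycle))  # rotate to a canonical starting point
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found


print(attractors(BN1))
print(attractors(BN2))
```

Both networks have only fixed-point attractors: 011 and 100 for network 1, 001 and 100 for network 2. State 100 is a fixed point of both.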
25
Context Switching
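[Diagram: the two Boolean networks over genes X1, X2, X3; context switching with probability p, network selection with probabilities p1 and p2, gene flipping with probability q]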
26
Attractors in PBNs
  • Attractors in the Boolean Networks should
    correspond to cellular types (Kauffman)
  • PBNs are formed by a family of Boolean Networks
  • Steady-state analysis of the PBN may be
    meaningful for classification based on
    gene-expression data
  • Relationships between the steady-state
    distribution and the attractors of the Boolean
    Networks allow structural analysis of the network

27
Dynamics of PBNs with perturbations
  • Perturbations are added to the model to ensure
    the existence of a steady-state distribution
  • Perturbations move the system from the current
    state to a nearby state
  • The system behaves like a deterministic Boolean
    Network until a perturbation or change of
    function occurs
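The dynamics just described can be simulated directly. The Python sketch below is illustrative only: it uses the two example networks from the basic building block with assumed values $p_1 = p_2 = 0.5$ and $p = q = 0.001$. At each step the context is redrawn with probability $p$; then each gene flips with probability $q$, and the current network is applied only if no gene flipped. The ordering of switching and perturbation differs between PBN formulations, so this is one simple choice, not the definitive one.

```python
import random
from itertools import product

BN1 = [(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)]
BN2 = [(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
STATES = list(product((0, 1), repeat=3))


def simulate_pbn(networks, probs, p, q, steps, seed=0):
    """Monte Carlo run of a context-sensitive PBN; returns state occupancy frequencies."""
    rng = random.Random(seed)
    context = rng.choices(range(len(networks)), weights=probs)[0]
    state = rng.choice(STATES)
    counts = {s: 0 for s in STATES}
    for _ in range(steps):
        # Context switch: with probability p, redraw the active BN using p_1..p_k.
        if rng.random() < p:
            context = rng.choices(range(len(networks)), weights=probs)[0]
        # Perturbation: each gene flips independently with probability q.
        flips = tuple(int(rng.random() < q) for _ in state)
        if any(flips):
            state = tuple(x ^ f for x, f in zip(state, flips))
        else:
            state = networks[context][STATES.index(state)]
        counts[state] += 1
    return {s: c / steps for s, c in counts.items()}


freqs = simulate_pbn([BN1, BN2], [0.5, 0.5], p=0.001, q=0.001, steps=100_000, seed=1)
for state, f in freqs.items():
    print(state, round(f, 4))
```

With small $p$ and $q$ the run spends almost all of its time on the attractor states 100, 011 and 001, leaving them only briefly after a perturbation or a change of function.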

28
Dynamics of PBNs with perturbations
[Timeline diagram: while the same Boolean Network is in use, the system moves through a basin and reaches the attractor, where it stays until the next change of function or perturbation]
29
Steady-state analysis
  • In the long run, the system is expected to stay
    in the attractors of the Boolean Networks

From the same initial point the system can
transition to two different regions (attractors)
depending on the Boolean Function being used
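In the special case $p = 1$ (a network is redrawn at every step), the PBN with perturbations is a Markov chain on the $2^n$ gene states, and its steady-state distribution can be computed exactly by power iteration. The Python sketch below assumes the two example networks from the basic building block, $p_1 = p_2 = 0.5$ and $q = 0.001$ (illustrative values), and the common perturbation rule: each gene flips independently with probability $q$, and a network function is applied only when no gene flips.

```python
from itertools import product

BN1 = [(1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)]
BN2 = [(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
STATES = list(product((0, 1), repeat=3))


def transition_matrix(networks, probs, q):
    """Markov matrix of a PBN with p = 1 (new network every step) and gene-flip probability q."""
    n = len(STATES[0])
    size = len(STATES)
    A = [[0.0] * size for _ in range(size)]
    for i, x in enumerate(STATES):
        # No gene flips (probability (1 - q)^n): apply BN_k, chosen with probability p_k.
        for table, weight in zip(networks, probs):
            A[i][STATES.index(table[i])] += (1 - q) ** n * weight
        # At least one flip: jump to y, where h = Hamming(x, y) genes flipped.
        for j, y in enumerate(STATES):
            h = sum(a != b for a, b in zip(x, y))
            if h > 0:
                A[i][j] += q ** h * (1 - q) ** (n - h)
    return A


def steady_state(A, iterations=2000):
    """Power iteration pi <- pi A, starting from the uniform distribution."""
    size = len(A)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        pi = [sum(pi[i] * A[i][j] for i in range(size)) for j in range(size)]
    return dict(zip(STATES, pi))


pi = steady_state(transition_matrix([BN1, BN2], [0.5, 0.5], q=0.001))
for state, prob in pi.items():
    print(state, round(prob, 4))
```

Here most of the mass ends up on 100, the only fixed point shared by both networks: from 011 or 001 the chain leaves whenever the other network is drawn.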
30
Modeling of real genetic regulatory systems using
PBNs
  • PBNs with $p = 1$ and $q = 0$ are equivalent to
    Dynamic Bayesian Networks (DBNs): Lähdesmäki, H.,
    Relationships between probabilistic Boolean
    networks and dynamic Bayesian networks as models
    of gene regulatory networks, In Workshop on
    Discrete Models for Genetic Regulatory Networks,
    Texas A&M University, College Station, TX,
    November 5-6, 2003.
  • Bayesian optimization of connectivity: X. Zhou,
    X. Wang, R. Pal, I. Ivanov, M. Bittner and
    E. Dougherty, A Bayesian Connectivity-based
    Approach to Constructing PGRs, Bioinformatics
    Vol. 20, no. 17, pp. 2918-2927, 2004.

31
Applications of PBNs
  • Problem (A): study the long-run characteristics
    of a given dynamical system
  • Inverse Problem (B): generate a BN/PBN with a
    prescribed dynamical behavior
  • Control policies for reaching a desirable
    steady-state distribution

32
Melanoma Application
  • Microarray data
  • 31 malignant melanoma samples
  • 6971 unique genes on the array
  • 7 genes of interest: WNT5A, pirin, S100P,
    RET1, MART1, HADHB, STC2
  • Binarization of the gene expression profiles
  • 18 distinct data states
  • Suboptimal PGRN generation (using MSE distance)
  • 10 attractor sets selected according to the
    data frequency
  • 2 < size of each attractor set < 5
  • 100 BNs generated for each attractor set
  • 10 BNs selected to form PGRN, p = q = 0.001

33
(No Transcript)
34
"Mathematics is biology's next microscope, only
better; biology is mathematics' next physics,
only better." J. E. Cohen, Mathematics is
biology's next microscope, only better; biology
is mathematics' next physics, only better, PLoS
Biology 2 (2004), No. 12.
CAN BIOLOGY LEAD TO NEW THEOREMS? B. Sturmfels,
Department of Mathematics, Univ. of California,
Berkeley, CA 94720, USA
35
(No Transcript)
36
Estimation of the CoD for G1,G2 and G3.
Microarray
Example of Ternary Expression Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1 1 1 0 1 1 1 1 1
G2 0 1 1 1 0 1 0 1
G3 0 1 0 1 0 1 1 0

Estimation of the optimal functions $\psi_{opt}$ and
$\psi_{0\text{-}opt}$ for G1, G2 as predictors of G3
Estimated CoD for G1, G2 as predictors of G3
37
Estimation of $\varepsilon_{\psi_{opt}}$ for G1, G2 and G3 from the
data
Ternary Expression Matrix for G1,G2 and G3
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8
G1 1 1 0 1 1 1 1 1
G2 0 1 1 1 0 1 0 1
G3 0 1 0 1 0 1 0 0
Splitting of the matrix into training and test sets
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
38
Estimation of $\varepsilon_{\psi_{opt}}$ for G1, G2 and G3 from the
data
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
Statistical inference of the optimal function
$\psi_{opt}$:
G1 G2 | most frequent value from data | after generalization
 0  0 |              X                |          0
 0  1 |              0                |          0
 1  0 |              0                |          0
 1  1 |              1                |          1
(X denotes a non-observed configuration; the generalization step fills in the non-observed configurations.)
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
Estimation of the error of $\psi_{opt}$ from the test set:
1 mistake out of 4 $\Rightarrow \varepsilon_{\psi_{opt}} = 0.25$
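The training and test steps above can be sketched in Python. The majority vote per observed (G1, G2) input follows the slide; the tie-break toward 1 and the generalization rule (majority output of the nearest observed inputs in Hamming distance) are assumptions, since the slide does not name its rule. On this data the assumed rule reproduces the slide's fill-in value 0 for input (0, 0).

```python
from itertools import product

# (G1, G2, G3) columns from the slide: Exp1-Exp4 for training, Exp5-Exp8 for testing.
train = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1)]
test = [(1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]
INPUTS = list(product((0, 1), repeat=2))


def majority(values):
    """Most frequent binary value; ties go to 1 (assumed tie-break)."""
    return 1 if 2 * sum(values) >= len(values) else 0


def fit_psi_opt(rows):
    """Majority vote per observed (G1, G2), then fill in unobserved inputs."""
    observed = {}
    for x in INPUTS:
        outputs = [g3 for g1, g2, g3 in rows if (g1, g2) == x]
        if outputs:
            observed[x] = majority(outputs)
    psi = dict(observed)
    for x in INPUTS:
        if x not in observed:
            # Hypothetical generalization rule: majority output of the
            # observed inputs closest to x in Hamming distance.
            dist = {y: sum(a != b for a, b in zip(x, y)) for y in observed}
            nearest = min(dist.values())
            psi[x] = majority([observed[y] for y in observed if dist[y] == nearest])
    return psi


def error(psi, rows):
    """Fraction of rows where psi(G1, G2) differs from G3."""
    return sum(psi[(g1, g2)] != g3 for g1, g2, g3 in rows) / len(rows)


psi_opt = fit_psi_opt(train)
print(psi_opt)
print(error(psi_opt, test))
```

The only test mistake is Exp8, where (1, 1) predicts 1 but G3 = 0, giving an error of 0.25.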
39
Estimation of $\varepsilon_{\psi_{0\text{-}opt}}$ for G1, G2 and G3 from the
data
Frequencies of possible values of G3 on the training data
TRAIN Exp1 Exp2 Exp3 Exp4
G1 1 1 0 1
G2 0 1 1 1
G3 0 1 0 1
Statistical inference of the optimal constant function
$\psi_{0\text{-}opt}$:
G3 Frequency
0 2
1 2
$\psi_{0\text{-}opt} = 1$ (the most frequent observed value for G3; both values
occur twice, so a heuristic breaks the tie)
TEST Exp5 Exp6 Exp7 Exp8
G1 1 1 1 1
G2 0 1 0 1
G3 0 1 0 0
Estimation of the error of $\psi_{0\text{-}opt}$ from the test set:
3 mistakes out of 4 $\Rightarrow \varepsilon_{\psi_{0\text{-}opt}} = 0.75$
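The constant predictor needs only the G3 row. In this Python sketch the tie-break toward 1 is an assumed rule, chosen because it reproduces the slide's heuristic choice; the names are illustrative.

```python
# G3 values from the slide: training set (Exp1-Exp4) and test set (Exp5-Exp8).
train_g3 = [0, 1, 0, 1]
test_g3 = [0, 1, 0, 0]


def fit_constant(values):
    """Most frequent G3 value; a tie goes to 1 (assumed, matching the slide's choice)."""
    return 1 if 2 * sum(values) >= len(values) else 0


c = fit_constant(train_g3)
eps_0 = sum(v != c for v in test_g3) / len(test_g3)
print(c, eps_0)
```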
40
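Estimation of the CoD for G1, G2 and G3 from the
data
$\varepsilon_{\psi_{0\text{-}opt}} = 0.75$
$\varepsilon_{\psi_{opt}} = 0.25$
$\theta = (0.75 - 0.25)/0.75 \approx 0.67$: the error is reduced by about 66.7%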
41
Estimation of the CoD for G1, G2 and G3
  • The previous process is repeated 1000 times, with
    different random splits of the data into training
    and test sets.
  • The estimated value of the CoD is the average of
    the 1000 values of $\theta$.
  • To find the predictive power of another pair of
    genes, say G4, G5, over G3, we must repeat the
    whole process:
  • G1, G2 $\to$ G3 $\Rightarrow \theta_{312}$
  • G4, G5 $\to$ G3 $\Rightarrow \theta_{345}$
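The whole resampling estimate can be sketched in Python. This is a sketch under stated assumptions, not the author's code: it reuses the assumed tie-break (toward 1) and the assumed nearest-neighbor fill-in rule, uses the eight experiments as given on the training/test slides, and skips any split whose test set makes $\varepsilon_{\psi_{0\text{-}opt}} = 0$, because the CoD is undefined there. The slides do not say how such splits are handled.

```python
import random
from itertools import product

# (G1, G2, G3) for Exp1..Exp8, as given on the training/test slides.
data = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]
INPUTS = list(product((0, 1), repeat=2))


def majority(values):
    """Most frequent binary value; ties go to 1 (assumed tie-break)."""
    return 1 if 2 * sum(values) >= len(values) else 0


def fit_psi_opt(rows):
    """Majority vote per observed input; unobserved inputs copy their nearest observed ones."""
    observed = {}
    for x in INPUTS:
        outputs = [g3 for g1, g2, g3 in rows if (g1, g2) == x]
        if outputs:
            observed[x] = majority(outputs)
    psi = dict(observed)
    for x in INPUTS:
        if x not in observed:  # hypothetical generalization rule
            dist = {y: sum(a != b for a, b in zip(x, y)) for y in observed}
            nearest = min(dist.values())
            psi[x] = majority([observed[y] for y in observed if dist[y] == nearest])
    return psi


def cod_for_split(train, test):
    """CoD estimate for one split, or None if the constant predictor is perfect on test."""
    psi = fit_psi_opt(train)
    c = majority([g3 for _, _, g3 in train])
    eps_opt = sum(psi[(g1, g2)] != g3 for g1, g2, g3 in test) / len(test)
    eps_0 = sum(c != g3 for _, _, g3 in test) / len(test)
    if eps_0 == 0:
        return None
    return (eps_0 - eps_opt) / eps_0


def estimate_cod(rows, n_splits=1000, n_train=4, seed=0):
    """Average the per-split CoD over random train/test splits."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        theta = cod_for_split(shuffled[:n_train], shuffled[n_train:])
        if theta is not None:
            values.append(theta)
    return sum(values) / len(values)


print(cod_for_split(data[:4], data[4:]))  # the single split worked through on the slides
print(estimate_cod(data))
```

Estimating the CoD of another predictor pair, such as G4, G5 for G3, means rerunning the same procedure on those columns. Per-split values can be negative when $\psi_{opt}$ does worse than the constant predictor on a test set.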

42
Result
  • Determination of the predictive genetic network