Insights from Boolean Modeling of Genetic Regulatory Networks

1
Insights from Boolean Modeling of Genetic
Regulatory Networks

ilya shmulevich

2
Part I

Discover and understand the underlying gene
regulatory mechanisms by means of inferring them
from data.
By using the inferred model, endeavor to make
useful predictions by mathematical analysis and
computer simulations.

3
genetic networks

Complex regulatory networks among genes and their
products control cell behaviors such as
cell cycle
apoptosis
cell differentiation
communication between cells in tissues
A paramount problem is to understand the
dynamical interactions among these genes,
transcription factors, and signaling cascades,
which govern the integrated behavior of the cell.

(Figure: analogy with a circuit diagram.)
4
Clinical Impact

Model-based and computational analysis can
open up a window on the physiology of an organism
and on disease progression, and can
translate into accurate diagnosis, target
identification, drug development, and treatment.

5
What class of models should be chosen?

The selection should be made in view of
data requirements
goals of modeling and analysis.

(Figure: goals, data, and model.)
6
Classical tradeoff

A fine model with many parameters
may be able to capture detailed low-level
phenomena (protein concentrations, reaction
kinetics)
requires large amounts of data for inference
A coarse model with low complexity
may succeed in capturing only high-level
phenomena (e.g. which genes are ON/OFF)
requires smaller amounts of data

7
Ockham's Razor

Underlies all scientific theory building.
Model complexity should never be made higher than
what is necessary to faithfully explain the
data.
What kind of data do we have and how much?

William of Ockham (1280-1349)
8
Boolean Networks

To what extent do such models represent reality?
Do we have the right type of data to infer
these models?
What do we hope to learn from them?

9
Basic Structure of Boolean Networks
1 means active/expressed; 0 means
inactive/unexpressed.
(Figure: genes A and B are the inputs to gene X.)
Boolean function:
A B | X
0 0 | 1
0 1 | 1
1 0 | 0
1 1 | 1
In this example, two genes (A and B) regulate
gene X. In principle, any number of input genes
is possible. Positive/negative feedback is also
common (and necessary for homeostasis).
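The slide's truth table (A B → X: 00→1, 01→1, 10→0, 11→1) can be stored directly as a lookup from input pattern to output. A minimal sketch; the names `TRUTH_TABLE_X` and `update_x` are illustrative and not from the slides. This particular table happens to equal X = (NOT A) OR B.

```python
# Truth table for gene X as a function of its regulators A and B,
# copied from the slide (1 = expressed, 0 = not expressed).
TRUTH_TABLE_X = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 0,
    (1, 1): 1,
}


def update_x(a: int, b: int) -> int:
    """Next state of X given the current states of A and B."""
    return TRUTH_TABLE_X[(a, b)]


# The same rule as a logical expression: X = (NOT A) OR B.
for (a, b), x in TRUTH_TABLE_X.items():
    assert x == int((not a) or b)
```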
10
Dynamics of Boolean Networks
(Figure: the 0/1 states of genes A, B, C, D, E, and F updated over time.)
11
State Space of Boolean Networks

equate cellular states (or fates) with
attractors.
attractor states are stable under small
perturbations
most perturbations cause the network to flow back
to the attractor.
some genes are more important and changing their
activation can cause the system to transition to
a different attractor.
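Attractors can be found by iterating the synchronous update from a start state until some state repeats; the repeating cycle is the attractor. A minimal sketch on a hypothetical 3-gene network (A' = NOT C, B' = A, C' = B) that is not taken from the slides; it happens to have two attractors, a 6-state cycle and a 2-state cycle.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int, int]  # (A, B, C)


def step(state: State) -> State:
    """Hypothetical 3-gene network, updated synchronously:
    A' = NOT C, B' = A, C' = B (a negative feedback loop)."""
    a, b, c = state
    return (int(not c), a, b)


def find_attractor(start: State, update: Callable[[State], State]) -> List[State]:
    """Follow the trajectory from `start` until a state repeats and
    return the states of the cycle (the attractor) that was reached."""
    first_visit: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_visit:
        first_visit[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[first_visit[state]:]


# Enumerate all 2**3 start states and collect the distinct attractors.
all_states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
attractors = {frozenset(find_attractor(s, step)) for s in all_states}
print(len(attractors))  # 2: a 6-state cycle and a 2-state cycle
```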

Picture generated using the program DDLab.
12
Boolean model of the yeast filamentation network
Taylor, Galitski
13
But can we extract meaningful biological
information from gene expression data entirely in
the binary domain?

We reasoned that if genes quantized to only
two levels (1 or 0) were not informative in
separating known subclasses of tumors, there
would be little hope for Boolean inference of
real genetic networks.

14
Gene expression analysis in the binary domain

When binary gene expression data are compared
using Hamming distance as the similarity metric
and visualized with multidimensional scaling, a
separation between different subtypes of gliomas
is evident.

Shmulevich, I. and Zhang, W. (2002)
Bioinformatics 18(4), 555-565.
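The Hamming distance between two binarized samples is simply the number of genes whose 0/1 state differs. A minimal sketch of the pairwise distance matrix, which could then be passed to any multidimensional scaling routine that accepts a precomputed dissimilarity matrix; the function names are illustrative.

```python
from typing import List, Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of genes whose binary state differs between two samples."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(samples: List[Sequence[int]]) -> List[List[int]]:
    """Pairwise Hamming distances between binarized samples."""
    n = len(samples)
    return [[hamming(samples[i], samples[j]) for j in range(n)] for i in range(n)]
```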
15
Boolean Framework

Limited amounts of data and the noisy nature of
the measurements can make useful quantitative
inferences problematic and a coarse-scale
qualitative modeling approach seems to be
justified.
Boolean idealization enormously simplifies the
modeling task.
We wish to study the collective regulatory
behavior without specific quantitative details.
Boolean networks qualitatively capture typical
genetic behavior.

Albert, R. and Othmer, H.G. (2003) J. Theor. Biol.
223, 1-18.
Mendoza, L., Thieffry, D. and Alvarez-Buylla, R.E.
(1999) Bioinformatics 15, 593-606.
Huang, S. and Ingber, D.E. (2000) Exp. Cell Res.
261, 91-103.
Li, F., Long, T., Lu, Y., Ouyang, Q. and Tang, C.
(2004) PNAS 101(14), 4781-6.

17
Probabilistic Boolean Networks (PBN)

Share the appealing rule-based properties of
Boolean networks.
Robust in the face of uncertainty.
Dynamic behavior can be studied in the context of
Markov Chains.
Boolean networks are just special cases.
Close relationship to (dynamic) Bayesian networks:
both explicitly represent probabilistic
relationships between genes, and both can
represent the same joint probability
distribution (Lähdesmäki et al. (2006) Signal
Processing 86(4), 814-834).
Allow quantification of the influence of genes on
other genes (stay tuned for examples).

Shmulevich et al. (2002) Proceedings of the IEEE,
90(11), 1778-1792.
18
Basic structure of PBNs
If we have several good competing predictors
(functions) for a given gene and each one has
determinative power, we should not put all our
faith in one of them!
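One way to picture this: each gene keeps a list of candidate predictors with selection probabilities, and at each time step one predictor per gene is drawn and applied. A minimal sketch assuming the variant in which each gene chooses its predictor independently at every step; `pbn_step` and the 2-gene example are hypothetical. With a single predictor of probability 1 per gene, it reduces to an ordinary Boolean network.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Predictor = Callable[[State], int]


def pbn_step(
    state: State,
    predictors: Sequence[Sequence[Tuple[Predictor, float]]],
    rng: random.Random,
) -> State:
    """One synchronous update: for every gene, pick one of its candidate
    predictors with the given selection probability and apply it."""
    next_state: List[int] = []
    for candidates in predictors:
        funcs = [f for f, _ in candidates]
        weights = [w for _, w in candidates]
        chosen = rng.choices(funcs, weights=weights, k=1)[0]
        next_state.append(chosen(state))
    return tuple(next_state)


# Hypothetical 2-gene example: gene 0 has two competing predictors.
predictors = [
    [(lambda s: s[1], 0.7), (lambda s: 1 - s[1], 0.3)],  # gene 0
    [(lambda s: s[0], 1.0)],                              # gene 1
]
rng = random.Random(42)
state: State = (1, 0)
for _ in range(5):
    state = pbn_step(state, predictors, rng)
```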
19
Model Inference from Gene Expression Data

Two approaches
Coefficient of Determination (Dougherty et al.
2000)
Best-Fit Extensions

Lähdesmäki et al. (2003) Machine Learning, 52,
147-167.
20
Coefficient of Determination (COD)

COD is used to discover associations between
variables.
It measures the degree to which the expression
levels of an observed gene set can be used to
improve the prediction of the expression of a
target gene relative to the best possible
prediction in the absence of observations.
Using the COD, one can find sets of genes related
multivariately to a given target gene.

21
COD Definition
Target gene: x_i
Observed genes
Optimal predictor: f
θ_i = (ε_i - ε_opt) / ε_i
where ε_i is the error of the best (constant)
estimate of x_i in the absence of any conditional
variables, and ε_opt is the optimal error achieved
by f.
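For binary data, both errors can be estimated by counting. The COD is θ = (ε_i - ε_opt) / ε_i. The best constant predictor guesses the majority value of the target. The best predictor f guesses the majority target value seen with each observed input pattern. A minimal plug-in (resubstitution) sketch. It is not necessarily the estimator used by Dougherty et al., who also address small-sample estimation issues. The function name `cod` is illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def cod(observed: List[Sequence[int]], target: Sequence[int]) -> float:
    """Plug-in estimate of the coefficient of determination
    theta = (eps_i - eps_opt) / eps_i from binary samples.

    observed[m] holds the states of the observed genes in sample m,
    target[m] the state of the target gene in the same sample."""
    n = len(target)
    ones = sum(target)
    # Best constant predictor: always guess the majority value.
    eps_i = min(ones, n - ones) / n
    if eps_i == 0:
        raise ValueError("COD is undefined for a constant target")
    # Best predictor f: for each observed input pattern, guess the
    # majority target value seen with that pattern.
    counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for x, y in zip(observed, target):
        counts[tuple(x)][y] += 1
    eps_opt = sum(min(c[0], c[1]) for c in counts.values()) / n
    return (eps_i - eps_opt) / eps_i
```

A target that is the XOR of two observed genes gets COD 1, while an observed gene unrelated to the target gets COD 0.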
22
Constraints During Inference

Constraining the class of predictors can have
advantages
lessening the data requirements for reliable
estimation
incorporating prior knowledge of the class of
functions representing genetic interactions
certain classes of functions are more plausible
from the point of view of evolution, noise
resilience, network dynamics, etc.

23
Example of Constraint: Post Classes
Shmulevich et al. (2003) PNAS 100(19),
10734-10739.
Emil Post (1897-1954)

The class is sufficiently large (this is
important for inference).
An abundance of functions from this class will
tend to prevent chaotic behavior in networks.
Eukaryotic cells are not chaotic! (Shmulevich et
al. (2005) PNAS 102(38), 13439-13444.)
Functions from this class have a natural way to
ensure robustness against noise and uncertainty.

24
Post Class Constraints During Inference

We compared the Post classes to the class of all
Boolean functions (i.e. no constraint) by
estimating the corresponding prediction error for
a set of target genes, using available gene
expression data.
We found that the optimal error of Post functions
compares favorably with optimal error without
constraint.
A hypothesis testing-based study gives no
statistically significant evidence against the
use of constrained function classes (i.e. cost of
constraint).
Thus, Post classes are also plausible in light of
experimental data.

25
Subnetworks: Theory and Examples

Aim: discover relatively small subnetworks
whose genes interact significantly and
whose genes are not strongly influenced by genes
outside the subnetwork.
Principle of Autonomy
Start with a seed gene set and iteratively
adjoin new genes so as to enhance subnetwork
autonomy.

26
Growing Algorithm
To achieve network autonomy, both of these
strengths of connections should be high.
The sensitivity of Y from the outside should be
small.
Various stopping criteria can be used
Hashimoto et al. (2004) Bioinformatics 20(8)
1241-1247.
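The growing procedure can be outlined as a greedy loop. Start from the seed set, repeatedly adjoin the candidate gene that most improves an autonomy score, and stop when nothing improves or a size limit is hit. This is a minimal sketch only. The `score` callable is a hypothetical placeholder for the specific influence and sensitivity measures of Hashimoto et al. (2004), which are not reproduced here. "No improvement" is one of the possible stopping criteria.

```python
from typing import Callable, FrozenSet, Iterable, Set


def grow_subnetwork(
    seed: Iterable[str],
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    max_size: int,
) -> Set[str]:
    """Greedily adjoin the candidate gene that most increases `score`;
    stop when no candidate improves it or `max_size` is reached."""
    current = frozenset(seed)
    pool = set(candidates) - current
    best = score(current)
    while pool and len(current) < max_size:
        gene, value = max(((g, score(current | {g})) for g in pool), key=lambda t: t[1])
        if value <= best:  # stopping criterion: no improvement
            break
        current = current | {gene}
        pool.remove(gene)
        best = value
    return set(current)
```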
27
Cancer tissues need nutrients. Gliomas are highly
angiogenic. Expression of VEGF is often elevated.
28
VEGF is elevated in advanced-stage gliomas:
confirmation and localization by tissue
microarray
29
VEGF protein is secreted outside the cells and
binds to its receptor on the endothelial cells to
promote their growth.
30
(Figure: subnetwork with FGF7 (member of the
fibroblast growth factor family), VEGF, PTK7
(tyrosine kinase receptor), and GRB2; FSHR
(follicle-stimulating hormone receptor) is also
labeled.)

The protein products of all four genes are part
of signal transduction pathways that involve
surface tyrosine kinase receptors.
These receptors, when activated, recruit a number
of adaptor proteins to relay the signal to
downstream molecules.
GRB2 is one of the most crucial adaptors that
have been identified.
GRB2 is also a target for cancer intervention
because of its link to multiple growth factor
signal transduction pathways.
32

Such relationships should also be validated
experimentally.
The networks built from our models provide
valuable theoretical guidance for further
experiments.

IGFBP2 is overexpressed in high-grade gliomas
IGFBP2 contributes to increased cell invasion.

34
IGFBP2 is elevated in advanced-stage gliomas:
confirmation and localization by tissue
microarray
35
IGFBP2 promotes glioma cell invasion in vitro
(Figure panels: vector control, low-IGFBP2 clone,
high-IGFBP2 clone 1, high-IGFBP2 clone 2.)
36
A. Niemistö, L. Hu, O. Yli-Harja, W. Zhang, I.
Shmulevich, "Quantification of in vitro cell
invasion through image analysis," International
Conference of the IEEE Engineering in Medicine
and Biology Society (EMBS'04), San Francisco,
California, USA, Sep. 1-5, 2004.
37

A review of the literature showed that Cazals et
al. (1999) indeed demonstrated that NF-κB
activated the IGFBP2 promoter in lung alveolar
epithelial cells.

(Diagram: IGFBP2 and NF-κB.)
38

Higher NF-κB activity in IGFBP2-overexpressing
cells was also found.
Transient transfection of an IGFBP2-expressing
vector together with an NF-κB promoter reporter
gene construct did not lead to increased NF-κB
activity, suggesting an indirect effect of IGFBP2
on NF-κB.

(Diagram: IGFBP2 and TNFR2.)

Our real-time PCR data showed that in stable
IGFBP2-overexpressing cell lines, IGFBP2 indeed
enhances ILK expression.
In addition, IGFBP2 contains an RGD domain,
implying its interaction with integrin molecules.
ILK is in the integrin signal transduction
pathway.

(Diagram: ILK and NF-κB.)

Studies also showed that IGFBP2 affects cell
apoptosis, and TNFR2 is a known regulator of
apoptosis.

39
PBN web page
http://personal.systemsbiology.net/ilya/PBN/PBN.htm

Reprints
Software (BN/PBN MATLAB Toolbox)
Posters/Presentations
Workshops
Links
PBN People

40
PBN Collaborators
Wei Zhang, Harri Lähdesmäki, Olli Yli-Harja,
Jaakko Astola, Edward Dougherty, Ronaldo Hashimoto,
Marcel Brun, Seungchan Kim, Edward Suh, Huai Li,
Michael Bittner
Support: NIH/NIGMS R21 GM070600-01; NIH/NIGMS R01
GM072855-01
41
Part II
42
Joint work with
Stu Kauffman
Max Aldana
43
Order/Chaos

A broad body of work over the past 35 years has
shown that a variety of model genetic regulatory
networks behave in two broad regimes, ordered and
chaotic, with an analytically and numerically
demonstrated phase transition between the two.

44
Edge of chaos

The boundary between order and chaos is called
the complex regime or the critical phase.
The system can undergo a kind of phase
transition.
Networks are most evolvable at the edge of
chaos.
A living system in a variable environment must
strike a balance: malleability vs. stability.
Must be stable, but not so stable that it remains
forever static.
Must be malleable, but not so malleable that it
is fragile in the face of perturbations.

45
Plausible and long-standing hypothesis: real
cells lie in the ordered regime or are critical.
Life at the edge of chaos
There have been no experimental data supporting
this hypothesis.
46
Ordered networks

Homeostasis
A modest number of small recurrent patterns of
gene activity (attractors)
plausible models of the diverse cell types (or
cell fates) of an organism
the phenotypic traits of the organism are encoded
in the dynamical attractors of its underlying
genetic regulatory network
Confined avalanches of gene activity changes
following transient perturbations in the activity
of single genes
i.e. confined damage spreading

47
Chaotic networks

Nearby states lie on trajectories that diverge;
hence, these networks fail to exhibit a natural
basis for homeostasis
Have enormous attractors whose sizes scale
exponentially with the number of genes
Exhibit vast avalanches of gene activity
alterations following transient perturbations to
single gene activities

48
The model class

Random Boolean Networks (RBNs) - Kauffman (1969)
ensemble approach
One of the most intensively studied models of
discrete dynamical systems.
Sustained interest from biology and physics
communities.
Considered for many years as prototypes of
nonlinear dynamical systems.
RBNs are
Structurally simple yet capable of remarkably
rich complex behavior!

49
Connectivity K
Mean number of input variables per gene
(the topology may be, e.g., scale-free)
50
Bias

The bias p of a random function is the
probability that it takes on the value 1.
If p = 0.5, then the function is unbiased.
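In the RBN ensemble, a random function with K inputs and bias p is a truth table of 2^K entries, each independently 1 with probability p. A minimal sketch; the function name is illustrative.

```python
import random
from typing import List


def random_boolean_function(k: int, p: float, rng: random.Random) -> List[int]:
    """Truth table (length 2**k) of a random Boolean function with bias p:
    each output bit is independently 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(2 ** k)]


rng = random.Random(1)
table = random_boolean_function(4, 0.75, rng)
fraction_of_ones = sum(table) / len(table)  # close to p only on average
```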

51
Connectivity, bias, and the phase transition
(Figure: average network sensitivity, with regions
labeled Order, Critical Phase, and Chaos.)
Shmulevich and Kauffman (2004) Physical Review
Letters 93(4), 048701.
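The sensitivity of a Boolean function is the number of single-input flips that change its output, averaged over all inputs. For random functions with K inputs and bias p, the expected average sensitivity is 2Kp(1 - p). Values below 1 correspond to order, 1 to the critical phase, and above 1 to chaos. A minimal sketch; the function names and the tolerance used to call a value "critical" are illustrative assumptions.

```python
import math
from typing import Sequence


def average_sensitivity(table: Sequence[int], k: int) -> float:
    """Average, over all 2**k inputs, of the number of single-bit input
    flips that change the function's output. table[i] is the output for
    the input whose bits are the binary digits of i."""
    total = 0
    for i in range(2 ** k):
        total += sum(table[i] != table[i ^ (1 << j)] for j in range(k))
    return total / 2 ** k


def expected_sensitivity(k: float, p: float) -> float:
    """Expected average sensitivity of a random function with K inputs and bias p."""
    return 2.0 * k * p * (1.0 - p)


def regime(s: float) -> str:
    """Classify a sensitivity value (tolerance for 'critical' is arbitrary)."""
    if math.isclose(s, 1.0, abs_tol=1e-6):
        return "critical"
    return "ordered" if s < 1.0 else "chaotic"
```

For example, the two-input function X of A and B from the Boolean network example (table [1, 1, 0, 1], indexed by 2A + B) has average sensitivity exactly 1.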
52
Phase transition

RBNs can be tuned to undergo a phase transition
by
tuning the connectivity K
tuning the bias p
tuning the scale-free exponent γ
(Aldana and Cluzel (2003) PNAS 100(15), 8710-4)
tuning the abundance of functional classes
(Shmulevich et al. (2003) PNAS 100(19), 10734-9)

53
Our approach

Measure and compare the complexity of time series
data of HeLa cells with that of mock data
generated by RBNs operating in the ordered,
critical, and chaotic regimes.
We use the Lempel-Ziv (LZ) measure of complexity.
Dataset: Whitfield et al. (2002) Mol. Biol. Cell
13, 1977-2000:
synchronized HeLa cells, 48 time points at 1-hour
intervals, 29,621 distinct genes.

54
Lempel-Ziv Complexity
The algorithm parses the sequence into the
shortest words that have not occurred previously,
and the complexity is defined as the number of
such words. Words are unique, except possibly the
last one.
01100101101100100110: LZ complexity 7
01010101010101010101: LZ complexity 3
55
Lempel-Ziv Complexity Example
01100101101100100110
LZ Complexity 7
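The parsing rule can be written as a short loop. Each new word is extended one symbol at a time for as long as it can still be found earlier in the sequence, where the earlier occurrence may overlap the word itself. A minimal sketch of this Lempel-Ziv (1976) style count. It reproduces the complexities 7 and 3 for the two example sequences. The naive substring search is fine for 48-point profiles but is not efficient for long sequences.

```python
def lz_complexity(s: str) -> int:
    """Lempel-Ziv (1976) complexity: number of words in the parsing of s
    into shortest words that cannot be copied from the preceding text."""
    n = len(s)
    if n == 0:
        return 0
    count = 1  # the first symbol is always a new word
    start = 1  # start index of the word currently being parsed
    while start < n:
        length = 1
        # Extend the word while it still occurs earlier in the sequence
        # (the earlier occurrence may overlap the word itself).
        while start + length <= n and s[start:start + length] in s[:start + length - 1]:
            length += 1
        count += 1  # a new word, or the final (possibly repeated) word
        start += length
    return count
```

For the first example, the parse is 0 · 1 · 10 · 010 · 1101 · 100100 · 110.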
56
Lempel-Ziv Complexity some remarks

Universal complexity measure.
Basis of powerful lossless compression schemes
(ZIP, GIF, etc.),
which replace words with a pointer to a previous
occurrence of the same word.
The optimal compression rate approaches the
entropy of the random sequence.
Asymptotically Gaussian; can be used for a
statistical test of randomness.

57
Intuition

Genes in ordered networks have low LZ
complexities.
Genes in chaotic networks have high LZ
complexities.

58
Binarization

We used the well-known k-means algorithm with two
groups, corresponding to the two binary values
(0,1).
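For a single gene's time series, k-means with two groups reduces to a one-dimensional clustering. Values nearer the low centroid become 0, and values nearer the high centroid become 1. A minimal sketch. Initializing the centroids at the minimum and maximum, and mapping a constant profile to all zeros, are assumptions made here for determinism.

```python
from typing import List, Sequence


def binarize_kmeans(values: Sequence[float], max_iter: int = 100) -> List[int]:
    """Two-cluster k-means on one expression profile; 1 = high cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # no variation: treat as a single level
    c0, c1 = float(lo), float(hi)  # initial centroids (an assumption)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each value to the nearer centroid.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_c0 = sum(group0) / len(group0)
        new_c1 = sum(group1) / len(group1)
        if new_c0 == c0 and new_c1 == c1:
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return labels
```

Because the minimum is always closest to the low centroid and the maximum to the high one, neither group can become empty.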

59
Lempel-Ziv complexity distributions of binarized
HeLa data vs. random binary data
60
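(Figure: analysis pipeline. The HeLa time-series
data (29,621 genes by 48 time points) are
binarized, and their LZ complexities are computed.
RBNs in the ordered, critical, and chaotic regimes
generate binary time series (e.g.
01101001101001101011, 10011001100100110110), whose
LZ complexities are also computed. The distance
between the two LZ distributions is computed, and
the regime with the minimum distance is found.)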
61
Distance between LZ distributions
Kullback-Leibler (KL) distance
Euclidean distance
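Both comparisons operate on the empirical distributions of LZ complexities over a common support. The regime whose distribution lies closest to the HeLa distribution is then selected. A minimal sketch. The small `eps` that guards against empty bins in the KL computation is an assumption. KL is not symmetric, so the argument order (data first, model second) is also a choice made here.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def lz_distribution(complexities: Sequence[int], support: Sequence[int]) -> List[float]:
    """Empirical distribution of LZ complexities over a common support."""
    counts = Counter(complexities)
    return [counts[c] / len(complexities) for c in support]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D(p || q); `eps` guards against empty bins in q (an assumption)."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


def closest_regime(
    data: Sequence[float],
    models: Dict[str, Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float],
) -> str:
    """Name of the model distribution nearest to the data distribution."""
    return min(models, key=lambda name: distance(data, models[name]))
```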
62
Three techniques to tune ordered, critical, and
chaotic regimes:

Fix p = 0.5, let K = 1, 2, 3, 4.
Fix K = 4, let p = 0.93301, 0.85355, 0.75, 0.5.
Scale-free topology with connectivity K(γ). Vary
the scale-free exponent γ such that the average
network sensitivity is equal to the cases above.
(Aldana and Cluzel (2003) PNAS 100(15), 8710-4)
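The bias values in the second technique follow from the expected average sensitivity s = 2Kp(1 - p) of random functions. With p = 0.5 and K = 1, 2, 3, 4, s = 0.5, 1, 1.5, 2. Solving 2Kp(1 - p) = s for p ≥ 0.5 gives p = (1 + sqrt(1 - 2s/K)) / 2, which for K = 4 yields exactly the four listed biases. A minimal check; the function name is illustrative.

```python
import math


def bias_for_sensitivity(s: float, k: int) -> float:
    """Bias p >= 0.5 that solves 2*k*p*(1 - p) = s."""
    disc = 1.0 - 2.0 * s / k
    if disc < 0:
        raise ValueError("sensitivity s is unreachable with this k")
    return 0.5 * (1.0 + math.sqrt(disc))


# Sensitivities produced by p = 0.5 and K = 1, 2, 3, 4 ...
targets = [2 * k * 0.5 * 0.5 for k in (1, 2, 3, 4)]
# ... and the biases that reproduce them with K fixed at 4.
biases = [round(bias_for_sensitivity(s, 4), 5) for s in targets]
```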

63
But what about noise?

Wouldn't noise make things look more chaotic?
There are two issues:
In the binary domain, the compound effect of
noise amounts to a certain percentage of values
in the time series data being flipped from zero
to one or vice versa.
Many genes are expressed at levels that are below
those corresponding to pure noise.
Fortunately, using the HeLa data, it is possible
to estimate both the binary noise probability and
the global noise floor level as follows.

64
Estimate the noise floor

There are 963 empty spots on the HeLa
microarrays.
As a conservative estimate, for each of the 48
microarrays, we used the 95th percentile of the
values of the empty spots as the noise floor
level for that array.
Only those genes whose values exceed this global
threshold at all time points are included for
further analysis.
Hence our criteria are very stringent.
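The filtering step can be sketched as two functions. The first computes a per-array floor as the 95th percentile of that array's empty-spot values. The second keeps only the genes that exceed the floor on every array. A minimal sketch using linear-interpolation percentiles via Python's `statistics.quantiles` (Python 3.8+). The interpolation method, the data layout, and the function names are assumptions.

```python
import statistics
from typing import List, Sequence


def noise_floors(empty_spots: Sequence[Sequence[float]], percentile: int = 95) -> List[float]:
    """Per-array noise floor: the given percentile of that array's empty-spot values.
    empty_spots[t] holds the empty-spot intensities measured on array t."""
    return [
        statistics.quantiles(values, n=100, method="inclusive")[percentile - 1]
        for values in empty_spots
    ]


def genes_above_floor(expression: Sequence[Sequence[float]], floors: Sequence[float]) -> List[int]:
    """Indices of genes whose value exceeds the floor on every array (time point)."""
    return [
        g for g, profile in enumerate(expression)
        if all(value > floor for value, floor in zip(profile, floors))
    ]
```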

65
Estimate the noise probability q

We made use of the replicated probes available on
the arrays.
2001 duplicate gene profiles of 48 time points.
Keeping only those that exceeded the global
threshold, we binarized each of the duplicate
profiles and computed the normalized Hamming
distance.

with a 95% bootstrap confidence interval of
[0.32, 0.38].
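The transcript does not show the exact estimator of q. The sketch below covers only the two ingredients named on the slide: the normalized Hamming distance between binarized duplicate profiles, and a percentile bootstrap interval for the mean of those distances. The resampling scheme, the number of bootstrap replicates, and the function names are assumptions.

```python
import random
from typing import Sequence, Tuple


def normalized_hamming(u: Sequence[int], v: Sequence[int]) -> float:
    """Fraction of time points at which two binarized replicate profiles disagree."""
    if len(u) != len(v) or not u:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(a != b for a, b in zip(u, v)) / len(u)


def bootstrap_ci(values: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[round(alpha / 2 * n_boot)]
    upper = means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```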
66
Euclidean (fix p = 0.5, tune K)
Shmulevich et al. (2005) PNAS 102(38), 13439.
67
Kullback-Leibler (fix p = 0.5, tune K)
Shmulevich et al. (2005) PNAS 102(38), 13439.
68
Euclidean (fix K = 4, tune p)
Shmulevich et al. (2005) PNAS 102(38), 13439.
69
Kullback-Leibler (fix K = 4, tune p)
Shmulevich et al. (2005) PNAS 102(38), 13439.
70
Euclidean, scale-free (tune γ)
Shmulevich et al. (2005) PNAS 102(38), 13439.
71
Kullback-Leibler, scale-free (tune γ)
Shmulevich et al. (2005) PNAS 102(38), 13439.
72
Concluding remarks

The results strongly suggest that HeLa cells are
in the ordered regime or are critical, but not
chaotic.
We cannot statistically distinguish between
ordered and critical with these data.
Critical networks appear to predict the
distribution of genes whose activities are
altered in several hundred knock-out mutants of
yeast. (Serra et al. (2004) J. Theor. Biol. 227,
149-157)
It will be important to use more realistic
ensembles of model genetic networks to test
whether our conclusions hold.
