1
Insights from Boolean Modeling of Genetic
Regulatory Networks
  • Ilya Shmulevich

2
Part I
  • Discover and understand the underlying gene
    regulatory mechanisms by means of inferring them
    from data.
  • By using the inferred model, endeavor to make
    useful predictions by mathematical analysis and
    computer simulations.

3
Genetic networks
  • Complex regulatory networks among genes and their
    products control cell behaviors such as
  • cell cycle
  • apoptosis
  • cell differentiation
  • communication between cells in tissues
  • A paramount problem is to understand the
    dynamical interactions among these genes,
    transcription factors, and signaling cascades,
    which govern the integrated behavior of the cell.

Analogy: circuit diagram
4
Clinical Impact
  • Model-based and computational analysis can
  • open up a window on the physiology of an organism
    and disease progression
  • translate into accurate diagnosis, target
    identification, drug development, and treatment.

5
What class of models should be chosen?
  • The selection should be made in view of
  • data requirements
  • goals of modeling and analysis.

(Diagram labels: Goals, Data, Model)
6
Classical tradeoff
  • A fine model with many parameters
  • may be able to capture detailed low-level
    phenomena (protein concentrations, reaction
    kinetics)
  • requires large amounts of data for inference
  • A coarse model with low complexity
  • may succeed in capturing only high-level
    phenomena (e.g. which genes are ON/OFF)
  • requires smaller amounts of data

7
Ockham's Razor
  • Underlies all scientific theory building.
  • Model complexity should never be made higher than
    what is necessary to faithfully explain the
    data.
  • What kind of data do we have and how much?

William of Ockham (1280-1349)
8
Boolean Networks
  • To what extent do such models represent reality?
  • Do we have the right type of data to infer
    these models?
  • What do we hope to learn from them?

9
Basic Structure of Boolean Networks
1 means active/expressed; 0 means inactive/unexpressed.

Boolean function (inputs A and B, output X):

  A  B  X
  0  0  1
  0  1  1
  1  0  0
  1  1  1

In this example, two genes (A and B) regulate
gene X. In principle, any number of input genes
are possible. Positive/negative feedback is also
common (and necessary for homeostasis).
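
The truth table on this slide can be written as a small lookup. This is a minimal sketch: the names `TRUTH_TABLE`, `regulate_x`, and `as_formula` are illustrative and do not come from the slides. The closed form X = (NOT A) OR B is one way to express the same table.

```python
# Truth table from the slide: inputs (A, B) -> output X.
TRUTH_TABLE = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}


def regulate_x(a: int, b: int) -> int:
    """Look up the next value of X from the current values of A and B."""
    return TRUTH_TABLE[(a, b)]


def as_formula(a: int, b: int) -> int:
    """The same table in closed form: X = (NOT A) OR B."""
    return int((not a) or b)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, regulate_x(a, b))
```
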
10
Dynamics of Boolean Networks
(Figure: genes A, B, C, D, E, F updated over time; example state 0 1 1 0 0 1)
11
State Space of Boolean Networks
  • Equate cellular states (or fates) with
    attractors.
  • Attractor states are stable under small
    perturbations.
  • Most perturbations cause the network to flow back
    to the attractor.
  • Some genes are more important, and changing their
    activation can cause the system to transition to
    a different attractor.

Picture generated using the program DDLab.
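
To make the attractor picture concrete, the sketch below enumerates every state of a tiny synchronous Boolean network and follows each one until it repeats. The three update rules are hypothetical (a mutually inhibiting pair plus a follower gene), chosen only so that the toy network has more than one attractor. They are not taken from the slides or from DDLab.

```python
from itertools import product


def step(state):
    """One synchronous update of a hypothetical 3-gene network."""
    a, b, c = state
    # A is repressed by B, B is repressed by A, C copies A.
    return (1 - b, 1 - a, a)


def find_attractors(step_fn, n_genes):
    """Map each attractor (a frozenset of states) to the size of its basin."""
    basins = {}
    for start in product((0, 1), repeat=n_genes):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step_fn(state)
        # The first repeated state marks where the cycle begins.
        cycle = frozenset(seen[seen.index(state):])
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


if __name__ == "__main__":
    for attractor, basin_size in find_attractors(step, 3).items():
        print(sorted(attractor), "basin size:", basin_size)
```

For this toy network the search finds two fixed points and one cycle of length two. Exhaustive enumeration only works for a handful of genes, since the state space has 2^n states.
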
12
Boolean model of the yeast filamentation network
Taylor, Galitski
13
But can we extract meaningful biological
information from gene expression data entirely in
the binary domain?
  • We reasoned that if genes, when quantized to only
    two levels (1 or 0), were not informative in
    separating known subclasses of tumors, then there
    would be little hope for Boolean inference of
    real genetic networks.

14
Gene expression analysis in the binary domain
  • By using binary gene expression data and Hamming
    distance as a similarity metric, a separation
    between different subtypes of gliomas is evident,
    using multidimensional scaling.

Shmulevich, I. and Zhang, W. (2002)
Bioinformatics 18(4), 555-565.
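
As an illustration of the similarity metric named on this slide, the sketch below computes Hamming distances between binarized expression profiles and builds the pairwise distance matrix that a multidimensional scaling routine would take as input. The function names and sample data are made up for the example, and the scaling step itself is not shown.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary profiles differ."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(samples):
    """Pairwise Hamming distances between all samples."""
    return [[hamming(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical binarized profiles (one string per tumor sample).
    samples = ["0110", "0101", "1110"]
    for row in distance_matrix(samples):
        print(row)
```
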
15
Boolean Framework
  • Limited amounts of data and the noisy nature of
    the measurements can make useful quantitative
    inferences problematic and a coarse-scale
    qualitative modeling approach seems to be
    justified.
  • Boolean idealization enormously simplifies the
    modeling task.
  • We wish to study the collective regulatory
    behavior without specific quantitative details.
  • Boolean networks qualitatively capture typical
    genetic behavior.
  • Albert, R. & Othmer, H.G. (2003) J. Theor. Biol.
    223, 1-18.
  • Mendoza, L., Thieffry, D. & Alvarez-Buylla, R.E.
    (1999) Bioinformatics 15, 593-606.
  • Huang, S. & Ingber, D.E. (2000) Exp. Cell Res.
    261, 91-103.
  • Li, F., Long, T., Lu, Y., Ouyang, Q. & Tang, C. (2004)
    PNAS 101(14), 4781-4786.

16
(No Transcript)
17
Probabilistic Boolean Networks (PBN)
  • Share the appealing rule-based properties of
    Boolean networks.
  • Robust in the face of uncertainty.
  • Dynamic behavior can be studied in the context of
    Markov Chains.
  • Boolean networks are just special cases.
  • Close relationship to (dynamic) Bayesian networks
  • Explicitly represent probabilistic relationships
    between genes. (Lähdesmäki et al. (2006) Sig.
    Proc. 86(4), 814-834)
  • Can represent the same joint probability
    distribution.
  • Allow quantification of influence of genes on
    other genes (stay tuned for examples)

Shmulevich et al. (2002) Proceedings of the IEEE,
90(11), 1778-1792.
18
Basic structure of PBNs
If we have several good competing predictors
(functions) for a given gene and each one has
determinative power, don't put all our faith in
one of them!
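
The idea of keeping several predictors per gene can be sketched as follows: at each time step, every gene picks one of its candidate functions at random according to a selection probability and applies it to the current state. This is a minimal illustration, not the toolbox implementation. The two-gene network `PBN`, its functions, and its probabilities are hypothetical.

```python
import random

# Hypothetical two-gene PBN: each gene has a list of (predictor, probability).
PBN = [
    [(lambda s: s[1], 0.7), (lambda s: 1 - s[1], 0.3)],  # gene 0: two predictors
    [(lambda s: s[0] & s[1], 1.0)],                      # gene 1: one predictor
]


def pbn_step(state, pbn, rng):
    """One synchronous update: each gene draws a predictor, then applies it."""
    next_state = []
    for predictors in pbn:
        funcs = [f for f, _ in predictors]
        probs = [p for _, p in predictors]
        chosen = rng.choices(funcs, weights=probs, k=1)[0]
        next_state.append(chosen(state))
    return tuple(next_state)


if __name__ == "__main__":
    rng = random.Random(0)
    state = (0, 1)
    for _ in range(5):
        state = pbn_step(state, PBN, rng)
        print(state)
```

When every gene has exactly one predictor with probability 1, the update is deterministic, which is the sense in which an ordinary Boolean network is a special case.
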
19
Model Inference from Gene Expression Data
  • Two approaches
  • Coefficient of Determination (Dougherty et al.
    2000)
  • Best-Fit Extensions

Lähdesmäki et al. (2003) Machine Learning, 52,
147-167.
20
Coefficient of Determination (COD)
  • COD is used to discover associations between
    variables.
  • It measures the degree to which the expression
    levels of an observed gene set can be used to
    improve the prediction of the expression of a
    target gene relative to the best possible
    prediction in the absence of observations.
  • Using the COD, one can find sets of genes related
    multivariately to a given target gene.

21
COD Definition
Target gene
Observed genes
Optimal Predictor
ε_i is the error of the best (constant) estimate
of x_i in the absence of any conditional
variables; ε_opt is the optimal error achieved by
f.
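
A minimal sketch of how the two errors combine, assuming the standard form of the coefficient of determination, COD = (eps_i - eps_opt) / eps_i, where eps_i is the error of the best constant estimate and eps_opt is the error of the optimal predictor. Both errors are estimated here by plain resubstitution on the same binary samples. That is a simplification made for brevity and not necessarily how the cited papers estimate them. The function name `cod` is illustrative.

```python
from collections import Counter, defaultdict


def cod(observed, target):
    """Resubstitution estimate of the COD for binary data.

    observed: list of tuples of 0/1 (values of the observed genes per sample)
    target:   list of 0/1 (value of the target gene per sample)
    """
    n = len(target)
    ones = sum(target)
    # Error of the best constant guess (always predict the majority value).
    eps_i = min(ones, n - ones) / n
    if eps_i == 0:
        raise ValueError("target is constant, so the COD is undefined")
    counts = defaultdict(Counter)
    for x, y in zip(observed, target):
        counts[x][y] += 1
    # The optimal predictor outputs the majority target value for each
    # observed input pattern; the minority counts are its errors.
    eps_opt = sum(min(c[0], c[1]) for c in counts.values()) / n
    return (eps_i - eps_opt) / eps_i


if __name__ == "__main__":
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    xor_target = [0, 1, 1, 0]
    print(cod(inputs, xor_target))
```

A value near 1 means the observed genes predict the target almost perfectly; a value near 0 means they add nothing beyond the constant guess.
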
22
Constraints During Inference
  • Constraining the class of predictors can have
    advantages
  • lessening the data requirements for reliable
    estimation
  • incorporating prior knowledge of the class of
    functions representing genetic interactions
  • certain classes of functions are more plausible
    from the point of view of evolution, noise
    resilience, network dynamics, etc.

23
Example of Constraint: Post Classes
Shmulevich et al. (2003) PNAS 100(19),
10734-10739.
Emil Post (1897-1954)
  • The class is sufficiently large (this is
    important for inference).
  • An abundance of functions from this class will
    tend to prevent chaotic behavior in networks.
  • Eukaryotic cells are not chaotic! (Shmulevich et
    al. (2005) PNAS 102(38), 13439-13444.)
  • Functions from this class have a natural way to
    ensure robustness against noise and uncertainty.

24
Post Class Constraints During Inference
  • We compared the Post classes to the class of all
    Boolean functions (i.e. no constraint) by
    estimating the corresponding prediction error for
    a set of target genes, using available gene
    expression data.
  • We found that the optimal error of Post functions
    compares favorably with optimal error without
    constraint.
  • A hypothesis testing-based study gives no
    statistically significant evidence against the
    use of constrained function classes (i.e. cost of
    constraint).
  • Thus, Post classes are also plausible in light of
    experimental data.

25
Subnetworks: Theory and Examples
  • Aim: discover relatively small subnetworks
  • whose genes interact significantly and
  • whose genes are not strongly influenced by genes
    outside the subnetwork.
  • Principle of Autonomy
  • Start with a seed gene set and iteratively
    adjoin new genes so as to enhance subnetwork
    autonomy.

26
Growing Algorithm
To achieve network autonomy, both of these
strengths of connections should be high.
The sensitivity of Y from the outside should be
small.
Various stopping criteria can be used.
Hashimoto et al. (2004) Bioinformatics 20(8),
1241-1247.
27
Cancer tissues need nutrients. Gliomas are highly
angiogenic. Expression of VEGF is often elevated.
28
VEGF is elevated in advanced-stage gliomas.
Confirmation and localization by tissue
microarray.
29
VEGF protein is secreted outside the cells and
binds to its receptor on the endothelial cells to
promote their growth.
30
Genes shown in the figure:
  • FGF7 (member of the fibroblast growth factor family)
  • VEGF
  • PTK7 (tyrosine kinase receptor)
  • GRB2
  • The protein products of all four genes are part
    of signal transduction pathways that involve
    surface tyrosine kinase receptors.
  • These receptors, when activated, recruit a number
    of adaptor proteins to relay the signal to
    downstream molecules
  • GRB2 is one of the most crucial adaptors that
    have been identified.
  • GRB2 is also a target for cancer intervention
    because of its link to multiple growth factor
    signal transduction pathways.

FSHR (follicle-stimulating hormone receptor)
31
(No Transcript)
32
  • Such relationships should also be validated
    experimentally.
  • The networks built from our models provide
    valuable theoretical guidance for further
    experiments.

33
  • IGFBP2 is overexpressed in high-grade gliomas
  • IGFBP2 contributes to increased cell invasion.

34
IGFBP2 is elevated in advanced-stage gliomas.
Confirmation and localization by tissue
microarray.
35
IGFBP2 promotes glioma cell invasion in vitro
High IGFBP2 clone 1
Vector
Low IGFBP2 clone
High IGFBP2 clone 2
36
A. Niemistö, L. Hu, O. Yli-Harja, W. Zhang, I.
Shmulevich, "Quantification of in vitro cell
invasion through image analysis," International
Conference of the IEEE Engineering in Medicine
and Biology Society (EMBS'04), San Francisco,
California, USA, Sep. 1-5, 2004.
37
  • A review of the literature showed that Cazals et
    al. (1999) indeed demonstrated that NF-κB
    activated the IGFBP2 promoter in lung alveolar
    epithelial cells.

IGFBP2
NF-κB
38
  • Higher NF-κB activity in IGFBP2-overexpressing
    cells was also found.
  • Transient transfection of an IGFBP2-expressing
    vector together with an NF-κB promoter reporter gene
    construct did not lead to increased NF-κB
    activity, suggesting an indirect effect of IGFBP2
    on NF-κB.

IGFBP2
TNFR2
  • Our real-time PCR data showed that in stable
    IGFBP2-overexpressing cell lines, IGFBP2 indeed
    enhances ILK expression.
  • In addition, IGFBP2 contains an RGD domain,
    implying its interaction with integrin molecules.
  • ILK is in the integrin signal transduction
    pathway.

ILK
NF-κB
  • Studies also showed that IGFBP2 affects cell
    apoptosis and TNFR2 is a known regulator of
    apoptosis

39
PBN web page
http://personal.systemsbiology.net/ilya/PBN/PBN.htm
  • Reprints
  • Software (BN/PBN MATLAB Toolbox)
  • Posters/Presentations
  • Workshops
  • Links
  • PBN People

40
PBN Collaborators
Wei Zhang, Harri Lähdesmäki, Olli Yli-Harja,
Jaakko Astola, Edward Dougherty, Ronaldo Hashimoto,
Marcel Brun, Seungchan Kim, Edward Suh, Huai Li,
Michael Bittner
Support: NIH/NIGMS R21 GM070600-01, NIH/NIGMS R01
GM072855-01
41
Part II
42
Joint work with
Stu Kauffman
Max Aldana
43
Order/Chaos
  • A broad body of work over the past 35 years has
    shown that a variety of model genetic regulatory
    networks behave in two broad regimes, ordered and
    chaotic, with an analytically and numerically
    demonstrated phase transition between the two.

44
Edge of chaos
  • The boundary between order and chaos is called
    the complex regime or the critical phase.
  • The system can undergo a kind of phase
    transition.
  • Networks are most evolvable at the edge of
    chaos.
  • Living system in a variable environment
  • Strike a balance: malleability vs. stability
  • Must be stable, but not so stable that it remains
    forever static.
  • Must be malleable, but not so malleable that it
    is fragile in the face of perturbations.

45
Plausible and long-standing hypothesis: real
cells lie in the ordered regime or are critical.
Life at the edge of chaos
There has been no experimental data supporting
this hypothesis.
46
Ordered networks
  • Homeostasis
  • A modest number of small recurrent patterns of
    gene activity (attractors)
  • plausible models of the diverse cell types (or
    cell fates) of an organism
  • the phenotypic traits of the organism are encoded
    in the dynamical attractors of its underlying
    genetic regulatory network
  • Confined avalanches of gene activity changes
    following transient perturbations in the activity
    of single genes
  • i.e. confined damage spreading

47
Chaotic networks
  • Nearby states lie on trajectories that diverge
  • hence, fail to exhibit a natural basis for
    homeostasis
  • Have enormous attractors whose sizes scale
    exponentially with the number of genes
  • Exhibit vast avalanches of gene activity
    alterations following transient perturbations to
    single gene activities

48
The model class
  • Random Boolean Networks (RBNs) - Kauffman (1969)
    ensemble approach
  • One of the most intensively studied models of
    discrete dynamical systems.
  • Sustained interest from biology and physics
    communities.
  • Considered for many years as prototypes of
    nonlinear dynamical systems.
  • RBNs are
  • Structurally simple yet capable of remarkably
    rich complex behavior!

49
Connectivity
Mean number of input variables
(e.g. scale-free)
50
Bias
  • The bias p of a random function is the
    probability that it takes on the value 1.
  • If p = 0.5, then the function is unbiased.
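
A random Boolean network with a given connectivity and bias can be generated along these lines. This is a sketch under simple assumptions: every gene gets exactly K distinct inputs chosen uniformly at random, and each truth-table entry is set to 1 independently with probability p. The function names are illustrative and not taken from any published toolbox.

```python
import random


def random_boolean_network(n, k, p, rng):
    """Build n genes, each with k random inputs and a random truth table of bias p."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [
        [1 if rng.random() < p else 0 for _ in range(2 ** k)]
        for _ in range(n)
    ]
    return inputs, tables


def rbn_step(state, inputs, tables):
    """Synchronous update: each gene reads its inputs and looks up its table."""
    next_state = []
    for gene_inputs, table in zip(inputs, tables):
        index = 0
        for g in gene_inputs:
            index = (index << 1) | state[g]
        next_state.append(table[index])
    return tuple(next_state)


if __name__ == "__main__":
    rng = random.Random(1)
    inputs, tables = random_boolean_network(n=10, k=2, p=0.5, rng=rng)
    state = tuple(rng.randint(0, 1) for _ in range(10))
    for _ in range(5):
        state = rbn_step(state, inputs, tables)
        print("".join(map(str, state)))
```
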

51
Connectivity, bias, and the phase transition
(Figure labels: Average Network Sensitivity; regions Chaos, Critical Phase, Order)
Shmulevich & Kauffman (2004) Physical Review
Letters 93(4), 048701.
52
Phase transition
  • RBNs can be tuned to undergo a phase transition
    by
  • tuning the connectivity K
  • tuning the bias p
  • tuning the scale-free exponent γ
  • Aldana & Cluzel (2003) PNAS 100(15), 8710-8714.
  • tuning abundance of functional classes
  • Shmulevich et al. (2003) PNAS 100(19), 10734-10739.

53
Our approach
  • Measure and compare the complexity of time series
    data of HeLa cells with that of mock data
    generated by RBNs operating in the ordered,
    critical, and chaotic regimes.
  • We use the Lempel-Ziv (LZ) measure of complexity.
  • Dataset: Whitfield et al. (2002) Mol. Biol. Cell
    13, 1977-2000.
  • Synchronized HeLa cells; 48 time points at 1-hour
    intervals; 29,621 distinct genes.

54
Lempel-Ziv Complexity
The algorithm parses the sequence into shortest
words that have not occurred previously and the
complexity is defined as the number of such
words. Words are unique, except possibly the last
one.
01100101101100100110: LZ complexity 7
01010101010101010101: LZ complexity 3
55
Lempel-Ziv Complexity Example
01100101101100100110
LZ Complexity 7
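
The parsing rule can be sketched in a few lines. The version below reads "not occurred previously" as: the candidate word must not appear anywhere in the sequence up to, but not including, its own last symbol, so a word may overlap the prefix it is compared against. That reading is an assumption, but it reproduces both values quoted on the slides. The function name `lz_words` is illustrative.

```python
def lz_words(seq: str):
    """Parse seq into the shortest words not seen before; return the list of words."""
    words = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        # Extend the word while it already occurs in the prefix seq[:j-1].
        while j <= n and seq[i:j] in seq[:j - 1]:
            j += 1
        words.append(seq[i:j])
        i = j
    return words


def lz_complexity(seq: str) -> int:
    """Lempel-Ziv complexity: the number of words in the parsing."""
    return len(lz_words(seq))


if __name__ == "__main__":
    for s in ("01100101101100100110", "01010101010101010101"):
        print(" | ".join(lz_words(s)), "->", lz_complexity(s))
```

The substring search makes this quadratic or worse in the sequence length, which is fine for 48-point profiles but not for long sequences.
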
56
Lempel-Ziv Complexity: some remarks
  • Universal complexity measure
  • Basis of powerful lossless compression schemes
    (ZIP, GIF, etc.)
  • by replacing words with a pointer to a previous
    occurrence of the same word
  • Optimal compression rate approaches the entropy
    of the random sequence
  • Asymptotically Gaussian; can be used for a
    statistical test of randomness.

57
Intuition
  • Genes in ordered networks have low LZ
    complexities.
  • Genes in chaotic networks have high LZ
    complexities.

58
Binarization
  • We used the well-known k-means algorithm with two
    groups, corresponding to the two binary values
    (0,1).
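
For one-dimensional data, k-means with two groups reduces to finding a threshold halfway between two cluster means. The sketch below does this for a single gene profile. Starting the two centers at the minimum and maximum value is an assumption made here for determinism, and the function name is illustrative; the slides do not say how the algorithm was initialized.

```python
def binarize_two_means(values, max_iter=100):
    """Binarize one expression profile with 1-D k-means (k = 2)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # constant profile: no split is possible
    for _ in range(max_iter):
        threshold = (lo + hi) / 2
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return [1 if v > threshold else 0 for v in values]


if __name__ == "__main__":
    profile = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]  # hypothetical expression values
    print(binarize_two_means(profile))
```
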

59
Lempel-Ziv complexity distributions of binarized
HeLa data vs. random binary data
60
HeLa time-series data
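Flow diagram:
  • HeLa time-series data (29,621 genes by 48 time
    points) -> binarize -> LZ complexities
  • RBN (ordered, critical, or chaotic) -> LZ complexities
  • Example binary profiles: 01101001101001101011,
    10011001100100110110
  • Compute distance between the two LZ complexity
    distributions
  • Find minimum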
61
Distance between LZ distributions
Kullback-Leibler (KL) distance
Euclidean distance
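
Both distances can be computed from normalized histograms of LZ complexity values. This is a sketch under stated assumptions: the histograms share a common support, the Kullback-Leibler distance is taken from the first distribution to the second, and empty bins in the second distribution are floored at a small constant `eps` to avoid division by zero. None of these choices is stated on the slides.

```python
import math
from collections import Counter


def distribution(values, support):
    """Relative frequency of each value in support."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in support]


def euclidean(p, q):
    """Euclidean distance between two distributions on the same support."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler distance D(p || q), flooring q at eps."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


if __name__ == "__main__":
    support = range(1, 11)
    data_lz = [6, 7, 7, 8, 6, 7]   # hypothetical LZ values from real data
    model_lz = [7, 7, 8, 8, 6, 9]  # hypothetical LZ values from a model
    p = distribution(data_lz, support)
    q = distribution(model_lz, support)
    print(euclidean(p, q), kl_divergence(p, q))
```
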
62
Three techniques to tune ordered, critical, and
chaotic regimes.
  • Fix p = 0.5, let K = 1, 2, 3, 4.
  • Fix K = 4, let p = 0.93301, 0.85355, 0.75, 0.5.
  • Scale-free topology with connectivity K(γ). Vary
    scale-free exponent γ such that average network
    sensitivity is equal to the cases above. (Aldana
    & Cluzel (2003) PNAS 100(15), 8710-8714)
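
The parameter values in the first two techniques line up if one uses the expected average sensitivity of a random Boolean function with K inputs and bias p, s = 2Kp(1 - p), with s < 1 ordered, s = 1 critical, and s > 1 chaotic. The sketch below only checks that arithmetic. The function names and the tolerance used to label a value "critical" are choices made for this example.

```python
def average_sensitivity(k, p):
    """Expected average sensitivity of a random function: s = 2*K*p*(1 - p)."""
    return 2 * k * p * (1 - p)


def regime(s, tol=1e-3):
    """Label a sensitivity value as ordered, critical, or chaotic."""
    if abs(s - 1) <= tol:
        return "critical"
    return "ordered" if s < 1 else "chaotic"


if __name__ == "__main__":
    print("Fix p = 0.5, vary K:")
    for k in (1, 2, 3, 4):
        s = average_sensitivity(k, 0.5)
        print(f"  K={k}  s={s:.3f}  {regime(s)}")
    print("Fix K = 4, vary p:")
    for p in (0.93301, 0.85355, 0.75, 0.5):
        s = average_sensitivity(4, p)
        print(f"  p={p}  s={s:.3f}  {regime(s)}")
```

Both sweeps give sensitivities of roughly 0.5, 1, 1.5, and 2, so each technique visits the ordered, critical, and chaotic regimes at matched values.
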

63
But what about noise?
  • Wouldn't noise make things look more chaotic?
  • There are two issues:
  • In the binary domain, the compound effect of
    noise amounts to a certain percentage of values
    in the time series data being flipped from zero
    to one or vice versa.
  • Many genes are expressed at levels that are below
    those corresponding to pure noise.
  • Fortunately, using the HeLa data, it is possible
    to estimate both the binary noise probability and
    the global noise floor level as follows.

64
Estimate the noise floor
  • There are 963 empty spots on the HeLa
    microarrays.
  • As a conservative estimate, for each of the 48
    microarrays, we used the 95th percentile of the
    values of the empty spots as the noise floor
    level for that array.
  • Only those genes whose values exceed this global
    threshold at all time points are included for
    further analysis.
  • Hence our criteria are very stringent.
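
The filtering step above can be sketched as follows. The sketch assumes a nearest-rank percentile and compares each gene's value on an array with that array's own noise floor. Both are assumptions, since the slides do not specify the percentile convention or exactly how the per-array floors form the global threshold. All names and data are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def filter_genes(expression, empty_spots, pct=95):
    """Keep genes whose value exceeds the noise floor on every array.

    expression:  {gene name: [value on each array]}
    empty_spots: [[empty-spot values on array 0], [... on array 1], ...]
    """
    floors = [percentile(spots, pct) for spots in empty_spots]
    return {
        gene: profile
        for gene, profile in expression.items()
        if all(v > floor for v, floor in zip(profile, floors))
    }


if __name__ == "__main__":
    empty = [list(range(1, 21)), list(range(1, 21))]  # two arrays
    genes = {"g1": [25, 30], "g2": [25, 10], "g3": [19, 40]}
    print(sorted(filter_genes(genes, empty)))
```
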

65
Estimate the noise probability q
  • We made use of the replicated probes available on
    the arrays.
  • 2001 duplicate gene profiles of 48 time points.
  • Keeping only those that exceeded the global
    threshold, we binarized each of the duplicate
    profiles and computed the normalized Hamming
    distance.

with a 95% bootstrap confidence interval of
[0.32, 0.38].
66
Euclidean (fix p = 0.5, tune K)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
67
Kullback-Leibler (fix p = 0.5, tune K)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
68
Euclidean (fix K = 4, tune p)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
69
Kullback-Leibler (fix K = 4, tune p)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
70
Euclidean, Scale-free (tune γ)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
71
Kullback-Leibler, Scale-free (tune γ)
Shmulevich et al. (2005) PNAS 102(38), 13439-13444.
72
Concluding remarks
  • The results strongly suggest that HeLa cells are
    in the ordered regime or are critical, but not
    chaotic.
  • We cannot statistically distinguish between
    ordered and critical with these data.
  • Critical networks appear to predict the
    distribution of genes whose activities are
    altered in several hundred knock-out mutants of
    yeast. (Serra et al. (2004) J. Theor. Biol. 227,
    149-157)
  • It will be important to use more realistic
    ensembles of model genetic networks to test
    whether our conclusions hold.