Gene expression Network dynamics: from microarray data to gene-gene connectivity reconstruction. Reconstruction of c-MYC proto-oncogene regulated genetic network


Gene expression Network dynamics from microarray
data to gene-gene connectivity reconstruction.
Reconstruction of c-MYC proto-oncogene
regulated genetic network
G. C. Castellani, D. Remondini, N. Intrator, B. O'Connell, J. M. Sedivy
Centro L. Galvani Biofisica Bioinformatica e Biocomplessità, Università
Bologna, and Physics Department, Bologna
Institute for Brain and Neural Systems, Brown University,
Providence RI
Gene expression Network dynamics from microarray
data to gene-gene connectivity reconstruction.
Reconstruction of c-MYC proto-oncogene
regulated genetic network
  • Gene significance
  • Temporal structure
  • Gene clustering
  • Model validation

Complex Network Theory and its application to
cellular networks
Complex network theory is a rapidly growing
field of contemporary interdisciplinary
research. Its applications range from
mathematics to physics to biology. The classical
mathematical theory of random graphs was developed
(1957-1960) by Erdős and Rényi.
Some physical problems related
to this approach are percolation, Bose-Einstein
condensation and the Simon problem. Recent
applications to biology focus on neural
networks, immune networks, protein folding, proteomics
and genomics, mainly on the large-scale
organization of biological networks.
One of the most recent theories shown
to have promising applications in the
biological sciences is the so-called theory of
complex networks, which has been applied to
protein-protein interaction and metabolic
networks (Jeong and Barabasi).
Classical Random Graphs
A random graph is a set of nodes and edges
connecting them. The number of edges and the
nodes they attach to are chosen randomly with a
certain probability p.
It has been demonstrated that there exists a
critical probability p_c for the appearance of a
giant cluster (phase transition): p_c ~ N^-1. Another
Erdős-Rényi result is that the degree
distribution (the number of edges of each node)
follows Poisson statistics.
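The giant-cluster transition can be illustrated with a small simulation. This is a minimal sketch, not part of the original presentation: the function names, the graph size and the chosen mean degrees are illustrative assumptions. It builds G(N, p) graphs below, at and above p = 1/(N - 1) and reports the mean degree and the size of the largest connected cluster.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return adjacency sets of a G(n, p) random graph."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:      # each possible edge is present with probability p
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component, found by depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

n = 1000
for c in (0.5, 1.0, 2.0):             # target mean degree c = p * (n - 1)
    adj = erdos_renyi(n, c / (n - 1))
    degrees = [len(nb) for nb in adj.values()]
    print(c, sum(degrees) / n, largest_component_size(adj))
```

Below the critical probability the largest cluster stays small; above it a cluster containing a finite fraction of all nodes appears.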
Extension to Random Graph Theory
In recent years considerable effort has
been devoted to further analysis of the statistics of
random graphs. The major results are summarized
by the so-called Small World and Scale Free
graphs. Small World graphs interpolate
between regular lattices and random graphs.
Scale Free networks are created by two simple
rules: network growth and preferential attachment
(the most connected nodes are the most probable
sites of attachment). Both models give a non-Poisson
degree distribution (a power law in the scale free case).
Moreover, this type of distribution has been observed
in real networks such as the Internet, the C. elegans
brain and metabolic networks, with exponent 2 < γ < 3
and various values for the exponential cutoff parameters kc
and k0.
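The two rules of growth and preferential attachment can be sketched in a few lines. This is an illustrative sketch, not code from the presentation: the function name, the starting core and the parameter m (links added per new node) are assumptions.

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph to n nodes; each new node attaches to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:       # m distinct attachment sites
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

degree = preferential_attachment(5000)
print("mean degree:", sum(degree) / len(degree))
print("max degree:", max(degree))
```

The largest degree ends up far above the mean, which is the signature of a heavy-tailed degree distribution; a Poisson distribution with the same mean would make such hubs extremely unlikely.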
Inadequacy of complete connectivity
Complete connectivity, as well as random
connectivity, is not biologically plausible.
Connectivity changes as a consequence of
developmental changes (i.e. learning, ageing)
appear more appropriate.
Comparison between experimental and theoretical
results on the number of virgin cells during the
lifespan. The number of stable states (which we
identify with memory capacity and with memory
cells) increases as a function of age. We found
similar results (an increase in the number of stable
states through connectivity changes) also for the BCM
model, but the biological interpretation is less
The John Sedivy Lab at Brown University has
designed a new generation of microarrays that
cover approximately one half of the whole rat
genome (roughly 9000 genes). The arrays were
constructed to target the proto-oncogene c-MYC
precisely. This gene
encodes a transcriptional regulator that is
correlated with a wide array of human
malignancies, cellular growth and cell cycle
progression. The database consists of 81
arrays obtained by hybridisation with a rat
fibroblast cell line. These gene expression
measurements were performed in triplicate for
better statistical significance. The complete
data set is divided into three separate
experiments, each of which addresses a specific
problem.
Experiment 1: Comparison of different
cell lines in which c-MYC is expressed at various
levels (null, moderate, over-expressed). This
experiment can reveal the total number of genes
that respond to a sustained loss of c-MYC as well
as those genes that respond to c-MYC
over-expression.
Experiment 2: Analysis of
those cell lines that over-express c-MYC
following stimulation with Tamoxifen (a drug
that has been used to treat both advanced and
early stage breast cancer). These data were
collected during a 16-hour time course. This
experiment reveals the kinetics of the response
to MYC activation and may lead to the
identification of the early-responding
genes.
Experiment 3: Analysis of the time
course of induction with Tamoxifen
performed in the presence of Cycloheximide (a
protein synthesis inhibitor). This experiment
reveals a subset of direct transcriptional
targets of c-MYC.
Our approach to the determination of the c-MYC
regulated network can be summarized in 3 points:
1) A list of genes based on significance analysis
over time points between MYC and control and
within time points (between groups and within
groups (time)).
2) A time translation matrix
calculated on microarrays treated with Tamoxifen
and not treated (T and NT raw data). The
resulting time translation matrix is used to
reconstruct the connectivity matrix between
genes.
3) Model validation for determination of
the error model.
Significance Analysis
  • S0 is an appropriate regularizing factor.
  • Interesting genes are chosen as the union between
    the genes selected with the above methods
  • With this SA we obtain 776 significant genes
    (p < 0.05) if we require significance on 1 time
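The role of a regularizing factor S0 can be illustrated with a regularized t-like score. The exact statistic used in the presentation is not given in this transcript, so the form below (difference of group means divided by the pooled standard error plus s0, in the style of SAM) is an assumption, as are the function name and the example values.

```python
import math
import statistics

def regularized_score(treated, control, s0):
    """SAM-style relative difference for one gene at one time point.

    treated, control: replicate expression values (e.g. triplicates).
    s0: small positive constant added to the standard error so that genes
        with tiny variance do not produce huge scores.
    """
    n1, n2 = len(treated), len(control)
    m1, m2 = statistics.mean(treated), statistics.mean(control)
    ss = sum((x - m1) ** 2 for x in treated) + sum((x - m2) ** 2 for x in control)
    pooled_se = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (pooled_se + s0)

# Hypothetical triplicate measurements (log scale) for two genes.
print(regularized_score([8.1, 8.3, 8.2], [6.0, 6.2, 6.1], s0=0.1))
print(regularized_score([5.01, 5.00, 5.02], [5.00, 4.99, 5.01], s0=0.1))
```

The second gene has a negligible difference but also a very small variance. Without s0 its score would be inflated, while with s0 it stays close to zero.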

Step 2 Linear Markov Model
The selected genes are used for the step 2 of our

The x(t) are the gene expressions at time t and A
is the unknown matrix that we estimate from the time
course (0, 2, 4, 8, 16) of microarray data (T and NT
separately, giving An and At). This is a so-called
inverse problem because the matrix is recovered
from time-dependent data. From appropriate
thresholding of the A matrices we can recover
the connectivity matrix between the genes.
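A minimal sketch of this estimation step, assuming the model has the discrete linear form x(t+1) = A x(t) and that consecutive samples are treated as single steps (both are assumptions, since the transcript gives only the symbol definitions). The function names, the least-squares estimator, the symmetrisation and the threshold value are illustrative choices, not the authors' procedure.

```python
import numpy as np

def fit_transition_matrix(X):
    """Least-squares estimate of A in x(t+1) = A x(t).

    X: array of shape (n_genes, n_times), one column per time point.
    With fewer time points than genes the problem is underdetermined;
    the pseudo-inverse picks the minimum-norm solution.
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    return X1 @ np.linalg.pinv(X0)

def connectivity(A, threshold):
    """Binary gene-gene adjacency: link i-j if |A_ij| or |A_ji| exceeds threshold."""
    strong = np.abs(A) > threshold
    adj = strong | strong.T          # symmetrise
    np.fill_diagonal(adj, False)     # ignore self-regulation
    return adj

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))          # hypothetical: 6 genes, 5 time points
A = fit_transition_matrix(X)
adj = connectivity(A, threshold=0.5)
print(adj.sum(axis=1))               # degree of each gene
```

Because there are many more genes than time points, the data alone do not determine A uniquely. The minimum-norm solution is only one possible way to resolve that ambiguity.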
Network topology
[Figure panels: No Tamoxifen; With Tamoxifen]
Model validation
The different models (data preprocessing,
modeling of gene dynamics, clustering techniques)
have been validated mathematically by means
of residual analysis (errors).
The residuals are small. The Markov
matrix we used is not the original one (computed over 5
time steps) but the validated one: we compute the
matrix on 4 time steps and validate it on
the subsequent one by comparison with the real data.
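The hold-out scheme described here can be sketched as follows. The linear form x(t+1) = A x(t), the pseudo-inverse estimator, the function name and the synthetic three-gene example are all assumptions made for illustration.

```python
import numpy as np

def holdout_residual(X):
    """Fit A on all but the last time point and predict the final one.

    X: array of shape (n_genes, n_times). Returns the relative residual
    ||x_pred - x_true|| / ||x_true|| for the held-out last column.
    """
    train = X[:, :-1]                                  # first n_times - 1 points
    A = train[:, 1:] @ np.linalg.pinv(train[:, :-1])   # fit on training transitions
    x_pred = A @ X[:, -2]                              # one step ahead of the last training point
    x_true = X[:, -1]
    return np.linalg.norm(x_pred - x_true) / np.linalg.norm(x_true)

# Hypothetical noiseless example: 3 genes following a known linear map.
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
x = np.array([1.0, 2.0, 3.0])
columns = [x]
for _ in range(4):
    x = A_true @ x
    columns.append(x)
X = np.column_stack(columns)         # shape (3, 5)
print(holdout_residual(X))
```

In this noiseless toy case the residual is numerically zero. On real data its size gives an estimate of the model error.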
Changing databases
In order to have a better understanding of the
results, both in terms of network topology and
connectivity distribution, we generated 2
databases:
1) One small database with those
genes that were without any doubt affected by
Tamoxifen (50 genes).
2) One larger database with
all the genes that give 2 P on 3 experiments, i.e.
those genes for which we have good measurements
(3444 genes).
50 genes database
For each of the 50 genes, we computed the
connectivity and the clustering coefficient, which
expresses whether the gene is connected to highly
connected or poorly connected genes. The
treatment with Tamoxifen
causes a decrease in clustering in the network, so
it seems that the network becomes less scale
free. This is confirmed by the network
clustering coefficient:
N: Overall graph clustering coefficient 0.840
T: Overall graph clustering coefficient 0.241
The individual
connectivity and clustering changes are
summarized in this table.
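For reference, a minimal sketch of how per-gene and overall clustering coefficients can be computed from a connectivity matrix. It assumes the standard local definition (the fraction of a node's neighbour pairs that are themselves linked) and takes the overall coefficient as the mean of the local values. Whether the presentation used exactly these definitions is not stated, and the function names and the toy network are illustrative.

```python
def local_clustering(adj):
    """Clustering coefficient of each node.

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    For a node with k neighbours, the coefficient is the fraction of the
    k*(k-1)/2 possible neighbour pairs that are themselves linked.
    """
    coeff = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[node] = 0.0        # undefined for k < 2; set to 0 by convention
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        coeff[node] = 2.0 * links / (k * (k - 1))
    return coeff

def overall_clustering(adj):
    """Average of the local coefficients over all nodes."""
    coeff = local_clustering(adj)
    return sum(coeff.values()) / len(coeff)

# Hypothetical 4-gene network: a triangle (0, 1, 2) plus a pendant node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj))
print(overall_clustering(adj))
```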
The 3444 genes database
This large database is used in order to have
better statistics and possibly a distribution fit.
Clearly these distributions are not Poisson and
seem to be power law with an exponential tail.
Fitting the distributions
We fitted the distribution with a generalized
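A fit of this kind could be sketched as follows. The functional form is an assumption: a power law with exponential cutoff, P(k) proportional to (k + k0)^(-γ) exp(-(k + k0)/kc), chosen because it uses the parameters γ, kc and k0 named earlier in the presentation. The fitting method (linear least squares in log space over a grid of trial k0 values), the function name and the synthetic data are illustrative.

```python
import numpy as np

def fit_cutoff_power_law(k, p, k0_grid=(0.0, 1.0, 2.0, 5.0, 10.0)):
    """Fit log p = c - gamma*log(k + k0) - (k + k0)/kc by least squares.

    k, p: degrees and their observed frequencies (p > 0).
    For each trial k0 the model is linear in (c, gamma, 1/kc); the k0
    with the smallest squared error is kept.
    Returns (gamma, kc, k0).
    """
    k, logp = np.asarray(k, float), np.log(np.asarray(p, float))
    best = None
    for k0 in k0_grid:
        s = k + k0
        design = np.column_stack([np.ones_like(s), -np.log(s), -s])
        coef, *_ = np.linalg.lstsq(design, logp, rcond=None)
        err = np.sum((design @ coef - logp) ** 2)
        if best is None or err < best[0]:
            best = (err, coef[1], 1.0 / coef[2], k0)
    return best[1:]

# Hypothetical synthetic distribution with gamma = 2.5, kc = 50, k0 = 1.
k = np.arange(1, 200)
p = (k + 1.0) ** -2.5 * np.exp(-(k + 1.0) / 50.0)
p /= p.sum()
print(fit_cutoff_power_law(k, p))
```

With noisy empirical histograms, degrees with zero observed frequency have to be excluded before taking logarithms, and a log-space fit weights the sparse tail heavily, so the recovered parameters should be read with caution.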
Network Structure (3444 genes)
N: Overall graph clustering coefficient 0.902
T: Overall graph clustering coefficient 0.893
From these results and from the fit parameters it seems
that the N-Network is less scale free, but these
results are strongly affected by noise.
We have looked at the individual connectivity and
clustering coefficient, and their variation
between N and T. The results are encouraging:
among those genes that have changed their
connectivity in a significant way there are
c-MYC targets.
Network Structure (3444 genes)
As an example we report some connectivity changes
in c-MYC target genes:
  • rc_AI178135_at, complement component 1, q
    binding protein: 3, 272
  • U09256_at, transketolase: 13, 39
  • U02553cds_s_at, protein tyrosine phosphatase,
    non-receptor type 16: 133, 146, 390
  • D10853_at, phosphoribosyl pyrophosphate
    amidotransferase: 0, 7
  • M58040_at, transferrin receptor: 1, 27

Connectivity is a very important parameter
for both physical and biological systems.
Connectivity (coupling) changes are the basis
for phase transitions and developmental changes
(ageing, learning and response to external
We have tested the hypothesis that a treatment
with Tamoxifen, which in these engineered cells
leads to c-MYC activation, can be related to
connectivity changes between genes.
Our results show that within the framework of
scale free networks there are changes in gene-gene
The connectivity distributions of N and T are far
from Poisson, with parameters similar to
those found for other systems that show a
scale free distribution with an exponential tail.
One clear result is that the global gene degree
connectivity follows a power law distribution both
without and with Tamoxifen. This result seems to
point out that this type of behaviour is very
If we look at the individual gene connectivity
or at the smaller database, we observe that
there are significant changes induced by the
treatment. For example, the clustering coefficient
changes, and some c-MYC targets show connectivity
and clustering coefficient changes.
These results need to be confirmed and further
analyzed but, to our knowledge, this is the first
attempt to monitor the network connectivity
changes induced by c-MYC activation in
comparison with a basal level.
The Markov approach to gene-gene
connectivity reconstruction is not new (Maritan
2001), but we have introduced matrix validation,
rigorous data discretization and normalization
that can improve the model robustness.
One point that needs further analysis is the
correlation between connectivity change and c-MYC
targets. Our method is not a significance test: it
can only help to look at gene activity as the result of
interactions between genes at the previous time
Finally, we will further improve the model
robustness by time reshuffling and try to test
its predictive performance.