Title: A systems biology approach to the identification and analysis of transcriptional regulatory networks
1. A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes
- Angela K. Dean, Stephen E. Harris, Jianhua Ruan
2. Overview
- Osteocytes: background and motivation
- Review of Biological Central Dogma
- Osteocyte gene set derivation
- Osteocyte purification
- Microarray experiments
- Functional annotation analysis
- Sequence Analysis of promoter regions
- Construction of regulatory network
- Partitioning to define cis-regulatory modules
- Results
3. Background: Cellular functions
- Certain types of cells perform specific biological functions
- Key genes must be activated for these cells to perform correctly
- Osteocytes play an essential role in regulating bone formation and remodeling
- We want to identify these key genes and the activators of these genes
4. Why study osteocyte cells?
- Identifying these key genes (and their activators) involved in the bone-formation process may lead to new targeted therapies
- Applications: osteoporosis, bone loss during space travel, extended bed rest, etc.
5. Molecular Biology Central Dogma
6.
- We want to identify the associations between transcription factors (TFs) and the genes they regulate in order to build a transcriptional regulatory network
7. Osteocyte cells are hard to isolate
- Embedded within the bone matrix, and lacking molecular and cell surface markers, they are seemingly inaccessible
- How to characterize and isolate these cells?
- Solution: create a transgenic mouse carrying an inserted gene that drives fluorescence in osteocytes
8. Isolating osteocytes
- Osteocytes are known to highly express Dentin matrix protein 1 (DMP1)
- A transgene was created in which the same promoter (activation) region as DMP1 drives green fluorescent protein (GFP), then inserted into the mouse to produce the transgenic line
- Cells that highly express DMP1 (osteocytes) will also express GFP
- We can now purify osteocytes from other cells using fluorescence-activated cell sorting
9. Identifying key osteocyte genes using microarray
- Microarray experiments allow us to measure the activity of genes (expression profile)
- We compared the expression profiles of the purified osteocyte cells (+GFP) to non-osteocyte cells (-GFP)
- Identified the top 269 genes expressed > 3-fold higher in +GFP cells than in -GFP cells (FDR-corrected p-value < 0.05)
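The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming the FDR correction is Benjamini-Hochberg (the slide only says "FDR-corrected") and using made-up gene names and intensities; `benjamini_hochberg`, `select_enriched`, and `toy_genes` are hypothetical names, not part of the original analysis.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched(genes, fold_cutoff=3.0, fdr_cutoff=0.05):
    """genes: list of (name, mean_gfp_pos, mean_gfp_neg, raw_p_value)."""
    adj = benjamini_hochberg([g[3] for g in genes])
    return [name for (name, pos, neg, _), q in zip(genes, adj)
            if pos / neg > fold_cutoff and q < fdr_cutoff]


# Hypothetical toy values, for illustration only.
toy_genes = [
    ("geneA", 900.0, 100.0, 0.001),   # 9-fold, significant
    ("geneB", 350.0, 100.0, 0.020),   # 3.5-fold, significant after correction
    ("geneC", 120.0, 100.0, 0.0005),  # significant but only 1.2-fold
    ("geneD", 500.0, 100.0, 0.400),   # 5-fold but not significant
]
print(select_enriched(toy_genes))  # ['geneA', 'geneB']
```

Both conditions must hold: a large fold change with a weak p-value (geneD) or a strong p-value with a small fold change (geneC) is excluded.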
10. Identifying functionally related osteocyte genes
- Each of the 269 genes has one or more GO terms or PIR keywords associated with it
- Gene Ontology (GO) terms describe biological processes, cellular components and molecular functions
- A Protein Information Resource (PIR) keyword is an annotation from the PIR database
11. Functional Annotation Clustering
- For each GO term associated with a gene or group of genes within the 269-gene set, a p-value is computed using the hypergeometric distribution and adjusted for multiple testing using the Benjamini method
- The enrichment score of each cluster is the geometric mean of the individual GO p-values (reported in -log scale)
- The DAVID Bioinformatics Tool was used for the clustering
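As a rough illustration of the two quantities on this slide, the sketch below computes an upper-tail hypergeometric p-value for one annotation term and a cluster enrichment score as the geometric mean of p-values on a -log10 scale. It is not DAVID's implementation; the function names are hypothetical, and the exact p-value variant DAVID uses may differ.

```python
from math import comb, log10


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_score(pvals):
    """Geometric mean of the p-values, expressed as -log10."""
    return -sum(log10(p) for p in pvals) / len(pvals)


# Toy numbers: 10 background genes, 4 annotated, list of 3 with 2 annotated.
print(hypergeom_pvalue(2, 3, 4, 10))
# Toy cluster with two term p-values.
print(enrichment_score([0.01, 0.0001]))
```

A higher enrichment score means the terms in the cluster have, on average, smaller p-values.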
12. Functional annotation clustering results
- As expected, the most enriched clusters relate to extracellular region, system development, etc.
- Cluster 2 relates to bone and, interestingly, Cluster 5 relates to muscle
- We narrowed our 269-gene set to the 98 genes corresponding to bone and muscle
13. Identifying TF Binding Sites in the 98 gene set
- We searched the 5 kb promoter sequence upstream of the transcription start site (TSS) of each gene for known TF binding motifs from the TRANSFAC database, using the rVista tool
- Filtered the TF motifs to keep only those conserved between the mouse and human genomes
- Conservation increases confidence in a predicted binding site
14. Identifying TF Binding Sites in the 98 gene set
- Many of the motifs identified are related to bone and muscle
- 67 of the 98 genes contained over 10 conserved Mef2 binding sites in their promoters
- Bone and muscle genes and their number of conserved Mef2 binding sites
15. Building the transcriptional regulatory network
- Created a network with the 98-gene set and their conserved and enriched TFs as nodes
- An edge between a gene and a TF represents the statistically significant presence of that TF's binding site in the promoter of that gene
- TFs were filtered using conservation AND enrichment to produce more reliable edges and reduce noise
- Enrichment of a TF motif is determined by a p-value based on the number of occurrences in the 5 kb upstream of this gene, as compared to the number of occurrences in the 5 kb upstream of the rest of the genes in the genome
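The "conservation AND enrichment" edge rule can be sketched as follows. The slides do not name the statistical test behind the enrichment p-value, so this example swaps in a simple Poisson upper-tail test against a per-promoter background rate purely for illustration. All names (`poisson_tail`, `build_edges`, `toy_counts`, the gene and TF labels) and all numbers are hypothetical.

```python
from math import exp


def poisson_tail(observed, rate):
    """P(X >= observed) for X ~ Poisson(rate)."""
    term = exp(-rate)  # P(X = 0)
    below = 0.0
    for i in range(observed):
        below += term
        term *= rate / (i + 1)
    return max(0.0, 1.0 - below)


def build_edges(counts, background_rate, conserved, alpha=0.05):
    """Keep a gene-TF edge only if the motif is conserved AND enriched.

    counts: {(gene, tf): motif occurrences in the gene's 5 kb upstream region}
    background_rate: {tf: mean occurrences per 5 kb upstream region genome-wide}
    conserved: set of (gene, tf) pairs with a conserved site
    """
    edges = []
    for (gene, tf), n in sorted(counts.items()):
        p = poisson_tail(n, background_rate[tf])
        if (gene, tf) in conserved and p < alpha:
            edges.append((gene, tf))
    return edges


# Hypothetical toy inputs.
toy_counts = {("gene1", "TF_A"): 12, ("gene1", "TF_B"): 2, ("gene2", "TF_A"): 11}
toy_rates = {"TF_A": 3.0, "TF_B": 2.5}
toy_conserved = {("gene1", "TF_A"), ("gene1", "TF_B")}
print(build_edges(toy_counts, toy_rates, toy_conserved))
```

In the toy data, gene1/TF_B is conserved but not enriched and gene2/TF_A is enriched but not conserved, so only gene1/TF_A survives as an edge.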
16. Modular structure of the regulatory network
- Final network consisted of 98 genes and 153 conserved and over-represented TFs
- To identify possible combinatorial effects of TF binding sites (TFBS), we partitioned the genes in the network using the Q-Cut algorithm
- Q-Cut is a graph partitioning algorithm for finding dense subnetworks (i.e., communities). It optimizes a statistical score called the modularity and automatically determines the most appropriate number of communities
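To make the modularity score concrete, the sketch below evaluates the standard Newman modularity Q of a given partition on an undirected, unweighted graph. It only scores a partition; it does not reproduce Q-Cut's search procedure. The function name and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum over communities of e_c/m - (d_c/(2m))^2.

    edges: list of (u, v) pairs; community: {node: community label}.
    e_c is the number of edges inside community c, d_c its total degree.
    """
    m = len(edges)
    inside = {}
    degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
print(round(modularity(toy_edges, two_groups), 3))  # 0.357
print(round(modularity(toy_edges, one_group), 3))   # 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the kind of comparison a modularity-optimizing method uses to choose both the partition and the number of communities.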
17.
- We reduced noise and created a sparser gene-gene network for better partitioning
- We created this temporary network by assigning a cosine similarity score to each pair of genes according to their shared TFs
- Cosine similarity is a measure of similarity between two vectors (each vector contains 153 slots for the 153 enriched TFs in the 98-gene set)
- Edges between genes represent their similarity score, and this network was converted to a sparse network by connecting each gene to its k nearest neighbors (k = 7) and employing a similarity score cutoff of 0.5
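The sparsification step can be sketched as below. It assumes binary TF vectors and assumes that an edge is kept only when a gene is among another gene's k nearest neighbors and the score is at least the cutoff; the slides do not spell out how the two criteria are combined. The toy vectors use 4 slots instead of 153, and all names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(profiles, k=7, cutoff=0.5):
    """profiles: {gene: TF vector}. Returns a set of undirected gene-gene edges."""
    edges = set()
    for g, vec in profiles.items():
        scored = [(cosine(vec, other), h)
                  for h, other in profiles.items() if h != g]
        # Highest similarity first; ties broken by gene name for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        for score, h in scored[:k]:
            if score >= cutoff:
                edges.add(tuple(sorted((g, h))))
    return edges


# Toy profiles with 4 TF slots (the real vectors have 153).
toy_profiles = {
    "g1": [1, 1, 0, 0],
    "g2": [1, 1, 1, 0],
    "g3": [0, 0, 1, 1],
    "g4": [0, 0, 0, 1],
}
print(sorted(knn_graph(toy_profiles, k=2, cutoff=0.5)))
```

In the toy data, g2 and g3 share one TF but their similarity (about 0.41) falls below the cutoff, so the graph splits into the pairs g1-g2 and g3-g4.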
18. Identifying modules in the initial regulatory network
- Q-Cut was then applied to this gene-gene network, resulting in communities with many common TF binding sites
19. Interesting clusters
- The cluster pictured on this slide shows a strong community structure between 16 genes and their common TFBS
- Representative of many TFs coordinately regulating a small set of genes
20. A putative model of a transcriptional network
- A proposed model was built using the network results
- DMP1 and Sost (highly expressed in osteocytes) are shown to be regulated by Mef2 and Myogenin
21. Putative model used to generate hypotheses
- We now have an ex vivo system for pure osteocytes in a proper microenvironment to conduct experimental validation based on this model
- Here the osteocytes will express osteocyte-specific genes at appropriate levels
- Experiments are currently underway
22. Conclusions
- We used a systems biology method to construct a putative transcriptional regulatory network model for osteocytes, by integrating:
  - Microarray data
  - Functional annotation
  - Comparative genomics
  - Graph-theoretic knowledge
- Many parts of the network can be confirmed by the literature
- Experiments are currently underway to further validate the model