Title: Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome
1. Exploiting transcription factor binding site
clustering to identify cis-regulatory modules
involved in pattern formation in the Drosophila
genome
- ECS289A Presentation
- By Hua Chen
- 2003-3-3
2. Background Knowledge
- A notable feature of cis-regulatory sites:
binding sites for different transcription
factors tend to cluster together in one region
near the gene, forming cis-regulatory modules
(CRMs).
- Searching for individual cis-regulatory sites
yields too many candidate positions, which makes
it difficult to tell the true ones from the
false ones.
- This clustering property of CRMs provides a
feasible method to identify the cis-regulatory
sites in the genome.
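A rough back-of-the-envelope sketch of why single-site searches give too many candidates while clusters are rare by chance. It assumes matches occur independently with a fixed per-position probability (a Poisson approximation); the function names and the numbers in the usage lines (`genome_length`, `p_match`, the window length) are illustrative assumptions, not values from the paper.

```python
import math

def expected_chance_sites(genome_length, p_match, strands=2):
    """Expected number of chance matches when every position on every
    strand matches independently with probability p_match."""
    return genome_length * strands * p_match

def prob_window_has_at_least(k, window, p_match, strands=2):
    """Poisson approximation of the chance that one window of `window` bp
    contains at least k chance matches."""
    lam = window * strands * p_match  # expected chance matches per window
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Illustrative (assumed) numbers only.
genome_length = 120_000_000   # assumed genome size in bp
p_match = 1e-3                # assumed per-position match probability
print(expected_chance_sites(genome_length, p_match))
print(prob_window_has_at_least(1, 500, p_match))
print(prob_window_has_at_least(10, 500, p_match))
```

Under these assumptions a single site is expected by chance in many places, while a dense cluster inside one short window is far less likely, which is the intuition behind using clustering as a filter.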
3. One example of a CRM in Drosophila: the eve gene
4. Targets
- Use the clustering of binding sites in
cis-regulatory modules as a method to identify
functional motifs.
- Test the method on known CRM regions.
- Search the genome to discover CRMs and confirm
the results by experiment.
The System Investigated
- The early Drosophila embryo.
- Five transcription factors are investigated:
Bcd, Cad, Hb, Kr and Kni.
5. Methods
- Collect transcription factor binding sequences
from previous laboratory work and align them.
- Construct position weight matrices (PWMs) for
the conserved motifs.
- Test the method on the known CRMs.
- Search genome-wide for unknown regulatory
regions.
- Use RNA in situ hybridization and microarray
hybridization to test whether the predicted
regions are near genes regulated by the
transcription factors.
- One special case, the giant gene, is investigated
further with transgenic and mutant embryos.
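The PWM construction and site-matching steps listed above can be sketched as follows. This is a minimal illustration, not the Patser or Cis-Analyst implementation: it assumes equal-length aligned sites, a uniform background, a pseudocount of 1, and a plain score threshold (`min_score`, a hypothetical stand-in for a cutoff such as site_p). The toy sites are made up and are not real binding data; only the forward strand is scanned.

```python
import math

def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM (one dict per column) from equal-length aligned sites,
    assuming a uniform 0.25 background for each base."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must all have the same length")
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    """Sum of column log-odds for a sequence as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))

def scan(pwm, sequence, min_score):
    """Return (position, score) for every forward-strand match >= min_score."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pwm, sequence[i:i + width])
        if s >= min_score:
            hits.append((i, s))
    return hits

# Toy, made-up aligned sites (not real footprint data).
toy_sites = ["TTTATG", "TTTATG", "TTTACG", "TTTATA"]
pwm = build_pwm(toy_sites)
print(scan(pwm, "ACGTTTATGACG", min_score=5.0))
```

Raising `min_score` keeps only the closest matches to the motif, which is the role the cutoff plays in discarding predicted low-affinity sites.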
6. Step 1: Collection and Alignment of TF Binding
Sites
- Bcd, Cad, Hb, Kr and Kni binding sequences were
determined by in vitro DNase protection assays.
- The sequences are aligned with MEME.
8. Step 2: Construction of PWMs and Searching
- Patser is used to construct the position weight
matrix.
- Cis-Analyst is used to identify the potential
binding sites matching the PWM in the
Drosophila genome.
- A user-defined cutoff parameter (site_p)
eliminates predicted low-affinity sites.
- The sequence is searched with a window of a
specified length.
- Windows that contain at least min_sites binding
sites are retained.
- All overlapping windows are merged into a
cluster.
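The window-and-merge procedure above can be sketched as below. This is a simplified illustration, not the Cis-Analyst code: each predicted site is reduced to a single start coordinate, and candidate windows are anchored at site positions rather than slid one base at a time (any window holding enough sites can be shifted so that it starts at its first site). The function name and toy coordinates are hypothetical.

```python
def find_clusters(site_positions, window, min_sites):
    """Return merged (start, end) intervals for windows of `window` bp that
    contain at least `min_sites` predicted sites."""
    positions = sorted(site_positions)
    windows = []
    j = 0
    for i, start in enumerate(positions):
        # Advance j to the first site that falls outside [start, start + window).
        while j < len(positions) and positions[j] < start + window:
            j += 1
        if j - i >= min_sites:
            windows.append((start, start + window))
    # Merge overlapping windows into clusters (windows are already sorted).
    clusters = []
    for start, end in windows:
        if clusters and start <= clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters

# Toy coordinates: two dense groups of sites and one isolated site.
toy_positions = [10, 20, 30, 500, 900, 910, 920, 930]
print(find_clusters(toy_positions, window=100, min_sites=3))
```

In the toy input the isolated site at 500 is dropped, and the two overlapping windows around 900 are reported as one cluster.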
9. Binding Site Sequence for Cad
10. Binding Sites
12. Step 3: Collection of Known CRMs
13. Successful Result: 14/19 known CRMs found with
the search criteria window size = 700 bp, number
of predicted sites > 13
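A recovery figure such as 14/19 can be computed by checking each known CRM against the predicted clusters. The sketch below assumes that a CRM counts as found when it overlaps any cluster; the slides do not state the exact criterion, and the names and coordinates are hypothetical toy values.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share any base."""
    return a[0] < b[1] and b[0] < a[1]

def recovered_crms(known_crms, clusters):
    """Names of known CRMs that overlap at least one predicted cluster."""
    return [name for name, interval in known_crms.items()
            if any(overlaps(interval, c) for c in clusters)]

# Toy data: three known CRMs and two predicted clusters.
toy_known = {"crm_a": (100, 800), "crm_b": (5000, 5600), "crm_c": (9000, 9500)}
toy_clusters = [(300, 1000), (9400, 10100)]

found = recovered_crms(toy_known, toy_clusters)
print(f"{len(found)}/{len(toy_known)} known CRMs recovered: {found}")
```

Repeating this count for several window sizes and site thresholds is one way to choose search criteria before scanning the whole genome.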
14. Step 4: Genome-wide Searching
- 28 clusters identified
- 23 out of 28 fall in regions between genes
- 5 in the intron regions
- 49 genes in the nearby regions.
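A breakdown like the one above (between genes, in introns, nearby genes) can be produced by comparing cluster coordinates with a gene annotation. The sketch below is an assumption-laden illustration: the `Gene` record, the `flank` distance used to define "nearby", and all coordinates are hypothetical, since the slides do not say how nearby regions were defined.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    name: str
    start: int
    end: int
    introns: list = field(default_factory=list)  # list of (start, end) pairs

def classify_cluster(cluster, genes):
    """Label a cluster 'intronic' if it lies inside an intron, 'intergenic'
    if it overlaps no gene, and 'other' otherwise."""
    c_start, c_end = cluster
    overlaps_gene = False
    for gene in genes:
        if any(i_start <= c_start and c_end <= i_end
               for i_start, i_end in gene.introns):
            return "intronic"
        if c_start < gene.end and gene.start < c_end:
            overlaps_gene = True
    return "other" if overlaps_gene else "intergenic"

def nearby_genes(cluster, genes, flank):
    """Names of genes lying within `flank` bp of the cluster (assumed rule)."""
    c_start, c_end = cluster
    return [g.name for g in genes
            if g.start < c_end + flank and c_start - flank < g.end]

# Toy annotation with made-up coordinates.
toy_genes = [Gene("geneA", 1000, 5000, introns=[(2000, 4000)]),
             Gene("geneB", 20000, 24000)]
print(classify_cluster((2500, 3200), toy_genes))
print(classify_cluster((8000, 8700), toy_genes))
print(nearby_genes((8000, 8700), toy_genes, flank=5000))
```

The choice of `flank` directly controls how many candidate target genes each cluster yields, so it would need to be fixed before counting genes for follow-up experiments.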
15. Step 5: Examine the expression pattern of the 49
genes by RNA in situ hybridization and microarray
hybridization
- The 49 genes are examined by hybridization to
see whether their expression patterns indicate
regulation by the TFs.
- 10 of the 28 clusters are near at least one
gene that shows an anterior-posterior expression
pattern (i.e., under regulation of the five TFs).
16. Step 6: The special case of the giant gene
- Its posterior expression is regulated by
Cad, Hb and Kr.
- The cis-regulatory sites are still unknown.
- The predicted CRM nearest to the giant gene is
cloned upstream of a lacZ reporter gene.
- The lacZ reporter shows an expression pattern
similar to that of the giant mRNA.
17. Conclusions
- Binding site clustering is an effective method
for identifying cis-regulatory modules.
- A major obstacle is the paucity of binding data
for most transcription factors; collecting it
requires systematic work.
- Real CRM structure is more complex, so the
method needs to incorporate more complex rules.
18. Reference
- Berman, B.P., Nibu, Y. et al. 2002. Exploiting
transcription factor binding site clustering to
identify cis-regulatory modules involved in
pattern formation in the Drosophila genome.
PNAS 99: 757-762