High-throughput SELEX-SAGE method for quantitative modeling of transcription-factor binding sites

Transcript and Presenter's Notes

1
High-throughput SELEX-SAGE method for
quantitative modeling of transcription-factor
binding sites

Emmanuelle Roulet, Stephane Busso, Anamaria A.
Camargo, Andrew H.G. Simpson, Nicolas Mermod,
Philipp Bucher
Nature Biotechnology, August 2002, Vol. 20, pp. 831-835

BioNetworks Journal Club -- Feb. 24, 2003
Slides prepared by Alison Hottes
2
General Background

Transcription factors bind to DNA and enhance or
inhibit gene transcription.
Transcription factor binding is DNA-sequence
dependent.

3
General Problems

Want to figure out which transcription factors
bind where and under what conditions.
This knowledge specifies a large part of the
connectivity and topology of a cell's
transcriptional network.
From a protein's amino acid sequence or structure,
we cannot currently predict its binding
specificity.

4
Common Approaches

Take a few known binding sites for a
transcription factor and generalize to a
consensus sequence.
Find genes likely regulated by a transcription
factor (e.g. microarrays). Search the upstream
DNA sequence of those genes for motifs which
might be the binding site.
Take the consensus and see where else it appears
in a genome.
In both methods, the accuracy is variable and
often unknown.

5
How complex a model is needed?

Example: Suppose a transcription factor binds to
a 25 base pair DNA region.
Option 1: Assign a score for the binding affinity
for each possible 25mer.
Need 4^25 scores (completely unrealistic).
Option 2: Assume each base position acts
independently and equally.
Need 3 × 25 parameters. (Still a lot.)
Gaps are more difficult.
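
The two parameter counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and the count for Option 2 assumes three free weights per position (four bases, with one taken as the reference).

```python
def full_table_size(length: int, alphabet: int = 4) -> int:
    """Scores needed if every possible sequence gets its own entry."""
    return alphabet ** length


def independent_model_size(length: int, alphabet: int = 4) -> int:
    """Free parameters if positions contribute independently.

    Assumption: one base per position serves as the zero reference,
    leaving (alphabet - 1) free weights per position.
    """
    return (alphabet - 1) * length


n_full = full_table_size(25)
n_independent = independent_model_size(25)
print(n_full, n_independent)
```

The gap between the two numbers is the reason a position-independent model is the practical starting point.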

6
For any model

Need a lot of data
How can we get the data?
What can we do with the data once we have it?
This paper illustrates one approach for CTF/NFI:
Binds as a homodimer.
Recognizes sequences like TTGGC(N5)GCCAA.
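
As a point of comparison with the quantitative model that follows, a plain consensus search for TTGGC(N5)GCCAA might look like the sketch below. The function name and the example sequence are made up for illustration, and a strict pattern like this will miss the weaker sites a weight matrix is meant to capture.

```python
import re

# Consensus from the slide: TTGGC, any five bases, GCCAA.
# The lookahead lets overlapping matches be reported as well.
CONSENSUS = re.compile(r"(?=(TTGGC[ACGT]{5}GCCAA))")


def find_consensus_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every exact consensus match."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(dna.upper())]


# Invented example sequence containing one consensus site.
example = "acgtTTGGCATCGAGCCAAtt"
sites = find_consensus_sites(example)
print(sites)
```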

7
Initial Model

Weight matrix
Models multiple binding modes
Given a sequence, can produce a score.
How would we generate sequences consistent with
the model?
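
To make "given a sequence, can produce a score" concrete, here is a minimal sketch of weight-matrix scoring. The four-position matrix and its values are invented for illustration and are not the CTF/NFI weights from the paper; the sketch also ignores the multiple binding modes mentioned above.

```python
# Hypothetical weight matrix (values invented): one dict per position,
# mapping base -> weight, with 0 for the preferred base.
WEIGHTS = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -1.5, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -2.0, "C": -2.0, "G": 0.0, "T": -1.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -2.0},
]


def score_site(site, weights=WEIGHTS):
    """Sum the per-position weights for one candidate site."""
    site = site.upper()
    if len(site) != len(weights):
        raise ValueError("site length must match the matrix width")
    return sum(column[base] for column, base in zip(weights, site))


perfect = score_site("TTGG")   # preferred base at every position
mismatch = score_site("TAGC")  # two non-preferred bases
print(perfect, mismatch)
```

Because positions are assumed independent, the score is simply a sum, which is what keeps the parameter count small.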

8
Generating Synthetic Sequences

Generate a HMM from the weights.
Model depends on the average score desired.
Use each HMM to generate sequences.
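
The paper's HMM construction is not spelled out on this slide, so the sketch below swaps in a simpler stand-in: each position is sampled independently with probability proportional to exp(lambda * weight), and lambda is tuned by bisection so that the expected score matches the desired average. The weights, the function names, and the sampling scheme itself are assumptions for illustration.

```python
import math
import random

# Hypothetical weights, invented for illustration (0 = preferred base).
WEIGHTS = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -1.5, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -2.0, "C": -2.0, "G": 0.0, "T": -1.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -2.0},
]


def position_probs(column, lam):
    """Base probabilities proportional to exp(lam * weight)."""
    z = sum(math.exp(lam * w) for w in column.values())
    return {b: math.exp(lam * w) / z for b, w in column.items()}


def expected_score(weights, lam):
    """Mean score of sequences drawn with selection strength lam."""
    return sum(
        sum(p * column[b] for b, p in position_probs(column, lam).items())
        for column in weights
    )


def solve_lambda(weights, target, lo=0.0, hi=50.0, iters=60):
    """Bisection on lam; the expected score rises monotonically with lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_score(weights, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_sites(weights, target, n, seed=0):
    """Draw n sites whose expected score equals the target."""
    rng = random.Random(seed)
    lam = solve_lambda(weights, target)
    columns = [position_probs(c, lam) for c in weights]
    return [
        "".join(rng.choices(list(p), weights=list(p.values()))[0] for p in columns)
        for _ in range(n)
    ]


def score(site, weights):
    return sum(column[b] for column, b in zip(weights, site))


sites = sample_sites(WEIGHTS, target=-2.0, n=5000)
mean_score = sum(score(s, WEIGHTS) for s in sites) / len(sites)
```

A lower target gives a weaker selection strength and a more diverse set of sequences, which mirrors the idea that the model depends on the average score desired.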

9
Experiment Design

General plan: Find sequences that bind CTF/NFI
and use them to refine the original model.
Design question: What affinity (low/medium/
high) should the sequences have for CTF/NFI?
Approach: Use sets of synthetic training
sequences to estimate weights. Compare to
original weights.
Conclusion: Low-affinity sequences give more
accurate models.
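
The "estimate weights, then compare to the original" step can be sketched as follows. The estimator here is a simple log-frequency ratio with a pseudocount, which is an assumption for illustration and not necessarily the procedure used in the paper; the training sites are invented.

```python
import math
from collections import Counter


def estimate_weights(sites, pseudocount=1.0):
    """Per-position log-frequency weights, relative to the most common base."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    weights = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs = {b: counts[b] + pseudocount for b in "ACGT"}
        top = max(freqs.values())
        weights.append({b: math.log(f / top) for b, f in freqs.items()})
    return weights


def max_abs_difference(a, b):
    """Largest absolute weight difference between two matrices."""
    return max(abs(ca[x] - cb[x]) for ca, cb in zip(a, b) for x in "ACGT")


# Invented training set, for illustration only.
training = ["TTGG", "TTGG", "TTGC", "TAGG"]
estimated = estimate_weights(training)
```

Running the estimator on training sets sampled at different average affinities and comparing each result to the generating weights with `max_abs_difference` is one way to pose the design question numerically.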

10
SELEX: Systematic evolution of ligands by
exponential enrichment
SAGE: Serial analysis of gene expression
11
Controlling Binding Affinity

Add radioactive medium-affinity 25mer probe.
Want 50% of probe to be competed away by library
at each step.
Stop when library and probe bind equally well →
library has medium affinity.

12
SELEX: Systematic evolution of ligands by
exponential enrichment
SAGE: Serial analysis of gene expression
13
Affinity of Each Cycle

Fraction of binding sequences increases each
repetition.
Average affinity of binding sequences is
maintained.
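
A toy model shows why the fraction of binding sequences climbs with each repetition. It assumes only two classes of sequence (binders and non-binders) with fixed retention probabilities per cycle; the retention values and the starting fraction are invented, not measured.

```python
def enrich(fraction, keep_binder=0.5, keep_nonbinder=0.01):
    """Fraction of binders in the pool after one selection cycle.

    Assumption: binders and non-binders are each retained with a fixed
    probability, and amplification preserves their proportions.
    """
    bound = fraction * keep_binder
    background = (1.0 - fraction) * keep_nonbinder
    return bound / (bound + background)


def run_cycles(start, cycles, **kwargs):
    """Binder fraction before selection and after each cycle."""
    fractions = [start]
    for _ in range(cycles):
        fractions.append(enrich(fractions[-1], **kwargs))
    return fractions


fractions = run_cycles(0.001, 4)
print(fractions)
```

The model says nothing about average affinity being maintained; in the experiment that comes from the competition with the medium-affinity probe.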

14
New Model
HMM from SELEX 3
Weight matrix model.

Each half-site is now 6 bases.
Model is less tolerant of spacing variation.
Weights shifted (e.g. now more tolerant of
adenines in positions 2 and 4).

15
Cross-validation

Model is reasonably stable to sampling variations
in the dataset.
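
One simple way to probe stability to sampling variation is a split-half check: estimate weights separately on two random halves of the selected sequences and look at the largest disagreement. This is a simpler stand-in for the cross-validation on the slide, and the log-frequency estimator and function names are assumptions for illustration.

```python
import math
import random
from collections import Counter


def log_weights(sites, pseudocount=1.0):
    """Per-position log-frequency weights relative to the most common base."""
    weights = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        freqs = {b: counts[b] + pseudocount for b in "ACGT"}
        top = max(freqs.values())
        weights.append({b: math.log(f / top) for b, f in freqs.items()})
    return weights


def split_half_difference(sites, seed=0):
    """Largest weight difference between models fit on two random halves."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    a = log_weights(shuffled[:half])
    b = log_weights(shuffled[half:])
    return max(abs(ca[x] - cb[x]) for ca, cb in zip(a, b) for x in "ACGT")


# Invented data: identical sites, so both halves give the same model.
identical = ["TTGG"] * 20
difference = split_half_difference(identical)
```

Repeating the split with different seeds gives a rough spread for each weight.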

16
Experimental Verification

Scores correlate well with binding affinity.
Used the model to search the Eukaryotic Promoter
Database for binding candidates. Showed CTF/NFI
induction of some candidates.
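
A database search of this kind amounts to sliding the weight matrix along each promoter, on both strands, and keeping windows above a score cutoff. The sketch below uses an invented four-position matrix, an invented sequence, and an arbitrary threshold; it does not reproduce the paper's model or the database's format.

```python
# Hypothetical weights, invented for illustration (0 = preferred base).
WEIGHTS = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -1.5, "C": -2.0, "G": -2.0, "T": 0.0},
    {"A": -2.0, "C": -2.0, "G": 0.0, "T": -1.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -2.0},
]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan(seq, weights, threshold):
    """Return (score, strand, offset, window) for windows at or above threshold.

    Offsets on the "-" strand are positions within the reverse complement.
    """
    width = len(weights)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(base not in "ACGT" for base in window):
                continue  # skip windows containing ambiguous bases such as N
            score = sum(column[b] for column, b in zip(weights, window))
            if score >= threshold:
                hits.append((score, strand, i, window))
    return sorted(hits, reverse=True)


hits = scan("ACTTGGA", WEIGHTS, threshold=-1.0)
print(hits)
```

Candidates found this way still need experimental confirmation, as on the slide.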

17
Non-independence Between Bases

Checked SELEX sequences for dinucleotide
correlations.
Found dependencies. They add less than 1 bit to
the 12-bit model.
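
One common way to measure dependence between two positions is their mutual information in bits, sketched below. Whether the paper used exactly this statistic is an assumption, and the example sites are invented. Small samples bias the estimate upward, so real data would need a correction or a shuffled control.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of aligned sites."""
    n = len(sites)
    count_i = Counter(s[i] for s in sites)
    count_j = Counter(s[j] for s in sites)
    count_ij = Counter((s[i], s[j]) for s in sites)
    total = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        total += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return total


# Invented examples: fully coupled positions vs. independent positions.
coupled = mutual_information(["AT", "AT", "CG", "CG"], 0, 1)
independent = mutual_information(["AA", "AC", "CA", "CC"], 0, 1)
print(coupled, independent)
```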

18
Comments and Questions

Did the improvement justify the work?
Should the background DNA sequence be modeled?
Could this be done for an arbitrary transcription
factor?
Is it necessary to know a medium-affinity
sequence?
Must know binding partners.
Must know what modifications (e.g.
phosphorylation) are needed for binding.
Need extract with active protein.
