1
High-throughput SELEX-SAGE method for
quantitative modeling of transcription-factor
binding sites
  • Emmanuelle Roulet, Stephane Busso, Anamaria A.
    Camargo, Andrew H.G. Simpson, Nicolas Mermod,
    Philipp Bucher
  • Nature Biotechnology, August 2002, Vol. 20, pp. 831-835

BioNetworks Journal Club -- Feb. 24, 2003
Slides prepared by Alison Hottes
2
General Background
  • Transcription factors bind to DNA and enhance or
    inhibit gene transcription.
  • Transcription factor binding is DNA-sequence
    dependent.

3
General Problems
  • Want to figure out which transcription factors
    bind where and under what conditions.
  • This knowledge specifies a large part of the
    connectivity and topology of a cell's
    transcriptional network.
  • From a protein's amino acid sequence or structure,
    we can't currently predict its binding
    specificity.

4
Common Approaches
  • Take a few known binding sites for a
    transcription factor and generalize to a
    consensus sequence.
  • Find genes likely regulated by a transcription
    factor (e.g. microarrays). Search the upstream
    DNA sequence of those genes for motifs which
    might be the binding site.
  • Take the consensus and see where else it appears
    in a genome.
  • In both methods, the accuracy is variable and
    often unknown.

5
How complex a model is needed?
  • Example: Suppose a transcription factor binds to
    a 25-base-pair DNA region.
  • Option 1: Assign a score for the binding affinity
    of each possible 25-mer.
  • Need 4^25 scores (completely unrealistic).
  • Option 2: Assume each base position acts
    independently and equally.
  • Need 3 × 25 = 75 parameters. (Still a lot.)
  • Gaps are more difficult.
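
The two counts on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that reads the counts as 4^25 (one score per 25-mer) and 3 × 25 (three free weights per position, since the fourth base at each position can serve as the reference). That reading of the second count is an assumption, and the variable names are illustrative.

```python
# Parameter counts for modeling a 25 bp binding site.
SITE_LENGTH = 25
ALPHABET_SIZE = 4  # A, C, G, T

# Option 1: one affinity score for every possible 25-mer.
full_table_scores = ALPHABET_SIZE ** SITE_LENGTH

# Option 2: independent positions. Each position has 4 weights, but only
# 3 are free once one base is taken as the reference (assumed reading).
independent_parameters = (ALPHABET_SIZE - 1) * SITE_LENGTH

print(full_table_scores)       # 1125899906842624
print(independent_parameters)  # 75
```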

6
For any model
  • Need a lot of data
  • How can we get the data?
  • What can we do with the data once we have it?
  • This paper illustrates one approach, using CTF/NFI
    as the example.
  • CTF/NFI binds as a homodimer.
  • It recognizes sequences like TTGGC(N5)GCCAA.
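
A consensus such as TTGGC(N5)GCCAA can be searched for directly with a regular expression. The sketch below assumes N5 means any five bases between the two half-sites. The function name `find_consensus_sites` and the example sequence are made up for illustration.

```python
import re

# TTGGC(N5)GCCAA read as: half-site, any 5 bases, half-site.
# The lookahead lets overlapping matches be reported as well.
CONSENSUS = re.compile(r"(?=(TTGGC[ACGT]{5}GCCAA))")

def find_consensus_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every exact consensus match."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(sequence.upper())]

# Invented example sequence with one site starting at index 4.
example = "acgtTTGGCatgcaGCCAAttt"
print(find_consensus_sites(example))
```

An exact-match search like this only finds sites that agree with the consensus at every fixed position. It says nothing about weaker sites, which is part of the motivation for a quantitative model.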

7
Initial Model
  • Weight matrix
  • Models multiple binding modes
  • Given a sequence, can produce a score.
  • How would we generate sequences consistent with
    the model?
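
To make "given a sequence, can produce a score" concrete, here is a minimal sketch of weight-matrix scoring. The weights are invented for a single 5-base half-site resembling TTGGC and are not the values from the paper. The sketch also ignores multiple binding modes and spacer variation.

```python
# Hypothetical weights (arbitrary units), one dict per position.
# Invented for illustration; 0.0 marks the preferred base at each position.
WEIGHTS = [
    {"A": -2.0, "C": -1.0, "G": -2.0, "T": 0.0},
    {"A": -2.0, "C": -2.0, "G": -1.0, "T": 0.0},
    {"A": -3.0, "C": -3.0, "G": 0.0, "T": -3.0},
    {"A": -3.0, "C": -3.0, "G": 0.0, "T": -3.0},
    {"A": -1.0, "C": 0.0, "G": -2.0, "T": -1.0},
]

def score_site(site: str) -> float:
    """Sum the per-position weights; positions are treated as independent."""
    if len(site) != len(WEIGHTS):
        raise ValueError("site length must match the weight matrix width")
    return sum(column[base] for column, base in zip(WEIGHTS, site.upper()))

def best_window(sequence: str) -> tuple[float, int]:
    """Slide the matrix along the sequence; return (best score, start index)."""
    width = len(WEIGHTS)
    return max(
        (score_site(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

print(score_site("TTGGC"))       # 0.0, the preferred half-site
print(score_site("TTGGA"))       # -1.0, one mismatch at the last position
print(best_window("AATTGGCAA"))  # (0.0, 2)
```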

8
Generating Synthetic Sequences
  • Generate an HMM from the weights.
  • The HMM depends on the desired average score.
  • Use each HMM to generate sequences.
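
One possible way to turn weights into a sequence generator with a chosen average score is sketched below. It is not necessarily the construction used in the paper. It assumes emission probabilities proportional to exp(lambda × weight) at each position, with lambda tuned by bisection so that the expected score hits the target. It uses a single chain of match states with no gaps or alternative binding modes. The weights and all names are hypothetical.

```python
import math
import random

BASES = "ACGT"

# Hypothetical weights for a 5-base half-site (invented, all <= 0).
WEIGHTS = [
    {"A": -2.0, "C": -1.0, "G": -2.0, "T": 0.0},
    {"A": -2.0, "C": -2.0, "G": -1.0, "T": 0.0},
    {"A": -3.0, "C": -3.0, "G": 0.0, "T": -3.0},
    {"A": -3.0, "C": -3.0, "G": 0.0, "T": -3.0},
    {"A": -1.0, "C": 0.0, "G": -2.0, "T": -1.0},
]

def emission_probs(column, lam):
    """Assumed form: P(base) proportional to exp(lam * weight)."""
    unnormalized = {b: math.exp(lam * column[b]) for b in BASES}
    total = sum(unnormalized.values())
    return {b: v / total for b, v in unnormalized.items()}

def expected_score(weights, lam):
    """Mean score of sequences emitted at this value of lam."""
    return sum(
        sum(p * column[b] for b, p in emission_probs(column, lam).items())
        for column in weights
    )

def solve_lambda(weights, target, lo=0.0, hi=50.0, steps=60):
    """Bisection; the expected score rises monotonically with lam."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if expected_score(weights, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_sequences(weights, target, n, seed=0):
    """Draw n sequences whose expected score equals the target."""
    rng = random.Random(seed)
    lam = solve_lambda(weights, target)
    columns = [emission_probs(c, lam) for c in weights]
    return [
        "".join(rng.choices(BASES, weights=[c[b] for b in BASES])[0] for c in columns)
        for _ in range(n)
    ]

print(sample_sequences(WEIGHTS, target=-3.0, n=5))
```

With these weights the target must lie between -8.0 (uniform emissions, lambda = 0) and 0.0 (always the preferred base). Targets outside that range are not reachable in this sketch.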

9
Experiment Design
  • General plan: Find sequences that bind CTF/NFI
    and use them to refine the original model.
  • Design question: What affinity (low/medium/
    high) should the sequences have for CTF/NFI?
  • Approach: Use sets of synthetic training
    sequences to estimate weights. Compare to
    original weights.
  • Conclusion: Low-affinity sequences give more
    accurate models.

10
SELEX: Systematic evolution of ligands by
exponential enrichment
SAGE: Serial analysis of gene expression
11
Controlling Binding Affinity
  • Add a radioactive medium-affinity 25-mer probe.
  • Want 50% of the probe to be competed away by the
    library at each step.
  • Stop when library and probe bind equally well →
    library has medium affinity.
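
Why 50% displacement signals matched affinity can be illustrated with a simple competition model. The sketch rests on assumptions that are not in the slides: protein is limiting, both DNAs are in excess, and bound protein is shared in proportion to concentration divided by dissociation constant. The function and parameter names are hypothetical.

```python
def probe_fraction_competed(probe_conc, probe_kd, library_conc, library_kd):
    """Fraction of bound protein taken from the probe by the library.

    Assumed model: with saturating DNA, occupancy is split in proportion
    to concentration / Kd for each competitor.
    """
    probe_term = probe_conc / probe_kd
    library_term = library_conc / library_kd
    return library_term / (probe_term + library_term)

# Equal concentrations and equal Kd: half of the probe is competed away.
print(probe_fraction_competed(1.0, 10.0, 1.0, 10.0))   # 0.5

# A library that binds 10x more weakly displaces far less probe.
print(probe_fraction_competed(1.0, 10.0, 1.0, 100.0))
```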

12
SELEX: Systematic evolution of ligands by
exponential enrichment
SAGE: Serial analysis of gene expression
13
Affinity of Each Cycle
  • Fraction of binding sequences increases each
    repetition.
  • Average affinity of binding sequences is
    maintained.

14
New Model
HMM from SELEX 3
Weight matrix model.
  • Each half-site is now 6 bases.
  • Model is less tolerant of spacing variation.
  • Weights shifted (e.g. now more tolerant of
    adenines in positions 2 and 4).

15
Cross-validation
  • Model is reasonably stable to sampling variations
    in the dataset.
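
A rough way to probe stability to sampling is a split-half check: fit weights on two random halves of the selected sequences and compare them. This is a stand-in for illustration, not the cross-validation procedure from the paper. It assumes aligned sequences of equal length, log2-odds weights against a uniform background, and a pseudocount of 1. All names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def estimate_weights(sequences, pseudocount=1.0):
    """Log2-odds weights per position against a uniform 0.25 background."""
    weights = []
    for pos in range(len(sequences[0])):
        counts = {b: pseudocount for b in BASES}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        weights.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return weights

def max_weight_difference(sequences, seed=0):
    """Fit two random halves separately; return the largest weight gap."""
    rng = random.Random(seed)
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first = estimate_weights(shuffled[:half])
    second = estimate_weights(shuffled[half:])
    return max(abs(a[b] - c[b]) for a, c in zip(first, second) for b in BASES)

# Toy data (invented): identical sequences give identical halves.
print(max_weight_difference(["TTGGC"] * 40))  # 0.0
```

A small gap between the two halves suggests the fitted weights are not driven by a handful of sequences. A large gap suggests the dataset is too small for the number of parameters.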

16
Experimental Verification
  • Scores correlate well with binding affinity.
  • Used model to search Eukaryotic Promoter Database
    for binding candidates. Showed CTF/NFI induction
    of some candidates.

17
Non-independence Between Bases
  • Checked SELEX sequences for dinucleotide
    correlations.
  • Found dependencies, but they add less than 1 bit
    to the 12-bit model.
18
Comments and Questions
  • Did the improvement justify the work?
  • Should the background DNA sequence be modeled?
  • Could this be done for an arbitrary transcription
    factor?
  • Is it necessary to know a medium-affinity
    sequence?
  • Must know binding partners.
  • Must know what modifications (e.g.
    phosphorylation) are needed for binding.
  • Need extract with active protein.