Review: RECOMB Satellite Workshop on Regulatory Genomics - PowerPoint PPT Presentation

Slides: 52
Provided by: DerekD45
Learn more at: http://www.cs.cmu.edu

1
Review: RECOMB Satellite Workshop on Regulatory
Genomics
  • (Held March 26-27, 2004)

2
Workshop Themes/Trends
  • More comprehensive evaluations of motif-detection
    algorithms
  • Making more effective use of comparative
    mapping/evolution data
  • Models that explain rather than just describe
  • Moving from binding motifs to entire regulatory
    modules
  • Methods are simple, not sophisticated

3
Outline
  • Jim Kadonaga, University of California, San
    Diego: The MTE, a New Core Promoter Element for
    Transcription by RNA Polymerase II
  • Rotem Sorek, Compugen and Tel Aviv University: The
    "promoters" of splicing: intronic sequences that
    regulate alternative splicing
  • Yitzhak Pilpel, Weizmann Institute: Revealing the
    architecture of genetic backup circuits through
    inspection of transcription regulatory networks
  • Ron Shamir, Tel Aviv University: Revealing
    selection patterns in the evolution of yeast
    transcription regulation
  • Michael Eisen, Lawrence Berkeley National Lab:
    Evolutionary Signatures of Regulatory Sequences

4
A New Core Promoter Element for Transcription by
RNA Polymerase II (Jim Kadonaga)
  • Most transcriptional activity is regulated by
    sequence-specific DNA-binding factors, which are
    therefore the focus of most current research on
    regulation; however...
  • The ultimate target of all of this action is the
    core promoter, which also plays a part in
    regulation

5
  • Core promoter
  • Encompasses TSS
  • Directs RNA polymerase II
  • Most well-known component is the TATA box

6

Only about 30-40% of promoters contain a TATA
box! What's going on the rest of the time?
7
Finding Novel Promoter Elements
  • Experimentally investigated binding in
    promoters with no TATA box
  • found a novel promoter element, the DPE
    (downstream core promoter element)
  • Large-scale motif detection in 2000 core
    promoters in Drosophila (Ohler et al., 2002)
  • Plotted the distance of the top 10 motifs to the TSS
  • four motifs had a clear peak: TATA, Inr, DPE and
    ...
  • a novel promoter element, the MTE
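The positional analysis above (plotting where each motif falls relative to the TSS) can be sketched as a histogram of motif start offsets. This is a minimal sketch under stated assumptions: promoters are pre-cut so that the TSS sits at a fixed index, matching is exact (real analyses use degenerate motifs or weight matrices), and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def motif_offsets(promoters, motif, tss_index):
    """Count where `motif` starts relative to the TSS (offset 0).

    Assumes every promoter string is cut so the TSS sits at the same
    index `tss_index`; exact string matching only (no degenerate motifs).
    """
    counts = Counter()
    k = len(motif)
    for seq in promoters:
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                counts[i - tss_index] += 1
    return counts


def peak_offset(counts):
    """Offset with the most occurrences, or None if the motif never occurs."""
    if not counts:
        return None
    return max(counts, key=counts.get)


# Hypothetical toy promoters with the TSS at index 10.
promoters = ["CCTATAAACCAGTCAGCC", "GGTATAAAGGTCAGTTGG"]
print(peak_offset(motif_offsets(promoters, "TATAAA", 10)))  # -8
```

A motif with a real positional constraint shows a sharp peak in these counts; a motif without one spreads roughly evenly across offsets.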

8
The Core Promoter gets a new look
MTE: Motif Ten Promoter Element
(Kadonaga, PowerPoint slides)
9
DPE and MTE: Two Newly Identified Promoter Elements
  • Conserved from Drosophila to human (unknown
    whether they occur in yeast)
  • Very sensitive to spacing relative to the Inr motif
  • TSSs were determined experimentally (published
    TSSs were not reliable)
  • a single insertion/deletion between the motifs causes
    a 7-fold reduction in transcription
  • Inr and DPE (or MTE) are bound cooperatively by TFIID
  • this is the first step in transcription initiation

10
TATA gets top billing but...
  • In Drosophila (out of 205 core promoters):
  • TATA and DPE: 14%
  • TATA only: 29%
  • DPE only: 26%
  • Neither: 31%
  • TATA, DPE, and MTE can all
  • independently support transcription
  • compensate for a mutation in another

11
And finally... regulation.
  • NC2, previously known to repress TATA-dependent
    transcription, was unexpectedly found to activate
    DPE-dependent transcription
  • Studied 18 enhancers and estimate that about 25%
    exhibit some specificity for DPE or TATA
  • Similar work in progress for MTE

12
The Promoters of Splicing (Rotem Sorek)
  • In general, it is not known how alternative
    splicing (AS) is regulated
  • A few splicing regulatory proteins are known
  • like TFs, they are sequence-specific, but they
    bind RNA rather than DNA
  • binding motifs (usually 4-10 nt) can be located in
    an exon or an intron
  • they can act as enhancers or silencers
  • Evidence for combinatorial regulation

13
The typical motif in a haystack
  • Most work on finding splicing
    factor motifs focuses on exons
  • exons are short enough that mutation studies are feasible
  • Introns are too long and require a computational
    approach
  • Compiled a training dataset:
  • 250 AS exons, alternatively spliced in both human and mouse
  • a large set of constitutively spliced (CS) exons,
    conserved across human/mouse

ATTCA
14
Sorek and Ast, Genome Research 2003
15
  • Their primary finding: there tends to be
    significantly more conservation in introns
    surrounding AS exons than CS exons
  • On average, about 100 bases on either side of each
    AS exon are conserved, compared to around 7 bases
    for constitutively spliced exons
  • What's the explanation?
  • multiple binding motifs?
  • helping to determine secondary structure in RNA,
    which helps lead to correct splicing?

16
Predicting Alternative Splicing
  • Additional predictive features:
  • Higher conservation around the exon
  • Higher conservation of the exon itself (motifs?)
  • Shorter exons
  • Exon length that is a multiple of 3
  • The method apparently chose one threshold for each
    feature (how?)
  • Performance: scanned the human genome and predicted 1000
    AS exons (including training data?)
  • 70% had EST evidence of AS, vs. a 6-7% baseline
  • A lab test showed that 7/15 exons (randomly?) selected
    from the remaining 30% are AS in at least one of 15
    tissues
  • Significance: estimate that splicing "promoters" cover
    3 x 10^6 bp
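Read literally, "one threshold for each feature" suggests a conjunctive rule: an exon is called alternatively spliced only if it passes every per-feature test. The sketch below illustrates that reading; the feature encoding, the threshold values and the function names are all hypothetical, since the talk did not say how thresholds were set.

```python
from dataclasses import dataclass


@dataclass
class ExonFeatures:
    flank_conservation: float  # human/mouse identity of flanking intron, 0..1
    exon_conservation: float   # human/mouse identity of the exon itself, 0..1
    length: int                # exon length in nt


# Hypothetical cut-offs, one per feature; the talk did not report values.
THRESHOLDS = {"min_flank": 0.80, "min_exon": 0.95, "max_length": 150}


def predict_alternative(ex, t=THRESHOLDS):
    """Call an exon a candidate AS exon only if it passes every feature test."""
    return (
        ex.flank_conservation >= t["min_flank"]
        and ex.exon_conservation >= t["min_exon"]
        and ex.length <= t["max_length"]
        # a length divisible by 3 keeps the reading frame intact when skipped
        and ex.length % 3 == 0
    )


print(predict_alternative(ExonFeatures(0.90, 0.97, 99)))   # True
print(predict_alternative(ExonFeatures(0.90, 0.97, 100)))  # False
```

A conjunctive rule like this trades sensitivity for precision, which fits a genome-wide scan where most exons are constitutive.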

17
Genetic Backup Circuits (Kafri and Pilpel)
  • Fact: single-gene knockouts often have little or
    no phenotypic effect
  • 10% lethal in worm
  • 27% lethal in yeast
  • Question: Can we better understand the mechanisms
    of genetic backup?
  • Task: Predict whether a knockout will be lethal
    or not

18
Duplicates Suggest Redundancy
  • Genes with duplicates are less likely to be
    essential
  • But clearly this doesn't tell the whole story
  • lethal genes can have duplicates
  • nonessential genes often have no duplicate

(Gu, Z. et al Nature 2003)
19
Function of Duplicate Matters
  • Computed dispensability of yeast genes:
  • growth rate after knockout compared to mean
    growth rate, averaged over many conditions
  • Compared GO functional annotations of highly
    similar genes. Found higher dispensability with
  • higher functional similarity (Resnik information
    content)
  • little functional similarity but high sequence
    similarity (BLAST E-values)

20
Similarity of Expression
  • 40 time series, 500 timepoints
  • In each condition, calculated the correlation of the
    expression profiles of each pair of paralogous
    genes
  • Average correlation suggests that
  • backup is best provided by genes that do not
    share expression patterns

21
How can we explain this unexpected result?
  • Classify pairs into:
  • negative correlation: never similarly expressed
  • positive correlation: always similarly expressed
  • no correlation: either never similarly expressed, or
    similarly expressed in certain conditions

22
Variability of Expression
  • Use the standard deviation to quantify the consistency
    of correlation across conditions
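The per-condition correlation and its spread across conditions can be sketched as follows: compute a Pearson correlation for the paralog pair in each time series, then summarize with the mean and standard deviation. The cut-offs in `backup_class` are invented for illustration, and the toy profiles are not data from the talk.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_profile(conditions):
    """Mean and spread of a paralog pair's correlation across conditions.

    conditions: one (profile_a, profile_b) tuple per time series.
    """
    rs = [pearson(a, b) for a, b in conditions]
    return statistics.fmean(rs), statistics.pstdev(rs)


def backup_class(mean_r, sd_r, sd_cut=0.3, r_cut=0.5):
    """Hypothetical cut-offs mapping (mean, stdev) onto the three categories."""
    if sd_r >= sd_cut:
        return "condition-dependent"  # similar only in some conditions
    if mean_r >= r_cut:
        return "always similar"
    return "diverged"


# Toy pair: correlated in one condition, anti-correlated in the other.
pair = [([1, 2, 3, 4], [2, 4, 6, 8]), ([1, 2, 3, 4], [8, 6, 4, 2])]
mean_r, sd_r = correlation_profile(pair)
print(mean_r, sd_r, backup_class(mean_r, sd_r))  # 0.0 1.0 condition-dependent
```

The example shows why the mean alone is misleading: an average correlation of zero hides a pair that is perfectly co-expressed in one condition.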

23
Goldilocks and the three little paralogs
Expression correlated in only a subset of
conditions: Just Right
Always Same Expression: Too Similar
Never Same Expression: Too Diverged
  • Optimal backup requires the ability to switch
    between similar and dissimilar expression in a
    condition-dependent manner

24
Predictions about the Past...
  • Hypothesized duplication mechanism:
  • duplication occurs
  • leads to unstable redundancy
  • quickly followed by either
  • mutation and loss of one of the duplicates
  • subfunctionalization leading to stable redundancy
  • Hypothesize two distinct types of
    subfunctionalization
  • mutation of coding region leading to functional
    divergence
  • mutation of control region leading to divergence
    of expression

25
Need for Regulatory Flexibility
  • This second type of subfunctionalization would
    pose a significant regulatory challenge
    if the paralogs are to provide backup for one
    another
  • Upon mutation of B, A must be turned on in the
    conditions that would normally require B
  • They postulate that:
  • this regulatory challenge is met when a gene has
    a significant amount of regulatory diversity
    (i.e., many different TF motifs)
  • backup asymmetry arises when one of the genes has
    few motifs (Kellis suggests otherwise?)

26
Experiments, but no hard numbers
  • They claim that the capacity of genes to respond at the
    transcriptional level when their counterpart is
    deleted is central to their ability to provide
    backup
  • Most paralogs are downregulated when the other gene is
    knocked out (cross-hybridization?)
  • lower stdev -> downregulation
  • They claim that asymmetry of backup capability can be
    predicted from the number of transcription factor
    binding sites:
  • the gene with the larger number of motifs is the
    one capable of providing backup for the
    other
  • genes with few motifs are "parasites" that can't
    provide backup
  • They claim an improved ability to predict the effect of
    double knockouts
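A toy version of the motif-count rule for backup asymmetry is sketched below: count binding-site matches in each paralog's promoter and predict that the gene with more sites is the one able to compensate. The motif strings, gene names and promoters are hypothetical, only the forward strand is scanned, and exact matching stands in for whatever site model the authors used.

```python
def count_sites(promoter, motifs):
    """Total (overlapping) exact matches of all motifs in one promoter."""
    return sum(
        1
        for m in motifs
        for i in range(len(promoter) - len(m) + 1)
        if promoter.startswith(m, i)
    )


def predicted_backup(gene_a, prom_a, gene_b, prom_b, motifs):
    """Return the gene predicted to back up its paralog, or None on a tie.

    Follows the claim on the slide: the paralog with more motifs in its
    promoter is the one able to compensate when the other is deleted.
    """
    na, nb = count_sites(prom_a, motifs), count_sites(prom_b, motifs)
    if na == nb:
        return None
    return gene_a if na > nb else gene_b


motifs = ["CACGTG", "TGACTCA"]  # example motif strings, chosen for illustration
print(predicted_backup("GENE_A", "CACGTGAATGACTCA", "GENE_B", "CACGTGTTTT", motifs))
```

Returning None on ties makes explicit that the rule says nothing about pairs with equal regulatory diversity.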

27
A Question
  • They claim that only when the genes diverge in
    function will they be maintained in evolution.
  • But if the duplicated pair can compensate for
    each other's function, won't there be little
    selection pressure to maintain both copies?

28
From General Conservation to Specific Motifs
  • Searched conserved intronic regions for
    overrepresented hexamers
  • A literature search for the most significant hexamer
    shows that it has been reported as an AS motif in
    six papers
  • Next steps:
  • identify the consensus sequences of additional
    motifs
  • learn the tissue/developmental specificity of each
    motif
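Searching for overrepresented hexamers can be sketched as a log-ratio ranking of hexamer frequencies in conserved intronic regions against a background set. This is a simplification under stated assumptions: a real analysis would attach a significance test (for example binomial or hypergeometric), and the sequences below are toy inputs, not the talk's data.

```python
import math
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_hexamers(target, background, top=5, pseudo=1.0):
    """Rank hexamers by log2(frequency in target / frequency in background).

    A pseudocount per possible hexamer (4**6 of them) avoids division by zero.
    """
    t, b = kmer_counts(target), kmer_counts(background)
    nt = sum(t.values()) + pseudo * 4 ** 6
    nb = sum(b.values()) + pseudo * 4 ** 6
    scores = {
        kmer: math.log2(((t[kmer] + pseudo) / nt) / ((b[kmer] + pseudo) / nb))
        for kmer in t
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


conserved_introns = ["TGCATGTGCATG"]   # toy "conserved intronic" sequence
background_introns = ["AAAAAAAAAAAA"]  # toy background
print(enriched_hexamers(conserved_introns, background_introns, top=1))
```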

29
Revealing Selection Patterns in the Evolution of
Yeast Transcription Regulation (Amos Tanay, Irit
Gat-Viks and Ron Shamir)
  • Identifying TF binding sites is hard
  • Even harder to predict more complex interactions:
  • rarely a binary switch
  • no linear relation between affinity and
    activation
  • different binding affinities can lead to
    different results (e.g., p53 can lead to apoptosis
    or rescue)
  • Conservation indicates functionality
  • Evolutionary dynamics disclose details of
    functionality

30
An Analogy: Imagine we didn't know the genetic
code, only the codon length
  • We know that synonymous substitutions are more
    common in coding regions than nonsynonymous
    substitutions
  • Build a network where each 3-letter nt string is
    represented by one node
  • Connect nodes with edges whose thickness represents
    the frequency of substitutions between them in
    aligned coding regions of related organisms
  • We would see tightly connected clusters of nodes
    that all code for the same amino acid
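The codon-network thought experiment can be sketched by counting single-nucleotide changes between aligned, in-frame codons and grouping codons joined by heavily weighted edges. Connected components found with union-find stand in here for the tightly connected clusters; the function names, the weight cut-off and the toy alignment are hypothetical.

```python
from collections import Counter


def codon_substitutions(aligned_pairs):
    """Count single-nucleotide changes between aligned, in-frame codons.

    aligned_pairs: gap-free coding sequences of equal length.
    Returns edge weights keyed by unordered codon pairs.
    """
    edges = Counter()
    for s1, s2 in aligned_pairs:
        for i in range(0, len(s1) - 2, 3):
            c1, c2 = s1[i:i + 3], s2[i:i + 3]
            if sum(a != b for a, b in zip(c1, c2)) == 1:
                edges[frozenset((c1, c2))] += 1
    return edges


def heavy_components(edges, min_weight):
    """Group codons linked by edges of weight >= min_weight (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, weight in edges.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for codon in list(parent):
        groups.setdefault(find(codon), set()).add(codon)
    return list(groups.values())


# Toy alignment: two synonymous Leu changes seen twice, one Lys->Glu change once.
pairs = [("CTTCTA", "CTCCTG"), ("CTTCTA", "CTCCTG"), ("AAA", "GAA")]
print(heavy_components(codon_substitutions(pairs), min_weight=2))
```

With enough real data, the heavy edges would be mostly synonymous, so the components would approximate the codon families of each amino acid.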

31
A Simple Approach
  • Chose the four recently sequenced yeast genomes
    (yeast promoter regions are relatively short)
  • Identified 4000 promoters and aligned them using
    ClustalW
  • Used a simple window-scanning method to identify all
    motifs of length 8
  • Used a simple parsimony method to infer ancestral
    sequences at each node in the phylogeny
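The talk only says "simple parsimony"; Fitch's small-parsimony algorithm is one standard instance, sketched here for a single alignment column with a bottom-up pass (candidate sets and substitution count) and a top-down pass (one base per internal node). The tree shape, leaf names and the alphabetical tie-break are assumptions for illustration.

```python
def fitch_sets(tree, leaves):
    """Bottom-up Fitch pass for one alignment column.

    tree: nested 2-tuples of leaf names, e.g. (("s1", "s2"), ("s3", "s4")).
    leaves: leaf name -> base in this column.
    Returns (candidate state set, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {leaves[tree]}, 0
    left_set, left_cost = fitch_sets(tree[0], leaves)
    right_set, right_cost = fitch_sets(tree[1], leaves)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def ancestral_states(tree, leaves, parent_state=None, path="root", out=None):
    """Top-down pass: choose one base per internal node.

    Keeps the parent's base when it is a candidate, otherwise picks the
    alphabetically first candidate. Internal nodes are named by path.
    """
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    candidates, _ = fitch_sets(tree, leaves)
    state = parent_state if parent_state in candidates else min(candidates)
    out[path] = state
    ancestral_states(tree[0], leaves, state, path + ".0", out)
    ancestral_states(tree[1], leaves, state, path + ".1", out)
    return out


tree = (("s1", "s2"), ("s3", "s4"))  # hypothetical 4-species topology
column = {"s1": "A", "s2": "A", "s3": "G", "s4": "A"}
print(fitch_sets(tree, column))        # ({'A'}, 1)
print(ancestral_states(tree, column))  # {'root': 'A', 'root.0': 'A', 'root.1': 'A'}
```

Running this column by column over an alignment gives the ancestral sequences from which per-branch substitutions can then be counted.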

32
A Simple Approach (2)
  • Calculate the background substitution rate:
  • 16-parameter background model (a 4x4 substitution
    matrix) for each branch of the phylogeny
  • For each motif, compute 8 tables of site-specific
    substitution rates (one per position of the 8-mer)
  • simply count observed substitutions at each site,
    summed over all branches of the tree and all
    instances of the motif
  • normalized substitution rate: log of the ratio of
    observed to expected substitutions
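The normalized rate, log(observed / expected), can be sketched as below. The record layout is an assumption: each entry pairs an ancestral and a descendant motif instance with that branch's 4x4 background matrix, and the expected number of substitutions at a site is accumulated as 1 - P[x][x] for the ancestral base x. Positive values mean faster change than background; negative values suggest purifying selection.

```python
import math

BASES = "ACGT"


def site_rates(records, k):
    """Site-specific normalized substitution rates for one k-mer motif.

    records: (ancestor, descendant, P) for every motif instance on every
    branch, where P is that branch's 4x4 background matrix (16 parameters),
    P[x][y] = Pr(x -> y along the branch).
    Returns log(observed / expected) per position; None where no
    substitution was observed (too little data for a log ratio).
    """
    observed = [0] * k
    expected = [0.0] * k
    for anc, desc, P in records:
        for i in range(k):
            observed[i] += anc[i] != desc[i]
            expected[i] += 1.0 - P[anc[i]][anc[i]]
    return [math.log(o / e) if o > 0 else None for o, e in zip(observed, expected)]


# Toy uniform background: 10% chance of change per branch.
P = {x: {y: (0.9 if x == y else 0.1 / 3) for y in BASES} for x in BASES}
records = [("AC", "AC", P), ("AC", "GC", P)]
print(site_rates(records, 2))  # position 0 faster than background; position 1 None
```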

33
Building a Selection Network
  • Each node represents an 8-mer motif
  • Connect all motifs that are 1 substitution apart
  • if substitution rate is positive, dark edge
  • if substitution rate is negative, light edge
  • if not enough data, very thin edge
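A sketch of the selection network: nodes are 8-mers, edges join motifs one substitution apart, and edge style follows the sign of the normalized rate. The `edge_stats` mapping (rate and number of observations per motif pair), the `min_n` cut-off for "not enough data", and treating a rate of exactly zero as uninformative are all hypothetical choices.

```python
BASES = "ACGT"


def neighbors(kmer):
    """All k-mers exactly one substitution away from `kmer`."""
    for i, base in enumerate(kmer):
        for b in BASES:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]


def edge_style(rate, n_obs, min_n):
    """Map a normalized rate to the slide's drawing convention."""
    if n_obs < min_n:
        return "thin"  # not enough data
    if rate > 0:
        return "dark"
    if rate < 0:
        return "light"
    return "thin"  # exactly background: treated as uninformative here


def selection_network(motifs, edge_stats, min_n=10):
    """Undirected edges between observed 8-mers one substitution apart.

    edge_stats: (motif, neighbor) -> (normalized rate, observations),
    looked up in either orientation; missing pairs count as no data.
    """
    present = set(motifs)
    edges = {}
    for m in motifs:
        for nb in neighbors(m):
            if nb in present and (nb, m) not in edges:
                rate, n = edge_stats.get((m, nb), edge_stats.get((nb, m), (0.0, 0)))
                edges[(m, nb)] = edge_style(rate, n, min_n)
    return edges


stats = {("AAAAAAAA", "AAAAAAAC"): (0.8, 25)}  # hypothetical statistics
print(selection_network(["AAAAAAAA", "AAAAAAAC", "CCCCCCCC"], stats))
```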

34
Images taken from http://www.cs.tau.ac.il/amos/promoter_evo/
35
  • Did some larger-scale evaluations based on ChIP
    and gene expression data
  • Also some anecdotal results

36
Matrix of Substitutions from the Motif Consensus
37
Evolutionary Signatures of Regulatory Sequences
(Michael Eisen)
  • Examples of evolutionary signatures:
  • coding sequence: conserved, conserved, variable
    (codon positions 1, 2, 3)
  • structural RNA: base-paired nucleotides coevolve
  • What evolutionary constraints are imposed on
    sequences by TF binding?
  • Aligned 4 yeast species
  • for each base in the genome, estimated the evolutionary
    rate (very noisy estimates)

38
Analyze the pattern of rate variation across the
entire binding site
Moses et al Evol Biol 2003
39
Position-specific Rate Variation
  • The pattern of rate variation across the entire
    binding site for a particular TF
  • within one genome
  • across genomes

40
Position-specific Rate Variation
  • The pattern of rate variation across the entire
    binding site for a particular TF
  • within one genome
  • across genomes
  • Clearly due to structural constraints:
  • protein contacts
  • even when we know there's no contact, there are DNA
    bending issues...

Highly Correlated
41
These signatures are missing from current
motif-prediction programs
  • Although this isn't a particularly surprising
    result, many predicted motifs (e.g., from MEME,
    etc.) do not display this TFBS signature
  • could use it as a filter, or incorporate it more
    directly (they're working on this currently?)
  • Different families of TF have different
    signatures
  • Eisen thinks the community is still
    underutilizing this information

42
Make better use of comparative data by using an
explicit evolutionary model
  • Is there likely to have been a TFBS in the
    ancestor?
  • build a PSSM representing the chemical
    contribution of each base to the binding
    specificity
  • use the Halpern and Bruno model to predict how the
    TFBS will evolve given the proposed selection model
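One way to connect a binding-site matrix to the Halpern and Bruno model, assumed here rather than taken from the talk, is to treat each position's base frequencies as the equilibrium frequencies of a site-specific substitution process. Under equal mutation rates mu between all bases, the Halpern-Bruno rate from base a to base b is mu * ln(pi_b / pi_a) / (1 - pi_a / pi_b), which tends to mu when pi_a = pi_b. The sketch computes frequencies (not log-odds scores) and the resulting rate matrix for one position; the pseudocount and example sites are hypothetical.

```python
import math

BASES = "ACGT"


def position_frequencies(sites, pseudo=0.5):
    """Per-position base frequencies from aligned binding sites, with pseudocounts."""
    freqs = []
    for i in range(len(sites[0])):
        counts = {b: pseudo for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def halpern_bruno_rates(pi, mu=1.0):
    """4x4 rate matrix for one site under Halpern-Bruno with equal mutation rates.

    Off-diagonal: Q[a][b] = mu * ln(pi_b / pi_a) / (1 - pi_a / pi_b),
    which tends to mu when pi_a == pi_b. Rows sum to zero.
    """
    Q = {}
    for a in BASES:
        Q[a] = {}
        for b in BASES:
            if b == a:
                continue
            ratio = pi[b] / pi[a]
            if math.isclose(ratio, 1.0):
                Q[a][b] = mu
            else:
                Q[a][b] = mu * math.log(ratio) / (1.0 - 1.0 / ratio)
        Q[a][a] = -sum(Q[a].values())
    return Q


sites = ["TGACTCA", "TGACTCA", "TGAGTCA"]  # toy aligned sites
Q = halpern_bruno_rates(position_frequencies(sites)[0])
print(round(Q["A"]["T"], 3), round(Q["T"]["A"], 3))  # toward vs away from the preferred T
```

Substitutions toward the preferred base run faster than the neutral rate mu and substitutions away from it run slower, which is the position-specific signature the earlier slides describe.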

43
Make better use of comparative data by using an
explicit evolutionary model
Moses et al Evol Biol 2003
44
Larger Cis-Regulatory Sequences
  • Known binding motifs in Drosophila have low
    information content
  • a match for each TFBS can be found upstream of almost
    every gene in the genome
  • Build a statistical model to identify significant
    clusters of binding sites in windows of arbitrary
    size
  • improved detection of cis-regulatory modules
  • experimental results still show many false
    positives
  • Use comparative data to discriminate real
    clusters from false ones
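The talk's model scores clusters of binding sites in windows of arbitrary size; as a much simpler stand-in, the sketch below slides a fixed-size window in half-window steps and asks how surprising each window's site count is under a Poisson background with a given per-bp site rate. The background rate, window size, alpha and the function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def dense_windows(site_starts, seq_len, window, rate, alpha=1e-3):
    """Slide a window in half-window steps and report unusually dense clusters.

    site_starts: start coordinates of predicted binding sites.
    rate: expected sites per bp under a background model (an assumption).
    Returns (window start, site count, Poisson tail probability) tuples.
    """
    lam = rate * window
    step = max(window // 2, 1)
    hits = []
    for start in range(0, max(seq_len - window, 0) + 1, step):
        k = sum(start <= x < start + window for x in site_starts)
        p = poisson_sf(k, lam)
        if p < alpha:
            hits.append((start, k, p))
    return hits


sites = list(range(100, 200, 10))  # toy cluster of 10 sites
for start, k, p in dense_windows(sites, seq_len=1000, window=100, rate=0.005):
    print(start, k, f"{p:.2e}")
```

As the slides note, a test like this still yields many false positives, which is why comparative data is brought in as a second filter.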

45
How to use comparative data
  • Conservation in Drosophila pseudoobscura isn't a
    good indicator of functionality
  • all real and fake clusters have very high overall
    sequence conservation, including their flanking
    regions (a surprise)
  • However...
  • the actual binding sites are often not conserved
  • even one or two mutations can destroy a binding
    site
  • conservation of binding-site density
    is a useful indicator of function

46
An Impassioned Speech on the Evolution of the
Scientific Journal
  • If you publish your work in a journal like
    Science, which fewer and fewer people in the world
    have access to, you run a really big risk of being
    the next Mendel, with your work languishing
    in obscurity
  • Don't publish in a journal that takes your
    writing, your ideas, thoughts and paper,
    claims ownership of them, and then only doles them
    out to a relatively narrow bunch of people who
    have enough money to pay for them... solely to
    promote the financial health of the journal...
  • Don't be like Microsoft... publish in Public
    Library of Science or another freely available
    journal

47
For More Information
  • Most of the talks I picked were invited talks
  • For the workshop talks, there is often only an
    abstract
  • Video feed is available online:
    http://www.calit2.net/multimedia/recomb2004videos.html
  • Many have papers that have just come out or are
    about to come out with additional details...
    check the authors' webpages

48
Variability of Expression
  • Best backup provided by duplicates which have
    similar expression patterns in only a subset of
    conditions

49
Evolution and Larger Cis-Regulatory Sequences
  • What are enhancers? Whole regions of binding
    sites?
  • How are Drosophila enhancers organized?
  • only 5 binding sites whose specificities are well
    characterized from experimental studies
  • low information content
  • find them all over the genome
  • Clusters of binding sites -> surrogate for
    regulatory function
  • Shown previously that if you look for clusters of
    these sites:
  • all identified regions overlap known enhancers
  • you don't find anything else
  • then I don't understand the next study with 39
    clusters

50
  • Found 39 clusters
  • 9 overlap known enhancers
  • 28 tested experimentally
  • 6 clearly regulating nearby gene
  • 3 perhaps show some regulatory role
  • the remainder don't appear to be real (but could they
    have the wrong promoter? look back at the Kadonaga talk)
  • What's the difference between real and fake clusters?
  • use comparative mapping

51
  • Used two flies (which ones?)
  • distant enough, based on coding-region
    conservation, that we would expect to see conservation
    only of functionally conserved regions
  • not the case
  • all real and fake clusters have very high overall
    sequence conservation, including their flanking
    regions (why?)

52
  • However,
  • binding sites not conserved
  • one or two mutations are enough to destroy a binding
    site
  • measure conservation of binding site density
  • show graph (3718)
  • summary (3921)
  • In more distantly related species
  • alignment more of an issue
  • binding sites will move around more
  • it has been shown that, with large binding-site turnover,
    there can be 2 separate ways to make the same enhancer
  • no sequence identity but in experimental studies
    can replace each other?