Computational Models of Function and Evolution of cis-Regulatory Sequences

1
Computational Models of Function and Evolution of
cis-Regulatory Sequences
  • Xin He (Advisor: Saurabh Sinha)
  • Department of Computer Science
  • University of Illinois, Urbana-Champaign

2
cis-Regulatory Modules
The spatial-temporal expression pattern of a gene
is controlled by its cis-regulatory module (CRM).
A CRM contains binding sites for one or more
transcription factors (TFs) and drives expression
of the target gene.
3
Motivations
  • Evolution of cis-regulatory modules
  • Evolutionary changes in CRMs are believed to give
    rise to new morphology
  • Better understanding of CRM evolution will lead
    to better discovery of CRMs through comparative
    genomics
  • Sequence-function relationship of CRMs
  • Important for analysis of genomic data:
    sequences, ChIP-binding, gene expression
  • Important for understanding how CRMs carry out
    their function

4
Existing Approaches
  • Existing approaches for cross-species CRM
    analysis
  • Assume the orthologous sequences are accurately
    aligned, but alignments are error-prone over
    large evolutionary distances
  • Assume a transcription factor binding site (TFBS)
    must be conserved in all aligned sequences, but
    binding site gain and loss are common even in
    relatively closely related species
  • Existing quantitative approaches to CRM
    sequence-function relationship
  • Often based on general formalisms from statistics
    and machine learning that do not reflect the
    underlying biological process
  • Miss important biochemical mechanisms of gene
    regulation such as interactions among TFs

5
Pairwise Model of CRM Evolution: EMMA
An evolutionary model on an entire cis-regulatory
module
6
Model of TFBS Gain and Loss - EMMA
[Figure: timeline of a binding site from time 0 to time t]
  • A functional site is initially under constraint
  • At some moment, a mutation disrupts the site
    (binding energy no longer satisfies the threshold)
  • The site is no longer constrained afterwards

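The three steps above can be illustrated with a small deterministic sketch. It is not EMMA itself: the score matrix `PWM`, the cutoff `THRESHOLD`, the function names, and the rule that disruptive mutations are simply rejected while the site is constrained are all assumptions made for illustration.

```python
# Hypothetical log-odds scores for a 4-bp motif with consensus ACGT
# (illustrative values, not taken from the presentation).
CONSENSUS = "ACGT"
PWM = [{b: (2.0 if b == c else -1.0) for b in "ACGT"} for c in CONSENSUS]
THRESHOLD = 4.0  # assumed cutoff: a site is functional if its score >= THRESHOLD


def score(site):
    """Sum of per-position scores, used here as a stand-in for binding energy."""
    return sum(column[base] for column, base in zip(PWM, site))


def is_functional(site):
    return score(site) >= THRESHOLD


def evolve(site, mutations, loss_step):
    """Apply (position, base) mutations in order.

    Before `loss_step`, a mutation that would push the site below the
    threshold is rejected (the site is under constraint). The mutation at
    `loss_step` and all later ones are accepted unconditionally.
    Returns the final site and whether it is still functional.
    """
    site = list(site)
    for step, (pos, base) in enumerate(mutations):
        candidate = site.copy()
        candidate[pos] = base
        if step < loss_step and not is_functional(candidate):
            continue  # purged by selection while the site is constrained
        site = candidate
    final = "".join(site)
    return final, is_functional(final)


mutations = [(0, "T"), (1, "G"), (1, "G"), (2, "A")]
final_site, functional = evolve("ACGT", mutations, loss_step=2)
```

In this example the first mutation is tolerated, the second is rejected because it would disrupt the site, and the same change is accepted at the loss step, after which the site drifts freely.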
7
Alignment of LAGAN vs EMMA
A. LAGAN alignment: TFBSs are misaligned at the
boundary, shifted by a few bp, or completely
unaligned.
B. EMMA alignment: all of these alignment errors are fixed.
8
Regulatory Target Prediction - EMMA
  • Classification of sequences: known targets vs.
    random
  • Two Drosophila species: Mel, Pse

9
Multi-species CRM Model: STEMMA
  • Constant rate of loss, µ, at each existing
    functional site
  • Constant rate of gain, λ, at each background
    nucleotide
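One way to read the two constant rates is as independent Poisson processes, so that waiting times to loss or gain are exponential. The sketch below works out the resulting probabilities under that assumption. The function and parameter names (`loss_rate`, `gain_rate`, `n_background`) are hypothetical, and this is not claimed to be how STEMMA computes its likelihood.

```python
import math


def survival_probability(loss_rate, t):
    """Probability that an existing functional site has not been lost by
    time t, assuming loss events occur at a constant rate."""
    return math.exp(-loss_rate * t)


def gain_probability(gain_rate, n_background, t):
    """Probability that at least one site is gained by time t among
    n_background nucleotides, each gaining sites at a constant rate."""
    return 1.0 - math.exp(-gain_rate * n_background * t)


# Illustrative numbers only: the fraction of sites expected to survive
# decays exponentially with divergence time.
fractions = [survival_probability(0.5, t) for t in (0.0, 1.0, 2.0)]
```

Under this reading, the fraction of conserved sites falls off smoothly with divergence, which is the kind of clock-like behaviour a constant loss rate implies.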

10
Regulatory Target Prediction - STEMMA
  • STUBB-mel: no conservation information
  • STUBB-avg: heuristic modeling of sequence
    conservation
  • STEMMA-NT: evolutionary model without binding
    site turnover (gain and loss)

11
Evolution of CRM in 12 Drosophila Species (I)
Epistatic interactions among different positions
of a TFBS. The HB model assumes that different
positions of a TFBS evolve independently. The SS
model assumes that binding sites evolve as a
single unit with possible dependence among
positions. The SS model has a smaller sum of squared
errors (SSE) when compared with the observed data.
X axis: evolutionary change of binding energy.
Y axis: frequency of that change.
12
Evolution of CRM in 12 Drosophila Species (II)
The binding site loss process roughly follows a
molecular clock.
X axis: divergence between D. melanogaster and
another Drosophila species.
Y axis: the fraction of conserved binding sites.
13
A Biophysical Model of TF-DNA Interactions
  • Configurations: a sequence with n binding sites
    has 2^n configurations, each one corresponding to
    the occupancy states of all sites (each site is
    either occupied or not).
  • Probabilities of configurations: a configuration
    exists with a certain probability, determined by
    1) TF-DNA binding: stronger binding, larger
    probability (qA, qB terms); 2) TF-TF
    interactions: more interaction, larger
    probability (wAB term)
  • The total binding affinity of the sequence to a
    factor A is the expected number of A molecules
    bound to the sequence, averaged over all
    configurations weighted by their probabilities.
  • For each configuration, shown are the probability
    (relative value) and the number of bound A
    molecules in that configuration
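The averaging over configurations can be made concrete with a brute-force sketch that enumerates all 2^n occupancy states. It assumes that a configuration's relative weight is the product of a binding term `q` for each occupied site and an interaction term `omega` for each interacting pair of occupied sites, matching the qA, qB and wAB terms on the slide. The function name, data layout and numeric values are hypothetical.

```python
from itertools import product


def expected_bound(sites, interactions, factor):
    """Expected number of `factor` molecules bound to the sequence.

    sites:        list of (tf_name, q), one entry per binding site
    interactions: dict mapping a site-index pair (i, j) to its omega term
    """
    partition = 0.0  # sum of relative weights over all configurations
    weighted_count = 0.0
    for config in product((0, 1), repeat=len(sites)):
        weight = 1.0
        for (tf_name, q), occupied in zip(sites, config):
            if occupied:
                weight *= q  # TF-DNA binding term
        for (i, j), omega in interactions.items():
            if config[i] and config[j]:
                weight *= omega  # TF-TF interaction term
        n_bound = sum(
            1 for (tf_name, _), occupied in zip(sites, config)
            if occupied and tf_name == factor
        )
        partition += weight
        weighted_count += weight * n_bound
    return weighted_count / partition


# Two sites, one for A and one for B (illustrative values only).
sites = [("A", 2.0), ("B", 3.0)]
coop = expected_bound(sites, {(0, 1): 5.0}, "A")  # with cooperativity
non_coop = expected_bound(sites, {}, "A")         # without cooperativity
```

With no interaction term the result reduces to q/(1+q) for the single A site, and adding a cooperative term greater than 1 raises the expected occupancy of A. Enumeration is exponential in the number of sites, so this form is only practical for short sequences.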

14
Analysis of ChIP-binding Data by Biophysical
Modeling
Cooperative interactions among TFs are important
for explaining DNA binding.
  • Coop/Non-coop: model with/without cooperative
    interactions.
  • The numbers are Pearson correlations between
    predicted and observed binding affinities.

15
Co-localization of TF Molecules
Insights from applying our model
  • Two TF molecules can be co-localized with only
    one molecule binding to DNA.
  • Two molecules of different TFs can bind to DNA
    independently without interaction.
  • Two TF molecules can bind to DNA cooperatively:
    binding of one may facilitate binding of the
    other.

16
Predicting Expression Patterns from Regulatory
Sequences
An integrated biophysical model
  • Cooperative binding of two adjacent TF molecules
  • Quenching (short-range repression) of an
    activator molecule by a nearby repressor
    molecule
  • Transcriptional synergism between two activator
    molecules in simultaneous contact with basal
    transcriptional machinery (BTM)
  • DNA binding of multiple TFs
  • Interaction of bound TF molecules with BTM
    determines gene activation