Identify regulatory modules from gene expression data


Identify regulatory modules from gene expression data
  • Xu Ling
  • 02/09/2005

Introduction
  • Much of a cell's activity is organized as a
    network of interacting modules: sets of genes
    coregulated to respond to different conditions.
    Identifying this organization is crucial for
    understanding cellular responses to internal and
    external signals.
  • Genome-wide expression profiles (e.g., from DNA
    microarrays) provide important information about
    regulatory mechanisms.
  • With complete genome sequences available,
    identifying cis-regulatory elements
    computationally on a genome-wide scale has
    emerged as a promising approach.

  • What are the underlying mechanisms by which genes
    are regulated?
  • Modules of coregulated genes?
  • Regulators (transcription factors)?
  • Regulation conditions (TFBSs/motifs, positional
    and combinatorial constraints)?

General scheme (1)
  • Clustering-based approaches for finding motifs
    from gene expression and sequence data

General scheme (2)
  • Sequence- (or knowledge-) based approaches for
    finding motifs from gene expression and sequence
    data

General scheme (3)
  • Comparative genomics has also been applied to
    identify eukaryotic regulatory elements (e.g.,
    human-mouse comparisons), because functional
    noncoding sequences may be conserved across
    species owing to evolutionary constraints.
  • Finding a good pair of species to compare and
    choosing a good sequence conservation threshold
    are critical, but such information is not
    available for most species.
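
As a rough illustration of the conservation-threshold idea above, the sketch below slides a fixed window over a pre-aligned sequence pair and reports the windows whose fraction of identical positions meets a cutoff. The function name, the window size, the 0.8 threshold, and the toy sequences are all assumptions made for illustration; none of them comes from the presentation.

```python
def conserved_windows(seq_a, seq_b, window=10, threshold=0.8):
    """Return (start, identity) for every window of a pairwise alignment
    whose fraction of identical, non-gap positions meets the threshold.

    Assumes seq_a and seq_b are already aligned to the same length
    (gaps written as '-'); alignment itself is out of scope here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        identity = matches / window
        if identity >= threshold:
            hits.append((start, identity))
    return hits


# Hypothetical aligned promoter fragments from two species.
species_a = "ACGTTGACTCATTTGCA"
species_b = "ACCTTGACTCATAAGGA"
candidate_regions = conserved_windows(species_a, species_b, window=8, threshold=0.9)
```

Raising the threshold or the window size makes the filter stricter, which is exactly the tuning problem the slide points to.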

Related work
  • Predicting gene expression from sequence
  • Michael A. Beer and Saeed Tavazoie
  • Cell, 2004, 117: 185-198
  • A successful application of existing
    computational approaches to studying the yeast
    transcriptional regulatory network

  • Clustering (k-means): modules of coregulated
    genes
  • Motif finding (AlignACE): putative regulatory
    elements (TFBSs)
  • Bayesian network learning: regulation conditions
    (motifs, positional and combinatorial constraints)
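
The clustering step can be sketched with a plain k-means over expression profiles. This is a minimal stdlib version under stated assumptions: squared Euclidean distance, random initial centers drawn from the data, and a made-up four-gene data set. It is not the configuration used in the cited paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iters=100, seed=0):
    """Group expression profiles into k clusters; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical log-ratio profiles for four genes under three conditions.
expression = [
    [2.1, 1.9, -0.2],
    [2.0, 2.2, 0.1],
    [-1.8, -2.0, 0.3],
    [-2.1, -1.7, 0.0],
]
gene_labels, module_centers = kmeans(expression, k=2)
```

Genes that share a label form a candidate module whose promoters would then be passed to the motif-finding step.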

Bayesian Network
  • Sequence features (x_1, ..., x_n) → expression patterns
  • Sequence feature (x_i): presence of motifs,
    positional constraints, and combinatorial
    constraints
  • Expression pattern (e_i): a binary one layer
  • Maximizing P(e_i | x_1, ..., x_n), the probability that
    genes with these sequence features will
    participate in expression pattern i

  • Easy to integrate all kinds of sequence features
  • Explicit sequence features
  • To keep complex networks from overfitting the
    training data, a parameter that penalizes dense
    networks is used.
  • The optimal network is learned greedily.
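
The combination of a density penalty and greedy learning can be sketched as below. This is a toy stand-in, not the scoring function of the cited paper: it assumes binary sequence features, a single binary expression-pattern node, a smoothed conditional probability table for P(e | selected features), and a fixed penalty per added parent. All names and the penalty value are hypothetical.

```python
import math
from collections import defaultdict


def log_likelihood(rows, labels, parents):
    """Log-likelihood of binary labels under P(e | parent features),
    estimated from counts with add-one smoothing."""
    counts = defaultdict(lambda: [1.0, 1.0])  # smoothed counts for e=0, e=1
    for x, e in zip(rows, labels):
        counts[tuple(x[j] for j in parents)][e] += 1
    total = 0.0
    for x, e in zip(rows, labels):
        c = counts[tuple(x[j] for j in parents)]
        total += math.log(c[e] / (c[0] + c[1]))
    return total


def greedy_parents(rows, labels, penalty=2.0):
    """Greedily add the feature that most improves the penalized score
    (log-likelihood minus penalty per parent); stop when nothing helps."""
    parents = []
    best = log_likelihood(rows, labels, parents)
    while True:
        candidates = [j for j in range(len(rows[0])) if j not in parents]
        scored = [(log_likelihood(rows, labels, parents + [j])
                   - penalty * (len(parents) + 1), j) for j in candidates]
        if not scored:
            break
        score, j = max(scored)
        if score <= best:
            break
        parents.append(j)
        best = score
    return parents


# Toy data: feature 0 (say, "motif present") determines membership,
# feature 1 is uninformative.
features = [(1, 0), (1, 1), (0, 0), (0, 1)] * 5
in_pattern = [1, 1, 0, 0] * 5
selected = greedy_parents(features, in_pattern)
```

On this toy data the penalty stops the search after the informative feature, since adding the uninformative one lowers the penalized score.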

Motif finding approaches
  • Explicit statistical modeling based:
  • Expectation maximization: MEME, ...
  • Gibbs sampling: AlignACE, Gibbs Motif Sampler, ...
  • Others: CONSENSUS, ...
  • Word enumeration based: MDscan, ...
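
To make the word-enumeration family concrete, the sketch below counts every word of a fixed width in a set of target promoters and ranks the words by how much more often they occur there than in a background set. It is a deliberately simplified illustration and not the MDscan algorithm; the scoring ratio, the add-one smoothing, and the toy sequences are assumptions.

```python
from collections import Counter


def enriched_words(target_seqs, background_seqs, width, top=5):
    """Rank words of the given width by the ratio of the fraction of target
    sequences containing them to a smoothed background fraction."""
    def presence(seqs):
        counts = Counter()
        for s in seqs:
            # A set, so each word is counted at most once per sequence.
            counts.update({s[i:i + width] for i in range(len(s) - width + 1)})
        return counts

    in_target = presence(target_seqs)
    in_background = presence(background_seqs)
    scored = []
    for word, n in in_target.items():
        target_frac = n / len(target_seqs)
        background_frac = (in_background[word] + 1) / (len(background_seqs) + 1)
        scored.append((word, target_frac / background_frac))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Hypothetical promoters of a coregulated module, each carrying TGACTCA,
# and unrelated background sequences.
module_promoters = ["GCTGACTCAGGC", "TGACTCATCGGA", "AAGGTTGACTCA"]
other_promoters = ["ACGTACGTACGT", "GGGGCCCCAAAA", "TTTTACACGTGT"]
top_words = enriched_words(module_promoters, other_promoters, width=7)
```

Exact-word counting like this misses degenerate sites, which is one reason the statistical models above are used alongside it.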

MEME
  • Each sequence is broken up into all overlapping
    subsequences of length W that it contains.
  • Two-component finite mixture model: motif (a
    set of similar subsequences of fixed width) and
    background (all other positions in the
    sequences).
  • Motif model: each example of the motif is assumed
    to be generated by a sequence of independent
    multinomial random variables.
  • Background model: each position that is not
    part of a motif is generated independently by a
    multinomial random variable.
  • Maximize the likelihood of the model M given the
    data D, L(M | D) = p(D | M), with the EM algorithm.
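
The mixture model above can be sketched in a few lines of plain Python. This is a simplified illustration, not the MEME implementation: it treats overlapping words as independent, adds a small pseudocount in the M-step, starts the motif model from a user-supplied seed word, and runs a fixed number of iterations. The function name, the starting values (0.7/0.1 and a mixing weight of 0.1), and the toy sequences are assumptions.

```python
ALPHABET = "ACGT"


def em_motif(seqs, width, seed_word, iters=30, pseudo=0.1):
    """EM for a two-component mixture over all overlapping words:
    a position-specific motif model and a single background model.
    Returns (theta, background, mixing_weight)."""
    words = [s[i:i + width] for s in seqs for i in range(len(s) - width + 1)]
    # Initial motif model favours the seed word at every position.
    theta = [{a: (0.7 if a == seed_word[j] else 0.1) for a in ALPHABET}
             for j in range(width)]
    background = {a: 0.25 for a in ALPHABET}
    lam = 0.1  # prior probability that a word is a motif occurrence
    for _ in range(iters):
        # E-step: posterior probability that each word came from the motif.
        z = []
        for w in words:
            p_motif, p_back = lam, 1.0 - lam
            for j, a in enumerate(w):
                p_motif *= theta[j][a]
                p_back *= background[a]
            z.append(p_motif / (p_motif + p_back))
        # M-step: re-estimate parameters from the weighted counts.
        total = sum(z)
        lam = total / len(words)
        for j in range(width):
            for a in ALPHABET:
                weight = sum(zi for zi, w in zip(z, words) if w[j] == a)
                theta[j][a] = (weight + pseudo) / (total + len(ALPHABET) * pseudo)
        back_counts = {a: pseudo for a in ALPHABET}
        for zi, w in zip(z, words):
            for a in w:
                back_counts[a] += 1.0 - zi
        norm = sum(back_counts.values())
        background = {a: c / norm for a, c in back_counts.items()}
    return theta, background, lam


# Hypothetical promoters, each carrying one copy of TGACTCA.
promoters = ["GCTGACTCAGGC", "TGACTCATCGGA", "AAGGTTGACTCA"]
theta, background, lam = em_motif(promoters, width=7, seed_word="TGACTCA")
consensus = "".join(max(ALPHABET, key=column.get) for column in theta)
```

Because every word contributes with weight z rather than being assigned outright, this is the "weighted average" behaviour that distinguishes EM from sampling-based methods.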

Gibbs motif sampler
  • Works with one specific alignment at a time
    rather than a weighted average over alignments,
    as EM does.
  • Iteratively samples a motif model (or possibly
    the background model) for each subsequence,
    thereby partitioning motif-encoding regions
    into different motifs.
  • Iterative heuristic method that combines
    gradient search steps with random jumps in the
    search space; it is not guaranteed to reach the
    optimum, but is less likely than EM to get stuck
    at a local maximum.
  • Identifies the most probable motif models by
    locating the optimal alignments, which maximize
    the ratio of the corresponding target
    probabilities to the background probabilities
    (the maximum a posteriori (MAP) score).
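
The sampling idea can be sketched with a basic site sampler. This is a reduced version under stated assumptions, not the Gibbs Motif Sampler itself: it assumes exactly one occurrence of a single motif per sequence, a fixed background estimated from overall letter frequencies, and a log-ratio score of motif to background probabilities as a stand-in for the MAP score. Function names, the pseudocount, and the number of sweeps are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def column_probs(seqs, pos, width, skip=None, pseudo=0.5):
    """Position-specific letter probabilities from the current alignment,
    optionally leaving out the sequence with index `skip`."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    used = 0
    for k, (s, p) in enumerate(zip(seqs, pos)):
        if k == skip:
            continue
        used += 1
        for j in range(width):
            counts[j][s[p + j]] += 1
    total = used + len(ALPHABET) * pseudo
    return [{a: c / total for a, c in col.items()} for col in counts]


def log_ratio(word, probs, background):
    """Log of motif probability over background probability for one word."""
    return sum(math.log(probs[j][a] / background[a]) for j, a in enumerate(word))


def gibbs_site_sampler(seqs, width, sweeps=200, seed=0):
    """Return (best_positions, best_score) seen during sampling."""
    rng = random.Random(seed)
    letters = "".join(seqs)
    background = {a: (letters.count(a) + 1) / (len(letters) + len(ALPHABET))
                  for a in ALPHABET}
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    best_pos, best_score = list(pos), float("-inf")
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Hold out sequence i, rebuild the model, then resample its site
            # in proportion to the motif-to-background probability ratio.
            probs = column_probs(seqs, pos, width, skip=i)
            starts = list(range(len(s) - width + 1))
            weights = [math.exp(log_ratio(s[p:p + width], probs, background))
                       for p in starts]
            pos[i] = rng.choices(starts, weights=weights)[0]
        probs = column_probs(seqs, pos, width)
        score = sum(log_ratio(s[p:p + width], probs, background)
                    for s, p in zip(seqs, pos))
        if score > best_score:
            best_pos, best_score = list(pos), score
    return best_pos, best_score


# Hypothetical promoters, each carrying one copy of TGACTCA.
promoters = ["GCTGACTCAGGC", "TGACTCATCGGA", "AAGGTTGACTCA"]
sites, score = gibbs_site_sampler(promoters, width=7)
```

Because each site is drawn at random rather than chosen greedily, a run can move away from a poor alignment; the best-scoring alignment seen is kept separately from the current state for that reason.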

Future work
  • An ab initio motif-finding approach from gene
    expression and sequence data, trying new
    heuristic or statistical models.
  • Integrating prior knowledge (e.g., GO) to
    facilitate identification of regulatory elements
    and the transcriptional network.