Special Topics in Genomics: Cis-regulatory Modules and Phylogenetic Footprinting

Slides: 28
Provided by: jihk
Transcript and Presenter's Notes

1
Special Topics in Genomics: Cis-regulatory
Modules and Phylogenetic Footprinting
2
Cis-regulatory Modules and Module Discovery
The slides for module discovery are provided by
Prof. Qing Zhou at UCLA
3
Motif Discovery
Mixture modeling
4
Difficulties in motif discovery in higher
organisms
  • Upstream sequences are longer.
  • Motifs are less conserved and shorter.
  • Background sequence structures are more
    complicated.
  • To solve the problem, utilize more biological
    knowledge in our model.
  • 1) module structure
  • 2) multiple species conservation

5
Cis-regulatory module
  • Combinatorial control of genes: cis-regulatory
    modules

6
CisModule: modeling module structure (Zhou and
Wong, PNAS 2004)
  • Module structure: consider co-localization of
    motif sites.

Hierarchical mixture modeling (K = number of motifs)
7
Parameters and missing data
  • Missing data problem.
  • K: number of motifs
  • l: module length
  • S: set of sequences
  • M: indicators for a module start
  • A: indicators for a motif site start
  • Background model
  • Weight matrices for motifs
  • W: motif widths
  • r: probability of a module start
  • q: probability of starting a motif site

Given: K, l
Observed data: S
Missing data: M, A
Parameters: background model, weight matrices, W, r, q
8
Bayesian inference by posterior sampling
9
Module sampling
  • Want to sample from P(M | S, parameters); need to
    calculate
  • Denote
  • Forward summation

10
Module sampling
  • Backward sampling
  • How to calculate
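
The forward summation and backward sampling steps can be illustrated with a deliberately simplified sketch. It assumes every position is either background or the start of a module of fixed length l, that a module starts with probability r, and that the likelihood of each candidate module segment (already summed over motif site placements) is supplied as an input list. The names forward_sums, backward_sample, bg, and mod are hypothetical and are not taken from CisModule itself.

```python
import random

def forward_sums(bg, mod, r, l):
    """Forward summation.

    bg[i]  : likelihood of position i under the background model
    mod[i] : likelihood of the segment i .. i+l-1 under the module model
    Returns f, where f[n] is the total probability of the first n positions
    summed over every way of placing modules among them.
    """
    L = len(bg)
    f = [0.0] * (L + 1)
    f[0] = 1.0
    for n in range(1, L + 1):
        # Case 1: position n-1 is a background position.
        f[n] = (1.0 - r) * bg[n - 1] * f[n - 1]
        # Case 2: a module covers positions n-l .. n-1.
        if n >= l:
            f[n] += r * mod[n - l] * f[n - l]
    return f

def backward_sample(bg, mod, r, l, f, rng):
    """Backward sampling: walk from the sequence end, drawing module starts."""
    starts = []
    n = len(bg)
    while n > 0:
        # Probability that the prefix of length n ends with a module.
        p_module = r * mod[n - l] * f[n - l] / f[n] if n >= l else 0.0
        if rng.random() < p_module:
            starts.append(n - l)
            n -= l
        else:
            n -= 1
    return sorted(starts)
```

For sequences of realistic length the forward sums underflow, so a practical version would rescale f at each step or work with logarithms; that detail is omitted here to keep the recursion visible.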

11
Posterior inference
  • Motif sites: marginal posterior probability of
    being a motif start position > 0.5.
  • Modules: marginal posterior probability of being
    within a module > 0.5.
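
As a minimal sketch of this decision rule, assume the sampler has produced a list of 0/1 indicator vectors, one per retained posterior draw. The marginal posterior probability at each position is then estimated by the fraction of draws in which the indicator is 1, and positions above 0.5 are reported. The function names and the input layout are assumptions made for illustration.

```python
def marginal_posterior(samples):
    """Estimate per-position marginal posterior probabilities from 0/1 draws."""
    n_draws = len(samples)
    length = len(samples[0])
    return [sum(draw[i] for draw in samples) / n_draws for i in range(length)]

def call_positions(samples, threshold=0.5):
    """Return positions whose estimated marginal posterior exceeds the threshold."""
    probs = marginal_posterior(samples)
    return [i for i, p in enumerate(probs) if p > threshold]
```

The same two functions apply to motif start indicators and to "within a module" indicators; only the meaning of the 0/1 vectors changes.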

12
Simulation study
  • Generate 30 data sets independently, each
    containing:
  • 1) 20 sequences, each of length 1000
  • 2) 25 modules, with length 150
  • 3) each module contains 1 E2F site, 1 YY1 site,
    and 1 cMyc site.

Motifs   CisModule              Do not consider module
         Fail   TP     FP       Fail   TP     FP
E2F      0.03   17.9   7.5      0.37   17.1   11.6
YY1      0.07   16.0   8.7      0.20   17.1   11.0
cMyc     0      15.7   9.9      0.63   13.6   12.4
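
One way to generate a data set with the stated dimensions is sketched below. Several choices are assumptions, since the slides do not specify them: the background is uniform random DNA, modules are placed on a fixed non-overlapping grid, each motif site is an exact copy of a placeholder consensus string rather than a draw from a weight matrix, and each site sits in its own third of the module. The strings in MOTIFS are illustrative placeholders, not the matrices used in the study.

```python
import random

# Placeholder consensus strings (hypothetical), one per motif.
MOTIFS = {"E2F": "TTTCGCGC", "YY1": "GCCATCTT", "cMyc": "CACGTGGC"}

def simulate_dataset(n_seq=20, seq_len=1000, n_modules=25, module_len=150, rng=None):
    """Return (sequences, truth), where truth lists (sequence index, position, motif)."""
    rng = rng or random.Random()
    # Uniform random background (assumption).
    seqs = [[rng.choice("ACGT") for _ in range(seq_len)] for _ in range(n_seq)]
    # Candidate module slots on a non-overlapping grid (assumption).
    slots = [(s, start)
             for s in range(n_seq)
             for start in range(0, seq_len - module_len + 1, module_len)]
    truth = []
    for s, start in rng.sample(slots, n_modules):
        names = list(MOTIFS)
        rng.shuffle(names)                  # random motif order within the module
        third = module_len // len(names)    # each site gets its own sub-window
        for k, name in enumerate(names):
            site = MOTIFS[name]
            pos = start + k * third + rng.randrange(third - len(site) + 1)
            seqs[s][pos:pos + len(site)] = site
            truth.append((s, pos, name))
    return ["".join(chars) for chars in seqs], truth
```

Calling simulate_dataset 30 times with different seeds would give 30 independent data sets, each with 25 modules and 75 planted sites against which predicted sites can be scored as true or false positives.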
13
Example: Discovery of tissue-specific modules in
Ciona
  • Sidow lab: collected 21 genes that are
    co-expressed during the development of muscle
    tissue in Ciona.
  • Want to find motifs and modules in the upstream
    sequences (average length 1330) of these genes.
  • Found 3 motifs in 28 modules (4860 bps).

Are they real motifs that determine the gene
expression?
14
Experimental validation
  • Positive element: the shortest sufficient and
    non-overlapping sequence that drives strong
    expression in muscle; average length of 289 bps.

15
Experimental validation
  • 70% of our predicted motif sites are located in
    the positive elements!

16
Other tools
  • Gibbs Module Sampler (Thompson et al. Genome Res.
    2004)
  • EMCMODULE (Gupta and Liu, PNAS, 2005)

17
Phylogenetic Footprinting
18
Functional elements tend to be conserved across
species
For example, exons are conserved due to
selection pressure. Introns and intergenic
regions are less likely to be conserved.
19
Phylogenetic footprinting
Miller et al. Annu. Rev. Genomics Hum. Genet. 2004
20
Incorporating cross-species conservation into
motif discovery
  • A threshold method (Wasserman et al. Nature
    Genetics, 2000)
  • STEP 1: construct cross-species alignment
  • STEP 2: compute conservation measure from the
    alignment
  • STEP 3: filter out non-conserved regions
  • STEP 4: apply Gibbs motif sampler to the
    conserved regions of the target genome
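
Steps 2 and 3 can be sketched as follows, assuming a pairwise alignment is already available as two equal-length strings with "-" for gaps. The conservation measure used here (fraction of identical, ungapped columns in a sliding window) and the default window size and threshold are illustrative assumptions, not the settings of the cited method. The motif sampler of step 4 is not shown.

```python
def window_identity(target, other, window):
    """Fraction of identical, ungapped columns in a window centred on each column."""
    if len(target) != len(other):
        raise ValueError("aligned sequences must have equal length")
    match = [1 if a == b and a != "-" else 0 for a, b in zip(target, other)]
    half = window // 2
    scores = []
    for i in range(len(match)):
        lo, hi = max(0, i - half), min(len(match), i + half + 1)
        scores.append(sum(match[lo:hi]) / (hi - lo))
    return scores

def conserved_segments(target, other, window=5, threshold=0.7):
    """Return ungapped stretches of the target whose window identity meets the threshold."""
    scores = window_identity(target, other, window)
    segments, current = [], []
    for base, score in zip(target, scores):
        if score >= threshold and base != "-":
            current.append(base)
        else:
            if current:
                segments.append("".join(current))
            current = []
    if current:
        segments.append("".join(current))
    return segments
```

The returned segments are what a motif sampler would then be run on; everything below the threshold is discarded outright, which is the defining feature of a threshold method.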

21
Phylogenetic footprinting: motif discovery
  • CompareProspector (Liu Y. et al. Genome Res.
    2004)
  • STEP 1: construct cross-species alignment
  • STEP 2: compute conservation measure (window
    percent identity, WPID) from the alignment
  • STEP 3: multiply the likelihood ratio at a
    position by the corresponding WPID, so that the
    likelihood landscape is changed to favor
    conserved sites
  • STEP 4: apply a Gibbs motif sampler based algorithm
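
The effect of step 3 on a single sampling step can be sketched as below. The sketch assumes the per-position motif-to-background likelihood ratios and the WPID values (scaled to the range 0 to 1) have already been computed, and it ignores the "no site in this sequence" option that a full sampler would include. The function names are hypothetical.

```python
import random

def weighted_site_probabilities(likelihood_ratios, wpid):
    """Multiply each likelihood ratio by its WPID and normalise to probabilities."""
    weighted = [lr * w for lr, w in zip(likelihood_ratios, wpid)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all weighted likelihood ratios are zero")
    return [x / total for x in weighted]

def sample_site(likelihood_ratios, wpid, rng):
    """Draw one candidate site start according to the conservation-weighted landscape."""
    probs = weighted_site_probabilities(likelihood_ratios, wpid)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Two positions with the same likelihood ratio are no longer equally likely to be sampled: the one in the more conserved window is favored. Unlike the threshold method, poorly conserved positions are down-weighted rather than removed.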

22
Phylogenetic footprinting: motif discovery
  • Evolutionary model based approach
  • EMnEM (Moses et al. 2004)
  • PhyME (Sinha et al. 2004)
  • PhyloGibbs (Siddharthan et al. 2005)
  • Tree Sampler (Li and Wong, 2005)

23
Incorporating cross-species conservation into
motif discovery
  • PhyloCon (Wang and Stormo, Bioinformatics, 2003)
  • STEP 1: construct alignment among orthologous
    sequences
  • STEP 2: convert conserved regions into profiles
  • STEP 3: use profiles in the first sequence as
    seeds
  • STEP 4: find matches of each seed in the second
    sequence
  • STEP 5: update seeds
  • STEP 6: repeat steps 2 and 3 for all sequences.

24
Phylogenetic footprinting: module discovery
  • MultiModule (Zhou and Wong, The Annals of Applied
    Statistics, 2007)

25
MultiModule
  • Module structure of each sequence is modeled by
    an HMM.
  • Couple HMMs via multiple alignment: aligned
    states are coupled and collapsed into one common
    state.
  • Uncoupled states: similar to the single-species
    model.
  • Coupled states: evolutionary model.

26
Comparing with other methods
  • Three data sets with experimental validation
    reported previously, which contain 9 known motifs
    with 152 validated sites.
  • CompareProspector (Liu et al. 2004): conservation
    score
  • PhyloCon (Wang and Stormo 2003): progressive
    alignment of profiles
  • EMnEM (Moses et al. 2004): phylogenetic motif
    discovery
  • CisModule (Zhou and Wong 2004): single-species
    module discovery.

27
Comparing with other methods
Method             # known motifs   For correctly identified motifs by each method
                   identified       # predicted sites   # overlaps   Sensitivity (%)   Specificity (%)
CompareProspector  7                75                  36           24                48
PhyloCon           3                50                  26           17                52
EMnEM              6                130                 44           29                34
CisModule          5                110                 35           23                32
MultiModule        8                157                 79           52                50

# of known sites: 152
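
The percentages in the comparison are numerically consistent with sensitivity = overlaps / known sites and specificity = overlaps / predicted sites. That reading is inferred from the reported counts rather than stated in the slides, and the function name below is hypothetical. Under this definition "specificity" corresponds to what is often called precision.

```python
def sensitivity_specificity(n_overlap, n_predicted, n_known=152):
    """Return (sensitivity %, specificity %) rounded to whole percentages.

    sensitivity = overlaps / known validated sites
    specificity = overlaps / predicted sites (assumed definition)
    """
    sensitivity = 100.0 * n_overlap / n_known
    specificity = 100.0 * n_overlap / n_predicted
    return round(sensitivity), round(specificity)

# Counts per method: (predicted sites, overlaps with validated sites).
COUNTS = {
    "CompareProspector": (75, 36),
    "PhyloCon": (50, 26),
    "EMnEM": (130, 44),
    "CisModule": (110, 35),
    "MultiModule": (157, 79),
}

for method, (predicted, overlap) in COUNTS.items():
    print(method, sensitivity_specificity(overlap, predicted))
```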