Promoter Discovery: A Correlation Mining Approach - PowerPoint PPT Presentation

About This Presentation
Provided by: jas571
Learn more at: http://www.cs.wayne.edu

Transcript and Presenter's Notes


1
Promoter Discovery: A Correlation Mining Approach
  • Yi Lu
  • Department of Computer Science
  • Wayne State University

2
Outline
  • Introduction
  • Related Work
  • Problem Definition
  • Correlation Mining
  • Conclusion and Future work

3
Introduction
  • Central Dogma
  • Gene Expression

4
Introduction
  • The promoter region (a set of transcription
    factor binding sites) of a gene acts as a light
    switch: it signals when to turn the gene on and off.
  • We are interested in the relationship between the
    promoter region and gene expression, i.e., what
    kind of binding sites determine whether a gene is
    expressed or not?

5
Introduction - Microarray
Microarray chips
Images scanned by laser
Expression values:
  Gene        Value
  D26528_at    193
  D26561_at     70
  D26579_at    318
  D26598_at   1764
  D26599_at   1537
  D26600_at   1204
  D28114_at    707
  H29189_at    899
  G29183_at   9210
Datasets (genes × datasets):
  columns D1, D2, D3, D4, ...
  rows D26528_at, D26561_at, D26579_at, D26598_at, D26599_at,
  D26600_at, D28114_at, ...

6
Introduction
  • Transcription factor binding sites (motifs) in
    the promoter region should explain changes in
    transcription.

Time Course genes
AGCTAGCTGATTGTGCACACTGATCGAGCCCCACCATAGCTTCGTTGTG
CGCTATATATTGTGCAGCTAGTAGAGCTCTGCTAGAGCTCTATTTGTG
CCGATTGCGGGGCGTCTGAGCTCTTTGCTCTTTTGTGCCGCTTTTGAT
ATTATCTCTCTGCTCGTTTGTGCTTTATTGTGGGGGTTGTGCTGATTAT
GCTGCTCATAGGAGATTGTGCGAGAGTCGTCGTAGTTGTGCGTCGTCG
TGATGATGCTGCTGATCGATCGTTGTGCCTAGCTAGTAGATCGATGTT
TGTGCAGAAGAGAGAGGGTTTTTTCGCGCCGCCCCGCGCTTGTGCTCG
AGAGGAAGTATATATTTGTGCGCGCGCCGCGCGCACGTTGTGCAGCTGA
TGCATGCATGCTAGTATTGTGCCTAGTCAGCTGCGATCGACTCGTAGC
ATGCATCTTGTGCAGTCGATCGATGCTAGTTATTGTTGTGCGTAGTAG
TGCTTGTGCTCGTAGCTGTAG
R(t2)

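The premise here is that a binding-site motif should recur in the promoters of genes whose transcription changes. As a minimal sketch of that first step, the Python snippet below counts overlapping occurrences of a candidate motif in promoter sequences. The motif `TTGTG` is an assumption, picked only because it recurs in the example sequence (the slide does not name a motif), and `count_motif`, `motif_presence`, and the gene names are hypothetical.

```python
def count_motif(sequence: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in sequence."""
    sequence, motif = sequence.upper(), motif.upper()
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping matches are also counted.
        start = sequence.find(motif, start + 1)
    return count


def motif_presence(promoters: dict[str, str], motif: str) -> dict[str, int]:
    """Map each gene name to the number of motif hits in its promoter."""
    return {gene: count_motif(seq, motif) for gene, seq in promoters.items()}


if __name__ == "__main__":
    # Hypothetical gene names; the sequences are lines from the slide example.
    promoters = {
        "geneA": "AGCTAGCTGATTGTGCACACTGATCGAGCCCCACCATAGCTTCGTTGTG",
        "geneB": "TGCATGCATGCTAGTATTGTGCCTAGTCAGCTGCGATCGACTCGTAGC",
    }
    print(motif_presence(promoters, "TTGTG"))
```

Overlapping matches are counted on purpose, since repeated short motifs such as `TGTG` can overlap within a promoter.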
7
Related work
  • Cluster gene expression profiles
  • Search for motifs in promoter regions of
    clustered genes

8
Related work
  • Clustering
  • Partition the N genes into a set of disjoint
    groups so that genes in the same group have
    highly similar expression profiles and genes in
    different groups have dissimilar profiles.
  • Most widely used algorithms: K-means clustering
    and hierarchical clustering algorithms.
  • Genetic K-means algorithms (Lu et al. 2003,
    2004).
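For readers unfamiliar with the clustering step, here is a minimal sketch of standard (Lloyd's) K-means on expression profiles. It is plain K-means, not the genetic K-means variants cited above; the toy profiles, `k`, and the fixed random seed are illustrative assumptions.

```python
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(profiles, k, iterations=100, seed=0):
    """Partition expression profiles into k disjoint clusters (Lloyd's algorithm).

    profiles: list of equal-length lists of expression values (one per gene).
    Returns a list of cluster labels, one per profile.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each gene joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical profiles: two genes rising over time, two falling.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 1.0, 0.0], [5.2, 0.9, 0.1]]
print(kmeans(profiles, 2))
```

Squared Euclidean distance is used for simplicity; correlation-based distances are also common for expression profiles.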

9
Related work
  • Motif discovery after clustering
  • Given the upstream sequences of a set of
    co-expressed genes, find subsequences that are
    overrepresented and significantly distinct from
    other subsequences.
  • Algorithms: MEME, Gibbs sampling, Winnower.
  • PDC algorithm (Lu et al. 2006)
  • These methods usually have a high false-positive
    rate.
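The notion of an "overrepresented subsequence" can be illustrated with a naive k-mer enrichment ratio: the fraction of co-expressed promoters containing a k-mer divided by the fraction of background promoters containing it. This is only a sketch of the idea, not MEME, Gibbs sampling, Winnower, or PDC, and it applies no significance test; the pseudocount, the choice of k, and the toy sequences are assumptions.

```python
from collections import Counter


def kmer_document_counts(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enrichment(cluster_seqs, background_seqs, k, pseudocount=1.0):
    """Return (kmer, ratio) pairs sorted from most to least enriched.

    ratio = (fraction of cluster sequences containing the k-mer)
          / (fraction of background sequences containing it),
    with pseudocounts so unseen background k-mers do not divide by zero.
    """
    fg = kmer_document_counts(cluster_seqs, k)
    bg = kmer_document_counts(background_seqs, k)
    n_fg, n_bg = len(cluster_seqs), len(background_seqs)
    scores = {
        kmer: ((fg[kmer] + pseudocount) / (n_fg + 2 * pseudocount))
        / ((bg[kmer] + pseudocount) / (n_bg + 2 * pseudocount))
        for kmer in fg
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical co-expressed and background promoters.
cluster = ["AATTGTGA", "CCTTGTGC", "GTTGTGAA"]
background = ["AAAAAAAA", "CCCCCCCC", "GGGGAAAA"]
print(enrichment(cluster, background, 5)[:3])
```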

10
Motivation
  • Research has indicated that multiple
    transcription factor binding sites are involved
    in each transcription process. This led us to
    study modules (pairs of motifs) instead of
    individual motifs.
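Treating a module as a pair of motifs, the modules present in a promoter follow directly from the motifs it contains: every unordered pair of co-occurring motifs is a module. The sketch below assumes motif presence per gene is already known (for example, from a motif-discovery tool); the gene and motif names are hypothetical.

```python
from itertools import combinations


def modules_present(motifs_by_gene):
    """Return, for each gene, the set of modules (unordered motif pairs)
    whose two motifs both occur in that gene's promoter."""
    return {
        gene: {frozenset(pair) for pair in combinations(sorted(motifs), 2)}
        for gene, motifs in motifs_by_gene.items()
    }


motifs_by_gene = {
    "geneA": {"motifX", "motifY", "motifZ"},
    "geneB": {"motifX"},  # a single motif forms no module
}
print(modules_present(motifs_by_gene))
```

Modules are stored as `frozenset`s so that (X, Y) and (Y, X) are the same module and can be used as dictionary keys later.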

11
Motivation
  • Not all genes that contain the same motif show
    the same gene expression change.
  • Not all genes with the same gene expression
    change contain the same motif.

12
Problem Definition
  • Given a list of genes together with each gene's
    module-presence information and gene expression
    information, find the relationship between modules
    and gene expression, i.e., which modules or module
    combinations may relate to a gene expression
    change.
  • Example: M1 M2 → increased gene expression from
    Day 1 to Day 4

13
Method - Quantify Gene Expression
14
Method - Quantify Gene Expression
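The transcript of these two slides contains no text, so the exact quantification rule is not recoverable. One reading consistent with the labels used on the following slides (E1, E1-, E2, E2-, ...) is that each interval between consecutive time points is labeled as an increase (Ei) or a decrease (Ei-). The sketch below implements that reading; the log2 fold-change criterion, the cutoff of 1.0, and the label format are all assumptions, not the author's stated method.

```python
import math


def expression_labels(values, cutoff=1.0):
    """Label expression changes between consecutive time points.

    values: positive expression measurements of one gene, ordered by time.
    Interval i (1-based) is labeled "Ei" for an increase and "Ei-" for a
    decrease when |log2(next / current)| >= cutoff; smaller changes get no label.
    """
    labels = set()
    for i, (current, nxt) in enumerate(zip(values, values[1:]), start=1):
        change = math.log2(nxt / current)
        if change >= cutoff:
            labels.add(f"E{i}")
        elif change <= -cutoff:
            labels.add(f"E{i}-")
    return labels


# Hypothetical measurements of one gene at four time points.
print(sorted(expression_labels([100.0, 400.0, 90.0, 95.0])))
```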
15
Method - Generate Frequent Module Set
  • Frequent module sets (occurrence ≥ 2)
  • M1 (4), M2 (3), M3 (2), M4 (1)
  • M1M2 (3), M1M3 (2), M2M3 (1)
  • M1M2M3 (1)
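Frequent module sets can be found by support counting: count how many genes contain every module in a set, and keep the sets that reach the minimum occurrence. The sketch below enumerates all subsets per gene, which is fine for the handful of modules on this slide but would need Apriori-style pruning at scale. The four-gene input is hypothetical, constructed only so that its counts match the numbers listed above.

```python
from collections import Counter
from itertools import combinations


def module_set_counts(modules_by_gene):
    """Count, for every non-empty module set, the genes containing all of it."""
    counts = Counter()
    for modules in modules_by_gene.values():
        items = sorted(modules)
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    return counts


def frequent_module_sets(modules_by_gene, min_occurrence=2):
    """Keep only module sets that occur in at least min_occurrence genes."""
    counts = module_set_counts(modules_by_gene)
    return {s: n for s, n in counts.items() if n >= min_occurrence}


# Hypothetical input chosen to reproduce the slide's counts.
genes = {
    "g1": {"M1", "M2", "M3"},
    "g2": {"M1", "M2"},
    "g3": {"M1", "M2", "M4"},
    "g4": {"M1", "M3"},
}
print(frequent_module_sets(genes))
```

The same counting applies unchanged to the gene expression sets on the next slide, with labels such as E1 or E2- in place of module names.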
16
Method - Generate Frequent Gene Expression Set
  • Frequent gene expression sets (occurrence ≥ 2)
  • E1 (2), E1- (0), E2 (1), E2- (3), E3 (0),
    E3- (2)
  • E1E2- (1), E1E3- (1), E2-E3- (2)

17
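Correlation Measure - Contingency Table
  • The relation between u and v in a pair (u, v) is
    summarized by a 2×2 contingency table that counts
    genes with and without u and with and without v.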
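For a pair (u, v), such as a module set u and a gene expression set v, the four cells of the contingency table can be counted from sets of gene identifiers. A minimal sketch, assuming u and v are each given as the set of genes that have them; the cell names a, b, c, d and the gene names are a hypothetical labeling convention chosen here.

```python
def contingency_table(genes_with_u, genes_with_v, all_genes):
    """Return the 2x2 counts (a, b, c, d) for the pair (u, v):
    a = has u and v, b = has u only, c = has v only, d = has neither."""
    u, v, universe = set(genes_with_u), set(genes_with_v), set(all_genes)
    a = len(u & v)
    b = len(u - v)
    c = len(v - u)
    d = len(universe - u - v)
    return a, b, c, d


# Hypothetical example: u in g1, g2, g3; v in g1, g2; four genes in total.
print(contingency_table({"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g2", "g3", "g4"}))
```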

18
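Liddell Measure
  • Liddell $= \frac{ad - bc}{(a + c)(b + d)} = \frac{2 \cdot 1 - 1 \cdot 0}{2 \cdot 2} = 0.5$,
    where a, b, c, d are the numbers of genes with
    (u, v), (u, not v), (not u, v), and (not u, not v)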

19
Method - Correlate Module Set with Gene Expression Set
  • Minimize the module set
  • Maximize the gene expression set
  • With the minimum Liddell value set to 0.5/-0.5
    (i.e., |Liddell| ≥ 0.5), the resulting sets are:
  • M2 → E1
  • M2 → (E2- E3-)
  • M3 → (E2- E3-)
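One way to read "minimize the module set, maximize the gene expression set" together with the ±0.5 threshold: keep a rule X → Y only if |Liddell| ≥ 0.5, no passing rule uses a proper subset of X for the same Y, and no passing rule uses a proper superset of Y for the same X. The sketch below applies that reading to precomputed scores. The pruning rule is an interpretation not confirmed by the slides, and the scores are hypothetical.

```python
def select_rules(scores, threshold=0.5):
    """Filter candidate rules {(module_set, expression_set): Liddell score}.

    Keeps rules with |score| >= threshold whose module set is minimal
    (no passing rule with a proper subset of modules and the same expressions)
    and whose expression set is maximal
    (no passing rule with the same modules and a proper superset of expressions).
    """
    passing = {rule for rule, s in scores.items() if abs(s) >= threshold}
    selected = []
    for modules, expressions in passing:
        smaller_modules = any(m < modules and e == expressions for m, e in passing)
        larger_expressions = any(m == modules and e > expressions for m, e in passing)
        if not smaller_modules and not larger_expressions:
            selected.append((modules, expressions))
    return selected


F = frozenset
scores = {  # hypothetical Liddell values
    (F({"M2"}), F({"E1"})): 0.5,
    (F({"M1", "M2"}), F({"E1"})): 0.6,  # pruned: M2 alone already passes
    (F({"M3"}), F({"E2-"})): 0.7,  # pruned: M3 -> (E2- E3-) also passes
    (F({"M3"}), F({"E2-", "E3-"})): 0.5,
    (F({"M4"}), F({"E3-"})): 0.2,  # below threshold
}
for modules, expressions in select_rules(scores):
    print(sorted(modules), "->", sorted(expressions))
```

Frozensets make the subset and superset checks (`<`, `>`) one-liners.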

20
Results on Spermatogenesis
  • Spermatogenesis is the biological process of
    sperm formation. Two gene expression datasets
    were downloaded from GEO (Gene Expression
    Omnibus).
  • One dataset covers days 0, 3, 6, 8, 10, 14, 18,
    20, 30, 35, and 56; the other covers days 1, 4,
    8, 11, 14, 18, 21, 26, 29, and 60.

21
System Workflow
  • GEO: Gene Expression Omnibus
  • DBTSS: DataBase of Transcriptional Start Sites
  • TRANSFAC: the Transcription Factor database
  • JASPAR: the high-quality transcription factor
    binding profile database

22
Conclusion
  • Across the two datasets, the approach pulled out
    not only the same module combinations but also
    the same genes that contain those module
    combinations.
  • The promoters detected using our approach are
    statistically more significant than those found
    in randomly generated datasets.
  • Some promoters found by our approach are
    confirmed in the literature.

23
Future work
  • The concordance between the two gene expression
    datasets downloaded from GEO is low; a new method
    to reconcile the differences between the two
    datasets is needed.
  • Different algorithms report an overwhelming
    number of motifs; we may incorporate weight
    matrices and Gene Ontology annotations to
    identify the significant ones.

24
References
  • Gene Expression Clustering
  • Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng,
    and Susan Brown, "FGKA: A Fast Genetic K-means
    Clustering Algorithm", in Proceedings of the 19th
    ACM Symposium on Applied Computing, Nicosia,
    Cyprus, March 2004.
  • Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng,
    and Susan Brown, "Incremental Genetic K-means
    Algorithm and its Application in Gene Expression
    Data Analysis", BMC Bioinformatics, 5:172,
    October 2004.
  • Motif Discovery
  • Yi Lu, Shiyong Lu, Farshad Fotouhi, Yan Sun, and
    Zijiang Yang, "PDC: Pattern Discovery with
    Confidence in DNA Sequences", in Proceedings
    of the IASTED International Conference on
    Advances in Computer Science and Technology (ACST
    2006), Puerto Vallarta, Mexico, January 2006.
  • Motif Extraction, Module Integration
  • Adrian E. Platts, Yi Lu, and Stephen A. Krawetz,
    "K-SPMM, an Online System for Data Mining
    Regulatory Elements from Murine Spermatogenic
    Promoter Sequences", presented at the 2006 Great
    Lakes Mammalian Development Meeting, Toronto,
    March 3-5, 2006.
  • Yi Lu, Adrian E. Platts, Charles G. Ostermeier,
    and Stephen A. Krawetz, "A Database of Murine
    Spermatogenic Promoters Modules & Motifs",
    submitted to BMC Bioinformatics.
  • Correlation Mining
  • Yi Lu, Adrian Platts, Shiyong Lu, Jeffrey L. Ram,
    and Stephen Krawetz, "Correlation Mining to
    Reveal the Regulation of Transcription Factor
    Binding Site Modules", 4th Great Lakes
    Bioinformatics Retreat, Frankenmuth, Michigan,
    August 2005.
  • Yi Lu, Adrian Platts, Shiyong Lu, Jeffrey L. Ram,
    and Stephen Krawetz, "Mining of Correlation
    Between Transcription Binding Sites and Gene
    Expression Profiles", in preparation.

25
(No Transcript)
26
Acknowledgements
  • Dr. Shiyong Lu
  • Dr. Stephen Krawetz
  • Mr. Adrian Platts
  • Dr. Jeffrey Ram
  • Dr. Youping Deng

27
Questions?