Introduction to the Eukaryotic Promoter Database (EPD) and Signal Search Analysis (SSA)



1
Introduction to the Eukaryotic Promoter Database (EPD) and Signal Search Analysis (SSA)
Workshop on Regulatory Sequence Motif Discovery, November 10th 2006. The Linnaeus Centre for Bioinformatics, SLU-UU, Sweden.
  • Giovanna Ambrosini, Christoph Schmid

2
Components of transcriptional regulation
Figure: distal transcription-factor binding sites (enhancer) and cis-regulatory modules
Wasserman and Sandelin, Nat. Rev. Genet. 5, 276-287 (2004)
3
EPD: The Eukaryotic Promoter Database, Current Release 88 (SEPT-2006)
  • founded in 1986 (Bucher and Trifonov, Nucleic Acids Res, 14, 10009-10026)
  • originally based exclusively on the literature, carefully maintained and regularly updated
  • in recent years, mass sequencing data have started to be taken into account
  • aims at high-precision mapping of transcription start sites (±5 bp)
  • promoter sequences of 139 different species, still relatively low coverage (e.g. 1871 human entries)
  • format of TSS annotation:

      DR EMBL ZZ999999.1 HS28BP -19, 9.

        -15  -10   -5    0    5
      ''''''''''''''''''''''''''''
      acccgcctgcacccgattcATGTGAGAA

  • one or several alternative transcription start sites per gene

4
EPD format
    ID   HS_RPS3 standard multiple VRT.
    XX
    AC   EP74176
    XX
    DT   10-JAN-2003 (Rel. 73, created)
    DT   13-SEP-2004 (Rel. 80, Last annotation update).
    XX
    DE   Ribosomal protein S3.
    OS   Homo sapiens (human).
    XX
    HG   none.
    AP   none.
    NP   none.
    XX
    DR   GENOME NT_033927.7 NT_033927 -5333322, 12577805.   ENSEMBL UCSC HapMap
    DR   CLEANEX HS_RPS3.
    DR   EMBL AP000744.4 -90138, 35862.   EMBL GenBank DDBJ
    DR   SWISS-PROT P23396 RS3_HUMAN.
    DR   RefSeq NM_001005 DBTSS.
    DR   MIM 600454.
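Each line of an EPD entry starts with a two-letter line code (ID, AC, DT, DE, OS, DR, ...). A minimal sketch of grouping such lines by code, assuming the code occupies the first two characters, "XX" lines are empty spacers and a "//" line ends the entry (EMBL-style conventions assumed here, not confirmed by the slide). `parse_epd_entry` is a hypothetical helper, not part of any EPD software.

```python
from collections import defaultdict


def parse_epd_entry(text: str) -> dict[str, list[str]]:
    """Group the lines of one EPD-style entry by their two-letter line code.

    Assumptions (not taken from the slide): the code is the first two characters,
    'XX' lines are empty spacers, and a '//' line terminates the entry.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if len(line) < 2:
            continue  # skip blank or truncated lines
        code, value = line[:2], line[2:].strip()
        if code == "//":
            break  # end of entry
        if code == "XX":
            continue  # spacer line carries no data
        fields[code].append(value)
    return dict(fields)


entry = """ID   HS_RPS3 standard multiple VRT.
XX
AC   EP74176
XX
DR   CLEANEX HS_RPS3.
DR   MIM 600454.
//"""
print(parse_epd_entry(entry)["DR"])  # ['CLEANEX HS_RPS3.', 'MIM 600454.']
```

Keeping every value as a list matters because codes such as DT and DR repeat within one entry.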
5
TSS determined by modelling Gaussian distributions (MADAP)
Figure: frequency of full-length transcripts versus genomic position, with two annotated regions at 84047148-84047231 and 84046905-84046987 (scale marks: 10 bp, 45 bp)
Schmid, C.D., Praz, V., Delorenzi, M., Perier, R. and Bucher, P. (2004) The Eukaryotic Promoter Database EPD: the impact of in silico primer extension. Nucleic Acids Res, 32, D82-85.
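The slide attributes TSS calling to Gaussian modelling with MADAP. The sketch below does not reproduce MADAP; it only shows the basic single-cluster step under the assumption that the cluster of transcript 5' ends has already been delimited: the maximum-likelihood mean and standard deviation of one Gaussian, with the mean taken as the TSS estimate. `fit_gaussian_tss` and the toy counts are hypothetical.

```python
import math


def fit_gaussian_tss(positions: list[int], counts: list[int]) -> tuple[float, float]:
    """Maximum-likelihood fit of ONE Gaussian to a cluster of transcript 5' ends.

    positions: genomic coordinates of mapped 5' ends within a single cluster
    counts:    number of full-length transcripts starting at each coordinate
    Returns (mean, standard deviation); the mean serves as the TSS estimate.
    """
    if len(positions) != len(counts):
        raise ValueError("positions and counts must have the same length")
    n = sum(counts)
    if n == 0:
        raise ValueError("cluster contains no transcripts")
    mean = sum(p * c for p, c in zip(positions, counts)) / n
    variance = sum(c * (p - mean) ** 2 for p, c in zip(positions, counts)) / n
    return mean, math.sqrt(variance)


tss, width = fit_gaussian_tss([100, 101, 102], [1, 2, 1])  # tss == 101.0
```

The standard deviation gives a rough measure of how sharply defined the start site is, which is one way to read the different widths shown in the figure.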
6
Source             -10 to 10   -400 to 400   n
EPD 70             0.83        1             36
RefSeq mRNA        0.32        0.95          933
Genome annot.      0.31        0.95          890
DBTSSv1 (human)    0.13        0.68          933
Eponine            0.12        0.46          494
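The slide does not label its columns. One plausible reading, stated here only as an assumption, is that the two numeric columns are the fractions of TSSs from each source lying within -10..+10 bp and -400..+400 bp of a reference TSS, and the last column is the number of TSSs compared. A minimal sketch of that kind of measure, ignoring strand; `fraction_within` is a hypothetical helper.

```python
import bisect


def fraction_within(predicted: list[int], reference: list[int], window: int) -> float:
    """Fraction of predicted TSS positions lying within +/- window bp of some reference TSS."""
    if not predicted:
        return 0.0
    ref = sorted(reference)
    hits = 0
    for p in predicted:
        i = bisect.bisect_left(ref, p - window)  # first reference position >= p - window
        if i < len(ref) and ref[i] <= p + window:
            hits += 1
    return hits / len(predicted)


print(fraction_within([100, 150, 1000], [105, 500], 10))   # 1 of 3 within 10 bp
print(fraction_within([100, 150, 1000], [105, 500], 400))  # 2 of 3 within 400 bp
```

Sorting the reference once and using binary search keeps each lookup logarithmic, which matters when comparing thousands of TSSs.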
7
Superior precision of in silico primer extension
(ISPE)
8
New data sources for EPD
ChIP-chip: Kim et al. (2005) Nature, 436, 876-880
GEO GSE2672 (remapped!)
virtual counts = 2^(log ratio) - 1
ENSEMBL chr12, 6.8-6.94 Mb
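Reading the slide formula as virtual counts = 2^(log ratio) - 1, with the ChIP-chip log ratio taken to base 2 (an assumption consistent with the 2 in the formula), the conversion is one expression per probe. `virtual_counts` is a hypothetical name.

```python
def virtual_counts(log2_ratios: list[float]) -> list[float]:
    """Convert ChIP-chip log2 enrichment ratios to 'virtual counts': 2**r - 1 per probe."""
    return [2.0 ** r - 1.0 for r in log2_ratios]


print(virtual_counts([0.0, 1.0, 3.0]))  # [0.0, 1.0, 7.0]
```

A log ratio of 0 (no enrichment) maps to 0 counts. Negative ratios give values between -1 and 0, which a real pipeline might clip to 0; the slide does not say how they were handled.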
9
ChIP-chip data with insufficient resolution
FP Hs USP5 R EU NC_000012.10 1 6831557 74339.
10
EPD webserver http://www.epd.isb-sib.ch/
  • find EPD entry(-ies) using gene symbols,...
  • extraction of promoter sequences in user-defined ranges
  • direct transfer to Signal Search Analysis (SSA)
  • download of complete (reference!) promoter sets
    http://www.epd.isb-sib.ch/seq_download.html

11
SSA: Signal Search Analysis
Giovanna Ambrosini, ISREC (Swiss Institute for Experimental Cancer Research)
  • History: Signal Search Analysis is a method developed by P. Bucher in the early eighties (Bucher, P. and Bryan, B. (1984) Nucleic Acids Res, 12(1 Pt 1), 287-305)
  • Purpose: to discover and characterize sequence motifs that occur at constrained distances from physiologically defined sites in nucleic acid sequences
  • Signal search analysis programs:
    • CPR generates a constraint profile for the neighborhood of a functional site
    • SList generates lists of over- and under-represented motifs in particular regions relative to a functional site
    • OProf generates a signal occurrence profile for a particular motif
    • PatOP optimizes a weight matrix description of a locally over-represented sequence motif
  • Recent events: adaptation of the software to a new environment, SSA web server, application to promoters and translational start sites
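To make the idea of a signal occurrence profile concrete: the sketch below counts, for an IUPAC consensus, how many sequences of a DNA sequence matrix have a match starting at each position relative to the functional site. It is a simplified stand-in written for illustration and does not reproduce OProf's actual parameters or output format; `occurrence_profile` and `is_match` are hypothetical names, and offsets simply count from the site column (site = 0).

```python
# IUPAC nucleotide codes mapped to the bases each code allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_match(word: str, consensus: str, max_mismatches: int) -> bool:
    """True if word matches the IUPAC consensus with at most max_mismatches mismatches."""
    mismatches = sum(1 for base, code in zip(word, consensus) if base not in IUPAC[code])
    return mismatches <= max_mismatches


def occurrence_profile(seqs: list[str], consensus: str, site_index: int,
                       max_mismatches: int = 0) -> dict[int, int]:
    """Count motif matches by start offset relative to the functional site.

    seqs: equal-length sequences with the functional site at column site_index.
    Returns {offset: number of matches starting at that offset}.
    """
    k = len(consensus)
    profile: dict[int, int] = {}
    for seq in seqs:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            if is_match(seq[start:start + k], consensus, max_mismatches):
                offset = start - site_index
                profile[offset] = profile.get(offset, 0) + 1
    return profile


print(occurrence_profile(["GCTATAAAGCGA", "ATTATAAACGCA"], "TATAWA", site_index=10))  # {-8: 2}
```

Plotting such counts against offset for a large promoter set is what reveals a peak like the TATA-box near -30.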

12
Locally Over-represented Sequence Motifs
13
Definition of a Locally Over-represented Sequence Motif
  • Concept

A motif that preferentially occurs at a characteristic distance (range) from a certain type of functional position. Example: the TATA-box is a locally over-represented sequence motif of the -30 region of eukaryotic Pol II transcription initiation sites.
  • Components of the formal motif description
  1. A weight matrix or consensus sequence defining the motif
  2. A cut-off value determining which subsequences constitute a motif match
  3. A preferred region of occurrence defined by 5' and 3' borders relative to a functional site, e.g. a transcription initiation site
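The three components can be combined into a single test for one sequence. In this minimal sketch, assuming an additive weight matrix stored as one base-to-weight mapping per motif column, a sequence contains the motif if some window starting inside the preferred region scores at or above the cut-off. `TOY_MATRIX` is an illustrative 3-column matrix favoring the word TAT, not a published TATA-box matrix; region borders are offsets of the motif start from the functional site.

```python
from typing import Optional

# Hypothetical toy weight matrix: one {base: weight} mapping per motif column (favors "TAT").
TOY_MATRIX = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def best_match_in_region(seq: str, site_index: int, matrix: list[dict[str, float]],
                         cutoff: float, region: tuple[int, int]) -> Optional[tuple[int, float]]:
    """Return (offset, score) of the best window whose start offset lies in region
    (inclusive 5' and 3' borders relative to the site) and scores >= cutoff, else None."""
    first, last = region
    width = len(matrix)
    best = None
    for offset in range(first, last + 1):
        start = site_index + offset
        if start < 0 or start + width > len(seq):
            continue  # window would run off the sequence
        window = seq[start:start + width].upper()
        score = sum(column.get(base, 0.0) for column, base in zip(matrix, window))
        if score >= cutoff and (best is None or score > best[1]):
            best = (offset, score)
    return best


seq = "GGTATGGCCTATC"  # toy sequence, functional site at index 12
print(best_match_in_region(seq, 12, TOY_MATRIX, 2.0, (-12, -5)))  # (-10, 3.0)
```

Narrowing the region to (-5, 0) on the same sequence finds the second TAT at offset -3 instead, which is exactly the effect the preferred-region component is meant to capture.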

14
Locally Over-represented Sequence Motifs
  • Input data structures
    • Primary experimental data (functional position set): annotated functional positions in DNA sequences stored in a database
    • Work data (DNA sequence matrix): a set of fixed-length sequence segments with an experimentally defined site at a fixed internal position
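Turning a functional position set into a DNA sequence matrix amounts to cutting a fixed window around every site and reverse-complementing sites on the minus strand, so that each site lands in the same column. A minimal sketch, assuming 0-based coordinates, an in-memory genome dictionary and (chromosome, position, strand) tuples; all names are hypothetical and this is not how EPD or SSA store their data.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def sequence_matrix(genome: dict[str, str],
                    sites: list[tuple[str, int, str]],
                    upstream: int, downstream: int) -> list[str]:
    """Cut a fixed window around each functional site.

    sites: (chromosome, 0-based site coordinate, strand '+' or '-').
    Every row has length upstream + 1 + downstream and the site sits at column
    `upstream`, read 5'->3' on the site's strand. Sites too close to a sequence
    end are skipped so all rows stay the same length.
    """
    rows = []
    for chrom, pos, strand in sites:
        seq = genome[chrom]
        if strand == "+":
            start, end = pos - upstream, pos + downstream + 1
        else:  # on the minus strand, "upstream" lies at higher coordinates
            start, end = pos - downstream, pos + upstream + 1
        if start < 0 or end > len(seq):
            continue
        segment = seq[start:end]
        rows.append(segment if strand == "+" else revcomp(segment))
    return rows


genome = {"chr1": "AAAACCCCGGGGTTTT"}
print(sequence_matrix(genome, [("chr1", 8, "+"), ("chr1", 7, "-"), ("chr1", 1, "+")], 2, 2))
```

The third site is dropped because its window would start before the beginning of the chromosome, so the printed matrix has two rows.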

15
The Motif Search Problem
  • Statement: for a given DNA sequence matrix, find a locally optimal combination of the following, using a given quality criterion:
  1. Quantitative motif description
  2. Cut-off value
  3. Region of preferential occurrence
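As a small illustration of one part of this search, the sketch below holds the motif and cut-off fixed and scans every contiguous region of an occurrence profile for the one whose match count most exceeds a uniform-background expectation. The binomial z-score used as the quality criterion is an assumption chosen for simplicity, not necessarily the criterion used by the SSA programs; `best_region` is a hypothetical name.

```python
import math
from typing import Optional


def best_region(hits: list[int], n_seqs: int) -> Optional[tuple[int, int, float]]:
    """Find the region [a, b] (column indices, inclusive) of an occurrence profile with the
    largest z-score of observed versus expected matches.

    hits:   number of sequences with a motif match starting at each column
    n_seqs: number of sequences in the DNA sequence matrix
    Assumed background: the same match probability p at every column.
    """
    length = len(hits)
    if length == 0 or n_seqs <= 0:
        return None
    p = sum(hits) / (n_seqs * length)  # average per-column match rate
    prefix = [0]
    for h in hits:
        prefix.append(prefix[-1] + h)  # prefix sums give region totals in O(1)
    best = None
    for a in range(length):
        for b in range(a, length):
            observed = prefix[b + 1] - prefix[a]
            expected = n_seqs * (b - a + 1) * p
            variance = expected * (1.0 - p)  # binomial variance of the count
            if variance <= 0:
                continue
            z = (observed - expected) / math.sqrt(variance)
            if best is None or z > best[2]:
                best = (a, b, z)
    return best


print(best_region([1, 1, 1, 10, 10, 1, 1, 1], n_seqs=20))  # region spans columns 3..4
```

The exhaustive scan over all O(L^2) regions is practical for profiles a few hundred columns long; optimizing the matrix and cut-off at the same time, as the problem statement requires, calls for an iterative procedure on top of this.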

16
TATA-box Signal Occurrence Profile for EPD and
ENSEMBL Drosophila Promoters
17
CCAAT-box Signal Occurrence Profile for
Vertebrate and ENSEMBL Drosophila Promoters
18
SSA webserver http://www.isrec.isb-sib.ch/ssa
  • Provides access to precompiled functional position sets
    • Collections of transcription initiation sites (promoters) from eukaryotic species
    • Collections of translation initiation sites from a large variety of prokaryotic genomes
  • Provides access to the four signal search analysis programs

19
Application to a bacterial translational control signal: the Shine-Dalgarno ribosome binding-site motif
  • Compare the strength and location of the Shine-Dalgarno mRNA-rRNA interaction motif in E. coli and B. subtilis in a qualitative manner.
  • Result: the Shine-Dalgarno interaction motif is stronger in B. subtilis than in E. coli and centered about two bases further upstream in the former species. More than a hundred bacterial genomes are now available to perform this type of analysis.
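One simple way to express motif location for such a comparison: align sequences on the start codon, find the best match to the core Shine-Dalgarno consensus AGGAGG in a short upstream window (allowing one mismatch), and average the offsets per species. This sketch uses mismatch counting instead of a weight-matrix description, the window length and mismatch limit are arbitrary assumptions, it says nothing about motif strength, and all function names are hypothetical.

```python
from statistics import mean
from typing import Optional

SD_CORE = "AGGAGG"  # core Shine-Dalgarno consensus used for illustration


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def sd_offset(seq: str, start_index: int, max_mismatches: int = 1,
              search_upstream: int = 20) -> Optional[int]:
    """Offset (start of best SD word minus start-codon index) within the upstream window,
    or None if no word has at most max_mismatches mismatches. Ties keep the most upstream hit."""
    k = len(SD_CORE)
    best = None  # (mismatch count, offset)
    for s in range(max(0, start_index - search_upstream), start_index - k + 1):
        m = mismatches(seq[s:s + k].upper(), SD_CORE)
        if m <= max_mismatches and (best is None or m < best[0]):
            best = (m, s - start_index)
    return None if best is None else best[1]


def mean_sd_offset(seqs: list[str], start_index: int) -> Optional[float]:
    """Average SD offset over all sequences that have an acceptable match."""
    offsets = [o for o in (sd_offset(s, start_index) for s in seqs) if o is not None]
    return mean(offsets) if offsets else None


genes = ["TTAGGAGGTTTTTTATGAAA", "TTTTAGGAGGTTTTATGAAA"]  # toy sequences, start codon at index 14
print(mean_sd_offset(genes, 14))  # -11
```

Comparing the mean offsets computed for two genomes would show a shift such as "about two bases further upstream" directly as a difference of means.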

20
Studying transcription regulatory processes with specialized bioinformatics resources: an example
  • Biological question
    • Do genes that are generally up-regulated in cancer cells have different types of promoters?
  • Procedure
    • Define cancer up- and down-regulated gene sets using CleanEx
    • Extract the corresponding promoter regions from EPD
    • Analyse the signal content of the two promoter sequence sets using SSA

21
Comparative analysis of cancer up- and
down-regulated promoters
  • Signals considered:

    Signal       Preferred position   Approx. frequency (%)
    Initiator    0                    25-50
    TATA-box     -30 to -25           30
    GC-box       -200 to 0            50
    CCAAT-box    -200 to -50          20

22
Positional distribution of Initiator motif in
cancer up- and down-regulated promoters
23
Positional distribution of TATA-boxes in cancer
up- and down-regulated promoters
24
Positional distribution of GC-boxes in cancer up-
and down-regulated promoters
25
Positional distribution of CCAAT-boxes in cancer
up- and down-regulated promoters
26
Comparative analysis of cancer up- and down-regulated promoters: summary of results
  • Signal content:

    Signal       Frequency in cancer-up genes   Frequency in cancer-down genes
    Initiator    no change                      no change
    TATA-box     up                             down
    GC-box       no change                      no change
    CCAAT-box    up                             down

  • Next questions:
    • Are TATA-box and CCAAT-box binding factors dysregulated in cancer cells?
    • Or do cancer-specific transcription factors (binding to adjacent sites) preferentially interact with TATA-box and CCAAT-box binding factors?
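The slides report only the direction of change (up or down). To judge whether a difference in the fraction of promoters carrying a signal between two sets exceeds what chance would produce, a standard two-proportion z-test is one option. The slides do not say which test, if any, was used, and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test for 'fraction of promoters with the signal' in sets A and B.

    Returns (z, p_value); z > 0 means the signal is more frequent in set A.
    """
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0:
        raise ValueError("signal present in all or none of the promoters; test undefined")
    z = (hits_a / n_a - hits_b / n_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p_value


# Hypothetical counts: TATA-box in 40 of 100 cancer-up vs 20 of 100 cancer-down promoters.
z, p = two_proportion_z(40, 100, 20, 100)
```

The normal approximation is reasonable when each set contains at least a handful of promoters with and without the signal; for very small sets Fisher's exact test is the safer choice.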

27
Concluding remarks
  • Signal search analysis has played an instrumental
    role in the characterization of eukaryotic
    promoter elements
  • The method was originally developed for the analysis of eukaryotic promoters but has a much broader application potential (e.g. Shine-Dalgarno signal analysis)
  • Rapidly growing collections of complete genomes and high-throughput methods for genomic analysis increase the statistical power to discover new motifs or to better characterize already known control signals
  • Aligning sequence sets with respect to a well-characterized motif might allow the detection of binding sites of cooperating transcription factors that are positionally correlated with the known motif
  • Confirm or challenge commonly accepted hypotheses originally derived from small sequence sets