10/26/05 Promoter Prediction (really!) - PowerPoint PPT Presentation

Provided by: Drena8
1
10/26/05 Promoter Prediction (really!)
2
Announcements
  • BCB Link for Seminar Schedules (updated)
  • http://www.bcb.iastate.edu/seminars/index.html
  • Seminar (Fri Oct 28)
  • 12:10 PM BCB Faculty Seminar in E164 Lagomarcino
  • Assembly and Alignment of Genomic DNA Sequence
    Xiaoqiu Huang, ComS
  • http://www.bcb.iastate.edu/courses/BCB691-F2005.html#Oct%2028
  • Mark your calendars
  • 1:10 PM Nov 14 Baker Seminar in Howe Hall
    Auditorium
  • "Discovering transcription factor binding sites"
  • Douglas Brutlag, Dept of Biochemistry & Medicine
  • Stanford University School of Medicine

3
Announcements
BCB 544 Projects - Important Dates
  • Nov 2 Wed noon - Project proposals due to David/Drena
  • Nov 4 Fri 10A - Approvals/responses to students
  • Dec 2 Fri noon - Written project reports due
  • Dec 5,7,8,9 class/lab - Oral Presentations (20')
  • (Dec 15 Thurs - Final Exam)
4
Announcements
  • Lab 9 - due Wed noon (today)
  • Exam 2 - this Friday
  • Posted Online: Exam 2 Study Guide, 544 Reading
    Assignment (2 papers), Lab Keys (today)
  • Thurs: No Lab - Extra Office Hrs instead
  • David: 1-3 PM in 209 Atanasoff
  • Drena: 1-3 PM in 106 MBB
5
Promoter Prediction & RNA Structure/Function
Prediction
  • Mon: Quite a few more words re
    gene prediction
  • Wed: Promoter prediction
  • Next Mon: RNA structure & function
  • RNA structure prediction
  • 2° & 3° structure prediction
  • miRNA & target prediction

6
Optional - but very helpful reading
(that's a hint!)
  • Zhang MQ (2002) Computational prediction of
    eukaryotic protein-coding genes. Nat Rev Genet
    3:698-709
  • http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
  • Wasserman WW & Sandelin A (2004) Applied
    bioinformatics for identification of regulatory
    elements. Nat Rev Genet 5:276-287
  • http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

Check this out: http://www.phylofoot.org/NRG_testcases/
7
Reading Assignment (for Mon)
  • Mount Bioinformatics
  • Chp 8 Prediction of RNA Secondary Structure
  • pp. 327-355
  • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction
    Module
  • http://cnx.rice.edu/content/m11065/latest/

8
Review last lecture
  • Flowchart for Gene Prediction
  • Performance Assessment Measures
  • Correction re: slide 27 from 10/24
Promoters
9
Gene prediction flowchart
Fig 5.15 Baxevanis & Ouellette 2005
10
Evaluation of Splice Site Prediction
What do measures really mean?
Fig 5.11 Baxevanis & Ouellette 2005
11
Correction re last lecture: GeneSeqer
Performance Graphs
Brendel et al (2004) Bioinformatics 20:1157
12
Performance?
[Figure: four performance plots, one each for the Human GT site, Human AG site, A. thaliana GT site, and A. thaliana AG site; Sn is one axis of each panel]
  • Note: these are not ROC curves (plots of Sn
    vs (1-Sp), with Sp defined as TN/(TN+FP))
  • But plots such as these (& ROCs) are much better
    than using a "single number" to compare
    different methods
  • Both types of plots illustrate the trade-off: Sn vs
    Sp

Brendel 2005
13
Fig 2 - Brendel et al (2004) Bioinformatics 20:1157
14
Bayes Factor as Decision Criterion
H0 HT
Brendel 2005
15
Evaluation of Splice Site Prediction
Brendel 2005
16
Careful: different definitions for "Specificity"
Brendel definitions
  • Specificity

cf. Guigó definitions:
Sn (Sensitivity) = TP/(TP+FN)
Sp (Specificity) = TN/(TN+FP)
AC (Approximate Correlation) = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1
Other measures? Predictive Values, Correlation
Coefficient
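A minimal Python sketch of the Guigó-style measures on this slide may help make the arithmetic concrete. The function names and the example counts are illustrative assumptions, not part of the lecture, and the code assumes every denominator is non-zero.

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of true sites that were predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = TN / (TN + FP), the definition listed on this slide."""
    return tn / (tn + fp)


def approximate_correlation(tp, tn, fp, fn):
    """AC = 0.5 * (sum of four conditional probabilities) - 1."""
    total = (tp / (tp + fn) + tp / (tp + fp)
             + tn / (tn + fp) + tn / (tn + fn))
    return 0.5 * total - 1


# Hypothetical counts for one splice-site predictor (made-up numbers).
tp, tn, fp, fn = 80, 890, 10, 20
print(sensitivity(tp, fn), specificity(tn, fp),
      approximate_correlation(tp, tn, fp, fn))
```

Because AC averages four ratios, it stays defined in some cases where a single ratio would be misleading, but it still breaks down if any of the four denominators is zero.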
17
Best measures for comparing different methods?
  • ROC curves (Receiver Operating
    Characteristic?!!)
  • http://www.anaesthetist.com/mnm/stats/roc/
  • "The Magnificent ROC" - has fun applets &
    quotes
  • "There is no statistical test, however intuitive
    and simple, which will not be abused by medical
    researchers"
  • Correlation Coefficient
  • (Matthews correlation coefficient, MCC)
  • MCC = 1 for a perfect prediction
  • MCC = 0 for a completely random assignment
  • MCC = -1 for a "perfectly incorrect" prediction

Do not memorize this!
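For reference, the three MCC landmarks listed above (1, 0, -1) can be checked with a short Python sketch. It uses the standard MCC formula; the function name and the choice to return 0.0 when the denominator vanishes are assumptions made for this example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(10, 90, 0, 0))     # perfect prediction
print(mcc(25, 25, 25, 25))   # no better than chance
print(mcc(0, 0, 10, 90))     # every call is wrong
```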
18
Promoters: What signals are there? Simple
ones in prokaryotes
Brown Fig 9.17
BIOS Scientific Publishers Ltd, 1999
19
Prokaryotic promoters
  • RNA polymerase complex recognizes promoter
    sequences located very close to, and on the 5' side
    (upstream) of, the initiation site
  • RNA polymerase complex binds directly to these,
    with no requirement for transcription factors
  • Prokaryotic promoter sequences are highly
    conserved
  • -10 region
  • -35 region
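As a rough illustration of how the -35 and -10 regions can be searched for, here is a minimal Python sketch. The consensus hexamers (TTGACA and TATAAT, the textbook E. coli sigma-70 consensus), the 15-19 bp spacer range, and the mismatch limit are assumptions added for this example; they are not given on the slide.

```python
# Assumed consensus hexamers for E. coli sigma-70 promoters (not from the slides).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_like_sites(seq, max_mismatch=2, spacers=range(15, 20)):
    """Return (pos35, pos10, spacer, total_mismatches) for each candidate pair."""
    seq = seq.upper()
    n = len(MINUS_35)
    hits = []
    for i in range(len(seq) - n + 1):
        m35 = mismatches(seq[i:i + n], MINUS_35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + n + gap
            box = seq[j:j + len(MINUS_10)]
            if len(box) < len(MINUS_10):
                break  # ran off the end of the sequence
            m10 = mismatches(box, MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, gap, m35 + m10))
    return hits


# Toy sequence with a perfect -35 box, a 17-bp spacer, and a perfect -10 box.
toy = "GGG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"
print(find_promoter_like_sites(toy))
```

Real promoters rarely match both hexamers perfectly, so a tolerance such as `max_mismatch` is needed, and loosening it quickly increases the number of false positives.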

20
What signals are there? Complex ones in
eukaryotes!
Fig 9.13 Mount 2004
21
Simpler view of complex promoters in eukaryotes
Fig 5.12 Baxevanis & Ouellette 2005
22
Eukaryotic genes are transcribed by 3 different
RNA polymerases
Recognize different types of promoters &
enhancers
Brown Fig 9.18
BIOS Scientific Publishers Ltd, 1999
23
Eukaryotic promoters & enhancers
  • Promoters located relatively close to
    initiation site
  • (but can be located within gene,
    rather than upstream!)
  • Enhancers also required for regulated
    transcription
  • (these control expression in specific cell
    types, developmental stages, in response to
    environment)
  • RNA polymerase complexes do not specifically
    recognize promoter sequences directly
  • Transcription factors bind first and serve as
    landmarks for recognition by RNA polymerase
    complexes

24
Eukaryotic transcription factors
  • Transcription factors (TFs) are DNA binding
    proteins that also interact with RNA polymerase
    complex to activate or repress transcription
  • TFs contain characteristic DNA binding motifs
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039
  • TFs recognize specific short DNA sequence motifs:
    "transcription factor binding sites"
  • Several databases for these, e.g. TRANSFAC
  • http://www.gene-regulation.com/cgi-bin/pub/databases/transfac

25
Zinc finger-containing transcription factors
  • Common in eukaryotic proteins
  • Estimated 1% of mammalian genes encode
    zinc-finger proteins
  • In C. elegans, there are 500!
  • Can be used as highly specific DNA binding
    modules
  • Potentially valuable tools for directed genome
    modification (esp. in plants) & human gene
    therapy

Brown Fig 9.12
BIOS Scientific Publishers Ltd, 1999
26
New Today: Promoter Prediction
  • Predicting regulatory regions (focus on
    promoters)
  • Brief review: promoters & enhancers
  • Predicting promoters: eukaryotes vs prokaryotes
  • Next week
  • RNA structure & function

27
Predicting Promoters
  • Overview of strategies
  • What sequence signals can be used?
  • What other types of information can be used?
  • Algorithms
  • Promoter prediction software
  • 3 major types
  • many, many programs!

28
Promoter prediction: Eukaryotes vs prokaryotes
Promoter prediction is easier in microbial genomes
Why?
  • Highly conserved
  • Simpler gene structures
  • More sequenced genomes! (for comparative approaches)
Methods?
  • Previously, again mostly HMM-based
  • Now: similarity-based, comparative methods, because
    so many genomes available
29
Predicting promoters: Steps & Strategies
  • Closely related to gene prediction!
  • Obtain genomic sequence
  • Use sequence-similarity based comparison
  • (BLAST, MSA) to find related genes
  • But "regulatory" regions are much less
    well-conserved than coding regions
  • Locate ORFs
  • Identify TSS (if possible!)
  • Use promoter prediction programs
  • Analyze motifs, etc. in sequence (TRANSFAC)

30
Predicting promoters: Steps & Strategies
  • Identify TSS - if possible?
  • One of biggest problems is determining exact
    TSS!
  • Not very many full-length cDNAs!
  • Good starting point? (human & vertebrate genes)
  • Use FirstEF
  • found within UCSC Genome Browser
  • or submit to FirstEF web server

Fig 5.10 Baxevanis & Ouellette 2005
31
Automated promoter prediction strategies
  • Pattern-driven algorithms
  • Sequence-driven algorithms
  • Combined "evidence-based"
  • BEST RESULTS? Combined, sequential

32
Promoter Prediction: Pattern-driven algorithms
  • Success depends on availability of collections of
    annotated binding sites (TRANSFAC & PROMO)
  • Tend to produce huge numbers of FPs
  • Why?
  • Binding sites (BS) for specific TFs are often
    variable
  • Binding sites are short (typically 5-15 bp)
  • Interactions between TFs (& other proteins)
    influence affinity & specificity of TF binding
  • One binding site is often recognized by multiple TFs
  • Biology is complex: promoters are often specific to
    organism/cell/stage/environmental condition
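A back-of-the-envelope calculation shows why short binding sites alone produce so many false positives. The Python sketch below assumes exact matching, equal base frequencies (0.25 each), and independent positions, which are simplifications added for this example; the function name is hypothetical.

```python
def expected_random_hits(motif_len, genome_len, both_strands=True):
    """Expected number of exact matches to one fixed motif in random DNA."""
    positions = max(genome_len - motif_len + 1, 0)
    strands = 2 if both_strands else 1
    # Each position matches with probability (1/4) ** motif_len.
    return strands * positions * 0.25 ** motif_len


# A 6-bp site versus a 12-bp site in a 3-billion-bp genome.
for k in (6, 12):
    print(k, expected_random_hits(k, 3_000_000_000))
```

Under these assumptions a 6-bp site is expected to occur by chance roughly 1.5 million times in a genome of that size, so nearly every match is a false positive unless other evidence is added.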

33
Promoter Prediction: Pattern-driven algorithms
  • Solutions to problem of too many FP predictions?
  • Take sequence context/biology into account
  • Eukaryotes: clusters of TFBSs are common
  • Prokaryotes: knowledge of σ factors helps
  • Probability of "real" binding site increases if
    annotated transcription start site (TSS) nearby
  • But: What about enhancers? (no TSS nearby!)
  • Only a small fraction of TSSs have been
    experimentally mapped
  • Do the wet lab experiments!
  • But: Promoter-bashing is tedious
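One simple way to use the observation that eukaryotic TFBSs tend to cluster is to keep only predicted sites that have neighbours nearby. The sketch below is an illustration under assumed parameters (a 200-bp window and at least 3 hits); the function name and thresholds are hypothetical, not taken from any program named in the lecture.

```python
def hits_in_clusters(positions, window=200, min_hits=3):
    """Keep hit positions that fall in some span of <= window bp with >= min_hits hits."""
    pos = sorted(positions)
    keep = set()
    start = 0
    for end in range(len(pos)):
        # Shrink the window from the left until it spans at most `window` bp.
        while pos[end] - pos[start] > window:
            start += 1
        if end - start + 1 >= min_hits:
            keep.update(pos[start:end + 1])
    return sorted(keep)


# Hypothetical predicted-site coordinates along one sequence.
predicted = [100, 150, 220, 5000, 9000, 9050]
print(hits_in_clusters(predicted))
```

Isolated hits are discarded, which lowers the false-positive count at the cost of missing genuine sites that act alone.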

34
Promoter Prediction: Sequence-driven algorithms
  • Assumption: common functionality can be deduced
    from sequence conservation
  • Alignments of co-regulated genes should highlight
    elements involved in regulation
  • Careful: how do we determine co-regulation?
  • Orthologous genes from different species
  • Genes experimentally determined to be
    co-regulated (using microarrays??)
  • Comparative promoter prediction
  • "Phylogenetic footprinting" - more later.
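The idea behind phylogenetic footprinting can be sketched as a sliding-window identity scan over a pairwise alignment of orthologous upstream regions. This Python example assumes the two sequences are already aligned (equal length, '-' for gaps); the function names, window size, and identity threshold are illustrative assumptions.

```python
def window_identity(aln_a, aln_b, window=10):
    """Fraction of identical, non-gap columns in each alignment window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    return [sum(match[i:i + window]) / window
            for i in range(len(match) - window + 1)]


def conserved_windows(aln_a, aln_b, window=10, min_identity=0.9):
    """Start columns of windows whose identity meets the threshold."""
    scores = window_identity(aln_a, aln_b, window)
    return [i for i, s in enumerate(scores) if s >= min_identity]


# Toy alignment: a diverged stretch followed by a perfectly conserved block.
seq_a = "ACGTACGTAC" + "TATAAAAGGC"
seq_b = "TGCATGCATG" + "TATAAAAGGC"
print(conserved_windows(seq_a, seq_b, window=10, min_identity=1.0))
```

Windows that stand out against the background conservation are candidate regulatory elements; if the whole region is highly conserved, every window passes and the scan is uninformative.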

35
Promoter Prediction: Sequence-driven algorithms
  • Problems
  • Need sets of co-regulated genes
  • For comparative (phylogenetic) methods
  • Must choose appropriate species
  • Different genomes evolve at different rates
  • Classical alignment methods have trouble with
    translocations, inversions in order of
    functional elements
  • If background conservation of the entire region is
    high, comparison is useless
  • Not enough data (Prokaryotes >>> Eukaryotes)
  • Biology is complex: many (most?) regulatory
    elements are not conserved across species!

36
Examples of promoter prediction/characterization
software
Lab used: MATCH, MatInspector, TRANSFAC, MEME,
MAST, BLAST, etc.
Others? FirstEF, Dragon Promoter Finder (these are
links in PPTs); also see Dragon Genome Explorer
(has specialized promoter software for GC-rich
DNA, finding CpG islands, etc); JASPAR
37
TRANSFAC matrix entry for TATA box
  • Fields
  • Accession ID
  • Brief description
  • TFs associated with this entry
  • Weight matrix
  • Number of sites used to build (How many here?)
  • Other info

Fig 5.13 Baxevanis & Ouellette 2005
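To show how a weight matrix like the one in such an entry is used, here is a minimal Python sketch that converts per-position base counts into log-odds scores and scans a sequence. The counts are hypothetical (not the real TRANSFAC TATA-box values), and the pseudocount, uniform background, and threshold are assumptions for illustration.

```python
import math

# Hypothetical base counts for a 4-column TATA-like motif built from 20 sites.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 18, "C": 0, "G": 0, "T": 2},
    {"A": 1, "C": 0, "G": 0, "T": 19},
    {"A": 19, "C": 0, "G": 0, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert count columns to log2(frequency / background) scores."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
print(scan("GGGTATAGGG", pwm, threshold=4.0))
```

The number of sites used to build the matrix matters: with few sites, the pseudocount dominates the column frequencies and the scores become less discriminating.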
38
Global alignment of human & mouse obese gene
promoters (200 bp upstream from TSS)
Fig 5.14 Baxevanis & Ouellette 2005
39
Check out optional review & try associated
tutorial
  • Wasserman WW & Sandelin A (2004) Applied
    bioinformatics for identification of regulatory
    elements. Nat Rev Genet 5:276-287
  • http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

Check this out: http://www.phylofoot.org/NRG_testcases/
40
Annotated lists of promoter databases & promoter
prediction software
  • URLs from Mount Chp 9, available online
  • Table 9.12: http://www.bioinformaticsonline.org/links/ch_09_t_2.html
  • Table in Wasserman & Sandelin Nat Rev Genet
    article: http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.htm
  • URLs for Baxevanis & Ouellette, Chp 5
  • http://www.wiley.com/legacy/products/subject/life/bioinformatics/ch05.htm#links
  • More lists
  • http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=promoter
  • http://bioinformatics.ubc.ca/resources/links_directory/?subcategory_id=104
  • http://www3.oup.co.uk/nar/database/subcat/1/4/

41
Reading Assignment (for Mon)
  • Mount Bioinformatics
  • Chp 8 Prediction of RNA Secondary Structure
  • pp. 327-355
  • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction
    Module
  • http://cnx.rice.edu/content/m11065/latest/