10/24/05 Promoter Prediction RNA Structure - PowerPoint PPT Presentation

Slides: 47
Provided by: Dren153
1
10/24/05 Promoter Prediction & RNA Structure/Function Prediction
2
Announcements
Seminar (Mon Oct 24) (several additional seminars listed in email sent to class)
12:10 PM IG Faculty Seminar in 101 Ind Ed II: "Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis", Coralie Lashbrook, EEOB
http://www.bb.iastate.edu/%7Emarit/GEN691.html
Mark your calendars: 1:10 PM Nov 14, Baker Seminar in Howe Hall Auditorium: "Discovering transcription factor binding sites", Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine
3
Announcements
  • 544 Semester Projects
  • Thanks to all who sent already!
  • Others: Information needed today!
  • ddobbs@iastate.edu
  • Briefly describe:
  • Your background & current grad research
  • Is there a problem related to your research you
    would like to learn more about & develop as a
    project for this course?
  • or
  • What would your dream project be?

4
Announcements
Exam 2 - this Friday
Posted online: Exam 2 Study Guide, 544 Reading Assignment (2 papers)
Office Hours:
David: Mon 1-2 PM in 209 Atanasoff
Drena: Tues 10-11 AM in 106 MBB
Michael: none this week
Thurs: No Lab - Extra Office Hrs instead
David: 1-3 PM in 209 Atanasoff
Drena: 1-3 PM in 106 MBB
5
Announcements
  • Updated PPTs & PDFs for Gene Prediction
    lectures (covered on Exam 2) will be posted
    today (changes are minor)
  • Is everyone on BCB 444/544 mailing list?
    Auditors?

6
Promoter Prediction & RNA Structure/Function Prediction
  • Mon: Quite a few more words re:
  • Gene prediction
  • Promoter prediction
  • Wed: RNA structure & function
  • RNA structure prediction
  • 2° & 3° structure prediction
  • miRNA & target prediction
  • Thurs: No Lab
  • Fri: Exam 2

7
Reading Assignment - previous
  • Mount Bioinformatics
  • Chp 9: Gene Prediction & Regulation
  • pp 361-401
  • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Brown Genomes 2 (NCBI textbooks online)
  • Sect 9 Overview: Assembly of Transcription
    Initiation Complex
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002
  • Sect 9.1-9.3: DNA binding proteins, Transcription
    initiation
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016
  • NOTE: Don't worry about the details!!
  • See Study Guide for Exam 2 re: Sections covered

8
Optional - but very helpful reading
(that's a hint!)
  • Zhang MQ (2002) Computational prediction of
    eukaryotic protein-coding genes. Nat Rev Genet
    3:698-709
  • http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
  • Wasserman WW & Sandelin A (2004) Applied
    bioinformatics for identification of regulatory
    elements. Nat Rev Genet 5:276-287
  • http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

Check this out: http://www.phylofoot.org/NRG_testcases/
9
Reading Assignment (for Wed)
  • Mount Bioinformatics
  • Chp 8: Prediction of RNA Secondary Structure
  • pp. 327-355
  • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction
    Module
  • http://cnx.rice.edu/content/m11065/latest/

10
Review last lecture: Gene Prediction (formerly
Gene Prediction - 3)
  • Overview of steps & strategies
  • Algorithms
  • Gene prediction software

11
Predicting Genes - Basic steps
  • Obtain genomic DNA sequence
  • Translate in all 6 reading frames
  • Compare with protein sequence database
  • Also perform database similarity search
  • with EST & cDNA databases, if available
  • Use gene prediction programs to locate genes
  • Analyze gene regulatory sequences
  • Note: Several important details missing above
  • 1. Mask to "remove" repetitive elements (ALUs,
    etc.)?
  • 2. Perform database search on translated DNA
    (BlastX, TFasta)
  • 3. Use several programs to predict genes
    (GENSCAN, GeneMark.hmm)
  • 4. Translate putative ORFs and search for
    functional motifs (Blocks, Motifs, etc.) &
    regulatory sequences
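
The "translate in all 6 reading frames" step can be sketched in a few lines of Python. This is only an illustration, not the method used by any of the programs named on the slide; the function names are made up for this example, and it assumes the standard genetic code and an input containing only A, C, G, T (anything else is translated as "X").

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with non-ACGT letters
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings could then be compared against a protein database, which is roughly what a translated search such as BlastX does internally.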

12
Gene prediction flowchart
Fig 5.15 Baxevanis & Ouellette 2005
13
Overview of gene prediction strategies
  • What sequence signals can be used?
  • Transcription: TF binding sites, promoter,
    initiation site, terminator
  • Processing signals: splice donor/acceptors,
    polyA signal
  • Translation: start (AUG = Met) & stop (UGA, UAA,
    UAG)
  • ORFs, codon usage
  • What other types of information can be used?
  • cDNAs & ESTs (pairwise alignment)
  • homology (sequence comparison, BLAST)
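
As a minimal sketch of how the translation signals above (start codon, stop codons, ORFs) can be used, the following Python scans the three forward frames of a DNA sequence for ATG...stop open reading frames. The function name and the `min_codons` parameter are assumptions made for this illustration. It ignores introns and the reverse strand, so it is closer to what works for prokaryotic sequence than to a real eukaryotic gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def find_orfs(dna, min_codons=2):
    """Find ATG..stop ORFs in the three forward reading frames.

    Returns a list of (start, end, frame) tuples, where start is the
    0-based index of the A in ATG, end is the index just past the stop
    codon, and frame is 1, 2 or 3. min_codons counts the codons from the
    ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCCAAATGATT"))
```

In practice the ORF length threshold would be much larger, and codon usage statistics would be used to separate real coding ORFs from chance ones.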

14
Examples of gene prediction software
  • Similarity-based or Comparative
  • BLAST
  • SGP2 (extension of GeneID)
  • Ab initio = "from the beginning"
  • GeneID - (used in lab last week)
  • GENSCAN - (used in lab last week)
  • GeneMark.hmm - (should try this!)
  • Combined "evidence-based"
  • GeneSeqer (Brendel et al., ISU)
  • BEST? GENSCAN, GeneMark.hmm, GeneSeqer
  • but depends on organism & specific task

15
Annotated lists of gene prediction software
  • URLs from Mount Chp 9, available online
  • Table 9.1 http://www.bioinformaticsonline.org/links/ch_09_t_1.html
  • from Pevsner Chps 14 & 16
  • http://www.bioinfbook.org/chapt14.htm -
    prokaryotic
  • http://www.bioinfbook.org/chapt16.htm -
    eukaryotic
  • Table in Zhang Nat Rev Genet article
    http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
  • Another list: Kozar, Stanford
  • http://cmgm.stanford.edu/classes/genefind/
  • Performance Evaluation? Guigó, Barcelona (&
    sites above) http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

16
Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes
Methods? Previously, mostly HMM-based
Now: similarity-based methods, because so many genomes are available
see Mount Fig 9.7 (E. coli gene)
Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available, e.g.,
GeneMark.hmm
TIGR Comprehensive Microbial Resource (CMR)
NCBI Microbial Genomes
17
UCSC Browser view of 1000 kb region (Human URO-D
gene)
Fig 5.10 Baxevanis & Ouellette 2005
18
Spliced Alignment Algorithm
GeneSeqer - Brendel et al.
http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
Brendel et al (2004) Bioinformatics 20: 1157
  • Perform pairwise alignment with large gaps in one
    sequence (due to introns)
  • Align genomic DNA with cDNA, ESTs, protein
    sequences
  • Score semi-conserved sequences at splice
    junctions
  • Using a Bayesian model
  • Score coding constraints in translated exons
  • Using a Bayesian model

Brendel 2005
19
Brendel - Spliced Alignment I: Compare with cDNA
or EST probes
Brendel 2005
20
Brendel - Spliced Alignment II: Compare with
protein probes
Brendel 2005
21
Splice Site Detection
Do DNA sequences surrounding splice "consensus"
sequences contribute to splicing signal?
YES
i = ith position in sequence
Ī = avg information content over all positions > 20 nt from splice site
σ_Ī = avg sample standard deviation of Ī
Brendel 2005
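
To make "information content at position i" concrete, here is a small Python sketch that computes a per-column information profile for a set of aligned sites. It assumes the common Shannon-style definition, I_i = 2 - H_i bits for a 4-letter DNA alphabet, with no small-sample correction; this may differ in detail from the exact formula used on the slide. The function name and the example sequences are invented for illustration.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return per-column information content (bits) for aligned DNA sites.

    Uses I_i = 2 - H_i, where H_i is the Shannon entropy of the base
    frequencies in column i. All sequences must have the same length.
    Any non-ACGT symbol is simply counted as its own letter.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    # Hypothetical donor-site windows, each starting at the GT dinucleotide.
    sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTCAGT"]
    for pos, bits in enumerate(information_content(sites)):
        print(pos, round(bits, 3))
```

Columns that are perfectly conserved (such as the G and T of the donor dinucleotide) score 2 bits, while columns with no preference among the four bases score 0. Positions near a splice site whose information content stands clearly above the background average are the ones that contribute to the splicing signal.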
22
Information content vs position
Which sequences are exons & which are
introns? How can you tell?
Brendel et al (2004) Bioinformatics 20: 1157
Brendel 2005
23
Bayesian Splice Site Prediction
where H indexes the hypotheses of GT or AG at:
  • True site in reading phase 1, 2, or 0
  • False within-exon site in reading phase 1, 2, or 0
  • False within-intron site

Brendel et al (2004) Bioinformatics 20: 1157
Brendel 2005
24
Bayes Factor as Decision Criterion
7-class model
Brendel et al (2004) Bioinformatics 20: 1157
Brendel 2005
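
As a generic reminder of the quantities involved (not necessarily the exact formulation in Brendel et al.), Bayes' rule over seven mutually exclusive hypotheses, and the usual definition of a Bayes factor as posterior odds divided by prior odds, can be written as follows. Here S stands for the observed sequence window around the candidate site and H_k for one of the seven hypotheses; both symbols are assumptions introduced for this sketch.

```latex
P(H_k \mid S) = \frac{P(S \mid H_k)\, P(H_k)}{\sum_{j=1}^{7} P(S \mid H_j)\, P(H_j)}
\qquad
BF_k = \frac{P(H_k \mid S) \,/\, \bigl(1 - P(H_k \mid S)\bigr)}{P(H_k) \,/\, \bigl(1 - P(H_k)\bigr)}
```

A Bayes factor well above 1 means the sequence data raised the odds of hypothesis H_k relative to its prior odds, which is what makes it usable as a decision criterion.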
25
Markov Model for Spliced Alignment
Brendel 2005
26
Evaluation of Splice Site Prediction
Brendel 2005
27
Performance?
[Figure: four panels (Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site), each with an Sn axis]
  • Note: these are not ROC curves (plots of (1-Sn)
    vs Sp)
  • But plots such as these (& ROCs) are much better
    than using a "single number" to compare
    different methods
  • Both types of plots illustrate the trade-off: Sn vs
    Sp

Brendel 2005
28
Evaluation of Splice Site Prediction
What do measures really mean?
Fig 5.11 Baxevanis & Ouellette 2005
29
Careful: different definitions for "Specificity"
Brendel definitions
  • Specificity

cf. Guigó definitions
Sn = Sensitivity = TP/(TP+FN)
Sp = Specificity = TN/(TN+FP)
AC = Approximate Coefficient = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1
Other measures? Predictive Values, Correlation Coefficient
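
The measures listed on this slide are simple functions of the four confusion-matrix counts (TP, FP, TN, FN). A minimal Python sketch follows, using the formulas as given on the slide; the function names and the example counts are made up for illustration, and no guard is included for empty denominators.

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of true sites that were predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = TN / (TN + FP), as written on the slide.

    Other definitions exist, so check which one a paper is using.
    """
    return tn / (tn + fp)


def approximate_coefficient(tp, fp, tn, fn):
    """AC = 0.5 * (sum of four conditional proportions) - 1, range -1..1."""
    total = (
        tp / (tp + fn)
        + tp / (tp + fp)
        + tn / (tn + fp)
        + tn / (tn + fn)
    )
    return 0.5 * total - 1


if __name__ == "__main__":
    # Hypothetical counts for one splice-site predictor on a test set.
    tp, fp, tn, fn = 8, 5, 85, 2
    print("Sn =", round(sensitivity(tp, fn), 3))
    print("Sp =", round(specificity(tn, fp), 3))
    print("AC =", round(approximate_coefficient(tp, fp, tn, fn), 3))
```

Because true negatives vastly outnumber true positives in splice-site prediction, TN/(TN+FP) can look excellent even when most predictions are wrong, which is one reason to check which definition of "specificity" is in use.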
30
Best measures for comparing different methods?
  • ROC curves (Receiver Operating
    Characteristic?!!)
  • http://www.anaesthetist.com/mnm/stats/roc/
  • "The Magnificent ROC" - has fun applets &
    quotes
  • "There is no statistical test, however intuitive
    and simple, which will not be abused by medical
    researchers"
  • Correlation Coefficient
  • (Matthews correlation coefficient (MCC))
  • MCC = 1 for a perfect prediction
  • 0 for a completely random assignment
  • -1 for a "perfectly incorrect" prediction

Do not memorize this!
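
For reference only, the Matthews correlation coefficient can be computed directly from the confusion-matrix counts with the standard formula MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below uses a made-up function name and returns 0 when the denominator is zero, which is a common convention rather than part of the definition.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(matthews_cc(tp=10, fp=0, tn=90, fn=0))    # perfect prediction
    print(matthews_cc(tp=25, fp=25, tn=25, fn=25))  # no better than chance
    print(matthews_cc(tp=0, fp=10, tn=0, fn=90))    # every call is wrong
```

The three example calls correspond to the three reference values listed on the slide.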
31
Performance of GeneSeqer vs other methods?
  • Comparison with ab initio gene prediction
  • (e.g., GENSCAN)
  • Depends on
  • Availability of ESTs
  • Availability of protein homologs

Other Performance Evaluations?
Guigó http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html
Brendel 2005
32
GeneSeqer vs GENSCAN (Exon prediction)
Target protein alignment score
GENSCAN - Burge, MIT
Brendel 2005
33
GeneSeqer vs GENSCAN (Intron prediction)
GENSCAN - Burge, MIT
Brendel 2005
34
Other Resources
  • Current Protocols in Bioinformatics
  • http://www.4ulr.com/products/currentprotocols/bioinformatics.html
  • Finding Genes
  • 4.1 An Overview of Gene Identification
    Approaches, Strategies, and Considerations
  • 4.2 Using MZEF To Find Internal Coding Exons
  • 4.3 Using GENEID to Identify Genes
  • 4.4 Using GlimmerM to Find Genes in Eukaryotic
    Genomes
  • 4.5 Prokaryotic Gene Prediction Using GeneMark
    and GeneMark.hmm
  • 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
  • 4.7 Application of FirstEF to Find Promoters and
    First Exons in the Human Genome
  • 4.8 Using TWINSCAN to Predict Gene Structures in
    Genomic DNA Sequences
  • 4.9 GrailEXP and Genome Analysis Pipeline for
    Genome Annotation
  • 4.10 Using RepeatMasker to Identify Repetitive
    Elements in Genomic Sequences

35
New Today: Promoter Prediction
  • A few more words about Gene prediction
  • Predicting regulatory regions (focus on
    promoters)
  • Brief review: promoters & enhancers
  • Predicting in eukaryotes vs prokaryotes
  • Introduction to RNA
  • Structure & function

36
Predicting Promoters
What signals are there?
Algorithms
Promoter prediction software
37
What signals are there? Simple ones in
prokaryotes
Brown Fig 9.17
BIOS Scientific Publishers Ltd, 1999
38
Prokaryotic promoters
  • RNA polymerase complex recognizes promoter
    sequences located very close to & on 5' side
    (upstream) of initiation site
  • RNA polymerase complex binds directly to these,
    with no requirement for transcription factors
  • Prokaryotic promoter sequences are highly
    conserved
  • -10 region
  • -35 region
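
Because the -35 and -10 regions are well conserved, a very simple scan already finds candidate prokaryotic promoters. The Python sketch below is a toy illustration, not a published promoter-prediction algorithm. It assumes the textbook E. coli sigma-70 consensus hexamers TTGACA (-35) and TATAAT (-10) separated by 16 to 18 bp, none of which are stated on the slide; the function names and the mismatch threshold are also invented for this example.

```python
MINUS_35 = "TTGACA"  # assumed E. coli sigma-70 consensus for the -35 region
MINUS_10 = "TATAAT"  # assumed E. coli sigma-70 consensus for the -10 region


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(dna, max_mismatches=1, spacers=range(16, 19)):
    """Scan the forward strand for -35/-10 hexamer pairs.

    Returns (pos35, pos10, spacer, total_mismatches) tuples, where the
    positions are 0-based starts of each hexamer. Each hexamer may differ
    from its consensus at no more than max_mismatches positions.
    """
    dna = dna.upper()
    width = len(MINUS_35)  # both hexamers are 6 bp long
    hits = []
    for i in range(len(dna) - width + 1):
        mm35 = mismatches(dna[i:i + width], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + width + spacer
            box10 = dna[j:j + width]
            if len(box10) < width:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS_10)
            if mm10 <= max_mismatches:
                hits.append((i, j, spacer, mm35 + mm10))
    return hits


if __name__ == "__main__":
    sequence = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGCC"
    print(find_promoter_candidates(sequence))
```

Real promoters often deviate from the consensus at several positions, so practical tools score sites with weight matrices rather than a fixed mismatch cutoff.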

39
What signals are there? Complex ones in
eukaryotes!
Fig 9.13 Mount 2004
40
Simpler view of complex promoters in eukaryotes
Fig 5.12 Baxevanis & Ouellette 2005
41
Eukaryotic genes are transcribed by 3 different
RNA polymerases
Recognize different types of promoters &
enhancers
Brown Fig 9.18
BIOS Scientific Publishers Ltd, 1999
42
Eukaryotic promoters & enhancers
  • Promoters located relatively close to
    initiation site
  • (but can be located within gene,
    rather than upstream!)
  • Enhancers also required for regulated
    transcription
  • (these control expression in specific cell
    types, developmental stages, in response to
    environment)
  • RNA polymerase complexes do not specifically
    recognize promoter sequences directly
  • Transcription factors bind first and serve as
    landmarks for recognition by RNA polymerase
    complexes

43
Eukaryotic transcription factors
  • Transcription factors (TFs) are DNA binding
    proteins that also interact with RNA polymerase
    complex to activate or repress transcription
  • TFs contain characteristic DNA binding motifs
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039
  • TFs recognize specific short DNA sequence motifs
    = transcription factor binding sites
  • Several databases for these, e.g. TRANSFAC
  • http://www.generegulation.com/cgibin/pub/databases/transfac
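
Short binding-site motifs like these are usually searched for with a position weight matrix (PWM) built from known sites. The sketch below is a minimal, generic illustration in Python; it does not use TRANSFAC data or its file formats. The example sites, the pseudocount, the uniform 0.25 background and the score threshold are all assumptions chosen for the demonstration.

```python
import math


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (list of per-column dicts) from aligned sites."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        column = [site[i].upper() for site in sites]
        scores = {}
        for base in "ACGT":
            # Pseudocounts keep unseen bases from getting a score of -infinity.
            freq = (column.count(base) + pseudocount) / (
                len(sites) + 4 * pseudocount
            )
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def scan(dna, matrix, threshold):
    """Return (position, window, score) for windows scoring >= threshold."""
    dna = dna.upper()
    width = len(matrix)
    hits = []
    for i in range(len(dna) - width + 1):
        window = dna[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical TATA-like sites, for illustration only.
    known_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
    pwm = build_log_odds(known_sites)
    print(scan("GGCGTATAAAGGC", pwm, threshold=5.0))
```

Because binding sites are short, a scan like this produces many false positives across a whole genome, which is why promoter prediction tools combine several signals instead of relying on a single motif.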

44
Zinc finger-containing transcription factors
  • Common in eukaryotic proteins
  • Estimated 1% of mammalian genes encode
    zinc-finger proteins
  • In C. elegans, there are 500!
  • Can be used as highly specific DNA binding
    modules
  • Potentially valuable tools for directed genome
    modification (esp. in plants) & human gene
    therapy

Brown Fig 9.12
BIOS Scientific Publishers Ltd, 1999
45
Global alignment of human & mouse obese gene
promoters (200 bp upstream from TSS)
Fig 5.14 Baxevanis & Ouellette 2005
46
Reading Assignment (for Wed)
  • Mount Bioinformatics
  • Chp 8: Prediction of RNA Secondary Structure
  • pp. 327-355
  • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction
    Module
  • http://cnx.rice.edu/content/m11065/latest/