Module 5: Gene prediction - PowerPoint PPT Presentation

Slides: 24
Provided by: gareth76

Transcript and Presenter's Notes

1
Module 5: Gene prediction
sequence
2
What is a gene?
  • Prokaryotic genes
  • Eukaryotic genes

3
Prokaryotic gene
  • Small genomes, high gene density
  • Haemophilus influenzae genome is 85% genic
  • Operons
  • One transcript, many genes
  • No introns.
  • One gene, one protein
  • Open reading frames
  • One ORF per gene
  • ORFs begin with a start codon and end with a stop codon
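
The ORF definition above (start codon, then in-frame codons up to a stop codon) can be sketched in a few lines of Python. This is a minimal illustration, not the method of any tool named in these slides. It assumes ATG as the only start codon, scans only the three forward frames, and the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs for ATG...stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. Assumption: ATG is the
    only start codon; the reverse strand is not scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                # keep the ORF only if it has enough codons before the stop
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGCC"))
```

A real prokaryotic gene finder would also scan the reverse complement and allow alternative start codons; both are left out here to keep the sketch short.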

4
Eukaryotic Gene
  • Much lower gene density.
  • Undergo several post-transcriptional
    modifications.
  • 5' cap
  • Poly(A) tail
  • Splicing

5
Sequencing genomes
The Hybrid Approach
6
Supporting evidence
mRNA
7
Collating the evidence
DNA databases (EMBL/GenBank/DDBJ)
  • mRNA (cDNA)
  • Genomic (finished, draft)
  • dbEST (ESTs)
Protein databases (SWALL)
  • TrEMBL (automatic translation of CDS from DNA
    databases)
  • Swiss-Prot (curated data)
8
Genome Browsers
  • Ensembl (www.ensembl.org): EBI and Sanger
    collaboration. Gene build, predicts novel genes
  • UCSC (genome.ucsc.edu): University of California,
    Santa Cruz. Annotates other gene builds
  • NCBI (www.ncbi.nlm.nih.gov/mapview/): NCBI Map
    Viewer. Gene build, predicts novel genes
9
Genes
  • Known genes, as catalogued by the Reference
    Sequence (RefSeq) project
  • Ensembl known genes (red genes)
  • NCBI known genes
  • Novel genes (1): based on similarity to known
    genes or cDNAs; these need not have 100% matching
    supporting evidence
  • Ensembl novel genes (black)
  • NCBI LOC genes
10
Genes
  • Novel genes (2): based on the presence of ESTs,
    a resource for alternative splicing
  • EST genes in Ensembl (purple)
  • Database of Transcribed Sequences (DoTS)
  • Acembly
  • Ab initio gene prediction
  • Single organism: Genscan
  • Comparative information: Twinscan
  • Pseudogenes: match a known gene but have a
    disrupted ORF. A minefield!
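
To make "disrupted ORF" concrete, the sketch below checks a candidate coding sequence for the most common kinds of disruption. It is a toy check under stated assumptions: real pseudogene annotation relies on alignment to the parent gene, which is not shown here, and the function name `orf_disruptions` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_disruptions(cds):
    """List the reasons a candidate coding sequence lacks an intact ORF.

    An empty list means: starts with ATG, length is a multiple of 3,
    ends with a stop codon, and has no earlier in-frame stop.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    if not cds.startswith("ATG"):
        problems.append("missing start codon")
    # split into complete in-frame codons, ignoring any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature in-frame stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    return problems

print(orf_disruptions("ATGAAATAG"))     # intact toy ORF
print(orf_disruptions("ATGTAAAAATAG"))  # in-frame stop after the start codon
```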
11
Gene - www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
12
RefSeq - http://www.ncbi.nlm.nih.gov/RefSeq/
13
Genes in Ensembl
14
Genes in Ensembl
15
Supporting evidence in Ensembl
DNA
Protein
16
Genes in UCSC
Put in view of UCSC
17
Genes in UCSC
18
Gene prediction programs
  • Ab initio gene prediction
  • First ones predicted single exons, e.g. GRAIL
    (Uberbacher, 91) or MZEF (Zhang, 97)
  • Later ones predict entire genes, e.g. Genscan (Burge,
    97) and Fgenesh (Solovyev, 95)
  • Predict individual exons based on codon usage and
    sequence signals (start, stop, splice sites),
    followed by assembly of putative exons into genes
  • Genscan predicts 90% of coding nucleotides and
    70% of coding exons (Guigo, 00)
  • Cannot use gene prediction methods alone to
    accurately identify every gene in a genome
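
The idea of scoring a region by codon usage can be illustrated with a simple log-odds sum. This is a sketch only, not the scoring scheme of Genscan, GRAIL, or any other program listed above. The frequency tables in the usage lines are made-up toy numbers, and the names `codon_counts` and `coding_score` are hypothetical.

```python
import math
from collections import Counter

def codon_counts(seq):
    """Count complete codons read in frame 0 of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

def coding_score(window, coding_freq, background_freq, floor=1e-6):
    """Sum of log(P_coding(codon) / P_background(codon)) over the window.

    Positive scores mean the codons look more like the coding table than
    the background table. Codons missing from a table get `floor`.
    """
    score = 0.0
    for codon, n in codon_counts(window).items():
        p = coding_freq.get(codon, floor)
        q = background_freq.get(codon, floor)
        score += n * math.log(p / q)
    return score

# Toy tables (assumed values, for illustration only).
toy_coding = {"GCC": 0.04, "TTA": 0.005}
toy_background = {"GCC": 1 / 64, "TTA": 1 / 64}

print(coding_score("GCCGCC", toy_coding, toy_background) > 0)  # codon favoured in the coding table
print(coding_score("TTATTA", toy_coding, toy_background) < 0)  # codon rare in the coding table
```

In practice the coding table would be estimated from known genes of the organism, and the score would be computed in all reading frames and combined with signal scores for start, stop, and splice sites.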

19
Gene prediction programs
  • Sn: Sensitivity = TP/(TP + FN)
  • How many exons were found out of total present?
  • Sp: Specificity = TP/(TP + FP)
  • How many predicted exons were correct out of
    total exons predicted?
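
As a worked example of the two measures, with Sn = TP/(TP + FN) and Sp = TP/(TP + FP), the sketch below scores a set of predicted exons against an annotated set. It assumes an exon counts as a true positive only when both coordinates match exactly, and the function name `exon_accuracy` and the coordinates are hypothetical.

```python
def exon_accuracy(predicted, actual):
    """Return (Sn, Sp) at the exon level.

    Exons are (start, end) tuples. Assumption: a predicted exon is a true
    positive only if both boundaries match an annotated exon exactly.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # predicted and real
    fn = len(actual - predicted)   # real but missed
    fp = len(predicted - actual)   # predicted but not real
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (300, 400), (500, 650)]

sn, sp = exon_accuracy(predicted, annotated)
print(sn, sp)  # 2 of 4 real exons found; 2 of 3 predictions correct
```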

20
Twinscan
  • Gene structure prediction model
  • Extends the probability model of GENSCAN
  • Exploits homology between two related genomes
  • Notable improvement on GENSCAN
21
Twinscan
22
Twinscan - genes.cs.wustl.edu/
23
Other sources of gene prediction
  • ORF detectors
    • NCBI: http://www.ncbi.nih.gov/gorf/gorf.html
  • Promoter predictors
    • CSHL: http://rulai.cshl.org/software/index1.htm
    • BDGP: fruitfly.org/seq_tools/promoter.html
    • ICG: TATA-Box predictor
  • PolyA signal predictors
    • CSHL: argon.cshl.org/tabaska/polyadq_form.html
  • Splice site predictors
    • BDGP: http://www.fruitfly.org/seq_tools/splice.html
  • Start-/stop-codon identifiers
    • DNALC: Translator/ORF-Finder
    • BCM: Searchlauncher
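
The simplest form of signal prediction is a consensus-motif scan, sketched below for the canonical poly(A) signal hexamers (AATAAA and the ATTAAA variant) and a TATA-box consensus. This is an illustration only: the tools listed above use trained models rather than plain pattern matching, and a bare consensus scan produces many false positives. The names `SIGNALS` and `scan_signals` are hypothetical.

```python
import re

# Consensus patterns wrapped in lookaheads so overlapping hits are reported.
SIGNALS = {
    "polyA_signal": re.compile(r"(?=(AATAAA|ATTAAA))"),
    "TATA_box": re.compile(r"(?=(TATA[AT]A[AT]))"),
}

def scan_signals(seq):
    """Return sorted (position, signal_name, matched_text) hits on the forward strand."""
    seq = seq.upper()
    hits = []
    for name, pattern in SIGNALS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)

for hit in scan_signals("GGTATAAAAGGCCCAATAAAGG"):
    print(hit)
```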