Genome Annotation - PowerPoint PPT Presentation

1
Genome Annotation
Laura Clarke 18/12/02
2
Overview
  • What is Genome Annotation
  • Manual Curation
  • Automatic Annotation
  • Conclusions

3
Genome Annotation
Our Aim
4
Genome Annotation
Gene Identification
Known genes
  • where?
  • genomic structure?
  • transcript(s)?
  • protein(s)?
  • attach useful links
Novel genes
  • how to predict?
  • require evidence
  • transcript(s)?
  • protein(s)?
  • attach useful links

5
Manual Curation
6
Manual Curation
  • Who?
  • The G16
  • Wormbase
  • Flybase

7
Manual Curation
Identifying Genes
  • Known
  • Novel
  • Novel transcript
  • Putative
  • Pseudogene

8
Manual Curation
9
Automatic Annotation
10
Automatic Annotation
  • Who?
  • EnsEMBL
  • NCBI
  • UCSC

11
Automatic Annotation
Ensembl
12
Automatic Annotation
Raw Compute
Sequence data arrives in contigs
Repeat masking
Ab initio predictions (Genscan)
BLAST the predictions against SWALL, vertebrate RNA and UniGene
ePCR places markers on the sequence
Assembly information is used to position contigs on a golden path
EnsEMBL core
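
The repeat masking step above can be illustrated with a minimal sketch. It assumes the repeat coordinates have already been found by a separate repeat-finding tool; the function name `hard_mask`, the example contig and the interval list are all hypothetical and are not taken from the Ensembl code.

```python
def hard_mask(sequence, repeats, mask_char="N"):
    """Return `sequence` with every repeat interval replaced by `mask_char`.

    `repeats` is an iterable of (start, end) pairs, 0-based and end-exclusive.
    """
    masked = list(sequence)
    for start, end in repeats:
        # Clamp to the sequence bounds so a stray interval cannot raise.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical contig with a low-complexity run at positions 8..15.
contig = "ACGTACGTTTTTTTTTACGT"
repeat_hits = [(8, 16)]  # assumed output of a repeat finder
masked_contig = hard_mask(contig, repeat_hits)
print(masked_contig)
```

Masking before the ab initio and BLAST steps keeps repetitive sequence from producing spurious predictions and hits, which is presumably why it comes first in the list.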
13
Automatic Annotation
GeneBuild
human proteins
Other proteins
cDNAs
ESTs
Pmatch
Exonerate
GeneWise
Est2Genome
Add UTRs
Merge
Genscan exons
Genes
EST-genes
14
Automatic Annotation
Genewise
Protein Sequences
Aligned to the Genome
Blast and MiniSeq
Genewise
15
Automatic Annotation
ESTs and cDNA
Map cDNAs and ESTs using Exonerate (determine coverage, identity and location in genome)
Store hits and filter on percentage identity and length coverage
BLAST sequence and create a miniseq
Map transcripts back into genome assembly
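
The filtering step on this slide can be sketched as below. This is only an illustration: the `Hit` record, its field names and the two thresholds (97% identity, 90% coverage) are assumptions made for the example. The slide does not state the cut-offs actually used.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str        # cDNA or EST identifier
    query_length: int    # full length of the query sequence
    aligned_length: int  # number of query bases included in the alignment
    matches: int         # identical bases within the alignment
    chromosome: str
    start: int
    end: int

    @property
    def percent_identity(self):
        return 100.0 * self.matches / self.aligned_length

    @property
    def percent_coverage(self):
        return 100.0 * self.aligned_length / self.query_length


def filter_hits(hits, min_identity=97.0, min_coverage=90.0):
    """Keep hits that pass both thresholds (threshold values are assumed)."""
    kept = []
    for hit in hits:
        # Skip degenerate records rather than divide by zero.
        if hit.aligned_length <= 0 or hit.query_length <= 0:
            continue
        if (hit.percent_identity >= min_identity
                and hit.percent_coverage >= min_coverage):
            kept.append(hit)
    return kept


# Hypothetical hits: only the first passes both filters.
hits = [
    Hit("cdna1", 1000, 980, 975, "1", 1000, 5000),
    Hit("est1", 500, 200, 199, "2", 300, 900),    # low coverage
    Hit("est2", 400, 390, 350, "3", 700, 1500),   # low identity
]
kept_ids = [h.query_id for h in filter_hits(hits)]
print(kept_ids)
```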
16
Automatic Annotation
Miniseq - the need for speed
Minigenomic: 1 kb on either side, run Genewise
Map back to genomic
Spliced alignment
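
A minimal sketch of the miniseq idea: cut out the hit region plus 1 kb of flank, work on the short piece, then shift coordinates back to the genomic sequence. It handles a single hit region only, and the function names `make_miniseq` and `to_genomic` are hypothetical. The alignment itself (Genewise) is not reproduced here; a plain string search stands in for it.

```python
FLANK = 1000  # 1 kb on either side, as stated on the slide


def make_miniseq(genomic, hit_start, hit_end, flank=FLANK):
    """Return (miniseq, offset) for the region [hit_start, hit_end) plus flank.

    Coordinates are 0-based and end-exclusive; `offset` is the genomic
    position of the first base of the miniseq.
    """
    offset = max(hit_start - flank, 0)
    stop = min(hit_end + flank, len(genomic))
    return genomic[offset:stop], offset


def to_genomic(mini_pos, offset):
    """Convert a position on the miniseq back to a genomic position."""
    return mini_pos + offset


# Hypothetical 10 kb sequence with a short feature in the middle.
genomic = "A" * 5000 + "GATTACA" + "C" * 5000
miniseq, offset = make_miniseq(genomic, 5000, 5007)

# Stand-in for the expensive alignment step, run on the short sequence only.
mini_pos = miniseq.find("GATTACA")
genomic_pos = to_genomic(mini_pos, offset)
print(len(genomic), len(miniseq), genomic_pos)
```

The gain in speed comes from running the slow alignment on roughly 2 kb plus the hit length instead of the whole contig.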
17
Automatic Annotation
Resources
  • 8x ES40 Alpha (667 MHz) with 2 TB fibre channel storage
  • 6x ES45 Alpha (1 GHz) with 4 TB fibre channel storage
  • 360x DS10L (467 MHz) farm with 60 GB local disk storage
  • 767x RLX800i with 80 GB of local disk storage
  • Further 21 TB storage on farm
  • Tru64 UNIX (avoids the 2 GB file limit)
  • 7 MySQL (v. 3) instances
  • Most binaries and all sequence databases stored locally (avoids using NFS)

18
Automatic Annotation
Latest full Human Build
  • NCBI 30 build, released Sept 2002
  • Ensembl genes 22,980
  • Ensembl transcripts 27,628
  • Ensembl exons 204,542
  • Made from
  • 41,955 proteins, 84,079 cDNAs
  • Transcripts from human proteins 43,418
  • Transcripts from homology 4,818
  • cDNA alignments 75,668
  • Transcripts with UTRs 32,661
  • Genscan predictions 73,128 (375,361 exons)
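
A few ratios can be derived from the counts on this slide. The sketch below uses only the figures listed above. It assumes the exon count refers to distinct exons per build, so it is divided by genes rather than by transcripts.

```python
# Counts copied from the slide (NCBI 30 build).
genes = 22_980
transcripts = 27_628
exons = 204_542
genscan_predictions = 73_128
genscan_exons = 375_361

transcripts_per_gene = transcripts / genes
exons_per_gene = exons / genes
genscan_exons_per_prediction = genscan_exons / genscan_predictions

print(f"Transcripts per gene:         {transcripts_per_gene:.2f}")
print(f"Exons per gene:               {exons_per_gene:.2f}")
print(f"Exons per Genscan prediction: {genscan_exons_per_prediction:.2f}")
```

This works out to roughly 1.2 transcripts and 8.9 exons per Ensembl gene, against about 5.1 exons per Genscan prediction.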

19
Automatic Annotation
Other Analyses
  • Protein Analysis
  • Interpro
  • Other Algorithms
  • Comparative Analysis
  • Homologous Gene Pairing
  • Synteny finding

20
Automatic Annotation
Other Species
  • Mouse
  • Mosquito
  • Zebrafish
  • Fugu

21
Automatic Annotation
Coming Soon
  • Human NCBI 31
  • Rat
  • C. briggsae
  • C. elegans
  • Mosquito
  • Drosophila

22
EnsEMBL Website