The Zebrafish Genome Sequencing Project Bioinformatics resources

Description:

Kerstin Howe, Mario Caccamo, Ian Sealy. The Zebrafish Genome Sequencing Project: Bioinformatics resources; mis-joins and other complications; change of strategy from Zv2 ...

Slides: 31
Provided by: Sange6
Transcript and Presenter's Notes

1
The Zebrafish Genome Sequencing Project
Bioinformatics resources
Kerstin Howe, Mario Caccamo, Ian Sealy
2
Bioinformatics resources
  • outline
  • clone mapping, sequencing and manual annotation
    in
  • genome assemblies and automated annotation in
  • integrated ZF-Models data and tools

3
Clone mapping and sequencing
  • mapping
  • 2 BAC Tuebingen libraries
  • 1 BAC and 1 cosmid library from single Tuebingen
    double-haploid fish
  • end sequencing, RH mapping, fingerprinting
  • pieced together according to fingerprints,
    marker mapping, sequence alignment
  • currently 2500 contigs

4
Clone mapping and sequencing
  • sequencing pipeline
  • select clones based on position in fpc contig
  • subcloning
  • sequencing
  • automatic assembly/pre-finishing
  • (back to sequencing if necessary)
  • finishing
  • QC
  • automated analysis pipeline
  • manual annotation
  • submission to EMBL
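
The pipeline steps above can be sketched as an ordered list of stages with one loop back to sequencing. This is only an illustration: the stage names come from the slide, while `run_pipeline`, `needs_more_sequencing` and `max_rounds` are hypothetical names introduced here, not part of any Sanger tooling.

```python
from typing import Callable, List

# Stage names follow the slide; the control flow around them is an assumption.
STAGES = [
    "select clone from fpc contig",
    "subcloning",
    "sequencing",
    "automatic assembly/pre-finishing",
    "finishing",
    "QC",
    "automated analysis pipeline",
    "manual annotation",
    "submission to EMBL",
]


def run_pipeline(needs_more_sequencing: Callable[[int], bool],
                 max_rounds: int = 5) -> List[str]:
    """Return the ordered stages one clone passes through.

    needs_more_sequencing(round_number) is a hypothetical check made after
    pre-finishing; while it returns True the clone goes back to sequencing,
    up to max_rounds sequencing rounds.
    """
    seq_idx = STAGES.index("sequencing")
    pre_idx = STAGES.index("automatic assembly/pre-finishing")

    trace = list(STAGES[:seq_idx])           # clone selection and subcloning
    rounds = 0
    while True:
        rounds += 1
        trace.extend(STAGES[seq_idx:pre_idx + 1])   # sequencing + pre-finishing
        if rounds >= max_rounds or not needs_more_sequencing(rounds):
            break
    trace.extend(STAGES[pre_idx + 1:])       # finishing through submission
    return trace


if __name__ == "__main__":
    # A clone that needs one extra sequencing round.
    for stage in run_pipeline(lambda round_number: round_number < 2):
        print(stage)
```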

5
Manual annotation
unfinished sequence
finished sequence
automated analysis pipeline
manual annotation
6
Manual annotation
  • annotation policy
  • follows guidelines for human annotation (havana
    team, Sanger Institute)
  • no "guesses", annotations solely based on
    supporting evidence
  • annotation of CDSs and UTRs / transcripts
  • splice variants
  • pseudogenes
  • poly A features
  • transposons
  • repeats
  • approved nomenclature (SIclone.number)
  • collaboration with ZFIN
  • existing ZFIN records are reported
  • ZFIN provides new records for newly found genes

7
Manual annotation
8
vega.sanger.ac.uk
9
Vega
contigview
10
Vega
geneview
11
www.sanger.ac.uk/Projects/D_rerio
12
www.sanger.ac.uk/Projects/D_rerio
13
when to use what
  • go to vega.sanger.ac.uk if you need
  • highly reliable sequence
  • highly reliable annotation (with your input)
  • your gene stable over time (TILLING)
  • go to www.ensembl.org if you need
  • the whole genome
  • comparative data
  • ZF-Models microarray or insertional mutagenesis
    data
  • complicated searches (BioMart)

14
Zebrafish Genome Project
whole genome shotgun sequencing
clone mapping and sequencing
WGS reads
integration
(un)finished clones
assembly release (Zv5)
8,000 finished clones (1 Gb)
automatic annotation
manual annotation
15
WGS assembly
Phusion assembler - High Performance Assembly
Group (Zemin Ning et al.)
reads
group reads
contigs
supercontigs
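
As a rough illustration of the "group reads" step, reads can be clustered whenever they share an exact k-mer. The sketch below uses a plain union-find for this; it is not the Phusion implementation, it ignores reverse complements, repeats and sequencing errors, and the function name `group_reads` is made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def group_reads(reads: List[str], k: int = 12) -> List[List[int]]:
    """Group read indices that share at least one exact k-mer (single linkage)."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Remember the first read in which each k-mer was seen and merge any
    # later read containing the same k-mer into that read's group.
    first_seen: Dict[str, int] = {}
    for idx, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            word = read[pos:pos + k]
            if word in first_seen:
                union(first_seen[word], idx)
            else:
                first_seen[word] = idx

    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    seq = "ATGGCGTGCAGTCCATGTTCGGATCA"
    # Two overlapping reads cut from one sequence plus one unrelated read.
    print(group_reads([seq[:17], seq[5:], "T" * 16]))
```

Each resulting group would then be assembled on its own, which keeps the per-group problem small.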
16
Read grouping
  • k-mer word hashing


continuous base hash - k=12
ATGGCGTGCAGTCCATGTTCGGATCA
ATGGCGTGCAGT
 TGGCGTGCAGTC
  GGCGTGCAGTCC
   GCGTGCAGTCCA
gap hash k=12 (4x3) - dealing with variation
ATGGCGTGCAGTCCATGTTCGGATCA
ATGGCGTGCAGTCCATGT
 TGGCGTGCAGTCCATGTT
  GGCGTGCAGTCCATGTTC
   GCGTGCAGTCCATGTTCG
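
The two hashing schemes on this slide can be sketched as follows. The contiguous case is a plain sliding window of 12 bases. For the gapped case the slide only shows "4x3" and windows spanning 18 bases, so the mask used here (four blocks of three bases separated by two skipped bases) is an assumption, as are the names `contiguous_words`, `gapped_words` and `GAP_MASK`.

```python
from typing import List


def contiguous_words(seq: str, k: int = 12) -> List[str]:
    """All overlapping words of k consecutive bases."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# ASSUMPTION: 4 blocks of 3 sampled bases ("1") with 2 skipped bases ("0")
# between blocks, giving 12 sampled bases over an 18-base span. The slide
# does not state where the gaps fall.
GAP_MASK = "111001110011100111"


def gapped_words(seq: str, mask: str = GAP_MASK) -> List[str]:
    """Words built from the masked-in positions of each window of len(mask)."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]


if __name__ == "__main__":
    seq = "ATGGCGTGCAGTCCATGTTCGGATCA"
    variant = seq[:3] + "A" + seq[4:]   # single-base change at a skipped position

    print(contiguous_words(seq)[:4])
    # The contiguous word covering the change differs, the gapped word does not.
    print(contiguous_words(seq)[0] == contiguous_words(variant)[0])
    print(gapped_words(seq)[0] == gapped_words(variant)[0])
```

A base change that falls on a skipped position leaves the gapped word unchanged, which is one way such a hash can tolerate variation between reads.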
17
Zebrafish Genome Project
whole genome shotgun sequencing
clone mapping and sequencing
WGS reads
integration
(un)finished clones
assembly release (Zv5)
7,000 finished clones (1 Gb)
automatic annotation
manual annotation
18
Integration
BACs
BX005049.6
BX005123.6
BX005153
BX005057.8
fpc contig
Zv5 scaffoldn
19
Assemblies
assembly                  Zv5                  Zv4                  Zv3                  Zv2
release date              27.05.05             12.07.04             27.11.03             03.04.03
total length bp           1,630,306,866        1,592,025,686        1,459,115,486        1,452,210,772
scaffolds                 16,214               21,333               58,339               83,470
finished clones           4,519 (699 Mb)       2,828 (443 Mb)       1,502 (263 Mb)       -
scaffolds in chr 1-25     1,749                1,892                1,490                -
scaffolds in fpc contigs  265 (chrU)           694 (chrU)           1,842                5,677
NA scaffolds              14,676               18,747               54,798               77,793
sum(length) chr 1-25 bp   1,200,129,620 (73%)  1,097,507,810 (69%)  718,270,423 (49%)    -
sum(length) ctgs          183,993,739 (11%)    176,222,396 (11%)    365,271,659 (25%)    1,143,459,008
sum(length) NAs           246,183,507 (16%)    318,295,480 (20%)    335,615,307 (23%)    308,751,764
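
The bracketed figures in the last three rows appear to be shares of the total assembly length. A short check under that reading, using only the Zv5 and Zv4 numbers from the slide (the dictionary layout and the name `length_shares` are illustrative):

```python
# Lengths in bp copied from the Assemblies slide (Zv5 and Zv4 columns only).
ASSEMBLIES = {
    "Zv5": {
        "total": 1_630_306_866,
        "chr 1-25": 1_200_129_620,
        "ctgs": 183_993_739,
        "NAs": 246_183_507,
    },
    "Zv4": {
        "total": 1_592_025_686,
        "chr 1-25": 1_097_507_810,
        "ctgs": 176_222_396,
        "NAs": 318_295_480,
    },
}


def length_shares(name: str) -> dict:
    """Percentage of the total length held by each placement category."""
    row = ASSEMBLIES[name]
    total = row["total"]
    return {key: 100.0 * value / total
            for key, value in row.items() if key != "total"}


if __name__ == "__main__":
    for name, row in ASSEMBLIES.items():
        parts = sum(v for k, v in row.items() if k != "total")
        shares = {k: round(v, 1) for k, v in length_shares(name).items()}
        print(name, "parts sum to total:", parts == row["total"], shares)
```

For Zv4, simple rounding of these shares reproduces the slide's 69, 11 and 20. For Zv5 the computed shares are roughly 73.6, 11.3 and 15.1, so the slide's 73, 11 and 16 seem to have been rounded differently.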
20
Automatic Annotation
21
Ensembl
22
Contigview
23
Geneview
24
Searching Ensembl
25
Biomart
26
(No Transcript)
27
Dos and Don'ts
  • go elsewhere (Ensembl) if you
  • want to know about the whole genome
  • need comparative data
  • need ZF-Models microarray or insertional mutagenesis data
  • need to do complicated searches
  • go to Vega if you
  • need highly reliable sequence
  • need highly reliable annotation
  • need your gene stable over time (TILLING)
28
DAS
genome browser
local storage
reference sequence
XML
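
For readers unfamiliar with DAS (Distributed Annotation System): a client such as a genome browser asks an annotation server for the features on a segment of the reference sequence and receives XML back. The sketch below builds a DAS 1.x style `features` request and parses a small response. The server name, source name and the sample XML are invented for illustration, and the code makes no network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def das_features_url(server: str, source: str, segment: str,
                     start: int, stop: int) -> str:
    """Build a DAS 1.x 'features' request for one segment range."""
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


def parse_das_features(xml_text: str) -> list:
    """Extract basic fields from a DASGFF document."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "orientation": feature.findtext("ORIENTATION"),
            })
    return features


# Hypothetical response, hand-written for this example (not real data).
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/zebrafish_example/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="example_feature_1" label="example feature">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="example_method">example method</METHOD>
        <START>1100</START>
        <END>1200</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

if __name__ == "__main__":
    # das.example.org and zebrafish_example are placeholders, not real services.
    print(das_features_url("http://das.example.org", "zebrafish_example", "1", 1000, 2000))
    print(parse_das_features(SAMPLE_RESPONSE))
```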
29
SNPs and Indels
30
Ensembl releases