Formats and standards for sequencing data

1
Formats and standards for sequencing data
Matúš Kalaš, INF389, CBU, BCCS/UiB, Bergen, Nov 12, 2010
2
SHRiMP, Maq, BWA, Bowtie, RMAP, Eland, SOAP, SOAP2, MOSAIK, SOCS, PatMaN, ZOOM, PASS, PerM, RazerS, segemehl, MPSCAN, BFAST, Lastz, BLAT
454, Solexa/Illumina, SOLiD
Genome, Metagenome, Gene annotation, Gene expression, Binding sites, Variation
Celera, Newbler, Velvet, Euler, SOAPdenovo
GenBank, EMBL, DDBJ, Genome Catalogue, SNPdb
NCBI SRA, EMBL-EBI ENA, Your databases
3
454 output formats
.sff, .fna, .qual
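As an illustration of how the .fna and .qual pair relates to FASTQ, here is a minimal Python sketch that merges the two into Sanger-style FASTQ records. It assumes the .fna file is FASTA and the .qual file holds space-separated integer Phred scores under matching headers; the function names and the example read are hypothetical, not taken from the slides.

```python
def read_records(text, joiner):
    """Parse FASTA-style text into a list of (read_id, body) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, joiner.join(chunks)))
            # Keep only the read identifier, dropping any description.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, joiner.join(chunks)))
    return records


def fna_qual_to_fastq(fna_text, qual_text):
    """Merge .fna bases and .qual scores into Sanger FASTQ text (offset 33)."""
    seqs = read_records(fna_text, "")    # sequence lines are concatenated
    quals = read_records(qual_text, " ")  # score lines are joined by spaces
    out = []
    for (name, seq), (qname, qbody) in zip(seqs, quals):
        scores = [int(q) for q in qbody.split()]
        if name != qname or len(scores) != len(seq):
            raise ValueError(f"mismatched records for read {name!r}")
        quality = "".join(chr(q + 33) for q in scores)
        out.append(f"@{name}\n{seq}\n+\n{quality}\n")
    return "".join(out)


# Hypothetical single-read example.
fna = ">read1 length=4\nACGT\n"
qual = ">read1 length=4\n40 30 20 10\n"
fastq = fna_qual_to_fastq(fna, qual)
```

The sketch reads whole files as strings for brevity; real 454 output is large enough that a streaming parser would be the more sensible choice.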
4
Illumina output formats
.seq.txt, .prb.txt
Illumina FASTQ (ASCII code minus 64 is the Illumina score)
Qseq (ASCII code minus 64 is the Phred score)
Illumina single-line format (SCARF)
5
SOLiD output format(s)
CSFASTA
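For orientation, a CSFASTA read starts with a known primer base followed by color calls 0 to 3, each encoding the transition between two adjacent bases. The Python sketch below decodes such a read into bases; the function name and example reads are hypothetical. In practice color-space reads are usually aligned in color space, because a single miscalled color corrupts every base decoded after it.

```python
BASES = "ACGT"


def decode_colorspace(cs_read):
    """Translate a color-space read such as 'T0123' into bases.

    The leading primer base is used only as the starting point and is
    not included in the result.
    """
    # With A=0, C=1, G=2, T=3, the next base is (previous base XOR color).
    previous = BASES.index(cs_read[0].upper())
    decoded = []
    for color in cs_read[1:]:
        if color not in "0123":
            # A '.' marks a missing color call; all later bases are unknown.
            decoded.append("N" * (len(cs_read) - 1 - len(decoded)))
            break
        previous ^= int(color)
        decoded.append(BASES[previous])
    return "".join(decoded)


print(decode_colorspace("T0123"))
```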
6
Real (standard) FASTQ
Sanger FASTQ (ASCII code minus 33 is the Phred score)
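The numbers 64 and 33 on the FASTQ slides are offsets subtracted from each quality character's ASCII code. A minimal Python sketch of converting offset-64 quality strings to standard Sanger FASTQ follows. It assumes that the "Illumina score" of older Illumina FASTQ is the Solexa-scaled score, which needs a logarithmic conversion to Phred rather than a plain shift; the function names are hypothetical.

```python
import math

SANGER_OFFSET = 33
ILLUMINA_OFFSET = 64


def phred64_to_sanger(quality):
    """Shift Phred scores stored with offset 64 to the Sanger offset 33."""
    return "".join(
        chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in quality
    )


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def solexa64_to_sanger(quality):
    """Convert Solexa-scaled, offset-64 qualities to Sanger FASTQ qualities."""
    return "".join(
        chr(solexa_to_phred(ord(c) - ILLUMINA_OFFSET) + SANGER_OFFSET)
        for c in quality
    )


print(phred64_to_sanger("hhhh"))
```

The two scales nearly coincide for high scores and diverge only at the low end, so applying the plain shift to Solexa-scaled data mostly distorts the poor-quality bases.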

7
Example of dealing with diverse read formats in Galaxy
(http://usegalaxy.org)
8
If reads should be deposited in a public repository
SRA (Short Read Archive) at NCBI
ENA at EMBL-EBI
SRA format (XML)
SRF format
Or should they be deleted?
9
Common (standard) format for read alignments
SAM
BAM (binary SAM)
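To make the SAM layout concrete: each alignment is one tab-separated line with eleven mandatory fields, optionally followed by tags. The Python sketch below parses such a line. The field names follow the current SAM specification and may differ from those in use when the slides were written; the record shown is invented for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # Sanger-encoded base qualities (offset 33)


def parse_sam_line(line):
    """Parse the eleven mandatory fields of one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def is_unmapped(rec):
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec):
    return bool(rec.flag & 0x10)


# Invented example record; the optional NM tag is ignored by the parser.
rec = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(rec.rname, rec.pos, is_reverse_strand(rec))
```

BAM stores the same records in compressed binary form, so it is normally read through a dedicated library rather than parsed by hand.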
10
Some common formats for results (Genome/Gene annotation)
BED format (genome-browser tracks)
GFF format (gene/genome features)
BioXSD (XML) (any annotation, under development)
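One practical difference between the first two formats is the coordinate system: BED intervals are 0-based and half-open, while GFF intervals are 1-based and inclusive. The Python sketch below converts one BED line into a GFF3-style line. The function name, the default source and type values, and the example feature are hypothetical, and the Name attribute assumes GFF version 3.

```python
def bed_to_gff3(bed_line, source="bed_to_gff3", feature_type="region"):
    """Convert one BED line (3 to 6 columns) to a GFF3-style line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    attributes = f"Name={name}" if name else "."
    # BED start is 0-based, so add 1; the half-open BED end already equals
    # the inclusive 1-based GFF end.
    return "\t".join([chrom, source, feature_type, str(start + 1), str(end),
                      score, strand, ".", attributes])


print(bed_to_gff3("chr1\t100\t200\tfeatA\t0\t+"))
```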
11
Deposit genome/metagenome in a public repository
INSDC databases: GenBank, EMBL, DDBJ
Deposit genome/metagenome metadata
MIGS/MIMS standard by GSC
GCDML format (XML) (under development), following the MIGS/MIMS standard
12
MIGS: Minimum Information about a Genome Sequence
MIMS: Minimum Information about a Metagenome Sequence/Sample

13
MIGS/MIMS checklist

14

15
MIGS/MIMS metadata example

16
Sequencing experiment metadata
MINSEQE standard by FGED: Minimum Information about a high-throughput Nucleotide SEQuencing Experiment (under development)
17
Take-home messages
  • Use raw sequencing data when possible
  • For base-call data, use standard FASTQ
    (Sanger, Phred)
  • For read alignments, use SAM/BAM format
  • Use common formats for your results (e.g. GFF or
    BED format)
  • Hope for new, generic, extensible standard
    format(s)
  • Submit MIGS/MIMS-compliant metadata of genome
    sequences
  • Keep an eye on MINSEQE standard, store your
    sequencing metadata