Title: THE PROMISE OF 454 SEQUENCING IN CHARACTERIZING NATURAL DIVERSITY: why sequencing centers will be outmoded in 6 months
THE PROMISE OF 454 SEQUENCING IN CHARACTERIZING NATURAL DIVERSITY: why sequencing centers will be outmoded in 6 months
GSC, Cambridge, Sept 2006
Dept. Biology, SDSU, San Diego, CA
Computational Sciences Research Center, SDSU, San Diego, CA
Center for Microbial Sciences, San Diego, CA
Fellowship for Interpretation of Genomes, Chicago, IL
The Burnham Inst. for Medical Research, San Diego, CA
IMEC, LLC, San Diego, CA
Outline
- Fabulous four-five-four for facile functional findings
- Is community structure antiestablishment?
- Functional analysis is a blast
- Why people suck
- Why we're screwed and what we've done
Metagenomics
Sample: 200 liters of water or 5-500 g of fresh fecal matter
Concentrate and purify viruses
Epifluorescent microscopy
Extract nucleic acids
DNA/RNA LASL (linker-amplified shotgun library)
Sequence
Breitbart et al., multiple papers
Pyrosequencing
whole genome amplification
5-100 ng DNA
2-5 µg DNA
www.454.com
454 Sequence Data (in one year plus a bit)
- 71 libraries
- 40 microbial, 31 viral (many partial plates)
- 1,309,019,537 bp total
- 45% of the human genome
- More than all complete and partial bacterial genomes
- >10% of JGI's community sequencing per year
- 12,632,567 sequences
- Average 177,923 per library
- Average read length 103.5 bp
- Average read length has not increased
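The derived figures on this slide can be checked from the totals. A minimal Python sketch follows; the human genome size (about 2.9 Gbp) is our assumption, not a number from the slide. The pooled mean read length it computes is about 103.6 bp; the slide's 103.5 bp may have been averaged another way (for example, per library), which the slide does not say.

```python
# Recompute the derived figures on the "454 Sequence Data" slide from its totals.
TOTAL_BP = 1_309_019_537
TOTAL_READS = 12_632_567
N_LIBRARIES = 71
# Assumption: ~2.9 Gbp human genome size; this value is not stated on the slide.
HUMAN_GENOME_BP = 2.9e9

reads_per_library = TOTAL_READS / N_LIBRARIES
pooled_read_length = TOTAL_BP / TOTAL_READS  # total bases / total reads
human_genome_fraction = TOTAL_BP / HUMAN_GENOME_BP

print(f"reads per library: {reads_per_library:,.0f}")
print(f"pooled mean read length: {pooled_read_length:.1f} bp")
print(f"fraction of a human genome: {human_genome_fraction:.0%}")
```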
Sampling Sites
- Metazoan-associated: corals, fish, human blood, human stool, human other
- Freshwater: aquifer, glacial lake
- Marine: near-shore water, off-shore water, near- and off-shore sediments
- Extreme: hot springs (84°C, 78°C), soda lake (pH 13), solar saltern (>35% salt)
- Terrestrial/soil: Amazon rainforest, Konza Prairie, Joshua Tree desert
- Air
Can you assemble 454 sequences (~100 bp)?
Thanks: Lutz Krause
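As a rough illustration of what assembling short reads involves, here is a toy greedy overlap assembler in Python. It is not the method Lutz Krause used (the slide does not say what that was); the function names and the `min_overlap` parameter are hypothetical. It assumes error-free reads and no repeats; 454 homopolymer errors and repeated regions are exactly what break this kind of greedy merging, and ~100 bp reads leave little room for long, unambiguous overlaps.

```python
def overlap_len(a, b, min_overlap):
    """Longest suffix of a that is a prefix of b, if at least min_overlap long; else 0."""
    start = 0
    while True:
        # Candidate positions are where b's first min_overlap bases occur in a.
        start = a.find(b[:min_overlap], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost match gives the longest overlap
        start += 1


def greedy_assemble(reads, min_overlap=20):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap_len(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads, min_overlap=3))
```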
Community structure
Community structure based on the frequency of overlapping fragments found among the sequences
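One way to see why overlap frequency carries information about community structure: the chance that two random reads overlap is the chance they come from the same genome (Simpson's index of the abundance distribution) times the chance that two reads from one genome overlap. The sketch below assumes equal-length circular genomes, uniformly placed reads, and reads sampled in proportion to abundance; the genome length, read length, and minimum overlap values are illustrative only. It is a simplified model, not the estimator behind the slide.

```python
from math import comb


def p_overlap_same_genome(genome_len, read_len, min_overlap):
    """P(two reads from one circular genome overlap by >= min_overlap bases).

    With uniformly placed starts, two reads overlap by read_len - |d| bases, where d
    is the circular offset between their starts, so we need |d| <= read_len - min_overlap.
    """
    window = 2 * (read_len - min_overlap) + 1
    return min(1.0, window / genome_len)


def expected_overlapping_pairs(abundances, n_reads, genome_len, read_len, min_overlap):
    """Expected number of overlapping read pairs, assuming equal genome lengths."""
    total = sum(abundances)
    # Probability two random reads come from the same genome (Simpson's index).
    same_genome = sum((a / total) ** 2 for a in abundances)
    per_pair = same_genome * p_overlap_same_genome(genome_len, read_len, min_overlap)
    return comb(n_reads, 2) * per_pair


# Hypothetical parameters: 10,000 reads of 100 bp, 50 kb genomes, 20 bp minimum overlap.
even = [1] * 1000            # 1,000 equally abundant genomes
skewed = [1000] + [1] * 999  # one dominant genome plus 999 rare ones
for name, community in (("even", even), ("skewed", skewed)):
    print(name, round(expected_overlapping_pairs(community, 10_000, 50_000, 100, 20)))
```

A community dominated by a few genomes yields far more overlapping pairs than an even one of the same richness, which is the signal an overlap-based estimate of community structure exploits.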
Functional Analysis Using the SEED Database, developed by FIG
Current version: 661 Bacteria (396 complete), 38 Archaea (26 complete), 562 Eukarya (29 complete), 82 metagenomes
Functional analysis using the SEED
Heat Map for Comparing Frequencies
Phages in the World's Oceans
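Before frequencies can be compared across metagenomes in a heat map, raw SEED hit counts have to be normalized, since libraries differ in size. A minimal sketch, assuming per-metagenome hit counts per subsystem are already in hand; the sample and subsystem names are hypothetical placeholders, and the character-shade rendering stands in for a real plotting library.

```python
# Hypothetical hit counts per subsystem for two metagenomes (not real data).
counts = {
    "metagenome_1": {"subsystem_A": 30, "subsystem_B": 10},
    "metagenome_2": {"subsystem_A": 5, "subsystem_B": 15},
}


def relative_frequencies(counts):
    """Normalize each metagenome's hit counts so they sum to 1."""
    freqs = {}
    for sample, hits in counts.items():
        total = sum(hits.values()) or 1  # avoid dividing by zero for an empty sample
        freqs[sample] = {sub: n / total for sub, n in hits.items()}
    return freqs


def text_heat_map(freqs, shades=" .:-=+*#%@"):
    """Render subsystems (rows) by metagenomes (columns) as character shades."""
    samples = list(freqs)
    subsystems = sorted({sub for f in freqs.values() for sub in f})
    peak = max(v for f in freqs.values() for v in f.values())
    lines = []
    for sub in subsystems:
        cells = []
        for sample in samples:
            v = freqs[sample].get(sub, 0.0)
            # Scale to the largest frequency so the darkest shade marks the peak.
            cells.append(shades[round(v / peak * (len(shades) - 1))])
        lines.append(f"{sub:<14}" + "".join(cells))
    return "\n".join(lines)


print(text_heat_map(relative_frequencies(counts)))
```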
SAR Aligned Against Chlamydia φ4
Figure: individual sequence reads, coverage, and concatenated hits along the Chlamydia φ4 genome, with Chl4 ORF calls
12,297 sequence fragments hit the 4.5 kb genome using TBLASTX
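The coverage track in this figure counts the hits overlapping each genome position. A minimal sketch using a difference array, assuming each hit has already been mapped to 0-based, end-exclusive nucleotide coordinates on the 4.5 kb genome (TBLASTX reports hits in its own format; that conversion is not shown). The intervals in the example are hypothetical.

```python
from itertools import accumulate


def coverage(genome_len, hits):
    """Per-base depth from (start, end) hits: 0-based, end-exclusive coordinates."""
    diff = [0] * (genome_len + 1)
    for start, end in hits:
        # Clip to the genome so a hit running off either end cannot index out of range.
        start, end = max(0, start), min(genome_len, end)
        if start < end:
            diff[start] += 1  # depth rises where a hit begins
            diff[end] -= 1    # and falls just past where it ends
    # A running sum of the difference array gives the depth at each position.
    return list(accumulate(diff[:genome_len]))


print(coverage(10, [(0, 5), (3, 8)]))
```

The difference array makes the cost proportional to the number of hits plus the genome length, rather than to the total length of all hits.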
Outline
- Fabulous four-five-four for facile functional findings
- Is community structure antiestablishment?
- Functional analysis is a blast
- Why people suck
- Why we're screwed and what we've done
Phages, Reefs, and Human Disturbance
Figure: map of the Northern Line Islands (Palmyra, Washington, Fanning)
The Northern Line Islands Expedition, 2005
16S rDNA at each island
Christmas-to-Kingman Bias in Number of Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman.
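The slide does not say how the bias was computed. One plausible formulation that matches the stated sign convention is the log ratio of each host's relative frequency at Christmas to its relative frequency at Kingman, so negative values mean relatively more at Kingman. The pseudocount (to keep hosts seen at only one island finite) and the host names are assumptions.

```python
import math


def host_bias(christmas, kingman, pseudocount=1.0):
    """log2(relative frequency at Christmas / relative frequency at Kingman) per host.

    Negative values mean the host is relatively more common at Kingman.
    """
    hosts = sorted(set(christmas) | set(kingman))
    # Add a pseudocount to every host so hosts seen at only one island stay finite.
    c_total = sum(christmas.values()) + pseudocount * len(hosts)
    k_total = sum(kingman.values()) + pseudocount * len(hosts)
    bias = {}
    for host in hosts:
        c_freq = (christmas.get(host, 0) + pseudocount) / c_total
        k_freq = (kingman.get(host, 0) + pseudocount) / k_total
        bias[host] = math.log2(c_freq / k_freq)
    return bias


# Hypothetical hit counts (not data from the expedition).
print(host_bias({"host_A": 9, "host_B": 29}, {"host_A": 29, "host_B": 9}))
```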
Outline
- Fabulous four-five-four for facile functional findings
- Is community structure antiestablishment?
- Functional analysis is a blast
- Why people suck
- Why we're screwed and what we've done
Computational Challenges
- Sequence annotations and analysis
  - What is there?
  - What is it doing?
  - How is it doing it?
- Gene predictions in unknowns
  - Lutz Krause
- Sequence comparisons
  - BLAST
  - Other ways to rapidly compare short sequences
- What happens when everyone is using 454 sequencing?
- Metadata or just data?
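As one example of a way to compare short sequences faster than BLAST, reads can be reduced to sets of k-mers and compared by Jaccard similarity. This is a swapped-in technique for illustration, not something the slide says was used; `k=11` is an arbitrary default. Canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) make the comparison strand-independent, since reads can come from either strand.

```python
VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (min of k-mer and its reverse complement), skipping non-ACGT."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            rc = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def kmer_jaccard(a, b, k=11):
    """Jaccard similarity of two sequences' canonical k-mer sets (0 if both are empty)."""
    ka, kb = canonical_kmers(a, k), canonical_kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


print(kmer_jaccard("AAACCC", "GGGTTT", k=3))  # a sequence vs. its reverse complement
```

Set operations on k-mers avoid alignment entirely, which is what makes this kind of comparison cheap, at the cost of ignoring k-mer order and tolerating only limited sequence divergence.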
Sequence data from 21 libraries: 600 million bp, 6 million sequences
- Each BLASTX search (one library) takes about 1,000 CPU hours
- 71 libraries: 71,000 CPU hours, or 8.1 CPU years
- Users want:
  - repeat runs
  - TBLASTX
  - more analysis
  - more data
  - more, more, more, more
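The CPU-time figures follow from simple arithmetic, sketched below. The 1,000 CPU hours per library comes from the slide; the 365-day year and the 512-core count in the wall-clock example are our assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def cpu_years(n_libraries, cpu_hours_per_library=1_000):
    """Total BLASTX CPU time in CPU years."""
    return n_libraries * cpu_hours_per_library / HOURS_PER_YEAR


def wall_clock_days(n_libraries, n_cores, cpu_hours_per_library=1_000):
    """Wall-clock days if the work splits perfectly across n_cores (an idealization)."""
    return n_libraries * cpu_hours_per_library / n_cores / 24


print(f"{cpu_years(71):.1f} CPU years")
print(f"{wall_clock_days(71, n_cores=512):.1f} days on 512 cores")
```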
Life Sciences Gateway
SOAP interface for job submission and control
TeraGrid Resources
SDSU: Forest Rohwer, Liz Dinsdale, Beltran Rodriguez-Brito
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
FIG: Veronika Vonstein, Ross Overbeek, annotators
ANL: Rick Stevens, Bob Olsen
Also at SDSU: Anca Segall, Stanley Maloy
Math Guys @ SDSU: Peter Salamon, Steve Rayhawk
Bielefeld: Lutz Krause
MIT: Ed DeLong
SIO: Stuart Sandin