Environmental Genome Shotgun Sequencing of the Sargasso Sea Venter JC, et al

1
Environmental Genome Shotgun Sequencing of the Sargasso Sea
Venter JC, et al.
  • Kristine Briedis
  • Evening Journal Club
  • 4 April 2005

2
Overview
  • Craig Venter is sailing around the world
    collecting ocean water
  • Performing whole-genome shotgun sequencing on the
    samples (focusing on microbes)

http://www.calacademy.org/calwild/2005winter/stories/venter.html
3
Big Picture
  • Of 1.7 million species classified so far,
    roughly 6000 are microbes
  • True number of microbes is obviously larger than
    6000
  • "Imagine if our entire understanding of biology
    was based on a visit to the zoo. That's where
    we've been in microbiology."
  • Norman Pace, Univ of Colorado, Boulder
  • http://www.wired.com/wired/archive/12.08/venter.html?pg=3&topic=venter&topic_set=

4
Sargasso Sea
http://www.smithsonianmag.si.edu/smithsonian/issues98/nov98/map_jpg.html
5
Voyage
http://www.sorcerer2expedition.org/version1/HTML/main.htm
6
Methods
  • Performed whole-genome shotgun sequencing of
    surface water samples from Sargasso Sea
  • Samples were filtered to isolate microbes
  • Created genomic libraries with 2 to 6 kb inserts
  • Sequenced plasmid clones
  • Resulted in >1.5 Gbp of microbial DNA sequence
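
A minimal Python sketch of the idea behind shotgun sequencing, not the pipeline used in the paper: reads are random fragments of the source DNA, and average depth of coverage is the total number of bases sequenced divided by the genome length. The function names, the toy 10 kb random genome, and the 100 bp read length are illustrative assumptions.

```python
import random


def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Sample n_reads random substrings of length read_len from genome."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len  # last valid start position
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def expected_coverage(n_reads, read_len, genome_len):
    """Average depth of coverage: total bases sequenced / genome length."""
    return n_reads * read_len / genome_len


# Toy genome (assumption): 10,000 random bases.
_rng = random.Random(1)
genome = "".join(_rng.choice("ACGT") for _ in range(10_000))

reads = shotgun_reads(genome, n_reads=300, read_len=100)
print(expected_coverage(len(reads), 100, len(genome)))  # 3.0
```

With 300 reads of 100 bp from a 10 kb genome the average depth is 3X, although individual positions will be covered more or less deeply because the reads are placed at random.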

7
Shotgun Sequencing
http://www.bioteach.ubc.ca/Bioinformatics/GenomeProjects/shotgun201.gif
8
Organism Identification
  • Focused analysis on scaffolds with at least 3X
    coverage depth
  • 333 scaffolds (2226 contigs, 30.9 Mbp), 25% of
    the data set
  • Used oligonucleotide frequencies, depth of
    coverage, and similarity to previously sequenced
    genomes to separate some sequences into organism
    bins
  • Identified several populations related to known
    species

9
Prochlorococcus gene conservation
10
SNPs
  • Most deeply covered scaffolds (21 scaffolds with
    >14X coverage) contain 1 SNP per 10,000 bp
  • Indicates discrete species are present
  • Prochlorococcus scaffolds display a blend of
    discrepancies from the consensus sequence
  • Example: two scaffolds share gene synteny and
    align to MED4, but contain 15 inserted genes of
    probable phage origin
  • Also, some regions of MED4 are not present in the
    scaffolds; the largest region codes for surface
    polysaccharide biosynthesis, perhaps not present
    in the Sargasso Sea population

11
Gene Finding
  • Most sequence is in short (<10 kb) unassociated
    scaffolds and singletons
  • Used sequence alignments to determine the most
    likely coding frame
  • Ends of alignments refined to include start and
    stop codons
  • Identified 1,214,207 genes (700 Mbp)
  • Additional hypothetical genes identified by
    conserved open reading frames: 69,901 genes
  • For comparison, the SwissProt database contains
    137,885 sequences
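
A simplified Python sketch of an open reading frame scan, the first step behind the "conserved open reading frames" bullet above. It reports ATG-to-stop stretches in all six reading frames. The conservation check and the alignment-based coding frame selection described on the slide are not implemented here, and the `min_codons` cutoff and function names are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Yield (strand, start, end, orf) for ATG...stop ORFs in six frames.

    start and end are 0-based, end-exclusive coordinates on the strand
    that was scanned; the stop codon is counted in the ORF length.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame


# Toy example: one short ORF on the forward strand.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
for orf in find_orfs(toy, min_codons=3):
    print(orf)
```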

12
Gene Function
13
Phylotypes
14
Photosynthesis in Sargasso Sea
  • Thought to be dominated by the cyanobacteria
    Prochlorococcus and Synechococcus
  • But >90% of cyanobacteria scaffolds appear to be
    Prochlorococcus
  • Could be due to the gradient sampled and the
    larger size of Synechococcus

15
Bacteriorhodopsin
  • Transmembrane protein that is a green
    light-driven proton pump
  • Protons pumped out of cell then flow back in
    through ATP Synthase to create ATP
  • Some rhodopsins found on scaffolds of organisms
    previously unknown to contain them

16
Rhodopsin-like Sequences
  • Identified 13 subfamilies of rhodopsin-like genes
  • Four families of proteins from cultured organisms
    and nine families from uncultured organisms
  • Expression levels of these genes are unknown

17
Problems
  • Large data dump in NCBI angered some
  • Unclear how effective filtering was (some
    apparent eukaryotic DNA found)
  • Some questioning of how samples were collected
  • Had some trouble getting permits from countries
    to collect ocean water samples

18
Our Plans
  • 8.3 million sequences to analyze
  • Sequences clustered by CD-HIT
  • Run Superfamily HMMs on data
  • Look for FSFs that are unique to particular
    superkingdoms (as found by Song)
  • Look for FSFs that are not present in the three
    superkingdoms
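
A toy Python sketch of greedy incremental clustering, the general strategy CD-HIT follows: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it resembles closely enough or founds a new cluster. This is not CD-HIT itself. The similarity measure (`difflib.SequenceMatcher.ratio` from the standard library) is a stand-in assumption for CD-HIT's word filtering and alignment-based identity, and the threshold and example sequences are made up.

```python
from difflib import SequenceMatcher


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering; returns [(representative, members)]."""
    clusters = []
    # Longest sequences first, so representatives are the longest members.
    for s in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            similarity = SequenceMatcher(None, rep, s, autojunk=False).ratio()
            if similarity >= threshold:
                members.append(s)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((s, [s]))
    return clusters


# Made-up protein fragments (assumption).
seqs = [
    "MKTAYIAKQRQISFVKSHFSR",
    "GGGGGGGGWWWWWWWW",
    "MKTAYIAKQRQISFVKSHFSRQ",
]
for rep, members in greedy_cluster(seqs):
    print(rep, len(members))
```

Clustering first reduces the number of sequences that have to be searched in later steps, since only one representative per cluster needs to be analyzed.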