Mo17 shotgun project - PowerPoint PPT Presentation

About This Presentation
Title:

Mo17 shotgun project

Slides: 17
Provided by: maiz6
Learn more at: https://www.maizegdb.org
Transcript and Presenter's Notes


1
Mo17 shotgun project
  • Goal: sequence Mo17 gene space with inexpensive
    new technologies
  • Datasets in progress:
  • Four phases of 454 FLX sequencing, to a maximum of 12X
  • Include 3 kb paired-end sequencing (for
    short-range structural variation)
  • Ultra-short-read Solexa or ABI SOLiD (for
    polishing)
  • Preparation of methyl-spanning linkers to augment
    IBM map integration, detect rearrangements
    (Sanger end-sequence)
  • (Ideally would add Mo17 BAC-ends from DuPont, if
    available)

2
Shotgun
  • Independent of tiling path
  • -Can detect non-repetitive gene space even
    within otherwise complex regions that may not be
    in tiling path
  • Disadvantages of short reads
  • -Can't expect to recover repetitive sequences

3
Four Phases of Sequencing Complete in 2007
  • Sequencing contract established with 454/Roche.
    Four phases, including collaborative runs at no
    cost in Phases II-IV.
  • Phase I underway (30 FLX runs): library QC and
    initial assessment of data quality.
  • 10 FLX runs totaling 1 Gb (0.4X)
  • 20 FLX-pair runs spanning 12 Gb (5X span in 3 kb
    inserts)
  • Assess quality, coverage, contamination,
    chimerism, accuracy
  • Phase II (80 runs plus 30 runs from Roche, total
    110 runs): rough draft stage.
  • 40 FLX-pair runs spanning 36 Gb (total 48 Gb, 10X
    span)
  • 70 FLX runs for 7 Gb (total 8 Gb, 3.5X sequence)
  • Assess rough draft assembly (3 methods), compare
    B73, sorghum

4
Phases III and IV
  • Phase III (50 runs plus 20 contributed)
  • 20 FLX-pair runs (total spanning cover 20X)
  • 50 FLX runs (total 13 Gb sequence, 5.5X)
  • Draft assembly. Rough annotation. Assessment
    of structural variation based on 20X clone cover.
    Assessment complete by end of 2007.
  • Phase IV (60 runs plus 30 contributed)
  • 90 FLX runs (to reach total 22 Gb, 10X)
  • Data collection complete by end of 2007.
  • Early '08: final assembly. Integration with MSSL
    ends and IBM map. Proceed to annotation and full
    analysis.
  • Note: later phases may use the next FLX release
    with longer read lengths. To be conservative,
    sequence from FLX-pair reads is not included in
    sequence coverage estimates.
  • Total sequencing cost for Phases I-IV: 1.6M
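
The run counts and cumulative totals on the two phase slides can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-run yield of 0.1 Gb is an assumption inferred from "10 FLX runs totaling 1 Gb", and the genome size is not stated on the slides, so it is back-calculated from each stated total and fold coverage.

```python
# Shotgun (non-paired) FLX runs per phase, as listed on the slides.
SHOTGUN_RUNS = {"I": 10, "II": 70, "III": 50, "IV": 90}

# Cumulative totals stated on the slides: (Gb of sequence, fold coverage).
STATED = {"I": (1.0, 0.4), "II": (8.0, 3.5), "III": (13.0, 5.5), "IV": (22.0, 10.0)}

# Assumption: yield per run, inferred from "10 FLX runs totaling 1 Gb".
GB_PER_RUN = 0.1


def cumulative_gb(runs, gb_per_run=GB_PER_RUN):
    """Return the running total of sequence (Gb) after each phase."""
    total = 0.0
    out = {}
    for phase, n_runs in runs.items():
        total += n_runs * gb_per_run
        out[phase] = round(total, 3)
    return out


totals = cumulative_gb(SHOTGUN_RUNS)
for phase, gb in totals.items():
    stated_gb, fold = STATED[phase]
    implied_genome = stated_gb / fold  # Gb; not a figure from the slides
    print(f"Phase {phase}: {gb:.1f} Gb computed, {stated_gb:.0f} Gb stated, "
          f"{fold}X, implied genome size ~{implied_genome:.2f} Gb")
```

Under that assumed yield, the computed running totals match the stated 1, 8, 13 and 22 Gb.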

5
454 FLX reads are typically either mostly masked
or mostly clean
29% of reads have < 1/4 of positions masked
58% of reads have > 2/3 of positions masked
[Figure: distribution of reads; x-axis: percent masked by over-represented 16-mers (ticks 0, 0.5, 1.0)]
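
One way to picture masking by over-represented 16-mers is sketched below. It is a minimal illustration, not the project's pipeline: the function names, the count threshold `max_count`, and the use of the read set itself as the source of k-mer counts are all assumptions. Only k = 16 and the 1/4 and 2/3 cut-offs come from the slide.

```python
from collections import Counter


def kmer_counts(reads, k=16):
    """Count every k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def masked_fraction(read, counts, k=16, max_count=2):
    """Fraction of positions covered by a k-mer seen more than max_count times.

    max_count is a hypothetical threshold for "over-represented".
    """
    if not read:
        return 0.0
    masked = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True
    return sum(masked) / len(read)


def classify(fraction):
    """Bin a read using the cut-offs quoted on the slide."""
    if fraction < 0.25:
        return "mostly clean"
    if fraction > 2 / 3:
        return "mostly masked"
    return "intermediate"
```

For example, with a toy k of 4, `masked_fraction("AAAAAAAA", kmer_counts(["AAAAAAAA", "AAAAAAAA", "ACGTTGCA"], k=4), k=4)` masks every position of the repetitive read, while the unique read `"ACGTTGCA"` stays unmasked.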
6
Mo17 454 unique full-length alignments vs. B73
MAGIs show high quality of unique alignments
[Figure labels: "Residual repeats in MAGIs with multiple hits in 454 data"; "Unique full alignments"]
7
SNPs and indels of 454 reads relative to MAGIs are
consistent with a few percent variation between Mo17 and B73
(combines variation with sequencing errors)
[Figure: x-axis: SNPs or indels per base; y-axis: frequency of reads]
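
The per-read quantity on the plot's x-axis can be sketched as follows. This is a minimal illustration assuming each read has already been aligned to its MAGI and is supplied as two equal-length gapped strings; the function name and the "-" gap convention are assumptions, not something the slides specify. As the slide notes, a count like this cannot separate true Mo17/B73 variation from sequencing error.

```python
def diffs_per_base(aligned_read, aligned_ref):
    """SNPs plus indel columns per read base, from one gapped pairwise alignment.

    Both strings must have the same length, with "-" marking a gap.
    Each mismatching or gapped column counts as one difference.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")
    diffs = 0
    read_bases = 0
    for read_char, ref_char in zip(aligned_read, aligned_ref):
        if read_char != "-":
            read_bases += 1
        if read_char != ref_char:
            diffs += 1
    return diffs / read_bases if read_bases else 0.0


# One substitution and one single-base gap over five read bases.
rate = diffs_per_base("ACGT-A", "ACCTGA")
```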
8
Multiple alternate assembly plans
  • Divide and conquer
  • Reduce 100 million reads to 50K unique gene
    spaces of thousands of reads each (10 kb) by
    clustering based on various comparisons
  • Plan A: de novo clustering of masked reads
  • Plan B: map to B73, assemble (de novo for
    remainder)
  • Plan C: sorghum-assisted
  • Use various assemblers to lay out and produce
    consensus for each cluster (454 assembly team
    engaged)
  • Polish sequence with Solexa or SOLiD for
    accuracy
  • Link with MSSL pairs, integrate with map
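
The divide-and-conquer step of Plan B could look roughly like the sketch below: reads placed on B73 are grouped into fixed 10 kb bins, and unplaced reads are set aside for de novo handling. The input format (a dict from read ID to a `(chromosome, position)` tuple, or `None` for unplaced reads) and the fixed-bin scheme are assumptions made for illustration; the slides do not say how clusters are delimited.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10 kb, the cluster size mentioned on the slide


def bin_reads(placements, bin_size=BIN_SIZE):
    """Group placed reads by reference bin; collect unplaced reads separately.

    placements: hypothetical mapping of read ID -> (chrom, pos) or None.
    Returns (clusters, remainder).
    """
    clusters = defaultdict(list)
    remainder = []
    for read_id, hit in placements.items():
        if hit is None:
            remainder.append(read_id)  # left for de novo assembly
        else:
            chrom, pos = hit
            clusters[(chrom, pos // bin_size)].append(read_id)
    return dict(clusters), remainder


clusters, remainder = bin_reads({
    "r1": ("chr1", 1_500),
    "r2": ("chr1", 9_999),
    "r3": ("chr1", 10_000),
    "r4": None,
})
```

Each resulting cluster would then be handed to an assembler on its own, which is what keeps the per-cluster problem down to thousands of reads.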

9
Backup analyses vs. B73 reference
  • SNP/variation detection by alignment to B73
    sequence
  • -454/Solexa/SOLiD (various successful models in
    other species at JGI, elsewhere)
  • Structural variation detection via paired-end
    placements
  • -Needs to be tolerant of chimerism rate
  • -Model of successful human structural analysis
    done with 454 (unpublished)
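
Paired-end detection that tolerates chimeric pairs can be sketched as below. It is an illustration under stated assumptions: the 3 kb expected insert comes from the slides, but the tolerance, window size, minimum support, and tuple format are hypothetical, and read orientation is ignored. The idea is that a single discordant pair may be a chimera, so a candidate is reported only when several independent pairs agree.

```python
from collections import Counter

EXPECTED_INSERT = 3_000  # 3 kb paired-end library, per the slides
TOLERANCE = 1_500        # hypothetical allowed deviation
WINDOW = 5_000           # hypothetical bin size for grouping pair ends
MIN_SUPPORT = 3          # hypothetical number of agreeing pairs required


def classify_pair(chrom_a, pos_a, chrom_b, pos_b,
                  expected=EXPECTED_INSERT, tol=TOLERANCE):
    """Label one read pair by how its placement compares to the insert size."""
    if chrom_a != chrom_b:
        return "interchromosomal"
    span = abs(pos_b - pos_a)
    if span > expected + tol:
        return "too_far"
    if span < expected - tol:
        return "too_close"
    return "concordant"


def candidate_variants(pairs, min_support=MIN_SUPPORT, window=WINDOW):
    """Keep discordant signatures backed by at least min_support pairs."""
    support = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        kind = classify_pair(chrom_a, pos_a, chrom_b, pos_b)
        if kind != "concordant":
            key = (kind, chrom_a, pos_a // window, chrom_b, pos_b // window)
            support[key] += 1
    return {key: n for key, n in support.items() if n >= min_support}


pairs = [
    ("chr1", 1_000, "chr1", 21_000),
    ("chr1", 1_200, "chr1", 21_500),
    ("chr1", 1_400, "chr1", 22_000),
    ("chr1", 50_000, "chr2", 700),   # lone discordant pair, possibly chimeric
    ("chr1", 100, "chr1", 3_100),    # concordant
]
calls = candidate_variants(pairs)
```

Here the three pairs that map much farther apart on the reference than the insert size support one candidate (a pattern consistent with a deletion relative to the reference), while the single interchromosomal pair is dropped for lack of support.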

10
Timeline
  • Phase I in progress, complete by end of month.
    Analysis to OK Phase II: 10 days.
  • Phase II: October
  • Phase III: November
  • Phase IV: December
  • 454 sequencing complete by end of year

11
58% of each BAC is masked by over-represented
16-mers
12
Outreach: Dick McCombie
13
Types of Outreach
  • Public presentations
  • Collaborations
  • CSHL DNA Learning Center

14
Public Presentations
15
Collaborations
  • The Maize Genetics and Genomics Database:
    letter for Carolyn Lawrence (MaizeGDB)
  • MaizeGDB: web site text, links to data
  • Gramene
  • EBI Ensembl
  • Affymetrix Maize Pilot Expression Array Project
  • Optical map
  • TWINSCAN
  • Vmatch
  • Full-Length cDNA Project

16
CSHL DNA Learning Center
http://www.dnalc.org/maize/maize.html