Genome Sequence determination - PowerPoint PPT Presentation

Slides: 88
Provided by: chungyu
Transcript and Presenter's Notes

1
Genome Sequence determination
E-mail: cychen_at_cycu.edu.tw
Web site: www.cychen.idv.tw
2
Complete Microbial Genomes
3
Genome what now?
  • Sequencing is
  • Determining the full nucleotide sequence of one
    strain of an organism
  • Making predictions of genes within that sequence
  • Predicting the function of those genes
  • HARD!!!!
  • Sequencing requires
  • Time
  • Money
  • People
  • Computers

4
Genome what now?
  • Before Sequencing
  • Nature of an organism
  • Genetic code
  • Genome size
  • Genome structure
  • Sequencing means
  • Bioinformatics
  • Functional assays
  • More

5
Organism Selection
Library Creation
6
Organism Selection
Library Creation
Sequencing
7
Organism Selection
Library Creation
Sequencing
Assembly
10
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
12
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
13
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
14
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which steps are computationally expensive?
16
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which steps have not already been exceptionally well studied?
18
Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which step has not been subjected to a variety of approaches?
20
Organism Selection
  • Nature of an organism: Pathogen?
  • Genetic code
  • Genome size
  • Genome structure
21
Vibrio vulnificus
Strain: YJ016
Genome size: 5.2 Mb
Source: Southern Taiwan
Significance: Virulence
Strategy: Whole genome shotgun sequencing
Coverage: 10X
22
Organism Selection
  • Nature of an organism: Pathogen?
  • Genetic code: Special code?
  • Genome size
  • Genome structure
23
Genetic Code Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
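A minimal sketch of why the choice of genetic code table matters. It assumes the NCBI layout of one amino-acid letter per codon with bases ordered T, C, A, G. The two table strings are the standard code (table 1) and the Mycoplasma/Spiroplasma code (table 4), in which TGA encodes tryptophan instead of stop. The toy ORF is a hypothetical example, not data from the slides.

```python
from itertools import product

BASES = "TCAG"  # base order used in the NCBI genetic code tables
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# One letter per codon, '*' marks a stop codon.
TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # standard
TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # Mycoplasma


def translate(dna: str, table: str) -> str:
    """Translate a DNA string codon by codon with the given table."""
    code = dict(zip(CODONS, table))
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


toy_orf = "ATGTGAGGTTAA"  # hypothetical sequence containing an internal TGA
print(translate(toy_orf, TABLE_1))  # M*G*  (TGA read as stop)
print(translate(toy_orf, TABLE_4))  # MWG*  (TGA read as Trp)
```

Using the wrong table during annotation would truncate every gene that contains an in-frame TGA, which is why the code has to be settled before sequencing starts.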
24
Organism Selection
  • Nature of an organism: Pathogen?
  • Genetic code: Special code?
  • Genome size: How many megabases?
  • Genome structure
25
Organism Selection
  • Nature of an organism: Pathogen?
  • Genetic code: Special code?
  • Genome size: How many megabases?
  • Genome structure: Linear or circular chromosome? How many?
26
How to sequence a complete genome?
  • Sizes of bacterial genomes vary from 0.6 Mb
    (Mycoplasma genitalium) to 13 Mb (Myxobacteria)
  • The read length of a DNA sequencing reaction is
    only about 600 bp (0.0006 Mb), so a subdivision
    of the genome is obviously necessary
  • Once the genome is subdivided into pieces small
    enough to sequence, the individual fragments
    need to be put back into their "native" order
  • Overlaps between the fragments are therefore
    necessary in order to re-assemble the pieces
  • There are two main sequencing strategies
  • 1. Whole genome shotgun sequencing
  • 2. Ordered shotgun sequencing
27
c: coverage
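The formula on this slide did not survive in the transcript. As a hedged sketch, the usual definition is c = N x L / G (number of reads times read length divided by genome size), and under the standard Poisson (Lander-Waterman) approximation the fraction of the genome left unsequenced is about e^(-c). The genome size (5.2 Mb), read length (600 bp) and target coverage (10X) are taken from other slides; the function names are illustrative.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average coverage c = N * L / G."""
    return n_reads * read_len / genome_len


def reads_needed(c: float, read_len: int, genome_len: int) -> int:
    """Smallest number of reads that reaches coverage c."""
    return math.ceil(c * genome_len / read_len)


def fraction_unsequenced(c: float) -> float:
    """Poisson approximation: probability that a base is hit by no read."""
    return math.exp(-c)


genome_len = 5_200_000  # V. vulnificus YJ016, from the slides
read_len = 600          # read length quoted in the slides

n = reads_needed(10, read_len, genome_len)
print(n)  # 86667 reads for 10X
print(fraction_unsequenced(10) * genome_len)  # expected number of uncovered bases
```

The approximation ignores cloning bias and repeats, so real projects leave more gaps than this estimate suggests.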
28
  • Two ends are overlapped
  • Non overlapped
  • Plasmid percentage in contigs

30
Library Creation
  • Teamwork
  • Quality control (QC)
  • Time Table
  • Budget
  • Paper

31
Standard Operating Procedures of a Genome Project
A. Decision (Protocols 1-2): mapping, QC, PCR confirm
B. Library (Protocols 3-5): DNA purification, PFG, FISH, QC, PCR confirm, shotgun library, picking, print labels
C. Sequencing (Protocols 6-10): QC, plasmid DNA, sequencing reactions (dye primers, dye terminator), gel running, 377, 3700
D. Finish (Protocols 11-12): assemble, annotation
32
Library (1)
Random Shearing of Genomic DNA
  • 1. Restriction enzyme
  • Sau3AI (GATC): affected by CG methylase
  • MboI (GATC): affected by dam methylase, not
    affected by CG methylase
  • 2. Sonication
  • Sonication, Bal31, repair with T4 DNA polymerase,
    sizing, recovery, ligation
  • 3. GeneMachine: easy sizing by filter
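To illustrate the restriction-enzyme option, here is a small sketch that locates GATC sites (the recognition sequence of both Sau3AI and MboI) and lists the fragment lengths of a complete digest of a linear molecule. It assumes cleavage immediately before the G of GATC and ignores methylation. Library construction normally relies on partial digestion, so real fragments are longer than the ones computed here. The random sequence is a hypothetical stand-in for genomic DNA.

```python
import random

SITE = "GATC"  # recognition sequence shared by Sau3AI and MboI


def cut_positions(seq: str, site: str = SITE) -> list[int]:
    """Return the 0-based start index of every occurrence of the site."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def fragment_lengths(seq: str, site: str = SITE) -> list[int]:
    """Fragment lengths after cutting a linear sequence before each site."""
    bounds = [0] + cut_positions(seq, site) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


print(fragment_lengths("AAGATCTTTGATCC"))  # [2, 7, 5]

# Hypothetical random 100 kb sequence with equal base frequencies.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(100_000))
frags = fragment_lengths(genome)
print(len(frags), sum(frags) / len(frags))
```

For a 4-base recognition site and equal base frequencies, sites are expected roughly every 4^4 = 256 bp, which is why a complete digest is far too fine for 2 to 30 kb inserts.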

34
Library (2) Library clones, sequencing clones
Chromosome I: 3.3 Mb
Chromosome II: 1.8 Mb
Shotgun libraries (each sequenced from both ends)
  • Library 1: 2.5-3.5 kb inserts, 7X coverage
  • Library 2: 5.5-7.5 kb inserts, 3X coverage
  • Library 3: 30 kb inserts, cosmid library, 10X clone
    coverage, 0.4X sequence coverage
Assemble the reads using the phred/phrap/consed
software
Contig 1, Contig 2, Contig 3
Close the gaps by primer walking, PCR or
re-sequencing
Annotation
35
Library (2) Library clones, sequencing clones
  • Genome: 5,000,000 bp, at 1000 bp per clone
  • 5,000,000 / 1000 = 5000 clones, about 52 x 96-well
    plates
  • 10 x redundancy: 52 x 10 = 520 x 96-well plates
    (library clones)
  • Both ends sequenced: 2 x 52 x 10 = 1040, about
    1000 x 96-well plates (sequencing clones)
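The plate arithmetic on this slide can be reproduced with a few lines. The numbers are the ones given on the slide; the function and parameter names are illustrative.

```python
def plates_required(genome_bp: int = 5_000_000,
                    bp_per_clone: int = 1_000,
                    wells_per_plate: int = 96,
                    redundancy: int = 10,
                    ends_per_clone: int = 2) -> dict:
    """Reproduce the slide's estimate of library and sequencing plates."""
    clones_1x = genome_bp // bp_per_clone            # clones for 1x
    plates_1x = round(clones_1x / wells_per_plate)   # 96-well plates for 1x
    library_plates = plates_1x * redundancy          # plates of library clones
    sequencing_plates = library_plates * ends_per_clone  # one plate per end read
    return {
        "clones_1x": clones_1x,
        "plates_1x": plates_1x,
        "library_plates": library_plates,
        "sequencing_plates": sequencing_plates,
    }


print(plates_required())
# {'clones_1x': 5000, 'plates_1x': 52, 'library_plates': 520, 'sequencing_plates': 1040}
```

The slide rounds the final 1040 plates down to about 1000 for the scheduling estimate.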
36
Sequencing (1) Time table
1. Throughput per instrument
  • 377: 2 runs per day (one run for one 96-well
    plate)
  • 3700: 6 runs per day (POP6) or 8 runs per day
    (POP5)
  • 3730: 12 runs per day

2. 377 x 2 sets = 4 runs per day; 3700 x 2 sets =
6 x 1 + 8 x 1 = 14 runs per day; total 18 runs
per day
3. 1000 plates / 18 = about 56 days (11 weeks at 5
working days per week, about 3 months)
4. Today, 3730 x 4 sets = 48 runs per day;
1000 plates / 48 = about 21 days
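The time estimate is a simple throughput calculation. The sketch below uses the runs-per-day figures from this slide and treats one run as one 96-well plate; the dictionary keys and function name are illustrative.

```python
# Runs per day per instrument, as quoted on the slide.
RUNS_PER_DAY = {"377": 2, "3700-POP6": 6, "3700-POP5": 8, "3730": 12}


def days_needed(plates: int, fleet: dict) -> float:
    """Days to run all plates, given how many of each instrument are available."""
    runs_per_day = sum(RUNS_PER_DAY[model] * count for model, count in fleet.items())
    return plates / runs_per_day


old_fleet = {"377": 2, "3700-POP6": 1, "3700-POP5": 1}  # 4 + 6 + 8 = 18 runs per day
new_fleet = {"3730": 4}                                  # 48 runs per day

print(round(days_needed(1000, old_fleet), 1))  # 55.6
print(round(days_needed(1000, new_fleet), 1))  # 20.8
```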
37
Sequencing (2) Cost
38
ABI 377
ABI 3700
MegaBACE 4000
ABI 3730XL
39
The automated production line for sample
preparation at the Whitehead Institute, Center
for Genome Research. The system consists of
custom-designed factory-style conveyor belt
robots that perform all functions from purifying
DNA from bacterial cultures through setting up
and purifying sequencing reactions.
40
Reads vs. Assembled Contigs
41
Reads and Assembled Size
42
How does assembly software work?
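The slides that answered this question have no transcript. As a rough illustration of the overlap idea only, here is a toy greedy assembler that repeatedly merges the two sequences with the longest exact suffix-prefix overlap. This is not how phrap works internally: real assemblers handle sequencing errors, quality values, reverse complements and repeats. The reads are a hypothetical example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap of min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGTA", "GCGTACCT", "ACCTTAGGC"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAGGC']
```

A read that overlaps nothing stays as its own contig, which is the situation the gap closure step has to resolve.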
47
What is Gap Closure?
  • What are gaps? Unsequenced regions located
    between the assembly-generated fragments of
    contiguous sequence (contigs)
  • What causes gaps? Host toxicity, secondary
    structure, and other causes
  • Gap closure: producing, purifying, and
    sequencing, or locating, the missing regions of
    DNA

48
How Can I Close Gaps?
  • Genome walking: blind PCR extension of contigs
  • Multiplex PCR: combinatorial trial of every
    contig pair
  • Read pair analysis: use information stored by
    the assembler to suggest alignments, then PCR
  • Comparative alignment
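To show why the combinatorial approach becomes expensive, the sketch below enumerates every pairing of contig ends that could flank a gap. It assumes each contig has two ends and that an end is never paired with the other end of the same contig; the contig names are hypothetical.

```python
from itertools import combinations


def candidate_end_pairs(contigs: list[str]) -> list[tuple]:
    """All pairs of contig ends that belong to two different contigs."""
    ends = [(name, side) for name in contigs for side in ("left", "right")]
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


pairs = candidate_end_pairs(["contig1", "contig2", "contig3"])
print(len(pairs))  # 12

# The number of pairings grows as 2n(n-1) for n contigs.
for n in (10, 50, 100):
    names = [f"contig{i}" for i in range(n)]
    print(n, len(candidate_end_pairs(names)))
```

Read pair analysis and comparative alignment are ways of shrinking this list before any PCR is run.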

49
Comparative Alignment (the Bioinformatics Approach)
  • Find locations where contigs are homologous to
    known sequences
  • Determine if any contigs share homology in the
    same region of the same sequence
  • Design primers
  • Conduct PCR with those primers
  • Sequence that product and use that sequence to
    close the gap

50
Blast Organism X(cross) - Comparison
  • Compares contig ends to NCBI nr database with
    BlastN
  • Parses all hits and finds biologically possible
    contig pairs
  • Using the flanking sequence and Primer3, designs
    primers that will produce a PCR product spanning
    that gap
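The pairing step can be sketched as follows, assuming the BLAST hits for the contig ends have already been parsed into simple records. The Hit fields, the max_gap threshold and the example hits are all hypothetical; the slides do not say how the tool decides that a pair is "biologically possible", so a distance cutoff on the same subject sequence is used here as a stand-in.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    contig: str    # query contig whose end produced the hit
    subject: str   # database sequence that was hit
    s_start: int   # hit start on the subject (s_start < s_end assumed)
    s_end: int     # hit end on the subject


def gap_candidates(hits: list[Hit], max_gap: int = 5000) -> list[tuple]:
    """Pairs of different contigs that hit the same subject close together."""
    ordered = sorted(hits, key=lambda h: (h.subject, h.s_start))
    found = []
    for a, b in combinations(ordered, 2):
        if a.contig == b.contig or a.subject != b.subject:
            continue
        gap = b.s_start - a.s_end  # estimated gap size on the reference
        if 0 <= gap <= max_gap:
            found.append((a.contig, b.contig, a.subject, gap))
    return found


hits = [
    Hit("contig7", "refA", 1000, 1800),
    Hit("contig2", "refA", 2100, 2900),
    Hit("contig5", "refB", 50, 700),
]
print(gap_candidates(hits))  # [('contig7', 'contig2', 'refA', 300)]
```

Each candidate pair would then go to primer design and PCR, as the last bullet describes.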

51
Using the flanking sequence and Primer3, design
primers that produce a PCR product spanning that
gap
TTATGCTATCGAATTCCGACG
GTCTGCAGGTCTTCCGACGTAG
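Primer3 weighs many parameters when it picks primers. The sketch below is not Primer3; it only shows two basic checks that can be applied to the primers on this slide, the reverse complement and a rough melting temperature from the Wallace rule, Tm = 2(A+T) + 4(G+C). That rule is a coarse estimate intended for short oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


for primer in ("TTATGCTATCGAATTCCGACG", "GTCTGCAGGTCTTCCGACGTAG"):
    print(primer, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
# TTATGCTATCGAATTCCGACG 21 0.43 60
# GTCTGCAGGTCTTCCGACGTAG 22 0.59 70
```

A reverse primer is written as the reverse complement of the flanking sequence on the far side of the gap, so that the two primers point towards each other across it.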
54
Information to reduce gaps
  • The distance between the two end sequences of a clone
  • Cosmid anchors
  • Known genes
  • Compare with other genomes
  • Good luck

55
Finishing Standards
  • 1. General rules for finishing
  • Phase 1: draft sequence assembled in contigs
  • Phase 2: contigs ordered and linked
  • Phase 3: assembled as one contig with a low
    error rate (0.01%)
  • 2. Strategy of finishing
  • A. Primer walking
  • B. Re-sequencing individual clones
  • C. PCR and sequencing
  • D. Screening new clones
  • E. Subcloning
  • F. Deletion and sequencing
  • G. Changing the sequencing chemistry
  • H. Restriction map
  • I. End sequencing
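Error rates in the phred/phrap/consed toolchain are usually expressed as phred quality values, Q = -10 log10(p), where p is the probability that a base is wrong. The sketch below converts between the two. Reading the Phase 3 figure as 0.01% (one error per 10,000 bases) is an assumption, since the unit is not legible in the transcript.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality value Q for a per-base error probability p."""
    return -10 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Per-base error probability p for a phred quality value Q."""
    return 10 ** (-quality / 10)


print(round(phred_quality(0.0001)))  # 40, i.e. 1 error in 10,000 bases
print(error_probability(20))         # 0.01, i.e. 1 error in 100 bases
```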

56
Shotgun sequencing
  • Analogy: shredding several copies of Essential
    Cell Biology, then putting the text back together
    via overlapping phrases
  • Really only good for small genomes
  • 1995: used for the genome of Haemophilus
    influenzae
  • Problem: repetitive nucleotide sequences, which
    make up a large part of vertebrate genomes
    (analogy: repeated phrases like "the human
    genome" and the difficulties they cause)
57
Repetitive sequences make correct assembly
difficult
58
Multiple Genes
59
Timeline of large-scale genomic analyses. Shown
are selected components of work on several
non-vertebrate model organisms (red), the mouse
(blue) and the human (green) from 1990; earlier
projects are described in the text. SNPs, single
nucleotide polymorphisms; ESTs, expressed
sequence tags.
60
SCIENCE VOL. 277, p1453-1462, 1997
63
Timeline, 1998 to 2003
  • Set up genome center
  • NLBL mapped over 300 clones
  • YMGRC/NHRI
64
Vibrio vulnificus
Strain: YJ016
Genome size: 5.2 Mb
Source: Southern Taiwan
Significance: Virulence
Strategy: Whole genome shotgun sequencing
Coverage: 10X
65
http://genome.nhri.org.tw/vv/
66
Vibrio vulnificus
67
Global feature of the Vibrio vulnificus YJ016
genome
68
GC content of V. vulnificus Chromosomes 1 and 2
Chromosome 1
Chromosome 2
69
GC skew of V. vulnificus Chromosomes 1 and 2
Chromosome 1
Chromosome 2
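The plots themselves are not in the transcript. As a hedged sketch of how such curves are usually computed, GC content is (G+C)/length and GC skew is (G-C)/(G+C), each evaluated in windows along the chromosome. The window and step sizes and the toy sequence below are illustrative assumptions, not the settings used for the figures.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the window has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding(seq: str, window: int, step: int, fn) -> list[tuple]:
    """Apply fn to successive windows; returns (start position, value) pairs."""
    return [(i, fn(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


toy = "GGGGCCCC"  # hypothetical sequence with a skew switch in the middle
print(sliding(toy, 4, 4, gc_skew))     # [(0, 1.0), (4, -1.0)]
print(sliding(toy, 4, 4, gc_content))  # [(0, 1.0), (4, 1.0)]
```

In many bacterial chromosomes the sign of the GC skew changes near the replication origin and terminus, which is why skew plots are commonly shown alongside a finished genome.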
70
Comparison of the similarity between the
V. vulnificus and V. cholerae genomes
71
Circular representation of the Vibrio vulnificus
YJ016 genome
Chromosome 1: 3.3 Mb
Chromosome 2: 1.85 Mb
Plasmid pYJ016: 48.5 kb
72
Comparison of predicted genes of V. vulnificus
YJ016, V. cholerae El Tor N16961, and E. coli K12
75
Some more technological approaches (some of
which really work!)
  • Sequencing by hybridization (annealing)
  • Sequencing by ligase-edited annealing
  • Pyrosequencing
Note: there are also higher-tech versions of
classic Sanger sequencing in the works (see
http://www.helicosbio.com)
79
Several companies are pursuing massively
parallel (= cheaper) new DNA sequencing
strategies, including some that involve single
molecule analyses. Some of the main players are
given below.
  • 454 Life Sciences
    (http://www.454.com/enabling-technology/the-system.asp)
  • Solexa (now part of Illumina)
    (http://www.illumina.com/pages.ilmn?ID=203)
  • Helicos BioSciences (http://www.helicosbio.com)
  • VisiGen Biotechnologies
    (http://www.visigenbio.com/technology.html)
84
Solexa sequencing technology
87
Thank you