Lecture 2: DNA sequencing - PowerPoint PPT Presentation

About This Presentation
Title:

Lecture 2: DNA sequencing

Description:

Cloning for sequence 2-200kb. PCR for sequence several thousand bps. Cloning. PCR ... Merge the clones together (with additional information) to get the ...

Slides: 22
Provided by: bin1
Learn more at: http://www.csd.uwo.ca
Transcript and Presenter's Notes


1
Lecture 2: DNA sequencing
2
Amplifying DNA sequences
  • Cloning: for sequences of 2-200 kb
  • PCR: for sequences of several thousand bp

3
Cloning
4
PCR
5
PCR
6
PCR
7
PCR
8
Sequencing 1
9
Sequencing 2
10
Sequencing 3
11
Sequence Assembly
  • Direct sequence-reading techniques can only read
    sequences of up to 10k bases.
  • To obtain long sequences, we need computers to
    assemble the short reads.

12
Shortest common superstring
  • Let s1, ..., sn be n strings. Find a common
    superstring s that contains every si as a
    substring, such that the length of s is minimized.
  • Example
  • alf, ate, half, lethal, alpha, alfalfa ->
    lethalphalfalfate
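
The definition above can be checked by brute force on small inputs. The following is a minimal sketch, not an assembler: the function names `overlap` and `shortest_common_superstring` are made up for illustration, and the search tries every ordering of the strings, so it is only usable for a handful of short strings.

```python
from itertools import permutations

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def shortest_common_superstring(strings):
    """Exhaustive search: try every ordering, merging neighbours maximally."""
    unique = sorted(set(strings))
    # A string contained in another one is covered for free, so drop it.
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best

words = ["alf", "ate", "half", "lethal", "alpha", "alfalfa"]
s = shortest_common_superstring(words)
print(s, len(s))  # one shortest superstring; its length is 17
```

Several different superstrings of length 17 exist for this input, so the sketch may return a string other than the one on the slide.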

13
The use of SCSS in assembly
14
Greedy approach
  • Find the two strings with the maximum overlap and
    merge them into one.
  • Repeat until only one string is left.
  • Example
  • alf, ate, half, lethal, alpha, alfalfa
  • delete alf (it is a substring of alfalfa)
  • half + alfalfa -> halfalfa
  • lethal + halfalfa -> lethalfalfa
  • alpha + ate -> alphate
  • lethalfalfa + alphate -> lethalfalfalphate
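
The greedy steps above can be sketched in a few lines of Python. This is an illustrative version only: the names `overlap` and `greedy_superstring` are hypothetical, and ties between equally good overlaps are broken by list order, which is an assumption and not something the slides specify.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Repeatedly merge the pair of strings with the largest overlap."""
    pool = list(dict.fromkeys(strings))  # drop duplicates, keep input order
    # Drop strings that are substrings of another string (e.g. alf).
    pool = [s for s in pool if not any(s != t and s in t for t in pool)]
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:  # first maximal pair wins ties
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""

words = ["alf", "ate", "half", "lethal", "alpha", "alfalfa"]
print(greedy_superstring(words))  # lethalfalfalphate
```

With this tie-breaking rule the merges happen in the same order as in the example: half + alfalfa, then lethal + halfalfa, then alpha + ate, then the final merge.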

15
Greedy does not give an optimal solution
  • However, Blum et al. (1994, JACM) proved the
    following result.
  • If the shortest common superstring has length L,
    then greedy gives a common superstring of
    length at most 4L.
  • We say the greedy algorithm is a ratio-4
    approximation to the SCSS problem.

16
SCSS is NP-hard
  • The shortest common superstring problem was
    proved to be NP-hard.
  • Much effort has gone into improving the
    approximation ratio: 2.89, 2.81, 2.79, 2.75, 2.66
  • Ratio 2? An open question for 10 years.
  • However, for random cases, greedy works very
    well.
  • Unfortunately, for DNA assembly, real cases are
    not random.
  • Genomes contain many repeats, so assembly cannot
    be solved as SCSS alone.
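
A small example may help show why repeats are a problem for the SCSS formulation. The sketch below uses a made-up 13-base "genome" with a tandem repeat and idealised, error-free reads (every length-5 substring); the names `kmer_reads` and `is_common_superstring` are hypothetical. It shows that a shorter string with one repeat copy missing still contains every read, so minimizing length favours the wrong answer.

```python
def kmer_reads(genome, k):
    """All distinct length-k substrings of genome (idealised reads)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}

def is_common_superstring(candidate, reads):
    """True if every read occurs somewhere in candidate."""
    return all(read in candidate for read in reads)

genome = "ACG" * 4 + "T"     # ACGACGACGACGT, 13 bases, four repeat copies
reads = kmer_reads(genome, 5)
collapsed = "ACG" * 3 + "T"  # ACGACGACGT, 10 bases, one copy dropped

print(sorted(reads))
# ['ACGAC', 'CGACG', 'GACGA', 'GACGT']
print(is_common_superstring(collapsed, reads), len(collapsed), len(genome))
# True 10 13
```

The reads alone cannot tell three repeat copies from four, which is one reason real assemblers rely on the additional information described in the following slides.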

17
Whole Genome Shotgun
  • The real application is more complicated than
    SCSS.
  • WGS (Whole Genome Shotgun)
  • Randomly clone short segments of the genome.
  • Sequence both ends of the clones
  • Merge the clones together (with additional
    information) to get the draft genome
  • Examine the gaps with other techniques

18
WGS1: Generate Reads
  • Libraries with fragment sizes of 2, 4, 6, 10, 12
    and 40 kb were produced.
  • Individual clones were chosen from each library
    and both ends of each clone were sequenced.
  • The two end reads of a clone are called mate
    pairs.
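
The read-generation step can be simulated on a toy scale. This is a sketch under simplifying assumptions that are not in the slides: the function name `sample_mate_pairs` and its parameters are hypothetical, clone start positions are drawn uniformly at random, reads are error-free, and both reads are reported on the same strand (real mate pairs come from opposite strands).

```python
import random

def sample_mate_pairs(genome, insert_size, read_len, n_clones, seed=0):
    """Pick random clones of a fixed size and read both of their ends."""
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pairs = []
    for _ in range(n_clones):
        start = rng.randrange(len(genome) - insert_size + 1)
        clone = genome[start:start + insert_size]
        # Keep the clone size with the pair: it says how far apart
        # the two reads should lie in the assembled sequence.
        pairs.append((clone[:read_len], clone[-read_len:], insert_size))
    return pairs

toy_genome = "".join(random.Random(1).choice("ACGT") for _ in range(200))
for left, right, size in sample_mate_pairs(toy_genome, 50, 10, 3):
    print(left, right, size)
```

The known distance between the two reads of a pair is the "additional information" that later helps to order and orient contigs.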

19
WGS2: Contig Assembly
  • When a sequence overlap is detected at the ends
    of two reads, the reads can be merged into a
    single sequence (a contig).
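
The merge of two overlapping reads can be sketched as follows. The function name `merge_if_overlapping` and the `min_overlap` threshold are assumptions made for illustration; the sketch also assumes exact matches, whereas real reads contain sequencing errors and need approximate alignment.

```python
def merge_if_overlapping(left, right, min_overlap=3):
    """Merge two reads if a suffix of left equals a prefix of right.

    Returns the merged sequence, or None if the longest exact overlap
    is shorter than min_overlap.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

print(merge_if_overlapping("ACGTTGCA", "TGCAGGTC", min_overlap=4))
# ACGTTGCAGGTC
print(merge_if_overlapping("ACGT", "GGGG"))
# None
```

Requiring a minimum overlap length reduces the chance of merging two reads that share only a few bases by coincidence.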

20
WGS3: Building Supercontigs
  • Also called scaffolds.

21
WGS4: Placing the assembly on the genome
  • A sequence tag is a short sequence that is unique
    within the whole genome.
  • A genetic map contains many sequence tags and
    their locations.
  • Align the supercontigs to the genome according
    to the tags.
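
Placing supercontigs by their tags can be illustrated with a toy lookup. Everything in this sketch is hypothetical: the function name `place_supercontigs`, the example sequences, and the tag coordinates are invented, map locations are treated as simple base coordinates, only the forward strand is searched, and only the first tag found in each supercontig is used.

```python
def place_supercontigs(supercontigs, tag_positions):
    """Estimate where each supercontig starts on the genome.

    supercontigs:  dict mapping a name to its sequence
    tag_positions: dict mapping a tag sequence to its genome coordinate
    """
    placements = []
    for name, seq in supercontigs.items():
        for tag, genome_pos in tag_positions.items():
            offset = seq.find(tag)  # -1 if the tag is not present
            if offset != -1:
                # The supercontig starts `offset` bases before the tag.
                placements.append((genome_pos - offset, name, tag))
                break
    return sorted(placements)  # ordered along the genome

supercontigs = {"sc2": "TTGACCGGAT", "sc1": "AACCTAGGCA"}
tag_positions = {"CTAGG": 103, "GACCG": 502}
print(place_supercontigs(supercontigs, tag_positions))
# [(100, 'sc1', 'CTAGG'), (500, 'sc2', 'GACCG')]
```

Because each tag is unique within the genome, a single hit is enough to anchor a supercontig in this simplified setting.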