1
Velvet: Algorithms for De Novo Short Read Assembly
Using De Bruijn Graphs
  • March 12, 2008
  • Daniel R. Zerbino and Ewan Birney
  • Presenter: Seunghak Lee

2
What is a de Bruijn Graph?
  • A de Bruijn graph is a directed graph
  • An edge represents an overlap between sequences of
    symbols
  • $V = \{(s_1, s_2, \ldots, s_m)\}$
  • $E = \{((v_1, v_2, \ldots, v_n), (w_1, w_2, \ldots, w_n)) : v_2 = w_1, v_3 = w_2, \ldots, v_n = w_{n-1}\}$
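
The edge rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the vertices are all length-n sequences over a small alphabet; the function name `de_bruijn_graph` is hypothetical and not taken from the slides.

```python
from itertools import product

def de_bruijn_graph(symbols, n):
    """Return (vertices, edges) of the de Bruijn graph over `symbols`.

    Assumption: every length-n sequence of symbols is a vertex.
    An edge v -> w exists when v[1:] == w[:-1], i.e. v2=w1, ..., vn=w(n-1).
    """
    vertices = list(product(symbols, repeat=n))
    # Shifting v left by one symbol and appending any symbol gives a valid w.
    edges = [(v, v[1:] + (s,)) for v in vertices for s in symbols]
    return vertices, edges

vertices, edges = de_bruijn_graph("AB", 2)
```

With two symbols and n = 2 this gives 4 vertices and 8 edges, each edge satisfying the overlap condition.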

3
Introduction
  • New sequencing techniques are commercially
    available (e.g. 454 Sequencing, Solexa)
  • 454 Sequencing: 100-200 bp reads
  • Solexa: 30 bp reads
  • Algorithms for whole genome shotgun (WGS) assembly
    are not suitable for short reads
  • An overlap graph with a node per read is extremely
    large
  • Short reads lead to more ambiguous connections in
    the assembly

4
Introduction (cont)
  • The Euler assembler (Pevzner 2001) used k-mers as
    the nodes of a de Bruijn graph
  • Reads are mapped as paths through the de Bruijn
    graph
  • High redundancy does not affect the number of
    nodes
  • Velvet deals effectively with experimental
    errors and repeats by using de Bruijn graphs with
    k-mers
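
A small sketch of why redundancy does not inflate the graph: the node set is the set of distinct k-mers, so repeating the same reads adds no new nodes. The helper names `kmers` and `kmer_nodes` and the toy reads are illustrative assumptions.

```python
def kmers(seq, k):
    """List the k-mers of `seq` in order; consecutive ones overlap by k-1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_nodes(reads, k):
    """Collect the distinct k-mers seen in a set of reads."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    return nodes

reads = ["ACGTAC", "CGTACG", "GTACGT"]
low = kmer_nodes(reads, 4)        # each read once
high = kmer_nodes(reads * 10, 4)  # tenfold redundancy, same k-mers
```

Here `low` and `high` are the same set of four k-mers, even though the second call processes ten times as many reads.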

5
De Bruijn Graphs - structure
  • Structure

6
De Bruijn Graphs - structure (cont)
  • Adjacent k-mers overlap by k-1 nucleotides
  • Each node is attached to a twin node
  • The twin is the reverse series of reverse complement k-mers
  • This captures overlaps between reads from opposite strands
  • The union of a node and its twin node is called a
    block
  • Arcs: the last k-mer of an arc's origin overlaps with
    the first k-mer of its destination
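
The twin relationship can be illustrated as follows. This is a sketch that represents a node as a plain list of k-mers; the names `reverse_complement` and `twin` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def twin(node_kmers):
    """Reverse series of reverse-complement k-mers, as described above."""
    return [reverse_complement(kmer) for kmer in reversed(node_kmers)]

node = ["ACG", "CGG", "GGT"]  # spells ACGGT on the forward strand
twin_node = twin(node)        # spells ACCGT, the reverse complement of ACGGT
```

Applying `twin` twice returns the original node, which is why a node and its twin can be treated as one block.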

7
De Bruijn Graphs - construction
  • Construction
  • Reads are hashed with a predefined k-mer length
  • Small k -> increased connectivity
    -> more ambiguous repeats
  • Large k -> increased specificity
    -> decreased connectivity
  • Choose k by balancing sensitivity and
    specificity
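
To make the trade-off concrete, the sketch below counts k-mers that have more than one distinct successor in a toy sequence containing the repeat ACG. The function name `branching_kmers` and the sequence are assumptions chosen for illustration.

```python
from collections import defaultdict

def branching_kmers(seq, k):
    """Return the k-mers of `seq` that are followed by more than one k-mer."""
    successors = defaultdict(set)
    for i in range(len(seq) - k):
        successors[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return sorted(kmer for kmer, nxt in successors.items() if len(nxt) > 1)

seq = "TACGAACGT"                    # ACG occurs twice, in different contexts
ambiguous_k3 = branching_kmers(seq, 3)
ambiguous_k4 = branching_kmers(seq, 4)
```

With k = 3 the repeated k-mer ACG branches to two successors; with k = 4 every k-mer is unique and the ambiguity disappears. On real reads a larger k also means fewer overlaps are detected, which is the connectivity cost noted above.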

8
De Bruijn Graphs - construction (cont)
  • For each k-mer, the hash table records the ID of the
    first read containing it and its position in that read
  • Each k-mer is recorded together with its reverse complement
  • A node is created where there are distinct
    interruption points
  • Reads are traced through the graph
  • A directed arc is created where necessary
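
The hashing step might look roughly like the sketch below. It is not Velvet's actual data structure: the table layout `(read_id, position, strand)` and the name `build_kmer_table` are assumptions made for illustration, and node and arc creation are left out.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def build_kmer_table(reads, k):
    """Map each k-mer to (read_id, position, strand) of its first occurrence.

    The reverse complement is recorded at the same time with strand "-".
    With odd k, no k-mer can equal its own reverse complement.
    """
    table = {}
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            # setdefault keeps the first record and ignores later occurrences.
            table.setdefault(kmer, (read_id, pos, "+"))
            table.setdefault(reverse_complement(kmer), (read_id, pos, "-"))
    return table

table = build_kmer_table(["ACGGT", "CGGTA"], 3)
```

The second read only adds the k-mers it introduces (GTA and its reverse complement TAC); the ones it shares with the first read keep pointing at read 0.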

9
De Bruijn Graphs - simplification
  • Simplify the chains of blocks
  • No information loss
  • If node A has only one outgoing arc, to node B,
    and node B has only one incoming arc -> merge
    A and B

[Figure: nodes A and B]
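
The merge rule for chains can be sketched as follows, assuming a graph stored as a successor dictionary plus a sequence per node, with adjacent node sequences overlapping by k-1 bases. The names and the representation are illustrative and ignore twin nodes.

```python
def simplify_chains(succ, seqs, k):
    """Merge A -> B whenever A has one outgoing arc and B one incoming arc."""
    succ = {node: list(outs) for node, outs in succ.items()}
    seqs = dict(seqs)
    merged = True
    while merged:
        merged = False
        # Recompute incoming arcs after every merge.
        pred = {node: [] for node in succ}
        for a, outs in succ.items():
            for b in outs:
                pred[b].append(a)
        for a, outs in list(succ.items()):
            if len(outs) == 1 and outs[0] != a and len(pred[outs[0]]) == 1:
                b = outs[0]
                seqs[a] += seqs[b][k - 1:]   # drop the (k-1)-base overlap
                succ[a] = succ.pop(b)        # A inherits B's outgoing arcs
                del seqs[b]
                merged = True
                break
    return succ, seqs

succ = {"n1": ["n2"], "n2": ["n3", "n4"], "n3": [], "n4": []}
seqs = {"n1": "ACG", "n2": "CGT", "n3": "GTA", "n4": "GTC"}
new_succ, new_seqs = simplify_chains(succ, seqs, 3)
```

In this example n1 and n2 collapse into a single node spelling ACGT, while the branch to n3 and n4 is left untouched, so no sequence information is lost.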
10
De Bruijn Graphs - error removal
  • Velvet focuses on topological features of the
    graph
  • First step: remove tips
  • Tip: a chain of nodes disconnected at one end
  • Two criteria: (1) length and (2) minority
    count
  • Length: remove a tip if it is shorter than 2k bp,
    since two nearby errors can create a
    tip of up to 2k bp

[Figure: two nearby errors, each labeled k]
11
De Bruijn Graphs - error removal (cont)
  • Minority count: multiplicity m < n
  • Starting from node B, going through the tip is an
    alternative to a more common path

[Figure: nodes A, B, C; a tip; arc multiplicities m and n]
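
The two tip criteria can be combined in a short sketch. It assumes chains have already been simplified so that a tip is a single dead-end node, and it only handles tips with no outgoing arcs; the function name `remove_tips` and the dictionaries `length` and `mult` are hypothetical.

```python
def remove_tips(succ, length, mult, k):
    """Drop dead-end nodes that are short (< 2k bp) and in the minority.

    succ:   node -> list of successor nodes
    length: node -> node length in bp
    mult:   (origin, destination) -> number of reads supporting the arc
    """
    succ = {node: list(outs) for node, outs in succ.items()}
    for junction in list(succ):
        if junction not in succ:
            continue  # already removed as a tip
        for tip in list(succ[junction]):
            n_pred = sum(tip in outs for outs in succ.values())
            dead_end = tip != junction and not succ[tip] and n_pred == 1
            short = length[tip] < 2 * k
            # Minority count: some other arc out of the junction is stronger.
            minority = any(mult[(junction, other)] > mult[(junction, tip)]
                           for other in succ[junction] if other != tip)
            if dead_end and short and minority:
                succ[junction].remove(tip)
                del succ[tip]
    return succ

succ = {"A": ["B", "C"], "B": [], "C": []}
length = {"A": 40, "B": 6, "C": 40}
mult = {("A", "B"): 1, ("A", "C"): 9}
pruned = remove_tips(succ, length, mult, k=5)
```

Node B is removed because it is shorter than 2k = 10 bp and its arc has lower multiplicity than the arc to C. Node C is also a dead end but is long, so it is kept.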
12
De Bruijn Graphs - error removal (cont)
  • Second step: remove bubbles using Tour Bus
  • Bubble: redundant paths that start and end at the same nodes
  • Bubbles are created by errors or biological
    variants such as SNPs

[Figure: a bubble]
13
De Bruijn Graphs - error removal (cont)
Tour Bus
  1. Detect redundant paths
  2. Compare them using dynamic programming methods
  3. If they are similar, merge them
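
Step 2 can be illustrated with a standard edit-distance computation. This sketch covers only the comparison of two path sequences, not path detection or merging, and the 20% divergence threshold is an illustrative assumption rather than a value taken from the slides.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def paths_similar(seq_a, seq_b, max_divergence=0.2):
    """Hypothetical similarity test: edit distance relative to path length."""
    longest = max(len(seq_a), len(seq_b))
    return longest > 0 and edit_distance(seq_a, seq_b) / longest <= max_divergence

snp_like = paths_similar("ACGTACGTAC", "ACGTTCGTAC")   # one substitution
unrelated = paths_similar("AAAAAAAAAA", "CCCCCCCCCC")  # every base differs
```

Two paths that differ by a single base, as an error or SNP would produce, pass the test and would be candidates for merging; unrelated paths do not.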
14
De Bruijn Graphs - error removal (cont)
  • Third step: remove erroneous connections
  • Erroneous connections are removed after the Tour Bus
    algorithm
  • They are removed with a basic coverage cutoff
  • Genuine short nodes that cannot be simplified in
    the graph should have high coverage
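
A coverage cutoff of this kind could be applied as in the sketch below, assuming a per-node coverage value is available. The function name `remove_low_coverage` and the numbers are illustrative.

```python
def remove_low_coverage(succ, coverage, cutoff):
    """Keep only nodes with coverage >= cutoff, and arcs between kept nodes."""
    keep = {node for node in succ if coverage[node] >= cutoff}
    return {node: [m for m in succ[node] if m in keep]
            for node in succ if node in keep}

succ = {"A": ["B", "X"], "X": ["B"], "B": []}
coverage = {"A": 30, "X": 2, "B": 28}
cleaned = remove_low_coverage(succ, coverage, cutoff=5)
```

Node X, supported by very few reads, is dropped together with its arcs, while the well-covered nodes A and B stay connected.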

15
Breadcrumb resolution of repeats
  • Using read pairs, pair up the long nodes
  • Flag paired reads using unambiguous long nodes

[Figure: unambiguous long nodes]
17
Breadcrumb resolution of repeats (cont)
  • Extend the nodes as far as possible using
    flagged paired reads
  • All nodes between A and B are paired up to
    either A or B
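
The first Breadcrumb step, pairing up long nodes with read pairs, might be sketched as below. This covers only the pairing, not the extension through flagged reads. The inputs (`node_of_read`, `min_length`, `min_pairs`) and the function name are hypothetical simplifications.

```python
from collections import Counter

def pair_long_nodes(read_pairs, node_of_read, length, min_length, min_pairs=2):
    """Pair long nodes that are linked by at least `min_pairs` read pairs.

    read_pairs:   iterable of (read_id, mate_id)
    node_of_read: read_id -> node the read maps to
    length:       node -> node length in bp
    """
    links = Counter()
    for r1, r2 in read_pairs:
        a, b = node_of_read.get(r1), node_of_read.get(r2)
        if a is None or b is None or a == b:
            continue
        if length[a] >= min_length and length[b] >= min_length:
            links[frozenset((a, b))] += 1
    return {pair for pair, count in links.items() if count >= min_pairs}

node_of_read = {"r1": "A", "r2": "B", "r3": "A", "r4": "B", "r5": "A", "r6": "S"}
read_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
length = {"A": 500, "B": 400, "S": 20}
pairs = pair_long_nodes(read_pairs, node_of_read, length, min_length=100)
```

Long nodes A and B are paired because two read pairs span them; the short node S is ignored at this stage.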

18
Experimental Results
  • Test the error removal pipeline on simulated data
  • Simulated reads are from E. coli, S. cerevisiae,
    C. elegans, and H. sapiens
  • Coverage density vs N50 for H. sapiens
  • N50 is limited by the natural repetition of the
    reference genome

[Figure: N50 vs coverage density; series: Ideal, Error (1), SNP]
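
For reference, N50 is the length L such that nodes (or contigs) of length at least L contain at least half of the total assembled bases. A minimal sketch, with a hypothetical function name:

```python
def n50(lengths):
    """Return the N50 of a list of contig or node lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

example = n50([80, 70, 50, 40, 30, 20])
```

In the example the total is 290 bp; the two longest contigs (80 and 70) together reach 150 bp, which is more than half, so the N50 is 70.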
19
Experimental Results (cont)
  • Test error removal pipeline on experimental data
  • A 173,428 bp human BAC was sequenced using Solexa
    machines
  • Reads were 35 bp long, and k = 31
  • Tour Bus increased sensitivity by correcting
    errors and preserved the integrity of the graph
    structure

20
Experimental Results (cont)
21
Experimental Results (cont)
22
Conclusions
  • Velvet is a de Bruijn graph based sequence
    assembly method for short reads
  • Errors are handled by removing tips and by the
    Tour Bus algorithm
  • A large number of repeats are resolved by the
    Breadcrumb algorithm
  • Velvet was assessed using simulated and real
    datasets and it performed well