Suffix trees - PowerPoint PPT Presentation

About This Presentation
Title:

Suffix trees

Description:

Dr. Xavier Messeguer. http://www.lsi.upc.es/~alggen. ... Does the sequence ababaas contain any occurrence of patterns abab, aab, and ab? ...

Slides: 16
Provided by: lcl53
Learn more at: https://www.cs.upc.edu

Transcript and Presenter's Notes

1
Suffix trees
  • ALGGEN Algorithmics and genetics group
  • Dep. Llenguatges i Sistemes Informàtics
  • Universitat Politècnica de Catalunya

Dr. Xavier Messeguer
http://www.lsi.upc.es/~alggen
2
Suffix trees
Given string ababaas
Suffixes
3 abaas
1 ababaas
4 baas
2 babaas
What kind of queries?
3
Queries on Suffix trees
  • Does the sequence ababaas contain any occurrence
    of patterns abab, aab, and ab?

  • Find repeats within the sequence ababaas.
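
Both queries above can be tried on a small example. The sketch below is only an illustration, not the implementation discussed in these slides: it uses a naive suffix trie (one node per character, quadratic size) instead of a compressed suffix tree, and the function names are hypothetical. It assumes the text ends in a character that appears nowhere else (the final s of ababaas), so that every suffix ends at a leaf.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_leaves(node):
    """Number of suffixes that pass through this node."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def repeated_substrings(node, prefix=""):
    """Substrings with at least two leaves below them occur at least twice."""
    found = []
    for ch, child in node.items():
        if count_leaves(child) >= 2:
            found.append(prefix + ch)
            found.extend(repeated_substrings(child, prefix + ch))
    return found


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(trie, pattern))
print(sorted(repeated_substrings(trie)))
```

Each pattern query costs time proportional to the pattern length, independent of the text length, which is the main attraction of the structure.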


What about MUMs (maximal unique matches)?
4
Search for MUMs
Given strings: ababaabs and aabaat
1st: Bottom-up traversal
(through the tree)
List of UMs: aab, abaa, baa.
2nd: Search for maximals
(through the list of UMs)
MUMs: aab, abaa.
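
The two steps can be checked by brute force on the same pair of strings. This is a minimal sketch under stated assumptions, not the tree traversal the slide describes: it enumerates substrings directly, and it treats a unique match as maximal when it is not contained in a longer unique match. The function names are hypothetical.

```python
def occurrences(text, pattern):
    """Start positions of every occurrence of pattern, overlaps included."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def unique_matches(s, t):
    """Substrings that occur exactly once in s and exactly once in t."""
    substrings = {s[i:j] for i in range(len(s))
                  for j in range(i + 1, len(s) + 1)}
    return sorted(u for u in substrings
                  if len(occurrences(s, u)) == 1
                  and len(occurrences(t, u)) == 1)


def maximal_unique_matches(s, t):
    """Keep the unique matches not contained in a longer unique match."""
    ums = unique_matches(s, t)
    return [u for u in ums if not any(u != v and u in v for v in ums)]


print(unique_matches("ababaabs", "aabaat"))          # ['aab', 'abaa', 'baa']
print(maximal_unique_matches("ababaabs", "aabaat"))  # ['aab', 'abaa']
```

The candidate baa is dropped in the second step because it sits inside abaa, which matches the lists on the slide.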
5
Suffix tree implementation
Given sequence ababaas
E. Ukkonen implementation: MUMmer, MGA
MALGEN implementation
On-line linear insertion algorithm!
6
Meaning of suffix-links
α
aα
7
Suffix links
Given Suffix tree of ababaas
8
Search for MUMs
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
  • bab

9
Search for MUMs
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
  • bab

Quadratic cost!
  • baa
  • aa
  • aa

10
Search for MUMs
Linear cost!
Quadratic cost!
  • abaa

11
Search for MUMs
Linear cost
Quadratic cost!
  • baa
  • aa

Two improvements
  • Decrease quadratic cost!
  • Useless candidates!

12
Search for MUMs
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
Linear cost
  • bab

Quadratic cost!
  • abaa
  • baa
  • aa

Two improvements
  • Decrease quadratic cost!

  • Useless candidates!
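
One way to read the cost contrast on these slides: restarting the search at the root after every mismatch repeats work, while following suffix links keeps the scan of the second sequence linear in the number of steps. The sketch below illustrates that idea under assumptions of its own. It uses an uncompressed suffix trie rather than the suffix tree of the slides, computes the links by a breadth-first pass rather than on-line, and all names are hypothetical.

```python
from collections import deque


def build_suffix_trie(text):
    """children[n] maps a character to a child id; node 0 is the root."""
    children, depth = [{}], [0]
    for start in range(len(text)):
        node = 0
        for ch in text[start:]:
            if ch not in children[node]:
                children.append({})
                depth.append(depth[node] + 1)
                children[node][ch] = len(children) - 1
            node = children[node][ch]
    return children, depth


def add_suffix_links(children):
    """link[n] is the node spelling the label of n minus its first character."""
    link = [0] * len(children)
    queue = deque(children[0].values())  # depth-1 nodes link to the root
    while queue:
        node = queue.popleft()
        for ch, child in children[node].items():
            # If node spells x+w, link[node] spells w and also has a ch child.
            link[child] = children[link[node]][ch]
            queue.append(child)
    return link


def longest_match_ending_at(text, query):
    """For each query position, length of the longest match that ends there."""
    children, depth = build_suffix_trie(text)
    link = add_suffix_links(children)
    node, lengths = 0, []
    for ch in query:
        # On a mismatch, drop the first character instead of restarting.
        while node != 0 and ch not in children[node]:
            node = link[node]
        if ch in children[node]:
            node = children[node][ch]
        lengths.append(depth[node])
    return lengths


print(longest_match_ending_at("ababaas", "bbabbaaabaa"))
# [1, 1, 2, 3, 1, 2, 3, 2, 2, 3, 4]
```

The values 3, 3 and 4 in the output correspond to the candidates bab, baa and abaa listed on the slides.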

13
TSuffix tree
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
Linear cost
  • bab
  • abaa

Quadratic cost!
Two improvements
  • Decrease quadratic cost!

  • Useless candidates!

14
Searching MUMs on-line
Number of the leaf
Length of the MUM
Position of the first character in the second sequence
2, 3(bab), 2
b b a b b a a a b a b b
15
Searching MUMs on-line
4, 3(baa), 5
16
Searching MUMs on-line
4, 3(baa), 5
5, 2(aa), 6
17
Searching MUMs on-line
4, 3(baa), 5
5, 2(aa), 6
1, 4(abab), 8
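
The triples on these slides (leaf number, match length, position in the second sequence) can be reproduced by brute force. This is a sketch under assumptions, not the on-line algorithm itself: it restarts the match at every position of the second sequence, takes the leaf number to be the 1-based position of the only occurrence of the match in ababaas, and uses a hypothetical `min_len` threshold of 2 to skip single characters.

```python
def online_candidates(text, query, min_len=2):
    """Report (leaf, length, match, start) with 1-based positions."""
    out = []
    for start in range(len(query)):
        # Extend the match from this position for as long as it occurs in text.
        length = 0
        while (start + length < len(query)
               and query[start:start + length + 1] in text):
            length += 1
        match = query[start:start + length]
        first = text.find(match)
        # Keep the match only if it occurs exactly once in text.
        if length >= min_len and text.find(match, first + 1) == -1:
            out.append((first + 1, length, match, start + 1))
    return out


for triple in online_candidates("ababaas", "bbabbaaababb"):
    print(triple)
```

The four triples shown on the slides appear in the output. The sketch also reports overlapping candidates, such as aa again at position 7 and bab again at position 9, that a real MUM search would still have to filter.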
18
Methodology for a preview with two genomes
  • Construct the TSuffix tree of the first genome
  • Search for the MUMs with respect to the other genome

Construction of TSuffix tree
Reading the second sequence
Only one TSuffix tree
-50
What about more genomes?
19
Computational and biological background (3)
  • Chlamydophila pneumoniae AR39 1.247420 bps
  • Chlamydia pneumoniae 1.247.805
  • Chlamydia muridarum 1.084.689 bps
  • Chlamydia trachomatis 1057413 bps
20
Alignment revisited
  • Pyrococcus abyssi 1.790.334
  • Pyrococcus horikoshii 1.763.341 bps