Title: Bioinformatics PhD. Course
1 Bioinformatics PhD. Course
Summary (approximate)
- 1. Biological introduction
- 2. Comparison of short sequences (< 10,000 bp)
- 3. Comparison of large sequences (up to 250,000,000 bp)
- 5. Efficient data search structures and algorithms
2 3. Comparison of large sequences
Summary (more or less)
- 3.1 Overview
- 3.2 Suffix trees
- 3.3 MUMs
3 Suffix trees
Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press. http://sequence.rutgers.edu/st/
4 Suffix trees
Given the string ababaas
Suffixes
3 abaas
1 ababaas
4 baas
2 babaas
What kind of queries can we do?
5 Applications of Suffix trees
1. Exact string matching
- Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
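A minimal sketch of this query, assuming an uncompressed suffix trie stands in for the suffix tree (same answers, more nodes). The names build_suffix_trie and contains are illustrative and do not come from the course material.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie made of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it spells a path starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(trie, pattern))
```

For ababaas this reports abab and ab as present and aab as absent. Each query costs time proportional to the pattern length, independent of the text length.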
6 Applications of Suffix trees
2. Finding the repeats within a sequence.
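One way to sketch repeat finding, again assuming a suffix trie in place of the suffix tree: every node stores how many suffixes pass through it, and a substring is a repeat when that count is at least 2. The helper names build_counts and repeats are hypothetical.

```python
def build_counts(text):
    """Suffix trie whose nodes count the suffixes that pass through them."""
    root = {"count": 0, "next": {}}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["next"].setdefault(ch, {"count": 0, "next": {}})
            node["count"] += 1
    return root


def repeats(root):
    """Return every substring that occurs at least twice, shortest first."""
    found = []
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for ch, child in node["next"].items():
            if child["count"] >= 2:
                found.append(prefix + ch)
                stack.append((prefix + ch, child))
    return sorted(found, key=lambda w: (len(w), w))


print(repeats(build_counts("ababaas")))
```

For ababaas the repeats are a, b, ab, ba and aba, so the longest repeat is aba.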
7 Queries on Suffix trees
- Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
- Find repeats within the sequence ababaas.
8-22 Quadratic Insertion algorithm
Given the string ababaabbs
[Figure sequence, slides 8 to 22: the suffixes are inserted one at a time. Edge labels recoverable from the frames: ababaabbs,1; babaabbs,2; ba; baabbs,2.]
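The frames above can be approximated by the following sketch of a quadratic construction: each suffix is walked down from the root and an edge is split where the suffix diverges. It assumes the last character of the string (s in ababaabbs) occurs nowhere else, so that every suffix ends in its own leaf. The class and function names are illustrative, not the course's.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}   # first character -> (edge label, child node)
        self.leaf = leaf     # 1-based suffix number for leaves, else None


def insert_suffix(root, suffix, number):
    """Walk down from the root and split an edge where the suffix diverges."""
    node = root
    while True:
        entry = node.children.get(suffix[0])
        if entry is None:                      # no edge starts with this character
            node.children[suffix[0]] = (suffix, Node(number))
            return
        label, child = entry
        k = 0                                  # length of the common prefix
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k == len(label):                    # whole edge matched: keep descending
            node, suffix = child, suffix[k:]
            continue
        middle = Node()                        # split the edge after k characters
        middle.children[label[k]] = (label[k:], child)
        middle.children[suffix[k]] = (suffix[k:], Node(number))
        node.children[suffix[0]] = (label[:k], middle)
        return


def build_suffix_tree(text):
    """Insert the suffixes in order; each insertion may scan O(n) characters."""
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text[i:], i + 1)
    return root


def leaves(node, prefix=""):
    """Yield (suffix number, string spelt from the root) for every leaf."""
    if node.leaf is not None:
        yield node.leaf, prefix
    for label, child in node.children.values():
        yield from leaves(child, prefix + label)


tree = build_suffix_tree("ababaabbs")
print(sorted(leaves(tree)))
```

After the first four insertions the edge babaabbs,2 has been split into ba followed by baabbs,2, which matches the labels that survive from the slides. The total cost is quadratic in the string length because every suffix is compared character by character from the root.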
23 Generalized suffix tree
A suffix tree built for several strings is called a generalized suffix tree; it is the suffix tree of the concatenation of the strings.
For instance,
24-36 Generalized suffix tree
Construction of the suffix tree of ababaabbaaabaaß
Given the suffix tree of ababaaba
[Figure sequence, slides 24 to 36: the suffixes ending in ß are added to the existing tree. Edge labels recoverable from the last frame: ß,4; aaß,1; a; b; ba,5; ß,2; bba,3; baabba,1; ß,3; bba,4; baabba,2.]
37 Generalized suffix tree
Generalized suffix tree of ababaabbaaabaaß
What kind of queries can we do?
38 Applications of Suffix trees
1. The substring problem for a database of patterns DB
- Does the DB contain any occurrence of the patterns abab, aab, and ab?
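A hedged sketch of the database query, using a generalized suffix trie in which every node records which strings pass through it. The database contents are an assumption: the two strings ababaabb and aabaa are one plausible reading of how the slides' concatenated string splits, suggested by leaf labels such as baabba,1 and aaß,1. The function names are hypothetical.

```python
def build_generalized_trie(strings):
    """Suffix trie over all strings; nodes record the ids of strings passing through."""
    root = {"ids": set(), "next": {}}
    for sid, s in enumerate(strings, start=1):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(sid)
    return root


def strings_containing(trie, pattern):
    """Return the ids (1-based) of the database strings that contain pattern."""
    node = trie
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])


db = ["ababaabb", "aabaa"]          # assumed database contents
trie = build_generalized_trie(db)
for pattern in ("abab", "aab", "ab"):
    print(pattern, sorted(strings_containing(trie, pattern)))
```

With this assumed database, abab is found only in string 1, while aab and ab are found in both strings.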
39 Applications of Suffix trees
2. The longest common substring of two strings
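The longest common substring can be read off a generalized structure as the deepest node reached by suffixes of both strings. The sketch below uses a suffix trie rather than a compressed suffix tree, and the two input strings are an assumed split of the slides' example; longest_common_substring is an illustrative name.

```python
def longest_common_substring(s1, s2):
    """Deepest trie node visited by suffixes of both strings."""
    root = {"ids": set(), "next": {}}
    for sid, s in ((1, s1), (2, s2)):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(sid)

    best = ""
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for ch, child in node["next"].items():
            if child["ids"] == {1, 2}:         # substring occurs in both strings
                word = prefix + ch
                if len(word) > len(best):
                    best = word
                stack.append((word, child))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

For these two strings the result is abaa. When several common substrings share the maximum length, this sketch returns whichever it meets first.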
40 Applications of Suffix trees
3. Finding MUMs (maximal unique matches).
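To make the definition concrete: a MUM is taken here to be a substring that occurs exactly once in each sequence and is not contained in a longer substring with the same property. The sketch below is a brute-force stand-in that counts all substrings directly; it is not a suffix-tree method, which can produce the same set far faster. The input strings are an assumed split of the slides' example.

```python
from collections import Counter


def substring_counts(s):
    """Count the occurrences of every non-empty substring of s."""
    return Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))


def mums(s1, s2):
    """Substrings unique in both strings and not inside a longer such substring."""
    c1, c2 = substring_counts(s1), substring_counts(s2)
    unique = [w for w in c1 if c1[w] == 1 and c2.get(w) == 1]
    return sorted(w for w in unique
                  if not any(w != v and w in v for v in unique))


print(mums("ababaabb", "aabaa"))
```

For ababaabb and aabaa the substrings unique in both are aab, baa and abaa; baa lies inside abaa, so the MUMs are aab and abaa.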
41 Linear Insertion algorithm
Given the string ......
P1: the leaves of the suffixes from ? have been inserted
P2: the string ? is the longest string that can be spelt through the tree.
42 Insertion algorithm example
Given the string ababaababb...
43 Linear Insertion algorithm
Given the string ......
P1: the leaves of the suffixes from ? have been inserted
P2: the string ? is the longest string that can be spelt through the tree.
P3: there is a pointer, called a suffix pointer, from every node to the node that spells its longest proper suffix.
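The slides do not name their linear-time algorithm. As an assumption, the sketch below uses Ukkonen's algorithm, a well-known linear-time construction that also relies on suffix pointers (called link in the code); it may differ in detail from the procedure on the slides. All identifiers are illustrative, and the last character of the text is assumed to occur nowhere else.

```python
class Node:
    def __init__(self, start, end):
        self.start = start      # edge label is text[start:end]
        self.end = end          # None marks a leaf whose edge grows with the text
        self.children = {}
        self.link = None        # suffix pointer; None is treated as the root


def build_suffix_tree(text):
    root = Node(-1, -1)
    active_node, active_edge, active_len = root, 0, 0
    remainder = 0               # suffixes still waiting to be made explicit
    pending = None              # node waiting for its suffix pointer

    def add_link(node):
        nonlocal pending
        if pending is not None:
            pending.link = node
        pending = node

    for pos, ch in enumerate(text):
        remainder += 1
        pending = None
        while remainder > 0:
            if active_len == 0:
                active_edge = pos
            edge_ch = text[active_edge]
            child = active_node.children.get(edge_ch)
            if child is None:                   # no edge: add a leaf here
                active_node.children[edge_ch] = Node(pos, None)
                add_link(active_node)
            else:
                end = pos + 1 if child.end is None else child.end
                edge_len = end - child.start
                if active_len >= edge_len:      # walk down past this edge
                    active_edge += edge_len
                    active_len -= edge_len
                    active_node = child
                    continue
                if text[child.start + active_len] == ch:
                    active_len += 1             # already present: stop this phase
                    add_link(active_node)
                    break
                split = Node(child.start, child.start + active_len)
                active_node.children[edge_ch] = split
                split.children[ch] = Node(pos, None)
                child.start += active_len
                split.children[text[child.start]] = child
                add_link(split)
            remainder -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = pos - remainder + 1
            elif active_node is not root:       # follow the suffix pointer
                active_node = active_node.link or root
    return root


def contains(root, text, pattern):
    """Check whether pattern spells a path from the root."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return False
        end = len(text) if child.end is None else child.end
        label = text[child.start:end]
        chunk = pattern[i:i + len(label)]
        if not label.startswith(chunk):
            return False
        i += len(chunk)
        node = child
    return True


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())


text = "ababaabbs"
tree = build_suffix_tree(text)
print(contains(tree, text, "baab"), count_leaves(tree))
```

Edges store index pairs into the text instead of copied labels, and the suffix pointers let each step jump to the next insertion point instead of restarting at the root, which is where the saving over the quadratic version comes from.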
44-49 Insertion algorithm example
Given the string ababaababb...
[Figure sequence, slides 44 to 49. Labels recoverable from the last frame: 8; ababb...,5; ababb...,3; ba; ba; ababb...,4. Two symbols were lost in extraction and appear as ?.]
50-51 Insertion algorithm: improving time
We have pointed to the following nodes
[Figure, slide 51. Edge labels recoverable: ba; baababb...,1; ba; baababb...,2.]
52 Suffix tree implementation: suffix links
Given the sequence ababaas
[Figure: two node labels, extracted as ? and a?.]
53 Suffix links
Given the suffix tree of ababaas
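A small sketch of what the suffix links of this tree look like, computed directly from the definition rather than by the construction algorithm. It assumes that the internal nodes of the suffix tree are exactly the substrings followed by at least two different characters, and that the link of a node spelling w points to the node spelling w without its first character. The function names are hypothetical.

```python
def internal_nodes(text):
    """Non-empty substrings that are followed by two or more different characters."""
    followers = {}
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n):               # text[i:j] is followed by text[j]
            followers.setdefault(text[i:j], set()).add(text[j])
    return {w for w, nxt in followers.items() if len(nxt) >= 2}


def suffix_links(text):
    """Map each internal node to the string spelt by its suffix-link target."""
    nodes = internal_nodes(text)
    return {w: w[1:] for w in nodes}            # "" stands for the root


links = suffix_links("ababaas")
for source in sorted(links, key=len, reverse=True):
    print(repr(source), "->", repr(links[source]))
```

For ababaas the internal nodes are a, ba and aba, and the links form the chain aba -> ba -> a -> root. Every link target is again an internal node or the root, which is what allows the links to be followed during insertion.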
54-68 Insertion algorithm
Given the string ababaabbs
[Figure sequence, slides 54 to 68. Edge labels recoverable from the frames: babaabbs,2; baabbs,1.]