# Bioinformatics PhD Course

Slides: 110
Provided by: lcl2
1
Bioinformatics PhD. Course
1. Biological introduction
2. Comparison of short sequences (up to 10,000 bps): dot matrix, pairwise alignment, multiple alignment, hash algorithms
3. Comparison of large sequences (more than 10,000 bps): data structures, suffix trees, MUMs
4. String matching: exact, extended, approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, ...
2
Comparison of large sequences
First part: Alignment of large sequences
3
Dynamic programming
[Figure: alignment of two sequences, acc...agt and acc...a--, with mismatches (x) and gaps (-)]
• Quadratic cost in space and time.
• Short sequences (up to 10,000 bps) can be aligned using dynamic programming.
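As an illustration of where the quadratic cost comes from, here is a minimal sketch of global alignment scoring by dynamic programming. The slides do not specify a scoring scheme, so the function name and the `match`, `mismatch`, and `gap` values are assumptions chosen only for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b (illustrative scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    # The table has (len(a) + 1) * (len(b) + 1) cells: quadratic space and time.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


print(global_alignment_score("acgt", "act"))
```

For two sequences of about a million bases each, the same table would need on the order of 10^12 cells, which is why this approach is limited to short sequences.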

4
Genomic sequences
• Genomic sequences have millions of base pairs.
• Their length is about 1000 times greater than that of short sequences.

In which cases can dynamic programming be applied?
5
First assumption
6
Realistic assumption?
Unrealistic assumption!
More realistic assumption
7
Realistic assumptions?
Unrealistic assumption!
More realistic assumption
But is this now a real case?
8
Preview in a real case
Chlamydia muridarum: 1,084,689 bps
Chlamydia trachomatis: 1,057,413 bps
9
Preview in a real case
Pyrococcus abyssi: 1,790,334 bps
Pyrococcus horikoshii: 1,763,341 bps
?
?
10
Methodology of an alignment
Identify the portions that can be aligned.
(Linear cost)
(Linear cost)
11
Methodology of an alignment
?
(Linear cost)
12
Preview-Revisited
Matching, unique, maximal: a MUM is a maximal unique matching.
Connect to MALGEN
13
Methodology of an alignment
Linear cost with Suffix trees
How can MUMs be found?
Identify the portions that can be aligned.
How can these portions be determined?
With CLUSTALW, TCOFFEE,
14
Comparison of large sequences
M-GCAT Todd Treangen
15
Homework
• Javier 14. Alexis
• Dmitry 15. Ramon
• Ana Iris
• David
• Patricia
• Rogeli
• Atif
• Aina
• Isaac
• Maria Merce
• Romina
• Guillem
• Raul

16
Bioinformatics PhD. Course
Second part: Introducing suffix trees
17
Suffix trees
Given string ababaas
Suffixes
3 abaas
1 ababaas
4 baas
2 babaas
What kind of queries?
18
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence
of the patterns abab, aab, and ab?
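The query above can be tried with a small sketch. To keep it short, this uses an uncompressed suffix trie (one node per character) rather than a real suffix tree, and it is built by naive insertion in quadratic time; the function names are assumptions made for the example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie (naive, quadratic)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it spells a path starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(trie, pattern))
```

Each query takes time proportional to the pattern length, independent of the text length, which is the property that makes suffix trees attractive for long sequences.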

19
Invariant Properties
Given the string ......
...
P1 the leaves of suffixes from ? have been
inserted
20
Given the string ababaabbs
21
Given the string ababaabbs
ababaabbs,1
22
Given the string ababaabbs
ababaabbs,1
babaabbs,2
23
Given the string ababaabbs
babaabbs,2
24
Given the string ababaabbs
babaabbs,2
25
Given the string ababaabbs
26
Given the string ababaabbs
ba
baabbs,2
27
Given the string ababaabbs
ba
baabbs,2
28
Given the string ababaabbs
ba
baabbs,2
29
Given the string ababaabbs
ba
ba
baabbs,2
30
Given the string ababaabbs
ba
baabbs,2
31
Given the string ababaabbs
ba
baabbs,2
32
Given the string ababaabbs
33
Given the string ababaabbs
34
Given the string ababaabbs
35
The suffix tree of several strings
is called the generalized suffix tree;
it is the suffix tree of the concatenation
of the strings.
For instance:
36
Construction of the suffix tree of
ababaabbaaabaaß
Given the suffix tree of ababaaba
37
Construction of the suffix tree of
ababaabbaaabaaß
38
Construction of the suffix tree of
ababaabbaaabaaß
ab
a
ba,5
39
Construction of the suffix tree of
ababaabbaaabaaß
ab
a
ba,5
40
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
ab
a
ba,5
b
a
bba,3
a
baabba,1
41
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
ab
a
ba,5
b
a
bba,3
a
baabba,1
42
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
ab
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
a
a
bba,4
baabba,2
43
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
ab
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
a
a
bba,4
baabba,2
44
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
a
b
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
ß,3
a
a
bba,4
baabba,2
45
Construction of the suffix tree of
ababaabbaaabaaß
aaß,1
a
b
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
ß,3
a
a
bba,4
baabba,2
46
Construction of the suffix tree of
ababaabbaaabaaß
ß,5
ß,4
aaß,1
a
b
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
ß,3
a
a
bba,4
baabba,2
47
Construction of the suffix tree of
ababaabbaaabaaß
ß,5
ß,4
aaß,1
a
b
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
ß,3
a
a
bba,4
baabba,2
48
Construction of the suffix tree of
ababaabbaaabaaß
ß,5
ß,4
aaß,1
ß,6
a
b
a
ba,5
ß,2
b
a
bba,3
a
b
baabba,1
ß,3
a
a
bba,4
baabba,2
49
Generalized suffix tree of ababaabbaaabaaß
50
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence
of the patterns abab, aab, and ab?

51
Applications of Suffix trees
2. The substring problem for a database of
strings DB
• Does the DB contain any occurrence of the patterns
abab, aab, and ab?
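A possible sketch of the database query, again with an uncompressed trie instead of a true generalized suffix tree. Inserting the suffixes of each string separately has the same effect as concatenating the strings with distinct end markers. The two database strings, the function names, and the node layout are assumptions made for the example.

```python
def build_generalized_trie(strings):
    """Trie of all suffixes of all strings; each node records which strings reach it."""
    root = {"next": {}, "ids": set()}
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["next"].setdefault(ch, {"next": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def strings_containing(root, pattern):
    """Indices of the database strings that contain pattern."""
    node = root
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])


db = ["ababaabba", "aabaa"]          # hypothetical database
index = build_generalized_trie(db)
for pattern in ("abab", "aab", "ab"):
    print(pattern, sorted(strings_containing(index, pattern)))
```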

52
Applications of Suffix trees
3. The longest common substring of two strings
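One way to experiment with this problem without building a suffix tree is to sort the suffixes of the concatenation and compare neighbours that come from different strings. The sketch below does that with a naive sort, so it is not linear; the separator characters `\x01` and `\x02` are assumptions and must not occur in the inputs.

```python
def longest_common_substring(s1, s2):
    """Longest substring shared by s1 and s2, via sorted suffixes of s1 + sep + s2 + sep."""
    text = s1 + "\x01" + s2 + "\x02"     # assumed separators, absent from s1 and s2
    split = len(s1)                      # positions < split belong to s1
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for a, b in zip(order, order[1:]):
        if (a < split) == (b < split):
            continue                     # both suffixes come from the same string
        k = 0
        # The unique separators guarantee a mismatch before either suffix runs out.
        while text[a + k] == text[b + k]:
            k += 1
        if k > len(best):
            best = text[a:a + k]
    return best


print(longest_common_substring("ababaabba", "aabaabba"))
```

In a generalized suffix tree the same answer corresponds to the deepest internal node that has leaves from both strings below it.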
53
Applications of Suffix trees
4. Finding the maximal repeats.
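For small inputs, maximal repeats can be checked by brute force. The sketch assumes the usual definition: a substring that occurs at least twice and that is preceded by at least two different characters and followed by at least two different characters across its occurrences (the string boundaries count as characters of their own). The markers `^` and `$` are assumptions and must not occur in the input.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Brute-force maximal repeats of s (cubic cost; for illustration only)."""
    count = defaultdict(int)
    lefts = defaultdict(set)     # characters seen just before each substring
    rights = defaultdict(set)    # characters seen just after each substring
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = s[i:j]
            count[w] += 1
            lefts[w].add(s[i - 1] if i > 0 else "^")   # "^" marks the start of s
            rights[w].add(s[j] if j < n else "$")      # "$" marks the end of s
    return sorted(w for w in count
                  if count[w] >= 2 and len(lefts[w]) >= 2 and len(rights[w]) >= 2)


print(maximal_repeats("xabcyabcz"))
```

In a suffix tree, the candidates are exactly the internal nodes, since those are already followed by at least two different characters; only the left context remains to be checked.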
54
Applications of Suffix trees
5. Finding MUMs.
55
Bioinformatics PhD. Course
56
57
58
?
59
?
60
?
61
?
62
?
63
?
64
?
65
66
67
Given S2 a a b a a
68
Given S2 a a b a a
69
Unique matchings
Given S2 a a b a a
aa in S2 at 1
70
Unique matchings
Given S2 a a b a a
aa in S2 at 1
aab in S2 at 1
S1[5..6-7] in S2 at 1
71
Unique matchings
Given S2 a a b a a
S1[5..6-7] in S2 at 1
72
Unique matchings
Given S2 a a b a a
S1[5..6-7] in S2 at 1
73
Unique matchings
Given S2 a a b a a b b a
S1[5..6-7] in S2 at 1
S1[3..6-] in S2 at 2
74
Unique matchings
Given S2 a a b a a b b a
S1[5..6-7] in S2 at 1
S1[3..6-] in S2 at 2
75
Unique matchings
Given S2 a a b a a b b a
S1[5..6-7] in S2 at 1
S1[3..6-] in S2 at 2
76
Unique matchings
Given S2 a a b a a b b a
S1[5..6-7] in S2 at 1
S1[3..6-] in S2 at 2
77
Unique matchings
Given S2 a a b a a b b a
S1[5..6-7] in S2 at 1
S1[3..6-8] in S2 at 2
S1[4..6-8] in S2 at 3
78
Unique matchings
Given S2 a a b a a b b a
S1[3..6-8] in S2 at 2
S1[4..6-8] in S2 at 3
S1[5..8] in S2 at 4
S1[6..8] in S2 at 5
S1[7..8] in S2 at 6
79
From UMs to MUMs
Unique matchings
Given S2 a a b a a b b a
and S1 a b a b a a b b a
S1[3..6-8] in S2 at 2
S1[4..6-8] in S2 at 3
S1[5..8] in S2 at 4
S1[6..8] in S2 at 5
S1[7..8] in S2 at 6
Array of UMs
start in S1: 1 2 3   4   5 6 7 8 9
end in S1:   - - 6-8 6-8 8 8 8 - -
MUM: S1[3..6-8] in S2 at 2
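The idea of a maximal unique matching can be checked on small strings by brute force. The sketch below is not the suffix-tree method of the slides: it simply tests every substring of S1 for uniqueness in both strings and for maximality. The strings are taken letter for letter from the transcript, positions are 0-based, and the function names are assumptions.

```python
def occurrences(text, w):
    """Start positions of all (possibly overlapping) occurrences of w in text."""
    found, pos = [], text.find(w)
    while pos != -1:
        found.append(pos)
        pos = text.find(w, pos + 1)
    return found


def find_mums(s1, s2):
    """Brute-force MUMs as (substring, start in s1, start in s2), 0-based."""
    mums, seen = [], set()
    for i in range(len(s1)):
        for j in range(i + 1, len(s1) + 1):
            w = s1[i:j]
            if w in seen:
                continue
            seen.add(w)
            occ1, occ2 = occurrences(s1, w), occurrences(s2, w)
            if len(occ1) != 1 or len(occ2) != 1:
                continue                      # not unique in both strings
            p1, p2 = occ1[0], occ2[0]
            e1, e2 = p1 + len(w), p2 + len(w)
            extends_left = p1 > 0 and p2 > 0 and s1[p1 - 1] == s2[p2 - 1]
            extends_right = e1 < len(s1) and e2 < len(s2) and s1[e1] == s2[e2]
            if not extends_left and not extends_right:
                mums.append((w, p1, p2))      # maximal: no extension keeps the match
    return mums


print(find_mums("ababaabba", "aabaabba"))
```

This costs far more than linear time; it is only meant to make the three conditions (matching, unique, maximal) concrete.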
80
Bioinformatics PhD. Course
Third part: Linear insertion algorithm
81
Invariant Properties
Given the string ......
...
P1 the leaves of suffixes from ? have been
inserted
82
Linear insertion algorithm
Invariant Properties
Given the string ......
P1 the leaves of suffixes from ? have been
inserted
P2 the string ? is the longest string that can
be spelt through the tree.
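The transcript does not name the linear insertion algorithm, so the following is only a sketch of one well-known linear-time construction (for a fixed alphabet), Ukkonen's online algorithm, which may differ in detail from the method on the slides. Its state is similar in spirit to the invariants: `remainder` counts the suffixes still to be inserted, and the active point marks how far the pending string has been spelt through the tree. All names are assumptions.

```python
class Node:
    def __init__(self, start, end):
        self.start = start        # edge label is text[start:end]
        self.end = end            # None means "open" leaf edge, up to the current end
        self.children = {}
        self.link = None          # suffix link (internal nodes only)


def build_suffix_tree(text):
    """Ukkonen-style online construction; text should end with a unique marker."""
    root = Node(-1, -1)
    root.link = root
    active_node, active_edge, active_len, remainder = root, 0, 0, 0
    for i, c in enumerate(text):
        last_new = None           # internal node still waiting for its suffix link
        remainder += 1
        while remainder > 0:
            if active_len == 0:
                active_edge = i
            ch = text[active_edge]
            if ch not in active_node.children:
                active_node.children[ch] = Node(i, None)      # new leaf
                if last_new is not None:
                    last_new.link, last_new = active_node, None
            else:
                nxt = active_node.children[ch]
                edge_len = (i + 1 if nxt.end is None else nxt.end) - nxt.start
                if active_len >= edge_len:                     # walk down the edge
                    active_edge += edge_len
                    active_len -= edge_len
                    active_node = nxt
                    continue
                if text[nxt.start + active_len] == c:          # already present
                    active_len += 1
                    if last_new is not None:
                        last_new.link, last_new = active_node, None
                    break
                split = Node(nxt.start, nxt.start + active_len)  # split the edge
                active_node.children[ch] = split
                split.children[c] = Node(i, None)
                nxt.start += active_len
                split.children[text[nxt.start]] = nxt
                if last_new is not None:
                    last_new.link = split
                last_new = split
            remainder -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = i - remainder + 1
            elif active_node is not root:
                active_node = active_node.link or root
    return root


def contains(root, text, pattern):
    """Follow pattern from the root, comparing whole edge labels at a time."""
    node, k = root, 0
    while k < len(pattern):
        child = node.children.get(pattern[k])
        if child is None:
            return False
        label = text[child.start:len(text) if child.end is None else child.end]
        chunk = pattern[k:k + len(label)]
        if not label.startswith(chunk):
            return False
        k += len(chunk)
        node = child
    return True


text = "ababaababb$"              # "$" is an assumed end marker
tree = build_suffix_tree(text)
print(contains(tree, text, "aabab"), contains(tree, text, "bbb"))
```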
83
Linear insertion algorithm example
Given the string ababaababb...
84
Linear insertion algorithm example
Given the string ababaababb...
6 7 8
85
Linear insertion algorithm example
?
Given the string ababaababb...
6 7 8
?
86
Linear insertion algorithm example
?
Given the string ababaababb...
6 7 89
?
87
Linear insertion algorithm example
88
Linear insertion algorithm example
89
Linear insertion algorithm example
90
Linear insertion algorithm example
ababb...,5
ababb...,3
ba
ba
ababb...,4
baababb...,2
91
Linear insertion algorithm example
ababb...,5
ababb...,3
ba
ba
ababb...,4
b
aababb...,2
baababb...,2
baababb...,2
92
Linear insertion algorithm example
?
Given the string ababaababb...
7 8
?
ababb...,5
ababb...,3
ba
ba
ababb...,4
baababb...,2
93
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
ababb...,3
ba
ba
ababb...,4
94
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
ababb...,3
ba
ba
ababb...,4
95
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
ababb...,3
ba
ba
ababb...,4
96
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
ababb...,3
ba
ba
ababb...,4
97
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
a
b
ba
ababb...,4
b
aababb...,2
b...,7
98
Linear insertion algorithm example
?
Given the string ababaababb...
89
?
ababb...,5
a
b
b...,8
ba
ababb...,4
b
aababb...,2
b...,7
99
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
ba
ababb...,4
b
aababb...,2
b...,7
100
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
ba
ababb...,4
b
aababb...,2
b...,7
101
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
a
b
ababb...,4
b
aababb...,2
b...,7
102
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
a
b
ababb...,4
b...,9
b
aababb...,2
b...,7
103
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
a
b
ababb...,4
b...,9
b
aababb...,2
b...,7
104
Linear insertion algorithm example
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
a
b
ababb...,4
b...,9
b
aababb...,2
b...,7
105
Linear insertion algorithm example
?
Given the string ababaababb...
9
?
ababb...,5
a
b
b...,8
a
b
ababb...,4
b...,9
b
aababb...,2
b...,7
106
Index
Suffix arrays: "Suffix arrays: a new method for on-line string searches", G. Myers, U. Manber
107
Suffix arrays
Given string ababaa
Suffixes
1 ababaa
2 babaa
3 abaa
4 baa
5 aa
6 a
but lexicographically sorted
6 a
5 aa
3 abaa
1 ababaa
4 baa
2 babaa
What is the cost?
O(n log(n))
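The sorted list above can be reproduced with a very small sketch. It sorts whole suffixes with Python's built-in sort, so each comparison may itself take linear time and the total cost is worse than the O(n log(n)) quoted on the slide; the function name is an assumption.

```python
def suffix_array(text):
    """Start positions (0-based) of the suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "ababaa"
for start in suffix_array(text):
    print(start + 1, text[start:])   # 1-based positions, as on the slide
```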
108
Applications of suffix arrays
1. Exact string matching
• Does the sequence ababaas contain any occurrence
of the patterns abab, aab, and ab?

Binary search
What is the cost?
O(log(n) P)
Can it be improved to
O(log(n) + P)?
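A minimal sketch of the plain binary search, assuming a suffix array built by naive sorting. Each of the O(log(n)) steps compares the pattern with a suffix prefix of length P, which gives the O(log(n) P) bound; the faster variant is not attempted here. Function names are assumptions.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions (0-based) of pattern in text, by two binary searches."""
    p = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose p-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + p] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                       # first suffix whose p-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + p] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "ababaas"
sa = suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, find_occurrences(text, sa, pattern))
```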
109
Fast search with cost O(log(n) + P)
Invariant Properties
110
Fast search with cost O(log(n) + P)
Invariant Properties
Algorithm