Bioinformatics PhD. Course - PowerPoint PPT Presentation

About This Presentation
Title:

Bioinformatics PhD. Course

Description:

Which is the best alignment? How many alignments of two sequences exist? ... How can the best alignment be found? Gap: worst case. Mismatch: unfavorable. Match: ... – PowerPoint PPT presentation

Slides: 24
Provided by: lcl2

Transcript and Presenter's Notes

Title: Bioinformatics PhD. Course


1
Bioinformatics PhD. Course
Summary (approximate)
  • 1. Biological introduction
  • 2. Comparison of short sequences (< 10,000 bp)
  • 3. Comparison of large sequences (up to 250,000,000 bp)
  • 4. Sequence assembly
  • 5. Efficient data search structures and algorithms
  • 6. Proteins...

2
2. Comparison of short sequences (< 10,000 bp)
Summary (more or less)
  • 2.1 Dot matrix
  • 2.2 Pairwise alignment
  • 2.3 Hash algorithms
  • 2.4 Multiple alignment

3
2.2 Pairwise alignment
Given two DNA sequences $A = (a_1 a_2 \ldots a_n)$
and $B = (b_1 b_2 \ldots b_m)$ over the alphabet $\{a, c, t, g\}$, we
say that $A'$ and $B'$ over $\{a, c, t, g, -\}$ are aligned
iff
  • $A'$ and $B'$ become $A$ and $B$ if gaps ($-$) are
    removed.
  • $|A'| = |B'|$
  • For all $i$, it is not possible that $a'_i = b'_i = -$
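
The three conditions of this definition can be checked mechanically. The sketch below is only an illustration: the function name `is_alignment` and the example strings are assumptions, not part of the course material.

```python
def is_alignment(a_aligned: str, b_aligned: str, a: str, b: str, gap: str = "-") -> bool:
    """Check the three conditions of the definition for a candidate alignment."""
    # 1. Removing the gaps must give back the original sequences.
    if a_aligned.replace(gap, "") != a or b_aligned.replace(gap, "") != b:
        return False
    # 2. Both gapped sequences must have the same length.
    if len(a_aligned) != len(b_aligned):
        return False
    # 3. No column may hold a gap in both rows.
    return all(not (x == gap and y == gap) for x, y in zip(a_aligned, b_aligned))


print(is_alignment("ac-t", "-cgt", "act", "cgt"))  # satisfies all three conditions
print(is_alignment("a-ct", "c-gt", "act", "cgt"))  # second column is a double gap
```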

How many alignments of two sequences exist?
Which is the best alignment?
4
2.2 Number of alignments
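Given two DNA sequences $A = (a_1 a_2 \ldots a_n)$
and $B = (b_1 b_2 \ldots b_m)$, the number of alignments $\#(\cdot,\cdot)$ satisfies
$\#(a_1 a_2 \ldots a_n,\, b_1 b_2 \ldots b_m) =$
  $\#(a_1 a_2 \ldots a_{n-1},\, b_1 b_2 \ldots b_m)$ (those that end with $(a_n, -)$)
  $+\ \#(a_1 a_2 \ldots a_n,\, b_1 b_2 \ldots b_{m-1})$ (those that end with $(-, b_m)$)
  $+\ \#(a_1 a_2 \ldots a_{n-1},\, b_1 b_2 \ldots b_{m-1})$ (those that end with $(a_n, b_m)$)
$\#(a_1,\, b_1) = \;?$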
5
2.2 Number of alignments
The table of counts starts with 1 in every cell of the first row and the first column:
    1   1   1   1
    1
    1
    1
6
2.2 Number of alignments
By the recurrence, each remaining cell is the sum of its upper, left, and upper-left neighbours (1 + 1 + 1 = 3):
    1   1   1   1
    1   3   ?
    1   ?
    1
7
2.2 Number of alignments
The second row and the second column follow in the same way:
    1   1   1   1
    1   3   5   7
    1   5   ?
    1   7
8
2.2 Number of alignments
The completed table of counts for sequences of up to three bases:
    1   1   1   1
    1   3   5   7
    1   5  13  25
    1   7  25  63
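
The recurrence for the number of alignments can be tabulated directly. The following is a minimal sketch, assuming that an empty prefix admits exactly one alignment with any other prefix (the 1s in the first row and column); the function name `alignment_counts` is hypothetical.

```python
def alignment_counts(n: int, m: int) -> list:
    """Table whose cell [i][j] counts the alignments of prefixes of length i and j."""
    # First row and first column: only one way to align against an empty prefix.
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            counts[i][j] = (
                counts[i - 1][j]        # alignments ending with (a_i, -)
                + counts[i][j - 1]      # alignments ending with (-, b_j)
                + counts[i - 1][j - 1]  # alignments ending with (a_i, b_j)
            )
    return counts


for row in alignment_counts(3, 3):
    print(row)
```

Only the sequence lengths matter for the count, so the function takes `n` and `m` rather than the sequences themselves.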
But what is the asymptotic value?
9
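2.2 Asymptotic value
As
$\#(a_1 a_2 \ldots a_n,\, b_1 b_2 \ldots b_m) \ge \binom{n+m}{n} = \frac{(n+m)!}{n!\, m!}$
and
$n! \approx n^n e^{-n}$ (Stirling approximation)
then, for large $n$,
$\#(a_1 a_2 \ldots a_n,\, b_1 b_2 \ldots b_n) > 2^{2n}$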
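
The step from the Stirling approximation to the exponential bound can be spelled out as follows. This is a sketch under two assumptions: that the count is bounded below by the central binomial coefficient $\binom{2n}{n}$ (the number of alignments in which every column contains a gap), and that the crude form $n! \approx n^n e^{-n}$ is used.

```latex
\binom{2n}{n} = \frac{(2n)!}{(n!)^2}
\approx \frac{(2n)^{2n}\, e^{-2n}}{\left(n^{n}\, e^{-n}\right)^{2}}
= \frac{2^{2n}\, n^{2n}\, e^{-2n}}{n^{2n}\, e^{-2n}}
= 2^{2n}
```

The full Stirling formula carries an extra factor $\sqrt{2\pi n}$, so $\binom{2n}{n}$ is closer to $2^{2n}/\sqrt{\pi n}$; the exponential growth is what matters here.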
10
2.2 Best alignment
How can an alignment be scored?
catcactactgacgactatcgtagcgcggctatacatctacgccaa-
ctac-t-gtgtagatcgccgg c- tgactgc--acgactatcgt-
attgcggctacacactacgcacaactactgtatgtcgc-cgg----


We then assign a score to each case (match, mismatch, gap), for example
1, -1, -2.
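
Scoring a given alignment column by column can be sketched as below. The mapping of the example values (match = 1, mismatch = -1, gap = -2) is an assumption about the order in which the slide lists them, and `score_alignment` is a hypothetical name.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed reading of the example scores


def score_alignment(row_a: str, row_b: str) -> int:
    """Sum the per-column scores of two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += GAP       # one of the rows has a gap in this column
        elif x == y:
            total += MATCH     # identical bases
        else:
            total += MISMATCH  # different bases
    return total


print(score_alignment("ac-t", "-cgt"))  # -2 + 1 - 2 + 1 = -2
```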
How can the best alignment be found?
11
2.2 Edit distance and alignment of strings
The best alignment of two strings
is related to the edit distance, first
discussed in 1966...
The most efficient algorithm was proposed in
1968 and in 1970,
using a technique called dynamic programming.
12
2.2 Best alignment
Columns: C T A C T A C T A C G T
Rows: A C T G A
14
2.2 Best alignment
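Columns: C T A C T A C T A C G T
Rows: A C T G A
Each cell contains the score of the best alignment of a prefix of the
row sequence and a prefix of the column sequence. For example, the cell
in row C and in the column of the second T contains the score of the
best alignment of AC and CTACT.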
15
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0   ?
A
C
T
G
A
16
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2   ?
A
C
T
G
A
Alignment of the empty prefix and C:
  -
  C
17
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2  -4   ?
A
C
T
G
A
Alignment of the empty prefix and CT:
  --
  CT
18
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2  -4  -6  -8
A
C
T
G
A
Alignment of the empty prefix and CTACTA:
  ------
  CTACTA
19
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2  -4  -6  -8
A    ?
C    ?
T    ?
G
A
20
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2  -4  -6  -8
A   -2
C   -4
T   -6
G
A
Alignment of ACT and the empty prefix:
  ACT
  ---
21
2.2 Best alignment
         C   T   A   C   T   A   C   T   A   C   G   T
     0  -2  -4  -6  -8
A   -2
C   -4
T   -6
G
A
BA(AC, CTAC): best alignment of AC and CTAC
$s(AC, CTAC) = \max \{\, s(AC, CTA) - 2,\; s(A, CTA) + 1,\; s(A, CTAC) - 2 \,\}$
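
The cell-by-cell rule on this slide generalises to the whole table. The following is a minimal sketch of that dynamic programming fill, assuming the example scores match = 1, mismatch = -1, gap = -2; the function name `best_score` is hypothetical.

```python
def best_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    s = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned against nothing costs one gap per base.
    for j in range(1, cols):
        s[0][j] = s[0][j - 1] + gap
    for i in range(1, rows):
        s[i][0] = s[i - 1][0] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(
                s[i][j - 1] + gap,       # column (-, b_j)
                s[i - 1][j - 1] + pair,  # column (a_i, b_j)
                s[i - 1][j] + gap,       # column (a_i, -)
            )
    return s[-1][-1]


print(best_score("AC", "CTAC"))  # -2
print(best_score("ACTGA", "CTACTACTACGT"))
```

For AC and CTAC the maximum comes from the middle term, s(A, CTA) + 1 = -3 + 1 = -2, which corresponds to the alignment --AC over CTAC.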
22
Best alignment
Given the maximum score, how can the best
alignment be found?
  • Quadratic cost in space and time
  • Sequences of up to 10,000 bp in length
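
One common way to recover an alignment from the score table is to walk back from the last cell, at each step choosing a neighbour that explains the current score. The sketch below assumes the example scores 1, -1, -2 and keeps the full table in memory, which is where the quadratic space cost comes from; `best_alignment` is a hypothetical name, and when several alignments tie this returns only one of them.

```python
def best_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (gapped_a, gapped_b, score) for one best global alignment."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + pair, s[i - 1][j] + gap, s[i][j - 1] + gap)

    # Traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), s[n][m]


row_a, row_b, score = best_alignment("AC", "CTAC")
print(row_a)
print(row_b)
print(score)
```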

23
2.2 Best alignment
  • Connect to http://alggen.lsi.upc.es/docencia/ember/lepa/Tfc1.htm
    and use the global method.