Introduction to Dynamic Programming - PowerPoint PPT Presentation

Slides: 64
Provided by: sophieda
1
Introduction to Dynamic Programming
  • Pair-wise Sequence Alignment Distance

2
Outline
  • DNA Sequence Comparison: First Success Stories
  • Dynamic Programming
  • Change Problem
  • Manhattan Tourist Problem
  • Longest Paths in Graphs
  • Pair-wise Sequence Alignment
  • Edit Distance
  • Longest Common Subsequence (LCS) Problem

3
DNA Sequence Comparison: First Success Story
  • Finding sequence similarities with genes of known
    function is a common approach to inferring a newly
    sequenced gene's function
  • In 1984 Russell Doolittle and colleagues found
    similarities between a cancer-causing gene and the
    normal growth factor (PDGF) gene

4
Cystic Fibrosis
  • Cystic fibrosis (CF) is a chronic and frequently
    fatal genetic disease of the body's mucus glands
    (abnormally high level of mucus in glands). CF
    primarily affects the respiratory systems in
    children.
  • Mucus is a slimy material that coats many
    epithelial surfaces and is secreted into fluids
    such as saliva
  • In the 1980s biologists hypothesized that CF is an
    autosomal recessive disorder caused by mutations
    in a gene that remained unknown until 1989

5
Finding Similarities between the Cystic Fibrosis
Gene and ATP Binding Proteins
  • ATP binding proteins are present on the cell
    membrane and act as transport channels
  • In 1989 biologists found similarity between the
    cystic fibrosis gene and ATP binding proteins
  • This suggested a plausible function for the cystic
    fibrosis gene, given that CF involves sweat
    secretion with abnormally high sodium levels

6
Cystic Fibrosis and the CFTR Protein
  • The CFTR (Cystic Fibrosis Transmembrane conductance
    Regulator) protein acts in the cell membrane
    of some epithelial cells that secrete mucus
  • These special cells might line the airways of the
    nose, lungs, the stomach wall, etc.

7
Mechanism of Cystic Fibrosis
  • The CFTR protein (1480 amino acids) regulates a
    chloride ion channel
  • Adjusts the wateriness of fluids secreted by
    the cell
  • Those with cystic fibrosis are missing one single
    amino acid in their CFTR
  • Mucus ends up being too thick, affecting many
    organs

8
Cystic Fibrosis: Finding the Gene
9
Cystic Fibrosis: Mutation Analysis
  • If a high % of cystic fibrosis (CF) patients have
    a certain mutation in the gene and the normal
    patients all have the wild type, then that could
    be an indicator of a mutation that is related to
    CF
  • A certain mutation was found in 70% of CF
    patients, convincing evidence that it is a
    predominant genetic diagnostic marker for CF

10
Cystic Fibrosis and CFTR Gene
The CF gene is on the long arm of chromosome 7
11
Bring in the Bioinformaticians
  • Gene similarities between two genes with known
    and unknown function alert biologists to some
    possibilities
  • Computing a similarity score between two genes
    tells how likely it is that they have similar
    functions
  • Dynamic programming is a commonly used technique
    for pair-wise sequence alignment
  • The Change Problem and Manhattan Tourist Problem
    are good problems to introduce the idea of
    dynamic programming

12
The Change Problem
  • Specify the problem precisely
  • Goal: Convert some amount of money M into given
    denominations, using the fewest possible number
    of coins
  • Input: An amount of money M, and an array of d
    denominations $c = (c_1, c_2, \ldots, c_d)$, in
    decreasing order of value ($c_1 > c_2 > \cdots > c_d$)
  • Output: A list of d integers $i_1, i_2, \ldots, i_d$ such
    that
  • $c_1 i_1 + c_2 i_2 + \cdots + c_d i_d = M$
  • and $i_1 + i_2 + \cdots + i_d$ is minimal

13
A Correct But VERY Slow Algorithm
  • BruteForceChange(M, c, d)
  •   smallestNumberOfCoins ← ∞
  •   for each (i_1, ..., i_d) from (0, ..., 0) to (M/c_1, ..., M/c_d)
  •     valueOfCoins ← Σ_{k=1..d} i_k c_k
  •     if valueOfCoins = M
  •       numberOfCoins ← Σ_{k=1..d} i_k
  •       if numberOfCoins < smallestNumberOfCoins
  •         smallestNumberOfCoins ← numberOfCoins
  •         bestChange ← (i_1, i_2, ..., i_d)
  •   return (bestChange)
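
The brute-force search above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `brute_force_change` and the use of `itertools.product` to enumerate the coin-count vectors are choices made here, not part of the slides.

```python
from itertools import product

def brute_force_change(M, c):
    """Try every vector (i_1, ..., i_d) with 0 <= i_k <= M // c_k."""
    best_count = float("inf")
    best_change = None
    # One range of candidate counts per denomination.
    ranges = [range(M // ck + 1) for ck in c]
    for counts in product(*ranges):
        value = sum(i * ck for i, ck in zip(counts, c))
        if value == M:
            number_of_coins = sum(counts)
            if number_of_coins < best_count:
                best_count = number_of_coins
                best_change = counts
    return best_change

# Denominations in decreasing order, as in the problem statement.
print(brute_force_change(10, (5, 3, 1)))
```

The number of vectors examined grows as the product of the ranges, which is why the slide calls this approach correct but very slow.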

14
Change Problem Example
Given the denominations 1, 3, and 5, what is the
minimum number of coins needed to make change for
a given value?
Only one coin is needed to make change for the
values 1, 3, and 5
15
Change Problem Example (contd)
Given the denominations 1, 3, and 5, what is the
minimum number of coins needed to make change for
a given value?
Value            1  2  3  4  5  6  7  8  9  10
Min # of coins   1  2  1  2  1  2  -  2  -  2
However, two coins are needed to make change for
the values 2, 4, 6, 8, and 10 (the entries for 7
and 9 are still open).
16
Change Problem Example (contd)
Given the denominations 1, 3, and 5, what is the
minimum number of coins needed to make change for
a given value?
Value            1  2  3  4  5  6  7  8  9  10
Min # of coins   1  2  1  2  1  2  3  2  3  2
Lastly, three coins are needed to make change for
the values 7 and 9
17
Change Problem Recurrence
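This example is expressed by the following
recurrence relation:
\[
\mathrm{bestNumCoins}(M) = \min \begin{cases} \mathrm{bestNumCoins}(M-1) + 1 \\ \mathrm{bestNumCoins}(M-3) + 1 \\ \mathrm{bestNumCoins}(M-5) + 1 \end{cases}
\]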
18
Change Problem Recurrence (contd)
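Given the denominations $c = (c_1, c_2, \ldots, c_d)$, the
recurrence relation is
\[
\mathrm{bestNumCoins}(M) = \min \begin{cases} \mathrm{bestNumCoins}(M-c_1) + 1 \\ \mathrm{bestNumCoins}(M-c_2) + 1 \\ \quad\vdots \\ \mathrm{bestNumCoins}(M-c_d) + 1 \end{cases}
\]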
19
Change Problem: A Recursive Algorithm
  • RecursiveChange(M, c, d)
  •   if M = 0
  •     return 0
  •   bestNumCoins ← infinity
  •   for i ← 1 to d
  •     if M ≥ c_i
  •       numCoins ← RecursiveChange(M - c_i, c, d)
  •       if numCoins + 1 < bestNumCoins
  •         bestNumCoins ← numCoins + 1
  •   return bestNumCoins
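
A direct Python transcription of the recursive pseudocode might look like the sketch below. The function name `recursive_change` is an assumption made for illustration; the logic follows the slide line by line and is therefore just as slow.

```python
import math

def recursive_change(M, c):
    """Minimum number of coins for amount M, by plain recursion."""
    if M == 0:
        return 0
    best_num_coins = math.inf
    for ci in c:
        if M >= ci:
            # Solve the smaller amount, then add the coin ci.
            num_coins = recursive_change(M - ci, c)
            if num_coins + 1 < best_num_coins:
                best_num_coins = num_coins + 1
    return best_num_coins

# Reproduces the table for denominations 1, 3, and 5.
print([recursive_change(m, (1, 3, 5)) for m in range(1, 11)])
```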

20
RecursiveChange Is Not Efficient
  • It recalculates the optimal coin combination for
    a given amount of money repeatedly
  • e.g., M = 77, c = (1, 3, 7)
  • The optimal coin combination for 70 cents is
    computed 10 times (9 of them redundant)!

21
The RecursiveChange Tree
[Figure: recursion tree of RecursiveChange for M = 77, c = (1, 3, 7). The root 77 has children 76, 74, and 70; every node m has children m - 1, m - 3, and m - 7, so values such as 70 appear again and again at deeper levels.]
22
We Can Do Better
  • We're re-computing values in our algorithm more
    than once
  • Save results of each computation for 0 to M
    (memoization)
  • This way, we can look up an already computed
    value instead of re-computing it each time
  • Running time: O(M × d), where M is the value of money
    and d is the number of denominations
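
One way to picture the memoization idea is to wrap the recursive function in a cache. The sketch below assumes Python's standard `functools.lru_cache`; the names `memoized_change` and `best` are hypothetical and chosen only for this example.

```python
from functools import lru_cache

def memoized_change(M, c):
    """Recursive change-making where each amount is solved only once."""
    c = tuple(c)

    @lru_cache(maxsize=None)
    def best(m):
        if m == 0:
            return 0
        # Each candidate uses one coin ci plus the cached answer for m - ci.
        candidates = [best(m - ci) + 1 for ci in c if m >= ci]
        return min(candidates, default=float("inf"))

    return best(M)

print(memoized_change(77, (1, 3, 7)))
```

With the cache in place there are at most M + 1 distinct calls, each doing d units of work, which matches the O(M × d) bound on the slide.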

23
DPChange Example
c = (1, 3, 7), M = 9
[Figure: the array bestNumCoins is filled in one entry at a time, for m = 0, 1, ..., 9.]
m                 0  1  2  3  4  5  6  7  8  9
bestNumCoins[m]   0  1  2  1  2  3  2  1  2  3
24
The Change Problem: Dynamic Programming
  • DPChange(M, c, d)
  •   bestNumCoins[0] ← 0
  •   for m ← 1 to M
  •     bestNumCoins[m] ← infinity
  •     for i ← 1 to d
  •       if m ≥ c_i
  •         if bestNumCoins[m - c_i] + 1 < bestNumCoins[m]
  •           bestNumCoins[m] ← bestNumCoins[m - c_i] + 1
  •   return bestNumCoins[M]

What is the complexity of this algorithm?
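
A bottom-up version of DPChange could be sketched in Python as below. The function name `dp_change` and the extra `last_coin` array (used to recover which coins were chosen) are additions made for this illustration and do not appear on the slides; the reconstruction assumes the amount is reachable, for example because a 1-cent coin is available.

```python
def dp_change(M, c):
    """Fill best[m] for m = 0..M, then trace back one optimal coin list."""
    INF = float("inf")
    best = [0] + [INF] * M
    last_coin = [None] * (M + 1)  # hypothetical helper for the traceback
    for m in range(1, M + 1):
        for ci in c:
            if m >= ci and best[m - ci] + 1 < best[m]:
                best[m] = best[m - ci] + 1
                last_coin[m] = ci
    # Walk back from M, peeling off the coin recorded for each amount.
    coins = []
    m = M
    while m > 0 and last_coin[m] is not None:
        coins.append(last_coin[m])
        m -= last_coin[m]
    return best, coins

best, coins = dp_change(9, (1, 3, 7))
print(best)   # the table for c = (1, 3, 7), M = 9
print(coins)  # one optimal set of coins
```

The two nested loops make the cost visible: M iterations of the outer loop times d iterations of the inner loop.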
25
Dynamic Programming
  • Basic idea: solve an instance of a problem by
    taking advantage of computed solutions for
    smaller subparts of the problem
  • Initializing from the smallest cases
  • Caching subproblem solutions (memoization) rather
    than recomputing them
  • Assumes a recursive relation between the current
    problem and its smaller subparts

26
Manhattan Tourist Problem (MTP)
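Imagine seeking a path (from source to sink) to
travel (only eastward and southward) with the
largest number of attractions (*) in the Manhattan
grid
[Figure: map of the Manhattan street grid with attractions marked (*), the source at one corner and the sink at the opposite corner.]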
27
Manhattan Tourist Problem: Formulation
Goal: Find the longest path in a weighted grid.
Input: A weighted grid G with two distinct
vertices, one labeled source and the other
labeled sink
Output: A longest path in G from source to
sink
28
MTP: An Example
[Figure: weighted grid with i coordinates and j coordinates running from 0 to 4, the source and sink vertices marked, and a weight on every edge.]
29
MTP: Greedy Algorithm Is Not Optimal
[Figure: weighted grid from source to sink showing the greedy path. It makes a promising start, but leads to bad choices!]
30
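MTP: Recurrence
Computing the score for a point (i, j) by the
recurrence relation:
\[
s_{i,j} = \max \begin{cases} s_{i-1,j} + \text{weight of the edge between } (i-1, j) \text{ and } (i, j) \\ s_{i,j-1} + \text{weight of the edge between } (i, j-1) \text{ and } (i, j) \end{cases}
\]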
31
MTP: Simple Recursive Program
  • MT(n, m)
  •   x ← MT(n-1, m) + length of the edge from (n-1, m) to (n, m)
  •   y ← MT(n, m-1) + length of the edge from (n, m-1) to (n, m)
  •   return max{x, y}
  • What's wrong with this approach?

32
MTP: Dynamic Programming
[Figure: grid with the source at (0, 0); the first step fills in S(0,1) = 1 and S(1,0) = 5.]
  • Calculate the optimal path score for each vertex in
    the graph
  • Each vertex's score is the maximum, over its prior
    vertices, of the prior vertex's score plus the
    weight of the edge in between

33
MTP Dynamic Programming (contd)
[Figure: the next step fills in S(0,2) = 3, S(1,1) = 4, and S(2,0) = 8.]
34
MTP Dynamic Programming (contd)
[Figure: the next step fills in S(0,3) = 8, S(1,2) = 13, S(2,1) = 9, and S(3,0) = 8.]
35
MTP Dynamic Programming (contd)
[Figure: the next step fills in S(1,3) = 8, S(2,2) = 12, and S(3,1) = 9. The greedy algorithm fails!]
36
MTP Dynamic Programming (contd)
[Figure: the next step fills in S(2,3) = 15 and S(3,2) = 9.]
37
MTP Dynamic Programming (contd)
[Figure: the completed grid, showing all back-traces. Done!]
Vertex scores S(i,j):
        j=0  j=1  j=2  j=3
  i=0     0    1    3    8
  i=1     5    4   13    8
  i=2     8    9   12   15
  i=3     8    9    9   16
S(3,3) = 16
38
The Dynamic-Programming Manhattan Tourist
Algorithm
  • ManhattanTourist(wd, wr, n, m)
  •   s(0, 0) ← 0
  •   for i ← 1 to n
  •     s(i, 0) ← s(i - 1, 0) + wd(i, 0)
  •   for j ← 1 to m
  •     s(0, j) ← s(0, j - 1) + wr(0, j)
  •   for i ← 1 to n
  •     for j ← 1 to m
  •       s(i, j) ← max{ s(i - 1, j) + wd(i, j), s(i, j - 1) + wr(i, j) }
  •   return s(n, m)

What's the complexity of this algorithm?
The running time is O(n × m) for an n-by-m grid
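
The ManhattanTourist pseudocode translates almost directly into Python. In the sketch below, `wd[i][j]` is taken to be the weight of the edge entering (i, j) from above and `wr[i][j]` the weight of the edge entering it from the left; that indexing, the function name, and the specific edge weights are assumptions. The weights are one reading of the flattened example figure, chosen so that the resulting scores agree with the S(i,j) values quoted on the earlier slides.

```python
def manhattan_tourist(wd, wr, n, m):
    """Return the full score table s, where s[n][m] is the longest path score."""
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: only "down" moves
        s[i][0] = s[i - 1][0] + wd[i][0]
    for j in range(1, m + 1):          # first row: only "right" moves
        s[0][j] = s[0][j - 1] + wr[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + wd[i][j],
                          s[i][j - 1] + wr[i][j])
    return s

# Assumed edge weights (None marks entries that are never read).
wd = [[None] * 4, [5, 3, 10, -5], [3, 5, -3, 2], [0, 0, -5, 1]]
wr = [[None, 1, 2, 5], [None, -5, 1, -5], [None, -5, 3, 3], [None, 0, 0, 0]]

for row in manhattan_tourist(wd, wr, 3, 3):
    print(row)
```

Each of the (n + 1)(m + 1) table entries is computed once with a constant amount of work, which is where the O(n × m) running time comes from.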
39
Manhattan Is Not A Perfect Grid
What about diagonals?
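  • The score at point B is given by the maximum, over
    all edges entering B (including diagonals), of the
    score at the edge's starting point plus the weight
    of the edge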

40
DAG: Directed Acyclic Graph
  • Since Manhattan is not a perfect regular grid, we
    may represent it as a DAG
  • DAG for the "dressing in the morning" problem

41
Topological ordering
  • 2 different topological orderings of the DAG

42
Longest Path in DAG Problem
  • Goal: Find a longest path between two vertices in
    a weighted DAG
  • Input: A weighted DAG G with source and sink
    vertices
  • Output: A longest path in G from source to sink

43
Manhattan Is Not A Perfect Grid (contd)
Computing the score for point x is given by the
recurrence relation:
\[
s_x = \max_{y \in \mathrm{Predecessors}(x)} \{\, s_y + \text{weight of the edge } (y, x) \,\}
\]
  • Predecessors(x): the set of vertices that have
    edges leading to x
  • The running time for a DAG G(V, E) (V is
    the set of all vertices and E is the set of all
    edges) is O(E), since each edge is evaluated
    once
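
The recurrence over Predecessors(x) can be evaluated by visiting vertices in a topological order. The sketch below assumes Python 3.9+ for the standard `graphlib.TopologicalSorter`; the function name `longest_path_dag`, the edge-list input format, and the small example graph are all made up for illustration, and the traceback assumes the sink is reachable from the source.

```python
from graphlib import TopologicalSorter

def longest_path_dag(edges, source, sink):
    """edges: iterable of (u, v, weight). Returns (score, path) from source to sink."""
    preds = {}
    for u, v, w in edges:
        preds.setdefault(v, []).append((u, w))
        preds.setdefault(u, [])
    # TopologicalSorter expects a mapping: node -> iterable of predecessors.
    order = TopologicalSorter(
        {x: [y for y, _ in ps] for x, ps in preds.items()}
    ).static_order()
    score = {x: float("-inf") for x in preds}
    score[source] = 0
    back = {}
    for x in order:
        for y, w in preds[x]:          # each edge is looked at exactly once
            if score[y] + w > score[x]:
                score[x] = score[y] + w
                back[x] = y
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return score[sink], path[::-1]

# Hypothetical example DAG.
edges = [("A", "B", 1), ("A", "C", 4), ("B", "C", 5), ("B", "D", 2), ("C", "D", 1)]
print(longest_path_dag(edges, "A", "D"))
```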

44
Aligning Sequences without Insertions and
Deletions: Hamming Distance
Given two DNA sequences V and W:
V = ATATATAT
W = TATATATA
  • The Hamming distance $d_H(V, W) = 8$ is large
    but the sequences are very similar
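
As a quick check of the number above, the Hamming distance is just a position-by-position comparison. The helper name `hamming_distance` is chosen for this sketch.

```python
def hamming_distance(v, w):
    """Count positions where two equal-length strings differ."""
    if len(v) != len(w):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(v, w))

print(hamming_distance("ATATATAT", "TATATATA"))
```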

45
Aligning Sequences with Insertions and Deletions
However, by shifting one sequence over one
position:
V = -ATATATAT
W = TATATATA-
  • Using Hamming distance neglects insertions and
    deletions in DNA
  • The edit distance d(v, w) = 2.

46
Edit Distance
  • Levenshtein (1966) introduced edit distance of
    two strings as the minimum number of elementary
    operations (insertions, deletions, and
    substitutions) to transform one string into the
    other
  • d(v, w) = minimum number of elementary operations
    needed to transform v → w
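
Edit distance can itself be computed by dynamic programming over prefixes of the two strings. The slides do not give this algorithm explicitly, so the sketch below is the standard textbook formulation with unit costs for insertion, deletion, and substitution; the function name `edit_distance` is an assumption.

```python
def edit_distance(v, w):
    """Levenshtein distance between v and w with unit operation costs."""
    n, m = len(v), len(w)
    # d[i][j] = edit distance between the first i letters of v
    # and the first j letters of w.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                    # delete all i letters
    for j in range(m + 1):
        d[0][j] = j                    # insert all j letters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if v[i - 1] == w[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                 # deletion
                          d[i][j - 1] + 1,                 # insertion
                          d[i - 1][j - 1] + substitution)  # match or substitution
    return d[n][m]

print(edit_distance("ATATATAT", "TATATATA"))
```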

47
Edit Distance (contd)
Hamming distance: the ith letter of v is compared with the ith letter of w
V = ATATATAT
W = TATATATA
Edit distance: just one shift makes it all line up
V = -ATATATAT
W = TATATATA-
Edit distance d(v, w) = 2 (one insertion and
one deletion)
48
Edit Distance Example
  • 5 edit operations: TGCATAT → ATCCGAT
  • TGCATAT → (delete last T)
  • TGCATA → (delete last A)
  • TGCAT → (insert A at front)
  • ATGCAT → (substitute C for G in 3rd position)
  • ATCCAT → (insert G before last A)
  • ATCCGAT (Done)

-TG-CATAT
ATCCGAT--
49
Edit Distance Example (contd)
  • 4 edit operations: TGCATAT → ATCCGAT
  • TGCATAT → (insert A at front)
  • ATGCATAT → (delete T in 6th position)
  • ATGCAAT → (substitute G for A in 5th position)
  • ATGCGAT → (substitute C for G in 3rd position)
  • ATCCGAT (Done)

-TGCATAT
ATCCG-AT
50
Alignment: 2-row representation
Given 2 DNA sequences v (m = 7) and w (n = 6)
Alignment: a 2 × k matrix (k ≥ m, n) that is
optimal
[Figure: the two rows of the alignment, with the letters of v in the top row, the letters of w in the bottom row, and "--" marking gaps.]
5 matches, 2 insertions, 2 deletions
51
The Alignment Grid
  • 2 sequences used for grid
  • V = ATGTTAT
  • W = ATCGTAC
  • Every alignment path is from source to sink
  • Look for the path with the optimal score.
    Definition of sc?

52
Longest Common Subsequence (LCS) Problem
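  • Given two sequences $v = v_1 v_2 \ldots v_m$ and
    $w = w_1 w_2 \ldots w_n$
  • The LCS of v and w is a sequence of positions
    in
  • v: $1 \le i_1 < i_2 < \cdots < i_t \le m$
  • and a sequence of positions in
  • w: $1 \le j_1 < j_2 < \cdots < j_t \le n$
  • such that the $i_t$-th letter of v equals the $j_t$-th
    letter of w, and t is maximal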

What is the score function here?
53
LCS Example
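[Figure: LCS grid with i coords (elements of v) on one axis and j coords (elements of w) on the other; matches shown in red.]
Path through the grid:
(0,0) → (1,0) → (2,1) → (2,2) → (3,3) → (3,4) → (4,5) → (5,5) → (6,6) → (7,6) → (8,7)
The path determines the matched positions in v and the matched positions in w.
The LCS Problem can be expressed using a grid
similar to the Manhattan Tourist Problem grid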
54
Edit Graph for LCS Problem
[Figure: edit graph for the LCS problem, with the letters of one sequence along each axis of the grid.]
55
LCS Dynamic Programming
  • Find the LCS of two strings
  • Input: A weighted graph G with two distinct
    vertices, one labeled source and one labeled sink
  • Output: A longest path in G from source to
    sink
  • Solve using an LCS edit graph in which diagonal
    edges have weight 1 if they correspond to
    matches; all other edges have weight 0.

56
Computing LCS
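Let $v_i$ = prefix of v of length i: $v_1 \ldots v_i$,
and $w_j$ = prefix of w of length j: $w_1 \ldots w_j$.
The length of LCS($v_i$, $w_j$) is computed by
\[
s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1 & \text{if the } i\text{-th letter of } v \text{ equals the } j\text{-th letter of } w \end{cases}
\]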
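
The LCS length table can be filled row by row from the three-way maximum over the top, left, and diagonal neighbours. The Python sketch below uses a hypothetical function name `lcs_table` and treats row 0 and column 0 as the all-zero boundary for empty prefixes.

```python
def lcs_table(v, w):
    """s[i][j] = length of an LCS of the first i letters of v and first j of w."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])          # top, left
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)  # diagonal match
    return s

table = lcs_table("ATGTTAT", "ATCGTAC")
for row in table:
    print(row)
```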
57
Align Two Strings
  • Given the strings of DNA
  • v = ATGTTAT
  • w = ATCGTAC
  • One possible alignment of the strings
  • AT_GTTAT_
    ATCGT_A_C

LCS score = 5. However, is this the optimal
alignment?
58
Dynamic Programming Example
  • There are no matches in the beginning of the
    sequence
  • Label the first column and the first row of the
    grid to be all zero

[Figure: alignment grid with zeros along the first row and first column.]
59
Dynamic Programming Example
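S(i,j) is the maximum of:
  • value from NW + 1, if v_i = w_j
  • value from North (top)
  • value from West (left)
Grid after the first row and column of letters are filled in
(v = ATGTTAT down the side, w = ATCGTAC across the top):
         A  T  C  G  T  A  C
      0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1
   T  0  1
   G  0  1
   T  0  1
   T  0  1
   A  0  1
   T  0  1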
60
Alignment Backtracking
  • Arrows show where the score
    originated from.
  • ↓ if from the top
  • → if from the left
  • ↘ if v_i = w_j (from the diagonal)

61
Backtracking Example
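Find the matches in row and column 2:
  • i = 2, j = 2, 5 is a match (T)
  • j = 2, i = 4, 5, 7 is a match (T)
Since v_i = w_j, S(i,j) = S(i-1,j-1) + 1:
  • S(2,2) = S(1,1) + 1 = 1 + 1 = 2
  • S(2,5) = S(1,4) + 1 = 1 + 1 = 2
  • S(4,2) = S(3,1) + 1 = 1 + 1 = 2
  • S(5,2) = S(4,1) + 1 = 1 + 1 = 2
  • S(7,2) = S(6,1) + 1 = 1 + 1 = 2
         A  T  C  G  T  A  C
      0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1
   T  0  1  2  2  2  2  2  2
   G  0  1  2
   T  0  1  2
   T  0  1  2
   A  0  1  2
   T  0  1  2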
62
Backtracking Example
Continuing with the scoring algorithm gives this
result.
         A  T  C  G  T  A  C
      0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1
   T  0  1  2  2  2  2  2  2
   G  0  1  2  2  3  3  3  3
   T  0  1  2  2  3  4  4  4
   T  0  1  2  2  3  4  4  4
   A  0  1  2  2  3  4  5  5
   T  0  1  2  2  3  4  5  5
63
Now What?
  • LCS(v,w) created the alignment grid
  • Now we need a way to read the best alignment of v
    and w
  • Follow the arrows backwards from sink
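
Following the arrows backwards from the sink can be sketched as below. Rather than storing arrows, this version re-derives each step from the score table; the function name `lcs_alignment`, the "-" gap symbol, and the tie-breaking rule (prefer the top neighbour over the left one) are assumptions, so the alignment it prints is one optimal alignment and need not be the one drawn on the slides.

```python
def lcs_alignment(v, w):
    """Return (lcs, aligned_v, aligned_w) by backtracking through the LCS table."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)
    top, bottom, lcs = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and v[i - 1] == w[j - 1]:
            # Diagonal arrow: a match, both letters are consumed.
            top.append(v[i - 1]); bottom.append(w[j - 1]); lcs.append(v[i - 1])
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or s[i - 1][j] >= s[i][j - 1]):
            # Arrow from the top: letter of v against a gap.
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:
            # Arrow from the left: gap against a letter of w.
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return ("".join(reversed(lcs)),
            "".join(reversed(top)),
            "".join(reversed(bottom)))

lcs, aligned_v, aligned_w = lcs_alignment("ATGTTAT", "ATCGTAC")
print(lcs)
print(aligned_v)
print(aligned_w)
```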

64
The LCS Problem
  • The previous example was a solution to the
    Longest Common Subsequence (LCS) problem, the
    simplest form of sequence similarity analysis.
  • To solve the alignment we eliminate mismatches
    and allow only insertions and deletions.