Dynamic Programming contd - PowerPoint PPT Presentation

Loading...

PPT – Dynamic Programming contd PowerPoint presentation | free to download - id: 2cfc6-MDAxY



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Dynamic Programming contd

Description:

Begins by selecting either all putative exons between potential acceptor and ... Then find the chain of exons such that the sequence similarity to the target ... – PowerPoint PPT presentation

Number of Views:134
Avg rating:3.0/5.0
Slides: 43
Provided by: Saurab5
Learn more at: http://www.cs.uiuc.edu
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Dynamic Programming contd


1
Dynamic Programming (contd)
  • CS 498 SS
  • Saurabh Sinha

2
Spliced Alignment
  • Begins by selecting either all putative exons
    between potential acceptor and donor sites or by
    finding all substrings similar to the target
    protein (as in the Exon Chaining Problem).
  • This set is further filtered in a such a way that
    attempt to retain all true exons, with some false
    ones.
  • Then find the chain of exons such that the
    sequence similarity to the target protein
    sequence is maximized

3
Spliced Alignment Problem Formulation
  • Input Genomic sequences G, target sequence T,
    and a set of candidate exons (blocks) B.
  • Output A chain of exons G such that the global
    alignment score between G and T is maximized
  • G - concatenation of all exons from chain G

4
The DAG
  • Vertices One vertex for each block in B
  • Directed edge connecting non-overlapping blocks
  • Label of vertex string of block it represents
  • A path through the DAG spells out the string
    obtained by concatenating that particular chain
    of blocks
  • Weight of a path is the score of the optimal
    alignment between the string it spells out and
    the target sequence

5
Dynamic programming
  • Genomic sequence G g1g2gn
  • Target sequence T t1t2tm
  • As usual, we want to find the optimal alignment
    score of the i-prefix of G and the j-prefix of T
  • Problem is, there are many i-prefixes possible
    (since multiple blocks may include position i)

6
Idea
  • Find the optimal alignment score of the i-prefix
    of G and the j-prefix of T assuming that this
    alignment uses a particular block B at position i
  • S(i, j, B)
  • For every block B that includes i

7
Recurrence
  • If i is not the starting vertex of block B
  • S(i, j, B)
  • max S(i 1, j, B) indel penalty
  • S(i, j 1, B) indel penalty
  • S(i 1, j 1, B) d(gi, tj)
  • If i is the starting vertex of block B
  • S(i, j, B)
  • max S(i, j 1, B) indel penalty
  • maxall blocks B preceding block B S(end(B), j,
    B) indel penalty
  • maxall blocks B preceding block B S(end(B), j
    1, B) d(gi, tj)

8
RNA secondary structure prediction
9
RNA
  • RNA is similar to DNA chemically. It is usually
    only a single strand. T(hyamine) is replaced by
    U(racil)
  • Some forms of RNA can form secondary structures
    by pairing up with itself. This can have
    change its properties dramatically.
  • DNA and RNA
  • can pair with
  • each other.

http//www.cgl.ucsf.edu/home/glasfeld/tutorial/trn
a/trna.gif
tRNA linear and 3D view
10
RNA
  • Theres more to RNA than mRNA
  • RNA can adopt interesting non-linear structures,
    and catalyze reactions
  • tRNAs (transfer RNAs) are the adapters that
    implement translation

11
Secondary structure
  • Several interesting RNAs have a conserved
    secondary structure (resulting from base-pairing
    interactions)
  • Sometimes, the sequence itself may not be
    conserved for the function to be retained
  • It is important to tell what the secondary
    structure is going to be, for homology detection

12
Conserved secondary structure
N-Y A A N-N N-N R N-N N-N N-N
N-N N-N / N
Consensus binding site for R17 phage
coat protein. N A/C/G/U, N is a
complementary base pairing to N, Y is C/U, R is
A/G Source DEKM
13
Basics of secondary structure
  • G-C pairing three bonds (strong)
  • A-U pairing two bonds (weaker)
  • Base pairs are approximately coplanar

14
Basics of secondary structure
15
Basics of secondary structure
  • G-C pairing three bonds (strong)
  • A-U pairing two bonds (weaker)
  • Base pairs are approximately coplanar
  • Base pairs are stacked onto other base pairs
    (arranged side by side) stems

16
Secondary structure elements
loop at the end of a stem
stem loop
on both sides of stem
single stranded bases within a stem
only on one side of stem
Loop single stranded subsequences bounded by
base pairs
17
Non-canonical base pairs
  • G-C and A-U are the canonical base pairs
  • G-U is also possible, almost as stable

18
Nesting
  • Base pairs almost always occur in a nested
    fashion
  • If positions i and j are paired, and positions i
    and j are paired, then these two base-pairings
    are said to be nested if
  • i
  • Non-nested base pairing pseudoknot

19
Pseudoknot
(9, 18) (2, 11) NOT NESTED
9
18
11
2
20
Pseudoknot problems
  • Pseudoknots are not handled by the algorithms we
    shall see
  • Pseudoknots do occur in many important RNAs
  • But the total number of pseudoknotted base pairs
    is typically relatively small

21
Secondary structure predictionApproach 1.
  • Find the secondary structure with most base
    pairs.
  • Nussinovs algorithm
  • Recursive finds best structure for small
    subsequences, and works its way outwards to
    larger subsequences

22
Nussinovs algorithm idea
  • There are only four possible ways of getting the
    best structure for subsequence (i,j) from the
    best structures of the smaller subsequences
  • (1) Add unpaired position i onto best structure
    for subsequence (i1,j)

i1
j
i
23
Nussinovs algorithm idea
  • There are only four possible ways of getting the
    best structure for subsequence (i,j) from the
    best structures of the smaller subsequences
  • (2) Add unpaired position j onto best structure
    for subsequence (i,j-1)

j-1
i
j
24
Nussinovs algorithm idea
  • There are only four possible ways of getting the
    best structure for subsequence (i,j) from the
    best structures of the smaller subsequences
  • (3) Add (i,j) pair onto best structure for
    subsequence (i1,j-1)

i1
j-1
j
i
25
Nussinovs algorithm idea
  • There are only four possible ways of getting the
    best structure for subsequence (i,j) from the
    best structures of the smaller subsequences
  • (4)Combine two optimal substructures (i,k) and
    (k1,j)

i
k
k1
j
26
Nussinov RNA folding algorithm
  • Given a sequence s of length L with symbols s1
    sL. Let ?(i,j) 1 if si and sj are a
    complementary base pair, and 0 otherwise.
  • We recursively calculate scores g(i,j) which are
    the maximal number of base pairs that can be
    formed for subsequence sisj.
  • Dynamic programming

27
Recursion
  • Starting with all subsequences of length 2, to
    length L
  • g(i,j) max of
  • g(i1, j)
  • g(i,j-1)
  • g(i1,j-1) ?(i,j)
  • maxi
  • Initialization
  • g(i,i-1) 0
  • g(i,i) 0

O(n2) ?
No. O(n3)
28
Traceback
  • As usual in sequence alignment ?
  • Optimal sequence alignment is a linear path in
    the dynamic programming table
  • Optimal secondary structure can have
    bifurcations
  • Traceback uses a pushdown stack

29
Traceback
Push (1,L) onto stack Repeat until stack is
empty pop (i,j) if i j continue else if
g(i1,j) g(i,j) push (i1,j) else if g(i,j-1)
g(i,j) push (i,j-1) else if g(i1,j-1) ?(i,j)
g(i,j) record (i,j) base pair push
(i1,j-1) else for k i1 to j-1, if
g(i,k)g(k1,j) g(i,j) push (k1,j) push
(i,k) break (for loop)
30
Secondary structure predictionApproach 2
  • Based on minimization of ?G, the equilibrium free
    energy, rather than maximization of number of
    base pairs
  • Better fit to real (experimental) ?G
  • Energy of stem is sum of stacking contributions
    from the interface between neighboring base pairs

31
Source DEKM
U U A A G-C G-C A G-C U-A A-U
C-G A-U A A
4nt loop 5.9
-1.1 terminal mismatch of hairpin
-2.9 stack
1nt bulge 3.3
-2.9 stack (special case)
-1.8 stack
-0.9 stack
-1.8 stack
-2.1 stack
dangle -0.3
  • Neighboring base pairs stack
  • Single bulges OK in stacking
  • Longer bulges no stacking term
  • Loop destabilisation energy
  • Loop terminal mismatch energy

32
hairpin loop exactly one base pair internal
loop exactly two base pairs bulge internal loop
with one base from each base pair
being adjacent multibranched loop 2 base pairs
in a loop, one base pair is closest to ends of
RNA. this is the exterior or closing base
pair all other base pairs are interior
Source Martin Tompas lecture notes
33
Energy contributions
  • eS(i,j) Free energy of stacked pair (i,j) and
    (i1,j-1)
  • eH(i,j) Free energy of a loop closed by (i,j)
    depends on length of loop, bases at i,j, and
    bases adjacent to them
  • eL(i,j,i,j) Free energy of an internal loop or
    bulge, with (i,j) and (i,j) being the bordering
    base pairs. Depends on bases at these positions,
    and unpaired bases adjacent to them
  • eM(i,j,i1,j1,ik,jk) Free energy of a
    multibranch loop with (i,j) as the closing base
    pair and i1j1 etc as the internal base pairs

34
Zukers algorithmDynamic programming
  • W(j) FE of optimal structure of s1..j
  • V(i,j) FE of optimal structure of si..j
    assuming i,j form a base pair
  • VBI(i,j) FE of optimal structure of si..j
    assuming i,j closes a bulge or internal loop
  • VM(i,j) FE of optimal structure of si..j
    assuming i,j closes a multibranch loop
  • WM(i,j) used to compute VM

35
Dynamic programming recurrences
  • W(j) FE of optimal structure of s1..j
  • W(0) 0
  • W(j) min( W(j-1),
  • min1

36
Dynamic programming recurrences
  • V(i,j) FE of optimal structure of si..j
    assuming i,j form a base pair
  • V(i,j) infinity if i j
  • V(i,j) min( eH(i,j),
  • eS(i,j) V(i1,j-1),
  • VBI(i,j),
  • VM(i,j)) if i

i,j is exterior base pair of a hairpin loop
37
Dynamic programming recurrences
  • V(i,j) FE of optimal structure of si..j
    assuming i,j form a base pair
  • V(i,j) infinity if i j
  • V(i,j) min( eH(i,j),
  • eS(i,j) V(i1,j-1),
  • VBI(i,j),
  • VM(i,j)) if i

i,j is exterior pair of a stacked pair. i1,j-1
is therefore a pair too.
38
Dynamic programming recurrences
  • V(i,j) FE of optimal structure of si..j
    assuming i,j form a base pair
  • V(i,j) infinity if i j
  • V(i,j) min( eH(i,j),
  • eS(i,j) V(i1,j-1),
  • VBI(i,j),
  • VM(i,j)) if i

i,j is exterior pair of a bulge or interior loop
39
Dynamic programming recurrences
  • V(i,j) FE of optimal structure of si..j
    assuming i,j form a base pair
  • V(i,j) infinity if i j
  • V(i,j) min( eH(i,j),
  • eS(i,j) V(i1,j-1),
  • VBI(i,j),
  • VM(i,j)) if i

i,j is exterior pair of a multibranch loop
40
Dynamic programming recurrences
  • VBI(i,j) FE of optimal structure of si..j
    assuming i,j closes a bulge or internal loop
  • VBI(i,j) min (eL(i,j,i,j) V(i,j))
  • Slow !

i,j i
Energy of the bulge
41
Dynamic programming recurrences
  • VM(i,j) FE of optimal structure of si..j
    assuming i,j closes a multibranch loop
  • VM(i,j) min (eM(i,j,i1,j1,..,ik,jk)
    ?hV(ih,jh))
  • Very slow !

k2 i1,j1,ik,jk
Energy of the loop itself
42
Order of computation
  • What order to fill the DP table in ?
  • Increasing order of (j-i)
  • VBI(i,j) and VM(i,j) before V(i,j)
About PowerShow.com