Loading...

PPT – Dynamic Programming contd PowerPoint presentation | free to download - id: 2cfc6-MDAxY

The Adobe Flash plugin is needed to view this content

Dynamic Programming (contd)

- CS 498 SS
- Saurabh Sinha

Spliced Alignment

- Begins by selecting either all putative exons

between potential acceptor and donor sites or by

finding all substrings similar to the target

protein (as in the Exon Chaining Problem). - This set is further filtered in a such a way that

attempt to retain all true exons, with some false

ones. - Then find the chain of exons such that the

sequence similarity to the target protein

sequence is maximized

Spliced Alignment Problem Formulation

- Input Genomic sequences G, target sequence T,

and a set of candidate exons (blocks) B. - Output A chain of exons G such that the global

alignment score between G and T is maximized - G - concatenation of all exons from chain G

The DAG

- Vertices One vertex for each block in B
- Directed edge connecting non-overlapping blocks
- Label of vertex string of block it represents
- A path through the DAG spells out the string

obtained by concatenating that particular chain

of blocks - Weight of a path is the score of the optimal

alignment between the string it spells out and

the target sequence

Dynamic programming

- Genomic sequence G g1g2gn
- Target sequence T t1t2tm
- As usual, we want to find the optimal alignment

score of the i-prefix of G and the j-prefix of T - Problem is, there are many i-prefixes possible

(since multiple blocks may include position i)

Idea

- Find the optimal alignment score of the i-prefix

of G and the j-prefix of T assuming that this

alignment uses a particular block B at position i - S(i, j, B)
- For every block B that includes i

Recurrence

- If i is not the starting vertex of block B
- S(i, j, B)
- max S(i 1, j, B) indel penalty
- S(i, j 1, B) indel penalty
- S(i 1, j 1, B) d(gi, tj)
- If i is the starting vertex of block B
- S(i, j, B)
- max S(i, j 1, B) indel penalty
- maxall blocks B preceding block B S(end(B), j,

B) indel penalty - maxall blocks B preceding block B S(end(B), j

1, B) d(gi, tj)

RNA secondary structure prediction

RNA

- RNA is similar to DNA chemically. It is usually

only a single strand. T(hyamine) is replaced by

U(racil) - Some forms of RNA can form secondary structures

by pairing up with itself. This can have

change its properties dramatically. - DNA and RNA
- can pair with
- each other.

http//www.cgl.ucsf.edu/home/glasfeld/tutorial/trn

a/trna.gif

tRNA linear and 3D view

RNA

- Theres more to RNA than mRNA
- RNA can adopt interesting non-linear structures,

and catalyze reactions - tRNAs (transfer RNAs) are the adapters that

implement translation

Secondary structure

- Several interesting RNAs have a conserved

secondary structure (resulting from base-pairing

interactions) - Sometimes, the sequence itself may not be

conserved for the function to be retained - It is important to tell what the secondary

structure is going to be, for homology detection

Conserved secondary structure

N-Y A A N-N N-N R N-N N-N N-N

N-N N-N / N

Consensus binding site for R17 phage

coat protein. N A/C/G/U, N is a

complementary base pairing to N, Y is C/U, R is

A/G Source DEKM

Basics of secondary structure

- G-C pairing three bonds (strong)
- A-U pairing two bonds (weaker)
- Base pairs are approximately coplanar

Basics of secondary structure

Basics of secondary structure

- G-C pairing three bonds (strong)
- A-U pairing two bonds (weaker)
- Base pairs are approximately coplanar
- Base pairs are stacked onto other base pairs

(arranged side by side) stems

Secondary structure elements

loop at the end of a stem

stem loop

on both sides of stem

single stranded bases within a stem

only on one side of stem

Loop single stranded subsequences bounded by

base pairs

Non-canonical base pairs

- G-C and A-U are the canonical base pairs
- G-U is also possible, almost as stable

Nesting

- Base pairs almost always occur in a nested

fashion - If positions i and j are paired, and positions i

and j are paired, then these two base-pairings

are said to be nested if - i
- Non-nested base pairing pseudoknot

Pseudoknot

(9, 18) (2, 11) NOT NESTED

9

18

11

2

Pseudoknot problems

- Pseudoknots are not handled by the algorithms we

shall see - Pseudoknots do occur in many important RNAs
- But the total number of pseudoknotted base pairs

is typically relatively small

Secondary structure predictionApproach 1.

- Find the secondary structure with most base

pairs. - Nussinovs algorithm
- Recursive finds best structure for small

subsequences, and works its way outwards to

larger subsequences

Nussinovs algorithm idea

- There are only four possible ways of getting the

best structure for subsequence (i,j) from the

best structures of the smaller subsequences - (1) Add unpaired position i onto best structure

for subsequence (i1,j)

i1

j

i

Nussinovs algorithm idea

- There are only four possible ways of getting the

best structure for subsequence (i,j) from the

best structures of the smaller subsequences - (2) Add unpaired position j onto best structure

for subsequence (i,j-1)

j-1

i

j

Nussinovs algorithm idea

- There are only four possible ways of getting the

best structure for subsequence (i,j) from the

best structures of the smaller subsequences - (3) Add (i,j) pair onto best structure for

subsequence (i1,j-1)

i1

j-1

j

i

Nussinovs algorithm idea

- There are only four possible ways of getting the

best structure for subsequence (i,j) from the

best structures of the smaller subsequences - (4)Combine two optimal substructures (i,k) and

(k1,j)

i

k

k1

j

Nussinov RNA folding algorithm

- Given a sequence s of length L with symbols s1

sL. Let ?(i,j) 1 if si and sj are a

complementary base pair, and 0 otherwise. - We recursively calculate scores g(i,j) which are

the maximal number of base pairs that can be

formed for subsequence sisj. - Dynamic programming

Recursion

- Starting with all subsequences of length 2, to

length L - g(i,j) max of
- g(i1, j)
- g(i,j-1)
- g(i1,j-1) ?(i,j)
- maxi
- Initialization
- g(i,i-1) 0
- g(i,i) 0

O(n2) ?

No. O(n3)

Traceback

- As usual in sequence alignment ?
- Optimal sequence alignment is a linear path in

the dynamic programming table - Optimal secondary structure can have

bifurcations - Traceback uses a pushdown stack

Traceback

Push (1,L) onto stack Repeat until stack is

empty pop (i,j) if i j continue else if

g(i1,j) g(i,j) push (i1,j) else if g(i,j-1)

g(i,j) push (i,j-1) else if g(i1,j-1) ?(i,j)

g(i,j) record (i,j) base pair push

(i1,j-1) else for k i1 to j-1, if

g(i,k)g(k1,j) g(i,j) push (k1,j) push

(i,k) break (for loop)

Secondary structure predictionApproach 2

- Based on minimization of ?G, the equilibrium free

energy, rather than maximization of number of

base pairs - Better fit to real (experimental) ?G
- Energy of stem is sum of stacking contributions

from the interface between neighboring base pairs

Source DEKM

U U A A G-C G-C A G-C U-A A-U

C-G A-U A A

4nt loop 5.9

-1.1 terminal mismatch of hairpin

-2.9 stack

1nt bulge 3.3

-2.9 stack (special case)

-1.8 stack

-0.9 stack

-1.8 stack

-2.1 stack

dangle -0.3

- Neighboring base pairs stack
- Single bulges OK in stacking
- Longer bulges no stacking term
- Loop destabilisation energy
- Loop terminal mismatch energy

hairpin loop exactly one base pair internal

loop exactly two base pairs bulge internal loop

with one base from each base pair

being adjacent multibranched loop 2 base pairs

in a loop, one base pair is closest to ends of

RNA. this is the exterior or closing base

pair all other base pairs are interior

Source Martin Tompas lecture notes

Energy contributions

- eS(i,j) Free energy of stacked pair (i,j) and

(i1,j-1) - eH(i,j) Free energy of a loop closed by (i,j)

depends on length of loop, bases at i,j, and

bases adjacent to them - eL(i,j,i,j) Free energy of an internal loop or

bulge, with (i,j) and (i,j) being the bordering

base pairs. Depends on bases at these positions,

and unpaired bases adjacent to them - eM(i,j,i1,j1,ik,jk) Free energy of a

multibranch loop with (i,j) as the closing base

pair and i1j1 etc as the internal base pairs

Zukers algorithmDynamic programming

- W(j) FE of optimal structure of s1..j
- V(i,j) FE of optimal structure of si..j

assuming i,j form a base pair - VBI(i,j) FE of optimal structure of si..j

assuming i,j closes a bulge or internal loop - VM(i,j) FE of optimal structure of si..j

assuming i,j closes a multibranch loop - WM(i,j) used to compute VM

Dynamic programming recurrences

- W(j) FE of optimal structure of s1..j
- W(0) 0
- W(j) min( W(j-1),
- min1

Dynamic programming recurrences

- V(i,j) FE of optimal structure of si..j

assuming i,j form a base pair - V(i,j) infinity if i j
- V(i,j) min( eH(i,j),
- eS(i,j) V(i1,j-1),
- VBI(i,j),
- VM(i,j)) if i

i,j is exterior base pair of a hairpin loop

Dynamic programming recurrences

- V(i,j) FE of optimal structure of si..j

assuming i,j form a base pair - V(i,j) infinity if i j
- V(i,j) min( eH(i,j),
- eS(i,j) V(i1,j-1),
- VBI(i,j),
- VM(i,j)) if i

i,j is exterior pair of a stacked pair. i1,j-1

is therefore a pair too.

Dynamic programming recurrences

- V(i,j) FE of optimal structure of si..j

assuming i,j form a base pair - V(i,j) infinity if i j
- V(i,j) min( eH(i,j),
- eS(i,j) V(i1,j-1),
- VBI(i,j),
- VM(i,j)) if i

i,j is exterior pair of a bulge or interior loop

Dynamic programming recurrences

- V(i,j) FE of optimal structure of si..j

assuming i,j form a base pair - V(i,j) infinity if i j
- V(i,j) min( eH(i,j),
- eS(i,j) V(i1,j-1),
- VBI(i,j),
- VM(i,j)) if i

i,j is exterior pair of a multibranch loop

Dynamic programming recurrences

- VBI(i,j) FE of optimal structure of si..j

assuming i,j closes a bulge or internal loop - VBI(i,j) min (eL(i,j,i,j) V(i,j))
- Slow !

i,j i

Energy of the bulge

Dynamic programming recurrences

- VM(i,j) FE of optimal structure of si..j

assuming i,j closes a multibranch loop - VM(i,j) min (eM(i,j,i1,j1,..,ik,jk)

?hV(ih,jh)) - Very slow !

k2 i1,j1,ik,jk

Energy of the loop itself

Order of computation

- What order to fill the DP table in ?
- Increasing order of (j-i)
- VBI(i,j) and VM(i,j) before V(i,j)