Algorithms for Searching RNA Motifs in Genomic DNA Sequences

1
Algorithms for Searching RNA Motifs in Genomic
DNA Sequences
  • JCIS 2005 PRESENTATION
  • Jingping Liu
  • Bin Ma, Kaizhong Zhang

University of Western Ontario, Computer Science
Department, July 2005
2
Outline
  • Motivations
  • RNA Structures
  • Definitions and Notations
  • Tree Representation and Algorithms
  • Experimental Results
  • Conclusions and Further Work

3
Motivations
  • Bio-molecule structures are invaluable in
    endeavors such as creating new drugs and
    understanding genetic diseases.
  • Computational algorithms for genome analysis are
    desirable. Fichant's tRNAscan and Pavesi's
    EufindtRNA are popular tools, but they work
    only for tRNA!
  • Recently, Zhang's HomoStRscan for detecting RNA
    has been introduced. Designing algorithms that
    act as a filter for Zhang's approach is one of
    our primary motivations.

4
RNA Structures
  • RNA primary structure is a sequence of bases
    over the alphabet {A, C, G, U}.
  • RNA secondary structure is a set of base pairs
    (i.e., A-U, C-G, G-U, and vice versa). Assume
    that (1) no base takes part in more than one
    base pair, and (2) base pairs never cross.
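The two assumptions above can be checked mechanically. The following is a minimal sketch, not the authors' code: the function name `is_valid_structure` and the representation of a structure as a list of 0-based index pairs are assumptions made for illustration.

```python
# Allowed base pairs from the slide: A-U, C-G, G-U, in either orientation.
ALLOWED = {frozenset("AU"), frozenset("CG"), frozenset("GU")}

def is_valid_structure(seq, pairs):
    """Check a secondary structure given as (i, j) index pairs with i < j.

    Returns True only if every pair is an allowed base pair, no base
    takes part in more than one pair, and no two pairs cross.
    """
    pairs = sorted(pairs)
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(seq)):
            return False
        if frozenset((seq[i], seq[j])) not in ALLOWED:
            return False
        if i in used or j in used:          # assumption (1)
            return False
        used.update((i, j))
    # Assumption (2): pairs (i, j) and (k, l) cross when i < k < j < l.
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            if k < j < l:                   # i < k holds after sorting
                return False
    return True
```

For example, on the sequence `GGGAAAACCC` the nested pairs (0, 9), (1, 8), (2, 7) pass the check, while the pairs (0, 8), (1, 9) are rejected because they cross.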



[Figure: panels (a) and (b), drawn on the example sequence AGCUGCUGUACGUAAA]
5
Example of RNA Secondary Structure
  • A paired region is called a stem.
  • An unpaired region is called a loop-segment.
  • A stem-loop consists of a stem surmounted by a
    loop-segment (a.k.a. hairpin loop).

[Figure: RNA secondary structure drawn from the 5' end to the 3' end, with a stem, a loop-segment, and a stem-loop labelled]
6
Problem Statement
In a given genomic sequence, efficiently
determine candidate segments that can
potentially form RNA secondary structures
similar to a given RNA secondary structure.
[Figure: the inputs are an RNA structure and a genomic sequence (example: AGCUGCUGUACGUAAA); a candidate segment is delimited by positions p and q]
One sequence may form many different secondary
structures!
7
Definitions and Notations
  • In a stem, the number of base pairs is called
    the stem size, denoted by SM.
  • In a loop-segment, the number of unpaired bases
    is called the loop-segment size, denoted by D.
  • Hairpin size: the number of unpaired bases in a
    hairpin loop, denoted by H.
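As a small illustration of these three sizes, the sketch below reads SM and H off a single stem-loop and computes D between two located substructures. Dot-bracket notation and both function names are assumptions introduced here; the slides do not use them.

```python
def stem_loop_sizes(dot_bracket):
    """Return (SM, H) for one stem-loop written in dot-bracket notation.

    '(' and ')' mark paired bases, '.' marks unpaired bases.
    """
    sm = len(dot_bracket) - len(dot_bracket.lstrip("("))
    h = dot_bracket.count(".")
    # A single stem-loop is SM opening brackets, H dots, SM closing brackets.
    if sm == 0 or dot_bracket != "(" * sm + "." * h + ")" * sm:
        raise ValueError("not a single stem-loop")
    return sm, h

def loop_segment_size(left_end, right_start):
    """D: number of unpaired bases strictly between two substructures.

    left_end is the last index of the left substructure and
    right_start is the first index of the right one (0-based).
    """
    return right_start - left_end - 1
```

With this convention, `stem_loop_sizes("((((....))))")` gives SM = 4 and H = 4, and two substructures ending at index 9 and starting at index 16 are separated by a loop-segment of size D = 6.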

[Figure: example structure with a hairpin loop (H = 4), a loop-segment (D = 6), and a stem (SM = 4)]
8
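Definitions and Notations (cont'd)
  • Similar stem: SM/2 ≤ SMx ≤ SM + SM/2, with SM/2
    rounded to an integer (e.g., SM = 4 gives
    2 ≤ SMx ≤ 6).
  • Similar loop-segment: D/2 ≤ Dx ≤ D + D/2, with
    D/2 rounded to an integer (e.g., D = 4 gives
    2 ≤ Dx ≤ 6).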
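The similarity windows can be sketched as a pair of helpers. The rounding direction is not recoverable from this transcript, so rounding the lower bound up and the upper offset down is an assumption of this sketch, as are the names `similar_range` and `is_similar`.

```python
import math

def similar_range(size):
    """Return the (min, max) sizes considered similar to `size`.

    Assumption: the lower bound is ceil(size / 2) and the upper bound
    is size + floor(size / 2); the source does not fix the rounding.
    """
    return math.ceil(size / 2), size + size // 2

def is_similar(size, candidate):
    """True if `candidate` lies inside the similarity window of `size`."""
    low, high = similar_range(size)
    return low <= candidate <= high
```

For SM = 4 or D = 4 this gives the window 2 to 6, which matches the sizes shown in the slide's figure. The same windows can supply the min and max values (SMmin, SMmax, Dmin, Dmax) used later in the search.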

[Figure: (1) similar stems: a stem with SM = 4 shown next to similar stems of sizes 2, 3, 5, and 6; (2) similar loop-segments: a loop-segment with D = 4 shown next to similar loop-segments of sizes 2, 3, 5, and 6]
9
Tree Representation
  • Leaf node: represents a stem-loop and stores the
    values it needs, e.g., Hmin and Hmax of the
    hairpin loop.
  • Internal node (including the root): represents a
    stem and stores the values it needs, e.g., SMmin
    and SMmax of the stem, and Dmin and Dmax of the
    loop-segments.
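One possible in-memory form of such a tree is sketched below. The class name `MotifNode`, its field names, the numeric ranges, and the shape of the example tree (three leaves under one stem) are all illustrative assumptions, not values from the presentation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Range = Tuple[int, int]  # (min, max), inclusive

@dataclass
class MotifNode:
    name: str
    sm: Range                      # SMmin, SMmax of the node's stem
    h: Optional[Range] = None      # Hmin, Hmax (leaf nodes only)
    d: List[Range] = field(default_factory=list)  # Dmin, Dmax per loop-segment
    children: List["MotifNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def bottom_up(node: MotifNode) -> Iterator[MotifNode]:
    """Yield nodes children-first, so every stem comes after its subtrees."""
    for child in node.children:
        yield from bottom_up(child)
    yield node

# Hypothetical example: three stem-loops joined by one enclosing stem.
leaves = [MotifNode(f"n{k}", sm=(2, 6), h=(3, 9)) for k in (1, 2, 3)]
root = MotifNode("n4", sm=(4, 10), d=[(1, 3)] * 4, children=leaves)
```

Iterating `bottom_up(root)` visits n1, n2, n3, and then n4, which is the order a bottom-up search over the tree would need.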

10
Example of Tree Representation
[Figure: (1) an RNA secondary structure with numbered positions (1, 10, 26, 28, 44, 49, 65, 72) between its 5' and 3' ends; (2) the corresponding tree with nodes n1, n2, n3, and n4]
11
Bottom-Up Approach
  • 1. Leaf nodes (stem-loops): locate candidates
    using SEARCH.
  • 2. Internal node (stem) with degree d:
      • Integrate step
      • Extend step
  • Repeat the integrate and extend steps until the
    root is reached.
  • Reduce the number of candidates by using
    additional biological features.

[Figure: RNA tree with nodes n1, n2, n3, and n4]
12
SEARCH Algorithm
  • SEARCH: based on the corresponding min and max
    size values, search for one potential
    stem-loop/stem in the given genomic sequence S.
  • Given an index pair (i, j), let i go left and j
    go right to search for each possible
    stem-loop/stem in S by looking for consecutive
    base pairs (i.e., A-U, C-G, and G-U, and vice
    versa) until no further such base pair exists.
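The outward extension described above can be sketched as follows. This is an illustrative reading of the slide, not the mSearch implementation: the function names, the 0-based inclusive indices, the returned tuple layout, and the assumption that the sequence is already written with U rather than T are all choices made here.

```python
# Base pairs allowed by the slide, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def search(seq, i, j, sm_min, sm_max):
    """Extend a stem outward from the index pair (i, j).

    i moves left and j moves right while seq[i] and seq[j] pair and
    fewer than sm_max pairs have been collected. Returns
    (start, end, size) with inclusive indices, or None when fewer
    than sm_min consecutive pairs are found.
    """
    size = 0
    while i >= 0 and j < len(seq) and size < sm_max and (seq[i], seq[j]) in PAIRS:
        size += 1
        i -= 1
        j += 1
    if size < sm_min:
        return None
    return (i + 1, j - 1, size)

def find_stem_loops(seq, sm_range, h_range):
    """Try every inner pair (i, i + H + 1) for H in [Hmin, Hmax].

    Assumes sm_range[0] >= 1 so that empty stems are never reported.
    """
    hits = []
    for i in range(len(seq)):
        for h in range(h_range[0], h_range[1] + 1):
            hit = search(seq, i, i + h + 1, sm_range[0], sm_range[1])
            if hit is not None:
                hits.append(hit + (h,))
    return hits
```

On the toy sequence `GGGAAAACCC`, starting at (2, 7) extends through three G-C pairs and reports the stem-loop spanning indices 0 to 9. Capping growth at SMmax, rather than extending as far as possible, keeps each reported stem within the size window of the motif.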

[Figure: a stem growing outward from the index pair (i, j), bounded by SMmax]
13
Integrate and Extend Steps
  • INTEGRATE (left-to-right): if the loop-segment
    size D between two located substructures
    satisfies Dmin ≤ D ≤ Dmax, integrate them to
    form a more complicated substructure (SP, EP).
  • EXTEND: use the SEARCH algorithm, starting at
    (SP, EP), to compute the extended stem such that
    each size satisfies its corresponding min and
    max values.
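A minimal sketch of the two steps follows, under stated assumptions: substructures are inclusive 0-based spans, `integrate` joins exactly two neighbours, and `extend` tries every allowed unpaired gap on each side of (SP, EP) before growing the enclosing stem. The function names and the gap-enumeration strategy are illustrative, not taken from the presentation.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def integrate(left, right, d_min, d_max):
    """Join two located substructures if the gap between them fits.

    left and right are (start, end) spans with left before right.
    Returns the merged span (SP, EP), or None if D is out of range.
    """
    d = right[0] - left[1] - 1          # unpaired bases between the spans
    if d_min <= d <= d_max:
        return (left[0], right[1])
    return None

def stem_size(seq, i, j, sm_max):
    """Count consecutive base pairs going outward from (i, j)."""
    size = 0
    while i >= 0 and j < len(seq) and size < sm_max and (seq[i], seq[j]) in PAIRS:
        size += 1
        i -= 1
        j += 1
    return size

def extend(seq, sp, ep, sm_range, d_left_range, d_right_range):
    """Look for an enclosing stem around (sp, ep).

    Each combination of left and right gap sizes within the given
    ranges is tried; a hit is (start, end, stem_size).
    """
    hits = []
    for dl in range(d_left_range[0], d_left_range[1] + 1):
        for dr in range(d_right_range[0], d_right_range[1] + 1):
            i, j = sp - 1 - dl, ep + 1 + dr
            size = stem_size(seq, i, j, sm_range[1])
            if size >= sm_range[0]:
                hits.append((i - size + 1, j + size - 1, size))
    return hits
```

As a usage example, take `GCAGGAAACCUUCCAAAGGAGC`, where stem-loops occupy indices 3 to 9 and 12 to 18. `integrate((3, 9), (12, 18), 1, 3)` returns (3, 18) because D = 2, and `extend` with stem range (2, 3) and gap ranges (0, 2) on both sides then finds the enclosing stem spanning indices 0 to 21.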

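[Figure: (1) RNA with substructures 1 to 4, loop-segments D12, D23, D34, and D41, and the integrated span (SP, EP); (2) tree with nodes n1, n2, n3, and n4. Caption: integrate steps, step 1 and step 2]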
14
mSearch Experimental Results
tRNA
Sequence                                 NCBI   mSearch
Mycoplasma genitalium (0.4M)             36     992 (5s)
Helicobacter pylori (1.7M)               36     1640 (20s)

5S rRNA
Sequence                                 NCBI   mSearch
Staphylococcus MW2 (partial, 0.4M)       3      1986 (5s)
Escherichia coli K12 (partial, 0.4M)     4      1349 (5s)

Experimental results indicate that our mSearch
tool can locate all true tRNAs and 5S rRNAs in
the tested sequences: zero false negatives!
15
Conclusions and Further Work
  • Theoretical and practical aspects:
      • Developed efficient pattern-matching algorithms.
      • Searched for RNA motifs in genomic sequences.
  • Our algorithms are suitable for searching any
    type of RNA.
  • Currently, a web-based mSearch tool is under
    construction.
  • Add biological constraints to further reduce the
    number of candidates.

16
References
  • D. Gusfield. Algorithms on Strings, Trees, and
    Sequences: Computer Science and Computational
    Biology. Cambridge University Press, 1997,
    pp. 143-148.
  • G. Mauri and G. Pavesi. Pattern Discovery in RNA
    Secondary Structures Using Affix Trees. CPM 2003,
    pp. 278-294.
  • K. Zhang, S. Le, and J. Maizel. An Algorithm for
    Detecting Homologues of Known Structured RNAs in
    Genomes. IEEE Computational Systems
    Bioinformatics Conference (CSB), 2004.
  • K. Zhang, B. Ma, and L. Wang. Computing
    Similarity between RNA Structures. Theor. Comput.
    Sci. 276(1-2): 111-132, 2002.
  • M. Zuker. The Use of Dynamic Programming
    Algorithms in RNA Secondary Structure Prediction.
    In Mathematical Methods for DNA Sequences. CRC
    Press, Boca Raton, 1989, p. 159.

17
QUESTIONS?
THANK YOU