Title: Mining RNA Tertiary Motifs with Structure Graphs


1
Mining RNA Tertiary Motifs with Structure Graphs
  • Xueyi Wang
  • with
  • Jun Huan, Jack Snoeyink, and Wei Wang
  • Department of Computer Science, UNC-Chapel Hill
  • Department of Electrical Engineering and
    Computer Science, University of Kansas

2
Outline
  • RNA structure and RNA motif
  • Labeled graphs and frequent subgraph mining
  • Graph Modeling of RNA structure
  • Results

3
RNA World
  • The RNA world hypothesis proposes that RNA was the
    first life-form on Earth.
  • RNA has the ability to act as both gene and
    enzyme.
  • RNA has increasing industrial importance as a
    molecular tag and as an inhibitor.

From http://fig.cox.miami.edu/cmallery/
4
  • RNA
  • A sequence of 4 nucleotides (residues): A, C, G,
    and U
  • Bases are similar
  • Protein
  • A sequence from 20 amino acids
  • Side-chains are quite different

5
RNA Structure
  • Large ribosome subunit
  • - Chain 0: 2914 residues
  • - Chain 9: 122 residues

6
Structure Motif
  • Structure motifs are geometric arrangements of
    residues that are common (frequent) to a group of
    different proteins/RNAs
  • Structure patterns are useful in
  • Structure alignment
  • Structure design
  • Prediction of protein/RNA interactions
  • Understanding protein/RNA folding
  • Drug design

7
RNA Motif
GCCGUU
  • Sequence motif
  • A fragment of RNA sequence
  • Secondary structure motif
  • RNA base pairing relations
  • Tertiary structure motif
  • Spatial interactions of nucleotides

8
Outline
  • RNA structure and RNA motif
  • Labeled graphs and frequent subgraph mining
  • Graph Modeling of RNA structure
  • Result

9
Labeled Graphs
  • A labeled graph is a graph where each node and
    each edge has a label
  • A graph database GD is a group of labeled graphs
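
The two definitions above can be sketched as a small data structure. This is an illustrative sketch only: the class name `LabeledGraph`, its methods, and the example labels are assumptions made here, not names from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Undirected graph in which every node and every edge carries one label."""
    node_labels: dict = field(default_factory=dict)  # node id -> label
    edge_labels: dict = field(default_factory=dict)  # frozenset({u, v}) -> label

    def add_node(self, node, label):
        self.node_labels[node] = label

    def add_edge(self, u, v, label):
        if u not in self.node_labels or v not in self.node_labels:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_labels[frozenset((u, v))] = label

    def edge_label(self, u, v):
        """Return the label of edge {u, v}, or None if there is no such edge."""
        return self.edge_labels.get(frozenset((u, v)))


# A graph database GD is then just a collection of labeled graphs.
g = LabeledGraph()
g.add_node(0, "a")
g.add_node(1, "b")
g.add_edge(0, 1, "x")
graph_database = [g]
```

Keying edges by `frozenset` makes the edge undirected, so `edge_label(0, 1)` and `edge_label(1, 0)` return the same label.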

10
Subgraph Mining
  • In a database GD, a graph G is subgraph
    isomorphic to another graph G0, denoted by G ⊆
    G0, if
  • there exists a 1-1 mapping from the nodes of G to
    the nodes of G0 such that node labels, edges, and
    edge labels are preserved by the mapping.
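
A brute-force check makes the definition concrete. The sketch below simply tries every 1-1 assignment of nodes, so it is exponential and is not the incremental test used by the mining algorithm discussed later. The function name and the `(node_labels, edge_labels)` graph encoding are assumptions made for this example.

```python
from itertools import permutations


def is_subgraph_isomorphic(g, g0):
    """Return True if labeled graph g maps 1-1 into g0 with all labels preserved.

    Each graph is a pair (node_labels, edge_labels): node_labels maps a node id
    to its label, and edge_labels maps frozenset({u, v}) to the edge label.
    """
    nodes, edges = g
    nodes0, edges0 = g0
    g_ids = list(nodes)
    # Try every injective assignment of g's nodes to g0's nodes.
    for image in permutations(nodes0, len(g_ids)):
        m = dict(zip(g_ids, image))
        if any(nodes[u] != nodes0[m[u]] for u in g_ids):
            continue  # some node label is not preserved
        if all(edges0.get(frozenset(m[u] for u in e)) == label
               for e, label in edges.items()):
            return True  # every edge of g exists in g0 with the same label
    return False


triangle = ({0: "a", 1: "a", 2: "b"},
            {frozenset((0, 1)): "x",
             frozenset((1, 2)): "y",
             frozenset((0, 2)): "y"})
edge_y = ({"p": "a", "q": "b"}, {frozenset(("p", "q")): "y"})
edge_x = ({"p": "a", "q": "b"}, {frozenset(("p", "q")): "x"})

print(is_subgraph_isomorphic(edge_y, triangle))  # True
print(is_subgraph_isomorphic(edge_x, triangle))  # False
```

The second call fails because the only edge labeled "x" in the triangle joins two "a" nodes, so no label-preserving mapping exists.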

11
Subgraph Mining
  • The support value of a subgraph P in graph
    database GD is the fraction of graphs in GD where
    P occurs.
  • Given a GD and a threshold σ ∈ (0, 1], the frequent
    subgraph mining problem is the identification of
    all subgraphs that have support at least σ.
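
To illustrate support and the threshold, the sketch below computes the support of the smallest possible patterns, single labeled edges, and keeps those that reach the threshold. Restricting to single edges is a simplification chosen here. A full miner would grow these patterns into larger subgraphs. The function names and the toy database are assumptions, and graphs are assumed to have no self-loops.

```python
from collections import Counter


def edge_patterns(graph):
    """Set of single-edge patterns (node label, edge label, node label) in a graph."""
    node_labels, edge_labels = graph
    patterns = set()
    for edge, label in edge_labels.items():
        u, v = tuple(edge)  # assumes no self-loops
        a, b = sorted((node_labels[u], node_labels[v]))  # canonical order: undirected
        patterns.add((a, label, b))
    return patterns


def frequent_edge_patterns(graph_db, sigma):
    """Map each single-edge pattern with support >= sigma to its support."""
    counts = Counter()
    for graph in graph_db:
        counts.update(edge_patterns(graph))  # a graph counts at most once per pattern
    n = len(graph_db)
    return {p: c / n for p, c in counts.items() if c / n >= sigma}


db = [
    ({0: "a", 1: "b", 2: "a"}, {frozenset((0, 1)): "x", frozenset((0, 2)): "y"}),
    ({0: "b", 1: "a", 2: "a"}, {frozenset((0, 1)): "x", frozenset((1, 2)): "y"}),
    ({0: "a", 1: "b"}, {frozenset((0, 1)): "x"}),
]

print(frequent_edge_patterns(db, 1.0))          # {('a', 'x', 'b'): 1.0}
print(sorted(frequent_edge_patterns(db, 2 / 3)))  # [('a', 'x', 'b'), ('a', 'y', 'a')]
```

Lowering the threshold from 1 to 2/3 admits the a-y-a edge, which occurs in only two of the three graphs.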

12
Examples
Maximal frequent subgraph with σ = 1
Maximal frequent subgraph with σ = 2/3
13
Fast Frequent Subgraph Mining (Huan et al., ICDM'03)
  • Identify all frequently occurring subgraphs from
    a family of graphs
  • Depth-first search
  • Better memory utilization
  • Apriori property
  • Eliminate unnecessary isomorphism checks
  • Graph normalization
  • Avoid redundant examination
  • Subgraph isomorphism test is NP-complete
  • Incremental isomorphism check
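
The Apriori property listed above is commonly stated as the anti-monotonicity of support. Writing ⊆ for subgraph isomorphism and σ for the support threshold, and using sup as a notation introduced here for the support value in GD:

```latex
P \subseteq Q \;\Longrightarrow\; \mathrm{sup}_{GD}(P) \;\ge\; \mathrm{sup}_{GD}(Q),
\qquad\text{hence}\qquad
\mathrm{sup}_{GD}(P) < \sigma \;\Longrightarrow\; \mathrm{sup}_{GD}(Q) < \sigma .
```

Every graph that contains Q also contains P, so once a pattern P falls below the threshold, none of its supergraphs needs to be generated or tested.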

14
Outline
  • RNA structure and RNA motif
  • Labeled graphs and frequent subgraph mining
  • Graph Modeling of RNA structure
  • Result

15
RNA Data
  • 20 tRNAs and 9 rRNAs (3 5S, 2 16S, and 4 23S)
  • Selected from NDB (Nucleic Acid Database)
  • No two structures have more than 70% sequence
    similarity
  • All RNAs have more than 90 nucleotides present

16
RNA Graph
  • Use a labeled graph to represent an RNA structure
  • Nodes
  • Each node corresponds to a nucleotide
  • Each node is labeled as purine (A and G) or
    pyrimidine (C and U).
  • Edges
  • A backbone edge connects two contiguous nucleotides
  • A base pair edge connects two nucleotides recorded
    as a base pair in the NDB
  • A contact edge connects spatially neighboring
    nucleotides within 8Å
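
The node labeling and the first two edge types can be sketched as follows. This is a minimal sketch under stated assumptions: a single chain with no gaps, a base-pair list supplied by the caller (the presentation takes it from the NDB), and one label per edge with the backbone label taking precedence. The function names and the example pair are hypothetical, and contact edges are left out because they need 3D coordinates.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def node_label(base):
    """Label a nucleotide as purine (A, G) or pyrimidine (C, U)."""
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unexpected base: {base!r}")


def build_rna_graph(sequence, base_pairs):
    """Return (node_labels, edge_labels) for one RNA chain.

    sequence   -- string over A, C, G, U; position i becomes node i
    base_pairs -- iterable of (i, j) index pairs recorded as base pairs
    """
    node_labels = {i: node_label(b) for i, b in enumerate(sequence)}
    edge_labels = {}
    for i in range(len(sequence) - 1):
        edge_labels[frozenset((i, i + 1))] = "backbone"
    for i, j in base_pairs:
        # Assumption: keep the backbone label if the pair is also contiguous.
        edge_labels.setdefault(frozenset((i, j)), "base pair")
    return node_labels, edge_labels


# Illustrative input only: the pair (0, 5) is hypothetical.
nodes, edges = build_rna_graph("GCCGUU", [(0, 5)])
print(nodes[0], nodes[1])        # purine pyrimidine
print(edges[frozenset((0, 5))])  # base pair
```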

17
RNA Graph (cont.)
  • Generate contact edges
  • each nucleotide is abstracted as two points, its
    phosphorus atom and the geometry center of its
    sugar ring
  • measure all four possible distances between the
    abstracted points in the two nucleotides
  • choose the smallest distance as the distance
    between them
  • discretize into distance bins.
  • 3 bin sizes are used for contact edges: 3Å, 4Å,
    and 5Å
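
The contact-edge steps above can be sketched in a few lines. The distance rule (smallest of the four point-to-point distances) follows the slide. The bin rule used here, bin index = floor(distance / bin size), and the inclusive 8Å cutoff are assumptions, since the slides do not spell them out. Function names and coordinates are illustrative.

```python
import math
from itertools import product


def nucleotide_distance(nt_a, nt_b):
    """Smallest of the four distances between the abstracted points.

    Each nucleotide is a pair of 3D points:
    (phosphorus position, geometric center of the sugar ring).
    """
    return min(math.dist(p, q) for p, q in product(nt_a, nt_b))


def contact_label(distance, bin_size, cutoff=8.0):
    """Return the distance-bin index for a contact edge, or None if too far.

    Assumption: bins are [0, s), [s, 2s), ... for bin size s.
    """
    if distance > cutoff:
        return None
    return int(distance // bin_size)


nt_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
nt_b = ((5.0, 0.0, 0.0), (9.0, 0.0, 0.0))

d = nucleotide_distance(nt_a, nt_b)
print(d)                      # 4.0
print(contact_label(d, 3.0))  # 1
print(contact_label(d, 5.0))  # 0
```

The same pair of nucleotides gets a different edge label depending on the bin size, which is why the experiments are run separately for 3Å, 4Å, and 5Å.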

18
Experiments
  • Motif mining
  • σ = 70% for rRNAs (i.e., a motif must occur in at
    least 7 of the 9 graphs)
  • σ = 20% for tRNAs (i.e., a motif must occur in at
    least 4 of the 20 graphs), since tRNAs are much
    smaller than rRNAs
  • Results are compared to SCOR -- a comprehensive
    database of RNA motifs
  • Motif alignment
  • Differentiate the mined motifs as left- or
    right-handed
  • Generate consensus motifs
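
The graph counts quoted for the two thresholds follow from rounding the threshold up to a whole number of graphs. A short check, with a helper name chosen here for illustration:

```python
def min_graph_count(percent, n_graphs):
    """Smallest number of graphs a motif must occur in to reach the threshold.

    Integer ceiling of percent * n_graphs / 100, avoiding floating point.
    """
    return (percent * n_graphs + 99) // 100


print(min_graph_count(70, 9))   # 7  (70% of 9 rRNA graphs is 6.3, rounded up)
print(min_graph_count(20, 20))  # 4  (20% of 20 tRNA graphs is exactly 4)
```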

19
Outline
  • RNA structure and RNA motif
  • Labeled graphs and frequent subgraph mining
  • Graph Modeling of RNA structure
  • Result

20
Mined Tertiary Motif
  • For rRNAs, the mined motifs cover 37 of the 43
    ribose zippers (86%) recorded in SCOR for 23S rRNA
    1s72

21
Mined Tertiary Motif
  • Ribose zippers found in 23S rRNA 1s72 (43 ribose
    zippers identified in SCOR)
  • We identified all 5 sub-categories of ribose
    zippers defined in SCOR

22
Tertiary Motif Candidate
  • Tertiary interaction formed by a hydrogen bond
    between two sugars and a hydrogen bond between
    sugar and phosphorus

23
Motif Alignment
  • e.g., motif #12 of rRNA with bin size 4Å

Left hand occurrences
Right hand occurrences
24
Statistical Analysis
  • Quantile-Quantile plot for the first node in motif
    #12

25
Statistical Analysis
  • Histogram of R² for all aligned positions
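
The slides do not say how the plotted value is computed. One plausible reading, assumed here, is the squared correlation of a normal Q-Q plot, computed per coordinate of an aligned node position. The sketch below implements that reading for a 1D sample. The function name and the plotting positions (i + 0.5) / n are choices made for this example.

```python
from statistics import NormalDist, mean


def qq_r_squared(sample):
    """Squared Pearson correlation between the sorted sample and the
    matching standard-normal quantiles (the R^2 of a normal Q-Q plot)."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three values")
    observed = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n are one common convention.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mo, mt = mean(observed), mean(theoretical)
    cov = sum((o - mo) * (t - mt) for o, t in zip(observed, theoretical))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_t = sum((t - mt) ** 2 for t in theoretical)
    if var_o == 0:
        raise ValueError("sample has zero variance")
    return cov * cov / (var_o * var_t)


# Values placed exactly on normal quantiles give an R^2 of essentially 1.
ideal = [NormalDist(2.0, 3.0).inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [2.0 ** i for i in range(20)]
print(qq_r_squared(ideal) > 0.999)   # True
print(qq_r_squared(skewed) < 0.9)    # True
```

Values close to 1 indicate that the coordinate is well described by a normal distribution. A histogram of these values over all aligned positions would then summarize the fit across the motif.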

26
Conclusion
  • A novel method for identifying tertiary motifs
    from RNA molecules
  • Graph modeling of RNA molecules
  • Frequent subgraph mining algorithm to identify
    tertiary motifs
  • Motif alignment to identify left- and right-handed
    motifs and their consensus
  • Identified tertiary motifs in SCOR and new
    candidates
  • Statistical analysis shows that the nodes in the
    aligned motifs follow a 3D Gaussian distribution

27
Acknowledgements and Future Work
  • Acknowledgements
  • NIH grant GM-074127
  • UNC BCB program
  • Future Work
  • Mining protein-RNA interaction motifs