RNA secondary structure prediction

Slides: 21
Provided by: isrecI
1
RNA secondary structure prediction
  • Introduction
  • Examples of RNA molecules
  • Secondary structure elements
  • Pseudo-knots
  • RNA folding
  • Nussinov algorithm
  • Energy minimization
  • Covariance analysis
  • RNA secondary structure motifs
  • Examples, biological function
  • RNA secondary structure patterns

2
Basics about RNA (for computer scientists)
  • RNA initially synthesized as co-linear copy of
    DNA
  • U replaces T (however, U represented as T in
    nucleotide database entries)
  • RNA may undergo splicing and other
    post-transcriptional modifications
  • Two major RNA classes in cellular organisms:
      • messenger RNA (mRNA): templates for protein
        synthesis
      • structural and catalytic RNAs
  • The genome of many viruses (e.g. HIV) consists of
    RNA
  • RNA is usually single-stranded (exception: a few
    viral genomes)
  • RNA folds back onto itself to form short
    base-paired regions
  • As in DNA, base-paired regions form anti-parallel
    helices
  • Same base-pairing rules as for DNA, but G-U pairs
    are also permitted
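
The pairing rules on this slide can be written down as a small lookup. This is a minimal sketch; the names `PAIRS`, `to_rna` and `can_pair` are illustrative and do not come from the presentation.

```python
# Pairing rules from the slide: the Watson-Crick pairs plus the G-U pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Convert a database-style entry (U written as T) to RNA letters."""
    return seq.upper().replace("T", "U")


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y may form a pair under the rules above."""
    return (x, y) in PAIRS


print(to_rna("GATTACA"))   # GAUUACA
print(can_pair("G", "U"))  # True
print(can_pair("A", "G"))  # False
```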

3
Examples of structural and/or catalytic RNAs
  • ribosomal RNA (rRNA)
  • transfer RNA (tRNA)
  • small nuclear RNA (snRNA, e.g. U1)
  • small nucleolar RNA (snoRNA)
  • small cytoplasmic RNA (scRNA, e.g. 7SL-RNA)
  • microRNA (miRNA)
4
RNA secondary structure elements: Terminology
5
Purpose of RNA folding algorithm
  • Prediction of the native secondary structure of
    an RNA molecule
      • Formally, the secondary structure of an RNA
        consists of all pairs of bases that interact with
        each other, usually through standard Watson-Crick
        base pairs.
  • Recognition of RNA functional motifs
      • RNA molecules may contain regulatory motifs that
        interact with RNA-binding proteins.
      • Such motifs may have a conserved secondary
        structure in addition to conserved primary
        structure elements.

6
Pseudo-knots
Pseudoknots cause problems for ordinary RNA folding algorithms.
Pseudoknots imply an arrangement of pairs of
interacting bases of the type a ... b ... a' ... b'
(a pairs with a', b pairs with b'). Such structures
require intersecting lines in the following type of
representation (pairing arcs not reproduced):
U U C C G A A G C U C A A C G G G A A A A U G A G C U
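
The "intersecting lines" condition can be checked mechanically: two base pairs (i, j) and (k, l) cross when i < k < j < l. The following is a minimal sketch under that definition; the function name and the example pair lists are made up for illustration and are not taken from the slide.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    # Normalise each pair so that the smaller position comes first, then sort.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a in range(len(norm)):
        i, j = norm[a]
        for k, l in norm[a + 1:]:
            if i < k < j < l:
                return True
    return False


nested = [(1, 10), (2, 9), (12, 20)]   # arcs can be drawn without crossing
crossing = [(1, 10), (5, 15)]          # second pair opens inside the first, closes outside
print(has_pseudoknot(nested))    # False
print(has_pseudoknot(crossing))  # True
```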
7
RNA secondary structure notation
  • RNA secondary structures can be specified by a
    sequence over the three symbols -, > and <.
  • Base pairs can be reconstructed as follows:
      • process the sequence from left to right
      • if a base is marked -, leave it unpaired
      • if a base is marked >, wait
      • if a base is marked <, connect it to the closest
        unpaired base marked > on its left side

AAGACUUCGGAUCUGGCGACACCC
-->>>----<-<<->>->---<<<
Note: works only if no pseudoknots occur.
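
The reconstruction rule amounts to a stack: each base marked > is pushed, and each base marked < pops the most recent one. The sketch below applies it to the example from this slide, writing the three symbols as -, > and <. The function name is an assumption for illustration.

```python
def decode_structure(notation: str):
    """Return the 1-based base pairs (i, j) encoded by a string of '-', '>' and '<'."""
    waiting = []  # positions marked '>' that are still unpaired
    pairs = []
    for pos, mark in enumerate(notation, start=1):
        if mark == ">":
            waiting.append(pos)
        elif mark == "<":
            if not waiting:
                raise ValueError(f"no unpaired '>' to the left of position {pos}")
            pairs.append((waiting.pop(), pos))  # closest unpaired '>' on the left
        elif mark != "-":
            raise ValueError(f"unexpected symbol {mark!r} at position {pos}")
    if waiting:
        raise ValueError(f"unmatched '>' at positions {waiting}")
    return sorted(pairs)


seq = "AAGACUUCGGAUCUGGCGACACCC"
notation = "-->>>----<-<<->>->---<<<"
for i, j in decode_structure(notation):
    print(i, seq[i - 1], j, seq[j - 1])
```

For this example every decoded pair is G-C, C-G or A-U. As the note says, a single stack cannot represent crossing pairs, so pseudoknots are out of reach for this notation.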
8
Nussinov algorithm: Principle
  • Objective: to find the secondary structure with
    the maximal number of base pairs under the
    pseudo-knot exclusion constraint.
  • Principle: recursive procedure (dynamic programming
    algorithm).
  • Scoring function: sum of base-pair scores, no
    penalties for loops.
  • The optimal score is computed from the optimal
    scores of subsequences.
  • Filling stage: scores for subsequences are
    recursively computed and recorded in a quadratic
    table.
  • Trace-back: reconstruction of the filling steps
    indicates the optimal structure.
  • Time complexity: O(N^3)
  • Limitations:
      • no pseudo-knots
      • no constraints on loop lengths
      • no penalties for bulge loops
      • no scoring terms for base-pair stacking
        interactions (see later)
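
The filling stage can be sketched as follows, assuming the simple scoring in which every allowed pair (Watson-Crick or G-U) scores 1 and nothing else is scored. The names `delta`, `nussinov_fill` and `max_pairs` are illustrative, and the test sequence is made up rather than taken from the slides.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def delta(x: str, y: str) -> int:
    """Base-pair score: 1 for an allowed pair, 0 otherwise."""
    return 1 if (x, y) in PAIRS else 0


def nussinov_fill(seq: str):
    """Fill the table g, where g[i][j] is the maximal number of base pairs
    on the subsequence seq[i..j] (0-based, inclusive). Cells with j <= i stay 0."""
    n = len(seq)
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):              # short subsequences first
        for i in range(n - span):
            j = i + span
            best = max(g[i + 1][j],       # base i unpaired
                       g[i][j - 1],       # base j unpaired
                       g[i + 1][j - 1] + delta(seq[i], seq[j]))  # i pairs with j
            for k in range(i + 1, j):     # join two substructures
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


def max_pairs(seq: str) -> int:
    """Optimal score for the whole sequence."""
    return nussinov_fill(seq)[0][len(seq) - 1] if seq else 0


print(max_pairs("GGGAAAUCC"))  # 3
```

The three nested loops over i, j and k give the O(N^3) running time. Because no minimal loop length is enforced, adjacent bases may pair, which is one of the limitations listed above.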
9
Nussinov algorithm: extension operations
10
Nussinov algorithm: fill stage
Scoring system: d(i,j) = 1 for all RNA Watson-Crick
base pairs and for G-U pairs, else d(i,j) = 0.
Colors in the fill table (figure not reproduced):
  • Blue: addition of an unpaired base (3 or 7)
  • Green: addition of paired bases (1,7)
  • Pink: joining of substructures 1..4 and 5..8
11
Nussinov algorithm: trace-back
current   record   stack
                   1,9
1,9                1,8
1,8                1,4  5,8
1,4       1,4      2,3  5,8
2,3       2,3      3,2  5,8
3,2                5,8
5,8       5,8      6,7
6,7       6,7      7,6
7,6
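
A stack-driven trace-back of this kind can be sketched as below. The block repeats a compact fill stage so that it runs on its own. All function names are assumptions, the test sequence is made up, and when several optimal structures exist the tie-breaking order chosen here decides which one is reported.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def delta(x, y):
    return 1 if (x, y) in PAIRS else 0


def fill(seq):
    """g[i][j] = maximal number of base pairs on seq[i..j] (0-based, inclusive)."""
    n = len(seq)
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(g[i + 1][j], g[i][j - 1],
                       g[i + 1][j - 1] + delta(seq[i], seq[j]))
            for k in range(i + 1, j):
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


def traceback(seq, g):
    """Recover one optimal set of base pairs (1-based) from the filled table."""
    pairs = []
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()                      # "current"
        if i >= j:
            continue                            # empty or single-base interval
        if g[i + 1][j] == g[i][j]:
            stack.append((i + 1, j))            # base i unpaired
        elif g[i][j - 1] == g[i][j]:
            stack.append((i, j - 1))            # base j unpaired
        elif delta(seq[i], seq[j]) == 1 and g[i + 1][j - 1] + 1 == g[i][j]:
            pairs.append((i + 1, j + 1))        # "record" the pair
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):           # joining of two substructures
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)


seq = "GGGAAAUCC"
table = fill(seq)
print(table[0][len(seq) - 1], traceback(seq, table))
```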

12
RNA folding by energy minimization
Note: a bulge loop does not alter stacking energy!
13
Principle of the Zuker algorithm (RNAFOLD)
  • Energy minimization using a richer scoring
    system:
      • Stacking energies: scores for overlapping
        dinucleotide pairs
      • Bulge loop scores: dependent on length
      • Hairpin loop scores: dependent on length and
        closing pair
      • Internal loop scores: dependent on length and
        closing pair
  • Same principle as the Nussinov algorithm, but
    two minimal energy values are stored for each
    subsequence:
      • W(i,j): best structure on i..j
      • V(i,j): best structure on i..j closed by the
        pair i,j
  • Computational complexity: essentially O(N^3)
    (if constraints on maximal loop sizes are applied)
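
One common textbook way to write the two tables is given below as a simplified sketch, not necessarily the exact form used by the program named on the slide. The symbols e_H (hairpin loop energy), e_S (stacking energy), e_L (bulge or internal loop energy) and the multi-loop penalty a are assumptions introduced here for illustration.

```latex
\begin{aligned}
W(i,j) &= \min
  \begin{cases}
    W(i+1,j) \\
    W(i,j-1) \\
    V(i,j) \\
    \min_{i<k<j} \bigl[\, W(i,k) + W(k+1,j) \,\bigr]
  \end{cases} \\[6pt]
V(i,j) &= \min
  \begin{cases}
    e_H(i,j) \\
    e_S(i,j) + V(i+1,j-1) \\
    \min_{i<i'<j'<j} \bigl[\, e_L(i,j,i',j') + V(i',j') \,\bigr] \\
    \min_{i<k<j-1} \bigl[\, W(i+1,k) + W(k+1,j-1) \,\bigr] + a
  \end{cases}
\end{aligned}
```

The third case of V ranges over all inner pairs (i', j'), which is what makes the unrestricted recursion more expensive than O(N^3). Bounding the loop size limits that search to a constant number of candidates per cell.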

14
Energy-parameters used by RNAFOLD
Note: some energy terms (e.g. for the terminal
mismatch of a hairpin) are missing.
15
Prediction of RNA structure by covariance models
  • Motivation: energy minimization-based approaches
    often predict large numbers of alternative RNA
    secondary structures with very similar free
    energy.
  • A multiple alignment of related RNAs potentially
    reveals base-pair interactions.
  • Interacting positions in a multiple alignment are
    expected to show co-variation compatible with
    standard RNA base-pairing rules.
  • Limitation: requires within-column variation. No
    information is obtained for completely conserved
    positions.
16
Prediction of RNA structure by covariance models
Covariance measure used: mutual information
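
The formula on the original slide is not preserved in this transcript. The standard definition of mutual information between alignment columns i and j is M(i,j) = sum over base pairs (x,y) of f_ij(x,y) * log2( f_ij(x,y) / (f_i(x) * f_j(y)) ), where f denotes observed frequencies. The sketch below computes it for a toy alignment; the alignment and the function name are made up for illustration, and gaps are not handled.

```python
from collections import Counter
from math import log2


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_i)
    f_i = Counter(col_i)
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


# Toy alignment: columns 0 and 3 co-vary as base pairs, columns 1 and 2 are conserved.
alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
columns = ["".join(row[c] for row in alignment) for c in range(4)]

print(mutual_information(columns[0], columns[3]))  # 2.0
print(mutual_information(columns[1], columns[2]))  # 0.0
```

The second result illustrates the limitation stated earlier: completely conserved columns carry no covariation signal, even if the two positions do form a base pair.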
17
Covariance analysis: tRNA-Phe
18
RNA motifs, signatures, domains, and families
  • Terminology
      • Motif: short RNA region with partly conserved
        primary and secondary structure, usually with a
        defined function.
      • Signature: short RNA region with partly
        conserved primary and secondary structure, useful
        for identifying members of an RNA family.
      • Domain: a larger RNA region with conserved
        secondary structure, usually considered an
        independent folding unit.
      • Family: a family of homologous and/or
        structurally related RNA molecules, e.g. tRNAs.
  • RNA sequence-structural motifs play a role in
    various biological processes:
      • Translational control, e.g. iron-response element
        (IRE)
      • RNA degradation
      • RNA localization (zip-code motifs)

19
RNABOB: an example of an RNA pattern recognition
program
Characteristics:
  • Supports qualitative patterns (true/false, no
    scores or probabilities)
  • Based on a simple but powerful pattern syntax
  • Fast search engine
  • Supports non-Watson-Crick type base interactions
  • Supports pseudo-knots!
  • Allows for errors (mismatches) in the pattern
20
RNABOB pattern syntax
Example:
s1 h1 s2 h2 s3 h2' h1'
h1 00 NNNNNNNNNN
h2 00 NNNN
s1 0 NN
s2 0 R
s3 0 ANYA
  • The first line indicates the ordering of pattern
    elements:
      • s1, s2, s3 are contiguous unpaired
        sequences
      • h1, h1' represent complementary sequence
        segments forming a double helix
  • Lines 2 to 6 contain the descriptions of each
    element:
      • NNNNNNNNNN means that any base is permitted in
        this structure; the only constraint is that the
        bases have to respect base-pairing rules
      • Numbers indicate how many mismatches are allowed
        per element
      • IUPAC codes are used to specify ambiguous
        positions, e.g. Y = C or T
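
To illustrate how an unpaired element such as ANYA with a mismatch allowance can be matched, here is a minimal sketch. It is not RNABOB's implementation and handles neither helices nor the ordering line. The function names and test sequences are hypothetical, and T is treated as U.

```python
# IUPAC nucleotide codes, written with U in place of T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def count_mismatches(pattern: str, segment: str) -> int:
    """Number of positions where the base is not allowed by the pattern code."""
    return sum(base not in IUPAC[code] for code, base in zip(pattern, segment))


def find_element(pattern: str, seq: str, max_mismatches: int = 0):
    """Return 0-based start positions where pattern matches seq
    with at most max_mismatches mismatching positions."""
    pattern = pattern.upper()
    seq = seq.upper().replace("T", "U")
    m = len(pattern)
    return [s for s in range(len(seq) - m + 1)
            if count_mismatches(pattern, seq[s:s + m]) <= max_mismatches]


print(find_element("ANYA", "GGACUAGG"))                # [2]
print(find_element("ANYA", "ACGA"))                    # []
print(find_element("ANYA", "ACGA", max_mismatches=1))  # [0]
```

A helix element would additionally require that the two segments pair position by position, which this sketch leaves out.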