RNA Secondary Structure Prediction - PowerPoint PPT Presentation

Provided by: drer7

1
RNA Secondary Structure Prediction
  • Lecture 11, June 13, 2006
  • Algorithms of Molecular Biology

2
Introduction to RNA Sequence/Structure Analysis
  • RNAs have many structural and functional uses
  • Translation
  • Transcription
  • RNA splicing
  • RNA processing and editing
  • cellular localization
  • catalysis

3
RNA functions
  • RNA functions as
  • mRNA
  • rRNA
  • tRNA
  • In nuclear export
  • Part of spliceosome (snRNA)
  • Regulatory molecules (RNAi)
  • Enzymes
  • Viral genomes
  • Retrotransposons
  • Medicine

4
Biological Functions of Nucleic Acids
  • tRNA (transfer RNA, adaptor in translation)
  • rRNA (ribosomal RNA, component of ribosome)
  • snRNA (small nuclear RNA, component of
    spliceosome)
  • snoRNA (small nucleolar RNA, takes part in
    processing of rRNA)
  • RNase P (ribozyme, processes tRNA)
  • SRP RNA (RNA component of signal recognition
    particle)
  • ..

5
RNA Sequence Analysis
  • RNA sequence analysis differs from DNA sequence
    analysis
  • RNA molecules fold and base pair to form
    secondary structures
  • For RNA, conservation of structure is often more
    important than conservation of sequence

6
More Secondary Structures
Secondary Structures of Nucleic Acids
Pseudoknots
  • DNA is primarily in duplex form.
  • RNA is normally single-stranded and can adopt
    diverse secondary structures other than a
    duplex.

Source: Cornelis W. A. Pleij, in Gesteland, R. F.
and Atkins, J. F. (1993) The RNA World. Cold
Spring Harbor Laboratory Press.
rRNA Secondary Structure Based on Phylogenetic
Data
7
3D Structures of RNA: Catalytic RNA
  • Some structural rules
  • Base pairing is stabilizing
  • Unpaired sections (loops) destabilize
  • 3D conformation with interactions makes up for
    this

Tertiary Structure Of Self-splicing RNA
Secondary Structure Of Self-splicing RNA
8
RNA secondary structure
  • E. coli RNase P RNA secondary structure

Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm
9
tRNA structure
10
RNA Variations
  • Variations in RNA sequence maintain base-pairing
    patterns for secondary structures
  • when the nucleotide at one position changes, the
    nucleotide it pairs with must also change to
    maintain the same structure
  • Such variation is referred to as covariation.

11
Covariance
  • secondary structure prediction in RNA takes into
    account conserved patterns of base-pairing
  • Positions of covariance are conserved matches,
    since they maintain the secondary structure
  • computationally challenging

12
Features of RNA
  • RNA polymer composed of a combination of four
    nucleotides
  • adenine (A)
  • cytosine (C)
  • guanine (G)
  • uracil (U)

13
Features of RNA
  • G-C and A-U form complementary hydrogen bonded
    base pairs (canonical Watson-Crick)
  • G-C base pairs are more stable (3 hydrogen
    bonds); A-U base pairs are less stable (2
    hydrogen bonds)
  • non-canonical pairs can occur in RNA -- most
    common is G-U

14
Features of RNA
  • RNA typically produced as a single stranded
    molecule (unlike DNA)
  • Strand folds upon itself to form base pairs
  • secondary structure of the RNA

15
Features of RNA
  • intermediary between a linear molecule and a
    three-dimensional structure
  • Secondary structure mainly composed of
    double-stranded RNA regions formed by folding the
    single-stranded RNA molecule back on itself

16
Stem Loops (Hairpins)
  • Loops generally at least 4 bases long

17
Bulge Loops
  • occur when bases on one side of the structure
    cannot form base pairs

18
Interior Loops
  • occur when bases on both sides of the structure
    cannot form base pairs

19
Junctions (Multiloops)
  • two or more double-stranded regions converge to
    form a closed structure

20
Tertiary Interactions
  • tertiary interactions can be present as well
  • located using covariance analysis

21
Kissing Hairpins
  • unpaired bases of two separate hairpin loops base
    pair with one another

22
Pseudoknots
23
Hairpin-Bulge Interactions
24
RNA structure prediction methods
  • Dot Plot Analysis
  • Base-Pair Maximization
  • Free Energy Methods
  • Covariance Models

25
How RNA Prediction Methods Were Developed
  • Mount p. 334
  • After Tinoco et al. measured the energy
    associated with regions of secondary structure,
    several energy-based algorithms were developed
  • Nussinov and Jacobson (1980), Zuker and Stiegler
    (1981), Trifonov and Bolshoi (1983).

26
Main approaches to RNA secondary structure
prediction
  • Energy minimization
  • dynamic programming approach
  • does not require prior sequence alignment
  • require estimation of energy terms contributing
    to secondary structure
  • Comparative sequence analysis
  • Using sequence alignment to find conserved
    residues and covariant base pairs.
  • most trusted

27
Circular Representation
  • base pairs of a secondary structure represented
    by a circle
  • arc drawn for each base pairing in the structure
  • If any arcs cross, a pseudoknot is present
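The crossing-arc rule above can be checked directly from a list of base pairs. The following is a minimal sketch, assuming a structure is given as 0-based index pairs; the function name `has_pseudoknot` and the two example structures are illustrative and not part of the lecture.

```python
from itertools import combinations

def has_pseudoknot(pairs):
    """Return True if any two arcs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    # Normalize each pair to (smaller index, larger index) and sort by start.
    arcs = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for (i, j), (k, l) in combinations(arcs, 2):
        # After sorting, i <= k. The arcs cross when k lies inside (i, j)
        # and l lies outside it.
        if i < k < j < l:
            return True
    return False

nested = [(0, 8), (1, 7), (2, 6)]   # hypothetical hairpin stem: arcs nest
crossing = [(0, 5), (3, 8)]         # hypothetical pair of crossing arcs

print(has_pseudoknot(nested))
print(has_pseudoknot(crossing))
```

The check is quadratic in the number of base pairs, which is fine for drawing-sized structures.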

28
Circular Representation
  • Image source: http://www.finchcms.edu/cms/biochem/Walters/rna_folding.html

30
Circular Representation
31
Base-Pair Maximization
  • Find structure with the most base pairs
  • Efficient dynamic programming approach to this
    problem introduced by Ruth Nussinov (Tel-Aviv,
    1970s).
  • Tutorial in the classroom: let us try to
    reconstruct Nussinov's algorithm

32
Nussinov Algorithm
  • Four ways to get the best structure between
    position i and j from the best structures of the
    smaller subsequences
  • 1) add the i,j pair onto the best structure found
    for subsequence i+1, j-1
  • 2) add unpaired position i onto the best
    structure for subsequence i+1, j
  • 3) add unpaired position j onto the best
    structure for subsequence i, j-1
  • 4) combine two optimal structures i,k and
    k+1, j

33
Nussinov Algorithm
34
Nussinov Algorithm
  • compares a sequence against itself in a dynamic
    programming matrix
  • Four rules for scoring the structure at a
    particular point
  • Since structure folds upon itself, only necessary
    to calculate half the matrix

35
Nussinov Algorithm
  • Initialization: scores along the main diagonal
    and the diagonal just below it are set to zero
  • Formally, the scoring matrix, M, is initialized
    as
  • $M_{i,i} = 0$ for $i = 1$ to $L$ ($L$ is the
    sequence length)
  • $M_{i,i-1} = 0$ for $i = 2$ to $L$

36
Nussinov Algorithm
  • Using the sequence GGGAAAUCC, the matrix now
    looks like the following, such that sequences of
    length 1 will score 0

37
Nussinov Algorithm
  • Matrix Fill
  • $M_{i,j}$ = max of the following
  • $M_{i+1,j}$ (ith residue is hanging off by itself)
  • $M_{i,j-1}$ (jth residue is hanging off by itself)
  • $M_{i+1,j-1} + S(x_i, x_j)$ (ith and jth residues
    are paired; if $x_i$ is the complement of $x_j$,
    then $S(x_i, x_j) = 1$, otherwise it is 0)
  • $\max_{i<k<j} (M_{i,k} + M_{k+1,j})$ (merging
    two substructures)
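The initialization and fill rules can be sketched in a few lines of Python. This is an illustrative version, assuming 0-based indices, Watson-Crick pairs only (as in the scoring function S above) and no minimum loop length; the function name `nussinov_fill` is not from the lecture.

```python
def nussinov_fill(seq, pairs=frozenset({"AU", "UA", "GC", "CG"})):
    """Return matrix M where M[i][j] is the maximum number of base pairs in seq[i..j]."""
    n = len(seq)
    # Every cell starts at 0, which covers the initialization step:
    # the main diagonal M[i][i] and the diagonal below it M[i][i-1].
    M = [[0] * n for _ in range(n)]
    for span in range(1, n):              # span = j - i, one diagonal at a time
        for i in range(n - span):
            j = i + span
            score = 1 if seq[i] + seq[j] in pairs else 0
            best = max(
                M[i + 1][j],                  # i unpaired
                M[i][j - 1],                  # j unpaired
                M[i + 1][j - 1] + score,      # i and j paired (if complementary)
            )
            for k in range(i + 1, j):         # bifurcation, i < k < j
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best
    return M

M = nussinov_fill("GGGAAAUCC")
print(M[0][len(M) - 1])   # maximum number of base pairs for the whole sequence
```

Only the upper triangle is filled, matching the remark that half of the matrix suffices. The inner loop over k makes the fill cubic in the sequence length.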

38
Nussinov Algorithm
  • The final filled matrix is as follows

39
Nussinov Algorithm
  • Traceback (Durbin et al., p. 271) leads to the
    following structure
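A traceback can be sketched as follows. This is a minimal, self-contained illustration, assuming Watson-Crick pairs only and a stack-based walk that tries the four rules in a fixed order; the function names and the dot-bracket output format are choices made here, and a different rule order may return a different structure with the same number of pairs.

```python
PAIRS = {"AU", "UA", "GC", "CG"}

def fill(seq):
    """Nussinov matrix: M[i][j] = maximum number of base pairs in seq[i..j]."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            s = 1 if seq[i] + seq[j] in PAIRS else 0
            options = [M[i + 1][j], M[i][j - 1], M[i + 1][j - 1] + s]
            options += [M[i][k] + M[k + 1][j] for k in range(i + 1, j)]
            M[i][j] = max(options)
    return M

def traceback(seq, M):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        s = 1 if seq[i] + seq[j] in PAIRS else 0
        if M[i + 1][j] == M[i][j]:                  # i unpaired
            stack.append((i + 1, j))
        elif M[i][j - 1] == M[i][j]:                # j unpaired
            stack.append((i, j - 1))
        elif s and M[i + 1][j - 1] + 1 == M[i][j]:  # i pairs with j
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:                                       # bifurcation at some k
            for k in range(i + 1, j):
                if M[i][k] + M[k + 1][j] == M[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return "".join(structure)

seq = "GGGAAAUCC"
print(traceback(seq, fill(seq)))
```

The number of "(" characters in the result equals the score in the top-right cell of the matrix.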

40
SCFG Version
  • Nussinov algorithm can be converted to a
    stochastic context-free grammar
  • S → aS | cS | gS | uS
  • S → Sa | Sc | Sg | Su
  • S → aSu | cSg | uSa | gSc
  • S → SS

41
Nussinov Algorithm
  • Web Interface
  • http://ludwig-sun2.unil.ch/bsondere/nussinov/

42
Nussinov Results
43
Evaluation of Maximizing Basepairs
  • Simplistic approach
  • Does not give accurate structure predictions
  • Ignores nearest neighbor interactions
  • Ignores stacking interactions
  • Ignores loop length preferences

44
Free Energy Minimization RNA Structure
Prediction
  • All possible choices of complementary sequences
    are considered
  • Set(s) providing the most energetically stable
    molecules are chosen
  • When RNA is folded, some bases are paired with
    others while the rest remain free, forming loops
    in the molecule.
  • Speaking qualitatively, bases that are bonded
    tend to stabilize the RNA (i.e., have negative
    free energy), whereas unpaired bases form
    destabilizing loops (positive free energy).
  • Through thermodynamics experiments, it has been
    possible to estimate the free energy of some of
    the common types of loops that arise.
  • Because the secondary structure is related to the
    function of the RNA, we would like to be able to
    predict the secondary structure.
  • Given an RNA sequence, the RNA Folding Problem is
    to predict the secondary structure that minimizes
    the total free energy of the folded RNA molecule.

45
Prediction of Minimum-Energy RNA Structure is
Limited
  • In predicting minimum energy RNA secondary
    structure, several simplifying assumptions are
    made.
  • The most likely structure is identical to the
    energetically preferable structure
  • Nearest-neighbor energy calculations give
    reliable estimates of experimentally
    measurable energies
  • Usually we can neglect pseudoknots

46
Assumptions in Secondary Structure Prediction
  • most likely structure similar to energetically
    most stable structure
  • Energy associated with any position is only
    influenced by local sequence and structure
  • Structure formed does not produce pseudoknots

47
Inferring Structure By Comparative Sequence
Analysis
  • most reliable computational method for
    determining RNA secondary structure
  • consider the example from Durbin, et al., p 266
  • See an additional lecture of David Mathews

48
Predicting Structure From a Single Sequence
  • An RNA molecule only 200 bases long has $10^{50}$
    possible secondary structures
  • Find self-complementary regions in an RNA
    sequence using a dot-plot of the sequence against
    its complement
  • repeat regions can potentially base pair to form
    secondary structures
  • advanced dot-plot techniques incorporate free
    energy measures
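A dot plot of a sequence against its complement can be sketched as a boolean matrix, with stems showing up as diagonal runs. The code below is an illustrative sketch, assuming G-U wobble pairs are allowed alongside Watson-Crick pairs and that a "stem" is any run of at least `min_len` stacked pairs; the function names, the default `min_len`, and the test sequence are hypothetical, and no minimum hairpin loop length is enforced.

```python
def pairing_dot_plot(seq, pairs=frozenset({"AU", "UA", "GC", "CG", "GU", "UG"})):
    """dots[i][j] is True when base i could pair with base j."""
    n = len(seq)
    return [[seq[i] + seq[j] in pairs for j in range(n)] for i in range(n)]

def find_stems(seq, min_len=3):
    """Return (i, j, length) for runs where seq[i+t] pairs with seq[j-t], i < j."""
    dots = pairing_dot_plot(seq)
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if not dots[i][j]:
                continue
            # Report a run only from its outermost pair.
            if i > 0 and j < n - 1 and dots[i - 1][j + 1]:
                continue
            length = 0
            while i + length < j - length and dots[i + length][j - length]:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems

for start, end, length in find_stems("GGGGAAAACCCC"):
    print(start, end, length)
```

Each reported run is only a candidate helix. Choosing a compatible set of runs is the job of the base-pair maximization or energy methods.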

49
Dot Plot
  • Image source: http://www.finchcms.edu/cms/biochem/Walters/rna_folding.html

50
Energy Minimization Methods
  • RNA folding is determined by biophysical
    properties
  • Energy minimization algorithm predicts the
    correct secondary structure by minimizing the
    free energy (ΔG)
  • ΔG calculated as sum of individual contributions
    of
  • loops
  • base pairs
  • secondary structure elements
  • Energies of stems calculated as stacking
    contributions between neighboring base pairs

51
Energy Minimization Methods
  • Free-energy values (kcal/mol at 37 °C) are as
    follows

52
Energy Minimization Methods
  • Free-energy values (kcal/mol at 37 °C) are as
    follows

53
Energy Minimization Methods
  • Given the energy tables, and a folding, the free
    energy can be calculated for a structure
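As a rough illustration of how table lookups add up, the sketch below sums stacking terms for adjacent base pairs in one stem and adds a single hairpin-loop term. The function name, the table layout, and every number in `toy_stacking` and `toy_hairpin` are placeholders invented for this example; they are not the measured values from the energy tables on the preceding slides.

```python
def hairpin_free_energy(stem_pairs, loop_length, stacking, hairpin_loop):
    """Sum stacking terms for neighboring base pairs plus one hairpin-loop term.

    stem_pairs   : base pairs from the outermost to the innermost, e.g. ["GC", "GC", "AU"]
    stacking     : dict mapping (outer_pair, inner_pair) to a free energy
    hairpin_loop : dict mapping loop length to a free energy
    """
    total = 0.0
    for outer, inner in zip(stem_pairs, stem_pairs[1:]):
        total += stacking[(outer, inner)]      # stabilizing (negative) terms
    total += hairpin_loop[loop_length]         # destabilizing (positive) term
    return total

# Placeholder numbers for illustration only, NOT measured values.
toy_stacking = {("GC", "GC"): -3.0, ("GC", "AU"): -2.0}
toy_hairpin = {4: 4.5}

dg = hairpin_free_energy(["GC", "GC", "AU"], 4, toy_stacking, toy_hairpin)
print(dg)
```

A real calculation would also need terms for bulges, interior loops and multiloops, which this sketch leaves out.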

54
Calculating Best Structure
  • sequence is compared against itself using a
    dynamic programming approach
  • similar to the maximum base-paired structure
  • instead of using a scoring scheme, the score is
    based upon the free energy values
  • Gaps represent some form of a loop
  • The most widely used software that incorporates
    this minimum free energy algorithm is MFOLD.

55
Free Energy Minimization RNA Structure
Prediction
  • http://www.bioinfo.rpi.edu/zukerm/Bio-5495/RNAfold-html/

56
Calculating Best Structure
  • most widely used software incorporating minimum
    free energy algorithm is MFOLD
  • http://www.bioinfo.rpi.edu/applications/mfold/
  • http://www.bioinfo.rpi.edu/applications/mfold/old/rna/

57
Example Sequence
  • GCTTACGACCATATCACGTTGAATGCACGC
  • CATCCCGTCCGATCTGGCAAGTTAAGCAAC
  • GTTGAGTCCAGTTAGTACTTGGATCGGAGA
  • CGGCCTGGGAATCCTGGATGTTGTAAGCT

58
MFOLD Energy Dot Plot
59
Optimal Structure
60
Suboptimal Folds
  • The correct structure is not necessarily the
    structure with optimal free energy
  • It may instead lie within a certain threshold of
    the calculated minimum energy
  • MFOLD updated to report suboptimal folds

61
Comparison of Methods
62
Inferring Structure By Comparative Sequence
Analysis
  • first step is to calculate a multiple sequence
    alignment
  • Requires sequences be similar enough so that they
    can be initially aligned
  • Sequences should be dissimilar enough for
    covarying substitutions to be detected

63
Mutual Information
  • $f_{x_i}$: frequency of a base in column i
  • $f_{x_i x_j}$: joint (pairwise) frequency of a base
    pair between columns i and j
  • Information ranges between 0 and 2 bits
  • If i and j are uncorrelated, mutual information
    is 0
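Using the frequencies defined above, the standard mutual information between two alignment columns is the sum over base pairs of f(xi,xj) * log2( f(xi,xj) / (f(xi) * f(xj)) ). The sketch below computes it for two columns given as strings; the function name and the toy columns are illustrative, and gap characters are treated like any other symbol, which is a simplification.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns of equal length."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    f_i = Counter(col_i)                 # single-column counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))    # joint counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi

# Toy columns: four sequences whose bases covary perfectly as Watson-Crick pairs.
covarying = mutual_information("ACGU", "UGCA")
# Toy columns: column i is fully conserved, so it carries no information about column j.
conserved = mutual_information("AAAA", "ACGU")
print(covarying, conserved)
```

The two toy cases hit the extremes of the 0 to 2 bit range: perfect covariation over four equally frequent pairs gives 2 bits, and a conserved column gives 0.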

64
Mutual Information Plot
65
Mutual Information Plot
66
Frameshifting
  • Virology. 2005 Feb 20;332(2):498-510
  • Programmed ribosomal frameshifting in decoding
    the SARS-CoV genome.
  • Baranov PV, Henderson CM, Anderson CB, Gesteland
    RF, Atkins JF, Howard MT. Department of Human
    Genetics, University of Utah, 15 N 2030 E, Room
    7410, Salt Lake City, UT 84112-5330, USA.
  • Programmed ribosomal frameshifting is an
    essential mechanism used for the expression of
    orf1b in coronaviruses. Comparative analysis of
    the frameshift region reveals a universal shift
    site U_UUA_AAC, followed by a predicted
    downstream RNA structure in the form of either a
    pseudoknot or kissing stem loops. Frameshifting
    in SARS-CoV has been characterized in cultured
    mammalian cells using a dual luciferase reporter
    system and mass spectrometry. Mutagenic analysis
    of the SARS-CoV shift site and mass spectrometry
    of an affinity tagged frameshift product
    confirmed tandem tRNA slippage on the sequence
    U_UUA_AAC. Analysis of the downstream pseudoknot
    stimulator of frameshifting in SARS-CoV shows
    that a proposed RNA secondary structure in loop
    II and two unpaired nucleotides at the stem
    I-stem II junction in SARS-CoV are important for
    frameshift stimulation. These results demonstrate
    key sequences required for efficient
    frameshifting, and the utility of mass
    spectrometry to study ribosomal frameshifting.

67
Frameshifting
  • RNA-struct-frameshift.pdf
  • frameshifts.pdf
  • hepatitisC-frameshift.pdf

68
Covariance Models
  • 7 approaches to locate covarying sites offered in
    Mount, p225
  • key to covariance is mutual information content
  • mutual information content can be plotted on a
    motif logo

69
Mutual Information
  • Image source: http://www.cbs.dtu.dk/gorodkin/appl/slogo.html

70
Covariance Models
  • A formal covariance model, COVE, devised by Eddy
    and Durbin
  • Provides very accurate results
  • extremely slow and unsuitable for searching large
    genomes

71
SCFGs
  • Stochastic Context Free Grammars (SCFGs) have
    also been used to model RNA secondary structure
  • Examples
  • tRNAscan-SE
  • program created to find snoRNAs
  • Grammars are created by using a training set of
    data, and then the grammars are applied to
    potential sequences to see if they fit into the
    language

72
SCFGs
  • SCFGs allow the detection of sequences belonging
    to a family
  • tRNAs
  • group I introns
  • snoRNAs
  • snRNAs

73
SCFGs
  • base-paired columns modeled by pairwise emitting
    nonterminals
  • aWu | aWa | aWc | aWg | ...
  • single-stranded columns modeled by leftwise
    emitting nonterminals (when possible)
  • aW | cW | gW | uW | ...

74
SCFGs
  • Any RNA structure can be reduced to a SCFG (see
    Durbin, et al., p 278-279)

75
Transformational Grammars
  • First described by linguist Noam Chomsky in the
    1950s.
  • (Yes, the same Noam Chomsky who has expressed
    various dissident political views throughout the
    years!)

76
Transformational Grammars
  • Very important in computer science, most notably
    in compiler design
  • Covered in detail in compiler and automaton
    classes

77
Transformational Grammars
  • Idea: take an output (sentence, RNA
    structure) and determine whether it can be
    produced using a set of rules
  • A grammar consists of a set of symbols and
    production rules
  • The symbols can be terminal (emitting) symbols or
    non-terminal symbols

78
Grammar for Palindromes
  • Consider palindromic DNA sequences
  • Five possible terminal symbols: A, C, G, T, ε
    (ε represents the blank terminal symbol)

79
Grammar for Palindromes
  • Production rules, where S and W are non-terminal
    symbols
  • S → W
  • W → aWa | cWc | gWg | tWt
  • W → a | c | g | t | ε

80
Derivation of Sequences
  • Using these production rules, a derivation of the
    palindromic sequence acttgttca follows
  • S → W → aWa → acWca → actWtca → acttWttca →
    acttgttca
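The derivation can be generated mechanically for any palindrome over a, c, g, t. The following is a minimal sketch of that idea; the function name `derive_palindrome` and the choice to return the list of sentential forms are illustrative and not from the lecture.

```python
def derive_palindrome(seq):
    """Return the sentential forms S, W, xWx, ... that derive seq,
    or None if seq cannot be produced by the palindrome grammar."""
    if set(seq) - set("acgt") or seq != seq[::-1]:
        return None
    steps = ["S", "W"]                       # S -> W
    half = len(seq) // 2
    for t in range(1, half + 1):
        # W -> xWx wraps the same terminal around both sides of W.
        steps.append(seq[:t] + "W" + seq[len(seq) - t:])
    # Last step: W -> x for odd lengths, W -> ε for even lengths.
    steps.append(seq)
    return steps

print(" → ".join(derive_palindrome("acttgttca")))
```

For the sequence on this slide the printed chain has the same forms as the derivation shown above.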

81
Parse Trees
  • A context-free grammar can be aligned to a
    sequence using a parse tree
  • Root of the tree is the non-terminal start
    symbol, S
  • Leaves are terminal symbols
  • Internal nodes are the nonterminals
  • Leaves can be parsed from left to right to view
    the results of production

82
Parse Tree
83
RNA Structure SCFG
  • S → W
  • W → WW (bifurcation)
  • W → aWu | cWg | gWc | uWa (stems)
  • W → gWu | uWg
  • W → aW | cW | gW | uW (bulges)
  • W → Wa | Wc | Wg | Wu (bulges)
  • W → a | c | g | u | ε

84
Example of SCFG
  • A derivation of the structure that MFOLD
    produces for the following sequence can be
    constructed (5' to 3')
  • GCUUACGACCAUAUCACGUUGAAUGCACGCCAUCCCGUCCGAUCUGGCAA
    GUUAAGCAACGUUGAGUCCAGUUAGUACUUGGAUCGGAGACGGCCUGGGA
    AUCCUGGAUGUUGUAAGCU

85
Example Construction
  • S →
  • W →
  • Wu →
  • gWcu →
  • gcWgcu →
  • gcuWagcu →
  • gcuuWaagcu →
  • gcuuaWuaagcu →
  • gcuuacWguaagcu →
  • gcuuacgWuguaagcu →
  • gcuuacgaWuuguaagcu →
  • gcuuacgacWguuguaagcu →
  • gcuuacgaccWguuguaagcu →
  • gcuuacgaccaWguuguaagcu → ...

86
Other Programs
  • RNA Movies
  • http://bibiserv.techfak.uni-bielefeld.de/rnamovies/
  • (Visualization of RNA secondary structure)
  • RNA LOGOS
  • http://www.cbs.dtu.dk/gorodkin/appl/slogo.html