Bioinformatics Master Course II: DNA/Protein structure-function analysis and prediction - PowerPoint PPT Presentation


Slides: 17
Provided by: Victor195

Bioinformatics Master Course II: DNA/Protein
structure-function analysis and prediction
  • Lecture 12
  • DNA/RNA structure prediction
  • Centre for Integrative Bioinformatics VU

  • Transcription factors (TFs) are essential for
    transcription initiation
  • Transcription of mRNA is carried out by RNA
    polymerase II (Pol II)
  • The mRNA must then move from the nucleus to the
    ribosomes (outside the nucleus) for translation
  • In eukaryotes there can be many TF-binding sites
    upstream of an ORF that together regulate its
    transcription
  • Nucleosomes (chromatin structures composed of
    histones) are structures around which DNA
    coils. This blocks TF access to the DNA

Figure: TF binding sites (closed and open) and mRNA transcription
  • Because DNA is flexible, bound TFs can move
    in order to interact with Pol II, which is
    necessary for transcription initiation.
  • A recent theory of TF-based initiation includes a
    wave function (Carlsberg) of TF binding, which is
    supposed to propagate from left to right. In this
    way the TF-binding site nearest to the TATA box
    would be bound by a TF, which in turn binds Pol II
  • It has been suggested that speckles are involved
    in this (speckles are protein plaques observed in
    the nucleus)
  • Current prediction methods for gene
    co-expression, e.g. finding a single shared TF
    binding site, do not take this TF cooperativity
    into account

Figure: TF binding sites and Pol II during mRNA transcription (this is still a hypothetical model)
Canonical base pairs
The complementary bases C-G and A-U form stable
base pairs with each other through hydrogen bonds
between donor and acceptor sites on the bases.
These Watson-Crick (W-C) base pairs are called
canonical base pairs. In addition, we consider the
weaker G-U wobble pair, in which the bases bond in
a skewed fashion. Other base pairs also occur, some
of which are stable; these are called
non-canonical base pairs.
The secondary structure of an RNA molecule is the
collection of base pairs that occur in its
3-dimensional structure. An RNA sequence will be
represented as $R = r_1 r_2 r_3 \ldots r_n$, where
$r_i$ is called the $i$th (ribo)nucleotide. Each
$r_i$ belongs to the set $\{a, c, g, u\}$.
Secondary Structure and pseudoknots
The last condition excludes pseudoknots. These
occur when 2 base pairs, i-j and i'-j', satisfy
$i < i' < j < j'$. Pseudoknots are not taken
into account in secondary structure prediction
because energy minimizing methods cannot deal
with them: it is not known how to assign energies
to the loops created by pseudoknots, and the
dynamic programming methods that compute minimum
energy structures break down. For this reason,
pseudoknots are often considered part of
tertiary structure. Nevertheless, pseudoknots are
real and important structural features, and
covariance methods (next slide) are able to
predict them from aligned, homologous RNA
sequences. The figure on the next slide
represents a small pseudoknot model.
  • A secondary structure, or folding, on R is a set
    S of ordered pairs, written as i-j, satisfying
  • $j - i > 4$
  • If i-j and i'-j' are 2 base pairs (assuming
    without loss of generality that $i \le i'$),
    then either
  • $i = i'$ and $j = j'$ (they are the same base
    pair),
  • $i < j < i' < j'$ (i-j precedes i'-j'), or
  • $i < i' < j' < j$ (i-j includes i'-j')
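The conditions above can be checked mechanically. The Python sketch below assumes a structure is given as a set of 1-based pairs (i, j) with i < j; the function name `check_structure` and its `min_gap` parameter are illustrative, not taken from any published tool. Because every base may occur in at most one pair, two pairs with i < i' are allowed exactly when they are disjoint (j < i') or nested (j' < j); the remaining case, i' < j < j', is a pseudoknot.

```python
def check_structure(pairs, n, min_gap=4):
    """Return (is_valid, reason) for a set of 1-based base pairs (i, j).

    Checks 1 <= i < j <= n, j - i > min_gap, that each base is used at most
    once, and that no two pairs cross (which would be a pseudoknot).
    """
    seen = set()
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False, f"pair {i}-{j} is out of range"
        if j - i <= min_gap:
            return False, f"pair {i}-{j} closes a loop that is too small"
        if i in seen or j in seen:
            return False, f"a base in {i}-{j} is already paired"
        seen.update((i, j))
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # k > i here, since sorting gives k >= i and bases are unique
            if k < j < l:
                return False, f"pairs {i}-{j} and {k}-{l} form a pseudoknot"
    return True, "ok"
```

For example, `check_structure({(1, 10), (6, 15)}, 15)` reports a pseudoknot, while the nested set `{(1, 20), (2, 19), (8, 14)}` on a 20-nucleotide sequence passes.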

  • A 3D model of a pseudoknot
  • The 2 helices in this structure are stacked
  • RNA structure can be predicted from sequence
    data. There are two basic routes.
  • The first attempts structure prediction of single
    sequences based on minimizing the free energy of
    the folded structure
  • The second computes common foldings for a family
    of aligned, homologous RNAs. Usually, the
    alignment and secondary structure inference must
    be performed simultaneously, or at least
    iteratively (see next slide)

Predicting RNA Secondary Structure
  • By Thermodynamic Method
  • Minimize Gibbs Free Energy
  • By Phylogenetic Comparison Method (Covariance
    Method)
  • Compare RNA Sequences of Identical Function From
    Different Organisms
  • By Combination of the Above Two Methods
  • In principle, this could be the most powerful
    approach

The Equilibrium Partition Function
  • Gibbs Free Energy, G
  • Describes the energetics of biomolecules in
    aqueous solution. The change in free energy,
    $\Delta G$, for a chemical process, such as
    nucleic acid folding, can be used to determine
    the direction of the process
  • $\Delta G = 0$: equilibrium
  • $\Delta G > 0$: unfavorable process
  • $\Delta G < 0$: favorable process
  • Thus the natural tendency for biomolecules in
    solution is to minimize the free energy of the
    entire system (biomolecules + solvent).
  • $\Delta G = \Delta H - T\Delta S$
  • $\Delta H$ is the enthalpy change, $\Delta S$ is the
    entropy change, and $T$ is the temperature in
    kelvin.
  • Molecular interactions, such as hydrogen bonds,
    van der Waals and electrostatic interactions,
    contribute to the $\Delta H$ term. $\Delta S$
    describes the change of order of the system.
  • Thus, both molecular interactions and the
    order of the system determine the direction of a
    chemical process.
  • For any nucleic acid solution, it is extremely
    difficult to calculate the free energy from first
    principles
  • Biophysical methods can be used to measure free
    energy changes
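  • For a population of structures S, the partition
    function Q and the probability P(s) of a
    particular folding s can be calculated
    (R is the gas constant):
    $Q = \sum_{s \in S} e^{-\Delta G_s / RT}$,
    $P(s) = e^{-\Delta G_s / RT} / Q$
  • The heat capacity of the RNA can be obtained
    from $G = -RT \ln Q$
    and $C_p = -T \, \partial^2 G / \partial T^2$
  • Heat capacity $C_p$ (the heat required to raise
    the temperature by 1 K) can be measured
    experimentally, and can then be used to get
    information on G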
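A small numerical sketch of these relations in Python, assuming a hypothetical two-state ensemble (the unfolded chain as reference and one hairpin) with temperature-independent $\Delta H$ and $\Delta S$; the parameter values are invented for illustration, not measured. Each state has $\Delta G_s = \Delta H_s - T\Delta S_s$ and Boltzmann weight $e^{-\Delta G_s/RT}$; Q is the sum of the weights, the ensemble free energy is $G = -RT \ln Q$, and $C_p = -T\,\partial^2 G/\partial T^2$ is estimated by numerical differentiation. The helper names are illustrative.

```python
import math

R = 0.0019872  # gas constant, kcal/(mol*K)


def ensemble(states, T):
    """Return (Q, probabilities, G) at temperature T in kelvin.

    states: list of (dH, dS) in kcal/mol and kcal/(mol*K), each relative to
    the unfolded chain; every state has dG_s = dH - T*dS.
    """
    weights = [math.exp(-(dH - T * dS) / (R * T)) for dH, dS in states]
    Q = sum(weights)                    # partition function
    probs = [w / Q for w in weights]    # P(s) = exp(-dG_s/RT) / Q
    G = -R * T * math.log(Q)            # ensemble free energy
    return Q, probs, G


def heat_capacity(states, T, h=0.1):
    """C_p = -T * d2G/dT2, estimated with a central finite difference."""
    def g(t):
        return ensemble(states, t)[2]
    return -T * (g(T + h) - 2.0 * g(T) + g(T - h)) / h ** 2


# Hypothetical example: unfolded chain (0, 0) and one hairpin.
states = [(0.0, 0.0), (-40.0, -0.125)]
Q, probs, G = ensemble(states, 320.0)
cp = heat_capacity(states, 320.0)
```

With these made-up values the hairpin's $\Delta G$ is 0 at 320 K, so both states have probability 0.5 and Q = 2; this temperature is the midpoint of the melting transition, where $C_p$ is near its peak.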

Zuker's Energy Minimization Method (mFOLD)
Free Energy Parameters
  • An extensive database of free energies for the
    following RNA units has been obtained (the
    so-called Tinoco rules and Turner rules)
  • Single-strand stacking energy
  • Canonical (AU, GC) and non-canonical (GU)
    base pairs in duplexes
  • Accurate free energy parameters are still
    lacking for
  • Loops
  • Mismatches (AA, CA, etc.)
  • Using these energy parameters, the current
    version of mFOLD can predict 73% of the base
    pairs in phylogenetically deduced secondary
    structures.
  • An RNA sequence is written $R = r_1 r_2 r_3 \ldots r_n$,
    where $r_i$ is the $i$th ribonucleotide and
    belongs to the set $\{A, U, G, C\}$
  • A secondary structure of R is a set S of base
    pairs i.j that satisfies
  • $1 \le i < j \le n$
  • $j - i > 4$ (a hairpin loop cannot contain fewer
    than 4 nucleotides)
  • If i.j and i'.j' are two base pairs (assume
    $i \le i'$), then either
  • $i = i'$ and $j = j'$ (same base pair),
  • $i < j < i' < j'$ (i.j precedes i'.j'), or
  • $i < i' < j' < j$ (i.j includes i'.j') (this
    excludes pseudoknots, where $i < i' < j < j'$)
  • If e(i,j) is the energy of the base pair i.j,
    the total energy of a structure S on R is
    $E(S) = \sum_{i.j \in S} e(i,j)$
  • The objective is to minimize E(S).
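Under this definition the energy of a structure is a plain sum over its pairs. The Python sketch assumes a hypothetical table of per-pair energies (GC = -3, AU = -2, GU = -1 kcal/mol, chosen only for illustration); the real Tinoco/Turner parameters used by mFOLD are assigned to stacks and loops rather than to isolated pairs. The name `total_energy` is illustrative, and a pair of bases missing from the table raises `KeyError`.

```python
# Hypothetical per-pair energies e(i, j) in kcal/mol, for illustration only.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def total_energy(seq, structure):
    """E(S): sum of e(i, j) over the base pairs i.j in S (1-based indices)."""
    return sum(PAIR_ENERGY[(seq[i - 1], seq[j - 1])] for i, j in structure)


print(total_energy("GGGAAAAUCCC", {(1, 11), (2, 10), (3, 9)}))  # -9.0
```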

Dynamic Programming (mFOLD)
Suboptimal Folding (mFOLD)
  • An Example of W(i,j)
  • A matrix W(i,j) is computed that depends on
    the experimentally measured base-pair energies
  • Recursion begins with $i = 1$, $j = n$
  • If $W(i+1,j) = W(i,j)$, then i is not paired.
    Set $i = i+1$ and start the recursion again.
  • If $W(i,j-1) = W(i,j)$, then j is not paired.
    Set $j = j-1$ and start the recursion again.
  • If $W(i,j) = W(i,k) + W(k+1,j)$ for some k
    between i and j, the fragment k+1..j is put on a
    stack and the fragment i..k is analyzed by
    setting $j = k$ and going back to the beginning
    of the recursion.
  • If $W(i,j) = e(i,j) + W(i+1,j-1)$, the base pair
    i.j is identified and added to the list, and the
    recursion continues with $i = i+1$, $j = j-1$
  • For any sequence of N nucleotides, the expected
    number of structures is greater than $1.8^N$
  • A sequence of 100 nucleotides has about
    $3 \times 10^{25}$ foldings. If a computer could
    evaluate 1000 structures per second, it would
    take about $10^{15}$ years!
  • mFOLD generates suboptimal foldings whose free
    energies fall within a certain range of values.
    Many of these structures differ in trivial
    ways. These suboptimal foldings can still be
    useful for designing experiments.
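A compact Python sketch of the fill and traceback steps listed above. To stay short it assumes the simplified model of the E(S) definition: a hypothetical energy e(i,j) per base pair (values invented for illustration) and the j - i > 4 rule, so W(i,j) is the minimum of W(i+1,j), W(i,j-1), e(i,j) + W(i+1,j-1) and W(i,k) + W(k+1,j). This is a Nussinov-style recursion with pair energies, not mFOLD's loop-based algorithm. The traceback checks the cases in the order given above; indices are 0-based internally and reported 1-based. All names are illustrative.

```python
# Hypothetical per-pair energies e(i, j) in kcal/mol (illustration only).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
MIN_GAP = 4  # a pair i.j requires j - i > 4


def pair_energy(seq, i, j):
    """e(i, j) for 0-based positions, or None if i and j cannot pair."""
    if j - i <= MIN_GAP:
        return None
    return PAIR_ENERGY.get((seq[i], seq[j]))


def fill(seq):
    """W[i][j] = minimum energy of any nested structure on seq[i..j]."""
    n = len(seq)
    W = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = min(W[i + 1][j], W[i][j - 1])       # i or j unpaired
            e = pair_energy(seq, i, j)
            if e is not None:                          # i pairs with j
                best = min(best, e + W[i + 1][j - 1])
            for k in range(i + 1, j - 1):              # split i..k | k+1..j
                best = min(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W


def traceback(seq, W):
    """Recover the base pairs (1-based) by the four rules, in order."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        while i < j:
            if W[i + 1][j] == W[i][j]:                 # i is not paired
                i += 1
            elif W[i][j - 1] == W[i][j]:               # j is not paired
                j -= 1
            else:
                split = next((k for k in range(i + 1, j - 1)
                              if W[i][k] + W[k + 1][j] == W[i][j]), None)
                if split is not None:                  # stack k+1..j, do i..k
                    stack.append((split + 1, j))
                    j = split
                else:                                  # W = e(i,j) + W(i+1,j-1)
                    pairs.append((i + 1, j + 1))
                    i, j = i + 1, j - 1
    return sorted(pairs)
```

For `GGGAAAAUCCC`, `fill` gives W(1,n) = -9.0 and `traceback` returns the pairs 1.11, 2.10 and 3.9. The triple loop over i, j and k makes the fill O(n^3) in time, which is what makes the approach feasible where enumerating all foldings is not.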

A computer predicted folding of Bacillus subtilis
Other Secondary Prediction Methods
Secondary Structure Prediction for Aligned RNA
  • Nussinov algorithm (historically important),
    Hogeweg and Hesper (1984)
  • Vienna http//
  • uses the same recursive method in searching the
    folding space
  • Added the option of computing the population of
    RNA secondary structures by the equilibrium
    partition function
  • Specific heat of an RNA can be calculated by
    numerical differentiation from the equilibrium
    partition function
  • RNACAD http//
  • An effort to improve multiple RNA sequence
    alignment by taking into account both primary and
    secondary structure information
  • Uses stochastic context-free grammars (SCFGs), a
    generalization of the hidden Markov model (HMM)
    method
  • Bundschuh, R., and Hwa, T. (1999) RNA secondary
    structure formation: A solvable model of
    heteropolymer folding. Phys Rev Lett
    83, 1479-1482.
  • This work treats RNA as a heteropolymer and uses
    a simplified Go-like model to provide an exact
    solution for the RNA transition between its
    native and molten phases.
  • Both energy and RNA sequence covariation
    can be combined to predict RNA secondary
    structure
  • To quantify sequence covariation, let $f_i(X)$ be
    the frequency of base X at aligned position i and
    $f_{ij}(XY)$ the frequency of finding X at i and
    Y at j; the mutual information score is (Chiu and
    Kolodziejczak; Gutell and Woese)
    $M_{ij} = \sum_{X,Y} f_{ij}(XY) \log_2 \frac{f_{ij}(XY)}{f_i(X) f_j(Y)}$
  • If, for instance, only GC and GU pairs occur at
    positions i and j, then $M_{ij} = 0$: position i
    is always G, so it carries no covariation
    information
  • The total energy for the RNA is set to a linear
    combination of the measured free energy and the
    covariation contribution
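The mutual information score can be computed directly from two alignment columns. A minimal Python sketch, assuming each column is a string with one base per aligned sequence, no gap characters (gap handling is omitted), and base-2 logarithms so that the score is in bits; the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """M_ij in bits for two gap-free alignment columns of equal length.

    col_i[s] and col_j[s] are the bases of sequence s at positions i and j.
    """
    n = len(col_i)
    f_i = Counter(col_i)                 # counts behind f_i(X)
    f_j = Counter(col_j)                 # counts behind f_j(Y)
    f_ij = Counter(zip(col_i, col_j))    # counts behind f_ij(XY)
    m = 0.0
    for (x, y), count in f_ij.items():   # unobserved XY contribute 0
        p_xy = count / n
        m += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return m
```

Columns that always hold G against a mix of C and U (only GC and GU pairs) give 0, while columns that covary perfectly over all four Watson-Crick pair types, such as `"GCAU"` and `"CGUA"`, reach the 2-bit maximum for four bases.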

Running mFOLD
  • http//
  • Constraints can be entered
  • Force bases i, i+1, ..., i+k-1 to be
    double-stranded by entering F i 0 k on one line
    in the constraint box.
  • Force consecutive base pairs i.j, i+1.j-1,
    ..., i+k-1.j-k+1 by entering F i j k on one
    line in the constraint box.
  • Force bases i, i+1, ..., i+k-1 to be
    single-stranded by entering P i 0 k on one line
    in the constraint box.
  • Prohibit the consecutive base pairs i.j,
    i+1.j-1, ..., i+k-1.j-k+1 by entering P i j k on
    one line in the constraint box.
  • Prohibit bases i to j from pairing with bases k
    to l by entering P i-j k-l on one line in the
    constraint box.
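To make the constraint syntax concrete, here is a hypothetical Python helper that only restates the rules listed above; it is not part of mFOLD and does not reproduce mFOLD's own input parsing. It expands one constraint line into the explicit bases or base pairs it refers to (1-based), with an illustrative text label for the constraint type.

```python
def expand_constraint(line):
    """Hypothetical helper: expand one constraint line (1-based indices).

    Returns (meaning, items), where items are bases or (i, j) base pairs.
    """
    code, *args = line.split()
    if code not in ("F", "P"):
        raise ValueError(f"unknown constraint type: {code}")
    if len(args) == 2:                           # P i-j k-l
        if code != "P":
            raise ValueError("the region form is only defined for P")
        i, j = map(int, args[0].split("-"))
        k, l = map(int, args[1].split("-"))
        return "no pairs between regions", [(a, b) for a in range(i, j + 1)
                                            for b in range(k, l + 1)]
    i, j, k = map(int, args)
    if j == 0:                                   # F/P i 0 k: bases i..i+k-1
        meaning = "double-stranded" if code == "F" else "single-stranded"
        return meaning, list(range(i, i + k))
    pairs = [(i + t, j - t) for t in range(k)]   # i.j, i+1.j-1, ..., i+k-1.j-k+1
    return ("forced pairs" if code == "F" else "prohibited pairs"), pairs
```

For instance, `expand_constraint("F 1 21 2")` yields the forced pairs 1.21 and 2.20, and `expand_constraint("P 5 0 3")` marks bases 5, 6 and 7 as single-stranded.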

Figure: mFOLD foldings with no constraint and with the constraint F 1 21 2 entered
Predicting RNA 3D Structures
  • Mc-Sym uses a backtracking method to solve a
    general problem in computer science called the
    constraint satisfaction problem (CSP)
  • The backtracking algorithm organizes the search
    space as a tree where each node corresponds to
    the application of an operator
  • At each application, if the partially folded RNA
    structure is consistent with its RNA
    conformational database, the next operator is
    applied; otherwise the entire attached branch is
    pruned and the algorithm backtracks to the
    previous node.
  • Currently available RNA 3D structure prediction
    programs make use of the fact that a tertiary
    structure is built upon preformed secondary
    structure elements
  • So once a solid secondary structure can be
    predicted, it is possible to predict its 3D
    structure
  • The chances of obtaining a valid 3D structure can
    be increased by known spatial constraints among
    the different secondary structure segments (e.g.
    cross-linking, NMR results).
  • However, much less thermodynamic data is
    available for 3D RNA structures, which makes 3D
    structure prediction challenging.
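The backtracking search described above can be sketched generically in Python. This is not Mc-Sym's code: nucleotides play the role of CSP variables, candidate conformers (here the labels `helixA` and a hypothetical `loop`) are their domains, and a user-supplied `consistent` function stands in for the check against the conformational database. Each recursive step applies one operator (assigns one conformer), and an inconsistent partial structure prunes the whole branch below that node. The toy constraints are invented for illustration.

```python
def backtrack(variables, domains, consistent, assignment=None):
    """Depth-first backtracking search for a constraint satisfaction problem.

    variables:  order in which nucleotides are placed
    domains:    dict mapping each nucleotide to its candidate conformers
    consistent: function(partial assignment) -> bool
    Returns the first complete consistent assignment, or None.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    for conformer in domains[var]:
        assignment[var] = conformer          # apply one operator: new tree node
        if consistent(assignment):           # partial structure still acceptable?
            result = backtrack(variables, domains, consistent, assignment)
            if result is not None:
                return result
        del assignment[var]                  # prune this branch, back up one node
    return None


# Hypothetical toy constraints: nucleotide 4 must be "loop", and no two
# adjacent nucleotides may both be "loop".
def toy_consistent(a):
    if a.get(4, "loop") != "loop":
        return False
    return not any(a.get(k) == "loop" and a.get(k + 1) == "loop"
                   for k in range(1, 4))


nucleotides = [1, 2, 3, 4]
solution = backtrack(nucleotides, {v: ["helixA", "loop"] for v in nucleotides},
                     toy_consistent)
```

Here the search first tries `helixA` everywhere, fails at nucleotide 4, backs up one node and returns `{1: "helixA", 2: "helixA", 3: "helixA", 4: "loop"}`. Placing the most constrained variables first, as suggested for Mc-Sym's spanning tree, lets failures surface near the root where pruning removes the largest subtrees.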

Mc-Sym (Continued)
Sample script:
SEQUENCE 1 A r
1 helixA 2 helixA
3 helixA 4 helixA
5 helixA 6 helixA
19 helixA
  • 18 helix 19
  • 17 helix 18
  • 16 helix 17
  • .
  • 5 helix 6
  • 4 helix 5
  • 3 helix 4
  • 2 helix 3
  • 1 helix 2
  • 19 18 17 16 15 14 13 12
  • 12 11 10 9 8 7 6
  • 4 3 2 1
  • The selection of a spanning tree for a particular
    RNA is left to the user, but it is suggested that
    the nucleotides imposing the most constraints are
    introduced first
  • Users also supply a particular Mc-Sym
    conformation for each nucleotide. These
    conformers are derived from currently available
    3D databases

RNA-protein Interactions
  • There is currently no computational method that
    can predict RNA-protein interactions
  • Statistical methods have been applied to identify
    structural features at the protein-RNA interface.
    For instance, ENTANGLE finds that most atoms
    contributed by a protein to recognizing an RNA
    are from the main chain (C, O, N, H), not from
    side chains! But much remains to be done
  • Electrostatic potential has primary importance in
    protein-RNA recognition because of the negatively
    charged phosphate backbone. Efforts have been made
    to quantify the electrostatic potential at the
    molecular surface of a protein and an RNA in order
    to predict the site of RNA interaction. This often
    provides good predictions, at least for the site
    on the protein

  • Predicting RNA secondary structures
  • Good reviews
  • 1. Turner, D. H., and Sugimoto, N. (1988) RNA
    structure prediction. Annu Rev Biophys Biophys
    Chem 17, 167-92.
  • 2. Zuker, M. (2000) Calculating nucleic acid
    secondary structure. Curr Opin Struct Biol 10,
  • Obtaining experimental thermodynamic parameters
  • 3. Xia, T., SantaLucia, J., Jr., Burkard, M.
    E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox,
    C., and Turner, D. H. (1998) Thermodynamic
    parameters for an expanded nearest-neighbor model
    for formation of RNA duplexes with Watson-Crick
    base pairs. Biochemistry 37, 14719-35.
  • 4. Borer, P. N., Dengler, B., Tinoco, I., Jr.,
    and Uhlenbeck, O. C. (1974) Stability of
    ribonucleic acid double-stranded helices. J Mol
    Biol 86, 843-53.
  • Thermodynamic theory for RNA structure
  • 5. Bundschuh, R., and Hwa, T. (1999) RNA
    secondary structure formation: A solvable model
    of heteropolymer folding. Phys Rev Lett
    83, 1479-1482.
  • 6. McCaskill, J. S. (1990) The equilibrium
    partition function and base pair binding
    probabilities for RNA secondary structure.
    Biopolymers 29, 1105-19.