Gene Ontology GO

Slides: 32

Provided by: VictorAS


Transcript and Presenter's Notes


1
Master Course: DNA/Protein Structure-Function
Analysis and Prediction
Lecture 12: DNA/RNA Structure Prediction
2
Epigenetics / Epigenomics: Gene Expression

Transcription factors (TFs) are essential for
transcription initiation
Transcription is carried out by RNA polymerase II
(in eukaryotes)
mRNA must then move from the nucleus to the ribosomes
(outside the nucleus) for translation
In eukaryotes there can be many TF-binding sites
upstream of an ORF that together regulate
transcription
Nucleosomes (chromatin structures composed of
histones) are structures around which DNA
coils. This blocks access of TFs to their binding sites

3
Epigenetics / Epigenomics: Gene Expression
[Figure labels: TF binding site (closed), nucleosome, TF binding site (open), TATA box, mRNA transcription]
4
Expression

Because DNA is flexible, bound TFs can move
in order to interact with Pol II, which is
necessary for transcription initiation (see next
slide)
A recent theory of TF-based initiation includes a
wave function (Carlsberg) of TF binding, which is
supposed to proceed from left to right. In this way
the TF-binding site nearest to the TATA box would
be bound by a TF, which would then in turn bind Pol
II.
It has been suggested that speckles have
something to do with this (speckles are protein
plaques observed in the nucleus)
Current prediction methods for gene
co-expression, e.g. finding a single shared TF
binding site, do not take this TF cooperativity
into account (parking lot optimisation)

5
Expression (Continued)
[Figure: speckle]
6
DNA/RNA Structure-Function relationships

Apart from coding for proteins via genes, DNA is
now known to code for many more RNA-based cell
components (snRNA, rRNA, ...)
Structural features of DNA
(e.g. bendability, histone binding, methylation)
are increasingly recognised as important.
For the many different classes of RNA molecules,
structure directly determines function
It is therefore important to analyse and predict
DNA structure and, in particular, RNA structure

7
Canonical base pairs
The complementary bases, C-G and A-U form stable
base pairs with each other through the creation
of hydrogen bonds between donor and acceptor
sites on the bases. These are called Watson-Crick
base pairs and are also referred to as canonical
base pairs. In addition, we consider the weaker
G-U wobble pair, where the bases bond in a skewed
fashion. Other base pairs also occur, some of
which are stable. These are all called
non-canonical base pairs.
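
The classification on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `classify_pair` and the three label strings are assumptions made for the example, and the G-U wobble pair is given its own label because the slide treats it separately from the Watson-Crick pairs.

```python
# Watson-Crick (canonical) pairs and the G-U wobble pair, stored as
# unordered pairs so that G-C and C-G are treated alike.
CANONICAL = {frozenset("CG"), frozenset("AU")}
WOBBLE = {frozenset("GU")}


def classify_pair(x, y):
    """Return 'canonical', 'wobble' or 'non-canonical' for two RNA bases.

    Hypothetical helper for illustration; labels are not from the lecture.
    """
    pair = frozenset((x.upper(), y.upper()))
    if pair in CANONICAL:
        return "canonical"
    if pair in WOBBLE:
        return "wobble"
    return "non-canonical"
```

For example, `classify_pair("G", "C")` gives `"canonical"` and `classify_pair("U", "G")` gives `"wobble"`.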
8
RNA secondary structure
The secondary structure of an RNA molecule is the
collection of base pairs that occur in its
3-dimensional structure. An RNA sequence will be
represented as $R = r_1, r_2, r_3, \ldots, r_n$, where $r_i$
is called the i-th (ribo)nucleotide. Each $r_i$
belongs to the set $\{a, c, g, u\}$.
9
Secondary Structure and Pseudoknots

A secondary structure, or folding, on R is a set
S of ordered pairs, written as i-j, satisfying
$j - i > 4$
If i-j and i'-j' are 2 base pairs (assuming
without loss of generality that $i \le i'$), then
either
$i = i'$ and $j = j'$ (they are the same base
pair),
$i < j < i' < j'$ (i-j precedes i'-j'), or
$i < i' < j' < j$ (i-j includes
i'-j')

The last condition excludes pseudoknots. These
occur when 2 base pairs, i-j and i'-j', satisfy
$i < i' < j < j'$.
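
The conditions on this slide can be checked mechanically. The sketch below assumes base pairs are given as `(i, j)` tuples with `i < j`; the function names `is_secondary_structure` and `has_pseudoknot` are hypothetical and chosen only for this example.

```python
def is_secondary_structure(pairs, min_separation=4):
    """True if the pairs form a pseudoknot-free secondary structure.

    Every pair must satisfy j - i > min_separation, and any two distinct
    pairs must either follow one another or be nested.
    """
    pairs = sorted(set(pairs))
    if any(j - i <= min_separation for i, j in pairs):
        return False
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:       # i <= k because pairs are sorted
            precedes = i < j < k < l     # i-j precedes k-l
            includes = i < k < l < j     # i-j includes k-l
            if not (precedes or includes):
                return False
    return True


def has_pseudoknot(pairs):
    """True if two pairs cross, i.e. i < i' < j < j'."""
    pairs = sorted(set(pairs))
    return any(i < k < j < l
               for a, (i, j) in enumerate(pairs)
               for k, l in pairs[a + 1:])
```

Two nested pairs such as `(1, 10)` and `(2, 9)` pass the check, whereas the crossing pairs `(1, 10)` and `(5, 15)` are rejected and reported as a pseudoknot.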
10
Pseudoknots
Pseudoknots are not taken into account in
secondary structure prediction because energy
minimizing methods cannot deal with them. It is
not known how to assign energies to the loops
created by pseudoknots, and dynamic programming
methods that compute minimum energy structures
break down. For this reason, pseudoknots are
often considered as belonging to tertiary
structure. Nevertheless, pseudoknots are real and
important structural features, and
covariance methods are able to
predict them from aligned, homologous RNA
sequences. The figure on the next slide
represents a small pseudoknot model.
11
A 3D model of a pseudoknot
12

A 3D model of a pseudoknot
The 2 helices in the structure (preceding slide)
are stacked coaxially.
RNA structure can be predicted from sequence
data. There are two basic routes.
The first attempts structure prediction of single
sequences based on minimizing the free energy of
folding.
The second computes common foldings for a family
of aligned, homologous RNAs. Usually, the
alignment and secondary structure inference must
be performed simultaneously, or at least
iteratively (see next slide)

13
Predicting RNA Secondary Structure

By Thermodynamics Method
Minimize Gibbs Free Energy
By Phylogenetic Comparison Method (Covariance
method)
Compare RNA Sequences of Identical Function From
Different Organisms
By Combination of the Above Two Methods
In principle, this could be the most powerful
method

14
Thermodynamics

Gibbs Free Energy, G
Describes the energetics of biomolecules in
aqueous solution. The change in free energy, $\Delta G$,
for a chemical process, such as nucleic acid
folding, can be used to determine the direction
of the process
$\Delta G = 0$: equilibrium
$\Delta G > 0$: unfavorable process
$\Delta G < 0$: favorable process
Thus the natural tendency for biomolecules in
solution is to minimize the free energy of the entire
system (biomolecules + solvent).

15
Thermodynamics

$\Delta G = \Delta H - T \Delta S$
$\Delta H$ is the change in enthalpy, $\Delta S$ is the change in entropy, and T is the
temperature in Kelvin.
Molecular interactions, such as hydrogen bonds,
van der Waals and electrostatic interactions,
contribute to the $\Delta H$ term. $\Delta S$ describes the
change of order of the system.
Thus, both molecular interactions and the
order of the system determine the direction of a
chemical process.
For any nucleic acid solution, it is extremely
difficult to calculate the free energy from first
principles
Biophysical methods can be used to measure free
energy changes
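
As a small worked example of the relation between free energy, enthalpy and entropy, the sketch below evaluates the free energy change at two temperatures. The enthalpy and entropy values are made-up illustrative numbers, not measured parameters, and the function names are hypothetical.

```python
def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Return dG = dH - T * dS.

    delta_h in kcal/mol, delta_s in kcal/(mol*K), temperature_k in Kelvin.
    """
    return delta_h - temperature_k * delta_s


def direction(delta_g, tol=1e-9):
    """Classify a process by the sign of its free energy change."""
    if abs(delta_g) <= tol:
        return "equilibrium"
    return "favorable" if delta_g < 0 else "unfavorable"


# Hypothetical folding reaction: dH = -50 kcal/mol, dS = -0.14 kcal/(mol*K).
dg_body = gibbs_free_energy(-50.0, -0.14, 310.15)   # about -6.58 kcal/mol
dg_hot = gibbs_free_energy(-50.0, -0.14, 400.0)     # about +6.0 kcal/mol
```

With these assumed numbers folding is favorable at 310.15 K and unfavorable at 400 K, which illustrates how the entropy term takes over at high temperature.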

16
Thermodynamics
The Equilibrium Partition Function

For a population of structures S, a partition
function Q and the probability P(s) of a particular
folding s can be calculated:
$Q = \sum_{s \in S} e^{-\Delta G(s)/RT}$
$P(s) = e^{-\Delta G(s)/RT} / Q$
The heat capacity for the RNA can be obtained from
$C_p = -T \, \partial^2 G / \partial T^2$
and
$G = -RT \ln Q$
Heat capacity $C_p$ (heat required to change the
temperature by 1 degree) can be measured
experimentally, and can then be used to get
information on G
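
A minimal numerical sketch of the partition function, assuming the standard Boltzmann weighting exp(-ΔG/RT) for each structure. The three structures and their free energies are invented for illustration, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def partition_function(energies, temperature_k=310.15):
    """Sum of Boltzmann weights exp(-dG / RT) over all structures."""
    rt = R_KCAL * temperature_k
    return sum(math.exp(-g / rt) for g in energies.values())


def boltzmann_probabilities(energies, temperature_k=310.15):
    """Probability of each structure: its Boltzmann weight divided by Q."""
    rt = R_KCAL * temperature_k
    q = partition_function(energies, temperature_k)
    return {s: math.exp(-g / rt) / q for s, g in energies.items()}


# Hypothetical ensemble: free energies in kcal/mol (illustrative values only).
ensemble = {"hairpin": -5.0, "alternative": -4.0, "unfolded": 0.0}
probs = boltzmann_probabilities(ensemble)
```

The probabilities sum to 1, and the structure with the lowest free energy receives the largest share of the population.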
17
Zuker's Energy Minimization Method (mFOLD)

An RNA sequence is written $R = r_1, r_2, r_3, \ldots, r_n$, where
$r_i$ is the i-th ribonucleotide and belongs to the
set {A, U, G, C}
A secondary structure of R is a set S of base
pairs, i.j, which satisfies
$1 \le i < j \le n$
$j - i > 4$ (a loop cannot contain fewer than 4
nucleotides)
If i.j and i'.j' are two base pairs (assume $i \le
i'$), then either
$i = i'$ and $j = j'$ (same base pair)
$i < j < i' < j'$ (i.j precedes i'.j') or
$i < i' < j' < j$ (i.j includes i'.j') (this
excludes pseudoknots, for which $i < i' < j < j'$)
If e(i,j) is the energy for the base pair i.j,
the total energy for R is
$E(S) = \sum_{i.j \in S} e(i,j)$
The objective is to minimize E(S).

18
Zuker's Energy Minimization Method (mFOLD)
Free Energy Parameters

An extensive database of free energies for the
following RNA units has been obtained (the so-called
Tinoco rules and Turner rules)
Single-strand stacking energy
Canonical (AU, GC) and non-canonical (GU)
base pairs in duplexes
Accurate free energy parameters are still lacking for
Loops
Mismatches (AA, CA, etc.)
Using these energy parameters, the current
version of mFOLD can predict 73% of
phylogenetically deduced secondary structures.

19
Dynamic Programming (mFOLD)

An Example of W(i,j)

A matrix W(i,j) is computed that is dependent on
the experimentally measured basepair energy
e(i,j)
Recursion begins with i = 1, j = n
If W(i+1, j) = W(i, j), then i is not paired. Set
i = i+1 and start the recursion again.
If W(i, j-1) = W(i, j), then j is not paired. Set
j = j-1 and start the recursion again.
If W(i, j) = W(i, k) + W(k+1, j), the fragment k+1..j
gets put on a stack and the fragment i..k is
analyzed by setting j = k and going back to the
beginning of the recursion.
If W(i, j) = e(i, j) + W(i+1, j-1), a base pair is
identified and is added to the list by setting
i = i+1 and j = j-1
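
The matrix fill and the traceback rules on this slide can be sketched as follows. This is a simplified illustration, not mFOLD itself: it uses a per-base-pair energy only (no stacking or loop terms), the energies in `pair_energy` are hypothetical integers rather than measured values, and indices are 0-based.

```python
def pair_energy(a, b):
    """Hypothetical base-pair energies (arbitrary units); None if no pair."""
    table = {"GC": -3, "CG": -3, "AU": -2, "UA": -2, "GU": -1, "UG": -1}
    return table.get(a + b)


def fill(seq, min_loop=4):
    """Fill W, where W[i][j] is the minimum energy of fragment i..j."""
    n = len(seq)
    W = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # enforce j - i > min_loop
        for i in range(n - span):
            j = i + span
            best = min(W[i + 1][j], W[i][j - 1])  # i or j unpaired
            e = pair_energy(seq[i], seq[j])
            if e is not None:                     # i pairs with j
                best = min(best, e + W[i + 1][j - 1])
            for k in range(i + 1, j):             # two independent fragments
                best = min(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W


def traceback(seq, W, min_loop=4):
    """Recover one minimum-energy set of base pairs from W."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        while j - i > min_loop:
            if W[i + 1][j] == W[i][j]:
                i += 1                            # i is not paired
            elif W[i][j - 1] == W[i][j]:
                j -= 1                            # j is not paired
            else:
                for k in range(i + 1, j):
                    if W[i][j] == W[i][k] + W[k + 1][j]:
                        stack.append((k + 1, j))  # analyse k+1..j later
                        j = k
                        break
                else:
                    # No split matches, so W[i][j] came from pairing i with j.
                    pairs.append((i, j))
                    i, j = i + 1, j - 1
    return sorted(pairs)
```

For the toy hairpin `GGGAAAAACCC`, `traceback` returns the three nested G-C pairs `(0, 10)`, `(1, 9)` and `(2, 8)`. Integer energies are used so that the equality tests in the traceback are exact.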

20
Suboptimal Folding (mFOLD)

For any sequence of N nucleotides, the expected
number of structures is greater than $1.8^N$
A sequence of 100 nucleotides has about $3 \times 10^{25}$
foldings. If a computer can calculate 1000
structures per second, it would take about $10^{15}$ years!
mFOLD generates suboptimal foldings whose free
energies fall within a certain range of values.
Many of these structures differ only in trivial
ways. These suboptimal foldings can still be
useful for designing experiments.
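
The arithmetic behind these estimates can be checked directly. The sketch below takes the growth factor 1.8 and the rate of 1000 structures per second from the slide; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def estimated_foldings(n, base=1.8):
    """Lower-bound estimate of the number of foldings for n nucleotides."""
    return base ** n


def years_to_enumerate(n, structures_per_second=1000.0):
    """Years needed to enumerate all foldings at the given rate."""
    seconds = estimated_foldings(n) / structures_per_second
    return seconds / SECONDS_PER_YEAR


foldings_100 = estimated_foldings(100)   # roughly 3e25
years_100 = years_to_enumerate(100)      # roughly 1e15
```

Exhaustive enumeration is therefore out of reach even for short sequences, which is why dynamic programming is used instead.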

21
A computer predicted folding of Bacillus subtilis
RNase P RNA
These three representations are equivalent.
22
Secondary Structure Prediction for Aligned RNA
Sequences

Both energy as well as RNA sequence covariation
can be combined to predict RNA secondary
structures
To quantify sequence covariation, let $f_i(X)$ be
the frequency of base X at aligned position i and
$f_{ij}(XY)$ be the frequency of finding X at i and Y
at j. The mutual information score is (Chiu &
Kolodziejczak and Gutell & Woese)
$M_{ij} = \sum_{X,Y} f_{ij}(XY) \log \frac{f_{ij}(XY)}{f_i(X) \, f_j(Y)}$
If, for instance, only GC and GU pairs occur at
positions i and j, then $M_{ij} = 0$, because position i is always G and so does not covary with j.
The total energy for the RNA is set to a linear
combination of the measured free energy plus the
covariance contribution
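
The mutual information score for two alignment columns can be computed as in the sketch below. It assumes the standard definition (the sum over observed base combinations of the joint frequency times the log of the joint frequency over the product of the single-column frequencies) and uses log base 2, which is a choice made for this example. Columns are passed as strings with one character per aligned sequence.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i = Counter(col_i)                 # counts of each base at position i
    f_j = Counter(col_j)                 # counts of each base at position j
    f_ij = Counter(zip(col_i, col_j))    # counts of each base combination
    score = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        score += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return score
```

Columns containing only G-C and G-U pairs, such as `"GGGG"` and `"CCUU"`, score 0, while perfectly covarying columns such as `"GACU"` and `"CUGA"` score 2 bits.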

23
Other Secondary Prediction Methods

Nussinov algorithm (historically important),
Hogeweg and Hesper (1984)
Vienna: http://www.tbi.univie.ac.at/ivo/RNA/
Uses the same recursive method in searching the
folding space
Adds the option of computing the population of
RNA secondary structures by the equilibrium
partition function
The specific heat of an RNA can be calculated by
numerical differentiation from the equilibrium
partition function
RNACAD: http://www.cse.ucsc.edu/research/compbio/ssurrna.html
An effort to improve multiple RNA sequence
alignment by taking into account both primary as
well as secondary structure information
Uses Stochastic Context-Free Grammars (SCFGs), an
extension of the hidden Markov model (HMM) method
Bundschuh, R., and Hwa, T. (1999) RNA secondary
structure formation: A solvable model of
heteropolymer folding. Physical Review Letters
83, 1479-1482.
This work treats RNA as a heteropolymer and uses a
simplified Go-like model to provide an exact
solution for the RNA transition between its native
and molten phases.

24
Running mFOLD

http://bioinfo.math.rpi.edu/mfold/rna/form1.cgi
Constraints can be entered:
Force bases i, i+1, ..., i+k-1 to be double stranded
by entering "F i 0 k" on 1 line in the
constraint box.
Force consecutive base pairs i.j, i+1.j-1,
..., i+k-1.j-k+1 by entering "F i j k" on 1
line in the constraint box.
Force bases i, i+1, ..., i+k-1 to be single stranded
by entering "P i 0 k" on 1 line in the
constraint box.
Prohibit the consecutive base pairs i.j, i+1.j-1,
..., i+k-1.j-k+1 by entering "P i j k" on 1
line in the constraint box.
Prohibit bases i to j from pairing with bases k
to l by entering "P i-j k-l" on 1 line in the
constraint box.

25
Running mFOLD: 5'-CUUGGAUGGGUGACCACCUGGG-3'
[Figure panels: no constraint entered; constraint "F 1 21 2" entered]
26
Predicting RNA 3D Structures

Currently available RNA 3D structure prediction
programs make use of the fact that a tertiary
structure is built upon preformed secondary
structures
So once a reliable secondary structure has been
predicted, it is possible to predict its 3D
structure
The chances of obtaining a valid 3D structure can
be increased by known spatial constraints among the
different secondary segments (e.g. cross-linking,
NMR results).
However, there are far fewer thermodynamic data on
3-D RNA structures, which makes 3-D structure
prediction challenging.

27
Mc-Sym

Mc-Sym uses a backtracking method to solve a
general problem in computer science called the
constraint satisfaction problem (CSP)
The backtracking algorithm organizes the search space
as a tree in which each node corresponds to the
application of an operator
At each application, if the partially folded RNA
structure is consistent with the RNA
conformational database, the next operator is
applied; otherwise the entire attached branch is
pruned and the algorithm backtracks to the
previous node.
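
The backtracking idea can be illustrated with a toy problem that has nothing to do with Mc-Sym's real operators or conformational database. In this sketch (all names hypothetical) each operator is a unit step on a 2D lattice, and a partial structure is consistent as long as the chain never visits the same lattice point twice.

```python
# Toy operators: unit moves on a square lattice (assumed for illustration).
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def positions(ops):
    """Lattice points visited when the operators are applied from the origin."""
    x, y = 0, 0
    points = [(0, 0)]
    for op in ops:
        dx, dy = MOVES[op]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def consistent(ops):
    """A partial chain is consistent if no lattice point is occupied twice."""
    points = positions(ops)
    return len(set(points)) == len(points)


def backtrack(n_steps, partial=None):
    """Depth-first search over the operator tree, pruning bad branches."""
    partial = [] if partial is None else partial
    if len(partial) == n_steps:
        yield "".join(partial)
        return
    for op in MOVES:
        partial.append(op)          # apply the next operator
        if consistent(partial):     # otherwise the whole branch is pruned
            yield from backtrack(n_steps, partial)
        partial.pop()               # backtrack to the previous node
```

`list(backtrack(3))` enumerates the 36 self-avoiding chains of three steps; branches such as `"NS"` are cut off as soon as the clash appears, so their descendants are never explored.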

28
Mc-Sym (Continued)

The selection of a spanning tree for a particular
RNA is left to the user, but it is suggested that
the nucleotides imposing the most constraints are
introduced first
Users also supply a particular Mc-Sym
conformation for each nucleotide. These
conformers are derived from currently available
3D databases

29
Mc-Sym (Continued)
Sample script
SEQUENCE 1 A r GAAUGCCUGCGAGCAUCCC
DECLARE
1 helixA 2 helixA
3 helixA 4 helixA
5 helixA 6 helixA
19 helixA

RELATIONS
18 helix 19
17 helix 18
16 helix 17
.
5 helix 6
4 helix 5
3 helix 4
2 helix 3
1 helix 2
BUILD
19 18 17 16 15 14
13 12
12 11 10 9 8 7 6
5
4 3 2 1
CONSTRAINTS

30
RNA-protein Interactions

There is currently no computational method that
can predict RNA-protein interaction
interfaces
Statistical methods have been applied to identify
structural features at the protein-RNA interface.
For instance, ENTANGLE finds that most atoms
contributed by a protein to recognizing an RNA
are from the main chain (C, O, N, H), not from side
chains! But much remains to be done
Electrostatic potential is of primary importance in
protein-RNA recognition because of the negatively
charged phosphate backbone. Efforts are being made to
quantify the electrostatic potential at the molecular
surface of a protein and an RNA in order to predict
the site of RNA interaction. This often provides
a good prediction, at least for the site on the
protein.

31
References

Predicting RNA secondary structures (good reviews)
1. Turner, D. H., and Sugimoto, N. (1988) RNA
structure prediction. Annu Rev Biophys Biophys
Chem 17, 167-92.
2. Zuker, M. (2000) Calculating nucleic acid
secondary structure. Curr Opin Struct Biol 10,
303-10.
Obtaining experimental thermodynamics parameters
3. Xia, T., SantaLucia, J., Jr., Burkard, M.
E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox,
C., and Turner, D. H. (1998) Thermodynamic
parameters for an expanded nearest-neighbor model
for formation of RNA duplexes with Watson-Crick
base pairs. Biochemistry 37, 14719-35.
4. Borer, P. N., Dengler, B., Tinoco, I., Jr.,
and Uhlenbeck, O. C. (1974) Stability of
ribonucleic acid double-stranded helices. J Mol
Biol 86, 843-53.
Thermodynamics theory for RNA structure
prediction
5. Bundschuh, R., and Hwa, T. (1999) RNA
secondary structure formation: A solvable model
of heteropolymer folding. Phys Rev Lett
83, 1479-82.
6. McCaskill, J. S. (1990) The equilibrium
partition function and base pair binding
probabilities for RNA secondary structure.
Biopolymers 29, 1105-19.