Title: A Bioinformatics Approach to Improving the Prediction of Programmed Ribosomal Frameshifting
Yen Lin Huang
Department of Computer Science, National Tsing-Hua University
08/27/2007, InCOB 2007
2Outline
- Introduction
- Our algorithm
- Experimental results and discussion
- Conclusion
3Introduction
- Programmed ribosomal frameshifting (PRF) is a recoding event in which the translating ribosome switches from the initial (zero) reading frame to the -1 or +1 reading frame at a specific position and then continues translation.
-1 PRF of SARS-CoV.
4Introduction
- Consequently, recoding by PRF leads to the expression of an alternative protein that differs from the one produced by standard translation.
5Example -1 PRF of SARS-CoV
- Genomic organization of the SARS-CoV.
- If there is no frameshifting, polyprotein (pp) 1a is translated from ORF1a; if frameshifting occurs, pp 1a/1b is translated from ORF1a and ORF1b.
6Significance of PRF
- Many viruses, as well as bacteria, have been found to utilize the PRF mechanism for increasing the diversity of gene expression.
- This event can also be found in a few eukaryotes.
- It has been reported that for viruses, even small changes in their frameshifting efficiencies can inhibit viral propagation.
- This implies that frameshifting sites in viruses may present a potential target for antiviral therapeutics.
7Signals of -1 PRF
- Two mRNA signals are critical for -1 PRF.
  - Slippery sequence
    - It is the place where the -1 PRF event occurs.
  - 3'-stimulatory RNA structure
    - It forces the ribosome to pause over the slippery site so that the ribosome has a chance to switch from the zero reading frame to the -1 reading frame.
(Figure: an mRNA showing the slippery sequence and the stimulatory RNA structure.)
8Model of -1 PRF
9Slippery sequence
- Usually, the slippery sequence is a hepta-nucleotide (7-mer) of the general form X XXY YYZ.
- The spaces in X XXY YYZ separate codons in the zero frame.
- X and Z are any nucleotide and Y is mostly A or U.
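The X XXY YYZ pattern can be sketched as a regular-expression scan. This is only an illustration, not the PRooF implementation: the function name `find_slippery_sites`, the strict restriction of Y to A or U, and the `frame_start` parameter are assumptions made here. A hit is kept only when its first base is the last base of a zero-frame codon, which is what the spacing in X XXY YYZ implies.

```python
import re

# X XXY YYZ: three identical bases (X), three identical bases (Y), then any base (Z).
# Assumption: Y is limited strictly to A or U, although the slide says "mostly".
# The lookahead lets overlapping candidate heptamers be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACGU]))")


def find_slippery_sites(seq, frame_start=0):
    """Return (position, heptamer) pairs aligned as X XXY YYZ in the zero frame.

    frame_start is the index of the first base of any zero-frame codon.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for match in SLIPPERY.finditer(rna):
        pos = match.start()
        # X must be the third base of a zero-frame codon.
        if (pos - frame_start) % 3 == 2:
            hits.append((pos, match.group(1)))
    return hits


if __name__ == "__main__":
    # Zero-frame codons: GCU UUA AAC GGG, so the heptamer U UUA AAC starts at index 2.
    print(find_slippery_sites("GCUUUAAACGGG"))
```

User-defined patterns could be supported by swapping in a different compiled expression; the frame check stays the same.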
10Stimulatory RNA structures
- In most cases, the RNA structure is an H-type
pseudoknot (or bulged helix), but in some cases,
it is a simple stem-loop.
11Other factors of -1 PRF
- The spacer between the slippery sequence and the RNA structure is also important for -1 PRF.
- The length of the spacer alters the location of the paused ribosome and hence influences its shifting probability.
- For some bacteria (such as E. coli), an internal SD-like (Shine-Dalgarno-like) sequence can often be found upstream of the -1 PRF site.
12Signals of +1 PRF
- +1 PRF events have been observed less often than -1 PRF events.
- Therefore, there is no widely accepted general model that describes +1 PRF.
- The best-known cellular genes with +1 PRF are prfB and oaz.
  - prfB encodes polypeptide chain release factor 2 (RF2) in E. coli.
  - oaz (ornithine decarboxylase antizyme) encodes antizyme 1 in mammals.
13Signals of +1 PRF
- There is no general form of the slippery sequence for +1 PRF.
  - Example: the slippery sequences in prfB genes are CUU URA C, and those in oaz genes are UUU UGA or YCC UGA, where R is A or G and Y is C or U.
- Not all +1 PRF sites have a downstream RNA structure that functions as the stimulator.
  - Example: the +1 PRF site in the bacterial prfB genes.
14Proteins produced by PRF
- Protein products are longer than those produced by standard translation.
- This kind of PRF event is observed frequently.
- It occurs near the end of the zero reading frame.
- The ribosome switches to the new reading frame and extends translation beyond the terminator of the zero reading frame.
15Proteins produced by PRF (cont.)
- Protein products are shorter than those produced by standard translation.
- Such PRF events have been observed less often so far.
- It takes place within the zero reading frame.
- The ribosome shifts into the new reading frame and terminates quickly, because it reaches a stop codon in the new reading frame near the slippery site.
17Previous results
- Based on the model described above, several computational approaches have been proposed for the prediction of PRFs.
  - Pattern recognition (Hammell et al., 1999; Moon et al., 2004)
  - Statistical analysis (Shah et al., 2002)
  - Machine learning (Bekaert et al., 2003)
  - Hidden Markov models (Bekaert et al., 2005; 2006)
Hammell, A. B. et al. (1999) Genome Res., 9, 417-427.
Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884-4892.
Shah, A. A. et al. (2002) Bioinformatics, 18, 1046-1053.
Bekaert, M. et al. (2003) Bioinformatics, 19, 327-335.
Bekaert, M. et al. (2005) Mol. Cell, 17, 61-68.
Bekaert, M. et al. (2006) Bioinformatics, 22, 2463-2465.
18Our approach and web server
- In this study, we improved the pattern recognition method for detecting PRF sites in a genomic sequence by using structural and functional bioinformatics.
- In addition, we have implemented this algorithm as a web server, called PRooF (Programmed Ribosomal Frameshifting), that is open to the public for online analysis.
  - http://bioalgorithm.life.nctu.edu.tw/PROOF
19Flowchart of our algorithm
20Step 1 Identification of ORFs
- All ORFs above a threshold (whose default is 100
nt) are identified from an input sequence.
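A minimal sketch of Step 1 follows. The slide gives only the length threshold (default 100 nt), so everything else here is an assumption: an ORF is taken to run from an ATG to the first in-frame stop codon, only the three forward frames are scanned, and the name `find_orfs` is hypothetical rather than part of PRooF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_len=100):
    """Return (start, end, frame) for each forward-strand ORF of at least min_len nt.

    Assumption: an ORF starts at ATG and ends at the first in-frame stop codon;
    end is exclusive and includes the stop codon.
    """
    dna = dna.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    example = "ATG" + "GCT" * 40 + "TAA"  # one 126-nt ORF in frame 0
    print(find_orfs(example))
```

A stop-to-stop definition of an ORF, or a scan of the reverse strand, would need a different loop; which definition PRooF uses is not stated on the slide.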
21Step 2 Detection of slippery sites
- For the PRFs with longer products
  - Find all pairs of partially overlapping ORFs.
  - Use pattern recognition to detect all possible slippery sites in the overlapping regions.
  - The slippery sequences conform to the default patterns or to user-defined patterns.
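The pairing of partially overlapping ORFs can be sketched as below. The representation of an ORF as a `(start, end, frame)` tuple with an exclusive end and `frame = start % 3`, and the function name `overlapping_pairs`, are assumptions for illustration. The shift direction follows from the frames: a downstream ORF whose frame is one less (mod 3) than the upstream one corresponds to a -1 shift, otherwise to a +1 shift.

```python
def overlapping_pairs(orfs):
    """Pair each upstream ORF with a partially overlapping downstream ORF.

    orfs: iterable of (start, end, frame), end exclusive, frame = start % 3.
    Returns (upstream, downstream, shift, overlap) tuples, where shift is -1 or +1
    and overlap is the (start, end) region to scan for slippery sites.
    """
    orfs = list(orfs)
    pairs = []
    for up in orfs:
        for down in orfs:
            up_start, up_end, up_frame = up
            down_start, down_end, down_frame = down
            partially_overlapping = up_start < down_start < up_end < down_end
            if partially_overlapping and up_frame != down_frame:
                # Downstream codons begin one base earlier (mod 3) for a -1 shift.
                shift = -1 if (up_frame - down_frame) % 3 == 1 else 1
                pairs.append((up, down, shift, (down_start, up_end)))
    return pairs


if __name__ == "__main__":
    print(overlapping_pairs([(0, 300, 0), (200, 500, 2), (600, 900, 1)]))
```

Only the overlap region returned for each pair would then be searched for slippery sequences.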
22Step 2 Detection of slippery sites
- For the PRFs with shorter products
  - We simply search each identified ORF for possible slippery sites that possess the required slippery sequences.
23Detection of slippery sites (cont.)
- If the input is a bacterial sequence, we further look for an internal SD-like sequence upstream of each slippery site.
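The upstream SD-like search might look like the sketch below. The slides do not say which motifs count as SD-like or how far upstream to look, so the motif set `SD_LIKE` (fragments of the AGGAGG consensus), the 20-nt default window, and the name `find_sd_like` are all assumptions introduced here.

```python
# Hypothetical motif set: the Shine-Dalgarno consensus AGGAGG and shorter fragments,
# ordered longest first so the most complete match wins.
SD_LIKE = ("AGGAGG", "GGAGG", "AGGAG", "GGAG", "AGGA", "GAGG")


def find_sd_like(seq, slip_pos, window=20, motifs=SD_LIKE):
    """Return (position, motif) of an SD-like motif upstream of slip_pos, or None.

    window is the number of nucleotides upstream of the slippery site to scan
    (an assumed default, not a PRooF parameter).
    """
    dna = seq.upper().replace("U", "T")
    lo = max(0, slip_pos - window)
    region = dna[lo:slip_pos]
    for motif in motifs:
        idx = region.rfind(motif)  # prefer the occurrence closest to the slippery site
        if idx != -1:
            return lo + idx, motif
    return None


if __name__ == "__main__":
    seq = "CCCC" + "AGGAGG" + "CCCCC" + "AAAAAAG"
    print(find_sd_like(seq, slip_pos=15))
```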
24Step 3 Verifying protein function
- For all candidate ORFs, the translated protein sequences are further verified by InterProScan to see whether they contain protein motifs/domains already registered in the InterPro database.
  - InterPro is an integrated database of protein families, domains and functional sites.
  - InterProScan is an InterPro tool that combines various protein signature recognition methods for the detection of motifs/domains.
25Step 3 Verifying protein function
- For the cases of longer products
  - Each of the two overlapping ORFs is translated into a protein sequence, which is then examined by InterProScan.
- For the cases of shorter products
  - The full-length ORF is cut into two fragments at the slippery site.
  - These two fragments are then translated into protein sequences and are further examined by InterProScan.
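For the shorter-product case, cutting an ORF at the slippery site and translating both fragments can be sketched as follows. The slides do not specify the exact cut position or reading frames, so this sketch assumes the cut is moved back to the nearest codon boundary and that both fragments are read in the zero frame; the names `translate` and `split_at_slippery_site` are hypothetical. The resulting protein strings are what would then be submitted to InterProScan, which is not called here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide sequence in frame 0; unknown codons become 'X'."""
    dna = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def split_at_slippery_site(orf, slip_offset):
    """Cut an ORF at the codon boundary at or before slip_offset and translate both parts."""
    cut = slip_offset - slip_offset % 3  # assumption: cut on a zero-frame codon boundary
    return translate(orf[:cut]), translate(orf[cut:])


if __name__ == "__main__":
    print(split_at_slippery_site("ATGGCTAAATTTGGGTAA", slip_offset=8))
```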
26Step 4 Predicting RNA structure
- We use a heuristic approach that we developed previously (Huang et al., 2005) to detect an H-type pseudoknot in the sequence fragment downstream of the slippery site of each PRF candidate.
C.-H. Huang et al. (2005), A heuristic approach for detecting RNA H-type pseudoknots, Bioinformatics, Vol. 21, pp. 3501-3508.
27Step 4 Predicting RNA structure
- If no stable H-type pseudoknot is found, we continue to use RNAMotif to search for all possible bulged helixes and choose the most stable one.
  - RNAMotif is an RNA structural motif search tool that can find the fragments with the possibility of forming a given structure.
28Step 4 Predicting RNA structure
- If neither a stable H-type pseudoknot nor a
bulged helix is found, RNAMotif is used to search
for simple stem-loops.
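The order of the three structure searches in Step 4 amounts to a fallback cascade, sketched below. The predictor callables are placeholders supplied by the caller; this sketch does not invoke the pseudoknot heuristic of Huang et al. or RNAMotif, and the name `predict_stimulator` is hypothetical.

```python
def predict_stimulator(fragment, predictors):
    """Try structure predictors in priority order and return the first hit.

    predictors: list of (kind, callable) pairs; each callable takes the fragment
    downstream of the slippery site and returns a structure, or None if no
    stable structure of that kind is found.
    """
    for kind, predict in predictors:
        structure = predict(fragment)
        if structure is not None:
            return kind, structure
    return None  # no stimulatory structure of any kind was found


if __name__ == "__main__":
    # Stand-in predictors used only to show the control flow.
    cascade = [
        ("H-type pseudoknot", lambda s: None),
        ("bulged helix", lambda s: "placeholder-structure"),
        ("simple stem-loop", lambda s: "placeholder-structure"),
    ]
    print(predict_stimulator("GGGCCCAAAGGGCCC", cascade))
```

Keeping the priority order in a list makes it straightforward to skip or reorder a structure type without touching the loop.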
30Our web server PRooF
- Based on the algorithm described above, we have implemented a web server called PRooF for online analysis.
  - http://bioalgorithm.life.nctu.edu.tw/PROOF
- PRooF was tested with a number of sequences with one or two known PRF sites from different species.
- The experimental results were compared with those obtained by FSFinder2, which was developed by Moon et al. based on pattern recognition.
  - Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884-4892.
  - Byun, Y. et al. (2006) LNCS, 3991, 284-291.
  - Song, J.J. et al. (2007) Comput. Biol. Chem., 31, 298-302.
31Testing data sets
- The tested sequences in our experiments were taken from PseudoBase and RECODE.
- PseudoBase collects RNA pseudoknots, some of which are thought to function as PRF stimulators.
  - http://biology.leidenuniv.nl/batenburg/PKB.html
- RECODE contains the translational recoding events of PRFs in various biological species.
  - http://recode.genetics.utah.edu/
- All the tests of PRooF and FSFinder2 were run with default parameters, unless otherwise specified.
32Testing data sets of -1 PRF
33Testing data sets of +1 PRF
34Sensitivity and specificity
- Sensitivity (Sen) = $100 \times TP / (TP + FN)$
  - TP: number of correctly predicted PRF sites
  - FN: number of known PRF sites that were not predicted
- Specificity (Spe) = $100 \times TN / (TN + FP)$
  - TN: number of predicted non-PRF sites that possess a required slippery sequence but are not annotated as PRF sites in the database
  - FP: number of incorrectly predicted PRF sites
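The two measures, sensitivity as 100 × TP / (TP + FN) and specificity as 100 × TN / (TN + FP), can be computed directly from the four counts. The function names and the choice to return NaN when a denominator is zero are assumptions of this sketch; the counts in the usage lines are made-up numbers for illustration, not results from the experiments.

```python
def sensitivity(tp, fn):
    """Percentage of known PRF sites that were predicted: 100 * TP / (TP + FN)."""
    total = tp + fn
    return 100.0 * tp / total if total else float("nan")


def specificity(tn, fp):
    """Percentage of non-PRF candidate sites rejected: 100 * TN / (TN + FP)."""
    total = tn + fp
    return 100.0 * tn / total if total else float("nan")


if __name__ == "__main__":
    # Illustrative counts only.
    print(sensitivity(tp=8, fn=2))
    print(specificity(tn=90, fp=10))
```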
35Average sensitivity and specificity
Prediction of PRF Average Sensitivity Average Specificity
PRooF
FSFinder2
- Indeed, PRooF greatly improves detection sensitivity compared with FSFinder2.
- For the details, please refer to our paper.
36Reduction of false positives
- To reduce false positives, FSFinder2 considers only the two pairs of partially overlapping ORFs whose zero reading frames are the two longest.
- Moon et al. (2004) reported that these two pairs had the highest probability of containing -1 and +1 PRF sites.
- However, there currently seems to be no biological evidence to support their observation.
37Reduction of false positives
- In contrast, we used InterProScan to screen out the partially overlapping ORFs whose protein sequences contain no functional motifs/domains.
- As shown in our experiments, this functional bioinformatics approach is very useful for reducing the number of false positives.
38Predicted RNA structures
- Most of the RNA structures predicted by PRooF are H-type pseudoknots and bulged helixes, but many RNA structures identified by FSFinder2 are just simple stem-loops.
- Both H-type pseudoknots and bulged helixes are believed to promote the efficiency of -1 PRFs and some +1 PRFs more effectively.
- The reason is that they share a similar bent conformation and are structurally more stable than simple stem-loops.
39Predicted RNA structures
- The stimulatory RNA structure for the -1 PRF of HIV-1 was first thought to be a simple stem-loop, but it was later shown experimentally to be a bulged helix.
  - Gaudin C. et al. (2005) J Mol Biol, 349, 1024-1035
- The RNA structure predicted by PRooF for the -1 PRF of HIV-1 is indeed a bulged helix, but the one predicted by FSFinder2 is just a simple stem-loop.
- It should be worthwhile to determine experimentally the RNA structures in other similar cases, where the structures predicted by PRooF are H-type pseudoknots or bulged helixes but those predicted by FSFinder2 or reported in the literature are just simple stem-loops.
41Conclusion
- We improved the pattern recognition approach to automatically detecting PRF sites in a given genomic sequence by using both structural bioinformatics and functional bioinformatics.
- Based on this approach, we have developed a web server, PRooF, that is open to the public for online analysis.
  - http://bioalgorithm.life.nctu.edu.tw/PROOF
42Conclusion (cont.)
- In our experiments, PRooF greatly improved sensitivity compared with FSFinder2.
- Most of the RNA structures predicted by PRooF are H-type pseudoknots and bulged helixes, whereas many of those predicted by FSFinder2 are simple stem-loops.
- PRooF was implemented flexibly: the user can modify all the default parameters so that some exceptional PRF sites can still be detected.
43Acknowledgement
- Prof. Chin Lung Lu
  - Institute of Bioinformatics, Department of Biological Science and Technology, National Chiao Tung University
- Mr. Chia-Jung Wu
  - Institute of Bioinformatics, National Chiao Tung University
- Prof. Hien-Tai Chiu
  - Department of Biological Science and Technology, National Chiao Tung University
- Prof. Chuan Yi Tang
  - Department of Computer Science, National Tsing-Hua University
44- Thank you for your attention