A Bioinformatics Approach to Improving the Prediction of Programmed Ribosomal Frameshifting

1
A Bioinformatics Approach to Improving the
Prediction of Programmed Ribosomal Frameshifting
Yen Lin Huang Department of Computer Science,
National Tsing-Hua University
08/27/2007 InCOB 2007
2
Outline

Introduction
Our algorithm
Experimental results and discussion
Conclusion

3
Introduction

Programmed ribosomal frameshifting (PRF) is a
recoding event in which a translating ribosome
switches from the initial (zero) reading frame to
the -1 or +1 reading frame at a specific position
and then continues translation.

-1 PRF of SARS-CoV.
4
Introduction

Consequently, PRF leads to the expression of an
alternative protein that differs from the one
produced by standard translation.

5
Example: -1 PRF of SARS-CoV

Genomic organization of SARS-CoV.
Without frameshifting, polyprotein (pp) 1a is
translated from ORF1a; with frameshifting,
pp 1a/1b is translated from ORF1a and ORF1b.

6
Significance of PRF

Many viruses, as well as bacteria, have been
found to utilize the PRF mechanism for increasing
the diversity of gene expression.
This event can also be found in a few eukaryotes.
It has been reported that for viruses, even small
changes in their frameshifting efficiencies can
inhibit viral propagation.
This implies that frameshifting sites in viruses
may present a potential target for antiviral
therapeutics.

7
Signals of -1 PRF

Two mRNA signals are critical for -1 PRF.
Slippery sequence
The site where the -1 PRF event occurs.
3' stimulatory RNA structure
It forces the ribosome to pause over the slippery
site, which gives the ribosome a chance to
switch from the zero reading frame to the -1
reading frame.

[Figure labels: stimulatory RNA structure, slippery sequence]
8
Model of -1 PRF
9
Slippery sequence

Usually, the slippery sequence is a
heptanucleotide (7-mer) of the general form
X XXY YYZ.
The spaces in X XXY YYZ separate codons in the
zero frame.
X and Z can be any nucleotide, and Y is mostly A
or U.

10
Stimulatory RNA structures

In most cases, the RNA structure is an H-type
pseudoknot (or bulged helix), but in some cases,
it is a simple stem-loop.

11
Other factors of -1 PRF

The spacer between the slippery sequence and the
RNA structure is also important for -1 PRF.
The length of spacer alters the location of the
paused ribosome and hence influences its shifting
probability.
For some bacteria (such as E. coli), an internal
SD-like (Shine-Dalgarno-like) sequence often can
be found upstream of the -1 PRF site.

12
Signals of +1 PRF

+1 PRF events have been observed less often than
-1 PRF events.
Therefore, no general model is widely accepted
to describe +1 PRF.
The best-known cellular genes with +1 PRF are the
prfB and oaz genes.
prfB encodes polypeptide chain release factor 2
(RF2) in E. coli.
oaz (ornithine decarboxylase antizyme) encodes
antizyme 1 in mammals.

13
Signals of +1 PRF

There is no general form of the slippery sequence
for +1 PRF.
Example: the slippery sequences in prfB genes are
CUU URA C, and those in oaz genes are UUU UGA or
YCC UGA, where R is A or G, and Y is C or U.
Not all +1 PRF sites have a downstream RNA
structure that functions as the stimulator.
Example: the +1 PRF site in the bacterial prfB
genes.
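
Patterns written with ambiguity codes, such as CUU URA C or YCC UGA, can be expanded into regular expressions. The sketch below covers only the codes mentioned on this slide (R, Y) plus N. The names `pattern_to_regex` and `find_pattern_sites` are hypothetical, and the code does not check the reading frame.

```python
import re

# Ambiguity codes used in the patterns above, plus N for "any base".
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}

PLUS_ONE_PATTERNS = ("CUU URA C", "UUU UGA", "YCC UGA")


def pattern_to_regex(pattern):
    """Compile a spaced pattern such as 'YCC UGA' (RNA alphabet only)."""
    body = "".join(IUPAC[code] for code in pattern.replace(" ", "").upper())
    return re.compile("(?=(" + body + "))")  # lookahead keeps overlaps


def find_pattern_sites(seq, patterns=PLUS_ONE_PATTERNS):
    """Return sorted (position, pattern, matched_text) for every pattern hit."""
    rna = seq.upper().replace("T", "U")
    hits = []
    for pattern in patterns:
        for match in pattern_to_regex(pattern).finditer(rna):
            hits.append((match.start(), pattern, match.group(1)))
    return sorted(hits)
```

The same helper could serve user-defined patterns, since a new pattern only needs to be added to the tuple passed as `patterns`.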

14
Proteins produced by PRF

Protein products are longer than those of
standard translation.
This kind of PRF event is observed frequently.
It occurs near the end of the zero reading frame.
The ribosome switches to the new reading frame
and continues translating beyond the terminator
of the zero reading frame.

15
Proteins produced by PRF (cont.)

Protein products are shorter than those of
standard translation.
This kind of PRF has been observed less often so
far.
It takes place within the zero reading frame.
The ribosome slips backwards and then terminates
quickly, because it reaches a stop codon in the
new reading frame near the slippery site.

16
Outline

Introduction
Our algorithm
Experimental results and discussion
Conclusion

17
Previous results

Based on the model described above, several
computational approaches have been proposed for
prediction of PRFs.
Pattern recognition (Hammell et al., 1999; Moon
et al., 2004)
Statistical analysis (Shah et al., 2002)
Machine learning (Bekaert et al., 2003)
Hidden Markov models (Bekaert et al., 2005, 2006)

Hammell, A. B. et al. (1999) Genome Res., 9, 417-427.
Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884-4892.
Shah, A. A. et al. (2002) Bioinformatics, 18, 1046-1053.
Bekaert, M. et al. (2003) Bioinformatics, 19, 327-335.
Bekaert, M. et al. (2005) Mol. Cell, 17, 61-68.
Bekaert, M. et al. (2006) Bioinformatics, 22, 2463-2465.
18
Our approach and web server

In this study, we improved the pattern
recognition method for detecting PRF sites in a
genomic sequence by using structural and
functional bioinformatics.
In addition, we have implemented this algorithm
as a web server, called PRooF (Programmed
Ribosomal Frameshifting), that is open to the
public for online analysis.
http://bioalgorithm.life.nctu.edu.tw/PROOF

19
Flowchart of our algorithm
20
Step 1: Identification of ORFs

All ORFs longer than a threshold (default: 100
nt) are identified in the input sequence.
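
A minimal sketch of ORF identification with a length threshold follows. The slide does not say how an ORF is delimited, so this sketch assumes an ORF runs from an ATG to the next in-frame stop codon on the forward strand. The function name `find_orfs` and the half-open coordinates are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=100):
    """Return (start, end, frame) for forward-strand ORFs of at least min_len nt.

    Assumption for this sketch: an ORF starts at the first ATG after the
    previous in-frame stop and ends with the next in-frame stop codon
    (end is exclusive). ORFs that run off the end of the sequence are ignored.
    """
    dna = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs
```

Raising or lowering `min_len` trades missed short ORFs against a larger candidate set for the later steps.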

21
Step 2: Detection of slippery sites

For PRFs with longer products:
Find all pairs of partially overlapping ORFs.
Use pattern recognition to detect all
possible slippery sites in the overlapping
regions.
The slippery sequences must conform to the
default patterns or to user-defined patterns.
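
One way to enumerate partially overlapping ORF pairs is sketched below. It assumes ORFs are given as half-open (start, end) coordinates on the same strand and that the downstream ORF must lie in the -1 or +1 frame relative to the upstream one. The function name `overlapping_pairs` and this coordinate convention are hypothetical.

```python
def overlapping_pairs(orfs, shift=-1):
    """Return (upstream, downstream, overlap) for partially overlapping ORFs.

    orfs:  list of (start, end) half-open coordinates on one strand.
    shift: -1 or +1, the frame of the downstream ORF relative to the
           upstream (zero-frame) ORF.
    """
    pairs = []
    for up in orfs:
        for down in orfs:
            # Partial overlap: downstream starts inside upstream and ends beyond it.
            partially_overlaps = up[0] < down[0] < up[1] < down[1]
            # A -1 shift moves codon starts back by one base (offset 2 mod 3).
            in_shifted_frame = (down[0] - up[0]) % 3 == shift % 3
            if partially_overlaps and in_shifted_frame:
                pairs.append((up, down, (down[0], up[1])))
    return pairs
```

The returned overlap interval is the region that would then be scanned for slippery sequences.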

22
Step 2: Detection of slippery sites

For PRFs with shorter products:
We simply search each identified ORF for
possible slippery sites that contain the required
slippery sequences.

23
Detection of slippery sites (cont.)

If the input is a bacterial sequence, we further
look for an internal SD-like sequence upstream
of each slippery site.
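
The search for an SD-like sequence might look like the sketch below. The slide gives no matching rule, so everything here is an assumption: the Shine-Dalgarno consensus AGGAGG is used as the reference, any fragment of it with at least `min_len` bases counts as SD-like, and only the `window` bases upstream of the slippery site are searched. The function name `find_sd_like` and both default values are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus, used as reference here


def find_sd_like(seq, slip_pos, window=20, min_len=4):
    """Return (position, motif) of the longest SD-like match upstream, or None.

    slip_pos is the 0-based start of the slippery sequence. window and
    min_len are illustrative defaults, not values taken from the slides.
    """
    rna = seq.upper().replace("T", "U")
    upstream_start = max(0, slip_pos - window)
    upstream = rna[upstream_start:slip_pos]
    # Try the longest fragments of the consensus first.
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for offset in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[offset:offset + length]
            idx = upstream.rfind(motif)  # occurrence closest to the site
            if idx != -1:
                return upstream_start + idx, motif
    return None
```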

24
Step 3: Verifying protein function

For all candidate ORFs, the translated protein
sequences are further checked with InterProScan
to see whether they contain protein
motifs/domains already registered in the InterPro
database.
InterPro is an integrated database of protein
families, domains and functional sites.
InterProScan is a tool from InterPro that combines
various protein signature recognition methods for
the detection of motifs/domains.

25
Step 3: Verifying protein function

For longer products:
Each of the two overlapping ORFs is translated
into a protein sequence, which is then examined
by InterProScan.
For shorter products:
The full-length ORF is cut into two fragments at
the slippery site.
These two fragments are then translated into
protein sequences and examined by
InterProScan.
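
The cut-and-translate step for shorter products can be sketched as follows. The slide does not state exactly where the cut falls or which frame is used for each fragment. This sketch assumes the cut is rounded down to a zero-frame codon boundary and that both fragments are translated in the zero frame with the standard genetic code. The function names are hypothetical, and the InterProScan call itself is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def split_at_slippery_site(orf, slip_pos):
    """Cut an ORF at the slippery site and translate both fragments.

    slip_pos is the 0-based start of the slippery sequence within the ORF.
    Assumption: the cut is rounded down to a zero-frame codon boundary.
    """
    dna = orf.upper().replace("U", "T")
    cut = slip_pos - slip_pos % 3
    return translate(dna[:cut]), translate(dna[cut:])
```

Each returned protein fragment would then be submitted to the domain search separately.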

26
Step 4: Predicting RNA structure

We use a heuristic approach that we developed
previously (Huang et al., 2005) to detect an
H-type pseudoknot in the sequence fragment
downstream of the slippery site of each PRF
candidate.

C.-H. Huang et al. (2005), A heuristic approach
for detecting RNA H-type pseudoknots,
Bioinformatics, Vol. 21, pp. 3501-3508.
27
Step 4: Predicting RNA structure

If no stable H-type pseudoknot is found, we
continue to use RNAMotif to search for all
possible bulged helixes and choose the most
stable one.
RNAMotif is an RNA structural motif search tool
that can find the fragments with the possibility
of forming a given structure.

28
Step 4: Predicting RNA structure

If neither a stable H-type pseudoknot nor a
bulged helix is found, RNAMotif is used to search
for simple stem-loops.
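
The fallback order in Step 4 (H-type pseudoknot, then bulged helix, then simple stem-loop) can be sketched as a cascade. The predictors are passed in as plain functions, so nothing here calls the pseudoknot heuristic or RNAMotif. The candidate format (structure, free energy) and the `max_energy` cutoff that stands in for "stable" are hypothetical.

```python
def pick_stimulator(fragment, predictors, max_energy=0.0):
    """Return (label, best_candidate) from the first predictor with a stable hit.

    predictors: ordered list of (label, function) pairs; each function takes
                the downstream fragment and returns a list of
                (structure, free_energy) candidates.
    max_energy: hypothetical stability cutoff; only candidates with a lower
                free energy count as stable.
    """
    for label, predict in predictors:
        stable = [c for c in predict(fragment) if c[1] < max_energy]
        if stable:
            # Within one structure class, keep the most stable candidate.
            return label, min(stable, key=lambda c: c[1])
    return None
```

In use, the list would hold three entries in the order pseudoknot, bulged helix, stem-loop, so that a simple stem-loop is reported only when neither of the other two classes yields a stable structure.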

29
Outline

Introduction
Our algorithm
Experimental results and discussion
Conclusion

30
Our web server PRooF

Based on the algorithm described above, we
have implemented a web server called PRooF for
online analysis.
http://bioalgorithm.life.nctu.edu.tw/PROOF
PRooF was tested on a number of sequences from
different species, each with one or two known PRF
sites.
The experimental results were compared with those
obtained by FSFinder2, which was developed by
Moon et al. based on pattern recognition.
Moon, S. et al. (2004) Nucleic Acids Res., 32,
4884-4892.
Byun, Y. et al. (2006) LNCS, 3991, 284-291.
Song, J.J. et al. (2007) Comput. Biol. Chem., 31,
298-302.

31
Testing data sets

The sequences tested in our experiments were
taken from PseudoBase and RECODE.
PseudoBase collects RNA pseudoknots, some of
which are thought to function as PRF
stimulators.
http://biology.leidenuniv.nl/batenburg/PKB.html
RECODE contains the translational recoding events
of PRFs in various biological species.
http://recode.genetics.utah.edu/
All the tests of PRooF and FSFinder2 were run
with default parameters, unless otherwise
specified.

32
Testing data sets of -1 PRF
33
Testing data sets of +1 PRF
34
Sensitivity and specificity

Sensitivity (Sen) = 100 × TP / (TP + FN)
TP: number of correctly predicted PRF sites
FN: number of known PRF sites that were not
predicted
Specificity (Spe) = 100 × TN / (TN + FP)
TN: number of predicted non-PRF sites that
possess a required slippery sequence but are not
annotated as PRF sites in the database
FP: number of incorrectly predicted PRF sites
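
The two measures, Sen = 100 × TP / (TP + FN) and Spe = 100 × TN / (TN + FP), are simple to compute. The function names below are hypothetical, and the counts in the usage note are made-up illustrations, not results from the experiments.

```python
def sensitivity(tp, fn):
    """Sen = 100 * TP / (TP + FN), in percent."""
    if tp + fn == 0:
        raise ValueError("no known PRF sites: sensitivity is undefined")
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """Spe = 100 * TN / (TN + FP), in percent."""
    if tn + fp == 0:
        raise ValueError("no non-PRF candidates: specificity is undefined")
    return 100.0 * tn / (tn + fp)
```

With illustrative counts of 9 correctly predicted sites and 1 missed site, `sensitivity(9, 1)` gives 90.0; with 3 true negatives and 1 false positive, `specificity(3, 1)` gives 75.0.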

35
Average sensitivity and specificity
Prediction of PRF Average Sensitivity Average Specificity
PRooF
FSFinder2

Indeed, our PRooF greatly improves detection
sensitivity, when compared with FSFinder2.
For the details, please refer to our paper.

36
Reduction of false positives

To reduce false positives, FSFinder2 considers
only the two pairs of partially overlapping ORFs
whose zero reading frames are the two longest.
Moon et al. (2004) reported that these two pairs
had the highest probability of containing -1 and
+1 PRF sites.
However, there currently seems to be no
biological evidence to support their observation.

37
Reduction of false positives

In contrast, we used InterProScan to
screen out the partially overlapping ORFs whose
protein sequences contain no functional
motifs/domains.
As shown in our experiments, this functional
bioinformatics approach is very useful for
reducing the number of false positives.

38
Predicted RNA structures

Most of the RNA structures predicted by PRooF are
H-type pseudoknots and bulged helixes, but many
RNA structures identified by FSFinder2 are just
simple stem-loops.
Both H-type pseudoknots and bulged helixes are
believed to be more effective at promoting the
efficiency of -1 PRFs and some +1 PRFs.
The reason is that they share a similar bent
conformation and are structurally more
stable than simple stem-loops.

39
Predicted RNA structures

The stimulatory structure for the -1 PRF of HIV-1
was first thought to be a simple stem-loop, but
it was later shown experimentally to be a bulged
helix.
Gaudin C. et al. (2005) J Mol Biol, 349,
1024-1035
The RNA structure predicted by PRooF for the -1
PRF of HIV-1 is indeed a bulged helix, but the
one predicted by FSFinder2 is just a simple
stem-loop.
It would be worthwhile to determine
experimentally the RNA structures in other
similar cases, where the structures predicted by
PRooF are H-type pseudoknots or bulged helixes
but those predicted by FSFinder2 or reported in
the literature are just simple stem-loops.

40
Outline

Introduction
Our algorithm
Experimental results and discussion
Conclusion

41
Conclusion

We improved the pattern recognition approach for
automatically detecting PRF sites in a given
genomic sequence by using both structural
bioinformatics and functional bioinformatics.
Based on this approach, we have developed a web
server, PRooF, that is open to the public for
online analysis.
http://bioalgorithm.life.nctu.edu.tw/PROOF

42
Conclusion (cont.)

In our experiments, the test results showed
that PRooF greatly improves sensitivity compared
with FSFinder2.
Most of the RNA structures predicted by PRooF are
H-type pseudoknots and bulged helixes, whereas
many of those predicted by FSFinder2 are simple
stem-loops.
PRooF was implemented flexibly: it allows the
user to modify all the default parameters, so
that some exceptional PRF sites can still be
detected.

43
Acknowledgement

Prof. Chin Lung Lu
Institute of Bioinformatics Department of
Biological Science and Technology, National Chiao
Tung University
Mr. Chia-Jung Wu
Institute of Bioinformatics, National Chiao Tung
University
Prof. Hien-Tai Chiu
Department of Biological Science and Technology,
National Chiao Tung University
Prof. Chuan Yi Tang
Department of Computer Science, National
Tsing-Hua University