A Bioinformatics Approach to Improving the Prediction of Programmed Ribosomal Frameshifting

1
A Bioinformatics Approach to Improving the
Prediction of Programmed Ribosomal Frameshifting
Yen Lin Huang Department of Computer Science,
National Tsing-Hua University
08/27/2007 InCOB 2007
2
Outline
  • Introduction
  • Our algorithm
  • Experimental results and discussion
  • Conclusion

3
Introduction
  • Programmed ribosomal frameshifting (PRF) is a
    recoding event in which a translating ribosome
    switches from the initial (zero) reading frame to
    the -1 or +1 reading frame at a specific position
    and then continues its translation.
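
As a rough illustration of the definition above, the following Python sketch shows how the codons being read change after a -1 or +1 shift. The function names, the example sequence, and the shift position are made up for illustration and do not come from the presentation.

```python
def read_codons(rna, start=0):
    """Split rna into complete codons, starting at index `start`."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def codons_with_shift(rna, shift_at, shift):
    """Read zero-frame codons up to index `shift_at` (a codon boundary),
    then resume reading from `shift_at + shift` (shift is -1 or +1)."""
    before = read_codons(rna[:shift_at])
    after = read_codons(rna, shift_at + shift)
    return before + after


rna = "AUGUUUAAACGGUAG"                 # hypothetical sequence
print(read_codons(rna))                 # zero frame
print(codons_with_shift(rna, 9, -1))    # -1 frame after the third codon
print(codons_with_shift(rna, 9, +1))    # +1 frame after the third codon
```

In the -1 case the last nucleotide of the third codon is read a second time; in the +1 case one nucleotide is skipped.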

-1 PRF of SARS-CoV.
4
Introduction
  • Consequently, the recoding of PRF leads to an
    expression of an alternative protein, which is
    different from that produced by standard
    translation.

5
Example: -1 PRF of SARS-CoV
  • Genomic organization of the SARS-CoV.
  • If there is no frameshifting, polyprotein (pp) 1a
    is translated from ORF1a; if there is
    frameshifting, pp 1a/1b is translated from ORF1a
    and ORF1b.

6
Significance of PRF
  • Many viruses, as well as bacteria, have been
    found to utilize the PRF mechanism for increasing
    the diversity of gene expression.
  • This event can also be found in a few eukaryotes.
  • It has been reported that for viruses, even small
    changes in their frameshifting efficiencies can
    inhibit viral propagation.
  • This implies that frameshifting sites in viruses
    may present a potential target for antiviral
    therapeutics.

7
Signals of -1 PRF
  • Two mRNA signals are critical for -1 PRF.
  • Slippery sequence
  • It is the place where the -1 PRF event occurs.
  • 3' stimulatory RNA structure
  • It forces the ribosome to pause over the slippery
    site so that the ribosome has a chance to
    switch from the zero reading frame to the -1
    reading frame.

(Figure: slippery sequence and stimulatory RNA structure.)
8
Model of -1 PRF
9
Slippery sequence
  • Usually, the slippery sequence is a
    hepta-nucleotide (7-mer) of the general form X
    XXY YYZ.
  • The spaces in X XXY YYZ separate codons in the
    zero frame.
  • X and Z can be any nucleotide, and Y is mostly A or U.
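
The X XXY YYZ form can be expressed as a regular expression. The sketch below is only an illustration of the pattern described on this slide: it restricts Y strictly to A or U (the slide says "mostly"), and the function name and the optional `orf_start` frame check are assumptions, not the PRooF implementation.

```python
import re

# X XXY YYZ: three identical X, three identical Y (A or U here), any Z.
# The lookahead allows overlapping candidates to be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACGU]))")


def find_slippery_sites(seq, orf_start=None):
    """Return (position, heptamer) for every X XXY YYZ match.

    If orf_start is given, keep only matches whose first X is the
    third base of a zero-frame codon, as the spacing in X XXY YYZ implies.
    """
    rna = seq.upper().replace("T", "U")
    sites = []
    for m in SLIPPERY.finditer(rna):
        pos = m.start()
        if orf_start is None or (pos - orf_start) % 3 == 2:
            sites.append((pos, m.group(1)))
    return sites


print(find_slippery_sites("GGCTTTAAACGG"))
```

Passing `orf_start` discards heptamers that match the letters but sit in the wrong position relative to the zero-frame codons.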

10
Stimulatory RNA structures
  • In most cases, the RNA structure is an H-type
    pseudoknot (or bulged helix), but in some cases,
    it is a simple stem-loop.

11
Other factors of -1 PRF
  • The spacer between the slippery sequence and the
    RNA structure is also important for -1 PRF.
  • The length of spacer alters the location of the
    paused ribosome and hence influences its shifting
    probability.
  • For some bacteria (such as E. coli), an internal
    SD-like (Shine-Dalgarno-like) sequence often can
    be found upstream of the -1 PRF site.

12
Signals of +1 PRF
  • It has been observed that +1 PRF events occur
    less often than -1 PRF events.
  • Therefore, there is no widely accepted general
    model that describes +1 PRF.
  • The best-known cellular genes with +1 PRF are
    the prfB and oaz genes.
  • prfB encodes polypeptide chain release factor 2
    (RF2) in E. coli.
  • oaz (ornithine decarboxylase antizyme) encodes
    antizyme 1 in mammals.

13
Signals of +1 PRF
  • There is no general form of the slippery sequence
    for +1 PRF.
  • Example: the slippery sequences in prfB genes are
    CUU URA C and those in oaz genes are UUU UGA or
    YCC UGA, where R is A or G, and Y is C or U.
  • Not all +1 PRF sites have a downstream RNA
    structure to function as the stimulator.
  • Example: the +1 PRF site in the bacterial prfB genes.
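
Because +1 slippery sequences are given as gene-specific patterns with ambiguity codes, one way to search for them is to expand R and Y into character classes. The following Python sketch assumes only the three patterns listed above; the dictionary layout and function names are hypothetical.

```python
import re

# Ambiguity codes used on the slide: R is A or G, Y is C or U.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "Y": "[CU]"}

# Patterns quoted on the slide, keyed by gene name.
PLUS1_PATTERNS = {"prfB": ["CUU URA C"], "oaz": ["UUU UGA", "YCC UGA"]}


def to_regex(pattern):
    """Turn a spaced pattern such as 'CUU URA C' into a lookahead regex."""
    body = "".join(IUPAC[c] for c in pattern.replace(" ", ""))
    return re.compile("(?=(" + body + "))")


def find_plus1_sites(seq):
    """Return (gene, position, matched sequence) for every pattern hit."""
    rna = seq.upper().replace("T", "U")
    hits = []
    for gene, patterns in PLUS1_PATTERNS.items():
        for pattern in patterns:
            for m in to_regex(pattern).finditer(rna):
                hits.append((gene, m.start(), m.group(1)))
    return hits


print(find_plus1_sites("AAACTTTGACGG"))
```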

14
Proteins produced by PRF
  • Protein products are longer than those by
    standard translation.
  • This kind of PRF event is observed frequently.
  • It occurs near the end of the zero reading frame.
  • The ribosome switches to translate the new
    reading frame by extending beyond the terminator
    of the zero reading frame.

15
Proteins produced by PRF (cont.)
  • Protein products are shorter than those by
    standard translation.
  • Such PRF events are observed less often at present.
  • It takes place within the zero reading frame.
  • The ribosome then slips backwards and terminates
    quickly, because it reaches a stop codon in the
    new reading frame near the slippery site.

16
Outline
  • Introduction
  • Our algorithm
  • Experimental results and discussion
  • Conclusion

17
Previous results
  • Based on the model described above, several
    computational approaches have been proposed for
    prediction of PRFs.
  • Pattern recognition (Hammell et al., 1999; Moon
    et al., 2004)
  • Statistical analysis (Shah et al., 2002)
  • Machine learning (Bekaert et al., 2003)
  • Hidden Markov models (Bekaert et al., 2005, 2006)

Hammell, A. B. et al. (1999) Genome Res., 9, 417-427.
Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884-4892.
Shah, A. A. et al. (2002) Bioinformatics, 18, 1046-1053.
Bekaert, M. et al. (2003) Bioinformatics, 19, 327-335.
Bekaert, M. et al. (2005) Mol. Cell, 17, 61-68.
Bekaert, M. et al. (2006) Bioinformatics, 22, 2463-2465.
18
Our approach and web server
  • In this study, we improved the pattern
    recognition method of detecting PRF sites in a
    genomic sequence by using structural and
    functional bioinformatics.
  • In addition, we have implemented this algorithm
    as a web server, called PRooF (Programmed
    Ribosomal Frameshifting), that is open to the
    public for online analysis.
  • http://bioalgorithm.life.nctu.edu.tw/PROOF

19
Flowchart of our algorithm
20
Step 1: Identification of ORFs
  • All ORFs above a threshold (whose default is 100
    nt) are identified from an input sequence.
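
A minimal Python sketch of this step follows. The slide does not say how an ORF is delimited, so this sketch assumes a stop-to-stop definition (a stop-free run of codons) on the forward strand only; the function name and the tuple layout are hypothetical, and only the 100 nt default comes from the slide.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(seq, min_len=100):
    """Return (frame, start, end) for stop-free runs of at least min_len nt.

    Assumption: an ORF is the region between two in-frame stop codons
    (or a sequence end); `end` is exclusive and excludes the stop codon.
    """
    rna = seq.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(rna) - 2, 3):
            if rna[i:i + 3] in STOPS:
                if i - start >= min_len:
                    orfs.append((frame, start, i))
                start = i + 3
        # Open-ended run that reaches the end of the sequence.
        end = frame + ((len(rna) - frame) // 3) * 3
        if end - start >= min_len:
            orfs.append((frame, start, end))
    return orfs


example = "ATG" + "GCT" * 40 + "TAA"    # hypothetical 126 nt sequence
print(find_orfs(example))
```

A stop-to-stop definition is used here because a frame entered by frameshifting need not begin with its own start codon; a start-to-stop definition would be an equally simple variant.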

21
Step 2: Detection of slippery sites
  • For PRFs with longer products:
  • Find all pairs of partially overlapping ORFs.
  • Use pattern recognition to detect all
    possible slippery sites in the overlapping
    regions.
  • The slippery sequences conform to the default
    patterns or user-defined patterns.
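
The pairing of partially overlapping ORFs can be sketched as follows. This is an illustrative Python sketch, not the PRooF code: it assumes each ORF is a hypothetical `(frame, start, end)` tuple with `frame = start % 3` and an exclusive `end`, and it treats a pair as partially overlapping when the second ORF begins inside the first and extends beyond it.

```python
def overlapping_pairs(orfs, shift=-1):
    """Pair each ORF with ORFs in the frame reached by `shift` (-1 or +1).

    Returns (zero_frame_orf, shifted_orf, (overlap_start, overlap_end)).
    """
    pairs = []
    for fa, sa, ea in orfs:
        for fb, sb, eb in orfs:
            # A -1 shift moves to frame (fa - 1) mod 3, a +1 shift to (fa + 1) mod 3.
            if (fb - fa) % 3 != shift % 3:
                continue
            # Partial overlap: b starts inside a and ends after a.
            if sa < sb < ea < eb:
                pairs.append(((fa, sa, ea), (fb, sb, eb), (sb, ea)))
    return pairs


orfs = [(0, 0, 300), (2, 200, 500), (1, 100, 400)]   # hypothetical ORFs
print(overlapping_pairs(orfs, shift=-1))
```

Slippery-site patterns would then be searched only inside each returned overlap interval.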

22
Step 2: Detection of slippery sites
  • For PRFs with shorter products:
  • We simply search each identified ORF for
    possible slippery sites that possess the required
    slippery sequences.

23
Detection of slippery sites (cont.)
  • If the input is a bacterial sequence, we further
    look for an internal SD-like sequence upstream
    of each slippery site.
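
The slide does not state which pattern or search window is used for the SD-like sequence. The Python sketch below is therefore only one possible reading: it assumes the Shine-Dalgarno consensus AGGAGG, accepts any substring of it that is at least 4 nt long, and looks within a 20 nt window upstream of the slippery site. All three choices, and the function name, are assumptions.

```python
SD_CONSENSUS = "AGGAGG"   # Shine-Dalgarno consensus; partial matches allowed


def sd_like_upstream(rna, site, window=20, min_match=4):
    """Return (position, match) for the longest SD-like substring found
    within `window` nt upstream of `site`, or None if there is none."""
    offset = max(0, site - window)
    region = rna[offset:site]
    for k in range(len(SD_CONSENSUS), min_match - 1, -1):
        for j in range(len(SD_CONSENSUS) - k + 1):
            sub = SD_CONSENSUS[j:j + k]
            idx = region.rfind(sub)   # prefer the match closest to the site
            if idx != -1:
                return (offset + idx, sub)
    return None


print(sd_like_upstream("CCAGGAGGCCCCCCUUUAAAC", 14))
```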

24
Step 3: Verifying protein function
  • For all candidate ORFs, their translated protein
    sequences are further verified by InterProScan to
    see if they have the potential protein
    motifs/domains already registered in the InterPro
    database.
  • InterPro is an integrated database of protein
    families, domains and functional sites.
  • InterProScan is a tool from InterPro that combines
    various protein signature recognition methods for
    the detection of motifs/domains.

25
Step 3: Verifying protein function
  • For cases with a longer product:
  • Each of the two overlapping ORFs is translated into
    a protein sequence, which is then examined by
    InterProScan.
  • For cases with a shorter product:
  • The full-length ORF is cut into two fragments at
    the slippery site.
  • These two fragments are then translated into
    protein sequences and are further examined by
    InterProScan.
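
For the shorter-product case, the cut-and-translate step could look like the Python sketch below. It uses the standard genetic code and assumes that the cut is placed at the zero-frame codon boundary at or just before the slippery site and that both fragments are translated in the zero frame; the slide does not specify these details, and the function names are hypothetical. Submitting the resulting protein sequences to InterProScan is not shown.

```python
from itertools import product

# Standard genetic code in the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate complete codons of rna; '*' marks a stop codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def split_and_translate(orf_rna, site):
    """Cut the ORF at the codon boundary at or before `site` (index of the
    first base of the slippery sequence, relative to the ORF start) and
    translate both fragments in the zero frame."""
    cut = (site // 3) * 3
    return translate(orf_rna[:cut]), translate(orf_rna[cut:])


orf = "AUGGGUUUAAACGGCUAA"           # hypothetical ORF; UUUAAAC starts at index 5
print(split_and_translate(orf, 5))
```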

26
Step 4: Predicting RNA structure
  • We use a heuristic approach we developed before
    (Huang et al., 2005) to detect the H-type
    pseudoknot for the sequence fragment downstream
    of the slippery site of each PRF candidate.

C.-H. Huang et al. (2005), A heuristic approach
for detecting RNA H-type pseudoknots,
Bioinformatics, Vol. 21, pp. 3501-3508.
27
Step 4: Predicting RNA structure
  • If no stable H-type pseudoknot is found, we
    continue to use RNAMotif to search for all
    possible bulged helixes and choose the most
    stable one.
  • RNAMotif is an RNA structural motif search tool
    that can find the fragments with the possibility
    of forming a given structure.

28
Step 4: Predicting RNA structure
  • If neither a stable H-type pseudoknot nor a
    bulged helix is found, RNAMotif is used to search
    for simple stem-loops.
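
The order of the three structure searches in Step 4 amounts to a fallback cascade. The Python sketch below shows only that control flow; the three finder arguments are hypothetical callables standing in for the pseudoknot heuristic and the RNAMotif searches, whose real interfaces are not described in this presentation.

```python
def predict_stimulator(fragment, find_pseudoknot, find_bulged_helix, find_stem_loop):
    """Try each structure search in order and return the first hit.

    Each finder takes the sequence fragment downstream of the slippery
    site and returns a structure, or None if no stable one is found.
    """
    searches = (
        ("H-type pseudoknot", find_pseudoknot),
        ("bulged helix", find_bulged_helix),
        ("simple stem-loop", find_stem_loop),
    )
    for kind, finder in searches:
        structure = finder(fragment)
        if structure is not None:
            return kind, structure
    return None


# Usage with dummy finders: only the stem-loop search reports a structure.
result = predict_stimulator(
    "GGGAAACCC",
    lambda s: None,
    lambda s: None,
    lambda s: "(((...)))",
)
print(result)
```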

29
Outline
  • Introduction
  • Our algorithm
  • Experimental results and discussion
  • Conclusion

30
Our web server PRooF
  • Based on the algorithm we described above, we
    have implemented a web server called PRooF for
    online analysis.
  • http://bioalgorithm.life.nctu.edu.tw/PROOF
  • PRooF was tested with a number of sequences
    with one or two known PRF sites from different
    species.
  • The experimental results were compared with those
    obtained by FSFinder2, which was developed by
    Moon et al. based on pattern recognition.
  • Moon, S. et al. (2004) Nucleic Acids Res., 32,
    4884-4892.
  • Byun, Y. et al. (2006) LNCS, 3991, 284-291.
  • Song, J.J. et al. (2007) Comput. Biol. Chem., 31,
    298-302.

31
Testing data sets
  • The tested sequences in our experiments were
    taken from the PseudoBase and RECODE.
  • PseudoBase collects RNA pseudoknots, some of
    which are thought to function as PRF
    stimulators.
  • http://biology.leidenuniv.nl/batenburg/PKB.html
  • RECODE contains the translational recoding events
    of PRFs in various biological species.
  • http://recode.genetics.utah.edu/
  • All the tests of PRooF and FSFinder2 were run
    with default parameters, unless otherwise
    specified.

32
Testing data sets of -1 PRF
33
Testing data sets of +1 PRF
34
Sensitivity and specificity
  • Sensitivity: $Sen = 100 \times TP / (TP + FN)$
  • TP: number of correctly predicted PRF sites
  • FN: number of known PRF sites that were not
    predicted
  • Specificity: $Spe = 100 \times TN / (TN + FP)$
  • TN: number of predicted non-PRF sites that
    possess a required slippery sequence but are not
    annotated as PRF sites in the database
  • FP: number of incorrectly predicted PRF sites
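
The two measures can be computed directly from the four counts. The Python sketch below uses made-up counts purely to show the arithmetic; they are not results from the presentation.

```python
def sensitivity(tp, fn):
    """Sen = 100 * TP / (TP + FN), as a percentage."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """Spe = 100 * TN / (TN + FP), as a percentage."""
    return 100.0 * tn / (tn + fp)


# Hypothetical counts for illustration only.
print(sensitivity(tp=8, fn=2))    # 80.0
print(specificity(tn=90, fp=10))  # 90.0
```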

35
Average sensitivity and specificity
Prediction of PRF Average Sensitivity Average Specificity
PRooF
FSFinder2
  • PRooF greatly improves detection sensitivity
    compared with FSFinder2.
  • For the details, please refer to our paper.

36
Reduction of false positives
  • To reduce false positives, FSFinder2 considers
    only the two pairs of partially overlapping ORFs
    whose zero reading frames are the two longest.
  • Moon et al. (2004) reported that these two pairs
    had the highest probability of containing -1 and
    +1 PRF sites.
  • However, there currently seems to be no
    biological evidence to support their observation.

37
Reduction of false positives
  • In contrast, we used InterProScan to
    screen out the partially overlapping ORFs whose
    protein sequences contain no functional
    motifs/domains.
  • As shown in our experiments, this approach of
    functional bioinformatics is very useful to
    reduce the number of false positives.

38
Predicted RNA structures
  • Most of the RNA structures predicted by PRooF are
    H-type pseudoknots and bulged helixes, but many
    RNA structures identified by FSFinder2 are just
    simple stem-loops.
  • Both H-type pseudoknots and bulged helixes are
    believed to be more effective at promoting the
    efficiency of -1 PRFs and some +1 PRFs.
  • The reason is that they share a similar bent
    conformation and are structurally more
    stable than simple stem-loops.

39
Predicted RNA structures
  • The RNA structure at the -1 PRF site of HIV-1 was
    first thought to be a simple stem-loop, but it was
    later shown experimentally to be a bulged helix.
  • Gaudin C. et al. (2005) J Mol Biol, 349,
    1024-1035
  • The RNA structure predicted by PRooF for the -1
    PRF of HIV-1 is indeed a bulged helix, but the
    one predicted by FSFinder2 is just a simple
    stem-loop.
  • It would be worthwhile to determine
    experimentally the RNA structures in other
    similar cases, where PRooF predicts an H-type
    pseudoknot or a bulged helix but FSFinder2 or
    the literature reports only a simple stem-loop.

40
Outline
  • Introduction
  • Our algorithm
  • Experimental results and discussion
  • Conclusion

41
Conclusion
  • We improved the pattern recognition approach to
    automatically detecting PRF sites in a given
    genomic sequence by using both structural
    bioinformatics and functional bioinformatics.
  • Based on this approach, we have developed a web
    server, PRooF, that is open to the public for
    online analysis.
  • http://bioalgorithm.life.nctu.edu.tw/PROOF

42
Conclusion (cont.)
  • In our experiments, PRooF showed greatly improved
    sensitivity compared with FSFinder2.
  • Most of the RNA structures predicted by PRooF are
    H-type pseudoknots and bulged helixes, whereas
    those predicted by FSFinder2 are simple
    stem-loops.
  • PRooF was implemented flexibly: the user can
    modify all the default parameters so that
    exceptional PRF sites can still be detected.

43
Acknowledgement
  • Prof. Chin Lung Lu
  • Institute of Bioinformatics, Department of
    Biological Science and Technology, National Chiao
    Tung University
  • Mr. Chia-Jung Wu
  • Institute of Bioinformatics, National Chiao Tung
    University
  • Prof. Hien-Tai Chiu
  • Department of Biological Science and Technology,
    National Chiao Tung University
  • Prof. Chuan Yi Tang
  • Department of Computer Science, National
    Tsing-Hua University

44
  • Thank you for your attention