Genome Wide Searches for RNA Secondary Structure Motifs

Russell S. Hamilton, Davis Lab, Wellcome Trust Centre for Cell Biology. Drosophila melanogaster.

1
Genome Wide Searches for RNA Secondary Structure Motifs
Russell S. Hamilton
Davis Lab, Wellcome Trust Centre for Cell Biology
Drosophila melanogaster
2
Introduction: RNA Localization
[Figure: mRNA cis-acting signal, trans-acting factors, dynein, microtubules]
  • RNA localization is a mode of targeting various proteins to their site of function
  • Cis-acting signals in the mRNA are recognised by trans-acting factors bound to the dynein motor
  • Translation of the mRNA into protein is blocked during transport
  • The mRNA is anchored at the site of function before being translated into protein (Delanoue & Davis, 2005, Cell, in press)
3
Introduction: gurken
[Figure: localizing mRNAs in the oocyte (grk, osk, bcd); axes labelled A, P, D, V]
  • gurken encodes a TGFα homologue
  • gurken is localized to the dorso/anterior corner, forming a cap around the oocyte nucleus, and establishes the dorso/ventral axis
  • gurken localization has been shown to be dynein dependent (MacDougall et al., 2003, Dev. Cell, 4, 307-19)
  • The gurken localization signal has been mapped to 64 nt that are necessary and sufficient for localization (Van De Bor & Davis, 2004, Curr. Opin. Cell Biol., 16, 300-7)
  • gurken also localizes in the embryo
4
Introduction: I Factor
  • I Factor is a retrotransposon (or transposable element), which inserts itself into the genome of an organism
  • I Factor has been found to localize in a similar manner to gurken (Van De Bor, Hartswood, Jones, Finnegan & Davis)
  • The localization signal has been mapped to a 58 nt signal that is necessary and sufficient for localization
Van De Bor
5
Introduction: gurken and I Factor
Sequence Similarity: ID 34%
gurken   AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT  64
Ifactor  ---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--  58

Structural Similarity
Are there more examples in the Drosophila genome using a similar mechanism of localization?
Search by secondary structure, not sequence.
V. Van De Bor, D. Finnegan, E. Hartswood and C. Jones
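The identity figure quoted for the gurken / I Factor alignment can be checked with a few lines of Python. This is only a sketch: it assumes identity is counted over all 64 alignment columns, with gap columns treated as mismatches, which is one of several common conventions. The function name `percent_identity` is made up for this illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over all alignment columns (gap columns count as mismatches)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # A column is a match only if both rows carry the same nucleotide (not a gap).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Aligned rows as shown on the slide ('-' marks a gap).
GURKEN = "AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT"
IFACTOR = "---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--"

print(f"{percent_identity(GURKEN, IFACTOR):.1f}% identity")
```

Under this convention the two localization signals share roughly a third of their aligned positions, which is why the search is framed in terms of structure rather than sequence.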
6
Method: Outline
Database
Genome sequences
Folded genome sequences
Comparison with grk & I Factor structures
7
Method: RNALfold
  • Folds large genomic sequences, outputting stable structures of a given size
  • Similar to mfold, but optimised for folding on a genome-wide scale
Input: 2L chromosome arm genomic sequence
  • Window length is user defined
  • Use 64 and 58 (the grk & I Factor localization elements, LEs)

Output: stable structures
RNALfold: Hofacker et al. (2004) Bioinformatics 20, 191-198
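As a rough illustration of the folding step, the sketch below runs the RNALfold command-line tool and collects the locally stable structures it reports. Several things here are assumptions rather than details from the talk: that the slide's "window length" maps onto RNALfold's `-L` (maximum base-pair span) option, that ViennaRNA is installed and on the PATH, and that each hit is printed as a dot-bracket string, an energy in parentheses, and a start position. The helper names `parse_lfold` and `fold_windows` are hypothetical.

```python
import re
import subprocess

# Assumed hit line shape: "<dot-bracket> (<energy>) <start>",
# e.g. ".((((....)))). ( -1.80)   12"
HIT = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s+(\d+)\s*$")


def parse_lfold(text: str):
    """Return (structure, energy, start) tuples from RNALfold-style output."""
    hits = []
    for line in text.splitlines():
        m = HIT.match(line.strip())
        if m:  # header, sequence and summary lines do not match and are skipped
            hits.append((m.group(1), float(m.group(2)), int(m.group(3))))
    return hits


def fold_windows(fasta_path: str, span: int):
    """Run RNALfold with base pairs limited to `span` nt and parse its hits."""
    with open(fasta_path) as fh:
        result = subprocess.run(
            ["RNALfold", "-L", str(span)],
            stdin=fh, capture_output=True, text=True, check=True,
        )
    return parse_lfold(result.stdout)
```

With a genomic FASTA file to hand, `fold_windows("2L.fa", 64)` and `fold_windows("2L.fa", 58)` would correspond to the two window sizes on the slide; the file name is a placeholder.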
8
Method: RNAdistance & RNAforester
  • RNAdistance & RNAforester
  • Structures represented in bracket format
  • Minimal representation maintaining all structural characteristics
  • Structures then aligned (not by sequence) with the query structure, e.g. gurken LE
  • Scores can be weighted by sequence length and total number of base pairs

..(((((.....))))).
.-(-(((-....))))-.
Matches: + score
Mismatches: - score
Key: ( ) base pair, . unpaired base, - gap
RNAdistance: Global Structure Comparison. Hofacker (1994) Monatsh. Chem. 125, 167-188
RNAforester: Local Structure Comparison. Hochsmann (2003) Proc. Comp. Sys. Bioinf. (CSB 2003)
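To make the bracket format concrete, the sketch below parses a dot-bracket string into its base pairs and compares two structures of equal length. Note that it uses the simple base-pair distance (number of pairs present in one structure but not the other) as a stand-in; RNAdistance and RNAforester use tree-based comparison and alignment, which this does not reproduce. The normalisation in `weighted_distance` is a hypothetical example of weighting by length and base-pair count, not the scheme used in the talk.

```python
def pair_table(structure: str) -> dict:
    """Map each opening-bracket index to its closing-bracket index."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Count base pairs found in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    p1, p2 = set(pair_table(s1).items()), set(pair_table(s2).items())
    return len(p1 ^ p2)


def weighted_distance(s1: str, s2: str) -> float:
    """Hypothetical weighting: distance divided by length plus total base pairs."""
    total_pairs = len(pair_table(s1)) + len(pair_table(s2))
    return base_pair_distance(s1, s2) / (len(s1) + total_pairs)


query = "..(((((.....)))))."
hit = "...((((.....)))).."
print(base_pair_distance(query, hit), round(weighted_distance(query, hit), 3))
```

Here the hit lacks only the outermost base pair of the query, so the two structures differ by a single pair.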
9
Method: RNAMotif
Flexible secondary structure definition and searching algorithm
Two step process
  • Step 1. Create a structure description
  • Step 2. Use the description to find matching structures in a sequence database
Uses Mfold (and pknots) for secondary structure predictions
Output can be ranked by thermodynamic stability
User Defined Scoring
  • Based on if/then/else statements, e.g. if loop has 6-8 bases then score 10, else score -10
Algorithm Summary
  • The description is converted to a tree structure
  • The sequence being matched has its secondary structure converted to a tree structure
  • Then the matching can occur
Macke, T.J. et al. (2001) Nucl. Acids Res., 29, 4724-4735
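The if/then/else scoring rule quoted on this slide can be mimicked in Python to show the idea. This is not RNAMotif's descriptor language, only a sketch of the same logic applied to a dot-bracket string; it assumes the rule is applied once per hairpin loop, and the function names are invented for the example.

```python
import re


def hairpin_loop_sizes(structure: str) -> list:
    """Return the size of every hairpin loop: unpaired runs closed by a single base pair."""
    return [len(m.group(1)) for m in re.finditer(r"\((\.+)\)", structure)]


def loop_score(structure: str, lo: int = 6, hi: int = 8,
               reward: int = 10, penalty: int = -10) -> int:
    """Score 10 for each hairpin loop of 6-8 bases, otherwise -10."""
    return sum(reward if lo <= n <= hi else penalty
               for n in hairpin_loop_sizes(structure))


print(loop_score("..((((......)))).."))  # one 6-base loop
print(loop_score("((...))"))             # one 3-base loop
```

Ranking hits by such a score lets candidates that resemble the query loop geometry rise to the top without discarding the rest.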
10
Method: RNAMotif
  • Define base pairings allowed (in addition to Watson-Crick)
  • Define stems, loops, and bulges, including number of nucleotides
  • Setting a range 0-N means it can either be present or not
  • Can also put in sequence constraints, including tolerated mismatches
  • Can search for pseudoknots, triplexes and quadruplexes
  • Very flexible method of describing secondary structures

11
Method: RNAMotif
4 description files so far:
1. Basic: 2900 hits. Matches both gurken and I factor LEs.
2. Basic + score: 2900 hits. Scores nearer gurken as positive, scores nearer I factor as negative.
3. Basic + score + seq constraint UU: 394 hits. UU in bulge present in both gurken and I factor.
4. Basic + score + seq constraint UU + CAA/AAC: 151 hits. CAA/AAC in stem 1 present in both gurken and I factor.
12
Method: Overview
  • Take all available sequence databases
  • Predict all stable secondary structures
  • Calculate similarity between grk/I Factor and stable structures
  • Pattern match structures against an RNAMotif description
  • Results put in database and accessed via web interface
13
Computational Infrastructure
Computational requirements are beyond desktop PCs. The main requirements are processing power and enough storage space for the sequences being searched and the database of matching structures.
  • Processing: 6 processing nodes (Pentium 4 HT, 1GB RAM)
  • Development platform
  • Data storage: RAID array, file server, tape backup robot
  • Web server: linked to database
14
Web Interface: Searching
  • To stop your browser crashing, you can limit the number of hits displayed
  • Filter by percentage of the sequence deemed to have low complexity
  • Select the RNAMotif structure description used in the searches
  • Narrow down the search by CG, TE, CR or individual identifiers
http://wcbweb.icmb.ed.ac.uk/ilan/bioinformatics.html
15
Web Interface: Search Results
  • Custom RNAMotif score
  • RNAdistance scores displayed
  • RNAMotif raw output showing how the sequence matches the structure description
  • Indicates if the sequence has regions of low complexity/repeat regions (option to filter these out)
16
Web Interface: Gene Mapping
17
Web Interface: Conservation Assessment
18
Results: Candidate Injections
  • We are currently injecting candidates from the database into oocytes and embryos to determine whether the RNA is localized
  • Results of candidate injections are stored in the database
  • There have been suggestions that up to 20% of Drosophila genes may localize in the oocyte and/or embryo
  • So we want to show that our method is able to enrich for localizing genes
19
Future Work: Expanding Searches
  • Depending on the success of the experimental localization assays, expand the searches to:
  • Other Drosophilid genomes (12 will be sequenced in the near future)
  • Mammalian genomes, particularly human (will require considerable computational power)
  • Search for LINE/SINE elements in human (transposon equivalents)
  • Develop the web interface to enable real-time searches to be performed on genes/genomes of interest (requires massive computational power)

20
Future Work: Tertiary Structure
Squid Protein
  • gurken mRNA is known to bind Squid protein
  • Used homology modelling to predict Squid tertiary structure (2.5Å) (Hamilton & Soares)
RNA tertiary structure prediction
  • Secondary structure alone may not be sufficient for finding similar structures
Experimental Structure Determination
  • RNA + protein: X-ray and/or NMR
  • RNA only: NMR
[Figure: RNA + protein 3D structure, Staufen + RNA (Ramos et al., 2000, EMBO, 19, 997-1009)]
21
Future Work: Machine Learning
Long Term Future: Support Vector Machines (SVMs)
  • Take sequence + structure for localizing and non-localizing matches (+ other data)
  • Algorithm learns how to separate localizing from non-localizing
  • The problem is that we don't have enough data at the moment
  • However, with all the candidate injections we will hopefully generate enough data for localizing and non-localizing genes
22
Acknowledgements
Davis Lab: Ilan Davis, Veronique Van De Bor, Georgia Vendra, Hille Tekotte, Renald Delanoue, Carine Meignin, Alejandra Clark, Isabelle Kos, Richard Parton
Finnegan Lab: David Finnegan, Eve Hartswood, Cheryl Jones
Bioinformatics Discussions: Alastair Kerr
Systems Administration: Paul Taylor
Homology Modelling: Dinesh Soares
Funding
Software