A Method to Detect Gene Structure and Alternative Splice Sites by Agreeing ESTs to a Genomic Sequence - PowerPoint PPT Presentation

Provided by: unim186
Learn more at: http://www.bio.disco.unimib.it
1
A Method to Detect Gene Structure and Alternative Splice Sites by Agreeing ESTs to a Genomic Sequence
  • Paola Bonizzoni, Graziano Pesole*, Raffaella Rizzi
  • DISCo, University of Milan-Bicocca, Italy
  • * Department of Physiology and Biochemistry, University of Milan, Italy
  • Supported by FIRB Bioinformatics: Genomics and Proteomics

2
Outline
  • Gene structure and alternative splicing (AS)
  • Problem definition and algorithm
  • ASPic program
  • Experimental results and discussion

3
Mechanism of Splicing
[Figure: mechanism of splicing, starting from DNA]
4
Modes of Alternative Splicing
[Figure: modes of alternative splicing shown on a genomic sequence]
5
Modes of Alternative Splicing
[Figure: exon combinations labeled 1-3 and 1-2b]
6
Why is AS important?
  • AS occurs in 59% of human genes (Graveley, 2001)
  • AS expands protein diversity (a single gene generates multiple transcripts)
  • AS is tissue-specific (Graveley, 2001)
  • AS is related to human diseases

7
Motivations
Regulation of AS is still an open problem
  • predict alternative splicing forms
  • analyze the mechanism through a representation of the splicing forms

8
What is available?
Fast programs that align a single EST to a genomic sequence: Spidey (Wheelan et al., 2001), Squall (Ogasawara & Morishita, 2002)
  • sequencing errors in ESTs make it difficult to locate splice sites by alignment
  • duplications and repeated sequences may produce more than one possible EST alignment

9
Open Problems
  • Formal definition of the AS prediction problem
  • Combined analysis of the EST alignments related to the same gene, by agreeing ESTs to a common exon-intron gene structure
  • Optimization criteria

10
Formal Definitions
  • Def 1
  • Genomic sequence: $G = I_1 f_1 I_2 f_2 I_3 f_3 \dots I_n f_n I_{n+1}$, where $I_i$ ($i = 1, 2, \dots, n+1$) are introns and $f_i$ ($i = 1, 2, \dots, n$) are exons
  • Def 2
  • Exon factorization of G: $G_E = f_1 f_2 f_3 \dots f_n$
  • Def 3
  • EST factorization of an EST S compatible with $G_E$: $S = s_1 s_2 \dots s_k$ s.t. there exist $1 \le i_1 < i_2 < \dots < i_k \le n$ with
  • $s_t = f_{i_t}$ for $t = 2, 3, \dots, k-1$
  • $s_1$ is a suffix of $f_{i_1}$ and $s_k$ is a prefix of $f_{i_k}$
  • Def 3 (with errors): the two conditions become
  • $\mathrm{edit}(s_t, f_{i_t}) \le \mathrm{error}$ for $t = 2, 3, \dots, k-1$
  • $\mathrm{edit}(s_1, \mathrm{suff}(f_{i_1})) \le \mathrm{error}$ and $\mathrm{edit}(s_k, \mathrm{pref}(f_{i_k})) \le \mathrm{error}$

$s_t = \mathrm{suff}(f_{i_t})$ or $s_t = \mathrm{pref}(f_{i_t})$: splice variant
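The compatibility condition of Def 3 can be checked mechanically. The following is a minimal Python sketch, not the authors' code: the function names `edit_distance` and `is_compatible` are hypothetical, `error` is treated as an absolute number of edits (the slides do not say whether it is absolute or a percentage), and only ESTs with at least two factors are handled.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def is_compatible(factors: Sequence[str], exons: Sequence[str],
                  indices: Sequence[int], error: int = 0) -> bool:
    """Check Def 3 for EST factors s_1..s_k mapped to exons f_{i_1}..f_{i_k}.

    `indices` holds the 1-based exon positions i_1..i_k.
    Assumption: this sketch only handles k >= 2 factors.
    """
    k = len(factors)
    if k < 2 or len(indices) != k:
        return False
    # Require 1 <= i_1 < i_2 < ... < i_k <= n.
    if indices[0] < 1 or indices[-1] > len(exons):
        return False
    if any(a >= b for a, b in zip(indices, indices[1:])):
        return False
    for t, (s, i) in enumerate(zip(factors, indices), start=1):
        f = exons[i - 1]
        if t == 1:    # s_1 must be close to some suffix of f_{i_1}
            d = min(edit_distance(s, f[p:]) for p in range(len(f) + 1))
        elif t == k:  # s_k must be close to some prefix of f_{i_k}
            d = min(edit_distance(s, f[:p]) for p in range(len(f) + 1))
        else:         # internal factors must be close to the whole exon
            d = edit_distance(s, f)
        if d > error:
            return False
    return True


# Toy data (hypothetical): three exons and an EST that skips the middle one.
exons = ["acgtac", "gggttt", "cccaaa"]
print(is_compatible(["tac", "ccc"], exons, [1, 3]))
print(is_compatible(["tac", "gggtat", "ccc"], exons, [1, 2, 3], error=1))
```

With `error=0` the function reduces to the exact version of Def 3; a positive `error` gives the relaxed version based on edit distance.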
11
The Problem
Input:
  • A genomic sequence G
  • A set of EST sequences $S = \{S_1, S_2, \dots, S_n\}$
Output: an exon factorization $G_E$ of G ($G_E = f_1, f_2, \dots, f_n$) and a set of EST factorizations compatible with $G_E$
Objective: minimize n, the number of exons in $G_E$
12
Example
[Figure: genomic sequence G with regions labeled A1, A2, B, C1, C2, D1, D2, and alternative factorizations of the EST set S = {S1, S2, S3} built from the factors A2, A1A2, B, C1, C1C2, D1, D1D2]
13
Results
  • MEFC is MAX-SNP-hard (linear reduction from
    NODE-COVER)
  • heuristic algorithm

Iterative process that factorizes each EST, backtracking to recompute previous EST factors if they are not compatible with GE
14
The algorithm
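Iterative j-th step: partial EST factorization of $S_i$ (compute factor $s_{ij}$)
[Figure: factors $s_{i-1,1}, \dots, s_{i-1,j-1}, s_{i-1,j}, \dots, s_{i-1,n}$ of $S_{i-1}$ and factors $s_{i1}, \dots, s_{i,j-1}, s_{ij}$ of $S_i$ placed on exons $e_1, e_2, \dots, e_m$ of G]
  • if Compatible($e_m$, exon_list), then add $e_m$ to exon_list
  • otherwise, try to place $s_{ij}$ elsewhere
  • if that is not possible, then backtrack
After placing all the factors $s_{ij}$ for the set S, place the external factors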
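The place / try elsewhere / backtrack loop can be illustrated with a small Python sketch. It is not the ASPic implementation and rests on several assumptions that the slides do not state: candidate placements for each EST are given in advance as tuples of exon intervals, the hypothetical `compatible` rule accepts an exon only if it is identical to a known exon or overlaps none of them, and the final placement of external factors is left out.

```python
from typing import List, Optional, Sequence, Tuple

Exon = Tuple[int, int]        # half-open interval [start, end) on G
Placement = Tuple[Exon, ...]  # exons used by one candidate EST factorization


def compatible(exon: Exon, exon_list: List[Exon]) -> bool:
    """Hypothetical rule: equal to a known exon, or disjoint from all of them."""
    return all(exon == k or exon[1] <= k[0] or k[1] <= exon[0]
               for k in exon_list)


def factorize_all(candidates: Sequence[Sequence[Placement]]
                  ) -> Optional[List[Placement]]:
    """Pick one placement per EST so that all exons agree, backtracking on failure."""
    chosen: List[Placement] = []
    exon_list: List[Exon] = []

    def place(i: int) -> bool:
        if i == len(candidates):
            return True
        for placement in candidates[i]:
            if all(compatible(e, exon_list) for e in placement):
                added = [e for e in placement if e not in exon_list]
                exon_list.extend(added)
                chosen.append(placement)
                if place(i + 1):
                    return True
                # Backtrack: undo this choice and try to place the EST elsewhere.
                chosen.pop()
                del exon_list[len(exon_list) - len(added):]
        return False

    return chosen if place(0) else None


# Toy data (hypothetical): the first choice for EST 0 blocks EST 1,
# so the search backtracks and recomputes the factorization of EST 0.
candidates = [
    [((0, 10),), ((0, 12),)],
    [((0, 12), (20, 30))],
]
print(factorize_all(candidates))
```

Each placement is assumed to be internally consistent (its own exons are distinct and disjoint); the sketch only checks agreement across ESTs.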
15
The algorithm (more details)
Compute factor $s_{ij}$
[Figure: factor $s_{ij}$ of $S_i$ (preceded by $s_{i1}, \dots, s_{i,j-1}$) split into components $c_1, \dots, c_5$; component $c_2$ is matched to an exon on G, and the factor is extended to $s_{ij}y$]
  • $s_{ij}$ can be divided into n components $c_k$ ($k = 1, 2, \dots, n$). At least one of the components with k from 1 to n-1 is error-free and can be placed on G
  • The algorithm searches for a perfect match of $c_1$ on G
  • Suppose that $c_1$ has no perfect match on G: the algorithm then searches for a perfect match of $c_2$ on G
  • Suppose that $c_2$ has a perfect match on G: the entire factor $s_{ij}$ can then be placed on G
  • Find the canonical ag pattern on the left
  • Find the rightmost gt pattern such that the edit distance between $s_{ij}y$ and the genomic substring from ag to gt is bounded

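The component search and the ag/gt boundary step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ASPic code: the helper names (`split_components`, `find_anchor`, `has_acceptor`, `rightmost_donor`) are hypothetical, only the leftmost perfect match of a component is considered, the factor is assumed to start exactly at the anchor position minus the component offset, and the edit distance is computed against the EST prefix that has the same length as the candidate exon.

```python
from typing import List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def split_components(factor: str, n: int) -> List[Tuple[int, str]]:
    """Split the factor into n contiguous components as (offset, text) pairs."""
    size, extra = divmod(len(factor), n)
    parts, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        parts.append((start, factor[start:end]))
        start = end
    return parts


def find_anchor(factor: str, genome: str, n: int) -> Optional[Tuple[int, int]]:
    """Try c_1, ..., c_{n-1} in order; return (offset in factor, position on G)
    for the first component with a perfect match on G, or None."""
    for offset, comp in split_components(factor, n)[: n - 1]:
        pos = genome.find(comp)
        if comp and pos != -1:
            return offset, pos
    return None


def has_acceptor(genome: str, exon_start: int) -> bool:
    """True if the canonical 'ag' pattern ends right before exon_start."""
    return exon_start >= 2 and genome[exon_start - 2:exon_start] == "ag"


def rightmost_donor(est_rest: str, genome: str, exon_start: int,
                    max_err: int) -> Optional[int]:
    """Rightmost exon end such that 'gt' follows it and the exon is within
    max_err edits of the EST prefix of the same length."""
    best = None
    end = genome.find("gt", exon_start + 1)
    while end != -1 and end - exon_start <= len(est_rest):
        exon = genome[exon_start:end]
        if edit_distance(est_rest[: len(exon)], exon) <= max_err:
            best = end
        end = genome.find("gt", end + 1)
    return best


# Toy data (hypothetical): intron, exon, intron, start of the next exon.
genome = "ttttag" + "ccatcgatccaa" + "gtaaaattttag" + "ccc"
factor = "ctatcgatccaa"                    # one sequencing error inside c_1
offset, pos = find_anchor(factor, genome, 3)
start = pos - offset
print((offset, pos))                       # (4, 10): c_1 fails, c_2 matches
print(has_acceptor(genome, start))         # True
print(rightmost_donor(factor + "ccc", genome, start, max_err=1))  # 18
```

In this toy run the first component carries the error, so the second one serves as the anchor, which mirrors the case described on the slide.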
16
ASPic (Alternative Splicing PredICtion)
Input:
  • A minimum length of an exon
  • A maximum number of exons in the exon factorization of the genomic sequence
  • An error percentage
  • A genomic sequence
  • An EST set (or cluster)
Output:
  • A text file for all EST alignments
  • An HTML file for the exon factorization of the genomic sequence

17
ASPic data validation
Validation database: ASAP (Lee et al., 2003)
ASPic input:
  • Genomic sequences from ASAP database
  • EST clusters of human chromosome 1 from UniGene
    database

18
Experimental Results
Table columns: Genomic sequence (official gene name) | Introns detected by ASAP | ASAP introns detected by ASPic | Novel introns detected by ASPic | Genomic shift detected by ASPic
19
Execution times
Pentium IV, 1600 MHz, 256 MB, running Linux
20
An example of data (gene HNRPR)
ASPic finds a novel intron from 2144 to 5333, confirmed by 18 EST sequences
Positions are counted from 0 in ASPic and from 1 in ASAP
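Because the two tools count positions differently, intron coordinates have to be shifted before they are compared. A minimal Python sketch, assuming that both ends of an intron are reported as positions in the same way by both tools (the slides do not specify this) and using the hypothetical helper names `to_one_based` and `to_zero_based`:

```python
from typing import Tuple

Intron = Tuple[int, int]


def to_one_based(intron: Intron) -> Intron:
    """Convert 0-based (ASPic-style) positions to 1-based (ASAP-style) positions."""
    start, end = intron
    return start + 1, end + 1


def to_zero_based(intron: Intron) -> Intron:
    """Convert 1-based (ASAP-style) positions to 0-based (ASPic-style) positions."""
    start, end = intron
    return start - 1, end - 1


print(to_one_based((2144, 5333)))  # (2145, 5334)
```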
21
An example of data (gene HNRPR, intron 2144-5333)
[Table: for each EST confirming the intron, the EST ID, the left and right ends of the two exons, the EST exons, and the genomic exons]
22
WEB site
23
WEB site
24
WEB site
25
  • Project leaders: Prof. Paola Bonizzoni, Prof. Graziano Pesole
  • Software design lead: Raffaella Rizzi
  • Web site: Gabriele Ravanelli
  • Graphical representation: Francesco Perego
  • Anna Redondi
  • Data analysis: Francesca Rossin
  • Other contributions: Gianluca Dellavedova

26
THANK YOU!