Title: Prediction of human mRNA donor and acceptor sites from the DNA sequence
1. Prediction of human mRNA donor and acceptor sites from the DNA sequence
- Article by S. Brunak, J. Engelbrecht, S. Knudsen
2. Outline
- Reminder: donor and acceptor sites
- Reminder: standard feedforward networks
- Experiments
- Another classical alternative: rule insertion and extraction
- Discussion
3. Reminder: donor and acceptor sites
4. Donor and acceptor sites
- DNA is copied to (pre-m)RNA; non-coding regions (introns) are spliced out within the nucleus; the coding regions (exons) form the gene
- donor: exon/intron boundary; acceptor: intron/exon boundary
- It can be expected that splice sites can be predicted up to a certain accuracy based only on a local window around the possible splice site
- We have already seen that SVMs achieve good classification accuracy
[Figure: splice site consensus. Donor: A64 G73 | G100 T100 G62 A68 G84 T63. Acceptor: branch site, 18-40 bp of pyrimidines (T, C), then C65 A100 G100 |. Subscripts give the percent frequency of each nucleotide.]
5. Donor and acceptor sites
- seminal paper which establishes neural networks as a very good method for this task
- previous alternatives: finite automata (<50%), scoring schemes based on nucleotide weight tables, open reading frames, free energy of mRNA and snRNP, coding statistics (80%), previous FNNs (<95%)
- Data from GenBank (annotated collection of all publicly available DNA, http://www.ncbi.nlm.nih.gov/Genbank): only human splice sites with sufficient information → 449 donor sites, 449 acceptor sites; 2/3 training set, 1/3 test set; sequences of training/test set have small overlap; negatives via shifting the window (sketched below)
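A sketch of how such positives and shifted negatives could be generated; this is a hypothetical helper, not the paper's actual preprocessing:

```python
def windows(seq, sites, w, shifts=range(-10, 11)):
    """Build (window, label) pairs around annotated splice positions.

    seq: DNA string; sites: indices of true splice sites; w: half-width.
    The window at the annotated position is labelled 1; shifted copies
    (label 0) provide the negatives, mirroring 'negatives via shifting'.
    """
    data = []
    for p in sites:
        for d in shifts:
            q = p + d
            if w <= q <= len(seq) - w:
                data.append((seq[q - w:q + w], int(d == 0)))
    return data

# toy example with a hypothetical donor site at index 20
seq = "ACGTACGTACGTACGTACGTGTAAGTACGTACGTACGTAC"
print(windows(seq, sites=[20], w=5)[:3])
```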
6. Reminder: standard feedforward networks
7. Standard feedforward networks
... are based on simple neurons
[Figure: a neuron with inputs x1, ..., xn, weights w1, ..., wn, and threshold θ computes s(wᵀx − θ)]
s(t) = sgd(t) = (1 + e^(-t))^(-1)
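A minimal Python sketch of such a neuron (the input and weight values are illustrative):

```python
import numpy as np

def neuron(x, w, theta):
    """Weighted sum of the inputs, shifted by the threshold theta,
    squashed through the sigmoid s(t) = (1 + e^(-t))^(-1)."""
    t = np.dot(w, x) - theta
    return 1.0 / (1.0 + np.exp(-t))

print(neuron(x=np.array([1.0, 0.0, 1.0]),
             w=np.array([0.5, -0.3, 0.8]),
             theta=0.2))
```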
8. Standard feedforward networks
... combine the neurons in a network architecture
[Figure: network mapping input x to output y, realizing a function f_w : R^n → R^o]
9. Standard feedforward networks
- ... can be trained efficiently
- goal: learn an unknown f : R^n → R^o given examples f(x1), ..., f(xm)
- Training (a toy version follows below):
- Choose an architecture (n input neurons, o output neurons, number of hidden neurons determined by trial and error)
- Choose the weights (regression such that the examples from the training set are matched as accurately as possible)
- Test the resulting function on the test set
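A toy version of this recipe on a stand-in task (XOR), with one hidden layer sized by trial and error and plain gradient descent; this is a sketch of the general training scheme, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
s = lambda t: 1.0 / (1.0 + np.exp(-t))   # sigmoid

# toy data: learn XOR, f : R^2 -> R^1, from its four examples
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
Y = np.array([[0], [1], [1], [0]], float)

# architecture: 2 inputs, 4 hidden neurons (trial and error), 1 output
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

for _ in range(10000):                   # plain gradient descent
    H = s(X @ W1 + b1)                   # hidden layer
    O = s(H @ W2 + b2)                   # output layer
    dO = (O - Y) * O * (1 - O)           # backprop: output deltas
    dH = (dO @ W2.T) * H * (1 - H)       # backprop: hidden deltas
    W2 -= 0.5 * H.T @ dO; b2 -= 0.5 * dO.sum(0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(0)

print(np.round(O.ravel(), 2))            # typically approaches [0, 1, 1, 0]
```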
10. Standard feedforward networks
[Figure: a sequence window around a potential splice site is fed to the network and mapped to (1, 0) for a true site]
i.e. f : R^(4k) → R^2 is to be learned (4 inputs per nucleotide of the k-window; a coding sketch follows below)
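A sketch of the implied input coding, assuming the common four-bits-per-nucleotide scheme; `CODE` and `encode` are illustrative names, not from the paper:

```python
import numpy as np

CODE = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def encode(window):
    """Map a k-nucleotide window to a vector in R^(4k)."""
    return np.array([bit for nt in window for bit in CODE[nt]], float)

x = encode("CAGGTAAGT")           # k = 9 -> 36 inputs
y = np.array([1.0, 0.0])          # target (1, 0): 'true donor site'
print(x.shape, y)
```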
11. Experiments
12. Experiments
- Donor sites: test set contains 118 donors, 190987 non-donors
- Evaluation: highly unbalanced distribution → Matthews correlation coefficient (in [-1, 1]), see the formula below
- (P_x = true positives, P_x^f = false positives, N_x = true negatives, N_x^f = false negatives)
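In this notation the coefficient is the standard Matthews formula; the equation itself did not survive on the slide, so it is reconstructed here:

```latex
C(x) = \frac{P_x N_x - P_x^f N_x^f}
            {\sqrt{(N_x + N_x^f)(N_x + P_x^f)(P_x + N_x^f)(P_x + P_x^f)}}
```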
13. Experiments
- Value of C(x) for different sizes of the window (w) and different numbers of hidden neurons
- 111 of 118 donors correctly classified (94.1%); 11789 of 11800 non-donors correctly classified (99.9%)
14. Experiments
- Effect of resetting the output cutoff level (illustrated below)
[Figure: probability of misclassified donors vs. distance from a true site]
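Conceptually, resetting the cutoff just moves the decision threshold on the network output; a toy illustration of the resulting sensitivity/specificity trade-off (scores and labels are invented):

```python
import numpy as np

def rates(scores, labels, cutoff):
    """Sensitivity and specificity at a given output cutoff.

    Raising the cutoff suppresses false positives at the price of
    missed true sites, the trade-off shown on the slide."""
    pred = scores > cutoff
    sens = np.mean(pred[labels == 1])
    spec = np.mean(~pred[labels == 0])
    return sens, spec

scores = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1])  # illustrative outputs
labels = np.array([1,   1,   1,   0,   0,   0])
for c in (0.3, 0.5, 0.8):
    print(c, rates(scores, labels, c))
```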
15. Experiments
- Acceptor sites: test set contains 118 acceptors, 190987 non-acceptors
- 100 acceptors classified correctly (87.4%); 11800 non-acceptors classified correctly (99.8%)
- optimum window size includes the polypyrimidine tract
16. Experiments
- Effect of resetting the output cutoff level
[Figure: probability of misclassified acceptors vs. distance from a true site]
17. Experiments
- comparison of methods on another human splice set
- from: GeneSplicer: a new computational method for splice site recognition, Pertea, Lin, Salzberg, Nucleic Acids Research 29(5):1216-1221, 2001
18. Experiments
- comparison of methods on the UCI benchmark, from the SVM paper (Sonnenburg et al.)
[Table: method comparison; labels on the slide point out the "some NN" and "rules" entries]
19. Rule insertion and extraction
20. Rule insertion and extraction
- seminal approach: Knowledge Based Artificial Neural Networks (KBANN) by G. Towell, J. Shavlik
- assumption: a set of approximate propositional, acyclic if/then rules (acyclic Horn clauses) is given (e.g. a :- b, c, d, e)
- transfer it into an FNN, coding true/false as 1/0 (see the sketch below)
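A minimal sketch of the rule-to-network translation, assuming the usual KBANN scheme: positive antecedents get weight +ω, negated ones −ω, and the bias is set so the unit fires only when the clause is satisfied. The value ω = 4 and the helper names are illustrative:

```python
import numpy as np

OMEGA = 4.0  # link weight magnitude (illustrative choice)

def rule_to_neuron(positive, negated, n_inputs):
    """Translate one Horn clause into neuron weights and a bias.

    The clause fires iff all positive antecedents are 1 and all
    negated ones are 0; the bias (p - 0.5) * OMEGA puts the
    threshold just below the satisfied-clause net input.
    """
    w = np.zeros(n_inputs)
    w[list(positive)] = OMEGA
    w[list(negated)] = -OMEGA
    bias = (len(positive) - 0.5) * OMEGA
    return w, bias

def neuron(x, w, bias):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) - bias)))

# clause: a :- b, c, not d   with inputs ordered [b, c, d]
w, b = rule_to_neuron(positive=[0, 1], negated=[2], n_inputs=3)
print(neuron(np.array([1.0, 1.0, 0.0]), w, b))  # high: clause satisfied
print(neuron(np.array([1.0, 1.0, 1.0]), w, b))  # low: 'd' violates it
```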
22. Rule insertion and extraction
- find nice rules
- transfer the rules into a network
- extend the network by additional neurons (and possibly additional inputs) and initialize the additional weights with small values
- find additional nice training examples possibly not yet covered by the rules
- train the network
23. Rule insertion and extraction
24. Rule insertion and extraction
25. Rule insertion and extraction
- Training: backprop with cross-entropy error E = -Σ_i ( (1 - y_i) lg(1 - o_i) + y_i lg o_i ) (see below)
- Structure of the network and initialization improve the performance
- However it might happen that FNNs become better in the long run; the rules might get lost here
- KBANN is better for acceptors, but not for donors or for the "neither" class
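For concreteness, the cross-entropy error as reconstructed above (a standard definition; `lg` read as log base 2, and `eps` is an illustrative guard):

```python
import numpy as np

def cross_entropy(y, o, eps=1e-12):
    """E = -sum_i ( (1 - y_i) lg(1 - o_i) + y_i lg o_i ), lg = log2.

    eps clips the outputs away from 0 and 1 to avoid lg(0)."""
    o = np.clip(o, eps, 1 - eps)
    return -np.sum((1 - y) * np.log2(1 - o) + y * np.log2(o))

print(cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
```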
26. Experiments
27. Rule insertion and extraction
- Extraction of rules (a toy version follows below):
- each neuron represents a variable
- the weights of each neuron are clustered into nearly identical values; irrelevant weights are pruned
- m-of-n rules are extracted from each neuron via enumeration of (some of) the possibilities, e.g. "if 2 of {2, 8, 9} and 2 of {1, 3, 5, 7} and none of {4, 6} then on"
- usually yields a large and incomprehensible set of rules
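A toy illustration of the m-of-n extraction step, heavily simplified to one weight cluster of positive weights; the published procedure clusters and enumerates far more carefully:

```python
import numpy as np

def m_of_n_rule(w, bias):
    """Extract an m-of-n rule from one trained neuron (toy version).

    Keep only significant positive weights (pruning), treat them as
    one cluster of near-identical weights, and find the smallest m
    such that ANY m active antecedents push the net input past the
    bias; that is the m-of-n reading of the neuron.
    """
    idx = [i for i, wi in enumerate(w) if wi > 0.1 * np.max(np.abs(w))]
    for m in range(1, len(idx) + 1):
        weakest_m = sorted(w[i] for i in idx)[:m]
        if sum(weakest_m) > bias:
            return m, idx
    return None, idx

w = np.array([3.9, 0.1, 4.1, 4.0, -0.05])
m, ants = m_of_n_rule(w, bias=6.0)
print(f"if {m} of {ants} then on")  # -> "if 2 of [0, 2, 3] then on"
```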
28. Rule insertion and extraction
- Just a remark: there exists a very nice alternative from Duch et al. (not applied to splice sets) to infer logical rules with the help of NNs from scratch: MLP2LN
- for each output class separately: rules of increasing complexity
- train s.t. a formula arises: the error E is augmented by λ1 Σ w_ij² + λ2 Σ w_ij² (w_ij − 1)² (w_ij + 1)², driving the weights toward 0 and ±1 (sketched below)
[Figure: MLP2LN architecture with L units and R units]
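A sketch of the MLP2LN regularizer as stated above; the λ values are placeholder hyperparameters, not values from Duch et al.:

```python
import numpy as np

def mlp2ln_penalty(W, lam1=1e-4, lam2=1e-3):
    """Regularization term of MLP2LN (sketch).

    lam1 * sum(w^2) pushes weights toward 0; the second factor
    w^2 (w - 1)^2 (w + 1)^2 vanishes exactly at w in {-1, 0, +1},
    so trained weights approach logical (absent/on/negated) links.
    """
    W = np.asarray(W)
    return (lam1 * np.sum(W ** 2)
            + lam2 * np.sum(W ** 2 * (W - 1) ** 2 * (W + 1) ** 2))

print(mlp2ln_penalty([0.0, 1.0, -1.0]))  # only the lam1 terms remain
print(mlp2ln_penalty([0.5, -0.4]))       # intermediate weights are penalized
```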
29. Discussion
30. Discussion
- FNNs constitute a very simple and efficient tool for splice site recognition
- interesting: often highly unbalanced sets, hence adequate error measures are important (correlation coefficient); even >99% correct prediction of non-splice sites might lead to 5:1 (non-correct splice sites : correct splice sites)
- prior knowledge is available, hence rule insertion is a striking alternative (with some benefits)