Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Network

1
Matching Protein β-Sheet Partners by Feedforward
and Recurrent Neural Network
  • Proceedings of Eighth International Conference on
    Intelligent Systems for Molecular Biology
    (ISMB2000), pp. 25-36
  • P. Baldi, G. Pollastri, C. Anderson, and S.
    Brunak
  • Cho, Dong-Yeon

2
Introduction
  • Prediction of the Secondary Structure of Proteins
  • Understanding their three-dimensional
    conformations
  • α-helices are built up from one contiguous region
    of the polypeptide chain.
  • β-sheets are built up from a combination of
    several disjoint regions.
  • Previous Studies
  • The best existing methods for predicting protein
    secondary structure achieve prediction accuracy
    in the 75-77% range.
  • β-sheet is almost invariably the weakest category
    in terms of percentage correct.
  • Prediction of Amino Acid Partners in β-sheets

3
Data Preparation
  • Selecting the Data
  • 826 protein chains from the PDB select list of
    June 1998
  • Assigning β-sheet Partners
  • Partner pairs shown in the slide's figure:
    A2-B2, A3-B3, B2-C2, B3-C3, C2-D2, C3-D3

4
Statistical Analysis
  • First Order Statistics
  • The frequency of occurrence of each amino acid

Figures: general amino acid frequencies in the data;
amino acid frequencies in β-sheets

5
  • The ratio of each amino acid's frequency in
    β-sheets to its frequency in the whole data set
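
The first-order statistics on the last two slides come down to counting residues. A minimal sketch, assuming each chain is a string of one-letter amino acid codes paired with a list of booleans marking β-sheet residues (this data layout and the function names are illustrative, not taken from the paper):

```python
from collections import Counter

def frequencies(residues):
    """Return the relative frequency of each amino acid in an iterable of residues."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def sheet_ratio(chains):
    """chains: list of (sequence, in_sheet) pairs; in_sheet is a list of booleans.

    Returns {amino acid: frequency in beta-sheets / frequency in all data}.
    """
    all_res = [aa for seq, _ in chains for aa in seq]
    sheet_res = [aa for seq, mask in chains for aa, m in zip(seq, mask) if m]
    f_all = frequencies(all_res)
    f_sheet = frequencies(sheet_res)
    return {aa: f_sheet.get(aa, 0.0) / f_all[aa] for aa in f_all}

# Toy chain, made up for illustration only.
toy = [("VAVG", [True, False, True, False])]
ratios = sheet_ratio(toy)
```

A ratio above 1 means the amino acid is over-represented in β-sheets relative to the data as a whole; a ratio below 1 means it is under-represented.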

6
  • Second Order Statistics
  • The conditional probabilities P(X|Y) of observing
    an X knowing that its partner is Y in a β-sheet
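
One way to estimate these conditional probabilities is from a list of observed partner pairs. The sketch below assumes the pairs are given as tuples of one-letter codes and counts each pair in both directions, since partnership is symmetric; that counting convention is an assumption, not something the slides specify.

```python
from collections import Counter, defaultdict

def partner_conditionals(pairs):
    """Estimate P(X | Y) from (x, y) partner pairs.

    Returns a nested dict: cond[y][x] = P(X = x | partner is y).
    Each pair is counted in both directions (assumed convention).
    """
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[y][x] += 1
        counts[x][y] += 1
    cond = {}
    for y, row in counts.items():
        total = sum(row.values())
        cond[y] = {x: n / total for x, n in row.items()}
    return cond

# Toy pairs, made up for illustration only.
toy_pairs = [("V", "I"), ("V", "V"), ("I", "L")]
cond = partner_conditionals(toy_pairs)
```

Each row `cond[y]` sums to 1, so it can be read directly as a distribution over partners of `y`.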

7
  • Logo representation

8
  • Length Distribution
  • Interval distances between paired β-strands,
    measured in residue positions along the chain
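
A distance distribution of this kind can be tabulated with a simple histogram. The sketch below assumes partner residues are given as (i, j) position pairs and uses their separation as a stand-in for the strand-level interval; the slides do not say exactly how the interval was measured.

```python
from collections import Counter

def distance_histogram(partner_pairs):
    """Count how often each separation |i - j| occurs among partner positions."""
    return Counter(abs(i - j) for i, j in partner_pairs)

# Toy positions, made up for illustration only.
hist = distance_histogram([(3, 10), (4, 9), (20, 27)])
```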

9
Artificial Neural Network Architecture
  • Feedforward Neural Network
  • Large input windows
  • They tend to dilute sparse information present in
    the input that is really relevant for the
    prediction.
  • Two-window approach
  • One can either provide the distance information
    as a third input to the system or one can train a
    different architecture for each distance type.

10
  • The architecture
  • Two input windows of length W
  • The number D of amino acids separating the two
    windows is also given as an input unit to the
    architecture, with scaled activity D/100.
  • The goal is to output a probability reflecting
    whether the two amino acids located at the center
    of each window are partners or not.
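
The input described above can be sketched as two one-hot encoded windows plus one extra unit carrying D/100. Everything beyond that is an assumption made for illustration: the 20-letter alphabet ordering, zero padding off the chain ends, D taken as the separation of the window centers, a single hidden layer of sigmoid units, and random untrained weights.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(seq, center, w):
    """One-hot encode the length-w window (w odd) centered at `center`.

    Positions that fall off the chain are encoded as all zeros.
    """
    half = w // 2
    vec = []
    for pos in range(center - half, center + half + 1):
        row = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                row[idx] = 1.0
        vec.extend(row)
    return vec

def encode_pair(seq, i, j, w=7):
    """Concatenate both windows and the scaled separation D/100."""
    d = abs(i - j)  # assumed definition of D
    return one_hot_window(seq, i, w) + one_hot_window(seq, j, w) + [d / 100.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of sigmoid units, then a single sigmoid output unit."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wi * hi for wi, hi in zip(w_out, hidden)) + b_out)

# Untrained random weights: this only checks shapes and output range.
rng = random.Random(0)
x = encode_pair("MKVLAVGIDLGTTNSCVAV", 4, 12, w=7)
n_hidden = 8  # hypothetical hidden layer size
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
p = forward(x, w_hidden, b_hidden, w_out, 0.0)
```

With W = 7 the input has 2 × 7 × 20 + 1 = 281 units, and the output `p` lies in (0, 1), so it can be read as the probability that the two central residues are partners.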

11
  • Recurrent Neural Network
  • Bi-directional recurrent neural network (BRNN)
  • Input layer
  • Forward and backward Markov chain
  • Output layer
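
The bidirectional idea can be illustrated with a toy recurrence: a forward chain summarises everything up to position t, a backward chain summarises everything from t onward, and the output at t sees the input and both states. This is a generic sketch with scalar states and fixed hypothetical weights; it is not the architecture or parameterisation used in the paper.

```python
import math

def brnn_states(inputs, w_in=0.5, w_rec=0.8):
    """Run the forward and backward chains over a sequence of scalar inputs.

    forward[t] depends on inputs[0..t]; backward[t] depends on inputs[t..end].
    """
    n = len(inputs)
    forward = [0.0] * n
    backward = [0.0] * n
    prev = 0.0
    for t in range(n):
        prev = math.tanh(w_in * inputs[t] + w_rec * prev)
        forward[t] = prev
    nxt = 0.0
    for t in range(n - 1, -1, -1):
        nxt = math.tanh(w_in * inputs[t] + w_rec * nxt)
        backward[t] = nxt
    return forward, backward

def brnn_outputs(inputs, w_x=1.0, w_f=1.0, w_b=1.0):
    """Output at t combines the input at t with both chain states at t."""
    forward, backward = brnn_states(inputs)
    return [1.0 / (1.0 + math.exp(-(w_x * x + w_f * f + w_b * b)))
            for x, f, b in zip(inputs, forward, backward)]

out = brnn_outputs([1.0, 0.0, -1.0, 0.0])
```

Feeding the sequence `[1.0, 0.0, 0.0]` to `brnn_states` shows the asymmetry: the forward state at position 1 still carries the signal from position 0, while the backward state at position 1 is zero because nothing nonzero lies to its right.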

12
Experiments and Results
  • Data
  • Randomly split the data 2/3 for training and 1/3
    for test
  • Extremely unbalanced
  • At each epoch, all the 37008 positive examples
    are presented with 37008 randomly selected
    negative examples.
  • The total balanced percentage is the average of
    the two percentages obtained on the positive and
    negative examples.
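
The balancing scheme and the balanced percentage can be sketched as follows. Function names and the label convention (1 for partners, 0 for non-partners) are assumptions for illustration; the sampler requires at least as many negatives as positives.

```python
import random

def balanced_epoch(positives, negatives, rng):
    """All positives plus an equally sized random sample of negatives, shuffled."""
    sampled = rng.sample(negatives, len(positives))
    epoch = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
    rng.shuffle(epoch)
    return epoch

def balanced_percentage(y_true, y_pred):
    """Average of percent correct on positives and percent correct on negatives."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    pct_pos = 100.0 * sum(1 for p in pos if p == 1) / len(pos)
    pct_neg = 100.0 * sum(1 for p in neg if p == 0) / len(neg)
    return (pct_pos + pct_neg) / 2.0

# Toy data, made up for illustration only.
epoch = balanced_epoch(list(range(3)), list(range(100, 200)), random.Random(0))
score = balanced_percentage([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

In the toy call, one of two positives and three of four negatives are correct, so the balanced percentage is (50 + 75) / 2 = 62.5, whereas plain accuracy would be 4/6. On heavily unbalanced data, plain accuracy is dominated by the negative class, which is why the balanced figure is reported.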

13
  • Results
  • Feedforward neural network
  • The best architecture

14
  • The predicted second order statistics

15
  • Five-fold cross validation
  • BRNN Architecture
  • Three values (7, 9, and 11) are used as the size
    of two input windows.
  • Length 7 again yields the best performance.
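
A five-fold split can be sketched as below. The slides do not say how the folds were formed, so shuffling whole chains and dealing them round-robin into five folds is an assumption; the chain count of 826 comes from the data preparation slide.

```python
import random

def five_fold_splits(items, seed=0, k=5):
    """Yield (train, test) lists; every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Chain indices stand in for the 826 protein chains.
splits = list(five_fold_splits(range(826)))
```

Splitting at the chain level, rather than at the level of individual residue pairs, keeps all pairs from one protein on the same side of the train/test boundary.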

16
  • Five-fold cross validation
  • Ensemble architecture
  • The ensemble of 3 BRNNs
  • Five-fold cross validation

17
  • Summary of all the five-fold cross validation
    results
  • Profile approach
  • The profile approach was used as input to the
    artificial neural network.
  • The overall performance is comparable, but not
    any better.
  • Profiles may provide more robust first order
    statistics, but weaker intrasequence correlation.

18
Discussion
  • We have developed a NN architecture that predicts
    β-sheet amino acid partners with a balanced
    performance close to 84% correct prediction.
  • It is insufficient by itself to reliably predict
    strand pairing because of the large number of
    false positive predictions.
  • Some directions for future work
  • Profiles on the BRNNs
  • Reduce the number of false positive predictions
  • Improve the quality of the match
  • Use of raw sequence information in addition to
    profiles
  • β-sheet predictor
  • Various combinations of the present architectures