1
Use of Artificial Neural Networks and effects of
amino acid encodings in the membrane protein
prediction problem
  • Subrata K Bose1, Antony Browne2, Hassan Kazemian3
    and Kenneth White4
  • 1 MRC Clinical Sciences Centre, Faculty of
    Medicine, Imperial College London, UK
  • 2 Department of Computing, School of Engineering
    and Physical Sciences, University of Surrey,
    Guildford
  • 3 Intelligent Systems Research Centre,
    Department of Computing, Communication
    Technology and Mathematics, London Metropolitan
    University, London, UK
  • 4 Institute for Health Research and Policy,
    Departments of Health and Human Sciences,
    London Metropolitan University, London, UK
  • Email: subrata.bose_at_ic.ac.uk

2
  • Proteins
  • Proteins are essential macromolecules of life
    that constitute about half of a living cell's
    dry weight
  • They are polymers of one or more uncleaved chains
    of amino acids, typically ranging from two
    hundred to a few thousand residues
  • Proteins play crucial roles in almost all
    biological processes
  • Examples:
  • Almost all known enzymes are proteins
  • Many hormones are also proteins

3
  • Amino Acids
  • There are 20 different amino acids
  • Amino acids have similar structures (see Fig)
  • Side chain groups (the R group) may vary in size,
    shape, charge, hydrophobicity, and reactivity
  • The amino acids can be considered the alphabet
    in which proteins are written
  • Amino acids link through peptide bonds to form
    peptides
  • Long chains of α-amino acids make proteins

Fig: Amino acid structure
4
  • Structure of Protein
  • The structure of a protein can be classified
    into four levels:

Level one: Primary structure
Level two: Secondary structure
Level three: Tertiary structure
Level four: Quaternary structure
5
  • Membrane Protein (MP)
  • Proteins can be classified as being either
    soluble, such as the proteins found in blood or
    in the liquid compartments of cells (cytosol), or
    bound to cell membranes
  • All living cells are enclosed by a cytoplasmic
    membrane to separate the cytoplasm from the
    outside environment
  • An MP is attached to, or associated with, the
    membrane of a cell or an organelle
  • The part of the protein that makes contact with
    the cell membrane is called the transmembrane
    domain (TM) or the membrane spanning region
    (MSR)

6
  • Pharma-economical Importance of MPs
  • MPs are responsible for a number of key tasks in
    a living cell
  • They comprise essential biochemical components
    for cell-cell signalling, transport of
    nutrients, solvents and ions across the
    membrane, cell-cell adhesion and intercellular
    communication
  • Many drugs or drug adjuvants may interfere with
    membrane dynamics and subcellular localisation of
    enzyme systems
  • From a pharma-economical perspective, MPs
    constitute 75% of possible targets for novel
    drugs
  • 3D structures derived by X-ray crystallography
    have been determined for about 15,000 proteins,
    but only about 80 of these are MPs
  • About 30% of the human proteome is estimated to
    consist of proteins containing membrane
    spanning regions (MSRs)

7
  • System Method

Fig Neural Networks (NNs) architecture
8
  • Prediction Problem
  • MPs are insoluble in aqueous buffer solutions
    and denature in organic solvents because both
    hydrophobic and hydrophilic regions are present
    on their surface
  • Because of this hydrophobicity, and because
    detergent micelles form around the hydrophobic
    regions, it is hard to induce MPs to form
    well-ordered 3D crystals
  • The precise and reliable prediction of membrane
    protein secondary structure is an important
    intermediate step towards a fuller understanding
    of protein folding
  • No explicit methods exist today for predicting
    the structure or packing of transmembrane
    proteins from primary amino acid sequences
  • Re-evaluation studies show that available methods
    are far from achieving 95% reliability in
    prediction and that their reported accuracy was
    overrated by 15-50%
  • This is mainly due to the lack of non-redundant
    membrane datasets in the learning and tuning
    processes

9
  • Dataset
  • Two MP datasets with known biochemical
    characterisations of membrane topology were used
  • For non-membrane proteins, the DB-NTMR dataset
    was chosen
  • Amino acid Sequence Encodings
  • Four binary encoding schemes are adopted
  • 20 bits: an orthogonal distributed string of 20
    binary bits consisting of nineteen 0s and one 1
  • 16 bits: a product of a genetic search algorithm,
    30% shorter than the traditional orthogonal
    representations
  • 9 bits: a feature-based grouping of amino acids
    where hydrophobicity, positive charge, negative
    charge, aromatic side chain, aliphatic side
    chain, small size, bulk size properties and two
    correction bits are used to discriminate similar
    amino acids
  • 5 bits: a representation based on a
    classification of the 20 amino acids that
    combines hydropathy index, molecular weight, and
    pI values
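As an illustration of the 20-bit orthogonal scheme only, here is a minimal Python sketch. The alphabet order, the function names and the choice to concatenate residue codes into one flat vector are assumptions made for this example; the slides do not give them, and the 16-, 9- and 5-bit tables are not reproduced here because their bit assignments are not listed.

```python
# Sketch only: alphabet order and function names are assumptions,
# not details taken from the slides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def one_hot(residue: str) -> list[int]:
    """Return the 20-bit orthogonal code: nineteen 0s and a single 1."""
    index = AMINO_ACIDS.index(residue)  # raises ValueError for unknown letters
    return [1 if i == index else 0 for i in range(len(AMINO_ACIDS))]


def encode_sequence(sequence: str) -> list[int]:
    """Concatenate the per-residue codes into one flat input vector."""
    bits: list[int] = []
    for residue in sequence.upper():
        bits.extend(one_hot(residue))
    return bits


if __name__ == "__main__":
    print(one_hot("C"))
    print(len(encode_sequence("MKT")))  # 3 residues x 20 bits each
```

With this scheme a segment of n residues always yields 20 × n input bits, so the shorter encodings reduce the size of the network's input layer proportionally.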

10
  • Architecture of Neural Networks
  • The multi-layered feed-forward neural network
    used here consists of an input layer, a single
    hidden layer and an output layer
  • Training was done by the scaled conjugate
    gradient algorithm
  • This algorithm was applied to all the encodings
    separately to identify the best encodings
  • 75% of the source data was randomly taken for
    training the network, and the remaining 25% of
    the data was selected as the test data
  • To detect when over-fitting starts and to
    prevent it, the training set was further
    partitioned into two disjoint subsets: training
    and validation datasets
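The data partitioning described above might be sketched as follows. The 75/25 split comes from the slide; the validation fraction, the random seed, the `patience` rule and both function names are assumptions for illustration, since the slides do not state how the validation subset was sized or how over-fitting was detected. The scaled conjugate gradient training itself is not shown.

```python
import random


def split_dataset(samples, test_fraction=0.25, validation_fraction=0.2, seed=0):
    """Shuffle samples and split them into training, validation and test lists.

    test_fraction=0.25 follows the slide; validation_fraction is an assumption.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, remainder = shuffled[:n_test], shuffled[n_test:]
    # The validation subset is carved out of the training portion only.
    n_val = round(len(remainder) * validation_fraction)
    validation, training = remainder[:n_val], remainder[n_val:]
    return training, validation, test


def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before the
    error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # validation error stopped improving: over-fitting suspected
    return best_epoch


if __name__ == "__main__":
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))
```

The test set is held out before the validation subset is taken, so the three subsets stay disjoint and the test data never influences when training stops.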

11
  • Results

12
  • Results
  • Increasing the length of the ANN's binary input
    vectors may slightly improve the classification
    rate and robustness of the system, but
    incorporating physico-chemical information did
    not improve the network's performance
    dramatically
  • This could be because these binary encodings do
    not introduce any artificial ordering: uniform
    weight was given to each amino acid for the
    network's learning purposes
  • Binary encodings have the advantage of not
    importing any artificial correlations between the
    amino acids during the learning process
  • Some earlier research suggests that binary input
    data may improve an ANN's learning but, to the
    best of our knowledge, there is no recorded
    evidence on the effect of binary input data on
    the robustness of ANNs, especially in the area
    of membrane protein prediction
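The point about artificial ordering can be checked numerically for the 20-bit orthogonal scheme. The sketch below is an illustration, not the authors' analysis: it compares the orthogonal code with a hypothetical single-integer code (the residue's position in an assumed alphabet), which the slides do not use. Under the orthogonal code every pair of distinct amino acids is equally far apart, while the integer code makes some pairs look closer than others. The shorter feature-based encodings group amino acids deliberately, so this uniformity is not claimed for them.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order


def one_hot(residue):
    """20-bit orthogonal code: nineteen 0s and a single 1."""
    index = AMINO_ACIDS.index(residue)
    return [1 if i == index else 0 for i in range(len(AMINO_ACIDS))]


def hamming(a, b):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def one_hot_distances():
    """Set of pairwise Hamming distances under the orthogonal code."""
    return {hamming(one_hot(p), one_hot(q))
            for p, q in combinations(AMINO_ACIDS, 2)}


def index_distances():
    """Set of pairwise distances under a hypothetical integer code."""
    return {abs(AMINO_ACIDS.index(p) - AMINO_ACIDS.index(q))
            for p, q in combinations(AMINO_ACIDS, 2)}


if __name__ == "__main__":
    print(one_hot_distances())     # a single distance value for every pair
    print(len(index_distances()))  # many different distance values
```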

13
  • Acknowledgement
  • London Metropolitan University
  • This project is fully funded by the London
    Metropolitan University

14
  • References
  • Baldi, P. and Pollastri, G. (2002). Machine
    Learning Structural and Functional Proteomics.
    IEEE Intelligent Systems (Intelligent Systems in
    Biology II).
  • Bose, S., Kazemian, H., White, K. and Browne, A.
    (2005). Use of Neural Networks to Predict and
    Analyse Membrane Proteins in the Proteome. BMC
    Bioinformatics, ISSN 1471-2105, Vol. 6 (Suppl 3),
    P3.
  • Bose, S., Kazemian, H.B., White, K. and Browne, A.
    (2006). Presenting a Novel Neural Network
    Architecture for Membrane Protein Prediction.
    Proceedings 10th International Conference on
    Intelligent Engineering Systems, London, UK, June
    26-28 (ISBN of printed proceedings: 1-4244-9708-8
    and ISBN of CD proceedings: 1-4244-9709-6).
  • Brusic, V., Rudy, G. and Harrison, L.C. (1995).
    Prediction of MHC binding peptides using
    artificial neural networks. Complexity
    International, volume 2, ISSN 1320-0682.
  • Chandonia, J.M. and Karplus, M. (1996). The
    importance of larger data sets for protein
    secondary structure prediction with neural
    networks. Protein Science, 5, 768-774.
  • de la Maza, M. (1994). Generate, Test and Explain:
    Synthesizing Regulatory Exposing Attributes in
    Large Protein Databases. Proceedings of the
    Twenty-Seventh Annual Hawaii International
    Conference on System Sciences.
  • Ikeda, M., Arai, M., Okuno, T. and Shimizu, T.
    (2003). TMPDB: a database of experimentally
    characterized transmembrane topologies. Nucleic
    Acids Res., 31(1), 406-409.
  • Ito, A. (2000). Mitochondrial processing
    peptidase: multiple-site recognition of precursor
    proteins. TICB, 1025-31.
  • Kihara, D., Shimizu, T. and Kanehisa, M. (1998).
    Prediction of membrane proteins based on
    classification of transmembrane segments. Protein
    Eng., 11, 961-970.
  • Qian, N. and Sejnowski, T.J. (1988). Predicting
    the secondary structure of globular proteins
    using neural network models. Journal of Molecular
    Biology, 202, 865-884.
  • Parris, N. and Onwulata, C. (1995). Food Proteins
    and Interactions. In Molecular Biology and
    Biotechnology: A comprehensive desk reference
    (R.A. Meyers, ed.), pp. 320-323. Cambridge, UK:
    VCH Publishers.
  • Pasquier, C. and Hamodrakas, S.J. (1999a). An
    hierarchical artificial neural network system for
    the classification of transmembrane proteins.
    Protein Eng., 12(8), 631-4.
  • Pasquier, C., Promponas, V.J., Palaios, G.A.,
    Hamodrakas, J.S. and Hamodrakas, S.J. (1999b). A
    novel method for predicting transmembrane
    segments in proteins based on a statistical
    analysis of the SwissProt database: the PRED-TMR
    algorithm. Protein Eng., 12(5), 381-5.
