An Algorithm for Helices Mapping between 3D and 1D Protein Structure

1
An Algorithm for Helices Mapping between 3D and
1D Protein Structure
  • Presenter
  • Yonggang Lu
  • Computer Science Department, NMSU
  • Sept. 2004

2
Protein
  • Proteins are essential components of a cell.
  • A protein can be considered as:
  • a sequence of amino acids in one-dimensional
    (1-D) space
  • a folded chain of amino acids in
    three-dimensional (3-D) space
  • Most of the functions required by life are
    carried out by proteins.
  • These functions are closely related to the
    structure of a protein, especially its
    three-dimensional structure.

3
Amino Acid Sequence for protein 1GP1
  • CHAIN A
  • AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLXGTTV
    RDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGG
    GFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWS
    PVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
  • CHAIN B
  • AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVA
    SLXGTTVRDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCL
    KYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTD
    PKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLL
    SQGASA

4
The 3-D Structure of the Protein 1GP1 from PDB
[Figure: 3-D structure of 1GP1 showing chain A and chain B]
5
Protein Structure Hierarchy
  • Primary structure
  • AA sequence
  • Secondary structure
  • α-helix, β-strand, coils and turns
  • Tertiary structure
  • overall 3-dimensional fold of a single chain
  • Quaternary structure
  • association of multiple protein chains into a
    single protein

6
Protein Structure Determination
  • Experimental methods
  • Edman degradation (1-D sequencing)
  • X-ray crystallography (3-D)
  • NMR spectroscopy (3-D)
  • Prediction Methods
  • Secondary Structure Prediction
  • Tertiary Structure Prediction

7
Why Structure Prediction?
  • Problems with experimental methods
  • NMR and X-ray methods are very expensive
  • Difficulties in sample preparation
  • Technical difficulties for large molecules
  • Recent advances in molecular biology and in
    sequencing equipment have enabled the rapid
    sequencing of the large genomes of several
    species, so far more sequences than structures
    are known
  • Human Genome Project

8
Structure Prediction Methods
  • Secondary Structure Prediction Methods
  • PHD
  • PSIPRED
  • Jnet
  • Tertiary Structure Prediction Methods
  • Comparative/Homology Modeling
  • Fold Recognition
  • Ab Initio Folding

9
[Figure: tertiary structure prediction flowchart drawn by
Dr. Robert Russell,
http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html]
10
Recent Developments in Structure Prediction
  • Rosetta Ab Initio Structure Prediction Method
  • Combination of the Tertiary and Secondary
    Structure Prediction
  • Using experimental data to help the structure
    prediction
  • Cryoelectron microscopy
  • The Computational Tools for Bridging the
    Information Gap

11
The 1-D and 3-D Mapping Problem
  • Computational tools can now extract secondary
    structure information from electron density maps
    of low to intermediate resolution
  • Wen Jiang et al. have developed two programs,
    helixhunter and foldhunter
  • Combining the secondary structure prediction
    with the length constraint
  • Building a mapping between 1-D sequence and 3-D
    structure for the secondary structure element
    (SSE)

12
Example of the PHD Prediction Output
PHD prediction result for 2TGP chain Z
13
Description of the Mapping Problem
  • The actual lengths of SSEs are determined from
    the electron density map
  • The PHD secondary structure prediction gives the
    probability for assigning helix, strand, and
    coil to every position of the sequence
  • A good method is required for mapping the SSEs to
    the sequence using all the information

14
An Example of the Mapping Problem: Protein 1CC5
[Figure: helixhunter models the 3-D structure and gives the helix lengths, which are then mapped onto the PHD prediction results]
SS(3D) helix lengths: H1 13, H2 12, H3 8, H4 8
PHD prediction results (prH: helix probability at each position; REAL: actual helix positions; LEN: actual helix lengths in sequence order)
position ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9
AA       GGGARSGDDVVAKYCNACHGTGLLNAPKVGDSAAWKTRADAKGGLDGLLAQSLSGLNAMPPKGTCADCSDDELKAAIGKMSGL
prH      00012566877767778754221111122112344444454223778888887533210012101111267899999999420
REAL     ______HHHHHHHH_________________HHHHHHHHHHHH____HHHHHHHH______________HHHHHHHHHHHHH_
LEN      8 12 8 13
15
Our Approach to the Mapping Problem
  • Building the initial position library
  • Tree representation of the solution space
  • Using constraints to trim the tree structure
  • Depth-first search (Backtracking search)
  • Building the mapping library from the search
    results

16
Producing the Initial Library
  • Extract the probability information from PHD
    prediction
  • Get the actual lengths of the SSEs
  • Calculate the score (accumulated probability)
    for assigning each real SSE to every position.
  • Store the positions with high scores in the
    initial library
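The scoring step above can be sketched in Python. This is only an illustration under stated assumptions: the score is taken to be the plain sum of the PHD helix digits (prH) over a window of the SSE's length, and the library keeps a fixed number of best start positions per SSE. The function names and the `keep` parameter are hypothetical and do not come from the presentation.

```python
def window_scores(probs, length):
    """Accumulated probability for every start position of a window of `length`."""
    if length > len(probs):
        return []
    total = sum(probs[:length])
    scores = [total]
    # Slide the window one residue at a time, updating the running sum.
    for start in range(1, len(probs) - length + 1):
        total += probs[start + length - 1] - probs[start - 1]
        scores.append(total)
    return scores


def initial_library(probs, sse_lengths, keep=3):
    """For each SSE, keep the `keep` best (start, score) pairs; starts are 0-based."""
    library = {}
    for name, length in sse_lengths.items():
        scores = window_scores(probs, length)
        # Highest score first; ties are broken by the earlier start position.
        ranked = sorted(range(len(scores)), key=lambda s: (-scores[s], s))
        library[name] = [(start, scores[start]) for start in ranked[:keep]]
    return library


# prH digits for 1CC5 as printed on the example slide (0 = low, 9 = high).
PRH = ("00012566"
       "87776777875422111112211234444445422377888888753321"
       "0012101111267899999999420")
probs = [int(c) for c in PRH]

# Helix lengths reported for the 3-D structure on the example slide.
lib = initial_library(probs, {"H1": 13, "H2": 12, "H3": 8, "H4": 8})
```

Here `lib["H1"]` holds the three highest-scoring start positions for a 13-residue helix. A real implementation might instead keep every position above a score cutoff; the slides do not say which rule was used.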

17
Testing of the Initial Library
The testing result of the initial library for
protein 1L58 (only for helices)
18
The Sequential Mapping Program
[Figure: the initial library and the mapping tree]
19
Constraints and Simplifications
  • No two SSEs can overlap
  • All the SSEs must have positions on the protein
    sequence
  • The average score of an intermediate solution
    (partial SSE assignments) is required to be
    greater than a threshold
  • Similar results are represented by one candidate
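As a rough sketch, the first three constraints can be written as a single check on a partial assignment. Several details are assumptions made for illustration: the `Placement` record, the 0-based start positions, and the reading of "average score" as the accumulated score per assigned residue. The rule for merging similar results is not described in enough detail to sketch and is left out.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    start: int    # 0-based start position on the sequence (assumed convention)
    length: int   # SSE length in residues
    score: float  # accumulated probability over the window


def overlaps(a, b):
    """True if the two placements share at least one sequence position."""
    return a.start < b.start + b.length and b.start < a.start + a.length


def satisfies_constraints(partial, seq_len, threshold):
    """Check a partial assignment against bounds, overlap and average score."""
    for i, p in enumerate(partial):
        # Every SSE must lie entirely on the sequence.
        if p.start < 0 or p.start + p.length > seq_len:
            return False
        # No two SSEs may overlap.
        if any(overlaps(p, q) for q in partial[i + 1:]):
            return False
    if not partial:
        return True
    # Assumed definition: accumulated score per assigned residue.
    residues = sum(p.length for p in partial)
    return sum(p.score for p in partial) / residues > threshold
```

For example, two helices placed at positions 0 to 7 and 10 to 21 pass the overlap test, while moving the second one to start at position 7 makes the check fail.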

20
Depth-first Search of the Solution Space
  • Depth-first search is a good choice because it
    needs far less memory than breadth-first search
  • Depth-first search is implemented with a stack
    that stores the intermediate solutions (tree
    nodes)
  • After the children of a tree node are produced,
    they are filtered by the constraints
  • Only the nodes that pass are pushed onto the
    stack.
  • The leaves of the tree are collected in a result
    queue which forms the mapping library (the final
    results)
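The search described above might look like the following minimal sketch. It assumes that each tree level assigns one SSE, that candidates are `(start, length, score)` tuples taken from the initial library, and that the average score is measured per assigned residue. The names `map_sses` and `acceptable` are hypothetical and are not taken from the original program.

```python
from collections import deque


def acceptable(partial, seq_len, threshold):
    """Constraint check for a partial solution whose last element is the new SSE."""
    *placed, new = partial
    start, length, _ = new
    if start < 0 or start + length > seq_len:
        return False
    for s, l, _ in placed:
        if start < s + l and s < start + length:  # overlap with an earlier SSE
            return False
    residues = sum(l for _, l, _ in partial)
    return sum(sc for _, _, sc in partial) / residues > threshold


def map_sses(library, order, seq_len, threshold):
    """Depth-first search over the mapping tree using an explicit stack."""
    results = deque()   # leaves of the tree: complete mappings
    stack = [()]        # the root is the empty partial solution
    while stack:
        partial = stack.pop()
        depth = len(partial)
        if depth == len(order):
            results.append(partial)
            continue
        # Produce the children, then push only those that pass the constraints.
        for candidate in library[order[depth]]:
            child = partial + (candidate,)
            if acceptable(child, seq_len, threshold):
                stack.append(child)
    return list(results)
```

Because rejected children are never pushed, a whole subtree is discarded with a single check, and the stack holds only the pending nodes along the current search path.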

21
The Results of the Running Test
22
The Parallel Mapping Program
  • MPI is used for the parallel programming
  • A fully decentralized dynamic scheduling
    technique is used to balance the load across
    processors
  • Each processor maintains a task queue and a
    result queue.
  • A mixed queue structure is used for storing the
    task nodes to minimize the communications between
    processors
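The scheduling idea can be illustrated with a single-process simulation; this is not MPI code and not the presentation's implementation. It assumes one plausible reading of the design: each worker takes its own newest task first (depth-first), and an idle worker takes the oldest task from the busiest worker. In a real MPI program that transfer would be a message exchange. The names `run_decentralized` and `expand` are hypothetical.

```python
from collections import deque


def run_decentralized(num_workers, root, expand, max_rounds=100_000):
    """Simulate workers that each own a task queue and a result list.

    `expand(node)` returns a list of child nodes, or None if `node` is a leaf.
    """
    tasks = [deque() for _ in range(num_workers)]
    results = [[] for _ in range(num_workers)]
    tasks[0].append(root)
    for _ in range(max_rounds):
        if not any(tasks):          # every queue is empty: search finished
            return results
        for w in range(num_workers):
            if not tasks[w]:
                # Idle worker: take the oldest task from the busiest worker,
                # but never take a worker's only remaining task.
                donor = max(range(num_workers), key=lambda d: len(tasks[d]))
                if len(tasks[donor]) > 1:
                    tasks[w].append(tasks[donor].popleft())
                continue
            node = tasks[w].pop()   # newest task first: depth-first locally
            children = expand(node)
            if children is None:
                results[w].append(node)
            else:
                tasks[w].extend(children)
    raise RuntimeError("search did not finish within max_rounds")
```

Handing over old tasks tends to transfer nodes near the root, which carry large subtrees, so one transfer buys a lot of work and the workers need to communicate less often.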

24
The Mixed Queue Structure for Parallel Processing
25
The running time in seconds for different numbers
of processors, and the prediction results
26
The speedup of the parallel program for different
numbers of processors
27
Conclusion
  • Our program can do the mapping for small to
    medium-sized proteins. The parallel version
    works successfully.
  • For large proteins, improvements to the
    algorithm are necessary.
  • More constraints need to be found to reduce
    the size of the solution library.

28
References
  • Helixhunter and Foldhunter
  • Wen Jiang, Matthew L. Baker, et al., Bridging the
    information gap: computational tools for
    intermediate resolution structure interpretation.
    J. Mol. Biol., 2001. 308: p. 1033-1044
  • PDB website
  • http://www.rcsb.org/pdb/
  • PHD website
  • http://www.embl-heidelberg.de/predictprotein/predictprotein.html
  • Prediction flowchart
  • http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html

29
Questions?
  • Thank you!

30
Acknowledgements
  • Thanks
  • My advisor, Dr. Jing He, for bringing me to the
    area of bioinformatics, and her great help in the
    whole process of program design.
  • Dr. Pontelli for his great help in designing the
    parallel program
  • My wife, Yuxia Wang
