Title: An Algorithm for Helices Mapping between 3D and 1D Protein Structure
1An Algorithm for Helices Mapping between 3D and
1D Protein Structure
- Presenter
- Yonggang Lu
- Computer Science Department, NMSU
- Sept. 2004
2Protein
- The protein is a very important component of a cell.
- The protein can be considered as
- a sequence of amino acids in one dimensional (1-D) space
- a folded chain of amino acids in three dimensional (3-D) space
- Most of the functions required by life are determined by proteins.
- These functions are closely related to the structure of a protein, especially the 3 dimensional structure.
3Amino Acid Sequence for protein 1GP1
- CHAIN A
- AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLXGTTV
RDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGG
GFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWS
PVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
- CHAIN B
- AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVA
SLXGTTVRDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCL
KYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTD
PKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLL
SQGASA
4The 3-D Structure of the Protein 1GP1 from PDB
Chain A
Chain B
5Protein Structure Hierarchy
- Primary structure
- AA sequence
- Secondary structure
- α-helix, β-strand, coils and turns
- Tertiary structure
- overall 3 dimensional fold of a single chain
- Quaternary structure
- association of multiple protein chains into a
single protein
6Protein Structure Determination
- Experimental methods
- Edman degradation (1-D sequencing)
- X-ray crystallography (3-D)
- NMR spectroscopy (3-D)
- Prediction Methods
- Secondary Structure Prediction
- Tertiary Structure Prediction
7Why Structure Prediction?
- Problems with experimental methods
- NMR and X-ray methods are very expensive
- Difficulties in sample preparation
- Technical difficulties for large molecules
- Recent advances in molecular biology and equipment have enabled the rapid sequencing of the large genomes of several species
- Human Genome Project
8Structure Prediction Methods
- Secondary Structure Prediction Methods
- PHD
- PSIPRED
- Jnet
- Tertiary Structure Prediction Methods
- Comparative/Homology Modeling
- Fold Recognition
- Ab Initio Folding
9Tertiary structure prediction flowchart drawn by
Dr. Robert Russell, http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html
10Recent Developments in Structure Prediction
- Rosetta Ab Initio Structure Prediction Method
- Combination of the Tertiary and Secondary Structure Prediction
- Using experimental data to help the structure prediction
- Cryoelectron microscopy
- The Computational Tools for Bridging the Information Gap
11The 1-D and 3-D Mapping Problem
- The developed computational tools can extract the secondary structure information from electron density maps of low to intermediate resolution
- Wen Jiang et al. have developed two programs, helixhunter and foldhunter
- Combination of the secondary structure prediction and the length constraint.
- Building a mapping between the 1-D sequence and the 3-D structure for each secondary structure element (SSE)
12Example of the PHD Prediction Output
PHD prediction result for 2TGP chain Z
13Description of the Mapping Problem
- The actual lengths of SSEs are determined from the electron density map
- The PHD secondary structure prediction gives the probability for assigning helix, strand, and coil to every position of the sequence
- A good method is required for mapping the SSEs to the sequence using all the information
14An Example of the Mapping Problem ---- Protein 1CC5
Modeling the 3-D structure (Helix hunter)
SS(3D)    Length
H1        13
H2        12
H3        8
H4        8
PHD prediction results (prH- is the helix probability at each position)
position  ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9
AA        GGGARSGDDVVAKYCNACHGTGLLNAPKVGDSAAWKTRADAKGGLDGLLAQSLSGLNAMPPKGTCADCSDDELKAAIGKMSGL
prH-      00012566877767778754221111122112344444454223778888887533210012101111267899999999420
REAL      ______HHHHHHHH_________________HHHHHHHHHHHH____HHHHHHHH______________HHHHHHHHHHHHH_
LEN       8, 12, 8, 13
Mapping: the helices H1 to H4 found in 3-D are to be placed on the sequence above
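As a small illustration of how the LEN values relate to the REAL line in this example, the following sketch extracts the start position and length of every helix from an assignment string. The function name `helix_segments` is hypothetical, and `REAL` is rebuilt here from the run lengths read off the slide (an assumption about the exact column positions), not taken from any tool's output.

```python
from itertools import groupby

# REAL line for 1CC5, rebuilt from the run lengths read off the slide
# (assumption: 83 residues, '_' = not helix, 'H' = helix).
REAL = ("_" * 6 + "H" * 8 + "_" * 17 + "H" * 12 + "_" * 4
        + "H" * 8 + "_" * 14 + "H" * 13 + "_")


def helix_segments(assignment, symbol="H"):
    """Return (start, length) for every run of `symbol`; start is 1-based."""
    segments = []
    position = 1
    for char, run in groupby(assignment):
        run_length = len(list(run))
        if char == symbol:
            segments.append((position, run_length))
        position += run_length
    return segments


segments = helix_segments(REAL)
print(segments)                                   # helices in sequence order
print(sorted(length for _, length in segments))   # same multiset as H1 to H4
```

The helix lengths along the sequence come out in the order 8, 12, 8, 13, while the 3-D list gives H1 to H4 as 13, 12, 8, 8. The lengths agree as a set but not in order, which is why a mapping step is needed.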
15Our Approach to the Mapping Problem
- Building the initial position library
- Tree representation of the solution space
- Using constraints to trim the tree structure
- Depth-first search (Backtracking search)
- Building the mapping library from the search
results
16Producing the Initial Library
- Extract the probability information from the PHD prediction
- Get the actual lengths of the SSEs
- Calculate the score (accumulative probability) of assigning each real SSE to every position.
- Store the positions with high scores in the initial library
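The steps above can be sketched roughly as follows for helices. This is a minimal sketch, not the presenter's program: it assumes the score of a position is the plain sum of the prH- digits over a window as long as the SSE, and that "high scores" means the `top_k` best windows per SSE. The names `window_scores`, `initial_library`, and `top_k` are hypothetical. The digits are the prH- line of the 1CC5 example.

```python
# prH- digits for 1CC5, copied from the PHD output in the example slide
# (one digit 0-9 per residue; higher means helix is more likely).
PRH = ("00012566"
       "87776777875422111112211234444445422377888888753321"
       "0012101111267899999999420")
PROBS = [int(digit) for digit in PRH]

# Helix lengths measured in 3-D (from the example slide).
SSE_LENGTHS = {"H1": 13, "H2": 12, "H3": 8, "H4": 8}


def window_scores(probs, length):
    """Accumulative probability of each window of `length` residues.

    scores[i] belongs to the window that starts at 0-based index i.
    """
    if length > len(probs):
        return []
    running = sum(probs[:length])
    scores = [running]
    for start in range(1, len(probs) - length + 1):
        # Slide the window one residue to the right.
        running += probs[start + length - 1] - probs[start - 1]
        scores.append(running)
    return scores


def initial_library(probs, sse_lengths, top_k=5):
    """Keep the top_k (start, score) pairs per SSE; start is 1-based.

    Assumption: 'high score' is taken to mean 'among the top_k windows'.
    """
    library = {}
    for name, length in sse_lengths.items():
        scores = window_scores(probs, length)
        ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
        library[name] = [(i + 1, scores[i]) for i in ranked[:top_k]]
    return library


LIBRARY = initial_library(PROBS, SSE_LENGTHS)
for name, candidates in LIBRARY.items():
    print(name, candidates)
```

A fixed `top_k` is only one way to bound the library; a score cutoff relative to the best window would serve the same purpose.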
17Testing of the Initial Library
The testing result of the initial library for
protein 1L58 (only for helices)
18The Sequential Mapping Program
The initial library
The mapping tree
19Constraints and Simplifications
- No two SSEs can overlap
- All the SSEs must have positions on the protein sequence
- The average score of an intermediate solution (partial SSE assignments) is required to be greater than a threshold
- Similar results are represented by one candidate
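The first three constraints can be expressed as a single check on a partial assignment. The sketch below is illustrative only: `Placement`, `overlaps`, and `satisfies_constraints` are hypothetical names, and the "average score" is assumed to be the accumulated probability divided by the number of residues placed so far, which the slides do not specify. Merging similar results into one candidate is not covered.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    start: int   # 1-based first residue of the SSE on the sequence
    length: int  # number of residues, measured in 3-D
    score: int   # accumulative probability over the window


def overlaps(a: Placement, b: Placement) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start < b.start + b.length and b.start < a.start + a.length


def satisfies_constraints(partial: List[Placement], seq_len: int,
                          min_avg: float) -> bool:
    """Check range, overlap, and average-score constraints."""
    for i, placement in enumerate(partial):
        # Every SSE must lie entirely on the sequence.
        if placement.start < 1 or placement.start + placement.length - 1 > seq_len:
            return False
        # No two SSEs may overlap.
        if any(overlaps(placement, other) for other in partial[i + 1:]):
            return False
    residues = sum(p.length for p in partial)
    if residues == 0:
        return True  # the empty assignment violates nothing
    # Assumption: average = total score per placed residue.
    return sum(p.score for p in partial) / residues > min_avg


print(satisfies_constraints([Placement(70, 13, 99), Placement(48, 8, 55)], 83, 5.0))
print(satisfies_constraints([Placement(70, 13, 99), Placement(75, 8, 60)], 83, 5.0))
```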
20Depth-first Search of the Solution Space
- Depth-first search is a good choice since it saves a lot of memory
- Depth-first search is implemented with a stack that stores the intermediate solutions (tree nodes)
- After the children of a tree node are produced, they are filtered by the constraints
- Only the children that pass are pushed onto the stack.
- The leaves of the tree are collected in a result queue, which forms the mapping library (the final results)
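The search described above can be sketched with an explicit stack. This is a toy sketch under stated assumptions, not the presenter's implementation: tree level k assigns the k-th SSE, a node is a tuple of (start, length, score) placements, the average score is taken per placed residue, and the names `acceptable`, `dfs_mapping`, `TOY_LENGTHS`, and `TOY_LIBRARY` are hypothetical. The toy library values are made up for illustration.

```python
from collections import deque


def acceptable(partial, min_avg):
    """Check the newest placement against earlier ones and the threshold."""
    *earlier, (start, length, _) = partial
    for other_start, other_length, _ in earlier:
        if start < other_start + other_length and other_start < start + length:
            return False  # overlap with an SSE placed earlier
    residues = sum(n for _, n, _ in partial)
    total = sum(score for _, _, score in partial)
    return total / residues > min_avg  # assumption: per-residue average


def dfs_mapping(lengths, library, min_avg):
    """Depth-first search over SSE placements; returns all complete mappings."""
    names = list(lengths)
    stack = [()]       # the root node is the empty partial solution
    results = deque()  # result queue holding the leaves
    while stack:
        partial = stack.pop()
        if len(partial) == len(names):
            results.append({n: p[0] for n, p in zip(names, partial)})
            continue
        name = names[len(partial)]
        for start, score in library[name]:
            child = partial + ((start, lengths[name], score),)
            if acceptable(child, min_avg):
                stack.append(child)  # only children that pass are kept
    return list(results)


# Made-up toy input: two helices and a few candidate (start, score) pairs.
TOY_LENGTHS = {"H1": 3, "H2": 2}
TOY_LIBRARY = {"H1": [(1, 24), (5, 21)], "H2": [(2, 14), (6, 16), (8, 12)]}

for mapping in dfs_mapping(TOY_LENGTHS, TOY_LIBRARY, min_avg=6.5):
    print(mapping)
```

Because only the current branch and its pending siblings sit on the stack, memory grows with the tree depth and branching factor rather than with the number of nodes, which is the memory saving the slide refers to.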
21The Results of the Running Test
22The Parallel Mapping Program
- MPI is used for the parallel programming
- A fully decentralized dynamic scheduling technique is used for balancing the loads of the processors
- Each processor maintains a task queue and a result queue.
- A mixed queue structure is used for storing the task nodes to minimize the communication between processors
24The Mixed Queue Structure and the Parallel
Processing
25The running time in seconds for different numbers
of processors and the prediction results
26The speedup of the parallel program for different
numbers of processors
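The slides do not state how the speedup is computed. Assuming the conventional definition, with $T(p)$ denoting the running time on $p$ processors (a symbol introduced here for illustration), speedup and parallel efficiency would be:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Under this definition, $S(p) = p$ (that is, $E(p) = 1$) corresponds to ideal linear speedup.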
27Conclusion
- Our program can do the mapping for small to medium-sized proteins. Parallel processing is a success.
- For large proteins, improvements to the algorithm are necessary.
- More constraints need to be found to reduce the size of the solution library.
28References
- Helixhunter and Foldhunter
- Wen Jiang, Matthew L. Baker, et al., Bridging the information gap: computational tools for intermediate resolution structure interpretation. J. Mol. Biol., 2001. 308: p. 1033-1044
- PDB website
- http://www.rcsb.org/pdb/
- PHD website
- http://www.embl-heidelberg.de/predictprotein/predictprotein.html
- Prediction Flow chart
- http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html
29Questions?
30Acknowledgements
- Thanks
- My advisor, Dr. Jing He, for bringing me to the area of bioinformatics, and for her great help throughout the program design.
- Dr. Pontelli for his great help in designing the parallel program
- My wife, Yuxia Wang