An Algorithm for Helices Mapping between 3D and 1D Protein Structure

1
An Algorithm for Helices Mapping between 3D and
1D Protein Structure

Presenter
Yonggang Lu
Computer Science Department, NMSU
Sept. 2004

2
Protein

Proteins are essential components of every cell.
A protein can be viewed as:
a sequence of amino acids in one-dimensional (1-D) space
a folded chain of amino acids in three-dimensional (3-D) space
Most of the functions required by life are carried out by proteins.
These functions are closely tied to the structure of a protein, especially its three-dimensional structure.

3
Amino Acid Sequence for protein 1GP1

CHAIN A
AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLXGTTV
RDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGG
GFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWS
PVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
CHAIN B
AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLXGTTV
RDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGG
GFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWS
PVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA

4
The 3-D Structure of Protein 1GP1 from the PDB
(Figure: chain A and chain B)
5
Protein Structure Hierarchy

Primary structure
amino acid (AA) sequence
Secondary structure
α-helix, β-strand, coils and turns
Tertiary structure
overall three-dimensional fold of a single chain
Quaternary structure
association of multiple protein chains into a single protein

6
Protein Structure Determination

Experimental methods
Edman degradation (1-D sequencing)
X-ray crystallography (3-D)
NMR spectroscopy (3-D)
Prediction Methods
Secondary Structure Prediction
Tertiary Structure Prediction

7
Why Structure Prediction?

Problems with experimental methods
NMR and X-ray methods are very expensive
Sample preparation is difficult
Large molecules pose technical difficulties
Recent advances in molecular biology and sequencing equipment have enabled the rapid sequencing of the large genomes of several species
Human Genome Project

8
Structure Prediction Methods

Secondary Structure Prediction Methods
PHD
PSIPRED
Jnet
Tertiary Structure Prediction Methods
Comparative/Homology Modeling
Fold Recognition
Ab Initio Folding

9
Tertiary structure prediction flowchart drawn by Dr. Robert Russell,
http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html
10
Recent Developments in Structure Prediction

Rosetta ab initio structure prediction method
Combining tertiary and secondary structure prediction
Using experimental data to guide structure prediction
Cryo-electron microscopy
Computational tools for bridging the information gap

11
The 1-D and 3-D Mapping Problem

Computational tools can extract secondary structure information from electron density maps of low to intermediate resolution
Wen Jiang et al. developed two such programs, helixhunter and foldhunter
Combine the secondary structure prediction with the length constraints
Build a mapping between the 1-D sequence and the 3-D structure for each secondary structure element (SSE)

12
Example of the PHD Prediction Output
PHD prediction result for 2TGP chain Z
13
Description of the Mapping Problem

The actual lengths of SSEs are determined from
the electron density map
The PHD secondary structure prediction gives the
probability for assigning helix, strand, and
coil to every position of the sequence
A good method is required for mapping the SSEs to
the sequence using all the information

14
An Example of the Mapping Problem: Protein 1CC5

Helix lengths obtained by modeling the 3-D structure with helixhunter:
SS(3D)  H1  H2  H3  H4
Length  13  12   8   8

PHD prediction results (position ruler, amino acids, helix probability, real helices):
pos   ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9
AA    GGGARSGDDVVAKYCNACHGTGLLNAPKVGDSAAWKTRADAKGGLDGLLAQSLSGLNAMPPKGTCADCSDDELKAAIGKMSGL
prH-  00012566877767778754221111122112344444454223778888887533210012101111267899999999420
REAL  ______HHHHHHHH_________________HHHHHHHHHHHH____HHHHHHHH______________HHHHHHHHHHHHH_
LEN   8, 12, 8, 13

Mapping: assign each helix found in the 3-D structure (H1 to H4) to a position on the sequence.
15
Our Approach to the Mapping Problem

Building the initial position library
Tree representation of the solution space
Using constraints to trim the tree structure
Depth-first search (Backtracking search)
Building the mapping library from the search
results

16
Producing the Initial Library

Extract the probability information from the PHD prediction
Get the actual lengths of the SSEs
For each real SSE, calculate a score (the cumulative probability over the SSE's length) at every candidate position in the sequence
Store the high-scoring positions in the initial library
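
The scoring step above can be sketched as follows. This is a minimal illustration, not the presenter's program. It assumes the PHD helix row (prH) is available as a string of digits 0-9, one per residue, and it divides the accumulated probability by the helix length so that helices of different lengths are comparable. The function names, the `keep` parameter, and the normalization are assumptions made for the example.

```python
def window_scores(prh, length):
    """Score every start position (0-based) for a helix of the given length.

    prh holds one digit (0-9) per residue. The score is the accumulated
    digit sum over the window divided by the window length (assumption).
    """
    probs = [int(c) for c in prh]
    if length > len(probs):
        return []
    total = sum(probs[:length])
    scores = [(0, total / length)]
    # Slide the window one residue at a time, updating the running sum.
    for start in range(1, len(probs) - length + 1):
        total += probs[start + length - 1] - probs[start - 1]
        scores.append((start, total / length))
    return scores


def initial_library(prh, helix_lengths, keep=10):
    """For each helix, keep the `keep` best-scoring start positions."""
    library = []
    for length in helix_lengths:
        # Highest score first; ties are broken by the lower start position.
        ranked = sorted(window_scores(prh, length), key=lambda s: (-s[1], s[0]))
        library.append(ranked[:keep])
    return library


# prH row for protein 1CC5 from the example slide, with the helix lengths
# reported from the 3-D structure (H1 to H4).
PRH_1CC5 = (
    "00012566"
    "87776777875422111112211234444445422377888888753321"
    "0012101111267899999999420"
)
LIBRARY = initial_library(PRH_1CC5, [13, 12, 8, 8], keep=5)
```

Each entry of `LIBRARY` is a list of `(start, score)` pairs for one helix; a larger `keep` makes the later search more thorough at the cost of a bigger tree.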

17
Testing of the Initial Library
The testing result of the initial library for
protein 1L58 (only for helices)
18
The Sequential Mapping Program
The initial library
The mapping tree
19
Constraints and Simplifications

No two SSEs may overlap
Every SSE must be assigned a position on the protein sequence
The average score of an intermediate solution (a partial SSE assignment) must be greater than a threshold
Similar solutions are represented by a single candidate
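
The first three constraints can be expressed as a single check on a partial assignment. The sketch below is an illustration under stated assumptions: a partial solution is a list of `(start, length, score)` tuples with 0-based starts, and the function name and argument names are hypothetical. The merging of similar results into one candidate is not modeled because the slides do not say how similarity is measured.

```python
def is_valid(partial, seq_len, threshold):
    """Return True if a partial SSE assignment satisfies the constraints.

    partial: list of (start, length, score) tuples, one per placed SSE.
    """
    if not partial:
        return True
    # Every SSE must lie entirely on the sequence.
    for start, length, _ in partial:
        if start < 0 or start + length > seq_len:
            return False
    # No two SSEs may overlap: after sorting, compare neighbouring spans.
    spans = sorted((start, start + length) for start, length, _ in partial)
    for (_, end_a), (start_b, _) in zip(spans, spans[1:]):
        if start_b < end_a:
            return False
    # The average score must be greater than the threshold.
    mean = sum(score for _, _, score in partial) / len(partial)
    return mean > threshold
```

For example, `is_valid([(6, 8, 7.0), (31, 12, 4.0)], 83, 5.0)` passes all three checks, while moving the second helix to start 10 fails the overlap check.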

20
Depth-first Search of the Solution Space

Depth-first search is a good choice because it uses far less memory than storing the whole tree
Depth-first search is implemented with a stack that stores the intermediate solutions (tree nodes)
After the children of a tree node are generated, they are filtered by the constraints
Only the children that satisfy the constraints are pushed onto the stack
The leaves of the tree are collected in a result queue, which forms the mapping library (the final results)
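
The search described above might look like the following sketch. It is not the presenter's code. It assumes the initial library is a list with one entry per SSE, each entry being a list of `(start, score)` candidates, that tree level k places the k-th SSE, and that a node is a tuple of `(start, length, score)` placements. All names are hypothetical.

```python
from collections import deque


def overlaps(start, length, placed):
    """True if [start, start + length) intersects any placed SSE."""
    end = start + length
    return any(start < p_start + p_len and p_start < end
               for p_start, p_len, _ in placed)


def map_helices(library, lengths, seq_len, threshold):
    """Depth-first search over SSE placements using an explicit stack."""
    results = deque()   # result queue: complete mappings (tree leaves)
    stack = [()]        # the root node has no SSE placed yet
    while stack:
        node = stack.pop()
        depth = len(node)
        if depth == len(lengths):
            results.append(node)
            continue
        length = lengths[depth]
        for start, score in library[depth]:
            # Constraint: the SSE must lie on the sequence.
            if start < 0 or start + length > seq_len:
                continue
            # Constraint: no two SSEs may overlap.
            if overlaps(start, length, node):
                continue
            child = node + ((start, length, score),)
            # Constraint: average score of the partial solution.
            mean = sum(s for _, _, s in child) / len(child)
            if mean > threshold:
                stack.append(child)
    return list(results)
```

Because only the nodes waiting on the stack are kept in memory, the memory use grows with the tree depth times the number of candidates per level rather than with the total number of nodes.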

21
The Results of the Running Test
22
The Parallel Mapping Program

MPI is used for the parallel implementation
A fully decentralized dynamic scheduling technique balances the load across processors
Each processor maintains a task queue and a result queue
A mixed queue structure stores the task nodes to minimize communication between processors
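
The idea of decentralized dynamic scheduling can be illustrated with a single-process simulation. This sketch does not use MPI and does not reproduce the presenter's mixed queue structure, which the slides do not define. It assumes each worker takes work from the back of its own queue (depth-first) and that an idle worker takes one task from the front of the busiest queue. The function name and the `expand` callback are hypothetical.

```python
from collections import deque


def simulate(root_tasks, expand, n_workers):
    """Round-robin simulation of decentralized scheduling (no MPI)."""
    tasks = [deque() for _ in range(n_workers)]    # one task queue per worker
    results = [[] for _ in range(n_workers)]       # one result queue per worker
    tasks[0].extend(root_tasks)
    while any(tasks):
        for rank in range(n_workers):
            if not tasks[rank]:
                # Idle worker: take one task from the busiest worker,
                # but never take a worker's only task.
                donor = max(range(n_workers), key=lambda r: len(tasks[r]))
                if len(tasks[donor]) > 1:
                    tasks[rank].append(tasks[donor].popleft())
                continue
            node = tasks[rank].pop()               # own work: newest node first
            children = expand(node)
            if children:
                tasks[rank].extend(children)
            else:
                results[rank].append(node)         # no children: treat as a leaf
    return results
```

Handing over the oldest node tends to transfer a large subtree in a single message, which is one common way to keep communication low; whether the presenter's mixed queue works this way is not stated.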

24
The Mixed Queue Structure for Parallel Processing
25
Running time in seconds for different numbers of processors, and the prediction results
26
Speedup of the parallel program for different numbers of processors
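
For reference, speedup is conventionally measured against the running time on one processor. The symbols below are not from the slides: T(p) denotes the running time on p processors, S(p) the speedup, and E(p) the parallel efficiency.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Ideal (linear) speedup corresponds to S(p) = p, that is, E(p) = 1.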
27
Conclusion

Our program can perform the mapping for small to medium-sized proteins. Parallel processing is successful.
For large proteins, the algorithm needs further improvement.
More constraints are needed to reduce the size of the solution library.

28
References

Helixhunter and Foldhunter
Wen Jiang, Matthew L. Baker, et al., Bridging the information gap: computational tools for intermediate resolution structure interpretation. J. Mol. Biol., 2001. 308: p. 1033-1044
PDB website
http://www.rcsb.org/pdb/
PHD website
http://www.embl-heidelberg.de/predictprotein/predictprotein.html
Prediction flowchart
http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html

29
Questions?

Thank you!

30
Acknowledgements

Thanks
My advisor, Dr. Jing He, for bringing me to the
area of bioinformatics, and her great help in the
whole process of program design.
Dr. Pontelli for his great help in designing the
parallel program
My wife, Yuxia Wang
