Title: Protein Structure Prediction on a Lattice Model via Multimodal Optimization Techniques
1 Protein Structure Prediction on a Lattice Model via Multimodal Optimization Techniques
- Ka-Chun Wong, Kwong-Sak Leung, Man-Hon Wong
- Department of Computer Science and Engineering
- The Chinese University of Hong Kong, HKSAR, China
- {kcwong, ksleung, mhwong}@cse.cuhk.edu.hk
2 Outline
- Introduction
- Background
- Objective
- Related Works
- Paper Contributions
  - Apply multimodal optimization techniques
  - Propose a novel mutation method
- Experiments
- Conclusion
3 Introduction
- A protein is
  - a sequence of amino acid residues folded into a 3D structure
  - important for life
    - Material transport across cells
    - Catalyzing metabolic reactions
    - Body defenses against viruses
4 Introduction
- Protein Function
  - Substantially depends on its 3D structure
http://www.pdb.org/pdb/explore/explore.do?structureId=2X7M
5 Introduction
- Protein Structure Determination
  - Wet-lab experiments exist
    - X-ray crystallography
    - NMR spectroscopy
  - But they are
    - Labor intensive
    - Not scalable
    - Expensive
6 Introduction
- Wet-lab experiments for protein structure determination are
  - Costly
  - Time-consuming
  - Not scalable
  - Accurate
- Computational approaches for protein structure prediction are
  - Less costly
  - Fast
  - Scalable
  - Less accurate
- Complementary twins: wet-lab experiments for fine-tuning, computation for coarse-tuning
7 Introduction
- Protein Structure Prediction (PSP)
  - Input: an amino acid sequence
  - Output: the 3D structure of the sequence
- Divided into two classes
  - Using / not using similar sequences and their structures
[Figure: an input sequence (e.g. YDVAEGCKVV), optionally with similar sequences and their structures, goes into Prediction]
8 Introduction
- This paper focuses on
  - De novo protein structure prediction on the 3D HP lattice model using evolutionary algorithms
  - De novo means that the input of the method contains only the sequence to be predicted
N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
9 Background
- 3D HP lattice model
  - Assumes that the main driving forces are the interactions among the hydrophobic amino acid residues
  - All known amino acid residues are experimentally classified as either hydrophobic (H) or polar (P)
10 Background
- 3D HP lattice model
  - An amino acid sequence is represented as a string over {H, P}
  - The sequence is folded into a restricted space, a cubic lattice
11 Background
- Amino acid residue: bead
- Peptide bond: straight line
[Figure: the sequence HPHPPHHPHPPHPHHPPHPH on the lattice; H in red, P in blue]
12 Objective
- To find the conformation with the minimal energy
- Equivalently, maximize the number of H-H bonds formed by two non-sequence-adjacent residues (non-local H-H bonds)
13 Objective
- Mathematically, the goal is to minimize an energy function H(x) built from two terms
  - Distance function: only non-sequence-adjacent residues are checked
  - Bond energy
H. Li, R. Helling, C. Tang, and N. Wingreen. Emergence of Preferred Structures in a Simple Model of Protein Folding. Science, 273(5275):666-669, 1996.
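The objective can be sketched in code under the usual HP-model convention, which is an assumption here: the bond energy is -1 for an H-H pair and 0 otherwise, and the distance function is 1 only when two non-sequence-adjacent residues sit on neighbouring lattice sites. The function name `hp_energy` is hypothetical.

```python
from itertools import combinations

def hp_energy(sequence, coords):
    """Sum the bond energy over all non-local residue pairs in contact."""
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # sequence-adjacent residues are not checked
        if sequence[i] == "H" and sequence[j] == "H":
            # Distance function: lattice neighbours are at Manhattan distance 1.
            dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if dist == 1:
                energy -= 1  # assumed bond energy of one H-H contact
    return energy

# "HPPH" folded into a unit square brings the two H residues into contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(hp_energy("HPPH", square))  # -1
print(hp_energy("HPPH", line))    # 0
```

Under this convention, minimizing the energy is the same as maximizing the number of non-local H-H bonds.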
14 Related Works
- Unger and Moult first applied a hybridized genetic algorithm to the problem [1]
- Patton et al. used a standard genetic algorithm [2]
[1] Unger, R. and Moult, J. 1993. Genetic Algorithm for 3D Protein Folding Simulations. In Proceedings of the 5th International Conference on Genetic Algorithms, S. Forrest, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 581-588.
[2] Patton, A. L., Punch, W. F., and Goodman, E. D. 1995. A Standard GA Approach to Native Protein Conformation Prediction. In Proceedings of the 6th International Conference on Genetic Algorithms (July 15 - 19, 1995), L. J. Eshelman, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 574-581.
15 Related Works
- Berger and Leighton proved that the problem is NP-complete [1]
- Krasnogor et al. discussed the basic algorithmic factors affecting the problem [2]
[1] Berger, B. and Leighton, T. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. In Proceedings of the Second Annual International Conference on Computational Molecular Biology. RECOMB '98. ACM, New York, NY, 30-39.
[2] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
16 Related Works
- Since then, many related algorithms have been proposed. Some examples:
  - Multimeme algorithm by Krasnogor et al.
  - Guided genetic algorithm by Hoque et al.
  - Ant colony algorithm by Shmygelska et al.
  - Differential Evolution by Bitello et al.
  - Immune Algorithm by Cutello et al.
  - EDA by Santana et al.
17 Paper Contributions
- Observation
  - Some diversity-preserving techniques are incorporated in most algorithms
    - Duplicate predator [1]
    - Aging operator [2]
    - Additional renormalization of the pheromone [3]
[1] G. A. Cox, T. V. Mortimer-Jones, R. P. Taylor, and R. L. Johnston. Development and optimisation of a novel genetic algorithm for studying model protein folding. Theoretical Chemistry Accounts: Theory, Computation, and Modeling, 112(3):163-178, 2004.
[2] V. Cutello, G. Nicosia, M. Pavone, and J. Timmis. An immune algorithm for protein structure prediction on lattice models. IEEE Transactions on Evolutionary Computation, 11(1):101-117, Feb. 2007.
[3] A. Shmygelska and H. Hoos. An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics, 6(1):30, 2005.
18 Paper Contributions
- Observation
  - Unger and Moult observed that there can be multiple conformations for each energy value [1]
  - A study also indicates that the fitness landscapes of the problem are multimodal [2]
[1] R. Unger and J. Moult. Genetic algorithms for protein folding simulations. J. Mol. Biol., 231:75-81, May 1993.
[2] S. D. Flores and J. Smith. Study of fitness landscapes for the HP model of protein structure prediction. In Evolutionary Computation, 2003 (CEC '03), pages 2338-2345, Dec. 2003.
19 Paper Contributions
- In this paper
  - Apply multimodal optimization techniques to solve the PSP problem
    - Fitness Sharing (SharingGA) [1]
    - Species Conserving (SCGA) [2]
    - Crowding (CGA) [3]
[1] Goldberg, D. E. and Richardson, J. 1987. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms on Genetic Algorithms and their Application, 41-49.
[2] Li, J., Balazs, M. E., Parks, G. T., and Clarkson, P. J. 2002. A species conserving genetic algorithm for multimodal function optimization. Evol. Comput. 10, 3 (Sep. 2002), 207-234.
[3] De Jong, K. A. 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral Thesis. UMI Order Number AAI7609381. University of Michigan.
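As an illustration of the first technique, fitness sharing can be sketched as follows. This is a generic sketch in the style of Goldberg and Richardson, not the SharingGA configuration used in the paper: the niche radius `sigma`, the exponent `alpha`, the toy encoded individuals, and the use of a maximized raw fitness (for example, the negated energy) are all assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, sigma=3, alpha=1.0):
    """Divide each raw fitness by its niche count so crowded niches are penalized."""
    shared = []
    for ind_i, f_i in zip(population, raw_fitness):
        niche_count = 0.0
        for ind_j in population:
            d = hamming(ind_i, ind_j)
            if d < sigma:
                niche_count += 1.0 - (d / sigma) ** alpha
        # niche_count >= 1 because every individual is at distance 0 from itself
        shared.append(f_i / niche_count)
    return shared

pop = ["FFLR", "FFLR", "UDUU"]   # toy encoded conformations
raw = [4.0, 4.0, 3.0]            # assumed raw fitness, higher is better
print(shared_fitness(pop, raw))  # [2.0, 2.0, 3.0]
```

The two identical individuals split their fitness, so the lone individual in a different niche ends up ranked first. That is the diversity-preserving effect sharing is meant to produce.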
20 Paper Contributions
- In this paper
  - Propose a novel mutation method
    - Mixing two types of mutations together
    - Sometimes use RM, sometimes use AM
    - and apply it in CGA (called CGA-mixed)
RM: mutation in relative encoding
AM: mutation in absolute encoding
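The idea of mixing RM and AM can be sketched as below. Everything specific here is an assumption made for illustration: the relative move letters (F, L, R, U, D), the starting frame, the 50/50 choice `p_rm`, and the rule that an AM result with no relative encoding is rejected. The slides only state that RM is used sometimes and AM at other times. The sketch does not check self-avoidance.

```python
import random

RELATIVE_MOVES = "FLRUD"
ABSOLUTE_DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
START_FRAME = ((1, 0, 0), (0, 0, 1))  # (heading, up): an arbitrary convention

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def neg(v):
    return (-v[0], -v[1], -v[2])

def step(heading, up, move):
    """Apply one relative move and return the new (heading, up) frame."""
    left = cross(up, heading)
    if move == "F":
        return heading, up
    if move == "L":
        return left, up
    if move == "R":
        return neg(left), up
    if move == "U":
        return up, neg(heading)
    if move == "D":
        return neg(up), heading
    raise ValueError(move)

def relative_to_absolute(rel):
    heading, up = START_FRAME
    dirs = []
    for move in rel:
        heading, up = step(heading, up, move)
        dirs.append(heading)
    return dirs

def absolute_to_relative(dirs):
    heading, up = START_FRAME
    rel = []
    for d in dirs:
        for move in RELATIVE_MOVES:
            new_heading, new_up = step(heading, up, move)
            if new_heading == d:
                rel.append(move)
                heading, up = new_heading, new_up
                break
        else:
            return None  # a 180-degree reversal has no relative move
    return "".join(rel)

def rm(rel, rng):
    """RM: change one gene of the relative encoding (rotates the rest of the chain)."""
    i = rng.randrange(len(rel))
    new = rng.choice([m for m in RELATIVE_MOVES if m != rel[i]])
    return rel[:i] + new + rel[i + 1:]

def am(rel, rng):
    """AM: change one gene of the absolute encoding (redirects a single bond)."""
    dirs = relative_to_absolute(rel)
    i = rng.randrange(len(dirs))
    dirs[i] = rng.choice([d for d in ABSOLUTE_DIRS if d != dirs[i]])
    mutated = absolute_to_relative(dirs)
    return rel if mutated is None else mutated  # reject reversals

def mixed_mutation(rel, rng, p_rm=0.5):
    """Pick RM with probability p_rm, otherwise AM."""
    return rm(rel, rng) if rng.random() < p_rm else am(rel, rng)

rng = random.Random(0)
print(mixed_mutation("FLURDF", rng))
```

In this sketch, RM changes one relative gene and so reorients every later bond, while AM changes the direction of exactly one bond and leaves the others pointing the same way. Mixing the two gives the search both kinds of move.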
21 Experiments
- Experiments are conducted with
  - Relative Encoding [1]
  - Hamming Distance
  - 100 Individuals (Overlapping)
  - Uniform Deterministic (Parent Selection)
  - Truncation (Survival Selection)
  - 50 runs
  - 10^5 and 5 x 10^6 energy evaluations
  - UN [2] as a control algorithm
[1] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
[2] K.A. De Jong. Evolutionary Computation: A Unified Approach. MIT Press, Cambridge, MA, 2006.
22 Experiments
- 10^5 energy evaluations over 50 runs
H(x): the lowest energy over 50 runs
Mean: the lowest energy of a run, averaged over 50 runs
23 Experiments
- 5 x 10^6 energy evaluations over 50 runs
H(x): the lowest energy over 50 runs
Mean: the lowest energy of a run, averaged over 50 runs
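The two reported statistics differ only in how the per-run results are aggregated. A small sketch with made-up numbers (the energies below are hypothetical and are not results from the paper):

```python
from statistics import mean

# Hypothetical lowest energies reached in five independent runs.
best_per_run = [-30, -32, -29, -32, -31]

best_overall = min(best_per_run)   # lowest energy over all runs
mean_best = mean(best_per_run)     # lowest energy of a run, averaged over runs
print(best_overall, mean_best)     # -32 -30.8
```

The first number rewards a single lucky run, while the second reflects how reliably an algorithm reaches low energies.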
24 Experiments
- The experimental results quoted in the following literature are compared under the same termination condition
  - Santana, R., Larranaga, P., and Lozano, J.A., "Protein Folding in Simplified Models With Estimation of Distribution Algorithms," IEEE Transactions on Evolutionary Computation, vol. 12, no. 4, pp. 418-438, Aug. 2008
  - Cutello, V., Nicosia, G., Pavone, M., and Timmis, J., "An Immune Algorithm for Protein Structure Prediction on Lattice Models," IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, pp. 101-117, Feb. 2007
25 Experiments
- 10^5 energy evaluations over 50 runs
H(x): the lowest energy over 50 runs
Mean: the lowest energy of a run, averaged over 50 runs
26 Experiments
- 5 x 10^6 energy evaluations over 50 runs
H(x): the lowest energy over 50 runs
Mean: the lowest energy of a run, averaged over 50 runs
27 Conclusion
- In this paper, we
  - Apply multimodal optimization techniques for PSP
  - Propose a novel mutation method for PSP
- Some results comparable with the state-of-the-art algorithms have been obtained
- The source code can be downloaded at http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip
28 Q&A
- The source code can be downloaded at http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip