Title: CZ5225 Methods in Computational Biology Lecture 8: Protein Structure Prediction Methods. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, NUS August 2004
2 Protein Structural Organization
- Proteins are made from just 20 kinds of amino acids
3 Protein Structural Organization
- Proteins have four levels of structural organization (primary, secondary, tertiary and quaternary)
4 Protein Folding: Sequence-Structure-Function Relationship
5 Protein Folding: Sequence-Structure-Function Relationship
6 Measuring Structural Similarity: The use of RMSD
7 Measuring Structural Similarity
8 Measuring Structural Similarity
9 Measuring Structural Similarity
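The RMSD (root-mean-square deviation) named on these slides can be illustrated with a short sketch. It assumes the two structures have the same number of atoms, that atom i in one corresponds to atom i in the other, and that the structures have already been superimposed; the function name `rmsd` is only illustrative, and no optimal superposition step is included.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed and that atoms are
    paired by index; no rotation or translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice the two structures are first superimposed so as to minimize this value, which the sketch leaves out.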
10 Protein Structure Prediction
11 Protein Structure Prediction
12 Protein Secondary Structure Prediction
- Secondary structure forms early in the protein folding process.
- Identification of secondary structural elements makes the topology of a protein structure more obvious, so that similar ones can be identified in a topology database such as TOPS.
- Prediction of the positions and lengths of secondary structure elements can be used as a prelude to "docking" these secondary structural elements against each other.
- Useful guide in the construction or refinement of primary structure alignments, and to the correct correspondence between parts of two proteins' respective tertiary structures.
- Useful for making an informed guess about the higher-order structure of your protein.
13 Protein Secondary Structure Prediction
- Traditional methods: Chou-Fasman (CF), GOR. Accuracy: about 60%
- Recent improvements: neural networks, homologous sequences. Accuracy: > 70%
- References
- "Prediction of the secondary structure of proteins from their amino acid sequence", P. Y. Chou, G. D. Fasman, 1978, Adv. Enzymol. Relat. Areas Mol. Biol., 47, 45-147.
- "GOR method for predicting secondary structure from amino acid sequence", J. Garnier, J.-F. Gibrat, B. Robson, 1996, Methods Enzymol., 266, 540-553.
- "Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins", J. Garnier, D. J. Osguthorpe, B. Robson, 1978, J. Mol. Biol., 120, 97-120.
- "Improvements in protein secondary structure prediction by an enhanced neural network", D. G. Kneller, F. E. Cohen, R. Langridge, 1990, J. Mol. Biol., 214, 171-182.
14 Protein Secondary Structure Prediction
- Software
- Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987), Prediction of protein secondary structure and active sites using the alignment of homologous sequences. Journal of Molecular Biology, 195, 957-961. (ZPRED)
- Rost, B. & Sander, C. (1993), Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 232, 584-599. (PHD)
- Salamov, A.A. & Solovyev, V.V. (1995), Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology, 247, 1 (NNSSP)
- Geourjon, C. & Deleage, G. (1994), SOPM: a self-optimized prediction method for protein secondary structure prediction. Protein Engineering, 7, 157-16. (SOPMA)
- Solovyev, V.V. & Salamov, A.A. (1994), Predicting alpha-helix and beta-strand segments of globular proteins. Computer Applications in the Biosciences, 10, 661-669. (SSP)
- Wako, H. & Blundell, T.L. (1994), Use of amino-acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. 2. Secondary structures. Journal of Molecular Biology, 238, 693-708.
- Mehta, P., Heringa, J. & Argos, P. (1995), A simple and fast approach to prediction of protein secondary structure from multiple aligned sequences with accuracy above 70%. Protein Science, 4, 2517-2525. (SSPRED)
- King, R.D. & Sternberg, M.J.E. (1996), Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Science, 5, 2298-2310. (DSC)
15 Protein Secondary Structure Prediction
- Types of amino acids
- Hydrophobic
- Hydrophilic, Neutral
- Hydrophilic, Acidic
- Hydrophilic, Basic
16 Protein Secondary Structure Prediction
- Types of Secondary Structures
- Alpha helix and beta sheet
17 Protein Secondary Structure Prediction
- Secondary Structures: Favored Peptide Conformation
18 Protein Secondary Structure Prediction
- Secondary Structures
- Computation of structural propensity of a residue
- Data derived from proteins of known structure are used to calculate 'propensities' for each amino acid type for adopting helix, sheet or turn
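As a rough illustration of how such propensities can be derived from proteins of known structure, the sketch below uses one common definition: the fraction of an amino acid type found in a given state, divided by the fraction of all residues found in that state. The state labels `H`, `E` and `T` (helix, sheet, turn), the function name and the input format are assumptions made for the example, not something specified on the slide.

```python
from collections import Counter


def propensities(sequences, structures, states="HET"):
    """Per-residue structural propensities from annotated sequences.

    sequences  : list of amino acid strings
    structures : list of same-length strings of state labels
                 (hypothetical labels: H = helix, E = sheet, T = turn)
    Returns {amino_acid: {state: propensity}}.
    """
    residue_count = Counter()
    state_count = Counter()
    pair_count = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings differ in length")
        for aa, state in zip(seq, ss):
            residue_count[aa] += 1
            state_count[state] += 1
            pair_count[(aa, state)] += 1

    total = sum(residue_count.values())
    table = {}
    for aa in residue_count:
        table[aa] = {}
        for state in states:
            if state_count[state] == 0:
                table[aa][state] = 0.0  # state never observed in the data
                continue
            frac_aa_in_state = pair_count[(aa, state)] / residue_count[aa]
            frac_all_in_state = state_count[state] / total
            table[aa][state] = frac_aa_in_state / frac_all_in_state
    return table
```

Under this definition a value above 1 means the residue type occurs in that state more often than average, and a value below 1 means it occurs less often.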
19 Protein Secondary Structure Prediction
- Secondary Structures
- Computation of structural propensity of a residue
- Three states: alpha helix, beta sheet, turn
20 Protein Secondary Structure Prediction
- Structural propensity of amino acids
- Each residue is assigned to one of three classes
- Forming residues favor a structure
- Indifferent residues neither favor nor oppose it
- Breaking residues stop the extension of a structure
21 Protein Secondary Structure Prediction
- Position-specific turn parameters
22 Protein Secondary Structure Prediction
- Chou and Fasman procedure
- Find helical initiation regions
- Extend helices until they reach tetrapeptide breakers
- Find beta initiation regions
- Extend until they reach tetrapeptide breakers
- Find turns
- Resolve conflicts between alpha and beta
- Conflict resolution is somewhat subjective, because predicted regions often overlap. Chou and Fasman suggest using additional information:
- alpha-beta pattern, i.e. does this look like a beta-alpha-beta structure?
- end probabilities: in later papers Chou and Fasman also tabulated the preferences for residues to occur at the amino- and carboxyl-terminal ends of alpha and beta structures.
- These can be used to resolve overlaps
- Chou and Fasman did not provide an explicit algorithm for this conflict resolution, relying on their expert judgment. This meant that each person's prediction could be different. Most people are not experts.
- "Prediction of the secondary structure of proteins from their amino acid sequence", P. Y. Chou, G. D. Fasman, 1978, Adv. Enzymol. Relat. Areas Mol. Biol., 47, 45-147.
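The helix part of the procedure (find initiation regions, then extend until a tetrapeptide breaker) might be sketched as follows. This is a simplified illustration, not Chou and Fasman's full method: the rule of 4 helix formers in a window of 6 and the breaker test "tetrapeptide average below 1.0" are commonly quoted simplifications taken as assumptions here, the propensity table is supplied by the caller, and the beta, turn and conflict-resolution steps are omitted.

```python
def find_helices(seq, p_alpha, window=6, min_formers=4,
                 former_cutoff=1.0, breaker_avg=1.0):
    """Return helix segments as 0-based inclusive (start, end) pairs.

    p_alpha maps each residue letter to a helix propensity (caller-supplied).
    All thresholds are illustrative assumptions, not values from the slides.
    """
    n = len(seq)
    scores = [p_alpha[aa] for aa in seq]
    in_helix = [False] * n

    def tetra_avg(start):
        # Mean propensity of the four residues starting at index 'start'.
        return sum(scores[start:start + 4]) / 4.0

    for i in range(n - window + 1):
        # Initiation: enough helix formers inside the window.
        formers = sum(1 for s in scores[i:i + window] if s > former_cutoff)
        if formers < min_formers:
            continue
        left, right = i, i + window - 1
        # Extend to the right until the next tetrapeptide is a breaker.
        while right + 1 < n and tetra_avg(right - 2) >= breaker_avg:
            right += 1
        # Extend to the left in the same way.
        while left - 1 >= 0 and tetra_avg(left - 1) >= breaker_avg:
            left -= 1
        for k in range(left, right + 1):
            in_helix[k] = True

    # Collapse the per-residue flags into contiguous segments.
    segments, start = [], None
    for k, flag in enumerate(in_helix):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            segments.append((start, k - 1))
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments
```

With a made-up two-letter table such as `{"A": 1.5, "G": 0.25}`, a run of six `A` residues flanked by `G` residues is reported as a single helix that spills slightly into the flanks, which is the kind of overlap the conflict-resolution step then has to deal with.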
23 Protein Secondary Structure Prediction
24 Homology Modeling
25 Homology Modeling
- References
- Sanchez R, Sali A. Advances in comparative protein-structure modelling. Curr Opin Struct Biol. 1997 Apr;7(2):206-14.
- Krieger E, Nabuurs SB, Vriend G. Homology modeling. Methods Biochem Anal. 2003;44:509-23.
- Rodriguez R, Chinea G, Lopez N, Pons T, Vriend G. Homology modeling, model and software evaluation: three related resources. Bioinformatics. 1998;14(6):523-8.
- Alexandrov NN, Luethy R. Alignment algorithm for homology modeling and threading. Protein Sci. 1998 Feb;7(2):254-8.
26 Homology Modeling
- Basic Idea
- Similar sequence => similar structure
- Structure is conserved more than sequence
- The structure of a new protein is derived using existing protein structures as templates.
- Changes are compensated for locally.
27 Homology Modeling
- Twilight zone: below 25% sequence identity
28 Homology Modeling
- Similar sequence => similar structure
29 Homology Modeling
- Step One
- Align the sequence of your protein (unknown structure) with those of candidate template proteins (known structure)
30 Homology Modeling
- Step Two
- Select template proteins based on sequence similarity and energy-minimize their X-ray structures
- The whole sequence can be matched by one or more templates
32 Homology Modeling
- Step Three
- Combine the main chains of the template proteins and fill in gap sections to generate a complete main-chain model of your protein
- Gaps are filled in using short sequences from a sequence linker library; the selected short sequences need to be compatible with the corresponding section of your original protein.
33 Homology Modeling
- Step Four
- Add side chains to the main-chain model based on the sequence of your protein
- Mutate and add side chains
34 Homology Modeling
- Step Five
- Energy minimization and molecular dynamics (MD) of the homology model of your protein
35 Homology Modeling
- Swiss-Model: an automated homology modeling server developed at Glaxo Wellcome Experimental Research in Geneva. http://www.expasy.ch/swissmod/
- Closely linked to Swiss-PdbViewer, a tool for viewing and manipulating protein structures and models.
- Results are likely to take 24 hours to be returned!
36 Homology Modeling
- How Swiss-Model works
- 1) Search for suitable templates
- 2) Check sequence identity with target
- 3) Create ProModII jobs
- 4) Generate models with ProModII
- 5) Energy minimization with Gromos96
- First approach mode (regular)
- First approach mode (with user-defined template)
- Optimize mode
37 Homology Modeling
- How Swiss-Model works

Program    Database    Action
BLASTP2    ExNRL-3D    Find homologous sequences of proteins with known structure.
SIM        --          Select all templates with sequence identities above 25%.
--         --          Generate ProModII input files
ProModII   ExPDB       Generate all models
Gromos96   --          Energy minimization of all models
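The template-selection step (keep templates whose sequence identity with the target is above 25%) can be illustrated with a small sketch. It is not the SIM program: it assumes the pairwise alignments are already available as gapped strings, it counts identity over aligned non-gap columns (one of several conventions in use), and the function names and template identifiers are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def select_templates(alignments, cutoff=25.0):
    """Keep templates above the identity cutoff, best first.

    alignments: {template_id: (aligned_target, aligned_template)}
    """
    selected = []
    for template_id, (target, template) in alignments.items():
        identity = percent_identity(target, template)
        if identity > cutoff:
            selected.append((template_id, identity))
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

Templates that fall at or below the cutoff lie in the twilight zone, where sequence similarity alone is a poor indicator of structural similarity.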
38 Threading Methods
- Proteins that are similar at the sequence level may have very different secondary structures. On the other hand, proteins that are very different at the sequence level may have similar structures. Why? Because protein function is determined by its functional sites, which reside in the cores, not the loops.
- Therefore, researchers propose the inverse protein folding problem, namely, fitting a known structure to a sequence.
- The problem of aligning a protein sequence to a given structural model is known as protein threading.
- Given a protein whose structure is known, we derive a structural model by replacing amino acids with place-holders, each associated with some basic properties of the original amino acid, such as being in an alpha-helix, beta-strand or loop.
39 Threading Methods
- References and software
- Lemer, C., Rooman, M.J. & Wodak, S.J. (1996), Protein structure prediction by threading methods: evaluation of current techniques. Proteins: Structure, Function and Genetics, 23, 337-355.
- Bryant, S.H. & Lawrence, C.E. (1993), An empirical energy function for threading a protein sequence through the folding motif. Proteins: Structure, Function and Genetics, 16, 92-112.
- Alexandrov NN, Luethy R. Alignment algorithm for homology modeling and threading. Protein Sci. 1998 Feb;7(2):254-8.
- Jones, D.T., Taylor, W.R. & Thornton, J.M. (1992), A new approach to protein fold recognition. Nature, 358, 86-89. (THREADER)
40 Threading Methods
- Threading methods take the amino acid sequence of a protein of unknown structure and rapidly compute models based on a large set of existing 3D structures.
- The algorithm then evaluates these models to determine how well the unknown amino acid sequence fits each template structure.
- All the threading methods in the second-to-most-recent CASP competition produced accurate models in less than half of the cases.
- However, threading is more successful than homology modeling when attempting to detect remote homologies that cannot be detected by standard sequence alignment.
41 Threading Methods
- Protein Threading Model
- Input
- A protein sequence $A$ with $n$ amino acids
- A structural model with $m$ core segments $C_i$
- (1) Each core segment $C_i$ has length $c_i$.
- (2) Consecutive core segments $C_i$ and $C_{i+1}$ are connected by loop $L_i$, which has length between $l_i^{\min}$ and $l_i^{\max}$.
- (3) The local structural environment for each amino acid position, such as chemical properties and spatial constraints.
- A score function to evaluate a given threading.
- Output
- $T = (t_1, t_2, \ldots, t_m)$, a tuple of integers, where $t_i$ is the amino acid position in $A$ that occupies the first position in core segment $C_i$.
42 Threading Methods
- Protein Threading Model
- An algorithm: branch and bound
- Spatial constraints:
  $$1 + \sum_{j<i} (c_j + l_j^{\min}) \le t_i \le n + 1 - \sum_{j \ge i} (c_j + l_j^{\min})$$
  $$t_i + c_i + l_i^{\min} \le t_{i+1} \le t_i + c_i + l_i^{\max}$$
- A score function (second order, considering pairwise interactions):
  $$f(T) = \sum_i g_1(i, t_i) + \sum_i \sum_{j>i} g_2(i, j, t_i, t_j)$$
- Algorithm testing: self-threading and using structural analogs.
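To make the threading model concrete, the sketch below enumerates every placement of the core segments that satisfies the spacing constraints and scores each one with first-order and pairwise terms. It deliberately uses exhaustive enumeration rather than branch and bound, so it only suits tiny examples. Several things are assumptions of the sketch: loops are given only between consecutive cores (no constraint on the terminal tails), lower scores are treated as better, and the scoring functions `g1` and `g2` are supplied by the caller.

```python
def enumerate_threadings(n, core_lengths, loop_min, loop_max):
    """Yield every valid tuple (t_1, ..., t_m) of 1-based start positions.

    core_lengths : lengths c_i of the m core segments
    loop_min/max : bounds for the m-1 loops between consecutive cores
    """
    m = len(core_lengths)

    def min_tail(i):
        # Fewest residues needed from the start of core i to the chain end.
        return sum(core_lengths[i:]) + sum(loop_min[i:])

    def extend(prefix):
        i = len(prefix)
        if i == m:
            yield tuple(prefix)
            return
        if i == 0:
            lo, hi = 1, n + 1 - min_tail(0)
        else:
            prev_end = prefix[-1] + core_lengths[i - 1]
            lo = prev_end + loop_min[i - 1]
            hi = min(prev_end + loop_max[i - 1], n + 1 - min_tail(i))
        for t in range(lo, hi + 1):
            yield from extend(prefix + [t])

    yield from extend([])


def best_threading(n, core_lengths, loop_min, loop_max, g1, g2):
    """Return (threading, score) with the lowest f(T) found by enumeration."""
    m = len(core_lengths)
    best, best_score = None, float("inf")
    for T in enumerate_threadings(n, core_lengths, loop_min, loop_max):
        score = sum(g1(i, T[i]) for i in range(m))
        score += sum(g2(i, j, T[i], T[j])
                     for i in range(m) for j in range(i + 1, m))
        if score < best_score:
            best, best_score = T, score
    return best, best_score
```

The number of valid threadings grows quickly with the sequence length and the number of cores, which is why a pruning strategy such as branch and bound is used for realistic problems.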
43 Ab initio Methods
- "Ab initio" means "from the beginning".
- Ab initio algorithms attempt to predict structure based on sequence information alone (i.e., no empirical structural information is considered).
- Although many researchers are working in this vein, it is a science in progress: sometimes marginally successful, but very unreliable.
- Methods: MD and simplified models
44 Ab initio Methods
- References
- Hardin C, Pogorelov TV, Luthey-Schulten Z. Ab initio protein structure prediction. Curr Opin Struct Biol. 2002 Apr;12(2):176-81. Review.
- Srinivasan R, Rose GD. Ab initio prediction of protein structure using LINUS. Proteins. 2002 Jun 1;47(4):489-95.
- Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmstrom L, Robertson T, Baker D. De novo prediction of three-dimensional structures for major protein families. J Mol Biol. 2002 Sep 6;322(1):65-78.
- Bystroff C, Shao Y. Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics. 2002 Jul;18 Suppl 1:S54-61.
45 Ab initio Methods
- LINUS as an example: Local Independently Nucleated Units of Structure
- 50 amino acids are folded at a time, in an overlapping fashion: 1-50, 26-75, ...
- Based on the idea that actual proteins fold by forming local secondary structure first.
- Side chains are simplified. Only 3 interactions are used:
- 1 repulsive: steric
- 2 attractive: H-bonds and hydrophobic
- The possible conformations are then calculated in search of the lowest free energy
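The overlapping 50-residue windows (1-50, 26-75, ...) can be generated with a few lines of code. This is only an illustration of the windowing pattern, not of LINUS itself; the function name is hypothetical and the handling of the final, possibly shorter window is an assumption.

```python
def overlapping_windows(n_residues, size=50, step=25):
    """Return 1-based inclusive (start, end) windows covering the chain."""
    windows = []
    start = 1
    while start <= n_residues:
        end = min(start + size - 1, n_residues)
        windows.append((start, end))
        if end == n_residues:
            break  # the chain end has been reached
        start += step
    return windows
```

For a 120-residue chain this gives the windows 1-50, 26-75, 51-100 and 76-120.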
46 CZ5225 Methods in Computational Biology: Assignment 2
- Option 1
- Write code for protein secondary structure prediction.
- Test your code on several selected proteins and compare your prediction results with those from the PHD software at http://npsa-pbil.ibcp.fr
- Option 2
- Write code for protein homology modeling.
- Test your code on several selected proteins, and compute the RMSD of each of your predicted structures against an X-ray structure of that protein.
- Option 3
- Write code for structural comparison of two structures with unequal numbers of atoms. Test your code on several pairs of molecules/proteins and compute the RMSD for each pair.
- Requirement: Write a report covering the theory, algorithm, testing results, and suggested improvements/future work, and submit it together with a soft copy of your code.