CZ5225 Methods in Computational Biology Lecture 8: Protein Structure Prediction Methods. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, NUS August 2004

1
CZ5225 Methods in Computational Biology
Lecture 8: Protein Structure Prediction Methods
Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, NUS
August 2004
2
Protein Structural Organization

Proteins are made from just 20 kinds of amino
acids

3
Protein Structural Organization

A protein has four levels of structural organization:
primary, secondary, tertiary, and quaternary

4
Protein Folding: Sequence-Structure-Function Relationship
5
Protein Folding: Sequence-Structure-Function Relationship
6
Measuring Structural Similarity: The use of RMSD
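
For reference, the RMSD over N matched atom pairs is sqrt((1/N) * sum_i |a_i - b_i|^2). A minimal Python sketch follows, assuming the two structures are already superimposed and that atoms are paired by their position in each list. The function name `rmsd` and the coordinate-tuple format are illustrative choices, not part of the lecture.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already superimposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Finding the rotation and translation that minimize the RMSD before measuring it is a separate step that this sketch leaves out.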
7
Measuring Structural Similarity
8
Measuring Structural Similarity
9
Measuring Structural Similarity
10
Protein Structure Prediction
11
Protein Structure Prediction
12
Protein Secondary Structure Prediction

Secondary structure forms early in the protein folding process.
Identifying secondary structural elements makes the topology of a protein structure more obvious, so that similar ones can be identified in a topology database such as TOPS.
Predicted positions and lengths of secondary structure elements can be used as a prelude to "docking" these elements against each other.
A useful guide in constructing or refining primary structure alignments, and in finding the correct correspondence between parts of two proteins' respective tertiary structures.
Useful for making an informed guess about the higher-order structure of your protein.

13
Protein Secondary Structure Prediction

Traditional methods: Chou-Fasman (CF), GOR. Accuracy: about 60%
Recent improvements: neural networks, use of homologous sequences. Accuracy: > 70%
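
Accuracy figures like these are commonly reported as the percentage of residues whose predicted state matches the observed one. A small sketch of that measure is given below, assuming predictions and observations are equal-length strings with one state letter per residue. The H/E/C letters and the name `q3_accuracy` are assumptions for illustration, not taken from the slides.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one.

    Assumption: both arguments are strings of per-residue state letters
    (for example H = helix, E = strand, C = other) of the same length.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)
```

For example, `q3_accuracy("HHHEECCC", "HHHEEECC")` compares eight residues, seven of which agree.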
References
"Prediction of the secondary structure of proteins from their amino acid sequence", P. Y. Chou, G. D. Fasman, 1978, Adv. Enzymol. Relat. Areas Mol. Biol., 47, 45-147.
"GOR method for predicting secondary structure from amino acid sequence", J. Garnier, J.-F. Gibrat, B. Robson, 1996, Methods Enzymol., 266, 540-553.
"Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins", J. Garnier, D. J. Osguthorpe, B. Robson, 1978, J. Mol. Biol., 120, 97-120.
"Improvements in protein secondary structure prediction by an enhanced neural network", D. G. Kneller, F. E. Cohen, R. Langridge, 1990, J. Mol. Biol., 214, 171-182.

14
Protein Secondary Structure Prediction

Software
Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987), Prediction of protein secondary structure and active sites using the alignment of homologous sequences. Journal of Molecular Biology, 195, 957-961. (ZPRED)
Rost, B. & Sander, C. (1993), Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 232, 584-599. (PHD)
Salamov, A.A. & Solovyev, V.V. (1995), Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology, 247, 1 (NNSSP)
Geourjon, C. & Deleage, G. (1994), SOPM: a self-optimized prediction method for protein secondary structure prediction. Protein Engineering, 7, 157-16. (SOPMA)
Solovyev, V.V. & Salamov, A.A. (1994), Predicting alpha-helix and beta-strand segments of globular proteins. Computer Applications in the Biosciences, 10, 661-669. (SSP)
Wako, H. & Blundell, T.L. (1994), Use of amino-acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. 2. Secondary structures. Journal of Molecular Biology, 238, 693-708.
Mehta, P., Heringa, J. & Argos, P. (1995), A simple and fast approach to prediction of protein secondary structure from multiple aligned sequences with accuracy above 70%. Protein Science, 4, 2517-2525. (SSPRED)
King, R.D. & Sternberg, M.J.E. (1996), Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Science, 5, 2298-2310. (DSC)

15
Protein Secondary Structure Prediction

Types of amino acids
Hydrophobic
Hydrophilic, Neutral
Hydrophilic, Acidic
Hydrophilic, Basic

16
Protein Secondary Structure Prediction

Types of Secondary Structures
Alpha helix and beta sheet

17
Protein Secondary Structure Prediction

Secondary Structures: Favored Peptide Conformation

18
Protein Secondary Structure Prediction

Secondary Structures
Computation of structural propensity of a residue

Data derived from proteins of known structure are used to calculate the 'propensity' of each amino acid type to adopt a helix, sheet, or turn conformation.
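
One common way to define such a propensity is the fraction of a residue type found in a given state divided by the fraction of all residues found in that state, so values above 1 indicate a preference. The sketch below computes this ratio from (sequence, structure) string pairs. The input format, the state letters, and the name `propensities` are assumptions for illustration; the slides do not specify them.

```python
from collections import Counter


def propensities(examples, state):
    """Propensity of each residue type for one structural state.

    examples: iterable of (sequence, structure) strings of equal length,
              where structure holds one state letter per residue.
    state:    the state letter of interest, e.g. "H" (hypothetical encoding).
    Returns {residue: (fraction of that residue in state) / (overall fraction)}.
    """
    total = Counter()      # occurrences of each residue type
    in_state = Counter()   # occurrences of each residue type in `state`
    for seq, ss in examples:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings differ in length")
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError("state never observed in the examples")
    overall = n_state / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}
```

With the toy input `[("AAGG", "HHCC"), ("AGAG", "HCHC")]` and state `"H"`, alanine is always helical and glycine never is, so alanine scores above 1 and glycine scores 0.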

19
Protein Secondary Structure Prediction

Secondary Structures
Computation of structural propensity of a residue

Three states: alpha helix, beta sheet, turn

20
Protein Secondary Structure Prediction

Structural propensity of amino acids
Each residue is assigned to one of three classes:
Forming: residues that favor a structure
Indifferent: residues with no strong preference
Breaking: residues that stop the extension of a structure

21
Protein Secondary Structure Prediction

Position specific turn parameters

22
Protein Secondary Structure Prediction

Chou and Fasman procedure
Find helical initiation regions
Extend helices until they reach tetrapeptide breakers
Find beta initiation regions
Extend until they reach tetrapeptide breakers
Find turns
Resolve conflicts between alpha and beta
Conflict resolution is somewhat subjective, since predicted regions often overlap. Chou and Fasman suggest using additional information:
Alpha-beta pattern: does this look like a beta-alpha-beta structure?
End probabilities: in later papers Chou and Fasman also tabulated the preferences for residues to occur at the amino- and carboxyl-terminal ends of alpha and beta structures. These can be used to resolve overlaps.
Chou and Fasman did not provide an explicit algorithm for this conflict resolution, relying instead on their expert judgment. This meant that each person's prediction could be different, and most people are not experts.
Reference: "Prediction of the secondary structure of proteins from their amino acid sequence", P. Y. Chou, G. D. Fasman, 1978, Adv. Enzymol. Relat. Areas Mol. Biol., 47, 45-147.
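
The helix part of the procedure (find initiation regions, then extend until a tetrapeptide breaker) can be sketched as follows. This is only an illustration under stated assumptions: the propensity table holds made-up placeholder values rather than the published Chou-Fasman parameters, and the rule "at least 4 formers in a window of 6, extend while the mean of the end tetrapeptide stays at or above 1.0" is one commonly quoted simplification. Beta regions, turns, and conflict resolution are not covered.

```python
# Illustrative helix propensities only; NOT the published Chou-Fasman table.
TOY_HELIX_PROPENSITY = {
    "A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "K": 1.1,
    "G": 0.6, "P": 0.6, "S": 0.8, "N": 0.7, "Y": 0.7,
}


def predict_helices(seq, table=TOY_HELIX_PROPENSITY, window=6,
                    min_formers=4, cutoff=1.0):
    """Return predicted helix segments as 1-based inclusive (start, end) pairs."""
    n = len(seq)
    # Residues missing from the table are treated as neutral (1.0).
    prop = [table.get(aa, 1.0) for aa in seq]

    def tetra_mean(start):
        # Mean propensity of the four residues beginning at index `start`
        return sum(prop[start:start + 4]) / 4.0

    in_helix = [False] * n
    for start in range(n - window + 1):
        # Initiation: enough helix formers inside the window
        formers = sum(1 for p in prop[start:start + window] if p > cutoff)
        if formers < min_formers:
            continue
        left, right = start, start + window - 1
        # Extension: grow each end while the end tetrapeptide is not a breaker
        while left > 0 and tetra_mean(left - 1) >= cutoff:
            left -= 1
        while right < n - 1 and tetra_mean(right - 2) >= cutoff:
            right += 1
        for k in range(left, right + 1):
            in_helix[k] = True

    # Collapse the per-residue flags into contiguous segments
    segments, i = [], 0
    while i < n:
        if in_helix[i]:
            j = i
            while j + 1 < n and in_helix[j + 1]:
                j += 1
            segments.append((i + 1, j + 1))
            i = j + 1
        else:
            i += 1
    return segments
```

Because every qualifying window is kept whole, segment ends can include a few weak residues; trimming them would be one of the judgment calls the slide describes.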

23
Protein Secondary Structure Prediction
24
Homology Modeling
25
Homology Modeling

References
Sanchez R, Sali A. Advances in comparative protein-structure modelling. Curr Opin Struct Biol. 1997 Apr;7(2):206-14.
Krieger E, Nabuurs SB, Vriend G. Homology modeling. Methods Biochem Anal. 2003;44:509-23.
Rodriguez R, Chinea G, Lopez N, Pons T, Vriend G. Homology modeling, model and software evaluation: three related resources. Bioinformatics. 1998;14(6):523-8.
Alexandrov NN, Luethy R. Alignment algorithm for homology modeling and threading. Protein Sci. 1998 Feb;7(2):254-8.

26
Homology Modeling

Basic Idea
Similar sequence => similar structure
Structure is conserved more than sequence.
The structure of a new protein is derived using existing protein structures as templates.
Changes are compensated for locally.

27
Homology Modeling
Twilight zone: below 25% sequence identity
28
Homology Modeling

Similar sequence => similar structure

29
Homology Modeling

Step One
Align the sequence of your protein (structure unknown) with those of candidate template proteins (structure known)
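
Step One can be illustrated with a basic global alignment. The sketch below is a plain Needleman-Wunsch alignment with a flat match/mismatch score and a linear gap penalty, which is an assumption made for brevity; the slides do not say which alignment method or scoring scheme is intended. The function names and default scores are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

Dividing by the full alignment length is one of several conventions for percent identity; dividing by the shorter sequence length is another.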

30
Homology Modeling

Step Two
Select template proteins based on sequence similarity and energy-minimize their X-ray structures.
The whole sequence can be matched by one or more templates.

31
Homology Modeling


32
Homology Modeling

Step Three
Combine the main chains of the template proteins and fill in gap sections to generate a complete main-chain model of your protein.
Gaps are filled in using short sequences from a sequence linker library; the selected short sequences must be interchangeable with the corresponding section of your original protein.

33
Homology Modeling

Step Four
Add side chains to the main-chain model based on the sequence of your protein
Mutate and add

34
Homology Modeling

Step Five
Energy minimization and molecular dynamics (MD) of the homology model of your protein

35
Homology Modeling

Swiss-Model: an automated homology modeling server developed at Glaxo Wellcome Experimental Research in Geneva. http://www.expasy.ch/swissmod/
Closely linked to Swiss-PdbViewer, a tool for viewing and manipulating protein structures and models.
Results are likely to take about 24 hours to be returned!

36
Homology Modeling

How does Swiss-Model work?
1) Search for suitable templates
2) Check sequence identity with target
3) Create ProModII jobs
4) Generate models with ProModII
5) Energy minimization with Gromos96
First approach mode (regular)
First approach mode (with user-defined template)
Optimize mode

37
Homology Modeling

How does Swiss-Model work?
Program    Database   Action
BLASTP2    ExNRL-3D   Find homologous sequences of proteins with known structure
SIM        --         Select all templates with sequence identities above 25%
--         --         Generate ProModII input files
ProModII   ExPDB      Generate all models
Gromos96   --         Energy minimization of all models
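
The template selection step in this pipeline amounts to a filter on sequence identity. A minimal sketch is shown below, assuming the identities have already been computed and are supplied as a mapping from template identifier to percent identity. The function name `select_templates` and the template identifiers in the usage note are hypothetical, and this is not Swiss-Model's own code.

```python
def select_templates(identities, cutoff=25.0):
    """Return template ids whose percent identity exceeds `cutoff`, best first.

    identities: dict mapping a template id (any string) to its percent
                sequence identity with the target.
    """
    kept = [(pid, name) for name, pid in identities.items() if pid > cutoff]
    # Highest identity first; ties broken alphabetically for determinism
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in kept]
```

For instance, `select_templates({"tmplA": 62.0, "tmplB": 24.0, "tmplC": 31.5})` keeps `tmplA` and `tmplC` and drops `tmplB`, which falls in the twilight zone.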

38
Threading Methods

Proteins that are similar at the sequence level may have very different secondary structures. On the other hand, proteins that are very different at the sequence level may have similar structures. Why? Because protein function is determined by its functional sites, which reside in the cores, not the loops.
Therefore, researchers have proposed the inverse protein folding problem, namely, fitting a known structure to a sequence.
The problem of aligning a protein sequence to a given structural model is known as protein threading.
Given a protein whose structure is known, we derive a structural model by replacing its amino acids with placeholders, each associated with some basic properties of the original amino acid, such as whether it lies in an alpha-helix, beta-strand, or loop.

39
Threading Methods

References and software
Lemer, C., Rooman, M. J. & Wodak, S. J. (1996), Protein structure prediction by threading methods: evaluation of current techniques. PROTEINS: Structure, Function and Genetics, 23, 337-355.
Bryant, S. H. & Lawrence, C. E. (1993), An empirical energy function for threading a protein sequence through the folding motif. PROTEINS: Structure, Function and Genetics, 16, 92-112.
Alexandrov NN, Luethy R. Alignment algorithm for homology modeling and threading. Protein Sci. 1998 Feb;7(2):254-8.
Jones, D.T., Taylor, W.R. & Thornton, J.M. (1992), A new approach to protein fold recognition. Nature, 358, 86-89. (THREADER)

40
Threading Methods

Threading methods take the amino acid sequence of a protein of unknown structure and rapidly compute models based on a large set of existing 3D structures.
The algorithm then evaluates these models to determine how well the unknown sequence fits each template structure.
In the second most recent CASP competition, all threading methods produced accurate models in fewer than half of the cases.
However, threading is more successful than homology modeling when attempting to detect remote homologies that cannot be detected by standard sequence alignment.

41
Threading Methods

Protein Threading Model
Input:
A protein sequence A with n amino acids
A structural model with m core segments $C_i$:
(1) Each core segment $C_i$ has length $c_i$.
(2) Core segments $C_i$ and $C_{i+1}$ are connected by loop $L_i$, which has length between $l_i^{\min}$ and $l_i^{\max}$.
(3) The local structural environment for each amino acid position, such as chemical properties and spatial constraints.
A score function to evaluate a given threading.
Output:
$T = \{t_1, t_2, \ldots, t_m\}$, a set of integers, where $t_i$ is the amino acid position in A that occupies the first position in core segment $C_i$.

42
Threading Methods

Protein Threading Model
An algorithm: branch and bound
Spatial constraints:
$1 + \sum_{j<i} (c_j + l_j^{\min}) \le t_i \le n + 1 - \sum_{j \ge i} (c_j + l_j^{\min})$
$t_i + c_i + l_i^{\min} \le t_{i+1} \le t_i + c_i + l_i^{\max}$
A score function (second order, considering pairwise interactions):
$f(T) = \sum_i g_1(i, t_i) + \sum_i \sum_{j>i} g_2(i, j, t_i, t_j)$
Algorithm testing: self-threading and using structural analogs.
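
The spatial constraints and score function on this slide can be exercised with a brute-force search. The sketch below enumerates every threading that satisfies the constraints and keeps the best-scoring one; it is exhaustive enumeration, not the branch-and-bound algorithm named above, and is only practical for tiny inputs. Assumptions: positions are 1-based, loops exist only between consecutive cores (so the loop after the last core has minimum length 0), a lower score is better, and `g1` and `g2` are caller-supplied scoring functions that take 0-based core indices.

```python
def valid_threadings(n, core_lens, loop_min, loop_max):
    """Yield every tuple (t_1, ..., t_m) of core start positions that is valid.

    n:         sequence length
    core_lens: lengths c_i of the m core segments
    loop_min / loop_max: bounds for the m - 1 loops between consecutive cores
    """
    m = len(core_lens)

    def extend(prefix):
        i = len(prefix)
        if i == m:
            yield tuple(prefix)
            return
        # Residues still required by cores i..m-1 and the minimal loops between them
        needed = sum(core_lens[i:]) + sum(loop_min[i:])
        if i == 0:
            lo, hi = 1, n + 1 - needed
        else:
            prev_end = prefix[-1] + core_lens[i - 1]   # first position after core i-1
            lo = prev_end + loop_min[i - 1]
            hi = min(prev_end + loop_max[i - 1], n + 1 - needed)
        for t in range(lo, hi + 1):
            yield from extend(prefix + [t])

    yield from extend([])


def best_threading(n, core_lens, loop_min, loop_max, g1, g2):
    """Return (score, T) minimizing f(T) = sum g1(i, t_i) + sum_{j>i} g2(i, j, t_i, t_j)."""
    m = len(core_lens)
    best = None
    for T in valid_threadings(n, core_lens, loop_min, loop_max):
        score = sum(g1(i, T[i]) for i in range(m))
        score += sum(g2(i, j, T[i], T[j])
                     for i in range(m) for j in range(i + 1, m))
        if best is None or score < best[0]:
            best = (score, T)
    return best
```

A branch-and-bound search would explore the same space but prune partial threadings whose lower bound on the score already exceeds the best complete threading found so far.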

43
Ab initio Methods

"Ab initio" means "from the beginning".
Ab initio algorithms attempt to predict structure based on sequence information alone (i.e., no empirical structural information is considered).
Although many researchers are working in this vein, it remains a science in progress: sometimes marginally successful, but very unreliable.
Methods: molecular dynamics (MD) and simplified models

44
Ab initio Methods

References
Hardin C, Pogorelov TV, Luthey-Schulten Z. Ab initio protein structure prediction. Curr Opin Struct Biol. 2002 Apr;12(2):176-81. Review.
Srinivasan R, Rose GD. Ab initio prediction of protein structure using LINUS. Proteins. 2002 Jun 1;47(4):489-95.
Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmstrom L, Robertson T, Baker D. De novo prediction of three-dimensional structures for major protein families. J Mol Biol. 2002 Sep 6;322(1):65-78.
Bystroff C, Shao Y. Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics. 2002 Jul;18 Suppl 1:S54-61.

45
Ab initio Methods

LINUS as an example: Local Independently Nucleated Units of Structure
50 amino acids are folded at a time, in an overlapping fashion: 1-50, 26-75, ...
Based on the idea that actual proteins fold by forming local secondary structure first.
Side chains are simplified. Only 3 interactions are used:
1 repulsive: steric
2 attractive: H-bonds and hydrophobic
Then all possibilities are calculated in search of the lowest free energy
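
The overlapping fragment scheme (1-50, 26-75, ...) can be generated as below. This sketch covers only the windowing, not the folding calculation, and it is not LINUS code. The function name `overlapping_windows` and the choice to shorten the final window at the end of the chain are assumptions.

```python
def overlapping_windows(n_residues, size=50, step=25):
    """Return 1-based inclusive (start, end) windows covering the whole chain.

    Consecutive windows start `step` residues apart, so with the defaults
    each window overlaps the next by half its length. The last window is
    truncated at the chain end (an assumption for this sketch).
    """
    if n_residues < 1:
        raise ValueError("chain must contain at least one residue")
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, n_residues)
        windows.append((start, end))
        if end == n_residues:
            break
        start += step
    return windows
```

For a 120-residue chain this gives the windows 1-50, 26-75, 51-100, and 76-120.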

46
CZ5225 Methods in Computational Biology
Assignment 2

Option 1
Write code for protein secondary structure prediction.
Test your code on several selected proteins and compare your prediction results with those from the PHD software at http://npsa-pbil.ibcp.fr
Option 2
Write code for protein homology modeling.
Test your code on several selected proteins, and compute the RMSD of each of your predicted structures against an X-ray structure of that protein.
Option 3
Write code for structural comparison of two structures with unequal numbers of atoms. Test your code on several pairs of molecules/proteins and compute the RMSD for each pair.
Requirement
Write a report covering the theory, algorithm, testing results, and suggested improvements/future work, and submit it together with a soft copy of your code.