Protein Secondary Structure Prediction - PowerPoint PPT Presentation

Slides: 29

Provided by: hughpat

Transcript and Presenter's Notes

Title: Protein Secondary Structure Prediction

1
Chapter 14 Protein Secondary Structure Prediction
2
Refresher

Proteins have secondary structures
These structures are essential to maintain the 3D
structure of the protein
Secondary structure can be any of
α-helix
β-strand
Coil
α-helix H-bond between CO of residue i and N-H of
residue i+4
3.6 aa per turn
1.5 Å / aa (5.4 Å per turn)
(fully extended peptide backbone 3.5 Å / aa)
β-strand H-bond between CO and N-H of distant
regions
Parallel or anti-parallel
Coiled coil
Hydrophobic amino acids interact

3
Secondary Structure Predictions

Prediction of conformation of each amino acid
H α-helix
E β-strand
C Coil (no defined 2° structure)
Used for classification of proteins
Defining domains and motifs
Intermediary step towards 3° structure prediction
Globular and trans-membrane proteins are
structurally very different
Different algorithms are required to predict these
two classes of proteins

Problem is not trivial
α-helix based on short distance (i, i+4
interactions)
β-strand based on long distance (5-50
residues)
Long range interaction predictions less accurate
Accuracy about 75%
Ab initio based
Statistical calculation of residues in single
query sequence
Homology-based
Common 2° structure patterns in homologous
sequences

5
Ab initio Methods
Chou-Fasman
Intrinsic property of residue to be in helix, strand or turn structure
A, E, M common in α-helices
N = residues in all protein structures
M = residues in α-helices
Y = total Ala in protein structures
X = Ala in α-helices
Propensity of Ala in α-helix = (X/Y)/(M/N)
Value = 1: same distribution as average
Value > 1: more often in α-helix than average
Value < 1: less often in α-helix than average
6 residue window of which 4 are H → α-helix
Window extended bidirectionally until P < 1.0
5 residue window of which 3 are E → β-strand
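The propensity formula and the 6-residue nucleation rule from this slide can be sketched in a few lines. This is a minimal illustration, not the published Chou-Fasman program: the counts passed to `propensity` are made-up numbers, and the set of helix formers (A, E, M) is taken only from the slide's "common in α-helices" remark rather than from a full propensity table.

```python
def propensity(x, y, m, n):
    """Propensity = (X/Y)/(M/N), using the symbols defined on the slide.

    x: occurrences of the residue inside alpha-helices
    y: occurrences of the residue in all structures
    m: all residues found in alpha-helices
    n: all residues in all structures
    """
    return (x / y) / (m / n)


def helix_nuclei(seq, formers, window=6, needed=4):
    """Return 0-based start positions of windows that could nucleate a helix.

    A window qualifies when at least `needed` of its `window` residues
    belong to `formers` (the assumed helix-forming residues).
    """
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[start:start + window] if aa in formers)
        if count >= needed:
            hits.append(start)
    return hits


# Hypothetical counts, for illustration only.
p_ala = propensity(x=60, y=100, m=3000, n=10000)
print(round(p_ala, 2))  # a value above 1 means "more often in helix than average"

# Assumed helix formers, based only on the slide's A, E, M remark.
print(helix_nuclei("GGAEMAGG", formers=set("AEM")))
```

The bidirectional extension step is left out because it needs a per-residue propensity table, which the slide does not give.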
6
http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1
7
Example Chou-Fasman
        10         20         30         40         50         60
SRRSASHPTY SEMIAAAIRA EKSRGGSSRQ SIQKYIKSHY KVGHNADLQI KLSIRRLLAA
        70         80         90
GVLKQTKGVG ASGSFRLAKS DKAKRSPGKK

HELIX 1 HA1 SER A 29 ALA A 38
HELIX 2 HA2 ARG A 47 SER A 56
HELIX 3 HA3 ALA A 64 ALA A 78
SHEET 1 SA 3 SER A 45 SER A 46
SHEET 2 SA 3 GLY A 91 ARG A 94
SHEET 3 SA 3 LEU A 81 GLY A 86

Chou-Fasman output (column alignment not preserved in this transcript)
SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAA
helix <--------> <-----> <-----------------
sheet EEEEEEEEE EEEEEE EEEEEEEEEEEEE
turns T T T T T
GVLKQTKGVGASGSFRLAKSDKAKRSPGKK
helix -------> <------->
sheet EEEEEEEEE
turns T T TT T
8
Garnier-Osguthorpe-Robson (GOR)

Makes use of distant influences on propensity
Uses 17 residue window
Adds propensity for four 2º structure states (H,
E, T, C)
Highest value defines 2º structure state of
central residue in window
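The window logic described above (sum a contribution from each of the 17 positions, then assign the highest-scoring state to the central residue) can be sketched as follows. The real GOR tables are derived from known structures and are not reproduced here; `toy_score` is a purely hypothetical stand-in so the example runs.

```python
STATES = "HETC"   # helix, strand, turn, coil
HALF = 8          # 8 residues on each side of the centre = 17-residue window


def predict(seq, score):
    """Assign one state per residue.

    `score(state, offset, aa)` returns the contribution of residue `aa`
    found `offset` positions from the centre towards `state`.
    Window positions that fall off either end of the sequence are skipped.
    """
    out = []
    for i in range(len(seq)):
        totals = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF, HALF + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    total += score(state, offset, seq[j])
            totals[state] = total
        # Highest summed value defines the state of the central residue.
        out.append(max(STATES, key=lambda s: totals[s]))
    return "".join(out)


def toy_score(state, offset, aa):
    """Hypothetical scores for illustration only, not GOR parameters."""
    if state == "H" and aa in "AEM":
        return 1.0
    if state == "E" and aa in "VIY":
        return 1.0
    return 0.0


print(predict("AAAAAEEMMA", toy_score))
```

Ties are broken by the order of `STATES`, which is an arbitrary choice made for this sketch.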

GOR output (column alignment not preserved in this transcript)
SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAA
helix HHHHHHHHHHH HHHHHH HHHH
sheet EEEEEEEE E EEEEEE
turns TTTT TTTTT T TTTT
coil C CCCCC CCC C
GVLKQTKGVGASGSFRLAKSDKAKRSPGKK
helix HHHH HHHHHHHHHHH
sheet EEEEE E
turns TTT
coil CCCC C C
Residue totals H 36 E 21 T 17 C 16
percent H 48.6 E 28.4 T 23.0 C 21.6
9
Expansion using larger crystal structure databases

Algorithms based on a larger database of crystal
structure information
GOR II, III and IV
SOPM
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html

SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVG
cccccccchhhhhhhhhhhhtccttcccchhhhhhhhhtcccccccthhhhhhhhhhhhhhhhhttttcc
ASGSFRLAKSDKAKRSPGKK
cccceeeecccccccccccc
10
Homology based methods
11
Neural Network programs

A neural net has an input layer, hidden layers
composed of nodes given different weights, and an
output layer
Neural net trained with multiply aligned
sequences
Accuracy >75%
PHD
BLASTP
MAXHOM (sequence alignment)
Neural Net
Layer one 13 residue window
Layer two 17 residue window
Layer three Jury layer removes very short
stretches
PSIPRED
PSI-BLAST
Neural net
SSpro
Porter
PROF

12
Predictions with Multiple Methods

No single prediction program is always correct, so it
is generally good practice to compare the output from
several programs
Some web servers do this
JPred
PHD, PREDATOR, DSC, NNSSP, Jnet and ZPred
First submitted to PSI-BLAST
Multiple alignment
Submitted to above 6 programs
Consensus returned
If there is no consensus, the PHD prediction is used
SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQI
---------HHHHHHHHHHH--------HHHHHHHHHH-------HHHHH
KLSIRRLLAAGVLKQTKGVGASGSFRLAKSDKAKRSPGKK
HHHHHHHH---EEEEE------EEEE--------------
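The consensus step on this slide can be illustrated with a per-residue vote. How JPred actually defines "consensus" is not stated here, so a strict majority is assumed; where no state wins a majority, the sketch falls back to the PHD string, as the slide describes. The prediction strings below are invented.

```python
from collections import Counter


def consensus(predictions, fallback="PHD"):
    """Combine per-residue predictions from several methods.

    predictions: dict mapping method name -> string of states
                 (e.g. 'H', 'E', '-'), all of equal length.
    fallback:    method whose call is used when no state has a
                 strict majority (assumed rule).
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    length = lengths.pop()

    out = []
    for i in range(length):
        votes = Counter(p[i] for p in predictions.values())
        state, count = votes.most_common(1)[0]
        if count > len(predictions) / 2:
            out.append(state)
        else:
            out.append(predictions[fallback][i])
    return "".join(out)


# Invented example strings, four methods, four residues.
example = {
    "PHD":      "HHE-",
    "PREDATOR": "HH--",
    "DSC":      "HEE-",
    "NNSSP":    "-EEH",
}
print(consensus(example))
```

At the second residue the vote is split 2-2, so the PHD call is used.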

13
How accurate?
14
Trans-membrane proteins

Two types of trans-membrane proteins
α-helix
β-barrel
Many consist solely of α-helices and are found in
the cytoplasmic membrane
β-barrel normally found in outer membrane of
Gram-negative bacteria
Difficult to get X-ray or NMR structure

α-helix perpendicular to membrane 17-25 residues
Hydrophobic residues separated by hydrophilic
loops (<60 residues)
Residues bordering hydrophobic module are
generally charged
Inner cytosolic region most often highly charged
(orientation info)
Positive inside rule
Scan window 17-25 residues calculate
hydrophobicity score
Many false positives
Signal peptide sequences confuse algorithm
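A sliding-window hydrophobicity scan of the kind described above might look like this. The slide does not name a hydrophobicity scale, so the Kyte-Doolittle scale is assumed; the 19-residue window (inside the 17-25 range) and the 1.6 cut-off are likewise assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named on the slide).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tm_candidates(seq, window=19, threshold=1.6):
    """Return (start, end, mean_score) for every window whose mean
    hydropathy reaches the threshold. Positions are 1-based, inclusive."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start + 1, start + window, round(mean, 2)))
    return hits


# Artificial sequence: a 19-residue Leu stretch flanked by Lys.
toy = "K" * 5 + "L" * 19 + "K" * 5
hits = tm_candidates(toy)
best = max(hits, key=lambda h: h[2])
print(best)
```

Overlapping windows all pass the threshold here, which hints at why a raw scan produces many candidate hits that need merging and filtering.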

TMHMM
Trained with 160 known TM sequences
Probability of having an α-helix is given
Orientation of α-helix based on positive inside
rule
Phobius
Incorporates distinct HMM models for signal
peptides and TM helices
Signal peptide sequence ignored
Can use sequence homologs and multiply aligned
sequences

17
Prediction of β-barrel proteins

β-strand forming trans-membrane section is
amphipathic
10-22 residues
Alternating hydrophobic and hydrophilic sequence
arrangement
α-helix TM prediction programs thus not
applicable to β-barrel proteins
TBBpred
Neural net trained with β-barrel protein
sequences
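The alternating hydrophobic/hydrophilic pattern can be quantified very crudely, which also shows why a plain hydrophobicity average misses such strands. This is only an illustration of the pattern, not how TBBpred works; the hydrophobic residue set is an assumption.

```python
# Assumed hydrophobic class, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def alternation_fraction(segment):
    """Fraction of adjacent residue pairs that switch between the
    hydrophobic and non-hydrophobic class (1.0 = perfect alternation)."""
    if len(segment) < 2:
        return 0.0
    flags = [aa in HYDROPHOBIC for aa in segment]
    switches = sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    return switches / (len(segment) - 1)


print(alternation_fraction("VTVTVTVTVT"))  # strictly alternating toy segment
print(alternation_fraction("LLLLLLLLLL"))  # uniformly hydrophobic toy segment
```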

18
Coiled coil prediction

Two or more α-helices winding around each other
For every 7 residues, 1 and 4 are hydrophobic,
facing central core
Coils
Scan window of 14, 21 or 28 residues
Compares residues to probability matrix based on
known coiled coils
Accurate for left-handed coil, but not
right-handed coil
Multicoil
Scoring matrix based on 2-strand and 3-strand
coils
Used in several genome-wide studies
Leucine zippers
sub-class of coiled coils
L-X6-L-X6-L-
Found in transcription factors
Anti-parallel α-helices stabilized by leucine
core
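The L-X6-L-X6-L pattern above maps directly onto a regular expression. A minimal sketch follows; requiring exactly three leucines is an assumption (the slide's pattern ends with a dash, suggesting the repeat may continue), and a pattern match alone does not establish that a region forms a coiled coil.

```python
import re

# Leu, any six residues, Leu, any six residues, Leu.
# The lookahead lets overlapping matches be reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L))")


def leucine_zipper_sites(seq):
    """Return 1-based start positions of every L-X6-L-X6-L match."""
    return [m.start() + 1 for m in ZIPPER.finditer(seq)]


# Artificial sequence with leucines at positions 4, 11 and 18.
toy = "AAA" + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "AA"
print(leucine_zipper_sites(toy))
```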

19
Chapter 13 Protein Tertiary Structure Prediction
20

The need for predicting 3D structures
X-ray crystallography is extremely tedious
DNA sequences and therefore protein sequences are
rapidly generated
A gap between sequence and structure is widening
Protein structure often provides insight into
function
Three main methods for 3D prediction
Homology modeling
Threading
Ab initio

21
Homology Modeling
22
Template Selection

Search PDB for homologous sequences with BLAST or
FASTA
Should have >30% sequence identity (20% at a
stretch)
In case of multiple hits, choose
Highest identity
Highest resolution
Most appropriate co-factors
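The selection criteria above can be sketched as an identity calculation followed by a sort. Percent identity is computed here over aligned, non-gap columns, which is one of several conventions and is an assumption. The template records and their IDs are hypothetical, and co-factor suitability is left to manual inspection.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def rank_templates(hits, min_identity=30.0):
    """Keep hits above the identity threshold, then order by highest
    identity and, for ties, best (lowest) resolution in Angstrom."""
    usable = [h for h in hits if h["identity"] > min_identity]
    return sorted(usable, key=lambda h: (-h["identity"], h["resolution"]))


# Hypothetical search hits.
hits = [
    {"id": "tmpl1", "identity": 45.0, "resolution": 2.5},
    {"id": "tmpl2", "identity": 45.0, "resolution": 1.8},
    {"id": "tmpl3", "identity": 25.0, "resolution": 1.2},
]
print([h["id"] for h in rank_templates(hits)])
print(percent_identity("ACD-F", "ACE-F"))
```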

Sequence Alignment
Critical: incorrectly aligned residues will give an incorrect model
Use Praline or T-Coffee for alignment
Inspect visually to confirm alignment of key residues
23
Backbone Model Building

Copy the backbone atom coordinates of each template
residue to the corresponding aligned residue of the query
If the residues are identical, the coordinates of
the whole residue can be copied
If the residues are different, only the Cα atoms are
copied
The remaining atoms of the residue are modeled
later

Loop Modeling

It often happens that there are gaps in the
aligned sequences
Two techniques to connect the protein on either
side of the gap
Database
Search database for fragments that fit the gap
Measure coordinates and orientation of backbone
on either side of gap
Search for fragments that can fit
Best loop gives no steric clash with structure
Ab Initio
Generate random loop
No clash with nearby side-chains
φ and ψ angles in acceptable region of
Ramachandran plot

24
Side Chain Refinement

Need to model side-chains where these differ from
aligned template sequence
Search database for all occurrences of given
side-chain in backbone conformation and minimal
clash with neighbouring residues
Computationally prohibitive
Library of rotamers
Collection of conformations for each residue that
are most often observed in structure database
Select rotamer with conformation that best fits
backbone
Minimal interference with neighbouring
side-chains
SCWRL

25
Model Refinement using Energy Function

After loop modeling and side-chain refinement the
following remain
Unfavourable torsion angles
Unacceptable proximity of atoms
Use energy minimization to alleviate such
problems
Limit number of iterations (<100) to ensure that
the entire model does not drift away from the
template
Molecular dynamics can be used to search for a
global minimum
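The iteration cap mentioned above can be illustrated with a capped steepest-descent loop. This is a toy: real refinement uses a molecular mechanics force field, whereas the energy here is a simple harmonic well chosen only so the example runs. The step size and tolerance are arbitrary.

```python
def minimize(coords, gradient, step=0.1, max_iter=99, tol=1e-6):
    """Steepest descent with a hard cap on the number of iterations.

    coords:   starting coordinates (list of floats)
    gradient: function returning the energy gradient at given coordinates
    max_iter: kept below 100 so the model cannot move far from its start
    """
    x = list(coords)
    for _ in range(max_iter):
        g = gradient(x)
        if max(abs(gi) for gi in g) < tol:
            break  # converged before hitting the cap
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def harmonic_gradient(x):
    """Gradient of the toy energy E = sum(x_i ** 2)."""
    return [2.0 * xi for xi in x]


one_step = minimize([1.0, -2.0], harmonic_gradient, max_iter=1)
relaxed = minimize([1.0, -2.0], harmonic_gradient)
print(one_step, relaxed)
```

With `max_iter=1` the coordinates move only slightly, which is the behaviour the cap is meant to enforce on a real model.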

Model Evaluation

Check consistency in φ-ψ angles
Bond lengths
Close contacts
Flag regions below acceptability threshold
Procheck
WHATIF
ANOLEA
Verify3D

26
Comprehensive Modeling Programs

Modeller
Swiss-Model
3D-Jigsaw

27
Threading and Fold Recognition

Pairwise Energy Method
Fit sequence to each fold in database
Use local alignment to improve fit
Calculate energies
Pairwise residue interaction
Solvation Hydrophobic
Profile Method
Fit sequence to fold
Calculate propensity of each amino acid to be
present at each profile position
Secondary structure types
Solvent exposure
Hydrophobicity
Use structure fold that best fits profile of
parameters

28
Ab Initio Prediction
Proteins fold into a native, low-energy state
The mechanism driving this process is poorly understood
Computationally untenable to explore all possible states and calculate energies
A 40 residue peptide will require 10^20 years to calculate all states using a 1 × 10^12 FLOPS computer
Not a realistic approach currently
