Protein Secondary Structures - PowerPoint PPT Presentation




Provided by: joha96


Transcript and Presenter's Notes

Title: Protein Secondary Structures

1
Protein Secondary Structures

Assignment and prediction

2
Secondary Structure Elements
β-strand
3
Use of secondary structure

Classification of protein structures
Definition of loops (active sites)
Use in fold recognition methods
Improvements of alignments
Definition of domain boundaries

4
Classification of secondary structure

Defining features
Dihedral angles
Hydrogen bonds
Geometry
Assigned manually by crystallographers or
Automatic
DSSP (Kabsch & Sander, 1983)
STRIDE (Frishman & Argos, 1995)
DSSPcont (Andersen et al., 2002)

5
Dihedral Angles
6
Helices
                            phi(deg)  psi(deg)  H-bond pattern
----------------------------------------------------------------
right-handed alpha-helix      -57.8     -47.0   i+4
pi-helix                      -57.1     -69.7   i+5
3-10 helix                    -74.0      -4.0   i+3
----------------------------------------------------------------
(omega is 180 deg in all cases)
From http://www.imb-jena.de
7
Beta Strands
Hydrogen bond patterns in beta sheets. Here a
four-stranded beta sheet is drawn schematically
which contains three antiparallel and one
parallel strand. Hydrogen bonds are indicated
with red lines (antiparallel strands) and green
lines (parallel strands) connecting the hydrogen
and receptor oxygen.
8
Secondary Structure Elements
β-strand
9
Helix formation is local
THYROID hormone receptor (2nll)
10
β-sheet formation is NOT local
11
Secondary Structure Type Descriptions
12
Automatic assignment programs

DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
DSSPcont ( http://cubic.bioc.columbia.edu/services/DSSPcont/ )

  # RESIDUE AA STRUCTURE BP1  BP2  ACC   N-H-->O   O-->H-N   N-H-->O   O-->H-N     TCO  KAPPA  ALPHA    PHI    PSI   X-CA   Y-CA   Z-CA
  1     4 A E              0    0  205    0, 0.0    2,-0.3    0, 0.0    0, 0.0   0.000  360.0  360.0  360.0  113.5    5.7   42.2   25.1
  2     5 A H        -     0    0  127    2, 0.0    2,-0.4   21, 0.0   21, 0.0  -0.987  360.0 -152.8 -149.1  154.0    9.4   41.3   24.7
  3     6 A V        -     0    0   66   -2,-0.3   21,-2.6    2, 0.0    2,-0.5  -0.995    4.6 -170.2 -134.3  126.3   11.5   38.4   23.5
  4     7 A I  E     -A   23   0A  106   -2,-0.4    2,-0.4   19,-0.2   19,-0.2  -0.976   13.9 -170.8 -114.8  126.6   15.0   37.6   24.5
  5     8 A I  E     -A   22   0A   74   17,-2.8   17,-2.8   -2,-0.5    2,-0.9  -0.972   20.8 -158.4 -125.4  129.1   16.6   34.9   22.4
  6     9 A Q  E     -A   21   0A   86   -2,-0.4    2,-0.4   15,-0.2   15,-0.2  -0.910   29.5 -170.4  -98.9  106.4   19.9   33.0   23.0
  7    10 A A  E     +A   20   0A   18   13,-2.5   13,-2.5   -2,-0.9    2,-0.3  -0.852   11.5  172.8 -108.1  141.7   20.7   31.8   19.5
  8    11 A E  E     +A   19   0A   63   -2,-0.4    2,-0.3   11,-0.2   11,-0.2  -0.933    4.4  175.4 -139.1  156.9   23.4   29.4   18.4
  9    12 A F  E     -A   18   0A   31    9,-1.5    9,-1.8   -2,-0.3    2,-0.4  -0.967   13.3 -160.9 -160.6  151.3   24.4   27.6   15.3
 10    13 A Y  E     -A   17   0A   36   -2,-0.3    2,-0.4    7,-0.2    7,-0.2  -0.994   16.5 -156.0 -136.8  132.1   27.2   25.3   14.1
 11    14 A L  E  >> -A   16   0A   24    5,-3.2    4,-1.7   -2,-0.4    5,-1.3  -0.929   11.7 -122.6 -120.0  133.5   28.0   24.8   10.4
 12    15 A N  T  45S+     0    0   54   -2,-0.4   -2, 0.0    2,-0.2    0, 0.0  -0.884   84.3    9.0 -113.8  150.9   29.7   22.0    8.6
 13    16 A P  T  45S+     0    0  114    0, 0.0   -1,-0.2    0, 0.0   -2, 0.0  -0.963  125.4   60.5  -86.5    8.5   32.0   21.6    6.8
 14    17 A D  T  45S-     0    0   66    2,-0.1   -2,-0.2    1,-0.1    3,-0.1   0.752   89.3 -146.2  -64.6  -23.0   33.0   25.2    7.6
 15    18 A Q  T  <5 +     0    0  132   -4,-1.7    2,-0.3    1,-0.2   -3,-0.2   0.936   51.1  134.1   52.9   50.0   33.3   24.2   11.2
 16    19 A S  E   < +A   11   0A   44   -5,-1.3   -5,-3.2    2, 0.0    2,-0.3  -0.877   28.9  174.9 -124.8  156.8   32.1   27.7   12.3
 17    20 A G  E     -A   10   0A   28   -2,-0.3    2,-0.3   -7,-0.2   -7,-0.2  -0.893   15.9 -146.5 -151.0 -178.9   29.6   28.7   14.8
 18    21 A E  E     -A    9   0A   14   -9,-1.8   -9,-1.5   -2,-0.3    2,-0.4  -0.979    5.0 -169.6 -158.6  146.0   28.0   31.5   16.7
 19    22 A F  E     +A    8   0A    3   12,-0.4   12,-2.3   -2,-0.3    2,-0.3  -0.982   27.8  149.2 -139.1  120.3   26.5   32.2   20.1
 20    23 A M  E     -AB   7  30A    0  -13,-2.5  -13,-2.5   -2,-0.4    2,-0.4  -0.983   39.7 -127.8 -152.1  161.6   24.5   35.4   20.6
 21    24 A F  E     -AB   6  29A   45    8,-2.4    7,-2.9   -2,-0.3    8,-1.0  -0.934   23.9 -164.1 -112.5  137.7   21.7   37.0   22.6
 22    25 A D  E     -AB   5  27A    6  -17,-2.8  -17,-2.8   -2,-0.4    2,-0.5  -0.948    6.9 -165.0 -123.7  138.3   18.9   38.9   20.8
 23    26 A F  E >  S-AB   4  26A   76    3,-3.5    3,-2.1   -2,-0.4  -19,-0.2  -0.947   78.4  -27.2 -127.3  111.5   16.4   41.3   22.3
 24    27 A D  T 3  S-     0    0   74  -21,-2.6  -20,-0.1   -2,-0.5   -1,-0.1   0.904  128.9  -46.6   50.4   45.0   13.4   42.1   20.2
 25    28 A G  T 3  S+     0    0   20  -22,-0.3    2,-0.4    1,-0.2   -1,-0.3   0.291  118.8  109.3   84.7  -11.1   15.4   41.4   17.0
 26    29 A D  E <  S-B   23   0A  114   -3,-2.1   -3,-3.5  109, 0.0    2,-0.3  -0.822   71.8 -114.7 -103.1  140.3   18.4   43.4   18.1
 27    30 A E  E     -B   22   0A    8   -2,-0.4   -5,-0.3   -5,-0.2    3,-0.1  -0.525   24.9 -177.7  -74.1  127.5   21.8   41.8   19.1
13
Prediction of protein secondary structure

What to predict?
How to predict?
How good are the best?

14
Secondary Structure Prediction

What to predict?
All 8 types or pool types into groups

DSSP
H = alpha helix
G = 3-10 helix
I = 5 helix (pi helix)
E = extended strand
B = beta-bridge
T = hydrogen bonded turn
S = bend
C = coil
15
Secondary Structure Prediction

What to predict?
All 8 types or pool types into groups

Straight HEC
H: H alpha helix
E: E extended strand
C: T hydrogen bonded turn, S bend, C coil, G 3-10 helix, I 5 helix (pi helix), B beta-bridge
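The straight H/E/C pooling on this slide can be sketched as a simple lookup. This is a minimal illustration, not code from the presentation: the function name `reduce_to_hec` is made up here, and treating blank, "." or "-" symbols as coil is an assumption.

```python
# Straight pooling of the 8 DSSP states into 3 classes:
# H stays H, E stays E, everything else becomes C.
DSSP_TO_HEC = {
    "H": "H",
    "E": "E",
    "T": "C", "S": "C", "C": "C", "G": "C", "I": "C", "B": "C",
}


def reduce_to_hec(dssp: str) -> str:
    """Map a DSSP string to H/E/C (hypothetical helper name)."""
    # Assumption: any symbol outside the table (blank, '.', '-') is coil.
    return "".join(DSSP_TO_HEC.get(state, "C") for state in dssp)


print(reduce_to_hec("HHHHHHHHHHHHTGGG."))
```

Other pooling schemes exist (for example, counting G as helix or B as strand), and the choice changes reported accuracies, so the scheme should be stated whenever numbers are compared.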
16
Secondary Structure Prediction

Simple alignments
Align to a close homolog for which the structure
has been experimentally solved.
Heuristic Methods (e.g., Chou-Fasman, 1974)
Apply scores for each amino acid and sum up over a
window.
Neural Networks (different inputs)
Raw Sequence (late 80s)
Blosum matrix (e.g., PHD, early 90s)
Position specific alignment profiles (e.g.,
PsiPred, late 90s)
Multiple networks balloting, probability
conversion, output expansion (Petersen et al.,
2000).

17
The pessimistic point of view: Prediction by
alignment
18
Secondary structure predictions of 1st and 2nd
generation

single residues (1st generation)
Chou-Fasman, GOR: 1957-70/80, 50-55% accuracy
segments (2nd generation)
GORIII: 1986-92, 55-60% accuracy
problems
< 100%: they said 65% max
< 40%: they said strand non-local
short segments
19
Improvement of accuracy
20
Simple Alignments

Solved structure of a homolog to query is needed
Homologous proteins have 88% identical (3-state)
secondary structure
If no close homologue can be identified
alignments will give almost random results

21
Amino acid preferences in α-helix
22
Amino acid preferences in β-strand
23
Amino acid preferences in coil
24
Chou-Fasman
25
Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(α-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(α-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(α-helix) > P(β-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(β-sheet) > 100. That region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(β-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(β-sheet) > 105 and the average P(β-sheet) > P(α-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(α-helix) > P(β-sheet) for that region. It is a beta-sheet if the average P(β-sheet) > P(α-helix) for that region.
6. To identify a bend at residue number j, calculate the value p(t) = f(j) f(j+1) f(j+2) f(j+3), where the f(j+1) value for the j+1 residue is used, the f(j+2) value for the j+2 residue is used and the f(j+3) value for the j+3 residue is used. If (1) p(t) > 0.000075, (2) the average value for P(turn) > 1.00 in the tetra-peptide and (3) the averages for the tetra-peptide obey the inequality P(α-helix) < P(turn) > P(β-sheet), then a beta-turn is predicted at that location.
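The helix nucleation and extension in step 2 might be sketched as follows. This is only one reading of the rule, under stated assumptions: the function name `helix_mask` is hypothetical, the extension test looks at the four residues ending at the candidate position, the comparison with the strand propensity and the length check are left out, and the two-letter propensity table is a toy stand-in for a full parameter set.

```python
from statistics import mean


def helix_mask(seq, p_helix, window=6, min_formers=4, cutoff=100):
    """Mark nucleated and extended helix regions with 'H', the rest with '-'."""
    scores = [p_helix[aa] for aa in seq]
    n = len(scores)
    helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation: at least 4 of 6 contiguous residues with P > 100.
        formers = sum(s > cutoff for s in scores[start:start + window])
        if formers < min_formers:
            continue
        left, right = start, start + window  # half-open interval [left, right)
        # Extend to the right while the tetrapeptide ending at the next
        # residue still averages at least the cutoff (assumed reading).
        while right < n and mean(scores[right - 3:right + 1]) >= cutoff:
            right += 1
        # Extend to the left with the mirrored test.
        while left > 0 and mean(scores[left - 1:left + 3]) >= cutoff:
            left -= 1
        for i in range(left, right):
            helix[i] = True
    return "".join("H" if h else "-" for h in helix)


# Toy table (assumption): one strong helix former, one helix breaker.
TOY_P_HELIX = {"A": 142, "G": 57}
print(helix_mask("GGGGGAAAAAAAAGGGGG", TOY_P_HELIX))
```

Because a nucleus only needs 4 formers out of 6, the predicted helix spills a couple of residues into the flanking region, which is one source of the low accuracy of this kind of window rule.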
26
Chou-Fasman

Generally applicable
Works for sequences with no solved homologs
But the accuracy is low!

27
Neural Networks

Benefits
Generally applicable
Can capture higher order correlations
Inputs other than sequence information
Drawbacks
Needs a lot of data (different solved structures).
However, these do exist today (nearly 2500
solved structures with low sequence identity and
high resolution).
Complex method with several pitfalls

28
How is it done?

One network (SEQ2STR) takes sequence (profiles)
as input and predicts secondary structure
Cannot deal with SS elements as a whole, e.g.,
that helices are normally formed by at least 5
consecutive amino acids
Second network (STR2STR) takes predictions of
first network and predicts secondary structure
Can correct for errors in SS elements, e.g.,
remove single-residue helix predictions and
mixtures of strand and helix predictions

29
Architecture
30
Secondary networks (Structure-to-Structure)
31
Example

PITKEVEVEYLLRRLEE (Sequence)
HHHHHHHHHHHHTGGG. (DSSP)
ECCCHEEHHHHHHHCCC (SEQ2STR)
CCCCHHHHHHHHHHCCC (STR2STR)
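To put numbers on this example, the per-residue three-state accuracy (commonly called Q3) of each prediction can be computed against the DSSP line. This is a sketch under assumptions: the helper names are made up, DSSP states are pooled with the straight H/E/C scheme, and "." is treated as coil.

```python
def to_hec(dssp):
    """Straight pooling: keep H and E, map every other symbol to C (assumption)."""
    return "".join(s if s in "HE" else "C" for s in dssp)


def q3(observed_dssp, predicted_hec):
    """Fraction of residues whose predicted class matches the pooled DSSP class."""
    if len(observed_dssp) != len(predicted_hec):
        raise ValueError("observed and predicted strings must have equal length")
    observed = to_hec(observed_dssp)
    hits = sum(o == p for o, p in zip(observed, predicted_hec))
    return hits / len(observed)


dssp = "HHHHHHHHHHHHTGGG."
seq2str = "ECCCHEEHHHHHHHCCC"
str2str = "CCCCHHHHHHHHHHCCC"
print(f"SEQ2STR Q3 = {q3(dssp, seq2str):.2f}")
print(f"STR2STR Q3 = {q3(dssp, str2str):.2f}")
```

On these 17 residues the first network gets 9 right and the second 11, so the structure-to-structure pass helps here mainly by removing the isolated strand predictions inside the helix.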

32
PHD method (Rost and Sander)

Combine neural networks with sequence profiles
6-8 percentage points increase in prediction
accuracy over standard neural networks
Use a second-layer structure-to-structure network
to filter predictions
Jury of predictors
Set up as mail server

33
Sequence profiles
38
Prediction accuracy PHD
39
Stronger predictions more accurate!
40
PSI-Pred (Jones)

Use alignments from iterative sequence searches
(PSI-Blast) as input to a neural network
Better predictions due to better sequence
profiles
Available as a stand-alone program and via the web

41
Petersen et al. 2000

SEQ2STR (>70 networks)
Not one single network architecture is best for
all sequences
STR2STR (>70 networks)
> 4900 network predictions
Others have 1
ACT2PROB (not used by others)
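One simple way to combine many network outputs is to average the per-residue class probabilities and pick the most probable class. The sketch below illustrates that idea only; it is an assumption and not necessarily the balloting scheme used by Petersen et al., and the function name and input layout are hypothetical.

```python
def average_ballot(predictions):
    """Combine per-network outputs by averaging class probabilities.

    predictions: one list per network, each holding one (pH, pE, pC)
    tuple per residue. Returns the consensus H/E/C string.
    """
    n_nets = len(predictions)
    n_res = len(predictions[0])
    consensus = []
    for i in range(n_res):
        # Average each of the three class probabilities over all networks.
        avg = [sum(net[i][k] for net in predictions) / n_nets for k in range(3)]
        consensus.append("HEC"[avg.index(max(avg))])
    return "".join(consensus)


# Two toy networks, two residues (made-up numbers).
net1 = [(0.6, 0.1, 0.3), (0.2, 0.5, 0.3)]
net2 = [(0.5, 0.2, 0.3), (0.1, 0.3, 0.6)]
print(average_ballot([net1, net2]))
```

In the toy input the two networks disagree on the second residue (strand versus coil), and the averaged probabilities settle the vote in favor of coil.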

42
Why so many networks?
43
Why not select the best?
44
Prediction accuracy (Q3 = 81.2%). 2006. (Petersen
et al. 2000)
45
Spectrin SH3 (Src homology 3) domain
CEEEEEEECCCCCCCCCCCCCCCCEEEEEECCCCCEEEEEECCCEEEECCCCCEECC
.EEEEESS.B...STTB..B.TT.EEEEEE..SSSEEEEEETTEEEEEEGGGEEE..
Petersen
93
46
False prediction for engineered proteins!
47
Benchmarking secondary structure predictions

CASP
Critical Assessment of Structure Predictions
Sequences from about-to-be-deposited-structures
are given to groups who submit their predictions
before the structure is published
Every second year
EVA
Newly solved structures are sent to prediction
servers.
Every week

48
EVA results (Rost et al., 2001)

PROFphd 77.0
PSIPRED 76.8
SAM-T99sec 76.1
SSpro 76.0
Jpred2 75.5
PHD 71.7
Cubic.columbia.edu/eva

49
EVA secondary structure
76
50
Prediction of protein secondary structure

1980: 55%, simple
1990: 60%, less simple
1993: 70%, evolution
2000: 76%, more evolution
2006: 80%, more evolution
What is the limit?
88% for proteins of similar structure
80% for 1/5th of proteins with families > 100
missing through better definition of secondary
structure including long-range interactions
structural switches
chameleon / folding

51
Links to servers