Protein families and structure prediction


Provided by: mou68


1
Protein families and structure prediction

Classification of proteins by sequence
similarity
Prediction of 2D and 3D structure from amino
acid sequence

2
Protein classification and structure prediction

Protein classification schemes include the
following:
family (>50% identity) and superfamily
(significant identity but <<50%), based on
sequence alignments
domain classification (Pfam, InterPro), based on
local alignments of domains
global clustering analysis, based on degree of
similarity or significance of alignment score
following pairwise alignment

3
Maximal linkage clustering

Sequence similarity (%)

seq    2    3    4    5    6    7
1     70   60   23   28   26   17
2          75   20   30   24   20
3               24   25   22   15
4                    65   53   13
5                         60   12

Find the largest cluster that maximizes the average
score. For cluster {1,2,3}, ave. = (70+60+75)/3 = 68,
but for {1,2,3,4}, ave. = (70+60+75+23+20+24)/6 = 45,
so don't add 4 to {1,2,3}. Seq. 7 is an orphan.

[Figure: linkage diagram with sequences 1, 2, 3 in one cluster, 4, 5, 6 in another, and 7 unlinked]
Strong link = high sequence similarity or very
low E value
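
The averaging step on this slide can be checked with a short sketch. It assumes the cluster score is the mean of all pairwise similarities inside the cluster, which matches the arithmetic shown here; the names `SIMILARITY` and `average_similarity` are hypothetical, and the 6-7 value is left out because the slide does not give it.

```python
from itertools import combinations

# Pairwise percent similarities copied from this slide.
# The 6-7 value is not given on the slide, so it is absent here.
SIMILARITY = {
    (1, 2): 70, (1, 3): 60, (1, 4): 23, (1, 5): 28, (1, 6): 26, (1, 7): 17,
    (2, 3): 75, (2, 4): 20, (2, 5): 30, (2, 6): 24, (2, 7): 20,
    (3, 4): 24, (3, 5): 25, (3, 6): 22, (3, 7): 15,
    (4, 5): 65, (4, 6): 53, (4, 7): 13,
    (5, 6): 60, (5, 7): 12,
}


def average_similarity(cluster):
    """Mean similarity over every pair of sequences in the cluster."""
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        raise ValueError("a cluster needs at least two sequences")
    return sum(SIMILARITY[pair] for pair in pairs) / len(pairs)


if __name__ == "__main__":
    for cluster in ({1, 2, 3}, {1, 2, 3, 4}, {4, 5, 6}):
        print(sorted(cluster), round(average_similarity(cluster), 1))
```

Adding sequence 4 to {1,2,3} drags the average down sharply, which is why the slide stops the cluster at three members.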

4
Flow chart for structure prediction
[Flow chart boxes; the question boxes branch on yes/no]
Protein sequence
Database similarity search
Protein family, domain, cluster analysis
Does sequence align with protein of known 3D
structure?
3D comparative modeling
Predicted three-dimensional structure
Relationship to known structure?
3D analysis in laboratory
Is there a predicted structure?
Structural analysis
5
Secondary structure prediction
A sliding window of 13-17 residues moves along the
sequence and tries to predict the secondary
structure of the amino acid in the middle of the
window

METHODS

scoring the types of amino acids in the window
(Chou-Fasman and GOR methods)
neural networks
nearest neighbor methods
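
A minimal sketch of the sliding window itself, assuming an odd window size so that there is a single middle residue. The function name `sliding_windows` and the default size of 15 are illustrative choices, not taken from the slides.

```python
def sliding_windows(sequence, size=15):
    """Yield (center_index, window) for every full window of odd length."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a middle residue")
    half = size // 2
    # Residues closer than `half` to either end have no full window.
    for center in range(half, len(sequence) - half):
        yield center, sequence[center - half:center + half + 1]


if __name__ == "__main__":
    for center, window in sliding_windows("MKTAYIAKQRQISFVKSHFSRQ", size=13):
        print(center, window, "middle residue:", window[len(window) // 2])
```

Each of the methods listed here would take one such window as input and return a predicted state for its middle residue.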

6
Chou Fasman and GOR

The secondary structure of the middle amino acid is
scored in sequence windows from known structures
A scoring system is made for each amino acid in each
type of structure (helix, strand, or loop), e.g.
score (V in helices) = freq of V in windows within
helices / freq of all amino acids in helices
Rules are used to decide what is predicted for a
particular segment: a series of residues with the
same predicted state is needed
7
Neural networks, e.g. PHD
Used for general secondary structure prediction
and for prediction of protein class, e.g. membrane
proteins

The input layer is the sliding window plus any
homologous sequences
The network is trained on sequences of known
structure by adjusting weights
The hidden layer can detect correlations within
the sequence window
The output layer is fed into another network that
keeps track of sequential predictions

8
Nearest neighbor methods

Make a table of sequence windows from known
structures, noting the structure of the middle
amino acid in each window
Find the 50 best alignments of the window in the
test sequence with this table
Score the frequencies of helix, strand, loop in the
middle amino acid position
Scan the sequence for series of high-scoring
predictions

Example from the 50 matching windows
(h = helix, s = strand, l = loop), states shown:
h h h l h s h h
Looks like the middle amino acid should be in a helix
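
The frequency-scoring step can be sketched as a simple vote over the middle-residue states of the matching windows. This assumes an unweighted vote; real nearest neighbor predictors may weight each match by its alignment score. The function name is hypothetical.

```python
from collections import Counter


def predict_middle_state(neighbor_states):
    """Return the most frequent state and the frequency of every state."""
    counts = Counter(neighbor_states)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no matching windows were supplied")
    frequencies = {state: n / total for state, n in counts.items()}
    best = max(frequencies, key=frequencies.get)
    return best, frequencies


if __name__ == "__main__":
    # The eight states listed on this slide.
    best, frequencies = predict_middle_state("hhhlhshh")
    print(best, frequencies)
```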
9
Zinc finger
One of the most commonly predicted structures
because of the conserved pattern of Cs and Hs.
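
A conserved pattern of this kind can be searched for with a regular expression. The sketch below uses a pattern modeled on the PROSITE C2H2 zinc finger signature, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. Treat the exact spacing as an assumption to check against PROSITE. The function name and the test sequence are hypothetical.

```python
import re

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H written as a regular expression.
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(sequence):
    """Return (start, matched_text) for each non-overlapping C2H2-like match."""
    return [(m.start(), m.group()) for m in C2H2.finditer(sequence)]


if __name__ == "__main__":
    # Invented sequence built piece by piece so the spacing is easy to read.
    finger = "C" + "PE" + "C" + "GKS" + "F" + "SQKSNLQK" + "H" + "QRT" + "H"
    print(find_c2h2("MA" + finger + "TG"))
```

A pattern match only flags a candidate; it says nothing about whether the region actually folds around a zinc ion.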
10
A leucine zipper is formed by a repeat of leucines
at every second turn (every seventh residue) of two
antiparallel alpha helices. The helices bond to each
other and can attach to DNA, mediate protein-protein
interactions, or form a coiled coil.
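
Because the leucines recur every seventh residue, a candidate zipper can be found with a simple pattern. The sketch assumes four leucines in a row at that spacing, L-x(6)-L-x(6)-L-x(6)-L, as in the PROSITE leucine zipper pattern. The function name and sequences are hypothetical.

```python
import re

# Four leucines, each seven residues after the previous one.
# The lookahead allows overlapping candidates to be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence):
    """Return the start index of every L-x(6)-L-x(6)-L-x(6)-L occurrence."""
    return [m.start() for m in LEUCINE_ZIPPER.finditer(sequence)]


if __name__ == "__main__":
    # Invented sequence with leucines at positions 2, 9, 16 and 23.
    print(find_leucine_zippers("MK" + "LAAAAAA" * 3 + "L" + "GG"))
```

This pattern is short and matches many proteins by chance, so a hit is only a hint that needs a helix prediction to back it up.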
11
Three dimensional prediction

Hidden Markov models for a few families and
classes
Threading using structural profiles
Threading by the contact potential method
New method - the I chain method

12
hidden Markov models
[Figure: hidden Markov model. Labeled elements: junction; start of alpha unit; series of match states for an alpha helix; end of alpha unit; variable series of match states for a loop; variable series of match states for a beta turn. Transition probabilities: P1, 1 - P1, P2, 1 - P2, P3, 1 - P3.]
This diagram illustrates how to match a sequence
to a model of a set of proteins that have both
sequence and structural similarities. Only part
of the model is shown.
13
Threading a sequence through structural core
models
First, prepare a series of core models
represented by scoring matrices, HMMs, or some
other model that represents the whole core
sequence.

Thread the test sequence through the cores to
find a good match.
This is achieved by aligning the sequence with the
models and screening for a high score or
probability.

14
Structural or 3D profile method

Determine structural parameters for each amino
acid in a core
The parameters include neighbor geometry and
closeness, chemical environment, hydrophobicity,
secondary structures of nearby amino acids, etc.
Based on this analysis, each amino acid in the
core is assigned to one of 18 environmental types
The ability of each of the other 19 amino acids
to fit into this environment is determined
A scoring matrix with gap penalties (a profile)
is then made for each core based on the above.
A test sequence is aligned with the profiles by
dynamic programming
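
The final alignment step can be sketched as a global dynamic programming alignment of a sequence against a position-specific profile. This assumes a single linear gap penalty and a default score for residues a profile column does not list. The profile values below are invented for illustration and are not real 3D-profile scores.

```python
def align_to_profile(sequence, profile, gap=-2.0, default=-1.0):
    """Best global alignment score of a sequence against a profile.

    `profile` is a list with one dict per core position, mapping an
    amino acid to its score at that position.
    """
    n, m = len(sequence), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fit = profile[j - 1].get(sequence[i - 1], default)
            score[i][j] = max(
                score[i - 1][j - 1] + fit,  # residue placed at this position
                score[i - 1][j] + gap,      # residue left unaligned
                score[i][j - 1] + gap,      # profile position skipped
            )
    return score[n][m]


if __name__ == "__main__":
    toy_profile = [{"A": 2}, {"L": 3}, {"V": 1}]  # hypothetical scores
    print(align_to_profile("ALV", toy_profile))
    print(align_to_profile("AV", toy_profile))
```

Running the same sequence against every core profile and ranking the scores gives the screening step described for threading.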

15
Contact potential method

method is like the distance matrix method for
aligning structures

the new sequence is superimposed on the 2D
representation of each structure
the object is to fit the amino acids so
that the distances between adjacent ones are
suitable for van der Waals contacts
this is a form of energy minimization, which is
itself still undergoing development
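
One simple way to score such a fit is to sum a pairwise contact potential over all residue pairs that sit close together in the template structure. The sketch below assumes one coordinate per residue, a 7.0 angstrom contact cutoff, and a toy potential table; all three are assumptions made for illustration, not values from the slides.

```python
import math


def contact_energy(sequence, coords, potential, cutoff=7.0, min_separation=2):
    """Sum pair potentials over residues in contact in the template.

    `coords` holds one (x, y, z) point per template position, and
    `potential` maps an alphabetically sorted residue pair to an energy.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    energy = 0.0
    for i in range(len(sequence)):
        # Skip chain neighbors, which are always close together.
        for j in range(i + min_separation, len(sequence)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pair = tuple(sorted((sequence[i], sequence[j])))
                energy += potential.get(pair, 0.0)
    return energy


if __name__ == "__main__":
    square = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
    toy_potential = {("L", "L"): -1.0, ("L", "V"): -0.5}  # invented values
    print(contact_energy("LAVL", square, toy_potential))
```

A lower total means the threaded sequence places favorable residue pairs in contact, so the template with the lowest energy is the best candidate fold.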

16
The Rosetta Method Using I chains - David Baker
Laboratory, http://depts.washington.edu/bmsdwp/

Structural similarity is conserved more strongly
than sequence similarity
Search the same 3D structural folds for distant
sequence similarities
These are found as short patterns of 3-15
amino acids called I chains

17
The Rosetta Method

Rosetta is based on a picture of protein folding
in which local sequence segments rapidly
alternate between different possible local
structures, and folding occurs when the
conformations and relative orientations of these
local segments combine to form low energy global
structures. D. Baker

18
Rosetta Method

The I chains form a series of structures close
to the optimal, most energetically favorable one
For structure prediction, the object is to search
through a sequence for matches to I chains and
then generate a best local match
Sequences are searched for matches in a window of 9
amino acids to see if they are represented
A compatible 3D model is then produced
Rosetta was the best predictor at the CASP4
meeting

19
Summary of structure prediction

The most important predictor is sequence
similarity, even very distant sequence similarity
Some structures are readily predictable, e.g.
membrane-spanning helices, whereas others are much
more difficult to predict
Alpha helices are quite accurately predicted
All methods suffer from a memory of the training
set: the models are overtrained
A truly blind experiment, the CASP meetings, has
revealed that none of the methods works
particularly well on some new structures;
Rosetta is best so far