Protein families and structure prediction

Provided by: mou68


1
Protein families and structure prediction
  • Classification of proteins by sequence
    similarity
  • Prediction of 2D and 3D structure from amino
    acid sequence

2
Protein classification and structure prediction
  • Protein classification schemes include the
    following
  • family (>50% identity) and superfamily
    (significant identity but <<50%) based on
    sequence alignments
  • domain classification (Pfam, InterPro) based on
    local alignments of domains
  • global clustering analysis based on degree of
    similarity or significance of alignment score
    following pair-wise alignment

3
Maximal linkage clustering
  • sequence similarity (%)

    seq     2    3    4    5    6    7
    1      70   60   23   28   26   17
    2           75   20   30   24   20
    3                24   25   22   15
    4                     65   53   13
    5                          60   12

Find the largest cluster that maximizes the average
score. For cluster 1,2,3 the average is
(70 + 60 + 75)/3 ≈ 68, but for 1,2,3,4 it is
(70 + 60 + 75 + 23 + 20 + 24)/6 ≈ 45, so don't add
4 to 1,2,3. Seq. 7 is an orphan.
[Diagram: nodes 1, 2, 3 grouped; nodes 5, 6, 4 grouped; node 7 alone.]
Strong link = high sequence similarity or very
low E value
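The averaging step on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a full clustering program; the dictionary `SIM` and the function `average_score` are names introduced here for illustration, and the 6-7 similarity is left out because the transcript does not give it.

```python
from itertools import combinations

# Pairwise percent similarities copied from the slide's table.
# The 6-7 value is not given in the transcript, so it is omitted.
SIM = {
    (1, 2): 70, (1, 3): 60, (1, 4): 23, (1, 5): 28, (1, 6): 26, (1, 7): 17,
    (2, 3): 75, (2, 4): 20, (2, 5): 30, (2, 6): 24, (2, 7): 20,
    (3, 4): 24, (3, 5): 25, (3, 6): 22, (3, 7): 15,
    (4, 5): 65, (4, 6): 53, (4, 7): 13,
    (5, 6): 60, (5, 7): 12,
}

def average_score(cluster):
    """Mean similarity over all pairs of sequences in the cluster."""
    pairs = list(combinations(sorted(cluster), 2))
    return sum(SIM[pair] for pair in pairs) / len(pairs)

print(round(average_score({1, 2, 3})))     # (70 + 60 + 75) / 3
print(round(average_score({1, 2, 3, 4})))  # average over all six pairs
```

Adding sequence 4 drags the average down sharply, which is the slide's reason for keeping it out of the 1,2,3 cluster.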

4
Flow chart for structure prediction
[Flow chart. Boxes: Protein sequence; Database
similarity search; Protein family, domain, cluster
analysis; Does sequence align with protein of known
3D structure?; Relationship to known structure?;
Structural analysis; Is there a predicted
structure?; 3D comparative modeling; Predicted
three-dimensional structure; 3D analysis in
laboratory. Decision boxes branch on yes/no.]
5
Secondary structure prediction
A sliding window of 13-17 residues moves along the
sequence and tries to predict the secondary
structure of the amino acid in the middle of the
window.
METHODS
  • score types of amino acids in window: Chou-Fasman
    and GOR methods
  • neural networks
  • nearest neighbor methods
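All three method families share the sliding-window step. A minimal sketch of that step in Python follows; the function name `sliding_windows`, the default width of 15, and the example sequence are assumptions made for illustration.

```python
def sliding_windows(sequence, width=15):
    """Yield (center_index, window) for every full window of odd width."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a middle residue")
    half = width // 2
    for center in range(half, len(sequence) - half):
        yield center, sequence[center - half:center + half + 1]

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
for center, window in sliding_windows(seq, width=13):
    # A predictor would assign a state to seq[center] based on `window`.
    print(center, seq[center], window)
```

Residues closer than half a window to either end never sit in the middle of a full window, so a real predictor needs some rule (padding, shorter windows) for the termini.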

6
Chou-Fasman and GOR
  • The secondary structure of the middle amino acid
    is scored in sequence windows taken from known
    structures
  • A scoring system is made for each amino acid in
    each type of structure (helix, strand, or loop):
    score (V in helices) = freq of V in windows within
    helices / freq of all amino acids in helices
  • Rules are used to decide what is predicted for a
    particular segment; a series of the same score
    is needed
7
Neural networks e.g. PHD
Used for general secondary structure prediction
and for prediction of protein class e.g. membrane
proteins
  • input layer is the sliding window plus any
    homologous sequences
  • the network is trained on sequences of known
    structure by adjusting weights
  • the hidden layer can detect correlations within
    sequence window
  • output layer fed into another network that keeps
    track of sequential predictions
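The input, hidden, and output layers can be sketched as a single forward pass. This is not PHD: it uses one sequence rather than homologous sequences, it omits the second network, and the weights are random rather than trained. All names and layer sizes are assumptions chosen for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("helix", "strand", "loop")

def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 inputs per residue."""
    return [1.0 if aa == ref else 0.0 for aa in window for ref in AMINO_ACIDS]

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias for each unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(window, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: window -> hidden layer (tanh) -> state probabilities."""
    hidden = [math.tanh(h) for h in dense(one_hot(window), w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))

def random_layer(rng, n_units, n_inputs):
    """Small random weights and zero biases; training would adjust these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_units)]
    return weights, [0.0] * n_units

rng = random.Random(0)
window = "MKTAYIAKQRQIS"  # made-up 13-residue window
w_hidden, b_hidden = random_layer(rng, 5, 20 * len(window))
w_out, b_out = random_layer(rng, len(STATES), 5)
probs = predict(window, w_hidden, b_hidden, w_out, b_out)
print(dict(zip(STATES, probs)))
```

With untrained weights the three probabilities are close to uniform; training on windows of known structure is what makes the output informative.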

8
Nearest neighbor methods
  • Make a table of sequence windows from known
    structures noting structure of middle aa in
    window
  • Find 50 best alignments of window in test
    sequence with this table
  • Score frequencies of helix, strand, loop in the
    middle amino acid position.
  • Scan sequence for series of high scoring
    predictions.

[Diagram: 50 matching windows; the middle-position
states shown are h, h, h, l, h, s, h, h.]
Looks like the middle amino acid should be in a helix
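The table lookup and vote can be sketched as follows. Windows are ranked here by a simple count of identical positions, which is a simplifying assumption; published nearest-neighbor methods use more elaborate scoring. The table contents and the function names are hypothetical.

```python
from collections import Counter

def identity(a, b):
    """Number of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b))

def predict_middle(window, table, k=50):
    """Vote on the middle residue's state using the k best-matching windows.

    `table` is a list of (window_string, middle_state) pairs taken from
    known structures.
    """
    ranked = sorted(table, key=lambda entry: identity(window, entry[0]),
                    reverse=True)
    votes = Counter(state for _, state in ranked[:k])
    return votes.most_common(1)[0][0], votes

# Tiny made-up table; a real one would hold many thousands of windows.
table = [
    ("AEELL", "h"), ("AEKLL", "h"), ("GPGSG", "l"),
    ("VIVTV", "s"), ("AEELK", "h"),
]
state, votes = predict_middle("AEELL", table, k=3)
print(state, dict(votes))
```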
9
Zinc finger
One of the most commonly predicted structures,
because of its conserved pattern of cysteines (C)
and histidines (H).
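A conserved C/H pattern of this kind can be searched for with a regular expression. The pattern below, C-x(2,4)-C-x(12)-H-x(3,5)-H, is a simplified form of the classical C2H2 zinc finger spacing and is an assumption made for illustration; curated motif databases use stricter patterns. The test sequence is made up.

```python
import re

# Simplified C2H2 spacing: two cysteines, 12 residues, two histidines.
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(sequence):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2.finditer(sequence)]

print(find_zinc_fingers("YKCPECGKSFSQKSNLQKHQRTHTG"))
```

A match only shows that the spacing of Cs and Hs is compatible with a zinc finger; it is a candidate, not a confirmed structure.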
10
A leucine zipper is formed by leucines repeated
at every second turn of two parallel alpha
helices. The helices associate and can bind DNA,
mediate protein-protein interactions, or form a
coiled coil.
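An alpha helix has about 3.6 residues per turn, so a leucine at every second turn corresponds to a leucine roughly every seventh residue. The sketch below looks for four leucines spaced exactly seven apart. The required number of repeats and the example sequence are assumptions for illustration.

```python
import re

# Four leucines, each seven residues after the previous one.
# The lookahead lets overlapping candidates be reported as well.
HEPTAD_LEUCINES = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def find_leucine_repeats(sequence):
    """Return the start index of every run of four heptad-spaced leucines."""
    return [m.start() for m in HEPTAD_LEUCINES.finditer(sequence)]

# Example sequence with leucines at positions 4, 11, 18 and 25 (0-based).
print(find_leucine_repeats("RMKQLEDKVEELLSKNYHLENEVARLKKLVGER"))
```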
11
Three dimensional prediction
  • Hidden Markov models for a few families and
    classes
  • Threading using structural profiles
  • Threading by the contact potential method
  • New method - the I chain method

12
hidden Markov models
[Diagram: a junction state connected to three
branches: a variable series of match states for a
loop, a series of match states for an alpha helix
(between "Start of alpha unit" and "End of alpha
unit"), and a variable series of match states for a
beta turn. Arrows carry the transition probabilities
P1, P2, P3 and their complements 1 - P1, 1 - P2,
1 - P3.]
This diagram illustrates how to match a sequence
to a model of a set of proteins that have both
sequence and structural similarities. Only part of
the model is shown.
13
Threading a sequence through structural core
models
First, prepare a series of core models
represented by scoring matrices, HMMs, or some
other model that represents the whole core
sequence.
  • Thread the test sequence through the cores to
    find a good match
  • Achieved by aligning the sequence with the
    models and screening for a high score or
    probability.

14
Structural or 3D profile method
  • Determine structural parameters for each amino
    acid in a core
  • The parameters include neighbor geometry and
    closeness, chemical environment, hydrophobicity,
    secondary structures of nearby amino acids, etc.
  • Based on this analysis, each amino acid in the
    core is assigned to one of 18 environmental types
  • The ability of each of the other 19 amino acids
    to fit into this environment is determined
  • A scoring matrix with gap penalties (a profile)
    is then made for each core based on the above.
  • A test sequence is aligned with the profiles by
    dynamic programming
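The final step, aligning a test sequence with a profile by dynamic programming, can be sketched as a global alignment with a linear gap penalty. The profile here is a hypothetical list of per-position score tables; in the method on this slide those scores would come from the environmental class at each core position. The gap and default scores are assumptions.

```python
def align_to_profile(sequence, profile, gap=-2.0, default=-1.0):
    """Best global alignment score of a sequence against a profile.

    `profile` is a list with one dict per core position, mapping an amino
    acid to its score at that position; unlisted residues get `default`.
    """
    n, m = len(sequence), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # sequence residues against nothing
    for j in range(1, m + 1):
        score[0][j] = j * gap          # profile positions left unfilled
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + profile[j - 1].get(sequence[i - 1], default)
            skip_residue = score[i - 1][j] + gap
            skip_position = score[i][j - 1] + gap
            score[i][j] = max(match, skip_residue, skip_position)
    return score[n][m]

# Hypothetical three-position profile.
profile = [{"L": 3, "V": 2}, {"K": 3, "E": 2}, {"G": 3}]
print(align_to_profile("LKG", profile))  # every position matched
print(align_to_profile("LG", profile))   # middle position skipped with a gap
```

Running the same sequence against every core profile and keeping the highest score gives the fold whose environments the sequence fits best.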

15
Contact potential method
  • method is like the distance matrix method for
    aligning structures
  • the new sequence is superimposed on the 2D
    representation of each structure
  • the object is to try and fit the amino acids so
    that the distances between adjacent ones are
    suitable for van der Waals contacts
  • this is a form of energy minimization that itself
    is also undergoing development
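A toy version of contact scoring is sketched below: the sequence is placed on the positions of a known structure, and every pair of positions that lies within a cutoff distance contributes a pair energy. The two-class potential, the 7-angstrom cutoff, and the coordinates are all hypothetical; real contact potentials are derived statistically from known structures.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")

def pair_energy(a, b):
    """Hypothetical toy potential favoring hydrophobic-hydrophobic contacts."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.5
    return 0.0

def contact_energy(sequence, coords, cutoff=7.0):
    """Sum pair energies over non-neighboring positions within `cutoff`."""
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbors
            if math.dist(coords[i], coords[j]) <= cutoff:
                energy += pair_energy(sequence[i], sequence[j])
    return energy

# Four positions on a square with 3.8-angstrom sides (made-up geometry).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_energy("LKVL", coords))
print(contact_energy("KKKK", coords))
```

The lower the total, the better the sequence fits the structure under this toy potential, which is the sense in which threading by contacts resembles energy minimization.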

16
The Rosetta Method Using I chains - David Baker
Laboratory, http://depts.washington.edu/bmsdwp/
  • Structural similarity is conserved more strongly
    than sequence similarity
  • Search proteins with the same 3D structural fold
    for distant sequence similarities
  • Those that are found are short patterns of 3-15
    amino acids called I chains (I-sites in the
    Baker laboratory's publications)

17
The Rosetta Method
  • Rosetta is based on a picture of protein folding
    in which local sequence segments rapidly
    alternate between different possible local
    structures, and folding occurs when the
    conformations and relative orientations of these
    local segments combine to form low energy global
    structures. D. Baker

18
Rosetta Method
  • The I chains form a series of structures close
    to the optimal, lowest-energy one
  • For structure prediction, the object is to
    search through a sequence for matches to I chains
    and then generate a best local match
  • Sequences are searched for matches in a window
    of 9 amino acids to see if they are represented
  • A compatible 3D model is then produced
  • Rosetta was the best predictor in the CASP4
    meeting
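The window search can be sketched as a lookup of each 9-residue window against a small library of sequence patterns with known local structure. This is not Rosetta's code or scoring: the library entries, the identity-count scoring, and the function names are all hypothetical, and the sketch stops before the assembly of a 3D model.

```python
def best_fragment(window, library):
    """Return the library entry sharing the most identical positions."""
    return max(library,
               key=lambda frag: sum(a == b for a, b in zip(window, frag["seq"])))

def scan(sequence, library, width=9):
    """For each window, report (start, window, best local structure)."""
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        frag = best_fragment(window, library)
        hits.append((start, window, frag["structure"]))
    return hits

# Made-up two-entry library: h = helix, l = loop.
library = [
    {"seq": "AEELLKKAL", "structure": "hhhhhhhhh"},
    {"seq": "GSPDGSGNG", "structure": "lllllllll"},
]
for hit in scan("AEELLKKALG", library):
    print(hit)
```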

19
Summary of structure prediction
  • The most important predictor is sequence
    similarity, even very distant sequence similarity
  • Some structures are readily predictable, e.g.
    membrane-spanning helices, whereas others are
    much more difficult to predict
  • Alpha helices are quite accurately predicted
  • All methods suffer from a memory of the training
    set: the models are overtrained
  • A truly blind experiment, the CASP meetings, has
    revealed that none of the methods works
    particularly well on some new structures;
    Rosetta is best so far