1
3D Structures of Biological Macromolecules
Part 5: Protein Structure Prediction
Jürgen Sühnel, jsuehnel_at_fli-leibniz.de
Leibniz Institute for Age Research, Fritz Lipmann Institute,
Jena Centre for Bioinformatics, Jena / Germany
Supplementary Material: http://www.fli-leibniz.de/www_bioc/3D/
2
PDB Content Growth
Year    New structures per year    Per day
1993      698                        2
2003     4181                       11
2004     5212                       14
2005     5402                       15
2006     6541                       18
(no theoretical structures)
Start
3
PDB Statistics
4
SwissProt/TrEMBL Growth Rate
15-Jan-2008
5
Swiss-Prot/TrEMBL Amino Acid Composition
Swiss-Prot
TrEMBL
15-Jan-2008
6
Structural Genomics
Structural genomics consists in the determination
of the three dimensional structure of all
proteins of a given organism, by experimental
methods such as X-ray crystallography, NMR
spectroscopy or computational approaches such as
homology modelling. As opposed to traditional
structural biology, the determination of a
protein structure through a structural genomics
effort often (but not always) comes before
anything is known regarding the protein
function. This raises new challenges in
structural bioinformatics, i.e. determining
protein function from its 3D structure. One of
the important aspects of structural genomics is
the emphasis on high throughput determination of
protein structures. This is performed in
dedicated centers of structural genomics. While
most structural biologists pursue structures of
individual proteins or protein groups,
specialists in structural genomics pursue
structures of proteins on a genome wide scale.
This implies large scale cloning, expression and
purification. One main advantage of this approach
is economy of scale. On the other hand, the
scientific value of some resultant structures is
at times questioned.
en.wikipedia.org/wiki/Structural_genomics
7
Structural Genomics
8
Protein Structure Prediction
clickable map
http://speedy.embl-heidelberg.de/gtsp/flowchart2.html
9
Protein Structure Prediction
10
A Good Protein Structure
  • Minimizes disallowed torsion angles
  • Maximizes number of hydrogen bonds
  • Minimizes interstitial cavities or spaces
  • Minimizes number of bad contacts
  • Minimizes number of buried charges

11
Protein Structure Prediction CAFASP Contest
http://www.cs.bgu.ac.il/dfischer/CAFASP5/
12
Protein Structure Prediction CASP Contest
http://predictioncenter.gc.ucdavis.edu/
13
Protein Structure Prediction
  • Secondary structure
  • 3D structure
  • Modeling by homology (Comparative modeling)
  • Fold recognition (Threading)
  • Ab initio prediction
  • Rule-based approaches
  • Lattice models
  • Simulating the time dependence of folding
  • Refinement
  • Exploring the effect of single amino acid
    substitutions
  • Ligand effects on protein structure and dynamics
    (induced fit)

14
Lysozyme
15
Lysozyme 5lyz
16
Lysozyme 5lyz
17
Lysozyme 5lyz Information from the JenaLib
Atlas Page
old
18
Lysozyme 5lyz Information from the JenaLib
Atlas Page
new
19
Lysozyme 5lyz Information from the JenaLib
Atlas Page
20
Lysozyme 5lyz Information from the JenaLib
Atlas Page
21
Lysozyme 5lyz Information from the JenaLib
Atlas Page
22
Lysozyme 5lyz PROSITE Signature
23
Lysozyme 5lyz PROSITE Signature
24
PROMOTIF Secondary Structure Analysis 5lyz
25
Protein Backbone Torsion Angles
D. W. Mount, Bioinformatics, Cold Spring Harbor
Laboratory Press, 2001.
26
Protein Backbone Torsion Angles
27
PROMOTIF Secondary Structure Analysis 5lyz
28
PROMOTIF Secondary Structure Analysis 5lyz
29
PROMOTIF Secondary Structure Analysis 5lyz
30
Chou-Fasman Secondary Structure Prediction
31
Chou-Fasman Secondary Structure Prediction
32
Amino Acid Propensities
From a database of experimental 3D structures,
calculate the propensity for a given amino acid
to adopt a certain type of secondary structure
  • Example: N(Ala) = 2,000; N(tot) = 20,000;
    N(Ala,helix) = 500; N(helix) = 4,000
  • P(Ala,helix) = [N(Ala,helix) / N(helix)] /
    [N(Ala) / N(tot)]
  • P(Ala,helix) = (500 / 4,000) / (2,000 / 20,000) = 1.25
  • Used in Chou-Fasman algorithm
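
The propensity calculation can be sketched in a few lines of Python. The function name and argument names are illustrative only; the numbers are the ones from the example on this slide.

```python
def propensity(n_aa_in_state, n_state, n_aa, n_total):
    """Propensity of an amino acid for a secondary-structure state.

    Ratio of the amino acid's frequency inside the state to its
    frequency in the whole database. Values above 1 mean the residue
    is over-represented in that state.
    """
    freq_in_state = n_aa_in_state / n_state
    freq_overall = n_aa / n_total
    return freq_in_state / freq_overall


# Slide example: alanine in helices
p_ala_helix = propensity(n_aa_in_state=500, n_state=4000,
                         n_aa=2000, n_total=20000)
print(round(p_ala_helix, 2))  # 1.25
```
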

33
Chou-Fasman Secondary Structure Prediction
  • Assign all of the residues in the peptide the
    appropriate set of parameters.
  • Scan through the peptide and identify regions
    where 4 out of 6 contiguous residues have
    P(α-helix) > 100. Each such region is declared
    an alpha-helix.
  • Extend the helix in both directions until a set
    of four contiguous residues with an average
    P(α-helix) < 100 is reached. That is declared
    the end of the helix.
  • If the segment defined by this procedure is
    longer than 5 residues and the average
    P(α-helix) > P(β-sheet) for that segment, the
    segment can be assigned as a helix.
  • Repeat this procedure to locate all of the
    helical regions in the sequence.
  • Scan through the peptide and identify a region
    where 3 out of 5 of the residues have
    P(β-sheet) > 100. That region is declared a
    beta-sheet.
  • Extend the sheet in both directions until a set
    of four contiguous residues with an average
    P(β-sheet) < 100 is reached. That is declared
    the end of the beta-sheet.
  • Any segment located by this procedure is
    assigned as a beta-sheet if the average
    P(β-sheet) > 105 and the average P(β-sheet) >
    P(α-helix) for that region.
  • Any region containing overlapping alpha-helical
    and beta-sheet assignments is taken to be
    helical if the average P(α-helix) > P(β-sheet)
    for that region, and to be a beta-sheet if the
    average P(β-sheet) > P(α-helix).
  • To identify a bend at residue number j, calculate
    the following value:
    p(t) = f(j) f(j+1) f(j+2) f(j+3)
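
The helix nucleation step (4 out of 6 contiguous residues with a helix propensity above 100) can be illustrated with a small Python sketch. This covers only the nucleation scan, not extension, sheet detection or overlap resolution. The propensity table is deliberately incomplete: it holds only the six residue types whose values appear on the lysozyme slide of this deck, multiplied by 100. The function name is hypothetical.

```python
# Helix propensities (x100) for the residue types shown on the lysozyme
# Chou-Fasman slide. A real run needs a table for all 20 amino acids.
P_HELIX = {"G": 57, "R": 98, "C": 70, "E": 139, "L": 141, "A": 142}


def helix_nuclei(seq, window=6, min_formers=4, threshold=100):
    """Return start indices of windows that qualify as helix nuclei.

    A window qualifies when at least `min_formers` of its `window`
    contiguous residues have a helix propensity above `threshold`.
    """
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(1 for aa in seq[i:i + window]
                      if P_HELIX[aa] > threshold)
        if formers >= min_formers:
            starts.append(i)
    return starts


print(helix_nuclei("GRCELAA"))  # [1]
```

In the 7-residue fragment GRCELAA the first window (GRCELA) has only three helix formers, while the second (RCELAA) has four, so only index 1 is reported.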

34
Lysozyme 5lyz Chou-Fasman Secondary Structure
Prediction
http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
35
Lysozyme 5lyz Chou-Fasman Secondary Structure
Prediction
GRCE: (0.57 + 0.98 + 0.70 + 1.39) / 4 = 0.91
RCEL: (0.98 + 0.70 + 1.39 + 1.41) / 4 = 1.12
CELA: (0.70 + 1.39 + 1.41 + 1.42) / 4 = 1.23
ELAA: (1.39 + 1.41 + 1.42 + 1.42) / 4 = 1.41
http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
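
The four-residue averages on this slide can be reproduced with a sliding-window mean. The following Python sketch uses only the per-residue values listed on the slide; the dictionary and function names are illustrative.

```python
# Per-residue helix values as listed on the slide for the fragment GRCELAA.
HELIX_VALUE = {"G": 0.57, "R": 0.98, "C": 0.70,
               "E": 1.39, "L": 1.41, "A": 1.42}


def window_means(seq, window=4):
    """Mean helix value of every contiguous window of `window` residues."""
    means = []
    for i in range(len(seq) - window + 1):
        values = [HELIX_VALUE[aa] for aa in seq[i:i + window]]
        means.append(sum(values) / window)
    return means


for start, mean in enumerate(window_means("GRCELAA")):
    print("GRCELAA"[start:start + 4], round(mean, 2))
```

The rising average along the fragment (0.91, 1.12, 1.23, 1.41) is what the extension rule monitors: a helix is extended while the four-residue average stays above the cutoff.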
36
Lysozyme 5lyz PhD/PROF Structure Prediction
  • PROF_sec: PROF predicted secondary structure.
    H = helix, E = extended (sheet), blank = other
    (loop). PROF = Profile network prediction
    Heidelberg
  • Rel_sec: reliability index for the PROF_sec
    prediction (0 = low to 9 = high)
  • SUB_sec: subset of the PROFsec prediction, for
    all residues with an expected average accuracy
    > 82% (tables in header). NOTE: for this subset
    the following symbols are used: L is loop (for
    which ' ' is used above); . means that no
    prediction is made for this residue, as the
    reliability is Rel < 5
  • O3_acc: observed relative solvent accessibility
    (acc) in 3 states: b = 0-9%, i = 9-36%,
    e = 36-100%
  • P3_acc: PROF predicted relative solvent
    accessibility (acc) in 3 states: b = 0-9%,
    i = 9-36%, e = 36-100%
  • Rel_acc: reliability index for the PROFacc
    prediction (0 = low to 9 = high)
  • SUB_acc: subset of the PROFacc prediction, for
    all residues with an expected average
    correlation > 0.69 (tables in header). NOTE: for
    this subset the following symbols are used: I is
    intermediate (for which ' ' is used above);
    . means that no prediction is made for this
    residue, as the reliability is Rel < 4

http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
37
Lysozyme 5lyz PhD/PROF Structure Prediction,
BLAST
http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
38
Lysozyme 5lyz PhD/PROF Structure Prediction,
BLAST
http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
39
Lysozyme 5lyz PhD/PROF Structure Prediction
  • Perform BLAST search to find local alignments
  • Remove alignments that are too close
  • Perform multiple alignments of sequences
  • Construct a profile (PSSM) of amino-acid
    frequencies at each residue
  • Use this profile as input to the neural network
  • A second network performs smoothing
  • The third level computes jury decision of several
    different instantiations of the first two levels.
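
The jury decision in the last step can be pictured as averaging the per-residue outputs of several networks and picking the state with the highest mean. The Python sketch below assumes a plain arithmetic average and uses invented toy scores; it is not the PHD/PROF implementation, and the data layout is hypothetical.

```python
STATES = "HEL"  # helix, extended (sheet), loop


def jury_decision(network_outputs):
    """Combine per-residue state scores from several networks.

    `network_outputs` is a list with one entry per network; each entry
    is a list with one dict per residue mapping state -> score.
    Returns the string of winning states after averaging.
    """
    n_networks = len(network_outputs)
    n_residues = len(network_outputs[0])
    result = []
    for i in range(n_residues):
        mean = {s: sum(net[i][s] for net in network_outputs) / n_networks
                for s in STATES}
        result.append(max(mean, key=mean.get))
    return "".join(result)


# Toy scores for two residues from two networks (made-up numbers)
net1 = [{"H": 0.6, "E": 0.3, "L": 0.1}, {"H": 0.2, "E": 0.5, "L": 0.3}]
net2 = [{"H": 0.5, "E": 0.1, "L": 0.4}, {"H": 0.1, "E": 0.3, "L": 0.6}]
print(jury_decision([net1, net2]))  # HL
```
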

http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
40
Lysozyme 5lyz PsiPred Structure Prediction
http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
41
PsiPred
PSIPRED is a simple and reliable secondary
structure prediction method, incorporating two
feed-forward neural networks which perform an
analysis on output obtained from PSI-BLAST
(Position Specific Iterated BLAST). Version
2.0 of PSIPRED includes a new algorithm which
averages the output from up to 4 separate neural
networks in the prediction process to further
increase prediction accuracy. Using a very
stringent cross validation method to evaluate the
method's performance, PSIPRED 2.0 is capable of
achieving an average Q3 score of nearly 78%.
Predictions produced by PSIPRED were also
submitted to the CASP4 server and assessed
during the CASP4 meeting, which took place in
December 2000 at Asilomar. PSIPRED 2.0 achieved
an average Q3 score of 80.6% across all 40
submitted target domains with no obvious
sequence similarity to structures present in PDB,
which placed PSIPRED in first place out of 20
evaluated methods (an earlier version of PSIPRED
was also ranked first in CASP3, held in 1998).
http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
42
PSI-BLAST
Position specific iterative BLAST (PSI-BLAST)
refers to a feature of BLAST 2.0 in which a
profile (or position specific scoring matrix,
PSSM) is constructed (automatically) from a
multiple alignment of the highest scoring hits in
an initial BLAST search. The PSSM is generated
by calculating position-specific scores for each
position in the alignment. Highly conserved
positions receive high scores and weakly
conserved positions receive scores near zero.
The profile is used to perform a second (etc.)
BLAST search and the results of each "iteration"
are used to refine the profile. This iterative
searching strategy results in increased
sensitivity.
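
How conservation translates into position-specific scores can be shown with a toy log-odds calculation for a single alignment column. The Python sketch below assumes a uniform background frequency and a fixed pseudocount; PSI-BLAST itself uses sequence weighting and substitution-matrix-derived pseudocounts, so this is an illustration of the idea only.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_log_odds(column, pseudocount=1.0):
    """Log2-odds score for every amino acid in one alignment column.

    Assumptions: uniform background (1/20 per residue type) and a
    constant pseudocount added to every count.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(ALPHABET)
    background = 1.0 / len(ALPHABET)
    return {aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET}


conserved = column_log_odds("AAAAAAAAAA")   # ten sequences, all alanine
variable = column_log_odds(ALPHABET)        # every residue type once
print(round(conserved["A"], 2), round(variable["A"], 2))
```

The fully conserved column gives alanine a clearly positive score, while the column with no preference gives a score of zero, matching the description above.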
43
Comparing Secondary Structure Prediction Results
PsiPred
Chou-Fasman
Phd/PROF
44
Comparing Secondary Structure Prediction Results
45
Protein Secondary Structure Prediction - Summary
  • 1st generation (1970s): Chou & Fasman,
    Q3 = 50-55%
  • 2nd generation (1980s): Qian & Sejnowski,
    Q3 = 60-65%
  • 3rd generation (1990s): PHD, PSIPRED,
    Q3 = 70-80%
  • Features of the new methods: taking into account
    evolutionary information; neural networks
  • Failures: nonlocal sequence interactions; wrong
    prediction at the ends of H/E segments

Q3: percentage of correctly assigned amino acids
in a test set
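
As a minimal sketch of the Q3 measure, the following Python function compares a predicted and an observed three-state string residue by residue. The function name and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or L)
    matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not predicted:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


print(q3("HHEEL", "HHELL"))  # 80.0
```
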
46
Protein Structure Prediction
http://speedy.embl-heidelberg.de/gtsp/flowchart2.html
47
Modeling by Homology (Comparative Modeling)
http://salilab.org/modeller/
48
Modeling by Homology (Comparative Modeling)
http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
49
Modeling by Homology (Comparative Modeling)
http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
50
Modeling by Homology (Comparative Modeling)
http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
51
Modeling by Homology (Comparative Modeling)
http://swissmodel.expasy.org/
52
Modeling by Homology (Comparative Modeling)
  • Comparative modeling predicts the
    three-dimensional structure of a given protein
    sequence (target) based primarily on its
    alignment to one or more proteins of known
    structure (templates).
  • The prediction process consists of fold
    assignment, target-template alignment, model
    building, and model evaluation and refinement.
  • The number of protein sequences that can be
    modeled and the accuracy of the predictions are
    increasing steadily because of the growth in the
    number of known protein structures and because
    of improvements in the modeling software.
  • Further advances are necessary in recognizing
    weak sequence-structure similarities, aligning
    sequences with structures, modeling of rigid
    body shifts, distortions, loops and side chains,
    as well as detecting errors in a model.
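
A quantity that is commonly inspected once a target has been aligned to a template is the percentage of identical residues. The Python sketch below computes it over aligned (non-gap) positions; this choice of denominator is an assumption, since several conventions exist, and the example sequences are invented.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over positions where both sequences
    are aligned (columns containing a gap are ignored)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("KVFGRCELAA", "KVYGRCELAA"))  # 90.0
print(percent_identity("KV-FG", "KVAFA"))            # 75.0
```
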

http://salilab.org/modeller/
53
Fold Recognition (Threading)
Methods of protein fold recognition attempt to
detect similarities between protein 3D structures
that are not accompanied by any significant
sequence similarity. The unifying theme of
these approaches is to try to find folds that
are compatible with a particular sequence.
Unlike sequence-only comparison, these methods
take advantage of the extra information made
available by 3D structure information. Rather
than predicting how a sequence will fold, they
predict how well a fold will fit a sequence.
54
Fold Recognition (Threading) Why ?
  • Secondary structure is more conserved than
    primary structure
  • Tertiary structure is more conserved than
    secondary structure
  • Therefore very remote relationships can be better
    detected through secondary (2°) or tertiary (3°)
    structural homology instead of sequence homology

55
Fold Recognition (Threading)
56
Fold Recognition (Threading) 2 Kinds
  • 2D Threading or Prediction Based Methods (PBM):
    predict secondary structure (SS) or ASA of the
    query, then evaluate on the basis of SS and/or
    ASA matches
  • 3D Threading or Distance Based Methods (DBM):
    create a 3D model of the structure, then
    evaluate using a distance-based hydrophobicity
    or pseudo-thermodynamic potential

57
Fold Recognition
  • Database of 3D structures and sequences: Protein
    Data Bank (or non-redundant subset)
  • Query sequence: sequence with < 25% identity to
    known structures
  • Alignment protocol: dynamic programming
  • Evaluation protocol: distance-based potential or
    secondary structure
  • Ranking protocol
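
The dynamic programming step can be illustrated with a global alignment score in the Needleman-Wunsch style. In this Python sketch the match, mismatch and gap values are arbitrary placeholders; a threading program would replace them with a sequence-to-structure fitness term such as the potentials named above.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b.

    Classic dynamic programming with a linear gap penalty. Only the
    previous row of the score matrix is kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 1 (three matches, one gap)
```

Ranking then amounts to sorting the library folds by the score each one obtains against the query.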

58
Fold Recognition
http://www.sbg.bio.ic.ac.uk/3dpssm/index2.html
59
Ab Initio Prediction
  • Predicting the 3D structure without any prior
    knowledge
  • Used when homology modelling or threading have
    failed (no homologues are evident)
  • Equivalent to solving the Protein Folding
    Problem
  • Still a research problem

60
Ab Initio Prediction
http://rosettadesign.med.unc.edu/
61
Ab Initio Prediction
http://rosettadesign.med.unc.edu/
62
Ab Initio Prediction Lysozyme (5lyz)
http://rosettadesign.med.unc.edu/
63
Combining Prediction Procedures
http://robetta.bakerlab.org/