Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction


1
Introduction to Bioinformatics: Lecture XI
Computational Protein Structure Prediction
  • Jarek Meller
  • Division of Biomedical Informatics,
  • Children's Hospital Research Foundation
  • Department of Biomedical Engineering, UC

2
Outline of the lecture
  • Protein structure and complexity of
    conformational search: from similarity-based
    methods to de novo structure prediction
  • Multiple sequence alignment and family profiles
  • Secondary structure and solvent accessibility
    prediction
  • Matching sequences with known structures:
    threading and fold recognition
  • Ab initio folding simulations

3
Polypeptide chains: backbone and side-chains
[Figure: polypeptide chain with N-ter and C-ter labeled]
4
Distinct chemical nature of amino acid side-chains
[Figure: chain from N-ter to C-ter with side-chains PHE, CYS, VAL, GLU and ARG labeled]
5
Hydrogen bonds and secondary structures
[Figure: β-strand and α-helix]
6
Tertiary structure and long-range contacts
[Figure: annexin]
7
Quaternary structure and protein-protein
interactions: annexin hexamer
8
Domains, interactions, complexes: cyclin D and Cdk
[Figure label: Cyclin Box]
9
Domains, interactions, complexes: VHL
10
Protein folding problem
  • The protein folding problem consists of
    predicting three-dimensional structure of a
    protein from its amino acid sequence
  • Hierarchical organization of protein structures
    helps to break the problem into secondary
    structure, tertiary structure and protein-protein
    interaction predictions
  • Computational approaches for protein structure
    prediction: similarity-based and de novo methods

11
Polypeptide chains: backbone and rotational
degrees of freedom
           H      O           R2
           |      ||          |
  +H3N --- Cα --- C --- N --- Cα --- C --- O-
           |            |     |      ||
           R1           H     H      O
The equilibrium length of the peptide bond (C-N)
is about 1.3 Å. The average Cα-Cα distance
between consecutive residues in a polypeptide
chain is about 3.8 Å. The angle of rotation
around the N-Cα bond is called φ, and the angle
around the Cα-C bond is called ψ. These two
angles define the overall conformation of
polypeptide chains. Simplifying, there are three
discrete states (rotations) for each of these
single bonds, implying $9^N$ possible backbone
conformations for a chain of N residues.
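To give a feel for how quickly this count grows, the short sketch below evaluates it for a few chain lengths. The function names and the enumeration rate of 10^12 conformations per second are illustrative assumptions, not values from the lecture.

```python
def backbone_conformations(n_residues: int, states_per_angle: int = 3) -> int:
    """Discrete backbone conformations for a chain of n_residues residues,
    assuming two rotatable angles per residue, each with states_per_angle states."""
    return (states_per_angle ** 2) ** n_residues


def exhaustive_search_years(n_residues: int, rate_per_second: float = 1e12) -> float:
    """Years needed to enumerate every conformation at a hypothetical rate."""
    seconds = backbone_conformations(n_residues) / rate_per_second
    return seconds / (3600 * 24 * 365)


for n in (10, 50, 100):
    print(f"N={n}: {backbone_conformations(n):.3e} conformations, "
          f"{exhaustive_search_years(n):.3e} years at 1e12 per second")
```

Even at this optimistic rate, exhaustive enumeration is hopeless for chains of realistic length, which is why the search strategy matters as much as the scoring function.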
12
Scoring alternative conformations with empirical
force fields (folding potentials)
Ideally, each misfolded structure should have an
energy higher than the native energy, i.e.
$E_{\mathrm{misfolded}} - E_{\mathrm{native}} > 0$
[Figure: energy (E) levels of the misfolded and native states]
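The condition can be checked directly for a set of alternative (decoy) conformations once their energies are known. The sketch below is a minimal illustration; the function names and the energy values in the usage note are made up for the example.

```python
def energy_gaps(native_energy, decoy_energies):
    """Return E_misfolded - E_native for every misfolded (decoy) structure."""
    return [energy - native_energy for energy in decoy_energies]


def recognizes_native(native_energy, decoy_energies):
    """True if every decoy has a strictly higher energy than the native structure."""
    return all(gap > 0 for gap in energy_gaps(native_energy, decoy_energies))
```

For example, `recognizes_native(-10.0, [-5.0, -9.5])` holds, while a single decoy scored at -12.0 would violate the condition.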
13
Ab initio (or de novo) folding simulations
  • When dealing with a new fold, the similarity-based
    methods cannot be applied
  • Ab initio folding simulations consist of
    conformational search with an empirical scoring
    function (force field) to be maximized (or
    minimized)
  • Computational bottleneck: exponential search
    space and sampling problem (global optimization!)
  • Fundamental problem: inaccuracy of empirical
    force fields
  • Importance of mixed protocols, such as Rosetta by
    D. Baker and colleagues (more when Monte Carlo
    protocols for global optimization are introduced)
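As a rough illustration of conformational search with a scoring function, the sketch below runs a basic Metropolis Monte Carlo search over discrete angle states. It is not Rosetta or any published protocol: the move set, the temperature, the step count and `toy_energy` are all assumptions chosen to keep the example small.

```python
import math
import random


def toy_energy(state):
    """Hypothetical scoring function (not a physical force field):
    each angle that is not in state 0 costs one energy unit."""
    return sum(1.0 for s in state if s != 0)


def metropolis_search(energy, n_angles, n_states=3, steps=5000,
                      temperature=1.0, seed=0):
    """Minimize `energy` over vectors of discrete angle states."""
    rng = random.Random(seed)
    state = [rng.randrange(n_states) for _ in range(n_angles)]
    current = energy(state)
    best_state, best = list(state), current
    for _ in range(steps):
        i = rng.randrange(n_angles)          # pick one angle to perturb
        old = state[i]
        state[i] = rng.randrange(n_states)   # propose a new discrete state
        proposed = energy(state)
        delta = proposed - current
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best:
                best_state, best = list(state), current
        else:
            state[i] = old                   # reject the move
    return best_state, best
```

Accepting occasional uphill moves is what lets the search escape local minima. With a real force field, the quality of the result is limited both by sampling and by the accuracy of the energy itself, as the bullets above note.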

14
Similarity-based approaches to structure
prediction: from sequence alignment to fold
recognition
  • High level of redundancy in biology: sequence
    similarity is often sufficient to use the "guilt
    by association" rule: if similar sequence, then
    similar structure and function
  • Multiple alignments and family profiles can
    detect evolutionary relatedness at much lower
    sequence similarity, which is hard to detect with
    pairwise sequence alignments: Psi-BLAST by
    S. Altschul et al.
  • For sufficiently close proteins one may
    superimpose the backbones using sequence
    alignment and then perform conformational search
    (with the backbone fixed) to find the optimal
    geometry (according to an atomistic empirical
    force field) of the side-chains: homology modeling
    (e.g. Modeller by A. Sali et al.)
  • Many structures are already known (see PDB) and
    one can match sequences directly with structures
    to enhance structure recognition: fold
    recognition
  • For both fold recognition and de novo
    simulation, prediction of intermediate attributes
    such as secondary structure or solvent
    accessibility helps to achieve better sensitivity
    and specificity

15
Protein families and domains
The notion of protein family is derived from
evolutionary considerations: members of the same
family are related, perform the same function
and are assumed to have diverged from the same
ancestor. The notion of domain is derived from
structural considerations: "A domain is defined
as an autonomous structural unit, or a reusable
sequence unit that may be found in multiple
protein contexts" (Bateman et al.).
PFAM (7246 families as of April 2004): http://www.sanger.ac.uk/Software/Pfam/
PRODOM: http://prodes.toulouse.inra.fr/prodom/current/html/home.php
CDD: http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi
Check pfam00134.11, Cyclin_N
16
Multiple alignment and PSSM
17
Multiple alignment, clustering and families
  • DP search gives the optimal solution, but scales
    exponentially with the number of sequences K,
    $O(n^K)$, so it is not practical for more than
    3 or 4 sequences.
  • Standard heuristics start from pairwise
    alignments (e.g. PsiBLAST, ClustalW)
  • Hidden Markov Model approach to family profiles
    (profile HMM) as an alternative with pre-fixed
    parameters, trained separately for each family.
    Some initial multiple alignments necessary for
    training (next lecture).
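The pairwise case (K = 2) of the DP search can be sketched as a standard global alignment (Needleman-Wunsch style) score computation. The scoring values (match +1, mismatch -1, gap -2) and the helper `dp_cells` are illustrative assumptions; real alignments would use a substitution matrix such as BLOSUM.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap                # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap                # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


def dp_cells(n, k):
    """Number of table cells for k sequences of length n: (n + 1) ** k."""
    return (n + 1) ** k


print(global_alignment_score("ACD", "AD"))
print(dp_cells(100, 2), dp_cells(100, 4))
```

The table has one dimension per sequence, so its size grows as a power of K; that growth is what makes exact multiple alignment impractical and motivates the heuristics listed above.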

18
Predicting 1D protein profiles from sequences:
secondary structures and solvent accessibility
a) Multiple alignment and family profiles improve
prediction of local structural propensities.
b) Use of advanced machine learning techniques, such
as Neural Networks or Support Vector Machines,
improves results as well.
B. Rost and C. Sander were first to achieve more
than 70% accuracy in three-state (H, E, C)
classification, applying a) and b).
SABLE server: http://sable.cchmc.org
POLYVIEW server: http://polyview.cchmc.org
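Three-state accuracy is usually reported as the percentage of residues assigned the correct state (often called Q3). A minimal sketch follows, assuming predictions and observations are given as equal-length strings over H, E, C; the function name is chosen for this example.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, a prediction `HHHECC` against an observed `HHHCCC` gets five of six residues right.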
19
Predicting 1D protein profiles from sequences:
secondary structures and solvent accessibility
20
Predicting transmembrane domains
21
Hydropathy profiles and membrane domains
prediction
Problem: Design a simple algorithm for finding
putative transmembrane regions based on
hydropathy (or hydrophobicity) profiles.
Consider an extension based on prototypes and
k-NN.
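One possible answer to the first part of the problem is a sliding-window average over a hydropathy scale, flagging stretches where the average exceeds a threshold. The sketch below uses the Kyte-Doolittle scale. The window of 19 residues and the threshold of 1.6 are conventional choices assumed here, not values given in the lecture, and the function names are made up for the example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window; entry i covers residues i..i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one residue
        profile.append(total / window)
    return profile


def putative_tm_regions(sequence, window=19, threshold=1.6):
    """Return (first, last) 0-based window-centre positions of stretches whose
    windowed average hydropathy is at or above the threshold."""
    profile = hydropathy_profile(sequence, window)
    half = window // 2
    regions, start = [], None
    for i, average in enumerate(profile):
        if average >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start + half, i - 1 + half))
            start = None
    if start is not None:
        regions.append((start + half, len(profile) - 1 + half))
    return regions


# Synthetic test sequence: a 20-residue leucine stretch flanked by aspartates.
print(putative_tm_regions("D" * 20 + "L" * 20 + "D" * 20))
```

The k-NN extension mentioned in the problem could replace the fixed threshold by comparing each window's profile with labeled prototype windows; that part is not implemented here.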
22
Predicting transmembrane domains
23
Going beyond sequence similarity: threading and
fold recognition
When sequence similarity is not detectable, use a
library of known structures to match your
query with target structures. As in the case of
de novo folding, one needs a scoring function that
measures compatibility between sequences and
structures.
24
Why fold recognition?
  • Divergent (common ancestor) vs. convergent (no
    common ancestor) evolution
  • PDB: virtually all proteins with 30% seq.
    identity have similar structures; however, most
    of the similar structures share only up to 10%
    seq. identity!
  • www.columbia.edu/rost/Papers/1997_evolution/paper.html
    (B. Rost)
  • www.bioinfo.mbb.yale.edu/genome/foldfunc/ (H.
    Hegyi, M. Gerstein)

25
Simple contact model for protein structure
prediction
Each amino acid is represented by a point in 3D
space and two amino acids are said to be in
contact if their distance is smaller than a
cutoff distance, e.g. 7 Å.
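This definition translates directly into a contact map computed from one representative point per residue. The sketch below assumes coordinates in Å given as (x, y, z) tuples. Which atom represents a residue (for example Cα or the side-chain centre) and the `min_separation` parameter are choices left open here.

```python
import math


def contact_map(coords, cutoff=7.0, min_separation=1):
    """Return the set of residue index pairs (k, l), k < l, whose representative
    points are closer than `cutoff`. `min_separation` is the minimum sequence
    distance l - k considered (raise it to skip trivial near-neighbour contacts)."""
    contacts = set()
    for k in range(len(coords)):
        for l in range(k + min_separation, len(coords)):
            if math.dist(coords[k], coords[l]) < cutoff:
                contacts.add((k, l))
    return contacts


# Four points along a line, spaced like consecutive Cα atoms except the last.
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(sorted(contact_map(points)))
```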
26
Sequence-to-structure matching with contact models
  • Generalized string matching problem: aligning a
    string of amino acids against a string of
    structural sites characterized by other
    residues in contact
  • Finding an optimal alignment with gaps using
    inter-residue pairwise models,
    $E = \sum_{k<l} \epsilon_{kl}$,
    is NP-hard because of the non-local character
    of scores at a given structural site (identity of
    the interaction partners may change depending on
    location of gaps in the alignment)
  • R.H. Lathrop, Protein Eng. 7 (1994)
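The non-local character of the score can be seen by evaluating the pairwise energy for two alignments of the same sequence to the same template. In the sketch below, the template contacts, the sequence and the two-letter hydrophobic/polar energy table are all hypothetical. `site_to_residue[k]` gives the sequence position placed at structural site k, or `None` if the site is left empty.

```python
def hp_pair_energy(a, b):
    """Hypothetical contact energy: -1 for a hydrophobic pair (H, H), otherwise 0."""
    return -1.0 if a == "H" and b == "H" else 0.0


def threading_energy(sequence, site_to_residue, contacts, pair_energy=hp_pair_energy):
    """Sum pair energies over the template contacts (k, l), using the residues
    that the alignment places at structural sites k and l."""
    energy = 0.0
    for k, l in contacts:
        i, j = site_to_residue[k], site_to_residue[l]
        if i is None or j is None:
            continue                      # an empty site contributes nothing
        energy += pair_energy(sequence[i], sequence[j])
    return energy


template_contacts = {(0, 2), (1, 3)}      # contacts between structural sites
sequence = "HPHHH"
print(threading_energy(sequence, [0, 1, 2, 3], template_contacts))  # no gap
print(threading_energy(sequence, [0, 2, 3, 4], template_contacts))  # residue 1 skipped
```

Skipping one residue shifts which residues occupy sites 1, 2 and 3, so the partners in both contacts change. The score at a site therefore cannot be computed independently of the rest of the alignment, which is what breaks ordinary dynamic programming.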

27
Hydrophobic contact model and sequence-to-structure
alignment
[Figure: example HP sequence HPHPP with a gap (-)]
  • Solutions to this, yet another instance of the
    global optimization problem:
  • Heuristic (e.g. frozen environment approximation)
  • Profile or local scoring functions (folding
    potentials)
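One common reading of the frozen environment approximation is to score a query residue at a site against the template's own residues at the contacting sites, so that each site score becomes local and standard alignment DP can be used. The sketch below follows that reading. The HP energy table, the template sequence and its contacts are hypothetical.

```python
def hp_pair_energy(a, b):
    """Hypothetical contact energy: -1 for a hydrophobic pair (H, H), otherwise 0."""
    return -1.0 if a == "H" and b == "H" else 0.0


def frozen_environment_profile(template_seq, contacts, pair_energy=hp_pair_energy,
                               alphabet="HP"):
    """For every structural site k, return {residue type: energy} where the
    interaction partners are frozen to the template's own residues."""
    neighbours = {k: [] for k in range(len(template_seq))}
    for k, l in contacts:
        neighbours[k].append(l)
        neighbours[l].append(k)
    return [
        {aa: sum(pair_energy(aa, template_seq[l]) for l in neighbours[k])
         for aa in alphabet}
        for k in range(len(template_seq))
    ]


profile = frozen_environment_profile("HPHPP", {(0, 2), (1, 4)})
for site, scores in enumerate(profile):
    print(site, scores)
```

The resulting table plays the same role as a sequence profile: each site has a fixed score per residue type, so gaps elsewhere in the alignment no longer change it. The price is that the partners are those of the template rather than of the query.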

28
Using sequence similarity, predicted secondary
structures and contact potentials: fold
recognition protocols
In practice, fold recognition methods are often
mixtures of sequence matching and threading
(e.g., with compatibility between a sequence and a
structure measured by contact potentials and
predicted secondary structures compared to the
secondary structure of a template). D. Fischer
and D. Eisenberg, Curr. Opinion in Struct. Biol.
1999, 9: 208
29
Some fold recognition servers
  • PsiBLAST (Altschul SF et al., Nucl. Acids Res.
    25: 3389)
  • LiveBench evaluation (http://BioInfo.PL/LiveBench/1/)
  • FFAS (L. Rychlewski, L. Jaroszewski, W. Li, A.
    Godzik (2000), Protein Science 9: 232): seq.
    profile against profile
  • 3D-PSSM (Kelley LA, MacCallum RM, Sternberg MJE,
    JMB 299: 499): 1D-3D profile combined with
    secondary structures and solvation potential
  • GenTHREADER (Jones DT, JMB 287: 797): seq.
    profile combined with pairwise interactions and
    solvation potential
  • LOOPP: annotations of remote homologs
    (http://www.tc.cornell.edu/CBIO/loopp)