Protein Structure Prediction - PowerPoint PPT Presentation

About This Presentation
Title:

Protein Structure Prediction

Description:

Title: PowerPoint Presentation Author: Xiaole Shirley Liu Last modified by: Jun Liu Created Date: 1/3/2005 7:27:35 PM Document presentation format – PowerPoint PPT presentation

Slides: 48
Provided by: XiaoleSh

Transcript and Presenter's Notes


1
Protein Structure Prediction
  • Xiaole Shirley Liu
  • And
  • Jun Liu
  • STAT115

2
Protein Structure Prediction
  • Ram Samudrala
  • University of Washington
3
Outline
  • Motivations and introduction
  • Protein 2nd structure prediction
  • Protein 3D structure prediction
  • CASP
  • Homology modeling
  • Fold recognition
  • ab initio prediction
  • Manual vs automation
  • Structural genomics

4
Protein Structure
  • Sequence determines structure, structure
    determines function
  • Most proteins can fold by themselves very quickly
  • Folded structure is the lowest energy state

5
Protein Structure
  • Main forces to consider
  • Steric complementarity
  • Secondary structure preferences (satisfy H bonds)
  • Hydrophobic/polar patterning
  • Electrostatics

6
Rationale for understanding protein structure and function
  • Protein sequence: large numbers of sequences,
    including whole genomes
  • ?
  • Protein function
  • Rational drug design and treatment of disease
  • Protein and genetic engineering
  • Build networks to model cellular pathways
  • Study organismal function and evolution
7
Protein Databases
  • SwissProt: protein knowledgebase
  • PDB: Protein Data Bank, 3D structures

8
View Protein Structure
  • Free interactive viewers
  • Download 3D coordinate file from PDB
  • Quick and dirty
  • VRML
  • Rasmol
  • Chime
  • More powerful
  • Swiss-PdbViewer

9
Compare Protein Structures
  • Structure is more conserved than sequence
  • Why compare?
  • Detect evolutionary relationships
  • Identify recurring structural motifs
  • Predicting function based on structure
  • Assess predicted structures
  • Protein structure comparison and classification
  • Manual: SCOP
  • Automated: DALI

10
Compare protein structures
  • Need ways to determine if two protein structures
    are related and to compare predicted models to
    experimental structures
  • A commonly used measure is the root mean square
    deviation (RMSD) of the Cartesian coordinates of
    atoms between two structures after optimal
    superposition (McLachlan, 1979)
  • Usually use Cα atoms
  • Other measures include contact maps and torsion
    angle RMSDs
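
As a minimal sketch of the RMSD measure, the Python function below compares two sets of Cα coordinates. It assumes the two structures are already optimally superposed and list the same atoms in the same order, so no fitting is performed. The function name `rmsd` and the toy coordinates are illustrative and not taken from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy C-alpha traces (hypothetical coordinates, in angstroms).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(round(rmsd(model, native), 3))
```

A full comparison would first find the rotation and translation that minimize this value, which is the "optimal superposition" step the sketch leaves out.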

11
SCOP
  • Compare protein structure, identify recurring
    structural motifs, predict function
  • A. Murzin et al, 1995
  • Manual classification
  • A few folds are highly populated
  • 5 folds contain 20% of all homologous
    superfamilies
  • Some folds are multifunctional

12
Determine Protein Structure
  • X-ray crystallography (gold standard)
  • Grow crystals, rate limiting, relies on the
    repeating structure of a crystalline lattice
  • Collect a diffraction pattern
  • Map to real space electron density, build and
    refine structural model
  • Painstaking and time consuming

13
Protein Structure Prediction
  • Since AA sequence determines structure, can we
    predict protein structure from its AA sequence?
  • Predicting the three angles: unlimited degrees of
    freedom (DoF)!
  • Physical properties that determine fold
  • Rigidity of the protein backbone
  • Interactions among amino acids, including
  • Electrostatic interactions
  • van der Waals forces
  • Volume constraints
  • Hydrogen, disulfide bonds
  • Interactions of amino acids with water

14
Protein folding landscape
  • Large multi-dimensional space of changing conformations
  • [Figure: free energy along the folding reaction]
15
Protein primary structure
16
2nd Structure Prediction
  • α helix, β sheet, turn/loop

17
2nd Structure Prediction
  • Chou-Fasman 1974
  • Based on 15 proteins (2473 AAs) of known
    conformation, determine Pα, Pβ, ranging from
    about 0.5 to 1.5
  • Empirical rules for 2nd struct nucleation
  • 4 Hα or hα out of 6 AA, extend in both
    directions, Pα > 1.03, Pα > Pβ, no α breakers
  • 3 Hβ or hβ out of 5 AA, extend in both
    directions, Pβ > 1.05, Pβ > Pα, no β breakers
  • Accuracy of 50-60%
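
The helix nucleation rule (4 formers out of 6 AA) can be sketched as a sliding window. This is only the nucleation step: the extension in both directions and the propensity thresholds are left out. The function name `helix_nuclei` and the small former set are assumptions for illustration; a real run would use the full Chou-Fasman propensity table.

```python
def helix_nuclei(seq, formers, window=6, min_formers=4):
    """Return start indices of windows holding at least min_formers helix formers."""
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[start:start + window] if aa in formers)
        if count >= min_formers:
            hits.append(start)
    return hits

# Illustrative former set only, not the complete Chou-Fasman classification.
HELIX_FORMERS = set("EALM")
print(helix_nuclei("GPEALMKGSG", HELIX_FORMERS))  # -> [0, 1, 2]
```

The same function covers the sheet rule by calling it with a sheet-former set, `window=5` and `min_formers=3`.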

18
Pα and Pβ
19
2nd Structure Prediction
  • Garnier, Osguthorpe, Robson, 1978
  • Assumption: each AA is influenced by flanking
    positions
  • GOR scoring tables (problem: limited dataset)
  • Add scores, assign the 2nd structure with the
    highest score
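
A rough sketch of the "add scores, pick the highest" idea follows. The table layout, the function name `gor_predict`, the default window of 8 residues on each side, and every score in the toy tables are assumptions made for illustration; they are not the published GOR parameters.

```python
def gor_predict(seq, tables, half_window=8):
    """Assign each residue the state whose summed window score is highest.

    tables[state][(offset, aa)] is the score contributed to `state` at the
    central residue by amino acid `aa` found `offset` positions away.
    """
    prediction = []
    for i in range(len(seq)):
        totals = {}
        for state, table in tables.items():
            score = 0.0
            for offset in range(-half_window, half_window + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    score += table.get((offset, seq[j]), 0.0)
            totals[state] = score
        prediction.append(max(totals, key=totals.get))
    return "".join(prediction)

# Hypothetical toy tables (made-up scores), used with a window of +/-1.
TOY = {
    "H": {(0, "A"): 1.0, (-1, "A"): 0.5, (1, "A"): 0.5},
    "E": {(0, "V"): 1.0, (-1, "V"): 0.5, (1, "V"): 0.5},
    "C": {(0, "G"): 1.0},
}
print(gor_predict("AAAGVVV", TOY, half_window=1))  # -> HHHCEEE
```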

20
2nd Structure Prediction
  • D. Eisenberg, 1986
  • Plot hydrophobicity as function of sequence
    position, look for periodic repeats
  • Period 3-4 AA: α helix (3.6 AA / turn)
  • Period 2 AA: β sheet
  • Best overall: JPRED by Geoffrey Barton, uses many
    different approaches to get a consensus
  • Overall accuracy 72.9%
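
One way to look for periodic repeats in hydrophobicity is a hydrophobic-moment style sum, sketched below. The slide does not say which calculation was used, so this particular formula, the function name, and the binary hydrophobicity values are assumptions. A period of 3.6 AA corresponds to 100 degrees per residue and a period of 2 AA to 180 degrees.

```python
import cmath
import math

def hydrophobic_moment(hydrophobicities, angle_deg):
    """Magnitude of the sum of H_n * exp(i * n * angle) over the segment."""
    angle = math.radians(angle_deg)
    total = sum(h * cmath.exp(1j * n * angle) for n, h in enumerate(hydrophobicities))
    return abs(total)

# Hypothetical binary scale: 1 = hydrophobic, 0 = polar.
strand_like = [1, 0, 1, 0, 1, 0, 1, 0]     # hydrophobic every second residue
helix_angle, strand_angle = 100.0, 180.0   # 360/3.6 and 360/2 degrees per residue
print(hydrophobic_moment(strand_like, strand_angle))
print(hydrophobic_moment(strand_like, helix_angle))
```

For this period-2 pattern the moment at 180 degrees is much larger than at 100 degrees, which is the signal that points to a β strand rather than a helix.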

21
3D Protein Structure Prediction
  • CASP contest: Critical Assessment of Structure
    Prediction
  • Biennial meeting since 1994 at Asilomar, CA
  • Experimentalists: before CASP, submit sequence of
    to-be-solved structure to central repository
  • Predictors: download sequence and minimal
    information, make predictions in three categories
  • Assessors: automatic programs and experts
    evaluate prediction quality

22
CASP Category I
  • Homology Modeling (sequences with high homology
    to sequences of known structure)
  • Given a sequence with > 25-30% homology to a
    known structure in PDB, use the known structure as
    starting point to create a model of the 3D
    structure of the sequence
  • Takes advantage of knowledge of a closely related
    protein. Use sequence alignment techniques to
    establish correspondences between known
    template and unknown.
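
To make the 25-30% threshold concrete, here is a small sketch that computes percent identity from an existing pairwise alignment. The function name, the choice to count only non-gap columns in the denominator, and the two toy sequences are assumptions; other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Hypothetical target/template pair, already aligned.
target = "MKT-AYIAKQR"
template = "MKTLAYLAKNR"
print(percent_identity(target, template))  # -> 80.0
```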

23
CASP Category II
  • Fold recognition (sequences with low sequence
    identity (< 30%) to sequences of known structure)
  • Given the sequence, and a set of folds observed
    in PDB, see if the sequence could adopt
    one of the known folds
  • Takes advantage of knowledge of existing
    structures, and principles by which they are
    stabilized (favorable interactions)

24
CASP Category III
  • Ab initio prediction (no known homology with any
    sequence of known structure)
  • Given only the sequence, predict the 3D structure
    from first principles, based on energetic or
    statistical principles
  • Secondary structure prediction and multiple
    alignment techniques used to predict features of
    these molecules. Then, some method necessary for
    assembling 3D structure.

25
Structure Prediction Evaluation
  • Hydrophobic core similar?
  • 2nd struct identified?
  • Energy minimized? H-bond contacts?
  • Compare with solved crystal structure (gold
    standard)

26
Comparative modelling of protein structure
  • [Figure: comparative modelling workflow, including a refine step]
27
Homology Modeling Results
  • When sequence homology is > 70%, high resolution
    models are possible (< 3 Å RMSD)
  • MODELLER (Sali et al)
  • Find homologous proteins with known structure and
    align
  • Collect distance distributions between atoms in
    known protein structures
  • Use these distributions to compute positions for
    equivalent atoms in alignment
  • Refine using energetics

28
Homology Modeling Results
  • Many places can go wrong
  • Bad template: it doesn't have the same structure
    as the target after all
  • Bad alignment (a very common problem)
  • Good alignment to good template still gives wrong
    local structure
  • Bad loop construction
  • Bad side chain positioning

29
Homology Modeling Results
  • Use of sensitive multiple alignment (e.g.
    PSI-BLAST) techniques helped get best alignments
  • Sophisticated energy minimization techniques do
    not dramatically improve upon initial guess

30
Fold Recognition Results
  • Also called protein threading
  • Given a new sequence and a library of known folds,
    find the best alignment of the sequence to each fold,
    return the most favorable one

31
Fold Recognition with Dynamic Programming
  • Environmental class for each AA based on known
    folds (buried status, polarity, 2nd struct)

32
Protein Folding with Dynamic Programming
  • D. Eisenberg, 1994
  • Align sequence to each fold (a string of
    environmental classes)
  • Advantages: fast and works pretty well
  • Disadvantages: does not consider AA contacts
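
Aligning a sequence to a string of environmental classes can be sketched with a standard global-alignment dynamic program. The two environment classes, the preference scores, the gap penalty, and the fold names below are all made up for illustration; the original method uses a richer set of classes and scores derived from known structures.

```python
def thread_score(seq, envs, score, gap=-1.0):
    """Best global alignment score of a sequence against a fold's environment string."""
    rows, cols = len(seq) + 1, len(envs) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Score for placing residue i-1 into environment j-1.
            match = dp[i - 1][j - 1] + score.get((envs[j - 1], seq[i - 1]), 0.0)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

# Hypothetical classes: "B" = buried, "E" = exposed; made-up preference scores.
SCORE = {("B", "L"): 2.0, ("B", "K"): -2.0, ("E", "K"): 2.0, ("E", "L"): -1.0}
FOLDS = {"fold_1": "BBEE", "fold_2": "EEBB"}
best = max(FOLDS, key=lambda name: thread_score("LLKK", FOLDS[name], SCORE))
print(best)  # -> fold_1
```

Because each residue is scored only against its own position's environment, the table fills in time proportional to sequence length times fold length, which is why the approach is fast and also why it cannot see AA contacts.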

33
Fold Recognition Results
  • Each predictor can submit N top hits
  • Every predictor does well on something
  • Common folds (more examples) are easier to
    recognize
  • Fold recognition was the surprise performer at
    CASP1. Incremental progress at CASP2, CASP3,
    CASP4

34
Fold Recognition Results
  • Alignment (seq to fold) is a big problem

35
ab initio
  • Predict interresidue contacts and then compute
    structure (mild success)
  • Simplified energy terms and reduced search space
    (phi/psi or lattice) (moderate success)
  • Creative ways to memorize sequence ↔ structure
    correlations in short segments from the PDB, and
    use these to model new structures: ROSETTA

36
Ab initio prediction of protein structure
  • Sample conformational space such that native-like
    conformations are found
  • Hard to design functions that are not fooled by
    non-native conformations (decoys)
  • Astronomically large number of conformations: 5
    states per residue over 100 residues gives
    $5^{100} \approx 10^{70}$
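
The size of this search space is easy to check with exact integer arithmetic. The variable names below are illustrative only.

```python
import math

states_per_residue = 5
residues = 100
conformations = states_per_residue ** residues   # exact integer arithmetic
exponent = residues * math.log10(states_per_residue)
print(f"5^100 has {len(str(conformations))} digits, about 10^{exponent:.1f}")
```

The result has 70 digits, which matches the order of magnitude quoted on the slide.
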
37
Sampling conformational space: continuous
approaches
  • Most work in the field
  • Molecular dynamics
  • Continuous energy minimization (follow a valley)
  • Monte Carlo simulation
  • Genetic Algorithms
  • Like real polypeptide folding process
  • Cannot be sure if native-like conformations are
    sampled

38
Molecular dynamics
  • Force = −dU/dx (slope of potential U);
    acceleration from force = m a(t)
  • All atoms are moving, so forces between atoms are
    complicated functions of time
  • Analytical solution for x(t) and v(t) is
    impossible; numerical solution is trivial
  • Atoms move for very short time steps of $10^{-15}$
    seconds, or 0.001 picoseconds (ps)
  • $x(t+\Delta t) = x(t) + v(t)\,\Delta t + [4a(t) - a(t-\Delta t)]\,\Delta t^2/6$
  • $v(t+\Delta t) = v(t) + [2a(t+\Delta t) + 5a(t) - a(t-\Delta t)]\,\Delta t/6$
  • $U_{kinetic} = \frac{1}{2}\sum_i m_i v_i(t)^2 = \frac{1}{2} n k_B T$
  • Total energy ($U_{potential} + U_{kinetic}$) must not
    change with time
  • In the update equations, x(t) and v(t) are the old
    position and old velocity, a is the acceleration,
    and x(t+Δt) and v(t+Δt) are the new position and
    new velocity
  • n is the number of coordinates (not atoms)
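
The position and velocity updates on this slide have the form of Beeman's algorithm. The sketch below applies them to a toy system, a 1D harmonic oscillator with m = k = 1, chosen purely for illustration, and reports how well total energy is conserved. The function names and the choice to reuse a(0) for the missing earlier step are assumptions.

```python
def beeman_step(x, v, a, a_prev, accel, dt):
    """One Beeman step; returns new position, velocity and acceleration."""
    x_new = x + v * dt + (4.0 * a - a_prev) * dt * dt / 6.0
    a_new = accel(x_new)
    v_new = v + (2.0 * a_new + 5.0 * a - a_prev) * dt / 6.0
    return x_new, v_new, a_new

# Toy system (an assumption for illustration): 1D harmonic oscillator, m = k = 1.
def accel(x, k=1.0, m=1.0):
    return -k * x / m          # a = F/m with F = -dU/dx and U = k x^2 / 2

def total_energy(x, v, k=1.0, m=1.0):
    return 0.5 * k * x * x + 0.5 * m * v * v

x, v, dt = 1.0, 0.0, 0.01
a = accel(x)
a_prev = a                     # no earlier step exists, so reuse a(0) to start
e_start = total_energy(x, v)
for _ in range(1000):
    x, v, a_next = beeman_step(x, v, a, a_prev, accel, dt)
    a_prev, a = a, a_next
drift = abs(total_energy(x, v) - e_start)
print(f"energy drift after 1000 steps: {drift:.2e}")
```

In a real simulation `accel` would evaluate forces from the full potential over all atoms, and the drift in total energy is a common check that the time step is small enough.
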
39
Energy minimization
  • For a given protein, the energy depends on
    thousands of x,y,z Cartesian atomic coordinates;
    reaching a deep minimum is not trivial
  • Furthermore, we want to minimize the free energy,
    not just the potential energy.

40
Monte Carlo Simulation
  • Propose moves in torsion or Cartesian
    conformation space
  • Evaluate energy after every move, compute ΔE
  • Accept the new conformation based on ΔE
    (Metropolis-type rule: always if ΔE ≤ 0, otherwise
    with probability exp(−ΔE / k_B T))
  • If run infinite time, the simulated conformation
    follows the Boltzmann distribution
  • Many variations, including simulated annealing
    and other heuristic approaches.
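
A minimal sketch of this propose, evaluate, accept loop is given below, using the Metropolis acceptance rule. The one-dimensional double-well energy function stands in for a real conformational energy and is purely hypothetical, as are the function names, temperature, and step size.

```python
import math
import random

def metropolis_accept(delta_e, kT, u):
    """Metropolis rule: always accept downhill moves, uphill with exp(-dE/kT).

    u is a uniform random number in [0, 1), passed in so the rule is testable.
    """
    if delta_e <= 0.0:
        return True
    return u < math.exp(-delta_e / kT)

def energy(x):
    return (x * x - 1.0) ** 2          # hypothetical 1D double-well "landscape"

def monte_carlo(x, steps, kT, step_size, rng):
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)   # propose a move
        if metropolis_accept(energy(trial) - energy(x), kT, rng.random()):
            x = trial
    return x

rng = random.Random(0)
final_x = monte_carlo(x=3.0, steps=5000, kT=0.05, step_size=0.2, rng=rng)
print(round(energy(final_x), 3))
```

Simulated annealing follows the same loop but lowers `kT` gradually, so uphill moves become rarer as the run proceeds.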

41
Scoring/energy functions
  • Need a way to select native-like conformations
    from non-native ones
  • Physics-based functions: electrostatics, van der
    Waals, solvation, bond/angle terms.
  • Knowledge-based scoring functions
  • Derive information about atomic properties from a
    database of experimentally determined
    conformations
  • Common parameters include pairwise atomic
    distances and amino acid burial/exposure.
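
One common way to turn database statistics into a score is to compare how often a feature is observed with how often a reference model expects it, and take the negative logarithm of the ratio. The slide does not specify a formula, so this formulation, the function name, and the counts below are assumptions for illustration.

```python
import math

def knowledge_based_scores(observed, reference):
    """Score each distance bin as -ln(P_observed / P_reference).

    observed and reference map a distance bin to a raw count; all counts
    must be non-zero in this simple sketch.
    """
    n_obs = sum(observed.values())
    n_ref = sum(reference.values())
    scores = {}
    for bin_label, count in observed.items():
        p_obs = count / n_obs
        p_ref = reference[bin_label] / n_ref
        scores[bin_label] = -math.log(p_obs / p_ref)
    return scores

# Made-up counts for one hypothetical atom-pair type, binned by distance.
observed = {"0-4 A": 10, "4-8 A": 60, "8-12 A": 30}
reference = {"0-4 A": 25, "4-8 A": 40, "8-12 A": 35}
for bin_label, value in knowledge_based_scores(observed, reference).items():
    print(bin_label, round(value, 2))
```

Bins seen more often than expected get negative (favorable) scores, and bins seen less often get positive ones.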

42
Rosetta
  • D. Baker, U. Wash
  • Break sequence into short segments (7-9 AA)
  • Sample 3D from library of known segment
    structures, parallel computation
  • Use simulated annealing (Metropolis-type
    algorithm) for global optimization
  • Propose a change; if the energy is better, take it,
    otherwise take it with a smaller probability
  • Create 1000 structures, cluster and choose one
    representative from each cluster to submit

43
Manual Improvements and Automation
  • Very often manual examination could improve
    prediction
  • Catch errors
  • Need domain knowledge
  • A. Murzin's success at CASP2
  • CAFASP: Critical Assessment of Fully Automated
    Structure Prediction
  • Murzin can't play!!
  • MetaServers: combine different methods to get
    consensus

44
CAFASP Evaluation
45
Structural Genomics
  • With more and more solved structures and novel
    folds, computational protein structure prediction
    is going to improve
  • Structural genomics
  • Worldwide initiative for high-throughput
    determination of many protein structures
  • Especially, solve structures that have no homology
    to known structures

46
Summary
  • Protein structures: 1st, 2nd, 3rd, 4th
  • Different DBs: SwissProt, PDB and SCOP
  • Determine structure: X-ray crystallography
  • Protein structure prediction
  • 2nd structure prediction
  • Homology modeling
  • Fold recognition
  • Ab initio
  • Evaluation: energy, RMSD, etc
  • CASP and CAFASP contests
  • Manual improvement and combination of
    computational approaches work better
  • Structural genomics; still a very difficult problem

47
Acknowledgement
  • Amy Keating
  • Michael Yaffe
  • Mark Craven
  • Russ Altman