A Statistical Geometry Approach to the Study of Protein Structure


Title: A Statistical Geometry Approach to the Study of Protein Structure


1
A Statistical Geometry Approach to the Study of
Protein Structure
  • Majid Masso
  • Bioinformatics and Computational Biology
  • George Mason University

2
Protein Basics
  • formed by linearly linking amino acid (aa) residues,
    the building blocks of proteins
  • 20 distinct aa types:
    A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

3
Protein Basics
  • genes = code, or blueprint
  • proteins = product, or building
  • protein structure gives rise to function
  • why do things go wrong?
  • mistakes in blueprint
  • incorrectly built, or nonexistent buildings
  • Protein Data Bank (PDB): repository of protein
    structural data, including 3D coordinates of all
    atoms (www.rcsb.org/pdb/)

PDB ID: 1REZ. Structure reference: Muraki M.,
Harata K., Sugita N., Sato K., Origin of
carbohydrate recognition specificity of human
lysozyme revealed by affinity labeling,
Biochemistry 35 (1996)
4
Computational Geometry Approach to Protein
Structure Prediction
  • Tessellation
  • protein structure represented as a set of points
    in 3D, using Cα coordinates
  • Voronoi tessellation: convex polyhedra, each
    containing one Cα, with all interior points closer
    to this Cα than to any other
  • Delaunay tessellation: connect the four Cα atoms
    whose Voronoi polyhedra meet at a common vertex
  • vertices of Delaunay simplices objectively define
    a set of four nearest-neighbor residues
    (quadruplets)
  • 5 classes of Delaunay simplices
  • Quickhull algorithm (qhull program), Barber et
    al., UMN Geometry Center

Figure: Voronoi/Delaunay tessellation in 2D space.
Voronoi tessellation: dashed line; Delaunay
tessellation: solid line. (Adapted from Singh R.K.,
et al. J. Comput. Biol., 1996, 3, 213-222.)

Figure: Five classes of Delaunay simplices. (Adapted
from Singh R.K., et al. J. Comput. Biol., 1996, 3,
213-222.)
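
A minimal sketch of how the five simplex classes can be told apart, assuming (as in the cited Singh et al. scheme) that a class is determined by how many of the four residues are consecutive along the sequence. The function name `simplex_class` and the example positions are hypothetical, and the sketch assumes four distinct positions on a single chain.

```python
def simplex_class(positions):
    """Return the run lengths of sequence-consecutive residues, largest first.

    Assumption: a simplex class is the way its four sequence positions
    split into runs of neighbors along the chain, e.g. (2, 1, 1).
    """
    pos = sorted(positions)
    runs = [1]
    for prev, cur in zip(pos, pos[1:]):
        if cur == prev + 1:
            runs[-1] += 1      # extends the current run of neighbors
        else:
            runs.append(1)     # gap in sequence: start a new run
    return tuple(sorted(runs, reverse=True))


# Hypothetical residue positions, one example per class.
examples = [
    (10, 11, 12, 13),
    (10, 11, 12, 40),
    (10, 11, 40, 41),
    (10, 11, 40, 60),
    (5, 20, 40, 60),
]
classes = [simplex_class(e) for e in examples]
print(classes)
```

The five possible results correspond to the five ways of partitioning 4 into run lengths.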
5
Counting Quadruplets
  • assuming order independence among residues
    comprising Delaunay simplices, the maximum number
    of all possible combinations of quadruplets
    forming such simplices is 8855
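
The count of 8855 can be checked directly: with order ignored and repeats allowed, choosing 4 residues from 20 types is a combinations-with-repetition problem. The sketch below enumerates them and groups them by composition; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every unordered quadruplet of residue types, repeats allowed.
quads = list(combinations_with_replacement(AMINO_ACIDS, 4))
total = len(quads)

# Closed form for combinations with repetition: C(20 + 4 - 1, 4).
closed_form = comb(len(AMINO_ACIDS) + 4 - 1, 4)

# Group by composition, e.g. (2, 1, 1) = one type twice plus two other types.
by_composition = Counter(
    tuple(sorted(Counter(q).values(), reverse=True)) for q in quads
)

print(total, closed_form)
for composition, count in sorted(by_composition.items()):
    print(composition, count)
```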

6
Residue Environment Scores
  • log-likelihood: $q_{ijkl} = \log\left( f_{ijkl} / p_{ijkl} \right)$
  • $f_{ijkl}$ = normalized frequency of quadruplets
    containing residues i,j,k,l in a representative
    training set of high-resolution protein
    structures with low primary sequence identity
  • i.e., $f_{ijkl}$ = total number of quadruplets in
    dataset containing only residues i,j,k,l divided
    by total number of observed quadruplets
  • $p_{ijkl}$ = frequency of random occurrence of the
    quadruplet (multinomial)
  • i.e., $p_{ijkl} = C \, a_i a_j a_k a_l$
  • $a_i$ = total number of occurrences of residue i
    divided by total number of residues in the
    dataset
  • $C = 4! / \prod_{i=1}^{n} (t_i!)$, where n = number
    of distinct residue types in the quadruplet, and
    $t_i$ is the number of residues of type i.
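
A minimal sketch of the score described above: the observed quadruplet frequency divided by the frequency expected at random under a multinomial model, then logged. The function names and toy input are hypothetical, and base-10 logarithms are an assumption since the slide does not state the base. Every residue type in a quadruplet is assumed to occur in the residue list.

```python
from collections import Counter
from math import factorial, log10


def multinomial_coefficient(quad):
    """4! divided by the product of t! over each distinct residue type."""
    c = factorial(len(quad))
    for t in Counter(quad).values():
        c //= factorial(t)
    return c


def log_likelihood_scores(quadruplets, residues):
    """Map each order-independent quadruplet to log10(observed / expected)."""
    quad_counts = Counter(tuple(sorted(q)) for q in quadruplets)
    total_quads = sum(quad_counts.values())
    res_counts = Counter(residues)
    total_res = sum(res_counts.values())

    scores = {}
    for quad, count in quad_counts.items():
        observed = count / total_quads
        expected = multinomial_coefficient(quad)
        for r in quad:
            expected *= res_counts[r] / total_res   # residue frequency
        scores[quad] = log10(observed / expected)   # base 10 is assumed
    return scores


# Toy input, not real training data.
toy_residues = list("AAAAACCCCC")
toy_quads = [("A", "A", "C", "C"), ("C", "A", "A", "C"),
             ("A", "A", "A", "A"), ("A", "C", "C", "C")]
toy_scores = log_likelihood_scores(toy_quads, toy_residues)
print(toy_scores)
```

A positive score means the quadruplet is seen more often than chance predicts; a score of zero means it occurs at the chance rate.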

7
Residue Environment Scores
  • total statistical potential (topological score)
    of a protein: sum the log-likelihoods of all
    quadruplets forming the Delaunay simplices
  • individual residue potentials: sum the
    log-likelihoods of all quadruplets in which the
    residue participates (yields a 3D-1D potential
    profile)
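
A minimal sketch of the two sums, assuming the simplices are given as 4-tuples of residue positions and the log-likelihoods as a lookup table keyed by sorted residue types. The function name, the toy values, and the choice to score unseen quadruplets as 0 are all assumptions.

```python
def potential_profile(simplices, sequence, scores):
    """Return (total potential, per-residue potential profile)."""
    profile = [0.0] * len(sequence)
    total = 0.0
    for simplex in simplices:
        quad = tuple(sorted(sequence[i] for i in simplex))
        q = scores.get(quad, 0.0)   # assumption: unseen quadruplets score 0
        total += q                  # each simplex counted once in the total
        for i in simplex:
            profile[i] += q         # and once for each of its four residues
    return total, profile


# Toy input, not real data.
toy_sequence = "AACCC"
toy_simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]
toy_scores = {("A", "A", "C", "C"): 0.5, ("A", "C", "C", "C"): -0.25}
toy_total, toy_profile = potential_profile(toy_simplices, toy_sequence, toy_scores)
print(toy_total, toy_profile)
```

Because every simplex has four vertices, the profile entries sum to four times the total potential.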

PDB ID: 3phv. HIV-1 Protease Monomer, 99 amino
acids (total potential 27.93)
Structure reference: R. Lapatto, T. Blundell, A.
Hemmings, et al., X-ray analysis of HIV-1
proteinase at 2.7 Å resolution confirms
structural homology among retroviral enzymes,
Nature 342 (1989) 299-302.
8
HIV-1 Protease Comprehensive Mutational Profile
(CMP)
  • replace the residue at each of the 99 positions in
    the primary sequence with each of the 19 other
    amino acids
  • get total potential and potential profile of each
    artificially created mutant protein
  • create 20x99 matrix containing total potentials
    of all the single residue mutants
  • columns labeled with residues in the primary
    sequence of wild-type (WT) HIV-1 protease
    monomer, and rows labeled with the 20 naturally
    occurring amino acids
  • subtract WT total potential (TP) from each cell,
    then average columns to get CMP
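  • $\mathrm{CMP}_j = \operatorname{mean}_i \left[ (\text{mutant TP})_{ij} - (\text{WT TP}) \right]
    = \operatorname{mean}_i \left[ (\text{mutant TP})_{ij} \right] - 27.93, \quad j = 1, \ldots, 99$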
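
A minimal sketch of the CMP steps listed on this slide, assuming a caller-supplied function that returns the total potential of any sequence. The names `single_mutant_matrix`, `cmp_profile`, and `toy_potential` are hypothetical. The column average is taken literally over all 20 rows, so the row matching the WT residue contributes a zero difference; whether the original work averaged over 20 or 19 rows is not stated here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutant_matrix(wt_sequence, total_potential):
    """Rows: the 20 amino acids. Columns: positions in the WT sequence."""
    matrix = []
    for aa in AMINO_ACIDS:
        row = []
        for j in range(len(wt_sequence)):
            mutant = wt_sequence[:j] + aa + wt_sequence[j + 1:]
            row.append(total_potential(mutant))
        matrix.append(row)
    return matrix


def cmp_profile(matrix, wt_tp):
    """Subtract the WT total potential from each cell, then average columns."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [
        sum(row[j] - wt_tp for row in matrix) / n_rows
        for j in range(n_cols)
    ]


def toy_potential(sequence):
    """Stand-in scoring function for illustration only: counts alanines."""
    return sequence.count("A")


toy_wt = "AC"
toy_matrix = single_mutant_matrix(toy_wt, toy_potential)
toy_cmp = cmp_profile(toy_matrix, toy_potential(toy_wt))
print(toy_cmp)
```

In practice `total_potential` would be the sum of quadruplet log-likelihoods over the Delaunay simplices of the protein.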

9
(No Transcript)
10
Structure-Function Correlations
  • 536 single point missense mutations
  • 336 published mutants: Loeb D.D., Swanstrom R.,
    Everitt L., Manchester M., Stamper S.E.,
    Hutchison III C.A. Complete mutagenesis of the
    HIV-1 protease. Nature, 1989, 340, 397-400
  • 200 mutants provided by R. Swanstrom (UNC)
  • each mutant placed in one of 3 phenotypic
    categories, positive, negative, or intermediate,
    based on activity
  • mutant activity compared with change in
    sequence-structure compatibility elucidated by
    potential data
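
One simple way to make this comparison, sketched under the assumption that each mutant is recorded as a (phenotype, change in total potential) pair. The function name is hypothetical and the numbers below are made-up placeholders, not values from the study.

```python
from statistics import mean


def mean_change_by_phenotype(mutants):
    """Average the change in total potential within each phenotype category."""
    groups = {}
    for phenotype, delta in mutants:
        groups.setdefault(phenotype, []).append(delta)
    return {phenotype: mean(deltas) for phenotype, deltas in groups.items()}


# Placeholder records for illustration only (not experimental data).
toy_mutants = [
    ("positive", -0.5),
    ("positive", -1.5),
    ("intermediate", -2.0),
    ("negative", -4.0),
]
toy_summary = mean_change_by_phenotype(toy_mutants)
print(toy_summary)
```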

11
Observations
  • the set of mutants with unaffected protease activity
    exhibits a minimal (negative) change in potential
  • the set of mutants that inactivate protease exhibits
    a large negative change in potential, weighted
    heavily by NC
  • the set of mutants with intermediate phenotypes
    exhibits a moderate negative change in potential
    (similar among C and NC); the intermediate
    phenotype spans a wide range in the experiments

12
Acknowledgements
  • Iosif Vaisman (Ph.D. advisor, first to apply
    Delaunay to protein structure)
  • Zhibin Lu (Java programs for calculating
    statistical potentials from tessellations)
  • Ronald Swanstrom (experimental HIV-1 protease
    mutants and activity measure)