A MULTIBODY ATOMIC STATISTICAL POTENTIAL FOR PREDICTING ENZYME-INHIBITOR BINDING ENERGY

Majid Masso (mmasso@gmu.edu)
Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, 10900 University Blvd. MS 5B3, Manassas, Virginia 20110, USA
II. Protein Data Bank (http://www.rcsb.org/pdb)
III. Macromolecular Modeling
I. Abstract
Accurate prediction of enzyme-inhibitor binding energy has the capacity to speed drug design and chemical genomics efforts by helping to narrow the focus of experiments. Here a non-redundant set of three hundred high-resolution crystallographic enzyme-inhibitor structures was compiled for analysis, consisting of complexes with known binding energies (ΔG) based on the availability of experimentally determined inhibition constants (Ki). Additionally, a separate set of over 1400 diverse high-resolution macromolecular crystal structures was collected for the purpose of creating an all-atom knowledge-based statistical potential, via application of the Delaunay tessellation computational geometry technique. Next, two hundred of the enzyme-inhibitor complexes were randomly selected to develop a model for predicting binding energy, first by tessellating structures of the complexes as well as the enzymes without their bound inhibitors, then by using the statistical potential to calculate a topological score for each structure tessellation. We derived as a predictor of binding energy an empirical linear function of the difference between topological scores for a complex and its isolated enzyme. A correlation coefficient (r) of 0.79 was obtained for the experimental and calculated ΔG values, with a standard error of 2.34 kcal/mol. Lastly, the model was evaluated with the held-out set of one hundred complexes, for which structure tessellations were performed in order to calculate topological score differences, and binding energy predictions were generated from the derived linear function. Calculated binding energies for the test data also compared well with their experimental counterparts, displaying a correlation coefficient of r = 0.77 with a standard error of 2.50 kcal/mol.

PDB: repository of solved (X-ray, NMR, ...) structures
Each structure file contains atomic 3D coordinate data

Native structure is the conformation having the lowest energy
Physics-based energy calculations using quantum mechanics are computationally impractical
Same for molecular mechanics-based potential energy functions (i.e., force fields): E(total) = E(bond) + E(angle) + E(dihedral) + E(electrostatic) + E(van der Waals)
Alternative (our approach): knowledge-based potentials of mean force (i.e., generated from known protein structures)

[Table of atomic coordinates; columns: Atom, X, Y, Z]

IV. Knowledge-Based Potentials of Mean Force
V. Motivational Example: Pairwise Amino Acid Potential
VI. All-Atom Four-Body Statistical Potential

Obtain diverse PDB dataset of 1417 single-chain and multimeric proteins, many complexed to ligands (see XV. References)
Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)
Apply Delaunay tessellation to the atomic point coordinates of each PDB file: this objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)
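
The alphabet reduction and the 8 angstrom cutoff can be illustrated with a minimal Python sketch. It assumes the tessellation has already been computed (for example by Qhull) and is available as a list of four-index tuples. The `METALS` set, the function names, and the reading of the cutoff as "discard any tetrahedron with an edge longer than 8 angstroms" are assumptions made for illustration, not details stated on the poster.

```python
import math
from itertools import combinations

# Hypothetical, incomplete set of metal element symbols; the poster does not list them.
METALS = {"NA", "MG", "K", "CA", "MN", "FE", "CO", "NI", "CU", "ZN"}


def atom_letter(element):
    """Map an element symbol to the six-letter alphabet C, N, O, S, M, X."""
    symbol = element.strip().upper()
    if symbol in {"C", "N", "O", "S"}:
        return symbol
    if symbol in METALS:
        return "M"
    return "X"


def filter_simplices(coords, simplices, cutoff=8.0):
    """Keep tetrahedra whose six edges are all no longer than `cutoff` angstroms."""
    kept = []
    for simplex in simplices:
        edges = combinations(simplex, 2)
        if all(math.dist(coords[a], coords[b]) <= cutoff for a, b in edges):
            kept.append(tuple(simplex))
    return kept


def quad_label(elements, simplex):
    """Order-independent label of the atom types at a tetrahedron's vertices."""
    return "".join(sorted(atom_letter(elements[i]) for i in simplex))


# Toy coordinates (made up); the fifth atom is far from the rest.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0),
          (0.0, 0.0, 1.5), (20.0, 0.0, 0.0)]
elements = ["C", "N", "O", "ZN", "C"]
simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]  # as if returned by a tessellation

kept = filter_simplices(coords, simplices)
labels = [quad_label(elements, s) for s in kept]
```

Sorting the four letters makes the label independent of vertex order, so a tetrahedron is counted as the same quadruplet type however its vertices are listed.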

Assumptions:
  At equilibrium, the native state has the global free energy minimum
  Microscopic states (i.e., features) follow a Boltzmann distribution
Examples:
  Well documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level
  This study: inclusion of higher-order contributions by developing all-atom four-body statistical potentials
Motivation (our prior work):
  Four-body protein potential at the amino acid level

A 20-letter protein alphabet yields 210 residue pairs
Obtain a large, diverse PDB dataset of single protein chains
For each residue pair (i, j), calculate the relative frequency f_ij with which the two residues appear within a given distance (e.g., 12 angstroms) of each other in all the protein structures
Calculate a rate p_ij expected by chance alone from a background or reference distribution (more later)
Apply the inverted Boltzmann principle: s_ij = log(f_ij / p_ij) quantifies interaction propensity and is proportional to the energy of interaction (by a factor of RT)
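
The pairwise scoring steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the reference distribution is taken to be two independent draws from the overall residue composition (the poster defers its definition), the logarithm is taken base 10, and the counts are invented toy numbers.

```python
import math
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A 20-letter alphabet gives 210 unordered residue pairs (190 mixed + 20 identical).
ALL_PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))


def pair_scores(pair_counts, residue_freqs):
    """Return s_ij = log10(f_ij / p_ij) for every pair with a nonzero count.

    pair_counts: {(i, j): number of contacts within the distance cutoff}
    residue_freqs: {i: fraction of all residues that are of type i}
    """
    total = sum(pair_counts.values())
    scores = {}
    for (i, j), count in pair_counts.items():
        if count == 0:
            continue  # log of zero is undefined; unobserved pairs get no score here
        f_ij = count / total
        # Assumed reference: two independent draws from the residue composition.
        if i == j:
            p_ij = residue_freqs[i] ** 2
        else:
            p_ij = 2 * residue_freqs[i] * residue_freqs[j]
        scores[(i, j)] = math.log10(f_ij / p_ij)
    return scores


# Toy two-letter example (made-up numbers, not data from the poster).
freqs = {"A": 0.5, "L": 0.5}
counts = {("A", "A"): 20, ("A", "L"): 40, ("L", "L"): 40}
scores = pair_scores(counts, freqs)
```

In this toy case the L-L pair is seen more often than chance predicts and gets a positive score, while A-A and A-L are seen less often and get negative scores.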

VII. All-Atom Four-Body Statistical Potential
VIII. Summary Data for the 1417 Structure Files
and their Delaunay Tessellations
IX. All-Atom Four-Body Statistical Potential

A six-letter atomic alphabet yields 126 distinct quadruplets
For each quad (i, j, k, l), calculate the observed rate of occurrence f_ijkl among all tetrahedra from the 1417 structure tessellations
Compute the rate p_ijkl expected by chance from a multinomial reference distribution, where
a_n = proportion of atoms from all structures that are of type n
t_n = number of occurrences of atom type n in the quad
Apply the inverted Boltzmann principle: s_ijkl = log(f_ijkl / p_ijkl) quantifies the interaction propensity and is proportional to the energy of atomic quadruplet interaction
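
A minimal Python sketch of the four-body scoring follows. The transcript does not preserve the formula for the chance rate, so the code assumes the standard multinomial form p = (4! / prod(t_n!)) * prod(a_n ^ t_n), which matches the definitions of a_n and t_n given above. The base-10 logarithm, the function names, and the uniform toy composition are also assumptions.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

ALPHABET = "CNOSMX"

# Unordered quadruplets over six atom types: C(6 + 4 - 1, 4) = 126.
ALL_QUADS = ["".join(q) for q in combinations_with_replacement(sorted(ALPHABET), 4)]


def expected_rate(quad, atom_freqs):
    """Assumed multinomial chance rate: (4! / prod(t_n!)) * prod(a_n ** t_n)."""
    t = Counter(quad)  # t_n: occurrences of atom type n in the quad
    coefficient = math.factorial(4)
    probability = 1.0
    for atom_type, t_n in t.items():
        coefficient //= math.factorial(t_n)
        probability *= atom_freqs[atom_type] ** t_n
    return coefficient * probability


def four_body_potential(quad_counts, atom_freqs):
    """Return s_ijkl = log10(f_ijkl / p_ijkl) for each quad with a nonzero count."""
    total = sum(quad_counts.values())
    potential = {}
    for quad, count in quad_counts.items():
        if count == 0:
            continue  # log of zero is undefined; unobserved quads get no score here
        f = count / total
        potential[quad] = math.log10(f / expected_rate(quad, atom_freqs))
    return potential


# Uniform toy composition (made up): the 126 chance rates should sum to 1.
uniform = {n: 1 / 6 for n in ALPHABET}
total_p = sum(expected_rate(q, uniform) for q in ALL_QUADS)
```

Checking that the chance rates over all 126 quadruplet types sum to 1 is a quick sanity test that the reference distribution is a proper probability distribution.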

X. Topological Score (TS)
XII. Application of ΔTS: Predicting Enzyme-Inhibitor Binding Energy
XI. Topological Score Difference (ΔTS)

Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices
Each simplex can be scored using the all-atom four-body potential, based on the quad present at its four vertices
Topological score (or total potential) of the structure: the sum of the scores of all constituent simplices in the tessellation
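
The scoring of a tessellation can be sketched as a simple sum in Python. The potential values and quad labels below are invented for illustration. Two further assumptions are made: quads missing from the potential table contribute zero, and the score difference is taken as complex minus isolated enzyme, following the wording of the abstract.

```python
def topological_score(quads, potential):
    """Sum the four-body scores of all simplices in one tessellation.

    quads: one atom-type label per tetrahedron, e.g. "CCNO"
    potential: {label: s_ijkl}; labels missing from the table contribute 0.0
               (an assumption; the poster does not say how they are treated)
    """
    return sum(potential.get(q, 0.0) for q in quads)


def score_difference(complex_quads, enzyme_quads, potential):
    """Delta TS = TS(complex) - TS(enzyme without its inhibitor), assumed ordering."""
    ts_complex = topological_score(complex_quads, potential)
    ts_enzyme = topological_score(enzyme_quads, potential)
    return ts_complex - ts_enzyme


# Toy potential and tessellations (made-up values for illustration only).
potential = {"CCNO": 0.25, "CCCC": -0.5, "CNOS": 0.75}
enzyme = ["CCNO", "CCCC"]
complex_ = ["CCNO", "CCCC", "CNOS", "CNOS"]

ts_enzyme = topological_score(enzyme, potential)
delta_ts = score_difference(complex_, enzyme, potential)
```

Note that removing the inhibitor changes the tessellation near the binding site, so in practice the two lists of simplices come from two separate tessellations rather than one being a subset of the other as in this toy case.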

MOAD: repository of experimental inhibition constants (Ki) for protein-ligand complexes whose structures are in the PDB
Collected Ki values for 300 complexes reflecting diverse protein structures
Obtained experimental binding energy from Ki via ΔGexp = RT ln(Ki)
Calculated ΔTS for complexes
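
The conversion from an inhibition constant to a binding energy is a one-line calculation, sketched here in Python. The temperature of 298.15 K is an assumption (the poster does not state one), and Ki is taken to be in mol/L.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_energy(ki_molar, temperature_k=298.15):
    """Delta G_exp = R * T * ln(K_i), in kcal/mol, with K_i in mol/L."""
    if ki_molar <= 0:
        raise ValueError("K_i must be positive")
    return R_KCAL * temperature_k * math.log(ki_molar)


dg_nanomolar = binding_energy(1e-9)   # a 1 nM inhibitor
dg_micromolar = binding_energy(1e-6)  # a 1 uM inhibitor
```

Under these assumptions a 1 nM inhibitor corresponds to roughly -12.3 kcal/mol and a 1 uM inhibitor to roughly -8.2 kcal/mol, so tighter binding gives a more negative ΔGexp.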

TS = Σ s_ijkl
XIII. Predicting Enzyme-Inhibitor Binding Energy
XIV. Predicting Enzyme-Inhibitor Binding Energy
XV. References and Acknowledgments

PDB dataset: http://proteins.gmu.edu/automute/tessellatable1417.txt
Train/test dataset: http://proteins.gmu.edu/automute/MOAD300ki.txt
PDB (structure DB): http://www.rcsb.org/pdb
MOAD (ligand binding DB): http://bindingmoad.org/
Qhull (Delaunay tessellation): http://www.qhull.org/
UCSF Chimera (ribbon/ball-stick structure visualization): http://www.cgl.ucsf.edu/chimera/
Matlab (tessellation visualization): http://www.mathworks.com/products/matlab/

For the test set of 100 remaining complexes:
r = 0.77 between ΔGcalc and ΔGexp
SE = 2.50 kcal/mol
Fitted regression line is y = 1.07x […] 0.46
All training/test data are available online as a text file (see XV. References)

Randomly selected 200 complexes to train a model
Correlation coefficient r = 0.79 between ΔTS and ΔGexp
Empirical linear transform of ΔTS to reflect energy values: ΔGcalc = (1 / 0.0003) ΔTS […] 6.24
The transform is linear, so the same r = 0.79 holds between ΔGcalc and ΔGexp
Also, standard error SE = 2.34 kcal/mol and fitted regression line y = 0.98x […] 0.41 (y = ΔGcalc and x = ΔGexp)
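
The statistics quoted for the model (correlation coefficient, standard error, fitted regression line) can be computed as in the Python sketch below, which also checks that a linear transform with positive slope leaves r unchanged. All numbers are invented toy values and the transform coefficients are placeholders, not the poster's fitted model. "Standard error" is interpreted here as the standard error of the estimate from a least-squares fit, which is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus the standard error of the estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return slope, intercept, math.sqrt(sse / (n - 2))


# Made-up numbers, not the poster's data.
delta_ts = [-0.0020, -0.0012, -0.0031, -0.0005, -0.0026]
dg_exp = [-9.1, -7.4, -12.0, -6.2, -10.3]

# Hypothetical linear transform with a positive slope (coefficients are placeholders).
dg_calc = [3000.0 * v - 2.0 for v in delta_ts]

r_raw = pearson_r(delta_ts, dg_exp)
r_calc = pearson_r(dg_calc, dg_exp)
slope, intercept, se = fit_line(dg_exp, dg_calc)
```

Because correlation is invariant under positive linear rescaling, `r_raw` and `r_calc` agree up to floating-point error, which is why the poster can report a single r value for both ΔTS and ΔGcalc against ΔGexp.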