Title: Protein-ligand docking: A case study of DEF docking motif interactions in MAP kinases
1Protein-ligand docking: A case study of DEF
docking motif interactions in MAP kinases
- Yong Kong
- Bioinformatics Resource
- Yale University
2Outline
- Available programs in Bioinformatics Resource
- Introduction to molecular docking
- Autodock 4: a free docking software
- Substrate discrimination among MAP kinases
through distinct docking motifs
- Modeling DEF docking motif interactions in MAP
kinases using Autodock 4
3Available commercial software
- DNA/protein sequence analysis
- Lasergene
- Gene Construction Kits
- Microarray analysis
- Genespring GX
- Partek Genomics Suite
- Pathway Analysis
- Ingenuity Pathway Analysis
- MetaCore
- Genotyping analyses
- Genespring GT
- HelixTree
4Available commercial software
- Protein structure modeling and visualization
- SYBYL 8
- Pipelining programs
- Pipeline Pilot
- VIBE
- Mass spectrometry data analysis
- GPMAW
5SYBYL
6SYBYL
7SYBYL
- SYBYL Base: comprehensive tools for molecular
modeling
- structure building, optimization, and comparison
- visualization of structures and associated data
- annotation, hardcopy, and screen capture
capabilities
- a wide range of force fields
Electrostatic potential for inhibitor
methotrexate bound to dihydrofolate reductase
8SYBYL
- Receptor-Based Design: docking and de novo design
- Ligand-Based Design: QSAR, ADME, pharmacophore,
structure alignment, etc.
Docked inhibitor (yellow) superimposed with
crystal structure (purple)
Left: pharmacophore model; right: X-ray structure
(CDK2 inhibitors)
9SYBYL
- Protein Modeling
- A database of detailed structural profiles of all
known protein families
- Structural homologs identified by
sequence-structure comparison
- Comparative models built from a target sequence
using single or multiple structural homologs
A set of structurally aligned oxidoreductase
structures of 8% sequence identity.
10Molecular docking
- Computationally predict
- the structure (pose)
- binding free energy
- of the intermolecular complex formed between two
or more constituent molecules
11Questions and Goals
- The questions we are interested in are
- Do two biomolecules bind each other?
- If so, how and where do they bind?
- What is the binding free energy or affinity?
- What chemical groups determine the binding?
- The goals we have are
- Searching for lead compounds
- Estimating effect of modifications
- General understanding of binding
- Design directed libraries
12Docking input data
- The starting point
- the atomic coordinates of the two molecules
- Additional data
- biochemical
- mutational
- conservational
- These additional data can significantly improve
docking performance; however, this extra information
is not strictly required
13Docking: two components
- Two related components of docking
- Search algorithm: sample sufficiently and
efficiently the degrees of freedom of the
protein-ligand system (position, orientation, and
conformation)
- Scoring function: represent the thermodynamics of
interactions so as to distinguish the true
binding modes from all the other possible
solutions, and to rank them accordingly
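The split between search and scoring can be illustrated with a deliberately tiny toy problem. The sketch below is not how any docking program works internally: the "receptor" is three made-up points in a plane, the "ligand" is a single point, the scoring function is a 12-6 Lennard-Jones-like sum, and the search is plain uniform random sampling. All names and numbers are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy "receptor": three atoms in the plane (not real coordinates).
RECEPTOR = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.5)]

def score(pose):
    """Scoring function: sum of 12-6 Lennard-Jones-like terms (lower is better)."""
    x, y = pose
    total = 0.0
    for rx, ry in RECEPTOR:
        r = max(math.hypot(x - rx, y - ry), 0.5)  # clamp to avoid division by ~0
        total += (2.0 / r) ** 12 - 2.0 * (2.0 / r) ** 6
    return total

def random_search(n_samples, rng):
    """Search algorithm: sample poses uniformly in a box and rank them by score."""
    poses = [(rng.uniform(-2.0, 6.0), rng.uniform(-2.0, 6.0)) for _ in range(n_samples)]
    return sorted(poses, key=score)

ranked = random_search(2000, random.Random(0))
best = ranked[0]  # lowest-scoring (most favourable) pose found
```

Swapping `random_search` for a smarter sampler, or `score` for a better energy model, changes one component without touching the other, which is the point of treating them separately.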
14Flowchart of docking algorithms
15Rigid or flexible molecules
- Protein-ligand
- Rigid protein, rigid ligand
- Rigid protein, flexible ligand
- Flexible protein, flexible ligand
- Protein-protein
- Rigid protein, rigid protein (still the
standard)
- Introducing flexibility into protein-protein
docking is challenging
16Docking software: total number of citations through
2005
Sousa et al. (2006)
17Docking programs: citations per year
Sousa et al. (2006)
18Docking programs: percentage of citations per year
freely available for academic users
Sousa et al. (2006)
19Autodock
- Developed in Arthur Olson's lab at The Scripps
Research Institute
- Free academic license
- The most used program for molecular docking
- The latest version is Autodock 4
20Autodock features
- Pre-calculate atomic affinity potentials for each
atom type in the ligand
- Support different search methods
- Lamarckian genetic algorithm (LGA)
- traditional genetic algorithm (GA)
- Monte Carlo simulated annealing
- Reasonably accurate binding free energy: the
scaling factors are empirically calibrated from
experimental data
21Pre-calculated grid maps
- A grid map consists of a three dimensional
lattice of regularly spaced points, surrounding
(either entirely or partly) and centered on some
region of interest of the macromolecule under
study.
- The probe's energy at each grid point is
determined by the set of parameters supplied for
that particular atom type, and is the summation
over all atoms of the macromolecule, within a
non-bonded cutoff radius, of all pairwise
interactions.
From AutoDock manual
22Pre-calculated grid maps
- After the grid map is calculated, it can be used
repeatedly in the docking calculations
- The time to perform an energy calculation using
the grids is proportional only to the number of
atoms in the ligand, and is independent of the
number of atoms in the receptor
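A minimal sketch of the grid idea, assuming a single probe atom type and a made-up pair potential. The function names, the potential, and the convention that `npts` counts lattice points per axis are assumptions of this example, not Autodock's own code or file format. The grid is filled once by summing pairwise terms over receptor atoms within a cutoff; afterwards each ligand atom costs one trilinear interpolation, whatever the receptor size.

```python
import math
from itertools import product

def lj_pair(r):
    """Hypothetical 12-6 pair potential used only for this illustration."""
    return (2.0 / r) ** 12 - 2.0 * (2.0 / r) ** 6

def build_grid(receptor_atoms, center, npts, spacing, pair_energy, cutoff=8.0):
    """Precompute the probe energy at every lattice point (done once per atom type)."""
    origin = tuple(c - spacing * (npts - 1) / 2.0 for c in center)
    grid = {}
    for i, j, k in product(range(npts), repeat=3):
        p = (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing)
        e = 0.0
        for atom in receptor_atoms:
            r = math.dist(p, atom)
            if r < cutoff:                      # non-bonded cutoff radius
                e += pair_energy(max(r, 0.5))   # clamp very short distances
        grid[(i, j, k)] = e
    return origin, grid

def lookup(origin, grid, spacing, point):
    """Trilinear interpolation; the point must lie strictly inside the grid."""
    f = [(point[d] - origin[d]) / spacing for d in range(3)]
    i0 = [int(math.floor(v)) for v in f]
    t = [f[d] - i0[d] for d in range(3)]
    e = 0.0
    for di, dj, dk in product((0, 1), repeat=3):
        w = ((t[0] if di else 1.0 - t[0])
             * (t[1] if dj else 1.0 - t[1])
             * (t[2] if dk else 1.0 - t[2]))
        e += w * grid[(i0[0] + di, i0[1] + dj, i0[2] + dk)]
    return e

def ligand_energy(origin, grid, spacing, ligand_atoms):
    """One lookup per ligand atom; no loop over receptor atoms is needed."""
    return sum(lookup(origin, grid, spacing, a) for a in ligand_atoms)

# Toy usage: a one-atom "receptor" at the origin and a 9 x 9 x 9 lattice.
origin, grid = build_grid([(0.0, 0.0, 0.0)], (0.0, 0.0, 0.0), 9, 0.5, lj_pair)
```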
23Genetic Algorithm (GA)
- Computational method based on the ideas and
language of natural genetics and evolution
- State variables (translation, orientation, and
conformation of the ligand) correspond to genes
- These genes make up the genotype
- Atomic coordinates are the phenotype
- Fitness is the total interaction energy
Figure (chromosome): genes Gene1, Gene2, Gene3, ... encode
x, y, z (translation); q0, q1, q2, q3 (quaternion);
t1, t2, t3 (one for each torsion)
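As a rough sketch of how such a chromosome could be turned into coordinates: the example below decodes a flat gene list into translation, quaternion, and torsion genes, then applies only the rigid-body part (rotation followed by translation). Treating q0 as the scalar part of the quaternion is an assumption, the function names are hypothetical, and torsions are decoded but not applied because that would require the ligand's bond topology.

```python
import math

def decode(chromosome, n_torsions):
    """Split a flat list of genes into translation, unit quaternion, and torsions."""
    assert len(chromosome) == 7 + n_torsions
    translation = tuple(chromosome[0:3])
    q = chromosome[3:7]
    norm = math.sqrt(sum(c * c for c in q))
    quaternion = tuple(c / norm for c in q)  # normalise so it is a pure rotation
    torsions = tuple(chromosome[7:])
    return translation, quaternion, torsions

def rotate(q, p):
    """Rotate point p by unit quaternion q = (q0, q1, q2, q3), q0 = scalar part."""
    w, x, y, z = q
    px, py, pz = p
    return (
        (1 - 2 * (y * y + z * z)) * px + 2 * (x * y - w * z) * py + 2 * (x * z + w * y) * pz,
        2 * (x * y + w * z) * px + (1 - 2 * (x * x + z * z)) * py + 2 * (y * z - w * x) * pz,
        2 * (x * z - w * y) * px + 2 * (y * z + w * x) * py + (1 - 2 * (x * x + y * y)) * pz,
    )

def phenotype(chromosome, n_torsions, ligand_coords):
    """Genotype -> phenotype for the rigid-body genes only (torsions not applied)."""
    translation, quaternion, _torsions = decode(chromosome, n_torsions)
    placed = []
    for p in ligand_coords:
        r = rotate(quaternion, p)
        placed.append(tuple(r[d] + translation[d] for d in range(3)))
    return placed
```

For instance, a quaternion of (cos 45°, 0, 0, sin 45°) is a 90° turn about z, so the point (1, 0, 0) lands on (0, 1, 0) before the translation is added.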
24Genetic Algorithm (GA)
- The evolution starts from a population of
randomly generated individuals
- Individuals are mated at random
- New individuals inherit genes from either
parent through crossover
- ABC/abc → AbC/aBc
- Some offspring undergo random mutation (one
gene is changed by a random amount)
- Selection of offspring is based on fitness
25Genetic Algorithm (GA)
Create a random population
Fitness evaluation
Selection: best individuals to reproduce, and
their offspring
Crossover: ABC/abc → AbC/aBc
Mutation (based on Cauchy distribution)
Elitist selection (top individuals survive into next
generation)
Termination: maximum number of generations OR energy
evaluations reached?
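The loop above can be sketched in a few dozen lines. This is a generic genetic algorithm on a toy energy function, not Autodock's implementation: the energy, the truncation selection scheme, and every parameter value are assumptions made for illustration. It does follow the listed steps: random population, fitness evaluation, selection, two-point crossover, Cauchy-distributed mutation, elitism, and termination on either a generation or an evaluation budget.

```python
import math
import random

def energy(genes):
    """Toy 'interaction energy' to minimise (stand-in for a docking score)."""
    return sum(g * g for g in genes)

def crossover(a, b, rng):
    """Two-point crossover: ABC/abc -> AbC/aBc."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(genes, rng, rate=0.02, scale=1.0):
    """With probability `rate` per gene, add a Cauchy-distributed deviate."""
    return [g + scale * math.tan(math.pi * (rng.random() - 0.5))
            if rng.random() < rate else g
            for g in genes]

def run_ga(n_genes=10, pop_size=50, max_gens=200, max_evals=20000, elitism=1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)] for _ in range(pop_size)]
    evals = 0
    history = []  # best energy seen in each evaluated generation
    for _ in range(max_gens):
        scored = sorted(pop, key=energy)            # fitness evaluation
        evals += len(pop)
        history.append(energy(scored[0]))
        if evals >= max_evals:                      # termination on evaluations
            break
        parents = scored[: pop_size // 2]           # truncation selection (simplification)
        children = [list(ind) for ind in scored[:elitism]]  # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)           # random mating
            c1, c2 = crossover(a, b, rng)
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = children[:pop_size]
    return min(pop, key=energy), history
```

Because the elite individual is copied unchanged into the next generation, the best energy recorded per generation can never get worse.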
26Lamarckian Genetic Algorithm
- Most GAs mimic Darwinian evolution: one-way
transfer of information from genotype → phenotype
(right side)
- This corresponds to the global search for the
minima
Figure labels: fitness; Darwinian; Lamarckian
27Lamarckian Genetic Algorithm
- One novel improvement of Autodock is the
incorporation of local search (left side)
- This is called the Lamarckian Genetic Algorithm
(LGA), in allusion to Lamarck's discredited
assertion that acquired phenotypic characteristics
can become heritable.
Figure labels: fitness; Darwinian; Lamarckian
28Lamarckian Genetic Algorithm
- In general, LGA is only possible if the mapping
function genotype → phenotype is invertible
(phenotype → genotype)
- Genotype ↔ Phenotype
- Another novel feature of Autodock
- the local search is done in genotypic space
rather than phenotypic space
- So there is no need for the mapping to be
inverted
- Performance: LGA > GA > SA
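A minimal sketch of the Lamarckian step, under stated assumptions: the local search here is a simplified random-perturbation search (real implementations use more elaborate adaptive schemes), and `ls_rate`, `steps`, and `step_size` are illustrative values. The key point it tries to show is that the search perturbs the genes directly, so the improved result can be written straight back into the genotype without any inverse mapping.

```python
import random

def local_search(genes, energy, rng, steps=50, step_size=0.2):
    """Simplified local search carried out directly on the genes."""
    best, best_e = list(genes), energy(genes)
    for _ in range(steps):
        trial = [g + rng.gauss(0.0, step_size) for g in best]
        e = energy(trial)
        if e < best_e:            # keep only improving moves
            best, best_e = trial, e
    return best, best_e

def lamarckian_step(population, energy, rng, ls_rate=0.06):
    """Locally optimise a fraction of individuals and overwrite their genotypes."""
    new_pop = []
    for genes in population:
        if rng.random() < ls_rate:
            genes, _ = local_search(genes, energy, rng)  # acquired trait becomes heritable
        new_pop.append(list(genes))
    return new_pop
```

In a full LGA this step would run once per generation, between fitness evaluation and selection.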
29Autodock Scoring Function
- The program uses a five-term force field-based
function loosely based on the AMBER force field
- The scaling factor for each of these five terms
is empirically calibrated from a set of 30
structurally known protein-ligand complexes.
Terms: dispersion/repulsion; H-bond; electrostatic;
ΔGtor (entropic term); ΔGsol (intermolecular pairwise
desolvation term)
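The equation on the slide is not recoverable from the text. As a reference point, the general form usually written for this kind of five-term empirical free energy function is sketched below; the symbols A, B, C, D (pair parameters), E(t) (angular H-bond weight), ε(r) (distance-dependent dielectric), N_tor (number of rotatable bonds), and S, V (solvation parameter and fragmental volume) are assumptions about notation, and the exact desolvation term differs between program versions.

```latex
\Delta G =
    \Delta G_{\mathrm{vdW}} \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \Delta G_{\mathrm{hbond}} \sum_{i,j} E(t) \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right)
  + \Delta G_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
  + \Delta G_{\mathrm{tor}}\, N_{\mathrm{tor}}
  + \Delta G_{\mathrm{sol}} \sum_{i,j} \left( S_i V_j + S_j V_i \right) e^{-r_{ij}^{2} / 2\sigma^{2}}
```

The five ΔG coefficients are the empirically calibrated scaling factors; the sums run over ligand-receptor atom pairs.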
30Protein kinases
- Phosphorylation is the most common reversible
post-translational protein modification in
eukaryotes
- Protein kinases are key players in signal
transduction networks
- Many cancers are characterized by uncontrolled
kinase activity
31The human kinome
32Kinase specificity
- Tight control of the specificity of protein
kinases is required to maintain normal physiology
- Specificity is determined in part through
recognition of consensus sequences around the
site of phosphorylation
- However, the active site alone is not enough: short
amino acid sequence motifs can occur at high
frequency in proteomes
- 700,000 potentially phosphorylatable residues
33Ubersax and Ferrell (2007)
34Docking interactions ensure specificity
- Combinatorial docking interactions are a
generally used mechanism to ensure kinase
specificity
- The docking sites are distal from the
phosphorylation site in the substrates
- Outside the active site in the kinase
35MAP kinases
- Mediate cellular responses to a wide variety of
extracellular stimuli: growth factors, cytokines,
UV, oxidative stress, etc.
- Regulate many important cellular activities: gene
expression, mitosis, movement, metabolism, cell
death, etc.
- MAP kinases lie at the bottom of conserved
three-component phosphorylation cascades
36MAP Kinase cascade
37MAPKs
Raman et al. (2007)
38MAP kinases
- Three major subfamilies
- ERK (extracellular signal-regulated kinases): ERK1 and
ERK2
- p38: p38α, p38β, p38γ, p38δ
- JNK (c-Jun N-terminal kinases): JNK1-3
- Each MAPK subfamily phosphorylates a
distinct set of protein substrates
39Consensus sequence
- Consensus sequence for ERK1, ERK2 and p38α
- P-X-S/T-P
- 700,000 potentially phosphorylatable residues
- Other mechanisms are needed to ensure specificity
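To see why such a short consensus cannot confer specificity on its own, it helps to scan a sequence for it. The sketch below uses a regular expression for P-X-S/T-P; the function name and any test sequence are made up for illustration, and no real protein is implied.

```python
import re

# P-X-S/T-P: proline, any residue, Ser or Thr (the phospho-acceptor), proline.
# A lookahead is used so that overlapping matches are all reported.
CONSENSUS = re.compile(r"(?=(P.[ST]P))")

def find_phos_sites(sequence):
    """Return 0-based indices of the S/T residue in every P-X-S/T-P match."""
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence)]
```

A four-residue pattern with one wildcard position is expected to occur by chance many times in a proteome, which is consistent with the need for additional docking interactions.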
40MAPKs: common phos-site
- Positional scanning peptide library:
systematic substitution of the 20 a.a. plus pT and pY
at the 9 positions surrounding a central
phosphorylation site (9 x 22)
- Confirmed the P-X-S/T-P previously found for ERK2
and p38α
- No significant differences among any of the four
representative MAPKs
Sheridan et al.
41D-site
- Two docking interactions: D-site and DEF site
- The first one: D-site (also referred to as the
D-domain, δ-domain, or DEJL domain)
- Two or more basic residues followed by a short
linker and a cluster of hydrophobic residues
- Docking occurs along a groove on the face of the
MAPK opposite the active site
42D-site
- Well-characterized
- Mutagenesis
- Hydrogen-exchange mass spectrometry (HX-MS)
- X-ray crystallography
Lee et al. (2004)
43DEF site
- DEF site (docking site for ERK, FXF; also called
the F-site)
- Best characterized in ERK
- F-X-F/Y-P
- Located 6 to 20 amino acids C-terminal to the
phosphorylation site
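The positional rule can be expressed as a small search. The sketch below pairs each S/T-P phosphorylation site with any F-X-F/Y-P motif that starts within a given window C-terminal to it. Measuring the gap from the phospho-acceptor residue to the first F of the motif is an assumption of this example, as are the function name and any test sequence.

```python
import re

DEF_MOTIF = re.compile(r"(?=(F.[FY]P))")   # F-X-F/Y-P
PHOS_SITE = re.compile(r"(?=([ST]P))")     # minimal S/T-P phosphorylation site

def def_sites_near_phos_sites(sequence, min_gap=6, max_gap=20):
    """Return (phos_index, def_index) pairs with the DEF motif downstream in range."""
    phos = [m.start() for m in PHOS_SITE.finditer(sequence)]
    defs = [m.start() for m in DEF_MOTIF.finditer(sequence)]
    return [(p, d) for p in phos for d in defs if min_gap <= d - p <= max_gap]
```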
44DEF motif
- Peptide derived from Elk1 386-399 (phos-site and
DEF site)
- 19 a.a. (excluding Cys) substitutions at each of
four positions (Z)
- The extent of phosphorylation was quantified
Sheridan et al.
45DEF motif
Table labels: aromatic; aliphatic; no preference
Selectivity > 1.5 (bold when > 3.0)
Sheridan et al.
46DEF site selectivity
Figure labels: phos-site; DEF site; p38α; p38δ
Sheridan et al.
47DEF interacting pocket - HX-MS
Green: decreased exchange rate upon DEF peptide
binding → solvent protection
Lee et al. (2004)
48DEF interacting pocket - HX-MS
Yellow surface: hydrophobic residues
Strongest protected regions
pT183, pY185
Lee et al. (2004)
49Docking with autodock
- Ligand: a capped pentapeptide DEF site ligand,
acetyl-SFQFP-amide
- Receptor: published structure of diphosphorylated
ERK2 (PDB code 2ERK)
- Grid map: 50 x 50 x 50 points with a spacing of
0.375 Å, centered on the previously identified
hydrophobic pocket on the ERK2 surface
- 256 independent docking runs
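A quick check of what these grid settings imply for the size of the search box. The sketch assumes that 50 is the number of grid intervals along each axis, giving an edge of 50 x 0.375 = 18.75 Å; if 50 is instead the number of lattice points, the edge is 49 x 0.375 = 18.375 Å. The slide does not give the center coordinates, so any center passed in is a placeholder, and the helper names are hypothetical.

```python
def grid_box(center, npts=(50, 50, 50), spacing=0.375):
    """Return (min_corner, max_corner, edge_lengths) of the grid box in Å."""
    edges = tuple(n * spacing for n in npts)          # assumes npts = intervals per axis
    lo = tuple(c - e / 2.0 for c, e in zip(center, edges))
    hi = tuple(c + e / 2.0 for c, e in zip(center, edges))
    return lo, hi, edges

def inside(point, lo, hi):
    """True if a ligand atom position lies within the grid box."""
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))
```

Any ligand atom that leaves this box during a docking run has no precomputed energy to look up, so the box needs to be large enough to hold the whole ligand in every orientation of interest.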
50Grid map
51Autodock results: model clusters
Clustering threshold: RMSD 2 Å
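One common way to group docked models by an RMSD threshold is a greedy, energy-ordered clustering, sketched below. Whether this matches the exact procedure behind the slide is an assumption. Poses are visited from lowest to highest energy and each joins the first cluster whose seed (its lowest-energy member) lies within the tolerance. No superposition is done, since all poses share the receptor's coordinate frame; atom order is assumed identical across poses.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two poses with matching atom order."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def cluster_poses(poses, energies, tolerance=2.0):
    """Greedy clustering; returns lists of pose indices, seed first in each list."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for c in clusters:
            if rmsd(poses[i], poses[c[0]]) <= tolerance:
                c.append(i)
                break
        else:
            clusters.append([i])  # no seed close enough: start a new cluster
    return clusters
```

With 256 runs, a large low-energy cluster is usually taken as a sign that the search converged on a consistent binding mode.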
52Model of DEF site interaction
Orange: peptide ligand; green: hydrophobic pocket
53Model of DEF site interaction
54Model of DEF site interaction
55Model of DEF site interaction
56Structural determinants: mutagenesis studies
Highlighted: residues surrounding the DEF pocket
- Alanine substitutions of key residues in the
binding pocket significantly attenuate
phosphorylation (except for L195A of p38δ)
57Mutagenesis studies
- Mutants that swap DEF site specificity
WT aromatic → DM aliphatic
WT aliphatic → DM aromatic
Sheridan et al.
58Mutagenesis studies
- Collectively, these mutagenesis experiments and
molecular docking support a mode of binding in which:
- the P1 residue contacts residues analogous to Ile196,
Met197 and Leu198 of ERK2
- the P3 residue makes contact with Leu235
59Acknowledgements
Dr. Turk and Dr. Sheridan, Department of Pharmacology,
Yale University