Progress Report: Grid-enabled Protein Docking Simulation using DOCK

Provided by: danielg99

Category:

more less

Transcript and Presenter's Notes

Title: Progress Report: Gridenabled Protein Docking Simulation using DOCK

1
Progress Report: Grid-enabled Protein Docking
Simulation using DOCK

Cathy Chang, Daniel Goodman, Marshall Levesque,
Noah Ollikainen
General Goals
To use DOCK to accurately simulate
protein-protein and protein-ligand docking
To find novel binding sites and inhibitors in a
high-throughput manner
To develop scripts and software to automate this
as much as possible on the grid environment

2
Summary of Progress So Far

DOCK 5 has been successfully set up and installed
on numerous local clusters at Osaka University,
and has been successfully set up to run in
parallel with GLOBUS and MPI using Perl scripts.
This task is largely complete, and we should have
few problems running DOCK on the available
machines.
Perl scripts were written to convert databases
from various formats and prep them for docking by
adding AMBER charges, correcting format errors,
and removing incompatible ligands. The NCI
diversity set was converted and tested in this
fashion. This task still remains troublesome,
despite the automation.
1WBN has been successfully docked as a test
kinase against a small slice (2000 ligands) of the
Drug-Like subset of the ZINC database at
http://zinc.docking.org. However, there were some
problems with our results, which will be
discussed.
Using this test kinase as well as 1ND4, various
docking parameters were tested to find the
settings (such as the number of minimization
steps and the energy grid granularity) that
minimize the time per ligand and maximize the
supposed accuracy of the simulation. The ideal
parameters largely depend on the protein and
binding site in question. This is a difficult
problem, but we have automated it to some degree
using Perl.
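
A minimal sketch of one way such a parameter choice could be automated, written in Python for illustration (the scripts described above are Perl). Given timing and RMSD measurements from trial runs, it keeps the fastest setting whose re-docked ligand stays within an RMSD cutoff. The field names, the 2.0 Å cutoff, and the trial values are all assumptions made up for this example, not measured results.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    min_steps: int             # number of minimization steps (hypothetical field)
    grid_spacing: float        # energy grid spacing in angstroms (hypothetical field)
    minutes_per_ligand: float  # measured docking time per ligand
    rmsd: float                # RMSD of re-docked ligand vs. crystal pose, angstroms


def pick_setting(trials: Sequence[Trial], max_rmsd: float = 2.0) -> Optional[Trial]:
    """Return the fastest trial whose RMSD is within max_rmsd, or None if none qualify."""
    acceptable = [t for t in trials if t.rmsd <= max_rmsd]
    if not acceptable:
        return None
    return min(acceptable, key=lambda t: t.minutes_per_ligand)


# Made-up values, for illustration only.
trials = [
    Trial(50, 0.5, 2.0, 4.1),
    Trial(100, 0.4, 12.0, 1.9),
    Trial(200, 0.3, 25.0, 1.6),
]
best = pick_setting(trials)
```

Because the ideal parameters depend on the protein and binding site, a selection like this would need to be repeated for each receptor.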

3
Computing Time/Resources

Currently Accessible Machines
3 Local Clusters at Osaka (Cafe, Tea, and a
third), which we must share with other
researchers here via the Globus Queue
Another cluster (TDWT), which we have to
ourselves but which is not a part of the Grid
The ROCKS cluster at SDSC, accessible via the
Grid
We can gain access to more machines on the Grid,
including clusters in Taiwan, China, and
Australia, but have not done so yet. Applying for
an account could take as much as a week.

4
Computing Time

Currently, docking a flexible ligand to a rigid
protein with our current parameters (for 1ND4)
takes about 20-30 minutes on a single processor,
depending on the size of the ligand and the
number of rotatable bonds.
Depending on various parameters, this can
fluctuate from as little as 2 minutes to as much
as 90 minutes.
When we performed a test run with our Drug-Like
ZINC subset on TDWT, it took about 6 hours to do
2000 ligands, with an average of 5 ligands per
minute for the whole cluster.

5
Database Considerations

At this speed, flexible docking of ligands takes
a prohibitively long time.
The Drug-like subset of ZINC has well over 2
million ligands, and flexibly docking them all
would take months.
From the literature, we've found two possible
ways of increasing throughput:
Filter this database
Use a smaller and more targeted database
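
The "months" estimate can be checked with a short calculation. This sketch assumes the cluster-wide rate from the TDWT test run (2000 ligands in about 6 hours) holds steady, and uses 2 million ligands as a lower bound on the size of the Drug-like subset. The function name is made up for this example.

```python
def days_to_dock(n_ligands: int, ligands_per_minute: float) -> float:
    """Wall-clock days needed at a steady cluster-wide docking rate."""
    if ligands_per_minute <= 0:
        raise ValueError("rate must be positive")
    return n_ligands / ligands_per_minute / 60 / 24


# Rate from the TDWT test run: 2000 ligands in about 6 hours (about 5.6 per minute).
rate = 2000 / (6 * 60)

# 2 million is the lower bound quoted for the Drug-like subset of ZINC.
full_zinc_days = days_to_dock(2_000_000, rate)
```

At that rate the full subset would need roughly 250 days on TDWT alone, which is why filtering or a smaller database is being considered.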

6
Database Filtering

Filtering would involve checking each ligand for
specific side chains, solvent properties, number
of rotatable bonds, molecular weight, and other
factors. We could filter for toxicity, similarity
to other molecules that bind to kinases, etc.
Some filtering criteria are easier to check for
than others: some we could do with a quick
script, while others might require complex software.
We are currently using a subset of the ZINC
database that is already filtered for drug-like
criteria, using the method described in Lipinski
et al.
Problems
Currently not clear from a chemistry standpoint
which ligands to remove/keep
All software we were able to find which does
complex filtering is commercial only
Limiting the set to ligands similar to molecules
that bind other kinases ("kinase-like") has
the potential to miss many molecules that bind
well
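
The simpler criteria are the kind a quick script could check. A minimal sketch, assuming each ligand's properties have already been computed by some other tool: it applies the commonly cited Lipinski rule-of-five thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors) in a strict form with no violations allowed, plus a rotatable-bond cap. The `Ligand` fields, the cap of 10, and the sample entries are assumptions made up for this example.

```python
from typing import List, NamedTuple


class Ligand(NamedTuple):
    name: str
    mol_weight: float     # daltons
    logp: float           # octanol-water partition coefficient
    h_donors: int         # hydrogen-bond donors
    h_acceptors: int      # hydrogen-bond acceptors
    rotatable_bonds: int


def passes_rule_of_five(lig: Ligand) -> bool:
    """Strict rule-of-five check: every threshold must be satisfied."""
    return (
        lig.mol_weight <= 500
        and lig.logp <= 5
        and lig.h_donors <= 5
        and lig.h_acceptors <= 10
    )


def filter_ligands(ligands: List[Ligand], max_rotatable_bonds: int = 10) -> List[Ligand]:
    """Keep ligands that pass the rule of five and the rotatable-bond cap."""
    return [
        lig for lig in ligands
        if passes_rule_of_five(lig) and lig.rotatable_bonds <= max_rotatable_bonds
    ]


# Made-up entries, for illustration only.
library = [
    Ligand("lig_a", 320.4, 2.1, 2, 5, 4),
    Ligand("lig_b", 612.7, 3.0, 3, 8, 6),   # too heavy
    Ligand("lig_c", 410.0, 4.2, 1, 6, 14),  # too flexible
]
kept = filter_ligands(library)
```

Capping rotatable bonds also helps docking time, since the time per ligand depends on the number of rotatable bonds.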

7
Scoring

GRID energy and contact scoring
DOCK 5's two main scoring methods both use a
pre-computed energy grid. This is faster, but
does not give an absolute measure of binding
affinity.
Automated docking with grid-based energy
evaluation. EC Meng, BK Shoichet, ID Kuntz -
Journal of Computational Chemistry, 1992
There are also several other scoring methods
available, including two flavors of GBSA pairwise
free-energy scoring and an all-atom AMBER
force-field.
http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm#Scoring
However, these other scoring methods, while
sometimes more accurate, take much longer, and
usually are performed after the best orientation
and conformation has been found by a grid-based
score.
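
To illustrate why a pre-computed grid is faster, here is a toy sketch: the receptor's contribution is evaluated once at every grid point, and each ligand pose is then scored by lookup instead of looping over all receptor atoms. This is not DOCK's actual energy function. It uses a single Coulomb-like term, nearest-grid-point lookup, and a 0.5 Å distance clamp, all of which are simplifying assumptions for this example.

```python
import math


def build_grid(receptor_atoms, origin, spacing, shape):
    """Precompute a toy potential (sum of q/r) at every grid point.

    receptor_atoms: iterable of (x, y, z, charge)
    """
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (
                    origin[0] + i * spacing,
                    origin[1] + j * spacing,
                    origin[2] + k * spacing,
                )
                potential = 0.0
                for x, y, z, charge in receptor_atoms:
                    # Clamp the distance so points on top of an atom do not blow up.
                    r = max(math.dist(point, (x, y, z)), 0.5)
                    potential += charge / r
                grid[(i, j, k)] = potential
    return grid


def score_ligand(ligand_atoms, grid, origin, spacing, shape):
    """Score a pose by nearest-grid-point lookup; lower is more favorable."""
    total = 0.0
    for x, y, z, charge in ligand_atoms:
        index = tuple(round((c - o) / spacing) for c, o in zip((x, y, z), origin))
        if any(n < 0 or n >= size for n, size in zip(index, shape)):
            raise ValueError("ligand atom lies outside the grid")
        total += charge * grid[index]
    return total


# One positive receptor atom at the origin; grid points at x = 1, 2, 3.
origin, spacing, shape = (1.0, 0.0, 0.0), 1.0, (3, 1, 1)
grid = build_grid([(0.0, 0.0, 0.0, 1.0)], origin, spacing, shape)
score = score_ligand([(2.0, 0.0, 0.0, -1.0)], grid, origin, spacing, shape)
```

The grid is built once per receptor, so the cost of building it is spread over every ligand and every pose scored against it. A finer grid spacing improves the lookup accuracy at the cost of memory and setup time.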

8
Consensus Scoring

Bissantz et al. and Charifson et al. both
suggest that combining scoring methods increases
accuracy and removes many false positives.
However, many of the scoring methods they used
(ChemScore, PMF, PLP, etc.) are commercial,
and thus not easily available to us.
Also, this will increase our computation time.
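
A minimal sketch of one simple form of consensus scoring: keep only the ligands that rank in the top fraction under every scoring function. This intersection-of-top-ranks scheme is an illustrative assumption, not necessarily the exact procedure of either paper. The scoring-function names and scores below are made up.

```python
from typing import Dict, Set


def consensus_hits(score_tables: Dict[str, Dict[str, float]],
                   top_fraction: float = 0.1) -> Set[str]:
    """Return ligands ranked in the top fraction by every scoring function.

    score_tables maps scoring-function name -> {ligand name: score},
    where a lower (more negative) score is better.
    """
    hit_sets = []
    for scores in score_tables.values():
        ranked = sorted(scores, key=scores.get)          # best score first
        n_keep = max(1, int(len(ranked) * top_fraction))  # always keep at least one
        hit_sets.append(set(ranked[:n_keep]))
    return set.intersection(*hit_sets) if hit_sets else set()


# Made-up scores, for illustration only.
score_tables = {
    "energy":  {"lig_a": -40.0, "lig_b": -35.0, "lig_c": -10.0, "lig_d": -5.0},
    "contact": {"lig_a": -120.0, "lig_b": -60.0, "lig_c": -130.0, "lig_d": -20.0},
}
hits = consensus_hits(score_tables, top_fraction=0.5)
```

Here `lig_b` and `lig_c` each score well under only one function and are dropped, which is the false-positive removal the consensus approach is meant to provide. Every extra scoring function still has to be run on each ligand, hence the added computation time.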

9
Putative Human Kinase

Cathy Chang's Work
Target protein: novel human protein kinase
ACAD10,
discovered by Kristine Breidis et al.
A known 3D structure is needed for binding site
prediction and docking simulations; therefore:
Model the novel target protein structure with
Modeller (protein homology modeling)
Identify new possible binding sites on the target
protein with
SVM (Support Vector Machine),
written by Jo-Lan Chung, graduate student under
Dr. Bourne
Visualize results and transfer data to Daniel for
docking

10
Modeller 8v2

Based on sequence alignment, Modeller uses
protein homology (comparative) modeling to
predict a possible 3D structure for a protein
with unknown structure

11
Sample: 1WBN vs. Prediction
Figure labels: original 1WBN; Modeller's predicted 1WBN

Modeller is able to predict the major
characteristics of 1WBN with room for improvement
Will attempt to model target protein with same
procedure
After modeling, we can predict possible binding
sites with SVM for docking

12
Target Protein ACAD10 vs. 1ND4

ACAD10 contains a total of 4 domains, including a
protein kinase region
Kinase region alignment identity is closest to 1ND4
Modeller fails to model the majority of the target
due to lack of information
Output structure has a protein core with a long
tail region
The kinase region makes up a quarter of the target
sequence
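
For reference, alignment identity between two already-aligned sequences can be computed as below. Several definitions of percent identity are in use; this sketch takes the fraction of matching residues over the aligned columns where neither sequence has a gap, which is an assumption and may differ from what a given alignment tool reports. The sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gap columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments, for illustration only.
identity = percent_identity("MKV-LA", "MRVQLA")
```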

13
Problems

To eliminate the tail region, we tried to confirm
the sequence alignment with 123D
It outputs 1JQI, which aligns best with the tail region
However, the Modeller result has 2 cores
connected by a central chain
We tried BLAST for alternative sequence
alignments
Instead of 4 domains, only 3 are recognized, and
none of the top alignment results have PDB IDs
A PDB file is one of the required inputs for Modeller
14
Solutions

1. Predict the structure of the kinase region only
Since ACAD10 is identified as a protein kinase,
we modelled this region specifically against 1ND4
The resulting 3D structure has a major core and a
smaller tail
2. Alternative modeling program: SWISS-MODEL
SWISS-MODEL automatically constructs 3D models
after automatic sequence alignments and homolog
search
The resulting structure is more complete

Left: kinase domain
Right: SWISS-MODEL
15
Currently using 1ND4 as a DOCKing template

How this affects the Virtual Screening
Kristine has told us that the closest kinase
homolog is 1ND4, despite the problems that we've
had modeling it so far.
We have given Kristine our models, and asked her
to give the go-ahead for final docking and/or
give us tips on how to improve the model.
We are currently attempting to fine-tune the
docking parameters using 1ND4 as the receptor, so
that the ligand co-crystallized in the PDB
structure docks in a pose similar to the crystal pose.
To the left are the superimposed backbones of
1ND4 and the modeled structure for our putative
human kinase.
While their backbones are the same, the side
chains are of course very different.

16
Protein Tyrosine Phosphatases and DOCK (specific
to Marshall Levesque's work)

Goals
Examine known and potential binding sites of
SHP/Gab proteins
Use DOCK to screen ligand database against
tyrosine phosphatases SHP-1, SHP-2, and adapter
protein Gab2 in hopes of finding potential
inhibitors of activity and Gab binding.
Attempt to simulate protein-protein binding
between SHP-2 and Gab2

17
What we have

SHP-1 is the only protein with crystal structures
for both apo and bound forms
SHP-2's bound catalytic domain could be modeled
or substituted by that of SHP-1
Gab has no solved structures, so we would have to
rely entirely on modeled structures.

18
Problems with what we have

The SHP-1 bound structure's substrate is a long
peptide with many rotatable bonds, allowing for a
large number of orientations and conformations to
be scored by DOCK.
DOCK's determined binding site is also large due
to the substrate's size, increasing the surface area
to test.
Differing input parameters have all given
unsatisfactory RMSD values and DOCK runtimes,
averaging >4 Å and 3-4 hrs respectively.
The substrate's bound orientation is dependent on
multiple binding pockets

Crystallized orientation of substrate, SIRPα, is
colored according to elements. Energy scoring
(yellow) and Contact scoring (blue) both gave
incorrectly bound forms, with their Tyr(P)
residues not in the base of the binding pocket
(cyan) which consists of the SHP-1 signature
motif.
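
For reference, the RMSD between a docked pose and the crystallized pose can be computed as in the sketch below. It assumes both poses are in the receptor's coordinate frame (so no superposition is done) and list the same atoms in the same order, and it makes no correction for symmetric atoms. The coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Point], crystal: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched atoms, in the input units."""
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_docked, atom_crystal in zip(docked, crystal)
        for a, b in zip(atom_docked, atom_crystal)
    )
    return math.sqrt(squared / len(docked))


# Made-up two-atom poses, for illustration only (coordinates in angstroms).
docked_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_pose = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
rmsd = pose_rmsd(docked_pose, crystal_pose)
```

Because the value is averaged over all atoms, a long flexible peptide can show a large RMSD even when its key residue sits close to the correct pocket, so the RMSD of Tyr(P) alone may also be worth tracking.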
19
Dealing with what we have

Options
Alter the SIRPα peptide in order to reduce
rotatable bonds, decrease the potential binding
site box, and concentrate on the main binding residue
Tyr(P).
Which parts could be removed or changed needs to be
investigated
Find other known binding substrates for SHP-1 and
use DOCK to find/compare their orientations.
Other ideas?

SHP-1 catalytic domain and SIRPα Tyr(P)469
complex, with the generated spheres used to determine
binding pockets. The PTP signature motif is labeled
with cyan and the WPD loop with red. Notice the box
contains a large portion of the protein.
20
Some remaining problems

Many scoring methods, filtering programs, and
general tools to do this type of study are only
available commercially
It's not clear how accurate our final model will
be
So far, on test molecules, AutoDock and DOCK
results have been somewhat different
It will be difficult to gauge the precision of
our scoring methods until we test these molecules
in vitro,
although using consensus scoring and comparing
AutoDock and DOCK results will help narrow down
our leads
It is not clear what type of database is best
suited to this task
It is not clear that we have sufficient time and
resources to test a massive database (>1 million
ligands)