Progress Report: Grid-enabled Protein Docking Simulation using DOCK

1
Progress Report: Grid-enabled Protein Docking
Simulation using DOCK
  • Cathy Chang, Daniel Goodman, Marshall Levesque,
    Noah Ollikainen
  • General Goals
  • To use DOCK to accurately simulate
    protein-protein and protein-ligand docking
  • To find novel binding sites and inhibitors in a
    high-throughput manner
  • To develop scripts and software to automate this
    as much as possible on the grid environment

2
Summary of Progress So Far
  • DOCK 5 has been installed on numerous local
    clusters at Osaka University and configured to
    run in parallel with Globus and MPI using Perl
    scripts. This task is largely complete, and we
    should have few problems running DOCK on the
    available machines.
  • Perl scripts were written to convert databases
    from various formats and prep them for docking by
    adding AMBER charges, correcting format errors,
    and removing incompatible ligands. The NCI
    diversity set was converted and tested in this
    fashion. This task still remains troublesome,
    despite the automation.
  • 1WBN has been successfully docked as a test
    kinase to a small slice (2000 ligands) of the
    Drug-Like subset of the ZINC database at
    http://zinc.docking.org. However, there were some
    problems with our results, which will be
    discussed.
  • Using this test kinase as well as 1ND4, various
    docking parameters were tested to find the
    settings (such as the number of minimization
    steps and the energy grid granularity) that
    minimize the time per ligand and maximize the
    supposed accuracy of the simulation. The ideal
    parameters largely depend on the protein and
    binding site in question. This is a difficult
    problem, but we have automated it to some degree
    using Perl.

3
Computing Time/Resources
  • Currently Accessible Machines
  • 3 Local Clusters at Osaka (Cafe, Tea, and a
    third), which we must share with other
    researchers here via the Globus Queue
  • Another cluster (TDWT) which we have to
    ourselves, but which is not a part of the Grid
  • The ROCKS cluster at SDSC, accessible via the
    Grid
  • We can gain access to more machines on the Grid,
    including clusters in Taiwan, China, and
    Australia, but have not done so yet. Applying for
    an account could take as much as a week.

4
Computing Time
  • Docking a flexible ligand to a rigid protein
    with our current parameters (for 1ND4) takes
    about 20-30 minutes on a single processor,
    depending on the size of the ligand and the
    number of rotatable bonds.
  • Depending on various parameters, this can
    fluctuate from as little as 2 minutes to as much
    as 90 minutes.
  • When we performed a test run with our Drug-Like
    ZINC subset on TDWT, it took about 6 hours to do
    2000 ligands, with an average of 5 ligands per
    minute for the whole cluster.

5
Database Considerations
  • At this speed, flexible docking of ligands takes
    a prohibitively long time.
  • The Drug-like subset of ZINC has well over 2
    million ligands, and flexibly docking them all
    would take months.
  • From the literature, we've found two possible
    ways of increasing throughput:
  • Filter this database
  • Use a smaller and more targeted database
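As a rough check of the "months" estimate, the sketch below extrapolates the measured TDWT throughput (about 5 ligands per minute for the whole cluster) to a larger library. The function name is our own, the 2,000,000 figure is only a round lower bound for the Drug-like subset, and the calculation assumes that throughput stays constant across the whole run.

```python
def days_to_dock(n_ligands, ligands_per_minute):
    """Wall-clock days needed at a constant cluster-wide throughput."""
    minutes = n_ligands / ligands_per_minute
    return minutes / (60 * 24)

# Test run reported above: 2000 ligands at roughly 5 ligands per minute.
test_run_hours = days_to_dock(2000, 5) * 24

# Assumed round lower bound for the Drug-like subset of ZINC.
full_subset_days = days_to_dock(2_000_000, 5)

print(f"test run: {test_run_hours:.1f} h, full subset: {full_subset_days:.0f} days")
```

At 5 ligands per minute the test run comes out at about 6.7 hours, in line with the roughly 6 hours observed, and 2 million ligands would need about 278 days on TDWT alone.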

6
Database Filtering
  • Filtering would involve checking each ligand for
    specific side chains, solvent properties, number
    of rotatable bonds, molecular weight, and other
    factors. We could filter for toxicity, similarity
    to other molecules that bind to kinases, etc.
  • Some filtering criteria are easier to check for
    than others: some we could do with a quick
    script, others might require complex software.
  • We are currently using a subset of the ZINC
    database that is already filtered for drug-like
    criteria, using the method described in Lipinski
    et al.
  • Problems
  • Currently not clear from a chemistry standpoint
    which ligands to remove/keep
  • All software we were able to find which does
    complex filtering is commercial only
  • Limiting to other ligands similar to molecules
    that bind to other kinases (kinase-like) has
    the potential to miss many molecules that bind
    well
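The simpler criteria can be checked with a short script. The sketch below applies the commonly quoted Lipinski rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and 10 acceptors) plus an optional cap on rotatable bonds. It assumes the descriptors have already been computed by some other tool; the `Ligand` record, the function names, and the example cap are hypothetical and are not taken from ZINC or DOCK.

```python
from dataclasses import dataclass

@dataclass
class Ligand:
    """Precomputed descriptors for one ligand (hypothetical record layout)."""
    name: str
    mol_weight: float      # daltons
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int

def passes_rule_of_five(lig):
    """Strict form of the rule of five: no violations allowed."""
    return (lig.mol_weight <= 500 and lig.logp <= 5
            and lig.h_donors <= 5 and lig.h_acceptors <= 10)

def filter_ligands(ligands, max_rotatable_bonds=None):
    """Keep ligands that pass the rule of five and the optional bond cap."""
    kept = []
    for lig in ligands:
        if not passes_rule_of_five(lig):
            continue
        if (max_rotatable_bonds is not None
                and lig.rotatable_bonds > max_rotatable_bonds):
            continue
        kept.append(lig)
    return kept

# Made-up example records, for illustration only.
library = [
    Ligand("small_rigid", 310.4, 2.1, 2, 5, 3),
    Ligand("too_heavy", 642.8, 4.0, 3, 9, 7),
    Ligand("too_flexible", 420.5, 3.3, 1, 6, 14),
]
print([lig.name for lig in filter_ligands(library, max_rotatable_bonds=10)])
```

Capping rotatable bonds also addresses run time directly, since the time per ligand depends on the number of rotatable bonds.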

7
Scoring
  • GRID energy and contact scoring
  • DOCK 5's two main scoring methods both use a
    pre-computed energy grid. This is faster, but
    does not give an absolute measure of binding
    affinity.
  • "Automated docking with grid-based energy
    evaluation." EC Meng, BK Shoichet, ID Kuntz.
    Journal of Computational Chemistry, 1992
  • There are also several other scoring methods
    available, including two flavors of GBSA pairwise
    free-energy scoring and an all-atom AMBER
    force-field.
  • http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm#Scoring
  • However, these other scoring methods, while
    sometimes more accurate, take much longer, and
    usually are performed after the best orientation
    and conformation has been found by a grid-based
    score.
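To illustrate why a pre-computed grid is faster, the toy sketch below evaluates the receptor's potential once at every grid point and then scores a ligand pose by table lookup, so the cost per pose no longer depends on the number of receptor atoms. This is not DOCK's implementation: the unit-parameter 6-12 potential, the nearest-grid-point lookup, the clamping of off-grid atoms to the boundary, and all function names are simplifying assumptions.

```python
import math

def pair_energy(r, r_min=0.5):
    """Toy 6-12 term with unit parameters; its minimum is -1 at r = 1."""
    r = max(r, r_min)  # clamp so very short distances stay finite
    return 1.0 / r**12 - 2.0 / r**6

def build_grid(receptor_atoms, origin, spacing, shape):
    """Precompute the summed receptor potential at every grid point."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[(i, j, k)] = sum(
                    pair_energy(math.dist(point, atom))
                    for atom in receptor_atoms
                )
    return grid

def grid_score(ligand_atoms, grid, origin, spacing, shape):
    """Score a pose by nearest-grid-point lookup (no loop over the receptor)."""
    total = 0.0
    for atom in ligand_atoms:
        # Atoms outside the grid are clamped to the nearest boundary point.
        idx = tuple(
            min(max(round((atom[d] - origin[d]) / spacing), 0), shape[d] - 1)
            for d in range(3)
        )
        total += grid[idx]
    return total

# One receptor atom at the origin; ligand atom placed near the energy minimum.
origin, spacing, shape = (0.0, 0.0, 0.0), 0.5, (5, 5, 5)
grid = build_grid([(0.0, 0.0, 0.0)], origin, spacing, shape)
print(grid_score([(1.1, 0.1, 0.0)], grid, origin, spacing, shape))
```

The `spacing` argument plays the role of the energy grid granularity: a finer grid costs more to build and store but reduces the lookup error.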

8
Consensus Scoring
  • Bissantz et al. and Charifson et al. both
    suggest that combining scoring methods increases
    accuracy and removes many false positives.
  • However, many of the scoring methods they used
    (ChemScore, PMF, PLP, etc.) are commercial,
    and thus not easily available to us
  • Also, this will increase our computation time

9
Putative Human Kinase
  • Cathy Chang's Work
  • Target protein: novel human protein kinase
    ACAD10
  • Discovered by Kristine Breidis et al.
  • Need known 3D structure for binding site
    prediction and docking simulations; therefore:
  • Model novel target protein structure with:
  • Modeller, protein homology modeling
  • Identify new possible binding sites on target
    protein with:
  • SVM, Support Vector Machine
  • written by Jo-Lan Chung, graduate student under
    Dr. Bourne
  • Visualize results and transfer data to Daniel for
    docking

10
Modeller 8v2
  • Based on sequence alignment, Modeller uses
    protein homology (comparative) modeling to
    predict a possible 3D structure for a protein
    with unknown structure

11
Sample 1WBN vs. Prediction
Original 1WBN; Modeller's predicted 1WBN
  • Modeller is able to predict the major
    characteristics of 1WBN with room for improvement
  • Will attempt to model target protein with same
    procedure
  • After modeling, we can predict possible binding
    sites with SVM for docking

12
Target Protein ACAD10 vs. 1ND4
  • ACAD10 contains a total of 4 domains, including a
    protein kinase region
  • Kinase region alignment identity closest to 1ND4
  • Modeller fails to model the majority of the
    target due to lack of information
  • Output structure has a protein core with a long
    tail region
  • Kinase region makes up a quarter of the target
    sequence

13
Problems
  • To eliminate the tail region, we tried to confirm
    the sequence alignment with 123D
  • Outputs 1JQI, which aligns best with tail region
  • However, Modeller result outputs 2 cores
    connected with a central chain
  • We tried BLAST for alternative sequence
    alignments
  • Instead of 4 domains, only 3 are recognized, and
    all top alignment results do not have PDB IDs
  • A PDB file is one of the required inputs for Modeller

14
Solutions
  • 1. Predict structure of kinase region only
  • Since ACAD10 is identified as a protein kinase,
    we modelled this region specifically against 1ND4
  • The resulting 3D structure has a major core and a
    smaller tail
  • 2. Alternative modeling program: Swiss-Model
  • Swiss-Model automatically constructs 3D models
    after automatic sequence alignments and homolog
    search
  • The resulting structure is more complete

Left: kinase domain
Right: SWISS-MODEL
15
Currently using 1ND4 as a DOCKing template
  • How this affects the Virtual Screening
  • Kristine has told us that the closest kinase
    homolog is 1ND4, despite the problems that we've
    had modeling it so far.
  • We have given Kristine our models, and asked her
    to give the go-ahead for final docking and/or
    give us tips on how to improve the model.
  • We are currently attempting to fine-tune the
    docking parameters using 1ND4 as the receptor, so
    that the ligand that is crystallized with the PDB
    file docks in a similar fashion.
  • To the left are the superimposed backbones of
    1ND4 and the modeled structure for our putative
    human kinase.
  • While their backbones are the same, the side
    chains are of course very different.

16
Protein Tyrosine Phosphatases and DOCK: specific
to Marshall Levesque's work
  • Goals
  • Examine known and potential binding sites of
    SHP/Gab proteins
  • Use DOCK to screen ligand database against
    tyrosine phosphatases SHP-1, SHP-2, and adapter
    protein Gab2 in hopes of finding potential
    inhibitors of activity and Gab binding.
  • Attempt to simulate protein-protein binding
    between SHP-2 and Gab2

17
What we have
  • SHP-1 is the only protein with crystal structures
    for both apo and bound forms
  • SHP-2's bound catalytic domain could be modeled
    or substituted by that of SHP-1
  • Gab has no structures, so we would have to rely
    entirely on produced models.

18
Problems with what we have
  • The SHP-1 bound structure's substrate is a long
    peptide with many rotatable bonds, allowing for a
    large number of orientations and conformations to
    be scored by DOCK.
  • DOCK's determined binding site is also large due
    to the substrate's size, increasing the surface
    area to test.
  • Differing input parameters have all given
    unsatisfactory RMSD values and DOCK runtimes,
    averaging >4 Å and 3-4 hrs respectively.
  • The substrate's bound orientation is dependent on
    multiple binding pockets

Crystallized orientation of substrate, SIRPα, is
colored according to elements. Energy scoring
(yellow) and Contact scoring (blue) both gave
incorrectly bound forms, with their Tyr(P)
residues not in the base of the binding pocket
(cyan) which consists of the SHP-1 signature
motif.
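The RMSD values quoted for slide 18 compare a docked pose with the crystallized pose. A minimal sketch of that calculation follows, assuming both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition step is applied. The function name and the coordinates are our own.

```python
import math

def pose_rmsd(docked, reference):
    """Root-mean-square deviation in Å between two equally ordered atom lists."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(docked, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(docked))

# Made-up coordinates: every atom of the docked pose is displaced by 5 Å.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [(3.0, 4.0, 0.0), (4.5, 4.0, 0.0)]
print(pose_rmsd(docked, crystal))
```

A value above 4 Å, as in this example, corresponds to the unsatisfactory poses described on this slide.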
19
Dealing with what we have
  • Options
  • Alter the SIRPα peptide in order to reduce
    rotatable bonds, decrease the potential binding
    site box, and concentrate on main binding residue
    Tyr(P).
  • Which parts could be removed or changed needs to
    be investigated
  • Find other known binding substrates for SHP-1 and
    use DOCK to find/compare their orientations.
  • Other ideas?

SHP-1 catalytic domain and SIRPα Tyr(P)469
complex, with the generated spheres used to determine
binding pockets. The PTP signature motif is labeled
in cyan and the WPD loop in red. Notice that the box
contains a large portion of the protein.
20
Some remaining problems
  • Many scoring methods, filtering programs, and
    general tools to do this type of study are only
    available commercially
  • It's not clear how accurate our final model will
    be
  • So far, on test molecules, AutoDock and DOCK
    results have been somewhat different
  • It will be difficult to gauge the precision of
    our scoring methods until we test these molecules
    in vitro
  • although using consensus scoring and comparing
    AutoDock and DOCK results will help narrow down
    our leads
  • It is not clear what type of database is best
    suited to this task
  • It is not clear that we have sufficient time and
    resources to test a massive database (>1 million
    ligands)