European Bioinformatics Institute
The Utility of New Open-access Resources for
Tracking Down Chemical Structures of
Protein-protein Interaction Inhibitors
Chris Southan
Introduction
While the use of small molecules as experimental
perturbagens is a fundamental approach in systems
biology, comparing compound structures has
hitherto required chemical drawing expertise and
expensive commercial database subscriptions.
However, the last few years have seen a
revolution in public cheminformatic resources
(Southan et al., "Complementarity between public
and commercial databases: new opportunities in
medicinal chemistry informatics", PMID 17897036).
This work evaluates a selection of open resources
that systems biologists can already use, taking
as an example a recent Nature review of key small
molecule inhibitors of protein-protein
interactions (SMIPPIs) that have utility both as
systems biology tools and as therapeutic
candidates (Wells et al., "Reaching for
high-hanging fruit in drug discovery at
protein-protein interfaces", PMID 18075579).
More challenging examples
Some of the code names in Fig. 1, e.g. ABT-737,
have no matches in PubChem Compound searches.
However, there was a match to PubChem Substance
ID (SID) 24771379, extracted from the MMDB entry.
Thus, via the crystal structure, the ligand
ABT-737 could be mapped to PubChem Compound ID
(CID) 11228183. This has 32 similar compounds in
PubChem, one of which, CID 15991564, also has a
PDB complex with Bcl-xL.
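The SID-to-CID step and the similarity search described here can also be scripted. The following is a minimal sketch that assumes PubChem's present-day PUG REST web interface, which the poster itself does not mention; the helper names are hypothetical, and the response layout handled by the parser is an assumption based on the PUG REST documentation. Nothing is fetched until fetch_json is called explicitly.

```python
import json
from urllib.request import urlopen

# Base address of PubChem's PUG REST service (an assumption: not named in the poster).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sid_to_cid_url(sid):
    """URL asking which Compound IDs a Substance ID has been standardised to."""
    return f"{PUG_REST}/substance/sid/{sid}/cids/JSON"


def similar_cids_url(cid, threshold=90):
    """URL for a 2D similarity search around a CID (threshold in percent)."""
    return (f"{PUG_REST}/compound/fastsimilarity_2d/cid/{cid}"
            f"/cids/JSON?Threshold={threshold}")


def cids_from_sid_response(payload):
    """Collect the CIDs from a decoded SID-to-CID JSON response."""
    cids = []
    for info in payload.get("InformationList", {}).get("Information", []):
        cids.extend(info.get("CID", []))
    return cids


def fetch_json(url, timeout=30):
    """Fetch and decode a JSON document (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With network access, cids_from_sid_response(fetch_json(sid_to_cid_url(24771379))) would repeat the ABT-737 mapping, and similar_cids_url(11228183) gives the address for the similarity search. The number of similar compounds returned today will probably differ from the 32 reported on the poster, since PubChem has grown.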
Attempting complete disambiguation
Either by direct look-up or indirectly via PDB
ligands, it was possible to map additional
compounds from Fig. 1 to PubChem CIDs, e.g.
Ro26-4550 (CID 16760522), SP4206 (CID 5327044),
Compound 1 (CID 656967) and SP304 (CID 5327044).
For some of the others, the closest PubChem
matches differ in molecular mass and other
respects from the chemical structures depicted in
the review. These include Compound 23 (closest
match CID 5287508); the benzodiazepinedione
ligand in PDB entry 1T4E (CID 656933); and the
ligand in PDB entry 1RV1 (CID 448419), which is
not identical to Nutlin-3 (CID 16755649).
Fig. 1. Compound designations and protein
partners from PMID 18075579
The problem
Although the publication includes sketches of
their structures, the series of non-standard
compound identifiers shown above in Fig. 1
illustrates a common problem. These include
company code names such as ABT-737 (Abbott) and
references to figures in other papers such as
compound 23b. Unlike the gene names of their
interaction partners, these cannot easily be
disambiguated into standardised representations.
Checking chemical patents
Establishing which bioactive chemical structures
have been published in patents has until recently
required expensive commercial database
subscriptions because, while many of the
compounds are in PubChem, there is no open link
to the patent documents. This has changed in the
last few months with the SureChem free patent
search facility. Taking Nutlin-3 as an example,
the SMILES entry from PubChem was pasted into the
SureChem search box. There are nine exact
matches, including a granted patent from Roche
(shown in the poster figure).
The simple solution: a database look-up
The first stop is the PubChem Compound search
box, where "nutlin-3" picks up an entry, CID
16755649. This links across to ChEBI entry 46742
(shown in the poster figure).
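The same name look-up can be expressed as a web request rather than typed into the search box. This is a sketch only, again assuming PubChem's present-day PUG REST interface (not part of the original poster); the function names are hypothetical, and the "IdentifierList" and "Fault" response keys are assumptions taken from the PUG REST documentation.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service (an assumption: not named in the poster).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cid_url(name):
    """URL that asks PubChem Compound for the CIDs matching a name or synonym."""
    # Percent-encode the name so spaces and slashes cannot break the path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def cids_from_name_response(payload):
    """Return the matching CIDs, or an empty list when the service reports a fault."""
    if "Fault" in payload:
        # A name with no match (as reported for some code names) ends up here.
        return []
    return list(payload.get("IdentifierList", {}).get("CID", []))
```

Fetching name_to_cid_url("nutlin-3") and passing the decoded JSON to cids_from_name_response reproduces the search-box look-up; an empty list signals that a code name has to be resolved by another route, such as a PDB ligand entry.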
Conclusions
While mapping bioactive compounds between the
literature, patents and PubChem, ChEBI or other
open chemical databases is still challenging, the
expansion in public sources will increasingly
enable the systems biology community to make
these links (background information at
http://www.cdsouthan.info/Data/CDS_data.htm).
The consequent ability to identify, search,
compare, source and extend the assaying of tool
compounds and/or drug candidates in different
systems will move the field forward. However,
this still needs engagement from the community,
e.g. by including PubChem IDs in publications,
submitting assay results to PubChem and sharing
compounds.
The Nutlin-3 look-up confirms the explicit
mapping of a compound name to structural
representations in two chemical databases with
accession numbers, without a sketching operation.
Both entries (the ChEBI one is shown in the
poster figure) carry two representation types,
InChIs and SMILES strings (see Wikipedia for
definitions), that define the structures
independently of database IDs. These open up a
wide range of cheminformatic searches and other
operations that the non-specialist can now do
with open resources (including Googling InChIs).
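To illustrate why an InChI defines a structure without reference to any database, the short sketch below splits an InChI string into its layers (formula, connectivity, hydrogens and so on). The function name is hypothetical, and the example uses the InChI of ethanol as a stand-in, because the Nutlin-3 strings are not reproduced in this transcript.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len(prefix):].split("/")
    result = {"version": version, "formula": None, "layers": []}
    # The formula layer is the only one without a lowercase prefix letter.
    if rest and not rest[0][:1].islower():
        result["formula"] = rest.pop(0)
    # Remaining layers start with a one-letter tag, e.g. c = connectivity,
    # h = hydrogens. Kept as an ordered list because a tag may occur twice.
    for layer in rest:
        result["layers"].append((layer[:1], layer[1:]))
    return result


# Ethanol, used purely as a small stand-in example.
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,1-2H3")
print(ethanol["formula"])  # C2H6O
print(ethanol["layers"])   # [('c', '1-2-3'), ('h', '3H,1-2H3')]
```

Because the whole structure is encoded in the string, two databases that hold the same compound should carry the same standard InChI, which is what makes exact-match searches across open resources (or a plain web search) possible.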
Dr Christopher Southan, ELIXIR Database Survey
Co-ordinator, southan_at_ebi.ac.uk