Title: On the Path to Drug Discovery: New Small Molecule Resources
1 On the Path to Drug Discovery: New Small Molecule Resources
- Michel Dumontier, Ph.D.
- Carleton University
2 The Path to Drug Discovery
- There are essentially six steps in drug discovery:
- disease selection
- target identification
- lead compound identification
- lead compound optimization
- pre-clinical trial
- clinical trial
3 In essence, we need to
- Identify proteins that bind small molecules. These are potential targets and their small-molecule hits.
- Reduce the list of targets by identifying essential genes.
- Reduce the hits by considering specificity of interaction.
4 Protein Small-Molecule Interaction Database
- 23,000 non-redundant protein small-molecule interactions.
- Derived from the PDB structure database
- Filtered for crystallographic symmetry, buffer agents, non-biologically interesting small molecules
- Captured in the 3DSM division of the Biomolecular Interaction Network Database (BIND)
http://bind.ca
Biopolymers. 2001-2002 61(2):111-20
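The filtering step above can be pictured with a minimal sketch. It assumes ligand records arrive as `(pdb_id, chain_id, het_code)` tuples; the exclusion list, the function name `filter_ligands`, and the toy records are all hypothetical and are not the lists or data used to build the database.

```python
# Hypothetical exclusion list of solvent/buffer residue codes (illustration only).
EXCLUDED_HET_CODES = {"HOH", "GOL", "SO4", "EDO", "TRS"}


def filter_ligands(het_records, excluded=EXCLUDED_HET_CODES):
    """Keep one record per (structure, chain, ligand) and skip excluded codes.

    het_records: iterable of (pdb_id, chain_id, het_code) tuples.
    Returns the kept records in their original order.
    """
    seen = set()
    kept = []
    for pdb_id, chain_id, het_code in het_records:
        code = het_code.strip().upper()
        if code in excluded:
            continue  # buffer agent or otherwise uninteresting small molecule
        key = (pdb_id, chain_id, code)
        if key in seen:
            continue  # repeated copy, e.g. a symmetry-related duplicate
        seen.add(key)
        kept.append(key)
    return kept


# Toy records with a made-up structure identifier.
records = [
    ("PDB1", "B", "TA1"),
    ("PDB1", "B", "TA1"),
    ("PDB1", "A", "HOH"),
    ("PDB1", "A", "GTP"),
]
print(filter_ligands(records))
```

A real pipeline would also need geometric checks for crystallographic symmetry, which a code-based filter like this one does not attempt.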
5 Small Molecule Curation
Internal Links: BIND Records, Binding Domains
External Links: PDBSum, HIC-up, MMDB, PDB, KEGG, PubChem, EcoCyc
Taxol
Approved for treatment of breast cancer (1994)
6 SMID: Mapping small molecule interactions to domains
Taxol binding conservation in Tubulin/FtsZ domain
7 Conserved Domain Ligand Visualization
Small molecule and contacting ligands are automatically rendered/colored
Domain: Purple, Ligands: Green, Taxol: Spacefill
8 Use Cases for SMID
- Domain Studies
- Work with small molecule binding domains
- Conservation of binding site across domain members
- Structural Genomics
- Domain/ligand/binding site identification
- Some ligands go over domain boundaries
- Domain family entry into PDB structures
- Quickly identify candidate co-crystallization ligands
9 SMID-BLAST
- Enables users to predict small-molecule binding sites in proteins for which a crystal structure has not yet been determined.
- Maps binding sites to the query protein from experimentally determined interactions.
- Requires a hit to a conserved domain.
- Freely available
- Web interface
- http://smid.blueprint.org
- Standalone tool
- ftp://ftp.blueprint.org/pub/SMID/tool/
10 Protein Sequence
Alignment to conserved small-molecule binding domains
Evaluate binding site conservation
Domain small-molecule interactions
Cluster binding sites from all domains
List of small-molecule binding sites with confidence scores
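The alignment and conservation steps in this flow can be illustrated with a small sketch that projects known binding-site positions from a domain onto an aligned query sequence. The function name `map_binding_site`, the gapped-string input format, and the toy sequences are assumptions made for illustration; this is not SMID-BLAST's actual implementation or scoring.

```python
def map_binding_site(domain_aln, query_aln, domain_site, query_start=1):
    """Project binding-site positions from a domain onto an aligned query.

    domain_aln, query_aln: equal-length aligned strings, '-' marks a gap.
    domain_site: set of 1-based residue positions in the ungapped domain.
    query_start: 1-based query position of the first aligned query residue.
    Returns {domain_position: (query_position, query_residue, identical)}.
    Site positions aligned to a query gap are left out of the result.
    """
    if len(domain_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    d_pos, q_pos = 0, query_start - 1
    for d_res, q_res in zip(domain_aln, query_aln):
        if d_res != "-":
            d_pos += 1
        if q_res != "-":
            q_pos += 1
        if d_res != "-" and q_res != "-" and d_pos in domain_site:
            mapping[d_pos] = (q_pos, q_res, d_res == q_res)
    return mapping


# Toy alignment; the sequences and site positions are made up.
domain_aln = "MK-LVDGH"
query_aln = "MRALV-GH"
site = {2, 5, 7}
mapped = map_binding_site(domain_aln, query_aln, site)
print(mapped)
print(len(mapped) / len(site))  # fraction of site positions that could be mapped
```

In this toy case domain position 5 falls opposite a gap in the query, so only two of the three site residues are carried over, and only one of those is identical in the query.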
11 Validation Summary
- 1652 experimentally determined protein small-molecule interactions from PDB
- Predictions
- Ligand correctly predicted in 62% of cases.
- 25% of the ligands also obtained the best ligand score.
- 73% of interactions were predicted with >80% binding site coverage
- This is very good, as the test set is not comprehensive
- we don't have a set of all possible ligands to each protein crystal structure.
- we can only use exact small molecule matches (not similar molecules, e.g. ATP vs ATP-gamma-S)
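One way to read the coverage figure is as the fraction of experimentally observed binding-site residues that a prediction recovers. The sketch below works under that assumed definition; the function names and the toy residue sets are hypothetical and do not reproduce the validation set.

```python
def binding_site_coverage(predicted, observed):
    """Fraction of observed binding-site residues that were also predicted."""
    observed = set(observed)
    if not observed:
        raise ValueError("observed binding site must not be empty")
    return len(set(predicted) & observed) / len(observed)


def fraction_above(cases, threshold=0.8):
    """Share of (predicted, observed) cases whose coverage exceeds threshold."""
    if not cases:
        raise ValueError("no cases to summarize")
    hits = sum(1 for pred, obs in cases if binding_site_coverage(pred, obs) > threshold)
    return hits / len(cases)


# Toy residue-number sets, not real validation data.
cases = [
    ({10, 12, 45, 47, 90, 101}, {10, 12, 45, 47, 90}),  # every observed residue recovered
    ({10, 12}, {10, 12, 45, 47}),                        # half of the observed residues recovered
]
print([binding_site_coverage(p, o) for p, o in cases])
print(fraction_above(cases))
```

Note that extra predicted residues (such as 101 in the first case) do not lower coverage under this definition; a precision-style measure would be needed to penalize them.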
12 SMID-BLAST Example: HIV Integrase
- Mediates integration of a copy of the viral genome into the host DNA.
- Lack of a mammalian counterpart makes it a good drug target.
- There is no solved structure as of yet, but it has a known Zn-binding domain, a catalytic core and a DNA-binding domain.
- Use SMID-BLAST to make a short list of small molecules that interact with the integrase
- could lead to pharmacological studies to determine inhibition.
13 Small molecules predicted to bind to HIV Integrase
- 4 clustered binding sites
- 4 small molecules
- 5 ions
- 100 - 5CITEP
- known inhibitory ligand
- TTA - tetraphenyl-arsonium
- known interactor, but doesn't inhibit
- Y3
- known interactor with Avian Sarcoma Virus integrase
- New compounds identified from pharmacophore search
14 Integrase Binding Site Mapping
- Each hyperlinked residue indicates a binding site
- Clicking on a link shows coverage of domain-specific binding sites, and a link to the SMID, PDB, or BIND record that supports the interaction.
15 A genomic context
- SMID-Genomes bridges the gap between structural proteomics and genomics
- Small-molecule interaction predictions for proteins of all genomes
- Allows for comparative analysis of small-molecule binding profiles
16 Application area of Prokaryotic Genome Projects
Source: http://www.genomesonline.org
17 SMID-Genomes
- Ran SMID-BLAST for all protein sequences
- 10.5M small molecule binding interactions
- Conservative threshold
- Added genomic context
- 9.4M interactions across 385,000 proteins from 1558 completely sequenced genomes
- 50% protein coverage for archaea, bacteria, eukaryotes
- 35% for viruses/phages
- Search/Browse/Compare predicted small molecule interactions in a taxonomy-specific manner
18 Search
Browse
Compare
19 Browsing SMID-Genomes
20 Protein view lists all proteins with putative small molecule interactions
Click to list small molecules predicted to bind to this protein
Click to view small-molecule binding sites on genomic proteins
Limit the search results by a term!
21 Small molecule view includes picture and link to small molecule summary page
Click to view domain-specific interactions
Click G to list all genomes having an interaction with this small molecule
22 Domain view lists the conserved domains used to make the small molecule predictions
Click G to list all genomes having an interaction with this domain
23 Multi-Genome Small-Molecule Binding Profile Comparison
- Malaria
- a disease that directly impacts 300-500 million people worldwide and is a prominent economic and social problem in the developing world
- Browse small molecules that target proteins in humans, Plasmodium falciparum (the parasite that causes malaria) and Anopheles gambiae (the mosquito that carries the malarial parasite).
24
- Fosmidomycin, an antibiotic isolated from Streptomyces lavendulae, operates as a potent inhibitor of DOXP reductoisomerase, a key enzyme of the alternative pathway of isoprenoid synthesis.
- It is known that the malaria parasite Plasmodium is dependent on this pathway, because it lacks the primary isoprenoid synthesis pathway.
25 Small-Molecule Specificity
- Hits 74% of bacteria
- For eukaryotes, there is a hit to rat, but only plants have an ortholog of reductoisomerase
26 Text Query: SARS
27 Essential Genes
- Putatively identify essential genes with small molecule interactions and find even better targets.
- Database of essential genes: http://tubic.tju.edu.cn/deg/
- Haemophilus influenzae. 2002. Proc Natl Acad Sci U S A 99:966-971.
- Mycoplasma genitalium. 1999. Science 286:2165-2169.
- Staphylococcus aureus. 2001. Science 293:2266-2269.
- Vibrio cholerae. 2000. Nat Biotechnol 18:740-745.
- Bacillus subtilis. 2003. Proc Natl Acad Sci U S A 100:4678-4683.
- Staphylococcus aureus. 2002. Mol Microbiol. 43:1387-400.
- Streptococcus pneumoniae. 2002. Nucleic Acids Res. 30:3152-62.
- Helicobacter pylori. 2004. J Bacteriol. 186:7926-7935.
- E. coli: http://www.shigen.nig.ac.jp/ecoli/pec/index.jsp
- Yeast - MIPS: http://www.mips.biochem.mpg.de/proj/yeast
- Map essential genes to other genomes by ortholog assignment
- Identify best reciprocal BLAST hits between organisms
- Inparanoid algorithm
- Sonnhammer et al. 2001. JMB, 314:1041-1052.
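The reciprocal-best-hit step of ortholog assignment can be sketched as follows. The input format (a dictionary of pairwise scores per search direction), the function names, and the toy scores are assumptions for illustration; the sketch covers only the reciprocal best hit idea and none of the additional clustering that Inparanoid performs.

```python
def best_hits(scores):
    """Return the top-scoring subject for each query.

    scores: dict mapping (query, subject) -> similarity score (e.g. a bit score).
    On ties the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores between proteins of two made-up genomes A and B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 205, ("b2", "a2"): 175}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here `a2` also hits `b2`, but `b2` prefers `a3`, so only `a1`/`b1` and `a3`/`b2` would be treated as candidate ortholog pairs through which an essentiality annotation could be transferred.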
28 Identify small molecules that exclusively interact with mycoplasma urethral pathogens
Chicken, Urethral (Men), Swine, Non-pathogen, Cattle, Human (resp.)
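Finding small molecules that are exclusive to a group of genomes amounts to a set comparison of per-genome binding profiles. The sketch below assumes each genome's profile is a set of ligand identifiers and that "exclusive" means predicted in every target genome and in none of the others; the genome labels, ligand names, and function name are hypothetical.

```python
def exclusive_ligands(profiles, target_genomes):
    """Ligands predicted in every target genome and in no other genome.

    profiles: dict mapping genome label -> set of ligand identifiers.
    target_genomes: labels of the genomes of interest (must be in profiles).
    """
    targets = set(target_genomes)
    if not targets or not targets <= set(profiles):
        raise ValueError("target genomes must be a non-empty subset of profiles")
    shared = set.intersection(*(set(profiles[g]) for g in targets))
    others = set().union(*(set(profiles[g]) for g in profiles if g not in targets))
    return shared - others


# Toy binding profiles; labels and ligand names are made up.
profiles = {
    "urethral_1": {"ATP", "FMN", "LIG_X"},
    "urethral_2": {"ATP", "LIG_X"},
    "respiratory": {"ATP", "FMN"},
    "non_pathogen": {"ATP"},
}
print(exclusive_ligands(profiles, ["urethral_1", "urethral_2"]))
```

Relaxing the rule to "predicted in at least one target genome" would only require replacing the intersection with a union.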
29 FM2 nucleoside inhibitor acts on an essential gene
30 Converting Small Molecule Hits to Lead Compounds
- So far, we have been using small molecules in PDB
- Expand small molecule library with NCBI's PubChem: 650,000 compounds
- Display small molecule hits with chemical/biological significance.
- Chemical Ontology
31 Chemical Ontology
- 233 chemical functional groups
- Simple organization illustrations
- Functional groups automatically assigned by the checkmol computer program
- Can identify molecules having two or more functional groups
- Semantic similarity algorithm to compare small molecules on shared terms
- Applied to internal SMDB and PubChem
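The slides do not say which semantic similarity measure is used, so the sketch below swaps in a simple stand-in: a Jaccard index over functional-group terms after each term has been expanded with its ancestors in the ontology. The hierarchy fragment, term names, and function names are hypothetical and are not taken from the ontology described in the talk.

```python
# Hypothetical fragment of a functional-group hierarchy (child -> parent).
PARENT = {
    "primary amine": "amine",
    "secondary amine": "amine",
    "tertiary amine": "amine",
    "aldehyde": "carbonyl compound",
    "ketone": "carbonyl compound",
}


def with_ancestors(terms, parent=PARENT):
    """Return the terms together with all of their ancestors."""
    expanded = set()
    for term in terms:
        while term is not None and term not in expanded:
            expanded.add(term)
            term = parent.get(term)
    return expanded


def semantic_similarity(terms_a, terms_b, parent=PARENT):
    """Jaccard index of the ancestor-expanded term sets (0.0 to 1.0)."""
    a = with_ancestors(terms_a, parent)
    b = with_ancestors(terms_b, parent)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(semantic_similarity({"primary amine"}, {"secondary amine"}))
print(semantic_similarity({"ketone"}, {"primary amine"}))
```

Expanding with ancestors is what lets sibling terms such as primary and secondary amine score above zero even though the molecules share no term directly.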
32 Chemical Ontology applied to PubChem: amine functional groups
parent
child
siblings
33 Union of functional groups
35 Next Steps
- Compare PDB small molecules to PubChem compounds with semantic similarity algorithm
- Filter similar molecules by drug-like properties
- Lipinski rule of 5: mol. wt. < 500, logP < 5.0, < 5 H-bond donors, < 10 H-bond acceptors
- Add any available biological assay information (deposited in PubChem)
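The drug-likeness filter can be sketched as a simple predicate over precomputed properties. The thresholds are applied as strict inequalities exactly as written on the slide; the rule is often stated with inclusive bounds, so the boundary handling here is an assumption. The `Compound` class, the function name, and the property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float   # molecular weight
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def passes_rule_of_five(c):
    """Apply the thresholds as listed on the slide (strict inequalities)."""
    return (
        c.mol_weight < 500
        and c.logp < 5.0
        and c.h_donors < 5
        and c.h_acceptors < 10
    )


# Toy compounds with made-up property values.
candidates = [
    Compound("cmpd_small", 320.4, 2.1, 2, 5),
    Compound("cmpd_large", 853.9, 3.0, 4, 14),
]
drug_like = [c.name for c in candidates if passes_rule_of_five(c)]
print(drug_like)
```

In practice the property values would come from a cheminformatics toolkit or from the compound records themselves rather than being typed in by hand.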
36 Future Directions
- Identify diseases that may be affected by binding site mutations in genes described in the OMIM database.
- Do known variations disrupt binding?
- Identify small molecule binding sites that are affected by non-synonymous SNPs in disease-related proteins.
- Experimentally investigate via wet-lab collaborations
37 Conclusions
- New and freely available resources for drug discovery
- A software program that annotates small-molecule binding sites to protein sequences with no known structure.
- A platform for drug discovery that considers target exclusivity and compound specificity across all genomes.
38 Acknowledgements
- Christopher Hogue, Blueprint PI
- SMID/SMID-BLAST
- Howard Feldman
- Kevin Snyder
- Small Molecule Group
- Susan Ling
- Brian Parker
- Roberta Stasiuk
- BIND
- Marc Dumontier
- Anthony Hrjovic
- Shawn Konopinksy
- John Salama
- IT
- Sam Sgro
- Funding Agencies
- Genome Canada
- Genome Ontario
- CIHR
- ORDCF