Title: Complementary methods for virtual screening and the elucidation of binding patterns: MOLPRINT 2D/3D
1 Complementary methods for virtual screening and the elucidation of binding patterns: MOLPRINT 2D/3D
- Andreas Bender
- ab454_at_cam.ac.uk
- Unilever Centre for Molecular Science Informatics
Cambridge University, UK
2 Outline
- Objective: more efficient similarity searching of chemical databases
- New methods developed to detect molecules with similar biology: one is based on connectivity (2D), the other on surface points (3D)
- Results: lead discovery (finding new drugs, finding new chemotypes)
- Feature: discovering binding patterns
3 Substructure vs. Similarity Searching
- Substructure searching aims to detect molecules containing a particular subgraph / substructure; exact matching is desired (e.g. to detect toxic groups)
- Similarity searching aims to detect molecules with similar properties; the molecules may (sometimes should!) be structurally different
- Here employed for activity detection: virtual screening
- Literature: Bender, A. and Glen, R.C. Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem. 2004, 2, 3204-3218.
4 Descriptor Choice
5 2D Environment around an atom (MOLPRINT 2D)
- Assign Sybyl mol2 atom types
- Find connections
- Find connections to connections
- Create a tree down to n levels
- Create a fingerprint for this atom
[Figure: example atom environment listing the atom types found at Level 0, Level 1 and Level 2 (labels shown: N2, Car) together with their counts]
These features are created for every (heavy) atom in the molecule (J. Chem. Inf. Comput. Sci. 2004, 44, 170-178; 2004, 44, 1710-1718)
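The steps listed for the 2D atom environment can be sketched in Python as follows. This is an illustrative sketch only, not the MOLPRINT 2D implementation: the molecule (the heavy atoms of aniline) is a hand-written bond list, the Sybyl-style atom types are typed in by hand rather than assigned by software, and the names `atom_environment` and `environment_string` as well as the string encoding of a feature are assumptions made for this example.

```python
from collections import Counter

def atom_environment(atom_types, bonds, center, max_level=2):
    """Count atom types at each bond distance (level) from `center`.

    atom_types: one type label per heavy atom (hand-assigned here)
    bonds: list of (i, j) atom index pairs
    Returns a list of Counters, one for each level 0..max_level.
    """
    # Build an adjacency list from the bond list.
    neighbours = {i: set() for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Breadth-first expansion: level k holds atoms exactly k bonds away.
    visited = {center}
    frontier = [center]
    levels = []
    for _ in range(max_level + 1):
        levels.append(Counter(atom_types[a] for a in frontier))
        next_frontier = []
        for a in frontier:
            for b in sorted(neighbours[a]):
                if b not in visited:
                    visited.add(b)
                    next_frontier.append(b)
        frontier = next_frontier
    return levels

def environment_string(levels):
    """Encode per-level counts as one feature string (assumed format)."""
    return ";".join(
        f"{k}-" + ",".join(f"{n}x{t}" for t, n in sorted(c.items()))
        for k, c in enumerate(levels)
    )

# Aniline heavy atoms: atom 0 is N, atoms 1..6 form the aromatic ring.
atom_types = ["N.pl3", "C.ar", "C.ar", "C.ar", "C.ar", "C.ar", "C.ar"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

# Environment of the nitrogen, then the set of features of the whole molecule.
env = atom_environment(atom_types, bonds, center=0)
fingerprint = {
    environment_string(atom_environment(atom_types, bonds, a))
    for a in range(len(atom_types))
}
```

Because one feature is generated per heavy atom and symmetry-equivalent atoms yield identical strings, the molecule-level fingerprint in this sketch is simply the set of distinct environment strings.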
6 Feature Selection
- E.g. comparing faces first requires the identification of key features.
- How do we identify these?
- The same applies to molecules.
7 B) Information-Gain Feature Selection
- We wish to select the important features.
- To do this we calculate the entropy of the data as a whole and for each class.
- This is used to select those features that discriminate best between classes, e.g. active and inactive molecules.
[Equation: information gain (to be maximized), computed from the entropy of the whole set and the entropy of the subsets]
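The information gain named above can be illustrated with a short Python sketch. It assumes the standard textbook definition (entropy of the whole set minus the size-weighted entropy of the subsets with and without the feature); the slide's own equation is not reproduced in the text, so treat this form, the function names and the toy data as assumptions.

```python
import math

def entropy(n_active, n_inactive):
    """Shannon entropy in bits of a two-class set, given the class counts."""
    total = n_active + n_inactive
    if total == 0:
        return 0.0
    s = 0.0
    for n in (n_active, n_inactive):
        if n > 0:
            p = n / total
            s -= p * math.log2(p)
    return s

def information_gain(molecules, labels, feature):
    """Entropy of the whole set minus the weighted entropy of the two
    subsets (feature present / feature absent).

    molecules: list of feature sets; labels: list of bool (True = active).
    """
    n = len(molecules)
    gain = entropy(sum(labels), n - sum(labels))
    for present in (True, False):
        subset = [lab for mol, lab in zip(molecules, labels)
                  if (feature in mol) == present]
        if subset:
            n_act = sum(subset)
            gain -= len(subset) / n * entropy(n_act, len(subset) - n_act)
    return gain

# Toy data (hypothetical): feature "a" occurs only in actives.
molecules = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"c"}]
labels = [True, True, False, False]

gains = {f: information_gain(molecules, labels, f) for f in ("a", "b", "c")}
ranked = sorted(gains, key=gains.get, reverse=True)  # best feature first
```

In this toy set, feature "a" separates actives from inactives perfectly and is ranked first, while "b" occurs equally often in both classes and carries no information.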
8 Classification
- The next step is to identify which molecules belong to which class.
- To do this we use a Naïve Bayesian Classifier using the features (atom environments) we have identified as being important.
9 C) Naïve Bayesian Classifier (classification by presumptive evidence)
- Include all selected features f_i in the calculation of the ratio P(class 1 | F) / P(class 2 | F)
- Ratio > 1: class membership 1
- Ratio < 1: class membership 2
- F: feature vector; f_i: feature elements
[Figure: feature counts in the active and inactive datasets]
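A minimal sketch of the ratio test follows, assuming the usual naive Bayes factorisation (prior ratio times the product of per-feature likelihood ratios) evaluated in log space. The add-one smoothing, the function names and the toy counts are assumptions of this example and are not taken from the slides.

```python
import math

def log_class_ratio(query, counts_1, counts_2, n_1, n_2, selected):
    """log of P(class 1 | F) / P(class 2 | F) under naive Bayes.

    counts_k[f]: number of class-k training molecules containing feature f
    n_k: number of class-k training molecules
    Only selected features present in the query contribute.
    """
    log_ratio = math.log(n_1 / n_2)  # prior ratio from the class sizes
    for f in sorted(query & selected):
        # Add-one smoothing (an assumption) avoids division by zero
        # for features never seen in one of the classes.
        p1 = (counts_1.get(f, 0) + 1) / (n_1 + 2)
        p2 = (counts_2.get(f, 0) + 1) / (n_2 + 2)
        log_ratio += math.log(p1 / p2)
    return log_ratio

def classify(query, counts_1, counts_2, n_1, n_2, selected):
    """Ratio > 1 (log ratio > 0) gives class 1, otherwise class 2.
    A ratio of exactly 1 is assigned to class 2 here by convention."""
    ratio = log_class_ratio(query, counts_1, counts_2, n_1, n_2, selected)
    return 1 if ratio > 0 else 2

# Hypothetical feature counts in 10 active and 10 inactive molecules.
counts_active = {"x": 8, "y": 1}
counts_inactive = {"x": 1, "y": 8}
selected = {"x", "y"}
args = (counts_active, counts_inactive, 10, 10, selected)

label_a = classify({"x", "z"}, *args)  # "z" is not selected and is ignored
label_b = classify({"y"}, *args)
```

Working with the logarithm of the ratio keeps the product numerically stable when many features are included; the decision threshold of 1 on the ratio becomes 0 on the log ratio.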
10 Application: lead discovery
- Database: MDL Drug Data Report (MDDR)
- 957 ligands selected from MDDR
- 49 5HT3 receptor antagonists,
- 40 Angiotensin Converting Enzyme inhibitors (ACE),
- 111 HMG-CoA reductase inhibitors (HMG),
- 134 PAF antagonists and
- 49 Thromboxane A2 antagonists (TXA2)
- 574 inactives
- Briem and Lessel, Perspect Drug Discov Des 2000, 20, 245-264.
- Calculated: hit rate among the ten nearest neighbours of each molecule
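The hit-rate measure can be sketched as below: for a query molecule, rank all other molecules by similarity and report the fraction of the top k that share the query's activity class. The similarity function is passed in as a parameter; the shared-feature count used in the example is only a placeholder, and the toy fingerprints and function names are hypothetical.

```python
def hit_rate_top_k(fingerprints, classes, query, similarity, k=10):
    """Fraction of the k most similar molecules that belong to the
    same class as the query molecule (the query itself is excluded)."""
    others = [i for i in range(len(fingerprints)) if i != query]
    # Sort by decreasing similarity; ties are broken by index.
    others.sort(key=lambda i: (-similarity(fingerprints[query],
                                           fingerprints[i]), i))
    top = others[:k]
    if not top:
        return 0.0
    hits = sum(1 for i in top if classes[i] == classes[query])
    return hits / len(top)

def shared_features(a, b):
    """Placeholder similarity: number of features in common."""
    return len(a & b)

# Toy data set (hypothetical feature sets and class labels).
fingerprints = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"},
                {"d", "e"}, {"e"}]
classes = ["ACE", "ACE", "ACE", "PAF", "PAF"]

rate_0 = hit_rate_top_k(fingerprints, classes, 0, shared_features, k=2)
rate_3 = hit_rate_top_k(fingerprints, classes, 3, shared_features, k=2)
```

Averaging this value over all molecules of an activity class gives one number per class, which is how such a measure is typically summarised.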
11 Comparison
Using Tanimoto Coefficient
Using Bayesian
- Bender, A., et al., Similarity searching of chemical databases using atom environment descriptors: evaluation of performance (MOLPRINT 2D). J. Chem. Inf. Comput. Sci. 2004 (44) 1708-1718.
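For reference, the Tanimoto coefficient used in the comparison can be written for two feature sets as the size of their intersection divided by the size of their union. The sketch below uses plain Python sets with hypothetical feature names; returning 0.0 for two empty sets is a convention chosen for this example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / union

# Two molecules described by (hypothetical) atom-environment features.
mol_a = {"env1", "env2", "env3"}
mol_b = {"env2", "env3", "env4"}

score = tanimoto(mol_a, mol_b)  # 2 shared features out of 4 distinct ones
```

The coefficient treats every feature equally, whereas the Bayesian approach weights features by how differently they occur in the two classes.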
12 Combining Information in Molecules
- In this method, we can extend the approach by extracting from a set of molecules those features having the best information gain
- This can describe patterns shared by the molecules much better than any individual molecule can
13 Combining Information of 5 Actives
Bender, A., et al., Molecular Similarity Searching using Atom Environments, Information-Based Feature Selection and a Naïve Bayesian Classifier. J. Chem. Inf. Comput. Sci. 2004 (44) 170-178.
14 Transformation to 3D: MOLPRINT 3D
- Idea: to develop an analogous translationally and rotationally invariant (TRI) descriptor based on surface points
- Advantage: switching from element atom types to interaction energies gives a more general model than the 2D (graph) approach
- In addition: local description, hopefully less conformationally dependent
- Approach to fingerprint surfaces: Tanimoto and other methods become applicable (until now mainly used for 2D fingerprints)
- Reference: Bender, A. et al., J. Med. Chem., 2004, 47, 6569-6583 and IEEE SMC 2004 proceedings
15 The Conformational Problem
16 3D Environment around a surface point: solvent accessible surface
[Figure: surface-point environment showing the central point (Layer 0), the points in Layer 1, etc.; distance label: 8 Å]
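The layer idea can be sketched as grouping surface points into distance shells around a central point. The slides do not state the layer width or the number of layers, so `layer_width` and `n_layers` below are hypothetical parameters, the coordinates are made up, and the real descriptor additionally uses interaction energies at the points, which this sketch omits.

```python
import math

def assign_layers(points, center_index, layer_width, n_layers):
    """Group surface points into distance shells around a central point.

    Layer 0 holds only the central point. Layer k (k >= 1) holds points
    whose distance d from the centre satisfies (k-1)*w < d <= k*w.
    Points beyond the outermost layer are ignored.
    """
    center = points[center_index]
    layers = [[] for _ in range(n_layers + 1)]
    layers[0].append(center_index)
    for i, p in enumerate(points):
        if i == center_index:
            continue
        d = math.dist(center, p)          # Euclidean distance
        k = max(1, math.ceil(d / layer_width))
        if k <= n_layers:
            layers[k].append(i)
    return layers

# Made-up surface point coordinates in Angstrom.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0),
          (0.0, 0.0, 5.0), (10.0, 0.0, 0.0)]
layers = assign_layers(points, center_index=0, layer_width=2.0, n_layers=2)

# Translating every point leaves the layer assignment unchanged,
# because only inter-point distances are used.
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in points]
layers_shifted = assign_layers(shifted, 0, 2.0, 2)
```

Since only distances between points enter the assignment, the grouping does not depend on where the molecule sits or how it is oriented, which is the property a translationally and rotationally invariant descriptor needs.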
17 Overall Performance: comparable to 2D methods, and in addition
18 TXA2, Graph-based Descriptors
[Figure: retrieved structures 1-7]
Very little diversity in heterocyclic systems: no patents, no money!
19 TXA2, 7 Hits among Top 10
[Figure: hit structures 1-7]
20 Which features are selected for classification?
- Even if your classifier works, do the selected features make sense?
- Set of active vs. inactive molecules
- Information gain is calculated for each feature; those which are much more frequent among actives are candidates and might constitute the pharmacophore
- Look at features from HMG and TXA2
21 Selected Features - HMG
- Binding site (HMG): rigid lipophilic ring
22 HMG-15
23 TXA2
Yellow: lipophilic side chains
- Yamamoto et al., J. Med. Chem. 1993 (36) 820
24 TXA2-7
25 Identification of Features
- (a) Not in binding conformation
- (b) In different conformations
26 Summary
- 2D Method
- Performs about as well as other 2D methods for single molecule searches
- Outperforms them by a large margin when combining information from multiple molecules (J. Chem. Inf. Comput. Sci., 2004, 44, 170-178; J. Chem. Inf. Comput. Sci., 2004, 44, 1710-1718.)
- 3D Method: translationally and rotationally stable (invariant); combines high enrichment factors with scaffold hopping; discovery of new chemotypes possible
- (J. Med. Chem., 2004, 47, 6569-6583)
- Features shown to correlate with binding patterns
27 Future Work
- Use a more solid description of surface properties (COSMO descriptors instead of force field properties)
- Shape encoding
- Different machine learning approaches
28 Acknowledgements
- Robert C Glen (Unilever Centre, Cambridge, UK)
- Hamse Y. Mussa (Unilever Centre, Cambridge, UK)
- Stephan Reiling (Aventis, Bridgewater, USA)
- David Patterson (Tripos)
- Software
- GRID, CACTVS, gOpenMol and many, many others
- Funding
- The Gates Cambridge Trust, Unilever, Tripos