Molecular Similarity Searching Using Atom Environments and Surface Point Environments

1
Molecular Similarity Searching Using Atom
Environments and Surface Point Environments

Andreas Bender
Unilever Centre for Molecular Informatics
Cambridge University, UK

2
Outline

Objective: more efficient searching of chemical databases
New methods developed to detect molecules with similar biological activity: one is based on connectivity (2D), the other on surface points (3D)
Details of the algorithms are presented here, starting with the 2D type
Results: lead discovery (finding new drugs, finding new chemotypes)

3
Descriptor Choice
4
2D Environment around an atom

Example: 6-aminoquinoline

Assign Sybyl mol2 atom types -> find connections -> find connections to connections -> create a tree down to n levels -> bin the atom types for each level -> create a fingerprint for this atom

[Figure: atom environment of one atom of 6-aminoquinoline, with the atom types (N2, Car, H) counted at Level 0, Level 1 and Level 2]

These features are created for every atom in the molecule
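
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the toy graph, its atom types and the helper names `atom_environment` and `environment_key` are assumptions made for the example, and real Sybyl mol2 atom typing is not performed.

```python
from collections import Counter

def atom_environment(center, atom_types, neighbours, depth=2):
    """Count atom types at each bond distance (level) from `center`."""
    visited = {center}
    frontier = [center]
    levels = []
    for _ in range(depth + 1):
        # Bin the atom types found at the current level.
        levels.append(Counter(atom_types[a] for a in frontier))
        next_frontier = []
        for a in frontier:
            for b in neighbours[a]:
                if b not in visited:
                    visited.add(b)
                    next_frontier.append(b)
        frontier = next_frontier
    return levels

def environment_key(levels):
    """Flatten the per-level counts into one hashable feature string."""
    return "|".join(
        ",".join(f"{t}:{n}" for t, n in sorted(level.items()))
        for level in levels
    )

# Hypothetical toy graph with hand-assigned atom types (not a real mol2 file).
atom_types = {0: "N", 1: "C.ar", 2: "C.ar", 3: "C.ar", 4: "H"}
neighbours = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1], 4: [2]}

# One feature per atom, as on the slide.
features = {
    a: environment_key(atom_environment(a, atom_types, neighbours))
    for a in atom_types
}
print(features[0])
```

Each molecule is then represented by the set of its per-atom feature strings.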
5
Feature Selection

E.g. comparing faces first requires the
identification of key features.
How do we identify these?
The same applies to molecules.

6
B) Information-Gain Feature Selection

We wish to select the important features.
To do this we calculate the entropy of the data as a whole and for each class.
These entropies are used to select the features that discriminate best between classes, e.g. between active and inactive molecules.
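
A minimal sketch of information gain for a binary feature (present or absent in a molecule) and two classes is given below. It uses the textbook definition (class entropy minus the weighted entropy after splitting on the feature); the function names and the toy data are assumptions, and the original work may differ in detail.

```python
import math

def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class sample."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(feature, molecules):
    """molecules: list of (set_of_features, is_active) pairs."""
    actives = sum(1 for _, y in molecules if y)
    gain = entropy(actives, len(molecules) - actives)
    # Subtract the weighted entropy of the "present" and "absent" subsets.
    for present in (True, False):
        subset = [y for f, y in molecules if (feature in f) == present]
        if subset:
            a = sum(subset)
            gain -= len(subset) / len(molecules) * entropy(a, len(subset) - a)
    return gain

# Hypothetical toy data: feature "A" separates the classes, "B" does not.
data = [({"A", "B"}, True), ({"A"}, True), ({"B"}, False), ({"C"}, False)]
ranked = sorted({"A", "B", "C"}, key=lambda f: information_gain(f, data), reverse=True)
print(ranked[0])
```

Features are ranked by gain and only the top-ranked ones are passed on to the classifier.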

7
Classification

The next step is to identify which molecules belong to which class.
To do this we use a Naïve Bayesian Classifier with the features (atom environments) we have identified as being important.

8
C) Naïve Bayesian Classifier (classification by presumptive evidence)

Include all selected features f_i in the calculation of the class-membership ratio
Ratio > 1: class membership 1
Ratio < 1: class membership 2
F: feature vector
f_i: feature elements
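
The formula itself did not survive in this transcript. Assuming the standard Naïve Bayes form, the ratio is P(CL1|F)/P(CL2|F) = P(CL1)/P(CL2) × Π P(f_i|CL1)/P(f_i|CL2). The sketch below computes its logarithm, so the decision threshold becomes 0 instead of 1. The Laplace correction and all names are assumptions for illustration, not taken from the slides.

```python
import math

def log_ratio(query_features, selected, class1, class2):
    """Log of the class-membership ratio; class1/class2 are lists of feature sets."""
    score = math.log(len(class1) / len(class2))  # prior ratio P(CL1)/P(CL2)
    for f in selected:
        if f not in query_features:
            continue
        # Laplace-corrected frequencies (an assumption) avoid division by zero.
        p1 = (sum(f in m for m in class1) + 1) / (len(class1) + 2)
        p2 = (sum(f in m for m in class2) + 1) / (len(class2) + 2)
        score += math.log(p1 / p2)
    return score

# Hypothetical training sets of feature sets.
class1 = [{"a", "b"}, {"a"}]
class2 = [{"c"}, {"b", "c"}]
selected = ["a", "c"]

score_a = log_ratio({"a"}, selected, class1, class2)
score_c = log_ratio({"c"}, selected, class1, class2)
print(score_a > 0, score_c > 0)
```

A positive log ratio corresponds to Ratio > 1 (class 1), a negative one to Ratio < 1 (class 2).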

9
Application: lead discovery

Database: MDL Drug Data Report (MDDR)
957 ligands selected from MDDR, including 49 5HT3 receptor antagonists, 40 angiotensin converting enzyme (ACE) inhibitors, 111 HMG-CoA reductase inhibitors (HMG), 134 PAF antagonists and 49 thromboxane A2 antagonists (TXA2) (Briem and Lessel, Perspect. Drug Discov. Des. 2000, 20, 245-264)
A) Hit rate among the ten nearest neighbours of each molecule
B) 20-fold cross-validation, 5 molecules for query generation
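
Measure A can be sketched as follows. The slides do not state which similarity measure ranks the neighbours, so the Tanimoto coefficient on feature sets is used here purely as a stand-in; the function names and toy data are likewise assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def hit_rate(index, molecules, k=10, similarity=tanimoto):
    """Fraction of the k nearest neighbours sharing the query's class label."""
    feats, label = molecules[index]
    others = [
        (similarity(feats, f), l)
        for i, (f, l) in enumerate(molecules)
        if i != index
    ]
    others.sort(key=lambda pair: pair[0], reverse=True)
    top = others[:k]
    return sum(l == label for _, l in top) / len(top)

# Hypothetical toy library of (feature set, activity class) pairs.
molecules = [
    ({"a", "b"}, "ACE"),
    ({"a", "b", "c"}, "ACE"),
    ({"x", "y"}, "PAF"),
    ({"x"}, "PAF"),
]
print(hit_rate(0, molecules, k=1))
```

Averaging `hit_rate` over all molecules of a class gives one number per activity class.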

10
Comparison
Using single molecule query

Briem and Lessel, Perspectives in Drug Discovery
and Design 2000, 20, 245-264.

11
Combining Information in Molecules

This approach can be extended by extracting, from a set of molecules, the features with the best information gain
A set of molecules describes the patterns shared by active compounds much better than any individual molecule does
The following example shows cross-validated database searches using combinations of features from five molecules at a time
Inactives were split 50/50 between training and test sets; no molecule appears in both the training and the test set

12
MDDR ACE cumulative recall plot
[Figure: cumulative recall curve compared with optimal selection and random selection]
We found about 80% of the active molecules among the first 10% of the library
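
A cumulative recall curve of this kind can be computed from a ranked list of activity labels. The sketch below is illustrative only; the function name and the toy ranking are assumptions, and the numbers are not the ones from the plot.

```python
def cumulative_recall(ranked_labels):
    """Fraction of all actives recovered after each position of a ranked list."""
    total = sum(ranked_labels)
    found = 0
    curve = []
    for is_active in ranked_labels:
        found += is_active
        curve.append(found / total)
    return curve

# Hypothetical ranking of ten molecules (1 = active, 0 = inactive).
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = cumulative_recall(ranked)

# Recall after screening the top 20% of this toy library.
top_20_percent = curve[int(0.2 * len(ranked)) - 1]
print(top_20_percent)
```

Optimal selection would place all actives first; random selection gives a curve along the diagonal.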
13
Using Multiple Query Molecules
14
Transformation to 3D

Idea: to develop an analogous translationally and rotationally invariant (TRI) descriptor based on surface points
Advantage: switching from element atom types to interaction energies gives a more general model -> scaffold hopping?
Two parts: interaction fingerprint and shape description. Results shown here use only the interaction fingerprints; the shape description is under development
Again, information-gain feature selection and the Naïve Bayesian Classifier are used for classification

15
3D Environment around a surface point (solvent accessible surface)
[Figure: central point (Layer 0), points in Layer 1, points in Layer 2, etc.]
16
Algorithm
Interaction energies at surface points, one probe at a time
Binning scheme (cutoffs): -1.0, -0.45, -0.4, -0.3, -0.1, 0.0, 0.2
Example: an interaction energy of -0.35 falls into the bin between -0.4 and -0.3
Surface point environment: 00010000 01100010 - 011101100
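
The binning can be sketched as follows, using the cutoffs listed on the slide (seven cutoffs give eight bins, i.e. eight bits per layer). The presence-bit encoding, the handling of values that fall exactly on a cutoff, the function names and the Layer 1 energies are assumptions chosen to reproduce the first two bit strings shown.

```python
from bisect import bisect_right

# Cutoffs from the slide; 7 cutoffs -> 8 bins -> 8 bits per layer.
CUTOFFS = [-1.0, -0.45, -0.4, -0.3, -0.1, 0.0, 0.2]

def layer_bits(energies, cutoffs=CUTOFFS):
    """Set one bit for every energy bin that occurs in this layer."""
    bits = [0] * (len(cutoffs) + 1)
    for e in energies:
        bits[bisect_right(cutoffs, e)] = 1
    return "".join(map(str, bits))

def surface_point_environment(layers):
    """Concatenate the bit strings of Layer 0, Layer 1, ..."""
    return " ".join(layer_bits(layer) for layer in layers)

# Layer 0 is the central point (-0.35, as on the slide);
# the Layer 1 energies are hypothetical.
layers = [[-0.35], [-0.7, -0.42, 0.1]]
fingerprint = surface_point_environment(layers)
print(fingerprint)
```

Because bits record presence only, a favourable and an unfavourable interaction in the same layer set two different bits instead of averaging out.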
17
Relation to other algorithms

Surface Autocorrelation: averages the interaction energies. Here, a favourable and an unfavourable interaction in a given layer both remain in the fingerprint
GRIND: continuous variables from GRID; the entire field of interaction energies is simplified, and only the maximum product enters the descriptor
MaP: categorical variables, counts are kept, size description
(In addition, the feature selection and scoring are handled differently)

18
Algorithm Flow
19
Standard Parameters

MSMS: probe radius 1.5 Å, density 0.5 points/Å², double van der Waals radii for atoms, giving effectively the solvent accessible surface
GRID: DRY, C3, N1, N2, O, O- probes, otherwise standard parameters
Binning: variable number of layers, 8 bits, cutoffs set so that equal frequencies are observed in each bin
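
Equal-frequency cutoffs can be obtained from the quantiles of the observed energies. The sketch below uses `statistics.quantiles` from the Python standard library on synthetic values; how the cutoffs were actually derived is not stated on the slide, so this is one plausible reading.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def equal_frequency_cutoffs(energies, n_bins=8):
    """Return n_bins - 1 cutoffs that split the data into equally filled bins."""
    return quantiles(energies, n=n_bins)

# Synthetic stand-in for interaction energies (hypothetical values).
energies = [i / 100 - 4.0 for i in range(800)]

cutoffs = equal_frequency_cutoffs(energies)
counts = Counter(bisect_right(cutoffs, e) for e in energies)
print(len(cutoffs), sorted(counts.values()))
```

With 8 bits per layer this yields 7 cutoffs, matching the number of cutoffs in the binning scheme.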

20
[Figure: curves for Layer 0 only and for layers L0-1, L0-2, L0-3, L0-4 and L0-5, compared with random selection]

21
Enrichment Curves: Briem data set (4 Layers, Standard Settings)
22
Comparison
23
ACE Binding Site

Snake venom peptide analog with putative binding
motif to angiotensin used in early compound
design (Cushman et al., Biochemistry (1977), 16,
5484-5491.)

24
ACE - Query
25
Hits Using 2D Descriptors: Hits 1 to 5
26
Hits Using 2D Descriptors: Hits 6 to 10
27
ACE Selected Features
28
Hits using 3D descriptors (10 Hits among top 20,
enrichment 20)
29
New scaffolds
30
Jacobsson data set

110 ERα toxins (ERαt)
36 ERα mimics (ERαm)
60 Matrix Metalloprotease 3 (MMP3)
129 Factor Xa (fXa)
54 Acetylcholine esterase (AChE)
999 diverse compounds from MDDR
2/3 for training, 1/3 for testing
Performance measure: classification
Actives and inactives are the same as those used with the other methods
Jacobsson et al., J. Med. Chem. 2003, 46, 5781-5789

31
Jacobsson et al., Methods

Docking with 7 scoring functions (2 implemented in ICM, 5 in Tripos CScore) was used (GOLD, ICM, Glide similar)
Fusion by classical consensus scoring (CScore), Partial Least Squares Discriminant Analysis (PLS-DA), Bayesian classification and rule-based methods
With the exception of ERα, large number of close analogues

32
Results

With the exception of MMP3, superior performance to the other methods (better precision and accuracy at the same recall)
AChE: much better than the other methods (docking is difficult: large pocket, water, multiple binding sites)
Given that docking is time-consuming, it does not seem to be the method of choice, at least in some cases (AChE)

33
Factor Xa
(Accuracy is the fraction of all predictions that are correct; precision is the fraction of positive predictions that are correct)
34
AChE

(Accuracy is the fraction of all predictions that are correct; precision is the fraction of positive predictions that are correct)
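
For reference, the measures named above (together with recall) can be computed from predicted and actual class labels as sketched below. The function name and the toy labels are assumptions for illustration.

```python
def classification_metrics(predicted, actual):
    """Return (accuracy, precision, recall) for boolean active/inactive labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # predicted active, is active
    tn = sum(not p and not a for p, a in pairs)  # predicted inactive, is inactive
    fp = sum(p and not a for p, a in pairs)      # predicted active, is inactive
    fn = sum(not p and a for p, a in pairs)      # predicted inactive, is active
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical predictions for five test molecules.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
accuracy, precision, recall = classification_metrics(predicted, actual)
print(accuracy, precision, recall)
```

Comparing methods at the same recall, as in the results above, means fixing `recall` and then comparing `precision` and `accuracy`.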

35
Summary

2D method: performs about as well as other 2D methods for single-molecule searches and outperforms them by a large margin when combining molecules (published in J. Chem. Inf. Comput. Sci. (2004) 44, 170-178)
3D method: combines high enrichment factors with scaffold hopping (discovery of new chemotypes)
Performance is (at least in part) due to the Bayesian Classifier, which is able to take multiple structures and both active and inactive information into account

36
Acknowledgements