Molecular Similarity Searching Using Atom Environments and Surface Point Environments

1
Molecular Similarity Searching Using Atom
Environments and Surface Point Environments
  • Andreas Bender
  • Unilever Centre for Molecular Informatics
    Cambridge University, UK

2
Outline
  • Objective: more efficient searching of chemical
    databases
  • New methods developed to detect molecules with
    similar biology: one is based on connectivity
    (2D), the other on surface points (3D)
  • Details of the algorithms are presented here,
    starting with the 2D type
  • Results: lead discovery (finding new drugs,
    finding new chemotypes)

3
Descriptor Choice
4
2D Environment around an atom
  • E.g. 6-aminoquinoline

  • Assign Sybyl mol2 atom types
  • Find connections
  • Find connections to connections
  • Create a tree down to n levels
  • Bin the atom types for each level
  • Create a fingerprint for this atom

[Figure: environment tree for one atom. Columns: Level 0, Level 1, Level 2. Atom types shown: N2; Car--Car; Car,H; Car,Car. Counts shown: 1, 2, 1, 1]

These features are created for every atom in the
molecule
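
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `atom_environment`, the toy fragment, and its type labels are assumptions made for the example, and the molecule is given as a plain list of atom types plus bond index pairs rather than being read from a mol2 file.

```python
from collections import Counter


def atom_environment(atom_types, bonds, center, max_level=2):
    """Count atom types at each bond distance (level) from `center`.

    atom_types: one type label per atom (labels here are hypothetical).
    bonds: list of (i, j) atom index pairs.
    Returns a list of Counters, one for each level 0..max_level.
    """
    neighbours = {i: set() for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    visited = {center}
    frontier = [center]
    levels = []
    for _ in range(max_level + 1):
        # Bin the atom types found at this distance from the central atom.
        levels.append(Counter(atom_types[a] for a in frontier))
        next_frontier = []
        for a in frontier:
            for b in sorted(neighbours[a]):
                if b not in visited:
                    visited.add(b)
                    next_frontier.append(b)
        frontier = next_frontier
    return levels


# Toy fragment (assumed, not the slide's figure): an amine nitrogen with
# two hydrogens, attached to an aromatic carbon that has two aromatic
# carbon neighbours.
toy_types = ["N.pl3", "C.ar", "C.ar", "C.ar", "H", "H"]
toy_bonds = [(0, 1), (1, 2), (1, 3), (0, 4), (0, 5)]
env = atom_environment(toy_types, toy_bonds, center=0, max_level=2)
for level, counts in enumerate(env):
    print(level, sorted(counts.items()))
```

Calling the function once per atom gives one environment per atom, which is the sense in which the features are created for every atom in the molecule.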
5
Feature Selection
  • E.g. comparing faces first requires the
    identification of key features.
  • How do we identify these?
  • The same applies to molecules.

6
B) Information-Gain Feature Selection
  • We wish to select the important features.
  • To do this we calculate the entropy of the data
    as a whole and for each class.
  • This is used to select those features with the
    highest discrimination, e.g. active and inactive
    molecules.
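
As a rough sketch of this selection step, the information gain of a single binary feature (present or absent in each molecule) can be computed as the entropy of the whole data set minus the weighted entropy of the two subsets the feature creates. The function names and the toy data are assumptions for illustration; the slides do not give the exact formula used.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(has_feature, labels):
    """Entropy of all labels minus the weighted entropy after splitting
    the molecules by presence or absence of one feature."""
    n = len(labels)
    with_f = [y for x, y in zip(has_feature, labels) if x]
    without_f = [y for x, y in zip(has_feature, labels) if not x]
    remainder = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


# Toy data: 1 = active, 0 = inactive.
labels = [1, 1, 0, 0]
gain_good = information_gain([True, True, False, False], labels)
gain_bad = information_gain([True, False, True, False], labels)
print(gain_good, gain_bad)
```

A feature found only in actives scores the maximum gain for a balanced two-class set, while a feature spread evenly over both classes scores zero and would be discarded.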

7
Classification
  • The next step is to identify which molecules
    belong to which class.
  • To do this we use a Naïve Bayesian classifier
    using the features (atom environments) we have
    identified as being important.

8
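C) Naïve Bayesian Classifier (classification by
presumptive evidence)
  • Include all selected features $f_i$ in the
    calculation of the ratio
    $\frac{P(CL_1 \mid F)}{P(CL_2 \mid F)} = \frac{P(CL_1)}{P(CL_2)} \prod_i \frac{P(f_i \mid CL_1)}{P(f_i \mid CL_2)}$
  • Ratio > 1: class membership 1
  • Ratio < 1: class membership 2
  • $F$: feature vector
  • $f_i$: feature elements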
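
A minimal sketch of such a ratio score, assuming the standard naïve Bayes form (prior ratio times a product of per-feature likelihood ratios). The Laplace-style correction (+1 / +2) is an assumption added here so that unseen features do not produce a division by zero; the slides do not say how zero counts were handled. All names are hypothetical.

```python
def class_ratio(features, counts_1, n_1, counts_2, n_2):
    """Ratio of class 1 to class 2 evidence for one molecule.

    features: selected features present in the molecule.
    counts_k: dict mapping feature -> number of class k training
              molecules that contain it.
    n_k: number of training molecules in class k.
    """
    ratio = n_1 / n_2  # prior ratio estimated from class sizes
    for f in features:
        # Smoothed estimates of P(f | class); the smoothing is an assumption.
        p1 = (counts_1.get(f, 0) + 1) / (n_1 + 2)
        p2 = (counts_2.get(f, 0) + 1) / (n_2 + 2)
        ratio *= p1 / p2
    return ratio


# Toy training counts: feature "a" is typical of class 1, "b" of class 2.
counts_1, n_1 = {"a": 9, "b": 1}, 10
counts_2, n_2 = {"a": 1, "b": 9}, 10
print(class_ratio(["a"], counts_1, n_1, counts_2, n_2))  # above 1: class 1
print(class_ratio(["b"], counts_1, n_1, counts_2, n_2))  # below 1: class 2
```

For long feature lists, summing logarithms of the individual ratios instead of multiplying them avoids numerical underflow.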

9
Application: lead discovery
  • Database: MDL Drug Data Report (MDDR)
  • 957 ligands selected from MDDR
  • 49 5HT3 receptor antagonists, 40 Angiotensin
    Converting Enzyme inhibitors (ACE), 111
    HMG-CoA reductase inhibitors (HMG), 134 PAF
    antagonists and 49 Thromboxane A2 antagonists
    (TXA2) (Briem and Lessel, Perspect Drug Discov
    Des 2000, 20, 245-264)
  • A) Hit rate among ten nearest neighbours for each
    molecule
  • B) 20-fold Cross Validation, 5 Molecules for
    query generation
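
Evaluation A) can be sketched as follows. The slide does not name the similarity measure, so the Tanimoto coefficient on feature sets is used here purely as an assumption, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighbour_hit_rate(fingerprints, labels, query, k=10):
    """Fraction of the k most similar molecules (query excluded)
    that share the query molecule's class label."""
    others = [i for i in range(len(fingerprints)) if i != query]
    others.sort(
        key=lambda i: tanimoto(fingerprints[query], fingerprints[i]),
        reverse=True,
    )
    top = others[:k]
    if not top:
        return 0.0
    return sum(labels[i] == labels[query] for i in top) / len(top)


# Toy library: two molecules of class "A", two of class "B".
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {7, 9}]
labels = ["A", "A", "B", "B"]
print(neighbour_hit_rate(fingerprints, labels, query=0, k=1))
```

Averaging this value over every molecule in an activity class gives a per-class hit rate.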

10
Comparison
Using single molecule query
  • Briem and Lessel, Perspectives in Drug Discovery
    and Design 2000, 20, 245-264.

11
Combining Information in Molecules
  • In this method, we can extend the approach by
    extracting from a set of molecules those features
    having the best information gain
  • This can describe patterns in molecules much
    better than individual cases
  • The following example shows cross-validated
    database searches using combinations of features
    from five molecules at a time
  • Inactives were split 50/50 between training and
    test sets; no molecule is in the training and
    the test set at the same time

12
MDDR ACE cumulative recall plot
[Figure: cumulative recall curve compared with optimal selection and random selection]
We found about 80% of the active molecules among
the first 10% of the library
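
A cumulative recall curve of this kind can be computed from a ranked library as sketched below. The function names and the toy ranking are assumptions for illustration and do not reproduce the slide's data.

```python
def cumulative_recall(ranked_is_active):
    """Fraction of all actives recovered after each position in a
    ranked list (1 = active, 0 = inactive), best-scored first."""
    total = sum(ranked_is_active)
    if total == 0:
        raise ValueError("ranking contains no active molecules")
    found = 0
    curve = []
    for flag in ranked_is_active:
        found += flag
        curve.append(found / total)
    return curve


def recall_at_fraction(ranked_is_active, fraction):
    """Recall after screening the top `fraction` of the library."""
    n_top = max(1, int(fraction * len(ranked_is_active)))
    return cumulative_recall(ranked_is_active)[n_top - 1]


# Toy ranking of ten molecules, three of them active.
ranked = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(recall_at_fraction(ranked, 0.5))
```

An optimal ranking places all actives first, so its curve reaches 1.0 as early as possible; a random ranking follows the diagonal on average.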
13
Using Multiple Query Molecules
14
Transformation to 3D
  • Idea: to develop an analogous translationally and
    rotationally invariant (TRI) descriptor based on
    surface points
  • Advantage: switching from element atom types to
    interaction energies gives a more general model
    -> scaffold hopping?
  • Two parts: interaction fingerprint and shape
    description. Here, results using only interaction
    fingerprints are shown; shape description is
    under development
  • Again, information-gain feature selection and the
    Naïve Bayesian classifier are used for
    classification

15
3D Environment around a surface point (solvent
accessible surface)
  • Central point (Layer 0)
  • Points in Layer 1
  • Points in Layer 2
  • Etc.
16
Algorithm
  • Interaction energies at surface points, one probe
    at a time
  • Binning scheme (cutoffs): -1.0, -0.45, -0.4, -0.3,
    -0.1, 0.0, 0.2
  • Example energy: -0.35
  • Surface point environment:
    00010000 01100010 - 011101100
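
One way to read the binning scheme is sketched below: the seven cutoffs from the slide define eight bins, and each layer contributes an 8-bit block in which a bit is set if any surface point in that layer has an energy falling in that bin. This interpretation, the function names, and the Layer 1 energies are assumptions; only the cutoffs and the example value -0.35 come from the slide.

```python
from bisect import bisect_right

# Cutoffs as listed on the slide; they split the energy axis into 8 bins.
CUTOFFS = [-1.0, -0.45, -0.4, -0.3, -0.1, 0.0, 0.2]


def layer_bits(energies, cutoffs=CUTOFFS):
    """Presence bits for one layer: bit k is set if any energy in the
    layer falls into bin k (assumed interpretation)."""
    bits = [0] * (len(cutoffs) + 1)
    for e in energies:
        bits[bisect_right(cutoffs, e)] = 1
    return "".join(str(b) for b in bits)


def surface_point_environment(layers, cutoffs=CUTOFFS):
    """Concatenate the per-layer bit blocks, Layer 0 first."""
    return " ".join(layer_bits(layer, cutoffs) for layer in layers)


# Layer 0 holds only the central point with the slide's example energy.
# The Layer 1 energies are hypothetical values chosen for illustration.
layers = [[-0.35], [-0.5, -0.42, 0.1]]
print(surface_point_environment(layers))
```

Under this reading, the example energy -0.35 lands in the fourth bin, which matches the first block shown on the slide.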
17
Relation to other algorithms
  • Surface autocorrelation: averaging of interaction
    energies. Here, a favourable and an unfavourable
    interaction in a given layer will both remain in
    the fingerprint
  • GRIND: continuous variables from GRID; the entire
    field of interaction energies is simplified; only
    the maximum product enters the descriptor
  • MaP: categorical variables, counts are kept;
    size description
  • (In addition, the feature selection and scoring
    are handled differently)

18
Algorithm Flow
19
Standard Parameters
  • MSMS: probe radius 1.5 Å, density 0.5 points/Å²,
    doubled van der Waals radii for atoms, giving
    effectively a solvent accessible surface
  • GRID: DRY, C3, N1, N2, O, O- probes, otherwise
    standard parameters
  • Binning: variable number of layers, 8 bits;
    cutoffs were set such that equal frequencies are
    observed
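
Equal-frequency cutoffs for 8 bins can be obtained from quantiles of the observed energies, as in this sketch. The function name and the evenly spaced sample energies are assumptions for illustration; the slides do not describe how the cutoffs were actually derived.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_frequency_cutoffs(energies, n_bins=8):
    """Return n_bins - 1 cut points that split the observed energies
    into bins holding (approximately) equal numbers of values."""
    return quantiles(energies, n=n_bins)


# Hypothetical sample of 80 interaction energies from -1.0 to 0.975.
sample = [(i - 40) / 40 for i in range(80)]
cutoffs = equal_frequency_cutoffs(sample)

# Count how many sample energies fall into each of the 8 bins.
counts = [0] * (len(cutoffs) + 1)
for e in sample:
    counts[bisect_right(cutoffs, e)] += 1
print(cutoffs)
print(counts)
```

With real, unevenly distributed energies the cut points crowd together where values are dense, which would be consistent with a non-uniform binning scheme.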

20
[Figure: curves for different numbers of layers. Legend: L0-4, L0-5, L0-3, L0-2, L0-1, Layer0, Random]

21
Enrichment Curves: Briem data set (4 layers,
standard settings)
22
Comparison
23
ACE Binding Site
  • Snake venom peptide analog with putative binding
    motif to angiotensin used in early compound
    design (Cushman et al., Biochemistry (1977), 16,
    5484-5491.)

24
ACE - Query
25
Hits using 2D descriptors: hits 1 to 5
26
Hits using 2D descriptors: hits 6 to 10
27
ACE Selected Features
28
Hits using 3D descriptors (10 Hits among top 20,
enrichment 20)
29
New scaffolds
30
Jacobsson data set
  • 110 ERα toxins (ERαt)
  • 36 ERα mimics (ERαm)
  • 60 Matrix Metalloprotease 3 (MMP3)
  • 129 Factor Xa (fXa)
  • 54 Acetylcholine esterase (AChE)
  • 999 diverse compounds from MDDR
  • 2/3 for training, 1/3 for testing
  • Performance measure: classification
  • Actives and Inactives also used in other methods
  • Jacobsson et al, J Med Chem 2003, 46, 5781-5789

31
Jacobsson et al., Methods
  • Docking with seven scoring functions (two
    implemented in ICM, five in Tripos CScore) used
    (GOLD, ICM, Glide similar)
  • Fusion by classical consensus scoring (CScore),
    Partial Least Squares Discriminant Analysis
    (PLS-DA), Bayesian classification and rule-based
    methods
  • With the exception of ERα, large number of close
    analogues

32
Results
  • With the exception of MMP3, superior performance
    to other methods (better precision and accuracy
    at the same recall)
  • AChE: much better than other methods (docking is
    difficult: large pocket, water, multiple binding
    sites)
  • Given that docking takes much time, at least in
    some cases (AChE) it seems not to be the method
    of choice

33
Factor Xa
(Accuracy is the overall fraction of correct
predictions; precision is the fraction of positive
predictions that are correct)
34
AChE
  • (Accuracy is the overall fraction of correct
    predictions; precision is the fraction of
    positive predictions that are correct)
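
The definitions above, together with recall, can be written out directly from the counts of true and false positives and negatives. The function name and the toy predictions are assumptions for illustration.

```python
def classification_metrics(predicted, actual):
    """Accuracy, precision and recall for binary labels (1 = active)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy example: five molecules, three predicted active.
predicted = [1, 1, 0, 0, 1]
actual = [1, 0, 0, 1, 1]
accuracy, precision, recall = classification_metrics(predicted, actual)
print(accuracy, precision, recall)
```

Because inactive compounds far outnumber actives in sets like these, accuracy alone can look high even when few actives are found, which is why precision is reported at a fixed recall.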

35
Summary
  • 2D method: performs about as well as other 2D
    methods for single molecule searches, outperforms
    them by a large margin when combining molecules
    (published in J. Chem. Inf. Comput. Sci. (2004)
    44, 170-178)
  • 3D method: combines high enrichment factors with
    scaffold hopping (discovery of new chemotypes)
  • Performance is (at least in part) due to the
    Bayesian classifier, which is able to take
    multiple structures and active and inactive
    information into account

36
Acknowledgements
  • Robert C Glen (Unilever Centre, Cambridge, UK)
  • Hamse Y Mussa (Unilever Centre, Cambridge, UK)
  • Stephan Reiling (Aventis, Bridgewater, USA)
  • David Patterson (Tripos)
  • Funding
  • The Gates Cambridge Trust, Unilever, Tripos
  • Chemical Computing Group / ACS Comp Division