In silico screening in modern drug discovery research - PowerPoint PPT Presentation


1
In silico screening in modern drug discovery
research

Presented by Olga Komina
Department of Computer Science and Engineering
University of Nebraska-Lincoln
July 2004

2
Modern Drug Discovery

Multidisciplinary area of research
Combinatorial chemistry
Chemoinformatics
Molecular biology
Biochemistry
Medicine
Macromolecular modeling
Pharmacology
Drug discovery is the common goal; methods and
approaches from these different scientific areas
can be applied to achieve it.

3
Drug Discovery Pipeline

Target identification and validation
Assay development
Virtual screening (VS)
High throughput screening (HTS)
Quantitative structure-activity relationship
(QSAR) analysis and refinement of compounds
Characterization of prospective drugs
Testing on animals for activity and side effects
Clinical trials
FDA approval

4
Computer-aided Drug Design Strategy
5
Mechanism of Drug Action
6
Virtual Screening (VS)

In silico screening of large compound databases
in order to reduce the scale of high-throughput
screening.
Conceptual diversity
Small molecule screening
Protein structure based screening
Algorithmic diversity
Similarity searching
Clustering and partitioning
Simple filters
Artificial intelligence
Integration of different computational approaches
Similarity paradox

7
Similarity Paradox
8
Descriptors of Molecular Structure Properties

1D descriptors encode chemical composition and
physicochemical properties:
MW, molecular formula $C_mO_nH_k$, hydrophobicity
2D descriptors encode chemical topology:
connectivity indices, degree of branching, degree
of flexibility, number of aromatic bonds
3D descriptors encode 3D shape, volume,
functionality, surface area
Pharmacophore: the spatial arrangement of
chemical groups that determines a molecule's activity
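As a small illustration of a 1D descriptor, molecular weight follows directly from the composition $C_mO_nH_k$. The Python sketch below assumes standard average atomic masses for a handful of elements. The dictionary and the function name are ours, not from the presentation.

```python
# Standard average atomic masses (g/mol) for a few elements; illustrative subset only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula_counts: dict[str, int]) -> float:
    """Sum atomic masses over a composition such as {"C": m, "O": n, "H": k}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula_counts.items())


# Ethanol, C2H6O
print(round(molecular_weight({"C": 2, "H": 6, "O": 1}), 3))  # 46.069
```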

9
Connectivity Indices

Connectivity of an atom: the number of atoms
connected to it
Connectivity of a bond: the reciprocal of the
square root of the product of the connectivities
of its two atoms
Connectivity index of a molecule: the sum of
all bond connectivities

Isobutyl alcohol
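These definitions describe what is usually called the Randić connectivity index, $\chi = \sum_{\text{bonds } (i,j)} (\delta_i \delta_j)^{-1/2}$, where $\delta_i$ is the connectivity of atom $i$. The Python sketch below computes it on a hydrogen-suppressed graph, counting heavy atoms only. That choice is our assumption, because the slide does not say whether hydrogens are included. The atom numbering for isobutyl alcohol is also ours.

```python
import math


def randic_index(bonds: list[tuple[int, int]]) -> float:
    """Connectivity index of a hydrogen-suppressed molecular graph given as a bond list."""
    # Atom connectivity: number of (heavy) atoms bonded to each atom.
    degree: dict[int, int] = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Bond connectivity is 1 / sqrt(delta_a * delta_b); the molecular index is their sum.
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Isobutyl alcohol, (CH3)2CH-CH2-OH: atoms 0, 1 = CH3; 2 = CH; 3 = CH2; 4 = O
isobutyl_alcohol = [(0, 2), (1, 2), (2, 3), (3, 4)]
print(round(randic_index(isobutyl_alcohol), 3))  # 2.27
```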
10
Classification of Atoms to Atom Types

Developed for prediction of log P values
A molecule is characterized by the counts of 120 atom
types
Atom type: a commonly occurring atomic state of
C, H, O, N, S, P, Se, or a
halogen (F, Cl, Br, I)
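The count-of-atom-types representation turns each molecule into a fixed-length integer vector. The Python sketch below only illustrates that idea. Its typing rule, element plus number of heavy-atom neighbours, is a deliberately crude stand-in we made up and is not the 120-type scheme on this slide. The names `atom_type_counts`, `to_vector`, and the vocabulary are hypothetical.

```python
from collections import Counter


def atom_type_counts(elements: list[str], bonds: list[tuple[int, int]]) -> Counter:
    """Count atoms per (hypothetical) type 'element:heavy-neighbour-count'.

    The real scheme also distinguishes hybridization, attached heteroatoms,
    aromaticity, and so on; this simplified typing only shows the idea.
    """
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return Counter(f"{el}:{degree[i]}" for i, el in enumerate(elements))


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Fixed-length descriptor vector: one count per atom type in the vocabulary."""
    return [counts.get(t, 0) for t in vocabulary]


elements = ["C", "C", "C", "C", "O"]        # isobutyl alcohol heavy atoms
bonds = [(0, 2), (1, 2), (2, 3), (3, 4)]
vocab = ["C:1", "C:2", "C:3", "C:4", "O:1", "O:2", "N:1"]
print(to_vector(atom_type_counts(elements, bonds), vocab))  # [2, 1, 1, 0, 1, 0, 0]
```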

11
Atom Types (Carbon)
12
Example Description of a Molecule by the Count
of Atom Types
13
Molecular Fingerprints

Molecule A 00011101010
Molecule B 00101111000
Tanimoto coefficient:
$T_c = \dfrac{N_c}{N_A + N_B - N_c} = \dfrac{3}{5 + 5 - 3} = \dfrac{3}{7} \approx 0.43$
where $N_c$ is the number of common bits set, $N_A$ the
number of bits set in A, and $N_B$ the number of bits
set in B.
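The same calculation can be done in Python. For readability the sketch treats fingerprints as strings of '0'/'1' characters, although real fingerprints are usually packed bit vectors. Returning 0.0 for two empty fingerprints is our own convention.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient Nc / (NA + NB - Nc) of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    n_a = a.count("1")                                       # bits set in A
    n_b = b.count("1")                                       # bits set in B
    n_c = sum(1 for x, y in zip(a, b) if x == y == "1")      # bits set in both
    union = n_a + n_b - n_c
    return n_c / union if union else 0.0                     # convention for empty inputs


print(tanimoto("00011101010", "00101111000"))  # 0.42857142857142855 (= 3/7)
```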
14
Drugs vs. Non-drugs

Enriching screening libraries with drug-like
compounds
"Fail fast, fail cheap" strategy
Manual classification is time-consuming and biased
Computational approaches speed up screening,
reduce the size and improve the quality of
combinatorial libraries
Assumption: typical drugs have something in
common that other compounds lack

15
Lipinski Rule of Five (1997)

Poor absorption and permeation are more likely to
occur when there are more than 5 hydrogen-bond
donors, more than 10 hydrogen-bond acceptors, the
molecular mass is greater than 500, or the log P
value is greater than 5.
Further research studied a broader range of
physicochemical and structural properties
Related problems
Compound toxicity
Compound mutagenicity
Blood-brain barrier penetration
Central nervous system activity
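The rule translates directly into a filter. This Python sketch assumes the property values are already computed, since counting donors and acceptors or calculating log P is outside its scope. The compounds are invented. The sketch reports every criterion a compound exceeds and does not decide how many violations should reject it, because the slide does not fix that threshold.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float
    log_p: float


def rule_of_five_violations(c: Compound) -> list[str]:
    """Return the Rule-of-Five criteria that the compound exceeds."""
    violations = []
    if c.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if c.h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    if c.molecular_mass > 500:
        violations.append("molecular mass greater than 500")
    if c.log_p > 5:
        violations.append("log P greater than 5")
    return violations


# Hypothetical compounds with made-up property values.
print(rule_of_five_violations(Compound("cmpd-1", 2, 5, 320.4, 2.1)))  # []
print(rule_of_five_violations(Compound("cmpd-2", 6, 12, 552.0, 3.0)))
```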

16
Data Sets

Drug Databases
World Drug Index (WDI)
Comprehensive Medicinal Chemistry (CMC)
MACCS-II Drug Data Report (MDDR)
Non-drug Databases
Available Chemicals Directory (ACD)
Quality of training sets

17
Artificial Neural Networks

ANNs are self-learning systems that learn from
experience
Biologically inspired
The neuron is the processing element
Artificial neuron simulates four basic functions
of a natural neuron
Receives input from other sources
Combines those inputs in some way
Performs nonlinear operations on the result
Outputs the final result
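The four functions listed above map onto a few lines of Python. This sketch combines the inputs as a weighted sum with a bias term and uses a logistic sigmoid as the nonlinearity. Both are common choices, but the slide does not specify either one.

```python
import math


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # 1) receive the inputs, 2) combine them as a weighted sum plus bias,
    # 3) apply a nonlinear (sigmoid) function, 4) output the result
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


print(neuron([1.0, 0.0, 2.0], [0.5, -1.0, 0.25], bias=-1.0))  # 0.5
```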

18
Artificial Neuron
19
Network Topology
20
ANN Training

Supervised: both inputs and desired outputs are provided
Initial weights are chosen randomly
Errors are propagated back through the network to
adjust the weights
Most common algorithm: backward error propagation
(back-propagation)
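The training loop described above can be sketched in plain Python as online back-propagation for a single hidden layer with sigmoid units and squared error. The network size, learning rate, epoch count, and the toy AND data set are arbitrary choices for illustration. The drug-classification networks on the following slides used far more inputs.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Online back-propagation for a one-hidden-layer network (squared error)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initial weights are chosen randomly; the last entry of each row is a bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w_h]
        y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + w_o[-1])
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    history = [mean_squared_error()]
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Error signal at the sigmoid output (factor 2 folded into lr).
            delta_o = (y - t) * y * (1 - y)
            # Propagate it back to the hidden units, using the not-yet-updated weights.
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates: output layer, then hidden layer.
            for j in range(n_hidden):
                w_o[j] -= lr * delta_o * h[j]
            w_o[-1] -= lr * delta_o
            for j, row in enumerate(w_h):
                for i in range(n_in):
                    row[i] -= lr * delta_h[j] * x[i]
                row[-1] -= lr * delta_h[j]
        history.append(mean_squared_error())
    return (lambda x: forward(x)[1]), history


data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
predict, history = train(data)
print(f"MSE {history[0]:.3f} -> {history[-1]:.3f}")
```

The closure `forward` reads the weight lists that the loop mutates in place, so the returned predictor always uses the trained weights.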

21
ANNs for Drug Classification (1998)

Input: counts of atom types
Topology: 92 x 5 x 1
Feedforward network with backpropagation
Training: 5000 ACD and 5000 WDI compounds
Accuracy: 83% (ACD), 77% (WDI)

22
ANNs for Drug Classification (1998)

Input: seven 1D descriptors (including MW, log P,
aromatic density) and ISIS fingerprints
Topology: 173 x 0/5/10 x 1 (0, 5, or 10 hidden units)
Bayesian learning procedure
Training: 3500 ACD and 3500 CMC compounds
Accuracy: 90% (CMC), 80% (MDDR), 90% (ACD)

23
Misclassification Examples
Misclassified non-drug
Misclassified drug
24
ANNs to Predict Biological Activity

Applications
CNS-active compounds
Protein kinase inhibitors
G protein-coupled receptor ligands
Best prediction accuracy: 80%
Advantage: capable of predicting structurally
diverse compounds
Disadvantage: yields no explicit, interpretable rules

25
Recursive Partitioning

Statistical method for analyzing and mining large
data sets that consist of active and inactive
molecules
HTS data are analyzed to discover SAR
Easy to visualize and interpret
Applicable to a variety of classification problems:
assigning chemical compounds to
property classes based on their structural and
physicochemical features

26
Partitioning Problem Definition

Given a training set with D descriptor values and P
property values for each molecule, the task is to
build a set of yes/no questions organized into a
hierarchical tree, with one question per node and
class predictions at the leaf nodes, that gives the
minimum classification error.

27
Single Property RP

Single-property classification, e.g., molecules
classified as active or inactive
Drugs vs. non-drugs
Algorithms: C4.5, C5.0

28
Single Property RP (cont.)

All possible questions based on single descriptor
values are asked, and the scores of the
corresponding partitions are computed
The question on the descriptor with the best score
is used to grow the tree
Question asking is repeated on each new node until
a termination condition is met
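The greedy procedure above can be sketched in a few lines of Python. This is a CART-style sketch. It scores questions of the form "is descriptor d <= threshold?" with the Gini impurity defined on the next slide. It is not C4.5/C5.0, which use an information-gain criterion. The stopping rule is our assumption: a node stops splitting when it is pure, has fewer than `min_size` molecules, or has no question that lowers impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_question(X, y):
    """Ask every question 'x[d] <= t?' and return (delta_I, d, t) for the best one."""
    parent, n, best = gini(y), len(y), None
    for d in range(len(X[0])):
        for t in sorted({row[d] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[d] <= t]
            right = [lab for row, lab in zip(X, y) if row[d] > t]
            if not left or not right:
                continue
            delta = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or delta > best[0]:
                best = (delta, d, t)
    return best


def grow(X, y, min_size=2):
    """Grow the tree recursively; leaves hold the majority class."""
    question = best_question(X, y) if len(y) >= min_size and gini(y) > 0 else None
    if question is None or question[0] <= 0:
        return Counter(y).most_common(1)[0][0]
    _, d, t = question
    yes = [i for i, row in enumerate(X) if row[d] <= t]
    no = [i for i, row in enumerate(X) if row[d] > t]
    return {
        "question": (d, t),
        "yes": grow([X[i] for i in yes], [y[i] for i in yes], min_size),
        "no": grow([X[i] for i in no], [y[i] for i in no], min_size),
    }


def predict(tree, x):
    while isinstance(tree, dict):
        d, t = tree["question"]
        tree = tree["yes"] if x[d] <= t else tree["no"]
    return tree


# Toy data (made up): two descriptors per molecule.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
y = ["inactive", "inactive", "active", "active"]
tree = grow(X, y)
print(tree["question"])  # (0, 2.0)
```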

29
Gini Impurity Metric

Impurity, $I$, of a node:
$I = \sum_{i \neq j} p_i p_j$
where $p_i$ and $p_j$ are the fractions of the members
of a node that belong to class values $i$ and $j$,
respectively
The Gini metric selects the question that maximizes the decrease in impurity,
$\Delta I$, from a potential node question:
$\Delta I = I - p_L I_L - p_R I_R$
where $p_L$ and $p_R$ are the fractions of the node
members that partition to nodes L and R,
respectively, for a given question, and $I_L$ and $I_R$
are the impurities of the new nodes
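Both formulas can be checked numerically. The Python sketch below evaluates $I$ literally as the sum over ordered pairs $i \neq j$, which equals $1 - \sum_i p_i^2$. It then computes $\Delta I$ for a made-up node of 10 molecules, 5 active and 5 inactive, split 4/1 and 1/4. The counts are illustrative only.

```python
def gini_impurity(counts: list[int]) -> float:
    """I = sum over ordered pairs i != j of p_i * p_j (class counts in, fractions used)."""
    n = sum(counts)
    p = [c / n for c in counts]
    return sum(p[i] * p[j] for i in range(len(p)) for j in range(len(p)) if i != j)


def impurity_decrease(parent: list[int], left: list[int], right: list[int]) -> float:
    """Delta I = I - p_L * I_L - p_R * I_R for one candidate question."""
    n = sum(parent)
    p_l, p_r = sum(left) / n, sum(right) / n
    return gini_impurity(parent) - p_l * gini_impurity(left) - p_r * gini_impurity(right)


# Node with 5 active / 5 inactive, split into (4 active, 1 inactive) and (1, 4).
print(round(impurity_decrease([5, 5], [4, 1], [1, 4]), 2))  # 0.18
```

Here $I = 0.5$ for the parent and $I_L = I_R = 0.32$, so $\Delta I = 0.5 - 0.5 \cdot 0.32 - 0.5 \cdot 0.32 = 0.18$.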

30
Tree Growth
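[Figure: the entire training set forms the root; a yes/no question on Descriptor 3 splits it into Node L and Node R]
Pruning phase metric: $R_\alpha = R_0 + \alpha N_{leaf}$, where $R_0$ is the
number of misclassifications in the training set, $N_{leaf}$ is the number of leaf nodes, and $\alpha$ is the penalty per leaf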
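The pruning metric trades training error against tree size through $R_\alpha = R_0 + \alpha N_{leaf}$. The minimal Python sketch below assumes the grown tree has already been pruned back into a sequence of candidate subtrees, and it keeps the candidate with the lowest $R_\alpha$. The candidate numbers are invented for illustration.

```python
def cost_complexity(r0: int, n_leaf: int, alpha: float) -> float:
    """R_alpha = R_0 + alpha * N_leaf."""
    return r0 + alpha * n_leaf


def select_subtree(candidates, alpha):
    """Pick the candidate (name, R_0, N_leaf) with the smallest R_alpha."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[2], alpha))[0]


# Hypothetical pruning sequence of one grown tree: (name, R_0, N_leaf).
candidates = [("full tree", 0, 12), ("pruned once", 3, 6), ("pruned twice", 9, 2)]
for alpha in (0.0, 1.0, 3.0):
    print(alpha, select_subtree(candidates, alpha))
```

Larger $\alpha$ penalizes leaves more heavily, so the selected tree shrinks as $\alpha$ grows.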
31
Application of Single Property RP for
Drug/Non-drug Classification

Input: 120 atom types
Algorithm: C5.0
Training: 5000 WDI and 5000 ACD compounds
Prediction error: 21%
The presence of alcohols, tertiary and secondary
amines, phenols, enols, and carboxylic groups
accounts for 75% of correct classifications of
drugs.

32
Decision Tree for Drug/Non-drug Classification
33
Multiple Property RP

Single-property (SP) prediction is not sufficient for many biological systems
ADMET properties
Absorption
Distribution
Metabolism
Excretion
Toxicity
Nonspecific binding to multiple targets causes
side effects
Dependent properties

34
Partially Unified Multiple Property RP

Developed for prediction of multiple dependent
properties
Discovers features that distinguish the classes
of different properties and features that make them similar
Some nodes apply to all properties, while others
apply to only a single property
Classes are NOT mutually exclusive
Nodes are labeled with one class of a single
property type

35
Mapping to SP Representation

D descriptor values: $x_1, x_2, x_3, \ldots, x_D$
P property values: $y_1, y_2, y_3, \ldots, y_P$
New descriptor K is a property descriptor
Mapping (three descriptors, three properties):
$(x_1, x_2, x_3;\ y_1, y_2, y_3) \rightarrow (x_1, x_2, x_3, 1;\ y_1),\ (x_1, x_2, x_3, 2;\ y_2),\ (x_1, x_2, x_3, 3;\ y_3)$
Every path from the root to a leaf has a split on
the descriptor K
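The mapping is mechanical. Each molecule with P property values becomes P single-property records, and each record carries its property index K as an extra descriptor. A Python sketch with hypothetical function names and made-up values:

```python
def to_single_property(descriptors, properties):
    """Expand one molecule (x_1..x_D; y_1..y_P) into P single-property records.

    Each record gets the property index K = 1..P appended as an extra descriptor.
    """
    return [((*descriptors, k), y_k) for k, y_k in enumerate(properties, start=1)]


def expand_dataset(molecules):
    """molecules: list of (descriptor tuple, property tuple) pairs."""
    return [rec for x, ys in molecules for rec in to_single_property(x, ys)]


for record in to_single_property((0.4, 1.2, 3.0), ("active", "inactive", "active")):
    print(record)
# ((0.4, 1.2, 3.0, 1), 'active')
# ((0.4, 1.2, 3.0, 2), 'inactive')
# ((0.4, 1.2, 3.0, 3), 'active')
```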

36
1. Pure specific tree
2. Generic node growth: $\max \left( \min_k \Delta I_k \right) > 0$
37
PUMP-RP (cont.)

A split with an improvement for each property is
chosen
The metric selects the question that maximizes the
minimum decrease in impurity over all properties
A compound may appear in more than one leaf node
Each K node is regrown recursively
The resulting tree is overgeneralized
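Generic node growth keeps a question only if it improves every property. The Python sketch of that selection rule below assumes the per-property impurity decreases $\Delta I_k$ of each candidate question have already been computed, for instance with the Gini $\Delta I$ of a single-property split. The questions and numbers are invented.

```python
def generic_question(delta_by_question):
    """Max-min selection of a generic node question.

    delta_by_question maps a candidate question to its impurity decreases
    [delta_I_1, ..., delta_I_P], one per property. Returns the question that
    maximizes the minimum decrease, or None if that minimum is not > 0.
    """
    question, deltas = max(delta_by_question.items(), key=lambda item: min(item[1]))
    return question if min(deltas) > 0 else None


# Hypothetical candidate questions for two properties (e.g. COX-2 and COX-1 activity).
candidates = {
    "MW <= 350": [0.20, 0.01],
    "logP <= 3": [0.08, 0.06],
    "HBD <= 2": [0.15, -0.02],
}
print(generic_question(candidates))  # logP <= 3
```

The question with the single largest improvement ("MW <= 350") loses here because it barely helps the second property.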

38
Finding the Best Tree

$R_{\alpha\beta} = R_0 + \alpha (N_{leaf} - \beta N_{generic})$
where $\beta$ is a generality parameter and
$N_{generic}$ is the number of generic nodes
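The generality term rewards trees that contain generic nodes. The Python sketch below evaluates $R_{\alpha\beta} = R_0 + \alpha (N_{leaf} - \beta N_{generic})$ for invented candidate trees and parameter values, then keeps the lowest-cost tree. With $\beta = 0$ it reduces to ordinary cost-complexity selection.

```python
def generality_cost(r0, n_leaf, n_generic, alpha, beta):
    """R_alpha_beta = R_0 + alpha * (N_leaf - beta * N_generic)."""
    return r0 + alpha * (n_leaf - beta * n_generic)


def best_tree(trees, alpha, beta):
    """trees: (name, R_0, N_leaf, N_generic) tuples; return the lowest-cost name."""
    return min(trees, key=lambda t: generality_cost(t[1], t[2], t[3], alpha, beta))[0]


# Hypothetical candidate trees: (name, R_0, N_leaf, N_generic).
trees = [("specific", 4, 10, 0), ("more generic", 5, 10, 6)]
print(best_tree(trees, alpha=1.0, beta=0.0))  # specific
print(best_tree(trees, alpha=1.0, beta=0.5))  # more generic
```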

39
Application of PUMP-RP for Drug Specificity

Cyclooxygenase (COX) inhibitors
COX-2 inhibitors are anti-inflammatory agents
COX-1 inhibitors damage the gastrointestinal tract
A good drug should be highly selective for COX-2
Celebrex and Vioxx are widely prescribed
Goal: obtain a model of the activity and
selectivity of COX-2 inhibitors as a function of
their physicochemical properties

40
Data and Results

100 2D and 3D descriptors
Each property has two classes: active and
inactive
Gini impurity score
Accuracy
on the training set: 60-80% (COX-2), 78-91% (COX-1)
on the test set: 50-89% (COX-2), 60-100% (COX-1)
Disadvantage: not capable of predicting compounds
with molecular scaffolds not yet discovered

41
Extension to PUMP-RP

To model systems with more than two properties:
a semi-generic node applies to more than one
property, but not all
To model multiple properties with the opportunity to
observe which properties are more closely related
than others
Possible applications:
ADMET properties
Activity/ADMET properties
COX-2/COX-1/Drugs
Drug-drug interactions based on target
specificity
Modified Gini Impurity score

42
Gini Impurity for the Extended PUMP-RP

Modified scoring function
$\max \left( \max \left( \min_k \Delta I_k \right) \right)$, where $k \in P$
43
Tree Built by the Extended PUMP-RP
44
Targeting RNA

Emerging field in drug discovery
RNA plays an essential role in many biological
processes
Some natural antibiotics are RNA-targeting drugs
(e.g., streptomycin, tetracycline)
Potential drug targets: viral RNAs
Antisense strategy

45
Targeting RNA

HTS against RNA targets has been less successful than
against protein targets
Identification of new classes of RNA ligands is
extremely rare
Limited knowledge of the chemistry and structure
of RNA recognition
RNA consists of only 4 nucleotides, so it is less diverse than
proteins; RNA is also flexible

46
What Can Be Done?

Assumption: compounds binding RNA have something
in common that other compounds lack
Dataset: a comprehensive database containing
examples of binding between small molecules and
RNAs
Computational approaches to extract common
features of such compounds and to train models
for prediction (AI methods)

47
Concluding Remarks

Drug discovery is the goal of multidisciplinary
research
There is no single algorithm for discovering a drug
Old problem: given a compound structure, what are
its properties?
Computational approaches can assist the drug
discovery process
Limitation: lack of systematic biological data
Market pressure and prospective profit are bringing more
and more resources into drug discovery

48
Multilevel Neighborhoods of Atoms
[Figure: multilevel atom neighborhoods, illustrated on phenol]
