Novel Methods for the Prediction of logP, pKa and logD

1
Novel Methods for the Prediction of logP, pKa and
logD
  • Li Xing and Robert C. Glen
  • J. Chem. Inf. Comput. Sci. 2002, 42, 796-805

2
Outline
  • Introduction
  • logP prediction
  • pKa prediction
  • logD calculation
  • Conclusions
  • Discussion

3
Introduction
4
Introduction: The Motivation
  • Important for rational drug design
  • Bioavailability of a drug
  • The drug's access to the therapeutic target
  • Factors determining the environment the compound is
    exposed to
  • Route of administration of the drug
  • Location of the target
  • Interest in quantification of
  • the drug's affinity for the target
  • the ability to partition into a lipophilic
    environment at different pH values

5
Introduction: The Motivation
  • Example of oral administration
  • Drug is exposed to a large variety
    of pH values
  • Saliva: pH 6.4
  • Stomach: pH 1.0-3.5
  • Duodenum: pH 5-7.5
  • Jejunum: pH 6.5-8
  • Colon: pH 5.5-6.8
  • Blood: pH 7.4
  • First-pass effect in the liver

www.3dscience.com
6
Introduction: The Motivation
  • Applications
  • ADME/Tox screening of virtual libraries
  • Lipinski's rule of five
  • Quantitative structure-activity relationships
    (QSAR)
  • Protein active site determination
  • Goal: proper prediction of the drug's ability to
    interact with the biological target

7
Introduction: The Descriptors
  • logP is the logarithm of the partition coefficient
    of a substance in the two-phase system
    1-octanol/water and is thus a measure of
    lipophilicity
  • The ionization constant pKa defines the strength
    of acidity
  • logD is a pH-dependent distribution coefficient
    and can be described using logP and pKa

www.fable.co.uk
www.llnl.gov/str/April02/Galli.html
8
logP Prediction
9
logP: The Model
  • Use of only three parameters
  • Polarizability α
  • Partial charges
  • for oxygen atoms, q_O
  • for nitrogen atoms, q_N
  • Description of molecular size and dispersion
    interactions empirically by molecular
    polarizability
  • Charge estimation using the Gasteiger-Hückel method

Created with BallView
10
logP: The Data Set
  • Measured logP values for 449 molecules under
    standard conditions from the CRC Handbook
  • 157 logP values from a collection of measured and
    predicted logP values
  • Stereochemistry is not accounted for: cis/trans
    isomers are represented by one of the two, with the
    average logP assigned to it
  • In total 592 non-redundant molecules

11
logP: The Model Fit
  • Sum of squared charges on nitrogen and oxygen
    atoms more effective than the corresponding
    absolute charges
  • The resulting equation by multiple linear
    regression
  • $\log P = 0.287(0.066) + 0.190(0.004)\,\alpha - 14.862(0.545)\sum q_N^2 - 8.445(0.216)\sum q_O^2$
  • 10-fold cross-validation gives an average q² = 0.877
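
A minimal sketch of how the regression equation on this slide could be evaluated for a single molecule. It assumes the polarizability term enters with a positive sign and the two squared-charge sums with negative signs, and that the polarizability and the Gasteiger-Hückel charges have already been computed elsewhere. The function name `predict_logp` and the example inputs are hypothetical and not taken from the paper.

```python
def predict_logp(polarizability, n_charges, o_charges):
    """Evaluate the three-parameter logP regression for one molecule.

    polarizability: molecular polarizability (alpha)
    n_charges:      partial charges of all nitrogen atoms
    o_charges:      partial charges of all oxygen atoms
    Signs of the terms are an assumption of this sketch.
    """
    sum_qn2 = sum(q * q for q in n_charges)  # sum of squared N charges
    sum_qo2 = sum(q * q for q in o_charges)  # sum of squared O charges
    return (0.287
            + 0.190 * polarizability
            - 14.862 * sum_qn2
            - 8.445 * sum_qo2)


# Hypothetical molecule: one oxygen atom, no nitrogen atoms.
example_logp = predict_logp(10.0, [], [-0.3])
print(round(example_logp, 3))
```

Squaring the charges makes the polar terms independent of the sign of the charge, so any charged nitrogen or oxygen atom lowers the predicted lipophilicity in this sketch.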

12
logP: The Model Fit
  • Quality of the fit
  • r² = 0.893,
  • r = 0.945,
  • s = 0.653,
  • F = 1631,
  • n = 592

13
logP: The Test Set
  • 40 molecules from the literature, drawn from five
    pharmacological classes
  • Quantitative evaluation of the diversity of the
    test set using the Tanimoto index
  • Mean Tanimoto index: 0.75
  • 70% below the value of 0.85 (close neighborship)
  • Experimental values from 0.42 to 5.99

14
logP: Predictive Power
  • logP values for 40 drug molecules predicted by
    the model and by CLogP vs. the experimental values

15
logP: Predictive Power
  • Summary of statistics of the logP prediction on
    40 drug molecules
  • When breaking down the test set into therapeutic
    subsets
  • Class I antiarrhythmics are modeled by far the
    worst
  • (SE 1 log unit)
  • β-blockers stand out as the best modeled class

16
pKa Prediction
17
pKa: The Model
  • Molecular fragments precomputed as a general
    metalayer
  • Nitro, cyano, carbonyl, hydroxyl, amino groups
  • Amino groups divided by hybridization state
    (sp2/sp3)
  • Explicit handling of atom types

18
pKa: The Model
  • Hypothesis of ionization state dependence upon
    immediate subenvironment
  • Hierarchical tree constructed from ionized atom
    outward up to five levels

19
pKa: The Model
  • At each tree level there are 33 atom/group type
    bins (24 atom types and nine group types)
  • Maximum number of bins over five levels:
  • 5 levels × 33 types/level + 24 ionizable types
    = 189
  • The model
  • pKc0: base dissociation constant
  • xi and yj: number of occurrences of the corresponding
    subclass
  • zi: class-specific indicator variables
  • ai, bj, qk: coefficients obtained by
    partial least squares using Sybyl, with
    cross-validation

20
pKa: The Data Set
  • Data set derived from Lange's Handbook of
    Chemistry after careful inspection for dubious
    entries
  • pKa values in the range from -6.94 to 12.48 for
    bases and from -2.8 to 14.22 for acids,
    respectively
  • In total 384 bases and 645 acids

21
pKa: The Model Fit
  • [Figures: model fit for bases and for acids, with
    summary statistics]

22
pKa: The Test Set
  • 25 molecules of different chemical classes
    (Perrin's examples)
  • Test set includes a few molecules containing
    multiple ionizable centers
  • pKa values range from 1.8 to 10.23

23
pKa: Predictive Power
  • [Figure: predicted values and Perrin's values vs.
    experimental values, with summary statistics]

24
logD Calculation
25
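logD: The Calculation
  • logD may simply be calculated from the predicted logP
    and pKa of the singly ionized species at a given
    pH
  • For acids
  • $\log D(\mathrm{pH}) = \log P - \log\left[1 + 10^{(\mathrm{pH} - \mathrm{p}K_a)}\right]$
  • For bases
  • $\log D(\mathrm{pH}) = \log P - \log\left[1 + 10^{(\mathrm{p}K_a - \mathrm{pH})}\right]$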
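
The two logD relations on this slide can be sketched as follows, using the standard form for a singly ionizing compound: logD equals logP minus the base-10 logarithm of one plus ten to the power of (pH - pKa) for acids, or (pKa - pH) for bases. The function names and the example compound (logP 3.0, pKa 4.5) are hypothetical; the pH values are taken from the oral-administration slide.

```python
import math


def logd_acid(logp, pka, ph):
    """logD of a monoprotic acid: ionization grows as pH rises above pKa."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_base(logp, pka, ph):
    """logD of a monoprotic base: ionization grows as pH falls below pKa."""
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))


# Hypothetical acid followed along part of the oral route.
for site, ph in [("saliva", 6.4), ("stomach", 1.0), ("blood", 7.4)]:
    print(site, round(logd_acid(3.0, 4.5, ph), 2))
```

When pH equals pKa, half of the compound is ionized and logD lies log10(2), about 0.3 log units, below logP. Far on the neutral side of the pKa, logD approaches logP.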

26
Conclusions
27
Conclusions
  • Fast and simple methods with significant
    development potential
  • Usage of natural descriptors provides insights
    into mechanisms of partitioning and acid/base
    properties
  • Only 86% (acids) and 80% (bases) of the training(!)
    samples are predicted to within one log unit
  • No account for cis/trans isomers
  • Models seem to be overfitted, since for all models
    the slope and intercept of the regression are 1.0
    and 0.0, respectively

28
Thank you for your attention!
29
Discussion
30
Gasteiger-Hückel Method
  • Description of atomic charges by partial charges
  • only uses information about atom type and bonds
  • partial equalisation of orbital electronegativity
  • Computes a grid for the molecule in question
  • For each grid point a partial charge is
    calculated
  • σ-electrons by Gasteiger
  • π-electrons by Hückel
  • From this the atomic charges for the whole
    molecule are estimated

31
Cross-Validation (CV)
  • In K-fold cross-validation (here 10-fold), the
    original sample is partitioned into K subsamples
  • Of the K subsamples, a single subsample is
    retained as the validation data to validate the
    model
  • The remaining K - 1 subsamples are used as
    training data
  • The CV process is then repeated K times, with each
    of the K subsamples used exactly once as the
    validation data
  • The K results from the folds can then be
    averaged to produce a single estimate.
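
The K-fold procedure described above can be sketched in plain Python. This is an illustration only: the helper names `k_fold_indices` and `cross_validate`, the shuffling seed, and the callback signatures for `fit` and `score` are assumptions of this sketch, not the procedure used in the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Partition the shuffled indices into K subsamples of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        # The remaining K - 1 subsamples form the training data.
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


def cross_validate(n_samples, fit, score, k=10):
    """Average the K per-fold scores into a single estimate.

    fit(training_indices) returns a model;
    score(model, validation_indices) returns a number.
    """
    results = [score(fit(train), valid)
               for train, valid in k_fold_indices(n_samples, k)]
    return sum(results) / len(results)
```

With the 592 molecules of the logP data set and K = 10, each fold in this sketch holds out 59 or 60 molecules and trains on the rest.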

32
Tanimoto Index
  • Also known as Jaccard similarity
  • Can be interpreted as the intersection of the
    properties of x and y divided by the union of
    their properties
  • If x and y are equal, then the Tanimoto index is
    1, if they are orthogonal, then it is 0
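
A minimal sketch of the Tanimoto index for two molecules represented as sets of properties, for example the positions of the bits set in a structural fingerprint. The function name and the convention of returning 1.0 for two empty sets are assumptions of this sketch.

```python
def tanimoto(x, y):
    """Tanimoto (Jaccard) index: |intersection| / |union| of two property sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumed convention: two empty property sets count as identical.
        return 1.0
    return len(x & y) / len(union)


# Two hypothetical fingerprints sharing two of four distinct properties.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Averaging this value over all pairs of molecules in a test set gives a diversity measure of the kind quoted for the 40-molecule logP test set.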

33
Partial-Least-Squares
  • Goal of ordinary least squares regression
  • Minimizing prediction error
  • Seeking linear functions of the predictors
    exhibiting as much variation as possible
  • Additional goal of Partial-Least-Squares
  • Accounting for variation in the predictors, under
    the assumption that directions with high variance
    provide better prediction for new observations
    when the predictors are highly correlated with
    the outcome

34
Partial-Least-Squares
  • Seeks directions that have high variance in
    predictors xj and high correlation with outcome y
  • The Algorithm
  • Compute univariate regression coefficients of y
    on each xj
  • Construct the first PLS direction z1 as the sum of
    the xj weighted by these coefficients
  • Regress y on z1, yielding its regression coefficient
  • Orthogonalize x1, ..., xp with respect to z1
  • Continue with Step 1 until M ≤ p directions have
    been obtained
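
The algorithm on this slide can be sketched in plain Python for a single outcome y. This sketch assumes the predictors are standardized first and uses the inner products of xj and y as weights, which for standardized predictors are proportional to the univariate regression coefficients. All function names are hypothetical, and this is not the Sybyl implementation used in the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize(col):
    """Center a predictor column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    scale = (dot(centered, centered) / n) ** 0.5
    return [v / scale for v in centered]


def pls1_fitted_values(columns, y, n_directions):
    """Fitted values of y after n_directions PLS steps.

    columns: list of non-constant predictor columns (lists of floats).
    """
    n = len(y)
    y_mean = sum(y) / n
    xs = [standardize(col) for col in columns]
    fitted = [y_mean] * n
    for _ in range(n_directions):
        # Step 1: weight of each predictor, phi_j = <x_j, y>.
        phis = [dot(x, y) for x in xs]
        # Step 2: PLS direction z = sum_j phi_j * x_j.
        z = [sum(phi * x[i] for phi, x in zip(phis, xs)) for i in range(n)]
        zz = dot(z, z)
        if zz < 1e-12:
            break  # predictors carry no remaining covariance with y
        # Step 3: regress y on z and update the fit.
        theta = dot(z, y) / zz
        fitted = [f + theta * zi for f, zi in zip(fitted, z)]
        # Step 4: orthogonalize every predictor with respect to z.
        xs = [[xi - (dot(z, x) / zz) * zi for xi, zi in zip(x, z)]
              for x in xs]
    return fitted
```

With M = p directions and linearly independent predictors, the fit coincides with ordinary least squares. Choosing M smaller than p, typically by cross-validation, is what regularizes the model.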