P. Gramatica1, M. Pavan1, F. Consolaro1, V. Consonni2 and R. Todeschini2 - PowerPoint PPT Presentation

QSAR MODELLING OF THE BIODEGRADATION BY HOLISTIC
MOLECULAR DESCRIPTORS
P. Gramatica (1), M. Pavan (1), F. Consolaro (1), V. Consonni (2) and R. Todeschini (2)
(1) QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, ITALY
(2) Milano Chemometrics and QSAR Research Group, Dept. of Environmental Sciences, University of Milano Bicocca, Milano, ITALY
E-mail: paola.gramatica_at_unimi.it
Web-site: http://fisio.dipbsf.uninsubria.it/dbsf/qsar/QSAR.html
INTRODUCTION
The environmental fate of a chemical is strictly related to its biodegradability. A good prediction of biodegradation would greatly aid in planning the synthesis of chemicals for environmental uses. In recent years, many approaches have been proposed to model biodegradation data for predictive purposes; most of them are based on quantitative structure-biodegradability relationships (QSBR), and mainly on a structure representation by molecular fragments (e.g. functional groups, number of atoms). Our approach to predicting biodegradability is based on a holistic representation of a chemical, using a set of molecular descriptors that account not only for local characteristics of a structure but also for its general aspects, allowing the extension to multifunctional heterogeneous compounds. Because of the great variability of biodegradation data and the difficulty of defining a single end-point, we have applied our descriptors to different aspects of biodegradation: regression modelling of BOD, ThOD and degradation rate constants, and classification according to various biodegradability criteria.
MOLECULAR DESCRIPTORS
The molecular structure has been represented by a wide set of 657 molecular descriptors calculated by the software DRAGON [1]:
  • constitutional descriptors (56)
  • topological descriptors (69)
  • walk counts (20)
  • BCUT descriptors (7)
  • Galvez index (21)
  • 2D autocorrelation descriptors
  • charge descriptors (7)
  • aromaticity descriptors (4)
  • molecular profiles (40)
  • geometrical descriptors (18)
  • 3D-MoRSE descriptors (160)
  • WHIM descriptors (99) [2]
  • GETAWAY descriptors (196)
  • empirical descriptors (3)
[1] R. Todeschini and V. Consonni, DRAGON - Software for the calculation of molecular descriptors, Talete s.r.l., Milan (Italy), 2000. Download: http://www.disat.unimib.it/chm
[2] R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997) 113-119.
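To give a feel for what one of the listed descriptor classes computes, here is a minimal Python sketch of a 2D autocorrelation of the Moreau-Broto type: for a given lag, it sums the products of atomic weights over all atom pairs separated by exactly that many bonds. The toy molecule, the weights and the function names are hypothetical, and DRAGON's own values may be scaled or transformed differently.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def autocorrelation(adjacency, weights, lag):
    """Sum of w_i * w_j over atom pairs separated by exactly `lag` bonds."""
    total = 0.0
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i < j and d == lag:  # count each pair once
                total += weights[i] * weights[j]
    return total


# Hypothetical toy molecule: heavy-atom skeleton of ethanol, C0-C1-O2,
# weighted by atomic masses (an illustrative choice of weighting scheme).
ethanol = {0: [1], 1: [0, 2], 2: [1]}
masses = {0: 12.011, 1: 12.011, 2: 15.999}

for lag in (1, 2):
    print(f"lag {lag}: {autocorrelation(ethanol, masses, lag):.3f}")
```

Swapping the mass dictionary for another atomic property (volumes, polarizabilities, electronegativities) gives the other weighting schemes named in the descriptor symbols.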

REGRESSION MODELS
The regression models have been applied to different data sets: 43 alcohols, ketones and aromatic compounds; 28 alcohols and ketones; 15 anilines and phenols; 17 PCBs; and 43 heterogeneous compounds. Our representation of a chemical is based on 670 molecular descriptors, so an effective variable selection strategy is necessary. GA-VSS (Genetic Algorithm - Variable Subset Selection) was applied to the whole set of descriptors in order to select the most relevant variables for modelling the biodegradation end-points by Ordinary Least Squares (OLS) regression. Regression models with satisfactory predictive power have been obtained. All the models have also been validated on an external test set, by splitting the original data set into representative training and test sets by different approaches based on structural similarity.
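The idea of coupling a genetic algorithm with OLS can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GA-VSS implementation used by the authors: the fitness function (leave-one-out Q2), the population settings, the crossover and mutation rules, and the synthetic data are all hypothetical choices.

```python
import random


def ols_fit(X, y):
    """OLS with intercept: solve the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return None  # (near-)singular: collinear descriptors
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def q2_loo(X, y, subset):
    """Leave-one-out cross-validated Q2 of the OLS model on `subset`."""
    n = len(y)
    press = 0.0
    for i in range(n):
        X_tr = [[X[k][j] for j in subset] for k in range(n) if k != i]
        y_tr = [y[k] for k in range(n) if k != i]
        b = ols_fit(X_tr, y_tr)
        if b is None:
            return float("-inf")
        pred = b[0] + sum(bj * X[i][j] for bj, j in zip(b[1:], subset))
        press += (y[i] - pred) ** 2
    mean_y = sum(y) / n
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


def ga_vss(X, y, max_vars=2, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Toy genetic search for the descriptor subset with the highest Q2."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = q2_loo(X, y, subset)
        return cache[subset]

    def random_subset():
        size = rng.randint(1, max_vars)
        return tuple(sorted(rng.sample(range(n_desc), size)))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            genes = {g for g in a + b if rng.random() < 0.5}  # uniform crossover
            if rng.random() < p_mut:
                genes ^= {rng.randrange(n_desc)}  # toggle one descriptor
            if len(genes) > max_vars:
                genes = rng.sample(sorted(genes), max_vars)
            if genes:
                children.append(tuple(sorted(genes)))
        population = parents + children
    return max(population, key=fitness)


# Synthetic, noiseless data: the response depends on descriptors 0 and 3 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(15)]
y = [1.0 + 2.0 * r[0] - 3.0 * r[3] for r in X]

best = ga_vss(X, y)
print("selected descriptors:", best, "Q2 =", round(q2_loo(X, y, best), 3))
```

Using a cross-validated statistic as fitness, rather than the fitted R2, keeps the search from simply rewarding larger subsets; the external test set then gives an independent check on the selected model.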
BIODEGRADABILITY CLASSIFICATION
Different chemometric methods (CART, K-NN and RDA) were used to classify 296 chemicals of environmental concern according to several literature biodegradability criteria, with satisfactory results. The best subset of variables was selected by Genetic Algorithm (GA-VSS) on logistic regression (Rlog), a regression method useful when the possible values of the dependent variable Y are restricted, and by PLS-DA, which confirmed the results previously obtained. It is important to point out that the literature criteria disagree in most cases, so we had to compare them in order to find a new general classification criterion for the compounds studied; the comparison was carried out as shown in the scheme below. All the models, developed on a suitably selected training set, have been validated internally (ER) and externally (ERext).
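The difference between an internal and an external error rate can be illustrated with a small K-NN classifier in Python. This is a sketch under assumptions: the poster does not say how ER was computed, so leave-one-out is used here as one common choice for internal validation, and the two-descriptor data, class labels and function names are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Assign `x` to the majority class among its k nearest training objects."""
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def error_rate_loo(X, y, k=3):
    """Internal validation: fraction misclassified when each object is left out in turn."""
    errors = 0
    for i in range(len(y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        errors += knn_predict(rest_X, rest_y, X[i], k) != y[i]
    return errors / len(y)


def error_rate_ext(train_X, train_y, test_X, test_y, k=3):
    """External validation: fraction of test objects misclassified by the training set."""
    errors = sum(
        knn_predict(train_X, train_y, x, k) != t for x, t in zip(test_X, test_y)
    )
    return errors / len(test_y)


# Hypothetical data: two descriptors per compound, class 1 = biodegradable, 0 = not.
train_X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [[0.5, 0.5], [5.5, 5.5], [1.0, 0.5]]
test_y = [0, 1, 1]  # the last compound sits in the "wrong" cluster on purpose

print("ER    =", error_rate_loo(train_X, train_y))
print("ERext =", error_rate_ext(train_X, train_y, test_X, test_y))
```

The internal figure only reuses training objects, so a model can look perfect internally and still misclassify external compounds, which is why both values are reported.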
BEST MODEL PARAMETERS
  • HATS5v: leverage-weighted autocorrelation of lag 5 (weighted by atomic van der Waals volumes)
  • R8m: R maximal autocorrelation of lag 8 (weighted by atomic masses)
Training set selection procedure
  • Data set: 296 compounds
  • Available biodegradability data: 152 compounds
  • Not available biodegradability data: 144 compounds
  • SPLITTING: training set 77 compounds, test set 75 compounds
  • PREDICTION
BEST MODEL PARAMETERS
  • BENe6: negative Burden eigenvalue n. 6 (weighted by atomic Sanderson electronegativities)
  • Ds: WHIM total accessibility index (weighted by atomic electrotopological states)
Linear Discriminant Analysis (LDA)
Model variables: nX, nN, P1u, Ku, Dm, ATS2p, MEC, Mor04v
No Model Error Rate (NOMER): 32.5
  • nX: n. of halogen atoms
  • nN: n. of nitrogen atoms
  • P1u: 1st component shape directional WHIM index
  • Ku: global shape WHIM index
  • Dm: total accessibility WHIM index
  • ATS2p: autocorrelation index of a topological structure
  • MEC: molecular eccentricity
  • Mor04v: 3D-MoRSE signal 04 (weighted by atomic van der Waals volumes)
[Table: confusion matrix in fitting]
[Table: confusion matrix in prediction]
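Since the confusion matrices themselves are not reproduced here, the following Python sketch shows how an error rate and the No Model Error Rate (NOMER) are typically read off such a matrix. NOMER is taken as the error made when every object is assigned to the largest class. The counts are hypothetical: they were picked only so that NOMER comes out at the 32.5 quoted above, and they are not the poster's data.

```python
def nomer(class_sizes):
    """No-model error rate: assign every object to the largest class."""
    return 1.0 - max(class_sizes) / sum(class_sizes)


def error_rate(confusion):
    """Fraction of objects off the diagonal (rows = true class, columns = assigned class)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total


# Hypothetical two-class confusion matrix for 77 training compounds.
confusion = [
    [45, 7],   # true class 1: 45 assigned correctly, 7 assigned to class 2
    [5, 20],   # true class 2: 5 assigned to class 1, 20 assigned correctly
]
class_sizes = [sum(row) for row in confusion]

print(f"NOMER = {100 * nomer(class_sizes):.1f} %")
print(f"ER    = {100 * error_rate(confusion):.1f} %")
```

A classification model is only informative if its error rate is clearly below NOMER, which is why the two numbers are usually quoted together.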
CONCLUSIONS
Different kinds of holistic molecular descriptors appear relevant in modelling biodegradability. In both regression and classification models, the Genetic Algorithm selected molecular descriptors that take into account global structural properties of the molecules as correlated to biodegradability, in some cases alongside local descriptors.