QSAR Modelling of Carcinogenicity for Regulatory Use in Europe - PowerPoint PPT Presentation



1
  • QSAR Modelling of Carcinogenicity for Regulatory
    Use in Europe

Natalja Fjodorova, Marjana Novič, Marjan Vračko,
Marjan Tušar, National Institute of Chemistry,
Ljubljana, Slovenia
2

CAESAR MEETING, 17.11.2008, BERLIN, GERMANY
3
Overview
  • Carcinogenic potency prediction: state of the art
  • Data and methods used for modeling by NIC_LJU
  • Statistical performance of obtained models and
    their evaluation
  • Some findings about structural alerts
  • Conclusion

4
Carcinogenic potency prediction: state of the art
  • QSAR models can be divided into two families:
  • congeneric (for specific classes of chemicals):
    external prediction accuracy for rodent
    carcinogenicity is 58-71%
  • noncongeneric (for chemicals of different
    classes): accuracy is around 65%
  • Further studies are required to improve the
    predictive reliability for noncongeneric
    chemicals.
  • Ref.: Romualdo Benigni, Cecilia Bossa, Tatiana
    Netzeva, Andrew Worth. Collection and Evaluation
    of (Q)SAR Models for Mutagenicity and
    Carcinogenicity. EUR 22772 EN, 2007.

5
  • The chemicals involved in the study belong to
    different chemical classes (noncongeneric
    substances).
  • The work addresses industrial chemicals, in line
    with the REACH initiative. The aim is to cover as
    much of the chemical space as possible.

6
  • Carcinogenicity prediction
    within the scope of the CAESAR project
  • Present state:
  • compilation of the carcinogenicity dataset ✓
  • cross-checking of structures ✓
  • calculation of descriptors ✓
  • selection of descriptors ✓
  • development of carcinogenicity models ✓
  • investigation of structural alerts (SA): ongoing

7
Dataset
  • 805 chemicals were extracted from the rodent
    carcinogenicity study findings for 1481 chemicals
    taken from the Distributed Structure-Searchable
    Toxicity (DSSTox) Public Database Network
    (http://www.epa.gov/ncct/dsstox/sdf_cpdbas.html),
    derived from the Lois Gold Carcinogenic Potency
    Database (CPDBAS)

8
Response
  • For quantitative models:
    TD50_Rat, carcinogenic potency in rats
    (expressed in mmol/kg body wt/day)
  • For qualitative models:
    yes/no classification
    P: positive (active)
    NP: not positive (inactive)
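Both kinds of response can be prepared with a few lines of code. The sketch below assumes a TD50 value is available in mg/kg body wt/day and converts it to mmol/kg body wt/day by dividing by the molecular weight (mg divided by g/mol gives mmol). The function names and the 1/0 coding of P/NP are illustrative choices, not part of the original workflow.

```python
def td50_mg_to_mmol(td50_mg_per_kg_day: float, mol_weight_g_per_mol: float) -> float:
    """Convert TD50 from mg/kg body wt/day to mmol/kg body wt/day.

    mg divided by g/mol gives mmol, so the conversion divides by the molecular weight.
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return td50_mg_per_kg_day / mol_weight_g_per_mol


def encode_activity(label: str) -> int:
    """Qualitative response: P (positive, active) -> 1, NP (not positive, inactive) -> 0."""
    return {"P": 1, "NP": 0}[label]
```

For a hypothetical compound with TD50 = 120 mg/kg body wt/day and a molecular weight of 60 g/mol, `td50_mg_to_mmol(120.0, 60.0)` gives 2.0 mmol/kg body wt/day.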

9
Training and test sets
  • The 805 chemicals were split into a
    training set (644 chemicals) and a
    test set (161 chemicals)
  • (split performed at the Helmholtz Centre for
    Environmental Research UFZ, Germany)

10
Distribution of active (P) and inactive (NP)
chemicals in the total, training and test sets
11
Descriptors
  • 254 MDL descriptors calculated by MDL QSAR
    software (254MDLdes_806carcinogenicity.rar file)
  • 835 Dragon descriptors calculated by DRAGON
    software (Dragon_Carc.xls file)
  • 88 CODESSA descriptors calculated using CODESSA
    software (88_CODESSA_descr_Cancer.xls file)

12
Descriptors used for modeling
  • Model CARC_NIC_CPANN_01
    27 MDL descriptors provided by NIC_LJU
    (variable selection method: Kohonen network
    and PCA)
  • Model CARC_NIC_CPANN_02
    18 DRAGON and MDL descriptors were taken from one
    of the best models (CARC_CSL_KNN_05) developed by
    CSL. The goal was to compare carcinogenicity
    predictions obtained with different methods.
  • Model CARC_NIC_CPANN_03
    34 CODESSA descriptors were taken from one
    of the best models (CARC_CSL_KNN_02) developed by
    CSL.
  • (variable selection methods for models 2 and 3:
    cross-correlation matrix, multicollinearity
    technique, Fisher ratio and genetic algorithm)
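Two of the selection criteria listed for models 2 and 3, the Fisher ratio and the cross-correlation matrix, can be illustrated together. This is a simplified greedy filter, not the CSL procedure (which also used multicollinearity analysis and a genetic algorithm): descriptors are ranked by the two-class Fisher ratio (mean_P - mean_NP)^2 / (var_P + var_NP), and a descriptor is skipped when its absolute correlation with an already selected one exceeds a cut-off. The 0.9 cut-off and the function names are assumptions.

```python
import numpy as np


def fisher_ratio(X, y):
    """Two-class Fisher ratio per descriptor: (mean_P - mean_NP)^2 / (var_P + var_NP)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0)
    # zero within-class variance is mapped to a ratio of 0 to avoid division by zero
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)


def select_descriptors(X, y, k, max_abs_corr=0.9):
    """Greedy filter: rank by Fisher ratio, then skip any descriptor whose
    absolute correlation with an already selected one exceeds max_abs_corr."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(fisher_ratio(X, y))[::-1]  # highest ratio first
    corr = np.corrcoef(X, rowvar=False)           # cross-correlation matrix
    selected = []
    for j in order:
        if all(abs(corr[j, s]) <= max_abs_corr for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected
```

Ranking first and filtering second keeps, from each group of near-duplicate descriptors, the one that best separates P from NP.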

13
Counter Propagation Artificial Neural Network
Step 1: mapping of molecule Xs (vector representing
the structure) into the Kohonen layer
Step 2: correction of weights in both the Kohonen
and the output layer
Step 3: prediction of the four-dimensional target
(toxicity) Ts = carcinogenicity
14
Model input parameters
  • Minimum correction factor: 0.01
  • Maximum correction factor: 0.5
  • Number of neurons in x direction: 35
  • Number of neurons in y direction: 35
  • Number of learning epochs:
    100, 200, 400, 600, 800, 1000, 1200, 1400,
    1600, 1800
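The three CPANN steps and the input parameters above can be combined in a compact NumPy sketch. It is a simplified illustration, not the NIC_LJU implementation: the random initialisation, the linear decrease of the correction factor from its maximum to its minimum, the triangular neighbourhood whose integer radius shrinks to zero, and the single output weight per map position are our assumptions. Targets are coded 1 for P and 0 for NP, so each prediction is a number between 0 and 1 that is then compared with a threshold.

```python
import numpy as np


def train_cpann(X, y, nx=35, ny=35, epochs=100, a_max=0.5, a_min=0.01, seed=0):
    """Simplified counter-propagation ANN: Kohonen layer plus one output layer.

    X: (n_samples, n_descriptors) array of scaled descriptors.
    y: (n_samples,) targets, 1 for P and 0 for NP.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    kohonen = rng.random((nx, ny, m))  # Kohonen-layer weights
    output = rng.random((nx, ny))      # output-layer weights (one target)
    gx, gy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    d_max0 = max(nx, ny) // 2          # initial neighbourhood radius (assumption)
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / max(total - 1, 1)
            eta = a_max - (a_max - a_min) * frac  # correction factor: a_max -> a_min
            radius = int(d_max0 * (1 - frac))     # integer radius shrinking to 0
            # Step 1: map the molecule onto the Kohonen layer (winning neuron)
            dist = np.sum((kohonen - X[i]) ** 2, axis=2)
            wx, wy = np.unravel_index(np.argmin(dist), dist.shape)
            # Step 2: correct weights in both layers around the winner
            d = np.maximum(np.abs(gx - wx), np.abs(gy - wy))
            h = np.clip(1 - d / (radius + 1), 0, None)  # triangular neighbourhood
            kohonen += (eta * h)[..., None] * (X[i] - kohonen)
            output += eta * h * (y[i] - output)
            step += 1
    return kohonen, output


def predict_cpann(kohonen, output, X):
    """Step 3: the prediction is the output weight under the winning neuron."""
    preds = []
    for x in np.asarray(X, dtype=float):
        dist = np.sum((kohonen - x) ** 2, axis=2)
        wx, wy = np.unravel_index(np.argmin(dist), dist.shape)
        preds.append(output[wx, wy])
    return np.array(preds)
```

With the parameters listed on the slide, a run would look like `train_cpann(X_train, y_train, nx=35, ny=35, epochs=100)` on scaled descriptors, followed by `predict_cpann` on the test set (`X_train` and `y_train` are placeholder names). Because every output update is a convex step towards 0 or 1, predictions stay in [0, 1].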

15
Statistical evaluation of models
  • Confusion matrix for two classes

True positive (TP), true negative (TN), false
positive (FP), false negative (FN).
Accuracy (AC), sensitivity (SE) and specificity (SP):
$$AC = \frac{TN + TP}{TN + TP + FN + FP}$$
$$SE = \frac{TP}{TP + FN}$$
$$SP = \frac{TN}{TN + FP}$$
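The three measures follow directly from the counts in the confusion matrix. A minimal sketch, assuming class labels coded 1 for P and 0 for NP; the function names are our own.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for two classes coded 1 = P (carcinogen), 0 = NP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); needs at least one P and one NP."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    sensitivity = tp / (tp + fn)  # fraction of carcinogens predicted as P
    specificity = tn / (tn + fp)  # fraction of non-carcinogens predicted as NP
    return accuracy, sensitivity, specificity
```

For a toy set of 4 carcinogens and 6 non-carcinogens in which 3 carcinogens and 4 non-carcinogens are predicted correctly, this gives an accuracy of 0.7, a sensitivity of 0.75 and a specificity of about 0.67.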
16
Statistical performance of models
17
Raising the threshold from 0 to 1 decreases the
number of false positives and increases the number
of false negatives. This tendency is common to all
our models (1, 2 and 3).
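This trend can be reproduced on any set of model outputs. In this sketch a chemical is predicted as P when its output is at or above the threshold (an assumed convention). As the threshold rises, fewer non-carcinogens pass it, so FP cannot increase, and more carcinogens fall below it, so FN cannot decrease. The scores used in the check are hypothetical.

```python
def fp_fn_by_threshold(y_true, scores, thresholds):
    """For each threshold t, predict P when score >= t; return (t, FP, FN) tuples."""
    results = []
    for t in thresholds:
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        results.append((t, fp, fn))
    return results
```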
18
(No Transcript)
19
In the figure we have marked the maximum accuracy
and the corresponding thresholds. For model 1 the
optimal threshold is 0.45. At this threshold
accuracy reaches its maximum of 0.68, sensitivity
is 0.71 and specificity is 0.65.
20
For model 2 the optimal threshold for the test set
is 0.6, where accuracy reaches its maximum of 0.70.
Sensitivity at this point is 0.69 and specificity
is 0.72.
21
For model 3 the optimal threshold is 0.5, the
maximum accuracy is 0.68, sensitivity is 0.70 and
specificity is 0.62. Changing the threshold shifts
the balance between sensitivity and specificity, so
it may be used to increase the number of correctly
predicted carcinogens or non-carcinogens.
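Choosing the optimal threshold as the one with maximum accuracy, and reading off sensitivity and specificity at that point, can be sketched as follows. The threshold grid, the "score at or above threshold means P" convention and the tie rule (the lowest threshold wins) are assumptions; the slides do not state how ties were handled.

```python
def best_threshold(y_true, scores, thresholds):
    """Return (threshold, accuracy, sensitivity, specificity) at maximum accuracy."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    best = None
    for t in thresholds:
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
        tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
        acc = (tp + tn) / len(y_true)
        # strict ">" keeps the first (lowest) threshold among equally accurate ones
        if best is None or acc > best[1]:
            best = (t, acc, tp / pos, tn / neg)
    return best
```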
22
The closer the ROC curve comes to the point (0,1),
the more accurate the predictions.
A model with no predictive ability yields the
diagonal line.
23
Accuracy of prediction and area under the curve
(AUC) (models 1, 2, 3)
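An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the threshold varies, and the AUC summarises it in one number: 1 for perfect separation, 0.5 for the diagonal of a model with no predictive ability. A minimal pure-Python sketch using the trapezoidal rule; the slides do not say which software computed the AUC, and both classes must be present.

```python
def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, sweeping the threshold over the distinct scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted as P
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points  # the lowest score predicts everything as P, giving (1, 1)


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Tied scores produce a diagonal segment, which is how a model with no predictive ability ends up on the diagonal with AUC = 0.5.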
24
Study of structural alerts for our dataset, obtained
with the Benigni/Bossa rulebase in the Toxtree program
  • We extracted the following alerts for our
    dataset of 805 compounds:
  • GA: genotoxic alerts
  • nGA: non-genotoxic alerts
  • NA: no carcinogenic alerts
  • We then calculated how many chemicals carrying
    each type of alert fall into the NP (not positive)
    and P (positive) classes.

25
P (positive) and NP (not positive) refer only to
results for rats.
  • Substances with GA: about 2/3 are P and about
    1/3 are NP
  • Substances with nGA: about half are P and half
    are NP
  • Substances with NA (no carcinogenic alerts):
    about 2/3 are NP and 1/3 are P
These findings need further investigation.
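The proportions above come from a cross-tabulation of alert type against rat activity. A sketch, assuming each chemical has already been assigned exactly one alert class (GA, nGA or NA); Toxtree itself is not called here, and the function name and toy records in the check are hypothetical.

```python
from collections import Counter


def alert_activity_table(records):
    """records: iterable of (alert_class, activity) pairs, alert_class in
    {"GA", "nGA", "NA"} and activity in {"P", "NP"}.
    Returns, per alert class, the fractions of P and NP and the count n."""
    counts = Counter(records)
    table = {}
    for alert in sorted({a for a, _ in counts}):
        n_p = counts[(alert, "P")]
        n_np = counts[(alert, "NP")]
        total = n_p + n_np
        table[alert] = {"P": n_p / total, "NP": n_np / total, "n": total}
    return table
```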
26
Conclusion
  • Quantitative models with the tumorigenic dose
    TD50 for rats as the dependent variable showed
    low predictive power, with a correlation
    coefficient for the test set below 0.5.
  • Conversely, qualitative models demonstrated
    excellent internal performance (training set
    accuracy 91-93%) and good external performance
    (test set accuracy 68-70%, sensitivity 69-73%
    and specificity 63-72%).
  • Changing the threshold shifts the balance
    between sensitivity and specificity. It may be
    used to increase the number of correctly
    predicted carcinogens or non-carcinogens.