Title: QSAR Application Toolbox: Step 4: Category pruning capabilities
1 QSAR Application Toolbox Step 4: Category pruning capabilities
2 Objectives
- This presentation demonstrates a number of functionalities of the Toolbox:
  - Category definition by protein-binding mechanism.
  - Identifying and removing (pruning) from the category chemicals that have additional protein-binding mechanisms.
  - Filling data gaps by trend analysis.
3 The Exercise
- In this exercise we will predict the toxicity of the substance Hexanal, 4-Methyl (CAS Nr 41065-97-8), called the target chemical, towards the ciliate Tetrahymena pyriformis.
- This prediction will be accomplished by collecting experimental results for a set of chemicals considered to be in the same category as the target molecule.
- The category definition will be based on the protein-binding mechanism.
- Trend analysis will be used for data gap filling.
4 Input of target chemical by CAS Number
5 Chemical Identification Information
6 Profiling
- Profiling refers to the electronic process of retrieving relevant information on the target compound, other than environmental fate, ecotoxicity, and toxicity data, which are stored in the Toolbox database.
- Available information includes likely mechanism(s) of action.
7 Profiling Target Chemical
- Select the profiling methods you wish to use by clicking on the box before the name of each profiler.
- This selects (a red check mark appears) or deselects (the red check mark disappears) profilers.
- For this example, check all 8 mechanistic methods.
8 Profilers for Hexanal, 4-Methyl
9 Profiles of Hexanal, 4-Methyl
Profiling results for Hexanal, 4-Methyl:
- Very specific profiling results are obtained for the target compound.
- Please note the specific protein-binding profile.
- These results will be used to search for suitable analogues in the next steps of the exercise.
10 Endpoints
- Endpoints refer to the electronic process of retrieving the measured data for environmental fate, ecotoxicity and toxicity that are stored in the Toolbox database.
- Data gathering can be executed in a global fashion (i.e., collecting all data for all endpoints) or on a more narrowly defined basis (e.g., collecting data for a single endpoint or a limited number of endpoints).
- In this example, we limit our data gathering to common toxicity endpoints from all databases except Danish EPA (where only estimated results are stored).
11 Data Gathering
12 Next Step in Data Gathering
- Toxicity information on the target chemical is electronically collected from the selected datasets.
- In this example, an insert window appears stating that no data were found for the target chemical (see next slide).
- Close the insert window.
13 No data for target chemical
14 Recap
- You have entered the target chemical, making sure the structure is correct.
- You have profiled the target chemical and found that no experimental data are currently available for this structure.
- You have identified a data gap, which you would like to fill.
15 Category Definition
- This module provides the user with several means of grouping chemicals into a toxicologically meaningful category that includes the target molecule.
- The target chemical (Hexanal, 4-Methyl) could react with proteins and thus has a potential for excess aquatic toxicity.
- The mechanisms by which the target chemical binds to proteins are therefore relevant to grouping chemicals that may act as aquatic toxicants, which gives us a mechanistic reason for defining our category based on a specific protein-binding mechanism.
- Highlight Protein binding in the list of Grouping methods.
- Click on Defining category.
16 Defining the Category
17 Category results
The category of chemicals with the same protein-binding mechanism (Schiff base formation) consists of 184 mechanistic analogues.
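The grouping step can be pictured as a filter over profiled chemicals. The following is a minimal sketch, not the Toolbox's implementation: it assumes each chemical's profile is a set of mechanism names and that a chemical joins the category when its profile contains the target's mechanism. The function name `define_category` and all chemical names and profiles are hypothetical, invented for illustration.

```python
def define_category(inventory, target_mechanisms):
    """Return the chemicals whose profile contains every mechanism of the target.

    Assumption: membership means "shares the target's mechanism", so members
    may still carry additional mechanisms of their own.
    """
    target = set(target_mechanisms)
    return {
        name: mechanisms
        for name, mechanisms in inventory.items()
        if target <= set(mechanisms)
    }


# Hypothetical profiles, invented for illustration only.
inventory = {
    "analogue A": {"Schiff base formation"},
    "analogue B": {"Schiff base formation", "Michael-type nucleophilic addition"},
    "analogue C": {"No binding"},
}

category = define_category(inventory, {"Schiff base formation"})
print(sorted(category))
```

Under this assumption, analogue B is a member even though it has a second mechanism, which is the kind of analogue that a later pruning step would target.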
18 Gathering Data
- Highlight the category of 184 analogues with the same protein-binding mechanism.
- The inserted window entitled "Read Data?" appears (see next slide).
- Click OK.
19 Summary of Aquatic Toxicity Information for Analogues
20 Reading the Selected Data
- Select the mode of reading data. In this case, click on Select single to eliminate any double entries in the databases.
- Click OK.
21 Data Tree
- All the analogues with their available experimental results are inserted into the data matrix (see next slide).
- Open the data tree by double-clicking on its nodes to access the results for Tetrahymena pyriformis:
  - Ecotoxicological information
    - Aquatic Toxicity
      - Protozoa
        - Tetrahymena pyriformis
          - IGC50
22 Data Tree
23 The Filling Data Gap Window
- Move to the module Filling data gap.
- Take a moment to examine the filling data matrix on the next slide.
- Note that it contains:
  - information on the chemicals that form the category,
  - the 3 options for data filling, and
  - a means of selecting the data points used to fill the data gap.
24 The Filling Data Gap Window
25 Filling Data Gaps
- This step in the workflow provides the user with three options for making an endpoint-specific prediction for the target molecule.
- As noted earlier, these options, in increasing order of complexity, are:
  - by read-across,
  - by trend analysis, and
  - through the use of QSAR models.
- In this example we only use trend analysis.
26 Filling Data Gaps
27 Selecting the Data Points
- Before applying trend analysis, the Toolbox allows the user to decide which results should be used when more than one result is available for an analogue (i.e., all values, average values, or minimum or maximum results).
- It should be noted that averaging results is only useful for quantitative endpoints, which is the case in this example.
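The four choices can be illustrated with a short sketch. It assumes each analogue's quantitative results are held as a list of numbers; the function name `select_data_points`, the mode names, and the values are hypothetical and do not reflect the Toolbox's internals.

```python
from statistics import mean


def select_data_points(results, mode="average"):
    """Collapse multiple results per analogue into the values used for gap filling."""
    reducers = {
        "all": list,                       # keep every measured value
        "average": lambda v: [mean(v)],    # only meaningful for quantitative data
        "minimum": lambda v: [min(v)],
        "maximum": lambda v: [max(v)],
    }
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")
    return {chem: reducers[mode](values) for chem, values in results.items()}


# Hypothetical endpoint values, invented for illustration only.
results = {"analogue A": [1.0, 2.0, 3.0], "analogue B": [0.5]}

averaged = select_data_points(results, "average")
print(averaged)
```

Averaging relies on the values being numeric, which mirrors the remark above that it is only useful for quantitative endpoints.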
28 Data Point Selection
29 Applying Trend Analysis
- Highlight the data endpoint box corresponding to IGC50 for Tetrahymena pyriformis under the target chemical.
- It should be empty, as this is the data gap we are trying to fill.
- Next, with the trend analysis box highlighted, click Apply.
30 Data Point Selection
Trend analysis is chosen because we have a quantitative endpoint and enough data.
31 Results of Trend Analysis
32 Interpreting the Trend Analysis
- The resulting plot shows the experimental results (IGC50-48h) of all analogues (Y axis) plotted against the default descriptor Log Kow (X axis).
- The RED dot represents the target chemical.
- The BLUE dots represent the experimental results available for the analogues.
- The GREEN dots (see following slides) represent the analogues belonging to different subcategories.
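The idea behind reading a prediction off such a plot can be sketched as a straight-line fit. This is a minimal illustration that assumes an ordinary least-squares line of the endpoint against Log Kow; the slides do not state the exact fitting method, and the function names, data points and target Log Kow below are hypothetical.

```python
def fit_trend(points):
    """Least-squares line y = intercept + slope * x through (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read the fitted line at descriptor value x."""
    return intercept + slope * x


# Hypothetical (Log Kow, endpoint) pairs chosen to lie on y = 0.5 * x - 1.
analogues = [(1.0, -0.5), (2.0, 0.0), (3.0, 0.5)]
slope, intercept = fit_trend(analogues)

target_log_kow = 2.5  # hypothetical descriptor value for the target
prediction = predict(slope, intercept, target_log_kow)
print(slope, intercept, prediction)
```

Analogues that follow a different mechanism can pull such a line away from the target's own trend, which is the motivation for pruning them before accepting the prediction.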
33 An Accurate Trend Analysis of the Data set
- Because the molecules are polyfunctional, some analogues have additional protein-binding mechanisms (e.g., Michael-type nucleophilic addition and nucleophilic addition to azomethines) different from those of the target compound.
- Some analogues have organic functional groups different from those of the target, for example alkane, arene, or benzyl.
- These analogues can be identified via subcategorisation.
34 An Accurate Trend Analysis of the Data set
- Two subsequent subcategorisations are applied to prune the analogues:
  - Subcategorisation 1 removes analogues having a protein-binding mechanism different from that of the target.
  - Subcategorisation 2 removes analogues that are structurally dissimilar to the target, i.e., that have organic functional groups different from those of the target.
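The two pruning passes can be sketched as successive set filters. This is an illustrative sketch under stated assumptions, not the Toolbox's algorithm: an analogue is kept only if it has no mechanism (pass 1) and no organic functional group (pass 2) beyond those of the target. The function name `prune_category` and all profiles are hypothetical.

```python
def prune_category(category, target):
    """Apply two successive subcategorisations and return both intermediate results.

    Assumption: an analogue survives a pass when its set of mechanisms (or
    functional groups) is contained in the corresponding set of the target.
    """
    # Subcategorisation 1: drop analogues with additional binding mechanisms.
    after_mechanism = {
        name: profile
        for name, profile in category.items()
        if profile["mechanisms"] <= target["mechanisms"]
    }
    # Subcategorisation 2: drop analogues with functional groups the target lacks.
    after_structure = {
        name: profile
        for name, profile in after_mechanism.items()
        if profile["groups"] <= target["groups"]
    }
    return after_mechanism, after_structure


# Hypothetical profiles, invented for illustration only.
target = {"mechanisms": {"Schiff base formation"}, "groups": {"aldehyde"}}
category = {
    "analogue A": {"mechanisms": {"Schiff base formation"}, "groups": {"aldehyde"}},
    "analogue B": {
        "mechanisms": {"Schiff base formation", "Michael-type nucleophilic addition"},
        "groups": {"aldehyde"},
    },
    "analogue C": {"mechanisms": {"Schiff base formation"}, "groups": {"aldehyde", "arene"}},
}

step1, step2 = prune_category(category, target)
print(sorted(step1), sorted(step2))
```

Running the passes in sequence keeps the effect of each one visible, in the same way the slides show each subcategorisation separately.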
35 Subcategorization (1)
36 An Accurate Trend Analysis of the Data set
- Mechanistic transparency is provided for the different subsets of analogues.
- Highlight the different interaction mechanisms associated with analogues in the subcategorisation window.
- By right-clicking, one can see the chemicals with the specified mechanism.
- Also, by double-clicking on a dot in the graph, detailed information can be obtained on the structural and parametric boundaries of the underlying binding mechanism.
37 An Accurate Trend Analysis of the Data set
By right clicking one can see the chemicals with
the specified mechanism
38 An Accurate Trend Analysis of the Data set
39 An Accurate Trend Analysis of the Data set
By double clicking on a dot in the graph,
detailed information can be obtained for the
structural and parametric boundaries of
underlying binding mechanism
Click to see detailed information
40 Subcategorization (1)
41 Subcategorization (2)
42 Results
43 Filled Data Gap
- The predicted target value can be accepted.
- Click on Accept.
- The estimated result is inserted into the data
matrix (see next slide)
44 Filled Data Gap
45 Report
- The final step in the workflow, Report, provides the user with a downloadable written audit trail of what the Toolbox did to arrive at the prediction.
- Click on Study history.
- This study history can be printed or copied to be inserted in a more detailed report (see next slide).
46 Report