Title: Detailed q2/Q2 results
1. Detailed q2/Q2 results for 100 bootstraps
for final runs (38 + dummy features)
5. Ordered by correlation coefficient
7. Example: last-pass feature selection from 80 sensitivity bags
Bagged relative sensitivity from 80 bootstraps.
Descriptors with lower sensitivities than the random dummy variable
will be eliminated in the next iteration.
Figure labels: random dummy variable; descriptors that will be
eliminated in the next iteration
8. STRIPMINER OPERATION MODE
Bootstraps with sensitivity analysis with a dummy variable
for descriptor selection (480 → 39 descriptors)
Mode 6: feature selection with sensitivity analysis
(1000 neural nets) (Q2 0.46, all molecules)
Ensemble bagging for selected descriptors.
Note: all ANNs 39x13x11x1, trained to error of 0.12,
11 patterns in validation set
Mode 0: train neural nets, 300 bootstrap ANNs
(300 neural nets trained)
Mode 4: predict for test set using bagging weights
(100x30/300 bags) (3000 ANNs in user mode)
Bag prediction on test set.
Note: ensemble results weighted by Q2 calculated in Mode 0
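
The bagged prediction step can be illustrated with a minimal Python sketch. The slide says the ensemble results are weighted by Q2 but does not say how Q2 is turned into a weight, so the sketch below simply takes one non-negative weight per network as input. The function name `bagged_prediction` and the toy numbers are hypothetical and are not part of StripMiner.

```python
def bagged_prediction(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: list of lists, one list of test-set predictions per model.
    weights: one non-negative weight per model (assumed to be derived
             from each model's Q2; the mapping is not given in the slides).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Toy usage: two models, two test molecules, second model trusted 3x more.
toy_predictions = [[1.0, 2.0], [3.0, 4.0]]
toy_weights = [1.0, 3.0]
bagged = bagged_prediction(toy_predictions, toy_weights)
```

With equal weights this reduces to plain bagging (a simple average over the ensemble).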
9. Stripminer Neural Network Sensitivity Analysis With Dummy Feature
REPEAT
  REPEAT 100x
    Do neural network bootstrap and calculate Q2 for validation set.
    There is one random dummy feature.
    There is a validation set for bagging.
    MetaNeural: prepare file for sensitivity analysis (can be up to 30 MB).
    Run neural net in user mode for sensitivity analysis.
    SENSIT: calculate sensitivity results for 13 levels
    and tally results in sen.txt.
  CONTINUE
  Bag sensitivities: bagging and feature selection.
  Reduce features by dropping features with lower sensitivity than the dummy.
TEST (repeat until the dummy variable is the least sensitive feature)
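
The loop on this slide can be sketched in Python as follows. This is an illustration only: the neural network training and the SENSIT step are replaced by a hypothetical stand-in (`toy_sensitivity`) that returns a noisy sensitivity per feature, and all function and feature names are assumptions made for the example.

```python
import random
from statistics import mean


def bag_sensitivities(sensitivity_fn, features, n_bootstraps, rng):
    """Average each feature's sensitivity over n_bootstraps runs."""
    runs = [sensitivity_fn(features, rng) for _ in range(n_bootstraps)]
    return {f: mean(run[f] for run in runs) for f in features}


def select_features(sensitivity_fn, features, dummy="dummy",
                    n_bootstraps=100, seed=0):
    """Drop features less sensitive than the dummy until none remain."""
    rng = random.Random(seed)
    current = list(features) + [dummy]
    while True:
        bagged = bag_sensitivities(sensitivity_fn, current, n_bootstraps, rng)
        weaker = [f for f in current
                  if f != dummy and bagged[f] < bagged[dummy]]
        if not weaker:
            # The dummy is now the least sensitive feature: stop.
            return [f for f in current if f != dummy]
        current = [f for f in current if f not in weaker]


# Hypothetical stand-in for "train a bootstrap net, then run SENSIT":
# informative features get a fixed importance plus noise, while the dummy
# and the uninformative features get noise only.
TRUE_IMPORTANCE = {"mol_weight": 1.0, "h_bonding": 0.6}


def toy_sensitivity(features, rng):
    return {f: TRUE_IMPORTANCE.get(f, 0.0) + rng.uniform(0.0, 0.2)
            for f in features}


selected = select_features(
    toy_sensitivity, ["mol_weight", "h_bonding", "noise_a", "noise_b"])
```

Features that carry no signal behave like the dummy, so some of them may survive a given pass by chance. The loop only guarantees that every surviving feature has a bagged sensitivity at least as large as the dummy's.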
10. Neural Network Sensitivity Analysis
[Figure: neural network diagram. Inputs (molecular descriptors): molecular weight, boiling point, H-bonding, hydrophobicity, electrostatic interactions. Hidden-layer nodes connected by weights (w11, w23, w34). Output (observable projection): biological response.]
- Keep all inputs frozen at median values
- Turn one input at a time from 0 to 1
- Monitor variation in outputs
- Inputs that cause the largest output variation are the most
  sensitive → more important
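
The procedure in the bullets above can be sketched in a few lines of Python. The sketch assumes inputs are scaled to [0, 1], uses 13 evenly spaced levels (the figure quoted for SENSIT on the previous slide), and measures "variation" as the range of the output, which is an assumption. The names `input_sensitivities` and `toy_model` are hypothetical, and any callable can stand in for the trained network.

```python
from statistics import median


def input_sensitivities(model, data, n_levels=13):
    """One-at-a-time sensitivity: sweep each input while others stay at median.

    model: callable taking a list of input values and returning a number.
    data: list of rows (lists of inputs scaled to [0, 1]) used for medians.
    Returns one output range (max - min) per input.
    """
    n_inputs = len(data[0])
    baseline = [median(row[j] for row in data) for j in range(n_inputs)]
    levels = [k / (n_levels - 1) for k in range(n_levels)]
    result = []
    for j in range(n_inputs):
        outputs = []
        for level in levels:
            x = list(baseline)   # all inputs frozen at their medians
            x[j] = level         # except the one being swept from 0 to 1
            outputs.append(model(x))
        result.append(max(outputs) - min(outputs))
    return result


# Toy stand-in for a trained network: input 0 matters most, input 2 not at all.
def toy_model(x):
    return 2.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


toy_data = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [0.9, 0.4, 0.7]]
sens = input_sensitivities(toy_model, toy_data)
ranking = sorted(range(len(sens)), key=lambda j: sens[j], reverse=True)
```

For a linear toy model the sensitivities equal the absolute coefficients. For a real network they depend on the baseline, which is why the slides bag the sensitivities over many bootstrap models.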
RENSSELAER
DDASSL
16. Correlation bias: the sum of the 30 best-correlated
variables seems to have a spurious correlation