A novel approach to analysis of primary HTS data

1
Compound Set Enrichment
  • A novel approach to analysis of primary HTS data

Thibault Varin
Ansgar Schuffenhauer
Gubler, H., Parker, C., Zhang, JH., Raman, P.,
Ertl, P.
2
INTRODUCTION
Compound Set Enrichment
3
Introduction
  • Active series identification: can relevant SAR be
    extracted from primary HTS data?
  • Are activity data binary or continuous?

4
Introduction: Active series identification
Hypothesis 1: Within primary HTS data,
structure-activity relationships (SAR) are
apparent and can be used to help select active
compound classes.
5
Introduction: Are the activity data binary or
continuous?
[Figure: activity of the compounds in Scaffold 1 and Scaffold 2]
  • Binary activity: 1 active / 5 inactives, so Scaffold 1 = Scaffold 2
  • Continuous activity: Scaffold 1 > Scaffold 2
6
Introduction: Are the activity data binary or
continuous?
[Figure panels: Threshold 1 activity; Threshold 2 activity]
Binary scaffold activity differs depending on
the threshold.
Hypothesis 2: Methods based on an activity
cut-off distort the activity information, leading
to the incorrect assignment of active series of
compounds.
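
The effect of the cut-off can be illustrated with a small sketch. The activity values and the two thresholds below are hypothetical, chosen only so that each scaffold has 1 active and 5 inactives at the stricter cut-off; they are not taken from the screening data.

```python
from typing import Dict, List

# Hypothetical activity values (e.g. percent inhibition) for two scaffolds.
# These numbers are illustrative assumptions, not data from the presentation.
ACTIVITIES: Dict[str, List[float]] = {
    "Scaffold 1": [85.0, 48.0, 46.0, 45.0, 44.0, 42.0],
    "Scaffold 2": [55.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}


def count_actives(values: List[float], threshold: float) -> int:
    """Number of compounds whose activity is at or above the cut-off."""
    return sum(1 for v in values if v >= threshold)


def hit_counts(threshold: float) -> Dict[str, int]:
    """Binary 'active' count per scaffold for a given activity cut-off."""
    return {name: count_actives(vals, threshold) for name, vals in ACTIVITIES.items()}


# Two hypothetical cut-offs: the binary picture changes, the raw data do not.
for threshold in (50.0, 40.0):
    print(f"threshold {threshold}: {hit_counts(threshold)}")
```

At the stricter cut-off both scaffolds look identical, while the looser cut-off makes Scaffold 1 look far more active. The continuous values already showed that difference without any cut-off.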
7
METHODS
Compound Set Enrichment
8
Methods: The Scaffold Tree classification
"The Scaffold Tree: Visualization of the Scaffold
Universe by Hierarchical Scaffold Classification",
A. Schuffenhauer, P. Ertl et al., J. Chem. Inf.
Model., 47, 47, 2007
9
Methods: Datasets
  • 7 PubChem bioassays
  • Ranging from 9389 to 263679 compounds
  • Ranging from 0.03% to 26.29% active compounds

[Figure labels: Hypothesis 1; PubChem Annotation from CRC; Simulation of the primary screening data]
10
Methods: Single hypothesis test summary procedure
  • 1. State the null and the alternative hypotheses
  • H0: the scaffold is inactive
  • H1: the scaffold is active
  • 2. Specify a significance level: α = 0.01
  • 3. Compute the test statistic and the p-value
    ⇒ p-value = probability of observing a statistic at
    least this extreme if the scaffold is inactive (H0)
  • 4. Decision step
  • p-value > α: H0 is not rejected
  • p-value < α: H0 is rejected and H1 is
    accepted: the scaffold is active
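
The four steps can be sketched for a single scaffold as follows. This is a minimal illustration, assuming a one-sided exact binomial test against the background hit rate; the slides do not say whether the test was one- or two-sided, and the counts used in the example are hypothetical.

```python
from math import comb

ALPHA = 0.01  # significance level stated on the slide


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Chance of seeing at least k actives among n compounds if the scaffold
    only matches the background hit rate p (i.e. if H0 holds).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scaffold_is_active(k: int, n: int, background_rate: float,
                       alpha: float = ALPHA) -> bool:
    """Steps 3 and 4: compute the p-value, then reject H0 if it is below alpha."""
    p_value = binomial_upper_tail(k, n, background_rate)
    return p_value < alpha


# Hypothetical scaffold: 20 compounds, 2% background hit rate.
print(scaffold_is_active(5, 20, 0.02))  # 5 actives among 20 compounds
print(scaffold_is_active(1, 20, 0.02))  # 1 active among 20 compounds
```

With SciPy available, `scipy.stats.binomtest(k, n, p, alternative="greater")` computes the same one-sided p-value.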

11
Methods: The KS and the Binomial hypothesis tests
Continuous data: KS test
H0: there is no difference between the activity
distribution of the compounds having the
scaffold S3-2 and the background distribution.
Binary data: Binomial test
H0: there is no difference between the proportion of
active compounds among the compounds having the
scaffold S3-2 and the proportion of active
compounds in the full dataset.
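
For the continuous case, a two-sample KS comparison of one scaffold against the background can be sketched in plain Python. This is an illustration only: it uses the two-sided statistic with the asymptotic Kolmogorov series for the p-value, which is rough for small samples, and the activity values are hypothetical. The presentation does not specify which KS variant was used.

```python
from bisect import bisect_right
from math import exp, sqrt
from typing import Sequence


def ks_statistic(sample: Sequence[float], background: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(background)
    d = 0.0
    for x in a + b:  # the maximum gap is reached at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov series (approximation)."""
    lam = d * sqrt(n * m / (n + m))
    if lam < 0.2:  # the series converges poorly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (j - 1) * exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical activities: scaffold compounds all above the background values.
scaffold = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
background = list(range(10))
d = ks_statistic(scaffold, background)
print(d, ks_p_value(d, len(scaffold), len(background)))
```

In practice a library routine such as `scipy.stats.ks_2samp` is preferable, since it also offers exact p-values and one-sided alternatives.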
12
Methods: Multiple hypothesis tests and the Bonferroni
correction
  • Problem of false positives
  • α = probability of identifying an inactive
    scaffold as active (for each test performed)
  • With 100 inactive scaffolds, the probability of
    identifying at least one as active by chance is 63%
    (1 - 0.99^100)
  • The Bonferroni correction tests each scaffold at a
    critical significance level of α = 0.01 / number of
    scaffolds
  • Assumes that the individual tests are
    independent
  • Each level of the Scaffold Tree was treated
    separately
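
The arithmetic behind the 63% figure and the corrected threshold can be checked with a short sketch. The scaffold names other than S3-2 and all p-values below are hypothetical, standing in for the scaffolds found at one level of the tree.

```python
from typing import Dict, List

ALPHA = 0.01  # per-test significance level from the slide


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Critical significance level per test: alpha / number of tests."""
    return alpha / n_tests


def significant_scaffolds(p_values: Dict[str, float],
                          alpha: float = ALPHA) -> List[str]:
    """Scaffolds of ONE tree level whose p-value passes the corrected cut-off."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < cutoff)


print(round(family_wise_error_rate(ALPHA, 100), 2))  # the 63% from the slide

# Hypothetical p-values for three scaffolds at the same tree level.
level_3 = {"S3-1": 0.004, "S3-2": 0.00001, "S3-3": 0.2}
print(significant_scaffolds(level_3))
```

Note that S3-1 would pass an uncorrected test at α = 0.01 but fails the corrected cut-off of 0.01 / 3.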

13
Methods: Determining the activity of classes
Hypothesis 1
Hypothesis 2
Scaffold activity evaluation
Multiple hypothesis test correction (Bonferroni)
Comparison of results
14
RESULTS
Compound Set Enrichment
15
Results: Comparison of KSP and BTP predictions

                                                Total |  BPCA significantly active | BPCA not significantly active
Bioassay                         KSP   BTP     Δ  BPCA |      KSP      BTP        Δ |       KSP       BTP         Δ
Hydroxysteroid dehydrogenase     330   231    99   199 |      183      168       15 |       147        63        84
Caspase-1                        331   114   217     5 |        2        2        0 |       329       112       217
PK                                12     4     8    12 |        3        3        0 |         9         1         8
Luciferase                        67    12    55    15 |       13       11        2 |        54         1        53
Luciferase                       178    48   130    41 |       32       35       -3 |       146        13       133
CYP450 2C9                        58    33    25    34 |       34       31        3 |        24         2        22
CYP450 3A4                       121    64    57    60 |       60       53        7 |        61        11        50

  • With
  • KSP = KS Prediction
  • BTP = Binomial Threshold Prediction
  • Δ = KSP - BTP
  • BPCA = Binomial PubChem Annotation
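
As a quick arithmetic check on the table, the sketch below recomputes the differences (KSP - BTP) from the KSP and BTP counts. It also verifies an assumption the slide does not state explicitly: that each total is the sum of the classes that are significantly active in BPCA and those that are not. The row values are copied from the table.

```python
from typing import List, Tuple

# (bioassay, total KSP, total BTP,
#  BPCA-significant KSP, BPCA-significant BTP,
#  BPCA-not-significant KSP, BPCA-not-significant BTP)
Row = Tuple[str, int, int, int, int, int, int]

ROWS: List[Row] = [
    ("Hydroxysteroid dehydrogenase", 330, 231, 183, 168, 147, 63),
    ("Caspase-1", 331, 114, 2, 2, 329, 112),
    ("PK", 12, 4, 3, 3, 9, 1),
    ("Luciferase", 67, 12, 13, 11, 54, 1),
    ("Luciferase", 178, 48, 32, 35, 146, 13),
    ("CYP450 2C9", 58, 33, 34, 31, 24, 2),
    ("CYP450 3A4", 121, 64, 60, 53, 61, 11),
]


def deltas(row: Row) -> Tuple[int, int, int]:
    """KSP - BTP for the total, significant and non-significant columns."""
    _, k_tot, b_tot, k_sig, b_sig, k_non, b_non = row
    return (k_tot - b_tot, k_sig - b_sig, k_non - b_non)


def totals_add_up(row: Row) -> bool:
    """Assumed relation: total = significant + non-significant, per method."""
    _, k_tot, b_tot, k_sig, b_sig, k_non, b_non = row
    return k_sig + k_non == k_tot and b_sig + b_non == b_tot


for row in ROWS:
    print(row[0], deltas(row), totals_add_up(row))
```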

Both KSP and BTP retrieve the BPCA significantly
active classes.
Number of active classes: KSP > BTP.
Most of the new KSP active classes are not
significantly active in BPCA.
16
Results: KSP significantly active scaffolds that
are inactive in PubChem
[Figure: four panels, each labeled WA. Legend: compound activity (PubChem Annotation): Active, Inconclusive, Inactive]
17
Results: Prioritize nodes instead of individual
scaffolds
[Figure legend: scaffold activity (KS Prediction / Bonferroni): Not significantly active, Significantly active]
18
Results: Visualization tool (Peter Ertl)
19
CONCLUSION
Compound Set Enrichment
20
Conclusion: Compound Set Enrichment
  • Validation of initial hypotheses
  • A method to mine HTS data and identify active
    series of compounds
  • Chemical classification: Scaffold Tree
  • Statistical analysis: Kolmogorov-Smirnov
    hypothesis test
  • Multiple hypothesis test correction: Bonferroni
    correction
  • Use all primary data
  • No activity cut-off
  • Identification of new active scaffolds not
    necessarily represented by very active compounds
    (latent hits) during the primary screen

21
Acknowledgments
With many thanks to:
Primary mentor: Ansgar Schuffenhauer
Help: MLI group
  • Scientific advisers
  • Christian Parker
  • Hanspeter Gubler
  • Ji-Hu Zhang
  • Peter Ertl
  • Edgar Jacoby

Fellowship: Education office
  • Discussions
  • Martin Beibel
  • Sebastian Bergling
  • Meir Glick
  • Alain Dietrich
  • Marie-Cecile Didiot

22
Questions?