Title: Clustering for the Masses: Pairing Jklustor with Seurat
1 Clustering for the Masses: Pairing Jklustor with Seurat
- ChemAxon U.S. UGM
- August 8th, 2008
- Derek Debe, Ph.D.
- Senior Group Leader
- Scientific Computing Services
- Work Credit: Ravindra Mamidipaka
2 Our Industry's Innovation Gap
Significant additional investment in data-driven R&D with low ROI.
3 Why Has More Data ≠ More Impact?
Our goal is to make sure that it is not because our R&D informatics platforms are under-delivering.
Relevant Questions:
- Are we more preoccupied with data than the scientists we support?
- Have we historically given data producers better tools than data consumers?
- Are we curating a garbage dump of data whose utility for decision support has come and gone?
4 The Data Life Cycle: Where's I.T.'s Value?
Project Leader Knows Results, Makes Decision
Project Leader Devises Next Study
Data Production Stage
Data Consumption Stage
Are we just data librarians, or can we deliver systems that allow scientists to make post-data-mart discoveries?
5 The Bar Is High!
Deep Blue bested the human Kasparov at the game of chess.
Image Credit: IBM Research Website
We have to add value in an environment where drug project leaders:
- Master all of the literature around their target and all of the literature around the top lead series for their target.
- Define the next experiments to be done and can interact directly with the data producers to find out the latest results.
What functionality adds value in this environment?
6 Data Consumer Needs List
- Automatic Project Information Updates (Persistent Template)
- Pain-Free Reporting and Re-Reporting
- Comprehensive Compound Data in One Interface (Corporate & Outside)
- Effortless yet Flexible Data Pivoting & Averaging
- Easy-to-use and Fully Integrated Plotting & Analytics
- Fully Integrated Advanced Decision Support Techniques
- Hit-To-Lead, Lead Discovery, Lead Optimization & Candidate Selection (Consumers with Very Different Needs!)
7 Historic Tools For Data Consumers
Grid Views
Graph Views
Form Views
Scientists Endure:
- Interoperability Limits
- High Learning Curve
- Inadequate, Delayed Decision Support
Assay Data
Chemistry Data
Calculated Properties
Pipeline Pilot (Data Flexibility First)
Strategic IT Concerns:
- Client-Database Connectivity Explosion
- Diverse Set of Clients Prevents New Integrations
8 Goal: A Single, Integrated Package
Scientists Enjoy:
- One Interface To All Views
- Lower Learning Curve
- Seamless Data Flow
- Easier Data Sharing
- Solid Decision Tools
- Data Consumer Requirements Satisfied
IT Enjoys:
- Reduced Database Linkage Maintenance
- Focusing Efforts Behind One Client
- One Integration Point: Chemistry ELN, Document Management, Pipelining Applications, Modeling Advances
Assay Data
Chemistry Data
Calculated Properties
9 Strategy: Collaborate With Synaptic Science
SEURAT: Structure Exploration Utility for Rational Therapeutics
www.synapticscience.com
- Spun out of Celera in mid-2006; two top-15 Pharma deployments
- Team includes Comp. Chemists, Lab Automation Engineers, and Software Engineers
- Seurat grew up at Celera over 3 years (150 users on 10 projects)
- Discovery company heritage means it does a thoughtful job of satisfying many data consumer needs
10 Simple Search Interface Across Multiple DBs
The most commonly performed searches are easy to do!
11 Integrated Project Management
12 Easy Keyword Searching To Find Data
13 Grid Display and Adding Data
14 Results
15 Property Calculations Can Also Be Added
ChemAxon LibMCS Clustering Calls
Pipeline Pilot Calls
16 LibMCS: Clustering By MCS
MCS (maximum common substructure): the largest subgraph shared by several molecular structures
Slide Credit: ChemAxon
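To make the "largest shared subgraph" idea concrete, here is a minimal Python sketch for two tiny molecules. It is not LibMCS and does not use its API: the graph encoding (atom index to element symbol, bonds as unordered index pairs), the function name `largest_common_subgraph`, and the brute-force search are assumptions made for illustration. Bond orders are ignored, and the search is only practical for very small structures.

```python
from typing import Dict, FrozenSet, Set

Atoms = Dict[int, str]        # atom index -> element symbol (assumed encoding)
Bonds = Set[FrozenSet[int]]   # each bond is an unordered pair of atom indices


def bonded(bonds: Bonds, a: int, b: int) -> bool:
    """True if atoms a and b share a bond."""
    return frozenset((a, b)) in bonds


def largest_common_subgraph(atoms1: Atoms, bonds1: Bonds,
                            atoms2: Atoms, bonds2: Bonds) -> Dict[int, int]:
    """Return the largest connected atom mapping (molecule 1 -> molecule 2)."""
    best: Dict[int, int] = {}
    seen = set()  # partial mappings already explored

    def extend(mapping: Dict[int, int]) -> None:
        nonlocal best
        key = frozenset(mapping.items())
        if key in seen:
            return
        seen.add(key)
        if len(mapping) > len(best):
            best = dict(mapping)
        for a in atoms1:
            if a in mapping:
                continue
            # Keep the subgraph connected: a must touch an already-mapped atom.
            if mapping and not any(bonded(bonds1, a, m) for m in mapping):
                continue
            for b in atoms2:
                if b in mapping.values() or atoms1[a] != atoms2[b]:
                    continue
                # Bonds to every mapped atom must agree in both molecules.
                if all(bonded(bonds1, a, m) == bonded(bonds2, b, mapping[m])
                       for m in mapping):
                    mapping[a] = b
                    extend(mapping)
                    del mapping[a]

    extend({})
    return best


# Ethanol (C-C-O) versus 1-propanol (C-C-C-O), hydrogens omitted.
ethanol = ({0: "C", 1: "C", 2: "O"}, {frozenset((0, 1)), frozenset((1, 2))})
propanol = ({0: "C", 1: "C", 2: "C", 3: "O"},
            {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))})
shared = largest_common_subgraph(*ethanol, *propanol)
print(sorted(ethanol[0][a] for a in shared))  # prints ['C', 'C', 'O']
```

The shared C-C-O fragment is what would place these two compounds in the same cluster; a production tool would use a far more efficient search and richer atom and bond matching rules.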
17 LibMCS GUI
Slide Credit: ChemAxon
18 Benefit: Sensible Clustering, Fast Performance
Slide Credit: ChemAxon
19 Property Calculations Can Also Be Added
Calls ChemAxon's LibMCS API
20 Single Column: (Cluster #).(Cluster Member)
21 Pull Out The Cluster Number, Add Other Data
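Pulling the cluster number out of the single result column can be sketched in a few lines of Python. This assumes the column holds text values such as "3.12" (cluster 3, member 12); the exact format Seurat stores is not shown on the slides, and the names `ClusterLabel`, `parse_cluster_label`, and `group_by_cluster` as well as the compound IDs are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ClusterLabel(NamedTuple):
    cluster: int  # part before the dot
    member: int   # part after the dot


def parse_cluster_label(label: str) -> ClusterLabel:
    """Split an assumed 'cluster.member' string such as '3.12'."""
    cluster_text, member_text = label.strip().split(".", 1)
    return ClusterLabel(int(cluster_text), int(member_text))


def group_by_cluster(rows: Iterable[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Group compound IDs by the cluster number found in their label."""
    groups: Dict[int, List[str]] = {}
    for compound_id, label in rows:
        groups.setdefault(parse_cluster_label(label).cluster, []).append(compound_id)
    return groups


# Hypothetical grid rows: (compound ID, cluster label column).
rows = [("CMPD-1", "1.1"), ("CMPD-2", "2.1"), ("CMPD-3", "1.2")]
print(group_by_cluster(rows))  # prints {1: ['CMPD-1', 'CMPD-3'], 2: ['CMPD-2']}
```

With the cluster number in its own column, other data (assay results, calculated properties) can be joined and summarized per cluster.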
22 Plot and Analyze Clusters
23 Two-Dimensional Data Headache!
Assays in Alphabetical Order
Compounds Ordered 1 through N
Data is sparse; is there any order to it?
24 Same Data Clustered By Data Existence In Seurat
Assays Reordered
Compounds Reordered
I'd rather look at this one, thanks!
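The slides do not say how Seurat orders the grid, so the following Python sketch uses a deliberately simple stand-in: put the most-measured assays first, then sort compounds by which assays they have data for, so compounds with the same presence pattern end up next to each other. The function name `reorder_by_existence` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = Dict[str, Dict[str, float]]  # compound -> {assay: value}; missing = no data


def reorder_by_existence(data: Grid) -> Tuple[List[str], List[str]]:
    """Return (compound order, assay order) grouped by data presence."""
    assays = sorted({assay for row in data.values() for assay in row})
    # Assays with the most measurements come first; ties are broken by name.
    counts = {a: sum(a in row for row in data.values()) for a in assays}
    assay_order = sorted(assays, key=lambda a: (-counts[a], a))

    def missing_pattern(compound: str) -> Tuple[bool, ...]:
        # False (data present) sorts before True (data missing).
        return tuple(a not in data[compound] for a in assay_order)

    compound_order = sorted(data, key=lambda c: (missing_pattern(c), c))
    return compound_order, assay_order


# Hypothetical sparse grid: most cells are empty.
grid = {
    "C1": {"AssayB": 1.0},
    "C2": {"AssayA": 5.0, "AssayC": 2.0},
    "C3": {"AssayB": 0.3},
    "C4": {"AssayA": 7.1, "AssayC": 1.1},
    "C5": {"AssayA": 4.0},
}
compounds, assays = reorder_by_existence(grid)
print(assays)     # prints ['AssayA', 'AssayB', 'AssayC']
print(compounds)  # prints ['C2', 'C4', 'C5', 'C1', 'C3']
```

A real implementation would more likely cluster rows and columns on a similarity measure between presence patterns; the sort above only illustrates why reordering makes blocks of related data visible.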
25 Summary
- Satisfying the needs of data consumers in Drug Discovery is a significant challenge and is certainly one place to look to understand Pharma's Innovation Gap
- Historic solutions for data consumers have not enabled organizations to move beyond a "project leader knows all" paradigm
- Making a real impact requires getting the data consumer requirements right, going beyond basic data retrieval and analysis methods
- Seurat from Synaptic Science addresses these requirements, and integration of ChemAxon tools such as LibMCS can provide powerful additional analysis possibilities