Title: The cancer Biomedical Informatics Grid (caBIG): Enabling the patient-centric molecular medicine revolution
1The cancer Biomedical Informatics Grid (caBIG)
Enabling the patient-centric molecular
medicine revolution
- Ken Buetow
- NCICB/NCI/NIH/DHHS
2NCI biomedical informatics
- Goal: A virtual web of interconnected data,
individuals, and organizations redefines how
- research is conducted
- care is provided
- patients/participants interact with the
biomedical research enterprise
3cancer Biomedical Informatics Grid (caBIG) goals
- Common, widely distributed infrastructure permits
cancer research community to focus on innovation
- Shared vocabulary, data elements, data models
facilitate information exchange
- Collection of interoperable applications
developed to common standard
- Raw cancer research data is available for mining
and integration
4caBIG a new way of doing business
- Coordinated development
- Active management
- Community directed
- Common services
5Boundaries and Interfaces
- Focus on boundaries, interfaces, and how things fit
together,
- not on the internal details of how they're built
- Assume that the internals will be diverse and changing
6cancer Common Ontologic Representation
Environment (caCORE)
- Information integration
- Cross-discipline reasoning
biomedical objects
common data elements
controlled vocabulary
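The three layers named on this slide (biomedical objects, common data elements, controlled vocabulary) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the class names, the `tumor_site` element, and the permitted terms are made up for illustration and are not taken from caDSR, EVS, or caBIO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A named field whose values are restricted to a controlled vocabulary."""
    name: str
    permitted_values: frozenset

    def validate(self, value: str) -> str:
        # Reject any value that is not a term of the shared vocabulary.
        if value not in self.permitted_values:
            raise ValueError(f"{value!r} is not a permitted value for {self.name}")
        return value


# Hypothetical data element and vocabulary terms (illustration only).
TUMOR_SITE = CommonDataElement(
    name="tumor_site",
    permitted_values=frozenset({"brain", "breast", "ovary"}),
)


@dataclass
class Specimen:
    """A toy 'biomedical object' whose field is bound to a common data element."""
    specimen_id: str
    tumor_site: str

    def __post_init__(self):
        # Validation happens at construction, so free text never enters a record.
        TUMOR_SITE.validate(self.tumor_site)


def can_exchange(records):
    """True when every record uses only terms from the shared vocabulary."""
    return all(r.tumor_site in TUMOR_SITE.permitted_values for r in records)
```

Under these assumptions, two sites that both build `Specimen` records against the same data element can exchange them without a per-site mapping step, which is the kind of information integration the slide points to.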
7caBIG infrastructure joins diverse data within
an organization
8caBIG will facilitate sharing of infrastructure,
applications, and data
9caBIG networks the components of the cancer
enterprise
Local network
10caBIG deliverables
- Componentized, standards-based Clinical Trials
Management System
- e-IND filing/regulatory reporting with FDA
- Electronic management of trials
- Integration of diverse trials
- Tissue Management System
- Systematic description and characterization of
tissue resources
- Ability to link tissue resources to clinical and
molecular correlative descriptions
- Plug and Play analytic tool set
- microarray
- proteomics
- pathways
- data analysis and statistical methods
- gene annotation
- Diverse library of raw, structured data
11Current caBIG community
- NCI-designated Cancer Centers (50)
- Academic Centers (integrated into broader
biomedical infrastructure)
- Stand-alone (community leaders)
- Community outreach
- Government
- Industry
- International Groups
- >700 active participants
12Four Domain Workspaces and two Cross Cutting
Workspaces have been launched
DOMAIN WORKSPACE 1 Clinical Trial Management
Systems
addresses the need for consistent, open and
comprehensive tools for clinical trials
management.
DOMAIN WORKSPACE 2 Integrative Cancer Research
provides tools and systems to enable integration
and sharing of information.
DOMAIN WORKSPACE 3 Tissue Banks and Pathology Tools
provides for the integration, development, and
implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4 Imaging
provides for the sharing and analysis of in vivo
imaging data.
CROSS CUTTING WORKSPACE 1 Vocabularies and Common
Data Elements
responsible for evaluating, developing, and
integrating systems for vocabulary and ontology
content, standards, and software systems for
content delivery
CROSS CUTTING WORKSPACE 2 Architecture
developing architectural standards and
architecture necessary for other workspaces.
13caBIG Products and Data
- caArray - Cancer microarray management system
- C3D - Trials data capture application
- Structured Protocol Representation Prototype -
Standardized clinical trial protocol model
- caWorkbench - Microarray analysis suite
- caTIES - Automated free-text pathology data
extraction tool
Q1
- caCORE Curriculum - Essential training on caBIG
infrastructure
- Gene Ontology Miner (GOMiner) - Tool for
aggregate analysis of gene sets
- caTISSUE Core Prototype - Database and specimen
tracking system
- caTISSUE Prototype - Biological annotation and
mapping system for specimens
- C3PR - Clinical trial participant registry tool
Q2
- RProteomics - MALDI-TOF proteomics analysis tool
- caGRID Architecture Prototype
Q3
- Distance Weighted Discrimination - Microarray
data analysis integrator
- Cancer Molecular Pages Prototype - Cancer gene
annotation with web-based visualization
- Magellan - Tool for the analysis of heterogeneous
data types (e.g., microarray)
- Visual and Statistical Data Analyzer (VISDA) -
Multivariate statistical visualization tool for
the analysis of complex data
- FunctionExpress - Tool for integrated analysis
and visualization of microarray data
- Quantitative Pathway Analysis in Cancer (QPACA) -
Pathway modeling and analysis tool
- TrAPSS - Disease gene mutation discovery and
analysis tool
- Proteomics Laboratory Information Management
System Prototype
- Pathways Tool Project - Pathway visualization
tools
- HapMap - caBIG accessible map of haplotypes in
human genome
- Promoter Database
Q4
Products
- Zebrafish Microarray Data - Microarray data from
Zebrafish model system for human development
- Q5 - Proteomics analysis tools
- Reactome (GKB) Data - Metabolic interactions
database
- Under development
- CTMS/CDUS Reporting Tool
- Trials Financial Billing Tool
- Trials Laboratory Interface
- Gene Pattern Bioinformatics Analysis Workflow
Timeline axis: 2005, 2006; Q1, Q2, Q3, Q4
Data
Q4
- Curated Cancer Pathways Data - Data sets
generated from 60 human cancer cell lines
- Ovarian Tumor Gene Expression Data
- SEED - Genome annotation
- NCI-60 Data - Data sets generated from 60 human
cancer cell lines
Q3
Q1
- Clinical Trials
- Cancer microarray data sets
- Protein Information Resources (PIR) - Protein
sequence and annotation database
- Additional Human Cancer and Non-cancer Data Sets
14Clinical Research IT Infrastructure
External Reporting
Clinical Systems
Clinical Trials
Translation Service
etc.
HL7-v3, Janus
HL7-v3, Janus
HL7-v2.x, other
Labs, EMR, Tissue, etc.
HL7-v3
Adverse Events
Clinical Research Information Exchange
HL7 transactional database
HL7/CAM SDK
Protocol Authoring
FDA
Participant Registry
SPONSOR
RDC
NCI
Clinical Data Mgmt
other
Patient Health Record
Research Data Warehouse
De-identification Services
15C3PR Participant Registration
17Clinical Research Information Exchange (CRIX)
- Electronic submissions
- Investigator Registry
- eIND
- eNDA
- eBLA
- Controlled Data Sharing
18caIMAGE Cancer Images Database
- caIMAGE allows researchers to submit and retrieve
images and annotations.
- Images are streamed for efficient access.
- Researchers can search images based on tissue,
diagnosis, and experiment information.
- Use of common terminology originating from the
NCI Enterprise Vocabulary Server (EVS).
19caTISSUE Core Register Specimen Group
21Pathway Database
- Enhance value of imperfect, but available,
pathway knowledge
- Make biological assumptions explicit
- Combine sources of data (e.g. KEGG, BioCarta,
...)
- Merge data from separate pathways
- Build a causal framework to support (future)
quantitative simulation/analysis
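The goals on this slide (combine sources, merge separate pathways, keep assumptions explicit, build a causal framework) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge-list representation, the function names, and the toy interactions are hypothetical and do not reflect the actual Pathway Database schema or real KEGG or BioCarta content.

```python
from collections import defaultdict


def merge_pathways(*sources):
    """Merge edge lists from several pathway sources into one directed graph.

    Each source is a pair (source_name, [(upstream, downstream), ...]).
    The result maps every edge to the set of sources asserting it, so the
    provenance of each causal claim stays explicit after merging.
    """
    edges = defaultdict(set)
    for source_name, interactions in sources:
        for upstream, downstream in interactions:
            edges[(upstream, downstream)].add(source_name)
    return dict(edges)


def downstream_of(edges, start):
    """Return every node reachable from `start` by following causal edges."""
    adjacency = defaultdict(set)
    for upstream, downstream in edges:
        adjacency[upstream].add(downstream)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Toy interactions with placeholder node names (illustration only).
source_a = ("source_a", [("A", "B"), ("B", "C")])
source_b = ("source_b", [("B", "C"), ("C", "D")])
merged = merge_pathways(source_a, source_b)
```

In this toy case the edge `("B", "C")` is recorded as supported by both sources, and `downstream_of(merged, "A")` spans nodes that no single source connects on its own, which is the benefit of merging separate pathways into one causal graph.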
22repositories
Gene Expression Data
Tissue Bank
Research Center
caCORE - caBIO - caDSR - EVS
NCICB
Data Mart
Clinical Data
Gene Expression Data
- Data Services
- Analytical Services
- Annotation Services
- Service Advertisement
- Service Discovery
- Service Query
- Semantic mapping
- Security Services
Clinical Data
Proteomics Data
Analysis Tools
Research Center
Genomics Data
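The service advertisement, discovery, and query roles listed on this slide can be pictured with a small in-memory registry. This is a hypothetical sketch, not the caGRID design: the class, method names, service names, and handlers are all assumptions made for illustration, and a real grid would add the semantic mapping and security services the slide also lists.

```python
class ServiceRegistry:
    """Toy registry covering advertisement, discovery, and query."""

    def __init__(self):
        self._services = {}

    def advertise(self, name, kind, handler):
        # A research center publishes a service under a name and a kind
        # such as "data", "analytical", or "annotation".
        self._services[name] = {"kind": kind, "handler": handler}

    def discover(self, kind):
        # Clients look up the names of all advertised services of one kind.
        return sorted(n for n, s in self._services.items() if s["kind"] == kind)

    def query(self, name, request):
        # The request is forwarded to the handler the provider registered.
        return self._services[name]["handler"](request)


registry = ServiceRegistry()
# Hypothetical services with stand-in handlers.
registry.advertise("expression-data", "data", lambda gene: {"gene": gene, "values": [1.0, 2.0]})
registry.advertise("sort-values", "analytical", lambda values: sorted(values))
```

A client would first call `registry.discover("data")` to find available data services and then `registry.query(...)` on the one it chose, without knowing which center hosts it.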
23REMBRANDT: Building a robust translational
research framework for brain tumor studies
REpository of Molecular BRAin Neoplasia DaTa
http://rembrandt.nci.nih.gov
24Rembrandt Knowledgebase
caIntegrator - DataMart
Expression array data
Better understanding Better treatments
Clinical data
caBIG Analytic Tools
25Rembrandt Application - Homepage
- Highlights
- Web-based application
- Secure login
26Gene Expression Query page
- Searchable Fields
- Gene symbol
- Differential Fold Change (between tumor group
and Normal group)
- Chromosomal Region (Cytoband or bp position)
- Clone ID/Probe Set ID
- GO Classification
- Pathways
- Array platform
- (Oligo vs cDNA arrays)
- Clone Location
- (for cDNA arrays)
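A query built from a few of the searchable fields above might look like the following Python sketch. It is not the Rembrandt implementation: the class and field names are hypothetical, and fold change is assumed here to be the ratio of the tumor-group mean to the normal-group mean on a linear scale, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    gene_symbol: str
    probe_set_id: str
    platform: str        # "oligo" or "cdna"
    tumor_mean: float    # mean expression in the tumor group
    normal_mean: float   # mean expression in the normal group

    @property
    def fold_change(self) -> float:
        # Assumed definition: tumor mean divided by normal mean.
        return self.tumor_mean / self.normal_mean


@dataclass
class GeneExpressionQuery:
    """Each field left as None is treated as 'no constraint'."""
    gene_symbol: Optional[str] = None
    min_fold_change: Optional[float] = None
    platform: Optional[str] = None

    def matches(self, row: ProbeResult) -> bool:
        if self.gene_symbol is not None and row.gene_symbol != self.gene_symbol:
            return False
        if self.platform is not None and row.platform != self.platform:
            return False
        if self.min_fold_change is not None and row.fold_change < self.min_fold_change:
            return False
        return True

    def run(self, rows):
        return [r for r in rows if self.matches(r)]


# Made-up rows with placeholder identifiers (illustration only).
rows = [
    ProbeResult("GENE1", "probe-1", "oligo", tumor_mean=8.0, normal_mean=2.0),
    ProbeResult("GENE1", "probe-2", "cdna", tumor_mean=3.0, normal_mean=2.0),
    ProbeResult("GENE2", "probe-3", "oligo", tumor_mean=1.0, normal_mean=2.0),
]
hits = GeneExpressionQuery(gene_symbol="GENE1", min_fold_change=2.0).run(rows)
```

Only the first row satisfies both constraints in this toy data, since the second row has a fold change of 1.5 and the third is a different gene.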
27Group Multiple Queries
- Combine different queries
- Gene expression queries can be grouped with
Clinical and Comparative Genomic queries
- Queries are validated before submission
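Grouping and validation can be sketched as set operations over the samples each query returns. The slide does not say which combination operators the application offers, so the AND/OR grouping, the dictionary layout, and all names below are assumptions made for illustration.

```python
ALLOWED_KINDS = {"gene_expression", "clinical", "comparative_genomic"}


def validate(query):
    """Check a query before submission; raise ValueError if it is malformed."""
    if query.get("kind") not in ALLOWED_KINDS:
        raise ValueError(f"unknown query kind: {query.get('kind')!r}")
    if not callable(query.get("predicate")):
        raise ValueError("query needs a callable 'predicate'")


def run_query(query, samples):
    """Return the IDs of the samples whose record satisfies the query."""
    validate(query)
    return {sid for sid, record in samples.items() if query["predicate"](record)}


def group_and(queries, samples):
    """Samples satisfying every query in the group (set intersection)."""
    results = [run_query(q, samples) for q in queries]
    return set.intersection(*results) if results else set()


def group_or(queries, samples):
    """Samples satisfying at least one query in the group (set union)."""
    results = [run_query(q, samples) for q in queries]
    return set.union(*results) if results else set()


# Made-up sample records (illustration only).
samples = {
    "s1": {"fold_change": 3.0, "age": 40},
    "s2": {"fold_change": 1.2, "age": 65},
    "s3": {"fold_change": 2.5, "age": 70},
}
expression_q = {"kind": "gene_expression", "predicate": lambda r: r["fold_change"] >= 2.0}
clinical_q = {"kind": "clinical", "predicate": lambda r: r["age"] >= 60}
```

With this toy data, `group_and([expression_q, clinical_q], samples)` keeps only `s3`, while a query with an unrecognized `kind` is rejected by `validate` before any data is touched.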
28Survival plot
29Gene Expression plot
42NCRI Strategic Framework for Developing Cancer
Informatics in the UK
- Approach to creating UK Cancer Informatics
Platform allowing data flow from all areas of
cancer research
- The NCRI Strategic Framework
- Sets out a national vision for cancer informatics
- Maps out ongoing activity
- Provides a focus for future funding
- Develops coherent approaches to ethics,
data-sharing, etc.
44Sample NCRI and caBIG synergies
DOMAIN WORKSPACE 1 Clinical Trial Management
Systems
CancerGRID, eRDC
DOMAIN WORKSPACE 2 Integrative Cancer Research
MyGrid
DOMAIN WORKSPACE 3 Tissue Banks and Pathology Tools
Imaging and Pathology Project
DOMAIN WORKSPACE 4 Imaging
E-DiaMoND projects
CROSS CUTTING WORKSPACE 1 Vocabularies and Common
Data Elements
CLEF Services
CROSS CUTTING WORKSPACE 2 Architecture
Platform Reference Model
45acknowledgements
- NCICB
- Peter Covitz
- Sue Dubman
- Leslie Derr
- Mary Jo Deering
- Carl Schaefer
- Mervi Heiskanen
- Denise Hise
- Kotien Wu
- Frank Hartel
- LPG/CCR
- Michael Edmundson
- Bob Clifford
- Cu Nguyen
http://ncicb.nci.nih.gov
http://cmap.nci.nih.gov
http://caBIG.nci.nih.gov