1
caBIG Data Structures
  • CS584 Lecture on 4/6/2007

Patrick McConnell, Duke Comprehensive Cancer Center
patrick.mcconnell_at_duke.edu
2
Agenda
  • caBIG background (5 min, 8 slides)
  • Goals, program structure, organizations
  • caTRIP background (5 min, 6 slides)
  • Background, use cases, architecture
  • caBIG compatibility (30 min, 21 slides +
    demonstration)
  • Interoperability, compatibility, syntactics, and
    semantics
  • Building caBIG compatible systems (10 min, 7
    slides)
  • Interoperability, compatibility, syntactics, and
    semantics
  • caGrid (10 min, 8 slides)
  • Background, service creation, metadata
  • caTRIP demonstration (10 min, 2 slides + demo)
  • Demonstration
  • Discussion/questions (5 min throughout)

3
caBIG Background
  • Goals, program structure, organizations

4
caBIG background Biomedical information tsunami
  • overwhelming volume of data
  • multitude of sources

5
caBIG background Informatics tower of Babel
  • Each cancer research community speaks its own
    scientific dialect
  • Integration critical to achieve promise of
    molecular medicine

6
caBIG background Goals and principles
  • 50 Cancer Centers and the National Cancer
    Institute Center for Bioinformatics (NCICB) are
    working towards a common goal of integrated data,
    tools, and methodologies to accelerate cancer
    research: the cancer Biomedical Informatics Grid
    (caBIG)
  • The goal of caBIG is to create a virtual web of
    interconnected data, individuals, and
    organizations which will redefine how
  • research is conducted
  • care is provided
  • patients / participants interact with the
    biomedical research enterprise
  • The principles driving caBIG are
  • Open Source
  • Open Access
  • Open Development
  • Federated Model

7
caBIG background caBIG facilitates sharing
8
(No Transcript)
9
caBIG background Workspaces
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
addresses the need for consistent, open and
comprehensive tools for clinical trials
management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
provides tools and systems to enable integration
and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks & Pathology Tools
provides for the integration, development, and
implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
provides for the sharing and analysis of in vivo
imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies & Common Data Elements
responsible for evaluating, developing, and
integrating systems for vocabulary and ontology
content, standards, and software systems for
content delivery
CROSS CUTTING WORKSPACE 2: Architecture
developing architectural standards and
architecture necessary for other workspaces.
10
caBIG background Communities
9Star Research; Albert Einstein; Ardais; Argonne National Laboratory;
Burnham Institute; California Institute of Technology-JPL; City of Hope;
Clinical Trial Information Service (CTIS); Cold Spring Harbor;
Columbia University-Herbert Irving;
Consumer Advocates in Research and Related Activities (CARRA);
Dartmouth-Norris Cotton; Data Works Development;
Department of Veterans Affairs; Drexel University; Duke University;
EMMES Corporation; First Genetic Trust; Food and Drug Administration;
Fox Chase; Fred Hutchinson; GE Global Research Center;
Georgetown University-Lombardi; IBM; Indiana University; Internet 2;
Jackson Laboratory; Johns Hopkins-Sidney Kimmel;
Lawrence Berkeley National Laboratory;
Massachusetts Institute of Technology; Mayo Clinic;
Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University;
Northwestern University-Robert H. Lurie;
Ohio State University-Arthur G. James/Richard Solove;
Oregon Health and Science University; Roswell Park Cancer Institute;
St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel;
Translational Genomics Research Institute;
Tulane University School of Medicine; University of Alabama at Birmingham;
University of Arizona; University of California Irvine-Chao Family;
University of California, San Francisco; University of California-Davis;
University of Chicago; University of Colorado; University of Hawaii;
University of Iowa-Holden; University of Michigan; University of Minnesota;
University of Nebraska; University of North Carolina-Lineberger;
University of Pennsylvania-Abramson; University of Pittsburgh;
University of South Florida-H. Lee Moffitt;
University of Southern California-Norris; University of Vermont;
University of Wisconsin; Vanderbilt University-Ingram; Velos;
Virginia Commonwealth University-Massey; Virginia Tech;
Wake Forest University; Washington University-Siteman; Wistar;
Yale University
11
caBIG background Duke's role in caBIG
  • Pankaj Agarwal
  • Bob Annechiarico
  • Bill Banks
  • Vijaya Chadaram
  • Jamie Cuticchia
  • Raj Dash
  • Mohammad Farid
  • Seth Fehrs
  • Patrick McConnell
  • Salvatore Mungal
  • Mark Peedin
  • CALGB
  • CCR
  • Coalition of Cooperative Groups
  • Dana Farber
  • Georgetown
  • Mayo
  • Oregon Health Sciences University
  • Integrative Cancer Research: Workspace
    participant, RProteomics developer, caTRIP
    developer
  • Architecture: Workspace participant, caGrid
    developer, caGrid scientific liaison, Guide to
    Mentors
  • Vocabularies and Common Data Elements: Workspace
    participant, Guide to Mentors
  • Clinical Trials Management Systems: Workspace
    participant, C3PR developer, CTMS
    Interoperability architect, C3D developer
  • Tissue Banking and Pathology Tools: Workspace
    participant

12
The Cancer Translational Research Informatics
Platform (caTRIP)
  • Background, use cases, architecture

13
caTRIP Who is involved?
  • Duke Bioinformatics
  • Jamie Cuticchia (PI)
  • Patrick McConnell (lead architect)
  • Duke Information Systems
  • Bob Annechiarico (PM)
  • Wilma Stanley (developer)
  • Mark Peedin (developer)
  • Mohamad Farid (DBA)
  • Jeff Allred (IT manager)
  • Duke Pathology
  • Raj Dash (domain expert)
  • Chris Hubbard (developer)
  • Duke Oncology
  • Kelley Marcom (domain expert)
  • Gretchen Kimmick (domain expert)
  • Kimberly Blackwell (domain expert)
  • Lee Wilke (domain expert)
  • Duke CALGB
  • Kimberly Johnson (DataMart liaison)
  • SemanticBits
  • Ram Chilukuri (lead developer)
  • Srini Akkala (developer)
  • Sanjeev Agarwal (developer)
  • 5 AM Solutions
  • Bill Mason (developer)
  • NCI
  • Julie Klemm (ICR WS lead)
  • Carl Shaefer (NCI rep)
  • Subha Madhavan (caIntegrator PM)
  • BAH
  • Curtis Lockshin
  • Mehul Shah (tech support)

Managers and Architects
Software Developers
Database Developers and IT
NCI/BAH
Domain Experts
14
caTRIP What is translational research?
  • Bench-to-Bedside
  • Wikipedia (the source of all knowledge):
    Translational medicine is a branch of medical
    research that attempts to more directly connect
    basic research to patient care.
  • Basic research occurs in the lab
  • Patient care occurs in the clinic
  • Translational research broadened: Translational
    medicine can also have a much broader definition,
    referring to the development and application of
    new technologies in a patient driven environment
    - where the emphasis is on early patient testing
    and evaluation.
  • Facilitate the interaction between basic
    research and clinical medicine, particularly in
    clinical trials.

15
caTRIP Initial focus
  • Our initial focus will be on connecting existing
    data systems, including basic science data, to
    enhance patient care
  • Initial problem scenario: outcomes analysis
  • Use data from existing patients to inform the
    treatment of another patient
  • Leverage clinical, pathology, tissue, and basic
    science data
  • Scenario: Patient A enters the clinic. What
    treatments were applied with success on other
    patients with similar characteristics (race, sex,
    symptoms, pathology results, adverse events,
    biomarkers)?

16
caTRIP Broadened focus scientific use cases
  • Find available tumor tissue
  • What are all the tissue specimens from HER2/neu
    positive patients that have a primary tumor in
    the breast and are BRCA1 positive?
  • Find factors of survival
  • What are all the ER positive patients that have
    survived breast cancer after radiation treatment?
  • Find patients for trials
  • What are all the patients that are triple
    negative (ER, PR, and HER2/NEU negative)?
  • Determine the distribution of disease factors
    over time
  • Does a change in pathology biomarkers over time
    contribute to recurrence or death?
  • Determine correlation of factors pre and post
    surgery
  • Does a change in ER or PR status before and after
    surgery correlate with other factors?
  • Find pathology reports of interest
  • Show me all of the pathology reports for Her2/Neu
    positive patients with a lobular carcinoma.
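
Each use case above amounts to a filter over patient records. As a minimal sketch, the "triple negative" question could be expressed as follows. The record class, its field names, and the sample records are hypothetical and are not taken from the caTRIP information models.

```python
from dataclasses import dataclass


@dataclass
class BiomarkerRecord:
    # Hypothetical field names; not the actual caTRIP attributes.
    patient_id: str
    er_positive: bool
    pr_positive: bool
    her2_neu_positive: bool


def is_triple_negative(record: BiomarkerRecord) -> bool:
    """True when ER, PR, and HER2/neu are all negative."""
    return not (record.er_positive or record.pr_positive
                or record.her2_neu_positive)


def find_triple_negative(records):
    """Return the ids of all triple negative patients, in input order."""
    return [r.patient_id for r in records if is_triple_negative(r)]


# Made-up records, for illustration only.
SAMPLE_RECORDS = [
    BiomarkerRecord("P1", False, False, False),
    BiomarkerRecord("P2", True, False, False),
    BiomarkerRecord("P3", False, False, True),
    BiomarkerRecord("P4", False, False, False),
]
```

In caTRIP the records would come from several grid services rather than one in-memory list, so this only illustrates the shape of the question.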

17
caTRIP Connecting disparate data systems
CAE: Pathology Biomarkers
Tumor Registry: Diagnosis, Treatment, Recurrence,
Follow-up
caTissue CORE: Tissue Bank
caTIES: Pathology Reports
caIntegrator: SNP Data
[Diagram labels: caTRIP on each connection, MRN]
18
caTRIP Architecture overview
[Architecture diagram; labels as transcribed]
Components: GUI, Distributed Query Engine
Interactions: query, authenticate, discover, authorize
Domain Grid Services: CAE, caTissue CORE, CGEMS SNP,
caTIES, TR
Core Grid Services: IdPService, GridGrouper,
IndexService
Duke: caTIES, TR, caTissue CORE, CAE, caIntegrator,
Domain Controller, Illumina, MAW3, Tumor Registry
19
caBIG Compatibility
  • Interoperability, compatibility, syntactics, and
    semantics

20
caBIG compatibility Interoperability defined
Courtesy Charlie Mead
  • ability of a system to access and use the parts
    or equipment of another system

Semantic interoperability
Syntactic interoperability
21
caBIG compatibility How does this apply to caBIG?
  • Connect scientists and practitioners through a
    shareable and interoperable infrastructure
  • Develop standard rules and a common language to
    more easily share information (compatibility
    guidelines)
  • Build or adapt tools for collecting, analyzing,
    integrating, and disseminating information
    associated with cancer research and care.
  • The cancer community is united in its mission to
    eliminate suffering and death due to cancer. It
    is now connected by caBIG.

22
caBIG compatibility What is compatibility in
caBIG?
  • The four areas of the caBIG compatibility
    guidelines
  • Information Models - Individual types of data
    are rarely collected or presented in isolation.
    Rather, they are assembled into a contextual
    environment that includes closely and more
    distantly associated data and information. These
    associations and relationships can be presented
    in the form of an information model.
  • CDEs - Data that is collected on a given study or
    trial must be defined and described such that
    remote users of that data can understand what it
    means. These metadata descriptions are referred
    to as data elements.
  • Vocabularies and Ontologies - Biomedical
    information includes a substantial body of
    specialized concepts that are represented by
    terms. Agreement upon the basic concepts, terms
    and definitions that are inherent in all
    biomedical information is essential for achieving
    semantic interoperability.
  • Programming and Messaging Interfaces - Computer
    programs and the people who write them are able
    to access resources from other programs through
    programming and messaging interfaces. Each of
    these interfaces responds to a particular syntax
    for its communications. Agreement upon standards
    for these interfaces is necessary to overcome
    barriers to syntactic interoperability.

23
caBIG compatibility Levels of compatibility
  • The four levels of the caBIG™ compatibility
    guidelines
  • Legacy - Implies no interoperability with an
    external system or resource. A system that was
    designed without awareness of or prior to the
    availability of these compatibility guidelines,
    and which does not meet any of the requirements
    for interoperability.
  • Bronze - Classifies the minimum requirements that
    must be met to achieve a basic degree of
    interoperability.
  • Silver - A rigorous set of requirements that,
    when met, significantly reduce the barrier to use
    of a resource by a remote party who was not
    involved in the development of that resource.
  • Gold - Currently being defined by caBIG. Is
    expected to provide for a formalized grid
    architecture and data standards that will enable
    standardized advertising, discovery, and use of
    all federated caBIG resources.

24
caBIG compatibility caBIG compatibility
guidelines
Syntactic
Semantic
Semantic Syntactic
25
caBIG compatibility Syntactic interoperability
  • The solution for syntactic interoperability in
    caBIG at the silver level of compatibility is for
    all systems to provide an Object Oriented
    Application Programmer Interface (API).
  • Object Oriented Interfaces can be implemented in
    many programming languages.
  • This interface can be connected to the caGrid so
    that the local data repository is globally
    accessible in a language independent way.
  • The interface is described by an information
    model, which acts as the junction between the
    syntactic components and the semantic components.

26
caBIG compatibility Programming and messaging
interfaces
  • Types of APIs
  • Client APIs in a programming language
  • Messaging APIs via a messaging protocol
  • Types of systems
  • Data services provide access to an information
    model
  • Query method
  • Associations are traversable
  • Analytical services provide methods to manipulate
    data
  • Hybrid services provide methods to manipulate
    information models
  • Analytical tools: consumers of Silver-compatible
    data that don't produce it
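
A data service, as described above, exposes an information model through a query method and lets clients traverse associations. The sketch below illustrates that idea in memory only. The classes, attribute names, and sample data are hypothetical, and this is not the caGrid or caCORE API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Specimen:
    specimen_id: str
    tissue_site: str


@dataclass
class Participant:
    participant_id: str
    # Association from Participant to Specimen; clients can traverse it.
    specimens: List[Specimen] = field(default_factory=list)


class InMemoryDataService:
    """Toy data service: one query method over a small information model."""

    def __init__(self, participants):
        self._participants = list(participants)

    def query(self, predicate: Callable[[Participant], bool]) -> List[Participant]:
        return [p for p in self._participants if predicate(p)]


def participants_with_site(service, site):
    """Query by traversing the Participant -> Specimen association."""
    matches = service.query(
        lambda p: any(s.tissue_site == site for s in p.specimens))
    return [p.participant_id for p in matches]


# Made-up data, for illustration only.
DEMO_SERVICE = InMemoryDataService([
    Participant("A", [Specimen("S1", "breast")]),
    Participant("B", [Specimen("S2", "lung"), Specimen("S3", "breast")]),
    Participant("C", []),
])
```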

27
caBIG compatibility Programming and messaging
interfaces details
  • Legacy - No programmatic interfaces to the system are available. Only local data files in a custom format can be read. Data transfer mechanisms implemented only on an ad hoc basis. Examples: Executables
  • Bronze - Programmatic access to data from an external resource is possible. Examples: Proprietary API/data format
  • Silver - Well-described APIs provide access to data in the form of data objects. Standards-based electronic data formats are supported for both input to and output from the system. Standards-based messaging protocols are supported wherever messaging is relevant. Examples: JavaDocs; XML, ASN.1; SOAP, CORBA
  • Gold - All features of Silver, plus: Service-oriented components produce or consume resources in the form of grid services. Interoperable with data grid architecture to be defined by caBIG. Examples: Globus, caGrid-based services
28
caBIG compatibility caTRIP API
Hyperlinks to caTRIP API docs
29
caBIG compatibility caTRIP grid service WSDL
Hyperlinks to caTRIP API WSDL
30
caBIG compatibility caTRIP grid service WSDL
Hyperlinks to caTRIP FQP UML
31
caBIG compatibility Semantic interoperability
  • The Solution for semantic interoperability lies
    in object oriented UML design of the service, an
    unambiguous description of elements within the
    system and storage of the description in a
    publicly accessible repository (metadata).
  • UML model
  • Use of publicly accessible terminologies/
    vocabularies/ontologies (EVS-NCI Thesaurus)
  • Use of publicly accessible metadata repository
    (caDSR)

32
caBIG compatibility Common data element (CDE)
details
  • Legacy - No structured metadata is recorded.
  • Bronze - Data element descriptions have sufficient detail for a subject matter expert to unambiguously interpret. Data elements are built using controlled terminology. Metadata is stored and publicized in an electronic format that is separate from the resource that is being described.
  • Silver - Common Data Elements (CDEs) built from controlled terminologies and according to practices validated by the VCDE workspace are used throughout. CDEs are registered as ISO/IEC 11179 metadata components in the cancer Data Standards Repository (caDSR).
  • Gold - All features of Silver, plus: Common Data Elements (CDEs) designated as caBIG Standards by the VCDE workspace are used. Metadata is advertised and discoverable via the caBIG grid services registry.
  • Examples (Legacy through Gold): Free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus
33
Enterprise Vocabulary Services
caBIG compatibility Metadata stored in caDSR
  • Storage of Metadata
  • caDSR: cancer Data Standards Repository
  • Common Data Elements (CDEs)
  • Enable end-users to access information about data
    and services without having to access human
    developers
  • Fusion of UML models and Concepts/Definitions

34
caBIG compatibility caTRIP CDEs
Hyperlinks to caTRIP CDEs
35
caBIG compatibility Vocabulary/terminology
details
  • Legacy - Free text used throughout for data collection.
  • Bronze - Use of publicly accessible controlled vocabularies as well as local terminologies. Terminologies must include definitions of terms that meet caBIG VCDE workspace guidelines.
  • Silver - Terminologies reviewed and validated by the caBIG Vocabulary/Common Data Element (VCDE) Workspace used for all relevant data collection fields.
  • Gold - All features of Silver, plus: Full adoption of caBIG terminology standards as approved by the VCDE Workspace.
  • Examples (Legacy through Gold): Free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus
36
Enterprise Vocabulary Services
caBIG compatibility Publicly accessible
terminologies
  • Controlled vocabulary resources for the cancer
    research community
  • Vocabulary Products and Services
  • NCI Thesaurus
  • NCI Metathesaurus
  • External Vocabularies
  • NCI Thesaurus - controlled vocabulary source for
    metadata
  • Has excellent coverage of cancer terminology
  • Expands based on needs for additional terminology
  • Based on concepts rather than terms
  • Each concept has a unique identifier or CUI with
    definitions and synonyms
  • Housed by the Enterprise Vocabulary Service (EVS)
  • LexBIG
  • a caBIG-funded vocabulary server to enable a
    Federated Vocabulary environment.

37
caBIG compatibility caTRIP CDEs
Hyperlinks to a caTRIP concept
38
caBIG compatibility Information model (UML)
details
  • Legacy - No model describing the system is available in electronic format.
  • Bronze - Diagrammatic representation of the information model is available in electronic format.
  • Silver - Information models are defined in UML as class diagrams and are reviewed and validated by the VCDE workspace.
  • Gold - All features of Silver, plus: Information models are harmonized across the caBIG Domain Workspaces.
  • Examples: Database diagram
39
caBIG compatibility Domain information modeling
  • A Domain Information Model is a representation of
    our understanding of an area of knowledge.
  • Domain Information Models consist of Classes
    that represent things in the real world
  • Classes contain attributes that are
    characteristics of different instances of things
    in the real world.
  • Relationships between the classes are described
    by associations and indicated by lines with
    directionality and cardinality
  • Each class plus attribute creates one Common Data
    Element (CDE)
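
The rule that each class plus attribute yields one CDE can be illustrated with a small sketch. The two class names appear in the Tumor Registry model, but the attributes chosen here are hypothetical, and the function is only an illustration, not how caDSR registers elements.

```python
from dataclasses import dataclass, fields


@dataclass
class Participant:
    race: str
    sex: str


@dataclass
class Diagnosis:
    histology: str
    diagnosis_date: str
    participant: Participant  # association to another class, not an attribute


def enumerate_cdes(*classes):
    """Return one (class name, attribute name) pair per simple attribute.

    Fields whose type is another class in the model are treated as
    associations and skipped.
    """
    model_names = {cls.__name__ for cls in classes}
    pairs = []
    for cls in classes:
        for f in fields(cls):
            type_name = f.type if isinstance(f.type, str) else getattr(f.type, "__name__", "")
            if type_name in model_names:
                continue
            pairs.append((cls.__name__, f.name))
    return pairs
```

For this toy model the function yields four pairs, two per class, and the `participant` association contributes none.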

40
caBIG compatibility Tumor Registry model
Diagnosis
Participant

Collaborative Staging
Follow up and Recurrence
Hyperlinks to caTRIP UML
Treatment
41
Building caBIG Compatible Systems
42
Building caBIG compatible systems Steps for
creating an analytical system
  • Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
  • Step 2: implement the analytical system
  • Implement an interface
  • Map data objects to existing inputs
  • Plug-in analytics
  • Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a
    service
  • Configure the service
  • Deploy
  • Step 4: invoke the service
  • Java-based client
  • Use caTRIP
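
Step 2 for an analytical system (implement an interface, map data objects to existing inputs, plug in the analytics) might look roughly like the sketch below. The deck describes a Java-based stack, so this Python version is only an illustration, and every class and function name in it is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ExpressionValue:
    # Hypothetical domain object from the information model.
    gene_symbol: str
    value: float


class Analytic(ABC):
    """Interface that each plug-in analytic implements."""

    @abstractmethod
    def run(self, values: List[float]) -> float:
        ...


class MeanAnalytic(Analytic):
    """Stand-in for an existing routine that works on plain numbers."""

    def run(self, values: List[float]) -> float:
        return mean(values)


def invoke(analytic: Analytic, objects: List[ExpressionValue]) -> float:
    # Map domain objects to the plain inputs the existing routine expects.
    return analytic.run([o.value for o in objects])
```

Keeping the mapping in `invoke` means the existing routine does not need to know about the domain objects.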

43
Building caBIG compatible systems Steps for
creating a data system
  • Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
  • Step 2: implement the information system
  • Model the databases (via scripts or EA)
  • Build the database
  • Generate Java beans
  • Create Hibernate mappings
  • Jar it all up
  • Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a
    service
  • Configure the service
  • Deploy
  • Step 4: invoke the service
  • Java-based client
  • Use caTRIP
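
Step 2 for a data system (build the database, generate beans, create object-relational mappings) can be sketched in miniature. The slides name Java beans and Hibernate; the version below swaps in Python's standard `sqlite3` module and a hand-written mapping dictionary purely for illustration. The table, columns, and rows are made up.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Participant:
    # Plays the role of a generated bean; attribute names are hypothetical.
    mrn: str
    sex: str


# Hand-written stand-in for an object-relational mapping file:
# attribute name -> column name.
PARTICIPANT_MAPPING = {
    "table": "participant",
    "columns": {"mrn": "mrn", "sex": "sex"},
}


def build_database():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE participant (mrn TEXT PRIMARY KEY, sex TEXT)")
    conn.executemany("INSERT INTO participant VALUES (?, ?)",
                     [("MRN1", "F"), ("MRN2", "M")])
    return conn


def load_participants(conn):
    """Read rows and turn them into objects using the mapping."""
    cols = PARTICIPANT_MAPPING["columns"]
    sql = "SELECT {} FROM {} ORDER BY mrn".format(
        ", ".join(cols.values()), PARTICIPANT_MAPPING["table"])
    return [Participant(**dict(zip(cols.keys(), row)))
            for row in conn.execute(sql)]
```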

44
Building caBIG compatible systems N-tier
architecture
[Architecture diagram; labels as transcribed]
Index Service (advertise, advertise)
Distributed Query Engine
CQL Query
caGrid Data Service
caCORE SDK
CQL Engine
domain model
Object-relational mapping
database
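
The path from a query down to domain objects can be pictured with a toy object query. The sketch below is not CQL syntax and not the caCORE SDK. It is a simplified stand-in, with hypothetical names, that selects objects by target class and one attribute value.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    # Hypothetical domain object; attribute names are assumptions.
    mrn: str
    histology: str


@dataclass
class ObjectQuery:
    """Simplified stand-in for an object query: target class plus one attribute test."""
    target: str
    attribute: str
    value: str


def execute(query: ObjectQuery, objects):
    """Return the objects of the target class whose attribute equals the value."""
    return [o for o in objects
            if type(o).__name__ == query.target
            and getattr(o, query.attribute, None) == query.value]
```

A real engine would translate the query into database access through the object-relational mapping instead of scanning a list.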
45
Building caBIG Compatible Systems caCORE SDK
Vocabularies
Info Model
Common Data Elements
Messaging Interfaces/ API
46
caBIG compatibility Mapping UML to CDEs
47
caBIG compatibility Mapping UML to CDEs example
Class: Gene
Attribute: entrezGeneID
Datatype: String
Created Data Element: Gene Entrez Gene Genomic Identifier
java.lang.String
(Gene + Entrez Gene Genomic Identifier +
java.lang.String)
48
caBIG compatibility Use SIW to designate
existing CDEs
49
caGrid
  • Background, service creation, metadata

50
caGrid What is caGrid?
  • What is Grid?
  • Evolution of distributed computing to support
    sciences and engineering
  • Sharing of resources (computational, storage,
    data, etc)
  • Secure Access (global authentication, local
    authorization, policies, trust, etc.)
  • Open Standards
  • Virtualization
  • What is caGrid?
  • Development project of Architecture Workspace
  • Helping define and implement Gold Compliance
  • Implementation of Grid technology
  • Leverages open standards, community open source
    projects
  • No requirements on implementation technology
    necessary for compliance
  • Specifications will be created defining
    requirements for interoperability
  • caGrid provides core infrastructure, and tooling
    to provide a way to achieve Gold compliance
  • Gold compliance creates the G in caBIG
  • Gold => Grid => connecting Silver Systems

51
caGrid Metadata infrastructure goals
  • Support strongly typed grid
  • Syntactic and Semantic interoperability
  • Programmatic!
  • Smooth transition from Application to Grid and
    back
  • Leverage wealth of existing metadata
  • Enable service Advertisement and Discovery

52
caGrid Service development process
  • Service developers first create a service using a
    simple wizard to specify information (target
    directory, type of service, service name, etc)
  • Next, developers locate the data types they will
    use for inputs or outputs
  • Can be discovered from the caDSR, GME, file
    system, etc
  • Operations are then defined that take some number
    of the data types as input, and produce some
    number as output
  • Metadata and Service Properties can be added and
    configured
  • The service's security can be completely
    configured
  • Some or all of these steps may be automatically
    handled by extensions

53
caGrid Introduce
  • GUI for creating and manipulating a grid service
  • Provides means of simple creation of service
    skeleton that a developer can then implement,
    build, and deploy
  • Automatic code generation of complete caBIG
    compliant grid service which is configured to
    provide
  • Advertisement
  • Standard Metadata
  • Security
  • Complete Client API

54
caGrid Steps for creating a data system
  • Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
  • Step 2: implement the information system
  • Model the databases (via scripts or EA)
  • Build the database
  • Generate Java beans
  • Create Hibernate mappings
  • Jar it all up
  • Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a
    service
  • Configure the service
  • Deploy
  • Step 4: invoke the service
  • Java-based client
  • Use caTRIP

55
caGrid Steps for creating an analytical system
  • Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
  • Step 2: implement the analytical system
  • Implement an interface
  • Map data objects to existing inputs
  • Plug-in analytics
  • Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a
    service
  • Configure the service
  • Deploy
  • Step 4: invoke the service
  • Java-based client
  • Use caTRIP

56
caGrid caGrid data description infrastructure
  • Client and service APIs are object oriented, and
    operate over well-defined and curated data types
  • Objects are defined in UML and converted into
    ISO/IEC 11179 Administered Components, which are
    in turn registered in the Cancer Data Standards
    Repository (caDSR)
  • Object definitions draw from controlled
    terminology and vocabulary registered in the
    Enterprise Vocabulary Services (EVS), and their
    relationships are thus semantically described
  • XML serialization of objects adheres to XML
    schemas registered in the Global Model Exchange
    (GME)

57
caGrid Metadata services
  • Cancer Data Standards Repository (caDSR)
  • caBIG projects register their data models as
    Common Data Elements (CDEs), which are
    semantically harmonized and then centrally stored
    and managed in the caDSR
  • The caDSR grid service provides
  • Model discovery and traversal
  • caGrid standard metadata generation capabilities
  • Enterprise Vocabulary Services (EVS)
  • EVS is a set of services and resources that
    address the need for controlled vocabulary
  • The EVS grid service provides
  • Query access to the data semantics and controlled
    vocabulary managed by the EVS
  • Global Model Exchange (GME)
  • GME is a DNS-like data definition registry and
    exchange service that is responsible for storing
    and linking together data models in the form of
    XML schema.
  • The GME grid service provides
  • Access to the authoritative structural
    representation of data types on the grid
  • Globus Information Services Index Service
  • The Globus Information Services infrastructure
    provides a generic framework for aggregation of
    service metadata, a registry of running Grid
    services, and a dynamic data-generating and
    indexing node, suitable for use in a hierarchy or
    federation of services
  • The Index grid service provides
  • Yellow and white pages for the grid

58
caGrid caGrid production environment
59
The Cancer Translational Research Informatics
Platform (caTRIP)
  • Demonstration

60
caTRIP Clinical and research scenarios
  • Clinical scenario for demonstration
  • A patient enters the clinic and is diagnosed with
    a lobular carcinoma
  • The Her2/Neu biomarker test comes back positive
  • What are the treatments and outcomes of other
    patients with similar characteristics?
  • Query for diagnosis date, treatment, treatment
    date, survival, recurrence, and BRCA1 and BRCA2
    status
  • Look for treatments given with success and
    correlation between BRCA status in case test
    should be ordered
  • Research scenario for demonstration
  • Is there a correlation between recurrence,
    mortality, histologic grade, and Her2/Neu status
    for breast cancer patients diagnosed with lobular
    carcinoma?
  • Query caTRIP for recurrence type, date of death,
    histologic grade, and Her2/Neu status for
    patients diagnosed with lobular carcinoma
  • Correlation is determined in Microsoft Excel
  • Investigate gene biomarkers that correlate with a
    Her2/Neu status of negative and survival
  • Query caTRIP for all available tissue to order
    for microarray experiments
  • Query sharing
  • What are all the triple negative patients?

61
caTRIP Why the Simple GUI?
  • What are all the tissue specimens from HER2/neu
    positive patients that have a primary tumor in
    the breast and are BRCA1 positive?

caTissue CORE
CAE
Participant Medical Record Number
CGEMS
Tumor Registry
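
The question on this slide spans several services that share the participant medical record number (MRN). A rough sketch of the underlying join follows. Which service holds which fact is an assumption made for illustration, all data is made up, and the function names are hypothetical.

```python
def join_on_mrn(*result_sets):
    """Return the MRNs present in every result set, sorted."""
    if not result_sets:
        return []
    common = set(result_sets[0])
    for result_set in result_sets[1:]:
        common &= set(result_set)
    return sorted(common)


def specimens_for(mrns, specimens_by_mrn):
    """Collect specimen ids for the given MRNs, in MRN order."""
    return [s for m in mrns for s in specimens_by_mrn.get(m, [])]


# Made-up per-service results; assignments to services are assumptions.
HER2_POSITIVE = ["MRN1", "MRN2", "MRN3"]        # e.g. from CAE
BREAST_PRIMARY = ["MRN2", "MRN3"]               # e.g. from Tumor Registry
BRCA1_POSITIVE = ["MRN3", "MRN4"]               # e.g. from CGEMS
TISSUE_BANK = {"MRN3": ["SP-7", "SP-9"], "MRN4": ["SP-2"]}  # e.g. caTissue CORE
```

A simple GUI hides this join: the user states the question once, and the query engine works out which services to ask and how to link their answers.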
62
Discussion/questions
63
Backup Slides
64
CTMS Interoperability Project
  • Goals, scope, BRIDG, architecture, demo

65
CTMSi A collaborative effort
  • 11 Organizations
  • Booz Allen Hamilton
  • Dana-Farber
  • Duke University
  • Ekagra
  • Harvard University
  • Mayo Clinic
  • NCICB
  • Nortel Government Solutions
  • Northwestern University
  • ScenPro
  • SemanticBits
  • 8 Locations
  • Maryland
  • Minnesota
  • Virginia
  • Georgia
  • Massachusetts
  • 35 Team Members / 5 Applications
  • Cancer Central Clinical Participant Registry
    (C3PR)
  • Cancer Central Clinical Database (C3D)
  • Patient Study Calendar (PSC)
  • caXchange LabViewer and the Clinical Trials
    Object Model (CTOM)
  • Cancer Adverse Events Reporting System (caAERS)
  • 8 Roles
  • Analysts
  • Architects
  • Developers
  • Project Director
  • Project Manager
  • Project Sponsor
  • Project Tech Leads
  • Subject Matter Experts

66
CTMSi Credits
  • Project Director
  • Meg Gronvall (BAH)
  • Charles N. Mead, M.D. (BAH)
  • NCICB CTMS Lead
  • Christo Andonyadis, D.Sc. (NCICB)
  • Project Manager
  • Edmond Mulaire (SemanticBits)
  • Project Architects
  • Patrick McConnell (Duke)
  • Niket Parikh (BAH)
  • Analysts
  • Smita Hastak (ScenPro)
  • Wendy Ver Hoef (ScenPro)
  • Subject Matter Experts
  • Project Technical Leads
  • Ram Chilukuri (SemanticBits)
  • Charles Griffin (Ekagra)
  • Vinay Kumar (SemanticBits)
  • Stephen Reckford (Nortel Government Solutions)
  • Rhett Sutphin (Northwestern)
  • Sean Whitaker (Northwestern)
  • caAERS: Ram Chilukuri (SemanticBits), Krikor
    Krumlian (Akaza Research), Vinay Kumar
    (SemanticBits), Rhett Sutphin (Northwestern),
    Kulasekaran Sethumadhavan (SemanticBits), Sujith
    Thayylithodi (SemanticBits)
  • caGrid: Manav Kher (SemanticBits), Vinay Kumar
    (SemanticBits), Joshua Phillips (SemanticBits)
  • caXchange (Lab Viewer/CTOM): Charles Griffin
    (Ekagra), Smita Hastak (ScenPro), Mukesh
    Mediratta (Ekagra), Kunal Modi (Ekagra), Wendy
    Ver Hoef

67
CTMSi Goal
Lab Results
Participant Registration
Patient Scheduling
Adverse Events
Clinical Trials DB
68
CTMSi BRIDG extract
Labs
Subject
AdverseEvents
Eligibility
Study
Site
69
(No Transcript)
70
CTMSi Architectural overview
Authentication, Trust, Authorization
Messages
caXchange
caGrid
Enterprise Service Bus
Inbound Binding Component
Outbound Binding Component
Routing Rules
GTS
Dorian
Grid Grouper
71
CTMSi Demonstration
72
Service Metadata All Services
  • Common Service Metadata
  • Provided by all services
  • Details the service's capabilities, operations,
    contact information, hosting research center
  • Service operations' inputs and outputs defined in
    terms of structure and semantics extracted from
    caDSR and EVS
  • Majority auto-generated by Introduce

73
Service Metadata Service Security
  • Service Security Metadata
  • Provided by all services
  • Details the service's requirements on the
    communication channel for each operation
  • Can be used by client to programmatically
    negotiate an acceptable means of communication
  • For example: does operation X allow anonymous
    clients, or are credentials required?
  • Auto-generated by Introduce
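
A client could use security metadata of this kind to decide, before calling, whether an operation is open to it. The sketch below assumes a simple dictionary shape for that metadata. The shape, the operation names, and the function are hypothetical and do not reflect the actual caGrid metadata schema.

```python
# Hypothetical per-operation security requirements.
SECURITY_METADATA = {
    "getServiceMetadata": {"anonymous_allowed": True, "transport": "https"},
    "query": {"anonymous_allowed": False, "transport": "https"},
}


def can_invoke(operation, has_credentials, metadata=SECURITY_METADATA):
    """Return True if a client may call the operation.

    Unknown operations are refused. Anonymous clients are accepted only
    where the metadata says so.
    """
    requirements = metadata.get(operation)
    if requirements is None:
        return False
    return requirements["anonymous_allowed"] or has_credentials
```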

74
Service Metadata: Data Service
  • Data Service Metadata
  • Provided by all data services
  • Describes the Domain Model being exposed, in
    terms of a UML model linked to semantics
  • Provides information needed to formulate the
    Object-Oriented Query
  • As with common metadata, data types are defined
    in terms of structure and semantics extracted
    from caDSR and EVS
  • Auto-generated by Introduce

75
caTRIP in-depth Architecture: Security
authorization
User Grid Certificate
Grid Data Service
authentication
User Credentials
SAML Assertion
Dorian
CSM
Trust Fabric
caGrid Authentication Service
backend data
GridGrouper
Duke Authentication Plugin
Duke Domain Controller, NT Security
76
caTRIP in-depth Data sharing: Challenges in data sharing
  • Building data-oriented systems
  • Duke requires IRB approval to gain access to
    identifiable data
  • We worked around this by leveraging people already
    on IRB protocols
  • Deidentifying data
  • Data is owned by different groups across the
    cancer center
  • Traditional deidentification: a data manager
    deidentifies an entire dataset, then throws away
    the key
  • Distributed deidentification: a trusted service
    provider (TSP) deidentifies discrete values
  • The traditional approach is not scalable: it
    requires a middle-man
  • IRB approval is required for the distributed
    approach because it deviates from traditional
    deidentification (at Duke)

77
caTRIP in-depth Data sharing: Distributed deidentification
[Figure: two parties, each with IRB approval to see identifiable data, exchange MRN3 and GHI789 with the Trusted Service Provider over a secure connection. The Trusted Service Provider has IRB approval to store identifiable data.]
Trusted Service Provider lookup table (DEID values are randomly generated). The table initially holds MRN1 and MRN2; MRN3 is then added:
PHI    DEID
MRN1   ABC123
MRN2   DEF456
MRN3   GHI789
. . . . . .
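
A minimal sketch of the lookup the Trusted Service Provider performs, assuming an in-memory table and six-character tokens like those in the figure. The class and method names are hypothetical; a real TSP would persist the mapping and sit behind an authenticated, encrypted connection.

```python
import secrets
import string


class TrustedServiceProvider:
    """Toy TSP: maps each identifier (PHI) to a randomly generated token (DEID)."""

    _ALPHABET = string.ascii_uppercase + string.digits

    def __init__(self, token_length=6):
        self._token_length = token_length
        self._phi_to_deid = {}   # the PHI -> DEID table from the figure
        self._issued = set()     # tokens already handed out

    def deidentify(self, phi):
        """Return the stable DEID for one discrete value, creating it if new."""
        if phi not in self._phi_to_deid:
            token = self._new_token()
            self._phi_to_deid[phi] = token
            self._issued.add(token)
        return self._phi_to_deid[phi]

    def _new_token(self):
        # Retry on the rare collision so two identifiers never share a token.
        while True:
            token = "".join(
                secrets.choice(self._ALPHABET) for _ in range(self._token_length)
            )
            if token not in self._issued:
                return token


tsp = TrustedServiceProvider()
first = tsp.deidentify("MRN3")    # request from one data owner
second = tsp.deidentify("MRN3")   # request from a different data owner
```

Because both requests for `MRN3` return the same token, records held by different groups can still be joined on the deidentified value without either group releasing the medical record number.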
78
caTRIP in-depth Architecture: Simple GUI configuration
[Figure: configuration of foreign associations between Service A and Service B. Labels: Target; Filter Object; Linking Object Join Condition; Join Condition CDE (ex. MRN); Associated Classes; Associated Object Tree; Association Direction; Foreign Association; Foreign Association Inbound Paths; Foreign Association Outbound Path. Classes shown: TissueSpecimen, SpecimenCollectionGroup, BreastCancerBiomarkers, ClinicalReport, ParticipantMedicalIdentifier, SpecimenCharacteristics.]
79
caTRIP in-depth Architecture: caBIG compatibility
  • Challenge
  • Silver-compatibility is in some ways (and for
    good reason) stringent
  • Grid technologies were still in development
    (caGrid 1.0 is now released)
  • caTRIP is a silver-compatible application (in
    theory)
  • Compatibility submission package completed
  • Going through review now for silver-compatible
    data services
  • caTRIP leverages caCORE technologies
  • Common Security Module (CSM) provides
    authorization
  • caCORE SDK provides tooling to create Java
    classes from UML (XMI), XML schemas, and Castor
    mappings
  • caTRIP leverages caGrid technologies
  • Index Service provides advertisement and
    discovery
  • Authentication Service provides
  • Dorian helps provide authentication
  • GTS provides trust fabrics

80
Next steps
  • Aggregate data from multiple services of the same
    type
  • Scenario: caTissue Suite deployed at 13 cancer
    centers
  • Add datasets and data types
  • CTMS, population sciences, basic science, etc.
  • Add analytical services
  • Integrate with workflow
  • Add visualization components
  • Enhanced reporting
  • Automate Excel pivot table
  • Data mining results
  • Enhanced querying
  • Asynchronous, parallel querying
  • Querying multiple deployed distributed query
    services
  • Continue refinement of user interface
  • Synchronization of advanced and simple GUI
  • Additional usability features

81
caGrid: caBIG Resources
  • caBIG Website: http://cabig.cancer.gov/index.asp
  • caBIG Compatibility Guidelines:
    https://cabig.nci.nih.gov/compatibility_guidelines_documentation/
  • Cancer Common Ontologic Representation
    Environment (caCORE):
    http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview
  • Enterprise Vocabulary Services (EVS):
    http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary
  • Cancer Data Standards Repository (caDSR):
    http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
  • caCORE Software Developers Kit (caCORE SDK):
    http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk
  • caCORE Training:
    http://ncicb.nci.nih.gov/NCICB/training/cadsr_training
  • Model Driven Architecture: http://www.omg.org/mda/
  • UML Modeling:
    http://www.sparxsystems.com.au/UML_Tutorial.htm

82
caTRIP: Why can't I just write DCQL?
  • What are all the tissue specimens from HER2/neu
    positive patients that have a primary tumor in
    the breast and are BRCA1 positive?
<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
    <Association name="edu.wustl.catissuecore.domainobject.impl.SpecimenCollectionGroupImpl" roleName="specimenCollectionGroup">
      <Association name="edu.wustl.catissuecore.domainobject.impl.ClinicalReportImpl" roleName="clinicalReport">
        <Association name="edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl" roleName="participantMedicalIdentifier">
          <Group logicRelation="AND">
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CAE">
                <Association name="edu.duke.catrip.cae.domain.general.Participant" roleName="participant">

Select tissue
Foreign Join w/ CAE
HER2/NEU Positive
Foreign Join w/ Tumor Registry
Primary Site Breast
Foreign Join w/ CGEMS
BRCA1 Positive
83
caTRIP: Distributed query engine
[Figure: the Distributed Query Engine accepts DCQL and issues CQL to three caGrid data services, each backed by a database. Each data service returns data objects to the engine, which in turn returns data objects.]
84
CTMSi BRIDG dynamic modeling
  • Process flow
  • Storyboards
  • Scenarios
  • Use cases
  • Text UML activity diagrams
  • Links to static structures
  • Interaction diagrams (?)
  • Sequence diagrams
  • Collaboration diagrams (UML 2.0)

85
CTMSi Patient registration message
JMS OUT Queue
ESB
Router
caAERS Grid Service
JMS IN Queue
PSC Grid Service
86
caBIG compatibility: CDE Browser
87
caBIG compatibility: CDE Browser permissible values
88
caBIG compatibility: NCI Thesaurus
Concept Code
Relationships
Preferred Name
Definition
Synonyms
89
caGrid: caGrid community involvement
  • caGrid itself provides no real data or
    analysis to caBIG
  • It's the enabling infrastructure which allows the
    community to do so
  • Community members add value to the grid as
    applications, services, and processes (for
    example, shared workflows)
  • caGrid provides the necessary core services,
    APIs, and tooling
  • The real value of the grid comes from bringing
    this information to the end user
  • Data Services expose data to the grid in a
    unified way
  • Analytical Services expose analytical operations
    to the grid
  • Community members develop end user applications
    which consume the resources provided by the
    grid

90
caGrid: caGrid exposing silver systems
  • Object-oriented APIs and data resources are
    developed using Object types and information
    models registered in the caDSR
  • These silver systems are grid-enabled by
    defining a grid service interface that defines
    the functionality to be exposed to the grid
  • The grid service interface uses the same Object
    types as the existing system, but leverages a
    platform and language neutral representation
    (XML) of them
  • The grid service implementation maps service
    invocations to API calls or queries into the
    existing system

91
caGrid: Federated Query Processor
  • Provides a mechanism to perform basic distributed
    aggregations and joins of queries over multiple
    data services
  • As caGrid data services all use a uniform query
    language, CQL, the Federated Query Infrastructure
    can be used to express queries over any
    combination of caGrid data services
  • Federated queries are expressed with a query
    language, DCQL, which is an extension to CQL to
    express such concepts as joins, aggregations, and
    target services
  • Because it is implemented as a stateful grid
    service, queries may be executed asynchronously
    and results retrieved at a later time
  • Supports secure deployments wherein result
    ownership is enforced
  • Coupled with semantic discovery capabilities of
    caGrid, provides a powerful framework for data
    discovery, mining, and integration
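
One plausible way to picture a federated join is as a semi-join: run the foreign query on its own service, collect the join values, then keep only the target objects whose join property matches. The sketch below assumes this strategy and uses in-memory dictionaries in place of real data services; `run_cql`, `federated_query`, the service names, and the sample records are all hypothetical and are not the Federated Query Processor's API.

```python
def run_cql(service, target_type, predicate):
    """Stand-in for one CQL call: filter a single service's objects of one type."""
    return [obj for obj in service.get(target_type, []) if predicate(obj)]


def federated_query(services, target, foreign):
    """Return target objects whose join property matches some foreign object."""
    # Step 1: evaluate the foreign part of the query on its own service.
    foreign_hits = run_cql(
        services[foreign["service"]], foreign["type"], foreign["where"]
    )
    join_values = {obj[foreign["join_property"]] for obj in foreign_hits}
    # Step 2: evaluate the target part, keeping only objects that join.
    local_hits = run_cql(
        services[target["service"]], target["type"], target["where"]
    )
    return [obj for obj in local_hits if obj[target["join_property"]] in join_values]


# Made-up data standing in for two independent data services.
services = {
    "tissue": {"TissueSpecimen": [
        {"id": 1, "medicalRecordNumber": "ABC123"},
        {"id": 2, "medicalRecordNumber": "DEF456"},
        {"id": 3, "medicalRecordNumber": "GHI789"},
    ]},
    "cae": {"Participant": [
        {"medicalRecordNumber": "ABC123", "her2neu": "positive"},
        {"medicalRecordNumber": "DEF456", "her2neu": "negative"},
        {"medicalRecordNumber": "GHI789", "her2neu": "positive"},
    ]},
}

specimens = federated_query(
    services,
    target={"service": "tissue", "type": "TissueSpecimen",
            "where": lambda s: True, "join_property": "medicalRecordNumber"},
    foreign={"service": "cae", "type": "Participant",
             "where": lambda p: p["her2neu"] == "positive",
             "join_property": "medicalRecordNumber"},
)
```

Here `specimens` holds specimens 1 and 3, the two whose record numbers belong to HER2/neu positive participants in the second service.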

92
caGrid: Data service common query language
  • Specifies a target object (result) type and
    selects the instances which satisfy the specified
    properties and nested object properties
  • Allows path navigation
  • Provides logical grouping
  • Provides name/predicate/value filtering on
    properties of objects
  • Recursively defined
  • Can return full objects, a set of attributes, a
    count of results, or distinct attribute values
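
The properties listed above (attribute filtering, path navigation through associations, logical grouping, recursive definition) can be illustrated with a small evaluator over in-memory objects. This is a sketch, not the CQL engine: the dictionary-based query format is invented for the example, and `LIKE` is treated as a simple prefix match, which is an assumption about its semantics.

```python
def matches(obj, node):
    """Recursively test one object against a CQL-like query node."""
    kind = node["kind"]
    if kind == "attribute":
        actual = obj.get(node["name"])
        if node["predicate"] == "EQUAL_TO":
            return actual == node["value"]
        if node["predicate"] == "LIKE":
            # Assumption: LIKE is approximated as "starts with".
            return isinstance(actual, str) and actual.startswith(node["value"])
        raise ValueError(f"unsupported predicate: {node['predicate']}")
    if kind == "association":
        # Path navigation: follow the role, then recurse into the nested node.
        related = obj.get(node["role"])
        if related is None:
            return False
        items = related if isinstance(related, list) else [related]
        return any(matches(item, node["where"]) for item in items)
    if kind == "group":
        results = [matches(obj, child) for child in node["children"]]
        return all(results) if node["logic"] == "AND" else any(results)
    raise ValueError(f"unknown node kind: {kind}")


# Made-up objects standing in for a data service's contents.
genes = [
    {"symbol": "BRCA1", "taxon": {"scientificName": "Homo sapiens"}},
    {"symbol": "BRCA2", "taxon": {"scientificName": "Mus musculus"}},
    {"symbol": "TP53", "taxon": {"scientificName": "Homo sapiens"}},
]

query = {"kind": "group", "logic": "AND", "children": [
    {"kind": "attribute", "name": "symbol", "predicate": "LIKE", "value": "BRCA"},
    {"kind": "association", "role": "taxon", "where": {
        "kind": "attribute", "name": "scientificName",
        "predicate": "EQUAL_TO", "value": "Homo sapiens"}},
]}

selected = [gene["symbol"] for gene in genes if matches(gene, query)]
```

Because groups and associations contain further nodes, arbitrarily deep queries reduce to the same three cases.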

93
caGrid: Example CQL query
LIKE BRCA
Return all Genes with a symbol beginning with BRCA that have an associated Taxon with a scientificName equal to Homo sapiens
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>
Homo sapiens
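
The query on this slide can also be assembled programmatically rather than typed by hand. The sketch below uses only Python's standard `xml.etree.ElementTree`; the helper name `build_gene_query` is hypothetical, and the element and attribute names are copied from the slide rather than checked against the CQL schema.

```python
import xml.etree.ElementTree as ET

CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def build_gene_query(symbol_pattern, species):
    """Build the slide's Gene query as an XML string."""
    # Serialize the CQL namespace as the default xmlns, as on the slide.
    ET.register_namespace("", CQL_NS)

    def q(tag):
        return f"{{{CQL_NS}}}{tag}"

    root = ET.Element(q("CQLQuery"))
    target = ET.SubElement(root, q("Target"),
                           {"name": "gov.nih.nci.cabio.domain.Gene"})
    group = ET.SubElement(target, q("Group"), {"logicRelation": "AND"})
    ET.SubElement(group, q("Attribute"),
                  {"name": "symbol", "predicate": "LIKE", "value": symbol_pattern})
    taxon = ET.SubElement(group, q("Association"),
                          {"roleName": "taxon",
                           "name": "gov.nih.nci.cabio.domain.Taxon"})
    ET.SubElement(taxon, q("Attribute"),
                  {"name": "scientificName", "predicate": "EQUAL_TO",
                   "value": species})
    return ET.tostring(root, encoding="unicode")


query_xml = build_gene_query("BRCA", "Homo sapiens")
```

Generating the document from a function keeps quoting and nesting correct, which is easy to get wrong when the XML is written by hand.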
94
caBIG compatibility: Metadata and concepts example