Architecture of a Strongly Typed Grid and Lessons Learned from the Cancer Research Community - PowerPoint PPT Presentation



1
Architecture of a Strongly Typed Grid and Lessons
Learned from the Cancer Research Community
  • Joel Saltz MD, PhD
  • Departments of Biomedical Informatics and
    Pathology

2
Overview
  • What is caBIG?
  • What is caGrid?
  • Brief overview of the architecture and tooling
    of caGrid
  • Example caBIG application
  • Potential links to petascale computing

3
caBIG: Application-community-led effort
  • Originated at NCI
  • Biologists rather than computer scientists
    created the caBIG program

4
caBIG Overview
5
Sample caBIG Project Areas
Discovery Research
  • Challenge: A growing volume of increasingly
    complex data, but no system in place to collect,
    aggregate, analyze and distribute it.
  • Sample caBIG Tools
  • caArray
  • caWorkbench
  • webGenome
  • GenePattern
  • RProteomics
  • Proteomics LIMS
  • FunctionExpress
  • TrAPSS
  • Benefits
  • Access to and integrated analysis of data from
    divergent sources
  • Increased efficiency in analyzing and
    visualizing results
  • Accelerated discovery of molecular signatures

6
caBIG Projects
Imaging
  • Challenge: Existing systems provide no way to
    share or archive images to validate or
    facilitate diagnostics or prognostics.
  • caBIG Tools
  • Imaging Testbed
  • caIMAGE
  • National Cancer Imaging Archives
  • Benefits
  • Digitized format enables information to be
    integrated with other molecular and clinical
    data
  • Improved clinical decision support: more
    accurate, objective and reproducible

7
caBIG Projects
Infrastructure
  • Challenge: Lack of a common, connecting
    infrastructure requires every institution to
    duplicate data, applications and infrastructure
  • caBIG Products
  • caGrid
  • caCORE
  • caDSR
  • EVS
  • LexGrid
  • Benefits
  • Efficient use of limited resources
  • Appropriate access to key resources throughout
    the cancer enterprise
  • Community's knowledge is electronically
    accessible

8
caGrid Overview
  • Requirements
  • Support scientific requirements: use cases from
    cancer research community
  • Support functional requirements: identifiers,
    workflow, query, etc.
  • Support non-functional requirements: security,
    reliability, performance, etc.
  • Principles
  • Driven by cancer research community requirements
  • caBIG Principles
  • Open Source, Open Access, Open Development
  • Federated
  • Syntactic and Semantic Interoperability
  • Services-Oriented Architecture
  • Metadata driven and implements Virtualization
  • Standards based

9
caGrid 1.0 Conceptual View
10
Focus on three notable features
  • Metadata management infrastructure, curation
  • Tooling to help users produce new grid services
    and clients
  • Security (no time for details but crucial and
    controversial feature of biomedical grid)

11
Biomedical Information Objects
  • Data service infrastructure developed using OMG's
    Model Driven Architecture approach
  • Object models expressed in UML represent actual
    biomedical research entities such as genes,
    sequences, chromosomes, cellular
    pathways, ontologies, clinical protocols, etc.
  • The object models form the basis for uniform APIs
    (Java, SOAP, HTTP-XML, Perl) that provide an
    abstraction layer and interfaces for developers
    to access information without worrying about the
    back-end data stores
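
The abstraction-layer idea on this slide can be illustrated with a minimal Python sketch. It is not the caCORE API: the `Gene` class, the `GeneStore` interface, and the in-memory back end are all hypothetical names chosen for illustration. The point is only that client code is written against the object model and a uniform interface, so the back-end store can change without touching the client.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Gene:
    """Hypothetical domain object; a real one would be derived from a UML model."""
    symbol: str
    chromosome: str


class GeneStore(Protocol):
    """Hypothetical uniform interface; callers never see the back-end data store."""

    def find_by_symbol(self, symbol: str) -> list[Gene]: ...


class InMemoryGeneStore:
    """One possible back end. A relational or remote store could replace it."""

    def __init__(self, genes: list[Gene]) -> None:
        self._genes = list(genes)

    def find_by_symbol(self, symbol: str) -> list[Gene]:
        return [g for g in self._genes if g.symbol == symbol]


def chromosomes_for(store: GeneStore, symbol: str) -> list[str]:
    """Client code written only against the GeneStore interface."""
    return [g.chromosome for g in store.find_by_symbol(symbol)]


store = InMemoryGeneStore([Gene("RB1", "13"), Gene("TP53", "17")])
rb1_chromosomes = chromosomes_for(store, "RB1")
```

Swapping `InMemoryGeneStore` for another class with the same method leaves `chromosomes_for` unchanged, which is the property the slide attributes to the uniform APIs.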

biomedical objects
common data elements
controlled vocabulary
12
Common Data Elements
  • Structured data reporting elements
  • Precisely defining the questions and answers
  • What question are you asking, exactly?
  • What are the possible answers, and what do they
    mean?
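
As a rough illustration of "precisely defining the questions and answers", a common data element can be thought of as a question paired with an enumerated set of permissible values and their meanings. The sketch below is an assumption-laden toy, not the caDSR data model: the class name, the field names, and the example grade codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical CDE: one exact question plus its permissible answers."""
    question: str
    permissible_values: dict[str, str]  # answer code -> what the code means

    def validate(self, answer: str) -> str:
        """Return the meaning of a permissible answer, or reject the answer."""
        if answer not in self.permissible_values:
            raise ValueError(f"{answer!r} is not a permissible value")
        return self.permissible_values[answer]


# Illustrative element only; codes and wording are not taken from caDSR.
tumor_grade = CommonDataElement(
    question="What is the histologic grade of the tumor?",
    permissible_values={
        "G1": "Well differentiated",
        "G2": "Moderately differentiated",
        "G3": "Poorly differentiated",
    },
)
meaning = tumor_grade.validate("G2")
```

A free-text answer such as "high" would raise `ValueError`, which is the kind of ambiguity structured reporting is meant to rule out.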

13
Enterprise Vocabulary
  • NCI Metathesaurus (cross-maps standard
    vocabularies/ontologies, e.g. SNOMED, MedDRA,
    ICD)
  • Semantic integration, inter-vocabulary mapping
  • UMLS Metathesaurus extended with cancer-oriented
    vocabularies
  • 800,000 Concepts, 2,000,000 terms and phrases
  • Mappings among over 50 vocabularies
  • NCI Thesaurus
  • Description logic-based
  • 18,000 Concepts
  • Concept is the semantic unit
  • One or more terms describe a Concept (synonymy)
  • Semantic relationships between Concepts
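
The concept-centred structure listed above (a Concept as the semantic unit, several terms per Concept, and relationships between Concepts) can be sketched as a small lookup table. This is a toy under stated assumptions: the `Vocabulary` class, the concept identifiers `C1` and `C2`, and the `is_a` relation name are hypothetical and do not reflect the NCI Thesaurus format or any EVS API.

```python
from typing import Optional


class Vocabulary:
    """Toy concept store: terms map to concepts, concepts relate to concepts."""

    def __init__(self) -> None:
        self._concept_by_term = {}  # lowercased term -> concept id
        self._relations = set()     # (source id, relation name, target id)

    def add_concept(self, concept_id: str, terms: list[str]) -> None:
        """Register every term as a synonym for the same concept."""
        for term in terms:
            self._concept_by_term[term.lower()] = concept_id

    def concept_for(self, term: str) -> Optional[str]:
        """Resolve a term to its concept id, ignoring case."""
        return self._concept_by_term.get(term.lower())

    def relate(self, source: str, relation: str, target: str) -> None:
        self._relations.add((source, relation, target))

    def related(self, concept_id: str, relation: str) -> list[str]:
        """Return the targets of one relation type for a concept, sorted."""
        return sorted(t for s, r, t in self._relations
                      if s == concept_id and r == relation)


vocab = Vocabulary()
vocab.add_concept("C1", ["Neoplasm", "Tumor", "Tumour"])  # hypothetical ids
vocab.add_concept("C2", ["Retinoblastoma"])
vocab.relate("C2", "is_a", "C1")

same_concept = vocab.concept_for("tumor") == vocab.concept_for("Neoplasm")
parents = vocab.related("C2", "is_a")
```

Because queries resolve terms to concept identifiers first, two data sources that use different synonyms can still be joined on the same concept.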

14
Formal semantic curation process
  • (George Komatsoulis)

15
Introduce: Tooling to help users produce new grid
services and clients
  • A framework which enables fast and easy creation
    of caGrid compatible services whether they are
    data, analytical, custom, or core services.
  • Provide easy to use graphical service authoring
    tools.
  • Hide all grid-ness from the developer so that
    they can concentrate on the domain expert
    implementation.
  • Utilize best practice layered grid service
    architectures.
  • Handle all service architecture requirements of
    the caGrid.
  • Strong service interface data typing
  • Metadata and service registration
  • Grid security integration

16
Introduce
  • Requirements
  • Basic strongly typed grid requirements plus
    semantically interoperable caBIG requirements
  • Architecture
  • Grid service framework which is encapsulated and
    layered on Globus
  • Introduce Toolkit
  • Enables easy development of caBIG compliant grid
    service

17
Addressing the Requirements
  • Tool providers will describe the grid service
    interface they wish to provide.
  • Clients will not need to be aware of any
    implementation-specific details of the grid
    service
  • Introduce will enable schema extraction from a
    GME so that the WSDL, beans, and service metadata
    can be populated automatically, so the service
    uses strongly typed and publicly accessible
    data types
  • Build process will automatically generate a
    client-side object-oriented API
  • We will generate a wrapper for this API which
    matches the service interface to make a clean
    mapping from client to service.
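
The split described on this slide, a strongly typed service interface plus a client wrapper that mirrors it, can be sketched in a few lines of Python. Nothing here is generated by Introduce or uses Globus or Axis: the message types, the `ImageCountService` and `ImageCountClient` classes, and the in-process "transport" are hypothetical stand-ins for what the real toolkit would derive from registered schemas.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageQuery:
    """Hypothetical typed request; a real type would come from a schema."""
    patient_id: str
    modality: str


@dataclass(frozen=True)
class ImageCount:
    """Hypothetical typed response."""
    count: int


class ImageCountService:
    """Service side: the tool provider implements only the operation body."""

    def __init__(self, index: dict[tuple[str, str], int]) -> None:
        self._index = index

    def invoke(self, request: ImageQuery) -> ImageCount:
        if not isinstance(request, ImageQuery):
            raise TypeError("request must be an ImageQuery")
        return ImageCount(self._index.get((request.patient_id, request.modality), 0))


class ImageCountClient:
    """Client-side wrapper whose method matches the service interface."""

    def __init__(self, transport: Callable[[ImageQuery], ImageCount]) -> None:
        self._transport = transport  # in-process here; remote in a real grid

    def count_images(self, patient_id: str, modality: str) -> int:
        response = self._transport(ImageQuery(patient_id, modality))
        if not isinstance(response, ImageCount):
            raise TypeError("service returned an unexpected type")
        return response.count


service = ImageCountService({("p-001", "CT"): 42})
client = ImageCountClient(service.invoke)
ct_count = client.count_images("p-001", "CT")
```

The client never touches the service's internal index; it only sees the typed request and response, which is the decoupling the slide asks for.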

18
Introduce Service Creation/Modification Tool
  • Graphical tool to automatically create source
    code, configuration files, and build process for
    new analytical services
  • Developer defines the operations of the service
    and just has to focus on the implementation of
    them
  • Generated service is caBIG compliant in its
    mechanisms to register, advertise, and secure
    itself

19
Introduce Service Creation/Modification Tool cont.
  • Input and output parameters can be discovered
    from GME or caDSR
  • Schema types can be automatically downloaded and
    configured as operation parameters
  • Specified types are used to create necessary Java
    Objects using Axis/Globus behind the scenes

20
Created Skeleton Layout
(Figure: skeleton directory layout; legend: generated, built, developer's contribution)
21
Example Application: Remote execution of multiple
image analysis algorithms using multiple image
databases
  • Facilitate research and clinical decision support
    with large number of subjects and multiple image
    analysis algorithms.
  • Enable better algorithm development and
    validation through the use of many distributed,
    shared image datasets
  • Support remote algorithm execution: reduce data
    transfer and avoid the need to transmit PHI

22
gridIMAGE Architecture
Expose algorithms, human markup and image data as
caGrid Services
23
Image Data Service
  • Expose data in DICOM PACS servers as caGrid Data
    Service
  • XML based data transfer (our own schema)
  • caDSR CDE

24
Image Analysis Application Service
  • caGrid middleware to wrap image analysis
    applications with grid services
  • Interact with Data Services to retrieve images
  • Invoke algorithm with required inputs
  • Transform and report results to results data
    service
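
The three steps listed above (retrieve images from a data service, invoke the algorithm, report results to a results service) can be sketched as one wrapper function. This is only a schematic under stated assumptions: `run_analysis`, the callables passed to it, and the mean-intensity "algorithm" are hypothetical and stand in for real caGrid service calls.

```python
from typing import Callable

Image = list[list[int]]  # toy stand-in for pixel data


def run_analysis(
    image_ids: list[str],
    fetch_image: Callable[[str], Image],      # stands in for the image data service
    algorithm: Callable[[Image], float],      # the wrapped analysis application
    report_result: Callable[[dict], None],    # stands in for the results data service
) -> None:
    """Retrieve each image, run the algorithm, and report a structured result."""
    for image_id in image_ids:
        image = fetch_image(image_id)
        value = algorithm(image)
        report_result({"image_id": image_id, "value": value})


def mean_intensity(image: Image) -> float:
    """Placeholder algorithm: average pixel value."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


images = {"img-1": [[0, 10], [20, 30]]}  # illustrative data only
results: list[dict] = []
run_analysis(["img-1"], images.__getitem__, mean_intensity, results.append)
```

Because the algorithm runs next to the wrapper and only the small result record is reported, the image data itself never has to be shipped to the requesting client.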

25
Human Markup Services
  • Query a work-order queue to retrieve any new
    markup requests
  • Interact with Data Services to retrieve images
  • Capture markups and save to results data service

26
ACRIN Image Archive
  • 14 million images since 1999
  • 16 terabytes of storage
  • All images from all trials stored at HQ
  • Images re-transmitted from HQ to selected
    investigators for:
  • Quality assurance (QA plan specific to each
    trial)
  • Off-line reader studies
  • HQ laboratories for on-site image manipulation
    and interpretation
  • Imaging Lab
  • PET Core Lab

27
Links to Petascale Computing: Dataset Size
  • Basic small mouse is 10 cm^3
  • 1 µm resolution: very roughly 10^13 bytes/mouse
  • Molecular data (spatial location): multiply by
    10^2
  • Vary genetic composition, environmental
    manipulation, systematic mechanisms for varying
    genetic expression: multiply by 10^3
  • Total: 10^18 bytes per big science animal
    experiment
  • Data drives complex computational pipelines
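
The size estimate on this slide can be checked with a few lines of arithmetic. The one assumption not stated on the slide is the storage cost of one byte per cubic-micrometre voxel, which is what makes the per-mouse figure come out at the quoted order of magnitude.

```python
# Back-of-the-envelope check of the slide's dataset-size estimate.
volume_cm3 = 10                  # "basic small mouse"
um_per_cm = 10**4                # 1 cm = 10,000 micrometres
bytes_per_voxel = 1              # assumption: one byte per 1 µm^3 voxel

voxels_per_mouse = volume_cm3 * um_per_cm**3        # 10 * 10^12 voxels
bytes_per_mouse = voxels_per_mouse * bytes_per_voxel

molecular_factor = 10**2         # molecular data with spatial location
variation_factor = 10**3         # genetic and environmental variation

total_bytes = bytes_per_mouse * molecular_factor * variation_factor

print(f"per mouse: {bytes_per_mouse:.0e} bytes")
print(f"per experiment: {total_bytes:.0e} bytes")
```

With these inputs the per-mouse volume is 10^13 bytes (about 10 TB) and the experiment total is 10^18 bytes, that is, one exabyte.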

28
Now: Virtual Slides (roughly 25 TB/cm^2 tissue)
29
Understand function of Rb gene
30
Wild Type vs Mutant
Wild type - Labyrinth is neat and well ordered;
maternal blood sinusoids and trophoblasts are evenly
dispersed among fetal blood cells.
Mutant - Trophoblasts grow wildly, clump
together and disrupt the fetal and maternal cell
layers necessary for proper embryonic growth.
31
Tumor Microenvironment
  • Cancer is a complex phenomenon
  • A tumor is an organ
  • Structural and functional differentiation within
    tumor
  • Molecular pathways are time and space dependent
  • Field effects: gradient of genetic, epigenetic
    changes
  • Anatomy, physiology, molecular biology of cancer

32
Tumor microenvironment research: True Multiscale
Information Integration
33
Compare phenotypes of normal vs Rb-deficient mice
(Figure labels: Alignment, Slides/Slices, Placenta, Visualization, Segmentation)
34
3-D Reconstruction
35
Three dimensional duct reconstruction
38
Approach: link fine-grained (MPI-based) parallel
component support to caGrid DataCutter
  • Pipeline with a sequence of image normalization,
    segmentation, registration, and feature detection
    routines
  • caGrid/DataCutter/Matlab
  • Workflow optimization testbed
  • DDDAS NSF project with ISI
  • Application to urgent computing
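
The staged pipeline mentioned above can be sketched as a chain of functions applied to a stream of image tiles. This is a sequential toy, not the DataCutter API and not MPI code: the `run_pipeline` helper, the tile dictionaries, and the four stage bodies are hypothetical placeholders that only show how data flows from one stage to the next.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[dict], dict]


def run_pipeline(tiles: Iterable[dict], stages: list[Stage]) -> Iterator[dict]:
    """Stream each tile through every stage in order."""
    for tile in tiles:
        for stage in stages:
            tile = stage(tile)
        yield tile


def normalize(tile: dict) -> dict:
    """Scale pixel values into [0, 1] by the tile's peak value."""
    peak = max(tile["pixels"]) or 1  # avoid dividing by zero on blank tiles
    return {**tile, "pixels": [p / peak for p in tile["pixels"]]}


def segment(tile: dict) -> dict:
    """Placeholder segmentation: threshold at half of the peak."""
    return {**tile, "mask": [p >= 0.5 for p in tile["pixels"]]}


def register(tile: dict) -> dict:
    """Placeholder registration: records a zero offset, aligns nothing."""
    return {**tile, "offset": 0}


def detect_features(tile: dict) -> dict:
    """Placeholder feature: number of foreground pixels in the mask."""
    return {**tile, "foreground": sum(tile["mask"])}


tiles = [{"id": 0, "pixels": [0, 50, 100, 200]}]  # illustrative data only
processed = list(run_pipeline(tiles, [normalize, segment, register, detect_features]))
```

Because each stage only consumes and produces a tile, stages could in principle be placed on different nodes and connected by streams, which is the kind of decomposition a filter-stream framework exploits.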

39
Processing Terabyte-scale images on OSC MSS (16
nodes)
40
Lessons Learned
  • Application-community-driven large-scale grid
    effort (and possibly petascale computing)
  • Cancer research consists of many types of
    conceptually interconnected application areas
  • Going from conceptual links to joins requires
    major retooling in each application area as well
    as federated database/distributed processing
    grid infrastructure
  • Data modeling/metadata plays central role
  • Ease of grid development, security necessary (but
    sufficient?)

41
Open issues
  • Will cancer researchers broadly adopt this
    infrastructure?
  • If the infrastructure is broadly adopted, will it
    lead to increased research productivity?
  • How should a community go about
    standardizing/curating metadata? Is the caBIG
    approach scalable?
  • Can caGrid serve as front end for integrated
    Grid/High end computing infrastructure (once
    called metacomputing)?

42
caGrid Team
  • Ohio State University - Department of BioMedical
    Informatics (http://bmi.osu.edu/)
  • Dave Ervin
  • Shannon Hastings
  • Tahsin Kurc
  • Stephen Langella
  • Scott Oster
  • Joel Saltz
  • Argonne National Lab / University of
    Chicago (http://www.globus.org)
  • William Allcock
  • Jarek Gawor
  • Ravi Madduri
  • Frank Siebenlist
  • Michael Wilde
  • Duke University
  • A. Jamie Cuticchia
  • Patrick McConnell
  • Georgetown University
  • Colin Freas
  • Paul A. Kennedy
  • Chad La Joie
  • SAIC (http://www.saic.com)
  • Manav Kher
  • Booz Allen Hamilton (http://www.bah.com)
  • Arumani Manisundaram
  • Michael Keller
  • Reechik Chatterjee

43
gridCAD Team
  • Tony Pan, Joel Saltz, Tahsin Kurc, Stephen
    Langella,
  • Shannon Hastings, Scott Oster, Ashish Sharma,
    Metin Gurcan
  • Department of Biomedical Informatics
  • The Ohio State University Medical Center,
    Columbus OH
  • Eliot Siegel, Khan M. Siddiqui
  • University of Maryland School of Medicine,
    Baltimore, MD

For more information, please contact Tony Pan
(tpan_at_bmi.osu.edu), Dept. of Biomedical
Informatics, The Ohio State University,
http://bmi.osu.edu
44
Microscopy Image Analysis
  • Biomedical Informatics
  • Tony Pan
  • Alexandra Gulacy
  • Dr. Metin Gurcan
  • Dr. Ashish Sharma
  • Dr. Kun Huang
  • Dr. Joel Saltz
  • Computer Science and Engineering
  • Raghu Machiraju
  • Kishore Mosaliganti
  • Randall Ridgway
  • Richard Sharp
  • Human Cancer Genetics
  • Pamela Wenzel
  • Dr. Gustavo Leone
  • Dr. Alain deBruin
  • Dr. Tony Trimboli
  • Jana Opavska

45

OSU Epigenetics Integrated Cancer Biology Center
caBIG Group (Informatics Core)
  • Joel H. Saltz, Junghee Han, Hao Sun, Pearlly Yan,
    Ramana Davuluri, and Tim Huang (PI)

46
GridCAD Acknowledgements
The RIDER dataset used during this demonstration
is provided courtesy of NCI Cancer Imaging
Program. iCAD Inc.: Euvondia Friedmann, Maha
Sallam, Tim Carter. This project was funded by
NIH BISTI Center for Grid Enabled Medical
Imaging, NCI, NSF, and the State of Ohio Board of
Regents BRTT program.