1
Distributed, Modular Grid Software for Management
and Exploration of Data in Patient-Centric
Healthcare IT
  • Andrew Hart
  • NASA Jet Propulsion Laboratory
  • David Kale
  • Whittier VPICU, Children's Hospital LA
  • Heather Kincaid
  • NASA Jet Propulsion Laboratory

2
Agenda
  • Health Care Data Challenges for Large-scale
    Research
  • Intro to Object Oriented Data Technology (OODT)
  • Applications of OODT in distributed scientific
    data systems
  • NASA's Planetary Data System
  • NCI's Early Detection Research Network
  • Whittier Virtual Pediatric Intensive Care Unit
    (VPICU)
  • OODT as Open Source
  • Learning More and Keeping in Touch

3
Health care research
  • Increasingly collaborative
  • Increasingly geographically distributed
  • Scale, Complexity, Cost drive cooperation
  • Opportunities for discovery emerge through larger
    data sets
  • Growing need for technology to support virtual
    organizations carrying out distributed
    scientific research

4
OODT: What Is It?
  • A data grid software infrastructure for
    constructing large-scale, distributed
    data-intensive systems
  • Reference Architecture
  • Software Product Line
  • Reusable Components
  • Common Patterns

5
A Brief History of OODT
  • Funded out of NASA's Office of Space Science in
    1998
  • Funded to address critical software engineering
    challenges affecting the design of mission
    science data systems
  • Designed, implemented, and refined over the past
    7 years across multiple scientific domains
  • Planetary Science,
  • Earth Science,
  • Cancer Research,
  • Space Physics,
  • Modeling and Simulation,
  • Pediatric Intensive Care
  • Runner-up for NASA Software of the Year in 2003

6
Principles behind OODT
  • Division of Labor: Avoid making any one component
    the workhorse; keep components configurable
  • Technology Independence: Guard against unexpected
    changes in the technology landscape
  • Metadata as a First-Class Citizen: Descriptions of
    resources come in handy
  • Separation of Software and Data Models: Allow each
    to evolve independently
  • Modular, Domain-Agnostic: Pick and choose from
    adaptable components with defined interfaces

7
OODT Core Framework Services
  • Archive Service: Ingest data and metadata,
    processing algorithms, workflow support
  • Profile Service: Deliver metadata from an
    underlying data store
  • Product Service: Deliver data from an underlying
    data store
  • Query Service: Manage sets of profile servers
  • Data Grid Service: Interfaces and tools for
    connecting distributed resources over the web
    (see the sketch after this list)
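To make the split between metadata delivery and data delivery concrete, here is a minimal, illustrative Python sketch of the profile/product pattern described above. The class and method names are invented for illustration and do not reflect the actual OODT API.

  # Illustrative sketch only: class and method names are hypothetical,
  # not the actual OODT API.
  from dataclasses import dataclass, field

  @dataclass
  class Profile:
      """Metadata describing a resource held in an underlying data store."""
      resource_id: str
      attributes: dict = field(default_factory=dict)

  class ProfileService:
      """Answers metadata queries; backed by a site-specific data store."""
      def __init__(self, profiles):
          self._profiles = list(profiles)

      def query(self, **criteria):
          return [p for p in self._profiles
                  if all(p.attributes.get(k) == v for k, v in criteria.items())]

  class ProductService:
      """Delivers the underlying data (the 'product') for a resource id."""
      def __init__(self, store):
          self._store = store  # e.g. maps resource_id -> file path or bytes

      def retrieve(self, resource_id):
          return self._store[resource_id]

  # A query service would fan the same criteria out to many profile servers
  # and merge the results; the product service then serves the actual data.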

8
Applications of OODT: PDS
  • Planetary Data System
  • National Aeronautics and Space Administration
  • http://pds.nasa.gov

9
NASA Planetary Data System
  • Official NASA archive for all planetary data
  • 9 Nodes with data located at discipline sites
  • All missions must add their data (required as
    part of the mission Announcement of Opportunity)
  • Prior to October 2002, no ability to find and
    share data between PDS nodes

10
PDS Data Key Challenges
  • Challenges to building a science data system for
    the PDS
  • NASA often flies unique, one-of-a-kind missions
  • A static infrastructure won't work: nodes and
    models change
  • Data stored at PDS nodes differs dramatically in
    structure
  • Missions are required to share science data
    results with the research community

11
PDS Data Architecture
  • Distributed data system environment with
    federated governance: Each site maintains its
    own database and infrastructure
  • Common domain information model (regularly
    updated) used to drive system implementations:
    Ontology and Common Data Elements (based on
    ISO/IEC 11179)
  • Common query interface to distributed services:
    Implemented with OODT Query Handlers (see the
    sketch after this list)
  • Software services that wrap existing data systems
    to share data: Implemented with OODT Product and
    Profile servers
  • Publishing of data products to a common portal:
    Implemented using the Resource Description
    Framework (RDF)
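The wrapping pattern above can be pictured with a small, hypothetical sketch: a per-node query handler translates criteria expressed in common data elements into the node's local schema, so the node's existing database is left untouched. The element names, table name, and column names below are invented for illustration; this is not PDS or OODT code.

  import sqlite3

  # Hypothetical mapping from common data elements to one node's local columns.
  LOCAL_MAPPING = {
      "TARGET_NAME": "target",
      "INSTRUMENT_ID": "instr",
      "START_TIME": "obs_start",
  }

  class NodeQueryHandler:
      """Wraps a node's existing database behind the common query interface."""
      def __init__(self, db_path):
          self.conn = sqlite3.connect(db_path)

      def query(self, common_criteria):
          # Translate a query phrased in common data elements into local SQL.
          clauses, params = [], []
          for element, value in common_criteria.items():
              clauses.append(f"{LOCAL_MAPPING[element]} = ?")
              params.append(value)
          sql = "SELECT * FROM observations WHERE " + " AND ".join(clauses)
          return self.conn.execute(sql, params).fetchall()

  # handler = NodeQueryHandler("node.db")
  # handler.query({"TARGET_NAME": "MARS", "INSTRUMENT_ID": "HIRISE"})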

12
PDS Architecture Decomposition
13
Applications of OODT EDRN
  • Early Detection Research Network
  • Division of Cancer Prevention, National Cancer
    Institute
  • http://cancer.gov/edrn

14
EDRN Overview
  • Focus: Investigator-initiated, collaborative
    research on molecular, genetic, and other
    biomarkers for cancer detection and risk
    assessment.
  • Funded since 2000 by the Division of Cancer
    Prevention in the National Cancer Institute (NCI)
  • 40 geographically distributed centers
    performing parallel, complementary studies
  • Strong emphasis on the role of informatics

15
EDRN Participants
  • Biomarker Development Laboratories: Responsible
    for the development and characterization of new
    biomarkers or the refinement of existing
    biomarkers.
  • Biomarker Reference Laboratories: Serve as a
    Network resource for clinical and laboratory
    validation of biomarkers, including technological
    development, quality control, refinement, and
    high throughput.
  • Clinical Epidemiology and Validation Centers:
    Conduct clinical and epidemiological research on
    the clinical application of biomarkers.
  • Data Management and Coordinating Center:
    Coordinates EDRN research activities, provides
    logistical support, conducts statistical and
    computational research for data analysis, and
    analyzes data for validation.

16
OODT and EDRN
  • OODT's success led to interagency agreements
    with both NIH and NCI, resulting in:
  • EDRN Informatics Center: Supports the EDRN's
    efforts through the development of software
    systems for information management. Located at
    the NASA Jet Propulsion Laboratory, Pasadena, CA.
  • Principal Investigator: Dan Crichton, JPL.

17
EDRN Data
  • EDRN collects, generates, analyzes, and stores a
    wide variety of data, including:
  • Specimen Inventories: Map specimens collected
    (blood, sputum, etc.) to patient characteristics
  • Studies and Publications: Information about
    studies conducted in the EDRN as well as
    published results (publications, outputs)
  • Biomarkers: Information about indicators of early
    disease
  • Science Data: Outputs of experiments on specimens,
    concerning biomarkers and driven by particular
    studies and protocols

18
EDRN Data Flow
  • Moving beyond the local laboratory
  • Scalability, interoperability

19
Case Study: ERNE
  • ERNE: EDRN Resource Network Exchange
  • Challenge: Overcome differences in local schemas
    to develop a national, distributed specimen
    information infrastructure
  • All sites run different software and follow
    their own procedures
  • Rely on a common information model for
    distributed querying, and provide site-specific
    mappings at each participating site (see the
    sketch after this list)
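As a rough illustration of the site-specific mapping idea, the sketch below rewrites one site's local specimen records into a shared set of fields. The field names, codes, and values are invented for illustration; this is not the actual ERNE information model.

  # Invented local/common field names for illustration; not the ERNE model.
  def make_mapper(field_map, value_maps=None):
      """Build a function that rewrites one local record into the common model."""
      value_maps = value_maps or {}
      def mapper(local_record):
          common = {}
          for common_field, local_field in field_map.items():
              value = local_record.get(local_field)
              common[common_field] = value_maps.get(common_field, {}).get(value, value)
          return common
      return mapper

  # One hypothetical site's local schema and local vocabulary:
  site_a = make_mapper(
      field_map={"specimen_type": "spec_kind", "collection_site": "organ",
                 "participant_age": "age_yrs", "diagnosis": "dx_code"},
      value_maps={"specimen_type": {"BLD": "blood", "SPT": "sputum"}},
  )

  print(site_a({"spec_kind": "BLD", "organ": "lung", "age_yrs": 61, "dx_code": "C34"}))
  # -> {'specimen_type': 'blood', 'collection_site': 'lung',
  #     'participant_age': 61, 'diagnosis': 'C34'}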

20
ERNE Architecture
21
Connecting Research
  • Designing the EDRN informatics architecture as a
    collection of well-defined components via OODT
    has simplified the process of building interfaces
    to non-EDRN systems
  • Wrappers can be built to link non-EDRN systems
  • Translators can be developed to deal with
    different semantic architectures
  • caBIG
  • ERNE/caTissue Wrapper
  • EDRN-Canary Collaboration: A cloud computing
    effort that shares raw science data via Amazon S3
    between EDRN and the Canary group, which uses
    software from GenoLogics Life Sciences

22
EDRN Knowledge Environment
  • Building a Semantic Bioinformatics Grid for the
    EDRN

23
Lessons From EDRN
  • Architecture and vision have been critical
  • Technology hasn't been as critical
  • Keep it simple
  • Science support has been critical
  • Getting buy-in and participation from domain
    experts is key
  • Incremental development and deployment
  • Starting with a few sites was very helpful in
    understanding the issues
  • We had both development sites and observer sites
    initially
  • The IRB process has been a big schedule driver
  • Distributed architecture can be a challenge
  • Not all sites are up to maintaining the
    implementation
  • Loosely coupled architecture with simple
    interfaces helped

24
Applications of OODT: VPICU
  • Whittier Virtual Pediatric Intensive Care Unit
  • Children's Hospital Los Angeles
  • http://picu.net
  • A collaboration among 85 multi-disciplinary
    pediatric intensive care units across the U.S.

25
Collaboration with VPICU
  • Laura P. and Leland K. Whittier Virtual Pediatric
    Intensive Care Unit (VPICU), founded in 1998 by
    clinicians at CHLA
  • Leverage advances in technology to:
  • Improve patient care
  • Educate practitioners
  • Conduct research
  • Reduce cost of providing care

26
VPICU Research Data
Secondary use of observational clinical data (EHR,
monitor, annotations)
  • Real Health Care Data Set
  • Massive, grows continuously
  • Heterogeneous formats, types, etc.
  • Incomplete, proprietary descriptions
  • Fragmented across stores and organizational
    boundaries
  • Incomplete, inconsistent
  • Highly restricted (legal, privacy, ethical
    considerations)
  • Ideal Research Data Set
  • Manageable size, Static
  • Homogeneous
  • Complete, standardized descriptions and
    annotations
  • Available as single unit
  • Complete, consistent
  • Minimal usage restrictions

27
VPICU Project Areas
  • Data extraction and management: Take data from
    proprietary stores and make it accessible
  • Transformation of data into knowledge: Process
    (and re-process) the data to extract insight
  • Data-driven decision support: Develop tools that
    learn continuously from the data
  • Distributed data-sharing over a national
    network: Enable research at scales previously
    impossible while maintaining security, privacy,
    and compliance

28
Principles behind VPICU
  • Decouple from (proprietary) vendor databases
  • Integrate disparate data sources into a single
    model
  • Dynamically (re)generate research database(s)
  • We don't know for sure which queries will be most
    useful at the outset
  • Provide web services for multi-faceted access to
    the data to enable discovery and analysis
  • Support federation among multiple PICU sites (see
    the sketch after this list)
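As a minimal sketch of the federation idea, the code below fans one query out to each participating site's web service and merges the results. The endpoints and JSON payloads are hypothetical; real inter-site queries would also handle authentication, auditing, and de-identification.

  import concurrent.futures
  import json
  import urllib.request

  # Hypothetical site endpoints; real deployments would use authenticated services.
  SITES = ["https://picu-a.example.org/query",
           "https://picu-b.example.org/query"]

  def query_site(url, criteria):
      """POST the query criteria to one site and return its JSON results."""
      req = urllib.request.Request(url,
                                   data=json.dumps(criteria).encode("utf-8"),
                                   headers={"Content-Type": "application/json"})
      with urllib.request.urlopen(req, timeout=30) as resp:
          return json.load(resp)

  def federated_query(criteria):
      """Fan the same query out to all sites and merge the record lists."""
      with concurrent.futures.ThreadPoolExecutor() as pool:
          results = pool.map(lambda url: query_site(url, criteria), SITES)
      return [record for site_records in results for record in site_records]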

29
Algorithm for VPICU Data System
  • Develop a common Domain Ontology to describe the
    information space
  • Develop compute services that support extraction
    of data from existing databases
  • Identify mechanisms to integrate information
    objects from disparate repositories and map them
    to the common domain ontology
  • Construct a set of online research databases to
    enable data mining and analysis
  • Deploy a data grid infrastructure of hardware and
    software to facilitate utilization of the data
    environment at CHLA and beyond (external entities
    and applications)
  • Deploy a set of compute services to support data
    mining and analysis
  • Develop an architectural plan and roadmap for
    scaling and integrating other PICUs
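The extract / map-to-ontology / load steps above can be compressed into a short, illustrative pipeline sketch. All class and function names below are stand-ins invented for illustration, not actual VPICU or OODT code.

  class ClinicalSource:
      """Stand-in for an existing clinical database (extraction step)."""
      def __init__(self, records):
          self._records = records
      def fetch_new_records(self):
          return list(self._records)

  class DomainOntology:
      """Stand-in for the common domain ontology (mapping step)."""
      def __init__(self, field_map):
          self._field_map = field_map
      def translate(self, record):
          return {self._field_map.get(k, k): v for k, v in record.items()}

  class ResearchDB:
      """Stand-in for an online research database (mining/analysis step)."""
      def __init__(self):
          self.rows = []
      def ingest(self, records):
          self.rows.extend(records)

  def run_pipeline(sources, ontology, research_db):
      """Extract from each source, map to the ontology, load the research DB."""
      for source in sources:
          records = source.fetch_new_records()
          research_db.ingest([ontology.translate(r) for r in records])

  # run_pipeline([ClinicalSource([{"hr": 92}])],
  #              DomainOntology({"hr": "heart_rate"}), ResearchDB())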

30
VPICU Architecture
31
VPICU Architecture
  • Original data sources/stores at backend
  • Proprietary schema
  • Hardware that we don't own or control
  • Production systems (very load-sensitive)
  • Legacy technologies (sometimes)
  • Unreliable (can't guarantee availability)
  • Includes:
  • Hospital-wide commercial EHR system(s)
  • Homegrown critical care database
  • Specialized clinical applications
  • Raw bedside monitor data

(Diagram: proprietary data sources include the EHR, homegrown critical care database, clinical apps, monitor data, and file-based storage)
32
VPICU Architecture
  • Regular extraction of new data
  • VPICU-controlled resources (our hardware and
    software)
  • Transform to VPICU schema
  • Link data belonging to same patient
  • May contain PHI: Must be highly secure
  • Data at this stage is normalized, stored in a
    format suitable for ingestion into any number of
    research databases
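A minimal sketch of the linking step described above: records arriving from different feeds are tied to the same patient by a direct identifier and then re-keyed to an internal id, so downstream stages never see the identifier itself. The field names are invented for illustration.

  import uuid

  def link_and_normalize(record, mrn_to_internal):
      """Attach a stable internal patient id and strip the direct identifier."""
      mrn = record["mrn"]
      if mrn not in mrn_to_internal:
          # The map itself is PHI and must stay inside the secured VPICU store.
          mrn_to_internal[mrn] = uuid.uuid4().hex
      normalized = {k: v for k, v in record.items() if k != "mrn"}
      normalized["patient_id"] = mrn_to_internal[mrn]
      return normalized

  # id_map = {}
  # link_and_normalize({"mrn": "12345", "source": "ehr", "heart_rate": 92}, id_map)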

(Diagram: VPICU-owned resources, file-based storage)
33
VPICU Architecture
  • Research databases
  • Application-specific
  • Optimized
  • Contain de-identified or anonymized data
  • VPICU ontology and schema
  • Access via configurable web services (a
    de-identification sketch follows this list)
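Since the research databases hold de-identified or anonymized data, here is a toy de-identification sketch: direct identifiers are dropped and timestamps are shifted by a per-patient offset so relative timing is preserved. The identifier list and field handling are simplified assumptions, not a complete de-identification procedure.

  import datetime
  import random

  DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone"}  # simplified list

  def deidentify(record, date_offsets):
      """Drop direct identifiers and shift datetime fields per patient."""
      patient = record["patient_id"]
      if patient not in date_offsets:
          date_offsets[patient] = datetime.timedelta(days=random.randint(-365, 365))
      clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
      for field_name, value in clean.items():
          if isinstance(value, datetime.datetime):
              clean[field_name] = value + date_offsets[patient]
      return clean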

34
What are research databases?
  • Designed for specific research questions,
    analytical techniques
  • Need not always be relational, or even databases
    at all
  • Available via web interfaces and software
    services: A researcher using R can connect
    directly through R bindings
  • Examples:
  • Relational database for traditional retrospective
    studies
  • Search engine over free text clinical notes, etc.
  • Patient-to-patient comparison and retrieval (find
    patients like this one)
  • Data-backed patient simulator for testing
    interventions
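To make the "find patients like this one" example concrete, here is a toy sketch that represents each patient as a numeric feature vector and returns the closest matches by cosine similarity. The features and data are invented; a real system would need careful feature engineering and validation.

  import math

  def cosine_similarity(a, b):
      """Cosine similarity between two equal-length feature vectors."""
      dot = sum(x * y for x, y in zip(a, b))
      norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
      return dot / norm if norm else 0.0

  def most_similar(query_vector, cohort, k=5):
      """Return the k patients whose feature vectors best match the query."""
      scored = [(cosine_similarity(query_vector, features), patient_id)
                for patient_id, features in cohort.items()]
      return sorted(scored, reverse=True)[:k]

  # cohort = {"p1": [0.6, 98, 37.2], "p2": [0.9, 120, 39.1]}
  # most_similar([0.85, 118, 38.9], cohort, k=1)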

35
VPICU Architecture
36
OODT and the VPICU Data System
  • Develop an Information Model (Ontology) to
    describe the domain
  • Develop compute services that support extraction
    of data from existing CHLA databases (OODT Query
    Handlers)
  • Identify mechanisms to integrate information
    objects from disparate repositories and map them
    to the common domain ontology (OODT CAS crawler,
    catalog services)
  • Construct a set of online research databases to
    enable data mining and analysis (OODT Catalog and
    Archive Services)
  • Deploy a data grid infrastructure of hardware and
    software to facilitate utilization of the data
    environment at CHLA and beyond (external entities
    and applications) (OODT Data Grid Services)
  • Deploy a set of compute services to support data
    mining and analysis
  • Develop an architectural plan and roadmap for
    scaling and integrating other PICUs

37
OODT as Open Source
  • Jan 2010: OODT accepted as a podling in the
    Apache Software Foundation (ASF) Incubator
  • First NASA software licensed and incubating
    within the ASF
  • Learn more and track our progress at
  • http://incubator.apache.org/projects/oodt.html
  • Join the mailing list
  • oodt-dev@incubator.apache.org
  • Chat on IRC
  • #oodt on irc.freenode.net

38
Acknowledgements
  • Jet Propulsion Laboratory: Dan Crichton, Chris
    Mattmann, Sean Kelly, Steve Hughes, Amy
    Braverman, Thuy Tran
  • National Cancer Institute: Sudhir Srivastava,
    Christos Patriotis, Don Johnsey
  • Fred Hutchinson Cancer Research Center: Mark
    Thornquist, Ziding Feng, Jackie Dalhgren, Suzanna
    Reid
  • Children's Hospital Los Angeles: Randall Wetzel,
    Robinder Khemani, Paul Vee, Jeff Terry, Robert
    Kaptan, Doug Hallam