Distributed, Modular Grid Software for Management and Exploration of Data in Patient-Centric Healthcare IT

1
Distributed, Modular Grid Software for Management
and Exploration of Data in Patient-Centric
Healthcare IT

Andrew Hart
NASA Jet Propulsion Laboratory
David Kale
Whittier VPICU, Childrens Hospital LA
Heather Kincaid
NASA Jet Propulsion Laboratory

2
Agenda

Health Care Data Challenges for Large-scale
Research
Intro to Object Oriented Data Technology (OODT)
Applications of OODT in distributed scientific
data systems
NASA's Planetary Data System
NCI's Early Detection Research Network
Whittier Virtual Pediatric Intensive Care Unit
(VPICU)
OODT as Open Source
Learning More and Keeping in Touch

3
Health care research

Increasingly collaborative
Increasingly geographically distributed
Scale, Complexity, Cost drive cooperation
Opportunities for discovery emerge through larger
data sets
Increasing need for technology to support
virtual organizations carrying out distributed
scientific research

4
OODT: What Is It?

A data grid software infrastructure for
constructing large-scale, distributed
data-intensive systems
Reference Architecture
Software Product Line
Reusable Components
Common Patterns

5
A Brief History of OODT

Funded out of NASA's Office of Space Science in
1998
Funded to address critical software engineering
challenges affecting the design of mission
science data systems
Designed, implemented, and refined over the past
7 years across multiple scientific domains
Planetary Science,
Earth Science,
Cancer Research,
Space Physics,
Modeling and Simulation,
Pediatric Intensive Care
Runner-up for NASA Software of the Year in 2003

6
Principles behind OODT

Division of Labor: Avoid making one component the
workhorse, configurable
Technology Independence: Guard against unexpected
changes in the technology landscape
Metadata as a first-class citizen: Descriptions of
resources come in handy
Separation of software and data models: Allow each
to evolve independently
Modular, domain-agnostic: Pick and choose from
adaptable components with defined interfaces

7
OODT Core Framework Services

Archive Service: Ingest data and metadata,
processing algorithms, workflow support
Profile Service: Deliver metadata from an
underlying data store
Product Service: Deliver data from an underlying
data store
Query Service: Manage sets of profile servers
Data Grid Service: Interfaces and tools for
connecting distributed resources over the web
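
The split between profile, product, and query services can be illustrated with a minimal Python sketch. It is not OODT code: the class names ProfileServer, ProductServer, and QueryService, their methods, and the sample records are all hypothetical, chosen only to show metadata lookup kept separate from data delivery, with a query layer fanning out over several profile servers.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileServer:
    """Hypothetical stand-in for a profile service: serves metadata only."""
    name: str
    profiles: list = field(default_factory=list)  # list of metadata dicts

    def query(self, **criteria):
        # Return every metadata record whose fields match all criteria.
        return [p for p in self.profiles
                if all(p.get(k) == v for k, v in criteria.items())]


@dataclass
class ProductServer:
    """Hypothetical stand-in for a product service: serves the data itself."""
    products: dict = field(default_factory=dict)  # product id -> bytes

    def retrieve(self, product_id):
        return self.products[product_id]


class QueryService:
    """Fans one query out over a set of profile servers and merges the hits."""

    def __init__(self, servers):
        self.servers = list(servers)

    def query(self, **criteria):
        hits = []
        for server in self.servers:
            for profile in server.query(**criteria):
                hits.append({"server": server.name, **profile})
        return hits


# Made-up metadata held at two sites, plus one made-up data store.
node_a = ProfileServer("node-a", [{"id": "p1", "target": "Mars"}])
node_b = ProfileServer("node-b", [{"id": "p2", "target": "Mars"},
                                  {"id": "p3", "target": "Venus"}])
store = ProductServer({"p1": b"image-1", "p2": b"image-2", "p3": b"image-3"})

# Find matching metadata first, then fetch only the matching products.
results = QueryService([node_a, node_b]).query(target="Mars")
data = [store.retrieve(r["id"]) for r in results]
```

In this sketch a client never touches a site's storage directly: it asks the query layer what exists, then requests specific products by identifier.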

8
Applications of OODT: PDS

Planetary Data System
National Aeronautics and Space Administration
http://pds.nasa.gov

9
NASA Planetary Data System

Official NASA archive for all planetary data
9 Nodes with data located at discipline sites
All missions must add their data (required as
part of the mission Announcement of Opportunity)
Prior to October 2002, no ability to find and
share data between PDS nodes

10
PDS Data: Key Challenges

Challenges to building a science data system for
the PDS
NASA often flies unique, one-of-a-kind missions
A static infrastructure won't work: nodes and
models change
Data stored at PDS nodes differs dramatically in
structure
Missions are required to share science data
results with the research community

11
PDS Data Architecture

Distributed data system environment with
federated governance: each site maintains its
own database and infrastructure
Common domain information model (regularly
updated) used to drive system implementations:
Ontology and Common Data Elements (based on ISO/IEC
11179)
Common query interface to distributed
services: implemented with OODT Query Handlers
Software services that wrap existing data systems
to share data: implemented with OODT Product
and Profile servers
Publishing of data products to a common portal:
implemented using Resource Description Framework
(RDF)

12
PDS Architecture Decomposition

13
Applications of OODT: EDRN

Early Detection Research Network
Division of Cancer Prevention, National Cancer
Institute
http://cancer.gov/edrn

14
EDRN Overview

Focus: investigator-initiated, collaborative
research on molecular, genetic and other
biomarkers for cancer detection and risk
assessment.
Funded since 2000 by the Division of Cancer
Prevention in the National Cancer Institute (NCI)
40 geographically distributed centers
performing parallel, complementary studies
Strong emphasis on the role of informatics

15
EDRN Participants

Biomarker Development Laboratories: Responsible
for the development and characterization of new
biomarkers or the refinement of existing
biomarkers.
Biomarker Reference Laboratories: Serve as a
Network resource for clinical and laboratory
validation of biomarkers, which includes
technological development, quality control,
refinement, and high throughput.
Clinical Epidemiology and Validation
Centers: Conduct clinical and epidemiological
research regarding the clinical application of
biomarkers.
Data Management and Coordinating
Center: Coordinate EDRN research activities,
provide logistic support, conduct statistical and
computational research for data analysis, and
analyze data for validation.

16
OODT and EDRN

OODT's success led to interagency agreements
with both NIH and NCI, resulting in the
EDRN Informatics Center: Supports EDRN's efforts
through the development of software systems for
information management. Located at NASA Jet
Propulsion Laboratory, Pasadena, CA.
Principal Investigator: Dan Crichton, JPL.

17
EDRN Data

EDRN collects, generates, analyzes, and stores a
wide variety of different data, including:
Specimen Inventories: Map specimens collected
(blood, sputum, etc.) to patient characteristics
Studies and Publications: Information about
studies conducted in the EDRN as well as
published results (publications, outputs)
Biomarkers: Information about indicators of early
disease
Science Data: Outputs of experiments on specimens,
regarding biomarkers, driven by particular
studies and protocols

18
EDRN Data Flow

Moving beyond the local laboratory
Scalability, interoperability

19
Case Study: ERNE

ERNE: EDRN Resource Network Exchange
Challenge: Overcome differences in local schema
to develop a national distributed specimen
information infrastructure
All sites run different software and
follow their own procedures
Rely on a common information model for
distributed querying, and provide site-specific
mappings at each participant
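
The idea of a common information model with site-specific mappings can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the two sites, and the sample specimen records are invented for illustration and are not taken from ERNE.

```python
# Common information model fields (hypothetical, for illustration only).
COMMON_FIELDS = ("specimen_type", "collected_year")

# Each site keeps its own schema and publishes a mapping to the common model.
SITE_MAPPINGS = {
    "site_a": {"specimen_type": "spec_kind", "collected_year": "yr"},
    "site_b": {"specimen_type": "SampleType", "collected_year": "CollectionYear"},
}

# Made-up local records, stored under each site's own field names.
SITE_DATA = {
    "site_a": [{"spec_kind": "blood", "yr": 2008},
               {"spec_kind": "sputum", "yr": 2009}],
    "site_b": [{"SampleType": "blood", "CollectionYear": 2009}],
}


def to_common(site, record):
    """Translate one local record into the common model."""
    mapping = SITE_MAPPINGS[site]
    return {common: record[local] for common, local in mapping.items()}


def distributed_query(**criteria):
    """Ask every site the same common-model question and merge the answers."""
    unknown = set(criteria) - set(COMMON_FIELDS)
    if unknown:
        raise ValueError(f"not in common model: {sorted(unknown)}")
    hits = []
    for site, records in SITE_DATA.items():
        for record in records:
            common = to_common(site, record)
            if all(common[k] == v for k, v in criteria.items()):
                hits.append({"site": site, **common})
    return hits


blood = distributed_query(specimen_type="blood")
```

The caller phrases the query once, in common-model terms. Each site's mapping absorbs the schema differences, so no site has to change its local database.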

20
ERNE Architecture

21
Connecting Research

Designing the EDRN informatics architecture as a
collection of well-defined components via OODT
has simplified the process of building interfaces
to non-EDRN systems
Wrappers can be built to link non-EDRN systems
Translators can be developed to deal with
different semantic architectures
caBIG
ERNE/caTissue Wrapper
EDRN-Canary Collaboration
A cloud computing effort that shares raw science
data via Amazon S3 between EDRN and the Canary
group which uses software from GenoLogics Life
Sciences

22
EDRN Knowledge Environment

Building a Semantic Bioinformatics Grid for the
EDRN

23
Lessons From EDRN

Architecture and a vision have been critical
Technology hasn't been as critical
Keep it simple
Science support has been critical
Getting buy-in and participation from domain
experts is key
Incremental development and deployment
Starting with a few sites was very helpful in
understanding the issues
We had both development sites and observer sites
initially
The IRB process has been a big schedule driver
Distributed architecture can be a challenge
Not all sites are up to maintaining the
implementation
Loosely coupled architecture with simple
interfaces helped

24
Applications of OODT: VPICU

Whittier Virtual Pediatric Intensive Care Unit
Childrens Hospital Los Angeles
http://picu.net
Collaboration among 85 multidisciplinary
pediatric intensive care units across the U.S.

25
Collaboration with VPICU

Laura P. and Leland K. Whittier Virtual Pediatric
Intensive Care Unit (VPICU), founded in 1998 by
clinicians at CHLA
Leverage advances in technology to
Improve patient care
Educate practitioners
Conduct research
Reduce cost of providing care

26
VPICU Research Data
Secondary use of observational clinical (EHR,
monitor, annotations) data

Real Health Care Data Set
Massive, grows continuously
Heterogeneous formats, types, etc.
Incomplete, proprietary descriptions
Fragmented across stores, organizational
boundaries
Incomplete, inconsistent
Highly restricted (legal, privacy, ethical
considerations)

Ideal Research Data Set
Manageable size, static
Homogeneous
Complete, standardized descriptions and
annotations
Available as single unit
Complete, consistent
Minimal usage restrictions

27
VPICU Project Areas

Data extraction and management: Take data from
proprietary stores, make it accessible
Transformation of data into knowledge: Process
(and re-process) the data to extract insight
Data-driven decision support: Develop tools that
learn continuously from the data
Distributed data-sharing over a national
network: Enable research on scales previously
impossible while maintaining security, privacy,
compliance

28
Principles behind VPICU

Decouple from (proprietary) vendor databases
Integrate disparate data sources into a single
model
Dynamically (re)generate research database(s):
we don't know for sure what queries will be most
useful at the outset
Provide web services for multi-faceted access to
the data to enable discovery and analysis
Support federation among multiple PICU sites

29
Algorithm for VPICU Data System

Develop a common Domain Ontology to describe the
information space
Develop compute services that support extraction
of data from existing databases
Identify mechanisms to integrate information
objects from disparate repositories and map them
to the common domain ontology
Construct a set of online research databases to
enable data mining and analysis
Deploy a data grid infrastructure of hardware and
software to facilitate utilization of the data
environment at CHLA and beyond (external entities
and applications)
Deploy a set of compute services to support data
mining and analysis
Develop an architectural plan and roadmap for
scaling and integrating other PICUs

30
VPICU Architecture
[Diagram; surviving label: File-based storage]

31
VPICU Architecture

Original data sources/stores at backend
Proprietary schema
Hardware that we don't own or control
Production systems (very load-sensitive)
Legacy technologies (sometimes)
Unreliable (can't guarantee always available)
Includes:
Hospital-wide commercial EHR system(s)
Homegrown critical care database
Specialized clinical applications
Raw bedside monitor data

[Diagram labels: EHR, Homegrown, File-based storage, Clinical apps, Monitor data, Proprietary data sources]

32
VPICU Architecture

Regular extraction of new data
VPICU-controlled resources (our hardware and
software)
Transform to VPICU schema
Link data belonging to same patient
May contain PHI: must be highly secure
Data at this stage is normalized, stored in a
format suitable for ingestion into any number of
research databases
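
The staging step described on this slide (transform each source to one schema, then link records belonging to the same patient) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the VPICU implementation: the source field names (mrn, dx, patient, hr), the normalized record shape, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from two backend sources, each with its own field names.
ehr_rows = [{"mrn": "A17", "dx": "asthma"}]
monitor_rows = [{"patient": "A17", "hr": 128}, {"patient": "B02", "hr": 101}]


def from_ehr(row):
    """Map an EHR row onto the shared (hypothetical) staging schema."""
    return {"patient_id": row["mrn"], "kind": "diagnosis", "value": row["dx"]}


def from_monitor(row):
    """Map a bedside-monitor row onto the same staging schema."""
    return {"patient_id": row["patient"], "kind": "heart_rate", "value": row["hr"]}


def link_by_patient(records):
    """Group normalized records so each patient's data sits together."""
    linked = defaultdict(list)
    for rec in records:
        linked[rec["patient_id"]].append((rec["kind"], rec["value"]))
    return dict(linked)


normalized = [from_ehr(r) for r in ehr_rows] + [from_monitor(r) for r in monitor_rows]
staged = link_by_patient(normalized)
```

Because the staged records still carry the real patient identifier, a store like this would sit inside the secured, VPICU-controlled tier rather than in a research database.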

[Diagram labels: File-based storage, VPICU-owned resources]

33
VPICU Architecture

Research databases
Application-specific
Optimized
Contain de-identified or anonymized data
VPICU ontology, schema
Access via configurable web services
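
One mechanical piece of producing de-identified research data is dropping direct identifiers and replacing the patient key with a stable pseudonym. The Python sketch below shows only that mechanism. The identifier list, the field names, and the key are assumptions made for illustration; the slides do not describe how VPICU de-identifies data, and real de-identification has to handle much more (dates, free text, rare values).

```python
import hashlib
import hmac

# Assumption: which fields count as direct identifiers is decided elsewhere.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth"}


def pseudonym(patient_id, secret_key):
    """Keyed hash so the same patient always maps to the same opaque token."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identifiers(record, secret_key):
    """Drop direct identifiers and replace the patient key with a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["subject"] = pseudonym(record["mrn"], secret_key)
    return cleaned


# Hypothetical key and record; the key would be held outside the research tier.
key = b"example-key-kept-outside-the-research-db"
row = {"name": "Jane Doe", "mrn": "A17", "date_of_birth": "2004-05-01", "hr": 128}
research_row = strip_identifiers(row, key)
```

Using a keyed hash rather than a plain hash means someone who knows a medical record number cannot recompute the pseudonym without the key.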

[Diagram label: File-based storage]

34
What are research databases?

Designed for specific research questions,
analytical techniques
Need not always be relational or databases at all
Available via web interfaces and software
services: a researcher using R can connect directly
through R bindings
Examples:
Relational database for traditional retrospective
studies
Search engine over free text clinical notes, etc.
Patient/patient comparison, retrieval (find
patients like this one)
Data-backed patient simulator for testing
interventions
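
As a toy illustration of the "find patients like this one" example, the Python sketch below ranks a small cohort by Euclidean distance to a query patient. The feature vectors, subject identifiers, and the choice of Euclidean distance are assumptions for illustration; the slides do not say which similarity measure such a research database would use.

```python
import math

# Hypothetical, already-scaled feature vectors keyed by de-identified subject.
patients = {
    "s1": (0.2, 0.9, 0.1),
    "s2": (0.8, 0.1, 0.7),
    "s3": (0.25, 0.85, 0.2),
}


def most_similar(query, cohort, k=1):
    """Return the k subject ids whose vectors lie closest to the query."""
    ranked = sorted(cohort, key=lambda sid: math.dist(query, cohort[sid]))
    return ranked[:k]


nearest = most_similar((0.2, 0.9, 0.15), patients, k=2)
```

A store built for this kind of retrieval need not be relational, which is the point the slide makes about research databases being shaped by the analysis they serve.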

35
VPICU Architecture
[Diagram; surviving label: File-based storage]

36
OODT and the VPICU Data System

Develop an Information Model (Ontology) to
describe the domain
Develop compute services that support extraction
of data from existing CHLA databases (OODT Query
Handlers)
Identify mechanisms to integrate information
objects from disparate repositories and map them
to the common domain ontology (OODT CAS crawler,
catalog services)
Construct a set of online research databases to
enable data mining and analysis (OODT Catalog and
Archive Services)
Deploy a data grid infrastructure of hardware and
software to facilitate utilization of the data
environment at CHLA and beyond (external entities
and applications) (OODT Data Grid Services)
Deploy a set of compute services to support data
mining and analysis
Develop an architectural plan and roadmap for
scaling and integrating other PICUs

37
OODT as Open Source

Jan 2010: OODT accepted as a podling in the
Apache Software Foundation (ASF) Incubator
First NASA software licensed and incubating
within the ASF
Learn more and track our progress at
http://incubator.apache.org/projects/oodt.html
Join the mailing list:
oodt-dev@incubator.apache.org
Chat on IRC:
#oodt on irc.freenode.net

38
Acknowledgements

Jet Propulsion Laboratory: Dan Crichton, Chris
Mattmann, Sean Kelly, Steve Hughes, Amy
Braverman, Thuy Tran
National Cancer Institute: Sudhir Srivastava,
Christos Patriotis, Don Johnsey
Fred Hutchinson Cancer Research Center: Mark
Thornquist, Ziding Feng, Jackie Dalhgren, Suzanna
Reid
Childrens Hospital Los Angeles: Randall Wetzel,
Robinder Khemani, Paul Vee, Jeff Terry, Robert
Kaptan, Doug Hallam