Challenges In Transforming Observational Data For Analysis

1
Challenges In Transforming Observational Data For
Analysis
OR How To Call Into Question Your Observational
Data Without Even Trying
Don Griffin, Health Informatics Technology
Director, Computer Sciences Corporation, May 20,
2009
2
Objectives
  • Lofty Objective
  • Present a complete health informatics solution
  • that is flexible enough to accommodate all of the
    types of source data that end users will
    require, even if they do not know what those data
    will be, and
  • that is rich enough in functionality to support
    all of the data transformations and manipulations
    that end users will require to convert those
    source data into research-oriented knowledge on
    which they may confidently rely.
  • More Practical Objective
  • Leave those in the audience with an appreciation
    for the things that must be done ahead of time to
    make multifarious, disparate observational
    source data sets useful for analysis.

3
Definitions
  • Observational Data
  • ... the outcomes of acts of measurement using
    particular protocols within the context of any
    objective scientific measurement activity.
  • the basic or atomic notion of an observation
    represents
  • the outcome of some measurement taken of a
    defined attribute or characteristic of some
    entity (e.g., an organism in the field, a
    specimen, a sample, an experimental treatment,
    etc.),
  • within some context (possibly given by other
    observations).
  • Every observation entails the measurement of one
    or more properties of some real-world entity or
    phenomenon.
  • Biodiversity Information Standards (TDWG)
  • For Our Purposes
  • we are most interested in observational data on
    drug exposures and medical conditions (but other
    data may interest us, too), and
  • chief sources will be Medical Claims and
    Electronic Health Records (EHRs).

4
Definitions
  • Data Transformation
  • ... the operation of changing (as by rotation or
    mapping) one configuration or expression into
    another in accordance with a mathematical rule
    especially a change of variables or coordinates
    in which a function of new variables or
    coordinates is substituted for each original
    variable or coordinate
  • an operation that converts (as by insertion,
    deletion, or permutation) one grammatical string
    (as a sentence) into another
  • Merriam-Webster's Dictionary
  • One of the three pillars of data governance
    (along with compliance and integration).
    Transformation is a goal unto itself, as well as
    an enabler for the goals of compliance and
    integration.
  • The Data Warehousing Institute
  • For Our Purposes
  • we are most interested in reformatting data into
    a Common Data Model that allows portability of
    analysis methods across disparate source data
    sets, and
  • in standardizing data representations to make
    analysis results from disparate source data sets
    readily comparable.
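
A minimal sketch of what these two purposes can look like in practice. Everything named here is an assumption made for illustration: the DrugExposure record, the field names, the pipe-delimited claims layout, and the tiny DRUG_CODE_MAP lookup are hypothetical and are not taken from the presentation or from any particular Common Data Model.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from source-specific drug codes to one standard code.
# A real mapping would come from a controlled vocabulary, not a literal dict.
DRUG_CODE_MAP = {
    ("claims", "RX-0042"): "STD-ASPIRIN",
    ("ehr", "asa 81mg"): "STD-ASPIRIN",
}


@dataclass(frozen=True)
class DrugExposure:
    """One row of a hypothetical common-model drug exposure table."""
    person_id: str
    drug_code: str
    start_date: date
    source: str


def from_claim(line: str) -> DrugExposure:
    """Reformat one pipe-delimited claims line: person|code|YYYYMMDD."""
    person, code, ymd = line.split("|")
    start = date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:8]))
    return DrugExposure(person, DRUG_CODE_MAP[("claims", code)], start, "claims")


def from_ehr(record: dict) -> DrugExposure:
    """Reformat one EHR record whose dates are ISO strings."""
    code = DRUG_CODE_MAP[("ehr", record["medication"].lower())]
    start = date.fromisoformat(record["ordered_on"])
    return DrugExposure(record["patient"], code, start, "ehr")


def exposure_count(rows, drug_code):
    """An analysis method written once, against the common model only."""
    return sum(1 for row in rows if row.drug_code == drug_code)


rows = [
    from_claim("P001|RX-0042|20090115"),
    from_ehr({"patient": "P002", "medication": "ASA 81mg",
              "ordered_on": "2009-02-03"}),
]
print(exposure_count(rows, "STD-ASPIRIN"))
```

The analysis function never sees the source layouts, which is the portability the slide describes; the shared standard code is what makes counts from the two sources comparable.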

5
Transforming Observational Data
  • Again, for our purposes, the process is rather
    simple. However, to do it correctly presents some
    challenges.

7
The IT View of the End User's Goal
  • Skillful use of Common Data Model content to
    communicate complex ideas with clarity,
    precision, and efficiency (and, ideally,
    unimpeachability)
  • Show the data
  • Induce the viewer to think about the substance
    rather than about methodology, graphic design,
    the technology of graphic production, or
    something else
  • Avoid distorting what the data have to say
  • Present many numbers in a small space
  • Make large data sets coherent
  • Encourage the eye to compare different pieces of
    data
  • Reveal the data at several levels of detail, from
    a broad overview to the fine structure
  • Serve a reasonably clear purpose: description,
    exploration, tabulation, or decoration
  • Be closely integrated with the statistical and
    verbal descriptions of a data set

Edward Tufte, The Visual Display of Quantitative
Information
8
The IT View of IT's Goals
  • Provide services necessary to populate the Common
    Data Model
  • Data Architecture
  • Data Collection
  • Data Extraction, Transformation, and Loading
    (ETL)
  • Data Management
  • Help (or do not hinder) end users in pursuit of
    their own goals
  • Preserve the data (i.e., their native values,
    formats, etc.)
  • Avoid distorting the data
  • Maintain data detail
  • Foster the widespread understanding of the data
  • What the data are and are not
  • What the data can and cannot do

9
IT Issues/Challenges
[Diagram labels: Source, Target (CDM), Data Management, Technical, Data Collection, ETL Design, Data Architecture, Data Understanding, Philosophical]
10
IT Issues/Challenges
  • Data Collection
  • Batch vs. Stream
  • Reception and Profiling
  • Verification to Specification
  • Culling and Cleansing
  • Staging
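
A minimal sketch of the profiling and verification-to-specification steps listed above. The SPEC dictionary, its rule names, and the sample rows are hypothetical; an actual specification would come from the agreement with the data supplier.

```python
from collections import Counter

# Hypothetical specification for a delivered file: field name -> rules.
SPEC = {
    "person_id": {"required": True, "max_len": 10},
    "sex": {"required": True, "allowed": {"F", "M", "U"}},
    "birth_year": {"required": False, "max_len": 4},
}


def profile(rows, field):
    """Summarize what a field actually contains, before judging it."""
    values = [row.get(field, "") for row in rows]
    return {
        "count": len(values),
        "missing": sum(1 for value in values if value == ""),
        "distinct": len(set(values)),
        "top": Counter(values).most_common(3),
    }


def verify(rows, spec):
    """Return (row_index, field, problem) for every spec violation."""
    problems = []
    for index, row in enumerate(rows):
        for field, rule in spec.items():
            value = row.get(field, "")
            if value == "":
                if rule.get("required"):
                    problems.append((index, field, "missing"))
                continue
            if "max_len" in rule and len(value) > rule["max_len"]:
                problems.append((index, field, "too long"))
            if "allowed" in rule and value not in rule["allowed"]:
                problems.append((index, field, "unexpected value"))
    return problems


rows = [
    {"person_id": "P001", "sex": "F", "birth_year": "1950"},
    {"person_id": "P002", "sex": "X", "birth_year": ""},
    {"person_id": "", "sex": "M", "birth_year": "19555"},
]
print(profile(rows, "sex"))
for problem in verify(rows, SPEC):
    print(problem)
```

Profiling reports what is there; verification compares it with what was promised. Rows flagged here would be candidates for the culling and cleansing step rather than being silently dropped.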

11
Profiling
12
Verification to Specification
13
Profiling
14
Profiling
15
Verification to Specification
16
IT Issues/Challenges
  • Data Management
  • Inventory and Tracking
  • Privacy, Security, and Compliance
  • Master/Reference Data Management
  • Logging and Auditing

17
Privacy
  • Protected Health Information
  • Any information (not just textual data) in the
    medical record or designated data set that can be
    used to identify an individual, and
  • That was created, used, or disclosed in the
    course of providing a health care service (e.g.,
    diagnosis, treatment, etc.)
  • HIPAA regulations allow researchers to access and
    use PHI when necessary to conduct research.
    However, HIPAA only affects research that uses,
    creates, or discloses PHI that will be entered
    into the medical record or that will be used for
    the provision of health care services (e.g.,
    treatment).
  • Research studies involving review of existing
    medical records for research information, such as
    retrospective chart review, are subject to HIPAA
    regulations.
  • Research studies that enter new PHI into the
    medical record (e.g., because the research
    includes rendering a health care service, such as
    diagnosing a health condition or prescribing a
    new drug or device for treating a health
    condition) are also subject to HIPAA regulations.
  • If in doubt, stay away from the 18 identifiers.

18
Privacy
  • 18 Identifiers
  • 1. Names
  • 2. All geographical subdivisions smaller than a
    State, including street address, city, county,
    precinct, zip code, and their equivalent
    geocodes, except for the initial three digits of
    a zip code, if according to the current publicly
    available data from the Bureau of the Census (1)
    the geographic unit formed by combining all zip
    codes with the same three initial digits contains
    more than 20,000 people, and (2) the initial
    three digits of a zip code for all such
    geographic units containing 20,000 or fewer
    people is changed to 000.
  • 3. All elements of dates (except year) for dates
    directly related to an individual, including
    birth date, admission date, discharge date, and
    date of death, and all ages over 89 and all
    elements of dates (including year) indicative of
    such age, except that such ages and elements may
    be aggregated into a single category of age 90 or
    older
  • 4. Phone numbers
  • 5. Fax numbers
  • 6. Electronic mail addresses
  • 7. Social Security numbers

19
Privacy
  • 18 Identifiers
  • 8. Medical record numbers
  • 9. Health plan beneficiary numbers
  • 10. Account numbers
  • 11. Certificate/license numbers
  • 12. Vehicle identifiers and serial numbers,
    including license plate numbers
  • 13. Device identifiers and serial numbers
  • 14. Web Universal Resource Locators (URLs)
  • 15. Internet Protocol (IP) address numbers
  • 16. Biometric identifiers, including finger and
    voice prints
  • 17. Full face photographic images and any
    comparable images, and
  • 18. Any other unique identifying number,
    characteristic, or code (note: this does not mean
    the unique code assigned by the investigator to
    code the data)
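
A minimal sketch of how identifiers 2 and 3 (geography and dates) might be generalized in code. The SPARSE_ZIP3 set is a made-up placeholder; the real list of three-digit prefixes covering 20,000 or fewer people has to be taken from current Census Bureau data. The function names are likewise hypothetical.

```python
from datetime import date

# Placeholder values only. The real set of sparse three-digit prefixes
# must be derived from current Bureau of the Census data.
SPARSE_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or 000 when the area is sparse."""
    prefix = zip_code[:3]
    return "000" if prefix in SPARSE_ZIP3 else prefix


def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into one category."""
    return "90+" if age > 89 else str(age)


def generalize_birth_date(birth: date, age: int):
    """Keep only the year, and not even that when it reveals age over 89."""
    return None if age > 89 else birth.year


print(generalize_zip("94720"))
print(generalize_zip("03601"))
print(generalize_age(93))
print(generalize_birth_date(date(1960, 5, 1), 48))
```

This covers only two of the 18 identifiers and is not a compliance check; the remaining sixteen are simply removed or replaced rather than generalized.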

20
Privacy
  • De-identification is a possible solution.
    However, additional standards and criteria apply.
  • Any code used to replace the identifiers in
    datasets cannot be derived from any information
    related to the individual and the master codes,
    nor can the method to derive the codes be
    disclosed. For example, a subject's initials
    cannot be used to code his data because the
    initials are derived from his name.
  • The researcher must not have actual knowledge
    that the subject could be re-identified from the
    remaining identifiers in the PHI used in the
    research study. That is, the information would
    still be considered identifiable if there were a
    way to identify the individual even though all of
    the 18 identifiers were removed.
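
A minimal sketch of replacement codes that are not derived from anything about the individual. The function names, the "mrn" field, and the sample rows are hypothetical. The codes come from Python's secrets module, so they are random rather than computed from a name, a birth date, or the original identifier.

```python
import secrets


def assign_study_codes(person_ids):
    """Map each source identifier to a random, unrelated study code."""
    master = {}
    used = set()
    for person_id in person_ids:
        if person_id in master:
            continue  # same person keeps the same code
        code = secrets.token_hex(8)
        while code in used:  # regenerate on the unlikely collision
            code = secrets.token_hex(8)
        used.add(code)
        master[person_id] = code
    return master


def recode(rows, master, id_field="mrn"):
    """Return copies of rows with the identifier replaced by the code."""
    recoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["study_code"] = master[row[id_field]]
        recoded.append(new_row)
    return recoded


rows = [
    {"mrn": "MRN-1001", "condition": "hypertension"},
    {"mrn": "MRN-1002", "condition": "asthma"},
    {"mrn": "MRN-1001", "condition": "diabetes"},
]
master = assign_study_codes(row["mrn"] for row in rows)
released = recode(rows, master)
print(released[0]["study_code"] == released[2]["study_code"])
```

The master dictionary is the re-identification key, so in this sketch it would be stored apart from the released rows. A hash of the medical record number would not qualify, because it is derived from information related to the individual.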

21
Privacy
  • The following is NOT considered PHI, and
    therefore is not subject to HIPAA regulations.
  • Health information absent the 18 identifiers.
  • Data that would ordinarily be considered PHI, but
    which are not associated with or derived from a
    healthcare service event (treatment, payment,
    operations, medical records), not entered into
    the medical record, and not disclosed to the
    subject. Research health information that is kept
    only in the researcher's records is not subject
    to HIPAA, but is regulated by other human
    subjects protection regulations.
  • Examples of research health information not
    subject to HIPAA include such studies as the use
    of aggregate data, diagnostic tests that do not
    go into the medical record because they are part
    of a basic research study and the results will
    not be disclosed to the subject, and testing done
    without the PHI identifiers.
  • Some genetic basic research can fall into this
    category such as the search for potential genetic
    markers, promoter control elements, and other
    exploratory genetic research.
  • In contrast, genetic testing for a known disease
    that is considered to be part of diagnosis,
    treatment and health care would be considered to
    use PHI and therefore subject to HIPAA
    regulations.

University of California, Berkeley Committee for
Protection of Human Subjects
22
IT Issues/Challenges
  • Data Extraction
  • Form (e.g., ASCII vs. EBCDIC)
  • Format (e.g., delimited, fixed-length, ragged
    right, etc.)
  • Data Transformation
  • Reformatting (usually from flat to relational)
  • Probabilistic Matching
  • Augmentation (excluding Standardization)
  • Master <fill in the blank> Indexing
  • Standardization
  • Data Loading
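
A minimal sketch of the extraction concerns named above: form (EBCDIC vs. ASCII) and format (fixed-length records). The LAYOUT offsets and field names are hypothetical. The code page cp037 is one of several EBCDIC variants shipped with Python, and which variant a given source uses is an assumption that has to be confirmed with the supplier.

```python
# Hypothetical fixed-length layout: (field name, start offset, length).
LAYOUT = [
    ("person_id", 0, 6),
    ("drug_code", 6, 8),
    ("fill_date", 14, 8),
]
RECORD_LENGTH = 22


def split_records(blob: bytes):
    """Cut a byte stream into fixed-length records."""
    if len(blob) % RECORD_LENGTH:
        raise ValueError("stream length is not a multiple of the record length")
    return [blob[i:i + RECORD_LENGTH]
            for i in range(0, len(blob), RECORD_LENGTH)]


def decode_record(raw: bytes, encoding: str = "cp037") -> dict:
    """Decode one record and slice it into named, trimmed fields."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError("unexpected record length")
    text = raw.decode(encoding)  # single-byte code page: 1 byte = 1 char
    return {name: text[start:start + length].strip()
            for name, start, length in LAYOUT}


# Build a small EBCDIC sample so the sketch runs without an input file.
blob = ("P00001RX-0042 20090115" + "P00002RX-0007 20090203").encode("cp037")
records = [decode_record(raw) for raw in split_records(blob)]
print(records)
```

Decoding the same bytes as ASCII would fail or yield nonsense, which is why the form of the source has to be settled before any transformation logic runs.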

23
Augmentation
24
Standardization
25
IT Issues/Challenges
  • Data Architecture
  • Common Data Model Design Paradigms
  • "All models are wrong, but some are useful"
    (George Box, statistician)
  • Flexibility vs. Intuitiveness Compromise

26
OMOP Common Data Model (conceptual)
27
OMOP Common Data Model (logical)
28
Solution Framework
CORE BUSINESS INTELLIGENCE SERVICES
  • Statistical Analysis and Validation
  • Reports/Dashboards
  • Process Models
  • OLAP, ROLAP, MOLAP, HOLAP
  • Business Rules/Predictive Models
  • Queries
  • Optimization
FOUNDATIONAL DATA SERVICES
  • Data Architecture: Database Management System,
    Data Models, Metadata
  • Data Collection: Reception and Profiling,
    Verification to Specification, Culling and
    Cleansing, Staging for Integration
  • Data Integration: Probabilistic Matching,
    Augmentation, Controlled Medical Vocabularies,
    Master Person Indexing
  • Data Management: Inventory and Tracking; Privacy,
    Security, and Compliance; Master/Reference Data
    Maintenance; Logging and Auditing
SUPPORTING SERVICES
  • Business Integration Services
  • Presentation and Portal Services
  • Systems Management Services
29
Solution Context
OVERALL SOLUTION STEWARDSHIP
  • Strategy
  • Process Intelligence
  • Governance
LIFE SCIENCES SOLUTIONS
  • Scientific Applications
  • Operational Reporting
  • Marketing
  • Study Recruitment
  • Drug Safety Monitoring
  • Site Management
  • Clinical Data Management
  • Market Intelligence
  • Exploratory Data Analysis
  • Study Management
  • Protocol Feasibility
  • Health Outcomes Economics
  • Drug Safety Management
  • Executive Dashboards
  • Licensing Intelligence
  • Closed Loop Marketing
CORE BUSINESS INTELLIGENCE SERVICES, FOUNDATIONAL DATA SERVICES, SUPPORTING SERVICES
  • Same components as on the Solution Framework
    slide (slide 28)
30
Thank You
Don Griffin (dgriffin2_at_csc.com), Health
Informatics Technology Director, Computer Sciences
Corporation, May 20, 2009