Data Management - PowerPoint PPT Presentation



Title: Data Management

  • Data Management
  • Issues for
  • Clinical Research
  • Allan Williams - Database Manager
  • Anthony Mwatha - Statistician
  • Statistical Center for HIV/AIDS Research and

Key Principles
  • Keep your focus on the big picture. Some things
    are more important than others.
  • Pay attention to the details. The devil is in
    those details.
  • Case Report Form Development
  • Data Collection and Management
  • Use a Risk Management perspective: what is most
    important to worry about, and if problems occur,
    what is your mitigation strategy?

Important References
  • Take good care of your data, Svend Juul, 2004:
    wise, excellent practical suggestions. Examples
    for EpiData, SPSS, Stata, but useful for any data
    management and analysis environment
  • Good Clinical Data Management Practices, Version
    4, September 2007, Society of Clinical Data
  • Excellent reference for entire field
  • Recommendations for Best Practices and Standard
    Operating Procedures (SOPs)

What is the big picture?
  • Protocol: Primary Objectives, Endpoints
  • How will they be measured, collected, checked?
  • Secondary Objectives
  • Other major pre-planned analysis

How Much Data?
  • Friedman, Furberg, DeMets, 1996
  • Beware of "it would be interesting to know", Pg.
  • A typical Phase III clinical trial of primary HIV
    infection conducted by the ACTG collects more
    than 1,000 data items per patient. Less than 100
    items usually appear in the published reports.
    NDAs also tend to focus on the same items.
  • Clinical Trial Safety Surveillance, DIA, 1997
  • The more that is asked for on a CRF, the more
    variability will occur in the answers.
    Investigators complain about being overwhelmed
    with CRFs. This does not create an ideal
    environment for collecting quality safety data.
  • DeMets, 2006 Course Description, Introduction to
    Clinical Trials
  • Many studies also fail because the amount of
    data collected exceeds what is necessary and what
    is affordable.

Less is More
  • "Make everything as simple as possible, but no
    simpler" (Einstein)
  • Long questionnaires do not necessarily give
    better or more complete information
  • Respondent fatigue
  • Strategies for avoidance Bias
  • Overly complex data collection may decrease
    overall quality enough to jeopardize the ability
    to answer the primary hypothesis
  • Not having a primary hypothesis is itself a
    problem with respect to "less is more"

Definition of Data Quality
  • There can be no perfect data set
  • Quality data would therefore be defined as data
    that sufficiently support conclusions and
    interpretations equivalent to those derived from
    error-free data.
  • From Assuring Data Quality and Validity in
    Clinical Trials for Regulatory Decision Making,
    Workshop Report, National Academy Press,
    Washington, D.C., 1999.
  • http//
  • Data can be overcleaned: perfect data is
    probably not true

Pay attention to the details
  • Design your forms, database design, and QA
    activities around those primary, secondary and
    other major pre-planned objectives.
  • Build in support mechanisms and layered QA
    processes around the data for those key
  • Validate that those processes and QA systems work
    as intended
  • Think about Risk Management and Audits

The CRFs are the study
  • Form design is even harder than a good protocol
  • Focus on Primary/Secondary Hypothesis Definition
  • Focus on Safety (if clinical intervention)
  • Organize Visit Structure
  • Pilot test forms and visit flow
  • Details count. Finding operational problems early
    is better than patching up for them afterwards
  • Review, Review, Review! Multiple people with
    different viewpoints, responsibilities

CRF Design
  • Standardized Modules
  • Eligibility
  • Enrollment/Randomization Confirmation
  • Vital Signs/Physical Exam
  • Medical History/Concomitant Disorders
  • Participant Compliance
  • Outcome Measurements
  • Laboratory Tests and Specimen Tracking
  • Adverse Experiences, SAE Reporting
  • Concomitant Medications
  • Behavioral (e.g. Quality of Life, compliance, diary)
  • Administrative (visit record, change in study
  • Termination (Finished/Withdrawal)

Visit Schedules
  • Which data and lab test collected at each visit?
  • What are visit windows?
  • Are data allowed to be collected outside the
    window, or is the visit counted as missed?
  • Which forms are required at a visit? Which are
    optional? How is each tracked?
  • Events are often treated as logs (Adverse
    experiences, concomitant meds, patient diaries)
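
The visit-window questions above can be made concrete with a small sketch. The window widths, the target day, and the function name below are hypothetical; whether an out-of-window visit counts as missed is a protocol decision that the slides leave open.

```python
from datetime import date, timedelta


def classify_visit(enrolled, visit, target_day, days_before, days_after):
    """Classify a visit date against its allowed window.

    target_day is the scheduled study day counted from enrollment;
    days_before/days_after are the (hypothetical) window widths.
    """
    target = enrolled + timedelta(days=target_day)
    opens = target - timedelta(days=days_before)
    closes = target + timedelta(days=days_after)
    if opens <= visit <= closes:
        return "in window"
    return "early" if visit < opens else "late"
```

A tracking report could then list every "early" or "late" visit for review, rather than silently accepting or discarding it.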

Data Collection Forms
  • Designing data collection forms
  • Organization and content
  • Review protocol for required data; create as
    form items
  • Deciding on what forms are needed by visit and
  • Some logical divisions include
  • Who will be completing the forms?
  • When will the data be available?
  • Where will the data be collected?
  • Better to have many pages than to try and fit too
    much in one page.

Data Collection Forms
  • Design forms with ease of use in mind
  • For the individuals filling in the forms,
    entering the data, and analyzing the data.
  • Responses to questions include text strings,
    numeric or categorical values.
  • Text strings: responses to these types of
    questions contain data that is typically not
    analyzed, e.g. name of a medication. May have to
    categorize later if you want to analyze them.
  • Allow adequate space for handwriting the

Data Collection Forms (Keys)
  • Participant ID number
  • Don't overload with meaning
  • Use of checksum
  • Initials for cross-check (confidentiality?)
  • Unique to study or keep across studies
  • Study ID
  • Site ID
  • Form type/page sequence number
  • Visit number, visit date
  • Interviewer ID (Regulatory requirement)
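
The "use of checksum" item for participant IDs can be illustrated with a short sketch. The slides do not name an algorithm; the Luhn check digit used here is an assumption, chosen only because it is widely known, and the function names are hypothetical.

```python
def luhn_check_digit(base):
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(base)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_participant_id(base):
    """Append the check digit to a numeric base ID."""
    return f"{base}{luhn_check_digit(base)}"


def is_valid_participant_id(pid):
    """True if the last digit matches the check digit of the preceding digits."""
    if not (pid.isascii() and pid.isdigit() and len(pid) > 1):
        return False
    return luhn_check_digit(pid[:-1]) == int(pid[-1])
```

A check digit of this kind catches every single-digit error and most transpositions of adjacent digits at data entry, which is the point of adding one to a key field.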

Data Collection Forms (numeric)
  • Provide correct number of boxes for the answer;
    pre-print decimal points, commas or
    punctuation; specify relevant units; if rounding
    is required, be clear on whether to round up or
  • Examples
  • Weight ____ . _ Kg
  • CD4 cells ____ cells / µL
  • BP ___ / ___ mmHg
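
The point about stating the rounding rule matters because different rules give different recorded values. A minimal sketch, assuming a weight recorded to one decimal place as in the Weight example; the function name and the choice of rules are illustrative only.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_weight_kg(value, rule=ROUND_HALF_UP):
    """Round a weight given as a decimal string to one decimal place."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=rule)


half_up = round_weight_kg("70.25")                      # rounds the tie upward
half_even = round_weight_kg("70.25", ROUND_HALF_EVEN)   # rounds the tie to the even digit
truncated = round_weight_kg("70.29", ROUND_DOWN)        # drops the extra digit
```

Passing the value as a string keeps the written digits exact; converting through a binary float first could change which way a tie is rounded.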

Format of Questions
  • Keep text as concise and clear as possible
  • __ __ .__ °C Temperature
  • __ __ .__ °C What was the patient's temperature
    at delivery?
  • Use terminology that is familiar / pilot the
  • Date formats, US vs. international:
    09-OCT-2003 / 10-9-2003 / 9-10-2003
  • Time: 24 hr clock, 0000 hrs vs. 2400 hrs
  • Unknown/Not applicable/Not available/Not done
  • Other
  • Give reason for stopping
  • 1. Completed per protocol
  • 2. Refused
  • 3. Side effects
  • 4. Other, specify ________________
  • 9. Unknown
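
The date-format item above is easy to demonstrate: the same all-numeric string parses to two different dates depending on the assumed convention, while the DD-MON-YYYY form is unambiguous. The helper names below are hypothetical, and the fixed month table is a design choice to avoid depending on the machine's locale.

```python
from datetime import date, datetime

raw = "10-9-2003"
as_us = datetime.strptime(raw, "%m-%d-%Y").date()    # month first
as_intl = datetime.strptime(raw, "%d-%m-%Y").date()  # day first

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_crf_date(d):
    """Format a date as DD-MON-YYYY, e.g. 09-OCT-2003."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"


def parse_crf_date(text):
    """Parse DD-MON-YYYY; raises ValueError on an unknown month or bad day."""
    day, mon, year = text.strip().upper().split("-")
    return date(int(year), MONTHS.index(mon) + 1, int(day))
```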

Modifications to CRF
  • After activation of study, only when absolutely
  • If necessary things to consider
  • Change adds a question
  • Change adds an additional response
  • A question is being removed
  • Form version numbers/dates
  • Procedures for dealing with missing data (before
  • Documentation of change, impacts, mitigation

Data Collection and Management Process Overview
(multi-part paper forms)
Data Management
Database Design
  • Desirable Clinical Data Management system
  • DE screens easy to set up and maintain
  • Supports variable labels, range checking,
    category checks, skip patterns, edit-check
  • Options for missing values Missing vs. NA vs.
    required skip
  • Supports relational links (mother, children
  • Support for double entry and verification
  • Ability to do queries, w/out extensive
  • Good links to SPSS, SAS, other statistics package
  • Multi-user, simultaneous read/write
  • Cost
  • Patient tracking
  • Security
  • Audit trails
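
The range checks, category checks, and skip patterns listed above can be sketched in a few lines. This is an illustration only, not the behavior of any particular data management system: the field names and the weight limits are hypothetical, while the category codes reuse the "reason for stopping" list from the Format of Questions slide.

```python
VALID_STOP_REASONS = {1, 2, 3, 4, 9}   # codes from the "reason for stopping" item
WEIGHT_RANGE_KG = (30.0, 250.0)        # hypothetical plausibility limits


def check_record(rec):
    """Return a list of edit-check messages for one record (empty if clean)."""
    problems = []

    # Range check, with missing reported separately from out of range.
    weight = rec.get("weight_kg")
    if weight is None:
        problems.append("weight_kg: missing")
    elif not WEIGHT_RANGE_KG[0] <= weight <= WEIGHT_RANGE_KG[1]:
        problems.append("weight_kg: out of range")

    # Category check.
    reason = rec.get("stop_reason")
    if reason not in VALID_STOP_REASONS:
        problems.append("stop_reason: invalid category")

    # Skip pattern: the free-text field is required only for code 4 (Other).
    specify = (rec.get("stop_reason_other") or "").strip()
    if reason == 4 and not specify:
        problems.append("stop_reason_other: required when stop_reason is 4")
    if reason != 4 and specify:
        problems.append("stop_reason_other: must be blank unless stop_reason is 4")

    return problems
```

Returning messages instead of raising on the first failure lets a single pass produce the full query list for a form.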

Database/Entry Options (1)
  • Microsoft Excel (Not recommended)
  • General Database Software
  • Microsoft Access
  • Filemaker

Database/Entry Options (2)
  • Epi. Data Entry/Data Management Systems
  • Epi Info (
  • Uses Microsoft Access, writes Access .mdb
    files, which can be read by SPSS, SAS, STATA
  • Strongly Point and Click oriented, easy to use
  • Supported by CDC
  • EpiData (
  • Danish Free - Not Access (not affected by Access
  • Outputs to Excel, SAS, SPSS, Stata
  • Oriented to Batch Processing, but easy to set up
  • Good edit check language and Codebook
    documentation features

Database/Entry Options (3)
  • Scanning
  • Optical Mark Reading ("fill in the bubble") - NCS
  • Fax submission, image scanning: Teleform, DataFax
  • Commercial Clinical Trial Data Management
  • Oracle Clinical
  • Phase Forward: Inform (Web based data capture),
    Clintrial (Oracle based)
  • DataFax: image based CRF transfer, central data
    entry, often internet based transfer

Paper Change Control
  • Multi-part Paper Model
  • Forms separated when sent to Sponsor for Data
  • Change Control very complicated (Requires
    change-NCR forms!)
  • Result is delay of sending and extensive onsite
    checking by CRA prior to splitting
    (harvesting) to send to Stat. Center for batch
    Data Entry

Centralized Paper Models
  • Most familiar to pharmaceutical and FDA
  • CRFs filled out at site, sent to SC
  • Multi-part carbonless forms standard
  • CRA checks forms before harvesting to send to
  • SC double-enters paper forms in batches
  • Site focuses on forms and protocol, not
  • Huge paper flow tracking issues
  • Change Control and QC flow is complex
  • May get into field quickly. Easiest for large
  • Strong software support from major
    vendors (Clintrial, Oracle Clinical, SAS
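
Double entry, as in the bullet on the SC double-entering paper forms, comes down to comparing two independent keyings of the same form and resolving each disagreement against the paper. A minimal sketch, assuming each keying is held as a dictionary of field name to entered value (the function name is hypothetical):

```python
def compare_entries(first, second):
    """Return (field, first_value, second_value) for every field that differs.

    A field present in only one keying is reported with None on the other side.
    """
    fields = sorted(set(first) | set(second))
    return [(f, first.get(f), second.get(f))
            for f in fields
            if first.get(f) != second.get(f)]
```

An empty result means the two keyings agree; anything else becomes a verification task for a third person.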

WEB Models
  • Mixture of remote and centralized
  • Remote data entry onto central database
  • Single or double entry?
  • Less site IT support: use browser model
  • Control over entry, edits like distributed model
  • Infrastructure requirements
  • Web, network, good ISPs, Bandwidth issues
  • New environment, FDA cautious about Investigator
  • Roll your own or Purchased (

Image Change Control
  • Image/Fax Model
  • Single piece of paper
  • Always at study site
  • All changes made directly to form and resent
  • Important: change in white space, initialed and
  • Encourages sending of CRF immediately after use
  • Image and data can be immediately available to
    DE, programmers, statisticians, QA at same time
  • Audit trail and backup very important
  • Corrections may require multiple
  • Pages in a form may be sent out of sequence

Risk Management Approach
  • Invest in selected prevention and risk reduction
    activities to optimize regulatory compliance,
    problem resolution efforts and costs.
  • Bottom line: avoid costly activities that have
    little or no value, in order to prevent
    significant regulatory and operational costs
    throughout the system life cycle

Layered Risk Management
  • Most secure, least risky systems use a layered
    approach, with multiple approaches and points for
    proactively reducing risk, and/or detecting risk
    events and having an effective mitigation
    strategy upon detection
  • Increasing probability of detecting a fault
    condition is an effective risk mitigation strategy

Categorize Risk Priority
  • Risk Type
  • Regulatory risk: potential impact on patient
    safety, decision quality, or regulated data
  • Business risk: potential impact on reputation,
    brand, development, cycle time, or revenue
  • Likelihood of event occurring (high, medium, low)
  • Severity of impact occurring (high, medium, low)
  • Detectability of discovering a fault condition
    (high, medium, low)
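
One way to turn the three ratings above into a ranking is a multiplicative score in the style of a failure mode and effects analysis. The slides do not prescribe a formula, so the 1 to 3 scale, the multiplication, and the function name are all assumptions made for illustration.

```python
LEVEL = {"low": 1, "medium": 2, "high": 3}


def risk_priority(likelihood, severity, detectability):
    """Score a risk from 1 (lowest priority) to 27 (highest priority).

    Detectability is inverted: a fault that is hard to detect ("low")
    contributes more risk than one that is easy to detect ("high").
    """
    return LEVEL[likelihood] * LEVEL[severity] * (4 - LEVEL[detectability])
```

Sorting candidate risks by this score gives a first-pass order for deciding where validation and QA effort should go.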

Sources of Error in a Study
  • Omission, mis-communication
  • Data entry errors
  • Programming, summary tables, statistical
  • Clinical interpretation

Questions to Ask
  • The following questions should be answered to
    identify critical systems/data.
  • Does the system have direct or indirect impact on
    patient safety?
  • Which functions of the system have the most/least
  • Is the data included in regulatory submissions?
  • Has the FDA audited the data in the past?
  • What is the worst thing that could happen if data
    was lost or corrupted?
  • What will happen if the system is not available?
  • Are there external processes that can detect
    failure of system?
  • Are all processes documented and can the process
    be reconstructed?
  • Determine how the validation effort will be
    focused in order to thoroughly test the most
    critical aspects of your system

Standardizing, Organizing, Documenting
  • Standardizing
  • Standard CRF modules and items
  • Standard Operating Procedures
  • Organizing
  • Process flow
  • Study communications
  • Programs and systems
  • Documenting
  • All the above
  • Exceptions and deviations from standard
    procedures and mitigation strategy to prevent
    future occurrences

Standard Operating Procedures
  • Commitment to developing SOPs, following those
    SOPs, being able to document that they are/were
    followed, and being able to detect when they are
    not being followed (risk mitigation)
  • Use Society for Clinical Data Management (SCDM)
    Good Clinical Data Management Practices as a
    guide (GCDMP, Vol. 3 or 4) for developing your
    own SOPs.
  • Carefully re-read Take Good Care of your Data
    by Svend Juul

Keeping Organized
  • Create Project master folder, separate from
  • Organize sub-directories
  • Data, code/procedures, documentation, analysis,
  • Keep a log of all actions, modifications,
    analyses, locations, flow of procedures (Data
    Flow Diagram)
  • Standardize on file naming (and extension)
  • Keep copy of each procedure that modifies or
    extends the data. Do not work interactively and
    then not save a copy of what you did. Keep copies
    of any program/processing so that it can be
    re-run later
  • Document your procedures/programs
  • Purpose, inputs, outputs, directory locations,
    programmer, modification history

Batch vs. Interactive Processes
  • Most data management and analysis programs should
    be run as a batch program. This means that the
    program is not run interactively but is saved in
    a standard location as a file and run by an
    external procedure.
  • Change control procedures should document each
    update to the program, and documentation should
    exist that shows that proper testing of the
    changes was performed.
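
A batch step in the sense described above is a saved program that reads named inputs, writes named outputs, and leaves a log entry so the run can be reconstructed later. The sketch below assumes a hypothetical cleaning step (dropping rows with a blank `weight_kg` column) and hypothetical file and function names; it is not taken from the presentation.

```python
import csv
from datetime import datetime
from pathlib import Path


def drop_blank_weights(in_path, out_path):
    """Copy rows to out_path, dropping rows whose weight_kg is blank."""
    kept = dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if (row["weight_kg"] or "").strip():
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped


def run_batch(in_path, out_path, log_path):
    """Run the step and append one line describing it to the processing log."""
    kept, dropped = drop_blank_weights(in_path, out_path)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        log.write(f"{stamp} drop_blank_weights in={Path(in_path).name} "
                  f"out={Path(out_path).name} kept={kept} dropped={dropped}\n")
    return kept, dropped
```

Because the input file is never overwritten, the step can be re-run from the original data whenever the program changes.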

Backing Up, Archiving
  • Backing Up
  • Daily, to disk, tape, both
  • Purpose: to restore data after a loss (accidental
    deletion, disk failure, theft, flood, fire, etc.)
  • Backup when anything of importance changes
  • new data, any modifications, analysis changes,
    doc. changes
  • Keep one copy off-site (able to retrieve quickly
    but distant)
  • Archiving
  • At final or major intermediate stages
    (presentation, publication, outside review, etc)
  • Snapshot: save all data that went into the event,
    along with all programs that were necessary to
    create publications, and all output.
  • Archiving: save all original data and the programs
    to clean, merge, check, and analyze it. Save
    codebook, study protocol, study logs, all
    documentation, all final output, analysis.
    Document structure/directories on CD/DVD/Tape
  • Make multiple copies, keep one off-site in safe
    location permanently
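
An archive is only useful if its copies can later be shown to be intact. One possible approach, not described in the slides, is to store a checksum manifest alongside the archived files and re-verify it on each copy. The manifest file name and function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def write_manifest(root, manifest_name="MANIFEST.sha256"):
    """Write one 'digest  relative/path' line per file under root."""
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue  # do not checksum the manifest itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    (root / manifest_name).write_text("\n".join(lines) + "\n")
    return lines


def verify_manifest(root, manifest_name="MANIFEST.sha256"):
    """Return the relative paths that are missing or whose contents changed."""
    root = Path(root)
    bad = []
    for line in (root / manifest_name).read_text().splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            bad.append(rel)
    return bad
```

Running `verify_manifest` on the off-site copy from time to time gives some assurance that the archive can still be restored.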

  • Good Clinical Data Management Practices, Society
    for Clinical Data Management, Version 3,
    September, 2003. (
  • Guidance for Industry: E6 Good Clinical Practice,
    Consolidated Guidance, ICH, April 1996, 63 pgs.
    (http// Click on Guidelines)
  • North, Phillip. Ensuring Good Statistical
    Practice in Clinical Research: Guidelines for
    Standard Operating Procedures (An Update), Drug
    Information Journal, Vol. 32, pp. 665-682, 1998
  • Data Management for Multicenter Studies: Methods
    and Guidelines, Controlled Clinical Trials, Vol.
    16, Number 2S, April 1995
  • Good Clinical Laboratory Practice Training
    Workshop, PPD-HVTN-NIAID, Washington, DC, May
    12-14 2002 (

Articles (cont)
  • Svend Juul. Take Good Care of your Data.
  • Assuring Data Quality and Validity in Clinical
    Trials for Regulatory Decision Making. Institute
    of Medicine. National Academy Press, 1999.
  • Review of the HIVNET 012 Perinatal HIV Prevention
    Study. Institute of Medicine. National Academy
    Press, 2005. (http//
  • Reviewer Guidance: Conducting a Clinical Safety
    Review of a New Product Application and Preparing
    a Report on the Review, FDA/CDER, February, 2005,
    79 pgs. (

Articles (cont)
  • Clinical Data Capture, Clinical Trial EDC Task
    Group, PhRMA Biostatistics and Data Management
    Technical Group, Society for Clinical Data
    Management, 2005. (http//
  • IATA Dangerous Goods Regulations Manual
    m) or(http//

  • J. Kolman, P. Meng, G. Scott, Good Clinical
    Practice Standard Operating Procedures for
    Clinical Researchers, Wiley, 1998
  • E. McFadden, Management of Data in Clinical
    Trials, Wiley, 1998
  • D. Finkelstein, D. Schoenfeld, AIDS Clinical
    Trials: Guidelines for Design and Analysis,
    Wiley, 1995
  • Gad SC., Taulbee SM., Handbook of data recording,
    maintenance and management for the biomedical
    sciences, CRC Press 1996.
  • John M. Barry, The Great Influenza: The Epic
    Story of the Deadliest Plague in History, Viking
    Press 2004 (Origin of American experimental