Collecting - PowerPoint PPT Presentation

1 / 50
About This Presentation
Title:

Collecting

Description:

Something used as a basis for calculating or measuring (Webster's Dictionary) ... Data Dictionary (Code Book, Metadata) What is the ... Data Dictionary Example ... – PowerPoint PPT presentation

Number of Views:35
Avg rating:3.0/5.0
Slides: 51
Provided by: richard195
Category:

less

Transcript and Presenter's Notes

Title: Collecting


1
Collecting Managing Data
  • General Clinical Research Center
  • Informatics Core
  • Richard Harris
  • (HarrisRichardM_at_UAMS.edu)
  • 501-257-5878

2
Goal
  • Introduction to Data Collection and Management
  • Defining data
  • Collecting data
  • Storing data
  • Retrieving data

3
Data Definition
  • What is data?
  • Representation of facts or concepts in a manner
    suitable for communication, interpretation, or
    processing by humans or by automatic means
  • Something used as a basis for calculating or
    measuring (Websters Dictionary)

4
Representation of Facts
Representation of Facts
105 / 74 110 / 78 108 / 75 103 / 70
5
Representation of Facts
Baghdad, Iraq
Average Temperature High / Low
105 / 74 110 / 78 108 / 75 103 / 70
Years Charted 9 Source International Station
Meteorological Climate Summary, Version 4.0
6
Data to Value
Data produces information
Information
Information enhances knowledge
Knowledge drives action
Action produces outcomes
Outcomes deliver value
Dave Wells, TDWI
7
Bench to Bedside
Drivers and Goals determine strategy
Strategy drives Protocols/Study design
Protocols produce Results
Positive Results produce Value
8
Parallel Relationship
Dave Wells, TDWI
9
Data Collection
  • Begin with the End in Mind
  • The 7 Habits of Highly Effective People
  • by Stephen R. Covey

10
General Considerations
  • Determine
  • What data you will collect
  • Why you will collect it
  • How you will collect it
  • What you will do with it

11
Data Collection
  • Data Dictionary (Code Book, Metadata)
  • What is the purpose of your protocol
  • Can you explain your hypothesis in numerical
    terms
  • List of fields/data points to collect
  • Subject Name, demographic information
  • Diagnosis, medication, lab results
  • Relationship of data collected

12
Data Dictionary Example
13
Data Structure Considerations
  • Use short, meaningful field names with no spaces
    and no special characters (_at_!)
  • Store a 4 digit year, not 2
  • Determine data type
  • Will I want to derive averages, sums, StdDev?
    (One, 1, 1)
  • Contemplate subject or sample ID numbering
    scheme. (HIPAA)
  • Store identifying data in a different place but
    linked.
  • Make users pick from a list when possible for
    more uniform data entry
  • DO NOT put first and last name in same field!
  • Height, Weight, Medication Dosage, etc.

14
For note takers
Determine all relevant data to be
collected. Create a Data Dictionary
(Metadata). De-Identify data. Backup data
regularly.
15
Storing Data
  • A database is a collection of data organized in a
    manner that allows access, retrieval, and use of
    that data.
  • A stack of paper research subject data forms
    could be considered a database.
  • Databases vary from the very simple to the very
    complex

16
Data Access
  • Retrieving information, such as a list of
    subjects with a specific diagnosis, from a stack
    of hundred subject forms would be laborious.

Electronic databases make data management much
easier!
17
Excel Example
18
Excel as a Database
  • An electronic spreadsheet can be considered a
    database.
  • Most spreadsheets are not

Relational
19
Relational Database
20
Relational Database
Lab Table keySubjectID/Lab
Visit Table keySubjectID/Visit
More efficient compared to one spreadsheet!
21
Table Example
Data Types SubjectID Numeric, Integer,
Text SubjectBirthYear Date (Date functions
built-in) Race / Ethnicity Make selection box
with NIH values
22
Collection - Data Entry Form
Modeled to represent Collection Forms or
Flowsheets
23
Collection - Data Entry Form
24
Datasheet or Table View
25
Importance of Relationships
  • Subject Table each subject has one entry
  • Visit Table - entry for each visit
  • LabResults Table entry for each lab test
  • Meds Table- entry for each medication

26
Table Data
Table structures presented as Flat files
27
Importing Data
  • Importing supported from
  • Access
  • Excel
  • Lotus
  • Text
  • Other file formats

28
Relational Database vs Spreadsheet
  • Unlike a spreadsheet a relational database can
    help insure that
  • visit data not collected without subjects basic
    (or root) data being present.
  • duplicate data not entered for the same subject.
  • data not redundantly entered.

29
Advantages of Relational Databases
  • You define the relationships among tables.
  • Each piece of information is stored in one place.
  • Reduces error since changes to information occur
    in only one place.
  • Easy to revise the structure of the data base.

30
Database Management System (DBMS) Software
  • Excel (?)
  • Small Database Software
  • Microsoft Access, FoxPro, mySQL
  • Midsize Database Software
  • Microsoft SQL Server, mySQL, PostgreSQL
  • Large (user base/data set) Software
  • Oracle, DB2, Sybase

31
Query to Retrieve Data
QUERY question to the database to locate
specific information
Examples
  • List all subject IDs with first visit after
    April, 2004
  • List the minimum, the maximum, and the average
    pulse rate for subject group
  • List all subject IDs of all females receiving
    DrugX
  • List all subjects and phone numbers in
    alphabetical order by subject last name, then by
    first name
  • List all subjects IDs whose blood pressure was
    higher on their last visit than it was on the
    their first visit and give the number of days
    between those visits

32
Structured Query Language (SQL)
  • SELECT tblSubject.SubjectID, tblSubject.SubjectBir
    thYear, tblSubject.SubjectGender,
    tblSubject.SubjectEthnicity, tblVisit.VisitDate,
    tblVisit.Systolic, tblVisit.Diastolic
  • FROM tblSubject
  • INNER JOIN tblVisit ON
  • tblSubject.SubjectID tblVisit.SubjectID

33
Structured Query Language (SQL)
  • SELECT tblVisit.SubjectID, tblVisit.VisitDate,
    tblVisit.VisitReason
  • FROM tblVisit
  • WHERE (((tblVisit.VisitDate) Between 4/15/2003
    And 5/15/2003))
  • ORDER BY tblVisit.VisitDate

34
Query Result
35
Common Query and SQL Shorthand
  • or GT,
  • or EQ, or NE
  • IS NULL (-99)
  • AND, OR, NOT
  • BETWEEN
  • and ? (with LIKE)
  • Parentheses usage

36
Structured Query Language (SQL)
  • SELECT tblSubject.SubjectID, tblVisit.VisitDate,
    tblSubject.SubjectGender, tblVisit.Systolic,
    tblVisit.Diastolic, tblVisit.Pulse,
    tblVisit.Respiration
  • FROM tblSubject INNER JOIN tblVisit ON
    tblSubject.SubjectID tblVisit.SubjectID
  • WHERE ((tblSubject.SubjectGender"1") AND
    (tblVisit.Systolic150) AND (tblVisit.Diastolic)) OR (tblVisit.Pulse90)

37
Query Results
38
Structured Query Language (SQL)
  • SELECT tblSubject.SubjectID, tblVisit.VisitDate,
    tblSubject.SubjectGender, tblVisit.Systolic,
    tblVisit.Diastolic, tblVisit.Pulse,
    tblVisit.Respiration
  • FROM tblSubject INNER JOIN tblVisit ON
    tblSubject.SubjectID tblVisit.SubjectID
  • WHERE ((tblSubject.SubjectGender"1") AND
    (tblVisit.Systolic150) AND (tblVisit.Diastolic)) OR (tblVisit.Pulse90)

39
Query in Design View
40
Analysis in Access
  • Queries
  • Sort
  • Inclusion/Exclusion
  • Boolean Logic
  • Calculated values
  • Cross tab results
  • Pivot tables
  • Reports
  • Additional calculated values
  • Visual Basic for complex calculations

41
Reporting / Analysis
Queries produce a subset of records Reports
format and present the subsets Examples Query
the database for list of subjects WHERE vital
signs above or below expected values Format the
report in alphabetical order or date sequence or
descending order by specified field Format the
last and first name together with a comma
separator
42
Reporting / Analysis
43
Analysis - Report Design Example
44
Common Reporting Needs
  • Count number of records meeting criteria
  • Calculate sums, averages, totals, standard
    deviation and specialty specific equations
  • Accounting of study work and recruiting
  • Cross sectional analysis or representation of
    study findings
  • Statistical Analysis

45
Regular Interim Reporting
  • Reports should be generated early and often in
    order to identify problems in data entry or
    collection.
  • Simple summary statistics on variables helpful in
    identifying outliers or nonsensical results.
  • A schedule should be followed.

46
Exporting Data
Options Access Excel HTML Lotus Text RTF Others
47
Why Export?
  • Analyze data in a more robust program
  • Analyze data in a program familiar to you
  • Interface with a new or specialized program
  • No re-keying increases data integrity
  • No re-keying saves time!

48
Export to Excel
  • From Access, export to Excel using Save
    As/Export
  • From Excel, open the file
  • Data sorts
  • Functions
  • Pivot Table
  • Data Analysis
  • Charts

49
Why Collect, Store, Retrieve Data?
  • Quantifying the results of your protocol
  • Explaining your hypothesis in numerical terms
  • Statistical Analysis
  • Progress Reports
  • Publications, Publications, Publications!
  • Even if findings do not support your hypothesis
  • Data Sharing and Knowledgebase for collaborations
    and future projects
  • And, Publications, Publications, Publications

50
What can the GCRC do for you?
  • Database Design Consultation
  • Database Storage on Server
  • Backup ?
  • Data Translation Assistance
  • Questions?
Write a Comment
User Comments (0)
About PowerShow.com