Title: SPSS 201: Using SPSS to Perform Commonly Used Statistical Testing in Medical Research (Workshop)
1SPSS 201: Using SPSS to Perform Commonly Used Statistical Testing in Medical Research (Workshop)
- Dr. Daisy Dai
- Department of Medical Research
2Who are biostatisticians?
- Ashley Sherman
- Phone 816-701-1347
- aksherman_at_cmh.edu
- Daisy Dai
- Phone 816-701-5233
- Email hdai_at_cmh.edu
- Consultation
- Experimental design and sampling plan
- Collaboration in presentation and publication of studies
- Education
- Research
3Statistical Courses
- SPSS 201: Using SPSS to perform statistical tests I
- SPSS 202: Using SPSS to perform statistical tests II
- SPSS 204: Using SPSS to manage data
- SPSS 203: Summarize data with tables and graphs
- STA 101: Properly Setting up and Designing a Clinical Research Study Including Power Analysis for Proper Patient Numbers (July 16th)
- STA 102: Commonly Used Statistical Tests in Medical Research - Part I
- STA 103: Commonly Used Statistical Tests in Medical Research - Part II
4Contents
- Review statistical tools (1 hour)
- Introduce SPSS (30 minutes)
- Practice (1 hour)
- Questions and discussions (30 minutes)
5Statistical tools
6Medical Research
- Clinical Trials
- Intervention or therapeutic
- Preventative
- Retrospective Studies
7Data
- Medical data
- Physics data
- Chemistry data
- Education
- Economics
- Social studies
- Sensory
- Nutrition
- Many more
- Continuous variable
- Interval variable
- Ordinal variable
- Categorical variable
- Binary variable
- Discrete variable
- Ordinal variable
8Information Collections
- Historical Data
- Pro: convenient; saves a lot of work
- Con: outdated; different objectives and designs; unknown detailed information
- Census
- Pro: reliable, accurate and comprehensive (e.g. population census)
- Con: time consuming; requires more resources; difficult to investigate all subjects in the population
- Sampling
- Pro: efficient; less risky; exploratory; informative
- Caveats: selection bias; misinterpretation; design flaw
9Statistics
- Descriptive Statistics
- Methods to organize and summarize information
- Mean, median, max, min, frequency and proportions, etc. that summarize sample demographics
- Inferential Statistics
- Methods to draw conclusions about a population based on information obtained from a sample of the population
10Population
[Diagram: a sampling plan draws a sample from the population; descriptive statistics summarize the sample; inferential statistics lead to a conclusion about the population.]
11Summary Statistics
- Measures of Center
- Mean
- Median: the middle value in the ordered list
- Mode: the most frequently occurring value
- Measures of variation
- Range: the difference between the largest and smallest value in the data set, i.e., Range = Max - Min.
- Standard deviation: measures variation by indicating how far, on average, the observations are from the mean.
We will talk more about data summary and distribution graphs in the SPSS 204 Workshop.
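As an illustration outside SPSS, the measures above can be computed with Python's standard `statistics` module. This is only a sketch; the input is the four exam grades from the exercise that follows, and the variable names are chosen here for illustration.

```python
import statistics

# Exam grades from the first case study in the exercise.
grades = [88, 75, 95, 100]

mean = statistics.mean(grades)
median = statistics.median(grades)       # average of the two middle values for an even count
modes = statistics.multimode(grades)     # every grade occurs once, so all four are returned
value_range = max(grades) - min(grades)  # Range = Max - Min
sd = statistics.stdev(grades)            # sample standard deviation (n - 1 denominator)

print(f"mean={mean}, median={median}, modes={modes}")
print(f"range={value_range}, sd={sd:.2f}")
```

With no repeated grade, the mode carries no information here, which is one reason the choice of a measure of center depends on the data.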
12Exercise: For the mean, median and mode, determine which measure of center is most appropriate in each of the following case studies.
- A student takes four exams in a biology class. His grades are 88, 75, 95, and 100.
- The National Association of REALTORS publishes data on resale prices of U.S. homes.
- In the 2003 Boston Marathon, there were two categories of official finishers, male and female, of which there were 10,737 and 6,309, respectively.
13Statistical Testing Procedures
- Clarify study objectives.
- Establish hypotheses.
- Determine the outcome variables, treatment groups, risk factors and covariates.
- Perform appropriate statistical testing.
- Interpret results.
14Statistical Testing Procedures
- Null Hypothesis
- H0: Mean_Treatment = Mean_Control
- Alternative Hypothesis
- Ha: Mean_Treatment ≠ Mean_Control (Two-sided Test)
- Ha: Mean_Treatment > Mean_Control (One-sided Test)
- Ha: Mean_Treatment < Mean_Control (One-sided Test)
- Calculate statistics
- Make Inference
- If P-value > 0.05, do not reject H0
- If P-value < 0.05, reject H0 in favor of Ha
15Continuous Variables
- Two or multiple treatment groups
16Two samples t-test
- Compare the means of a normally distributed
interval dependent variable for two independent
groups.
17Case Study FEV1 Changes
- A new compound, ABC-123, is being developed for long-term treatment of patients with chronic asthma. Asthma patients were enrolled in a double-blind study and randomized to receive either daily oral ABC-123 or a placebo for 6 weeks.
[Diagram: asthmatic patients are randomized to Test or Placebo; FEV1 is measured after 6 weeks of treatment.]
18FEV1 Data
Test Group
Patient ID Baseline Week 6
101 1.35 n/a
103 3.22 3.55
106 2.78 3.15
108 2.45 2.3
109 1.84 2.37
110 2.81 3.2
113 1.9 2.65
116 3 3.96
118 2.25 2.97
120 2.86 2.28
121 1.56 2.67
124 2.66 3.76
Placebo Group
Patient ID Baseline Week 6
102 3.01 3.9
104 2.24 3.01
105 2.25 2.47
107 1.65 1.99
11 1.95 n/a
112 3.05 3.26
114 2.5 2.55
115 1.6 2.2
117 .77 2.56
119 2.06 2.9
122 1.71 n/a
123 3.54 2.92
19What is the difference between std and std error?
[Screenshot of SPSS t-test output with the P-values highlighted]
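One way to see the difference: the standard deviation describes the spread of individual observations, while the standard error describes the uncertainty of the sample mean and equals the standard deviation divided by the square root of the sample size. The Python sketch below illustrates this with the week-6 FEV1 values of the test group from slide 18; it is an illustration outside SPSS, and the variable names are chosen here.

```python
import math
import statistics

# Week-6 FEV1 values for the test group (patient 101 has no week-6 value and is excluded).
test_week6 = [3.55, 3.15, 2.30, 2.37, 3.20, 2.65, 3.96, 2.97, 2.28, 2.67, 3.76]

n = len(test_week6)
sd = statistics.stdev(test_week6)  # spread of the individual observations
se = sd / math.sqrt(n)             # uncertainty of the sample mean

print(f"n={n}, mean={statistics.mean(test_week6):.3f}, sd={sd:.3f}, se={se:.3f}")
```

The standard error shrinks as the sample grows, whereas the standard deviation does not.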
20Mean and Error Bar
- Conclusion
- As compared to placebo, the new drug did not show a statistically significant effect on FEV1 at week 6.
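The comparison behind this conclusion can be sketched by hand in Python, using the week-6 values from slide 18. The sketch assumes the pooled (equal-variance) form of the two-sample t-test and computes the two-sided p-value by numerically integrating the t density; the helper names `t_pdf` and `two_sided_p` are introduced here for illustration and are not SPSS functions.

```python
import math
import statistics

# Week-6 FEV1 values; patients without a week-6 value are excluded.
test = [3.55, 3.15, 2.30, 2.37, 3.20, 2.65, 3.96, 2.97, 2.28, 2.67, 3.76]
placebo = [3.90, 3.01, 2.47, 1.99, 3.26, 2.55, 2.20, 2.56, 2.90, 2.92]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


n1, n2 = len(test), len(placebo)
df = n1 + n2 - 2
# Pooled variance: assumes both groups share the same population variance.
pooled_var = ((n1 - 1) * statistics.variance(test)
              + (n2 - 1) * statistics.variance(placebo)) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (statistics.mean(test) - statistics.mean(placebo)) / se_diff
p_value = two_sided_p(t_stat, df)

print(f"t={t_stat:.3f}, df={df}, p={p_value:.3f}")
```

Under these assumptions the p-value is well above 0.05, in line with the conclusion on the slide.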
21Paired t-test
- Compare the means of a normally distributed
interval dependent variable for two related
groups.
Test Group
Patient ID Baseline Week 6
101 1.35 n/a
103 3.22 3.55
106 2.78 3.15
108 2.45 2.3
109 1.84 2.37
110 2.81 3.2
113 1.9 2.65
116 3 3.96
118 2.25 2.97
120 2.86 2.28
121 1.56 2.67
124 2.66 3.76
22Conclusion: For subjects on the new drug, FEV1 at week 6 is significantly higher than at baseline.
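A paired t-test works on the within-patient differences. The Python sketch below, an illustration outside SPSS, uses the 11 test-group patients with both a baseline and a week-6 value and compares the t statistic with 2.228, the standard two-sided 5% critical value of t with 10 degrees of freedom. Variable names are chosen here for illustration.

```python
import math
import statistics

# Test-group patients with both measurements (patient 101 is excluded).
baseline = [3.22, 2.78, 2.45, 1.84, 2.81, 1.90, 3.00, 2.25, 2.86, 1.56, 2.66]
week6 = [3.55, 3.15, 2.30, 2.37, 3.20, 2.65, 3.96, 2.97, 2.28, 2.67, 3.76]

# Each patient serves as their own control: analyze week 6 minus baseline.
diffs = [w - b for b, w in zip(baseline, week6)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

T_CRIT = 2.228  # two-sided 5% critical value for n - 1 = 10 degrees of freedom
significant = abs(t_stat) > T_CRIT

print(f"mean change={statistics.mean(diffs):.3f}, t={t_stat:.2f}, significant={significant}")
```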
23One-way ANOVA
- Test for differences of the means for continuous
variables in multiple independent treatment
groups.
24Case Study HAM-A Scores in GAD
- A new serotonin-uptake inhibiting agent, SN-X95, is being studied in subjects with generalized anxiety disorder (GAD). Fifty-two subjects diagnosed with GAD were enrolled and randomly assigned to one of three treatment groups: 25 mg SN-X95, 100 mg SN-X95 or placebo. After 10 weeks of once-daily oral dosing in a double-blind fashion, a test based on the Hamilton Rating Scale for Anxiety (HAM-A) was administered. This test consists of 14 anxiety-related items (e.g. anxious mood, tension, insomnia, fear, etc.), each rated by the subject as not present, mild, moderate, severe, or very severe. HAM-A test scores were found by summing the coded values of all 14 items using the numeric coding scheme of 0 for not present, 1 for ... Are there any differences in mean HAM-A test score among the three groups?
[Diagram: patients with GAD are randomized to 100 mg SN-X95, 25 mg SN-X95 or placebo; the HAM-A score is measured after 10 weeks of treatment.]
25Data
Lo-Dose Hi-Dose Placebo
21 16 22
18 21 26
19 31 29
99 25 19
28 23 99
22 25 33
30 18 37
27 20 25
28 18 28
19 16 26
23 24 99
22 22 31
20 21 27
19 16 30
26 33 25
35 21 22
99 17 36
26P-value
27Mean and Error Bar
- Conclusion
- There is a significant difference in mean HAM-A score among the three treatment groups at the 95% confidence level.
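The global F-test can be sketched by hand in Python from the data on slide 25. Two assumptions are made here: the value 99 is treated as a missing-value code (HAM-A totals cannot reach 99), and the p-value uses a closed form of the F distribution that is valid only when the numerator has 2 degrees of freedom, i.e. exactly three groups. Variable names are chosen for illustration.

```python
import statistics

MISSING = 99  # assumption: 99 marks a missing score

lo_dose = [21, 18, 19, 99, 28, 22, 30, 27, 28, 19, 23, 22, 20, 19, 26, 35, 99]
hi_dose = [16, 21, 31, 25, 23, 25, 18, 20, 18, 16, 24, 22, 21, 16, 33, 21, 17]
placebo = [22, 26, 29, 19, 99, 33, 37, 25, 28, 26, 99, 31, 27, 30, 25, 22, 36]

# Drop missing values before any computation.
groups = [[x for x in g if x != MISSING] for g in (lo_dose, hi_dose, placebo)]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Upper-tail probability of F(2, df_within); this closed form holds only for df_between == 2.
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions the p-value falls below 0.05, which agrees with the conclusion on the slide; pairwise post-hoc comparisons would then show which groups differ.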
28Categorical Variables
- Two or multiple treatment groups
29Fisher's Exact Test
- A conservative non-parametric test of the relationship between two categorical variables.
            Responders   Non-responders   Total
Group 1     N11          N12              N11 + N12
Group 2     N21          N22              N21 + N22
Combined    N11 + N21    N12 + N22        N
30Case Study CHF Incidence in CABG after ARA
- A new adenosine-releasing agent (ARA), thought
to reduce side effects in patients undergoing
coronary artery bypass surgery (CABG), was
studied in a pilot trial.
           CHF        No CHF   Total
ARA        2 (6%)     33       35
Placebo    5 (25%)    15       20
Combined   7          48       55
Fisher's exact test: p = 0.0455
31Chi-square test
- Test a relationship between two categorical
variables. The chi-square test assumes that the
expected value for each cell is five or higher.
32Case Study ADR Frequency with Antibiotic
Treatment
- A study was conducted to monitor the incidence
of GI adverse drug reactions of a new antibiotic
used in lower respiratory tract infections.
                         Responders   Non-responders   Total
Test (new antibiotic)    22 (33%)     44               66
Control (erythromycin)   28 (54%)     24               52
Combined                 50 (42%)     68               118
Chi-square test: p = 0.0252; Fisher's exact test: p = 0.0385
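The chi-square result can be checked by hand from the four cell counts above. The Python sketch below, an illustration outside SPSS, computes the Pearson chi-square statistic without continuity correction and uses the fact that, for 1 degree of freedom, the upper-tail probability equals `erfc(sqrt(chi2 / 2))`. It also checks the assumption that every expected count is at least five. Variable names are chosen for illustration.

```python
import math

# Rows: test, control; columns: with GI reaction, without GI reaction.
observed = [[22, 44], [28, 24]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

# The chi-square approximation assumes every expected count is at least 5.
assumption_ok = min(min(row) for row in expected) >= 5

print(f"chi2={chi2:.3f}, p={p_value:.4f}, expected counts ok={assumption_ok}")
```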
33Other tests
- One-way repeated measures ANOVA
- Repeated measures logistic regression
- Factorial ANOVA
- Friedman test
- Factorial logistic regression
- Simple Linear Regression
- Multiple Regression
- Factor analysis
- Multiple logistic regression
- Discriminant analysis
- One-way MANOVA
- Multivariate multiple regression
- Canonical correlation
- Analysis of covariance
We will cover all tests including non-parametric
tests in SPSS 202 Workshop.
34Questions?
35Introduction to SPSS
36What is SPSS?
- Statistical software.
- 16 server licenses.
- SPSS 18.
37SPSS Data Entry
- SPSS data can be entered manually.
- The format is ready for analysis.
- SAS, Excel, txt, etc. data can be easily imported to SPSS.
- SPSS data files are saved as SPSS data document (.sav).
- SPSS output files are saved as SPSS viewer document (.spv).
38SPSS Data Entry
- SPSS has a few unique features in data entry.
- Categorical variables need to be coded. For instance, code male as 1 and female as 0 or vice versa.
- When you have two treatments, test and control, please use 1 for test and 0 for control.
- Categorical variables that are not coded in data files from other sources will not be imported or analyzed properly in SPSS.
- Continuous variables do not need coding.
- Missing values need to be defined in the Variable View page.
39Example CDC Survey Data
- An allergy survey of children more than 1 year old was conducted in 2005 and 2006.
- Two data sets, allergy questionnaire and demographic information, are saved in SAS export format.
40Tasks
- Import these two SAS data files to SPSS and save them as SPSS data files.
- Sort each data set by study ID.
- Merge allergy variables and demographic variables.
- Save the new data set as an SPSS data file.
41Log in SPSS
- CMH offers a server version of SPSS 18. Any employee can log in to SPSS with their employee account.
- Go to Start
- -> Program
- -> Accessories
- -> Remote Desktop Connection
42Log in SPSS
- In the prompted connection window, enter cmhterm.
- Click Connect.
43Log in SPSS
- In the Log On window, enter your CMH user name and password.
- Choose log on to CMH.
- Click OK.
44Task 1 Import Data
- We need to import two data sets to SPSS.
- Allergy questionnaire: aqq_d.xpt (xpt is the SAS export file format)
- Demographic information: demo_d.xpt
- Please note that SPSS runs on a server, so data must be saved on a shared drive such as the U drive or W drive. You will not be able to find the files in SPSS if you save them on your local disk.
45Task 1 Import Data
- Double-click the SPSS 18 icon on the screen.
- In the task wizard, click Open an existing source.
- Click OK.
46Task 1 Import Data
- If the wizard does not appear, go to File
- -> Open
- -> Data
47Task 1 Import Data
- Select the folder.
- Choose agg_d file.
- Select xpt format.
- Click Open.
- Note SPSS is compatible with other commonly used
statistical and data management software
packages. Excel, SAS, Access files are all
convertible to SPSS.
48Task 1 Import Data
- Now the data is open.
- You can see the data in the Data View tab.
49Task 1 Import Data
- The data structure, variable name, label, etc.
are in Variable View tab.
50Task 2 Sort Data
- Variable to sort by: SEQN, that is, the Respondent sequence number.
51Task 2 Sort Data
- Go to Data and select Sort Cases.
- On the Sort Cases page, select the variable, Respondent sequence number.
- Click on the right arrow.
- Choose Ascending or Descending.
- Click OK.
52Practice
- Now let's repeat this process by doing the following:
- Open the demographic data, demo_d.xpt.
- Sort the data by the variable Respondent Sequence Number.
53Task 3 Merge Two Data Sets
- Two data sets need to be linked by key variables.
- In our case, the key variable is SEQN (Respondent Sequence Number).
- Make sure the key variable has the same name and variable type in the two data sets.
- Both data sets need to be sorted by the key variable.
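Sorting and merging amount to ordering both data sets by SEQN and then joining their variables on that key. The Python sketch below illustrates the same logic outside SPSS; the SEQN values and the other field names (`allergy_flag`, `age`) are hypothetical and stand in for the real survey variables.

```python
# Hypothetical records: only SEQN is a real variable name from the survey files.
allergy = [
    {"SEQN": 3, "allergy_flag": 0},
    {"SEQN": 1, "allergy_flag": 1},
    {"SEQN": 2, "allergy_flag": 0},
]
demo = [
    {"SEQN": 2, "age": 7},
    {"SEQN": 3, "age": 12},
    {"SEQN": 1, "age": 4},
]

# Sort both data sets by the key variable, as in Task 2.
allergy.sort(key=lambda row: row["SEQN"])
demo.sort(key=lambda row: row["SEQN"])

# Index the second data set by key, then add its variables to each matching case.
demo_by_id = {row["SEQN"]: row for row in demo}
merged = [{**row, **demo_by_id.get(row["SEQN"], {})} for row in allergy]

for row in merged:
    print(row)
```

A case with no match in the second data set simply keeps its own variables, which is why the key must have the same name and type in both files.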
54Task 3 Merge Two Data Sets
- Under either data set, go to Data
- -> Merge File
- -> Add Variables
55Task 3 Merge Two Data Sets
- Choose the other data set to add.
- Note: this page may look different in SPSS 18. Either way, choose the other data set.
56Task 4 Save the New Data
- Go to File
- -> Save As
- Select the folder.
- Create new file, MergedData.
- Choose SPSS data format.
- Click Save.
57Task 4 Save the New Data
- Go to Data
- -> Merge File
- -> Add Variables
58Questions?
We will cover more data management in SPSS 203
workshop.
59Let's play with SPSS
60Project 1 FEV1 Changes
61Case Study FEV1 Changes
- A new compound, ABC-123, is being developed for long-term treatment of patients with chronic asthma. Asthma patients were enrolled in a double-blind study and randomized to receive either daily oral ABC-123 or a placebo for 6 weeks.
[Diagram: asthmatic patients are randomized to Test or Placebo; FEV1 is measured after 6 weeks of treatment.]
62FEV1 Data
Test Group
Patient ID Baseline Week 6
101 1.35 n/a
103 3.22 3.55
106 2.78 3.15
108 2.45 2.3
109 1.84 2.37
110 2.81 3.2
113 1.9 2.65
116 3 3.96
118 2.25 2.97
120 2.86 2.28
121 1.56 2.67
124 2.66 3.76
Placebo Group
Patient ID Baseline Week 6
102 3.01 3.9
104 2.24 3.01
105 2.25 2.47
107 1.65 1.99
11 1.95 n/a
112 3.05 3.26
114 2.5 2.55
115 1.6 2.2
117 .77 2.56
119 2.06 2.9
122 1.71 n/a
123 3.54 2.92
63Tasks
- Log in to intranet and open SPSS.
- Define variables and missing values in the Variable View tab.
- Enter data in the Data View tab.
- Perform a two-sample t-test to compare FEV1 at 6 weeks between test and control.
- Generate a mean and error bar graph for the two groups.
- Interpret the SPSS output and draw conclusions.
64Tasks to be continued
- Perform a paired t-test to compare FEV1 between baseline and 6 weeks for the test group.
- Interpret SPSS results and draw conclusions.
- Save the SPSS data and SPSS output respectively.
- Open the SPSS data and SPSS output by double-clicking the icons.
- Close both files.
65Project 2 HAM-A Scores in GAD
66Case Study HAM-A Scores in GAD
- A new serotonin-uptake inhibiting agent, SN-X95, is being studied in subjects with generalized anxiety disorder (GAD). Fifty-two subjects diagnosed with GAD were enrolled and randomly assigned to one of three treatment groups: 25 mg SN-X95, 100 mg SN-X95 or placebo. After 10 weeks of once-daily oral dosing in a double-blind fashion, a test based on the Hamilton Rating Scale for Anxiety (HAM-A) was administered. This test consists of 14 anxiety-related items (e.g. anxious mood, tension, insomnia, fear, etc.), each rated by the subject as not present, mild, moderate, severe, or very severe. HAM-A test scores were found by summing the coded values of all 14 items using the numeric coding scheme of 0 for not present, 1 for ... Are there any differences in mean HAM-A test score among the three groups?
[Diagram: patients with GAD are randomized to 100 mg SN-X95, 25 mg SN-X95 or placebo; the HAM-A score is measured after 10 weeks of treatment.]
67Data
Lo-Dose Hi-Dose Placebo
21 16 22
18 21 26
19 31 29
99 25 19
28 23 99
22 25 33
30 18 37
27 20 25
28 18 28
19 16 26
23 24 99
22 22 31
20 21 27
19 16 30
26 33 25
35 21 22
99 17 36
68Tasks
- Open the data in Excel. Make sure the data structure, variables and missing values are set up properly.
- Import the Excel file to SPSS.
- Perform one-way ANOVA to compare the high dose, low dose and control groups.
- Generate a mean and error bar graph for the three groups.
- If the global F-test is significant, then perform post-hoc pair-wise comparisons.
- Interpret the SPSS output and draw conclusions.
- Save data and output.
- Close files.
69Project 3 CHF Incidence in CABG after ARA
70Case study CHF Incidence in CABG after ARA
- A new adenosine-releasing agent (ARA), thought to reduce side effects in patients undergoing coronary artery bypass surgery (CABG), was studied in a pilot trial that enrolled 35 patients who received active medication and 20 patients who received a placebo. Follow-up observation revealed that 2 patients who received active medication and 5 patients who received the placebo had shown symptoms of congestive heart failure (CHF) within 90 days post surgery. Is this evidence of a reduced rate of CHF for patients treated with the ARA compound?
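Fisher's exact test rests on the hypergeometric distribution: with the row and column totals fixed, it asks how likely the observed split of CHF cases is. The Python sketch below, an illustration outside SPSS, uses the counts from the case description (35 ARA patients with 2 CHF cases, 20 placebo patients with 5 CHF cases). The function name `table_prob` is introduced here for illustration.

```python
from math import comb

# Counts from the case description.
chf_ara, n_ara = 2, 35
chf_placebo, n_placebo = 5, 20

n_total = n_ara + n_placebo
chf_total = chf_ara + chf_placebo


def table_prob(k):
    """Hypergeometric probability of k CHF cases in the ARA group, margins fixed."""
    return comb(n_ara, k) * comb(n_placebo, chf_total - k) / comb(n_total, chf_total)


# Probability of exactly the observed table.
p_observed = table_prob(chf_ara)

# One-sided p-value: tables with as few or fewer CHF cases on ARA than observed.
p_one_sided = sum(table_prob(k) for k in range(chf_ara + 1))

print(f"P(observed table)={p_observed:.4f}, one-sided p={p_one_sided:.4f}")
```

With these counts the probability of the observed table is about 0.0455 and the one-sided p-value about 0.0525; SPSS reports one-sided and two-sided exact p-values in its crosstabs output, so it is worth checking which quantity a reported figure refers to.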
71Tasks
- Open SPSS data.
- Summarize frequency and percentage in a two-way contingency table.
- Perform Fisher's exact test.
- Perform Chi-square test.
- Compare Fisher's exact test with Chi-square test.
- Interpret the SPSS output and draw conclusions.
- Close files.
72Project 4 ADR Frequency with Antibiotic Treatment
73Case Study ADR Frequency with Antibiotic
Treatment
- A study was conducted to monitor the incidence of
GI adverse drug reactions of a new antibiotic
used in lower respiratory tract infections. Two
parallel groups were included in the study. One
group consisted of 66 LRTI patients randomized to
receive the new treatment and a reference group
of 52 patients randomized to receive
erythromycin.
74Tasks
- Open SPSS data.
- Summarize frequency and percentage in a two-way contingency table.
- Perform Fisher's exact test.
- Perform Chi-square test.
- Compare Fisher's exact test with Chi-square test.
- Interpret the SPSS output and draw conclusions.
- Close files.
75Questions?
Let us know which statistics topics you are interested in.
76In summary
77Thank You
- For more information, visit my website
- http://www.childrensmercy.org/content/view.aspx?id=9740
- Or go to Scope -> Research -> Medical Research -> Statistics
78References
- Medical Statistics by Campbell et al.
- Introductory Statistics by Neil Weiss
- Common Statistical Methods for Clinical Research by Walker