Title: Implementation of Statistical Methods using SPSS Sourish Saha PhD student Department of Statistics University of Florida sourish@ufl.edu
1 Implementation of Statistical Methods using SPSS
- Sourish Saha, PhD student, Department of Statistics, University of Florida, sourish_at_ufl.edu
2 TOPICS
- Manipulating Data
- Recoding, Subsetting
- Descriptive Statistics
- Comparing Means: One-Sample T Test, Independent-Samples T Test, Paired-Samples T Test, One-Way ANOVA, Multiple Comparison, Correlations
- Simple and Multiple Regression Analysis
- Comparison of Several Groups: Two-Way ANOVA, Chi-Square as a Test of Homogeneity, Kruskal-Wallis Test
- Logistic Regression
3 To Recode the Values of a Variable into a New Variable
- Transform -> Recode -> Into Different Variables
- Select the variables you want to recode.
- Enter an output (new) variable name and click Change.
- Click Old and New Values and specify how to recode values.
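Outside the SPSS dialogs, the same idea (old values mapped to new codes, stored in a separate variable) can be sketched in a few lines of Python. The ages, the cut-offs, and the names recode_age and age_group are hypothetical and chosen only for illustration.

```python
# Hypothetical data: ages of six respondents.
ages = [23, 35, 47, 52, 68, 71]


def recode_age(age):
    """Play the role of the 'Old and New Values' rules: range -> new code."""
    if age < 40:
        return 1  # youngest band
    if age < 65:
        return 2  # middle band
    return 3      # oldest band


# The recoded values go into a NEW variable; the original list is untouched.
age_group = [recode_age(a) for a in ages]
print(age_group)
```

Keeping the original variable intact is the point of recoding "into different variables": the raw values remain available if the cut-offs need to change later.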
4 To Select Subsets of Cases Based on a Conditional Expression
- Data -> Select Cases
- Select "If condition is satisfied".
- Click If.
- Enter the conditional expression.
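As a rough analogue of a conditional expression, the sketch below keeps only the cases for which a condition is true. The case records, the variable names, and the condition itself (female and at least 30 years old) are made up for the example.

```python
# Hypothetical cases, one dictionary per row of the data file.
cases = [
    {"id": 1, "sex": "F", "age": 23},
    {"id": 2, "sex": "M", "age": 35},
    {"id": 3, "sex": "F", "age": 47},
    {"id": 4, "sex": "M", "age": 52},
]


def condition(case):
    """The conditional expression: True means the case is selected."""
    return case["sex"] == "F" and case["age"] >= 30


selected = [c for c in cases if condition(c)]
selected_ids = [c["id"] for c in selected]
print(selected_ids)
```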
5 Exploring the data in SPSS
- Analyze -> Descriptive Statistics -> Descriptives
- Descriptives provides basic descriptive statistics: n, mean, standard deviation, minimum, and maximum.
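For readers who want to see what these summary numbers are, here is a minimal sketch using Python's standard statistics module. The scores are invented; the standard deviation is the sample version (n - 1 in the denominator).

```python
import statistics

# Hypothetical scale variable with five cases.
scores = [4.0, 7.0, 5.0, 9.0, 10.0]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "sd": statistics.stdev(scores),  # sample standard deviation (n - 1)
    "min": min(scores),
    "max": max(scores),
}
for name, value in summary.items():
    print(name, value)
```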
6 Exploring the data in SPSS
- Analyze -> Descriptive Statistics -> Explore
- Explore provides more descriptive statistics, including the variance, skewness, kurtosis, median, and percentiles.
- Plots: boxplots, stem-and-leaf plots, histograms, normality plots.
- Reasons for using the Explore procedure include data screening, outlier identification, description, and assumption checking.
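One common way to screen for outliers is the boxplot rule: flag values more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule to invented data. It uses the quartile definition of Python's statistics.quantiles, which is an assumption here and may not match the quartiles SPSS reports.

```python
import statistics

# Hypothetical data with one suspiciously large value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Boxplot rule: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("quartiles:", q1, median, q3)
print("outliers:", outliers)
```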
7 Exploring the data in SPSS
- Analyze -> Descriptive Statistics -> Frequencies
- Frequencies produces a frequency distribution table.
- Statistics and plots: frequency counts, percentages, cumulative percentages, quartiles, user-specified percentiles, bar charts, pie charts, histograms, and more.
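The core of a frequency table (count, percentage, cumulative percentage per value) can be sketched as follows. The responses are hypothetical and the rows are simply sorted alphabetically.

```python
from collections import Counter

# Hypothetical categorical variable.
responses = ["yes", "no", "yes", "maybe", "yes", "no", "yes", "no"]

counts = Counter(responses)
n = len(responses)

table = []        # rows of (value, count, percent, cumulative percent)
cumulative = 0.0
for value in sorted(counts):
    percent = 100.0 * counts[value] / n
    cumulative += percent
    table.append((value, counts[value], percent, cumulative))

for row in table:
    print(row)
```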
8 Exploring the data in SPSS
- Analyze -> Descriptive Statistics -> Crosstabs
- Crosstabs with 2 variables creates a two-way table (crosstabulation). With the Statistics button one can choose among many statistics, including the chi-square value along with its p-value.
- The Crosstabs procedure offers tests of independence and measures of association. One can also obtain estimates of the relative risk of an event.
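To make the chi-square value and the relative risk concrete, here is a hedged sketch for a 2 x 2 table. The counts are invented, the statistic is the Pearson chi-square without continuity correction, and the p-value shortcut via math.erfc is valid only for 1 degree of freedom (a 2 x 2 table).

```python
import math

# Hypothetical 2 x 2 table.
# Rows: exposed, not exposed. Columns: event, no event.
table = [[30, 70],
         [15, 85]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

# With 1 df, the upper-tail chi-square probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

# Relative risk: risk of the event among exposed vs. not exposed.
risk_exposed = table[0][0] / row_totals[0]
risk_unexposed = table[1][0] / row_totals[1]
relative_risk = risk_exposed / risk_unexposed

print(chi_square, p_value, relative_risk)
```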
9 Exploring the data in SPSS
- Analyze -> Descriptive Statistics -> Ratio Statistics
- The Ratio Statistics procedure provides a comprehensive list of summary statistics for describing the ratio between two scale variables.
10 Means
- Analyze -> Compare Means -> Means
- The Means procedure calculates subgroup means and related univariate statistics for dependent variables within categories of one or more independent variables.
- The Means procedure is useful for both description and analysis of scale variables. A variety of statistics is available to characterize the central tendency and dispersion of your test variables.
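Subgroup means amount to splitting the dependent variable by the categories of the independent variable and summarizing each part. A small sketch with made-up groups "A" and "B":

```python
import statistics
from collections import defaultdict

# Hypothetical records: (category of independent variable, dependent value).
records = [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 24.0), ("B", 22.0)]

groups = defaultdict(list)
for category, value in records:
    groups[category].append(value)

# For each subgroup: number of cases, mean, sample standard deviation.
report = {
    category: (len(vals), statistics.mean(vals), statistics.stdev(vals))
    for category, vals in groups.items()
}
for category, stats in sorted(report.items()):
    print(category, stats)
```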
11 One-Sample T Test
- Analyze -> Compare Means -> One-Sample T Test
- The One-Sample T Test procedure tests the difference between a sample mean and a known or hypothesized value.
- Allows you to specify the confidence level for the difference.
- Produces a table of descriptive statistics for each test variable.
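The test statistic behind this procedure is t = (sample mean - hypothesized value) / (s / sqrt(n)) with n - 1 degrees of freedom. The sketch below computes only the statistic and the degrees of freedom for invented data; the p-value and confidence interval need the t distribution, which the Python standard library does not provide.

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
mu0 = 5.0

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

t_statistic = (mean - mu0) / standard_error
df = n - 1

print(mean, t_statistic, df)
```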
12 Independent-Samples T Test
- Analyze -> Compare Means -> Independent-Samples T Test
- The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors.
- Also displayed are:
- Descriptive statistics for each test variable
- A test of variance equality
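Whether the variances can be treated as equal matters because it decides which t statistic applies. As an illustration with invented data, the sketch below computes both the pooled-variance statistic (equal variances assumed) and the Welch statistic with its approximate degrees of freedom (equal variances not assumed). The variance-equality test itself is not reproduced here.

```python
import math
import statistics

# Hypothetical responses for two randomly assigned groups.
treatment = [12.0, 14.0, 11.0, 15.0, 13.0]
control = [10.0, 9.0, 12.0, 8.0, 11.0]

n1, n2 = len(treatment), len(control)
m1, m2 = statistics.mean(treatment), statistics.mean(control)
v1, v2 = statistics.variance(treatment), statistics.variance(control)

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Equal variances not assumed: Welch statistic, Satterthwaite df.
se1, se2 = v1 / n1, v2 / n2
t_welch = (m1 - m2) / math.sqrt(se1 + se2)
df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))

print(t_pooled, df_pooled, t_welch, df_welch)
```

With equal group sizes and equal sample variances, as in this toy data, the two statistics coincide; they diverge as the variances or group sizes become unequal.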
13 Paired-Samples T Test
- Analyze -> Compare Means -> Paired-Samples T Test
- The Paired-Samples T Test procedure compares the means of two variables for a single group. It computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.
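Because the test works on per-case differences, it reduces to a one-sample t statistic on those differences. A minimal sketch with hypothetical before/after measurements:

```python
import math
import statistics

# Hypothetical paired measurements on the same five cases.
before = [72.0, 80.0, 65.0, 90.0, 78.0]
after = [70.0, 75.0, 66.0, 84.0, 75.0]

# One difference per case.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
standard_error = statistics.stdev(diffs) / math.sqrt(n)

# Tests whether the average difference departs from 0.
t_statistic = mean_diff / standard_error
df = n - 1

print(diffs, mean_diff, t_statistic, df)
```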
14 One-Way ANOVA
- Let $X_{i1}, \dots, X_{in_i}$, $i = 1, \dots, m$, be independent random samples from m normal populations, with the ith population having parameters $(\mu_i, \sigma_i^2)$.
- Assuming equal variances, we want to test the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_m$ against the alternative that at least two of the population means are unequal.
- ANOVA involves partitioning the total variation in the combined sample into two parts. One part explains the variation between the samples while the second part explains the variation within each sample (SST = SSG + SSE).
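The partition SST = SSG + SSE can be checked numerically. The sketch below uses three invented groups, computes the between-group and within-group sums of squares, and forms the F ratio from their mean squares.

```python
# Hypothetical samples from m = 3 populations.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]

all_values = [x for g in groups for x in g]
n_total = len(all_values)
m = len(groups)
grand_mean = sum(all_values) / n_total

# Between-group variation: group means around the grand mean.
ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group variation: values around their own group mean.
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
# Total variation: values around the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# F ratio = mean square between / mean square within.
f_statistic = (ssg / (m - 1)) / (sse / (n_total - m))

print(sst, ssg, sse, f_statistic)
```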
15 One-Way ANOVA
- Analyze -> Compare Means -> One-Way ANOVA
- Produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable.
- Used to test the hypothesis that several means are equal.
- Extension of the two-sample t test.
- To find out which means differ, two types of tests are available for comparing means: a priori contrasts and post hoc tests. Contrasts are tests set up before running the experiment, and post hoc tests are run after the experiment has been conducted.
16 One-Way ANOVA
- For each group: number of cases, mean, standard deviation, standard error of the mean, minimum, maximum, and 95% confidence interval for the mean.
- Levene's test for homogeneity of variance, analysis-of-variance table and robust tests of the equality of means for each dependent variable, user-specified a priori contrasts, and post hoc range tests and multiple comparisons: Bonferroni, Tukey's honestly significant difference, Scheffé, and least-significant difference.
17 Multiple Comparison tests
- Tests suitable for the simultaneous testing of several hypotheses concerning the equality of three or more population means.
- When samples have been taken from several populations, as a preliminary to the more general question of whether the populations differ, there is the simpler question of whether they have different means.
- If the null hypothesis is rejected, we wish to know where the differences lie, for example by using Tukey's test (HSD).
18 Multiple Comparison tests
- With m populations, the null hypothesis is $H_0: \mu_1 = \mu_2 = \dots = \mu_m$.
- If the null hypothesis is rejected, we wish to know where the differences lie. There are $\binom{m}{2} = m(m-1)/2$ pairs of populations that could be compared.
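The number of pairwise comparisons grows quickly with m, which is why the per-comparison significance level is usually adjusted. The sketch below enumerates the pairs for four hypothetical groups and shows the Bonferroni adjustment (overall alpha divided by the number of comparisons) as one simple example; the alpha of 0.05 is an assumption.

```python
from itertools import combinations
from math import comb

# Hypothetical group labels for m = 4 populations.
group_names = ["A", "B", "C", "D"]
m = len(group_names)

pairs = list(combinations(group_names, 2))
n_pairs = comb(m, 2)  # m * (m - 1) / 2

# Bonferroni: split the overall alpha evenly over all comparisons.
alpha = 0.05
bonferroni_alpha = alpha / n_pairs

print(pairs)
print(n_pairs, bonferroni_alpha)
```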
19 Bivariate Correlations
- The Bivariate Correlations procedure computes Pearson's correlation coefficient (r), Spearman's rho, and Kendall's tau-b with their significance levels.
- Correlations measure how variables or rank orders are related.
- Before calculating a correlation coefficient, one should screen the data for outliers (which can cause misleading results) and evidence of a linear relationship.
- Pearson's correlation coefficient is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for measuring their association.
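The last point is easy to demonstrate. In the sketch below, written with a hand-rolled pearson_r helper (a name chosen for this example), y = 2x + 1 gives r = 1, while y = x^2 on symmetric x values gives r = 0 even though y is completely determined by x.

```python
import math


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
quadratic = [v * v for v in x]    # perfectly related, but not linearly

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
print(r_linear, r_quadratic)
```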
20 Rank Correlation Coefficient
- Rank correlation is a method of finding the degree of association between two variables.
- The calculation of the rank correlation coefficient is the same as that of the Pearson correlation coefficient, but it uses the ranks of the observations rather than their numerical values.
- This method is useful when the data are not available in numerical form but the information is sufficient to rank the data.
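A sketch of that calculation: rank each variable (tied values receive the average of their ranks, a common convention assumed here), then apply the Pearson formula to the ranks. The helper names average_ranks, pearson_r, and spearman_rho are chosen for this example.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson's r computed on the ranks instead of the raw values."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For a monotone but curved relationship like this one, the rank correlation is 1 while Pearson's r falls below 1.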
22 Recode / Compute: Create New Variable
- Transform -> Compute
- Give the name of the Target Variable.
- In the Numeric Expression box, choose the function of your choice.
27 Simple Linear Regression
- Simple linear regression is aimed at finding the "best-fit" values of two parameters in the following regression equation: $y = \beta_0 + \beta_1 x$
- $\beta_0$: the y-intercept of the regression line
- $\beta_1$: the slope of the regression line
- A popular method for finding the "best-fit" values is the least squares method.
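For a single predictor, the least squares estimates have a closed form: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept makes the line pass through the point of means. A sketch on invented data:

```python
# Hypothetical (x, y) observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)

slope = sxy / sxx                      # least squares slope
intercept = mean_y - slope * mean_x    # line passes through the means

print(intercept, slope)
```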
28 Multiple Regression
- Multiple (linear) regression is a regression technique aimed at finding a linear relationship between the dependent variable and multiple independent variables.
- The multiple regression model is as follows: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
- Multiple regression finds the set of parameters $\beta_0, \beta_1, \dots, \beta_k$ that provides the best fit between the model and the given data.
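One way to obtain least squares estimates for several predictors is to solve the normal equations (X'X) b = X'y. The sketch below does this with plain Gaussian elimination; the helper names solve and fit_least_squares are chosen for this example, and the data are constructed so that y = 1 + 2*x1 + 3*x2 exactly, which makes the result easy to check.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(predictors, y):
    """Return [intercept, coefficient 1, ...] from the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated without noise from y = 1 + 2*x1 + 3*x2.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

beta = fit_least_squares(predictors, y)
print(beta)
```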
30 Kruskal-Wallis Test
- The Kruskal-Wallis test is a nonparametric test for finding out whether three or more independent samples come from populations having the same distribution.
- It is a nonparametric version of ANOVA.
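The test replaces the observations with their ranks in the pooled sample and compares the rank sums of the groups through H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1). The sketch below computes H for invented data and assumes there are no tied values, so no tie correction is applied.

```python
# Hypothetical samples; all values are distinct (no ties assumed).
groups = [[1.2, 2.3, 3.1], [4.8, 5.5, 6.0], [7.7, 8.4, 9.9]]

pooled = sorted(v for g in groups for v in g)
rank_of = {v: i + 1 for i, v in enumerate(pooled)}  # rank in pooled sample
n_total = len(pooled)

rank_sums = [sum(rank_of[v] for v in g) for g in groups]

h_statistic = (
    12.0 / (n_total * (n_total + 1))
    * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    - 3 * (n_total + 1)
)
df = len(groups) - 1  # H is compared with a chi-square on m - 1 df

print(rank_sums, h_statistic, df)
```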
32 Logistic Regression
- Useful for situations in which we want to predict the presence or absence of a characteristic or outcome based on the values of a set of predictor variables.
- Similar to a linear regression model, but suited to models where the dependent variable is dichotomous.
- Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model.
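The link between coefficients and odds ratios is exp(coefficient) = odds ratio. In the special case of a single binary predictor, the fitted coefficient equals the log odds ratio of the 2 x 2 table, so the relationship can be illustrated without an iterative fit. The counts below are hypothetical.

```python
import math

# Hypothetical 2 x 2 counts.
# Predictor = 1: a with the outcome, b without.
# Predictor = 0: c with the outcome, d without.
a, b = 30, 70
c, d = 15, 85

odds_ratio = (a / b) / (c / d)

# For one binary predictor, the logistic model reproduces the table exactly:
intercept = math.log(c / d)          # log odds when predictor = 0
coefficient = math.log(odds_ratio)   # change in log odds for predictor = 1

# Predicted probability of the outcome when predictor = 1.
p_outcome = 1.0 / (1.0 + math.exp(-(intercept + coefficient)))

print(odds_ratio, coefficient, p_outcome)
```

With several predictors the coefficients must be estimated iteratively, but each one is still read the same way: exponentiating it gives the odds ratio for a one-unit increase in that predictor, holding the others fixed.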
33 Logistic Regression
- To perform logistic regression, go to:
- Analyze -> Regression -> Binary Logistic