Quantitative Data Analysis presentation

Quantitative Data Analysis
Edouard Manet In the Conservatory, 1879

Quantification of Data
Introduction
To conduct quantitative analysis, responses to
open-ended questions in survey research and the
raw data collected using qualitative methods must
be coded numerically.

Quantification of Data
Introduction (Continued)
Most responses to survey research questions
already are recorded in numerical format.
In mailed and face-to-face surveys, responses are
keypunched into a data file.
In telephone and internet surveys, responses are
automatically recorded in numerical format.

Quantification of Data
Developing Code Categories
Coding qualitative data can use an existing
scheme or one developed by examining the data.
Coding qualitative data into numerical categories
sometimes can be a straightforward process.
Coding occupation, for example, can rely upon
numerical categories defined by the Bureau of the
Census.

Quantification of Data
Developing Code Categories (Continued)
Coding most forms of qualitative data, however,
requires much effort.
This coding typically requires using an iterative
procedure of trial and error.
Consider, for example, coding responses to the
question, "What is the biggest problem in
attending college today?"
The researcher must develop a set of codes that
are exhaustive of the full range of responses and
(mostly) mutually exclusive of one another.

Quantification of Data
Developing Code Categories (Continued)
In coding responses to the question, "What is the
biggest problem in attending college today?", the
researcher might begin, for example, with a list
of 5 categories, then realize that 8 would be
better, then realize that it would be better to
combine categories 1 and 5 into a single category
and use a total of 7 categories.
Each time the researcher makes a change in the
coding scheme, it is necessary to restart the
coding process to code all responses using the
same scheme.

Quantification of Data
Developing Code Categories (Continued)
Suppose one wanted to code more complex
qualitative data (e.g., videotape of an
interaction between husband and wife) into
numerical categories.
How does one code the many statements, facial
expressions, and body language inherent in such
an interaction?
One can realize from this example that coding
schemes can become highly complex.

Quantification of Data
Developing Code Categories (Continued)
Complex coding schemes can take many attempts to
develop.
Once developed, they undergo continuing
evaluation.
Major revisions, however, are unlikely.
Rather, new coders are required to learn the
existing coding scheme and undergo continuing
evaluation for their ability to correctly apply
the scheme.

Quantification of Data
Codebook Construction
The end product of developing a coding scheme is
the codebook.
This document describes in detail the procedures
for transforming qualitative data into numerical
responses.
The codebook should include notes that describe
the process used to create codes, detailed
descriptions of codes, and guidelines to use when
uncertainty exists about how to code responses.
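A codebook can be pictured as a lookup table from numeric codes to category descriptions. The sketch below is illustrative only: the code numbers, category labels, and keyword rules are hypothetical, and simple keyword matching stands in for the judgment a trained coder would apply. A catch-all code keeps the scheme exhaustive, and returning the first match gives each response a single code.

```python
# Hypothetical codebook for the question about the biggest problem in
# attending college today. Codes and labels are illustrative assumptions.
CODEBOOK = {
    1: "Cost of tuition and fees",
    2: "Time management",
    3: "Academic difficulty",
    9: "Other / uncodable",
}

# Hypothetical keyword rules standing in for a human coder's judgment.
KEYWORDS = {
    1: ("tuition", "cost", "money", "debt"),
    2: ("time", "schedule", "busy"),
    3: ("hard", "difficult", "grades"),
}


def code_response(text):
    """Return the first code whose keywords appear in the response, else 9."""
    lowered = text.lower()
    for code, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return code
    return 9  # catch-all category keeps the scheme exhaustive


responses = ["Tuition is too high", "Not enough time to study", "Parking"]
codes = [code_response(r) for r in responses]

for response, code in zip(responses, codes):
    print(f"{response!r} -> {code} ({CODEBOOK[code]})")
```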

Quantification of Data
Data Entry
Data recorded in numerical format can be entered
by keypunching or the use of sophisticated
optical scanners.
Typically, responses to internet and telephone
surveys are entered directly into a numerical
database.
Cleaning Data
Logical errors in responses must be reconciled.
Errors of entry must be corrected.
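Both kinds of cleaning can be sketched as simple checks on each record. The records, field names, and the valid age range below are assumptions made up for illustration; real checks depend on the survey instrument.

```python
# Hypothetical survey records; field names and values are assumptions.
records = [
    {"id": 1, "age": 34, "years_married": 10},
    {"id": 2, "age": 19, "years_married": 25},   # logical error
    {"id": 3, "age": 210, "years_married": 0},   # likely entry error
]


def find_problems(record):
    """Return a list of problems found in one record."""
    problems = []
    # Entry check: the value falls outside the assumed valid range.
    if not 18 <= record["age"] <= 110:
        problems.append("age out of range")
    # Logical check: two answers contradict one another.
    if record["years_married"] > record["age"]:
        problems.append("married longer than alive")
    return problems


# Keep only the records that have at least one problem.
flagged = {r["id"]: find_problems(r) for r in records if find_problems(r)}
print(flagged)
```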

Univariate Analysis
Distributions
Data analysis begins by examining distributions.
One might begin, for example, by examining the
distribution of responses to a question about
formal education, where responses are recorded
within six categories.
A frequency distribution will show the number and
percent of responses in each category of a
variable.
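A frequency distribution of this kind can be sketched in a few lines. The ten coded responses below are hypothetical; the codes 1 through 6 stand for the six education categories.

```python
from collections import Counter

# Hypothetical coded responses to an education question (codes 1-6).
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

counts = Counter(responses)
total = len(responses)

# Map each category to (count, percent of all responses).
distribution = {
    category: (counts[category], 100 * counts[category] / total)
    for category in sorted(counts)
}

for category, (n, pct) in distribution.items():
    print(f"Category {category}: n = {n}, {pct:.1f}%")
```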

Univariate Analysis
Central Tendency
A common measure of central tendency is the
average, or mean, of the responses.
The median is the value of the middle case when
all responses are rank-ordered.
The mode is the most common response.
When data are highly skewed, meaning heavily
weighted toward one end of the distribution, the
median or mode might better represent the
typical response.

Univariate Analysis
Central Tendency (Continued)
Consider this distribution of respondent ages:
18, 19, 19, 19, 20, 20, 21, 22, 85
The mean equals 27. But this number does not
adequately represent the common respondent
because the one person who is 85 skews the
distribution toward the high end.
The median equals 20.
This measure of central tendency gives a more
accurate portrayal of the middle of the
distribution.
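The three measures for this age distribution can be checked with Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

mean_age = statistics.mean(ages)      # 243 / 9 = 27
median_age = statistics.median(ages)  # middle (5th of 9) value = 20
mode_age = statistics.mode(ages)      # most frequent value = 19

print(mean_age, median_age, mode_age)
```

Dropping the 85-year-old respondent would pull the mean down to 19.75 while leaving the median close to 20, which is why the median is the steadier summary here.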

Univariate Analysis
Dispersion
Dispersion refers to the way the values are
distributed around some central value, typically
the mean.
The range is the distance separating the lowest
and highest values (e.g., the ages listed
previously run from 18 to 85, a range of 67).
The standard deviation is an index of the amount
of variability in a set of data.
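Both measures can be computed for the ages listed earlier. The slides do not say whether the sample or population formula is intended; this sketch assumes the sample standard deviation (dividing by n - 1).

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

# Range: distance between the lowest and highest values.
age_range = max(ages) - min(ages)

# Sample standard deviation (n - 1 in the denominator), an assumption here.
sd_all = statistics.stdev(ages)

# Same measure without the 85-year-old, to show the outlier's influence.
sd_without_outlier = statistics.stdev(ages[:-1])

print(age_range, round(sd_all, 2), round(sd_without_outlier, 2))
```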

Univariate Analysis
Dispersion (Continued)
The standard deviation represents dispersion with
respect to the normal (bell-shaped) curve.
Assuming a set of numbers is normally
distributed, then each standard deviation equals
a certain distance from the mean.
Successive standard deviations (1, 2, etc.) are
equally spaced along the bell-shaped curve, but
each represents a declining percentage of
responses because of the shape of the curve (see
Chapter 7).

Univariate Analysis
Dispersion (Continued)
For example, the first standard deviation
accounts for 34.1% of the values on each side of
the mean.
The figure 34.1% is derived from probability
theory and the shape of the curve.
Thus, approximately 68% of all responses fall
within one standard deviation of the mean.
The second standard deviation accounts for the
next 13.6% of the responses on each side of the
mean (27.2% of all responses), and so on.
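These percentages follow from the normal cumulative distribution function, which can be written with the error function in Python's `math` module. The helper name `share_between` is an illustrative choice.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def share_between(lo, hi):
    """Share of a normal distribution between lo and hi standard deviations."""
    return normal_cdf(hi) - normal_cdf(lo)


first_sd_one_side = share_between(0, 1)    # about 0.341
within_one_sd = share_between(-1, 1)       # about 0.683
second_sd_one_side = share_between(1, 2)   # about 0.136

print(f"{first_sd_one_side:.3f} {within_one_sd:.3f} {second_sd_one_side:.3f}")
```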

Univariate Analysis
Dispersion (Continued)
If the responses are approximately normally
distributed and the range of responses is low,
meaning that most responses fall close to the
mean, then the standard deviation will be small.
The standard deviation of professional golfers'
scores on a golf course will be low.
The standard deviation of amateur golfers' scores
on a golf course will be high.

Univariate Analysis
Continuous and Discrete Variables
Continuous variables have responses that form a
steady progression (e.g., age, income).
Discrete (i.e., categorical) variables have
responses that are considered to be separate from
one another (e.g., sex of respondent, religious
affiliation).

Univariate Analysis
Continuous and Discrete Variables (Continued)
Sometimes, it is a matter of debate within the
community of scholars about whether a measured
variable is continuous or discrete.
This issue is important because the statistical
procedures appropriate for continuous-level data
are more powerful, easier to use, and easier to
interpret than those for discrete-level data,
especially as related to the measurement of the
dependent variable.

Univariate Analysis
Continuous and Discrete Variables (Continued)
Example: Suppose one measures amount of formal
education within five categories (less than high
school, high school, two years of vocational
school or college, college, post-college).
Is this measure continuous (i.e., 1-5) or
discrete?
In practice, five categories seems to be a cutoff
point for considering a variable as continuous.
Using a seven-point response scale will give the
researcher a greater chance of deeming a variable
to be continuous.

Subgroup Comparisons
Collapsing Response Categories
Sometimes the researcher might want to analyze a
variable by using fewer response categories than
were used to measure it.
In these instances, the researcher might want to
collapse one or more categories into a single
category.
The researcher might want to collapse categories
to simplify the presentation of the results or
because few observations exist within some
categories.

Subgroup Comparisons
Collapsing Response Categories Example
Response                      Frequency
Strongly disagree                     2
Disagree                             22
Neither agree nor disagree           45
Agree                                31
Strongly agree                        1

Subgroup Comparisons
Collapsing Response Categories Example
One might want to collapse the extreme responses
and work with just three categories:
Response                      Frequency
Disagree                             24
Neither agree nor disagree           45
Agree                                32
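The collapse from five categories to three can be sketched as a mapping applied to the original frequencies. The dictionary names are illustrative; the counts are those from the example.

```python
# Frequencies from the five-category example.
frequencies = {
    "Strongly disagree": 2,
    "Disagree": 22,
    "Neither agree nor disagree": 45,
    "Agree": 31,
    "Strongly agree": 1,
}

# Each original category is assigned to one collapsed category.
COLLAPSE = {
    "Strongly disagree": "Disagree",
    "Disagree": "Disagree",
    "Neither agree nor disagree": "Neither agree nor disagree",
    "Agree": "Agree",
    "Strongly agree": "Agree",
}

collapsed = {}
for response, n in frequencies.items():
    target = COLLAPSE[response]
    collapsed[target] = collapsed.get(target, 0) + n

print(collapsed)
```

Collapsing changes only how the cases are grouped; the total number of cases (101) stays the same.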

Subgroup Comparisons
Handling "Don't Knows"
When asking about knowledge of factual
information ("Does your teenager drink alcohol?")
or opinions on a topic the subject might not know
much about ("Do school officials do enough to
discourage teenagers from drinking alcohol?"), it
is wise to include a "don't know" category as a
possible response.
Analyzing "don't know" responses, however, can be
a difficult task.

Subgroup Comparisons
Handling "Don't Knows" (Continued)
The research-on-research literature regarding
this issue is complex and without clear-cut
guidelines for decision-making.
The decisions about whether to use "don't know"
response categories and how to code and analyze
them tend to be idiosyncratic to the research
and the researcher.

Bivariate Analysis
Introduction
Bivariate analysis refers to an examination of
the relationship between two variables.
We might ask these questions about the
relationship between two variables:
Do they seem to vary in relation to one another?
That is, as one variable increases in size does
the other variable increase or decrease in size?
What is the strength of the relationship between
the variables?

Bivariate Analysis
Bivariate Tables
Divide the cases into groups according to the
attributes of the independent variable (e.g., men
and women).
Describe each subgroup in terms of attributes of
the dependent variable (e.g., what percent of men
approve of sexual equality and what percent of
women approve of sexual equality).

Bivariate Analysis
Bivariate Tables (Continued)
Read the table by comparing the independent
variable subgroups with one another in terms of a
given attribute of the dependent variable (e.g.,
compare the percentages of men and women who
approve of sexual equality).
Bivariate analysis gives an indication of how the
dependent variable differs across levels or
categories of an independent variable.
This relationship does not necessarily indicate
causality (see Chapter 15).
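The procedure for building and reading a bivariate table can be sketched as follows. The counts are hypothetical, invented only to show the arithmetic: percentages are computed within each category of the independent variable and then compared across those categories.

```python
# Hypothetical counts; rows are the independent variable (gender),
# columns are the dependent variable (attitude toward sexual equality).
counts = {
    "Men": {"Approve": 60, "Disapprove": 40},
    "Women": {"Approve": 80, "Disapprove": 20},
}


def row_percentages(table):
    """Percentage each subgroup's counts within that subgroup's own total."""
    result = {}
    for group, row in table.items():
        base = sum(row.values())  # the base of the percentages
        result[group] = {attr: 100 * n / base for attr, n in row.items()}
    return result


percentages = row_percentages(counts)

# Read the table by comparing subgroups on one attribute.
for group, row in percentages.items():
    print(f"{group}: {row['Approve']:.0f}% approve")
```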

Bivariate Analysis
Contingency Tables
Tables that compare responses to a dependent
variable across levels/categories of an
independent variable are called contingency
tables (or sometimes, crosstabs).
When writing a research report, it is common
practice, even when conducting highly
sophisticated statistical analysis, to present
contingency tables also to give readers a sense
of the distributions and bivariate relationships
among variables.

Bivariate Analysis
Contingency Tables (Continued)
A table should have a title that succinctly
describes what is contained in the table.
If a table lists information about a scale or
index, then it or a prior table should list the
statements used to measure the scale or index.
The attributes of each variable should be clearly
indicated.
The base of percentages should be reported.
Notes should be provided about missing data.

Multivariate Analysis
Introduction
Although informative, bivariate analysis can
mislead the researcher regarding cause and
effect.
Multivariate analysis (see Ch. 15-16) often is
needed to gain a better understanding of cause
and effect among variables.
Multivariate analysis can involve the
introduction of a third variable into a
contingency table, or it can involve more
sophisticated analysis and presentation of
relationships among variables.

Multivariate Techniques
Factor Analysis
Factor analysis indicates the extent to which a
set of variables measures the same underlying
concept.
This procedure assesses the extent to which
variables are highly correlated with one another
compared with other sets of variables.
Consider the table of correlations (i.e., a
correlation matrix) on the following slide.

Multivariate Techniques
Factor Analysis (Continued)
       X1     X2     X3     X4     X5     X6
X1   1      .52    .60    .21    .15    .09
X2    .52  1       .59    .12    .13    .11
X3    .60   .59   1       .08    .10    .10
X4    .21   .12    .08   1       .72    .70
X5    .15   .13    .10    .72   1       .73
X6    .09   .11    .10    .70    .73   1

Multivariate Techniques
Factor Analysis (Continued)
Note that variables X1-X3 are moderately
correlated with one another, but have weak
correlations with variables X4-X6.
Similarly, variables X4-X6 are moderately
correlated with one another, but have weak
correlations with variables X1-X3.
The figures in this table indicate that variables
X1-X3 go together and variables X4-X6 go
together.
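The pattern can be summarized by averaging the correlations within each group of variables and between the two groups. This is only a rough descriptive check on the correlation matrix, not factor analysis itself; the grouping of X1-X3 and X4-X6 is supplied by hand here rather than discovered by the procedure.

```python
from itertools import combinations
from statistics import mean

# Correlation matrix for X1-X6 (rows and columns in that order).
R = [
    [1.00, 0.52, 0.60, 0.21, 0.15, 0.09],
    [0.52, 1.00, 0.59, 0.12, 0.13, 0.11],
    [0.60, 0.59, 1.00, 0.08, 0.10, 0.10],
    [0.21, 0.12, 0.08, 1.00, 0.72, 0.70],
    [0.15, 0.13, 0.10, 0.72, 1.00, 0.73],
    [0.09, 0.11, 0.10, 0.70, 0.73, 1.00],
]

group_a = [0, 1, 2]  # indices of X1-X3
group_b = [3, 4, 5]  # indices of X4-X6

# Average correlation among pairs inside each group.
within_a = mean(R[i][j] for i, j in combinations(group_a, 2))
within_b = mean(R[i][j] for i, j in combinations(group_b, 2))

# Average correlation for pairs that span the two groups.
between = mean(R[i][j] for i in group_a for j in group_b)

print(f"within X1-X3: {within_a:.2f}")
print(f"within X4-X6: {within_b:.2f}")
print(f"between groups: {between:.2f}")
```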

Multivariate Techniques
Factor Analysis (Continued)
Factor analysis would separate variables X1-X3
into Factor 1 and variables X4-X6 into Factor
2.
Suppose variables X1-X3 were designed by the
researcher to measure self-esteem and variables
X4-X6 were designed to measure marital
satisfaction.

Multivariate Techniques
Factor Analysis (Continued)
The researcher could use the results of factor
analysis, including the statistics produced by
it, to evaluate the construct validity of using
X1-X3 to measure self-esteem and using X4-X6 to
measure marital satisfaction.
Thus, factor analysis can be a useful tool for
confirming the validity of measures of latent
variables.

Multivariate Techniques
Factor Analysis (Continued)
Factor analysis can be used also for exploring
groupings of variables.
Suppose a researcher has a list of 20 statements
that measure different opinions about same-sex
marriage.
The researcher might wonder whether the 20 opinions
reflect a smaller number of basic opinions.

Multivariate Techniques
Factor Analysis (Continued)
Factor analysis of responses to these statements
might indicate, for example, that they can be
reduced into three latent variables, related to
religious beliefs, beliefs about civil rights,
and beliefs about sexuality.
Then, the researcher can create scales of the
grouped variables to measure religious beliefs,
beliefs about civil rights, and beliefs about
sexuality to examine support for same-sex marriage.

Questions?
