Title: Exploring configurational causation in large datasets with QCA: possibilities and problems
1Exploring configurational causation in large
datasets with QCA: possibilities and problems
- Barry Cooper & Judith Glaesser
- School of Education, Durham University
3rd ESRC Research Methods Festival, St Catherine's
College, Oxford, 30 June - 3 July 2008
2A note re these slides.
- Some of these slides will be used in our
presentation itself, but some have been written to
provide, as a context for the tables, etc., a
pre- and post-festival web-based sketch of the
method we have employed (Ragin's Qualitative
Comparative Analysis, or QCA) for any readers new
to it.
- After a brief description of the background to
Ragin's development of the set theoretic
approach, and a list of what we see as its
strengths, we will illustrate its use with large
n data, drawing on our experience of using QCA
(Cooper, 2005, 2006; Cooper & Glaesser, 2007,
2008, in press; Glaesser, forthcoming).
- To keep things less complex than they would
otherwise become, we will not draw attention,
during this part of our presentation, to the more
problematic issues that we wish to mention.
- Instead, we deal with this aspect of our
presentation after the illustration of the use of
QCA in a large n context.
3Concerns about the dominant regression approach
in quantitative analysis have a long history.
Here, for example, are various remarks taken from
Peter Abell's 1971 book, Model Building in
Sociology:
- It is often (perhaps more often than not) the
case that the covariation between sociological
variables is not linear (p.174).
- It was argued ... that interaction is a
characteristic feature of sociological
covariation (p.183).
- Multicollinearity is pervasive in sociology; it
is more often than not the case that explanatory
variables are intercorrelated (p.189).
- But from what was said earlier it might be
expected that (cardinal) variables will be of
relatively rare occurrence in sociology. One is
much more likely to encounter the situation where
nominal and ordinal variables are related
(p.197).
- We have noted earlier that the typical causal
situation in social science is one of
over-determination: many different clusters of
variables are sufficient for a given effect
(p.236).
- Abell's book also includes considerable
discussion of the logic of necessary and
sufficient conditions alongside his discussion of
linear modelling.
4Several authors, from various perspectives, have
raised important concerns about regression and
its uses. For example (see attached bibliography
for details):
- Boudon (1974a,b)
- Byrne (1998, 2002)
- Freedman (1987, 1997)
- Hedström (2005)
- Lieberson (1985)
- Morgan and Winship (2007)
- Ormerod (1998)
- Pawson & Tilley (1997)
- Pearl (2000)
- Ron (2002)
- Sörensen (1998)
- Taagepera (2005).
5Andrew Abbott (2001) has summarised some of the
key assumptions of the linear model normally used
in regression:
- The social world is made up of fixed entities
with varying attributes (demographic assumption).
- Some attributes determine (cause) others
(attribute causality assumption).
- What happens to one case doesn't constrain what
happens to others, temporally or spatially
(casewise independence assumption).
- Attributes have one and only one causal meaning
within a given study (univocal meaning
assumption).
- Attributes determine each other principally as
independent scales rather than as constellations
of attributes; main effects are more important
than interactions (which are complex types) (main
effects assumption).
6Charles Ragin's work
- Ragin (1987) shared many of the concerns of
these various writers, but, in particular
perhaps, focussed on Abbott's third and fourth
points, the relative neglect of causal
heterogeneity and complex interaction in
regression models when used in practice [1]. Using
set theory rather than regression's linear
algebra as the basis for developing a
configurational approach to causal modelling, he
began to explore ways in which (i) complex
interaction between causal factors and (ii)
causal heterogeneity (i.e. the existence of
several distinct types of cases in a
population [2], and therefore of possible
multiple pathways to an outcome) could be
described in Boolean or configurational terms
(Ragin, 1987, 2000, 2006a). In doing so, he also
aimed to shift researchers' practices away from a
focus on the net average effects of variables
(i.e. on which variables win the race to explain
most variance) and towards an approach that
recognised that events in the world are often
caused by conjunctions of factors (Ragin, 2006b).
It is his Qualitative Comparative Analysis (QCA)
on which we focus in this paper.
- [1] On Abbott's second point, see Hedström
(2005).
- [2] The returns to cognitive capacity, for
example, might differ systematically between
social classes.
7Before introducing QCA in more detail, we might
set out what we regard as the strengths of
Ragin's approach:
- A focus on cases and their constituent features
rather than, as in regression, on abstracted
variables (and therefore net and often average
effects).
- Analysis of multiple and conjunctural causation
in terms of necessary and/or sufficient
conditions rather than in terms of the linear
additive model.
- The recognition, up front, of the possibility of
causal heterogeneity.
- The offer of a rigorous approach, drawing on set
theory and logic, to the analysis of these
features of social reality.
- Through a focus on INUS [1] conditions, the
allowing, up front, of complex interactions
between causes.
- The recognition of the problems resulting from
limited diversity in social datasets.
- [1] An INUS condition is an insufficient but
non-redundant part of an unnecessary but
sufficient condition (Mackie, 1974).
8Boolean functional form: an example
- Ragin's QCA and its associated software use
Boolean algebra to address conjunctural
causation. Boolean equations have a different
functional form to the regression equations with
which social scientists are familiar. Here is an
example taken from a paper contrasting the
approaches (Mahoney & Goertz, 2006):
- Y = (A*B*c) + (A*C*D*E)
- In these equations the symbol * indicates Logical
AND (set intersection), + indicates Logical OR
(set union), upper case letters indicate the
presence of factors, lower case indicate their
absence. In this fictional example of causal
heterogeneity, the equation indicates that there
are two causal paths to the outcome Y. The first,
captured by the causal configuration A*B*c,
involves the presence in the case of features A
and B, combined with the absence of C. The
second, captured by A*C*D*E, requires the joint
presence of A, C, D and E. Either of these causal
configurations is sufficient for the outcome to
occur, but neither is necessary, considered
alone. A is necessary but not sufficient. The
factor C behaves differently in the two
configurations. This non-probabilistic - or
veristic - example, of course, assumes no
empirical exceptions to these relations.
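The claims made about this fictional equation can be checked mechanically. The following is a minimal Python sketch, not part of QCA's own software: the function name `outcome` is ours, and we read the two configurations as "A and B and not C" and "A and C and D and E". It enumerates all 32 combinations of the five factors and tests necessity and sufficiency directly.

```python
from itertools import product

def outcome(A, B, C, D, E):
    """Y = (A*B*c) + (A*C*D*E): * is AND, + is OR, lower case means absence."""
    return (A and B and not C) or (A and C and D and E)

# Every logically possible case: 2**5 = 32 combinations of the five factors.
cases = list(product([False, True], repeat=5))
positive = [c for c in cases if outcome(*c)]

# A is necessary: every case showing the outcome has A present.
a_necessary = all(c[0] for c in positive)
# A is not sufficient: some cases with A present lack the outcome.
a_sufficient = all(outcome(*c) for c in cases if c[0])
# Neither path is necessary alone: some positive cases lie outside each one.
path1_necessary = all(c[0] and c[1] and not c[2] for c in positive)
path2_necessary = all(c[0] and c[2] and c[3] and c[4] for c in positive)

print(a_necessary, a_sufficient, path1_necessary, path2_necessary)
```

Because the two paths require opposite values of C, no case can satisfy both, which is one way of seeing that C "behaves differently in the two configurations".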
9QCA: Sufficiency and quasi-sufficiency
- Sufficiency, understood causally or
logically, involves a subset relation. If, for
example, a single condition is always sufficient
for an outcome to occur, the set of cases with
the condition will be a subset of the set of
cases with the outcome. This is shown in Figure 1
(next slide) based on a hypothetical relation
between being of service class origin and
achieving a degree. Given the condition, we
obtain the outcome. In applications to real large
n data, perfect sufficiency is unlikely to be
found, and a situation like Figure 2 (next slide)
will often be found, where most but not all of
the set of cases with the condition also are
members of the outcome set.
- Using conventional crisp sets, the
proportion of the members of the condition set
who are also members of the outcome set can be
used as a measure of the degree of consistency of
the empirical relation with a relation of perfect
sufficiency (here the number in the yellow
subset divided by the number in the yellow and
green subsets taken together). Figure 2
illustrates a relation that might be described as
only "nearly always sufficient". Alternatively,
using a probabilistic view of causation, being of
service class origin here could be said to be a
sufficient condition, all else being equal, for
raising the probability of achieving the outcome
to a level equal to this consistency proportion.
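For crisp sets the consistency measure described above is a simple proportion. The sketch below is illustrative only: the case identifiers and set sizes are invented, and the function name `consistency` is ours rather than anything taken from the fs/QCA software.

```python
def consistency(condition, outcome):
    """Proportion of the condition set's members who are also in the outcome set."""
    return len(condition & outcome) / len(condition)

# Hypothetical case identifiers, invented for illustration only.
degree = set(range(0, 300))            # cases achieving a degree
service_perfect = set(range(0, 100))   # Figure 1 style: wholly inside the outcome set
service_quasi = set(range(-20, 80))    # Figure 2 style: 20 of 100 cases fall outside

perfect = consistency(service_perfect, degree)
quasi = consistency(service_quasi, degree)
print(perfect, quasi)
```

With these invented sets the first relation is perfectly consistent with sufficiency, while the second falls short of 1 because a fifth of the condition set lies outside the outcome set.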
10Figure 2: Quasi-Sufficiency
Figure 1: Perfect Sufficiency
11QCA: Necessity and Coverage
In Figure 3 (next slide), another hypothetical
relation between being of service class origin
and achieving a degree is shown. This is another
example of less than perfect sufficiency. Here
the members of the yellow fringe of the service
class origin set are not also members of the
outcome set. However, most members of this
condition set are. This example is also, in fact,
a special case in that being of service class
origin is a necessary condition for achieving a
degree (and in the case of necessity the outcome
set is, as can be seen, a subset of the condition
set, reversing the direction of the subsethood
relation that characterises sufficiency). Venn
diagrams can also illustrate Ragin's concept of
explanatory coverage (Ragin, 2006a). The
proportion of the outcome set that is overlapped
by the condition set can be used as a measure of
the degree to which the outcome is covered
(explained) by the condition. In Figure 1
(previous slide), the coverage of the outcome of
having a degree by the condition of being of
service class origin can be seen to be low, with
only around 40% of the (blue) outcome set covered
by the (yellow) condition set. In Figure 3 (next
slide), on the other hand, it can be seen that
the whole of the outcome set (again in blue) is
covered by the (yellow) condition set, and
coverage is 100% (the arithmetic mark of a
necessary condition in this simple case).
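Coverage reverses the denominator used for consistency: the overlap is divided by the size of the outcome set rather than the condition set. The sketch below uses invented sets chosen to mimic the Figure 1 and Figure 3 situations (the figure of 40% is taken from the text; the other sizes are assumptions), and the function names are ours.

```python
def consistency(condition, outcome):
    """Share of the condition set lying inside the outcome set (sufficiency)."""
    return len(condition & outcome) / len(condition)

def coverage(condition, outcome):
    """Share of the outcome set overlapped by the condition set."""
    return len(condition & outcome) / len(outcome)

# Hypothetical case identifiers, invented for illustration only.
degree = set(range(0, 100))
service_fig1 = set(range(0, 40))    # condition is a subset of the outcome
service_fig3 = set(range(0, 125))   # outcome is a subset of the condition

fig1 = (consistency(service_fig1, degree), coverage(service_fig1, degree))
fig3 = (consistency(service_fig3, degree), coverage(service_fig3, degree))
print(fig1, fig3)
```

In the Figure 1 style case consistency is perfect but coverage is low; in the Figure 3 style case coverage is complete, which is the arithmetic mark of necessity, while consistency with sufficiency is less than perfect.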
12(No Transcript)
13QCA: Multiple conditions and the partitioning of
coverage I
- In more complex set theoretic models with
more than one condition, coverage can be
partitioned in a manner analogous to the
partitioning of variance explained in
regression-based approaches (Ragin, 2006a). The
partitioning of coverage into raw and unique
components can be illustrated, again using
imaginary data, by reference to a more complex
Venn diagram (Figure 4, next slide). Here we have
added the condition of being of high ability. In
this fictional case we now have two crisp sets
representing the conditions, SERVICE CLASS
ORIGIN and HIGH ABILITY, and the outcome is
the achievement of a degree. The Boolean solution
can be written as
- DEGREE = SERVICE CLASS ORIGIN + HIGH ABILITY.
- Either being of service class origin or of high
ability is sufficient for the outcome (since both
condition sets, considered separately, are
subsets of the outcome set). Greater coverage of
the outcome is achieved by having both of these
factors in the analysis rather than either alone.
14(No Transcript)
15QCA: Multiple conditions and the partitioning of
coverage II
- We can also see here how coverage can be
partitioned straightforwardly in the case of
crisp sets. In the case of the relations
illustrated in Figure 4 (previous slide) it is
easy to see that the total coverage can be broken
into three components:
- That due to being of service class origin while
not being of high ability (the yellow subset as a
proportion of the blue outcome set).
- That due to being of high ability while not being
of service class origin (the orange subset as a
proportion of the blue outcome set).
- That due to being of service class origin and
being of high ability (the red subset as a
proportion of the blue outcome set).
- If we take service class origin as an
example, Ragin (2006a) would describe the first
of these three (the yellow subset as a proportion
of the outcome set) as the unique coverage due to
being from this social class background. On the
other hand, the coverage due to being of this
class origin, whether or not this is conjoined
with other causal conditions in the model (the
yellow and red subsets taken together as a
proportion of the outcome set), he would describe
as the raw coverage due to membership in this set
(being of service class origin).
- Parallel arguments apply to being of high
ability.
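The partition into raw and unique coverage can be sketched with ordinary set operations. The sets below are invented to resemble Figure 4 (both conditions lie wholly inside the outcome set and overlap one another); the sizes and the helper name `share` are assumptions made for illustration.

```python
# Hypothetical case identifiers, invented for illustration only.
degree = set(range(0, 100))    # outcome set (blue)
service = set(range(0, 50))    # SERVICE CLASS ORIGIN
ability = set(range(30, 80))   # HIGH ABILITY

def share(subset):
    """Size of a subset's overlap with the outcome, as a proportion of the outcome set."""
    return len(subset & degree) / len(degree)

unique_service = share(service - ability)   # yellow: service class only
unique_ability = share(ability - service)   # orange: high ability only
joint = share(service & ability)            # red: both conditions
raw_service = share(service)                # yellow and red together
raw_ability = share(ability)                # orange and red together
total = share(service | ability)            # coverage of the whole solution

print(unique_service, unique_ability, joint, raw_service, raw_ability, total)
```

The three components (yellow, orange, red) sum to the total coverage, and each raw coverage equals the corresponding unique coverage plus the shared red component.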
16From this point on we employ real large n data in
illustrating QCA in use.
- We can use data from the National Child
Development Study (NCDS), comprising children
born in one week in March 1958, to illustrate a
multifactor conjunctural explanation [1]. Of
course, we will not expect to find perfect
sufficiency in the empirical world and our
example will show how the method embodied in the
software addresses this problem. We explore the
relations between highest qualifications achieved
by age 33 and a number of factors which might be
seen as either causal or as summarising possible
causes of achievement.
- To begin with we will take, as our outcome
measure, having a highest level of qualification
of at least A level or its equivalent
(HQUAL_ADVANCED). We wish to capture something
more, when referring to social class origin, than
one point in time, and so, for illustrative
purposes, we will take father's [2] social class
at two points. We also include a measure of
mother's education and sex of the respondent. We
will not include any measure of ability in this
first example, in order to keep things simpler.
- [1] We will begin by using a subset of the data
containing 3826 cases chosen to include no
missing values on four measures of father's class
at different times and on mother's education as
well as other key variables.
- [2] We use father's class because there are many
more cases of missing/not-applicable data for
mother's class. However, we include a maternal
influence via mother's education.
17An illustrative Boolean analysis.
- We will address the Boolean equation
- HQUAL_ADVANCED =
function(MALE, PMT_FATHER_AT_BIRTH [1],
PMT_FATHER_AT_AGE_11, MOTHER_POST_16_EDUCATED)
- where
- HQUAL_ADVANCED refers to having qualifications
of at least A level standard by age 33.
- MOTHER_POST_16_EDUCATED refers to the mother
having stayed on in education after age 16.
- MALE refers to being male rather than female.
- PMT_FATHER_AT_BIRTH refers to the mother's
husband being in a professional, managerial
or technical position [2] at the time of
the respondent's birth.
- PMT_FATHER_AT_AGE_11 refers to the
respondent's father being in a professional,
managerial or technical position when the
respondent was aged 11.
- We should stress that we are not claiming
that we have anything like a properly specified
model of educational achievement here. Our
purpose here is to illustrate QCA in use with
large n data.
- [1] This is actually a measure of the mother's
husband in 1958, but to avoid unnecessary
complexity (and given that this is usually the
respondent's father) we have used this
description.
- [2] The PMT grouping used here comprises Classes
I and II of the contemporary Registrar General's
scheme.
18Table 1: Proportions achieving HQUAL_ADVANCED by
class origin, sex and mother's education (NCDS
data, n = 3826): a crosstabulation
19QCA: Moving from the crosstab via a truth table
to a Boolean solution
- The first step required is to reconfigure
the crosstabulation as a truth table (next slide)
where a 1 is entered to indicate the presence of
a condition and a 0 to indicate its absence. In
this table, where the rows are ordered by the
measure of consistency with sufficiency, the
first row (1101), for example, represents the
causal configuration
- MALE*PMT_FATHER_AT_BIRTH*pmt_father_at_age_11*MOTHER_POST_16_EDUCATED
- with the upper case letters indicating
membership in a set and lower case letters
non-membership. The proportion of the 34 cases in
this configuration who achieve the outcome, i.e.
0.824, appears in the consistency column.
- The second step is to determine a threshold
for quasi-sufficiency and, in the light of this
decision, to enter a 1 into the empty outcome
(HQUAL_ADVANCED) column against each row (or
causal configuration) for which the consistency
proportion in the final column passes the
threshold set.
- This decision determines which
configurations are allowed into the final
solution.
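The two steps just described can be sketched in a few lines of Python. This is not the fs/QCA implementation. The function name `truth_table` and the data layout are ours; the 1101 row uses the 34 cases reported above with 28 achieving the outcome (our inference from 0.824 x 34), while the 0000 row is wholly invented. Treating a row as passing when its consistency is at or above the threshold is also an assumption.

```python
from collections import defaultdict

# Order of the 0/1 digits in each configuration string.
CONDITIONS = ["MALE", "PMT_FATHER_AT_BIRTH",
              "PMT_FATHER_AT_AGE_11", "MOTHER_POST_16_EDUCATED"]

def truth_table(cases, threshold):
    """Step 1: group cases by configuration and compute consistency.
    Step 2: enter 1 in the outcome column where consistency passes the threshold."""
    counts = defaultdict(lambda: [0, 0])   # configuration -> [n cases, n with outcome]
    for config, achieved in cases:
        counts[config][0] += 1
        counts[config][1] += achieved
    rows = []
    for config, (n, hits) in counts.items():
        cons = hits / n
        rows.append({"config": "".join(map(str, config)), "n": n,
                     "consistency": round(cons, 3),
                     "outcome": int(cons >= threshold)})   # assumed: at or above passes
    return sorted(rows, key=lambda r: r["consistency"], reverse=True)

# Illustrative data: row 1101 follows the text, row 0000 is invented.
cases = ([((1, 1, 0, 1), 1)] * 28 + [((1, 1, 0, 1), 0)] * 6 +
         [((0, 0, 0, 0), 1)] * 25 + [((0, 0, 0, 0), 0)] * 75)

rows = truth_table(cases, threshold=0.6)
for row in rows:
    print(row)
```

Changing the `threshold` argument changes which rows receive a 1, and therefore which configurations go forward to minimisation.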
20Table 2: Truth table for achieving HQUAL_ADVANCED
(NCDS data, n = 3826)
21Three types of cases?
- The decision re a threshold also
effectively determines which cases, seen as
captured by configurations of conditions, will be
grouped together in the final solution. In this
illustration we will assume that there are three
levels of outcome that we wish to understand in
configurational terms:
- Those configurations or sets of cases in
which more than 60% of the cases achieve the
outcome. Passing this consistency level might be
argued to be consistent with this level of
outcome approaching being more or less the norm
for these configurations. These configurations
are also those we might want to allow forward
into a solution for quasi-sufficiency.
- Those configurations (sets of cases) in which
fewer than 40% of the cases achieve the outcome.
This level might be seen as making not achieving
this level of outcome more or less the norm for
these configurations.
- The remaining configurations (sets of cases) in
which 40% - 60% of the cases achieve the
outcome. In these configurations neither
achieving nor not achieving the outcome is the
norm.
- Clearly, these decisions require
judgements to be made. The reader will see that
it is easy to explore other analyses based on
other boundaries.
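The three-way grouping can be written as a small function whose boundaries are parameters, which makes it easy to explore other choices. The function name and the labels are ours; the 0.40 and 0.60 defaults simply restate the boundaries used in this illustration.

```python
def classify(consistency, lower=0.40, upper=0.60):
    """Assign a configuration to one of the three groups by its consistency."""
    if consistency > upper:
        return "outcome is the norm"          # more than 60% achieve the outcome
    if consistency < lower:
        return "non-outcome is the norm"      # fewer than 40% achieve the outcome
    return "neither is the norm"              # 40% - 60% inclusive

for value in (0.824, 0.50, 0.20):
    print(value, classify(value))

# Exploring a different, stricter pair of boundaries:
print(classify(0.70, lower=0.25, upper=0.75))
```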
22The first group of cases.
- Let us turn to the first group. These
configurations have been picked out by entering
1s and 0s in Table 2 in the HQUAL_ADVANCED
column. Table 3a (next slide) shows the solution
that results when fs/QCA is asked to minimise the
configurations picked out by these 1s. These
eight rows (causal configurations) are
subjected to an algebraic process of Boolean
minimisation [1] (Quine, 1952; Ragin, 1987) in
order to create the final simplest solution:
- MALE*PMT_FATHER_AT_BIRTH +
- PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED +
- PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
- The two final expressions pick out cases whose
mothers had stayed on after 16 and who had a father
figure in the PMT class at one point of two in
their childhood. Both males and females are
included in these expressions. The first
expression picks out just males who were born
into a family setting with a father in the PMT
class at birth.
- [1] This proceeds as follows. Taking the first
two rows as an example, we have 1101 and 1111.
Clearly, at the level of quasi-sufficiency we
have chosen, the presence or absence of the third
element makes no difference. We can therefore
replace it with a dash to indicate this, giving
11-1. A similar argument can be applied to the
fourth and fifth rows (0111 and 0101) to give
01-1. Taking 11-1 and 01-1 together, and
continuing the process, we arrive at -1-1. This is
PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED, one
of the terms in our final solution.
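The pairwise combination described in the footnote can be sketched as follows. This is a bare-bones version of the first stage of Quine's procedure, not the fs/QCA algorithm, and the function names are ours. Table 2 is not reproduced in this transcript, so the eight rows listed are an assumption: they are the configurations that the three-term solution logically implies.

```python
from itertools import combinations

def combine(a, b):
    """Merge two terms that differ in exactly one position (neither being a dash)."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(rows):
    """Repeat the pairwise merging until no further terms can be combined."""
    terms, primes = set(rows), set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used      # terms that could not be merged any further
        terms = merged
    return primes

# Assumed rows: the eight configurations implied by the solution above.
# Digit order: MALE, PMT_FATHER_AT_BIRTH, PMT_FATHER_AT_AGE_11, MOTHER_POST_16_EDUCATED.
rows = ["1101", "1111", "1110", "1100", "0111", "0101", "1011", "0011"]

print(combine("1101", "1111"))
print(combine("11-1", "01-1"))
print(sorted(prime_implicants(rows)))
```

For these rows each of the three resulting terms covers at least one row that no other term covers, so no further selection step is needed; in general a second stage is required to choose among prime implicants.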
23(No Transcript)
24(No Transcript)
25(No Transcript)
26QCA: an example of a quasi-necessary condition I
- It might be thought, at least for some
hypothesised meritocracy, that were academic
ability to be appropriately defined and measured
then some minimum level of this factor ought to
be a necessary condition for anyone to achieve a
degree. Table 4a illustrates this, where one cell
should be empty if the chosen level of ability
(X) is a strictly necessary condition for a
degree to be achieved. Here, we might be seen as
assuming causal homogeneity for the factor of
ability.
Table 4a: Strict necessity of some level of
ability (X) for achieving a degree
27QCA: an example of a quasi-necessary condition II
An examination by eye of the NCDS distribution of
the proportions achieving a degree at each point
of the ability scale allows us to estimate what
such a level of ability might be empirically, for
all respondents taken together. It is, in fact,
around the mean ability score and if we create a
factor setting ability as either over or under
the mean score for our subset of 3826, we obtain
Table 4b, showing that the proportion of those
obtaining a degree whose ability score is below
the mean is only 10.4%. Especially given that
this proportion may include cases where the
measurement was low through either error or
chance factors, we might be willing to say that a
score above the mean approaches being a necessary
condition for achieving a degree in this sample
and is therefore a quasi-necessary condition.
Table 4b: Achieving a degree by ability below and
above the mean: row % (column %) [1]
[1] As it happens this test only has discrete
scores, from 0 to 80. The mean lies between two
of these scores.
28QCA: an example of a quasi-necessary condition
III
However, we cannot be satisfied with this
conclusion which, as we said, effectively assumes
causal homogeneity, with ability operating in the
same way across all types of cases and, of
course, leaves us wondering about the features of
the cases amongst the 10.4%. We obviously want
to know whether there are sets of cases -
perhaps, for example, differentiated by social
class - for whom being either above or below the
mean, when conjoined with other factors, is or
is not necessary and/or sufficient for
achieving a degree (or quasi-necessary or
quasi-sufficient), especially as apparent returns
to ability vary by class, as Figure 5 (next
slide), produced using a slightly different class
origin categorisation, clearly shows.
29Figure 5: Proportions gaining a degree by ability
at age 11 and social class
30QCA: an example of a quasi-necessary condition
IV
- To explore these questions, we might undertake an
analysis that includes a measure of ability being
over the mean, given what we found in Table 4b.
Let us undertake an analysis of
- HQUAL_DEGREE =
function(ABILITY_ABOVE_MEAN, MALE,
PMT_FATHER_AT_BIRTH, PMT_FATHER_AT_AGE_11,
MOTHER_POST_16_EDUCATED).
- The relevant truth table is shown in Table 5
(next slide), with the rows ordered by
consistency. We can see that the first five rows
have a consistency level of 0.40 or above, which
we might label as implying that for these cases,
gaining a degree is, all else being equal, a
definite possibility, something that is a pretty
common occurrence in their milieus. Each of these
configurations is characterised by having ability
above the mean, but conjoined with several
supportive paternal and maternal ascriptive
factors, and, in most cases, with male sex. The
minimised solution for these rows is shown in
Table 6 (two slides on) where ABILITY_ABOVE_MEAN
appears, as a necessary condition should, in each
expression.
- We will return to the somewhat paradoxical
threshold-dependent sense which the term
"necessary" has in this claim after a subsequent
example.
31Table 5
32Table 6: Minimised solution for Table 5, for
first five rows
--- TRUTH TABLE SOLUTION ---
frequency cutoff: 9.000
consistency cutoff: 0.417
raw       unique
coverage  coverage  consistency  configuration
--------  --------  -----------  -------------
0.184     0.065     0.485        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11
0.141     0.022     0.477        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED
0.159     0.039     0.466        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
0.239     0.120     0.452        ABILITY_ABOVE_MEAN*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
solution coverage: 0.365
solution consistency: 0.453
33QCA: an example of a quasi-necessary condition V
A further inspection of Table 5 shows, as we
might expect, that having this level of ability
characterises the top half of the ordered table
(14 out of the 16 rows). However, there are
exceptions. The first, in the twelfth row, is the
configuration, with only 34 cases:
ability_above_mean*MALE*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
This conjunction of lower ability with supportive
ascriptive factors is associated with some 20.6%
achieving a degree, some way above the mean of
13.3%.
34QCA: an example of a quasi-necessary condition VI
We might be especially interested in exploring
what it is about those with lower than mean
ability that might explain their achieving
proportionally more degrees than expected. It is
likely, as we can see from this example, to be
the presence of supporting ascriptive factors.
However, the numbers become very small in some of
the relevant rows in Table 5. For this reason, we
will explore this question using a different
boundary within the ability scale. Sixty-one
percent of those achieving degrees in the 3826
have ability in the top 20% of the overall
distribution in the NCDS (see Table 7). We can
use the remaining 39% to explore what factors,
conjoined with being outside the top 20%, are
associated with raising the proportion gaining a
degree. We will define, for current purposes,
ability in the top 20% as high ability.
Table 7: Degrees by High Ability (i.e. ability in
top 20%) (column %)
35QCA: an example of a quasi-necessary condition
VII
Therefore let us undertake a Boolean analysis
parallel to the earlier one but that excludes the
top 20% of the ability range. Table 8 (next
slide) is the relevant truth table, ordered by
consistency. A glance at this shows that, for
these cases, mother's education is a key factor
in raising the likelihood of a degree. If we
set a 0.20 threshold to explore this (having
noted the jump from 0.16 to 0.20 in the
consistency column), we obtain the solution in
Table 9 (two slides on). Within the confines of
this analysis, i.e. for those not of high ability
as defined, MOTHER_POST_16_EDUCATED is necessary
to raise the proportion obtaining a degree to
20%, as is also a father's class position in the
PMT classes for at least one of the two points
included. However, the low coverage figure for
the solution should be noted (0.296). Amongst
those not of high ability as defined, more
degrees (140) are gained by individuals outside
of the configurations included in this solution
than by those within them (59). It must therefore
be stressed that the sense of "necessary" here is
"necessary to raise the proportion for a
configuration to 0.2 or better" and not the sense
that it is not possible for an individual to gain
a degree without a suitably educated mother. Many
do precisely the latter.
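The solution coverage of 0.296 follows directly from the two counts just given. The short calculation below uses only those figures (59 degrees gained within the solution's configurations, 140 outside them); the variable names are ours.

```python
inside = 59     # degrees gained within the configurations in the solution
outside = 140   # degrees gained outside those configurations

# Coverage: the share of all degrees (among those not of high ability)
# that falls within the configurations included in the solution.
solution_coverage = inside / (inside + outside)
print(round(solution_coverage, 3))
```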
36Table 8: Degree by sex, class and mother's
education (only for those whose ability is
outside the top 20%)
37Table 9: Degree by sex, class and mother's
education (only for those whose ability is
outside the top 20%)
--- TRUTH TABLE SOLUTION ---
frequency cutoff: 17.000
consistency cutoff: 0.200
raw       unique
coverage  coverage  consistency  configuration
--------  --------  -----------  -------------
0.276     0.201     0.239        PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
0.095     0.020     0.202        male*PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED
solution coverage: 0.296
solution consistency: 0.236
38QCA: Limited Diversity in Datasets and
Counterfactual Reasoning
In the examples we have used above, and with the
number of conditions employed in those models, we
did not experience the problem of very small
numbers in some rows of the truth table that can
arise with more conditions as a consequence of
(i) the exponential increase in the number of
rows as more conditions are included and (ii) the
relations - or correlations - between conditions
in the empirical world (Ragin & Sonnett, 2005).
Small numbers of cases in some configurations
constitute a problem because it is difficult to
make a valid statement about a group of cases
who, empirically, only appear in small numbers.
In regression analyses, since the weight of the
various combinations of scores on variables is
taken into account in calculating average net
effects, this problem is effectively dealt with
mechanically, partly via the use of significance
tests. Ragin has suggested a range of ways of
using counterfactual reasoning to address the
problems caused by limited diversity. For our
use of these approaches with the NCDS data, which
we will not have time to discuss, see Cooper &
Glaesser (2008).
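The first source of small rows, the exponential growth of the truth table, is easy to quantify. The sketch below assumes dichotomous conditions and uses the 3826 cases of our NCDS subset; the function name is ours. The even-spread average it reports is an optimistic upper bound on what typical rows will contain, since correlations between conditions concentrate cases in some rows and empty others.

```python
def mean_cases_per_row(n_cases, n_conditions):
    """Average cases per truth table row if cases were spread evenly over 2**k rows."""
    return n_cases / 2 ** n_conditions

n_cases = 3826
for k in range(4, 11):
    print(k, 2 ** k, round(mean_cases_per_row(n_cases, k), 1))
```

With four conditions there are 16 rows; with ten there are 1024, leaving fewer than four cases per row on average even before any clustering of cases is taken into account.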
39QCA: Some Problems in its Use With Large Datasets
We will introduce here some of the problems and
issues that arise for us in using QCA with large
n data. We will begin with problems that are
not peculiar to QCA since they parallel the
correlation / causation problem in conventional
quantitative analyses. We will then discuss
some problems that are more QCA-specific, though,
to some extent, it must be remembered, these may
be a consequence of its relatively recent
development. Unlike regression, QCA has not been
under development for more than a century!
40Although we may, and certainly should, have
inserted some cautious words ("potentially",
"possible", etc.) before the word "causal" at
various places in this talk, we have not yet
addressed the question of whether QCA, as an
analytic tool, is able to avoid problems
analogous to those associated with moving from
correlations to causal claims in the regression
approach. Clearly, we might enter into a Boolean
model a condition that we then found to be
logically necessary, for example, for some
outcome, but which we would not want to regard as
truly causal. Two types of such conditions are
worth distinguishing.
41QCA: non-causal conditions I
Alcohol might be a necessary (and causal)
condition for drunkenness, but, in a society in
which it was always mixed with tonic water, we
would want to be able to reject a claim (which
QCA could obviously deliver, if used
mechanically) that tonic water was a necessary
causal condition for drunkenness. We would do
this, presumably, by reference to existing
theoretical knowledge, preferably of the
mechanisms and processes involved in the
production of drunkenness and/or by comparisons
with other sets of findings where tonic water was
not mixed with alcohol, etc. [1]
[1] Cartwright (2007) provides a formal treatment
of this correlation/causation problem in the
context of QCA.
42QCA: non-causal conditions II
To avoid problems of infinite regress, we would
want to be able to distinguish some types of
causal necessary conditions from others. It may
well be necessary for oxygen to be present in
order for degrees to be achieved, but we wouldn't
normally expect to address this in an analysis of
educational achievement. Mackie's (1974)
concept of the causal field provides a way of
addressing this potential problem. This field
acts as a background context which absorbs the
causal factors we would not expect to see
referred to as part of an explanation of some
particular outcome under examination.
43QCA: non-causal conditions III
Having noted these problems, we would
nevertheless want to argue that, in our earlier
analyses, there are plausible mechanisms implied
by such summarising conditions as social class.
These conditions (class, ability, etc.) or, at
least, the more specific factors they summarise,
are plausible causal factors. Furthermore, when
addressing some evaluative questions (e.g. is
Britain a meritocracy?), the question itself,
once its constituent terms are defined, usually
points to the relevant factors to include in a
configurational analysis (Cooper, 2005, 2006).
44QCA: Underdetermination of theory by data, etc.
We might find in some population that being in
the set male*WORKING_CLASS is perfectly
sufficient for NOT achieving a given level of
educational qualification. However, whether
this is due to working class females lacking some
capacity or disposition required to cope with the
appropriate curriculum or whether, on the other
hand, some form of educational apartheid ensures
that no working class female is allowed to enter
the institution offering the curriculum, clearly
cannot be read off from the Boolean expression.
Of course, other Boolean models perhaps could
be used to provide part of the answer (exploring
what happens to other females, to working class
males, including dispositional factors) but,
ideally, we need knowledge of the processes and
mechanisms that generate the observed outcomes.
Nothing in Ragin's work, we should note, suggests
that he thinks otherwise.
45QCA problems to do with randomness
We might find that the configuration
HIGH_ABILITY*SERVICE_CLASS has a consistency with
sufficiency of, say, 0.90, for achieving some
outcome, thereby reaching a level that Ragin
would regard as indicating quasi-sufficiency.
However, is this gap between 1.00 and 0.90 to be
explained by our having the equivalent of an
underspecified model in a regression analysis
(e.g. perhaps some missing ascriptive factors or
a lack of factors concerning choice) or by the
existence of stochastic elements in the social
world (and/or measurement or sampling error)?
In the former case, there exists some causal
heterogeneity yet to be picked out by the
conditions entered in the model. It might be that
HIGH_ABILITY*SERVICE_CLASS*MALE has perfect
consistency with sufficiency, for example. This
would leave us, however, with
HIGH_ABILITY*SERVICE_CLASS*male having a lower consistency
than 0.90 and return us to the same question
again, but this time just for females.
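To make the arithmetic of this point concrete, here is a minimal sketch in Python. The counts are hypothetical (they are not NCDS results) and the helper name `consistency` is ours. It shows that when the male half of a configuration is perfectly consistent, the pooled figure of 0.90 can only arise if the female half falls below 0.90.

```python
from fractions import Fraction


def consistency(with_outcome, in_configuration):
    """Crisp-set consistency with sufficiency: the proportion of cases
    in a configuration that also display the outcome."""
    return Fraction(with_outcome, in_configuration)


# Hypothetical counts, invented purely to illustrate the arithmetic.
males = {"n": 50, "with_outcome": 50}    # HIGH_ABILITY*SERVICE_CLASS*MALE
females = {"n": 50, "with_outcome": 40}  # HIGH_ABILITY*SERVICE_CLASS*male

pooled = consistency(
    males["with_outcome"] + females["with_outcome"],
    males["n"] + females["n"],
)
male_consistency = consistency(males["with_outcome"], males["n"])
female_consistency = consistency(females["with_outcome"], females["n"])

print(float(pooled))              # 0.9
print(float(male_consistency))    # 1.0
print(float(female_consistency))  # 0.8
```

The pooled consistency is simply the case-weighted average of the two rows, so adding a condition redistributes the deviant cases rather than explaining them.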
46 QCA and counterfactualist perspectives on causation
A counterfactualist perspective on causation
(e.g. Morgan & Winship, 2007) could be used to
raise questions about some QCA-derived claims re
causality in the same way it raises questions
about some regression-based forms of analysis
that basically use a branch of mathematics to
describe relations in datasets [1]. On the other
hand, a move from a net effects perspective (one
assuming independently manipulable independent
variables) to one emphasising conjunctural
causation might be expected to make it less
likely that unjustified counterfactual claims are
made by policy makers on the basis of research
findings, especially about the effects of
intervening to change a single factor without
taking account of its context.
[1] For a relevant and interesting exchange of
views, see Ragin & Rihoux, 2004a,b; Lieberson,
2004; Seawright, 2004; Mahoney, 2004.
47 More QCA-specific issues: inference from samples to populations I
The first point concerns work that uses samples
from some population. This is usually the
situation we find ourselves in when working with
large datasets. Although attempts have been made
(e.g. in an earlier version of the fs/QCA software)
to incorporate significance testing (see also
Ragin, 2000, and Smithson and Verkuilen, 2006),
this is an area requiring more work. Especially
when numbers become small in some rows of a truth
table, and especially when survey data are being
used, a critic will always be able to ask whether
sampling (or measurement) error has been taken
into account. Although we have considerable
sympathy with the view that judgement should play
a role in these situations (especially as
significance tests are frequently employed when
the conditions for their use are not met), we
also recognise that more work on incorporating
significance testing into QCA would be useful,
simply because chance always offers a potential
threat to any analytic claim we might make.
But, note that Ragin (1987, 2000) has a
different perspective on populations to the one
implied here.
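As an illustration of why small rows are vulnerable to this criticism, the sketch below applies a one-sided exact binomial test to two hypothetical truth table rows with the same observed consistency (0.90) but different numbers of cases. The benchmark of 0.80, the counts and the function name `binomial_upper_tail` are all assumptions made for the example. The test also presumes independent cases drawn by simple random sampling, which is exactly the kind of condition that survey data may not meet.

```python
from math import comb


def binomial_upper_tail(successes, n, benchmark):
    """P(X >= successes) for X ~ Binomial(n, benchmark): the chance of
    seeing at least this many consistent cases if the true proportion
    were only equal to the benchmark."""
    return sum(
        comb(n, k) * benchmark**k * (1 - benchmark) ** (n - k)
        for k in range(successes, n + 1)
    )


BENCHMARK = 0.80  # hypothetical benchmark for quasi-sufficiency

# Hypothetical rows: (cases with the outcome, cases in the row).
rows = {"large row": (90, 100), "small row": (9, 10)}

for label, (with_outcome, n) in rows.items():
    p_value = binomial_upper_tail(with_outcome, n, BENCHMARK)
    print(f"{label}: consistency {with_outcome / n:.2f}, p = {p_value:.4f}")
```

With these invented figures the large row clears the benchmark at the conventional 0.05 level, while the small row, at a p-value of roughly 0.38, does not, even though both rows have an observed consistency of 0.90.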
48 More QCA-specific issues: inference from samples to populations II
A related problem we have ignored during the talk
so far is that of missing data. Can we assume
that the Boolean solutions we have presented,
often based on smallish subsets of the whole NCDS
(because of the missing data problem), would hold
for the NCDS as a whole? This would seem unlikely
unless the missing data have been generated by
random rather than systematic processes. Of
course, it is possible to undertake some simple
checks to see whether any bias is likely to have
been introduced. It is also possible to use
sophisticated techniques (multiple imputation,
etc.) to replace missing data, but such
approaches require considerable faith in the very
linear models that Ragin and others have argued
are often unhelpful in the social world. This is
a difficult problem to which we intend to give
further thought.
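One of the simple checks mentioned above might look like the following sketch, which compares the share of cases belonging to a condition set among complete cases with the share among cases dropped because the outcome is missing. The records, the field names and the helper `share` are hypothetical and are not drawn from the NCDS.

```python
def share(records, key):
    """Proportion of records that are members (coded 1) of the set `key`."""
    return sum(r[key] for r in records) / len(records)


# Hypothetical toy records: 1 = in the set, 0 = not in the set,
# None = outcome missing for that respondent.
records = (
    [{"SERVICE_CLASS": 1, "OUTCOME": 1}] * 4
    + [{"SERVICE_CLASS": 0, "OUTCOME": 0}] * 2
    + [{"SERVICE_CLASS": 1, "OUTCOME": None}] * 1
    + [{"SERVICE_CLASS": 0, "OUTCOME": None}] * 3
)

complete = [r for r in records if r["OUTCOME"] is not None]
dropped = [r for r in records if r["OUTCOME"] is None]

print(f"SERVICE_CLASS share, complete cases: {share(complete, 'SERVICE_CLASS'):.2f}")
print(f"SERVICE_CLASS share, dropped cases:  {share(dropped, 'SERVICE_CLASS'):.2f}")
```

A marked difference between the two shares, as in this invented example, would suggest that the missing data were generated by systematic rather than random processes, at least with respect to that condition.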
49 More QCA-specific issues: case knowledge (or its lack) in large n contexts
- We lack, in the traditional sense, the detailed
case knowledge that Ragin argues is required to
undertake QCA.
- The NCDS, in one sense, does contain a mass of
data on each individual respondent but, for
example,
  (i) it is collected via techniques that are likely to
  generate considerable error and,
  (ii) it is not possible for us to return to the
  respondent to correct likely errors or to seek
  new data from earlier periods as analyses
  develop.
50 More QCA-specific issues: quasi- as opposed to perfect necessity and sufficiency
Repeating what we said earlier, there is the
question of whether and when it ever makes sense
to stop at quasi- levels of consistency, i.e.
to ignore the deviant cases in a row (or to allow
a ceteris paribus clause). More generally, the
use of weak implication (quasi-sufficiency and
quasi-necessity as opposed to sufficiency and
necessity) deserves more discussion (but see
Abell, 1971, and also Goertz, 2005; Waldner,
2005; Sekhon, 2005 for a recent exchange).
51
We've raised a lot of problems here, though we
ourselves believe QCA to be a very important
addition to the armoury of the social scientist
interested in exploring potentially causal
relations. The fuzzy set variety of QCA allows
the conjunctural perspective to be brought to
bear more finely than the crisp set version we
have discussed here, but, inevitably, given the
nature of fuzzy sets and logic, brings along some
additional problems (many addressed in Ragin's
own account in Fuzzy Set Social Science). We
are looking forward to further developments of
these methods and, in particular, to Ragin's
forthcoming new book, Redesigning Social Inquiry:
Fuzzy Sets and Beyond.
52 References
Abell, P. (1971) Model Building in Sociology. London: Weidenfeld & Nicolson.
Abbott, A. (2001) Time Matters. London & Chicago: Chicago University Press.
Boudon, R. (1974a) The logic of sociological explanation. Harmondsworth: Penguin.
Boudon, R. (1974b) Education, Opportunity and Social Inequality. NY: Wiley-Interscience.
Byrne, D. (1998) Complexity Theory and the Social Sciences. London: Routledge.
Byrne, D. (2002) Interpreting Quantitative Data. London: Sage.
Cartwright, N. (2007) Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge: Cambridge University Press.
Cooper, B. (2005) Applying Ragin's crisp and fuzzy set QCA to large datasets: social class and educational achievement in the National Child Development Study. Sociological Research Online. 10, 2 <http://www.socresonline.org.uk/10/2/cooper.html>
Cooper, B. (2006) Using Ragin's Qualitative Comparative Analysis with longitudinal datasets to explore the degree of meritocracy characterising educational achievement in Britain. Paper presented to the Sociology of Education SIG at the Annual Meeting of the American Educational Research Association, San Francisco.
Cooper, B. & Glaesser, J. (2007) Exploring Social Class Compositional Effects on Educational Achievement with Fuzzy Set Methods: A British Study. Paper presented to the Sociology of Education SIG at the Annual Meeting of the American Educational Research Association, Chicago.
Cooper, B. & Glaesser, J. (2008) Exploring alternatives to the regression analysis of quantitative survey data in education: what does the configurational approach have to offer? Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Cooper, B. & Glaesser, J. (in press) How has educational expansion changed the necessary and sufficient conditions for achieving professional, managerial and technical class positions in Britain? A configurational analysis. Sociological Research Online.
Freedman, D.A. (1987) As others see us: a case study in path analysis. Journal of Educational Statistics. 12, 2, 101-128.
Freedman, D.A. (1997) From association to causation via regression. In McKim, V.R. & Turner, S.P. (Eds) Causality in Crisis? Statistical Methods and the Search for Causal Knowledge in the Social Sciences. Notre Dame, Indiana: University of Notre Dame Press.
Glaesser, J. (forthcoming, 2009) Just how flexible is the German selective secondary school system? A configurational analysis. International Journal of Research and Method in Education.
Goertz, G. (2005) Necessary condition hypotheses as deterministic or probabilistic: does it matter? Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Spring 2005, 22-27.
Gorard, S. (2006) Towards a judgement-based statistical analysis. British Journal of Sociology of Education. 27, 1, 67-80.
Hauser, R. (1976) On Boudon's model of social mobility. The American Journal of Sociology. 81, 4, 911-928.
Hedström, P. (2005) Dissecting the Social: On the Principles of Analytical Sociology. Cambridge: Cambridge University Press.
Lieberson, S. (1985) Making it Count: the improvement of Social Research and Theory. Berkeley: University of California Press.
Lieberson, S. (2004) Comments on the use and utility of QCA. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Fall 2004, Vol. 2, No. 2, 13-14.
Mackie, J. (1974) The Cement of the Universe. Oxford: Clarendon Press.
53 References (continued)
Mahoney, J. (2001) Beyond correlational analysis: recent innovations in theory and method. Sociological Forum. 16, 3, 575-593.
Mahoney, J. (2004) Reflections on fuzzy-set/QCA. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Fall 2004, Vol. 2, No. 2, 17-21.
Mahoney, J. & Goertz, G. (2006) A tale of two cultures: contrasting quantitative and qualitative research. Political Analysis, 14, 3, 227-249.
Morgan, S.L. & Winship, C. (2007) Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge: Cambridge University Press.
Ormerod, P. (1998) Butterfly Economics. London: Faber and Faber.
Pawson, R. & Tilley, N. (1997) Realistic Evaluation. London: Sage.
Pearl, J. (2000) Causality: models, reasoning and inference. Cambridge: Cambridge University Press.
Quine, W.V. (1952) The problem of simplifying truth functions. American Mathematical Monthly, Vol. 59, No. 8, pp. 521-531.
Ragin, C.C. (1987) The comparative method. Berkeley & Los Angeles: California University Press.
Ragin, C.C. (2000) Fuzzy set social science. Chicago: Chicago University Press.
Ragin, C.C. (2003) Recent advances in fuzzy-set methods and their application to policy questions. <http://www.compasss.org/Ragin2003.PDF>
Ragin, C.C. (2005) From fuzzy sets to crisp truth tables. <http://www.compasss.org/Raginfztt_April05.pdf>
Ragin, C.C. (2006a) Set relations in social research: evaluating their consistency and coverage. Political Analysis. 14, 291-310.
Ragin, C.C. (2006b) The limitations of net effects thinking. In Rihoux, B. & Grimm, H. (Eds) Innovative Comparative Methods for Political Analysis, NY: Springer.
Ragin, C.C. & Rihoux, B. (2004a) Qualitative Comparative Analysis (QCA): state of the art and prospects. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Fall 2004, Vol. 2, No. 2, 3-13.
Ragin, C.C. & Rihoux, B. (2004b) Replies to commentators: reassurances and rebuttals. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Fall 2004, Vol. 2, No. 2, 22-24.
Ragin, C.C. & Sonnett, J. (2005) Between complexity and parsimony: limited diversity, counterfactual cases, and comparative analysis. In Kropp, S. & Minkenberg, M. (Eds) Vergleichen in der Politikwissenschaft. Wiesbaden: VS Verlag für Sozialwissenschaften.
Ragin, C.C., Rubinson, C., Schaefer, D., Anderson, S., Williams, E. & Giesel, H. (2006) User's Guide to Fuzzy-Set/Qualitative Comparative Analysis 2.0. Tucson, Arizona: Department of Sociology, University of Arizona.
Ron, A. (2002) Regression analysis and the philosophy of social science: a critical realist view. Journal of Critical Realism. 1, 1, 119-142.
Rothman, K.J. (1976) Causes. American Journal of Epidemiology. 104, 6, 587-592.
Seawright, J. (2004) Qualitative comparative analysis vis-à-vis regression. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Fall 2004, Vol. 2, No. 2, 14-17.
Sekhon, J.S. (2005) Probability tests require distributions. Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Spring 2005, 29-30.
Smithson, M. & Verkuilen, J. (2006) Fuzzy Set Theory: Applications in the Social Sciences. London: Sage.
Sörensen, A. (1998) Theoretical mechanisms and social processes. In Hedström, P. & Swedberg, R. (Eds) Social Mechanisms: an analytical approach to social theory. Cambridge: Cambridge University Press.
Taagepera, R. (2005) Predictive versus postdictive models. Paper presented to the 3rd conference of the European Consortium for Political Research. Budapest, September 2005.
Waldner, D. (2005) It ain't necessarily so - or is it? Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods. Spring 2005, 27-29.