Exploring configurational causation in large datasets with QCA: possibilities and problems

1
Exploring configurational causation in large
datasets with QCA: possibilities and problems
  • Barry Cooper & Judith Glaesser
  • School of Education, Durham University

3rd ESRC Research Methods Festival, St Catherine's
College, Oxford, 30 June-3 July 2008
2
A note re these slides.
  • Some of these slides will be used in our presentation itself, but some have been written to provide, as a context for the tables, etc., a pre- and post-festival web-based sketch of the method we have employed (Ragin's Qualitative Comparative Analysis, or QCA) for any readers new to it.
  • After a brief description of the background to Ragin's development of the set theoretic approach, and a list of what we see as its strengths, we will illustrate its use with large n data, drawing on our experience of using QCA (Cooper, 2005, 2006; Cooper & Glaesser, 2007, 2008, in press; Glaesser, forthcoming).
  • To keep things less complex than they would otherwise become, we will not draw attention, during this part of our presentation, to the more problematic issues that we wish to mention.
  • Instead, we deal with this aspect of our presentation after the illustration of the use of QCA in a large n context.

3
Concerns about the dominant regression approach in quantitative analysis have a long history. Here, for example, are various remarks taken from Peter Abell's 1971 book, Model Building in Sociology:
  • "It is often (perhaps more often than not) the case that the covariation between sociological variables is not linear" (p.174).
  • "It was argued ... that interaction is a characteristic feature of sociological covariation" (p.183).
  • "Multicollinearity is pervasive in sociology; it is more often than not the case that explanatory variables are intercorrelated" (p.189).
  • "But from what was said earlier it might be expected that (cardinal) variables will be of relatively rare occurrence in sociology. One is much more likely to encounter the situation where nominal and ordinal variables are related" (p.197).
  • "We have noted earlier that the typical causal situation in social science is one of over-determination: many different clusters of variables are sufficient for a given effect" (p.236).
  • Abell's book also includes considerable discussion of the logic of necessary and sufficient conditions alongside his discussion of linear modelling.

4
Several authors, from various perspectives, have
raised important concerns about regression and
its uses. For example (see attached bibliography
for details):
  • Boudon (1974a,b)
  • Byrne (1998, 2002)
  • Freedman (1987, 1997)
  • Hedström (2005)
  • Lieberson (1985)
  • Morgan and Winship (2007)
  • Ormerod (1998)
  • Pawson & Tilley (1997)
  • Pearl (2000)
  • Ron (2002)
  • Sörensen (1998)
  • Taagepera (2005).

5
Andrew Abbott (2001) has summarised some of the key assumptions of the linear model normally used in regression:
  • The social world is made up of fixed entities with varying attributes (demographic assumption).
  • Some attributes determine (cause) others (attribute causality assumption).
  • What happens to one case doesn't constrain what happens to others, temporally or spatially (casewise independence assumption).
  • Attributes have one and only one causal meaning within a given study (univocal meaning assumption).
  • Attributes determine each other principally as independent scales rather than as constellations of attributes; main effects are more important than interactions (which are complex types) (main effects assumption).

6
Charles Ragin's work
  • Ragin (1987) shared many of the concerns of
    these various writers, but, in particular
    perhaps, focussed on Abbott's fourth and fifth
    points, the relative neglect of causal
    heterogeneity and complex interaction in regression models when used in practice¹. Using set theory rather than regression's linear algebra as the basis for developing a configurational approach to causal modelling, he began to explore ways in which (i) complex interaction between causal factors and (ii) causal heterogeneity (i.e. the existence of several distinct types of cases in a population² and therefore of possible multiple pathways to an outcome) could be described in Boolean or configurational terms (Ragin, 1987, 2000, 2006a). In doing so, he also aimed to shift researchers' practices away from a focus on the net average effects of variables (i.e. on which variables "win the race" to explain most variance) and towards an approach that recognised that events in the world are often caused by conjunctions of factors (Ragin, 2006b). It is his Qualitative Comparative Analysis (QCA) on which we focus in this paper.
  • ¹ On Abbott's second point, see Hedström (2005).
  • ² The returns to cognitive capacity, for example, might differ systematically between social classes.

7
Before introducing QCA in more detail, we might set out what we regard as the strengths of Ragin's approach:
  • A focus on cases and their constituent features rather than, as in regression, on abstracted variables (and therefore net and often average effects).
  • Analysis of multiple and conjunctural causation in terms of necessary and/or sufficient conditions rather than in terms of the linear additive model.
  • The recognition, up front, of the possibility of causal heterogeneity.
  • The offer of a rigorous approach, drawing on set theory and logic, to the analysis of these features of social reality.
  • Through a focus on INUS¹ conditions, an allowance, up front, for complex interactions between causes.
  • The recognition of the problems resulting from limited diversity in social datasets.
  • ¹ An INUS condition is an insufficient but non-redundant part of an unnecessary but sufficient condition (Mackie, 1974).

8
Boolean functional form: an example
  • Ragin's QCA and its associated software use Boolean algebra to address conjunctural causation. Boolean equations have a different functional form from the regression equations with which social scientists are familiar. Here is an example taken from a paper contrasting the approaches (Mahoney & Goertz, 2006):
  • $Y = (A \ast B \ast c) + (A \ast C \ast D \ast E)$
  • In this equation the symbol $\ast$ indicates logical AND (set intersection), $+$ indicates logical OR (set union), upper case letters indicate the presence of factors and lower case letters their absence. In this fictional example of causal heterogeneity, the equation indicates that there are two causal paths to the outcome Y. The first, captured by the causal configuration $A \ast B \ast c$, involves the presence in the case of features A and B, combined with the absence of C. The second, captured by $A \ast C \ast D \ast E$, requires the joint presence of A, C, D and E. Either of these causal configurations is sufficient for the outcome to occur, but neither, considered alone, is necessary. A is necessary but not sufficient. The factor C behaves differently in the two configurations: its absence is required in the first and its presence in the second. This non-probabilistic, or veristic, example, of course, assumes no empirical exceptions to these relations.
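The fictional equation can be checked mechanically. The following Python sketch is an illustration we have added and is not part of the QCA software. It enumerates all 32 combinations of the five factors and tests the claims just made about necessity and sufficiency. Like the example, it assumes there are no empirical exceptions.

```python
from itertools import product

def y(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    """Y = (A*B*c) + (A*C*D*E); lower case c means C is absent."""
    path1 = a and b and not c        # A*B*c
    path2 = a and c and d and e      # A*C*D*E
    return path1 or path2

# Every possible configuration of the five factors A, B, C, D, E.
cases = list(product([False, True], repeat=5))
with_y = [case for case in cases if y(*case)]

# Necessity: every case showing Y also shows the condition.
a_necessary = all(case[0] for case in with_y)
path1_necessary = all(a and b and not c for a, b, c, d, e in with_y)
path2_necessary = all(a and c and d and e for a, b, c, d, e in with_y)
# Sufficiency: every case showing the condition also shows Y.
a_sufficient = all(y(*case) for case in cases if case[0])

print(len(with_y), a_necessary, a_sufficient, path1_necessary, path2_necessary)
# → 6 True False False False
```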

9
QCA: Sufficiency and quasi-sufficiency
  • Sufficiency, understood causally or
    logically, involves a subset relation. If, for
    example, a single condition is always sufficient
    for an outcome to occur, the set of cases with
    the condition will be a subset of the set of
    cases with the outcome. This is shown in Figure 1
    (next slide) based on a hypothetical relation
    between being of service class origin and
    achieving a degree. Given the condition, we
    obtain the outcome. In applications to real large
    n data, perfect sufficiency is unlikely to be
    found, and a situation like Figure 2 (next slide)
    will often be found, where most but not all of
    the set of cases with the condition also are
    members of the outcome set.
  • Using conventional crisp sets, the
    proportion of the members of the condition set
    who are also members of the outcome set can be
    used as a measure of the degree of consistency of
    the empirical relation with a relation of perfect
    sufficiency (here the number in the yellow
    subset divided by the number in the yellow and
    green subsets taken together). Figure 2
    illustrates a relation that might be described as
    only nearly always sufficient. Alternatively,
    using a probabilistic view of causation, being of
    service class origin here could be said to be a
    sufficient condition, all else being equal, for
    raising the probability of achieving the outcome
    to a level equal to this consistency proportion.
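For crisp sets this consistency measure is a simple ratio, |X ∩ Y| / |X|, where X is the condition set and Y is the outcome set. Below is a minimal Python sketch. The case identifiers are hypothetical and stand in for the cases pictured in Figure 2.

```python
def sufficiency_consistency(condition: set, outcome: set) -> float:
    """Proportion of condition-set members who are also outcome-set members."""
    if not condition:
        raise ValueError("the condition set is empty")
    return len(condition & outcome) / len(condition)

# Hypothetical case IDs (not NCDS data).
service_class = set(range(0, 100))   # 100 cases of service class origin
degree = set(range(10, 250))         # outcome set; IDs 10-99 overlap the condition

print(sufficiency_consistency(service_class, degree))  # → 0.9
```

A value of 1.0 would correspond to perfect sufficiency, as in Figure 1.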

10
Figure 1 Perfect Sufficiency
Figure 2 Quasi-Sufficiency
11
QCA: Necessity & Coverage
In Figure 3 (next slide), another hypothetical relation between being of service class origin and achieving a degree is shown. This is another example of less than perfect sufficiency. Here the members of the yellow fringe of the service class origin set are not also members of the outcome set. However, most members of this condition set are. This example is also, in fact, a special case in that being of service class origin is a necessary condition for achieving a degree (and in the case of necessity the outcome set is, as can be seen, a subset of the condition set, reversing the direction of the subsethood relation that characterises sufficiency).
Venn diagrams can also illustrate Ragin's concept of explanatory coverage (Ragin, 2006a). The proportion of the outcome set that is overlapped by the condition set can be used as a measure of the degree to which the outcome is covered (explained) by the condition. In Figure 1 (previous slide), the coverage of the outcome of having a degree by the condition of being of service class origin can be seen to be low, with only around 40% of the (blue) outcome set covered by the (yellow) condition set. In Figure 3 (next slide), on the other hand, it can be seen that the whole of the outcome set (again in blue) is covered by the (yellow) condition set, and coverage is 100% (the arithmetic mark of a necessary condition in this simple case).
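Coverage reverses the denominator of the consistency ratio: it is |X ∩ Y| / |Y|. For a single crisp condition, the same ratio also measures consistency with necessity. That is why a value of 1.0 marks a necessary condition here. The following Python sketch uses hypothetical sets shaped like Figures 1 and 3.

```python
def coverage(condition: set, outcome: set) -> float:
    """Proportion of outcome-set members who are also condition-set members."""
    if not outcome:
        raise ValueError("the outcome set is empty")
    return len(condition & outcome) / len(outcome)

degree = set(range(100))               # hypothetical outcome set of 100 cases

# Figure 1 shape: condition inside the outcome, covering 40 of its 100 members.
service_fig1 = set(range(40))
# Figure 3 shape: outcome entirely inside a larger condition set.
service_fig3 = set(range(120))

print(coverage(service_fig1, degree))  # → 0.4
print(coverage(service_fig3, degree))  # → 1.0
```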
12
(No Transcript)
13
QCA: Multiple conditions and the partitioning of coverage I
  • In more complex set theoretic models with
    more than one condition, coverage can be
    partitioned in a manner analogous to the
    partitioning of variance explained in
    regression-based approaches (Ragin, 2006a). The
    partitioning of coverage into raw and unique
    components can be illustrated, again using
    imaginary data, by reference to a more complex
    Venn diagram (Figure 4, next slide). Here we have
    added the condition of being of high ability. In
    this fictional case we now have two crisp sets
    representing the conditions, SERVICE CLASS
    ORIGIN and HIGH ABILITY, and the outcome is
    the achievement of a degree. The Boolean solution
    can be written as:
  • $\text{DEGREE} = \text{SERVICE CLASS ORIGIN} + \text{HIGH ABILITY}$
  • Either being of service class origin or of high
    ability is sufficient for the outcome (since both
    condition sets, considered separately, are
    subsets of the outcome set). Greater coverage of
    the outcome is achieved by having both of these
    factors in the analysis rather than either alone.

14
(No Transcript)
15
QCA: Multiple conditions and the partitioning of coverage II
  • We can also see here how coverage can be partitioned straightforwardly in the case of crisp sets. In the case of the relations illustrated in Figure 4 (previous slide) it is easy to see that the total coverage can be broken into three components:
  • That due to being of service class origin while
    not being of high ability (the yellow subset as a
    proportion of the blue outcome set)
  • That due to being of high ability while not being
    of service class origin (the orange subset as a
    proportion of the blue outcome set)
  • That due to being of service class origin and
    being of high ability (the red subset as a
    proportion of the blue outcome set).
  • If we take service class origin as an
    example, Ragin (2006a) would describe the first
    of these three (the yellow subset as a proportion
    of the outcome set) as the unique coverage due to
    being from this social class background. On the
    other hand, the coverage due to being of this
    class origin, whether or not this is conjoined
    with other causal conditions in the model (the
    yellow and red subsets taken together as a
    proportion of the outcome set), he would describe
    as the raw coverage due to membership in this set
    (being of service class origin).
  • Parallel arguments apply to being of high
    ability.
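The partition can be computed directly from crisp sets. This Python sketch uses hypothetical case IDs, chosen so that both condition sets are subsets of the outcome set, as in Figure 4. Raw coverage counts every outcome case in a condition set. Unique coverage counts only the outcome cases that appear in no other condition set.

```python
def coverage_partition(conditions: dict, outcome: set) -> dict:
    """Return raw and unique coverage for each condition, plus solution coverage."""
    result = {}
    for name, members in conditions.items():
        # Union of every other condition set in the model.
        others = set().union(*(m for n, m in conditions.items() if n != name))
        covered = members & outcome
        result[name] = {
            "raw": len(covered) / len(outcome),
            "unique": len(covered - others) / len(outcome),
        }
    union = set().union(*conditions.values())
    result["solution"] = len(union & outcome) / len(outcome)
    return result

degree = set(range(100))                          # 100 hypothetical degree holders
conditions = {
    "SERVICE_CLASS_ORIGIN": set(range(0, 40)),    # IDs 0-39
    "HIGH_ABILITY": set(range(30, 70)),           # IDs 30-69; IDs 30-39 are in both
}
print(coverage_partition(conditions, degree))
```

Here each condition has raw coverage 0.4 and unique coverage 0.3 (the yellow and orange subsets). The overlap (the red subset) accounts for 0.1. The solution coverage is 0.7.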

16
From this point on we employ real large n data in illustrating QCA in use.
  • We can use data from the National Child Development Study (NCDS), comprising children born in one week in March 1958, to illustrate a multifactor conjunctural explanation¹. Of course, we will not expect to find perfect sufficiency in the empirical world, and our example will show how the method embodied in the software addresses this problem. We explore the relations between highest qualifications achieved by age 33 and a number of factors which might be seen either as causal or as summarising possible causes of achievement.
  • To begin with we will take, as our outcome measure, having a highest level of qualification of at least A level or its equivalent (HQUAL_ADVANCED). When referring to social class origin, we wish to capture something more than one point in time, and so, for illustrative purposes, we will take father's² social class at two points. We also include a measure of mother's education and the sex of the respondent. We will not include any measure of ability in this first example, in order to keep things simpler.
  • ¹ We will begin by using a subset of the data containing 3826 cases chosen to include no missing values on four measures of father's class at different times and on mother's education, as well as on other key variables.
  • ² We use father's class because there are many more cases of missing/not-applicable data for mother's class. However, we include a maternal influence via mother's education.

17
An illustrative Boolean analysis.
  • We will address the Boolean equation:
  • HQUAL_ADVANCED = function(MALE, PMT_FATHER_AT_BIRTH¹, PMT_FATHER_AT_AGE_11, MOTHER_POST_16_EDUCATED)
  • where:
  • HQUAL_ADVANCED refers to having qualifications of at least A level standard by age 33.
  • MOTHER_POST_16_EDUCATED refers to the mother having stayed on in education after age 16.
  • MALE refers to being male rather than female.
  • PMT_FATHER_AT_BIRTH refers to the mother's husband being in a professional, managerial or technical position² at the time of the respondent's birth.
  • PMT_FATHER_AT_AGE_11 refers to the respondent's father being in a professional, managerial or technical position when the respondent was aged 11.
  • We should stress that we are not claiming that we have anything like a properly specified model of educational achievement here. Our purpose is to illustrate QCA in use with large n data.
  • ¹ This is actually a measure of the mother's husband in 1958, but to avoid unnecessary complexity (and given that this is usually the respondent's father) we have used this description.
  • ² The PMT grouping used here comprises Classes I and II of the contemporary Registrar General's scheme.

18
Table 1 Proportions achieving HQUAL_ADVANCED by
class origin, sex and mother's education (NCDS
data, n = 3826): a crosstabulation
19
QCA: Moving from the crosstab via a truth table to a Boolean solution
  • The first step required is to reconfigure this as a truth table (next slide), where a 1 is entered to indicate the presence of a condition and a 0 to indicate its absence. In this table, where the rows are ordered by the measure of consistency with sufficiency, the first row (1101), for example, represents the causal configuration
  • MALE*PMT_FATHER_AT_BIRTH*pmt_father_at_age_11*MOTHER_POST_16_EDUCATED
  • with the upper case letters indicating membership in a set and lower case letters non-membership. The proportion of the 34 cases in this configuration who achieve the outcome, i.e. 0.824, appears in the consistency column.
  • The second step is to determine a threshold for quasi-sufficiency and, in the light of this decision, to enter a 1 into the empty outcome (HQUAL_ADVANCED) column against each row (or causal configuration) for which the consistency proportion in the final column passes the threshold set.
  • This decision determines which configurations are allowed into the final solution.
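The two steps can be sketched in Python as follows. The case records are hypothetical. Only the first row's figures come from the text: 34 cases, of whom 28 achieve the outcome, which gives the reported 0.824. The other rows are invented. The helper `make_cases` is our own, and the 0.6 threshold is purely illustrative.

```python
from collections import defaultdict

CONDITIONS = ["MALE", "PMT_FATHER_AT_BIRTH", "PMT_FATHER_AT_AGE_11", "MOTHER_POST_16_EDUCATED"]
OUTCOME = "HQUAL_ADVANCED"

def truth_table(cases, threshold):
    """Group 0/1-coded cases by configuration and flag rows passing the threshold."""
    counts = defaultdict(lambda: [0, 0])          # configuration -> [n, n with outcome]
    for case in cases:
        config = "".join(str(case[name]) for name in CONDITIONS)
        counts[config][0] += 1
        counts[config][1] += case[OUTCOME]
    rows = [
        {"config": config, "n": n, "consistency": hits / n,
         "outcome": int(hits / n >= threshold)}
        for config, (n, hits) in counts.items()
    ]
    # Order rows by consistency with sufficiency, highest first.
    return sorted(rows, key=lambda row: row["consistency"], reverse=True)

def make_cases(config, n, n_outcome):
    """Hypothetical helper: n cases with one configuration, n_outcome with the outcome."""
    values = {name: int(bit) for name, bit in zip(CONDITIONS, config)}
    return [{**values, OUTCOME: int(i < n_outcome)} for i in range(n)]

cases = make_cases("1101", 34, 28) + make_cases("1111", 50, 40) + make_cases("0000", 200, 30)
for row in truth_table(cases, threshold=0.6):
    print(row["config"], row["n"], round(row["consistency"], 3), row["outcome"])
```

This prints the rows 1101, 1111 and 0000 in that order. Their consistencies are 0.824, 0.8 and 0.15, and only the first two receive an outcome of 1.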

20
Table 2 Truth table for achieving HQUAL_ADVANCED
(NCDS data, n = 3826)
21
Three types of cases?
  • The decision re a threshold also effectively determines which cases, seen as captured by configurations of conditions, will be grouped together in the final solution. In this illustration we will assume that there are three levels of outcome that we wish to understand in configurational terms:
  • Those configurations (sets of cases) in which more than 60% of the cases achieve the outcome. Passing this consistency level might be argued to be consistent with this level of outcome approaching being more or less the norm for these configurations. These configurations are also those we might want to allow forward into a solution for quasi-sufficiency.
  • Those configurations (sets of cases) in which fewer than 40% of the cases achieve the outcome. This level might be seen as making not achieving this level of outcome more or less the norm for these configurations.
  • The remaining configurations (sets of cases), in which 40-60% of the cases achieve the outcome. In these configurations neither achieving nor not achieving the outcome is the norm.
  • Clearly, these decisions require judgements to be made. The reader will see that it is easy to explore other analyses based on other boundaries.
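The three-way grouping is a simple rule. In the Python sketch below, cases at exactly 40% or 60% fall into the middle band. That boundary handling is our assumption. Changing `lower` and `upper` explores other boundaries.

```python
def outcome_group(consistency: float, lower: float = 0.40, upper: float = 0.60) -> str:
    """Assign a truth-table row to one of the three groups described above."""
    if consistency > upper:
        return "outcome the norm"        # candidates for quasi-sufficiency
    if consistency < lower:
        return "non-outcome the norm"
    return "neither the norm"            # the 40-60% band, boundaries included

for c in (0.824, 0.52, 0.15):
    print(c, outcome_group(c))
```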

22
The first group of cases.
  • Let us turn to the first group. These configurations have been picked out by entering 1s and 0s in Table 2 in the HQUAL_ADVANCED column. Table 3a (next slide) shows the solution that results when fs/QCA is asked to minimise the configurations picked out by these 1s. These eight rows (causal configurations) are subjected to an algebraic process of Boolean minimisation¹ (Quine, 1952; Ragin, 1987) in order to create the final simplest solution:
  • MALE*PMT_FATHER_AT_BIRTH + PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED + PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
  • The two final expressions pick out cases whose mothers had stayed on after 16 and who had a father figure in the PMT class at one or other of the two points in their childhood. Both males and females are included in these expressions. The first expression picks out just males who were born into a family setting with a father in the PMT class at birth.
  • ¹ This proceeds as follows. Taking the first two rows as an example, we have 1101 and 1111. Clearly, at the level of quasi-sufficiency we have chosen, the presence or absence of the third element makes no difference. We can therefore replace it with a dash to indicate this, giving 11-1. A similar argument can be applied to the fourth and fifth rows (0111 and 0101) to give 01-1. Taking 11-1 and 01-1 together, and continuing the process, we arrive at -1-1. This is PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED, one of the terms in our final solution.
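The reduction described in the footnote can be sketched as the merging stage of the Quine-McCluskey procedure. This Python sketch is ours, not fs/QCA's implementation. It uses only the four rows quoted in the footnote, because the full set of eight rows in Table 2 is not reproduced in this transcript. It also omits the later step of choosing a minimal set of prime implicants and any use of remainders, both of which fs/QCA handles.

```python
from itertools import combinations

NAMES = ["MALE", "PMT_FATHER_AT_BIRTH", "PMT_FATHER_AT_AGE_11", "MOTHER_POST_16_EDUCATED"]

def combine(a: str, b: str):
    """Merge two terms differing in exactly one position that is 0/1 in both; else None."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) == 1 and "-" not in (a[diffs[0]], b[diffs[0]]):
        i = diffs[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(rows):
    """Merge terms repeatedly; terms that never merge further are prime implicants."""
    terms, primes = set(rows), set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used
        terms = merged
    return primes

def to_expression(term: str) -> str:
    """Write a dash term in the notation used above (upper case = presence)."""
    return "*".join(n if v == "1" else n.lower()
                    for n, v in zip(NAMES, term) if v != "-")

primes = prime_implicants(["1101", "1111", "0111", "0101"])
print([to_expression(t) for t in sorted(primes)])
# → ['PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED']
```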

23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
QCA: an example of a quasi-necessary condition I
  • It might be thought, at least for some
    hypothesised meritocracy, that were academic
    ability to be appropriately defined and measured
    then some minimum level of this factor ought to
    be a necessary condition for anyone to achieve a
    degree. Table 4a illustrates this, where one cell
    should be empty if the chosen level of ability
    (X) is a strictly necessary condition for a
    degree to be achieved. Here, we might be seen as
    assuming causal homogeneity for the factor of
    ability.

Table 4a Strict necessity of some level of
ability (X) for achieving a degree
27
QCA: an example of a quasi-necessary condition II
An examination by eye of the NCDS distribution of the proportions achieving a degree at each point of the ability scale allows us to estimate what such a level of ability might be empirically, for all respondents taken together. It is, in fact, around the mean ability score. If we create a factor setting ability as either over or under the mean score for our subset of 3826, we obtain Table 4b, showing that the proportion of those obtaining a degree whose ability score is below the mean is only 10.4%. Especially given that this proportion may include cases where the measured score was low through either error or chance factors, we might be willing to say that a score above the mean approaches being a necessary condition for achieving a degree in this sample and is therefore a quasi-necessary condition.
Table 4b Achieving a degree by ability below and above the mean: row % (column %)¹

¹ As it happens, this test has only discrete scores, from 0 to 80. The mean lies between two of these scores.
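Dichotomising at the mean and then reading the column percentages amounts to the calculation sketched below, with eight invented cases. On the NCDS figures, the 10.4% below the mean corresponds to a necessity consistency of 0.896. Because the mean lies between two possible scores, whether we use "above" or "at or above" makes no difference there.

```python
from statistics import mean

def above_mean(scores):
    """Crisp set membership: 1 if the score exceeds the sample mean, else 0."""
    m = mean(scores)
    return [1 if s > m else 0 for s in scores]

def necessity_consistency(condition, outcome):
    """Share of outcome cases that are members of the condition set."""
    in_both = sum(x * y for x, y in zip(condition, outcome))
    return in_both / sum(outcome)

# Hypothetical ability scores (0-80 scale) and degree outcomes for eight cases.
scores = [10, 20, 30, 40, 50, 60, 70, 80]
degree = [0, 0, 0, 1, 0, 1, 1, 1]
ability_above_mean = above_mean(scores)      # the mean here is 45
print(ability_above_mean, necessity_consistency(ability_above_mean, degree))
# → [0, 0, 0, 0, 1, 1, 1, 1] 0.75
```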
28
QCA: an example of a quasi-necessary condition III
However, we cannot be satisfied with this conclusion, which, as we said, effectively assumes causal homogeneity, with ability operating in the same way across all types of cases. It also, of course, leaves us wondering about the features of the cases amongst the 10.4%. We obviously want to know whether there are sets of cases, perhaps, for example, differentiated by social class, for whom being either above or below the mean, when conjoined with other factors, is necessary and/or sufficient (or quasi-necessary and/or quasi-sufficient) for achieving a degree. This matters especially as apparent returns to ability vary by class, as Figure 5 (next slide), produced using a slightly different class origin categorisation, clearly shows.
29
Figure 5 Proportions gaining a degree by ability
at age 11 and social class
30
QCA: an example of a quasi-necessary condition IV
  • To explore these questions, we might undertake an analysis that includes a measure of ability being over the mean, given what we found in Table 4b. Let us undertake an analysis of:
  • HQUAL_DEGREE = function(ABILITY_ABOVE_MEAN, MALE, PMT_FATHER_AT_BIRTH, PMT_FATHER_AT_AGE_11, MOTHER_POST_16_EDUCATED)
  • The relevant truth table is shown in Table 5 (next slide), with the rows ordered by consistency. We can see that the first five rows have a consistency level of 0.40 or above. We might take this to imply that, for these cases, gaining a degree is, all else being equal, a definite possibility, something that is a pretty common occurrence in their milieus. Each of these configurations is characterised by having ability above the mean, conjoined with several supportive paternal and maternal ascriptive factors and, in most cases, with male sex. The minimised solution for these rows is shown in Table 6 (two slides on), where ABILITY_ABOVE_MEAN appears, as a necessary condition should, in each expression.
  • We will return to the somewhat paradoxical, threshold-dependent sense which the term "necessary" has in this claim after a subsequent example.

31
Table 5
32
Table 6 Minimised solution for Table 5, for first five rows
--- TRUTH TABLE SOLUTION ---
frequency cutoff: 9.000
consistency cutoff: 0.417

raw       unique
coverage  coverage  consistency  configuration
--------  --------  -----------  -------------
0.184     0.065     0.485        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11
0.141     0.022     0.477        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED
0.159     0.039     0.466        ABILITY_ABOVE_MEAN*MALE*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
0.239     0.120     0.452        ABILITY_ABOVE_MEAN*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED

solution coverage: 0.365
solution consistency: 0.453
33
QCA: an example of a quasi-necessary condition V
A further inspection of Table 5 shows, as we might expect, that having this level of ability characterises the top half of the ordered table (14 of these 16 rows). However, there are exceptions. The first, in the twelfth row, is the following configuration, with only 34 cases:
ability_above_mean*MALE*PMT_FATHER_AT_BIRTH*PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
This conjunction of lower ability with supportive ascriptive factors is associated with some 20.6% achieving a degree, some way above the mean of 13.3%.
34
QCA: an example of a quasi-necessary condition VI
We might be especially interested in exploring what it is about those with lower than mean ability that might explain their achieving proportionally more degrees than expected. It is likely, as we can see from this example, to be the presence of supporting ascriptive factors. However, the numbers become very small in some of the relevant rows in Table 5. For this reason, we will explore this question using a different boundary within the ability scale. Sixty-one percent of those achieving degrees in the 3826 have ability in the top 20% of the overall distribution in the NCDS (see Table 7). We can use the remaining 39% to explore what factors, conjoined with being outside the top 20%, are associated with raising the proportion gaining a degree. We will define, for current purposes, ability in the top 20% as high ability.
Table 7 Degrees by High Ability (i.e. ability in top 20%) (column %)
35
QCA: an example of a quasi-necessary condition VII
Therefore let us undertake a Boolean analysis parallel to the earlier one but excluding the top 20% of the ability range. Table 8 (next slide) is the relevant truth table, ordered by consistency. A glance at this shows that, for these cases, mother's education is a key factor in raising the likelihood of a degree. If we set a 0.20 threshold to explore this (having noted the jump from 0.16 to 0.20 in the consistency column), we obtain the solution in Table 9 (two slides on). Within the confines of this analysis, i.e. for those not of high ability as defined, MOTHER_POST_16_EDUCATED is necessary to raise the proportion obtaining a degree to 20%. So is a father's class position in the PMT classes at one or both of the two points included. However, the low coverage figure for the solution (0.296) should be noted. Amongst those not of high ability as defined, more degrees (140) are gained by individuals outside the configurations included in this solution than by those within them (59). It must therefore be stressed that "necessary" here means "necessary to raise the proportion for a configuration to 0.2 or better". It does not mean that it is impossible for an individual to gain a degree without a suitably educated mother. Many do precisely that.
36
Table 8 Degree by sex, class and mother's
education (only for those whose ability is
outside the top 20%)
37
Table 9 Degree by sex, class and mother's education (only for those whose ability is outside the top 20%)
--- TRUTH TABLE SOLUTION ---
frequency cutoff: 17.000
consistency cutoff: 0.200

raw       unique
coverage  coverage  consistency  configuration
--------  --------  -----------  -------------
0.276     0.201     0.239        PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED
0.095     0.020     0.202        male*PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED

solution coverage: 0.296
solution consistency: 0.236
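The coverage figures in Table 9 hang together arithmetically. The following Python check uses only numbers reported above. It relies on unique coverage being the solution coverage minus the coverage achieved by the other terms. This is our reading of Ragin's (2006a) definition for crisp sets. With two terms, the other terms' coverage is simply the other term's raw coverage.

```python
# Figures reported on the slide and in Table 9 (not recomputed from NCDS data).
inside, outside = 59, 140            # degrees gained inside / outside the solution's configurations
solution_coverage = inside / (inside + outside)

raw = {
    "PMT_FATHER_AT_AGE_11*MOTHER_POST_16_EDUCATED": 0.276,
    "male*PMT_FATHER_AT_BIRTH*MOTHER_POST_16_EDUCATED": 0.095,
}
# With two terms, a term's unique coverage is the solution coverage minus the
# raw coverage of the other term.
terms = list(raw)
unique = {
    terms[0]: solution_coverage - raw[terms[1]],
    terms[1]: solution_coverage - raw[terms[0]],
}
print(round(solution_coverage, 3))                  # → 0.296
print({t: round(u, 3) for t, u in unique.items()})
```

The rounded unique coverages, 0.201 and 0.02, match the table's 0.201 and 0.020.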
38
QCA: Limited Diversity in Datasets and Counterfactual Reasoning
In the examples we have used above, and with the number of conditions employed in those models, we did not experience the problem of very small numbers in some rows of the truth table. This problem can arise with more conditions as a consequence of (i) the exponential increase in the number of rows as more conditions are included and (ii) the relations, or correlations, between conditions in the empirical world (Ragin & Sonnett, 2005). Small numbers of cases in some configurations constitute a problem because it is difficult to make a valid statement about a group of cases who, empirically, appear only in small numbers. In regression analyses, since the weight of the various combinations of scores on variables is taken into account in calculating average net effects, this problem is effectively dealt with mechanically, partly via the use of significance tests. Ragin has suggested a range of ways of using counterfactual reasoning to address the problems caused by limited diversity. For our use of these approaches with the NCDS data, which we will not have time to discuss, see Cooper & Glaesser (2008).
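Both sources of limited diversity show up in a small counting exercise. With k crisp conditions there are 2^k logically possible rows. When the conditions are correlated, most of those rows end up empty or thinly populated. The Python sketch below uses invented configurations. The frequency cutoff of 9 (the value reported for Table 6) is applied purely for illustration.

```python
from collections import Counter
from itertools import product

def diversity_report(configs, k, frequency_cutoff):
    """Count truth-table rows that are empty or fall below a frequency cutoff."""
    counts = Counter(configs)
    rows = list(product((0, 1), repeat=k))       # all 2**k logically possible rows
    empty = sum(1 for r in rows if counts[r] == 0)
    sparse = sum(1 for r in rows if 0 < counts[r] < frequency_cutoff)
    return len(rows), empty, sparse

# Hypothetical, strongly correlated conditions: most cases are all-1 or all-0.
configs = [(1, 1, 1, 1, 1)] * 50 + [(0, 0, 0, 0, 0)] * 50 + [(1, 0, 0, 0, 0)] * 5
print(diversity_report(configs, k=5, frequency_cutoff=9))   # → (32, 29, 1)
```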
39
QCA: Some Problems in its Use With Large Datasets
We will introduce here some of the problems and
issues that arise for us in using QCA with large
n data. We will begin with problems that are
not peculiar to QCA since they parallel the
correlation / causation problem in conventional
quantitative analyses. We will then discuss
some problems that are more QCA-specific, though,
to some extent, it must be remembered, these may
be a consequence of its relatively recent
development. Unlike regression, QCA has not been
under development for more than a century!
40
Although we may, and certainly should, have
inserted some cautious words ("potentially",
"possible", etc.) before the word "causal" at
various places in this talk, we have not yet
addressed the question of whether QCA, as an
analytic tool, is able to avoid analogous
problems to those associated with moving from
correlations to causal claims in the regression
approach. Clearly, we might enter into a Boolean
model a condition that we then found to be
logically necessary, for example, for some
outcome, but which we would not want to regard as
truly causal. Two types of such conditions are
worth distinguishing.
41
QCA: non-causal conditions I
Alcohol might be a necessary (and causal) condition for drunkenness. In a society in which it was always mixed with tonic water, however, we would want to be able to reject a claim that tonic water was a necessary causal condition for drunkenness, a claim which QCA could obviously deliver if used mechanically. We would do this, presumably, by reference to existing theoretical knowledge, preferably of the mechanisms and processes involved in the production of drunkenness, and/or by comparisons with other sets of findings where tonic water was not mixed with alcohol, etc.¹
¹ Cartwright (2007) provides a formal treatment of this correlation/causation problem in the context of QCA.
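The point can be made concrete. In the following Python sketch, with invented cases, a purely mechanical test of necessity cannot tell alcohol from tonic water in the society described. Only a comparison with a setting where the two come apart separates them.

```python
def necessity_of(cases, condition):
    """Share of drunk cases in which the named condition is present."""
    drunk = [case for case in cases if case["drunk"]]
    return sum(case[condition] for case in drunk) / len(drunk)

# Hypothetical society in which alcohol is always mixed with tonic water.
mixed = ([{"alcohol": 1, "tonic": 1, "drunk": 1}] * 30
         + [{"alcohol": 1, "tonic": 1, "drunk": 0}] * 20
         + [{"alcohol": 0, "tonic": 0, "drunk": 0}] * 50)
print(necessity_of(mixed, "alcohol"), necessity_of(mixed, "tonic"))
# → 1.0 1.0

# Hypothetical comparison setting in which some people drink alcohol without tonic.
comparison = mixed + [{"alcohol": 1, "tonic": 0, "drunk": 1}] * 10
print(necessity_of(comparison, "alcohol"), necessity_of(comparison, "tonic"))
# → 1.0 0.75
```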
42
QCA non-causal conditions II
To avoid problems of infinite regress, we would
want to be able to distinguish some types of
causal necessary conditions from others. It may
well be necessary for oxygen to be present in
order for degrees to be achieved, but we wouldn't
normally expect to address this in an analysis of
educational achievement. Mackie's (1974)
concept of the causal field provides a way of
addressing this potential problem. This field
acts as a background context which absorbs the
causal factors we would not expect to see
referred to as part of an explanation of some
particular outcome under examination.
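One mechanical symptom of this problem can be sketched: a condition present in every case, such as oxygen, always scores 1.0 on consistency with necessity, so a purely data-driven analysis would report it as necessary. The fragment below, assuming invented crisp-set cases and a hypothetical helper `constant_conditions`, merely flags non-varying conditions so that the analyst can decide, on theoretical grounds, whether they belong to the causal field; it does not implement Mackie's concept itself.

```python
def necessity_consistency(cases, condition, outcome):
    """Share of cases displaying the outcome that also display the condition."""
    with_outcome = [c for c in cases if c[outcome] == 1]
    if not with_outcome:
        raise ValueError("no case displays the outcome")
    return sum(c[condition] for c in with_outcome) / len(with_outcome)


def constant_conditions(cases, conditions):
    """Hypothetical helper: conditions whose value never varies across cases."""
    return [name for name in conditions if len({c[name] for c in cases}) == 1]


# Invented cases: oxygen is present for everyone; ability and degrees vary.
cases = [
    {"OXYGEN": 1, "HIGH_ABILITY": 1, "DEGREE": 1},
    {"OXYGEN": 1, "HIGH_ABILITY": 0, "DEGREE": 1},
    {"OXYGEN": 1, "HIGH_ABILITY": 1, "DEGREE": 0},
    {"OXYGEN": 1, "HIGH_ABILITY": 0, "DEGREE": 0},
]

conditions = ["OXYGEN", "HIGH_ABILITY"]
for name in conditions:
    print(name, necessity_consistency(cases, name, "DEGREE"))
print("candidates for the causal field:", constant_conditions(cases, conditions))
```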
43
QCA: non-causal conditions III
Having noted these problems, we would
nevertheless want to argue that, in our earlier
analyses, there are plausible mechanisms implied
by such summarising conditions as social class.
These conditions (class, ability, etc.) or, at
least, the more specific factors they summarise,
are plausible causal factors. Furthermore, when
addressing some evaluative questions (e.g. is
Britain a meritocracy?), the question itself,
once its constituent terms are defined, usually
points to the relevant factors to include in a
configurational analysis (Cooper, 2005, 2006).
44
QCA: underdetermination of theory by data, etc.
We might find in some population that being in
the set male*WORKING_CLASS (lower case denoting
absence, i.e. working class females) is perfectly
sufficient for NOT achieving a given level of
educational qualification. However, whether
this is due to working class females lacking some
capacity or disposition required to cope with the
appropriate curriculum or whether, on the other
hand, some form of educational apartheid ensures
that no working class female is allowed to enter
the institution offering the curriculum, clearly
cannot be read off from the Boolean expression.
Of course, other Boolean models could perhaps
be used to provide part of the answer (exploring
what happens to other females and to working class
males, and including dispositional factors) but,
ideally, we need knowledge of the processes and
mechanisms that generate the observed outcomes.
Nothing in Ragin's work, we should note, suggests
that he thinks otherwise.
45
QCA: problems to do with randomness
We might find that the configuration
HIGH_ABILITY*SERVICE_CLASS has a consistency with
sufficiency of, say, 0.90 for achieving some
outcome, thereby reaching a level that Ragin
would regard as indicating quasi-sufficiency.
However, is this gap between 1.00 and 0.90 to be
explained by our having the equivalent of an
underspecified model in a regression analysis
(e.g. perhaps some missing ascriptive factors or
a lack of factors concerning choice) or by the
existence of stochastic elements in the social
world (and/or measurement or sampling error)?
In the former case, there exists some causal
heterogeneity yet to be picked out by the
conditions entered in the model. It might be, for
example, that HIGH_ABILITY*SERVICE_CLASS*MALE has
perfect consistency with sufficiency. This would,
however, leave us with HIGH_ABILITY*SERVICE_CLASS*male
(i.e. females) having a consistency lower than
0.90, and return us to the same question again,
but this time just for females.
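The arithmetic behind the last point can be sketched in Python. Crisp-set consistency with sufficiency is taken here as the share of cases in a configuration that also display the outcome. Both datasets are invented: in the first, an omitted condition (MALE) accounts for the whole gap; in the second, every case achieves the outcome with probability 0.9 regardless of sex, mimicking a purely stochastic process. The names OUTCOME and `sufficiency_consistency` are illustrative assumptions.

```python
import random


def sufficiency_consistency(cases, config, outcome):
    """Share of cases matching the configuration that also display the outcome."""
    members = [c for c in cases if all(c[k] == v for k, v in config.items())]
    if not members:
        raise ValueError("configuration has no cases")
    return sum(c[outcome] for c in members) / len(members)


base = {"HIGH_ABILITY": 1, "SERVICE_CLASS": 1}

# Scenario A (invented): causal heterogeneity. All 50 males achieve the
# outcome; 40 of 50 females do.
hetero = (
    [{**base, "MALE": 1, "OUTCOME": 1}] * 50
    + [{**base, "MALE": 0, "OUTCOME": 1}] * 40
    + [{**base, "MALE": 0, "OUTCOME": 0}] * 10
)

# Scenario B (invented): each case succeeds with probability 0.9,
# independently of sex, so no further condition can close the gap.
rng = random.Random(1)
stochastic = [
    {**base, "MALE": i % 2, "OUTCOME": int(rng.random() < 0.9)}
    for i in range(10_000)
]

for name, data in (("heterogeneity", hetero), ("stochastic", stochastic)):
    overall = sufficiency_consistency(data, base, "OUTCOME")
    male = sufficiency_consistency(data, {**base, "MALE": 1}, "OUTCOME")
    female = sufficiency_consistency(data, {**base, "MALE": 0}, "OUTCOME")
    print(f"{name}: overall={overall:.2f} MALE={male:.2f} male={female:.2f}")
```

In the first dataset, splitting on MALE yields a perfectly consistent subset and a female subset below 0.90; in the second, both subsets stay close to 0.90, so adding conditions would not close the gap. Real data will rarely sort themselves so neatly into one case or the other.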
46
QCA and counterfactualist perspectives on
causation
A counterfactualist perspective on causation
(e.g. Morgan & Winship, 2007) could be used to
raise questions about some QCA-derived claims
about causality in the same way as it raises
questions about some regression-based forms of
analysis that basically use a branch of mathematics
to describe relations in datasets¹. On the other
hand, a move from a net effects perspective (one
assuming independently manipulable independent
variables) to one emphasising conjunctural
causation might be expected to make it less
likely that policy makers make unjustified
counterfactual claims on the basis of research
findings, especially about the effects of
intervening to change a single factor without
taking account of its context.
¹ For a relevant and interesting exchange of
views, see Ragin & Rihoux, 2004a,b; Lieberson,
2004; Seawright, 2004; Mahoney, 2004.
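A toy illustration of the contrast between net effects and conjunctural thinking, under stated assumptions: suppose (hypothetically) that an outcome occurs only when two conditions, A and B, are both present. Averaging the counterfactual effect of switching A on across all cases yields a single "net effect" that describes none of the cases, whereas conditioning on B shows the effect to be all or nothing. The rule and the labels A and B are invented purely for illustration.

```python
def outcome(a, b):
    """Hypothetical conjunctural rule: the outcome occurs only if A and B co-occur."""
    return int(a and b)


# One hypothetical case for each combination of A and B.
cases = [(a, b) for a in (0, 1) for b in (0, 1)]


def effect_of_a(b):
    """Counterfactual effect of switching A from 0 to 1, holding B fixed."""
    return outcome(1, b) - outcome(0, b)


# Averaged over the population: a single "net effect" of A.
net_effect = sum(effect_of_a(b) for _, b in cases) / len(cases)

# The same counterfactual, examined within each context of B.
effect_by_context = {b: effect_of_a(b) for b in (0, 1)}

print("net effect of A:", net_effect)
print("effect of A given B:", effect_by_context)
```

A policy maker reading only the averaged figure might expect every intervention on A to help somewhat; the contextual breakdown shows that it helps only where B is already present.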
47
More QCA-specific issues: inference from samples
to populations I
The first point concerns work that uses samples
from some population. This is usually the
situation we find ourselves in when working with
large datasets. Although attempts have been made
(e.g. in an earlier version of the fs/QCA software)
to incorporate significance testing (see also
Ragin, 2000, and Smithson and Verkuilen, 2006),
this is an area requiring more work. Especially
when numbers become small in some rows of a truth
table, and especially when survey data are being
used, a critic will always be able to ask whether
sampling (or measurement) error has been taken
into account. Although we have considerable
sympathy with the view that judgement should play
a role in these situations (especially as
significance tests are frequently employed when
the conditions for their use are not met), we
also recognise that more work on incorporating
significance testing into QCA would be useful,
simply because chance always poses a potential
threat to any analytic claim we might make.
Note, however, that Ragin (1987, 2000) takes a
different perspective on populations from the one
implied here.
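As a minimal sketch of the kind of test at issue, the fragment below computes a one-sided exact binomial probability: the chance of observing at least k consistent cases out of n in a truth-table row if the true proportion were only a chosen benchmark. The benchmark of 0.80 and the row counts are assumptions for illustration; this is not presented as the procedure implemented in any version of fs/QCA.

```python
from math import comb


def prob_at_least(k, n, p):
    """One-sided exact binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


benchmark = 0.80  # hypothetical benchmark proportion for quasi-sufficiency

# Two hypothetical truth-table rows with the same observed consistency (0.90).
for consistent, n in ((9, 10), (90, 100)):
    p_value = prob_at_least(consistent, n, benchmark)
    print(f"{consistent}/{n}: consistency={consistent / n:.2f}, p={p_value:.4f}")
```

The same observed consistency of 0.90 is unremarkable with ten cases but hard to attribute to chance with a hundred, which is why sparse truth-table rows invite the critic's question. The test also presupposes independent random sampling, one of the conditions that is often not met in practice.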
48
More QCA-specific issues: inference from samples
to populations II
A related problem we have ignored during the talk
so far is that of missing data. Can we assume
that the Boolean solutions we have presented,
often based on smallish subsets of the whole NCDS
(because of the missing data problem), would hold
for the NCDS as a whole? This would seem unlikely
unless the missing data have been generated by
random rather than systematic processes. Of
course, it is possible to undertake some simple
checks to see whether any bias is likely to have
been introduced. It is also possible to use
sophisticated techniques (multiple imputation,
etc.) to replace missing data, but such
approaches require considerable faith in the very
linear models that Ragin and others have argued
are often unhelpful in the social world. This is
a difficult problem to which we intend to give
further thought.
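One of the simple checks mentioned above can be sketched as follows: compare the distribution of a fully observed variable in the whole sample with its distribution among complete cases, and look at whether missingness rates differ across groups. The records and field names below are invented; they are not NCDS data.

```python
# Invented records; None marks a missing value on the ability measure.
records = [
    {"SERVICE_CLASS": 1, "HIGH_ABILITY": 1},
    {"SERVICE_CLASS": 1, "HIGH_ABILITY": 0},
    {"SERVICE_CLASS": 1, "HIGH_ABILITY": 1},
    {"SERVICE_CLASS": 1, "HIGH_ABILITY": 1},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": 0},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": 1},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": None},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": None},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": None},
    {"SERVICE_CLASS": 0, "HIGH_ABILITY": None},
]


def share(rows, key):
    """Proportion of rows in which a fully observed 0/1 field equals 1."""
    return sum(r[key] for r in rows) / len(rows)


def missing_rate(rows, key):
    """Proportion of rows in which the given field is missing."""
    return sum(r[key] is None for r in rows) / len(rows)


complete = [r for r in records if r["HIGH_ABILITY"] is not None]
print("SERVICE_CLASS share, all cases:     ", share(records, "SERVICE_CLASS"))
print("SERVICE_CLASS share, complete cases:", share(complete, "SERVICE_CLASS"))
for value in (1, 0):
    group = [r for r in records if r["SERVICE_CLASS"] == value]
    print(f"missing HIGH_ABILITY when SERVICE_CLASS={value}:",
          missing_rate(group, "HIGH_ABILITY"))
```

Here the complete cases over-represent the service class because missingness is concentrated elsewhere, a sign that solutions derived from them may not transfer to the whole sample. Such checks can reveal bias on observed variables but cannot rule it out on unobserved ones.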
49
More QCA-specific issues: case knowledge (or its
lack) in large n contexts
  • We lack, in the traditional sense, the detailed
    case knowledge that Ragin argues is required to
    undertake QCA.
  • The NCDS, in one sense, does contain a mass of
    data on each individual respondent but, for
    example,
  • (i) it is collected via techniques that are likely to
    generate considerable error, and
  • (ii) it is not possible for us to return to the
    respondent to correct likely errors or to seek
    new data from earlier periods as analyses
    develop.

50
More QCA-specific issues: quasi as opposed to
perfect necessity and sufficiency
Repeating what we said earlier, there is the
question of whether, and when, it makes sense
ever to stop at quasi-levels of consistency, i.e.
to ignore the deviant cases in a row (or to allow
a ceteris paribus clause). More generally, the
use of weak implication (quasi-sufficiency and
quasi-necessity as opposed to sufficiency and
necessity) deserves more discussion (but see
Abell, 1971, and also Goertz, 2005; Waldner,
2005; Sekhon, 2005, for a recent exchange).
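To make the notion of stopping at a quasi-level concrete, the sketch below codes each row of a small, invented truth table as sufficient (1) or not (0) according to a consistency cut-off, and reports how many deviant cases that coding sets aside. The cut-off values, row labels and counts are hypothetical.

```python
def code_truth_table(rows, cutoff):
    """rows maps a configuration label to (cases in row, cases with outcome).
    Returns label -> (consistency, coded outcome, deviant cases set aside)."""
    coded = {}
    for label, (n, with_outcome) in rows.items():
        consistency = with_outcome / n
        outcome = 1 if consistency >= cutoff else 0
        # Deviant cases are those contradicting the outcome coded for their row.
        deviant = n - with_outcome if outcome == 1 else with_outcome
        coded[label] = (round(consistency, 2), outcome, deviant)
    return coded


# Invented rows: (number of cases, number displaying the outcome).
rows = {
    "HIGH_ABILITY*SERVICE_CLASS": (100, 90),
    "HIGH_ABILITY*service_class": (80, 60),
    "high_ability*SERVICE_CLASS": (60, 24),
}

for cutoff in (1.0, 0.9):
    print(f"cut-off {cutoff}:", code_truth_table(rows, cutoff))
```

With a cut-off of 1.0 no row is coded sufficient; lowering it to 0.9 admits the first row as quasi-sufficient at the price of setting aside its 10 deviant cases, which is precisely the decision whose justification is in question.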
51
We've raised a lot of problems here, though we
ourselves believe QCA to be a very important
addition to the armoury of the social scientist
interested in exploring potentially causal
relations. The fuzzy set variety of QCA allows
the conjunctural perspective to be brought to
bear more finely than the crisp set version we
have discussed here but, inevitably, given the
nature of fuzzy sets and logic, it brings along
some additional problems (many addressed in
Ragin's own account in Fuzzy Set Social Science).
We are looking forward to further developments of
these methods and, in particular, to Ragin's
forthcoming new book Redesigning Social Inquiry:
Fuzzy Sets and Beyond.
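For readers new to the fuzzy set variety, a minimal sketch of how consistency with sufficiency generalises from crisp to fuzzy membership: following the measure discussed in Ragin (2006a), consistency of X as a subset of Y can be computed as the sum of min(x_i, y_i) divided by the sum of x_i, with conjunction taken as the case-wise minimum of memberships. The membership scores and function names below are invented for illustration.

```python
def fuzzy_and(*sets):
    """Fuzzy conjunction: case-wise minimum of membership scores."""
    return [min(scores) for scores in zip(*sets)]


def sufficiency_consistency(x, y):
    """Consistency of X as a subset of Y: sum(min(x_i, y_i)) / sum(x_i)."""
    if len(x) != len(y):
        raise ValueError("membership lists must be the same length")
    total = sum(x)
    if total == 0:
        raise ValueError("no case has membership in X")
    return sum(min(a, b) for a, b in zip(x, y)) / total


# Invented fuzzy membership scores for four cases.
high_ability = [0.9, 0.6, 0.2, 0.8]
service_class = [0.8, 0.4, 0.9, 1.0]
outcome = [1.0, 0.3, 0.1, 0.9]

config = fuzzy_and(high_ability, service_class)  # [0.8, 0.4, 0.2, 0.8]
print(round(sufficiency_consistency(config, outcome), 3))
```

With memberships restricted to 0 and 1, the same formula reduces to the crisp-set share of cases in a configuration that display the outcome.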
52
References
Abbott, A. (2001) Time Matters. London & Chicago: Chicago University Press.
Abell, P. (1971) Model Building in Sociology. London: Weidenfeld & Nicolson.
Boudon, R. (1974a) The Logic of Sociological Explanation. Harmondsworth: Penguin.
Boudon, R. (1974b) Education, Opportunity and Social Inequality. NY: Wiley-Interscience.
Byrne, D. (1998) Complexity Theory and the Social Sciences. London: Routledge.
Byrne, D. (2002) Interpreting Quantitative Data. London: Sage.
Cartwright, N. (2007) Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge: Cambridge University Press.
Cooper, B. (2005) Applying Ragin's crisp and fuzzy set QCA to large datasets: social class and educational achievement in the National Child Development Study. Sociological Research Online, 10, 2. <http://www.socresonline.org.uk/10/2/cooper.html>
Cooper, B. (2006) Using Ragin's Qualitative Comparative Analysis with longitudinal datasets to explore the degree of meritocracy characterising educational achievement in Britain. Paper presented to the Sociology of Education SIG at the Annual Meeting of the American Educational Research Association, San Francisco.
Cooper, B. & Glaesser, J. (2007) Exploring Social Class Compositional Effects on Educational Achievement with Fuzzy Set Methods: A British Study. Paper presented to the Sociology of Education SIG at the Annual Meeting of the American Educational Research Association, Chicago.
Cooper, B. & Glaesser, J. (2008) Exploring alternatives to the regression analysis of quantitative survey data in education: what does the configurational approach have to offer? Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Cooper, B. & Glaesser, J. (in press) How has educational expansion changed the necessary and sufficient conditions for achieving professional, managerial and technical class positions in Britain? A configurational analysis. Sociological Research Online.
Freedman, D.A. (1987) As others see us: a case study in path analysis. Journal of Educational Statistics, 12, 2, 101-128.
Freedman, D.A. (1997) From association to causation via regression. In McKim, V.R. & Turner, S.P. (Eds) Causality in Crisis? Statistical Methods and the Search for Causal Knowledge in the Social Sciences. Notre Dame, Indiana: University of Notre Dame Press.
Glaesser, J. (forthcoming, 2009) Just how flexible is the German selective secondary school system? A configurational analysis. International Journal of Research and Method in Education.
Goertz, G. (2005) Necessary condition hypotheses as deterministic or probabilistic: does it matter? Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Spring 2005, 22-27.
Gorard, S. (2006) Towards a judgement-based statistical analysis. British Journal of Sociology of Education, 27, 1, 67-80.
Hauser, R. (1976) On Boudon's model of social mobility. The American Journal of Sociology, 81, 4, 911-928.
Hedström, P. (2005) Dissecting the Social: On the Principles of Analytical Sociology. Cambridge: Cambridge University Press.
Lieberson, S. (1985) Making It Count: The Improvement of Social Research and Theory. Berkeley: University of California Press.
Lieberson, S. (2004) Comments on the use and utility of QCA. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Fall 2004, Vol. 2, No. 2, 13-14.
Mackie, J. (1974) The Cement of the Universe. Oxford: Clarendon Press.
53
Mahoney, J. (2001) Beyond correlational analysis: recent innovations in theory and method. Sociological Forum, 16, 3, 575-593.
Mahoney, J. (2004) Reflections on fuzzy-set/QCA. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Fall 2004, Vol. 2, No. 2, 17-21.
Mahoney, J. & Goertz, G. (2006) A tale of two cultures: contrasting quantitative and qualitative research. Political Analysis, 14, 3, 227-249.
Morgan, S.L. & Winship, C. (2007) Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge: Cambridge University Press.
Ormerod, P. (1998) Butterfly Economics. London: Faber and Faber.
Pawson, R. & Tilley, N. (1997) Realistic Evaluation. London: Sage.
Pearl, J. (2000) Causality: models, reasoning and inference. Cambridge: Cambridge University Press.
Quine, W.V. (1952) The problem of simplifying truth functions. American Mathematical Monthly, Vol. 59, No. 8, pp. 521-531.
Ragin, C.C. (1987) The Comparative Method. Berkeley & Los Angeles: California University Press.
Ragin, C.C. (2000) Fuzzy Set Social Science. Chicago: Chicago University Press.
Ragin, C.C. (2003) Recent advances in fuzzy-set methods and their application to policy questions. <http://www.compasss.org/Ragin2003.PDF>
Ragin, C.C. (2005) From fuzzy sets to crisp truth tables. <http://www.compasss.org/Raginfztt_April05.pdf>
Ragin, C.C. (2006a) Set relations in social research: evaluating their consistency and coverage. Political Analysis, 14, 291-310.
Ragin, C.C. (2006b) The limitations of net effects thinking. In Rihoux, B. & Grimm, H. (Eds) Innovative Comparative Methods for Political Analysis. NY: Springer.
Ragin, C.C. & Rihoux, B. (2004a) Qualitative Comparative Analysis (QCA): state of the art and prospects. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Fall 2004, Vol. 2, No. 2, 3-13.
Ragin, C.C. & Rihoux, B. (2004b) Replies to commentators: reassurances and rebuttals. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Fall 2004, Vol. 2, No. 2, 22-24.
Ragin, C.C. & Sonnett, J. (2005) Between complexity and parsimony: limited diversity, counterfactual cases, and comparative analysis. In Kropp, S. & Minkenberg, M. (Eds) Vergleichen in der Politikwissenschaft. Wiesbaden: VS Verlag für Sozialwissenschaften.
Ragin, C.C., Rubinson, C., Schaefer, D., Anderson, S., Williams, E. & Giesel, H. (2006) User's Guide to Fuzzy-Set/Qualitative Comparative Analysis 2.0. Tucson, Arizona: Department of Sociology, University of Arizona.
Ron, A. (2002) Regression analysis and the philosophy of social science: a critical realist view. Journal of Critical Realism, 1, 1, 119-142.
Rothman, K.J. (1976) Causes. American Journal of Epidemiology, 104, 6, 587-592.
Seawright, J. (2004) Qualitative comparative analysis vis-à-vis regression. In Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Fall 2004, Vol. 2, No. 2, 14-17.
Sekhon, J.S. (2005) Probability tests require distributions. Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Spring 2005, 29-30.
Smithson, M. & Verkuilen, J. (2006) Fuzzy Set Theory: Applications in the Social Sciences. London: Sage.
Sörensen, A. (1998) Theoretical mechanisms and social processes. In Hedström, P. & Swedberg, R. (Eds) Social Mechanisms: an analytical approach to social theory. Cambridge: Cambridge University Press.
Taagepera, R. (2005) Predictive versus postdictive models. Paper presented to the 3rd conference of the European Consortium for Political Research, Budapest, September 2005.
Waldner, D. (2005) It ain't necessarily so, or is it? Qualitative Methods: Newsletter of the American Political Science Association Organized Section on Qualitative Methods, Spring 2005, 27-29.