
Key Variables: Social Science Measurement and Functional Form
Presentation to "Interpreting results from statistical modelling", a seminar
for Scottish Government Social Researchers, Edinburgh, 1 April 2009
  • Dr Paul Lambert and Professor Vernon Gayle,
    University of Stirling

Key Variables: Social Science Measurement and
Functional Form
1) Working with variables: 'Betas in Society' and 'Demystifying Coefficients'
2) Key variables and social science measurement: harmonisation and standardisation; an example (occupations)
3) Functional form
'Betas in Society' and 'Demystifying Coefficients'
  • Dorling, D., & Simpson, S. (Eds.). (1999).
    Statistics in Society: The Arithmetic of
    Politics. London: Arnold.
  • Irvine, J., Miles, I., & Evans, J. (Eds.).
    (1979). Demystifying Social Statistics. London:
    Pluto Press.
  • Famous works on critical interpretation of social
    statistics tend to have a univariate / bivariate
  • Measuring unemployment; averaging income;
    bivariate significance tests; correlation vs
  • But social survey analysts usually argue that
    complex multivariate analyses are more
  • Critical interpretation of joint relative effects
  • Attention to effects of key variables in
    multivariate analysis

  • "A program like SPSS .. has two main components:
    the statistical routines, .. and the data
    management facilities. Perhaps surprisingly, it
    was the latter that really revolutionised
    quantitative social research" (Procter, 2001)
  • "Socio-economic processes require comprehensive
    approaches as they are very complex ('everything
    depends on everything else'). The data and
    computing power needed to disentangle the
    multiple mechanisms at work have only just become
    available." (Crouchley and Fligelstone 2004)

Large scale survey data: 2 technological themes
  • We're data rich (but analysts poor)
  • Plenty of variables (a thousand is common)
  • Plenty of cases
  • We work overwhelmingly through individual
    analysts' micro-computing
  • Impact of mainstream software
  • Pressure for simple / accessible / popular
    analytical techniques (whatever happened to
    loglinear models?)
  • Propensity for simple data management
  • Specialist development of very complex analytical
    packages for very simple sets of variables

Survey research: access, manipulate & analyse
patterns in variables ('variable by case matrix')
Critical role of syntactical records in working
with data & variables
  • Reproducible (for self)
  • Replicable (for all)
  • Paper trail for whole lifecycle
  • Cf. Dale 2006; Freese 2007
  • In survey research, this means using clearly
    annotated syntax files
  • (e.g. SPSS/Stata)
  • Syntax examples

Stata syntax example (do file)

Some comments on survey analysis software for
analysing variables..
  • Data management and data analysis must be seen as
    integrated processes
  • Stata is the most effective software, as it
    combines advanced data management and data
    analysis functionality and makes good
    documentation easy
  • For an extended example of using Stata,
    concentrating on variable operationalisations and
  • Lambert, P. S., & Gayle, V. (2009). Data
    management and standardisation: A methodological
    comment on using results from the UK Research
    Assessment Exercise 2008. Stirling: University of
    Stirling, Technical paper 2008-3 of the Data
    Management through e-Social Science research Node
  • E.g. do http//
  • E.g. use http//
    _3.dta, clear

Working with variables and understanding
variable constructions
Processes by which survey measures are defined
and subsequently interpreted by research analysts
  • Meaning?
  • Coding frames; re-coding decisions; metric
    transformations and functional forms; relative
    effects in multivariate models
  • Data collection and data analysis
  • Cf.

βs - Where's the action?
  • If we have lots of variables, lots of cases, yet
    often quite simple techniques and software, the
    action is primarily in the variable
  • The example of social mobility research: see
    Lambert et al. (2007)
  • How we choose between alternative measures
  • How much data management we try
  • (or bother with)
  • Plus other issues in how we analyse & interpret
    the coefficients from the models we use
    (..elsewhere today..)

i) Choosing measures
  • See (2) below
  • A sensible starting point is with key variables
  • Approaches to standardisation / harmonisation
  • Lack of awareness of existing resources
  • See (3) below
  • Influence of functional form

ii) Data management, e.g. recoding data

ii) Data management, e.g. missing data / case
ii) Data management, e.g. linking data
  • Linking via ojbsoc00
  • c1-5 original data / c6 derived from data / c7
    derived from

Aspects of data management
  • Manipulating data
  • Recoding categories / operationalising
  • Linking data
  • Linking related data (e.g. longitudinal studies)
  • combining / enhancing data (e.g. linking micro-
    and macro-data)
  • Secure access to data
  • Linking data with different levels of access
  • Detailed access to micro-data cf. access
  • Harmonisation standards
  • Approaches to linking concepts and measures
  • Recommendations on particular variable
  • Cleaning data
  • missing values; implausible responses; extreme
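
As a rough illustration of the recoding and cleaning operations listed above, the following Python sketch recodes a detailed code into a simpler category and sets missing or implausible values to None. The variable names, the missing-value codes (-9, -8), the recode table and the age bounds are all assumptions invented for the example, not taken from any particular survey.

```python
# Hypothetical missing-value codes; real surveys document their own.
MISSING_CODES = {-9, -8}

# Hypothetical recode of a detailed employment status code into simple labels.
EMPSTAT_RECODE = {1: "self-employed", 2: "self-employed",
                  6: "supervisor", 7: "employee"}


def clean_age(age, low=16, high=100):
    """Return the age, or None if it is a missing code or implausible."""
    if age in MISSING_CODES or not (low <= age <= high):
        return None
    return age


def recode_empstat(code):
    """Map a detailed code to a simple category; unlisted codes become 'other'."""
    if code in MISSING_CODES:
        return None
    return EMPSTAT_RECODE.get(code, "other")


records = [
    {"id": 1, "age": 34, "empst": 7},
    {"id": 2, "age": -9, "empst": 2},    # missing age
    {"id": 3, "age": 140, "empst": 6},   # implausible age
    {"id": 4, "age": 51, "empst": -8},   # missing employment status
]

cleaned = [
    {"id": r["id"], "age": clean_age(r["age"]), "empstat": recode_empstat(r["empst"])}
    for r in records
]

for row in cleaned:
    print(row)
```

Keeping the recode table and the missing-value rules in one annotated script is one way of leaving the kind of paper trail that the earlier slides argue for.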

The significance of data management for social
survey research see http//
entdetail.asp?id2151 and
  • The data manipulations described above are a
    major component of the social survey research
  • Pre-release manipulations performed by
    distributors / archivists
  • Coding measures into standard categories
  • Dealing with missing records
  • Post-release manipulations performed by
  • Re-coding measures into simple categories
  • We do have existing tools, facilities and expert
    experience to help us, but we don't make a good
    job of using them efficiently or consistently
  • So the significance of DM is about how much
    better research might be if we did things more

Data Management through e-Social Science (DAMES)
  • Supporting operations on data widely performed by
    social science researchers
  • Matching data files together
  • Cleaning data
  • Operationalising variables
  • Specialist data resources (occupations;
    education; ethnicity)
  • Why is e-Social Science relevant?
  • Dealing with distributed, heterogeneous datasets
  • Generic data requirements / provisions
  • Lack of previous systematic standards (e.g.
    metadata; security; citation procedures;
    resources to review/obtain suitable data)

Working with variables: further issues
  • Re-inventing the wheel
  • In survey data analysis, somebody else has
    already struggled through the variable
    constructions you are working on right now
  • Increasing attention to documentation and
  • cf. Dale 2006; Freese 2007
  • Guidance and support
  • In the UK, use
  • Most guidance concerns collecting & harmonising
  • Less is directed to analytically exploiting
Key variables and social science measurement
  • Defining key variables
  • Commonly used concepts with numerous previous
  • Methodological research on best practice / best
  • cf. Stacey 1969 Burgess 1986
  • ONS harmonisation primary standards

Key variables: concepts and measures
Variable      | Concept                                      | Something useful
Occupation    | Class; stratification; unemployment          |
Education     | Credentials; ability; merit                  | Schneider 2008
Ethnic group  | Ethnicity; race; religion; national origins  | Bosveld et al 2006
Age           | Age; life course stage; cohort               | Abbott 2006
Gender        | Gender; household / family context           |
Income        | Income; wealth; poverty                      | SN 3909
Key variables: Standardisation
  • Much attention to key variables involves
    proposing optimum / standard measures
  • UK ONS Harmonisation
  • EU Eurostat standards
  • Studies of criterion and construct validity
  • Standardisation impacts other analyses
  • Affects available data
  • Affects popular interpretations of data


Key variables: Harmonisation (across countries; across time periods)
  • "a method for equating conceptually similar but
    operationally different variables.." (Harkness et
    al 2003, p352)
  • Input harmonisation (esp. Harkness et al 2003)
  • "harmonising measurement instruments" (H-Z and
    Wolf 2003, p394)
  • unlikely / impossible in longer-term longitudinal
  • common in small cross-national and short term
    longitudinal studies
  • Output harmonisation ('ex-post harmonisation')
  • "harmonising measurement products" (H-Z and
    Wolf 2003, p394)
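
In practice, output (ex-post) harmonisation often amounts to applying a documented crosswalk from each source coding to a common scheme. A minimal Python sketch of that idea follows; the country labels, the source codes and the three-level target scheme are all hypothetical and are not drawn from any of the standards cited here.

```python
# Hypothetical crosswalks from two national education codings
# to a common three-level scheme ("low", "medium", "high").
CROSSWALKS = {
    "country_A": {1: "low", 2: "low", 3: "medium", 4: "high"},
    "country_B": {"primary": "low", "secondary": "medium", "tertiary": "high"},
}


def harmonise(country, source_code):
    """Translate a source code into the common scheme.

    Returns None when the code has no documented equivalent, so that
    unmatched cases stay visible instead of being silently recoded.
    """
    return CROSSWALKS[country].get(source_code)


respondents = [
    ("country_A", 2),
    ("country_A", 4),
    ("country_B", "secondary"),
    ("country_B", "vocational"),  # not covered by the hypothetical crosswalk
]

harmonised = [harmonise(country, code) for country, code in respondents]
print(harmonised)
```

Holding the crosswalk as data, separate from the code that applies it, makes the translation easy to publish and to revise.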

More on harmonisation esp. HZ and Wolf 2003,
  • Numerous practical resources to help with input
    and output harmonisation
  • e.g. ONS; UN / EU / NSIs; LIS project; IPUMS
  • Cross-national, e.g. HZ & Wolf 2003; Jowell et
    al. 2007
  • Room for more work in justifying / understanding
    interpretations after harmonisation


  • "the degree to which survey measures or questions
    are able to assess identical phenomena across two
    or more cultures" (Harkness et al 2003, p351)

Measurement equivalence: involves same instruments and equality of measures (e.g. income in pounds)
Functional equivalence: involves different instruments, but addresses same concepts (e.g. inflation adjusted income)
  • "Equivalence is the only meaningful criterion if
    data is to be compared from one context to
    another. However, equivalence of measures does
    not necessarily mean that the measurement
    instruments used in different countries are all
    the same. Instead it is essential that they
    measure the same dimension. Thus, functional
    equivalence is more precisely what is required"
  • (HZ and Wolf 2003, p389)
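
The income example can be made concrete. In the Python sketch below the nominal incomes and the price index values are invented for illustration. The same instrument (pounds) gives different nominal figures in the two years, while deflating by the index gives figures that are comparable in real terms.

```python
# Hypothetical price index (base year 2000 = 100); values are illustrative only.
PRICE_INDEX = {2000: 100.0, 2008: 125.0}


def real_income(nominal, year, base_year=2000):
    """Express a nominal income in base-year prices."""
    return nominal * PRICE_INDEX[base_year] / PRICE_INDEX[year]


income_2000 = 20000.0  # hypothetical nominal income in 2000
income_2008 = 25000.0  # hypothetical nominal income in 2008

# Measurement equivalence: same unit, but the nominal amounts differ.
nominal_gap = income_2008 - income_2000

# Functional equivalence: after adjustment both represent the same
# purchasing power under the assumed index.
real_gap = real_income(income_2008, 2008) - real_income(income_2000, 2000)

print(nominal_gap, real_gap)
```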

Harmonisation & equivalence combined
  • Universality or specificity in variable
  • Universality: collect harmonised measures,
    analyse standardised schemes
  • Specificity: collect localised measures, analyse
    functionally equivalent schemes
  • Most prescriptions aim for universality
  • But specificity is theoretically better
  • Specificity is more easily obtained than is often
  • Especially for well-known key variables
Working with key variables - speculation
  • a) Data manipulation skills and inertia
  • I would speculate that around 80% of applications
    using key variables don't consult literature and
    evaluate alternative measures, but choose the
    first convenient and/or accessible variable in
    the dataset
  • Data supply decisions (what is on the archive
    version) are critical
  • Much of the explanation lies with lack of
    confidence in data manipulation / linking data
  • Too many under-used resources cf.

Working with key variables - speculation: b)
Endogeneity and key variables
  • 'everything depends on everything else'
    (Crouchley and Fligelstone 2004)
  • We know a lot about simple properties of key
  • Key variables often change the main effects of
    other variables
  • Simple decisions about contrast categories can
    influence interpretations
  • Interaction terms are often significant and
  • We have only scratched the surface of
    understanding key variables in multivariate
    context and interpretation
  • Key variables are often endogenous (because they
    are key!)
  • Work on standards / techniques for multi-process
    systems and/or comparing structural breaks
    involving key variables is attractive

An example: Occupations
  • In the social sciences, occupation is seen as one
    of the most important things to know about a
  • Direct indicator of economic circumstances
  • Proxy indicator of social class or
  • Projects at Stirling (
  • GEODE: how social scientists use data on
  • DAMES: extending GEODE resources
Stage 1 - Collecting Occupational Data (and
making a mess)
Example 1: BHPS
Occ description         | Employment status  | SOC-2000 | EMPST
Miner (coal)            | Employee           | 8122     | 7
Police officer (Serg.)  | Supervisor         | 3312     | 6
Electrical engineer     | Employee           | 2123     | 7
Retail dealer (cars)    | Self-employed w/e  | 1234     | 2
Example 2: European Social Survey, parents' data
Occ description            | SOC-2000 | EMPST
Miner                      | ?8122    | ?6/7
Police officer             | ?3312    | ?6/7
Engineer                   | ??       | ??
Self employed businessman  | ??       | ?1/2
Occupations: we agree on what we should do
  • Preserve two levels of data
  • Source data: Occupational unit groups, employment
  • Social classifications and other outputs
  • Use transparent (published) methods, i.e. OIRs
  • for classifying index units
  • for translating index units into social
  • for instance..
  • Bechhofer, F. 1969. 'Occupations' in Stacey, M.
    (ed.) Comparability in Social Research. London
  • Jacoby, A. 1986. 'The Measurement of Social
    Class' Proceedings from the Social Research
    Association seminar on "Measuring Employment
    Status and Social Class". London: Social Research
  • Lambert, P.S. 2002. 'Handling Occupational
    Information'. Building Research Capacity 4: 9-12.
  • Rose, D. and Pevalin, D.J. 2003. 'A Researcher's
    Guide to the National Statistics Socio-economic
    Classification'. London: Sage.

in practice we don't keep to this...
  • Inconsistent preservation of source data
  • Alternative OUG schemes
  • SOC-90; SOC-2000; ISCO; SOC-90 (my special
  • Inconsistencies in other index factors
  • employment status; supervisory status; number
    of employees
  • Individual or household; current job or career
  • Inconsistent exploitation of Occupational
  • Numerous alternative occupational information
  • (time; country; format)
  • Substantive choices over social classifications
  • Inconsistent translations to social
    classifications: by file or by fiat
  • Dynamic updates to occupational information
  • Strict security constraints on users'
    micro-social survey data
  • Low uptake of existing occupational information
GEODE provides services to help social scientists
deal with occupational information resources
  1. disseminate, and access other, Occupational
    Information Resources
  2. Link together their (secure) micro-data with OIRs

External user (micro-social data)
id | oug | sex | .
1  | 110 | 1   | .
2  | 320 | 1   | .
3  | 320 | 2   | .
4  | 874 | 1   | .
5  | 874 | 2   | .
Occ info (index file) (aggregate)
oug | CS-M | CS-F | EGP
110 | 60   | 58   | I
320 | 69   | 71   | II
874 | 39   | 51   | VIIa
User's output (micro-social data)
id | oug | CS | .
1  | 110 | 60 | .
2  | 320 | 69 | .
3  | 320 | 71 | .
4  | 874 | 39 | .
5  | 874 | 51 | .
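
The linkage shown in the table above can be sketched in a few lines of Python. The data values are those in the table. The names `micro`, `index_file` and `link` are illustrative only and are not GEODE's interface. Two readings are assumed: that CS is a CAMSIS scale score, and that sex is coded 1 = male, 2 = female (inferred from which score each case receives in the output).

```python
# Micro-social data held by the user: id, occupational unit group (oug), sex.
micro = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
]

# Occupational information resource (index file), keyed by oug.
index_file = {
    110: {"cs_m": 60, "cs_f": 58, "egp": "I"},
    320: {"cs_m": 69, "cs_f": 71, "egp": "II"},
    874: {"cs_m": 39, "cs_f": 51, "egp": "VIIa"},
}


def link(record):
    """Attach the sex-specific CS score and the EGP class to one record.

    Records whose oug is absent from the index file get None values,
    so unmatched cases can be inspected afterwards.
    """
    info = index_file.get(record["oug"])
    if info is None:
        return {**record, "cs": None, "egp": None}
    # Assumption: sex == 1 selects the male scale, anything else the female scale.
    cs = info["cs_m"] if record["sex"] == 1 else info["cs_f"]
    return {**record, "cs": cs, "egp": info["egp"]}


linked = [link(r) for r in micro]
for row in linked:
    print(row["id"], row["oug"], row["cs"])
```

Only the small index file needs to be shared; the micro-data never leave the user's side, which is the point of linking secure data to published occupational information.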
Occupational information resources: small
electronic files about OUGs
Resource                             | Index units     | Distinct files (average size kb) | Updates?
CAMSIS                               | Local OUG(e.s.) | 200 (100)                        | y
CAMSIS value labels                  | Local OUG       | 50 (50)                          | n
ISEI tools                           | Int. OUG        | 20 (50)                          | y
E-Sec matrices                       | Int. OUG(e.s.)  | 20 (200)                         | n
Hakim gender seg codes (Hakim 1998)  | Local OUG       | 2 (paper)                        | n
For example ISCO-88 Skill levels classification
and UK 1980 CAMSIS scales and CAMCON classes
Existing resources on occupations
  • Popular websites
  • http//
  • http//
  • Emerging resource: http//
  • Some papers
  • Chan, T. W., & Goldthorpe, J. H. (2007). Class
    and Status: The Conceptual Distinction and its
    Empirical Relevance. American Sociological
    Review, 72, 512-532.
  • Rose, D., & Harrison, E. (2007). The European
    Socio-economic Classification: A New Social Class
    Scheme for Comparative European Research.
    European Societies, 9(3), 459-490.
  • Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy,
    K., & Bergman, M. M. (2008). The importance of
    specificity in occupation-based social
    classifications. International Journal of
    Sociology and Social Policy, 28(5/6), 179-192.

Using data on occupations: further speculation
  • Growing interest in longitudinal analysis and use
    of longitudinal summary data on occupations
  • Intuitive measures (e.g. 'ever in Class I')
  • Lampard, R. (2007). Is Social Mobility an Echo of
    Educational Mobility? Sociological Research
    Online, 12(5).
  • Empirical career trajectories / sequences
  • Halpin, B., & Chan, T. W. (1998). Class Careers
    as Sequences. European Sociological Review,
    14(2), 111-130.
  • Growing cross-national comparisons
  • Ganzeboom, H. B. G. (2005). On the Cost of Being
    Crude: A Comparison of Detailed and Coarse
    Occupational Coding. In J. H. P.
    Hoffmeyer-Zlotnik & J. Harkness (Eds.),
    Methodological Aspects in Cross-National Research
    (pp. 241-257). Mannheim: ZUMA, Nachrichten
  • Treatment of the non-working populations
  • Seldom adequate to treat non-working as a
  • Selection modelling approaches expanding
Occupations as key variables
  • Extensive debate about occupation-based social
  • Document your procedures..
  • you may be asked to do something different..
  • When choosing between occupation-based measures
  • They all measure, mostly, the same things
  • Don't assume concepts measure measures
  • Lambert, P. S., & Bihagen, E. (2007). Concepts
    and Measures: Empirical evidence on the
    interpretation of ESeC and other occupation-based
    social classifications. Paper presented at the
    ISA RC28 conference, Montreal (14-17 August),
    gen_2007_version1.pdf .

Functional form
  • The way in which measures are arithmetically
    incorporated in analysis
  • Level of measurement (nominal, ordinal, interval,
    ratio)
  • Alternative models and link functions
  • Other variables and interaction effects

a) Levels of measurement and the desire to
  • Categories are easier to envisage / communicate
  • Much harmonisation work locating into
  • Appearance of measurement equivalence
  • But functional equivalence is seldom achieved
  • Metrics are better for functional equivalence
  • E.g. Standardised income
  • How to deal with categorisations?
  • 'The qualitative foundation of quantity' (Prandy 2002)
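
One common way to put a metric such as income on a comparable footing across contexts is to standardise it within each context (a z-score). The Python sketch below assumes that this is roughly what 'standardised income' refers to; the two contexts and their income values are invented.

```python
from statistics import mean, pstdev

# Hypothetical incomes in two contexts with very different scales
# (e.g. different currencies or periods).
incomes = {
    "context_A": [10000.0, 20000.0, 30000.0],
    "context_B": [100.0, 200.0, 300.0],
}


def standardise(values):
    """Return z-scores: deviations from the mean in standard deviation units."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    return [(v - m) / sd for v in values]


z_scores = {ctx: standardise(vals) for ctx, vals in incomes.items()}

for ctx, zs in z_scores.items():
    print(ctx, [round(z, 3) for z in zs])
```

After standardisation a person's position is expressed relative to their own context, so the raw units no longer matter for comparison.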

Example: categorisation and the scandalous use of
collapsed EGP/NS-SEC!
  • Ignores heterogeneity within occupations
  • Defines and hinges on arbitrary boundaries
  • Creates artefactual gender differences

The scaling alternative
  • Many concepts can be reasonably regarded as
  • cf. simplified / dichotomised categorisations
  • Comparability / standardisation is easier with
  • Complex / multi-process systems are easier with
  • Structural Equation Models
  • Interaction effects
  • Growing availability/use of distance score
  • Stereotype ordered logit (slogit in Stata)
  • Correspondence Analysis
  • Latent variable models
  • But, scaling seems to be seen by some as a
    wicked, positivistic activity..!

Practical suggestions on the level of measurement
  • It's rare not to have a few alternative measures
    of the same concepts at different levels of
  • Good practice would be to
  • try alternative measures and see what difference
    they make
  • consider treatment of missing values in relation
    to measurement instrument choice
  • Engage as much as possible with other studies
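
A simple way to 'try alternative measures and see what difference they make' is to compare how strongly each version of a measure is associated with the outcome. The Python sketch below contrasts a detailed scale with a dichotomised version of the same scale. The scores, the outcome values and the cut-point of 50 are all invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical occupational scale scores and an outcome (e.g. log income).
scale = [20, 30, 40, 50, 60, 70, 80]
outcome = [9.1, 9.4, 9.5, 9.9, 10.0, 10.4, 10.6]

# Alternative measure: the same information collapsed into two categories.
dichotomy = [1 if s >= 50 else 0 for s in scale]

r_scale = pearson(scale, outcome)
r_dichotomy = pearson(dichotomy, outcome)

print(round(r_scale, 3), round(r_dichotomy, 3))
```

With these made-up numbers the collapsed measure shows a weaker association than the scale. Whether that holds in a real dataset is an empirical question, which is why the comparison is worth running.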

b) Alternative models and link functions
  • The functional form of the outcome variable(s) is
    of greatest importance (influences which model is
  • Link functions perform the maths to allow for
    alternative functional forms of the outcome
  • See Talk 1 for popular alternative models

Practical observations on link functions
  • Social scientists are unduly conservative in
    choosing between alternative models
  • We tend to favour binary or metric outcomes and
    single process systems
  • Substantively, this isn't ideal
  • Pragmatically, it's no longer necessary

Substantive risks (of conservative model choice)
  • Attenuated findings
  • Concentrate on certain category contrasts
  • Ignore or exacerbate extremes of distribution
  • Mis-specification
  • Ignore / mis-measure relevant βs
  • Ignore / over-emphasise other contextual patterns
  • Endogeneity
  • ignoring multiprocess system may bias results
  • (e.g. selection bias)

Pragmatics of model choice
  • General rapid expansion in model functionality in
    statistical packages
  • Stata stands out for its wide range of data
    management and data analysis functionality
  • E.g. statsby; est table; outreg2; estout
    facilitate testing and comparing related models
    with different combinations of variables

c) Other variables and interaction effects
  • A very important influence on one RHS coefficient
    is what else is in the RHS and what it is
    interacted with
  • Some brief comments on
  • Offsets (constraints)
  • Interactions
  • Logit models: fixed variance

A comment on offsets
  • For comparisons between regressions, it is
    sometimes suitable to force the coefficients of
    some variables (e.g. controls) to have a certain
    fixed value
  • Example below (predicting income) using cnsreg in
    Stata, e.g.
    regress lninc fem age femage
    matrix define mod1m = e(b)
    scalar fem_coef = mod1m[1,1]
    constraint def 1 fem = fem_coef
    cnsreg lninc fem age femage mcamsis, constraints(1)
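
The logic of fixing a coefficient can be sketched outside Stata. Constraining the coefficient on fem to a known value is equivalent to subtracting that value times fem from the outcome and fitting the remaining variables by ordinary least squares. The Python sketch below follows that route. The data are invented, the `ols` helper is a small illustrative solver and not a substitute for cnsreg, and the femage interaction from the Stata example is omitted for brevity.

```python
def ols(columns, y):
    """Least-squares coefficients for y on the given columns plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns [intercept, b_1, ..., b_k].
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability.
        best = max(range(p, k), key=lambda q: abs(a[q][p]))
        a[p], a[best] = a[best], a[p]
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for q in range(k):
            if q != p:
                factor = a[q][p]
                a[q] = [v - factor * w for v, w in zip(a[q], a[p])]
    return [a[i][k] for i in range(k)]


# Hypothetical data: female dummy, age, an occupational scale, log income.
fem = [0, 1, 0, 1, 0, 1, 0, 1]
age = [25, 30, 35, 40, 45, 50, 55, 60]
mcamsis = [40, 55, 35, 60, 50, 45, 65, 30]
lninc = [9.5, 9.4, 9.7, 9.8, 10.0, 9.7, 10.4, 9.6]

# Step 1: first model, keep the coefficient on fem.
step1 = ols([fem, age], lninc)
fem_coef = step1[1]

# Step 2: second model with fem held at fem_coef, fitted as an offset.
offset_y = [y - fem_coef * f for y, f in zip(lninc, fem)]
intercept, b_age, b_mcamsis = ols([age, mcamsis], offset_y)

print(round(fem_coef, 4), round(b_age, 4), round(b_mcamsis, 4))
```

Because the coefficient on fem is held at its value from the first model, any change in the other coefficients between the two models reflects the added variable and not a shift in the fem effect.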
Advice on Interaction Effects
  • Start with main effects: get a good idea how
    they work
  • Be careful how you fit interaction effects
  • Often appealing substantively
  • In practice not always significant (especially
    higher order)
  • Hard to interpret higher order interactions
  • Over-fit - check for replication (e.g. in other
  • Always wise to formally test interactions (cf.
    armchair critics)
  • Best to construct your own interaction
    variable(s) and maybe fit them as a single X
    (especially complicated categorical interactions)
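
Constructing interaction variables by hand, as suggested in the last point, might look like the following Python sketch. The cases and the variable names `age40` and `sex_cls` are hypothetical; `femage` follows the name used in the Stata example above.

```python
# Hypothetical cases: female dummy, age centred at 40, and a class label.
cases = [
    {"fem": 0, "age40": -5, "cls": "advantaged"},
    {"fem": 1, "age40": 10, "cls": "routine"},
    {"fem": 1, "age40": -12, "cls": "advantaged"},
]

for c in cases:
    # Product term for a dummy-by-metric interaction.
    c["femage"] = c["fem"] * c["age40"]
    # A single combined categorical X for a category-by-category interaction.
    c["sex_cls"] = ("female" if c["fem"] else "male") + ":" + c["cls"]

for c in cases:
    print(c["femage"], c["sex_cls"])
```

Building the terms explicitly keeps the coding of each interaction visible in the data, which makes the fitted coefficients easier to check and to document.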

The fixed variance in logit: linear cf.
categorical outcomes
  • GHS Data
  • OLS: Y = age left education (years)
  • Logit: Y = Graduate / Non Graduate
  • X Vars
  • Female
  • 4-category social class
  • (Advantaged; Lower Supervisory; Semi-routine;
    Routine)
  • Age (centred at 40)

Regression Estimates
Female -0.32 -0.34 -0.27
Age (40) -0.06 -0.06 -0.05
Supervisory -1.83 -1.85
Semi-Routine -1.98 -1.88
Routine -2.40 -2.33
Constant 17.52 17.5 17.75 18.22 18.54
Linear Regression Models
  • 1 unit change in X leads to a b change in Y
  • The b is consistent across models (only minor,
    insignificant random variation in survey data)
  • As long as the X vars are uncorrelated
  • (a classical regression assumption)
Estimates (logit scale)
Parameterization ??
Female -0.24 -0.23 -0.20
Age (40) -0.03 -0.03 -0.04
Supervisory -1.46 -1.52
Semi-Routine -1.82 -1.87
Routine -2.65 -2.70
Constant -0.90 -0.80 -0.39 -0.68 -0.04
Logit Model
  • Estimates on a logit scale
  • The b estimates how a shift from X1=0 to X1=1 leads
    to a change in the log odds of y=1
  • Even when the X vars are uncorrelated, including
    additional variables can lead to changes in b
  • The b estimates the effect given all other X vars
    in the model
  • Fixed variance in the logit model (π²/3)
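
The contrast between the linear and logit cases can be checked with a small calculation. The Python sketch below works with population probabilities and does not fit a model. It assumes a binary x and an independent binary z (0 or 1 with probability 0.5 each), and the coefficient values are hypothetical. It compares the effect of x with z in the model (B) against the effect obtained when z is left out.

```python
from math import exp, log


def expit(v):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + exp(-v))


def logit(p):
    """Log odds of a probability."""
    return log(p / (1.0 - p))


# Hypothetical true coefficients: intercept, effect of x, effect of z.
A, B, C = 0.0, 1.0, 2.0


def p_y_given_x(x):
    """P(y=1 | x) after averaging over z, which is independent of x."""
    return 0.5 * expit(A + B * x) + 0.5 * expit(A + B * x + C)


# Logit case: effect of x on the log odds when z is omitted.
marginal_b = logit(p_y_given_x(1)) - logit(p_y_given_x(0))


def mean_y_given_x(x):
    """Linear analogue: E[y | x] after averaging over the same z."""
    return 0.5 * (A + B * x) + 0.5 * (A + B * x + C)


# Linear case: effect of x when z is omitted.
linear_marginal_b = mean_y_given_x(1) - mean_y_given_x(0)

print(round(marginal_b, 3), linear_marginal_b)
```

In the linear case, omitting the uncorrelated z leaves the coefficient on x unchanged. In the logit case the coefficient shrinks towards zero, because the residual variance is fixed and omitted variation rescales the remaining estimates. This is consistent with the way the logit estimates in the table move as variables are added.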

Summary: Social science measurement and
functional form
  • We argue that the route to better critical
    understanding of variable effects combines
    complex analysis with many mundane, prosaic tasks
    in checking data
  • ANALYSIS: Coefficient effects in multivariate
    models; multi-process models; understanding
    interactions; etc
  • DATA MANAGEMENT: Re-coding data; linking data;
    missing data mechanisms; reviewing literature
  • Seldom central to previous methodological reviews
  • Cf.

  • Abbott, A. (2006). Mobility: What? When? How? In
    S. L. Morgan, D. B. Grusky, & G. S. Fields (Eds.),
    Mobility and Inequality. Stanford University
  • Bosveld, K., Connolly, H., & Rendall, M. S.
    (2006). A guide to comparing 1991 and 2001 Census
    ethnic group data. London: Office for National
  • Burgess, R. G. (Ed.). (1986). Key Variables in
    Social Investigation. London: Routledge.
  • Crouchley, R., & Fligelstone, R. (2004). The
    Potential for High End Computing in the Social
    Sciences. Lancaster: Centre for Applied
    Statistics, Lancaster University, and
  • Dale, A. (2006). Quality Issues with Survey
    Research. International Journal of Social
    Research Methodology, 9(2), 143-158.
  • Dorling, D., & Simpson, S. (Eds.). (1999).
    Statistics in Society: The Arithmetic of
    Politics. London: Arnold.
  • Freese, J. (2007). Replication Standards for
    Quantitative Social Science: Why Not Sociology?
    Sociological Methods and Research, 36(2), 2007.
  • Harkness, J., van de Vijver, F. J. R., & Mohler,
    P. P. (Eds.). (2003). Cross-Cultural Survey
    Methods. New York: Wiley.
  • Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.).
    (2003). Advances in Cross-national Comparison: A
    European Working Book for Demographic and
    Socio-economic Variables. Berlin: Kluwer Academic
    / Plenum Publishers.
  • Irvine, J., Miles, I., & Evans, J. (Eds.).
    (1979). Demystifying Social Statistics. London:
    Pluto Press.
  • Jowell, R., Roberts, C., Fitzgerald, R., & Eva,
    G. (2007). Measuring Attitudes Cross-Nationally.
    London: Sage.
  • Lambert, P. S., Prandy, K., & Bottero, W. (2007).
    By Slow Degrees: Two Centuries of Social
    Reproduction and Mobility in Britain.
    Sociological Research Online, 12(1).
  • Prandy, K. (2002). Measuring quantities: the
    qualitative foundation of quantity. Building
    Research Capacity, 2, 3-4.
  • Procter, M. (2001). Analysing Survey Data. In G.
    N. Gilbert (Ed.), Researching Social Life, Second
    Edition (pp. 252-268). London: Sage.
  • Schneider, S. L. (2008). The International
    Standard Classification of Education (ISCED-97):
    An Evaluation of Content and Criterion Validity
    for 15 European Countries. Mannheim: MZES.
  • Stacey, M. (Ed.). (1969). Comparability in Social
    Research. London: Heinemann.