Managing data for social survey research: key
issues and concerns
  • Paul Lambert, Dept. Applied Social Science, Univ.
  • 27th January 2009
  • Presented to the workshop "The significance of
    data management for social survey research",
    University of Essex, a workshop organised by the
    Economic and Social Data Service
    and the Data Management through e-Social Science
    research Node of the National Centre for e-Social
    Science

Data Management through e-Social Science
  • ESRC Node funded 2008-2011
  • Aim: useful social science provisions
  • Specialist data topics: occupations, education
    qualifications, ethnicity, social care, health
  • Mainstream packages and accessible resources
  • Aim: to exploit/engage with existing DM resources
  • In social science, e.g. CESSDA
  • In e-Science, e.g. OGSA-DAI, OMII

Data management means…
  • "the tasks associated with linking related data
    resources, with coding and re-coding data in a
    consistent manner, and with accessing related
    data resources and combining them within the
    process of analysis…" (DAMES Node)
  • Usually performed by social scientists themselves
  • Most overt in quantitative survey data analysis
  • variable constructions, data manipulations
  • navigating abundance of data thousands of
  • Usually a substantial component of the work
  • Here we differentiate from archiving /
    controlling data itself

Some components…
  • Manipulating data
  • Recoding categories / operationalising
  • Linking data
  • Linking related data (e.g. longitudinal studies)
  • combining / enhancing data (e.g. linking micro-
    and macro-data)
  • Secure access to data
  • Linking data with different levels of access
  • Detailed access to micro-data cf. access
  • Harmonisation standards
  • Approaches to linking concepts and measures
  • Recommendations on particular variable
  • Cleaning data
  • Missing values, implausible responses, extreme

Example: recoding data
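
The slide's own example was an image and is not reproduced in this transcript. As a stand-in, here is a minimal sketch of a documented recode in Python. The category labels, the mapping and the three-level target measure are hypothetical, chosen only for illustration; in practice the same idea would normally be written as annotated SPSS or Stata syntax.

```python
# Hypothetical detailed qualification codes mapped to a simpler 3-category measure.
# Keeping the mapping as explicit data means it can be reviewed, cited and re-run.
RECODE_MAP = {
    "degree": "high",
    "diploma": "high",
    "a_level": "medium",
    "gcse": "medium",
    "none": "low",
}

def recode(values, mapping, missing_label=None):
    """Return the recoded values plus the set of source codes that had no mapping."""
    recoded, unmapped = [], set()
    for v in values:
        if v in mapping:
            recoded.append(mapping[v])
        else:
            # Do not guess: flag the code so the decision can be recorded.
            recoded.append(missing_label)
            unmapped.add(v)
    return recoded, unmapped

raw = ["degree", "gcse", "none", "a_level", "refused"]
qual3, unmapped = recode(raw, RECODE_MAP)
print(qual3)
print(unmapped)
```

Returning the unmapped codes alongside the result is a design choice: it makes silent data loss during recoding visible, which supports the record-keeping habit discussed later in the talk.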

Example: linking data
  • Linking via ojbsoc00
  • c1-5 original data / c6 derived from data / c7
    derived from

The significance of data management for social
survey research
  • The data manipulations described above are a
    major component of the social survey research
  • Pre-release manipulations performed by
    distributors / archivists
  • Coding measures into standard categories
  • Dealing with missing records
  • Post-release manipulations performed by
  • Re-coding measures into simple categories
  • We do have existing tools, facilities and expert
    experience to help us…but we don't make a good
    job of using them efficiently or consistently
  • So the significance of DM is about how much
    better research might be if we did things more

Some provocative examples for the UK…
  • Social mobility is increasing, not decreasing
  • Popularity of controversial findings associated
    with Blanden et al (2004)
  • Contradicted by wider ranging datasets and/or
    better measures of stratification position
  • DM: researchers ought to be able to more easily
    access wider data and better variables
  • Degrees, MScs and PhDs are getting easier
  • or at least, more people are getting such
  • Correlates with measures of education are
    changing over time
  • DM: facility in identifying qualification
    categories and standardising their relative value
    within age/cohort/gender distributions isn't, but
    should and could be, widespread
  • Black-Caribbeans are not disappearing
  • As the 1948-70 immigrant cohort ages, the
    Black-Caribbean group is decreasingly prominent
    due to return migration and social integration of
    immigrant descendants
  • Data collectors are under pressure to measure
    large groups only
  • DM: it ought to remain easy to access and analyse
    survey data on Black-Caribbeans, such as by
    merging survey data sources and/or linking with
    suitable summary measures
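
One possible reading of the point above about standardising the relative value of qualifications within cohort distributions is to score each qualification level by its position within a person's own birth cohort, so that the same certificate scores lower in a cohort where more people hold it. The sketch below uses a ridit-style score (share of the cohort below the level, plus half the share at that level). The cohort labels, the ordinal ranking and the data are invented for illustration, and this is only one of several possible standardisations, not necessarily the approach used in DAMES.

```python
from collections import Counter, defaultdict

# Hypothetical ordinal ranking of qualification levels, lowest first.
LEVELS = ["none", "school", "degree"]

def cohort_relative_scores(records):
    """records: iterable of (cohort, level) pairs.

    Returns {cohort: {level: score}}, where score is the share of the cohort
    below that level plus half the share at that level (a ridit-style score).
    """
    counts = defaultdict(Counter)
    for cohort, level in records:
        counts[cohort][level] += 1
    scores = {}
    for cohort, c in counts.items():
        n = sum(c.values())
        below = 0
        scores[cohort] = {}
        for level in LEVELS:
            scores[cohort][level] = (below + 0.5 * c[level]) / n
            below += c[level]
    return scores

# Invented data: degrees are rare in the earlier cohort, common in the later one.
data = (
    [("1940s", "none")] * 6 + [("1940s", "school")] * 3 + [("1940s", "degree")] * 1
    + [("1970s", "none")] * 2 + [("1970s", "school")] * 4 + [("1970s", "degree")] * 4
)
scores = cohort_relative_scores(data)
print(scores["1940s"]["degree"], scores["1970s"]["degree"])
```

With these invented counts a degree scores 0.95 in the earlier cohort and 0.8 in the later one, which is the kind of shift in relative value the slide refers to.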

Our own motivation (in DAMES)
  • DM is a big part of the research process
  • ..but receives limited methodological attention
  • Poor practice in soc. sci. DM is easily observed
  • Not keeping adequate records
  • Not linking relevant data
  • Not trying out relevant variable
  • Even though..
  • There are plenty of existing resources and
    standards relevant to data management activities
  • There are suitable software and internet
    facilities (Scott Long 2009)
  • People are working on DM support (e.g. ESDS,

A bit of focus…
  • Most of the DAMES applications aim to facilitate
    one of two data management activities
  • Variable constructions
  • Coding and re-coding values
  • Linking datasets
  • Internal and external linkages

The relevance of e-Science
  • Data management through e-Social Science
  • E-Science refers to adopting a number of
    particular approaches and standards from
    computing science, to applied research areas
  • These approaches include: the Grid; distributed
    computing; data and computing standardisation;
    metadata; security; research infrastructures
  • DAMES (2008-11) developing services / resources
    using e-Science approaches which will help social
    scientists in undertaking data management tasks

National Centre for e-Social Science
  • Major UK investment into UK oriented e-social
    science projects, typically
  • Handling and displaying large volumes of complex
  • E.g. GeoVue, DReSS, Obesity e-Lab
  • Resources for computationally demanding
    analytical tasks
  • CQeSS, MoSeS
  • Standards setting in preparing / supporting data
    and research

E-Science and Data Management
  • E-Science isn't essential to good DM, but it has
    the capacity to improve and support the conduct of DM…
  • Concern with standards setting
  • in communication and enhancement of data
  • Linking distributed/heterogeneous/dynamic data
  • Coordinating disparate resources; interrogating
    live resources
  • Contribution of metadata
  • tools/standards for variable harmonisation and
    standardisation
  • Linking data subject to different security levels
  • The workflow nature of many DM tasks

E.g. of GEODE: organising and distributing
specialist data resources (on occupations)

The contribution of DAMES: 8 project themes

  • DAMES research Node
  • social researchers often spend more time on data
    management than any other part of the research
  • Appendix 1 other extant resources relevant to

[Diagram: sources of support across Data access /
collection, Data Management and Data Analysis.
Labels: DAMES; ONS support; ESDS support; UK Data
Archive; Qualidata; flagship social surveys; Office
for National Statistics; administrative data;
specialist academic outputs; NCRM workshops; Essex
summer school; ESRC RDI initiatives; CQeSS]
  • Some Key issues and concerns for DAMES
  • 4 good habits and principles
  • 3 Challenges

(a) Good habit: keep clear records of your DM
  • Reproducible (for self)
  • Replicable (for all)
  • Paper trail for whole lifecycle
  • Cf. Dale 2006; Freese 2007
  • In survey research, this means using clearly
    annotated syntax files
  • (e.g. SPSS/Stata)
  • Syntax Examples

Stata syntax example (do file)

Software and handling variables: a personal view
  • Stata is the superior package for secondary
    survey data analysis
  • Advanced data management and data analysis
  • Supports easy evaluation of alternative measures
    (e.g. est store)
  • Culture of transparency of programming/data
  • Cf. Scott Long (2009)
  • Problems with Stata
  • Not available to all users
  • Slow estimation times

(b) Principle: use existing standards and
previous research
  • Variable operationalisations
  • Use recognised recodes / standard classifications
  • ONS harmonisation standards
  • Shaw et al. 2007
  • Cross-national standards: Hoffmeyer-Zlotnik &
    Wolf 2003
  • Common vs best practices (e.g. dichotomisations)
  • Use reproducible recodes / classifications (paper
  • Other data file manipulations
  • Missing data treatments
  • Matching data files (finding the right data)

(c) Principle: do something, not nothing
  • We currently put much more effort into data
    collection and data analysis, and neglect data
  • Survey research: the influence of what was on
    the archive version
  • …In my experience, a common reason why people
    didn't do more DM was that they were
    frightened to…

(d) Principle: learn how to match files
  • Complex data (complex research) is distributed
    across different files. In surveys, use key
    linking variables for...
  • One-to-one matching
  • SPSS: match files /file="file1.sav"
    /file="file2.sav" /by=pid.
  • Stata: merge 1:1 pid using file2.dta
    (Stata 11+; earlier: merge pid using file2.dta)
  • One-to-many matching (table distribution)
  • SPSS: match files /file="file1.sav"
    /table="file2.sav" /by=pid.
  • Stata: merge m:1 pid using file2.dta
    (Stata 11+; earlier: merge pid using file2.dta)
  • Many-to-one matching (aggregation)
  • SPSS: aggregate outfile="file3.sav"
    /break=pid /meaninc=mean(income).
  • Stata: collapse (mean) meaninc=income, by(pid)
  • Many-to-Many matches
  • Related cases matching
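
To make the logic of the first three match types above concrete outside any particular package, here is a small language-neutral sketch in plain Python. The function names, the `source` flag and all of the example records are hypothetical; the SPSS and Stata commands on the slide remain the practical route for real survey files.

```python
from collections import defaultdict

def match_one_to_one(file1, file2, key="pid"):
    """Join two lists of records that each hold at most one row per key."""
    def index(rows):
        out = {}
        for row in rows:
            if row[key] in out:
                # A repeated key means this is not a one-to-one match.
                raise ValueError(f"duplicate {key}: {row[key]}")
            out[row[key]] = row
        return out
    a, b = index(file1), index(file2)
    merged = []
    for k in sorted(a.keys() | b.keys()):
        row = {**a.get(k, {}), **b.get(k, {})}
        # Record where each case came from, similar in spirit to Stata's _merge.
        row["source"] = "both" if k in a and k in b else ("file1" if k in a else "file2")
        merged.append(row)
    return merged

def distribute_table(file1, table, key):
    """One-to-many: copy the single table row for each key onto every matching case."""
    lookup = {row[key]: row for row in table}
    return [{**row, **lookup.get(row[key], {})} for row in file1]

def aggregate_mean(rows, key, var, new_name):
    """Many-to-one: collapse rows to one record per key holding the mean of var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[var])
    return [{key: k, new_name: sum(v) / len(v)} for k, v in sorted(groups.items())]

# Invented example data.
people = [{"pid": 1, "hid": 10}, {"pid": 2, "hid": 10}, {"pid": 3, "hid": 11}]
jobs = [{"pid": 1, "job": "nurse"}, {"pid": 3, "job": "driver"}]
households = [{"hid": 10, "region": "Essex"}, {"hid": 11, "region": "Fife"}]
waves = [{"pid": 1, "income": 100}, {"pid": 1, "income": 200}, {"pid": 3, "income": 300}]

merged = match_one_to_one(people, jobs)                      # person file + job file
with_region = distribute_table(people, households, "hid")    # household info onto persons
meaninc = aggregate_mean(waves, "pid", "income", "meaninc")  # mean income per person
print(merged, with_region, meaninc, sep="\n")
```

Keeping a flag for unmatched cases is a deliberate choice here: inspecting which records failed to match is often the quickest check that the right linking variable was used.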

Some challenges for data management..
  • (e) Agreeing about variable constructions
  • Unresolved debates about optimal measures and
  • Esp. in comparative research such as across time,
    between countries
  • In DAMES, we have particular interests in
    comparability for
  • Longitudinal comparability (http//www.longitudina
  • Scaling / scoring categories to achieve meaning
    equivalence or specific measures

Some challenges for data management..
  • (f) Worrying about data security
  • DM activities could challenge data security
  • Inspecting individual cases
  • Multiple copies of related data files
  • Ability to link with other datasets
  • Hands-on model of data review
  • New and exciting data resources
  • have more individual information
  • are more likely to be released with stringent
  • may jeopardize traditional DM approaches

Some routes to secure data
  • Secure portals for direct access to remote data
  • Secure settings (e.g. safe labs)
  • Data anonymisation and attenuation
  • Emphasis on the user's responsibility rather than
    the data provider's
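
As a rough illustration of the anonymisation and attenuation route, the sketch below drops direct identifiers and coarsens two variables before a record is shared. It assumes "attenuation" means reducing detail (banding, top-coding); the field names, the income cap and the band width are hypothetical, and steps like these do not by themselves guarantee that a file is safe to release.

```python
DIRECT_IDENTIFIERS = {"name", "address"}  # hypothetical field names

def attenuate(record, income_cap=100_000, age_band=10):
    """Return a reduced-detail copy of one survey record."""
    # Remove fields that identify a respondent directly.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Band exact age into intervals, e.g. 37 becomes "30-39".
    lo = (out["age"] // age_band) * age_band
    out["age"] = f"{lo}-{lo + age_band - 1}"
    # Top-code income so that extreme, potentially identifying values are capped.
    out["income"] = min(out["income"], income_cap)
    return out

case = {"pid": 7, "name": "A. Respondent", "address": "1 High St",
        "age": 37, "income": 250_000}
safe = attenuate(case)
print(safe)
```

The trade-off is the one raised on the previous slide: every reduction in detail protects respondents but also limits the recoding and linking that analysts can later perform.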

Some challenges for data management..
  • (g) Incentivising documentation / replicability
  • There is little to press researchers to better
    document DM, but much to press them not to
  • Make DM and its documentation easier?
  • Reward documentation (e.g. citations)?

Appendix 1: Existing resources (i) Data
providers - a) Documentation and metadata files
  • Resources for variables
  • CESSDA PPP on key variables http//
  • UK Question Bank http//
  • ONS Harmonisation http//
  • Resources for datasets
  • UK Census data portal, http//
  • IPUMS international census data facilities,
  • European Social Survey, www.europeansocialsurvey.o
  • Data manipulations prior to data release
  • Missing data imputation / documentation
  • Survey design / weighting information
  • Influential most analysts use the archive

Existing resources (ii) Resource projects /
  • ESDS International; ESDS Government
  • ESDS Longitudinal; ESDS Qualidata
  • Helpdesks; online instructions; user support…
  • UK ESRC NCRM / NCeSS / RDI initiatives
  • Longitudinal data
  • Linking micro/macro -
  • Other resources / projects / initiatives
  • EDACwowe - http//
  • ….

Existing resources (iii) Analytical and software
  • Textbooks featuring data management
  • Levesque 2008; Sarantakos 2007; Scott Long
  • Software training covering DM
  • Stata's data management manual
  • SPSS user group course on syntax and data
  • But generally, sustained marginalisation of DM as
    a topic
  • Advanced methods texts use simplistic data
  • Advanced software for analysis isn't usually
    combined with extended DM requirements

Existing resources (iv) Data analysts
  • Academic researchers often generate and publish
    their own DM resources, e.g.
  • Harry Ganzeboom on education and occupations,
  • Provision of whole or partial syntax programming
  • Analysts often drive wider resource provisions
    related to DM
  • CAMSIS project on occupational scales,
  • CASMIN project on education and social class

Existing resources (v) Literatures on
harmonisation and standardisation
  • National Statistics Institutes principles and
  • E.g. ONS
  • Cross-national organisations
  • E.g. UNSTATS - http//
  • Academic studies
  • E.g. Harkness et al. 2003; Hoffmeyer-Zlotnik &
    Wolf 2003; Jowell et al. 2007

Appendix 2: Some other selected NCeSS
projects (concerned with accessing/handling
complex data)

References
  • Blanden, J., Goodman, A., Gregg, P., & Machin, S.
    (2004). Changes in generational mobility in
    Britain. In M. Corak (Ed.), Generational Income
    Mobility in North America and Europe. Cambridge:
    Cambridge University Press.
  • Dale, A. (2006). Quality Issues with Survey
    Research. International Journal of Social
    Research Methodology, 9(2), 143-158.
  • Freese, J. (2007). Replication Standards for
    Quantitative Social Science: Why Not Sociology?
    Sociological Methods and Research, 36(2).
  • Harkness, J., van de Vijver, F. J. R., & Mohler,
    P. P. (Eds.). (2003). Cross-Cultural Survey
    Methods. New York: Wiley.
  • Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.).
    (2003). Advances in Cross-national Comparison: A
    European Working Book for Demographic and
    Socio-economic Variables. Berlin: Kluwer Academic
    / Plenum Publishers.
  • Jowell, R., Roberts, C., Fitzgerald, R., & Eva,
    G. (2007). Measuring Attitudes Cross-Nationally.
    London: Sage.
  • Levesque, R., & SPSS Inc. (2008). Programming and
    Data Management for SPSS 16.0: A Guide for SPSS
    and SAS users. Chicago, Il.: SPSS Inc.
  • Sarantakos, S. (2007). A Tool Kit for
    Quantitative Data Analysis Using SPSS. London:
    Palgrave MacMillan.
  • Scott Long, J. (2009). The Workflow of Data
    Analysis Using Stata. Boca Raton: CRC Press.
  • Shaw, M., Galobardes, B., Lawlor, D. A., Lynch,
    J., Wheeler, B., & Davey Smith, G. (2007). The
    Handbook of Inequality and Socioeconomic
    Position: Concepts and Measures. Bristol: Policy