Managing data for social survey research: key issues and concerns

Slides: 37
Provided by: msfs3
Learn more at: http://www.esds.ac.uk


1
Managing data for social survey research: key issues and concerns
  • Paul Lambert, Dept. of Applied Social Science, University of Stirling
  • 27th January 2009
  • Presented to the workshop 'The significance of data management for social survey research', University of Essex, organised by the Economic and Social Data Service (www.esds.ac.uk) and the Data Management through e-Social Science research Node of the National Centre for e-Social Science (www.dames.org.uk).

2
Data Management through e-Social Science
  • DAMES: www.dames.org.uk
  • ESRC Node funded 2008-2011
  • Aim: useful social science provisions
  • Specialist data topics: occupations, education/qualifications, ethnicity, social care, health
  • Mainstream packages and accessible resources
  • Aim: to exploit/engage with existing DM resources
  • In social science, e.g. CESSDA
  • In e-Science, e.g. OGSA-DAI, OMII

3
Data management means…
  • 'the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis' (DAMES Node)
  • Usually performed by social scientists themselves
  • Most overt in quantitative survey data analysis
  • Variable constructions, data manipulations
  • Navigating an abundance of data: thousands of variables
  • Usually a substantial component of the work process
  • Here we distinguish DM from archiving / controlling the data itself

4
Some components…
  • Manipulating data
  • Recoding categories / operationalising
    variables
  • Linking data
  • Linking related data (e.g. longitudinal studies)
  • combining / enhancing data (e.g. linking micro-
    and macro-data)
  • Secure access to data
  • Linking data with different levels of access
    permission
  • Detailed access to micro-data cf. access
    restrictions
  • Harmonisation standards
  • Approaches to linking concepts and measures
    (indicators)
  • Recommendations on particular variable
    constructions
  • Cleaning data
  • Missing values, implausible responses, extreme values
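The cleaning tasks listed above (missing values, implausible responses, extreme values) can be sketched in plain Python. This is a minimal illustration under stated assumptions. The missing-value codes (-9, -8, -1), the plausible age range and the 3-standard-deviation rule for "extreme" are hypothetical choices, not rules from the presentation. Real thresholds should come from each study's documentation.

```python
# Hypothetical missing-value codes; real codes vary by survey and must be
# checked in the study documentation.
MISSING_CODES = {-9, -8, -1}


def clean_age(value, lo=16, hi=110):
    """Return a cleaned age, or None for missing codes and implausible values.

    The plausible range lo..hi is an illustrative assumption.
    """
    if value is None or value in MISSING_CODES:
        return None
    if not lo <= value <= hi:
        return None  # implausible response: treated as missing here
    return value


def flag_extreme(values, k=3.0):
    """Flag values lying more than k sample standard deviations from the mean.

    None entries are ignored when computing the mean and SD, and never flagged.
    """
    valid = [v for v in values if v is not None]
    if len(valid) < 2:
        return [False] * len(values)
    mean = sum(valid) / len(valid)
    sd = (sum((v - mean) ** 2 for v in valid) / (len(valid) - 1)) ** 0.5
    return [v is not None and abs(v - mean) > k * sd for v in values]


ages = [clean_age(a) for a in [34, -9, 200, 57]]
print(ages)  # [34, None, None, 57]
```

Whether an implausible value should become missing, be corrected, or be kept and flagged is a substantive decision. The sketch only makes that decision explicit and repeatable.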

5
Example: recoding data
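A recode like the one on this slide can be made explicit with a mapping table. It can then be checked with a crosstab of old against new values. The qualification codes and labels below are invented for illustration and do not come from any particular survey's codebook.

```python
from collections import Counter

# Hypothetical detailed codes -> simpler categories (illustration only).
DETAILED_TO_SIMPLE = {
    1: "degree",       # higher degree
    2: "degree",       # first degree
    3: "sub-degree",   # other higher education
    4: "school",       # upper secondary qualification
    5: "school",       # lower secondary qualification
    6: "none",         # no qualifications
}


def recode(values, mapping):
    """Map each value through `mapping`; codes not in the mapping become None."""
    return [mapping.get(v) for v in values]


def crosstab(original, recoded):
    """Count (original, recoded) pairs so every recode can be checked by eye."""
    return Counter(zip(original, recoded))


raw = [1, 2, 6, -9, 4]
simple = recode(raw, DETAILED_TO_SIMPLE)
print(simple)  # ['degree', 'degree', 'none', None, 'school']
```

Keeping the mapping as data, rather than scattering it through the analysis, makes it easier to document and to try alternative category schemes.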

6
Example: linking data
  • Linking via ojbsoc00
  • c1-5: original data; c6: derived from the data; c7: derived from www.camsis.stir.ac.uk
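Linking an external measure onto survey records through an occupational code amounts to a lookup on that code. In the sketch below, `ojbsoc00` is the linking variable named on the slide. The occupation codes and scale scores are invented placeholders, and `link_scores` is a hypothetical helper. Real scores would come from a source such as www.camsis.stir.ac.uk.

```python
# Survey records keyed by person id, with an occupation code (values invented).
survey = [
    {"pid": 101, "ojbsoc00": 2211},
    {"pid": 102, "ojbsoc00": 9233},
    {"pid": 103, "ojbsoc00": -9},  # missing occupation code
]

# Hypothetical external lookup: occupation code -> scale score.
scale_lookup = {2211: 80.0, 9233: 20.0}


def link_scores(records, lookup, key, new_var):
    """Return copies of `records` with `new_var` looked up from `key`.

    Codes with no entry in `lookup` get None, so unmatched cases stay visible.
    """
    linked = []
    for rec in records:
        out = dict(rec)  # copy, so the original records are not modified
        out[new_var] = lookup.get(rec[key])
        linked.append(out)
    return linked


linked = link_scores(survey, scale_lookup, "ojbsoc00", "score")
print([r["score"] for r in linked])  # [80.0, 20.0, None]
```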

7
The significance of data management for social
survey research
  • The data manipulations described above are a
    major component of the social survey research
    workload
  • Pre-release manipulations performed by
    distributors / archivists
  • Coding measures into standard categories
  • Dealing with missing records
  • Post-release manipulations performed by
    researchers
  • Re-coding measures into simple categories
  • We do have existing tools, facilities and expert experience to help us… but we don't make a good job of using them efficiently or consistently
  • So the significance of DM is about how much
    better research might be if we did things more
    effectively…

8
Some provocative examples for the UK…
  • Social mobility is increasing, not decreasing
  • Popularity of controversial findings associated with Blanden et al. (2004)
  • Contradicted by wider ranging datasets and/or
    better measures of stratification position
  • DM: researchers ought to be able to access wider data and better variables more easily
  • Degrees, MScs and PhDs are getting easier
  • or at least, more people are getting such
    qualifications
  • Correlates with measures of education are
    changing over time
  • DM: the facility to identify qualification categories and standardise their relative value within age/cohort/gender distributions isn't widespread, but should and could be
  • Black-Caribbeans are not disappearing
  • As the 1948-70 immigrant cohort ages, the
    Black-Caribbean group is decreasingly prominent
    due to return migration and social integration of
    immigrant descendants
  • Data collectors are under pressure to measure large groups only
  • DM: it ought to remain easy to access and analyse survey data on Black-Caribbeans, such as by merging survey data sources and/or linking with suitable summary measures
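The education example calls for standardising the relative value of qualification categories within age/cohort/gender distributions. One simple way to do this is to score each category by the midpoint of its cumulative share within the respondent's own group. This is an illustrative choice, not necessarily the approach the presentation has in mind. The function name, field names and data are hypothetical.

```python
from collections import Counter, defaultdict


def within_group_midpoint(records, group_key, level_key, order):
    """Score ordered categories by their mid-point cumulative share within group.

    `order` lists the categories from lowest to highest. Each record gets
    (share of its group below its category) + (half its category's share),
    a value between 0 and 1. Categories missing from `order` raise KeyError.
    """
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[group_key]][rec[level_key]] += 1

    scores = {}
    for group, c in counts.items():
        total = sum(c.values())
        below = 0.0
        for level in order:
            share = c[level] / total
            scores[(group, level)] = below + share / 2
            below += share

    return [scores[(rec[group_key], rec[level_key])] for rec in records]


# Invented data: "high" qualifications are more common in the later cohort.
people = [
    {"cohort": "1950s", "educ": "low"},
    {"cohort": "1950s", "educ": "low"},
    {"cohort": "1950s", "educ": "high"},
    {"cohort": "1950s", "educ": "high"},
    {"cohort": "1980s", "educ": "low"},
    {"cohort": "1980s", "educ": "high"},
    {"cohort": "1980s", "educ": "high"},
    {"cohort": "1980s", "educ": "high"},
]
scores = within_group_midpoint(people, "cohort", "educ", ["low", "high"])
print(scores)  # [0.25, 0.25, 0.75, 0.75, 0.125, 0.625, 0.625, 0.625]
```

The same "high" qualification scores lower in the cohort where it is more common. For cohort-by-gender groups, the group key could hold a tuple such as `(cohort, gender)`.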

9
Our own motivation (in DAMES)
  • DM is a big part of the research process
  • ..but receives limited methodological attention
  • Poor practice in soc. sci. DM is easily observed
  • Not keeping adequate records
  • Not linking relevant data
  • Not trying out relevant variable
    operationalisations
  • Even though…
  • There are plenty of existing resources and
    standards relevant to data management activities
  • There are suitable software and internet
    facilities (Scott Long 2009)
  • People are working on DM support (e.g. ESDS,
    DAMES)

10
A bit of focus…
  • Most of the DAMES applications aim to facilitate
    one of two data management activities
  • Variable constructions
  • Coding and re-coding values
  • Linking datasets
  • Internal and external linkages

11
The relevance of e-Science
  • Data management through e-Social Science
  • E-Science refers to adopting a number of particular approaches and standards from computing science in applied research areas
  • These approaches include the Grid, distributed computing, data and computing standardisation, metadata, security, and research infrastructures
  • DAMES (2008-11) developing services / resources
    using e-Science approaches which will help social
    scientists in undertaking data management tasks

12
National Centre for e-Social Science,
www.ncess.ac.uk
  • Major UK investment in UK-oriented e-social science projects, typically:
  • Handling and displaying large volumes of complex data
  • E.g. GeoVue, DReSS, Obesity e-Lab
  • Resources for computationally demanding analytical tasks
  • E.g. CQeSS, MoSeS
  • Standards setting in preparing / supporting data and research

13
E-Science and Data Management
  • E-Science isn't essential to good DM, but it has the capacity to improve and support the conduct of DM…
  • Concern with standards setting
  • in communication and enhancement of data
  • Linking distributed/heterogeneous/dynamic data
  • Coordinating disparate resources; interrogating live resources
  • Contribution of metadata
  • tools/standards for variable harmonisation and
    standardisation
  • Linking data subject to different security levels
  • The workflow nature of many DM tasks

14
E.g. GEODE: organising and distributing specialist data resources (on occupations)

15
The contribution of DAMES: 8 project themes

16
  • DAMES research Node
  • Social researchers often spend more time on data management than on any other part of the research process
  • Appendix 1: other extant resources relevant to DM

Diagram: Data access / collection, Data Management, Data Analysis. Resources shown include DAMES, ONS support, ESDS support, UK Data Archive, Qualidata, flagship social surveys, Office for National Statistics, administrative data, specialist academic outputs, NCRM workshops, Essex summer school, ESRC RDI initiatives and CQeSS.
17
  • Some key issues and concerns for DAMES
  • 4 good habits and principles
  • 3 challenges

18
(a) Good habit: keep clear records of your DM activities
  • Reproducible (for self)
  • Replicable (for all)
  • Paper trail for the whole lifecycle
  • Cf. Dale 2006; Freese 2007
  • In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)
  • Syntax examples: www.longitudinal.stir.ac.uk
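Annotated syntax files are the main recommendation here. As a hedged illustration of the same paper-trail idea outside SPSS or Stata, the sketch below runs each data step through a small logger. The logger records what was done and how many cases went in and out. The `DMLog` class is hypothetical, written only for this example.

```python
import datetime


class DMLog:
    """Minimal paper trail for data management steps (hypothetical helper).

    Each step is run through `step`, which records a description, a timestamp,
    and the number of cases before and after the step.
    """

    def __init__(self):
        self.entries = []

    def step(self, description, func, data):
        n_before = len(data)
        result = func(data)
        self.entries.append({
            "when": datetime.datetime.now().isoformat(timespec="seconds"),
            "step": description,
            "n_before": n_before,
            "n_after": len(result),
        })
        return result

    def report(self):
        """Return one line per recorded step, suitable for saving with the output."""
        return "\n".join(
            f"{e['when']}  {e['step']}: {e['n_before']} -> {e['n_after']} cases"
            for e in self.entries
        )


log = DMLog()
data = [{"pid": 1, "age": 30}, {"pid": 2, "age": None}, {"pid": 3, "age": 50}]
data = log.step(
    "drop cases with missing age",
    lambda d: [r for r in d if r["age"] is not None],
    data,
)
print(log.report())
```

A log like this complements annotated syntax rather than replacing it. The code itself remains the primary record of what was done.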

19
Stata syntax example (do file)

20
Software and handling variables: a personal view
  • Stata is the superior package for secondary survey data analysis
  • Advanced data management and data analysis functionality
  • Supports easy evaluation of alternative measures (e.g. est store)
  • Culture of transparency of programming/data manipulation
  • Cf. Scott Long (2009)
  • Problems with Stata:
  • Not available to all users
  • Slow estimation times

21
(b) Principle: use existing standards and previous research
  • Variable operationalisations
  • Use recognised recodes / standard classifications
  • ONS harmonisation standards
  • Shaw et al. 2007
  • Cross-national standards: Hoffmeyer-Zlotnik & Wolf 2003
  • Common vs best practices (e.g. dichotomisations)
  • Use reproducible recodes / classifications (paper
    trail)
  • Other data file manipulations
  • Missing data treatments
  • Matching data files (finding the right data)

22
(c) Principle: do something, not nothing
  • We currently put much more effort into data collection and data analysis, and neglect data manipulation
  • Survey research: the influence of what was on the archive version
  • …In my experience, a common reason why people didn't do more DM was that they were frightened to…

23
(d) Principle: learn how to match files (deterministic)
  • Complex data (complex research) is distributed across different files. In surveys, use key linking variables for...
  • One-to-one matching
  • SPSS: match files /file='file1.sav' /file='file2.sav' /by pid.
  • Stata: merge pid using file2.dta
  • One-to-many matching (table distribution)
  • SPSS: match files /file='file1.sav' /table='file2.sav' /by pid.
  • Stata: merge pid using file2.dta
  • Many-to-one matching (aggregation)
  • SPSS: aggregate outfile='file3.sav' /break=pid /meaninc=mean(income).
  • Stata: collapse (mean) meaninc=income, by(pid)
  • Many-to-many matches
  • Related cases matching
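The three deterministic match types on this slide can be sketched in plain Python, with records held as dictionaries. This illustrates the logic only and is not a reimplementation of SPSS or Stata. The helper names are hypothetical. The one-to-one match keeps cases found in either file, which as far as we understand is the default for both `match files` and the older Stata `merge` syntax. Where a variable appears in both files, the first file's values are kept.

```python
from collections import defaultdict


def match_one_to_one(first, second, key):
    """One-to-one match on a unique key, keeping cases found in either file.

    Where a variable appears in both files, the value from `first` is kept.
    """
    second_by_key = {r[key]: r for r in second}
    merged, seen = [], set()
    for rec in first:
        merged.append({**second_by_key.get(rec[key], {}), **rec})
        seen.add(rec[key])
    merged.extend(dict(r) for r in second if r[key] not in seen)
    return merged


def match_table(cases, table, key):
    """One-to-many 'table' match: copy each table row onto every case sharing its key.

    Only cases are kept; table rows with no matching case are dropped.
    """
    table_by_key = {t[key]: t for t in table}
    return [{**table_by_key.get(c[key], {}), **c} for c in cases]


def aggregate_mean(records, key, var, new_var):
    """Many-to-one aggregation: one output row per key with the mean of `var`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        sums[rec[key]] += rec[var]
        counts[rec[key]] += 1
    return [{key: k, new_var: sums[k] / counts[k]} for k in sums]


# Hypothetical example files keyed on person id (pid).
file1 = [{"pid": 1, "age": 30}, {"pid": 2, "age": 40}]
file2 = [{"pid": 2, "income": 100.0}, {"pid": 3, "income": 50.0}]
print(match_one_to_one(file1, file2, "pid"))
```

For a table match, the key is often a higher-level identifier, such as a household id shared by several people. The same `match_table` call applies with that key.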

24
Some challenges for data management..
  • (e) Agreeing about variable constructions
  • Unresolved debates about optimal measures and
    variables
  • Esp. in comparative research such as across time,
    between countries
  • In DAMES, we have particular interests in comparability for:
  • Longitudinal comparability (http://www.longitudinal.stir.ac.uk/variables/)
  • Scaling / scoring categories to achieve meaning
    equivalence or specific measures

25
Some challenges for data management..
  • (f) Worrying about data security
  • DM activities could challenge data security
  • Inspecting individual cases
  • Multiple copies of related data files
  • Ability to link with other datasets
  • Hands-on model of data review
  • New and exciting data resources
  • have more individual information
  • are more likely to be released with stringent
    conditions
  • may jeopardize traditional DM approaches

26
Some routes to secure data
  • Secure portals for direct access to remote data
  • Secure settings (e.g. safe labs)
  • Data anonymisation and attenuation
  • Emphasis on users' responsibility rather than the data provider's
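Attenuation usually means coarsening or capping detail so that individuals are harder to identify. This minimal sketch assumes three common techniques, chosen for illustration: dropping direct identifiers, top-coding a continuous variable, and banding exact ages. The variable names and the income cap are hypothetical. Real disclosure-control rules are set by the data provider.

```python
def drop_identifiers(record, identifiers):
    """Return a copy of `record` without the direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in identifiers}


def top_code(value, cap):
    """Cap a value at `cap`; None (missing) passes through unchanged."""
    return None if value is None else min(value, cap)


def band_age(age, width=10):
    """Replace an exact age with a band label, e.g. 34 -> '30-39'."""
    if age is None:
        return None
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def attenuate(record, identifiers=("name", "postcode"), income_cap=100_000):
    """Apply all three steps to one record (field names are hypothetical)."""
    out = drop_identifiers(record, set(identifiers))
    out["income"] = top_code(out.get("income"), income_cap)
    out["age"] = band_age(out.get("age"))
    return out


print(attenuate({"name": "A. Person", "postcode": "XX1 1XX", "age": 34, "income": 250_000}))
# {'age': '30-39', 'income': 100000}
```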

27
Some challenges for data management..
  • (g) Incentivising documentation / replicability
  • There is little to press researchers to better
    document DM, but much to press them not to
  • Make DM and its documentation easier?
  • Reward documentation (e.g. citations)?

28
Appendices

29
Appendix 1: Existing resources (i) Data providers: a) Documentation and metadata files

30
Existing resources (i) Data providers
  • Resources for variables
  • CESSDA PPP on key variables, http://www.nsd.uib.no/cessda/project/
  • UK Question Bank, http://qb.soc.surrey.ac.uk/
  • ONS Harmonisation, http://www.statistics.gov.uk/about/data/
  • Resources for datasets
  • UK Census data portal, http://census.ac.uk/
  • IPUMS international census data facilities, www.ipums.org
  • European Social Survey, www.europeansocialsurvey.org
  • Data manipulations prior to data release
  • Missing data imputation / documentation
  • Survey design / weighting information
  • Influential: most analysts use the archive version

31
Existing resources (ii) Resource projects /
infrastructures
  • UK ESDS www.esds.ac.uk
  • ESDS International, ESDS Government
  • ESDS Longitudinal, ESDS Qualidata
  • Helpdesks, online instructions, user support…
  • UK ESRC NCRM / NCeSS / RDI initiatives
  • Longitudinal data, www.longitudinal.stir.ac.uk
  • Linking micro/macro, www.mimas.ac.uk/limmd/
  • Other resources / projects / initiatives
  • EDACwowe, http://recwowe.vitamib.com/datacentre
  • …

32
Existing resources (iii) Analytical and software
support
  • Textbooks featuring data management
  • Levesque 2008; Sarantakos 2007; Scott Long 2009
  • Software training covering DM
  • Stata's data management manual
  • SPSS user group course on syntax and data management, www.spssusers.co.uk
  • But generally, sustained marginalisation of DM as a topic
  • Advanced methods texts use simplistic data
  • Advanced software for analysis isn't usually combined with extended DM requirements

33
Existing resources (iv) Data analysts
contributions
  • Academic researchers often generate and publish
    their own DM resources, e.g.
  • Harry Ganzeboom on education and occupations, http://home.fsw.vu.nl/ganzeboom/pisa/
  • Provision of whole or partial syntax programming
    examples
  • Analysts often drive wider resource provisions
    related to DM
  • CAMSIS project on occupational scales,
    www.camsis.stir.ac.uk
  • CASMIN project on education and social class

34
Existing resources (v) Literatures on
harmonisation and standardisation
  • National Statistics Institutes' principles and practices
  • E.g. ONS, www.statistics.gov.uk/about/data/harmonisation/
  • Cross-national organisations
  • E.g. UNSTATS, http://unstats.un.org/unsd/class/
  • Academic studies
  • E.g. Harkness et al. 2003; Hoffmeyer-Zlotnik & Wolf 2003; Jowell et al. 2007

35
Appendix 2: Some other selected NCeSS projects (concerned with accessing/handling complex data)

36
References
  • Blanden, J., Goodman, A., Gregg, P., & Machin, S. (2004). Changes in generational mobility in Britain. In M. Corak (Ed.), Generational Income Mobility in North America and Europe. Cambridge: Cambridge University Press.
  • Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
  • Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-172.
  • Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. New York: Wiley.
  • Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.
  • Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.
  • Levesque, R., & SPSS Inc. (2008). Programming and Data Management for SPSS 16.0: A Guide for SPSS and SAS Users. Chicago, IL: SPSS Inc.
  • Sarantakos, S. (2007). A Tool Kit for Quantitative Data Analysis Using SPSS. London: Palgrave Macmillan.
  • Scott Long, J. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
  • Shaw, M., Galobardes, B., Lawlor, D. A., Lynch, J., Wheeler, B., & Davey Smith, G. (2007). The Handbook of Inequality and Socioeconomic Position: Concepts and Measures. Bristol: Policy Press.