Data curation standards and the messy world of social science occupational information resources

Paper presented to the 2nd International Digital Curation Conference, 21-22nd ... Paul Lambert, Larry Tan, Ken Turner, & Vernon Gayle. GEODE - Glasgow DCC, Nov 2006 ...

1
Data curation standards and the messy world of
social science occupational information resources
  • Paper presented to the 2nd International Digital
    Curation Conference, 21-22nd November 2006,
    Glasgow.

2
GEODE www.geode.stir.ac.uk
  • Grid Enabled Occupational Data Environment
  • Operate as a portal
  • User friendly access to occupational data
  • High volume use
  • Support a community of occupational data
    providers
  • Depository of occupational information resources
  • Limited volume use
  • Experiment with / promote e-Social Science

3
(Part 1) Occupational analyses in the social
sciences
"A man's work is as good a clue as any to the
course of his life and to his social being and
identity" (Hughes, 1958)
  • (Quotes as reproduced in Coxon and Jones 1978;
    Crompton 1998)

"Nothing stamps a man as much as his occupation.
Daily work determines the mode of life... It
constrains our ideas, feelings and tastes"
(Goblot, 1961)
"The backbone of the class structure, and indeed
of the entire reward system of modern Western
society, is the occupational order" (Parkin, 1972)
4
Why is occupational research messy?
  • Two stage process
  • Collect and preserve source occupational data
  • Summary / translation of source data
  • This model is a scientific approach
  • Published documentation (at both stages)
  • Replicable
  • Validation exercises
  • But social researchers have not been good at
    using it
  • (Bechhofer 1969; Marsh 1986; Rose and Pevalin
    2003)

5
Stage 1 - Collecting Occupational Data
Examples
6
Stage 1 - Collecting occupational data: summary
  • All methods lead eventually to coding to an
    occupational index scheme
  • Occupational Unit Groups
  • Standardised Industrial Classifications
  • Standardised employment status classifications
  • Occupational index schemes are the point of
    departure for GEODE

7
Stage 2 - Summary / translation of source occupational data
  • Published occupational information resources
    used to link source data, via an index scheme,
    with substantively meaningful measures
  • Social class schemes
  • Stratification scales
  • Gender segregation statistics
  • Labour process statistics
  • Coding by fiat
  • (Allocation by expert social scientist)
  • Lack of documentation / replicability /
    consistency
  • Unscientific
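
The Stage 2 linkage described on this slide can be sketched as a simple lookup from index code to summary measure. This is a minimal illustration only, not GEODE's implementation: the `translate` helper, the field names and the scores are hypothetical placeholders, not values from any published resource.

```python
# Hypothetical translation file: occupational index code -> summary measure.
# The scores are made-up placeholders, not real scale values.
TRANSLATION = {
    "1234": 60.0,
    "5122": 35.0,
}


def translate(records, translation, code_field="occ_code", out_field="score"):
    """Attach a summary measure to each record via its occupational index code.

    Codes missing from the translation file get None, so gaps stay visible
    instead of being silently dropped.
    """
    linked = []
    for record in records:
        enriched = dict(record)  # copy, so the source data stays untouched
        enriched[out_field] = translation.get(record[code_field])
        linked.append(enriched)
    return linked


survey = [
    {"id": 1, "occ_code": "1234"},
    {"id": 2, "occ_code": "5122"},
    {"id": 3, "occ_code": "9999"},  # code not covered by the translation file
]

for row in translate(survey, TRANSLATION):
    print(row)
```

Because the translation is a documented file rather than an expert's case-by-case judgement, the same source data and the same file always give the same result, which is the replicability that "coding by fiat" lacks.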

8
What's the problem?
  • But
  • Low uptake of existing occupational information
    resources
  • Strict security constraints on users'
    micro-social survey data
  • Problems in the formatting / distribution of
    occupational information resources (Part 2)

9
Handling Occupational Information
  • Messy because
  • Large volume of occupational information
    resources
  • Limited coordination between resources
  • Inconsistencies in access and exploitation
    processes

Occupational information resources are used to
interpret occupational records
10
Some illustrative occupational information
resources
11
Occupational information resources
  • Large volumes of occupational information
    resources
  • Coverage across countries and time periods
  • Different research fields / topics
  • Dynamic updates to occupational information
    resources
  • Internet based distributions lead to duplication
    and expansion, e.g. ISEI - ISCO translation files
    at
  • PISA webpages (Ganzeboom)
  • IDEAS/Repec webpages (Hendrickx)
  • CAMSIS occupational data webpage
  • Some maths
  • 100 alternative index schemes (OUGs and others)
  • ×
  • 500 alternative output measures (class schemes,
    etc)
  • = up to 50,000 possible translations

12
Occupational information resources
  • Limited coordination
  • Varying metadata practices
  • Coordinated structure, e.g. ISEI at IDEAS/Repec:
    rare
  • Natural language, e.g. CAMSIS: common
  • No documentation
  • Varying data file formats
  • SPSS, Stata, Plain text
  • One-way distribution
  • Internet download and text publications
  • Gaps between NSIs and academic researchers
  • NSIs make regular changes to favoured resources

13
Occupational information resources
  • Limited coordination (ctd)
  • Varying translation rules
  • One file for all occupations (universal)
  • Multiple files for different contexts
    (specific)
  • Different occupational index requirements
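
The difference between universal and context-specific translation rules can be sketched as a lookup with a fallback. This is a sketch under stated assumptions: the `lookup` function, the `(country, year)` context key and all scores are hypothetical and are not taken from any actual resource.

```python
def lookup(code, context, specific, universal):
    """Prefer a context-specific translation, else fall back to the universal one.

    `specific` maps a context key to its own translation table;
    `universal` is a single table meant to apply to all occupations.
    Returns None when neither table covers the code.
    """
    by_context = specific.get(context, {})
    if code in by_context:
        return by_context[code]
    return universal.get(code)


# Placeholder tables: one universal file, one file for a single context.
UNIVERSAL = {"1234": 50.0, "5122": 30.0}
SPECIFIC = {("GB", 1991): {"1234": 55.0}}

print(lookup("1234", ("GB", 1991), SPECIFIC, UNIVERSAL))  # context-specific value
print(lookup("1234", ("DE", 2001), SPECIFIC, UNIVERSAL))  # universal fallback
print(lookup("0000", ("GB", 1991), SPECIFIC, UNIVERSAL))  # not covered anywhere
```

A user of such resources has to know which rule a given file follows, since applying a context-specific file outside its context gives plausible-looking but wrong values.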

14
Occupational information resources
  • Inconsistencies in access / exploitation
  • Occupational Unit Group scheme variants
  • Decennial updates / International variations
  • Localised adaptations e.g. HESA / Survey
    variations e.g. GHS
  • Numeric or string format preservation
  • Hierarchical organisations
  • E.g. ISCO-88
  • 1234 -> 123 -> 12 -> 1
  • 110 0110 -> 11 -> 1 -> 0
  • Focus for application of occupational data
  • Individual level measures
  • Household / career contexts
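
The ISCO-88 example on this slide can be read as a warning about storing hierarchical codes as numbers: a code such as 0110 loses its leading zero and then truncates to the wrong groups. The sketch below illustrates that reading; the function names are hypothetical and the four-digit width is an assumption that fits ISCO-88 unit groups.

```python
def isco88_levels(code, width=4):
    """Return a code at unit, minor, sub-major and major group level.

    The code is treated as a string and padded back to `width` digits,
    which restores leading zeros lost by numeric storage.
    """
    text = str(code).zfill(width)
    return [text[:n] for n in range(width, 0, -1)]


def naive_numeric_levels(code):
    """Truncate a numerically stored code by repeated integer division."""
    levels = []
    value = int(code)
    while value > 0:
        levels.append(value)
        value //= 10
    return levels


print(isco88_levels("1234"))      # ['1234', '123', '12', '1']
print(isco88_levels(110))         # ['0110', '011', '01', '0']
print(naive_numeric_levels(110))  # [110, 11, 1]
```

With string handling, 0110 aggregates to major group 0; with naive numeric truncation the same record ends up in major group 1, which is a different part of the classification.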

15
Returning to the occupational research model
  • Two stage process
  • Collection and preservation of source
    occupational data
  • Summary / translation of source data via
    occupational information resources
  • Critically, stage (2) places responsibility for
    reviewing occupational information resources with
    the social scientist
  • The volume of variants / inconsistencies isn't
    huge, but is enough to impede easy application

16
(Part 2) Curating Occupational Data
  • GEODE: Grid Enabled Occupational Data
    Environment
  • Core provision: support the management of and
    access to occupational information resources
  • Occupational information depository
  • Easy access to occupational data (portal for
    occupational data)

17
Metadata - Occupational information depository
  • How to facilitate searching, registering,
    accessing index service?
  • Establish a GEODE-M meta-data subset (.xml)
  • Founded on Michigan Data Documentation Initiative
  • Semantic curation of occupational information

18
Benefits of DDI-XML curation
  • XML suits
  • OGSA-DAI
  • (data access integration, www.ogsadai.org.uk)
  • Supports data indexing / preservation /
    management
  • Supports secure data matching programme
  • Could facilitate analytical queries
  • Gridsphere search programmes
  • Data curation standards
  • DDI widely deployed in social science resources
  • XML accessibility / transferability
  • Repeatability of tags very helpful
  • E.g. data files index measures contexts
    authors

19
Implementing GEODE-M metadata
  • Critical entries
  • Context of data: country, time period
  • Index scheme
  • <StdCatgry>: GEODE database of known index
    schemes
  • Source: uri for resource
  • 2 stage curation process (?)
  • Web-proforma for supply of occupational data
  • Author context, index units
  • Gridsphere portlet
  • Manual updating of xml resource by depositor /
    GEODE members
  • Gridsphere portlet

20
Example issues
  • <StdCatgry> Variant implementations <-> indexed
    translation files
  • <context> cross-country resources
  • <producer> role formatting caters to multiple
    author roles
  • <fileDscr id="dkcherisco88.sav"> caters to
    multiple files
  • <abstract>
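
To make the repeatable tags concrete, the sketch below builds a small record with Python's standard `xml.etree.ElementTree`. The tag names and the file id are taken from the slide; everything else is an assumption for illustration, including the `geodeRecord` root element, the flat layout, the `URI` attribute, the depositor names and the example.org address. It is not the actual GEODE-M template or a valid DDI instance.

```python
import xml.etree.ElementTree as ET


def build_record(index_scheme, context, producers, files, abstract):
    """Build a small metadata record using the tag names listed on the slide."""
    root = ET.Element("geodeRecord")  # hypothetical root element
    ET.SubElement(root, "StdCatgry").text = index_scheme
    ET.SubElement(root, "context").text = context
    # Repeatable tags: one <producer> per author role, one <fileDscr> per file.
    for name, role in producers:
        ET.SubElement(root, "producer", role=role).text = name
    for file_id, uri in files:
        ET.SubElement(root, "fileDscr", id=file_id, URI=uri)
    ET.SubElement(root, "abstract").text = abstract
    return root


def find_by_scheme(records, scheme):
    """Return the records registered under a given index scheme."""
    return [r for r in records if r.findtext("StdCatgry") == scheme]


record = build_record(
    index_scheme="ISCO-88",
    context="placeholder country, placeholder period",
    producers=[("Depositor A", "author"), ("Depositor B", "editor")],
    files=[("dkcherisco88.sav", "http://example.org/dkcherisco88.sav")],
    abstract="Placeholder description of the resource.",
)

print(ET.tostring(record, encoding="unicode"))
print(len(find_by_scheme([record], "ISCO-88")))
```

Keeping the index scheme in one predictable element is what makes a search such as `find_by_scheme` possible across resources deposited by different authors.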

21
Management of GEODE-M curation
  • Metadata considerations
  • GEODE-M as flexible recommended components of
    DDI
  • GEODE-M templates
  • webpages at GEODE
  • Other facilities?
  • Data considerations
  • Stored at GEODE vs Linkage to external data
  • Proprietary software (plain text / SPSS / STATA)
  • At present
  • Stage 1: automated curation (allows external
    linkage, any file format)
  • Stage 2: extended manual curation (requires
    GEODE server copy of data, translation to plain
    text rectangular format)
  • Premised upon small commitment from depositors
    and GEODE
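
The "plain text rectangular format" required for Stage 2 curation can be illustrated as one row per index code and one column per field. The sketch below uses Python's standard `csv` module; the column names, the `to_rectangular_text` helper and the scores are hypothetical, and the actual GEODE file layout is not specified in these slides.

```python
import csv
import io


def to_rectangular_text(translation, columns=("code", "measure")):
    """Write a code -> measure table as comma-separated plain text.

    Codes are kept as strings so that leading zeros survive the round trip.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for code in sorted(translation):
        writer.writerow([code, translation[code]])
    return buffer.getvalue()


# Placeholder scores, not values from any published resource.
text = to_rectangular_text({"1234": 60.0, "0110": 40.0})
print(text)

# Reading the text back returns every field as a string.
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[0]["code"])
```

A plain text table of this kind can be read by SPSS, Stata or any other package, which is one reason to prefer it over a proprietary file format for long-term curation.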

22
GEODE user uptake
  • High potential demand
  • Numerous queries on occupational data management
  • Numerous researchers wishing to distribute
    occupational data
  • Prototype GEODE services not yet user-friendly
  • Carrots
  • High demands for easier access and review
  • Sticks
  • Poor standards of much previous research, which
    neglects good review of occupational information
  • Hurdles
  • Change research cultures in social science
    disciplines(?)

23
Conclusions
  • Occupational data curation and the Grid
  • Grid facilitates management / access via xml
    formats (OGSA-DAI)
  • Current models require moderate specialist input
    (manual curation)
  • Grid offers new level of service not previously
    available
  • Dynamic coordinated file storage
  • File matching security
  • Occupational data as case study for focused DDI
    xml curation
  • Complex but finite range of occupational
    information resources
  • High user demand
  • Uptake will require a combination of motivation
    and instigation

24
App 1: e-Social Science
  • The Grid and e-Science
  • Online Coordination of electronic resources and
    collaborations
  • (Distributed computing)
  • Large scale
  • Collaborative
  • Heterogeneous
  • Standard protocols / information management
    systems
  • UK e-Social Science
  • Investment in assessing / implementing technology
  • Computationally demanding data analysis
  • Qualitative and quantitative data collection
    technologies
  • Data sharing, processing and access

25
App 2: GEODE architecture
26
App 3: Collecting occupational data
  • Follow a recommended process
  • ONS good practice
  • www.statistics.gov.uk/methods_quality/ns_sec/questions.asp
  • Industry description / occupation description /
    size of organisation / employment status /
    supervisory status
  • Occupation descriptions -> standardised numeric
    index
  • Text coding tools, e.g. CASCOT -
    www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/
  • Do your own thing
  • European Social Survey parental occupational
    questions
  • free text description of parental occupations

27
App 4: Summary data - what is the best class
scheme?
  • Published occupational information resources
    link source data, via index scheme, with
    substantively meaningful measures
  • Occupation-based social classifications
  • Social class schemes
  • Registrar General's Social Class Scheme
    (1907-2001): skill / prestige
  • National Statistics Socio-Economic Classification
    (2002-): employment relations
  • Goldthorpe / CASMIN / EGP (Employment relations)
  • Wright: ownership and authority
  • W.E.S.: female occupational groupings
  • Stratification scales
  • SIOPS: prestige
  • ISEI: socio-economic status (education and
    income average)
  • CAMSIS: social interaction
  • CAMSIS is the best