Title: Inter-university Consortium for Political and Social Research (ICPSR)


1
DDI Across the Life Cycle: One Data Model, Many
Products
  • Inter-university Consortium for Political and
    Social Research (ICPSR)
  • and
  • Survey Research Operations (SRO)

IASSIST Meeting, Tampere, Finland, May 29, 2009
2
Presenters
  • Mary Vardigan, Assistant Director, ICPSR
  • Sue Ellen Hansen, Director, SRO Technical Systems
    Group
  • Peter Granda, Archivist, ICPSR
  • Sanda Ionescu, Documentation Specialist, ICPSR
  • Felicia LeClere, Associate Research Scientist,
    ICPSR

3
The Collaborators
  • Both are units of the Institute for Social
    Research, University of Michigan
  • ICPSR is a large social science data archive
  • SRO is a data collection center

4
Past Collaborations
  • Working together on the National Survey of Family
    Growth, sponsored by NCHS, to create data and an
    interactive codebook
  • Partnered on the Collaborative Psychiatric
    Epidemiology Surveys, sponsored by NIMH
  • This involved a harmonization of three datasets
    and interactive documentation featuring question
    comparison and five languages
    www.icpsr.umich.edu/CPES

5
(No Transcript)
6
Rationale for Collaboration
  • We share a need for rich, high-quality metadata
  • We want to comply with metadata standards, in
    particular the Data Documentation Initiative
    (DDI)
  • DDI 3 enables life cycle perspective
  • We need to pass data easily from SRO to ICPSR
    without information loss

7
SRO-ICPSR Joint Project
  • Shared DDI-compliant data model and database
    design for survey metadata
  • Challenges
      • Different computing platforms
      • Different end products
      • Different staff orientations

8
(No Transcript)
9
Products and Benefits
  • SRO
      • Tools to enhance MQDS, which produces XML
        documentation from Blaise instruments
      • Tool to permit external users to add metadata for
        NSFG
  • ICPSR
      • Variable-level database that permits users to
        search across the ICPSR collection, compare
        variables, and create new datasets and
        questionnaires
      • Internal variable search for harmonization

10
Data Life Cycle Coverage
11
Michigan Questionnaire Documentation System (MQDS)
  • Sue Ellen Hansen
  • Nicole Kirgis

12
What Does MQDS Do?
  • Facilitates automated documentation and
    harmonization of Blaise survey instruments and
    datasets
  • Extracts survey question metadata
  • Standardized format

13
Survey Question Metadata
  • Question universe
  • Variable name and label
  • Question text
  • Question variable text (fills)
  • Data type
  • Code values and code text
  • Skip instructions
  • etc.
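
As a rough illustration of the metadata items listed above, the sketch below groups them into one record per question. It is only a sketch under stated assumptions: the class name, field names, and the sample record are hypothetical and are not taken from MQDS, Blaise, or DDI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuestionMetadata:
    """Metadata for one survey question; all names here are hypothetical."""
    variable_name: str
    variable_label: str
    question_text: str
    data_type: str
    universe: Optional[str] = None                       # who is asked the question
    fills: List[str] = field(default_factory=list)       # question variable text
    codes: Dict[str, str] = field(default_factory=dict)  # code value -> code text
    skip_instructions: Optional[str] = None


# Illustrative record only; not drawn from any real instrument.
example = QuestionMetadata(
    variable_name="MARSTAT",
    variable_label="Marital status",
    question_text="What is your current marital status?",
    data_type="enumerated",
    universe="All respondents",
    codes={"1": "Married", "2": "Not married"},
)
```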

14
Data Documentation Initiative (DDI)
  • Standard specification for technical
    documentation of social science data
  • eXtensible Markup Language (XML)
  • Widely used
  • Facilitates sharing of data
  • Initial focus on standard dataset codebook
  • Ongoing development

http://www.ddialliance.org/
15
MQDS Version 1
  • Extracted metadata from Blaise data model as XML
    tagged data
  • Provided user interface for selection of
      • Blaise files
      • Instrument questions and sections
      • Types of metadata to extract
      • Languages to display
  • Style sheet for generation of instrument
    documentation or codebook

16
Using MQDS V1 XML: Codebook in Five Languages
National Latino and Asian American Study
www.icpsr.umich.edu/CPES
17
MQDS Version 1
  • Limitations
  • XML not DDI-compliant
  • DDI Version 2 did not have XML tags for all
    metadata provided by Blaise
  • Did not provide easy means of adding XML tags
    without becoming noncompliant
  • XML files for complex surveys can be very large
    (text files)
  • Entire files had to be processed in computer
    memory
  • Limited ability to fully automate documentation

18
DDI Version 3
  • Released April 2008
  • Focus on the complete data life cycle, going
    beyond the codebook

19
DDI Version 3
  • Included extensions proposed by DDI working group
    on instrument design

Persistent Content of Question | Use of Question in Instrument
Question text (static; dynamic or variable) | Order and routing (sequence / skip patterns; loops)
Multiple-part question | Universe
Response domain (open; set categories; special types: date, time, etc.) | Analysis unit
Definitional text | Instructions
20
MQDS Version 3
  • Joint SRC and ICPSR venture
  • Goals
  • Address version 2 limitations
  • Process Blaise instrument of any size
  • Exploit new elements and validate to the recently
    released DDI version 3 standard
  • Move from processing XML metadata in memory to
    streaming metadata to a relational database
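
The move from in-memory processing to streaming can be sketched as follows. This is a minimal illustration, not the MQDS implementation: the XML element names (`instrument`, `variable`, `label`), the table layout, and the function name are all assumptions, and real DDI 3 output is far richer. The sketch uses Python's standard `xml.etree.ElementTree.iterparse` and `sqlite3` as stand-ins for whatever parser and database MQDS actually uses.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; not DDI-conformant.
SAMPLE_XML = b"""<instrument>
  <variable name="AGE"><label>Age in years</label></variable>
  <variable name="SEX"><label>Sex of respondent</label></variable>
</instrument>"""


def stream_variables_to_db(xml_source, conn):
    """Store each <variable> element as soon as it has been parsed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variable (name TEXT PRIMARY KEY, label TEXT)"
    )
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "variable":
            conn.execute(
                "INSERT INTO variable (name, label) VALUES (?, ?)",
                (elem.get("name"), elem.findtext("label")),
            )
            count += 1
            elem.clear()  # drop the element's children once the row is stored
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
loaded = stream_variables_to_db(io.BytesIO(SAMPLE_XML), conn)
rows = conn.execute("SELECT name, label FROM variable ORDER BY name").fetchall()
```

Because each variable is written to the database and then cleared, memory use depends mainly on the size of one variable's metadata rather than on the size of the whole file.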

21
MQDS Version 3: Relational Database Import,
Export, Transform
XML (DDI 3)
User specifies output files (location,
Language/locale, XML output options, etc.)
Questionnaire
Codebook
User specifies stylesheet selection criteria,
type of output desired (html, rtf, pdf), etc.
22
MQDS Version 3
  • Relational database
  • DDI compliant standardized tables
  • Flexibility for SRC and ICPSR to add extensions
    that meet their specific organizational needs
  • Allows
  • Automated documentation of any Blaise survey
    instrument
  • Importing and documenting data produced by other
    software
  • Lower cost development of other tools that
    facilitate editing and disseminating data

23
MQDS V3 Prototype Exporting Language XML
24
MQDS Development
  • Expect to release Summer 2009
  • Working out a distribution plan for Blaise users

25
Data Life Cycle Coverage
26
Applications: Customized Editing Tool
  • Peter Granda
  • ICPSR

27
MQDS Version 3
  • Relational database
  • DDI compliant standardized tables
  • Flexibility for SRC and ICPSR to add extensions
    that meet their specific organizational needs
  • Allows development of new tools to deal with the
    practical problems involved in transforming data
    and documentation derived from Blaise
    instruments into public-use products

28
Features of the Tool
  • Loads MQDS output into database tables
  • Web interface to permit quick viewing
  • Application that permits both internal and
    external clients to access and edit
    variable-level information
  • Ability to include disposition codes to
    designate which variables to include in
    public-use files
  • Maintain permanent record of decisions made
    throughout the editing process
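
One way to picture disposition codes combined with a permanent record of decisions is sketched below. It is an assumption-laden illustration, not the tool's actual design: the class names, the editor identifier, and the history format are hypothetical. The three disposition values mirror the choices shown later in the deck (public-use file, restricted-use file, original file).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Tuple


class Disposition(Enum):
    PUBLIC_USE = "public-use file"
    RESTRICTED_USE = "restricted-use file"
    ORIGINAL_ONLY = "original file only"


@dataclass
class VariableRecord:
    name: str
    label: str
    disposition: Disposition = Disposition.ORIGINAL_ONLY
    # Append-only history of (timestamp, editor, description of change).
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def set_disposition(self, new: Disposition, editor: str) -> None:
        """Change the disposition and record who changed it and when."""
        stamp = datetime.now(timezone.utc).isoformat()
        change = f"disposition: {self.disposition.value} -> {new.value}"
        self.history.append((stamp, editor, change))
        self.disposition = new


def public_use_variables(records):
    """Names of the variables currently marked for the public-use file."""
    return [r.name for r in records if r.disposition is Disposition.PUBLIC_USE]


# Hypothetical variables and editor name, for illustration only.
records = [VariableRecord("AGE", "Age"), VariableRecord("CASEID", "Case identifier")]
records[0].set_disposition(Disposition.PUBLIC_USE, editor="editor1")
selected = public_use_variables(records)
```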

29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
SELECT VARIABLE TO EDIT FROM DATABASE POPULATED
WITH METADATA FROM MQDS WITH POSSIBLE REVISIONS
FROM SUBSEQUENT DATA PROCESSING STEPS
Variable Name
Variable Label
Universe Statements
Value Labels
Question Text
List of Standard Formats
  • VARIABLE DISPOSITION
  • Place in public-use file
  • Place in restricted-use file
  • Leave in original file created by the data
    producer

34
Data Life Cycle Coverage
35
Social Science Variables Database: The Public
Search
  • Sanda Ionescu
  • ICPSR

36
SSVD: The Public Search
  • ICPSR variables search
  • Internal (staff, other authorized users)
  • External (public)

37
SSVD: The Public Search
  • Enables ICPSR users to search variables across
    datasets
  • Assists in data discovery, comparison,
    harvesting, and analysis
  • Useful in question mining for designing new
    research

38
SSVD: The Public Search
  • Concept first tested in a pilot project completed
    in 2005
  • Good functionality
  • Demonstrated benefits of using DDI markup: easy
    import; complex, granular searches; user-friendly
    display
  • Limited number of data sets (69 ICPSR studies
    included)

39
SSVD: The Public Search
  • Expand the project to ultimately include most of
    ICPSR's holdings
  • Generate DDI documentation for most ICPSR studies
  • Need for automated production
  • Build a solid, state-of-the-art, DDI compliant
    database
  • Handle large number of files
  • Support multiple applications

40
SSVD: The Public Search
  • The Hermes batch processing system

ASCII data file
SPSS system / portable file (Mandatory)
Statistical setups: SPSS, SAS, Stata
Ready-to-go data files: SAS transport, SPSS
portable, Stata system
Question text file in fixed format (Optional)
DDI 2.1 variable-level documentation with
frequencies and question text (optional)
PDF Codebook
(Part of )
This is a simplified diagram
41
SSVD: The Public Search
  • Hermes
  • Consistent, reliable source of variable
    descriptions in DDI
  • DDI documentation limited to content of input
    files
  • Labels may be truncated or may contain
    abbreviations
  • Question text may be missing although available
    in original documentation

42
SSVD: The Public Search
  • Additional quality standards necessary for DDI
    documentation, to maximize effectiveness of
    Public Search
  • Presence of question text, whenever available
  • Increased readability of variable/value labels,
    especially if question text is not present

43
SSVD: The Public Search
  • Not all ICPSR studies qualify for variable-level
    searches
  • Criteria for excluding studies
  • Aggregate/statistical data (e.g., Census data, Data
    Books, Roll Call records, etc.)
  • Poor documentation
  • Some restricted data

44
SSVD: The Public Search
  • Pre-SSVD upload
  • Review of DDI output from Hermes to apply content
    quality standards and study selection criteria
  • Additional work to upgrade DDI where necessary
    (and feasible)
  • Add question text
  • Complete truncated text
  • Improve readability of labels
  • Add frequencies
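
The review step above could be supported by a simple automated pre-check that flags variables needing manual attention. The sketch below is hypothetical: the dictionary keys, the function name, and especially the 40-character threshold used to guess at truncated labels are assumptions made for illustration, not ICPSR rules.

```python
def review_variable(var, max_label_len=40):
    """Return a list of quality issues for one variable description (a dict).

    The 40-character cutoff is an assumed heuristic: a label that exactly
    fills the assumed maximum width is flagged as possibly truncated.
    """
    issues = []
    if not (var.get("question_text") or "").strip():
        issues.append("missing question text")
    label = (var.get("label") or "").strip()
    if not label:
        issues.append("missing label")
    elif len(label) >= max_label_len:
        issues.append("label may be truncated")
    if not var.get("frequencies"):
        issues.append("missing frequencies")
    return issues


# Illustrative inputs only.
good = {
    "label": "Age of respondent",
    "question_text": "How old are you?",
    "frequencies": {"18-29": 10, "30+": 25},
}
bad = {"label": "X" * 40, "question_text": "", "frequencies": {}}

good_issues = review_variable(good)
bad_issues = review_variable(bad)
```

A check like this can only flag candidates; deciding whether question text exists in the original documentation still needs a person.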

45
SSVD: The Public Search
  • Preparing studies for SSVD
  • Started end of 2006
  • Included DDI produced for previous projects
  • Reviewed all variable-level DDI created at ICPSR,
    November 2006 to present (new releases and
    updates)

46
SSVD: The Public Search
  • New database finalized Fall 2008
  • Built to match DDI 3.0 data model
  • Both DDI 2.x and DDI 3.0 compliant
  • Designed to accept both DDI 2.x and 3.0 input and
    produce output in both versions
  • ICPSR version currently uploads DDI 2.1 and
    generates DDI 3.0 descriptions of individual
    variables.

47
SSVD: The Public Search
  • First batch of variable-level description files
    uploaded into SSVD
  • Approx. 3,500 DDI files (one file per dataset),
    representing
  • Approx. 1,300 ICPSR studies (approx. 18.5 percent
    of total ICPSR holdings, excluding US Census;
    approx. 30 percent of holdings with data and
    setups)
  • Over 1,000,000 individual variable descriptions;
    23,000,000 categories

48
SSVD: The Public Search
  • Currently in beta-testing phase.
  • Email bug reports to ssvd-testing_at_icpsr.umich.edu
  • Uses Oracle Text.
  • http://www.icpsr.umich.edu/ICPSR/ssvd/index.html

49
SSVD: The Public Search (Moving forward)
  • Fall 2009: switch to Solr searches (based on
    Lucene)
  • Faster
  • More sophisticated results filtered by multiple
    relevant parameters
  • Enable side-by-side/same page display of selected
    variables for comparison
  • Enable variable search from individual study page
    (search within study)
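
To make "results filtered by multiple relevant parameters" concrete, the sketch below builds a Solr-style request URL using the standard `q`, `fq`, `rows`, and `wt` query parameters. Everything specific is an assumption for illustration: the base URL, the core name `variables`, and the field names `question_text` and `study_id` are hypothetical and do not describe ICPSR's actual index.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_variable_query(base_url, text, filters=None, rows=20):
    """Build a search URL with one main query and zero or more filter queries."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    for field_name, value in (filters or {}).items():
        # Each filter becomes its own fq parameter, narrowing the result set.
        params.append(("fq", f'{field_name}:"{value}"'))
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = build_variable_query(
    "http://localhost:8983/solr/variables/select",
    "question_text:marriage",
    filters={"study_id": "1234"},
)
parsed = parse_qs(urlsplit(url).query)
```

Restricting the filter to a single study identifier is one plausible way to express the "search within study" feature listed above.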

50
SSVD: The Public Search (Moving forward)
  • Adding content
  • Second batch of DDI files ready to upload
  • 900 DDI files, representing 500-600 studies (will
    bring total close to 45 percent of ICPSR studies
    with data and setups)
  • Initiate retrofit project to examine older
    studies that were not covered in the first
    conversion phase

51
SSVD: The Public Search (Moving forward)
  • Transition to automated DDI upload
  • DDI uploaded at the time of study publication
  • First quality check performed by study processing
    staff
  • Acceptable DDI immediately released for public
    view
  • Problematic DDI suppressed from public view for
    further review, and upgrade as appropriate

52
Data Life Cycle Coverage
53
Applications: Internal Variable Search and
Documentation
  • Felicia LeClere
  • ICPSR

54
The Integrated Fertility Survey Series
  • Five-year grant from NICHD to harmonize data from
    10 large surveys of marriage, fertility, and
    child-bearing in the United States
  • The 10 surveys span 1955 through 2002

55
Problem of Harmonization
  • To make decisions about harmonizing across all
    files, we need:
  • Question text
  • Value labels and categories
  • To be able to find and export metadata from all 10
    files at the variable level
  • To be able to document each variable, recode, and
    variable choice

56
Tools from Variables Database
  • Need to be able to do nested searches that are
    documented
  • Need to be able to search all fields individually
    and in sequence
  • Need to be able to download results and document
    what search terms were used
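
The three needs above (nested searches, field-by-field searching in sequence, and downloadable results with the search terms documented) can be sketched in a few lines. This is a hypothetical illustration, not the ICPSR internal search: the function names, dictionary keys, and sample variables are assumptions.

```python
import csv
import io


def nested_search(variables, steps):
    """Apply (field, term) filters in sequence; return matches and a search log."""
    results = list(variables)
    log = []
    for field_name, term in steps:
        results = [
            v for v in results if term.lower() in v.get(field_name, "").lower()
        ]
        # Record every step so the search can be documented and repeated.
        log.append({"field": field_name, "term": term, "matches": len(results)})
    return results, log


def to_csv(rows, columns):
    """Serialize a list of dicts to CSV text for download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative variables only.
variables = [
    {"name": "BIRTHPL", "label": "Place of birth", "question_text": "Where were you born?"},
    {"name": "DOB", "label": "Date of birth", "question_text": "When were you born?"},
    {"name": "MARSTAT", "label": "Marital status", "question_text": "Are you married?"},
]
results, log = nested_search(variables, [("label", "birth"), ("question_text", "where")])
log_csv = to_csv(log, ["field", "term", "matches"])
```

Exporting the log alongside the results keeps a record of which terms produced which set of variables.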

57
ICPSR SSVD Internal Search
  • All 10 data sets were loaded in ICPSR's version
    of the shared database
  • Designed to capture all of the relevant fields
    that were marked up in DDI

58
Entry screen for internal search
59
Search results screen
60
Excel download from search
Can also download value labels and codes
61
Search Utilities
  • Downloaded search fields serve to:
      1. Identify variables to be harmonized
      2. Provide metadata for translation tables,
         which are used to harmonize files

62
Harmonization steps
  • Use search results to populate two intermediate
    steps toward reforming the data set
  • Exploratory comparative tables
  • Use this comparative table to make decisions
    about harmonization by examining universes,
    question texts, and response categories
  • Translation tables
  • These tables are designed to provide instructions
    on recoding the underlying items from the 10
    surveys to a single harmonized item. The table
    provides instructions to an automated SAS program
    that recodes items from 10 surveys.
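
The idea of a translation table driving an automated recode can be sketched as below. The project itself uses a SAS program; this Python version is only an illustration of the lookup logic. The survey names, variable names, and code values are invented for the example and do not come from the actual IFSS translation tables.

```python
# Hypothetical translation table:
# (survey, source variable, source code) -> harmonized code.
TRANSLATION = {
    ("survey_a", "BORNUS", 1): 1,
    ("survey_a", "BORNUS", 2): 2,
    ("survey_b", "BIRTHPL", 1): 1,
    ("survey_b", "BIRTHPL", 5): 2,
}

# Which source variable feeds the harmonized item in each survey (assumed).
SOURCE_VARIABLE = {"survey_a": "BORNUS", "survey_b": "BIRTHPL"}


def harmonize(survey, record, missing=None):
    """Recode one record's source value to the harmonized code.

    Values with no entry in the translation table map to `missing`.
    """
    var = SOURCE_VARIABLE[survey]
    return TRANSLATION.get((survey, var, record.get(var)), missing)


a_value = harmonize("survey_a", {"BORNUS": 2})
b_value = harmonize("survey_b", {"BIRTHPL": 5})
unmapped = harmonize("survey_b", {"BIRTHPL": 9})
```

Keeping the recode rules in a table rather than in code means the same table can serve as documentation of every recode decision.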

63
Comparative table: date of birth
64
Translation Table for place of birth
65
Harmonization steps
  • After the translation table is complete, the recode
    instructions for all 10 files are built into the
    SAS file and a new data file is created.
  • The underlying metadata provided by the
    database allow us to (1) search all 10 files, (2)
    explore comparability, and (3) recode to new
    variables