DDI The Movie 2: Applications of the Architecture (early draft)

Slides: 35
Provided by: ILin8
Learn more at: http://users.pop.umn.edu


  • By I-Lin Kuo

Table of Contents
  • Modules and Instrument Documentation
  • The Variable
  • Ontologies and Tagging

Modules and Instrument Documentation
  • Chapter 1

Suggested Approach to Instrument Documentation
  • METS has an extremely well-designed structure map,
    which describes the logical structure of its
    objects of interest. See http://www.loc.gov/standa
  • Basically, a skeleton of the structure is created,
    which then contains pointers to items. See the next
    slide for the recipe for building METS.

Building a METS Document: 5 key aspects
  1. Expressing the Structure
  2. Linking Structure with Content
  3. Linking Structure with Descriptive Metadata
  4. Linking Structure and Content Files with
    Administrative metadata
  5. Not covered: linking behaviors with structures.
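
The following is a minimal sketch of aspects 1 to 3, written in Python with the standard library's ElementTree. It builds a METS skeleton whose structure-map <div> elements hold pointers rather than content:
  • <fptr FILEID> links structure to content (aspect 2).
  • The DMDID attribute links structure to descriptive metadata (aspect 3).
The IDs, LABEL and TYPE values are made up for illustration. Required children such as <FLocat> and <mdWrap>/<mdRef> are omitted, so the output is not schema-valid METS.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def build_mets():
    mets = ET.Element(q("mets"))

    # Descriptive metadata section the structure can point to (aspect 3).
    ET.SubElement(mets, q("dmdSec"), ID="DMD1")

    # Content files the structure will point to (aspect 2).
    file_sec = ET.SubElement(mets, q("fileSec"))
    grp = ET.SubElement(file_sec, q("fileGrp"))
    for fid in ("FILE1", "FILE2"):
        ET.SubElement(grp, q("file"), ID=fid)

    # Aspect 1: the skeleton of the logical structure (illustrative labels).
    struct_map = ET.SubElement(mets, q("structMap"), TYPE="logical")
    top = ET.SubElement(struct_map, q("div"), TYPE="book", LABEL="Example", DMDID="DMD1")
    for order, fid in enumerate(("FILE1", "FILE2"), start=1):
        part = ET.SubElement(top, q("div"), TYPE="chapter", ORDER=str(order))
        # The div holds only a pointer to the content, never the content itself.
        ET.SubElement(part, q("fptr"), FILEID=fid)
    return mets

doc = build_mets()
print(ET.tostring(doc, encoding="unicode"))
```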

Suggested Approach to DDI Instrument Documentation
  • I recommend that the DDI adopt a similar approach
  • Create an instrument structure map for each
  • Link the structure with content contained in
  • Examples of <InstrumentItem> would be
    <SimpleQuestion>, <GridQuestion>,
    <QuestionGroup>, <Computation>, <FlowCheck>,
    <InterviewerInstr>, etc.
  • Link the structure and content with display
  • This approach has the advantage of allowing
    questions etc. to be reused in different
    instrument structure maps. This would be useful
    in a study with separate male and female
    questionnaires, for example.
  • I also think (I haven't thought this through
    completely) that this allows a clean separation
    between question content and question display.
    Thus, a multi-mode survey would have identical
    structure maps linked to different display
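
The reuse and content/display separation argued for above can be sketched with plain data structures:
  • Items live once in a pool.
  • Each structure map is an ordered list of item IDs.
  • Display is supplied per mode.
The class and field names, the item texts, and the two display templates are hypothetical. They are not proposed DDI element names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class InstrumentItem:
    """Hypothetical stand-in for an <InstrumentItem> such as <SimpleQuestion>."""
    item_id: str
    kind: str
    text: str

@dataclass
class StructureMap:
    """An ordered skeleton that points to items by ID instead of containing them."""
    name: str
    item_refs: List[str] = field(default_factory=list)

    def resolve(self, pool: Dict[str, InstrumentItem]) -> List[InstrumentItem]:
        return [pool[ref] for ref in self.item_refs]

# Content is stored once, in a shared pool.
pool = {
    "Q1": InstrumentItem("Q1", "SimpleQuestion", "What is your age?"),
    "Q2": InstrumentItem("Q2", "SimpleQuestion", "(female-questionnaire item)"),
    "Q3": InstrumentItem("Q3", "SimpleQuestion", "(male-questionnaire item)"),
}

# Two structure maps reuse Q1 without copying it.
female = StructureMap("female", ["Q1", "Q2"])
male = StructureMap("male", ["Q1", "Q3"])

# Display is a separate concern: one template per mode, same structure map.
DISPLAY = {"paper": "{n}. {text}", "web": "<label>{text}</label>"}

def render(smap: StructureMap, pool: Dict[str, InstrumentItem], mode: str) -> List[str]:
    template = DISPLAY[mode]
    return [template.format(n=i, text=item.text)
            for i, item in enumerate(smap.resolve(pool), start=1)]

print(render(female, pool, "paper"))
print(render(female, pool, "web"))
```

A multi-mode survey would then reuse `female` unchanged and only switch the mode argument.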

The Variable
  • Chapter 2

DDI 2.0 variable
4.3 var (ATT: wgt, wgt-var, weight, qstn, files, vendor,
    dcml, intrvl, rectype, sdatrefs, methrefs, pubrefs,
    access, aggrMeth, measUnit, scale, origin, nature,
    additivity, temporal, geog, geoVocab, catQnty)
  4.3.1 location (ATT: StartPos, EndPos, width, RecSegNo, fileid, locMap)
  4.3.2 labl (ATT: level, vendor, country, sdatrefs)
  4.3.3 imputation?
  4.3.4 security? (ATT: date)
  4.3.5 embargo? (ATT: date, event, format)
  4.3.6 respUnit?
  4.3.7 anlysUnit?
  4.3.8 qstn
  4.3.9 valrng
  4.3.10 invalrng
  4.3.11 undocCod
  4.3.12 universe
  4.3.13 TotlResp?
  4.3.14 sumStat (ATT: wgtd, wgt-var, weight, type)
  4.3.15 txt (ATT: level, sdatrefs)
  4.3.16 stdCatgry (ATT: date, URI)
  4.3.17 catgryGrp
  4.3.18 catgry
  4.3.19 codInstr
  4.3.20 verStmt
  4.3.21 concept (ATT: vocab, vocabURI)
  4.3.22 derivation?
  4.3.23 varFormat? (ATT: type, formatname, schema, category, URI)
  4.3.24 geoMap (ATT: URI, mapformat, levelno)
  4.3.25 catLevel (ATT: levelnm)
  4.3.26 notes

DDI 2.0 var major components
  • Variable type: @wgt, @intrvl
  • Reference: @qstn, @wgt-var, @files, @sdatrefs,
    @methrefs, @pubrefs
  • Descriptive: <notes>, <universe>, <txt>,
    <concept>, <derivation>, <qstn>, <geoMap>
  • Provenance: <verStmt>
  • Sampling/Measurement: <imputation>, <respUnit>,
  • Logical Encoding: <valrng>, <invalrng>,
    <undocCod>, <catgry>, <catgryGrp>, <stdCatgry>,
    <codInstr>, <catLevel>, <varFormat>
  • Statistics: <TotlResp>, <sumStat>
  • Security/Access: <security>, <embargo>
  • Physical description: @rectype, <location>
  • Other: @vendor,
  • Note: some elements and attributes straddle
    several concerns. In those cases, I just picked one.

Some problems of 2.0
  • No recoding documentation
  • One variable, one question
  • Question contained within variable
  • No virtual recodes

3.0 restructuring goals for the variable
  • Standardize the usage of elements such as
    security, etc. so that they may be
  • Standardize the naming of elements and attributes
  • Reduce redundancy so there is only one way to
  • Compatibility with ISO11179 conception of the
  • Compatibility with statistical tools conception
    of variable
  • Compatibility with MetaDater concept of
  • More sophisticated recode documentation
  • Better documentation of question flow in
    instrument documentation
  • More complete classification of variable types
  • Systematic handling of variable referencing
  • Support of virtual recodes

2.0 Classification of variable types
  • We'll start with this, as it is relatively easy.
  • 2.0 already has the attributes wgt, wgt-var and qstn,
    but more are needed for a richer machine-actionable

3.0 Classification of variable types
  • "Types" is actually a misnomer. These should be
    treated as labels rather than types because they
    are not mutually exclusive.
  • Raw/question (codes come directly from questions):
    this will probably be affected by the ongoing
    discussion on question typology at DDI-ID
  • Recodes
  • Weight
  • Attrition
  • Key
  • Imputation Flag
  • Time/geog?
  • Continuous/discrete
  • Aggregated
  • Nominal/ordinal/interval/ratio
  • Virtual recode: a variable for display
    purposes only, without corresponding data, such as
    a continuous variable displayed as a discrete
  • Dropped
  • Nonexistent intermediate: an intermediate
    variable used only for calculation, without data
    or display
  • Nonexistent instrument: an artifact of the
    instrument, without data or
  • Incomplete
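
Because these labels are not mutually exclusive, a set of bit flags models them more faithfully than a single type field. The sketch below uses Python's enum.Flag with a subset of the labels listed. The member names and the two example variables are illustrative assumptions.

```python
from enum import Flag, auto

class VarLabel(Flag):
    """A subset of the proposed 3.0 variable labels; any combination is allowed."""
    RAW = auto()
    RECODE = auto()
    WEIGHT = auto()
    ATTRITION = auto()
    KEY = auto()
    IMPUTATION_FLAG = auto()
    CONTINUOUS = auto()
    DISCRETE = auto()
    AGGREGATED = auto()
    VIRTUAL_RECODE = auto()

# A derived weight is both a recode and a weight; an exclusive type field
# could record only one of the two.
final_weight = VarLabel.RECODE | VarLabel.WEIGHT | VarLabel.CONTINUOUS

# A hypothetical display-only grouping of a continuous variable.
age_group = VarLabel.VIRTUAL_RECODE | VarLabel.DISCRETE

print(VarLabel.WEIGHT in final_weight, VarLabel.KEY in final_weight)
```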

  • Variables should reference their applicable
    weight variables, or vice versa
  • Imputation flags should reference their
    corresponding variables
  • Variables might need to reference attrition
    variable in some cumulative dataset
  • Recodes will need to reference questions,
    computations, and other variables in their recode
  • Directionality of the references remains to be

Machine-actionable consequences
  • Identification of keys enables complex files
  • Weight, imputation flag, and attrition references
    may allow statistics to be intelligently
    calculated on the fly
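
Here is one way to read "calculated on the fly". If each variable carries references to its weight variable and its imputation flag, a tool can follow those references without user input. In the sketch below, the following are all hypothetical:
  • the reference fields (weight_ref, imputation_flag_ref);
  • the variable names;
  • the data.
The direction of the references (variable to weight) is just one of the options the previous slide leaves open.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variable:
    name: str
    values: list
    weight_ref: Optional[str] = None           # hypothetical pointer to the applicable weight variable
    imputation_flag_ref: Optional[str] = None  # hypothetical pointer to the variable's imputation flag

def weighted_mean(var_name, dataset, exclude_imputed=False):
    """Follow the variable's references to compute a weighted mean without user input."""
    var = dataset[var_name]
    n = len(var.values)
    weights = dataset[var.weight_ref].values if var.weight_ref else [1.0] * n
    if exclude_imputed and var.imputation_flag_ref:
        flags = dataset[var.imputation_flag_ref].values
    else:
        flags = [0] * n
    kept = [(v, w) for v, w, f in zip(var.values, weights, flags) if not f]
    return sum(v * w for v, w in kept) / sum(w for _, w in kept)

# Invented example data.
dataset = {
    "INCOME": Variable("INCOME", [10.0, 20.0, 30.0], weight_ref="WGT", imputation_flag_ref="INCOME_IF"),
    "WGT": Variable("WGT", [1.0, 1.0, 2.0]),
    "INCOME_IF": Variable("INCOME_IF", [0, 0, 1]),
}

print(weighted_mean("INCOME", dataset))                        # all cases, weighted
print(weighted_mean("INCOME", dataset, exclude_imputed=True))  # imputed case dropped
```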

General approach to compatibility
  • By compatibility with statistical tools (SPSS,
    SAS, Stata), we mean that we should be able to do
    a round-trip from a setup file → DDI → setup file
    with no loss of information.
  • It is not realistic to expect, as a 3.0
    deliverable, three XSLT stylesheets that transform
    DDI → SPSS, SAS, or Stata setup files.
  • It may also be possible to have stylesheets which
    convert from SPSS and SAS proprietary XML formats
    to DDI, which perform the round-trip without loss
    of information. This is dependent on whether or
    not the DDI is rich enough to contain all the
  • By compatibility with ISO11179 and MetaDater, we
    will suggest a standard way in which <var> may be
    marked up.

Compatibility with statistical tools: SPSS
  • FILE HANDLE DATA /NAME="data-filename" LRECL=66.
  • STANUM 8-9 QTYPE 13

  • STANUM 'State ID' /
  • QTYPE 'State or National precinct' /
  • STANUM 2 'Alaska' /
  • QTYPE 1 'State' 2 'National' /
  • The simple excerpt from an SPSS setup file
    above can be round-tripped even with DDI 2.0
  • Data List column info goes in <location>
  • Variable labels go into var.txt
  • Value labels go into <catgry>
  • More analysis is needed to determine what is
    necessary for round-tripping the SPSS XML format
    and/or more complicated setup files. Achim is
    familiar with the XML format.
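
The column part of this round-trip is mechanical. The sketch below parses the column specification into location attributes and regenerates it unchanged. It rests on these assumptions:
  • The regular expression is my own.
  • The dictionary keys are named after the DDI 2.0 <location> attributes StartPos, EndPos and width.
  • Only the plain "NAME start[-end]" form in the excerpt is handled, not format specifiers or multi-record layouts.

```python
import re

# Matches "NAME start" or "NAME start-end" in a fixed-column specification.
SPEC_RE = re.compile(r"([A-Za-z][A-Za-z0-9_]*)\s+(\d+)(?:-(\d+))?")

def parse_columns(spec):
    """Return <location>-style attributes for each variable in the spec."""
    locations = {}
    for name, start, end in SPEC_RE.findall(spec):
        start = int(start)
        end = int(end) if end else start  # a single column has no "-end"
        locations[name] = {"StartPos": start, "EndPos": end, "width": end - start + 1}
    return locations

def write_columns(locations):
    """Regenerate the column spec from the location attributes."""
    parts = []
    for name, loc in locations.items():
        if loc["StartPos"] == loc["EndPos"]:
            parts.append(f"{name} {loc['StartPos']}")
        else:
            parts.append(f"{name} {loc['StartPos']}-{loc['EndPos']}")
    return " ".join(parts)

spec = "STANUM 8-9 QTYPE 13"
locs = parse_columns(spec)
print(locs)
print(write_columns(locs) == spec)
```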

Compatibility with statistical tools: Stata
  • _column(8) int STANUM :STANUM %2f
    "State ID"
  • _column(10) int PRECINCT %3f
    "Sample precinct number"
  • _column(13) int QTYPE :QTYPE %1f
    "State or National precinct"
  • _column(16) int BACKSIDE :BACKSIDE %1f
    "Backside completion flag"
  • _column(17) float WGT %6.3f
    "Respondent weight"
  • label define STANUM 2 "Alaska"
  • label define QTYPE 1 "State" 2 "National"
  • int/float map to DDI 2.0's <varFormat>. Q: are
    all of Stata's types mappable into DDI types?
  • Does %6.3f map to DDI? If not, we need to add
    a place for it.
  • The value-label name (STANUM) attached to STANUM
    indicates that perhaps formats/categories may be
    shared by different variables. If this is true,
    then <catgry> would have to be moved out of <var>.
  • More analysis needed. I'm not too familiar with
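
One way to probe the %6.3f question is to split Stata's %w.df numeric formats into a width and a decimal count. DDI 2.0 could plausibly hold these in <location width> and the var dcml attribute. The sketch below assumes that mapping. Whether it preserves Stata's meaning is exactly the open question above. The sketch handles only the %w[.d]f forms in the excerpt.

```python
import re

# Stata numeric formats of the form %wf or %w.df, as in the dictionary excerpt.
FMT_RE = re.compile(r"^%(\d+)(?:\.(\d+))?f$")

def parse_stata_format(fmt):
    """Split a %w.df format into (width, decimals); decimals default to 0."""
    m = FMT_RE.match(fmt)
    if m is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    width = int(m.group(1))
    decimals = int(m.group(2)) if m.group(2) is not None else 0
    return width, decimals

def to_ddi_attrs(fmt):
    """Assumed mapping: width -> <location width>, decimals -> <var dcml>."""
    width, decimals = parse_stata_format(fmt)
    return {"location.width": width, "var.dcml": decimals}

for fmt in ("%2f", "%3f", "%1f", "%6.3f"):
    print(fmt, to_ddi_attrs(fmt))
```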

Compatibility with statistical tools: SAS

  • VALUE STANUM 2='(2) Alaska'
  • VALUE QTYPE 1='(1) State' 2='(2) National'
  • STANUM 8-9 QTYPE 13
  • STANUM='State ID'
  • QTYPE='State or National precinct'
  • PROC FORMAT maps to DDI <catgry>
  • INPUT maps to DDI <location>
  • LABEL maps to DDI var.txt
  • FORMAT associates each variable with a coding
    format. Multiple variables may be associated with
    the same format. This will not work with 2.0 for
    the same reason 2.0 cannot associate multiple
    variables with the same question.
  • Thus, <catgry> needs to be taken out of <var> for
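
The shared-format problem can be sketched by parsing the SAS VALUE statements above into category schemes that live outside any variable. Each variable then holds only a reference. This is a minimal sketch:
  • The parser understands just the "VALUE NAME code='label' ..." form shown. It does not handle ranges, PROC FORMAT or semicolons.
  • The variable QTYPE_W2 and the catgry_ref key are hypothetical.

```python
import re

# Matches code='label' pairs in a SAS VALUE statement.
PAIR_RE = re.compile(r"(\d+)\s*=\s*'([^']*)'")

def parse_value_statement(stmt):
    """Parse "VALUE NAME 1='a' 2='b'" into (format_name, {code: label})."""
    _keyword, _, rest = stmt.strip().partition(" ")  # drop the VALUE keyword
    name, _, pairs = rest.partition(" ")
    return name, {int(code): label for code, label in PAIR_RE.findall(pairs)}

def build_schemes(value_stmts, format_assignments):
    """Category schemes are stored once; variables keep only a reference to one."""
    schemes = dict(parse_value_statement(s) for s in value_stmts)
    variables = {var: {"catgry_ref": fmt} for var, fmt in format_assignments.items()}
    return schemes, variables

schemes, variables = build_schemes(
    ["VALUE STANUM 2='(2) Alaska'",
     "VALUE QTYPE 1='(1) State' 2='(2) National'"],
    # QTYPE_W2 is a hypothetical second variable assigned the same QTYPE format.
    {"STANUM": "STANUM", "QTYPE": "QTYPE", "QTYPE_W2": "QTYPE"},
)

for var, ref in variables.items():
    print(var, "->", schemes[ref["catgry_ref"]])
```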

Compatibility with MetaDater
  • Still a lot of reading yet to do on this one.

Compatibility with ISO11179
  • Harmonization steps based on Dan Gilman's 2003
    presentation http://www.iassistdata.org/conference
  • Goal: seek to harmonize with ISO11179 at the
    variable model level so that DDI may be used as a
    transport/exchange format for ISO11179.

ISO/IEC 11179 - Core Model
[Diagram: ISO/IEC 11179 core model components (Data Element Concept, Conceptual Domain, Data Element, Value Domain, ISO11179 ontology or concept registry) set against the corresponding DDI 2.0 tags/concepts (<concept> and/or <universe>; <catgry>; <catgry><concept>). Annotations: ontologies also do not exist in DDI 2.0; however, the catgry.concept does not exist in DDI.]

ISO11179 Harmonization Steps
  • 3.0 harmonization with the ISO11179 model on
    previous slide
  • Move <catgry> out of <var>, as different data
    elements may point to the same value domain. This
    is not possible if the value domain is contained
    within the data element.
  • Add a <concept> to <catgry>, or some means of
    pointing to the reference domain.
  • Add a way of pointing to an ontology or registry
    from the <concept>. This will be explained in the
    section on Ontologies.
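
In the ISO/IEC 11179 core model, a data element associates a data element concept with a value domain. Read that way, the harmonization steps amount to the structure sketched below:
  • The value domain is a separate object that several data elements reference.
  • Each permissible value carries a concept reference.
The class names follow the ISO terms loosely. The IDs and example.org URIs are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ValueDomain:
    """Permissible values; each code carries a concept reference (the proposed <catgry><concept>)."""
    vd_id: str
    value_meanings: Dict[int, str]  # code -> concept URI

@dataclass
class DataElementConcept:
    dec_id: str
    concept_uri: str  # pointer into an ontology or concept registry

@dataclass
class DataElement:
    """Roughly a DDI <var>: it references a concept and a value domain instead of containing them."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

# All URIs below are hypothetical placeholders.
REG = "http://example.org/registry/"
marital = ValueDomain("VD_MARITAL", {1: REG + "married", 3: REG + "never-married"})

respondent = DataElement("R_MARSTAT", DataElementConcept("DEC1", REG + "respondent-marital-status"), marital)
spouse = DataElement("S_MARSTAT", DataElementConcept("DEC2", REG + "spouse-marital-status"), marital)

# Both data elements point at the same value domain object.
print(respondent.value_domain is spouse.value_domain)
```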

Additional analysis needed
  • Changes in the structure of the variable have to
    be analyzed for their impact on other concerns
  • Nested categories
  • N-Cubes

Overall restructuring plan
  • Need to identify those components which are
    intrinsic to a variable and those which are
    extrinsic or may be shared between variables
  • Intrinsic: type (wgt, derivation, txt), <recode>
  • Extrinsic: <sumStat>, <TotlResp>
  • Shared: <qstn>, <catgry>, <security>, <embargo>,
  • Extrinsic and shared elements need to be moved
    out of <var>
  • Elements necessary for compatibility with other
    standards need to be added.

Ontologies and Tagging
  • Chapter ?

Rel-tag microformat
  • Problem: How can we associate keywords to a web
  • Old solution: meta keywords in an HTML page
  • 2005 solution: the rel-tag microformat, popularized
    by the Technorati blog aggregator to allow blog
    authors to tag content to aid the Technorati
    search engine.
  • This isn't the same as the DDI problem, but the
    solution is instructive.

Rel-tag microformat details
  • Example: <a href="http://technorati.com/tag/tech" rel="tag">technology</a>
  • The last segment of the path, "tech", is the
  • The preceding part, "http://technorati.com/tag",
    is the space which knows what to do with the
  • "technology" is the visible part of the tag
  • rel="tag" identifies this as a rel-tag rather
    than a normal anchor
  • See http://microformats.org/wiki/reltag or search
    Google for more details
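
The anatomy just described can be extracted mechanically:
  • the space is the URL minus its last path segment;
  • the tag is the last segment;
  • the visible text is the anchor content.
Here is a minimal sketch using Python's standard html.parser. It ignores URL-decoding of tags and other finer points of the rel-tag specification.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class RelTagParser(HTMLParser):
    """Collect (space, tag, visible text) for each <a rel="tag"> anchor."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None  # [space, tag, text] while inside a rel-tag anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if tag == "a" and "tag" in rels and href:
            parts = urlsplit(href)
            # The tag is the last path segment; everything before it is the space.
            space_path, _, tag_name = parts.path.rstrip("/").rpartition("/")
            space = f"{parts.scheme}://{parts.netloc}{space_path}"
            self._current = [space, tag_name, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.found.append(tuple(self._current))
            self._current = None

parser = RelTagParser()
parser.feed('<a href="http://technorati.com/tag/tech" rel="tag">technology</a>')
print(parser.found)
```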

DDI ontology problem
  • Problem: How can we associate words in DDI markup
    with controlled vocabularies or ontologies such as
    Madeira, the ICPSR social science thesaurus, or
    the ISO11179 concept registry?
  • Note that the rel-tag microformat already
    contains 75% of what we need
  • The authority
  • The space
  • The tag: the keyword
  • So we can probably modify this to suit our needs

<var>
  <concept><a href="http://www.icpsr.umich.edu/socSciThes/crime" rel="ddi">crime</a></concept>
</var>
<catgry>
  <concept><a href="http://data-archive.ac.uk/ISO11179/maritalstatus" rel="ddi">marital status</a></concept>
  <catValu>3</catValu>
  <labl>never been <a href="http://data-archive.ac.uk/Madeira/marriage" rel="ddi">married</a></labl>
</catgry>

Rel-tag flexibility
  • Note that tags can occur anywhere and are not
    restricted to ltconceptgt
  • The visible part does not have to match the
  • Different ontologies may be used simultaneously
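
To illustrate these flexibility points, the sketch below scans the DDI example above for rel="ddi" anchors under any element and groups them by vocabulary space. Notes on the sketch:
  • The fragment is wrapped in a <codeBook> root only so it parses as one document. The real DDI 2.0 nesting is omitted.
  • rel="ddi" is the proposal on these slides, not part of DDI 2.0.
  • The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DDI_FRAGMENT = """
<codeBook>
  <var>
    <concept><a href="http://www.icpsr.umich.edu/socSciThes/crime" rel="ddi">crime</a></concept>
  </var>
  <catgry>
    <concept><a href="http://data-archive.ac.uk/ISO11179/maritalstatus" rel="ddi">marital status</a></concept>
    <catValu>3</catValu>
    <labl>never been <a href="http://data-archive.ac.uk/Madeira/marriage" rel="ddi">married</a></labl>
  </catgry>
</codeBook>
"""

def collect_ddi_tags(xml_text):
    """Map each vocabulary space to the (tag, visible text, parent element) found in it."""
    root = ET.fromstring(xml_text.strip())
    by_space = defaultdict(list)
    # Tags may appear under any element, not only <concept>, so walk the whole tree.
    for parent in root.iter():
        for a in parent.findall("a"):
            if a.get("rel") == "ddi":
                space, _, tag = a.get("href").rstrip("/").rpartition("/")
                by_space[space].append((tag, a.text, parent.tag))
    return dict(by_space)

for space, entries in collect_ddi_tags(DDI_FRAGMENT).items():
    print(space, entries)
```

The Madeira entry shows a tag ("marriage") whose visible text ("married") differs from it, inside <labl> rather than <concept>.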

Applications of rel-tags
  • ISO11179: rel-tags, plus the variable
    restructuring suggested in the previous chapter
    (The Variable), give the DDI variable
    compatibility with the ISO11179 data
    element/variable model
  • Comparative data search: rel-tags provide a way
    to implement the "upward-pointing" to a
    controlled vocabulary that Wendy and Jostein
    talked about last week. This implementation does
    not conflict with the variable-variable link
    mechanism needed for Reto.
  • Madeira: rel-tags allow Madeira to mark up
    individual words in <catgry>

  • As currently used, rel-tags do not allow for
    nested tags.

  • DDI should look into rel-tags, or some variant of
    them, for use with ontologies