IZA Data Service Center DDI/SDMX Workshop, Wiesbaden, Germany, June 18th 2008: The Data Documentation Initiative (DDI)


1
IZA Data Service Center DDI/SDMX Workshop
Wiesbaden, Germany, June 18th 2008
The Data Documentation Initiative (DDI)

Arofan Gregory / Pascal Heus
agregory_at_opendatafoundation.org /
pheus_at_opendatafoundation.org
Open Data Foundation

2
Content

Background on metadata and XML
Metadata and Microdata
XML and Microdata: the DDI
DDI 2.0
DDI 3.0
DDI 2.0 vs 3.0
Major stakeholders / initiatives

3
Metadata / XML
4
What is metadata?

Common definition: data about data

5
What is XML?

Today's universal language on the web
Its purpose is to facilitate sharing of structured
information across information systems
XML stands for eXtensible Markup Language
eXtensible: can be customized
Markup: tags and marks that attach attributes to things
Language: syntax (grammatical rules)
HTML (HyperText Markup Language) is a markup
language but is not extensible! It is also concerned
with presentation, not content.
XML is a text format (not a binary black box)
XML is also a collection of technologies (built
on the XML language)
It is platform independent and is supported by
modern programming languages (C, Java, .NET,
PHP, Perl, etc.)
It is both machine and human readable

6
Simple XML example

<catalog>
  <book isbn="0385504209">
    <title>Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book isbn="0553294385" pages="352">
    <title>I, robot</title>
    <author>Isaac Asimov</author>
    <language>English</language>
  </book>
</catalog>

Callouts on the slide: attributes, elements,
opening and closing tags, text content
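
As a minimal sketch of how a program reads such a document, the Python standard library module xml.etree.ElementTree can parse the catalog and expose its attributes, child elements and text content. Only the catalog content comes from the slide; the quoting of attribute values and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Catalog from the slide, with attribute values quoted as XML requires.
CATALOG_XML = """
<catalog>
  <book isbn="0385504209">
    <title>Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book isbn="0553294385" pages="352">
    <title>I, robot</title>
    <author>Isaac Asimov</author>
    <language>English</language>
  </book>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)

books = []
for book in root.findall("book"):
    books.append({
        "isbn": book.get("isbn"),          # attribute on the opening tag
        "pages": book.get("pages"),        # optional attribute, None if absent
        "title": book.findtext("title"),   # text content of a child element
        "author": book.findtext("author"),
    })

for entry in books:
    print(entry)
```

Because XML is plain text with explicit markup, the same few lines work for any well-formed document that uses these element names.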
7
XML Technology overview
Document Type Definitions (DTD) and XML Schema are
used to validate an XML document by defining
namespaces, elements, rules
Specialized software and database systems can be
used to create and edit XML documents. In the
future the XForms standard will be used
XML separates the metadata storage from its
presentation. XML documents can be transformed
into something else (HTML, PDF, XML, other)
through the use of the eXtensible Stylesheet
Language: XSL Transformations (XSLT) and XSL
Formatting Objects (XSL-FO)
Very much like a database system, XML documents
can be searched and queried through the use of
XPath or XQuery. There is no need to create
tables, indexes or define relationships
XML metadata or data can be published in smart
catalogs, often referred to as registries, that can
be used for discovery of information.
XML documents can be sent like regular files but
are typically exchanged between applications
through Web Services using SOAP and other
protocols
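
The querying point can be sketched with the limited XPath subset that Python's xml.etree.ElementTree supports. This is an illustration only: the catalog is the earlier book example (attribute values quoted), the variable names are assumptions, and a full XPath or XQuery engine offers far more than is shown here.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """
<catalog>
  <book isbn="0385504209">
    <title>Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book isbn="0553294385" pages="352">
    <title>I, robot</title>
    <author>Isaac Asimov</author>
    <language>English</language>
  </book>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)

# All book titles, wherever the books sit in the tree.
all_titles = [t.text for t in root.findall(".//book/title")]

# ISBNs of the books that carry a 'pages' attribute.
with_pages = [b.get("isbn") for b in root.findall(".//book[@pages]")]

# The book whose <author> child has exactly this text.
asimov = root.find(".//book[author='Isaac Asimov']")

print(all_titles)
print(with_pages)
print(asimov.findtext("title"))
```

No tables, indexes or relationships had to be declared before running these queries; the document structure itself is what gets searched.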
8
What is an XML Schema?

Exchange / sharing / harmonization implies
agreement on structure
We need a specification that describes the
structure and rules: a schema
A schema is a set of rules to which an XML
document must conform in order to be considered
'valid'
XML Schema was also designed with the intent that
determination of a document's validity would
produce a collection of information adhering to
specific data types
Similar to the structural definition of a
relational database
Many schemas exist for different purposes
Examples:
DDI, SDMX, Dublin Core, RSS, XHTML, etc.
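
To make the idea of "a set of rules a document must conform to" concrete, here is a toy rule checker in Python. It is not XML Schema validation (the standard library has no XSD validator); the rules, the function name and the invalid sample element are all hypothetical and only illustrate the kind of constraints a schema expresses: required parts and data types.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <book> element, written as plain Python data.
# A real XML Schema states the same kind of constraints in XML.
BOOK_RULES = {
    "required_attributes": ["isbn"],
    "required_children": ["title", "author"],
    "integer_attributes": ["pages"],
}


def validate_book(book):
    """Return a list of rule violations for one <book> element."""
    errors = []
    for name in BOOK_RULES["required_attributes"]:
        if book.get(name) is None:
            errors.append(f"missing attribute '{name}'")
    for name in BOOK_RULES["required_children"]:
        if book.find(name) is None:
            errors.append(f"missing element <{name}>")
    for name in BOOK_RULES["integer_attributes"]:
        value = book.get(name)
        if value is not None and not value.isdigit():
            errors.append(f"attribute '{name}' is not an integer: {value!r}")
    return errors


valid_book = ET.fromstring(
    '<book isbn="0553294385" pages="352">'
    "<title>I, robot</title><author>Isaac Asimov</author></book>"
)
# Made-up element that breaks all three kinds of rule.
invalid_book = ET.fromstring('<book pages="many"><title>Untitled</title></book>')

print(validate_book(valid_book))
print(validate_book(invalid_book))
```

Agreeing on the rules once, in a shared schema, is what lets independent organizations exchange documents without inspecting each other's software.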

9
Metadata, XML and Microdata
10
What is a survey?

More than just data.
A complex process to produce data for the purpose
of statistical analysis
Beyond this, a tool to support evidence based
policy making and results monitoring
The data is surrounded by a large body of
documentation
Survey data often come with limited documentation
Note that microdata is intended for experts
Statisticians / researchers
Represents a single point in time and space
Need to be aggregated to produce meaningful
results
It is the beginning of the story

11
What is survey metadata?

Survey documentation can be broken down into
structured metadata and documents
Structured metadata can be captured using XML
Documents can be described in structured metadata
Example of metadata
Survey level: title, country, year, abstract,
sampling, agencies, access policy, etc.
Variable level: filename, label, code, questions,
instructions, derivation, etc.
Related materials: report, questionnaire, papers,
manuals, scripts/programs, photos
Cross-survey: catalogs, longitudinal, concepts,
comparability, etc.

12
Importance of survey metadata

Data Quality
Usefulness, accessibility, coherence,
completeness, relevance, timeliness
Undocumented data is useless
Partially documented data is risky (misuse)
Data discovery and access
Preservation
Replication standard (Gary King)
Information exchange
Reduce need to access sensitive data
Maintain coherence / linkages across the complete
life cycle (from respondent to policy maker)
Reuse

13
The Data Documentation Initiative

The Data Documentation Initiative is an XML
specification to capture structured metadata
about microdata (broad sense)
First generation: DDI 1.0 to 2.1 (2000-2008)
focus on single archived instance
Second generation: DDI 3.0 (2008)
focus on life cycle
go beyond the single survey concept
multi-purpose

14
DDI Timeline / Status

Pre-DDI 1.0
70s / 80s OSIRIS Codebook
1993 IASSIST Codebook Action Group
1996 SGML DTD
1997 DDI XML
1999 Draft DDI DTD
2000 DDI 1.0
Simple survey
Archival data formats
Microdata only
2003 DDI 2.0
Aggregate data (based on matrix structure)
Added geographic material to aid geographic
search systems and GIS users
2003 - Establishment of DDI Alliance
2004 Acceptance of a new DDI paradigm
Lifecycle model
Shift from the codebook centric / variable
centric model to capturing the lifecycle of data
Agreement on expanded areas of coverage

2005
Presentation of schema structure
Focus on points of metadata creation and reuse
2006
Presentation of first complete 3.0 model
Internal and public review
2007
Vote to move to Candidate Version (CR)
Establishment of a set of use cases to test
application and implementation
October 3.0 CR2
2008
February 3.0 CR3
March 3.0 CR3 update
April 3.0 CR3 final
April 28th 3.0 Approved by DDI Alliance
May 21st DDI 3.0 Officially announced
Initial presentations at IASSIST 2008
2009
DDI 3.1 and beyond

15
DDI 1/2.x
16
The archive perspective

Focus on preservation of a survey
Often see survey as collection of data files
accompanied by documentation
Code book centric
report, questionnaire, methodologies, scripts,
etc.
Results in a static event: the archive
Maintained by a single agency
Is typically documentation after the fact
This is the initial DDI perspective (DDI 2.0)

17
DDI 2.0 Technical Overview

Based on a single structure (DTD)
1 codeBook, 5 sections
docDscr: describes the DDI document
The preparation of the metadata
stdyDscr: describes the study
Title, abstract, methodologies, agencies, access
policy
fileDscr: describes each file in the dataset
dataDscr: describes the data in the files
Variables (name, code, ...)
Variable groups
Cubes
othMat: other related materials
Basic document citation
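
The "one codeBook, five sections" layout can be sketched by generating an empty skeleton in Python. The section names are taken as written on the slide and should be checked against the DDI 2 DTD before use; the function name is an assumption, and no child elements are shown because the slide does not name them.

```python
import xml.etree.ElementTree as ET

# Section names as written on the slide; verify spellings against the DTD.
SECTIONS = {
    "docDscr": "describes the DDI document itself",
    "stdyDscr": "describes the study",
    "fileDscr": "describes each file in the dataset",
    "dataDscr": "describes the data in the files",
    "othMat": "other related materials",
}


def build_codebook_skeleton():
    """Build an empty codeBook element holding the five sections in order."""
    code_book = ET.Element("codeBook")
    for name, purpose in SECTIONS.items():
        code_book.append(ET.Comment(f" {name}: {purpose} "))
        ET.SubElement(code_book, name)
    return code_book


skeleton = build_codebook_skeleton()
if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9 and later
    ET.indent(skeleton)
print(ET.tostring(skeleton, encoding="unicode"))
```

Everything about a survey lives inside this single tree, which is why the slides describe DDI 1/2 as codebook centric.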

18
Characteristics of DDI 1.0/2.0

Focuses on the static object of a codebook
Designed for limited uses
End user data discovery via the variable or high
level study identification (bibliographic)
Only heavily structured content relates to
information used to drive statistical analysis
Coverage is focused on single study, single data
file, simple survey and aggregate data files
Variable contains majority of information
(question, categories, data typing, physical
storage information, statistics)

19
Impact of these limitations

Treated as an add on to the data collection
process
Focus is on the data end product and end users
(static)
Limited tools for creation or exploitation
The Variable must exist before metadata can be
created
Producers hesitant to take up DDI creation
because it is a cost and does not support their
development or collection process

20
DDI 1/2.x Tools

Nesstar
Nesstar Publisher, Nesstar Server
IHSN
Microdata Management Toolkit
NADA (online catalog for national data archive)
Archivist / Reviewer Guidelines
Other tools
SDA, Harvard/MIT Virtual Data Center (Dataverse)
UKDA DExT, ODaF DeXtris
http://tools.ddialliance.org

21
DDI 2.0 perspective
[Diagram: seven separate boxes, each labelled "DDI 2 Survey"]
22
DDI 3.0

The life cycle

23
When to capture metadata?

Metadata must be captured at the time the event
occurs!
Documenting after the fact leads to considerable
loss of information
Multiple contributors are typically involved in
this process (not only the archivist)
This is true for producers and researchers

24
DDI 3.0 and the Survey Life Cycle

A survey is not a static process: it evolves
dynamically over time and involves many
agencies/individuals
DDI 2.x is about archiving; DDI 3.0 covers the
entire life cycle
3.0 focuses on metadata reuse (minimizes
redundancies/discrepancies, supports comparison)
Also supports multilingual, grouping, geography,
and others
3.0 is extensible

25
Requirements for 3.0

Improve and expand the machine-actionable aspects
of the DDI to support programming and software
systems
Support CAI instruments through expanded
description of the questionnaire (content and
question flow)
Support the description of data series
(longitudinal surveys, panel studies, recurring
waves, etc.)
Support comparison, in particular comparison by
design but also comparison-after-the fact
(harmonization)
Improve support for describing complex data files
(record and file linkages)
Provide improved support for geographic content
to facilitate linking to geographic files (shape
files, boundary files, etc.)

26
Approach

Shift from the codebook centric model of early
versions of DDI to a lifecycle model, providing
metadata support from data study conception
through analysis and repurposing of data
Shift from an XML Document Type Definition (DTD) to
an XML Schema model to support the lifecycle
model, reuse of content and increased controls to
support programming needs
Redefine a single DDI instance to include a
simple instance similar to DDI 1/2, which
covered a single study, and complex instances
covering groups of related studies. Allow a
single study description to contain multiple data
products (for example, a microdata file and
aggregate products created from the same data
collection).
Incorporate the requested functionality in the
first published edition

27
Designing to support registries

Resource package
structure to publish non-study-specific materials
for reuse
Extracting specified types of information into
schemes
Universe, Concept, Category, Code, Question,
Instrument, Variable, etc.
Allowing for either internal or external
references
Can include other schemes by reference and select
only desired items
Providing Comparison Mapping
Target can be external harmonized structure

28
Technical Overview

DDI 3 is composed of several schemas
Use only what you need!
Schemas represent modules, sub-modules
(substitutions), reusable, external schemas

archive
comparative
conceptualcomponent
datacollection
dataset
dcelements
DDIprofile
ddi-xhtml11
ddi-xhtml11-model-1
ddi-xhtml11-modules-1
group
inline_ncube_recordlayout

instance
logicalproduct
ncube_recordlayout
physicaldataproduct
physicalinstance
proprietary_record_layout (beta)
reusable
simpledc20021212
studyunit
tabular_ncube_recordlayout
xml
set of xml schemas to support xhtml

29
Technical Overview

Any element that can be referenced is globally
uniquely identified
Maintainable (by an agency)
Versionable (can change across time)
Identifiable (within a maintainable scheme)
Modules
Reflect closely related sets of information
similar to the sections of the DDI 1/2 DTD
Modules can be held as separate XML instances and
be included in a large instance by either
inclusion or reference
All modules are maintainable (but not all
maintainables are modules)
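
The identification rules above (an agency maintains a scheme, the scheme is versioned, and an item is identifiable within it) can be sketched as a composite key. This is an illustration under stated assumptions: the class names, the colon-separated key format and the sample question texts are hypothetical, and the key is not the DDI 3.0 URN syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemReference:
    agency: str     # who maintains the scheme
    scheme_id: str  # the maintainable scheme
    version: str    # version of that scheme
    item_id: str    # identifier, unique only within the scheme

    def key(self) -> str:
        """Combine the four parts into one globally unique key."""
        return ":".join((self.agency, self.scheme_id, self.version, self.item_id))


class MetadataStore:
    """Toy store that resolves references to content."""

    def __init__(self):
        self._items = {}

    def publish(self, ref, content):
        # A published version is never overwritten; changes need a new version.
        if ref.key() in self._items:
            raise ValueError(f"already published: {ref.key()}")
        self._items[ref.key()] = content

    def resolve(self, ref):
        return self._items[ref.key()]


store = MetadataStore()
q_v1 = ItemReference("example.org", "QuestionScheme_A", "1.0", "Q_AGE")
q_v2 = ItemReference("example.org", "QuestionScheme_A", "1.1", "Q_AGE")
store.publish(q_v1, "What is your age?")
store.publish(q_v2, "What was your age at your last birthday?")

print(q_v1.key(), "->", store.resolve(q_v1))
print(q_v2.key(), "->", store.resolve(q_v2))
```

Keeping the version in the key is what allows an older study to keep pointing at the exact wording it used while newer studies reference the revised item.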

30
Technical Overview: Maintainable Schemes
(that's with an "e", not an "a")

Category Scheme
Code Scheme
Concept Scheme
Control Construct Scheme
GeographicStructureScheme
GeographicLocationScheme
InterviewerInstructionScheme
Question Scheme
NCubeScheme
Organization Scheme
Physical Structure Scheme
Record Layout Scheme
Universe Scheme
Variable Scheme

Packages of reusable metadata maintained by a
single agency

31
DDI 3.0 Use Cases

Study design/survey instrumentation
Questionnaire generation/data collection and
processing
Data recoding, aggregation and other processing
Data dissemination/discovery
Archival ingestion/metadata value-add
Question/concept/variable banks
DDI for use within a research project
Capture of metadata regarding data use
Metadata mining for comparison, etc.
Generating instruction packages/presentations

32
Study Design/Survey Instrumentation

This use case concerns how DDI 3.0 can support
the design of studies and survey instrumentation
Without benefit of a question or concept bank

Types of Metadata
Concepts (conceptual module)
Universe (conceptual module)
Questions (datacollection module)
Flow Logic (datacollection module)

[Diagram labels, first stage: <DDI 3.0> Concepts,
Universes; Drafting / Review / Revision; Final]
[Diagram labels, second stage: <DDI 3.0> Questions,
Flow Logic; Drafting / Testing / Revision; Final:
<DDI 3.0> Concepts, Universes, Questions, Flow Logic]
As the survey instrument is tested, all
revisions and history can be tracked and
preserved. This would include question
translation and internationalization.
34
Questionnaire Generation, Data Collection, and
Processing

This use case concerns how DDI 3.0 can support
the creation of various types of
questionnaires/CAI, and the collection and
processing of raw data into microdata.

Types of Metadata
Concepts (conceptual module)
Universe (conceptual module)
Questions (datacollection module)
Flow Logic (datacollection module)
Variables (logicalproduct module)
Categories/Codes (logicalproduct module)
Coding (datacollection module)

[Diagram labels: <DDI 3.0> Concepts, Universes,
Questions, Flow Logic (Final); Paper Questionnaire;
Online Survey Instrument; CAI Instrument; Raw Data;
Microdata; <DDI 3.0> Variables, Coding; <DDI 3.0>
Categories, Codes, Physical Data Product, Physical
Data Instance]
DDI captures the content; XML allows each
application to do its own presentation

36
Data Recoding, Aggregation, etc.

This use case concerns how DDI 3.0 can describe
recodes, aggregation, and similar types of data
processing.

Initial microdata has
Concepts (conceptual module)
Universes (conceptual module)
Questions (datacollection module)
Flow Logic (datacollection module)
Variables (logicalproduct module)
Coding (datacollection module)
Categories (logicalproduct module)
Codes (logicalproduct module)
Physical Data Product
Physical Data Instance
Recode adds
More codings (datacollection module)
New variables
New categories
New codes
NCubes (for aggregation)
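
What a recode "adds" can be sketched in a few lines of Python. The source variable, cut points, codes and labels below are all hypothetical; the point is only that a recode yields a new variable with new categories and codes, and that aggregating it yields a (here one-dimensional) cube of counts.

```python
from collections import Counter

# Hypothetical source variable: age in years for five respondents.
age = [4, 17, 35, 64, 80]

# New codes and categories introduced by the recode (illustrative cut points).
AGE_GROUP_CODES = {1: "0-14", 2: "15-64", 3: "65+"}


def recode_age(years):
    """Coding rule: map an age in years to an age-group code."""
    if years < 15:
        return 1
    if years < 65:
        return 2
    return 3


# New variable derived from the original one.
age_group = [recode_age(a) for a in age]

# Aggregation: count of respondents per category.
counts = Counter(age_group)
ncube = {label: counts.get(code, 0) for code, label in AGE_GROUP_CODES.items()}

print(age_group)
print(ncube)
```

Recording the coding rule itself, and not only its output, is what lets a later user trace an aggregate back to the microdata it came from.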

[Diagram labels: Microdata with <DDI 3.0> Conceptual,
Datacollection, Variables, Categories, Codes; a
process that could be a recode, an aggregation, or
other process; Microdata/Aggregates with <DDI 3.0>
Codings, Variables (new), Categories (new), Codes
(new), NCubes]

38
Data Dissemination/Data Discovery

This use case concerns how DDI 3.0 can support
the discovery and dissemination of data.

39
Rich metadata supports auto-generation of
websites and other delivery formats
[Diagram labels: Microdata/Aggregates; <DDI 3.0> Full
metadata set; <DDI 3.0> Can add archival events
metadata; Codebooks; Websites; Databases,
repositories; Research Data Centers; Data-Specific
Info Access Systems; Registries; Catalogues;
Question/Concept/Variable Banks]
40
Archival Ingestion and Metadata Value-Add

This use case concerns how DDI 3.0 can support
the ingest and migration functions of data
archives and data libraries.

41
Supports automation of processing if good DDI
metadata is captured upstream
Provides a neutral format for data migration as
analysis packages are versioned
Provides a good format foundation for
value-added metadata by the archive
[Diagram labels: Microdata/Aggregates; <DDI 3.0> Full
metadata set (?); Ingest Processing; Data Archive,
Data Library; <DDI 3.0> Full or additional metadata,
Archival events]
42
Question/Concept/Variable Banks

This use case describes how DDI 3.0 can support
question, concept, and variable banks. These are
often termed "registries" or "metadata
repositories" because they contain only metadata;
links to the data are optional, but provide
implied comparability. The focus is metadata
reuse.

43
Because DDI has links, each type of bank
functions in a modular, complementary way.
Supports but does not require ISO 11179
[Diagram labels: Question Bank (<DDI 3.0> Questions,
Flow Logic, Codings); Variable Bank (<DDI 3.0>
Variables, Categories, Codes); Concept Bank
(<DDI 3.0> Concepts); each bank serves Users and
Applications]
44
DDI For Use within a Research Project

This use case concerns how DDI 3.0 can support
various functions within a research project, from
the conception of the study through collection
and publication of the resulting data.

45
[Diagram labels: Principal Investigator; Research
Staff; Collaborators; Submitted Proposal; <DDI 3.0>
Concepts, Universe, Methods, Purpose, People/Orgs;
<DDI 3.0> Funding, Revisions; <DDI 3.0> Questions,
Instrument; <DDI 3.0> Data Collection, Data
Processing; <DDI 3.0> Variables, Physical Stores;
Data; Archive/Repository; Publication; Presentations]

46
Capture of Metadata Regarding Data Use

This use case concerns how DDI 3.0 can capture
information about how researchers use data, which
can then be added to the overall metadata set
about the data sources they have accessed.

Types of Metadata
Recodes (datacollection module)
Record subsets (physicalinstance module)
Variable subsets (logicalproduct module)
Comparison (comparative module)

[Diagram labels: Data Sets with <DDI 3.0> StudyUnit,
DataCollection, LogicalProduct, PhysicalDataProduct,
PhysicalInstance; Data; Data Analysis; <DDI 3.0>
Recodes, Case Selection, Variable Selection,
Comparison to original study, Resulting physical
file descriptions]
48
Metadata Mining for Comparison, etc.

This use case concerns how collections of DDI 3.0
metadata can act as a resource to be explored,
providing further insight into the comparability
and other features of a collection of data.

Types of Metadata
Universe (comparative module)
Concept (comparative module)
Question (datacollection module)
Variable (logicalproduct module)

[Diagram labels: Metadata Repositories/Registries
(Questions, Variable, Concepts, Universe); <DDI 3.0>
Instances; Data Sets; "?"; <DDI 3.0> Comparison,
Questions, Categories, Codes, Variables, Universe,
Concepts, Recodes, Harmonizations]
50
Generating Instruction Packages/Presentations

This use case concerns how DDI 3.0 can support
automation around the instruction of students and
others.

Types of Metadata
Individual studies (studyunit module)
Grouping purpose (group module)
Linking information (comparative module)
Processing assistance (group module)

[Diagram labels: four separate instances, <DDI 3.0>
StudyUnit 1, <DDI 3.0> StudyUnit 2, <DDI 3.0>
StudyUnit 3 and <DDI 3.0> StudyUnit 4; a grouped
instance <DDI 3.0> StudyUnit 1, StudyUnit 2,
StudyUnit 3, StudyUnit 4; a grouped instance
<DDI 3.0> StudyUnit 1, StudyUnit 2, StudyUnit 3,
StudyUnit 4, Comparative, OtherMaterials]

Topically related studies selected
Group is made with description of the intended
use for the group
Comparative information is added indicating
matching fields for linking and mapping between
similar variables
Other materials such as SAS/SPSS recode commands
are referenced from the group

Instructional Package
52
DDI 3.0 Tools

Under development
DDI Foundation Tools Program
Road Map
XML Beans, validation,
DDI DExT, DDI2StatsProgs
Other tools
R SPSS Export, Algenta SurveyViz, others
presented at IASSIST
DDI Editing Suite
Proposed as extension of DDI-FTP
Plan for generic editor in 6-9 months
DDI 3.0 related projects / initiatives
RDC Canada, Germany RDC / EURASI, DANS MIXED, NORC

53
DDI 3 Relationship to Other Standards

SDMX (from microdata to indicators / time series)
Completely mapping to and from DDI NCubes
Dublin Core (surveys and documents get cited)
Mapping of citation elements
Option for DC namespace basic entry
ISO 19115 Geography (microdata gets mapped)
Search requirements
Support for GIS users
METS
Designed to support profile development
OAIS (alignment of archiving standards)
Reference model for the archival lifecycle
ISO/IEC 11179 (metadata mining through concepts)
Variable linking representation to concept and
universe
Optional data element construct in
ConceptualComponent that allows for complete
ISO/IEC 11179 structure as a maintained item

54
DDI 3.0 perspective
55
DDI 2.0 and DDI 3.0
56
DDI 2 / DDI 3

DDI 2                          DDI 3
Single survey                  Multiple surveys
Focus on the archive           Focus on life cycle
Non-reusable metadata          Highly reusable metadata
Maintained by single agency    Maintained by many agencies
Loose validation               Tight validation
DTD based                      Schema based
Sparse documentation           Extensive guide
Designed by archivists         Designed by expert groups
Some tools are available       Tools are beginning to emerge

57
What 3.0 can do for you

Manage multi-surveys
Support multiple contributors
Support many different perspectives
Support many different use cases
Maintain metadata integrity across the life cycle
Connect to other metadata spaces
Metadata reuse
Publication in registries
Backward compatibility with 2.0

58
DDI Community
59
DDI Organizations/ Agencies

DDI Alliance (http://www.ddialliance.org)
Inter-university Consortium for Political and
Social Research (ICPSR) (http://icpsr.umich.edu)
International Association for Social Science
Information Services and Technology (IASSIST)
(http://www.iassistdata.org)
International Household Survey Network (IHSN)
(http://www.surveynetwork.org)
Open Data Foundation (ODaF)
(http://www.opendatafoundation.org)
National Opinion Research Center Data Enclave
(NORC) (http://dataenclave.norc.org)
Metadata Technology
(http://www.metadatatechnology.com)

60
IZA Data Service Center DDI/SDMX Workshop
Wiesbaden, Germany, June 18th 2008
The Statistical Data and Metadata Exchange Standard
(SDMX): An Introduction

Arofan Gregory / Pascal Heus
agregory_at_opendatafoundation.org /
pheus_at_opendatafoundation.org
Open Data Foundation

61
Overview of the Session

SDMX Background and Goals
SDMX and Data
SDMX and Metadata
SDMX and Best Practices: The Content-Oriented
Guidelines
The SDMX Information Model
SDMX and Web Services
The SDMX Registry
SDMX Data Services
Tools and Resources

62
SDMX Background and Goals
63
What is SDMX?

The problem space
Statistical collection, processing, and exchange
is time-consuming and resource-intensive
Focus on aggregate data (esp. time series)
Various international and national organisations
have individual approaches for their
constituencies
Uncertainties about how to proceed with new
technologies (XML, web services, ...)

64
What is SDMX?

The Statistical Data and Metadata Exchange
(SDMX) initiative is taking steps to address
these challenges and opportunities that have just
been mentioned
By focusing on business practices in the field
of statistical information
By identifying more efficient processes for
exchange and sharing of data and metadata using
modern technology and open standards

65
Who is SDMX?

SDMX is an initiative made up of seven
international organizations
Bank for International Settlements
European Central Bank
Eurostat
International Monetary Fund
Organisation for Economic Cooperation and
Development
United Nations
World Bank
The initiative was launched in 2002

66

[Diagram labels: www.x.org; www.y.org; www.z.org;
www.hub.org; 180 Countries; Internet, Search,
Navigation]
67
SDMX Products

Technical standards for the formatting and
exchange of aggregate statistics
SDMX Technical Specifications version 1.0 (now
ISO/TS 17369 SDMX TC 154 WG2)
SDMX Technical Specifications version 2.0 (soon
to be submitted to ISO TC 154 WG2)
Content-Oriented Guidelines (in draft)
Common Metadata Vocabulary
Cross-Domain Statistical Concepts
Statistical Subject-Matter Domains

68
Major Features of SDMX

Structure and formats (XML, EDIFACT) for
aggregate data
Structure and formats (XML) for metadata
Formal information model (UML) for managing
statistical exchange and sourcing
Web-services guidelines and registry services
specification for use of modern technologies
Content-oriented guidelines to recommend best
practices

69
Recent Events

Jan 2007 Launch meeting at the World Bank for
SDMX 2.0 Technical Specifications
February 2007 Endorsement of SDMX by the EU's
Statistical Programme Committee
March 2008 SDMX becomes the preferred standard
for data and metadata of the UN Statistical
Commission
Other standards were mentioned: DDI and XBRL
specifically

70
Adopters/Interest

The following are known adopters (or planning to
adopt)
US Federal Reserve Board and Bank of New York
European Central Bank
Joint External Debt Hub (WB, IMF, OECD, BIS)
UN/TRADECOM at UN Statistical Division
NAAWE (National Accounts from OECD/Eurostat)
SODI (Eurostat and European Governments)
Mexican Federal System
Vietnamese Ministry of Planning and Investment
Qatar Information Exchange
IMF (BOP, SNA, SDDS/GDDS)
Food and Agriculture Organization
Millennium Development Goals (UN System, others)
International Labour Organization
Bank for International Settlements
OECD
World Bank
Marchioness Islands (Spanish/Portuguese
Statistical Region)
UNESCO (Education)

71
Rate of Adoption

Between January 2007 and January 2008, adoption
has doubled
We anticipate a similar rate of growth for the
coming year
Tools are becoming available
UNSC recommendation makes it a safe course to
follow for risk-averse institutions
Training courses are in increasing demand
(Eurostat, Metadata Technology)
Standard data and metadata structures for many
domains are being developed

72
SDMX and Data
73
SDMX and Data Formats

SDMX provides a format for describing the
structure of data (structural metadata)
EDIFACT (was GESMES/TS, now SDMX-EDI)
XML (SDMX-ML)
SDMX provides formats for transmission and
processing of data
EDIFACT (1 message)
XML (4 different equivalent flavors for different
functions)
Data is tabulated, aggregate data (e.g.,
multi-dimensional/OLAP cubes)
Can be any aggregate data!
Most data formats are derived from the structural
metadata (e.g., XML schemas are generated for each
type of structure according to the business
rules)

74
Data Set Structure
75
First Identify the Concepts

A statistical concept is a characteristic of a
time series or an observation (MCV)
A concept is a unit of knowledge created by a
unique combination of characteristics (SDMX
Information Model)
Whatever the definition, statistical concepts are
the DNA of the key family
Their usage (type, structure, sequence) define
the structure of the data

76
Data Set Structure: Concepts

Computers need structure of data
Concepts
Code lists
Data values
How these fit together

77
Data Set Structure: Code Lists
[Diagram labels: Code Lists; Concepts]
78
Data Makes Sense
Q,ZA,B,1,1999-06-30=16547
Quarterly, South Africa, Bank Loans, Stocks, for
30 June 1999
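
The way code lists turn such a record into something readable can be sketched as follows. This is an illustration, not an SDMX parser: it assumes the record is a comma-separated list of codes followed by the period, then "=" and the observation value, and the concept names (FREQ, REF_AREA, TOPIC, STOCK_FLOW, TIME_PERIOD, OBS_VALUE) are hypothetical labels. Only the four code meanings come from the slide.

```python
from datetime import date

# One code list per coded concept; the code meanings are those on the slide.
CODE_LISTS = {
    "FREQ": {"Q": "Quarterly"},
    "REF_AREA": {"ZA": "South Africa"},
    "TOPIC": {"B": "Bank Loans"},
    "STOCK_FLOW": {"1": "Stocks"},
}


def decode(record):
    """Translate 'code,code,...,period=value' into labelled fields."""
    key_part, value_part = record.split("=")
    *codes, period = key_part.split(",")
    concepts = list(CODE_LISTS)  # concept order defines the key structure
    if len(codes) != len(concepts):
        raise ValueError("record does not match the expected structure")
    decoded = {
        concept: CODE_LISTS[concept][code]
        for concept, code in zip(concepts, codes)
    }
    decoded["TIME_PERIOD"] = date.fromisoformat(period)
    decoded["OBS_VALUE"] = int(value_part)
    return decoded


observation = decode("Q,ZA,B,1,1999-06-30=16547")
for concept, value in observation.items():
    print(f"{concept}: {value}")
```

Without the concept order and the code lists the record is just a string, which is why the structure has to be exchanged alongside the data.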
79
Data Set Structure: Defining Multi-Dimensional
Structures

Comprises
Dimensions: concepts that identify the observation
value
Attributes: concepts that add additional metadata
about the observation value
Measure: the concept that is the observation value
Representation: any of these may be
coded
text
date/time
number
etc.
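
The three roles can be sketched as a small data structure in Python. The class and concept names are assumptions made for illustration and do not follow the SDMX-ML schemas; the sketch only shows how a structure definition lets software separate the identifying key, the extra attributes and the measured value of one observation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    role: str            # "dimension", "attribute" or "measure"
    representation: str  # "coded", "text", "date/time", "number", ...


@dataclass
class DataStructure:
    concepts: list

    def names(self, role):
        """Concept names with the given role, in declared order."""
        return [c.name for c in self.concepts if c.role == role]

    def split(self, observation):
        """Split a flat observation into (key, attributes, value)."""
        key = tuple(observation[n] for n in self.names("dimension"))
        attributes = {
            n: observation[n] for n in self.names("attribute") if n in observation
        }
        (measure,) = self.names("measure")  # exactly one measure expected
        return key, attributes, observation[measure]


# Hypothetical structure loosely modelled on the bank-loans example.
structure = DataStructure([
    Concept("FREQ", "dimension", "coded"),
    Concept("REF_AREA", "dimension", "coded"),
    Concept("TOPIC", "dimension", "coded"),
    Concept("STOCK_FLOW", "dimension", "coded"),
    Concept("TIME_PERIOD", "dimension", "date/time"),
    Concept("OBS_STATUS", "attribute", "coded"),
    Concept("OBS_VALUE", "measure", "number"),
])

key, attributes, value = structure.split({
    "FREQ": "Q", "REF_AREA": "ZA", "TOPIC": "B", "STOCK_FLOW": "1",
    "TIME_PERIOD": "1999-06-30", "OBS_STATUS": "A", "OBS_VALUE": 16547,
})
print(key, attributes, value)
```

Two observations with the same key refer to the same cell of the cube, whatever their attributes say.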
80
Data Set Structure: Concept Usage
[Diagram: eight concepts labelled, in order,
Dimension, Dimension, Attribute, Attribute,
Dimension, Dimension, Dimension, Measure]
81
SDMX and Metadata
82
SDMX and Metadata

SDMX provides for several types of metadata
Structural (describes structures of data sets and
metadata sets and related items)
Provisioning (describes the sourcing of data
between departments and organizations)
Reference metadata: all other types of
metadata (footnotes, methodology, quality, etc.).
Can be specified by the user!
Reference metadata is the most important one: it
is what we typically think of as metadata

83
SDMX Metadata Sets

Version 2.0 of the SDMX Technical Specifications
provides XML formats for metadata sets (SDMX-ML)
To describe their structure
To exchange metadata in XML
This is based on concepts (similar to the data
formats)
SDMX supports any metadata concepts the user
wishes to report/exchange/process
May be flat lists or hierarchical
Definitions provided by users, but
recommendations exist for many common concepts
Metadata sets are attached to a formal object in
the information model (an organization, a data
set, a codelist, etc.)

84
SDMX and Metadata

This is a very powerful feature of SDMX
It can be used to integrate/mimic other metadata
standards!
Provides very good support for standard exchange
of metadata which cannot be anticipated by the
designers of systems/standards
Must be based on common agreements about the
meaning of metadata concepts
Often, concepts are taken from other metadata
models/standards such as DDI, Dublin Core, etc.

85
The SDMX Information Model
86
The SDMX Information Model

A formal, documented conceptual model of
statistical exchange, management, and sourcing
Expressed as a UML model
Used as the basis of all SDMX implementation
XML
EDIFACT
Any other programming language/platform
Provides consistency between implementations
Based on analysis of many statistical processing
systems
Describes existing business practices in a
generic way

87
Information Model: High-Level Schematic
[Diagram boxes: Structure Maps; Data or Metadata
Structure Definition; Category Scheme; Category;
Data or Metadata Flow; Data or Metadata Set;
Provision Agreement; Data Provider; Registration of
Data or Metadata Set (URL, registration date, etc.)]
[Relationship labels: structure and code list maps;
comprises subject or reporting categories; uses
specific data/metadata structure; can be linked to
categories in multiple category schemes; conforms to
business rules of the data/metadata flow; can get
data/metadata from multiple data/metadata providers;
publishes/reports data/metadata sets; can have child
categories; can provide data/metadata for many
data/metadata flows using agreed data/metadata
structure; registers existence of data and metadata]
88
SDMX and Best Practices: The Content-Oriented
Guidelines
89
SDMX Content-Oriented Guidelines

There is a long history of discussion about what
is best practice in the collection of statistics
SDMX decided to define the technical basis for
statistical exchange, and then engage in this
debate
It makes reaching agreements between
organizations easier!
These documents build on many years of work
defining statistical concepts, terms, and
classifications
Although described as statistical, much of what
is here also applies to social science (and
other) research

90
SDMX Content-Oriented Guidelines

Four main documents
Overview
Metadata Common Vocabulary (annex)
Cross-Domain Concepts (2 annexes)
Statistical Subject-Matter Domains (annex)
These will not become ISO specifications, but
will evolve as publications of the SDMX
Initiative
They are now available in their first official
release at www.sdmx.org

91
Common Metadata Vocabulary

A set of terms and definitions for the different
parts of the SDMX technical standards, and many
common concepts used in data and metadata
structures
Does not replace other major vocabularies in this
space (such as the OECD glossary) but references
these other works

92
Cross-Domain Concepts

Includes concepts which are common across many
statistical domains
Names and definitions
Representations
Approximately 130 concepts, some with recommended
representations (codelists)
These are concepts which support both data and
metadata structures
Emphasis on quality frameworks for reference
metadata concepts

93
Statistical Subject-Matter Domains

Based on the UN/ECE classification of statistical
activities
Provides a classification system for use in
exchanging statistics across domain boundaries
Provides a breakdown of the various domains
within official statistics

94
SDMX and Web Services
95
Web-Services Components of SDMX

Web-Services Guidelines
Part of the Technical Specifications package
SDMX Query message
Part of SDMX-ML
SDMX Registry Services
Part of version 2.0 Technical Specifications
Interfaces are in SDMX-ML
Document describes implementation rules

96
Web Services Guidelines

Recommends use of WSS 1.1 for web services which
use SOAP, WSDL
Provides standard function names for many typical
web-services functions
Querying for data
Querying for metadata
Querying for structural information

97
SDMX Query Message

An XML Query to support two-way web-services
calls using XML messages
Designed to support
Queries for structural information from online
databases/repositories
Queries for data from online databases
Queries for metadata from online databases
Part of SDMX-ML
Similar in spirit to the SQL query language
supported by most database packages
Specific to SDMX objects
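
To make the idea of an XML query message concrete, here is a minimal sketch in Python. The element and attribute names (Query, Where, Equals, kind, concept) are hypothetical placeholders chosen for illustration and are not the actual SDMX-ML Query schema; the concept identifiers FREQ and REF_AREA are also assumed examples.

```python
import xml.etree.ElementTree as ET


def build_query(kind, filters):
    """Build an illustrative XML query document.

    Element names are hypothetical placeholders, NOT the real
    SDMX-ML Query schema.
    """
    if kind not in ("data", "metadata", "structure"):
        raise ValueError(f"unsupported query kind: {kind}")
    root = ET.Element("Query", {"kind": kind})
    where = ET.SubElement(root, "Where")
    # One condition per concept, in a stable (sorted) order.
    for concept, value in sorted(filters.items()):
        condition = ET.SubElement(where, "Equals", {"concept": concept})
        condition.text = value
    return ET.tostring(root, encoding="unicode")


query_xml = build_query("data", {"FREQ": "A", "REF_AREA": "DE"})
print(query_xml)
```

The same function covers the three kinds of query listed above (data, metadata, structural information); a real service would validate the message against the published schema instead.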

98
SDMX Registry Services

A registry is a common type of technology
Every Windows machine has a Windows registry to
let applications know what other applications are
on that machine, and where they are located
Web services registries do the same thing on a
network
Functions like a card catalogue in a print
library: you can look up resources and find out
how to obtain them
A registry provides a single place on the
Internet where everyone can discover the data,
metadata, and structures that other organizations
use/publish
It does not contain the data and metadata; it
just indexes them and links to them
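
The card-catalogue idea can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the SDMX Registry interface: the class names, field names, agency names, and example.org URLs are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    provider: str  # who publishes the resource (hypothetical field)
    dataflow: str  # which data/metadata flow it belongs to
    url: str       # where the actual file lives


class Registry:
    """Indexes resources and links to them; never stores the data itself."""

    def __init__(self):
        self._entries = []

    def register(self, registration):
        self._entries.append(registration)

    def query(self, dataflow):
        """Return the URLs of every registered resource for a dataflow."""
        return [r.url for r in self._entries if r.dataflow == dataflow]


registry = Registry()
registry.register(Registration("AGENCY_A", "EXTERNAL_DEBT", "http://example.org/a/debt.xml"))
registry.register(Registration("AGENCY_B", "EXTERNAL_DEBT", "http://example.org/b/debt.xml"))
registry.register(Registration("AGENCY_A", "POPULATION", "http://example.org/a/pop.xml"))
print(registry.query("EXTERNAL_DEBT"))
```

Note that the registry only ever holds URLs: the data stays on each provider's own site.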

99
SDMX Registry Services (cont.)

SDMX Registry Services are based on generic,
standard web-services registry technology
ISO 15000 ebXML Registry/Repository
OASIS UDDI Registry (part of .NET, etc.)
SDMX Registry Services are not generic
They are specific to SDMX exchanges of data and
metadata, etc.
There is not one central SDMX Registry
Each domain will have its own registry for its
members
The registries can be linked (federated)

100
SDMX Registry/Repository
SDMX Registry Interfaces
REGISTRY Data Set/Metadata Set
Indexes data and metadata
Interfaces: Register, Query
REPOSITORY Provisioning Metadata
Describes data and metadata sources and reporting
processes
Interfaces: Submit, Query
REPOSITORY Structural Metadata
Describes data and metadata structures
Interfaces: Submit, Query
101
SDMX Registry/Repository
SDMX Registry Interfaces
REGISTRY Data Set/Metadata Set
Indexes data and metadata
Interfaces: Register, Query
Subscription/Notification: applications can
subscribe to notification of new or changed
objects
REPOSITORY Provisioning Metadata
Interfaces: Submit, Query
REPOSITORY Structural Metadata
Describes data and metadata structures
Interfaces: Submit, Query
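
Subscription/notification can be pictured as a simple callback mechanism. The sketch below is an assumption-laden illustration in Python: the class, method names, and the object identifiers CL_AREA and CL_OTHER are hypothetical and do not reflect the actual SDMX Registry messages.

```python
from collections import defaultdict


class NotifyingRegistry:
    """Toy registry that notifies subscribers of new or changed objects."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # object id -> callbacks
        self._versions = {}                    # object id -> latest version

    def subscribe(self, object_id, callback):
        self._subscribers[object_id].append(callback)

    def submit(self, object_id, version):
        """Record a new or changed object and notify its subscribers."""
        event = "changed" if object_id in self._versions else "new"
        self._versions[object_id] = version
        for callback in self._subscribers[object_id]:
            callback(object_id, version, event)


received = []
notifier = NotifyingRegistry()
notifier.subscribe("CL_AREA", lambda oid, ver, ev: received.append((oid, ver, ev)))
notifier.submit("CL_AREA", "1.0")
notifier.submit("CL_AREA", "1.1")
notifier.submit("CL_OTHER", "1.0")  # nobody subscribed, so no notification
print(received)
```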
102
The Old JEDH Site
Data providers: BIS, IMF, OECD, World Bank
(Various Formats)
WEBSITE
(3-month production cycle)
103
JEDH with SDMX
Data providers (each publishing SDMX-ML): BIS,
IMF, OECD, World Bank
SDMX Registry: info about data is registered;
used to discover data and URLs
SDMX Agent: retrieves data from sites; SDMX-ML
loaded into JEDH DB
JEDH Site: data provided in real time to site
(SDMX-ML)
(Debtor database)
104
Recent and On-Going Developments

Many organizations using SDMX have been
implementing web services
There is growing interest in forming a working
group to further extend the specification for use
with web-services technology
Standard error messages
Expanded function calls
Standard WSDLs
If you are interested in this, please tell me!

105
Tools and Resources
106
SDMX Tools

There are now several sources for SDMX tools
All are free or open-source
Eurostat: complete package of tools for data,
metadata, and registry services
Metadata Technology Ltd: similar package of
tools
Data editors are usually based on Excel
Some other tools
Open Data Foundation: SDMX Browser for data
visualization
OECD, ECB, and UN/Statistical Division provide
some other tools for specific applications
Integration with PC-Axis has been prototyped, to
be available this summer
DevInfo has SDMX support
FAME is developing SDMX support
Commercial vendors provide good support through
web-services functionality
E.g., Oracle 11, .NET, etc.

107
Resources

The SDMX Initiative Site: http://www.sdmx.org
The SDMX Toolkit and Forums:
http://www.metadatatechnology.com
Various papers and (soon) open-source tools:
http://www.opendatafoundation.org

108
IZA Data Service Center DDI/SDMX
Workshop, Wiesbaden, Germany, June 18th
2008: SDMX, DDI, and Other Standards

Arofan Gregory / Pascal Heus
agregory_at_opendatafoundation.org /
pheus_at_opendatafoundation.org
Open Data Foundation

109
Overview of the Session

DDI/SDMX Philosophy and Timing of Standards
Development
DDI/SDMX Points of Functional Overlap
DDI/SDMX Direct Mappings
DDI/SDMX Integration Approaches
Other Related Standards and On-Going Work

110
DDI/SDMX Philosophy and Timing of Development
111
Development Philosophies/Timing

Unlike many standards bodies, both the SDMX
Initiative and the DDI Alliance have attempted to
create standards which do not duplicate existing
efforts
There is an awareness that users need to deal
with several different standards
DDI (3.0) and SDMX were both intentionally
aligned with other, related standards
DDI 1.x/2.x existed before SDMX
It was largely self-contained
SDMX was created before DDI 3.0 existed
Created with an awareness of DDI 1.x/2.x
DDI 3.0 benefited from having SDMX as a published
specification
Actively aligned with SDMX and many other
standards

112
SDMX Design

SDMX was intentionally designed to accommodate
integration of standards which are used with the
inputs to aggregate data
This included DDI and XBRL
Mechanism for integration is generic
The key point for this integration is the SDMX
Registry
It provides links between aggregate (SDMX) data
sets, and also to source data and metadata

113
DDI/SDMX Points of Functional Overlap
114
SDMX and DDI as Complementary

DDI is designed to document micro-data
1.x/2.x versions were archival, after-the-fact
documentation
3.0 version covers entire life cycle, but still
has an after-the-fact function
SDMX is designed as a standard for processing and
automation
It is not documentary, but is aimed at automation
of statistical systems and exchanges
These purposes are related, but not duplicative
SDMX and DDI can both do useful things within a
single system

115
Examples

DDI could be used to document SDMX-based
aggregates more completely for archival purposes
DDI could be used to document the micro-data on
which aggregates are based
As soon as tabulation occurs, SDMX can be used to
describe and format the data
SDMX can describe micro-data, but it is not very
useful for that purpose
DDI can be used to automate processing of
multi-dimensional data cubes, but it is more
difficult than with SDMX
SDMX can be used to link DDI instances with other
types of standard data and metadata (including
both SDMX and DDI)

116
DDI and SDMX
SDMX: Aggregated data; Indicators, Time Series;
Across time; Across geography; Open Access;
Easy to use
DDI: Microdata; Low-level observations; Single
time period; Single geography; Controlled access;
Expert Audience

Microdata is an important source of
aggregated data
Crucial overlap and mappings exist between both
worlds (but commonly undocumented)
Interoperability provides users with a full
picture of the production process

117
Generic Process Example
DDI: Survey/Register, Raw Data Set,
Micro-Data Set/Public Use Files
SDMX: Aggregate Data Set (Lower Level),
Aggregate Data Set (Higher Level), Indicators
Processing steps: Anonymization, cleaning,
recoding, etc.; Tabulation, processing, case
selection, etc.; Aggregation, harmonization
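
The step from micro-data to aggregates can be illustrated with a small Python sketch. The records, variable names, and dimensions below are invented for illustration; the point is only to show tabulation into a lower-level aggregate and a further roll-up into a higher-level one.

```python
from collections import Counter

# Hypothetical micro-data: one record per respondent.
microdata = [
    {"region": "North", "sex": "F", "employed": True},
    {"region": "North", "sex": "M", "employed": False},
    {"region": "South", "sex": "F", "employed": True},
    {"region": "South", "sex": "F", "employed": True},
    {"region": "South", "sex": "M", "employed": True},
]


def tabulate(records, dimensions):
    """Count records per combination of dimension values (a simple cube)."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)


# Case selection, then tabulation: employed persons by region and sex.
employed = [r for r in microdata if r["employed"]]
by_region_sex = tabulate(employed, ["region", "sex"])

# Further aggregation: roll the sex dimension up to get totals by region.
by_region = Counter()
for (region, _sex), count in by_region_sex.items():
    by_region[region] += count

print(dict(by_region_sex))
print(dict(by_region))
```

In the terms of this slide, the list of records is what DDI would document, and the two counters are the kind of multi-dimensional output that SDMX is designed to describe and exchange.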
118
DDI and SDMX?

When you have data which has been
tabulated/aggregated, it may be useful to have
both SDMX and DDI
SDMX for processing and exchanging the data
DDI for documenting these processes, in case they
are of interest to researchers
DDI has a much richer descriptive capability for
addressing the exact processes used in
statistical packages
SDMX is easier to process

119
DDI/SDMX Direct Mappings
120
Direct Mappings: DDI and SDMX

IDs and referencing use the same approach
(identifiable, versionable, maintainable;
structured URN syntax)
Both are organized around schemes
Reusable packages of data, similar to relational
tables in databases
Both describe multi-dimensional data
A clean cube in DDI maps directly to/from SDMX
Both have concepts and codelists
DDI has much less emphasis on concepts
SDMX emphasizes concepts because they are needed
for comparison
Both contain mappings (comparison) for codes
and concepts
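
The shared identification approach (a maintaining agency, an identifier, and a version combined into a structured URN) can be sketched as follows. The urn:example prefix and the field layout are placeholders assumed for illustration; they are not the actual DDI or SDMX URN syntax, which should be taken from the respective specifications.

```python
from typing import NamedTuple

PREFIX = "urn:example"  # placeholder, NOT the real DDI or SDMX namespace


class Ref(NamedTuple):
    agency: str     # maintainable: who maintains the object
    object_id: str  # identifiable: the object's identifier
    version: str    # versionable: which version is referenced


def to_urn(ref):
    """Serialize a reference as a structured URN string."""
    return f"{PREFIX}:{ref.agency}:{ref.object_id}:{ref.version}"


def parse_urn(urn):
    """Parse a URN produced by to_urn back into its three parts."""
    if not urn.startswith(PREFIX + ":"):
        raise ValueError(f"unexpected prefix in {urn!r}")
    parts = urn[len(PREFIX) + 1:].split(":")
    if len(parts) != 3:
        raise ValueError(f"expected agency:id:version in {urn!r}")
    return Ref(*parts)


ref = Ref("EXAMPLE_AGENCY", "CL_SEX", "1.0")
urn = to_urn(ref)
print(urn)
print(parse_urn(urn))
```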

121
Formal Mapping

There is on-going work to describe a formal
mapping between SDMX and DDI
It will cover these direct correspondences
They are quite obvious: a code maps to a code, a
concept to a concept, etc.
There are currently no dedicated tools, because
generic tools such as XSLT will work for this
transformation
Drafts of this work are expected this summer, as
part of the SDMX submission to ISO for the
version 2.0 Technical Specifications
The direct mappings are the easy part!
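
A direct mapping of this kind is essentially a field-by-field copy. The Python sketch below assumes simplified dictionary layouts as stand-ins for the DDI and SDMX XML structures; the key names (label, description, and so on) are hypothetical and chosen only to show that a code maps to a code.

```python
def ddi_to_sdmx_codelist(ddi_scheme):
    """Map a simplified DDI-style code scheme to an SDMX-style codelist.

    Both dictionary layouts are hypothetical stand-ins for the XML.
    """
    return {
        "id": ddi_scheme["id"],
        "agency": ddi_scheme["agency"],
        "version": ddi_scheme["version"],
        "codes": [
            {"value": code["value"], "description": code["label"]}
            for code in ddi_scheme["codes"]
        ],
    }


ddi_scheme = {
    "id": "SEX",
    "agency": "EXAMPLE_AGENCY",
    "version": "1.0",
    "codes": [
        {"value": "1", "label": "Male"},
        {"value": "2", "label": "Female"},
    ],
}
sdmx_codelist = ddi_to_sdmx_codelist(ddi_scheme)
print(sdmx_codelist)
```

In practice the same one-to-one copy would be written as an XSLT stylesheet over the XML documents, as the slide suggests.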

122
Issues with Direct Mapping

It is possible to describe everything in the DDI
as an SDMX Metadata Set
This is probably not the best way to use SDMX
with DDI!
It is usually better to select the important
fields, and keep the rest in native DDI format
When you map from DDI to SDMX, you typically will
not carry much of the descriptive metadata,
question text, etc.
Mostly structural (codelists, dimensions,
attributes, concepts)
You must have concepts for SDMX which are not
always present in DDI
Going from SDMX to DDI, it is not always possible
to map all the data
Especially for SDMX Metadata Sets, which may have
user-configured concepts that don't always exist
in DDI
Note that SDMX-DDI mappings refer to all versions
of DDI

123
DDI/SDMX Integration Approaches
124
Integration Use Cases

The most important aspect of DDI/SDMX
integration is understanding what the use cases
are
This defines what mapping/transformation is
needed
It also defines what links need to be stored
between data and metadata files
There are some common use cases
DDI used to describe and link microdata inputs to
SDMX aggregates
DDI used to more fully document SDMX aggregates
for dissemination to users
Using the SDMX Registry as a lifecycle management
tool for DDI, SDMX, etc.

125
Linking Source Data and Aggregates

DDI provides a wealth of information about the
micro-data which serves as an input to SDMX
aggregates
It is possible to capture these links in SDMX, at
the cell level or higher, to provide automated
access to source data
An SDMX registry can be used to provide easy
access to these links
The user/collector of aggregate data can access
the rich DDI metadata, and possibly the data (if
they have access rights)
It is possible to automatically generate SDMX
output from the DDI metadata describing
tabulation of micro-data
This may not be useful if the desired SDMX target
is a standard cube structure described by another
organization
It may make transformation to the standard cube
easier, however
The SDMX Registry provides a good tool for
managing links
Links between SDMX and DDI files are stored as
Metadata Reports
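
Linking "at the cell level or higher" can be sketched as a lookup keyed by data set and, optionally, by cell. This is a toy Python model under stated assumptions: the class, the data set identifier, the cell key format, and the example.org URLs are hypothetical and do not represent the SDMX Metadata Report format.

```python
class SourceLinks:
    """Toy store of links from SDMX aggregates to DDI source documents."""

    def __init__(self):
        # (dataset_id, cell_key or None) -> list of DDI document URLs
        self._links = {}

    def attach(self, dataset_id, ddi_url, cell_key=None):
        """Attach a DDI link to one cell, or to the whole data set."""
        self._links.setdefault((dataset_id, cell_key), []).append(ddi_url)

    def sources_for(self, dataset_id, cell_key):
        """Return cell-level links followed by data-set-level links."""
        cell_links = self._links.get((dataset_id, cell_key), [])
        dataset_links = self._links.get((dataset_id, None), [])
        return cell_links + dataset_links


links = SourceLinks()
links.attach("EMPLOYMENT", "http://example.org/ddi/labour-survey.xml")
links.attach("EMPLOYMENT", "http://example.org/ddi/north-supplement.xml",
             cell_key=("North", "F"))
print(links.sources_for("EMPLOYMENT", ("North", "F")))
print(links.sources_for("EMPLOYMENT", ("South", "M")))
```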

126
Demo: SDMX/DDI Source Links
127
DDI and SDMX for Dissemination

Typically, the full DDI documentation is not
provided on web-sites which publish
aggregates/indicators
SDMX is becoming a popular dissemination format
for these data
It has been shown to increase the use of data on
the Web
If the DDI documentation is available, this could
also be delivered as additional documentation
Especially useful at study level
Links could be directly embedded in SDMX data
files as attributes or stored in an SDMX
Registry, or both

128
The SDMX Registry for Lifecycle Management

The SDMX Registry provides a tool for tracking
the sources of data for aggregates
It can also track the transformation of versions
of DDI as the data moves through the lifecycle
There is an SDMX model for processes
This can be used to describe the DDI lifecycle
model
SDMX Metadata Reports can be used to link DDI
metadata to specific stages of the DDI lifecycle,
and to each other
Applications could query the SDMX Registry to
discover all of the DDI metadata produced
upstream, as micro-data is collected and processed

129
Demos

SDMX Metadata Report used to express DDI metadata
SDMX Metadata Report used to link DDI instances

130
Other Related Standards and On-Going Work
131
Many Related Standards

DDI
SDMX
ISO/IEC 11179: concept management and semantic
modelling
ISO 19115: Geographical metadata
METS: packaging/archiving of digital objects
PREMIS: Archival lifecycle metadata
XBRL: business reporting
Dublin Core: citation metadata
Standard mappings are being defined by people
from many different organizations (see
presentation from METIS 2008 in Luxembourg)

132
ISO/IEC 11179

ISO/IEC 11179 is used to describe the meanings
and representations of terms and concepts
Both SDMX and DDI are aligned with ISO/IEC 11179
SDMX and DDI concepts can be defined using the
ISO/IEC 11179 attributes
Codelists and categories can be directly mapped
(and other representations)
ISO/IEC 11179 can be implemented with DDI
(directly, for concepts) and/or with SDMX (as a
Metadata Report)
ISO/IEC 11179 has no standard expression in XML;
it is just a model

133
ISO 19115 Geographical Metadata

ISO 19115 describes geographies (bounding boxes
for countries, etc.)
DDI uses the ISO 19115 model in its own XML
It does not use the standard ISO 19115 XML
format, but there is a 1-to-1 mapping
SDMX could model ISO 19115 if desired
Linking to DDI or ISO 19115 XML is probably more
useful, using the standard SDMX mechanism
Most geographies in SDMX aggregate data sets are
coded, not directly described

134
METS

METS is used to package a set of files which work
together as a digital object
Both DDI and SDMX metadata could be placed inside
a METS wrapper
They would be metadata sections
The primary use case would be for archiving of a
set of related data and metadata files, possibly
with other related materials such as research
publications

135
PREMIS

PREMIS allows for the capture of administrative
metadata as a collection is placed and managed
within the archive
DDI and SDMX files would be treated like any
other files forming part of the collection
Both may contain metadata which can be extracted
and used to populate PREMIS instances (access
levels, confidentiality, etc.)

136
XBRL

XBRL is used by businesses to report required
information to national supervisory bodies
This includes banking supervision and other
economic data
XBRL is a source format for some aggregate
statistics
XBRL International and the SDMX Sponsors are
working together to define a cross-walk between
the two standards

137
Dublin Core