Title: Workshop on Metadata Standards and Best Practices, November 19-20th, 2007, Session 2: Metadata specifications for socio-economic science and supporting initiatives
1 Workshop on Metadata Standards and Best Practices
November 19-20th, 2007
Session 2: Metadata specifications for socio-economic science and supporting initiatives
- Pascal Heus
- Open Data Foundation
- pheus_at_opendatafoundation.org
- http://www.opendatafoundation.org
2 Outline
- Metadata specifications
- Key players
- Ongoing initiatives
- Conclusions / QA
3 What is Metadata?
- Common definition: Data about Data
4 What are XML specifications? (1)
- XML is a language that facilitates the capture of descriptive elements and attributes
- Different objects carry different characteristics (book, car, weather)
- We need to agree on a common set of descriptive elements (semantics)
- Just as we do when designing a database, we have to describe the structure
- This modeling process creates a Document Type Definition (DTD) or an XML Schema
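As a rough illustration of the points above, the sketch below captures a "book" object as XML elements and attributes and reads it back with Python's standard library. The element names, attribute names and sample values are invented for this example and do not come from any published specification. The required-elements check is only a simplified stand-in for the structural rules that a real DTD or XML Schema would express.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: all element and attribute names are illustrative only.
BOOK_XML = """
<book isbn="0-000-00000-0" language="en">
  <title>An Example Book</title>
  <author>Jane Doe</author>
  <published>2007</published>
</book>
"""

# Assumed "agreed" set of descriptive elements for a book (not a real schema).
REQUIRED_ELEMENTS = ("title", "author", "published")


def describe(xml_text):
    """Return the descriptive elements and attributes of one XML record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "object": root.tag,
        "attributes": dict(root.attrib),
        "elements": {child.tag: child.text for child in root},
    }


def missing_elements(xml_text):
    """List the agreed elements that are absent from the record."""
    root = ET.fromstring(xml_text.strip())
    present = {child.tag for child in root}
    return [name for name in REQUIRED_ELEMENTS if name not in present]


record = describe(BOOK_XML)
print(record["object"], record["attributes"], record["elements"])
print(missing_elements("<book><title>T</title></book>"))
```

A production setup would validate against the DTD or XML Schema itself with a schema-aware parser instead of a hand-written list of names.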
5 What are XML specifications? (2)
- Specifications are made available to the general public on the web
- Usually a URL
- Can be turned into a standard (ISO)
- Typically maintained by a consortium of agencies
- Independent model
- OASIS, W3C
- ISO
6 A suggested set for socio-economic data
- Statistical Data and Metadata Exchange (SDMX)
- Macrodata, time series, indicators, registries
- http://www.sdmx.org
- Data Documentation Initiative (DDI)
- Microdata (surveys, studies)
- http://www.ddialliance.org
- ISO 11179
- Semantic modeling, concepts, registries
- http://metadata-standards.org/11179/
- ISO 19115
- Geography
- http://www.isotc211.org/
- Dublin Core
- Resources (documentation, images, multimedia)
- http://www.dublincore.org
7 Statistical Data and Metadata Exchange (SDMX)
- Purpose: Exchange of statistical information (time series/indicators).
- Covers the metadata capture as well as implementation of registries.
- Currently version 2.0 and also an ISO standard (17369:2005)
- Sponsors: Bank for International Settlements (BIS), European Central Bank (ECB), EUROSTAT, International Monetary Fund (IMF), Organization for Economic Cooperation and Development (OECD), United Nations (UN), World Bank
- Can actually be used for many other purposes. It's a metadata metadata model.
- http://www.sdmx.org
8 Data Documentation Initiative 1/2.x
- Purpose: Archive and document survey microdata
- Effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences
- Sections: document, survey, files, variables, other material
- Used by data archives (producers) and librarians
- Sponsors: DDI Alliance
- http://www.ddialliance.org
9 Data Documentation Initiative 3.0
- Purpose: Document the survey life cycle
- Major shift from DDI 1/2.x
- Currently in candidate recommendation, release in 2008
- Sponsors: DDI Alliance
- http://www.ddialliance.org/ddi3
10 DDI and SDMX
- Are complementary specifications
- DDI 3.0 and SDMX 2.0 have been designed to work with each other
- SDMX registries can wrap DDI documents
- Microdata: single point in time / geography, high level of detail (for statisticians, researchers)
- Macrodata: high-level indicators across time and geography (for economists, policy makers)
- Using DDI and SDMX together allows linkages and drilling down from an indicator to its source
- See "DDI and SDMX: Complementary, Not Competing, Standards", A. Gregory, P. Heus, July 2007, available at http://www.opendatafoundation.org/?lvl1=resources&lvl2=papers
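The drill-down idea can be sketched in a few lines of Python. This is only a conceptual illustration: the `Survey` and `Observation` classes, their fields and the sample figures are hypothetical stand-ins for a DDI-documented microdata file and an SDMX-style time series point, not structures defined by either specification.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Survey:
    """Stand-in for one DDI-documented microdata file (hypothetical fields)."""
    survey_id: str
    year: int
    incomes: list  # one value per respondent


@dataclass
class Observation:
    """Stand-in for one SDMX-style time series point (hypothetical fields)."""
    indicator: str
    year: int
    value: float
    source_id: str  # link back to the survey the value was computed from


def build_series(surveys):
    """Aggregate each survey (microdata) into one indicator value (macrodata)."""
    ordered = sorted(surveys, key=lambda s: s.year)
    return [
        Observation("mean_income", s.year, mean(s.incomes), s.survey_id)
        for s in ordered
    ]


def drill_down(observation, surveys):
    """Follow the link from an indicator value back to its source survey."""
    by_id = {s.survey_id: s for s in surveys}
    return by_id[observation.source_id]


surveys = [
    Survey("hs2006", 2006, [100, 200, 300]),
    Survey("hs2007", 2007, [150, 250]),
]
series = build_series(surveys)
source = drill_down(series[-1], surveys)
print(series[-1].indicator, series[-1].year, "comes from", source.survey_id)
```

The key design point is the `source_id` link: the indicator carries a reference to the documented survey, so a user can move from the published figure to the detailed records behind it.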
11 ISO 11179
- Purpose: Manage registries / concepts
- International standard for representing metadata for an organization in a Metadata Registry (a central location in an organization where metadata definitions are stored and maintained in a controlled method)
- Compliance with this standard is important and both DDI 3.0 and SDMX have mapping mechanisms
- Sponsors: ISO/IEC Joint Technical Committee on Metadata Standards
- http://metadata-standards.org/
12 ISO 19115
- Purpose: Capture geography
- It is a component of the series of ISO 191xx standards for geospatial metadata.
- ISO 19115 defines how to describe geographical information and associated services, including contents, spatial-temporal extent, data quality, access and rights to use.
- Compliance in DDI 3.0
- Sponsors: ISO/TC 211 Geographic information/Geomatics
- http://www.isotc211.org/
13 Dublin Core
- Purpose: Describe resources
- Standard for cross-domain information resource description
- Widely used to describe digital materials such as video, sound, image, text, and composite media
- Small core set of elements
- Used for survey documentation
- Sponsors: Dublin Core Metadata Initiative
- http://dublincore.org/
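To give a feel for how small the core element set is, the sketch below builds a Dublin Core description of a survey document with Python's standard library. The namespace URI is the published Dublin Core element set namespace. The wrapper element `metadata`, the helper `dublin_core_record` and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record with one dc:* element per field.

    The <metadata> wrapper is an arbitrary choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Hypothetical survey questionnaire described with a few core elements.
record = dublin_core_record({
    "title": "Household Survey 2007 Questionnaire",
    "creator": "Example Statistics Office",
    "date": "2007",
    "type": "Text",
    "format": "application/pdf",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```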
14 Advantages of XML metadata
- Metadata is easy to transform
- From one standard to another or into a different format
- DDI to SDMX, Dublin Core, MARC
- To other formats for presentation
- HTML, PDF
- Metadata is easy to exchange
- Web services (SOAP, REST, etc.)
- Metadata is searchable
- XPath, XQuery
- All these are native features of XML
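The search and transformation points can be sketched with Python's standard library, which supports a limited subset of XPath. The XML below is an invented, simplified survey description and not actual DDI markup. A real deployment would more likely use XSLT or XQuery against the published schemas.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical survey metadata: element and attribute names are illustrative.
SURVEY_XML = """
<survey id="hs2007">
  <title>Household Survey 2007</title>
  <variable name="age" type="numeric"><label>Age in years</label></variable>
  <variable name="sex" type="categorical"><label>Sex of respondent</label></variable>
  <variable name="income" type="numeric"><label>Monthly income</label></variable>
</survey>
"""

root = ET.fromstring(SURVEY_XML.strip())

# Search: an XPath-style predicate selects only the numeric variables.
numeric = [v.get("name") for v in root.findall(".//variable[@type='numeric']")]


def to_html(survey):
    """Transform the metadata into a small HTML fragment for presentation."""
    items = "".join(
        f"<li>{escape(v.get('name'))}: {escape(v.findtext('label'))}</li>"
        for v in survey.findall("variable")
    )
    return f"<h1>{escape(survey.findtext('title'))}</h1><ul>{items}</ul>"


html_page = to_html(root)
print(numeric)
print(html_page)
```

The same source document serves both the query and the presentation, which is the practical benefit the list above describes.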
15 PART 2: Active agencies and ongoing initiatives
16 DDI Alliance
- Membership based organization
- Agencies: ICPSR, World Bank, Open Data Foundation
- National data archives: Danish, Finnish, Dutch, Norwegian, Swiss, UK
- Germany: Centre for Survey Research and Methodology (ZUMA), German Socio-Economic Panel Study (SOEP), Zentralarchiv fuer Empirische Sozialforschung (University of Koeln)
- Universities: Alberta, Berkeley, Guelph, Harvard/MIT, Minnesota, etc.
- Steering and Expert Committee
- Meets annually at IASSIST
- http://www.ddialliance.org
17 ICPSR
- The Inter-university Consortium for Political and Social Research
- The world's largest archive of digital social science data
- Acquire and preserve social science data
- Provide open and equitable access to these data
- Promote effective data use
- Home of the DDI Alliance
- http://www.icpsr.umich.edu
18 International Household Survey Network
- Partnership of international organizations seeking to improve the availability, quality and use of survey data in developing countries
- United Kingdom Department for International Development (DfID), International Labor Organization (ILO), Partnership for Statistics in the 21st Century (PARIS21), United Nations Children's Fund (UNICEF), United Nations Statistics Division (UNSD), World Health Organization and the Health Metrics Network (WHO/HMN), World Bank
- Plays a major role in the adoption of DDI around the globe, active in many developing countries
- Developer of the Microdata Management Toolkit
- http://www.surveynetwork.org
19 Open Data Foundation
- US based non-profit organization
- Adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data
- Coordination of development efforts
- Board of directors, advisors and management group
- Open to individual membership; institutional association is through projects
- http://www.opendatafoundation.org
20 Metadata Technology
- UK based private company
- Consulting services and development of tools based on open standards and open source
- Training services, registry services, metadata repositories, hosting
- Focus on SDMX, DDI and related standards
- http://www.metadatechnology.com
21 IASSIST
- International Association for Social Science Information Service and Technology
- IASSIST is an international organization of professionals working in and with information technology and data services to support research and teaching in the social sciences.
- Individual based membership
- Primary platform for DDI community
- Annual conference
- 2008: Stanford, CA; 2009: Tampere, Finland
- DDI Alliance annual meeting
- http://www.iassistdata.org/
22 DDI Foundation Tools Program
- Initiative aiming at the development of a Foundation Framework and a Toolkit to support the implementation of DDI applications and utilities (open source)
- MOU established September 2007, 2-year program (renewable on an annual basis afterwards)
- Canada Research Data Centre Network, Danish Data Archive, DDI Alliance, GESIS-ZUMA, National Opinion Research Center (NORC), Open Data Foundation (ODaF), and the UK Data Archive (UKDA)
- Web site coming soon
23 UKDA Data Exchange Tools (DExT)
- Aims to develop, refine and test models for data exchange for both survey data and qualitative research data based on XML/RDF schema, and to develop tools for data import and export
- Researches the feasibility of developing automated conversion procedures for legacy formats
- ODaF currently involved in data conversion tool and qualitative metadata (QuDExT)
- http://www.data-archive.ac.uk/dext/
24 NORC Data Enclave
- National Opinion Research Center
- Provides a secure environment within which authorized researchers can access sensitive microdata remotely from their offices or onsite
- Data from the National Institute of Standards and Technology's (NIST) Technology Innovation Program (TIP), the Ewing Marion Kauffman Foundation, and the Economic Research Service at the US Department of Agriculture
- Possibly the first virtual data enclave
- http://dataenclave.norc.org
25 Canada RDC Project
- Consists of 14 Research Data Centres, 6 branch RDCs and the Federal Research Data Centre in Ottawa
- Data provided by Statistics Canada
- RDCs are now connected through a high speed secure network
- Project to adopt a DDI 3.0 based metadata framework for survey documentation and research work and sponsor development of tools
- ODaF providing technical assistance
- http://www.statcan.ca/english/rdc/index.htm
26 EU 7th Research Framework Program
- Under Socio-economic Sciences and Humanities: related specific 2007 objectives to bring together existing research infrastructures to support the efficient provision of essential research services
- INFRA-2008-1.1.2.27: promoting European-wide access to microdata sets of official statistics for research and leading to a European statistical system open to researchers.
- INFRA-2008-1.1.2.28 (through the development, harmonisation and optimal use of indicators and data for economic and innovation research)
- INFRA-2008-1.1.2.29 (developing improved access to historical archives and cultural collections for research purposes).
- Call coming out this month (due mid-Feb)
- Proposal will be made for RDC networking/remote access, data disclosure and metadata (Germany contact is Stefan Bender at IAB Nurnberg RDC)
27 Conclusions
- Metadata specifications available but need tools
- Lots of complementary ongoing initiatives and potential synergies
- Need coordination and partnerships (ODaF)