Workshop on Metadata Standards and Best Practices, November 19-20th, 2007, Session 1: Leveraging Metadata Standards in RDC

1
Workshop on Metadata Standards and Best Practices
November 19-20th, 2007
Session 1: Leveraging Metadata Standards in RDC
  • Pascal Heus
  • Open Data Foundation
  • pheus_at_opendatafoundation.org
  • http://www.opendatafoundation.org

2
Outline
  • PART 1: General issues and ODaF
    • Needs and challenges in statistical data and
      metadata management
    • Metadata and XML solutions
    • Selecting specifications
    • Need for tools
    • Open Data Foundation
  • PART 2: RDC-specific issues
    • Metadata in RDCs
    • Solutions and benefits
    • Tools and ongoing initiatives
  • Conclusions / Q&A

3
What is Metadata?
  • Common definition: data about data

4
Managing data and metadata is challenging!
  • "We are in charge of the data. We support our
    users but also need to protect our respondents!"
  • "We have an information management problem."
  • "We want easy access to high quality and well
    documented data!"
  • "We need to collect the information from the
    producers, preserve it, and provide access to our
    users!"

5
XML to the rescue!
  • XML is driving today's web service oriented
    architecture of the Internet and Intranets
  • Using XML, we can capture, structure, transform,
    discover, exchange, query, edit and secure
    metadata and data
  • XML is platform and language independent and can
    be used by everyone
  • XML is both machine and human readable
  • XML is non-proprietary, public domain and many
    open tools exist
  • Domain specific standards are available!
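
As a minimal sketch of the "structure" and "query" points above, the following Python snippet parses a small, made-up XML metadata record with the standard library and extracts variable labels. The element names (study, variable, label) are hypothetical and are not taken from DDI, SDMX, or any other specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element names are illustrative only.
RECORD = """
<study id="S001">
  <title>Household Survey 2007</title>
  <variable name="age"><label>Age in years</label></variable>
  <variable name="hhsize"><label>Household size</label></variable>
</study>
"""


def variable_labels(xml_text):
    """Return a {variable name: label} dictionary from the record."""
    root = ET.fromstring(xml_text)
    return {
        var.get("name"): var.findtext("label")
        for var in root.findall("variable")
    }


labels = variable_labels(RECORD)
print(labels)
```

The same record stays readable by a person and by a program, which is the point the slide makes about XML being both machine and human readable.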

6
XML Technical Overview
7
XML Solutions
  • "Well documented data, here we come!"
  • "Great, I can provide public metadata!"
  • "Use our specifications and you will be happy! It
    will harmonize everything."
  • "Now we can talk to each other!"

8
Let's use XML, but...
  • Which specifications should we adopt?
  • How do we do this? Where are the tools and
    guidelines?

9
Open Data Foundation (ODaF)
  • US-based non-profit organization, established in
    2006
  • Directors, advisors and managers from statistical
    and ICT communities
  • Project oriented
  • Mission:
  • Focus on socio-economic data
  • Adoption of global metadata standards
  • Coordinated development of open-source tools
  • Capacity building
  • Improving data and metadata accessibility and
    overall quality
  • Operate at the global level

10
Selecting XML specifications
  • A single specification is not enough!
    • XML specifications commonly focus on a specific
      area of knowledge and/or set of functionalities
    • Cannot answer the needs of all actors
  • XML mappings between specifications are possible
    • Information can be converted from one domain to
      another and be carried across communities
  • Which ones should we use?
    • Fit for purpose
    • Widely accepted and supported
    • Can be mapped to a cross-domain family
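
A mapping between specifications can be pictured as a crosswalk table. The sketch below is illustrative only: the source field names are hypothetical stand-ins for a survey-oriented record, and the targets are Dublin Core element names (title, creator, date, description). Real mappings between standards are considerably richer.

```python
# Hypothetical crosswalk: source field name -> Dublin Core element name.
CROSSWALK = {
    "study_title": "title",
    "producer": "creator",
    "collection_date": "date",
    "abstract": "description",
}


def to_dublin_core(record):
    """Map known fields and report the ones that have no target element."""
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)  # keep track of what the mapping loses
        else:
            mapped[target] = value
    return mapped, unmapped


survey = {
    "study_title": "Household Survey 2007",
    "producer": "Example Statistics Office",
    "sampling": "Two-stage cluster sample",
}
dc, leftover = to_dublin_core(survey)
print(dc)
print(leftover)
```

Returning the unmapped fields makes visible what does not carry across, which illustrates why one specification alone cannot answer the needs of all actors.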

11
A suggested set for socio-economic data
  • Statistical Data and Metadata Exchange (SDMX)
    • Macrodata, time series, indicators, registries
    • http://www.sdmx.org
  • Data Documentation Initiative (DDI)
    • Microdata (surveys, studies)
    • http://www.ddialliance.org
  • ISO 11179
    • Semantic modeling, concepts, registries
    • http://metadata-standards.org/11179/
  • ISO 19115
    • Geography
    • http://www.isotc211.org/
  • Dublin Core
    • Resources (documentation, images, multimedia)
    • http://www.dublincore.org

12
The need for Tools
  • "We set specifications and standards. Tools are
    not our mandate."
  • "We produce data, not tools! We don't have the
    expertise."
  • "We preserve and disseminate data, not software!
    We don't have the expertise."
  • "We use data and software but we don't build
    tools! We don't have the expertise."

13
The need for Tools
  • Mandated to develop tools
  • Provide cross-domain expertise in ICT and
    statistics
  • Provide umbrella for coordinated development
  • Ensure inter-operability
  • Outline harmonized architecture and environment
  • Promote open source / maximize reusability
  • Foster global registries
  • Resources / fund raising

14
ODaF Vision
  • Promote and facilitate the production and use of
    open data
  • Public metadata, high quality, fully documented,
    respondent protected, easy to find, accessible in
    accordance with statistical principles and
    legislation
  • Foster a global harmonized framework
  • Facilitate the flow of data and metadata
  • Promote dialog between all stakeholders

Unlock the Data!
15
Some Project Ideas
  • Guidelines for a harmonized architecture and
    development environment
  • Roadmap for tools development
  • XML mappings
  • Facility to host development of open source
    projects (GForge)
  • Provide hosting services for agencies
  • Implement registries / catalogs
  • Produce training and reference material
  • Technical support and capacity building
  • Advocacy

16
ODaF partners / clients
  • Statistical agencies / producers
  • Data Archives
  • Academic Research communities
  • Standard-setting agencies and consortiums
  • Governmental organizations
  • International organizations
  • Open source community
  • Software developers
  • IT Vendors

17
Growing solutions in a complex environment
[Diagram: word cloud of related concerns and technologies.
Themes: TECHNOLOGY, METADATA, ANALYSIS, DISSEMINATION, DISCOVERY,
PRESERVATION, SECURITY, PRODUCTION, QUALITY, USE.
Terms: XML-DB, Programming, XSLT, XPath, SOAP, Databases, Warehouse,
SDMX, Web, GIS, ISO 11179, XML, DDI, Infrastructure, ISO 19115, SAS,
DCMI, Excel, Stata, Registries, Accessibility, SPSS, Legal, Privacy,
Toolkit, Disclosure, Access, SDDS, Blaise, GDDS, CSPro, DQAF.]
What are we concerned with?

18
Growing solutions in a complex environment
[Diagram repeated from the previous slide]
CHALLENGE: We need a set of tools that work
together in a harmonized framework. This
requires coordinated efforts and expertise from
the various communities.
  • OPEN DATA FOUNDATION
    • Provide cross-domain IT expertise
    • Coordinate and support development
    • Knowledge sharing
    • Capacity building
    • Provide global vision and guidance

19
Challenges
  • The technology is available today
  • The right people are available today
  • The need and the will are there
  • The real challenges are:
    • Tools availability
    • Awareness / understanding of technology
    • Change management
    • Coordination and guidance
    • Focused resources and funding
    • Institutional commitment
  • Learn from the past for a better future
  • It's not about data, it's about people

20
Summary
  • Managing data and metadata is challenging
  • Solutions exist to make it easier and provide
    better information to unlock the data
  • Adopt a set of specifications that answer your
    requirements and can connect across domains
  • DDI, SDMX, ISO 11179, Dublin Core, ISO 19115
  • Promote the use and development of open tools, do
    not work in isolation, get the appropriate
    expertise
  • Open Data Foundation

21
PART 2: Metadata and RDCs
22
PART 2
  • RDC metadata perspective
  • List of stakeholders / initiatives
  • Benefits of adopting metadata
  • Challenges
  • Tools demo (IHSN Toolkit)

23
RDC Objectives
  • Provide a secure environment for the researcher
    to perform in-depth analysis of
    sensitive/confidential data in a cost-effective
    way
  • Facilitate the capture, sharing and dissemination
    of research knowledge
  • Provide feedback to the producer on data usage
    and quality
  • Exchange information with other RDCs / agencies
    / public
  • Overall, benefit all stakeholders: producers,
    librarians, researchers, general public, etc.

24
RDC metadata
  • Simple access to a data file and codebook is
    insufficient. Researchers need high quality,
    comprehensive metadata and a collaborative
    environment to promote dynamic research
  • Traditionally, survey metadata has focused on
    archiving/preservation (current DDI 1/2.x)
  • This, however, is insufficient and should be
    extended into both the survey production process
    and the secondary use of the data
  • The new DDI 3.0 meets such requirements
  • An RDC is an ideal environment for capturing
    researcher metadata

25
DDI 3.0 and the Survey Life Cycle
  • A survey is not a static process
  • It evolves dynamically over time and involves
    many agencies/individuals
  • DDI 2.x is about archiving, DDI 3.0 extends to
    the life cycle
  • 3.0 is a modular framework available for multiple
    purposes (use cases)
  • Metadata is key to comprehensive capture of
    knowledge

26
RDC issues
  • Without producer metadata
    • Researchers can't discover data or work
      efficiently
  • Without researcher metadata
    • Producers don't know about data usage and
      quality issues
    • Other researchers are not aware of what has been
      done
  • Without standards
    • Information can't be properly managed and
      exchanged between agencies or with the public
  • Without tools
    • Can't capture and preserve/share knowledge

27
When to capture metadata?
  • Metadata must be captured at the time the event
    occurs!
  • Documenting after the fact leads to considerable
    loss of information
  • This is true for producers and researchers

28
RDC Metadata Framework
1. Producers provide data and basic documentation
2. Need to enhance existing metadata
3. Start capturing researcher metadata
4. Knowledge grows and gets reused
5. Provides usage and quality feedback to
producer / RDC
6. Repeat across surveys/topics
7. Metadata facilitates output
8. Public metadata facilitates data
discovery / fosters global knowledge
9. Metadata exchange between agencies
[Diagram labels: Data, Producer/Archive Metadata, Public Use
Metadata, Research Metadata, Research Output]

29
Metadata Components
  • Producer metadata
    • Codebook, questionnaires, reports, methodologies,
      processing, scripts, quality, admin, etc.
  • Research metadata
    • Recodes, analysis, tables, scripts, papers,
      references, logs, quality, usage
    • Activities, discussions, knowledge base
  • Outputs
    • Papers, presentations, tables
  • Public metadata
    • Metadata stripped of sensitive information
      (summary statistics, sensitive variables, etc.)
  • Metadata capture can be manual, semi-automated, or
    automated
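
The idea of public metadata as metadata stripped of sensitive information can be sketched as a simple filter. Everything here is an assumption made for illustration: the dictionary layout, the key names (summary_statistics, frequencies), and the choice of which variables count as sensitive are hypothetical and not part of any standard.

```python
import copy

# Hypothetical variable-level keys that should not appear in public metadata.
SENSITIVE_KEYS = {"summary_statistics", "frequencies"}


def make_public(metadata, sensitive_variables):
    """Return a copy without sensitive variables and without sensitive keys."""
    public = copy.deepcopy(metadata)  # never modify the producer's metadata
    public["variables"] = [
        {key: value for key, value in var.items() if key not in SENSITIVE_KEYS}
        for var in public["variables"]
        if var["name"] not in sensitive_variables
    ]
    return public


full = {
    "title": "Household Survey 2007",
    "variables": [
        {"name": "age", "label": "Age in years",
         "summary_statistics": {"mean": 34.2}},
        {"name": "income", "label": "Monthly income"},
    ],
}
public = make_public(full, sensitive_variables={"income"})
print(public)
```

The original metadata is left untouched, so the producer/archive version and the public version can be maintained side by side.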

30
RDC Solutions
  • Metadata management
    • Adopt standards and provide researchers with
      comprehensive metadata
    • Use related tools to capture the research process
    • Metadata mining and reporting utilities
  • Collaborative environment
    • Use web technologies to foster a dynamic
      research environment
  • Connected and remote enclaves
    • Connect RDCs through secure networks
    • Consider virtual data enclave or batch analysis
  • Data disclosure
    • Protect respondents through sound data disclosure
      techniques (using metadata as well)
    • Train producers/researchers (methods and data)

31
Solution Examples
  • Simple solutions: use good practices
    • File and variable naming conventions, sound
      statistical methods
    • Comment source code
    • Document the work
  • Metadata solutions
    • DDI tools, citation database, source code level
      metadata capture, variable recodes, table
      disclosure, data quality feedback, comparability
  • Web based collaboration environment
    • Wiki, blogs, discussion groups, events/todo
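
Source code level capture of variable recodes can be sketched as a helper that records what was done in the same step as doing it, so the metadata is captured when the event occurs. This is a hypothetical illustration: the function name, the log structure, and the in-memory list standing in for a metadata store are all assumptions, not features of any DDI tool.

```python
from datetime import datetime, timezone

recode_log = []  # in-memory stand-in for a real metadata store


def recode(values, mapping, source, target, note):
    """Apply a recode and record its metadata at the moment it happens."""
    recode_log.append({
        "source": source,
        "target": target,
        "mapping": dict(mapping),
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    # Values without an entry in the mapping are passed through unchanged.
    return [mapping.get(value, value) for value in values]


age_group = recode(
    [1, 2, 3, 2],
    {1: "young", 2: "adult", 3: "senior"},
    source="agecat",
    target="age_group",
    note="Labels for tabulation",
)
print(age_group)
print(recode_log[0]["source"], "->", recode_log[0]["target"])
```

Because the log entry is written inside the recode itself, the researcher does not have to document the step afterwards.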

32
Benefits (1)
  • Comprehensive data documentation
    • Through good metadata practices, comprehensive
      documentation is available to the researchers
  • Preservation, integration and sharing of
    knowledge
    • Research process is captured and preserved in
      harmonized formats
    • Research knowledge becomes an integral part of
      the survey and available to all
    • Reduces duplication of effort and facilitates
      reuse
    • Producer gets feedback from the data users
      (usage, quality issues)

33
Benefits (2)
  • Research outputs and dissemination
    • Facilitates production of research outputs
    • Facilitates dissemination and fosters broader
      visibility of research outputs
  • Exchange of information
    • Metadata exchange between RDCs, producers,
      librarians
    • Importance of public metadata for sensitive
      datasets
    • Facilitates data discovery (inside and outside
      the RDC)
    • Advanced metadata mining / comparability

34
Answering the tools challenge
  • Metadata standards are available but there is a
    lack of tools for metadata management
  • Several efforts are ongoing
    • DDI Alliance, International Household Survey
      Network, UK Data Archive, NORC Data Enclave,
      Canada RDC, Open Data Foundation
    • DDI Foundation Tools Program, UK DExT, Canada
      RDC, EU Framework 7
  • Joint efforts will minimize costs, maximize
    reusability and foster tool harmonization /
    interoperability
  • Open source model: availability and sustainability

35
RDC challenges
  • Adopting a good metadata management framework
    takes effort
    • Survey metadata must first be compiled
    • ICT capacity building and tools development
    • Producers and researchers need to be trained
  • Not only a technological challenge
    • Change management, training
  • Leads to better research, shared knowledge,
    better user/producer dialog, improved data
    quality
  • Meets the mandate of the RDC

36
IHSN Toolkit Quick Demo
1. Import data and compile metadata
2. Import metadata and prepare CD-ROM
3. Generate HTML based CD-ROM