Metadata Challenges: National Forum for Geosciences Information Technology (FGIT) 2005 Annual Meeting

1
Metadata Challenges
National Forum for Geosciences Information Technology
FGIT 2005 Annual Meeting
October 6-7, 2005, Washington, D.C.
  • Sara Graves, PhD
  • Director, Information Technology and Systems
    Center
  • University Professor, Computer Science Department
  • University of Alabama in Huntsville
  • Director, Information Technology Research Center
  • National Space Science and Technology Center
  • 256-824-6064
  • sgraves_at_itsc.uah.edu
  • http://www.itsc.uah.edu

2
Got Metadata?
  • Metadata is data about data [1]
  • meta- (prefix): beyond, transcending, more
    complete. From Greek meta: beside, after. [1]
  • Is all data inherently meta?
  • It is always "about" something, not the thing
    itself [2]
  • In the absence of metadata, information becomes
    noise [3]
  • All data can be considered metadata: what's data
    and what's metadata depends on who's doing the
    asking [4]
  • There's metadata everywhere. The challenge is to
    harness and organize this information to enable
    data usability
  • Metadata is information about a resource, where
    the resource can be data, information, a workflow
    or a compute resource.
  • Metadata is the key to ensuring that resources
    will survive and continue to be accessible in the
    future [5]

[1] dictionary.com
[2] Metadata and the Joy of Vague Boundaries, http://www.kn.com.au/networks/2003/11/metadata_and_th.html
[3] Discussion on http://groups.yahoo.com/group/ebook-community/message/18470
[4] The End of Data?, Journal of the Hyperlinked Organization, Oct 15, 2004, http://www.hyperorg.com/backissues/joho-oct15-04.html#data
[5] Understanding Metadata, National Information Standards Organization, 2004
3
Formal structured metadata
  • Formal Metadata is metadata that follows some
    standard specification that provides a common set
    of terminology, definitions and information about
    values to be provided
  • Formal metadata are formally-structured
    documentation of resources
  • They describe the "who, what, where, when, why,
    and how" of every aspect of the resource.
  • Formal metadata:
  • help organize and maintain an organization's
    internal investment in a resource
  • provide information to data catalogs,
    clearinghouses, search engines etc.
  • provide information to aid data transfers.
  • Metadata should be recorded when the information
    needed for metadata is known, not after the fact,
    when important information may be lost or
    forgotten.

4
Do we need formal metadata?
  • Google indexes every word in every document, so
    why bother with creating metadata?
  • What about science data and resources, as
    distinct from documents?
  • Even for documents, metadata such as Dublin Core
    can make searches better
  • Formal processes such as standards help in
    obtaining and maintaining complete metadata, and
    automating these processes
  • Formal metadata structures can yield
    machine-readable and operable information
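The difference between indexing every word and searching a formal metadata field can be sketched in a few lines of Python. This is only an illustration: the two records, their field names (borrowed from Dublin Core-style naming) and both search functions are made up for the example.

```python
# Hypothetical records: free text plus a few Dublin Core-style fields.
records = [
    {"title": "Lightning climatology", "creator": "A. Rivers",
     "text": "Lightning flash rates over the tropics."},
    {"title": "River discharge atlas", "creator": "B. Stone",
     "text": "Discharge measurements for major rivers."},
]

def full_text_search(term, recs):
    """Match the term anywhere in any field (what a word index does)."""
    term = term.lower()
    return [r["title"] for r in recs
            if any(term in str(v).lower() for v in r.values())]

def fielded_search(field, term, recs):
    """Match the term only inside one named metadata field."""
    term = term.lower()
    return [r["title"] for r in recs if term in r.get(field, "").lower()]

# The word index cannot tell the author "Rivers" from the word "rivers".
print(full_text_search("rivers", records))
# The fielded search returns only the record whose creator matches.
print(fielded_search("creator", "rivers", records))
```

The fielded query is only possible because someone recorded the creator as a separate, named element.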

5
Roles of metadata in cyberinfrastructure
[Diagram labels: SEARCH, ACCESS]
  • Facilitates discovery and access
  • Content information for resource discovery
  • Location information for resource access
  • Facilitates use
  • Syntactic information for resource
    interoperability and integration
  • Semantic information for automated reasoning and
    analysis
  • Facilitates preservation
  • All these types of metadata and more (quality,
    provenance, etc.) support digital identification
    and preservation

[Diagram label: USE]
6
Basic tension - how much metadata?
  • How much metadata do typical science data
    producers want to provide?
  • The minimum they can get away with
  • How much metadata do typical science data users
    want available?
  • As much as they can have

7
What are some metadata design principles?
  • Who is the metadata for?
  • End-users, scientists, students needing the data
    for their research or decision analysis
  • What metadata schema?
  • What specification should one select such that
    it
  • Covers the needs of the target consumers of the
    metadata
  • Allows interoperability with other systems
  • FGDC, ISO, Dublin Core Initiative
  • Why use specific metadata elements?
  • May not want to use specification as is
  • Need to create an application profile or
    extension suited for the target user community
  • Deleting, modifying or adding metadata elements
    should be based on each element's importance to
    the user community
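One way to picture an application profile is as a base element set with some elements dropped, some added and some made mandatory. The sketch below assumes the 15-element Dublin Core set as the base; the profile itself (the added `instrument` and `platform` elements, the required list) and the helper names are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def make_profile(base, drop=(), add=(), required=()):
    """Build a profile: base elements minus `drop`, plus `add`."""
    allowed = (set(base) - set(drop)) | set(add)
    missing = set(required) - allowed
    if missing:
        raise ValueError(f"required elements not in profile: {sorted(missing)}")
    return {"allowed": allowed, "required": set(required)}

def check_record(record, profile):
    """Return (missing required elements, elements outside the profile)."""
    keys = set(record)
    return (sorted(profile["required"] - keys),
            sorted(keys - profile["allowed"]))

# Hypothetical profile for a field-campaign data community.
profile = make_profile(
    DUBLIN_CORE,
    drop={"publisher", "relation"},
    add={"instrument", "platform"},
    required={"title", "creator", "date", "instrument"},
)

record = {"title": "Radar reflectivity", "creator": "J. Doe",
          "platform": "aircraft", "publisher": "Example Press"}
print(check_record(record, profile))
```

Which elements to drop, add or require is the community decision the slide describes; the code only records that decision and checks records against it.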

8
Good metadata requires collaboration of domain
science and information technology
  • Information Technology Scientists
  • Information Science Research
  • Knowledge Management
  • Data Exploitation
  • Domain Scientists
  • Research and Analysis
  • Data Set Development
  • Collaborations
  • Accelerate research process
  • Maximize knowledge discovery
  • Minimize data handling
  • Contribute to both fields

[Diagram labels: Domain Scientists, Information Scientists]
9
Six principles of good metadata
  • Appropriate to the data collection, its users,
    and intended uses
  • Supports interoperability
  • Uses standard, controlled vocabularies
  • Includes clear statement on terms of use for
    digital object
  • Metadata is also a data object, with qualities of
    archivability, persistence, unique
    identification, etc.
  • Supports long-term management of objects in the
    collection

Source: Understanding Metadata, National Information
Standards Organization, 2004
10
Can metadata creation tools ease the burden?
  • Templates: enter metadata values into pre-set
    fields
  • Mark-up tools: structure metadata attributes and
    values to specified schema
  • Extraction tools: analyze digital (text)
    resource to automatically create metadata
  • Note that metadata quality varies greatly
    depending on content and structure of source
    text.
  • Conversion tools: translate metadata from one
    format to another
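A very small extraction tool might look like the following sketch. It assumes, purely for illustration, that a text resource starts with "Key: value" lines; the pattern and function name are not taken from any particular tool. The second input shows how quality depends on the structure of the source text.

```python
import re

# Lines such as "Title: Something" at the top of a text file.
HEADER_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$")

def extract_metadata(text):
    """Collect leading 'Key: value' lines; stop at the first
    non-blank line that does not match."""
    found = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = HEADER_LINE.match(line)
        if m is None:
            break  # the header block is over
        found[m.group(1).lower()] = m.group(2)
    return found

well_structured = "Title: Soil moisture grid\nCreator: J. Doe\n\nDaily gridded values..."
free_form = "This file has the soil moisture grid that J. Doe made."

print(extract_metadata(well_structured))  # two elements recovered
print(extract_metadata(free_form))        # nothing recovered
```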

11
Can automated metadata harvesting work for
science data and resources?
  • Challenges to automated metadata generation
    include
  • How to decode metadata embedded in directory
    structures and file names?
  • How to locate relevant external metadata?
  • How to use structured metadata embedded within
    data files (e.g., NetCDF COARDS/CF conventions)?
  • May need specialized code for each data type
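Decoding metadata from directory structures and file names usually comes down to a pattern written for one naming convention. The sketch below assumes a hypothetical convention, `<campaign>/<instrument>_<YYYYMMDD>_v<version>.<ext>`; a different archive would need a different pattern, which is the "specialized code for each data type" problem in miniature.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed (hypothetical) convention:
#   <campaign>/<instrument>_<YYYYMMDD>_v<version>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<instrument>[A-Za-z0-9]+)_(?P<date>\d{8})_v(?P<version>\d+)\.(?P<format>\w+)$"
)

def decode_path(path):
    """Return metadata decoded from a path, or None if it does not fit."""
    p = PurePosixPath(path)
    m = NAME_PATTERN.match(p.name)
    if m is None or p.parent.name == "":
        return None
    try:
        date = datetime.strptime(m.group("date"), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date
    return {
        "campaign": p.parent.name,
        "instrument": m.group("instrument"),
        "date": date.isoformat(),
        "version": int(m.group("version")),
        "format": m.group("format"),
    }

print(decode_path("archive/camp01/radar_20050106_v2.nc"))
print(decode_path("archive/camp01/notes.txt"))
```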

12
How do we discover data and resources?
  • Ideally, search results should be accurate and
    complete
  • Find only what you really want
  • Find everything you really want
  • Approaches to meet this challenge
  • Registries: emphasis on accuracy; structured
    metadata, possibly controlled vocabulary; rely on
    resource providers to register resources; may
    mandate participation of specified user community
  • Examples: GCMD, FGDC Clearinghouse
  • Web crawlers: emphasis on completeness; harvest
    and index all available information; better for
    documentation than science data
  • Examples: general web search engines, Mercury
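"Find only what you really want" and "find everything you really want" correspond to what information retrieval usually calls precision and recall. That mapping is an interpretation added here, not wording from the slides, and the result sets below are invented to show the trade-off.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned items that are relevant.
    Recall: share of relevant items that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical outcome of one query against two kinds of search system.
relevant = {"ds1", "ds2", "ds3", "ds4"}
registry_results = {"ds1", "ds2"}
crawler_results = {"ds1", "ds2", "ds3", "ds4",
                   "page5", "page6", "page7", "page8"}

print(precision_recall(registry_results, relevant))  # accurate but incomplete
print(precision_recall(crawler_results, relevant))   # complete but noisy
```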

13
Google for science data discovery
  • Search metadata is document text, indexed
  • Results here are primarily science data sources

14
Data discovery by browsing metadata
  • Search metadata consists of a list of datasets
    grouped by collection
  • Additional metadata includes structured
    documentation and browse images

15
Data discovery by browsing a flight calendar
  • Search metadata includes field campaign, flight
    platform and date
  • Additional metadata includes flight track and
    instrument histograms

16
Metadata Interoperability Challenges: Mediation
among different schemas
  • Metadata crosswalks: mapping of elements, syntax
    and semantics from one metadata scheme to another
  • Examples: GEON, MMI
  • Metadata registries: integrating resources,
    documenting each metadata element
  • Example: EPA Environmental Data Registry
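At its simplest, a crosswalk is a table from element names in one scheme to element names in another. In the sketch below the source element names are hypothetical and the target names follow Dublin Core naming; real crosswalks also have to reconcile syntax and semantics, which a plain rename does not capture.

```python
# Hypothetical crosswalk: source element name -> target element name.
CROSSWALK = {
    "dataset_title": "title",
    "originator": "creator",
    "abstract": "description",
    "keywords": "subject",
}

def apply_crosswalk(record, crosswalk):
    """Translate a record; return (translated record, unmapped elements)."""
    translated, unmapped = {}, []
    for name, value in record.items():
        target = crosswalk.get(name)
        if target is None:
            unmapped.append(name)  # no equivalent in the target scheme
        else:
            translated[target] = value
    return translated, unmapped

source = {"dataset_title": "Sea surface temperature",
          "originator": "J. Doe",
          "bounding_box": "-90,-180,90,180"}
print(apply_crosswalk(source, CROSSWALK))
```

Reporting the unmapped elements, rather than silently dropping them, makes it visible what is lost in translation.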

17
Metadata Interoperability Challenges: Mediation
among different catalogs
  • Cross-system search
  • Data providers support a common search API - map
    own search capabilities to common search
    attributes
  • Examples: Z39.50, EOSDIS IMS
  • Metadata aggregation
  • Data providers support a central metadata
    repository - translate native metadata to common
    set of core elements for aggregation
  • Examples: Open Archives Initiative, ECHO
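The cross-system search idea, where each provider maps a set of common search attributes onto its own fields, can be sketched as below. This is not the Z39.50 or EOSDIS IMS interface; the `Provider` class, the attribute names and the records are all hypothetical.

```python
class Provider:
    """A catalog with its own field names plus a mapping from
    the common search attributes to those native fields."""

    def __init__(self, name, records, attribute_map):
        self.name = name
        self.records = records
        self.attribute_map = attribute_map  # common attribute -> native field

    def search(self, **common_query):
        # Translate the common query into native field names.
        native = {self.attribute_map[k]: v for k, v in common_query.items()
                  if k in self.attribute_map}
        if len(native) < len(common_query):
            return []  # this provider cannot answer part of the query
        return [r for r in self.records
                if all(r.get(f) == v for f, v in native.items())]

def cross_search(providers, **common_query):
    """Send one common query to every provider and collect the answers."""
    return {p.name: p.search(**common_query) for p in providers}

a = Provider("A",
             [{"param": "rainfall", "yr": 2004}, {"param": "wind", "yr": 2004}],
             {"parameter": "param", "year": "yr"})
b = Provider("B",
             [{"variable": "rainfall", "year": 2005}],
             {"parameter": "variable"})

print(cross_search([a, b], parameter="rainfall"))
print(cross_search([a, b], parameter="rainfall", year=2004))
```

Aggregation reverses the direction: instead of translating each query at search time, providers translate their records once into a common core set and send them to a central repository.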

18
Metadata Interoperability Challenges: Mediation
among different domain vocabularies
  • Standard metadata specification provides the
    attributes and their definitions, but what about
    the values?
  • What vocabulary should be used for these values?
  • Two approaches:
  • Use a controlled vocabulary
  • Examples: Global Change Master Directory (GCMD),
    Climate and Forecast (CF)
  • Use an ontology, which not only acts as an
    extended controlled vocabulary but also provides
    context and relationships for the values
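The two approaches differ in what they can answer. A controlled vocabulary can say whether a value is allowed; an ontology can also say how values relate. The miniature vocabulary and ontology below are invented for illustration and are not drawn from GCMD, CF or any published ontology.

```python
# Hypothetical mini vocabulary and ontology; terms are illustrative only.
CONTROLLED_VOCABULARY = {"precipitation", "rain", "snow", "wind speed"}

ONTOLOGY = {
    "precipitation": {"synonyms": {"precip"}, "narrower": {"rain", "snow"}},
    "rain": {"synonyms": {"rainfall"}, "narrower": set()},
    "snow": {"synonyms": set(), "narrower": set()},
}

def is_valid(value):
    """A controlled vocabulary only says whether a value is allowed."""
    return value in CONTROLLED_VOCABULARY

def expand(term):
    """An ontology can also return related terms: the concept itself,
    its synonyms, and everything under its narrower concepts."""
    terms, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        terms.add(current)
        entry = ONTOLOGY.get(current)
        if entry:
            terms |= entry["synonyms"]
            stack.extend(entry["narrower"])
    return terms

print(is_valid("rainfall"))              # not an allowed value
print(sorted(expand("precipitation")))   # related terms a search could use
```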

19
Beyond data discovery: how can metadata improve
data usability?
  • Syntactic and semantic metadata are required for
    effective exchange and use of digital objects
    described by metadata
  • Syntactic metadata describes data structures
    within the data object; may be stored within the
    data object or separately
  • Examples: README files, self-describing data
    formats (HDF, NetCDF), ESML
  • Semantic metadata attaches meaning to the data
    structures within the data object; can enable a
    common framework that allows data to be shared
    across application, enterprise and/or community
    boundaries
  • Examples: publications, ontologies, Semantic Web

20
Challenge of Data Heterogeneity to Usability
  • Earth Science Data Characteristics
  • Many different formats, types and structures (50
    and counting at NCDC alone!)
  • Some formats lack metadata whereas others are
    metadata rich
  • Enormous volumes
  • Heterogeneity leads to usability problems

21
Interoperability: Accessing Heterogeneous Data
  • One approach: enforce a standard data format,
    but
  • Difficult to implement and enforce
  • Can't anticipate all needs
  • Some data can't be modeled or is lost in
    translation
  • Converting legacy data is costly
  • A better approach: Interchange Technologies
  • Earth Science Markup Language

22
What is ESML?
  • It is a specialized markup language for Earth
    Science metadata based on XML - NOT another data
    format.
  • It is a machine-readable and -interpretable
    representation of the structure, semantics and
    content of any data file, regardless of data
    format
  • ESML description files contain external metadata
    that can be generated by either data producer or
    data consumer (at collection, data set, and/or
    granule level)
  • ESML provides the benefits of a standard,
    self-describing data format (like HDF, HDF-EOS,
    netCDF, geoTIFF, ...) without the cost of data
    conversion
  • ESML is the basis for core Interchange Technology
    that allows data/application interoperability
  • ESML complements and extends data catalogs such
    as FGDC and GCMD by providing the use/access
    information those directories lack.
  • http://esml.itsc.uah.edu
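The idea of an external description file, where the layout lives outside the data and a generic reader interprets it, can be imitated in a few lines. The sketch below is not ESML: ESML description files are XML, and the dictionary format, field names and reader function here are hypothetical stand-ins for the concept.

```python
import struct

# Hypothetical external description of a binary file's layout.
# Format codes are those of Python's struct module.
DESCRIPTION = {
    "byte_order": "<",                 # little-endian, no padding
    "header": [("n_records", "i")],    # one 32-bit integer
    "record": [("lat", "f"), ("lon", "f"), ("value", "f")],
}

def read_with_description(data, desc):
    """Decode bytes using only the layout given in the description."""
    order = desc["byte_order"]
    header_fmt = order + "".join(code for _, code in desc["header"])
    record_fmt = order + "".join(code for _, code in desc["record"])
    header_values = struct.unpack_from(header_fmt, data, 0)
    header = dict(zip((name for name, _ in desc["header"]), header_values))
    offset = struct.calcsize(header_fmt)
    names = [name for name, _ in desc["record"]]
    records = []
    for _ in range(header["n_records"]):
        values = struct.unpack_from(record_fmt, data, offset)
        records.append(dict(zip(names, values)))
        offset += struct.calcsize(record_fmt)
    return records

# Build a tiny two-record "file" in memory for the example.
raw = (struct.pack("<i", 2)
       + struct.pack("<fff", 34.5, -86.5, 1.0)
       + struct.pack("<fff", 35.0, -87.0, 2.5))
print(read_with_description(raw, DESCRIPTION))
```

The data bytes are never converted to another format; supporting a new layout means writing a new description, not a new reader.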

23
How might we add semantics? Example: Extending
ESML with Ontologies
  • ESML Schema provides structural metadata
  • Extend ESML schema by embedding semantic terms in
    the ESML Description File to provide a complete
    description of the data
  • Allow various science communities to create their
    own ontologies (for example, SWEET) and use them
    with ESML Description Files for their data

[Diagram labels: Data; ESML Schema; ESML File; Ontologies; Rules describing the structure of the data; Terms defining the meaning of the data; Semantic Parser (Inference Engine); Core ESML Library; Smart Application/Services]
24
Example: Noesis semantic search using ontologies
  • Enter search term
  • Review ontology search results (Inheriting and
    synonymous concepts for the search term with
    definitions and links to AMS glossaries)
  • Select terms of interest
  • Review results from different search resources

25
How to create large metadata databases?
  • Organized Manual
  • Original Yahoo! is the classic case of metadata
    created by organizing an army of people to put in
    data manually
  • Organized Mechanical
  • Original AltaVista used a program to follow links
    and domain names and spidered the web, saving the
    information as it went
  • Another interesting option: Volunteer Manual

26
Volunteer Metadata Creation: Will it work for
geosciences?
  • CDDB Example
  • The CDDB database has information that allows
    your computer to identify a particular music CD
    in the CD drive and list its album title and
    track titles.
  • What CDDB does is let the software on your PC
    take that track information, send a CD signature
    to CDDB through Internet protocols (if you're
    connected) and get back the titles.
  • CDDB was created by getting track timing
    information and the titles typed in by a
    volunteer.
  • Only one person with each (even obscure) album
    needed to do this to build the database.
  • Bottom line: increasing the value of the database
    by adding more metadata is a natural by-product
    of using the metadata database for one's own
    benefit.
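The volunteer workflow described above can be sketched as a lookup table keyed by a disc signature. The signature below is a made-up hash of track lengths and is NOT the real CDDB disc ID algorithm; the in-memory dictionary stands in for the shared database.

```python
import hashlib

database = {}  # signature -> {"album": ..., "tracks": [...]}

def signature(track_seconds):
    """Hypothetical disc signature: a hash of the track lengths.
    This is not the real CDDB disc ID algorithm."""
    joined = ",".join(str(s) for s in track_seconds)
    return hashlib.sha1(joined.encode("ascii")).hexdigest()[:12]

def lookup(track_seconds):
    """Return stored titles for this disc, or None if nobody entered them."""
    return database.get(signature(track_seconds))

def submit(track_seconds, album, tracks):
    """A volunteer types the titles in once; every later user benefits."""
    if len(tracks) != len(track_seconds):
        raise ValueError("one title per track is required")
    database[signature(track_seconds)] = {"album": album, "tracks": list(tracks)}

disc = [215, 187, 242]
print(lookup(disc))   # first user: nothing stored yet
submit(disc, "Example Album", ["One", "Two", "Three"])
print(lookup(disc))   # later users get the titles back
```

The point carried over to geosciences would be the same: the person who fills in the metadata does so because they want it for their own use, and the shared database grows as a side effect.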

27
Role of metadata in Open Space Discussions?
[Diagram labels: Conversation; Open Space Areas; Open Space Area; Hours]