Dublin Core and Emerging Conventions for a Semantic Web

1
Dublin Core and Emerging Conventions for a
Semantic Web
  • Thomas Baker
  • Fraunhofer-Gesellschaft, Bonn
  • ELPUB 2003, Guimaraes, Portugal
  • 26 June 2003

2
A particular set of metadata terms
  • Dublin Core as a simple and semantically generic
    lingua franca
  • Fifteen core elements, such as Subject,
    Description, Title
  • A metadata "pidgin" for "digital tourists" on a
    culturally diverse global Web
  • Limited grammar, easy to learn and use
  • Enough "as is" for many needs
  • 33 "element refinements" and 17 "encoding
    schemes" to qualify the elements for specialized
    purposes
  • A small set of 12 resource types for use with
    dc:type

3
A simple data model (resource with properties)
  • 1996-1998: Collective realization that
    machine-processability requires a coherent data
    model
  • 1996: Warwick Framework proposed at DC-2
    workshop, with DC as one specialized module
    (resource discovery)
  • 1997: Qualifiers proposed for specifying
    meanings
  • Some early adopters took this to unintended
    extremes, e.g. DC.Creator.telephone-number
  • 1998: DCMI involvement in the emerging Resource
    Description Framework (RDF); clarification of the
    simple data model
  • 2000: First set of qualifiers approved

4
A typology of metadata terms ("grammar")
  • Elements
  • (core) properties of resources
  • Element Refinements
  • properties that semantically refine elements
  • Encoding Schemes
  • give context to a metadata value
  • Vocabulary Terms
  • constitute controlled lists of possible values
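
The four categories work like a small grammar: each kind of term belongs in a particular slot of a statement. The Python sketch below is a hypothetical illustration, not a DCMI tool or data format. It hand-assigns a few real DCMI terms to the categories and flags statements that put a term in the wrong slot, for example a vocabulary term used as a property.

```python
from enum import Enum
from typing import List, Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"


class TermKind(Enum):
    ELEMENT = "element"
    REFINEMENT = "element refinement"
    ENCODING_SCHEME = "encoding scheme"
    VOCABULARY_TERM = "vocabulary term"


# Hand-picked sample of DCMI terms; the table itself is illustrative only.
TERM_KINDS = {
    DC + "title": TermKind.ELEMENT,
    DC + "date": TermKind.ELEMENT,
    DC + "subject": TermKind.ELEMENT,
    DCTERMS + "alternative": TermKind.REFINEMENT,  # refines dc:title
    DCTERMS + "created": TermKind.REFINEMENT,      # refines dc:date
    DCTERMS + "W3CDTF": TermKind.ENCODING_SCHEME,  # a date format
    DCTERMS + "LCSH": TermKind.ENCODING_SCHEME,    # subject headings
    DCMITYPE + "Text": TermKind.VOCABULARY_TERM,   # a resource type
}


def check_statement(prop: str, scheme: Optional[str] = None) -> List[str]:
    """Return grammar problems in a (property, optional encoding scheme) pair."""
    problems = []
    # Only elements and element refinements may act as properties.
    if TERM_KINDS.get(prop) not in (TermKind.ELEMENT, TermKind.REFINEMENT):
        problems.append(f"not usable as a property: {prop}")
    # A scheme, if given, must be an encoding scheme.
    if scheme is not None and TERM_KINDS.get(scheme) is not TermKind.ENCODING_SCHEME:
        problems.append(f"not an encoding scheme: {scheme}")
    return problems


print(check_statement(DCTERMS + "created", DCTERMS + "W3CDTF"))
print(check_statement(DCMITYPE + "Text"))
```

The first call passes. The second call is flagged because a vocabulary term supplies a value; it is not a property.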

5
An emergent approach to "structured values"
  • Implementers sometimes "shoehorn" complex sets of
    information into a single value
  • Creator: "name=Tom, affiliation=FHG,
    shoesize=47"
  • In practice, a large variety of "structured
    values"
  • Labelled strings
  • Unlabelled strings
  • Marked-up strings (e.g., LaTeX, HTML)
  • Secondary resource descriptions (as above)
  • Post-processing ad-hoc constructs is messy and
    does not scale
  • Andy Powell's model
  • Elements can have string values (Simple DC)
  • A further requirement to point to linked
    metadata?
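
To see why post-processing ad-hoc constructs is messy, consider a naive parser for the labelled-string pattern. This Python sketch assumes labels and values are written as `label=value` pairs separated by commas, and the function name is hypothetical. It handles the simple example but fails once a value itself contains a comma. The nested dictionary at the end illustrates the alternative of describing the creator as a secondary resource.

```python
from typing import Dict


def parse_labelled_string(value: str) -> Dict[str, str]:
    """Naively split 'label=value, label=value' into a dictionary.

    Deliberately ad hoc: a comma inside a value breaks the split.
    """
    result = {}
    for part in value.split(","):
        label, sep, text = part.partition("=")
        if not sep:
            raise ValueError(f"unlabelled fragment: {part.strip()!r}")
        result[label.strip()] = text.strip()
    return result


# Works for the simple case...
print(parse_labelled_string("name=Tom, affiliation=FHG, shoesize=47"))

# ...but a name written as "Baker, Tom" is split in the wrong place.
try:
    parse_labelled_string("name=Baker, Tom, affiliation=FHG")
except ValueError as err:
    print("cannot parse:", err)

# The alternative: describe the creator as a secondary resource with its
# own properties instead of packing them into one string.
record = {
    "title": "Dublin Core and Emerging Conventions for a Semantic Web",
    "creator": {"name": "Baker, Tom", "affiliation": "FHG"},
}
```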

6
A process for community standardization
  • 1995-1999: open workshops, unruly but stimulating
    meetings of minds, rough consensus
  • 2000: qualifier vote by circa 25 voting members of
    an ad-hoc "Usage Committee"
  • 2001: smaller Usage Board
  • Codification of formal process for editorial
    control
  • Two two-day face-to-face meetings per year
  • Mandate and responsibility to maintain standard,
    approve extensions and clarifications

7
...based on editorial review by a Usage Board
  • Term set must evolve as implementors coin new
    terms and usage patterns emerge
  • Working groups propose new terms or
    clarifications
  • Evaluate in light of grammatical principle,
    usefulness, clarity of definition, overlap with
    existing terms
  • Review application profiles based on Dublin Core
  • Tiered model of approval status: conforming,
    recommended, obsolete, registered
  • Meeting materials, mailing lists, and decisions
    archived and accessible on the open Web
  • DCMI as maintenance agency for ISO 15836

8
A bias towards simple and generic
  • DCMI Usage Board bias
  • Strength and value of DC lies in simplicity and
    generic applicability
  • Keep the core standard small, generic, and
    lightweight
  • Resist temptation to "complexify": people want
    and need distinctions, but not in a "small
    standard"
  • DCMI Type Vocabulary has just 12 terms; user
    communities should invent or re-use their own
    more specific sub-types

9
A bias towards cooperation and re-use
  • Help user communities define and use their own
    extensions
  • Cooperate with maintainers of specialized
    vocabularies on forms of mutual recognition
  • Provide a model for re-use

10
"Good neighbor" policies
  • MARC Relators (roles such as "adapter", "artist")
  • DCMI: "use MARC Relators to refine
    dc:contributor"
  • LoC's RDF schema: "MARC Relators (identified with
    URIs) are sub-properties of dc:contributor"
  • Encoding Schemes
  • DCMI term designates Library of Congress Subject
    Headings (http://purl.org/dc/terms/LCSH)
  • If LoC coins own term, DCMI should promote its
    use
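
Because LoC declares the relators as sub-properties of dc:contributor, a Dublin Core application that knows nothing about MARC Relators can still read relator statements as plain dc:contributor. The Python sketch below shows that "dumb-down" step. The relator namespace URI is a hypothetical placeholder, not the identifier LoC actually uses, and the function name is also hypothetical. `adp` and `art` are the MARC relator codes for adapter and artist.

```python
from typing import List, Tuple

# Hypothetical namespace for relator properties, for illustration only;
# the Library of Congress publishes its own identifiers for relators.
RELATOR_NS = "http://example.org/marc-relators/"
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"

# rdfs:subPropertyOf declarations: child property -> parent property.
SUB_PROPERTY_OF = {
    RELATOR_NS + "adp": DC_CONTRIBUTOR,  # adapter
    RELATOR_NS + "art": DC_CONTRIBUTOR,  # artist
}


def dumb_down(statements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace each sub-property with its top-level parent, keeping values.

    Follows the subPropertyOf chain upward; assumes the declarations
    contain no cycles.
    """
    result = []
    for prop, value in statements:
        while prop in SUB_PROPERTY_OF:
            prop = SUB_PROPERTY_OF[prop]
        result.append((prop, value))
    return result


print(dumb_down([(RELATOR_NS + "art", "Example Artist")]))
```

The relator's more specific meaning is lost in the process, but the value remains usable as a contributor.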

11
A "namespace policy"
  • All DCMI metadata terms are given unique identity
    within three namespaces
  • http://purl.org/dc/elements/1.1/ - the core
    elements
  • http://purl.org/dc/terms/ - all other
    elements/qualifiers
  • http://purl.org/dc/dcmitype/ - a Type vocabulary
  • Example: http://purl.org/dc/elements/1.1/title
  • Policy on long-term stability of namespace URIs
  • Changes not substantially semantic (i.e.,
    corrections) will not result in change of
    namespace URIs
  • Semantic changes must trigger a change of name
  • Version turnover of a document management
    nature will have no effect on namespace URIs
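
Because every DCMI term sits in exactly one of the three namespaces, an application can tell from the URI alone where a term comes from. A minimal Python sketch, assuming only the three namespace URIs listed on this slide (the function name is hypothetical):

```python
from typing import Tuple

# The three DCMI namespaces and what each contains.
DCMI_NAMESPACES = {
    "http://purl.org/dc/elements/1.1/": "core elements",
    "http://purl.org/dc/terms/": "other elements and qualifiers",
    "http://purl.org/dc/dcmitype/": "DCMI Type Vocabulary",
}


def split_term_uri(uri: str) -> Tuple[str, str, str]:
    """Split a DCMI term URI into (namespace URI, local name, namespace role)."""
    for namespace, role in DCMI_NAMESPACES.items():
        if uri.startswith(namespace) and len(uri) > len(namespace):
            return namespace, uri[len(namespace):], role
    raise ValueError(f"not a term in a DCMI namespace: {uri}")


print(split_term_uri("http://purl.org/dc/elements/1.1/title"))
```

Under the stability policy, a correction leaves the URI unchanged, and so leaves the result of this split unchanged as well. A semantic change would introduce a new term with a new name.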

12
A typology of metadata vocabularies
  • Term declarations
  • Declare a unique set of elements and definitions
  • Each DCMI term is identified with a URI
  • Documented in HTML pages, formally declared as
    RDF schemas
  • Application profiles
  • Declare how an application uses which terms in
    its metadata
  • May mix-and-match from multiple namespaces

13
Why application profiles?
  • People want them!
  • Most standards have them: IEEE/LOM, MARC, DOI...
  • As focus of dialogue and semantic negotiation
  • Deep human need to resist total standardization?
  • To identify emerging semantics "at the edges" of
    a standard
  • To know how colleagues and peers are designing
    metadata and avoid "reinventing the wheel"
  • To harmonize metadata usage within domains
  • User communities (DC-Libraries, DC-Government)
  • Subject gateways (Renardus)

14
Dublin Core application profiles
  • Declaration specifying which metadata terms an
    information provider uses in metadata
  • Identifies source of terms used
  • May provide additional documentation
  • Designed to promote interoperability within
    constraints of Dublin Core model
  • Draft guidelines sponsored by the European
    Committee for Standardization (CEN), to be
    progressed through the DCMI process
  • http://www.cenorm.be/isss/Workshop/MMI-DC/application-profile-for-comment.pdf
  • Caution: a documentary format cannot itself
    guarantee interoperability
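
A profile declared in machine-processable form can be checked against actual records. The Python sketch below assumes a hypothetical profile format: a mapping from term URIs, drawn from two namespaces, to an obligation of "mandatory" or "optional". This is not the CEN guidelines' format.

```python
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical profile: which terms the application uses, from which
# namespaces, and whether each is mandatory or optional.
PROFILE: Dict[str, str] = {
    DC + "title": "mandatory",
    DC + "creator": "optional",
    DCTERMS + "audience": "optional",
}


def check_against_profile(record: Dict[str, List[str]], profile: Dict[str, str]) -> List[str]:
    """List ways a record departs from a profile: unknown or missing terms."""
    problems = [f"term not in profile: {term}" for term in record if term not in profile]
    problems += [
        f"missing mandatory term: {term}"
        for term, rule in profile.items()
        if rule == "mandatory" and not record.get(term)
    ]
    return problems


print(check_against_profile({DC + "title": ["Example"]}, PROFILE))
```

A passing check shows only that a record follows the declaration. As cautioned above, it does not by itself make two applications interoperable.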

15
A set of encoding practices
  • Guidelines for encoding metadata records (or
    embedded metadata) in HTML, XML, RDF
  • Use of rdfs:label and rdf:value allows nesting of
    secondary resource descriptions
  • A model for declaring terms "machine-processably"
    in RDF
  • Namespace Policy mandates this, though not
    specifically RDF
  • Work item: a model for declaring application
    profiles machine-processably
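
The rdf:value / rdfs:label pattern for nested values can be sketched with Python's standard `xml.etree.ElementTree`. The example assumes a subject drawn from an encoding scheme (here dcterms:MESH). The scheme's code goes in rdf:value and a human-readable heading goes in rdfs:label. The resource URI, code, label, and function name are hypothetical placeholders. The output follows the general shape of DCMI's RDF/XML guidance but has not been validated against it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Register prefixes so the output uses rdf:, dc:, etc. instead of ns0:.
for prefix, uri in {"rdf": RDF, "rdfs": RDFS, "dc": DC, "dcterms": DCTERMS}.items():
    ET.register_namespace(prefix, uri)


def subject_with_scheme(resource_uri: str, scheme: str, code: str, label: str) -> str:
    """Serialize one dc:subject whose value is a secondary resource."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri})
    subject = ET.SubElement(desc, f"{{{DC}}}subject")
    # The nested node is typed by the encoding scheme (e.g. dcterms:MESH)...
    value_node = ET.SubElement(subject, f"{{{DCTERMS}}}{scheme}")
    # ...and carries the coded value plus a label for human readers.
    ET.SubElement(value_node, f"{{{RDF}}}value").text = code
    ET.SubElement(value_node, f"{{{RDFS}}}label").text = label
    return ET.tostring(root, encoding="unicode")


print(subject_with_scheme("http://example.org/doc", "MESH", "EXAMPLE-CODE", "Example heading"))
```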

16
CORES Resolution

17
Shared conventions for declaring namespaces?
  • Cross-community consensus-building
  • W3C metadata standards and URIs as a basis for
    interoperability among different standards?
  • EU CORES Project (2002-2003)
  • Identify and explore areas of possible agreement
    among major standards initiatives
  • Interoperability Forum meeting in Brussels,
    November 2002

18
CORES Resolution on Identifying Metadata Elements
  • http://www.cores-eu.net/interoperability/cores-resolution/
  • Whereas
  • Our metadata standards have elements (units of
    meaning) comparable and mappable to elements of
    other standards,
  • We agree
  • To assign Uniform Resource Identifiers to our
    elements
  • To articulate and publish specific policies
    regarding the stability, persistence, and
    maintenance of the URIs assigned to the elements.

19
Clarifications to the CORES Resolution
  • URIs not necessarily used in applications "as is"
  • In metadata records, maybe dc:contributor instead
    of http://purl.org/dc/elements/1.1/contributor
  • Signatories decide what to identify with URIs
  • An individual element? An entire set of
    elements? A specific historical version of an
    element?
  • No implication that URIs will "resolve" to
    anything
  • URIs may "get" something with HTTP on Web or
    not!
  • E.g., resolve to a database query?
  • Resolve to an RDF schema?
  • Or even resolve to nothing at all ("file not
    found")!!
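
In a record, a short form such as dc:contributor stands for the full URI through a prefix binding. A Python sketch, assuming a simple prefix table like the one below. The dc and dcterms prefixes are common conventions rather than part of the URIs, and the function name is hypothetical. In line with the clarifications above, expansion only rewrites the string. It does not fetch the URI or assume that it resolves.

```python
from typing import Mapping

# Prefix bindings assumed for this example; a record declares its own.
PREFIXES: Mapping[str, str] = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(qname: str, prefixes: Mapping[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'dc:contributor' into a full URI.

    This is pure string rewriting: the URI is not fetched or resolved.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no known prefix in {qname!r}")
    return prefixes[prefix] + local


print(expand("dc:contributor"))
```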

20
Signatories
  • Eliot Christian, USGS, for GILS
  • Brian Green, EDItEUR, for ONIX
  • Rebecca Guenther, Library of Congress, for MARC21
  • Keith Jeffery, EuroCRIS, for CERIF
  • Norman Paskin, Intl DOI Foundation, for DOI
  • Robby Robson, IEEE LTSC, for IEEE/LOM
  • Stuart Weibel, DCMI, for Dublin Core

21
Signatories Action Plan
  • Action plan, November 2002 - May 2003
  • Define and publish URI assignment mechanisms
  • Assign URIs to elements
  • Publish URI persistence policies
  • Article on follow-up scheduled for D-Lib Magazine
    in July 2003 issue
  • Taken as a whole, a corpus of good-practice
    policies for others to discuss and emulate

22
Beyond the CORES Resolution
  • Benefits for signatories
  • Important first step towards future
    interoperability applications (e.g., mapping,
    conversion)
  • Improve "citability" of elements between
    standards
  • Potential areas of further work
  • Provide persistent URIs for terms in taxonomies
    and ontologies
  • Shared conventions on declaring URIs in
    machine-processable forms
  • Shared conventions for application profiles and
    mapping constructs
  • Shared ontologies as targets for mapping

23
What exactly is being identified?
  • Is a particular term the same when used in
    different contexts?
  • A single term in a flat namespace?
  • http://ltsc.ieee.org/LOM/Identifier
  • Or two terms in a flat namespace?
  • http://ltsc.ieee.org/LOM/GeneralIdentifier
  • http://ltsc.ieee.org/LOM/MetadataIdentifier
  • Or two terms in a hierarchical namespace?
  • http://ltsc.ieee.org/LOM/General/Identifier
  • http://ltsc.ieee.org/LOM/Metadata/Identifier
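
The three options differ in what they can tell apart. The small Python sketch below mints all three forms for an element path. It shows that a single flat term conflates General/Identifier with Metadata/Identifier. The function name is hypothetical, and the base URI is the illustrative one from this slide, not necessarily an identifier IEEE LTSC has published.

```python
from typing import Dict, Tuple

BASE = "http://ltsc.ieee.org/LOM/"


def candidate_uris(path: Tuple[str, ...]) -> Dict[str, str]:
    """Mint a URI for a LOM element path in each of the three styles."""
    return {
        "single term, flat": BASE + path[-1],
        "compound term, flat": BASE + "".join(path),
        "hierarchical": BASE + "/".join(path),
    }


general = candidate_uris(("General", "Identifier"))
metadata = candidate_uris(("Metadata", "Identifier"))
for style in general:
    # A style that gives both elements the same URI cannot distinguish them.
    print(style, "distinct" if general[style] != metadata[style] else "conflated")
```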

24
What exactly is being identified?
  • For purposes of identification, is a term "the
    same" through successive versions?
  • At first, DC reflected version in the URI
  • http://purl.org/dc/elements/1.1/title
  • Then decided to keep URIs stable and define the
    limits of change in the Namespace Policy
  • http://purl.org/dc/terms/audience
  • URIs for DC 1.1 kept for legacy reasons
  • URIs for successive versions of a term used
    "behind the scenes" for tracking changes
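
Keeping the term URI stable while tracking versions "behind the scenes" amounts to a lookup from a term and a date to a version identifier. The Python sketch below assumes a hypothetical history table. The version URIs and dates are invented placeholders, not DCMI's actual records, and the function name is also hypothetical.

```python
from datetime import date

# Hypothetical history: the term URI stays stable while each version gets
# its own identifier. Version URIs and dates are invented placeholders.
HISTORY = {
    "http://purl.org/dc/terms/audience": [
        (date(2001, 1, 1), "http://example.org/history/audience-001"),
        (date(2002, 6, 1), "http://example.org/history/audience-002"),
    ],
}


def version_in_force(term_uri: str, on: date) -> str:
    """Return the identifier of the version of a term current on a given date."""
    current = None
    for issued, version_uri in sorted(HISTORY[term_uri]):
        if issued <= on:
            current = version_uri  # later versions supersede earlier ones
    if current is None:
        raise LookupError(f"{term_uri} had no version on {on}")
    return current


print(version_in_force("http://purl.org/dc/terms/audience", date(2003, 6, 26)))
```

A lookup like this is one way legacy metadata could later be interpreted against the state of the vocabulary when it was created.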

25
Publishing and documenting a vocabulary

26
A method for maintaining (and versioning) a
vocabulary
  • Assume that vocabularies must evolve
  • Anticipate need to understand discrete states of
    the standard
  • All documents, decisions, and term declarations
    must evolve
  • Versioning to support future automated methods
    for processing legacy metadata
  • Numbered decisions linked to
  • A specific historical version of a term
  • Supporting documentation for the decision
  • Historical record of the Usage Board meeting

27
Modes for publishing a vocabulary
  • Multiple publication formats needed
  • Web pages for human use
  • RDF schemas for expressing relationships between
    terms in machine-processable form
  • OWL ontologies and rules languages will improve
    expressivity of these constructs
  • Future schemas may need to express versioning
    machine-processably
  • Workflow
  • Web pages and schemas from a common source
  • XML data plus XSLT scripts: simple, effective
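
The "common source" workflow can be illustrated without XSLT. This Python sketch stands in for the XML-plus-XSLT pipeline described here and does not reproduce it. One list of term records is rendered both as an HTML table for people and as RDF schema statements in N-Triples syntax for machines. The record format and function names are hypothetical. The label and definition shown are those of dcterms:created.

```python
from html import escape

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The single common source: one record per term (format is hypothetical).
TERMS = [
    {
        "uri": "http://purl.org/dc/terms/created",
        "label": "Date Created",
        "definition": "Date of creation of the resource.",
        "refines": "http://purl.org/dc/elements/1.1/date",
    },
]


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_html(terms) -> str:
    """Render the terms as an HTML table for human readers."""
    rows = "".join(
        f"<tr><td>{escape(t['label'])}</td><td>{escape(t['uri'])}</td>"
        f"<td>{escape(t['definition'])}</td></tr>"
        for t in terms
    )
    return f"<table>{rows}</table>"


def to_ntriples(terms) -> str:
    """Render the same terms as RDF schema statements in N-Triples."""
    lines = []
    for t in terms:
        lines.append(f"<{t['uri']}> <{RDFS}label> {literal(t['label'])} .")
        lines.append(f"<{t['uri']}> <{RDFS}comment> {literal(t['definition'])} .")
        if t.get("refines"):
            lines.append(f"<{t['uri']}> <{RDFS}subPropertyOf> <{t['refines']}> .")
    return "\n".join(lines)


print(to_html(TERMS))
print(to_ntriples(TERMS))
```

Both outputs are generated from the same records, so a correction made in one place reaches the human-readable and machine-readable forms together.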

28
A searchable "registry" of terms
  • DCMI Registry
  • Searchable database of metadata terms
  • Terms translated into various languages
  • Goal: application interface for Web services
  • Goal: harvest schemas directly from their
    maintainers
  • An ecology of registries?
  • Harvest and merge element sets, vocabularies,
    profiles
  • For general overviews: SCHEMAS, CORES
  • Specific domains: MEG, GEM (education), FAO
    (agriculture)
    (agriculture)
  • Publication environment for information models
  • Tool for harmonization, mapping, conversion,
    merging
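
At its simplest, a searchable registry with translated labels is an index from term URIs to labels in several languages. Here is a toy Python sketch, assuming an in-memory table. The German and French labels are illustrative translations, not entries copied from the DCMI Registry, and the function name is hypothetical. A real registry would harvest such data from the maintainers of each vocabulary.

```python
from typing import Dict, List, Optional

# Hypothetical registry contents: term URI -> {language code: label}.
REGISTRY: Dict[str, Dict[str, str]] = {
    "http://purl.org/dc/elements/1.1/title": {"en": "Title", "de": "Titel", "fr": "Titre"},
    "http://purl.org/dc/elements/1.1/creator": {"en": "Creator", "de": "Urheber"},
}


def search(query: str, lang: Optional[str] = None) -> List[str]:
    """Return URIs of terms whose label contains the query (case-insensitive).

    If lang is given, only labels in that language are searched.
    """
    q = query.casefold()
    hits = []
    for uri, labels in REGISTRY.items():
        if lang is None:
            candidates = list(labels.values())
        else:
            candidates = [labels[lang]] if lang in labels else []
        if any(q in label.casefold() for label in candidates):
            hits.append(uri)
    return hits


print(search("titel"))
print(search("urheber", lang="de"))
```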

29
The evolving Web context

30
The Web as a new social context
  • Something new in history
  • Not just an historical set of technologies (HTTP,
    URLs, HTML)
  • Platform for historically unprecedented forms of
    social and intellectual interaction
  • Metadata as language for the Web
  • A language for statements about Web resources
  • Statements created and used both by humans and by
    machines
  • "Semantic Web" is about describing how resources
    relate to each other

31
Scale and automation
  • The Web is too big to control
  • Metadata statements are expensive to make and
    maintain
  • Shift away from the metaphor of "library"?
  • NSF workshop on "Post Digital Library Futures"
  • http://www.sis.pitt.edu/dlwkshop/
  • Automated resource discovery (e.g. Google)
  • Using contextual information (e.g., URL
    structures) to infer "aboutness"
  • Natural-language technology, e.g. summarization
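
At its crudest, inferring "aboutness" from URL structure means splitting the host and path into words. The Python sketch below is a toy heuristic for illustration only. It does not describe how Google or any production system works, the stop-word list is an arbitrary assumption, and the function name is hypothetical.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Arbitrary stop words for this toy example.
STOPWORDS = {"www", "index", "html", "htm", "com", "org", "edu", "net"}


def url_topic_hints(url: str) -> List[str]:
    """Guess topic words from a URL's host and path with a crude heuristic."""
    parts = urlsplit(url)
    text = (parts.netloc + " " + parts.path).lower()
    tokens = re.split(r"[^a-z0-9]+", text)
    # Keep tokens that are not too short, not stop words, and not numbers.
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS and not t.isdigit()]


print(url_topic_hints("http://example.org/agriculture/soil-science/report.html"))
```

Even on this toy input, the topical words come mixed with noise such as "example" and "report".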

32
An evolving role for metadata
  • Balance between human and machine
  • Automated methods to generate metadata
  • "Let Google do it" versus expert intervention
  • Granularity of metadata
  • Describe each item or entire collections?
  • How much metadata is "enough" to improve
    discovery?
  • Semantic precision or tolerance of fuzziness?

33
Which aspects of Dublin Core will prove most
useful over time?
  • The elements and related sets of terms
  • Open processes for community standardization
  • Editorial review by a Usage Board
  • A bias toward simple and generic metadata
  • A bias toward cooperative re-use of vocabularies
  • The etiquette of mutual recognition
  • A namespace policy for using URIs
  • A typology of vocabularies (e.g. application
    profiles)
  • A set of encoding practices (HTML, XML, RDF)
  • Methods for maintaining and versioning a
    vocabulary
  • Publishing a vocabulary for humans and machines
  • Searchable registries of metadata terms

34
thomas.baker_at_bi.fhg.de