Title: Dublin Core and Emerging Conventions for a Semantic Web
1 Dublin Core and Emerging Conventions for a Semantic Web
- Thomas Baker
- Fraunhofer-Gesellschaft, Bonn
- ELPUB 2003, Guimaraes, Portugal
- 26 June 2003
2 A particular set of metadata terms
- Dublin Core as a simple and semantically generic lingua franca
- Fifteen core elements, e.g. Subject, Description, Title
- A metadata "pidgin" for "digital tourists" on a culturally diverse global Web
- Limited grammar, easy to learn and use
- Enough "as is" for many needs
- 33 "element refinements" and 17 "encoding schemes" to qualify the elements for specialized purposes
- A small set of 12 resource types for use with dc:type
3 A simple data model (resource with properties)
- 1996-1998: Collective realization that machine-processability requires a coherent data model
- 1996: Warwick Framework proposed at DC-2 workshop; DC as one specialized module (resource discovery)
- 1997: Qualifiers proposed for specifying meanings
- Some early adopters took this to unintended extremes, e.g. DC.Creator.telephone-number
- 1998: DCMI involvement in emerging Resource Description Framework, clarification of simple data model
- 2000: First set of qualifiers approved
4 A typology of metadata terms ("grammar")
- Elements
  - (core) properties of resources
- Element Refinements
  - properties that semantically refine elements
- Encoding Schemes
  - give context to a metadata value
- Vocabulary Terms
  - constitute controlled lists of possible values
5 An emergent approach to "structured values"
- Implementers sometimes "shoehorn" complex sets of information into a single value
  - Creator = "name=Tom, affiliation=FHG, shoesize=47"
- In practice, a large variety of "structured values"
  - Labelled strings
  - Unlabelled strings
  - Marked-up strings (e.g., LaTeX, HTML)
  - Secondary resource descriptions (as above)
- Post-processing ad-hoc constructs is messy and does not scale
- Andy Powell's model
  - Elements can have string values (Simple DC)
  - A further requirement to point to linked metadata?
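The cost of post-processing a "shoehorned" value can be shown with a minimal sketch. It assumes a labelled string that uses "=" between label and value and "," between pairs; that delimiter convention and the function name `parse_labelled_value` are assumptions made for illustration, not part of any DCMI specification.

```python
def parse_labelled_value(value: str) -> dict:
    """Split a labelled string such as 'name=Tom, affiliation=FHG' into a dict.

    Assumes '=' separates label from value and ',' separates pairs
    (an illustrative convention, not a DCMI rule).
    """
    parts = {}
    for chunk in value.split(","):
        label, sep, text = chunk.partition("=")
        if not sep:
            # No '=' found: the fragment carries no label we can use.
            raise ValueError(f"unlabelled fragment: {chunk.strip()!r}")
        parts[label.strip()] = text.strip()
    return parts


creator = parse_labelled_value("name=Tom, affiliation=FHG, shoesize=47")
print(creator["affiliation"])
```

A value that itself contains a comma, such as "name=Baker, Thomas", already makes this parser raise an error, which is the kind of ad-hoc fragility the slide warns about.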
6 A process for community standardization
- 1995-1999: open workshops, unruly but stimulating meetings of minds, rough consensus
- 2000: qualifier vote, circa 25 voting members of an ad-hoc "Usage Committee"
- 2001: smaller Usage Board
- Codification of formal process for editorial control
- Two two-day face-to-face meetings per year
- Mandate and responsibility to maintain standard, approve extensions and clarifications
7 ...based on editorial review by a Usage Board
- Term set must evolve as implementers coin new terms and usage patterns emerge
- Working groups propose new terms or clarifications
- Evaluate in light of grammatical principle, usefulness, clarity of definition, overlap with existing terms
- Review application profiles based on Dublin Core
- Tiered model of approval status: conforming, recommended, obsolete, registered
- Meeting materials, mailing lists, and decisions archived and accessible on the open Web
- DCMI as maintenance agency for ISO 15836
8 A bias towards simple and generic
- DCMI Usage Board bias
- Strength and value of DC lie in simplicity and generic applicability
- Keep the core standard small, generic, and lightweight
- Resist temptation to "complexify": people want and need distinctions, but not in a "small standard"
- DCMI Type Vocabulary has just 12 terms: user communities should invent or re-use their own more specific sub-types
9 A bias towards cooperation and re-use
- Help user communities define and use their own extensions
- Cooperate with maintainers of specialized vocabularies on forms of mutual recognition
- Provide a model for re-use
10 "Good neighbor" policies
- MARC Relators (roles such as "adapter", "artist")
  - DCMI: "use MARC Relators to refine dc:contributor"
  - LoC's RDF schema: "MARC Relators (identified with URIs) are sub-properties of dc:contributor"
- Encoding Schemes
  - DCMI term designates Library of Congress Subject Headings (http://purl.org/dc/terms/LCSH)
  - If LoC coins own term, DCMI should promote its use
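What a sub-property declaration buys an application can be sketched as follows: a consumer that only understands dc:contributor can still use statements made with a more specific relator. The relator URIs below are hypothetical placeholders (the slide does not give the actual LoC URIs), and `generalize` is an illustrative name.

```python
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"

# Hypothetical relator URIs, used only for illustration.
SUB_PROPERTY_OF = {
    "http://example.org/relators/artist": DC_CONTRIBUTOR,
    "http://example.org/relators/adapter": DC_CONTRIBUTOR,
}


def generalize(statements):
    """Replace each property by its declared super-property, if it has one."""
    return [(SUB_PROPERTY_OF.get(prop, prop), value) for prop, value in statements]


record = [
    ("http://example.org/relators/artist", "A. Painter"),
    ("http://purl.org/dc/elements/1.1/title", "Untitled"),
]
for prop, value in generalize(record):
    print(prop, value)
```

The mapping table plays the role that sub-property statements in an RDF schema would play; properties without a declared super-property pass through unchanged.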
11 A "namespace policy"
- All DCMI metadata terms are given unique identity within three namespaces
  - http://purl.org/dc/elements/1.1/ - the core elements
  - http://purl.org/dc/terms/ - all other elements/qualifiers
  - http://purl.org/dc/dcmitype/ - a Type vocabulary
  - Example: http://purl.org/dc/elements/1.1/title
- Policy on long-term stability of namespace URIs
  - Changes not substantially semantic (i.e., corrections) will not result in change of namespace URIs
  - Semantic changes must trigger a change of name
  - Version turnover of a document management nature will have no effect on namespace URIs
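As a minimal sketch of how the three namespaces give each term a unique identity, a term URI is simply the namespace URI followed by the term name. The prefixes `dc`, `dcterms`, and `dcmitype` are conventional abbreviations assumed here, and `term_uri` is an illustrative helper, not a DCMI-defined API.

```python
# The three DCMI namespaces named on the slide, keyed by assumed prefixes.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def term_uri(qname: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full term URI."""
    prefix, sep, name = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return NAMESPACES[prefix] + name


print(term_uri("dc:title"))
print(term_uri("dcterms:audience"))
```

Because the policy keeps namespace URIs stable, a table like this would not need to change when a term's documentation is corrected.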
12 A typology of metadata vocabularies
- Term declarations
  - Declare a unique set of elements and definitions
  - Each DCMI term is identified with a URI
  - Documented in HTML pages, formally declared as RDF schemas
- Application profiles
  - Declare how an application uses which terms in its metadata
  - May mix-and-match from multiple namespaces
13 Why application profiles?
- People want them!
- Most standards have them: IEEE/LOM, MARC, DOI...
- As focus of dialogue and semantic negotiation
- Deep human need to resist total standardization?
- To identify emerging semantics "at the edges" of a standard
- To know how colleagues and peers are designing metadata and avoid "reinventing the wheel"
- To harmonize metadata usage within domains
  - User communities (DC-Libraries, DC-Government)
  - Subject gateways (Renardus)
14 Dublin Core application profiles
- Declaration specifying which metadata terms an information provider uses in metadata
- Identifies source of terms used
- May provide additional documentation
- Designed to promote interoperability within constraints of Dublin Core model
- Draft guidelines sponsored by European Standardization Committee (CEN) to be progressed through DCMI process
- http://www.cenorm.be/isss/Workshop/MMI-DC/application-profile-for-comment.pdf
- Caution: a documentary format cannot itself guarantee interoperability
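One way to picture an application profile is as a declared set of term URIs, drawn from several namespaces, against which a record can be checked. The sketch below is only that: the profile name, the local `example.org` term, and the function `undeclared_terms` are hypothetical, and the CEN draft guidelines do not prescribe this data structure.

```python
# A hypothetical profile mixing DCMI terms with one local term.
PROFILE = {
    "name": "Example library profile",
    "terms": {
        "http://purl.org/dc/elements/1.1/title",
        "http://purl.org/dc/elements/1.1/creator",
        "http://purl.org/dc/terms/audience",
        "http://example.org/terms/shelfmark",  # hypothetical local term
    },
}


def undeclared_terms(record: dict, profile: dict) -> list:
    """Return the property URIs used in a record that the profile does not declare."""
    return sorted(set(record) - profile["terms"])


record = {
    "http://purl.org/dc/elements/1.1/title": "Annual report",
    "http://example.org/terms/shelfmark": "QA 76.9",
    "http://example.org/terms/colour": "blue",
}
print(undeclared_terms(record, PROFILE))
```

In line with the caution on the slide, a record that passes such a check only uses declared terms; whether two providers mean the same thing by them is a separate question.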
15 A set of encoding practices
- Guidelines for encoding metadata records (or embedded metadata) in HTML, XML, RDF
- Use of rdfs:label and rdf:value allows nesting of secondary resource descriptions
- A model for declaring terms "machine-processably" in RDF
- Namespace Policy mandates this, though not specifically RDF
- Work item: a model for declaring application profiles machine-processably
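For the HTML case, embedded metadata is commonly written as `meta` elements named `DC.<element>`, with a `link rel="schema.DC"` element pointing at the namespace. The sketch below generates such markup from a list of element and value pairs; the function name `dc_html_meta` and the sample values are assumptions for illustration, and the authoritative details are in the DCMI encoding guidelines.

```python
from html import escape


def dc_html_meta(record):
    """Render (element, value) pairs as HTML <meta> elements for a page <head>."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record:
        # Escape the value so quotes and ampersands cannot break the attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


print(dc_html_meta([
    ("title", 'Dublin Core & "friends"'),
    ("creator", "Baker, Thomas"),
]))
```

Repeating an element, for example two `creator` pairs, simply yields two `meta` elements with the same name.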
16 CORES Resolution
17 Shared conventions for declaring namespaces?
- Cross-community consensus-building
- W3C metadata standards and URIs as a basis for interoperability among different standards?
- EU CORES Project (2002-2003)
- Identify and explore areas of possible agreement among major standards initiatives
- Interoperability Forum meeting in Brussels, November 2002
18 CORES Resolution on Identifying Metadata Elements
- http://www.cores-eu.net/interoperability/cores-resolution/
- Whereas
  - Our metadata standards have elements (units of meaning) comparable and mappable to elements of other standards,
- We agree
  - To assign Uniform Resource Identifiers to our elements
  - To articulate and publish specific policies regarding the stability, persistence, and maintenance of the URIs assigned to the elements.
19 Clarifications to the CORES Resolution
- URIs not necessarily used in applications "as is"
  - In metadata records, maybe dc:contributor instead of http://purl.org/dc/elements/1.1/contributor
- Signatories decide what to identify with URIs
  - An individual element? An entire set of elements? A specific historical version of an element?
- No implication that URIs will "resolve" to anything
  - URIs may "get" something with HTTP on Web, or not!
  - E.g., resolve to a database query?
  - Resolve to an RDF schema?
  - Or even resolve to nothing at all ("file not found")!!
20 Signatories
- Eliot Christian, USGS, for GILS
- Brian Green, EDItEUR, for ONIX
- Rebecca Guenther, Library of Congress, for MARC21
- Keith Jeffery, EuroCRIS, for CERIF
- Norman Paskin, Intl DOI Foundation, for DOI
- Robby Robson, IEEE LTSC, for IEEE/LOM
- Stuart Weibel, DCMI, for Dublin Core
21 Signatories Action Plan
- Action plan, November 2002 to May 2003
  - Define and publish URI assignment mechanisms
  - Assign URIs to elements
  - Publish URI persistence policies
- Follow-up article scheduled for the July 2003 issue of D-Lib Magazine
- Taken as a whole, a corpus of good-practice policies for others to discuss and emulate
22 Beyond the CORES Resolution
- Benefits for signatories
  - Important first step towards future interoperability applications (e.g., mapping, conversion)
  - Improve "citability" of elements between standards
- Potential areas of further work
  - Provide persistent URIs for terms in taxonomies and ontologies
  - Shared conventions on declaring URIs in machine-processable forms
  - Shared conventions for application profiles and mapping constructs
  - Shared ontologies as targets for mapping
23 What exactly is being identified?
- Is a particular term the same when used in different contexts?
- A single term in a flat namespace?
  - http://ltsc.ieee.org/LOM/Identifier
- Or two terms in a flat namespace?
  - http://ltsc.ieee.org/LOM/GeneralIdentifier
  - http://ltsc.ieee.org/LOM/MetadataIdentifier
- Or two terms in a hierarchical namespace?
  - http://ltsc.ieee.org/LOM/General/Identifier
  - http://ltsc.ieee.org/LOM/Metadata/Identifier
24 What exactly is being identified?
- For purposes of identification, is a term "the same" through successive versions?
- At first, DC reflected version in the URI
  - http://purl.org/dc/elements/1.1/title
- Then decided to keep URIs stable and define the limits of change in the Namespace Policy
  - http://purl.org/dc/terms/audience
- URIs for DC 1.1 kept for legacy reasons
- URIs for successive versions of a term used "behind the scenes" for tracking changes
25 Publishing and documenting a vocabulary
26 A method for maintaining (and versioning) a vocabulary
- Assume that vocabularies must evolve
- Anticipate need to understand discrete states of the standard
- All documents, decisions, and term declarations must evolve
- Versioning to support future automated methods for processing legacy metadata
- Numbered decisions linked to
  - A specific historical version of a term
  - Supporting documentation for the decision
  - Historical record of the Usage Board meeting
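To illustrate how versioning could support automated processing of legacy metadata, the sketch below keeps a history of term versions, each tied to a numbered decision, and looks up the version in force on a given date. The record layout, the decision identifiers, the dates, and the definitions are all invented placeholders; DCMI's actual version records are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TermVersion:
    term_uri: str     # stable URI of the term
    valid_from: date  # date this version took effect
    definition: str
    decision: str     # identifier of the numbered decision (placeholder values)


def version_as_of(history, when):
    """Return the latest version of a term that was in force on the given date."""
    candidates = [v for v in history if v.valid_from <= when]
    if not candidates:
        raise LookupError("no version of the term existed on that date")
    return max(candidates, key=lambda v: v.valid_from)


# Placeholder history for one term; dates, texts, and decision ids are invented.
HISTORY = [
    TermVersion("http://example.org/terms/widget", date(2000, 1, 1), "First wording.", "decision-001"),
    TermVersion("http://example.org/terms/widget", date(2002, 6, 1), "Corrected wording.", "decision-014"),
]

print(version_as_of(HISTORY, date(2001, 3, 15)).decision)
```

The term URI stays the same across versions, so the lookup depends only on the date attached to the metadata being processed.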
27 Modes for publishing a vocabulary
- Multiple publication formats needed
  - Web pages for human use
  - RDF schemas for expressing relationships between terms in machine-processable form
  - OWL ontologies and rules languages will improve expressivity of these constructs
  - Future schemas may need to express versioning machine-processably
- Workflow
  - Web pages and schemas from a common source
  - XML data and XSLT scripts: simple, effective
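The "common source" workflow can be sketched in a few lines. The slide describes XML data transformed with XSLT; the example below swaps in plain Python functions over a small in-memory list to show the same idea, namely that the human-readable page and the RDF schema are both derived from one source. The function names and the simplified output formats are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Single source for the vocabulary (one term shown).
TERMS = [
    {
        "uri": "http://purl.org/dc/elements/1.1/title",
        "label": "Title",
        "comment": "A name given to the resource.",
    },
]


def to_html(terms):
    """Render a human-readable definition list."""
    rows = [
        f"<dt>{escape(t['label'])} ({escape(t['uri'])})</dt><dd>{escape(t['comment'])}</dd>"
        for t in terms
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


def to_rdf_schema(terms):
    """Render the same terms as a minimal RDF/XML schema."""
    props = [
        f"  <rdf:Property rdf:about={quoteattr(t['uri'])}>\n"
        f"    <rdfs:label>{escape(t['label'])}</rdfs:label>\n"
        f"    <rdfs:comment>{escape(t['comment'])}</rdfs:comment>\n"
        f"  </rdf:Property>"
        for t in terms
    ]
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">\n'
        + "\n".join(props)
        + "\n</rdf:RDF>"
    )


print(to_html(TERMS))
ET.fromstring(to_rdf_schema(TERMS))  # raises ParseError if the schema is not well-formed XML
```

A correction to a definition is then made once, in the source, and both publication formats pick it up on the next run.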
28 A searchable "registry" of terms
- DCMI Registry
  - Searchable database of metadata terms
  - Terms translated into various languages
  - Goal: application interface for Web services
  - Goal: harvest schemas directly from their maintainers
- An ecology of registries?
  - Harvest and merge element sets, vocabularies, profiles
  - For general overviews: SCHEMAS, CORES
  - Specific domains: MEG, GEM (education), FAO (agriculture)
  - Publication environment for information models
  - Tool for harmonization, mapping, conversion, merging
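A registry of this kind can be pictured, in miniature, as a list of terms with labels in several languages and a search over those labels. The sketch below is an in-memory toy under that assumption; the data layout, the `search` function, and the sample translations are illustrative and do not reflect the DCMI Registry's actual interface or contents.

```python
# Toy registry: each entry has a term URI and labels keyed by language code.
REGISTRY = [
    {"uri": "http://purl.org/dc/elements/1.1/title",
     "labels": {"en": "Title", "de": "Titel"}},
    {"uri": "http://purl.org/dc/elements/1.1/date",
     "labels": {"en": "Date", "de": "Datum"}},
]


def search(query, lang=None):
    """Return URIs of terms whose label contains the query (case-insensitive).

    If lang is given, only labels in that language are searched.
    """
    needle = query.casefold()
    hits = []
    for entry in REGISTRY:
        labels = entry["labels"]
        if lang is None:
            candidates = labels.values()
        else:
            candidates = [labels[lang]] if lang in labels else []
        if any(needle in label.casefold() for label in candidates):
            hits.append(entry["uri"])
    return hits


print(search("tit"))
print(search("datum", lang="de"))
```

Harvesting schemas from their maintainers, as the slide proposes, would amount to filling a structure like `REGISTRY` from remote sources instead of by hand.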
29 The evolving Web context
30 The Web as a new social context
- Something new in history
  - Not just an historical set of technologies (HTTP, URLs, HTML)
  - Platform for historically unprecedented forms of social and intellectual interaction
- Metadata as language for the Web
  - A language for statements about Web resources
  - Statements created and used both by humans and by machines
  - "Semantic Web" is about describing how resources relate to each other
31 Scale and automation
- The Web is too big to control
- Metadata statements are expensive to make and maintain
- Shift away from the metaphor of "library"?
  - NSF workshop on "Post Digital Library Futures"
  - http://www.sis.pitt.edu/dlwkshop/
- Automated resource discovery (e.g. Google)
  - Using contextual information (e.g., URL structures) to infer "aboutness"
  - Natural-language technology, e.g. summarization
32 An evolving role for metadata
- Balance between human and machine
  - Automated methods to generate metadata
  - "Let Google do it" versus expert intervention
- Granularity of metadata
  - Describe each item or entire collections?
  - How much metadata is "enough" to improve discovery?
  - Semantic precision or tolerance of fuzziness?
33 Which aspects of Dublin Core will prove most useful over time?
- The elements and related sets of terms
- Open processes for community standardization
- Editorial review by a Usage Board
- A bias toward simple and generic metadata
- A bias toward cooperative re-use of vocabularies
- The etiquette of mutual recognition
- A namespace policy for using URIs
- A typology of vocabularies (e.g. application profiles)
- A set of encoding practices (HTML, XML, RDF)
- Methods for maintaining and versioning a vocabulary
- Publishing a vocabulary for humans and machines
- Searchable registries of metadata terms
34 thomas.baker_at_bi.fhg.de