Metadata for the Web From Discovery to Description - PowerPoint PPT Presentation

Metadata for the Web From Discovery to Description CS 502 20020226 Carl Lagoze Cornell University – PowerPoint PPT presentation

1
Metadata for the Web: From Discovery to Description
  • CS 502, 2002-02-26
  • Carl Lagoze, Cornell University

2
Co-existing Cost/Functionality Levels
[Diagram axes: Greater Functionality, Cost]
3
Dublin Core Qualifiers
  • From fuzzy buckets to more specific description
  • Model of graceful degradation
  • Support both simplicity and specificity
  • Intra-domain and inter-domain semantics

4
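[Diagram: a Dublin Core statement read as a sentence]
  • Resource (implied subject)
  • has (implied verb)
  • property (one of 15 properties: DC:Creator, DC:Title,
    DC:Subject, DC:Date, ...)
  • X (property value: an appropriate literal)
  • qualifiers (adjectives): an optional qualifier on the
    property and an optional qualifier on the value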
5
Varieties of Qualifiers: Element Refinements
  • Make the meaning of an element narrower or more
    specific.
  • Narrowing implies an "is a" relationship
  • a "date created" is a "date"
  • an "is part of" relation is a "relation"
  • If your software does not understand the
    qualifier, you can safely ignore it.

6
Varieties of Qualifiers: Value Encoding Schemes
  • Says that the value is:
  • a term from a controlled vocabulary (e.g.,
    Library of Congress Subject Headings)
  • a string formatted in a standard way (e.g.,
    "2001-05-02" means May 2, not February 5)
  • Even if a scheme is not known by software, the
    value should be "appropriate" and usable for
    resource discovery.
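
A minimal Python sketch of what a date encoding scheme buys you: once the string is known to follow the ISO 8601 / W3C-DTF year-month-day order, software can read it without guessing. The function name parse_w3cdtf_date is hypothetical, and the sketch handles only the full YYYY-MM-DD form.

```python
from datetime import date


def parse_w3cdtf_date(value: str) -> date:
    """Parse a full YYYY-MM-DD date (one of the forms W3C-DTF allows)."""
    # date.fromisoformat reads the fields in year-month-day order.
    return date.fromisoformat(value)


d = parse_w3cdtf_date("2001-05-02")
print(d.year, d.month, d.day)  # 2001 5 2 (the second of May)
```

Software that does not know the scheme can still treat the value as a plain string for discovery.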

7
[Diagram: two qualified statements]
  • Resource has Subject "Languages -- Grammar"
    (encoding scheme: LCSH)
  • Resource has Date "2000-06-13"
    (element refinement: Revised; encoding scheme: ISO8601)
8
Dumb-Down Principle for Qualifiers
  • The fifteen elements should be usable and
    understandable with or without the qualifiers
  • Qualifiers refine meaning (but may be harder to
    understand)
  • Nouns can stand on their own without adjectives
  • If your software encounters an unfamiliar
    qualifier, look it up -- or just ignore it!
  • "has a" relations break the model
  • E.g., a creator has a hair color
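
The dumb-down principle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a DCMI algorithm: statements are assumed to be dicts with a dotted name (element, then optional refinement), and the helper names dumb_down and simplify are hypothetical.

```python
def dumb_down(name: str) -> str:
    """Reduce a qualified name such as 'DC.Date.Revised' to 'DC.Date'."""
    parts = name.split(".")
    # Keep only the prefix and the element; drop any refinement.
    return ".".join(parts[:2])


def simplify(statements):
    """Drop refinements and schemes, keeping (element, value) pairs."""
    return [(dumb_down(s["name"]), s["content"]) for s in statements]


qualified = [
    {"name": "DC.Date.Revised", "content": "2000-06-13", "scheme": "ISO8601"},
    {"name": "DC.Subject", "content": "Languages -- Grammar", "scheme": "LCSH"},
]

print(simplify(qualified))
# [('DC.Date', '2000-06-13'), ('DC.Subject', 'Languages -- Grammar')]
```

Each simplified statement is still true: a "date revised" is a "date". A "has a" qualifier such as affiliation would not survive this treatment, which is why it breaks the model.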

9
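Test for good qualifiers: cover the qualifier and ask
-- Does the statement still make sense?
-- Is it still correct?
[Diagram: two qualified statements]
  • Resource has Subject "Languages -- Grammar"
    (encoding scheme: LCSH)
  • Resource has Date "2000-06-13"
    (element refinement: Revised; encoding scheme: ISO8601)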
10
Incorrect Qualification
[Diagram: two statements that fail the test]
  • Resource has creator "Cornell University"
    (qualifier: affiliation)
  • Resource has subject "pre-schoolers"
    (qualifier: audience)
11
Open questions in this model
  • Are uncontrolled and unconstrained values really
    useful for discovery?
  • Is it possible for an organization (DCMI) to
    control the evolution of a language?
  • How can "simple discovery metadata" be combined
    with complex descriptions? Is there a notion of
    graceful degradation?
  • Can DC serve as a lingua franca (mapping
    template) among more complex models?

12
Models for Deploying Metadata
  • Embedded in the resource
  • Low deployment threshold
  • Limited flexibility, limited model
  • Linked to from resource
  • Using XLink
  • Is there only one source of metadata?
  • Independent resource referencing resource
  • Model of accessing the object through its
    surrogate

13
Syntax Alternatives: HTML
  • Advantages
  • Simple mechanism: META tags embedded in content
  • Widely deployed tools and knowledge
  • Disadvantages
  • Limited structural richness (won't support
    hierarchical, tree-structured data or entity
    distinctions).

14
Dublin Core in HTML
  • http://www.dublincore.org/documents/2000/08/15/dcq-html/
  • HTML constructs
  • <link> to establish pseudo-namespace
  • <meta> for metadata statements
  • name attribute for DC element (DC.element.ER)
  • content attribute for element value
  • scheme attribute for encoding scheme or
    controlled vocabulary
  • lang attribute for language of element value

15
Dublin Core in HTML: example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
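
Tags of this shape are easy to generate from structured data. The following Python sketch is one possible way to do it, not part of the DC-in-HTML recommendation; the function name meta_tag and its parameters are hypothetical, and attribute values are escaped with the standard library.

```python
from html import escape


def meta_tag(name, content, scheme=None, lang=None):
    """Render one Dublin Core statement as an HTML <meta> tag."""
    attrs = [("name", name)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    attrs.append(("content", content))
    # Escape quotes and angle brackets so values cannot break the markup.
    rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs)
    return f"<meta {rendered}>"


print(meta_tag("DC.Title", "negocio inusual", lang="es"))
# <meta name="DC.Title" lang="es" content="negocio inusual">
print(meta_tag("DC.Date.Created", "2000-10-23", scheme="W3CDTF"))
# <meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
```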
16
Unqualified Dublin Core in XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>
http://www.dublincore.org/documents/2000/11/dcmes-xml/
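
A record like this can be read with a namespace-aware XML parser. The Python sketch below uses only the standard library; the function name read_dc is hypothetical, and the DOCTYPE line is left out of the sample string as a simplifying assumption so that no DTD handling is involved.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RECORD = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def read_dc(xml_text):
    """Return {resource URI: {DC element: [values]}} for each description."""
    root = ET.fromstring(xml_text)
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        props = records.setdefault(about, {})
        for child in desc:
            # ElementTree spells qualified names as "{namespace}local".
            if child.tag.startswith(f"{{{DC_NS}}}"):
                element = child.tag[len(DC_NS) + 2:]
                props.setdefault(element, []).append(child.text)
    return records


for uri, props in read_dc(RECORD).items():
    print(uri, props)
```

Values are collected in lists because every Dublin Core element is repeatable.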
17
Example of Dublin Core Use
  • A map in the United States Library of Congress
    on-line American Memory Collection

18
Title
  • The name given to the resource
  • <META name="DC.Title"
    content="Novi Belgii Novæque Angliæ nec non partis
    Virginiæ tabula multis in locis emendata"
    lang="la">

19
Creator
  • An entity primarily responsible for making the
    content of the resource
  • <META name="DC.Creator" content="Nicolaum Visscher">

20
Subject
  • The topic of the content of the resource
  • <META name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
  • <META name="DC.Subject" content="Maps" scheme="LCSH">
  • <META name="DC.Subject" content="Early works to 1800" scheme="LCSH">

21
Description
  • An account of the content of the resource
  • <META name="DC.Description.Abstract"
    content="An historical map showing the coast of New
    Jersey as perceived in the seventeenth century">

22
Publisher
  • An entity responsible for making the resource
    available
  • <META name="DC.Publisher" content="Library of Congress, United States">

23
Contributor
  • An entity responsible for making contributions to
    the content of the resource.
  • <META name="DC.Contributor" content="Historic Urban Plans">

24
Date
  • A date associated with an event in the lifecycle
    of the resource
  • <META name="DC.Date.Created" content="1996-04-17" scheme="W3C-DTF">

25
Type
  • The nature or genre of the content of the
    resource
  • <META name="DC.Type" content="image" scheme="DCMIType">

26
Format
  • The physical or digital manifestation of the
    resource
  • <META name="DC.Format.Medium" content="image/gif" scheme="IMT">
  • <META name="DC.Format.Extent" content="556K">

27
Identifier
  • An unambiguous reference to the resource in the
    current context
  • <META name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">

28
Source
  • A reference to a resource from which the present
    resource is derived.
  • <META name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog )">

29
Language
  • Language of the intellectual content of the
    object
  • <META name="DC.Language" content="nl" scheme="ISO 639-2">

30
Relation
  • A reference to a related resource
  • <META name="DC.Relation.isPartOf"
    content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html"
    scheme="URI">

31
Coverage
  • The extent or scope of the content of the
    resource
  • <META name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
  • <META name="DC.Coverage.Temporal" content="1650" scheme="W3C-DTF">

32
Rights
  • Information about rights in and over the resource
  • <META name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">
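
A harvester reading a page like the map record has to pull these statements back out of the HTML. Here is a minimal Python sketch using the standard library's HTMLParser; the class name DCMetaParser and the output dictionary layout are hypothetical, and the sample string contains only three of the tags from the map example.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect DC statements from META tags as element/refinement/scheme/value."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> matches.
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if not name.upper().startswith("DC."):
            return
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": a.get("scheme"),
            "value": a.get("content"),
        })


SAMPLE = """
<META name="DC.Creator" content="Nicolaum Visscher">
<META name="DC.Subject" content="Maps" scheme="LCSH">
<META name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
for statement in parser.statements:
    print(statement)
```

Keeping the refinement and scheme in separate fields lets a consumer ignore whichever qualifiers it does not understand.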

33
Distributed Content: The Metadata Challenge
  • From fixed, contained physical artifacts to
    fluid, distributed digital objects
  • Need for basis of trust and authenticity in
    network environment
  • Decentralization and specialization of resource
    description and need for mapping formalisms

34
Multi-entity nature of object description
35
Understanding Metadata based on Query Capabilities
  • Simple boolean tags?
  • Creator = "Tom Baker" and Title contains "Dublin
    Core"
  • Agent, time, place questions?
  • Who was responsible for what, and when and where?
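
The first kind of query needs nothing more than flat attribute/value records. A small Python sketch, in which the sample records and the function name matches are hypothetical and serve only to illustrate the query:

```python
# Hypothetical sample records; the titles are placeholders.
records = [
    {"creator": "Tom Baker", "title": "Notes on Dublin Core"},
    {"creator": "Tom Baker", "title": "Another Topic"},
    {"creator": "Carl Lagoze", "title": "Dublin Core in Practice"},
]


def matches(record, creator, title_contains):
    """Boolean AND of an exact creator match and a title substring match."""
    return (
        record.get("creator") == creator
        and title_contains.lower() in record.get("title", "").lower()
    )


hits = [r for r in records if matches(r, "Tom Baker", "Dublin Core")]
print(hits)
# [{'creator': 'Tom Baker', 'title': 'Notes on Dublin Core'}]
```

Agent, time, and place questions do not fit this shape, because a flat record has nowhere to say which agent did what, or when.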

36
Attribute/Value approaches to metadata
  • "The playwright of Hamlet was Shakespeare"
  • [Diagram: Hamlet has a creator: Shakespeare]
37
run into problems for richer descriptions
  • "The playwright of Hamlet was Shakespeare, who was
    born in Stratford"
  • [Diagram: Hamlet has a creator: Stratford
    (qualifier: birthplace)]
38
because of their failure to model entity
distinctions
  • [Diagram: resource R1 has title "Hamlet" and creator
    R2; resource R2 has name "Shakespeare" and birthplace
    "Stratford"]
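
Treating the play and the playwright as separate entities, R1 and R2, can be sketched as a list of subject/property/value triples. This Python sketch is an illustration only; the helper names objects, subjects, and creator_birthplaces are hypothetical.

```python
# Each triple is (subject, property, value); R1 and R2 are entity identifiers.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, prop):
    """All values of a property for one subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def subjects(prop, value):
    """All subjects that have the given property value."""
    return [s for s, p, o in triples if p == prop and o == value]


def creator_birthplaces(title):
    """Follow title -> work -> creator -> birthplace."""
    places = []
    for work in subjects("title", title):
        for person in objects(work, "creator"):
            places.extend(objects(person, "birthplace"))
    return places


print(creator_birthplaces("Hamlet"))  # ['Stratford']
```

Because birthplace hangs off R2 rather than off the play, the statement "Hamlet has a creator Stratford" can no longer arise.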
39
Applying a Model-Centric Approach
  • Formally define common entities and relationships
    underlying multiple metadata vocabularies
  • Describe them (and their inter-relationships) in
    a simple logical model
  • Provide the framework for extending these common
    semantics to domain and application-specific
    metadata vocabularies.

40
Events are key to understanding metadata relationships?
  • Modeling implied events as first-class objects
    provides attachment points for common entities,
    e.g., agents, contexts (times and places), roles.
  • Clarifying attachment points facilitates
    understanding and querying "who was responsible
    for what, when."

41
Content, Events, Descriptions
42
ABC/Harmony: Event-aware metadata ontology
  • Recognizing inherent lifecycle aspects of
    description (esp. of digital content)
  • Modeling incorporates time (events and
    situations) as first-class objects
  • Supplies clear attachment points for agents,
    roles, existential properties
  • Resource description as a story-telling activity

43
Resource-centric Metadata
  • Title: Anna Karenina
  • Author: Leo Tolstoy
  • Illustrator: Orest Vereisky
  • Translator: Margaret Wettlin
  • Date Created: 1877
  • Date Translated: 1978
  • Description: Adultery, Depression
  • Birthplace: Moscow
  • Birthdate: 1828
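
The same record can be restated with events as first-class objects, so that each agent, role, and date has an explicit attachment point. This Python sketch is a loose illustration and not the ABC/Harmony model itself; the Event class and the agents_for function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                    # what happened
    agent: str                   # who was responsible
    role: str                    # in what capacity
    year: Optional[int] = None   # when, if the record says


work = "Anna Karenina"
events = [
    Event("creation", "Leo Tolstoy", "author", 1877),
    Event("translation", "Margaret Wettlin", "translator", 1978),
    # The flat record gives no date for the illustration.
    Event("illustration", "Orest Vereisky", "illustrator"),
]
# Birthplace and Birthdate are left out: the flat record does not say
# which of the three people they describe.


def agents_for(events, kind):
    """Answer 'who was responsible for this kind of event, and when?'"""
    return [(e.agent, e.year) for e in events if e.kind == kind]


print(agents_for(events, "translation"))  # [('Margaret Wettlin', 1978)]
```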
45
Queries over complex descriptive graphs
  • Ability to ask questions like "show me all the
    translations of War and Peace between 1980 and
    1990"
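
Once translation events carry their own dates, a question of this kind becomes a simple filter. A minimal Python sketch with hypothetical sample data: the Translation type, the function translations_between, and the placeholder translators and years for War and Peace are all assumptions made for illustration.

```python
from typing import NamedTuple


class Translation(NamedTuple):
    work: str
    translator: str
    year: int


# Hypothetical sample data; the War and Peace entries are placeholders.
translations = [
    Translation("War and Peace", "Translator A", 1975),
    Translation("War and Peace", "Translator B", 1984),
    Translation("Anna Karenina", "Margaret Wettlin", 1978),
]


def translations_between(items, work, start, end):
    """Translations of one work whose year falls in [start, end]."""
    return [t for t in items if t.work == work and start <= t.year <= end]


for t in translations_between(translations, "War and Peace", 1980, 1990):
    print(t.translator, t.year)  # Translator B 1984
```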