1
An introduction to metadata for libraries, museums and archives
Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003
  • Pete Johnston
  • UKOLN, University of Bath
  • Bath, BA2 7AY

p.johnston@ukoln.ac.uk http://www.ukoln.ac.uk/
UKOLN is supported by
2
Section 1: An Introduction to Metadata
3
An introduction to Metadata
  • Memory institutions, network services and
    metadata
  • What is metadata?
  • Exposing/sharing metadata
  • Exposing/sharing metadata semantics
  • the Dublin Core Metadata Initiative

4
Memory institutions, network services and metadata
5
Memory institutions
  • Museums, libraries and archives, often called
    memory institutions, are trusted organizations
    that collectively document the entire range of
    human experience and expression. Memory
    institutions are engaged in the important work
    of
  • Capturing, authenticating, and making sense of
    cultural memory
  • Preserving the human record for future
    generations and
  • Sharing knowledge to support education and
    learning.

http://www.ukoln.ac.uk/interop-focus/ccs/positions/
6
Delivering services
  • Memory institutions provide services to users
  • (At least some of) these services provide access
    to resources
  • Emergence of built on global networks
  • remote access to digital resources for all
    (potentially)
  • resources available round the clock
  • resources comparable to other digital resources
    from elsewhere
  • Investment in
  • digitisation of cultural content
  • network services providing access to digitised
    content

7
Delivering services
  • Potential for new types of service
  • digital libraries, virtual museums etc
  • integrated access to resources from multiple
    remote content providers
  • services defined by theme/subject/activity/audience
    etc, not by location/source
  • packaging and re-purposing of content
  • user-oriented rather than provider-oriented
  • Changing user expectations
  • user wants information relevant to task/activity
  • may see structural/organisational boundaries of
    content providers as unimportant!
  • user wants access from any location
  • user wants access at any time

8
Delivering services
  • Move from web sites to portals
  • "A network service that provides a personalised,
    single point of access to a range of
    heterogeneous network services, local and remote,
    structured and unstructured"
    (Andy Powell, 2002)
  • Content providers exposing content for delivery
    through multiple services, channels
  • Presentation services surfacing content from
    multiple (distributed) sources
  • Memory institutions may perform both roles
  • Move away from silo mentality towards more
    joined-up approaches

9
Resource discovery on the Web
  • Broadly two approaches to providing discovery
    services
  • software indexing of resource content
  • human description of resources
  • Web search engines
  • software agents (robots) retrieve documents by
    following hyperlinks (crawling)
  • index text of documents
  • make index available as searchable database
  • some clever ranking algorithms
  • e.g. Google infers Page Ranking based on links
    to document
  • find pages which link to page X
  • find pages similar to X

10
Resource discovery on the Web
  • Web search engines
  • tend to generate many results
  • and may suffer from spamming
  • ranking algorithms may help
  • don't support structured search
  • search on author name
  • search on document type (journal article)
  • limited to textual resources
  • generally, poor support for search for multimedia
    objects
  • The hidden Web
  • robots may not crawl documents dynamically
    generated from databases/CMS

11
Resource discovery on the Web
  • But automated indexing
  • is low cost
  • At least compared to human resource description
  • (usually) scales to large numbers of resources
  • can be a useful tool!
  • Challenge of finding appropriate balance of
    approaches for context

12
Metadata for services
  • Metadata has been important to traditional
    service provision
  • is essential component of effective network
    services

13
What is metadata?
14
What is metadata?
  • Simple definitions
  • "Structured data about data."
    (Dublin Core Metadata Initiative FAQ, 2003)
  • "Machine-understandable information about Web
    resources or other things."
    (Tim Berners-Lee, W3C, 1997)

15
Towards a functional view of metadata
  • "Data associated with objects which relieves their
    potential users of having to have full advance
    knowledge of their existence or characteristics.
    A user might be a program or a person."
    (Lorcan Dempsey & Rachel Heery, 1998)
  • "Structured data about resources that can be used
    to help support a wide range of operations"
    (Michael Day, 2001)

16
What resources, objects, things?
  • Metadata might exist for almost anything
  • digital, physical, abstract resources
  • HTML documents
  • digital images
  • databases
  • books
  • museum objects
  • archival records
  • metadata records
  • Web sites
  • collections
  • services
  • physical places
  • people
  • institutions
  • abstract works
  • concepts
  • events

17
What resources, objects, things?
  • Metadata records include
  • bibliographic records in library catalogues or
    from abstracting & indexing services
  • descriptions of archival material in archival
    finding aids
  • object records in museum documentation /
    collection management systems
  • entries in directories of organisations,
    individuals and services
  • descriptions of digital objects (documents,
    images, software)
  • descriptions of collections of digital objects
  • descriptions of network services
  • descriptions of metadata records

18
What operations?
  • Operations by human users, software tools
  • Metadata might be used to support many different
    functions
  • resource disclosure & discovery
  • resource management, including preservation
  • intellectual property rights management
  • commerce
  • authentication and authorisation
  • personalisation and localisation of services
  • Different functions require different
    types/classes of metadata
  • No one size fits all solution
  • Need to specify functional requirements

19
Metadata elements & element sets
  • Metadata describes attributes or properties of a
    resource
  • Each attribute or property is described by a
    metadata element
  • Can be identified, formally documented/defined
  • May be represented in different forms
  • A metadata element set
  • coherent bounded set of elements formulated as
    basis for metadata creation
  • created for purpose, as a unit
  • Schema
  • structured representation of an element set

20
Metadata for resource discovery
  • User wishes to
  1. discover resources according to some criteria
  2. (optionally) identify a specific resource
     • confirm that resource described is resource
       sought
     • distinguish similar resources
  3. select
     • evaluate, choose resource appropriate to needs
  4. locate resource
  5. obtain/access resource
  6. use resource
     • open, read, display, run, play, copy,
       unpackage/repackage
  7. interpret content
  • Resource discovery metadata supporting
    (primarily) operations 1 - 4

21
Metadata for resource discovery
Continuum of complexity/functionality
  • Full-text indexes
  • might not be classed as metadata by some!
  • generated by software tools
  • functions: discovery (by content), location
  • Semantically simple forms (e.g. Dublin Core)
  • typically covering description of broad range of resources
  • maybe partly generated automatically, partly human authored
  • functions: discovery, identification, selection, location
  • Richer, complex forms (e.g. MARC, EAD, CIMI-SPECTRUM, AMICO etc)
  • typically covering specific types of resources
  • often associated with particular community/domain
  • creation may involve relatively high degree of human expertise
  • functions: discovery, identification, selection, location, access, use (which may be type specific)
22
Association of resource and metadata (1)
  • Metadata embedded in resource
  • e.g. meta elements in HTML docs, summary
    properties in word processor docs
  • Can resource support embedding of metadata?
  • Does metadata creator have write access to resource?
  • Can service extract embedded metadata?
  • Metadata about aggregates of resources?
  • Metadata about people, places, concepts?
23
Association of resource and metadata (2)
  • Metadata record as separate object
  • Record identifier embedded in resource
  • e.g. link elements in HTML docs
  • Metadata record may be remote from resource
  • Can resource support embedding of link?
  • Does metadata creator have write access to resource?
  • Can service follow link to metadata record?
  • What happens when resource deleted?
  • Metadata about aggregates of resources?
  • Metadata about people, places, concepts?
24
Association of resource and metadata (3)
  • Metadata record as separate object
  • Resource identifier in metadata record
  • Metadata record may be remote from resource
  • Does not require embedding of metadata or link
  • Does not require metadata creator to have write access
    to resource
  • Metadata record created independently of resource,
    possibly multiple records
  • Service uses metadata records independently of resource
  • Metadata record may persist after resource deleted
  • Metadata record can describe anything (with identifier)
25
Metadata as managed resource
  • Metadata record is used separately from resource
    described
  • Recognition that metadata is resource to be managed,
    separately from resource described
  • Metadata content stored in database, exposed in
    form(s) appropriate for service(s)
26
Exposing/sharing metadata
27
How is metadata exposed/shared?
  • Resource description communities
  • characterised by consensus on conventions for
    internal exchange of metadata
  • Metadata for resource discovery
  • is used beyond its creator community
  • is combined/compared with metadata from other
    communities
  • is aggregated or cross-searched by services
  • How does a content provider make metadata records
    available in a commonly understood form?
  • How does a service provider obtain these metadata
    records from data providers?

28
How is metadata exposed/shared?
  • Effective sharing of information expressed in
    metadata record requires agreement on
  • metadata semantics
  • what metadata elements mean
  • metadata structure
  • data model, relationships of component parts
  • metadata syntax
  • rules of expression
  • protocols
  • how metadata records transmitted between content
    provider and service provider
  • Agreements formalised as specifications and
    standards (ideally)

29
Exposing/sharing metadata semantics: Introducing
the Dublin Core
30
Introducing the Dublin Core
  • Initiative to improve resource discovery on Web
  • not for complex resource description
  • based on description of simple document-like
    objects
  • extended to other classes of resource
  • International, cross-disciplinary consensus on
    simple element set
  • 15 elements
  • all optional
  • all repeatable

http://dublincore.org/
31
Introducing the Dublin Core (2)
  • Title
  • Subject
  • Description
  • Creator
  • Publisher
  • Contributor
  • Date
  • Type
  • Format
  • Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights
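
A simple DC record built from these 15 elements can be sketched in Python. This is only an illustrative sketch: the element names come from the slide, but the dictionary layout, the `make_record` helper, and the second subject value are assumptions made for the example, not part of any Dublin Core specification.

```python
# The 15 element names are taken from the slide; the record layout is an assumption.
DC_ELEMENTS = frozenset({
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def make_record(**elements):
    """Build a simple DC record: every element is optional and repeatable."""
    record = {}
    for name, values in elements.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Accept one string or several; always store a list (repeatable).
        record[name] = [values] if isinstance(values, str) else list(values)
    return record


record = make_record(
    title="An introduction to metadata",
    subject=["Resource discovery", "Metadata"],  # second value is illustrative
)
print(record)
```

Storing every value as a list reflects "all repeatable"; an empty call to `make_record()` is valid because all elements are optional.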

32
Dublin Core creator
  • Term Name: creator
  • Label: Creator
  • Definition: An entity primarily responsible for
    making the content of the resource.
  • Comment: Examples of a Creator include a person,
    an organisation, or a service. Typically, the
    name of a Creator should be used to indicate the
    entity.
  • Type of Term: element
  • Status: recommended
  • Date issued: 1999-07-02
  • URI: http://purl.org/dc/elements/1.1/creator

33
Dublin Core date
  • Term Name: date
  • Label: Date
  • Definition: A date associated with an event in
    the life cycle of the resource.
  • Comment: Typically, Date will be associated with
    the creation or availability of the resource.
    Recommended best practice for encoding the date
    value is defined in a profile of ISO 8601
    (W3CDTF) and follows the YYYY-MM-DD format.
  • Type of Term: element
  • Status: recommended
  • Date issued: 1999-07-02
  • URI: http://purl.org/dc/elements/1.1/date
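
The recommended date encoding can be checked mechanically. The sketch below is a minimal, assumed helper (`is_w3cdtf_date` is a made-up name) that accepts only the date-only forms of the W3CDTF profile (YYYY, YYYY-MM, YYYY-MM-DD); the profile's time-of-day forms are deliberately left out.

```python
import re
from datetime import date

# Date-only forms: YYYY, YYYY-MM or YYYY-MM-DD (time-of-day forms are omitted).
_DATE_ONLY = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value is a calendar-valid date in one of the date-only forms."""
    match = _DATE_ONLY.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month/day default to 1 so the calendar check still works.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


print(is_w3cdtf_date("1999-07-02"))
print(is_w3cdtf_date("1932 Jan. 19"))
```

A free-text value such as "1932 Jan. 19" is still a legal Dublin Core date value; it simply does not follow the recommended encoding.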

34
Standardisation of Dublin Core
  • CEN Workshop Agreement (EU)
  • 2000: Dublin Core elements endorsed as CWA 13874
  • Usage guidelines for European industry
  • NISO Z39.85 (USA)
  • 2001: National Information Standards
    Organization, an ANSI affiliate
  • ISO
  • 2002: Dublin Core Metadata Element Set approved
    as ISO 15836

35
Using the Dublin Core
  • Tom Baker, "A Grammar of Dublin Core", D-Lib
    Magazine, October 2000
  • Metaphor of metadata as language
  • DC as a simple "pidgin" language for use by
    tourists on the Internet commons
  • Small vocabulary, simple grammar/structure
  • This Resource has Title "An introduction to
    metadata"
  • This Resource has Subject "Resource discovery"
  • Not subtly expressive, but easy to learn and
    deploy - good enough to work

36
Using the Dublin Core
  • Designed for simplicity of semantics, ease of use
  • Provides basic semantic interoperability
  • semantics sufficiently general to be useful
    across domains
  • Can provide 15 windows into richer resource
    descriptions
  • disclose rich description in simple form
  • semantic cross-walks, mappings

37
Using the Dublin Core
[Diagram: a Simple DC description (title, creator, date, desc, rights) shown against a Rich description]
38
Qualifying Dublin Core
  • Allows for controlled extensibility through
    qualifiers
  • Element refinements
  • make element meanings narrower, more specific
  • a Date Created versus Date Modified
  • an IsReplacedBy versus Replaces Relation
  • Encoding schemes
  • provide contextual information or parsing rules
    that aid in the interpretation of a value
  • may specify that a value is drawn from a
    controlled vocabulary (e.g. LCSH, TGN etc)
  • may specify that a value is formatted in
    accordance with a specified notation (e.g. date
    formats)

39
Qualifying Dublin Core
  • Qualifiers make elements more specific
  • Element Refinements narrow meanings, never extend
  • Encoding Schemes give context to element values
  • The dumb-down rule
  • Application should be able to use the value as if
    it were unqualified
  • Ignore unknown Encoding Schemes
  • Resolve (semantically more specific) Element
    Refinements to (more generic) Elements
  • Some loss of specificity, but still generally
    correct and useful for discovery
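
The dumb-down rule can be sketched as a small function. This is a sketch under assumptions: the `(term, encoding scheme, value)` tuple layout and the `REFINES` table are made up for illustration, and the table lists only the refinements named in these slides rather than the full set of DCMI terms.

```python
# Refinement -> element it refines. Only refinements mentioned in the slides are
# listed; the table and the tuple layout are assumptions for this sketch.
REFINES = {
    "created": "date",
    "modified": "date",
    "issued": "date",
    "valid": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}


def dumb_down(statements):
    """Reduce qualified statements (term, scheme, value) to simple (element, value)."""
    simple = []
    for term, _scheme, value in statements:
        # Resolve a refinement to its more generic element; keep plain elements.
        element = REFINES.get(term, term)
        # The encoding scheme is ignored: the value is used as if unqualified.
        simple.append((element, value))
    return simple


qualified = [
    ("created", "W3CDTF", "2002-09-09"),
    ("isReplacedBy", "URI", "http://example.org/doc/2"),
    ("title", None, "Report"),
]
print(dumb_down(qualified))
```

The output loses the distinction between "created" and any other date, which is the "some loss of specificity" the slide accepts in exchange for values that remain usable for discovery.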

40
Dublin Core valid
  • Term Name: valid
  • Label: Valid
  • Definition: Date (often a range) of validity of a
    resource.
  • Type of Term: element-refinement
  • Status: recommended
  • Date issued: 2000-07-11
  • URI: http://purl.org/dc/terms/valid

41
Using the Dublin Core
  • Not a replacement for richer descriptive
    standards
  • But useful
  • If you wish to disclose community-specific metadata
    to other communities using commonly understood
    semantics
  • If you wish to provide integrated access to your
    own metadata databases with different underlying
    semantics
  • If you only need simple metadata semantics

42
Using the Dublin Core
  • Inherent tensions in DC
  • Broad, fuzzy search buckets or rigidly
    prescribed usage?
  • Generic applicability across domains or
    intra-domain precision?
  • One-size-fits-all or customise-as-you-please?
  • Simply discovering resources (a few typical
    search attributes) or describing them fully (lots
    of detail)?
  • Dublin Core primarily as a native record format
    or extracted from richer metadata?
  • Broad-brush minimalism or comprehensive
    structuralism?

43
Summary
  • Emergence of global networks enables new
    approaches to providing access to resources
  • Increasing requirement to provide resource
    discovery across boundaries
  • Metadata supports many functions, including
    resource discovery
  • DC as simple, cross-disciplinary metadata element
    set
  • Next
  • How metadata records are represented:
    syntax/structure
  • How metadata records are exposed/shared/used in
    resource discovery services

44
Section 2: Sharing metadata: XML and the OAI
Protocol for Metadata Harvesting
45
Sharing metadata: XML and OAI
  • Exposing/sharing metadata syntax and structure
  • Extensible Markup Language (XML)
  • XML Schema
  • Metadata harvesting
  • The Open Archives Initiative Protocol for
    Metadata Harvesting
  • Some OAI-based services
  • Developing metadata-based services

46
Exposing/sharing metadata syntax and
structure: XML & XML Schema
47
Embedding DC metadata in (X)HTML
  • Dublin Core metadata can be embedded into (X)HTML
    documents
  • Simple to deploy but may be difficult to manage,
    maintain
  • But almost none of the Web search engine services
    index it
  • Lack of trust in open Web context
  • Abuse by content providers seeking to improve the
    ranking of their documents
  • However, may be useful technique in closed
    context
  • e.g. single Web site or where control over which
    documents indexed

48
Embedding DC metadata in (X)HTML
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.Title" lang="en" content="Expressing Qualified Dublin Core in HTML/XHTML meta elements" />
<meta name="DC.Creator" content="Andy Powell, UKOLN, University of Bath" />
<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
<meta name="DC.Identifier" scheme="URI" content="http://dublincore.org/documents/dcq-html/" />
<meta name="DC.Format" scheme="IMT" content="text/html" />
<meta name="DC.Type" scheme="DCMIType" content="Text" />
</head>
<body></body>
</html>
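
A service operating in a closed context could extract such embedded metadata with a few lines of code. The sketch below uses Python's standard `html.parser`; the `DCMetaExtractor` class name, the shortened sample page, and the extra non-DC `keywords` element are assumptions added for illustration.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (name, scheme, content) from meta elements whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.statements.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )


# Shortened version of the slide's example; the "keywords" element is illustrative.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.Title" lang="en" content="Expressing Qualified Dublin Core in HTML/XHTML meta elements" />
<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
<meta name="keywords" content="ignored" />
</head><body></body></html>"""

extractor = DCMetaExtractor()
extractor.feed(SAMPLE)
extractor.close()
for statement in extractor.statements:
    print(statement)
```

The extractor trusts whatever the page author wrote, which is exactly why open Web search engines tend to ignore embedded metadata.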
49
Introducing XML
  • Extensible Markup Language
  • Recommendation of W3C, 1998, 2000
  • Defines means of describing tree-structured data
    in text-based format
  • embedded markup delimits and describes data
  • Simple, platform-independent syntax
  • Standard programming interfaces
  • reusable software components
  • Support from major software vendors
  • Widely adopted for transferring data between
    programs, systems

50
Doc  Title   Date        Creator
1    Report  2001-11-05  J Smith

<table>
  <record>
    <doc>1</doc>
    <creator>J Smith</creator>
    <date>2001-11-05</date>
    <title>Report</title>
  </record>
</table>
51
[Diagram: a table row (Doc, Title, Date, Creator) is serialised as <record> ... </record>, transmitted, and de-serialised by a remote application]
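
The serialisation round trip can be sketched with Python's standard `xml.etree.ElementTree`. The `serialise` and `deserialise` function names are assumptions; the record values are the ones from the table on the previous slide.

```python
import xml.etree.ElementTree as ET


def serialise(record: dict) -> str:
    """Turn one table row into a <record> XML string."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def deserialise(xml_text: str) -> dict:
    """Rebuild the row from the XML received by the remote application."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


row = {"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"}
wire = serialise(row)        # what is transmitted
received = deserialise(wire)  # what the remote application reconstructs
print(wire)
print(received == row)
```

The round trip works only because both ends agree on the element names; XML itself says nothing about what `doc` or `creator` mean.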
52
XML and interoperability
  • Meta-language
  • language for describing markup languages
  • can define unlimited number of markup languages
  • But...
  • XML says nothing about what your names mean
  • will a software agent process my <doc> XML
    element correctly?
  • Interoperability requires consensus on
  • the names of components (XML elements and
    attributes)
  • the structural model of a class of document
  • the semantics represented by the components and
    the structure
  • Shared use of common XML schemas

53
XML schemas
  • Means to codify syntax/structure rules for class
    of XML document
  • what markup is allowed
  • structural constraints on use of markup
  • Document Type Definition (DTD)
  • part of XML Recommendation
  • W3C XML Schema
  • W3C recommendation
  • data-typing i.e. tighter control on element
    content
  • support for XML Namespaces
  • uses XML syntax
  • Software can validate instance against DTD/schema

54
Metadata harvesting The Open Archives Initiative
Protocol for Metadata Harvesting
55
Searching & harvesting
  • Resource discovery services operating across the
    resources of multiple distributed content
    providers
  • Possible strategies
  • Distributed search
  • submit parallel queries to multiple metadata
    databases
  • collate multiple result sets for presentation to
    user
  • Harvest
  • gather metadata records from multiple providers
    into single database
  • (periodic re-gathering to refresh data)
  • query central database
  • Performance issues in cross-searching
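
The two strategies can be contrasted in a toy sketch. Everything here is an assumption made for illustration: the providers are in-memory lists with invented titles, queries run one after another rather than in parallel, and no real protocol is involved.

```python
# Hypothetical in-memory "providers"; real ones would be remote databases.
PROVIDERS = {
    "library": [
        {"title": "Metadata for digital libraries"},
        {"title": "Cataloguing rules"},
    ],
    "museum": [{"title": "Object metadata in museums"}],
}


def search(records, term):
    """Case-insensitive substring match on the title."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def distributed_search(providers, term):
    """Query every provider at search time and collate the result sets."""
    results = []
    for name, records in providers.items():
        for record in search(records, term):
            results.append((name, record["title"]))
    return results


def harvest(providers):
    """Gather all records into one central database ahead of time."""
    return [{"source": name, **record}
            for name, records in providers.items()
            for record in records]


def central_search(central, term):
    """Query the single harvested database."""
    return [(r["source"], r["title"]) for r in search(central, term)]


central = harvest(PROVIDERS)
print(distributed_search(PROVIDERS, "metadata"))
print(central_search(central, "metadata"))
```

Both approaches return the same hits here; the difference is that distributed search contacts every provider per query, while harvesting pays that cost once per refresh and then answers from the central copy.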

56
Introducing OAI
  • Open Archives Initiative
  • develops/promotes interoperability standards to
    facilitate dissemination of content
  • roots in e-prints community seeking to improve
    access to scholarly publications
  • Deposit pre-prints for quicker dissemination
  • Deposit post-prints to reduce institutional
    costs, maximise impact
  • e-print archives
  • institutional
  • federated subject/discipline-based
  • required simple low-cost interface to expose
    metadata for reuse

http://www.openarchives.org/
57
Introducing OAI (2)
  • Terminology
  • Archive: repository, not archive
  • Open: in terms of architecture, not
    free/unlimited access to repository
  • Protocol for Metadata Harvesting (OAI-PMH)
  • Developed by international technical committee,
    1999-2002
  • Shift from optimising discovery of e-prints to
    more generic resource discovery
  • OAI committed to version 2.0 as a production
    release

58
Introducing OAI PMH
  • Lightweight, low-cost protocol which allows data
    providers to expose metadata records for
    retrieval by service providers
  • Service providers can say give me all/some of
    your metadata records
  • Built on HTTP, XML
  • Six verbs: requests from service provider to data
    provider sent using HTTP GET/POST
  • responses from data provider to service provider
    as XML documents
  • Not a distributed search protocol
  • Not limited to e-print archives

59
Introducing OAI PMH (2)
  • Supports transfer of metadata records
  • resources made available separately
  • identifier/locator of resources typically
    included in metadata record
  • Data provider must provide simple/unqualified DC
    metadata record
  • may provide metadata records in other formats
  • metadata formats must be associated with a W3C
    XML Schema
  • Extensible framework for metadata about
  • repository, sets, records
  • Metadata and resources often freely available
  • but not a requirement

60
Introducing OAI PMH (3)
  • Supports selective harvesting
  • by sets
  • by datestamps
  • Example
  • Service Provider: "List all records added since
    Jan 1 2002 in simple DC format (oai_dc)"
  • verb=ListRecords
  • from=2002-01-01
  • metadataPrefix=oai_dc
  • http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc
  • Data Provider: Returns XML document containing
    records
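
The request from this example can be assembled programmatically. The sketch below only builds the URL and makes no network call; the `list_records_url` helper is an assumed name, and the base URL is the slide's placeholder archive.

```python
from urllib.parse import urlencode

# Placeholder base URL taken from the slide's example.
BASE_URL = "http://www.myarchive.org/cgi-bin/oai"


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords"}
    if from_date:
        params["from"] = from_date   # harvest by datestamp
    if set_spec:
        params["set"] = set_spec     # harvest by set
    params["metadataPrefix"] = metadata_prefix
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, "oai_dc", from_date="2002-01-01")
print(url)
```

A harvester would send this as an HTTP GET and parse the XML document returned by the data provider.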

62
OAI DC metadata record (from Library of Congress
Repository 1)
<oai_dc:dc>
<dc:title>Empire State Building. View from, to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
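
A service provider might read such a record as follows. This is a sketch: the slide omits the namespace declarations, so the ones below (the standard `oai_dc` and Dublin Core element namespaces) are added here so the fragment parses on its own, the record is shortened, and `parse_oai_dc` is an assumed helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened record; namespace declarations added so it is well-formed XML.
RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. View from, to Central Park</dc:title>
  <dc:type>image</dc:type>
  <dc:type>Cityscape photographs.</dc:type>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
</oai_dc:dc>"""


def parse_oai_dc(xml_text):
    """Return {element name: [values]}; lists because DC elements are repeatable."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    elements = defaultdict(list)
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            elements[name].append((child.text or "").strip())
    return dict(elements)


parsed = parse_oai_dc(RECORD)
print(parsed["type"])
```

Note that the record repeats `dc:type`, which is why each element maps to a list rather than a single value.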
63
Some OAI based services
64
Resource Discovery Network (RDN)
  • Co-operative network of subject gateways
  • Funded by JISC for HE and FE
  • Seven hubs
  • ALTIS - Hospitality, Leisure, Sport and Tourism
  • BIOME - Health and Life Sciences
  • EEVL - Engineering, Mathematics and Computing
  • GESource - Geography and Environment
  • Humbul - Humanities
  • PSIgate - Physical Sciences
  • SOSIG - Social Sciences, Business and Law
  • Databases of metadata records describing Internet
    resources selected for high quality

http://www.rdn.ac.uk/
65
Resource Discovery Network (RDN)
  • Hubs as subject communities
  • metadata creators are subject specialists
  • good links with users
  • separate metadata schemas
  • Hubs provide their own Web interfaces
  • search databases
  • other services: tutorials, guides, alerting etc
  • But operate within a shared policy framework
  • collection development
  • cataloguing guidelines
  • technical standards
  • agreements on IPR

66
Resource Discovery Network (RDN)
  • RDN Resource Finder
  • Cross-search of Hubs' metadata records
  • Initially distributed search using Z39.50
  • Performance issues
  • Difficult to build flexible browse interface
  • Now using OAI PMH to harvest records
  • Currently harvesting simple DC
  • Basic keyword searching
  • Exploring harvesting some richer record formats
    for additional functionality
  • Also some sharing of metadata
  • between Hubs (DC plus extensions)
  • between Hubs and other similar services (LOM)
  • but Hubs' metadata not freely available for
    harvest

67
Resource Discovery Network http://www.rdn.ac.uk/
68
e-Prints UK
  • JISC-funded project, 2002-2004
  • Provide access to e-prints via subject-based RDN
    services
  • Harvest metadata from e-print archives
  • institutional, non-institutional, personal
  • Automatically enhance harvested metadata (using
    Web Services)
  • Add (or validate) authoritative forms of author
    names (OCLC)
  • Assign subject classification (based on analysis
    of full-text of resource) (OCLC)
  • Generate OpenURLs from citations (based on
    analysis of full-text of resource) (Univ of
    Southampton/UKOLN)

http://www.rdn.ac.uk/projects/eprints-uk/
69
e-Prints UK
  • Provide search services
  • across all metadata
  • subject-partitioned search services for Hubs
  • Enhanced metadata records made available to
    originating e-print archive
  • Note
  • service provider enhancing harvested metadata to
    provide more functionality
  • some of enhancement process requires access to
    resource as well as metadata record
  • two-way flow of metadata records
  • recommendations for how to use simple DC to
    describe e-prints to maximise benefits of
    metadata disclosure

70
e-Prints UK
[Diagram: e-Prints UK architecture. Institutional, personal and non-institutional e-print archives are harvested by e-Prints UK via OAI-PMH. e-Prints UK uses Web services offered by OCLC (subject classification service, name authority service) and a Web service offered by Southampton (citation analysis service) via SOAP. End-user services are provided through the RDN via SOAP, Javascript/HTTP and Z39.50.]
71
Developing metadata-based services
72
Developing services
  • Consensus on metadata semantics/syntax, transport
    protocols etc as minimal requirements
  • Resource selection
  • collections policies
  • Metadata quality assurance
  • cataloguing rules
  • mandatory elements, minimum-level records
  • guidance on content of values of elements:
    formats, controlled vocabularies, identifiers etc
  • Maintenance, currency of metadata
  • Agreements on IPR, usage rights, branding
  • for metadata records as well as resources

73
Developing services
  • DCMES intended to be simple enough for creation
    by untrained creators
  • assumption that metadata creation
    straightforward?
  • Recognition that precision in services depends on
    quality of metadata
  • Subject terms/classification difficult for
    non-expert
  • Different services providing different
    functionality to different audiences may require
    different metadata

74
Developing services
  • Human creation of metadata is not cheap!
  • Where possible, use automated methods to
  • Generate metadata
  • Normalise/enhance metadata
  • Service providers as well as data providers can
    contribute (e.g. e-prints UK)
  • Reuse/repurpose metadata
  • Where human creation required, provide support
  • Education, guidelines
  • Appropriate software tools

75
Developing services
  • Service developers use/implement metadata
    standards in pragmatic way
  • Standards creators concerned with
  • Consensus, commonality, interoperability
  • e.g. DCMES
  • Implementers concerned with
  • Functionality, specificity, localisation
  • e.g. Using simple DC to describe e-Prints
  • Application profile
  • A metadata element set optimised for a particular
    application

76
Summary
  • Standards for metadata semantics
  • XML as syntax for metadata exchange, but requires
    consensus on structures
  • Harvesting model as alternative to distributed
    search
  • OAI PMH
  • Service provision
  • metadata quality
  • rights issues
  • application profiles
  • Next
  • A common framework for metadata?
  • Towards the Semantic Web?

77
Section 3: Sharing metadata: RDF and the
Semantic Web
78
Sharing metadata: RDF & the Semantic Web
  • Is there a problem?
  • The vision of the Semantic Web
  • Introducing RDF
  • Some RDF applications

79
The problem with XML?
  • XML as a mechanism for expressing tree-structured
    data
  • Different communities make different design
    choices for the meaning of their trees
  • All good (and valid v XML DTD/Schema)
  • Within resource description community, meaning(s)
    of structure(s) may be limited
  • But applications working across communities have
    to work with multiple XML trees
  • potentially unlimited
  • not scalable in an open Web environment?
  • how to manage ever increasing set of conventions
  • always encountering new structures/schemas

80
The Semantic Web
  • Activity of World Wide Web Consortium (W3C)
  • To make data available on the Web in a form which
    is easier for machines to process
  • Machine-processable statements about all kinds of
    things (Web pages, organisations, people,
    concepts, products, etc) and the
    relationships/links between them
  • To share data between programs and systems
    designed independently
  • Unlock the data held in databases
  • Link data from different sources
  • To enable richer more flexible services

http://www.w3.org/2001/sw/
81
The Semantic Web
  • Builds on
  • use of Uniform Resource Identifiers (URIs) to
    uniquely identify resources
  • the Resource Description Framework (RDF) as a
    common model for expressing information about
    resources
  • an XML syntax for representing RDF data
  • existing Web protocols (HTTP) for transferring
    data

82
Introducing RDF
83
Introducing RDF
  • Resource Description Framework
  • Model & Syntax, W3C Recommendation, 1999
  • RDF Core WG activity, 2001-2003
  • Set of revised/expanded specifications currently
    (April 2002) in last call
  • Semantics: formal model
  • Concepts: abstract syntax (graph)
  • RDF/XML syntax: conventions for encoding
    statements using XML
  • Test Cases
  • Vocabulary Description Language
  • Primer: introduction

http://www.w3.org/RDF/
84
Introducing RDF (2)
  • Provides generic framework for representing
    information about resources
  • set of conventions/infrastructure for
    applications exchanging metadata
  • allows semantics to be defined by different
    resource description communities
  • accommodates mixing of information from diverse
    sources
  • Resource: any object identified by URI
  • not necessarily accessible via Web
  • Property: attribute to describe resource
  • properties also uniquely identified by URI
  • Statement: triple of specific resource,
    property, and value
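
The resource/property/value triple maps naturally onto a small data structure. The sketch below is illustrative only: the `Statement` type and `values_of` helper are assumed names, the example URIs under http://example.org/ are placeholders, and plain strings stand in for both URIs and literal values.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a specific resource, a property, and a value."""
    resource: str  # URI of the resource being described
    prop: str      # URI of the property
    value: str     # a literal, or the URI of another resource


statements = [
    Statement("http://example.org/doc/1",
              "http://example.org/author",
              "http://example.org/person/john"),
    Statement("http://example.org/person/john",
              "http://example.org/name",
              "John"),
]


def values_of(stmts, resource, prop):
    """All values a given resource has for a given property."""
    return [s.value for s in stmts if s.resource == resource and s.prop == prop]


author = values_of(statements, "http://example.org/doc/1", "http://example.org/author")
print(author)
print(values_of(statements, author[0], "http://example.org/name"))
```

Because the value of one statement can be the resource of another, following values from statement to statement walks the description graph.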

85
The RDF model
  • A resource has some property whose value is
    either (i) a simple string value (literal)

[Diagram: http://example.org/doc/1 --author--> "John"]
  • The resource identified by the URI
    http://example.org/doc/1 has a property author
    whose value is John
  • Or, John is the author of the resource
    identified by http://example.org/doc/1

86
The RDF model (2)
  • or (ii) another resource...

[Diagram: http://example.org/doc/1 --author--> (unnamed resource);
 (unnamed resource) --name--> "John";
 (unnamed resource) --email--> "john@example.org"]
  • The value of property author is another
    resource which has a property name with value
    John and a property email with value
    john@example.org

87
The RDF model (3)
  • which may itself have a URI

[Diagram: http://example.org/doc/1 --author--> http://example.org/person/john;
 http://example.org/person/john --name--> "John";
 http://example.org/person/john --email--> "john@example.org"]
88
The RDF model (4)
  • Properties themselves are identified by URIs

[Diagram: http://example.org/doc/1 --http://example.org/author--> http://example.org/person/john;
 http://example.org/person/john --http://example.org/name--> "John";
 http://example.org/person/john --http://example.org/email--> "john@example.org"]
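
The model built up over the last four slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names URI, Literal and Statement and the helper values() are invented here and do not come from any RDF library; the example.org identifiers are the ones used on the slides.

```python
from typing import NamedTuple, Union


class URI(str):
    """Identifier for a resource or a property (illustrative type)."""


class Literal(str):
    """A simple string value (illustrative type)."""


class Statement(NamedTuple):
    """One RDF statement: a resource, a property, and a value."""
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


EX = "http://example.org/"
doc = URI(EX + "doc/1")
john = URI(EX + "person/john")

# The graph from slide 88: every resource and property is named by a URI.
statements = [
    Statement(doc, URI(EX + "author"), john),
    Statement(john, URI(EX + "name"), Literal("John")),
    Statement(john, URI(EX + "email"), Literal("john@example.org")),
]


def values(subject, predicate, graph):
    """Return all values of `predicate` for `subject` in `graph`."""
    return [s.obj for s in graph
            if s.subject == subject and s.predicate == predicate]


# Follow the author link from the document, then read the author's name.
author = values(doc, URI(EX + "author"), statements)[0]
author_name = values(author, URI(EX + "name"), statements)[0]
print(author_name)
```

Because the value of author is itself a resource, the same lookup can be applied again to it, which is what lets descriptions grow to arbitrary depth.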
89
The power of the RDF model
  • Extensible model
  • supports any vocabularies
  • Supports arbitrary complexity of description
  • URIs as unique fixed points to identify
  • resources
  • properties
  • Descriptions created independently can be
    merged using URIs as anchors
  • i.e. supports distributed metadata

90
First source
[Diagram: http://example.org/doc/1 --author--> http://example.org/person/john;
 http://example.org/person/john --name--> "John";
 http://example.org/person/john --email--> "john@example.org"]
91
Second source
[Diagram: http://example.org/doc/1 --subject--> "XML"]
92
Third source
[Diagram: http://example.org/person/john --organisation--> "JS Foundation"]
93
Three descriptions merged
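
A minimal sketch of the merge, assuming each source is held as a plain Python set of (resource, property, value) tuples. The helper value_of() is invented for illustration, and property names are kept as the short labels from the slides, although in RDF proper they would be URIs too.

```python
DOC = "http://example.org/doc/1"
JOHN = "http://example.org/person/john"

# Three descriptions created independently of one another.
first_source = {
    (DOC, "author", JOHN),
    (JOHN, "name", "John"),
    (JOHN, "email", "john@example.org"),
}
second_source = {(DOC, "subject", "XML")}
third_source = {(JOHN, "organisation", "JS Foundation")}

# Merging is a set union: statements about the same URI
# attach to the same node without any further matching step.
merged = first_source | second_source | third_source


def value_of(subject, predicate, graph):
    """Return one value of `predicate` for `subject`, or None if absent."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


# A question that no single source can answer on its own:
# which organisation does the author of the document belong to?
author = value_of(DOC, "author", merged)
author_organisation = value_of(author, "organisation", merged)
print(author_organisation)
```

The shared URIs do the work here: the first and third sources never refer to each other, yet their statements join up because both name the same person resource.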
94
A simple DC metadata record (the hedgehog)
[Diagram: description of http://example.org/doc/1]
95
The RDF XML syntax
  • XML representation of model
  • to store/exchange descriptions
  • Use of XML Qualified Names and XML Namespaces to
    represent URIs in RDF/XML
  • Conventions for the meaning of structures in
    RDF/XML document
  • Service can know in advance the meaning of
    structures in RDF/XML document
  • i.e. always represents RDF graphs
  • even if unanticipated vocabularies used
  • can read multiple descriptions into store and
    merge on URIs

96
A simple DC metadata record (RDF/XML)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:contributor>b</dc:contributor>
    <dc:publisher>c</dc:publisher>
    <dc:subject>d</dc:subject>
    <dc:description>e</dc:description>
    <dc:identifier>f</dc:identifier>
    <dc:relation>g</dc:relation>
    <dc:source>h</dc:source>
    <dc:rights>i</dc:rights>
    <dc:format>j</dc:format>
    <dc:type>k</dc:type>
    <dc:title>l</dc:title>
    <dc:date>m</dc:date>
    <dc:coverage>n</dc:coverage>
    <dc:language>o</dc:language>
  </rdf:Description>
</rdf:RDF>
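
To show how XML qualified names map back to property URIs, here is a rough sketch that reads a shortened version of such a record using only the Python standard library. It is not a general RDF/XML parser: the function literal_triples() is invented for illustration and assumes every property of an rdf:Description has a plain text value.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A shortened version of the record above (two of the fifteen elements).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:title>l</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(rdf_xml):
    """Extract (resource URI, property URI, literal) triples.

    Handles only the simple pattern shown on the slide.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports a tag as "{namespace}localname";
            # namespace + localname is the property's URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


for triple in literal_triples(RECORD):
    print(triple)
```

The dc: prefix is only a local abbreviation. What ends up in the triple is the full URI http://purl.org/dc/elements/1.1/creator, so a record using a different prefix for the same namespace yields identical statements.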
97
RDF Vocabulary Description Language (RDF Schema)
  • Provides mechanisms to describe
  • terms used in RDF statements
  • relationships between terms
  • e.g. Dublin Core metadata element set described
    using RDF(S)
  • Defines type system
  • resources grouped into classes
  • classes may be related hierarchically
    (subClassOf)
  • properties may be related hierarchically
    (subPropertyOf)
  • use of properties may be constrained (domain,
    range)
  • More RDF statements
  • i.e. metadata about metadata elements
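
As an illustration of what a subPropertyOf relationship lets software do, the sketch below derives extra statements from a small schema table. Everything specific here is an assumption made for the example: the property http://example.org/illustrator, its declared relationship to dc:contributor, the value "Jane", and the simplification that each property has at most one parent.

```python
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"
DOC = EX + "doc/1"

# Hypothetical schema information: a locally coined property declared
# as a refinement (subPropertyOf) of a Dublin Core element.
SUB_PROPERTY_OF = {EX + "illustrator": DC + "contributor"}

statements = {(DOC, EX + "illustrator", "Jane")}


def with_inferred(statements, sub_property_of):
    """Add the statements implied by subPropertyOf relationships."""
    inferred = set(statements)
    for s, p, o in statements:
        seen = {p}  # guards against accidental cycles in the schema
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


expanded = with_inferred(statements, SUB_PROPERTY_OF)
for statement in sorted(expanded):
    print(statement)
```

A service that only knows Dublin Core can still make use of the more specific local term, because the schema tells it that every illustrator statement is also a contributor statement.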

98
Description of Dublin Core Creator
[Diagram: description of http://purl.org/dc/elements/1.1/creator]
99
Description of Dublin Core Creator (RDF/XML)
<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/creator">
  <rdfs:label xml:lang="en-US">Creator</rdfs:label>
  <rdfs:comment xml:lang="en-US">An entity primarily responsible for
    making the content of the resource.</rdfs:comment>
  <dc:description xml:lang="en-US">Examples of a Creator include a
    person, an organisation, or a service. Typically, the name of a
    Creator should be used to indicate the entity.</dc:description>
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/"/>
  <dcterms:issued>1999-07-02</dcterms:issued>
  <dc:type rdf:resource="http://dublincore.org/usage/documents/principles/#element"/>
</rdf:Property>
100
Simplicity, contradiction, trust
  • In RDF, meaning is expressed by simple
    statements
  • Subject-Predicate-Object
  • Anyone on Web can assert (in RDF sense) anything
    about anything
  • software agents navigating Web of statements
  • may be able to process some of these statements
    but not all
  • ignore the statements you don't understand
  • tolerance of inconsistency and errors
  • Establishing trust as fundamental part of
    Semantic Web infrastructure
  • Who said this (and when, etc.)?
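
One way to picture these points is an agent that keeps a record of who made each statement, skips properties it does not understand, and then applies its own trust policy. This is a sketch under stated assumptions: the Assertion type, the source URIs, the unknown property, and the two sets UNDERSTOOD and TRUSTED are all invented for illustration.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """A statement together with the source that asserted it."""
    subject: str
    predicate: str
    obj: str
    source: str  # who said this


DC = "http://purl.org/dc/elements/1.1/"
DOC = "http://example.org/doc/1"
SOURCE_A = "http://example.org/source/a"  # hypothetical sources
SOURCE_B = "http://example.org/source/b"

# Two sources disagree about the creator, and one also uses a
# property this agent has never encountered.
assertions = [
    Assertion(DOC, DC + "creator", "John", SOURCE_A),
    Assertion(DOC, DC + "creator", "Jane", SOURCE_B),
    Assertion(DOC, "http://example.org/unknown-property", "x", SOURCE_A),
]

UNDERSTOOD = {DC + "creator"}  # properties this agent can process
TRUSTED = {SOURCE_A}           # this agent's own trust policy

# Ignore the statements you don't understand...
understood = [a for a in assertions if a.predicate in UNDERSTOOD]
# ...then decide which of the remaining, possibly conflicting, ones to rely on.
trusted = [a for a in understood if a.source in TRUSTED]

print([a.obj for a in understood])
print([a.obj for a in trusted])
```

Nothing is rejected as invalid along the way: the conflicting statement and the unknown property remain in the data, and a different agent with a different policy could reach a different answer from the same statements.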

101
Metadata and the Semantic Web
  • Argued that the Semantic Web principles fit the
    nature of metadata
  • Metadata supports many different functions
  • Metadata is inherently "modular"
  • Metadata creation is not a one-off act, but an
    ongoing, distributed process
  • the metadata creator can't predict how users may
    want to use resources and query metadata
  • new uses of resources result in new metadata
  • Metadata is not (or at least not only)
    "objective", "authoritative" information
  • Some attributes represent interpretations
  • Some attributes are context-dependent
  • Multiple (even conflicting) descriptions can
    co-exist

102
Some RDF applications
103
RDF Site Summary (RSS) 1.0
  • Simple RDF metadata vocabulary designed to
    support syndication of "news" items
  • An RSS "channel" is published as an RDF/XML
    document
  • Provides metadata about
  • The channel itself
  • A summary of its scope and purpose
  • A sequence of items
  • Summary descriptions of Web documents
  • Content of channel regularly updated by provider
  • Wide, simple, automated distribution

http://purl.org/rss/1.0/
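
As a rough sketch of how a news reader or aggregator might consume such a channel, the following reads a small RSS 1.0 document with the Python standard library. The feed content, titles and example.org addresses are invented for illustration; only the two namespace URIs are real.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# A hypothetical channel with two items.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example news</title>
    <link>http://example.org/</link>
    <description>A hypothetical channel</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/item/1"/>
        <rdf:li rdf:resource="http://example.org/item/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/item/2">
    <title>Second item</title>
    <link>http://example.org/item/2</link>
  </item>
  <item rdf:about="http://example.org/item/1">
    <title>First item</title>
    <link>http://example.org/item/1</link>
  </item>
</rdf:RDF>"""

root = ET.fromstring(FEED)

# Metadata about the channel itself.
channel = root.find("rss:channel", NS)
channel_title = channel.findtext("rss:title", namespaces=NS)

# Summary descriptions of the individual items, keyed by URI.
titles_by_uri = {
    item.get(RDF_ABOUT): item.findtext("rss:title", namespaces=NS)
    for item in root.findall("rss:item", NS)
}

# The channel's rdf:Seq gives the intended order of the items.
ordered_uris = [li.get(RDF_RESOURCE)
                for li in channel.findall("rss:items/rdf:Seq/rdf:li", NS)]
headlines = [(uri, titles_by_uri[uri]) for uri in ordered_uris]

print(channel_title)
for uri, title in headlines:
    print(uri, title)
```

The item elements are deliberately listed out of order in the sample: the sequence inside the channel, not the position of the item elements in the file, determines how the items are presented.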
104
RDF Site Summary (RSS) 1.0
  • Typical applications
  • Web sites render content of specific channels as
    part of their own Web sites
  • Online aggregator services harvest numerous
    channels and provide search/filtering services
    across the items
  • e.g. Meerkat
  • Desktop news readers allow users to "subscribe"
    to list of channels, regularly download content
    for user to browse
  • e.g. Amphetadesk
  • RSS also generated from some Weblog management
    systems
  • SWAD(E) activity on "semantic weblogging"

105
http://www.ukoln.ac.uk/
106
Metadata schema registries
  • How to encourage convergence and reuse of
    metadata vocabularies
  • Implementers
  • may be unaware of existing vocabularies
  • adapt/customise "standard" terms for
    application-specific use
  • may combine terms from multiple "standard"
    sources
  • coin application-specific terms or extensions
  • Application profile
  • A metadata element set optimised for a particular
    application

107
Metadata schema registries
  • A publication context for
  • "standard" metadata vocabularies and their terms
  • (depending on scope of registry) also implementer
    usages/adaptations of those vocabularies and
    their terms
  • To provide a "dictionary" function
  • To highlight relationships, encourage
    reuse/convergence
  • Based on indexing RDF data distributed on Web?
  • Requires shared conventions for describing
  • metadata vocabularies
  • and their usages and adaptations

108
http://dublincore.org/dcregistry/
109
Summary
  • RDF provides a common framework for making
    machine-processable statements about resources
  • The Semantic Web provides a vision of metadata
    as
  • modular, extensible
  • distributed, devolved
  • dynamic, evolving
  • Seeks to address (some of) the challenges of
    cross-domain, cross-community interoperability
  • Fundamental role of trust on the Semantic Web

110
Overall summary
  • Global networks have created a new context for
    the delivery of services
  • Metadata fundamental to service provision
  • Services being built (successfully!)
  • OAI-PMH as a low-barrier technology
  • No one-size-fits-all solution
  • Debates, tensions, balances.
  • automated processes v human labour
  • domain-specific richness v cross-domain (over-?)
    simplicity
  • standards v their implementation
  • objectivity v subjectivity
  • centralisation v distribution
  • Emergence of a Semantic Web?

111
Acknowledgements
  • Parts of the content of this presentation are
    adapted from earlier presentations by
  • Tom Baker (Fraunhofer-Gesellschaft, Berlin),
  • Michael Day, Rachel Heery, Paul Miller, and Andy
    Powell (UKOLN)

112
Acknowledgements
  • UKOLN is funded by Resource: The Council for
    Museums, Archives and Libraries, the Joint
    Information Systems Committee (JISC) of the UK
    higher and further education funding councils, as
    well as by project funding from the JISC and the
    European Union. UKOLN also receives support from
    the University of Bath where it is based.
  • http://www.ukoln.ac.uk/