Title: An introduction to metadata. Metadata: from soup to nuts, NOF-digitise programme seminar, London, 5 February 2002
1. An introduction to metadata. Metadata: from soup to nuts, NOF-digitise programme seminar, London, 5 February 2002
Email: p.johnston_at_ukoln.ac.uk
URL: http://www.ukoln.ac.uk/
- Pete Johnston
- UKOLN, University of Bath
- Bath, BA2 7AY
UKOLN is supported by
2. An introduction to metadata
- What is metadata and what is it used for?
- Metadata for resource discovery: introducing the Dublin Core
- How is metadata created?
- How is metadata shared?
- Resource discovery metadata in the NOF-digitise programme
3. What is metadata and what is it used for?
4. What is metadata?
- Metadata creation: the art formerly known as cataloguing?
- delivery of resources by resource creators/owners rather than (or as well as) by intermediary
- remote access to resources for all (potentially)
- emphasis on customer/user
- information overload
- quantity vs. quality?
- the Google effect
5. What is metadata? (2)
- Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person. (Dempsey and Heery, 1998)
- Machine understandable information about web resources or other things. (Berners-Lee, 1997)
- Structured data about resources that can be used to help support a wide range of operations
6. Who/what uses metadata?
- Human agents
- owner managing resources
- researcher seeking resources
- third party services
- Software agents
- aggregators
- portals presenting landscape to user
- brokers performing query tasks on behalf of user
7. What resources, objects, things?
- HTML documents
- digital images
- databases
- books
- museum objects
- archival records
- metadata records
- collections
- services
- physical places
- people
- abstract works
- concepts
- events
8. What operations?
- Different flavours of metadata serve different purposes
- simple, generic vs. rich, specific
- published widely vs. shared within community vs. used by resource owner/manager
- Owner / manager / provider wants to
- establish control of resources
- administer/manage resources (through time)
- disclose/promote resources
- enable and control access/use of resources
- contextualise resources
9. What operations? (2)
- End user wants to
- find
- identify
- select
- obtain/use
- interpret
- Third party service may want to
- disclose/promote
- enable and control access/use
- annotate
- re-contextualise
10. What information is required in metadata?
- No one-size-fits-all solution
- Depends on operation which metadata supports
- Refer to standards
- Benefit of others' experience, expertise
- Provide basis for good practice
- Reflect consensus, so facilitate exchange, access, interoperability
- May have support in software tools
- Administrative metadata
11. Metadata for resource discovery: introducing the Dublin Core
12. Resource discovery metadata
- Resource users may wish to
- search across descriptions from different providers
- compare/combine descriptions from different providers
- Resource providers may wish to
- disseminate descriptions widely
- share descriptions with other providers, 3rd parties
- describe relationships between resources
- Third parties may wish to
- build services on descriptions prepared by others
- annotate descriptions prepared by others
13. Resource discovery metadata
- Metadata for resource discovery is
- used beyond its creator community
- combined with metadata from other communities
- Metadata is aggregated or cross-searched
- challenge of semantic interoperability
14. Resource discovery metadata
- Typically covers
- description of resource content
- what is it?
- may include some description of context
- description of resource form
- how is it constructed?
- description of resource use
- what tools do I need to use it?
- can I afford it?
15. Introducing the Dublin Core
- Initiative to improve resource discovery on Web
- not for complex resource description
- simple document-like objects
- extended to other classes of resource
- Interdisciplinary consensus on simple element set
- 15 elements
- all optional
- all repeatable
16. Introducing the Dublin Core
- Title
- Subject
- Description
- Creator
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
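A minimal sketch, in Python, of how a simple DC record might be held in memory, given that all 15 elements are optional and repeatable. The function name make_dc_record, the lower-case keys and the second creator value are assumptions made for illustration; they are not taken from any Dublin Core specification.

```python
# The 15 elements of the simple Dublin Core element set, as listed above.
DC_ELEMENTS = (
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(**values):
    """Build a record as a dict mapping element name -> list of values.

    Every element is optional (it may be absent) and repeatable
    (its list may hold several values). Unknown names are rejected.
    """
    record = {}
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC element: {name}")
        # Accept either a single string or a list/tuple of strings.
        items = [value] if isinstance(value, str) else list(value)
        record[name] = items
    return record


record = make_dc_record(
    title="Report",
    creator=["J Smith", "A Jones"],  # repeated element; second name is invented
    date="2001-11-05",
)
print(record)
```

Storing every element as a list keeps the optional and repeatable cases uniform: an absent element is simply a missing key.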
17. Introducing the Dublin Core
- Simplicity of semantics, ease of use
- Provides basic semantic interoperability
- across domains
- across language communities
- Allows for extensibility
- but tension between extending DC and choosing other, richer schema
18. Introducing the Dublin Core
- Interoperability requires
- use of content rules/standards
- clarity about resource being described
- e.g. work, expression, manifestation, item
- Real resources more complex than (stable) document-like object?
- characteristics of resources change through time
- agents perform actions which produce changes
19. Using the Dublin Core
- Not a replacement for richer descriptive standards
- A pidgin language for use by tourists on the Internet commons (Tom Baker, A Grammar of Dublin Core)
- Can provide 15 windows into richer resource descriptions
- disclose rich description in simple form
- semantic cross-walks, mappings
- export rather than create?
- NOF-digitise guidelines (5.2.1) mandate generation of simple DC records at item-level
20. Using the Dublin Core
[Diagram: a rich description exposed as a simple DC description with the elements title, creator, date, description, rights]
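The idea of exporting a simple DC view from a richer description through a cross-walk can be sketched as follows. This is only an illustration: the rich field names (object_name, maker and so on) and the mapping itself are invented for the example and do not come from any real schema or from the NOF-digitise guidelines.

```python
# Hypothetical cross-walk: the field names on the left are invented for
# this example; the names on the right are simple DC elements.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "physical_description": "description",
    "copyright_statement": "rights",
}


def to_simple_dc(rich_record):
    """Export a simple DC view of a richer record.

    Fields with no mapping are dropped: the simple record is a window
    onto the rich description, not a replacement for it.
    """
    simple = {}
    for field, value in rich_record.items():
        element = CROSSWALK.get(field)
        if element is not None and value:
            simple.setdefault(element, []).append(value)
    return simple


rich = {
    "object_name": "Report",
    "maker": "J Smith",
    "production_date": "2001-11-05",
    "accession_number": "1",  # no simple DC equivalent in this mapping
}
print(to_simple_dc(rich))
```

The rich record stays the master copy; the simple DC record is generated from it on export rather than created and maintained separately.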
21. How is metadata created?
22. How is metadata created?
- By software tools
- indexing robots, web crawlers
- from resource content, from server info
- By human agents
- description by resource creator/owner
- description by third party services
- Creating (and maintaining) good quality metadata is not cheap
- rights issues for metadata as well as for resources?
23. Where is metadata stored?
- Embedded in resource
- depends on format of resource
- can metadata be extracted from resource?
- Linked to resource
- Created as record in database
- may be remote database
- Adopt approach which offers most flexibility
- may need to present different subsets of full metadata in different contexts
24. Metadata embedded in resource
[Diagram: Resource 1 with embedded metadata (Creator: J Smith; Date: 2001-11-05; Title: Report), alongside a metadata database table with columns Doc, Creator, Date, Title and one row: 1, J Smith, 2001-11-05, Report]
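Whether embedded metadata can be extracted depends on the format of the resource. As a sketch for one case, the Python below pulls name/content pairs out of the meta tags of an HTML page using the standard library parser. The sample page and the "DC." prefix on the names are assumptions for illustration; a real resource may label its metadata differently.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Elements are repeatable, so keep a list per name.
            self.metadata.setdefault(name, []).append(content)


# An invented HTML resource carrying the values used in the diagram.
HTML_DOC = """<html><head>
<title>Report</title>
<meta name="DC.creator" content="J Smith">
<meta name="DC.date" content="2001-11-05">
<meta name="DC.title" content="Report">
</head><body><p>...</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(HTML_DOC)
print(extractor.metadata)
```

The extracted pairs could then be loaded into a metadata database, which is one way of combining the embedded and database approaches.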
25. Metadata record as linked resource
[Diagram: Resource 1 (Doc 1) linked to a separate record, Metadata rec 1 (Creator: J Smith; Date: 2001-11-05; Title: Report), alongside a metadata database table with columns Doc, Creator, Date, Title and one row: 1, J Smith, 2001-11-05, Report]
26. Metadata record created in database
[Diagram: Resource 1 described by a row in a metadata database table with columns Doc, Creator, Date, Title: 1, J Smith, 2001-11-05, Report]
27. How is metadata shared?
28. How is metadata shared?
- How does a data provider make metadata records available in a commonly understood form?
- How does a service provider obtain these metadata records from data providers?
29. How is metadata shared?
- Metadata as language: metadata records as sets of statements
- Effective transmission of information requires agreement on
- semantics
- what terms mean
- e.g. cat, to sit, mat
- structure
- significance of arrangement of terms
- e.g. sentence: subject -> verb -> object (in English)
- syntax
- rules of expression
- The cat sat on the mat.
30. How is metadata shared?
- A resource description community is defined by consensus on conventions
- Consensus on syntax
- use of XML
- Consensus on semantics of terms
- meaning of (uniquely named through XML namespace) elements/attributes
- Consensus on meaning of structure
- use of community standard XML DTD/Schema
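To illustrate how an XML namespace gives an element a unique name, the sketch below parses a small record in Python. The namespace URI is that of the Dublin Core element set, version 1.1; the wrapper element record and the un-namespaced title element are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# An invented record; the wrapper element <record> is an assumption.
XML_TEXT = f"""<record xmlns:dc="{DC_NS}">
  <dc:title>Report</dc:title>
  <dc:creator>J Smith</dc:creator>
  <dc:date>2001-11-05</dc:date>
  <title>A local, non-DC element that happens to share a name</title>
</record>"""

root = ET.fromstring(XML_TEXT)

# ElementTree expands prefixes to {namespace-uri}local-name, so the
# DC title and the un-namespaced title are distinct elements.
dc_title = root.find(f"{{{DC_NS}}}title").text
local_title = root.find("title").text
print(dc_title)
print(local_title)
```

Two communities can both use an element called title without collision, because software compares the full namespace-qualified name rather than the local name alone.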
31. Introducing XML
- Extensible Markup Language
- Recommendation of W3C, 1998, 2000
- Defines means of describing tree-structured data in text-based format
- embedded markup delimits and describes data
- Simple, platform-independent syntax
- Standard programming interfaces
- reusable software components
- Widely adopted for transferring data between programs, systems
32. [Diagram: a metadata table (columns Doc, Creator, Date, Title; one row: 1, J Smith, 2001-11-05, Report) expressed as XML]
<table>
  <record>
    <doc>1</doc>
    <creator>J Smith</creator>
    <date>2001-11-05</date>
    <title>Report</title>
  </record>
</table>
33. [Diagram: database table (Doc, Creator, Date, Title) -> serialisation as <record> ... </record> -> transmission of <record> ... </record> -> de-serialisation by remote application]
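The serialise, transmit, de-serialise cycle can be sketched in Python with the standard library. This is an illustration only: the function names and the element names doc, creator, date and title are assumptions based on the example record, and the transmission step is represented by simply passing a string.

```python
import xml.etree.ElementTree as ET


def serialise(row):
    """Turn one database row (a dict) into a <record> XML string."""
    record = ET.Element("record")
    for column, value in row.items():
        ET.SubElement(record, column).text = str(value)
    return ET.tostring(record, encoding="unicode")


def deserialise(xml_text):
    """Rebuild the row from the XML string on the receiving side."""
    record = ET.fromstring(xml_text)
    return {child.tag: child.text for child in record}


row = {"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"}
wire_format = serialise(row)          # what would be transmitted
received = deserialise(wire_format)   # what the remote application rebuilds
print(wire_format)
print(received == row)
```

The remote application needs no access to the sender's database, only agreement on what the elements inside record mean.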
34. Introducing XML (2)
- Support from major software vendors
- Use of XML
- invisible to end-user
- increasingly invisible to information manager?
- generated and consumed by software
- requires consensus on structure amongst communication partners
- Use XML for exchange when
- partners (humans, applications) both know semantics conveyed by structure of (meta)data
- Use RDF/XML for exchange when
- (meta)data potentially used by applications without prior knowledge of specific schema
- (meta)data incorporates overlapping structures from different domains
35. Introducing OAI
- Open Archives Initiative
- develops/promotes interoperability standards to facilitate dissemination of content
- roots in e-prints community
- "Archive" = repository, not archive
- "Open" in terms of architecture, not free/unlimited access to repository
36. Introducing OAI MHP
- OAI Metadata Harvesting Protocol
- lightweight protocol which allows data providers to expose metadata records for retrieval by service providers
- built on HTTP, XML
- requests from service provider to data provider sent using HTTP GET/POST
- Six verbs
- responses from data provider to service provider as XML documents
- Must provide simple DC (OAI provides XML Schema)
- May provide other metadata formats (in XML)
37. Introducing OAI MHP (2)
- Supports selective harvesting
- by sets
- by datestamps
- Example
- http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc
- List all records added since Jan 1 2002 in oai_dc format (simple DC)
- Returns XML document containing records
- OAI MHP is not a distributed search protocol
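A harvesting request of the kind shown in the example is an ordinary HTTP GET URL with a verb and a few arguments. The Python sketch below builds that URL and decodes it again, without contacting any server. The helper name build_request is an assumption for illustration, and the base URL is the illustrative one from the example, not a live service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the example above; illustrative, not a live service.
BASE_URL = "http://www.myarchive.org/cgi-bin/oai"


def build_request(verb, **arguments):
    """Return the HTTP GET URL for a harvesting request."""
    params = {"verb": verb}
    # "from" is a Python keyword, so callers pass from_ instead.
    for key, value in arguments.items():
        params[key.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request("ListRecords", from_="2002-01-01", metadataPrefix="oai_dc")
print(url)

# A data provider would decode the same parameters on receipt.
print(parse_qs(urlsplit(url).query))
```

A service provider would issue this request on a schedule, advancing the from datestamp each time, which is what makes the harvesting selective.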
38. Resource discovery metadata in the NOF-digitise programme
39. Resource discovery within a project
[Diagram; surviving label: Resources]
40. Resource discovery across the programme
41. Resource discovery: the larger context
42. N.B.
- Previous diagrams should be treated as illustration of potential, not description of architecture!
- Role of collection-level description in disclosing existence of collections/repositories to portals
43. Summary
- Metadata for resource discovery and resource management
- Resource discovery metadata made to be shared
- Communication: syntax, semantics, structure
- Role of standards
- Lightweight protocols for metadata exchange
- balance functionality and cost
- Enhance access to your project's resources
44. Acknowledgements
- UKOLN is funded by Resource: The Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
- http://www.ukoln.ac.uk/