Title: Open for Business: Open Archives, OpenURL, RSS and the Dublin Core
1 Open for Business: Open Archives, OpenURL, RSS and the Dublin Core
Andy Powell, UKOLN, University of Bath
a.powell_at_ukoln.ac.uk
UKSG 2004, Manchester
UKOLN is supported by
www.bath.ac.uk
www.ukoln.ac.uk
a centre of expertise in digital information
management
2 Contents
- context: metasearching and open context-sensitive linking
- bluffer's guides to
  - Dublin Core
  - OAI Protocol for Metadata Harvesting
  - RSS
  - OpenURL
- discussion about the benefits, problems and issues of using these standards in the publishing business environment
3 Things to note
- this is a briefing session about technologies
- but it is not intended to be overly technical
- you should leave with an understanding of what
the key technologies are but not necessarily be
expert in them!
4 Important
- this is a briefing session
- please feel free to ask questions as we go through!
5 Context: metasearching and context-sensitive linking
6 The problem
- end-user often has access to large number of heterogeneous collections
  - full-text, A&I, images, video, data, etc. (e.g. thru JISC licensing agreements)
- however, experience of these collections is less than optimal
  - end-users not aware of available content
  - end-user has to interact with (search or browse) multiple different Web sites to work across range of content
  - content discovery services not joined-up with delivery services
7 Or, to put it another way
- from perspective of data consumer
  - need to interact with multiple collections of stuff (bibliographic, full-text, data, image, video, etc.) delivered thru multiple Web sites
  - few cross-collection discovery services (with exception of big search engines like Google, but still some issues with use of Google, e.g. the invisible Web, the lack of metadata, keywords with multiple meanings, etc.)
- from perspective of data provider
  - few agreed mechanisms for disclosing availability of content
8 A solution
- an information environment
- framework of machine-oriented services allowing the end-user to
  - discover, access, use, publish resources across a range of content providers
- move away from lots of stand-alone Web sites...
- content providers expose metadata for
  - searching, harvesting, alerting
- develop end-user services and tools that bring stuff together
- based on open standards
9 End-user services and tools
- tend to focus on library portal (metasearch) tools (e.g. Encompass, MetaLib or ZPortal)
- but, there will be lots of user-focused services and tools
  - subject portals developed within academia
  - reading list and other tools in VLE (e.g. externally hosted by Sentient Discover)
  - commercial portals (ISI Web of Knowledge, ingenta, Bb Resource Center, etc.)
  - SFX service component (or other OpenURL resolver)
  - personal desktop reference manager (e.g. Endnote)
10 Link resolvers
- discovery is only part of the problem
- in the case of books, journals, journal articles, end-user wants access to the most appropriate copy
- need to join up discovery services with access/delivery services (local library OPAC, ingentaJournals, Amazon, etc.)
- need localised view of available services
- linking services that provide access to the most appropriate copy
  - user and institutional preferences, cost, access rights, location, etc.
11 A shared problem space
- the problems outlined here are shared across sectors and communities
  - student or researcher looking for information from variety of bibliographic sources
  - lecturer searching for e-learning resources from multiple learning object repositories
  - researcher working across multiple data-sets and compute servers on the Grid
  - a GP searching the National electronic Library for Health
  - school child searching BBC, museum and library Web sites for homework project
  - someone searching across multiple e-government sites
  - even someone looking to buy or sell a second-hand car
12 Technologies
- require global, standards-based, cross-domain solutions
- cross-searching
  - Z39.50 (Bath Profile, a profile of Z39.50) and SRW (Search and Retrieve Web-service, a Web services implementation of Z39.50)
- harvesting
  - OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
- alerting
  - RSS: RDF/Rich Site Summary
- linking
  - OpenURL
13 Bluffer's Guide to Dublin Core
14 Bluffer's guide to DC
http://dublincore.org/
- DC short for Dublin Core
- simple metadata standard, supporting cross-domain resource discovery
- original focus on Web resources but that is no longer the case, e.g. usage to describe physical artefacts in museums
- current usage across wide range of sectors: academic, e-government, museums, libraries, business, semantic Web
15 Bluffer's Guide to DC
- simple DC provides 15 elements (metadata properties)
- multiple encoding syntaxes including HTML <meta> tags, XML and RDF/XML (XML schemas are available)
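As a rough illustration of the HTML <meta> encoding, the sketch below renders a simple DC description as <meta> tags using the widely used "DC.element" naming convention. The function name dc_to_html_meta and the sample record are invented for illustration; treat it as one plausible way to generate the tags, not a definitive implementation.

```python
from html import escape

# Namespace URI for the 15 simple DC elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

def dc_to_html_meta(record):
    """Render a dict of simple DC elements as HTML <meta> tags.

    `record` maps element names (e.g. "title") to a list of values;
    DC elements are repeatable, so each value gets its own tag.
    """
    lines = ['<link rel="schema.DC" href="%s">' % DC_NS]
    for element, values in record.items():
        for value in values:
            lines.append('<meta name="DC.%s" content="%s">'
                         % (escape(element), escape(value, quote=True)))
    return "\n".join(lines)

# Hypothetical sample record, for illustration only.
sample = {
    "title": ["Open for Business"],
    "creator": ["Powell, Andy"],
    "date": ["2004"],
}
print(dc_to_html_meta(sample))
```

The output would be pasted into the <head> of the page being described.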
16 Bluffer's Guide to DC
- relatively slow programme of adding new terms to qualified DC
  - new elements (e.g. dcterms:audience)
  - element refinements (e.g. dcterms:dateCopyrighted)
  - encoding schemes (e.g. dcterms:LCSH and dcterms:W3CDTF)
- 48 elements and 17 encoding schemes
http://dublincore.org/documents/dcmi-terms/
17 Bluffer's Guide to DC
- DC can be embedded into HTML pages but almost none of the big search engines will use it! Why? Lack of trust
  - meta-spam
  - meta-crap
- however, embedding DC in HTML may be worthwhile if your own site search engine uses it
- and simple DC forms the baseline metadata format for the OAI protocol
18 Bluffer's Guide to OAI Protocol for Metadata Harvesting
19 OAI roots
- the roots of OAI lie in the development of eprint archives
  - arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
- each offered Web interface for deposit of articles and for end-user searches
- difficult for end-users to work across archives without having to learn multiple different interfaces
- recognised need for single search interface to all archives
  - Universal Pre-print Service (UPS)
20 Searching vs. harvesting
- two possible approaches to building a single search interface to multiple eprint archives
  - cross-searching multiple archives based on protocol like Z39.50
  - harvesting metadata into one or more central services: bulk move data to the user-interface
- US digital library experience in this area indicated that cross-searching not preferred approach
  - distributed searching of N nodes viable, but only for small values of N
21 Harvesting requirements
- in order that harvesting approach can work there need to be agreements about
  - transport protocols: HTTP vs. FTP vs. ...
  - metadata formats: DC vs. MARC vs. ...
  - quality assurance: mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice
  - intellectual property and usage rights: who can do what with the records
- work in this area resulted in the Santa Fe Convention
22 Development of OAI-PMH
- 2 year metamorphosis thru various names
  - Santa Fe Convention, OAI-PMH versions 1.0, 1.1
  - OAI Protocol for Metadata Harvesting 2.0
- development steered by international technical committee
- inter-version stability helped developer confidence
- move from focus on eprints to more generic protocol
- move from OAI-specific metadata schema to mandatory support for DC
23 Bluffer's guide to OAI
http://www.openarchives.org/
- OAI-PMH short for Open Archives Initiative Protocol for Metadata Harvesting
- a low-cost mechanism for harvesting metadata records
  - from data providers to service providers
- allows service provider to say "give me some or all of your metadata records"
  - where "some" is based on date-stamps, sets, metadata formats
- eprint heritage but widely deployed
  - images, museum artefacts, learning objects, ...
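To make the "give me some or all of your metadata records" request concrete, here is a minimal sketch of how a service provider might build an OAI-PMH ListRecords URL. The verb and the metadataPrefix, from, until and set arguments are part of OAI-PMH 2.0; the function name, the repository address and the set name "physics" are assumptions made up for this example.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    Only the arguments that are given are sent, so calling with no
    date or set asks for all records in the chosen metadata format.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical repository address, for illustration only.
BASE = "http://repository.example.org/oai"
print(list_records_url(BASE))
print(list_records_url(BASE, from_date="2004-01-01", set_spec="physics"))
```

The first call asks for everything in simple DC; the second narrows the harvest to one set and to records changed since a given date, which is how incremental harvesting is usually done.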
24 Bluffer's guide to OAI
- based on HTTP and XML
  - simple, Web-friendly, fast deployment
- OAI-PMH is not a search protocol
  - but use can underpin search-based services based on Z39.50 or SRW or SOAP or ...
- OAI-PMH carries only metadata
  - content (e.g. full-text or image) made available separately, typically at URL in metadata
- mandates simple DC as record format
  - but extensible to any XML format: IMS metadata, IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.
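The following sketch shows what a harvester might do with a simple DC record once it arrives: pull out the title and the URL at which the content itself is made available. The XML is a hand-written, heavily abbreviated response (the identifiers and URLs are invented), and reading the content URL from dc:identifier is a common convention rather than something the protocol guarantees.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, abbreviated response for illustration; a real one
# carries more (responseDate, request, datestamps, etc.).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:identifier>http://example.org/eprints/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def extract_records(xml_text):
    """Return (OAI identifier, title, content URL) for each record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        url = record.findtext(".//dc:identifier", namespaces=NS)
        results.append((oai_id, title, url))
    return results

print(extract_records(SAMPLE))
```

A search service built on harvested records would index the titles and link each hit back to the content URL.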
25 Bluffer's guide to OAI
- metadata and content often made freely available, but not a requirement
  - OAI-PMH can be used between closed groups
  - or, can make metadata available but restrict access to content in some way
- underlying HTTP protocol provides
  - access control, e.g. HTTP BASIC
  - compression mechanisms (for improving performance of harvesters)
  - could, in theory, also provide encryption if required
26 Bluffer's Guide to RSS
27 Bluffer's guide to RSS
http://www.eevl.ac.uk/rss_primer/
- simple XML application for sharing (syndicating) news feeds on the Web
- RDF Site Summary or Rich Site Summary (depending on who you ask)
- "news" can be interpreted quite loosely, e.g. new items added to database
- uses "channel" and "item" terminology
- a channel is an XML document that is made available on a Web-site; to update the channel, simply update the XML
28 Bluffer's guide to RSS
- each item has simple metadata (title, description) and URL link to resource (news story or whatever)
- RSS also provides channel branding (logo, etc.)
- three versions currently: 0.9, 1.0 and 2.0
  - 1.0 is based on RDF and is more flexible (but slightly more complex)
  - also worth noting Atom, an attempt to resolve some of the tensions in RSS
- no single registry of all channels yet
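To show how little is involved in consuming a channel, the sketch below reads the channel title and the title and link of each item from a small RSS 2.0 document. The feed is hand-written for illustration (all titles and URLs are invented), and RSS 1.0 feeds, being RDF-based, would need namespace-aware handling that is not shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 channel, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>New additions</title>
    <link>http://www.example.org/</link>
    <description>Items recently added to the database</description>
    <item>
      <title>First new record</title>
      <link>http://www.example.org/records/1</link>
      <description>A short description of the record.</description>
    </item>
    <item>
      <title>Second new record</title>
      <link>http://www.example.org/records/2</link>
      <description>Another short description.</description>
    </item>
  </channel>
</rss>"""

def read_channel(xml_text):
    """Return the channel title and a list of (title, link) per item."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items

title, items = read_channel(SAMPLE_FEED)
print(title)
for item_title, link in items:
    print("-", item_title, link)
```

A portal would typically fetch the XML on a schedule and render the items as a list of links.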
29 Bluffer's guide to RSS
- fairly widespread usage, e.g. channels available from the BBC, Microsoft, Apple, as well as from several academic sites and services (RDN, LTSN, ...)
- easy to use within portals (e.g. uPortal)
- lots of software and toolkits available: open source and commercial
30 Bluffer's Guide to OpenURLs
31 OpenURL roots
a library perspective?
- the context
  - distributed information environment (e.g. the JISC IE)
  - multiple A&I and other discovery services
  - rapidly growing e-journal collection
  - need to interlink available resources
- the problem
  - links controlled by external info services
  - links not sensitive to user's context (appropriate copy problem)
  - links dependent on vendor agreements
  - links don't cover complete collection
32 The problem
a library perspective?
- the context
  - distributed information environment (e.g. the JISC IE)
  - multiple A&I and other discovery services
  - rapidly growing e-journal collection
  - need to interlink available resources
- the REAL problem
  - libraries have no say in linking
  - libraries losing core part of organising information task
  - expensive collection not used optimally
  - users not well served
33 The solution
- do NOT hardwire a link to a single service on the referenced item (e.g. a link from an A&I service to the corresponding full-text)
- BUT rather
  - provide a link that transports metadata about the referenced item
  - to another service that is better placed to provide service links
34 Non-OpenURL linking
[Diagram labels: A&I service; reference; resolution of metadata into a link (typically a URL); link to referenced work; document delivery service]
35 OpenURL linking
[Diagram labels: A&I service; reference; provision of OpenURL; transportation of metadata and identifiers; resolution of metadata and identifiers into services; user-specific; context-sensitive; document delivery service]
36 Example 1
- journal article
- from Web of Science to ingenta Journals
42 Example 2
- book
- from University of Bath OPAC to Amazon
48 Summary
[Diagram labels: OpenURL Source; OpenURL Resolver; OpenURL Target; services shown: ISI Web of Science, University of Bath OPAC, Google, ingenta, Amazon]
49 Summary (2)
- OpenURL source
  - a service that embeds OpenURLs into its user-interface in order to enable linking to most appropriate copy
- OpenURL resolver
  - a service that links to appropriate copy(ies) and other value added services based on metadata in OpenURL
- OpenURL target
  - a service that can be linked to from an OpenURL resolver using metadata in OpenURL
50 Bluffer's guide to OpenURLs
http://www.niso.org/committees/committee_ax.html
- standard for linking discovery services to delivery services
- supports linking from OpenURL source to OpenURL target via OpenURL resolver
[Diagram labels: End-user; source (e.g. Web of Science); resolver; target (e.g. ingenta)]
BASEURL
http://www.bath.ac.uk/openurl?genre=article&atitle=Information%20gateways%20collaboration%20on%20content&title=Online%20Information%20Review&issn=1468-4527&volume=24&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
51 Bluffer's guide to OpenURLs
- the OpenURL is a URL that carries metadata from the source service to the user's preferred resolver
  - resolver typically offered by institution
- currently deployed OpenURLs are often version 0.1
  - focus on bibliographic resources (books and journal articles)
- version 1.0 (the standard) more generic and extensible, e.g. could carry metadata about learning objects or research data
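As a sketch of what an OpenURL source has to do, the code below appends citation metadata to the base URL of the user's resolver as an ordinary query string, in the style of a version 0.1 OpenURL. The key names and values are taken from the Heery example on the earlier slide; the function name make_openurl is made up for this illustration.

```python
from urllib.parse import urlencode, quote

def make_openurl(resolver_base_url, metadata):
    """Append citation metadata to a resolver's base URL as a query string."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return resolver_base_url + "?" + urlencode(metadata, quote_via=quote)

citation = {
    "genre": "article",
    "atitle": "Information gateways collaboration on content",
    "title": "Online Information Review",
    "issn": "1468-4527",
    "volume": "24",
    "spage": "40",
    "epage": "45",
    "artnum": "1",
    "aulast": "Heery",
    "aufirst": "Rachel",
}
print(make_openurl("http://www.bath.ac.uk/openurl", citation))
```

Because only the base URL changes from one institution to the next, the same citation can be sent to whichever resolver the end-user's institution runs.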
52 Bluffer's guide to OpenURLs
- sources need to maintain knowledge about end-user's preferred resolver
- resolvers and targets need to share knowledge about link-to syntaxes
- most library automation vendors will either have (or be developing) an OpenURL resolver solution for their customers
- some open-source solutions also available, but expect to work quite hard with these
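The point about link-to syntaxes can be sketched as a toy resolver: it reads the metadata from an incoming OpenURL and fills it into per-target URL templates. Everything specific here is hypothetical (the target names, the template URLs, the resolver address); real resolvers hold a much larger knowledge base and also apply institutional preferences and access rights.

```python
from urllib.parse import urlsplit, parse_qs, quote

# Hypothetical link-to templates; real targets publish their own syntaxes.
TARGET_TEMPLATES = {
    "article": [
        ("Full text", "http://fulltext.example.com/find?issn={issn}"
                      "&volume={volume}&spage={spage}"),
    ],
    "book": [
        ("Library catalogue", "http://opac.example.ac.uk/search?isbn={isbn}"),
        ("Bookshop", "http://bookshop.example.com/isbn/{isbn}"),
    ],
}

def resolve(openurl):
    """Turn an incoming OpenURL into a list of (label, link) services."""
    query = parse_qs(urlsplit(openurl).query)
    # parse_qs returns lists; keep the first value of each key.
    metadata = {key: quote(values[0]) for key, values in query.items()}
    services = []
    for label, template in TARGET_TEMPLATES.get(metadata.get("genre"), []):
        try:
            services.append((label, template.format(**metadata)))
        except KeyError:
            # Skip targets whose required metadata is missing.
            continue
    return services

incoming = ("http://resolver.example.ac.uk/openurl?genre=article"
            "&issn=1468-4527&volume=24&spage=40")
for label, link in resolve(incoming):
    print(label, link)
```

Keeping the templates in a table rather than in code reflects the fact that targets change their link-to syntaxes independently of the resolver.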
53 Discussion
54 Summary
55 Summary
- protocols presented here fill space between information providers and other services (portals, VLEs, etc.)
  - allow integration of remote information resources more seamlessly
  - allow separation of discovery and content delivery
  - enable user-focused, context-sensitive linking
- can be viewed as ways of getting users to your site
- but there are some issues to beware of
56 What can you do?
- consider exposing metadata about your content for harvesting (or searching)
- consider making alerting channels available
- consider supporting use of OpenURLs for linking to appropriate-copy
- consider how your content will be used in e-learning context
- consider how external services link to your resources (i.e. support persistent deep linking to your content)