Open for Business: Open Archives, OpenURL, RSS and the Dublin Core

1
Open for Business: Open Archives, OpenURL, RSS
and the Dublin Core
Andy Powell, UKOLN, University of Bath
a.powell_at_ukoln.ac.uk
UKSG 2004, Manchester
UKOLN is supported by
www.bath.ac.uk
www.ukoln.ac.uk
a centre of expertise in digital information
management
2
Contents
  • context: metasearching and open, context-sensitive
    linking
  • bluffer's guides to:
      • Dublin Core
      • OAI Protocol for Metadata Harvesting
      • RSS
      • OpenURL
  • discussion about the benefits, problems and
    issues of using these standards in the publishing
    business environment

3
Things to note
  • this is a briefing session about technologies
  • but it is not intended to be overly technical
  • you should leave with an understanding of what
    the key technologies are but not necessarily be
    expert in them!

4
Important
  • this is a briefing session
  • please feel free to ask questions as we go
    through!

5
Context: metasearching and context-sensitive
linking
6
The problem
  • end-user often has access to large number of
    heterogeneous collections - full-text, A&I,
    images, video, data, etc. (e.g. thru JISC
    licensing agreements)
  • however, experience of these collections is less
    than optimal
  • end-users not aware of available content
  • end-user has to interact with (search or browse)
    multiple different Web sites to work across range
    of content
  • content discovery services not joined-up with
    delivery services

7
Or, to put it another way
  • from perspective of data consumer:
      • need to interact with multiple collections of
        stuff - bibliographic, full-text, data, image,
        video, etc.
      • delivered thru multiple Web sites
      • few cross-collection discovery services (with
        exception of big search engines like Google, but
        still some issues with use of Google, e.g. the
        invisible Web, the lack of metadata, keywords
        with multiple meanings, etc.)
  • from perspective of data provider:
      • few agreed mechanisms for disclosing availability
        of content

8
A solution
  • an information environment
  • framework of machine-oriented services allowing
    the end-user to discover, access, use, publish
    resources across a range of content providers
  • move away from lots of stand-alone Web sites...
  • content providers expose metadata for
    searching, harvesting, alerting
  • develop end-user services and tools that bring
    stuff together
  • based on open standards

9
End-user services and tools
  • tend to focus on library portal (metasearch)
    tools (e.g. Encompass, MetaLib or ZPortal)
  • but, there will be lots of user-focused services
    and tools
  • subject portals developed within academia
  • reading list and other tools in VLE (e.g.
    externally hosted by Sentient Discover)
  • commercial portals (ISI Web of Knowledge,
    ingenta, Bb Resource Center, etc.)
  • SFX service component (or other OpenURL resolver)
  • personal desktop reference manager (e.g. Endnote)

10
Link resolvers
  • discovery is only part of the problem
  • in the case of books, journals, journal articles,
    end-user wants access to the most appropriate
    copy
  • need to join up discovery services with
    access/delivery services (local library OPAC,
    ingentaJournals, Amazon, etc.)
  • need localised view of available services
  • linking services that provide access to the most
    appropriate copy
  • user and institutional preferences, cost, access
    rights, location, etc.

11
A shared problem space
  • the problems outlined here are shared across
    sectors and communities
  • student or researcher looking for information
    from variety of bibliographic sources
  • lecturer searching for e-learning resources from
    multiple learning object repositories
  • researcher working across multiple data-sets and
    compute servers on the Grid
  • a GP searching the National electronic Library
    for Health
  • school child searching BBC, museum and library
    Web sites for homework project
  • someone searching across multiple e-government
    sites
  • even someone looking to buy or sell a second-hand
    car

12
Technologies
  • require global, standards-based, cross-domain
    solutions
  • cross-searching:
      • Z39.50 (Bath Profile, a profile of Z39.50)
      • SRW (Search and Retrieve Web-service), a Web
        services implementation of Z39.50
  • harvesting:
      • OAI-PMH - Open Archives Initiative Protocol for
        Metadata Harvesting
  • alerting:
      • RSS - RDF/Rich Site Summary
  • linking:
      • OpenURL

13
Bluffer's Guide to Dublin Core
14
Bluffer's guide to DC
http://dublincore.org/
  • DC short for Dublin Core
  • simple metadata standard, supporting
    cross-domain resource discovery
  • original focus on Web resources but that is no
    longer the case, e.g. usage to describe physical
    artefacts in museums
  • current usage across wide range of sectors:
    academic, e-government, museums, libraries,
    business, semantic Web

15
Bluffer's Guide to DC
  • simple DC provides 15 elements (metadata
    properties)
  • multiple encoding syntaxes including HTML <meta>
    tags, XML and RDF/XML (XML schemas are available)
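
As a rough illustration of the HTML encoding, the sketch below renders a simple DC record as <link> and <meta> elements, using the commonly seen "DC." name prefix declared through a schema.DC link. The record values and the function name dc_meta_tags are assumptions made for this example, not part of the presentation.

```python
from html import escape

# Illustrative record: the keys are simple DC element names,
# the values are made up for this sketch.
record = {
    "title": "Open for Business",
    "creator": "Andy Powell",
    "date": "2004",
    "type": "Text",
}


def dc_meta_tags(record):
    """Render a simple DC record as HTML <link>/<meta> elements."""
    # Declare which vocabulary the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape the value so quotes and ampersands cannot break the attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


print(dc_meta_tags(record))
```

The output would be pasted into the <head> of a page; whether any search engine reads it is a separate question.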

16
Bluffer's Guide to DC
  • relatively slow programme of adding new terms to
    qualified DC:
      • new elements (e.g. dcterms:audience)
      • element refinements (e.g. dcterms:dateCopyrighted)
      • encoding schemes (e.g. dcterms:LCSH and
        dcterms:W3CDTF)
  • 48 elements and 17 encoding schemes

http://dublincore.org/documents/dcmi-terms/
17
Bluffer's Guide to DC
  • DC can be embedded into HTML pages but almost
    none of the big search engines will use it! Why?
    Lack of trust:
      • meta-spam
      • meta-crap
  • however, embedding DC in HTML may be worthwhile
    if your own site search engine uses it
  • also, simple DC forms the baseline metadata format
    for the OAI protocol

18
Bluffer's Guide to OAI Protocol for Metadata
Harvesting
19
OAI roots
  • the roots of OAI lie in the development of eprint
    archives
  • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD,
    NCSTRL
  • each offered Web interface for deposit of
    articles and for end-user searches
  • difficult for end-users to work across archives
    without having to learn multiple different
    interfaces
  • recognised need for single search interface to
    all archives
  • Universal Pre-print Service (UPS)

20
Searching vs. harvesting
  • two possible approaches to building a single
    search interface to multiple eprint archives
  • cross-searching multiple archives based on
    protocol like Z39.50
  • harvesting metadata into one or more central
    services: bulk move data to the user-interface
  • US digital library experience in this area
    indicated that cross-searching not preferred
    approach
  • distributed searching of N nodes viable, but only
    for small values of N

21
Harvesting requirements
  • in order that harvesting approach can work there
    need to be agreements about:
      • transport protocols: HTTP vs. FTP vs. ...
      • metadata formats: DC vs. MARC vs. ...
      • quality assurance: mandatory elements,
        mechanisms for naming of people, subjects, etc.,
        handling duplicated records, best-practice
      • intellectual property and usage rights: who can
        do what with the records
  • work in this area resulted in the Santa Fe
    Convention

22
Development of OAI-PMH
  • 2 year metamorphosis thru various names
  • Santa Fe Convention, OAI-PMH versions 1.0, 1.1
  • OAI Protocol for Metadata Harvesting 2.0
  • development steered by international technical
    committee
  • inter-version stability helped developer
    confidence
  • move from focus on eprints to more generic
    protocol
  • move from OAI-specific metadata schema to
    mandatory support for DC

23
Bluffer's guide to OAI
http://www.openarchives.org/
  • OAI-PMH short for Open Archives Initiative
    Protocol for Metadata Harvesting
  • a low-cost mechanism for harvesting metadata
    records from data providers to service providers
  • allows service provider to say "give me some or
    all of your metadata records"
      • where "some" is based on date-stamps, sets,
        metadata formats
  • eprint heritage but widely deployed:
      • images, museum artefacts, learning objects, ...
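
The "give me some or all of your metadata records" request can be sketched as an OAI-PMH ListRecords URL. The protocol parameters (verb, metadataPrefix, from, until, set) are those of OAI-PMH 2.0; the repository address, the set name and the function name list_records_url are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    With no optional arguments this asks for all records; the
    date-stamp and set arguments narrow it down to "some".
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical data provider address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

# All records, as simple DC.
print(list_records_url(BASE_URL))
# Only records in one set that changed since a given date.
print(list_records_url(BASE_URL, from_date="2004-01-01", set_spec="physics"))
```

A harvester would typically remember the date of its last successful harvest and pass it as from_date on the next run.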

24
Bluffer's guide to OAI
  • based on HTTP and XML
      • simple, Web-friendly, fast deployment
  • OAI-PMH is not a search protocol
      • but its use can underpin search-based services
        based on Z39.50 or SRW or SOAP or ...
  • OAI-PMH carries only metadata
      • content (e.g. full-text or image) made available
        separately, typically at a URL in the metadata
  • mandates simple DC as record format
      • but extensible to any XML format: IMS metadata,
        IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.
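
To show what "metadata only, with the content at a URL in the metadata" looks like to a harvester, here is a minimal sketch that reads simple DC records out of a ListRecords response. The sample response is hand-written and heavily trimmed (a real one carries further required elements such as responseDate and request), and the identifiers and URLs in it are invented.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 and its simple DC (oai_dc) record format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, hand-written response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2004-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:identifier>http://repository.example.org/eprints/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "oai_identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            # dc:identifier is where the link to the content usually sits.
            "urls": [e.text for e in rec.findall(".//dc:identifier", NS)],
        })
    return records


for record in parse_records(SAMPLE_RESPONSE):
    print(record["titles"], record["urls"])
```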

25
Bluffer's guide to OAI
  • metadata and content often made freely
    available, but this is not a requirement
      • OAI-PMH can be used between closed groups
      • or, can make metadata available but restrict
        access to content in some way
  • underlying HTTP protocol provides:
      • access control, e.g. HTTP Basic authentication
      • compression mechanisms (for improving performance
        of harvesters)
      • could, in theory, also provide encryption if
        required
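
Because these features come from HTTP rather than from OAI-PMH itself, a harvester can ask for them with ordinary request headers. The sketch below only builds the request and does not send it; the URL, the credentials and the function name harvest_request are assumptions for illustration.

```python
import base64
from urllib.request import Request


def harvest_request(url, username=None, password=None):
    """Prepare (but do not send) an HTTP request for a harvester."""
    request = Request(url)
    # Ask the server to compress the response if it is able to.
    request.add_header("Accept-Encoding", "gzip")
    if username is not None:
        # HTTP Basic: base64 of "username:password" in the Authorization header.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical closed-group repository and credentials.
req = harvest_request(
    "http://repository.example.org/oai?verb=Identify", "harvester", "secret"
)
for name, value in req.header_items():
    print(name, value)
```

Basic authentication sends the credentials in an easily decoded form, so in practice it would normally be combined with an encrypted (HTTPS) connection; a harvester that requests gzip also has to decompress the response body itself.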

26
Bluffer's Guide to RSS
27
Bluffer's guide to RSS
http://www.eevl.ac.uk/rss_primer/
  • simple XML application for sharing (syndicating)
    news feeds on the Web
  • RDF Site Summary or Rich Site Summary (depending
    on who you ask)
  • "news" can be interpreted quite loosely, e.g. new
    items added to database
  • uses "channel" and "item" terminology
  • a channel is an XML document that is made
    available on a Web site; to update the channel,
    simply update the XML

28
Bluffer's guide to RSS
  • each item has simple metadata (title,
    description) and URL link to resource (news story
    or whatever)
  • RSS also provides channel branding (logo, etc.)
  • three versions currently: 0.9, 1.0 and 2.0; 1.0
    is based on RDF and is more flexible (but
    slightly more complex). Also worth noting: Atom,
    an attempt to resolve some of the tensions in
    RSS
  • no single registry of all channels yet
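
The channel and item structure can be seen in a small RSS 2.0 document. The feed below is invented for illustration, and the reader function is a minimal sketch; an RSS 1.0 channel is RDF-based and uses XML namespaces, so it would need different element lookups.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 channel; all titles and URLs are made up.
SAMPLE_CHANNEL = """<rss version="2.0">
  <channel>
    <title>New records</title>
    <link>http://www.example.org/</link>
    <description>Items recently added to an example database</description>
    <item>
      <title>First new item</title>
      <link>http://www.example.org/items/1</link>
      <description>A short description of the first item.</description>
    </item>
    <item>
      <title>Second new item</title>
      <link>http://www.example.org/items/2</link>
      <description>A short description of the second item.</description>
    </item>
  </channel>
</rss>"""


def read_channel(xml_text):
    """Return the channel title and a list of its items."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


title, items = read_channel(SAMPLE_CHANNEL)
print(title)
for item in items:
    print("-", item["title"], item["link"])
```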

29
Bluffer's guide to RSS
  • fairly widespread usage, e.g. channels available
    from the BBC, Microsoft, Apple, as well as from
    several academic sites and services (RDN, LTSN,
    ...)
  • easy to use within portals (e.g. uPortal)
  • lots of software and toolkits available, open
    source and commercial

30
Bluffer's Guide to OpenURLs
31
OpenURL roots
a library perspective?
  • the context:
      • distributed information environment (e.g. the
        JISC IE)
      • multiple A&I and other discovery services
      • rapidly growing e-journal collection
      • need to interlink available resources
  • the problem:
      • links controlled by external info services
      • links not sensitive to user's context
        (appropriate copy problem)
      • links dependent on vendor agreements
      • links don't cover complete collection

32
The problem
a library perspective?
  • the context:
      • distributed information environment (e.g. the
        JISC IE)
      • multiple A&I and other discovery services
      • rapidly growing e-journal collection
      • need to interlink available resources
  • the REAL problem:
      • libraries have no say in linking
      • libraries losing core part of organising
        information task
      • expensive collection not used optimally
      • users not well served

33
The solution
  • do NOT hardwire a link to a single service on the
    referenced item (e.g. a link from an A&I service
    to the corresponding full-text)
  • BUT rather provide a link that transports metadata
    about the referenced item to another service that
    is better placed to provide service links

34
Non-OpenURL linking
[Diagram: a reference in an A&I service links to the referenced work in a document delivery service; the metadata is resolved into a link (typically a URL)]
35
OpenURL linking
[Diagram: a reference in an A&I service provides an OpenURL that transports metadata and identifiers; these are resolved, in a user-specific and context-sensitive way, into services such as a document delivery service]
36
Example 1
  • journal article
  • from Web of Science to ingenta Journals

42
Example 2
  • book
  • from University of Bath OPAC to Amazon

48
Summary
[Diagram: ingenta, ISI Web of Science, Google, University of Bath OPAC and Amazon arranged around an OpenURL resolver, with the roles OpenURL source, OpenURL resolver and OpenURL target labelled]
49
Summary (2)
  • OpenURL source:
      • a service that embeds OpenURLs into its
        user-interface in order to enable linking to most
        appropriate copy
  • OpenURL resolver:
      • a service that links to appropriate copy(ies) and
        other value added services based on metadata in
        OpenURL
  • OpenURL target:
      • a service that can be linked to from an OpenURL
        resolver using metadata in OpenURL

50
Bluffer's guide to OpenURLs
http://www.niso.org/committees/committee_ax.html
  • standard for linking discovery services to
    delivery services
  • supports linking from OpenURL source to OpenURL
    target via OpenURL resolver

[Diagram: the end-user is taken from the source (e.g. Web of Science) via the resolver to the target (e.g. ingenta)]
Example OpenURL, in which http://www.bath.ac.uk/openurl is the BASEURL:
http://www.bath.ac.uk/openurl?genre=article&atitle=Information%20gateways%20collaboration%20on%20content&title=Online%20Information%20Review&issn=1468-4527&volume=24&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
51
Bluffer's guide to OpenURLs
  • the OpenURL is a URL that carries metadata from
    the source service to the user's preferred
    resolver
  • resolver typically offered by institution
  • currently deployed OpenURLs are often version 0.1
    - focus on bibliographic resources (books and
    journal articles)
  • version 1.0 (the standard) more generic and
    extensible, e.g. could carry metadata about
    learning objects or research data
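
On the source side, producing a version 0.1 style OpenURL amounts to appending the citation metadata as query parameters to the address of the user's resolver. The sketch below reuses the resolver address and the key names and values from the example OpenURL on the previous slide; the function name build_openurl is an assumption.

```python
from urllib.parse import quote, urlencode


def build_openurl(base_url, metadata):
    """Append citation metadata to the resolver's base URL."""
    # quote_via=quote encodes spaces as %20, as in the slide's example.
    return f"{base_url}?{urlencode(metadata, quote_via=quote)}"


# Citation metadata taken from the example on the previous slide.
citation = {
    "genre": "article",
    "atitle": "Information gateways collaboration on content",
    "title": "Online Information Review",
    "issn": "1468-4527",
    "volume": "24",
    "spage": "40",
    "epage": "45",
    "artnum": "1",
    "aulast": "Heery",
    "aufirst": "Rachel",
}

print(build_openurl("http://www.bath.ac.uk/openurl", citation))
```

Only the base URL changes from one institution to another; the metadata part stays the same, which is what lets each user be sent to their own resolver.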

52
Bluffer's guide to OpenURLs
  • sources need to maintain knowledge about
    end-user's preferred resolver
  • resolvers and targets need to share knowledge
    about link-to syntaxes
  • most library automation vendors will either have
    (or be developing) an OpenURL resolver solution
    for their customers
  • some open-source solutions also available but
    expect to work quite hard with these
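
The resolver's side of this can be pictured as a lookup from incoming metadata to the link-to syntax of each target. This is a toy sketch only: the targets, their link templates and the function name resolve are all hypothetical, and a real resolver would also take the institution's holdings and access rights into account.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical link-to syntaxes, keyed by OpenURL genre.
# Real targets each publish (or agree) their own syntax.
LINK_TEMPLATES = {
    "article": {
        "Full text": "http://fulltext.example.com/link?issn={issn}&volume={volume}&page={spage}",
    },
    "book": {
        "Library catalogue": "http://opac.example.ac.uk/search?isbn={isbn}",
        "Bookshop": "http://bookshop.example.com/isbn/{isbn}",
    },
}


def resolve(openurl):
    """Turn the metadata carried by an OpenURL into target links."""
    query = parse_qs(urlsplit(openurl).query)
    # Keep the first value of each key, re-encoded for use in a URL.
    metadata = {key: quote(values[0], safe="") for key, values in query.items()}
    links = {}
    for label, template in LINK_TEMPLATES.get(metadata.get("genre"), {}).items():
        try:
            links[label] = template.format_map(metadata)
        except KeyError:
            # The OpenURL lacks metadata this target needs; skip the target.
            continue
    return links


print(resolve(
    "http://www.bath.ac.uk/openurl?genre=article&issn=1468-4527&volume=24&spage=40"
))
```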

53
Discussion
54
Summary
55
Summary
  • protocols presented here fill space between
    information providers and other services
    (portals, VLEs, etc.)
  • allow integration of remote information resources
    more seamlessly
  • allow separation of discovery and content
    delivery
  • enable user-focused, context-sensitive linking
  • can be viewed as ways of getting users to your
    site
  • but there are some issues to beware of

56
What can you do?
  • consider exposing metadata about your content for
    harvesting (or searching)
  • consider making alerting channels available
  • consider supporting use of OpenURLs for linking
    to appropriate-copy
  • consider how your content will be used in
    e-learning context
  • consider how external services link to your
    resources (i.e. support persistent deep linking
    to your content)