The OAI Protocol for Metadata Harvesting

1
The OAI Protocol for Metadata Harvesting
  • Andy Powell
  • a.powell_at_ukoln.ac.uk
  • UKOLN, University of Bath
  • IVOA Registry Meeting, London
  • March 2003

2
Contents
  • a brief history of OAI
  • 10 technical things you should know about the
    OAI-PMH

3
OAI roots
  • the roots of OAI lie in the development of eprint
    archives
  • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD,
    NCSTRL
  • each offered Web interface for deposit of
    articles and for end-user searches
  • difficult for end-users to work across archives
    without having to learn multiple different
    interfaces
  • recognised need for single search interface to
    all archives
  • Universal Pre-print Service (UPS)

4
Searching vs. harvesting
  • two possible approaches to building a single
    search interface to multiple eprint archives
  • cross-searching multiple archives based on a
    protocol like Z39.50
  • harvesting metadata into one or more central
    services: bulk move data to the user interface
  • US digital library experience in this area
    indicated that cross-searching not preferred
    approach
  • distributed searching of N nodes viable, but only
    for small values of N

5
Searching vs. harvesting

6
Harvesting requirements
  • in order that harvesting approach can work there
    need to be agreements about
  • transport protocols: HTTP vs. FTP vs. ...
  • metadata formats: DC vs. MARC vs. ...
  • quality assurance: mandatory elements,
    mechanisms for naming of people, subjects, etc.,
    handling duplicated records, best practice
  • intellectual property and usage rights: who can
    do what with the records
  • work in this area resulted in the Santa Fe
    Convention

7
Development of OAI-PMH
  • 2-year metamorphosis through various names
  • Santa Fe Convention, OAI-PMH versions 1.0, 1.1
  • OAI Protocol for Metadata Harvesting 2.0
  • development steered by international technical
    committee
  • inter-version stability helped developer
    confidence
  • move from focus on eprints to more generic
    protocol
  • move from OAI-specific metadata schema to
    mandatory support for DC

8
Bluffer's guide to OAI
http://www.openarchives.org/
  • OAI-PMH is a low-cost mechanism for harvesting
    metadata records
  • from data providers to service providers
  • allows service provider to say "give me some or
    all of your metadata records"
  • where "some" is based on date-stamps, sets,
    metadata formats
  • not limited to repositories of eprints
  • images, museum artefacts, learning objects, ...
  • based on HTTP and XML
  • simple, Web-friendly, autonomous
  • fast, flexible deployment
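
The "some or all of your metadata records" request above can be sketched as a plain HTTP GET URL. This is a minimal sketch, not a full harvester: the base URL and the set name are hypothetical, and only the standard library is used.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request URL for a selective harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # "some" of the records: narrow by date-stamp and/or set
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL and set name, for illustration only
url = list_records_url("http://repository.example.org/oai",
                       from_date="2003-01-01", set_spec="physics")
print(url)
```

Leaving out the date and set arguments asks for all records in the chosen metadata format.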

9
Bluffer's guide to OAI
  • OAI-PMH is not a search protocol
  • but its use can underpin search-based services
    based on Z39.50 or SRW or SOAP or ...
  • OAI-PMH carries only metadata
  • content (e.g. full text or image) made available
    separately, typically at a URL in the metadata
  • mandates simple DC as record format
  • but extensible to any XML format: IMS, ONIX,
    MARC, METS, etc.
  • extensible framework for metadata about
  • repository, resources, items, sets
  • can include rights metadata
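
Because the protocol carries only metadata, a service provider has to find the content location inside the record itself. The sketch below assumes the URL is carried in a dc:identifier element of a simple DC record, which is a common practice rather than something stated on the slide; the sample record and its values are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample record; real records come from a harvested response
SAMPLE_DC = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:identifier>http://eprints.example.org/archive/1/</dc:identifier>
  <dc:identifier>local-id-42</dc:identifier>
</oai_dc:dc>"""


def content_urls(dc_xml):
    """Return the dc:identifier values that look like Web URLs."""
    root = ET.fromstring(dc_xml)
    values = [(e.text or "").strip() for e in root.iter(DC + "identifier")]
    return [v for v in values if v.startswith(("http://", "https://"))]


urls = content_urls(SAMPLE_DC)
print(urls)
```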

10
Bluffer's guide to OAI
  • metadata and content often made freely
    available, but this is not a requirement
  • OAI-PMH can be used between closed groups
  • or, can make metadata available but restrict
    access to content in some way
  • underlying HTTP protocol provides
  • access control, e.g. HTTP Basic authentication
  • compression mechanisms (for improving performance
    of harvesters)
  • could, in theory, also provide encryption if
    required
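
The HTTP features listed above can be sketched from the harvester's side as request headers plus a decoding step. This is a minimal sketch under assumptions: the user name and password are hypothetical, and a real harvester would pass these headers to whatever HTTP client it uses.

```python
import base64
import gzip


def harvest_headers(username=None, password=None):
    """Request headers asking for gzip, with optional HTTP Basic credentials."""
    headers = {"Accept-Encoding": "gzip"}
    if username is not None:
        raw = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return headers


def decode_body(body, content_encoding):
    """Undo gzip compression if the server says it applied it."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Hypothetical credentials and a locally compressed stand-in for a response
headers = harvest_headers("harvester", "secret")
sample = gzip.compress(b"<OAI-PMH/>")
print(decode_body(sample, "gzip"))
```

HTTP Basic sends credentials in an easily reversible encoding, which is one reason the encryption point above matters for closed groups.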

11
Resources, items and records
  • item: all available metadata about a resource
    (here, David), with an item identifier
  • records: the item's metadata in particular
    formats (Dublin Core, MARC, SPECTRUM)

12
Protocol requests
  • six different request types
  • Identify
  • ListMetadataFormats
  • ListSets
  • ListIdentifiers
  • ListRecords
  • GetRecord
  • harvester need not use all types
  • repository must implement all types
  • required and optional arguments
  • on request types
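
The six request types and their arguments can be sketched as a small lookup table with a checker. The argument lists follow OAI-PMH 2.0 as I understand it; the function name and return convention are illustrative, not part of the protocol.

```python
# verb -> (required arguments, optional arguments)
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}

# List requests may be continued with a resumptionToken as the only argument
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(verb, args):
    """Return None if the request is well formed, else an OAI-PMH error code."""
    if verb not in VERBS:
        return "badVerb"
    names = set(args)
    if "resumptionToken" in names:
        ok = verb in RESUMABLE and names == {"resumptionToken"}
        return None if ok else "badArgument"
    required, optional = VERBS[verb]
    if not required <= names:
        return "badArgument"  # a required argument is missing
    if not names <= (required | optional):
        return "badArgument"  # an argument the verb does not accept
    return None


print(check_request("ListRecords", {"metadataPrefix": "oai_dc", "from": "2003-01-01"}))
print(check_request("GetRecord", {"identifier": "oai:example.org:item-1"}))
```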

13
Record structure
  • metadata about a resource in a particular XML
    format
  • header (mandatory)
  • identifier (1)
  • datestamp (1)
  • setSpec elements (*)
  • status attribute for deleted item (?)
  • metadata (mandatory)
  • XML encoded metadata within root tag which
    provides namespace and schema
  • repositories must support Dublin Core
  • about (optional)
  • rights statements
  • provenance statements
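
The record structure above can be sketched by parsing one record with the standard library. The sample record, its identifier and its set names are invented for illustration; only the element names and namespaces are taken from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented sample record for illustration
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:item-1</identifier>
    <datestamp>2003-03-01</datestamp>
    <setSpec>physics</setSpec>
    <setSpec>physics:astro</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example eprint</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Pull the header fields and the metadata root tag out of one record."""
    record = ET.fromstring(xml_text)
    header = record.find(OAI + "header")
    metadata = record.find(OAI + "metadata")
    has_metadata = metadata is not None and len(metadata) > 0
    return {
        "identifier": header.findtext(OAI + "identifier"),       # exactly one
        "datestamp": header.findtext(OAI + "datestamp"),         # exactly one
        "sets": [e.text for e in header.findall(OAI + "setSpec")],  # zero or more
        "deleted": header.get("status") == "deleted",            # optional attribute
        "metadata_root": metadata[0].tag if has_metadata else None,
        "about_count": len(record.findall(OAI + "about")),       # optional containers
    }


info = parse_record(SAMPLE)
print(info["identifier"], info["sets"], info["deleted"])
```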

14
Dublin Core
http://dublincore.org/
  • OAI-PMH mandates use of simple DC as lowest
    common denominator
  • agreed XML schema: oai_dc
  • simple DC: 15 metadata properties
  • all DC properties optional and repeatable
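
The "optional and repeatable" rule can be sketched by building a simple DC block from a dictionary of value lists. This is a minimal sketch: the function name is hypothetical, and the schemaLocation attribute that a complete oai_dc record normally carries is left out.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 simple DC properties
DC_ELEMENTS = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]


def build_oai_dc(properties):
    """Serialise {property: [values]} as an oai_dc:dc element.

    Any property may be absent (optional) or have several values (repeatable).
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in properties.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC property: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented values for illustration
dc_xml = build_oai_dc({"title": ["An example eprint"],
                       "creator": ["Author, A.", "Author, B."]})
print(dc_xml)
```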

Title        Contributor  Source
Creator      Date         Language
Subject      Type         Relation
Description  Format       Coverage
Publisher    Identifier   Rights

15
OAI demonstration
  • repository explorer demo

16
OAI and Google
eprint archive(s)
Web site(s)
multimedia database(s)
DP9 gateway
OAI gateway makes harvested metadata available
to Google

17
Implementing OAI
  • OAI protocol is relatively simple
  • implementation and deployment tends to be very
    fast
  • lots of available toolkits
  • Java, Perl, PHP, etc.
  • complete tools also available
  • e.g. tools that sit in front of existing
    databases
  • see tools area on the OAI Web site

18
Creative Commons
http://www.creativecommons.org/
  • CC is devoted to expanding the range of creative
    work available for others to build upon and
    share
  • provides standard licences for content
  • attribution
  • noncommercial
  • no derivative works
  • share alike
  • mechanisms for indicating licence on Web pages
  • need similar mechanism in OAI

19
Questions