Title: Interoperability in Digital Libraries: Open Archives Initiative and the NSDL
1 Interoperability in Digital Libraries: Open Archives Initiative and the NSDL
- CS 502, 2002-03-26
- Carl Lagoze, Cornell University
Acknowledgements: Bill Arms, Herbert Van de Sompel
2 Beyond the walls
The Library should selectively adopt the portal
model for targeted program areas. By creating
links from the Library's Web site, this approach
would make available the ever-increasing body of
research materials distributed across the
Internet. The Library would be responsible for
carefully selecting and arranging for access to
licensed commercial resources for its users, but
it would not house local copies of materials or
assume responsibility for long-term
preservation. (LC21: A Digital Strategy for the
Library of Congress, page 5)
3 A portal should mean more than access
- Traditional portal (e.g., Yahoo!)
  - Linkage with limited responsibility
- Hybrid portal
  - Asserting some semblance of curatorial role over linked resources
  - Providing a rich fabric of services across those resources
4 Interoperability standards enable service creation
- Search and discovery
  - Z39.50
- Metadata vocabularies and syntax
  - MARC
  - Dublin Core
  - XML/RDF
- Object models
  - METS
  - FEDORA
5 Interoperability Trade-offs
6 Yes, it's about resource discovery over distributed collections
[Figure: metadata (Author, Title, Abstract, Identifier) describing resources in distributed collections]
7 Facilitating/Monitoring Longevity of Distributed Content
[Figure: Preservation Service]
8 Personalization of Content
9 Cross-Repository Reference Linking
[Figure: Linkage Service]
10 Origins of the OAI
- Increasing interest in alternative scholarly publishing solutions, e.g., LANL arXiv
- Increasing impact through federation
- UPS (Universal Preprint Service) meeting, Santa Fe, October 1999
  - Representatives of various ePrint, library, and publishing communities
  - Goal: definition of an interoperability framework among ePrint providers
  - Result: the Santa Fe Convention, interoperability through metadata harvesting
11 Open Archives
- Political agenda?
  - Author self-archiving of E-Prints
  - Mission to reformulate the scholarly publishing framework
- Technical?
  - Infrastructure to facilitate interoperability across multiple domains
12 Technical Umbrella for Practical Interoperability
Metadata harvesting: a common technical layer that can be exploited by different communities (e-print archives, publishers, museums, reference libraries)
13 OAI Technical Infrastructure: key technical features
- Deploy-now technology (80/20 rule)
- Two-party model: providers (data providers) and consumers (service providers)
- Simple HTTP encoding
- XML schema for some degree of protocol conformance
- Extensibility
  - Multiple item-level metadata formats
  - Collection-level metadata
14 The World According to OAI
[Figure: service providers (discovery, current awareness, preservation) built on top of data providers]
15 Content and Metadata
[Figure: repository, resource, item (metadata), record]
17 OAI-PMH History
- Version 1.0: January 21, 2001
- Version 1.1: July 2, 2001
  - W3C XML Schema changes
- Version 2.0a: March 1, 2002
  - Production release: June 3, 2002
  - No major functionality changes
  - Numerous functional tweaks
    - Harvesting granularity, flow control, error handling
18 Key Features of the OAI Metadata Harvesting Protocol
- Definitions and concepts
  - repository
  - record
  - identifier
  - datestamp
  - set
- Protocol features
  - HTTP encoding
  - metadata prefix and schema
  - flow control
- Protocol requests
  - supporting requests
  - harvesting requests
19 Repository
20 Record
<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
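To make the record structure concrete, here is a minimal sketch of how a harvester might pull the header fields and the title out of the example record with Python's standard-library ElementTree. It assumes the slide's example namespace URIs (`http://purl.org/dc`, `http://www.arXiv.org/ea`) and that the envelope elements (`record`, `header`, `metadata`, `about`) carry no namespace, as in the example; the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example record from the slide (namespace URIs are the slide's examples).
RECORD = """<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>"""

DC_NS = {"dc": "http://purl.org/dc"}


def summarize(record_xml: str) -> dict:
    """Return the header fields plus the DC title of one record (hypothetical helper)."""
    root = ET.fromstring(record_xml)
    header = root.find("header")  # envelope elements have no namespace in this example
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        # the default namespace declared on <dc> also applies to its child <title>
        "title": root.findtext("metadata/dc:dc/dc:title", namespaces=DC_NS),
    }


print(summarize(RECORD))
```

The header carries what the protocol itself needs (identifier, datestamp), while the payload under `metadata` is opaque to the protocol and is interpreted through its own namespace.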
21 Identifiers
- Locally unique key for extracting a record from a repository
- oai-identifier = oai:archive-identifier:record-identifier
- Example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
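The identifier syntax can be sketched as a small parser. This is an illustration under one stated assumption: only the first two colons are treated as delimiters, so everything after the second colon is kept as the record identifier. The names `OaiIdentifier` and `parse_oai_identifier` are hypothetical.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    archive: str  # archive-identifier part
    record: str   # record-identifier, local to that archive


def parse_oai_identifier(value: str) -> OaiIdentifier:
    """Split 'oai:<archive-identifier>:<record-identifier>'.

    Only the first two colons are used as delimiters; anything after the
    second colon is kept verbatim as the record identifier.
    """
    scheme, sep1, rest = value.partition(":")
    archive, sep2, record = rest.partition(":")
    if scheme != "oai" or not sep1 or not sep2 or not archive or not record:
        raise ValueError(f"not an oai-identifier: {value!r}")
    return OaiIdentifier(archive, record)


print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418"))
# OaiIdentifier(archive='ncstrl', record='ncstrl.cornellcs/TR94-1418')
```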
22 Selective harvesting: datestamps
23 Selective harvesting: sets
24 Set specifics
- Repositories define their own hierarchical set organization
- Each item in a repository may be organized in one set, several sets, or no sets at all
- The meaning of sets or of the set hierarchy is not defined in the protocol
- Individual communities may formulate common set configurations
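A minimal sketch of set-based selection, assuming the colon-separated `setSpec` convention OAI-PMH uses to express hierarchy (so `S1:S2` is a child of `S1`) and assuming that a request for a set also returns items in its descendant sets. The item identifiers and memberships below are hypothetical.

```python
def in_requested_set(item_sets: list[str], requested: str) -> bool:
    """True if an item belongs to `requested` or to any set beneath it.

    Assumes hierarchical setSpecs are colon-separated ('S1:S2' is a child of 'S1').
    An item may belong to one set, several sets, or none (empty list).
    """
    return any(s == requested or s.startswith(requested + ":") for s in item_sets)


# Hypothetical items and their set memberships
items = {
    "oai:eg:001": ["S1"],
    "oai:eg:002": ["S1:S2", "S3"],  # member of two sets
    "oai:eg:003": [],               # in no set: only reachable by unrestricted harvests
    "oai:eg:004": ["S10"],          # 'S10' is not a child of 'S1'
}

selected = [ident for ident, sets in items.items() if in_requested_set(sets, "S1")]
print(selected)  # ['oai:eg:001', 'oai:eg:002']
```

Checking for `requested + ":"` rather than a bare prefix is what keeps `S10` out of a harvest of `S1`.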
25 HTTP encoding: requests
BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET:
  http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST:
  POST http://an.oa.org/OAI-script HTTP/1.0
  Content-Length: 27
  Content-Type: application/x-www-form-urlencoded

  verb=ListIdentifiers&set=S1
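Both encodings carry the same keyword arguments; only their placement differs. The sketch below builds (but does not send) each form with Python's standard library, using the slide's example base URL. The helper names `get_url` and `post_request` are hypothetical; `urllib` adds the `Content-Length` header itself when a request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def get_url(base_url: str, **args: str) -> str:
    """GET encoding: keyword arguments appended to the base URL as a query string."""
    return f"{base_url}?{urlencode(args)}"


def post_request(base_url: str, **args: str) -> Request:
    """POST encoding: the same arguments form-encoded in the request body."""
    body = urlencode(args).encode("ascii")
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


url = get_url(BASE_URL, verb="ListIdentifiers", set="S1")
req = post_request(BASE_URL, verb="ListIdentifiers", set="S1")
print(url)       # http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
print(req.data)  # b'verb=ListIdentifiers&set=S1'
```

`urlencode` also percent-encodes reserved characters, which matters once arguments such as identifiers contain colons.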
26 HTTP encoding: responses
<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri"
    xmlns:xsi="http://w3.namespace.uri"
    xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
  <responseDate>2000-19-01T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record> record contents </record>
  <!-- additional records -->
</GetRecord>
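The echoed `requestURL` is escaped twice: `&` becomes `&amp;` for XML, and the colons in the identifier become `%3A` for the URL. A minimal sketch of undoing both layers, using a trimmed copy of the response with the slide's placeholder namespace URI; `request_arguments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Trimmed response; the namespace URI is the slide's placeholder.
RESPONSE = (
    '<GetRecord xmlns="http://oai.namespace.uri">'
    "<requestURL>http://an.oa.org/OAI-script?verb=GetRecord"
    "&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>"
    "</GetRecord>"
)
NS = {"oai": "http://oai.namespace.uri"}


def request_arguments(response_xml: str) -> dict:
    """Recover the protocol arguments echoed in a response's requestURL."""
    root = ET.fromstring(response_xml)
    url = root.findtext("oai:requestURL", namespaces=NS)  # the XML parser turns &amp; back into &
    # parse_qs percent-decodes values, so %3A becomes ':'
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


print(request_arguments(RESPONSE))
# {'verb': 'GetRecord', 'identifier': 'oai:arXiv:0001', 'metadataPrefix': 'oai_dc'}
```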
27 Metadata prefix and schema
- Support for harvesting multiple metadata formats
  - Metadata schema: each format must have a validating XML schema at a publicly accessible URL (communities may define shared formats and schemas)
  - Metadata prefix: each repository maps a prefix to each schema it supports; the prefix is used in protocol requests
- Support for unqualified Dublin Core is mandatory
  - DC: OAI record syntax that builds on the base DCMI schema
  - Reserved prefix: oai_dc
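A repository's prefix-to-schema mapping can be sketched as a plain dictionary lookup. This is a hypothetical configuration, not any particular implementation: the `oai_dc` entry uses the OAI-PMH 2.0 `oai_dc` schema location, `my_format` and its URL are invented for illustration, and the exception is named after the protocol's `cannotDisseminateFormat` error.

```python
# Hypothetical repository configuration: metadataPrefix -> XML schema URL.
SUPPORTED_FORMATS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    "my_format": "http://example.org/schemas/my_format.xsd",  # hypothetical community format
}


class CannotDisseminateFormat(Exception):
    """Raised when a request names a prefix this repository does not support."""


def schema_for(prefix: str, formats: dict = SUPPORTED_FORMATS) -> str:
    """Resolve a metadataPrefix from a request to the schema it stands for."""
    if "oai_dc" not in formats:
        # unqualified Dublin Core support is mandatory
        raise RuntimeError("repository must support oai_dc")
    try:
        return formats[prefix]
    except KeyError:
        raise CannotDisseminateFormat(prefix) from None
```

The prefix is only a local handle: two repositories may use different prefixes for the same schema, which is why the schema URL, not the prefix, identifies the format.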
28 Flow control
29 Flow control specifics
- Applies to all protocol requests that return lists: ListRecords, ListIdentifiers, ListSets
- The resumptionToken is opaque
- The semantics of partitioning responses within resumption requests is undefined
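A sketch of the harvester side of flow control, against a hypothetical in-memory stand-in for a data provider. The provider alone decides how to partition the list and what its token contains (here it happens to encode an offset); the harvester just hands the token back unchanged until none is returned. `PAGE_SIZE`, `fake_list_identifiers`, and `harvest` are all illustrative names.

```python
from typing import Iterator, Optional

PAGE_SIZE = 2  # hypothetical partition size chosen by the data provider


def fake_list_identifiers(all_ids: list[str], token: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Hypothetical data provider: returns one page and a resumption token (None when done).

    The token's contents are the provider's private business; this one encodes an offset.
    """
    start = int(token) if token else 0
    page = all_ids[start:start + PAGE_SIZE]
    nxt = start + PAGE_SIZE
    return page, (str(nxt) if nxt < len(all_ids) else None)


def harvest(all_ids: list[str]) -> Iterator[str]:
    """Harvester loop: re-issue the request with the returned token until none comes back."""
    token = None
    while True:
        page, token = fake_list_identifiers(all_ids, token)
        yield from page
        if token is None:  # no token means the list is complete
            break
```

For example, `list(harvest(["oai:eg:001", "oai:eg:002", "oai:eg:003"]))` takes two round trips and yields all three identifiers. Because the harvester never inspects the token, the provider can change its partitioning scheme without breaking clients.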
30 Extensibility Feature Summary
- Multiple metadata formats
- Collection level metadata
- Identify about container
- Record data
- Terms and conditions
- Provenance
- Set structure
- Pre-configured queries
31 OAI Protocol
Requests from service provider to data provider:
- Supporting protocol requests
  - Identify
  - ListMetadataFormats
  - ListSets
- Harvesting protocol requests
  - ListRecords
  - ListIdentifiers
  - GetRecord
32 Supporting Protocol Requests
Request from service provider to data provider: Identify
Response:
- Repository name
- Base-URL
- Admin e-mail
- OAI protocol version
- Description container
33 Supporting Protocol Requests
Request from service provider to data provider: ListMetadataFormats
Response, repeated for each format:
- Format prefix
- Format XML schema
34 Supporting Protocol Requests
Request from service provider to data provider: ListSets
Response, repeated for each set:
- Set specification
- Set name
35 Harvesting Protocol Requests
Request from service provider to data provider: ListRecords (from=a, until=b, set=klm, metadataPrefix=oai_dc)
Response, repeated for each record:
- Identifier
- Datestamp
- Metadata
- About container
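The `from` and `until` arguments select records by datestamp. A minimal sketch of the data-provider side, assuming day-granularity datestamps and inclusive bounds (which is how OAI-PMH treats them), with either bound optional. The records are hypothetical, and the parameter is spelled `from_` because `from` is a Python keyword.

```python
from datetime import date
from typing import Optional

# Hypothetical (identifier, datestamp) pairs held by a data provider
RECORDS = [
    ("oai:eg:001", date(1999, 1, 1)),
    ("oai:eg:002", date(2001, 7, 2)),
    ("oai:eg:003", date(2002, 3, 1)),
]


def select_by_datestamp(records, from_: Optional[date] = None, until: Optional[date] = None):
    """Identifiers whose datestamp lies in [from_, until]; both bounds inclusive, either optional."""
    return [
        ident
        for ident, stamp in records
        if (from_ is None or stamp >= from_) and (until is None or stamp <= until)
    ]


# Incremental harvest: everything added or changed since the previous harvest date
print(select_by_datestamp(RECORDS, from_=date(2001, 1, 1)))  # ['oai:eg:002', 'oai:eg:003']
```

This is what makes incremental harvesting cheap: a service provider remembers the date of its last harvest and passes it as `from` next time.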
36 Harvesting Protocol Requests
Request from service provider to data provider: ListIdentifiers (from=a, until=b, set=klm)
Response, repeated for each item:
- Identifier
- Datestamp
37 Harvesting Protocol Requests
Request from service provider to data provider: GetRecord (identifier=oai:mlib:123a, metadataPrefix=oai_dc)
Response:
- Identifier
- Datestamp
- Metadata
- About
40 Measures of Success
- >100 implementers of the protocol
  - 64 registered
- Basis for much research and implementation
- JCDL 2002
  - A subject category for paper submission!
  - Numerous papers building on OAI
- Research projects and funding
41 Externally funded initiatives
- European Community
  - Open Archives Forum
  - Cyclades Project
- Andrew W. Mellon Foundation
  - Funding for 7 service providers
- Digital Library Federation
  - Gateways for access to members' digital collections
- National Science Foundation
  - National Science Foundation Core Infrastructure
42 DP9 Architecture
- Giving search engines access to the deep web
45 NSDL (National Digital Library for Science, Mathematics, and Engineering)
- Large-scale digital library technology
  - 1,000,000 users
  - 10,000,000 items
  - 100,000 collections
- Diverse participants
  - Libraries
  - Academic/research institutions
  - Individuals
46 NSDL References
- http://comm.nsdlib.org/
- Zia, L., "Growing a National Learning Environments and Resources Network for Science, Mathematics, Engineering, and Technology Education," D-Lib, March 2001.
- Arms, W., et al., "A Spectrum of Interoperability: The Site for Science Prototype for the NSDL," D-Lib, January 2002.
- Lagoze, C., et al., "Core Services in the Architecture of the NSDL," JCDL 2002, July 2002.
47 The Challenge
Provide coherent services for users across
diverse collections, while retaining the
individuality and richness of the collections.
48 The strategy
- A spectrum of interoperability
  - Open framework for collection services
  - Embrace collections with rich metadata and support for standards, ...; accommodate collections with limited metadata and limited support for interoperability
- Technical basis
  - Follow the library tradition of metadata sharing
  - Use automated methods to generate, normalize, and translate metadata
  - Distribute metadata to service providers
49 The Metadata Repository
[Figure: collections, metadata repository, services, users]
The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL.
50 MR Ingest and Exposure
[Figure: metadata enters the MR Front Porch through OAI-PMH gathering or direct entry; normalization, generation, and cross-walking; exposure via OAI-PMH]
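Cross-walking maps each collection's native fields onto a shared vocabulary such as Dublin Core. The sketch below is an illustration only, not the NSDL's actual pipeline: the source field names in `CROSSWALK` are hypothetical, and "normalization" here is just whitespace cleanup. Generation of missing metadata is not shown.

```python
# Hypothetical crosswalk from one collection's local field names to Dublin Core elements.
CROSSWALK = {
    "heading": "title",
    "creator_name": "creator",
    "summary": "description",
    "url": "identifier",
}


def crosswalk_to_dc(local_record: dict) -> dict:
    """Map local fields to DC, normalizing whitespace and dropping unmapped or empty fields."""
    dc: dict = {}
    for field, value in local_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no DC equivalent: this information is lost in the crosswalk
        text = " ".join(str(value).split())  # simple normalization: collapse whitespace
        if text:
            dc.setdefault(element, []).append(text)  # DC elements are repeatable
    return dc


print(crosswalk_to_dc({"heading": "  Intro to  Optics ", "summary": "Lecture notes", "grade_level": "9-12"}))
# {'title': ['Intro to Optics'], 'description': ['Lecture notes']}
```

The dropped `grade_level` field illustrates the cost of a lowest-common-denominator target: anything without a DC equivalent does not survive the mapping.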
51 Challenges and Questions
- Utility of lowest-common-denominator metadata such as DC
- Quality of metadata from non-professional contributors
- Machine processing to reduce and complement human effort
- Functionality of service structure