Institutional Archives Technology Overview - Michael L. Nelson, Old Dominion University

1
Institutional Archives Technology Overview
Michael L. Nelson
Old Dominion University
mln@cs.odu.edu
http://www.cs.odu.edu/~mln/
  • Institutional Archives & Repositories: What this
    digital movement means for Federal Libraries
  • Library of Congress Workshop
  • September 12, 2003

2
Acknowledgements
  • ODU: K. Maly, M. Zubair, J. Bollen
  • LANL: R. Luce, X. Liu
  • NASA: G. Roncaglia, J. Rocker
  • Cornell: C. Lagoze, S. Warner
  • MAGiC (UK): Paul Needham
  • and, of course, Herbert Van de Sompel (LANL)
  • the OpenURL slides are nicked from his
    presentations

3
Outline
  • A bit of history
  • Core technologies
  • OAI-PMH
  • OpenURL
  • Example implementations
  • Download and go

4
OAI-PMH
5
Background
  • I met Herbert Van de Sompel in April 1999...
  • we spoke of a demonstration project he had in
    mind, for which he had received sponsorship from
    Paul Ginsparg and Rick Luce
  • We wanted to demonstrate a multi-disciplinary DL
    that leveraged the large number of high quality,
    yet often isolated, tech report servers, e-print
    servers, etc.
  • most digital libraries (DLs) had grown up along
    single disciplines or institutions
  • little to no interoperability: isolated DL
    "gardens"

6
Universal Preprint Service
  • A cross-archive DL that provides services on
    a collection of metadata harvested from multiple
    archives
  • Nelson: NCSTRL; a modified version of Dienst
  • support for clustering
  • support for buckets
  • Krichel: ReDIF metadata format
  • Van de Sompel: SFX Linking
  • Demonstrated at Santa Fe NM, October 21-22, 1999
  • http://web.archive.org/web//http://ups.cs.odu.edu/
  • D-Lib Magazine, 6(2) 2000 (2 articles)
  • http://www.dlib.org/dlib/february00/02contents.html
  • UPS was soon renamed the Open Archives Initiative
    (OAI): http://www.openarchives.org/

7
Data and Service Providers
  • Self-describing archives
  • Much of the learning about the constituent UPS
    archives occurred out of band
  • Data Providers
  • publishing into an archive
  • providing methods for metadata harvesting
  • provide non-technical context for sharing
    information also
  • Service Providers
  • harvest metadata from providers
  • implement user interface to data

Even if these are done by the same DL, these are
distinct roles
8
Metadata Harvesting
  • Move away from distributed searching
  • Extract metadata from various sources
  • Build services on local copies of metadata
  • data remains at remote repositories

[Figure: a user's query ("search for cfd applications") runs against a
local copy of metadata. The metadata is harvested offline from each of
several nodes, and each node is independently maintained. All searching,
browsing, etc. is performed on the local metadata; individual nodes can
still support direct user interaction.]
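
The harvesting model on this slide can be sketched in a few lines of Python. This is only an illustration, not code from any of the systems discussed: the record fields, the archive names, and the `search` helper are all assumptions, and a real service provider would use a proper index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass
class HarvestedRecord:
    """One metadata record copied from a remote archive (hypothetical fields)."""
    identifier: str
    source: str        # which remote node the record was harvested from
    title: str
    description: str


def search(local_copy, query):
    """Return records whose title or description contains every query term.

    The search touches only the local copy; no remote node is contacted.
    """
    terms = query.lower().split()
    hits = []
    for rec in local_copy:
        text = f"{rec.title} {rec.description}".lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


# Metadata that was (hypothetically) harvested offline from three nodes.
LOCAL_COPY = [
    HarvestedRecord("oai:example-a:1", "archive-a",
                    "CFD applications in wing design",
                    "Computational fluid dynamics study"),
    HarvestedRecord("oai:example-b:7", "archive-b",
                    "Metadata formats survey",
                    "Dublin Core and others"),
    HarvestedRecord("oai:example-c:3", "archive-c",
                    "Turbulence modelling",
                    "Applications of CFD to turbulence"),
]

results = search(LOCAL_COPY, "cfd applications")
for rec in results:
    print(rec.source, rec.identifier, rec.title)
```

The full documents stay at the remote repositories; a hit would normally link back to the source node.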
9
Result: OAI
  • The OAI was the result of the demonstration and
    discussion during the Santa Fe meeting
  • OAI = a bunch of people, a religion, a cult, etc.
  • OAI Protocol For Metadata Harvesting (OAI-PMH) =
    the protocol created and maintained by the OAI
  • Initial focus was on federating collections of
    scholarly e-print materials
  • however, interest grew and the scope and
    application of OAI-PMH expanded to become a
    generic bulk metadata transport protocol
  • Note:
  • OAI-PMH is only about metadata -- not full text!
  • but what is metadata vs. full-text?
  • OAI is neutral with respect to the nature of the
    metadata or the resources the metadata describes
  • read: commercial publishers have an interest in
    OAI-PMH too...

10
Open Archives Initiative
11
OAI-PMH Mechanics
Requests are encoded in http
Responses are encoded in XML
XML Schemas for the responses are defined in the
OAI-PMH document
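
A minimal sketch of these mechanics in Python, using only the standard library. The base URL and the sample response are invented for illustration, and no network call is made; the element names and namespace follow the OAI-PMH 2.0 response format.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **args):
    """Encode an OAI-PMH request as an http GET URL (verb plus arguments)."""
    params = {"verb": verb}
    params.update({k: v for k, v in args.items() if v is not None})
    return base_url + "?" + urlencode(params)


# A hand-written, abbreviated Identify response (hypothetical repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-09-12T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def repository_name(xml_text):
    """Pull the repository name out of an Identify response."""
    root = ET.fromstring(xml_text)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=OAI_NS)


url = build_request("http://repository.example.org/oai", "GetRecord",
                    identifier="oai:example:1", metadataPrefix="oai_dc")
print(url)
print(repository_name(SAMPLE_RESPONSE))
```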
12
Overview of OAI-PMH Verbs
[Table: the verbs are grouped into archival metadata verbs and
harvesting verbs.]
Most verbs take arguments: dates, sets, ids,
metadata formats, and a resumption token (for flow
control)
13
OAI-PMH Data Model
item = identifier
record = identifier + metadata format + datestamp
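
One way to picture this data model is as two small Python classes: an item carries only an identifier, and each record of that item is pinned down by the identifier, a metadata format, and a datestamp. The class and field names are assumptions made for this sketch, not part of the protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata for an item in one specific format at one point in time."""
    identifier: str
    metadata_prefix: str   # e.g. "oai_dc"
    datestamp: str         # date of creation, modification, or deletion
    metadata: str          # the serialized metadata itself


@dataclass
class Item:
    """A single identifier from which records in several formats can be served."""
    identifier: str
    records: dict = field(default_factory=dict)

    def add(self, metadata_prefix, datestamp, metadata):
        rec = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = rec
        return rec

    def get_record(self, metadata_prefix):
        """Return the record in the requested format, or None if unavailable."""
        return self.records.get(metadata_prefix)


item = Item("oai:example:1345")
item.add("oai_dc", "2003-09-12", "<dc>...</dc>")
item.add("marc", "2003-09-10", "<marc>...</marc>")
print(item.get_record("oai_dc"))
```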
14
Data Providers / Service Providers
15
Aggregators
  • aggregators allow for
  • scalability for OAI-PMH
  • load balancing
  • community building
  • discovery

[Figure: an aggregator sits between data providers (repositories) and
service providers (harvesters).]
16
Aggregators
  • Frequently interchangeable terms
  • aggregators: likely to be community /
    institutionally focused
  • caches: store a copy, less likely to be
    community-oriented
  • proxies: less likely to store a copy, may gateway
    between OAI-PMH and other protocols
  • Dienst / OAI Gateway: Harrison, Nelson, Zubair,
    JCDL 03
  • To learn more about aggregators, caches &
    proxies:
  • http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
  • http://www.cs.odu.edu/~mln/jcdl03/

17
Example Aggregators
  • Arc - http://arc.cs.odu.edu/
  • first described hierarchical harvesting in
    D-Lib Magazine, 7(4) 2001
  • http://www.dlib.org/dlib/april01/liu/04liu.html
  • Celestial - http://celestial.eprints.org/
  • among other services, it provides a history of
    harvests (successful vs. errors)
  • http://celestial.eprints.org/cgi-bin/status

18
OAI-PMH 2.0 Registration
  • unregistered because
  • testing / development
  • not for public harvesting
  • public, but low-profile
  • never got around to it
  • ???

??? unregistered repositories
75 repositories registered
DP:SP = 5:1
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
19
Registration is Nice, But Not Required
  • OAI-PMH is (becoming) the http for digital
    libraries
  • there is no central registry of http servers
  • remember the NCSA "What's New" page? (ca. 1994)
  • There will never be registration support in
    OAI-PMH
  • registries are a type of service provider, built
    on top of OAI-PMH
  • registration will be an integral part of
    community building
  • <friends>

20
NASA <friends> example
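
The slide itself showed a screenshot that did not survive in this transcript. As a rough illustration of the idea, the sketch below reads a friends container from an Identify response and lists the base URLs of the other repositories it points to. The sample response and its URLs are invented; the friends namespace is the one used by the OAI-PMH 2.0 implementation guidelines.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "fr": "http://www.openarchives.org/OAI/2.0/friends/",
}

# Hand-written, abbreviated Identify response (hypothetical repositories).
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://friend-one.example.org/oai</baseURL>
        <baseURL>http://friend-two.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""


def friend_base_urls(identify_xml):
    """Return the base URLs a repository advertises as its friends."""
    root = ET.fromstring(identify_xml)
    path = "oai:Identify/oai:description/fr:friends/fr:baseURL"
    return [el.text.strip() for el in root.iterfind(path, NS)]


friends = friend_base_urls(SAMPLE_IDENTIFY)
print(friends)
```

A harvester could follow these URLs to discover repositories that never registered centrally.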
21
Field of Dreams
  • It should be easy to be a data provider, even if
    it makes more work for the service provider.
  • if enough data providers exist, the service
    providers will come (DPs >> SPs)
  • Open-source / freely available tools
  • drop-in data providers
  • at the end of this presentation
  • tools to make your existing DL a data provider
  • http://www.openarchives.org/tools/tools.htm
  • also OAI-implementers mailing list / mail
    archive!
  • service providers
  • http://oaiarc.sourceforge.net/

22
OAI-PMH Meeting History
23
Shift of Topics
  • From the protocol itself, supporting debugging
    tools and how to retrofit (existing) DLs
  • to building (new) services that use the OAI-PMH
    as a core technology and reporting on their
    impact to the institution/community

24
Arc
  • http://arc.cs.odu.edu/
  • harvests all known archives
  • first end-user service provider
  • source available through SourceForge
  • hierarchical harvesting

25
NCSTRL
  • http://www.ncstrl.org/
  • metadata harvesting replacement for Dienst-based
    NCSTRL
  • based on Arc
  • computer science metadata

26
Archon
  • http://archon.cs.odu.edu/
  • physics metadata
  • based on Arc
  • features
  • citation indexing
  • equation-based searching

27
Torii
  • http://torii.sissa.it/
  • physics metadata
  • features
  • personalization
  • recommendations
  • WAP access

28
iCite
  • http://icite.sissa.it/
  • physics metadata
  • features
  • citation based access to arXiv metadata

29
my.OAI
  • http://www.myoai.com/
  • covers all registered metadata
  • features
  • result sets
  • personalization
  • many other advanced features

30
Cyclades
  • http://www.ercim.org/cyclades
  • scientific metadata
  • features
  • personalization
  • recommendations
  • collaboration
  • status?

31
citebase
  • http://citebase.eprints.org/
  • arXiv metadata
  • citation based indexing, reporting

32
OAIster
  • http://oaister.umdl.umich.edu/
  • harvests all known archives

33
Others
  • Commercial publishers
  • American Physical Society (APS)
  • Institute of Physics
  • Elsevier / Scirus (www.scirus.com)
  • Department of Energy
  • OSTI
  • LANL
  • Institutional servers
  • DSpace (MIT www.dspace.org)
  • Eprints (www.eprints.org)
  • DARE (All Dutch universities)

34
NACA Technical Report Server
  • publicly available
  • began in 1996
  • details in NASA TM-1999-209127
  • scanned reports from 1917-1958
  • NACA = predecessor to NASA
  • contents mirrored with the MAGiC project
  • a UK-based grey-literature preservation project
  • OAI-PMH used to mirror contents

http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
35
NACA Report 1345 as seen through its native
DL: http://naca.larc.nasa.gov/
36
NACA Report 1345 as seen through
MAGiC: http://www.magic.ac.uk/
37
NACA Report 1345 as seen through
Scirus (Elsevier): http://www.scirus.com/
38
NACA Report 1345 as seen through my.OAI (FS
Consulting): http://www.myoai.com/
39
NASA Technical Report Server
  • replacement for the previous distributed
    searching version of NTRS
  • MySQL
  • Va Tech harvester
  • modified bucket
  • details in Nelson, Rocker, Harrison, Library
    Hi-Tech, 21(2) (March 2003)
  • a service provider and aggregator
  • same OAI baseURL as used for interactive searching

http://ntrs.nasa.gov/
40
NASA Technical Report Server
  • advanced, fielded search
  • explicit query routing
  • 10 NASA repositories
  • 4 non-NASA repositories
  • turned off by default

41
non-NASA repositories
> 0.5M records
42
NASA DLs in the Larger STI Realm
[Figure: NASA DLs shown alongside DOE, DOD, universities, publishers,
international DLs, ...; this could be a fully connected graph.]
NTRS could also be a data provider from the
point of view of other DLs, allowing
the harvesting of NASA report metadata.
NTRS could also harvest metadata from other
DLs and provide access to non-NASA content. We
hope to influence the direction of the
science.gov effort to use OAI-PMH.
43
Service Providers
  • It is clear that SPs are proliferating, despite
    (because of?) the inherent bias toward DPs in the
    protocol
  • easy to be a DP -> many DPs -> SPs eventually
    emerge
  • hard to be a DP -> SPs starve
  • currently 5x more DPs than SPs
  • SPs are beginning to offer increasingly
    sophisticated services
  • competitive market originally envisioned for SPs
    is emerging

44
OpenURL
45
Origins & Motivation
  • The Context: Library Automation Environment anno
    1998
  • distributed information environment
  • local & remote A&I databases
  • rapidly growing e-journal collection
  • need to interlink the available information
  • The Problem
  • links are delivered by info providers
  • links are not sensitive to user's context
  • appropriate copy problem
  • links dependent on business agreements between
    information vendors
  • links don't cover the complete collection

46
Origins & Motivation
  • The Context: Library Automation Environment anno
    1998
  • distributed information environment
  • local & remote A&I databases
  • rapidly growing e-journal collection
  • need to interlink the available information
  • The REAL Problem
  • libraries have no say in linking
  • libraries are losing a core part of the
    "organizing information" task
  • expensive collection is not used optimally
  • users are not well served

47
Origins & Motivation
  • The Solution
  • In information services:
  • DO NOT provide a link which is an actual service
    related to a referenced item (e.g. a link from a
    record in an A&I database to the corresponding
    full-text)
  • BUT rather provide
  • a link that transports metadata about the
    referenced item
  • to
  • others that are better placed to provide service
    links

OpenURL
Linking server operated by library
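
As a rough sketch of "a link that transports metadata", the Python below packs a citation into the query string of a link aimed at a library-operated linking server. The resolver addresses are invented, the citation reuses the D-Lib Magazine 6(2) 2000 issue mentioned earlier, and the key names (genre, title, volume, issue, date) follow the OpenURL 0.1 style; treat the whole thing as an assumption-laden illustration rather than a conformant implementation.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def make_openurl(resolver_base, **metadata):
    """Build a link that carries citation metadata to a linking server.

    The link does not point at the full text; the resolver decides which
    services to offer for this user.
    """
    return resolver_base + "?" + urlencode(metadata)


citation = {
    "genre": "journal",
    "title": "D-Lib Magazine",
    "volume": "6",
    "issue": "2",
    "date": "2000",
}

# The same reference, routed to two (hypothetical) library resolvers.
link_a = make_openurl("http://resolver.library-a.example/sfx", **citation)
link_b = make_openurl("http://resolver.library-b.example/menu", **citation)

print(link_a)
print(link_b)

# Both resolvers receive identical metadata; only the service point differs.
carried_a = parse_qs(urlparse(link_a).query)
carried_b = parse_qs(urlparse(link_b).query)
```

The information provider only needs to know each user's resolver base URL; which copy or service is appropriate is resolved at the library.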
48
non-OpenURL linking
[Figure: a reference in one resource links directly to the referenced
work in another resource; the metadata is resolved into a link.]
49
OpenURL linking
[Figure: from a reference, an OpenURL is provided; metadata & identifiers
are transported, and their resolution into services is user-specific and
context-sensitive.]
50
Evolution 1998
  • Nature of solution determined
  • Experiment with local databases at Ghent
    University
  • Demonstrated October 1998 at Belgian Library
    meeting
  • Problem statement & experiment described in 2
    D-Lib Magazine papers, April 1999

51
Evolution 1999
  • Feasibility of solution tested in 2 complex
    environments
  • Experiments
  • SFX@Ghent & SFX@LANL: LANL, Ghent, APS, Wiley,
    SilverPlatter, Ex Libris
  • UPS Prototype: arXiv, SLAC/SPIRES, LANL, Ghent, ...
  • Demonstrated
  • June 1999 at ALA LiTA session, New Orleans
  • October 1999 at OAI meeting, Santa Fe
  • Experiments described in 2 D-Lib Magazine
    papers, October 1999 and February 2000

52
Evolution 2000
  • OpenURL 0.1 released
  • Quick adoption of OpenURL 0.1 in information
    community
  • SFX linking server goes beta

53
Evolution 2001
  • Integration of OpenURL Framework and
    DOI/CrossRef framework
  • Experiment involving CNRI, LANL, OhioLink,
    Academic Press, Ex Libris, ...
  • DOI/OpenURL integration described in 2 D-Lib
    Magazine papers, March 2001 and September 2001
  • First non-SFX linking servers appear

54
Evolution 2001
  • Proposal to standardize OpenURL
  • Generalization of OpenURL Framework concepts
    beyond scholarly information community
  • Described in
  • Van de Sompel, Herbert and Beit-Arie, Oren.
    Generalizing the OpenURL Framework beyond
    References to Scholarly Works: the Bison-Futé
    model. July/August 2001. D-Lib Magazine.
  • NISO AX Committee starts standardization of the
    OpenURL Framework using the Bison-Futé model as
    the basis of its work.

55
NISO OpenURL Standardization Charge
  • Use existing OpenURL Framework as starting
    point
  • notion of context-sensitive services
  • notion of transporting contextual metadata
    packages to obtain context-sensitive services
  • Define syntax and transport-method for
    contextual metadata packages
  • Ensure extensibility
  • must support future applications
  • must support other information communities
  • => Generalize and Standardize

56
NISO OpenURL Standardization Charge
  • Therefore, to be addressed were
  • OpenURL Framework beyond scholarly resources
  • contextual metadata packages
  • Syntax for contextual metadata packages
  • Transport of contextual metadata packages

57
  • default links
  • restricted in nature
  • action-radius restricted by business agreements
  • not context-sensitive

[Figure: resource1, resource2, and resource3 in the metadata plane
(Herbert Van de Sompel).]
58

[Figure: an extended services plane (service component1, service
component2) above the metadata plane (resource1, resource2, resource3)
(Herbert Van de Sompel).]
59
Download and Go!
60
Where Do You Want to Build?
[Figure labels: user; service provider; data providers (several);
local context-sensitive services; EPrints.org.]
61
Fedora
  • joint project between Cornell & UVa
  • funded by the Mellon Foundation
  • a repository management system
  • focuses on complex digital objects and their
    behaviors
  • more info
  • http://www.fedora.info/
  • D-Lib Magazine, 9(4)
  • http://www.dlib.org/dlib/april03/staples/04staples.html

62
DSpace
  • MIT & HP Labs
  • constructed to capture all the output of MIT's
    faculty
  • now generalized to the DSpace Federation
  • 8 top universities in the US & Canada
  • More info
  • http://www.dspace.org/
  • http://sourceforge.net/projects/dspace/
  • D-Lib Magazine 9(1)
  • http://www.dlib.org/dlib/january03/smith/01smith.html

63
EPrints.org
  • developed at Southampton University
  • part of larger suite of institutional/author
    self-archiving tools and services
  • e.g. citebase & paracite
  • widely adopted -- 100 sites
  • http://software.eprints.org/ep2
  • more info
  • http://www.eprints.org/
  • http://www.arl.org/sparc/core/index.asp?pageg206

64
Kepler
  • P2P publishing for academia
  • community servers for coordination, management
  • archivelets for individual laptops, PCs
  • more info
  • http://kepler.cs.odu.edu/
  • D-Lib Magazine 7(4)
  • http://www.dlib.org/dlib/april01/maly/04maly.html

65
OpenResolver
  • developed by UKOLN
  • open source
  • OpenURL 0.1 format resolver
  • NISO 1.0 format???
  • more info
  • Ariadne, 28
  • http://www.ariadne.ac.uk/issue28/resolver/
  • ftp://ftp.ukoln.ac.uk/metadata/tools/openresolver/
  • http://www.ukoln.ac.uk/distributed-systems/openurl/

66
Conclusions
67
Why The OAI-PMH is NOT Important
  • Users don't care
  • OAI-PMH is middleware
  • if done right, the uninterested user should never
    have to know
  • Using OAI-PMH does not ensure a good SP
  • OAI-PMH is (or is becoming) HTTP for DLs
  • few people get excited about http now
  • http & OAI-PMH are core technologies whose
    presence is now assumed

68
Other Uses For the OAI-PMH
  • Assumptions
  • Traditional DLs / SPs will continue on their
    present path of increasing sophistication
  • citation indexing, search results viz,
    personalization, recommendations, subject-based
    filtering, etc.
  • growth rates remain the same (5x DPs as SPs)
  • Premise OAI-PMH is applicable to any scenario
    that needs to update / synchronize distributed
    state
  • Future opportunities are possible by creatively
    interpreting the OAI-PMH data model
  • See Van de Sompel, Young & Hickey, D-Lib Magazine
    July 2003, http://www.dlib.org/dlib/july03/young/07young.html
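
The premise about updating and synchronizing distributed state can be sketched with datestamp-based selective harvesting: the harvester remembers the newest datestamp it has seen and asks only for records changed since then. Everything below is a simulation in plain Python; the record layout and the `list_records` and `synchronize` helpers are assumptions for this sketch, with `from_` standing in for the protocol's `from` argument.

```python
def list_records(repository, from_=None):
    """Simulate selective harvesting: records with datestamp >= from_."""
    return [r for r in repository
            if from_ is None or r["datestamp"] >= from_]


def synchronize(local, repository, last_harvest=None):
    """Bring `local` up to date and return the new high-water datestamp.

    The lower bound is inclusive, so a record stamped exactly at
    `last_harvest` is fetched again; overwriting by identifier keeps
    that harmless.
    """
    changed = list_records(repository, from_=last_harvest)
    for rec in changed:
        local[rec["identifier"]] = rec
    return max((r["datestamp"] for r in changed), default=last_harvest)


# A hypothetical remote repository; ISO dates compare correctly as strings.
repo = [
    {"identifier": "oai:example:1", "datestamp": "2003-07-01", "title": "A"},
    {"identifier": "oai:example:2", "datestamp": "2003-08-15", "title": "B"},
]
local = {}

mark = synchronize(local, repo)            # first, full harvest
repo.append({"identifier": "oai:example:3",
             "datestamp": "2003-09-12", "title": "C"})
mark2 = synchronize(local, repo, mark)     # incremental harvest

print(mark, mark2, sorted(local))
```

Nothing in the loop depends on the records being bibliographic metadata, which is the point of the premise above.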

69
OpenURL Framework evolution
70
The Future: Community Building
  • Ultimately, protocols and metadata formats are
    not what makes a difference
  • Rather, the critical mass afforded by a common
    set of utilities (cf. http, Dublin Core, XML)
  • The best current example: The Open Language
    Archives Community
  • http://www.language-archives.org/
  • OAI-PMH provides the basis for communication
    between strangers, but allows even richer
    communication between friends