Title: ARROW institutional repositories and discovery services: presentation to the CORDRA workshop, Melbou
1ARROW institutional repositories and discovery
services presentation to the CORDRA workshop,
Melbourne, 4 Feb 2005
- Geoff Payne,
- ARROW Project Manager
2Presentation structure
- ARROW objectives and strategies
- ARROW software
- ARROW and CORDRA
3ARROW acronyms
- Australian Research Repositories Online to the
World (ARROW) is a Federated Repositories of
Digital Objects (FRODO) project funded by the
Australian Commonwealth Government Dept of
Education, Science and Training (DEST)
- DEST funding of A$3.66M over three years
(2004-2006)
- ARROW Consortium Partners
- Monash University (Lead Institution)
- University of New South Wales
- Swinburne University of Technology
- National Library of Australia
4What is an Institutional Repository?
- A managed collection of digital objects
- institutional in scope
- with consistent data and metadata structures for
similar objects
- enabling resource discovery by the Communities
of Practice for whom the objects are of interest
- allowing read, input and export of objects to
facilitate resource sharing
- respecting access constraints
- sustainable over time
- facilitating application of preservation
strategies
5ARROW Branded Services Profile
(Diagram: ARROW branded services connected via the Internet)
- National Library of Australia
- ARROW Web Site: project information
- National Library of Australia ARROW Resource
Discovery Service: uses TeraText to index
metadata harvested by OAI-PMH
- ARROW Open Access Journal Publishing System:
uses OJS from the Public Knowledge Project
- Internet search engines: capture text exposed
by ARROW repositories
- Swinburne, UNSW and Monash ARROW Repositories:
digital object storage using Fedora and VITAL
- Members-only area: meeting minutes etc.
6Repositories Technical Issues Metadata
Exchange
- Dublin Core insufficiently granular for many
purposes
- Learning Object Metadata not good for
bibliographic metadata
- Need to preserve metadata relevant to categories
of objects as decided by the community of
practice that produced the object
- Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH) can gather Dublin Core
metadata to establish resource discovery services
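The harvesting step in the last bullet can be sketched roughly as follows. This is a minimal illustration, not ARROW's implementation: the endpoint `http://repository.example.edu/oai` and the sample response are made up, and only the standard OAI-PMH `ListRecords` verb and `oai_dc` metadata prefix are assumed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; a real repository publishes its own base URL.
BASE_URL = "http://repository.example.edu/oai"

# Namespace of the Dublin Core elements carried inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the text of every dc:title in an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Made-up response, trimmed to the parts used here.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url(BASE_URL)
titles = extract_titles(SAMPLE)
```

A discovery service would fetch `url`, index the Dublin Core fields it finds, and link back to the repository for the full object.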
7ARROW Metadata Strategy
- Supports metadata schemata to suit individual
data models
- No requirement to shoehorn all metadata into one
schema
- Each stored object can retain metadata developed
for it by the community of practice which
generated the object
- Maintains flexibility to store many types of
digital objects in the repository
- No need to anticipate every object type now
8OCLC Metadata Interoperability Core
From Godby, Smith and Childress. 2003. Two
paths to interoperable metadata, p. 3, at
http://www.oclc.org/research/publications/archive/2003/godby-dc2003.pdf
9ARROW Branded Services Profile
(Diagram repeated from slide 5, ARROW Branded Services Profile.)
10(No Transcript)
11ARROW Persistent Identifiers
- Repositories need to offer a preferred form of
citation for their content
- Which does not break as URLs do when files are
moved or web sites are restructured
- Handles from CNRI seem to be becoming widely
adopted
- DOI (Digital Object Identifier) is a Handle
- UK Stationery Office adopting Handles
- DSpace uses Handles
12ARROW Repository Persistent Identifiers
- ARROW Handles Format adopted
- http://arrow.monash.edu.au/hdl/1959.1/nnnn
- 1959: ARROW handles naming authority
- 1959.n: one sub number for each ARROW repository
- nnnn: running number
- ARROW will assign a handle to each datastream in
a digital object to ensure that
- individual parts of the digital object can be
cited and re-used independently
- Internal data models in the repository can be
reworked and the datastream can still be reliably
retrieved
- http://www.handle.net/index.html
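The handle layout on this slide can be illustrated with a small parser. This is a sketch under stated assumptions: the slide only shows the pattern `1959.n/nnnn`, so treating the repository sub number and the running number as decimal digits is a guess, and the function names are hypothetical.

```python
import re

# Pattern for the form shown on the slide: 1959.<repository>/<running number>.
# Assumption: both variable parts are plain decimal digits.
HANDLE_RE = re.compile(r"^(?P<authority>1959)\.(?P<repo>\d+)/(?P<number>\d+)$")

# Resolver prefix as given on the slide.
RESOLVER = "http://arrow.monash.edu.au/hdl/"


def parse_handle(handle):
    """Split an ARROW-style handle into its three components."""
    match = HANDLE_RE.match(handle)
    if match is None:
        raise ValueError("not an ARROW-style handle: %r" % handle)
    return {
        "authority": match.group("authority"),
        "repository": int(match.group("repo")),
        "number": int(match.group("number")),
    }


def handle_url(handle, resolver=RESOLVER):
    """Return the citable URL for a handle, validating its shape first."""
    parse_handle(handle)
    return resolver + handle
```

For example, `handle_url("1959.1/1234")` yields a URL under the resolver prefix that stays valid even if the underlying file is moved, because only the resolver's mapping changes.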
13ARROW - Summary of design criteria
- A generalised institutional repository solution
- Initial focus on managing and exposing
traditional bibliographic research outputs
- Expand to managing non-bibliographic research
outputs
- Design decisions are being taken with the
intention of not precluding management of other
digital objects such as learning objects and
large research data sets
14ARROW technology software selected
- Flexible Extensible Digital Object Repository
Architecture (Fedora)
- Software implementation of architecture by
Cornell University and University of Virginia
- VTLS Inc (www.vtls.com) as development partners
- ARROW / VTLS partnership to take the Fedora
engine and construct a working repository to
meet ARROW's functional requirements
- ARROW licensing VITAL repository product
- VTLS doing ARROW-specified development
- Ongoing sustainability through vendor support
15ARROW architecture component software
VITAL Access Portal, OAI/PMH, SRU/SRW, Web
Exposure
VITAL, OJS Fedora
Fedora
16Resulting VITAL Application Stack
VITAL Closed Source Management Client (Windows)
Access Portal (Web)
ARROW-Funded Open Source Web Services
Fedora Open Source Repository
17ARROW stages
- Demonstration (2004)
- Developing architecture, selecting, testing and
developing software
- Deployment (late 2004 to end 2005)
- Populating the ARROW Partners' repositories
- Distribution (mid 2005 to end 2006)
- Enabling others to participate
- Under review for earlier participation by others
18ARROW partnerships
- Established
- VTLS
- Fedora
- Google, to test indexing of research materials
- Thomson ISI Web Citation Index
- Being negotiated
- OCLC, to test their metadata interoperability
core
- Open Journal System, to enhance the OJS Software
- Research Master, to test integration between RM4
and ARROW
19ARROW FRODO Partnerships
- MAMS: Meta Access Management System
- Access control through eXtensible Access Control
Markup Language (XACML) metadata
- Needs development of a FRODO profile of XACML for
access control interoperability
- APSR: Australian Partnership for Sustainable
Repositories
- Interoperability through consistent metadata for
similar data objects
- Needs FRODO metadata schemata for object
exchange, export and ingest into new repository
environments as part of sustainability and
preservation initiatives
- ADT: Australian Digital Theses
- Interoperability through harvestable Dublin Core
metadata
- Supporting e-theses online which are pointed to
from ADT
- Role for an overarching FRODO Web services
strategy?
20Repositories and Middleware
- List of possible open source repository software
- http://www.soros.org/openaccess/software/
- Regardless of software selected, need to deal
with same issues
- authorisation/authentication
- object processing on ingest
- object workflow on ingest
- metadata consistency
- providing search exposure
- identifying OA status of deposits
- collaboration with other repositories and
repository initiatives
- These may or may not be handled in the software
selected
21Search exposure
- Requirements
- standards-based way for information gateways, as
well as other repositories, to query repository
contents directly
- way to harvest from repositories to support
single search gateway (preferable to federated
search)
- ARROW will be supporting OAI-PMH and also
commissioning development of SRU/SRW web service
on top of Fedora
- Google has agreed to work with ARROW to expose
repository content via Google Scholar
- Middleware opportunity
- Extending OAI-PMH to harvest content as well as
metadata
- modOAI project already looking at this
- Agreement to use SRU/SRW for searching across
different repositories
- Moderately coupled with repository
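To make the direct-query requirement concrete, here is a minimal sketch of how a gateway might form an SRU searchRetrieve request. The endpoint URL and the helper name are hypothetical, and the SRU service commissioned by ARROW may differ; only the standard SRU request parameters (`operation`, `version`, `query`, `maximumRecords`) and a CQL query string are assumed.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-escapes the CQL so it is safe in a query string.
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint; the CQL searches the Dublin Core title index.
url = sru_search_url("http://repository.example.edu/sru", 'dc.title = "research"')
```

Because the same request shape works against any SRU server, a gateway could send this query to several repositories in turn, which is the cross-repository agreement the slide proposes.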
22Collaboration with other repositories
- Requirements
- need for interoperability
- avoiding wheel re-invention
- learning from each other's progress so far
- ARROW working with Fedora Development Consortium,
National Science Digital Library and APSR
- Middleware opportunities
- Registry of standard content models
- In the absence of existing practice, influencing
the emergence of de facto standards
- Agreement on standard framework for re-usable web
services
23ARROW Content Committee
- Unfortunately it is not as simple as "build it and
they will come"
- Publisher and Library/Learning Solutions (PALS)
Pathfinder research on web-based repositories,
Final Report, January 2004
- "We find that IRs are currently rather small,
with an average (median) of 290 records per
institution (smaller but comparable to the median
size of other OAI data providers)." (Page 33)
24Incentives are needed for academics to submit
their materials to repositories
- Substantial advocacy is required to achieve
participation
- Mandatory deposit of e-Theses
- Credits towards promotion
- Funding linkages
- Demonstrable additional exposure such as in Web
Citation indexes and search engines
25ARROW and CORDRA
- ARROW
- Relies on exposing metadata for harvesting by
discovery services
- Exposes content through search engines
- Not exclusively SCORM compliant learning objects
- Exposes basic elements for use in building
learning objects etc
- Need to minimise human effort in the registration
process
- CORDRA as one of many pathways to ARROW content
26Questions
- What is in it for authors?
- Protocols for exposing content in scope for
CORDRA registries via automatic harvesting or
push mechanisms?
- Boundaries for CORDRA instances
- National?
- Community of interest
- Medical images for diagnostics
- Medieval manuscripts
- Content eligibility
- ARROW fundamental building block objects
- ARROW learning objects
- What is the difference between harvesting and
CORDRA registration?
- Would CORDRA relate to ARROW's federated resource
discovery service, or to individual repositories?
- Business model: reciprocal access or real $?