Title: Developing a digital repository infrastructure for King's College London: RSP Training Day, 22nd January 2009
1. Developing a digital repository infrastructure for King's College London
RSP Training Day, 22nd January 2009
- Gareth Knight
- Centre for e-Research
2. Approach
- Analyse existing practices and the limitations of the current system
- Establish requirements for information management and access
- Investigate alternative approaches (software choice, extensibility, applicability to your data, use by others)
- Prototype smaller projects and experiments
3. Centre for e-Research
- CeRch (http://www.kcl.ac.uk/iss/cerch) is:
  - An R&D department in Information Services and Systems (ISS) that performs:
    - Management and preservation of research outputs from KCL researchers in all disciplines
    - Research, teaching and consultancy on e-infrastructure, data curation and preservation, and others
  - Formerly the Arts and Humanities Data Service Executive:
    - Management and preservation of research outputs from UK researchers in arts and humanities
4. Context: Existing approach
- Formal, but manual, ingest procedures
- Bespoke repository for data management
  - Not scalable: code could not easily be reapplied to other projects
- Functional limitations
  - Preservation, provenance metadata
  - Limited delivery systems
  - Collection-level identifiers (mostly)
- Diverse, semi-structured data
5. Requirements
- Persistent identifiers down to the level of individual datastreams, accommodating compound content models
- Versioning of content and metadata
- Automated processing and user input
- Able to integrate specialised third-party tools (e.g. format conversion)
- Preservation metadata management
- Audit trail/provenance metadata
- Standard distribution methods for specific content types (Disseminators)
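The first two requirements can be illustrated with a minimal sketch. It assumes Fedora's `info:fedora/<PID>/<DSID>` URI form for datastream identifiers; the class names, the `example:1` PID and the `DSID.N` version numbering are hypothetical choices for illustration, not the KCL implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatastreamVersion:
    version_id: str      # illustrative numbering, e.g. "DC.0", "DC.1"
    created: datetime
    content: bytes


@dataclass
class Datastream:
    object_pid: str      # hypothetical PID of the parent object
    dsid: str            # datastream ID within that object
    versions: list[DatastreamVersion] = field(default_factory=list)

    @property
    def identifier(self) -> str:
        # Identifier that resolves down to the individual datastream.
        return f"info:fedora/{self.object_pid}/{self.dsid}"

    def add_version(self, content: bytes) -> DatastreamVersion:
        # Earlier versions are kept, never overwritten.
        version = DatastreamVersion(
            version_id=f"{self.dsid}.{len(self.versions)}",
            created=datetime.now(timezone.utc),
            content=content,
        )
        self.versions.append(version)
        return version

    def latest(self) -> DatastreamVersion:
        return self.versions[-1]


ds = Datastream(object_pid="example:1", dsid="DC")
ds.add_version(b"<dc>first draft</dc>")
ds.add_version(b"<dc>corrected</dc>")
print(ds.identifier, ds.latest().version_id)
```

Keeping every version in an append-only list also gives a simple basis for the audit trail requirement, since nothing is discarded on update.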
6. What do we use Fedora for?
- Digital repository
  - King's Research Archive: an institutional repository for open access research papers written by King's College London staff
  - Virtual Research Environment (VRE) supporting research management
  - EIDER Project: demonstrator for enhanced deposit and ingest
- Preservation services
  - SOAPI: an architecture for (partially) automating preservation and ingest workflows in digital repositories
  - SHERPA DP2: developing preservation services for content located in disparate locations
- Digitisation projects
  - Historical Hansard: digitisation project, scanning and markup of 50 years of debates from the Upper Chamber of the Northern Ireland Parliament, from 1921 to 1972
  - East London Theatre Archive: digitisation of 15,000 performing arts resources from East London theatres, from playbills and programmes to press cuttings and photographs
7. Capture: Ingest workflow
Activities performed during Ingest
8. Metadata (1): Descriptive
- Each project has specific descriptive metadata requirements
  - Scholarly Works Application Profile (SWAP): schema created for the institutional repository (IR)
  - Metadata Object Description Schema (MODS): ELTA and SHERPA DP2
  - MarcXML: SHERPA DP2
  - Simple DC (various)
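As a rough illustration of the simplest of these schemas, the sketch below builds a Simple DC record using the standard `oai_dc` container and Dublin Core element namespaces. The function name and the field values are placeholders, not taken from any of the projects above.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serialising.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_simple_dc(fields: dict[str, list[str]]) -> ET.Element:
    """Build an oai_dc:dc element; every DC element is optional and repeatable."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Placeholder values for illustration only.
record = build_simple_dc({
    "title": ["Example playbill (placeholder title)"],
    "creator": ["Example, Author"],
    "type": ["Image"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```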
9. Metadata (2): SWAP
10. Metadata (3): Preservation
- Preservation
  - PREMIS Object
  - PREMIS Event (forthcoming)
  - Generated by DROID, JHOVE and others
- Rights
  - Rights metadata
  - Provided by Sherpa-Romeo
11. Metadata (4): Preservation
Rights metadata provided by Sherpa Romeo
Technical metadata provided by JHOVE
12. Data Capture (1): King's research data
- Collection of King's research data
- Web interface for deposit
- Deposit via SWORD from desktop/web client
- Capture of metadata from Research Gateway, Web of Science and other sources
13. Data Capture (2): Archiving services
- SHERPA DP2 provides archiving and preservation services for varied software repositories and web resources
- Content providers supported
  - Repositories: Fedora, CDS Invenio, DSpace, EPrints, DigiTool
  - Websites: large dynamic sites (through Subversion), static sites
- Capture methods
  - OAI-PMH for metadata capture
  - Data capture over HTTP/FTP and VPN
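Metadata capture over OAI-PMH might look roughly like the sketch below. It builds a `ListRecords` request and parses a response, following the resumption-token paging defined by the protocol. The base URL, record identifier and sample response are invented for illustration, and no network call is made; this is not the SHERPA DP2 harvester itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], next_token_or_None) from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Placeholder title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records, token)
print(list_records_url("http://repository.example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each returned token until the response carries none.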
14. Digitisation (1): East London Theatre Archive
- 15,000 digital objects: playbills, programmes, press cuttings and photographs
- Object model representing 2 layers
  - Performance venue
  - Item (3 manifestations of each image: high-quality, distribution, thumbnail)
  - Each will contain MODS metadata
- Accessible through browse, search and a Google Maps-style UI
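The two-layer object model can be sketched as a pair of data classes. All names, PIDs and file names below are hypothetical, and the MODS record is reduced to a single title string for brevity.

```python
from dataclasses import dataclass, field

# The three image manifestations named on the slide.
MANIFESTATIONS = ("high-quality", "distribution", "thumbnail")


@dataclass
class Item:
    pid: str
    mods_title: str                                        # stand-in for a full MODS record
    images: dict[str, str] = field(default_factory=dict)   # manifestation -> file name

    def missing_manifestations(self) -> list[str]:
        # Useful as an ingest check: every item should carry all three.
        return [m for m in MANIFESTATIONS if m not in self.images]


@dataclass
class Venue:
    pid: str
    mods_title: str                                        # stand-in for a full MODS record
    items: list[Item] = field(default_factory=list)


# Hypothetical identifiers and file names.
playbill = Item(
    pid="example:item-1",
    mods_title="Placeholder playbill",
    images={"high-quality": "item-1.tif", "thumbnail": "item-1-thumb.jpg"},
)
venue = Venue(pid="example:venue-1", mods_title="Placeholder theatre", items=[playbill])
print(playbill.missing_manifestations())
```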
15. Digitisation (2): Historical Hansard
- 50 years of debates from the Upper Chamber of the Northern Ireland Parliament, from 1921 to 1972
- Separated into collection and volume
- 45,100 items containing
  - Page images (3 manifestations of each image: high-quality, distribution, thumbnail)
  - OCR'd text stored as XML
  - Relationship metadata
- UI: experiment with Fez, Muradora, Vital, existing Stormont
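One way to express the collection, volume and item relationships is as subject-predicate-object triples in Fedora's external-relations namespace. The slides do not say which predicates the project used, so `isMemberOfCollection` and `isMemberOf` are assumptions here, and the PIDs are invented.

```python
# Namespace of Fedora's external-relations ontology.
RELS_EXT = "info:fedora/fedora-system:def/relations-external#"


def fedora_uri(pid: str) -> str:
    return f"info:fedora/{pid}"


def membership_triples(collection_pid, volumes):
    """volumes maps a volume PID to the list of page-item PIDs it contains."""
    triples = []
    for volume_pid, page_pids in volumes.items():
        # Assumed predicate: each volume belongs to the collection.
        triples.append((
            fedora_uri(volume_pid),
            RELS_EXT + "isMemberOfCollection",
            fedora_uri(collection_pid),
        ))
        for page_pid in page_pids:
            # Assumed predicate: each page item belongs to its volume.
            triples.append((
                fedora_uri(page_pid),
                RELS_EXT + "isMemberOf",
                fedora_uri(volume_pid),
            ))
    return triples


# Invented PIDs for illustration.
triples = membership_triples(
    "example:hansard",
    {"example:vol-1": ["example:page-1", "example:page-2"]},
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```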
16. Lessons we have learnt
- Understand your needs
  - No one-size-fits-all approach
  - Match requirements to functionality, not vice versa
- Implementation of a Fedora repository requires time
  - No out-of-box solution, though likely to change in the near future
  - Consider a long-term development plan. Some customisation may be required
- Consider future expansion plans
  - Where do you want to be tomorrow?
- Don't be intimidated
  - Lots of features, but you don't need to use them all
  - Possible to break implementation into well-defined stages
- Avoid reinventing the wheel
  - Examine existing Fedora projects that may save development time
  - Develop code that can be repurposed for other projects
17. Thank you!
- Gareth Knight
- Centre for e-Research
- gareth.knight_at_kcl.ac.uk