Developing a digital repository infrastructure for King's College London: RSP Training Day, 22nd January 2009

1
Developing a digital repository infrastructure
for King's College London
RSP Training Day, 22nd January 2009
  • Gareth Knight
  • Centre for e-Research

2
Approach
  • Analyse existing practices and limitations of the
    current system
  • Establish requirements for information management
    and access
  • Investigate alternative approaches (software
    choice, extensibility, applicability to your
    data, use by others)
  • Prototype smaller projects and experiments

3
Centre for e-Research
  • CeRch (http://www.kcl.ac.uk/iss/cerch) is an R&D
    department in Information Services and Systems
    (ISS) that performs:
      • Management and preservation of research outputs
        from KCL researchers in all disciplines
      • Research, teaching and consultancy on
        e-infrastructure, data curation, preservation
        and other areas
  • Formerly the Arts and Humanities Data Service
    Executive:
      • Management and preservation of research outputs
        from UK researchers in arts and humanities

4
Context: Existing approach
  • Formal, but manual ingest procedures
  • Bespoke repository for data management
  • Not scalable: code could not easily be
    reapplied to other projects
  • Functional limitations
  • Preservation, provenance metadata
  • Limited delivery systems
  • Collection-level identifiers (mostly)
  • Diverse, semi-structured data

5
Requirements
  • Persistent identifiers down to the level of
    individual datastreams, accommodating compound
    content models
  • Versioning of content and metadata
  • Automated processing and user input
  • Able to integrate specialised third-party tools
    (e.g. format conversion)
  • Preservation metadata management
  • Audit trail/provenance metadata
  • Standard distribution methods for specific
    content types (Disseminators)
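
The first, second and sixth requirements above (identifiers at datastream level, versioning, and an audit trail) can be illustrated with a minimal sketch. This is not Fedora's API or the CeRch implementation; the class names, the `object/datastream` identifier pattern and the example agent names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int
    content: bytes
    created: datetime


@dataclass
class Datastream:
    object_id: str
    name: str
    versions: list = field(default_factory=list)

    @property
    def identifier(self) -> str:
        # Hypothetical scheme: object identifier plus datastream name.
        return f"{self.object_id}/{self.name}"

    def add_version(self, content: bytes) -> Version:
        # Earlier versions are kept; a new one is appended.
        version = Version(len(self.versions) + 1, content,
                          datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def latest(self) -> Version:
        return self.versions[-1]


@dataclass
class DigitalObject:
    identifier: str
    datastreams: dict = field(default_factory=dict)
    audit_trail: list = field(default_factory=list)

    def put(self, name: str, content: bytes, agent: str) -> Version:
        # Create the datastream on first use, then record who changed what.
        ds = self.datastreams.setdefault(name, Datastream(self.identifier, name))
        version = ds.add_version(content)
        self.audit_trail.append(
            (version.created, agent, f"{ds.identifier} v{version.number}"))
        return version


obj = DigitalObject("example:1")
obj.put("MODS", b"<mods/>", agent="ingest-script")
obj.put("MODS", b"<mods>revised</mods>", agent="editor")
```

Both the content and the metadata of an object would be handled as datastreams in this sketch, so the same versioning and audit mechanism covers each.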

6
What do we use Fedora for?
  • Digital repository
      • King's Research Archive: an institutional
        repository for open access research papers
        written by King's College London staff
      • Virtual Research Environment (VRE) supporting
        research management
      • EIDER Project: demonstrator for enhanced deposit
        and ingest
  • Preservation Services
      • SOAPI: an architecture for (partially)
        automating preservation and ingest workflows in
        digital repositories
      • SHERPA DP2: developing preservation services for
        content held in disparate locations
  • Digitisation projects
      • Historical Hansard: digitisation project,
        scanning and marking up 50 years of debates from
        the Upper Chamber of the Northern Ireland
        Parliament from 1921 to 1972
      • East London Theatre Archive: digitisation of
        15,000 performing arts resources, from playbills
        and programmes to press cuttings and photographs
        from East London theatres

7
Capture and Ingest workflow
Activities performed during Ingest
8
Metadata (1): Descriptive
  • Each project has specific descriptive MD
    requirements
  • Scholarly Works Application Profile (SWAP):
    schema created for the IR
  • Metadata Object Description Schema (MODS): ELTA
    and SHERPA DP2
  • MARCXML: SHERPA DP2
  • Simple DC (various)

9
Metadata (2): SWAP
10
Metadata (3): Preservation
  • Preservation
      • PREMIS Object
      • PREMIS Event (forthcoming)
      • Generated by DROID, JHOVE and others
  • Rights
      • Rights MD
      • Provided by SHERPA/RoMEO
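
As a rough illustration of what a PREMIS Object record carries, the sketch below assembles a simplified, PREMIS-style description of one file. The element names follow PREMIS semantic units, but the output has no namespace and omits mandatory elements, so it is not schema-valid PREMIS. The function name, the `local` identifier type and the sample values are assumptions; in the workflow described here the format information would come from DROID or JHOVE rather than being passed in by hand.

```python
import hashlib
import xml.etree.ElementTree as ET


def premis_like_object(identifier: str, data: bytes, format_name: str) -> ET.Element:
    """Build a simplified PREMIS-style object description (not schema-valid)."""
    obj = ET.Element("object")

    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "local"  # assumed type
    ET.SubElement(ident, "objectIdentifierValue").text = identifier

    chars = ET.SubElement(obj, "objectCharacteristics")

    # Fixity: a checksum that lets later audits detect corruption.
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha256(data).hexdigest()

    ET.SubElement(chars, "size").text = str(len(data))

    # Format: supplied by the caller here; normally identified by a tool.
    fmt = ET.SubElement(chars, "format")
    designation = ET.SubElement(fmt, "formatDesignation")
    ET.SubElement(designation, "formatName").text = format_name

    return obj


record = premis_like_object("example:1/PDF", b"hello", "application/pdf")
print(ET.tostring(record, encoding="unicode"))
```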

11
Metadata (4): Preservation
Rights metadata provided by SHERPA/RoMEO
Technical metadata provided by JHOVE
12
Data Capture (1): King's research data
  • Collection of King's research data
  • Web interface for deposit
  • Deposit via SWORD from desktop/web client
  • Capture of metadata from Research Gateway, Web of
    Science and other sources.

13
Data Capture (2): Archiving services
  • SHERPA DP2 provides archiving and preservation
    services for repositories running a variety of
    software, and for web resources
  • Content providers supported
      • Repositories: Fedora, CDS Invenio, DSpace,
        EPrints, DigiTool
      • Websites: large dynamic sites (through
        Subversion), static sites
  • Capture methods
      • OAI-PMH for metadata capture
      • Data capture over HTTP/FTP and VPN
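
Metadata capture over OAI-PMH can be sketched as follows. This is not the SHERPA DP2 harvester: the function names, the base URL and the sample response are assumptions. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH 2.0 protocol. The sketch only builds request URLs and parses a response held in a string, so it runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return identifiers, token


# Hypothetical response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://repo.example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until a response arrives without one, then fetch the data files themselves over HTTP or FTP.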

14
Digitisation (1): East London Theatre Archive
  • 15,000 digital objects: playbills, programmes,
    press cuttings and photographs
  • Object model representing 2 layers
      • Performance venue
      • Item (3 manifestations of each image:
        high-quality, distribution, thumbnail)
  • Each will contain MODS metadata
  • Accessible through browse, search and a Google
    Maps-style UI
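
The two-layer object model can be pictured with a small sketch. It is an illustration only, not the ELTA content model: the class names, identifiers, file names and placeholder MODS strings are assumptions. What it takes from the slide is the venue and item layers, the three manifestations per image, and MODS metadata on each object.

```python
from dataclasses import dataclass, field

MANIFESTATIONS = ("high-quality", "distribution", "thumbnail")


@dataclass
class Item:
    identifier: str
    mods_xml: str
    # Maps a manifestation name to the file that holds it.
    manifestations: dict = field(default_factory=dict)

    def missing_manifestations(self):
        """List the expected manifestations that have not been supplied."""
        return [m for m in MANIFESTATIONS if m not in self.manifestations]


@dataclass
class Venue:
    identifier: str
    mods_xml: str
    items: list = field(default_factory=list)

    def incomplete_items(self):
        """Items that lack at least one of the three manifestations."""
        return [i for i in self.items if i.missing_manifestations()]


venue = Venue("example:venue-1", "<mods/>")
venue.items.append(Item("example:item-1", "<mods/>", {
    "high-quality": "item-1.tif",
    "distribution": "item-1.jpg",
    "thumbnail": "item-1-thumb.jpg",
}))
venue.items.append(Item("example:item-2", "<mods/>", {
    "high-quality": "item-2.tif",
}))
```

A check like `incomplete_items()` is the kind of validation that could run at ingest, before an object is accepted into the repository.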

15
Digitisation (2): Historical Hansard
  • 50 years of debates from the Upper Chamber of the
    Northern Ireland Parliament from 1921 to 1972
  • Separated into collection and volume
  • 45,100 items containing:
      • Page images (3 manifestations of each image:
        high-quality, distribution, thumbnail)
      • OCR'd text stored as XML
      • Relationship MD
  • UI: experiment with Fez, Muradora, Vital,
    existing Stormont

16
Lessons we have learnt
  • Understand your needs
      • No one-size-fits-all approach
      • Match requirements to functionality, not vice
        versa
  • Implementation of a Fedora repository requires
    time
      • No out-of-box solution, though likely to change
        in the near future
      • Consider a long-term development plan. Some
        customisation may be required
  • Consider future expansion plans
      • Where do you want to be tomorrow?
  • Don't be intimidated
      • Lots of features, but you don't need to use
        them all
      • Possible to break implementation into
        well-defined stages
  • Avoid reinventing the wheel
      • Examine existing Fedora projects that may save
        development time
      • Develop code that can be repurposed for other
        projects

17
Thank you!
  • Gareth Knight
  • Centre for e-Research
  • gareth.knight_at_kcl.ac.uk