1
DSpace 2.x Architecture Roadmap
  • Robert Tansley
  • DSpace Technical Lead, HP

2
Overview
  • Why a DSpace 2.x?
  • Proposed Target Architecture
  • Example Deployments
  • Proposed Migration Path

3
Why a DSpace 2.x?
4
DSpace 1.x
  • Breadth-first implementation of institutional
    repository
  • Provides all required functionality to start
    capturing digital assets
  • Widened awareness and understanding of digital
    preservation problem

5
Key areas for improvement
  • Modularity
  • Digital Preservation
  • Scalability

6
Modularity
  • Current APIs are low-level, somewhat ad-hoc
  • Difficult to keep stable
  • Difficult to implement enhanced/alternative
    functionality behind them
  • Changing a particular aspect of functionality
    involves changing UI as well as underlying
    business logic module
  • e.g. Workflow review pages are very specific to
    the current Workflow Manager module's
    functionality

7
Modularity
  • Heavy inter-dependence
  • e.g. modules share the same DB tables, so a
    change in one module means changing others that
    use those tables
  • No real plug in mechanism
  • Managing a modification alongside evolving core
    DSpace code can be tricky

8
DSpace 1 series architecture
9
Making a change
10
Proposed new modular approach
  • Modules provide own UI
  • Modules do not directly share data, e.g. DB
    tables
  • Inter-module communication via defined APIs
  • Many modules then don't need APIs, e.g. browse UI

11
Proposed new modular approach
  • UIs glued together by UI framework
  • Framework provides navigation tools, look and
    feel, internationalisation, localisation

12
Proposed new modular approach
  • Modules can depend on APIs

13
Proposed new modular approach
  • Modules can implement two APIs
  • E.g. LDAP integration module could implement
    E-person API and authorisation API
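
As a sketch of what "one module, two APIs" could look like in Java. The interface and method names here are illustrative assumptions, not the actual DSpace APIs, and an in-memory map stands in for the LDAP directory:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical APIs; a real LDAP module would delegate both
// calls to the directory server rather than to a local map.
interface EPersonAPI {
    boolean userExists(String username);
}

interface AuthorisationAPI {
    boolean isAuthorised(String username, String action);
}

// One module implementing both APIs, so a single directory
// lookup can answer "who is this user?" and "what may they do?"
class LdapModule implements EPersonAPI, AuthorisationAPI {
    // Stand-in for an LDAP directory: username -> permitted actions.
    private final Map<String, Set<String>> directory = new HashMap<>();

    void addUser(String username, Set<String> allowedActions) {
        directory.put(username, allowedActions);
    }

    public boolean userExists(String username) {
        return directory.containsKey(username);
    }

    public boolean isAuthorised(String username, String action) {
        return directory.getOrDefault(username, Set.of()).contains(action);
    }
}
```

A module depending only on the E-person API or only on the authorisation API would see the same object through whichever interface it needs.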

14
Digital preservation
  • Use of a relational database optimised for access
  • Metadata is separate from bitstreams
  • Database corruption would make the archive very
    difficult to reconstruct
  • Hard to extend metadata schema support
  • Custom schema is difficult for other apps to
    access

15
Scalability
  • Some limits on scalability in 1.x, e.g.
  • Browse code
  • Supports multiple file systems, but not ideal
  • Largely limited to a single server
  • Mirroring difficult
  • Metadata in database, bitstreams on file system
  • Extraction is non-trivial

16
Proposed approach
  • Refactor storage: the asset store
  • Metadata in standard format and bitstreams stored
    in the same place
  • AIP becomes a more tangible concept
  • Aids preservation: no reliance on particular
    software
  • Aids scalability: easier to manage storage and
    distribution
  • Easier to move around

17
Summary
18
Proposed Target Architecture
19
Target architecture overview
20
Asset store
21
Asset store
  • Corresponds to OAIS Archival Storage
  • Contains only Archival Information Packages
    (AIPs)
  • Not e-people records, in-progress submissions
    etc.
  • AIPs consist of
  • Metadata serialisation
  • Bitstreams
  • AIP checksum

22
Object model
23
Example AIP (item)
  • How it might look in a file system:
  • aip-identifier/
  • metadata.xml (current metadata serialisation)
  • 184BE84F293342 (bitstream 1; filename = checksum)
  • 3F9AD0389CB821 (bitstream 2)
  • 330F925A1D0386 (bitstream 3)
  • checksum (checksum of the AIP)
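
The bitstream filenames above are checksums of the bitstream content. A minimal Java sketch of deriving such a name; SHA-256 is an assumption here, since the deck does not name a digest algorithm:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Derive a bitstream's storage filename from its content
// checksum, as in the example AIP layout. Content-addressed
// names make corruption detectable: re-hashing the bytes must
// reproduce the filename.
class BitstreamName {
    static String checksumName(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to exist on every JVM
            throw new IllegalStateException(e);
        }
    }
}
```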

24
Asset store API
25
Asset store API
  • Standardised Java API for DSpace asset stores
  • There may be different implementations:
  • Simple file system
  • Enterprise reference information store
  • Grid-based, e.g. SRB
  • SAN
  • Allows creation, retrieval, update etc. of AIPs
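
A hypothetical sketch of such an asset store API in Java. The interface and method names are assumptions for illustration, not the actual DSpace 2.x interface:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standardised asset store API.
interface AssetStore {
    void store(String aipId, byte[] aip);  // create or update an AIP
    byte[] retrieve(String aipId);         // fetch an AIP by identifier
    boolean exists(String aipId);
}

// Minimal in-memory implementation. File-system, SAN or
// SRB-backed stores would implement the same interface, so
// modules depending on the API need not care which backend
// is deployed.
class InMemoryAssetStore implements AssetStore {
    private final Map<String, byte[]> aips = new HashMap<>();
    public void store(String aipId, byte[] aip) { aips.put(aipId, aip); }
    public byte[] retrieve(String aipId) { return aips.get(aipId); }
    public boolean exists(String aipId) { return aips.containsKey(aipId); }
}
```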

26
Scaling up
  • Easy to replicate AIPs and asset stores
  • Enables serving larger numbers of users
  • Aids preservation: multiple copies, more robust

27
Scaling up
  • Two DSpaces can easily keep synchronised

28
Scaling up
  • Two DSpaces can easily keep synchronised
  • Something as simple as a periodic rsync can do
    the job
  • Exact mechanism would depend on asset store
  • File system, enterprise reference information
    store, SRB etc.
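
A toy Java illustration of the synchronisation idea, with asset stores modelled as maps from AIP identifier to serialised content. A real file-system store would more likely run rsync directly; names and structure here are assumptions:

```java
import java.util.Map;

// One-way synchronisation: copy to the mirror any AIP it lacks,
// leaving existing AIPs untouched (rsync-like behaviour).
class AssetStoreSync {
    static int sync(Map<String, byte[]> source, Map<String, byte[]> mirror) {
        int copied = 0;
        for (Map.Entry<String, byte[]> e : source.entrySet()) {
            if (!mirror.containsKey(e.getKey())) {
                mirror.put(e.getKey(), e.getValue());
                copied++;
            }
        }
        return copied;  // number of AIPs transferred this run
    }
}
```

Run periodically (e.g. nightly), each pass transfers only the AIPs deposited since the last pass.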

29
What about clashes?
  • We're dealing with reference information
  • DSpace is not an authoring system
  • Not work-in-progress, often-updated material
  • The same AIP being updated by two different DSpace
    instances on the same day is unlikely
  • Can flag as a conflict for manual resolution
  • Exception: items being added to the same
    collection
  • Simple to resolve: merge the additions
  • Just make sure IDs are unique!
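
The collection-addition case can be sketched as a simple set union, assuming globally unique item identifiers. Names here are illustrative:

```java
import java.util.Set;
import java.util.TreeSet;

// Two instances each added items to the same collection.
// With globally unique item identifiers, resolution is just
// a union of the two membership lists.
class CollectionMerge {
    static Set<String> merge(Set<String> ours, Set<String> theirs) {
        Set<String> merged = new TreeSet<>(ours);
        merged.addAll(theirs);  // duplicates collapse; nothing is lost
        return merged;
    }
}
```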

30
What about search indices?
  • Modules may maintain indices or caches of
    information from AIPs in the asset store
  • E.g. the browse UI, Lucene index
  • Modules keep indices or caches up-to-date by
    periodically polling asset store API
  • Similar to incremental harvesting in OAI-PMH
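
The polling scheme could be sketched as follows, assuming each AIP carries a last-modified timestamp. Class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal view of an AIP as seen by an index module.
class AipRecord {
    final String id;
    final long modified;  // when the AIP last changed (epoch millis)
    AipRecord(String id, long modified) { this.id = id; this.modified = modified; }
}

// OAI-PMH-style incremental harvesting: each poll returns only
// the AIPs changed since the previous poll, then advances a
// watermark so work is never repeated.
class IndexPoller {
    private long watermark = 0;  // newest modification time seen so far

    List<String> poll(List<AipRecord> store) {
        List<String> changed = new ArrayList<>();
        long newWatermark = watermark;
        for (AipRecord r : store) {
            if (r.modified > watermark) {
                changed.add(r.id);
                newWatermark = Math.max(newWatermark, r.modified);
            }
        }
        watermark = newWatermark;
        return changed;
    }
}
```

Each index or cache keeps its own watermark, so the browse UI and the Lucene index can poll on independent schedules.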

31
Why the polling approach?
  • Polling is simpler to implement than real-time
    notification
  • Implementing a custom asset store is easier
  • More scalable: can control when indexing occurs
  • A big sync might mean several indices updating at
    once
  • End-users might not see deposits appear in the
    search/browse indices immediately. However:
  • This doesn't happen immediately anyway if any
    workflow review is needed
  • Needn't take more than overnight to happen
  • Reference information is not time-critical data

32
DSpace modular architecture
  • Some modules have APIs; some do not
  • Modules may have dependencies
  • e.g. module X depends on an implementation of
    API Y
  • Modules may use RDBMS but do not share tables

33
UI framework
34
UI framework
  • Glues together UIs of different modules
  • Provides navigation tools, stylesheets, skin
  • Internationalisation, localisation
  • User authentication
  • Cocoon provides most of the above functionality
  • Easy to add the rest

35
Exposing services via Tomcat, e.g. OAI-PMH
36
Core DSpace modules and APIs
37
Content management API
  • Similar to existing org.dspace.content API
  • Provides procedural way to manipulate AIPs
  • Implementation may cache some information in
    RDBMS
  • E.g. Community/collection/item structure

38
Extending metadata
  • Pull out pieces of search UI, submit UI, item
    display related to Dublin Core into a separate
    module
  • Allow other similar modules for dealing with
    other schemas and extensions
  • Start with simple property/value support
  • SIMILE will provide richer functionality

39
Security
  • Similar to DSpace 1.x
  • Modules running within a DSpace instance are
    trusted
  • Not worrying about malicious code for now
  • Modules, UI framework responsible for
    authenticating end-user as an e-person
  • Modules, asset store implementation must invoke
    authorisation API as appropriate

40
Summary
  • Refactor storage: content in AIPs (metadata +
    bitstreams)
  • Easier to share/mirror AIPs with periodic
    synchronisation
  • Modules do OAI-PMH-style incremental harvests to
    keep indices/caches up to date
  • Benefit: increased scalability and preservability
  • Cost: new/changed AIPs aren't instantly indexed
  • Often not the case anyway (workflow reviews)
  • Reference information is not time-critical

41
Summary
  • Modular architecture
  • Modules responsible for own UI and data
  • Modules inter-communicate via defined APIs
  • UI framework provides Web UI glue (Cocoon)
  • Dependency mechanism to allow plug-in
    functionality
  • Benefit: vastly improved modularity
  • Essential for our diverse community of users
  • Cost: implementing modules might take more effort
  • Unavoidable but manageable price of modularity
  • Different from the current approach, so migration
    is non-trivial
  • Those who haven't changed DSpace 1 much will have
    an easy upgrade path
  • Does anyone really like servlets/JSPs?

42
Example Deployments
43
Standard deployment
44
Web services module
45
LDAP-based e-people and authorisation
46
Mirrored asset store
47
Shared asset store
48
Separate ingest and access instances
49
DSpace on SRB
50
SIMILE
51
Proposed Migration Path
52
Stage 1: Build asset store
  • Decide on AIP metadata serialisation
  • Build asset store
  • Integrate asset store w/DSpace 1.x
  • Either build a synchronisation tool, or
  • Replace the CM API (org.dspace.content), which is
    trickier

53
Stage 2: Build 2.0
  • Design and build modular infrastructure
    (dependencies etc.)
  • Define the APIs
  • Port/implement 1.x functionality
  • Release this as 2.0
  • Institutions can port their code to the 2.0
    architecture, and swap over

54
Stage 3: 2.x and beyond
  • DSpace 2.1
  • Authorisation policy expression in AIPs
  • XQuery API
  • DSpace 2.2
  • Federation
  • DSpace 2.3
  • Integrate SIMILE components
