DSpace SRB Integration

1
DSpace / SRB Integration
  • Luc Declerck and Chris Frymann
  • University of California, San Diego Libraries
  • CNI Fall Task Force Meeting
  • Portland, Oregon
  • December 6-7, 2004

2
This Presentation Will
  • Provide a brief overview of DSpace and SRB
  • Report on a project to integrate the two systems

3
Success of the Web
  • Much of the value and success of the Web is a
    result of its:
      • Ease of use
      • Enormous size
  • Simplistically, this has been achieved through:
      • A simple user interface
      • Transparent access to distributed storage

4
Simple Interface
Transparent Access to Distributed Storage
Web Browser
Web Server

DSpace
SRB

5
Project Participants
  • San Diego Supercomputer Center (SDSC)
      • Member of the National Partnership for Advanced
        Computational Infrastructure (NPACI), an
        NSF-sponsored program
  • MIT Libraries (MIT)
  • UC San Diego Libraries (UCSD)
  • National Archives and Records Administration
    (NARA)

6
Main Project Goal
  • Demonstrate that integration of DSpace with SRB
    leads to improved functionality for both systems

7
DSpace
  • Jointly developed by:
      • MIT Libraries
      • Hewlett-Packard (HP)

8
DSpace Familiar As
  • Simple, user-friendly front end providing:
      • Digital content ingestion
      • Indexing, search and discovery
      • Content management
      • Dissemination services

9
Federation Services
SRB
10
DSpace 2.0 Discussion and Planning
  • AssetStore API
  • AIP Model
  • Modularity Mechanism
  • UI Framework
  • http://simile.mit.edu/dspace-wiki/DspaceTwo

11
DSpace Availability
  • To any type of organization as:
      • Free, open-source software
      • That can be customized and extended
  • From:
      • http://sourceforge.net/projects/dspace/

12
SRB
  • Storage Resource Broker
  • Developed at San Diego Supercomputer Center

13
SRB
  • Server software and programming interfaces
  • Allows applications that store and retrieve files
    to treat a diverse collection of physical storage
    devices as a single logical resource
  • Utilizes data grid and federation technologies
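
The idea of treating diverse devices as a single logical resource can be sketched in a few lines of Python. This is only an illustration of the concept, not SRB code: `Device`, `LogicalResource`, and the "most free space" placement rule are all hypothetical, and the devices are in-memory stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one physical storage device (in memory)."""
    name: str
    capacity: int                                # bytes the device can hold
    files: dict = field(default_factory=dict)    # logical name -> bytes

    def free(self) -> int:
        return self.capacity - sum(len(d) for d in self.files.values())


class LogicalResource:
    """Presents many devices to the application as one store."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.location = {}                       # logical name -> Device

    def put(self, logical_name: str, data: bytes) -> None:
        if logical_name in self.location:
            raise FileExistsError(logical_name)
        # Assumed placement rule: use the device with the most free space.
        target = max(self.devices, key=Device.free)
        if target.free() < len(data):
            raise OSError(f"no device has room for {logical_name}")
        target.files[logical_name] = data
        self.location[logical_name] = target

    def get(self, logical_name: str) -> bytes:
        # The caller never needs to know which device holds the bytes.
        return self.location[logical_name].files[logical_name]

    def capacity(self) -> int:
        return sum(d.capacity for d in self.devices)
```

An application would call only `put`, `get`, and `capacity`; which device actually holds a file stays hidden behind the logical name.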

14

[Diagram: Applications (DSpace, Fedora, CDL DPR, Other) connect through SRB (Broker) to Storage]
15
Basic Storage Resource
200 GB Disk Drive
16
Storage Resource
10 drives x 0.2 TB = 2 Terabytes/box
Rackmount Storage Server
SRB lets us treat it as a single logical resource
17
Single Logical Resource: 12 TB
Rack of Storage Servers: Server 1 through Server 6
18
Single Logical Resource: 50 TB
Room of Racks: 4 racks of 12 TB each
19
200 TB Single Logical Resource
Applications
SRB
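
The capacity figures on slides 15 through 18 can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and takes the drive, server, and rack counts from those slides. Four 12 TB racks come to 48 TB, so the 50 TB on slide 18 appears to be a rounded figure. The slides give no breakdown for the 200 TB total, so it is not computed here.

```python
# Capacities in gigabytes, assuming decimal units (1 TB = 1000 GB).
DRIVE_GB = 200            # slide 15: one disk drive
DRIVES_PER_SERVER = 10    # slide 16: drives per rackmount server
SERVERS_PER_RACK = 6      # slide 17: servers per rack
RACKS_PER_ROOM = 4        # slide 18: racks per room

server_gb = DRIVE_GB * DRIVES_PER_SERVER    # one rackmount storage server
rack_gb = server_gb * SERVERS_PER_RACK      # one rack of servers
room_gb = rack_gb * RACKS_PER_ROOM          # one room of racks

for label, gb in [("server", server_gb), ("rack", rack_gb), ("room", room_gb)]:
    print(f"{label}: {gb / 1000:g} TB")
```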
20
What SRB Does
  • Connects, replicates, syncs, and archives
    heterogeneous resources in a logical manner,
    using abstraction mechanisms
  • Provides a way to access files and computers
    based on their attributes rather than just their
    names or physical locations

21
SDSC Storage Resource Broker and Meta-data Catalog
[Architecture diagram, top to bottom]
Application: InQ, MySRB, DSpace
Access APIs: Linux I/O, OAI, WSDL, DLL / Python, Java, NT Browsers, GridFTP
Federation, Consistency Management / Authorization-Authentication
SRB Server: Logical Name Space, Latency Management, Data Transport, Metadata Transport
Storage Abstraction, Catalog Abstraction
Databases: DB2, Oracle, Sybase, SQLServer
Drivers
HRM
22
SRB Data Grid Abstractions
  • Logical name space for files
      • Global persistent identifier
  • Storage repository abstraction
      • Standard operations supported on storage systems
  • Information repository abstraction
      • Standard operations to manage collections in
        databases
  • Access abstraction
      • Standard interface to support alternate APIs
  • Latency management mechanisms
      • Aggregation, parallel I/O, replication, caching
  • Security interoperability
      • GSSAPI, inter-realm authentication,
        collection-based authorization

23
Data Grids and Federation
  • Data grids provide the ability to name, organize,
    and manage data on distributed storage resources
  • Federation provides a way to name, organize, and
    manage data on multiple data grids.

24
Federated SRB Server Model
[Diagram with numbered steps 1 through 6. Labels: Read Application; Logical Name Or Attribute Condition; two SRB servers (Peer-to-peer Brokering), each with an SRB agent (Server(s) Spawning); MCAT; Parallel Data Access; Data Access to resources R1 and R2]
MCAT:
1. Logical-to-Physical mapping
2. Identification of Replicas
3. Access and Audit Control
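
The catalog roles named on this slide (logical-to-physical mapping, identification of replicas, and an access audit trail) can be illustrated with a toy in-memory catalog. This is a hedged sketch of the idea only. `Entry`, `Catalog`, and the replica strings are hypothetical and do not reflect the MCAT schema or the SRB API, and access control is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalog record: a logical name, its attributes, its replicas."""
    logical_name: str
    attributes: dict
    replicas: list          # physical locations, e.g. "R1:/vol3/0001.tif"


@dataclass
class Catalog:
    """Hypothetical stand-in for a metadata catalog such as MCAT."""
    entries: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, entry: Entry) -> None:
        self.entries[entry.logical_name] = entry

    def resolve(self, user: str, logical_name: str) -> list:
        """Logical-to-physical mapping: every replica of one object."""
        self.audit_log.append((user, "resolve", logical_name))
        return list(self.entries[logical_name].replicas)

    def find(self, user: str, **conditions) -> list:
        """Attribute condition: logical names whose attributes all match."""
        self.audit_log.append((user, "find", dict(conditions)))
        return sorted(
            name for name, entry in self.entries.items()
            if all(entry.attributes.get(k) == v for k, v in conditions.items())
        )
```

A reader application would call `resolve` with a logical name, or `find` with an attribute condition, and could then fetch the data from any of the returned replicas.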
25
SRB Availability
  • SRB source distributed to academic and research
    institutions
  • Commercial use access through UCSD Technology
    Transfer Office
      • William Decker, WJDecker_at_ucsd.edu
  • Commercial version from:
      • http://www.nirvanastorage.com

26
SRB Info Resources
  • SRB Homepage
      • http://www.npaci.edu/DICE/SRB/

27
The Project
28
Main Goal
  • Extension of DSpace storage capability
  • Use SRB as filestore for DSpace bitstreams

29
DSpace: Simple User Interface
  Content Ingestion, Discovery, Dissemination
SRB: Unlimited Storage
  Uniform interface to storage: Distributed, Heterogeneous
30
Implementation Steps
  • Replace DSpace file system calls with SRB access
    calls
  • Employ METS-based Archival Information Package
    (AIP)
  • Enable exchange of data and metadata between
    independent DSpace and SRB systems
  • Validate authenticity of exchanged content
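
The first and last steps above (swapping file system calls for broker calls, and validating content) can be sketched as a small storage interface plus a checksum check. Everything here is an assumption made for illustration: `BitstreamStore`, `LocalFileStore`, `BrokerStore`, `ingest`, and `retrieve_verified` are hypothetical names, the broker back end is a plain dictionary standing in for real SRB client calls, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class BitstreamStore(Protocol):
    """The only storage operations the application code may call."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class LocalFileStore:
    """Back end that uses ordinary file system calls."""
    def __init__(self, root: Path):
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class BrokerStore:
    """Placeholder for a broker-backed store. A dict stands in for the
    remote calls; a real version would use the SRB client API."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def ingest(store: BitstreamStore, key: str, data: bytes) -> str:
    """Store the bytes and return a checksum to record with the metadata."""
    store.put(key, data)
    return hashlib.sha256(data).hexdigest()


def retrieve_verified(store: BitstreamStore, key: str, expected: str) -> bytes:
    """Fetch the bytes; raise if they no longer match the checksum."""
    data = store.get(key)
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"checksum mismatch for {key}")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (LocalFileStore(Path(tmp)), BrokerStore()):
            digest = ingest(store, "bitstream-1", b"example content")
            print(type(store).__name__,
                  retrieve_verified(store, "bitstream-1", digest))
```

Because the application depends only on `put` and `get`, switching back ends does not touch the ingest or retrieval code, which is the point of replacing direct file system calls.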

31
UCSD Libraries To Provide Test Collections
  • Still Images
      • Over 200,000 Digitized Slides
      • Approximately 4 Terabytes
  • Moving Images
      • California movie newsreel footage
      • Size of collection to be determined

32
Testing Will Explore
  • Management of terabyte scale collections
  • Automating aspects of archival workflow
  • Integration of METS
  • Automated verification and validation checking

33
Schedule and Deliverables
  • Year 1 - Develop Prototype
      • Develop functional requirements
      • Specify standard interfaces and METS profiles
      • Prototype implementation of specified design
      • Ingest data, evaluate functionality and
        performance

34
Schedule and Deliverables
  • Year 2 - Implementations
      • Federation with additional systems, possibly:
          • CDL
          • OCLC
          • Fedora
      • Scalability testing
      • Ingestion of more content types

35
Progress So Far
  • Use Cases
  • Data Model (AIP)
  • Project timeline
  • Data preparation
  • Batch ingestion of files into SRB
  • METS Profile development
  • Code integration
  • Single item ingest and retrieval
  • Batch registration of existing SRB resources into
    DSpace

36
Data Model
  • Paired Content and Metadata Files
  • Metadata encoded in standard METS profiles
  • Stand-alone METS files used to describe arbitrary
    levels of aggregation of lower level objects
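
As a rough picture of a content file paired with a METS metadata file, the sketch below builds a minimal METS document with Python's standard library. It is not the project's METS profile, which the slides do not spell out. The element and attribute names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) come from the METS schema, while the `build_mets` function, the `FILE1` identifier, the `USE` and `TYPE` values, and the SHA-256 checksum are illustrative assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_mets(object_id: str, filename: str, content: bytes, mimetype: str) -> bytes:
    """Return a small METS document describing one content file."""
    root = ET.Element(f"{{{METS}}}mets", {"OBJID": object_id})

    # fileSec: inventory of content files, with a checksum for later checks.
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": "content"})
    file_el = ET.SubElement(file_grp, f"{{{METS}}}file", {
        "ID": "FILE1",
        "MIMETYPE": mimetype,
        "SIZE": str(len(content)),
        "CHECKSUM": hashlib.sha256(content).hexdigest(),
        "CHECKSUMTYPE": "SHA-256",
    })
    ET.SubElement(file_el, f"{{{METS}}}FLocat", {
        "LOCTYPE": "URL",
        f"{{{XLINK}}}href": filename,
    })

    # structMap: a single division that points at the file above.
    struct_map = ET.SubElement(root, f"{{{METS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS}}}div", {"TYPE": "item"})
    ET.SubElement(div, f"{{{METS}}}fptr", {"FILEID": "FILE1"})

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_mets("demo-1", "slide0001.tif", b"fake image bytes",
                     "image/tiff").decode("utf-8"))
```

A stand-alone METS file for a higher level of aggregation could follow the same pattern, with nested `div` elements in the `structMap` pointing at the lower-level objects.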

37
(No Transcript)
38
DSpace/SRB Integration Paths
  • Single Item Ingest into DSpace/SRB
  • Batch Registration into DSpace/SRB
  • Batch Ingestion into SRB
[Diagram labels: DSpace; Content File Metadata; DB; Content Files; DSpace Batch Import Utility; QDC; Future; SRB; MCAT; METS Metadata; Content Files; Storage Layer; Content Files and Metadata]
39
Federation
UCSD: DSpace, SRB
CDL: DSpace, SRB
MIT: DSpace, SRB
41
Peer-to-Peer Federation
  • Occasional Interchange - for specified users
  • Replicated Catalogs - entire state information
    replication
  • Resource Interaction - data replication
  • Replicated Data Zones - no user interactions
    between zones
  • Master-Slave Zones - slaves replicate data from
    master zone
  • Snow-Flake Zones - hierarchy of data
    replication zones
  • User / Data Replica Zones - user access from
    remote to host zone
  • Nomadic Zones - synchronize local zone to
    parent zone
  • Free-floating myZone - synchronize without a
    parent zone
  • Archival BackUp Zone - synchronize to an archive

42
Principal peer-to-peer federation approaches
43
Future Plans / Challenges
  • Planning intersection with DSpace 2.0 evolution
  • Extension of Preservation Architecture (AIP)
  • Naming (implementation of ARKs)
  • Exploring alternative modes of control
      • DSpace
      • SRB APIs
      • Federated zone configuration
  • Deeper integration of DSpace and SRB metadata
    databases
  • Explore life-cycle management of integrated
    resources
  • Explore relationship to CDL DPR
  • Handling of compound documents

44
Conclusion: Expected Results / Benefits
  • DSpace users
      • Federated collection management through
        distributed grid technology
      • Exchange of METS encoded collections
  • SRB users
      • User friendly ingest mechanism
      • Extended life-cycle management
      • Exchange of METS encoded collections

45
Simple User Interface
Simple Access to Unlimited Storage
DSpace
SRB

46
Q & A
  • Project website
      • http://libnet.ucsd.edu/nara