Exploring Scalable Storage: DSpace

1
Exploring Scalable Storage: DSpace & SRB
  • Chris Frymann
  • University of California San Diego Libraries
  • DSpace User Group Meeting
  • March 10, 2004

2
Extending DSpace Storage Capabilities
  • Much of the value and success of the Web is a
    result of its enormous size, which has been
    achieved through a distributed storage model.
  • The current DSpace model assumes local storage.
  • What if DSpace collections could be of virtually
    unlimited size, stored, replicated and accessible
    via federated grid technologies?

3
This Presentation Will
  • Report on a proposed project to extend DSpace
    data management capabilities through integration
    with the San Diego Supercomputer Center's Storage
    Resource Broker (SRB)
  • Provide an overview of SRB
  • Review how both DSpace and SRB users will benefit

4
NARA Proposal Participants
  • San Diego Supercomputer Center (SDSC)
      • Member of the National Partnership for Advanced
        Computational Infrastructure (NPACI), an
        NSF-sponsored program
  • MIT Libraries
  • UC San Diego Libraries (UCSD)
  • Hewlett Packard Laboratories (HP)
  • National Archives and Records Administration
    (NARA)

5
Proposal Views DSpace As
  • Simple, user-friendly front end providing:
      • Digital content ingestion
      • Search and discovery
      • Content management
      • Dissemination services
      • Preservation

6
What is SRB?
  • Storage Resource Broker
  • Data management infrastructure
  • Developed at San Diego Supercomputer Center
  • Utilizes data grid and federation technologies

7
Levels of Possible Integration
  • Replace DSpace file system calls with SRB access
    calls
  • Utilize SRB metadata management capabilities
  • Provide support for federation of DSpace systems

8
Storage Resource Broker Technical Overview
9
Definition of SRB
  • Middleware which allows other applications to
    treat a diverse collection of physical storage
    devices as a single logical resource
  • A distributed file system (Data Grid), based on a
    client-server architecture.
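
The idea of treating diverse devices as one logical resource can be illustrated with a small sketch. This is not SRB code or the SRB API: the `StorageDriver`, `MemoryDriver`, `DirectoryDriver`, and `Broker` names, and the sample logical paths, are all hypothetical and exist only for illustration.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Hypothetical interface each physical store implements."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Stand-in for a non-file store such as a database."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirectoryDriver(StorageDriver):
    """Stores each object as a file under one local directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


class Broker:
    """Presents several drivers as a single logical resource."""

    def __init__(self, drivers: dict) -> None:
        self._drivers = drivers
        self._placement = {}  # logical name -> resource name

    @staticmethod
    def _key(logical_name: str) -> str:
        # Flatten the logical path into a name every driver can store.
        return logical_name.strip("/").replace("/", "_")

    def put(self, logical_name: str, data: bytes, resource: str) -> None:
        self._drivers[resource].put(self._key(logical_name), data)
        self._placement[logical_name] = resource

    def get(self, logical_name: str) -> bytes:
        # The caller never names the device; the broker looks it up.
        resource = self._placement[logical_name]
        return self._drivers[resource].get(self._key(logical_name))


def round_trip() -> tuple:
    """Write to two different stores, then read back by logical name only."""
    with tempfile.TemporaryDirectory() as tmp:
        broker = Broker({"disk": DirectoryDriver(Path(tmp)), "db": MemoryDriver()})
        broker.put("/ucsd/slides/0001.tif", b"slide", resource="disk")
        broker.put("/ucsd/newsreels/reel1.mov", b"newsreel", resource="db")
        return (broker.get("/ucsd/slides/0001.tif"),
                broker.get("/ucsd/newsreels/reel1.mov"))
```

The reader of an object supplies only its logical name; which driver holds the bytes is a detail recorded inside the broker.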

10
What SRB Does
  • Replicates, syncs, archives, and connects
    heterogeneous resources in a logical manner using
    abstraction mechanisms.
  • Also provides a way to access files and computers
    based on their attributes rather than just their
    names or physical locations.
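
Attribute-based access can be pictured as a query against a catalog of descriptive metadata. The sketch below is an assumption-laden illustration, not the SRB query interface: `CatalogEntry`, `find_by_attributes`, the attribute keys, and the sample entries are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: a logical name plus free-form attributes."""
    logical_name: str
    attributes: dict = field(default_factory=dict)


def find_by_attributes(catalog, **conditions):
    """Return logical names of entries whose attributes match every condition."""
    return [
        entry.logical_name
        for entry in catalog
        if all(entry.attributes.get(k) == v for k, v in conditions.items())
    ]


# Invented sample data, loosely modeled on the test collections in this talk.
catalog = [
    CatalogEntry("/ucsd/slides/0001.tif", {"type": "image", "collection": "slides"}),
    CatalogEntry("/ucsd/slides/0001.jpg", {"type": "image", "collection": "slides",
                                           "derivative": True}),
    CatalogEntry("/ucsd/newsreels/reel1.mov", {"type": "video",
                                               "collection": "newsreels"}),
]

images = find_by_attributes(catalog, type="image")
derivatives = find_by_attributes(catalog, collection="slides", derivative=True)
```

Here the caller asks for "all images" or "all slide derivatives" without knowing any file name or storage location in advance.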

11
A Sample Storage Environment
12
Commodity IDE Disk Drive
200 GB - $200
13
Grid Brick
2 Terabytes
14
Grid Bricks Details
  • Hardware components
      • Intel Celeron 1.7 GHz CPU
      • SuperMicro P4SGA PCI Local bus ATX mainboard
      • 1 GB memory (266 MHz DDR DRAM)
      • 3Ware Escalade 7500-12 port PCI bus IDE RAID
      • 10 Western Digital Caviar 200-GB IDE disk drives
      • 3Com Etherlink 3C996B-T PCI bus 1000Base-T
      • Redstone RMC-4F2-7 4U ten bay ATX chassis
  • Linux operating system
  • Cost is $2,200 per TByte plus tax
  • Gig-E network switch costs $500 per brick
  • Effective cost is about $2,700 per TByte

15
Rack of Grid Bricks 12 TB
16
Data Grid 50 TB
Room of Racks
17
Grid Bricks at SDSC
  • Used to implement picking environments for
    10-TB collections
  • Web-based access
  • Web services (WSDL/SOAP) for data subsetting
  • Implemented 15 TB of storage
  • Astronomy sky surveys, NARA prototype persistent
    archive, NSDL web crawls
  • Must still apply Linux security patches to each
    Grid Brick

18
SDSC Production Data Grid
  • SDSC Storage Resource Broker
  • Federated client-server system, managing:
      • Over 90 TB of data at SDSC
      • Over 16 million files
  • Manages data collections stored in:
      • Archives (HPSS, UniTree, ADSM, DMF)
      • Hierarchical Resource Managers
      • Tapes, tape robots
      • File systems (Unix, Linux, Mac OS X, Windows)
      • FTP sites
      • Databases (Oracle, DB2, Postgres, SQL Server,
        Sybase, Informix)
      • Virtual Object Ring Buffers

19
SRB Collections at SDSC
20
Data Grids
  • Distributed data sources
  • Inter-realm authentication and authorization
  • Heterogeneity
  • Storage repository abstraction
  • Scalability
  • Differentiation between context and content
    management
  • Preservation
  • Support for automated processing (migration,
    archival processes)

21
Data Grid Components
  • Federated client-server architecture
      • Servers can talk to each other independently of
        the client
  • Infrastructure-independent naming
      • Logical names for users, resources, files,
        applications
  • Collective ownership of data
      • Collection-owned data, with infrastructure-
        independent access control lists
  • Context management
      • Record state information in a metadata catalog
        from data grid services such as replication
  • Abstractions for dealing with heterogeneity

22
Data Grid Abstractions
  • Logical name space for files
      • Global persistent identifier
  • Storage repository abstraction
      • Standard operations supported on storage systems
  • Information repository abstraction
      • Standard operations to manage collections in
        databases
  • Access abstraction
      • Standard interface to support alternate APIs
  • Latency management mechanisms
      • Aggregation, parallel I/O, replication, caching
  • Security interoperability
      • GSSAPI, inter-realm authentication,
        collection-based authorization
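
A logical name space with replication can be sketched as a mapping from one persistent logical name to several physical copies. The code below is a simplified illustration under stated assumptions, not how SRB or its catalog is implemented: `Replica`, `LogicalNamespace`, the resource names, and the paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical record of one physical copy of an object."""
    resource: str        # which storage system holds this copy
    physical_path: str   # where it lives on that system
    online: bool = True  # whether the resource is currently reachable


class LogicalNamespace:
    """Maps a persistent logical name to any number of replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica):
        self._replicas.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name):
        """Return the first online replica, in registration order."""
        for replica in self._replicas.get(logical_name, []):
            if replica.online:
                return replica
        raise LookupError(f"no online replica for {logical_name}")


ns = LogicalNamespace()
name = "/ucsd/slides/0001.tif"
ns.register(name, Replica("brick-01", "/data/a/0001.tif", online=False))
ns.register(name, Replica("tape-archive", "/hpss/ucsd/0001.tif"))

chosen = ns.resolve(name)  # falls through to the archive copy
```

The logical name stays the same when a copy moves or a resource goes offline; only the catalog entries change.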

23
Data Grid Federation
  • Data grids provide the ability to name, organize,
    and manage data on distributed storage resources
  • Federation provides a way to name, organize, and
    manage data on multiple data grids.

24
Distributed Data Grids 200 TB
SRB Federation
25
SRB APIs
26
SDSC Storage Resource Broker & Meta-data Catalog
[Architecture diagram; labels recoverable from the slide:]
  • Application
  • Access APIs: Linux I/O, OAI, WSDL, DLL / Python,
    Java, NT Browsers, GridFTP
  • Federation; Consistency Management /
    Authorization-Authentication
  • SRB Server: Logical Name Space, Latency Management,
    Data Transport, Metadata Transport
  • Storage Abstraction; Catalog Abstraction
  • Drivers: Databases (DB2, Oracle, Sybase, SQLServer),
    HRM
27
Federated SRB server model
[Diagram, with steps numbered 1 to 6 in the original
figure. Recoverable labels:]
  • Read application, supplying a logical name or
    attribute condition
  • Two SRB servers with peer-to-peer brokering
  • Server(s) spawning SRB agents
  • MCAT: 1. Logical-to-physical mapping,
    2. Identification of replicas, 3. Access & audit
    control
  • Parallel data access to resources R1 and R2
28
Peer-to-Peer Federation
  • Occasional Interchange - for specified users
  • Replicated Catalogs - entire state information
    replication
  • Resource Interaction - data replication
  • Replicated Data Zones - no user interactions
    between zones
  • Master-Slave Zones - slaves replicate data from
    master zone
  • Snow-Flake Zones - hierarchy of data
    replication zones
  • User / Data Replica Zones - user access from
    remote to host zone
  • Nomadic Zones - synchronize local zone to
    parent zone
  • Free-floating myZone - synchronize without a
    parent zone
  • Archival BackUp Zone - synchronize to an archive

29
Principal peer-to-peer federation approaches
30
SRB Availability
  • SRB source distributed to academic and research
    institutions
  • Commercial use access through UCSD Technology
    Transfer Office
      • William Decker, WJDecker_at_ucsd.edu
  • Commercial version from
      • http://www.nirvanastorage.com

31
SRB Info Resources
  • SRB Homepage
      • http://www.npaci.edu/DICE/SRB/
  • Grid Port Toolkit
      • https://gridport.npaci.edu/
  • Data Intensive Computing Environment (DICE)
      • http://www.npaci.edu/DICE
  • mySRB - Web-based Browser and Query Tool for
    Storage Resource Broker
      • http://www.npaci.edu/dice/srb/mySRB/mySRB.html

32
NARA Proposal - Two Goals
  • Demonstrate the possibility of federating two
    different preservation architectures
      • Support exchange of documents between both
        systems
  • Demonstrate that DSpace/SRB integration leads to
    improved life-cycle support

33
NARA Proposal - Plan of Work
  • Evaluation of life-cycle management
  • Use SRB as filestore for DSpace bitstreams
  • Identify METS storage profile
  • Enable exchange of data and metadata between
    independent DSpace and SRB systems

34
Schedule & Deliverables
  • Year 1: Develop Research Prototype
      • Develop functional requirements
      • Specify standard interfaces and METS profiles
      • Prototype implementation of specified design
      • Ingest data, evaluate functionality and
        performance

35
Schedule & Deliverables
  • Year 2
      • Federation with additional systems, possibly:
          • CDL
          • OCLC
          • Fedora
      • Scalability testing
      • Ingestion of more content types

36
UCSD Libraries To Provide Test Collections
  • Slide Collection
      • Over 200,000 digitized slide images
      • Over 1 million files (counting derivatives)
      • Approximately 5 terabytes
  • Moving Image Collection
      • California movie newsreel footage
      • Size of collection to be determined

37
Testing Will Explore
  • Management of terabyte scale collections
  • Handling of compound documents
  • Automating aspects of archival workflow
  • Integration of METS
  • Feasibility of deriving descriptive metadata from
    the material during ingest
  • Automated verification and validation checking
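
Automated verification is commonly done by recording a checksum at ingest and recomputing it later. The sketch below shows that general technique only; the slides do not say which algorithm or tooling the project would use, so SHA-256, the `verify` function, and the sample file names are assumptions.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest used here as the fixity value (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify(manifest: dict, stored: dict) -> dict:
    """Compare stored bytes against the checksums recorded at ingest.

    manifest maps file name -> expected digest; stored maps file name -> bytes.
    Returns file name -> "ok", "corrupt", or "missing".
    """
    report = {}
    for name, expected in manifest.items():
        if name not in stored:
            report[name] = "missing"
        elif sha256_of(stored[name]) != expected:
            report[name] = "corrupt"
        else:
            report[name] = "ok"
    return report


# Invented example: checksums captured at ingest, then a later audit.
ingested = {"0001.tif": b"master image", "0001.jpg": b"derivative", "0002.tif": b"x"}
manifest = {name: sha256_of(data) for name, data in ingested.items()}

later = {"0001.tif": b"master image", "0001.jpg": b"derivatjve"}  # one flipped byte
report = verify(manifest, later)
```

A report like this can be produced on a schedule so that damaged or missing copies are detected while a good replica still exists.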

38
Hewlett Packard Labs
  • Will provide a second storage utility which will
    be used to test:
      • Federation
      • Zone-level access and management controls
      • Data validation and authenticity

39
Data Model
  • Paired Content and Metadata Files
  • Metadata encoded in standard METS profiles
  • Stand-alone METS files used to describe arbitrary
    levels of aggregation of lower level objects
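
Pairing content files with a METS description might look roughly like the following. This is a deliberately minimal skeleton, not the project's METS profile (which the plan of work says is still to be identified) and not validated against the METS schema; the `build_mets` function, the object identifier, and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id, content_files):
    """Return a minimal METS document listing the given content files."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # fileSec inventories the content files that accompany this metadata file.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")

    # structMap says how those files make up the object.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")

    for index, href in enumerate(content_files, start=1):
        file_id = f"FILE{index}"
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})

    return ET.tostring(mets, encoding="unicode")


xml_text = build_mets("slide-0001", ["0001.tif", "0001.jpg"])
```

A stand-alone METS file for an aggregation could follow the same pattern, with its structural map pointing at lower-level METS files rather than at content files.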

40
(No Transcript)
41
Integration of DSpace / SRB Enables Multiple Modes of Control
  • From DSpace
  • Via SRB APIs
  • Specification of Federated SRB Zone configuration
    and interoperability

42
Conclusion: Expected Results / Benefits
  • DSpace users
      • Federated collection management through
        distributed grid technology
      • Exchange of METS-encoded collections
  • SRB users
      • User-friendly ingest mechanism
      • Extended life-cycle management
      • Exchange of METS-encoded collections

43
Q & A
  • Presentation will be available at
      • http://libnet.ucsd.edu/dspace/user_group_2004.03.ppt