Designing Storage Architectures for Preservation Collections, Library of Congress, September 17-18, 2007

1
Designing Storage Architectures for Preservation Collections
Library of Congress, September 17-18, 2007
Preservation and Access Repository Storage Architecture
  • Stephen Abrams
  • Harvard University Library
  • stephen_abrams_at_harvard.edu

2
Digital preservation at Harvard
  • Obligation to ensure the ongoing usability of
    library digital assets over time
  • Digital Repository Service (DRS)
  • Managed preservation and access repository
  • Seven years of production operation
  • 6.7 million assets (27 TB)
  • Primary strategy: redundancy and
    heterogeneity
  • Primary challenge: scaling
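As a rough sense of scale, the figures on this slide imply the averages computed below. This is a back-of-the-envelope sketch: it assumes decimal terabytes and, for the yearly figures, perfectly even growth over the seven years. Neither assumption comes from the slides.

```python
# Figures quoted on the slide.
ASSETS = 6.7e6        # 6.7 million assets
TOTAL_TB = 27         # 27 TB
YEARS = 7             # seven years of production operation

BYTES_PER_TB = 1e12   # assumption: decimal terabytes

# Average size of one asset, in megabytes.
avg_asset_mb = TOTAL_TB * BYTES_PER_TB / ASSETS / 1e6

# Average yearly intake, assuming (hypothetically) even growth.
assets_per_year = ASSETS / YEARS
tb_per_year = TOTAL_TB / YEARS

print(f"average asset size: {avg_asset_mb:.2f} MB")
print(f"average intake: {assets_per_year:,.0f} assets/year, {tb_per_year:.2f} TB/year")
```

Under these assumptions the repository averages about 4 MB per asset and just under 4 TB of new content per year. Real intake is unlikely to have been even, which is why the scaling question matters.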

3
Scaling: linear or exponential?

4
Storage classification
  • All managed assets are assigned a storage
    classification
  • Public use (U): high availability, fast response
  • Archival storage (A): high capacity, low cost
  • Use assets are optimized for web-friendly
    delivery
  • Archival assets are optimized for longevity
  • Asset classification is known at the point of
    acquisition
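The classification scheme above can be modelled as a small data structure. This is a minimal sketch only: the names `StorageClass`, `GOALS`, and `Asset` are hypothetical and do not come from the DRS itself. The single design point it illustrates is that an asset cannot be created without a classification, which mirrors the statement that classification is known at acquisition.

```python
from dataclasses import dataclass
from enum import Enum


class StorageClass(Enum):
    """The two storage classifications named on the slide."""
    USE = "U"
    ARCHIVAL = "A"


# Storage goals per classification, as listed on the slide.
GOALS = {
    StorageClass.USE: ("high availability", "fast response"),
    StorageClass.ARCHIVAL: ("high capacity", "low cost"),
}


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record; both fields are required at creation."""
    identifier: str
    storage_class: StorageClass


# StorageClass("U") looks the member up by its code; an unknown code raises ValueError.
asset = Asset("example-0001", StorageClass("U"))
print(asset.storage_class.name, GOALS[asset.storage_class])
```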

5
Architectural requirements
  • Each asset is stored:
  • In at least 3 physical locations
  • On at least 2 storage media
  • With at least 2 on-line copies (U) / 1 on-line
    copy (A)
  • With at least 1 off-line copy
  • Ongoing auditing for bit-level error detection
    and correction
  • Virtualization layer with uniform interface to
    all assets, regardless of physical medium
  • Application interface exposed as NFS-mountable
    file systems
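The copy-count rules and the audit requirement above can be expressed as two small checks. This is a sketch under stated assumptions: the `Copy` record, the location labels, and the use of a SHA-256 digest recorded at ingest are illustrative choices, not details given in the slides.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Copy:
    location: str  # physical site (the labels used below are illustrative)
    medium: str    # e.g. "disk" or "tape"
    online: bool   # True for an on-line copy, False for an off-line copy


# Minimum number of on-line copies per storage classification (from the slide).
MIN_ONLINE = {"U": 2, "A": 1}


def policy_violations(storage_class: str, copies: List[Copy]) -> List[str]:
    """Return the redundancy rules that this set of copies fails to meet."""
    problems = []
    if len({c.location for c in copies}) < 3:
        problems.append("fewer than 3 physical locations")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 storage media")
    if sum(c.online for c in copies) < MIN_ONLINE[storage_class]:
        problems.append("too few on-line copies")
    if all(c.online for c in copies):
        problems.append("no off-line copy")
    return problems


def corrupt_copies(recorded_sha256: str, copies: Dict[str, bytes]) -> List[str]:
    """Return names of copies whose SHA-256 digest differs from the recorded one."""
    return [name for name, data in copies.items()
            if hashlib.sha256(data).hexdigest() != recorded_sha256]


example = [
    Copy("on-campus data center", "disk", True),
    Copy("off-campus data center", "disk", True),
    Copy("off-campus storage facility", "tape", False),
]
print(policy_violations("U", example))
```

In a sketch like this, correction would mean rewriting any copy flagged by `corrupt_copies` from a copy that still matches the recorded digest. The slides do not say which checksum or repair procedure is actually used.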

6
Storage architecture

7
Storage architecture
  • QFS cache and primary U disk archive on EMC
    CX3-40 (FC / SATA, RAID-1 / RAID-5) at on-campus
    data center
  • Redundant switched FC data paths to primary /
    fail-over Sun T2000 / Solaris file servers
    running SAM-QFS
  • Primary A / secondary U disk archive on EMC
    CX3-80 (FC / SATA, RAID-1 / RAID-5) at off-campus
    data center
  • Redundant FC data paths to T2000 file server
    running SAM-QFS
  • Secondary A / tertiary U tape archive on
    StorageTek SL500 (LTO-3) FC-attached to primary
    on-campus T2000
  • Tertiary A / quaternary U tape archive on LTO-3
    media at off-campus managed storage facility
  • Disk archives are UFS file systems containing tar
    files; even with the loss of the SAM
    infrastructure, they can be fully (if slowly)
    recovered with standard Unix / Linux tools
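The recovery path described in the last bullet can be illustrated with nothing but a standard tar reader. The sketch below uses Python's `tarfile` module as a stand-in for "standard Unix / Linux tools". The function name and directory layout are hypothetical, and it assumes the archive files are readable as ordinary tar files, as the slide states.

```python
import tarfile
from pathlib import Path
from typing import List


def recover_from_tar_archives(archive_dir: Path, restore_dir: Path) -> List[str]:
    """Copy every regular file out of every tar file found in archive_dir."""
    restore_root = restore_dir.resolve()
    restored = []
    for candidate in sorted(archive_dir.iterdir()):
        # Skip anything that is not a readable tar file.
        if not candidate.is_file() or not tarfile.is_tarfile(str(candidate)):
            continue
        with tarfile.open(str(candidate), "r") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                target = (restore_root / member.name).resolve()
                if restore_root not in target.parents:
                    continue  # skip member names that would escape restore_dir
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as source:
                    target.write_bytes(source.read())
                restored.append(member.name)
    return restored
```

A call such as `recover_from_tar_archives(Path("/mnt/disk-archive"), Path("/restore"))` (both paths hypothetical) walks one directory of archive files. A full recovery would repeat this over every archive directory, which is where the "time-consuming" caveat comes from.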

8
Storage virtualization
  • SAM-QFS reader / writer on primary on-campus
    T2000 file server
  • SAM-QFS reader on fail-over on-campus /
    off-campus T2000 file servers
  • All U and A assets written to QFS cache on CX3-40
  • Immediate creation of all UFS disk and LTO-3 tape
    archive copies
  • Immediate release from cache, with files marked
    "stage never"
  • SAM manages all copies of all assets; externally,
    each asset appears as a single file in an
    NFS-mountable file system
  • Application access requests are initiated by NFS
    reads and are fulfilled directly from primary
    disk archive copy without staging to cache
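The write and read paths described above can be modelled in a few lines. This is a toy in-memory sketch, not SAM-QFS behaviour: the `VirtualStore` class, the target names, and the fallback to the next on-line copy when the primary is missing are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Archive targets in read-priority order as (name, is_online).
# The names are illustrative labels, not real device names.
TARGETS: List[Tuple[str, bool]] = [
    ("primary-disk", True),
    ("secondary-disk", True),
    ("tape-library", False),
    ("offsite-tape", False),
]


class VirtualStore:
    """Presents each asset as one logical file backed by several archive copies."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}
        self._archives: Dict[str, Dict[str, bytes]] = {name: {} for name, _ in TARGETS}

    def write(self, path: str, data: bytes) -> None:
        self.cache[path] = data          # new assets land in the cache first
        for archive in self._archives.values():
            archive[path] = data         # every archive copy is created immediately
        del self.cache[path]             # released from cache once archived

    def read(self, path: str) -> Tuple[str, bytes]:
        # Serve directly from the first on-line archive copy; nothing is staged to cache.
        for name, online in TARGETS:
            if online and path in self._archives[name]:
                return name, self._archives[name][path]
        raise FileNotFoundError(path)

    def lose_copy(self, target: str, path: str) -> None:
        """Simulate the loss of one archive copy."""
        self._archives[target].pop(path, None)


store = VirtualStore()
store.write("images/0001.tif", b"pixels")
print(store.read("images/0001.tif")[0])
```

The caller only ever sees the logical path. Which physical copy answers a read is decided inside the store, which is the role the slide assigns to SAM behind the NFS mount.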

9
Issues
  • Disk vs tape
  • LTO-3 vs LTO-4
  • Tape archive media pooling
  • All hardware / software installed; currently
    engaged in configuration and preliminary unit /
    integration testing
  • Need to establish benchmarks for system
    performance
  • Planning for migration from existing storage
    solution
  • Automated data classification
  • Response to an anticipated escalating rate of
    asset acquisition:
  • Google mass digitization
  • Web archiving
  • Audio / video content
  • Scientific data sets