Creating and Sustaining a Digital Collection - PowerPoint PPT Presentation

About This Presentation
Title:

Creating and Sustaining a Digital Collection


Slides: 50
Provided by: dlLib

Transcript and Presenter's Notes

Title: Creating and Sustaining a Digital Collection


1
Creating and Sustaining a Digital Collection
  • What does it take?

2
Overview
  • What is a Digital Object?
  • Technology used to Create the Object
  • Techniques for Describing/Building the Object
  • Access & Distribution Technologies
  • Preservation and Archiving

3
What is a Digital Object?
  • Simply put, a digital object can be viewed as the
    unit of information that your user interacts
    with.
  • A digital object can be a single file, or it can
    be a complex arrangement of files and file types.

4
What is a Digital Object? (Brown's Model)
  • A digital object relates directly to the original
    artifact through a 1:1 correspondence.
  • A digital object may take on a multitude of
    appearances and behaviors.

5
The Digital Object (interior view)

6
The Digital Object (exterior view)
  [Diagram; labels: File, Viewer, metadata, Collection]

7
Creating the Digital Object
  • 1. Creation of the surrogate files.
  • Master Files
  • 2. Creation of derivative images.
  • Use Copies
  • 3. Production of appropriate metadata

8
Creation of the Surrogate Files
  • Perhaps the quickest and least demanding
    component of digital collection building.
  • Technologies are readily available,
    cost-effective and easily used.

9
In-House or Out-Source?
  • Size of collection vs. available staff or
    students
  • Funding realities
  • Time constraints
  • Is this a "one-timer" or just the beginning?

10
Digitizing Vendors
  • Imaging
  • Imaging & Metadata
  • Imaging, Metadata & Markup
  • Byte Managers
  • Access Imagery
  • Digital Divide
  • Luna Insight
  • Apex

11
In-House Capacity: What to Look For
  • Scanning Area
  • Optical Resolution & Bit Depth
  • Workstation Requirements
  • Light Source

12
Optical Resolution
  • Optical Resolution = Number of pixels (dots) that
    are created to represent an inch of the original
    image: dots per inch (DPI).
  • High DPI = High Quality
  • High DPI = Large File Sizes

13
Bit Depth
  • Controls the number of possible colors in a scan.
  • Grayscale
  • 8 bit = 256 Colors (2x2x2x2x2x2x2x2)
  • 24 bit Color
  • 8 bits per channel x 3 channels (red, green, blue)
  • 24 bit = 16,777,216 Colors (256x256x256)
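
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the function name `color_count` is made up for illustration and is not part of any tool mentioned in the presentation.

```python
def color_count(bits_per_channel: int, channels: int = 1) -> int:
    """Return how many distinct values one pixel can take.

    Each bit doubles the number of possible values, so the count is
    2 raised to the total number of bits per pixel.
    """
    return 2 ** (bits_per_channel * channels)


# 8-bit grayscale: one channel of 8 bits.
grayscale = color_count(8)

# 24-bit color: three channels (red, green, blue) of 8 bits each.
true_color = color_count(8, channels=3)

print(grayscale, true_color)  # 256 16777216
```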

14
Low Cost Desktop Scanner
  • 8.5" x 11" scanning surface
  • 2400x2400 DPI optical resolution
  • 24 bit color depth (or higher)
  • Single pass scan mode
  • Automatic document feeder
  • HP ScanJet ($250)

15
High-End Desktop Scanner
  • 12" x 17" scanning surface
  • 1600x3200 DPI optical resolution
  • 24-bit color depth (or higher)
  • Single pass scan mode
  • Epson Expression 1640XL ($2,500)

16
Digital Camera Workstation
  • 4x5 Reflex Camera with Digital Back
  • Production Quality Lens
  • 40x60 imaging surface with dual vacuums
  • Reflective Tungsten lighting
  • $50,000 (plus G4 workstation)

17
Resolution, Bit Depth and File Size
  • 1) calculate area of image in pixels
  • 2) multiply by bytes per pixel (bit depth/8)
  • 3) divide by 1,000,000 (to convert to MB)
  • 8 bit 3000x2284 pixel image (Grayscale)
  • (3000x2284x1)/1,000,000 = 6.85 MB
  • 24 bit 3000x2284 pixel image (True Color)
  • (3000x2284x3)/1,000,000 = 20.56 MB
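
The three steps above can be sketched as a small Python function. The name `uncompressed_size_mb` is an illustrative assumption, and the result applies to uncompressed image data only (file headers and any compression are ignored).

```python
def uncompressed_size_mb(width_px: int, height_px: int, bit_depth: int) -> float:
    """Estimate the size of an uncompressed image in megabytes.

    1) area in pixels, 2) times bytes per pixel (bit depth / 8),
    3) divided by 1,000,000 to convert bytes to megabytes.
    """
    area = width_px * height_px
    bytes_per_pixel = bit_depth / 8
    return area * bytes_per_pixel / 1_000_000


# The 3000x2284 pixel image from the slide, as grayscale and as true color.
print(round(uncompressed_size_mb(3000, 2284, 8), 2))   # 6.85
print(round(uncompressed_size_mb(3000, 2284, 24), 2))  # 20.56
```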

18
Imaging Workstations: Storage Needs
  • Simply put, you need lots of it!
  • Example: Sheet Music
  • Color Cover (2 images): 2 @ 30 MB
  • Grayscale Pages (4 images): 4 @ 8 MB
  • Object Total: 92 MB
  • 10 pieces scanned in a 3-hour shift
  • Two 3-hour shifts per day
  • Daily Total: 92 MB per piece x 10 pieces x 2 shifts
    = 1.84 GB
  • 1500 pieces in collection
  • Project Total: 92 MB x 1500 = 138 GB
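
The sheet music estimate can be reproduced with the sketch below. The function names are hypothetical, and the conversion of 1 GB = 1,000 MB is an assumption; dividing by 1,024 instead gives a project total of about 135 GB.

```python
import math

MB_PER_GB = 1000  # assumption: decimal gigabytes


def object_size_mb(covers=2, cover_mb=30, pages=4, page_mb=8):
    """Size of one digitized piece: color covers plus grayscale pages."""
    return covers * cover_mb + pages * page_mb


def daily_total_gb(object_mb, pieces_per_shift=10, shifts_per_day=2):
    """Storage consumed by one day of scanning."""
    return object_mb * pieces_per_shift * shifts_per_day / MB_PER_GB


def project_total_gb(object_mb, pieces=1500):
    """Storage consumed by the master files for the whole collection."""
    return object_mb * pieces / MB_PER_GB


def working_days(pieces=1500, pieces_per_shift=10, shifts_per_day=2):
    """Scanning days needed at the stated throughput."""
    return math.ceil(pieces / (pieces_per_shift * shifts_per_day))


size = object_size_mb()
print(size, daily_total_gb(size), project_total_gb(size), working_days())
```

At the stated pace of 20 pieces per day, the 1500 pieces take 75 scanning days, which is a useful cross-check when planning staff time alongside disk space.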

19
Storage Solutions
  • On-Line (derivatives only)
  • Web Server Storage Array
  • Near-Line (masters and derivatives)
  • Imaging Workstation Hard Drive
  • Network Storage Device
  • Off-Line (masters)
  • CD-ROM
  • DVD
  • Tape

20
In-House Production: Hardware Costs
  • ? digitizing workstations @ $3,000 each
  • Small format scanner (11 inch)
  • 2 GHz workstation with 100 GB HD, 1 GB RAM
  • 21 inch monitor
  • ? digitizing workstations @ $5,000 each
  • Large format scanner (17 inch)
  • 2 GHz workstation with 100 GB HD, 1 GB RAM
  • 21 inch monitor
  • 1 Terabyte Networked Storage Device @ $14,000

21
Creation of Use (derivative) Copies
  • You create masters at time of scanning
  • Uncompressed, full resolution TIFF files
  • These are your true assets, protect them well!
  • Users will interact with lower resolution,
    smaller-sized images
  • JPEG, GIF, PNG
  • Size and format may vary by collection
  • These can be recreated at will; they have
    limited archival value

22
Use Copies at Brown (image collection)
  • Thumbnail: 125 pixels*
  • Small Res.: 750 pixels*
  • Medium Res.: 1500 pixels*
  • High Res.: 3000 pixels*
  • Archival Image: dimensions of original scan
  • * as measured along longest axis
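
As a rough illustration of "measured along longest axis", the sketch below computes the pixel dimensions of each use copy from the dimensions of a master scan. The function name, the dictionary keys, and the decision never to enlarge a small master are assumptions made for this example, not Brown's documented practice.

```python
# Longest-edge targets taken from the slide, in pixels.
DERIVATIVE_SIZES = {"thumbnail": 125, "small": 750, "medium": 1500, "high": 3000}


def scaled_dimensions(width, height, longest_edge):
    """Scale (width, height) so the longer side equals longest_edge.

    The aspect ratio is preserved. Masters that are already smaller
    than the target are returned unchanged (assumption: no upscaling).
    """
    longest = max(width, height)
    if longest <= longest_edge:
        return width, height
    scale = longest_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Example: a 6000x4568 pixel master scan.
for label, edge in DERIVATIVE_SIZES.items():
    print(label, scaled_dimensions(6000, 4568, edge))
```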

23
Oh, My God! So many files!!
  • Object to Artifact Ratio: 1:1
  • File to Object Ratio: 5:1
  • Use copies (derivatives) can be made at time of
    scanning (i.e., in Photoshop), or
  • You could use a more scriptable software package
    like ImageMagick to automate this
  • Your metadata record can keep track of these
    files and their corresponding filenames.
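
One way to script the derivatives is to generate one ImageMagick command per use copy. The sketch below only builds the command lists and does not run them. The file naming scheme (`<master>_<label>.jpg`) is hypothetical, and the `convert` program name assumes ImageMagick 6 (version 7 installs it as `magick`). The `>` flag in the geometry asks ImageMagick to shrink larger images only.

```python
from pathlib import PurePosixPath

# Longest-edge targets in pixels; labels are illustrative.
SIZES = {"thumb": 125, "small": 750, "medium": 1500, "high": 3000}


def derivative_commands(master: str) -> dict[str, list[str]]:
    """Map each derivative filename to the command that would create it."""
    stem = PurePosixPath(master).stem
    commands = {}
    for label, edge in SIZES.items():
        output = f"{stem}_{label}.jpg"
        # Fit inside an edge x edge box, preserving the aspect ratio.
        commands[output] = ["convert", master, "-resize", f"{edge}x{edge}>", output]
    return commands


for filename, command in derivative_commands("masters/sheet0001.tif").items():
    print(filename, " ".join(command))
```

Each command list could be passed to `subprocess.run` on a machine with ImageMagick installed. The dictionary keys double as the list of filenames a metadata record would need to track: one master plus four derivatives gives the 5:1 file-to-object ratio.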

24
A Meta(data) Pause
  • Before you can think through your metadata
    options, you need to have a clear understanding
    and acceptance of your object model.
  • That model may lead you toward or away from
    certain options.

25
Possible Scenarios
  • Library X will only provide a single derivative
    image for each photograph in their collection.
    The derivative will be 600 pixels on long edge.
  • Library Z wants to provide page-turning
    capabilities for each pamphlet in their
    collection including variant resolutions for
    each page, and will also provide access to the
    text via the TEI.

26
and Possible Solutions
  • Library X might consider creating short MARC
    records for each photograph and link to the
    digital image via an 856 field.
  • Library Z might consider creating METS records
    for each pamphlet. They might want to link to
    these records from their OPAC or develop a
    MODS-based digital library. A stylesheet will
    render the METS record for web viewing.

27
With all due apologies to catalogers
  • All MARC is metadata, but not all metadata is MARC

28
Measuring Your Metadata Needs
  • How complicated is your object model?
  • How ambitious are your digital collection
    aspirations?
  • Will you use a "digital library in a box"?
  • Or will you develop your own repository?

29
Metadata Needs for Digital Objects
  • Ability to discover an object amongst similar and
    dissimilar object types (an image amongst images
    or an image amongst all resources).
  • Ability to find and connect all the files that
    comprise an object (impart behavior)
  • Ability to ascertain technical quality of the
    digital object, and to predictably refresh and
    update its constituent parts.

30
Metadata Soup?
  • VRA, DDI, Dublin Core, MODS, DMD, MARC, TEI, EAD, FGDC, MPEG

31
METS: Old Concepts, New Wrinkles
  • Metadata Encoding and Transmission Standard
  • METS provides a single document approach to
    collecting all metadata for an object.
  • METS provides spaces into which we can plug our
    own metadata encoding schemes.
  • METS provides the ability to sample and use as
    many flavors of the metadata soup as we want.
  • We can determine possible behaviors for any
    object based on the METS structure.
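
To make the "single document" idea concrete, here is a minimal sketch that assembles a skeletal METS record with Python's standard library. It wraps a one-element MODS description, lists the object's files, and ties them together in a structural map. The function name, IDs, and sample file paths are invented for illustration, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{namespace}}}{tag}"


def build_mets(object_id, title, files):
    """Build a skeletal METS tree; files is a list of (file_id, use, href)."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})

    # Descriptive metadata: METS leaves a slot into which MODS is plugged.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # File inventory and structural map that connects files to the object.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), {"DMDID": "DMD1"})
    for file_id, use, href in files:
        group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": use})
        entry = ET.SubElement(group, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(entry, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets


record = build_mets("obj0001", "Example Waltz", [
    ("F1", "master", "masters/obj0001.tif"),
    ("F2", "thumbnail", "images/obj0001_thumb.jpg"),
])
print(ET.tostring(record, encoding="unicode"))
```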

32
Creating the Metadata Record
  • Via the cataloging department using MARC or
    Dublin Core in the OPAC
  • Via the turnkey solution's data editor
  • Or as XML structures created by the production
    and/or cataloging staff
  • Avoid the digital cataloging backlog
  • Cataloging is a shared activity

33
Tools for Creating XML structures
  • WYSIWYG Editors
  • XML Spy ($300)
  • XMetal ($500)
  • Text Editors
  • NoteTab ($0 or $29)
  • BBEdit

34
Access and Distribution
  • Simple HTTP server
  • Turnkey enterprise systems
  • Open source and/or home grown systems

35
Simple HTTP Servers
  • Very inexpensive
  • Microsoft Internet Information Server
  • PC Based
  • Use OPAC for searching (856 field)

36
Turnkey Digital Library Systems
  • OPAC Vendors
  • SIRSI Hyperion
  • III Millennium
  • Endeavor ENCompass
  • Stand Alone Systems
  • CONTENTdm
  • MDID
  • Luna Insight

37
Turnkey Systems Advantages
  • OPAC vendor systems may integrate better into
    traditional catalog and cataloging workflows
  • Strong vendor support and customer relationship
  • Large user base
  • Less technical burden placed on local institution

38
Turnkey Systems Disadvantages
  • Limited Object Modeling & Metadata Support
  • Potentially slow response to environmental change
  • Focused on traditional modalities
  • Digital Preservation?
  • Cost

39
Open Source & Home-Spun Solutions
  • Generally Free
  • Require a relatively high degree of technical
    proficiency in library to install and maintain.
  • Source code is available to the community of
    developers.
  • DSpace
  • FEDORA

40
Brown's Ideal System
  • Supports format specific tag searching as well as
    cross-format searching
  • Based on the METS object model
  • Retrieval of objects managed via XSLT docs, with
    multiple Views for each object.
  • Z39.50 connections to other library systems.
  • Ability to automate refresh and migration.

41
Digitizing Collections at Brown
  • Center for Digital Initiatives
  • Oversees all digitizing efforts (except reserve
    readings)
  • Scanning, OCR, Markup and Metadata
  • Management of Vendor Contracts
  • Music Library
  • Produces mp3 and mp4 files for audio reserves
  • Media Services
  • Produces mov and rm files for video reserves

42
Digitizing Workflow
  • 1) students scan/photograph artifact and produce
    master tiffs
  • 2) staff check newly created scans and create
    descriptive records (MODS)
  • 3) production tool (embedded into NoteTab)
    produces derivative images, creates full METS
    record and packages all parts into zip file for
    loading into repository.
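
The packaging part of step 3 might look like the sketch below, written in Python rather than the Perl used in the production tool described here. The function name and the file names inside the archive are assumptions; the archive is built in memory so the example is self-contained.

```python
import io
import zipfile


def package_object(parts: dict[str, bytes]) -> bytes:
    """Bundle an object's files (name -> content) into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort names so the archive layout is the same on every run.
        for name, data in sorted(parts.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder content stands in for the real METS record and images.
package = package_object({
    "mets.xml": b"<mets/>",
    "images/obj0001_thumb.jpg": b"thumbnail bytes",
    "images/obj0001_small.jpg": b"small bytes",
})
print(zipfile.ZipFile(io.BytesIO(package)).namelist())
```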

43
Digitizing Workflow (cont.)
  • 4) zip file loaded onto web server
  • 5) loading program run, placing images into web
    tree and loading MODS elements into SQL database
  • 6) new records added to index
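
Step 5 can be illustrated with a short loader that pulls a few MODS elements out of a record and stores them as rows. This is a sketch only: SQLite stands in for the repository's SQL database, the table layout (`mods_elements`) is invented, and only the title and issue date are extracted.

```python
import sqlite3
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}

# Element label -> path within the MODS record (a small, illustrative subset).
FIELDS = (
    ("title", "mods:titleInfo/mods:title"),
    ("date", "mods:originInfo/mods:dateIssued"),
)


def load_mods(conn, object_id, mods_xml):
    """Insert selected MODS elements as rows; return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mods_elements "
        "(object_id TEXT, element TEXT, value TEXT)"
    )
    root = ET.fromstring(mods_xml)
    rows = []
    for element, path in FIELDS:
        for node in root.findall(path, MODS):
            if node.text:
                rows.append((object_id, element, node.text.strip()))
    conn.executemany("INSERT INTO mods_elements VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sample = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Example Waltz</title></titleInfo>"
    "<originInfo><dateIssued>1905</dateIssued></originInfo>"
    "</mods>"
)
connection = sqlite3.connect(":memory:")
print(load_mods(connection, "obj0001", sample))
```

Storing one row per element keeps the table independent of any particular metadata scheme, at the cost of more joins when searching across several fields at once.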

44
Preservation and Archiving
  • We're not talking about digital photocopying;
    we're building collections of digital objects.
  • These collections deserve the same attention to
    preservation and archiving that more traditional
    materials receive.
  • So, how do we do this?

45
Preservation and Archiving
  • While no long-term digital preservation programs
    are yet in place (many are in development), there
    are things you can do to make sure you are ready
    for them.
  • Produce files based on open standards.
  • Produce files at the highest level of quality.
  • Produce adequate metadata to document these.

46
Digital Tools to be made available
  • Clip libraries and templates for NoteTab (Light
    or Pro) that facilitate creation of MODS, EAD and
    other XML structures.
  • PERL and PerlMagick scripts which can be embedded
    into NoteTab to automate the creation of
    derivative images and METS records.
  • NoteTab is only available for the Windows OS.

47
Digital Tools to be made available
  • Repository database structure (MySQL). This can
    be ported to any SQL compliant database package.
  • Object Loading and Indexing Tools.
  • PHP based repository manager.
  • Object preservation and refresh tools, when
    available.

48
Recapitulation: Points to Consider
  • What kinds of digital projects do you envision?
  • What is your object model?
  • Does it make sense to construct a production
    center in your library? How much do you need,
    and how much can you afford?
  • What is the appropriate mix of vended digitizing
    and in-house digitizing?

49
First Steps
  • 1. Prepare Digital Collection Development
    Policy.
  • 2. Develop object model.
  • 3. Select a manageable collection to digitize.
  • 4. Develop a project plan (including outcomes).
  • 5. Cost analysis for in-house vs. outsourced
    work.
  • 6. Do it.
  • 7. Analyze costs and benefits. Evaluate end
    product.