Title: Creating and Sustaining a Digital Collection
1Creating and Sustaining a Digital Collection
2Overview
- What is a Digital Object?
- Technology used to Create the Object
- Techniques for Describing/Building the Object
- Access and Distribution Technologies
- Preservation and Archiving
3What is a Digital Object?
- Simply put, a digital object can be viewed as the
unit of information that your user interacts
with.
- A digital object can be a single file, or it can
be a complex arrangement of files and file types.
4What is a Digital Object? (Brown's Model)
- A digital object relates directly to the original
artifact through a 1:1 correspondence.
- A digital object may take on a multitude of
appearances and behaviors.
5The Digital Object (interior view)
6The Digital Object (exterior view)
[Diagram labels: File, Viewer, metadata, Collection]
7Creating the Digital Object
- 1. Creation of the surrogate files.
- Master Files
- 2. Creation of derivative images.
- Use Copies
- 3. Production of appropriate metadata
8Creation of the Surrogate Files
- Perhaps the quickest and least demanding
component of digital collection building.
- Technologies are readily available,
cost-effective and easily used.
9In-House or Out-Source?
- Size of collection vs. available staff or
students
- Funding realities
- Time constraints
- Is this a one-time project or just the beginning?
10Digitizing Vendors
- Imaging
- Imaging + Metadata
- Imaging + Metadata + Markup
- Byte Managers
- Access Imagery
- Digital Divide
- Luna Insight
- Apex
11In-House Capacity: What to Look For
- Scanning Area
- Optical Resolution and Bit Depth
- Workstation Requirements
- Light Source
12Optical Resolution
- Optical Resolution = number of pixels (dots) that
are created to represent an inch of the original
image: dots per inch (DPI).
- High DPI = High Quality
- High DPI = Large File Sizes
13Bit Depth
- Controls the number of possible colors in a scan.
- Grayscale
- 8 bit = 256 colors (2x2x2x2x2x2x2x2)
- 24 bit Color
- 8 bits per channel x 3 channels (red, green, blue)
- 24 bit = 16,777,216 colors (256x256x256)
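The arithmetic on this slide can be checked with a short sketch. The function name `color_count` is made up for illustration; the only rule it encodes is that each additional bit doubles the number of values a pixel can hold.

```python
def color_count(bits_per_pixel: int) -> int:
    """Return how many distinct values a pixel of this bit depth can hold."""
    # Each bit doubles the number of possible values: 2 x 2 x ... x 2.
    return 2 ** bits_per_pixel


for label, bits in [("8-bit grayscale", 8), ("24-bit color", 24)]:
    print(f"{label}: {color_count(bits):,} possible values")
```

For 24-bit color the same figure can be reached per channel: 256 values for red, times 256 for green, times 256 for blue.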
14Low-Cost Desktop Scanner
- 8.5" x 11" scanning surface
- 2400x2400 DPI optical resolution
- 24 bit color depth (or higher)
- Single pass scan mode
- Automatic document feeder
- HP ScanJet: $250
15High-End Desktop Scanner
- 12" x 17" scanning surface
- 1600x3200 DPI optical resolution
- 24-bit color depth (or higher)
- Single pass scan mode
- Epson Expression 1640XL: $2,500
16Digital Camera Workstation
- 4x5 Reflex Camera with Digital Back
- Production Quality Lens
- 40" x 60" imaging surface with dual vacuums
- Reflective Tungsten lighting
- $50,000 (plus G4 workstation)
17Resolution, Bit Depth and File Size
- 1) calculate area of image in pixels
- 2) multiply by bytes per pixel (bit depth/8)
- 3) divide by 1,000,000 (to convert to MB)
- 8 bit 3000x2284 pixel image (Grayscale)
- (3000x2284x1)/1,000,000 = approx. 6.85 MB
- 24 bit 3000x2284 pixel image (True Color)
- (3000x2284x3)/1,000,000 = approx. 20.56 MB
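The three steps above can be sketched as a small function. This is an illustration only: the name `uncompressed_size_mb` is hypothetical, and it assumes an uncompressed image with no file-header overhead, using the 3000 x 2284 pixel dimensions named on the slide.

```python
def uncompressed_size_mb(width_px: int, height_px: int, bit_depth: int) -> float:
    """Estimate the size of an uncompressed image in megabytes."""
    area = width_px * height_px          # 1) area of the image in pixels
    bytes_per_pixel = bit_depth / 8      # 2) bytes per pixel
    return area * bytes_per_pixel / 1_000_000  # 3) bytes to megabytes


print(f"Grayscale:  {uncompressed_size_mb(3000, 2284, 8):.2f} MB")
print(f"True color: {uncompressed_size_mb(3000, 2284, 24):.2f} MB")
```

Real TIFF masters will be slightly larger than this estimate because of headers and any embedded metadata.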
18Imaging Workstations: Storage Needs
- Simply put, you need lots of it!
- Example: Sheet Music
- Color Cover (2 images): 2 @ 30 MB
- Grayscale Pages (4 images): 4 @ 8 MB
- Object Total: 92 MB
- 10 pieces scanned in 3-hour shift
- Two 3-hour shifts per day
- Daily Total: 92 MB per piece x 10 pieces x 2 shifts
= 1.84 GB
- 1,500 pieces in collection
- Project Total: 92 MB x 1,500 = 138 GB
(about 135 GB if counted at 1,024 MB per GB)
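The sheet music estimate can be reproduced with a few lines of arithmetic. This is a rough sketch: the variable names are invented for illustration, and it assumes decimal units (1,000 MB per GB). Dividing by 1,024 instead gives roughly 134.8 GB for the project.

```python
MB_PER_GB = 1000  # assumption: decimal units

# (number of images, size of each in MB) for one piece of sheet music
sheet_music = [(2, 30),  # color cover images
               (4, 8)]   # grayscale page images

object_mb = sum(count * size for count, size in sheet_music)
daily_mb = object_mb * 10 * 2   # 10 pieces per shift, 2 shifts per day
project_mb = object_mb * 1500   # 1,500 pieces in the collection

print(f"Per object: {object_mb} MB")
print(f"Per day:    {daily_mb / MB_PER_GB:.2f} GB")
print(f"Project:    {project_mb / MB_PER_GB:.0f} GB")
```

These figures cover masters only; derivative images add to the total.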
19Storage Solutions
- On-Line (derivatives only)
- Web Server Storage Array
- Near-Line (masters and derivatives)
- Imaging Workstation Hard Drive
- Network Storage Device
- Off-Line (masters)
- CD-ROM
- DVD
- Tape
20In-House Production Hardware Costs
- ? digitizing workstations @ $3,000 each
- Small format scanner (11 inch)
- 2 GHz workstation with 100 GB HD, 1 GB RAM
- 21 inch monitor
- ? digitizing workstations @ $5,000 each
- Large format scanner (17 inch)
- 2 GHz workstation with 100 GB HD, 1 GB RAM
- 21 inch monitor
- 1 Terabyte Networked Storage Device @ $14,000
21Creation of Use (derivative) Copies
- You create masters at time of scanning
- Uncompressed, full resolution TIFF files
- These are your true assets, protect them well!
- Users will interact with lower resolution,
smaller-sized images
- JPEG, GIF, PNG
- Size and format may vary by collection
- These can be recreated at will; they have
limited archival value
22Use Copies at Brown (image collection)
- Thumbnail: 125 pixels
- Small Res.: 750 pixels
- Medium Res.: 1500 pixels
- High Res.: 3000 pixels
- Archival Image: dimensions of original scan
- (all sizes as measured along longest axis)
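A small sketch of what "measured along longest axis" means for the other dimension. The function `fit_longest_axis` and the dictionary keys are hypothetical names, and the choice never to enlarge a small scan is an assumption, not something the slide states.

```python
USE_COPY_SIZES = {"thumbnail": 125, "small": 750, "medium": 1500, "high": 3000}


def fit_longest_axis(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals target pixels."""
    longest = max(width, height)
    if longest <= target:
        return width, height  # assumption: never upscale a smaller scan
    scale = target / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Example: a 3000 x 2284 pixel scan
for label, pixels in USE_COPY_SIZES.items():
    print(label, fit_longest_axis(3000, 2284, pixels))
```

The same function handles portrait and landscape originals, since it keys on whichever side is longer.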
23Oh, My God! So many files!!
- Object to Artifact Ratio: 1:1
- File to Object Ratio: 5:1
- Use copies (derivatives) can be made at time of
scanning (i.e., in Photoshop), or
- You could use a more scriptable software package
like ImageMagick to automate this
- Your metadata record can keep track of these
files and their corresponding filenames.
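As one possible way to script this, the sketch below builds (but does not run) an ImageMagick command for each use copy. Several things here are assumptions: the `magick` program name (ImageMagick 6 installs use `convert`), JPEG as the derivative format, and the `<master>_<label>.jpg` naming scheme. The `-resize WxH>` geometry fits the image inside the box, keeps the aspect ratio, and only shrinks images that are larger than the box.

```python
from pathlib import Path

DERIVATIVE_SIZES = {"thumb": 125, "small": 750, "medium": 1500, "high": 3000}


def derivative_commands(master, out_dir, program="magick"):
    """Return {output filename: ImageMagick argument list} for one master file."""
    master = Path(master)
    commands = {}
    for label, pixels in DERIVATIVE_SIZES.items():
        target = Path(out_dir) / f"{master.stem}_{label}.jpg"  # hypothetical naming
        commands[target.name] = [
            program, str(master),
            "-resize", f"{pixels}x{pixels}>",  # longest side at most `pixels`
            str(target),
        ]
    return commands


for name, args in derivative_commands("masters/sheet001.tif", "web").items():
    print(name, "<-", " ".join(args))
```

Each argument list could be passed to `subprocess.run(args, check=True)` on a machine with ImageMagick installed, and the returned filenames are what the metadata record would need to track.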
24A Meta(data) Pause
- Before you can think through your metadata
options, you need to have a clear understanding
and acceptance of your object model.
- That model may lead you toward or away from
certain options.
25Possible Scenarios
- Library X will only provide a single derivative
image for each photograph in their collection.
The derivative will be 600 pixels on the long edge.
- Library Z wants to provide page-turning
capabilities for each pamphlet in their
collection, including variant resolutions for
each page, and will also provide access to the
text via the TEI.
26...and Possible Solutions
- Library X might consider creating short MARC
records for each photograph and link to the
digital image via an 856 field.
- Library Z might consider creating METS records
for each pamphlet. They might want to link to
these records from their OPAC or develop a
MODS-based digital library. A stylesheet will
render the METS record for web viewing.
27With all due apologies to catalogers
- All MARC is metadata, but not all metadata is MARC
28Measuring Your Metadata Needs
- How complicated is your object model?
- How ambitious are your digital collection
aspirations?
- Will you use a digital library in a box?
- Or will you develop your own repository?
29Metadata Needs for Digital Objects
- Ability to discover an object amongst similar and
dissimilar object types (an image amongst images
or an image amongst all resources).
- Ability to find and connect all the files that
comprise an object (impart behavior).
- Ability to ascertain technical quality of the
digital object, and to predictably refresh and
update its constituent parts.
30Metadata Soup?
VRA, DDI, Dublin Core, MODS, DMD, MARC, TEI, EAD, FGDC, MPEG
31METS: Old Concepts, New Wrinkles
- Metadata Encoding and Transmission Standard
- METS provides a single document approach to
collecting all metadata for an object.
- METS provides spaces into which we can plug our
own metadata encoding schemes.
- METS provides the ability to sample and use as
many flavors of the metadata soup as we want.
- We can determine possible behaviors for any
object based on the METS structure.
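To make the "single document" idea concrete, here is a minimal sketch that assembles a METS skeleton with Python's standard library: a descriptive section wrapping a MODS title, a file section grouping files by use, and a structural map pointing at each file. The element and attribute names follow the METS and MODS schemas, but the function `build_mets`, the IDs, and the file names are hypothetical, and the output is not validated against either schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_mets(object_id, title, files):
    """files: list of (file_id, use, mimetype, href) tuples."""
    mets = ET.Element(q(METS, "mets"), {"OBJID": object_id})

    # Descriptive metadata: a MODS record plugged into a METS wrapper.
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), {"MDTYPE": "MODS"})
    mods = ET.SubElement(ET.SubElement(wrap, q(METS, "xmlData")), q(MODS, "mods"))
    ET.SubElement(ET.SubElement(mods, q(MODS, "titleInfo")), q(MODS, "title")).text = title

    # File inventory (grouped by use) and a structural map that points to it.
    file_sec = ET.SubElement(mets, q(METS, "fileSec"))
    div = ET.SubElement(ET.SubElement(mets, q(METS, "structMap")),
                        q(METS, "div"), {"DMDID": "DMD1"})
    groups = {}
    for file_id, use, mimetype, href in files:
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, q(METS, "fileGrp"), {"USE": use})
        entry = ET.SubElement(groups[use], q(METS, "file"),
                              {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(entry, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): href})
        ET.SubElement(div, q(METS, "fptr"), {"FILEID": file_id})
    return mets


record = build_mets("obj001", "Sample Waltz", [
    ("F1", "master", "image/tiff", "masters/obj001.tif"),
    ("F2", "thumbnail", "image/jpeg", "web/obj001_thumb.jpg"),
])
print(ET.tostring(record, encoding="unicode"))
```

A stylesheet or viewer can then decide how to present the object by reading the structural map and the `USE` of each file group.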
32Creating the Metadata Record
- Via the cataloging department using MARC or
Dublin Core in the OPAC
- Via the turnkey solution's data editor
- Or as XML structures created by the production
and/or cataloging staff
- Avoid the digital cataloging backlog
- Cataloging is a shared activity
33Tools for Creating XML structures
- WYSIWYG Editors
- XML Spy ($300)
- XMetal ($500)
- Text Editors
- NoteTab ($0 or $29)
- BBEdit
34Access and Distribution
- Simple HTTP server
- Turnkey enterprise systems
- Open source and/or home grown systems
35Simple HTTP Servers
- Very inexpensive
- Microsoft Internet Information Server
- PC Based
- Use OPAC for searching (856 field)
36Turnkey Digital Library Systems
- OPAC Vendors
- SIRSI Hyperion
- III Millennium
- Endeavor ENCompass
- Stand Alone Systems
- ContentDM
- MDID
- Luna Insight
37Turnkey Systems Advantages
- OPAC vendor systems may integrate better into
traditional catalog and cataloging workflows
- Strong vendor support and customer relationship
- Large user base
- Less technical burden placed on local institution
38Turnkey Systems Disadvantages
- Limited Object Modeling and Metadata Support
- Potentially slow response to environmental change
- Focused on traditional modalities
- Digital Preservation?
- Cost
39Open Source and Home-Spun Solutions
- Generally Free
- Require a relatively high degree of technical
proficiency in library to install and maintain.
- Source code is available to the community of
developers.
- DSpace
- FEDORA
40Brown's Ideal System
- Supports format-specific tag searching as well as
cross-format searching
- Based on the METS object model
- Retrieval of objects managed via XSLT docs, with
multiple Views for each object.
- Z39.50 connections to other library systems.
- Ability to automate refresh and migration.
41Digitizing Collections at Brown
- Center for Digital Initiatives
- Oversees all digitizing efforts (except reserve
readings)
- Scanning, OCR, Markup and Metadata
- Management of Vendor Contracts
- Music Library
- Produces mp3 and mp4 files for audio reserves
- Media Services
- Produces mov and rm files for video reserves
42Digitizing Workflow
- 1) students scan/photograph artifact and produce
master TIFFs
- 2) staff check newly created scans and create
descriptive records (MODS)
- 3) production tool (embedded into NoteTab)
produces derivative images, creates full METS
record and packages all parts into zip file for
loading into repository.
43Digitizing Workflow (cont.)
- 4) zip file loaded onto web server
- 5) loading program run, placing images into web
tree and loading MODS elements into SQL database
- 6) new records added to index
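Steps 3 through 5 can be sketched in miniature as follows. This is not the production tool described above: it is a Python illustration that swaps in SQLite for the SQL database, and the table layout, the `mods.xml` file name, and the function names are all assumptions. It packages an object's files into a zip, then unpacks the zip, records each file path, and loads the MODS title into a table.

```python
import io
import sqlite3
import zipfile
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"
MODS_NS = {"mods": MODS_URI}


def package_object(object_id, files):
    """Step 3: zip all parts of one object. files maps file name -> bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(f"{object_id}/{name}", data)
    return buffer.getvalue()


def load_package(zip_bytes, conn):
    """Step 5: record file paths and load the MODS title into the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS objects "
                 "(object_id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS files (object_id TEXT, path TEXT)")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for path in archive.namelist():
            object_id, _, name = path.partition("/")
            if name == "mods.xml":  # hypothetical name for the MODS record
                root = ET.fromstring(archive.read(path))
                title = root.findtext(".//mods:title", default="", namespaces=MODS_NS)
                conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)",
                             (object_id, title))
            else:
                conn.execute("INSERT INTO files VALUES (?, ?)", (object_id, path))
    conn.commit()


# Build a tiny MODS record and a placeholder image, then package and load them.
mods = ET.Element(f"{{{MODS_URI}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_URI}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_URI}}}title").text = "Sample Waltz"

package = package_object("obj001", {
    "mods.xml": ET.tostring(mods),
    "obj001_thumb.jpg": b"placeholder image bytes",
})
conn = sqlite3.connect(":memory:")
load_package(package, conn)
print(conn.execute("SELECT object_id, title FROM objects").fetchall())
```

A real loader would also copy the images into the web tree and update the search index (step 6); those parts are left out here.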
44Preservation and Archiving
- We're not talking about digital photocopying;
we're building collections of digital objects.
- These collections deserve the same attention to
preservation and archiving that more traditional
materials receive.
- So, how do we do this?
45Preservation and Archiving
- While no long-term digital preservation programs
are yet in place (many are in development), there
are things you can do to make sure you are ready
for them.
- Produce files based on open standards.
- Produce files at the highest level of quality.
- Produce adequate metadata to document these.
46Digital Tools to be made available
- Clip libraries and templates for NoteTab (Light
or Pro) that facilitate creation of MODS, EAD and
other XML structures.
- Perl and PerlMagick scripts which can be embedded
into NoteTab to automate the creation of
derivative images and METS records.
- NoteTab is only available for the Windows OS.
47Digital Tools to be made available
- Repository database structure (MySQL). This can
be ported to any SQL-compliant database package.
- Object Loading and Indexing Tools.
- PHP-based repository manager.
- Object preservation and refresh tools, when
available.
48Recapitulation: Points to Consider
- What kinds of digital projects do you envision?
- What is your object model?
- Does it make sense to construct a production
center in your library? How much do you need,
and how much can you afford?
- What is the appropriate mix of vended digitizing
and in-house digitizing?
49First Steps
- 1. Prepare Digital Collection Development
Policy.
- 2. Develop object model.
- 3. Select a manageable collection to digitize.
- 4. Develop a project plan (including outcomes).
- 5. Cost analysis for in-house vs. outsourced
work.
- 6. Do it.
- 7. Analyze costs and benefits. Evaluate end
product.