Title: Digitization Projects at the State Library of Pennsylvania: Where the Past and Future Meet
- Bill Nork
- Head of Systems Preservation
- William Fee
- Digital Collections Librarian
- Kurt Bodling
- Digital Resources Cataloger
- Pennsylvania Department of Education,
- State Library of Pennsylvania
2www.statelibrary.state.pa.us/digital_projects
- Or
- Visit the State Library of PA Website
- www.statelibrary.state.pa.us
- Select Digital Projects of the State Library
3Digitization things learned the hard way
- Or why do I drink so much coffee?
- By Bill Fee
- Try to plan things out as much as you can before starting a project.
- No matter how much you plan, something will blow up in your face.
- It's often better to throw people at a problem than equipment (if they hit just right, this also counts as percussive maintenance).
- Loud, obnoxious and driving punk rock and techno really improve the workflow (though that could be just a personal preference).
4Hardware & Software
- We run a Dell Optiplex GX260 with a 2.26 GHz non-hyperthreaded processor. Alas, we're a PC shop.
- Scanner-wise, we have a $25,000 Minolta PS 7000 overhead engine book scanner and an HP ScanJet 7400C that's up for replacement.
5Hardware & Software, again
- We scan directly into Photoshop. I can save the archival TIFF, then edit it and create the access JPEG right there. As a library you should be able to get an educational license, which is a heck of a lot cheaper. The program itself may seem more full-featured than you need, but things like batch processing, when you're doing a whole directory of images with the same edits and sizing, really save time. Get them to pay for classes, though: about $200 per class, but well worth it.
6Still More Hardware & Software
- We use Omnipage for OCR. You'll save yourself a heck of a lot of correction time by doing a dual scan: one into Photoshop, one directly into the OCR program, whichever you use. Omnipage has about 98 or 99 percent accuracy for anything but newspapers, but there are others just as good. Hit up ComputerShopper.com and read reviews.
- If I'm doing a web page, I use the Composer feature in Mozilla or Netscape.
- I've been using these programs and essentially the same hardware since the bad old pre-standards Dark Ages of 5 years ago, and they seem to work.
7What criteria do you use to have an item
digitized?
- Must be PA related.
- Usually in such poor shape that it cannot circulate, or from the Rare Book Room, or ordered by the Director or Commissioner.
- Must have fewer than 5-10 holding libraries in FirstSearch (not counting us).
- Usually fits a theme. The current one is the VLaT project (Violence, Labor and Transportation): riots, train wrecks, mine accidents, etc.
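The criteria on this slide can be written down as a small checklist function. This is only one possible reading of the slide, not the library's actual procedure: the `Candidate` fields, the `qualifies` function, and the cutoff of 10 holdings (the upper end of the "5-10" range) are all assumptions made for illustration. The theme criterion is left out because the slide says "usually".

```python
from dataclasses import dataclass

# Assumption: treat the upper end of the "5-10" range as the cutoff.
MAX_OTHER_HOLDINGS = 10


@dataclass
class Candidate:
    """Hypothetical description of an item being considered for digitization."""
    title: str
    pa_related: bool
    can_circulate: bool
    rare_book_room: bool
    ordered_by_director: bool
    other_holding_libraries: int  # FirstSearch holdings, not counting us


def qualifies(c: Candidate) -> bool:
    """Apply the slide's criteria as strict rules (one possible reading)."""
    if not c.pa_related:
        return False
    if c.other_holding_libraries >= MAX_OTHER_HOLDINGS:
        return False
    # Any one of the three "usually" reasons is enough in this sketch.
    return (not c.can_circulate) or c.rare_book_room or c.ordered_by_director


worn_item = Candidate(
    title="Hypothetical mine accident report",
    pa_related=True,
    can_circulate=False,
    rare_book_room=False,
    ordered_by_director=False,
    other_holding_libraries=3,
)
print(qualifies(worn_item))  # True
```

In practice the "usually" items are judgment calls, so a function like this is better used to flag candidates for review than to make the final decision.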
8Other problems you will find
- Bureaucracy
- Shipment
- File and folder nomenclature
- Poor scans and OCR
- Storage
- Personnel
- High-priority projects
- New software, new uses for software, new problems with software that only come up because it's a new project.
9Metadata Considerations
- Kurt A.T. Bodling
- Digital Resources Cataloger
- State Library of Pennsylvania
10The Starting Place
- What is the digital object?
- Something newly created?
- Already cataloged?
- A collection?
- A single item?
- A selection from an item?
- Who is it for?
13Ben Franklin solutions
- Easy call: siphon data from OPAC
- Tougher: dealing with chapters and single letters
15General solution to obit challenges
- Sampling and testing
- Hunting down exceptions
- Creating a data dictionary
- And, of course, going back later to make changes
16Data Dictionary defined
MARC + AACR2 + Dublin Core = Data Dictionary
19Creating the data dictionary
- Simple issues first
- Steal data from the catalog
- Use boilerplate rights management statement
- Get repeated data into a template
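The "simple issues" above amount to filling a record from three sources: a template of repeated values, data copied from the catalog, and whatever is unique to the item. A minimal sketch follows, using Dublin Core element names as keys. The template values, the `build_record` helper, and the sample data are placeholders invented for illustration, not the State Library's actual data dictionary.

```python
# Hypothetical boilerplate shared by every record in one collection.
TEMPLATE = {
    "publisher": "PUBLISHER NAME GOES HERE",
    "type": "Text",
    "format": "image/jpeg",
    "rights": "BOILERPLATE RIGHTS STATEMENT GOES HERE",
}


def build_record(catalog_fields: dict, item_fields: dict) -> dict:
    """Layer catalog data and item data over the template.

    Later sources win, so an item-level value overrides the catalog,
    and the catalog overrides the template.
    """
    record = dict(TEMPLATE)        # copy so the template is never modified
    record.update(catalog_fields)  # data "stolen" from the catalog
    record.update(item_fields)     # data unique to this digital object
    return record


# Sample values are made up for the example.
catalog = {"title": "Obituary scrapbook, volume 1", "language": "eng"}
item = {"identifier": "obit-v01-p001", "title": "Obituary scrapbook, volume 1, page 1"}

record = build_record(catalog, item)
print(record["title"])   # Obituary scrapbook, volume 1, page 1
print(record["rights"])  # BOILERPLATE RIGHTS STATEMENT GOES HERE
```

Keeping the repeated values in one place means a later change to the rights statement is a single edit rather than a pass through every record.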
20Creating the data dictionary
- More difficult challenges
- Names of the deceased
- Citation to original source newspapers
- Omissions
- Enhancements
- Difficulties caused by original scrapbooking
21Names of the deceased
- Not authority controlled
- Variations between two obit versions
- Variations within one obit
- Lacking first name
22Name variations
23Anonymous child
24Names of the deceased
- Solutions
- Enter only surname, but
- Enter all spellings that appear
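The solution above (surname only, but every spelling that appears) can be sketched as a small helper that collects the distinct spellings in the order they were found. The function name and the sample surnames are hypothetical. Treating spellings that differ only in capitalization or spacing as the same entry is an assumption of this sketch, not something the slides specify.

```python
def surname_entries(spellings):
    """Return each distinct surname spelling once, in order of first appearance."""
    seen = set()
    entries = []
    for raw in spellings:
        cleaned = " ".join(raw.split())  # trim and collapse stray whitespace
        key = cleaned.casefold()         # compare without regard to case
        if cleaned and key not in seen:
            seen.add(key)
            entries.append(cleaned)
    return entries


# Spellings found across two versions of the same (made-up) obituary.
found = ["Snyder", "Snider", "snyder ", ""]
print(surname_entries(found))  # ['Snyder', 'Snider']
```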
25Citations to original sources
- Visible on microfilm, but NOT in jpeg
- Easily recoverable
26Citations to original sources
- Solution
- Leave this information out of metadata
27Omissions
- Blank pages
- Pages glued together
- Military unit information
28Military unit info
29Omissions
- Solutions
- Record page numbers as they appear
- Note when pages don't appear
- Omit unit information
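Recording page numbers as they appear makes it possible to generate the "pages don't appear" note mechanically. A minimal sketch follows, assuming the scrapbook pages carry plain integer page numbers. The function names and the wording of the note are made up for illustration.

```python
def missing_pages(recorded):
    """Return page numbers absent between the lowest and highest recorded page.

    `recorded` holds the page numbers as they appear on the scanned pages.
    """
    present = set(recorded)
    if not present:
        return []
    return [p for p in range(min(present), max(present) + 1) if p not in present]


def missing_page_note(recorded):
    """Build a short note for the metadata record, or '' if nothing is missing."""
    gaps = missing_pages(recorded)
    if not gaps:
        return ""
    return "Pages not present in scan: " + ", ".join(str(p) for p in gaps)


# Pages 4 and 5 were blank or glued together and never scanned.
print(missing_page_note([1, 2, 3, 6, 7]))  # Pages not present in scan: 4, 5
```

The check only finds gaps between the first and last recorded page, so pages missing from the very start or end of a volume still need to be noted by hand.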
30Enhancements
- Geographic info
- Occupational info
- Marital status
- And on and on and on.
33Enhancements
- Solutions
- Forego most enrichment
- Include "former slave"
- Include some terms like "suicide" and "murder"
34Scrapbook difficulties
- Running on to second page
- Running on to 3rd, 4th, 5th pages
35Multiple page obit
36Scrapbook difficulties
37Scrapbook difficulties
- Label at bottom of page, obit on next
38Text and title split
39Scrapbook difficulties
- Year-end cumulative death notice
- Articles that were not obits at all
- Volumes containing two years
40Cumulative notice
41Not an obit
42My Lessons Learned
- Metadata isn't (aren't?) scary
- Patience and perseverance win out
- Small crew, quick decisions
43What Did we Learn?
- More man-hours than we thought
- More staffing to complete the task
- Decisions about how deep to go with metadata
44Questions?
Call or email one of us
Bill Fee 717-783-7014 wfee_at_state.pa.us
Kurt Bodling 717-783-5996 kbodling_at_state.pa.us
Bill Nork 717-787-9128 wnork_at_state.pa.us