Digitization Projects at the State Library of Pennsylvania: Where the Past and Future Meet

1
Digitization Projects at the State Library of
Pennsylvania: Where the Past and Future Meet
  • Bill Nork
  • Head of Systems & Preservation
  • William Fee
  • Digital Collections Librarian
  • Kurt Bodling
  • Digital Resources Cataloger
  • Pennsylvania Department of Education,
  • State Library of Pennsylvania

2
www.statelibrary.state.pa.us/digital_projects
  • Or
  • Visit the State Library of PA Website
  • www.statelibrary.state.pa.us
  • Select Digital Projects of the State Library

3
Digitization things learned the hard way
  • Or why do I drink so much coffee?
  • By Bill Fee
  • Try to plan things out as much as you can before
    starting a project
  • No matter how much you plan, something will blow
    up in your face.
  • It's often better to throw people at a problem
    than equipment (if they hit just right, this also
    counts as percussive maintenance)
  • Loud, obnoxious and driving punk rock and techno
    really improve the workflow (though that could be
    just a personal preference)

4
Hardware & Software
  • We run a Dell Optiplex GX260 with a 2.26 GHz
    non-hyperthreaded processor. Alas, we're a PC
    shop.
  • Scanner-wise, we have a $25,000 Minolta PS 7000
    overhead engine book scanner and an HP ScanJet
    7400C that's up for replacement.

5
Hardware & Software, again
  • We scan directly into Photoshop. I can save the
    archival TIFF, then edit it and create the access
    JPEG right there. As a library you should be
    able to get an educational license, which is a
    heck of a lot cheaper. The program itself may
    seem more full-featured than you need, but things
    like batch processing, when you're applying the
    same edits and sizing to a whole directory of
    images, really save time. Get them to pay for
    classes, though: about $200 each, but well worth it.
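
A batch run like the one described above pairs each archival TIFF with one access JPEG. The sketch below only plans the file names; it does not touch image data. The `access` folder name and the `.jpg` naming rule are assumptions for illustration, not the State Library's actual conventions.

```python
from pathlib import PurePosixPath

def plan_access_copies(tiff_names, access_dir="access"):
    """Pair each archival TIFF with the JPEG access copy a batch run would write.

    access_dir is a hypothetical folder name chosen for this sketch.
    """
    plan = []
    for name in sorted(tiff_names):
        path = PurePosixPath(name)
        if path.suffix.lower() not in (".tif", ".tiff"):
            continue  # skip anything that is not an archival master
        jpeg = PurePosixPath(access_dir) / (path.stem + ".jpg")
        plan.append((name, str(jpeg)))
    return plan

# Made-up file names, used only to show the shape of the result.
plan = plan_access_copies(["vol1_p002.tif", "vol1_p001.TIF", "notes.txt"])
for master, access in plan:
    print(master, "->", access)
```

Keeping the plan separate from the conversion step makes it easy to check file and folder names before any scans are overwritten.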

6
Still More Hardware Software
  • We use Omnipage for OCR.  You'll save yourself a
    heck of a lot of correction time by doing a dual
    scan- 1 into Photoshop, one directly into the OCR
    program, whichever you use.  Omnipage has about a
    98 or 99 percent accuracy for anything but
    newspapers, but there are others just as good. 
    Hit up ComputerShopper.com and read reviews.
  • If I'm doing a web page, I use the Composer
    feature in Mozilla or Netscape.
  • Ive been using these programs and essentially
    the same hardware since the bad old pre-standards
    Dark Ages of 5 years ago, and they seem to work.

7
What criteria do you use to have an item
digitized?
  • Must be PA related.
  • Usually in such poor shape that it cannot
    circulate, or from the Rare Book Room, or ordered
    by the Director or Commissioner.
  • Must have fewer than 5-10 holding libraries in
    FirstSearch (not counting us).
  • Usually fits a theme. The current one is the VLaT
    project (Violence, Labor and Transportation):
    riots, train wrecks, mine accidents, etc.
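
The criteria above can be read as a simple checklist. The sketch below is one possible reading, with stated assumptions: the holdings cutoff is set at the upper end of the 5-10 range, the "usually" conditions are treated as required, and the theme is left out because it is a preference rather than a rule. All field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    pa_related: bool
    holding_libraries: int            # other libraries in FirstSearch, not counting us
    poor_condition: bool = False      # too fragile to circulate
    rare_book_room: bool = False
    ordered_by_director: bool = False  # or by the Commissioner

def meets_criteria(item, max_holdings=10):
    """Return True when the item passes the hard rules and one 'usual' reason.

    max_holdings=10 is an assumed cutoff; the slide gives a range of 5-10.
    """
    if not item.pa_related:
        return False
    if item.holding_libraries >= max_holdings:
        return False
    return item.poor_condition or item.rare_book_room or item.ordered_by_director

# Made-up candidates, for illustration only.
fragile = Candidate("Mine accident report", True, 3, poor_condition=True)
common = Candidate("Widely held county history", True, 40, rare_book_room=True)
print(meets_criteria(fragile), meets_criteria(common))
```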

8
Other problems you will find
  • Bureaucracy
  • Shipment
  • File and folder nomenclature
  • Poor scans and OCR
  • Storage
  • Personnel
  • High-priority projects
  • New software, new uses for software, new problems
    with software that only come up because it's a
    new project.

9
Metadata Considerations
  • Kurt A.T. Bodling
  • Digital Resources Cataloger
  • State Library of Pennsylvania

10
The Starting Place
  • What is the digital object?
  • Something newly created?
  • Already cataloged?
  • A collection?
  • A single item?
  • A selection from an item?
  • Who is it for?

11
(No Transcript)
12
(No Transcript)
13
Ben Franklin solutions
  • Easy call: siphon data from OPAC
  • Tougher: dealing with chapters and single letters
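
Siphoning data from the OPAC amounts to copying fields out of an existing catalog record into the digital object's metadata. A minimal sketch, assuming the catalog record is available as a dictionary of MARC tags and the target is a Dublin Core-style record. The tag-to-element pairs are a simplified version of common practice, not the State Library's actual mapping, and the sample record is made up.

```python
# Simplified, assumed mapping from MARC tags to Dublin Core element names.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "520": "description",  # summary note
    "650": "subject",      # topical subject heading
}

def crosswalk(marc_record):
    """Copy mapped MARC fields into a Dublin Core-style dict of lists."""
    dc = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # unmapped tags are left behind in the catalog
        dc.setdefault(element, []).extend(values)
    return dc

# Placeholder record; the values are not taken from any real catalog.
sample = {
    "100": ["Example, Author"],
    "245": ["Example title"],
    "300": ["12 p."],
    "650": ["Example subject", "Another subject"],
}
dc_record = crosswalk(sample)
print(dc_record)
```

A whole cataloged item maps cleanly this way. Chapters and single letters are tougher because the catalog record describes the parent item, not the piece being digitized.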

14
(No Transcript)
15
General solution to obit challenges
  • Sampling and testing
  • Hunting down exceptions
  • Creating a data dictionary
  • And, of course, going back later to make changes

16
Data Dictionary defined
MARC AACR2 Dublin Core Data Dictionary
17
(No Transcript)
18
(No Transcript)
19
Creating the data dictionary
  • Simple issues first
  • Steal data from the catalog
  • Use boilerplate rights management statement
  • Get repeated data into a template
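
Getting repeated data into a template means every record starts from the same shared values, and only the item-specific fields are typed in. A minimal sketch, assuming records are plain dictionaries; the field names and placeholder values are hypothetical.

```python
import copy

# Shared values that repeat on every record (placeholders, not real text).
TEMPLATE = {
    "collection": "COLLECTION NAME PLACEHOLDER",
    "rights": "BOILERPLATE RIGHTS STATEMENT PLACEHOLDER",
    "subjects": ["Obituaries"],
}

def make_record(template, **item_fields):
    """Start from a copy of the template, then add the item-specific fields."""
    record = copy.deepcopy(template)  # deep copy so shared lists are not mutated
    record.update(item_fields)
    return record

page = make_record(TEMPLATE, volume="1", page="12", surname="Example")
print(page["rights"], page["page"])
```

The deep copy matters once a template holds lists. Without it, adding a subject to one record would quietly change every other record built from the same template.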

20
Creating the data dictionary
  • More difficult challenges
  • Names of the deceased
  • Citation to original source newspapers
  • Omissions
  • Enhancements
  • Difficulties caused by original scrapbooking

21
Names of the deceased
  • Not authority controlled
  • Variations between two obit versions
  • Variations within one obit
  • Lacking first name

22
Name variations
23
Anonymous child
24
Names of the deceased
  • Solutions
  • Enter only surname, but
  • Enter all spellings that appear
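
Entering every spelling that appears, without entering the same one twice, can be sketched as follows. The rule for what counts as "the same spelling" (ignoring case and stray spaces) is an assumption, and the sample names are made up.

```python
def surname_entries(spellings):
    """Return each distinct surname spelling once, in order of first appearance."""
    seen = set()
    entries = []
    for raw in spellings:
        name = " ".join(raw.split())  # collapse stray whitespace
        key = name.casefold()         # compare without regard to case
        if name and key not in seen:
            seen.add(key)
            entries.append(name)
    return entries

# Invented variants of one surname, as they might appear across two clippings.
variants = surname_entries(["Snyder", "Snider", "snyder ", "Schneider"])
print(variants)
```

No attempt is made to pick a "correct" form, which matches the decision not to apply authority control to these names.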

25
Citations to original sources
  • Visible on microfilm, but NOT in JPEG
  • Easily recoverable

26
Citations to original sources
  • Solution
  • Leave this information out of metadata

27
Omissions
  • Blank pages
  • Pages glued together
  • Military unit information

28
Military unit info
29
Omissions
  • Solutions
  • Record page numbers as they appear
  • Note when pages don't appear
  • Omit unit information
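
Noting when pages don't appear can be partly automated once the page numbers have been recorded. A minimal sketch, assuming the recorded numbers are whole numbers; pages numbered any other way would need to be handled by hand.

```python
def missing_pages(recorded_pages):
    """List page numbers absent between the lowest and highest recorded page."""
    present = set(recorded_pages)
    if not present:
        return []
    return [p for p in range(min(present), max(present) + 1) if p not in present]

# Invented page list: pages 3, 4 and 7 were blank or glued together.
gaps = missing_pages([1, 2, 5, 6, 8])
print(gaps)
```

The result is only a list of candidates for a note. Whether a gap is a blank page, glued pages, or a scanning miss still has to be checked against the volume.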

30
Enhancements
  • Geographic info
  • Occupational info
  • Marital status
  • And on and on and on.

31
(No Transcript)
32
(No Transcript)
33
Enhancements
  • Solutions
  • Forego most enrichment
  • Include "former slave"
  • Include some terms like "suicide" and "murder"

34
Scrapbook difficulties
  • Running on to second page
  • Running on to 3rd, 4th, 5th pages

35
Multiple page obit
36
Scrapbook difficulties
  • Repeated obituaries

37
Scrapbook difficulties
  • Label at bottom of page, obit on next

38
Text and title split
39
Scrapbook difficulties
  • Year-end cumulative death notice
  • Articles that were not obits at all
  • Volumes containing two years

40
Cumulative notice
41
Not an obit
42
My Lessons Learned
  • Metadata isn't (aren't?) scary
  • Patience and perseverance win out
  • Small crew, quick decisions

43
What Did We Learn?
  • More man-hours than we thought
  • More staffing to complete task
  • Decisions about how deep to go with metadata

44
Questions?
Call or email one of us
  • Bill Fee, 717-783-7014, wfee@state.pa.us
  • Kurt Bodling, 717-783-5996, kbodling@state.pa.us
  • Bill Nork, 717-787-9128, wnork@state.pa.us