Title: Can't BEAT This! The LC Bibliographic Enrichment Advisory Team

Can't BEAT This! The LC Bibliographic Enrichment
Advisory Team
  • David Williamson
  • Cataloging Automation Specialist
  • Acquisitions and Bibliographic Access Directorate
  • Library of Congress

Working Outside the Box
  • To develop tools to aid in creating and locating
  • To develop innovative workflows and policies
  • To prepare pilot projects for production (Proof
    of Concept)

Rules of the Game
  • Projects cannot distract from Library priorities
    (e.g. arrearage reduction)
  • Projects should demonstrate their
  • Projects populated by volunteer staff with
    minimal disruption to regular assignments

Enhancement Projects: Theoretical Foundation
  • Discovery and Retrieval: "We must provide more
    information online about what our print
    collections hold, so that potential users of our
    holdings can more easily discover the treasures
    they contain." Roy Tennant, LJ, Dec. 15, 2001

Enhancement Projects
"Many students prefer the chaos of the web to the
drudgery of the library." New York Times, August
10, 2000

Tables of Contents, Descriptions, Sample Text, and
Author Bio Projects
  • ONIX: from ONline Information eXchange data we get
    TOCs, descriptions, sample texts, and bio
    information.
  • Digital TOC: scanned from published books in the
    cataloging stream.
  • E-CIP TOC: extracted from manuscript files
    submitted by publishers. 36,000 extracted and
    linked to date.

ONIX TOC Project
  • We have a rolling pool of some 250,000 ONIX
    records with more coming soon.
  • Not all uniquely match records in the LC database
    because publishers create individual records for
    hardback vs. paperback, individual volumes of
    multipart items, and for accompanying material.

ONIX TOC Project
                  April 2002   Sept. 2004
  TOC                 29,623       48,783
  Descriptions        20,815      109,279
  Samples                  0       16,741
  Bios                     0       12,675

  • 48,783 TOCs extracted and linked to date
  • Wiley 17,794
  • Cambridge 15,661
  • McGraw-Hill 5,734
  • Houghton Mifflin 250
  • Globe-Pequot 69
  • Princeton 1,968
  • Holtzbrinck 1,227
  • Elsevier 3,607
  • Simon and Schuster 22
  • Dover 161
  • Harcourt 51
  • Northwestern 29

ONIX TOC Project: Wiley
  • 17,794 TOCs to date
  • American Hospital Pub.
  • Cliffs Notes
  • Howell Book House
  • Hungry Minds
  • IDG Books (for Dummies)
  • IEEE Press
  • Jossey-Bass
  • Macmillan
  • Pfeiffer

Digital TOC (DTOC) Project
  • TOCs are digitally scanned into images
  • Images are edited and converted into text using
    Prime Recognition OCR software
  • Files are HTML coded and mounted on LC web server
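The last step, HTML coding the OCR output, could look something like the minimal sketch below. This is an illustration only, not the BEAT team's actual tooling: the function name `toc_text_to_html` and the page layout are assumptions, and the input is taken to be the plain-text lines that come out of the OCR step.

```python
from html import escape


def toc_text_to_html(title: str, toc_lines: list[str]) -> str:
    """Wrap OCR'd table-of-contents lines in a minimal HTML page.

    Hypothetical helper: the real DTOC markup is not described in the slides.
    """
    # Drop blank lines left over from OCR and escape characters such as & and <.
    items = [escape(line.strip()) for line in toc_lines if line.strip()]
    body = "\n".join(f"<li>{item}</li>" for item in items)
    return (
        "<html>\n"
        f"<head><title>Table of contents for {escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<ul>\n{body}\n</ul>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Made-up sample lines standing in for OCR output.
    print(toc_text_to_html("Sample Book", ["1. Introduction", "", "2. Methods & Sources"]))
```

The resulting file would then be mounted on a web server and its address recorded in the bib record.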

Selection of DTOC Items
  • Taken from current English-language imprints of
    research value; all hardback books are now in
  • TOCs must be laid out in a straightforward manner,
    maximum 6 pages
  • TOCs must have meaningful chapter titles

Current Status of DTOC
  • Currently 25,000 TOCs linked to bib records in
    the 856 field
  • All subject areas are currently covered
  • Currently processing approximately 250-300 new
    TOCs per week

  • Sub-project of the DTOC project
  • Goal is to process as many LHG items as possible
  • Approximately 700 TOCs completed so far.
  • When critical mass reached, will provide a new
    database of names, places, and events for
    researchers to search across multiple
    publications in ways not possible in the catalog.

TOC vs. TOC vs. TOC
  • ONIX TOC sample http//
  • DTOC sample http//
  • LHG TOC sample http//

The Cost of DTOC TOC Information
  • BEAT projects should demonstrate their
  • Mid-80s experiment with a staff member typing TOC
    information into the 505 field: approx. $35 per title
  • DTOC initially: around $10 per title with older
    hardware and software
  • DTOC now, with better hardware and software: $2
    per title and dropping

The Cost of ONIX TOC Information- Initially
  • ONIX TOC Extraction: cost of initial programming
    and TOC extraction for 10,090 Wiley TOCs: $.065
    per record
  • ONIX TOC Linking: cost of making link to bib
    record: $.065 per record

The Cost of ONIX TOC Information- Now
  • Approx. $7.50 for setup, extraction and linking.
    Cost per record depends on how many records are
    created: 100 records, $.07 each; 1,000 records,
    $.007 each
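The arithmetic behind this slide is a fixed cost spread over the number of records in a batch. A minimal sketch, assuming the approximately 7.50 figure is the entire cost of one run (the function name is illustrative):

```python
def cost_per_record(fixed_cost: float, record_count: int) -> float:
    """Spread a fixed setup/extraction/linking cost over a batch of records."""
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return fixed_cost / record_count


SETUP_COST = 7.50  # approximate figure from the slide

if __name__ == "__main__":
    for count in (100, 1_000):
        print(f"{count:>5} records: {cost_per_record(SETUP_COST, count):.4f} per record")
```

Dividing 7.50 by 100 and by 1,000 gives 0.075 and 0.0075, so the slide's .07 and .007 appear to be those values with the last digit dropped.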

ONIX Publisher Description Project
  • 109,279 descriptions extracted and linked to date
  • Wiley 16,214
  • Cambridge 17,892
  • McGraw-Hill 8,446
  • Houghton Mifflin 3,031
  • Harcourt 2,238
  • Holtzbrinck 5,997
  • Sterling 3,292
  • Princeton 3,103
  • Globe-Pequot 185
  • Hendrickson 207
  • Cavendish 41
  • Dover 3,583
  • Elsevier 4,112
  • Simon and Schuster 5,836

ONIX Sample Text Project
  • 7,518 sample texts extracted and linked to date
  • Cambridge (PDF) 3,640
  • Simon and Schuster (HTML) 1,482
  • Princeton (HTML and PDF) 315
  • Wiley (PDF) 2,081

Use of TOC and Descriptions Data

TOC Hits on Web

Descriptions Hits on Web

Reviews Project
  • BEAT desire to enhance bib records
  • Reviews from the "Outstanding reference sources"
    sections of the annual compilations that appear in
    American Libraries are added to the bibliographic
    records in a 520 note (20-30 a year)

Reviews Project
  • Reviews of Best Free Reference Web Sites by the
    Machine-Assisted Reference Section (MARS) of
    ALA's Reference and User Services Association are
    added to the bibliographic records in a 520 note
    (25-30 a year)

Reviews Project
  • HLAS reviews for volumes 57 (social sciences) and
    58 (humanities) extracted from the HLAS database
    and inserted into the corresponding record in the
    LC database. 5,995 records were enhanced with
    HLAS reviews. Future records will be enhanced as
    new volumes are published.

Reviews Project
  • H-Net reviews: links LC catalog records to H-Net
    Reviews in the Humanities and Social Sciences,
    the online journal of H-Net: Humanities and
    Social Sciences Online. 6,173 records linked to
    the associated H-Net review to date.

The Most BEAT-ified Record
  • The Kingdom of Quito

Web Access to Works in the Public Domain
  • Many institutions are busy digitizing works in
    the public domain, making the full text of these
    items available to users.
  • This project matches the catalog record for the
    digitized items in the institution's catalog to
    the catalog record for the same edition of the
    work in LC's catalog.

Web Access to Works in the Public Domain
  • Once matched, the LC record gets 3 additions:
  • 007 relating to the digital resource
  • 530 note explaining what that resource is and
    what institution is responsible for it
  • 856 link to the meta-level description for the
    resource on the institution's server, which then
    links to the full text of the item.
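The three additions can be sketched as a small data structure. This is a rough illustration rather than LC's actual workflow or record content: the `MarcField` class, the wording of the 530 note, the `cr` value for the 007, the `41` indicators on the 856, and the example URL are all assumptions chosen to resemble common MARC 21 practice.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    """Simplified stand-in for a MARC field (hypothetical, not a library API)."""
    tag: str
    indicators: str = "  "
    subfields: list[tuple[str, str]] = field(default_factory=list)
    data: str = ""  # used only for control fields such as 007

    def __str__(self) -> str:
        if self.data:
            return f"{self.tag} {self.data}"
        subs = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {subs}"


def public_domain_additions(institution: str, description_url: str) -> list[MarcField]:
    """Build the 007, 530, and 856 additions for a matched LC record."""
    note = f"Also available in digital form on the {institution} Web site."  # assumed wording
    return [
        MarcField("007", data="cr"),  # assumed: c = electronic resource, r = remote
        MarcField("530", subfields=[("a", note)]),
        MarcField("856", indicators="41", subfields=[("u", description_url)]),
    ]


if __name__ == "__main__":
    # Placeholder institution and URL, not a real project address.
    for marc_field in public_domain_additions("Example University", "http://example.org/item/1"):
        print(marc_field)
```

The 856 here points at the institution's description page rather than the full text, mirroring the two-step link described in the slide.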

Web Access to Works in the Public Domain
  • University of Michigan's Making of America
    project (http// 1,267
    records matched, along with 78 from Cornell's
    participation in the MOA project. Sample

Web Access to Works in the Public Domain
  • Indiana University's Wright American Fiction,
    1851-1875 project (http//
    b/w/ wright2/) 653 records matched. Also, 35
    titles from Indiana University's Victorian Women
    Writers Project. This additional collaboration
    with Indiana University links to items in a
    project that aims "to produce highly accurate
    transcriptions of works by British women writers
    of the 19th century, encoded using ... SGML. The
    works ... include anthologies, novels, political
    pamphlets, religious tracts, children's books and
    volumes of poetry and verse drama."

David Williamson
dawi_at_loc.gov
202.707.5179