Digital Preservation - Outline - PowerPoint PPT Presentation

Slides: 33
Provided by: RonaldC155



Title: Digital Preservation - Outline

Digital Preservation - Outline
  • Introduction - Definitions, Facts, Challenges
  • Digital Archiving: A Life Cycle View
  • Metadata
  • Strategies
  • RUL Projects
  • Trusted Digital Repositories

Digital Dark Ages?
  • As we move into the electronic era of digital
    objects it is important to know that there are
    new barbarians at the gate and that we are moving
    into an era where much of what we know today,
    much of what is coded and written electronically,
    will be lost forever. We are, to my mind, living
    in the midst of digital Dark Ages; consequently,
    much as monks of times past, it falls to
    librarians and archivists to hold to the
    tradition which reveres history and the published
    heritage of our times. (Kuny, 1998)

  • The urge to preserve is endemic to our roles as
  • The patent office, home to nearly 6.5 million
    patents dating to 1790, is converting to an
    electronic database and discarding a significant
    portion of its paper files after they have been
    scanned and digitized. -Mitchell, A. (2001).
    Ingenuity's Blueprints, Into History's Dustbin.
    NY Times. December 30, 2001, p. A1.
  • A scenario: A truck loaded with hazardous waste
    is headed toward a dump site. Will our
    descendants know where we have buried the waste?
    (Bide, et al, 1999)

Digital Preservation: Some Numbers
  • $20 trillion loss of information expected over
    the next 20 years (Lysakowski and Leibowitz, 2000)
  • Within 10 years, the total number of electronic
    records could be doubling every 60 minutes.
  • From an economic model, the cost of converting
    from MS-Office95 to Office97 is estimated at
    711,110 work years.
  • 80 Million books in the US are rapidly
  • Yale University states that 80% of their
    collection is endangered.
  • Print material
  • All print material (ascii text) published in the
    world each year could be stored in about 5
  • Images
  • Over 80 billion photographs are taken each year
    which would take 400 petabytes to store.

Numbers continued (from http//www.ccsf.caltech
  • Megabyte: one million bytes
  • Gigabyte: 1000 megabytes
  • Terabyte: 1000 gigabytes
  • 10 terabytes: the printed collection of the US
    Library of Congress
  • Petabyte: 1000 terabytes
  • 2 petabytes: all the material in US academic
    research libraries
  • Exabyte: 1000 petabytes
  • 5 exabytes: all words ever spoken by human beings
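
The unit ladder above can be checked with a little arithmetic. The sketch below is an illustration only: it uses the decimal units as the slide defines them and the photograph figures quoted on the previous slide (80 billion photographs, 400 petabytes). The helper name bytes_per_item is made up for this example.

```python
# Decimal storage units as defined on the slide (each step is a factor of 1000).
MEGABYTE = 10**6
GIGABYTE = 1000 * MEGABYTE
TERABYTE = 1000 * GIGABYTE
PETABYTE = 1000 * TERABYTE
EXABYTE = 1000 * PETABYTE


def bytes_per_item(total_bytes: int, item_count: int) -> float:
    """Average size of one item, given a total volume and an item count."""
    return total_bytes / item_count


# Figures quoted on the previous slide: 80 billion photographs, 400 petabytes.
photos_per_year = 80 * 10**9
photo_storage = 400 * PETABYTE
avg_photo = bytes_per_item(photo_storage, photos_per_year)
print(f"Implied average photograph size: {avg_photo / MEGABYTE:.1f} MB")

# How many 10 TB collections (the slide's Library of Congress figure) fit
# into the 2 PB quoted for US academic research libraries?
collections = (2 * PETABYTE) // (10 * TERABYTE)
print(f"2 PB holds {collections} collections of 10 TB each")
```

The photograph figures imply an average of about 5 MB per image, which is a quick sanity check on the quoted totals.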

Preservation in Digital Libraries
  • Preservation: The managerial, financial, and
    technical issues involved in preserving library
    (or archive) materials in all formats - and/or
    their information content - so as to maximize
    their useful life (Eden, 1997)
  • Digital preservation is defined as the managed
    activities necessary for ensuring
  • 1. The long term maintenance of a byte stream and
  • 2. Continued accessibility of the contents thru
    time and changing technology.
  • Digital Libraries vs. Digital Archives: Archives
    make a commitment to long-term preservation of
    digital information. (Joint Task Force on
    Digital Archiving)
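
The first goal above, long-term maintenance of a byte stream, is commonly monitored with fixity checks. The following is a minimal sketch, assuming SHA-256 as the digest; the slides do not name an algorithm, and the function names are hypothetical. It says nothing about the second goal: an intact byte stream can still be unreadable if the format is obsolete.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest to store alongside the archived bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """True if the bytes still match the digest recorded at ingest time."""
    return fixity_digest(data) == recorded_digest


# Record a digest when the object is first archived.
original = b"Digital Preservation - Outline"
recorded = fixity_digest(original)

# Later audits recompute the digest and compare it with the recorded one.
unchanged_ok = verify_fixity(original, recorded)

# A single altered character is enough to fail the check.
corrupted = original.replace(b"Outline", b"Outl1ne")
corrupted_ok = verify_fixity(corrupted, recorded)

print(unchanged_ok, corrupted_ok)
```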

Why Would You Digitally Preserve?
  • Protect original print artifact
  • Provide access by accurately representing
  • Preserve material that exists in electronic form
  • Enhance research by improving originals
  • High resolution imagery to study details
  • Searchable text

The Challenges of Digital Preservation
  • Lack of standards (or too many standards)
  • Lack of documentation on production and use
  • Cost and rapid obsolescence of technology
  • Impermanence of the medium
  • Mutability of the content (easily changed legal
  • Version control
  • Need to guarantee integrity of digital
  • Migration of information (driven by external

What to Archive: A Checklist
  • Historical and research value
  • Aesthetic and artistic merit
  • Uniqueness of an item
  • Subject content relevant to Institution
  • Access Restrictions and inventory
  • Condition
  • Frequency of use; frequency of change
  • Ownership
  • Redundancy: concern for loss or modification
  • Length of preservation
  • Is any other institution archiving the material?

Candidates for Preservation
  • Material created (not digitized) in digital
  • Reference databases (online catalogs, subject
    specific indexes, etc)
  • Electronic journals
  • Digital maps
  • Data
  • Websites (e.g. research guides, web-based
    databases, documents)
  • Government information
  • Census data, international statistics (Do we rely
    on the government to preserve this material?)
  • Consortiums such as Inter-university Consortium
    for Political and Social Research (ICPSR) have a
  • Print material/manuscripts that are digitized for
    access and/or preservation
  • Original documents not retained (e.g. as in the
    NJ Environmental Digital Library)
  • Original document retained (as in Special
  • Electronic (analog) media that is digitized
    (audio, video tapes)

Digital Archiving: A Life Cycle View
  • Creation
  • Acquisition and Collection Development
  • Identification and Cataloging
  • Storage
  • Preservation (incl. Metadata)
  • Access
  • from (Hodge, 2000)

Digital Preservation Strategies
  • Migration: transferring digital materials from
    one medium or format to another because of
    obsolescence, failure in media, software updates,
    standards, etc.
  • Emulation: refers to the process of mimicking, in
    software, a piece of hardware or software so that
    other processes think the original
    equipment/function is still available in its
    original form. (http//
  • Encapsulation: A technique of grouping together a
    digital object and anything else necessary to
    provide access to that object. This technique
    aims to overcome the problems of the
    technological obsolescence of file formats
    because the details of how to interpret the
    digital bits in the object can be part of the
    encapsulated information. (http//
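
Encapsulation as described above can be sketched in a few lines. This is an illustration under stated assumptions, not a standard packaging format: the ZIP container, the member names (object/, interpretation.json), the function names, and the sample CSV row are all made up for the example.

```python
import io
import json
import zipfile
from typing import Dict, Tuple


def encapsulate(name: str, payload: bytes, interpretation: dict) -> bytes:
    """Bundle a digital object with the information needed to interpret it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(f"object/{name}", payload)
        bundle.writestr("interpretation.json", json.dumps(interpretation, indent=2))
    return buffer.getvalue()


def open_capsule(capsule: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Return the interpretation record and the bundled objects."""
    with zipfile.ZipFile(io.BytesIO(capsule)) as bundle:
        interpretation = json.loads(bundle.read("interpretation.json"))
        objects = {
            member: bundle.read(member)
            for member in bundle.namelist()
            if member.startswith("object/")
        }
    return interpretation, objects


# Hypothetical sample object: one made-up row of tabular data.
payload = b"year,commodity,price\n1350,wheat,6\n"
capsule = encapsulate(
    "prices.csv",
    payload,
    {"format": "CSV", "encoding": "ascii", "columns": ["year", "commodity", "price"]},
)
info, objects = open_capsule(capsule)
print(info["format"], sorted(objects))
```

The point of the design is that the description of how to read the bits travels with the bits, so a future reader does not depend on an external registry to learn that the object is ASCII-encoded CSV.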

Migration of Digital Information
  • Reasons for Migration
  • Medium refreshing (e.g. rewrite a CD)
  • Medium conversion (diskette to CD)
  • Format conversion (ascii to pdf)
  • Version upgrade (Office97 to Office2000)
  • Migration of technical environment (W98 to NT)
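
A format conversion can be illustrated with a text re-encoding, used here as a simpler stand-in for the slide's ascii-to-pdf example. The sketch assumes a Latin-1 source migrated to UTF-8; the function names and sample text are hypothetical. The verification step matters because the migrated byte stream differs from the original, so a byte-for-byte comparison cannot confirm that the content survived.

```python
def migrate_text(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode text; raises UnicodeError if the content cannot be represented."""
    text = data.decode(source_encoding)
    return text.encode(target_encoding)


def verify_migration(
    original: bytes, migrated: bytes, source_encoding: str, target_encoding: str
) -> bool:
    """Check that both versions decode to the same text."""
    return original.decode(source_encoding) == migrated.decode(target_encoding)


# Hypothetical legacy content containing one non-ASCII character (e with acute).
legacy = "Caf\u00e9 price list".encode("latin-1")
migrated = migrate_text(legacy, "latin-1", "utf-8")

content_preserved = verify_migration(legacy, migrated, "latin-1", "utf-8")
bytes_changed = legacy != migrated
print(content_preserved, bytes_changed)
```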

The Migration Process
  • Error Prone
  • Labor intensive and expensive
  • Governed by external factors
  • The only approach that works for now

RUL Projects: A Sampling
  • Medieval and Early Modern Data Bank
  • Eagleton Public Opinion Polls
  • The Augustine Collection
  • REALITI: A Digital Preservation Framework

Medieval and Early Modern Data Bank - MEMDB
  • Characteristics
  • At http//
  • Content: commodity prices in the medieval period
  • Access: public domain
  • Compiler: Co-directors of MEMDB
  • Owner: RUL?
  • Archiver: (who should archive?)
  • Type: Database on the web
  • Format: html, Active Server Pages, MS-Access,
  • Metadata requirements: numeric data
  • Questions: What is the primary document? How long
    should it be preserved? Extent of document?
    Owner? Preserve look and feel?

Eagleton Public Opinion Polls
  • Characteristics
  • At http//
  • Content: New Jersey public opinion (1970 - )
  • Access: public domain
  • Compiler: Eagleton Institute
  • Owner: Eagleton/Star Ledger
  • Archiver: RUL/Scholarly Communication Center
  • Type: database on the Web
  • Format: html, MS-Access, portable spss files
  • Metadata requirements: questionnaires, numeric data
  • Questions: Preserve look and feel, spss
    (proprietary software)

The Augustine Collection
  • Characteristics
  • At http//
  • Content: Photographs from 19th Century New Jersey
  • Access: public domain
  • Compiler: William F. Augustine
  • Owner: RUL Special Collections
  • Archiver: RUL Special Collections
  • Type: image archive
  • Format: html, jpeg
  • Metadata requirements: original artifacts
  • Questions: image format, preserve digital
    archive, individual items/collection

REALITI: A Digital Preservation Framework
(Rutgers Electronic Access to Library
Information thru Technology Integration)
  • Characteristics
  • At http//
  • Content: Civil War period in New Jersey
  • Access: public domain
  • Compiler: RUL Special Collections
  • Owner: RUL
  • Archiver: RUL Special Collections/SCC
  • Type: Images on the Web
  • Format: html, ColdFusion, MS-Access, PDF,
  • Metadata: preservation, multiple formats
  • Questions: formats, compression, metadata,
    original artifact

Preservation Metadata for Digital Collections
  • Collection Level
  • Persistent identifier
  • Date of creation
  • Structural type (e.g. ascii text, jpeg images,
  • Technical infrastructure files, databases, html,
  • File description
  • System requirements
  • Installation requirements
  • Storage information
  • Access inhibitors
  • Access facilitators
  • Preservation action permission
  • Validation (information about validation
  • Relationships (to other objects)
  • (continued)
  • Quirks (any characteristic that may cause loss
    in functionality)
  • Archiving decision (work)
  • Decision reason (work)
  • Institution responsible for archiving decision
  • Archiving decision (manifestation)
  • Decision reason (manifestation)
  • Institution Responsible for Archiving Decision
  • Intention Type
  • Institution with preservation responsibility
  • Process
  • Record Creator
  • Other

(from National Library of Australia, http// )
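
A few of the collection-level elements listed above can be held in a simple record structure. The sketch below is an assumption-laden illustration, not the National Library of Australia schema: the field selection, the choice of which fields count as required, the class name, and the identifier value are all hypothetical. The structural type and responsible institution are taken from the Augustine Collection slide.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PreservationRecord:
    """A subset of the collection-level elements named on the slide."""

    persistent_identifier: str
    date_of_creation: str
    structural_type: str
    file_description: str = ""
    system_requirements: List[str] = field(default_factory=list)
    access_inhibitors: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)
    quirks: List[str] = field(default_factory=list)
    archiving_decision: Optional[str] = None
    decision_reason: Optional[str] = None
    institution_with_preservation_responsibility: Optional[str] = None

    def missing_required(self) -> List[str]:
        """Names of required fields that are empty (required set is an assumption)."""
        required = ("persistent_identifier", "date_of_creation", "structural_type")
        return [name for name in required if not getattr(self, name)]


record = PreservationRecord(
    persistent_identifier="example:augustine-collection",  # hypothetical identifier
    date_of_creation="",  # left empty: the slides do not give a creation date
    structural_type="jpeg images",
    institution_with_preservation_responsibility="RUL Special Collections",
)
missing = record.missing_required()
serialized = json.dumps(asdict(record), indent=2)
print(missing)
```

Serializing the record to JSON keeps the metadata readable without the software that produced it, which is itself a small preservation decision.
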
Trusted Digital Repositories (http//
  • A Proposed Framework for a Trusted Archival
  • Administrative: adherence to agreed upon
  • Organizational: commitment to management on
    behalf of depositors
  • Financial: adherence to good business practices
  • Technological: infrastructure in place with
    upgrade policies
  • Security: policies for security, auditability,
    and backup
  • Procedural: Repository practices will be in
    place and documented.

Possible Organizational Models (Who might be a
digital repository?)
  • Originators (e.g. individual researchers)
  • Publishers (What happens when the publisher goes
    out of business?)
  • Libraries, museums, and other conservator
  • National libraries and archives
  • Cooperative service agencies (e.g. OCLC, RLG,
    ICPSR for social science research)
  • Segmented market providers (e.g. Bell & Howell
    for preserving dissertation literature and Early
    English Books)
  • Private storage providers
  • Computer centers
  • Scholarly associations (e.g. American Institute
    of Physics)
  • Indexing and abstracting services
  • Certified digital archives.

Institutional Efforts
  • OCLC and Web Document Digital Archive (WDDA)
  • Tools for libraries and archives to preserve and
    maintain access to digital content
  • At http//
  • RLG Cultural Materials
  • Cultural Materials is being developed through
    members to set the conditions for contributing
    and distributing their digital surrogates of
    valuable collections.
  • The goal is a growing, significant, online
    resource and service solution.
  • At http//
  • LOCKSS: A permanent web publishing and access
  • Addresses problem of material no longer available
    from the publisher
  • Modeled on distributed print libraries (Reich et
    al., 2001, D-Lib Magazine, 7(6)).
  • OAIS: Open Archival Information System Reference
  • Requirements for any system responsible for
    preserving any type of information over a long
  • At http//

Digital Preservation: Concluding Thoughts
  • Librarians and archivists are a key to the
  • A major academic scandal will have to happen
    first . . . in order to focus attention and
    resources. (Graham, 2000).
  • A combination of solutions will be employed
    including migration and emulation.
  • Digital archaeology will be used to recover
    lost data.

Preservation Resources on the Web
  • Institutional Issues
  • ARL Preservation Program (http//
  • Digital Preservation Needs and Requirements in
    RLG Member Institutions (http//
  • RLG DigiNews (http//
  • Technical Information/Papers
  • Avoiding Technological Quicksand
  • PADI - Preserving Access to Digital Information -
    from the National Library of Australia
  • Background Papers and Technical Information -
    from LOC American Memory site
  • Preservation of electronic information - a
    bibliography (http//
  • Digital Imaging Tutorial - http//www.library.corn

More Information on the Web
  • Technical Information/Papers (continued)
  • CLIR Publications (http//
  • Kuny, T. (1998/May). The digital dark ages?
    Challenges in the preservation of electronic
    information. International Preservation News,
    (17). At http//
  • Hodge, G. M. (2000). Best practices for digital
    archiving: An information life cycle approach.
    D-Lib Magazine, 6(1), available at
  • Handbooks
  • Hunter, G. S. (2000). Preserving Digital
    Information: A How-To-Do-It Manual. New York:
    Neal-Schuman Publishers.
  • Sitts, M. K. (2000). Handbook for Digital
    Projects: A Management Tool for Preservation and
    Access. Andover, Massachusetts: Northeast
    Document Conservation Center.

  • Bide, M., Potter, E., & Watkinson, A. (1999).
    Digital Preservation: an introduction to the
    standards issues surrounding the deposit of
    non-print publications. At
  • Graham, P. (2000). RLG and Archiving at the
    heart of the research library mission. RLG News,
    Winter 2000, (50), pp. 12-13.
  • Graham, P. (1998/February). Digital strategies
    for the Rutgers University Libraries: a white
    paper draft. DRAFT 4.
  • Hedstrom, M., & Montgomery, S. (1998). Digital
    Preservation Needs and Requirements in RLG Member
    Institutions: A Study Commissioned by the
    Research Libraries Group. Available at
  • Hodge, G. (2000). Best practices for digital
    archiving: An information life cycle approach.
    D-Lib Magazine, 6(1). Available at
  • Lysakowski, R., & Leibowitz, Z. (2000). Looming
    information age crisis expected to cause
    trillion-dollar losses over the next 20 years:
    Titanic 2020, a call to action. Available at
  • Rothenberg, J. (1998/January). Avoiding
    Technological Quicksand: Finding a Viable
    Foundation for Digital Preservation. Available
    at http//

Migration Complexity of the Technical

Preservation in Digital Libraries
  • Preservation: The managerial, financial, and
    technical issues involved in preserving library
    (or archive) materials in all formats - and/or
    their information content - so as to maximize
    their useful life (Eden, 1997)
  • Digital preservation: The term refers exclusively
    to the preservation (whatever exactly that
    entails) of material which is available solely?
    in electronic form (Bide, 1999).
  • And the digital version is considered to be the
    primary archival item. (Hedstrom, 1998)
  • Digital Libraries vs. Digital Archives: Archives
    make a commitment to long-term preservation of
    digital information. (Joint Task Force on
    Digital Archiving)

Digital Archiving - Getting Started
  • Form an archiving working group
  • Prepare a preliminary policy statement
  • Trial the policy statement with several small,
    existing projects
  • Examine what others are doing and bring in best
  • Collaborate with others who are interested in
    digital preservation.
  • Initiate forums on digital archiving; invite
    colleagues, students, researchers, etc.
  • Submit a recommendation for a digital archiving
    program and next steps.

Trusted Digital Repositories (http//
  • A Proposed Definition (from RLG document)
  • Technology Infrastructure
  • Auditability, security, and communication
  • Backup policies incl. avoiding, detecting and
    restoring corrupted data
  • Organization
  • Certification
  • Compliance
  • Reputation and performance
  • Agreements between creators and providers
  • Open sharing of what is being preserved and for
  • Balanced risk, benefit, and cost