Planning to Maximize Longevity of Digital Information

About This Presentation
Title:

Planning to Maximize Longevity of Digital Information

Description:

Besser--Planning (Brazil) 31/5/01. 1. Planning to Maximize Longevity of Digital Information ... Besser--Planning (Brazil) 31/5/01. 6. Key Considerations for ...

Slides: 50
Provided by: gseis

Transcript and Presenter's Notes

Title: Planning to Maximize Longevity of Digital Information


1
Planning to Maximize Longevity of Digital
Information
  • Howard Besser
  • UCLA School of Education & Information
  • http://www.gseis.ucla.edu/howard

2
Planning to Maximize Longevity of Digital Info
  • Access and Preservation
  • Why are you Managing this Information?
  • Key Considerations for Imaging Projects
  • Important Planning Considerations
  • Models for Digital Collections
  • Importance of Metadata Standards
  • Digital Longevity Issues
  • More Planning Issues

3
Access and Preservation
  • Digitizing can serve both Access and
    Preservation
  • E.g. Access to digital surrogates saves wear &
    tear on originals
  • But Digitization for Access can be quite
    different from Digitization for Preservation
  • Level of detail, scanning quality, extensiveness
    of resources
  • And long-term retention of digital works is
    still an open issue

4
Why are you Managing this Information?
  • Organizational mission & type
  • Users
  • Uses

5
Key Considerations for Imaging Projects
  • Users' Needs
  • Image Quality
  • Intellectual Property
  • Standards
  • Topology
  • Tools & Processes

6
Key Considerations for Imaging Projects (1 of 3)
  • Users' Needs
  • Quality of Digital Surrogate
  • Interoperable desktop applications
  • Image Quality
  • Archival
  • Current online delivery

7
Key Considerations for Imaging Projects (2 of 3)
  • Intellectual Property
  • Standards
  • Modular and Layered Architecture
  • Terminology
  • Technical imaging information
  • Topology

8
Key Considerations for Imaging Projects (3 of 3)
  • Tools & Processes
  • Scanners
  • Compression techniques
  • Linking files
  • Workflow
  • Interoperable desktop applications

9
Some nuts-and-bolts Planning Considerations
  • Think about users (and potential users), uses,
    and type of material/collection
  • Scan at the highest quality that does not exceed
    the needs of the likely potential
    users/uses/material
  • Do not let today's delivery limitations influence
    your scanning file sizes; understand the
    difference between digital masters and derivative
    files used for delivery
  • Many documents which appear to be bitonal
    actually are better represented with greyscale
    scans
  • Include color bar and ruler in the scan
  • Use objective measurements to determine scanner
    settings (do NOT attempt to make the image look
    good on your particular monitor or use image
    processing to color correct)
  • Don't use lossy compression
  • Store in a common (standardized) file format
  • Capture as much metadata as is reasonably
    possible (including metadata about the scanning
    process itself)
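
As a rough illustration of the last few points, the sketch below records scanning metadata next to a digital master and tracks the derivative files made from it. The class name, field names, and sample values are all assumptions made for this example; none of them come from the presentation or from a particular metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ScanRecord:
    """Metadata kept with a digital master. All field names are hypothetical."""
    master_file: str
    file_format: str      # a common, standardized format
    compression: str      # the master is stored without lossy compression
    color_mode: str       # greyscale often suits pages that look bitonal
    resolution_ppi: int
    scanner: str          # metadata about the scanning process itself
    color_bar_in_scan: bool
    ruler_in_scan: bool
    derivatives: list = field(default_factory=list)

    def add_derivative(self, file_name: str, purpose: str) -> None:
        """Register a delivery file and remember which master it came from."""
        self.derivatives.append(
            {"file": file_name, "purpose": purpose, "derived_from": self.master_file}
        )


# Illustrative values only; none of these come from the presentation.
record = ScanRecord(
    master_file="page001_master.tif",
    file_format="TIFF",
    compression="none",
    color_mode="greyscale",
    resolution_ppi=600,
    scanner="hypothetical flatbed scanner",
    color_bar_in_scan=True,
    ruler_in_scan=True,
)
record.add_derivative("page001_browse.jpg", "online delivery")

# Serialize to a sidecar document that can travel with the master file.
sidecar = json.dumps(asdict(record), indent=2)
print(sidecar)
```

Keeping the derivative list inside the master's record is only one possible design; it makes the master the single place to look when a delivery file needs to be regenerated.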

10
Why Scale is important

11
Important Planning Considerations
  • File Formats
  • Choosing Interoperable Systems
  • Adhere to standards
  • Vendors with large installed base
  • Refreshing and/or Migration

12
Key problems we're facing
  • Discovery
  • Longevity
  • Interoperability

13
Serious Longevity Problems
  • What we know from prior widespread digital file
    formats
  • Images separating from their metadata
  • Inaccessibility of software needed to view an
    image
  • Inability to even decode the file format of an
    image
  • (We return to the Longevity problem later)

14
Traditional Digital Library Model

15
Ideal Digital Library Model

16
For Interoperability, Digital Libraries Need
Standards
  • Descriptive Metadata for consistent description
  • Discovery Metadata for finding
  • Administrative Metadata for viewing and
    maintaining
  • Structural Metadata for navigation
  • ... Terms & Conditions Metadata for controlling
    access...

17
Why are Standards and Metadata consensus
important?
  • Managing digital files over time
  • Longevity
  • Interoperability
  • Veracity
  • Recording in a consistent manner
  • Will give vendors incentive to create
    applications that support this

18
Why Standards?
  • Why do we need standards?
  • To make information universally available to
    users
  • To facilitate sharing and interchange of
    information
  • To preserve information (make it safe from
    changes in hardware and software)
  • Standards only work if communities widely accept
    them, but they're necessary for communities to
    work together

19
Questions to Ask
  • What communities is this standard designed for?
  • What type of information is this standard
    designed to handle?
  • What functions is this standard designed to
    serve?
  • What previous standards is it built upon?
  • Does the standard prescribe how to create new
    records (or parts of records), or how to map from
    existing records?
  • How far does the standard go? Semantics? Does it
    define element sets? Rules? Syntax?

20
Semantics/Syntax/Structure
  • Semantics
  • meaning, as defined by a community to meet their
    particular needs (DC)
  • Syntax
  • a systematic arrangement of data elements for
    machine processing
  • facilitates the exchange and use of metadata
    among various applications (HTML, XML, RDF)
  • Structure
  • a formal arrangement of the syntax with the goal
    of consistent representation of the semantics
    (rules defining field contents like 1/11/99)
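
The three layers can be seen together in a small sketch. Here the semantics come from two Dublin Core elements, the syntax is XML, and the structure is a rule for how the date field is written. A value like 1/11/99 can be read as 1 November or as 11 January; the example assumes the first reading and assumes ISO 8601 as the content rule, which is a choice made for illustration and not something the slide prescribes. The `record` wrapper element and the title text are also hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Semantics: element meanings defined by a community (Dublin Core 1.1 namespace).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Structure: a rule for field contents. Assumption for this sketch: dates are
# always written as ISO 8601 (YYYY-MM-DD) so that 1/11/99 cannot be misread.
ambiguous_input_read_as_day_month = date(1999, 11, 1)
date_value = ambiguous_input_read_as_day_month.isoformat()

# Syntax: a systematic, machine-processable arrangement of the elements (XML).
record = ET.Element("record")  # hypothetical wrapper element
title = ET.SubElement(record, f"{{{DC_NS}}}title")
title.text = "Sample photograph"  # illustrative value
dc_date = ET.SubElement(record, f"{{{DC_NS}}}date")
dc_date.text = date_value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```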

21
The Short Life of Digital Info: Digital Longevity
Problems
  • Disappearing Information
  • The Viewing Problem
  • The Scrambling Problem
  • The Inter-relation Problem
  • The Custodial Problem
  • The Translation Problem

22
The Viewing Problem
  • Digital Info requires a whole infrastructure to
    view it
  • Each piece of that infrastructure is changing at
    an incredibly rapid rate
  • How can we ever hope to deal with all the
    permutations and combinations

23
The Scrambling Problem: Dangers from
  • Compression to ease storage & delivery
  • Container Architecture to enhance digital commerce

24
The Inter-relation Problem
  • Info is increasingly inter-related to other
    info
  • How do we make our own Info persist when it
    points to and integrates with Info owned by
    others?
  • What is the boundary of a set of information (or
    even of a digital object)?

25
The Custodial Problem
  • How do we decide what to save?
  • Who should save it?
  • How should they save it?
  • methods for later access: emulation, migration,
    etc.
  • issues of authenticity and evidence

26
The Translation Problem
  • Content translated into new delivery devices
    changes meaning
  • A photo vs. a painting
  • If Info is produced originally in digital form
    in one encoded format, will it be the same when
    translated into another format?
  • Behaviors

27
Pieces of the Solution (1/2)
  • We need to insist upon clearly readable
    standardized ways for digital objects to
    self-identify their formats
  • We should discourage scrambling
  • We need to better understand how information
    inter-relates to other Info, and what constitutes
    boundaries of Info objects
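
Some widely used image formats already self-identify through a fixed signature in their first bytes. The sketch below checks a file header against a handful of such signatures. The table is deliberately incomplete, and the function name and return labels are assumptions made for this example.

```python
# Leading-byte signatures for a few common image formats.
# This table is a small illustrative subset, not a complete registry.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(header: bytes) -> str:
    """Return a format label for the given leading bytes, or "unknown"."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))


print(identify_format(b"II*\x00\x08\x00\x00\x00"))
print(identify_format(b"not an image"))
```

A file that comes back as "unknown" is exactly the case the slide warns about: the bits survive, but nothing in them says how to decode them.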

28
Pieces of the Solution (2/2)
  • People and organizations wishing to make
    information persist need guidelines of how to go
    about doing it
  • We need to better understand how translating
    from one storage or display format to another
    affects the meaning of a work
  • We need to save the behaviors of a digital
    object, not just its contents

29
Conceptual Approaches to Digital Preservation
  • Refreshing -- always necessary due to volatility
    of physical strata
  • Impact on evidential value
  • Migration -- advantages & disadvantages
  • Emulation -- advantages & disadvantages

30
Metadata can be the first line of defense
  • Can tell you
  • where the file is (if you can't find the file)
  • where more info about the file is (if you have
    the file but most other metadata has become
    separated)
  • what the file format is
  • what the compression scheme is
  • what application program and version is needed
    for the file
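
One way to read the list above is as a checklist that a metadata record either can or cannot answer. The sketch below reports which of the five questions a record leaves open. The key names and sample values are hypothetical and do not follow any particular metadata standard.

```python
# The five things the slide says metadata can tell you, as hypothetical keys.
QUESTIONS = {
    "location": "where the file is",
    "more_info": "where more info about the file is",
    "file_format": "what the file format is",
    "compression": "what the compression scheme is",
    "application": "what application program and version is needed",
}


def unanswered(metadata: dict) -> list:
    """List the questions this record cannot answer (missing or empty values)."""
    return [text for key, text in QUESTIONS.items() if not metadata.get(key)]


# Illustrative record: the format is known but the viewing software is not.
sample = {
    "location": "archive/box12/page001_master.tif",
    "more_info": "collection catalogue, entry 12/001",
    "file_format": "TIFF",
    "compression": "none",
    "application": "",
}

print(unanswered(sample))
```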

31
Groups Working on the Big Problem
http://sunsite.berkeley.edu/Longevity/
  • CPA Task Force
  • Getty Time & Bits Conference Follow-ups
  • Emulation experiments in US and Europe
  • NEDLIB, CURL, Michigan
  • Mellon-funded E-Journal Archive experiments
  • Internet Archive
  • Long Now

32
Time & Bits

33
Time & Bits Participants
  • Stewart Brand
  • Howard Besser
  • Brian Eno
  • Danny Hillis
  • Peter Lyman
  • Brewster Kahle
  • Kevin Kelly
  • Jaron Lanier
  • Doug Carlston
  • John Heilemann
  • Ben Davis
  • Margaret MacLean
  • Bruce Sterling
  • Paul Saffo

34
Groups Working on Pieces of the Big Problem
http://sunsite.berkeley.edu/Longevity/
  • Internet Archive
  • Long Now
  • Emulation experiments in US and Europe
  • NEDLIB, CURL, Michigan

35
Journal Archiving
  • License, don't own; may not even be able to
    obtain the right to make an archival copy
  • Increasingly no paper back-up at all
  • Usually we don't have the important redundancy
    factor
  • Stanford's LOCKSS Project (Lots of Copies Keep
    Stuff Safe) and its problems
    (http://lockss.stanford.edu)
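
The redundancy factor can be illustrated with a simple checksum comparison across several copies of the same content. This is only a sketch of the general idea that many copies let damage be detected; it is not the LOCKSS protocol, and the function names and site labels are assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of one copy's content."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies: dict) -> list:
    """Return the sites whose copy disagrees with the most common fingerprint.

    Assumption for this sketch: most copies are intact, so the majority
    fingerprint is treated as the correct one. With fewer than three copies,
    or with a tie, this simple vote cannot tell which copy is damaged.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, value in digests.items() if value != majority)


# Hypothetical sites holding copies of one article; one copy has a flipped byte.
copies = {
    "library-a": b"Volume 12, Issue 3, Article 7",
    "library-b": b"Volume 12, Issue 3, Article 7",
    "library-c": b"Volume 12, Issue 3, Article 9",
}

print(find_damaged(copies))
```

With only a single licensed copy held by the publisher, there is nothing to compare against, which is the situation the slide describes.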

36
Migration/Refreshing
  • Impact on evidential value

37
More Planning Issues
  • Image Families
  • Behaviors
  • Persistent Identification

38
Identification/Provenance (Images)
  • The number of variant forms of a work can be
    enormous
  • Image Families
  • A digital image frequently has many layers of
    parentage
  • Information about the parentage that can indicate
    the quality and veracity of the image (Dublin
    Core "Source" and "Relation")
  • how to deal with different versions derived from
    the same scan or different encoding schemes
  • Vocabulary Standards to express this

39
The number of variant forms of a work can be
enormous
  • different views of the same object
  • different scans of the same photo
  • different resolutions
  • different compression schemes
  • different compression ratios
  • different file storage formats
  • different details of the same image
  • ...

40
Image Families

41
Identification/Provenance
  • how to deal with different versions (browse,
    hi-res, medium res) derived from the same scan or
    different encoding schemes (TIFF, PICT, JFIF)
  • Vocabulary Standards to express this
  • VRA Surrogate Categories
  • CIMI's "Image Elements"
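
A minimal way to picture an image family is a set of versions in which each derivative names the version it was made from, in the spirit of the Dublin Core "Source" element mentioned earlier. The sketch below walks that chain back to the original scan. The class, field names, identifiers, and role labels are hypothetical; they are not taken from the VRA or CIMI vocabularies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ImageVersion:
    """One member of an image family. Field names are illustrative only."""
    identifier: str
    file_format: str               # e.g. TIFF, PICT, JFIF
    role: str                      # e.g. hi-res, medium res, browse
    source: Optional[str] = None   # identifier of the parent version, if any


def lineage(family: Dict[str, ImageVersion], identifier: str) -> List[str]:
    """Follow 'source' links from a version back to the root of its family."""
    chain: List[str] = []
    current = family.get(identifier)
    # Stop at the root, at a missing parent, or if a cycle is detected.
    while current is not None and current.identifier not in chain:
        chain.append(current.identifier)
        current = family.get(current.source) if current.source else None
    return chain


# Hypothetical family: a browse image made from a medium-res file made from the scan.
versions = [
    ImageVersion("img-001-hires", "TIFF", "hi-res"),
    ImageVersion("img-001-medium", "JFIF", "medium res", source="img-001-hires"),
    ImageVersion("img-001-browse", "JFIF", "browse", source="img-001-medium"),
]
family = {version.identifier: version for version in versions}

print(lineage(family, "img-001-browse"))
```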

42
MOA II Behaviors
  • Navigation
  • Display/Print

43
MOA II Best practices
  • Use/Users/Collection
  • Benchmarking
  • Masters vs. Derivatives
  • Scanning
  • Administrative Metadata
  • Structural Metadata

44
To deal with Immediately
  • Persistent IDs
  • Metadata

45
Persistent IDs--the Problem
  • Need to separate work ID from work location
  • URNs probably won't be ready until 2003
  • Becomes a business process issue when one
    organization maintains the resource and another
    organization references it (i.e. licensed from
    vendors or managed by separate administrative
    structures)

46
More Persistent IDs--the Approach for today
  • PURLs
  • Handles
  • HTTP redirects
  • And worry about costs now and conversion costs
    when URNs become feasible
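
All three of the interim approaches rest on the same idea: a stable identifier is looked up in a table that holds the current location, and the client is redirected there. The sketch below shows that lookup in isolation. It is not the PURL or Handle software; the identifier scheme, host names, and function name are assumptions made for the example.

```python
# Hypothetical resolver table: persistent work ID -> current location.
resolver_table = {
    "work/1234": "http://images.example.org/collection-a/1234.tif",
}


def resolve(work_id: str, table: dict) -> tuple:
    """Return an HTTP-style (status, headers) pair for a persistent ID."""
    location = table.get(work_id)
    if location is None:
        return 404, {}
    # A temporary redirect keeps clients citing the ID rather than the location.
    return 302, {"Location": location}


print(resolve("work/1234", resolver_table))

# When the resource moves, only the table changes; the cited ID stays the same.
resolver_table["work/1234"] = "http://archive.example.net/moved/1234.tif"
print(resolve("work/1234", resolver_table))
```

The table itself still has to be maintained by someone, which is where the business process issue from the previous slide comes back in.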

47
Data Set Management: More issues with referencing
IDs
  • References for mirror sites
  • References for back-up sites when main site is
    down or bottle-necked
  • References for off-site copies and archival
    copies
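
If one ID can point to several locations, a reference can be resolved by trying them in a preferred order. The sketch below assumes an ordered list of main, mirror, back-up, and archival locations and a caller-supplied availability check. The host names and the `is_up` callback are hypothetical.

```python
from typing import Callable, Iterable, Optional


def first_available(locations: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first location that the availability check accepts, else None."""
    for location in locations:
        if is_up(location):
            return location
    return None


# Hypothetical locations for one work, in order of preference.
locations = [
    "http://main.example.org/1234.tif",
    "http://mirror.example.org/1234.tif",
    "http://backup.example.org/1234.tif",
    "http://offsite-archive.example.org/1234.tif",
]

# Stand-in for a real network check: pretend the main site is down.
down = {"http://main.example.org/1234.tif"}
chosen = first_available(locations, lambda url: url not in down)
print(chosen)
```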

48
One Final Question: Who will collect the digital
works of today that should become the Special
Collections of tomorrow?
  • web sites
  • zines
  • electronic journals
  • listserv and email discussions
  • drafts of works that later become famous

49
Planning to Maximize Longevity of Digital
Information
  • Howard Besser
  • UCLA School of Education & Information
  • http://sunsite.berkeley.edu/Longevity/
  • http://www.gseis.ucla.edu/howard
  • http://sunsite.berkeley.edu/moa2
  • http://lockss.stanford.edu
  • http://www.longnow.com/10klibrary/TimeBitsDisc/
  • http://www.archive.org/