Archiving E-Mail and Public Records: Challenges, Strategies, and NARA's Electronic Records Archives Program

1
Archiving E-Mail and Public Records: Challenges,
Strategies, and NARA's Electronic Records
Archives Program
  • June 13, 2001
  • Daniel M. Jansen, ERA Staff
  • National Archives and Records Administration

2
The Issues
  • Electronic formats are replacing paper as the
    medium for communication and transactions
  • Records that are indispensable for documenting
    citizens' rights, the actions of Federal
    officials, and the nation's history will be lost
    without effective technology for preserving and
    providing access to them.
  • Electronic records, no less than those in
    traditional forms, are critical for the effective
    functioning of democracy
  • Digital technology is both necessary and
    advantageous for discovering and delivering
    records

3
NARA's Current Electronic Records Preservation
Capabilities
  • Program in existence for 30 years
  • Holdings of electronic records that have received
    full archival processing (accessioning,
    preservation, and access to software- and
    hardware-independent files) are limited to
    structured data files (fielded, fixed-length,
    comma-separated ASCII)
  • Holdings of electronic records that, for the time
    being, have received only minimal processing
    (bitstream preservation only) include a wider
    variety and a large quantity of files

4
Recent Challenges
  • Diversity (office automation, image, video, and
    audio formats)
  • Complexity (decision support systems or GIS,
    applets, and interactive WWW pages)
  • Volume (number of files and number of bytes)
  • Rapidly changing nature of systems used to create
    records

5
What do we need to do?
  • Overcome technological obsolescence in a way that
    preserves demonstrably authentic records.
  • Build a dynamic solution that incorporates the
    expectation of continuing change in information
    technology and in the records it produces.
  • Find ways to take advantage of continuing
    progress in information technology in order to
    maintain and improve both performance and
    customer service.

6
Strategies for Digital Preservation
  • Existing
      • Technology Preservation
          • Maintain original hardware and software
          • Imitate original technology
      • Data Format Migration
          • Version Migration
          • Standardize Formats
  • Emerging
      • Transformation to Persistent Form
          • Persistent Object Preservation

7
Technology Preservation Strategies
  • Includes emulation, maintaining original hardware
    and software, etc.
  • Perpetuate the problems of current technologies
  • Increase in complexity over time
  • Particularly complex when collections of records
    to be preserved are accumulated over time, or
    constitute a system rather than individual files
  • Advanced technology still required to improve
    information discovery, delivery, and management
  • In emulation, claims of authenticity derived from
    original technology are invalid
  • Ultimately, addresses technological obsolescence,
    not preservation of records

8
Format Migration Strategies
  • Includes systematic migration through versions of
    a single software package, some forms of
    standardization, etc.
  • Short-term and ad hoc solutions
  • Replacement formats may not exist in some cases
  • Even standardized products contain features that
    extend the standard or implement standards in
    different ways
  • Each migration brings risk of alteration and
    associated difficulty demonstrating integrity of
    the records
  • Complexity increases with growth of the number of
    formats
  • Ultimately, addresses technological obsolescence,
    not preservation of records
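
The integrity risk named in the migration bullets above can be made concrete. The following is a minimal sketch, assuming a hypothetical migration that does nothing more than normalize line endings in a comma-separated ASCII file. The function names and sample data are illustrative assumptions, not part of any NARA system. The checksum of the bitstream changes even though the parsed records are identical, so a byte-level digest alone cannot demonstrate that a migrated record is unaltered.

```python
import csv
import hashlib
import io


def bitstream_digest(data: bytes) -> str:
    """SHA-256 of the raw bytes (what bitstream preservation can verify)."""
    return hashlib.sha256(data).hexdigest()


def parsed_rows(data: bytes) -> list[list[str]]:
    """Parse comma-separated ASCII into rows, independent of line-ending style."""
    text = data.decode("ascii")
    return list(csv.reader(io.StringIO(text, newline="")))


def migrate_line_endings(data: bytes) -> bytes:
    """Hypothetical 'migration': convert CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


original = b"id,name\r\n1,Smith\r\n2,Jones\r\n"
migrated = migrate_line_endings(original)

# The bytes differ, so the digests differ ...
digests_match = bitstream_digest(original) == bitstream_digest(migrated)
# ... but the records, compared at the content level, are the same.
content_matches = parsed_rows(original) == parsed_rows(migrated)
```

Here `digests_match` is false and `content_matches` is true. A real migration would change far more than line endings, which is why each one needs its own content-level comparison.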

9
Transformation Strategies
  • Includes Persistent Object Preservation, an
    approach emerging from collaborative research in
    which NARA is participating
  • Does not preserve records in their original
    technological state
  • Requires precise specification of archival
    requirements related to content, context,
    structure, and presentation of records and the
    collections to which they belong
  • Currently beyond the state of the art
  • Focuses on requirements for preserving records

10
Comparison of Transformation and Other Strategies
  • Unlike format migration strategies,
    transformation approaches may minimize number of
    migrations and actively manage authenticity
  • Unlike technology preservation strategies,
    transformation may allow diverse and complex
    genres of records to be managed and made
    available regardless of the technology used to
    manage them or make them accessible
  • Unlike transformation strategies, format
    migration and technology preservation strategies
    are currently possible in limited situations,
    although sustainability of these approaches is
    questionable

11
Persistent Object Preservation (POP)
  • Application of Distributed Object Computation
    Testbed (DOCT) technologies to archival
    preservation and access (DARPA/USPTO/NARA
    sponsored work at San Diego Supercomputer Center)
  • Comprehensive, scalable,
    infrastructure-independent, flexible
  • Established in the core technologies of the next
    generation National Information Infrastructure
  • Backed by high-performance computing, high-speed
    communications, physical science, life science,
    and digital library communities
  • Implementation of Reference Model for an Open
    Archival Information System (NASA, Consultative
    Committee for Space Data Systems)

12
POP Transformation
  • Identify all significant properties of the
    classes of objects (genres of records) that are
    to be preserved.
  • Express these properties in formal models (i.e.,
    define the canonical form for each class of
    record)
  • Transform the objects and the collections to
    which they belong by
      • Encapsulating them in metadata defined in the
        formal models
      • Eliminating other technical characteristics that
        are proprietary, dependent on specific hardware
        or software, or subject to obsolescence
  • Use software mediators to enable future
    technologies to interpret the models and metadata
      • to rebuild and repopulate collections
      • to support information discovery and delivery
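
The transformation steps above can be sketched for a single e-mail message. This is a minimal illustration, assuming XML as the markup syntax (the slides do not name a specific format) and a hypothetical formal model that treats four headers and the body as the significant properties. The element names, the sample message, and the `mediate` function are assumptions made for the example, not the POP implementation.

```python
import email
import xml.etree.ElementTree as ET

RAW_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Budget meeting\n"
    "Date: Wed, 13 Jun 2001 09:00:00 -0400\n"
    "X-Mailer: ProprietaryMail 4.2\n"
    "\n"
    "The meeting is moved to Friday.\n"
)

# Hypothetical formal model for the "e-mail" genre: the significant
# properties to keep. Everything else is treated as technology-specific.
SIGNIFICANT_HEADERS = ("From", "To", "Subject", "Date")


def to_persistent_form(raw: str) -> ET.Element:
    """Encapsulate one message in metadata named by the model."""
    msg = email.message_from_string(raw)
    record = ET.Element("record", genre="e-mail")
    for name in SIGNIFICANT_HEADERS:
        prop = ET.SubElement(record, "property", name=name)
        prop.text = msg[name]
    content = ET.SubElement(record, "content")
    content.text = msg.get_payload()
    return record


def mediate(record: ET.Element) -> dict:
    """Toy 'mediator': rebuild a usable view from the model and metadata alone."""
    view = {p.get("name"): p.text for p in record.findall("property")}
    view["content"] = record.findtext("content")
    return view


record = to_persistent_form(RAW_MESSAGE)
xml_text = ET.tostring(record, encoding="unicode")

# A later system needs only the XML text and the model to rebuild the record.
view = mediate(ET.fromstring(xml_text))
```

The `X-Mailer` header, standing in for a software-dependent characteristic, does not survive the transformation, while the properties named in the model can be recovered from the XML without the original mail software.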

13
POP Assumptions
  • Content, context, structure, and presentation of
    records need to be maintained to prove
    authenticity
  • All records can be represented as objects with
    characteristics and behaviors
  • Provenance and original order of records can be
    represented by grouping objects into collections
  • For expression of canonical form, there exists an
    open, standardized, self-describing, and flexible
    modeling language, that is, a hardware- and
    software-independent method of expressing a
    complex data model
  • For transformation of the records and
    collections, POP requires an open and
    standardized syntax

14
POP Implications
  • Domain-specific semantics, for every class of
    records you wish to maintain, are necessary for
    authenticity
  • Archival processing of legacy records promises
    to be difficult
  • Full automation of processes may not be possible
  • Other preservation strategies may be required in
    some instances
  • Real opportunity for POP will be in the future, as
    commercial products incorporate standard
    markup-based data structures and communities of
    interest define domain-specific markup languages
    and semantics
  • To handle the volume of materials being generated
    and provide necessary services, POP requires
    distributed, redundant storage; distributed
    processing; distributed security; and high-speed
    communications

15
POP Demonstrations
  • Ingest and storage demonstration for several
    diverse collections of electronic records
    representing the following genres:
      • E-mail
      • Geospatial data
      • Office automation products
      • Databases
      • Images
  • Access demonstrated for a few collections
  • Usenet e-mail example: 1 million messages
    collected, ingested, stored, and made available
    for access in just over one day
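
As a rough check on the Usenet figure above, the average rate can be bounded with simple arithmetic. The only assumption is that "just over one day" means slightly more than 24 hours, so the result is an upper bound on the average end-to-end rate rather than a measured value.

```python
MESSAGES = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# The actual elapsed time exceeded one day, so the true average rate
# was somewhat below this figure.
max_rate_per_second = MESSAGES / SECONDS_PER_DAY
```

This works out to a little under 12 messages per second, averaged over collection, ingest, storage, and publication combined.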

16
What is the Electronic Records Archives?
  • The Electronic Records Archives (ERA) is NARA's
    vision for a comprehensive, systematic, and
    dynamic means of preserving and providing
    continuing access to authentic electronic records
    over time.

17
NARA's Plan to Build ERA
  • Research: Leverage, sponsor, partner in, and
    conduct research into archival issues and
    emerging technologies for risk mitigation
    purposes
  • Systems Development: Develop the ERA system with
    the most promising technologies as they become
    available in the market
  • Business Development: Address policy issues
    arising from research and systems development,
    articulate archival rules to be implemented in
    the ERA system, and facilitate organizational
    change

18
Research Activities: Partnerships
  • Open Archival Information System (OAIS) Reference
    Model
      • NASA, Consultative Committee for Space Data
        Systems
  • Distributed Object Computation Testbed (DOCT)
      • Defense Advanced Research Projects Agency, U.S.
        Patent and Trademark Office
  • National Partnership for Advanced Computational
    Infrastructure (NPACI)
      • National Science Foundation
  • Presidential Electronic Records Processing
    Operational System (PERPOS)
      • Army Research Laboratory, Georgia Tech Research
        Institute
  • Archivists Workbench
      • NHPRC Grant to San Diego Supercomputer Center
  • International Research on Permanent Authentic
    Records in Electronic Systems (InterPARES)
      • 7 international, multidisciplinary research
        teams, 10 national archives

19
Research Activities: Current Investigations
  • Ingest of heterogeneous collections (includes WWW
    sites)
  • Ingest of geospatial data/GIS
  • Demonstration of POP processes in distributed
    mode
  • Demonstration of POP security
  • Use of computational linguistics techniques and
    technologies to identify, retrieve, and extract
    information from unmanaged electronic records

20
Systems Development Activities
  • Initial planning, scheduling, and cost estimating
  • Progressive deployment of prototypes, pilots, and
    production systems that demonstrate ERA concepts
    and address current NARA requirements
      • Access to Archival Databases project
      • Presidential Electronic Records Processing
        Operational System project
      • Digital Official Military Personnel Files
        Repository project
  • Preparing for development of ERA proper

21
Business Development Activities
  • Business process analysis/development effort
    beginning in FY2002
  • Will include, at a minimum, accessioning,
    preservation and access functions
  • Also includes analysis of existing processes,
    policies, rules, procedures, and organizational
    structures in other functional areas (appraisal,
    scheduling, etc.)
  • Includes initial planning for management of
    organizational change
  • Communications

22
ERA Concept Diagrams
  • ERA Functional Model
  • ERA Architectural Model
  • ERA Design Strategy

23
ERA Functional Model: An Open Archival
Information System Implementation
  • [Diagram] The Producer sends Submission
    Information Packages to the OAIS, which holds
    Archival Information Packages; the Consumer sends
    queries and orders and receives result sets and
    Dissemination Information Packages
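
The package flow in the functional model can be sketched in a few lines. This is a toy, in-memory illustration of the three OAIS package types and the query/order exchange. The class names, fields, identifier scheme, and `ToyArchive` itself are hypothetical and are not drawn from the ERA design.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class SIP:
    """Submission Information Package, as delivered by a producer."""
    producer: str
    title: str
    payload: bytes


@dataclass(frozen=True)
class AIP:
    """Archival Information Package, as held inside the archive."""
    identifier: str
    producer: str
    title: str
    payload: bytes
    sha256: str


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package, as delivered to a consumer."""
    identifier: str
    title: str
    payload: bytes


class ToyArchive:
    """Hypothetical in-memory stand-in for the OAIS box in the diagram."""

    def __init__(self) -> None:
        self._holdings = {}

    def ingest(self, sip: SIP) -> str:
        """Turn a SIP into an AIP, recording a fixity digest at ingest."""
        digest = hashlib.sha256(sip.payload).hexdigest()
        identifier = f"aip-{len(self._holdings) + 1:04d}"
        self._holdings[identifier] = AIP(
            identifier, sip.producer, sip.title, sip.payload, digest
        )
        return identifier

    def query(self, term: str) -> list:
        """Return a result set: identifiers whose title contains the term."""
        return [
            ident for ident, aip in self._holdings.items()
            if term.lower() in aip.title.lower()
        ]

    def order(self, identifier: str) -> DIP:
        """Fill an order by deriving a DIP from the stored AIP."""
        aip = self._holdings[identifier]
        return DIP(aip.identifier, aip.title, aip.payload)


archive = ToyArchive()
aip_id = archive.ingest(SIP("Agency X", "Usenet e-mail sample", b"message bytes"))
results = archive.query("usenet")
dip = archive.order(results[0])
```

The consumer never touches the AIP directly: it sees a result set from `query` and a DIP from `order`, which mirrors the separation the diagram draws between what the archive holds and what it disseminates.
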
24
ERA Infrastructure Concept
  • [Diagram] Infrastructure labels: Gb/sec Internet
    Grid Security; Distributed Processing; Mediation
    among Systems; Distributed, redundant Storage;
    Infrastructure Independence
  • [Diagram] Components and actors shown: Records
    Creator, Workbench, Trusted Repository, Digital
    Library, Public User, Government User, NARA User
25
ERA Design Strategy
  • [Diagram] Labels: NARA System; ERA Framework;
    Information Technology Architecture for
    Persistent Digital Collections
26
ERA Program Schedule
  • Research will continue throughout in order to
    accumulate knowledge, experience, and metrics
  • Develop primary system(s) at the point where
    enabling technologies mature and are available on
    the market; estimated as a 5-10 [year] project to
    deploy primary capability
  • Prototypes, pilots, and operational components
    rolled out annually over next few years

27
For more information
  • http://www.nara.gov/era
  • Dan Jansen
  • (301) 713-6730x285
  • dan.jansen_at_nara.gov