MetaArchive Distributed Digital Preservation Workshop

1
MetaArchive Distributed Digital Preservation Workshop
  • Wednesday, May 30, 2007
  • Robert W. Woodruff Library
  • Emory University
  • Atlanta, Georgia

2
Day One Overview
  • 8:30 AM - 9:00 AM Light Breakfast and Welcome
  • 9:00 AM - 10:30 AM Session 1. Overview of
    Distributed Digital Preservation Networks, M.
    Halbert
  • 10:30 AM - 10:45 AM Break
  • 10:45 AM - 12:15 PM Session 2. Content
    Management, C. Jannik and G. MacMillan
  • 12:15 PM - 1:15 PM Lunch
  • 1:15 PM - 2:45 PM Session 3. Costs and
    Operational Considerations, M. Halbert and K.
    Skinner
  • 2:45 PM - 3:00 PM Break
  • 3:00 PM - 4:30 PM Session 4. Organizational
    Agreements, D. Buttler and K. Skinner
  • 4:30 PM - 4:45 PM Wrap Up

3
Purposes of this Workshop
  • Foster discussion concerning distributed digital
    preservation strategies
  • Share information and perspectives acquired in
    the course of the MetaArchive NDIIPP project
  • Provide information and training for institutions
    seeking to build or join distributed digital
    preservation networks based on the LOCKSS
    software.

4
Introductions: Who We All Are
  • Please introduce yourself
  • Say where you are from
  • Mention any particular things that you hope to
    get out of this workshop, and any other
    expectations you may have
  • Identify any particular topics you hope we will
    spend time discussing

5
Learning Objectives for this Session
  • Review day one workshop sessions
  • Overview of some digital preservation basics
  • Reasons to establish or join a network
  • Models of network organization
  • Defining partner/member responsibilities
  • Overview of MetaArchive and LOCKSS

6
Overview of Some Digital Preservation Basics
7
The New Field of Digital Preservation
  • Cultural heritage organizations are rapidly
    expanding their digitization programs in an
    effort to provide better access to collections.
    As these digitization efforts go forward, and as
    an increasing number of born-digital acquisitions
    are made, there are concomitant needs for
    preservation of these materials.
  • The DigCCurr 2007 Conference was hosted in April
    2007 by the School of Information and Library
    Science at the University of North Carolina at
    Chapel Hill in an explicit effort to define the
    new field of Digital Curation.
  • The Consultative Committee for Space Data Systems
    has, of necessity, created many working standards
    for preservation of digital information. One of
    the most notable standards was the Reference
    Model for an Open Archival Information System
    (OAIS), which provided a broad vocabulary for
    discussing digital archives systems and processes.
  • The National Digital Information Infrastructure
    and Preservation Program (NDIIPP) is the
    congressionally chartered national program to
    digitally preserve our national heritage.
  • The Digital Preservation Management Workshop
    hosted by Cornell University from 2003-2006 was
    an effort to collate and share relevant best
    practices and documentation from a large number
    of emerging projects and efforts related to
    digital preservation.
  • In the UK, groups such as the Digital Curation
    Centre and the Digital Preservation Coalition
    have been formed to foster joint action to
    address the urgent challenges of securing the
    preservation of digital resources in the UK and
    to work with others internationally to secure our
    global digital memory and knowledge base.

8
The Data Loss Problem
9
The Data Loss Problem (cont.)
10
The Data Loss Problem (cont.)
11
The Data Loss Problem (cont.)
12
The Data Loss Problem (cont.)
From the NDIIPP Website on the Importance of Digital
Preservation (http://www.digitalpreservation.gov/importance/)

13
National Digital Information Infrastructure and
Preservation Program (NDIIPP) Commentary
  • Technology has so altered our world that most of
    what we now create begins life in a digital
    format.
  • The artifacts that tell the stories of our lives
    no longer reside in a trunk in the attic, but on
    personal computers or Web sites, in e-mails or on
    digital photo and film cards.
  • The flip side to the ease with which we are able
    to create digital content is the complexity of
    preservation and long-term retrieval of this
    content.
  • We must contend with issues relating to hardware
    and software compatibility, long-term storage,
    organization of files for ease of search and
    retrieval, media quality, disaster recovery, and
    integrity of original data.

14
Making Our Digital Heritage a Top Priority
  • When we consider the ways in which the American
    story has been conveyed to the nation, we think
    of items such as the Declaration of Independence,
    Depression-era photographs, television
    transmission of the lunar landing and audio of
    Martin Luther King's "I Have a Dream" speech.
    Each of these is physically preserved and
    maintained according to the properties of the
    physical media on which they were created. Yet,
    how will we preserve these essential pieces of
    our heritage?
  • Web sites as they existed in the days following
    Sept. 11, 2001, or Hurricane Katrina?
  • What about Web sites developed during the
    national elections?
  • Executive correspondence generated via e-mail?
  • Web sites dedicated to political, social and
    economic analyses?
  • Data generated via geographical information
    systems, rather than physical maps?
  • Digitally recorded music or video recordings?
  • Web sites that feature personal information such
    as videos or photographs?
  • Social networking sites?
  • Should these be at a greater risk of loss, simply
    because they are not tangible?
  • The content of digital archives at cultural
    heritage institutions, created with scarce
    resources in a time of great change

15
The Gap in Digital Preservation Programs
  • 66% of cultural heritage institutions (academic
    libraries, archives, art museums, public
    libraries, and other similar kinds of
    institutions) report that no one is responsible
    for digital preservation activities
  • 30% of all archives have been backed up one time
    or not at all

Source: 2005 NEDCC Survey by Bishoff and Clareson

16
Reasons to Establish or Join a DDP Network
17
Backups versus Digital Preservation
  • What differentiates a schedule for data backups
    from a digital preservation program?
  • Backups are tactical measures. Backups are
    typically stored in a single location (often
    nearby or collocated with the servers backed up)
    and are performed only periodically. Backups are
    designed to address short-term data loss via
    minimal investment of money and staff time
    resources. Backups are better than nothing, but
    not a comprehensive solution to the problem of
    preserving information over time.
  • Digital preservation is strategic. A digital
    preservation program entails a geographically
    dispersed set of secure caches of critical
    information. A true digital preservation program
    will require multi-institutional collaboration
    and at least some ongoing investment to
    realistically address the issues involved in
    preserving information over time.

18
What is Digital Preservation?
  • Digital Preservation refers to the management of
    digital information over time.
  • Unlike the preservation of paper or microfilm,
    the preservation of digital information demands
    ongoing attention. This constant input of effort,
    time, and money to handle rapid technological and
    organisational advance is considered the main
    stumbling block for preserving digital
    information beyond a couple of years.
  • Digital preservation can therefore be seen as the
    set of processes and activities that ensure the
    continued access to information and all kinds of
    records, scientific and cultural heritage
    existing in digital formats.

http://en.wikipedia.org/wiki/Digital_preservation

19
Secure and Distributed Cache Networks
  • Why are the characteristics of geographic
    distribution and security so important? This
    strategy maximizes survivability of content in
    both individual and collective terms.
  • Security reduces the likelihood that any single
    cache will be compromised.
  • Distribution reduces the likelihood that the loss
    of any single cache will lead to a loss of the
    preserved content.
  • By creating a collaborative network for secure
    and distributed preservation, a group can also
    work together on more complex issues such as
    format migration.
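
The survivability argument on this slide can be made concrete with a small sketch. It assumes, purely for illustration, that each cache is lost independently and with the same probability over some period. Real failures are often correlated (shared software, shared region), which is part of why geographic dispersion matters. The function name and the numbers are hypothetical and do not come from the workshop.

```python
def prob_content_lost(p_cache_loss: float, n_caches: int) -> float:
    """Probability that every copy is lost, assuming independent cache failures."""
    if not 0.0 <= p_cache_loss <= 1.0:
        raise ValueError("p_cache_loss must be between 0 and 1")
    if n_caches < 1:
        raise ValueError("n_caches must be at least 1")
    # Content is lost only if all n caches are lost at once.
    return p_cache_loss ** n_caches


# Made-up per-cache loss probability, for illustration only.
for n in (1, 3, 6):
    print(n, prob_content_lost(0.1, n))
```

Under these assumptions, each additional independent cache multiplies the chance of total loss by the per-cache loss probability, so even a modest number of well-separated caches lowers the risk sharply.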

20
Case Study from the Chirographic (Handwritten)
Era: The Nag Hammâdi Library
  • Collection of early Coptic texts discovered near
    the town of Nag Hammâdi in 1945
  • Had been buried in the 4th Century CE when
    censored
  • Only extant copies of core early Gnostic
    scholarship
  • Survived 15 centuries because they were part of a
    secure, distributed chirographic network

21
Shared Archiving Fails without a Pre-coordinated
Digital Preservation Network in Place
  • The NDIIPP Archive Ingest and Handling Test
    (AIHT)
  • Designed to document methods for preserving
    digital cultural materials and identify areas
    that require further research
  • Participants tested five different preservation
    systems
  • Encountered many unexpected incompatibilities
    because of different systems
  • Realization that much of the cost in preserving
    digital material is in coordinating the
    organizational and institutional imperatives of
    preservation, and not the technological costs of
    storage space

22
Both Technical Networking and Organizational
Networking are Required
  • A single cultural heritage organization is
    unlikely to have the capability to operate
    several geographically dispersed and securely
    maintained servers
  • Collaboration between institutions on
    technological solutions is essential
  • Similarly, inter-institutional agreements must be
    put in place or there will be no commitment to
    act in concert over time
  • "The increased number and diversity of those
    concerned with digital preservation, coupled with
    the current general scarcity of resources for
    preservation infrastructure, suggests that new
    collaborative relationships that cross
    institutional and sector boundaries could provide
    important and promising ways to deal with the
    data preservation challenge. These
    collaborations could potentially help spread the
    burden of preservation, create economies of scale
    needed to support it, and mitigate the risks of
    data loss."
  • Source: The Need for Formalized Trust in Digital
    Repository Collaborative Infrastructure,
    NSF/JISC Repositories Workshop (April 16,
    2007)

23
Defining Partner/Member Responsibilities
24
Institutional and Consortial Roles
  • Preservation Sites are entities responsible for
    the ongoing activity of preserving digital
    content. At a minimum, every preservation site
    must include responsible staff and a node server
    of the relevant preservation network.
    Preservation sites collectively comprise a
    preservation network.
  • Development Sites are responsible for technical
    development of the computer systems that enable
    the preservation network. Obviously, development
    sites may also be preservation sites and/or
    contributing sites.
  • A Preservation Network is composed of all
    preservation sites that work together to preserve
    at-risk digital content.
  • Contributing (Content) Sites are institutions
    that need to preserve digital content, and
    therefore decide to contribute digital content
    into the preservation network. The preservation
    network acts for the common good to preserve the
    at-risk content submitted by the contributing
    sites. Contributing sites may also be
    preservation sites.

25
Individual Roles
  • Selectors are staff that identify and prioritize
    content to be preserved. They will most often be
    knowledgeable concerning the content of an
    institution's digital archives, and may have been
    the same individuals that originally created or
    acquired the archives.
  • System Administrators are staff members that
    maintain individual preservation node servers of
    the relevant preservation network.
  • Data Wranglers are programmers and other
    technically adept workers that prepare local
    digital archives for ingestion into a
    preservation network.
  • Program Managers are leaders that accept
    responsibility for coordinating the activities of
    a digital preservation network.
  • NOTE: All of the above roles may overlap in
    creative ways!

26
Models of Network Organization
  • Different Ways of Creating or Joining Digital
    Preservation Networks

27
Dedicated Network
  • Create a Dedicated Preservation Network
  • Provides the greatest organizational control
  • You can set up the rules for the network
  • Requires greatest up-front investment to
    implement

28
Strategic Alliance
  • Build onto an Existing Preservation Network
  • Takes advantage of previous investments by others
  • Requires understanding the rules of existing
    network and abiding by them
  • Still requires capital investment in
    infrastructure

29
Piggyback Ride
  • Arrange Contribution Strategy to an Existing
    Preservation Network
  • No capital investment in infrastructure required
  • Maximum advantage from previous investments by
    others
  • Requires abiding by rules of existing network
  • Requires convincing the existing network to
    preserve your stuff; will likely entail fees

30
Network Security Factors
  • What level of security and control over access to
    your data do you need?
  • Do you have sensitive assets that require access
    controls? If so, you may need a dedicated
    network in which you control access to the
    preservation nodes, or at least be able to join a
    network which provides such access assurances.
  • Do you have some flexibility in adapting to other
    infrastructures and security policies? If so, it
    may be simplest to join and build your
    preservation nodes onto an existing network. The
    requirements may be readily acceptable.
  • Do you have relaxed or no security/access
    expectations? If so, you may simply want to
    piggyback off an existing network and depend on
    their good graces.
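
The three questions on this slide can be read as a rough decision procedure. The sketch below is one simplified reading, not a rule stated by the presenters: the enum, the function name, and the two boolean inputs are hypothetical, and it ignores the slide's caveat that sensitive assets could also be served by joining a network that provides access assurances.

```python
from enum import Enum


class NetworkModel(Enum):
    DEDICATED = "dedicated network"
    STRATEGIC_ALLIANCE = "strategic alliance"
    PIGGYBACK = "piggyback ride"


def suggest_model(has_sensitive_assets: bool,
                  has_relaxed_expectations: bool) -> NetworkModel:
    """Map the slide's security questions to a network model (simplified)."""
    if has_sensitive_assets:
        # Access controls needed: control the preservation nodes yourself.
        return NetworkModel.DEDICATED
    if has_relaxed_expectations:
        # Few or no security expectations: contribute to an existing network.
        return NetworkModel.PIGGYBACK
    # Some flexibility: build your nodes onto an existing network.
    return NetworkModel.STRATEGIC_ALLIANCE


print(suggest_model(has_sensitive_assets=False,
                    has_relaxed_expectations=False).value)
```

In practice cost and infrastructural capacity also weigh on the choice, so a function like this is at most a starting point for discussion.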

31
Decisions on Degrees of Security
  • More security and access assurances drive up the
    required costs of a preservation network
  • Extra costs may very well be justified! The
    entire point of a preservation network is
    long-term security for your digital content.
  • Strategic alliances can make a lot of sense.
    They leverage your resources, but still give you
    ownership of a portion of the infrastructure.
  • If you have no infrastructural capacity, and
    little or no funding, a piggyback ride is better
    than nothing!

32
Overview of MetaArchive and LOCKSS
33
MetaArchive
  • A dedicated preservation network for digital
    archives established under the auspices of and
    with funding from the National Digital
    Information Infrastructure and Preservation
    Program (NDIIPP)
  • Based on LOCKSS technology, but a separate
    network with high capacity nodes
  • Highly distributed geographically across multiple
    states
  • Node servers are very secure, with a variety of
    extra security hardening measures added to each
    preservation node
  • Memoranda of Understanding between participating
    sites concerning commitment to maintain each
    other's data security and network integrity
  • Motivation to preserve partners' digital archives
    is based on signed agreements and commitment to
    the preservation network
  • Available for others to join, both to build onto
    or to piggyback on
  • Active development community, committed to
    ongoing exploration of distributed preservation
    technologies, digital curation tools, and format
    migration methods
  • Fee structure to join as members or to piggyback
    on

34
LOCKSS
  • A dedicated preservation network for online
    journals, established with funding from the
    Mellon Foundation and new funding from the
    NDIIPP
  • The pioneering leader in distributed digital
    preservation
  • Very highly distributed geographically across the
    world, with hundreds of sites
  • Available for others to join, both to build onto
    or to piggyback on
  • Fee structure for membership
  • No signed agreements between sites; individual
    nodes may preserve content or withdraw at will
  • Motivation to preserve content is based on
    interest by members in long-term access to online
    journal content to which they subscribe
  • Active development community, with new
    initiatives with publishers (CLOCKSS) and many
    other technical advancement directions

35
Q&A Discussion