Long Term digital preservation OAIS preservation perspectives - PowerPoint PPT Presentation

1
Long Term digital preservation: OAIS preservation
perspectives
  • Claude Huc
  • Centre National d'Etudes Spatiales
  • Claude.Huc_at_cnes.Fr

2
Contents
  • General characteristics of space data
  • Preservation of space data: the factors of
    difficulty
  • Archiving systems architecture
  • Physical storage
    • Media and technologies
    • The storage function
    • The storage machine?
  • Rules applicable to information to be archived
  • Independence of information with regard to
    systems
  • Representation information
  • Migrations

3
Characteristics of Digital Information in the
Space field
  • Diversity
    • Diversity of applications (scientific,
      operational, commercial)
    • Diversity in the organization, structure and
      encoding of documents and data: scientific
      observations, text documents describing missions,
      experiments and their context, technological
      information (spacecraft structure, thermal
      model)
  • Complexity
    • Related to the complexity of on-board instruments
  • Volume
    • Generally high
    • May be as much as several hundred terabytes for
      Earth observation missions

4
Characteristics and diversity of information to
be archived
  • Data file collections
    • formats defined for each application: integers,
      real numbers in encoded (ASCII) or binary (IEEE
      real) format
    • standard formats for certain vocabulary
      research projects
  • Images
    • JPEG, GIF, TIFF, PS formats
  • Text documents
    • PDF, TXT, HTML, XML formats
  • Occasional video documents
    • MPEG formats
  • Technological data (STEP)
  • Links to other archives, etc.

5
Preservation of space data: the factors of
difficulty
  • Technical factors
    • increasingly frequent technological changes
    • dependence of archived information on the
      systems that produced or manage it
    • insufficient semantic and syntactic descriptions
      of data
    • control of the migration process
  • Organizational factors
    • space projects are not perpetual structures
  • Scientific factors
    • the know-how required to interpret data
  • Most of these factors are not specific to the
    space field

6
Archiving systems architecture
  • Compartmentalize technologies
    • break down the main functions into services that
      are as independent as possible: ingest process,
      storage, metadata management, access
    • standardize the interfaces between these services
    • each service can then evolve independently of
      the others
  • We are clearly evolving towards service-based
    architectures
  • Minimize the number of systems to be developed
    and maintained: search for genericity
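The compartmentalization principle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CNES design: the service names, the method names (`put`, `get`, `ingest`, `retrieve`) and the in-memory back end are all hypothetical. The point is that ingest and access depend only on the abstract storage interface, so adopting a new storage technology means writing one new back-end class rather than modifying the other services.

```python
from abc import ABC, abstractmethod


class StorageService(ABC):
    """Standardized storage interface: the only contract other services see."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class InMemoryStorage(StorageService):
    """One interchangeable back end; a tape-robot driver could replace it."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self._blobs[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self._blobs[object_id]


class MetadataService:
    """Keeps descriptions separately from the bits themselves."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def register(self, object_id: str, description: dict) -> None:
        self.records[object_id] = description

    def describe(self, object_id: str) -> dict:
        return self.records[object_id]


class IngestService:
    """Depends only on the interfaces above, never on a storage technology."""

    def __init__(self, storage: StorageService, metadata: MetadataService) -> None:
        self.storage, self.metadata = storage, metadata

    def ingest(self, object_id: str, data: bytes, description: dict) -> None:
        self.storage.put(object_id, data)
        self.metadata.register(object_id, description)


class AccessService:
    """Also depends only on the interfaces."""

    def __init__(self, storage: StorageService, metadata: MetadataService) -> None:
        self.storage, self.metadata = storage, metadata

    def retrieve(self, object_id: str) -> tuple[bytes, dict]:
        return self.storage.get(object_id), self.metadata.describe(object_id)


# Hypothetical usage: ingest one object, then read it back through access.
storage, metadata = InMemoryStorage(), MetadataService()
IngestService(storage, metadata).ingest("obs-001", b"\x00\x2a", {"format": "int16, big-endian"})
data, description = AccessService(storage, metadata).retrieve("obs-001")
```

Swapping `InMemoryStorage` for another `StorageService` subclass leaves `IngestService` and `AccessService` untouched, which is the kind of independent evolution the slide describes.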

7
Physical Storage Media and Technologies
  • Incredible evolutions in storage capacities
    • in 1975: 35 megabyte, 1600 bpi magnetic tape
      (Computer Compatible Tape)
    • in 1995: 1 terabyte optical tape
  • Increasingly rapid technology changes and
    obsolescence now create more difficulties than
    the obsolescence of the media themselves
  • Each change in storage technology has led to
    relatively deep modifications in the archive
    system
  • Therefore we must seek to minimize these
    consequences by developing a physical storage
    service for data which is as independent as
    possible of the other archive system functions
  • This approach has been under experimentation
    at CNES since 1995
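As a rough order of magnitude, the two capacities quoted above can be compared directly. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and treats the two figures as end points, although they belong to different media (magnetic reel versus optical tape), so the result is an illustration rather than a measured trend.

```python
import math

# Figures from the slide; decimal units are an assumption.
TAPE_1975_BYTES = 35 * 10**6
TAPE_1995_BYTES = 1 * 10**12
YEARS = 1995 - 1975


def growth(old: float, new: float, years: int) -> tuple[float, float]:
    """Return the capacity ratio and the implied doubling time in years."""
    ratio = new / old
    doublings = math.log2(ratio)  # number of successive doublings
    return ratio, years / doublings


ratio, doubling_time = growth(TAPE_1975_BYTES, TAPE_1995_BYTES, YEARS)
print(f"capacity ratio: {ratio:,.0f}")                      # capacity ratio: 28,571
print(f"implied doubling time: {doubling_time:.2f} years")  # implied doubling time: 1.35 years
```

Under those assumptions capacity grew by a factor of about 28,600 over 20 years, i.e. it doubled roughly every 16 months.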

8
Physical Storage: Example of a Storage Function
  • STAF: file archive and transfer service
    implemented at CNES
  • Mission
    • structured storage of data in a tree structure
    • long term physical preservation of files
    • transparent media management for users
    • confidentiality and access rights
    • archiving and retrieval in a heterogeneous
      environment
  • A service which is totally independent of all
    applications or projects
  • The STAF has been operational since 1995. It
    operates 24 hours a day.
  • Current storage: 25 terabytes, 2 million files;
    daily input: 120 gigabytes
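The mission items above (tree-structured naming, transparent media management, access rights) can be illustrated with a toy file store. This is a hypothetical sketch, not the STAF interface: the class `FileStore`, its methods, the cartridge labels and the fixed number of files per cartridge are all assumptions. Users address files only by path; which cartridge holds a file is the store's internal decision.

```python
from pathlib import PurePosixPath


class FileStore:
    """Toy archive: tree-structured names, hidden media, per-file read rights."""

    def __init__(self, files_per_cartridge: int = 2) -> None:
        self.files_per_cartridge = files_per_cartridge
        self._location: dict[PurePosixPath, tuple[str, int]] = {}  # path -> (cartridge, slot)
        self._readers: dict[PurePosixPath, set[str]] = {}
        self._media: dict[str, list[bytes]] = {}  # cartridge label -> stored files

    def _pick_cartridge(self) -> str:
        # Placement is decided by the store, never by the user.
        for label, slots in self._media.items():
            if len(slots) < self.files_per_cartridge:
                return label
        label = f"CART-{len(self._media):04d}"
        self._media[label] = []
        return label

    def archive(self, path: str, data: bytes, readers: set[str]) -> None:
        p = PurePosixPath(path)
        cartridge = self._pick_cartridge()
        self._media[cartridge].append(data)
        self._location[p] = (cartridge, len(self._media[cartridge]) - 1)
        self._readers[p] = set(readers)

    def retrieve(self, path: str, user: str) -> bytes:
        p = PurePosixPath(path)
        if user not in self._readers.get(p, set()):
            raise PermissionError(f"{user} may not read {p}")
        cartridge, slot = self._location[p]
        return self._media[cartridge][slot]

    def list_dir(self, directory: str) -> list[str]:
        d = PurePosixPath(directory)
        return sorted(str(p) for p in self._location if p.parent == d)


# Hypothetical usage with invented paths and user names.
store = FileStore()
store.archive("/mission-x/level1/orbit-0001.dat", b"\x01\x02", readers={"alice"})
store.archive("/mission-x/level1/orbit-0002.dat", b"\x03\x04", readers={"alice"})
store.archive("/mission-x/readme.txt", b"notes", readers={"alice", "bob"})
print(store.list_dir("/mission-x/level1"))
```

Because callers never see `_location`, the store is free to move a file to another cartridge later without any change on the user side.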

9
Physical Storage: The Storage Machine?
  • The STAF
    • The data is in files which are stored on magnetic
      tape cartridges
    • The cartridges are stored in mass storage
      facilities (StorageTek robots)
    • There is no set position for a particular
      cartridge inside the facility
    • Files are automatically moved from one
      cartridge to another:
      • when the error-correction codes are nearing
        saturation
      • when the cartridge reaches a certain age
        (5 years)
      • to reorganize storage following the deletion of
        certain files
      • to deal with changes in cartridge technologies
  • => Progressive removal of the data-medium
    association
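The relocation rules above amount to a per-cartridge policy check. In this sketch only the 5-year age limit comes from the slide; the error-correction threshold, the minimum live-data fraction, the technology names and the `Cartridge` fields are hypothetical. A cartridge flagged for any reason has its files copied elsewhere, which is why no file stays permanently tied to one medium.

```python
from dataclasses import dataclass
from datetime import date

MAX_AGE_YEARS = 5                    # from the slide
ECC_SATURATION_LIMIT = 0.8           # assumption: share of correction capacity in use
MIN_LIVE_FRACTION = 0.5              # assumption: below this, mostly deleted files
SUPPORTED_TECHNOLOGIES = {"tech-B"}  # hypothetical current cartridge generation


@dataclass
class Cartridge:
    label: str
    technology: str
    written_on: date
    ecc_load: float       # 0.0 = no corrections needed, 1.0 = saturated
    live_fraction: float  # share of the cartridge still holding live files


def migration_reasons(c: Cartridge, today: date) -> list[str]:
    """Return why the files on this cartridge should be moved (empty = keep)."""
    reasons = []
    if c.ecc_load >= ECC_SATURATION_LIMIT:
        reasons.append("error-correction codes nearing saturation")
    if (today - c.written_on).days / 365.25 >= MAX_AGE_YEARS:
        reasons.append("cartridge age")
    if c.live_fraction < MIN_LIVE_FRACTION:
        reasons.append("reorganization after deletions")
    if c.technology not in SUPPORTED_TECHNOLOGIES:
        reasons.append("obsolete cartridge technology")
    return reasons


today = date(2001, 1, 1)
old = Cartridge("C0001", "tech-A", date(1995, 6, 1), ecc_load=0.2, live_fraction=0.9)
recent = Cartridge("C0002", "tech-B", date(2000, 3, 1), ecc_load=0.1, live_fraction=0.95)
print(migration_reasons(old, today))     # ['cartridge age', 'obsolete cartridge technology']
print(migration_reasons(recent, today))  # []
```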

10
Rules applicable to information to be archived
  • The digital information to be preserved must be:
    • independent of technologies and systems
    • independent of the software used to create or
      manage this information
  • The files are seen as abstract bit streams,
    totally independent of the machines
  • Only file structures which are completely known
    are used
  • Encodings are limited to standardized ones
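Treating a file as a machine-independent bit stream means fixing byte order, field sizes and number encoding in the format itself rather than inheriting them from the machine that wrote it. Python's standard `struct` module shows the idea: the `>` prefix selects big-endian order with standard sizes and no alignment padding, and `d` is an IEEE 754 double. The record layout (one 32-bit signed integer followed by one double) is a hypothetical example.

```python
import struct

# Layout fixed by the archive, not by any machine: big-endian, standard
# sizes, no padding; 'i' = 32-bit signed integer, 'd' = IEEE 754 binary64.
RECORD = struct.Struct(">id")


def encode(counter: int, value: float) -> bytes:
    return RECORD.pack(counter, value)


def decode(stream: bytes) -> tuple[int, float]:
    return RECORD.unpack(stream)


blob = encode(42, 1.5)
print(RECORD.size)   # 12
print(blob.hex())    # 0000002a3ff8000000000000
print(decode(blob))  # (42, 1.5)
```

With the native `@` prefix instead, the same call can produce a different byte order and extra alignment padding depending on the platform, which is exactly the machine dependence this rule excludes.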

11
Rules applicable to information to be archived
  • Describe data at the syntactic and semantic
    levels
    • every bit in a file must belong to a field
      whose encoding and meaning are known
    • the description of these fields must be
      exhaustive, consistent with the data, and
      validated
  • For data files
    • syntactic description with the EAST language
      (http://east.cnes.fr)
    • semantic description with the DEDSL language
      (CCSDS 647.1-R-2, Data Entity Dictionary
      Specification Language,
      http://www.ccsds.org/ccsds/red_books.html)
  • For text documents
    • PDF and XML or SGML
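The requirement that every bit belong to a described field can be checked mechanically. The sketch below is plain Python, not EAST or DEDSL syntax: the `Field` record, the example fields and the check are hypothetical, and it works at byte rather than bit granularity (the bit-coded `flags` field would need a further description of its own). The syntactic part (field order and encoding) and the semantic part (unit, definition) are kept side by side.

```python
import struct
from typing import NamedTuple


class Field(NamedTuple):
    name: str        # syntactic: field identifier
    code: str        # syntactic: struct code, always read big-endian here
    unit: str        # semantic: unit ("1" for dimensionless)
    definition: str  # semantic: what the value means


# Hypothetical record description.
DESCRIPTION = [
    Field("orbit", "H", "1", "orbit number"),
    Field("temperature", "f", "K", "detector temperature"),
    Field("flags", "B", "1", "quality flags, bit-coded"),
]


def layout(description: list[Field]) -> struct.Struct:
    """Syntactic layout: big-endian, standard sizes, no padding."""
    return struct.Struct(">" + "".join(f.code for f in description))


def interpret(description: list[Field], record: bytes) -> dict[str, tuple[object, str]]:
    """Decode a record, refusing any byte that no field accounts for."""
    fmt = layout(description)
    if fmt.size != len(record):
        raise ValueError(f"description covers {fmt.size} bytes, record has {len(record)}")
    values = fmt.unpack(record)
    return {f.name: (v, f.unit) for f, v in zip(description, values)}


record = struct.pack(">HfB", 1234, 280.5, 3)
print(interpret(DESCRIPTION, record))
# {'orbit': (1234, '1'), 'temperature': (280.5, 'K'), 'flags': (3, '1')}
```

A record with one undescribed trailing byte raises `ValueError`, which is the sketch's version of "exhaustive and consistent with the data".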

12
Migrations (1)
  • The CNES data migration program
    • undertaken from 1994 to late 1999
    • several tens of thousands of magnetic tapes
    • justified by the decommissioning at that time of
      the Control Data machines and of all the magnetic
      tape reading facilities, and by the obsolescence
      of this storage technology
  • Objective: save existing data in a portable form
    • existing data is saved as is => no transformation
      of the information
    • physical transfer of the data to a new perpetual
      storage service (STAF)
    • transcoding, if necessary, at this stage
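A migration with "no transformation of the information" can be verified by fixity checks: record a digest of each file before the transfer and compare it with a digest of the copy. The sketch uses SHA-256 from Python's `hashlib`; the digest choice, the dictionary-based stores and the function names are assumptions for illustration, not a description of the CNES procedure.

```python
import hashlib


def manifest(store: dict[str, bytes]) -> dict[str, str]:
    """SHA-256 digest of every file, recorded before the transfer."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in store.items()}


def copy_unchanged(source: dict[str, bytes], target: dict[str, bytes]) -> None:
    """Physical transfer only: each bit stream is copied as is, never transcoded."""
    for name, data in source.items():
        target[name] = bytes(data)


def verify(expected: dict[str, str], target: dict[str, bytes]) -> list[str]:
    """Names missing from the target or whose content no longer matches."""
    return sorted(
        name
        for name, digest in expected.items()
        if name not in target or hashlib.sha256(target[name]).hexdigest() != digest
    )


old_tapes = {"tape-0001/file-1": b"telemetry", "tape-0001/file-2": b"housekeeping"}
new_store: dict[str, bytes] = {}
before = manifest(old_tapes)
copy_unchanged(old_tapes, new_store)
print(verify(before, new_store))  # []
new_store["tape-0001/file-2"] = b"housekeepinG"  # simulate a damaged copy
print(verify(before, new_store))  # ['tape-0001/file-2']
```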

13
Migrations (2)
  • Purely physical migrations (repackaging in OAIS
    terms)
    • files are not modified
    • losses of less than 1%
    • this low figure is due to:
      • the monitoring and the quality of the media used
      • the duplication of valuable data
  • Migrations which include transcoding
    (transformation in OAIS terms)
    • applicable to non-portable files
    • a costly (and avoidable) operation
    • validation is critical
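When a file is transcoded its bits change by design, so a checksum comparison no longer applies and validation must compare decoded values instead. The sketch assumes a hypothetical non-portable format of little-endian signed 16-bit samples, transcoded to a big-endian layout. Both sides are decoded from the same format description, so an error in that description would go unnoticed; an independent reference (for instance values produced by the original software) gives a stronger check.

```python
import struct

LEGACY_ORDER = "<"    # hypothetical legacy layout: little-endian
PORTABLE_ORDER = ">"  # archive layout: big-endian, standard sizes
SAMPLE = "h"          # signed 16-bit integer
SAMPLE_SIZE = struct.calcsize("<h")  # 2 bytes


def _samples(data: bytes, order: str) -> tuple[int, ...]:
    """Decode a whole file as a sequence of samples in the given byte order."""
    if len(data) % SAMPLE_SIZE:
        raise ValueError("truncated sample")
    count = len(data) // SAMPLE_SIZE
    return struct.unpack(f"{order}{count}{SAMPLE}", data)


def transcode(legacy: bytes) -> bytes:
    """Re-encode every sample from the legacy to the portable byte order."""
    values = _samples(legacy, LEGACY_ORDER)
    return struct.pack(f"{PORTABLE_ORDER}{len(values)}{SAMPLE}", *values)


def validate(legacy: bytes, portable: bytes) -> bool:
    """The bits differ by design, so compare decoded values, not checksums."""
    try:
        return _samples(legacy, LEGACY_ORDER) == _samples(portable, PORTABLE_ORDER)
    except ValueError:
        return False


legacy = struct.pack("<3h", 1, -2, 300)
portable = transcode(legacy)
print(portable != legacy, validate(legacy, portable))  # True True
```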

14
Conclusions
  • There is still a long way to go.
  • Through lack of knowledge, carelessness or
    irresponsibility, a number of digital documents
    will be lost in the years to come.
  • We hope that such losses will raise the
    necessary awareness among those in the modern
    world who wish to keep our memory intact.