Title: CCSDS IWS CNES 22-30 October 2001 Archive Ingest Methodology Study
1 CCSDS IWS CNES 22-30 October 2001 Archive Ingest Methodology Study
Nestor Peccia
2 ESOC Ingest Methodology Study
- GC Proposal submitted under LET-SME (Leading Edge Technologies for Small and Medium Enterprises) on Archive Ingestion Methodology
- Two phases (100 kEuro each)
  - Phase 1 includes the feasibility analysis, design and initial demonstration, via a prototype, of the archival ingestion process
  - Phase 2 shall build on the Phase 1 results to implement, test and validate the procedures and software tools that support the ingest process. Interoperability between archives shall be demonstrated.
- Proposal is under evaluation
3 ESOC Ingest Methodology Study
- First Proposal
- It has an initial study phase, which will include:
  - Describing a methodology for an OAIS archive ingest process
  - Identifying and selecting resources (procedures, standards, tools) required to support the archive ingest process
  - Producing an initial Archive Design for the elements of an OAIS archive required to support the ingest process
  - On the basis of the selected standards and design, specifying in detail the steps to be carried out between the producer and the archive by refining the ingest process protocol, including the definition of a standardised formalism to specify the SIPs and to perform the actual delivery
  - Prototyping of the ingest process
- A second phase would build upon the results of the initial phase, and cover:
  - Refining the Archive Design
  - Implementation, test and validation of the procedures and software tools that support the ingest process
  - Demonstration of interoperability between different types of ESA archives (space science, earth observation, micro-gravity, etc.)
4 First Proposal on Ingest Methodology
- Ingest Process Methodology
- Resource identification
  - Reviewing available standards required to support the representations of both the AIPs and the SIPs
    - The standard Data Description Languages (EAST, DEDSL, XML, etc.) and their applicability to the representation of the Information Packages
    - The standards for metadata representation, such as METS, IMS, MPEG-21, DIDL, etc.
  - Reviewing the tools available to help control, conduct and monitor the various steps of the ingest process
  - Selecting a set of representation standards, and identifying tools that can support the various steps of the ingest process in the specific case of an ESA archive
5 First Proposal on Ingest Methodology
- Archive Design
  - Reviewing existing archive standards
  - Designing data objects, representation information and preservation description information
  - Packaging data objects into data sets and collections
  - Determining storage media, designing volumes and volume sets
  - Designing the data production process
  - Planning for data validation
  - Developing high-level descriptive information for the archive catalogue
- An initial Archive Design will be required as part of the study work, in order to specify the components of the OAIS archive that are required to support the ingest process.
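As a rough illustration of the design items listed above, the sketch below models a data object together with its representation information and preservation description information, and packages such objects into a data set and a collection. All class and field names are hypothetical, chosen only for illustration; the study does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    """A data object plus the information needed to interpret and preserve it."""
    name: str
    content: bytes
    representation_info: str                  # e.g. a reference to a format description
    preservation_description: Dict[str, str]  # e.g. provenance, fixity, context


@dataclass
class DataSet:
    """A named group of related data objects."""
    name: str
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class Collection:
    """A named group of data sets with a catalogue-level description."""
    name: str
    description: str
    data_sets: List[DataSet] = field(default_factory=list)

    def object_count(self) -> int:
        """Total number of data objects across all data sets."""
        return sum(len(ds.objects) for ds in self.data_sets)


# Hypothetical example values, not taken from any real mission.
obj = DataObject(
    name="frame_0001.dat",
    content=b"\x00\x01\x02",
    representation_info="hypothetical description of the frame layout",
    preservation_description={"provenance": "example mission", "fixity": "n/a"},
)
collection = Collection(
    name="example-collection",
    description="High-level descriptive information for the catalogue",
    data_sets=[DataSet(name="example-data-set", objects=[obj])],
)
print(collection.object_count())
```

Keeping the representation and preservation information on the object itself, rather than on the collection, is one possible design choice; an actual Archive Design might factor shared descriptions out to the data set level.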
6 First Proposal on Ingest Methodology
- Ingest Process Refinement
  - At this stage the detailed procedures required to support the ingest process will be defined on the basis of the archive design and of the selected standards and tools. A standardised formalism to specify the SIPs and to perform the actual delivery will be defined.
- Prototyping
  - A prototype will be developed, which will demonstrate the concept by applying the ingest methodology to the ingestion of ESA mission data.
7 First Proposal on Ingest Methodology
- Structure of the Study
  - Consolidation Phase: a preliminary analysis phase, aiming at specifying the various components required to implement the ingestion methodology, and assessing the feasibility of defining an ingestion methodology within ESA.
  - Prototype Development Phase: building upon the results of the Consolidation Phase, a prototype will be developed that demonstrates the applicability of the concepts in the framework of the archiving of mission data.
  - Installation and Presentation Phase: the final phase will cover the installation and demonstration of the prototype, together with the production of the study report and the final presentation of the study results to ESA.
8 Second Proposal on Ingest Methodology
- Scope
  - It proposes to study the positive impact that XML and emerging related technologies such as SOAP and UDDI can bring to the definition of the central archive concept, and to provide a simple prototype to demonstrate the results. The principal area of interest of the study is the ingestion process, i.e. the method by which data and metadata can be contributed to the central archive function.
9 Second Proposal on Ingest Methodology
- Proposal
  - Examine the success of XML and related technologies in parallel data warehousing and data mining projects outside the space industry, such as in finance and the pharmaceutical industry.
  - Examine the current state of the art of XML and related technologies, through an assessment of the progress made by important players in the market such as Microsoft and IBM.
  - Establish the pertinent standards, or subset of standards, that best match the needs of the ingestion process.
  - Define an overview process for resolving the interface between generic projects and the central archive.
  - Design and implement a simple prototype to demonstrate the ingestion process concept.
10 Second Proposal on Ingest Methodology
- The Innovation
  - The scope of the study we foresee is to investigate the suitability of XML, SOAP and UDDI for providing a front end to an archive that would allow the ingestion of metadata from a variety of sources. In addition to the ingestion of metadata, the question of physically including data in the central archive would also be addressed, taking into account the need to update the delivered metadata to reflect the actual physical location of the data. A small proof-of-concept prototype would be developed using the technologies discussed above to demonstrate the feasibility of the concept.
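The need to update delivered metadata once the data has been physically moved into the archive can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`; the element names (`metadata`, `title`, `location`), the URLs and the `relocate` helper are all assumptions made for the example, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record as delivered by a producer.
DELIVERED = """<metadata>
  <title>Example observation</title>
  <location>ftp://producer.example/data/obs_001.dat</location>
</metadata>"""


def relocate(metadata_xml: str, archive_location: str) -> str:
    """Return the metadata with its location set to the archive's copy."""
    root = ET.fromstring(metadata_xml)
    location = root.find("location")
    if location is None:
        # The producer supplied no location: add one.
        location = ET.SubElement(root, "location")
    location.text = archive_location
    return ET.tostring(root, encoding="unicode")


updated = relocate(DELIVERED, "archive://central/store/0001/obs_001.dat")
print(updated)
```

In a SOAP-based front end the same update would presumably happen on the archive side after the data transfer completes; the transport is left out here to keep the sketch self-contained.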
11 Second Proposal on Ingest Methodology
- Technical Approach (four phases)
  - A study of the current state of XML and related technologies, including SOAP and UDDI, and an examination of the existing standards available for use in real developments. The study will look at how the current technology can be applied to the specific problem of defining a front-end interface to the central archive concept.
  - The specification of the requirements and data to be used in the generation of a software prototype that can demonstrate the applicability of the studied technologies to the front-end interface to the central archive concept.
  - The detailed design and implementation of the prototype according to the requirements specified in the preceding phase, and the testing of the prototype in readiness for demonstration.
  - Concluding the project by summarising the findings and test results in appropriate documentation, and presenting the results to ESA.
12 Second Proposal on Ingest Methodology
- Comparison with the state of the art
- Vodafone UK
  - The millions of calls carried each day on the network generate a massive volume of data relating to the transaction details of each call made. The data is of immediate value as the basis for billing and for managing the network infrastructure. The retention of this information is not only necessary for legal reasons, but essential to good customer service. Vodafone had identified this call data storage and enquiry system as a strategic initiative for the group.
  - 20 billion records
  - 5 TB database size
  - 52 million files a day
  - 9 TB raw disk storage
  - 9 TB tape storage
  - High availability
13 Second Proposal on Ingest Methodology
- Comparison with the state of the art
- XMM SOC at VILSPA
  - The system is installed at the VILSPA ground station near Madrid. It is responsible for archiving images from the X-Ray Multi-Mirror telescope for later retrieval and processing. The data repository employed a Sun Enterprise 3500 server with 3 GB RAM as an AMS server, with a storage system comprising a Sun StorEdge A3500 running an AMS Disk Array containing 22 x 18 GB drives. Alongside this were two L3500 AMS tape libraries. The system had to have high operational resilience to collate the data as it came from the satellite payload. The RDBMS in the system is Oracle.
- However, in both of the above examples the data ingestion process is somewhat static and predefined, i.e. data is expected in very specific formats. The aim of this new ingestion process is to vastly improve the flexibility of such archives, i.e. to allow data in many different forms to be ingested.
14 Third Proposal: XOAIS Prototype Architecture
15 Third Proposal: XOAIS Prototype Architecture
- Ingestion. Ingest functions include receiving SIPs, performing validation on SIPs, and generating an Archival Information Package (AIP) which complies with the archive's data formatting and documentation standards. A data conversion module, the Data Conversor, is introduced in order to extract such information. The Data Conversor shall produce such information in the Long Term Preservation Space Markup Language, which has been defined in the previous phase. Descriptive Information from the AIPs (metadata) is produced for inclusion in the archive database. In the same way, the ingestion process is based on the definition of a metadata system. This shall be based on a tree structure, a hierarchical model with root, branch and leaf elements. This step is the definition of the elements of the language in order to define the metadata using XML Schema.
- Archive. This entity provides the services and functions for the storage, maintenance and retrieval of the Object Data and the Metadata.
- Access. This entity supports Consumers in determining the existence, description, location and availability of information stored in the OAIS, and allows Consumers to request and receive information products.
- HMI. Based on Tcl/Tk, an open-source scripting language that may ease the integration of the other modules in a straightforward way.
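The ingestion steps described above (receive a SIP, validate it, generate an AIP, and produce tree-structured metadata) can be sketched roughly as below. This is not the XOAIS implementation: the SIP is modelled as a plain dictionary, the field names and XML element names are hypothetical, and because the Python standard library has no XML Schema validator, a simple required-field check stands in for schema-based validation.

```python
import hashlib
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ("producer", "title", "payload")  # hypothetical SIP fields


def validate_sip(sip: dict) -> list:
    """Return a list of problems; an empty list means the SIP is accepted."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in sip]
    if "payload" in sip and not isinstance(sip["payload"], bytes):
        problems.append("payload must be bytes")
    return problems


def build_metadata_tree(sip: dict) -> ET.Element:
    """Build a hierarchical metadata tree: one root, two branches, three leaves."""
    root = ET.Element("package")
    descriptive = ET.SubElement(root, "descriptive")
    ET.SubElement(descriptive, "title").text = sip["title"]
    ET.SubElement(descriptive, "producer").text = sip["producer"]
    preservation = ET.SubElement(root, "preservation")
    digest = hashlib.sha256(sip["payload"]).hexdigest()
    ET.SubElement(preservation, "sha256").text = digest
    return root


def ingest(sip: dict) -> dict:
    """Validate a SIP and turn it into a (very simplified) AIP."""
    problems = validate_sip(sip)
    if problems:
        raise ValueError("; ".join(problems))
    tree = build_metadata_tree(sip)
    return {"data": sip["payload"], "metadata": ET.tostring(tree, encoding="unicode")}


aip = ingest({"producer": "example-mission", "title": "Example data set", "payload": b"\x01\x02"})
print(aip["metadata"])
```

The metadata string returned by `ingest` is what would be passed on for inclusion in the archive database, while the payload would go to archival storage.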