CCSDS IWS CNES 22-30 October 2001 Archive Ingest Methodology Study

1
CCSDS IWS CNES 22-30 October 2001
Archive Ingest Methodology Study
Nestor Peccia
2
ESOC Ingest Methodology Study
  • GC Proposal submitted under LET-SME (Leading Edge
    Technologies for Small and Medium Enterprises)
    on Archive Ingestion Methodology
  • Two phases (100 kEuro each)
  • Phase 1 includes the feasibility analysis, design
    and initial demonstration via prototype of the
    archival ingestion process
  • Phase 2 shall build on the Phase 1 results to
    implement, test and validate the procedures and
    software tools that support the ingest process.
    Interoperability between archives shall be
    demonstrated.
  • Proposal is under evaluation

3
ESOC Ingest Methodology Study
  • First Proposal
  • It has an initial study phase, which will
    include
  • Describing a methodology for an OAIS archive
    ingest process
  • Identifying and selecting resources (procedures,
    standards, tools) required to support the archive
    ingest process
  • Producing an initial Archive Design for the
    elements of an OAIS archive required to support
    the ingest process
  • On the basis of the selected standards and design,
    specifying in detail the steps to be carried out
    between the producer and the archive by refining
    the ingest process protocol, including the
    definition of a standardised formalism to specify
    the SIPs and to perform the actual delivery
  • Prototyping of the ingest process
  • A second phase would build upon the results of
    the initial phase, and cover:
  • Refining the Archive Design
  • Implementation, test and validation of the
    procedures and software tools that support the
    ingest process.
  • Demonstration of interoperability between the
    different types of ESA archives (space science,
    earth observation, micro-gravity, etc.).

4
First Proposal on Ingest Methodology
  • Ingest Process Methodology
  • Resource identification
  • Reviewing available standards required to support
    the representations of both the AIPs and the SIPs
  • The standard Data Description Languages (EAST,
    DEDSL, XML, etc.) and their applicability to the
    representation of the Information Packages.
  • The standards for metadata representation, such
    as METS, IMS, MPEG-21, DIDL, etc.
  • Reviewing the tools available to help control,
    conduct and monitor the various steps of the
    ingest process
  • Selection of a set of representation standards,
    and identification of tools that can support the
    various steps of the ingest process in the
    specific case of an ESA archive.

5
First Proposal on Ingest Methodology
  • Archive Design
  • Reviewing existing archive standards
  • Designing data objects, representation
    information and preservation description
    information
  • Packaging data objects into data sets and
    collections
  • Determining storage media, designing volumes and
    volume sets
  • Designing the data production process
  • Planning for data validation
  • Developing high level descriptive information for
    the archive catalogue.
  • An initial Archive Design will be required as
    part of the study work, in order to specify the
    components of the OAIS archive that are required
    to support the ingest process.

6
First Proposal on Ingest Methodology
  • Ingest Process Refinement
  • At this stage the detailed procedures required to
    support the ingest process will be defined on the
    basis of the archive design and of the selected
    standards and tools. A standardised formalism to
    specify the SIPs and to perform the actual
    delivery will be defined.
  • Prototyping
  • A prototype will be developed, which will
    demonstrate the concept by applying the ingest
    methodology to the ingestion of ESA mission data.
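
The "standardised formalism to specify the SIPs" mentioned above could take many forms. The sketch below is one possible reading, not the formalism the study would define: the producer and the archive agree on a specification listing the expected files and their checksums, and each delivery is checked against it. The class names, field names and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SipItemSpec:
    """One file the producer has agreed to deliver (hypothetical model)."""
    name: str
    media_type: str
    sha256: str  # checksum agreed in the submission specification


@dataclass
class SipSpec:
    """Producer/archive agreement describing the content of one SIP."""
    sip_id: str
    producer: str
    items: list[SipItemSpec] = field(default_factory=list)


def check_delivery(spec: SipSpec, delivered: dict[str, bytes]) -> list[str]:
    """Compare the delivered files with the specification.

    Returns a list of human-readable problems; an empty list means the
    delivery matches the specification.
    """
    problems = []
    for item in spec.items:
        payload = delivered.get(item.name)
        if payload is None:
            problems.append(f"missing: {item.name}")
        elif hashlib.sha256(payload).hexdigest() != item.sha256:
            problems.append(f"checksum mismatch: {item.name}")
    expected = {item.name for item in spec.items}
    for name in sorted(set(delivered) - expected):
        problems.append(f"unexpected: {name}")
    return problems
```

An archive would typically accept the SIP only when check_delivery returns an empty list, and otherwise return the list of problems to the producer.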

7
First Proposal on Ingest Methodology
  • Structure of the Study
  • Consolidation Phase: a preliminary analysis
    phase, aiming at specifying the various
    components required to implement the ingestion
    methodology, and assessing the feasibility of the
    definition of an ingestion methodology within
    ESA.
  • Prototype Development Phase: building upon the
    results of the Consolidation Phase, a prototype
    will be developed that demonstrates the
    applicability of the concepts in the framework of
    archiving of mission data.
  • Installation and Presentation Phase: the final
    phase will cover the installation and
    demonstration of the prototype, together with the
    production of the study report and the final
    presentation of the study results to ESA.

8
Second Proposal on Ingest Methodology
  • Scope
  • It proposes to study the positive impact that XML
    and emerging related technologies such as SOAP
    and UDDI can bring to the definition of the
    central archive concept, and to provide a simple
    prototype to demonstrate the results. The
    principal area of interest of the study is the
    ingestion process, i.e. the method by which data
    and metadata can be contributed to the central
    archive function.

9
Second Proposal on Ingest Methodology
  • Proposal
  • Examine the success of XML and related
    technologies in parallel data warehousing and
    data mining projects outside the Space industry,
    such as in the world of finance and the
    pharmaceutical industry.
  • Examine the current state of the art of XML and
    related technologies, through assessment of
    progress made by such important players in the
    market as Microsoft and IBM.
  • Establish the pertinent standards, or subset of
    standards, that best match the needs of the
    ingestion process.
  • Define an overview process for resolving the
    interface between generic projects and the
    central archive.
  • Design and implement a simple prototype to
    demonstrate the ingestion process concept.

10
Second Proposal on Ingest Methodology
  • The Innovation
  • The scope of the study we foresee is to
    investigate the suitability of XML, SOAP and UDDI
    to enable the provision of a suitable front end
    to an archive, which would allow the ingestion of
    metadata from a variety of sources. In addition
    to the ingestion of metadata, the question of
    physically including data in the central archive
    would also be addressed, taking into account the
    need to update the delivered metadata to reflect
    the actual physical location of the data. A small
    proof-of-concept prototype would be developed
    utilising the technologies discussed above to
    demonstrate the feasibility of the concept.
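
The two-step idea above (metadata is ingested first, and is later updated when the data itself is physically moved into the central archive) can be sketched as follows. This is a minimal illustration under assumed names: MetadataRecord, Catalogue and the example locations are hypothetical and do not come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRecord:
    """Metadata delivered by a source (hypothetical, minimal field set)."""
    dataset_id: str
    title: str
    source_location: str                    # where the producer holds the data
    archive_location: Optional[str] = None  # set once the data is in the archive

    @property
    def location(self) -> str:
        """Actual physical location: the archive copy if it exists."""
        return self.archive_location or self.source_location


class Catalogue:
    """In-memory stand-in for the central archive's metadata store."""

    def __init__(self) -> None:
        self._records: dict[str, MetadataRecord] = {}

    def ingest_metadata(self, record: MetadataRecord) -> None:
        """Step 1: register the metadata; the data stays at the source."""
        self._records[record.dataset_id] = record

    def ingest_data(self, dataset_id: str, archive_location: str) -> MetadataRecord:
        """Step 2: record that the data now physically lives in the archive."""
        record = self._records[dataset_id]  # KeyError if metadata was never delivered
        record.archive_location = archive_location
        return record

    def locate(self, dataset_id: str) -> str:
        return self._records[dataset_id].location
```

Keeping both locations in the record, rather than overwriting the original one, preserves where the data came from after it has been moved.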

11
Second Proposal on Ingest Methodology
  • Technical Approach (four phases)
  • A study of the current state of XML and related
    technologies, including SOAP and UDDI, and an
    examination of existing standards available for
    use in real developments. The study will look at
    how the current technology can be applied to the
    specific problem of defining a front-end
    interface to the central archive concept.
  • The specification of requirements and data to be
    used in the generation of a software prototype
    which can demonstrate the applicability of the
    studied technologies in the resolution of the
    front-end interface to the central archive
    concept.
  • The detailed design and implementation of the
    prototype according to the requirements specified
    in the preceding phase. The testing of the
    prototype in readiness for demonstration.
  • Concluding the project by summarising the
    findings and test results into appropriate
    documentation, and the presentation of the
    results to ESA.
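
To make the "front-end interface" idea concrete, the sketch below builds a SOAP 1.1 request that a producer might send to submit metadata. Only the SOAP envelope namespace is standard; the service namespace, the IngestMetadata operation and its child elements are hypothetical and would in practice come out of the requirements phase.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ARCHIVE_NS = "urn:example:archive-ingest"               # hypothetical service namespace


def build_ingest_request(dataset_id: str, title: str, location: str) -> bytes:
    """Serialise a metadata submission as a SOAP 1.1 envelope (UTF-8 bytes)."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("ing", ARCHIVE_NS)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")

    # Hypothetical operation carrying three metadata fields.
    request = ET.SubElement(body, f"{{{ARCHIVE_NS}}}IngestMetadata")
    for tag, value in (
        ("DatasetId", dataset_id),
        ("Title", title),
        ("Location", location),
    ):
        ET.SubElement(request, f"{{{ARCHIVE_NS}}}{tag}").text = value

    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_ingest_request("ds-1", "Example", "ftp://producer.example/ds-1").decode("utf-8"))
```

The resulting bytes would be sent as the body of an HTTP POST to the archive's endpoint; transport, authentication and fault handling are left out of this sketch.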

12
Second Proposal on Ingest Methodology
  • Comparison with the state of the art
  • Vodafone UK
  • The millions of calls carried each day on the
    network generate a massive volume of data
    relating to the transaction details of each call
    made. The data is of immediate value as the basis
    for billing and managing the network
    infrastructure. The retention of this information
    is not only necessary for legal reasons, but
    essential to good customer service. Vodafone had
    identified that this call data storage and data
    enquiry system was a strategic initiative for the
    group.
  • 20 billion records
  • 5 TB database size
  • 52 million files a day
  • 9 TB raw disk storage
  • 9 TB tape storage
  • High availability

13
Second Proposal on Ingest Methodology
  • Comparison with the state of the art
  • XMM SOC at VILSPA
  • The system is installed at the VILSPA ground
    station near Madrid. It is responsible for
    archiving images from the X-Ray Multi-Mirror
    telescope for later retrieval and processing. The
    data repository employed a Sun Enterprise 3500
    server with 3 GB RAM as an AMS server, with a
    storage system comprising a Sun StorEdge A3500
    running an AMS Disk Array containing 22 x 18 GB
    drives. Alongside this were two L3500 AMS tape
    libraries. The system had to have high
    operational resilience to collate the data as it
    came from the satellite payload. The RDBMS in the
    system is Oracle.
  • However, in both of the above examples the data
    ingestion process is somewhat static and
    predefined, i.e. data in very specific formats is
    expected. The aim of this new ingestion process
    is to vastly improve the flexibility of such
    archives, i.e. to allow data in many different
    forms to be ingested.
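
One common way to allow "data in many different forms" is to put a registry of format converters in front of the archive, so that a new format is supported by adding a converter rather than by changing the ingest code. The sketch below illustrates that pattern only; the format names, the Record type and the function names are assumptions, not part of the proposal.

```python
import csv
import io
import json
from typing import Callable

Record = dict[str, str]                      # common internal representation
Converter = Callable[[bytes], list[Record]]

_CONVERTERS: dict[str, Converter] = {}


def register(format_name: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one input format."""
    def decorator(func: Converter) -> Converter:
        _CONVERTERS[format_name] = func
        return func
    return decorator


@register("csv")
def from_csv(payload: bytes) -> list[Record]:
    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    return [dict(row) for row in reader]


@register("json")
def from_json(payload: bytes) -> list[Record]:
    # Expects a JSON array of flat objects; values are normalised to strings.
    return [{key: str(value) for key, value in item.items()}
            for item in json.loads(payload)]


def ingest(format_name: str, payload: bytes) -> list[Record]:
    """Convert a delivery in any registered format to the common representation."""
    try:
        converter = _CONVERTERS[format_name]
    except KeyError:
        raise ValueError(f"no converter registered for format {format_name!r}") from None
    return converter(payload)
```

A static ingest process corresponds to having a single hard-coded converter; here an unsupported format is rejected with a clear error until a converter for it is registered.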

14
3rd Proposal XOAIS Prototype Architecture
15
3rd Proposal XOAIS Prototype Architecture
  • Ingestion. Ingest functions include receiving
    SIPs, validating them, and generating an
    Archival Information Package (AIP) that complies
    with the archive's data formatting and
    documentation standards. A Data Converter module
    is introduced in order to extract this
    information. The Data Converter shall produce it
    in the Long Term Preservation Space Markup
    Language that was defined in the previous phase.
    Descriptive Information (metadata) from the AIPs
    is produced for inclusion in the archive
    database. In the same way, the ingestion process
    is based on the definition of a metadata system.
    This shall be based on a tree structure: a
    hierarchical model with root, branch and leaf
    elements. This step defines the elements of the
    language used to describe the metadata with XML
    Schema.
  • Archive. This entity provides the services and
    functions for the storage, maintenance and
    retrieval of the Object Data and the Metadata.
  • Access. This entity supports Consumers in
    determining the existence, description, location
    and availability of information stored in the
    OAIS, and allows Consumers to request and
    receive information products.
  • HMI. Based on Tcl/Tk, a freely available
    scripting language that may ease the integration
    of the other modules in a straightforward way.
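
The ingestion steps described above (receive a SIP, validate it, generate an AIP, and produce descriptive metadata for the archive database) can be sketched as follows. The XML element names used here are invented for illustration and are not the Long Term Preservation Space Markup Language, whose definition is not given in these slides; the Sip class and the validation rules are likewise assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Sip:
    """Submission Information Package as received (hypothetical shape)."""
    sip_id: str
    producer: str
    files: dict[str, bytes]


def validate(sip: Sip) -> None:
    """Reject SIPs that lack the minimum content assumed by this sketch."""
    if not sip.sip_id or not sip.producer:
        raise ValueError("SIP must carry an identifier and a producer")
    if not sip.files:
        raise ValueError("SIP contains no data objects")


def build_metadata_tree(sip: Sip, aip_id: str) -> ET.Element:
    """Build the hierarchical metadata: one root, branches, and leaves."""
    root = ET.Element("aip", {"id": aip_id})                         # root
    provenance = ET.SubElement(root, "provenance")                   # branch
    ET.SubElement(provenance, "producer").text = sip.producer        # leaf
    objects = ET.SubElement(root, "dataObjects")                     # branch
    for name in sorted(sip.files):
        leaf = ET.SubElement(objects, "dataObject", {"name": name})  # leaf
        leaf.set("sizeBytes", str(len(sip.files[name])))
    return root


def ingest(sip: Sip) -> tuple[ET.Element, dict[str, str]]:
    """Validate the SIP, then return the AIP metadata tree and the
    descriptive information destined for the archive database."""
    validate(sip)
    aip_id = f"AIP-{sip.sip_id}"
    tree = build_metadata_tree(sip, aip_id)
    descriptive = {
        "aip_id": aip_id,
        "producer": sip.producer,
        "object_count": str(len(sip.files)),
    }
    return tree, descriptive
```

In a fuller prototype the tree would be checked against the XML Schema that defines the metadata language before the AIP is stored; the Python standard library does not provide XML Schema validation, so that step is omitted here.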