Title: Mike Smorul


1
Digital Preservation and Archiving at the
Institute for Advanced Computer Studies,
University of Maryland, College Park
  • Mike Smorul
  • Saurabh Channan

2
Overview
  • Digital Preservation Research
  • ADAPT Project and Components
  • Pilot Persistent Archive
  • Digital Library and Production Data Distribution
  • Global Land Cover Facility
  • Conclusion and Questions

3
A Digital Approach to Preservation Technology
(ADAPT)
  • Premise
  • Preservation of digital entities into
    self-describing objects
  • OAIS Information Package model as a framework
  • Separation of management into three layers,
    bitstream, semantic, and access/discovery
  • Distributed and Secure Infrastructure
  • Automatic ingestion and replication
  • Policy-Driven Management of Preservation
    Processes
  • Global Format Registry
  • Separate Peer-to-Peer Deep Archive

4
ADAPT Architecture

5
ADAPT Components
  • Ingestion
      • Producer-Archive Workflow Network (PAWN)
  • Management of Preservation Processes
      • Lightweight Preservation Environment (LPE)
  • Access and Discovery
      • Grid Retrieval and Search Platform (GRASP)
      • EAP Collection browser

6
Overall Principles (PAWN)
  • Distributed, secure ingestion
  • OAIS-based Information Package creation
  • Use of platform-independent web/grid
    technologies
  • Minimal client-side requirements
  • Ease of integration with archive and data grid
    systems.
  • Designed to satisfy data integrity requirements
    of scientific collections and digital preservation

7
Distributed Ingestion (PAWN)

8
Ingestion Workflow (PAWN)
  • Negotiate the Submission Agreement.
  • Workflow initialization and Submission
    Information Package (SIP) creation.
  • Transfer of SIPs to the Data Grid site.
  • Validation of the SIP transfer.
  • Organization of data into collections and
    transfer into Data Grid.
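
The SIP creation and transfer-validation steps above can be sketched roughly as follows. This is not PAWN's code: the manifest layout, the function names `build_manifest` and `validate_transfer`, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(sip_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the SIP root) to its digest.

    Hypothetical stand-in for the integrity information a producer
    would record when a SIP is created.
    """
    return {
        p.relative_to(sip_dir).as_posix(): file_digest(p)
        for p in sorted(sip_dir.rglob("*"))
        if p.is_file()
    }


def validate_transfer(manifest: dict[str, str], received_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for rel_path, expected in manifest.items():
        target = received_dir / rel_path
        if not target.is_file() or file_digest(target) != expected:
            problems.append(rel_path)
    return problems
```

An empty list from `validate_transfer` would mean every file arrived intact; any returned path would need to be resent before the data is organized into collections.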

9
Component Overview (PAWN)

10
Target Collections (PAWN)
  • Digital Image Collection
      • Rich metadata in various formats
  • Web site crawling
      • Online and interactive content
  • GLCF Landsat data
      • Spatial and temporal metadata
      • Large quantity (over 15,000 objects)

11
Lightweight Preservation Environment (LPE)
  • The Lightweight Preservation Environment is an
    archival system based on a modular design using
    grid and web services.
  • The current implementation relies mostly on
    Globus technologies.
  • Primarily, we've focused on wrapping logic around
    those components.

12
Developed Components (LPE)
  • Data Manager (DM): Organizes data and queries
    between the user and the other components
  • Policy Manager (PM): Ensures that a minimum
    number of copies exist for any given file
  • Transformation Manager (TM): Executes specific
    transformations on a named file on a given
    storage node and returns the results
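
The Policy Manager's minimum-copies rule can be illustrated with a small sketch. The function name `replication_deficits` and the (file, node) input format are hypothetical; the slides do not describe how the PM tracks replicas, and the real component is built around Globus services rather than an in-memory list.

```python
from collections import defaultdict
from typing import Iterable


def replication_deficits(
    replicas: Iterable[tuple[str, str]], min_copies: int
) -> dict[str, int]:
    """Report how many more copies each under-replicated file needs.

    `replicas` is an iterable of (file_name, storage_node) pairs
    (an assumed format). Two entries on the same node count as one copy.
    Files with no replica at all are not visible to this sketch.
    """
    nodes_by_file: dict[str, set[str]] = defaultdict(set)
    for file_name, node in replicas:
        nodes_by_file[file_name].add(node)
    return {
        name: min_copies - len(nodes)
        for name, nodes in nodes_by_file.items()
        if len(nodes) < min_copies
    }
```

A policy loop would then trigger that many additional replications for each file named in the result.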

13
Grid Retrieval and Search Platform (GRASP)
  • Based on concepts from the Earth Science Data
    Interface (ESDI), developed at the UMIACS GLCF.
  • Provides a graphical interface into data grid
    holdings.
  • Access to the entire GLCF holdings through the
    Storage Resource Broker (SRB)

14
GRASP Architecture

15
GRASP Architecture
  • GRASP uses a data grid as an abstract storage
    repository.
  • Metadata in the grid is mined from the grid
    itself or from external sources and published
    into a browsable form.
  • Data grids may allow for platform-independent
    metadata, but may not be optimal for access.
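
One way to read the "published into a browsable form" point is that metadata is copied out of the grid into an index built for fast lookup. The sketch below assumes metadata arrives as plain dictionaries with an `id` key; the function name `build_browse_index` and the field names in the usage note are hypothetical, not GRASP's schema.

```python
from collections import defaultdict


def build_browse_index(records, fields):
    """Group object identifiers by the value of each chosen metadata field.

    `records` is a list of dicts, each with an "id" key (assumed format).
    Returns {field: {value: [ids...]}} so a browser can list the values
    of a field and the objects under each one without querying the grid.
    """
    index = {field: defaultdict(list) for field in fields}
    for record in records:
        for field in fields:
            if field in record:
                index[field][record[field]].append(record["id"])
    return index
```

For example, indexing on a `sensor` field would let an interface list every sensor once and show the scenes recorded under it.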

16
GRASP Screenshot

17
Global Land Cover Facility
  • Mission
  • The GLCF Mission is to encourage the use of
    remotely sensed imagery, derived products and
    applications within a broad range of science
    communities in a manner that improves
    comprehension of the nature and causes of land
    cover change and its impact on the Earth.
  • Goal
  • The GLCF Goal is to provide free access to an
    integrated collection of critical land cover and
    Earth science data through systems that are
    designed to maximize user outreach and that
    promote development of novel tools for ordering,
    visualizing and manipulating spatial data.

18
Data Collections
The majority of the holdings are Landsat and MODIS
data.

19
Data Distribution
  • Data at the GLCF
      • Approximately 5.1 TB compressed
      • Approximately 13 TB uncompressed
  • Anticipated Production Rate
      • Triple or quadruple the current data holdings
        within the next two years
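
The figures above imply a compression ratio of about 2.55 (13 / 5.1). A rough projection of the anticipated growth, assuming that ratio stays constant and that "triple or quadruple" applies to both figures equally (neither is stated on the slide):

```python
compressed_tb = 5.1     # current compressed holdings, from the slide
uncompressed_tb = 13.0  # current uncompressed holdings, from the slide

compression_ratio = uncompressed_tb / compressed_tb


def projected_holdings(growth_factor):
    """Scale both figures, assuming the compression ratio stays constant."""
    return compressed_tb * growth_factor, uncompressed_tb * growth_factor


for factor in (3, 4):
    comp, uncomp = projected_holdings(factor)
    print(f"x{factor}: {comp:.1f} TB compressed, {uncomp:.1f} TB uncompressed")
```

Under those assumptions the holdings would reach roughly 15 to 20 TB compressed, or 39 to 52 TB uncompressed.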

20
Data Discovery Applications
  • ESDI
  • Web Interface
  • User friendly
  • Search
  • Retrieve
  • Discover
  • Scalable
  • Over 9 TB a month!
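
To put the monthly volume in network terms, the sketch below converts it to an average sustained rate. It assumes decimal terabytes (1 TB = 10**12 bytes) and a 30-day month; the function name is illustrative, and real traffic would be burstier than this average.

```python
def average_rate_mbit_per_s(terabytes_per_month, days_per_month=30):
    """Average sustained rate in Mbit/s, assuming 1 TB = 10**12 bytes."""
    total_bits = terabytes_per_month * 10**12 * 8
    seconds = days_per_month * 24 * 60 * 60
    return total_bits / seconds / 10**6
```

With these assumptions, 9 TB a month works out to an average of roughly 28 Mbit/s around the clock.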

21
GLCF Archive
Scalable and Reliable

22
Participation Possibilities
  • PAWN ingestion component
      • Minimal geospatial metadata support is planned;
        it can be expanded to support an NGDA endpoint
  • GRASP display component
      • Solid core components; end-user interfaces need
        additional polishing
  • GLCF data holdings
      • Additional hardware is required if additional data
        and access mechanisms (grid, etc.) are required
  • Other possibilities include grid infrastructure,
    GSI security, format registry, etc.

23
Questions