Title: A NOAA/NASA Pilot Project for the Preservation of MODIS Data from the Earth Observing System (EOS)
1 A NOAA/NASA Pilot Project for the Preservation of
MODIS Data from the Earth Observing System (EOS)
- Robert H. Rank
- NOAA/NESDIS
- Kenneth R. McDonald
- NASA/GSFC
2 Topics
- Data Life Cycle Background
- NASA's Earth Science Data
- NOAA Data Centers Mission
- Guiding Principles
- CLASS Overview and Goals
- Long-Term Archive Challenges
- MODIS Pilot Project
- Reference Model
- OAIS Responsibilities
- Data Submission Agreement (DSA)
- Schedule
- Expected Outcomes
- Experience Using OAIS
- Conclusion
3 Data Life Cycle
A simple model showing the four major lifecycle
entities within the context of an overall set of
guiding policies.
Some of these functions may be grouped together
in any given mission or project.
4 Long Term Archive Requirements
- NASA shares the responsibility for stewardship of
its Earth science data resources with NOAA and
USGS.
- NASA holds the responsibility for its data during
the life of each mission plus four years.
- NOAA and USGS provide the long term archive for
ocean and atmosphere data and land processes
data, respectively.
- Agreements are in place between NASA and NOAA and
between NASA and USGS that document these
responsibilities.
5 NASA's Earth Observing System (EOS) Data and
Information System (EOSDIS)
6 EOSDIS Overview
- EOSDIS Functions
  - A production capability for standard data
    products from EOS instruments
  - An active archive of Earth science data from
    EOS and other past and present missions
  - A distributed information framework (data
    centers, SIPS, networks, interoperability
    infrastructure)
- EOSDIS Operations
  - Supporting EOS missions since 1999 and heritage
    data archives since 1994
  - Operations at 8 DAACs and 13 SIPS
  - Total archive of over 4 petabytes, growing at 4
    terabytes per day
  - Over 200,000 distinct users obtaining data from
    DAACs
  - Annual distribution of 33 million data products,
    2 TB per day
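As a rough illustration of the scale behind the operations figures above, the following sketch projects archive growth from the quoted numbers. It assumes decimal units (1 PB = 1000 TB) and a constant growth rate, neither of which is stated on the slide; the function names are illustrative only.

```python
# Back-of-the-envelope projection from the slide's figures.
# Assumptions (not from the slides): 1 PB = 1000 TB, constant growth rate.
TB_PER_PB = 1000.0


def yearly_growth_pb(growth_tb_per_day: float) -> float:
    """Archive growth per 365-day year, in petabytes."""
    return growth_tb_per_day * 365 / TB_PER_PB


def days_to_double(archive_pb: float, growth_tb_per_day: float) -> float:
    """Days until the archive doubles at a constant daily growth rate."""
    return archive_pb * TB_PER_PB / growth_tb_per_day


if __name__ == "__main__":
    archive_pb = 4.0         # "over 4 petabytes"
    growth_tb_per_day = 4.0  # "growing at 4 terabytes per day"
    print(f"{yearly_growth_pb(growth_tb_per_day):.2f} PB added per year")
    print(f"{days_to_double(archive_pb, growth_tb_per_day):.0f} days to double")
```

Under these assumptions the archive grows by about 1.46 PB per year and would double in roughly 1000 days, which gives a sense of the volumes a long-term archive has to plan for.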
7 NOAA's National Data Centers -- Environmental
Data Stewards
Scientific Data Stewardship (SDS) is ownership,
knowledge, utilization, and application of the
data.
CLASS is the Information Technology
infrastructure (hardware and software
environment, and tools) underpinning SDS.
Data Rescue preserves and makes available historical
data sets from obsolete media.
8 NOAA's National Data Centers
- NOAA's National Data Centers are major archive,
access, and assessment sites maintaining,
processing, and distributing environmental and
geospatial data.
- National Climatic Data Center
  - Asheville, NC WWW.NCDC.NOAA.GOV
- National Coastal Data Development Center
  - Stennis, MS WWW.NCDDC.NOAA.GOV
- National Geophysical Data Center
  - Boulder, CO WWW.NGDC.NOAA.GOV
- National Oceanographic Data Center
  - Silver Spring, MD WWW.NODC.NOAA.GOV
9 NOAA's National Data Centers (Continued)
- These Centers provide long-term stewardship for
most of NOAA's environmental and geospatial data,
and a broad range of user services.
- They serve as both
  - Centers of Data -- facilities where extensive
    collections of given environmental parameter(s)
    are maintained because of individual or
    institutional research or operational
    requirements
  - Agency Record Centers -- facilities where data
    is made accessible to a large user community, as
    well as being preserved and protected to certain
    standards
10 Guiding Principles
All NOAA environmental data will
- be made accessible to broader data integration
efforts, such as the Global Earth Observation System
of Systems (GEOSS)
- reside in secure archives conforming to National
Archives and Records Administration (NARA) and
Continuity Of Operations (COOP) standards
- be maintained at the highest standards of
scientific data stewardship
- be searchable using advanced data discovery
tools, to facilitate interdisciplinary studies
- be accessible through a common portal available
to the scientific community, commercial sector,
and general public based on advanced access tools
11 Comprehensive Large Array-data Stewardship System
(CLASS)
12 WHY a CLASS?
- Fulfill NOAA's legal requirement to provide for
archive and access to its data
- The source for the vast majority of observational
environmental data generated by NOAA.
- Provide critical products to Customers
  - Public and Private Research and Development efforts
  - Colleges and Universities
  - Federal, State, and Local Climatologists
  - Agriculture Users, Drought Monitors, and Flood
    Management
  - Accident Investigators and Legal Community
  - Coastal Monitoring, Algae Blooms, and Fishing
    Management
13 CLASS Overview
- CLASS is a web-based data archive and
distribution system for NOAA's environmental data
- CLASS is an evolving system which will support
additional campaigns, a broader user base, and new
functionality as implementation continues for the
next 10 years
- CLASS is the principal IT system supporting
NOAA's responsibility as environmental data
stewards
- CLASS concurrently supports both ongoing
operations and new requirements implementation
14 CLASS Campaigns
- NOAA and Department of Defense (DoD)
Polar-orbiting Operational Environmental
Satellites (POES) and Defense Meteorological
Satellite Program (DMSP)
- NOAA Geostationary Operational Environmental
Satellites (GOES)
- EUMETSAT Meteorological Operational Satellite
(Metop) Program
- NOAA NEXT generation weather RADAR (NEXRAD)
Program and future dual polarized and
phased-array radars.
- National Aeronautics and Space Administration
(NASA) Earth Observing System (EOS)
Moderate Resolution Imaging Spectroradiometer (MODIS)
- The NPOESS Preparatory Project (NPP)
- National Polar-orbiting Operational Environmental
Satellite System (NPOESS)
- National Centers for Environmental Prediction
Model Datasets, including Reanalysis Products
15 CLASS GOALS
- Give any potential customer access to all NOAA
(and possibly non-NOAA) data through a single
portal
- Eliminate the need to keep creating stovepipe
systems for each new type of data but, as
much as possible, use already polished
portions/modules of existing legacy systems
- Describe a cost-effective architecture that can
primarily handle large array data sets but is also
capable of handling smaller data sets
16 CLASS Summary
- A NOAA-wide Data Management System (DMS) can
evolve from CLASS by initially integrating with
the NOAA National Data Centers and ultimately
with the NOAA Centers of Data
- The CLASS backbone will provide the DMS for
large-array (largely NESDIS) data sets, but also
provide secure archival services to other NESDIS
and NOAA users who participate in the NMMR and
NOAA-Server
- This approach will leverage the resources of
CLASS, NVDS, SDS, and the various funding
vehicles being used by non-NESDIS NOAA
organizational components
- This semi-distributed architecture, with central
data and metadata archives built on international
standards, will allow future integration of
NOAA systems into GEOSS
- CLASS will be the NOAA archive for NPP/NPOESS,
EOS and GOES-R data
- CLASS is accessible via the web at
www.class.noaa.gov
17 LTA Challenges
- What data are needed for long-term archive?
- How is long-term preservation achieved?
- What services do users need to deal with these
data volumes?
- What are the people vs. machine issues?
- How will new technology help?
- Metrics for assessing how we are doing
- National Research Council Panel enabled to help
address this issue
18 Challenges
- The archived information must be useable by
consumers who are separated in time, distance and
background from the producers
  - producers no longer available
  - cannot answer questions on ad-hoc basis
  - producers' software not supported - may be
    obsolete
  - knowledge captured by the software becomes
    unavailable
  - documentation is lost over time
19 ...challenges
- The user community will change over time
  - new community will be unfamiliar with the
    background to the information
  - may use different analysis environment
  - may want to combine information from many sources
- The archive will change over time
  - migration to new technology - hardware/software
  - may require reorganization of information
  - possible changes in implicit relationships
  - migration to different institutions
  - possible changes to management, data structure,
    file format
20 MODIS Pilot Project
- Purpose is to define system interfaces and
implement transfer capability.
- Established MODIS L0 and L1B as initial
candidates for data transfer - L0 is stable and
L1B has high user demand.
- Team includes representatives from ESDIS, GES
DAAC, MODIS SDST, CLASS/Suitland, NCDC, NGDC,
Fairmont, WV.
- Established the collaboration tools and methods.
  - Pilot Plan Actions and Schedules
  - Working Group Charter
- Following the Open Archival Information System
(OAIS) Standard as an LTA Reference Model.
21 Pilot Schedule Highlights
- CLASS Operational at NSOF (ingest node): Jan 2006
- Prototype 1 week MODIS L0 transfer: Feb 2006
- Prototype 1 month L0 transfer (6x rate): Mar 2006
- Evaluate L0 continuous feed (40 days): Jun 2006
- DSA for MODIS L1B: Feb 2006
- ICD for MODIS L1B: Apr 2006
- Prototype 1 week L1B transfer: May 2006
- Prototype 1 month L1B transfer (6x rate): Jun 2006
- Evaluate continuous feed (20 days): Jul 2006
- Access and Delivery Capability: Aug 2006
- Pilot Project Evaluation Report: Oct 2006
- Project Plan for NOAA MODIS LTA: Dec 2006
- NOAA/NASA Panel Report: Dec 2006
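The "(6x rate)" entries in the schedule can be read as replaying a span of data at six times the rate at which it was originally produced. That reading is an assumption, since the slides do not define the term. Under it, a minimal sketch of the elapsed transfer time and the sustained network rate such a replay would need (all names and parameters here are hypothetical):

```python
def transfer_days(data_span_days: float, rate_multiplier: float) -> float:
    """Wall-clock days needed to move `data_span_days` worth of data
    when replaying at `rate_multiplier` times the production rate."""
    if rate_multiplier <= 0:
        raise ValueError("rate_multiplier must be positive")
    return data_span_days / rate_multiplier


def required_rate_mbps(daily_volume_gb: float, rate_multiplier: float) -> float:
    """Sustained rate in megabits per second needed to move
    `daily_volume_gb` (decimal GB per day of production) at the given
    multiple of real time. The daily volume is an input; the slides
    do not state one."""
    megabits_per_day = daily_volume_gb * 8 * 1000  # 1 GB = 8000 Mb (decimal)
    return megabits_per_day / 86_400 * rate_multiplier


if __name__ == "__main__":
    # One month of data, taken here as 30 days, replayed at 6x.
    print(transfer_days(30, 6))
```

With a month taken as 30 days, a 6x replay completes in about 5 days, which is consistent with scheduling the one-month prototypes as short exercises before the continuous-feed evaluations.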
22 MODIS Pilot Project Expected Outcomes
- NASA and NOAA will have a better hands-on
understanding of system capabilities, conventions
(e.g. data model), standards and processes of
the respective systems.
- A draft set of interface documentation (DSA, ICD,
etc.).
- An interface between EOSDIS and CLASS - defined
and exercised.
- An actual demonstration of CLASS support for EOS
data.
- The foundation for the development of a sound
NASA/NOAA LTA plan.
23 Reference Model
- A Reference Model is needed to provide a common
framework for discussion and description
- A major aim is to facilitate a much wider
understanding of what is required to preserve
information for the long term
- Facilitates description and comparison of
archives
- Provides a basis for further standardization
- Helps broaden the market for commercial providers
24 ...Reference Model
- We are particularly concerned with Long-Term
Preservation of digital information
  - long term is long enough to be concerned about
    changing technologies
  - not just bit preservation
  - starting point for model addressing non-digital
    information
25 ......Reference Model
- But this work is also of use for Short-Term
archives because
  - technological change is rapid (years, not
    decades)
  - the short-term archive may eventually hand
    information over to another, longer-term, archive
26 Areas for Standards to follow
- Interfaces between OAIS type archives
- Submission to OAIS (SIP)
- Dissemination from OAIS (DIP)
- Search and retrieve metadata from OAIS
  - Sufficient information should be provided to
    ensure the rendered content may be interpreted
    and understood by its intended users.
- Information migration
  - Procedures should indicate the file format and
    version to be created and software used to create
    it.
- Provenance
  - A description of the content history, including
    its origins, changes to the object or its content
    over time, and its chain of custody (if known).
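The migration and provenance items above describe what an archive should record: the target format and version, the software used, and the object's origin, changes, and chain of custody. A minimal sketch of such a record follows. The class and field names are hypothetical and are not taken from the OAIS standard or from CLASS; the identifiers and format labels in the example are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class ProvenanceEvent:
    """One entry in an object's content history (hypothetical schema)."""
    when: date
    action: str   # "migrated" or "custody-transfer" in this sketch
    agent: str    # who performed the migration, or the new custodian
    detail: str = ""


@dataclass
class ArchivedObject:
    identifier: str
    file_format: str
    format_version: str
    origin: str  # original producer / first custodian
    history: List[ProvenanceEvent] = field(default_factory=list)

    def migrate(self, new_format: str, new_version: str,
                software: str, agent: str, when: date) -> None:
        """Record the target format, version and software, then update."""
        detail = (f"{self.file_format} {self.format_version} -> "
                  f"{new_format} {new_version} using {software}")
        self.history.append(ProvenanceEvent(when, "migrated", agent, detail))
        self.file_format = new_format
        self.format_version = new_version

    def transfer_custody(self, new_custodian: str, when: date) -> None:
        """Record a hand-over to another institution."""
        self.history.append(
            ProvenanceEvent(when, "custody-transfer", new_custodian))

    def chain_of_custody(self) -> List[str]:
        """Origin followed by every later custodian, in order."""
        return [self.origin] + [e.agent for e in self.history
                                if e.action == "custody-transfer"]


if __name__ == "__main__":
    obj = ArchivedObject("example-object-001", "format-A", "1", "Producer")
    obj.transfer_custody("Archive", date(2006, 6, 1))
    obj.migrate("format-B", "2", "converter-x", "Archive", date(2010, 1, 1))
    print(obj.chain_of_custody())
    print(obj.history[-1].detail)
```

Keeping the history as an append-only list means each migration or hand-over adds to the record rather than overwriting it, which is the property the provenance bullet asks for.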
27 OAIS Responsibilities
- Negotiates and accepts Submission IPs
- Determines communities which need to be able to
understand Content Information
- Ensures information to be preserved is
understandable to designated communities
- Assumes sufficient control of information to be
able to ensure long-term preservation
- Follows policies and procedures to ensure
information is preserved
- Makes the information available to the designated
communities in appropriate forms
28 Data Submission Agreement (DSA)
- Include all the information that is necessary for
the producer to provide data products to the
archive and for the archive to receive the data
products from the producer
- It seems a daunting (or rather an impossible)
task to collect all of the information listed
above in a single document in a timely fashion.
- Need a high-level agreement in place before we
proceed to specify the details of the
Producer-Archive interface and to design the
respective systems.
- There is no way of compiling operational
information until near the start of the
operational phase.
29 DSA Groupings
- High-level agreement
  - the content is rather static (i.e., temporally
    stable) and provides a framework for both the
    Producer and the Archive to move on to defining
    details.
- Detailed level interface and some functional
specification
  - the content is somewhat dynamic (i.e., changing
    with time) and requires the Producer and the
    Archive to do some in-depth studies.
- Operational information
  - the content is not available until near the time
    when the data flows commence (e.g., IP addresses,
    host directory names, operations contacts)
- Quasi-static metadata details
  - the definite content is hard to come by,
    especially for planned spacecraft missions.
- More??
30 DSA Groupings
- With these considerations in mind, we suggest
that the Producer-Archive Agreement be
  - divided into several separate documents
  - each signed at a different
    management/technical level and at a different
    time
- Memorandum of Agreement (MOA)
- Interface Control Document (ICD)
- Operations Agreement (OA)
- Quasi-Static Metadata Specification (QSMS)
- Others?
31 DSA Groupings
- The MOA should be developed early on and signed
by high-level management of both parties.
  - It should provide a firm ground for detailed
    level technical work to proceed.
  - Any details that will become clearer later or
    simply are unknown will have to be deferred to
    the lower level components of the agreement
    (i.e., ICD, OA, and QSMS).
- Once the MOA is signed, both parties may start
developing the ICD and QSMS.
  - These form the basis for the design of the physical
    systems (for both the Producer and the Archive).
- The OA can wait until the time of the system I&T
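The sequencing described above (MOA first, then ICD and QSMS, with the OA deferred) can be expressed as a small dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The edges from ICD and QSMS to the OA are an assumption made for illustration; the slides say only that the OA can wait until later.

```python
from graphlib import TopologicalSorter

# Each key lists the documents that must be signed before it.
# MOA -> ICD/QSMS follows the slide; ICD/QSMS -> OA is an assumption.
DEPENDS_ON = {
    "MOA": set(),
    "ICD": {"MOA"},
    "QSMS": {"MOA"},
    "OA": {"ICD", "QSMS"},
}


def signing_order(depends_on):
    """Return one valid order in which the documents can be completed."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(signing_order(DEPENDS_ON))
```

The ICD and QSMS have no dependency on each other, so they may appear in either order; the MOA always comes first and, under the assumed edges, the OA last.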
32 DSA Groupings
- Depending on the circumstances, the Producer and
the Archive may include additional documents.
  - There are certain items that do not belong in the
    MOA and yet are not covered by the ICD.
  - We may call it a Supplement to the MOA (yet still
    separate from the MOA).
- This approach of creating the Producer-Archive
Agreement in multiple volumes and releasing each
sequentially in time appears to be far better
than the current approach of creating a
single-volume agreement.
33 Use of OAIS
- Benefits
  - Good overall framework of terms, functions and
    processes to structure the LTA discussion
  - Identifies a set of documents to capture and
    record requirements and specifications
- Challenges
  - Timing - starting to use OAIS in the middle of
    the data life cycle of EOS data has been
    difficult
  - Complexity of EOS LTA requirements - numbers of
    products, data volumes, processing S/W
  - Overload on Data Submission Agreement - Interface
    Requirements Document, Interface Control
    Document, Operations Agreement
34 Conclusions
- Transfer of NASA's Earth science data to NOAA for
long-term preservation and stewardship is a major
undertaking
- NOAA/NASA MODIS Pilot Project - way to get
started
- Project provides great case study for use of OAIS
Reference Model
  - Services to project and source of feedback on RM
  - Still learning how to best use OAIS
- Expect that as OAIS is more widely used, over
entire data life cycle, it will be even more
valuable