Title: DISTRIBUTED DATA INFORMATION SYSTEMS SUPPORTING EARTH OBSERVING AND REMOTE SENSING PROJECTS Prototyp
1DISTRIBUTED DATA INFORMATION SYSTEMS SUPPORTING
EARTH OBSERVING AND REMOTE SENSING PROJECTS
Prototype SEASONAL TO INTERANNUAL ESIP
Menas Kafatos
Center for Earth Observing and Space Research (CEOSR)
George Mason University
mkafatos_at_gmu.edu
http://www.siesip.gmu.edu
- A Distributed Data and Information System among
GMU, COLA, GDAAC, and UDel
2Goals of the Federation
- The goals of the Federation are to increase the quality and value of Earth Science products and services throughout their life cycle.
- The beneficiaries are all the Federation's stakeholders.
- The goals will be achieved by continuously improving all of the science-based processes underpinning its goods and services.
3Achieving the goals
- Encouraging and establishing the use of best science practices to ensure the quality and breadth of data and resultant information, products and services.
- Ensuring that data and information can be readily exchanged and integrated to improve Earth science data, information, products, and services.
- Contributing to the development of an Earth science information economy through the comprehensive consideration of applications, research and commerce.
- Increasing the diversity and breadth of users and uses of Earth science data, information, products and services.
4Types of new services
- Facilitating integrated data use
- User-specified products
- Data mining tools
- Model outputs for users
5MODIS 250m First light 16-day NDVI composites will be made by the GLCF for the conterminous U.S. Sub-sets for all states will be available.
[Figure: NDVI map (color scale 0 to 1, 100 km scale bar) covering Nevada, California, and Arizona.]
Created using L2G Surface Reflectance for days 81 and 82, 2000. Tile h08v05.
6Terra Vandenberg Launch Dec 18, 1999
Solar Array Deploy
Terra's solar panel supplies the 3 kilowatts of
power needed by the spacecraft. Its deployment is
the first major event after Terra separates from
its fairing.
Solar Array Deployment
Image Processed at NASA GSFC
7Terra Vandenberg Launch Dec 18, 1999
Terra's Sensors
Combined Swaths of Terra's Instruments
Terra will conduct many of its observations
simultaneously, allowing for new ways of
integrating different types of data
Multiple Sensors On Board Terra
8PROJECT
Science Advisory Board
SIESIP Management Committee
Federation
SIESIP is a distributed Earth Science Information Partner (ESIP) involving two universities (George Mason University, University of Delaware), a seasonal to interannual (S-I) research center (Center for Ocean-Land-Atmosphere Studies), and a NASA data center (Goddard Distributed Active Archive Center).
It develops, implements, and operates a distributed data and information system that addresses the research needs of S-I, TRMM, SCSMEX, and interdisciplinary Earth Science.
9SCIENCE: Seasonal-Interannual Climate
- One of Four Major Themes of USGCRP
- One of Five Major Science Areas of NASA's ESE
- SIESIP Science Driver
- Seasonal-Interannual Climate Variations,
Predictability and Prediction
10Seasonal-Interannual Climate
- Bridges Timescale Gap
- Same Models Used for NWP, S-I, DecCen
- Same Data Needed as Input to Models for Initial Conditions, Boundary Conditions, Validation
- Bridges Spatial Domains
- Primarily Tropical Phenomena
- Teleconnections to Global Climate and Regional Effects
- Good Match to Satellites' Timescales
- Years to decades (individual missions - systems, e.g. TRMM, EOS, DMSP, NOAA-POES)
11Challenge: Distributed NASA and Non-NASA Data
- By Enabling Analysis of NASA Satellite Data in Context of Non-NASA Data Sets from a Physically Distributed, Logically Unique Platform
- Can Extend Satellite Data Time Span
- Can Validate/Verify Remotely Sensed Parameters
- Can Include Parameters and Regions Not Measured from Satellite in Research Analysis, e.g.,
- Convective Latent Heating
- Sub-Surface Ocean Quantities
- Can Enable Unique Satellite - In Situ Analysis, e.g.,
- PMEL TOGA TAO Buoy Data vs. Scatterometer Winds
- Rain Gauge Network Data vs. TRMM Precipitation Products
12Seasonal-Interannual Climate
- Multidisciplinary/Interdisciplinary Research
- Coupled atmosphere/ocean
- Effects on Biosphere
- Connection to Hydrological Cycle (tropical rainfall, convection, etc.)
- Multiple Phenomena
- ENSO
- Monsoons
- Teleconnections (effects at continental and sub-continental levels)
- Relation to Droughts, Event-driven Phenomena, etc.
- Multiple Time Scales
- Spans short-scale weather and longer-term climate variability
- Multi-Agency Data Sets (NASA, NOAA, ...)
- S-I Community of Scientists (Data Providers and Users)
- Input being provided by Advisory Board with representation from S-I (Shukla, Schopf, Miyakoda, Reynolds, etc.), TRMM (North, Weinman), NSIPP (Schubert), SCSMEX (Lau), and IDS (Sorooshian) communities
13 1997-98 El Niño SSTA
14USERS/UTILITY
- Users consist of
- 1) discipline researchers (e.g., TRMM rainfall researchers)
- 2) interdisciplinary scientists (e.g., land, ocean, atmosphere modelers)
- 3) graduate students
- Needs include
- 1) quick access to and delivery of relevant S-I data holdings
- 2) filtering of data by time, space and parameter
- 3) presentation of data in easy-to-use format requiring no special tools or libraries
- 4) if feasible, parameters on uniform temporal and spatial scales (useful for the last 2 categories of users)
- Technical/Scientific Challenge
- Ensuring close working relationship with scientists to ensure validity of procedures and to perform subsequent QA of data when regridding data to uniform scales (pertains to 4 above) and,
15SIESIP Federation
[Diagram labels: SIESIP Client; DODS; Others; Internet; GMU; Exchange Protocols; COLA; Data Ingest; Data Archiving; Data Orders; GDAAC; Data Orders; Other Data Sources (e.g. NOAA); Data Delivery]
16DATA: Current SIESIP Data Sets
17Current SIESIP Data Sets
New Data Sets
18Seasonal - Interannual Data Model and
Observational Data Sets at the SIESIP COLA node
- Moderately large collection of observational data
- Gridded analyses
- Station data
- Very large collection of model output
- COLA AGCM
- COLA OGCM (MOM)
- COLA coupled and anomaly-coupled models
- Dynamical Seasonal Prediction participants' AGCMs
- Coupled predictions
19COLA Data
- Multiple Parameters
- Precipitation
- Snow
- Land Surface - Soil Moisture, Soil Wetness, and
Greenness
20COLA Data
- Surface Temperature
- Surface Wind Stress
- Ocean Sub-Surface
- Radiation Quantities
21SIESIP TRMM Data Sets
- TRMM Standard Products
- TRMM Data Subsets
- SCSMEX Data Sets
Comparison of rainfall rate estimated by TRMM satellite rain algorithms. The average for February 1998, near the height of El Niño, is shown (mm/hr).
22SIESIP Supports SCSMEX Data Analysis
- SIESIP provides TRMM gridded and satellite coincidence data subsets, and GMS data, for Field Campaign and seasonal to inter-annual analyses
- Data available at http://daac.gsfc.nasa.gov/CAMPAIGN_DOCS/TRMM_FE/scsmex/scsmex.html
- SIESIP will produce TRMM SCSMEX data CD for international distribution at the SCSMEX Science Team's request
23 3-D Orbit Viewer allowing on-the-fly viewing of
TRMM data for a variety of hurricanes, typhoons,
and tropical storms. Hurricane Bonnie is shown
here.
24Tropical Cyclone Leo, 4/29/99 (TSDIS/GMU Orbit
Viewer)
25Data provided by NASA/NASDA/CRL
http://www-tsdis.gsfc.nasa.gov/tsdis/TSDISorbitViewer/release.html
26UDel Data
- AVERAGE SEASONAL-CYCLE ESTIMATES FOR THE WORLD
- Gridded datasets are archived on the SIESIP site, as well as on UDel.
- Climatologically averaged values of monthly and annual air temperature (T) and total precipitation (P) reinterpolated to a 0.5x0.5 degree grid.
- AVERAGE SEASONAL-CYCLE ESTIMATES FOR SOUTH AMERICA
- Climatologically averaged values of monthly and annual air temperature (T) and total precipitation (P) interpolated to a 0.5x0.5 degree grid, and their associated cross-validation fields. Genesis of gridded datasets involves DEM-aided interpolation.
- MONTHLY TIME-SERIES ESTIMATES FOR SOUTH AMERICA
- Monthly total precipitation (P) and average air temperature (T) interpolated to a 0.5x0.5 degree grid, and their associated cross-validation fields for each month in the period 1961-1990.
28Climatology Interdisciplinary Data Collection (CIDC) and NDVI Continental Subsets 1981-1999
http://daac.gsfc.nasa.gov/
- CIDC is a 4-CD-ROM set and NDVI is a 3-CD-ROM set; all data are available free by electronic transfer
- Over 70 Monthly Mean Global Climate Parameters - Land, Ocean, Sun, Cryosphere, Biosphere, Atmosphere; 8-km PAL NDVI set
- The CD-ROM sets were produced in collaboration with the Center for Earth Observing and Space Research (CEOSR) at George Mason University (with GrADS as the provided tool for CIDC)
29Monsoon Rain from SMMR
Climatology of monsoon rainfall over the Indian and West Pacific Oceans for October 1978 through August 1987.
30INFORMATION TECHNOLOGY STRATEGY
- Development of science scenarios to serve particular user communities
- Web accessibility
- Development of user queries
- Integration of tools accessibility with data set accessibility to allow meaningful, user-specified queries
- Integration of freely/easily accessible analysis tools such as GrADS with on-line visualization, data mining (pyramid), and metadata searches (XML and relational database management systems)
31Strategy: Standard/Open Design
- Standard languages (XML-based) for queries in phases 1, 2, and 3
- Exchangeable components
- Personalization
- Free, open-source software orientation
- Incorporating existing services, e.g., DODS, GrADS, EDG (V0), FTP
- Evolutionary implementation
32TECHNOLOGY COMPONENTS
- Completed or in progress
- GrADS/DODS server
- GrADS as a DODS client software
- Basic Web capability or online analysis tools
- Browse capabilities: various temporal and spatial correlations
- Data interoperability with COLA (ftp) data and DODS (http) data
- An XML-based online data search system for GrADS data sets, data at DODS sites, etc.
- Online 3D visualization of TRMM data
- Data mining/clustering method (non-system type)
- Participation in Federation technology clusters
- Content-based/Data Mining Cluster
- Cluster for Interoperability at the Data Level
33Implementation: Three-Phase Data Access Model
- Phase 1: A user browses and searches the static (or description) metadata and content-based metadata provided by the SIESIP system
- Phase 2: The user gets a quick look at the contents of the data through on-line data analysis or does detailed analysis on-the-fly (distributed)
- Phase 3: The user has located the data of interest and then orders the data
- This is an interactive and iterative process
34SIESIP Components
[Diagram labels: Content Browsing, Analysis, Data Order; GrADS Analysis Workbench; Data Order GUI; HTML; Class Libraries; Applet/Plug-In; Internet; SIESIP Data Sets; Data Pyramid; Local Metadata; Data and Metadata Systems on the Internet Outside of SIESIP (NOAA, DODS, NASA)]
35Flexible Standard
- XML
- A flexible markup language for information encoding.
- Our Approach
- Whatever you put there is metadata.
- All metadata should be searchable.
- If part of your metadata conforms to a standard, great! If not, ok.
- We don't try to create metadata standards: XML is the standard (language).
36The Role of XML
- Encoding queries of all three phases
- Metadata encoding and query processing
- Phase I query result presentation (see demo)
37What's in XML Metadata
- SIESIP data organization
- Domain knowledge from domain experts
- This allows the linkage between data sets and domain knowledge (e.g., which data sets are closely related to El Niño?)
- Annotation from scientists
- This allows searchable annotations (future)
- Structure of a file system with GrADS data
- This allows the directories, control files, etc., to be searched.
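The "all metadata should be searchable" approach can be illustrated with a small sketch. The XML element names, attribute names, and record contents below are hypothetical and are not the actual SIESIP schema; the point is only that a free-text search can run over whatever elements a record happens to contain, without requiring a fixed metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records; element and attribute names are illustrative only.
METADATA_XML = """
<siesip>
  <dataset id="ncep_sst">
    <parameter>SST</parameter>
    <phenomenon>El Nino</phenomenon>
    <temporal_resolution>MONTHLY</temporal_resolution>
    <annotation>Useful for ENSO studies.</annotation>
  </dataset>
  <dataset id="pal_ndvi">
    <parameter>NDVI</parameter>
    <temporal_resolution>MONTHLY</temporal_resolution>
  </dataset>
</siesip>
"""


def search(root, term):
    """Return ids of data sets in which any element's text contains term."""
    term = term.lower()
    hits = []
    for dataset in root.findall("dataset"):
        # Every descendant element is searched, whatever its tag is called.
        texts = (element.text or "" for element in dataset.iter())
        if any(term in text.lower() for text in texts):
            hits.append(dataset.get("id"))
    return hits


root = ET.fromstring(METADATA_XML)
print(search(root, "el nino"))
print(search(root, "monthly"))
```

Because the search walks every element, domain knowledge and annotations added later become searchable without changing the search code.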
38Welcome to Siesip Data Page
[Screenshot of the SIESIP data search page. Search categories: Parameter, Project/Experiment, Phenomenon, Time Range, Region, Repository, Format, Observation, Model, Contact. Parameters listed: SST, SSTA, NDVI, PRECIPITATION, Rain, Air Temperature. Data sets listed: NCEP SST, COADS climatology SST.]
Data set: NCEP SST
Spatial Resolution: Longitude 1.0 degree(s), Latitude 1.0 degree(s)
Temporal Coverage: from GMT Nov 1 00:00:00 1981 to GMT Jul 31 23:59:59 1997
Temporal Resolution: MONTHLY
Contact Information: name EOS Distributed Active Archive Center (DAAC), e-mail daacuso_at_daac.gsfc.nasa.gov
You may order this data set by clicking the Order button.
GrADS/data/ncep.1nmago
39GDAAC/GMU Technology Development
- GOALS
- Automate the transfer of online and nearline metadata between the GDAAC and GMU
- Make available all archive data from GDAAC using several popular interoperability mechanisms (e.g., metadata publishing, DODS)
- IMPORTANCE
- Archive data at the GDAAC is now accessible through GMU via data exchange protocol (metadata publishing)
- Provides opportunities for other ESIPs which require access to the extensive GDAAC archive using a simple/reliable method
- Potential exists for opening up the EOS data sets such as MODIS for search and order via this form of interoperability
- ACCOMPLISHMENTS
- Online and nearline metadata are updated daily and made available to SIESIP through the GDAAC
- DODS server is operational; a demonstration data set (TOMS) is available for serving, to be augmented by others shortly
40SIESIP GUI
- Integration of tools accessibility with data set accessibility to allow meaningful, user-specified queries
- Allows users to follow processes in real-time
- Based on Java Swing technology
41El Niño
1982/83 El Niño Event in March 1983
Sea Surface Temperature Anomaly (SSTA) and Wind
Field
High values of SSTA are found near the west coast of S. America.
Trade winds have dissipated.
Display using GrADS
42- The spatial pattern of the fifth principal
component of the NDVI variations over the United
States. Green and blue indicate positive
anomaly, yellow and red indicate negative anomaly.
43COLA IT GrADS
- Integrated User Interface Already in Place for
- Selecting, Accessing, and Sampling Data Sets (grids, stations, future - images)
- Computing and Deriving New Quantities
- Quantitatively Visualizing Results
- Designed to Handle Geophysical Data Sets
- Thousands of Users Worldwide
44Data Access/Interoperability/Analysis
- Level 0: Basic Web capability (interface using GrADS as analysis engine); limited functions but can provide quick results to relatively new users.
- Level 1: DODS server where server serves data in a general way, supports subsetting. Client can support data interoperability.
- Level 2: Stateless analysis server. One analysis request at a time (no memory from one request to another). Uses the innovative and unique ability of GrADS to do very sophisticated analysis tasks in a highly encapsulated way. What is needed is a dimension constraint, a list of data sets and a GrADS expression. Example: two data sets, 3 GB; amount of data processed at the server, 5 MB; amount of data returned to the client, 10 KB.
- Level 3: Session-oriented analysis server.
- Climate Analysis Workbench is planned for implementation on the client side, making use of server levels 1-3
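A Level 2 request carries three things: a dimension constraint, a list of data sets, and a GrADS expression. The following is a minimal sketch of such a self-contained request, assuming a simple query-string encoding. The class name, field names, query parameter names, and data set names are all hypothetical and do not describe the actual SIESIP server interface; the GrADS expression is only an illustrative example.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode


@dataclass(frozen=True)
class AnalysisRequest:
    """One stateless analysis request; everything the server needs is in here."""
    datasets: tuple    # names of the data sets to open on the server
    expression: str    # GrADS expression evaluated on the server
    lon: tuple         # dimension constraint: (west, east) in degrees
    lat: tuple         # dimension constraint: (south, north) in degrees
    time: tuple        # dimension constraint: (start, end) time labels

    def to_query(self):
        """Serialize all fields so no state has to be kept between requests."""
        return urlencode({
            "datasets": ",".join(self.datasets),
            "expr": self.expression,
            "lon": f"{self.lon[0]},{self.lon[1]}",
            "lat": f"{self.lat[0]},{self.lat[1]}",
            "time": f"{self.time[0]},{self.time[1]}",
        })


request = AnalysisRequest(
    datasets=("model_sst", "observed_sst"),         # hypothetical data set names
    expression="ave(sst.1 - sst.2, t=1, t=12)",     # illustrative GrADS expression
    lon=(120, 280),
    lat=(-20, 20),
    time=("jan1997", "dec1997"),
)
query = request.to_query()
print(query)
print(parse_qs(query)["expr"])
```

Because the request is complete in itself, the server can evaluate it without remembering any earlier request, which is the property that separates Level 2 from the session-oriented Level 3.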
45Before DODS SIESIP
GrADS
binary
Internet
GrADS, Fortran app
Ferret, Matlab, IDL
46DODS Enables Subsetting
DODS Server
GrADS
binary
Internet
Ferret, Matlab, IDL (DODS clients)
47GrADS Client-Server (Prototype)
datasets in any format supported by GrADS
GrADS-DODS Server
extracts meta-data and subsets
maps DODS requests to GrADS services
parses requests, packages data
handles HTTP protocol
binary data
GrADS batch mode
interface code
DODS server libraries
Java servlet
GRIB data
NetCDF data
HDF data
etc..
DODS requests and compressed data exchanged via
HTTP
internet
DODS: Distributed Oceanographic Data System. A protocol for transferring data and metadata over the internet, independent of file format; see http://www.unidata.ucar.edu/packages/dods/
Joe Wielgosz 5/25/00
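Subsetting through DODS is expressed in the request URL itself: a constraint expression after the `?` names a variable and gives a `[start:stride:stop]` index range for each of its dimensions. A minimal sketch of building such a URL follows; the host, data set path, variable name, and index ranges are hypothetical.

```python
def dods_subset_url(base_url, variable, index_ranges, response="dods"):
    """Build a DODS request URL with an array-subset constraint expression.

    index_ranges holds one (start, stride, stop) index triple per dimension.
    response is the suffix selecting the response type, e.g. "dods" for binary
    data, "dds" for the data set structure, or "das" for the attributes.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{base_url}.{response}?{variable}{hyperslab}"


BASE = "http://example.org/dods/ncep_sst"   # hypothetical server and data set
url = dods_subset_url(BASE, "sst", [(0, 1, 11), (60, 1, 120), (0, 1, 359)])
print(url)
# prints http://example.org/dods/ncep_sst.dods?sst[0:1:11][60:1:120][0:1:359]
```

Only the requested index ranges are packaged and sent back, so a client can work with a small piece of a large remote data set.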
48SIESIP's Contribution
GrADS/ DODS Server
DODS Server
binary
Internet
GrADS (DODS client), Fortran app
49GrADS Analysis Server (Design)
GrADS Analysis Server
datasets in any format supported by GrADS
performs analysis operations
manages sessions, translates dataset names
supports extended request types for analysis,
upload
GrADS data
GRIB data
GrADS batch mode
interface code
DODS server libraries
Java servlet
NetCDF data
etc..
session data
holds temporary data (uploaded, generated by a
previous operation, or transferred directly from
another server) for use in remote analysis
DODS data and requests
Client
upload, remote analysis, and download are
available via extended GrADS commands
custom DODS libraries
GrADS
Joe Wielgosz 5/25/00
50An Example
netCDF
HDF-EOS
netCDF
HDF-EOS
GrADS/ DODS Server
GrADS/ DODS Server
binary
Internet
Metadata from multiple sources in multiple
formats used to diagnose and differentiate
disparate data sets directly on the desktop.
GrADS (DODS Client)
51Use of Metadata Server: Example Possible Interface with GrADS/DODS?
[Diagram labels: User/Scientist; General User; Call out; Metadata Browse/Search; GrADS Client; DODS URL; Client workstation; DODS (GrADS) Server; Metadata (XML) Server; Remote systems]
52Interoperability/Data Access Scenario
- User visits SIESIP web page
- User selects parameters and enters them into workplace
- User issues an analysis command for selected parameters
- Server checks the associated data locations
- Server collects data sets through predefined protocol, e.g., ftp or DODS
- Server performs analysis defined by the user on-the-fly
- Server sends back the results to the client (e.g. images, time series, etc.)
- Client displays the results
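The server-side steps of this scenario (check data locations, collect data through a predefined protocol, analyze on-the-fly, return results) can be sketched as follows. The catalog, the locations, the stub fetchers, and the analysis names are all hypothetical; the fetchers return canned values instead of performing real ftp or DODS transfers.

```python
from statistics import mean

# Hypothetical catalog: parameter -> (protocol, location). Not real endpoints.
CATALOG = {
    "sst": ("dods", "http://example.org/dods/sst"),
    "precip": ("ftp", "ftp://example.org/pub/precip.dat"),
}


def fetch_dods(location):
    """Stub standing in for a DODS transfer; returns a canned time series."""
    return [26.1, 27.4, 28.9, 28.2]


def fetch_ftp(location):
    """Stub standing in for an ftp transfer; returns a canned time series."""
    return [3.0, 4.5, 7.5, 5.0]


FETCHERS = {"dods": fetch_dods, "ftp": fetch_ftp}
ANALYSES = {"mean": mean, "max": max}


def handle_request(parameters, analysis):
    """Locate each selected parameter, collect it, analyze it, return results."""
    results = {}
    for name in parameters:
        protocol, location = CATALOG[name]            # check the data location
        series = FETCHERS[protocol](location)         # collect via the protocol
        results[name] = ANALYSES[analysis](series)    # analyze on-the-fly
    return results


# The client would display whatever the server sends back.
print(handle_request(["sst", "precip"], "max"))
```

Keeping the protocol lookup in a table means a new access method can be added without touching the request handling itself.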
53Technology Accomplishments
- GrADS/DODS client
- GrADS/DODS universal server
- Enhancing GrADS to serve satellite data and to support more functionalities (in progress)
- By bringing together DODS and SIESIP, helping to bring oceanic and atmospheric communities together
- Providing solutions to bring together different tools and different formats for the use of scientists rather than imposing standards on them
- Providing large volumes of model and observational data to NASA communities with a front-end GrADS/DODS server (in progress)
54Accomplishments (cont.)
- Innovative, flexible, interoperable solutions for data access and data analysis, including on-line browsing and data ordering, specifically
- XML-based protocol/query for supporting metadata queries, metadata navigation, data analysis and data ordering (not fully done)
- System design/architecture that supports a general three-phase data access model
- Knowledge-base enhancement to metadata
- Service oriented component architecture
- Personalized data access method (not fully done)
- System prototype that supports the above
- Machine-to-machine metadata publishing and data ingest between GMU and GDAAC (not fully done)
- GMU/UAH/IBM sub-cluster for content-based search
- Using applets at IBM to issue queries answered by servers at GMU and UAH
55Metrics
- Facilitating the conduct of S-I science (input provided by SIESIP's Advisory Board)
- Increasing the number of new users using SIESIP data (e.g. TRMM at the DAAC)
- Bringing in new communities to use NASA data (e.g. S-I modelers using primarily NOAA, station, buoy data; GrADS users, etc.)
- Facilitating the interactions between different Earth science communities (atmospheric/GrADS communities with ocean/DODS communities)
- User ease in selecting, accessing, and analyzing data (vs. their current practices)
- Value-added/new data products derived from interactions in the federation (e.g. DODS, Interannual Climate Cluster, etc.)
- Data orders
- Success of the clusters SIESIP participates in
- Nodes in SIESIP being introduced to new technology implementations that will enhance their goals
- Refereed papers, proceedings, doctorates produced
56FEDERATION CLUSTERS
- SIESIP is participating in a number of different clusters
- Content-based Cluster/Data Mining
- Interoperability at the Data Level Cluster
- Hydrology Cluster
- Interannual Climate Cluster (ICC)
- LBA-E
57Content-based Browsing and Earth Science Data Mining
- Content-based browsing is a process of browsing or searching the content of data sets prior to actually accessing or ordering full data sets. It allows a user to acquire important information contained in the data in order to make better choices in data selection. It constitutes an on-line data mining capability.
- Accessing data by information content (mining) is as important as accessing by usual description metadata
- Content-based browsing process is interactive.
- Interactive content-based browsing requires user queries to be fully executed in a short enough time.
58Purpose of the function and its Importance for
Earth Sciences
- Utilizes browsing which accesses data content while allowing the user to obtain additional information for more refined queries, thereby reducing the need for large data transfers
- System is science-specific and can be tailored to different Earth Sciences user communities
59Implementation: Pyramid Data Model
- Motivation -- to support the interactive content-based browsing of large volumes of data
- For example, queries on the statistical properties of the data can be used in a content-based browsing process
- The challenge -- query processing performance for large data volumes
- Solution -- to speed up query evaluations by precomputing intermediate results which contribute to answering user queries.
- What kind of precomputations? How to apply them?
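One generic way to precompute intermediate results is a pyramid of block aggregates: sums and counts are stored at successively coarser resolutions, so a query on block averages never has to read the full-resolution data. The sketch below assumes this simple sum/count pyramid over a square grid; it illustrates the general idea and is not necessarily the precomputation used in the SIESIP pyramid. The NDVI values are made up.

```python
def build_pyramid(grid):
    """Precompute (sum, count) per cell at successively coarser 2x2-merged levels.

    grid is a square list of lists whose side length is a power of two.
    Level 0 is the full-resolution data; the last level is a single cell.
    """
    level = [[(value, 1) for value in row] for row in grid]
    pyramid = [level]
    while len(level) > 1:
        half = len(level) // 2
        coarser = []
        for i in range(half):
            row = []
            for j in range(half):
                cells = [level[2 * i + di][2 * j + dj]
                         for di in (0, 1) for dj in (0, 1)]
                row.append((sum(c[0] for c in cells), sum(c[1] for c in cells)))
            coarser.append(row)
        pyramid.append(coarser)
        level = coarser
    return pyramid


def regions_with_mean_above(pyramid, level, threshold):
    """Return (row, col) of blocks at a level whose average exceeds threshold."""
    return [(i, j)
            for i, row in enumerate(pyramid[level])
            for j, (total, count) in enumerate(row)
            if total / count > threshold]


# Made-up 4x4 NDVI grid for illustration.
ndvi = [
    [0.8, 0.7, 0.1, 0.2],
    [0.6, 0.9, 0.2, 0.1],
    [0.3, 0.2, 0.6, 0.7],
    [0.1, 0.4, 0.5, 0.8],
]
pyramid = build_pyramid(ndvi)
print(regions_with_mean_above(pyramid, 1, 0.5))
```

The query at level 1 touches four precomputed cells instead of sixteen data values; on a real global grid the saving grows with each coarser level.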
62New Data Mining Prototype
- Objective: Content-Based Browsing and Search
- Find areas and time periods in which a parameter value falls within a certain range.
- Examples
- Find regions with, e.g., Ave(NDVI) > 0.5
- Two-parameter conditional correlations, e.g. correlate SST anomalies in the tropical Pacific with AVHRR NDVI in specific regions where SSTA > some value,
- etc.
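A two-parameter conditional correlation of this kind can be sketched as follows: keep only the time steps where SSTA exceeds a threshold, then compute the Pearson correlation of the two series over those steps. The function names, the threshold, and the sample values are made up for illustration and are not SIESIP results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant series (zero variance) would make this division fail.
    return cov / sqrt(var_x * var_y)


def conditional_correlation(ssta, ndvi, ssta_min):
    """Correlate the two series using only time steps where SSTA > ssta_min."""
    pairs = [(s, v) for s, v in zip(ssta, ndvi) if s > ssta_min]
    if len(pairs) < 2:
        raise ValueError("fewer than two time steps satisfy the condition")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up spatially averaged time series, one value per time step.
ssta = [-0.5, 0.2, 1.0, 1.5, 2.0, 2.5]
ndvi = [0.30, 0.45, 0.50, 0.45, 0.40, 0.35]
r = conditional_correlation(ssta, ndvi, 0.5)
print(round(r, 3))
```

In this made-up sample the selected steps fall on a straight descending line, so the coefficient comes out close to -1; real data would of course give weaker values.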
63Comparison (differences)
Indexing
Top-down filtering
64Histograms After Clustering
Sum up the histograms in each cluster to get
the representative histograms.
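Summing the histograms within each cluster amounts to a bin-by-bin addition over all members that share a cluster label. A minimal sketch, assuming histograms are equal-length lists of bin counts and cluster labels are already assigned; the values are made up.

```python
def representative_histograms(histograms, labels):
    """Sum the bin counts of all histograms that share a cluster label."""
    representatives = {}
    for histogram, label in zip(histograms, labels):
        current = representatives.get(label, [0] * len(histogram))
        representatives[label] = [a + b for a, b in zip(current, histogram)]
    return representatives


# Made-up histograms (three bins each) and their cluster labels.
histograms = [[5, 3, 0], [4, 4, 1], [0, 2, 7], [1, 1, 8]]
labels = [0, 0, 1, 1]
print(representative_histograms(histograms, labels))
# prints {0: [9, 7, 1], 1: [1, 3, 15]}
```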
65Representative Histograms
66Scenarios
- Ocean scenario (with Ocean ESIP, P.O.DAAC)
- Correlate SST anomalies in the tropical Pacific with AVHRR NDVI in specific regions such as S. Africa, continental U.S., etc.
- Correlations are in time series of spatially averaged values of SSTA and NDVI as well as SOI and NDVI
- Vegetation scenario (for land, vegetation ESIPs)
- Find regions with NDVI (aver.) > 0.5
- Find deciduous forests in a particular geographical region, etc.
- Hurricane scenario (with PM-ESIP LIS SCF)
- Display TRMM PR and TMI data for specific hurricanes showing rain rate above a certain value
- Display all hurricanes with rain rate above a certain value
67Data Level Interoperability
- SIESIP is one of the DODS data server sites
- GrADS has been added to the DODS suite of client software
- DODS data access enabled through SIESIP GUI interface (next step)
- COLA ftp data access enabled through SIESIP GUI interface
- GrADS as part of DODS server
- To manipulate DODS data before transferring
- To support more data types and data formats
68Interannual Climate Cluster
- Name of ESIP/Cluster (type)
- GDAAC (ESIP-1)
- LIS SCF (ESIP-1)
- PO.DAAC (ESIP-1)
- EOS-WEBSTER (UNH) (ESIP-2)
- ESS-W (UCSB) (ESIP-2)
- GENESIS (JPL) (ESIP-2)
- GLCF (UMD) (ESIP-2)
- Ocean ESIP(JPL) (ESIP-2)
- PM-ESIP (UAH) (ESIP-2)
- SIESIP (GMU) (ESIP-2)
- UMAC (UND) (ESIP-3/RESAC)
- DODS Cluster (Cluster)
- LBA-E (Cluster)
- Focus is S-I Climate