Discovering Communicable Scientific Knowledge from SpatioTemporal Data

Slides: 18

Provided by: marks163

Learn more at: https://math.nist.gov

Transcript and Presenter's Notes

1
Discovering Communicable Scientific Knowledge
from Spatio-Temporal Data

Mark Schwabacher
NASA Ames Research Center
Computational Sciences Division
mark.schwabacher_at_arc.nasa.gov
http://ic-www.arc.nasa.gov/people/schwabacher/
Joint work with Pat Langley and Jeff Shrager
(ISLE) and Chris Potter, Steve Klooster, Lisy
Torregrosa, and Vanessa Brooks (NASA Earth
Science)

2
Outline

Description of Earth science problem
Choice of representation and algorithm
Results
Visualizations
Discovery of an error in the data
Future Work

3
Earth Science Problem

The Normalized Difference Vegetation Index (NDVI)
is a measure of vegetation across the globe
derived from satellite data
NDVI is used in various Earth-science models
Unfortunately, NDVI is only available for the
years since 1983, when a satellite with these
sensors was launched
We would like to predict NDVI at a point on the
globe from ground-based climate variables
representing temperature, precipitation, and
moisture
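
For readers unfamiliar with the index, the following minimal sketch shows the standard NDVI definition, computed from near-infrared and red reflectance. The slide does not give the formula; the function name and the sample reflectance values are illustrative assumptions, not data from this work.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI definition: (NIR - Red) / (NIR + Red).

    The formula is the textbook definition, not something stated on the slide.
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical reflectances: vegetation reflects much more NIR than red light.
print(ndvi(0.50, 0.08))
print(ndvi(0.30, 0.25))
```

Values near 1 indicate dense green vegetation, and values near 0 indicate little or none.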

4
Choice of Representation

For scientific applications, the learned models
should be:
Understandable
Communicable

5
Representation used by scientists

Our Earth Science collaborators had built the
following model with an if statement to select
between two linear models, one for warmer
locations and one for cooler locations
if GDD < 3000 then
ln(NDVI) = 0.715 ln(GDD) + 0.377 ln(PPT) + 0.448
if GDD > 3000 then
NDVI = 189.89 AMI + 44.02 ln(PPT) + 227.99
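
As a rough illustration, the collaborators' model can be written as a single function. This is a sketch under stated assumptions: the coefficients come from the slide, the signs joining the terms are treated as additions because no minus sign appears, and the boundary case GDD = 3000 (which the slide leaves open) is assigned to the warmer branch. GDD, PPT, and AMI are the slide's variable names, presumably the temperature, precipitation, and moisture variables mentioned earlier; the function name is hypothetical.

```python
import math

GDD_THRESHOLD = 3000.0  # split between cooler and warmer locations (from the slide)


def collaborators_ndvi(gdd: float, ppt: float, ami: float) -> float:
    """Piecewise model: log-linear for cooler sites, linear for warmer sites.

    Assumption: all terms are added; check the signs against the original
    publication before relying on the numbers.
    """
    if gdd <= 0 or ppt <= 0:
        raise ValueError("GDD and PPT must be positive to take logarithms")
    if gdd < GDD_THRESHOLD:
        # Cooler locations: the model predicts ln(NDVI), so exponentiate.
        log_ndvi = 0.715 * math.log(gdd) + 0.377 * math.log(ppt) + 0.448
        return math.exp(log_ndvi)
    # Warmer locations (GDD == 3000 is placed here by assumption).
    return 189.89 * ami + 44.02 * math.log(ppt) + 227.99


# Hypothetical inputs, chosen only to exercise both branches.
print(collaborators_ndvi(gdd=1500.0, ppt=40.0, ami=0.1))
print(collaborators_ndvi(gdd=4500.0, ppt=40.0, ami=0.1))
```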

6
Choice of Algorithm

We selected regression rules as a generalization
of the Earth scientists' representation
We selected Cubist to learn them: http://www.rulequest.com
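
To make "regression rules" concrete, here is a minimal sketch of one possible data structure: each rule pairs a condition with a linear model, and a prediction comes from the rules whose conditions hold. This is not Cubist's implementation or API. The class and function names, the toy coefficients, and the choice to average overlapping rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # variable name -> value


@dataclass
class RegressionRule:
    condition: Callable[[Case], bool]  # when the rule applies
    intercept: float                   # constant term of the linear model
    weights: Dict[str, float]          # coefficient per variable

    def predict(self, case: Case) -> float:
        return self.intercept + sum(w * case[name] for name, w in self.weights.items())


def predict_with_rules(rules: List[RegressionRule], case: Case) -> float:
    """Average the predictions of every rule that covers the case (an assumption)."""
    matching = [rule.predict(case) for rule in rules if rule.condition(case)]
    if not matching:
        raise ValueError("no rule covers this case")
    return sum(matching) / len(matching)


# Hypothetical two-rule model over a single made-up variable "x".
toy_rules = [
    RegressionRule(lambda c: c["x"] < 10.0, intercept=1.0, weights={"x": 2.0}),
    RegressionRule(lambda c: c["x"] >= 10.0, intercept=5.0, weights={"x": 0.5}),
]
print(predict_with_rules(toy_rules, {"x": 4.0}))
print(predict_with_rules(toy_rules, {"x": 20.0}))
```

An if statement selecting between two linear models, as the scientists used, is the special case of two rules with complementary conditions.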

7
First Results
Cubist produced better accuracy, but the model was
hard to understand.
8
Varying the Cubist minimum rule cover parameter
9
2-rule Cubist model

if PPT < 25.457 then
NDVI = -3.225 + 7.07 PPT + 0.0521 CDD - 84 AMI + 0.4 ln(PPT) + 0.0001 GDD
if PPT > 25.457 then
NDVI = 386.3 + 316 AMI + 0.0294 GDD - 0.99 PPT + 0.2 ln(PPT)
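
The two-rule model can be sketched as a function for experimentation. The coefficients and the PPT split are taken from the slide. The signs joining the terms are treated as additions except where a minus sign appears, which is an assumption worth checking against the original publication. The function name is hypothetical, and the boundary case PPT = 25.457 is assigned to the second rule by assumption.

```python
import math

PPT_SPLIT = 25.457  # rule boundary from the slide


def cubist_two_rule_ndvi(ppt: float, cdd: float, ami: float, gdd: float) -> float:
    """Evaluate the 2-rule model; variable names follow the slide."""
    if ppt <= 0:
        raise ValueError("PPT must be positive to take its logarithm")
    if ppt < PPT_SPLIT:
        # Rule 1: drier locations.
        return (-3.225 + 7.07 * ppt + 0.0521 * cdd - 84 * ami
                + 0.4 * math.log(ppt) + 0.0001 * gdd)
    # Rule 2: wetter locations (CDD does not appear in this rule).
    return 386.3 + 316 * ami + 0.0294 * gdd - 0.99 * ppt + 0.2 * math.log(ppt)


# Hypothetical inputs, one on each side of the split.
print(cubist_two_rule_ndvi(ppt=10.0, cdd=500.0, ami=-0.2, gdd=2000.0))
print(cubist_two_rule_ndvi(ppt=60.0, cdd=500.0, ami=0.3, gdd=4000.0))
```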

10
Visualization 1: Cubist model in one variable
11
Visualization 2: Activity of Cubist Rules
12
Visualization 2: Error of Cubist model
13
Testing the model across years

We trained Cubist using one year's data
We tested the resulting model on other years'
data
If it transfers, it's useful for Earth scientists
If it sometimes doesn't transfer, that could
point to a scientific discovery
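
The train-on-one-year, test-on-others procedure can be sketched as follows. A one-variable least-squares line stands in for Cubist here, which is a substitution made only to keep the example self-contained. The year labels and all numbers are made up, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (predictor value, observed value)
Line = Tuple[float, float]    # (slope, intercept)


def fit_line(samples: List[Sample]) -> Line:
    """Ordinary least squares for y = slope * x + intercept (stand-in learner)."""
    x_bar = mean(x for x, _ in samples)
    y_bar = mean(y for _, y in samples)
    sxx = sum((x - x_bar) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in samples) / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(model: Line, samples: List[Sample]) -> float:
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in samples)


def evaluate_across_years(data_by_year: Dict[str, List[Sample]],
                          train_year: str) -> Dict[str, float]:
    """Train on one year, then report the error on every other year."""
    model = fit_line(data_by_year[train_year])
    return {year: mean_abs_error(model, samples)
            for year, samples in data_by_year.items() if year != train_year}


# Made-up data: year B follows year A's relationship, year C is offset from it.
data = {
    "A": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "B": [(1.5, 13.0), (2.5, 15.0)],
    "C": [(1.0, 17.0), (2.0, 19.0)],
}
errors = evaluate_across_years(data, train_year="A")
for year in sorted(errors):
    print(year, round(errors[year], 3))
```

A year whose error is far above the others is a candidate for closer inspection, since the gap may reflect a real change or a problem in the data.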

14
Discovery of an error in the data
Cross-validate 1985
Train 1984, test 1985
15
Related Work

Regression trees: Breiman et al.'s CART (1984)
Classification applied to Earth science: Brodley &
Friedl (1999); Ester, Kriegel, & Xu (1996)
Visualizing classes on map: Brodley & Friedl
(1999); Smyth, Ghil, & Ide (1999)
Detecting and correcting faulty class labels in
data: John (1995); Brodley and Friedl (1999)
Detecting and correcting calibration problems in
remote-sensing systems using predefined model:
Chen (1997)

16
Future Work

Cubist/NDVI work
Incorporate time explicitly
Include other variables (e.g. elevation)
Test understandability
Other work
Improve CASA model (next talk)
Implement an interactive system that lets
scientists direct high-level search for improved
ecosystem models

17
Lessons Learned

We've identified three problems that arise in
scientific applications of ML, and proposed
initial solutions
Communicability: Use the same representation as
the scientists.
Understandability: When using spatial data,
spatially visualize the model's errors and the
activity of its components.
Quantitative errors: When using time-series data,
quantitative errors can be identified by testing
a model trained on one time period against data
from other time periods.
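
The understandability suggestion can be sketched with a small text rendering of a spatial grid: one map showing which rule fires in each cell, and one marking cells with large errors. The grid values, the split at 25.0, the stand-in predictor, and the function names are all hypothetical. A real application would plot these on a map of the globe.

```python
from typing import Callable, Dict, List

Cell = Dict[str, float]  # one grid location: variable name -> value


def rule_activity_map(grid: List[List[Cell]],
                      conditions: List[Callable[[Cell], bool]]) -> List[str]:
    """One character per cell: index of the first rule that fires, or '.' if none.

    Single-digit indices only, so this sketch assumes fewer than ten rules.
    """
    rows = []
    for row in grid:
        chars = []
        for cell in row:
            fired = [str(i) for i, cond in enumerate(conditions) if cond(cell)]
            chars.append(fired[0] if fired else ".")
        rows.append("".join(chars))
    return rows


def error_map(grid: List[List[Cell]], predict: Callable[[Cell], float],
              threshold: float) -> List[str]:
    """'#' where |observed - predicted| exceeds the threshold, '-' elsewhere."""
    return ["".join("#" if abs(cell["observed"] - predict(cell)) > threshold else "-"
                    for cell in row)
            for row in grid]


# Made-up 2 x 3 grid with a single predictor and an observed value per cell.
grid = [
    [{"ppt": 10.0, "observed": 100.0}, {"ppt": 20.0, "observed": 205.0},
     {"ppt": 30.0, "observed": 300.0}],
    [{"ppt": 40.0, "observed": 400.0}, {"ppt": 15.0, "observed": 260.0},
     {"ppt": 50.0, "observed": 498.0}],
]
conditions = [lambda c: c["ppt"] < 25.0, lambda c: c["ppt"] >= 25.0]
activity = rule_activity_map(grid, conditions)
errors_grid = error_map(grid, predict=lambda c: 10.0 * c["ppt"], threshold=50.0)
print("\n".join(activity))
print("\n".join(errors_grid))
```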