Discovering Communicable Scientific Knowledge from Spatio-Temporal Data

Learn more at: https://math.nist.gov

1
Discovering Communicable Scientific Knowledge
from Spatio-Temporal Data
  • Mark Schwabacher
  • NASA Ames Research Center
  • Computational Sciences Division
  • mark.schwabacher_at_arc.nasa.gov
  • http://ic-www.arc.nasa.gov/people/schwabacher/
  • Joint work with Pat Langley and Jeff Shrager
    (ISLE) and Chris Potter, Steve Klooster, Lisy
    Torregrosa, and Vanessa Brooks (NASA Earth
    Science)

2
Outline
  • Description of Earth science problem
  • Choice of representation and algorithm
  • Results
  • Visualizations
  • Discovery of an error in the data
  • Future Work

3
Earth Science Problem
  • The Normalized Difference Vegetation Index (NDVI)
    is a measure of vegetation across the globe
    derived from satellite data
  • NDVI is used in various Earth-science models
  • Unfortunately, NDVI is only available for the
    years since 1983, when a satellite with these
    sensors was launched
  • We would like to predict NDVI at a point on the
    globe from ground-based climate variables
    representing temperature, precipitation, and
    moisture

4
Choice of Representation
  • For scientific applications, the learned models
    should be:
    • Understandable
    • Communicable

5
Representation used by scientists
  • Our Earth Science collaborators had built the
    following model, which uses an if statement to
    select between two linear models, one for warmer
    locations and one for cooler locations:
  • if GDD < 3000 then
      ln(NDVI) = 0.715 ln(GDD) + 0.377 ln(PPT) + 0.448
  • if GDD > 3000 then
      NDVI = 189.89 AMI + 44.02 ln(PPT) + 227.99

6
Choice of Algorithm
  • We selected regression rules as a generalization
    of the Earth scientists' representation
  • We selected Cubist to learn them:
    http://www.rulequest.com
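
To make the idea of regression rules concrete, here is a minimal Python sketch of the representation: each rule pairs a condition with a linear model. This is not Cubist's implementation. The `Rule` class, the `predict` helper, and every coefficient below are hypothetical and chosen only for illustration. Averaging the predictions of all matching rules is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Rule:
    """One regression rule: a condition plus a linear model."""
    condition: Callable[[Features], bool]
    intercept: float
    weights: Dict[str, float]  # feature name -> coefficient

    def value(self, x: Features) -> float:
        # Linear model: intercept plus weighted sum of the named features.
        return self.intercept + sum(w * x[name] for name, w in self.weights.items())


def predict(rules: List[Rule], x: Features) -> float:
    """Average the linear models of every rule whose condition holds."""
    matching = [r for r in rules if r.condition(x)]
    if not matching:
        raise ValueError("no rule covers this point")
    return sum(r.value(x) for r in matching) / len(matching)


# Hypothetical two-rule model split on a temperature-like variable.
toy_rules = [
    Rule(lambda x: x["GDD"] < 3000, intercept=10.0, weights={"PPT": 2.0}),
    Rule(lambda x: x["GDD"] >= 3000, intercept=50.0,
         weights={"PPT": 1.0, "AMI": 100.0}),
]

cool = predict(toy_rules, {"GDD": 1000.0, "PPT": 20.0, "AMI": 0.5})
warm = predict(toy_rules, {"GDD": 4000.0, "PPT": 20.0, "AMI": 0.5})
```

An if statement selecting between two linear models is the special case with exactly two rules and mutually exclusive conditions, which is why regression rules generalize that form.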

7
First Results
Cubist produced better accuracy, but the model was
hard to understand.

8
Varying the Cubist minimum rule cover parameter

9
2-rule Cubist model
  • if PPT < 25.457 then
      NDVI = -3.225 + 7.07 PPT + 0.0521 CDD - 84 AMI
             + 0.4 ln(PPT) + 0.0001 GDD
  • if PPT > 25.457 then
      NDVI = 386.3 + 316 AMI + 0.0294 GDD - 0.99 PPT
             + 0.2 ln(PPT)
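
As a worked illustration, the two rules on this slide can be written as a small Python function. This is a sketch under stated assumptions: the terms are taken to combine as a sum with the signs used in the code, PPT must be positive so that ln(PPT) is defined, and a point exactly at the threshold is assigned to the first rule. The function name `ndvi_two_rule` and the input values are hypothetical, not data from the study.

```python
import math

PPT_THRESHOLD = 25.457  # split point of the 2-rule model


def ndvi_two_rule(ppt: float, cdd: float, ami: float, gdd: float) -> float:
    """Evaluate the 2-rule model for one location (signs are an assumption)."""
    if ppt <= 0:
        raise ValueError("PPT must be positive for ln(PPT)")
    if ppt <= PPT_THRESHOLD:
        # Rule 1: drier locations.
        return (-3.225 + 7.07 * ppt + 0.0521 * cdd - 84 * ami
                + 0.4 * math.log(ppt) + 0.0001 * gdd)
    # Rule 2: wetter locations.
    return (386.3 + 316 * ami + 0.0294 * gdd - 0.99 * ppt
            + 0.2 * math.log(ppt))


# Arbitrary inputs, one on each side of the threshold.
dry = ndvi_two_rule(ppt=10.0, cdd=0.0, ami=0.0, gdd=0.0)
wet = ndvi_two_rule(ppt=100.0, cdd=0.0, ami=0.0, gdd=0.0)
```

Writing the model this way shows why a two-rule model is easy to communicate: the whole predictor fits in one conditional.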

10
Visualization 1: Cubist model in one variable

11
Visualization 2: Activity of Cubist rules

12
Visualization 3: Error of Cubist model

13
Testing the model across years
  • We trained Cubist using one year's data
  • We tested the resulting model on other years'
    data
  • If it transfers, it's useful for Earth scientists
  • If it sometimes doesn't transfer, that could
    point to a scientific discovery
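
The cross-year test can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: a one-variable least-squares line stands in for Cubist, the data are synthetic (the year labels are just dictionary keys), and all function names are hypothetical. One year is given a deliberate offset so that the transfer error stands out against the within-year cross-validation error.

```python
from typing import Dict, List, Tuple

Line = Tuple[float, float]                  # (slope, intercept)
YearData = Tuple[List[float], List[float]]  # (inputs, targets)


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares for a single input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model: Line, xs: List[float], ys: List[float]) -> float:
    slope, intercept = model
    return (sum((slope * x + intercept - y) ** 2
                for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def loo_rmse(xs: List[float], ys: List[float]) -> float:
    """Leave-one-out cross-validation error within a single year."""
    squared = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared.append((slope * xs[i] + intercept - ys[i]) ** 2)
    return (sum(squared) / len(squared)) ** 0.5


def transfer_errors(data: Dict[int, YearData], train_year: int) -> Dict[int, float]:
    """Train on one year, then report the error on every other year."""
    model = fit_line(*data[train_year])
    return {year: rmse(model, xs, ys)
            for year, (xs, ys) in data.items() if year != train_year}


# Synthetic data: the last year carries an artificial +5 offset.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
data = {
    1983: (xs, [2 * x + 1 for x in xs]),
    1984: (xs, [2 * x + 1 for x in xs]),
    1985: (xs, [2 * x + 6 for x in xs]),
}
errors = transfer_errors(data, train_year=1984)
within_1985 = loo_rmse(*data[1985])
```

A year that is easy to model on its own (low `within_1985`) but badly predicted by a model trained on another year (high `errors[1985]`) is the pattern worth investigating.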

14
Discovery of an error in the data
Cross-validate 1985
Train 1984, test 1985

15
Related Work
  • Regression trees: Breiman et al.'s CART (1984)
  • Classification applied to Earth science: Brodley
    & Friedl (1999); Ester, Kriegel, Xu (1996)
  • Visualizing classes on a map: Brodley & Friedl
    (1999); Smyth, Ghil, Ide (1999)
  • Detecting and correcting faulty class labels in
    data: John (1995); Brodley and Friedl (1999)
  • Detecting and correcting calibration problems in
    remote-sensing systems using a predefined model:
    Chen (1997)

16
Future Work
  • Cubist/NDVI work
    • Incorporate time explicitly
    • Include other variables (e.g., elevation)
    • Test understandability
  • Other work
    • Improve CASA model (next talk)
    • Implement an interactive system that lets
      scientists direct high-level search for improved
      ecosystem models

17
Lessons Learned
  • We've identified three problems that arise in
    scientific applications of ML, and proposed
    initial solutions:
    • Communicability: Use the same representation as
      the scientists.
    • Understandability: When using spatial data,
      spatially visualize the model's errors and the
      activity of its components.
    • Quantitative errors: When using time-series data,
      quantitative errors can be identified by testing
      a model trained on one time period against data
      from other time periods.
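
The spatial-visualization advice can be illustrated with a small Python sketch that builds the two grids one would plot: which rule is active at each cell, and the signed error at each cell. The grids, the function names, and the text rendering are hypothetical; a real workflow would draw these grids on a map. The threshold is the PPT split from the 2-rule model, with cells at the threshold assigned to the first rule as an assumption.

```python
from typing import List

Grid = List[List[float]]


def rule_activity_map(ppt_grid: Grid, threshold: float) -> List[str]:
    """Mark each cell '1' if the first rule applies (PPT <= threshold), else '2'."""
    return ["".join("1" if ppt <= threshold else "2" for ppt in row)
            for row in ppt_grid]


def error_map(predicted: Grid, observed: Grid) -> Grid:
    """Signed error (predicted minus observed) for every grid cell."""
    return [[p - o for p, o in zip(prow, orow)]
            for prow, orow in zip(predicted, observed)]


# Hypothetical 2x3 precipitation grid and 2x2 NDVI grids.
ppt = [[10.0, 20.0, 30.0],
       [40.0, 15.0, 60.0]]
activity = rule_activity_map(ppt, threshold=25.457)

errs = error_map([[100.0, 200.0], [300.0, 400.0]],
                 [[90.0, 210.0], [300.0, 350.0]])
```

Viewing the two grids side by side shows whether large errors cluster in the region governed by one particular rule.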