Tracking Metadata and Lineage of the Data Processing Chain for Mapping Snow Cover Properties with the NASA MODIS

1
Tracking Metadata and Lineage of the Data
Processing Chain for Mapping Snow Cover
Properties with the NASA MODIS
  • James Frew (1), Thomas H. Painter (2), Peter
    Slaughter (1), Jeff Dozier (1)

(1) Donald Bren School of Environmental Science and
Management, University of California, Santa
Barbara
(2) National Snow and Ice Data
Center, University of Colorado, Boulder
2
Outline
  • Motivation
  • Snow mapping product
  • Implications for hydrologic modeling
  • Lineage Capture
  • Wrapping: the ESSW experience
  • Instrumenting, overriding, monitoring: the
    (ongoing) ES3 experience

3
MODIS image: Sierra Nevada
EOS Terra MODIS, 07 March 2004, MOD09 Surface
Reflectance (0.555, 0.645, 0.858 µm)
4
Snow-covered area and grain size
5
Hindu Kush
2003 DOY 070
6
Colorado Rockies, CLPX, 13 March 2002
7
Model structure: MODIS snow-area / albedo
8
Lineage Capture, Take 1
  • The ESSW experience

9
Using Existing Science Applications
  • No standard Earth science computing environment
  • commercial packages (ArcInfo, MATLAB, ...)
  • public packages/models (MM5, MODTRAN, ...)
  • locally-developed codes
  • arbitrary combinations of the above
  • Example: SST from AVHRR
  • commercial, standalone programs
  • parameters highly customized for UCSB
  • How do we get these programs to communicate and
    cooperate with ESSW, without rewriting them?

[Diagram: processing steps Receive; Ingest and Calibrate; Navigate (Manual/Automatic); Sea Surface Temp (SST); Rectify; SST Maps]
10
Lineage: Current Best Practice
11
Earth System Science Workbench (ESSW)
  • Producer and consumer issues can both be
    addressed by a laboratory metaphor
  • Experiment
  • Network of models
  • ingesting / synthesizing data
  • generating products
  • Laboratory
  • Experiment execution environment
  • Computing, storage, accessibility, scalability
  • Lab Notebook
  • Persistent storage that can be queried
  • Keeps track of all experiments
  • Documentation, lineage, accountability

12
Wrap Your App: Scripts Talk to ESSW
  • No changes, just additions
  • Wrapper scripts
  • Make program (groups) look like ESSW experiments
  • use Perl API
  • Lab Notebook daemon
  • Accepts API commands
  • Creates XML documents
  • Sends to database
  • ESSW database
  • XML metadata DTDs
  • Tabular metadata
  • XML search terms
  • Lineage links

[Diagram labels: Perl API; XML; SQL; Lab Notebook daemon; ESSW Database; MySQL; Java; JDBC; Perl; processing steps Receive, Ingest and Calibrate, Navigate (Manual/Automatic), Sea Surface Temp (SST), Rectify, SST Maps]
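
The wrapper idea can be sketched in a few lines. This is a minimal illustration in Python rather than the Perl API that ESSW actually used; the function name `run_wrapped`, the XML element names, and the file names are assumptions made for the example, and the sketch only builds the metadata document instead of sending it to a Lab Notebook daemon.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def run_wrapped(name, command, inputs, outputs):
    """Run `command` unchanged and return (exit code, XML metadata string)."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(command)  # the science program itself is not modified

    # Hypothetical metadata layout: one <experiment> with its inputs and outputs.
    experiment = ET.Element(
        "experiment",
        {"name": name, "started": started, "exit_code": str(result.returncode)},
    )
    ET.SubElement(experiment, "command").text = " ".join(command)
    for path in inputs:
        ET.SubElement(experiment, "input").text = path
    for path in outputs:
        ET.SubElement(experiment, "output").text = path
    return result.returncode, ET.tostring(experiment, encoding="unicode")


# Stand-in for a real processing step such as "Rectify": a no-op Python process.
code, xml_doc = run_wrapped(
    "rectify",
    [sys.executable, "-c", "pass"],
    inputs=["sst_raw.img"],      # hypothetical file names
    outputs=["sst_map.img"],
)
print(xml_doc)
```

The point of the design is that the wrapped program needs no changes; everything the notebook learns comes from what the wrapper script chooses to declare.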
13
ESSW: Metadata management
  • Lab Notebook daemon verifies XML metadata
    document
  • Experiment step metadata stored for product
    lineage tracking
  • Complete metadata document stored in custom
    database table
  • XML DTD ↔ database table (1:1 correspondence)
  • (n+1)th column is the document itself
  • Some metadata values extracted into database
    tables
  • DTD contains column names and types for some
    elements
  • Always save all the XML, even if you don't know how
    to columnize all of it
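
A minimal sketch of the "n extracted columns plus the whole document" layout described above. It uses Python's built-in sqlite3 in place of MySQL, a hard-coded column map in place of a DTD, and checks only well-formedness rather than DTD validity; the table, element, and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping for one document type: element name -> SQL column type.
COLUMNS = {"sensor": "TEXT", "acquired": "TEXT", "lines": "INTEGER"}


def create_table(db, table):
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in COLUMNS.items())
    # The final, (n+1)th column holds the complete XML document.
    db.execute(f"CREATE TABLE {table} ({cols}, document TEXT NOT NULL)")


def store(db, table, xml_text):
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    values = [root.findtext(name) for name in COLUMNS]  # None when absent
    placeholders = ", ".join("?" for _ in range(len(COLUMNS) + 1))
    db.execute(f"INSERT INTO {table} VALUES ({placeholders})", values + [xml_text])


db = sqlite3.connect(":memory:")
create_table(db, "input_dataset")
store(
    db,
    "input_dataset",
    "<dataset><sensor>AVHRR</sensor><lines>2048</lines>"
    "<note>not columnized, but still saved</note></dataset>",
)
row = db.execute(
    "SELECT sensor, acquired, lines, document FROM input_dataset"
).fetchone()
print(row[:3])
```

Elements with no column (here `<note>`) are still recoverable later, because the full document is always stored alongside the extracted values.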

14
Wrapper Example: Input Dataset
15
Wrapper Example: Output Dataset
16
Wrapper Example: Process
17
Wrapper Example: Lineage Links
18
Process graph reconstructed from ESSW database
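
One way such a process graph could be rebuilt from stored lineage links, sketched under the assumption that each link is a (source, derived) pair. The link data and object names below are invented for illustration and are not taken from the ESSW database.

```python
from collections import defaultdict, deque

# Hypothetical lineage links: (source object, derived object).
LINKS = [
    ("raw_pass", "calibrated"),
    ("calibrated", "navigated"),
    ("navigated", "sst"),
    ("sst", "sst_map"),
    ("coastline_db", "navigated"),
]


def ancestors(links, product):
    """Return every object that `product` was derived from, nearest first."""
    parents = defaultdict(list)
    for source, derived in links:
        parents[derived].append(source)

    seen, order, queue = set(), [], deque([product])
    while queue:  # breadth-first walk back through the links
        for source in parents[queue.popleft()]:
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


print(ancestors(LINKS, "sst_map"))
# ['sst', 'navigated', 'calibrated', 'coastline_db', 'raw_pass']
```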
19
ESSW Lessons
  • Providers are customers
  • ESIPs aren't much good unless scientists are
    happy to put information in them
  • A light touch is the right touch
  • Wrapping is easier for scientists and their
    programmers to deal with than complete
    re-engineering
  • Scientists do write scripts, but not necessarily
    Perl
  • Scripting (gluing stuff together) comes naturally
    to scientists
  • Scientists don't write DTDs
  • Nobody calls metadata APIs
  • ESSW was automatic, but not automatic enough

20
Lineage Capture, Take 2
  • The ES3 experience

21
ES3: Earth System Science Server
[Diagram labels: ESSW data lineage tracking; MODster; OpenDAP; Watershed-scale snow product; MODIS; Microsoft TerraServer; AVHRR; Global-scale snow product; Alexandria Digital Library; Corona; BUB data storage; ROCKS processing clusters]
22
From ESSW to ES3: Summary
  • Perl wrappers → Probulators
  • Perl API → web services, XML messages
  • MySQL → XML database(s)

23
From Wrappers to Probulators
  • Wrappers: Active Lineage
  • Complete control over what gets recorded
  • Single language/API for all wrapped events
  • Not tied to execution
  • You can even lie about what happened
  • Must explicitly script everything
  • Scripts can drift from reality
  • You can even lie about what happened

24
From Wrappers to Probulators
  • Probulators: Passive Lineage
  • Record what actually happened
  • Not just what you think happened
  • Not what didn't happen
  • Automatic: don't have to write new scripts for
    everything
  • Different flavors for different environments
  • Can't just do everything in Perl

25
Probulator patterns
  • Instrumentation
  • Insert lineage capture instructions directly into
    science codes
  • e.g. "I just created file foo"
  • Typical implementation: preprocessor/precompiler
  • Overriding
  • Replace standard routines/libraries with
    lineage-capturing versions
  • e.g. open() → snoopy_open()
  • Typical implementation: modify execution
    environment
  • environment variables
  • configuration files
  • Passive monitoring
  • Trace program execution
  • e.g. "called open() with args foo, bar, ..."
  • Typical implementation: strace'd shell

26
ES3 Lineage Architecture
[Diagram components: probulator 1 through probulator n; logger; transmitter; ES3 core]
27
Probulating IDL: Instrumenting the code
  • edit
  • pro modscag_cleanse, prefix=prefix, ns=ns, nl=nl
  • HELP, NAMES="", OUTPUT=ES3_ENVIROMENT & ES3_LOG,
    ENTER="modscag_cleanse", ENVIROMENT=ES3_ENVIROMENT
  • ; clean up under- and overflow of MODSCAG run
  • ; Input: prefix = prefix for all of the MODSCAG
    output filenames
  • ; ns = number of samples
  • ; nl = number of lines
  • ; Output: rewrite of the MODSCAG files
  • ; t.h.painter / 1.19.2005
  • ; open snow file
  • ES3_openr,1,string(prefix,'snow.pic')
  • snow=fltarr(ns,nl)
  • readu,1,snow

28
Probulating IDL: Results
  • <init time="20050522T234606Z"
  • pid="31002" stime="20050522T234604Z"
    pstime="20050522T234256Z" ppid="30920"
    language="idl" user="haavar"
    hostname="spitting-duck.bren.ucsb.edu">
  • <enviroment>
  • <variable name="!PATH" value="/home/haavar/probulator//idl
  • /home/rsi/idl_6.1/lib/hook
  • </enviroment>
  • <mount-points>
  • <mount share="dab15/ed15/rsi"
    type="nfs">/home/rsi</mount>
  • </mount-points>
  • </init>
  • <enter region="modscag_cleanse">
  • <enviroment>
  • <variable type="INT" name="NL" value="2"/>
  • <variable type="INT" name="NS" value="2"/>
  • </enviroment>
  • </enter>
  • <exec time="20050522T234610Z" routine="OPENR">

29
Probulating bash: Passive Monitoring
  • cat /etc/passwd | grep haavar | sed -n
    's/\(.\)\2\\(0-9\\)./\2/p'
  • 25232 1138336174.480079 open("/etc/ld.so.cache",
    O_RDONLY) = 3
  • 25232 1138336174.480215 open("/lib/libm.so.6",
    O_RDONLY) = 3
  • 25234 1138336178.887267 dup2(3, 255) = 255
  • 25234 1138336178.887912 pipe([3, 4]) = 0
  • 25234 1138336178.888257 clone(child_stack=0, ...,
    child_tidptr=0xb7f2e708) = 25235
  • 25235 1138336178.889366 dup2(4, 1) = 1
  • 25235 1138336178.889975 pipe([3, 4]) = 0
  • 25235 1138336178.890326 clone(child_stack=0, ...,
    child_tidptr=0xb7f2e708) = 25236
  • 25235 1138336178.891260 pipe([4, 5]) = 0
  • 25235 1138336178.891756 clone(child_stack=0, ...,
    child_tidptr=0xb7f2e708) = 25237
  • 25235 1138336178.892753 clone(child_stack=0, ...,
    child_tidptr=0xb7f2e708) = 25238
  • 25238 1138336178.894266 dup2(4, 0) = 0
  • 25236 1138336178.894726 dup2(4, 1) = 1
  • 25237 1138336178.894763 dup2(3, 0) = 0
  • 25237 1138336178.895581 dup2(5, 1) = 1
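
Trace lines of this kind can be reduced to lineage events with a small parser. The sketch assumes each line has the form `pid timestamp call(args) = result`, as strace normally prints it; the regular expression and helper names are illustrative, not part of ES3.

```python
import re

TRACE_LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<time>\d+\.\d+)\s+"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>-?\d+)"
)


def parse_trace(lines):
    """Yield one dict per recognized system-call line; skip the rest."""
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match:
            event = match.groupdict()
            event["pid"] = int(event["pid"])
            event["time"] = float(event["time"])
            event["result"] = int(event["result"])
            yield event


def files_read(events):
    """Map pid -> paths successfully opened read-only."""
    opened = {}
    for e in events:
        if e["call"] == "open" and "O_RDONLY" in e["args"] and e["result"] >= 0:
            path = e["args"].split('"')[1]  # text between the first pair of quotes
            opened.setdefault(e["pid"], []).append(path)
    return opened


sample = [
    '25232 1138336174.480079 open("/etc/ld.so.cache", O_RDONLY) = 3',
    '25232 1138336174.480215 open("/lib/libm.so.6", O_RDONLY) = 3',
    '25234 1138336178.887267 dup2(3, 255) = 255',
]
print(files_read(parse_trace(sample)))
# {25232: ['/etc/ld.so.cache', '/lib/libm.so.6']}
```

This answers the kind of question raised earlier in the talk, such as which math library the shell loaded, without any change to the traced programs.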

30
Probulating bash: Results
  • <init> same as IDL
  • <exec time="20060027T042938.900117Z"
    routine="/bin/cat" pid="25236" ppid="25235">
  • <arguments>
  • <argument>/etc/passwd</argument>
  • </arguments>
  • <io>
  • <pipe read="true" id="std-in"/>
  • <pipe write="true" id="3"/>
  • <pipe write="true" id="std-err"/>
  • <file read="true">/etc/ld.so.cache</file>
  • <file read="true">/etc/passwd</file>
  • </io>
  • </exec>
  • <exec time="20060027T042938.903342Z"
    routine="/bin/grep" pid="25237" ppid="25235">
  • <arguments>
  • <argument>haavar</argument>
  • </arguments>
  • <io>

31
Now What?
  • Probulator reports are not universally unique
  • Q: How to hook separate reports together?
  • A: Logger assigns UUIDs to
  • Data streams
  • Processes
  • Jobs (workflows)
  • Lineage is not explicit
  • Q: How to publish lineage?
  • A: ES3 Core builds a serialized graph
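
A rough sketch of both answers: a logger hands out UUIDs for locally named processes and data streams, and a core step turns the reports into a graph with explicit edges, serialized here as JSON. The report fields, the pipe names, and the JSON layout are assumptions for the example; the actual ES3 formats are not given in the slides.

```python
import json
import uuid


class Logger:
    """Assigns a UUID to each locally named process or data stream."""

    def __init__(self):
        self.job = str(uuid.uuid4())  # one UUID for the whole job (workflow)
        self._ids = {}

    def uid(self, kind, local_name):
        key = (kind, local_name)
        if key not in self._ids:      # same local name -> same UUID
            self._ids[key] = str(uuid.uuid4())
        return self._ids[key]


def build_graph(logger, reports):
    """Turn probulator-style reports into nodes and explicit edges."""
    nodes, edges = {}, []
    for r in reports:
        proc = logger.uid("process", r["pid"])
        nodes[proc] = {"kind": "process", "routine": r["routine"]}
        for name in r.get("reads", []):
            data = logger.uid("data", name)
            nodes[data] = {"kind": "data", "name": name}
            edges.append({"from": data, "to": proc})
        for name in r.get("writes", []):
            data = logger.uid("data", name)
            nodes[data] = {"kind": "data", "name": name}
            edges.append({"from": proc, "to": data})
    return {"job": logger.job, "nodes": nodes, "edges": edges}


# Hypothetical reports; "pipe-1" and "pipe-2" are made-up stream names.
reports = [
    {"pid": 25236, "routine": "/bin/cat", "reads": ["/etc/passwd"], "writes": ["pipe-1"]},
    {"pid": 25237, "routine": "/bin/grep", "reads": ["pipe-1"], "writes": ["pipe-2"]},
]
graph = build_graph(Logger(), reports)
print(json.dumps(graph, indent=2))
```

Because both reports name the same pipe, they resolve to one data node, which is what links the two otherwise separate process reports.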

32
Thanks to
  • Current
  • Mike Colee
  • Stephane Maritorena
  • Dominic Metzger
  • Karl Rittger
  • Dave Siegel
  • Former
  • Anurag Acharya
  • Rajendra Bose
  • Scott Denning
  • Debbie Donahue
  • Jim Duff
  • Calin Duma
  • Erik Fields
  • Jim Gray
  • Steve Miley
  • Jordan Morris
  • Mark Pelletier
  • Pete Peterson
  • Walter Rosenthal
  • Klaus Schauser
  • Håvar Valeur

33
To Probulate Further: http://www.snow.ucsb.edu
Publications
  • Bose, R. and Frew, J., 2005. Lineage retrieval
    for scientific data processing: a survey. ACM
    Computing Surveys, vol. 37, no. 1, pp. 1-28.
    doi:10.1145/1057977.1057978
  • Dozier, J. and Painter, T.H., 2004.
    Multispectral and hyperspectral remote sensing of
    alpine snow properties. Annual Review of Earth
    and Planetary Sciences, vol. 32, pp. 465-494.
    doi:10.1146/annurev.earth.32.101802.120404
  • Molotch, N.P., Painter, T.H., Bales, R.C., and
    Dozier, J., 2004. Incorporating remotely sensed
    snow albedo into spatially distributed snowmelt
    modeling. Geophysical Research Letters, vol. 31,
    L03501. doi:10.1029/2003GL019063
  • Frew, J. and Bose, R., 2001. Earth System Science
    Workbench: a data management infrastructure for
    Earth science products. In Kerschberg, L. and
    Kafatos, M. (eds.), Proceedings, 13th
    International Conference on Scientific and
    Statistical Database Management (SSDBM 2001), pp.
    180-189. doi:10.1109/SSDM.2001.938550