NASA GSFC Landsat Data Continuity Mission (LDCM) Grid Prototype (LGP)

1
NASA GSFC Landsat Data Continuity Mission (LDCM)
Grid Prototype (LGP)
  • Beth Weinstein
  • NASA GSFC

May 8, 2006
2
LDCM Grid Prototype (LGP) Introduction
  • A Grid infrastructure gives scientists at
    resource-poor sites access to remote
    resource-rich sites
  • Enables greater scientific research
  • Maximizes existing resources
  • Limits the expense of building new facilities
  • The objective of the LDCM Grid Prototype (LGP) is
    to assess the applicability and effectiveness of
    a data grid to serve as the infrastructure for
    research scientists to generate Landsat-like data
    products

3
LGP Milestones
  • Capability 1 (C1) (12/03 - 12/04)
  • Demonstrated a basic grid infrastructure to
    enable a science user to run their program on a
    specified resource in a virtual organization
  • Virtual organization (VO) included GSFC labs and
    USGS EROS resources
  • Basic Globus Toolkit 2.4 (e.g. GSI, GridFTP,
    GRAM)
  • Capability 2 (C2) (12/04 - 9/05)
  • Demonstrated an expanded grid infrastructure to
    allow the dynamic allocation of resources to
    enable a specific science application
  • VO included NASA GSFC labs, USGS EROS, University
    of Maryland (UMD)
  • Workflow enabled
  • NASA ROSES ACCESS A.26 (1/06 - 1/08)
  • Land Cover Change Processing and Analysis System
    (LC-ComPS)

4
Capability 1 Science Scenario
[Diagram: LEDAPS L7 ESR and MODIS MOD09GHK scenes prepared for comparison]
5
Capability 1 Summary
  • Prepare two heterogeneous data sets at different
    remote locations for like-footprint comparison
    from a science user's home site
  • The MODIS Reprojection Tool (MRT) serves as our
    typical science application developed at the
    science user's site (GSFC Building 32 in demo)
  • mrtmosaic and resample (subset and reproject)
  • Operates on MODIS and LEDAPS (Landsat surface
    reflectance) scenes
  • Data distributed at remote facilities
  • NASA GSFC Building 23 (MODIS scenes)
  • USGS EROS (LEDAPS scenes)
  • Solves a realistic scientific scenario using
    grid-enabled resources

6
Capability 2 Science Scenario
Landsat Scene 1 Path/Row 182/61 Date 2/12/2002
2002 182/61 Composite
Landsat Scene 2 Path/Row 182/61 Date 6/4/2002
7
Capability 2 Summary
  • Create direct reflectance composite products
    using Landsat data
  • Blender Task 1 scenario and modules were
    contributed by Jeff Masek and Feng Gao
  • Modules (see the sketch after this list)
  • lndcal - calibration
  • lndcsm - cloud shadow mask
  • lndsr - surface reflectance
  • lndreg - registration
  • lndcom - composite
  • Input data
  • Up to 5 spatially coincident Landsat scenes
  • GSFC ancillary data
  • TOMS (ozone)
  • Reanalysis (water vapor)
  • Output data - 1 LEDAPS/Blender composite scene
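
A minimal sketch of the module chain above, matching the per-scene flow shown on slide 8: lndcal, lndcsm, and lndsr run per scene, lndreg registers the non-base scenes, and lndcom builds the composite. The parameter-file names and the single-argument invocation style are assumptions for illustration, not the actual LEDAPS interfaces.

  import subprocess

  def run(module, prm):
      # Assumes each module takes a single parameter-file argument;
      # the real invocation details are not shown in the deck.
      subprocess.run([module, prm], check=True)

  scenes = ["scene1.prm", "scene2.prm", "scene3.prm", "scene4.prm"]  # hypothetical
  for prm in scenes:
      for module in ("lndcal", "lndcsm", "lndsr"):   # per-scene steps
          run(module, prm)
  for prm in scenes[1:]:                             # register non-base scenes
      run("lndreg", prm)
  run("lndcom", "composite.prm")                     # one composite across all scenes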

8
Capability 2 Scenario
[Diagram: an EROS pool of four 2001 Landsat scenes; each scene runs lndcal, then lndcsm, then lndsr with ancillary inputs; the three non-base scenes run lndreg; lndcom merges all four into a 30m-resolution 2001 composite product (single path/row)]
9
Capability 2 Virtual Organization
UMD1 (2) UMD College Park
Edclxs66 (2) USGS EROS Sioux Falls, SD
LGP23 (4) GSFC B23/W316
MacCl23 (12) GSFC B23/W316
1 Gbps
1 Gbps
USGS EROS 1 Gbps Backbone
1 Gbps
GSFC SEN 1Gbps Backbone
1 Gbps
USGS EROS
MAX (College Park) OC48, 2.4Gbps Backbone
1 Gbps
OC12, 622 Mbps
LGP32 (2) Science User_1 GSFC B32/C101
vBNS (Chicago) OC48, 2.4Gbps Backbone
OC12, 622 Mbps Shared with DREN
GSFC
Capability 2

SEN Science and Engineering Network MAX
Mid-Atlantic Crossroads DREN Defense Research
and Engineering Network vBNS Very high
Performance Network Service
Capability 3
10
Capability 2 Grid Workflow
  • In Capability 1, jobs were run on a specific
    resource
  • In Capability 2, workflow provided the ability to
    submit a job to the Grid (VO)
  • Leverage distributed resource sharing and
    collaboration at a large scale
  • Grid resource management
  • Automatic allocation of grid resources
  • Subtask management
  • Reliable job completion
  • Leverage idle CPU cycles

11
Capability 2 Workflow Software - Karajan
  • Karajan provides grid workflow functions
  • Includes task management language and an
    execution engine
  • Integrated with the Java Commodity Grid (CoG) Kit
  • Includes a task scheduler
  • Runs gridExecute and gridTransfer tasks on grid
    resources
  • Manages both local and remote resources
  • Specifies workflows using XML
  • Supplies command-line and GUI interfaces

[Diagram: Karajan Globus grid architecture - Karajan runs on the Java CoG Kit 4_0_a1 over Globus Toolkit 2.4.3 (Globus Gatekeeper, GRAM, GridFTP)]
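
A minimal sketch of what a Karajan workflow for this scenario might look like. The <parallel> and <sequential> constructs and the gridExecute/gridTransfer task names come from these slides; the exact element and attribute syntax, and the umd1/lgp32 host names, are illustrative assumptions not verified against CoG 4_0_a1.

  <project>
    <parallel>
      <sequential>
        <!-- first composite runs on the EROS host -->
        <gridExecute host="edclxs66.cr.usgs.gov"
                     executable="lndcal" arguments="scene1.prm"/>
        <gridExecute host="edclxs66.cr.usgs.gov"
                     executable="lndcsm" arguments="scene1.prm"/>
      </sequential>
      <sequential>
        <!-- second composite runs concurrently on another VO host -->
        <gridExecute host="umd1.umd.edu"
                     executable="lndcal" arguments="scene2.prm"/>
      </sequential>
    </parallel>
    <!-- stage the finished composite back to the user's home site -->
    <gridTransfer srchost="edclxs66.cr.usgs.gov"
                  srcfile="/data/LEDAPS/composite.hdf"
                  desthost="lgp32.gsfc.nasa.gov"
                  destfile="/home/user/composite.hdf"/>
  </project>
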
12
User Configuration File Specification
  • User creates a product.spec configuration file
  • Path, row, and acquisition date are provided for
    each input scene
  • product.spec example file:

    host edclxs66.cr.usgs.gov
    base_directory /data/LEDAPS
    182 062 20010719 base
    182 062 20030215
    182 062 20040218
    182 062 20040609
    -
    182 061 20020212 base
    182 061 20020604
    182 061 20040101
    182 061 20040218
    182 061 20040711

  • Scenes after the "-" separator default to the
    host and base_directory specified above
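
The deck's driver.pl parses this file; below is a compact Python illustration of the same parsing step, assuming the exact layout shown above (the record structure and dictionary field names are choices made here, not the prototype's).

  def parse_product_spec(path):
      """Parse the product.spec layout shown above."""
      host, base_dir, composites, scenes = None, None, [], []
      with open(path) as spec:
          for raw in spec:
              line = raw.strip()
              if not line:
                  continue
              if line.startswith("host "):
                  host = line.split()[1]
              elif line.startswith("base_directory "):
                  base_dir = line.split()[1]
              elif line == "-":            # separator between composites
                  composites.append(scenes)
                  scenes = []
              else:                        # "path row acqDate [base]"
                  parts = line.split()
                  scenes.append({"path": parts[0], "row": parts[1],
                                 "date": parts[2],
                                 "base": len(parts) > 3})
      if scenes:
          composites.append(scenes)
      return host, base_dir, composites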

13
Capability 2 Architecture
[Diagram: product.spec feeds driver.pl, which generates driver.xml for Karajan. A <parallel> block dispatches create_composite1.xml and create_composite2.xml to Host 1 and Host 2. Within each composite, a <sequential> block runs lndpm, lndcal, lndcsm, and lndsr for every scene <path, row, acqDate> (Scene 1 is the base scene), then lndreg and lndcom build the composite and copy_output stages the result]
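
Continuing the illustration, a hedged Python sketch of the generation step the diagram attributes to driver.pl: one create_composite XML file per scene list, dispatched round-robin across VO hosts. Element names follow the earlier Karajan sketch; lndreg, lndcom, and copy_output are omitted for brevity.

  def emit_workflows(composites, hosts):
      """Write one hypothetical create_composite<N>.xml per scene list."""
      files = []
      for i, scenes in enumerate(composites, start=1):
          host = hosts[(i - 1) % len(hosts)]   # naive round-robin over the VO
          steps = []
          for s in scenes:
              prm = f"{s['path']}_{s['row']}_{s['date']}.prm"  # hypothetical naming
              for module in ("lndpm", "lndcal", "lndcsm", "lndsr"):
                  steps.append(f'  <gridExecute host="{host}" '
                               f'executable="{module}" arguments="{prm}"/>')
          xml = "<sequential>\n" + "\n".join(steps) + "\n</sequential>\n"
          fname = f"create_composite{i}.xml"
          with open(fname, "w") as out:
              out.write(xml)
          files.append(fname)
      return files
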
14
Capability 2 Performance
  • Processing benchmarks (see table below)
  • Each composite had 4 input scenes
  • Transfer rates
  • File transfer using 8 parallel streams
  • Raw data files (TIF): 57 Mb in 45-50 sec
    (~1.26 Mbps)
  • Final output file (HDF): 1.25 Gb in 5 minutes
    (~4 Mbps)
  • Conclusion: larger files transfer more
    efficiently

# of composites               Time to process
8                             3 hours
16                            5 hours 36 minutes
32 (2 x 16 parallel runs)     11 hours 46 minutes
48                            12 hours 50 minutes

                  Data Host    Remote Host
File Transfer     7%           9%
CPU Processing    93%          91%
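
A quick arithmetic check of the quoted rates, taking the deck's Mb/Gb units at face value:

  # Verify the transfer rates quoted above.
  raw_rate = 57 / 45              # 57 Mb in ~45 s   -> ~1.27 Mbps
  out_rate = 1.25 * 1000 / 300    # 1.25 Gb in 5 min -> ~4.17 Mbps
  print(f"raw: {raw_rate:.2f} Mbps, output: {out_rate:.2f} Mbps")
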
15
Performance Research and Potential Plans
  • Benchmarked processing rates for producing up to
    50 output scenes
  • Completed initial analysis of transfer and
    processing rates obtained using NetLogger
  • NetLogger provides the ability to monitor
    applications within a complex distributed
    environment in order to determine exactly where
    time is spent
  • Room for optimization
  • Analyze process flows to optimize running in an
    operational setting and implement the
    optimization strategies below
  • Compress input files on the data host prior to
    file transfer (see the sketch after this list)
  • Increase parallelization
  • Parallel runs of multiple input scenes for a
    single composite
  • Parallel file transfer
  • Add more CPUs and maximize CPU utilization
  • Examine error handling and the possibility of
    automatically restarting failed jobs
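
As referenced in the list above, a minimal sketch of compressing an input file on the data host before transfer; the scene path is hypothetical, and the follow-on transfer step is assumed to accept the compressed file.

  import gzip, shutil

  def compress_before_transfer(src):
      """Gzip an input scene on the data host so less crosses the wire."""
      dst = src + ".gz"
      with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
          shutil.copyfileobj(fin, fout)
      return dst  # hand this path to the gridTransfer step

  compressed = compress_before_transfer("/data/LEDAPS/182_061_20020212.tif")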

16
LGP Lessons Learned
  • The Open Source environment can be very
    beneficial
  • Reuse; incentive for collaboration
  • Hardened software (e.g. GSI)
  • A surprising amount of time was spent on basic
    network administration and security
  • Network performance
  • Firewalls/ports
  • Maintaining configuration management across
    independent agencies and centers is difficult
  • MapCenter - system status tool (QA/Calibration)
  • Understanding the processing flow and modules is
    required for optimization
  • One size doesn't fit all (at least not yet)
  • Allow for remote processing of dynamic ancillary
    data
  • CPU-intensive vs. data-intensive
  • Karajan is somewhat immature, but we have passed
    requests on to the CoG developers
  • Karajan does provide the basic framework for
    creating workflows in an operational setting;
    functionality not provided by the basic framework
    is supplied by external wrapper scripts (see the
    sketch after this list)
  • Developed a workaround to pass environment
    variables across processing runs
  • Provided a wrapper script to pass arguments to
    underlying Globus executables
  • Very elementary job scheduler
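
As referenced in the list above, a minimal sketch of the wrapper-script idea: restore the environment a grid-launched job would otherwise lack, then pass all arguments through to the underlying executable. The variable names and paths are hypothetical.

  #!/usr/bin/env python
  # Hypothetical wrapper: restore environment lost across grid job launches,
  # then hand all arguments through to the real executable.
  import os, sys

  os.environ["ANC_PATH"] = "/data/ancillary"        # hypothetical variable
  os.environ["LEDAPS_BIN"] = "/usr/local/ledaps/bin"
  prog = sys.argv[1]                                # real module, e.g. lndsr
  os.execvp(prog, [prog] + sys.argv[2:])            # pass arguments through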

17
Current and Future Work
  • LDCM Grid Prototype work will continue
  • Receiving NASA ROSES ACCESS A.26 funding for Land
    Cover Change Processing and Analysis System
    (LC-ComPS)
  • Use grid technology to allow regional and
    continental-scale land cover analysis at high
    resolution
  • Use Globus 4.0 as the underlying Grid
    infrastructure
  • Improve error handling in the workflow scripts
    and handle automatic re-starting of tasks in the
    event of failures
  • Expand the pool of machines in the VO

18
Backup Slides
19
Acknowledgements
  • Sponsors
  • LDCM - Bill Ochs, Matt Schwaller
  • Code 500/580 - Peter Hughes, Julie Loftis
  • LGP Team members
  • Jeff Lubelczyk (Lead)
  • Gail McConaughy (Branch Principal)
  • Beth Weinstein (SW Lead)
  • Ben Kobler (HW, Networks)
  • Eunice Eng (SW Dev, Data)
  • Valerie Ward (SW Dev, Apps)
  • Ananth Rao (SGT SW Arch/Dev, Grid Expert)
  • Brooks Davis (Aerospace Corp Grid Expert)
  • Wayne Yu (QSS Sys Admin)
  • GSFC Science Input
  • Jeff Masek (Blender)
  • Feng Gao (Blender)
  • USGS EROS
  • Stu Doescher (Mgmt)
  • Chris Doescher (POC)
  • John Dwyer
  • Tom Mcelroy
  • Mike Neiers (Sys Support)
  • Cory Ranschau (Sys Admin)
  • University of Maryland (UMD)
  • Paul Davis
  • Gary Jackson

20
Acronym List
  • ACCESS - Advancing Collaborative Connections for
    Earth-Sun System Science
  • EROS - Earth Resources Observation and Science
  • FTP - File Transfer Protocol
  • GASS - Globus Access to Secondary Storage
  • GRAM - Grid Resource Allocation Management
  • GSI - Grid Security Infrastructure
  • LC-ComPS - Land Cover Change Processing and
    Analysis System
  • LDCM - Landsat Data Continuity Mission
  • LEDAPS - Landsat Ecosystem Disturbance Adaptive
    Processing System
  • LGP - LDCM Grid Prototype
  • LP DAAC - Land Processes Distributed Active
    Archive Center
  • MDS - Monitoring and Discovery System
  • MODIS - Moderate Resolution Imaging
    Spectroradiometer
  • MRT - MODIS Reprojection Tool
  • ROSES - Research Opportunities in Space and Earth
    Sciences