University of Alabama in Huntsville NMI Testing and Experiences - PowerPoint PPT Presentation

1
University of Alabama in Huntsville NMI Testing
and Experiences
Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
National Space Science and Technology Center
256-961-7806
sredman_at_itsc.uah.edu
Sandra.Redman_at_msfc.nasa.gov
www.itsc.uah.edu
2
Improving Data Usability
  • Advanced Applications Development
    • Data organization and management for archival and
      analysis
    • Data mining in real time and for post-run
      analysis
    • Interchange technologies for improved data
      exploitation
    • Semantics to transform data exploitation via
      intelligent automated processing
  • Exploiting Technology
    • Grid technologies for seamless access to multiple
      computational and data resources in a virtual
      computing environment
    • Cluster technologies for high-speed parallel
      computation, for multiple-agent computations, and
      other applications
    • High-performance networking for advanced
      applications development and high-performance
      connectivity
    • Next-generation technologies in videoconferencing
      and electronic collaboration

3
Exploiting Technology to Improve Data Usability
[Figure: technologies plotted by increasing capability against time (now to future). Labels: Distributed Immersive Collaborative Environments; Real-time Data Fusion Information Delivery; Customized knowledge delivery; GRID Processing; Knowledge Discovery; On-Board Mining; Adaptive/learning; Custom Order Processing; 3D and 4D distributed dynamic data fusion; Visual navigation aids; Earth Science Markup Language (ESML); Data Mining]
4
Data Usability Success Builds on the Integration
of User Domains and Information Technology
  • Information Technology Scientists
    • Information Science Research
    • Knowledge Management
    • Data Exploitation
  • Domain Scientists and Engineers
    • Research and Analysis
    • Data Set Development
  • Collaborations
    • Accelerate research process
    • Maximize knowledge discovery
    • Minimize data handling
    • Contribute to both fields

[Diagram labels: Domain Scientists and Engineers; Information Scientists]
5
Data Mining
  • Automated discovery of patterns, anomalies from
    vast observational data sets
  • Derived knowledge for decision making,
    predictions and disaster response
  • http://datamining.itsc.uah.edu

6
Mining Environment: When, Where, Who and Why?
  • WHERE
    • User Workstation
    • Data Mining Center
    • Cluster
    • Grid
    • On-board
  • WHEN
    • Real Time
    • On-Ingest
    • On-Demand
    • Repeatedly
  • WHO
    • End Users
    • Domain Experts
    • Mining Experts
  • WHY
    • Event
    • Relationship
    • Association
    • Corroboration
    • Collaboration

Data Mining
7
Creating a Successful Environment for Data Mining
  • Provide scientists with the capabilities to allow
    the flexibility of creative scientific analysis
  • Provide data mining benefits of
    • Automation of the analysis process
    • Reducing data volume
  • Provide a framework to allow a well defined
    structure to the entire process
  • Provide a suite of mining algorithms for creative
    analysis that can adapt to new hypotheses
  • Provide capabilities to add science algorithms
    to the environment
  • Exploit emerging technologies in computational
    and data grids, high-performance networks, and
    collaborative environments

8
Algorithm Development and Mining System (ADaM) -
System Overview
  • Consists of over 100 interoperable mining and
    image processing components
  • Each component is provided with a C application
    programming interface (API) and an executable in
    support of scripting tools (e.g. Perl, Python,
    Tcl, Shell)
  • ADaM components are lightweight and autonomous,
    and have been used successfully in a grid
    environment
  • ADaM has several translation components that
    provide data level interoperability with other
    mining systems (such as WEKA and Orange), and
    point tools (such as libSVM and svmLight)
  • Components include Python wrappers and web
    service interfaces
  • Visualization of results easily accomplished with
    various visualization packages

9
ADaM Components
10
Current Mining Environments
  • Multiple Configurations
    • Complete System (Client and Engine)
    • Mining Engine (users provide their own client)
    • Application-Specific Mining Systems
    • Operations Tool Kit
    • Stand-Alone Mining Algorithms
  • Distributed/Federated/Grid Mining
    • Distributed services
    • Distributed data
    • Chaining using Interchange Technologies
  • On-board Mining
    • Real-time and distributed mining
    • Processing environment constraints
    • Space-based/ground-based/unmanned

11
ADaM Feature Subset Selection application chosen
for testing
  • Supervised pattern classification is an important
    technique in many domains
  • Feature subset selection is the process of
    choosing a subset of the features from the
    original data set in order to maximize classifier
    accuracy
  • It is used to improve both the runtime and accuracy
    of a supervised pattern classifier by eliminating
    noisy, irrelevant, or redundant attributes or
    features from the data set
  • Both processor- and data-intensive
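As a rough illustration of feature subset selection, the sketch below uses greedy forward selection scored by the leave-one-out accuracy of a nearest-centroid classifier. This is not the ADaM implementation: the search strategy, the classifier, the function names, and the synthetic data are all assumptions chosen to keep the example short.

```python
def nearest_centroid_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Build class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(other_label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += other[f]
            counts[other_label] = counts.get(other_label, 0) + 1

        # Predict the class whose centroid is closest in squared distance.
        def dist(cls):
            return sum((row[f] - sums[cls][k] / counts[cls]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == label
    return correct / len(rows)


def forward_select(rows, labels, score=nearest_centroid_accuracy):
    """Add one feature at a time while the score strictly improves."""
    remaining = list(range(len(rows[0])))
    selected, best = [], 0.0
    while remaining:
        candidate, candidate_score = None, best
        for f in remaining:
            s = score(rows, labels, selected + [f])
            if s > candidate_score:
                candidate, candidate_score = f, s
        if candidate is None:
            break  # no remaining feature improves the classifier
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


# Synthetic data (hypothetical): feature 0 separates the classes,
# features 1 and 2 are noise.
ROWS = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 9.0], [0.3, 7.0, 4.0], [0.0, 3.0, 6.0],
    [0.9, 6.0, 2.0], [1.0, 2.0, 8.0], [0.8, 8.0, 5.0], [1.1, 4.0, 7.0],
]
LABELS = ["clear"] * 4 + ["cloud"] * 4

if __name__ == "__main__":
    print(forward_select(ROWS, LABELS))
```

Every candidate subset requires retraining and re-scoring the classifier, which is why the task is both processor- and data-intensive and why independent candidate evaluations are natural units of work for separate grid nodes.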

12
Parallel Version of Cloud Extraction
  • GOES images used to recognize cumulus cloud
    fields
  • Cumulus clouds are small and do not show up well
    in 4km resolution IR channels
  • Detection of cumulus cloud fields in GOES can be
    accomplished by using texture features or edge
    detectors

[Diagram: a master and three slave nodes process the GOES image. Stages shown: GOES Image; Laplacian Filter, Sobel Horizontal Filter, Sobel Vertical Filter; Energy Computation (four instances); Classifier; Cloud Image (Cumulus Cloud Mask)]
Three edge detection filters are used together
to detect cumulus clouds, an approach that lends
itself to implementation on a parallel cluster
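The decomposition in the diagram can be sketched as follows, with one worker per edge filter. The 3x3 kernels are the standard Laplacian and Sobel kernels; the definition of "energy" as the mean squared filter response, the threshold classifier, and all function names are assumptions made for illustration, not the presenters' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 kernels for the three edge detectors named on the slide.
# (Naming of the two Sobel orientations varies between texts.)
KERNELS = {
    "laplacian": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
    "sobel_horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "sobel_vertical": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
}


def filter_energy(image, kernel):
    """Mean squared filter response over interior pixels (assumed energy)."""
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            response = sum(
                kernel[dy][dx] * image[y + dy - 1][x + dx - 1]
                for dy in range(3) for dx in range(3)
            )
            total += response * response
            count += 1
    return total / count


def texture_energies(image, workers=3):
    """Run each filter in its own worker, mirroring one filter per node."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(filter_energy, image, kernel)
                   for name, kernel in KERNELS.items()}
        return {name: future.result() for name, future in futures.items()}


def looks_like_cumulus(energies, threshold=1.0):
    """Placeholder classifier: every filter energy must exceed a threshold."""
    return all(value > threshold for value in energies.values())
```

Threads are used here only to show the task split; on a cluster each filter would run on its own node and return its energy to the master, which then runs the classifier.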
13
Feature Subset Selection Testing
  • Application ported to Linux
  • Support Vector Machine downloaded and tested
  • Developed application scripts
  • Modified for the Globus environment by writing a
    simple Globus RSL file
  • Ran each combination of tools on a different node
    on the grid
  • Globus used to execute jobs on different machines
  • Experimented with both real and synthetic data
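A minimal sketch of how one RSL job description per tool combination might be generated and assigned to grid nodes. The RSL attributes used (executable, arguments, stdout, count) are common GRAM attributes, but the tool names, host names, script path, and round-robin assignment are hypothetical and not taken from the actual test setup.

```python
from itertools import product


def rsl_for_job(executable, arguments, stdout):
    """Build a one-job RSL string using common GRAM attributes."""
    quoted = " ".join('"%s"' % arg for arg in arguments)
    return "&(executable=%s)(arguments=%s)(stdout=%s)(count=1)" % (
        executable, quoted, stdout)


def plan_jobs(selectors, classifiers, nodes, executable):
    """Assign each selector/classifier combination to a node, round-robin."""
    jobs = []
    combinations = product(selectors, classifiers)
    for index, (selector, classifier) in enumerate(combinations):
        node = nodes[index % len(nodes)]
        rsl = rsl_for_job(executable, [selector, classifier],
                          "%s_%s.out" % (selector, classifier))
        jobs.append((node, rsl))
    return jobs


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    for node, rsl in plan_jobs(["forward", "backward"], ["svm"],
                               ["node1.example.org", "node2.example.org"],
                               "/opt/fss/run_fss.sh"):
        print(node, rsl)
```

Each resulting string could be written to a file and handed to a Globus job submission tool for the chosen node; that submission step is deliberately left out here.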

14
Early Findings (NMI R2)
  • Globus documentation improved, installation
    trouble-free, application port straightforward
  • No problems encountered during Condor-G
    installation, but found a problem with Condor-G
    under Red Hat Linux 7.3 when using nss_ldap.
    Developer provided a workaround: start the name
    service caching daemon (nscd)
  • GSI-OpenSSH installed, but Kerberos
    authentication did not work since Linux was not
    compiled with the PAM option (undocumented)
  • Network Weather Service installed, but learned we
    are more interested in MDS

15
MEAD: Modeling Environment for Atmospheric
Discovery
  • One of the NSF PACI Alliance research Expeditions
  • Expeditions ensure intense collaboration among
    technology developers and application scientists
    and focus on the deployment of infrastructure
    that supports computational science and
    engineering, and science in a variety of
    disciplines
  • MEAD's focus is on retrospective analysis of
    hurricanes and severe storms using the TeraGrid,
    integrating computation, grid workflow
    management, data management, model coupling, data
    analysis/mining, and visualization

16
MEAD
  • Science Objective
    • To investigate different thunderstorm cell
      interactions favorable for subsequent tornado
      (mesocyclone) formation
  • Approach
    • Use idealized WRF model simulations with
      different initial conditions
    • Create a large parameter space of thunderstorm
      cell interaction and storm behavior
    • Mine this search space for patterns and trends

17
WRF Initializations
  • 230 WRF runs were made, two control
    (single-cell)
  • Each corresponded to a particular
    arrangement of a pair of initial storm cells
  • In the figure at left:
    • Each square = 1 simulation
    • 1st storm in the middle
    • 2nd at one of the blue squares
    • Center cell stronger

[Figure: Matrix of WRF simulations]
Slide Source: Brian Jewett
18
Goals of this Mining Study
  • Develop a mesocyclone detection algorithm (in
    both 2D and 3D)
  • Develop an algorithm to track the temporal
    evolution of the mesocyclone features
  • Investigate the use of clustering techniques to
    • Summarize differences in simulation runs
    • Provide an overview of all the simulations

19
Example Tracking Results
20
Mesocyclone Detection and Tracking Results
Features with time durations of a single time
step are filtered out
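The filtering rule above can be sketched as follows. The slides do not describe the tracking algorithm, so the greedy nearest-neighbour linking, the `max_jump` distance, and the `(step, x, y)` track format are assumptions; only the final step, dropping features that last a single time step, comes from the slide.

```python
def link_detections(frames, max_jump=2.0):
    """Greedily link per-time-step detections into tracks by proximity."""
    finished, active = [], []
    for step, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for track in active:
            _, last_x, last_y = track[-1]
            # Closest unmatched detection to the track's last position.
            best = min(
                unmatched,
                key=lambda p: (p[0] - last_x) ** 2 + (p[1] - last_y) ** 2,
                default=None,
            )
            if best is not None and (
                (best[0] - last_x) ** 2 + (best[1] - last_y) ** 2
                <= max_jump ** 2
            ):
                track.append((step, best[0], best[1]))
                unmatched.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # the feature was not seen again
        # Every detection left over starts a new track.
        still_active.extend([(step, x, y)] for x, y in unmatched)
        active = still_active
    return finished + active


def drop_single_step(tracks):
    """Keep only features that persist for more than one time step."""
    return [track for track in tracks if len(track) > 1]
```

For example, with `frames = [[(0, 0)], [(1, 0), (10, 10)], [(2, 0)]]` the detection at `(10, 10)` appears in only one time step, so `drop_single_step` removes it and keeps the three-step track.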
21
Summary: Mesocyclone Detection
  • Mesocyclones with longer durations tend to be
    associated with initializations where the
    second cell is closer to the first
  • Mesocyclones found in the storm simulations are
    sensitive to the particular arrangement of a pair
    of initial storm cells (secondary storm placement
    at 45 degrees to the primary storm)
  • Clustering techniques are useful to summarize
    differences in simulation runs
  • Clustering techniques provide an overview of all
    the simulations
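One way clustering could summarize simulation runs is to describe each run by a few numbers and group similar runs. The sketch below uses plain k-means; the choice of k-means, the per-run features (mesocyclone count and mean duration), and the sample values are hypothetical, since the slides do not name the clustering technique or the features used.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic initialisation from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each run to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Recompute each centroid as the mean of its member runs.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment, centroids


# Hypothetical per-run summaries: (mesocyclone count, mean duration in steps).
RUNS = [(1, 2.0), (6, 9.0), (2, 3.0), (7, 10.0), (1, 2.5)]

if __name__ == "__main__":
    labels, centers = kmeans(RUNS, k=2)
    print(labels, centers)
```

The cluster centroids then serve as the overview: each one describes a typical run in its group, and the assignment shows which initializations behave alike.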

22
Some Lessons Learned
  • NMI Testbed process is working well
  • Answers found through NMI discussion lists from
    developers and other users
  • Have to sell the grid concept to developers,
    administrators, and users
  • NMI work has proven helpful in other grid work
    • TeraGrid
    • LEAD: Linked Environments for Atmospheric
      Discovery
    • SpaceDoG: Space Development and Operations Grid
    • CEOS: Committee on Earth Observation Satellites
  • More components needed

23
Linked Environments for Atmospheric Discovery
(LEAD)
  • NSF Information Technology Research Program
  • Creating a cyberinfrastructure for mesoscale
    meteorology
  • Real-time, on-demand, and dynamically adaptive
    needs for mesoscale weather research
  • High volume data sets and streams
  • Computationally demanding numerical models and
    data assimilation systems

24
The LEAD Goal
  • To create an integrated, scalable framework in
    which analysis tools, forecast models, and data
    repositories can be used as dynamically adaptive,
    on-demand systems that can
    • operate independent of data formats and the
      physical location of data or computing resources
    • change configuration rapidly and automatically in
      response to weather
    • continually be steered by new data (i.e., the
      weather)
    • respond to decision-driven inputs from users
    • initiate other processes automatically, and
    • steer remote observing technologies to optimize
      data collection for the problem at hand

25
The LEAD Vision: Dynamic, Adaptive, Multi-Scale
[Diagram labels: NWS National Static Observations Grids; Virtual/Digital Resources and Services; ADAS; ADaM; Mesoscale Weather; MyLEADPortal; Experimental Dynamic Observations; Tools; Remote Physical (Grid) Resources; Local Physical Resources; Local Observations]
26
LEAD: An integrated framework for identifying,
accessing, preparing, assimilating, predicting,
managing, analyzing, mining, and visualizing
meteorological data, independent of format and
physical location
27
Challenges for Next-generation Mining
  • Develop and document common/standard interfaces
    for interoperability of data and services
  • Design new data models for handling
    • real-time/streaming input
    • data fusion/integration
  • Design and develop distributed standardized
    catalog capabilities
  • Develop advanced resource allocation and load
    balancing techniques
  • Exploit the grid concept for enhanced data mining
    functionality
  • Develop more intelligent and intuitive user
    interfaces
  • Integrate with collaborative environments
  • Develop ontologies of scientific data, processes
    and data mining techniques for multiple domains
  • Support language- and system-independent
    components
  • Incorporate data mining into science and
    engineering curricula

28
LEAD GWSTBs: Grid and Web Services Testbeds
  • Local User Environment: customized portal,
    control of information flows, collaboration
    tools, managing processes
  • Productivity Environment: models, tools, and
    algorithms
  • Data Services Environment: data transport, data
    formatting, and interoperability
  • Distributed Technologies Environment: workflow
    infrastructure to autonomously acquire resources
    and adapt to changing plans
  • Data Archive: recent and historical data,
    products, and tools

29
LEAD Education Testbeds
  • Provide hands-on access to assess the
    effectiveness of LEAD technologies for education
  • Provide input and feedback to LEAD developers
  • Facilitate knowledge transfer
  • Collaborative technologies

30
LEAD policy development and implementation
  • Define Virtual Organizations
    • LEAD designed for use principally by the
      meteorological higher education and operations
      research communities
  • Develop LEAD policies
    • Developing LEAD global policies
    • Adhere to local policies of each site (security,
      resource utilization, etc.)
  • Policy management services
    • PKI cryptography, X.509 certificates
    • Authorization service
    • Monitor resource utilization and accounting
      services

31
Other considerations
  • Emerging standards and middleware
    • Applications development to concentrate on the
      application, using NMI middleware (Globus,
      MyProxy, OGCE, etc.) for grid infrastructure and
      additional middleware (MCS, RSL,
      performance monitoring tools)
    • Current software has dependencies on middleware
      versions
  • Configuration management
    • Distributed team developing and delivering
      software to multiple testbeds
    • Goal is to allow heterogeneous host environments
  • Collaborative technologies
    • Access Grid and H.323 videoconferencing facilitate
      LEAD team project planning and work sessions
    • Collaborative technologies will be integrated
      into testbeds for user education and research

32
Data Integration and Mining: From Global
Information to Local Knowledge
Emergency Response
Precision Agriculture
Bioinformatics
Urban Environments
Weather Prediction