GridMiner a Framework for Data Integration

Description:

GridMiner a Framework for Data Integration & Knowledge Discovery ... Umut Onan. Ibrahim Elsayed. Knowledge Mgt: Ivan Janciak. Data mediation: Alexander Wöhrer ...


Transcript and Presenter's Notes

1
GridMiner: a Framework for Data Integration and
Knowledge Discovery on Computational Grids
  • Peter Brezany
  • Institute of Scientific Computing
  • University of Vienna, Austria
  • brezany_at_par.univie.ac.at

April 8, 2005, Vienna
2
Outline
  • Motivation
  • Scientific and Application Drivers
  • GridMiner Project in Vienna
  • Architecture
  • Workflow Management
  • Data Access and Integration
  • On-Line Analytical Processing and Data Mining
  • Current Prototype
  • Demo
  • Future Work
  • Conclusions

3
Motivation
Business
Medicine
Scientific experiments
Data and data exploration
cloud
Simulations
Earth observations
4
Stages of a Data Exploration Project
Based on "Data Preparation for Data Mining" by Dorian Pyle, Morgan Kaufmann

Stage                               Time to complete      Importance to success
                                    (percent of total)    (percent of total)
Exploring the problem                      10                    15
Exploring the solution                      9                    14
Implementation specification                1                    51
  Subtotal (first three stages)            20                    80
Knowledge discovery
  a. Data preparation                      60                    15
  b. Data surveying                        15                     3
  c. Data modeling                          5                     2
  Subtotal (knowledge discovery)           80                    20
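
The two subtotals on this slide (20/80 and 80/20) can be checked from the per-stage percentages. The short sketch below does only that arithmetic; the label "planning" for the first three stages is chosen here for illustration and does not come from the slide.

```python
# Percentages as transcribed from the slide (after Dorian Pyle):
# (stage, time to complete in %, importance to success in %)
STAGES = [
    ("Exploring the problem", 10, 15),
    ("Exploring the solution", 9, 14),
    ("Implementation specification", 1, 51),
    ("Data preparation", 60, 15),
    ("Data surveying", 15, 3),
    ("Data modeling", 5, 2),
]


def totals(rows):
    """Sum the time and importance columns for a group of stages."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


# "planning" is an illustrative label for the first three stages.
planning = totals(STAGES[:3])
discovery = totals(STAGES[3:])

print("planning (time, importance):", planning)
print("knowledge discovery (time, importance):", discovery)
```

Read this way, the knowledge discovery stages take most of the time, while the earlier stages carry most of the importance to success.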
5
The Knowledge Discovery Process
Knowledge
OLAP Queries
OLAP
Online Analytical Mining
Evaluation and Presentation
Data Mining
Selection and Transformation
Data Warehouse
Cleaning and Integration
6
Application and Scientific Drivers
7
Data Mining Accuracy vs. Data Size
(Plot: accuracy, on an axis up to 100, against data size, with the sampled data size and the available data size marked)
8
Project EcoGRID (Sketch)
Distributed Data
Distributed Applications
Distributed Data Mining
Reporting
Biodiversity
Waste
Popular Presentation
Statistic
Air
Soil
Flow Analysis
Prediction Models
Emissions
Water
Geo-Statistic

Forests
Common Ontology
Author: Kathi Schleidt
9
Management of TBI patients
  • Traumatic brain injuries (TBIs) typically result
    from accidents in which the head strikes an object.
  • The treatment of TBI patients is very
    resource-intensive.
  • The trajectory of TBI patient management:
  • Trauma event
  • First aid
  • Transportation to hospital
  • Acute hospital care
  • Home care
  • All the above phases are associated with data
    collection into databases now managed by
    individual hospitals.

Usage of mobile communication devices
10
The GridMiner Project in Vienna
  • GridMiner: a knowledge discovery Grid
    infrastructure (http://www.gridminer.org/)
  • OGSA-based architecture
  • Workflow management
  • Grid-aware data preprocessing and data mining
    services
  • Data mediation service
  • OLAP service
  • GUI
  • Current implementation on top of Globus Toolkit
    3.2

11
GridMiner (Goal) Architecture
SMD: Support for Mobile Devices
GridMiner Mobility
GridMiner Workflow
GM DSCE: Dynamic Service Control
GridMiner Core
GMPPS: Preprocessing
GMDMS: Data Mining
GMPRS: Presentation
GMDT: Transformation
GMOMS: OLAM
GridMiner Base
GMMS: Mediation
GMIS: Information
GMRB: Resource Broker
GMCMS: OLAP / Cubes
Grid Core
Grid Core Services
Security
File and Database Access Service
Replica Management
Basic Grid Services
Fabric
Grid Resources
Data Source
12
Collaboration of GM-Services
Simple Scenario
GMPPS: Preprocessing
GMDMS: Data Mining
GMDIS: Integration
GMPRS: Presentation
Intermediate Result 1
Intermediate Result 2 (e.g. flat table)
Intermediate Result 3 (e.g. PMML)
Final Result
Data Sources
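
The simple scenario can be read as a linear chain in which each service consumes the previous intermediate result. The sketch below is one possible reading, assuming the order integration, preprocessing, data mining, presentation (the diagram itself is not in the transcript). Only the service acronyms come from the slide; the stage functions, the sample rows, and the "mean" model are hypothetical stand-ins, not GridMiner code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str                        # service acronym taken from the slide
    run: Callable[[object], object]  # hypothetical stand-in for the service


def integrate(sources):
    """GMDIS stand-in: merge the rows of all data sources into one list."""
    return [row for src in sources for row in src]


def preprocess(rows):
    """GMPPS stand-in: drop incomplete rows, yielding a flat table."""
    return [r for r in rows if None not in r.values()]


def mine(table):
    """GMDMS stand-in: a trivial 'model', the mean of one column."""
    values = [r["value"] for r in table]
    return {"model": "mean", "value": sum(values) / len(values)}


def present(model):
    """GMPRS stand-in: render the model as text."""
    return f"{model['model']} = {model['value']:.2f}"


PIPELINE = [
    Stage("GMDIS", integrate),
    Stage("GMPPS", preprocess),
    Stage("GMDMS", mine),
    Stage("GMPRS", present),
]


def run_pipeline(data_sources, pipeline=PIPELINE):
    """Feed each stage the previous stage's output; return result and trace."""
    result, trace = data_sources, []
    for stage in pipeline:
        result = stage.run(result)
        trace.append(stage.name)
    return result, trace


sources = [
    [{"id": 1, "value": 2.0}],
    [{"id": 2, "value": 4.0}, {"id": 3, "value": None}],
]
final_result, trace = run_pipeline(sources)
print(trace, final_result)
```

In the slides the intermediate results are exchanged in concrete formats (a flat table, PMML); here they are plain Python values to keep the sketch short.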
13
Collaboration (2)
Complex Scenarios
GMPPS
GMDIS
GMPPS
GMDMS
GMPRS
GMPPS
GMDMS
GMPRS
GMPPS
GMPPS
GMDMS
GMPRS
GMDMS
GMPRS
GMPPS
GMDIS
GMCMS
GMOMS
GMPRS
GMPPS
GMPPS
14
Workflow Models
Static Workflows
Dynamic Workflows
15
Dynamic Workflows
  • Dynamic Service Control Language (DSCL)
  • based on XML
  • easy to use
  • Dynamic Service Control Engine (DSCE)
  • processes the workflow according to DSCL

DSCL
DSCE
Service A
Service B
Service D
Service C
16
DSCL Control Flow
Automatic conversion
User's view
dscl
  variables
  composition
    sequence
      createService activityID="act1"
      parallel
        invoke activityID="act2.1"
        invoke activityID="act2.2"
    sequence
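
To make the control-flow idea concrete, the sketch below walks a small XML document that uses the element names visible on this slide (dscl, variables, composition, sequence, parallel, createService, invoke). The exact DSCL syntax and the nesting shown are assumptions, and the interpreter is a hypothetical illustration of sequence and parallel semantics, not the DSCE implementation: it records activity IDs instead of calling Grid services.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Assumed DSCL-like document; only the element and attribute names
# come from the slide, the nesting is a guess.
WORKFLOW = """
<dscl>
  <variables/>
  <composition>
    <sequence>
      <createService activityID="act1"/>
      <parallel>
        <invoke activityID="act2.1"/>
        <invoke activityID="act2.2"/>
      </parallel>
    </sequence>
  </composition>
</dscl>
"""


def execute(node, log):
    """Run one control-flow element and record the activities it triggers."""
    if node.tag == "sequence":
        for child in node:           # children run one after another
            execute(child, log)
    elif node.tag == "parallel":
        with ThreadPoolExecutor() as pool:
            # list() waits until every branch has finished
            list(pool.map(lambda child: execute(child, log), node))
    elif node.tag in ("createService", "invoke"):
        # A real engine would contact a service here; we only log.
        log.append((node.tag, node.get("activityID")))
    else:
        raise ValueError(f"unsupported element: {node.tag}")


def run(workflow_xml):
    """Parse the document and execute everything under <composition>."""
    root = ET.fromstring(workflow_xml)
    log = []
    for child in root.find("composition"):
        execute(child, log)
    return log


log = run(WORKFLOW)
print(log)
```

The first log entry is always the createService activity; the two invoke entries may appear in either order because they run in parallel.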

17
Graphical User Interface: End-User Level
18
(No Transcript)
19
(No Transcript)
20
  • Administration Level

21
(No Transcript)
22
Grid Data Mediation Service: Example Scenario
  • Heterogeneities
  • Name in A is Alexander Wöhrer
  • Name in C has to be combined
  • Distribution
  • 3 data sources
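
A minimal sketch of the mediation idea in this scenario: three sources with different schemas are presented through one mediated schema, and the name that is split across fields in source C is combined on the way. Everything except the sample name for source A is hypothetical, including the field names, the second source's schema, and the placeholder rows; this is not the Grid Data Mediation Service API.

```python
# Hypothetical sources. Only "Alexander Wöhrer" in source A is from the slide.
SOURCE_A = [{"id": 1, "name": "Alexander Wöhrer"}]
SOURCE_B = [{"id": 2, "name": "Jane Doe"}]
SOURCE_C = [{"id": 3, "first_name": "John", "last_name": "Smith"}]


def keep_name(row):
    """Mapping for sources that already store the full name in one field."""
    return {"id": row["id"], "name": row["name"]}


def combine_name(row):
    """Mapping for source C: build the name from its two parts."""
    return {"id": row["id"], "name": f"{row['first_name']} {row['last_name']}"}


# Each source is paired with the mapping into the mediated schema {id, name}.
MAPPINGS = [
    (SOURCE_A, keep_name),
    (SOURCE_B, keep_name),
    (SOURCE_C, combine_name),
]


def mediated_query(predicate=lambda row: True):
    """Query all sources as if they were one table with columns id and name."""
    rows = []
    for source, mapping in MAPPINGS:
        rows.extend(r for r in map(mapping, source) if predicate(r))
    return rows


all_rows = mediated_query()
print([r["name"] for r in all_rows])
```

A client sees a single virtual table; the distribution over three sources and the differing field layouts stay hidden behind the mappings.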

23
Grid Data Mediation Service - Architecture
24
OLAP (On-Line Analytical Processing)
Research Objectives: High-Performance
Grid OLAP Services
25
Requirements
  • Operation on large data sets
  • Centralized OLAP Service (parallel computing
    power can be included)
  • Distributed OLAP service
  • Federation of autonomous distributed OLAP services

26
Development Strategy
(Diagram: several OLAP Engines (OE) connected over a network)
27
Development Strategy (2)
  • Precondition: no open-source OLAP system
    available
  • Decision: development (in Java) from scratch
  • Advantage: motivation for research activities
    addressing all facets
  • Disadvantage: a possibly long implementation
    curve
  • First step: centralized sequential Grid OLAP
    service

28
Towards Centralized Service
OLAP
Workflow Engine
DSCL, OMML
OMML
XML
GUI
Mediator
PMML
PMML
RD
XMLD
CSV
Data Mining Engine
29
Distributed OLAP: Aggregation of Compute and
Storage Resources
Tuple Stream
30
OLAP Caching
31
Federated OLAP: Motivating Example
  • Effective management of a network requires
    collecting, correlating, and analyzing a variety
    of network trace data.
  • Analysis of flow data collected at each router
    and stored in a local data warehouse adjacent
    to the router is a challenging application.
  • All flow information is conceptually part of a
    single relation with the following schema:
  • Flow (RouterId, SourceIP, SourcePort,
    SourceMask, SourceAS, DestIP, DestPort,
    DestMask, DestAS, StartTime, EndTime, NumPackets,
    NumBytes)
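
One way to picture a federated query over this relation: each router's local warehouse computes a partial aggregate, and a coordinator merges the partial results without moving the raw flow records. The sketch below totals NumBytes per (SourceAS, DestAS) pair. The field names come from the schema above; the helper functions, the placeholder field values, and the sample numbers are hypothetical and do not describe GridMiner's OLAP federation.

```python
from collections import Counter, namedtuple

# Field names follow the Flow schema on the slide.
Flow = namedtuple(
    "Flow",
    "RouterId SourceIP SourcePort SourceMask SourceAS "
    "DestIP DestPort DestMask DestAS StartTime EndTime NumPackets NumBytes",
)


def make_flow(router_id, source_as, dest_as, num_bytes):
    """Hypothetical helper: fill the fields this example ignores with placeholders."""
    return Flow(router_id, "0.0.0.0", 0, 0, source_as,
                "0.0.0.0", 0, 0, dest_as, 0, 0, 1, num_bytes)


def local_aggregate(flows):
    """Runs at one router's warehouse: total bytes per (SourceAS, DestAS)."""
    totals = Counter()
    for f in flows:
        totals[(f.SourceAS, f.DestAS)] += f.NumBytes
    return totals


def federated_aggregate(partials):
    """Runs at the coordinator: add up the per-router partial sums."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts per key
    return merged


router_1 = [make_flow(1, 100, 200, 1000), make_flow(1, 100, 300, 50)]
router_2 = [make_flow(2, 100, 200, 500)]

merged = federated_aggregate([local_aggregate(router_1),
                              local_aggregate(router_2)])
print(dict(merged))
```

Because summation is distributive, only one small table per router crosses the network, which is the main appeal of aggregating next to the data.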

32
OLAP Federation
33
GridMiner Current Architecture
User environment
Web
Grid
34
Towards an Open Service System
35
Implementation/Technology
  • Globus 3.2
  • OGSA/DAI version 5
  • GUI: workflow construction/results
    visualization (JGraph, Java Web Start,
    JavaServer Pages)
  • Service configurators (JavaServer Pages)
  • Workflow management: DSCE Client (OGSA)
  • Knowledge base: configurations (XML, OWL)
  • Data mediation service (OGSA/DAI)

36
GridMiner People and Areas
Data Preprocessing: Michaela Pfeifer
Former Members: Jürgen Hofer (until 07/2003;
early GT3-based prototype, DIGIDT case study)
37
DIALOGUE Project: Data Integration Applications
Linking Organizations to Gain Understanding and
Experience
  • University of Edinburgh (Project Leader)
  • Malcolm Atkinson
  • Cutter (Ohio State University, Columbus)
  • Joel Saltz
  • Bioinformatics (Indiana University, Bloomington)
  • Beth Plale
  • San Diego Supercomputing Center
  • Chaitan Baru
  • GridMiner
  • Peter Brezany
  • Kick-off Workshop: August 2005, Univ. of Ohio

38
Within Austrian Grid: Adaptive Semantic Data
Integration
39
Project Schedules
  • GridMiner (2003-2005)
  • a follow-up project proposal is in preparation
  • Austrian Grid (Workpackage 4b) (2005-2006)
  • DIALOGUE (2005-2006)