Affordable Knowledge Discovery Through Distributed Data Mining

Description:

albany. Coal (t) county. B. C. CATA 2002. 12. ADDM Example: ... albany. Low elevation (E) Low cancer rate (D) Coal (C) High elevation (B) High cancer rate (A) ...

Transcript and Presenter's Notes

1
Affordable Knowledge Discovery Through
Distributed Data Mining
  • Rex E. Gantenbein and Chris Sung
  • Computer Science Department
  • University of Wyoming

2
What Is Data Mining?
Data mining is the extraction of interesting
information or patterns from data in large
databases. It is also known as knowledge discovery
in databases (KDD), knowledge extraction,
data/pattern analysis, data archaeology, data
dredging, information harvesting, and business
intelligence.
3
Data Mining: A Confluence of Multiple Disciplines
Database Technology
Statistics
Data Mining
Machine Learning
Visualization
Information Science
Other Disciplines
4
What is Distributed Data Mining?
  • Data mining in a distributed and parallel
    computing environment
  • Data Mining
  • Distributed Parallel Processing

5
What is Distributed Data Mining?
  • Data Mining (DM)
  • Data Warehousing (DW)
  • Distributed Computing (DC)

6
How can we distribute data mining?
  • (Single/Multiple) Instruction, (Single/Multiple) Data
  • Client/server vs. peer-to-peer (P2P)
  • Homogeneous vs. heterogeneous
  • Local to global vs. local and global
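
One way to read the "local to global" option is that each site mines its own data and a coordinator combines the partial results. The sketch below illustrates that idea for simple itemset counting over homogeneous sites. The function names, the two sites, and the records are made up for illustration and do not come from the slides.

```python
from collections import Counter
from itertools import combinations


def local_counts(records, max_size=2):
    """Count every itemset of up to max_size items held at one site."""
    counts = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts, len(records)


def merge_counts(site_results):
    """Sum per-site counts into global counts (the 'local to global' step)."""
    total = Counter()
    n_records = 0
    for counts, size in site_results:
        total.update(counts)  # Counter.update adds counts together
        n_records += size
    return total, n_records


# Made-up records held at two hypothetical sites.
site_1 = [{"A", "B"}, {"A", "C"}, {"B"}]
site_2 = [{"A", "B", "C"}, {"C"}]

global_counts, n_records = merge_counts(
    [local_counts(site_1), local_counts(site_2)]
)
# Global support of {A, B}: records containing both items / all records.
support_ab = global_counts[("A", "B")] / n_records
```

Only the counts travel between sites here, not the records themselves, which is the usual motivation for merging local results rather than centralizing data.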

7
DC(2) A Survey of Distributed KDD Approaches
8
What is the problem?
  • Accuracy
  • DC(-) / DW(+)
  • Performance
  • DM, DW(-) / DC, Virtual DW(+)
  • Affordability
  • DM, DW(-) / DC, Virtual DW(+)

9
Affordable Distributed Data Mining (ADDM) approach
  • Virtual DW used as an integrator only
  • Assumes heterogeneous and homogeneous databases
    exist
  • No aggregation
  • Locally networked
  • No relational DB server
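
The idea of a virtual DW that only integrates, without copying or aggregating data, can be sketched roughly as follows. This is an illustration under stated assumptions, not the ADDM implementation: the two sources, the column names, the `METADATA` mapping, and all values are hypothetical, and an in-memory SQLite database (which needs no server) stands in for one local database.

```python
import csv
import io
import sqlite3

# Source 1: a local relational table (in-memory SQLite, no server needed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (cnty TEXT, elev_ft INTEGER)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?)", [("X", 7200), ("Y", 4100)]
)

# Source 2: a flat file that uses different column names (made-up content).
csv_text = "county_name,elevation\nZ,6100\n"


def read_sqlite():
    query = "SELECT cnty, elev_ft FROM counties ORDER BY cnty"
    for cnty, elev in conn.execute(query):
        yield {"cnty": cnty, "elev_ft": elev}


def read_csv():
    yield from csv.DictReader(io.StringIO(csv_text))


# Hypothetical DW metadata: a reader per source plus a mapping from
# each source's local column names to the common schema.
METADATA = [
    (read_sqlite, {"cnty": "county", "elev_ft": "elevation"}),
    (read_csv, {"county_name": "county", "elevation": "elevation"}),
]


def virtual_view():
    """Yield rows in the common schema; nothing is stored centrally."""
    for reader, mapping in METADATA:
        for row in reader():
            out = {common: row[local] for local, common in mapping.items()}
            out["elevation"] = int(out["elevation"])  # CSV values are strings
            yield out


rows = list(virtual_view())
```

The mining code would then iterate over `virtual_view()` as if it were a single table, while each record stays in its original source.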

10
ADDM Architecture
11
ADDM Example: distributed DB
A
B
C
12
ADDM Example: build virtual DW
B
A
C
13
ADDM Example: data preparation
A
B
C
Revised DW meta data
14
ADDM Example: Affinity Grouping
Probabilities of 3 items and their combinations
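
A table of probabilities for three items and their combinations could be computed along these lines. The item labels and the five records are made up for illustration, since the slide's actual data is not in the transcript.

```python
from itertools import combinations

# Made-up records; each is the set of items observed together.
records = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B"},
]


def probability(itemset, records):
    """Fraction of records that contain every item in itemset."""
    wanted = set(itemset)
    return sum(wanted <= record for record in records) / len(records)


items = ["A", "B", "C"]
# One entry per single item, pair, and triple: 7 combinations in total.
table = {
    combo: probability(combo, records)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
}

for combo, p in table.items():
    print(" and ".join(combo), p)
```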
15
ADDM Example: mining result
Measures for Association Rules
Confidence (highest): $\text{Confidence} = \dfrac{P(\text{condition and result})}{P(\text{condition})}$
Improvement (> 1): $\text{Improvement} = \dfrac{P(\text{condition and result})}{P(\text{condition})\,P(\text{result})}$
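
The two measures above can be computed directly from record counts. The following is a minimal sketch using made-up records; the function names are illustrative, and the condition and the result are each assumed to occur in at least one record so that no division by zero occurs.

```python
# Made-up records; each is the set of items observed together.
records = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B"},
]


def probability(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(condition, result, records):
    """P(condition and result) / P(condition)."""
    both = probability(condition | result, records)
    return both / probability(condition, records)


def improvement(condition, result, records):
    """P(condition and result) / (P(condition) * P(result))."""
    both = probability(condition | result, records)
    return both / (
        probability(condition, records) * probability(result, records)
    )


# Rule "if C then A": every record containing C also contains A.
conf_c_a = confidence({"C"}, {"A"}, records)
impr_c_a = improvement({"C"}, {"A"}, records)

# Rule "if A then B": improvement below 1 means the rule predicts B
# worse than simply assuming B at its overall rate.
conf_a_b = confidence({"A"}, {"B"}, records)
impr_a_b = improvement({"A"}, {"B"}, records)
```

With this toy data, "if C then A" has an improvement above 1 while "if A then B" falls below 1, so only the first rule would pass the improvement test.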
16
Summary
  • Goal: provide an architecture for distributed DM
  • Basic communications
  • Will support a variety of DM techniques
  • Satisfy accuracy, performance, affordability
    constraints
  • Limitation: available only in particular DM
    environments
  • (Visual C++, Windows, OLE DB (ADO))

17
Future Work
  • Extend to more general DM environments (Java,
    ODBC)
  • Upgrade the architecture to other distributed DM
    protocols (Sun's JXTA)