Data Mining: Potentials and Challenges

1
Data Mining: Potentials and Challenges
  • Rakesh Agrawal, Jeff Ullman

2
Observations
  • Transfer of data mining research into deployed
    applications and commercial products
  • Greater success in vertical applications
  • Horizontal tools. Examples:
  • SAS Enterprise Miner: Sophisticated Statisticians
    segment
  • DB2 Intelligent Miner: database applications
    requiring mining
  • Emergence of the application of data mining in
    non-conventional domains
  • Combination of structured and unstructured data
  • New challenges due to security/privacy concerns
  • DARPA initiative to fund data mining research

3
Identifying Social Links Using Association Rules
Input: Crawl of about 1 million pages
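
A rough sketch of how association rules could surface social links, assuming each crawled page has already been reduced to the set of person names it mentions (the slide does not say how pages are processed). The function name `social_links` and both thresholds are hypothetical. A rule "a -> b" is kept when the two names co-occur on enough pages (support) and a large enough share of the pages mentioning a also mention b (confidence).

```python
from collections import Counter
from itertools import combinations

def social_links(pages, min_support=2, min_confidence=0.5):
    """Return rules (a, b, support, confidence): pages mentioning a also mention b."""
    name_count = Counter()
    pair_count = Counter()
    for names in pages:
        unique = sorted(set(names))
        name_count.update(unique)                    # pages mentioning each name
        pair_count.update(combinations(unique, 2))   # pages mentioning both names
    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = support / name_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

# Toy stand-in for the crawl: one set of names per page (made-up data).
pages = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Alice", "Carol"},
    {"Bob"},
]
rules = social_links(pages)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support}, confidence={confidence:.2f}")
```

On a crawl of the size mentioned on the slide, the pair counting would need a pruning strategy such as Apriori rather than enumerating every pair per page.
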
4
Website Profiling using Classification
Input: Example pages for each category during
training
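
The slide does not name a classifier, so the following is only a minimal sketch using multinomial naive Bayes over page text, one common choice for this kind of task. The functions `train` and `classify`, the categories, and the example pages are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (category, page_text). Returns word counts and page counts."""
    word_counts = defaultdict(Counter)
    page_counts = Counter()
    for category, text in examples:
        page_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, page_counts

def classify(text, word_counts, page_counts):
    """Return the category with the highest naive Bayes log-probability."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_pages = sum(page_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        score = math.log(page_counts[category] / total_pages)  # class prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Made-up training pages, one short text per example.
training_pages = [
    ("sports", "football match score goal"),
    ("sports", "team wins match"),
    ("finance", "stock market shares price"),
    ("finance", "bank interest rate market"),
]
word_counts, page_counts = train(training_pages)
print(classify("match score today", word_counts, page_counts))
print(classify("stock price", word_counts, page_counts))
```
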
5
Discovering Trends Using Sequential Patterns
Shape Queries
Input: i) patent database, ii) shape of interest
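
One way to read a shape query, sketched under assumptions: suppose the sequential-pattern step has already produced, for each phrase, a series of support counts per time period, and the shape of interest is a sequence of "up", "down", and "stable" steps. The representation, the function names, and the data below are illustrative and not taken from the slides.

```python
def transitions(series, tolerance=0):
    """Label each consecutive step of a numeric series as 'up', 'down', or 'stable'."""
    labels = []
    for previous, current in zip(series, series[1:]):
        if current - previous > tolerance:
            labels.append("up")
        elif previous - current > tolerance:
            labels.append("down")
        else:
            labels.append("stable")
    return labels

def matches_shape(series, shape, tolerance=0):
    """Return the start indices at which the series follows the requested shape."""
    labels = transitions(series, tolerance)
    width = len(shape)
    return [i for i in range(len(labels) - width + 1)
            if labels[i:i + width] == list(shape)]

# Made-up support counts per period for two phrases.
yearly_support = {
    "phrase A": [2, 5, 9, 4, 3],
    "phrase B": [7, 6, 5, 5, 4],
}
shape = ["up", "up", "down"]  # a rise over two periods followed by a decline
hits = {phrase: matches_shape(series, shape)
        for phrase, series in yearly_support.items()}
print(hits)
```
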
6
Discovering Micro-communities
Frequently co-cited pages are related. Pages
with large bibliographic overlap are related.
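
Both relatedness measures can be computed directly from a link graph. The sketch below assumes the graph is available as a mapping from each page to the set of pages it cites; the function names and the tiny graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def co_citation(links):
    """Count, for each pair of cited pages, how many pages cite both."""
    counts = defaultdict(int)
    for cited in links.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return dict(counts)

def bibliographic_overlap(links):
    """Count, for each pair of citing pages, how many citations they share."""
    overlap = {}
    for a, b in combinations(sorted(links), 2):
        shared = len(links[a] & links[b])
        if shared:
            overlap[(a, b)] = shared
    return overlap

# Made-up graph: page -> set of pages it links to.
links = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"z"},
}
cocited = co_citation(links)
coupled = bibliographic_overlap(links)
print(cocited)   # x and y are cited together by p1 and p2
print(coupled)   # p1 and p2 share two citations
```

Pairs with high counts under either measure are candidates for membership in the same micro-community; how the slides turn such pairs into communities is not stated.
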
7
New Challenges
  • Privacy-preserving data mining
  • Data mining over compartmentalized databases

8
Inducing Classifiers over Privacy Preserved
Numeric Data
Alice's age
Alice's salary
John's age
30 becomes 65 (30+35)
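
The "30 becomes 65" example is value distortion: each person adds a random offset to the true value before disclosing it. A minimal sketch follows, assuming the offset is drawn uniformly from [-50, 50]; the slide does not state the noise distribution, and `randomize` is a hypothetical name. It covers only the perturbation step and a check that an aggregate (the mean) survives it, not the classifier induction named in the slide title.

```python
import random

def randomize(values, spread=50, rng=None):
    """Return each value plus independent uniform noise in [-spread, spread]."""
    if rng is None:
        rng = random.Random()
    return [v + rng.uniform(-spread, spread) for v in values]

# The slide's example: a true age of 30 with a random offset of +35.
true_age, offset = 30, 35
disclosed_age = true_age + offset

# Individual values are hidden, but aggregates remain roughly recoverable
# because the noise has zero mean. Ages here are synthetic.
rng = random.Random(0)
ages = [rng.randint(20, 60) for _ in range(10000)]
noisy_ages = randomize(ages, spread=50, rng=rng)
true_mean = sum(ages) / len(ages)
noisy_mean = sum(noisy_ages) / len(noisy_ages)
print(disclosed_age, round(true_mean, 1), round(noisy_mean, 1))
```
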
9
Other recent work
  • Cryptographic approach to privacy-preserving data
    mining
  • Lindell and Pinkas, Crypto 2000
  • Privacy-preserving discovery of association rules
  • Vaidya and Clifton, KDD 2002
  • Evfimievski et al., KDD 2002
  • Rizvi and Haritsa, VLDB 2002

10
Computation over Compartmentalized Databases
11
Some Hard Problems
  • Past may be a poor predictor of future
  • Abrupt changes
  • Wrong training examples
  • Actionable patterns (principled use of domain
    knowledge?)
  • Over-fitting vs. not missing the rare nuggets
  • Richer patterns
  • Simultaneous mining over multiple data types
  • When to use which algorithm?
  • Automatic, data-dependent selection of algorithm
    parameters

12
Discussion
  • Should data mining be viewed as rich querying
    and deeply integrated with database systems?
  • Most current work makes little use of database
    functionality
  • Should analytics be an integral concern of
    database systems?
  • Issues in data mining over heterogeneous data
    repositories (Relationship to the heterogeneous
    systems discussion)

13
Summary
  • Data mining has shown promise but needs much more
    research

We stand on the brink of great new answers, but
even more, of great new questions -- Matt Ridley