Data Mining Using IBM Intelligent Miner - PowerPoint PPT Presentation

1
Data Mining Using IBM Intelligent Miner
  • Presented by Qiyan (Jennifer) Huang

2
Outline
  • Introduction
  • Mining Process
  • Main Functionalities of Intelligent Miner
  • Other Data Mining Products
  • Data Mining and Privacy
  • Summary
  • References

3
What is Data Mining
  • Data mining: discovering interesting patterns from large amounts of data
  • Also known as knowledge discovery (mining) in databases (KDD), data/pattern analysis, information harvesting, business intelligence, etc.

4
Evolution of Database Technology
  • 1960s
    • Data collection, database creation
  • 1970s
    • Relational data model, relational DBMS implementation
  • 1980s to present
    • RDBMS, advanced data models
  • 1990s to 2000s
    • Data mining and data warehousing, multimedia databases, and Web databases

5
Data Mining vs. Database Query
  • Database query
    • Identify customers who have purchased more than $10,000 in the last month.
    • Find all customers who have purchased milk.
  • Data mining
    • Identify customers with similar buying habits (clustering).
    • Find all items that are frequently purchased with milk (association rules).
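To make the contrast concrete, here is a minimal Python sketch over a tiny, invented set of baskets. The database query states exactly what to return. The mining step fixes only a frequency threshold (an assumed 50% here) and lets the data reveal which items co-occur with milk. This is plain co-occurrence counting, not a full association-rule algorithm.

```python
from collections import Counter

# Hypothetical toy data: (customer id, items bought in one transaction).
transactions = [
    ("c1", {"milk", "bread", "eggs"}),
    ("c2", {"milk", "bread"}),
    ("c3", {"beer", "chips"}),
    ("c4", {"milk", "cereal", "bread"}),
    ("c5", {"eggs", "cereal"}),
]

# Database query: the predicate fully specifies the answer.
milk_buyers = sorted({cust for cust, items in transactions if "milk" in items})

# Data mining: the pattern is not given in advance; count what co-occurs with milk.
co_counts = Counter(
    item
    for _, items in transactions
    if "milk" in items
    for item in items - {"milk"}
)
milk_txn_count = sum(1 for _, items in transactions if "milk" in items)
min_share = 0.5  # assumed threshold: item appears in at least half of the milk baskets
frequent_with_milk = sorted(
    item for item, n in co_counts.items() if n / milk_txn_count >= min_share
)

print(milk_buyers)         # ['c1', 'c2', 'c4']
print(frequent_with_milk)  # ['bread']
```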

6
Data Mining Process (KDD)
  • Databases → Data cleaning → Data warehouse → Selection → Task-relevant data → Data mining → Pattern evaluation → Knowledge
  • Figure after J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2001
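The stages of this process can be sketched as a chain of small functions. Everything here is hypothetical. The records and attribute names are invented, and the "mining" step (counting attribute-value combinations) is a deliberately trivial stand-in for the much richer algorithms a tool such as Intelligent Miner provides.

```python
from collections import Counter

# Hypothetical raw records pulled from operational databases.
raw = [
    {"id": 1, "region": "east", "product": "tv", "amount": 500},
    {"id": 2, "region": "east", "product": "tv", "amount": None},   # dirty: missing value
    {"id": 3, "region": "west", "product": "radio", "amount": 40},
    {"id": 3, "region": "west", "product": "radio", "amount": 40},  # dirty: duplicate id
    {"id": 4, "region": "east", "product": "radio", "amount": 35},
    {"id": 5, "region": "east", "product": "tv", "amount": 450},
]

def clean(records):
    """Data cleaning: drop records with missing values or an id already seen."""
    seen, out = set(), []
    for r in records:
        if None in r.values() or r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append(r)
    return out

def select(records, attrs):
    """Selection: project each record onto the task-relevant attributes."""
    return [tuple(r[a] for a in attrs) for r in records]

def mine(rows):
    """Data mining (deliberately trivial): count each attribute-value combination."""
    return Counter(rows)

def evaluate(patterns, total, min_support):
    """Pattern evaluation: keep combinations whose relative frequency meets the threshold."""
    return {p: n / total for p, n in patterns.items() if n / total >= min_support}

warehouse = clean(raw)                                # cleaned, integrated store
task_data = select(warehouse, ["region", "product"])  # task-relevant data
knowledge = evaluate(mine(task_data), len(task_data), min_support=0.5)
print(knowledge)  # {('east', 'tv'): 0.5}
```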
7
About DB2 Intelligent Miner
  • DB2 Intelligent Miner for Data focuses on large-scale mining: large volumes of data and parallel data mining on Windows NT, Sun Solaris, and OS/390 (IBM)

8
Main Functionalities
  • Cluster analysis
    • Group data that share similar trends and patterns
  • Classification
    • Predict an outcome based on historical data
  • Association analysis
    • Find frequent patterns

18
Classification
This follows an example from Quinlan's ID3.
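At each node, ID3 picks the attribute with the highest information gain, meaning the largest drop in class entropy after the split. Below is a minimal Python sketch that works from class counts only. The counts are assumed to come from the classic 14-example weather data often used to illustrate ID3: 9 yes and 5 no overall, and for the outlook attribute sunny 2/3, overcast 4/0, rain 3/2.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as counts, e.g. (yes, no)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, branches):
    """ID3 split criterion: parent entropy minus the size-weighted entropy of the branches."""
    total = sum(parent)
    remainder = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent) - remainder

parent = (9, 5)                     # (yes, no) before splitting
outlook = [(2, 3), (4, 0), (3, 2)]  # sunny, overcast, rain

print(round(entropy(parent), 3))                    # 0.94
print(round(information_gain(parent, outlook), 3))  # 0.247
```

ID3 would compute this gain for every candidate attribute and split on the largest. It would then recurse into each branch until the branches are pure, like the overcast branch (4/0) here.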
20
Classification
22
Association
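  • An association rule identifies relationships between items that tend to occur together in transactions
  • Example: the rule shirt ⇒ tie
    • 30% of all transactions contain a tie, and 60% of the customers who buy a shirt also buy a tie
    • Confidence: 60%, the share of shirt-buying transactions that also contain a tie
    • Support: if shirt and tie are bought together in 12% of all transactions, the support is 12%
    • Lift = confidence / support of the rule head = 60% / 30% = 2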
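Support, confidence, and lift are simple ratios of transaction counts. Below is a minimal Python sketch. The 50 transactions are invented so that shirts appear in 10, ties in 15 (30%), and both together in 6 (12%). That gives the rule shirt ⇒ tie a confidence of 60% and a lift of 2. Lift is computed as confidence divided by the support of the rule head, which is the usual definition.

```python
def rule_stats(transactions, body, head):
    """Return (support, confidence, lift) of the rule body => head.

    support    = fraction of all transactions containing body and head
    confidence = support / fraction containing body
    lift       = confidence / fraction containing head (> 1: positive association)
    """
    n = len(transactions)
    n_body = sum(1 for t in transactions if body <= t)
    n_head = sum(1 for t in transactions if head <= t)
    n_both = sum(1 for t in transactions if (body | head) <= t)
    support = n_both / n
    confidence = n_both / n_body
    lift = confidence / (n_head / n)
    return support, confidence, lift

# 50 invented transactions: shirt+tie 6, shirt only 4, tie only 9, other 31.
transactions = (
    [{"shirt", "tie"}] * 6
    + [{"shirt"}] * 4
    + [{"tie"}] * 9
    + [{"socks"}] * 31
)

support, confidence, lift = rule_stats(transactions, {"shirt"}, {"tie"})
print(f"support={support:.0%} confidence={confidence:.0%} lift={lift:.2f}")
# support=12% confidence=60% lift=2.00
```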

23
Association
Support (%)  Confidence (%)  Lift    Rule body   Rule head
5.5286       34.0800         2.7300  203, 1207   1716
7.0388       34.1300         2.7400  203, 1719   1716
5.4662       34.1700         2.7400  202, 802    1716
5.8805       34.3400         2.7500  203, 802    1716
5.0163       34.4900         2.7600  203, 705    1716
7.1279       34.7400         2.7800  202, 1718   1716
5.8226       34.7600         3.3900  711, 203    710
5.0697       34.8300         2.7400  202, 1702   1703
5.2836       34.8300         2.7400  202, 1207   1703
5.4350       34.9400         3.4100  201, 711    710
5.3459       35.0200         2.7600  201, 1702   1703
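If lift is confidence divided by the support of the rule head, each row also reveals how common its head item is overall, so the rows can be cross-checked. Below is a small Python sketch. It assumes the columns read support (%), confidence (%), lift, rule body, and rule head. For every selected rule with head 1716, the implied head support comes out at about 12.5%, which is consistent with that reading.

```python
# Selected rows of the rule table: (support %, confidence %, lift, rule body, rule head).
rules = [
    (5.5286, 34.08, 2.73, ("203", "1207"), "1716"),
    (7.0388, 34.13, 2.74, ("203", "1719"), "1716"),
    (7.1279, 34.74, 2.78, ("202", "1718"), "1716"),
    (5.8226, 34.76, 3.39, ("711", "203"), "710"),
    (5.4350, 34.94, 3.41, ("201", "711"), "710"),
]

implied = []
for support, confidence, lift, body, head in rules:
    # lift = confidence / support(head)  =>  support(head) = confidence / lift
    head_support = confidence / lift
    # confidence = support / support(body)  =>  support(body) = support / confidence
    body_support = 100 * support / confidence
    implied.append((body, head, round(head_support, 1), round(body_support, 1)))
    print(f"{' & '.join(body)} => {head}: head ~{head_support:.1f}%, body ~{body_support:.1f}%")
```

Small differences between rows with the same head (10.2% vs 10.3% for item 710) come from the lift column being rounded to two decimals.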

24
Data Mining Products
  • More than 50 commercial data mining tools
  • Wide range of pricing
    • SAS Institute's Enterprise Miner: $80K
    • SPSS Inc.'s Clementine: $75K
    • IBM Intelligent Miner: $60K
    • Desktop products start at a few hundred dollars

25
Data Mining Products
Data Mining Product Comparison by Algorithm
Algorithm IBM SAS SPSS
Neural Network v v v
Decision Tree v v v
Clustering v v
Association v v
Nearest Neighbour v
Kohonen Self- Organizing Map v v
26
Data Mining Privacy
  • Release a limited subset of the data
  • Hide attributes that are potentially related to personal information
  • Release encrypted data
  • Audit to detect misuse of data
  • Set up a data mining controller
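The first two measures can be combined in a few lines: sample a limited subset of the records and strip the attributes judged to be personal before release. Below is a minimal Python sketch. The attribute names, the 30% fraction, and the fixed random seed are all hypothetical choices. Removing direct identifiers alone does not guarantee anonymity, because the remaining attributes can still be combined to re-identify people.

```python
import random

# Hypothetical policy: attributes treated as personal identifiers.
SENSITIVE = {"name", "ssn"}

def release_subset(records, sensitive, fraction, seed=0):
    """Return a random sample of about `fraction` of the records with sensitive attributes removed."""
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    sample = rng.sample(records, k)
    return [{attr: val for attr, val in r.items() if attr not in sensitive} for r in sample]

# Ten made-up customer records.
customers = [
    {"name": f"customer{i}", "ssn": f"000-00-{i:04d}", "age": 20 + 3 * i, "spend": 50 * i}
    for i in range(10)
]

released = release_subset(customers, SENSITIVE, fraction=0.3)
print(released)  # three records, each with only "age" and "spend"
```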

27
Summary
  • Introduction to Data Mining
  • A KDD Data Mining Process
  • Functionalities of Intelligent Miner
  • Commercial Data Mining Tools
  • Data Mining Privacy

28
References
  • Angoss Whitepaper. http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html. Retrieved on Oct 26th, 2003.
  • C. Clifton and D. Marks. Security and Privacy Implications of Data Mining. 1996.
  • D. W. Abbott, I. P. Matkovsky, and J. F. Elder IV. An Evaluation of High-end Data Mining Tools.
  • Elder Research. http://www.rgrossman.com/faq/dm-02.htm. Retrieved on Oct 28th, 2003.
  • IBM. DB2 Intelligent Miner. http://www-3.ibm.com/software/data/iminer/. Retrieved on Oct 26th, 2003.
  • J. F. Elder and D. W. Abbott. A Comparison of Leading Data Mining Tools. August 1998.
  • J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2000.
  • http://www.cald.cs.cmu.edu/summerschool03/PrivacyPreservingDM.ppt. Retrieved on Nov 10th, 2003.
  • Robert Grossman. http://www.datamininglab.com/toolcomp.html#comparison. Retrieved on Oct 20th, 2003.
  • SPSS. http://www.spss.com/. Retrieved on Nov 12th, 2003.

30
Evolution of Database Technology
  • 1960s
    • Data collection, database creation, and network DBMS
  • 1970s
    • Relational data model, relational DBMS implementation
  • 1980s
    • RDBMS, advanced data models
  • 1990s to 2000s
    • Data mining and data warehousing, multimedia databases, and Web databases

31
Data Mining On What Kind of Data?
  • Data Sources
  • Relational database
  • Data warehouses
  • Transactional databases
  • WWW
  • Data types
  • Audio
  • Image
  • Text

32
Output: A Decision Tree for buys_computer
age?
  • <30: student?
    • no: no
    • yes: yes
  • 30..40: yes
  • >40: credit rating?
    • excellent: no
    • fair: yes
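Once learned, a decision tree is just a nest of tests applied from the root down. Below is a sketch of the buys_computer tree as a Python function. Two details are assumptions, since the slide shows only branch labels: age is treated as a number, and age 30 falls in the 30..40 branch.

```python
def buys_computer(age, student, credit_rating):
    """Classify one customer by walking the buys_computer tree from the root test (age?)."""
    if age < 30:
        # "<30" branch: the next test is student?
        return "yes" if student == "yes" else "no"
    if age <= 40:
        # "30..40" branch: a leaf labelled yes, no further test.
        return "yes"
    # ">40" branch: the next test is credit rating?
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer(25, "yes", "fair"))      # yes
print(buys_computer(45, "no", "excellent"))  # no
```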
33
Neural network
34
Neural network
35
Neural network
36
Applications of Clustering
  • Pattern Recognition
  • Image Processing
  • Economic Science (especially market research)
  • WWW
  • Document classification
  • Cluster Weblog data to discover groups of similar
    access patterns

37
Data Mining Privacy

Figure: a mining controller sits between the data warehouse and the data mining tool
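One way to read this architecture: every request from a mining tool passes through a gatekeeper. The gatekeeper enforces an attribute policy and writes an audit record, which also supports the "audit to detect misuse" measure. Below is a hypothetical Python sketch. The class, its fields, and the blocked-attribute policy are illustrative only and are not part of any IBM product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MiningController:
    """Hypothetical gatekeeper placed between mining tools and the data warehouse."""
    warehouse: list                      # records stored as dicts
    blocked: set                         # attributes that mining tools may not read
    audit_log: list = field(default_factory=list)

    def fetch(self, user, attributes):
        """Return the requested attributes for every record, or refuse and log the attempt."""
        denied = sorted(set(attributes) & self.blocked)
        # Every request is logged, allowed or not, so misuse can be audited later.
        self.audit_log.append((datetime.now(timezone.utc), user, tuple(attributes), denied))
        if denied:
            raise PermissionError(f"{user} requested blocked attributes {denied}")
        return [{a: record[a] for a in attributes} for record in self.warehouse]

controller = MiningController(
    warehouse=[{"ssn": "000-00-0001", "age": 34, "spend": 120},
               {"ssn": "000-00-0002", "age": 51, "spend": 80}],
    blocked={"ssn"},
)
rows = controller.fetch("analyst", ["age", "spend"])
try:
    controller.fetch("analyst", ["ssn", "spend"])
    refused = False
except PermissionError:
    refused = True
```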
38
Examples of Clustering Applications
  • Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
  • Insurance: identify groups of motor insurance policy holders with a high average claim cost
  • City planning: identify groups of houses according to their house type, value, and geographical location
  • Earthquake studies: observed earthquake epicenters should be clustered along continental faults
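The marketing example amounts to grouping customers by similarity. Below is a minimal k-means sketch in Python. k-means is swapped in because it is the best-known clustering algorithm; it is not necessarily the method Intelligent Miner uses. The two customer attributes and their values are invented.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (store visits per month, average spend in $100s).
customers = [(1, 1.0), (2, 1.5), (1, 2.0), (9, 8.0), (10, 9.5), (8, 9.0)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(sorted(c) for c in clusters))
```

The number of clusters k must be chosen up front, and results can depend on the starting centroids. That is why the seed is fixed here.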

39
Association
  • Association and pattern analysis
  • Applications
    • Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
  • Examples, written as rule [support, confidence]
    • buys(x, "diapers") ⇒ buys(x, "beers") [0.5%, 60%]
    • major(x, "CS") ∧ takes(x, "DB") ⇒ grade(x, "A") [1%, 75%]
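The [support, confidence] figures of a rule are read off frequent itemsets. Below is a sketch of Apriori, a standard level-wise algorithm for finding those itemsets, written in Python. The baskets are invented, and the sketch is not meant to represent Intelligent Miner's own implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    candidates = {frozenset([item]) for b in baskets for item in b}  # all 1-itemsets
    frequent, k = {}, 1
    while candidates:
        # One pass over the data counts the support of every size-k candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: m / n for c, m in counts.items() if m / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Invented market baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]
freq = apriori(baskets, min_support=0.4)
# Confidence of diapers => beer: support({diapers, beer}) / support({diapers}).
confidence = freq[frozenset({"diapers", "beer"})] / freq[frozenset({"diapers"})]
print(sorted((sorted(s), round(v, 2)) for s, v in freq.items()))
print(round(confidence, 2))  # 0.75
```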

40
Data Mining On What Kind of Data?
  • Relational databases
  • Data warehouses
  • Transactional databases
  • Advanced DB and information repositories
  • Object-oriented and object-relational databases
  • Text databases and multimedia databases
  • Heterogeneous and legacy databases
  • WWW

41
Steps of a KDD Process
  • Learning the application domain
    • Relevant prior knowledge and goals of the application
  • Creating a target data set: data selection
  • Data cleaning and preprocessing (may take 60% of the effort!)
  • Data reduction and transformation
    • Find useful features, dimensionality/variable reduction, invariant representation
  • Choosing the functions of data mining
    • Summarization, classification, regression, association, clustering
  • Choosing the mining algorithm(s)
  • Data mining: search for patterns of interest
  • Pattern evaluation and knowledge presentation
    • Visualization, transformation, removing redundant patterns, etc.
  • Use of discovered knowledge
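The data reduction and transformation step can be as simple as rescaling attributes and dropping those that carry no information. Below is a minimal Python sketch. It assumes purely numeric columns and uses min-max normalization, which is one common choice among many. The records are hypothetical.

```python
def reduce_and_normalize(rows):
    """Min-max scale each numeric column to [0, 1], dropping constant columns."""
    scaled_columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        if hi == lo:
            continue  # a constant attribute cannot distinguish records, so drop it
        scaled_columns.append([(v - lo) / (hi - lo) for v in column])
    # Transpose back from columns to rows.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical records: (age, has_account flag, income).
rows = [(20, 1, 30000), (40, 1, 50000), (60, 1, 70000)]
result = reduce_and_normalize(rows)
print(result)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```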

42
Strengths and Weaknesses
  • Strengths
    • Algorithm breadth
    • Graphical output
    • Available for PC and mainframe environments
  • Weaknesses
    • No automation
    • Data has to reside in IBM's database system