Data Mining: A Closer Look - PowerPoint PPT Presentation

About This Presentation
Title:

Data Mining: A Closer Look

Description:

Data Mining: A Closer Look Chapter 2 – PowerPoint PPT presentation

Slides: 42
Provided by: admini1002
Learn more at: http://sparc.nfu.edu.tw

Transcript and Presenter's Notes

1
Data Mining A Closer Look
  • Chapter 2

2
2.1 Data Mining Strategies
3
(No Transcript)
4
Classification
  • Learning is supervised.
  • The dependent variable is categorical.
  • Well-defined classes.
  • Current rather than future behavior.

5
Estimation
  • Learning is supervised.
  • The dependent variable is numeric.
  • Well-defined classes.
  • Current rather than future behavior.

6
Prediction
  • The emphasis is on predicting future rather
    than current outcomes.
  • The output attribute may be categorical or
    numeric.

7
The Cardiology Patient Dataset
8
(No Transcript)
9
(No Transcript)
10
A Healthy Class Rule for the Cardiology Patient
Dataset
  • IF 169 < Maximum Heart Rate <= 202
  • THEN Concept Class = Healthy
  • Rule accuracy: 85.07%
  • Rule coverage: 34.55%

11
A Sick Class Rule for the Cardiology Patient
Dataset
  • IF Thal = Rev AND Chest Pain Type = Asymptomatic
  • THEN Concept Class = Sick
  • Rule accuracy: 91.14%
  • Rule coverage: 52.17%

12
Unsupervised Clustering
  • Determine if concepts can be found in the data.
  • Evaluate the likely performance of a supervised
    model.
  • Determine a best set of input attributes for
    supervised learning.
  • Detect outliers.

13
Market Basket Analysis
  • Find interesting relationships among retail
    products.
  • Uses association rule algorithms.

14
2.2 Supervised Data Mining Techniques
15
The Credit Card Promotion Database
16
(No Transcript)
17
A Hypothesis for the Credit Card Promotion
Database
  • A combination of one or more of the dataset
    attributes differentiates Acme Credit Card Company
    card holders who have taken advantage of the life
    insurance promotion from those card holders who
    have chosen not to participate in the promotional
    offer.

18
A Production Rule for the Credit Card Promotion
Database
  • IF Sex = Female AND 19 <= Age <= 43
  • THEN Life Insurance Promotion = Yes
  • Rule Accuracy: 100.00%
  • Rule Coverage: 66.67%

19
Production Rules
  • Rule accuracy is a between-class measure.
  • Rule coverage is a within-class measure.
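The two measures can be made concrete with a short sketch. It assumes that rule accuracy is the share of instances matched by the IF part that also belong to the THEN class, and that rule coverage is the share of the THEN class that the IF part matches. The helper name `evaluate_rule` and the six-row table are hypothetical. They are not the Credit Card Promotion Database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Instance = Dict[str, object]


@dataclass
class RuleStats:
    accuracy: float  # % of instances satisfying the IF part that are in the THEN class
    coverage: float  # % of THEN-class instances that satisfy the IF part


def evaluate_rule(data: List[Instance],
                  antecedent: Callable[[Instance], bool],
                  target: str,
                  value: object) -> RuleStats:
    """Compute rule accuracy and coverage as percentages (hypothetical helper)."""
    fired = [row for row in data if antecedent(row)]
    in_class = [row for row in data if row[target] == value]
    hits = sum(1 for row in fired if row[target] == value)
    accuracy = 100.0 * hits / len(fired) if fired else 0.0
    coverage = 100.0 * hits / len(in_class) if in_class else 0.0
    return RuleStats(round(accuracy, 2), round(coverage, 2))


# Hypothetical toy data, not the original database.
rows: List[Instance] = [
    {"sex": "Female", "age": 40, "life_ins": "Yes"},
    {"sex": "Female", "age": 38, "life_ins": "Yes"},
    {"sex": "Female", "age": 55, "life_ins": "No"},
    {"sex": "Male", "age": 35, "life_ins": "Yes"},
    {"sex": "Male", "age": 45, "life_ins": "No"},
    {"sex": "Female", "age": 19, "life_ins": "Yes"},
]

stats = evaluate_rule(
    rows,
    lambda r: r["sex"] == "Female" and 19 <= r["age"] <= 43,
    "life_ins",
    "Yes",
)
print(stats)  # RuleStats(accuracy=100.0, coverage=75.0)
```

Every instance the rule fires on is in the "Yes" class, which gives 100% accuracy. The rule misses one of the four "Yes" instances, so coverage is 75%.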

20
Neural Networks
21
(No Transcript)
22
(No Transcript)
23
Statistical Regression
  • $\text{Life Insurance Promotion} = 0.5909\,(\text{Credit Card Insurance}) - 0.5455\,(\text{Sex}) + 0.7727$
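The regression equation Life Insurance Promotion = 0.5909 (Credit Card Insurance) - 0.5455 (Sex) + 0.7727 can be applied to a new card holder with a few lines of code. The slide does not give the numeric encodings. This sketch assumes hypothetical 0/1 encodings: Credit Card Insurance is 1 for Yes, and Sex is 1 for Male. It also assumes a hypothetical cut-off of 0.5 for turning the score into Yes/No.

```python
def life_insurance_score(credit_card_insurance: int, sex: int) -> float:
    """Evaluate the slide's regression equation.

    Hypothetical 0/1 encodings: credit_card_insurance = 1 for Yes, 0 for No;
    sex = 1 for Male, 0 for Female.
    """
    return 0.5909 * credit_card_insurance - 0.5455 * sex + 0.7727


def predict_life_insurance(credit_card_insurance: int, sex: int,
                           threshold: float = 0.5) -> str:
    # Hypothetical cut-off: scores at or above the threshold map to "Yes".
    score = life_insurance_score(credit_card_insurance, sex)
    return "Yes" if score >= threshold else "No"


# Print the score and prediction for all four encoded combinations.
for cci in (0, 1):
    for sex in (0, 1):
        score = life_insurance_score(cci, sex)
        print(cci, sex, round(score, 4), predict_life_insurance(cci, sex))
```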

24
2.3 Association Rules
25
An Association Rule for the Credit Card Promotion
Database
  • IF Sex = Female AND Age = over40
  • AND Credit Card Insurance = No
  • THEN Life Insurance Promotion = Yes
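The slide states the rule but not how to score it. A common way to evaluate an association rule uses two measures. Support is the fraction of all records matching both sides. Confidence is the fraction of records matching the IF side that also match the THEN side. The sketch below applies both measures to the rule above. The records, including the "under40" label, are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(row: Record, conditions: Dict[str, str]) -> bool:
    """True when every attribute = value pair in `conditions` holds for `row`."""
    return all(row.get(attr) == val for attr, val in conditions.items())


def support_confidence(data: List[Record], lhs: Dict[str, str],
                       rhs: Dict[str, str]) -> Tuple[float, float]:
    lhs_count = sum(1 for r in data if matches(r, lhs))
    both = sum(1 for r in data if matches(r, lhs) and matches(r, rhs))
    support = both / len(data) if data else 0.0
    confidence = both / lhs_count if lhs_count else 0.0
    return support, confidence


# Hypothetical records for illustration only.
records: List[Record] = [
    {"Sex": "Female", "Age": "over40", "Credit Card Insurance": "No", "Life Insurance Promotion": "Yes"},
    {"Sex": "Female", "Age": "over40", "Credit Card Insurance": "No", "Life Insurance Promotion": "No"},
    {"Sex": "Female", "Age": "over40", "Credit Card Insurance": "No", "Life Insurance Promotion": "Yes"},
    {"Sex": "Male", "Age": "over40", "Credit Card Insurance": "No", "Life Insurance Promotion": "No"},
    {"Sex": "Female", "Age": "under40", "Credit Card Insurance": "Yes", "Life Insurance Promotion": "Yes"},
]

lhs = {"Sex": "Female", "Age": "over40", "Credit Card Insurance": "No"}
rhs = {"Life Insurance Promotion": "Yes"}
support, confidence = support_confidence(records, lhs, rhs)
print(f"support={support:.2f} confidence={confidence:.2f}")  # support=0.40 confidence=0.67
```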

26
2.4 Clustering Techniques
27
(No Transcript)
28
2.5 Evaluating Performance
29
Evaluating Supervised Learner Models
30
Confusion Matrix
  • A matrix used to summarize the results of a
    supervised classification.
  • Entries along the main diagonal are correct
    classifications.
  • Entries other than those on the main diagonal
    are classification errors.
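A confusion matrix can be built by counting (actual, predicted) pairs. This sketch puts actual classes on the rows and predicted classes on the columns, which is a common convention. The slide does not specify an orientation. The labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: List[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix: List[List[int]]) -> float:
    """Share of instances on the main diagonal (correct classifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No"]
m = confusion_matrix(actual, predicted, ["Yes", "No"])
print(m)  # [[2, 1], [1, 2]]
print(accuracy(m))
```

The two off-diagonal 1s are the classification errors. One "Yes" was predicted as "No", and one "No" was predicted as "Yes".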

31
32
Two-Class Error Analysis
33
(No Transcript)
34
(No Transcript)
35
Evaluating Numeric Output
  • Mean absolute error
  • Mean squared error
  • Root mean squared error
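The three measures above follow their standard definitions. MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE. The sample values below are hypothetical.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # RMSE is the square root of MSE, so it is in the same units as the output.
    return math.sqrt(mean_squared_error(actual, predicted))


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(mean_absolute_error(actual, predicted))  # 0.875
print(mean_squared_error(actual, predicted))   # 1.3125
print(root_mean_squared_error(actual, predicted))
```

Squaring makes MSE and RMSE penalize the single error of 2.0 more heavily than MAE does.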

36
Comparing Models by Measuring Lift
37
(No Transcript)
38
Computing Lift
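The slides that work through the lift calculation have no transcript. Lift is commonly defined as the probability of the target class in a model-selected sample divided by its probability in the whole population. This sketch assumes that definition. The function name and the mailing numbers are hypothetical.

```python
def lift(sample_hits: int, sample_size: int,
         population_hits: int, population_size: int) -> float:
    """Lift = P(class | sample) / P(class | population)."""
    if min(sample_size, population_size, population_hits) <= 0:
        raise ValueError("sizes and population_hits must be positive")
    p_sample = sample_hits / sample_size
    p_population = population_hits / population_size
    return p_sample / p_population


# Hypothetical mailing: 100 of 1000 customers respond (10%); a model picks
# 200 customers, of whom 50 respond (25%).
print(round(lift(50, 200, 100, 1000), 2))  # 2.5
```

A lift of 1.0 means the model does no better than random selection. Values above 1.0 mean the model concentrates the target class in its chosen sample.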
39
(No Transcript)
40
(No Transcript)
41
Unsupervised Model Evaluation