Transcript and Presenter's Notes

Title: An Excel-based Data Mining Tool


1
  • An Excel-based Data Mining Tool
  • iDA

4
ESX: A Multipurpose Tool for Data Mining
5
The Algorithmic Logic Behind ESX
  • Given:
  • A set of existing concept-level nodes C1, ..., Cn
  • An average class resemblance score S
  • A new instance I to be classified
  • Classify I with the concept class that improves S
    the most, or hurts S the least.
  • If learning is unsupervised, create a new concept
    node with I alone if doing so yields a better S
    score (see the sketch below).
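
A minimal sketch of this placement rule in Python, assuming a simple matching similarity over categorical attributes (the helper names and the singleton score are this sketch's assumptions; iDA's actual measure also covers real-valued attributes via a tolerance setting):

  def similarity(a, b):
      # Fraction of attribute values two instances share.
      return sum(x == y for x, y in zip(a, b)) / len(a)

  def resemblance(cluster):
      # Average pairwise similarity within a cluster; treating a
      # singleton as perfectly self-similar is this sketch's choice.
      n = len(cluster)
      if n < 2:
          return 1.0
      pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
      return sum(similarity(cluster[i], cluster[j])
                 for i, j in pairs) / len(pairs)

  def average_resemblance(clusters):
      # The average class resemblance score S over nodes C1..Cn.
      return sum(resemblance(c) for c in clusters) / len(clusters)

  def place(clusters, instance, unsupervised=True):
      # Try the new instance I in each existing concept node ...
      trials = [[c + [instance] if j == i else c
                 for j, c in enumerate(clusters)]
                for i in range(len(clusters))]
      if unsupervised:
          # ... and, when learning is unsupervised, in a node of its own.
          trials.append(clusters + [[instance]])
      # Keep whichever placement improves S most (or hurts it least).
      return max(trials, key=average_resemblance)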

6
iDAV Format for Data Mining
  • iDA attribute/value format
  • First row attribute names
  • Second row attribute type identifier
  • C categorical, R real (real stands for any
    numeric field)
  • Third row attribute usage identifier
  • I input, O output, U unused D display only
  • Forth row test set data
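
For illustration, a hypothetical sheet in iDAV layout (attribute names and values invented):

  Income   Sex      Risk
  R        C        C
  I        I        O
  42000    Male     Low
  18500    Female   High

Row one names the attributes, row two types them (R = real, C = categorical), row three marks Income and Sex as inputs and Risk as the output, and the data begins at row four.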

9
A Five-Step Approach for Unsupervised Clustering
  • Step 1: Enter the Data to Be Mined
  • Step 2: Perform a Data Mining Session
  • Step 3: Read and Interpret Summary Results
  • Step 4: Read and Interpret Individual Class Results
  • Step 5: Visualize Individual Class Rules

10
Step 1: Enter the Data to Be Mined
11
Step 2: Perform a Data Mining Session
  • iDA -> Begin Mining Session
  • Select the instance similarity and real-valued
    tolerance settings

12
RuleMaker Settings
13
Step 3: Read and Interpret Summary Results
  • Class Resemblance Scores
  • Similarity of the instances within each class
  • Domain Resemblance Score
  • Similarity of the instances in the entire data set
  • Cluster Quality
  • Class resemblance relative to domain resemblance
    (clusters should be at least as good as the domain)

14
Step 3: Results About Attributes
  • Categorical
  • Domain Predictability
  • Given categorical attribute A with possible values
    v1, ..., vn, domain predictability gives the
    percentage of instances that have A equal to vi.
    (If the score is close to 100, most instances share
    the same value, and the attribute is of little use
    for learning purposes.)
  • Numeric
  • Attribute Significance
  • Given attribute A, find the range of the class
    means and divide by the domain standard deviation.
    (Higher values are better for differentiation
    purposes; see the sketch below.)
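
A sketch of both measures in Python, assuming "range" means the difference between the largest and smallest class mean (the function names and data are illustrative, not iDA's):

  from collections import Counter
  from statistics import mean, pstdev

  def domain_predictability(values):
      # Percentage of instances holding each value of a categorical
      # attribute; one value near 100 means little learning value.
      return {v: 100.0 * c / len(values)
              for v, c in Counter(values).items()}

  def attribute_significance(values_by_class):
      # Range of the per-class means divided by the domain standard
      # deviation; higher values differentiate classes better.
      class_means = [mean(v) for v in values_by_class.values()]
      domain = [x for v in values_by_class.values() for x in v]
      return (max(class_means) - min(class_means)) / pstdev(domain)

  print(domain_predictability(["red", "red", "red", "blue"]))
  print(attribute_significance({"C1": [1.0, 1.2, 0.9],
                                "C2": [3.0, 2.8, 3.1]}))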

17
Step 4: Read and Interpret Individual Class Results
  • Class Predictability is a within-class measure.
  • Given class C and categorical attribute A with
    possible values v1, ..., vn, class predictability
    gives the percentage of instances in C that have A
    equal to vi.
  • Class Predictiveness is a between-class measure.
  • Given class C and categorical attribute A with
    possible values v1, ..., vn, class predictiveness
    for vi is the probability that an instance belongs
    to C given that it has value vi for A (see the
    sketch below).
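
Both measures reduce to simple counts; a sketch over hypothetical (class, value) pairs:

  instances = [("C1", "red"), ("C1", "red"), ("C1", "blue"),
               ("C2", "red"), ("C2", "blue"), ("C2", "blue")]

  def predictability(c, v):
      # Within-class: the fraction of C's instances that have value v.
      in_class = [val for cls, val in instances if cls == c]
      return in_class.count(v) / len(in_class)

  def predictiveness(c, v):
      # Between-class: P(instance belongs to C | its value for A is v).
      with_value = [cls for cls, val in instances if val == v]
      return with_value.count(c) / len(with_value)

  print(predictability("C1", "red"))   # 2/3 of C1's instances are red
  print(predictiveness("C1", "red"))   # 2/3 of red instances are in C1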

19
Necessary and Sufficient Conditions
  • A predictiveness score of 1.0 tells us that all
    instances with the particular attribute value
    belong to this particular class.
  • => The attribute value is a sufficient condition
    for membership in the class.
  • A predictability score of 1.0 tells us that all
    the instances in the class have that attribute
    value.
  • => The attribute value is a necessary condition
    for membership in the class.

20
Necessary and/or Sufficient Conditions
  • If both the predictability and predictiveness
    scores are 1.0, the particular attribute value is
    necessary and sufficient for class membership.
  • ESX flags attribute values whose scores meet a
    particular cut-off (0.80) as highly necessary and
    highly sufficient (illustrated below).
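
A small sketch mapping the two scores to this terminology (an illustration, not ESX's actual reporting logic):

  CUTOFF = 0.80  # the cut-off named above

  def conditions(predictability, predictiveness):
      labels = []
      if predictiveness == 1.0:
          labels.append("sufficient")
      elif predictiveness >= CUTOFF:
          labels.append("highly sufficient")
      if predictability == 1.0:
          labels.append("necessary")
      elif predictability >= CUTOFF:
          labels.append("highly necessary")
      return labels

  print(conditions(1.0, 1.0))    # ['sufficient', 'necessary']
  print(conditions(0.85, 0.90))  # ['highly sufficient', 'highly necessary']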

22
Step 5: Visualize Individual Class Rules
23
RuleMaker Settings
  • Recall that we used the settings to ask RuleMaker
    to generate all rules. This is a good way to learn
    about the nature of the problem at hand.

24
A Six-Step Approach for Supervised Learning
  • Step 1: Choose an Output Attribute
  • Step 2: Perform the Mining Session
  • Step 3: Read and Interpret Summary Results
  • Step 4: Read and Interpret Test Set Results
  • Step 5: Read and Interpret Class Results
  • Step 6: Visualize and Interpret Class Rules

25
Perform the Mining Session
  • Decide on the size of the training set.
  • The remaining instances are used by the software
    to test the model that is developed, and
    evaluation results are reported.

26
Read and Interpret Summary Results
  • The worksheet RES SUM contains summary
    information.
  • Class resemblance scores, attribute summary
    information (categorical and numeric), and the
    most commonly occurring attributes for each class
    are given.

27
Read and Interpret Test Set Results
28
Read and Interpret Test Set Results
  • Worksheets: RES TST, RES MTX
  • These report performance on the test set (which
    was not part of model training).
  • RES MTX reports the confusion matrix (example
    below).
  • RES TST reports, for each instance in the test
    set, the model's classification and whether it is
    correct.
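
For instance, a hypothetical two-class test set of 35 instances might produce a RES MTX table like this (numbers invented):

              Computed C1   Computed C2
  Actual C1        18             2
  Actual C2         3            12

Rows are the actual classes and columns are the model's classifications, so the off-diagonal cells count the test set errors.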

29
Read and Interpret Class Results
  • Just as individual clusters are of interest in
    unsupervised clustering, information about
    individual classes is relevant in supervised
    learning.
  • The worksheet RES CLS contains this information.
  • The most and least typical instances are also
    given here.
  • The worksheet RUL TYP gives typicality scores for
    all of the instances in the test set.

30
Visualize and Interpret Class Rules
  • All rules or a covering set of rules?
  • The worksheet RES Rul contains the rules generated
    by RuleMaker.
  • If all rules are generated, there may be
    overlapping coverage.
  • The covering set algorithm works iteratively,
    identifying the best covering rule and updating
    the set of instances still to be covered (sketched
    below).
  • It is possible to run RuleMaker without running
    the mining algorithm again; this menu item can be
    used to change the RuleMaker settings and generate
    alternative rule sets.
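
A sketch of the iterative covering idea (the rule representation and the covers() test are hypothetical; RuleMaker's internals are not shown on the slides):

  def covering_set(rules, instances, covers):
      # covers(rule, instance) -> True if the rule applies to it.
      chosen = []
      uncovered = set(instances)
      while uncovered:
          # Identify the rule covering the most uncovered instances ...
          best = max(rules,
                     key=lambda r: sum(covers(r, i) for i in uncovered))
          newly = {i for i in uncovered if covers(best, i)}
          if not newly:
              break  # no remaining rule covers anything further
          chosen.append(best)
          # ... then update the set of instances still to be covered.
          uncovered -= newly
      return chosen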

31
Generating Rules: The General Idea
  • Choose an attribute that best differentiates the
    domain/subclass instances.
  • Use the attribute to subdivide the instances into
    classes.
  • For each subclass:
  • If the instances meet the predefined criteria,
    generate a defining rule for the subclass.
  • If the criteria are not met, return to the first
    step (see the sketch below).
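
A sketch of this loop in Python; best_attribute, meets_criteria, and make_rule are hypothetical stand-ins for RuleMaker's own attribute scoring and rule criteria, and instances are assumed to be attribute-to-value mappings:

  def generate_rules(instances, attributes, best_attribute,
                     meets_criteria, make_rule):
      if meets_criteria(instances) or not attributes:
          # The subclass satisfies the criteria: emit a defining rule.
          return [make_rule(instances)]
      # Otherwise choose the most differentiating attribute and
      # subdivide the instances on its values.
      a = best_attribute(instances, attributes)
      remaining = [x for x in attributes if x != a]
      rules = []
      for value in {inst[a] for inst in instances}:
          subclass = [inst for inst in instances if inst[a] == value]
          rules.extend(generate_rules(subclass, remaining,
                                      best_attribute, meets_criteria,
                                      make_rule))
      return rules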

32
Techniques for Generating Rules
  • Define the scope of the rules.
  • Choose the instances.
  • Set the minimum rule correctness.
  • Define the minimum rule coverage.
  • Choose an attribute significance value.
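
These choices might be captured in a settings record like the following (the keys and default values are invented for illustration):

  rulemaker_settings = {
      "rule_scope": "all rules",       # or "covering set"
      "instances": "training set",     # which instances the rules describe
      "min_rule_correctness": 0.80,    # minimum accuracy per rule
      "min_rule_coverage": 0.10,       # minimum fraction covered per rule
      "attribute_significance": 0.50,  # attribute selection threshold
  }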

33
Instance Typicality
  • Typicality scores are used to
  • Identify prototypical and outlier instances.
  • Select a best set of training instances.
  • Compute individual instance classification
    confidence scores (see the sketch below).
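
A sketch that treats typicality as an instance's average similarity to the other members of its class (an assumed definition, consistent with the uses above):

  def similarity(a, b):
      # Fraction of attribute values two instances share.
      return sum(x == y for x, y in zip(a, b)) / len(a)

  def typicality(idx, class_members):
      others = class_members[:idx] + class_members[idx + 1:]
      return sum(similarity(class_members[idx], m)
                 for m in others) / len(others)

  cls = [("red", "round"), ("red", "round"), ("red", "square")]
  print(typicality(0, cls))  # 0.75: fairly prototypical
  print(typicality(2, cls))  # 0.50: more of an outlier

High scorers are prototypical, low scorers are outliers, and the score can serve as a classification confidence weight.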

35
Special Considerations and Features
  • Avoid Mining Delays
  • The Quick Mine Feature
  • Erroneous and Missing Data