Data Mining Output: Knowledge Representation - PowerPoint PPT Presentation

Slides: 31
Provided by: Sus598
Learn more at: http://www.lasalle.edu
Transcript and Presenter's Notes

Title: Data Mining Output: Knowledge Representation


1
Data Mining Output: Knowledge Representation
  • Chapter 3

2
Representing Structural Patterns
  • There are many different ways of representing
    patterns
  • Two were covered in Chapter 1: decision trees and
    classification rules
  • A learned pattern is a form of knowledge
    representation (even if the knowledge does not
    seem very impressive)

3
Decision Trees
  • Make decisions by following branches down the
    tree until a leaf is reached
  • Classification is based on the contents of the leaf
  • Non-leaf nodes usually test a single
    attribute
  • Usually one branch per value of a nominal
    attribute, or per range of a numeric attribute
    (most commonly a two-way split: > some value and
    < the same value)
  • Less commonly, compare two attribute values, or
    test some function of multiple attributes
  • An attribute, once used, is commonly not used
    again at a lower level of the same branch

4
Decision Trees
  • Missing Values
  • May be treated as another possible value of a
    nominal attribute, if the fact that data is
    missing may mean something
  • May follow the most popular branch when a value
    is missing from a test instance
  • More complicated approach: rather than going
    all-or-nothing, split the test instance in
    proportion to the popularity of the branches in
    the training data; recombination at the end uses
    a vote based on the weights
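
A minimal sketch of the weighted-split idea, assuming a tiny hypothetical tree stored as nested dictionaries (the attribute, branch counts, and class labels are invented for illustration). When the tested value is missing, the instance is sent down every branch with a weight proportional to how many training instances took that branch, and the leaf votes are summed at the end.

```python
from collections import defaultdict

# Hypothetical tree: "counts" records how many training instances
# went down each branch of the node (assumed values).
TREE = {
    "attr": "outlook",
    "branches": {
        "sunny": {"leaf": "no"},
        "overcast": {"leaf": "yes"},
        "rainy": {"leaf": "yes"},
    },
    "counts": {"sunny": 5, "overcast": 4, "rainy": 5},
}

def collect_votes(node, instance, weight=1.0, votes=None):
    """Accumulate weighted class votes for one test instance."""
    votes = defaultdict(float) if votes is None else votes
    if "leaf" in node:
        votes[node["leaf"]] += weight
        return votes
    value = instance.get(node["attr"])
    if value is None:
        # Missing value: split the instance across all branches in
        # proportion to branch popularity in the training data.
        total = sum(node["counts"].values())
        for branch, count in node["counts"].items():
            collect_votes(node["branches"][branch], instance,
                          weight * count / total, votes)
    else:
        collect_votes(node["branches"][value], instance, weight, votes)
    return votes

def predict(tree, instance):
    """Return the class with the largest total weight."""
    votes = collect_votes(tree, instance)
    return max(votes, key=votes.get)
```

Calling `predict(TREE, {})` exercises the missing-value path; calling it with `{"outlook": "sunny"}` follows a single branch as usual.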

5
Classification Rules
  • Popular alternative to decision trees
  • LHS / antecedent / precondition: tests to
    determine if the rule is applicable
  • Tests are usually ANDed together
  • Could be a general logical condition (AND/OR/NOT),
    but learning such rules is MUCH less constrained
  • RHS / consequent / conclusion: the answer, usually
    the class (but could be a probability
    distribution)
  • Rules with the same conclusion essentially
    represent an OR
  • Rules may be an ordered set, or independent
  • If independent, a policy may need to be established
    for when more than one rule matches (conflict
    resolution strategy) or when no rule matches
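
One way to picture an ordered rule set is the sketch below, which assumes rules are stored as (antecedent, consequent) pairs and that a default class handles the no-match case. The rules, attribute names, and default are hypothetical. Tests within an antecedent are ANDed, and the first matching rule wins, which is one simple conflict resolution strategy.

```python
# Hypothetical ordered rule set: each antecedent is a dict of
# attribute tests that must all hold (logical AND).
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
]
DEFAULT_CLASS = "yes"  # used when no rule matches (assumed policy)

def matches(antecedent, instance):
    """True if every test in the antecedent holds for the instance."""
    return all(instance.get(attr) == value
               for attr, value in antecedent.items())

def classify(instance, rules=RULES, default=DEFAULT_CLASS):
    """Apply rules in order; the first match decides the class."""
    for antecedent, consequent in rules:
        if matches(antecedent, instance):
            return consequent
    return default
```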

6
Rules / Trees
  • Rules can easily be created from a tree, but the
    result is not the simplest set of rules
  • Transforming rules into a tree is not
    straightforward (see the replicated subtree
    problem, next two slides)
  • In many cases rules are more compact than trees,
    particularly if a default rule is possible
  • Rules may appear to be independent nuggets of
    knowledge (and hence less complicated than trees),
    but if rules are an ordered set, then they are
    much more complicated than they appear
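
The tree-to-rules direction can be sketched as one rule per leaf, with the tests along the path forming the antecedent. The tree below is a hypothetical encoding of the disjunction "if a and b then x; if c and d then x"; the "not x" label for the remaining leaves is an assumption. The c/d subtree has to appear under two branches, which is why the resulting rule set is larger than the two rules it came from.

```python
# Subtree testing c and d; it is needed in two places in the tree.
CD_SUBTREE = {"attr": "c", "branches": {
    "yes": {"attr": "d", "branches": {
        "yes": {"leaf": "x"},
        "no": {"leaf": "not x"},
    }},
    "no": {"leaf": "not x"},
}}

TREE = {"attr": "a", "branches": {
    "yes": {"attr": "b", "branches": {
        "yes": {"leaf": "x"},
        "no": CD_SUBTREE,
    }},
    "no": CD_SUBTREE,
}}

def tree_to_rules(node, path=()):
    """Return one (antecedent, consequent) rule per leaf."""
    if "leaf" in node:
        return [(list(path), node["leaf"])]
    rules = []
    for value, child in node["branches"].items():
        test = (node["attr"], value)
        rules.extend(tree_to_rules(child, path + (test,)))
    return rules

rules = tree_to_rules(TREE)
for antecedent, consequent in rules:
    tests = " and ".join(f"{attr}={value}" for attr, value in antecedent)
    print(f"If {tests} then {consequent}")
```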

7
Figure 3.1 Decision tree for a simple
disjunction.
If a and b then x
If c and d then x
8
Figure 3.3 Decision tree with a replicated
subtree.
If x = 1 and y = 1 then class = a
If z = 1 and w = 1 then class = a
Otherwise class = b
Each gray triangle actually contains the whole
gray subtree below
9
Association Rules
  • Association rules are not intended to be used
    together as a set; the value is in the
    knowledge itself, and the rules are probably not
    used automatically
  • Large numbers of possible rules

10
Association Rule Evaluation
  • Coverage: the number of instances for which the
    rule predicts correctly (also called support)
  • Accuracy: the proportion of the instances the rule
    applies to that it predicts correctly (also called
    confidence)
  • Coverage is sometimes expressed as a percent of the
    total instances
  • Usually methods or users specify minimum coverage
    and accuracy for rules to be generated
  • Some possible rules imply others; present the
    most strongly supported
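
As a sketch of how the two measures could be computed, assume instances are dictionaries and a rule is a pair of attribute-value dictionaries (the four instances below are invented). Coverage counts instances satisfying both sides of the rule; accuracy divides that count by the number of instances satisfying the antecedent.

```python
def satisfies(instance, conditions):
    """True if the instance meets every attribute=value condition."""
    return all(instance.get(attr) == value
               for attr, value in conditions.items())

def evaluate_rule(instances, antecedent, consequent):
    """Return (coverage, accuracy) for the rule antecedent => consequent."""
    applies = [inst for inst in instances if satisfies(inst, antecedent)]
    correct = [inst for inst in applies if satisfies(inst, consequent)]
    coverage = len(correct)  # also called support
    accuracy = coverage / len(applies) if applies else 0.0  # confidence
    return coverage, accuracy

# Hypothetical data, not the dataset used on the slides.
DATA = [
    {"outlook": "rainy", "windy": False, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "yes"},
    {"outlook": "sunny", "windy": False, "play": "no"},
]

coverage, accuracy = evaluate_rule(DATA, {"outlook": "rainy"}, {"play": "no"})
```

Dividing `coverage` by `len(DATA)` gives coverage as a fraction of the total instances.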

11
Example: My Weather, Apriori Algorithm
  • Apriori
  • Minimum support: 0.15
  • Minimum metric <confidence>: 0.9
  • Number of cycles performed: 17
  • Best rules found:
  • 1. outlook=rainy 5 ==> play=no 5
    conf:(1)
  • 2. temperature=cool 4 ==> humidity=normal 4
    conf:(1)
  • 3. temperature=hot windy=FALSE 3 ==> play=no 3
    conf:(1)
  • 4. temperature=hot play=no 3 ==> windy=FALSE 3
    conf:(1)
  • 5. outlook=rainy windy=FALSE 3 ==> play=no 3
    conf:(1)
  • 6. outlook=rainy humidity=normal 3 ==> play=no 3
    conf:(1)
  • 7. outlook=rainy temperature=mild 3 ==> play=no 3
    conf:(1)
  • 8. temperature=mild play=no 3 ==> outlook=rainy 3
    conf:(1)
  • 9. temperature=hot humidity=high windy=FALSE 2
    ==> play=no 2 conf:(1)
  • 10. temperature=hot humidity=high play=no 2
    ==> windy=FALSE 2 conf:(1)

12
Rules with Exceptions
  • Skip

13
Rules involving Relations
  • More than the values of individual attributes may
    be important; relations between them can matter
  • See book example on next slide

14
Figure 3.6 The shapes problem.
Shaded: standing. Unshaded: lying.
15
More Complicated: Winston's Blocks World
  • House: a 3-sided block and a 4-sided block, AND the
    3-sided block is on top of the 4-sided block
  • Solutions frequently involve learning rules that
    include variables/parameters
  • E.g. 3sided(block1) AND 4sided(block2) AND
    ontopof(block1, block2) → house

16
Easier and Sometimes Useful
  • Introduce new attributes during data preparation
  • A new attribute represents a relationship
  • E.g. for the standing / lying task, could
    introduce a new boolean attribute widthgreater?,
    which would be filled in for each instance
    during data prep
  • E.g. in numeric weather, could introduce
    WindChill, calculated from temperature and wind
    speed (if numeric), or Heat Index, calculated
    from temperature and humidity
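
A small sketch of the data-preparation step for the standing / lying task, assuming each shape instance carries numeric `width` and `height` fields (these field names and the sample values are assumptions). The relation between the two attributes becomes an ordinary boolean attribute that any attribute-value learner can test.

```python
def add_width_greater(instances):
    """Return copies of the instances with a derived boolean attribute."""
    return [
        {**inst, "width_greater": inst["width"] > inst["height"]}
        for inst in instances
    ]

# Hypothetical shape instances.
SHAPES = [
    {"width": 2, "height": 4, "sides": 4},
    {"width": 7, "height": 3, "sides": 3},
]

prepared = add_width_greater(SHAPES)
```

The original instances are left unchanged, so the raw data stays available if other derived attributes are tried later.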

17
Numeric Prediction
  • The standard for comparison for numeric prediction
    is the statistical technique of regression
  • E.g. for the CPU performance data, the regression
    equation below was derived:
    PRP = - 56.1
          + 0.049 MYCT
          + 0.015 MMIN
          + 0.006 MMAX
          + 0.630 CACH
          - 0.270 CHMIN
          + 1.46 CHMAX
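
The regression equation for the CPU performance data can be evaluated directly. The sketch below copies the intercept and coefficients from the slide; the function name and the sample machine passed to it are hypothetical.

```python
INTERCEPT = -56.1
COEFFICIENTS = {
    "MYCT": 0.049,
    "MMIN": 0.015,
    "MMAX": 0.006,
    "CACH": 0.630,
    "CHMIN": -0.270,
    "CHMAX": 1.46,
}

def predict_prp(machine):
    """Weighted sum of attribute values plus the intercept."""
    return INTERCEPT + sum(
        weight * machine[name] for name, weight in COEFFICIENTS.items()
    )

# Hypothetical attribute values, chosen only to exercise the function.
sample = {"MYCT": 125, "MMIN": 256, "MMAX": 6000,
          "CACH": 256, "CHMIN": 16, "CHMAX": 128}
prp = predict_prp(sample)
```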

18
Trees for Numeric Prediction
  • Tree branches as in a decision tree (may be based
    on ranges of attributes)
  • Regression tree: leaf nodes contain the average of
    the training set values that the leaf applies to
  • Model tree: leaf nodes contain regression
    equations for the instances that the leaf applies
    to
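
The difference between the two kinds of leaf can be sketched as follows, assuming a single numeric attribute and an ordinary least-squares line for the model-tree leaf (a real model tree would typically use several attributes). The training values reaching the leaf are invented.

```python
def mean(values):
    return sum(values) / len(values)

def regression_tree_leaf(targets):
    """Leaf that always predicts the average training target."""
    average = mean(targets)
    def predict(x):
        return average
    return predict

def model_tree_leaf(xs, ys):
    """Leaf that predicts with a least-squares line fitted to (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    def predict(x):
        return intercept + slope * x
    return predict

# Hypothetical training instances that reached one leaf.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
constant_leaf = regression_tree_leaf(ys)
linear_leaf = model_tree_leaf(xs, ys)
```

For a query outside the range of the stored values, the constant leaf stays at the average while the linear leaf extrapolates along its fitted line.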

19
Figure 3.7(b) Models for the CPU performance
data: regression tree.
20
Figure 3.7(c) Models for the CPU performance
data: model tree.
21
Instance Based Representation
  • The concept is not really represented (except via
    examples)
  • Real-world example: some radio stations don't
    define what they play in words; they play promos
    basically saying "WXXX music is <songs>"
  • Training examples are merely stored (kind of like
    rote learning)
  • Answers are given by finding the training
    example(s) most similar to the test instance at
    testing time
  • Has been called "lazy learning": no work until
    an answer is needed

22
Instance Based: Finding Most Similar Example
  • Nearest neighbor: each new instance is compared
    to all stored instances, with a distance
    calculated for each attribute for each instance
  • The class of the nearest neighbor instance is used
    as the prediction <see next slide and come back>
  • OR the K nearest neighbors vote, or cast a
    weighted vote
  • Combination of distances: city block or
    Euclidean (as the crow flies)
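
A minimal sketch of nearest-neighbor prediction with both ways of combining per-attribute distances, assuming numeric attributes stored as tuples (the training points are invented). With `k=1` the single nearest neighbor decides; with larger `k` the neighbors take an unweighted vote, and ties fall to the label encountered first among the neighbors.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line ("crow flies") distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of per-attribute absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(training, query, k=1, distance=euclidean):
    """Predict the majority class among the k nearest stored instances."""
    neighbors = sorted(training, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: (attribute vector, class).
TRAINING = [
    ((1.0, 1.0), "x"),
    ((1.0, 2.0), "x"),
    ((5.0, 5.0), "y"),
    ((6.0, 5.0), "y"),
    ((5.0, 6.0), "y"),
]

nearest = knn_predict(TRAINING, (2.0, 2.0), k=1)
voted = knn_predict(TRAINING, (5.0, 4.0), k=3, distance=city_block)
```

All the work happens inside `knn_predict` at query time, which matches the "lazy learning" description: nothing is computed when the training examples are stored.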

23
Nearest Neighbor
[Scatter plot: training instances labeled x, y, and z,
with a test instance T to be classified by its nearest
neighbor]

24
Additional Details
  • The distance / similarity function must deal with
    binaries/nominals, usually by an all-or-nothing
    match, but mild should be a better match to hot
    than cool is!
  • The distance / similarity function is simpler if
    data is normalized in advance. E.g. a $10
    difference in household income is not significant,
    while a 1.0 difference in GPA is big
  • The distance / similarity function should weight
    different attributes differently; the key task is
    determining those weights
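
These three points can be sketched together, assuming min-max scaling for numeric attributes, an all-or-nothing match for nominal ones, and attribute weights supplied by hand (how to choose the weights is the hard part and is not addressed here). The income values and weights are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def attribute_distance(a, b):
    """Numeric: absolute difference (assumes normalized values).
    Nominal: 0 for an exact match, 1 otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    return 0.0 if a == b else 1.0

def weighted_distance(x, y, weights):
    """Weighted sum of per-attribute distances (city-block style)."""
    return sum(w * attribute_distance(a, b)
               for a, b, w in zip(x, y, weights))

# After scaling, a 10-unit income gap is tiny relative to the range.
incomes = min_max_normalize([20000, 20010, 100000])
distance = weighted_distance((0.5, "hot"), (0.0, "mild"), (1.0, 2.0))
```

The all-or-nothing match treats hot vs. mild the same as hot vs. cool; handling ordered nominal values more gracefully would need a mapping from values to positions, which this sketch leaves out.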

25
Further Wrinkles
  • May not need to save all instances
  • Very normal instances may not all need to be
    saved
  • Some approaches actually do some generalization

26
But …
  • Not really a structural pattern that can be
    pointed to
  • However, many people in many tasks/domains will
    respect arguments based on previous cases
    (diagnosis and law among them)
  • Book points out that instances + distance metric
    combine to form class boundaries
  • With 2 attributes, these can actually be
    envisioned <see next slide>

27
Figure 3.8 Different ways of partitioning the
instance space: panels (a), (b), (c), (d).
28
Clustering
  • Clusters may be able to be represented
    graphically
  • If dimensionality is high, the best representation
    may only be tabular, showing which instances are
    in which clusters
  • Show Weka: run EM on njcrimenominal, then
    visualize the results
  • Some algorithms associate instances with
    clusters probabilistically: for every instance,
    list the probability of membership in each of the
    clusters
  • Some algorithms produce a hierarchy of clusters,
    which can be visualized using a tree diagram
  • After clustering, the clusters may be used as the
    class for classification

29
Figure 3.9 Different ways of representing
clusters: panels (a), (b), (c), (d).
Table of cluster membership probabilities
(rows are instances, columns are clusters):
       1     2     3
  a   0.4   0.1   0.5
  b   0.1   0.8   0.1
  c   0.3   0.3   0.4
  d   0.1   0.1   0.8
  e   0.4   0.2   0.4
  f   0.1   0.4   0.5
  g   0.7   0.2   0.1
  h   0.5   0.4   0.1
  …
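
The membership probabilities listed for Figure 3.9 can be turned into hard cluster labels, for example when the clusters are later used as the class for classification. The sketch below copies the eight listed rows and picks the most probable cluster for each instance; breaking ties in favor of the lowest-numbered cluster is an assumption, not something the slides specify.

```python
# Rows from the Figure 3.9 table: instance -> P(cluster 1, 2, 3).
MEMBERSHIP = {
    "a": (0.4, 0.1, 0.5),
    "b": (0.1, 0.8, 0.1),
    "c": (0.3, 0.3, 0.4),
    "d": (0.1, 0.1, 0.8),
    "e": (0.4, 0.2, 0.4),
    "f": (0.1, 0.4, 0.5),
    "g": (0.7, 0.2, 0.1),
    "h": (0.5, 0.4, 0.1),
}

def hard_assignment(probabilities):
    """Return the 1-based index of the most probable cluster.
    Ties go to the lowest-numbered cluster (assumed policy)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best + 1

labels = {name: hard_assignment(p) for name, p in MEMBERSHIP.items()}
```

Instance e shows what is lost in the conversion: it belongs to clusters 1 and 3 with equal probability, but a hard label has to pick one.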
30
End Chapter 3