# Data Mining Output: Knowledge Representation
1
Data Mining Output: Knowledge Representation
• Chapter 3

2
Representing Structural Patterns
• There are many different ways of representing
patterns
• Two were covered in Chapter 1: decision trees and
classification rules
• Learned pattern is a form of knowledge
representation (even if the knowledge does not
seem very impressive)

3
Decision Trees
• Make decisions by following branches down the
tree until a leaf is found
• Classification based on contents of leaf
• Non-leaf nodes usually involve testing a single
attribute
• Usually the test branches on the different values
of a nominal attribute, or on ranges of a numeric
attribute (most commonly a two-way split: > some
value and < the same value)
• Less commonly, tests compare two attribute values,
or some function of multiple attributes
• Once an attribute has been tested, it is common for
it not to be tested again lower down the same branch
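As a concrete illustration of the branch-following idea, here is a minimal Python sketch. The dict-based node layout and the tiny weather tree are invented for illustration; they are not the book's or Weka's internal representation.

```python
def classify(node, instance):
    """Follow branches down the tree until a leaf is found."""
    while "leaf" not in node:                  # non-leaf: test one attribute
        value = instance[node["attribute"]]
        node = node["branches"][value]         # take the matching branch
    return node["leaf"]                        # classification = leaf contents

# Toy tree: test 'outlook'; 'sunny' leads to a further test on 'humidity'.
# Note 'outlook' is never tested again lower down the sunny branch.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": {"leaf": "no"},
                               "normal": {"leaf": "yes"}}},
        "overcast": {"leaf": "yes"},
        "rainy": {"leaf": "no"},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```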

4
Decision Trees
• Missing Values
• May be treated as another possible value of a
nominal attribute, if missing data may mean
something
• May follow the most popular branch when a value is
missing from a test instance
• More complicated approach: rather than going
all-or-nothing, split the test instance in
proportion to the popularity of the branches in the
training data; recombination at the end uses a vote
based on the weights (see the sketch below)
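A minimal sketch of the fractional-instance idea, assuming branch popularity is supplied as training-data counts stored at each node; the node format is invented for illustration.

```python
def classify_weighted(node, instance, weight=1.0, votes=None):
    """Split an instance with a missing value across all branches in
    proportion to branch popularity; recombine leaf votes by weight."""
    if votes is None:
        votes = {}
    if "leaf" in node:
        votes[node["leaf"]] = votes.get(node["leaf"], 0.0) + weight
        return votes
    value = instance.get(node["attribute"])
    if value is None:                          # missing: go down every branch
        total = sum(node["counts"].values())
        for v, child in node["branches"].items():
            classify_weighted(child, instance,
                              weight * node["counts"][v] / total, votes)
    else:
        classify_weighted(node["branches"][value], instance, weight, votes)
    return votes

tree = {"attribute": "outlook",
        "counts": {"sunny": 5, "overcast": 4, "rainy": 5},
        "branches": {"sunny": {"leaf": "no"},
                     "overcast": {"leaf": "yes"},
                     "rainy": {"leaf": "yes"}}}

votes = classify_weighted(tree, {"outlook": None})
print(max(votes, key=votes.get), votes)        # weighted vote picks the class
```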

5
Classification Rules
• Popular alternative to decision trees
• LHS / antecedent / precondition: tests to
determine if the rule is applicable
• Tests are usually ANDed together
• Could be a general logical condition (AND/OR/NOT),
but learning such rules is MUCH less constrained
• RHS / consequent / conclusion: the answer, usually
the class (but could be a probability
distribution)
• Rules with the same conclusion essentially
represent an OR
• Rules may be an ordered set, or independent
• If independent, a policy may need to be established
for when more than one rule matches (a conflict
resolution strategy) or when no rule matches, as in
the sketch below
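To make the two regimes concrete, here is a small sketch. It assumes each rule is a set of (attribute, value) tests ANDed together plus a conclusion; the rules themselves and the majority-vote conflict-resolution policy are invented for illustration.

```python
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": "TRUE"}, "no"),
    ({}, "yes"),                       # empty antecedent = default rule
]

def matches(antecedent, instance):
    return all(instance.get(a) == v for a, v in antecedent.items())

def classify_ordered(rules, instance):
    for antecedent, conclusion in rules:
        if matches(antecedent, instance):
            return conclusion          # ordered set: first applicable rule fires

def classify_independent(rules, instance, default="yes"):
    fired = [c for a, c in rules if a and matches(a, instance)]
    if not fired:
        return default                 # policy needed when no rule matches...
    return max(set(fired), key=fired.count)   # ...and when several match

inst = {"outlook": "sunny", "humidity": "high", "windy": "FALSE"}
print(classify_ordered(rules, inst))       # -> no
print(classify_independent(rules, inst))   # -> no
```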

6
Rules / Trees
• Rules can easily be created from a tree, but the
result is not the simplest possible set of rules
(see the sketch below)
• Transforming rules into a tree is not
straightforward (see the replicated subtree problem,
next two slides)
• In many cases rules are more compact than trees,
particularly if a default rule is possible
• Rules may appear to be independent nuggets of
knowledge (and hence less complicated than trees),
but if the rules are an ordered set, then they are
much more complicated than they appear
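The easy direction, tree to rules, reads one rule off each root-to-leaf path. A few-line sketch, using the same toy node format as the earlier tree sketches:

```python
def tree_to_rules(node, tests=()):
    """Read one rule off each root-to-leaf path of a decision tree."""
    if "leaf" in node:
        return [(dict(tests), node["leaf"])]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, tests + ((node["attribute"], value),))
    return rules

tree = {"attribute": "a", "branches": {
            "yes": {"attribute": "b",
                    "branches": {"yes": {"leaf": "x"}, "no": {"leaf": "y"}}},
            "no": {"leaf": "y"}}}

for antecedent, conclusion in tree_to_rules(tree):
    print(antecedent, "=>", conclusion)
# The rules produced are correct but not the simplest possible set:
# here the two 'y' rules could be merged into a single default rule.
```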

7
Figure 3.1 Decision tree for a simple
disjunction.
If a and b then x
If c and d then x
8
Figure 3.3 Decision tree with a replicated
subtree.
If x = 1 and y = 1 then class = a
If z = 1 and w = 1 then class = a
Otherwise class = b
Each gray triangle actually contains the whole
gray subtree below
9
Association Rules
• Association rules are not intended to be used
together as a set; in fact, the value is in the
knowledge itself, with probably no automatic use of
the rules
• Large numbers of possible rules

10
Association Rule Evaluation
• Coverage: the number of instances for which the
rule predicts correctly; also called support
• Accuracy: the proportion of instances to which the
rule applies that it predicts correctly; also
called confidence
• Coverage is sometimes expressed as a percent of the
total instances
• Usually methods or users specify minimum coverage
and accuracy for the rules to be generated
• Some possible rules imply others; present only the
most strongly supported (a worked example follows)
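As a worked example, here is the coverage and accuracy computation for one rule, temperature=cool ==> humidity=normal (rule 2 on the next slide), assuming the standard 14-instance nominal weather data; only the two relevant attributes are listed.

```python
data = [  # (temperature, humidity) pairs only, enough for this rule
    ("hot", "high"), ("hot", "high"), ("hot", "high"), ("mild", "high"),
    ("cool", "normal"), ("cool", "normal"), ("cool", "normal"),
    ("mild", "high"), ("cool", "normal"), ("mild", "normal"),
    ("mild", "normal"), ("mild", "high"), ("hot", "normal"), ("mild", "high"),
]

antecedent = [r for r in data if r[0] == "cool"]       # rule applies here
both = [r for r in antecedent if r[1] == "normal"]     # ...and is correct here

support = len(both)                        # instances predicted correctly
confidence = len(both) / len(antecedent)   # proportion correct where it applies
print(support, f"{support / len(data):.0%}", confidence)   # -> 4 29% 1.0
```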

11
Example: My Weather, Apriori Algorithm
• Apriori
• Minimum support: 0.15
• Minimum metric <confidence>: 0.9
• Number of cycles performed: 17
• Best rules found:
• 1. outlook=rainy 5 ==> play=no 5 conf:(1)
• 2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
• 3. temperature=hot windy=FALSE 3 ==> play=no 3 conf:(1)
• 4. temperature=hot play=no 3 ==> windy=FALSE 3 conf:(1)
• 5. outlook=rainy windy=FALSE 3 ==> play=no 3 conf:(1)
• 6. outlook=rainy humidity=normal 3 ==> play=no 3 conf:(1)
• 7. outlook=rainy temperature=mild 3 ==> play=no 3 conf:(1)
• 8. temperature=mild play=no 3 ==> outlook=rainy 3 conf:(1)
• 9. temperature=hot humidity=high windy=FALSE 2 ==> play=no 2 conf:(1)
• 10. temperature=hot humidity=high play=no 2 ==> windy=FALSE 2 conf:(1)

12
Rules with Exceptions
• Skip

13
Rules involving Relations
• More than the values of individual attributes may
be important
• See book example on next slide

14
Figure 3.6 The shapes problem.
15
More Complicated: Winston's Blocks World
• House = 3-sided block + 4-sided block, AND the
3-sided block is on top of the 4-sided block
• Solutions frequently involve learning rules that
include variables/parameters
• E.g. 3sided(block1) AND 4sided(block2) AND
ontopof(block1, block2) -> house
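A minimal sketch of checking such a relational rule against a scene. The predicates are represented as plain data (block side counts and an on-top-of relation); the scene encoding is invented for illustration.

```python
from itertools import permutations

def is_house(blocks, on_top_of):
    """blocks: name -> number of sides; on_top_of: set of (upper, lower).
    True if some 3-sided block sits on top of some 4-sided block."""
    return any(blocks[b1] == 3 and blocks[b2] == 4 and (b1, b2) in on_top_of
               for b1, b2 in permutations(blocks, 2))

scene = {"block1": 3, "block2": 4}
print(is_house(scene, {("block1", "block2")}))   # -> True: a 'house'
print(is_house(scene, {("block2", "block1")}))   # -> False: wrong way up
```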

16
Easier and Sometimes Useful
• Introduce new attributes during data preparation
• New attribute represents relationship
• E.g. for the standing / lying task, could
introduce a new boolean attribute width_greater
(is the width greater than the height?)
• which would be filled in for each instance
during data prep
• E.g. in numeric weather, could introduce
WindChill based on calculations from
temperature and wind speed (if numeric) or Heat
Index based on temperature and humidity
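A sketch of this kind of data-prep step for the standing / lying task; the field names and instances here are hypothetical.

```python
instances = [
    {"width": 2, "height": 5, "class": "standing"},
    {"width": 6, "height": 2, "class": "lying"},
]

# New attribute representing a relationship, filled in per instance.
for inst in instances:
    inst["width_greater"] = inst["width"] > inst["height"]

print(instances)
```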

17
Numeric Prediction
• Standard for comparison for numeric prediction is
the statistical technique of regression
• E.g. for the CPU performance data the regression
equation below was derived
PRP = -56.1
      + 0.049 MYCT
      + 0.015 MMIN
      + 0.006 MMAX
      + 0.630 CACH
      - 0.270 CHMIN
      + 1.46 CHMAX
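Written out as a function, the equation can be applied directly. The attribute names follow the book's CPU data; the input values below are illustrative, not a row from the actual dataset.

```python
def predicted_prp(myct, mmin, mmax, cach, chmin, chmax):
    """Linear regression prediction of relative performance (PRP)."""
    return (-56.1
            + 0.049 * myct
            + 0.015 * mmin
            + 0.006 * mmax
            + 0.630 * cach
            - 0.270 * chmin
            + 1.46 * chmax)

# Hypothetical machine: 125ns cycle time, 256KB-6000KB memory, 16KB cache.
print(round(predicted_prp(125, 256, 6000, 16, 1, 8), 1))   # -> 11.4
```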

18
Trees for Numeric Prediction
• Tree branches as in a decision tree (may be based
on ranges of attributes)
• Regression Tree: leaf nodes contain the average of
the training set values that the leaf applies to
• Model Tree: leaf nodes contain regression
equations for the instances that the leaf applies
to
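To make the two leaf types concrete, a minimal sketch with invented training values and coefficients (not the book's Figure 3.7 models):

```python
def regression_tree_leaf(train_values):
    """Regression-tree leaf: stores the mean of its training values."""
    mean = sum(train_values) / len(train_values)
    return lambda inst: mean                 # every instance gets the average

def model_tree_leaf(intercept, slope, attr):
    """Model-tree leaf: stores a linear model over the instance's attributes."""
    return lambda inst: intercept + slope * inst[attr]

leaf_a = regression_tree_leaf([19, 22, 27, 30])   # mean = 24.5
leaf_b = model_tree_leaf(10.0, 2.5, "CACH")       # illustrative coefficients

inst = {"CACH": 16}
print(leaf_a(inst))   # 24.5 regardless of the instance
print(leaf_b(inst))   # 10.0 + 2.5 * 16 = 50.0
```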

19
Figure 3.7(b) Models for the CPU performance
data: regression tree.
20
Figure 3.7(c) Models for the CPU performance
data: model tree.
21
Instance Based Representation
• Concept not really represented (except via
examples)
• Real World Example: some radio stations don't
define what they play in words; they play promos
basically saying "WXXX music is <songs>"
• Training examples are merely stored (kind of like
rote learning)
• Answers are given by finding the most similar
training example(s) to the test instance at testing
time
• Has been called "lazy" learning: no work until
testing time

22
Instance Based: Finding the Most Similar Example
• Nearest Neighbor: each new instance is compared
to all stored instances, with a distance
calculated over the attributes for each instance
• Class of the nearest-neighbor instance is used as
the prediction <see next slide and come back>
• OR: K nearest neighbors vote, or weighted vote
• Combination of distances: city block or
Euclidean (as the crow flies); see the sketch below
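A sketch of k-nearest-neighbor prediction with both distance combinations, assuming purely numeric attributes; the training data is a toy set invented for illustration. (Nominal attributes and weighting are discussed two slides on.)

```python
from math import sqrt
from collections import Counter

def distance(a, b, metric="euclidean"):
    """Combine per-attribute distances: city block or Euclidean."""
    diffs = [x - y for x, y in zip(a, b)]
    if metric == "cityblock":
        return sum(abs(d) for d in diffs)
    return sqrt(sum(d * d for d in diffs))

def knn_predict(train, test_instance, k=3, metric="euclidean"):
    """Compare the test instance to all stored instances; k neighbors vote."""
    nearest = sorted(train, key=lambda ex: distance(ex[0], test_instance, metric))
    classes = [cls for _, cls in nearest[:k]]
    return Counter(classes).most_common(1)[0][0]

train = [((1.0, 1.0), "x"), ((1.2, 0.8), "x"), ((5.0, 5.0), "y"),
         ((5.2, 4.9), "y"), ((0.9, 1.3), "x")]
print(knn_predict(train, (1.1, 1.0), k=3))   # -> x
```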

23
Nearest Neighbor
[Scatter diagram: training instances labeled x, y,
and z surround a test instance T; T is assigned the
class of its nearest neighbor(s).]

24
• Distance/similarity function must deal with
binaries/nominals, usually by an all-or-nothing
match; but mild should be a better match to hot
than cool is!
• Distance/similarity function is simpler if data
is normalized in advance. E.g. a $10 difference in
household income is not significant, while a 1.0
difference in GPA is big
• Distance/similarity function should weight
different attributes differently; the key task is
determining those weights
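A sketch of the normalization and weighting points above, with hand-picked weights (learning good weights is the key task noted above); the income/GPA values are invented for illustration.

```python
def normalize(column):
    """Min-max normalization so attributes are on comparable 0-1 scales."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def weighted_distance(a, b, weights):
    """Euclidean distance with a weight per attribute."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5

incomes = [30000, 30010, 90000]   # a $10 gap nearly vanishes after scaling...
gpas = [2.0, 3.0, 4.0]            # ...while a 1.0 GPA gap stays large
norm = list(zip(normalize(incomes), normalize(gpas)))

print(weighted_distance(norm[0], norm[1], weights=(1.0, 1.0)))  # ~0.5
print(weighted_distance(norm[1], norm[2], weights=(1.0, 1.0)))  # ~1.1
```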

25
Further Wrinkles
• May not need to save all instances
• Very normal instances may not all need to be
saved
• Some approaches actually do some generalization

26
But
• Not really a structural pattern that can be
pointed to
• However, many people in many tasks/domains will
respect arguments based on previous cases
(diagnosis and law among them)
• Book points out that instances + distance metric
combine to form class boundaries
• With 2 attributes, these can actually be
envisioned <see next slide>

27
Figure 3.8 Different ways of partitioning the
instance space.
[Four panels, (a)-(d), each showing a different
partitioning of the instance space.]
28
Clustering
• Clusters may be able to be represented
graphically
• If dimensionality is high, the best representation
may only be tabular, showing which instances are
in which clusters
• Show Weka: run njcrimenominal with EM and then do
visualization of results
• Some algorithms associate instances with
clusters probabilistically: for every instance,
list the probability of membership in each of the
clusters
• Some algorithms produce a hierarchy of clusters
and these can be visualized using a tree diagram
• After clustering, clusters may be used as class
for classification

29
Figure 3.9 Different ways of representing
clusters.
(a)
(b)
      1    2    3
  a  0.4  0.1  0.5
  b  0.1  0.8  0.1
  c  0.3  0.3  0.4
  d  0.1  0.1  0.8
  e  0.4  0.2  0.4
  f  0.1  0.4  0.5
  g  0.7  0.2  0.1
  h  0.5  0.4  0.1
(c)
(d)
30
End Chapter 3