1
Mining Association Rules with Rough Sets and
Large Itemsets - A Comparative Study
  • Daniel Delic, Hans-J. Lenz, Mattis Neiling
  • Free University of Berlin
  • Institute of Applied Computer Science
  • Garystr. 21, D-14195 Berlin, Germany

2
Two different methods for the extraction of
association rules
  • Large itemset method (e.g. Apriori)
  • Rough set method

1 INTRODUCTION
3
  • Introduction
  • Large Itemset Method
  • Rough Set Method
  • Comparison of the Procedures
  • Hybrid Procedure Apriori+
  • Summary
  • Outlook
  • References

4
LARGE ITEMSET METHOD
2 LARGE ITEMSET METHOD
5
  • Type of analyzable data
  • "Market basket data" → attributes with Boolean domains
  • Stored in a table → each row represents a market basket

6
  • Large k-itemset generation with Apriori (a minimal sketch follows below)
  • Minimum support: 40%
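The authors' implementation is not part of this transcript; the following is a minimal Python sketch (names and structure are illustrative only) of the large-itemset loop: join large (k-1)-itemsets into candidate k-itemsets, prune candidates that have a non-large subset, and keep the candidates that reach the minimum support.

from itertools import combinations

def apriori(baskets, min_support):
    """baskets: list of item sets; min_support: fraction of baskets, e.g. 0.4."""
    n = len(baskets)

    def sup(itemset):
        # Fraction of baskets containing every item of the itemset
        return sum(itemset <= basket for basket in baskets) / n

    items = {item for basket in baskets for item in basket}
    current = [fs for fs in (frozenset([i]) for i in items) if sup(fs) >= min_support]
    large = {1: current}
    k = 2
    while current:
        # Join step: unions of two large (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Keep the candidates that reach the minimum support
        current = [c for c in candidates if sup(c) >= min_support]
        if current:
            large[k] = current
        k += 1
    return large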

7
8
Step 3
  • Large 2-Itemsets
  • {Spaghetti, Tomato Sauce}
  • {Spaghetti, Bread}
  • {Tomato Sauce, Bread}
  • Candidate 3-Itemsets
  • {Spaghetti, Tomato Sauce, Bread} → support = 1 (20%)
  • Large 3-Itemsets
  • none, since 20% is below the 40% minimum support (see the example run below)
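The example transaction table itself is not reproduced in this transcript. The five baskets below are hypothetical, chosen only so that the counts match the slides: each of the three 2-itemsets reaches the 40% minimum support, while the candidate 3-itemset occurs in a single basket (20%). Reusing the apriori sketch from above:

baskets = [
    {"Spaghetti", "Tomato Sauce", "Bread"},   # the only basket with all three items
    {"Spaghetti", "Tomato Sauce"},
    {"Spaghetti", "Bread"},
    {"Tomato Sauce", "Bread"},
    {"Milk"},                                 # filler basket, not large
]
large = apriori(baskets, min_support=0.4)
print(large[2])      # the three large 2-itemsets listed above
print(large.get(3))  # None: {Spaghetti, Tomato Sauce, Bread} has support 1/5 = 20% < 40%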

9
10
ROUGH SET METHOD
11
  • Type of analyzable data
  • Attributes which can have more than two values
  • Predefined set of condition attributes and decision attribute(s)
  • Stored in a table → each row contains the values of the predefined attributes

3 ROUGH SET METHOD
12
Deriving association rules with rough sets
Step 1
Creating partitions over U: the universe U is divided into subsets (equivalence classes) induced by equivalence relations
13
Examples of equivalence relations:
R1 = {(u, v) | u and v have the same temperature}
R2 = {(u, v) | u and v have the same blood pressure}
R3 = {(u, v) | u and v have the same temperature and blood pressure}
R4 = {(u, v) | u and v have the same heart problem}
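As an illustration of Step 1, here is a small Python sketch that builds the partitions induced by R3 and R4 for the five persons of the example. Only the classes X1–X3, Y1, Y2 appear on the slides; Ford's temperature and blood pressure values are assumptions made for the sketch.

from collections import defaultdict

persons = {
    # name: (temperature, blood pressure, heart problem)
    "Adams":   ("normal", "low",  "no"),
    "Brown":   ("normal", "low",  "no"),
    "Ford":    ("normal", "high", "yes"),   # condition values assumed, not given on the slides
    "Gill":    ("high",   "high", "no"),
    "Bellows": ("high",   "high", "yes"),
}

def partition(table, positions):
    """Group objects that agree on the attribute values at the given positions."""
    classes = defaultdict(set)
    for name, values in table.items():
        classes[tuple(values[p] for p in positions)].add(name)
    return list(classes.values())

R3_star = partition(persons, (0, 1))   # same temperature and blood pressure
R4_star = partition(persons, (2,))     # same heart problem
print(R3_star)  # [{'Adams', 'Brown'}, {'Ford'}, {'Gill', 'Bellows'}]  = X1, X2, X3
print(R4_star)  # [{'Adams', 'Brown', 'Gill'}, {'Ford', 'Bellows'}]    = Y1, Y2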
14
  • Partition R3*
  • Induced by equivalence relation R3 (based on the condition attributes)
  • R3 = {(u, v) | u and v have the same temperature and blood pressure}
  • R3 → R3* = {X1, X2, X3} with
  • X1 = {Adams, Brown}, X2 = {Ford}, X3 = {Gill, Bellows}

15
Partition R4*
Induced by equivalence relation R4 (based on the decision attribute(s))
R4 = {(u, v) | u and v have the same heart problem}
R4 → R4* = {Y1, Y2} with Y1 = {Adams, Brown, Gill}, Y2 = {Ford, Bellows}
16
Step 2
  • Defining the approximation space
  • Overlapping the partitions created by the equivalence relations
  • Result: 3 distinct regions in the approximation space (computed in the sketch below)
  • Positive region: POS_S(Yj) = ∪{Xi | Xi ⊆ Yj}; for Y1: POS_S(Y1) = X1
  • Boundary region: BND_S(Yj) = ∪{Xi | Xi ∩ Yj ≠ ∅ and Xi ⊄ Yj}; for Y1: BND_S(Y1) = X3
  • Negative region: NEG_S(Yj) = ∪{Xi | Xi ∩ Yj = ∅}; for Y1: NEG_S(Y1) = X2
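A minimal sketch (assumed, not the authors' code) of Step 2, computing the three regions for the decision class Y1 from the condition classes of R3*:

def regions(condition_classes, Yj):
    """Positive, boundary and negative region of decision class Yj."""
    pos = set().union(*(Xi for Xi in condition_classes if Xi <= Yj))
    bnd = set().union(*(Xi for Xi in condition_classes if Xi & Yj and not Xi <= Yj))
    neg = set().union(*(Xi for Xi in condition_classes if not Xi & Yj))
    return pos, bnd, neg

X1, X2, X3 = {"Adams", "Brown"}, {"Ford"}, {"Gill", "Bellows"}
Y1 = {"Adams", "Brown", "Gill"}
print(regions([X1, X2, X3], Y1))
# ({'Adams', 'Brown'}, {'Gill', 'Bellows'}, {'Ford'})  i.e. POS = X1, BND = X3, NEG = X2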

17
  • Rules from the positive region (POS_S(Yj) = ∪{Xi | Xi ⊆ Yj})
  • Example for POS_S(Y1):
  • X1 = {Adams, Brown} ⊆ Y1 = {Adams, Brown, Gill}
  • ⇒ Clear rule (confidence = 100%, support = 40%):
  • If temperature = normal and blood pressure = low, then heart problem = no

18
  • Rules from the boundary region (BND_S(Yj) = ∪{Xi | Xi ∩ Yj ≠ ∅ and Xi ⊄ Yj})
  • Example for BND_S(Y1):
  • X3 = {Gill, Bellows}, Y1 = {Adams, Brown, Gill}: X3 ∩ Y1 ≠ ∅
  • ⇒ Possible rule (confidence = 50%, support = 20%):
  • If temperature = high and blood pressure = high, then heart problem = no
  • ⇒ confidence c = |Xi ∩ Yj| / |Xi| = |X3 ∩ Y1| / |X3| = 1/2 = 0.5 = 50% (see the snippet below)
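Confidence and support of the two example rules can be checked directly from the sets X1, X3, Y1 defined in the region sketch above (the 5 in the support denominator is the size of the universe U):

def confidence(Xi, Yj):
    return len(Xi & Yj) / len(Xi)

def support(Xi, Yj, universe_size=5):
    return len(Xi & Yj) / universe_size

print(confidence(X1, Y1), support(X1, Y1))  # 1.0, 0.4  -> clear rule (100%, 40%)
print(confidence(X3, Y1), support(X3, Y1))  # 0.5, 0.2  -> possible rule (50%, 20%)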

19
  • Negative region (NEG_S(Yj) = ∪{Xi | Xi ∩ Yj = ∅})
  • Example for NEG_S(Y1):
  • X2 = {Ford}, Y1 = {Adams, Brown, Gill}: X2 ∩ Y1 = ∅
  • ⇒ Since X2 ∩ Y1 = ∅, no rule is derivable from the negative region

20
Reducts ⇒ simplification of rules by removal of unnecessary attributes (a sketch of the check follows below)

Original rule: If temperature = normal and blood pressure = low, then heart problem = no
Simplified (more precise) rule: If blood pressure = low, then heart problem = no
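A sketch (assumed, not the authors' algorithm) of the idea behind this reduct: an attribute can be dropped from the rule's condition as long as everyone matching the shorter condition still belongs to the same decision class. It reuses the hypothetical persons table from the partition sketch above.

def still_certain(table, condition, decision):
    """condition/decision map attribute positions to required values."""
    matching = {n for n, v in table.items()
                if all(v[p] == val for p, val in condition.items())}
    target = {n for n, v in table.items()
              if all(v[p] == val for p, val in decision.items())}
    return bool(matching) and matching <= target

# Positions: 0 = temperature, 1 = blood pressure, 2 = heart problem
print(still_certain(persons, {0: "normal", 1: "low"}, {2: "no"}))  # True: original rule
print(still_certain(persons, {1: "low"}, {2: "no"}))               # True: temperature is unnecessary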
21
COMPARISON OF THE PROCEDURES
22
  • Prerequisites for comparison of both methods
  • Modification of the rough set method RS-Rules ⇒ no fixed decision attribute required (RS-Rules+)
  • Compatible data structure ⇒ bitmaps (a sketch of the transformation follows below)
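A sketch (assumed) of the bitmap transformation: every (attribute, value) combination of the multi-valued table becomes one Boolean "bitmap attribute", so that the market-basket style Apriori and the rough set method can run on the same data. It reuses the persons table from above.

def to_bitmap(table, attribute_names):
    """One Boolean column per (attribute, value) combination occurring in the table."""
    columns = sorted({(a, v) for values in table.values()
                      for a, v in zip(attribute_names, values)})
    bitmap = {}
    for name, values in table.items():
        present = set(zip(attribute_names, values))
        bitmap[name] = {f"{a}={v}": int((a, v) in present) for a, v in columns}
    return bitmap

rows = to_bitmap(persons, ["temperature", "blood pressure", "heart problem"])
print(rows["Adams"])
# {'blood pressure=high': 0, 'blood pressure=low': 1, 'heart problem=no': 1, ...}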

4 DATA TRANSFORMATION
23
  • Benchmark data sets¹
  • Car Evaluation Database: 1728 tuples, 25 bitmap attributes
  • Mushroom Database: 8416 tuples, 12 original attributes selected, 68 bitmap attributes
  • Adult: 32561 tuples, 12 original attributes selected, 61 bitmap attributes
  • Results
  • largely similar results for all examined tables
  • exception: reducts
  • ⇒ quality of the rough set rules is better (more precise rules)

¹ UCI Repository of Machine Learning Databases and Domain Theories (URL: ftp.ics.uci.edu/pub/machine-learning-databases)
² Algorithms written in Visual Basic 6.0, executed on a Win98 PC with an AMD K6-2/400 processor
5 COMPARISON OF THE PROCEDURES
24
HYBRID PROCEDURE Apriori+
6 HYBRID PROCEDURE Apriori+
25
  • Hybrid method Apriori+
  • based on Apriori
  • capable of extracting reducts
  • capable of deriving rules based on a predefined decision attribute
  • Comparison results (Apriori+ compared to RS-Rules+)
  • identical rules

26
SUMMARY
27
  • creation of a compatible data type for both methods
  • comparison of both methods
  • RS-Rules+ derived rules that were more precise (due to reducts) than those derived by Apriori
  • Apriori+ derived the same rules as RS-Rules+
  • computing times in favor of the large itemset method
  • Conclusion: a combination of both original methods is the best solution

7 CONCLUSION
28
OUTLOOK
29
  • More interesting capabilities of rough sets
  • Analysing dependencies between rules
  • Analysing the impact of one particular condition attribute on the decision attribute(s)
  • Idea
  • Enhancing the data mining capabilities of Apriori+ with these further rough set features
  • ⇒ Result: a powerful and efficient data mining application (?)

8 OUTLOOK
30
References
  • Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. In VLDB '94, 487-499. Morgan Kaufmann.
  • Düntsch, I. and Gediga, G. (1999). Rough Set Data Analysis.
  • Munakata, T. (1998). Rough Sets. In Fundamentals of the New Artificial Intelligence, 140-182. New York: Springer-Verlag.