Discovering Fuzzy Classification Rules using Genetic Network Programming

Provided by: karlat2

1
Discovering Fuzzy Classification Rules using
Genetic Network Programming
Karla Taboada
2
Contents
  • What is Data Mining?
  • Why Data Mining?
  • Data mining tasks
  • Genetic Network Programming (GNP)
  • GNP for association rule mining
  • GNP-Fuzzy data mining method for classification
  • Simulation results
  • Conclusions

3
  • Introduction to Data Mining

4
Why Mine Data?
  • Lots of data is being collected and warehoused
      • Web data, e-commerce
      • Purchases at department/grocery stores
      • Bank/credit card transactions
  • Computers have become cheaper and more powerful
  • Competitive pressure is strong
      • Provide better, customized services for an edge
        (e.g. in Customer Relationship Management)

5
Why Mine Data?
  • Data is collected and stored at enormous speeds
    (GB/hour)
      • Remote sensors on a satellite
      • Telescopes scanning the skies
      • Microarrays generating gene expression data
      • Scientific simulations generating terabytes of
        data
  • Traditional techniques are infeasible for such raw data.

6
Knowledge Discovery in Databases
  • The abundance of data, coupled with the need for
    powerful data analysis tools, has been described
    as a data-rich but information-poor situation.

How do you explore millions of records, tens or
hundreds of fields, and find patterns?
7
Knowledge Discovery in Databases
  • Knowledge Discovery in Databases is the
    non-trivial process of identifying valid, novel,
    potentially useful, and ultimately understandable
    patterns in data.

8
What is Data Mining?
  • Process of semi-automatically analyzing large
    databases to find patterns that are:
      • Valid: they hold on new data with some certainty.
      • Novel: they are non-obvious to the system.
      • Useful: it should be possible to act on the item.
      • Understandable: humans should be able to
        interpret the pattern.

9
Why Data Mining
  • Credit ratings/targeted marketing
      • Given a database of 100,000 names, which persons
        are the least likely to default on their credit
        cards?
      • Identify likely responders to sales promotions
  • Fraud detection
      • Which types of transactions are likely to be
        fraudulent, given the demographics and
        transactional history of a particular customer?
  • Customer relationship management
      • Which of my customers are likely to be the most
        loyal, and which are most likely to leave for a
        competitor?

Data Mining helps extract such information
10
Data Mining Tasks
  • Classification
  • Clustering
  • Association Rule Discovery
  • Sequential Pattern Discovery
  • Regression
  • Deviation Detection

11
(No Transcript)
12
Classification Application
  • Direct Marketing
      • Goal: reduce the cost of mailing by targeting a set
        of consumers likely to buy a new cell-phone
        product.
      • Approach:
          • Use the data for a similar product introduced
            before.
          • We know which customers decided to buy and which
            decided otherwise. This buy/don't-buy decision
            forms the class attribute.
          • Collect various demographic, lifestyle, and
            company-interaction related information about all
            such customers (type of business, where they
            stay, how much they earn, etc.).
          • Use this information as input attributes to learn
            a classifier model.

13
Market Basket Example
Association Rule Mining
  • Where should detergents be placed in the store to
    maximize their sales?
  • Are window cleaning products purchased when
    detergents and orange juice are bought together?
  • Is soda typically purchased with bananas? Does
    the brand of soda make a difference?
  • How are the demographics of the neighborhood
    affecting what customers are buying?
14
Association Rule Mining
  • Searches for interesting relationships among
    items in a given data set.
  • Support and confidence are the two most important
    quality measures for evaluating the
    interestingness of an association rule.
  • An association rule is an implication of the
    form $X \Rightarrow Y$, where $I$ is the set of all
    items, $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$.

Example: when a customer buys bread and butter,
they also buy milk 85% of the time (a confidence of 85%).
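The following sketch shows how support and confidence can be computed for a rule X ⇒ Y over a list of transactions. The baskets are made-up toy data and the function names are illustrative. A confidence of 0.85 would correspond to the bread-and-butter example above.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Estimated P(Y | X) = support(X ∪ Y) / support(X)."""
    sup_x = support(x, transactions)
    return support(x | y, transactions) / sup_x if sup_x > 0 else 0.0


# Toy baskets (hypothetical data, not from the presentation).
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk", "eggs"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "cereal"}),
]
x, y = frozenset({"bread", "butter"}), frozenset({"milk"})
print(support(x | y, baskets))    # 0.5: 2 of 4 baskets contain bread, butter and milk
print(confidence(x, y, baskets))  # 2 of the 3 bread-and-butter baskets contain milk
```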

15
Association Rule Mining
Rules discovered:
  • Milk ⇒ Cereal
  • Diaper, Milk ⇒ Beer
Milk and cereal sell together!
Applications: catalog design, store layout,
cross-marketing.
16
Genetic Network Programming (GNP)

17
Genetic Network Programming (GNP)
GNP is an extension of Genetic Algorithms (GA)
and Genetic Programming (GP).
  • The main difference between them is the
    representation of the solution.
  • GA evolves strings as solutions and is mainly
    applied to optimization problems.
  • GP expands the expression ability of GA by using
    tree structures.
  • GNP uses directed graph structures as solutions;
    therefore, GNP can deal with complex problems more
    effectively and efficiently than GA and GP.

[Figure: GNP node types: processing node, judgment node, start node]
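The sketch below makes the graph representation concrete. It models a GNP individual as a list of nodes whose branches point to other nodes by index, so that, unlike a GP tree, cycles and node reuse are possible. The `Node` class, the helper names and the wiring rules are hypothetical simplifications, not the authors' Java implementation. Following Yes-side branches from a processing node yields a chain of judgment nodes, which is how candidate rules are read off the graph.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    kind: str            # "processing" or "judgment"
    attribute: str = ""  # attribute tested by a judgment node, e.g. "A1"
    yes: int = -1        # index of the Yes-side successor (only successor for processing nodes)
    no: int = -1         # index of the No-side successor (judgment nodes only)


def random_individual(n_proc: int, n_judge: int, attributes: List[str],
                      rng: random.Random) -> List[Node]:
    """Random directed graph: indices 0..n_proc-1 are processing nodes, the rest
    are judgment nodes. Branches may point anywhere, so cycles are possible."""
    total = n_proc + n_judge
    nodes: List[Node] = []
    for i in range(total):
        if i < n_proc:
            # a processing node starts a rule, so it connects to a judgment node
            nodes.append(Node("processing", yes=rng.randrange(n_proc, total)))
        else:
            nodes.append(Node("judgment", attribute=rng.choice(attributes),
                              yes=rng.randrange(total), no=rng.randrange(total)))
    return nodes


def yes_chain(nodes: List[Node], proc: int, max_len: int = 5) -> List[str]:
    """Follow Yes-side branches from processing node `proc` and collect the
    attributes of consecutive judgment nodes (a candidate rule antecedent).
    `max_len` guarantees termination even when the graph contains a cycle."""
    chain: List[str] = []
    current = nodes[proc].yes
    while len(chain) < max_len and nodes[current].kind == "judgment":
        chain.append(nodes[current].attribute)
        current = nodes[current].yes
    return chain


rng = random.Random(0)
individual = random_individual(2, 8, ["A1", "A2", "A3", "A4"], rng)
print(yes_chain(individual, 0))
```

In the experiments reported later, an individual has 20 processing nodes and 200 judgment nodes; the tiny sizes here are only for readability.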
18
Reproduction
(Roulette, tournament and elite selection can be
used in GNP.)
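The slide only names the selection schemes. Here is a generic sketch of all three, assuming that higher fitness is better. How fitness is defined for rule mining is not given on this slide, and the function names are illustrative.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def elite_selection(pop: Sequence[T], fit: Sequence[float], n: int) -> List[T]:
    """Copy the n fittest individuals unchanged into the next generation."""
    order = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)
    return [pop[i] for i in order[:n]]


def tournament_selection(pop: Sequence[T], fit: Sequence[float], size: int,
                         rng: random.Random) -> T:
    """Pick `size` distinct individuals at random and return the fittest."""
    contestants = rng.sample(range(len(pop)), size)
    return pop[max(contestants, key=lambda i: fit[i])]


def roulette_selection(pop: Sequence[T], fit: Sequence[float],
                       rng: random.Random) -> T:
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    return rng.choices(pop, weights=fit, k=1)[0]


pop = ["g1", "g2", "g3", "g4"]          # hypothetical individuals
fit = [1.0, 4.0, 2.0, 3.0]
rng = random.Random(0)
print(elite_selection(pop, fit, 2))     # the two fittest: g2 and g4
print(tournament_selection(pop, fit, 2, rng))
print(roulette_selection(pop, fit, rng))
```

Elite selection preserves the best graphs. Tournament and roulette selection keep some diversity by letting weaker individuals reproduce occasionally.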
19
(No Transcript)
20
(No Transcript)
21
GNP for class association rule mining
22
Objective
Propose a data mining method for dealing with
continuous values based on Genetic Network
Programming (GNP) and Fuzzy Set Theory.
Fuzzy Classification Rules, e.g.:
A1_High ⇒ Z1
A4_Med ∧ A7_Low ⇒ Z2
23
Empowering classical association rules
Why Fuzzy Association Rules?
The original motivation comes from handling
continuous attributes: discretizing the
continuous values into crisp intervals leads
to under- or overestimating values near the
interval borders. This is called the sharp
boundary problem.
  • Fuzzy sets can help overcome this problem by
    allowing partial degrees of membership between
    0 and 1, not only 1 and 0.
  • Fuzzy association rules have been shown to be a
    very useful tool because the mined rules are
    expressed in linguistic terms, which are more
    natural and understandable for human beings.

Fuzzy Sets Theory
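The sharp boundary problem can be seen with two ages on either side of a crisp cut-off. The cut-off of 30 and the fuzzy ramp from 20 to 40 are hypothetical values chosen only for illustration.

```python
def crisp_young(age: float) -> float:
    """Crisp interval 'Young = age < 30': membership jumps from 1 to 0 at the border."""
    return 1.0 if age < 30 else 0.0


def fuzzy_young(age: float) -> float:
    """Fuzzy 'Young': fully young up to 20, not young from 40 on, linear in between."""
    if age <= 20:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 20


for age in (29, 31):
    print(age, crisp_young(age), fuzzy_young(age))
# 29 1.0 0.55
# 31 0.0 0.45
```

The crisp version treats ages 29 and 31 as completely different. The fuzzy version gives them similar, partial memberships.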
24
Extraction of Association Rules using GNP
  • GNP examines the attribute values of the database
    using judgment nodes.
  • GNP calculates the quality measures of association
    rules using processing nodes.
  • The connections of judgment nodes are represented
    as association rules.

[Figure: GNP structure for class association rule mining: processing node P1 (N), judgment nodes A1 = 1, A2 = 1, A3 = 1 and A4 = 1 with Yes/No branches, counters a, b, c, d and a(C), b(C), c(C), d(C)]
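The sketch below shows how a processing node's counters might be accumulated for a chain of judgment nodes A1 = 1, ..., A4 = 1. It assumes a common reading of this figure. Under that reading, a, b, c and d are the numbers of records that pass the first one, two, three and four judgment nodes. The counters a(C), ..., d(C) count those among them that belong to class C. So, for example, the rule A1 ∧ A2 ⇒ C has support b(C)/N and confidence b(C)/b. This interpretation, the records and all names are assumptions; the slides do not spell them out.

```python
from typing import Dict, List, Sequence, Tuple


def count_chain(records: Sequence[Dict[str, int]], chain: List[str],
                class_attr: str, target: int) -> Tuple[List[int], List[int]]:
    """counts[k]: records passing the first k+1 judgment nodes (a, b, c, d, ...);
    class_counts[k]: those among them whose class is `target` (a(C), b(C), ...)."""
    counts = [0] * len(chain)
    class_counts = [0] * len(chain)
    for record in records:
        for k, attr in enumerate(chain):
            if record[attr] != 1:
                break                 # No-side: stop following this chain
            counts[k] += 1
            if record[class_attr] == target:
                class_counts[k] += 1
    return counts, class_counts


def rule_measures(n: int, counts: List[int],
                  class_counts: List[int]) -> List[Tuple[float, float]]:
    """(support, confidence) of A1 ∧ ... ∧ A(k+1) ⇒ C for every prefix of the chain."""
    return [(cc / n, cc / c if c else 0.0) for c, cc in zip(counts, class_counts)]


records = [   # hypothetical binary records with class attribute "C"
    {"A1": 1, "A2": 1, "A3": 0, "A4": 1, "C": 1},
    {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "C": 1},
    {"A1": 1, "A2": 0, "A3": 1, "A4": 1, "C": 0},
    {"A1": 0, "A2": 1, "A3": 1, "A4": 1, "C": 1},
]
counts, class_counts = count_chain(records, ["A1", "A2", "A3", "A4"], "C", target=1)
print(counts, class_counts)   # [3, 2, 1, 1] [2, 2, 1, 1]
print(rule_measures(len(records), counts, class_counts)[1])   # (0.5, 1.0): A1 ∧ A2 ⇒ C=1
```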
25
Extraction of Association Rules using GNP
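(C = 0, 1, ..., K)
[Figure: the same GNP structure: processing node P1 (N), judgment nodes A1 = 1, A2 = 1, A3 = 1 and A4 = 1 with Yes/No branches, counters a, b, c, d and a(C), b(C), c(C), d(C)]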
26
Extraction of Fuzzy Classification Rules using GNP
Class Association Rules (C = 0, 1, ..., K)
[Figure: processing nodes P1 and P2 (N), judgment nodes A1_High, A2_Low, A3_Mid and A4_High with Yes/No branches, counters a, b, c, d and a(C), b(C), c(C), d(C)]
27
Fuzzy Classification Rules using GNP
Our proposed model consists of two major phases:
1) generating fuzzy class association rules using
Genetic Network Programming, and 2) building a
classifier model based on the extracted fuzzy
rules.
In the first phase, the task is to extract fuzzy
class association rules from a fuzzy training set
using a GNP-based algorithm. Moreover, the fuzzy
membership functions are evolved by non-uniform
mutation in every generation in order to perform
a more global search in the space of candidate
membership functions and thereby discover new
fuzzy rules.
In the second phase, all of the generated fuzzy
rules in the pool are used to predict the class
of the test set. For each test record, the
classifier computes the average distance between
the record and the rules of each class. Finally,
the class with the smallest distance is assigned
to the test record.
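The slides do not give the mutation formula. A common choice is Michalewicz's non-uniform mutation, sketched below for one membership-function parameter x bounded by [lower, upper]. The shape parameter b = 5, the bounds and all names are assumptions. The step size shrinks as generation t approaches the last generation t_max, so early generations explore widely and later ones fine-tune.

```python
import random


def non_uniform_mutation(x: float, lower: float, upper: float,
                         t: int, t_max: int, rng: random.Random,
                         b: float = 5.0) -> float:
    """Michalewicz-style non-uniform mutation of one parameter in [lower, upper].

    delta(t, y) = y * (1 - r ** ((1 - t / t_max) ** b)), with r ~ U[0, 1);
    the expected step shrinks towards 0 as t approaches t_max.
    """
    r = rng.random()
    shrink = 1.0 - r ** ((1.0 - t / t_max) ** b)
    if rng.random() < 0.5:
        return x + (upper - x) * shrink   # move towards the upper bound
    return x - (x - lower) * shrink       # move towards the lower bound


# Hypothetical use: perturb the peak of a "Medium" membership function
# (say 50 on a 20..80 salary scale) in generation 10 of 100.
rng = random.Random(42)
print(non_uniform_mutation(50.0, 20.0, 80.0, t=10, t_max=100, rng=rng))
```

In a full implementation, the bounds for each breakpoint would be chosen (for example, the neighbouring breakpoints) so that Low, Medium and High stay in order. This sketch does not enforce that.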
28
GNP-Fuzzy DM method
Fuzzy membership functions for handling
continuous attributes
[Figure: sample database; membership functions Low, Medium and High for the Salary attribute and Young, Middle and Old for the Age attribute (membership degree up to 1); database with the fuzzy membership values]
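Here is a sketch of how a crisp record could be turned into fuzzy membership values using three overlapping membership functions per attribute. The breakpoints (Salary 20/50/80, Age 25/45/65) and the function names are hypothetical. In the method described here, these parameters are evolved rather than fixed.

```python
from typing import Dict, Sequence, Tuple


def three_terms(x: float, lo: float, mid: float, hi: float) -> Tuple[float, float, float]:
    """Memberships (low, medium, high) for three overlapping functions:
    'low' is 1 up to lo and falls to 0 at mid, 'medium' is a triangle
    peaking at mid, and 'high' rises from 0 at mid to 1 at hi."""
    low = 1.0 if x <= lo else max(0.0, (mid - x) / (mid - lo))
    high = 1.0 if x >= hi else max(0.0, (x - mid) / (hi - mid))
    if x <= lo or x >= hi:
        medium = 0.0
    elif x <= mid:
        medium = (x - lo) / (mid - lo)
    else:
        medium = (hi - x) / (hi - mid)
    return low, medium, high


def fuzzify(record: Dict[str, float],
            params: Dict[str, Tuple[float, float, float]],
            terms: Dict[str, Sequence[str]]) -> Dict[str, float]:
    """Map each crisp attribute value to one membership value per linguistic term."""
    out: Dict[str, float] = {}
    for attr, value in record.items():
        for term, degree in zip(terms[attr], three_terms(value, *params[attr])):
            out[f"{attr}_{term}"] = degree
    return out


params = {"Salary": (20.0, 50.0, 80.0), "Age": (25.0, 45.0, 65.0)}
terms = {"Salary": ("Low", "Medium", "High"), "Age": ("Young", "Middle", "Old")}
print(fuzzify({"Salary": 35.0, "Age": 45.0}, params, terms))
```

Between two peaks, the memberships of neighbouring terms sum to 1. A salary of 35 is therefore half Low and half Medium.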
29
GNP-Fuzzy DM method
Extraction of Fuzzy Association Rules using GNP
Probability of moving to the Yes-side: fuzzy values
are used as probabilities for the transitions of
judgment nodes.
[Figure: judgment nodes A1_High, A2_Low, A3_Med and A4_High reached from processing node P1, shown for TID 1 (Yes-side probabilities Pb = 0.8, 0.7, 0.75, 0.9) and TID 2 (Pb = 0.7, 0.65, 0.2, 0.7)]
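Below is a minimal simulation of this idea. It assumes that each judgment node moves to its Yes-side with probability equal to the record's membership degree for the node's linguistic term, independently of the other nodes. Under that assumption, the probability of passing a whole chain is the product of the degrees: 0.9 × 0.8 = 0.72 for the hypothetical record below. The record and names are illustrative and are not the TID 1 and TID 2 values from the figure.

```python
import random
from typing import Dict, List


def yes_side_depth(memberships: Dict[str, float], chain: List[str],
                   rng: random.Random) -> int:
    """Walk a chain of judgment nodes; each node is passed on its Yes-side with
    probability equal to the record's membership degree for that node's term.
    Returns how many nodes were passed before the first No-side transition."""
    depth = 0
    for term in chain:
        if rng.random() >= memberships[term]:
            break            # moved to the No-side: the chain ends for this record
        depth += 1
    return depth


record = {"A1_High": 0.9, "A2_Low": 0.8}     # hypothetical membership degrees
chain = ["A1_High", "A2_Low"]
rng = random.Random(7)
trials = 20_000
full = sum(yes_side_depth(record, chain, rng) == len(chain) for _ in range(trials))
print(full / trials)   # close to 0.9 * 0.8 = 0.72
```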
30
GNP-Fuzzy DM method
Extract fuzzy rules through generations
  • Each fuzzy rule is stored with:
      • its χ² value,
      • its support,
      • its fuzzy parameters.
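For a rule X ⇒ Y over N records, the χ² value can be computed from supports alone as $\chi^2 = N(z - xy)^2 / (xy(1-x)(1-y))$, where x = support(X), y = support(Y) and z = support(X ∪ Y). This is the Pearson χ² statistic of the 2 × 2 contingency table of X and Y. With one degree of freedom, the threshold 6.63 in the experimental settings is approximately the critical value at the 1% significance level (6.635). The supports below are hypothetical. For fuzzy rules, the supports would be computed from membership values, which this sketch does not show.

```python
def chi_square(n: int, sup_x: float, sup_y: float, sup_xy: float) -> float:
    """Pearson chi-square of the 2x2 contingency table of X and Y, from supports:
    chi2 = N * (z - x*y)^2 / (x * y * (1 - x) * (1 - y))."""
    denom = sup_x * sup_y * (1.0 - sup_x) * (1.0 - sup_y)
    if denom == 0.0:
        return 0.0   # degenerate table: X or Y is never or always true
    return n * (sup_xy - sup_x * sup_y) ** 2 / denom


CHI2_MIN = 6.63   # threshold from the experimental settings
value = chi_square(n=303, sup_x=0.3, sup_y=0.5, sup_xy=0.25)  # hypothetical supports
print(round(value, 2), value > CHI2_MIN)
```

If X and Y were independent (z = xy), the χ² value would be 0. Larger values indicate a stronger dependence between antecedent and class.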

31
Fuzzy Classification Rules using GNP
Each run of the algorithm discovers fuzzy rules
for a single class; therefore, the algorithm must
run K + 1 times, where K + 1 is the number of
classes (C = 0, 1, ..., K).
[Figure: pool of fuzzy rules for each class in the DB: Pool Class 0, Pool Class 1, Pool Class 2, ..., Pool Class K]
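The next sketch keeps one rule pool per class. It assumes, following the conclusion slide, that a rule already in the pool is replaced when the same rule is found again with a higher χ² value, for example after its membership functions have been mutated. The threshold defaults echo the experimental settings (χ² ≥ 6.63, smallest sup_min 0.01). The data structures and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class StoredRule:
    chi2: float
    support: float
    fuzzy_params: Tuple[float, ...]   # membership-function parameters used by the rule


# One pool per class label C = 0, 1, ..., K; each pool maps an antecedent
# (a set of linguistic terms such as "A1_High") to the best version found so far.
Pools = Dict[int, Dict[FrozenSet[str], StoredRule]]


def update_pool(pools: Pools, cls: int, antecedent: FrozenSet[str],
                rule: StoredRule, chi2_min: float = 6.63,
                sup_min: float = 0.01) -> bool:
    """Store `rule` if it passes the thresholds and is new, or if it beats the
    chi-square value of the same rule already in the pool. Returns True if stored."""
    if rule.chi2 < chi2_min or rule.support < sup_min:
        return False
    pool = pools.setdefault(cls, {})
    current = pool.get(antecedent)
    if current is None or rule.chi2 > current.chi2:
        pool[antecedent] = rule
        return True
    return False


pools: Pools = {}
key = frozenset({"A1_High", "A2_Low"})
print(update_pool(pools, 1, key, StoredRule(10.0, 0.2, (20.0, 50.0, 80.0))))  # True: new rule
print(update_pool(pools, 1, key, StoredRule(8.0, 0.2, (22.0, 50.0, 80.0))))   # False: lower chi2
print(update_pool(pools, 1, key, StoredRule(12.0, 0.2, (25.0, 50.0, 80.0))))  # True: replaces it
```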
32
Fuzzy Classification Rules using GNP
For each test record, the classifier computes the
average distance between the record and the rules
of each class. Finally, the class with the
smallest distance is assigned to the test record.
[Figure: the test set compared against the per-class rule pools (Pool Class 1, Pool Class 2, ..., Pool Class K)]
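The transcript does not give the distance formula, so the sketch below uses a hypothetical distance: the mean of 1 − μ over the linguistic terms in a rule's antecedent, where μ is the test record's membership degree for that term. As described above, the classifier averages this distance over each class's pool and returns the class with the smallest average. The records, pools and names are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Sequence


def rule_distance(memberships: Dict[str, float], antecedent: Sequence[str]) -> float:
    """Hypothetical distance: 0 when the record fully matches every term, 1 when it matches none."""
    return mean(1.0 - memberships[term] for term in antecedent)


def classify(memberships: Dict[str, float],
             pools: Dict[int, List[Sequence[str]]]) -> int:
    """Return the class whose rule pool has the smallest average distance to the record."""
    avg = {cls: mean(rule_distance(memberships, rule) for rule in rules)
           for cls, rules in pools.items() if rules}
    return min(avg, key=avg.get)


test_record = {"A1_High": 0.9, "A2_Low": 0.8, "A3_Med": 0.1}   # hypothetical
pools = {0: [("A3_Med",)], 1: [("A1_High", "A2_Low"), ("A1_High",)]}
print(classify(test_record, pools))   # 1
```

Class 0's only rule is far from the record (distance 0.9), while class 1's rules are close (average about 0.125), so class 1 is assigned.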
33
Fuzzy Classification Rules using GNP
Therefore, the classification of test data d is
determined as follows
34
Fuzzy Classification Rules using GNP
Therefore, the classification of test data d is
determined as follows
35
Experimental results
36
Experimental results
  • We evaluated our proposed method on three
    public-domain data sets from the UCI data set
    repository. The results reported below were
    produced using a 10-fold cross-validation
    procedure.
  • Population size: 120.
  • Number of processing nodes: 20.
  • Number of judgment nodes: 200.
  • Number of generations: 100.
  • χ² threshold: 6.63.
  • sup_min: 0.01, 0.05, 0.1, 0.15, 0.25, 0.3.
  • a_new: 150.
  • rc15/78
  • rm1: 1/3
  • rm2: 1/5
  • All algorithms were coded in Java. Experiments were
    done on a 1.50 GHz Pentium M with 504 MB RAM.

37
Experimental results
Heart stat-log DB 303 records 14 attributes.
38
Experimental results
Heart stat-log DB 303 records 14 attributes.
39
Experimental results
Ionosphere DB 351 records 35 attributes.
40
Experimental results
Ionosphere DB 351 records 35 attributes.
41
Experimental results
CRX DB 351 records 35 attributes.
42
Experimental results
CRX DB 351 records 35 attributes.
43
Experimental results
To evaluate the performance of our proposed method,
we compared it with another evolutionary system
found in the literature, CEFR-MINER [1].
Table 5. Accuracy rate, in %.
[1] R. Mendes, A. Freitas, "Fuzzy Classification
Rules with Genetic Programming and Co-Evolution,"
Conference on Principles of Data Mining and
Knowledge Discovery, 2001.
44
Conclusions
  • Compared with traditional classification rules,
    fuzzy rules provide good linguistic explanations
    and can deal with both discrete and continuous
    attributes.
  • A method for discovering fuzzy classification
    rules using GNP has been proposed. We have
    performed experiments and estimated the
    performance of the GNP-based method.
  • The method extracts important association rules
    through generations.
  • The pool is updated in every generation: an
    association rule with a lower χ² value is replaced
    by the same association rule with a higher χ²
    value.
  • The final result of the evolutionary process is a
    fuzzy rule set and a set of fuzzy membership
    functions.
  • The results have shown that the GNP-based method
    extracts important association rules from the
    database effectively and obtains good results in
    comparison with other methods.

45
Fin
  • Thank you very much.
  • Any questions?