1
An Experiment with Fuzzy Sets in Data Mining
International Conference on Computational Science, Beijing, 2007
  • David L. Olson University of Nebraska
  • Helen Moshkovich University of Montevallo
  • Alexander Mechitov University of Montevallo

2
Data Mining and Uncertainty
  • Data mining is highly useful
  • Deals with large datasets in many fields
  • Data are often vague and uncertain
  • Fuzzy set theory (Zadeh, 1965)
  • Rough set theory (Pawlak, 1982)
  • Probability theory (Pearl, 1988)
  • Set pair theory (Zhao, 1989)

3
Fuzzy Set Theory
  • Interval-valued fuzzy sets
  • Degree of membership is an interval in the 0-1 range
  • Vague sets and intuitionistic sets are essentially the
    same
  • Grey-related analysis
  • Use interval membership as part of process
  • Rough set theory
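The degree-of-membership idea above can be illustrated with a simple triangular membership function (an illustrative sketch; the set name and breakpoints are invented for the example, not taken from the presentation):

```python
def triangular_membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set: 0 outside
    [a, c], rising linearly to 1 at the peak b, then falling back to 0."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical "young" fuzzy set: full membership at age 25, none outside 15-45
print(triangular_membership(30, 15, 25, 45))  # 0.75
```

Interval-valued variants replace the single returned degree with a lower and upper bound on membership.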

4
Purpose
  • Review variants of fuzzy sets in data mining
  • Demonstrate fuzzy set implementation in decision
    tree model
  • Examine relative number of rules, accuracy

5
Data Mining Use of Fuzzy Sets
  • Neural Networks
  • Pattern Classification
  • Cluster Analysis
  • Genetic Algorithms
  • Association Rules

6
Fuzzy Sets in Neural Networks
  • Simpson 1992
  • method using neural networks to classify fuzzy
    data
  • Min-max function determined degree of membership
  • Generalization of k-nearest-neighbor classifier
  • Fuzzy min-max neural network classifier proved to
    be at least as good as traditional methods

7
Other Fuzzy Neural Networks
  • Zhang et al. 2000
  • Procedure to process numerical linguistic data
  • Hu et al. 2004
  • Two-phased method
  • Build fuzzy knowledge base from transactional
    data
  • Find weights through single-layer perceptron
  • Fuzzy linguistic input: customer evaluations

8
Neural Networks
  • Can be applied to many data mining applications
  • Prediction
  • Classification
  • Clustering (self-organizing maps)
  • Work relatively well on data with nonlinear
    relationships
  • Including complex interactions

9
Fuzzy Pattern Classification
  • Abe 1995
  • Generated fuzzy rules over variable fuzzy regions
  • Used attribute hyperboxes
  • Classification data
  • License plate recognition
  • More rules, greater accuracy
  • Compared with fuzzy min-max neural network
    approach of Simpson 1992
  • Neural networks better if data more complex

10
Fuzzy Pattern Classification
  • Liu et al (1999)
  • Fuzzy matching
  • Discover patterns against expectations
  • Sought to identify more interesting patterns
  • Led to fuzzy association rule generation

11
Fuzzy Linear Programming
  • Discriminant analysis
  • Find cutoff (or cutoffs) between categories
  • DEA fits fuzzy well

12
Cluster Analysis
  • Drobics et al. 2002
  • 3-stage approach
  • Self-organizing maps represent input data
  • Fuzzy c-means clustering applied to the cleaned data
    to display fuzzy clusters
  • Fuzzy rules generated inductively
  • Tested on classification data, image segmentation

13
Clustering Web Data
  • De and Krishna 2002
  • User transactions, recommend products
  • Measured transaction similarity
  • Fuzzy proximity relations basis of clusters
  • Le 2003
  • Fuzzy logic to assess Website popularity and
    satisfaction
  • Association rules
  • Lee and Liu 2004
  • Framework for information retrieval, filtering, and
    Internet shopping
  • Agents used to fuzzify data
  • Neural network model to select products

14
Fuzzy Clustering Methods
  • K-Means
  • Fuzzy c-means
  • Hierarchical
  • Bayesian Classification
  • ROUGH SET CLUSTERING
  • Puts indiscernible objects together
  • If the similarity index is below a threshold, objects
    are indiscernible
  • A higher threshold yields fewer clusters
  • Similarity is a weighted sum of
  • Euclidean distance for numerical attributes
  • Hamming distance for nominal attributes
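Of the fuzzy clustering methods listed above, fuzzy c-means is the most widely used; a minimal sketch of the generic algorithm (not the implementation used in any study cited here; the parameter defaults are illustrative, and a convergence check is omitted for brevity):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and the
    membership matrix U (each row gives a point's degrees of
    membership in the c clusters, summing to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1
    for _ in range(iters):
        Um = U ** m                            # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))          # standard FCM update rule
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated invented blobs: memberships become nearly crisp
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
centers, U = fuzzy_c_means(X, c=2)
```

Unlike crisp k-means, every point retains a graded membership in every cluster, which is what makes the result "fuzzy".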

15
Genetic Algorithms
  • Bruha et al. 2000
  • Method to process symbolic attributes
  • CN4 beam search technology to categorize
    numerical attributes
  • Data fuzzified into three classes: 0, uncertain, 1
  • Genetic learning algorithm used to process each
    observation into best-fitting category
  • Used on credit screening data
  • Fuzzy was expected to be better, but the difference
    was insignificant
  • More hypotheses were significantly better, but much
    greater computational support was required

16
Association Rules
  • If PRECEDENT then CONSEQUENT (single output
    result)
  • Support
  • degree to which relationship appears in the data
  • Confidence
  • Probability that if precedent occurs, consequence
    will occur
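Support and confidence as defined above can be computed directly from transaction data (an illustrative sketch; the baskets and function name are invented for the example):

```python
def support_confidence(transactions, antecedent, consequent):
    """Support: fraction of transactions containing both itemsets.
    Confidence: among transactions containing the antecedent, the
    fraction that also contain the consequent."""
    ant, both = set(antecedent), set(antecedent) | set(consequent)
    n_ant = sum(1 for t in transactions if ant <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_ant if n_ant else 0.0
    return support, confidence

# Four invented baskets: "milk -> bread" holds in 2 of 4 transactions
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"}, {"milk"}]
print(support_confidence(baskets, {"milk"}, {"bread"}))  # (0.5, 0.666...)
```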

17
Fuzzy Association Rules
  • Many based on the Apriori algorithm
  • Treat all attributes (or at least the linguistic ones)
    as uniform
  • Lowering support and confidence requirements lowers
    algorithm efficiency
  • Generates many uninteresting rules
  • Gyenesei 2000 used weighted quantitative
    association rules based on fuzzy data

18
Rough Sets (Pawlak 1991 book)
  • Bayes theorem statistical inference
  • Given the number of times an unknown event has
    happened and failed,
  • What is the chance that the probability of its
    happening in a single trial lies between stated
    probability limits
  • Rough Set Theory
  • Doesn't refer to prior or posterior probabilities
  • Reveals probabilistic structure of data
  • ANY DATA SET SATISFIES THE TOTAL PROBABILITY
    THEOREM
  • BAYES THEOREM CAN BE USED TO DRAW CONCLUSIONS
    FROM THE DATA
  • Can invert implications (give reasons for
    decisions)

19
Bayes Theorem
  • H = hypothesis
  • D = data
  • Pr(H|D) = Pr(D|H) × Pr(H) / Pr(D)
  • Pr(H) = probabilistic statement of belief in H
    before obtaining data D (prior)
  • Pr(H|D) becomes the probabilistic statement of belief
    about H after obtaining D (posterior)
  • Given Pr(D|H) and Pr(D), we can learn from data
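A worked numeric instance of the formula above, with Pr(D) expanded by the total probability theorem (the prior and likelihood values are invented for illustration):

```python
def bayes_posterior(prior_h, pr_d_given_h, pr_d_given_not_h):
    """Pr(H|D) = Pr(D|H) * Pr(H) / Pr(D), where Pr(D) is expanded
    by the total probability theorem over H and not-H."""
    pr_d = pr_d_given_h * prior_h + pr_d_given_not_h * (1 - prior_h)
    return pr_d_given_h * prior_h / pr_d

# Invented numbers: prior belief 0.2; data is 9x more likely under H
print(round(bayes_posterior(0.2, 0.9, 0.1), 4))  # 0.6923
```

Observing data that is much more likely under H raises the belief in H from 0.2 to about 0.69.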

20
Information System
  • Data table of Universe, Attributes, Values
  • Attributes C condition D decision
  • IF C THEN D
  • Each row of decision table determines decisions
    that must be taken when specified conditions
    satisfied
  • Crisp: no conflict (universal truth)
  • Rough: boundaries to certainty

21
Decision Table
22
Concepts
  • Support
  • Number of cases
  • Strength
  • Support / Universe
  • Certainty Factor
  • Conditional probability of D given C
  • true / (true + false) for the condition
  • Coverage Factor
  • Conditional probability of C given D
  • Inverse of the decision rule

23
Calculations 1st Row
  • Young, Green, OK (250 of 1000)
  • Strength 250/1000 0.250
  • Certainty 250/25050 0.833
  • Coverage 250/25010044040 0.301
  • Young, Green, Problem (50 of 1000)
  • Strength 50/1000 0.050
  • Certainty 50/25050 0.167
  • Coverage 50/501001010 0.294
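The arithmetic above generalizes to any decision rule; a small helper (a sketch: the function name is ours, and the counts are the first row's numbers as they appear in the slide's arithmetic):

```python
def rule_measures(rule_count, universe, condition_total, decision_total):
    """Strength, certainty, and coverage of a decision rule, following
    the definitions on the Concepts slide."""
    strength = rule_count / universe           # support / universe
    certainty = rule_count / condition_total   # Pr(D | C)
    coverage = rule_count / decision_total     # Pr(C | D)
    return strength, certainty, coverage

# First row: 250 (Young, Green -> OK) cases of 1000; 300 cases match
# the condition; 830 cases share the OK decision
s, cert, cov = rule_measures(250, 1000, 250 + 50, 250 + 100 + 440 + 40)
print(s, round(cert, 3), round(cov, 3))  # 0.25 0.833 0.301
```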

24
Results
25
Data
  • Shi et al., International Journal of Information
    Technology & Decision Making, 2005
  • Bank credit card data (1990s)
  • 6000 observations (5040 good, 960 bad)
  • Outcome good, bad
  • 64 Explanatory variables
  • 9 binary
  • 3 categorical
  • 52 continuous
  • Generated decision tree rules

26
Data Mining Software Supporting Fuzzy Data
  • PolyAnalyst
  • Claims fuzzy sets are used in a number of algorithms
    (discriminant analysis)
  • Inserts boundaries into the data
  • User doesn't do anything
  • See5
  • User selects fuzzy option
  • Inserts buffer at boundaries
  • Based on sensitivity of classification to small
    changes in threshold

27
Controls
  • PolyAnalyst
  • Minimum Support / minimum association
  • Minimum part of transactions that should contain
    a basket of products
  • Too high: no clusters. Default 1
  • Minimum Confidence
  • Probability of B given A
  • Too high: no rules. Default 50
  • Minimum Improvement
  • How much better the confidence of an association rule
    is than random
  • Default 1

28
Clementine Apriori Controls
  • Minimum rule support
  • Percentage of records for which the antecedent is true
  • Minimum rule confidence
  • Of the records where the rule antecedents are true,
    the percentage where the consequent is true
  • Maximum number of antecedents
  • Confidence ratio
  • Ratio of rule confidence to prior confidence

29
See5 Controls (tree)
  • Pruning confidence factor
  • Smaller values prune more of the tree
  • Minimum cases
  • Number of cases (support) required to keep a rule
  • Too high: fits training data less
  • Locked data for 5 replications
  • Pruning 10 (most) 20 30 40 (least)
  • MinSupport 10, 20, 30
  • So 60 runs, replicated for crisp, fuzzy,
    ordinal, ordinal-fuzzy, categorical

30
Example Crisp Model
  • RULE 1 IF RevtoPayNov <= 11.441 THEN good
  • RULE 2 IF RevtoPayNov > 11.441 AND
  • IF CoverBal3 = 1 THEN good
  • RULE 3 IF RevtoPayNov > 11.441 AND
  • IF CoverBal3 = 0 AND
  • IF OpentoBuyDec > 5.35129 THEN good
  • RULE 4 IF RevtoPayNov > 11.441 AND
  • IF CoverBal3 = 0 AND
  • IF OpentoBuyDec <= 5.35129 AND
  • IF NumPurchDec <= 2.30259 THEN bad
  • ELSE good
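The crisp rules above translate directly into code. A sketch: the transcription lost some comparison operators, so the "<=" and "=" tests below are reconstructed as the complements of the surviving ">" branches, and the direction of the NumPurchDec test is an assumption:

```python
def classify(rev_to_pay_nov, cover_bal3, open_to_buy_dec, num_purch_dec):
    """Crisp decision-tree rules from the slide, applied in order."""
    if rev_to_pay_nov <= 11.441:        # Rule 1
        return "good"
    if cover_bal3 == 1:                 # Rule 2 (RevtoPayNov > 11.441 here)
        return "good"
    if open_to_buy_dec > 5.35129:       # Rule 3 (CoverBal3 = 0 here)
        return "good"
    if num_purch_dec <= 2.30259:        # Rule 4 (direction assumed)
        return "bad"
    return "good"                       # ELSE

print(classify(15.0, 0, 4.0, 2.0))  # bad
```

The fuzzy model on the next slide has the same structure; only the thresholds shift, reflecting the buffer See5 inserts at the boundaries.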

31
Example Fuzzy Model
  • RULE 1 IF RevtoPayNov <= 11.50565 THEN good
  • RULE 2 IF RevtoPayNov > 11.50565 AND
  • IF CoverBal3 = 1 THEN good
  • RULE 3 IF RevtoPayNov > 11.50565 AND
  • IF CoverBal3 = 0 AND
  • IF OpentoBuyDec > 5.351905 THEN good
  • RULE 4 IF RevtoPayNov > 11.50565 AND
  • IF CoverBal3 = 0 AND
  • IF OpentoBuyDec <= 5.351905 AND
  • IF NumPurchDec <= 2.64916 THEN bad
  • ELSE good

32
Rules
33
Error on 3000 test cases
34
Fuzzy Data Mining
  • Fuzzy set theory found in almost every area of
    data mining
  • Appropriate if
  • Large scale databases
  • Uncertain relationships
  • One approach
  • Partition data into categories to create fuzzy
    grids

35
Conclusions
  • Fuzzy representation very appropriate
  • Humans perceive a great deal of uncertainty
  • A number of ways to incorporate fuzzy ideas
  • Fuzzifying data loses some detail
  • Ordinal could yield more robust models
  • Not necessarily more accurate
  • Humans could guess direction wrong
  • More pruning will focus on more interesting rules
  • Regardless of whether fuzzy or not
  • Categorical data more robust in this set of tests
  • Ordinal data treatment should be even better
  • FUZZIFYING DATA DOES NOT SEEM TO MAKE MODELS LESS
    ACCURATE