
Soft Computing, Machine Intelligence and Data Mining


1
Soft Computing, Machine Intelligence and Data
Mining
  • Sankar K. Pal
  • Machine Intelligence Unit
  • Indian Statistical Institute, Calcutta
  • http://www.isical.ac.in/~sankar

2
ISI, established 1931 by P. C. Mahalanobis
Divisions: CS, BS, MS, AS, SS, PES, SQC
Units: ACMU, ECSU, MIU, CVPRU
3
  • Director
  • Prof-in-charge
  • Heads
  • Distinguished Scientist
  • Professor
  • Associate Professor
  • Lecturer

Faculty: 250
Courses: B. Stat, M. Stat, M. Tech (CS), M. Tech (SQC & OR), Ph.D.
Locations: Calcutta (HQ), Delhi, Bangalore, Hyderabad, Madras, Giridih, Bombay
4
MIU Activities
(Formed in March 1993)
  • Pattern Recognition and Image Processing
  • Color Image Processing
  • Data Mining
  • Data Condensation, Feature Selection
  • Support Vector Machine
  • Case Generation
  • Soft Computing
  • Fuzzy Logic, Neural Networks, Genetic Algorithms,
    Rough Sets
  • Hybridization
  • Case Based Reasoning
  • Fractals/Wavelets
  • Image Compression
  • Digital Watermarking
  • Wavelet + ANN
  • Bioinformatics

5
  • Externally Funded Projects
  • INTEL
  • CSIR
  • Silicogene
  • Center for Excellence in Soft Computing Research
  • Foreign Collaborations
  • (Japan, France, Poland, Hong Kong, Australia)
  • Editorial Activities
  • Journals, Special Issues
  • Books
  • Achievements/Recognitions

Faculty: 10; Research Scholars/Associates: 8
6
Contents
  • What is Soft Computing ?
  • - Computational Theory of Perception
  • Pattern Recognition and Machine Intelligence
  • Relevance of Soft Computing Tools
  • Different Integrations

7
  • Emergence of Data Mining
  • Need
  • KDD Process
  • Relevance of Soft Computing Tools
  • Rule Generation/Evaluation
  • Modular Evolutionary Rough Fuzzy MLP
  • Modular Network
  • Rough Sets, Granules and Rule Generation
  • Variable Mutation Operations
  • Knowledge Flow
  • Example and Merits

8
  • Rough-fuzzy Case Generation
  • Granular Computing
  • Fuzzy Granulation
  • Mapping Dependency Rules to Cases
  • Case Retrieval
  • Examples and Merits
  • Conclusions

9
SOFT COMPUTING (L. A. Zadeh)
  • Aim
  • To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making.
  • To find an approximate solution to an imprecisely/precisely formulated problem.

10
  • Parking a Car
  • Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem.
  • ⇒ High precision carries a high cost.

11
  • ⇒ The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.

12
  • Soft Computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, and close resemblance with human-like decision making.
  • Foundation for the conception and design of high MIQ (Machine IQ) systems.

13
  • Provides flexible information processing capability for representation and evaluation of various real-life ambiguous and uncertain situations.
  • ⇒ Real World Computing
  • It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.

14
  • At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).
  • Within Soft Computing, FL, NC, GA and RS are Complementary rather than Competitive

15
  • Role of
FL: the algorithms for dealing with imprecision and uncertainty
NC: the machinery for learning and curve fitting
GA: the algorithms for search and optimization
RS: handling uncertainty arising from the granularity in the domain of discourse
16
Referring back to the example: Parking a Car
  • Do we use any measurement and computation while performing such tasks?
  • In Soft Computing we use the Computational Theory of Perceptions (CTP)

17
Computational Theory of Perceptions (CTP)
AI Magazine, 22(1), 73-84, 2001
Provides the capability to compute and reason with perception-based information
  • Examples: parking a car, driving in a city, cooking a meal, summarizing a story
  • Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations

18
  • They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects
  • Reflecting the finite ability of the sensory organs (and finally the brain) to resolve details, perceptions are inherently imprecise

19
  • Perceptions are fuzzy (F) granular (both fuzzy and granular)
  • Boundaries of perceived classes are unsharp
  • Values of attributes are granulated (a clump of indistinguishable points/objects)
  • Examples:
  • Granules in age: very young, young, not so old, ...
  • Granules in direction: slightly left, sharp right, ...

20
  • F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis (based on predicate logic and probability theory)
  • Main distinguishing feature: the assumption that perceptions are described by propositions drawn from a natural language.

21
Hybrid Systems
  • Neuro-fuzzy
  • Genetic neural
  • Fuzzy genetic
  • Fuzzy neuro-genetic

Knowledge-based Systems
  • Probabilistic reasoning
  • Approximate reasoning
  • Case based reasoning

Data Driven Systems
  • Neural network system
  • Evolutionary computing
  • Fuzzy logic
  • Rough sets

Non-linear Dynamics
  • Chaos theory
  • Rescaled range analysis (wavelet)
  • Fractal analysis
  • Pattern recognition and learning

Machine Intelligence: a core concept for grouping the above advanced technologies with Pattern Recognition and Learning
22
Pattern Recognition System (PRS)
  • Measurement Space → Feature Space → Decision Space
  • Uncertainties arise from deficiencies of information available from a situation
  • Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, contradictory information in various stages of a PRS

23
Relevance of Fuzzy Sets in PR
  • Representing linguistically phrased input features for processing
  • Representing multi-class membership of ambiguous patterns
  • Generating rules and inferences in linguistic form
  • Extracting ill-defined image regions, primitives, properties and describing relations among them as fuzzy subsets

24
ANNs provide Natural Classifiers having
  • Resistance to Noise
  • Tolerance to Distorted Patterns/Images (Ability to Generalize)
  • Superior Ability to Recognize Overlapping Pattern Classes or Classes with Highly Nonlinear Boundaries or Partially Occluded or Degraded Images
  • Potential for Parallel Processing
  • Non-parametric nature

25
Why GAs in PR ?
  • Methods developed for Pattern Recognition and
    Image Processing are usually problem dependent.
  • Many tasks involved in analyzing/identifying a
    pattern need Appropriate Parameter Selection and
    Efficient Search in complex spaces to obtain
    Optimal Solutions

26
  • Makes the processes
  • - Computationally Intensive
  • - Prone to Losing the Exact Solution
  • GAs: Efficient, Adaptive and Robust Search Processes, producing near-optimal solutions with a large amount of Implicit Parallelism
  • GAs are an Appropriate and Natural Choice for problems which need Optimized Computation Requirements and Robust, Fast, Close Approximate Solutions

27
Relevance of FL, ANN, GAs Individually
to PR Problems is Established
28
In the late eighties scientists thought: Why NOT Integrations?
  • Fuzzy Logic + ANN
  • ANN + GA
  • Fuzzy Logic + ANN + GA
  • Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
29
Why Fusion?
Fuzzy Set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW). Neural Network models attempt to emulate the architecture and information representation scheme of the human brain (HW).
⇓
NEURO-FUZZY Computing (for More Intelligent Systems)
30
Fuzzy System + ANN used for learning and adaptation → NFS (neuro-fuzzy system)
ANN + fuzzy sets used to augment its application domain → FNN (fuzzy neural network)
31
MERITS
  • GENERIC
  • APPLICATION SPECIFIC

32
Rough-Fuzzy Hybridization
  • Fuzzy Set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
  • The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximation of a concept).

Rough sets and Fuzzy sets can be integrated to develop a model of uncertainty stronger than either.
33
Rough Fuzzy Hybridization A New Trend in
Decision Making, S. K. Pal and A. Skowron (eds),
Springer-Verlag, Singapore, 1999
34
Neuro-Rough Hybridization
  • Rough set models are used to generate network parameters (weights).
  • Roughness is incorporated in the inputs and outputs of networks for uncertainty handling, performance enhancement and an extended domain of application.
  • Networks consisting of rough neurons are used.

Neurocomputing, Spl. Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarski (eds), vol. 36 (1-4), 2001.
35
  • Neuro-Rough-Fuzzy-Genetic Hybridization
  • Rough sets are used to extract domain knowledge in the form of linguistic rules → generating fuzzy knowledge-based networks → evolved using Genetic Algorithms.
  • The integration offers several advantages like fast training, a compact network and performance enhancement.

36
IEEE TNN, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
37
  • Before we describe the
  • Modular Evolutionary Rough-fuzzy MLP
  • Rough-fuzzy Case Generation System
  • we explain Data Mining and the significance of Pattern Recognition, Image Processing and Machine Intelligence.

38
One of the applications of Information Technology that has drawn the attention of researchers is DATA MINING, where Pattern Recognition/Image Processing/Machine Intelligence are directly relevant.
39
Why Data Mining?
  • The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
  • With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
  • Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image), huge (both in dimension and size) and scattered.
  • The volume of such stored data is growing at a phenomenal rate.

40
  • As a result, traditional ad hoc mixtures of
    statistical techniques and data management tools
    are no longer adequate for analyzing this vast
    collection of data.

41
  • Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database
  • ⇒ Data Mining
  • Data Mining + Knowledge Interpretation
  • ⇒ Knowledge Discovery
  • Process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

42
Pattern Recognition, World Scientific, 2001
Data Mining (DM) in the KDD process:

Huge Raw Data
→ Preprocessing (Data Cleaning, Data Condensation, Dimensionality Reduction, Data Wrapping/Description)
→ Preprocessed Data (Patterns)
→ Machine Learning (Classification, Clustering, Rule Generation)
→ Mathematical Model of Data
→ Knowledge Extraction → Knowledge Evaluation → Knowledge Interpretation
→ Useful Knowledge

The whole chain constitutes Knowledge Discovery in Databases (KDD).
43
Data Mining Algorithm Components
  • Model: Function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
  • Preference criterion: Basis for preference of one model or set of parameters over another.
  • Search algorithm: Specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, the family of models, and the preference criterion.

44
Why Growth of Interest?
  • Falling cost of large storage devices and increasing ease of collecting data over networks.
  • Availability of robust/efficient machine learning algorithms to process data.
  • Falling cost of computational power → enabling use of computationally intensive methods for data analysis.

45
Examples
  • Financial Investment: Stock indices and prices, interest rates, credit card data, fraud detection.
  • Health Care: Various diagnostic information stored by hospital management systems.
  • Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image) and huge (both in dimension and size).

46
Role of Fuzzy Sets
  • Modeling of imprecise/qualitative knowledge
  • Transmission and handling of uncertainties at various stages
  • Supporting, to an extent, human-type reasoning in natural form

47
  • Classification/ Clustering
  • Discovering association rules (describing
    interesting association relationship among
    different attributes)
  • Inferencing
  • Data summarization/condensation (abstracting the
    essence from a large amount of information).

48
Role of ANN
  • Adaptivity, robustness, parallelism, optimality
  • Machinery for learning and curve fitting (learns from examples)
  • Initially thought to be unsuitable because of their black-box nature: no information available in symbolic form (suitable for human interpretation)
  • Recently, embedded knowledge is extracted in the form of symbolic rules, making ANNs suitable for Rule Generation.

49
Role of GAs
  • Robust, parallel, adaptive search methods, suitable when the search space is large.
  • Used more in Prediction (P) than Description (D)
  • D: Finding human-interpretable patterns describing the data
  • P: Using some variables or attributes in the database to predict unknown/future values of other variables of interest.

50
Example: Medical Data
  • Numeric and textual information may be interspersed
  • Different symbols can be used with the same meaning
  • Redundancy often exists
  • Erroneous/misspelled medical terms are common
  • Data is often sparsely distributed

51
  • A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets
  • The data must not only be cleaned of errors and redundancy, but organized in a fashion that makes sense for the problem

52
  • So, we NEED
  • Efficient
  • Robust
  • Flexible
  • Machine Learning Algorithms
  • ⇓
  • the NEED for the Soft Computing Paradigm

53
Without Soft Computing, Machine Intelligence Research Remains Incomplete.
54
Modular Neural Networks
Task: Split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: Divide and Conquer
55
  • The approach involves
  • Effective decomposition of the problem s.t. the subproblems can be solved with compact networks.
  • Effective combination and training of the subnetworks s.t. there is gain in terms of total training time, network size and accuracy of solution.

56
Advantages
  • Accelerated training
  • The final solution network has more structured components
  • Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network
  • The catastrophic interference problem of neural network learning (in case of overlapped regions) is reduced

57
Classification Problem
  • Split a k-class problem into k 2-class problems.
  • Train one (or multiple) subnetwork module(s) for each 2-class problem.
  • Concatenate the subnetworks s.t. intra-module links that have already evolved are unchanged, while inter-module links are initialized to a low value.
  • Train the concatenated network s.t. the intra-module links (already evolved) are less perturbed, while the inter-module links are more perturbed (see the sketch below).
58
3-class problem → 3 × (2-class problems)
[Figure: Class 1, Class 2 and Class 3 subnetworks are integrated into one network; links with evolved values are preserved while inter-module links are to be grown. A final training phase yields the final network with the inter-module links grown.]
59
Modular Rough Fuzzy MLP
A modular network designed using four different Soft Computing tools.
Basic Network Model: Fuzzy MLP.
Rough Set theory is used to generate crude decision rules representing each of the classes from the Discernibility Matrix. (There may be multiple rules for each class → multiple subnetworks per class.)
60
The knowledge-based subnetworks are concatenated to form a population of initial solution networks. The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to the intra-module links (already evolved) have a low mutation probability, while inter-module links have a high mutation probability (see the sketch below).
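The variable mutation operator can be sketched as follows; the probability values are illustrative assumptions, not the paper's settings.

```python
# Sketch: GA bit mutation with link-dependent mutation probabilities --
# intra-module bits mutate rarely, inter-module bits mutate often.
import numpy as np

rng = np.random.default_rng(1)

def variable_mutation(bits, inter_module, p_low=0.01, p_high=0.3):
    """bits         : 1-D array of 0/1 genes encoding network links.
    inter_module : boolean array, True where a gene encodes an
                   inter-module link (mutated with p_high)."""
    p = np.where(inter_module, p_high, p_low)
    flip = rng.random(bits.shape) < p
    return np.where(flip, 1 - bits, bits)

chromosome = rng.integers(0, 2, size=20)
inter = np.arange(20) >= 12          # last 8 genes: inter-module links
print(variable_mutation(chromosome, inter))
```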
61
Rough Sets
Z. Pawlak, 1982, Int. J. Comp. Inf. Sci.
  • Offer mathematical tools to discover hidden patterns in data.
  • The fundamental principle of a rough set-based learning system is to discover redundancies and dependencies between the given features of the data to be classified.

62
  • Approximate a given concept both from below and from above, using lower and upper approximations.
  • Rough set learning algorithms can be used to obtain rules in IF-THEN form from a decision table.
  • Extract knowledge from a data base (decision table w.r.t. objects and attributes → remove undesirable attributes (knowledge discovery) → analyze data dependency → minimum subset of attributes (reducts))

63
Rough Sets
[Figure: a set X in the feature space, its lower approximation $\underline{B}X$, its upper approximation $\overline{B}X$, and the granules $[x]_B$ around a point x.]
$[x]_B$ = the set of all points belonging to the same granule as the point x in the feature space, i.e., the set of all points which are indiscernible from x in terms of the feature subset B.
64
Approximations of the set X w.r.t. feature subset B:
  • B-lower, $\underline{B}X$: granules definitely belonging to X
  • B-upper, $\overline{B}X$: granules definitely and possibly belonging to X
If $\underline{B}X = \overline{B}X$, X is B-exact or B-definable; otherwise it is roughly definable. (A small computational sketch follows.)
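A small illustrative sketch (not from the talk) of computing these approximations from a toy assignment of objects to granules:

```python
# Sketch: lower and upper approximations of a set X from the granules
# (equivalence classes) induced by a feature subset B.

def approximations(b_values, X):
    """b_values: dict object -> tuple of B-feature values (its granule);
    X: target set of objects. Returns (lower, upper) approximations."""
    granules = {}
    for obj, key in b_values.items():
        granules.setdefault(key, set()).add(obj)
    lower, upper = set(), set()
    for granule in granules.values():
        if granule <= X:      # granule lies entirely inside X
            lower |= granule
        if granule & X:       # granule intersects X
            upper |= granule
    return lower, upper

# Toy data: B = (F1, F2); x1, x2 share a granule, as do x3, x4.
b_values = {"x1": (1, 0), "x2": (1, 0), "x3": (0, 1),
            "x4": (0, 1), "x5": (1, 1)}
X = {"x1", "x3", "x5"}
lower, upper = approximations(b_values, X)
print("lower:", lower)   # {'x5'}
print("upper:", upper)   # all five objects
```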
65
Rough Sets
  • Uncertainty Handling (using lower & upper approximations)
  • Granular Computing (using information granules)
66
Granular Computing: Computation is performed using information granules and not the data points (objects)
⇒ Information compression ⇒ Computational gain
67
Information Granules and Rough Set Theoretic Rules
[Figure: the (F1, F2) feature space partitioned into low/medium/high granules along each axis; a rule covers one such granule.]
  • A rule provides a crude description of the class using granules
68
Rough Set Rule Generation
Decision Table:

Object | F1 F2 F3 F4 F5 | Decision
x1     |  1  0  1  0  1 | Class 1
x2     |  0  0  0  0  1 | Class 1
x3     |  1  1  1  1  1 | Class 1
x4     |  0  1  0  1  0 | Class 2
x5     |  1  1  1  0  0 | Class 2

Discernibility Matrix (c) for Class 1:

Objects | x1 | x2       | x3
x1      | φ  | {F1, F3} | {F2, F4}
x2      |    | φ        | {F1, F2, F3, F4}
x3      |    |          | φ
69
Discernibility function
The discernibility function for object x1 belonging to Class 1 is the conjunction of its discernibility with x2 and its discernibility with x3; from the matrix entries,
$f_{x_1} = (F_1 \lor F_3) \land (F_2 \lor F_4)$.
Similarly, discernibility functions are formed for the other objects of the class. Expanding such functions gives the Dependency Rules (AND-OR form), e.g.,
Class 1 ← (F1 ∧ F2) ∨ (F1 ∧ F4) ∨ (F2 ∧ F3) ∨ (F3 ∧ F4).
(A sketch of this computation follows.)
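A sketch, using the toy decision table above, of how the discernibility matrix and the discernibility function of x1 can be computed; the representation (lists of OR-clauses) is an illustrative choice:

```python
# Sketch: discernibility matrix and per-object discernibility function
# for the decision table above.
from itertools import combinations

table = {  # object -> (features, class)
    "x1": ((1, 0, 1, 0, 1), 1), "x2": ((0, 0, 0, 0, 1), 1),
    "x3": ((1, 1, 1, 1, 1), 1), "x4": ((0, 1, 0, 1, 0), 2),
    "x5": ((1, 1, 1, 0, 0), 2),
}

def discernibility_matrix(objects):
    """For each pair of objects, the set of features where they differ."""
    return {
        (a, b): {f"F{j+1}" for j, (u, v) in
                 enumerate(zip(table[a][0], table[b][0])) if u != v}
        for a, b in combinations(objects, 2)
    }

class1 = [o for o, (_, c) in table.items() if c == 1]
matrix = discernibility_matrix(class1)
# Discernibility function of x1: conjunction (AND) over rows, each row a
# disjunction (OR) of the differing features.
f_x1 = [sorted(matrix[pair]) for pair in matrix if "x1" in pair]
print("f(x1) = AND of ORs:", f_x1)   # [['F1', 'F3'], ['F2', 'F4']]
```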
70
Rules
(No. of Classes = 2, No. of Features = 2)
[Figure: each crude rule-based network seeds a population (Population 1, 2, 3); each population is partially trained with its own GA (Phase I), yielding partially trained subnetworks.]
71
Concatenated
[Figure: the partially trained subnetworks are concatenated, with inter-module links given small random values, to form the final population. A GA (Phase II) with restricted mutation probability (low for intra-module links, high for inter-module links) evolves the final trained network.]
72
Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge & Data Engg., 15(1), 14-25, 2003
[Figure: rough set rules (R1 for class C1; R2, R3 for class C2) partition the (F1, F2) feature space; each rule is mapped to a subnetwork (SN1, SN2, SN3). Partial training with an ordinary GA yields partially refined subnetworks.]
73
Concatenation of Subnetworks
[Figure: the population of concatenated networks is evolved with a GA having a variable mutation operator (low mutation probability for intra-module links, high for inter-module links), producing the final solution network over the (F1, F2) feature space with classes C1 and C2.]
75
Speech Data: 3 features, 6 classes
Classification Accuracy [chart]
76
Network Size (No. of Links)
77
Training Time (hrs), on a DEC Alpha Workstation @400 MHz
78
Legend: 1. MLP; 2. Fuzzy MLP; 3. Modular Fuzzy MLP; 4. Rough Fuzzy MLP; 5. Modular Rough Fuzzy MLP
79
Network Structure
IEEE Trans. Knowledge & Data Engg., 15(1), 14-25, 2003
Modular Rough Fuzzy MLP: structured (no. of links: few)
Fuzzy MLP: unstructured (no. of links: more)
[Figure: histograms of weight values]
80
Connectivity of the network obtained using
Modular Rough Fuzzy MLP
81
Sample rules extracted for Modular Rough Fuzzy
MLP
82
Rule Evaluation
  • Accuracy
  • Fidelity (number of times the network and rule base outputs agree)
  • Confusion (should be restricted within a minimum number of classes)
  • Coverage (a rule base with a smaller uncovered region, i.e., test set for which no rules are fired, is better)
  • Rule base size (the smaller the number of rules, the more compact the rule base)
  • Certainty (confidence of rules)
(A sketch of computing some of these measures follows.)
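A minimal sketch of how accuracy, fidelity and coverage could be computed for a rule base; the encoding of "no rule fired" as -1 is an assumption:

```python
# Sketch: three rule-base evaluation measures.
import numpy as np

def evaluate(y_true, y_net, y_rules):
    """y_true  : true class labels
    y_net   : network predictions
    y_rules : rule-base predictions, -1 where no rule fires (uncovered)"""
    covered = y_rules != -1
    coverage = covered.mean()                  # fraction with a fired rule
    accuracy = (y_rules[covered] == y_true[covered]).mean()
    fidelity = (y_rules[covered] == y_net[covered]).mean()
    return accuracy, fidelity, coverage

y_true  = np.array([0, 1, 1, 2, 2, 0])
y_net   = np.array([0, 1, 2, 2, 2, 0])
y_rules = np.array([0, 1, 2, 2, -1, 0])
print(evaluate(y_true, y_net, y_rules))  # (0.8, 1.0, 0.833...)
```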

83
Existing Rule Extraction Algorithms
  • Subset: Searches over all possible combinations of input weights to a node of the trained network. Rules are generated from those subsets of links for which the sum of the weights exceeds the bias of that node.
  • MofN: Instead of AND-OR rules, the method extracts rules of the form IF M out of N inputs are high THEN Class I.
  • X2R: Unlike the previous two methods, which consider the links of a network, X2R generates rules from the input-output mapping implemented by the network.
  • C4.5: Rule generation algorithm based on decision trees.

84
IEEE Trans. Knowledge & Data Engg., 15(1), 14-25, 2003
Comparison of Rules obtained for Speech data
85
[Charts: number of rules, confusion, CPU time]
86
Case Based Reasoning (CBR)
  • Cases: some typical situations already experienced by the system.
  • A case is a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.

  • CBR involves
  • adapting old solutions to meet new demands
  • using old cases to explain new situations or to justify new solutions
  • reasoning from precedents to interpret new situations.

87
  • ⇒ A CBR system learns and becomes more efficient as a byproduct of its reasoning activity.
  • Examples: medical diagnosis and law interpretation, where the knowledge available is incomplete and/or evidence is sparse.

88
  • Unlike traditional knowledge-based systems, case based systems operate through a process of
  • remembering one or a small set of concrete instances or cases, and
  • basing decisions on comparisons between the new situation and the old ones.

89
  • Case Selection → cases belong to the set of examples encountered.
  • Case Generation → constructed cases need not be any of the examples.

90
Rough Sets
  • Uncertainty Handling (using lower & upper approximations)
  • Granular Computing (using information granules)
91
IEEE Trans. Knowledge & Data Engg., to appear
Granular Computing and Case Generation
  • Information Granules: a group of similar objects clubbed together by an indiscernibility relation.
  • Granular Computing: computation is performed using information granules and not the data points (objects)
  • ⇒ Information compression
  • ⇒ Computational gain

92
  • Cases: informative patterns (prototypes) characterizing the problems.
  • In the rough set theoretic framework: Cases ⇔ Information Granules
  • In the rough-fuzzy framework: Cases ⇔ Fuzzy Information Granules

93
Characteristics and Merits
  • Cases are cluster granules, not sample points
  • Involve only a reduced number of relevant features, with variable size
  • Less storage requirement
  • Fast retrieval
  • Suitable for mining data with large dimension and size

94
  • How to Achieve This?
  • Fuzzy sets help in linguistic representation of patterns, providing a fuzzy granulation of the feature space
  • Rough sets help in generating dependency rules to model informative/representative regions in the granulated feature space
  • Fuzzy membership functions corresponding to the representative regions are stored as Cases

95
Fuzzy (F)-Granulation
[Figure: π-type membership functions $\mu_{low}$, $\mu_{medium}$, $\mu_{high}$ along feature j, with centers $c_L$, $c_M$, $c_H$, radii $\lambda_L$, $\lambda_M$, and membership values between 0 and 1 crossing 0.5 at the overlaps.]
96
$c_{low}(F_j) = m_{jl}$, $c_{medium}(F_j) = m_j$, $c_{high}(F_j) = m_{jh}$
$\lambda_{low}(F_j) = c_{medium}(F_j) - c_{low}(F_j)$
$\lambda_{high}(F_j) = c_{high}(F_j) - c_{medium}(F_j)$
$\lambda_{medium}(F_j) = 0.5\,(c_{high}(F_j) - c_{low}(F_j))$
where $m_j$ = mean of the pattern points along the jth axis; $m_{jl}$ = mean of points in the range $[F_j^{min}, m_j)$; $m_{jh}$ = mean of points in the range $(m_j, F_j^{max}]$; $F_j^{max}$, $F_j^{min}$ = maximum and minimum values of feature $F_j$.
97
  • An n-dimensional pattern $\mathbf{F}_i = [F_{i1}, F_{i2}, \ldots, F_{in}]$ is represented as a 3n-dimensional fuzzy linguistic pattern [Pal & Mitra 1992]:
  • $\mathbf{F}_i = [\mu_{low(F_{i1})}(\mathbf{F}_i), \ldots, \mu_{high(F_{in})}(\mathbf{F}_i)]$
  • Set each $\mu$ value to 1 or 0 according as it is higher or lower than 0.5
  • ⇒ Binary 3n-dimensional patterns are obtained (a sketch of this granulation follows)
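A minimal sketch of this F-granulation, assuming the standard π-function form and the parameter choices given above; the data values are illustrative:

```python
# Sketch: map each feature value to (mu_low, mu_medium, mu_high) with
# pi-type membership functions, then binarize at 0.5.
import numpy as np

def pi_membership(x, c, lam):
    """Pi function with center c and radius lam."""
    d = np.abs(x - c) / lam
    return np.where(d <= 0.5, 1 - 2 * d**2,
                    np.where(d <= 1, 2 * (1 - d)**2, 0.0))

def granulate(column):
    """Return the 3-valued linguistic representation of one feature."""
    m = column.mean()
    m_l = column[column < m].mean()          # mean of the lower range
    m_h = column[column > m].mean()          # mean of the upper range
    lam_low, lam_high = m - m_l, m_h - m
    lam_med = 0.5 * (m_h - m_l)
    return np.stack([pi_membership(column, m_l, lam_low),
                     pi_membership(column, m, lam_med),
                     pi_membership(column, m_h, lam_high)], axis=1)

X = np.array([1.0, 2.0, 2.5, 4.0, 5.0, 7.0])
mu = granulate(X)                 # shape (6, 3): low/medium/high
binary = (mu > 0.5).astype(int)   # binarize at 0.5
print(binary)
```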

98
  • (Compute the frequency of occurrence $n_{k_i}$ of the binary patterns. Select those patterns having frequency above a threshold Tr, for noise removal.)
  • Generate a decision table consisting of the binary patterns.
  • Extract dependency rules corresponding to informative regions (blocks), e.g., class ← $L_1 \land M_2$ (see the sketch below).
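A small sketch of the frequency-thresholding step; the threshold value is an arbitrary illustration:

```python
# Sketch: keep frequent binary patterns and form a decision table.
from collections import Counter

def build_decision_table(binary_patterns, labels, threshold=2):
    """Keep (pattern, class) pairs occurring more often than threshold."""
    counts = Counter(zip(map(tuple, binary_patterns), labels))
    return [(p, c) for (p, c), n in counts.items() if n > threshold]

patterns = [(1, 0, 1), (1, 0, 1), (1, 0, 1), (0, 1, 0), (1, 0, 1)]
labels = [1, 1, 1, 2, 1]
print(build_decision_table(patterns, labels))  # [((1, 0, 1), 1)]
```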

99
Rough Set Rule Generation
Decision Table:

Object | F1 F2 F3 F4 F5 | Decision
x1     |  1  0  1  0  1 | Class 1
x2     |  0  0  0  0  1 | Class 1
x3     |  1  1  1  1  1 | Class 1
x4     |  0  1  0  1  0 | Class 2
x5     |  1  1  1  0  0 | Class 2
100
Discernibility Matrix (c) for Class 1:

Objects | x1 | x2       | x3
x1      | φ  | {F1, F3} | {F2, F4}
x2      |    | φ        | {F1, F2, F3, F4}
x3      |    |          | φ
101
The discernibility function for object x1 belonging to Class 1 is the conjunction of its discernibility with x2 and its discernibility with x3; from the matrix entries, $f_{x_1} = (F_1 \lor F_3) \land (F_2 \lor F_4)$. Similarly, discernibility functions are formed for the other objects. Expanding gives the Dependency Rules (AND-OR form).
102
Mapping Dependency Rules to Cases
  • Each conjunction, e.g., $L_1 \land M_2$, represents a region (block)
  • For each conjunction, store as a case:
  • the parameters of the fuzzy membership functions corresponding to the linguistic variables that occur in the conjunction
  • (thus, multiple cases may be generated from a rule)
  • the class information
103
  • Note: all features may not occur in a rule.
  • Cases may thus be represented by different, reduced numbers of features.
  • Structure of a Case: parameters of the membership functions (center, radius), class information (a sketch follows)
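A minimal sketch, with illustrative names, of such a case structure holding a reduced, variable-length set of membership parameters plus the class; the values are taken from the example on slide 105:

```python
# Sketch: case structure with per-feature fuzzy-set parameters.
from dataclasses import dataclass

@dataclass
class FeatureFuzzySet:
    feature: int      # feature index
    fuzzset: str      # 'L', 'M' or 'H'
    center: float     # c
    radius: float     # lambda

@dataclass
class Case:
    sets: list        # reduced, variable-length feature subset
    label: int        # class information

case1 = Case([FeatureFuzzySet(1, 'L', 0.1, 0.5),
              FeatureFuzzySet(2, 'H', 0.9, 0.5)], label=1)
case2 = Case([FeatureFuzzySet(1, 'H', 0.7, 0.4),
              FeatureFuzzySet(2, 'L', 0.2, 0.5)], label=2)
```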

104
Example (IEEE Trans. Knowledge & Data Engg., to appear)
[Figure: two clusters of points in the (F1, F2) feature space; CASE 1 covers the region around (F1, F2) ≈ (0.1, 0.9) and CASE 2 the region around (0.7, 0.2), with the parameters of the fuzzy linguistic sets low, medium, high marked along the axes.]
105
Dependency Rules and Cases Obtained
Case 1: Feature No. 1, fuzzset (L): c = 0.1, λ = 0.5; Feature No. 2, fuzzset (H): c = 0.9, λ = 0.5; Class 1
Case 2: Feature No. 1, fuzzset (H): c = 0.7, λ = 0.4; Feature No. 2, fuzzset (L): c = 0.2, λ = 0.5; Class 2
106
Case Retrieval
  • The similarity sim(x, c) between a pattern x and a case c is defined over the features stored in the case, where
  • n = number of features present in case c
107
  • $\mu_{fuzzset}^{j}(x)$ denotes the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j.
  • For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern. (A hedged reconstruction of this computation follows.)
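The similarity formula itself did not survive this transcript. A sketch under an assumption: here sim(x, c) is taken as the mean of the squared memberships of x to the fuzzy sets stored in the case, followed by nearest-case retrieval.

```python
# Sketch: case retrieval. The exact sim(x, c) formula is lost here; we
# ASSUME the mean of squared memberships over the case's n features.
import numpy as np

def pi_membership(x, c, lam):
    """Pi function with center c and radius lam (scalar version)."""
    d = abs(x - c) / lam
    return 1 - 2 * d**2 if d <= 0.5 else (2 * (1 - d)**2 if d <= 1 else 0.0)

# A case: list of (feature_index, center, radius) plus a class label,
# matching the two cases of slide 105.
cases = [
    ([(0, 0.1, 0.5), (1, 0.9, 0.5)], 1),   # Case 1
    ([(0, 0.7, 0.4), (1, 0.2, 0.5)], 2),   # Case 2
]

def sim(x, case_sets):
    mus = [pi_membership(x[j], c, lam) for j, c, lam in case_sets]
    return np.mean(np.square(mus))          # assumed form of sim(x, c)

def classify(x):
    """Nearest case: assign the class of the most similar case."""
    return max(cases, key=lambda sc: sim(x, sc[0]))[1]

print(classify(np.array([0.15, 0.85])))     # -> 1
print(classify(np.array([0.75, 0.25])))     # -> 2
```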

108
Experimental Results and Comparisons
  1. Forest Covertype: Contains 10 dimensions, 7 classes and 586,012 samples. It is a Geographical Information System dataset representing forest cover types (pine/fir etc.) of the USA. The variables are cartographic and remote sensing measurements. All variables are numeric.

109
  2. Multiple Features: This dataset consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. There are 2000 patterns in total, 649 features (all numeric) and 10 classes.
  3. Iris: The dataset contains 150 instances, 4 features and 3 classes of Iris flowers. The features are numeric.

110
  • Some Existing Case Selection Methods
  • k-NN based:
  • Condensed nearest neighbor (CNN)
  • Instance based learning (e.g., IB3)
  • Instance based learning with feature weighting (e.g., IB4)
  • Fuzzy logic based
  • Neuro-fuzzy based

111
Algorithms Compared
  1. Instance based learning algorithm IB3 [Aha, 1991]
  2. Instance based learning algorithm IB4 [Aha, 1992] (reduced features). The feature weighting is learned by random hill climbing in IB4; a specified number of features having high weights is selected.
  3. Random case selection.

112
Evaluation in terms of
  1. 1-NN classification accuracy using the cases. Training set: 10% (for case generation); Test set: 90%.
  2. Number of cases stored in the case base.
  3. Average number of features required to store a case (n_avg).
  4. CPU time required for case generation (t_gen).
  5. Average CPU time required to retrieve a case (t_ret). (On a Sun UltraSparc Workstation @350 MHz)
113
Iris Flowers: 4 features, 3 classes, 150 samples
Number of cases: 3 (for all methods)
114
Forest Cover Types: 10 features, 7 classes, 586,012 samples
Number of cases: 545 (for all methods)
115
Handwritten Numerals: 649 features, 10 classes, 2000 samples
Number of cases: 50 (for all methods)
116
For the same number of cases:
Accuracy: proposed method much superior to random selection and IB4, close to IB3.
Average number of features stored: proposed method stores much fewer than the original data dimension.
Case generation time: proposed method requires much less than IB3 and IB4.
Case retrieval time: several orders less for the proposed method compared to IB3 and random selection; also less than IB4.
117
  • Conclusions
  • The relation between Soft Computing, Machine Intelligence and Pattern Recognition is explained.
  • The emergence of Data Mining and Knowledge Discovery from the PR point of view is explained.
  • The significance of Hybridization in the Soft Computing paradigm is illustrated.

118
  • The modular concept enhances performance, accelerates training and makes the network structured with fewer links.
  • The rules generated are superior to those of other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.

119
  • Rough sets are used for generating information granules.
  • Fuzzy sets provide efficient granulation of the feature space (F-granulation).
  • Reduced and variable feature subset representation of cases is a unique feature of the scheme.
  • The rough-fuzzy case generation method is suitable for CBR systems involving datasets large both in dimension and size.

120
  • Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear)
  • Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
  • Significance in the Computational Theory of Perceptions (CTP)

121
Thank You!!