
Soft Computing, Machine Intelligence and Data

Mining

- Sankar K. Pal
- Machine Intelligence Unit
- Indian Statistical Institute, Calcutta
- http://www.isical.ac.in/sankar

ISI (founded 1931 by P. C. Mahalanobis)

Divisions: CS, BS, MS, AS, SS, PES, SQC
Units: ACMU, ECSU, MIU, CVPRU

- Director
- Prof-in-charge
- Heads

- Distinguished Scientist
- Professor
- Associate Professor
- Lecturer

Faculty: 250

Courses: B. Stat, M. Stat, M. Tech (CS), M. Tech (SQC & OR), Ph.D.

Locations: Calcutta (HQ), Delhi, Bangalore, Hyderabad, Madras, Giridih, Bombay

MIU Activities

(Formed in March 1993)

- Pattern Recognition and Image Processing
- Color Image Processing
- Data Mining
- Data Condensation, Feature Selection
- Support Vector Machine
- Case Generation
- Soft Computing
- Fuzzy Logic, Neural Networks, Genetic Algorithms,

Rough Sets - Hybridization
- Case Based Reasoning
- Fractals/Wavelets
- Image Compression
- Digital Watermarking
- Wavelet ANN
- Bioinformatics

- Externally Funded Projects
- INTEL
- CSIR
- Silicogene
- Center for Excellence in Soft Computing Research
- Foreign Collaborations
- (Japan, France, Poland, Hong Kong, Australia)
- Editorial Activities
- Journals, Special Issues
- Books
- Achievements/Recognitions

Faculty: 10, Research Scholars/Associates: 8

Contents

- What is Soft Computing ?
- Computational Theory of Perception
- Pattern Recognition and Machine Intelligence
- Relevance of Soft Computing Tools
- Different Integrations

- Emergence of Data Mining
- Need
- KDD Process
- Relevance of Soft Computing Tools
- Rule Generation/Evaluation

- Modular Evolutionary Rough Fuzzy MLP
- Modular Network
- Rough Sets, Granules Rule Generation
- Variable Mutation Operations
- Knowledge Flow
- Example and Merits

- Rough-fuzzy Case Generation
- Granular Computing
- Fuzzy Granulation
- Mapping Dependency Rules to Cases
- Case Retrieval
- Examples and Merits
- Conclusions

SOFT COMPUTING (L. A. Zadeh)

- Aim
- To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making
- To find an approximate solution to an imprecisely/precisely formulated problem.

- Parking a Car
- Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem.
- ⇒ High precision carries a high cost.
- ⇒ The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.

- Soft Computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: Exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, and close resemblance with human-like decision making.
- Foundation for the conception and design of high MIQ (Machine IQ) systems.
- Provides flexible information processing capability for representation and evaluation of various real-life ambiguous and uncertain situations
- ⇒ Real World Computing
- It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.

- At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).

- Within Soft Computing FL, NC, GA, RS are

Complementary rather than Competitive

- Roles
- FL: the algorithms for dealing with imprecision and uncertainty
- NC: the machinery for learning and curve fitting
- GA: the algorithms for search and optimization
- RS: handling uncertainty arising from the granularity in the domain of discourse

Referring back to the example: Parking a Car

- Do we use any measurement and computation while performing the tasks in Soft Computing?
- We use the Computational Theory of Perceptions (CTP)

Computational Theory of Perceptions (CTP)

AI Magazine, 22(1), 73-84, 2001

Provides capability to compute and reason with

perception based information

- Examples: car parking, driving in a city, cooking a meal, summarizing a story
- Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations

- They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects
- Reflecting the finite ability of the sensory organs (and finally the brain) to resolve detail, perceptions are inherently imprecise

- Perceptions are fuzzy (F) granular (both fuzzy and granular)
- Boundaries of perceived classes are unsharp
- Values of attributes are granulated (a clump of indistinguishable points/objects)
- Examples
- Granules in age: very young, young, not so old, ...
- Granules in direction: slightly left, sharp right, ...

- F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis (based on predicate logic and probability theory)
- Main distinguishing feature: the assumption that perceptions are described by propositions drawn from a natural language.

Hybrid Systems

- Neuro-fuzzy
- Genetic-neural
- Fuzzy-genetic
- Fuzzy-neuro-genetic

Knowledge-based Systems

- Probabilistic reasoning
- Approximate reasoning
- Case based reasoning

Data Driven Systems

Machine Intelligence

- Neural network system
- Evolutionary computing

- Fuzzy logic
- Rough sets

Non-linear Dynamics

- Chaos theory
- Rescaled range analysis (wavelet)
- Fractal analysis

- Pattern recognition and learning

Machine Intelligence: a core concept for grouping various advanced technologies with Pattern Recognition and Learning

Pattern Recognition System (PRS)

- Measurement Space → Feature Space → Decision Space
- Uncertainties arise from deficiencies of the information available from a situation
- Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, or contradictory information in various stages of a PRS

Relevance of Fuzzy Sets in PR

- Representing linguistically phrased input

features for processing

- Representing multi-class membership of ambiguous

patterns

- Generating rules and inferences in linguistic form

- Extracting ill-defined image regions, primitives,

properties and describing relations among them as

fuzzy subsets

ANNs provide Natural Classifiers having

- Resistance to noise
- Tolerance to distorted patterns/images (ability to generalize)
- Superior ability to recognize overlapping pattern classes, classes with highly nonlinear boundaries, or partially occluded/degraded images
- Potential for parallel processing
- Non-parametric nature

Why GAs in PR ?

- Methods developed for Pattern Recognition and Image Processing are usually problem dependent.
- Many tasks involved in analyzing/identifying a pattern need appropriate parameter selection and efficient search in complex spaces to obtain optimal solutions.
- This makes the processes computationally intensive, with the possibility of losing the exact solution.
- GAs: efficient, adaptive and robust search processes, producing near-optimal solutions, with a large amount of implicit parallelism.
- GAs are an appropriate and natural choice for problems which need optimization of computation requirements, and robust, fast, close approximate solutions.

Relevance of FL, ANN, GAs Individually

to PR Problems is Established

In the late eighties scientists thought: Why NOT Integrations?

Fuzzy Logic + ANN; ANN + GA; Fuzzy Logic + ANN + GA; Fuzzy Logic + ANN + GA + Rough Set

Neuro-fuzzy hybridization is the most visible integration realized so far.

Why Fusion?

Fuzzy set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW). Neural network models attempt to emulate the architecture and information representation scheme of the human brain (HW).

NEURO-FUZZY Computing

(for a More Intelligent System)

- FUZZY SYSTEM with ANN used for learning and adaptation → NFS
- ANN with fuzzy sets used to augment its application domain → FNN

MERITS

- GENERIC
- APPLICATION SPECIFIC

Rough-Fuzzy Hybridization

- Fuzzy set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
- The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximation of a concept).
- Rough sets and fuzzy sets can be integrated to develop a model of uncertainty stronger than either.

Rough Fuzzy Hybridization A New Trend in

Decision Making, S. K. Pal and A. Skowron (eds),

Springer-Verlag, Singapore, 1999

Neuro-Rough Hybridization

- Rough set models are used to generate network parameters (weights).
- Roughness is incorporated in the inputs and output of networks for uncertainty handling, performance enhancement and extended domain of application.
- Networks consisting of rough neurons are used.

Neurocomputing, Spl. Issue on Rough-Neuro

Computing, S. K. Pal, W. Pedrycz, A. Skowron and

R. Swiniarsky (eds), vol. 36 (1-4), 2001.

- Neuro-Rough-Fuzzy-Genetic Hybridization
- Rough sets are used to extract domain knowledge in the form of linguistic rules → generates fuzzy knowledge-based networks → evolved using genetic algorithms.
- Integration offers several advantages like fast training, compact network and performance enhancement.

IEEE TNN, 9, 1203-1216, 1998

Incorporate Domain Knowledge using Rough Sets

- Before we describe
- Modular Evolutionary Rough-fuzzy MLP
- Rough-fuzzy Case Generation System
- We explain Data Mining and the significance
- of Pattern Recognition, Image Processing and
- Machine Intelligence.

One of the applications of Information Technology

that has drawn the attention of researchers is

DATA MINING Where Pattern Recognition/Image

Processing/Machine Intelligence are directly

related.

Why Data Mining ?

- The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
- With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.

- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image), huge (both in dimension and size) and scattered.
- The volume of such stored data is growing at a phenomenal rate.

- As a result, traditional ad hoc mixtures of

statistical techniques and data management tools

are no longer adequate for analyzing this vast

collection of data.

- Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database ⇒ Data Mining
- Data Mining + Knowledge Interpretation ⇒ Knowledge Discovery
- Knowledge Discovery: the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

Pattern Recognition, World Scientific, 2001

Data Mining (DM)

(Flow diagram:) Huge Raw Data → Data Cleaning, Data Condensation, Dimensionality Reduction, Data Wrapping/Description → Preprocessed Data → Machine Learning (mathematical model of data: Classification, Clustering, Rule Generation) → Data (Patterns) → Knowledge Extraction, Knowledge Evaluation → Knowledge Interpretation → Useful Knowledge

The full chain constitutes Knowledge Discovery in Database (KDD).

Data Mining Algorithm Components

- Model: Function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
- Preference criterion: Basis for preference of one model or set of parameters over another.
- Search algorithm: Specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, family of models, and preference criterion.

Why Growth of Interest ?

- Falling cost of large storage devices and

increasing ease of collecting data over networks. - Availability of Robust/Efficient machine learning

algorithms to process data. - Falling cost of computational power ? enabling

use of computationally intensive methods for data

analysis.

Example

- Financial Investment: stock indices and prices, interest rates, credit card data, fraud detection
- Health Care: various diagnostic information stored by hospital management systems
- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image) and huge (both in dimension and size).

Role of Fuzzy Sets

- Modeling of imprecise/qualitative knowledge
- Transmitting and handling uncertainties at various stages
- Supporting, to an extent, human-type reasoning in natural form

- Classification/ Clustering
- Discovering association rules (describing

interesting association relationship among

different attributes) - Inferencing
- Data summarization/condensation (abstracting the

essence from a large amount of information).

Role of ANN

- Adaptivity, robustness, parallelism, optimality
- Machinery for learning and curve fitting (Learns

from examples) - Initially, thought to be unsuitable for black

box nature no information available in symbolic

form (suitable for human interpretation) - Recently, embedded knowledge is extracted in the

form of symbolic rules making it

suitable for Rule generation.

Role of GAs

- Robust, parallel, adaptive search methods

suitable when the search space is large. - Used more in Prediction (P) than Description(D)

- D Finding human interpretable patterns

describing the data - P Using some variables or attributes in the

database to predict unknown/ future values of - other variables of interest.

Example Medical Data

- Numeric and textual information may be

interspersed - Different symbols can be used with same meaning
- Redundancy often exists
- Erroneous/misspelled medical terms are common
- Data is often sparsely distributed

- Robust preprocessing system is required to

extract any kind of knowledge from even

medium-sized medical data sets - The data must not only be cleaned of errors and

redundancy, but organized in a fashion that makes

sense for the problem

- So, we NEED
- Efficient
- Robust
- Flexible
- Machine Learning Algorithms
- ⇒ NEED for the Soft Computing Paradigm

Without Soft Computing, Machine Intelligence research remains incomplete.

Modular Neural Networks

Task: Split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: Divide and Conquer

- The approach involves
- Effective decomposition of the problem s.t. the subproblems can be solved with compact networks.
- Effective combination and training of the subnetworks s.t. there is gain in terms of total training time, network size and accuracy of solution.

Advantages

- Accelerated training
- The final solution network has more structured components
- Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network.
- The catastrophic interference problem of neural network learning (in case of overlapped regions) is reduced.

Classification Problem

- Split a k-class problem into k 2-class problems.
- Train one (or multiple) subnetwork modules for each 2-class problem.
- Concatenate the subnetworks s.t. intra-module links that have already evolved are unchanged, while inter-module links are initialized to a low value.
- Train the concatenated network s.t. the intra-module links (already evolved) are less perturbed, while the inter-module links are more perturbed.
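The concatenation step above can be sketched in a few lines: each module's evolved weight matrix is kept intact on the diagonal, while the new inter-module links start at small random values. The matrix sizes and values here are illustrative, not from the talk.

```python
import random

def concatenate(modules, eps=0.01, rng=random.Random(0)):
    """Block-diagonal placement of subnetwork weight matrices:
    intra-module weights are preserved unchanged, inter-module
    links are initialized to small random values in [-eps, eps]."""
    n = sum(len(m) for m in modules)
    W = [[rng.uniform(-eps, eps) for _ in range(n)] for _ in range(n)]
    i = 0
    for m in modules:
        k = len(m)
        for r in range(k):
            for c in range(k):
                W[i + r][i + c] = m[r][c]   # copy the evolved block
        i += k
    return W

# two "evolved" modules (a 2x2 and a 1x1 weight matrix, illustrative values)
W = concatenate([[[1.0, 0.5], [0.5, 1.0]], [[3.0]]])
```

In the final training phase only the small off-diagonal entries need to change substantially, which is what keeps the evolved intra-module structure intact.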

3-class problem → 3 (2-class problems)

(Figure: Class 1, Class 2 and Class 3 subnetworks are integrated; links with evolved values are preserved, new inter-module links are grown in the final training phase, yielding the final network.)

Modular Rough Fuzzy MLP

A modular network designed using four different Soft Computing tools. Basic network model: Fuzzy MLP. Rough set theory is used to generate crude decision rules representing each of the classes from the Discernibility Matrix. (There may be multiple rules for each class → multiple subnetworks per class.)

The knowledge-based subnetworks are concatenated to form a population of initial solution networks. The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to the intra-module links (already evolved) have low mutation probability, while inter-module links have high mutation probability.
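The variable mutation operator can be sketched as follows: each bit of the encoded network flips with a probability that depends on whether it encodes an intra-module or inter-module link. The probabilities and the bit encoding here are illustrative assumptions, not the talk's actual GA parameters.

```python
import random

def mutate(bits, is_intra, p_intra=0.05, p_inter=0.5, rng=random):
    """Flip each bit with a link-type-dependent probability:
    low for intra-module (already evolved) links, high for
    inter-module links."""
    return [1 - b if rng.random() < (p_intra if intra else p_inter) else b
            for b, intra in zip(bits, is_intra)]

bits     = [1, 0, 1, 1, 0, 0]
is_intra = [True, True, True, False, False, False]  # first three: intra-module
# extreme probabilities make the behaviour deterministic:
kept = mutate(bits, is_intra, p_intra=0.0, p_inter=1.0)
# intra bits are untouched, inter bits are all flipped
```

In practice p_intra is small but nonzero, so the evolved intra-module structure is perturbed only slightly while the search concentrates on the new inter-module links.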

Rough Sets

Z. Pawlak 1982, Int. J. Comp. Inf. Sci.

- Offer mathematical tools to discover hidden patterns in data.
- The fundamental principle of a rough set-based learning system is to discover redundancies and dependencies between the given features of the data to be classified.

- Approximate a given concept both from below and

from above, using lower and upper approximations.

- Rough set learning algorithms can be used to obtain rules in IF-THEN form from a decision table.
- Extract knowledge from a database (decision table w.r.t. objects and attributes → remove undesirable attributes (knowledge discovery) → analyze data dependency → minimum subset of attributes (reducts))

Rough Sets

(Figure: a set X with its lower approximation B̲X, upper approximation B̄X, and granules [x]_B in feature space Ω_B.)

[x]_B: the set of all points belonging to the same granule as the point x in the feature space Ω_B, i.e., the set of all points which are indiscernible from x in terms of feature subset B.

Approximations of the set X w.r.t. feature subset B:

- B-lower, B̲X: granules definitely belonging to X
- B-upper, B̄X: granules definitely and possibly belonging to X

If B̲X = B̄X, X is B-exact or B-definable; otherwise it is roughly definable.
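The lower and upper approximations above are easy to compute once the granules are formed. A minimal sketch, with an illustrative toy dataset (the object names, features and target set X are not from the talk):

```python
def granules(objects, B):
    """Partition objects into indiscernibility granules [x]_B:
    objects with identical values on every feature in B."""
    groups = {}
    for name, features in objects.items():
        key = tuple(features[b] for b in B)
        groups.setdefault(key, set()).add(name)
    return list(groups.values())

def lower_upper(objects, B, X):
    """B-lower: union of granules wholly inside X (definitely in X).
    B-upper: union of granules intersecting X (definitely or possibly in X)."""
    lower, upper = set(), set()
    for g in granules(objects, B):
        if g <= X:      # granule entirely contained in X
            lower |= g
        if g & X:       # granule overlaps X
            upper |= g
    return lower, upper

objects = {
    "x1": {"F1": 1, "F2": 0}, "x2": {"F1": 1, "F2": 0},
    "x3": {"F1": 0, "F2": 1}, "x4": {"F1": 1, "F2": 1},
}
low, up = lower_upper(objects, ["F1", "F2"], {"x1", "x4"})
# [x1]_B = {x1, x2} intersects X but is not inside it, so only {x4}
# survives into the lower approximation
```

Here X = {x1, x4} is roughly definable: the lower and upper approximations differ because x1 and x2 fall in the same granule.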

Rough Sets

- Uncertainty Handling (using lower and upper approximations)
- Granular Computing (using information granules)

Granular Computing: computation is performed using information granules and not the data points (objects)

⇒ Information compression ⇒ Computational gain

Information Granules and Rough Set Theoretic Rules

(Figure: feature space F1 × F2, each axis granulated into low, medium, high; a rule covers one block of granules.)

- A rule provides a crude description of the class using granules

Rough Set Rule Generation

Decision Table

Object  F1  F2  F3  F4  F5  Decision
x1      1   0   1   0   1   Class 1
x2      0   0   0   0   1   Class 1
x3      1   1   1   1   1   Class 1
x4      0   1   0   1   0   Class 2
x5      1   1   1   0   0   Class 2

Discernibility Matrix (c) for Class 1

Objects  x1   x2       x3
x1       φ    F1, F3   F2, F4
x2            φ        F1, F2, F3, F4
x3                     φ

Discernibility function

Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) AND (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4)

Similarly, discernibility functions are obtained considering the other objects.

Dependency Rules (AND-OR form)

Rules (No. of Classes: 2, No. of Features: 2)
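The discernibility computation above can be reproduced directly from the slide's decision table: each matrix entry is the set of features on which two Class-1 objects differ, and the discernibility function of x1 is the AND over its entries (OR within each entry).

```python
table = {  # object -> (F1, F2, F3, F4, F5), Class-1 rows of the decision table
    "x1": (1, 0, 1, 0, 1),
    "x2": (0, 0, 0, 0, 1),
    "x3": (1, 1, 1, 1, 1),
}
feats = ["F1", "F2", "F3", "F4", "F5"]

def discern(a, b):
    """Set of features on which objects a and b differ."""
    return {f for f, va, vb in zip(feats, table[a], table[b]) if va != vb}

# upper-triangle entries of the discernibility matrix:
m12 = discern("x1", "x2")   # differ on F1, F3
m13 = discern("x1", "x3")   # differ on F2, F4
m23 = discern("x2", "x3")   # differ on F1, F2, F3, F4

# discernibility function of x1: AND over its entries, OR within each
f_x1 = " ^ ".join("(" + " v ".join(sorted(e)) + ")" for e in (m12, m13))
```

Running this reproduces the matrix shown on the slide and gives f_x1 = (F1 v F3) ^ (F2 v F4), the function from which x1's dependency rule is derived.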

Crude Networks

(Figure: each rule yields a crude subnetwork with input nodes L1 ... H2. Populations 1, 2 and 3 are partially trained separately with GA1, GA2 and GA3 (Phase I). The partially trained subnetworks are concatenated, the new inter-module links given small random values. The final population is evolved with a GA (Phase II) with restricted mutation probability: low for intra-module links, high for inter-module links, producing the final trained network.)

Knowledge Flow in Modular Rough Fuzzy MLP

IEEE Trans. Knowledge Data Engg., 15(1), 14-25,

2003

(Figure: feature space with classes C1 and C2. Rough set rules R1 (for C1), R2 and R3 (for C2) are mapped to subnetworks SN1, SN2 and SN3. Partial training with an ordinary GA gives partially refined subnetworks; these are concatenated and the population of concatenated networks is evolved with the GA having the variable mutation operator (low mutation probability for intra-module links, high for inter-module links), yielding the final solution network.)


Speech Data: 3 Features, 6 Classes

Comparison in terms of classification accuracy, network size (no. of links) and training time (hrs, on a DEC Alpha Workstation @400 MHz) for:

1. MLP
2. Fuzzy MLP
3. Modular Fuzzy MLP
4. Rough Fuzzy MLP
5. Modular Rough Fuzzy MLP

Network Structure (IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003)

Modular Rough Fuzzy MLP: structured (fewer links)

Fuzzy MLP: unstructured (more links)

Histogram of weight values

Connectivity of the network obtained using Modular Rough Fuzzy MLP

Sample rules extracted for Modular Rough Fuzzy MLP

Rule Evaluation

- Accuracy
- Fidelity (number of times the network and rule base outputs agree)
- Confusion (should be restricted within a minimum number of classes)
- Coverage (a rule base with a smaller uncovered region, i.e., test set for which no rules are fired, is better)
- Rule base size (the smaller the number of rules, the more compact the rule base)
- Certainty (confidence of rules)
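Three of the measures above (accuracy, fidelity, coverage) can be computed from per-sample outputs. A minimal sketch, assuming per-sample lists where None marks a test sample on which no rule fires; the sample values are illustrative.

```python
def evaluate(true, net, rules):
    """true: ground-truth labels; net: network outputs;
    rules: rule-base outputs, None where no rule fires."""
    n = len(true)
    covered = [i for i in range(n) if rules[i] is not None]
    accuracy = sum(rules[i] == true[i] for i in covered) / n
    fidelity = sum(rules[i] == net[i] for i in covered) / n   # rule base agrees with network
    coverage = len(covered) / n                                # fraction with a fired rule
    return accuracy, fidelity, coverage

acc, fid, cov = evaluate(true=[0, 1, 1, 0],
                         net =[0, 1, 0, 0],
                         rules=[0, 1, 0, None])
# the last sample is uncovered; the third shows the rules agreeing
# with the network but not the ground truth (fidelity without accuracy)
```

Whether uncovered samples count against accuracy and fidelity (as here) or are excluded from the denominator is a design choice; the normalization used in the paper is not reproduced on the slide.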

Existing Rule Extraction Algorithms

- Subset: Searches over all possible combinations of input weights to a node of the trained network. Rules are generated from those subsets of links for which the sum of the weights exceeds the bias of that node.
- MofN: Instead of AND-OR rules, the method extracts rules of the form IF M out of N inputs are high THEN Class I.
- X2R: Unlike the previous two methods, which consider the links of a network, X2R generates rules from the input-output mapping implemented by the network.
- C4.5: Rule generation algorithm based on decision trees.

IEEE Trans. Knowledge Data Engg., 15(1), 14-25,

2003

Comparison of rules obtained for speech data, in terms of number of rules, confusion, and CPU time.

Case Based Reasoning (CBR)

- Cases: some typical situations already experienced by the system.
- A case is a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.

- CBR involves
- adapting old solutions to meet new demands
- using old cases to explain new situations or to
- justify new solutions
- reasoning from precedents to interpret new

situations.

- ⇒ A CBR system learns and becomes more efficient as a byproduct of its reasoning activity.

- Examples: medical diagnosis and law interpretation, where the knowledge available is incomplete and/or the evidence is sparse.

- Unlike traditional knowledge-based system, case

based system operates through a process of - remembering one or a small set of concrete

instances or cases and - basing decisions on comparisons between the

new situation and the old ones.

- Case Selection → cases belong to the set of examples encountered.
- Case Generation → constructed cases need not be any of the examples.

Rough Sets

- Uncertainty Handling (using lower and upper approximations)
- Granular Computing (using information granules)

IEEE Trans. Knowledge Data Engg., to appear

Granular Computing and Case Generation

- Information Granules A group of similar objects

clubbed together by an indiscernibility relation. - Granular Computing Computation is performed

using information granules and not the data

points (objects) - Information compression
- Computational gain

- Cases: informative patterns (prototypes) characterizing the problems.
- In the rough set theoretic framework: Cases ⇔ Information Granules
- In the rough-fuzzy framework: Cases ⇔ Fuzzy Information Granules

Characteristics and Merits

- Cases are cluster granules, not sample points
- Each case involves only a reduced number of relevant features, with variable size
- Less storage requirement
- Fast retrieval
- Suitable for mining data with large dimension and size

- How to Achieve?
- Fuzzy sets help in linguistic representation of

patterns, providing a fuzzy granulation of the

feature space - Rough sets help in generating dependency rules to

model informative/representative regions in the

granulated feature space. - Fuzzy membership functions corresponding to the

representative regions are stored as Cases.

Fuzzy (F)-Granulation

(Figure: three π membership functions μ_low, μ_medium, μ_high along feature j, with centers c_L, c_M, c_H and radii λ_L, λ_M, crossing at membership value 0.5.)

For each feature Fj, the centers and radii of the three π-functions are:

c_low(Fj) = m_jl
c_medium(Fj) = m_j
c_high(Fj) = m_jh
λ_low(Fj) = c_medium(Fj) - c_low(Fj)
λ_high(Fj) = c_high(Fj) - c_medium(Fj)
λ_medium(Fj) = 0.5 (c_high(Fj) - c_low(Fj))

where m_j = mean of the pattern points along the jth axis; m_jl = mean of the points in the range [Fj_min, m_j); m_jh = mean of the points in the range (m_j, Fj_max]; Fj_max, Fj_min = maximum and minimum values of feature Fj.

- An n-dimensional pattern Fi = [Fi1, Fi2, ..., Fin] is represented as a 3n-dimensional fuzzy linguistic pattern [Pal & Mitra 1992]: Fi = [μ_low(Fi1)(Fi), ..., μ_high(Fin)(Fi)]
- Set each membership value to 1 or 0 according to whether it is higher or lower than 0.5
- ⇒ Binary 3n-dimensional patterns are obtained
- (Compute the frequency nki of occurrence of the binary patterns. Select those patterns having frequency above a threshold Tr, for noise removal.)
- Generate a decision table consisting of the binary patterns.
- Extract dependency rules corresponding to informative regions (blocks), e.g., class1 ← L1 ∧ M2
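The granulation above can be sketched in Python. The exact form of the π-function is not shown on the slide; Zadeh's standard π-function is assumed here, and the data values are illustrative.

```python
def pi(x, c, lam):
    """Standard pi membership function with center c and radius lam
    (an assumption standing in for the talk's pi-function)."""
    d = abs(x - c)
    if d <= lam / 2:
        return 1 - 2 * (d / lam) ** 2
    if d <= lam:
        return 2 * (1 - d / lam) ** 2
    return 0.0

def granulate(values):
    """Centers c_low = m_jl, c_medium = m_j, c_high = m_jh and the
    radii from the slide, for one feature axis."""
    m = sum(values) / len(values)
    low_pts = [v for v in values if v < m]
    high_pts = [v for v in values if v > m]
    c_low = sum(low_pts) / len(low_pts)
    c_high = sum(high_pts) / len(high_pts)
    return ((c_low, m - c_low),                 # (center, radius) of "low"
            (m, 0.5 * (c_high - c_low)),        # "medium"
            (c_high, c_high - m))               # "high"

vals = [1, 2, 3, 7, 8, 9]       # illustrative values of one feature
low, med, high = granulate(vals)
# one feature value -> its 3 memberships (mu_low, mu_medium, mu_high)
mus = [pi(2, *low), pi(2, *med), pi(2, *high)]
```

Applying this per feature turns an n-dimensional pattern into the 3n-dimensional fuzzy linguistic pattern described above; thresholding the memberships at 0.5 then yields the binary patterns used to build the decision table.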

Rough Set Rule Generation

Decision Table

Object  F1  F2  F3  F4  F5  Decision
x1      1   0   1   0   1   Class 1
x2      0   0   0   0   1   Class 1
x3      1   1   1   1   1   Class 1
x4      0   1   0   1   0   Class 2
x5      1   1   1   0   0   Class 2

Discernibility Matrix (c) for Class 1

Objects  x1   x2       x3
x1       φ    F1, F3   F2, F4
x2            φ        F1, F2, F3, F4
x3                     φ

Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) AND (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4). Similarly for the other objects.

Dependency Rules (AND-OR form)

Mapping Dependency Rules to Cases

- Each conjunction, e.g., L1 ∧ M2, represents a region (block).
- For each conjunction, store as a case:
- the parameters of the fuzzy membership functions corresponding to the linguistic variables that occur in the conjunction (thus, multiple cases may be generated from one rule), and
- the class information.

- Note: all features may not occur in a rule, so cases may be represented by different, reduced numbers of features.
- Structure of a Case: parameters of the membership functions (center, radius), class information.

Example (IEEE Trans. Knowledge Data Engg., to appear)

(Figure: feature space F1 × F2 with two case regions, Case 1 centered near (0.1, 0.9) and Case 2 near (0.7, 0.2); the parameters shown are those of the fuzzy linguistic sets low, medium, high.)

Dependency Rules and Cases Obtained

Case 1: Feature No. 1, fuzzset (L): c = 0.1, λ = 0.5; Feature No. 2, fuzzset (H): c = 0.9, λ = 0.5; Class 1
Case 2: Feature No. 1, fuzzset (H): c = 0.7, λ = 0.4; Feature No. 2, fuzzset (L): c = 0.2, λ = 0.5; Class 2

Case Retrieval

- The similarity sim(x, c) between a pattern x and a case c is defined in terms of the membership values, where n = number of features present in case c, and μ = the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j.
- For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
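Retrieval over the two cases obtained above can be sketched as follows. The slide does not reproduce the exact sim(x, c) formula, so a plausible stand-in is assumed: the average π-membership of the pattern over the features the case retains.

```python
def pi(x, c, lam):
    """Standard pi membership (an assumed form, as before)."""
    d = abs(x - c)
    if d <= lam / 2:
        return 1 - 2 * (d / lam) ** 2
    if d <= lam:
        return 2 * (1 - d / lam) ** 2
    return 0.0

# the two cases from the slide: (center c, radius lam) per retained feature
cases = [
    {"params": {1: (0.1, 0.5), 2: (0.9, 0.5)}, "cls": 1},  # Case 1
    {"params": {1: (0.7, 0.4), 2: (0.2, 0.5)}, "cls": 2},  # Case 2
]

def sim(x, case):
    """Average membership over the n features present in the case
    (assumed similarity, standing in for the paper's sim(x, c))."""
    ps = case["params"]
    return sum(pi(x[j], c, lam) for j, (c, lam) in ps.items()) / len(ps)

def classify(x):
    """1-NN in case space: assign the class of the most similar case."""
    return max(cases, key=lambda c: sim(x, c))["cls"]

label = classify({1: 0.15, 2: 0.85})   # lies inside Case 1's granule
```

Because each case stores only its own (possibly reduced) feature subset, the sum runs over just those features, which is what makes retrieval fast for high-dimensional data.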

Experimental Results and Comparisons

- Forest Covertype: Contains 10 dimensions, 7 classes and 586,012 samples. It is a Geographical Information System dataset representing forest cover types (pine/fir, etc.) of the USA. The variables are cartographic and remote sensing measurements. All the variables are numeric.

- Multiple Features: This dataset consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. There are 2000 patterns in total, 649 features (all numeric) and 10 classes.
- Iris: The dataset contains 150 instances, 4 features and 3 classes of Iris flowers. The features are numeric.

- Some Existing Case Selection Methods
- k-NN based
- Condensed nearest neighbor (CNN),
- Instance based learning (e.g., IB3),
- Instance based learning with feature weighting

(e.g., IB4). - Fuzzy logic based
- Neuro-fuzzy based.

Algorithms Compared

- Instance based learning algorithm IB3 [Aha 1991]
- Instance based learning algorithm IB4 [Aha 1992] (reduced features). The feature weighting is learned by random hill climbing in IB4; a specified number of features having high weights is selected.
- Random case selection.

Evaluation in terms of

- 1-NN classification accuracy using the cases (training set: 10% for case generation, test set: 90%)
- Number of cases stored in the case base
- Average number of features required to store a case (navg)
- CPU time required for case generation (tgen)
- Average CPU time required to retrieve a case (tret) (on a Sun UltraSparc Workstation @350 MHz)

Iris Flowers: 4 features, 3 classes, 150 samples

Number of cases 3 (for all methods)

Forest Cover Types: 10 features, 7 classes, 586,012 samples

Number of cases 545 (for all methods)

Hand Written Numerals: 649 features, 10 classes, 2000 samples

Number of cases 50 (for all methods)

For the same number of cases:

- Accuracy: the proposed method is much superior to random selection and IB4, and close to IB3.
- Average number of features stored: the proposed method stores much less than the original data dimension.
- Case generation time: much less for the proposed method compared to IB3 and IB4.
- Case retrieval time: several orders less for the proposed method compared to IB3 and random selection; also less than IB4.

- Conclusions
- The relation between Soft Computing, Machine Intelligence and Pattern Recognition is explained.
- The emergence of Data Mining and Knowledge Discovery from the PR point of view is explained.
- The significance of hybridization in the Soft Computing paradigm is illustrated.

- The modular concept enhances performance, accelerates training and makes the network structured with fewer links.
- The rules generated are superior to other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.

- Rough sets are used for generating information granules.
- Fuzzy sets provide efficient granulation of the feature space (F-granulation).
- Reduced and variable feature subset representation of cases is a unique feature of the scheme.
- The rough-fuzzy case generation method is suitable for CBR systems involving datasets large in both dimension and size.

- Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear)
- Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
- Significance in the Computational Theory of Perception (CTP)

Thank You!!