A New Algorithm of Fuzzy Clustering for Data with Uncertainties: Fuzzy c-Means for Data with Tolerance

1
A New Algorithm of Fuzzy Clustering for Data with
Uncertainties: Fuzzy c-Means for Data with
Tolerance Defined as Hyper-rectangles
  • ENDO Yasunori
  • MIYAMOTO Sadaaki

2
Outline
  • Background and goal of our study
  • The concept of tolerance
  • New clustering algorithms for data with tolerance
  • Numerical examples
  • Conclusion and future work

3
Introduction
  • Clustering is a kind of unsupervised automatic
    classification. Classification methods divide a
    set of data into several groups.
  • Many clustering algorithms have been proposed, and
    fuzzy c-means (FCM) is the most typical method of
    fuzzy clustering.
  • In this presentation, I would like to talk about
    one way to handle uncertainty in data and to
    present some new clustering algorithms based on
    FCM.

4
Uncertainty
  • In clustering, each datum in a real space is
    regarded as a point in a pattern space and
    classified.
  • However, data with uncertainty should often be
    represented not by a point but by a set.

5
Three examples of data with uncertainty
  • Example 1: Data has errors

When a spring scale with a measurement accuracy of
±5 g reads 450 g, the actual value lies in the
interval from 445 g to 455 g.
6
Three examples of data with uncertainty
  • Example 2: Data has ranges

An apple has not just one color but many, so the
colors of an apple cannot be represented as a
single point in color space.
7
Three examples of data with uncertainty
  • Example 3: Missing values exist in data

In a social survey, if there are unanswered items
in the questionnaire, those items are handled as
missing values.
8
Background
  • In the past, these uncertainties of data have
    been represented as interval data, and several
    algorithms for interval data have been proposed
    (e.g., Takata and Miyamoto [1]).
  • In those algorithms, dissimilarity between
    interval data is defined by particular measures,
    e.g., the nearest-neighbor, furthest-neighbor, or
    Hausdorff distance.
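The three interval measures named above can be sketched as follows. This is a minimal illustration, assuming each interval datum is an axis-aligned box stored as one (low, high) pair per coordinate. The names `nearest_neighbor`, `furthest_neighbor` and `hausdorff` are hypothetical, and the Hausdorff variant sums squared one-dimensional Hausdorff distances per coordinate, which is an assumption that may differ from the definition used in [1].

```python
from typing import List, Tuple

# Hypothetical representation: a datum with interval uncertainty is a
# hyper-rectangle, stored as one (low, high) pair per coordinate.
Box = List[Tuple[float, float]]


def nearest_neighbor(a: Box, b: Box) -> float:
    """Squared Euclidean distance between the closest points of boxes a and b."""
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        gap = max(0.0, b_lo - a_hi, a_lo - b_hi)  # 0 when the intervals overlap
        total += gap ** 2
    return total


def furthest_neighbor(a: Box, b: Box) -> float:
    """Squared Euclidean distance between the farthest points of boxes a and b."""
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        total += max(abs(a_hi - b_lo), abs(b_hi - a_lo)) ** 2
    return total


def hausdorff(a: Box, b: Box) -> float:
    """Assumed variant: sum over coordinates of squared 1-D Hausdorff distances."""
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        total += max(abs(a_lo - b_lo), abs(a_hi - b_hi)) ** 2
    return total


a = [(0.0, 2.0)]
b = [(1.0, 4.0)]
print(nearest_neighbor(a, b), furthest_neighbor(a, b), hausdorff(a, b))  # 0.0 16.0 4.0
```

Every value is computed from the interval endpoints alone, which illustrates why these measures only see the boundary of the interval data.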

9
Background
  • The interval methodology has the following
    disadvantages:
  • We have to introduce a particular measure, but
    how do we select an adequate one?
  • Only the boundaries of interval data are
    actually handled by these measures.

10
Goal of our study
  • From the viewpoint of a strict optimization
    problem, we handle uncertainty as tolerance and
    consider a new type of optimization problem for
    data with tolerance.
  • Moreover, we construct new clustering algorithms
    within this optimization framework. In these
    algorithms, dissimilarity between the target data
    and the cluster centers is defined by the L1 norm
    or the squared L2 norm.
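For concreteness, the two dissimilarities can be written as below. The notation is our assumption, since the slide gives no formulas: $x_k$ is the $k$-th datum, $\varepsilon_k$ its tolerance vector, and $v_i$ the $i$-th cluster center, all in $\mathbb{R}^p$.

```latex
d^{(1)}_{ki} = \| x_k + \varepsilon_k - v_i \|_1
             = \sum_{j=1}^{p} | x_{kj} + \varepsilon_{kj} - v_{ij} |,
\qquad
d^{(2)}_{ki} = \| x_k + \varepsilon_k - v_i \|_2^2
             = \sum_{j=1}^{p} ( x_{kj} + \varepsilon_{kj} - v_{ij} )^2 .
```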

11
Features of proposed algorithms
  • The tolerance methodology has the following
    advantages:
  • Particular distances between intervals do not
    have to be defined.
  • Not only the boundary but the whole region of
    the tolerance is handled.
  • Our discussion becomes mathematically simpler
    than when interval distances are used.

12
The concept of tolerance
13
The concept of tolerance
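  • We define $x_k = (x_{k1}, \dots, x_{kp})^T \in \mathbb{R}^p$
    as the $k$-th datum in a $p$-dimensional vector space
    $\mathbb{R}^p$, and $\varepsilon_k = (\varepsilon_{k1}, \dots, \varepsilon_{kp})^T$
    as the tolerance vector of $x_k$.
  • The constraint condition is given by the following
    expression:
    $|\varepsilon_{kj}| \le \kappa_{kj} \quad (j = 1, \dots, p)$,
    where $\kappa_{kj} \ge 0$ is the tolerance of $x_k$ in
    the $j$-th dimension.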

14
An example of tolerance vector on R
Tolerance: decided before the calculation.
Tolerance vector: calculated in the
algorithm.
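A minimal sketch of the tolerance concept, assuming the hyper-rectangle constraint $|\varepsilon_{kj}| \le \kappa_{kj}$ on each coordinate, where the tolerance $\kappa_k$ is fixed beforehand and the tolerance vector $\varepsilon_k$ is what an algorithm computes. The helper names are hypothetical; the example reuses the spring-scale reading from Example 1.

```python
from typing import List, Tuple


def tolerance_box(x: List[float], kappa: List[float]) -> List[Tuple[float, float]]:
    """Region {x + eps : |eps_j| <= kappa_j} as one (low, high) pair per coordinate."""
    return [(xj - kj, xj + kj) for xj, kj in zip(x, kappa)]


def satisfies_tolerance(eps: List[float], kappa: List[float]) -> bool:
    """Check the assumed hyper-rectangle constraint |eps_j| <= kappa_j for every j."""
    return all(abs(e) <= k for e, k in zip(eps, kappa))


def project_to_tolerance(eps: List[float], kappa: List[float]) -> List[float]:
    """Clip each component of eps into [-kappa_j, kappa_j] (nearest feasible point)."""
    return [max(-k, min(k, e)) for e, k in zip(eps, kappa)]


# Spring-scale example: reading 450 g with accuracy +/- 5 g.
print(tolerance_box([450.0], [5.0]))        # [(445.0, 455.0)]
print(satisfies_tolerance([-6.0], [5.0]))   # False
print(project_to_tolerance([-6.0], [5.0]))  # [-5.0]
```

Because the constraint is separate for each coordinate, the feasible region is a box, and projecting onto it is just per-coordinate clipping.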
15
Comparison of Tolerance and Other Measures
[Figure: Nearest-neighbor method, Furthest-neighbor method, Proposed method]
16
Proposed algorithms
17
Conventional fuzzy c-means
  • sFCM: standard fuzzy c-means
  • $c$: number of clusters
  • $n$: number of data
  • $p$: number of dimensions of the pattern space
  • $u_{ik}$: membership grade
  • $x_k$: data
  • $v_i$: cluster center

18
Conventional fuzzy c-means
19
Optimization problem sFCM-L2
  • Objective function
  • Membership grade U
  • Cluster center V
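For reference, the textbook form of standard fuzzy c-means with the squared L2 norm is shown below, with $c$ clusters, $n$ data $x_k$, memberships $u_{ik}$, centers $v_i$, and a fuzzifier $m > 1$ that is not visible on the slide and is assumed here; the slide's own notation may differ. The last line gives the well-known optimal solutions used in the alternating updates.

```latex
J_{\mathrm{sFCM}}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^{m} \, \| x_k - v_i \|_2^2,
\qquad m > 1,
\\
U \in M = \Big\{ (u_{ik}) : u_{ik} \in [0, 1],\ \sum_{i=1}^{c} u_{ik} = 1 \ (k = 1, \dots, n) \Big\},
\\
u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\| x_k - v_i \|_2^2}{\| x_k - v_j \|_2^2} \right)^{\frac{1}{m-1}} \right)^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} \, x_k}{\sum_{k=1}^{n} (u_{ik})^{m}} .
```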

20
Algorithm sFCM-L2
  • Step1
  • Set the initial value of V .
  • Step2
  • Update U by its optimal solution.
  • Step3
  • Update V by its optimal solution.
  • Step4
  • If the solution has converged, stop.
  • Otherwise, go back to Step2.
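The four steps above can be sketched in plain Python as follows. This is an illustrative implementation of standard FCM under stated assumptions: data and centers are lists of floats, `U[i][k]` is the grade of datum k in cluster i, the fuzzifier `m` defaults to 2, and the convergence test (largest change of a center coordinate below `tol`) is assumed because the slide's condition was not preserved. All function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_u(X: Matrix, V: Matrix, m: float) -> Matrix:
    """Membership update of standard FCM; U[i][k] is the grade of datum k in cluster i."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        d = [sq_dist(X[k], V[i]) for i in range(c)]
        zeros = [i for i in range(c) if d[i] == 0.0]
        if zeros:  # datum coincides with center(s): split its membership among them
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
    return U


def update_v(X: Matrix, U: Matrix, m: float) -> Matrix:
    """Cluster centers as (u_ik)^m-weighted means of the data."""
    V = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        V.append([sum(wk * xk[j] for wk, xk in zip(w, X)) / total
                  for j in range(len(X[0]))])
    return V


def sfcm_l2(X: Matrix, c: int, m: float = 2.0, init: Optional[Matrix] = None,
            tol: float = 1e-9, max_iter: int = 1000,
            seed: int = 0) -> Tuple[Matrix, Matrix]:
    # Step 1: initial centers, either given or c data points chosen at random.
    if init is None:
        V = [list(x) for x in random.Random(seed).sample(X, c)]
    else:
        V = [list(v) for v in init]
    for _ in range(max_iter):
        U = update_u(X, V, m)        # Step 2
        V_new = update_v(X, U, m)    # Step 3
        # Step 4 (assumed test): stop once no center coordinate moves by tol or more.
        moved = max(abs(a - b) for v, w in zip(V, V_new) for a, b in zip(v, w))
        V = V_new
        if moved < tol:
            break
    return update_u(X, V, m), V


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
U, V = sfcm_l2(X, c=2, init=[[0.0, 0.0], [10.0, 10.0]])
print([[round(t, 3) for t in v] for v in V])
```

A datum that coincides with a center would make the membership formula divide by zero; the sketch shares that datum's membership among the coinciding centers, which is a common convention rather than something stated on the slide.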

21
Proposed algorithms
The constraint condition
23
Optimization problem sFCMT-L2
  • Objective function
  • Membership grade U
  • Cluster center V

24
Optimization problem sFCMT-L2
  • Tolerance vector E
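Assuming the objective simply evaluates standard FCM at $x_k + \varepsilon_k$ and uses the hyper-rectangle constraint $|\varepsilon_{kj}| \le \kappa_{kj}$, the update for the tolerance vector follows from a one-dimensional argument: for fixed U and V, each coordinate $\varepsilon_{kj}$ appears in a convex quadratic, so the constrained minimizer is the unconstrained one clipped into $[-\kappa_{kj}, \kappa_{kj}]$. This is our own derivation under that assumption, not a transcription of the slide.

```latex
J(U, V, E) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^{m} \, \| x_k + \varepsilon_k - v_i \|_2^2
\quad \text{s.t.} \quad \sum_{i=1}^{c} u_{ik} = 1, \quad |\varepsilon_{kj}| \le \kappa_{kj},
\\
\frac{\partial}{\partial \varepsilon_{kj}} \sum_{i=1}^{c} (u_{ik})^{m} (x_{kj} + \varepsilon_{kj} - v_{ij})^2 = 0
\;\Longrightarrow\;
\bar{\varepsilon}_{kj} = \frac{\sum_{i=1}^{c} (u_{ik})^{m} v_{ij}}{\sum_{i=1}^{c} (u_{ik})^{m}} - x_{kj},
\\
\varepsilon_{kj} = \max\!\big( -\kappa_{kj}, \min( \kappa_{kj}, \bar{\varepsilon}_{kj} ) \big),
\qquad
v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} (x_k + \varepsilon_k)}{\sum_{k=1}^{n} (u_{ik})^{m}} .
```

Under the same assumption, the update for U keeps the standard FCM formula with $\| x_k + \varepsilon_k - v_i \|_2^2$ as the dissimilarity.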

25
Algorithm sFCMT-L2
  • Step1
  • Set the initial values of V and E.
  • Step2
  • Update U by its optimal solution.
  • Step3
  • Update V by its optimal solution.
  • Step4
  • Update E by its optimal solution.
  • Step5
  • If the solution has converged, stop.
  • Otherwise, go back to Step2.
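Assuming the objective of standard FCM evaluated at $x_k + \varepsilon_k$ with the box constraint $|\varepsilon_{kj}| \le \kappa_{kj}$, the five steps above can be sketched as below. Each step minimizes the objective over one block of variables; `update_e` uses the clip-the-unconstrained-minimizer rule, which is valid because the objective is a convex quadratic in each coordinate of $\varepsilon_k$. The function names, the zero initial E, and the stopping rule are assumptions.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shifted(X: Matrix, E: Matrix) -> Matrix:
    """Points x_k + eps_k that enter the dissimilarity."""
    return [[x + e for x, e in zip(xk, ek)] for xk, ek in zip(X, E)]


def update_u(Y: Matrix, V: Matrix, m: float) -> Matrix:
    c, n = len(V), len(Y)
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        d = [sq_dist(Y[k], V[i]) for i in range(c)]
        zeros = [i for i in range(c) if d[i] == 0.0]
        if zeros:  # shifted datum reaches a center exactly
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
    return U


def update_v(Y: Matrix, U: Matrix, m: float) -> Matrix:
    V = []
    for row in U:
        w = [u ** m for u in row]
        V.append([sum(wk * yk[j] for wk, yk in zip(w, Y)) / sum(w)
                  for j in range(len(Y[0]))])
    return V


def update_e(X: Matrix, U: Matrix, V: Matrix, kappa: Matrix, m: float) -> Matrix:
    """Per coordinate: unconstrained minimizer, then clip into [-kappa_kj, kappa_kj]."""
    E = []
    for k, xk in enumerate(X):
        w = [U[i][k] ** m for i in range(len(V))]
        row = []
        for j, xkj in enumerate(xk):
            target = sum(wi * V[i][j] for i, wi in enumerate(w)) / sum(w) - xkj
            row.append(max(-kappa[k][j], min(kappa[k][j], target)))
        E.append(row)
    return E


def sfcmt_l2(X: Matrix, kappa: Matrix, c: int, m: float = 2.0,
             init: Optional[Matrix] = None, tol: float = 1e-9,
             max_iter: int = 1000, seed: int = 0) -> Tuple[Matrix, Matrix, Matrix]:
    if init is None:
        V = [list(x) for x in random.Random(seed).sample(X, c)]
    else:
        V = [list(v) for v in init]
    E = [[0.0] * len(xk) for xk in X]            # Step 1: start with no shift
    for _ in range(max_iter):
        Y = shifted(X, E)
        U = update_u(Y, V, m)                     # Step 2
        V_new = update_v(Y, U, m)                 # Step 3
        E_new = update_e(X, U, V_new, kappa, m)   # Step 4
        # Step 5 (assumed test): largest change in V and E below tol.
        change = max(abs(a - b) for P, Q in ((V, V_new), (E, E_new))
                     for p, q in zip(P, Q) for a, b in zip(p, q))
        V, E = V_new, E_new
        if change < tol:
            break
    return update_u(shifted(X, E), V, m), V, E


X = [[0.0], [1.0], [9.0], [10.0]]
kappa = [[0.5] for _ in X]
U, V, E = sfcmt_l2(X, kappa, c=2, init=[[0.0], [10.0]])
print(V, E)
```

Setting every tolerance to zero keeps E at zero, so the sketch reduces to standard FCM. In the example each point may move by at most 0.5, so each pair of points is pulled onto a common center, near 0.5 and 9.5 respectively.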

26
Outline of proposed algorithms
  • Step1
  • Set the initial values of V and E.
  • Step2
  • Update U by Eq.A.
  • Step3
  • Update V by Eq.B.
  • Step4
  • Update E by Eq.C.
  • Step5
  • If the solution has converged, stop.
  • Otherwise, go back to Step2.
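Since the proposed algorithms share the same five steps and differ only in Eqs. A to C, one way to organize an implementation is a driver that takes the three updates as functions. The sketch below does that and plugs in one instance with the L1 norm, assuming the objective $\sum_i \sum_k (u_{ik})^m \|x_k + \varepsilon_k - v_i\|_1$ with $|\varepsilon_{kj}| \le \kappa_{kj}$. For that assumed objective, centers become weighted medians and each tolerance component becomes a clipped weighted median; these updates are our derivation, and names such as `eq_a_l1` are hypothetical, not the equation numbers of the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]


def weighted_median(values: List[float], weights: List[float]) -> float:
    """Smallest value whose cumulative weight reaches half the total; minimizes sum w|y - v|."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def memberships(D: Matrix, m: float) -> Matrix:
    """FCM membership formula for an arbitrary dissimilarity D[i][k] >= 0."""
    c, n = len(D), len(D[0])
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        zeros = [i for i in range(c) if D[i][k] == 0.0]
        for i in range(c):
            if zeros:
                U[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
            else:
                U[i][k] = 1.0 / sum((D[i][k] / D[j][k]) ** (1.0 / (m - 1.0)) for j in range(c))
    return U


# Hypothetical L1 instances of "Eq. A", "Eq. B" and "Eq. C" (our derivation).
def eq_a_l1(X, E, V, m):
    D = [[sum(abs(x + e - v) for x, e, v in zip(xk, ek, vi)) for xk, ek in zip(X, E)]
         for vi in V]
    return memberships(D, m)


def eq_b_l1(X, E, U, m):
    p = len(X[0])
    return [[weighted_median([xk[j] + ek[j] for xk, ek in zip(X, E)], [u ** m for u in row])
             for j in range(p)] for row in U]


def eq_c_l1(X, U, V, kappa, m):
    E = []
    for k, xk in enumerate(X):
        w = [U[i][k] ** m for i in range(len(V))]
        E.append([max(-kappa[k][j], min(kappa[k][j],
                      weighted_median([vi[j] - xk[j] for vi in V], w)))
                  for j in range(len(xk))])
    return E


def fcm_with_tolerance(X, kappa, V, eq_a: Callable, eq_b: Callable, eq_c: Callable,
                       m=2.0, tol=1e-9, max_iter=1000):
    E = [[0.0] * len(xk) for xk in X]                 # Step 1 (initial V is given)
    for _ in range(max_iter):
        U = eq_a(X, E, V, m)                          # Step 2: Eq. A
        V_new = eq_b(X, E, U, m)                      # Step 3: Eq. B
        E_new = eq_c(X, U, V_new, kappa, m)           # Step 4: Eq. C
        change = max(abs(a - b) for P, Q in ((V, V_new), (E, E_new))
                     for p, q in zip(P, Q) for a, b in zip(p, q))
        V, E = V_new, E_new
        if change < tol:                              # Step 5 (assumed test)
            break
    return eq_a(X, E, V, m), V, E


X = [[0.0], [1.0], [9.0], [10.0]]
kappa = [[0.5] for _ in X]
U, V, E = fcm_with_tolerance(X, kappa, [[0.0], [10.0]], eq_a_l1, eq_b_l1, eq_c_l1)
print(V, E)  # [[0.0], [10.0]] [[0.0], [-0.5], [0.5], [0.0]]
```

Swapping in a different set of three update functions changes the algorithm without touching the loop, which mirrors how the outline above is shared by all proposed algorithms.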

27
Proposed algorithms
The numbers in the above table correspond to the
equation numbers in our paper in the proceedings.
28
Numerical examples
29
Test data sFCMT-L2
30
Diagnosis of heart disease data
  • The heart disease database has five attributes.
    The result of diagnosis (presence or absence) is
    known. The number of data is 866, of which 560
    contain missing values in some attributes.

31
Diagnosis of heart disease data
  • In all algorithms, the convergence condition is
  • where is the previous optimal solution.
  • In addition, in sFCM.
  • To handle missing values as tolerance, we define
    them as follows.
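The slide's definition for missing values did not survive extraction. One natural choice, shown here purely as an assumption, lets a missing value range over the observed span of its attribute: the datum is placed at the midpoint of that span and given a tolerance of half its width, while observed values get zero tolerance. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def missing_to_tolerance(
    rows: List[List[Optional[float]]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Replace each missing value (None) by the midpoint of its attribute's observed
    range, with tolerance equal to half that range; observed values get tolerance 0.
    Assumes every attribute has at least one observed value."""
    p = len(rows[0])
    lo = [min(r[j] for r in rows if r[j] is not None) for j in range(p)]
    hi = [max(r[j] for r in rows if r[j] is not None) for j in range(p)]
    X, kappa = [], []
    for r in rows:
        X.append([(lo[j] + hi[j]) / 2.0 if v is None else v for j, v in enumerate(r)])
        kappa.append([(hi[j] - lo[j]) / 2.0 if v is None else 0.0 for j, v in enumerate(r)])
    return X, kappa


X, kappa = missing_to_tolerance([[1.0, None], [3.0, 4.0], [None, 8.0]])
print(X)      # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
print(kappa)  # [[0.0, 2.0], [0.0, 0.0], [1.0, 0.0]]
```

With this choice the tolerance box of a missing coordinate covers the whole observed range, so the clustering itself decides where in that range the value effectively sits.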

32
Diagnosis of heart disease data
  • We classify all 866 data, including those with
    missing values, by using the proposed algorithms,
    and only the 306 data without missing values by
    using the conventional algorithms.
  • In each algorithm, we give initial cluster
    centers at random and classify the data set into
    two clusters. We run this trial 1000 times and
    show the average ratio of correctly classified
    results.
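The evaluation protocol can be sketched as follows, assuming that each datum is assigned to its maximum-membership cluster and that the two clusters are matched to the two diagnosis classes (presence, absence) in whichever way gives more correct answers; the slide does not spell out either choice. Function names are hypothetical.

```python
from itertools import permutations
from typing import List


def hard_labels(U: List[List[float]]) -> List[int]:
    """Defuzzify: assign datum k to the cluster with the largest U[i][k]."""
    c, n = len(U), len(U[0])
    return [max(range(c), key=lambda i: U[i][k]) for k in range(n)]


def correct_ratio(true_labels: List[int], cluster_labels: List[int], c: int) -> float:
    """Fraction correct under the best one-to-one mapping of clusters to classes."""
    best = 0
    for perm in permutations(range(c)):
        hits = sum(1 for t, z in zip(true_labels, cluster_labels) if perm[z] == t)
        best = max(best, hits)
    return best / len(true_labels)


def average_ratio(ratios: List[float]) -> float:
    """Average over repeated trials (1000 random initializations on the slide)."""
    return sum(ratios) / len(ratios)


labels = hard_labels([[0.9, 0.7, 0.2, 0.4], [0.1, 0.3, 0.8, 0.6]])
print(labels)                                  # [0, 0, 1, 1]
print(correct_ratio([1, 1, 0, 0], labels, 2))  # 1.0
print(correct_ratio([1, 0, 0, 0], labels, 2))  # 0.75
```

Trying every permutation is fine for two clusters; for many clusters an assignment solver would be the usual replacement.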

33
Diagnosis of heart disease data
  • This table shows the results of classifying all
    866 data.
  • This table shows the results of classifying only
    the 306 data without missing values.

34
Diagnosis of heart disease data
  • This table shows the results of classifying all
    866 data by using the proposed algorithms in our
    research.
  • This table shows the results of classifying all
    866 data by using an algorithm which handles
    missing values as interval data and uses the
    nearest-neighbor distance to calculate
    dissimilarity.

35
Conclusion and future work
  • Conclusion
  • We considered optimization problems for data
    with tolerance and derived their optimal
    solutions. Using these results, we constructed
    six new algorithms.
  • We have shown the effectiveness of the proposed
    algorithms through some numerical examples.

36
Conclusion and future work
  • Future work
  • We will analyze other data sets with tolerance.
  • We will apply the concept of tolerance to
    regression analysis, support vector machines,
    and so on.

37
Thank you for your attention.
38
References
  • 1. Osamu Takata and Sadaaki Miyamoto, "Fuzzy
    Clustering of Data with Interval Uncertainties,"
    Journal of Japan Society for Fuzzy Theory and
    Systems, Vol. 12, No. 5, pp. 686-695 (2000) (in
    Japanese).