Clustering Analysis: Outline - PowerPoint PPT Presentation

Provided by: csBil

1
Clustering Analysis: Outline
  • General Purpose
  • Clustering Categories
  • Clustering vs Other Multivariate Data Analysis
  • Area of Application
  • Distance Measures
  • Hierarchical Tree Clustering
  • K-Means Clustering
  • K-Means vs ANOVA
  • Expectation Maximization Clustering (for
    Categorical Variables)
  • Two-Way Clustering (Block Clustering)
  • Ex: Content Clustering in Texts

2
Clustering Analysis: General Purpose
  • How to organize observed data into meaningful
    structures, that is, to develop taxonomies.
  • Ex: To organize the different species of animals
    before a meaningful description of the
    differences between animals is possible.
  • Target: Both minimize within-group variation
    and maximize between-group variation.

3
Clustering Analysis: Categories
  • A. Hierarchical Clustering
  • Ex: Tree Clustering
  • B. i) K-Means Clustering
  • ii) Expectation Maximization Clustering
  • C. Block Clustering (Two-Way Joining)
  • Ex: Concept Clustering within Texts

4
Clustering Analysis: Similarity to Discriminant
and Factor Analysis
  • In Factor Analysis, the original set of variables
    is reduced to a smaller number of factors, while
    in clustering the original set of variables is
    grouped.
  • In Discriminant Analysis, the clusters are known
    in advance and the variables that discriminate
    between them are sought, while in clustering we
    try to discover natural clusters within the data.

5
Clustering Analysis: Statistical Significance
Testing
  • Unlike many other statistical procedures, cluster
    analysis methods are mostly used when we do not
    have any a priori hypotheses, but are still in
    the exploratory phase of our research. In a
    sense, cluster analysis finds the "most
    significant solution possible." Therefore,
    statistical significance testing is really not
    appropriate for clustering analysis.

6
Clustering Analysis: Area of Application
  • In general, whenever one needs to classify a
    "mountain" of information into manageable
    meaningful piles, cluster analysis is of great
    utility.
  • Ex: Clustering diseases, cures for diseases, or
    symptoms of diseases can lead to very useful
    taxonomies. In the field of psychiatry, the
    correct diagnosis of clusters of symptoms such as
    paranoia, schizophrenia, etc. is essential for
    successful therapy.

7
Clustering Analysis: Hierarchical Clustering
  • i) Bottom-Up (upward): The purpose of this method
    is to join variables together into successively
    larger clusters, using some measure of similarity
    or distance.
  • Initially, each variable is considered a
    separate cluster.
  • The similarity threshold is then relaxed step by
    step, so that more and more clusters are joined.
  • ii) Top-Down (downward): This method follows a
    partitioning scheme.
  • Initially, the whole data set is considered a
    single cluster.
  • The similarity threshold is then tightened
    repeatedly, splitting clusters into smaller ones.
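
The bottom-up procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a linkage rule, so single linkage (closest members) is assumed here, and the names `single_linkage`, `agglomerate`, and `threshold` are hypothetical. Running it with a small and then a larger threshold mimics relaxing the similarity threshold.

```python
from itertools import combinations
from math import dist  # available in Python 3.8+


def single_linkage(a, b, points):
    """Cluster-to-cluster distance: distance between the closest members (assumed rule)."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, threshold):
    """Bottom-up clustering: keep merging the closest pair of clusters
    until no pair is within the given distance threshold."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        if single_linkage(clusters[a], clusters[b], points) > threshold:
            break  # nothing left that is similar enough to join
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return sorted(clusters)


# Made-up sample data: two tight pairs and one far-away object.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
tight = agglomerate(points, threshold=1.5)   # strict threshold
loose = agglomerate(points, threshold=10.0)  # relaxed threshold
print(tight)
print(loose)
```

With the strict threshold only the two close pairs are joined; relaxing it joins those pairs into one larger cluster while the distant object stays on its own.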

8
Clustering Analysis: Hierarchical Tree Clustering

9
Clustering Analysis: Distance Measures
  • Euclidean distance
  • The distance between any two objects is not
    affected by the addition of new objects to the
    analysis, which may be outliers.
  • $\mathrm{distance}(x,y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
  • Chebychev distance
  • Treats two objects as different if they differ
    strongly on any single dimension or attribute.
  • $\mathrm{distance}(x,y) = \max_i |x_i - y_i|$
  • Percent disagreement
  • This measure is particularly useful if the data
    for the dimensions included in the analysis are
    categorical in nature.
  • $\mathrm{distance}(x,y) = (\text{number of } x_i \neq y_i) / n$,
    where $n$ is the number of dimensions.
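
The three distance measures above translate directly into code. The following is a minimal sketch; the function names and the sample vectors are made up for illustration.

```python
from math import sqrt


def euclidean(x, y):
    """Square root of the summed squared differences over all dimensions."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def chebychev(x, y):
    """Largest absolute difference on any single dimension."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


def percent_disagreement(x, y):
    """Fraction of dimensions on which the two objects differ (categorical data)."""
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)


a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
d_euclid = euclidean(a, b)  # sqrt(9 + 16 + 0)
d_cheb = chebychev(a, b)    # max(3, 4, 0)
d_pct = percent_disagreement(("red", "small", "round"),
                             ("red", "large", "round"))  # 1 of 3 attributes differ
print(d_euclid, d_cheb, d_pct)
```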

10
Clustering Analysis: K-Means Clustering
  • When we already have a hypothesis concerning the
    number of clusters in our cases or variables,
    we can apply k-means clustering.
  • In general, the k-means method will produce
    exactly k different clusters of greatest possible
    distinction.
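
The slides do not say which k-means procedure is meant; a common choice is Lloyd's iteration, sketched below as an assumption. The function name `k_means`, the hand-picked initial centers, and the sample points are all hypothetical.

```python
from math import dist  # available in Python 3.8+


def k_means(points, centers, max_iter=100):
    """Lloyd's iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Made-up data with two visible groups; k = 2 is our prior hypothesis.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, centers=[points[0], points[1]])
print(labels)
print(centers)
```

Even though both initial centers start inside the same group, the iteration moves one of them to the second group within a few steps. In general the result depends on the starting centers, so several starts are often tried.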

11
Clustering Analysis: K-Means vs ANOVA
  • K-Means clustering is analogous to "ANOVA in
    reverse" in the sense that:
  • - The significance test in ANOVA evaluates the
    between-group variability against the
    within-group variability when testing the
    hypothesis that the means in the groups are
    different from each other.
  • - In k-means clustering, the program tries to
    move objects (e.g., cases) in and out of groups
    (clusters) to get the most significant ANOVA
    results.
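
To make the "ANOVA in reverse" analogy concrete, the sketch below computes the one-way ANOVA F ratio (between-group mean square over within-group mean square) for a single variable under two different groupings. The function name `f_ratio` and the data are made up for illustration; a grouping that matches the natural clusters yields a much larger F.

```python
def f_ratio(values, labels):
    """One-way ANOVA F statistic for one variable under a given grouping."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: values against their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
f_good = f_ratio(values, [0, 0, 0, 1, 1, 1])  # grouping that follows the gap in the data
f_poor = f_ratio(values, [0, 1, 0, 1, 0, 1])  # arbitrary interleaved grouping
print(f_good, f_poor)
```

Moving objects between groups so that this ratio grows is, loosely, what k-means does when it reduces within-cluster variability.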

12
Clustering Analysis: Expectation Maximization
Clustering (Categorical Variables)
  • Classification probabilities instead of
    classifications: Each observation belongs to each
    cluster with a certain probability.
  • Categorical variables: The EM algorithm can also
    accommodate categorical variables. The program
    will at first randomly assign different
    probabilities (weights, to be precise) to each
    class or category, for each cluster. In
    successive iterations, these probabilities are
    refined (adjusted) to maximize the likelihood of
    the data given the specified number of clusters.
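
One way to realize this description is a mixture model in which, within each cluster, the categorical variables are treated as independent. That independence assumption, the tiny smoothing constant, and all names below (`em_categorical`, `rows`, `theta`) are assumptions of this sketch, not something the slides specify.

```python
import random
from math import log


def em_categorical(rows, k, n_iter=50, seed=0):
    """EM for a mixture of independent categorical variables (a sketch)."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    cats = [sorted({r[j] for r in rows}) for j in range(n_vars)]

    def normalize(ws):
        total = sum(ws)
        return [w / total for w in ws]

    # Start with equal cluster weights and random category probabilities.
    weights = [1.0 / k] * k
    theta = [
        [dict(zip(cats[j], normalize([rng.random() + 0.1 for _ in cats[j]])))
         for j in range(n_vars)]
        for _ in range(k)
    ]
    history = []  # log-likelihood at the start of each iteration
    for _ in range(n_iter):
        # E-step: probability that each row belongs to each cluster.
        resp, loglik = [], 0.0
        for r in rows:
            joint = []
            for c in range(k):
                p = weights[c]
                for j in range(n_vars):
                    p *= theta[c][j][r[j]]
                joint.append(p)
            loglik += log(sum(joint))
            resp.append(normalize(joint))
        history.append(loglik)
        # M-step: re-estimate weights and category probabilities from soft counts.
        for c in range(k):
            total = sum(rw[c] for rw in resp)
            weights[c] = total / len(rows)
            for j in range(n_vars):
                for cat in cats[j]:
                    soft = sum(rw[c] for rw, r in zip(resp, rows) if r[j] == cat)
                    # The small constant keeps probabilities strictly positive.
                    theta[c][j][cat] = (soft + 1e-9) / (total + 1e-9 * len(cats[j]))
    return resp, weights, theta, history


# Made-up categorical observations (color, size).
rows = [("red", "small"), ("red", "small"), ("red", "large"),
        ("blue", "large"), ("blue", "large"), ("blue", "small")]
resp, weights, theta, history = em_categorical(rows, k=2)
print([round(p, 3) for p in resp[0]])  # membership probabilities of the first row
```

Each row of `resp` holds the probabilities with which one observation belongs to each cluster, and `history` records how the log-likelihood changes over the iterations.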

13
Clustering Analysis: Two-Way Clustering (Block
Clustering)
  • Block Clustering is useful in the relatively rare
    circumstances when one expects that both cases
    and variables will simultaneously contribute to
    the uncovering of meaningful patterns of clusters.