Title: EM Algorithm: Expectation Maximization Clustering Algorithm

1. EM Algorithm: Expectation Maximization Clustering Algorithm
- Data Mining, Morgan Kaufmann (Frank), pp. 218-227
- Mining Lab.
- October 27, 2004
2. Content
- Clustering
- K-Means via EM
- Mixture Model
- EM Algorithm
- Simple examples of EM
- EM Application WEKA
- References
3. Clustering (1/2)
- What is clustering?
- Clustering algorithms divide a data set into natural groups (clusters).
- Instances in the same cluster are similar to each other; they share certain properties.
- e.g., customer segmentation.
- Clustering vs. classification
- Classification is supervised learning.
- Clustering is unsupervised learning: there is no target variable to be predicted.
4. Clustering (2/2)
- Categorization of clustering methods
- Partitioning methods
- K-Means / K-Medoids / PAM / CLARA / CLARANS
- Hierarchical methods
- CURE / CHAMELEON / BIRCH
- Density-based methods
- DBSCAN / OPTICS
- Grid-based methods
- STING / CLIQUE / WaveCluster
- Model-based methods
- EM / COBWEB / Bayesian / Neural
Model-based clustering is also called probability-based or statistical clustering.
5. K-Means (1): Algorithm
- Step 0
- Select K objects as initial centroids.
- Step 1 (Assignment)
- For each object, compute its distance to each of the K centroids.
- Assign each object to the cluster whose centroid is closest.
- Step 2 (New Centroids)
- Compute a new centroid for each cluster.
- Step 3 (Convergence)
- Stop if the change in the centroids is less than the selected convergence criterion.
- Otherwise repeat from Step 1.
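The steps above can be sketched in a few lines of Python (a minimal illustration, not the book's code; `kmeans` and its signature are assumptions for this sketch):

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means over 2-D points, following Steps 1-3 above."""
    for _ in range(max_iter):
        # Step 1 (Assignment): attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Step 2 (New centroids): mean of each cluster
        # (an empty cluster keeps its old centroid)
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        # Step 3 (Convergence): stop when the centroids no longer move
        if new == centroids:
            return new
        centroids = new
    return centroids
```

For example, `kmeans([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)], [(4, 4), (3, 4)])` converges in a few iterations.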
6. K-Means (2): Simple Example
[Figure: input data → random centroids → assignment → new centroids (check) → assignment → new centroids (check) → ... repeated until the centroids stop changing]
7. K-Means (3): Weakness with Outliers (Noise)
8. K-Means (4): Calculation
Data without the outlier: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0)
Data with the outlier: the same points plus (100,0)
Initial split in both cases: cluster 1 = {(4,4), (3,4)}, cluster 2 = the remaining points.

Without the outlier:
1. Centroids: <3.5, 4>, <1.5, 1.25>
   Assignment: <3.5, 4> - (3,4), (4,4), (4,2); <1.5, 1.25> - (0,2), (1,1), (1,0)
2. Centroids: <3.67, 3.33>, <0.67, 1>

With the outlier:
1. Centroids: <3.5, 4>, <21, 1>
   Assignment: <3.5, 4> - (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21, 1> - (100,0)
2. Centroids: <2.17, 2.17>, <100, 0>
9. K-Means (5): Comparison with EM
- K-Means
- Hard clustering.
- An instance belongs to exactly one cluster.
- Based on Euclidean distance.
- Not robust to outliers or differing value ranges.
- EM
- Soft clustering.
- An instance belongs to several clusters, each with a membership probability.
- Based on probability density.
- Can handle both numeric and nominal attributes.
[Figure: instance I belongs entirely to cluster C1 under K-Means, but to C1 with probability 0.7 and C2 with probability 0.3 under EM]
10. Mixture Model (1)
- A mixture is a set of k probability distributions, representing k clusters.
- Each probability distribution has a mean and a variance.
- The mixture model combines several normal distributions.
11. Mixture Model (2)
- With only one numeric attribute and two clusters, there are five parameters: two means, two standard deviations, and the mixing probability.
12. Mixture Model (3): Simple Example
- Probability that an instance x belongs to cluster A, computed from the probability density function.
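Written out, this is the standard Bayes-rule form for a two-component mixture (f denotes a cluster's density, p_A and p_B the cluster priors):

```latex
\Pr[A \mid x] \;=\; \frac{f(x;\mu_A,\sigma_A)\,p_A}{f(x;\mu_A,\sigma_A)\,p_A + f(x;\mu_B,\sigma_B)\,p_B}
```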
13. Mixture Model (4): Probability Density Function
- Normal distribution
- Gaussian density function
- Poisson distribution
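The Gaussian density named above is:

```latex
f(x;\mu,\sigma) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```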
14. Mixture Model (5): Probability Density Function
[Figure: mixture densities over successive iterations]
15. EM Algorithm (1)
- Step 1 (Initialization)
- Assign random membership probabilities.
- Step 2 (Maximization Step)
- Re-create the cluster model: re-compute the parameters θ (mean, variance) of each normal distribution.
- Step 3 (Expectation Step)
- Update each record's weights.
- Step 4
- Calculate the log-likelihood.
- If the value saturates, exit; otherwise go to Step 2.
The M-step adjusts the parameters; the E-step adjusts the weights.
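The loop above can be sketched for a two-component, one-dimensional Gaussian mixture (a minimal illustration under assumed initial values, not the book's code; for brevity it runs a fixed number of iterations rather than testing saturation):

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density f(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (clusters A and B)."""
    # Step 1 (Initialization): crude starting guesses
    mu_a, mu_b = min(data), max(data)
    sigma_a = sigma_b = (max(data) - min(data)) / 4 or 1.0
    p_a = 0.5
    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: membership weight of each record in cluster A
        w = []
        for x in data:
            fa = p_a * normal_pdf(x, mu_a, sigma_a)
            fb = (1 - p_a) * normal_pdf(x, mu_b, sigma_b)
            w.append(fa / (fa + fb))
        # M-step: weighted re-estimation of means, deviations, and prior
        sw = sum(w)
        mu_a = sum(wi * x for wi, x in zip(w, data)) / sw
        mu_b = sum((1 - wi) * x for wi, x in zip(w, data)) / (len(data) - sw)
        sigma_a = math.sqrt(sum(wi * (x - mu_a) ** 2
                                for wi, x in zip(w, data)) / sw) or 1e-6
        sigma_b = math.sqrt(sum((1 - wi) * (x - mu_b) ** 2
                                for wi, x in zip(w, data)) / (len(data) - sw)) or 1e-6
        p_a = sw / len(data)
        # Step 4: log-likelihood (the quantity checked for saturation)
        log_lik = sum(math.log(p_a * normal_pdf(x, mu_a, sigma_a)
                               + (1 - p_a) * normal_pdf(x, mu_b, sigma_b))
                      for x in data)
    return p_a, mu_a, sigma_a, mu_b, sigma_b, log_lik
```

On well-separated data such as `[1, 2, 3, 10, 11, 12]` the means converge near 2 and 11.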
16. EM Algorithm (2): Initialization
- Random probability
- M-step
- Example
17. EM Algorithm (3): M-Step Parameters (Mean, Dev)
- Estimating parameters from weighted instances
- Parameters: means, deviations
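With membership weights w_i for cluster A, the weighted estimates take the standard form:

```latex
\mu_A = \frac{\sum_i w_i\, x_i}{\sum_i w_i}, \qquad
\sigma_A^2 = \frac{\sum_i w_i\, (x_i - \mu_A)^2}{\sum_i w_i}
```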
18. EM Algorithm (3): M-Step Parameters (Mean, Dev) (cont.)
19. EM Algorithm (4): E-Step Weights
20. EM Algorithm (5): E-Step Weights (cont.)
21. EM Algorithm (6): Objective Function (check)
- Log-likelihood function
- The product, over all instances, of each instance's probability under the mixture; the log is taken to make the product easier to analyze.
- 1-dimensional data, 2 clusters A and B
- N-dimensional data, K clusters: per-cluster mean vector and covariance matrix
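For the 1-dimensional, two-cluster case, the log-likelihood being monitored is:

```latex
\log L \;=\; \sum_i \log\!\big(\, p_A\, f(x_i;\mu_A,\sigma_A) + p_B\, f(x_i;\mu_B,\sigma_B) \,\big)
```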
22. EM Algorithm (7): Objective Function (check)
- Covariance matrix and mean vector
23. EM Algorithm (8): Termination
- The procedure stops when the log-likelihood saturates.
[Figure: log-likelihood versus number of iterations]
24. EM Algorithm (1): Simple Data
- EM example
- 6 data points (3 samples per class)
- 2 classes (circle, rectangle)
25. EM Algorithm (2)
Likelihood function of the two component means θ1 and θ2
26. EM Algorithm (3)
27. EM Example (1)
- Example dataset
- 2 columns (Math, English), 6 records
28. EM Example (2)
- Distribution of Math
- mean 56.67
- variance 776.73
- Distribution of English
- mean 82.5
- variance 197.50
[Figure: histograms of the two attributes on a 0-100 scale]
29. EM Example (3)
30. EM Example (4)
Maximization step (parameter adjustment)
31. EM Example (4) (cont.)
32. EM Example (5)
Expectation step (weight adjustment)
Maximization step (parameter adjustment)
33. EM Example (6)
Expectation step (weight adjustment)
Maximization step (parameter adjustment)
34. EM Example (6) (cont.)
Expectation step (weight adjustment)
Maximization step (parameter adjustment)
35. EM Application (1): Weka
- Weka
- Waikato University in New Zealand
- Open-source mining tool
- http://www.cs.waikato.ac.nz/ml/weka
- Experiment data
- Iris data
- Real data
- Department customer data
- Modified customer data
36. EM Application (2): IRIS Data
- Data info
- Attribute information
- sepal length in cm / sepal width / petal length / petal width in cm
- class: Iris Setosa / Iris Versicolour / Iris Virginica
37. EM Application (3): IRIS Data
38. EM Application (4): Weka Usage
- Weka clustering packages: weka.clusterers
- Command-line execution:
  java weka.clusterers.EM -t iris.arff -N 2
  java weka.clusterers.EM -t iris.arff -N 2 -V
- GUI execution:
  java -jar weka.jar
39. EM Application (4): Weka Usage (cont.)
- Options for clustering in Weka
40. EM Application (5): Weka Usage
41. EM Application (5): Weka Usage - Input File Format

Summary statistics:
               Min  Max  Mean  SD    Class Correlation
sepal length   4.3  7.9  5.84  0.83   0.7826
sepal width    2.0  4.4  3.05  0.43  -0.4194
petal length   1.0  6.9  3.76  1.76   0.9490 (high!)
petal width    0.1  2.5  1.20  0.76   0.9565 (high!)

@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
42. EM Application (6): Weka Usage - Output Format

Number of clusters: 3

Cluster: 0  Prior probability: 0.3333
Attribute: sepallength  Normal Distribution. Mean = 5.006  StdDev = 0.3489
Attribute: sepalwidth   Normal Distribution. Mean = 3.418  StdDev = 0.3772
Attribute: petallength  Normal Distribution. Mean = 1.464  StdDev = 0.1718
Attribute: petalwidth   Normal Distribution. Mean = 0.244  StdDev = 0.1061
Attribute: class        Discrete Estimator. Counts = 51 1 1 (Total = 53)

Clustered instances:
0  50 (33%)
1  48 (32%)
2  52 (35%)

Log likelihood: -2.21138
43. EM Application (6): Result Visualization
44. References
- Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten and Eibe Frank. Morgan Kaufmann. pp. 218-255.
- Data Mining: Concepts and Techniques. Jiawei Han. Chapter 8.
- The Expectation Maximization Algorithm. Frank Dellaert. February 2002.
- A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.