1
Clustering methods used in microarray data
analysis
  • Steve Horvath
  • Human Genetics and Biostatistics
  • UCLA
  • Acknowledgement: based in part on lecture notes
    from Darlene Goldstein's web site,
    http://ludwig-sun2.unil.ch/darlene/

2
Contents
  • Background: clustering
  • k-means clustering
  • hierarchical clustering

3
References for clustering
  • Gentleman, Carey, et al. Bioinformatics and
    Computational Biology Solutions Using R and
    Bioconductor. Springer. Chapters 11, 12, 13.
  • T. Hastie, R. Tibshirani, J. Friedman (2002). The
    Elements of Statistical Learning. Springer Series
    in Statistics.
  • L. Kaufman, P. Rousseeuw (1990). Finding Groups
    in Data. Wiley Series in Probability.

4
Clustering
  • Historically, objects have been clustered into
    groups:
  • the periodic table of the elements (chemistry)
  • taxonomy (zoology, botany)
  • Why cluster?
  • Understand the global structure of the data: see
    the forest instead of the trees
  • Detect heterogeneity in the data, e.g. different
    tumor classes
  • Find biological pathways (cluster gene expression
    profiles)
  • Find data outliers (cluster microarray samples)

5
Classification, Clustering and Prediction
  • WARNING:
  • Many people talk about classification when they
    mean clustering (unsupervised learning).
  • Other people talk about classification when they
    mean prediction (supervised learning).
  • Usually, the meaning is context specific. I
    prefer to avoid the term classification and to
    talk about clustering or prediction or another,
    more specific term.
  • Common denominator: classification divides
    objects into groups based on a set of values.
  • Unlike a theory, a clustering is neither true nor
    false; it should be judged largely on the
    usefulness of its results.
  • CLUSTERING IS, AND ALWAYS WILL BE, SOMEWHAT OF AN
    ART FORM.
  • However, a classification (clustering) may be
    useful for suggesting a theory, which could then
    be tested.

6
Cluster analysis
  • Addresses the problem: given n objects, each
    described by p variables (or features), derive a
    useful division into a number of classes
  • Usually we want a partition of the objects
  • But fuzzy clustering is also possible
  • Could also take an exploratory perspective
  • Unsupervised learning

7
Difficulties in defining "cluster"
8
Wordy Definition
Cluster analysis aims to group or segment a
collection of objects into subsets or "clusters",
such that those within each cluster are more
closely related to one another than objects
assigned to different clusters.   An object can
be described by a set of measurements (e.g.
covariates, features, attributes) or by its
relation to other objects.   Sometimes the goal
is to arrange the clusters into a natural
hierarchy, which involves successively grouping
or merging the clusters themselves so that at
each level of the hierarchy clusters within the
same group are more similar to each other than
those in different groups.  

9
Clustering Gene Expression Data
  • Can cluster genes (rows), e.g. to (attempt to)
    identify groups of co-regulated genes
  • Can cluster samples (columns), e.g. to identify
    tumors based on profiles
  • Can cluster both rows and columns at the same
    time (to my knowledge, not in R)

10
Clustering Gene Expression Data
  • Leads to readily interpretable figures
  • Can be helpful for identifying patterns in time
    or space
  • Useful (essential?) when seeking new subclasses
    of samples
  • Can be used for exploratory purposes

11
Similarity / Proximity
  • Similarity sij indicates the strength of the
    relationship between two objects i and j
  • Usually 0 ≤ sij ≤ 1
  • Ex 1: absolute value of the Pearson correlation
    coefficient (sketched below)
  • Use of correlation-based similarity is quite
    common in gene expression studies but is in
    general contentious...
  • Ex 2: co-expression network methods (topological
    overlap matrix)
  • Ex 3: random forest similarity
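A minimal R sketch of Ex 1, assuming a simulated
expression matrix (rows = genes, columns = samples)
in place of real data:

    # Similarity: absolute Pearson correlation between gene profiles
    set.seed(1)
    expr <- matrix(rnorm(100 * 10), nrow = 100)  # 100 genes x 10 samples (simulated)
    sim <- abs(cor(t(expr)))                     # s_ij in [0, 1]
    dissim <- 1 - sim                            # companion dissimilarity d_ij = 1 - s_ij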

12
Proximity matrices are the input to most
clustering algorithms

Proximity between pairs of objects: similarity or
dissimilarity. If the original data were
collected as similarities, a monotone-decreasing
function can be used to convert them to
dissimilarities. Most algorithms use
(symmetric) dissimilarities (e.g. distances), but
the triangle inequality does not have to hold.
Triangle inequality: d(i,k) ≤ d(i,j) + d(j,k)
13
Dissimilarity and Distance
  • Associated with a similarity measure sij bounded
    by 0 and 1 is the dissimilarity dij = 1 − sij
  • Distance measures have the metric property
    (dij ≤ dik + dkj)
  • Many examples: Euclidean ("as the crow flies"),
    Manhattan (city block), etc.
  • The distance measure has a large effect on
    performance (see the sketch below)
  • Behavior of a distance measure is related to the
    scale of measurement
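A short sketch of how the choice and scaling of the
distance measure play out in R (simulated data;
dist() is in the base stats package):

    set.seed(1)
    x <- matrix(rnorm(10 * 3), ncol = 3)       # 10 objects, 3 variables (simulated)
    d_euc <- dist(x, method = "euclidean")     # as the crow flies
    d_man <- dist(x, method = "manhattan")     # city block
    d_std <- dist(scale(x))                    # standardize first when variables
                                               # are on different scales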

14
Partitioning Methods
  • Partition the objects into a prespecified number
    of groups K
  • Iteratively reallocate objects to clusters until
    some criterion is met (e.g. minimize within-
    cluster sums of squares)
  • Examples: k-means, self-organizing maps (SOM),
    partitioning around medoids (PAM), model-based
    clustering

15
K-means clustering
  • Prespecify the number of clusters K, and cluster
    centers
  • Minimize the within-cluster sum of squares from
    the centers
  • Iterate (until cluster assignments do not
    change):
  • For a given cluster assignment, find the cluster
    means
  • For a given set of means, minimize the
    within-cluster sum of squares by allocating each
    object to the closest cluster mean
  • Intended for situations where all variables are
    quantitative, with (squared) Euclidean distance
    (so scale variables suitably before use); a
    minimal sketch follows below
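A minimal sketch of k-means in R via stats::kmeans,
on simulated, suitably scaled data:

    set.seed(1)
    x <- scale(matrix(rnorm(60 * 2), ncol = 2))  # 60 objects, 2 scaled variables
    km <- kmeans(x, centers = 3)                 # prespecified K = 3
    km$cluster                                   # cluster assignment C(i) per object
    km$tot.withinss                              # within-cluster sum of squares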

16
PAM clustering
  • Also need to prespecify the number of clusters K
  • Unlike k-means, the cluster centers (medoids)
    are objects, not averages of objects
  • Can use a general dissimilarity
  • Minimizes (unsquared) distances from objects to
    cluster centers, so it is more robust than
    k-means (see the sketch below)
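A minimal sketch of PAM via cluster::pam, here fed a
general (correlation-based) dissimilarity on
simulated data rather than a raw data matrix:

    library(cluster)
    set.seed(1)
    x <- matrix(rnorm(30 * 10), ncol = 10)   # 30 objects, 10 variables (simulated)
    d <- as.dist(1 - abs(cor(t(x))))         # any symmetric dissimilarity works
    pm <- pam(d, k = 3, diss = TRUE)
    pm$medoids                               # medoids are actual objects, not averages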

17
Combinatorial clustering algorithms. Example:
K-means clustering
18
Clustering algorithms
  • Goal: partition the observations into groups
    ("clusters") so that the pairwise dissimilarities
    between those assigned to the same cluster tend
    to be smaller than those in different clusters.
  • 3 types of clustering algorithms: mixture
    modeling, mode seekers (e.g. the PRIM algorithm),
    and combinatorial algorithms.
  • We focus on the most popular combinatorial
    algorithms.

19
Combinatorial clustering algorithms
  • The most popular clustering algorithms directly
    assign each observation to a group or cluster
    without regard to a probability model describing
    the data.
  • Notation: label observations by an integer i in
    {1,...,N} and clusters by an integer k in
    {1,...,K}.
  • The cluster assignments can be characterized by a
    many-to-one mapping C(i) that assigns the i-th
    observation to the k-th cluster: C(i) = k
    (aka the encoder).
  • One seeks a particular encoder C(i) that
    minimizes a particular loss function (aka
    energy function).

20
Loss functions for judging clusterings
  • One seeks a particular encoder C(i) that
    minimizes a particular loss function (aka
    energy function).
  • Example: the within-cluster point scatter
    W(C) = (1/2) Σ_k Σ_{C(i)=k} Σ_{C(i')=k} d(xi, xi')

21
Cluster analysis by combinatorial optimization
  • Straightforward in principle: simply minimize
    W(C) over all possible assignments of the N data
    points to K clusters.
  • Unfortunately, such optimization by complete
    enumeration is feasible only for small data sets.
  • For this reason, practical clustering algorithms
    are able to examine only a fraction of all
    possible encoders C.
  • The goal is to identify a small subset that is
    likely to contain the optimal one, or at least a
    good sub-optimal partition.
  • Feasible strategies are based on iterative greedy
    descent.

22
K-means clustering is a very popular iterative
descent clustering method.
  • Setting: all variables are of the quantitative
    type and one uses a squared Euclidean distance.
  • In this case the within-cluster point scatter is
    W(C) = (1/2) Σ_k Σ_{C(i)=k} Σ_{C(i')=k} ||xi − xi'||²
  • Note that this can be re-expressed as
    W(C) = Σ_k Nk Σ_{C(i)=k} ||xi − mk||²
    where mk is the mean vector of cluster k and Nk
    is the number of observations assigned to it.

23
Thus one can obtain the optimal C by solving the
enlarged optimization problem
  • min over C and m1,...,mK of
    Σ_k Nk Σ_{C(i)=k} ||xi − mk||²

This can be minimized by an alternating
optimization procedure given on the next slide.
24
K-means clustering algorithm leads to a local
minimum
  • 1. For a given cluster assignment C, the total
    cluster variance
    TotVar = Σ_k Nk Σ_{C(i)=k} ||xi − mk||²
    is minimized with respect to m1,...,mK, yielding
    the means of the currently assigned clusters,
    i.e. find the cluster means.
  • 2. Given the current set of means, TotVar is
    minimized by assigning each observation to the
    closest (current) cluster mean. That is,
    C(i) = argmin_k ||xi − mk||²
  • 3. Steps 1 and 2 are iterated until the
    assignments do not change (a bare-bones sketch
    follows below).
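A bare-bones R sketch of this alternating procedure
on simulated data with K = 3 (stats::kmeans is the
practical choice; this version ignores edge cases
such as a cluster becoming empty):

    set.seed(2)
    x <- matrix(rnorm(60 * 2), ncol = 2)
    K <- 3
    cl <- sample(K, nrow(x), replace = TRUE)       # random initial encoder C(i)
    repeat {
      # Step 1: given the assignment, the cluster means minimize TotVar
      m <- sapply(1:K, function(k) colMeans(x[cl == k, , drop = FALSE]))
      # Step 2: given the means, assign each point to its closest mean
      d2 <- apply(m, 2, function(mk) rowSums(sweep(x, 2, mk)^2))
      cl_new <- max.col(-d2)                       # C(i) = argmin_k ||xi - mk||^2
      if (all(cl_new == cl)) break                 # Step 3: stop when stable
      cl <- cl_new
    }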

25
Recommendations for k-means clustering
  • Either start with many different random choices
    of starting means, and choose the solution having
    the smallest value of the objective function,
  • or use another clustering method (e.g.
    hierarchical clustering) to determine an initial
    set of cluster centers.
  • Both options are sketched below.
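A short R sketch of both recommendations on
simulated data (nstart handles the multiple random
starts; means derived from cutree() serve as
hierarchical starting centers):

    set.seed(3)
    x <- matrix(rnorm(60 * 2), ncol = 2)
    K <- 3
    # Option 1: many random starts, keep the solution with smallest objective
    km1 <- kmeans(x, centers = K, nstart = 25)
    # Option 2: initialize from a hierarchical clustering
    hc <- hclust(dist(x), method = "average")
    init <- apply(x, 2, function(v) tapply(v, cutree(hc, k = K), mean))
    km2 <- kmeans(x, centers = init)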

26
Agglomerative clustering, hierarchical clustering
and dendrograms
27
Hierarchical clustering plot
28
Hierarchical Clustering
  • Produces a dendrogram
  • Avoids prespecification of the number of
    clusters K
  • The tree can be built in two distinct ways:
  • bottom-up: agglomerative clustering
  • top-down: divisive clustering

29
Agglomerative Methods
  • Start with n mRNA sample (or G gene) clusters
  • At each step, merge the two closest clusters
    using a measure of between-cluster dissimilarity
    which reflects the shape of the clusters
  • Examples of between-cluster dissimilarities
    (sketched in R below):
  • Unweighted Pair Group Method with Arithmetic Mean
    (UPGMA): average of pairwise dissimilarities
  • Single-link (nearest neighbor): minimum of
    pairwise dissimilarities
  • Complete-link (furthest neighbor): maximum of
    pairwise dissimilarities
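A sketch of the three linkage rules via
stats::hclust on simulated data:

    set.seed(4)
    x <- matrix(rnorm(30 * 4), ncol = 4)       # 30 objects, 4 variables (simulated)
    d <- dist(x)                               # Euclidean dissimilarities
    hc_avg <- hclust(d, method = "average")    # UPGMA
    hc_sgl <- hclust(d, method = "single")     # nearest neighbor
    hc_cpl <- hclust(d, method = "complete")   # furthest neighbor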

30
Agglomerative clustering
  • Agglomerative clustering algorithms begin with
    every observation representing a singleton
    cluster.
  • At each of the N−1 steps, the closest two (least
    dissimilar) clusters are merged into a single
    cluster.
  • Therefore a measure of dissimilarity between two
    clusters must be defined.

31
Between cluster distances (also known as linkage
methods)
32
Different intergroup dissimilarities
Let G and H represent 2 groups with NG and NH
members. The standard choices are:
single link:    d(G,H) = min of d(i,i') over i in G, i' in H
complete link:  d(G,H) = max of d(i,i') over i in G, i' in H
group average:  d(G,H) = (1/(NG·NH)) Σ_{i in G} Σ_{i' in H} d(i,i')
33
Comparing different linkage methods
  • If there is a strong clustering tendency, all 3
    methods produce similar results.
  • Single linkage has a tendency to combine
    observations linked by a series of close
    intermediate observations ("chaining"). Good for
    elongated clusters.
  • Drawback: complete linkage may lead to clusters
    in which observations assigned to a cluster can
    be much closer to members of other clusters than
    they are to some members of their own cluster.
    Use for very compact clusters (like pearls on a
    string).
  • Group average clustering represents a compromise
    between the extremes of single and complete
    linkage. Use for ball-shaped clusters.

34
Dendrogram
  • Recursive binary splitting/agglomeration can be
    represented by a rooted binary tree.
  • The root node represents the entire data set.
  • The N terminal nodes of the tree represent
    individual observations.
  • Each nonterminal node ("parent") has two daughter
    nodes.
  • Thus the binary tree can be plotted so that the
    height of each node is proportional to the value
    of the intergroup dissimilarity between its 2
    daughters.
  • A dendrogram provides a complete description of
    the hierarchical clustering in graphical format
    (see the sketch below).
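A small sketch of producing and cutting a dendrogram
in R (simulated data; node heights in the plot are
the merge dissimilarities described above):

    set.seed(5)
    x <- matrix(rnorm(30 * 4), ncol = 4)
    hc <- hclust(dist(x), method = "average")
    plot(hc)                     # dendrogram; height = intergroup dissimilarity
    groups <- cutree(hc, k = 3)  # cut the tree to recover K = 3 clusters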

35
Comments on dendrograms
  • Caution: different hierarchical methods, as well
    as small changes in the data, can lead to
    different dendrograms.
  • Hierarchical methods impose hierarchical
    structure whether or not such structure actually
    exists in the data.
  • In general, dendrograms are a description of the
    results of the algorithm and not a graphical
    summary of the data.
  • They are only a valid summary to the extent that
    the pairwise observation dissimilarities obey the
    ultrametric inequality
    d(i,i') ≤ max( d(i,k), d(i',k) ) for all i, i', k
36
Figure 1: dendrograms from average, complete, and
single linkage.
37
Divisive Methods
  • Start with only one cluster
  • At each step, split a cluster into two parts
  • Advantage: obtains the main structure of the data
    (i.e. focuses on upper levels of the dendrogram)
  • Disadvantage: computational difficulties when
    considering all possible divisions into two
    groups (see the sketch below)
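A minimal sketch of divisive clustering via diana()
from the cluster package, on simulated data:

    library(cluster)
    set.seed(6)
    x <- matrix(rnorm(30 * 4), ncol = 4)
    dv <- diana(x)                 # DIvisive ANAlysis clustering
    plot(dv, which.plots = 2)      # dendrogram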

38
Discussion
39
Partitioning vs. Hierarchical
  • Partitioning
  • Advantage: provides clusters that satisfy some
    optimality criterion (approximately)
  • Disadvantages: need an initial K; long
    computation time
  • Hierarchical
  • Advantage: fast computation (agglomerative)
  • Disadvantages: rigid; cannot correct later for
    erroneous decisions made earlier
  • Word on the street: most data analysts prefer
    hierarchical clustering over partitioning methods
    when it comes to gene expression data

40
Generic Clustering Tasks
  • Estimating number of clusters
  • Assigning each object to a cluster
  • Assessing strength/confidence of cluster
    assignments for individual objects
  • Assessing cluster homogeneity

41
How many clusters K?
  • Many suggestions for how to decide this!
  • Milligan and Cooper (Psychometrika 50:159-179,
    1985) studied 30 methods
  • A number of newer methods, including GAP
    (Tibshirani) and clest (Fridlyand and Dudoit;
    uses bootstrapping); see also the prediction
    strength method (a GAP sketch follows below):
  • http://www.genetics.ucla.edu/labs/horvath/GeneralPredictionStrength/
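One implementation of the GAP statistic available in
the cluster package is clusGap(); a minimal sketch
on simulated data (B reference data sets; extra
arguments such as nstart are passed on to kmeans):

    library(cluster)
    set.seed(7)
    x <- matrix(rnorm(100 * 2), ncol = 2)
    gap <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50, nstart = 20)
    plot(gap)    # inspect the gap curve to choose K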

42
R clustering
  • A number of R packages (libraries) contain
    functions to carry out clustering, including:
  • mva: kmeans, hclust
  • cluster: pam (among others)
  • cclust: convex clustering, also methods to
    estimate K
  • mclust: model-based clustering
  • GeneSOM