Title: Methods in Medical Image Analysis - Statistics of Pattern Recognition: Classification and Clustering
1. Methods in Medical Image Analysis - Statistics of Pattern Recognition: Classification and Clustering
- Some content provided by Milos Hauskrecht,
University of Pittsburgh Computer Science
2. ITK Questions?
3. Classification
6. Features
- Loosely stated, a feature is a value describing something about your data points (e.g., for pixels: intensity, local gradient, distance from a landmark, etc.)
- Multiple (n) features are put together to form a feature vector, which defines a data point's location in n-dimensional feature space (see the sketch below)
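A minimal sketch of how such per-pixel feature vectors might be assembled, assuming a 2-D image stored as a NumPy array; the particular features, the function name pixel_features, and the landmark coordinate are illustrative choices, not from the slides.

```python
import numpy as np

def pixel_features(image, landmark=(0, 0)):
    """Build one feature vector per pixel: intensity, local gradient
    magnitude, and distance from a (hypothetical) landmark."""
    gy, gx = np.gradient(image.astype(float))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    rows, cols = np.indices(image.shape)
    dist = np.sqrt((rows - landmark[0]) ** 2 + (cols - landmark[1]) ** 2)
    # Each row is one data point's location in 3-D feature space
    return np.stack([image.ravel(), grad_mag.ravel(), dist.ravel()], axis=1)

image = np.random.rand(64, 64)            # stand-in for a real image slice
X = pixel_features(image, landmark=(32, 32))
print(X.shape)                            # (4096, 3): 4096 points, 3 features
```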
7. Feature Space
- The theoretical n-dimensional space occupied by n input raster objects (features)
- Each feature represents one dimension, and its values represent positions along one of the orthogonal coordinate axes in feature space
- The set of feature values belonging to a data point defines a vector in feature space
8. Statistical Notation
- Class probability distribution:
- p(x, y) = p(x | y) p(y)
- x: feature vector (x1, x2, x3, ..., xn)
- y: class
- p(x | y): probability of x given y
- p(x, y): probability of both x and y
10. Example: Binary Classification
- Two class-conditional distributions (see the sketch below):
- p(x | y = 0), p(x | y = 1)
- Priors:
- p(y = 0) + p(y = 1) = 1
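A small sketch of this setup with one scalar feature and made-up parameters: two Gaussian class-conditional densities and priors that sum to 1, used to draw a labelled sample from the joint p(x, y) = p(x | y) p(y).

```python
import numpy as np

# Hypothetical class-conditional densities p(x | y) and priors p(y):
# p(x | y = 0) = N(-1, 1^2), p(x | y = 1) = N(2, 1.5^2), p(y = 0) + p(y = 1) = 1
params = {0: (-1.0, 1.0), 1: (2.0, 1.5)}   # (mean, std) per class
prior = {0: 0.6, 1: 0.4}

# Draw a labelled sample: pick y from the prior, then x from p(x | y)
rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=1000, p=[prior[0], prior[1]])
x = np.array([rng.normal(*params[c]) for c in y])
```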
11. Modeling Class Densities
- In the text, they choose to concentrate on methods that use Gaussians to model class densities
13. Generative Approach to Classification
- Represent and learn the distribution p(x, y)
- Use it to define probabilistic discriminant functions, e.g.
- g0(x) = p(y = 0 | x)
- g1(x) = p(y = 1 | x)
14. Generative Approach to Classification
- Typical model (see the fitting sketch below):
- p(x, y) = p(x | y) p(y)
- p(x | y): class-conditional distributions (densities)
- p(y): priors of classes (probability of class y)
- We want:
- p(y | x): posteriors of classes
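A sketch of learning this model from labelled training data, assuming (as the following slides do) Gaussian class-conditional densities; the function name fit_generative_model and the dictionary layout are my own choices.

```python
import numpy as np

def fit_generative_model(X, y):
    """Estimate the pieces of p(x, y) = p(x | y) p(y) from labelled data,
    modelling each class-conditional density as a multivariate Gaussian."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = {
            "prior": len(Xc) / len(X),        # p(y = c)
            "mean": Xc.mean(axis=0),          # mu_c
            "cov": np.cov(Xc, rowvar=False),  # Sigma_c
        }
    return model
```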
15. Class Modeling
- We model the class distributions as multivariate Gaussians
- x ~ N(µ0, Σ0) for y = 0
- x ~ N(µ1, Σ1) for y = 1
- Priors are based on training data, or a distribution can be chosen that is expected to fit the data well (e.g. a Bernoulli distribution for a coin flip)
16. Making a class decision
- We need to define discriminant functions ( gn(x) )
- We have two basic choices:
- Likelihood of data: choose the class (Gaussian) that best explains the input data (x)
- Posterior of class: choose the class with the higher posterior probability
17. Calculating Posteriors
- Use Bayes' Rule (see the sketch below)
- In this case, p(y = 1 | x) = p(x | y = 1) p(y = 1) / [ p(x | y = 0) p(y = 0) + p(x | y = 1) p(y = 1) ]
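A sketch of the posterior computation with SciPy's multivariate normal density; it assumes a model dictionary shaped like the hypothetical fit_generative_model sketch above.

```python
from scipy.stats import multivariate_normal

def posteriors(x, model):
    """Bayes' rule: p(y = c | x) = p(x | y = c) p(y = c) / p(x)."""
    joint = {c: m["prior"] * multivariate_normal.pdf(x, m["mean"], m["cov"])
             for c, m in model.items()}
    evidence = sum(joint.values())            # p(x), the normalizer
    return {c: j / evidence for c, j in joint.items()}

def classify(x, model):
    post = posteriors(x, model)
    return max(post, key=post.get)            # class with the higher posterior
```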
18. Linear Decision Boundary
- When the covariances are the same (derivation sketched below)
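The slides give the result without the algebra; one way to see it (my reconstruction, not slide content) is that with a shared covariance Σ the quadratic terms of the two Gaussian log-densities cancel in the log posterior-odds:

```latex
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}
  = -\tfrac{1}{2}(x-\mu_1)^{\top}\Sigma^{-1}(x-\mu_1)
    + \tfrac{1}{2}(x-\mu_0)^{\top}\Sigma^{-1}(x-\mu_0)
    + \log\frac{p(y=1)}{p(y=0)}
  = (\mu_1-\mu_0)^{\top}\Sigma^{-1}x + w_0
```

where w_0 collects the constant terms; setting the log-odds to zero therefore gives a hyperplane, i.e. a linear decision boundary.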
21. Quadratic Decision Boundary
- When the covariances are different (see the sketch below)
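Again a reconstruction rather than slide content: with different covariances the quadratic terms no longer cancel, so the log posterior-odds keeps a term quadratic in x and the decision boundary is a quadric:

```latex
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}
  = -\tfrac{1}{2}\,x^{\top}\left(\Sigma_1^{-1}-\Sigma_0^{-1}\right)x
    + \left(\Sigma_1^{-1}\mu_1-\Sigma_0^{-1}\mu_0\right)^{\top}x + c
```

where c collects the constants, including the log prior ratio and (1/2) log(|Σ0| / |Σ1|).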
24. Clustering
- Basic clustering problem:
- Distribute data into k different groups such that data points similar to each other are in the same group
- Similarity between points is defined in terms of some distance metric
- Clustering is useful for:
- Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other
- Dimensionality reduction: high-dimensional data are replaced with a group (cluster) label
27. Distance Metrics
- Euclidean distance, in some space (for our purposes, probably a feature space)
- Must fulfill three properties: d(x, y) >= 0 (with equality iff x = y), d(x, y) = d(y, x), and d(x, z) <= d(x, y) + d(y, z)
28. Distance Metrics
- Common simple metrics:
- Euclidean
- Manhattan
- Both work for an arbitrary k-dimensional space (see the sketch below)
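A minimal sketch of both metrics for arbitrary k-dimensional feature vectors, assuming NumPy; the example points are made up.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance: sqrt of the sum of squared coordinate differences."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)))

a, b = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(euclidean(a, b), manhattan(a, b))   # ~3.606 and 5.0
```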
29. Clustering Algorithms
- k-Nearest Neighbor
- k-Means
- Parzen Windows
30. k-Nearest Neighbor
- In essence, a classifier
- Requires input parameter k
- In this algorithm, k indicates the number of neighboring points to take into account when classifying a data point
- Requires training data
31. k-Nearest Neighbor Algorithm
- For each data point xn, choose its class by finding the most common class among the k nearest data points in the training set (see the sketch below)
- Use any distance measure (usually a Euclidean distance measure)
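A sketch of the algorithm just described, assuming the training set is held in NumPy arrays and Euclidean distance is used; the function name knn_classify is mine.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    nearest = np.argsort(dists)[:k]             # indices of the k closest points
    votes = np.asarray(y_train)[nearest]
    return Counter(votes).most_common(1)[0][0]  # most common class among them
```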
32. k-Nearest Neighbor Algorithm
- (Figure) Query point q1 and training example e1: with 1-nearest neighbor, q1 takes the concept represented by e1; with 5 nearest neighbors, q1 is classified as negative
33. k-Nearest Neighbor
- Advantages
- Simple
- General (can work with any distance measure you want)
- Disadvantages
- Requires well-classified training data
- Can be sensitive to the k value chosen
- All attributes are used in classification, even ones that may be irrelevant
- Inductive bias: we assume that a data point should be classified the same as points near it
34. k-Means
- Suitable only when data points have continuous values
- Groups are defined in terms of cluster centers (means)
- Requires input parameter k
- In this algorithm, k indicates the number of clusters to be created
- Guaranteed to converge to at least a local optimum
35. k-Means Algorithm
- Algorithm (see the sketch below):
- Randomly initialize k mean values
- Repeat the next two steps until there is no change in the means:
- Partition the data using a similarity measure according to the current means
- Move the means to the center of the data in the current partition
- Stop when there is no change in the means
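A compact sketch of this loop, assuming Euclidean distance as the similarity measure and NumPy arrays; initializing the means from k random training points and the empty-cluster guard are my own choices.

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Plain k-means: alternate partitioning and mean updates until the means stop moving."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(max_iter):
        # Partition: assign each point to its nearest current mean
        labels = np.argmin(np.linalg.norm(X[:, None] - means[None], axis=2), axis=1)
        # Update: move each mean to the center of its partition
        new_means = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                              else means[c] for c in range(k)])
        if np.allclose(new_means, means):                   # stop when no change
            break
        means = new_means
    return means, labels
```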
37. k-Means
- Advantages
- Simple
- General (can work with any distance measure you want)
- Requires no training phase
- Disadvantages
- Result is very sensitive to initial mean placement
- Can perform poorly on overlapping regions
- Doesn't work on features with non-continuous values (can't compute cluster means)
- Inductive bias: we assume that a data point should be classified the same as points near it
38. Parzen Windows
- Similar to k-Nearest Neighbor, but instead of using the k closest training data points, it uses all points within a kernel (window), weighting their contribution to the classification based on the kernel
- As with our classification algorithms, we will consider a Gaussian kernel as the window
39. Parzen Windows
- Assume a region defined by a d-dimensional Gaussian of scale s
- We can define a window density function (see the sketch below)
- Note that we consider all points in the training set, but if a point is far outside the kernel, its weight will be effectively 0, negating its influence
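A sketch of a window density function of this kind, assuming an isotropic d-dimensional Gaussian kernel of scale sigma; the small classification helper parallels the k-NN discussion and is my own addition.

```python
import numpy as np

def parzen_density(x, X_train, sigma=1.0):
    """Window density at x: average of a d-dimensional Gaussian kernel of
    scale sigma evaluated at every training point."""
    X_train = np.atleast_2d(X_train)
    d = X_train.shape[1]
    sq_dists = np.sum((X_train - np.asarray(x)) ** 2, axis=1)
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    return np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2)) / norm)

def parzen_classify(x, X_train, y_train, sigma=1.0):
    """Pick the class whose windowed density, weighted by its prior, is largest."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    scores = {c: np.mean(y_train == c) * parzen_density(x, X_train[y_train == c], sigma)
              for c in np.unique(y_train)}
    return max(scores, key=scores.get)
```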
41. Parzen Windows
- Advantages
- More robust than k-nearest neighbor
- Excellent accuracy and consistency
- Disadvantages
- How to choose the size of the window?
- Alone, kernel density estimation techniques
provide little insight into data or problems