Methods in Medical Image Analysis: Statistics of Pattern Recognition, Classification and Clustering

1
Methods in Medical Image Analysis
Statistics of Pattern Recognition: Classification and Clustering
  • Some content provided by Milos Hauskrecht,
    University of Pittsburgh Computer Science

2
ITK Questions?
3
Classification
4
Classification
5
Classification
6
Features
  • Loosely stated, a feature is a value describing
    something about your data points (e.g., for
    pixels: intensity, local gradient, distance from
    a landmark, etc.)
  • Multiple (n) features are put together to form a
    feature vector, which defines a data point's
    location in n-dimensional feature space (a small
    sketch follows below)

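As a toy illustration of the feature-vector idea (the measurements and values below are hypothetical, not from the slides), a per-pixel feature vector can be assembled like this:

```python
import numpy as np

def pixel_feature_vector(intensity, gradient_magnitude, dist_to_landmark):
    """Stack n = 3 per-pixel measurements into one point in 3-D feature space."""
    return np.array([intensity, gradient_magnitude, dist_to_landmark], dtype=float)

# Hypothetical values for a single pixel
x = pixel_feature_vector(intensity=142.0, gradient_magnitude=3.7, dist_to_landmark=12.5)
print(x.shape)  # (3,) -> one point in a 3-dimensional feature space
```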
7
Feature Space
  • Feature Space: the theoretical n-dimensional
    space occupied by n input raster objects
    (features).
  • Each feature represents one dimension, and its
    values represent positions along one of the
    orthogonal coordinate axes in feature space.
  • The set of feature values belonging to a data
    point define a vector in feature space.

8
Statistical Notation
  • Class probability distribution:
  • p(x, y) = p(x | y) p(y)
  • x: feature vector (x1, x2, x3, ..., xn)
  • y: class label
  • p(x | y): probability of x given y
  • p(x, y): joint probability of both x and y

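A minimal numeric sketch of the factorization p(x, y) = p(x | y) p(y) for a discrete toy example (all numbers below are made up for illustration):

```python
# Hypothetical priors p(y) and class-conditional probabilities p(x = v | y)
p_y = {0: 0.6, 1: 0.4}
p_x_given_y = {0: 0.2, 1: 0.7}

# Joint probability p(x = v, y) = p(x = v | y) * p(y) for each class
p_joint = {y: p_x_given_y[y] * p_y[y] for y in p_y}
print(p_joint)  # {0: 0.12, 1: 0.28}
```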
9
Example Binary Classification
10
Example Binary Classification
  • Two class-conditional distributions:
  • p(x | y = 0), p(x | y = 1)
  • Priors:
  • p(y = 0) + p(y = 1) = 1

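A small sketch of such a two-class generative model; the priors, means, and scales are made-up numbers, and 1-D Gaussians stand in for the class-conditional densities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical priors; they must sum to 1
p_y0, p_y1 = 0.3, 0.7

# Hypothetical class-conditional densities p(x | y): 1-D Gaussians
mu0, sigma0 = 0.0, 1.0
mu1, sigma1 = 3.0, 1.5

def sample(n):
    """Draw labeled samples from p(x, y) = p(x | y) p(y)."""
    y = rng.random(n) < p_y1                        # True -> class 1 with probability p_y1
    x = np.where(y, rng.normal(mu1, sigma1, n), rng.normal(mu0, sigma0, n))
    return x, y.astype(int)

x, y = sample(5)
print(x, y)
```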
11
Modeling Class Densities
  • In the text, they choose to concentrate on
    methods that use Gaussians to model class
    densities

12
Modeling Class Densities
13
Generative Approach to Classification
  • Represent and learn the distribution
    p(x,y)
  • Use it to define probabilistic discriminant
    functions
  • e.g.
  • g0(x) = p(y = 0 | x)
  • g1(x) = p(y = 1 | x)

14
Generative Approach to Classification
  • Typical model
  • p(x, y) = p(x | y) p(y)
  • p(x | y): class-conditional distributions
    (densities)
  • p(y): priors of classes (probability of class
    y)
  • We want:
  • p(y | x): posteriors of classes

15
Class Modeling
  • We model the class distributions as multivariate
    Gaussians
  • x ~ N(µ0, Σ0) for y = 0
  • x ~ N(µ1, Σ1) for y = 1
  • Priors are based on training data, or a
    distribution can be chosen that is expected to
    fit the data well (e.g. Bernoulli distribution
    for a coin flip)

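A minimal sketch (function and variable names are my own, not from the slides) of estimating each class's Gaussian parameters and prior from labeled training data:

```python
import numpy as np

def fit_gaussian_class_model(X, y):
    """Estimate mean, covariance, and prior p(y = c) for each class c in {0, 1}.
    X: (N, n) matrix of feature vectors, y: (N,) array of class labels."""
    model = {}
    for c in (0, 1):
        Xc = X[y == c]
        model[c] = {
            "prior": len(Xc) / len(X),          # prior estimated from class frequency
            "mean": Xc.mean(axis=0),            # mu_c
            "cov": np.cov(Xc, rowvar=False),    # Sigma_c
        }
    return model
```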
16
Making a class decision
  • We need to define discriminant functions gi(x)
  • We have two basic choices:
  • Likelihood of data: choose the class (Gaussian)
    that best explains the input data x
  • Posterior of class: choose the class with the
    higher posterior probability

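A sketch of both choices, reusing the model dictionary from the previous sketch (the helper name and structure are assumptions for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, model, use_posterior=True):
    """Pick the class with the larger discriminant g_c(x).
    use_posterior=False: g_c(x) = p(x | y = c)                        (likelihood of the data)
    use_posterior=True:  g_c(x) proportional to p(y = c | x)
                         = p(x | y = c) p(y = c) / p(x)               (posterior of the class)"""
    scores = {}
    for c, m in model.items():
        likelihood = multivariate_normal.pdf(x, mean=m["mean"], cov=m["cov"])
        scores[c] = likelihood * m["prior"] if use_posterior else likelihood
    return max(scores, key=scores.get)
```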
17
Calculating Posteriors
  • Use Bayes' rule: p(y | x) = p(x | y) p(y) / p(x)
  • In this case (two classes), the evidence expands
    as p(x) = p(x | y = 0) p(y = 0) + p(x | y = 1) p(y = 1)

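A numeric sketch of that posterior computation, using the hypothetical 1-D Gaussian class models and priors from the earlier sketch:

```python
from scipy.stats import norm

x = 1.8                                                      # a single observed feature value
lik0, lik1 = norm.pdf(x, 0.0, 1.0), norm.pdf(x, 3.0, 1.5)    # p(x | y = 0), p(x | y = 1)
prior0, prior1 = 0.3, 0.7

evidence = lik0 * prior0 + lik1 * prior1                     # p(x)
post0 = lik0 * prior0 / evidence                             # p(y = 0 | x)
post1 = lik1 * prior1 / evidence                             # p(y = 1 | x)
print(post0, post1, post0 + post1)                           # posteriors sum to 1
```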
18
Linear Decision Boundary
  • When the class covariances are equal (Σ0 = Σ1),
    the decision boundary is linear in x (see the
    derivation below)

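A standard derivation of why equal covariances give a linear boundary (not from the slides; the notation matches the Gaussian class models above, with shared covariance Σ):

```latex
% Log-discriminants for the two Gaussian classes with a shared covariance \Sigma:
g_i(x) = -\tfrac{1}{2}(x - \mu_i)^\top \Sigma^{-1}(x - \mu_i) + \ln p(y = i) + \text{const}
% The quadratic term x^\top \Sigma^{-1} x is the same for both classes, so g_1(x) = g_0(x)
% reduces to a linear (hyperplane) boundary:
w^\top x + b = 0, \qquad
w = \Sigma^{-1}(\mu_1 - \mu_0), \qquad
b = \tfrac{1}{2}\,\mu_0^\top \Sigma^{-1}\mu_0
  - \tfrac{1}{2}\,\mu_1^\top \Sigma^{-1}\mu_1
  + \ln\frac{p(y = 1)}{p(y = 0)}
```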
19
Linear Decision Boundary
20
Linear Decision Boundary
21
Quadratic Decision Boundary
  • When the class covariances differ (Σ0 ≠ Σ1), the
    quadratic terms no longer cancel and the decision
    boundary is quadratic in x

22
Quadratic Decision Boundary
23
Quadratic Decision Boundary
24
Clustering
  • Basic Clustering Problem
  • Distribute data into k different groups such that
    data points similar to each other are in the same
    group
  • Similarity between points is defined in terms of
    some distance metric
  • Clustering is useful for
  • Similarity/Dissimilarity analysis
  • Analyze which data points in the sample are close
    to each other
  • Dimensionality Reduction
  • High dimensional data replaced with a group
    (cluster) label

25
Clustering
26
Clustering
27
Distance Metrics
  • Euclidean Distance, in some space (for our
    purposes, probably a feature space)
  • Must fulfill three properties: non-negativity
    (with d(x, y) = 0 iff x = y), symmetry
    (d(x, y) = d(y, x)), and the triangle inequality
    (d(x, z) ≤ d(x, y) + d(y, z))

28
Distance Metrics
  • Common simple metrics:
  • Euclidean
  • Manhattan
  • Both work for an arbitrary n-dimensional feature
    space (see the sketch below)

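A minimal sketch of the two metrics on plain NumPy arrays (the function names are my own):

```python
import numpy as np

def euclidean(a, b):
    """d(a, b) = sqrt(sum_i (a_i - b_i)^2)"""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def manhattan(a, b):
    """d(a, b) = sum_i |a_i - b_i|"""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)))

a, b = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(euclidean(a, b), manhattan(a, b))  # ~3.606, 5.0
```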
29
Clustering Algorithms
  • k-Nearest Neighbor
  • k-Means
  • Parzen Windows

30
k-Nearest Neighbor
  • In essence, a classifier
  • Requires input parameter k
  • In this algorithm, k indicates the number of
    neighboring points to take into account when
    classifying a data point
  • Requires training data

31
k-Nearest Neighbor Algorithm
  • For each data point xn, choose its class by
    finding the most common class among the k
    nearest data points in the training set
  • Use any distance measure (usually Euclidean
    distance); a sketch follows below

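A compact sketch of that rule using Euclidean distance (the function name and array layout are my own, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    """Assign x the most common class among its k nearest training points.
    X_train: (N, n) array of training feature vectors, y_train: (N,) array of labels."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]
```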
32
k-Nearest Neighbor Algorithm
(Figure: positive and negative training examples plotted around the query point q1; e1 is its nearest training example)
1-nearest neighbor: q1 takes the class of the concept represented by e1
5-nearest neighbors: q1 is classified as negative
33
k-Nearest Neighbor
  • Advantages
  • Simple
  • General (can work for any distance measure you
    want)
  • Disadvantages
  • Requires well classified training data
  • Can be sensitive to k value chosen
  • All attributes are used in classification, even
    ones that may be irrelevant
  • Inductive bias: we assume that a data point
    should be classified the same as points near it

34
k-Means
  • Suitable only when data points have continuous
    values
  • Groups are defined in terms of cluster centers
    (means)
  • Requires input parameter k
  • In this algorithm, k indicates the number of
    clusters to be created
  • Guaranteed to converge to at least a local
    optimum

35
k-Means Algorithm
  • Algorithm (a sketch follows below):
  • Randomly initialize k mean values
  • Repeat the next two steps until the means no
    longer change:
  • Partition the data using a similarity measure
    according to the current means
  • Move each mean to the center of the data in its
    current partition
  • Stop when there is no change in the means

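A plain-NumPy sketch of the algorithm above; the initialization, iteration cap, and empty-cluster fallback are my own choices for illustration:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Alternate partition and mean-update steps until the means stop changing.
    X: (N, n) array of data points with continuous-valued features."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]   # random initial means
    for _ in range(n_iter):
        # Partition: assign each point to its nearest current mean (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update: move each mean to the center of its partition
        # (keep the old mean if a cluster ends up empty)
        new_means = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else means[c]
                              for c in range(k)])
        if np.allclose(new_means, means):                   # stop when the means no longer change
            break
        means = new_means
    return means, labels
```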
36
k-Means
37
k-Means
  • Advantages
  • Simple
  • General (can work for any distance measure you
    want)
  • Requires no training phase
  • Disadvantages
  • Result is very sensitive to initial mean
    placement
  • Can perform poorly on overlapping regions
  • Doesn't work on features with non-continuous
    values (can't compute cluster means)
  • Inductive bias: we assume that a data point
    should be classified the same as points near it

38
Parzen Windows
  • Similar to k-Nearest Neighbor, but instead of
    using the k closest training data points, it
    uses all points within a kernel (window),
    weighting their contribution to the
    classification based on the kernel
  • As with our classification algorithms, we will
    consider a Gaussian kernel as the window

39
Parzen Windows
  • Assume a region defined by a d-dimensional
    Gaussian of scale σ
  • We can define a window density function as the
    average kernel weight over the training set,
    p(x) = (1/N) Σi Kσ(x − xi)
  • Note that we consider all points in the training
    set, but if a point is far outside the kernel,
    its weight is effectively zero, negating its
    influence

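A sketch of that window density function with an isotropic Gaussian kernel (the names and normalization choice are my own; this follows the standard kernel density estimate rather than anything preserved from the slides):

```python
import numpy as np

def parzen_density(x, X_train, sigma=1.0):
    """Estimate p(x) as the average Gaussian kernel weight over all training points:
    p(x) = (1/N) * sum_i K_sigma(x - x_i), with an isotropic d-dimensional Gaussian kernel."""
    X_train = np.asarray(X_train, dtype=float)
    d = X_train.shape[1]
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    norm_const = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)   # Gaussian normalization in d dimensions
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2)) / norm_const
    return weights.mean()                                   # distant points contribute ~0
```

For classification, one could estimate this density separately for each class and pick the class whose density (times its prior) is larger at x, mirroring the generative approach used earlier.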
40
Parzen Windows
41
Parzen Windows
  • Advantages
  • More robust than k-nearest neighbor
  • Excellent accuracy and consistency
  • Disadvantages
  • How to choose the size of the window?
  • Alone, kernel density estimation techniques
    provide little insight into data or problems