Machine Learning: A Brief Introduction
1
Machine Learning: A Brief Introduction
  • Fu Chang
  • Institute of Information Science
  • Academia Sinica
  • 2788-3799 ext. 1819
  • fchang@iis.sinica.edu.tw

2
Machine Learning as a Tool for Classifying
Patterns
  • What is the difference between you and me?
  • Tentative answer 1
  • You are pretty, and I am ugly
  • A vague answer, not very useful
  • Tentative answer 2
  • You have a tiny mouth, and I have a big one
  • A lot more useful, but what if we are viewed from
    the side?
  • In general, can we use a single feature
    difference to distinguish one pattern from
    another?

3
Old Philosophical Debates
  • What makes a cup a cup?
  • Philosophical views
  • Plato: the ideal type
  • Aristotle: the collection of all cups
  • Wittgenstein: family resemblance

4
Machine Learning Viewpoint
  • Represent each object with a set of features
  • Mouth, nose, eyes, etc., viewed from the front,
    the right side, the left side, etc.
  • Each pattern is taken as a conglomeration of
    sample points or feature vectors

5
Patterns as Conglomerations of Sample Points
[Figure: two conglomerations of sample points, labeled A and B, illustrating two types of sample points]
6
ML Viewpoint (Cont'd)
  • Training phase
  • Want to learn pattern differences among
    conglomerations of labeled samples
  • Have to describe the differences by means of a
    model: probability distribution, prototype,
    neural network, etc.
  • Have to estimate parameters involved in the model
  • Testing phase
  • Have to classify at acceptable accuracy rates

7
Models
  • Neural networks
  • Support vector machines
  • Classification and regression tree
  • AdaBoost
  • Statistical models
  • Prototype classifiers

8
Neural Networks
9
Back-Propagation Neural Networks
  • Layers
  • Input: number of nodes = dimension of the feature
    vector
  • Output: number of nodes = number of class types
  • Hidden: number of nodes > dimension of the feature
    vector
  • Direction of data migration
  • Training: backward propagation
  • Testing: forward propagation
  • Training problems
  • Overfitting
  • Convergence

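The slide's network diagram and update equations are not reproduced in this transcript. Below is a minimal sketch of forward and backward propagation for a single-hidden-layer network, assuming sigmoid activations, squared-error loss, and toy XOR-style data (none of which are specified on the slide).

```python
import numpy as np

# Toy training data: 2-D feature vectors with XOR-style one-hot class labels.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 4, 2      # hidden layer wider than the input, as the slide suggests
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(5000):
    # Forward propagation (the direction used when testing).
    H = sigmoid(X @ W1)              # hidden-layer activations
    O = sigmoid(H @ W2)              # output-layer activations, one node per class type
    # Backward propagation of the squared-error gradient (the direction used when training).
    dO = (O - Y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dO
    W1 -= lr * X.T @ dH

# Classify by the output node with the largest activation.
print(np.argmax(sigmoid(sigmoid(X @ W1) @ W2), axis=1))
```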
10
Illustration
11
Support Vector Machines (SVM)
12
SVM
  • Gives rise to the optimal solution to the binary
    classification problem
  • Finds a separating boundary (hyperplane) that
    maintains the largest margin between samples of
    two class types
  • Things to tune
  • Kernel functions defining the similarity measure
    of two sample vectors
  • Tolerance for misclassification
  • Parameters associated with the kernel function

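The slide does not name a library, but the three items to tune map directly onto the parameters of scikit-learn's SVC. A hedged sketch (the kernel choice, C value, and gamma below are illustrative, not the presenter's):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# A toy binary classification problem: two clusters of sample points.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

clf = SVC(
    kernel="rbf",  # kernel function: the similarity measure between two sample vectors
    C=1.0,         # tolerance for misclassification (smaller C allows a wider, softer margin)
    gamma=0.5,     # parameter associated with the RBF kernel
)
clf.fit(X, y)

# The support vectors are the samples that determine the maximum-margin hyperplane.
print(clf.support_vectors_.shape, clf.score(X, y))
```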
13
Illustration
14
Classification and Regression Tree (CART)
15
Illustration
16
AdaBoost
17
AdaBoost
  • Can be thought of as a linear combination of the
    same classifier c(·, ·) with varying weights
  • The Idea
  • Iteratively apply the same classifier c to a set
    of samples
  • At iteration m, the samples erroneously
    classified at the (m-1)st iteration are duplicated
    at a rate αm
  • The weight βm is related to αm in a certain way

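The Greek symbols on this slide did not survive the transcript. A minimal sketch of the reweighting loop, assuming decision stumps as the repeated base classifier c and the standard AdaBoost weight formulas (the slide itself leaves the exact relation between the two weights unspecified):

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted decision stump: threshold one feature and predict +1 / -1."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, M=20):
    """Labels y must be coded as +1 / -1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start from uniform sample weights
    stumps, betas = [], []
    for m in range(M):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)
        beta = 0.5 * np.log((1 - err) / err)       # classifier weight for iteration m
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-beta * y * pred)              # misclassified samples are upweighted ("duplicated")
        w /= w.sum()
        stumps.append((j, thr, sign))
        betas.append(beta)
    return stumps, betas

def predict(X, stumps, betas):
    # A linear combination of the same kind of classifier with varying weights.
    F = sum(b * s * np.where(X[:, j] > t, 1, -1) for (j, t, s), b in zip(stumps, betas))
    return np.sign(F)

# Usage: stumps, betas = adaboost(X_train, y_train); y_pred = predict(X_test, stumps, betas)
```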
18
Statistical Models
19
Bayesian Approach
  • Given
  • Training samples X = {x1, x2, ..., xn}
  • Probability density p(t | T)
  • t is an arbitrary vector (a test sample)
  • T is the set of parameters
  • T is taken as a set of random variables

20
Bayesian Approach (Cont'd)
  • Posterior density
  • Different class types give rise to different
    posteriors
  • Use the posteriors to evaluate the class type of
    a given test sample t

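The posterior formula is not reproduced in the transcript. A minimal sketch of classifying with per-class posteriors, assuming (purely for illustration) Gaussian class-conditional densities; note that it plugs in point estimates of the parameters T rather than integrating over a parameter posterior as a fully Bayesian treatment would.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_models(X, y):
    """Estimate the parameters T of each class: mean, covariance, and prior."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return models

def posteriors(t, models):
    """p(class | t) proportional to p(t | T_class) * p(class), normalized over class types."""
    scores = {c: multivariate_normal.pdf(t, mean=mu, cov=cov) * prior
              for c, (mu, cov, prior) in models.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Toy usage: two Gaussian-distributed classes, classify one test sample t.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
post = posteriors([2.5, 2.5], fit_class_models(X, y))
print(post, max(post, key=post.get))   # posterior per class type and the chosen class
```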
21
A Bayesian Model with Hidden Variables
  • In addition to the observed data X, there exist
    some hidden data H
  • H is taken as a set of random variables
  • We want to optimize
  • with both T and H unknown
  • An iterative procedure (the EM algorithm) is
    required to do this

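The objective on this slide is not reproduced. As a concrete, hedged example of the iterative procedure, here is a minimal EM sketch for a two-component 1-D Gaussian mixture, where the hidden data H are the unknown component memberships and T the mixture parameters.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Observed data X drawn from two Gaussians; which component generated each point is hidden.
X = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

# Initial guesses for the parameters T = (mixing weights, means, standard deviations).
pi, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: posterior responsibility of each component for each point (expectation over H).
    dens = np.vstack([p * norm.pdf(X, m, s) for p, m, s in zip(pi, mu, sd)])  # shape (2, n)
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate T given the responsibilities.
    nk = resp.sum(axis=1)
    pi = nk / len(X)
    mu = (resp @ X) / nk
    sd = np.sqrt((resp * (X - mu[:, None]) ** 2).sum(axis=1) / nk)

print(pi, mu, sd)   # should approach the parameters of the generating mixture
```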
22
Hidden Markov Model (HMM)
  • HMM is a Bayesian model with hidden variables
  • The observed data consist of sequences of samples
  • The hidden variables are sequences of consecutive
    states

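A minimal sketch of the point above: for a discrete-observation HMM (the transition matrix, emission matrix, and symbols below are hypothetical), the likelihood of an observed sequence is obtained by summing over all hidden state sequences with the forward algorithm.

```python
import numpy as np

# Hypothetical 2-state HMM over 3 observation symbols.
A = np.array([[0.7, 0.3],          # state-to-state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # per-state emission probabilities of the 3 symbols
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])          # initial state distribution

def sequence_likelihood(obs):
    """Forward algorithm: p(obs) summed over all hidden state sequences."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(sequence_likelihood([0, 1, 2, 2]))
```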
23
Boltzmann-Gibbs Distribution
  • Given
  • States s1, s2, ..., sn
  • Density p(s) = ps
  • Maximum entropy principle
  • Without any information, one chooses the density
    ps to maximize the entropy
  • subject to the constraints

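The entropy and constraint formulas did not survive the transcript; a standard reconstruction, assuming the constraints are normalization and fixed expected values of feature functions fi:

```latex
\max_{p}\; H(p) = -\sum_{s} p_s \log p_s
\quad\text{subject to}\quad
\sum_{s} p_s = 1,
\qquad
\sum_{s} p_s\, f_i(s) = F_i,\quad i = 1,\dots,k.
```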
24
Boltzmann-Gibbs (Cont'd)
  • Consider the Lagrangian
  • Taking partial derivatives of L with respect to ps
    and setting them to zero, we obtain the
    Boltzmann-Gibbs density function
  • where Z is the normalizing factor

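The Lagrangian and the resulting density are likewise missing from the transcript; a standard reconstruction under the same assumptions as above:

```latex
L(p,\lambda) = -\sum_{s} p_s \log p_s
  + \lambda_0\Big(\sum_{s} p_s - 1\Big)
  + \sum_{i} \lambda_i\Big(\sum_{s} p_s f_i(s) - F_i\Big),
\qquad
\frac{\partial L}{\partial p_s} = 0
\;\Longrightarrow\;
p_s = \frac{1}{Z}\exp\!\Big(\sum_{i} \lambda_i f_i(s)\Big),
\quad
Z = \sum_{s}\exp\!\Big(\sum_{i} \lambda_i f_i(s)\Big).
```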
25
Boltzmann-Gibbs (Cont'd)
  • Maximum entropy (ME)
  • Use of Boltzmann-Gibbs as prior distribution
  • Compute the posterior for the given observed data
    and features fi
  • Use the optimal posterior to classify

26
Boltzmann-Gibbs (Cont'd)
  • Maximum entropy Markov model (MEMM)
  • The posterior consists of transition probability
    densities p(s' | s, X)
  • Conditional random field (CRF)
  • The posterior consists of both transition
    probability densities p(s' | s, X) and state
    probability densities p(s | X)

27
References
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, 2nd Ed., Wiley-Interscience, 2001.
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning, Springer-Verlag,
    2001.
  • P. Baldi and S. Brunak, Bioinformatics: The
    Machine Learning Approach, The MIT Press, 2001.