1
Classification
  • A task of induction to find patterns

2
Outline
  • Data and its format
  • Problem of Classification
  • Learning a classifier
  • Different approaches
  • Key issues

3
Data and its format
  • Data
  • attribute-value pairs
  • with/without class
  • Data type
  • continuous/discrete
  • nominal
  • Data format
  • Flat
  • If not flat, what should we do?

4
Sample data
5
Induction from databases
  • Inferring knowledge from data
  • The task of deduction
  • infer information that is a logical consequence
    of querying a database
  • Who conducted this class before?
  • Which courses are attended by Mary?
  • Deductive databases extending the RDBMS

6
Classification
  • It is one type of induction
  • data with class labels
  • Examples -
  • If weather is rainy then no golf
  • If ... then ...
  • If ... then ...

7
Different approaches
  • There exist many techniques
  • Decision trees
  • Neural networks
  • K-nearest neighbors
  • Naïve Bayesian classifiers
  • Support Vector Machines
  • Ensemble methods
  • Semi-supervised
  • and many more ...

8
A decision tree
9
Inducing a decision tree
  • There are many possible trees
  • let's try it on the golfing data
  • How to find the most compact one
  • that is consistent with the data?
  • Why the most compact?
  • Occam's razor principle
  • Issue of efficiency w.r.t. optimality

10
Information gain and entropy
  • Entropy - a measure of the uncertainty of a node's class
    distribution (see the definitions below)
  • Information gain - the difference in entropy between the
    node before and after splitting
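The entropy and information-gain formulas on this slide did not
survive the transcript; the standard definitions (assumed here, not
copied from the slide) are

    H(S) = -\sum_i p_i \log_2 p_i

    IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)

where p_i is the fraction of instances in S belonging to class i and
S_v is the subset of S for which attribute A takes value v.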

11
Building a compact tree
  • The key to building a decision tree - which
    attribute to choose in order to branch.
  • The heuristic is to choose the attribute with the
    maximum IG.
  • Another explanation is to reduce uncertainty as
    much as possible.
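A minimal Python sketch of this heuristic, assuming the data is a
list of dicts keyed by attribute name plus a class key (the names are
illustrative, not from the slides):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, attr, class_key):
        before = entropy([r[class_key] for r in rows])
        after = 0.0
        for value in set(r[attr] for r in rows):
            subset = [r[class_key] for r in rows if r[attr] == value]
            after += len(subset) / len(rows) * entropy(subset)
        return before - after

    def best_attribute(rows, attrs, class_key):
        # Branch on the attribute that reduces uncertainty the most.
        return max(attrs, key=lambda a: info_gain(rows, a, class_key))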

12
Learn a decision tree
  [Decision tree learned from the golf data]
  Outlook = sunny    -> Humidity = high   -> NO
                        Humidity = normal -> YES
  Outlook = overcast -> YES
  Outlook = rain     -> Wind = strong     -> NO
                        Wind = weak       -> YES
13
Issues of Decision Trees
  • Number of values of an attribute
  • Your solution?
  • When to stop
  • Data fragmentation problem
  • Any solution?
  • Mixed data types
  • Scalability

14
Rules and Tree stumps
  • Generating rules from decision trees
  • One path is a rule
  • We can do better. Why?
  • Tree stumps and 1R
  • For each attribute value, determine a default
    class (# of values = # of rules)
  • Calculate the # of errors for each rule
  • Find the total # of errors for that attribute's rule set
  • Choose the rule set that has the least # of errors
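A hedged sketch of the 1R procedure above, under the same
list-of-dicts data assumption used earlier (illustrative names only):

    from collections import Counter, defaultdict

    def one_r(rows, attrs, class_key):
        best = None
        for attr in attrs:
            # For each attribute value, the default class is the majority class.
            by_value = defaultdict(list)
            for r in rows:
                by_value[r[attr]].append(r[class_key])
            rules = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
            # Count the errors this attribute's rule set makes on the data.
            errors = sum(r[class_key] != rules[r[attr]] for r in rows)
            if best is None or errors < best[1]:
                best = (attr, errors, rules)
        return best  # (attribute, # of errors, its rule set)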

15
K-Nearest Neighbor
  • One of the most intuitive classification
    algorithms
  • An unseen instance's class is determined by its
    nearest neighbor
  • The problem is that it is sensitive to noise
  • Instead of using one neighbor, we can use k
    neighbors
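A minimal k-NN sketch; numeric feature vectors and Euclidean distance
are assumptions of the sketch, not something fixed by the slide:

    import math
    from collections import Counter

    def knn_predict(train, query, k=3):
        # train: list of (feature_vector, label) pairs
        neighbors = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
        # Majority vote among the k nearest neighbors dampens the effect of noise.
        return Counter(label for _, label in neighbors).most_common(1)[0][0]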

16
K-NN
  • New problems
  • How large should k be?
  • Lazy learning - does it learn?
  • large storage
  • A toy example (noise, majority)
  • How good is k-NN?
  • How to compare
  • Speed
  • Accuracy

17
Naïve Bayes Classifier
  • This is a direct application of Bayes' rule
  • P(C|X) = P(X|C)P(C)/P(X)
  • X - a vector of x1, x2, ..., xn
  • That's the best classifier we can build
  • But, there are problems
  • There are only a limited number of instances
  • How to estimate P(x|C)?
  • Your suggestions?

18
NBC (2)
  • Assume conditional independence between the xi's
  • We have
  • P(C|x) = P(x1|C) ... P(xi|C) ... P(xn|C) P(C)
  • What's missing? Is it really correct?
  • An example
  • How good is it in reality?
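A hedged sketch of such a classifier for nominal attributes; Laplace
smoothing is one common answer to the estimation problem above and is
an addition of this sketch, not something stated on the slides:

    import math
    from collections import Counter, defaultdict

    def train_nb(rows, attrs, class_key):
        class_counts = Counter(r[class_key] for r in rows)
        value_counts = defaultdict(Counter)      # (class, attr) -> value counts
        for r in rows:
            for a in attrs:
                value_counts[(r[class_key], a)][r[a]] += 1
        return class_counts, value_counts

    def predict_nb(model, attrs, x, alpha=1.0):
        class_counts, value_counts = model
        n = sum(class_counts.values())
        best, best_score = None, float("-inf")
        for c, nc in class_counts.items():
            score = math.log(nc / n)              # log P(C)
            for a in attrs:
                counts = value_counts[(c, a)]
                # log P(x_a | C), estimated with Laplace (add-alpha) smoothing
                score += math.log((counts[x[a]] + alpha) / (nc + alpha * len(counts)))
            if score > best_score:
                best, best_score = c, score
        return best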

19
No Free Lunch
  • If the goal is to obtain good generalization
    performance, there are no context-independent or
    usage-independent reasons to favor one learning
    or classification method over another.
  • http://en.wikipedia.org/wiki/No-Free-Lunch_theorems
  • What does it indicate?
  • Or is it easy to choose a good classifier for
    your application?
  • Again, there is no off-the-shelf solution for a
    reasonably challenging application.

20
Ensemble Methods
  • Motivation
  • Stability
  • Model generation
  • Bagging (Bootstrap Aggregating)
  • Boosting
  • Model combination
  • Majority voting
  • Meta learning
  • Stacking (using different types of classifiers)
  • Examples (classify-ensemble.ppt)
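A minimal sketch of bagging with majority voting; train_model and
predict_model stand in for whatever base learner is used and are
hypothetical placeholders, not real library calls:

    import random
    from collections import Counter

    def bagging_fit(rows, n_models, train_model):
        models = []
        for _ in range(n_models):
            # Bootstrap: sample with replacement, same size as the original data.
            sample = [random.choice(rows) for _ in rows]
            models.append(train_model(sample))
        return models

    def bagging_predict(models, x, predict_model):
        # Combine the models by majority voting.
        votes = Counter(predict_model(m, x) for m in models)
        return votes.most_common(1)[0][0]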

21
AdaBoost.M1 (from Weka Book)
Model generation
  • Assign equal weight to each training instance
  • For t iterations
  • Apply learning algorithm to weighted dataset,
  • store resulting model
  • Compute model's error e on weighted dataset
  • If e = 0 or e > 0.5
  • Terminate model generation
  • For each instance in dataset
  • If classified correctly by model
  • Multiply instance's weight by e/(1-e)
  • Normalize weight of all instances

Classification
Assign weight 0 to all classes
For each of the t models (or fewer)
  For the class this model predicts,
    add -log(e/(1-e)) to this class's weight
Return class with highest weight
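For comparison, a hedged usage sketch of boosting with scikit-learn
rather than Weka; decision stumps as base learners and 50 rounds are
illustrative choices, not prescribed by the slide:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
    # booster.fit(X_train, y_train); booster.predict(X_test)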
22
Using many different classifiers
  • We have learned some basic and often used
    classifiers
  • There are many more out there.
  • Regression
  • Discriminant analysis
  • Neural networks
  • Support vector machines
  • Pick the most suitable one for an application
  • Where to find all these classifiers?
  • Don't reinvent the wheel that is not as round
  • We will likely come back to classification and
    discuss support vector machines as requested

23
Assignment 3
  • Pick one of your favorite software packages (feel
    free to use any at your disposal, as we discussed
    in class)
  • Use the mushroom dataset found at UC Irvine
    Machine Learning Repository
  • Run a decision tree induction algorithm to get
    the following
  • Use resubstitution error to measure
  • Use 10-fold cross validation to measure
  • Show the confusion matrix for the above two error
    measures
  • Summarize and report your observations and
    conjectures if any
  • Submit a hardcopy report on Wednesday 3/1/06
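One possible way to carry this out with Python and scikit-learn; the
dataset URL and column layout below are assumptions about the UCI
mushroom data, and any package covered in class would do equally well:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import accuracy_score, confusion_matrix

    url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
           "mushroom/agaricus-lepiota.data")
    data = pd.read_csv(url, header=None)
    y = data[0]                                  # first column: edible/poisonous
    X = pd.get_dummies(data.drop(columns=[0]))   # one-hot encode nominal attributes

    tree = DecisionTreeClassifier()

    # Resubstitution error: train and test on the same data.
    tree.fit(X, y)
    resub = tree.predict(X)
    print("resubstitution error:", 1 - accuracy_score(y, resub))
    print(confusion_matrix(y, resub))

    # 10-fold cross-validation error.
    cv = cross_val_predict(DecisionTreeClassifier(), X, y, cv=10)
    print("10-fold CV error:", 1 - accuracy_score(y, cv))
    print(confusion_matrix(y, cv))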

24
Classification via Neural Networks
  [Diagram: a perceptron, whose weighted inputs are summed and passed
  through a squashing function]
25
What can a perceptron do?
  • Neuron as a computing device
  • To separate linearly separable points
  • Nice things about a perceptron
  • distributed representation
  • local learning
  • weight adjusting

26
Linear threshold unit
  • Basic concepts: projection, thresholding

  [Diagram: an input vector L is projected onto the weight vector W;
  inputs whose projection exceeds the threshold evoke output 1]
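In symbols (a standard linear-threshold-unit definition, not copied
from the missing figure): the input x is projected onto the weight
vector W, and the unit outputs 1 exactly when that projection clears
the threshold, i.e. y = 1 if W · x >= theta, else y = 0.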
27
E.g. 1: Solution region for the AND problem
  • Find a weight vector that satisfies all the
    constraints

AND problem:
  x1 x2 | AND
   0  0 |  0
   0  1 |  0
   1  0 |  0
   1  1 |  1
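Writing the constraints out (standard reasoning, not taken from the
missing figure): with weights w1, w2 and threshold theta, AND requires
0 < theta, w1 < theta, w2 < theta, and w1 + w2 >= theta. One weight
vector that satisfies all four is w1 = w2 = 1 with theta = 1.5, so the
solution region is non-empty.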
28
E.g. 2: Solution region for the XOR problem?
XOR problem:
  x1 x2 | XOR
   0  0 |  0
   0  1 |  1
   1  0 |  1
   1  1 |  0
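By the same reasoning (again an addition, not from the slide): XOR
requires 0 < theta, w1 >= theta, w2 >= theta, and w1 + w2 < theta.
Adding the middle two gives w1 + w2 >= 2*theta > theta, contradicting
the last constraint, so the solution region is empty and no single
linear threshold unit can represent XOR.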
29
Learning by error reduction
  • Perceptron learning algorithm
  • If the activation level of the output unit is 1
    when it should be 0, reduce the weight on the
    link to the ith input unit by rLi, where Li is
    the ith input value and r a learning rate
  • If the activation level of the output unit is 0
    when it should be 1, increase the weight on the
    link to the ith input unit by rLi
  • Otherwise, do nothing
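A minimal Python sketch of this learning rule; folding the threshold
in as an always-on bias input is an assumption of the sketch, not
something stated on the slide:

    def train_perceptron(examples, n_inputs, r=0.1, epochs=100):
        w = [0.0] * (n_inputs + 1)               # last weight acts as the bias
        for _ in range(epochs):
            for x, target in examples:           # x: list of inputs, target: 0 or 1
                xi = x + [1.0]
                out = 1 if sum(wi * v for wi, v in zip(w, xi)) > 0 else 0
                if out == 1 and target == 0:     # output too high: subtract r*Li
                    w = [wi - r * v for wi, v in zip(w, xi)]
                elif out == 0 and target == 1:   # output too low: add r*Li
                    w = [wi + r * v for wi, v in zip(w, xi)]
        return w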

30
Multi-layer perceptrons
  • Using the chain rule, we can back-propagate the
    errors for a multi-layer perceptron.

  [Diagram: a network with an input layer, a hidden layer, and an
  output layer]
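A textbook statement of what the chain rule yields for a network of
sigmoid units (assumed here; the slide does not spell it out): for an
output unit, \delta_j = o_j(1 - o_j)(t_j - o_j); for a hidden unit,
\delta_j = o_j(1 - o_j)\sum_k w_{jk}\delta_k; and each weight is then
updated by \Delta w_{ij} = r\,\delta_j\,o_i, where o denotes unit
outputs, t the target, and r the learning rate.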