Data Mining

1
3. Classification Methods

Patterns and Models
Regression, NBC
k-Nearest Neighbors
Decision Trees and Rules
Large size data

2
Models and Patterns

A model is a global description of data, or an
abstract representation of a real-world process
Estimating parameters of a model
Data-driven model building
Examples: regression, graphical models (BN), HMM
A pattern is about some local aspects of data
Patterns in data matrices
Predicates: (age < 40) (income < 10)
Patterns for strings (ASCII characters, DNA
alphabet)
Pattern discovery: rules

3
Performance Measures

Generality
How many instances are covered
Applicability
Or is it useful? All husbands are male.
Accuracy
Is it always correct? If not, how often?
Comprehensibility
Is it easy to understand? (a subjective measure)

4
Forms of Knowledge

Concepts
Probabilistic, logical (proposition/predicate),
functional
Rules
Taxonomies and Hierarchies
Dendrograms, decision trees
Clusters
Structures and Weights/Probabilities
ANN, BN

5
Induction from Data

Inferring knowledge from data - generalization
Supervised vs. unsupervised learning
Some graphical illustrations of learning tasks
(regression, classification, clustering)
Any other types of learning?
Compare: the task of deduction
Infer information/facts that are a logical
consequence of facts in a database
Who is John's grandpa? (deduced from e.g. Mary is
John's mother, Joe is Mary's father)
Deductive databases: extending the RDBMS

6
The Classification Problem

From a set of labeled training data, build a
system (a classifier) for predicting the class of
future data instances (tuples).
A related problem is to build a system from
training data to predict the value of an
attribute (feature) of future data instances.

7
What is a bad classifier?

Some of the simplest classifiers
Table-Lookup
What if x cannot be found in the training data?
We give up!?
Or, we can
A simple classifier Cs can be built as a
reference
If x can be found in the table (training data),
return its class; otherwise, what should it
return?
A bad classifier is one that does worse than Cs.
Do we need to learn a classifier for data of one
class?
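
The table-lookup classifier Cs described above can be sketched in a few lines of Python. This is only an illustration: the fallback to the overall majority class for unseen instances is an assumption (one possible answer to "what should it return?"), and the names `build_table_lookup` and `classify` are hypothetical.

```python
from collections import Counter

def build_table_lookup(training_data):
    """training_data: list of (instance_tuple, class_label) pairs."""
    votes = {}
    for x, label in training_data:
        votes.setdefault(x, Counter())[label] += 1
    # One stored class per instance: the most frequent label seen for it.
    table = {x: counts.most_common(1)[0][0] for x, counts in votes.items()}
    # Assumed fallback for unseen instances: the overall majority class.
    default = Counter(label for _, label in training_data).most_common(1)[0][0]
    return table, default

def classify(model, x):
    table, default = model
    return table.get(x, default)

# Made-up training data for illustration.
train = [((1, 2, 1), 1), ((0, 0, 1), 1), ((2, 1, 2), 2),
         ((2, 2, 2), 2), ((1, 0, 1), 1)]
model = build_table_lookup(train)
seen = classify(model, (2, 1, 2))      # instance found in the table
unseen = classify(model, (9, 9, 9))    # instance not in the table: fallback
```

Any learned classifier can then be compared against this reference on held-out data.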

8
Many Techniques

Decision trees
Linear regression
Neural networks
k-nearest neighbour
Naïve Bayesian classifiers
Support Vector Machines
and many more ...

9
Regression for Numeric Prediction

Linear regression is a statistical technique that applies
when the class and all the attributes are numeric.
$y = \alpha + \beta x$, where $\alpha$ and $\beta$ are regression
coefficients
We need to use instances $\langle x_i, y_i \rangle$ to find $\alpha$ and $\beta$
by minimizing SSE (least squares)
$SSE = \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \alpha - \beta x_i)^2$
Extensions
Multiple regression
Piecewise linear regression
Polynomial regression
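
For the single-attribute case, the least-squares coefficients have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal Python sketch, with made-up data points and hypothetical function names:

```python
def fit_line(xs, ys):
    """Return (alpha, beta) minimizing sum((y - alpha - beta * x) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    return alpha, beta

def sse(xs, ys, alpha, beta):
    """Sum of squared errors of the fitted line on the given instances."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Made-up points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
alpha, beta = fit_line(xs, ys)
error = sse(xs, ys, alpha, beta)
```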

10
Nearest Neighbor

Also called instance-based learning
Algorithm
Given a new instance x,
find its nearest neighbor $\langle x', y' \rangle$
Return $y'$ as the class of x
Distance measures
Normalization?!
Some interesting questions
What's its time complexity?
Does it learn?
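
A minimal sketch of the nearest-neighbor algorithm above, assuming Euclidean distance and min-max normalization (both are choices made for this illustration, not prescribed by the slide). The data and the names `make_scaler` and `nearest_neighbor` are hypothetical. Note that the linear scan compares the query with every stored instance.

```python
import math

def make_scaler(rows):
    """Build a function that rescales every attribute to [0, 1]."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]
    def scale(row):
        return tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
    return scale

def nearest_neighbor(train_x, train_y, query):
    """Return the class of the stored instance closest to query."""
    best = min(range(len(train_x)),
               key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

# Made-up instances: (age, income) with a yes/no class.
raw = [(25, 30000), (30, 32000), (50, 90000), (55, 85000)]
labels = ["no", "no", "yes", "yes"]
scale = make_scaler(raw)
train_x = [scale(r) for r in raw]
predicted = nearest_neighbor(train_x, labels, scale((52, 88000)))
```

Without normalization, income would dominate the distance because its range is far larger than that of age.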

11
Nearest Neighbor (2)

Dealing with noise: k-nearest neighbor
Use more than 1 neighbor
How many neighbors?
Weighted nearest neighbors
How to speed up?
Huge storage
Use representatives (a problem of instance
selection)
Sampling
Grid
Clustering
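
To illustrate how using more than one neighbor dampens noise, here is a small sketch of k-nearest-neighbor voting. The inverse-distance weighting is one assumed scheme among several, and the data (including the deliberately mislabeled point) is made up.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3, weighted=False):
    """train: list of (point, label). Vote among the k closest points."""
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = math.dist(point, query)
        # Assumed weighting: inverse distance; the constant avoids division by zero.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.9, 1.1), "a"),
         ((3.0, 3.0), "b"), ((3.1, 2.9), "b"),
         ((1.1, 1.0), "b")]   # a noisy "b" sitting inside the "a" cluster

query = (1.08, 1.0)
with_one = knn_classify(train, query, k=1)     # follows the noisy point
with_three = knn_classify(train, query, k=3)   # outvoted by two "a" neighbors
```

The choice of k trades noise tolerance against sensitivity to local structure, which is why "How many neighbors?" has no single answer.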

12
Naïve Bayes Classification

This is a direct application of Bayes' rule
$P(C \mid x) = P(x \mid C)\,P(C) / P(x)$
$x$ is a vector $(x_1, x_2, \ldots, x_n)$
That's the best classifier you can ever build
You don't even need to select features; it takes
care of it automatically
But there are problems
There are a limited number of instances
How to estimate $P(x \mid C)$?

13
NBC (2)

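Assume conditional independence between the $x_i$'s
We have $P(C \mid x) \propto P(x_1 \mid C) \cdots P(x_i \mid C) \cdots P(x_n \mid C)\,P(C)$
How good is it in reality?
Let's build one NBC for a very simple data set
Estimate the priors and conditional probabilities
with the training data
$P(C=1) = ?$  $P(C=2) = ?$  $P(x_1=1 \mid C=1) = ?$  $P(x_1=2 \mid C=1) = ?$
What is the class for $x = (1,2,1)$?
$P(1 \mid x) \propto P(x_1=1 \mid 1)\,P(x_2=2 \mid 1)\,P(x_3=1 \mid 1)\,P(1)$,
$P(2 \mid x) \propto P(x_1=1 \mid 2)\,P(x_2=2 \mid 2)\,P(x_3=1 \mid 2)\,P(2)$
What is the class for $(1,2,2)$?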

14
Example of NBC

Counts by class (7 instances in total)

        C=1  C=2
Total   4    3
A1=0    2    0
A1=1    2    1
A1=2    0    2
A2=0
A2=1
A2=2
A3=1
A3=2

Training data

A1  A2  A3  C
1   2   1   1
0   0   1   1
2   1   2   2
1   2   1   2
0   1   2   1
2   2   2   2
1   0   1   1
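
The example can be worked through mechanically. The sketch below estimates the priors and conditional probabilities from the seven training rows by simple counting and scores both classes for a query. It uses exact fractions and no smoothing, so a zero count makes the whole product zero; that simplification, and the function names, are assumptions of this illustration.

```python
from collections import Counter
from fractions import Fraction

# Rows (A1, A2, A3, C) copied from the example's training data.
DATA = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]

def train_nbc(rows):
    class_counts = Counter(r[-1] for r in rows)
    # cond_counts[(attribute index, value, class)] = number of matching rows
    cond_counts = Counter()
    for *attrs, c in rows:
        for j, v in enumerate(attrs):
            cond_counts[(j, v, c)] += 1
    return class_counts, cond_counts, len(rows)

def scores(model, x):
    """Unnormalized P(C) * prod_j P(x_j | C) for each class C."""
    class_counts, cond_counts, n = model
    out = {}
    for c, nc in class_counts.items():
        s = Fraction(nc, n)                       # prior P(C)
        for j, v in enumerate(x):
            s *= Fraction(cond_counts[(j, v, c)], nc)   # P(x_j = v | C)
        out[c] = s
    return out

def classify(model, x):
    s = scores(model, x)
    return max(s, key=s.get)

model = train_nbc(DATA)
first = classify(model, (1, 2, 1))
second = classify(model, (1, 2, 2))
```

For x = (1,2,1) the unnormalized scores are 3/56 for class 1 and 2/63 for class 2, so class 1 wins; for (1,2,2) they are 1/56 and 4/63, so class 2 wins.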
15
Golf Data
16
Decision Trees

A decision tree

Outlook
  sunny: Humidity
    high: NO
    normal: YES
  overcast: YES
  rain: Wind
    strong: NO
    weak: YES
17
How to grow a tree?

Randomly -> Random Forests (Breiman, 2001)
What are the criteria to build a tree?
Accurate
Compact
A straightforward way to grow is
Pick an attribute
Split data according to its values
Recursively do the first two steps until
No data left
No feature left
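
The straightforward procedure above can be sketched as a short recursive function. In this illustration "pick an attribute" simply takes the next unused one, with no attempt at a good choice; stopping early on a pure node is an added shortcut, and the tuple-based tree representation is a hypothetical design. The rows reuse the small data set from the NBC example.

```python
from collections import Counter

DATA = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]   # (A1, A2, A3, C)

def majority(rows):
    return Counter(r[-1] for r in rows).most_common(1)[0][0]

def grow(rows, features):
    """features: attribute indices not yet used on this path."""
    classes = {r[-1] for r in rows}
    # Stop when no feature is left; stopping on a pure node is an added shortcut.
    if not features or len(classes) == 1:
        return majority(rows)                      # leaf: a class label
    attr, rest = features[0], features[1:]         # pick an attribute
    branches = {}
    for value in sorted({r[attr] for r in rows}):  # split on its values
        subset = [r for r in rows if r[attr] == value]
        branches[value] = grow(subset, rest)       # recurse
    return (attr, branches, majority(rows))        # internal node

def predict(tree, x):
    while isinstance(tree, tuple):
        attr, branches, fallback = tree
        if x[attr] not in branches:
            return fallback        # value never seen at this node
        tree = branches[x[attr]]
    return tree

tree = grow(DATA, [0, 1, 2])
```

Because only values that occur in the data create branches, the "no data left" case never produces an empty subset here.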

18
Discussion

There are many possible trees
Let's try it on the golf data
How to find the most compact one
that is consistent with the data?
Why the most compact?
Occam's razor principle
Issue of efficiency w.r.t. optimality
One attribute at a time or

19
Grow a good tree efficiently

The heuristic: find commonality in feature
values associated with class values
To build a compact tree generalized from the data
It means we look for features and splits that can
lead to pure leaf nodes.
Is it a good heuristic?
What do you think?
How to judge it?
Is it really efficient?
How to implement it?

20
Let's grow one

Measuring the purity of a data set: entropy
Information gain (see the brief review)
Choose the feature with max gain
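
As a small worked case, entropy and information gain can be computed for the seven-row data set from the NBC example (used here because the golf table is not reproduced in this transcript). The function names are hypothetical.

```python
import math
from collections import Counter

DATA = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]   # (A1, A2, A3, C)

def entropy(rows):
    """Entropy (in bits) of the class distribution in rows."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

gains = {attr: info_gain(DATA, attr) for attr in range(3)}
best = max(gains, key=gains.get)   # index of the attribute with max gain
```

On this data the gains are roughly 0.59 for A1, 0.31 for A2, and 0.13 for A3, so A1 would be chosen for the root.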

21
Different numbers of values

Different attributes can have varied numbers of
values
Some treatments
Removing useless attributes before learning
Binarization
Discretization
Gain-ratio is another practical solution
$\text{Gain} = \text{root-Info} - \text{Info}_{\text{Attribute}}(i)$
$\text{Split-Info} = -\sum_i (|T_i|/|T|) \log_2 (|T_i|/|T|)$
$\text{Gain-ratio} = \text{Gain} / \text{Split-Info}$
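
A short sketch of why gain ratio helps: Split-Info grows with the number of partitions, so an attribute with many values is penalized even when its raw gain is the same. The partition sizes and the gain value below are made up, and the function names are hypothetical.

```python
import math

def split_info(sizes):
    """sizes: number of instances |T_i| in each partition of the split."""
    total = sum(sizes)
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)

def gain_ratio(gain, sizes):
    si = split_info(sizes)
    # Guard against a split that puts everything in one partition.
    return gain / si if si > 0 else 0.0

# Two hypothetical attributes with the same gain of 1.0 on 8 instances:
binary = gain_ratio(1.0, [4, 4])          # two-valued attribute
many_valued = gain_ratio(1.0, [1] * 8)    # one distinct value per instance
```

Here the two-valued attribute keeps a ratio of 1.0, while the attribute with eight distinct values drops to about 0.33.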

22
Another kind of problem

A difficult problem. Why is it difficult?
Similar ones are the Parity and Majority problems.

XOR problem

x1  x2  class
0   0   0
0   1   1
1   0   1
1   1   0
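
One way to see the difficulty, assuming a greedy learner that scores one attribute at a time by information gain: on XOR, neither input reduces the class entropy on its own, so the learner has no basis for preferring either split. A minimal check (function names hypothetical):

```python
import math
from collections import Counter

XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # (x1, x2, class)

def entropy(rows):
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr):
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

# Gain of splitting on x1 alone and on x2 alone.
gains = [info_gain(XOR, attr) for attr in (0, 1)]
```

Both gains come out as zero: each half of either split still contains one instance of each class. The class only becomes predictable once both attributes are considered together.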
23
Tree Pruning

Overfitting: the model fits the training data too well,
but won't work well for unseen data.
Pruning is an effective approach to avoid overfitting and
to obtain a more compact tree (easier to understand)
Two general ways to prune
Pre-pruning: stop splitting further
Is there any significant difference in classification
accuracy before and after division?
Post-pruning: trim back

24
Rules from Decision Trees

Two types of rules
Order sensitive (more compact, less efficient)
Order insensitive
The most straightforward way is
Class-based method
Group rules according to classes
Select most general rules (or remove redundant
ones)
Data-based method
Select one rule at a time (keep the most general
one)
Work on the remaining data until all data is
covered

25
Variants of Decision Trees and Rules

Tree stumps
Holte's 1R rules (1992)
For each attribute A
Sort according to its values v
Find the most frequent class value c for each v
Break ties with coin flipping
Output the most accurate rule as: if A = v then c
An example (the Golf data)
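
The 1R steps can be sketched as follows. Since the golf table is not reproduced in this transcript, the sketch runs on the seven-row data set from the NBC example instead. Ties are broken by first occurrence rather than by a coin flip so that the run is repeatable; that substitution and the function name `one_r` are assumptions of this illustration.

```python
from collections import Counter

DATA = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]   # (A1, A2, A3, C)

def one_r(rows, n_attrs):
    """Return (attribute index, {value: class}, correctly covered rows)."""
    best = None
    for attr in range(n_attrs):
        rule, correct = {}, 0
        for value in sorted({r[attr] for r in rows}):
            counts = Counter(r[-1] for r in rows if r[attr] == value)
            # Most frequent class for this value (ties: first occurrence).
            label, hits = counts.most_common(1)[0]
            rule[value] = label
            correct += hits
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    return best

attr, rule, correct = one_r(DATA, 3)
```

On this data the single-attribute rule on A1 classifies 6 of the 7 training rows correctly, against 5 of 7 for A2 and A3.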

26
Handling Large Size Data

When data simply cannot fit in memory
Is it a big problem?
Three representative approaches
Smart data structures to avoid unnecessary
recalculation
Hash trees
SPRINT
Sufficient statistics
AVC-set (Attribute-Value, Class label) to
summarize the class distribution for each
attribute
Example: RainForest
Parallel processing
Make data parallelizable
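
To make the sufficient-statistics idea concrete, here is a rough sketch of building AVC-sets in a single pass over a stream of tuples. Only the counts are kept, so memory depends on the number of distinct attribute values and classes rather than on the number of rows. The generator stands in for reading from disk, and all names are hypothetical; this is not the RainForest implementation itself.

```python
from collections import defaultdict

def build_avc_sets(row_stream, n_attrs):
    """One pass over (attr_1, ..., attr_n, class) rows; stores counts only."""
    # avc[j][value][label] = number of rows with attribute j == value and that class
    avc = [defaultdict(lambda: defaultdict(int)) for _ in range(n_attrs)]
    for row in row_stream:
        label = row[-1]
        for j in range(n_attrs):
            avc[j][row[j]][label] += 1
    return avc

def stream():
    # Stand-in for reading tuples from disk one at a time.
    yield from [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
                (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]

avc = build_avc_sets(stream(), 3)
a1_counts = {value: dict(by_class) for value, by_class in avc[0].items()}
```

The class distribution per attribute value is all that entropy-based split selection needs, so the split for a node can be chosen from the AVC-sets without revisiting the rows.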

27
Ensemble Methods

A group of classifiers
Hybrid (Stacking)
Single type
Strong vs. weak learners
A good ensemble
Accuracy
Diversity
Some major approaches to forming ensembles
Bagging
Boosting
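
As an illustration of bagging, the sketch below trains each ensemble member on a bootstrap sample (drawn with replacement) and combines predictions by majority vote. The 1-nearest-neighbor base learner on a single numeric attribute, the made-up data, and the function names are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter

def bagging(train, learn, n_models=11, seed=0):
    """learn(sample) must return a function mapping an instance to a class."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the training set, drawn with replacement.
        sample = [rng.choice(train) for _ in train]
        models.append(learn(sample))
    return models

def vote(models, x):
    """Majority vote over the ensemble members' predictions."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def learn_1nn(sample):
    # Hypothetical base learner: 1-nearest neighbor on one numeric attribute.
    return lambda x: min(sample, key=lambda pl: abs(pl[0] - x))[1]

train = [(0.0, "a"), (1.0, "a"), (2.0, "a"),
         (10.0, "b"), (11.0, "b"), (12.0, "b")]
models = bagging(train, learn_1nn)
prediction = vote(models, 1.5)
```

Diversity comes from the different bootstrap samples; boosting instead reweights the training instances based on earlier members' errors.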

