Data Mining - PowerPoint PPT Presentation

Slides: 29
Provided by: cecsWrig7
Learn more at: http://cecs.wright.edu

1
3. Classification Methods
  • Patterns and Models
  • Regression, NBC
  • k-Nearest Neighbors
  • Decision Trees and Rules
  • Large size data

2
Models and Patterns
  • A model is a global description of data, or an
    abstract representation of a real-world process
  • Estimating parameters of a model
  • Data-driven model building
  • Examples: regression, graphical model (BN), HMM
  • A pattern is about some local aspects of data
  • Patterns in data matrices
  • Predicates: (age < 40) (income < 10)
  • Patterns for strings (ASCII characters, DNA
    alphabet)
  • Pattern discovery rules

3
Performance Measures
  • Generality
  • How many instances are covered
  • Applicability
  • Or is it useful? All husbands are male.
  • Accuracy
  • Is it always correct? If not, how often?
  • Comprehensibility
  • Is it easy to understand? (a subjective measure)

4
Forms of Knowledge
  • Concepts
  • Probabilistic, logical (proposition/predicate),
    functional
  • Rules
  • Taxonomies and Hierarchies
  • Dendrograms, decision trees
  • Clusters
  • Structures and Weights/Probabilities
  • ANN, BN

5
Induction from Data
  • Inferring knowledge from data - generalization
  • Supervised vs. unsupervised learning
  • Some graphical illustrations of learning tasks
    (regression, classification, clustering)
  • Any other types of learning?
  • Compare: the task of deduction
  • Infer information/facts that are logical
    consequences of facts in a database
  • Who is John's grandpa? (deduced from, e.g., "Mary is
    John's mother" and "Joe is Mary's father")
  • Deductive databases: extending the RDBMS

6
The Classification Problem
  • From a set of labeled training data, build a
    system (a classifier) for predicting the class of
    future data instances (tuples).
  • A related problem is to build a system from
    training data to predict the value of an
    attribute (feature) of future data instances.

7
What is a bad classifier?
  • Some of the simplest classifiers
  • Table-Lookup
  • What if x cannot be found in the training data?
  • We give up!?
  • Or, we can
  • A simple classifier Cs can be built as a
    reference
  • If x can be found in the table (training data),
    return its class; otherwise, what should it
    return?
  • A bad classifier is one that does worse than Cs.
  • Do we need to learn a classifier for data of one
    class?
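
The table-lookup reference classifier described on this slide can be sketched roughly as follows. The slide leaves open what to return for an unseen x; falling back to the most frequent training class is an assumption made here for illustration, and the function name and toy data are hypothetical.

```python
from collections import Counter

def build_lookup_classifier(training_data):
    """training_data: list of (x, label) pairs, where x is hashable (e.g. a tuple)."""
    table = {}
    for x, label in training_data:
        table.setdefault(x, Counter())[label] += 1
    # Assumed fallback (not stated on the slide): the most frequent class overall.
    default = Counter(label for _, label in training_data).most_common(1)[0][0]

    def classify(x):
        if x in table:
            # Seen in training: return the class most often recorded for x.
            return table[x].most_common(1)[0][0]
        return default

    return classify

# Toy data, made up for this sketch.
train = [((1, 2, 1), 1), ((0, 0, 1), 1), ((2, 1, 2), 2), ((0, 1, 2), 1)]
cs = build_lookup_classifier(train)
print(cs((2, 1, 2)))  # instance found in the table
print(cs((9, 9, 9)))  # unseen instance, falls back to the default class
```

Under this assumption, any learned classifier that scores below `cs` on held-out data would count as "bad" in the slide's sense.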

8
Many Techniques
  • Decision trees
  • Linear regression
  • Neural networks
  • k-nearest neighbour
  • Naïve Bayesian classifiers
  • Support Vector Machines
  • and many more ...

9
Regression for Numeric Prediction
  • Linear regression is a statistical technique that applies when
    the class and all the attributes are numeric.
  • $y = \alpha + \beta x$, where $\alpha$ and $\beta$ are regression
    coefficients
  • We need to use instances $\langle x_i, y_i \rangle$ to find $\alpha$ and $\beta$
  • by minimizing SSE (least squares)
  • $\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \alpha - \beta x_i)^2$
  • Extensions
  • Multiple regression
  • Piecewise linear regression
  • Polynomial regression
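
As a minimal sketch of the least-squares fit for the single-attribute case, the intercept and slope that minimize SSE can be computed in closed form. The function names and the sample points below are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared errors of the fitted line on the given points."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

# Made-up points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope, sse(xs, ys, intercept, slope))
```

Multiple and polynomial regression follow the same idea with more coefficients, usually solved with a linear-algebra routine rather than by hand.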

10
Nearest Neighbor
  • Also called instance based learning
  • Algorithm
  • Given a new instance x,
  • find its nearest neighbor $\langle x', y' \rangle$
  • Return $y'$ as the class of $x$
  • Distance measures
  • Normalization?!
  • Some interesting questions
  • What's its time complexity?
  • Does it learn?

11
Nearest Neighbor (2)
  • Dealing with noise: k-nearest neighbor
  • Use more than 1 neighbor
  • How many neighbors?
  • Weighted nearest neighbors
  • How to speed up?
  • Huge storage
  • Use representatives (a problem of instance
    selection)
  • Sampling
  • Grid
  • Clustering
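
A small sketch of nearest-neighbor classification with an unweighted majority vote, assuming Euclidean distance and numeric attributes that are already on comparable scales. The data is made up, and the last training point is deliberately mislabeled to show how k > 1 dampens noise.

```python
import math
from collections import Counter

def knn_classify(train, x, k=3):
    """train: list of (vector, label) pairs. Returns the majority label of the k nearest."""
    # Brute force: one distance computation per stored instance for every query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
    ((0.15, 0.15), "b"),  # noisy point sitting inside the "a" region
]
query = (0.12, 0.12)
print(knn_classify(train, query, k=1))  # follows the noisy neighbor
print(knn_classify(train, query, k=3))  # the two clean neighbors outvote it
```

Nothing is fitted in advance; all the work happens at query time, which is why storage and speed-up techniques such as instance selection matter.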

12
Naïve Bayes Classification
  • This is a direct application of Bayes' rule
  • $P(C \mid x) = P(x \mid C)\,P(C) / P(x)$
  • $x$: a vector $(x_1, x_2, \ldots, x_n)$
  • That's the best classifier you can ever build
  • You don't even need to select features, it takes
    care of it automatically
  • But, there are problems
  • There are only a limited number of instances
  • How to estimate $P(x \mid C)$?

13
NBC (2)
  • Assume conditional independence between the $x_i$'s
  • We have $P(C \mid x) \propto P(x_1 \mid C) \cdots P(x_i \mid C) \cdots P(x_n \mid C)\,P(C)$
  • How good is it in reality?
  • Let's build one NBC for a very simple data set
  • Estimate the priors and conditional probabilities
    with the training data
  • $P(C_1) = ?$ $P(C_2) = ?$ $P(x_1 = 1 \mid C_1) = ?$ $P(x_1 = 2 \mid C_1) = ?$
  • What is the class for $x = (1, 2, 1)$?
  • $P(1 \mid x) \propto P(x_1 = 1 \mid 1)\,P(x_2 = 2 \mid 1)\,P(x_3 = 1 \mid 1)\,P(1)$,
    $P(2 \mid x)$
  • What is the class for $(1, 2, 2)$?

14
Example of NBC
Counts per class (7 instances in total)
          C=1  C=2
Total     4    3
A1=0      2    0
A1=1      2    1
A1=2      0    2
A2=0
A2=1
A2=2
A3=1
A3=2

Training data
A1  A2  A3  C
1   2   1   1
0   0   1   1
2   1   2   2
1   2   1   2
0   1   2   1
2   2   2   2
1   0   1   1
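
As a hedged sketch, the seven training rows of this example can be run through a naïve Bayes classifier that estimates every probability as a plain relative frequency. No smoothing is applied, which is an assumption of this sketch, and the function names are hypothetical. Exact fractions are used so the scores can be checked by hand.

```python
from collections import Counter
from fractions import Fraction

# Rows are (A1, A2, A3, C), copied from the example.
rows = [
    (1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
    (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1),
]

def train_nbc(rows):
    class_counts = Counter(r[-1] for r in rows)
    # value_counts[(attribute index, value, class)] = number of matching rows
    value_counts = Counter()
    for *attrs, c in rows:
        for i, v in enumerate(attrs):
            value_counts[(i, v, c)] += 1
    return class_counts, value_counts

def score(x, c, class_counts, value_counts):
    """Unnormalized posterior: P(c) times the product of P(x_i | c)."""
    s = Fraction(class_counts[c], sum(class_counts.values()))
    for i, v in enumerate(x):
        s *= Fraction(value_counts[(i, v, c)], class_counts[c])
    return s

def classify(x, class_counts, value_counts):
    return max(class_counts, key=lambda c: score(x, c, class_counts, value_counts))

cc, vc = train_nbc(rows)
for x in [(1, 2, 1), (1, 2, 2)]:
    print(x, {c: score(x, c, cc, vc) for c in cc}, classify(x, cc, vc))
```

With these estimates, (1, 2, 1) scores 3/56 for class 1 against 2/63 for class 2, and (1, 2, 2) scores 1/56 against 4/63, so the two instances are assigned to classes 1 and 2 respectively.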
15
Golf Data
16
Decision Trees
  • A decision tree

Outlook
  sunny → Humidity
    high → NO
    normal → YES
  overcast → YES
  rain → Wind
    strong → NO
    weak → YES
17
How to grow a tree?
  • Randomly → Random Forests (Breiman, 2001)
  • What are the criteria to build a tree?
  • Accurate
  • Compact
  • A straightforward way to grow is
  • Pick an attribute
  • Split data according to its values
  • Recursively do the first two steps until:
  • No data left
  • No feature left
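
The straightforward growing procedure can be sketched as a short recursion. Two choices here are assumptions rather than part of the slide: the attribute picked at each step is simply the next one in a fixed order, and a node also stops when all its rows share one class. Because branches are only created for values that actually occur, the "no data left" case never arises in this version. The rows are hypothetical, in the form (A1, A2, A3, class).

```python
from collections import Counter

def grow_tree(rows, attrs):
    """Return a class label (leaf) or a pair (attribute index, {value: subtree})."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs or len(set(labels)) == 1:
        return majority  # no feature left, or node already pure
    attr, rest = attrs[0], attrs[1:]  # naive pick: next attribute in the given order
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r)
    return (attr, {v: grow_tree(subset, rest) for v, subset in branches.items()})

rows = [
    (1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
    (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1),
]
tree = grow_tree(rows, [0, 1, 2])
print(tree)
```

Changing the attribute order generally changes the shape and size of the resulting tree, which is what motivates a selection criterion.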

18
Discussion
  • There are many possible trees
  • Let's try it on the golf data
  • How to find the most compact one
  • that is consistent with the data?
  • Why the most compact?
  • Occam's razor principle
  • Issue of efficiency w.r.t. optimality
  • One attribute at a time or

19
Grow a good tree efficiently
  • The heuristic: find commonality in feature
    values associated with class values
  • To build a compact tree generalized from the data
  • It means we look for features and splits that can
    lead to pure leaf nodes.
  • Is it a good heuristic?
  • What do you think?
  • How to judge it?
  • Is it really efficient?
  • How to implement it?

20
Let's grow one
  • Measuring the purity of a data set: entropy
  • Information gain (see the brief review)
  • Choose the feature with max gain
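
A minimal sketch of entropy and information gain, assuming base-2 logarithms and categorical attributes. The rows are hypothetical, in the form (A1, A2, A3, class), and the function names are chosen for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels; 0 for a pure set."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Entropy of the whole set minus the weighted entropy after splitting on attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder

rows = [
    (1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
    (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1),
]
gains = {attr: information_gain(rows, attr) for attr in range(3)}
best = max(gains, key=gains.get)
print(gains, best)
```

On these rows the first attribute gives the largest gain (about 0.59 bits out of a root entropy of about 0.99), so it would be chosen for the root split.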

21
Different numbers of values
  • Different attributes can have varied numbers of
    values
  • Some treatments
  • Removing useless attributes before learning
  • Binarization
  • Discretization
  • Gain-ratio is another practical solution
  • $\text{Gain} = \text{root-Info} - \text{Info}_{\text{Attribute}}(i)$
  • $\text{Split-Info} = -\sum_i (T_i/T) \log_2 (T_i/T)$
  • $\text{Gain-ratio} = \text{Gain} / \text{Split-Info}$
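
To illustrate why gain ratio helps when attributes differ in their number of values, the sketch below compares a unique row identifier with an ordinary three-valued attribute. The data and function names are hypothetical, and the code assumes every attribute takes at least two values (otherwise Split-Info would be zero).

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain(rows, attr):
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[-1])
    info_attr = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - info_attr

def split_info(rows, attr):
    # Entropy of the partition sizes themselves, ignoring class labels.
    return entropy([r[attr] for r in rows])

def gain_ratio(rows, attr):
    return gain(rows, attr) / split_info(rows, attr)

# Hypothetical rows: (row_id, A1, class). row_id is unique for every row.
rows = [(1, 1, 1), (2, 0, 1), (3, 2, 2), (4, 1, 2), (5, 0, 1), (6, 2, 2), (7, 1, 1)]
for attr, name in [(0, "row_id"), (1, "A1")]:
    print(name, round(gain(rows, attr), 3), round(gain_ratio(rows, attr), 3))
```

Plain gain prefers `row_id` because every one-row partition is pure, while gain ratio penalizes its seven-way split and ranks `A1` higher.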

22
Another kind of problem
  • A difficult problem. Why is it difficult?
  • Similar ones are Parity, Majority problems.

XOR problem (two inputs, then the class):
0  0  0
0  1  1
1  0  1
1  1  0
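
One way to see the difficulty, sketched under the assumption that the tree is grown one attribute at a time: on XOR, no single attribute separates the classes better than chance, so a greedy criterion has nothing to prefer at the root. The helper name is hypothetical.

```python
from collections import Counter

# XOR rows: (input 1, input 2, class)
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def stump_accuracy(rows, attr):
    """Training accuracy of a one-level tree on attr; each branch predicts its majority class."""
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr], Counter())[r[-1]] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / len(rows)

print(stump_accuracy(xor_rows, 0), stump_accuracy(xor_rows, 1))
```

Both single-attribute splits leave every branch half and half, yet splitting on both attributes together yields pure leaves.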
23
Tree Pruning
  • Overfitting: the model fits the training data too well
    but won't work well on unseen data.
  • Pruning is an effective approach to avoid overfitting and
    to get a more compact tree (easier to understand)
  • Two general ways to prune
  • Pre-pruning: stop splitting further
  • Is there any significant difference in classification
    accuracy before and after division?
  • Post-pruning: trim back

24
Rules from Decision Trees
  • Two types of rules
  • Order sensitive (more compact, less efficient)
  • Order insensitive
  • The most straightforward way is
  • Class-based method
  • Group rules according to classes
  • Select most general rules (or remove redundant
    ones)
  • Data-based method
  • Select one rule at a time (keep the most general
    one)
  • Work on the remaining data until all data is
    covered

25
Variants of Decision Trees and Rules
  • Tree stumps
  • Holte's 1R rules (1992)
  • For each attribute A
  • Sort according to its values v
  • Find the most frequent class value c for each v
  • Breaking tie with coin flipping
  • Output the most accurate rule as: if A = v then c
  • An example (the Golf data)
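
The 1R procedure can be sketched as below. Since the golf data is not reproduced in this transcript, the sketch uses hypothetical rows of the form (A1, A2, A3, class) instead. Ties are broken deterministically (the class seen first wins) rather than by a coin flip, which is a simplification made here so the output is repeatable.

```python
from collections import Counter

rows = [
    (1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
    (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1),
]

def one_r(rows, n_attrs):
    """Return (attribute index, {value: class}, number of training rows classified correctly)."""
    best = None
    for attr in range(n_attrs):
        counts = {}
        for r in rows:
            counts.setdefault(r[attr], Counter())[r[-1]] += 1
        # Most frequent class for each value of this attribute.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        correct = sum(c[rule[v]] for v, c in counts.items())
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    return best

attr, rule, correct = one_r(rows, 3)
print(attr, rule, f"{correct}/{len(rows)}")
```

On these rows the first attribute wins with 6 of 7 training instances correct, giving rules of the form "if A1 = v then c" for each of its three values.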

26
Handling Large Size Data
  • When data simply cannot fit in memory
  • Is it a big problem?
  • Three representative approaches
  • Smart data structures to avoid unnecessary
    recalculation
  • Hash trees
  • SPRINT
  • Sufficient statistics
  • AVC-set (Attribute-Value, Class label) to
    summarize the class distribution for each
    attribute
  • Example: RainForest
  • Parallel processing
  • Make data parallelizable

27
Ensemble Methods
  • A group of classifiers
  • Hybrid (Stacking)
  • Single type
  • Strong vs. weak learners
  • A good ensemble
  • Accuracy
  • Diversity
  • Some major approaches to forming ensembles
  • Bagging
  • Boosting

28
Bibliography
  • I.H. Witten and E. Frank. Data Mining: Practical
    Machine Learning Tools and Techniques with Java
    Implementations. 2000. Morgan Kaufmann.
  • M. Kantardzic. Data Mining: Concepts, Models,
    Methods, and Algorithms. 2003. IEEE.
  • J. Han and M. Kamber. Data Mining: Concepts and
    Techniques. 2001. Morgan Kaufmann.
  • D. Hand, H. Mannila, P. Smyth. Principles of Data
    Mining. 2001. MIT.
  • T. G. Dietterich. Ensemble Methods in Machine
    Learning. In J. Kittler and F. Roli (eds.), 1st
    Intl. Workshop on Multiple Classifier Systems, pp.
    1-15, Springer-Verlag, 2000.