Rule induction: Ross Quinlan's ID3 algorithm

1
Rule induction: Ross Quinlan's ID3 algorithm
  • Fredda Weinberg
  • CIS 718X
  • Fall 2005
  • Professor Kopec
  • Assignment 3

2
The learning problem
  • You are presented with the data.
  • You have a supervised learning problem (that is,
    a target variable).
  • In practice, there is no such thing as the
    correct model.
  • You are looking for a best approximating model.
  • There is no reason to think that linear models
    provide the best approximating model.
  • SPSS Clementine Users Group

3
Terms
  • General
  • Decision trees.
  • Recursive partitioning -- Apply the same
    splitting rule to smaller and smaller partitions
    of the sample space.
  • Classification
  • Tree-based classification.
  • Classification trees.
  • ibid

4
Rule induction
  • 1. For each attribute, compute its entropy with
    respect to the conclusion.
  • 2. Select the attribute (say A) with the lowest
    entropy.
  • 3. Divide the data into separate sets so that
    within a set, A has a fixed value (eg
    Color=green eye color in one set, Color=brown in
    another, etc).
  • 4. Build a tree with branches:
  • if A=a1 then ... (subtree1)
  • if A=a2 then ... (subtree2)
  • ...etc...
  • 5. For each subtree, repeat this process from
    step 1.
  • 6. At each iteration, one attribute gets
    removed from consideration. The process stops
    when there are no attributes left to consider, or
    when all the data being considered in a subtree
    have the same value for the conclusion (eg they
    all say Conclusion=safe from sunburn).
  • Rule induction: Ross Quinlan's ID3 algorithm
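The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not Quinlan's original program: the function names (entropy, split_entropy, id3, classify), the nested-dict tree format, and the tiny sunburn-style data set are all assumptions made for this example. When attributes run out before the data agree, the sketch falls back to the majority label, which the slide does not specify.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, attr, target):
    """Step 1: weighted average entropy of the conclusion after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def id3(rows, attrs, target):
    labels = [row[target] for row in rows]
    # Step 6: stop when all rows agree or no attributes remain
    # (majority label is an assumption for the second case).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the lowest entropy.
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    # Steps 3-5: one branch per value of the chosen attribute, then recurse.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

def classify(tree, row):
    """Follow branches until a leaf (a plain label) is reached.
    Raises KeyError for an attribute value never seen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Hypothetical toy data, invented for illustration only.
rows = [
    {"color": "green", "lotion": "no",  "conclusion": "sunburned"},
    {"color": "green", "lotion": "yes", "conclusion": "safe"},
    {"color": "brown", "lotion": "no",  "conclusion": "safe"},
    {"color": "brown", "lotion": "yes", "conclusion": "safe"},
    {"color": "blue",  "lotion": "no",  "conclusion": "sunburned"},
]
tree = id3(rows, ["color", "lotion"], "conclusion")
print(tree)
```

On this data the split on color leaves less entropy than the split on lotion, so color becomes the root and only the green branch needs a second test on lotion.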

5
Iterative Dichotomizer
The rule induction algorithm was first used by
Hunt in his CLS (concept learning system) in
1962. Then, with extensions for handling numeric
data too, it was used by Ross Quinlan for his ID3
system in 1979. Quinlan's ID3 tried to cut down
on effort by inducing a set of rules from a small
subset of data, and then testing to see if those
rules explained other data. Data not explained
were then added to the chosen subset, and new
rules induced. This process continued until all
the data was accounted for. The letters ID stood
for 'iterative dichotomiser', a fancy name for
this simple algorithm.
Rule induction: Ross Quinlan's ID3 algorithm
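The subset-and-retest loop described above might look roughly like the following. This is a sketch under stated assumptions: the function name windowing, its parameters, and the data are invented for illustration, and the learner plugged in here (induce_table, a lookup table with a majority-label default) is a deliberately trivial stand-in, not ID3 itself. Any rule inducer with the same two-function interface could be substituted.

```python
from collections import Counter

def features(row, target):
    """Hashable key built from every attribute except the target."""
    return tuple(sorted((k, v) for k, v in row.items() if k != target))

def induce_table(window, target):
    """Stand-in learner (an assumption, not ID3): memorise the window,
    and fall back to its majority label for unseen attribute values."""
    table = {features(r, target): r[target] for r in window}
    default = Counter(r[target] for r in window).most_common(1)[0][0]
    return table, default

def predict_table(model, row, target):
    table, default = model
    return table.get(features(row, target), default)

def windowing(rows, target, induce, predict, initial_size):
    """Induce rules from a small subset, test them on the rest, add the
    rows the rules fail to explain, and repeat until none are left."""
    window = list(rows[:initial_size])
    while True:
        model = induce(window, target)
        unexplained = [r for r in rows
                       if r not in window and predict(model, r, target) != r[target]]
        if not unexplained:
            return model, window
        window.extend(unexplained)

# Hypothetical data, invented for illustration only.
rows = [
    {"color": "green", "lotion": "no",  "conclusion": "sunburned"},
    {"color": "brown", "lotion": "yes", "conclusion": "safe"},
    {"color": "brown", "lotion": "no",  "conclusion": "safe"},
    {"color": "green", "lotion": "yes", "conclusion": "safe"},
    {"color": "blue",  "lotion": "no",  "conclusion": "sunburned"},
    {"color": "blue",  "lotion": "yes", "conclusion": "safe"},
    {"color": "hazel", "lotion": "yes", "conclusion": "safe"},
    {"color": "hazel", "lotion": "no",  "conclusion": "safe"},
]
model, window = windowing(rows, "conclusion", induce_table, predict_table, initial_size=4)
print(len(window), "of", len(rows), "rows were needed")
```

The loop always terminates because each pass either stops or adds at least one row that was not yet in the window.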
6
Entropy
  • Entropy = $-\sum_i p_i \log_2 p_i$
  • Information-theoretic criterion: minimum average
    number of bits needed to encode the classification
    of an arbitrary case.
  • Ranges from 0 to 1 for a two-class target (up to
    $\log_2 k$ for k classes).
  • 0 if p is concentrated in one class.
  • Maximal if p is uniform across classes.
  • Entropy gain is the reduction in entropy after a
    split. Interpretation: number of bits saved when
    encoding the target value with knowledge of the
    predictor.
  • Entropy gain is biased in favor of attributes
    with many values. Gain ratio discourages the
    selection of attributes with many uniformly
    distributed values.
  • SPSS Clementine Users Group
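To make the quantities on this slide concrete, here is a small Python sketch of entropy, entropy gain, and gain ratio. The function names and the four-row example are assumptions made for illustration. Gain ratio is computed here as the gain divided by the entropy of the attribute's own value distribution (the "split information"), which is the usual C4.5-style definition rather than anything stated on the slide.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in entropy of the labels after splitting on the attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Gain divided by the entropy of the attribute's own values."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical example: a two-class target with a uniform distribution.
labels = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]   # two values, separates the classes
id_attr = [1, 2, 3, 4]               # one distinct value per row

print(entropy(labels))                        # 1.0 bit: uniform over two classes
print(information_gain(binary_attr, labels))  # 1.0
print(information_gain(id_attr, labels))      # 1.0: same gain, many values
print(gain_ratio(binary_attr, labels))        # 1.0
print(gain_ratio(id_attr, labels))            # 0.5: penalised
```

Both attributes achieve the same gain, but the one with a distinct value per row carries 2 bits of split information, so its gain ratio is halved. With three equally likely classes the entropy is log2 3, about 1.585 bits, which is why the 0 to 1 range only holds for two classes.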

7
Tech Support toy database: is it the equipment or
the commander?
Decision Trees by Computational Intelligence
8
The Decision Tree produced by the training data
9
Testing with new examples: Predictions
10
Applications
  • Predicting Magnetic Properties of Crystals
  • Profiling High Income Earners from Census Data
  • Assessing Churn Risk
  • Detecting Advertisements on the Web
  • Identifying Spam
  • Diagnosing Hypothyroidism