Title: EXCEL Project, Applications of Calculus (Part A): Entropy Function and Decision Tree Classifiers
1 EXCEL Project, Applications of Calculus (Part A): Entropy Function and Decision Tree Classifiers
- Faculty: Mollaghasemi, Georgiopoulos
- Graduate Students: Sentelle and Kaylani
- Dates: August 29 and September 5, 2006
2 Presentation Outline
- Introduction and Motivation
- Calculus Problems from your Textbook
- Problem 29, page 47
- Problem 31, page 47
- Problem 37, page 47
- Machine Learning and Applications
- Decision Tree Classifier
- An Example
- Designing a Decision Tree Classifier
- Learning Objective 1 Constructing the Entropy Function
- Learning Objective 2 Designing a Decision Tree Classifier for an Example
- Learning Objective 3 Using a Decision Tree Tool
3 Introduction and Motivation The Entropy Function
- This module introduces the idea of the entropy function and its usefulness in the design of decision tree classifiers
- In mathematics, a classifier is a function from a discrete (or continuous) feature space X (the domain) to a discrete set of labels Y (the range of the function)
- An example of a classifier is one that accepts as inputs a person's salary details, age, marital status, home address, and previous credit card history, and classifies the person as acceptable or unacceptable to receive a credit card or a loan
4 Introduction and Motivation The Entropy Function
- A decision tree classifier is built by splitting the space of the input features (the domain of the function) into smaller sub-spaces, in such a way that all the points in these smaller sub-spaces are mapped to the same label (value from the range of the function)
- The splitting of the input space into smaller sub-spaces is accomplished by using a mathematical function, called the entropy (uncertainty, disorder) function, introduced by Claude Shannon
- Observe, in the following figures, a classification problem (the Iris Flower Classification Problem), possible splits of the input space into sub-spaces, and why these splits are important in the design of the classifier
5 Introduction and Motivation Iris Flower Classification Problem
- The Iris data consists of 150 data-points of three different types (labels) of flowers
- Iris Setosa
- Iris Versicolor
- Iris Virginica
- Each datum has four features
- Sepal Length (cm) (Feature 1)
- Sepal Width (cm) (Feature 2)
- Petal Length (cm) (Feature 3)
- Petal Width (cm) (Feature 4)
- 120 of the 150 points are used for the design of the classifier
- 30 of the 150 points are used for the testing of the classifier
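As a sketch of the 120/30 design/test split described above, assuming scikit-learn is available (the slides do not name a software tool):

```python
# Load the 150-point Iris dataset and hold out 30 points for testing,
# keeping the proportion of the three flower types balanced (stratify=y).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)          # 150 data-points, 4 features each
X_design, X_test, y_design, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y)

print(X_design.shape, X_test.shape)  # (120, 4) (30, 4)
```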
6 Introduction and Motivation Iris Flower
Classification Problem
10 Introduction and Motivation The Entropy Function
- The entropy (uncertainty, disorder) function of a two-label (labels A and B) dataset, where p is the probability that a datum in the dataset is of label A, is defined as follows:
  H(p) = -p log2(p) - (1 - p) log2(1 - p), with the convention that 0 log2(0) = 0
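As a sketch, the two-label entropy function defined above can be computed directly (Python is our choice here; the slides do not use a particular language):

```python
import math

def entropy(p):
    """Two-label entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.0))  # pure dataset: 0.0 (no uncertainty)
print(entropy(0.5))  # 50/50 mix:    1.0 (maximum uncertainty)
```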
11 Introduction and Motivation The Entropy Function
- The entropy (uncertainty, disorder) function of a dataset is used in a decision tree classifier to split the space of the input features into smaller sub-spaces that contain data (objects) of a single label, or, expressed differently, data of 0 uncertainty (disorder)
- Remember the Iris Flower Dataset
12 Introduction and Motivation The Entropy Function
- The entropy function is related to your Calculus topics entitled Combinations of Functions (page 42 of your Calculus textbook) and Composition of Functions (page 43 of your Calculus textbook), because in order to construct the entropy function you need
- Multiplication of functions, such as in p log2(p)
- Composition of functions, such as in log2(1 - p)
- Addition of functions, such as in (-p log2(p)) + (-(1 - p) log2(1 - p))
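To illustrate the point, the entropy function can be assembled in code from exactly these three operations (a sketch in Python; the helper names are our own):

```python
import math

def log2_safe(t):
    # convention used for entropy: treat 0 * log2(0) as 0
    return math.log2(t) if t > 0 else 0.0

def term(p):
    return -p * log2_safe(p)      # multiplication of functions: p times log2(p)

def one_minus(p):
    return 1 - p                  # inner function for the composition

def entropy(p):
    # composition: term(one_minus(p)); addition: the sum of the two terms
    return term(p) + term(one_minus(p))

print(entropy(0.5))  # 1.0
```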
13 Problems from your Calculus Book Problem 29,
Page 47
14 Problems from your Calculus Book Problem 31,
Page 47
20 Problems from your Calculus Book Problem 37, Page 47
- The following functions are given to you
- The composite function is defined as
- The domain of the composite function is defined to be the interval
21 Decision Tree Classifier Applications
- Medical Applications
- Wisconsin Breast Cancer (predict whether a tissue sample taken from a patient is malignant or benign; two classes, nine numerical attributes)
- BUPA Liver Disorders (predict whether or not a male patient has a liver disorder based on blood tests and alcohol consumption; two classes, six numerical attributes)
22 Decision Tree Classifier Applications
- Medical Applications
- PIMA Indian Diabetes (the patients are females at least 21 years old, of Pima Indian heritage, living near Phoenix, Arizona; the problem is to predict whether a patient would test positive for diabetes; there are two classes, seven numerical attributes)
- Heart Disease (the problem here is to predict the presence or absence of heart disease based on various medical tests; there are two classes, seven numerical attributes, and six categorical attributes)
23 Decision Tree Classifier Applications
- Image Recognition Applications
- Satellite Image (this dataset gives the multi-spectral values of pixels within 3x3 neighborhoods in a satellite image, and the classification associated with the central pixel; the aim is to predict the classification given the multi-spectral values; there are six classes and thirty-six numerical attributes)
- Image Segmentation (this is a database of seven outdoor images; every pixel should be classified as brickface, sky, foliage, cement, window, path, or grass; there are seven classes and nineteen numerical attributes)
24 Decision Tree Classifier Applications
- Other Applications
- Boston Housing (this dataset gives housing values in Boston suburbs; there are three classes, twelve numerical attributes, and one binary attribute)
- Congressional Voting Records (this database gives the votes of each member of the U.S. House of Representatives of the 98th Congress on sixteen key issues; the problem is to classify a congressman as a Democrat or a Republican based on the sixteen votes; there are two classes and sixteen categorical attributes (yea, nay, neither))
25 A Case Study Iris Flower Data Hand-Crafting a Classifier
- The Iris data consists of 150 data-points of three different types of flowers
- Iris Setosa
- Iris Versicolor
- Iris Virginica
- Each datum has four features
- Sepal Length (cm) (Feature 1)
- Sepal Width (cm) (Feature 2)
- Petal Length (cm) (Feature 3)
- Petal Width (cm) (Feature 4)
- 120 of the 150 points are used for the design of the classifier
- 30 of the 150 points are used for the testing of the classifier
26 A Case Study Iris Flower Data Visualization
of the Iris Features
27 A Case Study Iris Flower Data A Hand-Crafted
Classifier
28 A Case Study Iris Flower Data Testing the
Hand-Crafted Classifier
29 Design of a Decision Tree Classifier An
Example
31 Design of a Decision Tree Classifier An
Example
- We have data-points 1, 2, 3, 4, 5 of class A and data-points 6, 7, 8, 9, 10 of class B, with attributes (features) x and y
- Our objective is to use values of the attributes x and y that separate the data into smaller datasets of purer classification labels
- Remember the Iris Flower dataset, where we were trying to separate the colors (labels) by drawing lines perpendicular to specific attribute values
32 Design of a Decision Tree Classifier Split Choices
- Here the choices for splitting the data are limited
- The possible x-splits that we need to consider are
- The possible y-splits that we need to consider are
33 DTC First Split x = 0.15
34 DTC First Split x = 0.25
35 DTC First Split x = 0.35
36 DTC First Split x = 0.4
37 DTC First Split x = 0.5
38 DTC First Split x = 0.55
39 DTC First Split x = 0.6
40 DTC First Split y = 0.5
41 DTC First Split y = 0.6
42 DTC First Split y = 0.7
43 Decision Tree Classifier How do we find the Optimal Split?
- The optimal split corresponds to an x or y attribute value that minimizes the average entropy (uncertainty, disorder) of the resulting smaller datasets, or, equivalently, maximizes the difference in entropy (uncertainty) between the original dataset (data-points 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10) and the resulting smaller datasets
- This leads us naturally into the following two learning objectives
- Learning Objective 1 Calculating the entropy of a dataset
- Learning Objective 2 Using the entropy values to determine the optimal split
44 Learning Objective 1 Calculating the Entropy Function
- The entropy function of a dataset consisting of points belonging to two distinct labels (A or B), where the probability that a point is of label A is equal to p, was defined by the following equation:
  H(p) = -p log2(p) - (1 - p) log2(1 - p)
- We will calculate the entropy function of a dataset in a sequence of steps
45 Learning Objective 1 Calculating the Entropy Function
- Step 1 (In-Class Student Assignment 1.1) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
46 Learning Objective 1 Calculating the Entropy
Function
47 Learning Objective 1 Calculating the Entropy Function
- Step 2 (In-Class Student Assignment 1.2) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
48 Learning Objective 1 Calculating the Entropy
Function
49 Learning Objective 1 Calculating the Entropy Function
- Step 3 (In-Class Student Assignment 1.3) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
50 Learning Objective 1 Calculating the Entropy
Function
51 Learning Objective 1 Calculating the Entropy
Function
- Step 4 (In-Class Student Assignment 1.4) Draw some useful observations about the entropy function
- Observation 1
- Observation 2
- Observation 3
- Observation 4
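The observations asked for above can be checked numerically; here is a sketch that evaluates the entropy function at the assignment's points:

```python
import math

def entropy(p):
    # two-label entropy, with the convention 0 * log2(0) = 0
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0, 0.2, 0.4, 0.5, 0.6, 0.8, 1]:
    print(f"H({p}) = {entropy(p):.4f}")

# Typical observations this suggests: H(0) = H(1) = 0 (a pure dataset has no
# uncertainty); H is symmetric about p = 0.5; H attains its maximum, 1, at p = 0.5.
```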
52 Learning Objective 2 Finding the Optimal Split
- We return to our problem of finding the optimal split of the data (points 1, 2, 3, 4, 5 of class A, and points 6, 7, 8, 9, 10 of class B)
- The optimal split of the data is the one that maximizes the difference in entropy between the original dataset and the datasets that result if we split the data into two groups using each of the possible x and y splits
- This objective is achieved by focusing on six distinct but inter-related steps
53 Learning Objective 2 Finding the Optimal Split
- Step 1 Decide which are the possible x and y values to use, in order to split the data into two groups
- Possible x-split values
- Possible y-split values
54 Learning Objective 2 Finding the Optimal Split
- Step 2 For the split values that you have chosen to split the data with (in Step 1), determine the two groups of data
- Think of all the data (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) as lying at a node of the tree (called the root node). Data whose attribute value is smaller than the split value go to a left node, branched off from this root node, while data whose attribute value is larger than or equal to the split value go to a right node, branched off from this root node. We are interested in determining which data-points go to the left node and which go to the right node of the tree
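The left/right assignment rule above can be sketched in code. The (x, y) coordinates below are hypothetical stand-ins, since the figure's actual values are not reproduced in the text:

```python
# Hypothetical coordinates (the figure's actual values are not in the text).
# Data-points 1-5 are class A, 6-10 are class B.
data = {
    1: (0.1, 0.6), 2: (0.2, 0.8), 3: (0.4, 0.9), 4: (0.3, 0.5), 5: (0.5, 0.7),
    6: (0.6, 0.4), 7: (0.7, 0.6), 8: (0.8, 0.3), 9: (0.9, 0.5), 10: (0.6, 0.2),
}

def split(data, feature, value):
    """feature 0 is x, feature 1 is y; < value goes left, >= value goes right."""
    left = [i for i, pt in data.items() if pt[feature] < value]
    right = [i for i, pt in data.items() if pt[feature] >= value]
    return left, right

left, right = split(data, 0, 0.35)   # split on x = 0.35
print("left node:", left, "right node:", right)
```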
55 Learning Objective 2 Example of a Data-Split, x = 0.35
- Example of a data-split (data going to the left and right node of the tree) for a split value x = 0.35
56 Learning Objective 2 Finding the Optimal Split
- Step 2 (Student Assignment 2.1) For the split values that you have chosen to split the data with (in Step 1), determine the two groups of data that go to the left and to the right node of the tree, as was illustrated in the previous slide for the example split value x = 0.35
- This is an in-class assignment (Student Assignment 2.1)
57 Learning Objective 2 Finding the Optimal Split
- Step 3 (Student Assignment 2.2) For the split values that you have chosen to split the data with (in Step 1), calculate the impurity of the data that go to the left and to the right node of the tree. You have already identified these data (in Step 2) for the split values chosen in Step 1
- This is an in-class assignment (Student Assignment 2.2)
58 Learning Objective 2 Finding the Optimal Split
- Step 4 (Student Assignment 2.3) For the split values that you have chosen to split the data with (in Step 1), calculate the average impurity of the data that go to the left and to the right node of the tree. Then calculate the difference between the impurity of the original dataset and this average impurity
- This is an in-class assignment (Student Assignment 2.3)
59 Learning Objective 2 Finding the Optimal Split
- Step 5 Choose as the optimal split value the x or y split value for which the difference in impurity, calculated in Step 4, is maximum
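Steps 1 through 5 can be sketched end-to-end in code. The coordinates and candidate split values below are hypothetical, since the actual figure values are not reproduced in the text; the procedure (compute each split's weighted average entropy and keep the split with the largest entropy decrease) is the one described above:

```python
import math

def entropy_of(labels):
    """Two-label entropy of a list of 'A'/'B' labels; 0 for empty or pure sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = labels.count('A') / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical (x, y, label) data; points 1-5 are class A, 6-10 are class B.
points = {
    1: (0.1, 0.6, 'A'), 2: (0.2, 0.8, 'A'), 3: (0.4, 0.9, 'A'),
    4: (0.3, 0.5, 'A'), 5: (0.5, 0.7, 'A'), 6: (0.6, 0.4, 'B'),
    7: (0.7, 0.6, 'B'), 8: (0.8, 0.3, 'B'), 9: (0.9, 0.5, 'B'),
    10: (0.6, 0.2, 'B'),
}

def best_split(points, candidates):
    """candidates: list of (feature_index, value) pairs; returns the split
    maximizing the entropy decrease (original entropy minus the weighted
    average entropy of the left and right groups), plus that decrease."""
    all_labels = [lbl for (_, _, lbl) in points.values()]
    base = entropy_of(all_labels)
    best, best_gain = None, -1.0
    for feat, val in candidates:
        left = [lbl for (x, y, lbl) in points.values() if (x, y)[feat] < val]
        right = [lbl for (x, y, lbl) in points.values() if (x, y)[feat] >= val]
        avg = (len(left) * entropy_of(left)
               + len(right) * entropy_of(right)) / len(points)
        gain = base - avg
        if gain > best_gain:
            best, best_gain = (feat, val), gain
    return best, best_gain

candidates = [(0, 0.35), (0, 0.55), (1, 0.5)]  # hypothetical split values
print(best_split(points, candidates))  # ((0, 0.55), 1.0) with these values
```

With these made-up coordinates, x = 0.55 separates the classes perfectly, so its entropy decrease equals the full original entropy (1 bit).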
60 Learning Objective 2 Finding the Optimal Split
- Step 6 By looking at the figures of how the optimal split separated the data and how the other splits separated the data, comment on the validity of the statement: "the optimal split has reduced the impurity of the original data the most"
61 Learning Objective 2 Student Assignment
- Student Assignment 2.4 (to do at home)
- You are asked to repeat Steps 1-6, described in class, to determine the optimal split value for the dataset consisting of the data with indices 3, 5, 6, 7, 8, 9, and 10, residing at the right child of the node of the tree shown in the following figure
- Data-points 3 and 5 are of class label A, while data-points 6, 7, 8, 9, and 10 are of class label B
62 Learning Objective 2 Student Assignment
- Student Assignment 2.4 Repeat Steps 1-6 for the data residing in node 3 of the tree, in order to find the optimal split for this dataset (3, 5, 6, 7, 8, 9, 10)
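For Learning Objective 3 (using a decision tree tool), one possible choice, assuming scikit-learn is the tool at hand (the slides do not name one), is its DecisionTreeClassifier with the entropy split criterion, applied to the Iris data from the case study:

```python
# Fit an entropy-based decision tree to 120 Iris points, test on 30,
# mirroring the design/test split used throughout the module.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
```

The `criterion="entropy"` setting makes the tool pick each split by the same entropy-decrease rule developed by hand in Learning Objective 2.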