Title: EXCEL Project, Applications of Calculus (Part A): Entropy Function and Decision Tree Classifiers
1 EXCEL Project, Applications of Calculus (Part A): Entropy Function and Decision Tree Classifiers
- Faculty: Mollaghasemi, Georgiopoulos
- Graduate Students: Sentelle and Kaylani
- Dates: August 29 and September 5, 2006
2 Presentation Outline
- Introduction and Motivation
- Calculus Problems from your Textbook
- Problem 29, page 47
- Problem 31, page 47
- Problem 37, page 47
- Machine Learning and Applications
- Decision Tree Classifier
- An Example
- Designing a Decision Tree Classifier
- Learning Objective 1 Constructing the Entropy Function
- Learning Objective 2 Designing a Decision Tree Classifier for an Example
- Learning Objective 3 Using a Decision Tree Tool
3 Introduction and Motivation The Entropy Function
- This module introduces the idea of the entropy function and its usefulness in the design of decision tree classifiers
- In mathematics, a classifier is a function from a discrete (or continuous) feature space X (the domain) to a discrete set of labels Y (the range of the function)
- An example of a classifier is one that accepts as inputs a person's salary details, age, marital status, home address, and previous credit card history, and classifies the person as acceptable or unacceptable to receive a credit card or a loan
4 Introduction and Motivation The Entropy Function
- A decision tree classifier is built by splitting the space of the input features (the domain of the function) into smaller sub-spaces, in such a way that all the points in these smaller sub-spaces are mapped to the same label (value from the range of the function)
- The splitting of the input space into smaller sub-spaces is accomplished by using a mathematical function, called the entropy (uncertainty, disorder) function, introduced by Claude Shannon
- Observe, in the following figures, a classification problem (the Iris Flower Classification Problem), possible splits of the input space into sub-spaces, and why these splits are important in the design of the classifier
5 Introduction and Motivation Iris Flower Classification Problem
- The Iris data consists of 150 data-points of three different types (labels) of flowers
- Iris Setosa
- Iris Versicolor
- Iris Virginica
- Each datum has four features
- Sepal Length (cm) (Feature 1)
- Sepal Width (cm) (Feature 2)
- Petal Length (cm) (Feature 3)
- Petal Width (cm) (Feature 4)
- 120 of the 150 points are used for the design of the classifier
- 30 of the 150 points are used for the testing of the classifier
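As a sketch of the 120/30 design/test split described above, assuming scikit-learn is available (the slides do not name a software tool):

```python
# Load the 150-point Iris dataset and hold out 30 points for testing,
# keeping the proportion of the three flower types balanced (stratify=y).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)          # 150 data-points, 4 features each
X_design, X_test, y_design, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y)

print(X_design.shape, X_test.shape)  # (120, 4) (30, 4)
```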
6 Introduction and Motivation Iris Flower
Classification Problem
10 Introduction and Motivation The Entropy Function
- The entropy (uncertainty, disorder) function of a two-label (labels A and B) dataset, where p is the probability that a datum in the dataset is of label A, is defined as follows:
  H(p) = -p log2(p) - (1 - p) log2(1 - p), with the convention that 0 log2(0) = 0
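As a sketch, the two-label entropy function defined above can be computed directly (Python is our choice here; the slides do not use a particular language):

```python
import math

def entropy(p):
    """Two-label entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.0))  # pure dataset: 0.0 (no uncertainty)
print(entropy(0.5))  # 50/50 mix:    1.0 (maximum uncertainty)
```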
11 Introduction and Motivation The Entropy Function
- The entropy (uncertainty, disorder) function of a dataset is used in a decision tree classifier to split the space of the input features into smaller sub-spaces that contain data (objects) of a single label, or, expressed differently, data of 0 uncertainty (disorder)
- Remember the Iris Flower Dataset
12 Introduction and Motivation The Entropy Function
- The entropy function is related to your Calculus topics entitled Combinations of Functions (page 42 of your Calculus textbook) and Composition of Functions (page 43 of your Calculus textbook), because in order to construct the entropy function you need
- Multiplication of functions, such as in p log2(p)
- Composition of functions, such as in log2(1 - p)
- Addition of functions, such as in (-p log2(p)) + (-(1 - p) log2(1 - p))
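To illustrate the point, the entropy function can be assembled in code from exactly these three operations (a sketch in Python; the helper names are our own):

```python
import math

def log2_safe(t):
    # convention used for entropy: treat 0 * log2(0) as 0
    return math.log2(t) if t > 0 else 0.0

def term(p):
    return -p * log2_safe(p)      # multiplication of functions: p times log2(p)

def one_minus(p):
    return 1 - p                  # inner function for the composition

def entropy(p):
    # composition: term(one_minus(p)); addition: the sum of the two terms
    return term(p) + term(one_minus(p))

print(entropy(0.5))  # 1.0
```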
13 Problems from your Calculus Book Problem 29,
Page 47
14 Problems from your Calculus Book Problem 31,
Page 47
20 Problems from your Calculus Book Problem 37, Page 47
- The following functions are given to you
- The composite function is defined as
- The domain of the composite function is defined to be the interval
21 Decision Tree Classifier Applications
- Medical Applications
- Wisconsin Breast Cancer (predict whether a tissue sample taken from a patient is malignant or benign; two classes, nine numerical attributes)
- BUPA Liver Disorders (predict whether or not a male patient has a liver disorder based on blood tests and alcohol consumption; two classes, six numerical attributes)
22 Decision Tree Classifier Applications
- Medical Applications
- PIMA Indian Diabetes (the patients are females at least 21 years old, of Pima Indian heritage, living near Phoenix, Arizona; the problem is to predict whether a patient would test positive for diabetes; there are two classes, seven numerical attributes)
- Heart Disease (the problem here is to predict the presence or absence of heart disease based on various medical tests; there are two classes, seven numerical attributes, and six categorical attributes)
23 Decision Tree Classifier Applications
- Image Recognition Applications
- Satellite Image (this dataset gives the multi-spectral values of pixels within 3x3 neighborhoods in a satellite image, and the classification associated with the central pixel; the aim is to predict the classification given the multi-spectral values; there are six classes and thirty-six numerical attributes)
- Image Segmentation (this is a database of seven outdoor images; every pixel should be classified as brickface, sky, foliage, cement, window, path, or grass; there are seven classes and nineteen numerical attributes)
24 Decision Tree Classifier Applications
- Other Applications
- Boston Housing (this dataset gives housing values in Boston suburbs; there are three classes, twelve numerical attributes, and one binary attribute)
- Congressional Voting Records (this database gives the votes of each member of the U.S. House of Representatives of the 98th Congress on sixteen key issues; the problem is to classify a congressman as a Democrat or a Republican based on the sixteen votes; there are two classes and sixteen categorical attributes (yea, nay, neither))
25 A Case Study Iris Flower Data Hand-Crafting a Classifier
- The Iris data consists of 150 data-points of three different types of flowers
- Iris Setosa
- Iris Versicolor
- Iris Virginica
- Each datum has four features
- Sepal Length (cm) (Feature 1)
- Sepal Width (cm) (Feature 2)
- Petal Length (cm) (Feature 3)
- Petal Width (cm) (Feature 4)
- 120 of the 150 points are used for the design of the classifier
- 30 of the 150 points are used for the testing of the classifier
26 A Case Study Iris Flower Data Visualization
of the Iris Features
27 A Case Study Iris Flower Data A Hand-Crafted
Classifier
28 A Case Study Iris Flower Data Testing the
Hand-Crafted Classifier
29 Design of a Decision Tree Classifier An
Example
31 Design of a Decision Tree Classifier An
Example
- We have data-points 1, 2, 3, 4, 5 of class A and data-points 6, 7, 8, 9, 10 of class B, with attributes (features) x and y
- Our objective is to use values of the attributes x and y that separate the data into smaller datasets of purer classification labels
- Remember the Iris Flower dataset, where we were trying to separate the colors (labels) by drawing lines perpendicular to specific attribute values
32 Design of a Decision Tree Classifier Split Choices
- Here the choices for splitting the data are limited
- The possible x-splits that we need to consider are
- The possible y-splits that we need to consider are
33 DTC First Split x = 0.15
34 DTC First Split x = 0.25
35 DTC First Split x = 0.35
36 DTC First Split x = 0.4
37 DTC First Split x = 0.5
38 DTC First Split x = 0.55
39 DTC First Split x = 0.6
40 DTC First Split y = 0.5
41 DTC First Split y = 0.6
42 DTC First Split y = 0.7
43 Decision Tree Classifier How do we find the Optimal Split?
- The optimal split corresponds to an x or y attribute value that minimizes the average entropy (uncertainty, disorder) of the resulting smaller datasets, or, equivalently, maximizes the difference in entropy (uncertainty) between the original dataset (data-points 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10) and the resulting smaller datasets
- This leads us naturally into the following two learning objectives
- Learning Objective 1 Calculating the entropy of a dataset
- Learning Objective 2 Using the entropy values to determine the optimal split
44 Learning Objective 1 Calculating the Entropy Function
- The entropy function of a dataset consisting of points belonging to two distinct labels (A or B), where the probability that a point is of label A is equal to p, was defined by the following equation:
  H(p) = -p log2(p) - (1 - p) log2(1 - p)
- We will calculate the entropy function of a dataset in a sequence of steps
45 Learning Objective 1 Calculating the Entropy Function
- Step 1 (In-Class Student Assignment 1.1) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
46 Learning Objective 1 Calculating the Entropy
Function
47 Learning Objective 1 Calculating the Entropy Function
- Step 2 (In-Class Student Assignment 1.2) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
48 Learning Objective 1 Calculating the Entropy
Function
49 Learning Objective 1 Calculating the Entropy Function
- Step 3 (In-Class Student Assignment 1.3) Calculate the function
- at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1, and then plot this function
50 Learning Objective 1 Calculating the Entropy
Function
51 Learning Objective 1 Calculating the Entropy
Function
- Step 4 (In-Class Student Assignment 1.4) Draw some useful observations about the entropy function
- Observation 1
- Observation 2
- Observation 3
- Observation 4
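The observations asked for above can be checked numerically; here is a sketch that evaluates the entropy function at the assignment's points:

```python
import math

def entropy(p):
    # two-label entropy, with the convention 0 * log2(0) = 0
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0, 0.2, 0.4, 0.5, 0.6, 0.8, 1]:
    print(f"H({p}) = {entropy(p):.4f}")

# Typical observations this suggests: H(0) = H(1) = 0 (a pure dataset has no
# uncertainty); H is symmetric about p = 0.5; H attains its maximum, 1, at p = 0.5.
```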
52 Learning Objective 2 Finding the Optimal Split
- We return to our problem of finding the optimal split of the data (points 1, 2, 3, 4, 5 of class A, and points 6, 7, 8, 9, 10 of class B)
- The optimal split of the data is the one that maximizes the difference in entropy between the original dataset and the datasets that result if we split the data into two groups using each of the possible x and y splits
- This objective is achieved by focusing on six distinct but inter-related steps
53 Learning Objective 2 Finding the Optimal Split
- Step 1 Decide which are the possible x and y values to use, in order to split the data into two groups
- Possible x-split values
- Possible y-split values
54 Learning Objective 2 Finding the Optimal Split
- Step 2 For the split values that you have chosen to split the data with (in Step 1), determine the two groups of data
- Think of all the data (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) as lying at a node of the tree (called the root node). Data whose attribute value is smaller than the split value go to a left node, branched off from this root node, while data whose attribute value is larger than or equal to the split value go to a right node, branched off from this root node. We are interested in determining which data-points go to the left node and which go to the right node of the tree
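The left/right assignment rule above can be sketched in code. The (x, y) coordinates below are hypothetical stand-ins, since the figure's actual values are not reproduced in the text:

```python
# Hypothetical coordinates (the figure's actual values are not in the text).
# Data-points 1-5 are class A, 6-10 are class B.
data = {
    1: (0.1, 0.6), 2: (0.2, 0.8), 3: (0.4, 0.9), 4: (0.3, 0.5), 5: (0.5, 0.7),
    6: (0.6, 0.4), 7: (0.7, 0.6), 8: (0.8, 0.3), 9: (0.9, 0.5), 10: (0.6, 0.2),
}

def split(data, feature, value):
    """feature 0 is x, feature 1 is y; < value goes left, >= value goes right."""
    left = [i for i, pt in data.items() if pt[feature] < value]
    right = [i for i, pt in data.items() if pt[feature] >= value]
    return left, right

left, right = split(data, 0, 0.35)   # split on x = 0.35
print("left node:", left, "right node:", right)
```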
55 Learning Objective 2 Example of a Data-Split, x = 0.35
- Example of a data-split (data going to the left and right node of the tree) for a split value x = 0.35
56 Learning Objective 2 Finding the Optimal Split
- Step 2 (Student Assignment 2.1) For the split values that you have chosen to split the data with (in Step 1), determine the two groups of data that go to the left and to the right node of the tree, as was illustrated in the previous slide for the example split value x = 0.35
- This is an in-class assignment (Student Assignment 2.1)
57 Learning Objective 2 Finding the Optimal Split
- Step 3 (Student Assignment 2.2) For the split values that you have chosen to split the data with (in Step 1), calculate the impurity of the data that go to the left and to the right node of the tree. You have already identified these data (in Step 2) for the split values chosen in Step 1
- This is an in-class assignment (Student Assignment 2.2)
58 Learning Objective 2 Finding the Optimal Split
- Step 4 (Student Assignment 2.3) For the split values that you have chosen to split the data with (in Step 1), calculate the average impurity of the data that go to the left and to the right node of the tree. Then calculate the difference between the impurity of the original dataset and this average impurity
- This is an in-class assignment (Student Assignment 2.3)
59 Learning Objective 2 Finding the Optimal Split
- Step 5 Choose as the optimal split value the x or y split value for which the difference in impurity, calculated in Step 4, is maximum
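Steps 1 through 5 can be sketched end-to-end in code. The coordinates and candidate split values below are hypothetical, since the actual figure values are not reproduced in the text; the procedure (compute each split's weighted average entropy and keep the split with the largest entropy decrease) is the one described above:

```python
import math

def entropy_of(labels):
    """Two-label entropy of a list of 'A'/'B' labels; 0 for empty or pure sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = labels.count('A') / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical (x, y, label) data; points 1-5 are class A, 6-10 are class B.
points = {
    1: (0.1, 0.6, 'A'), 2: (0.2, 0.8, 'A'), 3: (0.4, 0.9, 'A'),
    4: (0.3, 0.5, 'A'), 5: (0.5, 0.7, 'A'), 6: (0.6, 0.4, 'B'),
    7: (0.7, 0.6, 'B'), 8: (0.8, 0.3, 'B'), 9: (0.9, 0.5, 'B'),
    10: (0.6, 0.2, 'B'),
}

def best_split(points, candidates):
    """candidates: list of (feature_index, value) pairs; returns the split
    maximizing the entropy decrease (original entropy minus the weighted
    average entropy of the left and right groups), plus that decrease."""
    all_labels = [lbl for (_, _, lbl) in points.values()]
    base = entropy_of(all_labels)
    best, best_gain = None, -1.0
    for feat, val in candidates:
        left = [lbl for (x, y, lbl) in points.values() if (x, y)[feat] < val]
        right = [lbl for (x, y, lbl) in points.values() if (x, y)[feat] >= val]
        avg = (len(left) * entropy_of(left)
               + len(right) * entropy_of(right)) / len(points)
        gain = base - avg
        if gain > best_gain:
            best, best_gain = (feat, val), gain
    return best, best_gain

candidates = [(0, 0.35), (0, 0.55), (1, 0.5)]  # hypothetical split values
print(best_split(points, candidates))  # ((0, 0.55), 1.0) with these values
```

With these made-up coordinates, x = 0.55 separates the classes perfectly, so its entropy decrease equals the full original entropy (1 bit).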
60 Learning Objective 2 Finding the Optimal Split
- Step 6 By looking at the figures of how the optimal split separated the data and how the other splits separated the data, comment on the validity of the statement: "the optimal split has reduced the impurity of the original data the most"
61 Learning Objective 2 Student Assignment
- Student Assignment 2.4 (to do at home)
- You are asked to repeat Steps 1-6, described in class, to determine the optimal split value for the dataset consisting of the data with indices 3, 5, 6, 7, 8, 9, and 10, residing at the right child of the node of the tree shown in the following figure
- Data-points 3 and 5 are of class label A, while data-points 6, 7, 8, 9, and 10 are of class label B
62 Learning Objective 2 Student Assignment
- Student Assignment 2.4 Repeat Steps 1-6 for the data residing in node 3 of the tree, in order to find the optimal split for this dataset (3, 5, 6, 7, 8, 9, 10)
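For Learning Objective 3 (using a decision tree tool), one possible choice, assuming scikit-learn is the tool at hand (the slides do not name one), is its DecisionTreeClassifier with the entropy split criterion, applied to the Iris data from the case study:

```python
# Fit an entropy-based decision tree to 120 Iris points, test on 30,
# mirroring the design/test split used throughout the module.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
```

The `criterion="entropy"` setting makes the tool pick each split by the same entropy-decrease rule developed by hand in Learning Objective 2.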