EXCEL-Project, Applications of Calculus (part A): Entropy Function and Decision Tree Classifiers - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
EXCEL-Project, Applications of Calculus (part A):
Entropy Function and Decision Tree Classifiers
  • Faculty: Mollaghasemi, Georgiopoulos
  • Graduate Students: Sentelle and Kaylani
  • Dates: August 29 and September 5, 2006

2
Presentation Outline
  • Introduction and Motivation
  • Calculus Problems from your Textbook
  • Problem 29, page 47
  • Problem 31, page 47
  • Problem 37, page 47
  • Machine Learning and Applications
  • Decision Tree Classifier
  • An Example
  • Designing a Decision Tree Classifier
  • Learning Objective 1: Constructing the Entropy
    Function
  • Learning Objective 2: Design a Decision Tree
    Classifier for an Example
  • Learning Objective 3: Using a Decision Tree Tool

3
Introduction and Motivation: The Entropy Function
  • This module will introduce you to the idea of the
    entropy function and its usefulness in the design
    of decision tree classifiers
  • In mathematics, a classifier is a function from a
    discrete (or continuous) feature space X (the
    domain) to a discrete set of labels Y (the range
    of the function)
  • An example of a classifier is one which accepts
    as inputs a person's salary details, age, marital
    status, home address, and previous credit card
    history, and classifies the person as acceptable
    or unacceptable to receive a credit card or a
    loan
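To make the definition concrete, here is a minimal Python sketch of such a classifier; the feature names, thresholds, and decision rule are hypothetical, invented only for illustration.

```python
# A classifier is a function from a feature space X (domain) to a
# label set Y (range). Features and rule below are hypothetical.
def credit_classifier(salary, age, has_bad_credit_history):
    """Map an applicant's features to one of two labels."""
    if has_bad_credit_history or salary < 20_000 or age < 18:
        return "unacceptable"
    return "acceptable"

print(credit_classifier(salary=45_000, age=30, has_bad_credit_history=False))
# -> acceptable
```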

4
Introduction and Motivation: The Entropy Function
  • A decision tree classifier is built by splitting
    the space of the input features (the domain of
    the function) into smaller sub-spaces, in such a
    way that all the points in these smaller
    sub-spaces are mapped to the same label (value
    from the range of the function)
  • The splitting of the input space into smaller
    sub-spaces is accomplished by using a
    mathematical function, called the entropy
    (uncertainty, disorder) function, introduced by
    Claude Shannon
  • Observe, in the following figures, a
    classification problem (the Iris Flower
    Classification Problem), possible splits of the
    input space into sub-spaces, and why these
    splits are important in the design of the
    classifier

5
Introduction and Motivation: Iris Flower Classification Problem
  • Iris data consists of 150 data-points of three
    different types (labels) of flowers
  • Iris Setosa
  • Iris Versicolor
  • Iris Virginica
  • Each datum has four features
  • Sepal Length (cm) (Feature 1)
  • Sepal Width (cm) (Feature 2)
  • Petal Length (cm) (Feature 3)
  • Petal Width (cm) (Feature 4)
  • 120 of the 150 points are used for the design of
    the classifier
  • 30 of the 150 points are used for the testing of
    the classifier

6
Introduction and Motivation: Iris Flower Classification Problem
7
Introduction and Motivation: Iris Flower Classification Problem
8
Introduction and Motivation: Iris Flower Classification Problem
9
Introduction and Motivation: Iris Flower Classification Problem
10
Introduction and Motivation: The Entropy Function
  • The entropy (uncertainty, disorder) function of a
    two-label (label A and B) dataset, where p is the
    probability that a datum in the dataset is of
    label A, is defined as follows:
  • H(p) = -p log2(p) - (1 - p) log2(1 - p)
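Because the equation itself was rendered as an image in the original slides, here is a small Python sketch of the same two-label entropy function, assuming the standard Shannon definition with base-2 logarithms:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a two-label dataset in which a
    datum has label A with probability p and label B otherwise."""
    if p in (0.0, 1.0):
        return 0.0  # a pure dataset has zero uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.5))  # 1.0, the maximum uncertainty
print(entropy(0.9))  # about 0.469, mostly label A
```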

11
Introduction and Motivation: The Entropy Function
  • The entropy (uncertainty, disorder) function of
    a dataset is used in a decision tree classifier
    to split the space of the input features into
    smaller sub-spaces that contain data (objects) of
    a single label, or expressed differently, data of
    0 uncertainty (disorder)
  • Remember the Iris Flower Dataset

12
Introduction and Motivation: The Entropy Function
  • The entropy function is related to your Calculus
    topics, entitled Combinations of Functions (page
    42 of your Calculus textbook) and Composition of
    Functions (page 43 of your Calculus textbook),
    because in order to construct the entropy
    function you need
  • Multiplication of functions, such as in p · log2(p)
  • Composition of functions, such as in log2(1 - p)
  • Addition of functions, such as in the sum of the
    two terms -p log2(p) and -(1 - p) log2(1 - p)
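The worked examples on this slide were images in the original; assuming they decomposed the entropy formula itself, a sketch of that decomposition is:

```python
import math

# Building H(p) = -p*log2(p) - (1-p)*log2(1-p) from simpler functions.
f = lambda p: math.log2(p)      # a basic function
g = lambda p: 1 - p             # another basic function
product = lambda p: p * f(p)    # multiplication of functions: p * log2(p)
composed = lambda p: f(g(p))    # composition of functions: log2(1 - p)
H = lambda p: -product(p) - g(p) * composed(p)  # addition of the two terms

print(H(0.5))  # 1.0
```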

13
Problems from your Calculus Book: Problem 29, Page 47
14
Problems from your Calculus Book: Problem 31, Page 47
15
Problems from your Calculus Book: Problem 31, Page 47
16
Problems from your Calculus Book: Problem 31, Page 47
17
Problems from your Calculus Book: Problem 31, Page 47
18
Problems from your Calculus Book: Problem 31, Page 47
19
Problems from your Calculus Book: Problem 31, Page 47
20
Problems from your Calculus Book: Problem 37, Page 47
  • The following functions are given to you
  • The composite function is defined as
  • The domain of the composite function is defined
    to be the interval

21
Decision Tree Classifier: Applications
  • Medical Applications
  • Wisconsin Breast Cancer (predict whether a tissue
    sample taken from a patient is malignant or
    benign; two classes, nine numerical attributes)
  • BUPA Liver Disorders (predict whether or not a
    male patient has a liver disorder based on blood
    tests and alcohol consumption; two classes, six
    numerical attributes)

22
Decision Tree Classifier: Applications
  • Medical Applications
  • PIMA Indian Diabetes (the patients are females at
    least 21 years old, of Pima Indian heritage,
    living near Phoenix, Arizona; the problem is to
    predict whether a patient would test positive for
    diabetes; there are two classes, seven numerical
    attributes)
  • Heart Disease (the problem here is to predict the
    presence or absence of heart disease based on
    various medical tests; there are two classes,
    seven numerical attributes, and six categorical
    attributes)

23
Decision Tree Classifier: Applications
  • Image Recognition Applications
  • Satellite Image (this dataset gives the
    multi-spectral values of pixels within 3x3
    neighborhoods in a satellite image, and the
    classification associated with the central pixel;
    the aim is to predict the classification given
    the multi-spectral values; there are six classes
    and thirty-six numerical attributes)
  • Image Segmentation (this is a database of seven
    outdoor images; every pixel should be classified
    as brickface, sky, foliage, cement, window, path,
    or grass; there are seven classes and nineteen
    numerical attributes)

24
Decision Tree Classifier: Applications
  • Other Applications
  • Boston Housing (this dataset gives housing values
    in Boston suburbs; there are three classes,
    twelve numerical attributes, and one binary
    attribute)
  • Congressional Voting Records (this database gives
    the votes of each member of the U.S. House of
    Representatives of the 98th Congress on sixteen
    key issues; the problem is to classify a
    congressman as a Democrat or a Republican based
    on the sixteen votes; there are two classes and
    sixteen categorical attributes (yea, nay,
    neither))

25
A Case Study, Iris Flower Data: Hand-Crafting a Classifier
  • Iris data consists of 150 data-points of three
    different types of flowers
  • Iris Setosa
  • Iris Versicolor
  • Iris Virginica
  • Each datum has four features
  • Sepal Length (cm) (Feature 1)
  • Sepal Width (cm) (Feature 2)
  • Petal Length (cm) (Feature 3)
  • Petal Width (cm) (Feature 4)
  • 120 of the 150 points are used for the design of
    the classifier
  • 30 of the 150 points are used for the testing of
    the classifier

26
A Case Study, Iris Flower Data: Visualization of the Iris Features
27
A Case Study, Iris Flower Data: A Hand-Crafted Classifier
28
A Case Study, Iris Flower Data: Testing the Hand-Crafted Classifier
29
Design of a Decision Tree Classifier: An Example
30
Design of a Decision Tree Classifier: An Example
31
Design of a Decision Tree Classifier: An Example
  • We have data-points 1, 2, 3, 4, 5 of class A and
    data-points 6, 7, 8, 9, 10 of class B, with
    attributes (features) x and y
  • Our objective is to use values for the attributes
    x and y that would separate the data into smaller
    datasets of purer classification labels
  • Remember the Iris Flower dataset, where we were
    trying to separate the colors (labels) by drawing
    lines perpendicular to specific attribute values

32
Design of a Decision Tree Classifier: Split Choices
  • Here the choices for splitting the data are
    limited
  • The possible x-splits that we need to consider
    are x = 0.15, 0.25, 0.35, 0.4, 0.5, 0.55, and 0.6
  • The possible y-splits that we need to consider
    are y = 0.5, 0.6, and 0.7
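The coordinates of points 1-10 appear only in the original slide figures, so the coordinates below are hypothetical placeholders; the sketch shows one common way to enumerate candidate splits, namely the midpoints between adjacent distinct attribute values (the slide's own split values need not come out of this exact rule).

```python
# Hypothetical stand-in coordinates for data-points 1..10 (the real
# values are in the slide figures). Points 1-5 are class A, 6-10 class B.
data = {
    1: (0.1, 0.8, "A"), 2: (0.2, 0.7, "A"), 3: (0.5, 0.9, "A"),
    4: (0.3, 0.4, "A"), 5: (0.6, 0.6, "A"),
    6: (0.4, 0.3, "B"), 7: (0.7, 0.5, "B"), 8: (0.8, 0.2, "B"),
    9: (0.9, 0.4, "B"), 10: (0.65, 0.1, "B"),
}

def candidate_splits(values):
    """Midpoints between adjacent distinct sorted attribute values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

print("x-splits:", candidate_splits([x for x, _, _ in data.values()]))
print("y-splits:", candidate_splits([y for _, y, _ in data.values()]))
```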

33
DTC First Split: x = 0.15
34
DTC First Split: x = 0.25
35
DTC First Split: x = 0.35
36
DTC First Split: x = 0.4
37
DTC First Split: x = 0.5
38
DTC First Split: x = 0.55
39
DTC First Split: x = 0.6
40
DTC First Split: y = 0.5
41
DTC First Split: y = 0.6
42
DTC First Split: y = 0.7
43
Decision Tree Classifier: How Do We Find the Optimal Split?
  • The optimal split corresponds to an x or y
    attribute value that minimizes the average
    entropy (uncertainty, disorder) of the resulting
    smaller datasets, or equivalently maximizes the
    difference in entropy (uncertainty) between the
    original dataset (data-points 1, 2, 3, 4, 5, 6,
    7, 8, 9, and 10) and the resulting smaller
    datasets (this difference is written out below)
  • This leads us in a natural way to the following
    two learning objectives
  • Learning Objective 1: Calculating the entropy of
    a dataset
  • Learning Objective 2: Using the entropy values to
    determine the optimal split
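Written out, with nL and nR the sizes of the two resulting smaller datasets, n = nL + nR, and H(.) the entropy function, the quantity being maximized is the difference

  ΔH = H(parent) - (nL/n)·H(left) - (nR/n)·H(right)

which the machine-learning literature calls the information gain of a split (assuming, as is usual, that the average entropy is weighted by the fraction of points reaching each smaller dataset).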

44
Learning Objective 1: Calculating the Entropy Function
  • The entropy function of a dataset consisting of
    points belonging to two distinct labels (A or B),
    where the probability that a point is of label A
    is equal to p, was defined by the following
    equation:
  • H(p) = -p log2(p) - (1 - p) log2(1 - p)
  • We will calculate the entropy function of a
    dataset in a sequence of steps

45
Learning Objective 1: Calculating the Entropy Function
  • Step 1 (In-Class Student Assignment 1.1):
    Calculate the function
  • at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1,
    and then plot this function

46
Learning Objective 1: Calculating the Entropy Function
47
Learning Objective 1: Calculating the Entropy Function
  • Step 2 (In-Class Student Assignment 1.2):
    Calculate the function
  • at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1,
    and then plot this function

48
Learning Objective 1: Calculating the Entropy Function
49
Learning Objective 1: Calculating the Entropy Function
  • Step 3 (In-Class Student Assignment 1.3):
    Calculate the function
  • at points p = 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1,
    and then plot this function
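The component functions in Assignments 1.1-1.3 were images in the original slides, but they plausibly build up to the full entropy H(p); as a self-check for the combined result, this sketch tabulates H(p) at the seven listed points (plotting can be added with matplotlib):

```python
import math

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0):
    print(f"p = {p:.1f}  H(p) = {entropy(p):.4f}")
# H(p) is 0 at p = 0 and p = 1, peaks at 1 bit at p = 0.5,
# and is symmetric about p = 0.5.
```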

50
Learning Objective 1: Calculating the Entropy Function
51
Learning Objective 1: Calculating the Entropy Function
  • Step 4 (In-Class Student Assignment 1.4): Make
    some useful observations about the entropy
    function
  • Observation 1
  • Observation 2
  • Observation 3
  • Observation 4

52
Learning Objective 2: Finding the Optimal Split
  • We return to our problem of finding the optimal
    split of the data (points 1, 2, 3, 4, 5 of class
    A, and points 6, 7, 8, 9, 10 of class B)
  • The optimal split of the data is the one that
    maximizes the difference in entropy between the
    original dataset and the datasets that result if
    we split the data into two groups of data using
    all possible x and y splits
  • This objective is achieved by focusing on a
    sequence of distinct but inter-related steps

53
Learning Objective 2: Finding the Optimal Split
  • Step 1: Decide which are the possible x and y
    values to use, in order to split the data into
    two groups
  • Possible x-split values: x = 0.15, 0.25, 0.35,
    0.4, 0.5, 0.55, 0.6
  • Possible y-split values: y = 0.5, 0.6, 0.7

54
Learning Objective 2: Finding the Optimal Split
  • Step 2: For the split values that you have chosen
    to split the data with (in Step 1), determine the
    two groups of data
  • Think of all the data (1, 2, 3, 4, 5, 6, 7, 8, 9,
    10) as lying at a node of the tree (called the
    root node). Data with attribute values smaller
    than a split value go to a left node branched off
    from this root node, while data with attribute
    values larger than or equal to the split value go
    to a right node branched off from this root node.
    We are interested in determining which data
    points go to the left node and which go to the
    right node of the tree

55
Learning Objective 2: Example of a Data-Split, x = 0.35
  • Example of a data-split (data going to the left
    and right node of the tree) for the split value
    x = 0.35, as sketched below
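Since the partition figure is not reproduced here, this is a minimal sketch of the thresholding rule from Step 2, reusing the hypothetical coordinates introduced earlier:

```python
# Hypothetical coordinates (the real ones are in the slide figures).
data = {
    1: (0.1, 0.8, "A"), 2: (0.2, 0.7, "A"), 3: (0.5, 0.9, "A"),
    4: (0.3, 0.4, "A"), 5: (0.6, 0.6, "A"),
    6: (0.4, 0.3, "B"), 7: (0.7, 0.5, "B"), 8: (0.8, 0.2, "B"),
    9: (0.9, 0.4, "B"), 10: (0.65, 0.1, "B"),
}

def split(data, feature, threshold):
    """feature 0 is x, feature 1 is y. Left node gets values below the
    threshold; right node gets values at or above it (Step 2's rule)."""
    left = [i for i, d in data.items() if d[feature] < threshold]
    right = [i for i, d in data.items() if d[feature] >= threshold]
    return left, right

left, right = split(data, feature=0, threshold=0.35)
print("left node:", left)    # data-points with x < 0.35
print("right node:", right)  # data-points with x >= 0.35
```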

56
Learning Objective 2: Finding the Optimal Split
  • Step 2 (Student Assignment 2.1): For the split
    values that you have chosen to split the data
    with (in Step 1), determine the two groups of
    data that go to the left and to the right node of
    the tree, as was illustrated in the previous
    slide for the example split value of x = 0.35

This is an in-class assignment (Student Assignment 2.1)
57
Learning Objective 2: Finding the Optimal Split
  • Step 3 (Student Assignment 2.2): For the split
    values that you have chosen to split the data
    with (in Step 1), calculate the impurity of the
    data that go to the left and to the right node of
    the tree. You have already identified the data
    that go to the left and the right node of the
    tree (in Step 2) for the split values chosen in
    Step 1.

This is an in-class assignment (Student Assignment 2.2); a code sketch follows.
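A minimal sketch of the per-node impurity computation in Step 3, assuming entropy as the impurity measure (as defined in Learning Objective 1):

```python
import math

def node_impurity(labels):
    """Entropy impurity of one node, given its list of 'A'/'B' labels."""
    if not labels:
        return 0.0
    p = labels.count("A") / len(labels)
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero impurity
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(node_impurity(["A", "A", "B"]))       # about 0.918
print(node_impurity(["B", "B", "B", "B"]))  # 0.0, a pure node
```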
58
Learning Objective 2: Finding the Optimal Split
  • Step 4 (Student Assignment 2.3): For the split
    values that you have chosen to split the data
    with (in Step 1), calculate the average impurity
    of the data that go to the left and to the right
    node of the tree. Then, calculate the difference
    between the impurity of the original dataset and
    this average impurity.

This is an in-class assignment (Student Assignment 2.3); a code sketch follows.
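A sketch of Step 4, assuming the average impurity is weighted by the fraction of points reaching each child node (the usual convention):

```python
import math

def node_impurity(labels):
    if not labels:
        return 0.0
    p = labels.count("A") / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def impurity_drop(parent, left, right):
    """Parent impurity minus the size-weighted average impurity of the
    two child nodes; Step 5 picks the split maximizing this value."""
    n = len(parent)
    avg = (len(left) / n) * node_impurity(left) \
        + (len(right) / n) * node_impurity(right)
    return node_impurity(parent) - avg

parent = ["A"] * 5 + ["B"] * 5
print(impurity_drop(parent, ["A", "A", "A"], ["A", "A"] + ["B"] * 5))
```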
59
Learning Objective 2: Finding the Optimal Split
  • Step 5: Choose as the optimal split value the x
    or y split value for which the difference in
    impurity, calculated in Step 4, was maximum (see
    the sketch below)
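Putting Steps 1-5 together, a sketch of the full search over candidate splits, again on the hypothetical coordinates used above (the true optimum for the slides' dataset depends on the actual figure data):

```python
import math

data = {
    1: (0.1, 0.8, "A"), 2: (0.2, 0.7, "A"), 3: (0.5, 0.9, "A"),
    4: (0.3, 0.4, "A"), 5: (0.6, 0.6, "A"),
    6: (0.4, 0.3, "B"), 7: (0.7, 0.5, "B"), 8: (0.8, 0.2, "B"),
    9: (0.9, 0.4, "B"), 10: (0.65, 0.1, "B"),
}

def impurity(labels):
    p = labels.count("A") / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split(data):
    points = list(data.values())
    parent = [label for _, _, label in points]
    best = None
    for feature in (0, 1):  # 0 = attribute x, 1 = attribute y
        for t in sorted({pt[feature] for pt in points}):
            left = [lab for *xy, lab in points if xy[feature] < t]
            right = [lab for *xy, lab in points if xy[feature] >= t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            n = len(parent)
            drop = impurity(parent) - (len(left) / n) * impurity(left) \
                                    - (len(right) / n) * impurity(right)
            if best is None or drop > best[0]:
                best = (drop, "xy"[feature], t)
    return best  # (impurity drop, attribute, threshold)

print(best_split(data))
```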

60
Learning Objective 2: Finding the Optimal Split
  • Step 6: By looking at the figures of how the
    optimal split separated the data and how the
    other splits separated the data, comment on the
    validity of the statement:
  • "The optimal split has reduced the impurity of
    the original data the most"

61
Learning Objective 2: Student Assignment
  • Student Assignment 2.4 (to do at home)
  • You are asked to repeat Steps 1-6, described in
    class, to determine the optimal split value for
    the dataset consisting of the data with indices
    3, 5, 6, 7, 8, 9, and 10, residing at the right
    child node of the tree shown in the following
    figure
  • Data-points 3 and 5 are of class label A, while
    data-points 6, 7, 8, 9, and 10 are of class label
    B

62
Learning Objective 2: Student Assignment
  • Student Assignment 2.4: Repeat Steps 1-6 for the
    data residing in node 3 of the tree, in order to
    find the optimal split for this dataset (points
    3, 5, 6, 7, 8, 9, 10)