Decision Trees - PowerPoint PPT Presentation

Provided by: busi210

Transcript and Presenter's Notes
1
Decision Trees
2
Contingency Tables
  • A better name for a histogram
  • A One-dimensional Contingency Table
  • Recipe for making a k-dimensional contingency
    table
  • Pick k attributes from your dataset. Call them
    a1, a2, ..., ak.
  • For every possible combination of values, a1=x1,
    a2=x2, ..., ak=xk, record how frequently that
    combination occurs
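The recipe above can be sketched in a few lines of Python; the records and attribute names here are made up purely for illustration:

```python
from collections import Counter

# Toy records (hypothetical data, just for illustration).
records = [
    {"age": "young", "wealth": "poor"},
    {"age": "young", "wealth": "rich"},
    {"age": "old",   "wealth": "poor"},
    {"age": "young", "wealth": "poor"},
]

def contingency_table(records, attributes):
    """Count how often each combination of values of the chosen
    attributes occurs -- a k-dimensional contingency table."""
    return Counter(tuple(r[a] for a in attributes) for r in records)

table = contingency_table(records, ["age", "wealth"])
# table[("young", "poor")] is 2: that combination occurs twice above.
```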

3
A 2-d Contingency Table
For each pair of values for attributes (age,
wealth) we can see how many records match.
4
A 3-d Contingency Table
5
Contingency Tables
  • With 16 attributes, how many 1-d contingency
    tables are there?
  • How many 2-d contingency tables?
  • How many 3-d tables?
  • With 100 attributes how many 3-d tables are
    there?

6
Contingency Tables
  • With 16 attributes, how many 1-d contingency
    tables are there? 16
  • How many 2-d contingency tables? 16-choose-2 =
    16 × 15 / 2 = 120
  • How many 3-d tables? 16-choose-3 = 560
  • With 100 attributes how many 3-d tables are
    there? 100-choose-3 = 161,700
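These counts are binomial coefficients: a k-dimensional table is a choice of k attributes out of n. The slide's answers can be checked with Python's standard library:

```python
from math import comb

# A k-dimensional table is a choice of k attributes out of n,
# so the number of tables is n-choose-k.
print(comb(16, 1))    # 16 one-dimensional tables
print(comb(16, 2))    # 120 two-dimensional tables
print(comb(16, 3))    # 560 three-dimensional tables
print(comb(100, 3))   # 161700 three-dimensional tables from 100 attributes
```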

7
Manually Looking at Contingency Tables
  • Looking at one contingency table can be as much
    fun as reading an interesting book
  • Looking at ten tables as much fun as watching
    CNN
  • Looking at 100 tables as much fun as watching an
    infomercial
  • Looking at 100,000 tables as much fun as a three
    week November vacation in Duluth with a dying
    weasel

8
Searching for High Info Gains
Given something (e.g. wealth) you are trying to
predict, it is easy to ask the computer to find
which attribute has highest information gain for
it.
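A common way to score an attribute is Shannon entropy: its information gain is how much splitting on it reduces the entropy of the target. A minimal sketch, with made-up records and attribute names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(records, attribute, target):
    """Reduction in entropy of `target` after splitting on `attribute`."""
    n = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in records]) - remainder

# Hypothetical records; ask which attribute best predicts wealth.
records = [
    {"age": "young", "city": "a", "wealth": "poor"},
    {"age": "young", "city": "b", "wealth": "poor"},
    {"age": "old",   "city": "a", "wealth": "rich"},
    {"age": "old",   "city": "b", "wealth": "rich"},
]
best = max(["age", "city"], key=lambda a: info_gain(records, a, "wealth"))
# "age" splits wealth perfectly here, so it has the highest gain.
```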
9
Decision Trees
  • A decision tree is a graph of decisions and their
    possible consequences, (including resource costs
    and risks) used to create a plan to reach a goal.
  • Decision trees are constructed in order to help
    with making decisions. A decision tree is a
    special form of tree structure.

10
Sample Tree
Interior nodes represent attributes.
Arcs between nodes represent possible values of the attributes.
Leaf nodes represent the value of the outcome variable given the values of attributes on the path to the leaf node.
11
Types of Decision Trees
  • Classification tree
    • Outcome variable is a categorical variable
  • Regression tree
    • Outcome variable is a continuous variable

12
Play Golf Dataset
13
Decision Tree of Golf Data
14
Conclusion
  • The best way to explain the attribute Play is
    with the attribute Outlook
  • First conclusion: people always play when it's
    overcast
  • On days it rains, the attribute Windy explains
    whether people play or not
  • On days when it's sunny, the attribute Humidity
    explains when people play

15
Decision Tree as Rules
If Outlook = Overcast Then Play
If Outlook = Sunny and Humidity < 70 Then Play
ElseIf Outlook = Sunny and Humidity > 70 Then Don't Play
If Outlook = Rain and Windy = True Then Don't Play
ElseIf Outlook = Rain and Windy = False Then Play
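The same rules can be written directly as code. A sketch in Python; note the slides only cover humidity below and above 70, so exactly 70 is grouped with "Don't Play" here as an assumption:

```python
def play_golf(outlook, humidity, windy):
    """The golf decision tree expressed as if/elif rules."""
    if outlook == "Overcast":
        return "Play"
    if outlook == "Sunny":
        # Humidity exactly 70 is unspecified on the slide; grouped
        # with "Don't Play" here as an assumption.
        return "Play" if humidity < 70 else "Don't Play"
    if outlook == "Rain":
        return "Don't Play" if windy else "Play"

print(play_golf("Sunny", humidity=65, windy=False))  # Play
```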
16
Learning Decision Trees
  • To decide which attribute should be tested first,
    simply find the one with the highest information
    gain.
  • Then recurse
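These two steps (pick the highest-gain attribute, then recurse on each subset) are the core of the classic ID3 algorithm. A compact sketch, assuming categorical attributes and made-up field names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(records, attr, target):
    n = len(records)
    rem = 0.0
    for v in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == v]
        rem += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in records]) - rem

def build_tree(records, attributes, target):
    """ID3-style recursion: split on the highest-gain attribute,
    then recurse until every leaf is pure or attributes run out."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:          # all matching instances agree
        return labels[0]
    if not attributes:                 # nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(records, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: build_tree([r for r in records if r[best] == v],
                                 rest, target)
                   for v in {r[best] for r in records}}}

# Tiny made-up example in the spirit of the golf data:
toy = [
    {"Outlook": "Overcast", "Play": "yes"},
    {"Outlook": "Overcast", "Play": "yes"},
    {"Outlook": "Rain", "Play": "no"},
]
tree = build_tree(toy, ["Outlook"], "Play")
```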

17
Example Data
18
Decision Tree
MPG is the outcome variable
Cylinders = 3
Cylinders = 5
Cylinders = 8
Cylinders = 4
Cylinders = 6
19
Recursion Step
20
Decision Tree
21
Second Level of Tree
Next level of tree
22
Final Tree
23
Final Tree
Don't split a case if all matching instances have
the same outcome
24
Final Tree
Don't split a node if none of the attributes can
create non-empty children
25
Final Tree
No attribute can distinguish the remaining records
because none provides any information
26
Confidence and Support
  • Confidence refers to the relative frequency with
    which an event occurs
  • If golfers play 8 out of the 10 days it's
    overcast, then we have 8/10 confidence that
    golfers will play on overcast days
  • Support refers to the number of times an event
    occurs out of all instances
  • If it's only overcast 1 day in 100, then there is
    only 1/100 support for the rule given above
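Both measures are simple ratios; a minimal sketch using the numbers from the slide:

```python
def confidence(event_count, condition_count):
    """Relative frequency of the event among instances where the
    rule's condition holds."""
    return event_count / condition_count

def support(condition_count, total_count):
    """Fraction of all instances to which the rule applies."""
    return condition_count / total_count

# Slide examples: play on 8 of 10 overcast days; overcast 1 day in 100.
print(confidence(8, 10))   # 0.8
print(support(1, 100))     # 0.01
```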

27
Pruning the Tree
  • We can set limits on how deep we want to build
    the tree
  • If there is insufficient support for a new branch
    • Not enough instances to make it worthwhile
    • Have to set a cutoff value for the algorithm
  • Want to avoid data which is actually irrelevant
    but appears to be relevant in the data used to
    build the tree
  • There are some statistical techniques to identify
    noisy data
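A simple pre-pruning test combining the depth limit and support cutoff described above might look like this; the cutoff values are illustrative, not from the slides:

```python
def should_split(records, depth, max_depth=3, min_support=5):
    """Pre-pruning check: refuse to grow a branch that is too deep
    or backed by too few instances (cutoffs here are illustrative)."""
    if depth >= max_depth:
        return False               # tree already at the depth limit
    if len(records) < min_support:
        return False               # not enough instances to be worthwhile
    return True

print(should_split([{}] * 4, depth=1))   # False: only 4 instances
print(should_split([{}] * 20, depth=3))  # False: too deep
print(should_split([{}] * 20, depth=1))  # True
```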