Induction and Decision Trees - PowerPoint PPT Presentation

About This Presentation
Title:

Induction and Decision Trees

Slides: 21
Provided by: Patric584
Learn more at: http://cob.jmu.edu

Transcript and Presenter's Notes

Title: Induction and Decision Trees


1
Induction and Decision Trees
2
Artificial Intelligence
  • The design and development of computer systems
    that exhibit intelligent behavior.
  • What is intelligence?
  • Turing test
  • Developed in 1950 by Alan Turing (pioneer in
    computer science)
  • Computer and human in one room
  • Human interrogator in another room
  • Interrogator asks questions; either the human or
    the computer answers
  • If interrogator cannot tell whether the human or
    the computer is answering, then the computer is
    intelligent

3
Classification of AI Systems
  • Knowledge Representation Systems
  • Capture existing expert knowledge and use it to
    consult end-users and provide decision support
  • Main types: rule-based expert systems, case-based
    reasoning systems, frame-based knowledge systems,
    semantic networks
  • Machine Learning
  • Algorithms that use mathematical or logical
    techniques for finding patterns in data and
    discovering or creating new knowledge
  • Main types: artificial neural networks, genetic
    algorithms, inductive decision trees, naïve
    Bayesian algorithms, clustering and
    pattern-recognition algorithms

Data mining primarily involves the machine
learning form of AI
4
Data Mining
  • Textbook definition
  • Knowledge discovery in databases
  • Using statistical, mathematical, AI, and machine
    learning techniques to extract useful information
    and subsequent knowledge from large databases
  • Key point: identifying patterns in large data
    sets

5
Microsoft SQL Server Data Mining Algorithms
  • Decision Trees
  • Naïve Bayesian
  • Clustering
  • Sequence Clustering
  • Association Rules
  • Neural Network
  • Time Series

6
Decision Trees for Machine Learning
  • Based on Inductive Logic
  • Three types of logical structures commonly used
    in AI systems
  • Deduction
  • Abduction
  • Induction

7
Deduction
  • Premise (rule): if p then q
  • Fact (axiom, observation): p
  • Conclude: q
  • This is classical logic (Modus Ponens). If the
    rule is correct, and the fact is correct, then
    you know that the conclusion will be correct.

We are given the rule
8
Abduction
  • Premise (rule): if p then q
  • Fact (axiom, observation): q
  • Conclude: p
  • This form of reasoning is the logical fallacy
    called affirming the consequent. The conclusion
    may be wrong, but it is a plausible explanation
    of the fact, given the rule. Useful for
    diagnostic tasks.

We are given the rule
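
The difference between deduction and abduction can be checked mechanically with a truth table. The following is a minimal Python sketch; the helper names `implies` and `is_valid` are assumptions made for illustration and do not come from the slides.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' fails only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if the conclusion holds whenever all premises hold."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )

# Deduction (modus ponens): from "if p then q" and p, conclude q.
deduction_valid = is_valid([implies, lambda p, q: p], lambda p, q: q)

# Abduction (affirming the consequent): from "if p then q" and q, conclude p.
abduction_valid = is_valid([implies, lambda p, q: q], lambda p, q: p)

print(deduction_valid)  # True
print(abduction_valid)  # False: p=False, q=True satisfies both premises but not p
```

The single counterexample (q observed without p) is why an abductive conclusion is only a plausible explanation, not a guaranteed one.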
9
Induction
  • 1. Observe p and q together
  • ...
  • n. Observe p and q together
  • Conclude: if p then q
  • This is stereotypical thinking, and it is highly
    error prone.

We create the rule
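
To show why an induced rule is error prone, here is a small Python sketch. The function name `induce_rule` and the observation lists are hypothetical, invented only for this illustration.

```python
def induce_rule(observations):
    """Propose 'if p then q' when every observed case with p also had q.

    Returns False if p was never observed or if a counterexample exists.
    """
    q_values_when_p = [q for p, q in observations if p]
    return bool(q_values_when_p) and all(q_values_when_p)

# Hypothetical observations as (p, q) pairs: p and q seen together five times.
seen_so_far = [(True, True)] * 5
rule_before = induce_rule(seen_so_far)

# One later observation of p without q overturns the induced rule.
rule_after = induce_rule(seen_so_far + [(True, False)])

print(rule_before, rule_after)  # True False
```

No number of agreeing observations proves the rule; a single contrary record is enough to refute it.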
10
Example: Baby in the kitchen
11
ID3 Decision Tree Algorithm
  • Iterative Dichotomizer
  • Developed by Ross Quinlan (1979)
  • This is the basis for many commercial induction
    products
  • The goal of this algorithm is to find rules
    resulting in YES or NO values. (Therefore, each
    generated rule has one of 2 possible
    outcomes.)
  • ID3 generates a tree, where each path of the tree
    represents a rule. The leaf node is the THEN part
    of the rule, and the nodes leading to it are the
    ANDS of attribute-value combinations in the IF
    part of the rule.
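
To illustrate how each root-to-leaf path reads as a rule, here is a minimal Python sketch. The tuple-and-dict tree representation and the function name `tree_to_rules` are assumptions made for illustration; the example tree encodes the baby-in-the-kitchen rules given at the end of this presentation.

```python
def tree_to_rules(tree, target, conditions=()):
    """Walk every root-to-leaf path and return one IF ... THEN ... rule per path."""
    if isinstance(tree, str):
        # Leaf node: this is the THEN part of the rule.
        if_part = " and ".join(f"{attr} = {val}" for attr, val in conditions)
        return [f"If {if_part} then {target} = {tree}"]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        # Each branch adds one attribute-value test to the IF part.
        rules.extend(tree_to_rules(subtree, target, conditions + ((attribute, value),)))
    return rules

# Internal node: (attribute, {value: subtree}); leaf: "Yes" or "No".
tree = ("Touch_Oven", {
    "No": "No",
    "Yes": ("Mom_In_Kitchen", {"Yes": "Yes", "No": "No"}),
})

rules = tree_to_rules(tree, "BOO_BOO")
for rule in rules:
    print(rule)
```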

12
ID3 Algorithm
  • Starting Point
  • an empty tree (this tree will eventually
    represent the final rules created by ID3)
  • a recordset of data elements (e.g. records from a
    database)
  • a set of attributes (fields), each with some finite
    number of possible values
  • NOTE: one of the attributes is the decision
    field, with a YES or NO value (or some other
    2-valued option: GOOD/BAD, HIGH/LOW, WIN/LOSE,
    etc.)
  • Output
  • a tree, where each path of the tree represents a
    rule

13
ID3 algorithm
  • If all records in your recordset are positive
    (i.e. have YES values for their decision
    attribute), create a YES node and stop (end
    recursion)
  • If all records in your recordset are negative,
    create a NO node and stop (end recursion)
  • Select the attribute that best discriminates
    among the records (using an entropy function)
  • Create a tree-node representing that attribute,
    with n branches, where n is the number of values
    for the selected attribute
  • Divide the records of the recordset into subsets
    subrecordset 1, subrecordset 2, ..., subrecordset
    n, one for each value of the selected
    attribute
  • Recursively apply the algorithm to each
    subrecordset i, with a reduced attribute set (don't
    include already used attributes further down the
    path)
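
The steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the slides only say "an entropy function", so the weighted average entropy of the subsets (the usual ID3 criterion) is assumed; the majority-vote fallback when attributes run out is also an assumption; and the six records are hypothetical, since the original recordset is not reproduced in this transcript.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2) of a list of decision values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def split_entropy(records, attribute, target):
    """Weighted average entropy of the decision field after splitting on attribute."""
    total = len(records)
    result = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        result += len(subset) / total * entropy(subset)
    return result

def id3(records, attributes, target):
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:          # all YES or all NO: create a leaf and stop
        return labels[0]
    if not attributes:                 # assumption: fall back to the majority value
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute that best discriminates (lowest entropy after the split).
    best = min(attributes, key=lambda a: split_entropy(records, a, target))
    remaining = [a for a in attributes if a != best]   # reduced attribute set
    branches = {}
    for value in sorted({r[best] for r in records}):   # one branch per observed value
        subset = [r for r in records if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return (best, branches)

# Hypothetical recordset (NOT the one from the slides).
rows = [("Yes", "Yes", "Yes"), ("Yes", "Yes", "Yes"), ("Yes", "No", "No"),
        ("No", "Yes", "No"), ("No", "No", "No"), ("No", "Yes", "No")]
records = [dict(zip(("Touch_Oven", "Mom_In_Kitchen", "BOO_BOO"), row)) for row in rows]

tree = id3(records, ["Touch_Oven", "Mom_In_Kitchen"], "BOO_BOO")
print(tree)
```

Note that this sketch creates branches only for attribute values that actually occur in the current subset, which avoids empty subrecordsets.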

14
Calculating Entropy
  • Entropy: mixture, chaos
  • We want to pick the attribute with the lowest
    entropy
  • Ideally, a particular value for the input
    attribute leads to ALL yes or ALL no in the
    outcome attribute, or comes as close to this as
    possible

An attribute's entropy:
$$H = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
where n is the total number of possible values
for the attribute and x_i is the ith value
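
As a minimal sketch, the entropy of a set of probabilities can be computed in Python as below. It assumes the standard Shannon form with log base 2, the base used later in the baby-in-the-kitchen example; the function name `entropy` and the sample probabilities are chosen for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; terms with zero probability contribute nothing."""
    # p * log2(1/p) is the same quantity as -p * log2(p).
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

pure = entropy([1.0])            # ALL yes (or ALL no): no mixture
even = entropy([0.5, 0.5])       # perfectly mixed two-way outcome
skewed = entropy([2 / 3, 1 / 3]) # mostly one outcome

print(pure, even, round(skewed, 3))  # 0.0 1.0 0.918
```

Lower values mean less mixture, which is why the attribute with the lowest entropy is preferred.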
15
Baby's Recordset of Oven-Touching Experiences
16
ID3 Applied to Baby-in-the-Kitchen
  • Which attribute to start with? Based on the entropy
    measure (assuming log base 2):
  • Touch stove: entropy 0.918
  • Mom in kitchen: entropy 1.0
  • To see this, note that
  • Probability of touching stove leading to ouch is
    .67, and not leading to ouch is .33
  • .67 × .33 = .22
  • Probability of mom being in kitchen leading to
    ouch is .5 and mom being in kitchen not leading
    to ouch is also .5
  • .5 × .5 = .25
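
As a check on the two entropy values, here is the arithmetic worked out, assuming the Shannon entropy with log base 2 and treating .67 and .33 as rounded values of 2/3 and 1/3 (an assumption; the underlying recordset is not reproduced in this transcript):

```latex
\begin{align*}
H_{\text{touch stove}} &= -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \approx 0.390 + 0.528 = 0.918 \\
H_{\text{mom in kitchen}} &= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 0.5 + 0.5 = 1.0
\end{align*}
```

Both the entropy values and the probability products rank Touch stove as less mixed than Mom in kitchen, so Touch stove is applied first.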

17
Applying the Touch Stove Attribute
18
Recurse: apply the Mom in Kitchen attribute where
needed
19
Resulting decision rules
  • If Touch_Oven = No then BOO_BOO = No
  • If Touch_Oven = Yes and Mom_In_Kitchen = Yes then
    BOO_BOO = Yes
  • If Touch_Oven = Yes and Mom_In_Kitchen = No then
    BOO_BOO = No
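
The three rules can be applied directly as a small function. In this sketch the function name `boo_boo` and the use of booleans for Yes/No are assumptions made for illustration.

```python
def boo_boo(touch_oven: bool, mom_in_kitchen: bool) -> bool:
    """Apply the decision rules; True means BOO_BOO = Yes."""
    if not touch_oven:
        return False           # rule 1: no touch, no boo-boo
    return mom_in_kitchen      # rules 2 and 3: outcome follows Mom_In_Kitchen

# Evaluate every combination of the two input attributes.
outcomes = {
    (touch, mom): boo_boo(touch, mom)
    for touch in (False, True)
    for mom in (False, True)
}
print(outcomes)
```

Only the combination Touch_Oven = Yes and Mom_In_Kitchen = Yes yields BOO_BOO = Yes.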

20
Now we'll do this with Microsoft SQL Server