Advanced Artificial Intelligence Lecture 5: Inductive Logic Programming



Advanced Artificial Intelligence Lecture 5
Inductive Logic Programming
  • Bob McKay
  • School of Computer Science and Engineering
  • College of Engineering
  • Seoul National University

  • Inductive Logic Programming
  • FOIL

What is Relational Learning?
  • Relational Learning systems use a representation
    language that goes beyond the propositional
  • Permitting learning about relationships between
    data items, for example
  • a definition of a sort or append function
  • family relationships
  • spatial or geometric relationships
  • temporal relationships
  • Relational systems include
  • Inductive Logic Programming
  • Genetic Programming
  • Recurrent neural networks can learn simple
    relationships
  • It's often possible to turn a known relationship
    into a propositional learning problem
  • Eg learning time series
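The time-series case above can be sketched in code. This is a minimal illustration (the function name and window width are illustrative, not from the lecture): a known temporal relationship is turned into a propositional problem by re-encoding the series with a fixed-width sliding window, so each example becomes an ordinary attribute vector plus a target.

```python
def sliding_window(series, width):
    """Propositionalise a time series: each example is a fixed-length
    attribute vector (the window) plus a target (the next value)."""
    examples = []
    for i in range(len(series) - width):
        attributes = series[i:i + width]  # window of known width
        target = series[i + width]        # value to predict
        examples.append((attributes, target))
    return examples

data = [1, 2, 3, 4, 5, 6]
print(sliding_window(data, 3))
# each pair is (window, next value), e.g. ([1, 2, 3], 4)
```

Once in this form, any propositional learner can be applied; the relational structure (the fixed lag) has been compiled into the attributes.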

Relational Learning - status
  • Still largely a research domain rather than
    practical applications
  • Relational learning is difficult
  • Computationally expensive
  • Data expensive
  • Finding good algorithms is difficult
  • Hence achievements so far are limited
  • An important research domain because
  • Relational problems are widespread in practical
    applications
  • There are often no alternative approaches to
    these problems

Inductive Logic Programming
  • Representation Language
  • Prolog (ie first order predicate rules)
  • Some systems slightly extend the representation
  • Learning Algorithms
  • Usually Deterministic
  • Some stochastic (evolutionary) approaches have
    been tried, but with limited success
  • Generally gradient descent algorithms, often
    extended with special-purpose heuristics

Some Terminology A Reminder
  • An atom a is a formula of the form
  • P(x1,…, xn)
  • (in general logic, x1,…, xn may contain function
    symbols, but almost all ILP systems are limited to
    the case where there are no function symbols)
  • A literal is either an atom or the negation of an
    atom
  • A (predicate calculus or relational) rule is a
    formula of the form
  • L1 ∧ … ∧ Ln → L0
  • Where all of the Li are literals
  • And in particular, L0 is a positive literal (ie
    an atom)

Learning Example
  • For example, a relational learner might be asked
    to learn the member predicate from examples
  • That is, given
  • member(1,[1]).
  • member(1,[1,2]).
  • member(2,[1,2]).
  • Etc.
  • It should learn
  • member(X,[Y|Ys]) :- X = Y.
  • member(X,[Y|Ys]) :- member(X,Ys).
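To make the target concrete, here is a direct transcription of the two member/2 clauses into Python (purely illustrative, not part of any ILP system): the first clause checks the head, the second recurses on the tail.

```python
def member(x, ys):
    """The two learned Prolog clauses for member/2, as Python."""
    if not ys:                 # empty list: no clause applies
        return False
    y, rest = ys[0], ys[1:]
    if x == y:                 # member(X,[Y|Ys]) :- X = Y.
        return True
    return member(x, rest)     # member(X,[Y|Ys]) :- member(X,Ys).

print(member(2, [1, 2, 3]))  # True
print(member(4, [1, 2, 3]))  # False
```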

Negative Examples
  • Many relational learning systems also require
    negative examples for their learning, such as
  • Not member(2,[1]).
  • Not member(3,[1,2]).
  • Not member(4,[1,2,3]).
  • Etc.
  • Alternatively, such systems may rely on the
    Closed World Assumption
  • I.e. anything not included in the positive
    examples is assumed to be a negative example
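The Closed World Assumption can be sketched as follows (the representation of facts as tuples is illustrative): any ground fact not listed among the positive examples is simply assumed to be negative, so no explicit negative examples need to be supplied.

```python
# positive examples of member/2, each stored as (element, list-as-tuple)
positives = {(1, (1,)), (1, (1, 2)), (2, (1, 2))}

def cwa_label(x, ys):
    """Under the Closed World Assumption, absence from the
    positive examples means the fact is taken as negative."""
    return "positive" if (x, tuple(ys)) in positives else "negative"

print(cwa_label(2, [1, 2]))  # positive: listed explicitly
print(cwa_label(3, [1, 2]))  # negative: not listed, assumed false
```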

FOIL (Quinlan)
  • FOIL is probably the easiest of the well-known
    ILP systems to understand
  • Outer loop of the algorithm
  • Find a rule that covers some of the positive
    instances (and no negatives)
  • Store the rule and remove the covered data from
    the dataset
  • Re-run the algorithm on the reduced dataset
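The outer loop above is a sequential-covering skeleton, sketched here in Python (a simplification, not Quinlan's implementation; the inner loop is passed in as a stand-in function so the skeleton runs on its own):

```python
def foil_outer_loop(positives, negatives, learn_one_rule):
    """Find rules one at a time, removing covered positives each round."""
    rules = []
    remaining = set(positives)
    while remaining:
        rule = learn_one_rule(remaining, negatives)
        covered = {p for p in remaining if rule(p)}
        if not covered:          # no progress: give up
            break
        rules.append(rule)
        remaining -= covered     # re-run on the reduced dataset
    return rules

# toy run: rules are predicates on integers; the stand-in inner loop
# returns a rule covering the smallest remaining positive example
pos = {1, 2, 3}
learned = foil_outer_loop(pos, set(),
                          lambda rem, neg: (lambda p, m=min(rem): p == m))
print(len(learned))  # 3 rules, one per positive example
```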

FOIL Example
  • In learning the member predicate, FOIL might
    first learn
  • member(X,[Y|Ys]) :- X = Y.
  • Then FOIL would be left with the reduced dataset
  • member(2,[1,2]).
  • member(2,[1,2,3]).
  • member(3,[1,2,3]).
  • etc

FOIL Inner Loop
  • The inner loop generates individual rules
  • A general-to-specific algorithm
  • Learning a rule of the form Hyp → Conc
  • FOIL begins with Hyp empty
  • ie with the rule that says the conclusion is
    always satisfied
  • Adds literals one by one to Hyp
  • Using a greedy (ie non-backtracking hillclimbing)
    search algorithm.
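The inner loop can be sketched as a general-to-specific greedy search (again a simplification: literals are represented as predicates over examples, and the scoring function is a stand-in for FOIL's information-gain heuristic described below):

```python
def learn_one_rule(positives, negatives, candidate_literals, score):
    """Start with an empty body and greedily add literals
    until the rule covers no negative examples."""
    body = []                                   # Hyp starts empty
    covered_pos, covered_neg = set(positives), set(negatives)
    remaining = list(candidate_literals)
    while covered_neg and remaining:            # specialise the rule
        best = max(remaining,
                   key=lambda lit: score(lit, covered_pos, covered_neg))
        remaining.remove(best)                  # greedy: no backtracking
        body.append(best)
        covered_pos = {p for p in covered_pos if best(p)}
        covered_neg = {n for n in covered_neg if best(n)}
    return body

# toy run: learn "even and greater than 2" from integers
is_even = lambda x: x % 2 == 0
gt_two = lambda x: x > 2
score = lambda lit, pos, neg: sum(lit(p) for p in pos) - sum(lit(n) for n in neg)
body = learn_one_rule({4, 6}, {1, 3, 2}, [is_even, gt_two], score)
rule = lambda x: all(l(x) for l in body)
print(rule(6), rule(3), rule(2))  # True False False
```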

FOIL Inner Loop Example
  • In our example, FOIL would start with the clause
  • member(X,[Y|Ys]).
  • Of course, this would have an unacceptable error
    rate
  • FOIL would try
  • member(X,[Y|Ys]) :- P.
  • with many different Ps, one being X = Y
  • This would have an acceptable error rate, so the
    inner loop would terminate

FOIL Heuristic
  • How does FOIL decide what literal to add next?
  • FOIL uses the information-gain heuristic
  • (same as C4.5)
  • The next literal added is that which produces the
    highest information gain relative to the data
  • We will look at this in more detail later

FOIL Heuristic
  • However the information gain heuristic is biased
    by a couple of extra factors
  • Introducing variables
  • It's generally useful to introduce new variables
    into the developing rule
  • So FOIL gives a small positive bonus to literals
    that introduce new variables
  • Determinacy
  • Next slide

FOIL and Determinacy
  • A literal is determinate relative to the literals
    already in the developing rule if
  • For each combination of values of old variables
    already in the rule
  • All new variables have only one corresponding
    value in the dataset
  • Determinate literals are often useful in
    generating rules
  • So FOIL has an "if all else fails" heuristic
  • If no new literal produces sufficient
    information gain
  • Where sufficient is a threshold set by the user
  • Then FOIL introduces a new determinate literal
  • To avoid possible infinite loops, FOIL sets a
    bound on the depth of determinate literals it can
    introduce
  • Usually this heuristic introduces too many new
    literals
  • However FOIL has a pruning stage that removes
    most of them again
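The determinacy test can be sketched as a functional-dependency check (the tabular representation of ground facts is illustrative): a literal is determinate if each combination of old-variable values maps to at most one value of the new variable.

```python
def is_determinate(facts, old_cols, new_col):
    """True if the new variable's value is a function of the
    old variables' values across the given ground facts."""
    seen = {}
    for row in facts:
        key = tuple(row[c] for c in old_cols)
        val = row[new_col]
        if key in seen and seen[key] != val:
            return False   # two different new values for the same key
        seen[key] = val
    return True

# decomposing a list into head and tail is determinate:
# each list has exactly one head
facts = [((1, 2, 3), 1), ((2, 3), 2)]
print(is_determinate(facts, [0], 1))  # True

# member(X, Ys) is not determinate in X given Ys:
# the same list has several members
member_facts = [((1, 2), 1), ((1, 2), 2)]
print(is_determinate(member_facts, [0], 1))  # False
```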

Entropy
  • Denote the positive training instances by T, and
    the positive instances covered by a particular
    clause C as T ∩ C
  • Then we can estimate the probability that a
    particular instance is covered by C as pC = |T ∩ C|
    / |T|
  • (there are some additional complications relating
    to the effects of new variables introduced by
    clause C, but these need not concern us here)
  • Next, we can define the information entropy of
    the clause against the training data as
  • Entropy(C) = - pC log2 pC
  • Entropy is a measure of the minimum number of
    bits required to encode the status of a
    randomly-drawn member of T

Information Gain
  • The change in entropy due to adding a new literal
    is known as the information gain
  • The literal chosen by FOIL will be that which
    leads to the greatest information gain
  • (with the caveats noted before)
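The definitions above can be computed directly. This sketch follows the simplified formulas on these slides (FOIL's published gain measure also weights by the number of positive instances still covered, which is omitted here):

```python
import math

def entropy(covered, total):
    """Entropy(C) = -pC log2 pC, with pC = |T ∩ C| / |T|."""
    p = covered / total
    if p in (0.0, 1.0):
        return 0.0             # log2 undefined at 0; no information at 1
    return -p * math.log2(p)

def information_gain(covered_before, covered_after, total):
    """Change in entropy when a literal is added to the clause."""
    return entropy(covered_before, total) - entropy(covered_after, total)

print(entropy(5, 10))    # 0.5 bits
print(entropy(10, 10))   # 0.0: covering everything carries no information
# e.g. a literal narrowing coverage from 8 of 10 positives to 5 of 10:
print(information_gain(8, 5, 10))
```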

FOIL Pruning Stage
  • The information gain of each literal is
    calculated relative to the rest of the rule
  • The literal with the least information gain is
    removed
  • This is repeated until no literal with
    information gain less than a predetermined
    threshold can be found
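The pruning loop can be sketched as follows (a simplification: gain_of is a stand-in scoring function, and here each literal's gain is fixed rather than recomputed against the rest of the rule):

```python
def prune_rule(body, gain_of, threshold):
    """Repeatedly drop the literal with the least information gain,
    until every remaining literal's gain reaches the threshold."""
    body = list(body)
    while body:
        worst = min(body, key=lambda lit: gain_of(lit, body))
        if gain_of(worst, body) >= threshold:
            break              # every remaining literal earns its keep
        body.remove(worst)
    return body

# toy gains: literal "a" contributes little, "b" and "c" a lot
gains = {"a": 0.01, "b": 0.9, "c": 0.7}
pruned = prune_rule(["a", "b", "c"], lambda lit, body: gains[lit], 0.1)
print(pruned)  # ['b', 'c']
```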