Advanced Artificial Intelligence Lecture 5: Inductive Logic Programming presentation

1
Advanced Artificial Intelligence Lecture 5
Inductive Logic Programming

Bob McKay
School of Computer Science and Engineering
College of Engineering
Seoul National University

2
Outline

Inductive Logic Programming
FOIL

3
What is Relational Learning?

Relational Learning systems use a representation
language which goes beyond the propositional
Permitting learning about relationships between
data items
a definition of a sort or append function
family relationships
spatial or geometric relationships
temporal relationships
Relational systems
Inductive Logic Programming
Genetic Programming
Recurrent neural networks can learn simple
relationships
It's often possible to turn a known relationship
into a propositional learning problem
E.g. learning time series
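As an illustration of turning a known relationship into a propositional problem, the sketch below flattens a time series into fixed-width rows. The function name `sliding_windows` and the window width are assumptions made for this example; they are not part of the lecture.

```python
def sliding_windows(series, width):
    """Turn a sequence into (features, target) rows.

    Each row holds `width` consecutive values as features and the value
    that follows them as the target, so the temporal relationship is
    encoded by position in a fixed-length attribute vector.
    """
    rows = []
    for i in range(len(series) - width):
        rows.append((tuple(series[i:i + width]), series[i + width]))
    return rows


rows = sliding_windows([1, 2, 3, 4, 5, 6], width=3)
for features, target in rows:
    print(features, "->", target)
```

Any propositional learner can then be trained on these rows, at the cost of fixing the window width in advance.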

4
Relational Learning - status

Still largely a research domain rather than
practical applications
Relational learning is difficult
Computationally expensive
Data expensive
Finding good algorithms is difficult
Hence achievements so far are limited
An important research domain because
Relational problems are widespread in practical
applications
There are often no alternative approaches to
these problems

5
Inductive Logic Programming

Representation Language
Prolog (i.e. first-order predicate rules)
Some systems slightly extend the representation
language
Learning Algorithms
Usually Deterministic
Some stochastic (evolutionary) approaches have
been tried, but with limited success
Generally gradient-descent-like (greedy hill-climbing) algorithms, often
extended with special-purpose heuristics

6
Some Terminology: A Reminder

An atom a is a formula of the form
$P(x_1, \ldots, x_n)$
(in general logic, $x_1, \ldots, x_n$ may contain function
symbols, but almost all ILP systems are limited to
the case where there are no function symbols)
A literal is either an atom or the negation of an
atom
A (predicate calculus or relational) rule is a
formula of the form
$L_1 \wedge \ldots \wedge L_n \rightarrow L_0$
Where all of the $L_i$ are literals
And in particular, $L_0$ is a positive literal (i.e.
an atom)
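The terminology above maps naturally onto a small data structure. The following is a minimal sketch, assuming function-free atoms whose arguments are plain variable or constant names. The class names and the `grandparent`/`parent` rule are hypothetical illustrations, not taken from the lecture.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]  # variable or constant names, no function symbols

    def __str__(self):
        return f"{self.predicate}({', '.join(self.args)})"


@dataclass(frozen=True)
class Literal:
    atom: Atom
    positive: bool = True  # False means the negation of the atom

    def __str__(self):
        return str(self.atom) if self.positive else f"not {self.atom}"


@dataclass(frozen=True)
class Rule:
    head: Atom                       # the positive literal being concluded
    body: Tuple[Literal, ...] = ()   # the literals that must all hold

    def __str__(self):
        if not self.body:
            return f"{self.head}."
        return f"{self.head} :- {', '.join(str(l) for l in self.body)}."


rule = Rule(
    head=Atom("grandparent", ("X", "Z")),
    body=(
        Literal(Atom("parent", ("X", "Y"))),
        Literal(Atom("parent", ("Y", "Z"))),
    ),
)
print(rule)
```

Printing the rule in Prolog syntax makes the correspondence between the body literals, the head, and a Prolog clause explicit.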

7
Learning Example

For example, a relational learner might be asked
to learn the member predicate from examples
That is, given
member(1,[1]).
member(1,[1,2]).
member(2,[1,2]).
Etc.
It should learn
member(X,[Y|Ys]) :- X = Y.
member(X,[Y|Ys]) :- member(X,Ys).
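To make the target definition concrete, here is a Python transliteration of the two-clause list-membership definition (a base case where the element equals the head of the list, and a recursive case on the tail). It is only a sketch for checking the examples by hand; an ILP system would produce the Prolog clauses, not this function.

```python
def member(x, items):
    """Mirror the two learned clauses of the list-membership predicate."""
    if not items:
        # Neither clause matches an empty list, so the query fails.
        return False
    y, ys = items[0], items[1:]
    if x == y:
        # First clause: the element equals the head of the list.
        return True
    # Second clause: otherwise look for the element in the tail.
    return member(x, ys)


positive_examples = [(1, [1]), (1, [1, 2]), (2, [1, 2])]
print(all(member(x, items) for x, items in positive_examples))
```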

8
Negative Examples

Many relational learning systems also require
negative examples for their learning, such as
Not member(2,[1]).
Not member(3,[1,2]).
Not member(4,[1,2,3]).
Etc.
Alternatively, such systems may rely on the
Closed World Assumption
I.e. anything not included in the positive
examples is assumed to be a negative example
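Under the Closed World Assumption, negatives can be generated mechanically once the universe of constants is fixed. The sketch below assumes a small finite universe of elements and lists chosen for illustration. The function name `closed_world_negatives` and the universe are assumptions, not part of the lecture.

```python
from itertools import product


def closed_world_negatives(positives, elements, lists):
    """Return every (element, list) pair over the finite universe
    that is not listed among the positive examples."""
    positive_set = set(positives)
    return [pair for pair in product(elements, lists) if pair not in positive_set]


# Lists are written as tuples so that examples are hashable.
positives = [(1, (1,)), (1, (1, 2)), (2, (1, 2))]
negatives = closed_world_negatives(
    positives, elements=[1, 2, 3], lists=[(1,), (1, 2)]
)
print(negatives)
```

The number of generated negatives grows with the product of the universe sizes, which is one reason relational learning is data expensive.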

9
FOIL (Quinlan)

FOIL is probably the easiest of the well-known
ILP systems to understand
Outer loop of the algorithm
Find a rule that covers some of the positive
instances (and no negatives)
Store the rule and remove the covered data from
the dataset
Re-run the algorithm on the reduced dataset
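The outer loop described above is a covering loop, and can be sketched as follows. This is a minimal sketch, not Quinlan's implementation. Rules are modelled as Python predicates over `(element, list)` examples, the rule learner is passed in as a callback, and the toy learner `pick_best_candidate` simply chooses from two hand-written candidates. In particular, `in_tail` is a non-recursive stand-in for the recursive clause. All of these names are assumptions made for the example.

```python
def covering_loop(positives, negatives, learn_one_rule):
    """Repeatedly learn a rule, store it, and remove the positives it covers."""
    remaining = list(positives)
    rules = []
    while remaining:
        rule = learn_one_rule(remaining, negatives)
        if rule is None or not any(rule(ex) for ex in remaining):
            break  # no usable rule was found; stop rather than loop forever
        rules.append(rule)
        remaining = [ex for ex in remaining if not rule(ex)]
    return rules, remaining


def head_match(ex):
    x, items = ex
    return bool(items) and items[0] == x


def in_tail(ex):
    x, items = ex
    return x in items[1:]


def pick_best_candidate(positives, negatives):
    """Toy rule learner: the consistent candidate covering most positives."""
    consistent = [r for r in (head_match, in_tail)
                  if not any(r(n) for n in negatives)]
    return max(consistent, key=lambda r: sum(r(p) for p in positives),
               default=None)


positives = [(1, (1,)), (1, (1, 2)), (2, (1, 2)),
             (1, (1, 2, 3)), (2, (1, 2, 3)), (3, (1, 2, 3))]
negatives = [(2, (1,)), (3, (1, 2)), (4, (1, 2, 3))]

rules, uncovered = covering_loop(positives, negatives, pick_best_candidate)
print([r.__name__ for r in rules], uncovered)
```

Each pass only has to explain the positives that earlier rules missed, which is why the dataset shrinks between iterations.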

10
FOIL Example

In learning the member predicate, FOIL might
first learn
member(X,[Y|Ys]) :- X = Y.
Then FOIL would be left with the reduced dataset
member(2,[1,2]).
member(2,[1,2,3]).
member(3,[1,2,3]).
etc.

11
FOIL Inner Loop

The inner loop generates individual rules
A general-to-specific algorithm
Learning a rule of the form $\mathit{Hyp} \rightarrow \mathit{Conc}$
FOIL begins with Hyp empty
i.e. with the rule that says the conclusion is
always satisfied
Adds literals one by one to Hyp
Using a greedy (i.e. non-backtracking hill-climbing)
search algorithm.
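The general-to-specific search can be sketched as a greedy loop that starts from an empty body and keeps adding the best-scoring literal until no negatives remain covered. In this sketch, literals are `(name, test)` pairs over `(element, list)` examples, and the score is simply the fraction of covered examples that are positive. That score is a stand-in chosen for brevity, not FOIL's information-gain heuristic, and the candidate literals are assumptions made for the example.

```python
def learn_one_rule(positives, negatives, candidates, max_literals=5):
    """Greedily add literals to an initially empty body (no backtracking)."""
    body = []
    pos, neg = list(positives), list(negatives)
    while neg and len(body) < max_literals:
        unused = [lit for lit in candidates if lit not in body]
        if not unused:
            break

        def purity(lit):
            _, test = lit
            p = sum(1 for e in pos if test(e))
            n = sum(1 for e in neg if test(e))
            return (p / (p + n) if p + n else 0.0, p)

        best = max(unused, key=purity)
        if purity(best)[1] == 0:
            break  # the best literal covers no positives, so give up
        body.append(best)
        pos = [e for e in pos if best[1](e)]
        neg = [e for e in neg if best[1](e)]
    return body, pos, neg


CANDIDATES = [
    ("X = Y", lambda e: e[0] == e[1][0]),
    ("X > Y", lambda e: e[0] > e[1][0]),
    ("X < Y", lambda e: e[0] < e[1][0]),
]
positives = [(1, (1,)), (1, (1, 2)), (2, (1, 2))]
negatives = [(2, (1,)), (3, (1, 2)), (0, (1, 2))]

body, covered_pos, covered_neg = learn_one_rule(positives, negatives, CANDIDATES)
print([name for name, _ in body], covered_pos, covered_neg)
```

The resulting rule covers only some of the positives, which is acceptable here because the outer loop is responsible for the rest.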

12
FOIL Inner Loop Example

In our example, FOIL would start with the clause
member(X,[Y|Ys]).
Of course, this would have an unacceptable error
rate.
FOIL would try
member(X,[Y|Ys]) :- P.
with many different Ps, one being X = Y
This would have an acceptable error rate, so the
inner loop would terminate

13
FOIL Heuristic

How does FOIL decide what literal to add next?
FOIL uses the information-gain heuristic
(same as C4.5)
The next literal added is that which produces the
highest information gain relative to the data
We will look at this in more detail later

14
FOIL Heuristic

However, the information gain heuristic is biased
by a couple of extra factors
Introducing variables
It's generally useful to introduce new variables
into the developing rule
So FOIL gives a small positive bonus to literals
that introduce new variables
Determinacy
Next slide

15
FOIL and Determinacy

A literal is determinate relative to the literals
already in the developing rule if
For each combination of values of old variables
already in the rule
All new variables have only one corresponding
value in the dataset
Determinate literals are often useful in
generating rules
So FOIL has an "if all else fails" heuristic
If no new literal produces sufficient
information gain
Where "sufficient" is a threshold set by the user
Then FOIL introduces a new determinate literal
To avoid possible infinite loops, FOIL sets a
bound on the depth of determinate literals it can
introduce
Usually this heuristic introduces too many new
literals
However, FOIL has a pruning stage that removes
most of them again
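The determinacy test can be sketched as a check over the facts of a candidate relation. This sketch reads "only one corresponding value" as "exactly one", which is an assumption. The function name, the argument layout, and the `mother`/`parent` facts are hypothetical and chosen only for illustration.

```python
def is_determinate(facts, old_positions, old_bindings):
    """Check that every binding of the old variables extends to exactly one
    binding of the new variables in the relation's facts.

    facts         -- tuples of the candidate relation
    old_positions -- argument positions filled by variables already in the rule
    old_bindings  -- value tuples those old variables take in the data
    """
    extensions = {}
    for fact in facts:
        key = tuple(fact[i] for i in old_positions)
        new_values = tuple(v for i, v in enumerate(fact)
                           if i not in old_positions)
        extensions.setdefault(key, set()).add(new_values)
    return all(len(extensions.get(tuple(b), ())) == 1 for b in old_bindings)


mother_facts = [("ann", "mary"), ("bob", "mary"), ("cat", "sue")]
parent_facts = [("ann", "mary"), ("ann", "tom"), ("bob", "mary")]
children = [("ann",), ("bob",)]

print(is_determinate(mother_facts, (0,), children))
print(is_determinate(parent_facts, (0,), children))
```

A literal such as `mother(Child, M)` introduces `M` without multiplying the number of bindings, which is why determinate literals are cheap to add even when they yield no immediate gain.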

16
Entropy

Denote the positive training instances by $T$, and
the positive instances covered by a particular
clause $C$ as $T \cap C$
Then we can estimate the probability that a
particular instance is covered by $C$ as
$p_C = |T \cap C| / |T|$
(there are some additional complications relating
to the effects of new variables introduced by
clause $C$, but these need not concern us here)
Next, we can define the information entropy of
the clause against the training data as
$\mathrm{Entropy}(C) = -p_C \log_2 p_C$
Entropy is a measure of the minimum number of
bits required to encode the status of a
randomly-drawn member of T
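A small numeric example may help. The sketch below assumes the reading in which the coverage probability is the fraction of positive training instances the clause covers, and the entropy term is minus that probability times its base-2 logarithm. The function names and the example clause are assumptions made for illustration.

```python
import math


def coverage_probability(positives, clause):
    """Fraction of the positive training instances covered by the clause."""
    covered = [e for e in positives if clause(e)]
    return len(covered) / len(positives)


def clause_entropy(p):
    """Entropy term -p * log2(p), taken to be 0 when p is 0."""
    return 0.0 if p == 0 else -p * math.log2(p)


def head_match(ex):
    x, items = ex
    return items[0] == x


positives = [(1, (1,)), (1, (1, 2)), (2, (1, 2)), (3, (1, 2, 3))]
p = coverage_probability(positives, head_match)
print(p, clause_entropy(p))
```

Here the clause covers two of the four positives, so the probability is one half and the entropy term is half a bit.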

17
Information Gain

The change in entropy due to adding a new literal
is known as the information gain
The literal chosen by FOIL will be that which leads
to the greatest information gain
(with the caveats noted before)
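For reference, FOIL's gain is commonly written in terms of the positive and negative bindings covered before and after a literal is added. The sketch below follows that commonly cited form rather than anything stated on these slides. The parameter names `p0`, `n0`, `p1`, `n1` and `t` are assumptions, and `t` defaults to `p1`, which is appropriate only when the literal introduces no new variables.

```python
import math


def foil_gain(p0, n0, p1, n1, t=None):
    """Gain from adding a literal to a rule.

    p0, n0 -- positive / negative bindings covered before adding the literal
    p1, n1 -- positive / negative bindings covered after adding it
    t      -- positive bindings covered before that are still covered after
    """
    if t is None:
        t = p1
    if p0 == 0 or p1 == 0:
        return 0.0  # nothing positive is covered, so there is nothing to gain
    bits_before = -math.log2(p0 / (p0 + n0))
    bits_after = -math.log2(p1 / (p1 + n1))
    return t * (bits_before - bits_after)


# A literal that keeps 2 of 3 positives and excludes all 3 negatives.
print(foil_gain(p0=3, n0=3, p1=2, n1=0))
# A literal that keeps 1 positive but still covers 2 negatives.
print(foil_gain(p0=3, n0=3, p1=1, n1=2))
```

The first literal saves one bit per covered positive and scores 2.0. The second makes the covered set less pure, so its gain is negative.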

18
FOIL Pruning Stage

The information gain of each literal is
calculated relative to the rest of the rule
The literal with the least information gain is
deleted
Until no literal with information gain less than
a predetermined threshold can be found
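The pruning stage can be sketched as a loop that repeatedly drops the least useful literal. In this sketch the gain function is supplied as a callback, and the demonstration uses a simple stand-in (the number of negatives a literal excludes beyond the rest of the rule) rather than information gain. The literal names and the threshold are assumptions made for the example.

```python
def prune_rule(body, gain_of, threshold):
    """Delete the lowest-gain literal until every literal reaches the threshold."""
    body = list(body)
    while body:
        gains = []
        for i, literal in enumerate(body):
            rest = body[:i] + body[i + 1:]
            gains.append((gain_of(literal, rest), i))
        lowest_gain, index = min(gains)
        if lowest_gain >= threshold:
            break
        del body[index]
    return body


NEGATIVES = [(2, (1,)), (3, (1, 2)), (4, (1, 2, 3))]


def x_equals_y(ex):
    return ex[0] == ex[1][0]


def list_nonempty(ex):
    return len(ex[1]) > 0


def negatives_excluded(literal, rest):
    """Stand-in gain: negatives the literal rules out that `rest` lets through."""
    covered_by_rest = [e for e in NEGATIVES if all(l(e) for l in rest)]
    return sum(1 for e in covered_by_rest if not literal(e))


pruned = prune_rule([x_equals_y, list_nonempty], negatives_excluded, threshold=1)
print([l.__name__ for l in pruned])
```

Gains are recomputed after every deletion because removing one literal changes what the remaining literals contribute.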

