
Advanced Artificial Intelligence Lecture 5

Inductive Logic Programming

- Bob McKay
- School of Computer Science and Engineering
- College of Engineering
- Seoul National University

Outline

- Inductive Logic Programming
- FOIL

What is Relational Learning?

- Relational learning systems use a representation language which goes beyond the propositional, permitting learning about relationships between data items
- a definition of a sort or append function
- family relationships
- spatial or geometric relationships
- temporal relationships
- Relational systems
- Inductive Logic Programming
- Genetic Programming
- Recurrent neural networks can learn simple relationships
- It's often possible to turn a known relationship into a propositional learning problem
- E.g. learning time series

Relational Learning - status

- Still largely a research domain rather than practical applications
- Relational learning is difficult
- Computationally expensive
- Data expensive
- Finding good algorithms is difficult
- Hence achievements so far are limited
- An important research domain because
- Relational problems are widespread in practical applications
- There are often no alternative approaches to these problems

Inductive Logic Programming

- Representation Language
- Prolog (i.e. first order predicate rules)
- Some systems slightly extend the representation language
- Learning Algorithms
- Usually deterministic
- Some stochastic (evolutionary) approaches have been tried, but with limited success
- Generally greedy (hill-climbing) search algorithms, often extended with special-purpose heuristics

Some Terminology: A Reminder

- An atom a is a formula of the form
- P(x1, …, xn)
- (in general logic, x1, …, xn may contain function symbols, but almost all ILP systems are limited to the case where there are no function symbols)
- A literal is either an atom or the negation of an atom
- A (predicate calculus or relational) rule is a formula of the form
- L1 ∧ … ∧ Ln → L0
- Where all of the Li are literals
- And in particular, L0 is a positive literal (i.e. an atom)

Learning Example

- For example, a relational learner might be asked to learn the member predicate from examples
- That is, given
- member(1,[1]).
- member(1,[1,2]).
- member(2,[1,2]).
- Etc.
- It should learn
- member(X,[Y|Ys]) :- X = Y.
- member(X,[Y|Ys]) :- member(X,Ys).
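For readers less familiar with Prolog, the two learned clauses correspond to the usual recursive definition of list membership. A minimal Python mirror (the function name and list representation are illustrative):

```python
def member(x, ys):
    """Mirror of the two learned clauses:
    member(X,[Y|Ys]) :- X = Y.          (x matches the head)
    member(X,[Y|Ys]) :- member(X,Ys).   (recurse on the tail)"""
    if not ys:                 # empty list: neither clause applies
        return False
    head, tail = ys[0], ys[1:]
    return x == head or member(x, tail)
```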

Negative Examples

- Many relational learning systems also require negative examples for their learning
- That is, given
- Not member(2,[1]).
- Not member(3,[1,2]).
- Not member(4,[1,2,3]).
- Etc.
- Alternatively, such systems may rely on the Closed World Assumption
- I.e. anything not included in the positive examples is assumed to be a negative example
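Under the Closed World Assumption, negatives need not be supplied at all: every candidate ground atom not listed as positive is treated as negative. A small sketch, where the universe of candidate atoms and the positive set are purely illustrative:

```python
# Positive ground atoms, written as (element, list) pairs
positives = {
    (1, (1,)),
    (1, (1, 2)), (2, (1, 2)),
    (1, (1, 2, 3)), (2, (1, 2, 3)), (3, (1, 2, 3)),
}

# Illustrative universe of candidate ground atoms
elements = [1, 2, 3, 4]
lists = [(1,), (1, 2), (1, 2, 3)]
candidates = {(x, ys) for x in elements for ys in lists}

# Closed World Assumption: anything not stated as positive is negative
negatives = candidates - positives
```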

FOIL (Quinlan)

- FOIL is probably the easiest of the well-known ILP systems to understand
- Outer loop of the algorithm
- Find a rule that covers some of the positive instances (and no negatives)
- Store the rule and remove the covered data from the dataset
- Re-run the algorithm on the reduced dataset

FOIL Example

- In learning the member predicate, FOIL might first learn
- member(X,[Y|Ys]) :- X = Y.
- Then FOIL would be left with the reduced dataset
- member(2,[1,2]).
- member(2,[1,2,3]).
- member(3,[1,2,3]).
- etc.

FOIL Inner Loop

- The inner loop generates individual rules
- A general-to-specific algorithm
- Learning a rule of the form Hyp → Conc
- FOIL begins with Hyp empty
- i.e. with the rule that says the conclusion is always satisfied
- Adds literals one by one to Hyp
- Using a greedy (i.e. non-backtracking hillclimbing) search algorithm
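The inner loop can be sketched as a greedy specialization search. Here the candidate literal pool, the scoring function, and the error-rate function are all assumed stand-ins for FOIL's candidate generation and information-gain machinery:

```python
def foil_inner_loop(candidates, score, error_rate, max_error=0.0):
    """Greedy general-to-specific search: start from an empty body
    (conclusion always satisfied) and repeatedly add the best-scoring
    literal until the rule's error rate is acceptable. No backtracking."""
    body = []
    pool = list(candidates)
    while pool and error_rate(body) > max_error:
        best = max(pool, key=lambda lit: score(body, lit))
        pool.remove(best)
        body.append(best)
    return body
```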

FOIL Inner Loop Example

- In our example, FOIL would start with the clause
- member(X,[Y|Ys]).
- Of course, this would have an unacceptable error rate
- FOIL would try
- member(X,[Y|Ys]) :- P.
- with many different Ps, one being X = Y
- This would have an acceptable error rate, so the inner loop would terminate

FOIL Heuristic

- How does FOIL decide what literal to add next?
- FOIL uses the information-gain heuristic
- (same as C4.5)
- The next literal added is that which produces the highest information gain relative to the data
- We will look at this in more detail later

FOIL Heuristic

- However the information gain heuristic is biased by a couple of extra factors
- Introducing variables
- It's generally useful to introduce new variables into the developing rule
- So FOIL gives a small positive bonus to literals that introduce new variables
- Determinacy
- Next slide

FOIL and Determinacy

- A literal is determinate relative to the literals already in the developing rule if
- For each combination of values of old variables already in the rule
- All new variables have only one corresponding value in the dataset
- Determinate literals are often useful in generating rules
- So FOIL has an "if all else fails" heuristic
- If no new literal produces "sufficient" information gain
- Where "sufficient" is a threshold set by the user
- Then FOIL introduces a new determinate literal
- To avoid possible infinite loops, FOIL sets a bound on the depth of determinate literals it can introduce
- Usually this heuristic introduces too many new literals
- However FOIL has a pruning stage that removes most of them again
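The determinacy condition above can be checked directly: group the variable bindings that satisfy the candidate literal by the values of the old variables, and require at most one binding of the new variables per group. A sketch, with bindings represented as dicts (an illustrative encoding, not FOIL's internal one):

```python
def is_determinate(bindings, old_vars, new_vars):
    """bindings: one dict of variable values per dataset tuple satisfying
    the candidate literal. Determinate iff each combination of old-variable
    values pins down exactly one combination of new-variable values."""
    seen = {}
    for b in bindings:
        old = tuple(b[v] for v in old_vars)
        new = tuple(b[v] for v in new_vars)
        if seen.setdefault(old, new) != new:
            return False   # a second new-variable value for the same old values
    return True
```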

Entropy

- Denote the positive training instances by T, and the positive instances covered by a particular clause C by T ∩ C
- Then we can estimate the probability that a particular instance is covered by C as pC = |T ∩ C| / |T|
- (there are some additional complications relating to the effects of new variables introduced by clause C, but these need not concern us here)
- Next, we can define the information entropy of the clause against the training data as
- Entropy(C) = -pC log2 pC
- Entropy is a measure of the minimum number of bits required to encode the status of a randomly-drawn member of T

Information Gain

- The change in entropy due to adding a new literal is known as the information gain
- The literal chosen by FOIL will be that which leads to the greatest information gain
- (with the caveats noted before)

FOIL Pruning Stage

- The information gain of each literal is calculated relative to the rest of the rule
- The literal with the least information gain is deleted
- This repeats until no literal with information gain less than a predetermined threshold can be found
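The pruning stage can be sketched as a simple loop, where gain_of is an assumed stand-in for recomputing a literal's information gain relative to the rest of the rule:

```python
def prune(body, gain_of, threshold):
    """Repeatedly delete the literal whose information gain, computed
    relative to the rest of the rule, is smallest, while that gain
    falls below the threshold."""
    body = list(body)
    while body:
        rest = lambda lit: [l for l in body if l != lit]
        worst = min(body, key=lambda lit: gain_of(lit, rest(lit)))
        if gain_of(worst, rest(worst)) >= threshold:
            break                # every remaining literal earns its keep
        body.remove(worst)
    return body
```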
