
1
Learning First-Order Probabilistic Models with
Combining Rules
  • Sriraam Natarajan
  • Prasad Tadepalli
  • Eric Altendorf
  • Thomas G. Dietterich
  • Alan Fern
  • Angelo Restificar
  • School of EECS
  • Oregon State University

2
First-order Probabilistic Models
  • Combine the expressiveness of first-order logic
    with the uncertainty modeling of graphical models
  • Several formalisms already exist
  • Probabilistic Relational Models (PRMs)
  • Bayesian Logic Programs (BLPs)
  • Stochastic Logic Programs (SLPs)
  • Relational Bayesian Networks (RBNs)
  • Probabilistic Logic Programs (PLPs)
  • Parameter sharing and quantification allow
    compact representation
  • The project's difficulty and the project team's
    competence influence the project's success.

3
First-order Probabilistic Models
(Same bullets as the previous slide, shown with an example Bayesian network for the project-success statement.)
4
Multiple Parents Problem
  • Often multiple objects are related to an object
    by the same relationship
  • One's friends' drinking habits influence one's own
  • A student's GPA depends on the grades in the
    courses he or she takes
  • The size of a mosquito population depends on the
    temperature and the rainfall each day since the
    last freeze
  • The target variable in each of these statements
    has multiple influents (parents in Bayes net jargon)

5
Multiple Parents for population
  • Variable number of parents
  • Large number of parents
  • Need for compact parameterization

6
Solution 1: Aggregators
(Figure: Rain1, Temp1, Rain2, Temp2, Rain3, Temp3 are deterministically aggregated into AverageRain and AverageTemp, which are the stochastic parents of Population.)
Problem: this does not take into account the interaction between the related parents Rain and Temp.
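A hedged sketch of what the aggregator construction computes (notation mine): the deterministic layer averages the parents of each type, and only the averages feed the stochastic CPT,

    \overline{R} = \frac{1}{3}\sum_{i=1}^{3} Rain_i, \qquad \overline{T} = \frac{1}{3}\sum_{i=1}^{3} Temp_i, \qquad Population \sim P(\cdot \mid \overline{R}, \overline{T}),

so each day's rainfall is decoupled from that same day's temperature before they reach Population, which is exactly the lost interaction noted above.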
7
Solution 2: Combining Rules
(Figure: each pair (Rain_i, Temp_i) produces its own distribution Population_i; Population_1, Population_2, Population_3 are combined into a single distribution for Population.)
  • The top 3 distributions share parameters
  • The 3 distributions are combined into one final
    distribution (sketched below)
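A hedged sketch of the combining-rule alternative, assuming the Mean rule used on the later slides (notation mine):

    P(Population = y) = \frac{1}{3} \sum_{i=1}^{3} P_\theta\bigl(y \mid Rain_i, Temp_i\bigr),

where the shared parameters \theta define one conditional distribution applied to every (Rain_i, Temp_i) pair, so the within-day interaction of rain and temperature is kept inside each term before the distributions are combined.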

8
Outline
  • First-order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

9
  • First Order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

10
First-order Conditional Influence Language (FOCIL)
  • Task and role of a document influence its folder:
  • if task(t), doc(d), role(d,r,t) then
    r.id, t.id Qinf d.folder.
  • The folder of the source of the document
    influences the folder of the document:
  • if doc(d1), doc(d2), source(d1,d2) then
    d1.folder Qinf d2.folder.
  • The difficulty of the course and the intelligence
    of the student influence his/her GPA:
  • if student(s), course(c), takes(s,c) then
    s.IQ, c.difficulty Qinf s.gpa.

11
Relationship to Other Formalisms
  • Shares many of the same properties as other
    statistical relational models.
  • Generalizes path expressions in probabilistic
    relational models to arbitrary conjunctions of
    literals.
  • Unlike BLPs, explicitly distinguishes between
    conditions, which do not allow uncertainty, and
    influents, which do.
  • Monotonicity relationships can be specified:
  • if person(p) then p.age Q+ p.height
    (a positive qualitative influence)

12
Combining Multiple Instances of a Single Statement
If task(t), doc(d), role(d,r,t) then
t.id, r.id Qinf (Mean) d.folder
(Figure: the statement unrolled for two task-role pairs; each instance (t_i.id, r_i.id) produces its own distribution over d.folder, and the Mean node combines them into a single distribution for d.folder.)
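A hedged reading of what the unrolled network computes (notation mine): if a document d matches the statement with m task-role pairs, then

    P(d.folder = y) = \frac{1}{m} \sum_{j=1}^{m} P_\theta\bigl(y \mid t_j.id,\, r_j.id\bigr),

where the same conditional distribution P_\theta is used for every instance, reflecting the parameter sharing across instances.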
13
A Different FOCIL Statement for the Same Target
Variable
If doc(s), doc(d), source(s,d) then
s.folder Qinf (Mean) d.folder
(Figure: the statement unrolled for two source documents; each s_i.folder produces its own distribution over d.folder, and the Mean node combines them.)
14
Combining Multiple Statements
  • Weighted Mean of the two statements (combination
    sketched below)
  • If task(t), doc(d), role(d,r,t) then
  • t.id, r.id Qinf (Mean)
    d.folder
  • If doc(s), doc(d), source(s,d) then
  • s.folder Qinf (Mean) d.folder
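A hedged sketch of the weighted-mean combination of the two statements (the weights w_1, w_2 and the per-statement distributions P_1, P_2 are my notation; each P_r is itself the Mean over that statement's instances, as above):

    P(d.folder = y) = w_1\, P_1(y \mid \text{task-role pairs}) + w_2\, P_2(y \mid \text{source folders}), \qquad w_1 + w_2 = 1.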

15
Unrolled Network for Folder Prediction
(Figure: both statements unrolled together; the task-role instances (t_1.id, r_1.id), (t_2.id, r_2.id) and the source folders s_1.folder, s_2.folder each produce a distribution over d.folder, a Mean node combines the instances within each statement, and the Weighted Mean combines the two results into the final distribution for d.folder.)
16
  • First Order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

17
General Unrolled Network
(Figure: the general case with two rules; rule r has m_r instances, each with k inputs X_r^{j,1}, ..., X_r^{j,k}; every instance produces a distribution over Y, a Mean node combines the instances within each rule, and a Weighted mean combines the two rule-level distributions into Y.)
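A hedged reconstruction of the prediction this network defines (indices follow the figure labels as far as they survived; the exact notation is mine): with rules r = 1, 2, where rule r has m_r instances of k inputs each,

    P(Y = y \mid \mathbf{X}) = \sum_{r=1}^{2} w_r \cdot \frac{1}{m_r} \sum_{j=1}^{m_r} P_r\bigl(y \mid X_r^{j,1}, \ldots, X_r^{j,k}\bigr), \qquad w_1 + w_2 = 1,

where each P_r is the conditional distribution shared by all instances of rule r and the w_r are the weighted-mean weights to be learned.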
18
Gradient Descent for Squared Error
  • Squared error (the equation image is not preserved;
    a hedged reconstruction follows)
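Since the equation image did not survive, here is a hedged reconstruction of a squared-error objective consistent with the weighted-mean model above (notation mine):

    E = \frac{1}{2} \sum_{e} \sum_{y} \bigl( I(y = y_e) - P(Y = y \mid \mathbf{X}_e) \bigr)^2,

where e ranges over training examples, y_e is the observed target value, and P(Y = y \mid \mathbf{X}_e) is the weighted-mean prediction defined above; gradient descent then updates the shared CPT entries and the weights w_r by the chain rule.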
19
Gradient Descent for Loglikelihood
  • Loglikelihood (the equation image is not preserved;
    a hedged reconstruction follows)
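Again as a hedged reconstruction (notation mine), the log-likelihood objective for the same model would be

    LL = \sum_{e} \log P\bigl(Y = y_e \mid \mathbf{X}_e\bigr),

maximized by gradient ascent over the shared CPT entries and the weights w_r.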
20
Learning the weights
  • Mean Squared Error
  • Loglikelihood
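The weight-update formulas are not preserved in the transcript. Under the weighted-mean model sketched earlier, the partial derivatives with respect to a weight w_r take the following form (a derivation I am supplying, not copied from the slides), writing P_r(\cdot \mid \mathbf{X}_e) for rule r's Mean-combined distribution:

    \frac{\partial E}{\partial w_r} = -\sum_{e} \sum_{y} \bigl( I(y = y_e) - P(y \mid \mathbf{X}_e) \bigr)\, P_r(y \mid \mathbf{X}_e),
    \qquad
    \frac{\partial LL}{\partial w_r} = \sum_{e} \frac{P_r(y_e \mid \mathbf{X}_e)}{P(y_e \mid \mathbf{X}_e)},

with the constraint \sum_r w_r = 1 maintained, for example by renormalizing after each step.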

21
Expectation-Maximization
(Figure: the general unrolled network annotated for EM; each instance of rule r feeds its rule's Mean node with weight 1/m_r, and the two rule-level distributions are combined by the Weighted mean with weights w_1 and w_2 to produce Y.)
22
EM learning
  • Expectation step: compute the responsibilities of
    each instance of each rule
  • Maximization step: compute the maximum-likelihood
    parameters, using the responsibilities as counts,
    where n is the number of examples with 2 or more
    rules instantiated
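To make the two steps concrete, here is a minimal Python sketch of the weight update only, under the weighted-mean model sketched earlier. It assumes each training example supplies, for every rule it instantiates, that rule's Mean-combined probability of the true label, and it uses responsibilities as counts exactly as described above; the full M-step (re-estimating the CPTs from instance-level responsibilities) is omitted, and all names are illustrative rather than taken from the paper.

    def em_weight_update(examples, weights):
        # examples: list of dicts mapping rule index -> P_r(y_true | example),
        #           one entry per rule instantiated by that example
        # weights:  dict mapping rule index -> current weight (sums to 1)
        # assumes at least one example instantiates 2 or more rules
        resp_sums = {r: 0.0 for r in weights}
        n = 0  # number of examples with 2 or more rules instantiated
        for probs in examples:
            if len(probs) < 2:
                continue  # single-rule examples do not inform the weights
            n += 1
            z = sum(weights[r] * probs[r] for r in probs)
            for r in probs:
                # E-step: responsibility of rule r for this example
                resp_sums[r] += weights[r] * probs[r] / z
        # M-step: responsibilities act as counts, normalized by n
        return {r: resp_sums[r] / n for r in weights}

    # Illustrative usage with two rules and made-up probabilities:
    examples = [{0: 0.6, 1: 0.9}, {0: 0.2, 1: 0.7}, {1: 0.8}]
    weights = {0: 0.5, 1: 0.5}
    print(em_weight_update(examples, weights))  # weight shifts toward rule 1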
23
  • First Order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

24
Experimental Setup
Weighted Mean:
  If task(t), doc(d), role(d,r,t) then
    t.id, r.id Qinf (Mean) d.folder.
  If doc(s), doc(d), source(s,d) then
    s.folder Qinf (Mean) d.folder.
  • 500 documents, 6 tasks, 2 roles, 11 folders
  • Each document typically has 1-2 task-role pairs
  • 25% of documents have a source folder
  • 10-fold cross validation

25
Folder prediction task
  • Mean reciprocal rank, where n_i is the number of
    times the true folder was ranked as i (formula image
    not preserved; a standard reconstruction follows
    this list)
  • Propositional classifiers
  • Decision trees and Naïve Bayes
  • Features are the number of occurrences of each
    task-role pair and source document folder
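Under the standard definition, and using the counts n_i described above, the mean reciprocal rank over N test documents would be

    MRR = \frac{1}{N} \sum_{i} \frac{n_i}{i}, \qquad N = \sum_{i} n_i,

so a document whose true folder is ranked first contributes 1, one ranked second contributes 1/2, and so on; this is consistent with the scores reported on the next slide.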

26
Number of documents whose true folder received each rank, by method:
Rank EM GD-MS GD-LL J48 NB
1 349 354 346 351 326
2 107 98 113 100 110
3 22 26 18 28 34
4 15 12 15 6 19
5 6 4 4 6 4
6 0 0 3 0 0
7 1 4 1 2 0
8 0 2 0 0 1
9 0 0 0 6 1
10 0 0 0 0 0
11 0 0 0 0 5
MRR 0.8299 0.8325 0.8274 0.8279 0.797
27
Learning the weights
  • Original dataset: the 2nd rule has more weight ⇒ it
    is more predictive when both rules are applicable
  • Modified dataset (the folder names of all the
    sources were randomized): the 2nd rule is made
    ineffective ⇒ the weight of the 2nd rule decreases

                            EM          GD-MS        GD-LL
Original data set: Weights  ⟨.15, .85⟩  ⟨.22, .78⟩   ⟨.05, .95⟩
Original data set: Score    .8299       .8325        .8274
Modified data set: Weights  ⟨.9, .1⟩    ⟨.84, .16⟩   ⟨1, 0⟩
Modified data set: Score    .7934       .8021        .7939
28
Lessons from Real-world Data
  • The propositional learners are almost as good as
    the first-order learners in this domain!
  • The number of parents is 1-2 in this domain
  • About ¾ of the time only one rule is applicable
  • Ranking of probabilities is easy in this case
  • Accurate modeling of the probabilities is needed for
  • Making predictions that combine with other
    predictions
  • Cost-sensitive decision making

29
Synthetic Data Set
  • 2 rules with 2 inputs each; W_rule1 = 0.1, W_rule2 = 0.9
  • Probability that an example matches a rule = 0.5
  • If an example matches a rule, the number of
    instances is 3 - 10
  • Performance metric: average absolute error in
    predicted probability

30
Synthetic Data Set: Results
31
Synthetic Data Set: GD-MS
32
Synthetic Data Set: GD-LL
33
Synthetic Data Set: EM
34
Conclusions
  • Introduced a general instance of the multiple parents
    problem in first-order probabilistic languages
  • Gradient descent and EM successfully learn the
    parameters of the conditional distributions as
    well as the parameters of the combining rules
    (weights)
  • First-order methods significantly outperform
    propositional methods in modeling the distributions
    when the number of parents is 3 or more

35
Future Work
  • We plan to extend these results to more general
    classes of combining rules
  • Develop efficient inference algorithms with
    combining rules
  • Develop compelling applications
  • Combining rules and aggregators
  • Can they both be understood as instances of
    causal independence?