Introduction to PLL - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Overview
  • Introduction to PLL
  • Foundations of PLL
  • Logic Programming, Bayesian Networks, Hidden
    Markov Models, Stochastic Grammars
  • Frameworks of PLL
  • Independent Choice Logic, Stochastic Logic
    Programs, PRISM,
  • Bayesian Logic Programs, Probabilistic Logic
    Programs, Probabilistic Relational Models
  • Logical Hidden Markov Models
  • Applications

2
Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
  • Atoms ↔ sets of similar RVs
  • First arguments ↔ RV
  • Last argument ↔ state
  • Clause ↔ CPD entry

P(a | e, b) = 0.9
  • Probability distribution over Herbrand
    interpretations

0.1 burglary(true).    0.9 burglary(false).
0.01 earthquake(true). 0.99 earthquake(false).
0.9 alarm(true) :- burglary(true), earthquake(true).
...
burglary(true) and burglary(false) true in the
same interpretation?
false :- burglary(true), burglary(false).
burglary(true) ∨ burglary(false) :- true.
false :- earthquake(true), earthquake(false).
...
Integrity constraints
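As a concrete illustration (not part of the original slides), the joint distribution such a program defines can be computed by enumerating the states of the random variables. The priors are taken from the clauses above; the alarm CPD rows other than the single 0.9 entry shown are hypothetical placeholders:

```python
from itertools import product

# Rough sketch: the program defines a joint distribution over Herbrand
# interpretations.  Priors come from the burglary/earthquake clauses
# above; alarm CPD rows other than the 0.9 entry are hypothetical.
P_B, P_E = 0.1, 0.01
ALARM_CPD = {(True, True): 0.9,     # the entry shown on the slide
             (True, False): 0.8,    # hypothetical
             (False, True): 0.3,    # hypothetical
             (False, False): 0.01}  # hypothetical

p_alarm = 0.0
for b, e in product([True, False], repeat=2):
    p_be = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p_alarm += p_be * ALARM_CPD[(b, e)]

print(round(p_alarm, 5))  # marginal P(alarm = true)
```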
3
Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
father(rex,fred). mother(ann,fred).
father(brian,doro). mother(utta, doro).
father(fred,henry). mother(doro,henry).
Qualitative Part / Quantitative Part
1.0 mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
0.0 mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
...
0.5 pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
0.5 pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
...
1.0 bt(P,a) :- mc(P,a), pc(P,a).
Variable Binding
false :- pc(P,a), pc(P,b), pc(P,0).
pc(P,a) ∨ pc(P,b) ∨ pc(P,0) :- person(P).
...
4
Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
father(rex,fred). mother(ann,fred).
father(brian,doro). mother(utta, doro).
father(fred,henry). mother(doro,henry).
1.0 mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
0.0 mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
...
0.5 pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
0.5 pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
...
1.0 bt(P,a) :- mc(P,a), pc(P,a).
[Figure: Bayesian network induced over the ground atoms mc/2, pc/2,
and bt/2 for rex, ann, brian, utta, fred, doro, and henry]
false :- pc(P,a), pc(P,b), pc(P,0).
pc(P,a) ∨ pc(P,b) ∨ pc(P,0) :- person(P). ...
5
Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
  • Unique probability distribution over Herbrand
    interpretations
  • finite branching factor, finite proofs, no
    self-dependency
  • Atoms ↔ states
  • Integrity constraints encode mutually exclusive
    states
  • BN used to do inference
  • Functors / Turing-complete programming language
  • BNs, HMMs, DBNs, SCFGs, ...
  • No learning

6
Probabilistic Relational Models (PRMs)
Getoor,Koller, Pfeffer
  • Database theory
  • Entity-Relationship Models
  • Attributes ↔ RVs

[Figure: alarm-system database schema with tables Earthquake,
Burglary, Alarm, MaryCalls, JohnCalls; callouts Database, Table,
Attribute]
7
Probabilistic Relational Models (PRMs)
Getoor,Koller, Pfeffer
[Figure: PRM class diagram with two parent Person classes, (Father)
and (Mother), each with attributes M-chromosome, P-chromosome,
Bloodtype, linked to a child Person class with the same attributes]
8
Probabilistic Relational Models (PRMs)
Getoor,Koller, Pfeffer
father(Father,Person).
mother(Mother,Person).
pc(Person,PC).
mc(Person,MC).
bt(Person,BT).

[Figure: the same PRM class diagram, annotated with the atoms above]

Dependencies (CPDs associated with)
bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
pc(Person,PC) :- pc_father(Person,PCf),
mc_father(Person,MCf).
View
pc_father(Person,PCf) :- father(Father,Person),
pc(Father,PC). ...
9
Probabilistic Relational Models (PRMs)
Getoor,Koller, Pfeffer
father(rex,fred). mother(ann,fred).
father(brian,doro). mother(utta, doro).
father(fred,henry). mother(doro,henry).
pc_father(Person,PCf) :- father(Father,Person),
pc(Father,PC). ...
mc(Person,MC) :- pc_mother(Person,PCm),
mc_mother(Person,MCm).
pc(Person,PC) :- pc_father(Person,PCf),
mc_father(Person,MCf).
bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
[Figure callouts: State, RV]
10
Probabilistic Relational Models (PRMs)
Getoor,Koller, Pfeffer
  • Database View
  • Unique Probability Distribution over finite
    Herbrand interpretations
  • No self-dependency
  • Discrete and continuous RV
  • BN used to do inference
  • Highlight Graphical Representation
  • Focus on class level
  • BNs
  • Learning

11
Bayesian Logic Programs (BLPs)
Kersting, De Raedt

Rule Graph
earthquake/0
burglary/0
alarm/0
maryCalls/0
johnCalls/0
alarm | earthquake, burglary.
12
Bayesian Logic Programs (BLPs)
Kersting, De Raedt
Rule Graph
pc/1
mc/1
bt/1
variable
bt(Person) | pc(Person), mc(Person).
13
Bayesian Logic Programs (BLPs)
Kersting, De Raedt
pc/1
mc/1
bt/1
mc(Person) | mother(Mother,Person),
pc(Mother), mc(Mother).
pc(Person) | father(Father,Person),
pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).
14
Bayesian Logic Programs (BLPs)
Kersting, De Raedt
father(rex,fred). mother(ann,fred).
father(brian,doro). mother(utta, doro).
father(fred,henry). mother(doro,henry).
mc(Person) | mother(Mother,Person),
pc(Mother), mc(Mother).
pc(Person) | father(Father,Person),
pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).
Bayesian Network induced over least Herbrand model
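A rough sketch of this grounding step (my own illustration, not code from the slides): every ground atom in the least Herbrand model becomes a node, and the body atoms of each ground clause instance become its parents.

```python
# Facts from the slide, encoded child -> parent for convenience.
father = {"fred": "rex", "doro": "brian", "henry": "fred"}
mother = {"fred": "ann", "doro": "utta", "henry": "doro"}

persons = set(father) | set(mother) | set(father.values()) | set(mother.values())
parents = {}  # BN node -> list of parent nodes in the induced network
for p in persons:
    # mc(P) | mother(M,P), pc(M), mc(M).
    parents[f"mc({p})"] = [f"pc({mother[p]})", f"mc({mother[p]})"] if p in mother else []
    # pc(P) | father(F,P), pc(F), mc(F).
    parents[f"pc({p})"] = [f"pc({father[p]})", f"mc({father[p]})"] if p in father else []
    # bt(P) | pc(P), mc(P).
    parents[f"bt({p})"] = [f"pc({p})", f"mc({p})"]

print(parents["mc(henry)"])  # parents come from henry's mother doro
```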
15
Bayesian Logic Programs (BLPs)
Kersting, De Raedt
  • Unique probability distribution over Herbrand
    interpretations
  • Finite branching factor, finite proofs, no
    self-dependency
  • Highlight
  • Separation of qualitative and quantitative parts
  • Functors
  • Graphical Representation
  • Discrete and continuous RV
  • BNs, DBNs, HMMs, SCFGs, Prolog ...
  • Turing-complete programming language
  • Learning

16
Declarative Semantics
  • Dependency Graph
  • (possibly infinite) Bayesian network

consequence operator
If the body of C holds then the head holds, too.
mc(fred) is true because mother(ann,fred),
mc(ann), pc(ann) are true
17
Procedural Semantics
P(bt(ann)) ?
18
Procedural Semantics
Bayes rule
P(bt(ann) | bt(fred)) = P(bt(ann), bt(fred)) / P(bt(fred))
P(bt(ann), bt(fred)) ?
[Figure: induced Bayesian network over mc, pc, and bt for rex, ann,
brian, utta, fred, doro, and henry]
19
Queries using And/Or trees
P(bt(fred)) ?
bt(fred)
An Or node is proven if at least one of its
successors is provable. An And node is proven if
all of its successors are provable.
[Figure: And/Or tree for the query bt(fred): an And node over
pc(fred) and mc(fred); Or branches proved via the clause bodies
father(rex,fred), mc(rex), pc(rex) and mother(ann,fred), mc(ann),
pc(ann)]
...
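The proof rule just stated is easy to make executable. A minimal sketch (my own encoding, not from the slides), with a tree for bt(fred) whose two subgoals are assumed provable:

```python
# An And/Or tree node is (kind, payload): "leaf" carries a boolean,
# "or" is proven if at least one child is proven, "and" if all are.
def proven(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "or":
        return any(proven(c) for c in payload)
    return all(proven(c) for c in payload)  # "and" node

# bt(fred) needs pc(fred) AND mc(fred); each is an Or over its clauses.
tree = ("and", [("or", [("leaf", True)]),    # pc(fred) provable
                ("or", [("leaf", True)])])   # mc(fred) provable
print(proven(tree))
```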
20
Combining Partial Knowledge
...
discusses/2
read/1
prepared/2
passes/1
21
Combining Partial Knowledge
[Figure: dependency graph in which discusses(Book,Topic) and
read(Student,Book) are parents of prepared(Student,Topic)]
prepared(Student,Topic) | read(Student,Book),
discusses(Book,Topic).
  • variable number of parents for prepared/2 due to
    read/2
  • whether a student prepared a topic depends on the
    books she read
  • CPD only for one book-topic pair

22
Combining Rules
[Figure: a combining rule CR maps the CPDs P(A|B) and P(A|C), arising
from discusses(Book,Topic) and read(Student,Book), to a single CPD
P(A|B,C) for prepared(Student,Topic)]
prepared(Student,Topic) | read(Student,Book),
discusses(Book,Topic).
  • Any algorithm which
  • has an empty output if and only if the input is
    empty
  • combines a set of CPDs into a single (combined)
    CPD
  • E.g. noisy-or, regression, ...
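
Noisy-or, named above as the classic combining rule, combines the CPDs of multiple ground clause instances by assuming each cause fails independently. A minimal sketch:

```python
from functools import reduce

def noisy_or(probs):
    # A combining rule must have an empty output iff the input is empty.
    if not probs:
        return None
    # Effect is absent only if every cause independently fails.
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

print(noisy_or([0.8, 0.6]))  # 1 - 0.2 * 0.4 = 0.92
```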

23
Aggregates
  • Map multisets of values to summary values (e.g.,
    sum, average, max, cardinality)

24
Aggregates
  • Map multisets of values to summary values (e.g.,
    sum, average, max, cardinality)

[Figure: grade_avg/1, a deterministic aggregate node]
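Since an aggregate maps a multiset of parent values to one summary value, the aggregated node is deterministic given its parents. A small sketch (the grade multiset is hypothetical):

```python
# Aggregates named on the slide: sum, average, max, cardinality.
def aggregate(values, how):
    table = {"sum": sum,
             "average": lambda v: sum(v) / len(v),
             "max": max,
             "cardinality": len}
    return table[how](values)

grades = [1.0, 2.0, 3.0]  # hypothetical multiset of a student's grades
print(aggregate(grades, "average"))
print(aggregate(grades, "cardinality"))
```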
25
Summary Model-Theoretic
Underlying logic program
If the body holds then the head holds, too.
Consequence operator

Conditional independencies encoded in the
induced BN structure

Local probability models

(macro) CPDs
noisy-or, ...
CRs

Joint probability distribution over the least
Herbrand interpretation

26
Stochastic Relational Models (SRMs)
  • Type I, i.e., frequencies in databases
  • Probability that a select-join query succeeds
  • Independently sample tuples r_i from R_i; select as
    values for A_i the values r_i.A_i

27
Stochastic Relational Models (SRMs)
WHO Mortality Database
country.name
death.cause
pers.sex
pers.jCountry
pers.dyear
pers.jDeath
pers.dage
query(pers.dage = 75-79y, death.cause = k) = 0.012
query(pers.dage = 85-89y, death.cause = k) = 0.0012
query(pers.dage = 75-79y, death.cause = r) = 0.02
query(pers.dage = 85-89y, death.cause = r) = 0.114

query(pers.dage = 1-4y)   = 0.00201
query(pers.dage = 25-29y) = 7.1 × 10^-5
query(pers.dage = 75-79y) = 0.12
query(pers.dage = 85-89y) = 0.176
28
Learning Tasks
Learning Algorithm
Database
Model
  • Parameter Estimation
  • Numerical Optimization Problem
  • Model Selection
  • Combinatorial Search

29
Differences between SL and PLL ?
  • Representation (cf. above)
  • Structure on the search space becomes more
    complex
  • operators for traversing the space
  • Algorithms remain essentially the same

30
What is the data about? Model Theoretic
E
Earthquake
Burglary
Alarm
JohnCalls
MaryCalls
Model(1): earthquake=yes, burglary=no, alarm=?,
marycalls=yes, johncalls=no
Model(2): earthquake=no, burglary=no, alarm=no,
marycalls=no, johncalls=no
Model(3): earthquake=?, burglary=?, alarm=yes,
marycalls=yes, johncalls=yes
31
What is the data about? Model Theoretic
  • Data case
  • Random variables and their states ↔ (partial)
    Herbrand interpretation
  • Akin to learning from interpretations in ILP

Background: m(ann,dorothy), f(brian,dorothy),
m(cecily,fred), f(henry,fred), f(fred,bob),
m(kim,bob), ...
Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?,
bt(dorothy)=a
Model(2): bt(cecily)=ab, pc(henry)=a, mc(fred)=?,
bt(kim)=a, pc(bob)=b
Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?
Bloodtype example
32
Parameter Estimation Model Theoretic
Database D
Learning Algorithm

Parameters θ
Underlying Logic program L
33
Parameter Estimation Model Theoretic
  • Estimate the CPD entries θ that best fit the data
  • Best fit: ML parameters θ*
  • θ* = argmax_θ P(data | logic program, θ)
  •    = argmax_θ log P(data | logic program, θ)
  • Reduces to the problem of estimating the
    parameters of a Bayesian network
  • given structure,
  • partially observed random variables

34
Parameter Estimation Model Theoretic

35
Excursus: Decomposable CRs
  • Parameters of the clauses and not of the support
    network.

Multiple ground instances of the same clause
Deterministic CPD for Combining Rule
36
Parameter Estimation Model Theoretic

37
Parameter Estimation Model Theoretic

Parameter tying
38
EM Model Theoretic
EM algorithm: iterate until convergence
Logic Program L
Expectation
Initial parameters θ0
Current model (M, θk)
Expected counts of a clause
Maximization
Update parameters (ML, MAP)
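The loop above can be sketched for the simplest case, a single Bernoulli clause parameter tied across many ground instances, some of which are unobserved (the data below is made up for illustration):

```python
# EM for one tied parameter theta.  E-step: fill in the expected count
# of "true" instances using the current theta.  M-step: ML update of
# the shared parameter from those expected counts.
data = [1, 1, 0, None, 1, None, 0, 1]  # None = unobserved instance

theta = 0.5                      # initial parameters (theta_0)
for _ in range(50):              # iterate until convergence
    # Expectation: expected count of "true" over all ground instances
    expected_true = sum(theta if x is None else x for x in data)
    # Maximization: ML update of the tied parameter
    theta = expected_true / len(data)

print(round(theta, 4))  # converges to the fixed point 2/3
```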
39
Model Selection Model Theoretic
Database
Learning Algorithm

Language (Bayesian): bt/1, pc/1, mc/1
Background Knowledge (logical): mother/2, father/2
40
Model Selection Model Theoretic
  • Combination of ILP and BN learning
  • Combinatorial search for a hypothesis M s.t.
  • M logically covers the data D
  • M is optimal w.r.t. some scoring function score,
    i.e., M* = argmax_M score(M,D).
  • Highlights
  • Refinement operators
  • Background knowledge
  • Language bias
  • Search bias

41
Refinement Operators
  • Add a fact, delete a fact or refine an existing
    clause
  • Specialization
  • Add atom
  • apply a substitution X / Y where X,Y already
    appear in atom
  • apply a substitution X / f(Y1, ..., Yn) where the
    Yi are new variables
  • apply a substitution X / c where c is a
    constant
  • Generalization
  • delete atom
  • turn term into variable
  • p(a,f(b)) becomes p(X,f(b)) or p(a,f(X))
  • p(a,a) becomes p(X,X) or p(a,X) or p(X,a)
  • replace two occurrences of a variable X by X1 and
    X2
  • p(X,X) becomes p(X1,X2)
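
Two of these operators can be sketched on a tiny term encoding (my own representation, not from the slides): a term is a tuple (functor, args...), and variables are capitalized strings.

```python
# Specialization: apply a substitution such as {"X": "a"}.
def substitute(term, subst):
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, subst) for a in term[1:])
    return subst.get(term, term)

# Generalization: replace successive occurrences of var by fresh
# variables, e.g. p(X,X) becomes p(X1,X2).
def split_variable(term, var, fresh_names):
    fresh = iter(fresh_names)
    def walk(t):
        if isinstance(t, tuple):
            return (t[0],) + tuple(walk(a) for a in t[1:])
        return next(fresh) if t == var else t
    return walk(term)

print(substitute(("p", "X", "X"), {"X": "a"}))             # p(a,a)
print(split_variable(("p", "X", "X"), "X", ["X1", "X2"]))  # p(X1,X2)
```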

42
Example
[Figures, slides 42-48: successive refinement steps of the candidate
model, each shown as a ground network over mc, pc, and bc for ann,
eric, and john, given the facts m(ann,john) and f(eric,john)]
49
Bias
  • Many clauses can be eliminated a priori
  • Due to type structure of clauses
  • e.g. atom(compound,atom, charge),
  • bond(compound,atom,atom,bondtype)
  • active(compound)
  • eliminate e.g.
  • active(C) :- atom(X,C,5)
  • does not conform to the type structure

50
Bias - continued
  • or to modes of predicates, which determine the
    calling pattern in queries
  • + input, - output
  • mode(atom(+,-,-))
  • mode(bond(+,+,-,-))
  • all variables in the head are + (input)
  • active(C) :- bond(C,A1,A2,T) is not mode conform
  • because A1 does not occur earlier in the clause
    although its argument is declared +
  • active(C) :- atom(C,A,P), bond(C,A,A2,double) is
    mode conform.
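This check is mechanical, so here is a minimal sketch of it (my own encoding; the mode declarations restate the ones above):

```python
# "+" arguments of a body literal must already be bound by the head
# or by an earlier body literal; every literal binds its own arguments.
modes = {"atom": ["+", "-", "-"], "bond": ["+", "+", "-", "-"]}

def mode_conform(head_vars, body):
    bound = set(head_vars)
    for pred, args in body:
        for mode, arg in zip(modes[pred], args):
            if mode == "+" and arg not in bound:
                return False
        bound.update(args)
    return True

# active(C) :- bond(C,A1,A2,T): A1 is declared "+" but never bound.
print(mode_conform(["C"], [("bond", ["C", "A1", "A2", "T"])]))
# active(C) :- atom(C,A,P), bond(C,A,A2,double): A is bound by atom/3.
print(mode_conform(["C"], [("atom", ["C", "A", "P"]),
                           ("bond", ["C", "A", "A2", "double"])]))
```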

51
Conclusions on Learning
  • Algorithms remain essentially the same
  • Not single edges but bunches of edges are
    modified
  • Structure on the search space becomes more
    complex

From Statistical Learning: scores, independency, priors
From Inductive Logic Programming / Multi-relational
Data Mining: refinement operators, bias, background
knowledge
52
Overview
  • Introduction to PLL
  • Foundations of PLL
  • Logic Programming, Bayesian Networks, Hidden
    Markov Models, Stochastic Grammars
  • Frameworks of PLL
  • Independent Choice Logic, Stochastic Logic
    Programs, PRISM,
  • Bayesian Logic Programs, Probabilistic Logic
    Programs, Probabilistic Relational Models
  • Logical Hidden Markov Models
  • Applications

53
Logical (Hidden) Markov Models
Each state is trained independently. No sharing of
experience; large state space.
54
Logical (Hidden) Markov Models
(0.7) dept(D) -> course(D,C).
(0.2) dept(D) -> lecturer(D,L).
...
(0.3) course(D,C) -> lecturer(D,L).
(0.3) course(D,C) -> dept(D).
(0.3) course(D,C) -> course(D,C).
...
(0.1) lecturer(D,L) -> course(D,C).
...
Abstract states
55
Logical (Hidden) Markov Models
  • So far, only transitions between abstract states
  • Needed possible transitions and their
    probabilities for any ground state

lecturer(D,L)
Possible instantiations for each argument:
{cs, math, bio, ...} x {luc, wolfram, ...}
Chance of instantiations:
P(lecturer(cs,luc))
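Under the LOHMM naive Bayes assumption mentioned on the next slide, the probability of a ground instantiation of an abstract state factorizes over its arguments. A sketch with hypothetical argument distributions:

```python
# Hypothetical per-argument distributions for lecturer(D,L).
P_D = {"cs": 0.5, "math": 0.3, "bio": 0.2}
P_L = {"luc": 0.6, "wolfram": 0.4}

def p_ground(dept, lecturer):
    # Naive Bayes over the arguments: P(lecturer(d,l)) = P(D=d) * P(L=l)
    return P_D[dept] * P_L[lecturer]

print(p_ground("cs", "luc"))  # 0.5 * 0.6
```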
56
Logical (Hidden) Markov Models
RMMs [Anderson et al. 03]: Probability
Estimation Trees
lecturer(D,L)
LOHMMs [Kersting et al. 03]: Naive Bayes
P(D)
P(L)

57
What is the data about? Intermediate
E
  • Data case
  • (partial) traces (or derivations)
  • Akin to Shapiro's algorithmic program debugging

Trace(1): dept(cs), course(cs,dm),
lecturer(pedro,cs), ...
Trace(2): dept(bio), course(bio,genetics),
lecturer(mendel,bio), ...
Trace(3): dept(cs), course(cs,stats), dept(cs),
course(cs,ml), ...
58
What is the data about? Proof Theoretic
E
1.0 S -> NP, VP        1/3 NP -> [i]
1/3 NP -> Det, N       1/3 NP -> NP, PP
1.0 Det -> [the]       0.5 N -> [man]
0.5 N -> [telescope]   0.5 VP -> V, NP
0.5 VP -> VP, PP       1.0 PP -> P, NP
1.0 V -> [saw]         1.0 P -> [with]
Example(1): s([i, saw, the, man], []).
Example(2): s([the, man, saw, the, man], []).
Example(3): s([i, saw, the, man, with, the,
telescope], []).
definite clause grammar
59
What is the data about? Proof Theoretic
E
  • Data case
  • ground facts (or even clauses)
  • Akin to learning from entailment (ILP)

Background: m(ann,dorothy), f(brian,dorothy),
m(cecily,fred), f(henry,fred), f(fred,bob),
m(kim,bob), ...
Example(1): bt(ann) = a.
Example(2): bt(fred) = ab.
Example(3): bt(brian) = ab.
Example(4): pc(brian) = a.
Example(5): mc(dorothy) = b.
Bloodtype example