Markov Logic in Natural Language Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Markov Logic in Natural Language Processing

Description:

Markov Logic in Natural Language Processing. Hoifung Poon, Dept. of Computer Science & Eng., University of Washington.

Slides: 159
Provided by: pedr92
Transcript and Presenter's Notes

Title: Markov Logic in Natural Language Processing


1
Markov Logic in Natural Language Processing
  • Hoifung Poon
  • Dept. of Computer Science & Eng.
  • University of Washington

2
Overview
  • Motivation
  • Foundational areas
  • Markov logic
  • NLP applications
  • Basics
  • Supervised learning
  • Unsupervised learning

3
Languages Are Structural
governments lmpxtm (according
to their families)
4
Languages Are Structural
govern-ment-s
l-mpx-t-m (according to their families)
[S [NP IL-4] [VP [V induces] [NP CD11B]]]
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 ...
[Figure: nested event graph linking involvement, up-regulation, and
activation through Theme, Cause, and Site edges to human monocyte,
IL-10, gp41, and p70(S6)-kinase]
George Walker Bush was the 43rd President of the
United States. Bush was the eldest son of
President G. H. W. Bush and Barbara Bush. ... In
November 1977, he met Laura Welch at a barbecue.
6
Languages Are Structural
  • Objects are not just feature vectors
  • They have parts and subparts
  • Which have relations with each other
  • They can be trees, graphs, etc.
  • Objects are seldom i.i.d. (independent and
    identically distributed)
  • They exhibit local and global dependencies
  • They form class hierarchies (with multiple
    inheritance)
  • Objects' properties depend on those of related
    objects
  • Deeply interwoven with knowledge

7
First-Order Logic
  • Main theoretical foundation of computer science
  • General language for describing complex
    structures and knowledge
  • Trees, graphs, dependencies, hierarchies, etc.
    easily expressed
  • Inference algorithms (satisfiability testing,
    theorem proving, etc.)

8
Languages Are Statistical
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft's purchase of Powerset, ...
I saw the man with the telescope
[Figure: alternative parses of the sentence, attaching "with the telescope" inside the NP or as an ADVP]
G. W. Bush ... Laura Bush ... Mrs. Bush: which one?
Here in London, Frances Deek is a retired teacher ...
In the Israeli town ..., Karen London says ...
Now London says ...
London: PERSON or LOCATION?
9
Languages Are Statistical
  • Languages are ambiguous
  • Our information is always incomplete
  • We need to model correlations
  • Our predictions are uncertain
  • Statistics provides the tools to handle this

10
Probabilistic Graphical Models
  • Mixture models
  • Hidden Markov models
  • Bayesian networks
  • Markov random fields
  • Maximum entropy models
  • Conditional random fields
  • Etc.

11
The Problem
  • Logic is deterministic, requires manual coding
  • Statistical models assume i.i.d. data, objects =
    feature vectors
  • Historically, statistical and logical NLP have
    been pursued separately
  • We need to unify the two!
  • Burgeoning field in machine learning:
    Statistical relational learning

12
Costs and Benefits of Statistical Relational
Learning
  • Benefits
  • Better predictive accuracy
  • Better understanding of domains
  • Enable learning with less or no labeled data
  • Costs
  • Learning is much harder
  • Inference becomes a crucial issue
  • Greater complexity for user

13
Progress to Date
  • Probabilistic logic [Nilsson, 1986]
  • Statistics and beliefs [Halpern, 1990]
  • Knowledge-based model construction [Wellman et
    al., 1992]
  • Stochastic logic programs [Muggleton, 1996]
  • Probabilistic relational models [Friedman et al.,
    1999]
  • Relational Markov networks [Taskar et al., 2002]
  • Etc.
  • This talk: Markov logic [Domingos & Lowd, 2009]

14
Markov Logic: A Unifying Framework
  • Probabilistic graphical models and first-order
    logic are special cases
  • Unified inference and learning algorithms
  • Easy-to-use software: Alchemy
  • Broad applicability
  • Goal of this tutorial: Quickly learn how to use
    Markov logic and Alchemy for a broad spectrum of
    NLP applications

15
Overview
  • Motivation
  • Foundational areas
  • Probabilistic inference
  • Statistical learning
  • Logical inference
  • Inductive logic programming
  • Markov logic
  • NLP applications
  • Basics
  • Supervised learning
  • Unsupervised learning

16
Markov Networks
  • Undirected graphical models

Cancer
Smoking
Cough
Asthma
  • Potential functions defined over cliques

Smoking   Cancer   Φ(S,C)
False     False    4.5
False     True     4.5
True      False    2.7
True      True     4.5
17
Markov Networks
  • Undirected graphical models

Cancer
Smoking
Cough
Asthma
  • Log-linear model: $P(x) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(x)\right)$

where w_i is the weight of feature i and f_i(x) is feature i
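As a rough illustration (not part of the original slides), the sketch below checks that the product-of-potentials form and the log-linear form describe the same distribution. It assumes a two-variable network containing only the Smoking/Cancer clique with the potential values from the table; the single feature `f(s, c) = [not s or c]` and the weight `w = log(4.5 / 2.7)` are choices made here for the example.

```python
import itertools
import math

# Potential values for the clique {Smoking, Cancer}, keyed by (smoking, cancer).
phi = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

# Product-of-potentials form: P(s, c) = phi(s, c) / Z.
Z = sum(phi.values())
p_potential = {state: value / Z for state, value in phi.items()}

# Log-linear form with one feature f(s, c) = [not s or c].
# Assumed weight: w = log(4.5 / 2.7); the common factor 2.7 cancels in Z.
w = math.log(4.5 / 2.7)


def feature(s, c):
    return 1.0 if (not s) or c else 0.0


states = list(itertools.product([False, True], repeat=2))
unnormalized = {st: math.exp(w * feature(*st)) for st in states}
Z_loglinear = sum(unnormalized.values())
p_loglinear = {st: unnormalized[st] / Z_loglinear for st in states}

for st in states:
    print(st, round(p_potential[st], 4), round(p_loglinear[st], 4))
```

Both forms assign the same probability to every state because multiplying all potentials by a constant only rescales Z.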
18
Markov Nets vs. Bayes Nets
Property          Markov Nets        Bayes Nets
Form              Prod. potentials   Prod. potentials
Potentials        Arbitrary          Cond. probabilities
Cycles            Allowed            Forbidden
Partition func.   Z = ?              Z = 1
Indep. check      Graph separation   D-separation
Indep. props.     Some               Some
Inference         MCMC, BP, etc.     Convert to Markov
19
Inference in Markov Networks
  • Goal: compute marginals & conditionals of
    $P(X) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(X)\right)$
  • Exact inference is #P-complete
  • Conditioning on Markov blanket is easy
  • Gibbs sampling exploits this

20
MCMC: Gibbs Sampling
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
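The pseudocode above could be sketched in Python roughly as follows. This is an illustrative sketch, not code from the tutorial or from Alchemy: the function name `gibbs_marginal`, the representation of the model as `(weight, feature)` pairs, and the two-variable Smoking/Cancer example are all assumptions made here. For simplicity it re-scores every feature when resampling a variable (features that do not mention the variable cancel out) and records one state per sweep.

```python
import math
import random


def gibbs_marginal(weighted_features, n_vars, query, num_samples=20000, seed=0):
    """Estimate P(query) in a log-linear model over n_vars Boolean variables."""
    rng = random.Random(seed)
    state = [rng.random() < 0.5 for _ in range(n_vars)]
    hits = 0
    for _ in range(num_samples):
        for x in range(n_vars):
            # Log-score of the state with x = False and with x = True.
            scores = []
            for value in (False, True):
                state[x] = value
                scores.append(sum(w * f(state) for w, f in weighted_features))
            # P(x = True | all other variables).
            p_true = 1.0 / (1.0 + math.exp(scores[0] - scores[1]))
            state[x] = rng.random() < p_true
        hits += query(state)
    return hits / num_samples


# Hypothetical model: variables [Smoking, Cancer], one feature "not Smoking or Cancer".
features = [(math.log(4.5 / 2.7), lambda s: float((not s[0]) or s[1]))]
estimate = gibbs_marginal(features, n_vars=2, query=lambda s: s[1])
print("Estimated P(Cancer):", estimate)
```

For this small model the exact answer is 9 / 16.2, so the estimate can be checked by enumeration.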
21
Other Inference Methods
  • Belief propagation (sum-product)
  • Mean field / Variational approximations

22
MAP/MPE Inference
  • Goal: Find most likely state of world given
    evidence: $\arg\max_y P(y \mid x)$, where y is the query and x is
    the evidence

23
MAP Inference Algorithms
  • Iterated conditional modes
  • Simulated annealing
  • Graph cuts
  • Belief propagation (max-product)
  • LP relaxation

25
Generative Weight Learning
  • Maximize likelihood
  • Use gradient ascent or L-BFGS
  • No local maxima
  • Requires inference at each step (slow!)

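$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$
where n_i(x) is the no. of times feature i is true in data and
E_w[n_i(x)] is the expected no. of times feature i is true according to
the model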
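A minimal sketch of this gradient in use, assuming a model small enough that expected counts can be computed exactly by enumerating all worlds (in practice this is the slow inference step the slide refers to). The names `expected_counts` and `fit_weights`, the learning rate, and the toy data are hypothetical; counts are averaged per training example, which only rescales the gradient.

```python
import itertools
import math


def expected_counts(weights, features, n_vars):
    """Exact E_w[f_i] by enumerating all 2^n worlds (tiny models only)."""
    worlds = list(itertools.product([0, 1], repeat=n_vars))
    scores = [math.exp(sum(w * f(x) for w, f in zip(weights, features)))
              for x in worlds]
    z = sum(scores)
    return [sum(s * f(x) for s, x in zip(scores, worlds)) / z for f in features]


def fit_weights(data, features, n_vars, lr=1.0, steps=500):
    """Gradient ascent on the average log-likelihood."""
    weights = [0.0] * len(features)
    # Empirical average of each feature over the training worlds.
    observed = [sum(f(x) for x in data) / len(data) for f in features]
    for _ in range(steps):
        expected = expected_counts(weights, features, n_vars)
        weights = [w + lr * (o - e)
                   for w, o, e in zip(weights, observed, expected)]
    return weights


# Toy data: one Boolean variable that is true in 7 of 10 observations.
data = [(1,)] * 7 + [(0,)] * 3
features = [lambda x: x[0]]
weights = fit_weights(data, features, n_vars=1)
print("Learned weight:", weights[0])
```

At convergence the expected count matches the observed count, so the learned weight here approaches log(0.7 / 0.3).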
26
Pseudo-Likelihood
  • Likelihood of each variable given its neighbors
    in the data
  • Does not require inference at each step
  • Widely used in vision, spatial statistics, etc.
  • But PL parameters may not work well for long
    inference chains

27
Discriminative Weight Learning
  • Maximize conditional likelihood of query (y)
    given evidence (x)
  • Approximate expected counts by counts in MAP
    state of y given x

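$\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$
where n_i(x, y) is the no. of true groundings of clause i in data and
E_w[n_i(x, y)] is the expected no. of true groundings according to the model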
28
Voted Perceptron
  • Originally proposed for training HMMs
    discriminatively
  • Assumes network is linear chain
  • Can be generalized to arbitrary networks

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return Σ_t w_i / T
30
First-Order Logic
  • Constants, variables, functions, predicates. E.g.:
    Anna, x, MotherOf(x), Friends(x, y)
  • Literal: Predicate or its negation
  • Clause: Disjunction of literals
  • Grounding: Replace all variables by
    constants. E.g.: Friends(Anna, Bob)
  • World (model, interpretation): Assignment of
    truth values to all ground predicates

31
Inference in First-Order Logic
  • Traditionally done by theorem proving (e.g.
    Prolog)
  • Propositionalization followed by model checking
    turns out to be faster (often by a lot)
  • Propositionalization: Create all ground atoms and
    clauses
  • Model checking: Satisfiability testing
  • Two main approaches
  • Backtracking (e.g. DPLL)
  • Stochastic local search (e.g. WalkSAT)

32
Satisfiability
  • Input: Set of clauses (Convert KB to conjunctive
    normal form (CNF))
  • Output: Truth assignment that satisfies all
    clauses, or failure
  • The paradigmatic NP-complete problem
  • Solution: Search
  • Key point: Most SAT problems are actually easy
  • Hard region: Narrow range of #Clauses / #Variables

33
Stochastic Local Search
  • Uses complete assignments instead of partial
  • Start with random state
  • Flip variables in unsatisfied clauses
  • Hill-climbing: Minimize # unsatisfied clauses
  • Avoid local minima: Random flips
  • Multiple restarts

34
The WalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
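The following Python sketch mirrors the pseudocode above. It is illustrative only: the clause encoding (lists of signed integers, DIMACS style), the default parameter values, and the small example formula are assumptions introduced here.

```python
import random


def walksat(clauses, n_vars, p=0.5, max_tries=10, max_flips=1000, seed=0):
    """clauses: lists of nonzero ints; literal v means variable v is true, -v false."""
    rng = random.Random(seed)

    def satisfied(clause, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def num_satisfied(assign):
        return sum(satisfied(c, assign) for c in clauses)

    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign
            clause = rng.choice(unsat)
            if rng.random() < p:
                # Random-walk move: flip a random variable in the clause.
                var = abs(rng.choice(clause))
            else:
                # Greedy move: flip the variable that leaves the most clauses satisfied.
                def score(v):
                    assign[v] = not assign[v]
                    s = num_satisfied(assign)
                    assign[v] = not assign[v]
                    return s
                var = max((abs(lit) for lit in clause), key=score)
            assign[var] = not assign[var]
    return None  # failure


# Hypothetical formula: (x1 v x2) ^ (!x1 v x3) ^ (!x2 v !x3) ^ (x1 v !x3)
clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
model = walksat(clauses, n_vars=3)
print(model)
```

Recomputing the satisfied-clause count from scratch for every candidate flip keeps the sketch short; practical solvers maintain these counts incrementally.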
36
Rule Induction
  • Given: Set of positive and negative examples of
    some concept
  • Example: (x1, x2, ..., xn, y)
  • y: concept (Boolean)
  • x1, x2, ..., xn: attributes (assume Boolean)
  • Goal: Induce a set of rules that cover all
    positive examples and no negative ones
  • Rule: xa ^ xb ^ ... ⇒ y (xa: Literal, i.e., xi
    or its negation)
  • Same as Horn clause: Body ⇒ Head
  • Rule r covers example x iff x satisfies body of r
  • Eval(r): Accuracy, info gain, coverage, support,
    etc.

37
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        r_x ← r with x added to body
        Eval(r_x)
    body ← body ^ best x
until no x improves Eval(r)
return r
38
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ { r }
    S ← S − positive examples covered by r
until S = Ø
return R
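The two procedures (learning a single rule, then a set of rules by sequential covering) might look like this in Python for Boolean attributes. This is a sketch under assumptions: Eval(r) is taken to be accuracy on covered examples, each attribute is used at most once per rule, the loop stops when no positive examples remain, and the function names and toy concept are invented for the example.

```python
def covers(body, x):
    """A rule body is a list of (attribute, required_value) literals."""
    return all(x[attr] == val for attr, val in body)


def evaluate(body, examples):
    """Accuracy of the rule on the examples it covers (0 if it covers none)."""
    covered = [y for x, y in examples if covers(body, x)]
    return sum(covered) / len(covered) if covered else 0.0


def learn_one_rule(examples, attrs):
    body = []
    best = evaluate(body, examples)
    while True:
        candidates = [(a, v) for a in attrs for v in (True, False)
                      if all(a != b for b, _ in body)]
        scored = [(evaluate(body + [lit], examples), lit) for lit in candidates]
        if not scored:
            return body
        score, lit = max(scored, key=lambda t: t[0])
        if score <= best:
            return body  # no literal improves Eval(r)
        body, best = body + [lit], score


def learn_rule_set(examples, attrs):
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        body = learn_one_rule(remaining, attrs)
        newly_covered = [x for x, y in remaining if y and covers(body, x)]
        if not newly_covered:
            break  # no progress; stop rather than loop forever
        rules.append(body)
        remaining = [(x, y) for x, y in remaining if not (y and covers(body, x))]
    return rules


# Toy concept: y = a AND b.
examples = [
    ({"a": True, "b": True}, True),
    ({"a": True, "b": False}, False),
    ({"a": False, "b": True}, False),
    ({"a": False, "b": False}, False),
]
rules = learn_rule_set(examples, attrs=["a", "b"])
print(rules)
```

Greedy search does not guarantee that every learned rule excludes all negative examples; on noisy data a rule may stop improving while still covering some.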
39
First-Order Rule Induction
  • y and xi are now predicates with arguments. E.g.:
    y is Ancestor(x,y), xi is Parent(x,y)
  • Literals to add are predicates or their negations
  • Literal to add must include at least one
    variable already appearing in rule
  • Adding a literal changes # groundings of
    rule. E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒
    Ancestor(x,y)
  • Eval(r) must take this into account. E.g.:
    Multiply by # positive groundings of rule
    still covered after adding literal

41
Markov Logic
  • Syntax: Weighted first-order formulas
  • Semantics: Feature templates for Markov networks
  • Intuition: Soften logical constraints
  • Give each formula a weight (Higher weight ⇒
    Stronger constraint)

42
Example Coreference Resolution
Barack Obama, the 44th President of the United
States, is the first African American to hold the
office.
45
Example Coreference Resolution
Two mention constants: A and B
[Figure: ground Markov network over the atoms below]
Apposition(A,B)
Apposition(B,A)
Head(A,President)
Head(B,President)
Head(A,Obama)
Head(B,Obama)
MentionOf(A,Obama)
MentionOf(B,Obama)
46
Markov Logic Networks
  • MLN is template for ground Markov nets
  • Probability of a world x:
    $P(x) = \frac{1}{Z} \exp\left(\sum_i w_i n_i(x)\right)$,
    where w_i is the weight of formula i and n_i(x) is the no. of true
    groundings of formula i in x
  • Typed variables and constants greatly reduce size
    of ground Markov net
  • Functions, existential quantifiers, etc.
  • Can handle infinite domains [Singla & Domingos,
    2007] and continuous domains [Wang &
    Domingos, 2008]
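To make the probability of a world concrete, here is a small sketch that grounds a one-formula MLN by brute force and normalizes over all worlds. The formula `Smokes(x) => Cancer(x)`, its weight of 1.5, and the constants A and B are hypothetical choices for illustration; enumeration is only feasible for a handful of ground atoms.

```python
import itertools
import math

constants = ["A", "B"]
atoms = [(pred, c) for c in constants for pred in ("Smokes", "Cancer")]
w = 1.5  # hypothetical weight for the formula Smokes(x) => Cancer(x)


def n_true_groundings(world):
    """Number of constants c for which Smokes(c) => Cancer(c) holds in this world."""
    return sum((not world[("Smokes", c)]) or world[("Cancer", c)]
               for c in constants)


# A world assigns a truth value to every ground atom.
worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true_groundings(x)) for x in worlds)


def probability(world):
    return math.exp(w * n_true_groundings(world)) / Z


good = {atom: False for atom in atoms}                      # both groundings true
bad = {(pred, c): pred == "Smokes" for pred, c in atoms}    # both groundings violated
print(probability(good), probability(bad))
```

A world that violates the formula is less probable rather than impossible: here each violated grounding costs a factor of exp(w).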
47
Relation to Statistical Models
  • Special cases
  • Markov networks
  • Markov random fields
  • Bayesian networks
  • Log-linear models
  • Exponential models
  • Max. entropy models
  • Gibbs distributions
  • Boltzmann machines
  • Logistic regression
  • Hidden Markov models
  • Conditional random fields
  • Obtained by making all predicates zero-arity
  • Markov logic allows objects to be interdependent
    (non-i.i.d.)

48
Relation to First-Order Logic
  • Infinite weights ⇒ First-order logic
  • Satisfiable KB, positive weights ⇒ Satisfying
    assignments = Modes of distribution
  • Markov logic allows contradictions between
    formulas

49
MLN Algorithms: The First Three Generations
Problem             First generation          Second generation   Third generation
MAP inference       Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference  Gibbs sampling            MC-SAT              Lifted inference
Weight learning     Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning  Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding
50
MAP/MPE Inference
  • Problem: Find most likely state of world given
    evidence: $\arg\max_y P(y \mid x)$, where y is the query and x is
    the evidence

53
MAP/MPE Inference
  • Problem: Find most likely state of world given
    evidence
  • This is just the weighted MaxSAT problem
  • Use weighted SAT solver (e.g., MaxWalkSAT [Kautz
    et al., 1997])

54
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes
                Σ weights(sat. clauses)
return failure, best solution found
55
Computing Probabilities
  • P(Formula | MLN, C) = ?
  • MCMC: Sample worlds, check formula holds
  • P(Formula1 | Formula2, MLN, C) = ?
  • If Formula2 = Conjunction of ground atoms
  • First construct min subset of network necessary
    to answer query (generalization of KBMC)
  • Then apply MCMC

56
But Insufficient for Logic
  • Problem: Deterministic dependencies break
    MCMC; near-deterministic ones make it very slow
  • Solution: Combine MCMC and WalkSAT → MC-SAT
    algorithm [Poon & Domingos, 2006]

57
Auxiliary-Variable Methods
  • Main ideas
  • Use auxiliary variables to capture dependencies
  • Turn difficult sampling into uniform sampling
  • Given distribution P(x)
  • Sample from f (x, u), then discard u

58
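Slice Sampling [Damien et al., 1999]
[Figure: plot of P(x) over X with vertical axis U, marking the current
sample x(k), the auxiliary level u(k), the resulting slice, and the next
sample x(k+1)]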
59
Slice Sampling
  • Identifying the slice may be difficult
  • Introduce an auxiliary variable ui for each φi

60
The MC-SAT Algorithm
  • Select random subset M of satisfied clauses
  • With probability 1 − exp(−wi)
  • Larger wi ⇒ Ci more likely to be selected
  • Hard clause (wi → ∞): Always selected
  • Slice = States that satisfy clauses in M
  • Uses SAT solver to sample x | u
  • Orders of magnitude faster than Gibbs sampling,
    etc.
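A toy version of the MC-SAT loop is sketched below. One important assumption: the uniform sample from the slice is drawn by enumerating all states, which stands in for the SAT-based sampler and is only viable for a few variables. The function name `mc_sat`, the `(weight, clause)` representation, and the two-clause model are invented for this example, and all weights are assumed positive.

```python
import itertools
import math
import random


def mc_sat(clauses, n_vars, query, num_samples=20000, seed=0):
    """clauses: list of (weight, fn) with weight > 0 and fn(state) -> bool."""
    rng = random.Random(seed)
    all_states = list(itertools.product([False, True], repeat=n_vars))
    state = rng.choice(all_states)
    hits = 0
    for _ in range(num_samples):
        # Keep each currently satisfied clause with probability 1 - exp(-w).
        selected = [fn for w, fn in clauses
                    if fn(state) and rng.random() < 1.0 - math.exp(-w)]
        # Slice: all states satisfying every selected clause. The current
        # state is always in it, so the slice is never empty.
        slice_states = [s for s in all_states if all(fn(s) for fn in selected)]
        state = rng.choice(slice_states)
        hits += query(state)
    return hits / num_samples


# Hypothetical model over (s0, s1): weight 1.0 on (!s0 v s1), weight 0.5 on s0.
clauses = [
    (1.0, lambda s: (not s[0]) or s[1]),
    (0.5, lambda s: s[0]),
]
estimate = mc_sat(clauses, n_vars=2, query=lambda s: s[1])
print("Estimated P(s1):", estimate)
```

Because each step jumps to a uniformly chosen state in the slice, consecutive samples are far less correlated than single-variable Gibbs updates, which is the intuition behind the speedup claimed on the slide.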

61
But It Is Not Scalable
  • 1000 researchers
  • Coauthor(x,y): 1 million ground atoms
  • Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z): 1
    billion ground clauses
  • Exponential in arity
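The counts on this slide follow from raising the number of constants to the number of distinct variables. A quick check, using a helper name chosen here for illustration:

```python
def num_groundings(num_constants, num_variables):
    """Groundings of a predicate or clause with this many distinct variables."""
    return num_constants ** num_variables


researchers = 1000
ground_atoms = num_groundings(researchers, 2)    # Coauthor(x,y)
ground_clauses = num_groundings(researchers, 3)  # transitivity over x, y, z
print(ground_atoms, ground_clauses)
```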

62
Sparsity to the Rescue
  • 1000 researchers
  • Coauthor(x,y): 1 million ground atoms
  • But most atoms are false
  • Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z):
    1 billion ground clauses
  • Most trivially satisfied if most atoms are false
  • No need to explicitly compute most of them

63
Lazy Inference
  • LazySAT [Singla & Domingos, 2006a]
  • Lazy version of WalkSAT [Selman et al., 1996]
  • Grounds atoms/clauses as needed
  • Greatly reduces memory usage
  • The idea is much more general [Poon &
    Domingos, 2008a]

64
General Method for Lazy Inference
  • If most variables assume the default value,
    wasteful to instantiate all variables / functions
  • Main idea
  • Allocate memory for a small subset of
    active variables / functions
  • Activate more if necessary as inference proceeds
  • Applicable to a diverse set of algorithms:
    Satisfiability solvers (systematic,
    local-search), Markov chain Monte Carlo, MPE /
    MAP algorithms, Maximum expected utility
    algorithms, Belief propagation, MC-SAT, etc.
  • Reduce memory and time by orders of magnitude

65
Lifted Inference
  • Consider belief propagation (BP)
  • Often in large problems, many nodes are
    interchangeable: They send and receive the same
    messages throughout BP
  • Basic idea: Group them into supernodes, forming
    lifted network
  • Smaller network → Faster inference
  • Akin to resolution in first-order logic

66
Belief Propagation
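[Figure: graph of feature nodes (f) and variable nodes (x)]
67
Lifted Belief Propagation
[Figure: graph of feature nodes (f) and variable nodes (x)]
68
Lifted Belief Propagation
[Figure: graph of feature nodes (f) and variable nodes (x); two message
parameters are functions of edge counts]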
69
Learning
  • Data is a relational database
  • Closed world assumption (if not EM)
  • Learning parameters (weights)
  • Learning structure (formulas)

70
Parameter Learning
  • Parameter tying: Groundings of same clause
  • Generative learning: Pseudo-likelihood
  • Discriminative learning: Conditional
    likelihood, use MC-SAT or MaxWalkSAT for inference

$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$
where n_i(x) is the no. of times clause i is true in data and
E_w[n_i(x)] is the expected no. of times clause i is true according to
the MLN
71
Parameter Learning
  • Pseudo-likelihood + L-BFGS is fast and robust but
    can give poor inference results
  • Voted perceptron: Gradient descent + MAP
    inference
  • Scaled conjugate gradient

72
Voted Perceptron for MLNs
  • HMMs are special case of MLNs
  • Replace Viterbi by MaxWalkSAT
  • Network can now be arbitrary graph

w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return Σ_t w_i / T
73
Problem Multiple Modes
  • Not alleviated by contrastive divergence
  • Alleviated by MC-SAT
  • Warm start: Start each MC-SAT run at previous end
    state

74
Problem Extreme Ill-Conditioning
  • Solvable by quasi-Newton, conjugate gradient,
    etc.
  • But line searches require exact inference
  • Solution: Scaled conjugate gradient
    [Lowd & Domingos, 2008]
  • Use Hessian to choose step size
  • Compute quadratic form inside MC-SAT
  • Use inverse diagonal Hessian as preconditioner

75
Structure Learning
  • Standard inductive logic programming
    optimizes the wrong thing
  • But can be used to overgenerate for L1 pruning
  • Our approach: ILP + Pseudo-likelihood + Structure
    priors
  • For each candidate structure change: Start from
    current weights & relax convergence
  • Use subsampling to compute sufficient statistics

76
Structure Learning
  • Initial state: Unit clauses or prototype KB
  • Operators: Add/remove literal, flip sign
  • Evaluation function: Pseudo-likelihood +
    Structure prior
  • Search: Beam search, shortest-first search

77
Alchemy
  • Open-source software including
  • Full first-order logic syntax
  • Generative & discriminative weight learning
  • Structure learning
  • Weighted satisfiability, MCMC, lifted BP
  • Programming language features

alchemy.cs.washington.edu
78
                  Alchemy                           Prolog            BUGS
Representation    F.O. logic + Markov nets          Horn clauses      Bayes nets
Inference         Model checking, MCMC, lifted BP   Theorem proving   MCMC
Learning          Parameters & structure            No                Params.
Uncertainty       Yes                               No                Yes
Relational        Yes                               Yes               No
79
Constrained Conditional Model
  • Representation: Integer linear programs
  • Local classifiers + Global constraints
  • Inference: LP solver
  • Parameter learning: None for constraints
  • Weights of soft constraints set heuristically
  • Local weights typically learned independently
  • Structure learning: None to date
  • But see latest development in NAACL-10

80
Running Alchemy
  • Programs
  • Infer
  • Learnwts
  • Learnstruct
  • Options
  • MLN file
  • Types (optional)
  • Predicates
  • Formulas
  • Database files

82
Uniform Distribn.: Empty MLN
  • Example: Unbiased coin flips
  • Type: flip = { 1, ..., 20 }
  • Predicate: Heads(flip)

83
Binomial Distribn.: Unit Clause
  • Example: Biased coin flips
  • Type: flip = { 1, ..., 20 }
  • Predicate: Heads(flip)
  • Formula: Heads(f)
  • Weight: Log odds of heads
  • By default, MLN includes unit clauses for all
    predicates
  • (captures marginal distributions, etc.)
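The statement that the unit-clause weight is the log odds of heads can be checked numerically. In this sketch the helper names and the bias of 0.7 are assumptions for illustration: each ground atom Heads(f) has two states with unnormalized scores exp(w) and exp(0).

```python
import math


def weight_from_probability(p):
    """Log odds of an event with probability p."""
    return math.log(p / (1.0 - p))


def probability_from_weight(w):
    # One grounding of Heads(f): score exp(w) if true, exp(0) = 1 if false.
    return math.exp(w) / (math.exp(w) + 1.0)


w = weight_from_probability(0.7)
print(w, probability_from_weight(w))
```

A weight of zero gives probability 0.5, which matches the empty-MLN (uniform) case.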

84
Multinomial Distribution
  • Example: Throwing die
  • Types: throw = { 1, ..., 20 }
  • face = { 1, ..., 6 }
  • Predicate: Outcome(throw,face)
  • Formulas: Outcome(t,f) ^ f != f' =>
    !Outcome(t,f').
  • Exist f Outcome(t,f).
  • Too cumbersome!

85
Multinomial Distrib.: ! Notation
  • Example: Throwing die
  • Types: throw = { 1, ..., 20 }
  • face = { 1, ..., 6 }
  • Predicate: Outcome(throw,face!)
  • Formulas:
  • Semantics: Arguments without ! determine
    arguments with !.
  • Also makes inference more efficient (triggers
    blocking).

86
Multinomial Distrib.: + Notation
  • Example: Throwing biased die
  • Types: throw = { 1, ..., 20 }
  • face = { 1, ..., 6 }
  • Predicate: Outcome(throw,face!)
  • Formulas: Outcome(t,+f)
  • Semantics: Learn weight for each grounding of
    args with +.

87
Logistic Regression (MaxEnt)
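Logistic regression: $\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = a + \sum_i b_i f_i$
Type: obj = { 1, ..., n }
Query predicate: C(obj)
Evidence predicates: Fi(obj)
Formulas:
    a    C(x)
    bi   Fi(x) ^ C(x)
Resulting distribution: $P(C=c, F=f) = \frac{1}{Z} \exp\left(a c + \sum_i b_i f_i c\right)$
Therefore: $\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = \log \frac{\exp(a + \sum_i b_i f_i)}{\exp(0)} = a + \sum_i b_i f_i$
Alternative form: Fi(x) => C(x)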
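A short numerical check of this equivalence, assuming a single object with fixed evidence values: normalizing the MLN's unnormalized scores over the two values of C gives the same number as the usual logistic function. The function names and the example weights are hypothetical.

```python
import math


def mln_conditional(a, b, f):
    """P(C = 1 | F = f) from formulas 'a: C(x)' and 'b_i: F_i(x) ^ C(x)'."""
    def score(c):
        return math.exp(a * c + sum(bi * fi * c for bi, fi in zip(b, f)))
    return score(1) / (score(0) + score(1))


def logistic(a, b, f):
    """Standard logistic regression with intercept a and coefficients b."""
    return 1.0 / (1.0 + math.exp(-(a + sum(bi * fi for bi, fi in zip(b, f)))))


a, b, f = -1.0, [0.5, 2.0], [1, 0]
p_mln = mln_conditional(a, b, f)
p_lr = logistic(a, b, f)
print(p_mln, p_lr)
```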
88
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)

Sparse HMM: State(s,t) => State(s1,t+1) v State(s2,t+1) v ... .
89
Bayesian Networks
  • Use all binary predicates with same first
    argument (the object x).
  • One predicate for each variable A: A(x,v!)
  • One clause for each line in the CPT and value of
    the variable
  • Context-specific independence: One clause for
    each path in the decision tree
  • Logistic regression: As before
  • Noisy OR: Deterministic OR + Pairwise clauses

90
Relational Models
  • Knowledge-based model construction
  • Allow only Horn clauses
  • Same as Bayes nets, except arbitrary relations
  • Combin. function: Logistic regression, noisy-OR
    or external
  • Stochastic logic programs
  • Allow only Horn clauses
  • Weight of clause = log(p)
  • Add formulas: Head holds ⇒ Exactly one body holds
  • Probabilistic relational models
  • Allow only binary relations
  • Same as Bayes nets, except first argument can vary

91
Relational Models
  • Relational Markov networks
  • SQL → Datalog → First-order logic
  • One clause for each state of a clique
  • + syntax in Alchemy facilitates this
  • Bayesian logic
  • Object = Cluster of similar/related observations
  • Observation constants + Object constants
  • Predicate InstanceOf(Obs,Obj) and clauses using
    it
  • Unknown relations: Second-order Markov logic
  • S. Kok & P. Domingos, "Statistical Predicate
    Invention," in Proc. ICML-2007.

93
Text Classification
The 56th quadrennial United States presidential
election was held on November 4, 2008. Outgoing
Republican President George W. Bush's policies
and actions and the American public's desire for
change were key issues throughout the campaign.
Topic = politics
The Chicago Bulls are an American professional
basketball team based in Chicago, Illinois,
playing in the Central Division of the Eastern
Conference in the National Basketball Association
(NBA).
Topic = sports

94
Text Classification
page = { 1, ..., max }
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

If topics mutually exclusive: Topic(page,topic!)
95
Text Classification
page = { 1, ..., max }
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)

Cf. S. Chakrabarti, B. Dom & P. Indyk, "Hypertext Classification Using
Hyperlinks," in Proc. SIGMOD-1998.
96
Entity Resolution
AUTHOR H. POON P. DOMINGOS TITLE UNSUPERVISED
SEMANTIC PARSING VENUE EMNLP-09
SAME?
AUTHOR Hoifung Poon and Pedro Domings TITLE
Unsupervised semantic parsing VENUE Proceedings
of the 2009 Conference on Empirical Methods in
Natural Language Processing
AUTHOR Poon, Hoifung and Domings, Pedro TITLE
Unsupervised ontology induction from text VENUE
Proceedings of the Forty-Eighth Annual Meeting of
the Association for Computational Linguistics
SAME?
AUTHOR H. Poon, P. Domings TITLE Unsupervised
ontology induction VENUE ACL-10
97
Entity Resolution
Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
98
Entity Resolution
Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")

Cf. A. McCallum & B. Wellner, "Conditional Models of
Identity Uncertainty with Application to Noun
Coreference," in Adv. NIPS 17, 2005.
99
Entity Resolution
Can also resolve fields:

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') <=> SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")
SameField(f,r,r') ^ SameField(f,r',r") => SameField(f,r,r")

More: P. Singla & P. Domingos, "Entity Resolution with Markov Logic,"
in Proc. ICDM-2006.
100
Information Extraction
Unsupervised Semantic Parsing, Hoifung Poon and
Pedro Domingos. Proceedings of the 2009
Conference on Empirical Methods in Natural
Language Processing. Singapore ACL.
UNSUPERVISED SEMANTIC PARSING. H. POON P.
DOMINGOS. EMNLP-2009.
101
Information Extraction
Author
Title
Venue
Unsupervised Semantic Parsing, Hoifung Poon and
Pedro Domingos. Proceedings of the 2009
Conference on Empirical Methods in Natural
Language Processing. Singapore ACL.
SAME?
UNSUPERVISED SEMANTIC PARSING. H. POON P.
DOMINGOS. EMNLP-2009.
102
Information Extraction
  • Problem: Extract database from text
    or semi-structured sources
  • Example: Extract database of publications from
    citation list(s) (the CiteSeer problem)
  • Two steps:
  • Segmentation: Use HMM to assign tokens to fields
  • Entity resolution: Use logistic regression and
    transitivity

103
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c')
    ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")
104
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c')
    ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")

More: H. Poon & P. Domingos, "Joint Inference in
Information Extraction," in Proc. AAAI-2007.
105
Biomedical Text Mining
  • Traditionally, named entity recognition or
    information extraction
  • E.g., protein recognition, protein-protein
    identification
  • BioNLP-09 shared task: Nested bio-events
  • Much harder than traditional IE
  • Top F1 around 50%
  • Naturally calls for joint inference

106
Bio-Event Extraction
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 envelope
protein of human immunodeficiency virus type 1 ...
involvement
Theme
Cause
up-regulation
activation
Site
Theme
Cause
Theme
human monocyte
p70(S6)-kinase
gp41
IL-10
107
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,w) => EvtType(i,t)
Token(j,w) ^ DepEdge(i,j,d) => EvtType(i,t)
DepEdge(i,j,d) => InArgPath(i,j,a)
Token(i,w) ^ DepEdge(i,j,d) => InArgPath(i,j,a)

Logistic regression
108
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,w) => EvtType(i,t)
Token(j,w) ^ DepEdge(i,j,d) => EvtType(i,t)
DepEdge(i,j,d) => InArgPath(i,j,a)
Token(i,w) ^ DepEdge(i,j,d) => InArgPath(i,j,a)
InArgPath(i,j,Theme) => IsProtein(j) v
    (Exist k k!=i ^ InArgPath(j,k,Theme)).

More: H. Poon and L. Vanderwende, "Joint Inference for Knowledge
Extraction from Biomedical Literature," 10:40 am, June 4, Gold Room.

Adding a few joint inference rules doubles the F1
109
Temporal Information Extraction
  • Identify event times and temporal relations
    (BEFORE, AFTER, OVERLAP)
  • E.g., who is the President of U.S.A.?
  • Obama: 1/20/2009 → present
  • G. W. Bush: 1/20/2001 → 1/19/2009
  • Etc.

110
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)

DepEdge(i,j,d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
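To see what the transitivity formula contributes, the sketch below computes the closure of a set of After facts under After(p,q) ^ After(q,r) => After(p,r). This reads the rule as a hard constraint, which is an assumption made for illustration; in an MLN the formula carries a weight and the closure is only encouraged, not forced. The event names are hypothetical.

```python
def transitive_closure(after):
    """Smallest superset of `after` closed under After(p,q) ^ After(q,r) => After(p,r)."""
    closed = set(after)
    changed = True
    while changed:
        changed = False
        for p, q in list(closed):
            for q2, r in list(closed):
                if q == q2 and (p, r) not in closed:
                    closed.add((p, r))
                    changed = True
    return closed


# Hypothetical local predictions: e3 is after e2, and e2 is after e1.
facts = {("e3", "e2"), ("e2", "e1")}
closure = transitive_closure(facts)
print(sorted(closure))
```

The derived fact After(e3, e1) is the kind of global consequence that independent pairwise classifiers would not be guaranteed to produce.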
111
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)

More: K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, "Jointly
Identifying Temporal Relations with Markov Logic," in Proc. ACL-2009.
X. Ling & D. Weld, "Temporal Information Extraction," in Proc.
AAAI-2010.
112
Semantic Role Labeling
  • Problem: Identify arguments for a predicate
  • Two steps:
  • Argument identification: Determine whether a
    phrase is an argument
  • Role classification: Determine the type of an
    argument (agent, theme, temporal, adjunct, etc.)

113
Semantic Role Labeling
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)

Token(i,t) => IsPredicate(i)
DepPath(i,j,p) => Role(i,j,r)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)

Cf. K. Toutanova, A. Haghighi, C. Manning, "A global
joint model for semantic role labeling," in
Computational Linguistics 2008.
114
Joint Semantic Role Labeling and Word Sense
Disambiguation
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,t) => IsPredicate(i)
DepPath(i,j,p) => Role(i,j,r)
Sense(i,s) => IsPredicate(i)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,t) ^ Role(i,j,r) => Sense(i,s)

More: I. Meza-Ruiz & S. Riedel, "Jointly Identifying
Predicates, Arguments and Senses using Markov
Logic," in Proc. NAACL-2009.
115
Practical Tips: Modeling
  • Add all unit clauses (the default)
  • How to handle uncertain data: R(x,y) => R'(x,y)
    (the "HMM trick")
  • Implications vs. conjunctions
  • For soft correlation, conjunctions often better
  • Implication: A => B is equivalent to !(A ^ !B)
  • Shares cases with others like A => C
  • Makes learning unnecessarily harder
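
The equivalence in the list above can be checked by enumerating all truth assignments. The sketch below is only an illustration; the function names are made up for it. It also shows why the two formula types behave differently as soft features: the implication is true in every world where A is false, while the conjunction is true in exactly one world.

```python
from itertools import product


def implication(a, b):
    """A => B"""
    return (not a) or b


def negated_conjunction(a, b):
    """!(A ^ !B)"""
    return not (a and not b)


worlds = list(product([False, True], repeat=2))
equivalent = all(implication(a, b) == negated_conjunction(a, b) for a, b in worlds)

# Worlds in which each candidate formula is satisfied.
implication_true = [(a, b) for a, b in worlds if implication(a, b)]
conjunction_true = [(a, b) for a, b in worlds if a and b]

print(equivalent)
print(implication_true)
print(conjunction_true)
```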

116
Practical Tips: Efficiency
  • Open/closed world assumptions
  • Low clause arities
  • Low numbers of constants
  • Short inference chains

117
Practical Tips: Development
  • Start with easy components
  • Gradually expand to full task
  • Use the simplest MLN that works
  • Cycle: Add/delete formulas, learn and test

118
Overview
  • Motivation
  • Foundational areas
  • Markov logic
  • NLP applications
  • Basics
  • Supervised learning
  • Unsupervised learning

119
Unsupervised Learning: Why?
  • Virtually unlimited supply of unlabeled text
  • Labeling is expensive (Cf. Penn-Treebank)
  • Often difficult to label with consistency and
    high quality (e.g., semantic parses)
  • Emerging field: Machine reading
  • Extract knowledge from unstructured text with
    high precision/recall and minimal human effort
  • Check out LBR-Workshop (WS9) on Sunday

120
Unsupervised Learning: How?
  • I.i.d. learning: Sophisticated model requires
    more labeled data
  • Statistical relational learning: Sophisticated
    model may require less labeled data
  • Relational dependencies constrain problem space
  • One formula is worth a thousand labels
  • Small amount of domain knowledge →
    large-scale joint inference

121
Unsupervised Learning: How?
  • Ambiguities vary among objects
  • Joint inference → Propagate information from
    unambiguous objects to ambiguous ones
  • E.g.:
  • G. W. Bush
  • He
  • Mrs. Bush

Are they coreferent?
122
Unsupervised Learning: How?
Should be coreferent
123
Unsupervised Learning: How?
So must be singular male!
124
Unsupervised Learning: How?
Must be singular female!
125
Unsupervised Learning: How?
Verdict: Not coreferent!
126
Parameter Learning
  • Marginalize out hidden variables
  • Use MC-SAT to approximate both expectations
  • May also combine with contrastive estimation
    [Poon, Cherry & Toutanova, NAACL-2009]

Sum over z, conditioned on observed x
Summed over both x and z
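
The equation that these two annotations point to did not survive in the transcript. A hedged reconstruction, assuming the usual log-linear form of a Markov logic network, with w_i the weight of formula i and n_i(x, z) its number of true groundings (this notation is an assumption, not taken from the slide), is the gradient of the log-likelihood of the observed x with the hidden z marginalized out:

```latex
\[
P_w(x) = \sum_{z} P_w(x, z),
\qquad
\frac{\partial}{\partial w_i} \log P_w(x)
  = E_{z \mid x}\!\left[ n_i(x, z) \right]
  - E_{x, z}\!\left[ n_i(x, z) \right]
\]
```

The first expectation sums over z conditioned on the observed x; the second sums over both x and z. Both are intractable to compute exactly, which is why the slide suggests approximating them with MC-SAT.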
127
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)

Mixture model:
MentionOf(m,e) ^ Type(m,t)
Head(m,h) ^ MentionOf(m,e)

Joint inference formulas (enforce agreement):
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
(similarly for Number, Gender, etc.)
128
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)

MentionOf(m,e) ^ Type(m,t)
Head(m,h) ^ MentionOf(m,e)
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
(similarly for Number, Gender, etc.)

Joint inference formulas (leverage apposition):
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))

More: H. Poon and P. Domingos, "Joint Unsupervised
Coreference Resolution with Markov Logic", in
Proc. EMNLP-2008.
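
The agreement formula can be illustrated on the three mentions from the earlier example slides. The sketch below is a simplification made for illustration: it uses Gender in place of Type, treats an unknown attribute as never violating agreement, and only counts violations instead of running joint inference. The function name, the dictionaries and the entity labels are hypothetical.

```python
from itertools import combinations

# Gender known only for the unambiguous mentions (None = unknown).
gender = {"G. W. Bush": None, "He": "male", "Mrs. Bush": "female"}


def agreement_violations(mention_of, attribute):
    """Count pairs of mentions that share an entity but have
    different known attribute values."""
    count = 0
    for a, b in combinations(sorted(mention_of), 2):
        same_entity = mention_of[a] == mention_of[b]
        va, vb = attribute[a], attribute[b]
        if same_entity and va is not None and vb is not None and va != vb:
            count += 1
    return count


# Two candidate clusterings (assignments of mentions to entities).
all_together = {"G. W. Bush": "e1", "He": "e1", "Mrs. Bush": "e1"}
bush_split = {"G. W. Bush": "e1", "He": "e1", "Mrs. Bush": "e2"}

print(agreement_violations(all_together, gender))
print(agreement_violations(bush_split, gender))
```

A clustering that puts all three mentions in one entity violates gender agreement, while separating "Mrs. Bush" does not, which matches the verdict on the example slides.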
129
Relational Clustering: Discover Unknown Predicates
  • Cluster relations along with objects
  • Use second-order Markov logic
    [Kok & Domingos, 2007, 2008]
  • Key idea: Cluster combination determines
    likelihood of relations
  • InClust(r,c) ^ InClust(x,a) ^ InClust(y,b)
    => r(x,y)
  • Input: Relational tuples extracted by TextRunner
    [Banko et al., 2007]
  • Output: Semantic network

130
Recursive Relational Clustering
  • Unsupervised semantic parsing
    [Poon & Domingos, EMNLP-2009]
  • Text → Knowledge
  • Start directly from text
  • Identify meaning units and resolve variations
  • Use high-order Markov logic (variables over
    arbitrary lambda forms and their clusters)
  • End-to-end machine reading: Read
    text, then answer questions

131
Semantic Parsing
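IL-4 protein induces CD11b
INDUCE(e1)
INDUCER(e1,e2)
INDUCED(e1,e3)
IL-4(e2)
CD11B(e3)
Structured prediction: Partition and Assignment
[Figure: the dependency tree of "IL-4 protein induces CD11b"
(induces, with nsubj protein and dobj CD11b, and nn IL-4 under
protein) is partitioned into parts, and each part is assigned
to one of the clusters INDUCE, INDUCER, INDUCED, IL-4 or CD11B.]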
132
Challenge: Same Meaning, Many Variations
  • IL-4 up-regulates CD11b
  • Protein IL-4 enhances the expression of CD11b
  • CD11b expression is induced by IL-4 protein
  • The cytokine interleukin-4 induces CD11b
    expression
  • IL-4's up-regulation of CD11b, ...

133
Unsupervised Semantic Parsing
  • USP → Recursively cluster arbitrary expressions
    composed with / by similar expressions
  • IL-4 induces CD11b
  • Protein IL-4 enhances the expression of CD11b
  • CD11b expression is enhanced by IL-4 protein
  • The cytokine interleukin-4 induces CD11b
    expression
  • IL-4's up-regulation of CD11b, ...

134
Unsupervised Semantic Parsing
Cluster same forms at the atom level
135
Unsupervised Semantic Parsing
Cluster forms in composition with same forms
139
Unsupervised Semantic Parsing
  • Exponential prior on number of parameters
  • Event/object/property cluster mixtures
  • InClust(e,c) ^ HasValue(e,v)

[Figure: cluster mixtures for Object/Event Cluster INDUCE and
Property Cluster INDUCER, shown as multinomial distributions:
induces 0.1, enhances 0.4; nsubj 0.5, agent 0.4;
IL-4 0.2, IL-8 0.1; None 0.1, One 0.8.]
140
But State Space Too Large
  • Coreference: #-clusters bounded by #-mentions
  • USP: #-clusters exponential in #-tokens
  • Also, meaning units often small and many
    singleton clusters
  • → Use combinatorial search

141
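Inference: Hill-Climb Probability
[Figure: Initialize: every node of the dependency tree
(induces, with nsubj protein and dobj CD11B, and nn IL-4 under
protein) starts as its own part with an unknown (?) assignment.
Search operator: Lambda reduction, e.g., combining protein and
its nn dependent IL-4 into a single part.]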
142
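Learning: Hill-Climb Likelihood
[Figure: Initialize: one cluster per form (protein 1, IL-4 1,
enhances 1, induces 1). Search operators:
MERGE: clusters (enhances 1) and (induces 1) become one
cluster (induces 0.2, enhances 0.8).
COMPOSE: clusters (protein 1) and (IL-4 1) become one
cluster (IL-4 protein 1).]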
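
One way to read the MERGE example on this slide is as pooling the form counts of two clusters and renormalizing. The sketch below shows only that bookkeeping; it does not score the merge by likelihood or perform the hill-climbing search. The counts (1 and 4) are hypothetical values chosen so that the result matches the 0.2 / 0.8 shown on the slide, and the function names are made up for the example.

```python
from collections import Counter


def to_distribution(counts):
    """Normalize form counts into a multinomial over forms."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def merge(cluster_a, cluster_b):
    """Pool the form counts of two clusters into one cluster."""
    return cluster_a + cluster_b


# Hypothetical counts: each initial cluster holds a single form.
induces_cluster = Counter({"induces": 1})
enhances_cluster = Counter({"enhances": 4})

before = [to_distribution(induces_cluster), to_distribution(enhances_cluster)]
merged = to_distribution(merge(induces_cluster, enhances_cluster))

print(before)
print(merged)
```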
143
Unsupervised Ontology Induction
  • Limitations of USP
  • No ISA hierarchy among clusters
  • Little smoothing
  • Limited capability to generalize
  • OntoUSP [Poon & Domingos, ACL-2010]
  • Extends USP to also induce ISA hierarchy
  • Joint approach for ontology induction,
    population, and knowledge extraction
  • To appear in ACL (see you in Uppsala :-)

144
OntoUSP
  • Modify the cluster mixture formula
  • InClust(e,c) ^ ISA(c,d) ^ HasValue(e,v)
  • Hierarchical smoothing and clustering
  • New operator in learning

[Figure: the new ABSTRACTION operator. Clusters INDUCE
(induces 0.6, up-regulates 0.2, ...) and INHIBIT (inhibits 0.4,
suppresses 0.2, ...) become ISA children of a new abstract
cluster whose forms include induces, enhances, inhibits,
suppresses and up-regulates. Annotation: "MERGE with REGULATE?"]
145
End of The Beginning
  • Not merely a user guide of MLN and Alchemy
  • Statistical relational learning
  • Growth area for machine learning and NLP

146
Future Work: Inference
  • Scale up inference
  • Cutting-planes methods (e.g., Riedel, 2008)
  • Unify lifted inference with sampling
  • Coarse-to-fine inference
  • Alternative technology
  • E.g., linear programming, Lagrangian relaxation

147
Future Work: Supervised Learning
  • Alternative optimization objectives
  • E.g., max-margin learning [Huynh & Mooney, 2009]
  • Learning for efficient inference
  • E.g., learning arithmetic circuits [Lowd &
    Domingos, 2008]
  • Structure learning:
    Improve accuracy and scalability
  • E.g., [Kok & Domingos, 2009]

148
Future Work: Unsupervised Learning
  • Model: Learning objective, formalism, etc.
  • Learning: Local optima, intractability, etc.
  • Hyperparameter tuning
  • Leverage available resources
  • Semi-supervised learning
  • Multi-task learning
  • Transfer learning (e.g., domain adaptation)
  • Human in the loop
  • E.g., interactive ML, active learning,
    crowdsourcing

149
Future Work: NLP Applications
  • Existing application areas
  • More joint inference opportunities
  • Additional domain knowledge
  • Combine multiple pipeline stages
  • A killer app: Machine reading
  • Many, many more awaiting YOU to discover

150
Summary
  • We need to unify logical and statistical NLP
  • Markov logic provides a language for this
  • Syntax: Weighted first-order formulas
  • Semantics: Feature templates of Markov nets
  • Inference: Satisfiability, MCMC, lifted BP, etc.
  • Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
  • Growing set of NLP applications
  • Open-source software: Alchemy
  • Book: Domingos & Lowd, Markov Logic, Morgan &
    Claypool, 2009.

alchemy.cs.washington.edu
151
References
  • [Banko et al., 2007] Michele Banko, Michael J.
    Cafarella, Stephen Soderland, Matt Broadhead,
    Oren Etzioni, "Open Information Extraction From
    the Web", In Proc. IJCAI-2007.
  • [Chakrabarti et al., 1998] Soumen Chakrabarti,
    Byron Dom, Piotr Indyk, "Hypertext Classification
    Using Hyperlinks", In Proc. SIGMOD-1998.
  • [Damien et al., 1999] Paul Damien, Jon Wakefield,
    Stephen Walker, "Gibbs sampling for Bayesian
    non-conjugate and hierarchical models by
    auxiliary variables", Journal of the Royal
    Statistical Society B, 61:2.
  • [Domingos & Lowd, 2009] Pedro Domingos and Daniel
    Lowd, Markov Logic, Morgan & Claypool.
  • [Friedman et al., 1999] Nir Friedman, Lise
    Getoor, Daphne Koller, Avi Pfeffer, "Learning
    probabilistic relational models", In Proc.
    IJCAI-1999.

152
References
  • [Halpern, 1990] Joe Halpern, "An analysis of
    first-order logics of probability", Artificial
    Intelligence 46.
  • [Huynh & Mooney, 2009] Tuyen Huynh and Raymond
    Mooney, "Max-Margin Weight Learning for Markov
    Logic Networks", In Proc. ECML-2009.
  • [Kautz et al., 1997] Henry Kautz, Bart Selman,
    Yuejun Jiang, "A general stochastic approach to
    solving problems with hard and soft constraints",
    In The Satisfiability Problem: Theory and
    Applications. AMS.
  • [Kok & Domingos, 2007] Stanley Kok and Pedro
    Domingos, "Statistical Predicate Invention", In
    Proc. ICML-2007.
  • [Kok & Domingos, 2008] Stanley Kok and Pedro
    Domingos, "Extracting Semantic Networks from Text
    via Relational Clustering", In Proc. ECML-2008.

153
References
  • [Kok & Domingos, 2009] Stanley Kok and Pedro
    Domingos, "Learning Markov Logic Network
    Structure via Hypergraph Lifting", In Proc.
    ICML-2009.
  • [Ling & Weld, 2010] Xiao Ling and Daniel S.
    Weld, "Temporal Information Extraction", In Proc.
    AAAI-2010.
  • [Lowd & Domingos, 2007] Daniel Lowd and Pedro
    Domingos, "Efficient Weight Learning for Markov
    Logic Networks", In Proc. PKDD-2007.
  • [Lowd & Domingos, 2008] Daniel Lowd and Pedro
    Domingos, "Learning Arithmetic Circuits", In
    Proc. UAI-2008.
  • [Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and
    Sebastian Riedel, "Jointly Identifying
    Predicates, Arguments and Senses using Markov
    Logic", In Proc. NAACL-2009.

154
References
  • [Muggleton, 1996] Stephen Muggleton, "Stochastic
    logic programs", In Proc. ILP-1996.
  • [Nilsson, 1986] Nils Nilsson, "Probabilistic
    logic", Artificial Intelligence 28.
  • [Page et al., 1998] Lawrence Page, Sergey Brin,
    Rajeev Motwani, Terry Winograd, "The PageRank
    Citation Ranking: Bringing Order to the Web",
    Tech. Rept., Stanford University, 1998.
  • [Poon & Domingos, 2006] Hoifung Poon and Pedro
    Domingos, "Sound and Efficient Inference with
    Probabilistic and Deterministic Dependencies", In
    Proc. AAAI-06.
  • [Poon & Domingos, 2007] Hoifung Poon and Pedro
    Domingos, "Joint Inference in Information
    Extraction", In Proc. AAAI-07.

155
References
  • [Poon & Domingos, 2008a] Hoifung Poon, Pedro
    Domingos, Marc Sumner, "A General Method for
    Reducing the Complexity of Relational Inference
    and its Application to MCMC", In Proc. AAAI-08.
  • [Poon & Domingos, 2008b] Hoifung Poon and Pedro
    Domingos, "Joint Unsupervised Coreference
    Resolution with Markov Logic", In Proc. EMNLP-08.
  • [Poon & Domingos, 2009] Hoifung Poon and Pedro
    Domingos, "Unsupervised Semantic Parsing", In
    Proc. EMNLP-09.
  • [Poon, Cherry & Toutanova, 2009] Hoifung Poon,
    Colin Cherry, Kristina Toutanova, "Unsupervised
    Morphological Segmentation with Log-Linear
    Models", In Proc. NAACL-2009.

156
References
  • [Poon & Vanderwende, 2010] Hoifung Poon and Lucy
    Vanderwende, "Joint Inference for Knowledge
    Extraction from Biomedical Literature", In Proc.
    NAACL-10.
  • [Poon & Domingos, 2010] Hoifung Poon and Pedro
    Domingos, "Unsupervised Ontology Induction From
    Text", In Proc. ACL-10.
  • [Riedel, 2008] Sebastian Riedel, "Improving the
    Accuracy and Efficiency of MAP Inference for
    Markov Logic", In Proc. UAI-2008.
  • [Riedel et al., 2009] Sebastian Riedel, Hong-Woo
    Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A
    Markov Logic Approach to Bio-Molecular Event
    Extraction", In Proc. BioNLP 2009 Shared Task.
  • [Selman et al., 1996] Bart Selman, Henry Kautz,
    Bram Cohen, "Local search strategies for
    satisfiability testing", In Cliques, Coloring,
    and Satisfiability: Second DIMACS Implementation
    Challenge. AMS.

157
References
  • [Singla & Domingos, 2006a] Parag Singla and Pedro
    Domingos, "Memory-Efficient Inference in
    Relational Domains", In Proc. AAAI-2006.
  • [Singla & Domingos, 2006b] Parag Singla and Pedro
    Domingos, "Entity Resolution with Markov Logic",
    In Proc. ICDM-2006.
  • [Singla & Domingos, 2007] Parag Singla and Pedro
    Domingos, "Markov Logic in Infinite Domains", In
    Proc. UAI-2007.
  • [Singla & Domingos, 2008] Parag Singla and Pedro
    Domingos, "Lifted First-Order Belief
    Propagation", In Proc. AAAI-2008.
  • [Taskar et al., 2002] Ben Taskar, Pieter Abbeel,
    Daphne Koller, "Discriminative probabilistic
    models for relational data", In Proc. UAI-2002.

158
References
  • [Toutanova, Haghighi & Manning, 2008] Kristina
    Toutanova, Aria Haghighi, Chris Manning, "A
    global joint model for semantic role labeling",
    Computational Linguistics.
  • [Wang & Domingos, 2008] Jue Wang and Pedro
    Domingos, "Hybrid Markov Logic Networks", In
    Proc. AAAI-2008.
  • [Wellman et al., 1992] Michael Wellman, John S.
    Breese, Robert P. Goldman, "From knowledge bases
    to decision models", Knowledge Engineering Review
    7.
  • [Yoshikawa et al., 2009] Katsumasa Yoshikawa,
    Sebastian Riedel, Masayuki Asahara and Yuji
    Matsumoto, "Jointly Identifying Temporal
    Relations with Markov Logic", In Proc. ACL-2009.