Title: Statistical Relational Learning for Knowledge Extraction from the Web
Slide 1: Statistical Relational Learning for Knowledge Extraction from the Web
- Hoifung Poon
- Dept. of Computer Science & Engineering
- University of Washington
Slide 2: Drowning in Information, Starved for Knowledge
WWW
Slide 3: Great Vision: Knowledge Extraction from the Web
Craven et al., "Learning to Construct Knowledge Bases from the World Wide Web," Artificial Intelligence, 1999.
- Also need:
- Knowledge representation and reasoning
- Close the loop: Apply knowledge to extraction
- Machine reading [Etzioni et al., 2007]
Slide 4: Machine Reading: Text → Knowledge
Slide 5: Rapidly Growing Interest
- AAAI-07 Spring Symposium on Machine Reading
- DARPA Machine Reading Program (2009-2014)
- NAACL-10 Workshop on Learning By Reading
- Etc.
Slide 6: Great Impact
- Scientific inquiry and commercial applications
- Literature-based discovery, robot scientists
- Question answering, semantic search
- Drug design, medical diagnosis
- Breach the knowledge acquisition bottleneck for AI and natural language understanding
- Automatically semantify the Web
- Etc.
Slide 7: This Talk
- Statistical relational learning offers promising solutions to machine reading
- Markov logic is a leading unifying framework
- A success story: USP
- Unsupervised, end-to-end machine reading
- Extracts five times as many correct answers as the state of the art, with the highest accuracy of 91%
Slide 8: USP Question-Answer Example
Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
Slide 9: Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions
Slide 10: Key Challenges
- Complexity
- Uncertainty
- Pipeline accumulates errors
- Supervision is scarce
Slide 11: Languages Are Structural
- Morphology: governments; lmpxtm (Hebrew: "according to their families")
- Syntax: IL-4 induces CD11B
- Semantics: Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ...
- Discourse: George Walker Bush was the 43rd President of the United States. Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue.
Slide 12: Languages Are Structural
- Morphology: govern-ment-s; l-mpx-t-m (Hebrew: "according to their families")
- Syntax: IL-4 induces CD11B [parse tree: S → NP VP; VP → V NP]
- Semantics: Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ... [event graph: involvement (Theme: up-regulation, Cause: activation); up-regulation (Theme: IL-10, Cause: gp41, Site: human monocyte); activation (Theme: p70(S6)-kinase)]
- Discourse: George Walker Bush was the 43rd President of the United States. Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue.
Slide 13: Knowledge Is Heterogeneous
- Individuals
- E.g., Socrates is a man
- Types
- E.g., Man is mortal
- Inference rules
- E.g., Syllogism
- Ontological relations
- Etc.
[Ontology diagram with ISA and ISPART links among MAMMAL, HUMAN, FACE, and EYE]
Slide 14: Complexity
- Can handle using first-order logic
- Trees, graphs, dependencies, hierarchies, etc. easily expressed
- Inference algorithms (satisfiability testing, theorem proving, etc.)
- But logic is brittle in the face of uncertainty
Slide 15: Languages Are Ambiguous
- Paraphrases: Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, ...
- Attachment ambiguity: I saw the man with the telescope (does "with the telescope" modify the man or the seeing?)
- Coreference: G. W. Bush / Laura Bush / Mrs. Bush: which one?
- Entity ambiguity: "Here in London, Frances Deek is a retired teacher ... In the Israeli town, Karen London says ... Now London says ..." Is London a PERSON or a LOCATION?
Slide 16: Knowledge Has Uncertainty
- We need to model correlations
- Our information is always incomplete
- Our predictions are uncertain
Slide 17: Uncertainty
- Statistics provides the tools to handle this
- Mixture models
- Hidden Markov models
- Bayesian networks
- Markov random fields
- Maximum entropy models
- Conditional random fields
- Etc.
- But statistical models assume i.i.d. (independently and identically distributed) data: objects → feature vectors
Slide 18: The Pipeline Is Suboptimal
- E.g., the NLP pipeline
- Tokenization → Morphology → Chunking → Syntax → ...
- Accumulates and propagates errors
- Wanted: Joint inference
- Across all processing stages
- Among all interdependent objects
Slide 19: Supervision Is Scarce
- Tons of text, but most of it is not annotated
- Labeling is expensive (cf. the Penn Treebank)
- → Need to leverage indirect supervision
Slide 20: Redundancy
- Key source of indirect supervision
- State-of-the-art systems depend on this
- E.g., TextRunner [Banko et al., 2007]
- But the Web is heterogeneous: Long tail
- Redundancy is only present in the head regime
Slide 21: Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions
Slide 22: Statistical Relational Learning
- Burgeoning field in machine learning
- Offers promising solutions for machine reading
- Unifies statistical and logical approaches
- Replaces the pipeline with joint inference
- Principled framework to leverage both direct and indirect supervision
Slide 23: Machine Reading: A Vision
Challenge: Long tail
Slide 24: Machine Reading: A Vision
Slide 25: Challenges in Applying Statistical Relational Learning
- Learning is much harder
- Inference becomes a crucial issue
- Greater complexity for the user
Slide 26: Progress to Date
- Probabilistic logic [Nilsson, 1986]
- Statistics and beliefs [Halpern, 1990]
- Knowledge-based model construction [Wellman et al., 1992]
- Stochastic logic programs [Muggleton, 1996]
- Probabilistic relational models [Friedman et al., 1999]
- Relational Markov networks [Taskar et al., 2002]
- Markov logic [Domingos & Lowd, 2009]
- Etc.
Slide 27: Progress to Date
(Same list as Slide 26)
Markov logic [Domingos & Lowd, 2009]: Leading unifying framework
Slide 28: Overview
- Machine reading
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions
Slide 29: Markov Networks
- Undirected graphical models
[Example graph over Smoking, Cancer, Cough, and Asthma]
- P(x) = (1/Z) exp(Σ_i w_i f_i(x)), where w_i is the weight of feature i and f_i(x) is feature i
Slide 30: First-Order Logic
- Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x,y)
- Grounding: Replace all variables by constants. E.g.: Friends(Anna, Bob)
- World (model, interpretation): Assignment of truth values to all ground predicates
Slide 31: Markov Logic
- Intuition: Soften logical constraints
- Syntax: Weighted first-order formulas
- Semantics: Feature templates for Markov networks
- A Markov Logic Network (MLN) is a set of pairs (F_i, w_i) where
- F_i is a formula in first-order logic
- w_i is a real number
- P(x) = (1/Z) exp(Σ_i w_i n_i(x)), where n_i(x) is the number of true groundings of F_i
Slides 32-35: Example: Friends & Smokers
- Probabilistic graphical models and first-order logic are special cases
- Two constants: Anna (A) and Bob (B)
- Ground atoms: Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)
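The weighted-formula semantics can be made concrete with a tiny brute-force sketch (not Alchemy): ground two Friends & Smokers style formulas over Anna and Bob, weight them, and normalize over all 256 worlds. The formulas and weights here are illustrative assumptions, not values from the slides.

```python
import math
from itertools import product

CONSTANTS = ["A", "B"]  # Anna, Bob

# All ground atoms over the two constants.
ATOMS = ([("Smokes", c) for c in CONSTANTS]
         + [("Cancer", c) for c in CONSTANTS]
         + [("Friends", x, y) for x in CONSTANTS for y in CONSTANTS])

# Illustrative weighted formulas (assumed, not from the slides):
#   w1: Smokes(x) => Cancer(x)
#   w2: Friends(x,y) => (Smokes(x) <=> Smokes(y))
W1, W2 = 1.5, 1.1

def n_true_groundings(world):
    """n_i(x): number of true groundings of each formula in this world."""
    n1 = sum((not world[("Smokes", x)]) or world[("Cancer", x)]
             for x in CONSTANTS)
    n2 = sum((not world[("Friends", x, y)])
             or (world[("Smokes", x)] == world[("Smokes", y)])
             for x in CONSTANTS for y in CONSTANTS)
    return n1, n2

def unnormalized(world):
    n1, n2 = n_true_groundings(world)
    return math.exp(W1 * n1 + W2 * n2)

# Enumerate all 2^8 possible worlds.
WORLDS = [dict(zip(ATOMS, vals))
          for vals in product([False, True], repeat=len(ATOMS))]

# Conditional query: P(Cancer(A) | Smokes(A)).
num = sum(unnormalized(w) for w in WORLDS
          if w[("Cancer", "A")] and w[("Smokes", "A")])
den = sum(unnormalized(w) for w in WORLDS if w[("Smokes", "A")])
p = num / den
print(round(p, 3))  # → 0.818, i.e., sigmoid(W1): only formula 1 touches Cancer(A)
```

Real MLN systems avoid this exponential enumeration; that is exactly where MC-SAT, lifted inference, and the other algorithms on the following slides come in.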
Slide 36: MLN Algorithms: The First Three Generations
Problem            | First generation        | Second generation | Third generation
MAP inference      | Weighted satisfiability | Lazy inference    | Cutting planes
Marginal inference | Gibbs sampling          | MC-SAT            | Lifted inference
Weight learning    | Pseudo-likelihood       | Voted perceptron  | Scaled conj. gradient
Structure learning | Inductive logic progr.  | ILP + PL (etc.)   | Clustering + pathfinding
Slide 37: Efficient Inference
- Logical or statistical inference alone is already hard
- But we can do approximate inference
- Suffices to perform well in most cases
- Combine ideas from both camps
- E.g., MC-SAT = MCMC + SAT solver
- Can also leverage sparsity in relational domains
More: Poon & Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies," in Proc. AAAI-2006.
More: Poon, Domingos & Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC," in Proc. AAAI-2008.
Slide 38: Weight Learning
- Probability model P(X)
- X: Observable in training data
- Maximize the likelihood of the observed data
- Regularization to prevent overfitting
Slide 39: Weight Learning
- Gradient descent (requires inference)
- ∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i], i.e., (no. of times clause i is true in the data) minus (expected no. of times clause i is true according to the MLN)
- Use MC-SAT for inference
- Can also leverage second-order information [Lowd & Domingos, 2007]
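The gradient above (true counts in the data minus expected counts under the model) can be checked on a toy log-linear model where the expectation is computed by exact enumeration instead of MC-SAT. The two "clauses", the data, and the learning rate are illustrative assumptions.

```python
import math
from itertools import product

# Worlds are truth assignments to two ground atoms (a, b); each feature is
# the number of true groundings of one clause (0 or 1 here).
def features(world):
    a, b = world
    return [1.0 if (not a or b) else 0.0,   # clause 1: a => b
            1.0 if (a or b) else 0.0]       # clause 2: a v b

WORLDS = list(product([False, True], repeat=2))

def expected_counts(w):
    """E_w[n_i] by exact enumeration (stand-in for MC-SAT)."""
    scores = [math.exp(sum(wi * fi for wi, fi in zip(w, features(x))))
              for x in WORLDS]
    Z = sum(scores)
    return [sum(s / Z * features(x)[i] for s, x in zip(scores, WORLDS))
            for i in range(2)]

# Illustrative "training data": a small sample of observed worlds.
DATA = [(True, True), (False, False), (True, False), (True, False)]
data_counts = [sum(features(x)[i] for x in DATA) / len(DATA) for i in range(2)]

w = [0.0, 0.0]
for _ in range(500):  # gradient ascent: grad_i = n_i(data) - E_w[n_i]
    grad = [dc - ec for dc, ec in zip(data_counts, expected_counts(w))]
    w = [wi + 0.5 * gi for wi, gi in zip(w, grad)]

# At the optimum, expected counts match the empirical counts.
print([round(c, 2) for c in expected_counts(w)])  # → [0.5, 0.75]
```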
Slide 40: Unsupervised Learning: How?
- I.I.D. learning: A more sophisticated model requires more labeled data
- Statistical relational learning: A more sophisticated model may require less labeled data
- Ambiguities vary among objects
- Joint inference → Propagate information from unambiguous objects to ambiguous ones
- One formula is worth a thousand labels
- A small amount of domain knowledge → large-scale joint inference
Slide 41: Unsupervised Weight Learning
- Probability model P(X,Z)
- X: Observed in training data
- Z: Hidden variables
- E.g., clustering with mixture models
- Z: Cluster assignment
- X: Observed features
- Maximize the likelihood of the observed data by summing out the hidden variables Z
Slide 42: Unsupervised Weight Learning
- Gradient descent: ∂/∂w_i log P_w(x) = E_{z|x}[n_i] − E_{x,z}[n_i] (the first expectation sums over z conditioned on the observed x; the second sums over both x and z)
- Use MC-SAT to compute both expectations
- May also combine with contrastive estimation
More: Poon, Cherry & Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models," in Proc. NAACL-2009. (Best Paper Award)
Slide 43: Markov Logic
- Unified inference and learning algorithms
- → Can handle millions of variables, billions of features, tens of thousands of parameters
- Easy-to-use software: Alchemy
- Many successful applications
- E.g., information extraction, coreference resolution, semantic parsing, ontology induction
Slide 44: Pipeline → Joint Inference
- Combine segmentation and entity resolution for information extraction
- Extract complex and nested bio-events from PubMed abstracts
More: Poon & Domingos, "Joint Inference for Information Extraction," in Proc. AAAI-2007.
More: Poon & Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature," in Proc. NAACL-2010.
Slide 45: Unsupervised Learning: Example
- Coreference resolution: Accuracy comparable to the previous supervised state of the art
More: Poon & Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic," in Proc. EMNLP-2008.
Slide 46: Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions
Slide 47: Unsupervised Semantic Parsing
- USP [Poon & Domingos, EMNLP-09] (Best Paper Award)
- First unsupervised approach for semantic parsing
- End-to-end machine reading system
- Read text, answer questions
- OntoUSP = USP + Ontology Induction [Poon & Domingos, ACL-10]
- Encoded in a few Markov logic formulas
Slide 48: Semantic Parsing
Goal: Microsoft buys Powerset → BUY(MICROSOFT, POWERSET)
Challenge: Microsoft buys Powerset / Microsoft acquires semantic search engine Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, ...
Slide 49: Limitations of Existing Approaches
- Manual grammar or supervised learning
- Applicable to restricted domains only
- For general text
- Not clear what predicates and objects to use
- Hard to produce consistent meaning annotation
- Also, often learn both syntax and semantics
- Fail to leverage advanced syntactic parsers
- Make semantic parsing harder
Slide 50: USP Key Idea 1
- Target predicates and objects can be learned
- Viewed as clusters of syntactic or lexical variations of the same meaning
- BUY(-,-) = { buys, acquires, 's purchase of, ... } → Cluster of various expressions for acquisition
- MICROSOFT = { Microsoft, the Redmond software giant, ... } → Cluster of various mentions of Microsoft
Slide 51: USP Key Idea 2
- Relational clustering → Cluster relations with the same objects
- USP → Recursively cluster arbitrary expressions with similar subexpressions
- Microsoft buys Powerset
- Microsoft acquires semantic search engine Powerset
- Powerset is acquired by Microsoft Corporation
- The Redmond software giant buys Powerset
- Microsoft's purchase of Powerset, ...
Slide 52: USP Key Idea 2 (cont.)
- (Same example) Cluster the same forms at the atom level
Slides 53-55: USP Key Idea 2 (cont.)
- (Same example) Cluster forms in composition with the same forms
Slide 56: USP Key Idea 3
- Start directly from syntactic analyses
- Focus on translating them to semantics
- Leverages rapid progress in syntactic parsing
- Much easier than learning both
Slide 57: Joint Inference in USP
- Forms a canonical meaning representation by recursively clustering synonymous expressions
- Text → Logical form in this representation
- Induces an ISA hierarchy among clusters and applies hierarchical smoothing (shrinkage)
Slide 58: USP System Overview
- Input: Dependency trees for sentences
- Converts dependency trees into quasi-logical forms (QLFs)
- Starts with QLF clusters at the atom level
- Recursively builds up clusters of larger forms
- Output:
- Probability distribution over QLF clusters and their compositions
- MAP semantic parses of sentences
Slides 59-62: Generating Quasi-Logical Forms
[Dependency tree: buys --nsubj--> Microsoft, --dobj--> Powerset]
- Convert each node into a unary atom: buys(n1), Microsoft(n2), Powerset(n3), where n1, n2, n3 are Skolem constants
- Convert each edge into a binary atom: nsubj(n1,n2), dobj(n1,n3)
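The node-and-edge conversion is mechanical; a minimal sketch (with an assumed list-based tree encoding) is:

```python
def to_qlf(tokens, edges):
    """tokens: words by index; edges: (head_idx, dep_idx, label) triples.
    Each node becomes a unary atom over a Skolem constant n1, n2, ...;
    each dependency edge becomes a binary atom over the two constants."""
    skolem = {i: f"n{i + 1}" for i in range(len(tokens))}
    unary = [f"{tok}({skolem[i]})" for i, tok in enumerate(tokens)]
    binary = [f"{lab}({skolem[h]},{skolem[d]})" for h, d, lab in edges]
    return unary + binary

# "Microsoft buys Powerset": buys --nsubj--> Microsoft, --dobj--> Powerset
atoms = to_qlf(["buys", "Microsoft", "Powerset"],
               [(0, 1, "nsubj"), (0, 2, "dobj")])
print(atoms)
# → ['buys(n1)', 'Microsoft(n2)', 'Powerset(n3)', 'nsubj(n1,n2)', 'dobj(n1,n3)']
```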
Slides 63-67: A Semantic Parse
- Partition the QLF into subformulas: buys(n1); nsubj(n1,n2); dobj(n1,n3); Microsoft(n2); Powerset(n3)
- Subformula → Lambda form: Replace each Skolem constant not in a unary atom with a unique lambda variable: buys(n1); λx2.nsubj(n1,x2); λx3.dobj(n1,x3); Microsoft(n2); Powerset(n3)
- Core form: No lambda variable. Argument form: One lambda variable. Here buys(n1) is a core form; λx2.nsubj(n1,x2) and λx3.dobj(n1,x3) are argument forms
- Assign each subformula to an object cluster: buys(n1) → BUY; Microsoft(n2) → MICROSOFT; Powerset(n3) → POWERSET
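One way to sketch the lambda-abstraction step in code, assuming each partition part has a designated head Skolem constant (my simplification of the slides' "not in a unary atom" rule):

```python
import re

def lambda_form(subformula, head):
    """Abstract every Skolem constant in the part except its head.
    Returns ('core', ...) if nothing was abstracted, else ('argument', ...)."""
    body = " ^ ".join(subformula)
    others = sorted(set(re.findall(r"n\d+", body)) - {head})
    for c in others:
        var = "x" + c[1:]                  # n2 -> x2
        body = f"lambda {var}." + body.replace(c, var)
    return ("core" if not others else "argument"), body

print(lambda_form(["buys(n1)"], "n1"))      # → ('core', 'buys(n1)')
print(lambda_form(["nsubj(n1,n2)"], "n1"))  # → ('argument', 'lambda x2.nsubj(n1,x2)')
```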
Slide 68: Object Cluster: BUY
- Distribution over core forms, e.g., buys(n1): 0.1; acquires(n1): 0.2
- One formula in the MLN: Learn a weight for each pair of cluster and core form
Slide 69: Object Cluster: BUY
- Property clusters: BUYER, BOUGHT, PRICE
- May contain a variable number of property clusters
Slide 70: Property Cluster: BUYER
- Three MLN formulas: distributions over argument forms, argument clusters, and number
- E.g., argument forms: λx2.nsubj(n1,x2): 0.5, λx2.agent(n1,x2): 0.4; clusters: MICROSOFT: 0.2, GOOGLE: 0.1; number: Zero: 0.1, One: 0.8
Slide 71: Probabilistic Model
- Exponential prior on the number of parameters
- Cluster mixtures
[Diagram: Object Cluster BUY with its core-form distribution; Property Cluster BUYER with its distributions over argument forms, clusters, and number]
Slide 72: Probabilistic Model
- Exponential prior on the number of parameters
- Cluster mixtures with hierarchical smoothing
- E.g., picking MICROSOFT as the BUYER argument depends not only on BUY, but also on its ISA ancestors
Slide 73: Abstract Lambda Form
- buys(n1); λx2.nsubj(n1,x2); λx3.dobj(n1,x3)
- → BUYS(n1); λx2.BUYER(n1,x2); λx3.BOUGHT(n1,x3)
- The final logical form is obtained via lambda reduction
Slide 74: Challenge: State Space Too Large
- Potential number of clusters ∝ exp(number of tokens)
- Also, meaning units and clusters are often small
- → Use combinatorial search
Slide 75: Inference: Find MAP Parse
[Example: "IL-4 induces CD11B"; induces has nsubj and dobj children; "protein" composes with its nn child "IL-4"]
- Initialize with the syntactic parse
- Search operator: Lambda reduction
Slide 76: Learning: Greedily Maximize Posterior
- Initialize: one cluster per form (e.g., enhances: 1.0; induces: 1.0; amino: 1.0; acid: 1.0)
- Search operators:
- MERGE: e.g., merge the induces and enhances clusters into one (induces: 0.2, enhances: 0.8)
- COMPOSE: e.g., compose amino and acid into the multiword form amino acid (1.0)
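A toy sketch of the two operators, using raw counts in place of the learned posterior (the cluster contents and counts are illustrative, chosen to reproduce the 0.2/0.8 split on the slide):

```python
def merge(c1, c2):
    """MERGE: pool two clusters' form counts into one distribution."""
    counts = dict(c1)
    for form, n in c2.items():
        counts[form] = counts.get(form, 0) + n
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

def compose(f1, f2):
    """COMPOSE: treat two frequently co-occurring forms as one multiword form."""
    return f"{f1} {f2}"

merged = merge({"induces": 2}, {"enhances": 8})
print(merged)                    # → {'induces': 0.2, 'enhances': 0.8}
print(compose("amino", "acid"))  # → 'amino acid'
```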
Slide 77: Operator: ABSTRACT
- MERGE with REGULATE?
[Diagram: ABSTRACT creates a parent cluster REGULATE with ISA links to INDUCE (induces, enhances, up-regulates) and INHIBIT (inhibits, suppresses)]
- Captures substantial similarities
Slide 78: Experiments
- Apply to machine reading
- Extract knowledge from text and answer questions
- Evaluation: Number of answers and accuracy
- GENIA dataset: 1999 PubMed abstracts
- Use simple factoid questions, e.g.:
- What does anti-STAT1 inhibit?
- What regulates MIP-1 alpha?
Slide 79: Total and Correct Answers
[Chart comparing KW-SYN, TextRunner, RESOLVER, DIRT, and USP]
USP extracted five times as many correct answers as TextRunner, with the highest precision of 91%
Slide 80: Qualitative Analysis
- Resolves many nontrivial variations
- Argument forms that mean the same, e.g.:
- expression of X ⇔ X expression
- X stimulates Y ⇔ Y is stimulated with X
- Active vs. passive voice
- Synonymous expressions
- Etc.
Slide 81: Clusters and Compositions
- Clusters in core forms
- { investigate, examine, evaluate, analyze, study, assay }
- { diminish, reduce, decrease, attenuate }
- { synthesis, production, secretion, release }
- { dramatically, substantially, significantly }
- Compositions
- amino acid, t cell, immune response, transcription factor, initiation site, binding site
Slide 82: Question-Answer Example
Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
Slide 83: Overview
- Machine reading
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions
Slide 84: Web-Scale Joint Inference
- Challenge: Efficiently identify the relevant
- Key: Induce and leverage an ontology
- Ontology → Capture essential properties; abstract away unimportant variations
- Upper-level nodes → Skip irrelevant branches
- Wanted: Combine the following
- Probabilistic ontology induction (e.g., USP)
- Coarse-to-fine learning and inference [Felzenszwalb & McAllester, 2007; Petrov, Ph.D. Thesis]
Slide 85: Knowledge Reasoning
- Most facts/rules are not explicitly stated
- Dark matter in the natural language universe
- kale contains calcium ∧ calcium prevents osteoporosis → kale prevents osteoporosis
- Keys:
- Induce generic reasoning patterns
- Incorporate reasoning in extraction
- Additional sources of indirect supervision
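The slide's example inference can be sketched as a one-rule forward chainer; the specific rule pattern (contains composed with prevents) is an illustrative assumption:

```python
def forward_chain(facts):
    """Repeatedly apply: contains(X, Y) & prevents(Y, Z) => prevents(X, Z)."""
    derived = set(facts)
    while True:
        new = {("prevents", x, z)
               for (r1, x, y) in derived if r1 == "contains"
               for (r2, y2, z) in derived if r2 == "prevents" and y2 == y}
        if new <= derived:
            return derived
        derived |= new

facts = {("contains", "kale", "calcium"),
         ("prevents", "calcium", "osteoporosis")}
print(("prevents", "kale", "osteoporosis") in forward_chain(facts))  # → True
```

In practice such rules would carry weights and be induced rather than hand-written, which is exactly the Markov logic setting described earlier.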
Slides 86-89: Harness Social Computing
- Bootstrap an online community
- Incorporate human end tasks in the loop (e.g., "Tell me everything about dicer applied to synapse"; "Your extraction from my paper is correct except for blah")
- Form a positive feedback loop
[Diagram: Knowledge Base at the center]
Slide 90: Acknowledgments
- Pedro Domingos, Colin Cherry, Kristina Toutanova, Lucy Vanderwende, Oren Etzioni, Dan Weld, Matt Richardson, Parag Singla, Stanley Kok, Daniel Lowd, Marc Sumner
- ARO, AFRL, ONR, DARPA, NSF
Slide 91: Summary
- Statistical relational learning offers promising solutions for machine reading
- Markov logic provides a language for this
- Syntax: Weighted first-order logical formulas
- Semantics: Feature templates of Markov networks
- Open-source software: Alchemy
- A success story: USP
- Three key research directions
alchemy.cs.washington.edu
alchemy.cs.washington.edu/papers/poon09