Statistical Relational Learning for Knowledge Extraction from the Web - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Statistical Relational Learning for Knowledge Extraction from the Web


1
Statistical Relational Learning for Knowledge
Extraction from the Web
  • Hoifung Poon
  • Dept. of Computer Science & Engineering
  • University of Washington

2
Drowning in Information, Starved for Knowledge
WWW
3
Great Vision: Knowledge Extraction from the Web
Craven et al., "Learning to Construct Knowledge
Bases from the World Wide Web," Artificial
Intelligence, 1999.
  • Also need:
  • Knowledge representation and reasoning
  • Close the loop: Apply knowledge to extraction
  • Machine reading [Etzioni et al., 2007]

4
Machine Reading: Text → Knowledge

5
Rapidly Growing Interest
  • AAAI-07 Spring Symposium on Machine Reading
  • DARPA Machine Reading Program (2009-2014)
  • NAACL-10 Workshop on Learning By Reading
  • Etc.

6
Great Impact
  • Scientific inquiry and commercial applications
  • Literature-based discovery, robot scientists
  • Question answering, semantic search
  • Drug design, medical diagnosis
  • Break the knowledge acquisition bottleneck for
    AI and natural language understanding
  • Automatically semantify the Web
  • Etc.

7
This Talk
  • Statistical relational learning offers promising
    solutions to machine reading
  • Markov logic is a leading unifying framework
  • A success story: USP
  • Unsupervised, end-to-end machine reading
  • Extracts five times as many correct answers as
    the state of the art, with the highest accuracy of 91%

8
USP Question-Answer Example
Interestingly, the DEX-mediated IkappaBalpha
induction was completely inhibited by IL-2, but
not IL-4, in Th1 cells, while the reverse profile
was seen in Th2 cells.
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
9
Overview
  • Machine reading: Challenges
  • Statistical relational learning
  • Markov logic
  • USP: Unsupervised Semantic Parsing
  • Research directions

10
Key Challenges
  • Complexity
  • Uncertainty
  • Pipeline accumulates errors
  • Supervision is scarce

11
Languages Are Structural
governments
lmpxtm (Hebrew: "according to their families")
IL-4 induces CD11B
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 ...
George Walker Bush was the 43rd President of the
United States. Bush was the eldest son of
President G. H. W. Bush and Barbara Bush. ... In
November 1977, he met Laura Welch at a barbecue.
12
Languages Are Structural
[Parse tree: S → NP VP, VP → V NP]
govern-ment-s
l-mpx-t-m (Hebrew: "according to their families")
IL-4 induces CD11B
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 ...
George Walker Bush was the 43rd President of the
United States. Bush was the eldest son of
President G. H. W. Bush and Barbara Bush. ... In
November 1977, he met Laura Welch at a barbecue.
[Event graph: involvement, up-regulation, and activation
events linked by Theme/Cause/Site roles to human monocyte,
IL-10, gp41, and p70(S6)-kinase]
13
Knowledge Is Heterogeneous
  • Individuals
  • E.g. Socrates is a man
  • Types
  • E.g. Man is mortal
  • Inference rules
  • E.g. Syllogism
  • Ontological relations
  • Etc.

[Diagram: ISA/ISPART ontology over MAMMAL, HUMAN, FACE, EYE]
14
Complexity
  • Can handle using first-order logic
  • Trees, graphs, dependencies, hierarchies, etc.
    easily expressed
  • Inference algorithms (satisfiability testing,
    theorem proving, etc.)
  • But logic is brittle with uncertainty

15
Languages Are Ambiguous
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft's purchase of Powerset, ...
I saw the man with the telescope
[Diagram: two parses, attaching "with the telescope"
to either the noun phrase or the verb]
G. W. Bush ... Laura Bush ... Mrs. Bush: Which one?
Here in London, Frances Deek is a retired teacher ...
In the Israeli town ..., Karen London says ... Now
London says ...
London → PERSON or LOCATION?
16
Knowledge Has Uncertainty
  • We need to model correlations
  • Our information is always incomplete
  • Our predictions are uncertain

17
Uncertainty
  • Statistics provides the tools to handle this
  • Mixture models
  • Hidden Markov models
  • Bayesian networks
  • Markov random fields
  • Maximum entropy models
  • Conditional random fields
  • Etc.
  • But statistical models assume i.i.d.
    (independently and identically distributed)
    data: objects → feature vectors

18
Pipeline is Suboptimal
  • E.g., the NLP pipeline:
  • Tokenization → Morphology → Chunking → Syntax → ...
  • Accumulates and propagates errors
  • Wanted: Joint inference
  • Across all processing stages
  • Among all interdependent objects

19
Supervision is Scarce
  • Tons of text but most is not annotated
  • Labeling is expensive (Cf. Penn-Treebank)
  • → Need to leverage indirect supervision

20
Redundancy
  • Key source of indirect supervision
  • State-of-the-art systems depend on this
  • E.g., TextRunner [Banko et al., 2007]
  • But the Web is heterogeneous: Long tail
  • Redundancy is only present in the head regime

21
Overview
  • Machine reading: Challenges
  • Statistical relational learning
  • Markov logic
  • USP: Unsupervised Semantic Parsing
  • Research directions

22
Statistical Relational Learning
  • Burgeoning field in machine learning
  • Offers promising solutions for machine reading
  • Unify statistical and logical approaches
  • Replace pipeline with joint inference
  • Principled framework to leverage both
    direct and indirect supervision

23
Machine Reading A Vision
Challenge: Long tail
24
Machine Reading A Vision
25
Challenges in Applying Statistical Relational
Learning
  • Learning is much harder
  • Inference becomes a crucial issue
  • Greater complexity for user

26
Progress to Date
  • Probabilistic logic [Nilsson, 1986]
  • Statistics and beliefs [Halpern, 1990]
  • Knowledge-based model construction [Wellman et al., 1992]
  • Stochastic logic programs [Muggleton, 1996]
  • Probabilistic relational models [Friedman et al., 1999]
  • Relational Markov networks [Taskar et al., 2002]
  • Markov logic [Domingos & Lowd, 2009]
  • Etc.

27
Progress to Date
  • Probabilistic logic [Nilsson, 1986]
  • Statistics and beliefs [Halpern, 1990]
  • Knowledge-based model construction [Wellman et al., 1992]
  • Stochastic logic programs [Muggleton, 1996]
  • Probabilistic relational models [Friedman et al., 1999]
  • Relational Markov networks [Taskar et al., 2002]
  • Markov logic [Domingos & Lowd, 2009]: the leading unifying framework
  • Etc.

28
Overview
  • Machine reading
  • Statistical relational learning
  • Markov logic
  • USP: Unsupervised Semantic Parsing
  • Research directions

29
Markov Networks
  • Undirected graphical models

Cancer
Smoking
Cough
Asthma
  • Log-linear model: P(x) = (1/Z) exp(Σi wi fi(x))
    (wi: weight of feature i; fi(x): feature i)
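The log-linear form above can be made concrete with a tiny sketch over the slide's four variables. The two features and their weights below are invented for illustration; a real Markov network would have its own feature set.

```python
import itertools
import math

# The slide's example variables; each world is a truth assignment.
VARS = ["Smoking", "Cancer", "Cough", "Asthma"]

# Hypothetical weighted features (w_i, f_i): f_i maps a world to 0/1.
features = [
    (1.5, lambda w: w["Smoking"] and w["Cancer"]),
    (1.1, lambda w: w["Smoking"] and w["Cough"]),
]

def score(world):
    # Unnormalized weight of a world: exp(sum_i w_i * f_i(x)).
    return math.exp(sum(wt for wt, f in features if f(world)))

# Enumerate all 2^4 worlds and compute the partition function Z.
worlds = [dict(zip(VARS, vals))
          for vals in itertools.product([False, True], repeat=len(VARS))]
Z = sum(score(w) for w in worlds)

def prob(world):
    # P(x) = (1/Z) exp(sum_i w_i f_i(x))
    return score(world) / Z
```

With these made-up weights, a world where Smoking and Cancer co-occur gets a higher probability than an otherwise identical world without Cancer, which is the point of the soft features.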
30
First-Order Logic
  • Constants, variables, functions, predicates.
    E.g.: Anna, x, MotherOf(x), Friends(x,y)
  • Grounding: Replace all variables by constants.
    E.g.: Friends(Anna, Bob)
  • World (model, interpretation): Assignment of
    truth values to all ground predicates

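Grounding as described above can be sketched in a few lines. The predicates, arities, and constants below are hypothetical, chosen to match the slide's Friends-and-Smokers vocabulary.

```python
import itertools

# Hypothetical predicate signatures (name -> arity) and constants.
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}
constants = ["Anna", "Bob"]

def ground_atoms():
    # Grounding: replace the variables of each predicate by
    # constants in every possible way.
    atoms = []
    for name, arity in predicates.items():
        for args in itertools.product(constants, repeat=arity):
            atoms.append(f"{name}({','.join(args)})")
    return atoms

atoms = ground_atoms()
# A world assigns a truth value to every ground atom,
# so there are 2 ** len(atoms) possible worlds.
```

With two constants this yields 2 + 2 + 4 = 8 ground atoms, hence 256 possible worlds.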
31
Markov Logic
  • Intuition: Soften logical constraints
  • Syntax: Weighted first-order formulas
  • Semantics: Feature templates for Markov networks
  • A Markov Logic Network (MLN) is a set of pairs
    (Fi, wi) where
  • Fi is a formula in first-order logic
  • wi is a real number

P(x) = (1/Z) exp(Σi wi ni(x)), where ni(x) is the
number of true groundings of Fi
35
Example: Friends & Smokers
Probabilistic graphical models and first-order
logic are special cases
Two constants: Anna (A) and Bob (B)
Friends(A,B)
Smokes(A)
Friends(A,A)
Smokes(B)
Friends(B,B)
Cancer(A)
Cancer(B)
Friends(B,A)
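The ground network above can be sketched end to end. The slide's formulas were in images that did not survive transcription, so the two weighted formulas below (and their weights) are the standard illustrative pair for this example, not a quotation of the slide: 1.5 : Smokes(x) ⇒ Cancer(x), and 1.1 : Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y)).

```python
import itertools
import math

constants = ["Anna", "Bob"]
pairs = [(x, y) for x in constants for y in constants]

def implies(a, b):
    return (not a) or b

def weighted_count(smokes, cancer, friends):
    # sum_i w_i * n_i(world), where n_i counts true groundings of F_i.
    n1 = sum(implies(smokes[x], cancer[x]) for x in constants)
    n2 = sum(implies(friends[(x, y)], smokes[x] == smokes[y])
             for (x, y) in pairs)
    return 1.5 * n1 + 1.1 * n2

def all_worlds():
    # Enumerate all truth assignments to the 8 ground atoms.
    for sv in itertools.product([False, True], repeat=len(constants)):
        for cv in itertools.product([False, True], repeat=len(constants)):
            for fv in itertools.product([False, True], repeat=len(pairs)):
                yield (dict(zip(constants, sv)),
                       dict(zip(constants, cv)),
                       dict(zip(pairs, fv)))

Z = sum(math.exp(weighted_count(*w)) for w in all_worlds())

def prob(world):
    # P(world) is proportional to exp(sum_i w_i n_i(world)).
    return math.exp(weighted_count(*world)) / Z
```

A world that violates a soft formula is less probable but not impossible, which is the sense in which Markov logic softens logical constraints.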
36
MLN Algorithms: The First Three Generations

Problem            | First generation        | Second generation | Third generation
MAP inference      | Weighted satisfiability | Lazy inference    | Cutting planes
Marginal inference | Gibbs sampling          | MC-SAT            | Lifted inference
Weight learning    | Pseudo-likelihood       | Voted perceptron  | Scaled conj. gradient
Structure learning | Inductive logic progr.  | ILP + PL (etc.)   | Clustering + pathfinding
37
Efficient Inference
  • Logical or statistical inference alone is already hard
  • But we can do approximate inference
  • It suffices to perform well in most cases
  • Combine ideas from both camps
  • E.g., MC-SAT = MCMC + SAT solver
  • Can also leverage sparsity in relational domains

More: Poon & Domingos, "Sound and Efficient
Inference with Probabilistic and Deterministic
Dependencies," in Proc. AAAI-2006.
More: Poon, Domingos & Sumner, "A General Method
for Reducing the Complexity of Relational
Inference and its Application to MCMC," in Proc.
AAAI-2008.
38
Weight Learning
  • Probability model: P(X)
  • X: Observed in training data
  • Maximize likelihood of observed data
  • Regularization to prevent overfitting

39
Weight Learning
Requires inference
  • Gradient descent
  • Use MC-SAT for inference
  • Can also leverage second-order information
    [Lowd & Domingos, 2007]

∂/∂wi log Pw(x) = ni(x) − Ew[ni(x)]
(ni(x): no. of times clause i is true in the data;
Ew[ni(x)]: expected no. of times clause i is true
according to the MLN)
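The gradient above can be exercised on a deliberately tiny model: a single formula over one binary atom A with weight w, so that Pw(A=True) = e^w / (e^w + 1) and the expectation is exact. The 80% empirical count is made up; in a real MLN the expectation would be estimated with MC-SAT rather than computed in closed form.

```python
import math

# Fraction of training examples in which the (single) clause is true.
empirical_count = 0.8  # invented for illustration

def expected_count(w):
    # Exact E_w[n] for this one-atom model: Pw(A=True).
    return math.exp(w) / (math.exp(w) + 1.0)

# Gradient ascent on the log-likelihood:
#   d/dw log Pw(x) = n(x) - E_w[n]
w = 0.0
for _ in range(5000):
    w += 0.5 * (empirical_count - expected_count(w))
```

At convergence the expected count matches the empirical count, so w approaches log(0.8/0.2) = log 4, the usual moment-matching fixed point of maximum likelihood.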
40
Unsupervised Learning How?
  • I.I.D. learning: A sophisticated model requires
    more labeled data
  • Statistical relational learning: A sophisticated
    model may require less labeled data
  • Ambiguities vary among objects
  • Joint inference → Propagate information from
    unambiguous objects to ambiguous ones
  • One formula is worth a thousand labels
  • Small amount of domain knowledge →
    large-scale joint inference

41
Unsupervised Weight Learning
  • Probability model: P(X,Z)
  • X: Observed in training data
  • Z: Hidden variables
  • E.g., clustering with mixture models
  • Z: Cluster assignment
  • X: Observed features
  • Maximize likelihood of observed data by summing
    out hidden variables Z

42
Unsupervised Weight Learning
  • Gradient descent
  • Use MC-SAT to compute both expectations
  • May also combine with contrastive estimation

∂/∂wi log Σz Pw(x,z) = Ez|x[ni] − Ex,z[ni]
(first expectation: sum over z, conditioned on the
observed x; second: summed over both x and z)

More: Poon, Cherry & Toutanova, "Unsupervised
Morphological Segmentation with Log-Linear
Models," in Proc. NAACL-2009.
Best Paper Award
43
Markov Logic
  • Unified inference and learning algorithms
  • → Can handle millions of variables, billions of
    features, tens of thousands of parameters
  • Easy-to-use software: Alchemy
  • Many successful applications
  • E.g. Information extraction, coreference
    resolution, semantic parsing, ontology induction

44
Pipeline → Joint Inference
  • Combine segmentation and entity resolution for
    information extraction
  • Extract complex and nested bio-events from PubMed
    abstracts

More: Poon & Domingos, "Joint Inference for
Information Extraction," in Proc. AAAI-2007.
More: Poon & Vanderwende, "Joint Inference for
Knowledge Extraction from Biomedical Literature,"
in Proc. NAACL-2010.
45
Unsupervised Learning Example
  • Coreference resolution: Accuracy comparable to
    previous supervised state of the art

More: Poon & Domingos, "Joint Unsupervised
Coreference Resolution with Markov Logic," in
Proc. EMNLP-2008.
46
Overview
  • Machine reading: Challenges
  • Statistical relational learning
  • Markov logic
  • USP: Unsupervised Semantic Parsing
  • Research directions

47
Unsupervised Semantic Parsing
Best Paper Award
  • USP [Poon & Domingos, EMNLP-09]
  • First unsupervised approach for semantic parsing
  • End-to-end machine reading system
  • Read text, answer questions
  • OntoUSP = USP + ontology induction [Poon &
    Domingos, ACL-10]
  • Encoded in a few Markov logic formulas

48
Semantic Parsing
Goal:
Microsoft buys Powerset
BUY(MICROSOFT,POWERSET)
Challenge:
Microsoft buys Powerset
Microsoft acquires semantic search engine Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft's purchase of Powerset, ...
49
Limitations of Existing Approaches
  • Manual grammar or supervised learning
  • Applicable to restricted domains only
  • For general text
  • Not clear what predicates and objects to use
  • Hard to produce consistent meaning annotation
  • Also, often learn both syntax and semantics
  • Fail to leverage advanced syntactic parsers
  • Make semantic parsing harder

50
USP Key Idea 1
  • Target predicates and objects can be learned
  • Viewed as clusters of syntactic or lexical
    variations of the same meaning
  • BUY(-,-) → {buys, acquires, 's purchase of, ...}
    → Cluster of various expressions for acquisition
  • MICROSOFT → {Microsoft, the Redmond software giant, ...}
    → Cluster of various mentions of Microsoft

51
USP Key Idea 2
  • Relational clustering → Cluster relations with
    same objects
  • USP → Recursively cluster arbitrary expressions
    with similar subexpressions
  • Microsoft buys Powerset
  • Microsoft acquires semantic search engine
    Powerset
  • Powerset is acquired by Microsoft Corporation
  • The Redmond software giant buys Powerset
  • Microsoft's purchase of Powerset, ...

52
USP Key Idea 2
  • Relational clustering → Cluster relations with
    same objects
  • USP → Recursively cluster arbitrary expressions
    with similar subexpressions
  • Microsoft buys Powerset
  • Microsoft acquires semantic search engine
    Powerset
  • Powerset is acquired by Microsoft Corporation
  • The Redmond software giant buys Powerset
  • Microsoft's purchase of Powerset, ...

Cluster same forms at the atom level
53
USP Key Idea 2
  • Relational clustering → Cluster relations with
    same objects
  • USP → Recursively cluster arbitrary expressions
    with similar subexpressions
  • Microsoft buys Powerset
  • Microsoft acquires semantic search engine
    Powerset
  • Powerset is acquired by Microsoft Corporation
  • The Redmond software giant buys Powerset
  • Microsoft's purchase of Powerset, ...

Cluster forms in composition with same forms
56
USP Key Idea 3
  • Start directly from syntactic analyses
  • Focus on translating them to semantics
  • Leverage rapid progress in syntactic parsing
  • Much easier than learning both

57
Joint Inference in USP
  • Forms a canonical meaning representation by
    recursively clustering synonymous expressions
  • Text → Logical form in this representation
  • Induces an ISA hierarchy among clusters and
    applies hierarchical smoothing (shrinkage)

58
USP System Overview
  • Input: Dependency trees for sentences
  • Converts dependency trees into quasi-logical
    forms (QLFs)
  • Starts with QLF clusters at the atom level
  • Recursively builds up clusters of larger forms
  • Output:
  • Probability distribution over QLF clusters and
    their compositions
  • MAP semantic parses of sentences

59
Generating Quasi-Logical Forms
buys
nsubj
dobj
Powerset
Microsoft
Convert each node into a unary atom
60
Generating Quasi-Logical Forms
buys(n1)
nsubj
dobj
Microsoft(n2)
Powerset(n3)
n1, n2, n3 are Skolem constants
61
Generating Quasi-Logical Forms
buys(n1)
nsubj
dobj
Microsoft(n2)
Powerset(n3)
Convert each edge into a binary atom
62
Generating Quasi-Logical Forms
buys(n1)
nsubj(n1,n2)
dobj(n1,n3)
Microsoft(n2)
Powerset(n3)
Convert each edge into a binary atom
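The QLF construction walked through on the last four slides can be sketched directly: each dependency-tree node becomes a unary atom over a Skolem constant, and each labeled edge becomes a binary atom.

```python
# Dependency tree for "Microsoft buys Powerset":
# node -> word, plus labeled head/dependent edges.
nodes = {"n1": "buys", "n2": "Microsoft", "n3": "Powerset"}
edges = [("nsubj", "n1", "n2"), ("dobj", "n1", "n3")]

def to_qlf(nodes, edges):
    # Unary atoms from nodes (n1, n2, n3 are Skolem constants),
    # binary atoms from labeled edges.
    atoms = [f"{word}({const})" for const, word in nodes.items()]
    atoms += [f"{label}({head},{dep})" for label, head, dep in edges]
    return atoms
```

Calling `to_qlf(nodes, edges)` yields the five atoms on the slide: buys(n1), Microsoft(n2), Powerset(n3), nsubj(n1,n2), dobj(n1,n3).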
63
A Semantic Parse
buys(n1)
nsubj(n1,n2)
dobj(n1,n3)
Microsoft(n2)
Powerset(n3)
Partition QLF into subformulas
64
A Semantic Parse
buys(n1)
nsubj(n1,n2)
dobj(n1,n3)
Microsoft(n2)
Powerset(n3)
Subformula → Lambda form: Replace each Skolem
constant not appearing in a unary atom with a
unique lambda variable
65
A Semantic Parse
buys(n1)
λx2.nsubj(n1,x2)
λx3.dobj(n1,x3)
Microsoft(n2)
Powerset(n3)
Subformula → Lambda form: Replace each Skolem
constant not appearing in a unary atom with a
unique lambda variable
66
A Semantic Parse
Core form
buys(n1)
Argument form
Argument form
λx2.nsubj(n1,x2)
λx3.dobj(n1,x3)
Microsoft(n2)
Powerset(n3)
Core form: No lambda variable.
Argument form: One lambda variable.
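The lambda-abstraction step above can be sketched as a small function. The `n2 → x2` variable-naming convention mirrors the slide's notation; the helper name is mine.

```python
def subformula_to_lambda(pred, args, unary_consts):
    # Within a subformula, replace each Skolem constant that does
    # not appear in one of its unary atoms with a unique lambda
    # variable (e.g., n2 -> x2); constants in unary atoms stay.
    lam_vars, new_args = [], []
    for a in args:
        if a in unary_consts:
            new_args.append(a)
        else:
            v = "x" + a.lstrip("n")
            lam_vars.append(v)
            new_args.append(v)
    prefix = "".join(f"\u03bb{v}." for v in lam_vars)
    return f"{prefix}{pred}({','.join(new_args)})"

# Argument form (one lambda variable) vs. core form (none):
# subformula_to_lambda("nsubj", ["n1", "n2"], {"n1"}) -> "λx2.nsubj(n1,x2)"
# subformula_to_lambda("buys", ["n1"], {"n1"})        -> "buys(n1)"
```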
67
A Semantic Parse
buys(n1)
→ BUY
λx2.nsubj(n1,x2)
λx3.dobj(n1,x3)
→ MICROSOFT
Microsoft(n2)
→ POWERSET
Powerset(n3)
Assign each subformula to an object cluster
68
Object Cluster: BUY
[Diagram: distribution over core forms, e.g.,
buys(n1) 0.1, acquires(n1) 0.2, ...]
One formula in the MLN: learn a weight for each
pair of cluster and core form
69
Object Cluster: BUY
[Diagram: core forms buys(n1) 0.1, acquires(n1) 0.2, ...
plus property clusters BUYER, BOUGHT, PRICE, ...]
May contain a variable number of property clusters
70
Property Cluster: BUYER
[Diagram: distributions over argument forms
(λx2.nsubj(n1,x2), λx2.agent(n1,x2), ...), argument
clusters (MICROSOFT, GOOGLE, ...), and argument
number (Zero, One, ...)]
Three MLN formulas: distributions over argument
forms, clusters, and number
71
Probabilistic Model
  • Exponential prior on number of parameters
  • Cluster mixtures

[Diagram: object cluster BUY (core forms buys,
acquires, ...) with property cluster BUYER
(argument forms nsubj, agent; argument clusters
MICROSOFT, GOOGLE; argument numbers Zero, One)]
72
Probabilistic Model
E.g., picking MICROSOFT as the BUYER argument depends
not only on BUY, but also on its ISA ancestors
  • Exponential prior on number of parameters
  • Cluster mixtures with hierarchical smoothing

[Diagram: object cluster BUY and property cluster
BUYER as before, with distributions smoothed over
ISA ancestors]
73
Abstract Lambda Form
  • buys(n1)
  • λx2.nsubj(n1,x2)
  • λx3.dobj(n1,x3)

The final logical form is obtained via lambda reduction:
  • BUYS(n1)
  • λx2.BUYER(n1,x2)
  • λx3.BOUGHT(n1,x3)
74
Challenge State Space Too Large
  • Potential number of clusters ~ exp(number of tokens)
  • Also, meaning units and clusters are often small
  • → Use combinatorial search

75
Inference: Find MAP Parse
[Diagram: initialize from the dependency tree
(induces with nsubj and dobj edges; protein with an
nn edge to CD11B; IL-4), then apply lambda reduction
and search operators]
76
Learning: Greedily Maximize Posterior

[Diagram: initialize with single-form clusters
(enhances 1.0, induces 1.0, amino 1.0, acid 1.0);
the MERGE operator fuses induces and enhances into
one cluster (induces 0.2, enhances 0.8); the COMPOSE
operator joins amino and acid into the composition
amino acid (1.0)]
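The two search operators above can be sketched minimally. MERGE fuses two clusters and renormalizes their core-form distributions; the counts below are made up so the result reproduces the 0.2/0.8 split shown on the slide. COMPOSE simply joins two forms into a larger composed form.

```python
def merge(counts_a, counts_b):
    # MERGE: pool the (form -> count) tables of two clusters and
    # renormalize into a single core-form distribution.
    counts = dict(counts_a)
    for form, c in counts_b.items():
        counts[form] = counts.get(form, 0) + c
    total = sum(counts.values())
    return {form: c / total for form, c in counts.items()}

def compose(form_a, form_b):
    # COMPOSE: treat two adjacent forms as one larger meaning unit.
    return f"{form_a} {form_b}"

merged = merge({"induces": 2}, {"enhances": 8})
composed = compose("amino", "acid")
```

With the illustrative counts 2 and 8, `merged` is `{"induces": 0.2, "enhances": 0.8}` and `composed` is `"amino acid"`. In USP proper, an operator is kept only if it increases the posterior.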
77
Operator: ABSTRACTION
MERGE with REGULATE?
[Diagram: rather than merging INDUCE and INHIBIT
outright, an abstraction operator creates a parent
cluster with ISA links to INDUCE and INHIBIT, over
expressions such as induces, enhances, inhibits,
suppresses, up-regulates]
Captures substantial similarities
78
Experiments
  • Apply to machine reading
  • Extract knowledge from text and answer questions
  • Evaluation: Number of answers and accuracy
  • GENIA dataset: 1,999 PubMed abstracts
  • Use simple factoid questions, e.g.:
  • What does anti-STAT1 inhibit?
  • What regulates MIP-1 alpha?

79
Total and Correct Answers
USP extracted five times as many correct answers
as TextRunner, with the highest precision of 91%
[Chart: total and correct answers for KW-SYN,
TextRunner, RESOLVER, DIRT, USP]
80
Qualitative Analysis
  • Resolve many nontrivial variations
  • Argument forms that mean the same, e.g.:
  • expression of X ↔ X expression
  • X stimulates Y ↔ Y is stimulated with X
  • Active vs. passive voice
  • Synonymous expressions
  • Etc.

81
Clusters And Compositions
  • Clusters in core forms:
  • {investigate, examine, evaluate, analyze, study, assay}
  • {diminish, reduce, decrease, attenuate}
  • {synthesis, production, secretion, release}
  • {dramatically, substantially, significantly}
  • Compositions:
  • amino acid, t cell, immune response,
    transcription factor, initiation site, binding
    site

82
Question-Answer Example
Interestingly, the DEX-mediated IkappaBalpha
induction was completely inhibited by IL-2, but
not IL-4, in Th1 cells, while the reverse profile
was seen in Th2 cells.
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
83
Overview
  • Machine reading
  • Statistical relational learning
  • Markov logic
  • USP: Unsupervised Semantic Parsing
  • Research directions

84
Web-Scale Joint Inference
  • Challenge: Efficiently identify what is relevant
  • Key: Induce and leverage an ontology
  • Ontology → Capture essential properties;
    abstract away unimportant variations
  • Upper-level nodes → Skip irrelevant branches
  • Wanted: Combine the following
  • Probabilistic ontology induction (e.g., USP)
  • Coarse-to-fine learning and inference
    [Felzenszwalb & McAllester, 2007; Petrov, Ph.D.
    thesis]

85
Knowledge Reasoning
  • Most facts/rules are not explicitly stated
  • Dark matter of the natural language universe
  • kale contains calcium ∧ calcium prevents
    osteoporosis
  • → kale prevents osteoporosis
  • Keys:
  • Induce generic reasoning patterns
  • Incorporate reasoning into extraction
  • Additional sources of indirect supervision

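The kale example above is an instance of a generic reasoning pattern that can be applied by forward chaining. A minimal sketch, with the facts and the single (illustrative) rule hard-coded:

```python
# Extracted facts as (relation, arg1, arg2) triples.
facts = {
    ("contains", "kale", "calcium"),
    ("prevents", "calcium", "osteoporosis"),
}

def apply_rule(facts):
    # Generic pattern: contains(X, S) and prevents(S, D)
    #                  => prevents(X, D)
    derived = set()
    for rel1, x, s in facts:
        if rel1 != "contains":
            continue
        for rel2, s2, d in facts:
            if rel2 == "prevents" and s2 == s:
                derived.add(("prevents", x, d))
    return derived
```

Here `apply_rule(facts)` derives ("prevents", "kale", "osteoporosis"), a fact never explicitly stated; in the envisioned system such rules would themselves be induced and weighted rather than hand-written.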
86
Harness Social Computing
  • Bootstrap online community

Knowledge Base
87
Harness Social Computing
  • Bootstrap online community
  • Incorporate human end tasks in the loop

Tell me everything about dicer applied to
synapse
Knowledge Base
88
Harness Social Computing
Your extraction from my paper is correct except
for blah
  • Bootstrap online community
  • Incorporate human end tasks in the loop

Knowledge Base
89
Harness Social Computing
  • Bootstrap online community
  • Incorporate human end tasks in the loop
  • Form positive feedback loop

Knowledge Base
90
Acknowledgments
  • Pedro Domingos, Colin Cherry, Kristina Toutanova,
    Lucy Vanderwende, Oren Etzioni, Dan Weld, Matt
    Richardson, Parag Singla, Stanley Kok, Daniel
    Lowd, Marc Sumner
  • ARO, AFRL, ONR, DARPA, NSF

91
Summary
  • Statistical relational learning offers promising
    solutions for machine reading
  • Markov logic provides a language for this
  • Syntax: Weighted first-order logical formulas
  • Semantics: Feature templates of Markov nets
  • Open-source software: Alchemy
  • A success story: USP
  • Three key research directions


alchemy.cs.washington.edu
alchemy.cs.washington.edu/papers/poon09