Learning to Construct and Reason with a Large KB of Extracted Information - PowerPoint PPT Presentation

Description: Author: William Cohen; created 8/23/2013. 58 slides.
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

1
Learning to Construct and Reason with a Large KB
of Extracted Information
  • William W. Cohen, Machine Learning Dept and
    Language Technology Dept
  • joint work with
  • Tom Mitchell, Ni Lao, William Wang, Kathryn
    Rivard Mazaitis,
  • Richard Wang, Frank Lin, Ni Lao, Estevam
    Hruschka, Jr., Burr Settles, Partha Talukdar,
    Derry Wijaya, Edith Law, Justin Betteridge,
    Jayant Krishnamurthy, Bryan Kisiel, Andrew
    Carlson, Weam Abu Zaki , Bhavana Dalvi, Malcolm
    Greaves,
  • Lise Getoor, Jay Pujara, Hui Miao

2
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary

3
But first... some backstory
4
... and an unrelated project
5
... called SimStudent
6
SimStudent learns rules to solve a problem
step-by-step, and then guides a student through
how to solve problems step-by-step
7
Quinlan's FOIL
8
Summary of SimStudent
  • Possible for a human author (e.g., a middle
    school teacher) to build an ITS system
  • by building a GUI, then demonstrating problem
    solving and having the system learn how from
    examples
  • The rules learned by SimStudent can be used to
    construct a student model
  • with parameter tuning this can predict how well
    individual students will learn
  • better than state-of-the-art in some cases!
  • AI problem solving with a cognitively predictive
    model, and ILP is a key component!

9
Information Extraction
  • Goal: extract facts about the world automatically
    by reading text
  • IE systems are usually based on learning how to
    recognize facts in text
  • ... and then (sometimes) aggregating the results
  • Latest-generation IE systems need not require
    large amounts of training
  • and IE does not necessarily require subtle
    analysis of any particular piece of text

10
Never Ending Language Learning (NELL)
  • NELL is a broad-coverage IE system
  • Simultaneously learning 500-600 concepts and
    relations (person, celebrity, emotion, acquiredBy,
    locatedIn, capitalCityOf, ...)
  • Starting point: containment/disjointness
    relations between concepts, types for relations,
    and O(10) examples per concept/relation
  • Uses a 500M web page corpus plus live queries
  • Running (almost) continuously for over three
    years
  • Has learned over 50M beliefs, over 1M
    high-confidence ones
  • about 85% of high-confidence beliefs are correct

11
Demo
  • http://rtw.ml.cmu.edu/rtw/

12
NELL Screenshots
13
NELL Screenshots
14
NELL Screenshots
15
More examples of what NELL knows
16
(No Transcript)
17
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary

18
Bootstrapped SSL learning of lexical patterns:
it's underconstrained!
Extract cities, given four seed examples of the
class city:
seeds: Paris, Pittsburgh, Seattle, Cupertino
extracted: San Francisco, Austin, denial,
anxiety, selfishness, Berlin
patterns: "mayor of arg1", "live in arg1",
"arg1 is home of", "traits such as arg1"
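The drift on this slide can be reproduced with a tiny sketch (the mini-corpus and seeds below are invented for illustration, not NELL's data): patterns and instances promote each other with no other constraint, so the idiom "live in denial" drags a non-city into the extracted set.

```python
# Pattern/instance pairs standing in for contexts found in a corpus.
CORPUS = [
    ("mayor of", "Paris"), ("mayor of", "Pittsburgh"),
    ("live in", "Paris"), ("live in", "Seattle"),
    ("live in", "denial"),          # the idiom that causes drift
    ("mayor of", "Berlin"), ("live in", "Austin"),
]

def bootstrap(seeds, corpus, rounds=2):
    instances = set(seeds)
    for _ in range(rounds):
        # promote any pattern that matched a known instance
        patterns = {p for p, x in corpus if x in instances}
        # promote any instance matched by a promoted pattern
        instances |= {x for p, x in corpus if p in patterns}
    return instances

cities = bootstrap({"Paris", "Pittsburgh"}, CORPUS)
# "denial" is pulled in via the over-general pattern "live in"
print(sorted(cities))
```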
19
One Key to Accurate Semi-Supervised Learning
(Diagram: coupled learning problem linking
teamPlaysSport(t,s), playsForTeam(a,t),
playsSport(a,s), and coachesTeam(c,t) over the
categories person, sport, team, athlete, and
coach, plus the classifier coach(NP) and noun
phrases NP, NP1, NP2 from the sentence
"Krzyzewski coaches the Blue Devils.")
Coupled together, these form a much easier (more
constrained) semi-supervised learning problem;
each alone is a hard (underconstrained)
semi-supervised learning problem.
  1. Easier to learn many interrelated tasks than one
    isolated task
  2. Also easier to learn using many different types
    of information
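The coupling idea can be sketched in a few lines (the scores, concepts, and threshold below are invented for illustration): candidates whose labels violate a mutual-exclusion constraint from the ontology are held back, which is exactly the extra constraint the isolated bootstrapping lacked.

```python
MUTEX = {("city", "emotion")}          # ontology: disjoint concept pair

# candidate instances with per-concept extractor confidences (invented)
candidates = {
    "Berlin": {"city": 0.9},
    "denial": {"city": 0.6, "emotion": 0.8},  # flagged by both extractors
}

def promote(candidates, mutex, threshold=0.5):
    kept = {}
    for inst, scores in candidates.items():
        labels = {c for c, s in scores.items() if s >= threshold}
        # coupling: drop instances that land in two disjoint concepts
        if any({a, b} <= labels for a, b in mutex):
            continue
        kept[inst] = labels
    return kept

kept = promote(candidates, MUTEX)
print(kept)   # only Berlin survives the mutual-exclusion check
```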

20
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary

21
Another key idea: use multiple types of
information (evidence integration).
Components feeding the ontology and populated KB
from the Web:
CBL: text extraction patterns
SEAL: HTML extraction patterns
PRA: learned inference rules
Morph: morphology-based extractor
22
Extrapolating user-provided seeds
  • Set expansion (SEAL)
  • Given seeds (kdd, icml, icdm), formulate query to
    search engine and collect semi-structured web
    pages
  • Detect lists on these pages
  • Merge the results, ranking items frequently
    occurring on good lists highest
  • Details: Wang & Cohen, ICDM 2007, 2008; EMNLP
    2008, 2009

23
Sample semi-structured pages for the concept
"dictators"
24
Another example of propagation: extrapolating
seeds in SEAL
  • Set expansion (SEAL)
  • Given seeds (kdd, icml, icdm), formulate query to
    search engine and collect semi-structured web
    pages
  • Detect lists on these pages
  • Merge the results, ranking items frequently
    occurring on good lists highest
  • Details: Wang & Cohen, ICDM 2007, 2008; EMNLP
    2008, 2009

25
Extrapolating user-provided seeds
  • Set expansion (SEAL)
  • Given seeds (kdd, icml, icdm), formulate query to
    search engine and collect semi-structured web
    pages
  • Detect lists on these pages
  • Merge the results, ranking items frequently
    occurring on good lists highest
  • Details: Richard Wang & Cohen, ICDM 2007, 2008;
    EMNLP 2008, 2009
  • 300 pages/concept > 100 pages/concept
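The SEAL procedure on this slide can be sketched as follows (the lists and the "good list" criterion below are invented stand-ins for detected wrappers on semi-structured pages): non-seed items are ranked by how often they co-occur on lists that contain multiple seeds.

```python
from collections import Counter

def expand(seeds, lists):
    # a "good" list is one containing at least two seeds
    good = [L for L in lists if len(seeds & set(L)) >= 2]
    # rank non-seed items by how many good lists they appear on
    counts = Counter(x for L in good for x in L if x not in seeds)
    return [x for x, _ in counts.most_common()]

# lists detected on (hypothetical) pages returned by a seed query
lists = [
    ["kdd", "icml", "icdm", "ecml"],   # conference list: 3 seeds
    ["kdd", "icdm", "sdm"],            # conference list: 2 seeds
    ["kdd", "apples", "oranges"],      # junk list: only 1 seed, ignored
]
ranked = expand({"kdd", "icml", "icdm"}, lists)
print(ranked)   # new conferences, junk filtered out
```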

26
Another key idea: use multiple types of
information (evidence integration).
Components feeding the ontology and populated KB
from the Web:
CBL: text extraction patterns
SEAL: HTML extraction patterns
PRA: learned inference rules
Morph: morphology-based extractor
27
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Background: learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary

28
Background: Personal Info Management as
Similarity Queries on a Graph
(SIGIR 2006, EMNLP 2008, TOIS 2010; with Einat
Minkov, Univ. Haifa)
(Diagram: a graph linking messages to terms such
as "graph", "proposal", "NSF", "CMU", dates
6/17/07 and 6/18/07, the person William, and the
address einat@cs.cmu.edu, via edges such as
"Term In Subject" and "Sent To".)
29
Learning about graph similarity
  • Personalized PageRank aka Random Walk with
    Restart
  • Similarity measure for nodes in a graph,
    analogous to TFIDF for text in a WHIRL database
  • natural extension to PageRank
  • amenable to learning parameters of the walk
    (gradient search, w/ various optimization
    metrics)
  • Toutanova, Manning & Ng, ICML 2004; Nie et al.,
    WWW 2005; Xi et al., SIGIR 2005
  • or reranking, etc.
  • queries
  • Given type t and node x, find y: T(y)=t and y~x
  • Given type t and nodes X, find y: T(y)=t and y~X
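Random Walk with Restart can be sketched with plain power iteration (the tiny email-like graph below is invented for illustration): with probability alpha the walker restarts at the query node, otherwise it follows a uniformly random out-edge, and the resulting scores rank nodes by similarity to the query.

```python
def rwr(graph, start, alpha=0.15, iters=50):
    # graph: adjacency lists; every node must have at least one out-edge
    nodes = list(graph)
    p = {n: 0.0 for n in nodes}
    p[start] = 1.0
    for _ in range(iters):
        # restart mass goes to the query node
        nxt = {n: (alpha if n == start else 0.0) for n in nodes}
        # remaining mass follows uniform random out-edges
        for u in nodes:
            out = graph[u]
            for v in out:
                nxt[v] += (1 - alpha) * p[u] / len(out)
        p = nxt
    return p

# toy graph: messages linked to senders, recipients, and subject terms
g = {
    "msg1": ["william", "einat", "graph"],
    "msg2": ["william", "proposal"],
    "william": ["msg1", "msg2"],
    "einat": ["msg1"],
    "graph": ["msg1"],
    "proposal": ["msg2"],
}
scores = rwr(g, "msg1")
# nodes directly on msg1 outrank nodes reachable only through william
```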

30
Many tasks can be reduced to similarity queries:
Person name disambiguation:
  query: term "andy" + file msgId → person
Threading:
  query: file msgId → file
  What are the adjacent messages in this thread?
  A proxy for finding more messages like this one.
Alias finding:
  What are the email-addresses of Jason?
  query: term "Jason" → email-address
Meeting attendees finder:
  Which email-addresses (persons) should I notify
  about this meeting?
  query: meeting mtgId → email-address
31
Results on a sample task (Mgmt. game corpus):
person name disambiguation
32
Learning about graph similarity: the next
generation
  • Personalized PageRank aka Random Walk with
    Restart
  • Given type t and nodes X, find y: T(y)=t and y~X
  • Ni Lao's thesis (2012): new, better learning
    methods
  • richer parameterization
  • faster PPR inference
  • structure learning
  • Other tasks
  • relation-finding in parsed text
  • information management for biologists
  • inference in large noisy knowledge bases

33
Lao: a learned random-walk strategy is a weighted
set of random-walk experts, each of which is a
walk constrained by a path (i.e., a sequence of
relations)
Recommending papers to cite in a paper being
prepared
1) papers co-cited with on-topic papers
6) approx. standard IR retrieval
7,8) papers cited during the past two years
12-13) papers published during the past two
years
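One such random-walk expert can be sketched as follows (the tiny KB below is invented for illustration): the feature value for a source node under a path (r1, ..., rk) is the probability of reaching each endpoint by following edges with those relation labels in order.

```python
def path_prob(kb, source, path):
    # kb: {(node, relation): [neighbor, ...]}
    dist = {source: 1.0}           # start with all mass on the source
    for rel in path:
        nxt = {}
        for node, p in dist.items():
            nbrs = kb.get((node, rel), [])
            for v in nbrs:
                # split this node's mass uniformly over rel-neighbors
                nxt[v] = nxt.get(v, 0.0) + p / len(nbrs)
        dist = nxt
    return dist                    # distribution over path endpoints

# toy KB for the path coaches -> playsSport
kb = {
    ("krzyzewski", "coaches"): ["blue_devils"],
    ("blue_devils", "playsSport"): ["basketball"],
}
feat = path_prob(kb, "krzyzewski", ["coaches", "playsSport"])
```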
34
Another study: learning inference rules for a
noisy KB (Lao, Cohen, Mitchell 2011; Lao et al.,
2012)
The random walk interpretation is crucial:
i.e., 10-15 extra points in MRR
35
Another key idea: use multiple types of
information (evidence integration).
Components feeding the ontology and populated KB
from the Web:
CBL: text extraction patterns
SEAL: HTML extraction patterns
PRA: learned inference rules
Morph: morphology-based extractor
36
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Inference in NELL
  • Inference as another learning strategy
  • Background: learning in graphs
  • Path Ranking Algorithm
  • PRA + FOL = ProPPR, and joint learning for
    inference
  • Promotion as inference
  • Conclusions & summary

37
How can you extend PRA to...
  • Non-binary predicates?
  • Paths that include constants?
  • Recursive rules?
  • ... ?
  • Current direction: use ideas from PRA in a
    general first-order logic, ProPPR

38
  • A limitation
  • Paths are learned separately for each relation
    type, and one learned rule can't call another
  • PRA can learn this:

athletePlaySportViaRule(Athlete,Sport) :-
  onTeamViaKB(Athlete,Team),
  teamPlaysSportViaKB(Team,Sport).
teamPlaysSportViaRule(Team,Sport) :-
  memberOfViaKB(Team,Conference),
  hasMemberViaKB(Conference,Team2),
  playsViaKB(Team2,Sport).
teamPlaysSportViaRule(Team,Sport) :-
  onTeamViaKB(Athlete,Team),
  athletePlaysSportViaKB(Athlete,Sport).
39
  • A limitation
  • Paths are learned separately for each relation
    type, and one learned rule can't call another
  • But PRA can't learn this:

athletePlaySport(Athlete,Sport) :-
  onTeam(Athlete,Team),
  teamPlaysSport(Team,Sport).
athletePlaySport(Athlete,Sport) :-
  athletePlaySportViaKB(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  memberOf(Team,Conference),
  hasMember(Conference,Team2),
  plays(Team2,Sport).
teamPlaysSport(Team,Sport) :-
  onTeam(Athlete,Team),
  athletePlaysSport(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  teamPlaysSportViaKB(Team,Sport).
40
  • Solution: a major extension from PRA to include
    a large subset of Prolog

athletePlaySport(Athlete,Sport) :-
  onTeam(Athlete,Team),
  teamPlaysSport(Team,Sport).
athletePlaySport(Athlete,Sport) :-
  athletePlaySportViaKB(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  memberOf(Team,Conference),
  hasMember(Conference,Team2),
  plays(Team2,Sport).
teamPlaysSport(Team,Sport) :-
  onTeam(Athlete,Team),
  athletePlaysSport(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  teamPlaysSportViaKB(Team,Sport).
41
Sample ProPPR program: Horn rules, each annotated
with features (variables from the head are OK)
42
... and its search space.
Doh! This is a graph!
43
  • Score for a query solution (e.g., Z=sport for
    about(a,Z)) depends on the probability of
    reaching a solution node
  • learn transition probabilities based on features
    of the rules
  • implicit reset transitions (with probability α)
    back to the query node
  • Looking for answers supported by many short proofs

Exactly as in Stochastic Logic
Programs (Cussens, 2001)
Grounding size is O(1/αε), i.e., independent of
DB size → fast approximate incremental inference
(Reid, Lang, Chung, 2008)
Learning: supervised variant of personalized
PageRank (Backstrom & Leskovec, 2011)
44
Sample Task: Citation Matching
  • Task
  • citation matching (Alchemy; Poon & Domingos)
  • Dataset
  • CORA dataset, 1295 citations of 132 distinct
    papers
  • Training set: sections 1-4
  • Test set: section 5
  • ProPPR program
  • translated from the corresponding Markov logic
    network (dropping non-Horn clauses)
  • # of rules: 21

45
Task: Citation Matching
46
Time: Citation Matching vs. Alchemy
Grounding is independent of DB size
47
Accuracy: Citation Matching
Our rules
UW rules
AUC scores: 0.0 = low, 1.0 = high; w=1 is before
learning
48
It gets better...
  • Learning uses many example queries
  • e.g., sameCitation(c120,X) with X=c123+,
    X=c124-, ...
  • Each query is grounded to a separate small graph
    (for its proof)
  • Goal is to tune weights on these edge features to
    optimize RWR on the query-graphs
  • Can do SGD and run RWR separately on each
    query-graph
  • Graphs do share edge features, so there's some
    synchronization needed
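The per-query structure can be sketched as follows (the loss and gradient below are invented stand-ins, not ProPPR's actual RWR objective): because each query grounds to its own small proof graph, its gradient touches only that graph's features, so queries can be farmed out to parallel workers that update a shared weight vector. The sequential loop stands in for those workers.

```python
def sgd(queries, grad_fn, w, lr=0.1, epochs=20):
    for _ in range(epochs):
        for q in queries:          # in ProPPR these run on parallel workers
            g = grad_fn(q, w)      # touches only features in q's proof graph
            for f, v in g.items():
                w[f] = w.get(f, 0.0) - lr * v
    return w

# stand-in per-query gradient: squared loss pulling one feature to a target
def toy_grad(q, w):
    feature, target = q
    return {feature: 2 * (w.get(feature, 0.0) - target)}

w = sgd([("f1", 1.0), ("f2", -1.0)], toy_grad, {})
# w["f1"] drifts toward 1.0 and w["f2"] toward -1.0
```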

49
Learning can be parallelized by splitting on the
separate groundings of each query
50
You can do more with ProPPR
51
Back to NELL
Evidence integration components feeding the
ontology and populated KB from the Web:
CBL: text extraction patterns
SEAL: HTML extraction patterns
PRA: learned inference rules
Morph: morphology-based extractor
52
  • Experiment
  • Take top K paths for each predicate learned by
    Lao's PRA
  • (I don't know how to do structure learning for
    ProPPR yet)
  • Convert to a mutually recursive ProPPR program
  • Train weights on the entire program

athletePlaySport(Athlete,Sport) :-
  onTeam(Athlete,Team),
  teamPlaysSport(Team,Sport).
athletePlaySport(Athlete,Sport) :-
  athletePlaySportViaKB(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  memberOf(Team,Conference),
  hasMember(Conference,Team2),
  plays(Team2,Sport).
teamPlaysSport(Team,Sport) :-
  onTeam(Athlete,Team),
  athletePlaysSport(Athlete,Sport).
teamPlaysSport(Team,Sport) :-
  teamPlaysSportViaKB(Team,Sport).
53
More details
  • Train on NELL's KB as of iteration 713
  • Test on new facts from later iterations
  • Try three subdomains of NELL
  • pick a seed entity S
  • pick the top M entity nodes in a (simple,
    untyped) RWR from S
  • project the KB to just these M entities
  • look at three subdomains, six values of M

54
(No Transcript)
55
Time (to answer queries)
56
Time (sec to answer 10 recursive queries)
57
Time (sec to train top 5K DB)
58
ProPPR vs Alchemy
  • >4 days to train discriminatively on a recursive
    theory with a 500-entity sample
  • pseudo-likelihood training fails on some
    recursive rule sets

59
Outline
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary

60
More detail on NELL
  • For iteration i = 1, ..., 715:
  • For each view (lexical patterns, ..., PRA)
  • Distantly train for that view using KB_i
  • Propose new candidate beliefs based on the
    learned view-specific classifier
  • Heuristically find the best candidate beliefs
    and promote them into KB_{i+1}

Not obvious how to promote in a principled way
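The loop above can be sketched schematically (the view class, scores, and threshold below are invented, not NELL's code): each view is distantly trained from the current KB, proposes candidate beliefs, and a heuristic promoter decides what enters the next KB.

```python
class PatternView:                       # stand-in for one NELL view
    def train(self, kb):
        return kb                        # trivial "model": the KB itself
    def propose(self, model):
        # (entity, concept, confidence) candidates, invented scores
        return [("Berlin", "city", 0.9),
                ("denial", "city", 0.3)]

def promote(candidates, kb, threshold=0.5):
    # heuristic promotion: keep only high-confidence candidates
    return {(e, c) for e, c, s in candidates if s >= threshold}

def nell_iteration(kb, views, promote):
    candidates = []
    for view in views:
        model = view.train(kb)           # distant training from current KB
        candidates += view.propose(model)
    return kb | promote(candidates, kb)  # promotion into KB_{i+1}

kb = nell_iteration({("Paris", "city")}, [PatternView()], promote)
```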
61
Promotion: identifying new correct extractions
from a pool of noisy extractions
  • Many types of noise are possible
  • co-referent entities
  • missing or spurious labels
  • missing or spurious relations
  • violations of ontology (e.g., an athlete that is
    not a person)
  • Identifying true extractions requires joint
    reasoning, e.g.
  • Pooling information about co-referent entities
  • Enforcing mutual exclusion of labels and
    relations
  • Problem: how can we integrate extractions from
    multiple sources in the presence of ontological
    constraints, at the scale of millions of
    extractions?

62
An example: a knowledge graph view of NELL's
extractions
Sample extractions:
  Lbl(Kyrgyzstan, bird)
  Lbl(Kyrgyzstan, country)
  Lbl(Kyrgyz Republic, country)
  Rel(Kyrgyz Republic, Bishkek, hasCapital)
Ontology:
  Dom(hasCapital, country)
  Mut(country, bird)
Entity resolution:
  SameEnt(Kyrgyz Republic, Kyrgyzstan)
What you want: a single merged entity
  Lbl(Kyrgyzstan / Kyrgyz Republic, country)
  Rel(Kyrgyzstan / Kyrgyz Republic, Bishkek,
      hasCapital)
63
Knowledge graph → graph identification
(Lise Getoor, Jay Pujara, and Hui Miao @ UMD)
64
Graph Identification as Joint Reasoning:
Probabilistic Soft Logic (PSL)
  • Templating language for hinge-loss MRFs, much
    more scalable!
  • Model specified as a collection of logical
    formulas
  • Formulas are grounded by substituting literal
    values
  • Truth values of atoms relaxed to the [0,1]
    interval
  • Truth values of formulas derived from the
    Lukasiewicz t-norm
  • Each ground rule r has a weighted potential φ_r
    corresponding to a distance to satisfaction
  • PSL defines a probability distribution over atom
    truth value assignments I
  • Most probable explanation (MPE) inference is
    convex
  • Running time scales linearly with the number of
    grounded rules |R|
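The PSL semantics on this slide can be sketched numerically (the truth values below are invented for illustration): atoms take soft truth values in [0,1], conjunctions use the Lukasiewicz t-norm, and a ground rule's distance to satisfaction is the hinge that its weighted potential penalizes.

```python
def luk_and(a, b):
    # Lukasiewicz t-norm: soft conjunction of two truth values
    return max(0.0, a + b - 1.0)

def distance_to_satisfaction(body, head):
    # the implication body -> head is satisfied when head >= body;
    # otherwise the rule is penalized by how far the head falls short
    return max(0.0, body - head)

# ground rule: SameEnt(x,y) AND Lbl(x,c) -> Lbl(y,c), with invented values
body = luk_and(0.9, 0.8)                   # SameEnt=0.9, Lbl(x,c)=0.8
phi = distance_to_satisfaction(body, 0.4)  # Lbl(y,c)=0.4
# phi is the (unweighted) potential this ground rule contributes
```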

65
PSL Representation of Heuristics for Promotion
Promote any candidate
Promote hints (old promotion strategy)
Be consistent about labels for duplicate entities
66
PSL Representation of Ontological Rules
Be consistent with constraints from ontology
Too expressive for ProPPR
Adapted from Jiang et al., ICDM 2012
67
Datasets & Results
  • Evaluation on NELL dataset from iteration 165
  • 1.7M candidate facts
  • 70K ontological constraints
  • Predictions on 25K facts from a 2-hop
    neighborhood around test data
  • Beats other methods, runs in just 10 seconds!

Method            F1     AUC
Baseline          .828   .873
NELL              .673   .765
MLN (Jiang, 12)   .836   .899
KGI-PSL           .853   .904
68
Summary
  • Background: information extraction and NELL
  • Key ideas in NELL
  • Coupled learning
  • Multi-view, multi-strategy learning
  • Inference in NELL
  • Inference as another learning strategy
  • Learning in graphs
  • Path Ranking Algorithm
  • ProPPR
  • Promotion as inference
  • Conclusions & summary