Title: Slide 1


1
SOFIE: A Self-Organizing Framework for Information Extraction
Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum
(Max-Planck-Institute for Informatics, Saarbrücken, Germany)
2
Ontologies
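[Diagram: an ontology graph in which Singer and Country are connected to Entity by subclassOf edges, instances are attached by type edges, and a bornInPlace edge points to USA. Ontologies such as DBpedia, YAGO, KYLIN, ... draw on Wikipedia (e.g. the entry "birth-place USA"). Open question (?): sentences from the Internet such as "Elvis died in England".]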
3
Information Extraction
Goal: Extract ontological information from natural language documents
[Diagram: the sentence "Elvis died in England" corresponds to a diedInPlace edge pointing to England.]
recoverWithout(most_people, medication), areUnder(0, the_age_of_18), support(these_findings, the_notion)
Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more
died in, perished in, was killed in ⇒ May deliver non-canonical relations
England, UK, Great Britain ⇒ May deliver non-canonical entities
diedInPlace(Elvis, England), diedInPlace(Elvis, Germany) ⇒ May deliver inconsistent facts
SOFIE aims to solve these problems in a new unified framework
4
Pitfalls of Information Extraction
Ontology
Web page
Elvis died in England.
diedInPlace
France
Louis XIV died in France.
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" → diedInPlace
5
Pitfalls of Information Extraction
Ontology
Web page
Elvis died in England.
Louis XIV died in France.
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" → diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
diedInPlace("Elvis", "England")
6
Pitfalls of Information Extraction
Ontology
Web page
?
Taxidophobist
Elvis died in England.
Louis XIV died in France.
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" → diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
diedInPlace("Elvis", "England")
7
Pitfalls of Information Extraction
Web page
Reasoning Problem
Elvis died in England.
Taxidophobist
Louis XIV died in France.
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" → diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
diedInPlace("Elvis", "England")
8
Pitfalls of Information Extraction
Web page
Reasoning Problem
Elvis died in England.
Taxidophobist
Louis XIV died in France.
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
Disambiguation Problem
"died in" → diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
9
Pitfalls of Information Extraction
Reasoning Problem
Pattern Matching Problem
Taxidophobist
Elvis died in England.
Louis XIV died in France.
"died in" diedInPlace ?
Disambiguation Problem
10
Information Extraction as Formulas
Reasoning Problem
type(Elvis,Taxidophobist).
Taxidophobist
type(X,Taxidophobist) ∧ bornInPlace(X,Y) ⇒ ¬diedInPlace(X,Z)   0.8
11
Information Extraction as Formulas
Reasoning Problem
Pattern Matching Problem
type(Elvis,Taxidophobist).
Elvis died in England.
type(X,Taxidophobist) ∧ bornInPlace(X,Y) ⇒ ¬diedInPlace(X,Z)
Louis XIV died in France.
"died in" diedInPlace ?
Disambiguation Problem
12
Information Extraction as Formulas
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis_at_D15, ElvisPresley). 0.7
Disambiguation Problem
13
Information Extraction as Formulas
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis_at_D15, ElvisPresley). 0.7
0.7: prior estimate of the likelihood of this meaning, $\frac{|\mathrm{words}(D15) \cap \mathrm{rel}(ElvisPresley)|}{|\mathrm{words}(D15)|}$
Elvis_at_D15: a word in context (wic), here the word "Elvis" in document D15
ElvisPresley: one possible meaning of "Elvis" as given by the ontology
14
Information Extraction as Formulas
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis_at_D15, ElvisPresley). 0.7
possibleMeaning(X,Y) ⇒ means(X,Y)
means(X,Y) ∧ Y≠Z ⇒ ¬means(X,Z)
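The two disambiguation rules on this slide can be written out as weighted clauses over means(...) hypotheses. The following is a minimal sketch, not the authors' implementation: the helper name `disambiguation_clauses`, the clause representation, the constant `HARD` used for the "at most one meaning" rule, and the second candidate `SaintElvis` with prior 0.3 are all assumptions made for illustration.

```python
from itertools import combinations

HARD = 1000.0  # assumed stand-in for "must hold"; the slides give no value


def disambiguation_clauses(word_in_context, priors):
    """Build weighted clauses for one word in context.

    A clause is (literals, weight); a literal is (atom, polarity).
    """
    clauses = []
    # possibleMeaning(X,Y) => means(X,Y), weighted by the prior.
    for entity, prior in priors.items():
        atom = ("means", word_in_context, entity)
        clauses.append(([(atom, True)], prior))
    # means(X,Y) and Y != Z => not means(X,Z): at most one meaning per word.
    for y, z in combinations(sorted(priors), 2):
        clauses.append((
            [(("means", word_in_context, y), False),
             (("means", word_in_context, z), False)],
            HARD,
        ))
    return clauses


clauses = disambiguation_clauses(
    "Elvis_at_D15",
    {"ElvisPresley": 0.7, "SaintElvis": 0.3},  # 0.3 is a made-up value
)
for literals, weight in clauses:
    print(weight, literals)
```

Each candidate meaning contributes one unit clause carrying its prior, and each pair of candidates contributes one exclusion clause, so the number of clauses grows quadratically in the number of candidate meanings per word.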
15
Information Extraction as Formulas
Reasoning Problem
Pattern Matching Problem
type(Elvis,Taxidophobist).
Elvis died in England.
type(X,Taxidophobist) ∧ bornInPlace(X,Y) ⇒ ¬diedInPlace(X,Z)
Louis XIV died in France.
"died in" diedInPlace ?
Disambiguation Problem
meaning(Elvis_at_D15, ElvisPresley). 0.7
16
Information Extraction as Formulas
Pattern Matching Problem
occurs("died in", Elvis_at_D15, England_at_D15). 14
Elvis died in England.
Louis XIV died in France.
"died in" diedInPlace ?
occurs(P,Wic1,Wic2) ∧ means(Wic1,X) ∧ means(Wic2,Y) ∧ R(X,Y) ⇒ mapsTo(P,R)
occurs(P,Wic1,Wic2) ∧ means(Wic1,X) ∧ means(Wic2,Y) ∧ mapsTo(P,R) ⇒ R(X,Y)
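To make the two pattern rules concrete, here is a small sketch that applies them deterministically to the running example. This is a simplification: in SOFIE the rules are soft formulas weighed against each other, whereas the code below simply fires them. The function names, the data layout, and the document id `D7` are hypothetical; the known fact diedInPlace(LouisXIV, France) is taken from the ontology shown on the earlier slides.

```python
def learn_patterns(occurrences, means, facts):
    """Rule 1: a pattern seen with a known fact maps to that relation."""
    maps_to = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        for relation, a, b in facts:
            if (a, b) == (x, y):
                maps_to.add((pattern, relation))
    return maps_to


def apply_patterns(occurrences, means, maps_to):
    """Rule 2: a mapped pattern between two entities proposes a fact."""
    proposed = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue  # a word without a meaning cannot yield a fact
        for known_pattern, relation in maps_to:
            if known_pattern == pattern:
                proposed.add((relation, x, y))
    return proposed


occurrences = [
    ("died in", "LouisXIV_at_D7", "France_at_D7"),  # D7 is a made-up id
    ("died in", "Elvis_at_D15", "England_at_D15"),
]
means = {
    "LouisXIV_at_D7": "LouisXIV",
    "France_at_D7": "France",
    "Elvis_at_D15": "ElvisPresley",
    "England_at_D15": "England",
}
facts = {("diedInPlace", "LouisXIV", "France")}

maps_to = learn_patterns(occurrences, means, facts)
new_facts = apply_patterns(occurrences, means, maps_to) - facts
print(maps_to)
print(new_facts)
```

The first rule learns the pattern from the Louis XIV sentence; the second rule then proposes a fact for the Elvis sentence, which is exactly the hypothesis that the reasoning step later has to accept or reject.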
17
Information Extraction as Formulas
Reasoning Problem
Pattern Matching Problem
type(Elvis,Taxidophobist).
occurs("died in", Elvis_at_D15, England_at_D15). 14
type(X,Taxidophobist) ∧ bornInPlace(X,Y) ⇒ ¬diedInPlace(X,Z)
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized:
means(Elvis_at_D15, ElvisPresley) ?
mapsTo("died In", diedInPlace) ?
diedIn(ElvisPresley, England) ?
Disambiguation Problem
meaning(Elvis_at_D15, ElvisPresley). 0.7
18
Weighted MAX SAT Problem
Find truth assignments to hypotheses so that the
weight of satisfied formulas is maximized
Structurally much simpler than MLNs. No need to
model probabilities if we're just interested in
the maximum.
Problems:
- The Weighted MAX SAT Problem is NP-hard.
- Our instance of the problem is huge.
- The most popular greedy approximation algorithm (Johnson's) does not work well with our type of formulas.
bornInPlace(X,Y) ⇒ ¬bornInPlace(X,Z)
¬A ∨ ¬B, ¬A ∨ ¬C, ¬B ∨ ¬C
Johnson's algorithm has an upper bound of 2/3 on its approximation ratio.
FMS Algorithm
The Functional MAX SAT Algorithm considers only
unit clauses.
Formulas: ¬A ∨ ¬B w1, ¬A ∨ ¬B w2, ¬B ∨ ¬C w3, C w4
Hypotheses: A, B, C (truth values shown on the slide: false, false, true)
The Functional MAX SAT Algorithm propagates Dominating Unit Clauses
¬A ∨ B   10
¬A       10
A        30
A = true, since 30 ≥ 10 + 10
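The propagation step in this example can be sketched in a few lines. This is not the full FMS algorithm, only the unit-clause step the slide illustrates, and it rests on an assumption read off the "30 versus 10 + 10" example: a unit clause counts as dominating when its weight is at least the total weight of all clauses containing the opposite literal. The function names and the clause representation are hypothetical.

```python
def simplify(clauses, var, value):
    """Fix var := value, dropping satisfied and fully falsified clauses."""
    remaining = []
    for literals, weight in clauses:
        if (var, value) in literals:
            continue  # clause is satisfied by the new assignment
        reduced = [lit for lit in literals if lit[0] != var]
        if reduced:  # an empty clause is falsified and simply dropped
            remaining.append((reduced, weight))
    return remaining


def propagate_dominating_units(clauses):
    """Repeatedly fix variables whose unit clause dominates.

    clauses: list of (literals, weight); a literal is (variable, wanted_value).
    Returns (assignment, remaining_clauses).
    """
    assignment = {}
    clauses = [(list(literals), weight) for literals, weight in clauses]
    changed = True
    while changed:
        changed = False
        for literals, _ in clauses:
            if len(literals) != 1:
                continue
            var, value = literals[0]
            # Weight of unit clauses asserting exactly this literal.
            support = sum(w for lits, w in clauses if lits == [(var, value)])
            # Weight of every clause containing the opposite literal.
            opposition = sum(w for lits, w in clauses if (var, not value) in lits)
            if support >= opposition:
                assignment[var] = value
                clauses = simplify(clauses, var, value)
                changed = True
                break  # restart the scan on the simplified clause set
    return assignment, clauses


example = [
    ([("A", False), ("B", True)], 10),  # not A or B, weight 10
    ([("A", False)], 10),               # not A, weight 10
    ([("A", True)], 30),                # A, weight 30
]
assignment, rest = propagate_dominating_units(example)
print(assignment, rest)
```

On the slide's example, fixing A to true turns the first clause into the unit clause B, which has no opposition and is therefore propagated as well.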
20
FMS Algorithm
Polynomial time
FMS Algorithm: FOR i=1 TO 42 ... NEXT i
Approximation Guarantee
Experiments show better performance in practice than Johnson's algorithm in our setting.
21
FMS Algorithm
Elvis died in England
r(X,Y) ∧ s(Y) ⇒ t(X,Y)
FMS Algorithm: FOR i=1 TO 42 ... NEXT i
22
FMS Algorithm
Elvis died in England
r(X,Y) ∧ s(Y) ⇒ t(X,Y)
type(Elvis,Taxidophobist) = 1
diedIn(Elvis,England) = 0
FMS Algorithm: FOR i=1 TO 42 ... NEXT i
means(Elvis_at_D15,Elvis) = 0
means(Elvis_at_D15,...) = 1
diedIn
England
St. Elvis
23
SOFIE
r(X,Y) ∧ s(Y) ⇒ t(X,Y)
diedIn
England
St. Elvis
24
Other Experiments
(All experiments with the YAGO ontology)
25
Conclusion
SOFIE unifies the tasks of
- entity disambiguation
- pattern extraction
- semantic constraint reasoning
in a single framework, delivering canonicalized facts of high precision.
s(Y) t(X)
died in England...
but is alive!
http://mpii.de/yago-naga
26
SOFIE rules!
occurs(P,WX,WY) ∧ refersTo(WX,X) ∧ refersTo(WY,Y) ∧ R(X,Y) ⇒ expresses(P,R)
occurs(P,WX,WY) ∧ expresses(P,R) ∧ refersTo(WX,X) ∧ refersTo(WY,Y) ∧ domain(R,D1) ∧ range(R,D2) ∧ type(X,D1) ∧ type(Y,D2) ⇒ R(X,Y)
R(X,Y) ∧ R(X,Z) ∧ type(R,function) ⇒ Y = Z
disambiguationPrior(W,X) ⇒ refersTo(W,X)
¬R(X,Y)
bornInYear(X,B) ∧ diedInYear(X,D) ⇒ B < D
27
SOFIE Experiments
28
SOFIE Large-Scale Experiment
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Runtime (summed over 5 batches):
Parsing                705h
Hypothesis Generation  615h
Solving                230h
Total                  1550h
Results (precision in %)
87 87 13 98 95
? 90
bornIn bornOnD diedIn diedOnD polOf
29
SOFIE Relation to Markov Logic
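r(x,y) ∧ s(x,z) ⇒ t(x,z)   w   ...
sat(i,X): number of satisfied instances of the ith formula
w_i: weight of the ith formula
X: a truth assignment (false/true) to hypotheses such as bornIn(Nicholas, Patras)
$$P(X) \propto \prod_i e^{\mathrm{sat}(i,X)\, w_i}$$
$$\arg\max_X \prod_i e^{\mathrm{sat}(i,X)\, w_i} = \arg\max_X \log\Big(\prod_i e^{\mathrm{sat}(i,X)\, w_i}\Big) = \arg\max_X \sum_i \mathrm{sat}(i,X)\, w_i$$
Weighted MAX SAT problem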
30
Grounding
r(X,Y) ∧ s(Y) ⇒ t(X,Y)
Immutable, complete facts (e.g. pattern occurrences)
¬r(X,Y) ∨ ¬s(Y) ∨ t(X,Y)
r(a,a)
¬r(a,b), ¬r(b,a), ¬r(b,b)
Entities: a, b
¬r(a,a) ∨ ¬s(a) ∨ t(a,a)
¬r(a,b) ∨ ¬s(b) ∨ t(a,b)
¬r(b,a) ∨ ¬s(a) ∨ t(b,a)
¬r(b,b) ∨ ¬s(b) ∨ t(b,b)
31
Grounding
r(X,Y) ∧ s(Y) ⇒ t(X,Y)
Immutable, complete facts (e.g. pattern occurrences)
¬r(X,Y) ∨ ¬s(Y) ∨ t(X,Y)
r(a,a) w
¬r(a,b), ¬r(b,a), ¬r(b,b)
¬s(a) ∨ t(a,a) w
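The grounding and simplification shown on the last two slides can be reproduced with a short sketch. The rule, the entities a and b, and the known facts about r come from the slides; the function names and the clause representation are assumptions made for illustration.

```python
from itertools import product

ENTITIES = ["a", "b"]
# r is immutable and complete: r(a,a) holds, every other instance is false.
KNOWN_R = {("a", "a")}


def ground_rule(entities):
    """Ground r(X,Y) and s(Y) => t(X,Y) as clauses: not r, or not s, or t."""
    clauses = []
    for x, y in product(entities, repeat=2):
        clauses.append([
            (("r", x, y), False),
            (("s", y), False),
            (("t", x, y), True),
        ])
    return clauses


def simplify_with_known_facts(clauses, known_r):
    """Remove satisfied clauses and false literals using the known relation r."""
    remaining = []
    for clause in clauses:
        reduced, satisfied = [], False
        for atom, polarity in clause:
            if atom[0] == "r":
                truth = atom[1:] in known_r
                if truth == polarity:
                    satisfied = True  # this literal is true, clause is done
                    break
                continue  # this literal is false, drop it from the clause
            reduced.append((atom, polarity))
        if not satisfied:
            remaining.append(reduced)
    return remaining


grounded = ground_rule(ENTITIES)
simplified = simplify_with_known_facts(grounded, KNOWN_R)
print(len(grounded))
print(simplified)
```

Of the four grounded clauses, three are already satisfied because their r instance is known to be false, which leaves a single clause over the hypotheses s(a) and t(a,a).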
32
Grounding
¬s(a) ∨ t(a,a)   w1
p(c,d) ∨ ¬q(e)   w2
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized:
means(Elvis_at_D15, ElvisPresley) = true
mapsTo("died In", diedInPlace) = true
diedIn(ElvisPresley, England) = true