Title: Local and Global Algorithms for Disambiguation to Wikipedia
Lev Ratinov (1), Dan Roth (1), Doug Downey (2), Mike Anderson (3)
(1) University of Illinois at Urbana-Champaign
(2) Northwestern University
(3) Rexonomy
March 2011
2. Information overload
3. Organizing knowledge
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s-era Chicago albums to catch my ear, along with Chicago II.
4. Cross-document co-reference resolution
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s-era Chicago albums to catch my ear, along with Chicago II.
5. Reference resolution (disambiguation to Wikipedia)
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s-era Chicago albums to catch my ear, along with Chicago II.
6. The reference collection has structure
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s-era Chicago albums to catch my ear, along with Chicago II.
[Figure: relation edges among the Chicago senses, labeled Is_a, Is_a, Used_In, Released, Succeeded]
7. Analysis of Information Networks
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s-era Chicago albums to catch my ear, along with Chicago II.
8. Here: Wikipedia as a knowledge resource, but we can use other resources
[Figure: relation edges among the Chicago senses, labeled Is_a, Is_a, Used_In, Released, Succeeded]
9. Talk outline
- High-level algorithmic approach:
  - bipartite graph matching with global and local inference.
- Local inference.
  - Experiments and results.
- Global inference.
  - Experiments and results.
- Results, conclusions.
- Demo.
10. Problem formulation: a matching/ranking problem
[Figure: mentions in text documents (news, blogs) matched to Wikipedia articles]
11. Local approach
- G is a solution to the problem:
  - a set of pairs (m, t), where
  - m is a mention in the document, and
  - t is the matched Wikipedia title.
12. Local approach
- G is a solution to the problem:
  - a set of pairs (m, t), where
  - m is a mention in the document, and
  - t is the matched Wikipedia title.
- The local score φ(m, t) of matching the mention to the title.
13. Local + Global: using the Wikipedia structure
- A global term evaluating how good the structure of the solution is.
14. Can be reduced to an NP-hard problem
15. A tractable variation
- Invent a surrogate solution G:
  - disambiguate each mention independently;
  - evaluate the structure based on pair-wise coherence scores ψ(ti, tj).
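The tractable variation above can be sketched directly: build a surrogate solution by taking each mention's best local candidate, then re-rank every mention's candidates by local score plus summed coherence with the other mentions' surrogate titles. All names here are hypothetical; `local` plays the role of the local score φ and `psi` the pairwise coherence ψ.

```python
# Sketch of the tractable local+global objective (hypothetical names).
# local[m][t] : local score phi(m, t) for mention m and candidate title t
# psi(t1, t2) : pairwise coherence between two Wikipedia titles

def disambiguate(mentions, candidates, local, psi):
    # Step 1: surrogate solution -- the best title per mention, judged locally.
    surrogate = {m: max(candidates[m], key=lambda t: local[m][t]) for m in mentions}
    # Step 2: re-rank each mention's candidates against the surrogate titles
    # of the *other* mentions.
    solution = {}
    for m in mentions:
        others = [surrogate[n] for n in mentions if n != m]
        solution[m] = max(
            candidates[m],
            key=lambda t: local[m][t] + sum(psi(t, o) for o in others),
        )
    return solution
```

In a toy instance this shows the point of the global term: a coherence bonus from an unambiguous co-occurring title can overturn a narrowly wrong local decision.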
16. Talk outline
- High-level algorithmic approach:
  - bipartite graph matching with global and local inference.
- Local inference.
  - Experiments and results.
- Global inference.
  - Experiments and results.
- Results, conclusions.
- Demo.
17. I. Baseline: P(Title | Surface Form)
P(Title | "Chicago")
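This baseline is typically estimated from Wikipedia hyperlink statistics: how often each surface form appears as anchor text linking to each title. A minimal sketch with invented counts:

```python
from collections import Counter, defaultdict

# anchor_counts[surface][title] = how often this surface form links to the title
anchor_counts = defaultdict(Counter)

def observe(surface, title):
    """Record one anchor-text occurrence of `surface` linking to `title`."""
    anchor_counts[surface][title] += 1

def baseline(surface):
    """Estimate P(title | surface) from the observed link counts."""
    counts = anchor_counts[surface]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

For example, if "Chicago" links to the city 99 times out of 100, `baseline("Chicago")` gives the city a prior of 0.99, mirroring the score table on the "Putting it all together" slide.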
18. II. Context(Title)
Context(Charcoal): "a font called __ is used to"
19. III. Text(Title)
Just the text of the page (one per title)
20. Putting it all together
- City vs. font: (0.99 vs. 0.0001, 0.01 vs. 0.2, 0.03 vs. 0.01)
- Band vs. font: (0.001 vs. 0.0001, 0.001 vs. 0.2, 0.02 vs. 0.01)
- Training: ranking SVM.
  - Consider all title pairs.
  - Train a ranker on the pairs (learn to prefer the correct solution).
- Inference: knockout tournament.
- Key: abstracts over the text; learns which scores are important.

Title        | Score Baseline | Score Context | Score Text
Chicago_city | 0.99           | 0.01          | 0.03
Chicago_font | 0.0001         | 0.2           | 0.01
Chicago_band | 0.001          | 0.001         | 0.02
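The knockout-tournament inference can be sketched in a few lines: the current winner plays the next candidate, and the pairwise ranker decides each match. In this sketch a simple sum of the three scores stands in for the learned pairwise ranker.

```python
def knockout(candidates, prefer):
    """Knockout tournament: the running winner plays each remaining
    candidate in turn; `prefer(a, b)` returns whichever of the two
    the pairwise ranker prefers."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        winner = prefer(winner, challenger)
    return winner

# Score table from the slide: (baseline, context, text) per candidate title.
scores = {
    "Chicago_city": (0.99, 0.01, 0.03),
    "Chicago_font": (0.0001, 0.2, 0.01),
    "Chicago_band": (0.001, 0.001, 0.02),
}

def prefer(a, b):
    # Stand-in for the trained ranking SVM: compare summed scores.
    return a if sum(scores[a]) >= sum(scores[b]) else b
```

With n candidates this needs only n − 1 ranker calls, instead of scoring all pairs at inference time.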
21. Example: font or city?
Text(Chicago_city), Context(Chicago_city)
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Text(Chicago_font), Context(Chicago_font)
22. Lexical matching
Text(Chicago_city), Context(Chicago_city)
Cosine similarity with TF-IDF weighting.
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Text(Chicago_font), Context(Chicago_font)
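The lexical-matching step, TF-IDF-weighted cosine similarity between the mention's context and a candidate title's Text/Context representation, can be sketched with the standard library (a minimal sketch; real systems use sparse vectors and precomputed IDF tables):

```python
import math
from collections import Counter

def tfidf_cosine(doc_a, doc_b, corpus):
    """Cosine similarity of two token lists under TF-IDF weighting,
    with IDF estimated from `corpus` (a list of token lists)."""
    n = len(corpus)

    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed IDF

    def vec(doc):
        tf = Counter(doc)
        return {t: c * idf(t) for t, c in tf.items()}

    va, vb = vec(doc_a), vec(doc_b)
    dot = sum(va[t] * vb.get(t, 0.0) for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0
```

Rare, discriminative terms ("font", "diagonal") get high IDF, so overlap on them pulls the mention toward Chicago_font rather than the city.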
23. Ranking: font vs. city
Text(Chicago_city), Context(Chicago_city)
Similarity scores for Chicago_city: (0.5, 0.2, 0.1, 0.8)
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Similarity scores for Chicago_font: (0.3, 0.2, 0.3, 0.5)
Text(Chicago_font), Context(Chicago_font)
24. Train a ranking SVM
Text(Chicago_city), Context(Chicago_city)
(0.5, 0.2, 0.1, 0.8)
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Difference vector: (0.2, 0, -0.2, 0.3), label -1
(0.3, 0.2, 0.3, 0.5)
Text(Chicago_font), Context(Chicago_font)
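The pairwise-ranking reduction above turns each candidate pair into one training example: the difference of their feature vectors, labeled -1 here because the second candidate (the font) should outrank the first (the city). The talk trains a ranking SVM on these; to keep this sketch dependency-free, a perceptron learns from the same difference vectors, which is the identical reduction with a different loss.

```python
# Pairwise ranking via difference vectors. Each training pair is
# (x_a - x_b, y), with y = +1 if candidate a is the correct title
# and y = -1 if candidate b is. A ranking SVM learns a weight vector
# w from these; this sketch substitutes a perceptron.

def train_ranker(pairs, epochs=100):
    """pairs: list of (diff_vector, label) with label in {+1, -1}."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in pairs:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misranked pair: perceptron update
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def score(w, x):
    """Dot product: a positive score on (x_a - x_b) prefers candidate a."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Trained on the slide's single pair, the learned weights score the city-minus-font difference negative, i.e. the ranker prefers the font, as the label demands.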
25. Scaling issues: one of our key contributions
Text(Chicago_city), Context(Chicago_city)
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Text(Chicago_font), Context(Chicago_font)
26. Scaling issues
Text(Chicago_city), Context(Chicago_city)
These Text and Context representations are big, and must be loaded into memory from disk.
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Text(Chicago_font), Context(Chicago_font)
27. Improving performance
Text(Chicago_city), Context(Chicago_city)
Rather than computing TF-IDF-weighted cosine similarity, we want to train a classifier on the fly; but due to the aggressive feature pruning, we choose PrTFIDF.
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Text(Chicago_font), Context(Chicago_font)
28. Performance (local only): ranking accuracy

Dataset        | Baseline (solvable) | Local TFIDF (solvable) | Local PrTFIDF (solvable)
ACE            | 94.05               | 95.67                  | 96.21
MSN News       | 81.91               | 84.04                  | 85.10
AQUAINT        | 93.19               | 94.38                  | 95.57
Wikipedia Test | 85.88               | 92.76                  | 93.59
29. Talk outline
- High-level algorithmic approach:
  - bipartite graph matching with global and local inference.
- Local inference.
  - Experiments and results.
- Global inference.
  - Experiments and results.
- Results, conclusions.
- Demo.
30. Co-occurrence(Title1, Title2)
The city senses of Boston and Chicago appear together often.
31. Co-occurrence(Title1, Title2)
Rock music and albums appear together often.
32. Global ranking
- How do we approximate the global semantic context in the document? (What is G?)
  - Use only non-ambiguous mentions for G.
  - Use the top baseline disambiguation for NER surface forms.
  - Use the top baseline disambiguation for all the surface forms.
- How do we define relatedness between two titles? (What is ψ?)
33. ψ: pair-wise relatedness between two titles
- Normalized Google Distance
- Pointwise Mutual Information
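Both relatedness measures can be computed from the two titles' sets of incoming links. A sketch using the standard NGD and PMI formulas over in-link sets (variable names are mine; real systems precompute these over a full Wikipedia link graph):

```python
import math

def ngd(inlinks_a, inlinks_b, num_articles):
    """Normalized Google Distance over Wikipedia in-link sets
    (lower = more related). inlinks_*: sets of linking-article ids."""
    a, b = len(inlinks_a), len(inlinks_b)
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return float("inf")  # no shared in-links: maximally unrelated
    return (math.log(max(a, b)) - math.log(common)) / (
        math.log(num_articles) - math.log(min(a, b))
    )

def pmi(inlinks_a, inlinks_b, num_articles):
    """Pointwise mutual information of the two titles' in-link events."""
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0
    p_ab = common / num_articles
    p_a = len(inlinks_a) / num_articles
    p_b = len(inlinks_b) / num_articles
    return math.log(p_ab / (p_a * p_b))
```

Titles co-linked by many of the same articles, like the city senses of Boston and Chicago, get low NGD and high PMI.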
34. What is the best G? (ranker accuracy, solvable mentions)

Dataset        | Baseline | Baseline + Global (unambiguous) | Baseline + Global (NER) | Baseline + Global (all mentions)
ACE            | 94.05    | 94.56                           | 96.21                   | 96.75
MSN News       | 81.91    | 84.46                           | 84.04                   | 88.51
AQUAINT        | 93.19    | 95.40                           | 94.04                   | 95.91
Wikipedia Test | 85.88    | 89.67                           | 89.59                   | 89.79
35. Results: ranker accuracy (solvable mentions)

Dataset        | Baseline | Baseline + Lexical | Baseline + Global (all mentions)
ACE            | 94.05    | 96.21              | 96.75
MSN News       | 81.91    | 85.10              | 88.51
AQUAINT        | 93.19    | 95.57              | 95.91
Wikipedia Test | 85.88    | 93.59              | 89.79
36. Results: Local + Global

Dataset        | Baseline | Baseline + Lexical | Baseline + Lexical + Global
ACE            | 94.05    | 96.21              | 97.83
MSN News       | 81.91    | 85.10              | 87.02
AQUAINT        | 93.19    | 95.57              | 94.38
Wikipedia Test | 85.88    | 93.59              | 94.18
37. Talk outline
- High-level algorithmic approach:
  - bipartite graph matching with global and local inference.
- Local inference.
  - Experiments and results.
- Global inference.
  - Experiments and results.
- Results, conclusions.
- Demo.
38. Conclusions
- Dealt with a very large-scale knowledge acquisition and extraction problem.
- State-of-the-art algorithmic tools that exploit the content and structure of the network.
- Formulated a framework for local + global reference resolution and disambiguation into knowledge networks.
- Proposed local and global algorithms with state-of-the-art performance.
- Addressed scaling, a major issue.
- Identified key remaining challenges (next slide).
39. We want to know what we don't know
- Not dealt with well in the literature:
  - "As Peter Thompson, a 16-year-old hunter, said .."
  - "Dorothy Byrne, a state coordinator for the Florida Green Party .."
- We train a separate SVM classifier to identify such cases. The features are:
  - all the baseline, lexical, and semantic scores of the top candidate;
  - the score assigned to the top candidate by the ranker;
  - the confidence of the ranker on the top candidate with respect to the second-best disambiguation;
  - the Good-Turing probability of an out-of-Wikipedia occurrence for the mention.
- Limited success; future research.
40. Comparison to the previous state of the art (all mentions, including out-of-Wikipedia)

Dataset        | Baseline | Milne & Witten | Our system (GLOW)
ACE            | 69.52    | 72.76          | 77.25
MSN News       | 72.83    | 68.49          | 74.88
AQUAINT        | 82.64    | 83.61          | 83.94
Wikipedia Test | 81.77    | 80.32          | 90.54
41. Demo