Title: Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function
1. Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function
- Robert Hall
- Joint work with Aron Culotta, Pallika Kanani, Michael Wick and Andrew McCallum
2. Given a set of similar names, which ones refer to the same person?
3. Did one H. Wang have two papers at ICCV 2003?
- H. Wang and D. Suter. Variable Bandwidth QMDPE and Its Application in Robust Optic Flow Estimation. 9th IEEE International Conference on Computer Vision (ICCV), Nice, France, pages 178-183, October 2003.
- H. Wang and N. Ahuja. Facial Expression Decomposition. IEEE International Conference on Computer Vision (ICCV), 2003.
4. Are these papers by the same D.P. Miller?
- T. Dean, R.J. Firby and D.P. Miller. Hierarchical Planning with Deadlines and Resources. Readings in Planning, pp. 369-388, Allen, Hendler, Tate eds., Morgan Kaufmann, 1990.
- D.P. Miller and C. Winton. Botball Kit for Teaching Engineering Computing. In Proceedings of the ASEE National Conference, Salt Lake City, UT, June 2004.
- A. Winterholler, M. Roman, T. Hunt, and D.P. Miller. Design of a High-Mobility Low-Weight Lunar Rover. Proceedings of iSAIRAS 2005, Munich, Germany, September 2005.
5. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
6. The traditional coreference method
- Vertices are author mentions.
- Edge weights are similarity scores.
7. Pairwise Features
- Number of overlapping co-authors.
- Similarity between titles: edit distance, overlapping n-grams.
- Similarity in email addresses (where available).
- Institution and venue information.
- Others, based on whatever information is available.
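Features of this kind can be sketched as follows (an illustration only, not the authors' code; the mention field names and the particular feature set are my assumptions):

```python
# Sketch of pairwise features for two citation mentions.
# Mentions are dicts with 'coauthors' (set), 'title' (str), 'email' (str or None).
def ngrams(s, n=3):
    """Character n-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def pairwise_features(m1, m2):
    feats = {}
    # Co-author overlap between the two citations.
    feats["num_shared_coauthors"] = len(m1["coauthors"] & m2["coauthors"])
    # Jaccard overlap of title character n-grams.
    g1, g2 = ngrams(m1["title"]), ngrams(m2["title"])
    feats["title_ngram_overlap"] = len(g1 & g2) / max(len(g1 | g2), 1)
    # Email match, only when both mentions carry an email.
    if m1["email"] and m2["email"]:
        feats["email_match"] = float(m1["email"] == m2["email"])
    return feats
```

Each feature is emitted into a sparse dict so a log-linear model can weight it directly.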
8. Greedy Agglomerative Inference
- Vertices represent clusters of mentions (initially singletons).
- Iteratively merge the pair of vertices with the highest edge weight.
- Terminate when the highest edge weight falls below a threshold.
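The procedure above can be sketched as follows (a minimal illustration, assuming a symmetric `score(cluster_a, cluster_b)` function is supplied; not the authors' code):

```python
# Greedy agglomerative inference: start from singletons, repeatedly merge
# the best-scoring cluster pair, stop when no pair scores above threshold.
def greedy_agglomerative(mentions, score, threshold=0.0):
    clusters = [frozenset([m]) for m in mentions]  # initially singletons
    while len(clusters) > 1:
        best, best_pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score(clusters[i], clusters[j])
                if best is None or s > best:
                    best, best_pair = s, (i, j)
        if best < threshold:  # no promising merge remains
            break
        i, j = best_pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

This is O(n^3) as written; real systems typically maintain a priority queue of candidate merges instead of rescanning all pairs.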
9. Pairwise Model Details
- Similarity between two clusters is the sum of pairwise similarities.
- A maximum entropy classifier is learned offline.
- All pairs of mentions are enumerated for training.
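As a sketch (my names, not the authors' code): with a maximum entropy (log-linear) pairwise model, the merge score for two clusters sums the linear pairwise scores, where `pairwise_features` is any function returning a feature dict for two mentions:

```python
# Cluster-pair similarity under a pairwise log-linear model.
def maxent_score(weights, feats):
    """Linear score (log-odds of 'same author') for one mention pair."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def cluster_similarity(c1, c2, weights, pairwise_features):
    """Sum of pairwise scores over all cross-cluster mention pairs."""
    return sum(maxent_score(weights, pairwise_features(m1, m2))
               for m1 in c1 for m2 in c2)
```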
10. Characteristics of author entities
- It is unlikely that a person would change institutions so frequently.
- Therefore it is unlikely that these authors are the same person.
11. Characteristics of author entities
- An author usually has fewer than 3 spellings of their first name.
- An author usually has a small group of co-authors.
- A person usually authors fewer than 50 papers per year.
12. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
13. Clusterwise Model Details (Culotta 2006)
- Similarity between two clusters is based on features of all the mentions in their union.
- A maximum entropy classifier is learned offline.
- Clusters are sampled for training.
14. Clusterwise Features
- Use first-order logic to encode features for entire clusters.
- Universal and existential operators:
  - there exists a mismatch in middle names
  - all mention-pairs have a common co-author
- Aggregates of real-valued features:
  - the cluster contains 4 different institutions
  - the average edit distance between titles is x
  - half of all mention-pairs share a middle initial
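These first-order features can be sketched as predicates over all mention-pairs in a cluster (an illustration with assumed field names, not the authors' code):

```python
# Existential, universal, and aggregate features over a whole cluster.
from itertools import combinations

def clusterwise_features(cluster):
    """cluster: list of dicts with 'middle' (str or None) and 'coauthors' (set)."""
    pairs = list(combinations(cluster, 2))
    feats = {}
    # Existential: does any pair disagree on middle name?
    feats["exists_middle_mismatch"] = float(any(
        a["middle"] and b["middle"] and a["middle"] != b["middle"]
        for a, b in pairs))
    # Universal: do all pairs share at least one co-author?
    feats["all_pairs_share_coauthor"] = float(all(
        a["coauthors"] & b["coauthors"] for a, b in pairs))
    # Aggregate: fraction of pairs sharing a middle initial.
    shared = [a["middle"][0] == b["middle"][0]
              for a, b in pairs if a["middle"] and b["middle"]]
    feats["frac_shared_middle_initial"] = (
        sum(shared) / len(shared) if shared else 0.0)
    return feats
```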
15. Parameter Estimation
- The true clustering is available from labeled data.
- There are combinatorially many training examples to choose from; how can we get a good classifier?
- Randomly choose some positive and negative examples.
- Train a classifier, e.g. by gradient ascent.
16. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
17. Error-driven Training
- Perform inference on the training set.
- Stop when the first error is made.
- Update the model parameters to guard against this type of error.
18. Parameter Updates
- Change to a linear scoring function.
- Perceptron Update
- MIRA Update (Crammer & Singer 2003)
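With a linear scoring function over edge features, these updates take the following standard forms (a reconstruction in my own notation, not necessarily the slide's: w is the weight vector, φ(e) the feature vector of edge e, and y ∈ {+1, −1} its label):

```latex
s(e) = \mathbf{w} \cdot \phi(e)

\text{Perceptron:}\quad \mathbf{w} \leftarrow \mathbf{w} + y\,\phi(e)

\text{MIRA:}\quad \mathbf{w} \leftarrow \arg\min_{\mathbf{v}} \lVert \mathbf{v}-\mathbf{w} \rVert^2
\quad \text{s.t.}\quad y\,\bigl(\mathbf{v}\cdot\phi(e)\bigr) \ge 1
```

MIRA makes the smallest change to the weights that classifies the erroneous example correctly with unit margin, while the perceptron takes a fixed-size step.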
19. Ranking Update
- Stop when the first error is made.
- Find a correct merge that could have been made.
- Update the model to give the correct edge a higher score than the incorrect edge.
20. Ranking Parameter Updates
- Let φ+ be the features of a correct edge, and φ− the features of an incorrect edge.
- Perceptron Update
- MIRA Update
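The ranking versions of these updates can be sketched as follows (an illustration with my own helper names, not the authors' code; features are sparse dicts). The perceptron adds the difference of the two feature vectors; MIRA makes the smallest weight change that ranks the correct edge above the incorrect one by a fixed margin:

```python
# Ranking updates over a correct edge (phi_pos) and an incorrect edge (phi_neg).
def ranking_perceptron_update(weights, phi_pos, phi_neg):
    """w <- w + phi(correct) - phi(incorrect)."""
    for f in set(phi_pos) | set(phi_neg):
        weights[f] = weights.get(f, 0.0) + phi_pos.get(f, 0.0) - phi_neg.get(f, 0.0)

def ranking_mira_update(weights, phi_pos, phi_neg, margin=1.0):
    """Smallest change to w so the correct edge outscores the incorrect one
    by `margin` (closed-form solution of the one-constraint QP)."""
    diff = {f: phi_pos.get(f, 0.0) - phi_neg.get(f, 0.0)
            for f in set(phi_pos) | set(phi_neg)}
    score_gap = sum(weights.get(f, 0.0) * v for f, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    loss = margin - score_gap
    if loss > 0 and norm_sq > 0:
        tau = loss / norm_sq  # step size from the KKT conditions
        for f, v in diff.items():
            weights[f] = weights.get(f, 0.0) + tau * v
```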
21. Training Details
- Initialize parameters to 0.
- Iterate over the training sets.
- Perform inference until an error is made, then update the parameters.
- Final parameters are the average of all observed settings after each update.
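The averaging step can be sketched as follows (an assumption about the exact scheme; the slide only says the final parameters average all observed settings):

```python
# Average the weight snapshots recorded after each update.
def average_weights(history):
    """history: non-empty list of weight dicts, one snapshot per update."""
    avg = {}
    for w in history:
        for f, v in w.items():
            avg[f] = avg.get(f, 0.0) + v
    return {f: v / len(history) for f, v in avg.items()}
```

Averaging is the standard stabilizer for perceptron-style training on non-separable data.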
22. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
23. Datasets
- Penn: 2021 citation strings, 139 author entities.
- Rexa: 1459 citation strings, 289 author entities.
- DBLP: 566 citation strings, 76 author entities.
24. Penn Corpus (results)
25. Rexa Corpus (results)
26. DBLP Corpus (results)
27. Conclusions
- The clusterwise model performs worse than the pairwise model when using a naïve training method.
- The clusterwise model beats the baseline when using error-driven training.
- The MIRA update performs worse than the perceptron (unexpectedly), perhaps due to non-separable data.
28. Thank you