Title: Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function
1. Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function
- Robert Hall
- Joint work with Aron Culotta, Pallika Kanani, Michael Wick and Andrew McCallum
2. Given a set of similar names, which ones refer to the same person?
3. Did one H. Wang have two papers at ICCV 2003?
- H. Wang and D. Suter. Variable Bandwidth QMDPE and Its Application in Robust Optic Flow Estimation. 9th IEEE International Conference on Computer Vision (ICCV), Nice, France, pages 178-183, October 2003.
- H. Wang and N. Ahuja. Facial Expression Decomposition. IEEE International Conference on Computer Vision (ICCV), 2003.
4. Are these papers by the same D.P. Miller?
- T. Dean, R.J. Firby and D.P. Miller. Hierarchical Planning with Deadlines and Resources. Readings in Planning, pp. 369-388, Allen, Hendler, Tate eds., Morgan Kaufmann, 1990.
- D.P. Miller and C. Winton. Botball Kit for Teaching Engineering Computing. In Proceedings of the ASEE National Conference, Salt Lake City, UT, June 2004.
- A. Winterholler, M. Roman, T. Hunt, and D.P. Miller. Design of a High-Mobility Low-Weight Lunar Rover. Proceedings of iSAIRAS 2005, Munich, Germany, September 2005.
5. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
6. The traditional coreference method
- Vertices are author mentions.
- Edge weights are similarity scores.
7. Pairwise Features
- Number of overlapping co-authors.
- Similarity between titles: edit distance, overlapping n-grams.
- Similarity in email addresses (where available).
- Institution and venue information.
- Others, based on whatever information is available.
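Features of this kind can be sketched as follows (an illustration only, not the authors' code; the mention field names and the particular feature set are my assumptions):

```python
# Sketch of pairwise features for two citation mentions.
# Mentions are dicts with 'coauthors' (set), 'title' (str), 'email' (str or None).
def ngrams(s, n=3):
    """Character n-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def pairwise_features(m1, m2):
    feats = {}
    # Co-author overlap between the two citations.
    feats["num_shared_coauthors"] = len(m1["coauthors"] & m2["coauthors"])
    # Jaccard overlap of title character n-grams.
    g1, g2 = ngrams(m1["title"]), ngrams(m2["title"])
    feats["title_ngram_overlap"] = len(g1 & g2) / max(len(g1 | g2), 1)
    # Email match, only when both mentions carry an email.
    if m1["email"] and m2["email"]:
        feats["email_match"] = float(m1["email"] == m2["email"])
    return feats
```

Each feature is emitted into a sparse dict so a log-linear model can weight it directly.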
8. Greedy Agglomerative Inference
- Vertices represent clusters of mentions (initially singletons).
- Iteratively merge the pair of vertices with the highest edge weight.
- Terminate when the highest edge weight falls below a threshold.
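The procedure above can be sketched as follows (a minimal illustration, assuming a symmetric `score(cluster_a, cluster_b)` function is supplied; not the authors' code):

```python
# Greedy agglomerative inference: start from singletons, repeatedly merge
# the best-scoring cluster pair, stop when no pair scores above threshold.
def greedy_agglomerative(mentions, score, threshold=0.0):
    clusters = [frozenset([m]) for m in mentions]  # initially singletons
    while len(clusters) > 1:
        best, best_pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score(clusters[i], clusters[j])
                if best is None or s > best:
                    best, best_pair = s, (i, j)
        if best < threshold:  # no promising merge remains
            break
        i, j = best_pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

This is O(n^3) as written; real systems typically maintain a priority queue of candidate merges instead of rescanning all pairs.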
9. Pairwise Model Details
- Similarity between two clusters is the sum of pairwise similarities.
- A maximum entropy classifier is learned offline.
- All pairs of mentions are enumerated for training.
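As a sketch (my names, not the authors' code): with a maximum entropy (log-linear) pairwise model, the merge score for two clusters sums the linear pairwise scores, where `pairwise_features` is any function returning a feature dict for two mentions:

```python
# Cluster-pair similarity under a pairwise log-linear model.
def maxent_score(weights, feats):
    """Linear score (log-odds of 'same author') for one mention pair."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def cluster_similarity(c1, c2, weights, pairwise_features):
    """Sum of pairwise scores over all cross-cluster mention pairs."""
    return sum(maxent_score(weights, pairwise_features(m1, m2))
               for m1 in c1 for m2 in c2)
```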
10. Characteristics of author entities
- It is unlikely that a person would change institutions so frequently.
- Therefore it is unlikely that these authors are the same person.
11. Characteristics of author entities
- An author usually has fewer than 3 spellings of their first name.
- An author usually has a small group of co-authors.
- A person usually authors fewer than 50 papers per year.
12. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
13. Clusterwise Model Details (Culotta 2006)
- Similarity between two clusters is based on features of all the mentions in their union.
- A maximum entropy classifier is learned offline.
- Clusters are sampled for training.
14. Clusterwise Features
- Use first-order logic to encode features for entire clusters.
- Universal and existential operators:
  - there exists a mismatch in middle names
  - all mention-pairs have a common co-author
- Aggregates of real-valued features:
  - the cluster contains 4 different institutions
  - the average edit distance between titles is x
  - half of all mention-pairs share a middle initial
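These first-order features can be sketched as predicates over all mention-pairs in a cluster (an illustration with assumed field names, not the authors' code):

```python
# Existential, universal, and aggregate features over a whole cluster.
from itertools import combinations

def clusterwise_features(cluster):
    """cluster: list of dicts with 'middle' (str or None) and 'coauthors' (set)."""
    pairs = list(combinations(cluster, 2))
    feats = {}
    # Existential: does any pair disagree on middle name?
    feats["exists_middle_mismatch"] = float(any(
        a["middle"] and b["middle"] and a["middle"] != b["middle"]
        for a, b in pairs))
    # Universal: do all pairs share at least one co-author?
    feats["all_pairs_share_coauthor"] = float(all(
        a["coauthors"] & b["coauthors"] for a, b in pairs))
    # Aggregate: fraction of pairs sharing a middle initial.
    shared = [a["middle"][0] == b["middle"][0]
              for a, b in pairs if a["middle"] and b["middle"]]
    feats["frac_shared_middle_initial"] = (
        sum(shared) / len(shared) if shared else 0.0)
    return feats
```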
15. Parameter Estimation
- The true clustering is available from labeled data.
- There are combinatorially many training examples to choose from; how can we get a good classifier?
- Randomly choose some positive and negative examples.
- Train a classifier, e.g. by gradient ascent.
16. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
17. Error-driven Training
- Perform inference on the training set.
- Stop when the first error is made.
- Update the model parameters to guard against this type of error.
18. Parameter Updates
- Change to a linear scoring function.
- Perceptron Update
- MIRA Update (Crammer & Singer 2003)
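With a linear scoring function over edge features, these updates take the following standard forms (a reconstruction in my own notation, not necessarily the slide's: w is the weight vector, φ(e) the feature vector of edge e, and y ∈ {+1, −1} its label):

```latex
s(e) = \mathbf{w} \cdot \phi(e)

\text{Perceptron:}\quad \mathbf{w} \leftarrow \mathbf{w} + y\,\phi(e)

\text{MIRA:}\quad \mathbf{w} \leftarrow \arg\min_{\mathbf{v}} \lVert \mathbf{v}-\mathbf{w} \rVert^2
\quad \text{s.t.}\quad y\,\bigl(\mathbf{v}\cdot\phi(e)\bigr) \ge 1
```

MIRA makes the smallest change to the weights that classifies the erroneous example correctly with unit margin, while the perceptron takes a fixed-size step.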
19. Ranking Update
- Stop when the first error is made.
- Find a correct merge that could have been made.
- Update the model to give the correct edge a higher score than the incorrect edge.
20. Ranking Parameter Updates
- Let φ+ be the features of a correct edge, and φ− the features of an incorrect edge.
- Perceptron Update
- MIRA Update
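The ranking versions of these updates can be sketched as follows (an illustration with my own helper names, not the authors' code; features are sparse dicts). The perceptron adds the difference of the two feature vectors; MIRA makes the smallest weight change that ranks the correct edge above the incorrect one by a fixed margin:

```python
# Ranking updates over a correct edge (phi_pos) and an incorrect edge (phi_neg).
def ranking_perceptron_update(weights, phi_pos, phi_neg):
    """w <- w + phi(correct) - phi(incorrect)."""
    for f in set(phi_pos) | set(phi_neg):
        weights[f] = weights.get(f, 0.0) + phi_pos.get(f, 0.0) - phi_neg.get(f, 0.0)

def ranking_mira_update(weights, phi_pos, phi_neg, margin=1.0):
    """Smallest change to w so the correct edge outscores the incorrect one
    by `margin` (closed-form solution of the one-constraint QP)."""
    diff = {f: phi_pos.get(f, 0.0) - phi_neg.get(f, 0.0)
            for f in set(phi_pos) | set(phi_neg)}
    score_gap = sum(weights.get(f, 0.0) * v for f, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    loss = margin - score_gap
    if loss > 0 and norm_sq > 0:
        tau = loss / norm_sq  # step size from the KKT conditions
        for f, v in diff.items():
            weights[f] = weights.get(f, 0.0) + tau * v
```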
21. Training Details
- Initialize parameters to 0.
- Iterate over the training sets.
- Perform inference until an error is made, then update the parameters.
- Final parameters are the average of all observed settings after each update.
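The averaging step can be sketched as follows (an assumption about the exact scheme; the slide only says the final parameters average all observed settings):

```python
# Average the weight snapshots recorded after each update.
def average_weights(history):
    """history: non-empty list of weight dicts, one snapshot per update."""
    avg = {}
    for w in history:
        for f, v in w.items():
            avg[f] = avg.get(f, 0.0) + v
    return {f: v / len(history) for f, v in avg.items()}
```

Averaging is the standard stabilizer for perceptron-style training on non-separable data.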
22. Outline
- Pairwise Model
- Clusterwise Model
- Error-driven Parameter Estimation
- Experimental Analysis
23. Datasets
- Penn: 2021 citation strings, 139 author entities.
- Rexa: 1459 citation strings, 289 author entities.
- DBLP: 566 citation strings, 76 author entities.
24. Penn Corpus (results)
25. Rexa Corpus (results)
26. DBLP Corpus (results)
27. Conclusions
- The clusterwise model performs worse than the pairwise model when using a naïve training method.
- The clusterwise model beats the baseline when using error-driven training.
- The MIRA update performs worse than the perceptron (unexpectedly), perhaps due to non-separable data.
28. Thank you