Title: A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
1 A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
Nara Institute of Science and Technology
EMNLP-CoNLL 2007, 29th June, Prague, Czech Republic
2 Background
- Named Entity
  - Proper nouns (e.g. Shinzo Abe (Person), Prague (Location)), time/date expressions (e.g. June 29 (Date)) and numerical expressions (e.g. 10)
- In many NLP applications (e.g. IE, QA), Named Entities play an important role
- Named Entity Recognition task (NER)
  - Treated as a sequential tagging problem
  - Machine learning methods have been proposed
  - Recall is usually low
- A large-scale NE dictionary is useful for NER
Semi-automatic methods for compiling NE dictionaries are in demand
3 Resource for NE dictionary construction
- Wikipedia
  - Multilingual encyclopedia on the Web
  - 382,613 gloss articles (Japanese edition, as of June 20, 2007)
  - Gloss indices are composed of nouns or proper nouns
- HTML (semi-structured text)
  - Lists (<LI>) and tables (<TABLE>) can be used as clues for NE type categorization
  - Linked articles are glossed by anchor texts in articles
  - Each article has one or more categories
Wikipedia has useful information for NE categorization, so it can be considered a suitable resource
4 Objective
- Extract Named Entities by assigning proper NE labels to gloss indices of Wikipedia
[Figure: example gloss indices labeled with NE classes (Person, Location, Natural Object, Organization, Product)]
5 Use of Wikipedia features
- Features of Wikipedia articles
  - Anchors of an article refer to other related articles
  - Anchors in list elements depend on each other
  - -> Make 3 assumptions about dependencies between anchors
Assumption 1: The latter element in a list item tends to be in an attribute relation to the former element
Assumption 2: The elements in the same itemization tend to be in the same NE category
Assumption 3: The nested element tends to be in a part-of relation to the upper element
An example of a list structure (NE labels in brackets):
- Burt Bacharach [PERSON] ... composer [VOCATION]
- Dillard & Clark [ORGANIZATION]
- Carpenters [ORGANIZATION]
  - Karen Carpenter [PERSON]
6 Overview of our approach
- Focus on HTML list structure in Wikipedia
  - Make 3 assumptions about dependencies between anchors
  - Formalize the NE categorization problem as labeling anchors in lists with NE classes
  - Define 3 kinds of cliques (edges: Sibling, Cousin and Relative) between anchors based on the 3 assumptions
  - Construct graphs based on the 3 defined cliques
- CRFs for NE categorization in Wikipedia
  - Define potential functions over the 3 kinds of edges (and nodes) to provide a conditional distribution over the graphs
  - Estimate the MAP label assignment over the graphs using Conditional Random Fields
7 Conditional Random Fields (CRFs)
- Conditional Random Fields (Lafferty et al. 2001)
  - Discriminative, undirected models
  - Define the conditional distribution p(y|x)
- Features
  - Arbitrary features can be used
  - Globally optimized over all possible label assignments
  - Can deal with label dependencies by defining potential functions for cliques (2 or more nodes)
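The conditional distribution named on this slide can be written out in the standard CRF form. The symbols below (clique set C, potentials \Phi_c, feature functions f_k, weights \lambda_k) are generic textbook notation assumed for illustration; they are not copied from the slides.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{c \in C} \Phi_c(\mathbf{y}_c, \mathbf{x}),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c \in C} \Phi_c(\mathbf{y}'_c, \mathbf{x}),
\qquad
\Phi_c(\mathbf{y}_c, \mathbf{x})
  = \exp\Big( \sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x}) \Big)
```

Because Z(x) sums over every joint label assignment y', the model is normalized globally rather than per node, which is what lets clique potentials express label dependencies.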
8 Use of dependencies for categorization
- NE categorization problem as labeling anchors with classes
  - Each edge of the constructed graphs corresponds to a particular dependency
  - Estimate the MAP label assignment over the constructed graphs using Conditional Random Fields
- Our formulation can extract anchors without gloss articles
[Figure: example list in which some anchors are marked "article exists" and others "article does not exist"]
- Dillard & Clark ... country rock
- Carpenters
  - Karen Carpenter
9 Clique definition based on HTML tree structure
Example list:
- Dillard & Clark ... country rock
- Carpenters
  - Karen Carpenter
[Figure: HTML tree of the example list (<UL>, <LI> and <A> nodes) with Sibling, Cousin and Relative edges between the anchors "Dillard & Clark", "country rock", "Carpenters" and "Karen Carpenter"]
Sibling: The latter element tends to be an attribute or a concept of the former element
Cousin: The elements tend to have a common NE category (e.g. ORGANIZATION)
Relative: The latter element tends to be a constituent part of the former element
Use these 3 relations as cliques of CRFs
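The three relations can be sketched as a small edge-construction routine. This is a simplified reading of the slide, not the authors' exact definition: the `Item` class and `build_edges` function are hypothetical names, and the rules assumed here are that Sibling links neighbouring anchors inside one list item, Cousin links the first anchors of neighbouring items in the same list, and Relative links the first anchor of an item to the first anchor of each nested item.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One <LI> element: its anchor texts and any nested <LI> items (hypothetical structure)."""
    anchors: list
    children: list = field(default_factory=list)


def build_edges(items):
    """Return (sibling, cousin, relative) edge lists for one list, under the assumed rules."""
    sibling, cousin, relative = [], [], []

    def visit(level):
        # Cousin: first anchors of neighbouring items on the same list level
        heads = [it.anchors[0] for it in level if it.anchors]
        cousin.extend(zip(heads, heads[1:]))
        for it in level:
            # Sibling: neighbouring anchors inside a single list item
            sibling.extend(zip(it.anchors, it.anchors[1:]))
            # Relative: first anchor of an item -> first anchor of each nested item
            if it.anchors:
                for child in it.children:
                    if child.anchors:
                        relative.append((it.anchors[0], child.anchors[0]))
            visit(it.children)

    visit(items)
    return sibling, cousin, relative


# The example list from this slide
example = [
    Item(["Dillard & Clark", "country rock"]),
    Item(["Carpenters"], [Item(["Karen Carpenter"])]),
]
S, C, R = build_edges(example)
print("Sibling: ", S)
print("Cousin:  ", C)
print("Relative:", R)
```

On the example list this yields one edge of each kind, matching the three labeled edges in the slide's tree.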
10 A graph constructed from 3 clique definitions
Example list:
- Burt Bacharach ... On my own ... 1986
- Dillard & Clark
  - Gene Clark
- Carpenters ... As Time Goes By ... 2000
  - Karen Carpenter
Sibling (S): The latter element tends to be an attribute or a concept of the former element
Cousin (C): The elements tend to have a common attribute (e.g. ORGANIZATION)
Relative (R): The latter element tends to be a constituent part of the former element
Estimate the MAP label assignment over the graph
11 Model
[Equation: the model combines a potential function for Sibling, Cousin and Relative cliques with a potential function for nodes]
- The constructed graphs include cycles, so exact inference is computationally expensive
  - -> Introduce Tree-based Reparameterization (TRP) (Wainwright et al. 2003) for approximate inference
[Figure: example graph whose edges are labeled S, C and R]
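To make the roles of node and edge potentials concrete, here is a minimal sketch that computes the MAP assignment and the node marginals by brute-force enumeration on a three-anchor graph. It is not TRP: exhaustive enumeration is only feasible for tiny graphs and stands in for approximate inference here. All scores, the two-label set and the edge rules are made-up values assumed for illustration.

```python
import itertools
import math

LABELS = ["PERSON", "ORGANIZATION"]
NODES = ["Dillard & Clark", "Carpenters", "Karen Carpenter"]

# Hypothetical node log-potentials; a trained CRF would derive these from features.
NODE_SCORE = {
    "Dillard & Clark": {"PERSON": 0.6, "ORGANIZATION": 0.4},
    "Carpenters": {"PERSON": 0.0, "ORGANIZATION": 1.5},
    "Karen Carpenter": {"PERSON": 1.0, "ORGANIZATION": 0.0},
}

# Typed edges as (kind, index of first node, index of second node)
EDGES = [("C", 0, 1), ("R", 1, 2)]


def edge_score(kind, a, b):
    """Hypothetical edge log-potentials for Cousin and Relative edges."""
    if kind == "C":  # Cousin: reward identical labels
        return 1.0 if a == b else 0.0
    if kind == "R":  # Relative: reward an ORGANIZATION with a PERSON as its part
        return 1.0 if (a, b) == ("ORGANIZATION", "PERSON") else 0.0
    return 0.0


def score(assignment):
    """Unnormalized log-score of one joint label assignment."""
    total = sum(NODE_SCORE[n][y] for n, y in zip(NODES, assignment))
    total += sum(edge_score(k, assignment[i], assignment[j]) for k, i, j in EDGES)
    return total


assignments = list(itertools.product(LABELS, repeat=len(NODES)))
weights = [math.exp(score(a)) for a in assignments]
Z = sum(weights)  # partition function

# MAP: the single best joint assignment
map_assignment = max(assignments, key=score)

# Marginals: probability of each label at each node, summed over all joint assignments
marginals = [
    {y: sum(w for a, w in zip(assignments, weights) if a[i] == y) / Z for y in LABELS}
    for i in range(len(NODES))
]

# Node-wise baseline: each anchor labeled from its own score only
independent = tuple(max(LABELS, key=lambda y: NODE_SCORE[n][y]) for n in NODES)

print("independent:", independent)
print("MAP:        ", map_assignment)
print("marginals:  ", marginals[0])
```

With these toy scores the node-wise labeling calls "Dillard & Clark" a PERSON, while the joint MAP assignment labels it ORGANIZATION because of the Cousin edge to "Carpenters".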
12 Experiments
- The aims of the experiments are to:
  - Compare the graph-based approach (relational) with the node-wise approach (independent) to investigate how relational classification improves classification accuracy
  - Investigate the effect of the defined cliques
  - Compare the CRF models with baseline models based on SVMs
  - Show the effectiveness of using marginal probability for filtering NE candidates
13 Dataset
- Dataset
  - Randomly sampled 2,300 articles (Japanese version as of October 2005)
  - Anchors in list elements (<LI>) are hand-annotated with NE class labels
  - We used the Extended Named Entity Hierarchy (Sekine et al. 2002)
  - We reduced the number of classes from the original 200 to 13 in order to avoid data sparseness
  - Classification targets: 16,136 anchors (14,285 of those are NEs)
14 Experiments (CRFs)
- To investigate which clique type contributes to classification accuracy
  - We construct models from every possible combination of the defined cliques
  - 8 models (SCR, SC, SR, CR, S, C, R, I)
  - Classification is performed on each connected subgraph
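The eight models are the subsets of {S, C, R}, with I (independent) being the empty subset, and each model splits the anchors into the connected subgraphs induced by the edge types it uses. A minimal sketch follows; the helper names `edges_for` and `connected_components` and the example edges are assumptions for illustration.

```python
from itertools import combinations

CLIQUES = "SCR"

# All subsets of {S, C, R}, largest first; the empty subset is the independent model "I".
models = [
    "".join(c) or "I"
    for r in range(len(CLIQUES), -1, -1)
    for c in combinations(CLIQUES, r)
]


def edges_for(model, typed_edges):
    """Collect the edges of the clique types that a model uses ("I" uses none)."""
    return [e for kind in model if kind in typed_edges for e in typed_edges[kind]]


def connected_components(nodes, edges):
    """Group nodes into connected subgraphs with a depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Example anchors and typed edges (assumed, following the list example in the slides)
nodes = ["Dillard & Clark", "country rock", "Carpenters", "Karen Carpenter"]
typed_edges = {
    "S": [("Dillard & Clark", "country rock")],
    "C": [("Dillard & Clark", "Carpenters")],
    "R": [("Carpenters", "Karen Carpenter")],
}

print(models)
for model in ("SCR", "S", "I"):
    print(model, connected_components(nodes, edges_for(model, typed_edges)))
```

Under the I model every anchor is its own subgraph, so classification reduces to labeling each anchor independently.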
15 Experimental settings (Baseline), Evaluation
- Baseline: Support Vector Machines (SVMs) (Vapnik 1998)
  - We build two models
    - I model: each anchor text is classified independently
    - P model: anchor texts are ordered by their linear position in the HTML, and history-based classification is performed (the (j-1)-th classification result is used in the j-th classification)
  - For multi-class classification: one-versus-rest
- Evaluation
  - 5-fold cross validation, by F1-value
[Figure: P model]
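For reference, the F1-value of a single class can be computed as below. This is a minimal sketch: the slides do not say how per-class scores were averaged across classes or folds, so only the per-class computation is shown, and the labels in the example are made up.

```python
def f1_for_class(gold, pred, cls):
    """Precision, recall and F1 for one class, given parallel lists of labels."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for five anchors
gold = ["PERSON", "PERSON", "ORGANIZATION", "ORGANIZATION", "PERSON"]
pred = ["PERSON", "ORGANIZATION", "ORGANIZATION", "ORGANIZATION", "PERSON"]

precision, recall, f1 = f1_for_class(gold, pred, "PERSON")
print(precision, recall, f1)
```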
16 Results (F1-value)
[Figure: F1-values of the CRF models (SCR, SC, SR, CR, S, C, R, I) and the SVM baselines (P, I), each shown with its graph structure. ALL: whole dataset; no article: anchors without articles]
17 Results (F1-value)
1. Graph-based vs. Node-wise
[Figure: F1-values of the CRF models (SCR, SC, SR, CR, S, C, R, I) and the SVM baselines (P, I). ALL: whole dataset; no article: anchors without articles]
18 Results (F1-value)
2. Which clique contributes most? -> Cousin clique
Cousin cliques provided the highest accuracy improvements compared to Sibling and Relative cliques
[Figure: F1-values of the CRF models (SCR, SC, SR, CR, S, C, R, I) and the SVM baselines (P, I). ALL: whole dataset; no article: anchors without articles]
19 Results (F1-value)
3. CRFs vs. SVMs
Significance test: McNemar paired test on labeling disagreements
[Figure: F1-values of the CRF models (SCR, SC, SR, CR, S, C, R, I) and the SVM baselines (P, I). ALL: whole dataset; no article: anchors without articles]
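A McNemar paired test looks only at the anchors on which two classifiers disagree. The sketch below uses the chi-square form with continuity correction; whether the authors used this variant or an exact binomial version is not stated on the slide, and the labels are made up.

```python
import math


def mcnemar(gold, pred_a, pred_b):
    """McNemar chi-square statistic (with continuity correction) and p-value, 1 degree of freedom."""
    # b: A correct and B wrong; c: A wrong and B correct
    b = sum(a == g and o != g for g, a, o in zip(gold, pred_a, pred_b))
    c = sum(a != g and o == g for g, a, o in zip(gold, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up data: 30 anchors, A alone correct on 15, B alone correct on 3
gold = ["X"] * 30
pred_a = ["X"] * 25 + ["Y"] * 5
pred_b = ["X"] * 10 + ["Y"] * 15 + ["X"] * 3 + ["Y"] * 2

statistic, p_value = mcnemar(gold, pred_a, pred_b)
print(statistic, p_value)
```

Anchors that both classifiers get right, or both get wrong, do not enter the statistic.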
20 Filtering NE candidates using marginal probability
- Construct dictionaries from extracted NE candidates
  - Methods with lower cost are desirable
  - Extract only confident NE candidates
  - -> Use the marginal probability provided by CRFs
- Marginal probability
  - The probability of a particular label assignment for a node y_i
  - This can be regarded as the confidence of the classifier
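Filtering by marginal probability amounts to keeping only the candidates whose predicted label is confident enough. A minimal sketch, in which the function name, the threshold and the probabilities are all assumed for illustration:

```python
def confident_entries(candidates, threshold):
    """Keep (anchor -> label) pairs whose marginal probability reaches the threshold."""
    return {anchor: label for anchor, label, prob in candidates if prob >= threshold}


# Made-up candidates: (anchor text, predicted label, marginal probability of that label)
candidates = [
    ("Carpenters", "ORGANIZATION", 0.97),
    ("Karen Carpenter", "PERSON", 0.91),
    ("Dillard & Clark", "ORGANIZATION", 0.64),
    ("country rock", "PERSON", 0.41),
]

dictionary = confident_entries(candidates, threshold=0.9)
print(dictionary)
```

Raising the threshold trades dictionary coverage for fewer wrong entries that would need manual correction.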
21 Precision-Recall Curve
[Figure: Precision-Recall curve obtained by thresholding the marginal probability of the MAP estimate in the CR model of the CRFs]
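Such a curve can be traced by sweeping the threshold and measuring precision and recall of the candidates that survive at each setting. The sketch below assumes each prediction is a pair of marginal probability and correctness flag, and that the number of gold NEs is known; the data are made up.

```python
def pr_curve(predictions, n_gold, thresholds):
    """Return (threshold, precision, recall) for each threshold that keeps at least one candidate."""
    curve = []
    for t in thresholds:
        kept = [correct for prob, correct in predictions if prob >= t]
        if not kept:
            continue  # precision is undefined when nothing is kept
        precision = sum(kept) / len(kept)
        recall = sum(kept) / n_gold
        curve.append((t, precision, recall))
    return curve


# Made-up predictions: (marginal probability of the MAP label, whether that label is correct)
predictions = [(0.97, True), (0.91, True), (0.64, True), (0.55, False), (0.41, False)]

curve = pr_curve(predictions, n_gold=4, thresholds=[0.5, 0.9])
for t, precision, recall in curve:
    print(t, precision, recall)
```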
22 Summary and future work
- Summary
  - Proposed a method for categorizing NEs in Wikipedia
  - Defined 3 kinds of cliques (Sibling, Cousin and Relative) over the HTML tree
  - The graph-based model achieved significant improvements compared to the node-wise model and the baseline methods (SVMs)
  - NEs can be extracted at lower cost by exploiting marginal probability
23 Summary and future work
- Future work
  - Use fine-grained NE classes
    - For many NLP applications (e.g. QA, IE), an NE dictionary with fine-grained label sets will be a useful resource
    - Classification with statistical methods becomes difficult when the label set is large, because of insufficient positive examples
  - Incorporate the hierarchical structure of label sets into our models (Hierarchical Classification)
    - Previous work suggests that exploiting the hierarchical structure of label sets improves classification accuracy