A Graphbased Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields Yo - PowerPoint PPT Presentation

1
A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
Nara Institute of Science and Technology
EMNLP-CoNLL 2007, 29th June, Prague, Czech Republic
2
Background

Named Entity
Proper nouns (e.g. Shinzo Abe (Person), Prague (Location)), time/date expressions (e.g. June 29 (Date)) and numerical expressions (e.g. 10)
Named Entities play an important role in many NLP applications (e.g. IE, QA)
Named Entity Recognition task (NER)
Treated as a sequential tagging problem
Machine learning methods have been proposed
Recall is usually low
A large-scale NE dictionary is useful for NER

Semi-automatic methods for compiling NE dictionaries are in demand
3
Resource for NE dictionary construction

Wikipedia
Multi-lingual encyclopedia on the Web
382,613 gloss articles (Japanese, as of June 20, 2007)
Gloss indices are nouns or proper nouns
HTML (semi-structured text)
Lists (<LI>) and tables (<TABLE>) can be used as clues for NE type categorization
Linked articles are glossed by anchor texts in articles
Each article has one or more categories

Wikipedia has useful information for NE categorization, so it can be considered a suitable resource
4
Objective

Extract Named Entities by assigning proper NE labels to gloss indices of Wikipedia

(Figure: example gloss indices labeled Person, Location, Natural Object, Organization and Product)
5
Use of Wikipedia features

Features of Wikipedia articles
Anchors of an article refer to other related articles
Anchors in list elements depend on each other
-> Make 3 assumptions about dependencies between anchors

Assumption 1: The latter element in a list item tends to be in an attribute relation to the former element (in the example: PERSON followed by VOCATION)
Assumption 2: The elements in the same itemization tend to be in the same NE category (in the example: ORGANIZATION and ORGANIZATION)
Assumption 3: The nested element tends to be in a part-of relation to the upper element (in the example: PERSON nested under ORGANIZATION)

An example of a list structure:
Burt Bacharach ... composer
Dillard & Clark
Carpenters
  Karen Carpenter
6
Overview of our approach

Focus on the HTML list structure in Wikipedia
Make 3 assumptions about dependencies between anchors
Formalize the NE categorization problem as assigning NE class labels to anchors in lists
Define 3 kinds of cliques (edges: Sibling, Cousin and Relative) between anchors based on the 3 assumptions
Construct graphs based on the 3 defined cliques
CRFs for NE categorization in Wikipedia
Define potential functions over the 3 edge types (and nodes) to provide a conditional distribution over the graphs
Estimate the MAP label assignment over the graphs using Conditional Random Fields

7
Conditional Random Fields (CRFs)

Conditional Random Fields [Lafferty 2001]
Discriminative, undirected models
Define the conditional distribution p(y|x)
Features
Arbitrary features can be used
Globally optimized over all possible label assignments
Can deal with label dependencies by defining potential functions for cliques (2 or more nodes)
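
To make the idea of a conditional distribution over joint label assignments concrete, here is a minimal sketch in Python. It is not the authors' implementation: the two-label set, the toy weights and the function names are assumptions chosen for illustration, and normalization is done by brute-force enumeration, which only works for very small graphs.

```python
import itertools
import math

LABELS = ["PERSON", "ORGANIZATION"]  # toy label set (assumption)


def score(labels, node_weights, edges, edge_weights):
    """Unnormalized log-score: sum of node and edge log-potentials."""
    total = sum(node_weights[i][y] for i, y in enumerate(labels))
    total += sum(edge_weights[(labels[i], labels[j])] for i, j in edges)
    return total


def conditional_distribution(node_weights, edges, edge_weights):
    """Return p(y|x) for every joint labeling by brute-force normalization."""
    n = len(node_weights)
    assignments = list(itertools.product(LABELS, repeat=n))
    scores = [score(a, node_weights, edges, edge_weights) for a in assignments]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return {a: math.exp(s - log_z) for a, s in zip(assignments, scores)}


# Toy numbers (assumptions): two anchors joined by one edge that rewards equal labels.
node_weights = [
    {"PERSON": 0.2, "ORGANIZATION": 1.0},
    {"PERSON": 0.5, "ORGANIZATION": 0.4},
]
edges = [(0, 1)]
edge_weights = {(a, b): (1.0 if a == b else 0.0) for a in LABELS for b in LABELS}

dist = conditional_distribution(node_weights, edges, edge_weights)
map_assignment = max(dist, key=dist.get)
```

In this toy setting the second anchor slightly prefers PERSON on its own, but the edge potential pulls the MAP assignment to ORGANIZATION for both nodes, which is the kind of label dependency the slide refers to.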

8
Use of dependencies for categorization

Formulate NE categorization as assigning class labels to anchors
Each edge of the constructed graphs corresponds to a particular dependency
Estimate the MAP label assignment over the constructed graphs using Conditional Random Fields
Our formulation can also extract anchors that have no gloss article

(Figure: the example list (Dillard & Clark ... country rock; Carpenters; Karen Carpenter), with anchors marked "article exists" or "article does not exist")
9
Clique definition based on HTML tree structure
Example list:
Dillard & Clark ... country rock
Carpenters
  Karen Carpenter

(Figure: HTML tree of the example list, with <UL>, <LI> and <A> nodes; Sibling, Cousin and Relative edges connect the anchors)

Sibling
The latter element tends to be an attribute or a concept of the former element
Cousin
The elements tend to have a common NE category (e.g. ORGANIZATION)
Relative
The latter element tends to be a constituent part of the former element

Use these 3 relations as cliques of CRFs
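
The three relations can be sketched as edge construction over a nested list. The code below is a simplified reading of the slide, not the authors' exact definition: it assumes Sibling links consecutive anchors inside one list item, Cousin links the first anchors of adjacent items in the same list, and Relative links the first anchor of an item to the first anchor of each nested item. The dict layout and the name `build_edges` are hypothetical.

```python
def build_edges(items):
    """Collect Sibling (S), Cousin (C) and Relative (R) edges from a nested list.

    Each item is a dict: {"anchors": [str, ...], "children": [item, ...]}.
    The edge rules are simplifying assumptions, see the text above.
    """
    edges = {"S": [], "C": [], "R": []}

    def visit(level):
        previous_head = None
        for item in level:
            anchors = item["anchors"]
            # Sibling: consecutive anchors inside one list item.
            edges["S"].extend(zip(anchors, anchors[1:]))
            # Cousin: first anchors of adjacent items in the same list.
            if previous_head is not None and anchors:
                edges["C"].append((previous_head, anchors[0]))
            children = item.get("children", [])
            # Relative: first anchor of an item to first anchors of nested items.
            for child in children:
                if anchors and child["anchors"]:
                    edges["R"].append((anchors[0], child["anchors"][0]))
            visit(children)
            if anchors:
                previous_head = anchors[0]

    visit(items)
    return edges


example = [
    {"anchors": ["Dillard & Clark", "country rock"], "children": []},
    {
        "anchors": ["Carpenters"],
        "children": [{"anchors": ["Karen Carpenter"], "children": []}],
    },
]
edges = build_edges(example)
```

On the example list this yields one edge of each type: a Sibling edge from the band to its genre, a Cousin edge between the two bands, and a Relative edge from Carpenters to Karen Carpenter.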
10
A graph constructed from 3 clique definitions
Sibling
The latter element tends to be an attribute or a concept of the former element
Cousin
The elements tend to have a common attribute (e.g. ORGANIZATION)
Relative
The latter element tends to be a constituent part of the former element

Example list:
Burt Bacharach ... On my own ... 1986
Dillard & Clark
  Gene Clark
Carpenters ... As Time Goes By ... 2000
  Karen Carpenter

Estimate the MAP label assignment over the graph
(Figure: graph over the anchors above; S: Sibling, C: Cousin, R: Relative)
11
Model
The model consists of:
Potential functions for Sibling, Cousin and Relative cliques
Potential functions for nodes

Constructed graphs include cycles, so exact inference is computationally expensive
-> Introduce Tree-based Reparameterization (TRP) [Wainwright 2003] for approximate inference

(Figure: example graph with edges labeled S, C and R)
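
The slide's equation did not survive in this transcript. As a hedged sketch, a pairwise CRF with node potentials and one edge potential per clique type generally takes the form below. The symbols ($E_S$, $E_C$, $E_R$ for the three edge sets, $V$ for the anchors, $\Phi$ for potentials, $\lambda_k$ and $f_k$ for weights and features) are assumed notation, not taken from the slide.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \prod_{t \in \{S, C, R\}} \prod_{(v_i, v_j) \in E_t} \Phi_t(y_i, y_j, \mathbf{x})
  \prod_{v_i \in V} \Phi_V(y_i, \mathbf{x}),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \prod_{t \in \{S, C, R\}} \prod_{(v_i, v_j) \in E_t} \Phi_t(y'_i, y'_j, \mathbf{x})
  \prod_{v_i \in V} \Phi_V(y'_i, \mathbf{x})

% Log-linear potentials, as is usual for CRFs:
\Phi_t(y_i, y_j, \mathbf{x}) = \exp\Big( \sum_k \lambda_k f_k(y_i, y_j, \mathbf{x}) \Big),
\qquad
\Phi_V(y_i, \mathbf{x}) = \exp\Big( \sum_k \lambda'_k f'_k(y_i, \mathbf{x}) \Big)
```

The sum in $Z(\mathbf{x})$ ranges over all joint labelings, which is why exact inference becomes expensive once the graph contains cycles.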
12
Experiments

The aims of the experiments are to
Compare the graph-based approach (relational) with the node-wise approach (independent) to investigate how relational classification improves classification accuracy
Investigate the effect of the defined cliques
Compare the CRF models with baseline models based on SVMs
Show the effectiveness of using marginal probability for filtering NE candidates

13
Dataset

Dataset
Randomly sampled 2,300 articles (Japanese version as of October 2005)
Anchors in list elements (<LI>) are hand-annotated with NE class labels
We used the Extended Named Entity Hierarchy (Sekine et al. 2002)
We reduced the number of classes from the original 200 to 13 in order to avoid data sparseness
Classification targets: 16,136 anchors (14,285 of those are NEs)

14
Experiments (CRFs)

To investigate which clique type contributes to classification accuracy
We construct models from all possible combinations of the defined cliques
8 models (SCR, SC, SR, CR, S, C, R, I)
Classification is performed on each connected subgraph
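
The model list and the per-subgraph classification can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, "I" is taken to denote the model with no edges (each anchor independent), and the union-find grouping is one common way to find connected subgraphs, not necessarily the one used in the experiments.

```python
from itertools import combinations

CLIQUES = ["S", "C", "R"]  # Sibling, Cousin, Relative


def enumerate_models(cliques=CLIQUES):
    """List every combination of clique types; the empty one is named 'I'."""
    models = []
    for size in range(len(cliques), -1, -1):
        for combo in combinations(cliques, size):
            models.append("".join(combo) or "I")
    return models


def connected_subgraphs(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


models = enumerate_models()
components = connected_subgraphs(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
```

Three clique types give 2^3 = 8 subsets, which matches the 8 models on the slide. Dropping a clique type removes its edges, so the connected subgraphs generally get smaller as fewer clique types are used.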

15
Experimental settings (Baseline) , Evaluation

Baseline: Support Vector Machines (SVMs) [Vapnik 1998]
We use two models
I model: each anchor text is classified independently
P model: anchor texts are ordered by linear position in HTML and classified with a history-based method (the (j-1)-th classification result is used in the j-th classification)
Multi-class classification: one-versus-rest
Evaluation
5-fold cross validation, measured by F1-value
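
The history-based P model can be sketched as a loop that feeds each prediction into the next decision. The classifier here is a hypothetical stand-in (a lookup table with a fallback), not an SVM, and the `<START>` marker and label names are assumptions for illustration.

```python
def history_based_classification(anchors, classify, start_label="<START>"):
    """Classify anchors in document order, passing the previous label along."""
    labels = []
    previous = start_label
    for anchor in anchors:
        label = classify(anchor, previous)  # the (j-1)-th result feeds the j-th decision
        labels.append(label)
        previous = label
    return labels


def toy_classify(anchor, previous_label):
    """Hypothetical classifier: known anchors first, else reuse the previous label."""
    known = {"Carpenters": "ORGANIZATION"}
    if anchor in known:
        return known[anchor]
    return previous_label if previous_label != "<START>" else "OTHER"


labels = history_based_classification(["Carpenters", "Dillard & Clark"], toy_classify)
```

Unlike the graph-based CRF models, information flows only forward along the linear order, so an early mistake can propagate to later anchors.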
16
Results (F1-value)
ALL: whole dataset; no article: anchors without articles
(Table: F1-values for the CRF models SCR, SC, SR, CR, S, C, R, I and the SVM models P, I)
17
Results (F1-value)
1. Graph-based vs. Node-wise
18
Results (F1-value)
2. Which clique contributes the most? -> Cousin clique
Cousin cliques provided the largest accuracy improvements compared to Sibling and Relative cliques
19
Results (F1-value)
3. CRFs vs. SVMs
Significance Test
McNemar paired test on labeling disagreements
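
A McNemar test on labeling disagreements can be sketched as follows. The slide does not say which variant was used, so the continuity correction here is an assumption, as are the function name and the toy data. The p-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import math


def mcnemar_test(gold, pred_a, pred_b):
    """McNemar's chi-square test with continuity correction (1 degree of freedom).

    Only items where exactly one of the two systems is correct contribute.
    """
    triples = list(zip(gold, pred_a, pred_b))
    only_a = sum(1 for g, a, b in triples if a == g and b != g)
    only_b = sum(1 for g, a, b in triples if a != g and b == g)
    if only_a + only_b == 0:
        return 0.0, 1.0  # the systems never disagree in correctness
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    # Chi-square survival function for 1 d.o.f.: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy data (assumption): system A is right on 10 items where system B is wrong.
gold = [1] * 20
pred_a = [1] * 20
pred_b = [1] * 10 + [0] * 10
statistic, p_value = mcnemar_test(gold, pred_a, pred_b)
```

Because the test looks only at the items on which the two systems disagree, it suits paired comparisons such as CRFs versus SVMs evaluated on the same anchors.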
20
Filtering NE candidates using marginal probability

Construct dictionaries from extracted NE candidates
Methods with lower cost are desirable
Extract only confident NE candidates
-> Use the marginal probability provided by CRFs
Marginal probability
The probability of a particular label assignment y_i for a node
This can be regarded as the confidence of the classifier
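
Filtering by marginal probability can be sketched as below. For clarity the sketch starts from an explicit joint distribution over two anchors, whereas the presentation obtains marginals through approximate inference (TRP). The toy probabilities, the short label names and the function names are assumptions.

```python
def node_marginals(joint):
    """joint maps label tuples to probabilities; returns one marginal dict per node."""
    n = len(next(iter(joint)))
    marginals = [dict() for _ in range(n)]
    for assignment, prob in joint.items():
        for i, label in enumerate(assignment):
            marginals[i][label] = marginals[i].get(label, 0.0) + prob
    return marginals


def confident_candidates(anchors, joint, threshold):
    """Keep anchors whose MAP label has marginal probability >= threshold."""
    map_assignment = max(joint, key=joint.get)
    marginals = node_marginals(joint)
    kept = []
    for i, anchor in enumerate(anchors):
        label = map_assignment[i]
        confidence = marginals[i][label]
        if confidence >= threshold:
            kept.append((anchor, label, confidence))
    return kept


# Toy joint distribution over (label of anchor 0, label of anchor 1).
joint = {
    ("ORG", "PER"): 0.5,
    ("ORG", "ORG"): 0.3,
    ("PER", "PER"): 0.15,
    ("PER", "ORG"): 0.05,
}
kept = confident_candidates(["Carpenters", "Karen Carpenter"], joint, threshold=0.7)
```

Here the MAP assignment is (ORG, PER), but only the first anchor passes the 0.7 threshold (marginal 0.8 versus 0.65), so only "Carpenters" would enter the dictionary without manual checking.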
21
Precision-Recall Curve
Precision-Recall curve obtained by thresholding
the marginal probability of the MAP estimation in
the CR model of CRFs
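
A precision-recall curve of this kind can be traced by sorting predictions by confidence and lowering the threshold one prediction at a time. The sketch below uses made-up confidences and correctness flags, and the function name is hypothetical. Ties in confidence are not treated specially.

```python
def precision_recall_points(confidences, correct, total_gold):
    """Sweep a threshold over confidences, highest first.

    correct[i] is True when prediction i matches the gold label.
    Returns (threshold, precision, recall) for each cut-off.
    """
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for kept, (confidence, is_correct) in enumerate(ranked, start=1):
        true_positives += int(is_correct)
        points.append((confidence, true_positives / kept, true_positives / total_gold))
    return points


# Toy data (assumption): four predictions, three of them correct, four gold NEs.
points = precision_recall_points(
    [0.9, 0.8, 0.6, 0.4], [True, True, False, True], total_gold=4
)
```

A high threshold keeps few candidates at high precision, and lowering it trades precision for recall, which is the trade-off the curve on this slide visualizes.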
22
Summary and future work

Summary
Proposed a method for categorizing NEs in Wikipedia
Defined 3 kinds of cliques (Sibling, Cousin and Relative) over the HTML tree
The graph-based model achieved significant improvements compared to the node-wise model and the baseline methods (SVMs)
NEs can be extracted at lower cost by exploiting marginal probability

23
Summary and Future work

Future work
Use fine-grained NE classes
For many NLP applications (e.g. QA, IE), an NE dictionary with fine-grained label sets will be a useful resource
Classification with statistical methods becomes difficult when the label set is large, because positive examples are insufficient
Incorporate the hierarchical structure of label sets into our models (Hierarchical Classification)
Previous work suggests that exploiting the hierarchical structure of label sets improves classification accuracy