A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields

1
A Graph-based Approach to Named Entity
Categorization in Wikipedia Using Conditional
Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
Nara Institute of Science and Technology
EMNLP-CoNLL 2007, 29th June, Prague, Czech Republic
2
Background
  • Named Entity
  • Proper nouns (e.g. Shinzo Abe (Person), Prague
    (Location)), time/date expressions (e.g. June 29
    (Date)) and numerical expressions (e.g. 10)
  • In many NLP applications (e.g. IE, QA), Named
    Entities play an important role
  • Named Entity Recognition task (NER)
  • Treated as sequential tagging problem
  • Machine learning methods have been proposed
  • Recall is usually low
  • Large scale NE dictionary is useful for NER

Semi-automatic methods for compiling NE dictionaries
are in demand
3
Resource for NE dictionary construction
  • Wikipedia
  • Multi-lingual encyclopedia on the Web
  • 382,613 gloss articles (as of June 20, 2007,
    Japanese)
  • Gloss indices are composed of nouns or proper
    nouns
  • HTML (semi-structured text)
  • Lists (<LI>) and tables (<TABLE>) can be used as
    clues for NE type categorization
  • Linked articles are glossed by anchor texts in
    articles
  • Each article has one or more categories

Wikipedia has useful information for NE
categorization, so it can be considered a suitable
resource
4
Objective
  • Extract Named Entities by assigning proper NE
    labels for gloss indices of Wikipedia

(Figure: gloss indices labeled with NE classes such as Person, Location, Natural Object, Organization and Product)
5
Use of Wikipedia features
  • Features of Wikipedia articles
  • Anchors of an article refer to the other related
    articles
  • Anchors in list elements depend on each
    other
  • -> Make 3 assumptions about dependencies between
    anchors

Assumption 1: The latter element in a list item
tends to be in an attribute relation to the
former element
Assumption 2: The elements in the same
itemization tend to be in the same NE category
Assumption 3: The nested element tends to be in
a part-of relation to the upper element
An example of a list structure:
  • Burt Bacharach (PERSON) ... composer (VOCATION)
  • Dillard Clark (ORGANIZATION)
  • Carpenters (ORGANIZATION)
    • Karen Carpenter (PERSON)
6
Overview of our approach
  • Focus on HTML list structure in Wikipedia
  • Make 3 assumptions about dependencies between
    anchors
  • Formalize the NE categorization problem as assigning
    NE class labels to anchors in lists
  • Define 3 kinds of cliques (edges: Sibling, Cousin
    and Relative) between anchors based on the 3
    assumptions
  • Construct graphs based on 3 defined cliques
  • CRFs for NE categorization in Wikipedia
  • Define potential functions over 3 edges (and
    nodes) to provide conditional distribution over
    the graphs
  • Estimate MAP label assignment over the graphs
    using Conditional Random Fields

7
Conditional Random Fields (CRFs)
  • Conditional Random Fields [Lafferty 2001]
  • Discriminative, undirected models
  • Define the conditional distribution p(y|x)
  • Features
  • Arbitrary features can be used
  • Globally optimize on all possible label
    assignments
  • Can deal with label dependencies by defining
    potential functions for cliques (2 or more nodes)
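To make the definition concrete, here is a minimal brute-force sketch of a pairwise CRF on a tiny graph. The function name `crf_distribution`, the toy scores and the two-anchor example are illustrative assumptions, not the authors' features or implementation. Enumerating every assignment is feasible only for very small graphs.

```python
import itertools
import math


def crf_distribution(nodes, edges, labels, node_score, edge_score):
    """Return p(y|x) for every joint label assignment by brute-force enumeration."""
    scores = {}
    for assignment in itertools.product(labels, repeat=len(nodes)):
        y = dict(zip(nodes, assignment))
        # Log-potential of the assignment: node terms plus edge (clique) terms.
        total = sum(node_score(n, y[n]) for n in nodes)
        total += sum(edge_score(y[a], y[b]) for a, b in edges)
        scores[assignment] = total
    # Partition function Z(x) normalizes over all possible assignments.
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}


# Toy example; all scores below are assumed values for illustration.
nodes = ["Carpenters", "Dillard Clark"]
edges = [("Carpenters", "Dillard Clark")]
labels = ["PERSON", "ORGANIZATION"]


def node_score(anchor, label):
    # Pretend a local feature fires only for Carpenters as ORGANIZATION.
    return 1.0 if (anchor, label) == ("Carpenters", "ORGANIZATION") else 0.0


def edge_score(label_a, label_b):
    # Reward connected anchors that share a label.
    return 1.0 if label_a == label_b else 0.0


dist = crf_distribution(nodes, edges, labels, node_score, edge_score)
map_assignment = max(dist, key=dist.get)
```

In this toy setting the edge term pulls the second anchor toward the label of the first, which is the kind of label dependency the clique potentials are meant to capture.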

8
Use of dependencies for categorization
  • Treat NE categorization as assigning class labels to
    anchors
  • Each edge of the constructed graphs corresponds
    to a particular dependency
  • Estimate MAP label assignment over the
    constructed graphs using Conditional Random
    Fields
  • Our formulation can also label anchors that have no
    gloss article

(Figure: example list in which each anchor is marked as "article exists" or "article does not exist")
  • Dillard Clark ... country rock
  • Carpenters
    • Karen Carpenter
9
Clique definition based on HTML tree structure
(Figure: HTML tree of the example list, with <UL>, <LI> and <A> nodes and a nested <UL>)
  • Dillard Clark ... country rock
  • Carpenters
    • Karen Carpenter

Sibling: The latter element tends to be an attribute or
a concept of the former element
Cousin: The elements tend to have a common NE category
(e.g. ORGANIZATION)
Relative: The latter element tends to be a constituent
part of the former element
Use these 3 relations as cliques of CRFs
10
A graph constructed from 3 clique definitions
Sibling (S): The latter element tends to be an attribute or a
concept of the former element
Cousin (C): The elements tend to have a common attribute
(e.g. ORGANIZATION)
Relative (R): The latter element tends to be a constituent part
of the former element
  • Burt Bacharach ... On my own ... 1986
  • Dillard Clark
    • Gene Clark
  • Carpenters ... As Time Goes By ... 2000
    • Karen Carpenter

Estimate the MAP label assignment over the graph
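The graph construction can be sketched as follows. This is a simplified reading of the three clique types, not the authors' exact definitions: Sibling links consecutive anchors inside one list item, Cousin links the first anchors of consecutive items in the same list, and Relative links the first anchor of an item to the first anchor of its first nested item. The `ListItem` class, the `build_edges` function and the nesting of the example list are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ListItem:
    anchors: list                                   # anchor texts in this <LI>, in document order
    children: list = field(default_factory=list)   # nested ListItem objects


def build_edges(items):
    """Return (type, source, target) edges with type in {"S", "C", "R"}."""
    edges = []

    def visit(siblings):
        for idx, item in enumerate(siblings):
            # Sibling: consecutive anchors within the same list item.
            for a, b in zip(item.anchors, item.anchors[1:]):
                edges.append(("S", a, b))
            # Cousin: first anchors of adjacent items in the same list.
            nxt = siblings[idx + 1] if idx + 1 < len(siblings) else None
            if nxt is not None and item.anchors and nxt.anchors:
                edges.append(("C", item.anchors[0], nxt.anchors[0]))
            # Relative: first anchor of an item to the first anchor of its first nested item.
            if item.anchors and item.children and item.children[0].anchors:
                edges.append(("R", item.anchors[0], item.children[0].anchors[0]))
            visit(item.children)

    visit(items)
    return edges


# Example list from the slide; the nesting shown here is assumed.
example = [
    ListItem(["Burt Bacharach", "On my own", "1986"]),
    ListItem(["Dillard Clark"], [ListItem(["Gene Clark"])]),
    ListItem(["Carpenters", "As Time Goes By", "2000"], [ListItem(["Karen Carpenter"])]),
]
edges = build_edges(example)
```

Each connected set of anchors produced this way forms one graph over which labels are assigned jointly.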
11
Model
Potential functions for Sibling, Cousin and
Relative cliques
Potential functions for nodes
(Figure: example graph whose edges are labeled S, C and R)
  • Constructed graphs include cycles, so exact
    inference is computationally expensive
  • -> Introduce Tree-based Reparameterization (TRP)
    [Wainwright 2003] for approximate inference
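The equations on this slide are not part of the transcript. As a rough guide, a generic pairwise CRF with node potentials and edge potentials over the three clique types is usually written as below. The symbols (V for anchors, E_S, E_C, E_R for the Sibling, Cousin and Relative edge sets, f for feature functions, λ for weights) are assumed notation and may differ from the authors' formulation.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \prod_{(v_i, v_j) \in E_S \cup E_C \cup E_R} \Phi_E(y_i, y_j, \mathbf{x})
  \prod_{v_i \in V} \Phi_V(y_i, \mathbf{x})

\Phi_E(y_i, y_j, \mathbf{x}) = \exp\Big( \sum_k \lambda_k f_k(y_i, y_j, \mathbf{x}) \Big), \qquad
\Phi_V(y_i, \mathbf{x}) = \exp\Big( \sum_{k'} \lambda_{k'} f_{k'}(y_i, \mathbf{x}) \Big)

Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \prod_{(v_i, v_j) \in E_S \cup E_C \cup E_R} \Phi_E(y'_i, y'_j, \mathbf{x})
  \prod_{v_i \in V} \Phi_V(y'_i, \mathbf{x})
```

The sum in Z(x) ranges over all joint label assignments, which is why exact inference becomes expensive on graphs with cycles.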
12
Experiments
  • The aims of experiments are
  • Compare graph-based approach (relational) to
    node-wise approach (independent) to investigate
    how the relational classification improves
    classification accuracy
  • Investigate the effect of defined cliques
  • Compare CRFs models to baseline models based on
    SVMs
  • Show the effectiveness of using marginal
    probability for filtering NE candidates.

13
Dataset
  • Dataset
  • Randomly sampled 2300 articles (Japanese version
    as of October 2005)
  • Anchors in list elements (<LI>) are hand-annotated
    with NE class labels
  • We used the Extended Named Entity Hierarchy (Sekine
    et al. 2002)
  • We reduced the number of classes to 13 from the
    original 200 in order to avoid data sparseness
  • Classification targets: 16,136 anchors (14,285 of those are
    NEs)

14
Experiments (CRFs)
  • To investigate which clique type contributes most to
    classification accuracy
  • We construct models from all possible
    combinations of the defined cliques
  • 8 models (SCR, SC, SR, CR, S, C, R, I)
  • Classification is performed on each connected
    subgraph
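The two ideas on this slide can be sketched in a few lines: enumerating the clique combinations, and splitting a graph into connected subgraphs once only the edge types of a given model are kept. The names `clique_models` and `connected_components`, and the `(type, a, b)` edge format, are assumptions for illustration.

```python
from itertools import combinations


def clique_models(types=("S", "C", "R")):
    """All non-empty combinations of clique types, plus the independent model I."""
    models = []
    for size in range(len(types), 0, -1):
        for combo in combinations(types, size):
            models.append("".join(combo))
    models.append("I")  # no edges: every anchor is classified independently
    return models


def connected_components(nodes, edges, model):
    """Group nodes into connected subgraphs using only the edge types in `model`."""
    allowed = set() if model == "I" else set(model)
    adjacency = {n: set() for n in nodes}
    for edge_type, a, b in edges:
        if edge_type in allowed:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


models = clique_models()
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("S", "a", "b"), ("C", "b", "c"), ("R", "c", "d")]
sc_components = connected_components(toy_nodes, toy_edges, "SC")
```

With the I model every component is a single anchor, so it reduces to node-wise classification.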

15
Experimental settings (Baseline), Evaluation
  • Baseline: Support Vector Machines (SVMs) [Vapnik
    1998]
  • We use two models
  • I model: each anchor text is classified
    independently
  • P model: anchor texts are ordered by linear
    position in the HTML, and history-based
    classification is performed (the (j-1)-th classification
    result is used in the j-th classification)
  • For multi-class classification: one-versus-rest
  • Evaluation
  • 5-fold cross validation, by F1-value

(Figure: illustration of the P model)
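The history-based P model can be sketched as a simple left-to-right loop in which each decision sees the previous one. The `classify` callable, the `"BOS"` start marker and the toy lookup classifier are hypothetical stand-ins for the SVM classifier, which is not reproduced here.

```python
def history_based_classify(anchors, classify, start_label="BOS"):
    """Label anchors in linear order, passing the previous prediction to the classifier."""
    labels = []
    previous = start_label
    for anchor in anchors:
        label = classify(anchor, previous)  # the (j-1)-th result is an input to the j-th decision
        labels.append(label)
        previous = label
    return labels


# Toy classifier (assumed): use a small lookup table, otherwise copy the previous label.
KNOWN = {"Carpenters": "ORGANIZATION"}


def toy_classify(anchor, previous_label):
    fallback = "OTHER" if previous_label == "BOS" else previous_label
    return KNOWN.get(anchor, fallback)


predicted = history_based_classify(["Carpenters", "Dillard Clark"], toy_classify)
```

Unlike the graph-based models, an early mistake here propagates forward and cannot be revised by later evidence.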
16
Results (F1-value)
(Table: F1-values of the CRF models (SCR, SC, SR, CR, S, C, R and I) and the SVM models (P and I). ALL: whole dataset; no article: anchors without articles.)
17
Results (F1-value)
1. Graph-based vs. Node-wise
18
Results (F1-value)
2. Which clique contributes most? -> Cousin
clique
Cousin cliques provided the highest accuracy
improvements compared to Sibling and Relative
cliques
19
Results (F1-value)
3. CRFs vs. SVMs
Significance Test
McNemar paired test on labeling disagreements
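A McNemar test compares two classifiers on the items where exactly one of them is correct. The sketch below uses the continuity-corrected chi-square form with one degree of freedom. Which variant the authors used is not stated in the transcript, so this choice and the function name `mcnemar_test` are assumptions.

```python
import math


def mcnemar_test(gold, pred_a, pred_b):
    """Return (statistic, p_value) for the continuity-corrected McNemar test."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(1 for g, a, other in zip(gold, pred_a, pred_b) if a == g and other != g)
    c = sum(1 for g, a, other in zip(gold, pred_a, pred_b) if a != g and other == g)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy data: classifier A is right on all 10 items, classifier B on none.
gold = ["ORGANIZATION"] * 10
pred_a = list(gold)
pred_b = ["PERSON"] * 10
statistic, p_value = mcnemar_test(gold, pred_a, pred_b)
```

Items on which both classifiers are right, or both are wrong, do not affect the statistic.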
20
Filtering NE candidates using marginal probability
  • Construct dictionaries from extracted NE
    candidates
  • Methods with lower cost are desirable
  • Extract only confident NE candidates
  • -> Use the marginal probability provided by
    CRFs
  • Marginal probability
  • Probability of a particular label assignment for
    a node y_i
  • This can be regarded as the
    confidence of the classifier
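Filtering by marginal probability amounts to keeping only anchors whose assigned label is sufficiently probable. In this sketch the function name `filter_candidates`, the dictionary layout, the probabilities and the threshold of 0.9 are all illustrative assumptions.

```python
def filter_candidates(map_labels, marginals, threshold=0.9):
    """Keep anchors whose MAP label has marginal probability >= threshold.

    map_labels: dict mapping anchor -> label from the MAP assignment
    marginals:  dict mapping anchor -> {label: marginal probability}
    """
    kept = {}
    for anchor, label in map_labels.items():
        if marginals[anchor][label] >= threshold:
            kept[anchor] = label
    return kept


# Toy values, assumed for illustration only.
map_labels = {"Carpenters": "ORGANIZATION", "Gene Clark": "PERSON"}
marginals = {
    "Carpenters": {"ORGANIZATION": 0.95, "PERSON": 0.05},
    "Gene Clark": {"ORGANIZATION": 0.40, "PERSON": 0.60},
}
confident = filter_candidates(map_labels, marginals)
```

Raising the threshold trades recall for precision, so fewer candidates need manual checking before they enter the dictionary.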
21
Precision-Recall Curve
Precision-Recall curve obtained by thresholding
the marginal probability of the MAP estimation in
the CR model of CRFs
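A curve of this kind can be computed by ranking candidates by confidence and lowering the threshold one candidate at a time. The sketch below assumes each candidate comes as a `(confidence, is_correct)` pair with distinct confidence values, and that `total_gold` is the number of true NEs. The function name and the toy data are hypothetical.

```python
def precision_recall_curve(scored, total_gold):
    """Return (threshold, precision, recall) points, from highest confidence down."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points = []
    correct = 0
    for kept, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Threshold = confidence of the last candidate still kept.
        points.append((confidence, correct / kept, correct / total_gold))
    return points


# Toy candidates, assumed for illustration only.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
points = precision_recall_curve(scored, total_gold=4)
```

Each point corresponds to one possible threshold on the marginal probability.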
22
Summary and future work
  • Summary
  • Proposed a method for categorizing NEs in
    Wikipedia
  • Defined 3 kinds of cliques (Sibling, Cousin and
    Relative) over HTML tree
  • The graph-based model achieved significant
    improvements compared to the node-wise model and
    the baseline methods (SVMs)
  • NEs can be extracted with lower cost by
    exploiting marginal probability

23
Summary and Future work
  • Future work
  • Use fine-grained NE classes
  • For many NLP applications (e.g. QA, IE), NE
    dictionary with fine grained label sets will be a
    useful resource
  • Classification with statistical methods becomes
    difficult when the label set is large,
    because positive examples are insufficient
  • Incorporate the hierarchical structure of label sets
    into our models (Hierarchical Classification)
  • Previous work suggests that exploiting the
    hierarchical structure of label sets improves
    classification accuracy

24
  • Thank you.