Deterministic Part-of-Speech Tagging with Finite-State Transducers

CS730B Statistical NLP

1
Deterministic Part-of-Speech Tagging with
Finite-State Transducers
by Emmanuel Roche and Yves Schabes
  • KLE Lab., CSE, POSTECH
  • 1998. 10. 16

2
Introduction
  • Stochastic approaches to NLP have often been
    preferred to rule-based approaches
  • Eric Brill (1992): a rule-based tagger that
    infers its rules from a training corpus
  • rules are acquired automatically
  • requires drastically less space than a
    stochastic tagger
  • but is considerably slower
  • ⇒ Deterministic finite-state transducer
    (subsequential transducer)

3
Overview of Brill's Tagger
  • Structure of the tagger
  • Lexical tagger (initial tagger)
  • Unknown-word tagger
  • Contextual tagger
  • Inefficiency
  • Each rule is compared against every token of
    the input (Fig.3)
  • Rules can interact with one another (Fig.1)
  • Complexity: RKn
  • R = number of contextual rules, n = number of
    input words
  • K = maximum number of tokens a rule requires
    as context
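
To make the RKn cost concrete, here is a minimal sketch of applying contextual rules one after another, each scanned across the whole sentence. The rule format (from_tag, to_tag, previous_tag) and the function name are assumptions made for illustration, not Brill's actual implementation; the context window here is a single tag, so K is small.

```python
from typing import List, Tuple

# Hypothetical rule format: (from_tag, to_tag, previous_tag), read as
# "change from_tag to to_tag when the previous tag is previous_tag".
Rule = Tuple[str, str, str]

def apply_rules_naively(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply every rule at every position: R rules times n tokens."""
    tags = list(tags)  # work on a copy
    for from_tag, to_tag, prev_tag in rules:   # R iterations
        for i in range(1, len(tags)):          # n iterations
            # Simplifying assumption: a rule sees retaggings made
            # earlier in the same left-to-right pass.
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags

print(apply_rules_naively(["np", "vbn", "vbn"], [("vbn", "vbd", "np")]))
```

Every rule is tried at every token even when it cannot match, which is the inefficiency the finite-state construction is meant to remove.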

4
Finite-State Transducer (1)
  • Finite-State Transducer T = (Σ, Q, i, F, E)
  • Σ: finite alphabet; Q: finite set of states
  • i: initial state; F: set of final states
  • E: set of transitions (q, a, w, q'),
    E ⊆ Q × (Σ ∪ {ε}) × Σ* × Q
  • Deterministic F.S. Transducer T = (Σ, Q, i, F, ⊗,
    ∗, ρ)
  • ⊗: deterministic state transition function
    (q ⊗ a = q')
  • ∗: deterministic emission function (q ∗ a = w)
  • ρ: final emission function (ρ(q) = w for q ∈ F)
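
The deterministic definition can be sketched as a small class: one dictionary for the state transition function, one for the emission function, and one for the final emission function. The class name, the field names, and the toy machine are illustrative assumptions, not taken from the slides.

```python
class SubsequentialTransducer:
    """Deterministic transducer: at most one move per (state, symbol)."""

    def __init__(self, initial, finals, next_state, emit, rho):
        self.initial = initial        # i
        self.finals = finals          # F
        self.next_state = next_state  # (q, a) -> q'   (state transition)
        self.emit = emit              # (q, a) -> w    (emission)
        self.rho = rho                # q -> w, q in F (final emission)

    def apply(self, word):
        """Return the output for word, or None if word is rejected."""
        q, out = self.initial, []
        for a in word:
            if (q, a) not in self.next_state:
                return None
            out.append(self.emit[q, a])
            q = self.next_state[q, a]
        if q not in self.finals:
            return None
        out.append(self.rho.get(q, ""))
        return "".join(out)

# Toy machine (hypothetical): reads a b*, writes x y* followed by "!".
toy = SubsequentialTransducer(
    initial=0,
    finals={1},
    next_state={(0, "a"): 1, (1, "b"): 1},
    emit={(0, "a"): "x", (1, "b"): "y"},
    rho={1: "!"},
)
print(toy.apply("abb"))
```

Because each input symbol triggers exactly one dictionary lookup, the running time depends only on the length of the input.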

5
Finite-State Transducer (2)
  • State transition function
  • d(q, a) = {q' ∈ Q | ∃w ∈ Σ*, (q, a, w, q') ∈ E}
  • Emission function
  • δ(q, a, q') = {w ∈ Σ* | (q, a, w, q') ∈ E}
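
Both functions can be read directly off the transition set E. The sketch below stores E as a set of (q, a, w, q') tuples; the function names and the two sample transitions are assumptions chosen for illustration.

```python
def d(E, q, a):
    """State transition function: states reachable from q on input a."""
    return {q2 for (q1, x, w, q2) in E if q1 == q and x == a}

def emission(E, q, a, q_next):
    """Emission function: outputs on transitions from q to q_next on a."""
    return {w for (q1, x, w, q2) in E if q1 == q and x == a and q2 == q_next}

# Hypothetical non-deterministic fragment: two moves from state 0 on "a".
E = {(0, "a", "b", 1), (0, "a", "c", 2)}
print(d(E, 0, "a"))
print(emission(E, 0, "a", 1))
```

Here d returns a set with more than one state, which is exactly what makes this fragment non-deterministic.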

6
Construction of the Finite-State Tagger (1)
  • 1. Turn each contextual rule into a finite-state
    transducer
  • 2. Local extension of the transducer (algorithm
    of Fig.17)

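Example rule: vbn vbd PREVTAG np (change vbn to vbd when the previous tag is np)
[Figure: transducer for the rule, with states 0, 1, 2 and transitions np/np, vbn/vbd; and its local extension, with states 0, 1 and transitions np/np, vbn/vbd, ?/?]
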
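
As a rough illustration of what the locally extended transducer computes, the two-state machine below is reconstructed from the transition labels that survive from the figure (np/np, vbn/vbd, ?/?). The exact wiring of the states is an assumption, and "?" is used here to mean any tag without an explicit transition.

```python
OTHER = "?"  # any tag with no explicit transition from the current state

# Assumed wiring: state 1 means "the previous input tag was np".
TRANSITIONS = {
    (0, "np"): (1, "np"),
    (0, OTHER): (0, OTHER),
    (1, "np"): (1, "np"),
    (1, "vbn"): (0, "vbd"),
    (1, OTHER): (0, OTHER),
}

def retag(tags):
    """Apply 'vbn -> vbd after np' to a whole tag sequence in one pass."""
    state, out = 0, []
    for t in tags:
        key = (state, t) if (state, t) in TRANSITIONS else (state, OTHER)
        state, w = TRANSITIONS[key]
        out.append(t if w == OTHER else w)  # ?/? copies the input tag
    return out

print(retag(["np", "vbn", "vbn", "np", "np", "vbn"]))
```

The single pass rewrites every occurrence of the pattern, whereas the transducer for the bare rule describes only one occurrence.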
7
Construction of the Finite-State Tagger (2)
  • 3. Combine all transducers into a single
    transducer by composition
    (algorithm of Elgot and Mezei)
  • 4. Transform the resulting transducer into an
    equivalent subsequential (deterministic)
    transducer (algorithm of Fig.21)
  • Advantage
  • Tagging a sentence of length n takes n steps,
    independently of the number of rules and the
    length of the context
  • Eliminates the inefficiencies of Brill's tagger
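
Step 3 can be sketched for the simplest case: two deterministic, tag-for-tag transducers combined by a product construction, so that one pass over the input has the effect of running the first machine and then the second. The helper names, the rule format, and the tag "nn" are hypothetical; this is a simplified sketch under those assumptions, not the general construction cited on the slide.

```python
def rule_transducer(frm, to, prev, alphabet):
    """Table for 'change frm to to when the previous input tag is prev'.
    State 1 means the previous input tag was prev; state 0 means it was not."""
    table = {}
    for state in (0, 1):
        for tag in alphabet:
            out = to if (state == 1 and tag == frm) else tag
            table[state, tag] = (1 if tag == prev else 0, out)
    return table

def compose(t1, t2, start1, start2, alphabet):
    """Product construction: feed the output of t1 into t2, step by step."""
    start = (start1, start2)
    table, todo, seen = {}, [start], {start}
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            if (p, a) not in t1:
                continue
            p2, b = t1[p, a]          # t1 reads a, writes b
            if (q, b) not in t2:
                continue
            q2, c = t2[q, b]          # t2 reads b, writes c
            table[(p, q), a] = ((p2, q2), c)
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                todo.append((p2, q2))
    return start, table

def run(table, start, tags):
    """One table lookup per input tag."""
    state, out = start, []
    for t in tags:
        state, w = table[state, t]
        out.append(w)
    return out

alphabet = ["np", "vbn", "vbd", "nn"]
t1 = rule_transducer("vbn", "vbd", "np", alphabet)
t2 = rule_transducer("np", "nn", "vbd", alphabet)
start, both = compose(t1, t2, 0, 0, alphabet)
print(run(both, start, ["np", "vbn", "np"]))
```

The composed table performs one lookup per tag however many rules went into it, which is where the n-step bound comes from.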

8
Local Extension Algorithm
[Figs. 18 and 19: worked example of the local extension algorithm; surviving labels: transitions a/b, b/c, b/d, a/a, b/b, ?/?, and states marked transd or identity]

9
Determinization Algorithm
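[Fig.13: example transducer with transitions a/b, a/c, h/h, e/e. Fig.22: its subsequential equivalent, with transitions a/ε, h/bh, e/ce and states labeled by sets of (state, pending output) pairs such as {(0, ε)} and {(1, b), (2, c)}]
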
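
The idea behind determinization can be sketched as a subset construction in which each new state is a set of (state, pending output) pairs and only the common prefix of the candidate outputs is emitted. The example uses the four transitions whose labels survive from Fig.13; the state numbering, the function names, and the data layout are assumptions. The loop terminates only when the input transducer has a subsequential equivalent, and the sketch assumes at most one final pair per state set.

```python
from os.path import commonprefix  # character-wise longest common prefix

def determinize(transitions, initial, finals):
    """transitions: list of (q, a, w, q') tuples with string outputs."""
    start = frozenset({(initial, "")})
    step, emit, rho = {}, {}, {}
    todo, seen = [start], {start}
    while todo:
        S = todo.pop()
        for q, pending in S:
            if q in finals:
                rho[S] = pending          # flush what is still delayed
        by_symbol = {}
        for q, pending in S:
            for (p, a, w, r) in transitions:
                if p == q:
                    by_symbol.setdefault(a, []).append((r, pending + w))
        for a, pairs in by_symbol.items():
            out = commonprefix([v for _, v in pairs])
            T = frozenset((r, v[len(out):]) for r, v in pairs)
            step[S, a], emit[S, a] = T, out
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return start, step, emit, rho

def transduce(machine, word):
    state, step, emit, rho = machine
    out = []
    for a in word:
        if (state, a) not in step:
            return None
        out.append(emit[state, a])
        state = step[state, a]
    return "".join(out) + rho[state] if state in rho else None

E = [(0, "a", "b", 1), (0, "a", "c", 2), (1, "h", "h", 3), (2, "e", "e", 3)]
machine = determinize(E, 0, {3})
print(transduce(machine, "ah"), transduce(machine, "ae"))
```

On reading a the machine emits nothing, because b and c share no prefix; the delayed letter is emitted together with the next symbol, matching the labels a/ε, h/bh and e/ce.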
10
Lexical Tagger
  • The first step of the tagging process: look
    up each word in a dictionary (Fig.9)
  • To achieve high speed (Fig.10)
  • Represent the dictionary as a deterministic
    finite-state automaton
    (algorithm of Revuz)
  • Advantage
  • fast access: 12,000 words / second
  • small storage space: 742Kb (ASCII form) →
    360Kb
  • Unknown-word tagger
  • the same techniques are used
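
A dictionary stored as a deterministic automaton can be approximated with a trie: lookup follows one transition per letter, so its cost depends on the word length rather than on the dictionary size. The sketch below is an unminimized trie, not the construction cited on the slide, and the lexicon entries and tags are hypothetical.

```python
END = ""  # key that marks the end of a word; cannot clash with a letter

def build_trie(lexicon):
    """lexicon: dict mapping each word to its list of possible tags."""
    root = {}
    for word, tags in lexicon.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = tags
    return root

def lookup(trie, word):
    """Follow one transition per letter; None if the word is unknown."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(END)

# Hypothetical lexicon entries, for illustration only.
trie = build_trie({"the": ["at"], "then": ["rb"], "left": ["vbd", "vbn"]})
print(lookup(trie, "left"))
```

Words with a common prefix share nodes; a minimized automaton would also share common suffixes, which is one source of the space saving.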

11
Implementation of Finite-State Transducer
  • Represented by a two-dimensional table
  • rows: states
  • columns: alphabet of all possible input letters
  • cell content: output of the transition

[Table sketch: the cell in row qn, column a holds the output w]

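
A minimal sketch of the table layout: rows are states, columns are input letters, and each cell is reached by direct indexing. Storing the next state alongside the output in each cell is an assumption of this sketch (the slide mentions only the output), and the two-state machine is a toy example.

```python
ALPHABET = "ab"
COLUMN = {ch: j for j, ch in enumerate(ALPHABET)}  # letter -> column index

# TABLE[state][column] = (next_state, output), or None if no transition.
TABLE = [
    [(1, "x"), None],      # row 0
    [None, (1, "y")],      # row 1
]

def transduce(word, start=0):
    """Two index operations per input letter; None if the table has a gap."""
    state, out = start, []
    for ch in word:
        if ch not in COLUMN:
            return None
        cell = TABLE[state][COLUMN[ch]]
        if cell is None:
            return None
        state, w = cell
        out.append(w)
    return "".join(out)

print(transduce("abb"))
```

The table trades space for speed, since every (state, letter) pair gets a cell whether or not a transition exists.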
12
Evaluation
  • Overall performance comparison (Fig.11)
  • Stochastic tagger: Church's trigram tagger
    (1988)
  • Rule-based tagger: Brill's tagger
  • All taggers were trained on the Brown corpus and
    used the same lexicon (Fig.10)
  • Speeds of the different parts of the finite-state
    tagger (Fig.12)
  • Low-level factors (storage access) dominate the
    computation