Deep parsers for real-world application - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Deep parsers for real-world application


1
Deep parsers for real-world application
Takuya Matsuzaki Univ. of Tokyo
Tadayoshi Hara Univ. of Tokyo
Kenji Sagae Univ. of Southern California
Yoshimasa Tsuruoka Univ. of Manchester
Takashi Ninomiya Univ. of Tokyo
Yusuke Miyao Univ. of Tokyo
Jin-Dong Kim Univ. of Tokyo
Tomoko Ohta Univ. of Tokyo
Yuka Tateisi Kogakuin University
Research on Advanced Natural Language Processing
and Text Mining (aNT), Grant-in-Aid for Specially
Promoted Research, MEXT (2006-2011)
  • Junichi TSUJII
  • Univ. of Tokyo
  • Univ. of Manchester

Workshop GEAF, Coling 2008, Manchester
2
Sentence Parsing
Nerd
IT Businessman
The field has matured and is ready to be used
by applications.
Parsing based on a proper linguistic formalism is
one of the core research fields in CL and NLP.
Integration of linguistic grammar formalisms with
statistical models. Distinction between
computational relationships and processes
It was considered a monolithic, esoteric and
inward-looking field, largely dissociated from
real-world applications.
Robust, efficient and open to eclectic sources
of information other than syntactic ones:
Information Extraction,
Question Answering, Machine Translation

3
Deep parser which produces semantic
representation
An example from Bio-TM:
(HPSG parse tree of "p53 has been shown to directly activate the Bcl-2
protein", with arg1/arg2/arg3 links and the resulting predicate-argument
structure: Predicate: Activate, Arg1: p53, Arg2: Bcl-2 protein)
4
Sentence Retrieval System Using Semantic
Representation: MEDIE
(output of the HPSG parser Enju: parse tree of "The protein is activated
by it", with arg1/arg2/mod links connecting the passive verb to its
arguments)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
Computational Relationships: HPSG
9
HPSG
  • HPSG = Lexical entries + Grammar rules
  • Lexical entries: syntactic and semantic
    descriptions of word-specific behaviors
  • cf. the Enju grammar (Miyao et al. 2004) has 3,797
    lexical entries for 10,536 words
  • Grammar rules: non-word-specific syntactic and
    semantic configurations
  • cf. the Enju grammar has 12 grammar rules

10
HPSG: Computational Relationships
lexical entry (leaf node)
loved: HEAD verb, SUBJ <HEAD noun>, COMPS <<HEAD noun>>
John:  HEAD noun, SUBJ <>, COMPS <>
Mary:  HEAD noun, SUBJ <>, COMPS <>
11
HPSG: Computational Relationships
grammar rule (schematic): the mother shares its HEAD and SUBJ values with
the head daughter, and the complement daughter is unified with the first
element of the head daughter's COMPS list
unification of the rule with the lexical entries:
loved: HEAD verb, SUBJ <HEAD noun>, COMPS <<HEAD noun>>
John:  HEAD noun, SUBJ <>, COMPS <>
Mary:  HEAD noun, SUBJ <>, COMPS <>
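The unification step named on this and the following slides can be pictured in code. A minimal sketch, assuming feature structures are plain nested dicts and ignoring structure sharing via the boxed tags (illustrative Python only, not the actual Enju implementation):

```python
def unify(a, b):
    """Recursively unify two feature structures represented as nested dicts.
    Returns the merged structure, or None if two atomic values clash."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None            # atomic values must be identical
    result = dict(a)
    for key, value in b.items():
        if key in result:
            merged = unify(result[key], value)
            if merged is None:
                return None                     # unification failure
            result[key] = merged
        else:
            result[key] = value
    return result

def head_complement(head, comp):
    """Head-Complement rule: unify the complement daughter with the first
    element of the head daughter's COMPS list; the mother keeps the head
    daughter's HEAD and SUBJ values and drops the consumed complement."""
    if not head["COMPS"] or unify(head["COMPS"][0], comp) is None:
        return None
    return {"HEAD": head["HEAD"], "SUBJ": head["SUBJ"], "COMPS": head["COMPS"][1:]}

# Lexical entries of slide 10
loved = {"HEAD": "verb", "SUBJ": [{"HEAD": "noun"}], "COMPS": [{"HEAD": "noun"}]}
john  = {"HEAD": "noun", "SUBJ": [], "COMPS": []}

print(head_complement(loved, john))
# {'HEAD': 'verb', 'SUBJ': [{'HEAD': 'noun'}], 'COMPS': []}
```

Real HPSG unification also propagates shared values (the boxed tags) into the mother node; this toy version omits that.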
12
HPSG: Computational Relationships
propagation of information
(the grammar rule is unified with the lexical entries of "loved" and
"John"; the values shared through the rule's tags are propagated from
the daughters to the mother node)
loved: HEAD verb, SUBJ <HEAD noun>, COMPS <<HEAD noun>>
John:  HEAD noun, SUBJ <>, COMPS <>
13
HPSG: Computational Relationships
result of the rule application:
  HEAD verb, SUBJ <HEAD noun>, COMPS <>
  built over the verb entry HEAD verb, SUBJ <HEAD noun>, COMPS <<HEAD noun>>
  and the noun entry HEAD noun, SUBJ <>, COMPS <>
14
HPSG: Computational Relationships
  • Relationships are derived by applying grammar
    rules recursively

(the full derivation of "Mary loved John" yields a sentence node with
HEAD verb, SUBJ <>, COMPS <>, and the predicate-argument structure
Predicate: Love, Arg1: Mary, Arg2: John)
15
HPSG: Relationships among different layers of
representation
The information is mostly written in a lexical
entry
  • An example of a complex syntactic tree
  • SLASH, REL features explain non-local
    dependencies
  • WH movement, topicalization, relative clauses

Mapping a syntactic tree - passive in a relative
clause construction - to the predicate-argument
structure
(HPSG parse tree of "the prices we were charged": the REL and SLASH
features link the relative clause to "prices", and the passive verb is
mapped to the predicate-argument structure CHARGE, Arg1: unknown,
Arg2: prices, Arg3: we)
16
  • Parsing based on HPSG (Pollard & Sag 1994)
  • Mathematically well-defined, with a sophisticated
    constraint-based system
  • Linguistically justified
  • Deep syntactic grammar that provides semantic
    analysis

10 years ago: unrealistic solutions for real-world text
17
Combining HPSG with Statistical Models
18
Difficulties in Parsing based on HPSG
  • Difficulty of developing a broad-coverage HPSG
    grammar
  • Difficulty of disambiguation
  • No treebank for training an HPSG grammar
  • No probabilistic model for HPSG
  • Efficiency
  • Very slow: CFG filtering, efficient search,
    Feature Forest

19
Difficulties in Parsing based on HPSG
  • Difficulty of developing a broad-coverage HPSG
    grammar
  • Difficulty of disambiguation
  • No treebank for training an HPSG grammar
  • No probabilistic model for HPSG
  • Efficiency
  • Very slow: CFG filtering, efficient search,
    Feature Forest

20
Grammar with Broad Coverage
  • Treebank for grammar development and evaluation
  • Treebank grammar:
  • Enju (Miyao et al. 2004)
  • Treebank development:
  • Redwoods (Oepen et al. 2002)
  • Hinoki (Bond et al. 2004)
(flow diagram: Penn Treebank → [Rule Application] → HPSG Treebank →
[Lexical Knowledge Acquisition] → HPSG Grammar; Sentences + HPSG Grammar
→ HPSG Treebank)
21
Performance of Semantic Parser
22
Difficulties in HPSG Parsing
  • Difficulty of developing a broad-coverage HPSG
    grammar
  • Difficulty of disambiguation
  • No treebank for training an HPSG grammar
  • No probabilistic model for HPSG
  • Efficiency
  • Very slow: CFG filtering, efficient search,
    Feature Forest

23
Probabilistic Model and HPSG
  • Probabilistic model
  • Log-linear model for unification-based grammars
    (Abney 1997, Johnson et al. 1999, Riezler et al.
    2000, Miyao et al. 2003, Malouf and van Noord
    2004, Kaplan et al. 2004, Miyao and Tsujii 2005)

HPSG Treebank → (Training) → Statistics (Model Parameters)
24
Probabilistic HPSG
w = "A blue eyes girl with white hair and skin walked"
(one parse tree T of w, with S, NP, PP and VP nodes)
25
Probabilistic HPSG
w = "A blue eyes girl with white hair and skin walked"
T1, T2, T3, T4, ..., Tn:
all possible parse trees derived from w with a
grammar
p(T3|w) is the probability of selecting T3 from
T1, T2, ..., and Tn.
26
Probabilistic HPSG
  • Log-linear model for unification-based grammars
    (Abney 1997, Johnson et al. 1999, Riezler et al.
    2000, Miyao et al. 2003, Malouf and van Noord
    2004, Kaplan et al. 2004, Miyao and Tsujii 2005)
  • Input: sentence w
  • w = w1/P1, w2/P2, w3/P3, ..., wn/Pn
  • Output: parse tree T

(formula labels: word, POS, feature function, a weight for a feature
function, normalization factor; the formula is reconstructed below)
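The labels above annotate a formula that appears only as an image on the slide; a hedged reconstruction of the standard log-linear (maximum entropy) form used in the cited papers, with Z_w as the normalization factor:

```latex
p(T \mid w) = \frac{1}{Z_w} \exp\Big(\sum_i \lambda_i f_i(T, w)\Big),
\qquad
Z_w = \sum_{T' \in \mathcal{T}(w)} \exp\Big(\sum_i \lambda_i f_i(T', w)\Big)
```

Each f_i is a feature function over the parse tree and the word/POS sequence w = w1/P1, ..., wn/Pn, and λ_i is its weight.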
27
Log-Linear Model = Maximum Entropy Model
w = "A blue eyes girl with white hair and skin walked"
All parse trees derived from w with a grammar:
T1, T2, T3, T4, ..., Tn
f1(T1)=1, f2(T1)=0, f3(T1)=0, ..., fm(T1)=1
f1(T2)=1, f2(T2)=1, f3(T2)=1, ..., fm(T2)=1
f1(T3)=1, f2(T3)=1, f3(T3)=0, ..., fm(T3)=0
f1(T4)=1, f2(T4)=0, f3(T4)=1, ..., fm(T4)=1
f1(Tn)=0, f2(Tn)=1, f3(Tn)=0, ..., fm(Tn)=0
Feature functions are indicators of the properties that a parse tree has.
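Given such indicator feature vectors and a weight vector, picking the best tree is a softmax over the candidates. A small illustrative Python sketch with toy feature values and weights (not trained ones):

```python
import math

def tree_probabilities(feature_vectors, weights):
    """Log-linear model: p(Ti|w) is the softmax of the weighted feature sums."""
    scores = [sum(l * f for l, f in zip(weights, fv)) for fv in feature_vectors]
    z = sum(math.exp(s) for s in scores)        # normalization factor Z_w
    return [math.exp(s) / z for s in scores]

# Indicator features f1..f4 for candidate trees T1..T4 (toy values)
candidates = [
    [1, 0, 0, 1],   # T1
    [1, 1, 1, 1],   # T2
    [1, 1, 0, 0],   # T3
    [1, 0, 1, 1],   # T4
]
weights = [0.5, 1.2, -0.3, 0.8]

probs = tree_probabilities(candidates, weights)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"best tree: T{best + 1}, p = {probs[best]:.3f}")
```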
28
Example of Features in Probabilistic HPSG
Feature templates for a binary branching node:
  rule name
  left daughter's head lexical entry
  left daughter's head word
  left daughter's POS
  left daughter's category
  left daughter's span
  distance of head words
  comma exists or not
(illustrated on the slide with CAT/SUBCAT values such as CAT verb with
SUBCAT <NP>, CAT noun with SUBCAT <>, and the symbols <VP> and <NP>)
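The templates listed above can be pictured as a function over a binary branching node. A hypothetical sketch; the node representation (dict keys such as 'head_word' or 'span') is an assumption for illustration, not Enju's internal data structure:

```python
def branching_features(node):
    """Collect the kinds of features listed above from a binary branching node."""
    left, right = node["left"], node["right"]
    return {
        "rule": node["rule"],                              # rule name
        "left_lexentry": left["lexical_entry"],            # left daughter's head lexical entry
        "left_word": left["head_word"],                    # left daughter's head word
        "left_pos": left["head_pos"],                      # left daughter's POS
        "left_cat": left["category"],                      # left daughter's category
        "left_span": left["span"],                         # left daughter's span
        "distance": abs(left["head_position"] - right["head_position"]),  # distance of head words
        "comma": int("," in node["covered_words"]),        # comma exists or not
    }

node = {
    "rule": "head_comp",
    "covered_words": ["loved", "John"],
    "left":  {"lexical_entry": "verb_trans", "head_word": "loved", "head_pos": "VBD",
              "category": "VP", "span": 2, "head_position": 1},
    "right": {"lexical_entry": "noun", "head_word": "John", "head_pos": "NNP",
              "category": "NP", "span": 1, "head_position": 2},
}
print(branching_features(node))
```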
29
Performance of Semantic Parser
30
Difficulties in HPSG Parsing
  • Difficulty of developing a broad-coverage HPSG
    grammar
  • Difficulty of disambiguation
  • No treebank for training an HPSG grammar
  • No probabilistic model for HPSG
  • Efficiency
  • Very slow: CFG filtering, efficient search,
    Feature Forest

31
(No Transcript)
32
CKY parsing
Feature Forest Model (Miyao and Tsujii, 2001, 2008)
(CKY chart for "John ... Mary": each cell holds HPSG signs such as
HEAD noun, SUBJ <>, COMPS <> and HEAD verb, SUBJ <NP>, COMPS <NP>,
each with a probability, e.g. 0.002, 0.003, 0.010, 0.075)
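CKY chart parsing as drawn on the slide can be sketched over a toy binary grammar with atomic categories. This is an illustration only; the real HPSG chart stores feature structures and uses the feature forest model for probabilities:

```python
from collections import defaultdict

def cky(words, lexicon, rules):
    """Probabilistic CKY: chart[(i, j)] maps a category to the best
    probability of deriving words[i:j] with that category."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for cat, p in lexicon.get(w, {}).items():          # fill the diagonal
            chart[(i, i + 1)][cat] = p
    for width in range(2, n + 1):                          # grow larger spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                      # split point
                for lcat, lp in chart[(i, k)].items():
                    for rcat, rp in chart[(k, j)].items():
                        for mother, rule_p in rules.get((lcat, rcat), {}).items():
                            p = lp * rp * rule_p
                            if p > chart[(i, j)].get(mother, 0.0):
                                chart[(i, j)][mother] = p
    return chart

lexicon = {"John": {"NP": 1.0}, "Mary": {"NP": 1.0}, "loved": {"V": 1.0}}
rules = {("V", "NP"): {"VP": 0.5}, ("NP", "VP"): {"S": 0.5}}
chart = cky(["John", "loved", "Mary"], lexicon, rules)
print(chart[(0, 3)])   # {'S': 0.25}
```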
33
Beam Search and Iterative Widening (Ninomiya et al. 2005)
  Local thresholding (by number and by beam width)
  Global thresholding
  Iterative: local thresholding + global thresholding,
  with the beam widened on each iteration
(a sketch of these thresholds follows below)
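A minimal sketch of the three thresholding ideas, assuming each chart cell is a list of edges carrying a log-probability (toy Python, not the parser's actual figure of merit):

```python
from collections import namedtuple

Edge = namedtuple("Edge", "sign logprob")   # a chart edge: HPSG sign + log-probability

def local_threshold(cell, num=10, width=5.0):
    """Local (per-cell) thresholding: keep at most `num` edges per cell and
    drop edges more than `width` below the best edge in that cell."""
    edges = sorted(cell, key=lambda e: e.logprob, reverse=True)[:num]
    return [e for e in edges if e.logprob >= edges[0].logprob - width]

def global_threshold(chart, width=10.0):
    """Global thresholding: drop edges far below the best edge anywhere in
    the chart (a crude stand-in for the paper's global figure of merit)."""
    best = max((e.logprob for cell in chart.values() for e in cell),
               default=float("-inf"))
    return {span: [e for e in cell if e.logprob >= best - width]
            for span, cell in chart.items()}

def parse_with_widening(parse, sentence, beams=((6, 6.0), (12, 9.0), (24, 12.0))):
    """Iterative widening: retry with progressively wider beams until the
    parser finds an analysis."""
    for num, width in beams:
        result = parse(sentence, num=num, width=width)
        if result is not None:
            return result
    return None

cell = [Edge("S", -1.0), Edge("NP", -2.5), Edge("VP", -9.0)]
print(local_threshold(cell, num=2, width=5.0))   # keeps the S and NP edges
```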
34
Distribution of parsing time by sentence length
(black: none; red: iterative parsing)
35
Performance of Semantic Parser
36
Scalability of TM Tools - MEDIE
Target corpus: the MEDLINE corpus
Suppose, for example, that it takes one second
to parse one sentence: parsing the roughly 70 million sentences of
MEDLINE would take 70 million seconds, that is, about 2 years.
37
TM and GRID (Ninomiya 2006, Taura 2004)
  • Solution
  • The entire MEDLINE was parsed by distributed PC
    clusters consisting of 340 CPUs
  • Parallel processing was managed by the grid platform
    GXP
  • Experiments
  • The entire MEDLINE was parsed in 8 days
  • Output
  • Syntactic parse trees and predicate-argument
    structures in XML format
  • The sizes of the compressed/uncompressed output
    were 42.5 GB / 260 GB.

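The distribution scheme itself is embarrassingly parallel; a hypothetical sketch with Python's multiprocessing standing in for the GXP grid platform, and a placeholder parse_sentence function instead of the real parser:

```python
from multiprocessing import Pool

def parse_sentence(sentence):
    """Placeholder for a call to the deep parser; a real run would return an
    XML fragment with the parse tree and predicate-argument structure."""
    return f"<sentence>{sentence}</sentence>"

def parse_corpus(sentences, workers=340):
    """Distribute sentences over worker processes and collect the results."""
    with Pool(processes=workers) as pool:
        return pool.map(parse_sentence, sentences, chunksize=1000)

if __name__ == "__main__":
    results = parse_corpus(
        ["p53 has been shown to directly activate the Bcl-2 protein"], workers=4)
    print(results[0])
```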
38
More Accurate and Efficient Parsers
- Current Research -
Research on Advanced Natural Language Processing
and Text Mining (aNT), Grant-in-Aid for Specially
Promoted Research, MEXT (2006-2011)
39
Selection of Lexical Entries
  • Reference distribution of unigram lexical entry
    selection (Miyao & Tsujii 2005)
  • Filtering unlikely lexical entries during
    parameter estimation
  • Unigram lexical entry selection

(formula labels: reference distribution, lexical entry, POS, word;
a reconstruction of the formula follows below)
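A hedged reconstruction of the formula these labels refer to (the exact form in Miyao & Tsujii 2005 may differ): the model is estimated against a reference distribution built from unigram lexical entry probabilities, where l_j denotes the lexical entry assigned to word w_j with POS P_j:

```latex
p(T \mid w) = \frac{1}{Z_w}\, p_0(T \mid w)\,
\exp\Big(\sum_i \lambda_i f_i(T, w)\Big),
\qquad
p_0(T \mid w) = \prod_{j=1}^{n} p(l_j \mid w_j, P_j)
```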
40
CKY parsing
Selection of Lexical Entries is crucial.
(the same CKY chart for "John ... Mary": each cell holds HPSG signs with
probabilities such as 0.002, 0.003, 0.010, 0.075; the lexical entries
selected at the leaves determine the entire chart)
41
Selection of Lexical Entries: Super-Tagging
  • Reference distribution of unigram lexical entry
    selection (Miyao & Tsujii 2005)
  • Filtering unlikely lexical entries during
    parameter estimation
  • Unigram lexical entry selection

(formula labels as before - reference distribution, lexical entry, POS,
word - but the reference distribution is now supplied by a super-tagger;
a code sketch of such a tagger follows below)
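A toy sketch of what the super-tagger does: score every candidate lexical entry for each word and keep only the most probable ones. CANDIDATE_ENTRIES and score() are illustrative stand-ins, not the Enju supertagger's actual inventory or model:

```python
import math

# Toy inventory: possible lexical entries per POS tag (an assumption for illustration)
CANDIDATE_ENTRIES = {
    "NN": ["noun_intrans"],
    "VBZ": ["verb_trans", "verb_intrans", "verb_ditrans"],
}

def score(word, pos, lexentry, context):
    """Placeholder scoring function; a real supertagger uses a trained
    log-linear model over word/POS context features."""
    return -len(lexentry) * 0.1    # arbitrary stand-in score

def supertag(words, pos_tags, k=2):
    """Keep the k most probable lexical entries per word (softmax over scores)."""
    tagged = []
    for i, (w, p) in enumerate(zip(words, pos_tags)):
        context = (words[max(0, i - 2):i], words[i + 1:i + 3])   # surrounding words
        scores = {le: score(w, p, le, context) for le in CANDIDATE_ENTRIES[p]}
        z = sum(math.exp(s) for s in scores.values())
        best = sorted(scores, key=scores.get, reverse=True)[:k]
        tagged.append([(le, math.exp(scores[le]) / z) for le in best])
    return tagged

print(supertag(["protein", "activates"], ["NN", "VBZ"]))
```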
42
Super-tagging and HPSG
  • An example of a complex syntactic tree
  • SLASH, REL features explain non-local
    dependencies
  • WH movement, topicalization, relative clauses

Mapping of a syntactic tree - passive in a
relative clause - to the predicate-argument
structure
(the HPSG parse tree of "the prices we were charged", as on the earlier
slide; the super-tagger has to assign the right lexical entries,
including the SLASH and REL values, to the words of such constructions)
43
Deep Parser with Super-Tagging
  • Accuracy of predicate-argument dependencies and
    parsing time (Section 23, ≤ 100 words, gold POS)

44
Integrated Model vs. Staged Model
Super-Tagger
Deterministic Parser
45
System Overview (Matsuzaki et al. 2007)
input sentence: Mary loved John
Supertagger → enumeration of lexical entry assignments → deterministic
disambiguation
(the figure lists the candidate lexical entry assignments for "Mary",
"loved" and "John", e.g. HEAD noun, SUBJ <>, COMPS <> for the nouns and
HEAD verb, SUBJ <NP>, COMPS <NP> for the verb, each with a probability)
46
Supertagging result
→ enumeration of the highest-probability lexical entry sequences
→ CFG-filter (Torisawa & Tsujii 2000): enumeration of the maybe-parsable
  lexical entry assignments, derived from the HPSG grammar
→ Deterministic Parser
47
Deterministic S-R Parser
Initial state:
  stack S: empty
  queue Q: Mary / HEAD noun, SUBJ <>, COMPS <>
           loved / HEAD verb, SUBJ <NP>, COMPS <NP>
           John / HEAD noun, SUBJ <>, COMPS <>
48
argmax F(a, S, Q) = SHIFT
  stack S: Mary / HEAD noun, SUBJ <>, COMPS <>
  queue Q: loved / HEAD verb, SUBJ <NP>, COMPS <NP>
           John / HEAD noun, SUBJ <>, COMPS <>
49
argmax F(a, S, Q) = SHIFT
  stack S: Mary / HEAD noun, SUBJ <>, COMPS <>
           loved / HEAD verb, SUBJ <NP>, COMPS <NP>
  queue Q: John / HEAD noun, SUBJ <>, COMPS <>
50

argmax F(a, S, Q) = REDUCE(Head_Comp)
  the Head-Comp schema combines "loved" (HEAD verb, SUBJ <1 NP>,
  COMPS <NP>) with "John" (HEAD noun, SUBJ <>, COMPS <>), giving
  "loved John": HEAD verb, SUBJ <1 NP>, COMPS <>
  stack S: Mary / HEAD noun, SUBJ <>, COMPS <>
           loved John / HEAD verb, SUBJ <1 NP>, COMPS <>
  queue Q: empty
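Slides 47-50 step through the deterministic shift-reduce parser; a minimal sketch of its control loop, with a trivial stand-in for the trained action classifier F(a, S, Q):

```python
def sr_parse(queue, choose_action, reduce_edge):
    """Deterministic shift-reduce parsing: at each step the classifier picks
    a single action (no backtracking, no chart)."""
    stack = []
    queue = list(queue)
    while queue or len(stack) > 1:
        action = choose_action(stack, queue)       # argmax_a F(a, S, Q)
        if action == "SHIFT" and queue:
            stack.append(queue.pop(0))
        elif action == "REDUCE" and len(stack) >= 2:
            right = stack.pop()
            left = stack.pop()
            stack.append(reduce_edge(left, right))
        else:
            break                                  # no applicable action
    return stack

# Toy driver: shift everything, then reduce right to left
def choose_action(stack, queue):
    return "SHIFT" if queue else "REDUCE"

def reduce_edge(left, right):
    return (left, right)

print(sr_parse(["Mary", "loved", "John"], choose_action, reduce_edge))
```

In the actual parser the classifier also chooses which schema to apply (e.g. REDUCE(Head_Comp)) and the reduce step builds an HPSG sign by unification, as on the slide above.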
51
Experiment Results
6 times faster; 20 times faster than the initial
model
52
Richer Models: Domain Adaptation
  • Low parsing accuracy on different domains
  • Ex.) Enju trained on the Penn Treebank:
  • Penn Treebank: 89.81 (F-score)
  • GENIA (biomedical domain; Kim et al., 1998): 86.39 (F-score)
  • Re-training a probabilistic model on the target domain
  • Small training data for the target domain:
  • Penn Treebank: 39,832 sentences
  • GENIA: 10,848 sentences (>> other domains)
53
Adaptation with Reference Distribution
(formula labels: Lexical Assignment, Syntactic Preference, feature
function, feature weight, original model; a reconstruction of the
formula follows below)
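A hedged reconstruction of the adaptation model these labels suggest, following the reference distribution idea above: the original Penn Treebank model is kept fixed and only domain-specific feature weights are re-estimated on GENIA:

```latex
p_{\mathrm{adapt}}(T \mid w) = \frac{1}{Z_w}\;
p_{\mathrm{orig}}(T \mid w)\;
\exp\Big(\sum_i \lambda^{\mathrm{new}}_i f_i(T, w)\Big)
```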
54
Performance of Adaptation Models (Hara et al. 2007)
(two plots: corpus size vs. accuracy and training time vs. accuracy;
y-axis: parsing accuracy (F-score), 86-90; x-axes: number of GENIA
training sentences, 0-8000, and training time in seconds, 0-20000)
Original p_E(t|s) for the Penn Treebank: 89.81; the training time is 10
times less than for the naive model.
Baseline: original p_E(t|s) for GENIA, 86.4
55
Adaptation with Reference Distribution
(the same formula labels as before: Lexical Assignment, Syntactic
Preference, feature function, feature weight, original model;
annotated "independent of the original model")
56
NER and Knowledge-based Processing
(figure: TEXT, ANNOTATION and ONTOLOGY layers; named entities are
recognized with MEMM and CRF models)
Example text: "3) selective deletion of the functional nuclear
localization signal present in the Rel homology domain of NF-kappa B p65
disrupts its ability to engage I kappa B/MAD-3, and 4)"
57
Adaptation with Reference Distribution
(the same formula, now with Relation and Event Recognition: NER results
are used as soft constraints through additional feature functions and
weights on top of the original model)
58
Conclusions
59
Conclusions and Lessons
  • A deep parser, which produces semantic
    representations, has become a practical option
  • From Integrated Model to Staged Model: lower-level
    processing with rich context

60
Super-tagging and HPSG
  • An example of a complex syntactic tree
  • SLASH, REL features explain non-local
    dependencies
  • WH movement, topicalization, relative clauses

(the same HPSG parse tree of "the prices we were charged", now annotated
with NER results, RR results and ER results attached to its nodes)
61
Conclusions and Lessons
  • A deep parser, which produces semantic
    representations, has become a practical option
  • From Integrated Model to Staged Model: lower-level
    processing with rich context
  • Deterministic parser with classifiers based on
    rich linguistic and extra-linguistic information

62
argmax F(a, S, Q) = SHIFT
(shift-reduce configuration for "I like it": stack S and queue Q hold the
words with their lexical entries - HEAD noun, SUBJ <>, COMPS <> for "I"
and "it", HEAD verb, SUBJ <NP>, COMPS <NP> for "like" - and NER results,
RR results and ER results are available to the action classifier F)
63
Conclusions and Lessons
  • A deep parser, which produces semantic
    representations, has become a practical option
  • From Integrated Model to Staged Model: lower-level
    processing with rich context
  • Deterministic parser with classifiers based on
    rich linguistic and extra-linguistic information
  • Combination of Constraints and Preferences: more
    robust parsers

64
Grammar Engineering
  • Evaluation Standards for Deep Parsers
  • Modular Construction of Deep Parsers
  • Semantic Frames of Verbs
  • New categories of NEs, new NERs/RRs to be plugged
    in
  • Modules for processing formulae, punctuation,
    etc.
  • Dictionaries
  • Quick Adaptation of Statistical Components
  • Preparation of Annotated Corpora

65
Thank You !
The field has matured and is ready to be used
by applications.
Integration of linguistic grammar formalisms with
statistical models. Distinction between
computational relationships and processes.
Robust, efficient and open to eclectic sources
of information other than syntactic ones:
Information Extraction,
Question Answering, Machine Translation