Title: Deep parsers for real-world application

Slide 1: Deep parsers for real-world application
Takuya Matsuzaki Univ. of Tokyo
Tadayoshi Hara Univ. of Tokyo
Kenji Sagae Univ. of Southern California
Yoshimasa Tsuruoka Univ. of Manchester
Takashi Ninomiya Univ. of Tokyo
Yusuke Miyao Univ. of Tokyo
Jin-Dong Kim Univ. of Tokyo
Tomoko Ohta Univ. of Tokyo
Yuka Tateisi Kogakuin University
Research on Advanced Natural Language Processing and Text Mining (aNT), Grant-in-Aid for Specially Promoted Research, MEXT (2006-2011)
- Junichi TSUJII
- Univ. of Tokyo
- Univ. of Manchester
Workshop GEAF, Coling 2008, Manchester
Slide 2: Sentence Parsing

Parsing based on a proper linguistic formalism is one of the core research fields in CL and NLP. It was long considered a monolithic, esoteric, and inward-looking field, largely dissociated from real-world application.

The field has now matured and is ready to be used by applications: the integration of linguistic grammar formalisms with statistical models, and the distinction between computational relationships and processes, have made parsers robust, efficient, and open to eclectic sources of information other than syntactic ones.

Applications: Information Extraction, Question Answering, Machine Translation
Slide 3: Deep parser which produces semantic representation

[Figure: an example from Bio-TM. The sentence "p53 has been shown to directly activate the Bcl-2 protein" is parsed into a syntactic tree (S, VP, NP, ADVP nodes) with argument links (arg1, arg2, arg3), yielding the semantic representation: Predicate: Activate, Arg1: p53, Arg2: Bcl-2 protein.]
Slide 4: Sentence Retrieval System Using Semantic Representation: MEDIE (HPSG parser Enju)

[Figure: the sentence "The protein is activated by it" (DT NN VBZ VBN IN PRP) parsed into a tree of np/vp/pp nodes with semantic links arg1, arg2, and mod.]
Slides 5-7: (No Transcript)
Slide 8: Computational Relationships and HPSG
Slide 9: HPSG

- HPSG = lexical entries + grammar rules
- Lexical entries: syntactic and semantic descriptions of word-specific behaviors
  - c.f. the Enju grammar (Miyao et al. 2004) has 3,797 lexical entries for 10,536 words
- Grammar rules: non-word-specific syntactic and semantic configurations
  - c.f. the Enju grammar has 12 grammar rules
Slide 10: HPSG: Computational Relationships

Lexical entries (leaf nodes):

- loved: [HEAD verb, SUBJ <[HEAD noun]>, COMPS <[HEAD noun]>]
- John: [HEAD noun, SUBJ <>, COMPS <>]
- Mary: [HEAD noun, SUBJ <>, COMPS <>]
Slide 11: HPSG: Computational Relationships

[Figure: a grammar rule, shown as a mother feature structure over two daughter structures with shared tags (1-5), is unified with the lexical entries of "loved" ([HEAD verb, SUBJ <[HEAD noun]>, COMPS <[HEAD noun]>]) and "John" ([HEAD noun, SUBJ <>, COMPS <>]); "Mary" is still unattached.]
Slide 12: HPSG: Computational Relationships

[Figure: as on slide 11, showing propagation of information: the tags shared between the grammar rule and the unified lexical entries propagate constraints (e.g. the verb's SUBJ and COMPS requirements) into the mother node.]
Slide 13: HPSG: Computational Relationships

[Figure: the unification succeeds, building a mother node [HEAD verb, SUBJ <[HEAD noun]>, COMPS <>] over the verb and its complement; the SUBJ requirement, to be filled by "Mary", remains open.]
Slide 14: HPSG: Computational Relationships

- Relationships are derived by applying grammar rules recursively.

[Figure: applying the rules recursively yields a saturated root node [HEAD verb, SUBJ <>, COMPS <>] over "Mary loved ..." and the predicate-argument structure: Predicate: Love, Arg1: Mary, Arg2: John.]
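The rule application on slides 10-14 is driven by unification of feature structures. The following is a minimal sketch (not the Enju implementation): feature structures are nested dicts, and two structures unify when every shared feature unifies recursively; a clash on an atomic value makes the whole unification fail.

```python
def unify(a, b):
    """Return the unification of feature structures a and b, or None on clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:          # feature clash: unification fails
                    return None
                result[feat] = merged
            else:
                result[feat] = val
        return result
    return a if a == b else None            # atomic values must match exactly

# Lexical entries as on slide 10 (angle-bracket lists abbreviated to one item):
loved = {"HEAD": "verb",
         "SUBJ": {"HEAD": "noun"},
         "COMPS": {"HEAD": "noun"}}
john = {"HEAD": "noun", "SUBJ": {}, "COMPS": {}}

# A head-complement rule unifies the head daughter's COMPS value with the
# complement daughter; success licenses the mother node.
print(unify(loved["COMPS"], john))   # {'HEAD': 'noun', 'SUBJ': {}, 'COMPS': {}}
```

A clash, e.g. unifying [HEAD noun] with [HEAD verb], returns None, which is how ungrammatical combinations are rejected during parsing.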
Slide 15: HPSG: Relationships among different layers of representation

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses
- Mapping a syntactic tree (passive in a relative-clause construction) to the predicate-argument structure
- The information is mostly written in the lexical entries

[Figure: the tree for "the prices we were charged", with SLASH and REL feature percolation, mapped to the predicate-argument structure: CHARGE, Arg1: Unknown, Arg2: Price, Arg3: We.]
Slide 16: Parsing based on HPSG (Pollard and Sag 1994)

- Mathematically well-defined, with a sophisticated constraint-based system
- Linguistically justified
- A deep syntactic grammar that provides semantic analysis

10 years ago: unrealistic solutions for real-world text
Slide 17: Combining HPSG with Statistical Models
Slide 18: Difficulties in Parsing based on HPSG

- Difficulty of developing a broad-coverage HPSG grammar
- Difficulty of disambiguation
  - No treebank for training an HPSG grammar
  - No probabilistic model for HPSG
- Efficiency
  - Very slow: CFG filtering, efficient search, Feature Forest
Slide 19: Difficulties in Parsing based on HPSG (the slide-18 list, repeated)
Slide 20: Grammar with Broad Coverage

- Treebank for grammar development and evaluation
- Treebank grammar: Enju (Miyao et al. 2004)
- Treebank development: Redwoods (Oepen et al. 2002), Hinoki (Bond et al. 2004)

[Figure: rule application converts the Penn Treebank into an HPSG treebank, from which lexical knowledge acquisition yields the HPSG grammar; the grammar is then applied to new sentences to produce HPSG treebanks.]
Slide 21: Performance of Semantic Parser
Slide 22: Difficulties in HPSG Parsing (the slide-18 list, repeated)
Slide 23: Probabilistic Model and HPSG

- Probabilistic model: log-linear model for unification-based grammars (Abney 1997; Johnson et al. 1999; Riezler et al. 2000; Miyao et al. 2003; Malouf and van Noord 2004; Kaplan et al. 2004; Miyao and Tsujii 2005)

[Figure: an HPSG treebank is used for training, producing statistics (model parameters).]
Slide 24: Probabilistic HPSG

w = "A blue eyes girl with white hair and skin walked"

[Figure: one parse tree T for w, built from S, NP, PP, and VP nodes.]
Slide 25: Probabilistic HPSG

w = "A blue eyes girl with white hair and skin walked"

T1, T2, T3, T4, ..., Tn: all possible parse trees derived from w with the grammar.

p(T3|w) is the probability of selecting T3 from T1, T2, ..., and Tn.
Slide 26: Probabilistic HPSG

- Log-linear model for unification-based grammars (Abney 1997; Johnson et al. 1999; Riezler et al. 2000; Miyao et al. 2003; Malouf and van Noord 2004; Kaplan et al. 2004; Miyao and Tsujii 2005)
- Input sentence w: w = w1/P1, w2/P2, w3/P3, ..., wn/Pn (wi: word, Pi: POS)
- Output: parse tree T

p(T | w) = (1/Z_w) exp( Σ_i λ_i f_i(T, w) )

where f_i is a feature function, λ_i the weight for that feature function, and Z_w the normalization factor.
Slide 27: Log-Linear Model / Maximum Entropy Model

w = "A blue eyes girl with white hair and skin walked"

All parse trees derived from w with the grammar: T1, T2, T3, T4, ..., Tn

- f1(T1)=1, f2(T1)=0, f3(T1)=0, ..., fm(T1)=1
- f1(T2)=1, f2(T2)=1, f3(T2)=1, ..., fm(T2)=1
- f1(T3)=1, f2(T3)=1, f3(T3)=0, ..., fm(T3)=0
- f1(T4)=1, f2(T4)=0, f3(T4)=1, ..., fm(T4)=1
- f1(Tn)=0, f2(Tn)=1, f3(Tn)=0, ..., fm(Tn)=0

Feature functions are indicators of the properties that a parse tree has.
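The selection among candidate trees can be sketched as follows. Each tree is reduced to its binary feature vector, as on the slide, and scored with the log-linear model p(T|w) = exp(Σ_i λ_i f_i(T)) / Z_w. The feature vectors below truncate slide 27 to three features, and the weights are invented for illustration, not learned ones.

```python
import math

def log_linear_probs(feature_vectors, weights):
    """p(T|w) for each candidate tree under a log-linear model."""
    scores = [math.exp(sum(l * f for l, f in zip(weights, fv)))
              for fv in feature_vectors]
    z = sum(scores)                     # normalization factor Z_w
    return [s / z for s in scores]

# f1..f3 for four candidate trees T1..T4 (truncated from slide 27):
trees = [(1, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1)]
weights = [0.5, 1.2, -0.3]             # illustrative weights

probs = log_linear_probs(trees, weights)
best = max(range(len(trees)), key=lambda i: probs[i])
print(f"argmax tree: T{best + 1}")     # T3: f1 and f2 fire, the negatively
                                       # weighted f3 does not
```

Disambiguation is then just the argmax over candidate trees; since Z_w is shared, it can be skipped when only the best tree is needed.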
Slide 28: Example of Features in Probabilistic HPSG

[Figure: features extracted at a rule application include the rule name, the distance between head words, whether a comma exists, and, for each daughter, its head word, head POS, head lexical entry, span, and syntactic category (e.g. CAT verb, SUBCAT <NP>; CAT noun, SUBCAT <>).]
Slide 29: Performance of Semantic Parser
Slide 30: Difficulties in HPSG Parsing (the slide-18 list, repeated)
Slide 31: (No Transcript)
Slide 32: CKY Parsing and the Feature Forest Model (Miyao and Tsujii, 2001, 2008)

[Figure: a CKY chart over "John" and "Mary", whose edges (e.g. [HEAD noun, SUBJ <>, COMPS <>], [HEAD verb, SUBJ <NP>, COMPS <NP>]) carry probabilities (0.002, 0.003, 0.010, 0.075); the packed chart is the feature forest over which the model is estimated.]
Slide 33: Beam Search and Iterative Widening (Ninomiya et al. 2005)

[Figure: pruning strategies compared: local thresholding (by number and width of edges per cell), global thresholding, their combination, and iterative parsing, which retries with wider thresholds when parsing fails.]
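The thresholding on this slide can be sketched as follows. The details (edge representation, the exact widening schedule) are assumptions for illustration, not Ninomiya et al.'s code: each chart cell keeps at most `num` edges and discards edges whose log probability falls more than `width` below the cell's best edge; the iterative wrapper retries with wider settings on failure.

```python
def prune_cell(edges, num, width):
    """edges: list of (log_prob, edge). Local thresholding by num and width."""
    edges = sorted(edges, key=lambda e: e[0], reverse=True)[:num]
    if not edges:
        return []
    best = edges[0][0]
    return [e for e in edges if e[0] >= best - width]

def iterative_parse(parse_fn, schedule):
    """Try parsing with successively wider (num, width) settings."""
    for num, width in schedule:
        result = parse_fn(num, width)
        if result is not None:          # success: return the first parse found
            return result
    return None                         # even the widest beam failed

cell = [(-1.0, "A"), (-1.2, "B"), (-4.0, "C"), (-1.1, "D")]
print(prune_cell(cell, num=3, width=2.0))   # C is pruned by both thresholds
```

Tight initial thresholds make the common case fast; the occasional re-parse with a wider beam preserves coverage, which is the trade-off the slide's figure illustrates.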
Slide 34: Distribution of Parsing Time by Sentence Length (black: none; red: iterative parsing)
Slide 35: Performance of Semantic Parser
Slide 36: Scalability of TM Tools: MEDIE

- Target corpus: the MEDLINE corpus
- Suppose, for example, that it takes one second to parse one sentence: parsing all of MEDLINE would take 70 million seconds, that is, about 2 years.
Slide 37: TM and GRID (Ninomiya 2006; Taura 2004)

- Solution
  - The entire MEDLINE was parsed by distributed PC clusters consisting of 340 CPUs
  - Parallel processing was managed by the grid platform GXP
- Experiments
  - The entire MEDLINE was parsed in 8 days
- Output
  - Syntactic parse trees and predicate-argument structures in XML format
  - The sizes of the compressed/uncompressed output were 42.5 GB / 260 GB
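A back-of-the-envelope check of the two slides' numbers (assuming, as slide 36 does, roughly 70 million sentences at one second each):

```python
sentences = 70_000_000
seconds_serial = sentences * 1.0            # one second per sentence

years_serial = seconds_serial / (365 * 24 * 3600)
print(f"{years_serial:.1f} years on one CPU")          # ~2.2 years

days_parallel = seconds_serial / 340 / (24 * 3600)
print(f"{days_parallel:.1f} days on 340 CPUs")         # ~2.4 days of pure
# parse time; the reported 8 days plausibly includes I/O, scheduling
# overhead, and sentences that take far longer than the one-second average
```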
Slide 38: More Accurate and Efficient Parser: Current Research

Research on Advanced Natural Language Processing and Text Mining (aNT), Grant-in-Aid for Specially Promoted Research, MEXT (2006-2011)
Slide 39: Selection of Lexical Entries

- Reference distribution by unigram lexical entry selection (Miyao and Tsujii 2005)
- Filtering unlikely lexical entries during parameter estimation
- Unigram lexical entry selection p(l_i | w_i, P_i) (l_i: lexical entry, w_i: word, P_i: POS) serves as the reference distribution
Slide 40: CKY Parsing

Selection of lexical entries is crucial.

[Figure: the CKY chart from slide 32, with edge probabilities (0.002, 0.003, 0.010, 0.075) over the noun and verb lexical entries for "John" and "Mary".]
Slide 41: Selection of Lexical Entries: Super-Tagging

- Reference distribution by unigram lexical entry selection (Miyao and Tsujii 2005)
- Filtering unlikely lexical entries during parameter estimation
- The unigram lexical entry selection p(l_i | w_i, P_i) is computed by a super-tagger and used as the reference distribution
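Supertagging as lexical-entry pruning can be sketched as follows. The entry names and probabilities are invented for illustration; a real supertagger scores each word's candidate lexical entries with a maximum-entropy model over the local context. Per word, only the top-k entries whose probability is within a factor of the best one survive into parsing.

```python
def supertag(scored_entries, k=2, beam=0.1):
    """scored_entries: per word, a list of (lexical_entry, prob) pairs.
    Keep at most k entries per word, dropping any whose probability is
    below beam * (best probability for that word)."""
    pruned = []
    for entries in scored_entries:
        entries = sorted(entries, key=lambda e: e[1], reverse=True)
        best = entries[0][1]
        pruned.append([e for e in entries[:k] if e[1] >= beam * best])
    return pruned

# Hypothetical scores for "Mary loved John" (entry names are made up):
sentence = [
    [("N_det", 0.01), ("N_sg", 0.95)],                      # Mary
    [("V_trans", 0.80), ("V_intr", 0.15), ("N_sg", 0.05)],  # loved
    [("N_sg", 0.97), ("N_det", 0.03)],                      # John
]
for kept in supertag(sentence, k=2, beam=0.1):
    print(kept)
```

The pruning shrinks the leaf cells of the CKY chart on slide 40, which is why lexical-entry selection dominates overall parsing time.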
Slide 42: Super-tagging and HPSG

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses
- Mapping a syntactic tree (passive in a relative clause) to the predicate-argument structure

[Figure: the "the prices we were charged" tree from slide 15, with SLASH and REL feature percolation.]
Slide 43: Deep Parser with Super-Tagging

- Accuracy of predicate-argument dependencies and parsing time (Section 23, sentences of ≤ 100 words, gold POS)
Slide 44: Integrated Model vs. Staged Model

Super-Tagger → Deterministic Parser
Slide 45: System Overview (Matsuzaki et al. 2007)

[Figure: the input sentence "Mary loved John" is fed to a supertagger, which enumerates lexical-entry assignments (e.g. [HEAD noun, SUBJ <>, COMPS <>] for the nouns and [HEAD verb, SUBJ <NP>, COMPS <NP>] for the verb) in order of probability; a deterministic parser then disambiguates each assignment.]
Slide 46: Enumeration of the maybe-parsable LE assignments

[Figure: the supertagging result passes through a CFG filter derived from the HPSG grammar (Torisawa and Tsujii 2000); the highest-probability lexical-entry sequences are enumerated and handed to the deterministic parser.]
Slide 47: Deterministic Shift-Reduce Parser

[Figure: initial state. The queue Q holds "Mary loved John" with their lexical entries (Mary: [HEAD noun, SUBJ <>, COMPS <>], loved: [HEAD verb, SUBJ <NP>, COMPS <NP>], John: [HEAD noun, SUBJ <>, COMPS <>]); the stack S is empty.]
Slide 48: argmax F(a, S, Q) = SHIFT

[Figure: "Mary" is shifted from the queue Q onto the stack S; "loved" and "John" remain in the queue.]
Slide 49: argmax F(a, S, Q) = SHIFT

[Figure: "loved" is shifted onto the stack; "John" remains in the queue.]
Slide 50: argmax F(a, S, Q) = REDUCE(Head_Comp)

[Figure: the Head-Complement schema combines "loved" ([HEAD verb, SUBJ <1>, COMPS <NP>]) with "John" ([HEAD noun, SUBJ <>, COMPS <>]), leaving [HEAD verb, SUBJ <1:NP>, COMPS <>] on the stack above "Mary".]
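The loop behind slides 47-50 can be sketched as below. The guide function is a hypothetical stand-in: in the real system, argmax_a F(a, S, Q) is a trained classifier over rich features of the stack and queue, not the hand-written rules used here.

```python
def sr_parse(queue, choose_action):
    """Deterministic shift-reduce: repeatedly apply argmax_a F(a, S, Q)."""
    stack = []
    while queue or len(stack) > 1:
        action = choose_action(stack, queue)   # argmax_a F(a, S, Q)
        if action == "SHIFT":
            stack.append(queue.pop(0))
        else:                                  # REDUCE: combine top two items
            right = stack.pop()
            left = stack.pop()
            stack.append((action, left, right))
    return stack[0]

def toy_guide(stack, queue):
    # Stand-in for the classifier: combine the verb with its complement,
    # then attach the subject (the Subj-Head reduction is an assumption).
    if len(stack) >= 2 and stack[-2] == "loved" and stack[-1] == "John":
        return "REDUCE(Head_Comp)"
    if len(stack) == 2 and not queue:
        return "REDUCE(Subj_Head)"
    return "SHIFT"

tree = sr_parse(["Mary", "loved", "John"], toy_guide)
print(tree)
```

Because each configuration triggers exactly one action and no backtracking occurs, parsing is linear in sentence length, which is where the speedups on slide 51 come from.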
Slide 51: Experiment Results

6 times faster; 20 times faster than the initial model
Slide 52: Richer Models: Domain Adaptation

- Low parsing accuracy on different domains
  - Ex.) Enju trained on the Penn Treebank:
    - Penn Treebank: 89.81 (F-score)
    - GENIA (biomedical domain; Kim et al. 1998): 86.39 (F-score)
- Re-training the probabilistic model on the domain
  - Small training data for the target domain
    - Penn Treebank: 39,832 sentences
    - GENIA: 10,848 sentences (>> other domains)
Slide 53: Adaptation with Reference Distribution

[Figure: the adapted log-linear model uses the original model as a reference distribution and adds new feature functions and weights capturing lexical assignment and syntactic preference in the target domain.]
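The adaptation scheme on this slide takes the form p(t|s) ∝ p0(t|s) · exp(Σ_i λ_i f_i), so only the new domain weights λ need to be trained on the small GENIA data. A minimal numeric sketch (probabilities, features, and weights all invented for illustration):

```python
import math

def adapted_probs(p0, domain_feats, weights):
    """Rescale the original model p0(t|s) with domain-specific features."""
    scores = [p * math.exp(sum(l * f for l, f in zip(weights, fv)))
              for p, fv in zip(p0, domain_feats)]
    z = sum(scores)                      # renormalize over the candidates
    return [s / z for s in scores]

p0 = [0.7, 0.2, 0.1]               # original (Penn Treebank) model over 3 trees
feats = [(0, 1), (1, 0), (0, 0)]   # hypothetical domain features per tree
weights = [2.0, -0.5]              # small set of domain-trained weights

print(adapted_probs(p0, feats, weights))
# With all-zero weights the adapted model falls back to the original model,
# which is why training only the new weights is safe on small data.
print(adapted_probs(p0, feats, [0.0, 0.0]))
```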
Slide 54: Performance of Adaptation Models (Hara et al. 2007)

[Figure: two plots, corpus size vs. accuracy (up to 8,000 GENIA training sentences) and training time vs. accuracy (up to 20,000 sec.), with parsing accuracy (F-score) on the 86-90 range. Baseline: the original model pE(t|s) scores 86.4 on GENIA and 89.81 on the Penn Treebank; the adapted model's training time is about 10 times less than the naive model's.]
Slide 55: Adaptation with Reference Distribution

[Figure: as on slide 53; the added feature functions and weights for lexical assignment and syntactic preference are independent of the original model.]
Slide 56: NER and Knowledge-based Processing

[Figure: an ontology linked to text annotation produced by NER models (MEMM, CRF). Example text: "3) selective deletion of the functional nuclear localization signal present in the Rel homology domain of NF-kappa B p65 disrupts its ability to engage I kappa B/MAD-3, and 4)"]
Slide 57: Adaptation with Reference Distribution

[Figure: NER and relation/event recognition results enter the adapted model as soft constraints (feature functions and weights), alongside lexical assignment and syntactic preference, on top of the original model.]
Slide 58: Conclusions
Slide 59: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
Slide 60: Super-tagging and HPSG

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses

[Figure: the "the prices we were charged" tree from slide 15, with NER, RR (relation recognition), and ER (event recognition) results attached to tree nodes.]
Slide 61: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
- A deterministic parser with classifiers based on rich linguistic and extra-linguistic information
Slide 62: argmax F(a, S, Q) = SHIFT

[Figure: the shift-reduce configuration (stack S, queue Q) for "I like it", with the words' lexical entries shown alongside NER, RR, and ER results.]
Slide 63: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
- A deterministic parser with classifiers based on rich linguistic and extra-linguistic information
- Combination of constraints and preferences: more robust parsers
Slide 64: Grammar Engineering

- Evaluation standards for deep parsers
- Modular construction of deep parsers
  - Semantic frames of verbs
  - New categories of NEs; new NERs/RRs to be plugged in
  - Modules for processing formulae, punctuation, etc.
  - Dictionaries
- Quick adaptation of statistical components
  - Preparation of annotated corpora
Slide 65: Thank You!