Title: Deep parsers for real-world application

Slide 1: Deep parsers for real-world application
Takuya Matsuzaki Univ. of Tokyo
Tadayoshi Hara Univ. of Tokyo
Kenji Sagae Univ. of Southern California
Yoshimasa Tsuruoka Univ. of Manchester
Takashi Ninomiya Univ. of Tokyo
Yusuke Miyao Univ. of Tokyo
Jin-Dong Kim Univ. of Tokyo
Tomoko Ohta Univ. of Tokyo
Yuka Tateisi Kogakuin University
Research on Advanced Natural Language Processing and Text Mining (aNT), Grant-in-Aid for Specially Promoted Research, MEXT (2006-2011)
- Junichi TSUJII
- Univ. of Tokyo
- Univ. of Manchester
Workshop GEAF, Coling 2008, Manchester
Slide 2: Sentence Parsing

Parsing based on a proper linguistic formalism is one of the core research fields in CL and NLP. It was long considered a monolithic, esoteric, and inward-looking field, largely dissociated from real-world application.

The field has now matured and is ready to be used by applications: the integration of linguistic grammar formalisms with statistical models, and the distinction between computational relationships and processes, have made parsers robust, efficient, and open to eclectic sources of information other than syntactic ones.

Applications: Information Extraction, Question Answering, Machine Translation
Slide 3: Deep parser which produces semantic representation

[Figure: an example from Bio-TM. The sentence "p53 has been shown to directly activate the Bcl-2 protein" is parsed into a syntactic tree (S, VP, NP, ADVP nodes) with argument links (arg1, arg2, arg3), yielding the semantic representation: Predicate: Activate, Arg1: p53, Arg2: Bcl-2 protein.]
Slide 4: Sentence Retrieval System Using Semantic Representation: MEDIE (HPSG parser Enju)

[Figure: the sentence "The protein is activated by it" (DT NN VBZ VBN IN PRP) parsed into a tree of np/vp/pp nodes with semantic links arg1, arg2, and mod.]
Slides 5-7: (No Transcript)
Slide 8: Computational Relationships and HPSG
Slide 9: HPSG

- HPSG = lexical entries + grammar rules
- Lexical entries: syntactic and semantic descriptions of word-specific behaviors
  - c.f. the Enju grammar (Miyao et al. 2004) has 3,797 lexical entries for 10,536 words
- Grammar rules: non-word-specific syntactic and semantic configurations
  - c.f. the Enju grammar has 12 grammar rules
Slide 10: HPSG: Computational Relationships

Lexical entries (leaf nodes):

- loved: [HEAD verb, SUBJ <[HEAD noun]>, COMPS <[HEAD noun]>]
- John: [HEAD noun, SUBJ <>, COMPS <>]
- Mary: [HEAD noun, SUBJ <>, COMPS <>]
Slide 11: HPSG: Computational Relationships

[Figure: a grammar rule, shown as a mother feature structure over two daughter structures with shared tags (1-5), is unified with the lexical entries of "loved" ([HEAD verb, SUBJ <[HEAD noun]>, COMPS <[HEAD noun]>]) and "John" ([HEAD noun, SUBJ <>, COMPS <>]); "Mary" is still unattached.]
Slide 12: HPSG: Computational Relationships

[Figure: as on slide 11, showing propagation of information: the tags shared between the grammar rule and the unified lexical entries propagate constraints (e.g. the verb's SUBJ and COMPS requirements) into the mother node.]
Slide 13: HPSG: Computational Relationships

[Figure: the unification succeeds, building a mother node [HEAD verb, SUBJ <[HEAD noun]>, COMPS <>] over the verb and its complement; the SUBJ requirement, to be filled by "Mary", remains open.]
Slide 14: HPSG: Computational Relationships

- Relationships are derived by applying grammar rules recursively.

[Figure: applying the rules recursively yields a saturated root node [HEAD verb, SUBJ <>, COMPS <>] over "Mary loved ..." and the predicate-argument structure: Predicate: Love, Arg1: Mary, Arg2: John.]
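The rule application on slides 10-14 is driven by unification of feature structures. The following is a minimal sketch (not the Enju implementation): feature structures are nested dicts, and two structures unify when every shared feature unifies recursively; a clash on an atomic value makes the whole unification fail.

```python
def unify(a, b):
    """Return the unification of feature structures a and b, or None on clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:          # feature clash: unification fails
                    return None
                result[feat] = merged
            else:
                result[feat] = val
        return result
    return a if a == b else None            # atomic values must match exactly

# Lexical entries as on slide 10 (angle-bracket lists abbreviated to one item):
loved = {"HEAD": "verb",
         "SUBJ": {"HEAD": "noun"},
         "COMPS": {"HEAD": "noun"}}
john = {"HEAD": "noun", "SUBJ": {}, "COMPS": {}}

# A head-complement rule unifies the head daughter's COMPS value with the
# complement daughter; success licenses the mother node.
print(unify(loved["COMPS"], john))   # {'HEAD': 'noun', 'SUBJ': {}, 'COMPS': {}}
```

A clash, e.g. unifying [HEAD noun] with [HEAD verb], returns None, which is how ungrammatical combinations are rejected during parsing.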
Slide 15: HPSG: Relationships among different layers of representation

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses
- Mapping a syntactic tree (passive in a relative-clause construction) to the predicate-argument structure
- The information is mostly written in the lexical entries

[Figure: the tree for "the prices we were charged", with SLASH and REL feature percolation, mapped to the predicate-argument structure: CHARGE, Arg1: Unknown, Arg2: Price, Arg3: We.]
Slide 16: Parsing based on HPSG (Pollard and Sag 1994)

- Mathematically well-defined, with a sophisticated constraint-based system
- Linguistically justified
- A deep syntactic grammar that provides semantic analysis

10 years ago: unrealistic solutions for real-world text
Slide 17: Combining HPSG with Statistical Models
Slide 18: Difficulties in Parsing based on HPSG

- Difficulty of developing a broad-coverage HPSG grammar
- Difficulty of disambiguation
  - No treebank for training an HPSG grammar
  - No probabilistic model for HPSG
- Efficiency
  - Very slow: CFG filtering, efficient search, Feature Forest
Slide 19: Difficulties in Parsing based on HPSG (the slide-18 list, repeated)
Slide 20: Grammar with Broad Coverage

- Treebank for grammar development and evaluation
- Treebank grammar: Enju (Miyao et al. 2004)
- Treebank development: Redwoods (Oepen et al. 2002), Hinoki (Bond et al. 2004)

[Figure: rule application converts the Penn Treebank into an HPSG treebank, from which lexical knowledge acquisition yields the HPSG grammar; the grammar is then applied to new sentences to produce HPSG treebanks.]
Slide 21: Performance of Semantic Parser
Slide 22: Difficulties in HPSG Parsing (the slide-18 list, repeated)
Slide 23: Probabilistic Model and HPSG

- Probabilistic model: log-linear model for unification-based grammars (Abney 1997; Johnson et al. 1999; Riezler et al. 2000; Miyao et al. 2003; Malouf and van Noord 2004; Kaplan et al. 2004; Miyao and Tsujii 2005)

[Figure: an HPSG treebank is used for training, producing statistics (model parameters).]
Slide 24: Probabilistic HPSG

w = "A blue eyes girl with white hair and skin walked"

[Figure: one parse tree T for w, built from S, NP, PP, and VP nodes.]
Slide 25: Probabilistic HPSG

w = "A blue eyes girl with white hair and skin walked"

T1, T2, T3, T4, ..., Tn: all possible parse trees derived from w with the grammar.

p(T3|w) is the probability of selecting T3 from T1, T2, ..., and Tn.
Slide 26: Probabilistic HPSG

- Log-linear model for unification-based grammars (Abney 1997; Johnson et al. 1999; Riezler et al. 2000; Miyao et al. 2003; Malouf and van Noord 2004; Kaplan et al. 2004; Miyao and Tsujii 2005)
- Input sentence w: w = w1/P1, w2/P2, w3/P3, ..., wn/Pn (wi: word, Pi: POS)
- Output: parse tree T

p(T | w) = (1/Z_w) exp( Σ_i λ_i f_i(T, w) )

where f_i is a feature function, λ_i the weight for that feature function, and Z_w the normalization factor.
Slide 27: Log-Linear Model / Maximum Entropy Model

w = "A blue eyes girl with white hair and skin walked"

All parse trees derived from w with the grammar: T1, T2, T3, T4, ..., Tn

- f1(T1)=1, f2(T1)=0, f3(T1)=0, ..., fm(T1)=1
- f1(T2)=1, f2(T2)=1, f3(T2)=1, ..., fm(T2)=1
- f1(T3)=1, f2(T3)=1, f3(T3)=0, ..., fm(T3)=0
- f1(T4)=1, f2(T4)=0, f3(T4)=1, ..., fm(T4)=1
- f1(Tn)=0, f2(Tn)=1, f3(Tn)=0, ..., fm(Tn)=0

Feature functions are indicators of the properties that a parse tree has.
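The selection among candidate trees can be sketched as follows. Each tree is reduced to its binary feature vector, as on the slide, and scored with the log-linear model p(T|w) = exp(Σ_i λ_i f_i(T)) / Z_w. The feature vectors below truncate slide 27 to three features, and the weights are invented for illustration, not learned ones.

```python
import math

def log_linear_probs(feature_vectors, weights):
    """p(T|w) for each candidate tree under a log-linear model."""
    scores = [math.exp(sum(l * f for l, f in zip(weights, fv)))
              for fv in feature_vectors]
    z = sum(scores)                     # normalization factor Z_w
    return [s / z for s in scores]

# f1..f3 for four candidate trees T1..T4 (truncated from slide 27):
trees = [(1, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1)]
weights = [0.5, 1.2, -0.3]             # illustrative weights

probs = log_linear_probs(trees, weights)
best = max(range(len(trees)), key=lambda i: probs[i])
print(f"argmax tree: T{best + 1}")     # T3: f1 and f2 fire, the negatively
                                       # weighted f3 does not
```

Disambiguation is then just the argmax over candidate trees; since Z_w is shared, it can be skipped when only the best tree is needed.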
Slide 28: Example of Features in Probabilistic HPSG

[Figure: features extracted at a rule application include the rule name, the distance between head words, whether a comma exists, and, for each daughter, its head word, head POS, head lexical entry, span, and syntactic category (e.g. CAT verb, SUBCAT <NP>; CAT noun, SUBCAT <>).]
Slide 29: Performance of Semantic Parser
Slide 30: Difficulties in HPSG Parsing (the slide-18 list, repeated)
Slide 31: (No Transcript)
Slide 32: CKY Parsing and the Feature Forest Model (Miyao and Tsujii, 2001, 2008)

[Figure: a CKY chart over "John" and "Mary", whose edges (e.g. [HEAD noun, SUBJ <>, COMPS <>], [HEAD verb, SUBJ <NP>, COMPS <NP>]) carry probabilities (0.002, 0.003, 0.010, 0.075); the packed chart is the feature forest over which the model is estimated.]
Slide 33: Beam Search and Iterative Widening (Ninomiya et al. 2005)

[Figure: pruning strategies compared: local thresholding (by number and width of edges per cell), global thresholding, their combination, and iterative parsing, which retries with wider thresholds when parsing fails.]
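The thresholding on this slide can be sketched as follows. The details (edge representation, the exact widening schedule) are assumptions for illustration, not Ninomiya et al.'s code: each chart cell keeps at most `num` edges and discards edges whose log probability falls more than `width` below the cell's best edge; the iterative wrapper retries with wider settings on failure.

```python
def prune_cell(edges, num, width):
    """edges: list of (log_prob, edge). Local thresholding by num and width."""
    edges = sorted(edges, key=lambda e: e[0], reverse=True)[:num]
    if not edges:
        return []
    best = edges[0][0]
    return [e for e in edges if e[0] >= best - width]

def iterative_parse(parse_fn, schedule):
    """Try parsing with successively wider (num, width) settings."""
    for num, width in schedule:
        result = parse_fn(num, width)
        if result is not None:          # success: return the first parse found
            return result
    return None                         # even the widest beam failed

cell = [(-1.0, "A"), (-1.2, "B"), (-4.0, "C"), (-1.1, "D")]
print(prune_cell(cell, num=3, width=2.0))   # C is pruned by both thresholds
```

Tight initial thresholds make the common case fast; the occasional re-parse with a wider beam preserves coverage, which is the trade-off the slide's figure illustrates.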
Slide 34: Distribution of Parsing Time by Sentence Length (black: none; red: iterative parsing)
Slide 35: Performance of Semantic Parser
Slide 36: Scalability of TM Tools: MEDIE

- Target corpus: the MEDLINE corpus
- Suppose, for example, that it takes one second to parse one sentence: parsing all of MEDLINE would take 70 million seconds, that is, about 2 years.
Slide 37: TM and GRID (Ninomiya 2006; Taura 2004)

- Solution
  - The entire MEDLINE was parsed by distributed PC clusters consisting of 340 CPUs
  - Parallel processing was managed by the grid platform GXP
- Experiments
  - The entire MEDLINE was parsed in 8 days
- Output
  - Syntactic parse trees and predicate-argument structures in XML format
  - The sizes of the compressed/uncompressed output were 42.5 GB / 260 GB
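A back-of-the-envelope check of the two slides' numbers (assuming, as slide 36 does, roughly 70 million sentences at one second each):

```python
sentences = 70_000_000
seconds_serial = sentences * 1.0            # one second per sentence

years_serial = seconds_serial / (365 * 24 * 3600)
print(f"{years_serial:.1f} years on one CPU")          # ~2.2 years

days_parallel = seconds_serial / 340 / (24 * 3600)
print(f"{days_parallel:.1f} days on 340 CPUs")         # ~2.4 days of pure
# parse time; the reported 8 days plausibly includes I/O, scheduling
# overhead, and sentences that take far longer than the one-second average
```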
Slide 38: More Accurate and Efficient Parser: Current Research

Research on Advanced Natural Language Processing and Text Mining (aNT), Grant-in-Aid for Specially Promoted Research, MEXT (2006-2011)
Slide 39: Selection of Lexical Entries

- Reference distribution by unigram lexical entry selection (Miyao and Tsujii 2005)
- Filtering unlikely lexical entries during parameter estimation
- Unigram lexical entry selection p(l_i | w_i, P_i) (l_i: lexical entry, w_i: word, P_i: POS) serves as the reference distribution
Slide 40: CKY Parsing

Selection of lexical entries is crucial.

[Figure: the CKY chart from slide 32, with edge probabilities (0.002, 0.003, 0.010, 0.075) over the noun and verb lexical entries for "John" and "Mary".]
Slide 41: Selection of Lexical Entries: Super-Tagging

- Reference distribution by unigram lexical entry selection (Miyao and Tsujii 2005)
- Filtering unlikely lexical entries during parameter estimation
- The unigram lexical entry selection p(l_i | w_i, P_i) is computed by a super-tagger and used as the reference distribution
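Supertagging as lexical-entry pruning can be sketched as follows. The entry names and probabilities are invented for illustration; a real supertagger scores each word's candidate lexical entries with a maximum-entropy model over the local context. Per word, only the top-k entries whose probability is within a factor of the best one survive into parsing.

```python
def supertag(scored_entries, k=2, beam=0.1):
    """scored_entries: per word, a list of (lexical_entry, prob) pairs.
    Keep at most k entries per word, dropping any whose probability is
    below beam * (best probability for that word)."""
    pruned = []
    for entries in scored_entries:
        entries = sorted(entries, key=lambda e: e[1], reverse=True)
        best = entries[0][1]
        pruned.append([e for e in entries[:k] if e[1] >= beam * best])
    return pruned

# Hypothetical scores for "Mary loved John" (entry names are made up):
sentence = [
    [("N_det", 0.01), ("N_sg", 0.95)],                      # Mary
    [("V_trans", 0.80), ("V_intr", 0.15), ("N_sg", 0.05)],  # loved
    [("N_sg", 0.97), ("N_det", 0.03)],                      # John
]
for kept in supertag(sentence, k=2, beam=0.1):
    print(kept)
```

The pruning shrinks the leaf cells of the CKY chart on slide 40, which is why lexical-entry selection dominates overall parsing time.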
Slide 42: Super-tagging and HPSG

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses
- Mapping a syntactic tree (passive in a relative clause) to the predicate-argument structure

[Figure: the "the prices we were charged" tree from slide 15, with SLASH and REL feature percolation.]
Slide 43: Deep Parser with Super-Tagging

- Accuracy of predicate-argument dependencies and parsing time (Section 23, sentences of ≤ 100 words, gold POS)
Slide 44: Integrated Model vs. Staged Model

Super-Tagger → Deterministic Parser
Slide 45: System Overview (Matsuzaki et al. 2007)

[Figure: the input sentence "Mary loved John" is fed to a supertagger, which enumerates lexical-entry assignments (e.g. [HEAD noun, SUBJ <>, COMPS <>] for the nouns and [HEAD verb, SUBJ <NP>, COMPS <NP>] for the verb) in order of probability; a deterministic parser then disambiguates each assignment.]
Slide 46: Enumeration of the maybe-parsable LE assignments

[Figure: the supertagging result passes through a CFG filter derived from the HPSG grammar (Torisawa and Tsujii 2000); the highest-probability lexical-entry sequences are enumerated and handed to the deterministic parser.]
Slide 47: Deterministic Shift-Reduce Parser

[Figure: initial state. The queue Q holds "Mary loved John" with their lexical entries (Mary: [HEAD noun, SUBJ <>, COMPS <>], loved: [HEAD verb, SUBJ <NP>, COMPS <NP>], John: [HEAD noun, SUBJ <>, COMPS <>]); the stack S is empty.]
Slide 48: argmax F(a, S, Q) = SHIFT

[Figure: "Mary" is shifted from the queue Q onto the stack S; "loved" and "John" remain in the queue.]
Slide 49: argmax F(a, S, Q) = SHIFT

[Figure: "loved" is shifted onto the stack; "John" remains in the queue.]
Slide 50: argmax F(a, S, Q) = REDUCE(Head_Comp)

[Figure: the Head-Complement schema combines "loved" ([HEAD verb, SUBJ <1>, COMPS <NP>]) with "John" ([HEAD noun, SUBJ <>, COMPS <>]), leaving [HEAD verb, SUBJ <1:NP>, COMPS <>] on the stack above "Mary".]
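The loop behind slides 47-50 can be sketched as below. The guide function is a hypothetical stand-in: in the real system, argmax_a F(a, S, Q) is a trained classifier over rich features of the stack and queue, not the hand-written rules used here.

```python
def sr_parse(queue, choose_action):
    """Deterministic shift-reduce: repeatedly apply argmax_a F(a, S, Q)."""
    stack = []
    while queue or len(stack) > 1:
        action = choose_action(stack, queue)   # argmax_a F(a, S, Q)
        if action == "SHIFT":
            stack.append(queue.pop(0))
        else:                                  # REDUCE: combine top two items
            right = stack.pop()
            left = stack.pop()
            stack.append((action, left, right))
    return stack[0]

def toy_guide(stack, queue):
    # Stand-in for the classifier: combine the verb with its complement,
    # then attach the subject (the Subj-Head reduction is an assumption).
    if len(stack) >= 2 and stack[-2] == "loved" and stack[-1] == "John":
        return "REDUCE(Head_Comp)"
    if len(stack) == 2 and not queue:
        return "REDUCE(Subj_Head)"
    return "SHIFT"

tree = sr_parse(["Mary", "loved", "John"], toy_guide)
print(tree)
```

Because each configuration triggers exactly one action and no backtracking occurs, parsing is linear in sentence length, which is where the speedups on slide 51 come from.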
Slide 51: Experiment Results

6 times faster; 20 times faster than the initial model
Slide 52: Richer Models: Domain Adaptation

- Low parsing accuracy on different domains
  - Ex.) Enju trained on the Penn Treebank:
    - Penn Treebank: 89.81 (F-score)
    - GENIA (biomedical domain; Kim et al. 1998): 86.39 (F-score)
- Re-training the probabilistic model on the domain
  - Small training data for the target domain
    - Penn Treebank: 39,832 sentences
    - GENIA: 10,848 sentences (>> other domains)
Slide 53: Adaptation with Reference Distribution

[Figure: the adapted log-linear model uses the original model as a reference distribution and adds new feature functions and weights capturing lexical assignment and syntactic preference in the target domain.]
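The adaptation scheme on this slide takes the form p(t|s) ∝ p0(t|s) · exp(Σ_i λ_i f_i), so only the new domain weights λ need to be trained on the small GENIA data. A minimal numeric sketch (probabilities, features, and weights all invented for illustration):

```python
import math

def adapted_probs(p0, domain_feats, weights):
    """Rescale the original model p0(t|s) with domain-specific features."""
    scores = [p * math.exp(sum(l * f for l, f in zip(weights, fv)))
              for p, fv in zip(p0, domain_feats)]
    z = sum(scores)                      # renormalize over the candidates
    return [s / z for s in scores]

p0 = [0.7, 0.2, 0.1]               # original (Penn Treebank) model over 3 trees
feats = [(0, 1), (1, 0), (0, 0)]   # hypothetical domain features per tree
weights = [2.0, -0.5]              # small set of domain-trained weights

print(adapted_probs(p0, feats, weights))
# With all-zero weights the adapted model falls back to the original model,
# which is why training only the new weights is safe on small data.
print(adapted_probs(p0, feats, [0.0, 0.0]))
```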
Slide 54: Performance of Adaptation Models (Hara et al. 2007)

[Figure: two plots, corpus size vs. accuracy (up to 8,000 GENIA training sentences) and training time vs. accuracy (up to 20,000 sec.), with parsing accuracy (F-score) on the 86-90 range. Baseline: the original model pE(t|s) scores 86.4 on GENIA and 89.81 on the Penn Treebank; the adapted model's training time is about 10 times less than the naive model's.]
Slide 55: Adaptation with Reference Distribution

[Figure: as on slide 53; the added feature functions and weights for lexical assignment and syntactic preference are independent of the original model.]
Slide 56: NER and Knowledge-based Processing

[Figure: an ontology linked to text annotation produced by NER models (MEMM, CRF). Example text: "3) selective deletion of the functional nuclear localization signal present in the Rel homology domain of NF-kappa B p65 disrupts its ability to engage I kappa B/MAD-3, and 4)"]
Slide 57: Adaptation with Reference Distribution

[Figure: NER and relation/event recognition results enter the adapted model as soft constraints (feature functions and weights), alongside lexical assignment and syntactic preference, on top of the original model.]
Slide 58: Conclusions
Slide 59: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
Slide 60: Super-tagging and HPSG

- An example of a complex syntactic tree
- SLASH and REL features explain non-local dependencies: WH movement, topicalization, relative clauses

[Figure: the "the prices we were charged" tree from slide 15, with NER, RR (relation recognition), and ER (event recognition) results attached to tree nodes.]
Slide 61: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
- A deterministic parser with classifiers based on rich linguistic and extra-linguistic information
Slide 62: argmax F(a, S, Q) = SHIFT

[Figure: the shift-reduce configuration (stack S, queue Q) for "I like it", with the words' lexical entries shown alongside NER, RR, and ER results.]
Slide 63: Conclusions: Lessons

- A deep parser, which produces semantic representations, has become a practical option
- From an integrated model to a staged model: lower-level processing with rich context
- A deterministic parser with classifiers based on rich linguistic and extra-linguistic information
- Combination of constraints and preferences: more robust parsers
Slide 64: Grammar Engineering

- Evaluation standards for deep parsers
- Modular construction of deep parsers
  - Semantic frames of verbs
  - New categories of NEs; new NERs/RRs to be plugged in
  - Modules for processing formulae, punctuation, etc.
  - Dictionaries
- Quick adaptation of statistical components
  - Preparation of annotated corpora
Slide 65: Thank You!