Grammatical processing with LFG and XLE
Transcript and Presenter's Notes
1
Grammatical processing with LFG
and XLE
  • Ron Kaplan
  • ARDA Symposium, August 2004

2
Layered Architecture for Question Answering
[Diagram: Text → XLE/LFG Parsing → F-structure → Conceptual semantics → KR Mapping → Target KRR, which holds KR sources and assertions. A question follows the same path to become a query; answers, explanations, and subqueries come back as composed f-structure templates, which XLE/LFG Generation renders as text to the user.]
5
Deep analysis matters if you care about
the answer
  • Example
  • A delegation led by Vice President Philips, head
    of the chemical division, flew to Chicago a
    week after the incident.
  • Question: Who flew to Chicago?
  • Candidate answers
  • division (closest noun)
  • head (next closest)
  • V.P. Philips (next)

6
F-structure localizes arguments
  • Was John pleased?
  • John was easy to please: Yes
  • John was eager to please: Unknown

lexical dependency
7
Topics
  • Basic LFG architecture
  • Ambiguity management in XLE
  • Pargram project: Large-scale grammars
  • Robustness
  • Stochastic disambiguation
  • Shallow markup
  • Semantic interpretation

Focus on the language end, not knowledge
8
The Language Mapping: LFG + XLE
[Diagram: a sentence passes through tokens/morphology into XLE, which parses to functional structures and can generate back out; the engine consults an LFG grammar, a stochastic model, and named entities for each language (English, German, etc.).]
XLE: Efficient ambiguity management
9
Why deep analysis is difficult
  • Languages are hard to describe
  • Meaning depends on complex properties of words
    and sequences
  • Different languages rely on different properties
  • Errors and disfluencies
  • Languages are hard to compute
  • Expensive to recognize complex patterns
  • Sentences are ambiguous
  • Ambiguities multiply: explosion in time and
    space

11
Different patterns code same meaning
The small children are chasing the dog.
LFG theory: minor adjustments on a universal theme
English: Group, order
Japanese: Group, mark
chase(small(children), dog)
12
LFG architecture
Modularity: Nearly decomposable
  • C(onstituent)-structures and F(unctional)-structures
    related by a piecewise correspondence φ

[Diagram: c-structure tree S → NP (John) + VP (V likes + NP Mary), a formal encoding of order and grouping, mapped by φ to an f-structure, a formal encoding of grammatical relations.]
13
LFG grammar
Rules:
  S  → NP           VP
       (↑ SUBJ)=↓   ↑=↓
  VP → V     (NP)
       ↑=↓   (↑ OBJ)=↓
  NP → (Det)   N
       ↑=↓     ↑=↓
Lexical entries:
  John   N  (↑ PRED)='John'
            (↑ NUM)=SG
  likes  V  (↑ PRED)='like<SUBJ, OBJ>'
            (↑ SUBJ NUM)=SG
            (↑ SUBJ PERS)=3
  • Context-free rules define valid c-structures
    (trees).
  • Annotations on rules give constraints that
    corresponding f-structures must satisfy.
  • Satisfiability of constraints determines
    grammaticality.
  • F-structure is solution for constraints (if
    satisfied).

14
Rules as well-formedness conditions

S → NP VP, with (↑ SUBJ)=↓ annotating NP and ↑=↓ annotating VP.

If * denotes a particular daughter node:
  ↑ = f-structure of the mother = φ(M(*))
  ↓ = f-structure of the daughter = φ(*)

A tree containing S over NP - VP is OK if:
the f-unit corresponding to the NP node is the SUBJ of the f-unit
corresponding to the S node, and the same f-unit
corresponds to both the S and VP nodes.
15
Inconsistent equations: Ungrammatical
  • What's wrong with "They walks"?

[Tree: S over NP (they) and VP (walks)]
'they' contributes (f SUBJ NUM)=PL;
f=v and (v SUBJ NUM)=SG ⟹ (f SUBJ NUM)=SG;
so PL=SG, i.e. FALSE.
If a valid inference chain yields FALSE, the
premises are unsatisfiable: no f-structure.
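
This inference step can be made concrete with a small amount of code. Below is a minimal Python sketch (not XLE's actual algorithm; the class and function names are invented for illustration) that solves LFG-style equations by merging f-structure nodes with union-find and signals FALSE when atomic values clash:

class FNode:
    def __init__(self):
        self.parent = self      # union-find parent
        self.attrs = {}         # attribute -> FNode or atomic string

    def find(self):
        n = self
        while n.parent is not n:
            n = n.parent
        return n

def unify(a, b):
    """Assert a = b; FALSE (an exception) if atomic values clash."""
    a = a.find() if isinstance(a, FNode) else a
    b = b.find() if isinstance(b, FNode) else b
    if a is b or a == b:
        return a
    if not isinstance(a, FNode) or not isinstance(b, FNode):
        raise ValueError(f'inconsistent: {a} = {b}')   # e.g. PL = SG
    b.parent = a                                       # merge the two nodes
    for attr, val in b.attrs.items():
        a.attrs[attr] = unify(a.attrs[attr], val) if attr in a.attrs else val
    return a

def assert_eq(f, path, value):
    """Assert the equation (f PATH) = value, e.g. (f SUBJ NUM) = PL."""
    for attr in path[:-1]:
        f = f.find()
        f.attrs.setdefault(attr, FNode())
        f = f.attrs[attr].find()
    f = f.find()
    leaf = path[-1]
    if leaf in f.attrs:
        unify(f.attrs[leaf], value)
    else:
        f.attrs[leaf] = value

# "They walks": 'they' says (f SUBJ NUM)=PL, 'walks' says (f SUBJ NUM)=SG
f = FNode()
assert_eq(f, ['SUBJ', 'NUM'], 'PL')
try:
    assert_eq(f, ['SUBJ', 'NUM'], 'SG')
except ValueError as e:
    print(e)    # inconsistent: PL = SG  -> no f-structure, ungrammatical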
16
English and Japanese
17
Warlpiri: Discontinuous constituents
Like Japanese: Any number of NPs
Particle on each defines its grammatical function
18
English: Discontinuity in questions
Who did Mary see?                OBJ
Who did Bill think Mary saw?     COMP OBJ
Who did Bill think saw Mary?     COMP SUBJ
"Who" is understood as subject/object of a distant
verb. Uncertainty: which function of which verb?
S → NP                          S
    (↑ Q)=↓                     ↑=↓
    (↑ COMP* {SUBJ|OBJ})=↓
19
Summary: Lexical Functional Grammar
(Kaplan and Bresnan, 1982)
  • Modular c-structure/f-structure in
    correspondence
  • Mathematically simple, computationally
    transparent
  • Combination of Context-free grammar,
    Quantifier-free equality theory
  • Closed under composition with regular relations:
    finite-state morphology
  • Grammatical functions are universal primitives
  • Subject and Object expressed differently in
    different languages
  • English: Subject is first NP
  • Japanese: Subject has 'ga'
  • But Subject and Object behave similarly in all
    languages
  • Active to Passive: Object becomes Subject
  • English: move words; Japanese: move 'ga'
  • Adopted by world-wide community of linguists
  • Large literature: papers, (text)books,
    conferences; reference theory
  • (Relatively) easy to describe all languages
  • Linguists contribute to practical computation
  • Stable: Only minor changes in 25 years

20
Efficient computation with LFG grammars:
Ambiguity Management in XLE
21
Computation challenge: Pervasive ambiguity
22
Coverage vs. Ambiguity
  • I fell in the park.
  • I know the girl in the park.
  • I see the girl in the park.

23
Ambiguity can be explosive
  • If alternatives multiply within or across
    components

[Pipeline: Tokenize → Morphology → Syntax → Semantics → Knowledge]
24
Computational consequences of ambiguity
  • Serious problem for computational systems
  • Broad coverage, hand written grammars frequently
    produce thousands of analyses, sometimes millions
  • Machine learned grammars easily produce hundreds
    of thousands of analyses if allowed to parse to
    completion
  • Three approaches to ambiguity management
  • Prune: block unlikely analysis paths early
  • Procrastinate: do not expand alternative analysis
    paths until something else requires them
  • Also known as underspecification
  • Manage: compact representation and computation of
    all possible analyses

25
Pruning = Premature Disambiguation
  • Conventional approach: Use heuristics to kill
    alternatives as soon as possible

[Diagram: Tokenize → Morphology → Syntax → Semantics → Knowledge, with alternatives X'd out early at each stage, including the eventually correct one.]
Fast computation, wrong result
26
Procrastination: Passing the Buck
  • Chunk parsing as an example
  • Collect noun groups, verb groups, PP groups
  • Leave it to later processing to put these
    together
  • Some combinations are nonsense
  • Later processing must either
  • Call (another) parser to check constraints
  • Have its own model of constraints (= grammar)
  • Solve constraints that chunker includes with
    output

27
Computational Complexity of LFG
  • LFG is a simple combination of two simple theories
  • Context-free grammars for trees
  • Quantifier-free theory of equality for
    f-structures
  • Both theories are easy to compute
  • Cubic: CFG parsing
  • Linear: equation solving
  • Combination is difficult: Parsing problem is
    NP-complete
  • Exponential/intractable in the worst case (but
    computable, unlike some other linguistic theories)
  • Can we avoid the worst case?

28
Some syntactic dependencies
  • Local dependencies: These dogs / *This
    dogs (agreement)
  • Nested dependencies: The dogs in the park
    bark (agreement)
  • Cross-serial dependencies: Jan Piet
    Marie zag helpen zwemmen (predicate/argument
    map)

See(Jan, help(Piet, swim(Marie)))
  • Long distance dependencies
  • The girl who John says that Bob believes
    likes Henry left.

left(girl) ∧ says(John, believes(Bob,
likes(girl, Henry)))
29
Expressiveness vs. complexity
The Chomsky Hierarchy (n is length of sentence)

  Type               Dependency                       Complexity
  Regular            Local                            O(n)   linear
  Context-free       Nested                           O(n³)  cubic
  Context-sensitive  Cross-serial and long distance   O(2ⁿ)  exponential

But languages have mostly local and nested
dependencies... so (mostly) cubic performance
should be possible.
30
NP-Complete Problems
  • Problems that can be solved by a Nondeterministic
    Turing Machine in polynomial time
  • General characterization: Generate and test
  • Lots of candidate solutions that need to be
    verified for correctness
  • Every candidate is easy to confirm or disconfirm

n elements → 2ⁿ candidates
A nondeterministic TM has an oracle that provides
only the right candidates to test; it doesn't search.
A deterministic TM doesn't have the oracle and must test
all (exponentially many) candidates.
31
Polynomial search problems
  • Subparts of a candidate are independent of other
    parts: outcome is not influenced by other parts
    (context-free)
  • The same independent subparts appear in many
    candidates
  • We can (easily) determine that this is the case
  • Consequence: test subparts independently of
    context, share results

32
Why is LFG parsing NP-complete?
  • Classic generate-and-test search problem
  • Exponentially many tree candidates
  • CFG chart parser quickly produces packed
    representation of all trees
  • CFG can be exponentially ambiguous
  • Each tree must be tested for f-structure
    satisfiability
  • Boolean combinations of per-tree constraints
  • English base verbs: Not 3rd singular
  • (↑ SUBJ NUM)≠SG ∨ (↑ SUBJ PERS)≠3
  • Disjunction!

Exponentially many exponential problems
33
XLE Ambiguity Management: The intuition
How many sheep? How many fish?
The sheep saw the fish.
  • Packed representation is a free choice system
  • Encodes all dependencies without loss of
    information
  • Common items represented, computed once
  • Key to practical efficiency

34
Dependent choices
Again, packing avoids duplication:
Who do you want to succeed?
  I want to succeed John    (want intrans, succeed trans)
  I want John to succeed    (want trans, succeed intrans)
... but it's wrong: it doesn't encode all dependencies; the choices
are not free.
35
Solution: Label dependent choices
  • Label each choice with distinct Boolean
    variables p, q, etc.
  • Record acceptable combinations as a Boolean
    expression Φ
  • Each analysis corresponds to a satisfying
    truth-value assignment
  • (free choice from the true lines of Φ's
    truth table)

36
Boolean Satisfiability
Can solve Boolean formulas by multiplying out to
Disjunctive Normal Form
  • Produces simple conjunctions of literal
    propositions (facts: equations)
  • Easy checks for satisfiability
  • If a∧d → FALSE, replace any conjunction with
    a and d by FALSE.
  • Blow-up of disjunctive structure before fact
    processing
  • Individual facts are replicated (and
    re-processed): Exponential

37
Alternative: Contexted normal form
(a ∨ b) ∧ x ∧ (c ∨ d)
  ⇒  (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)
Each conjunct pairs a context (p, ¬p, q, ¬q) with a fact (a, b, x, c, d).
  • Produce a flat conjunction of contexted facts

38
Alternative: Contexted normal form
(a ∨ b) ∧ x ∧ (c ∨ d)
  ⇒  (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)
  • Each fact is labeled with its position in the
    disjunctive structure
  • Boolean hierarchy discarded

Produce a flat conjunction of contexted facts
(a sketch of this flattening follows below)
  • No blow-up, no duplicates
  • Each fact appears and can be processed once
  • Claims:
  • Checks for satisfiability still easy
  • Facts can be processed first, disjunctions
    deferred

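As a concrete illustration, here is a small Python sketch of the flattening (the formula encoding and variable naming are invented; this is not XLE's data structure). It introduces a fresh Boolean variable per disjunction and returns a flat list of (context, fact) pairs:

# Sketch: flattening a formula into contexted facts.
# Formulas: ('and', f1, f2, ...), ('or', f1, f2), or an atomic fact string.
from itertools import count

fresh = (f'p{i}' for i in count(1))      # supply of new Boolean variables

def contexted(formula, context=frozenset()):
    """Return a flat list of (context, atomic-fact) pairs."""
    if isinstance(formula, str):                 # atomic fact
        return [(context, formula)]
    op, *parts = formula
    if op == 'and':
        return [cf for part in parts for cf in contexted(part, context)]
    if op == 'or':                               # phi or psi ~> p->phi, ~p->psi
        p = next(fresh)
        left, right = parts
        return (contexted(left, context | {p}) +
                contexted(right, context | {'~' + p}))

# (a or b) and x and (c or d):
formula = ('and', ('or', 'a', 'b'), 'x', ('or', 'c', 'd'))
for ctx, fact in contexted(formula):
    print(sorted(ctx) or ['T'], fact)
# ['p1'] a  /  ['~p1'] b  /  ['T'] x  /  ['p2'] c  /  ['~p2'] d

Each fact appears exactly once, however deeply it was nested, which is the "no blow-up" claim above.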
39
A sound and complete method
(Maxwell & Kaplan, 1987, 1991)
  • Conversion to logically equivalent contexted form
  • Lemma: φ ∨ ψ is satisfiable iff (p→φ) ∧ (¬p→ψ) is
    satisfiable (p a new Boolean variable)
  • Proof: (If) If φ is true, let p be true, in
    which case (p→φ) ∧ (¬p→ψ) is true.
  • (Only if) If p is true, then φ is true, in
    which case φ ∨ ψ is true.

40
Test for satisfiability
Suppose R→FALSE is deduced from a contexted
formula Φ. Then Φ is satisfiable only if ¬R.
E.g. R→(SG=PL) ⟹ R→FALSE.
¬R is called a nogood context.
  • Perform all fact-inferences, conjoining contexts
  • If FALSE is inferred, add its context to the nogoods
  • Solve the conjunction of nogoods
  • Boolean satisfiability: exponential only in the
    nogood context-Booleans
  • Independent facts: no FALSE, no nogoods
  • Implicitly notices independence/context-freeness

41
Example 1
  • They walk
  • No disjunction; all facts are in the default
    True context
  • No change to inference
  • T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=PL
    ⟹ T→PL=PL
  • reduces to (f SUBJ NUM)=PL ∧ (f SUBJ
    NUM)=PL ⟹ PL=PL
  • They walks
  • No disjunction; all facts still in the default
    True context
  • No change to inference
  • T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=SG
    ⟹ T→PL=SG ⟹ T→FALSE

T→FALSE is satisfiable only if ¬T; T always holds, so unsatisfiable
42
Example 2
  • The sheep walks
  • Disjunction of NUM feature from sheep:
  • (f SUBJ NUM)=SG ∨ (f SUBJ NUM)=PL
  • Contexted facts:
  • p→(f SUBJ NUM)=SG ∧
  • ¬p→(f SUBJ NUM)=PL ∧
  • (f SUBJ NUM)=SG
    (from walks)
  • Inferences:
  • p→(f SUBJ NUM)=SG ∧ (f SUBJ NUM)=SG ⟹ p→SG=SG
  • ¬p→(f SUBJ NUM)=PL ∧ (f SUBJ NUM)=SG ⟹ ¬p→PL=SG
    ⟹ ¬p→FALSE

¬p→FALSE is true iff ¬p is false, iff p is
true. Conclusion: Sentence is grammatical
in context p: Only 1 sheep
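
The bookkeeping in Examples 1 and 2 is mechanical enough to sketch in a few lines of Python (the representation is invented for illustration, not XLE's): clashing facts whose paths match but whose values differ yield nogoods, and an analysis survives iff its choice assignment avoids every nogood.

# Sketch of contexted satisfiability: facts carry a context (a set of
# Boolean literals); clashing facts yield nogoods.
from itertools import product

# Contexted facts for "The sheep walks":
#   p    -> (f SUBJ NUM)=SG     (sheep, singular reading)
#   ~p   -> (f SUBJ NUM)=PL     (sheep, plural reading)
#   True -> (f SUBJ NUM)=SG     (walks)
facts = [
    (frozenset({'p'}),  ('f SUBJ NUM', 'SG')),
    (frozenset({'~p'}), ('f SUBJ NUM', 'PL')),
    (frozenset(),       ('f SUBJ NUM', 'SG')),
]

nogoods = []
for (c1, (path1, v1)), (c2, (path2, v2)) in product(facts, repeat=2):
    if path1 == path2 and v1 != v2:     # equal paths, clashing values: FALSE
        nogoods.append(c1 | c2)         # the conjoined context is a nogood

def ok(assignment, nogood):
    """An assignment survives a nogood unless it contains all its literals."""
    return not all(lit in assignment for lit in nogood)

for assignment in [{'p'}, {'~p'}]:
    print(assignment, all(ok(assignment, n) for n in nogoods))
# {'p'} True    -- grammatical only in context p ("one sheep")
# {'~p'} False  -- ~p conjoined with walks' SG fact is a nogood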
43
Contexts and packing: Index by facts
The sheep saw the fish.
Contexted unification ≈ concatenation, when
choices don't interact.
44
Compare: DNF unification
The sheep saw the fish.
  (SUBJ NUM SG ∨ SUBJ NUM PL) × (OBJ NUM SG ∨ OBJ NUM PL)
  gives four separate f-structures:
  [SUBJ NUM SG, OBJ NUM SG]   [SUBJ NUM SG, OBJ NUM PL]
  [SUBJ NUM PL, OBJ NUM SG]   [SUBJ NUM PL, OBJ NUM PL]
DNF cross-product of alternatives: Exponential
45
The XLE wager (for real sentences of real
languages)
  • Alternatives from distant choice-sets can be
    freely chosen without affecting satisfiability
  • FALSE is unlikely to appear
  • Contexted method optimizes for independence
  • No FALSE ⟹ no nogoods ⟹ nothing to solve.

Bet: Worst case 2ⁿ reduces to k·2ᵐ where m << n
46
Ambiguity-enabled inference: Choice logic common
to all modules
If φ ∧ ψ ⊢ χ is a rule of inference, then so is
C1→φ ∧ C2→ψ ⊢ (C1∧C2)→χ
1. Substitution of equals for equals (e.g. for
LFG syntax): x=y ∧ φ ⊢ φ[x/y]
Therefore C1→(x=y) ∧ C2→φ ⊢ (C1∧C2)→φ[x/y]
2. Reasoning: Cause(x,y) ∧
Prevent(y,z) ⊢ Prevent(x,z). Therefore
C1→Cause(x,y) ∧ C2→Prevent(y,z) ⊢
(C1∧C2)→Prevent(x,z)
3. Log-linear disambiguation:
Prop1(x) ∧ Prop2(x) ⊢ Count(Feature_n)
Therefore C1→Prop1(x) ∧ C2→Prop2(x) ⊢
(C1∧C2)→Count(Feature_n)
Ambiguity-enabled components propagate choices;
they can defer choosing and enumerating.
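
A minimal Python sketch of this lifting, using the Cause/Prevent rule from item 2 (the fact encoding is invented for illustration): the lifted rule simply conjoins the contexts of its premises.

# Sketch: lifting an ordinary inference rule to its ambiguity-enabled form.
# A contexted fact is (context, fact); contexts are frozensets of choice labels.

def lift(rule):
    """Turn rule(fact1, fact2) -> fact into its contexted version."""
    def contexted_rule(cf1, cf2):
        (c1, f1), (c2, f2) = cf1, cf2
        result = rule(f1, f2)
        return (c1 | c2, result) if result is not None else None
    return contexted_rule

# The rule from the slide: Cause(x,y) and Prevent(y,z) entail Prevent(x,z)
def chain(f1, f2):
    if f1[0] == 'Cause' and f2[0] == 'Prevent' and f1[2] == f2[1]:
        return ('Prevent', f1[1], f2[2])
    return None

contexted_chain = lift(chain)
print(contexted_chain(
    (frozenset({'a1'}), ('Cause', 'x', 'y')),
    (frozenset({'b2'}), ('Prevent', 'y', 'z'))))
# (frozenset({'a1', 'b2'}), ('Prevent', 'x', 'z'))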
47
Summary: Contexted constraint satisfaction
  • Packed
  • facts not duplicated
  • facts not hidden in Boolean structure
  • Efficient
  • deductions not duplicated
  • fast fact processing (e.g. equality) can prune
    slow disjunctive processing
  • optimized for independence
  • General and simple
  • applies to any deductive system, uniform across
    modules
  • not limited to special-case disjunctions
  • mathematically trivial
  • Compositional free-choice system
  • enumeration of (exponentially many?) valid
    solutions deferred across module boundaries
  • enables backtrack-free, linear-time, on-demand
    enumeration
  • enables packed refinement by cross-module
    constraints: new nogoods

48
The remaining exponential
  • Contexted constraint satisfaction (typically)
    avoids the Boolean explosion in solving
    f-structure constraints for single trees
  • How can we suppress tree enumeration?
  • (and still determine satisfiability)

49
Ordering strategy: Easy things first
  • Do all c-structure before any f-structure
    processing
  • Chart is a free choice representation, guarantees
    valid trees
  • Only produce/solve f-structure constraints for
    constituents in complete, well-formed trees
  • NB: Interleaved, bottom-up pruning is a bad
    idea
  • Bets on inconsistency, not independence

50
Asking the right question
  • How can we make it faster?
  • More efficient unifier: undoable operations,
    better indexing, clever data structures,
    compiling.
  • Reordering for more effective pruning.
  • Why not cubic?
  • Intuitively, the problem isn't that hard.
  • GPSG: Natural language is nearly context-free.
  • Surely for context-free equivalent grammars!

51
No f-structure filtering, no nogoods...
but still explosive
  • LFG grammar for a context-free language

52
Disjunctive lazy copy
  • Pack functional information from alternative
    local subtrees.
  • Unpack/copy to higher consumers only on demand.

[Diagram: left and right chains of S nodes over alternative subtrees, each contributing packed local features (p=f1, q=f2, r=f3 on one side; p=f6, q=f5, r=f4 on the other). The annotation (↑ L)=↓ on S doesn't access internal ↓ features, so nothing needs to be copied up.]

Automatically takes advantage of context-freeness,
without grammar analysis or compilation
53
The XLE wager
  • Most feature dependencies are restricted to local
    subtrees
  • mother/daughter/sister interactions
  • maybe a grandmother now and then
  • very rarely span an unbounded distance
  • Optimize for local case
  • bounded computation per subtree gives cubic curve
  • graceful degradation with non-local interactions
    but still correct

54
Packing Equalities in F-structure
"Visiting relatives is boring": under choice A1 the subject is the
gerund reading of "visiting relatives", (↑ NUM)=sg; under choice A2
it is the plural NP reading ("visiting" as adjective), (↑ NUM)=pl;
"is" contributes (↑ SUBJ NUM)=sg in the True context.

  A1 → (↑ SUBJ NUM)=sg    T ∧ A1 ⟹ sg=sg
  A2 → (↑ SUBJ NUM)=pl    T ∧ A2 ⟹ sg=pl ⟹ nogood(A2)
  T  → (↑ SUBJ NUM)=sg
55
(No Transcript)
56
[Figure: packed f-structure with choices a1 and a2]
57
XLE Performance: HomeCentre Corpus
About 1100 English sentences
58
Time is linear in subtrees: Nearly cubic
R² = .79
2.1 ms/subtree
59
French HomeCentre
R² = .80
3.3 ms/subtree
60
German HomeCentre
R² = .44
3.8 ms/subtree
61
Generation with LFG/XLE
  • Parse: string → c-structure → f-structure
  • Generate: f-structure → c-structure → string
  • Same grammar: shared development, maintenance
  • Formal criterion: s ∈ Gen(Parse(s))
  • Practical criterion: don't generate everything
  • Parsing robustness ⟹ undesired strings, needless
    ambiguity
  • Use optimality marks to restrict generation
    grammar
  • Restricted (un)tokenizing transducer: don't
    allow arbitrary white space, etc.

62
Mathematics and Computation
  • Formal properties
  • Gen(f) is a (possibly infinite) set
  • Equality is idempotent: x=y ∧ x=y ⟺ x=y
  • Longer strings with redundant equations map to
    same f-structure
  • What kind of set?
  • Context-free language (Kaplan & Wedekind,
    2000)

63
Computation
  • XLE/LFG generation
  • Convert LFG grammar to a CFG only for strings that
    map to f
  • NP-complete, ambiguity managed (as usual)
  • All strings in the CFL are grammatical w.r.t. the LFG
    grammar
  • Composition with regular relations is crucial
  • CFG is a packed, free-choice representation of
    all strings
  • Can use ordinary CF generation algorithms to
    enumerate strings
  • Can defer enumeration, give the CFG to the client to
    enumerate
  • Can apply other context-free technology
  • Choose shortest string (see the sketch below)
  • Reduce to finite set of unpumped strings (context-
    free Pumping Lemma)
  • Choose most probable (for fluency, not
    grammaticality)

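"Choose shortest string" is a straightforward fixed-point computation over the CFG. A Python sketch, with a toy grammar standing in for the packed generation grammar XLE would produce:

# Sketch: picking a shortest string from a CFG.
# Grammar: nonterminal -> list of alternative right-hand sides;
# terminals are plain strings, nonterminals are keys of the dict.

def shortest_string(grammar, start, max_rounds=100):
    best = {}                              # nonterminal -> shortest known string
    for _ in range(max_rounds):            # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                # usable only if every symbol is a terminal or already solved
                if all(sym in best or sym not in grammar for sym in rhs):
                    s = ' '.join(best.get(sym, sym) for sym in rhs)
                    if nt not in best or len(s) < len(best[nt]):
                        best[nt] = s
                        changed = True
        if not changed:
            break
    return best.get(start)

toy = {
    'S':  [['NP', 'VP']],
    'NP': [['John'], ['the', 'boy']],
    'VP': [['left'], ['left', 'NP']],
}
print(shortest_string(toy, 'S'))   # "John left"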
64
Generating from incomplete f-structures
  • Grammatical features can't be read from:
  • Back-end question-answering logic
  • F-structure translated from another language
  • Generating from a bounded underspecification of a
    complete f-structure is still context-free
  • Example: a skeleton of predicates
  • Proof: CFLs are closed under union; bounded
    extensions produce finite alternatives
  • Generation from arbitrary underspecification is
    undecidable
  • Reduces to undecidable emptiness problem (cf.
    Hilbert's 10th) (Dymetman, van Noord,
    Wedekind, Roach)

65
A (light-weight?) approach to QA
Analyze the question; anticipate and search for
possible answer phrases
Question → F-structure → Queries
  • Question: What is the graph partitioning problem?
  • Generated query: "The graph partitioning
    problem is"
  • Answer (Google): The graph partitioning problem
    is defined as dividing a graph into disjoint
    subsets of nodes
  • Question: When were the Rolling Stones formed?
  • Generated queries: "The Rolling Stones were
    formed", "formed the Rolling Stones"
  • Answer (Google): Mick Jagger, Keith Richards,
    Brian Jones, Bill Wyman, and Charlie Watts
    formed the Rolling Stones in 1962.

66
Pipeline for Answer Anticipation
[Diagram: Question → Parser (English grammar) → Question f-structures → Convert → Answer f-structures → Generator (same English grammar) → Answer Phrases → Search (Google...).]
67
Grammar engineering: The Parallel Grammar
Project
68
Pargram project
  • Large-scale LFG grammars for several languages
  • English, German, Japanese, French, Norwegian
  • Coming along: Korean, Urdu, Chinese, Arabic,
    Welsh, Malagasy, Danish
  • Intuition + Corpus: Cover real uses of
    language (newspapers, documents, etc.)
  • Parallelism: test LFG universality claims
  • Common c- to f-structure mapping conventions
  • (unless typologically motivated variation)
  • Similar underlying f-structures
  • Permits shared disambiguation properties, Glue
    interpretation premises
  • Practical: all grammars run on XLE software
  • International consortium of world-class linguists
  • PARC, Stuttgart, Fuji Xerox, Konstanz, Bergen,
    Copenhagen, Oxford, Dublin City
    University, PIEAS
  • Full-week meetings, twice a year
  • Contributions to linguistics and comp-ling:
    books and papers
  • Each group is self-funded, self-managed

69
Pargram goals
  • Practical
  • Create grammatical resources for NL applications
  • translation, question answering, information
    retrieval, ...
  • Develop discipline of grammar engineering
  • what tools, techniques, conventions make it easy
    to develop and maintain broad-coverage grammars?
  • how long does it take?
  • how much does it cost?
  • Theoretical
  • Refine and guide LFG theory through broad
    coverage of multiple languages
  • Refine and guide XLE algorithms and implementation

70
Parallel f-structures (where possible)
71
but different c-structures
72
Pargram grammars

             German   English   French   Japanese (Korean)
  Rules         251       388      180        56
  States      3,239    13,655    3,422       368
  Disjuncts  13,294    55,725   16,938     2,012

English allows for shallow markup: labeled
bracketing, named entities
73
Why Norwegian and Japanese?
  • Engineering assessment, given a mature system and
    parallel grammar specs.
  • How hard is it?
  • Norwegian: best case
  • Well-trained LFG linguists
  • Users of previous PARC software
  • Closely related to existing Pargram languages
  • Japanese: worst case
  • One computer scientist, one traditional Japanese
    linguist; no LFG experience
  • Typologically different language
  • Character sets, typographical conventions
  • Conclusion: not that hard
  • For both languages: good coverage, accuracy in
    2 person-years

74
Engineering results
  • Grammars and Lexicons
  • Grammar writer's cookbook (Butt et al., 1999)
  • New practical formal devices
  • Complex categories for efficiency: NP[nom]
    vs. NP with (↑ CASE)=NOM
  • Optimality marks for robustness
  • enlarge grammar without being overrun by
    peculiar analyses
  • Lexical priority: merging different lexicons
  • Integration of off-the-shelf morphology
  • From Inxight, based on earlier PARC research,
    and Kyoto

75
Accuracy and coverage
(Riezler et al., 2002)
  • WSJ F-scores for English Pargram grammar
  • Produces dependencies, not labeled trees
  • Stochastic model trained on sections 2-22
  • Tested on dependencies for 700 sentences in
    section 23
  • Robustness: some output for every input

                  Full parses (74.7%)   Fragments (25.3%)
  Best                  88.5                 76.7
  Most probable         82.5                 69
  Random                78.4                 67.7

(Named entities seem to bump these by about 3 points)
76
Meridian will pay a premium of $30.5 million to
assume $2 billion in deposits.
  • mood(pay0, indicative)
  • tense(pay0, fut)
  • adjunct(pay0, assume7)
  • obj(pay0, premium3)
  • stmt_type(pay0, declarative)
  • subj(pay0, Meridian5)
  • det_type(premium3, indef)
  • adjunct(premium3, of23)
  • num(premium3, sg)
  • pers(premium3, 3)
  • adjunct(million4, 30.528)
  • number_type(million4, cardinal)
  • num(Meridian5, sg)
  • pers(Meridian5, 3)
  • obj(assume7, 9)
  • stmt_type(assume7, purpose)
  • subj(assume7, pro8)
  • number(9, billion17)
  • adjunct(9, in11)
  • num(9, pl)
  • pers(9, 3)
  • adjunct_type(in11, nominal)
  • obj(in11, deposit12)
  • num(deposit12, pl)
  • pers(deposit12, 3)
  • adjunct(billion17, 219)
  • number_type(billion17, cardinal)
  • number_type(219, cardinal)
  • obj(of23, 24)
  • number(24, million4)
  • num(24, pl)
  • pers(24, 3)
  • number_type(30.528, cardinal)
77
Accuracy and coverage
  • Japanese Pargram grammar
  • 97% coverage on large corpora
  • 10,000 newspaper sentences (EDR)
  • 460 copier manual sentences
  • 9,637 customer-relations sentences
  • F-scores against 200 hand-annotated sentences
    from newspaper corpus
  • Best: 87
  • Average: 80
  • Recall: Grammar constructed with 2
    person-years of effort
  • (compare: Effort to create an annotated
    training corpus)

78
Robustness: Some output for every input
79
Sources of Brittleness
  • Vocabulary problems
  • Gaps in coverage, neologisms, terminology
  • Incorrect entries, missing frames
  • Missing constructions
  • No theoretical guidance (or interest)
  • (e.g. dates, company names)
  • Core constructions overlooked
  • Intuition and corpus both limited
  • Ungrammatical input
  • Real-world text is not perfect
  • Sometimes it's horrendous
  • Strict performance limits (XLE parameters)

80
Real world input
  • Other weak blue-chip issues included Chevron,
    which went down 2 to 64 7/8 in Big Board
    composite trading of 1.3 million shares Goodyear
    Tire Rubber, off 1 1/2 to 46 3/4, and American
    Express, down 3/4 to 37 1/4.
  • (WSJ, section 13)
  • The croaker's done gone from the hook
  • (WSJ, section 13)
  • (SOLUTION 27000 20) Without tag P-248 the W7F3
    fuse is located in the rear of the machine by the
    charge power supply (PL3 C14 item 15.
  • (Copier repair tip)

81
LFG entries from Finite-State Morphologies
  • Broad-coverage inflectional transducers
  • falls → fall +Noun +Pl
  •         fall +Verb +Pres +3sg
  • Mary → Mary +Prop +Giv +Fem +Sg
  • vienne → venir +SubjP +SG +P1P3 +Verb
  • For listed words, transducer provides
  • canonical stem form
  • inflectional information

82
On-the-fly LFG entries
  • -unknown head-word matches unrecognized stems
  • Grammar writer defines -unknown and affixes
  • -unknown  N  (↑ PRED)='stem'
                (↑ NTYPE)=common
             V  (↑ PRED)='stem<SUBJ,OBJ>'  (transitive)
  • +Noun  N-AFX  (↑ PERS)=3
  • +Pl    N-AFX  (↑ NUM)=pl
  • +Pres  V-AFX  (↑ TENSE)=present
  • +3sg   V-AFX  (↑ SUBJ PERS)=3  (↑ SUBJ NUM)=sg
  • Pieces assembled by sublexical rules
  • NOUN → N N-AFX
  • VERB → V V-AFX
83
Guessing for unlisted words
  • Use FST guesser for general patterns
  • Capitalized words can be proper nouns
  • Saakashvili → Saakashvili +Noun +Proper +Guessed
  • -ed words can be past-tense verbs or adjectives
  • fumped → fump +Verb +Past +Guessed
  •          fumped +Adj +Deverbal +Guessed
  • Languages with richer morphology allow better
    guessers

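A toy Python sketch of such a guesser, with regular expressions standing in for the finite-state transducer (the patterns and tag strings are illustrative only, not the actual guesser's):

# Toy guesser: map unlisted word shapes to candidate analyses.
import re

GUESS_PATTERNS = [
    (re.compile(r'^[A-Z][a-z]+$'),   # capitalized word: maybe a proper noun
     ['+Noun +Proper +Guessed']),
    (re.compile(r'^[a-z]+ed$'),      # -ed word: past verb or deverbal adj
     ['+Verb +Past +Guessed', '+Adj +Deverbal +Guessed']),
]

def guess(word):
    analyses = []
    for pattern, tags in GUESS_PATTERNS:
        if pattern.match(word):
            stem = word[:-2] if word.endswith('ed') else word
            # verbs get the stripped stem; other analyses keep the full form
            analyses += [f'{stem if "Verb" in t else word} {t}' for t in tags]
    return analyses

print(guess('Saakashvili'))  # ['Saakashvili +Noun +Proper +Guessed']
print(guess('fumped'))       # ['fump +Verb +Past +Guessed',
                             #  'fumped +Adj +Deverbal +Guessed']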
84
Subcategorization and Argument Mapping?
  • Transitive, intransitive, inchoative
  • Not related to inflection
  • Can't be inferred from shallow data
  • Fill in gaps from external sources
  • Machine-readable dictionaries
  • Other resources: VerbNet, WordNet, FrameNet, Cyc
  • Not always easy, not always reliable
  • Current research

85
Grammatical failures
  • Fall-back approach
  • First try to get a complete analysis
  • Prefer standard rules, but
  • Allow for anticipated errors
  • E.g. subject/verb disagree, but interpretation is
    obvious
  • Optimality-theory marks to prefer standard
    analyses
  • If fail, enlarge grammar, try again
  • Build up fragments that get complete sub-parses
    (c-structure and f-structure)
  • Allow tokens that can't be chunked
  • Link chunks and tokens in a single f-structure

86
Fall-back grammar for fragments
  • Grammar writer specifies REPARSECAT
  • Alternative c-structure root if no complete parse
  • Allows for fragments and linking
  • Grammar writer specifies possible chunks
  • Categories (e.g. S, NP, VP but not N, V)
  • Looser expansions
  • Optimality theory
  • Grammar writer specifies marks to
  • Prefer standard rules over anticipated errors
  • Prefer parse with fewest chunks
  • Disprefer using tokens over chunks

87
Example
  • The the dog appears.
  • Analyzed as
  • token: "the"
  • sentence: "the dog appears"

88
C-structure
89
F-structure
  • Many chunks have useful analyses
  • XLE/LFG degrades to shallow parsing in worst
    case

90
Robustness summary
  • External resources for incomplete lexical entries
  • Morphologies, guessers, taggers
  • Current work: VerbNet, WordNet, FrameNet, Cyc
  • Order by reliability
  • Fall-back techniques for missing constructions
  • Dispreferred rules
  • Fragment grammar
  • Current WSJ evaluation
  • 100% coverage, 85% full parses
  • F-score (esp. recall) declines for fragment parses

91
Brief demo
92
Stochastic disambiguation: When you have to
choose
93
Finding the most probable parse
  • XLE produces many candidates
  • All valid (with respect to grammar and OT marks)
  • Not all equally likely
  • Some applications are ambiguity enabled (defer
    selection)
  • But some require a single best guess
  • Grammar writers have only coarse preference
    intuitions
  • Many implicit properties of words and structures
    with unclear significance
  • Appeal to probability model to choose best parse
  • Assume previous experience is a good guide for
    future decisions
  • Collect corpus of training sentences
  • Build probability model that optimizes for
    previous good results
  • Apply model to choose best analysis of new
    sentences

94
Issues
  • What kind of probability model?
  • What kind of training data?
  • Efficiency of training, disambiguation?
  • Benefit vs. random choice of parse?
  • Random is awful for treebank grammars
  • Hard LFG constraints restrict to plausible
    candidates

95
Probability model
  • Conventional models: stochastic branching
    processes
  • Hidden Markov models
  • Probabilistic Context-Free Grammars
  • Sequence of decisions, each independent of
    previous decisions, each choice having a certain
    probability
  • HMM: Choose from outgoing arcs at a given state
  • PCFG: Choose from alternative expansions of a
    given category
  • Probability of an analysis = product of choice
    probabilities
  • Efficient algorithms
  • Training: forward/backward, inside/outside
  • Disambiguation: Viterbi
  • Abney 1997 and others: Not appropriate for LFG,
    HPSG
  • Choices are not independent: Information from
    different CFG branches interacts through
    f-structure
  • Relative-frequency estimator is inconsistent

96
Exponential models are appropriate
(aka Log-linear models)
  • Assign probabilities to representations, not to
    choices in a derivation
  • No independence assumption
  • Arithmetic combined with human insight
  • Human
  • Define properties of representations that may be
    relevant
  • Based on any computable configuration of
    f-structure features, trees
  • Arithmetic
  • Train to figure out the weight of each property

97
Stochastic Disambiguation in XLE: All parses →
Most probable
  • Discriminative ranking
  • Conditional log-linear model on c/f-structure
    pairs
  • Probability of parse x for string s:
    P(x | s) = exp(λ·f(x)) / Z(s), where
  • f is a vector of feature values for x
  • λ is a vector of feature weights
  • Z is the normalizer over all parses of s
  • Discriminative estimation of λ from partially
    labeled data (Riezler et al., ACL'02)
  • Combined l1-regularization and feature selection
  • Avoid over-fitting, choose best features
    (Riezler & Vasserman, EMNLP'04)

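A minimal Python sketch of this conditional model (the parses, properties, and weights here are hypothetical; XLE's property detectors and trained weights are far richer):

# Sketch of conditional log-linear parse ranking:
# P(x | s) = exp(lambda . f(x)) / Z(s), Z(s) summing over all parses of s.
import math

def rank(parses, weights):
    """parses: list of property-count dicts, one per parse of the sentence.
    weights: property name -> learned weight (lambda)."""
    scores = [sum(weights.get(prop, 0.0) * count
                  for prop, count in parse.items())
              for parse in parses]
    z = sum(math.exp(s) for s in scores)           # normalizer Z(s)
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(parses)), key=lambda i: probs[i])
    return best, probs

# Two hypothetical parses differing in attachment properties:
parses = [{'cs_right_branch': 3, 'fs_attrs ADJUNCT': 1},
          {'cs_right_branch': 1, 'fs_attrs ADJUNCT': 2}]
weights = {'cs_right_branch': -0.0266, 'fs_attrs ADJUNCT': -0.181}
best, probs = rank(parses, weights)
print(best, [round(p, 3) for p in probs])   # 0 [0.532, 0.468]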
98
Coarse training data for XLE
  • Correct parses are consistent with weak
    annotation

Considering/VBG (NP the naggings of a culture
imperative), (NP-SBJ I) promptly signed/VBD
up.
Sufficient for disambiguation, not for grammar
induction
99
Classes of properties
  • C-structure nodes and subtrees
  • indicating certain attachment preferences
  • Recursively embedded phrases
  • indicating high vs. low attachment
  • F-structure attributes
  • presence of grammatical functions
  • Atomic attribute-value pairs in f-structure
  • particular feature values
  • Left/right branching behavior of c-structures
  • (Non)parallelism of coordinations in c- and
    f-structures
  • Lexical elements
  • tuples of head words, argument words, grammatical
    relations

60,000 candidate properties, 1000 selected
100
Some properties and weights
   0.937481    cs_embedded VPvpass 1
  -0.126697    cs_embedded VPvperf 3
  -0.0204844   cs_embedded VPvperf 2
  -0.0265543   cs_right_branch
  -0.986274    cs_conj_nonpar 5
  -0.536944    cs_conj_nonpar 4
  -0.0561876   cs_conj_nonpar 3
   0.373382    cs_label ADVPint
  -1.20711     cs_label ADVPvp
  -0.57614     cs_label APattr
  -0.139274    cs_adjacent_label DATEP PP
  -1.25583     cs_adjacent_label MEASUREP PPnp
  -0.35766     cs_adjacent_label NPadj PP
  -0.00651106  fs_attrs 1 OBL-COMPAR
   0.454177    fs_attrs 1 OBL-PART
  -0.180969    fs_attrs 1 ADJUNCT
   0.285577    fs_attr_val DET-FORM the
   0.508962    fs_attr_val DET-FORM this
   0.285577    fs_attr_val DET-TYPE def
   0.217335    fs_attr_val DET-TYPE demon
   0.278342    lex_subcat achieve OBJ,SUBJ,VTYPE SUBJ,OBL-AG,PASSIVE
   0.00735123  lex_subcat acknowledge COMP-EX,SUBJ,VTYPE
101
Efficiency
  • Property counts
  • Associated with AND/OR tree of XLE contexts (a1,
    b2)
  • Detectors may add new nodes to tree: conjoined
    contexts
  • Shared among many parses
  • Training
  • Dynamic programming algorithm applied to AND/OR
    tree
  • Avoids unpacking of individual parses
    (Miyao and Tsujii, HLT'02)
  • Similar to inside/outside algorithm of PCFG
  • Fast algorithm for choosing best properties
  • Can train only on sentences with relatively
    low ambiguity
  • Shorter, perhaps easier to annotate
  • 5 hours to train over WSJ (given file of parses)
  • Disambiguation
  • Viterbi algorithm applied to Boolean tree (see the
    sketch below)
  • 5% of parse time to disambiguate
  • 30% gain in F-score from random-parse baseline

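A sketch of Viterbi selection over a packed AND/OR tree (the tree encoding is invented for illustration; XLE's contexts are shared Boolean labels rather than an explicit tree): OR nodes take the best-scoring alternative, AND nodes sum their independent parts.

# Sketch: Viterbi disambiguation over a packed AND/OR choice tree,
# without unpacking. Leaves carry log-linear property scores.

def viterbi(node):
    """Return (best_score, chosen_leaves) for a packed node."""
    kind = node[0]
    if kind == 'leaf':                       # ('leaf', name, score)
        return node[2], [node[1]]
    if kind == 'and':                        # ('and', child, child, ...)
        score, leaves = 0.0, []
        for child in node[1:]:
            s, l = viterbi(child)
            score += s
            leaves += l
        return score, leaves
    if kind == 'or':                         # ('or', child, child, ...)
        return max((viterbi(child) for child in node[1:]),
                   key=lambda pair: pair[0])

packed = ('and',
          ('or', ('leaf', 'high-attach', -1.2), ('leaf', 'low-attach', -0.4)),
          ('or', ('leaf', 'sg-sheep', -0.1), ('leaf', 'pl-sheep', -0.7)))
print(viterbi(packed))   # (-0.5, ['low-attach', 'sg-sheep'])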
102
Integrating Shallow Markup: Part-of-speech
tags, Named entities, Syntactic brackets
103
Shallow mark-up of input strings
  • Part-of-speech tags (tagger?)
  • I/PRP saw/VBD her/PRP duck/VB.
  • I/PRP saw/VBD her/PRP duck/NN.
  • Named entities (named-entity recognizer)
  • <person>General Mills</person> bought it.
  • <company>General Mills</company> bought it.
  • Syntactic brackets (chunk parser?)
  • [NP-S I] saw [NP-O the girl with the
    telescope].
  • [NP-S I] saw [NP-O the girl] with the
    telescope.

104
Hypothesis
  • Shallow mark-up
  • Reduces ambiguity
  • Increases speed
  • Without decreasing accuracy
  • (Helps development)
  • Issues
  • Markup errors may eliminate correct analyses
  • Markup process may be slow
  • Markup may interfere with existing robustness
    mechanisms (optimality, fragments, guessers)
  • Backoff may restore robustness but decrease speed
    in 2-pass system

105
Implementation in XLE
Integration with minimal changes to existing
system/grammar
106
Experimental Results: PARC 700
(paired numbers are full parses / all parses)

              Full parses %   Optimal solns   Best F-sc   Time
  Unmarked         76           482/1753        82/79     65/100
  Named ent        78           263/1477        86/84     60/91
  POS tag          62           248/1916        76/72     40/48
  Lab brk          65           158/774         85/79     19/31
107
Comparison: Shallow vs. Deep parsing
(HLT, 2004)
  • Popular myth
  • Shallow statistical parsers are fast, robust and
    useful
  • Deep grammar-based parsers are slow and brittle
  • Is this true? Comparison on predicate-argument
    relations, not phrase trees
  • Needed for meaning-sensitive applications
    (usefulness)
  • (translation, question answering, but maybe not
    IR)
  • Collins (1999) parser: state-of-the-art, marks
    arguments
  • (for fair test, wrote special code to make
    relations explicit; not so easy)
  • LFG/XLE with morphology, named entities,
    disambiguation
  • Measured time, accuracy against PARC 700 Gold
    Standard
  • Results
  • Collins is somewhat faster than LFG/XLE
  • LFG/XLE makes somewhat fewer errors, provides
    more useful detail
108
XLE System
  • Parser/generator for LFG grammars, multilingual
  • Composition with finite-state transductions
  • Careful ambiguity-management implementation
  • Preserves context-free locality in equational
    disjunctions
  • Exports ambiguity-enabling interfaces
  • Efficient implementation of clause conjunction
    (C1∧C2)
  • Log-linear disambiguation
  • Appropriate for LFG representations
  • Ambiguity-enabled theory and implementation
  • Robustness: shallow in the worst case
  • Scales to broad-coverage grammars, long sentences
  • Semantic interface: Glue

109
LFG/XLE: Current issues
  • Induction of LFG grammars from treebanks
  • Basic work in ParGram: Dublin City University
  • Principles of generalization, for human
    extension, combination with manual grammar
  • DCU + PARC
  • Large grammars for more language typologies
  • E.g. verb-initial: Welsh, Malagasy, Arabic
  • Reduce performance variance: why not linear?
  • Competence vs. performance: limit center
    embedding?
  • Investigate speed/accuracy trade-off
  • Embedding in applications: XLE as a black box
  • Question answering(!), Translation, Sentence
    condensation
  • Develop, combine with other ambiguity-enabled
    modules
  • Reasoning, transfer-rewriting

110
Matching for Question Answering
[Diagram: the question and the answer sources are each parsed with the same English grammar to f-structures and then to semantics; an overlap detector compares the two semantic representations.]
111
Glue Semantics
112
Logical and Collocational Semantics
  • Logical semantics
  • Map sentences to logical representations of
    meaning
  • Enables inference and reasoning
  • Collocational semantics
  • Represent word meanings as feature vectors
  • Typically obtained by statistical corpus analysis
  • Good for indexing, classification, language
    modeling, word-sense disambiguation
  • Currently does not enable inference
  • Complementary, not conflicting, approaches

113
Example Semantic Representation
"The wire broke"
  • F-structure gives basic predicate-argument
    structure,
  • but lacks:
  • Standard logical machinery (variables,
    connectives, etc.)
  • Implicit arguments (events, causes)
  • Contextual dependencies (the wire = part25)
  • Mapping from f-structure to logical form is
    systematic,
  • but non-trivial

114
Glue Semantics (Dalrymple, Lamping & Saraswat,
1993 and subsequently)
  • Syntax-semantics mapping as linear-logic
    inference
  • Two logics in semantics
  • Meaning Logic (target semantic representation):
    any suitable semantic representation
  • Glue Logic (deductively assembles target
    meaning): a fragment of linear logic
  • Syntactic analysis produces lexical glue premises
  • Semantic interpretation uses deduction to
    assemble final meaning from these premises

115
Linear Logic
  • Influential development in theoretical computer
    science (Girard '87)
  • Premises are resources, consumed in inference
    (Traditional logic: premises are non-resourced)

  Traditional                       Linear
  A, A→B ⊢ B                        A, A⊸B ⊢ B       (A consumed)
  A, A→B ⊢ A∧B    (A re-used)       A, A⊸B ⊬ A⊗B
  A, B ⊢ B        (A discarded)     A, B ⊬ B         (cannot discard A)

  • Linguistic processing is typically resource-
    sensitive
  • Words/meanings used exactly once

116
Glue Interpretation (Outline)
  • Parsing a sentence instantiates lexical entries to
    produce lexical glue premises
  • Example lexical premise (verb "saw" in "John saw
    Fred"):

  see : g ⊸ (h ⊸ f)

"see" is the meaning term (a 2-place predicate); the glue formula
says: consume the meanings of g and h to produce the meaning of f
(g, h, f are constituents in the parse).
  • Glue derivation: Γ ⊢ M : f
  • Consume all lexical premises Γ,
  • to produce meaning, M, for the entire sentence, f

117
Glue Interpretation: Getting the premises
Lexicon:
  John  NP  john : ↑
  Fred  NP  fred : ↑
  saw   V   see : (↑ SUBJ) ⊸ ((↑ OBJ) ⊸ ↑)
Instantiated premises:
  john : g
  fred : h
  see : g ⊸ (h ⊸ f)
118
Glue Interpretation: Deduction with premises
Premises:
  john : g
  fred : h
  see : g ⊸ (h ⊸ f)
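
A small Python sketch of this resource-sensitive deduction (the encoding is invented for illustration; real glue provers also handle higher-order premises and packed ambiguity): each premise is consumed exactly once, and applying see : g -o (h -o f) to john : g and then fred : h yields the curried meaning see(john)(fred), i.e. see(john, fred), at f.

# Premises for "John saw Fred": (meaning, glue formula);
# implications A -o B are nested pairs (A, B), atoms are strings.
premises = [
    ('john', 'g'),
    ('fred', 'h'),
    ('see', ('g', ('h', 'f'))),   # see : g -o (h -o f)
]

def derive(premises):
    """Repeatedly apply modus ponens; both premises are consumed each time."""
    agenda = list(premises)
    changed = True
    while changed:
        changed = False
        for fn in list(agenda):
            if not isinstance(fn[1], tuple):
                continue                       # only function premises A -o B
            for arg in list(agenda):
                if arg is fn or isinstance(arg[1], tuple):
                    continue                   # atomic arguments only (enough here)
                if fn[1][0] == arg[1]:         # argument A matches A -o B
                    agenda.remove(fn)          # both resources are consumed...
                    agenda.remove(arg)
                    agenda.append((f'{fn[0]}({arg[0]})', fn[1][1]))
                    changed = True             # ...and the conclusion B is added
                    break
            if changed:
                break
    return agenda

print(derive(premises))   # [('see(john)(fred)', 'f')]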
119
Modus Ponens = Function Application:
The Curry-Howard Isomorphism
The Curry-Howard Isomorphism pairs LL inference
rules with operations on meaning terms.
Propositional linear-logic inference constructs
meanings; LL inference is completely independent
of the meaning language (modularity of meaning
representation)
120
Semantic Ambiguity: Multiple derivations from a
single set of premises
Premises:
  criminal : f
  alleged : f ⊸ f
  from-London : f ⊸ f
Two distinct derivations:
  1. from-London(alleged(criminal))
  2. alleged(from-London(criminal))
121
Semantic Ambiguity: Modifiers
  • Multiple derivations from a single premise set
  • Arises through different ways of permuting
    modifiers around a skeleton
  • Modifiers given formal representation in glue as
    X ⊸ X logical identities
  • E.g. an adjective is a noun ⊸ noun modifier
  • Modifiers are prevalent in natural language, and lead
    to combinatorial explosion
  • Given N f ⊸ f modifiers, N! ways of
    permuting them around the f skeleton

122
Ambiguity management in semantics
  • Efficient theorem provers that manage
    combinatorial explosion of modifiers
  • Packing of N! analyses
  • Represent all N! analyses in polynomial space
  • Compute representation in polynomial time
  • Free choice: Read off any given analysis in
    linear time
  • Packing through structure re-use
  • N! analyses through combinations of N
    sub-analyses
  • Compute each sub-analysis once, and re-use

123
PARC Linguistic Environment:
Multidimensional Architecture
[Diagram: linguistic layers (FS Morphology, LFG Syntax, Glue Semantics) crossed with operations (Parse, Generate, Select, Transfer, Interpret) and with levels of realization (Theory, Software, "Tableware": mathematics, algorithms, programs, data structures, models and parameters), across languages: English, French, German, Japanese, Urdu, Norwegian.]
124
(No Transcript)