Linguistics 187/287 Week 8 - PowerPoint PPT Presentation

1
Linguistics 187/287 Week 8
Generation, Rewriting, and Applications
  • Ron Kaplan and Tracy King

2
Generation
  • Parsing: string to analysis
  • Generation: analysis to string
  • What type of input?
  • How to generate

3
Why generate?
  • Machine translation
  • Lang1 string -> Lang1 f-str -> Lang2 f-str -> Lang2
    string
  • Sentence condensation
  • Long string -> f-str -> smaller f-str -> new string
  • Question answering
  • Production of NL reports
  • State of machine or process
  • Explanation of logical deduction
  • Grammar debugging

4
F-structures as input
  • Use f-structures as input to the generator
  • May parse sentences that shouldn't be generated
  • May want to constrain number of generated options
  • Input f-structure may be underspecified

5
XLE generator
  • Use the same grammar for parsing and generation
  • Advantages
  • maintainability
  • write rules and lexicons once
  • But
  • special generation tokenizer
  • different OT ranking

6
Generation tokenizer/morphology
  • White space
  • Parsing: multiple white space becomes a single TB
  • John appears. -> John TB appears TB . TB
  • Generation: single TB becomes a single space (or
    nothing)
  • John TB appears TB . TB -> John appears.

  • John appears .
  • Suppress variant forms
  • Parse both favor and favour
  • Generate only one
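The white-space handling above can be sketched in a few lines of Python. This is an approximation under stated assumptions: words and single punctuation marks are split with a regular expression, and the function names are hypothetical. XLE itself does this with the finite-state tokenizers listed in its morphology configuration, not with code like this.

```python
import re

TB = "TB"  # token-boundary marker, written as on the slide


def tokenize_for_parsing(text: str) -> list[str]:
    """Parsing direction: any run of white space counts as one boundary.
    Words and single punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def with_boundaries(tokens: list[str]) -> str:
    """Render tokens the way the slide does: each one followed by TB."""
    return " ".join(f"{tok} {TB}" for tok in tokens)


def detokenize_for_generation(tokens: list[str]) -> str:
    """Generation direction: each TB becomes one space, or nothing before
    punctuation, so only the standard spacing is produced."""
    out = ""
    for tok in tokens:
        if out and re.fullmatch(r"\w+", tok):
            out += " "
        out += tok
    return out


tokens = tokenize_for_parsing("John   appears.")
print(with_boundaries(tokens))            # John TB appears TB . TB
print(detokenize_for_generation(tokens))  # John appears.
```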

7
Morphconfig for parsing generation
  • STANDARD ENGLISH MORPHOLOGY (1.0)
  • TOKENIZE:
  • P!eng.tok.parse.fst G!eng.tok.gen.fst
  • ANALYZE:
  • eng.infl-morph.fst G!amerbritfilter.fst
  • G!amergen.fst
  • ----
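A hypothetical Python reader for the configuration above, showing which transducers each direction would use. It assumes that a P! prefix marks a parse-only transducer, G! a generation-only one, and no prefix means both directions; it accepts section headers with or without a trailing colon. The function name is illustrative, not part of XLE.

```python
CONFIG = """STANDARD ENGLISH MORPHOLOGY (1.0)
TOKENIZE:
P!eng.tok.parse.fst G!eng.tok.gen.fst
ANALYZE:
eng.infl-morph.fst G!amerbritfilter.fst
G!amergen.fst
----"""

SECTIONS = ("TOKENIZE", "ANALYZE")


def transducers_for(config: str, direction: str) -> dict[str, list[str]]:
    """Return, per section, the transducers used for 'parse' or 'generate'.
    Assumption: 'P!' = parse only, 'G!' = generation only, no prefix = both."""
    if direction not in ("parse", "generate"):
        raise ValueError(direction)
    own, other = ("P!", "G!") if direction == "parse" else ("G!", "P!")
    result: dict[str, list[str]] = {}
    section = None
    for line in config.splitlines():
        line = line.strip()
        if line.startswith("----"):
            break  # end of the configuration
        if line.rstrip(":") in SECTIONS:
            section = line.rstrip(":")
            result[section] = []
        elif section is not None:
            for name in line.split():
                if not name.startswith(other):
                    result[section].append(name.removeprefix(own))
    return result


print(transducers_for(CONFIG, "parse"))
print(transducers_for(CONFIG, "generate"))
```

Under this reading, the G! filters run only when generating, which is one way to get the "parse both favor and favour, generate only one" behavior described on the previous slide.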

8
Reversing the parsing grammar
  • The parsing grammar can be used directly as a
    generator
  • Adapt the grammar with a special OT ranking
    GENOPTIMALITYORDER
  • Why do this?
  • parse ungrammatical input
  • have too many options

9
Ungrammatical input
  • Linguistically ungrammatical
  • They walks.
  • They ate banana.
  • Stylistically ungrammatical
  • No ending punctuation: They appear
  • Superfluous commas: John, and Mary appear.
  • Shallow markup: [NP John and Mary] appear.

10
Too many options
  • All the generated options can be linguistically
    valid, but too many for applications
  • Occurs when more than one string has the same,
    legitimate f-structure
  • PP placement
  • In the morning I left. I left in the morning.

11
Using the Gen OT ranking
  • Generally much simpler than in the parsing
    direction
  • Usually only use standard marks and NOGOOD
  • no * (ungrammatical) marks, no STOPPOINT
  • Can have a few marks that are shared by several
    constructions
  • one or two for dispreferred
  • one or two for preferred

12
Example Prefer initial PP
  • S --> (PP: @ADJUNCT @(OT-MARK GenGood))
          NP: @SUBJ;
          VP.
  • VP --> V
           (NP: @OBJ)
           (PP: @ADJUNCT).
  • GENOPTIMALITYORDER NOGOOD +GenGood.
  • parse they appear in the morning.
  • generate without OT: In the morning they appear.
                         They appear in the morning.
  • generate with OT: In the morning they appear.
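A deliberately simplified Python model of OT filtering, to make the effect of the generation ranking concrete. Assumptions: marks are compared lexicographically in rank order, NOGOOD removes a candidate outright, and a '+' prefix in the ordering marks a preference mark (more is better). It is not XLE's implementation, and the mark bookkeeping is hypothetical.

```python
from collections import Counter


def optimal(candidates: dict[str, list[str]], order: list[str]) -> list[str]:
    """Simplified OT selection.

    candidates maps each string to the OT marks its analysis carries.
    order lists marks from highest to lowest rank; '+Name' is a preference
    mark (more is better), a bare name is a dispreference mark (fewer is
    better). Any candidate carrying NOGOOD is discarded.
    """
    alive = {s: Counter(m) for s, m in candidates.items() if "NOGOOD" not in m}

    def rank_key(s: str) -> tuple[int, ...]:
        counts = alive[s]
        key = []
        for mark in order:
            if mark == "NOGOOD":
                continue
            if mark.startswith("+"):
                key.append(-counts[mark[1:]])  # more preference marks sort first
            else:
                key.append(counts[mark])  # fewer dispreference marks sort first
        return tuple(key)

    if not alive:
        return []
    best = min(rank_key(s) for s in alive)
    return [s for s in alive if rank_key(s) == best]


candidates = {
    "In the morning they appear.": ["GenGood"],  # initial PP carries GenGood
    "They appear in the morning.": [],
}
print(optimal(candidates, []))                       # both strings
print(optimal(candidates, ["NOGOOD", "+GenGood"]))  # only the initial-PP one
```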

13
Debugging the generator
  • When generating from an f-structure produced by
    the same grammar, XLE should always generate
  • Unless
  • OT marks block the only possible string
  • something is wrong with the tokenizer/morphology
  • regenerate-morphemes: if this gets a string,
    the tokenizer/morphology is not the problem
  • Hard to debug: XLE has robustness features to help

14
Underspecified Input
  • F-structures provided by applications are not
    perfect
  • may be missing features
  • may have extra features
  • may simply not match the grammar coverage
  • Missing and extra features are often systematic
  • specify in XLE which features can be added and
    deleted
  • Not matching the grammar is a more serious problem

15
Adding features
  • English to French translation
  • English nouns have no gender
  • French nouns need gender
  • Solution: have XLE add gender
  • the French morphology will control
    the value
  • Specify additions in xlerc
  • set-gen-adds add "GEND"
  • can add multiple features
  • set-gen-adds add "GEND CASE PCASE"
  • XLE will optionally insert the feature

Note: unconstrained additions make generation
undecidable.
16
Example
The cat sleeps. -> Le chat dort.
  • Transferred f-structure (no gender):
    PRED 'dormir<SUBJ>'
    SUBJ [PRED 'chat', NUM sg, SPEC def]
    TENSE present
  • After XLE adds GEND:
    PRED 'dormir<SUBJ>'
    SUBJ [PRED 'chat', NUM sg, GEND masc, SPEC def]
    TENSE present
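One way to picture what set-gen-adds add permits, as a hypothetical Python check (not XLE's algorithm): a candidate f-structure produced by the grammar counts as a realization of the input if it agrees with every input feature, and any extra features it carries are on the add list. F-structures are modeled as nested dicts; the grammar, not the check, decides the value of an added feature such as GEND.

```python
def compatible(inp: dict, cand: dict, adds: set[str]) -> bool:
    """True if cand agrees with every feature of inp and any extra
    features in cand (at any depth) are listed in adds."""
    for attr, value in inp.items():
        if attr not in cand:
            return False
        if isinstance(value, dict):
            if not isinstance(cand[attr], dict) or not compatible(value, cand[attr], adds):
                return False
        elif cand[attr] != value:
            return False
    return set(cand) - set(inp) <= adds


transferred = {"PRED": "dormir<SUBJ>", "TENSE": "present",
               "SUBJ": {"PRED": "chat", "NUM": "sg", "SPEC": "def"}}
french = {"PRED": "dormir<SUBJ>", "TENSE": "present",
          "SUBJ": {"PRED": "chat", "NUM": "sg", "SPEC": "def", "GEND": "masc"}}

print(compatible(transferred, french, set()))     # False: GEND is extra
print(compatible(transferred, french, {"GEND"}))  # True
```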
17
Deleting features
  • French to English translation
  • delete the GEND feature
  • Specify deletions in xlerc
  • set-gen-adds remove "GEND"
  • can remove multiple features
  • set-gen-adds remove "GEND CASE PCASE"
  • XLE obligatorily removes the features
  • no GEND feature will remain in the f-structure
  • if a feature takes an f-structure value, that
    f-structure is also removed
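A hypothetical Python model of set-gen-adds remove on nested-dict f-structures: every listed feature is deleted wherever it occurs, and when its value is an f-structure the whole sub-structure goes with it. The representation (dicts for f-structures, lists for sets) is an assumption of this sketch.

```python
def remove_features(fs: dict, features: set[str]) -> dict:
    """Return a copy of fs with every listed feature deleted at every level.
    A removed feature's f-structure value is simply never copied."""
    result = {}
    for attr, value in fs.items():
        if attr in features:
            continue  # obligatory: no such feature survives
        if isinstance(value, dict):
            value = remove_features(value, features)
        elif isinstance(value, list):  # a set value such as ADJUNCT
            value = [remove_features(v, features) if isinstance(v, dict) else v
                     for v in value]
        result[attr] = value
    return result


# A French-to-English transfer result that still carries French features.
transferred = {"PRED": "sleep<SUBJ>", "TENSE": "present",
               "SUBJ": {"PRED": "cat", "NUM": "sg", "GEND": "masc",
                        "CASE": "nom", "SPEC": "def"}}
print(remove_features(transferred, set("GEND CASE PCASE".split())))
```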

18
Changing values
  • If values of a feature do not match between the
    input f-structure and the grammar
  • delete the feature and then add it
  • Example case assignment in translation
  • set-gen-adds remove "CASE"
  • set-gen-adds add "CASE"
  • allows dative case in input to become accusative
  • e.g., exceptional case marking verb in input
    language but regular case in output language

19
Generation for Debugging
  • Checking for grammar and lexicon errors
  • create-generator english.lfg
  • reports ill-formed rules, templates, feature
    declarations, lexical entries
  • Checking for ill-formed sentences that can be
    parsed
  • parse a sentence
  • see if all the results are legitimate strings
  • regenerate they appear.

20
Generation: How it works
21
Design considerations
  • Use f-structures as input
  • Allow for a limited form of underspecification
  • produce strings for f-structures that are larger
    than the input
  • conservative extension of language-specific
    attributes is decidable (internal attributes)

22
Design considerations
  • Produce all possible realizations, without
    duplicating common subparts (packed output)
  • Process packed ambiguous input
  • operate on multiple meanings without duplication
  • detect ambiguity-preserving realizations
  • Share processing advantages with parser
  • context-free backbone
  • maintain packing throughout

23
Termination Issues
24
True and spurious cycles
  • True Cycles
  • in fact, the grammar may allow infinitely many
    realizations
  • finite representation of infinite realizations
    (CFG pumping)
  • Spurious Cycles
  • hard to distinguish from true cycles
  • can produce infinitely many hypotheses
  • input determines number of iterations

25
Spurious cycles
The grammar does not bound the cycle.
26
Approach generation as parsing
  • Quickly construct a context-free chart (guide)
    that includes at least all possible realizations
  • effectively transforming grammar on the fly, to
    specialize it to the particular input
  • LFG f-structure -> CFG
  • Refine the context-free chart to eliminate
    spurious cycles and handle true ones
  • Parse, to ensure constraints are satisfied

27
1. Build guide
[Diagram: guide chart for "a dog barks", with categories indexed by f-structure: S0, VP0, V0, NP4, N4, DET11]
28
1. Build guide: detect cycles
29
2. Refine: track resources
[Diagram: refined guide with categories S0, VP0, V0, NP4, N4, DET11 over the words a, fat, big, dog, barks]
30
3. Filter using the parser
  • The refined grammar still over-generates
  • constraints have not been applied yet (e.g. AGR)
  • f-structures must be compared against input
  • Invoke the parser on the packed hypotheses

The grammatical strings are the
desired realizations of the input!
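The build/refine/filter pipeline can be caricatured in Python as over-generation followed by a constraint check. This toy is hypothetical throughout: a two-entry lexicon, a fixed DET N V order standing in for the guide, and a NUM-agreement test standing in for the parser. It does not model cycle detection, resource tracking, or packing.

```python
from itertools import product

# Hypothetical lexicon: each PRED maps to (surface form, NUM) pairs.
LEXICON = {
    "dog": [("dog", "sg"), ("dogs", "pl")],
    "bark": [("barks", "sg"), ("bark", "pl")],
}
DETERMINER = ("a", "sg")


def guide(fs: dict) -> list[list[tuple[str, str]]]:
    """Stand-in for steps 1-2: all word sequences the input's PREDs allow,
    ignoring agreement, so the set over-generates."""
    nouns = LEXICON[fs["SUBJ"]["PRED"]]
    verbs = LEXICON[fs["PRED"]]
    return [[DETERMINER, noun, verb] for noun, verb in product(nouns, verbs)]


def passes_constraints(words: list[tuple[str, str]], fs: dict) -> bool:
    """Stand-in for step 3: keep a string only if every word's NUM agrees
    with the input f-structure (the AGR check)."""
    return all(num == fs["SUBJ"]["NUM"] for _, num in words)


def generate(fs: dict) -> list[str]:
    return [" ".join(form for form, _ in words)
            for words in guide(fs) if passes_constraints(words, fs)]


fs = {"PRED": "bark", "SUBJ": {"PRED": "dog", "NUM": "sg", "SPEC": "indef"}}
print(len(guide(fs)))  # 4 hypotheses
print(generate(fs))    # ['a dog barks']
```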
31
Generation from Packed Input
32
Packed parses
Unplug the power cord from the wall outlet
33
Packed transfer output
débranchez le cordon d'alimentation de la prise
murale
34
[Diagram: generation chart: S over VP; V débranchez; NP le cordon d'alimentation; PP de la prise murale]
35
Ambiguity Preservation
36
Ambiguities don't always matter (for translation)
Move the lever to the left ... to obtain unheated air from the vent
37
Nondistinct realizations
[Diagram: packed chart for "déplacer le levier à gauche pour obtenir de l'air non chauffé aux ouïes d'aération", with attachment choices p and q]
Disambiguation is not required!
38
Rewriting/Transfer System
39
Why a Rewrite System
  • Grammars produce c-/f-structure output
  • Applications may need to manipulate this
  • Remove features
  • Rearrange features
  • Continue linguistic analysis (semantics,
    knowledge representation: next week)
  • XLE has a general purpose rewrite system (aka
    "transfer" or "xfr" system)

40
Sample Uses of Rewrite System
  • Sentence condensation
  • Machine translation
  • Mapping to logic for knowledge representation and
    reasoning
  • Tutoring systems

41
What does the system do?
  • Input set of "facts"
  • Apply a set of ordered rules to the facts
  • this gradually changes the set of input facts
  • Output new set of facts
  • Rewrite system uses the same ambiguity management
    as XLE
  • can efficiently rewrite packed structures,
    maintaining the packing

42
Example F-structure Facts
  • PERS(var(1),3)
  • PRED(var(1),girl)
  • CASE(var(1),nom)
  • NTYPE(var(1),common)
  • NUM(var(1),pl)
  • SUBJ(var(0),var(1))
  • PRED(var(0),laugh)
  • TNS-ASP(var(0),var(2))
  • TENSE(var(2),pres)
  • arg(var(0),1,var(1))
  • lex_id(var(0),1)
  • lex_id(var(1),0)
  • Each f-structure gets a var(N) identifier
  • Special arg facts
  • lex_id for each PRED
  • Facts have two arguments (except arg)
  • Rewrite system allows for any number of arguments
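The fact format is regular enough to read mechanically. A minimal Python sketch, assuming the textual form shown above; the tuple encoding and the function names are illustrative choices, not XLE's internal representation.

```python
import re

FACT = re.compile(r"([\w-]+)\((.*)\)$")
VAR = re.compile(r"var\((\d+)\)")


def parse_term(text: str):
    """var(N) -> ('var', N); digits -> int; anything else stays a string."""
    text = text.strip()
    m = VAR.fullmatch(text)
    if m:
        return ("var", int(m.group(1)))
    return int(text) if text.isdigit() else text


def parse_fact(text: str) -> tuple:
    """Split NAME(arg1,...,argN) into (NAME, arg1, ..., argN).
    Commas nested inside var(...) are not treated as separators."""
    m = FACT.match(text.strip())
    if m is None:
        raise ValueError(f"not a fact: {text!r}")
    name, body = m.groups()
    args, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            args.append(current)
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    args.append(current)
    return (name, *map(parse_term, args))


for text in ["SUBJ(var(0),var(1))", "PRED(var(1),girl)", "arg(var(0),1,var(1))"]:
    print(parse_fact(text))
```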

43
Rule format
  • Obligatory rule: LHS ==> RHS.
  • Optional rule: LHS ?=> RHS.
  • Unresourced fact: |- clause.
  • LHS
  • clause: match and delete
  • +clause: match and keep
  • -LHS: negation (don't have fact)
  • LHS, LHS: conjunction
  • ( LHS | LHS ): disjunction
  • ProcedureCall: procedural attachment
  • RHS
  • clause: replacement facts
  • 0: empty set of replacement facts
  • stop: abandon the analysis

44
Example rules
Input facts:
PERS(var(1),3) PRED(var(1),girl) CASE(var(1),nom) NTYPE(var(1),common)
NUM(var(1),pl) SUBJ(var(0),var(1)) PRED(var(0),laugh) TNS-ASP(var(0),var(2))
TENSE(var(2),pres) arg(var(0),1,var(1)) lex_id(var(0),1) lex_id(var(1),0)
Rules file:
"PRS (1.0)"
grammar = toy_rules.

"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(F,%), -SPEC(F,%) ==> SPEC(F,def).

"optionally make plural nouns singular; this will split the choice space"
NUM(F, pl) ?=> NUM(F, sg).
45
Example Obligatory Rule
PERS(var(1),3) PRED(var(1),girl) CASE(var(1),nom) NTYPE(var(1),common)
NUM(var(1),pl) SUBJ(var(0),var(1)) PRED(var(0),laugh) TNS-ASP(var(0),var(2))
TENSE(var(2),pres) arg(var(0),1,var(1)) lex_id(var(0),1) lex_id(var(1),0)
"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(F,%), -SPEC(F,%) ==> SPEC(F,def).
Output facts: all the input facts plus
SPEC(var(1),def)
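The effect of the obligatory rule can be reproduced by hand in Python, with facts as tuples such as ("NTYPE", "var(1)", "common"). This is a hypothetical hand-coded equivalent of this one rule, not a general rule interpreter: NTYPE is matched and kept, the negated SPEC condition requires that the f-structure have no SPEC, and, being obligatory, the rule applies to every match.

```python
Fact = tuple  # e.g. ("NTYPE", "var(1)", "common")


def add_default_spec(facts: set[Fact]) -> set[Fact]:
    """Add SPEC(F,def) for every F that has an NTYPE fact but no SPEC fact.
    NTYPE facts are kept (matched, not consumed)."""
    has_spec = {f[1] for f in facts if f[0] == "SPEC"}
    added = {("SPEC", f[1], "def") for f in facts
             if f[0] == "NTYPE" and f[1] not in has_spec}
    return facts | added


facts = {("PERS", "var(1)", "3"), ("PRED", "var(1)", "girl"),
         ("NTYPE", "var(1)", "common"), ("NUM", "var(1)", "pl"),
         ("SUBJ", "var(0)", "var(1)"), ("PRED", "var(0)", "laugh")}
print(sorted(add_default_spec(facts) - facts))  # [('SPEC', 'var(1)', 'def')]
```

Applying it a second time changes nothing, since the negated SPEC condition now fails.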
46
Example Optional Rule
"optionally make plural nouns singular; this will split the choice space"
NUM(F, pl) ?=> NUM(F, sg).
PERS(var(1),3) PRED(var(1),girl) CASE(var(1),nom) NTYPE(var(1),common)
NUM(var(1),pl) SPEC(var(1),def) SUBJ(var(0),var(1)) PRED(var(0),laugh)
TNS-ASP(var(0),var(2)) TENSE(var(2),pres) arg(var(0),1,var(1))
lex_id(var(0),1) lex_id(var(1),0)
Output facts: all the other input facts, plus a choice split:
A1: NUM(var(1),pl)
A2: NUM(var(1),sg)
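A hand-coded Python equivalent of the optional rule, assuming facts are tuples. Each plural NUM fact may or may not be rewritten, so n matches give 2**n alternatives. XLE keeps such alternatives packed under choice labels (A1, A2); this hypothetical sketch simply lists them unpacked, which is only practical for small inputs.

```python
from itertools import product


def singular_alternatives(facts: set[tuple]) -> list[set[tuple]]:
    """Enumerate every way of optionally rewriting NUM(F,pl) as NUM(F,sg)."""
    matches = sorted(f for f in facts if f[0] == "NUM" and f[2] == "pl")
    rest = facts - set(matches)
    alternatives = []
    for choice in product((False, True), repeat=len(matches)):
        alt = set(rest)
        for fact, rewrite in zip(matches, choice):
            alt.add(("NUM", fact[1], "sg") if rewrite else fact)
        alternatives.append(alt)
    return alternatives


facts = {("PRED", "var(1)", "girl"), ("NUM", "var(1)", "pl"),
         ("SPEC", "var(1)", "def")}
for i, alt in enumerate(singular_alternatives(facts), start=1):
    print(f"A{i}:", [f for f in alt if f[0] == "NUM"])
```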
47
Output of example rules
  • Output is a packed f-structure
  • Generation gives two sets of strings
  • The girls {laugh.|laugh!|laugh}
  • The girl {laughs.|laughs!|laughs}

48
Manipulating sets
  • Sets are represented with an in_set feature
  • He laughs in the park with the telescope
  • ADJUNCT(var(0),var(2))
  • in_set(var(4),var(2))
  • in_set(var(5),var(2))
  • PRED(var(4),in)
  • PRED(var(5),with)
  • Might want to optionally remove adjuncts
  • but not negation

49
Example Adjunct Deletion Rules
  • "optionally remove member of adjunct set"
  • +ADJUNCT(%, AdjSet), in_set(Adj, AdjSet),
  • -PRED(Adj, not)
  • ?=> 0.
  • "obligatorily remove adjunct with nothing in it"
  • ADJUNCT(%, Adj), -in_set(%, Adj)
  • ==> 0.

He laughs with the telescope in the park.
He laughs in the park with the telescope.
He laughs with the telescope.
He laughs in the park.
He laughs.
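The two adjunct rules can be mimicked in Python to see where the variants come from. This hypothetical sketch enumerates alternatives instead of packing them, keeps negation out of reach of the optional rule, and leaves a dropped adjunct's own facts (such as its PRED) in place, merely disconnected from the clause; it does not clean those up.

```python
from itertools import combinations


def adjunct_variants(facts: set[tuple]) -> list[set[tuple]]:
    """(1) optionally drop in_set facts linking non-negation adjuncts to an
    ADJUNCT set, keeping the ADJUNCT fact; (2) obligatorily drop any ADJUNCT
    fact whose set has no members left."""
    adjunct_sets = {f[2] for f in facts if f[0] == "ADJUNCT"}
    negations = {f[1] for f in facts if f[0] == "PRED" and f[2] == "not"}
    removable = sorted(f for f in facts if f[0] == "in_set"
                       and f[2] in adjunct_sets and f[1] not in negations)
    variants = []
    for k in range(len(removable) + 1):
        for dropped in combinations(removable, k):
            alt = facts - set(dropped)  # rule 1 on the chosen members
            nonempty = {f[2] for f in alt if f[0] == "in_set"}
            alt = {f for f in alt  # rule 2
                   if not (f[0] == "ADJUNCT" and f[2] not in nonempty)}
            variants.append(alt)
    return variants


facts = {("PRED", "var(0)", "laugh"), ("ADJUNCT", "var(0)", "var(2)"),
         ("in_set", "var(4)", "var(2)"), ("in_set", "var(5)", "var(2)"),
         ("PRED", "var(4)", "in"), ("PRED", "var(5)", "with")}
print(len(adjunct_variants(facts)))  # 4
```

The four fact sets correspond to the five strings above: the variant that keeps both adjuncts is realized in two word orders.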
50
Manipulating PREDs
  • Changing the value of a PRED is easy
  • PRED(F,girl) ==> PRED(F,boy).
  • Changing the argument structure is trickier
  • Make any changes to the grammatical functions
  • Make the arg facts correlate with these

51
Example Passive Rule
  • "make actives passive:
  • make the subject NULL; make the object the
    subject;
  • put in features"
  • SUBJ( Verb, Subj), arg( Verb, Num, Subj),
  • OBJ( Verb, Obj), CASE( Obj, acc)
  • ==>
  • SUBJ( Verb, Obj), arg( Verb, Num, NULL), CASE( Obj, nom),
  • PASSIVE( Verb, +), VFORM( Verb, pass).

the girls saw the monkeys -> The monkeys were seen.
in the park the girls saw the monkeys -> In the park the monkeys were seen.
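A hand-coded Python version of the passive rule over tuple facts, as a sketch. Assumptions: one object per verb, and '+' as the PASSIVE value (the value is not legible in the original slide text). All four left-hand-side facts are consumed, and OBJ is not re-added, so it disappears from the output.

```python
def passivize(facts: set[tuple]) -> set[tuple]:
    """Rewrite each matching active clause: the object becomes SUBJ (nominative),
    the old subject's arg slot becomes NULL, and PASSIVE/VFORM are added."""
    out = set(facts)
    for _, verb, subj in sorted(f for f in facts if f[0] == "SUBJ"):
        objs = [f[2] for f in facts if f[0] == "OBJ" and f[1] == verb]
        args = [f for f in facts
                if f[0] == "arg" and f[1] == verb and f[3] == subj]
        if not objs or not args or ("CASE", objs[0], "acc") not in facts:
            continue  # the left-hand side does not match
        obj, num = objs[0], args[0][2]
        out -= {("SUBJ", verb, subj), args[0], ("OBJ", verb, obj),
                ("CASE", obj, "acc")}
        out |= {("SUBJ", verb, obj), ("arg", verb, num, "NULL"),
                ("CASE", obj, "nom"), ("PASSIVE", verb, "+"),
                ("VFORM", verb, "pass")}
    return out


active = {("PRED", "v0", "see"), ("SUBJ", "v0", "v1"), ("OBJ", "v0", "v2"),
          ("PRED", "v1", "girl"), ("PRED", "v2", "monkey"),
          ("CASE", "v1", "nom"), ("CASE", "v2", "acc"),
          ("arg", "v0", 1, "v1"), ("arg", "v0", 2, "v2")}
print(sorted(f for f in passivize(active) if f[1] == "v0"))
```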
52
Templates and Macros
  • Rules can be encoded as templates
  • n2n(Eng,Frn)
  • PRED(F,Eng), +NTYPE(F,%)
  • ==> PRED(F,Frn).
  • @n2n(man, homme).
  • @n2n(woman, femme).
  • Macros encode groups of clauses/facts
  • sg_noun(F)
  • +NTYPE(F,%), +NUM(F,sg).
  • @sg_noun(F), -SPEC(F,%)
  • ==> SPEC(F,def).

53
Unresourced Facts
  • Facts can be stipulated in the rules and referred
    to
  • Often used as a lexicon of information not
    encoded in the f-structure
  • For example, a list of days and months for
    manipulation of dates
  • |- day(Monday). |- day(Tuesday). etc.
  • |- month(January). |- month(February). etc.
  • PRED(F,Pred), ( day(Pred) | month(Pred) )
    ==>

54
Rule Ordering
  • Rewrite rules are ordered (unlike LFG syntax
    rules but like finite-state rules)
  • Output of rule1 is input to rule2
  • Output of rule2 is input to rule3
  • This allows for feeding and bleeding
  • Feeding: insert facts used by later rules
  • Bleeding: remove facts needed by later rules
  • Can make debugging challenging

55
Example of Rule Feeding
  • Early rule: insert SPEC on nouns
  • +NTYPE(F,%), -SPEC(F,%) ==>
  • SPEC(F, def).
  • Later rule: allow plural nouns to become singular
    only if they have a specifier (to avoid bad count
    nouns)
  • NUM(F,pl), +SPEC(F,%) ==> NUM(F,sg).

56
Example of Rule Bleeding
  • Early rule: turn actives into passives
    (simplified)
  • SUBJ(F,S), OBJ(F,O) ==>
  • SUBJ(F,O), PASSIVE(F,+).
  • Later rule: impersonalize actives
  • SUBJ(F,%), -PASSIVE(F,+) ==>
  • SUBJ(F,S), PRED(S,they), PERS(S,3),
    NUM(S,pl).
  • will apply to intransitives and verbs with
    (X)COMPs but not transitives
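Ordering can be sketched as function composition in Python. The two functions are simplified, hypothetical hand-coded versions of the rules on this slide; the PASSIVE value '+' and the name new(F) for the fresh subject f-structure are assumptions. Running the transitive case in both orders shows the bleeding.

```python
def to_passive(facts: set[tuple]) -> set[tuple]:
    """Simplified passive: SUBJ and OBJ are consumed; the object becomes SUBJ."""
    out = set(facts)
    for _, f, s in [x for x in facts if x[0] == "SUBJ"]:
        objs = [x for x in facts if x[0] == "OBJ" and x[1] == f]
        if objs:
            out -= {("SUBJ", f, s), objs[0]}
            out |= {("SUBJ", f, objs[0][2]), ("PASSIVE", f, "+")}
    return out


def impersonalize(facts: set[tuple]) -> set[tuple]:
    """Replace the SUBJ of any non-passive clause with a new 'they'."""
    out = set(facts)
    for subj_fact in [x for x in facts if x[0] == "SUBJ"]:
        f = subj_fact[1]
        if ("PASSIVE", f, "+") in facts:
            continue  # negated condition fails: rule does not apply
        s = f"new({f})"  # hypothetical name for the fresh f-structure
        out.discard(subj_fact)
        out |= {("SUBJ", f, s), ("PRED", s, "they"),
                ("PERS", s, 3), ("NUM", s, "pl")}
    return out


def apply_in_order(facts: set[tuple], rules) -> set[tuple]:
    """The output of each rule is the input to the next."""
    for rule in rules:
        facts = rule(facts)
    return facts


transitive = {("SUBJ", "v0", "v1"), ("OBJ", "v0", "v2")}
intransitive = {("SUBJ", "v0", "v1")}
rules = [to_passive, impersonalize]
print(("PRED", "new(v0)", "they") in apply_in_order(transitive, rules))    # False
print(("PRED", "new(v0)", "they") in apply_in_order(intransitive, rules))  # True
```

With the order reversed, impersonalization applies first and the transitive clause does receive a "they" subject, which the passive rule then displaces.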

57
Debugging
  • XLE command line tdbg
  • steps through rules stating how they apply

Rule 1: (NTYPE(F,A)), -(SPEC(F,B))
==> SPEC(F,def)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 4-10
Rule 1 matches (2) NTYPE(var(1),common)
1 --> SPEC(var(1),def)
Rule 2: NUM(F,pl)
?=> NUM(F,sg)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 11-17
Rule 2 matches 3 NUM(var(1),pl)
1 --> NUM(var(1),sg)
Rule 5: SUBJ(Verb,Subj), arg(Verb,Num,Subj), OBJ(Verb,Obj), CASE(Obj,acc)
==> SUBJ(Verb,Obj), arg(Verb,Num,NULL), CASE(Obj,nom), PASSIVE(Verb,), VFORM(Verb,pass)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 28-37
Rule does not apply
girls laughed
58
Running the Rewrite System
  • create-transfer: adds menu items
  • load-transfer-rules FILE: loads rules from file
  • f-str window: under Commands, has
  • transfer: prints output of rules in XLE window
  • translate: runs output through generator
  • Need to do (where path is XLEPATH/lib):
  • setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib

59
Rewrite Summary
  • The XLE rewrite system lets you manipulate the
    output of parsing
  • Creates versions of output suitable for
    applications
  • Can involve significant reprocessing
  • Rules are ordered
  • Ambiguity management is as with parsing

60
Applications
  • Sentence Condensation
  • Notetaking

61
NLTT language components
[Diagram: NLTT language components: Sentence; Token FST, Morph FST, Lexicons, Grammar, Named entities; Core XLE Parse/Generate; All packed f-structures; Train, Property definitions, Property weights; Disambiguate, N best; Transfer; Semantics]
62
Sentence condensation
  • Goal: shrink sentences chosen for summary
  • Challenges
  • Retain most salient information of input
  • and guarantee grammaticality of output
  • Example
  • Original uncondensed sentence
  • A prototype is ready for testing, and
    Leary hopes to set requirements for a full system
    by the end of the year.
  • One condensed version
  • A prototype is ready for testing.

63
Sentence Condensation in the LFG Framework
  • Apply fine-grained tools for stochastic LFG
    parsing, transfer, and generation to sentence
    condensation
  • Condensation decisions made on fine-grained
    functional structures instead of context-free
    trees or strings
  • Expressive transfer-module for modifying parse
    structures
  • Powerful MaxEnt disambiguation model on transfer
    output
  • Grammatical well-formedness of output guaranteed
    by filtering through constraint-based generator
  • Efficient ambiguity packing methods applied
    throughout

64
Condensation System
[Diagram: ParGram English grammar; XLE Parsing -> packed f-structures -> condensation rules -> packed condensed f-structures -> stochastic selection (log-linear model) -> XLE Generation]
Simple combination of reusable system components
65
Transfer
  • Transfer component developed for machine
    translation (by Martin Kay, cf. Anette Frank, MT
    Summit '99). Extended and hardened by Dick Crouch.
  • Small hand-written set of transfer rules
  • Obligatory and optional rules (possibly multiple
    output for single input)
  • Rules may add, delete, or change parts of
    f-structures
  • Transfer operates on packed input and output

66
Sample Transfer Rule
  • adjunct(X,Y), in-set(Z,Y) ?=> del-node(Z,r1),
    rule-trace(r1, del(Z,X)).
  • Rule optionally removes adjunct Z by deleting the
    fact that Z is contained within the set of
    adjuncts Y associated with expression X.
  • Adds trace of rule usage to accumulating history
    of rule applications
  • Rule-traces record relation of transferred
    f-structure to original f-structure for
    stochastic disambiguation

67
Generation
  • The generator is used as a filter, since transfer
    rules are independent of the grammar and not
    constrained to preserve grammaticality!
  • Robustness techniques in generation
  • Insertion/deletion of features to match lexicon
  • For fragmentary input from the robust parser,
    grammatical output is guaranteed for separate
    fragments

68
One f-structure for Original Sentence
69
Packed alternatives after transfer condensation
70
Selection <a1,b1>
71
Selection <a2>

72
Generated condensed strings
  • A prototype is ready.
  • A prototype is ready for testing.
  • Leary hopes to set requirements for a full
    system.
  • A prototype is ready and Leary hopes to set
    requirements for a full system.
  • A prototype is ready for testing and Leary hopes
    to set requirements for a full system.
  • Leary hopes to set requirements for a full system
    by the end of the year.
  • A prototype is ready and Leary hopes to set
    requirements for a full system by the end of the
    year.
  • A prototype is ready for testing and Leary hopes
    to set requirements for a full system by the end
    of the year.

All grammatical!
73
Transfer Rules used in Most Probable Condensation
<a2>
  • Rule-traces in order of application
  • r13 Keep of-phrases (of the year)
  • r161 Keep adjuncts for certain heads, specified
    elsewhere (system)
  • r1 Delete adjunct of first conjunct (for
    testing)
  • r1 Delete adjunct of second conjunct (by the end
    of the year)
  • r2 Delete (rest of) second conjunct (Leary hopes
    to set requirements for a full system),
  • r22 Delete conjunction itself (and).

74
Discussion
  • Ranking of system variants shows close
    correlation between automatic and manual
    evaluation.
  • Stochastic selection of transfer output is crucial:
    50% reduction in error rate relative to upper
    bound.
  • Selection of the best parse for transfer input is
    less important: similar results for manual selection
    and transfer from all parses.
  • Compression rate around 60%: less aggressive than
    human condensation, but the shortest-string heuristic
    is worse.

75
Note-Taking Application
  • Ronald Kaplan, Richard Crouch, Tracy Holloway
    King, Michael Tepper, Daniel G. Bobrow

76
Note-taking: simple, common, ignored
  • Analysts scan sources, copy important passages
    into note-file
  • Copy-paste selections from Explorer to Word?
  • Awkward interaction
  • Selected passages have both useful and irrelevant
    information
  • The earliest signs of trouble--a slight nasal
    stuffiness, twinge of pain in the joints,
    fatigue, and a dry, persistent cough--resemble
    the onset of a cold or flu.
  • A note-taking tool: shrink sentences for speedy
    recall/review
  • The earliest signs of trouble resemble the
    onset of a cold or flu.
  • One-key interaction
  • Keep information of interest to user/task
  • Avoid meaning distortion
  • Preserve grammaticality (for readability)
  • Retain full selection (for detail and context)
  • Record provenance
  • Fail-soft manual entry/edit in worst case
  • (Note-taking ≠ summarization)

77
Outline
  • Front end: simple note-taking interface
  • Back end: deep language processing

78
Source browser
79
User interests
Also: user history, project profile
80
Click to drill down
Original passage
Provenance
81
Fail-soft manual repair
82
The Language Mapping: LFG/XLE
[Diagram: Sentence -> Tokens/Morphology -> XLE Parse (LFG grammars for English, German, Arabic, etc.; named entities) -> functional structures -> XLE Generate -> sentence, e.g. "Igor carried plague."]
83
Accompanied by an armed guard, Domaradsky carried
a dish with a culture of genetically altered
plague through the gates of the ancient fortress
like a rare jewel.
ADJUNCT(f,a) ?? delete(a)
84
Accompanied by an armed guard, Domaradsky carried
a dish with a culture of genetically altered
plague through the gates of the ancient fortress
like a rare jewel.
ADJUNCT(f,a) ?? delete(a)
Domaradsky Disease
85
Extend by ontology: discard more
  • Containers are typically less significant than
    the stuff inside
  • Rule: reduce "container with stuff" to "stuff"
  • dish is a container, so

86
Avoid meaning distortion
  • Some predicates are downward monotonic (deny,
    prevent, not)
  • It is misleading to discard modifiers in the
    context of such predicates

Domaradsky denied carrying a culture of
genetically-altered plague.
87
Note-Taking Architecture
[Diagram: LFG English grammar; XLE Parse -> f-structures -> Condense (condensation rules) -> f-structures -> stochastic selection (log-linear model) -> XLE Generate]
88
Note-taking prototype
  • Artful combination of front-end UI and back-end
    language processing
  • Front end tuned to user, task
  • Simple augmentation of existing desktop tools
  • No training
  • Fail soft
  • Exploits (but conceals) complexity of back-end

Needed: coreference, hardening, evaluation
89
Other Applications
  • More on applications next week
  • Mapping to semantics and knowledge representation
