1
Linguistics 239E Week 9
Generation; Evaluation and Testing
  • Ron Kaplan and Tracy King

2
Issues from HW8
  • How to keep punctuation from being TOKENS
  • FRAGMENT --> PUNCT
        NP @FIRST-EQ
        S @FIRST-EQ
        TOKEN @FIRST-EQ
        PUNCT
        (REST).

3
Sample c-structure
4
Sample F-structure
5
Generation
  • Parsing: string to analysis
  • Generation: analysis to string
  • What type of input?
  • How to generate

6
Why generate?
  • Machine translation
  • Lang1 string -> Lang1 f-structure -> Lang2 f-structure -> Lang2 string
  • Sentence condensation
  • Long string -> f-structure -> smaller f-structure -> new string
  • Question answering
  • Production of NL reports
  • State of machine or process
  • Explanation of logical deduction
  • Grammar debugging

7
F-structures as input
  • Use f-structures as input to the generator
  • May parse sentences that shouldn't be generated
  • May want to constrain number of generated options
  • Input f-structure may be underspecified

8
XLE generator
  • Use the same grammar for parsing and generation
  • Advantages
  • maintainability
  • write rules and lexicons once
  • But
  • special generation tokenizer
  • different OT ranking

9
Generation tokenizer
  • White space
  • Parsing: multiple white space becomes a single TB (token boundary)
  • John appears. -> John TB appears TB . TB
  • Generation: a single TB becomes a single space (or nothing)
  • John TB appears TB . TB -> John appears.
    or: John appears .

10
Generation tokenizer
  • Capitalization
  • Parsing: optionally decapitalize initially
  • They came -> they came
  • Mary came -> Mary came
  • Generation: always capitalize initially
  • they came -> They came
    (not: they came)
  • May regularize other options
  • quotes, dashes, etc.

11
Generation morphology
  • Suppress variant forms
  • Parse both favor and favour
  • Generate only one

12
Morphconfig for parsing and generation
  • STANDARD ENGLISH MORPHOLOGY (1.0)
  • TOKENIZE:
  • P!eng.tok.parse.fst G!eng.tok.gen.fst
  • ANALYZE:
  • eng.infl-morph.fst G!amerbritfilter.fst
    G!amergen.fst
  • ----

13
Reversing the parsing grammar
  • The parsing grammar can be used directly as a
    generator
  • Adapt the grammar with a special OT ranking, GENOPTIMALITYORDER (a sketch follows this list)
  • Why do this?
  • you may parse ungrammatical input
  • you may get too many generation options
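A hedged sketch of how the two rankings might sit side by side in the grammar's configuration section; the parsing ranking stays untouched (here trivially just NOGOOD), and GenBadPunct is the mark used in the comma example a few slides below:

  OPTIMALITYORDER NOGOOD.
  GENOPTIMALITYORDER GenBadPunct NOGOOD.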

14
Ungrammatical input
  • Linguistically ungrammatical
  • They walks.
  • They ate banana.
  • Stylistically ungrammatical
  • No ending punctuation: They appear
  • Superfluous commas: John, and Mary appear.
  • Shallow markup: [NP John and Mary] appear.

15
Too many options
  • All the generated options can be linguistically
    valid, but too many for applications
  • Occurs when more than one string has the same,
    legitimate f-structure
  • PP placement
  • In the morning I left. I left in the morning.

16
Using the Gen OT ranking
  • Generally much simpler than in the parsing
    direction
  • Usually only use standard marks and NOGOOD
  • no marks, no STOPPOINT
  • Can have a few marks that are shared by several
    constructions
  • one or two for dispreferred
  • one or two for preferred

17
Example: comma in coordination
  • COORD(_CAT) = _CAT @CONJUNCT
                  (COMMA @(OT-MARK GenBadPunct))
                  CONJ
                  _CAT @CONJUNCT.
  • GENOPTIMALITYORDER GenBadPunct NOGOOD.
  • parse: They appear, and disappear.
  • generate without OT: They appear(,) and disappear.
  • with OT: They appear and disappear.

18
Example: prefer initial PP
  • S --> (PP @ADJUNCT @(OT-MARK GenGood))
         NP @SUBJ
         VP.
  • VP --> V
          (NP @OBJ)
          (PP @ADJUNCT).
  • GENOPTIMALITYORDER NOGOOD GenGood.
  • parse: they appear in the morning.
  • generate without OT: In the morning they appear.
                         They appear in the morning.
  • with OT: In the morning they appear.

19
Generation commands
  • XLE command line (a combined sketch follows this list)
  • regenerate "They appear."
  • generate-from-file my-file.pl
  • (regenerate-from-directory, regenerate-testfile)
  • F-structure window
  • Commands menu: generate from this f-structure
  • Debugging commands
  • regenerate-morphemes
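A minimal xlerc-style sketch tying the command-line commands above together (xlerc is a Tcl script, so # starts a comment; the file names are the ones used elsewhere in these slides):

  # check the grammar and build the generator (see the debugging slides)
  create-generator english.lfg

  # parse a sentence, then generate back from the resulting f-structure
  regenerate "They appear."

  # generate from an f-structure that an application saved in Prolog format
  generate-from-file my-file.pl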

20
Debugging the generator
  • When generating from an f-structure produced by
    the same grammar, XLE should always generate
  • Unless
  • OT marks block the only possible string
  • something is wrong with the tokenizer/morphology
  • regenerate-morphemes: if this produces a string, the tokenizer/morphology is not the problem (see the sketch after this list)
  • Very hard to debug; the newest XLE has robustness features to help
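A hedged debugging sketch along these lines (I'm assuming regenerate-morphemes takes a sentence argument the same way regenerate does):

  # round-trip fails: no string comes back
  regenerate "They appear."

  # check the morphology/tokenizer side: if this prints a morpheme string,
  # the problem lies in the rules or OT marks, not the tokenizer/morphology
  regenerate-morphemes "They appear."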

21
Underspecified Input
  • F-structures provided by applications are not
    perfect
  • may be missing features
  • may have extra features
  • may simply not match the grammar coverage
  • Missing and extra features are often systematic
  • specify in XLE which features can be added and
    deleted
  • Not matching the grammar is a more serious problem

22
Adding features
  • English to French translation
  • English nouns have no gender
  • French nouns need gender
  • Solution: have XLE add gender
  • the French morphology will control
    the value
  • Specify additions in xlerc
  • set-gen-adds add "GEND"
  • can add multiple features
  • set-gen-adds add "GEND CASE PCASE"
  • XLE will optionally insert the feature

Note: unconstrained additions make generation undecidable
23
Example
The cat sleeps. -> Le chat dort.
  • Input f-structure (no gender):
    PRED  'dormir<SUBJ>'
    SUBJ  [ PRED 'chat'
            NUM  sg
            SPEC def ]
    TENSE present

  • With GEND added by the generator:
    PRED  'dormir<SUBJ>'
    SUBJ  [ PRED 'chat'
            NUM  sg
            GEND masc
            SPEC def ]
    TENSE present
24
Deleting features
  • French to English translation
  • delete the GEND feature
  • Specify deletions in xlerc
  • set-gen-adds remove "GEND"
  • can remove multiple features
  • set-gen-adds remove "GEND CASE PCASE"
  • XLE obligatorily removes the features
  • no GEND feature will remain in the f-structure
  • if a feature takes an f-structure value, that
    f-structure is also removed

25
Changing values
  • If values of a feature do not match between the
    input f-structure and the grammar
  • delete the feature and then add it
  • Example: case assignment in translation
  • set-gen-adds remove "CASE"
  • set-gen-adds add "CASE"
  • allows dative case in input to become accusative
  • e.g., exceptional case marking verb in input
    language but regular case in output language

26
Creating Paradigms
  • Deleting and adding features within one grammar
    can produce paradigms
  • Specifiers
  • set-gen-adds remove "SPEC"
  • set-gen-adds add "SPEC DET DEMON"
  • regenerate "NP boys"
  • {the|those|these} boys

27
Generation for Debugging
  • Checking for grammar and lexicon errors
  • create-generator english.lfg
  • reports ill-formed rules, templates, feature
    declarations, lexical entries
  • Checking for ill-formed sentences that can be
    parsed
  • parse a sentence
  • see if all the results are legitimate strings
  • regenerate "they appear."

28
Regeneration example
  • regenerate "In the park they often see the boy
    with the telescope."
  • parsing In the park they often see the boy with
    the telescope.
  • 4 solutions, 0.39 CPU seconds, 178 subtrees
    unified
  • {They see the boy in the park|In the park they see the boy} often with the
    telescope.
  • regeneration took 0.87 CPU seconds.

29
Regenerate testfile
  • regenerate-testfile
  • produces a new file, testfile.regen
  • sentences with their parses and generated strings
  • lists sentences with no generated strings
  • if you have no Gen OT marks, everything should generate back to itself

30
Testing and Evaluation
  • Need to know
  • Does the grammar do what you think it should?
  • cover the constructions
  • still cover them after changes
  • not get spurious parses
  • not cover ungrammatical input
  • How good is it?
  • relative to a ground truth/gold standard
  • for a given application

31
Testsuites
  • XLE can parse and generate from testsuites
  • parse-testfile
  • regenerate-testfile (an invocation sketch follows this list)
  • Issues
  • where to get the testsuites
  • how to know if the parse the grammar got is the
    one that was intended
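A hedged sketch of running a testsuite from xlerc (the testfile name is illustrative, and I'm assuming both commands take the testfile as their argument):

  # parse every sentence; writes my-testsuite.new, .stats and .errors (next slides)
  parse-testfile my-testsuite

  # parse and then generate from each sentence; writes my-testsuite.regen
  regenerate-testfile my-testsuite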

32
Basic testsuites
  • Set of sentences separated by blank lines (a sample testfile follows this list)
  • can specify a category
  • NP the children who I see
  • can specify the expected number of results
  • They saw her duck. (2! 0 0 0)
  • parse-testfile produces
  • xxx.new: sentences plus new parse statistics
  • # of parses, time, complexity
  • xxx.stats: new parse statistics without the sentences
  • xxx.errors: changes in the statistics from the previous run
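A minimal sketch of such a testfile, using sentences that appear elsewhere in these slides (blank lines separate the entries; the (2! 0 0 0) annotation records the expected number of parses before parse-testfile fills in the remaining statistics):

  they appear in the morning.

  NP the children who I see

  They saw her duck. (2! 0 0 0)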

33
Testsuite examples
  • LEXICON _'s
  • ROOT He's leaving. (1+1 0.10 55)
  • ROOT It's broken. (2+1 0.11 59)
  • ROOT He's left. (3+1 0.12 92)
  • ROOT He's a teacher. (1+1 0.13 57)
  • RULE CPwh
  • ROOT Which book have you read? (1+4 0.15 123)
  • ROOT How does he be? (0! 0 0.08 0)
  • RULE NOMINALARGS
  • NP the money that they gave him (1 0.10 82)

34
.errors file
ROOT They left, then they arrived. (2+2 0.17 110)
MISMATCH ON 339 (2+2 -> 1+2)
ROOT Is important that he comes. (0! 0 0.15 316)
ERROR AND MISMATCH ON 784 (0! 0 -> 1+119)
35
.stats file
((1901) (1+1 0.21 72) -> (1+1 0.21 72) (5 words))
((1902) (1+1 0.10 82) -> (1+1 0.12 82) (6 words))
((1903) (1 0.04 15) -> (1 0.04 15) (1 word))
XLE release of Feb 26, 2004 11:29.
Grammar /tilde/thking/pargram/english/standard/english.lfg.
Grammar last modified on Feb 27, 2004 13:58.
1903 sentences, 38 errors, 108 mismatches
0 sentences had 0 parses (added 0, removed 56)
38 sentences with 0!
38 sentences with 0! have solutions (added 29, removed 0)
57 starred sentences (added 57, removed 0)
timeout 100
max_new_events_per_graph_when_skimming 500
maximum scratch storage per sentence 26.28 MB (642)
maximum event count per sentence 1276360
average event count per graph 217.37
36
.stats file cont.
293.75 CPU secs total, 1.79 CPU secs max
new time/old time = 1.23
elapsed time = 337 seconds
biggest increase 1.16 sec (677: 1.63 sec)
biggest decrease 0.64 sec (1386: 0.54 sec)

range   parsed  failed  words  seconds  subtrees  optimal  suboptimal
1-10    1844    0       4.25   0.14     80.73     1.44     2.49E01
11-20   59      0       11.98  0.54     497.12    10.41    2.05E04
all     1903    0       4.49   0.15     93.64     1.72     6.60E02

0.71 of the variance in seconds is explained by the number of subtrees
37
Is it the right parse?
  • Use shallow markup to constrain possibilities
  • bracketing of desired constituents
  • POS tags
  • Compare resulting structure to a previously
    banked one (perhaps a skeletal one)
  • significant amount of work if done by hand
  • bank f-structures from the grammar if good enough
  • reduce work by using partial structures
  • (e.g., just predicate argument structure)

38
Where to get the testsuite?
  • Basic coverage
  • create testsuite when writing the grammar
  • publicly available testsuites
  • extract examples from the grammar comments (a sketch follows this list)
  • "COMEX NP-RULE NP the flimsy boxes"
  • examples specific enough to test one construction
    at a time
  • Interactions
  • real world text necessary
  • may need to clean up the text somewhat
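A hedged sketch of how such a COMEX comment might sit in the grammar next to the rule it exercises, so that examples can be harvested mechanically (the rule body and category names are simplified purely for illustration; XLE grammar comments are enclosed in double quotes):

  "COMEX NP-RULE NP the flimsy boxes"
  NP --> (DET) AP* N.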

39
Evaluation
  • How good is the grammar?
  • Absolute scale
  • need a gold standard to compare against
  • Relative scale
  • comparing against other systems
  • For an application
  • some applications are more error tolerant than
    others

40
Gold standards
  • Representation of the perfect parse for the
    sentence
  • can bootstrap with a grammar for efficiency and
    consistency
  • hand checking and correction
  • Determine how close the grammar's output is to
    the gold standard
  • may have to do systematic mappings
  • may only care about certain relations

41
PARC700
  • 700 sentences randomly chosen from section 23 of
    the UPenn WSJ corpus
  • How created
  • parsed with the grammar
  • saved the best parse
  • converted format to "triples"
  • hand corrected the output
  • Issues
  • a very time-consuming process
  • difficult to maintain consistency even with
    bootstrapping and error-checking tools

42
Sample triple from PARC700
sentence(
  id(wsj_2356.19, parc_23.34)
  date(2002.6.12)
  validators(T.H. King, J.-P. Marcotte)
  sentence_form(The device was replaced.)
  structure(
    mood(replace~0, indicative)
    passive(replace~0, +)
    stmt_type(replace~0, declarative)
    subj(replace~0, device~1)
    tense(replace~0, past)
    vtype(replace~0, main)
    det_form(device~1, the)
    det_type(device~1, def)
    num(device~1, sg)
    pers(device~1, 3)))
43
Evaluation against PARC700
  • Parse the 700 sentences with the grammar
  • Compare the f-structure with the triple
  • Determine
  • number of attribute-value pairs that are missing
    from the f-structure
  • number of attribute-value pairs that are in the
    f-structure but should not be
  • combine the results into an f-score (see the formulas after this list)
  • 100 is a perfect match; 0 is no match
  • the current grammar is in the low 80s
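The slides do not spell out how the counts are combined, but the standard combination (presumably what is meant here) is precision and recall over attribute-value pairs; the numbers below are invented purely to illustrate the arithmetic:

  precision = matching pairs / (matching pairs + extra pairs)
  recall    = matching pairs / (matching pairs + missing pairs)
  f-score   = 100 * 2 * precision * recall / (precision + recall)

  e.g. 90 matching, 15 extra, 10 missing pairs:
  precision = 90/105 = 0.857,  recall = 90/100 = 0.900,  f-score = 87.8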

44
Using other gold standards
  • Need to match corpus to grammar type
  • written text vs. transcribed speech
  • technical manuals, novels, newspapers
  • May need to have mappings between systematic
    differences in analyses
  • minimally want a match in grammatical functions
  • but even this can be difficult (e.g. XCOMP
    subjects)

45
Testing and evaluation
  • Necessary to determine grammar coverage and
    usability
  • Frequent testing allows problems to be corrected
    early on
  • Changes in efficiency are also detectable in this
    way
