Linguistics 187 Week 4 - PowerPoint PPT Presentation

Slides: 65
Provided by: Franci92
Learn more at: https://web.stanford.edu
1
Linguistics 187 Week 4
Ambiguity and Robustness
2
Language has pervasive ambiguity
Discourse
Entailment
Semantics
Syntax
Morphology
Tokenization
  • Bill fell. John kicked him.
    (because or after?)
  • John didn't wait to go. (now or never?)
  • Every man loves a woman.
    (the same woman, or each their own?)
  • John told Tom he had to go. (Who had to go?)
  • The duck is ready to eat. (cooked or hungry?)
  • walk, untieable knot, bank
    (noun or verb? (untie)able or un(tieable)? river or financial?)
  • I like Jan. (Jan + . or Jan. + .: sentence end or abbreviation?)

3
Ambiguity
  • Syntactically legitimate ambiguity (vs.
    spurious ambiguity: boys and girls, pushup)
  • Sources
  • Alternative c-structure rules
  • Disjunctions in f-structure description
  • Lexical categories
  • XLE's display/computation of ambiguity
  • Dealing with ambiguity
  • Recognize legitimate ambiguity
  • OT marks for preferences (later in the course)
  • Stochastic disambiguation

4
Syntactic Ambiguity
  • Lexical
  • part of speech
  • subcategorization frames
  • Syntactic
  • attachments
  • coordination
  • Implemented system highlights interactions

5
Lexical Ambiguity: POS
  • verb-noun
  • I saw her duck.
  • I saw [NP her duck].
  • I saw [NP her] [VP duck].
  • noun-adjective
  • the [N/A mean] rule
  • that child is [A mean].
  • he calculated the [N mean].

6
Morphology and POS ambiguity
  • English has impoverished morphology and hence
    extreme POS ambiguity
  • leaves: leave +Verb +Pres +3sg
  •         leaf +Noun +Pl
  •         leave +Noun +Pl
  • will: +Noun +Sg, +Aux, +Verb +base
  • Even languages with extensive morphology have
    ambiguities

7
Lexical ambiguity: Subcat frames
  • Words often have more than one subcategorization
    frame
  • transitive/intransitive
  • I broke it./It broke.
  • intransitive/oblique
  • He went./He went to London.
  • transitive/transitive with infinitive
  • I want it./I want it to leave.

8
Subcat-Rule interactions
  • OBL vs. ADJUNCT with intransitive/oblique
  • He went to London.
  • OBL analysis
      PRED  'go<(^ SUBJ)(^ OBL)>'
      SUBJ  [ PRED 'he' ]
      OBL   [ PRED 'to<(^ OBJ)>'
              OBJ  [ PRED 'London' ] ]
  • ADJUNCT analysis
      PRED     'go<(^ SUBJ)>'
      SUBJ     [ PRED 'he' ]
      ADJUNCT  { [ PRED 'to<(^ OBJ)>'
                   OBJ  [ PRED 'London' ] ] }

9
OBL-ADJUNCT cont.
  • Passive by phrase
  • It was eaten by the boys.
      PRED    'eat<(^ OBL-AG)(^ SUBJ)>'
      SUBJ    [ PRED 'it' ]
      OBL-AG  [ PRED 'by<(^ OBJ)>'
                OBJ  [ PRED 'boy' ] ]
  • It was eaten by the window.
      PRED     'eat<NULL (^ SUBJ)>'
      SUBJ     [ PRED 'it' ]
      ADJUNCT  { [ PRED 'by<(^ OBJ)>'
                   OBJ  [ PRED 'window' ] ] }

10
XCOMP-ADJUNCT
  • to infinitives can be arguments or adjuncts
    (purpose clauses)
  • I want her to leave.
      PRED   'want<(^ SUBJ)(^ XCOMP)>(^ OBJ)'
      SUBJ   [ PRED 'I' ]
      OBJ    [1] [ PRED 'her' ]
      XCOMP  [ PRED 'leave<(^ SUBJ)>'
               SUBJ  [1] ]

11
XCOMP-ADJUNCT cont.
  • I want money to buy that.
      PRED     'want<(^ SUBJ)(^ OBJ)>'
      SUBJ     [ PRED 'I' ]
      OBJ      [ PRED 'money' ]
      ADJUNCT  { [ PRED 'buy<(^ SUBJ)(^ OBJ)>'
                   SUBJ  [ PRED 'pro' ]
                   OBJ   [ PRED 'that' ] ] }
  • But both sentences get both analyses
  • The syntax does not have world knowledge

12
OBJ-TH and Noun-Noun compounds
  • Many OBJ-TH verbs are also transitive
  • I took the cake. I took Mary the cake.
  • The grammar needs a rule for noun-noun compounds
  • the tractor trailer, a grammar rule
  • These can interact
  • I took the grammar rules
  • I took [NP the grammar rules]
  • I took [NP the grammar] [NP rules]

13
Syntactic Ambiguities
  • Even without lexical ambiguity, there is
    legitimate syntactic ambiguity
  • PP attachment
  • Coordination
  • Want to
  • constrain these to legitimate cases
  • make sure they are processed efficiently

14
PP Attachment
  • PP adjuncts can attach to VPs and NPs
  • Strings of PPs in the VP are ambiguous
  • I see the girl with the telescope.
  • I see [NP the girl with the telescope].
  • I see [NP the girl] [PP with the telescope].
  • This ambiguity is reflected in
  • the c-structure (constituency)
  • the f-structure (ADJUNCT attachment)

15
PP attachment cont.
  • This ambiguity multiplies with more PPs
  • I saw the girl with the telescope
  • I saw the girl with the telescope in the garden
  • I saw the girl with the telescope in the garden
    on the lawn
  • The syntax has no way to determine the
    attachment, even if humans can.
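
The growth can be made concrete with a small counting sketch. It rests on an assumption that is not stated on the slide: each PP may attach to the VP or to any NP still open on the right edge of the tree, and attachments may not cross. The function name is illustrative only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_attachments(pps_left: int, open_sites: int = 2) -> int:
    """Count non-crossing attachments for a string V NP PP PP ...

    open_sites is the number of sites on the right edge that the next PP
    can attach to; it starts at 2 (the VP and the object NP).
    """
    if pps_left == 0:
        return 1
    total = 0
    for site in range(open_sites):
        # Attaching to `site` closes every site to its right and opens
        # one new site: the NP inside the PP that was just attached.
        total += count_attachments(pps_left - 1, site + 2)
    return total


for n in range(1, 5):
    print(n, "PPs:", count_attachments(n), "attachments")
```

Under these assumptions one, two and three PPs give 2, 5 and 14 attachments, so enumerating the readings one by one quickly becomes impractical.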

16
Ambiguity in coordination
  • Vacuous ambiguity of non-branching trees
  • this can be avoided (pushup)
  • Legitimate ambiguity
  • old men and women
  • old [N men and women]
  • [NP old men] and [NP women]
  • I turned and pushed the cart
  • I [V turned and pushed] the cart
  • I [VP turned] and [VP pushed the cart]

17
Grammar Engineering and ambiguity
  • Large-scale grammars will have lexical and
    syntactic ambiguities
  • With real data they will interact, resulting in
    many parses
  • these parses are (syntactically) legitimate
  • they are not intuitive to humans
  • (but more plausible words can make them better)
  • XLE provides tools to manage ambiguity
  • grammar writer interfaces
  • computation

18
XLE display
  • Four windows
  • c-structure (top left)
  • f-structure (bottom left)
  • packed f-structure (top right)
  • choice space (bottom right)
  • C-structure and f-structure next buttons
  • Other two windows are packed representations of
    all the parses
  • clicking on a choice will display that choice in
    the left windows

19
Example
  • I see the girl in the garden
  • PP attachment ambiguity
  • both ADJUNCTS
  • difference in ADJUNCT-TYPE

20
Packed F-structure and Choice space
21
Sorting through the analyses
  • Next button on c-structure and then f-structure
    windows
  • impractical with many choices
  • independent vs. interacting ambiguities
  • hard to detect spurious ambiguity
  • The packed representations show all the analyses
    at once
  • (in)dependence more visible
  • click on choice to view
  • spurious ambiguities appear as blank choices
  • but legitimate ambiguities may also do so

22
Ambiguity Demo
  • eng-week4-demo.lfg
  • eng-week4-demo-test.lfg
  • Attachment
  • the girl ate the banana with the monkey
  • Subcategorization
  • the girl thought about the banana
  • Feature
  • the sheep laughed
  • All three (2 c-structures, 8 analyses)
  • the girl thought about the banana with the monkey

23
XLE Ambiguity Management
How many sheep? How many fish?
The sheep liked the fish.
  • Packed representation is a free choice system
  • Encodes all dependencies without loss of
    information
  • Common items represented, computed once
  • Key to practical efficiency
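
A minimal sketch of the free-choice idea for "The sheep liked the fish", assuming each noun is simply ambiguous between singular and plural. The dictionary layout and the function name are illustrative, not XLE's internal representation.

```python
from itertools import product

# Each ambiguous word contributes one independent choice (assumed values).
packed = {
    "sheep": ["sg", "pl"],
    "fish": ["sg", "pl"],
}


def unpack(packed_choices):
    """Enumerate every analysis as one pick per independent choice."""
    words = list(packed_choices)
    return [dict(zip(words, picks))
            for picks in product(*packed_choices.values())]


analyses = unpack(packed)
print(len(analyses))  # 2 stored options per word, 2 * 2 analyses
```

The packed form stores 2 + 2 options while standing for 2 * 2 analyses; the shared material (the verb, the rest of the structure) is represented once.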

24
Dependent choices
  • Again, packing avoids duplication
  • ... but it's wrong: it doesn't encode all dependencies;
    the choices are not free.
  • bad
  • The girl saw the cat
  • The cat saw the girl
  • bad
  • Who do you want to succeed?
  • I want to succeed John (want intrans, succeed trans)
  • I want John to succeed (want trans, succeed intrans)
25
Solution: Label dependent choices
  • Label each choice with distinct Boolean
    variables p, q, etc.
  • Record acceptable combinations as a Boolean
    expression Φ
  • Each analysis corresponds to a satisfying
    truth-value assignment
  • (free choice from the true lines of Φ's
    truth table)
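
The labelling scheme can be sketched by brute force over a truth table. The meaning given to p and q here (which of two NPs is the subject) is an assumption chosen to match the girl/cat example; a real system would not enumerate the table explicitly.

```python
from itertools import product


def satisfying_assignments(variables, phi):
    """Return the truth-value assignments (dicts) for which phi is true."""
    result = []
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if phi(assignment):
            result.append(assignment)
    return result


# p: the first NP is the subject; q: the second NP is the subject.
# Exactly one of them can be the subject, so the expression is p XOR q.
def phi(a):
    return a["p"] != a["q"]


readings = satisfying_assignments(["p", "q"], phi)
print(readings)
```

Each of the two satisfying assignments corresponds to one good reading; the assignments where both or neither NP is the subject are the "bad" lines of the table.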

26
Ambiguity and Robustness
  • Large-scale grammars are massively ambiguous
  • Grammars parsing real text need to be robust
  • "loosening" rules to allow robustness increases
    ambiguity even more
  • Need a way to control the ambiguity
  • version of Optimality Theory (OT)

27
Theoretical OT
  • Grammar has a set of violable constraints
  • Constraints are ranked by each language
  • This gives cross-linguistic variation
  • Candidates (analyses) compete
  • John waited for Mary. vs. John waited for 3
    hours.
  • Constraint ranking determines winning candidate
  • Issues for XLE
  • Candidates can be very ungrammatical
  • we have a grammar to produce grammatical analyses
  • even with robust, ungrammatical analyses, these
    are controlled
  • Generation, not parsing direction
  • we know what the string is already
  • for generation we have a very specified analysis

28
XLE OT
  • Incorporate idea of ranking and (dis)preference
  • Filter syntactic and lexical ambiguity
  • Reconcile robustness and accuracy
  • Allow parsing grammar to be used for generation

29
XLE OT Implementation
  • OT marks in
  • grammar rules
  • templates
  • lexical entries
  • CONFIG states
  • preference vs. dispreference
  • ranking
  • parsing vs. generation orders

30
The o projection
  • OT marks are not f-structure features
  • OT marks are in their own projection

f-structure
c-structure
o-structure (set of OT marks)
31
The o projection
  • The o-structure is just a set of marks
  • PPadj GuessedN
  • Instead of ^ and !, have o::* (NB: ! = f::*)
  • PP: (^ ADJUNCT)=!
        PPadj $ o::*
  • the f-structure is exactly the same
  • there is now an additional o-structure

32
Ranking analyses
  • Specify relative importance of OT marks in the
    CONFIG
  • OPTIMALITYORDER Mark3 Mark2 +Mark1.
  • Comparing analyses
  • Find most important mark where the analyses
    differ
  • Prefer the analysis with the
  • Fewest dispreference marks (no +)
  • Most preference marks (+)

33
Ranking analyses (continued)
  • an analysis with Mark2 is preferred over an
    analysis with Mark3
  • an analysis with no mark is preferred over an
    analysis with Mark2 or Mark3
  • an analysis with one Mark2 is preferred over one
    with two Mark2
  • an analysis with Mark1 is preferred over an
    analysis with no mark
  • an analysis with two Mark1 is preferred over an
    analysis with one Mark1
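
The comparison described on these two slides can be sketched as a lexicographic sort key. This is a simplified model, not XLE's implementation: it assumes preference marks are written with a leading + in the order, that marks are listed from most to least important, and that unlisted marks are neutral. All function names are hypothetical.

```python
from collections import Counter


def parse_order(order_spec):
    """Turn 'Mark3 Mark2 +Mark1' into [(name, is_preference), ...]."""
    return [(m.lstrip("+"), m.startswith("+")) for m in order_spec.split()]


def rank_key(marks, order):
    """Key for comparing analyses: smaller is better.

    Marks are compared from most to least important (left to right).
    A dispreference mark counts against an analysis, a preference mark
    counts in its favour. Marks missing from the order are ignored.
    """
    counts = Counter(marks)
    return tuple(-counts[name] if pref else counts[name]
                 for name, pref in order)


def optimal(analyses, order_spec):
    """Return the analyses (lists of OT marks) that rank best."""
    order = parse_order(order_spec)
    best = min(rank_key(a, order) for a in analyses)
    return [a for a in analyses if rank_key(a, order) == best]


ORDER = "Mark3 Mark2 +Mark1"
print(optimal([["Mark2"], ["Mark3"]], ORDER))
print(optimal([[], ["Mark1"]], ORDER))
```

With this order the sketch reproduces the five comparisons listed above, for example preferring one Mark2 over two, and two Mark1 over one.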

34
Difference with Theoretical OT
  • Theoretical OT: only dispreference marks
  • XLE OT
  • dispreference marks: Mark1
  • preference marks: +Mark1
  • NOTE: + is only indicated in the CONFIG;
    only the name (Mark1) appears in the grammar
  • Deciding which to use can be difficult

35
Example: PP ambiguities
  • John waited for Mary.
  • John waited for 3 hours.
  • Rule with OT marks, using the template
    OT(_mark) = _mark $ o::*.
      VP --> V
             (NP: (^ OBJ)=!)
             PP: { (^ OBL)=!
                   @(OT PPobl)
                 | ! $ (^ ADJUNCT)
                   @(OT PPadj) }.

36
Basic Structures
John waited for Mary (ADJUNCT analysis)
  f-str  PRED 'wait<SUBJ>'
         SUBJ [ PRED 'John' ]
         ADJ  { [ PRED 'for<OBJ>'
                  OBJ  [ PRED 'Mary' ] ] }
  o-str  { PPadj }
John waited for Mary (OBL analysis)
  f-str  PRED 'wait<SUBJ OBL>'
         SUBJ [ PRED 'John' ]
         OBL  [ PRED 'for<OBJ>'
                OBJ  [ PRED 'Mary' ] ]
  o-str  { PPobl }
37
Ranking for Example
  • Disprefer ADJUNCTs
  • OPTIMALITYORDER PPadj.
  • Problem: will disprefer adjuncts even when no OBL
    analysis is possible
  • Prefer OBLs
  • OPTIMALITYORDER +PPobl.
  • Problem: will prefer OBL even when the other
    analysis was not an ADJUNCT
  • Still probably better than dispreferring ADJUNCTs
  • Solution: local OT marks (not discussed here)

38
Special OT marks in XLE
  • Separate other marks into fields
  • Marks preceding
  • NOGOOD: remove parts of the grammar
  • for debugging or specializing
  • STOPPOINT: apply on a second pass
  • for extending grammar on failure
  • CSTRUCTURE: filter when the c-structure is built
  • for speed
  • There is lots of discussion in the XLE
    documentation; the reading on the web is a bit
    out of date for these marks

39
The NOGOOD Mark
  • OT marks can be used to remove parts of the
    grammar
  • rules or rule parts
  • templates or template parts
  • lexical items or parts of them
  • Use for
  • grammar adaptation/sharing
  • grammar development
  • Example
  • OPTIMALITYORDER FrontMatter NOGOOD.

40
NOGOOD Example
  • ROOT rule allows for front matter for special
    corpus
      ROOT --> (FR-MAT: (^ ID)=!
                        @(OT FrontMatter))
               S.
      FR-MAT --> NUMBER
                 (PERIOD).
  • Example input: 1. The light flashes.

41
FR-MAT
  • Grammars for corpora with front matter will not
    rank the OT mark FrontMatter
  • (unranked marks are neutral)
  • Grammars for corpora without front matter will
    make the OT mark a NOGOOD
  • OPTIMALITYORDER FrontMatter NOGOOD.
  • Effective ROOT rule: ROOT --> S.
  • Allows rule sharing across grammars
  • Can also be used for debugging
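
A minimal sketch of the effect of NOGOOD on a shared rule. It models the optional FR-MAT in ROOT as two alternatives and treats every mark listed before the NOGOOD keyword as removing its alternative; the data layout and function name are assumptions for illustration only.

```python
def effective_alternatives(alternatives, optimality_order):
    """Drop every rule alternative that carries a NOGOOD mark.

    alternatives: list of (description, set_of_ot_marks).
    optimality_order: names as written in the CONFIG; marks listed
    before the NOGOOD keyword are treated as NOGOOD.
    """
    if "NOGOOD" in optimality_order:
        cut = optimality_order.index("NOGOOD")
        nogood = set(optimality_order[:cut])
    else:
        nogood = set()
    return [desc for desc, marks in alternatives if not (marks & nogood)]


# ROOT with and without the optional front matter (toy representation).
root = [("FR-MAT S", {"FrontMatter"}), ("S", set())]

# Grammar for a corpus without front matter.
print(effective_alternatives(root, ["FrontMatter", "NOGOOD"]))
# Grammar that leaves the mark unranked.
print(effective_alternatives(root, []))
```

The same rule file serves both grammars; only the ranking line in the CONFIG differs.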

42
Robustness
  • What to do if the grammar doesn't provide an
    analysis?
  • Graceful failure
  • FRAGMENTs
  • Specific relaxations
  • Ungrammatical analysis only if no grammatical one
  • Avoid ungrammatical analyses in generation

43
Robustness: STOPPOINT
  • On first pass, STOPPOINT is treated as NOGOOD
  • Small, fast grammar for standard constructions
  • If first pass fails, ignore STOPPOINT and extend
    grammar
  • Relaxation possibilities precede STOPPOINT
  • OPTIMALITYORDER BadDetNAgr STOPPOINT.

44
STOPPOINT Mark example
  • Example: [NP this boy], [NP this boys]
  • Template call with OT mark
      DEMON(_P _N) = (^ SPEC PRED)='_P'
                     { (^ NUM)=c _N
                     | (^ NUM)~=_N
                       @(OT BadDetNAgr) }.
  • Lexical entry
      this DET XLE @(DEMON %stem sg).
  • Ranking
  • OPTIMALITYORDER BadDetNAgr STOPPOINT.

45
Structures for STOPPOINT example
[NP this boys]
  f-str  PRED 'boy'
         NUM  pl
         SPEC [ PRED 'this' ]
  o-str  { BadDetNAgr }
[NP this boy]
  f-str  PRED 'boy'
         NUM  sg
         SPEC [ PRED 'this' ]
  o-str  { }
  • Parsing "this boys" will be slow: the grammar
    has to parse a second time
  • But the ungrammatical input gets a parse
  • Only put OT marks behind the STOPPOINT
    if they will be rarely triggered
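
The two-pass behaviour can be sketched as follows. This is a toy model over precomputed candidate analyses, covering dispreference marks only; the real parser re-runs the grammar rather than filtering a list, and the names used here are hypothetical.

```python
def parse_with_stoppoint(candidates, stoppoint_marks):
    """Two-pass selection over candidate analyses.

    candidates: list of (analysis, set_of_ot_marks).
    Pass 1 treats marks behind STOPPOINT as NOGOOD; pass 2 runs only if
    pass 1 finds nothing, and then allows them.
    Returns (analyses, number_of_passes).
    """
    first = [a for a, marks in candidates if not (marks & stoppoint_marks)]
    if first:
        return first, 1
    return [a for a, _ in candidates], 2


STOPPOINT_MARKS = {"BadDetNAgr"}

# Toy candidate sets for the two NPs on the slide.
this_boy = [("this boy: NUM sg", set())]
this_boys = [("this boys: NUM pl", {"BadDetNAgr"})]

print(parse_with_stoppoint(this_boy, STOPPOINT_MARKS))
print(parse_with_stoppoint(this_boys, STOPPOINT_MARKS))
```

The grammatical NP is found on the first pass; the ungrammatical one costs a second pass, which is why such marks should be rarely triggered.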

46
Preference marks and STOPPOINT
  • Preference marks behind the STOPPOINT are tried
    first (counter to intuition)
  • OPTIMALITYORDER +MWE STOPPOINT.
  • Use MWE readings if at all possible
  • If fail, do a second pass with the analytic
    (non-MWE) structure (inefficient if fail)
  • Example
  • print quality N @(NOUN STEM) @(OT MWE).
  • The [N print quality] is excellent.
  • I want to [V print] [NP quality documents].

47
CSTRUCTURE Marks
  • Apply marks before f-structure constraints are
    processed
  • OPTIMALITYORDER NoCloseQuote Guessed CSTRUCTURE.
  • Improve performance by filtering early
  • May lose some analyses
  • coverage/efficiency tradeoff

48
CSTRUCTURE example: Guessed
  • Only use guessed form if another form is not
    found in the morphology/lexicon
  • OPTIMALITYORDER Guessed CSTRUCTURE.
  • Trade-off: lose some parses, but much faster
  • The foobar is good.
  • no entry for foobar => parse with guessed N
  • The audio is good.
  • audio only A in morphology => no parse
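
The trade-off can be illustrated with a toy lexical lookup. The morphology table and the label N(guessed) are assumptions for the sketch; in XLE the filtering happens on c-structure marks rather than in a lookup function.

```python
def lexical_categories(word, morphology, allow_guess_when_known=False):
    """Return the categories tried for a word.

    morphology maps known words to their categories. The guessed noun
    reading is used only when the word is unknown, unless the filter is
    switched off with allow_guess_when_known=True.
    """
    known = morphology.get(word, [])
    if known and not allow_guess_when_known:
        return list(known)
    return list(known) + ["N(guessed)"]


MORPH = {"audio": ["A"], "good": ["A"]}  # toy morphology, an assumption

print(lexical_categories("foobar", MORPH))  # unknown word: guessed N
print(lexical_categories("audio", MORPH))   # known as A only: no guessed N
```

With the filter on, "audio" never receives the noun reading that "The audio is good" needs, which is the lost parse mentioned above; switching the filter off restores it at the cost of more candidates for every known word.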

49
CSTRUCTURE example: Quote
  • Only allow unbalanced quote marks if there is no
    other quote mark
  • Then I left." vs. He said, "they
    appeared."
  • METARULEMACRO
  • _CAT QT @(OT NoCloseQt)
  • XLE only tries balanced version, not double
    unbalanced version
  • failure when really needed two unbalanced quotes

50
Combining the OT marks
  • All the types of OT marks can be used in one
    grammar
  • ordering of NOGOOD, CSTRUCTURE, STOPPOINT is
    important
  • Example
      OPTIMALITYORDER
        Verbmobil NOGOOD
        Guessed CSTRUCTURE
        MWE Fragment STOPPOINT
        RareForm StrandedP Obl.

51
Other Features
  • Grouping: have marks treated as being of equal
    importance
  • OPTIMALITYORDER (Paren Appositive) Adjunct.
  • Ungrammatical markup: have XLE report analyses
    with this mark with a *
  • these are treated like any dispreference mark for
    determining the optimal analyses
  • OPTIMALITYORDER *NoDetAgr STOPPOINT.

52
Generation
  • XLE uses the same basic grammar to parse and
    generate
  • Do not always want to generate all the
    possibilities that can be parsed
  • Put in special OT marks for generation to block
    or prefer certain strings
  • fix up bad subject-verb agreement
  • only allow certain adverb placements
  • control punctuation options
  • GENOPTIMALITYORDER

53
OT Marks: Main points
  • Ambiguity: broad coverage results in ambiguity;
    OT marks allow preferences
  • Robustness: want fall back parses only when
    regular parses fail; OT marks allow multipass
    grammar
  • XLE provides for complex orderings of OT marks
  • NOGOOD, CSTRUCTURE, STOPPOINT
  • preference, dispreference, ungrammatical
  • see the XLE documentation for details

54
FRAGMENT grammar
  • What to do when the grammar does not get a parse
  • always want some type of output
  • want the output to be maximally useful
  • Why might it fail
  • construction not covered yet
  • "bad" input
  • took too long (XLE parsing parameters)

55
Grammar engineering approach
  • First try to get a complete parse
  • If fail, build up chunks that get complete parses
    (c-str and f-str)
  • Have a fall back for things without even chunk
    parses
  • Link these chunks and fall backs together in a
    single f-structure

56
Basic idea
  • XLE has a REPARSECAT which it tries if there is
    no complete parse
  • Grammar writer specifies what category the
    possible chunks are
  • OT marks are used to
  • build the fewest chunks possible
  • disprefer using the fall back over the chunks
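
A sketch of what these two preferences amount to, written as a small dynamic program. The cost (number of chunks, then number of TOKEN fall backs) and the table of parsable spans are assumptions of the sketch; XLE expresses the same preferences through OT marks on the fragment rules.

```python
def best_chunking(tokens, parsable):
    """Cover tokens with the fewest chunks, using TOKEN as a last resort.

    parsable maps (start, end) spans to a category with a complete parse.
    Cost is (number of chunks, number of TOKEN fall backs), compared
    lexicographically; this weighting is an assumption of the sketch.
    """
    n = len(tokens)
    # best[i] = (cost, chunks) for the suffix tokens[i:]
    best = {n: ((0, 0), [])}
    for i in range(n - 1, -1, -1):
        options = []
        for j in range(i + 1, n + 1):
            rest_cost, rest_chunks = best[j]
            text = " ".join(tokens[i:j])
            if (i, j) in parsable:
                cost = (rest_cost[0] + 1, rest_cost[1])
                options.append((cost, [(parsable[(i, j)], text)] + rest_chunks))
            elif j == i + 1:
                # No parse for this single word: fall back to TOKEN.
                cost = (rest_cost[0] + 1, rest_cost[1] + 1)
                options.append((cost, [("TOKEN", text)] + rest_chunks))
        best[i] = min(options, key=lambda option: option[0])
    return best[0][1]


tokens = ["the", "the", "dog", "appears"]
spans = {(1, 3): "NP", (1, 4): "S"}  # spans assumed to get complete parses
print(best_chunking(tokens, spans))
```

For "the the dog appears" this picks one TOKEN for the stray "the" followed by a single S chunk, rather than an NP plus a TOKEN or four separate TOKENs.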

57
Sample output
  • the the dog appears.
  • Split into
  • "token" the
  • sentence "the dog appears"
  • ignore the period

58
C-structure
59
F-structure
60
How to get this
FRAGMENTS --> { NP: (^ FIRST)=!
                    @(OT-MARK Fragment)
              | S: (^ FIRST)=!
                   @(OT-MARK Fragment)
              | TOKEN: (^ FIRST)=!
                       @(OT-MARK Fragment) }
              (FRAGMENTS: (^ REST)=! ).
Lexicon:
-token TOKEN * (^ TOKEN)=%stem
               @(OT-MARK Token).
61
Why First-Rest?
  • FIRST-REST
      [ FIRST [ PRED ]
        REST  [ FIRST [ PRED ]
                REST ] ]
  • Efficient
  • Encodes order
  • Possible alternative: set
      { [ PRED ]
        [ PRED ] }
  • Not as efficient (copying)
  • Even less efficient if mark scope facts
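
The FIRST-REST encoding is the familiar linked-list layout. A minimal sketch with plain dictionaries standing in for f-structures (the chunk contents and function names are placeholders):

```python
def to_first_rest(chunks):
    """Encode an ordered list of chunk f-structures as nested FIRST/REST."""
    structure = None
    for chunk in reversed(chunks):
        node = {"FIRST": chunk}
        if structure is not None:
            node["REST"] = structure
        structure = node
    return structure


def from_first_rest(structure):
    """Read the chunks back in their original order."""
    chunks = []
    while structure is not None:
        chunks.append(structure["FIRST"])
        structure = structure.get("REST")
    return chunks


chunks = [{"TOKEN": "the"}, {"PRED": "appear<SUBJ>"}]
nested = to_first_rest(chunks)
print(nested)
```

Order is recoverable by walking REST, which a plain set of chunk f-structures would not give without extra features.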

62
Accuracy?
  • Evaluation against gold standard
  • PARC 700 f-structure bank for Wall Street
    Journal
  • Measure F-score on dependency triples
  • F-score: harmonic mean of precision and recall
  • Dependency triples: separate f-structure
    features
  • Subj(run, dog) Tense(run, past)
  • Results for best-matching f-structure
  • Full parses: F = 88.5
  • Fragment parses: F = 76.7

(Riezler et al, 2002)
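
A minimal sketch of scoring dependency triples, assuming the usual definition of the F-score as the harmonic mean of precision and recall. The predicted triples are made up for illustration and are not taken from the evaluation.

```python
def f_score(gold, predicted):
    """Precision, recall and F-score over sets of dependency triples."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {("Subj", "run", "dog"), ("Tense", "run", "past")}
predicted = {("Subj", "run", "dog"), ("Tense", "run", "pres")}
print(f_score(gold, predicted))
```

Here one of two predicted triples matches one of two gold triples, so precision, recall and F-score all come out at 0.5.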
63
Fragments summary
  • XLE has a chunking strategy for when the grammar
    does not provide a full analysis
  • Each chunk gets full c-str and f-str
  • The grammar writer defines the chunks based on
    what will be best for that grammar and
    application
  • Quality
  • Fragments have reasonable but degraded f-scores
  • Usefulness in applications is being tested
