Title: A Probabilistic Model of Lexical and Syntactic Access and Disambiguation
1. A Probabilistic Model of Lexical and Syntactic Access and Disambiguation
Daniel Jurafsky, 1995
Presented by Connor Stroomberg, 13-12-2006
2. Introduction
- Access: retrieving linguistic structure from some mental grammar.
- Disambiguation: choosing among combinations of structures to correctly parse ambiguous linguistic input.
3. Previous models
- Try to solve the problem by a divide-and-conquer approach:
  - Model either access or disambiguation.
  - Model either the lexical or the syntactic level.
4. The proposed parser
- Parallel.
- Built on a standard dynamic-programming (chart) parsing architecture.
- Access and disambiguation are implemented as a set of pruning heuristics.
5. Why pruning?
- Dynamic programming can be used to do syntactic parsing efficiently.
- Interpretation may not be as efficient.
- Both access and disambiguation pruning are based on probability ranking.
6. Grammar
- Each grammatical construction is a sign.
- A sign is a form-meaning pair, represented by typed, unification-based context-free rules.
- Four assumptions are made:
  - The representation of constituent-structure rules as mental objects.
  - A uniform context-free model of lexical, morphological, and syntactic rules.
  - Valence expectations on lexical heads.
  - A lack of empty categories.
7. Representational uniformity
- No distinction between lexicon, morphology, and syntax.
- Each is represented by an augmented context-free rule.
8. Augmentations
- Each construction is augmented with two types of probabilities:
  1. A prior probability (resting activation).
  2. Probabilities for each case where a construction expresses some linguistic expectation.
- Valence:
  - Obligatory arguments have a probability of 1.
  - Optional arguments have probabilities between 0 and 1.
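The prior and valence augmentations can be sketched as data. This is a minimal illustration only; the class, the lexical entries, and all probability values are hypothetical, not taken from the paper:

```python
# Sketch of constructions augmented with a prior probability (resting
# activation) and valence expectation probabilities. Numbers are invented.
from dataclasses import dataclass, field

@dataclass
class Construction:
    name: str
    prior: float                                  # P(construction), resting activation
    valence: dict = field(default_factory=dict)   # argument label -> expectation prob.

# Hypothetical verb entries: obligatory arguments get probability 1,
# optional arguments a value strictly between 0 and 1.
discuss = Construction("discuss", prior=0.0008,
                       valence={"NP-object": 1.0,      # obligatory
                                "PP-locative": 0.1})   # optional, rarely expected
keep = Construction("keep", prior=0.0012,
                    valence={"NP-object": 1.0,
                             "PP-locative": 0.95})     # optional, strongly expected
```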
9. Access
- Traditional psycholinguistic models were serial.
  - Motivation: garden-path sentences.
- Traditional computational models were parallel.
  - Due to dynamic programming.
- The proposed model is a parallel model based on dynamic programming, but is still able to model the garden-path effect.
10. Examples of previous models
11. Problems with previous models
- Bottom-up and top-down models have been discussed.
- Timing and frequency effects.
- Proposed solution: a "key" or "clue".
  - Problem 1: each item needs to be annotated with a key or clue.
  - Problem 2: context effects.
12. The probabilistic model
- For each construction, compute the conditional probability given the evidence.
- Evidence may be syntactic, semantic, or lexical, and bottom-up or top-down.
- Constructions are accessed according to a beam search.
- The beam width is a universal constant of a grammar.
13. Evidence
- Top-down evidence (e): P(c | e), the probability that the evidence construction e left-expands to construction c.
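One step of this left-expansion relation can be sketched with a toy probabilistic CFG. The grammar, rules, and probabilities below are invented for illustration; the paper's relation is the closure over whole derivations, while this shows only a single expansion step:

```python
# Toy probabilistic grammar: lhs -> list of (rhs tuple, rule probability).
RULES = {
    "S":  [(("NP", "VP"), 0.9), (("VP",), 0.1)],
    "NP": [(("Det", "N"), 0.6), (("N",), 0.4)],
}

def left_expansion_prob(e, c):
    """Probability that construction e directly expands with c as its
    leftmost daughter: the sum of probabilities of all such rules."""
    return sum(p for rhs, p in RULES.get(e, []) if rhs and rhs[0] == c)

left_expansion_prob("S", "NP")   # -> 0.9
left_expansion_prob("S", "VP")   # -> 0.1 (via the unary S -> VP rule)
```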
14. Evidence
- Combining the top-down and the bottom-up evidence is complex.
- A simplifying assumption can be made: the top-down and bottom-up evidence can only affect each other through c, i.e. they are treated as conditionally independent given c.
15. Simplifying evidence
- Since ratios are compared, the denominator may be dropped.
- However, psycholinguistic studies suggest this is too simplistic.
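Dropping the denominator works because only the ratio between candidates matters for pruning. A minimal sketch, with invented numbers, of why the shared P(evidence) term in Bayes' rule cancels:

```python
# Since every candidate construction is scored against the same evidence,
# the shared denominator P(e) of Bayes' rule can be dropped: the *ratio*
# between candidates is unchanged.
def unnormalized_posterior(prior, likelihood):
    """P(c | e) is proportional to P(c) * P(e | c)."""
    return prior * likelihood

# Two competing constructions given the same evidence (illustrative values).
score_a = unnormalized_posterior(prior=0.02, likelihood=0.5)   # 0.010
score_b = unnormalized_posterior(prior=0.05, likelihood=0.1)   # 0.005
ratio = score_a / score_b   # same ratio with or without dividing by P(e)
```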
16. Choosing constructions
- Constructions have probabilities; how do we choose between them?
- Pruning: a relative beam search.
- Prune any construction more than a constant factor worse than the best construction.
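Relative beam pruning can be sketched in a few lines. The candidate names and probabilities below are illustrative assumptions, not values from the paper:

```python
# Relative beam pruning: keep only candidates within a constant factor
# (the beam width) of the best candidate's probability.
def beam_prune(candidates, beam_width):
    """candidates: list of (name, probability) pairs.
    Keeps every candidate with p >= beam_width * best probability."""
    best = max(p for _, p in candidates)
    return [(name, p) for name, p in candidates if p >= beam_width * best]

candidates = [("main-verb", 0.80), ("reduced-relative", 0.20), ("noun-noun", 0.01)]
kept = beam_prune(candidates, beam_width=1/5)
# threshold = 0.16: "noun-noun" (0.01) is pruned, the other two survive.
```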
17. Advantages of the model
- The model is able to account for a number of psycholinguistic results.
- Lexical items with a higher frequency will have a higher probability.
- This implies that access of a construction will be inversely proportional to the probability of the evidence.
18. Disambiguation
- Natural language is ambiguous. Examples:
- Preference:
  - "The woman discussed the dogs on the beach."
    - The woman discussed the dogs which were on the beach. (90%)
    - The woman discussed them (the dogs) while on the beach. (10%)
  - "The woman kept the dogs on the beach."
    - The woman kept the dogs which were on the beach. (5%)
    - The woman kept them (the dogs) while on the beach. (95%)
- Garden path: "The horse raced past the barn fell."
- Gap-filling / valence ambiguities.
19. Serial or parallel disambiguation
- Garden-path effect:
  - A serial explanation needs an additional heuristic.
  - Word-based window, 3-constituent window.
- A parallel parser uses pruning.
  - Using probability as the pruning metric, the most coherent interpretation is chosen.
20. Modeling preference effects
21. Modeling preference effects
22. Modeling the garden-path effect
- Requires showing that the theory predicts pruning in a case of ambiguity whenever the garden-path effect can be shown to occur.
- This requires setting an appropriate beam width:
  - Set the beam too wide, and garden-path sentences will be mislabeled as merely less-preferred interpretations.
  - Set the beam too narrow, and parsable sentences will be mislabeled as garden-path sentences.
- Jurafsky suggests a beam width of 1/5.
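With a fixed beam width, the garden-path prediction reduces to a ratio test: the dispreferred (but ultimately correct) parse is pruned when it falls outside the beam. The probability values below are illustrative assumptions, not the paper's estimates:

```python
# A sentence is predicted to garden-path when the dispreferred parse falls
# outside the relative beam around the preferred parse.
BEAM_WIDTH = 1 / 5   # the width Jurafsky suggests

def garden_paths(p_preferred, p_dispreferred):
    """True if the dispreferred parse is pruned (garden path predicted)."""
    return p_dispreferred < BEAM_WIDTH * p_preferred

# "The horse raced past the barn fell": the reduced-relative reading is far
# less probable than the main-verb reading, so it gets pruned (assumed values).
garden_paths(0.92, 0.08)   # pruned -> garden path predicted
garden_paths(0.60, 0.40)   # within the beam -> both readings survive
```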
23. Modeling the garden-path effect
- "The complex houses married and single students and their families."
24. Semantics in disambiguation
- We need semantics to explain:
  - "The teachers taught by the Berlitz method passed the test."
  - ?"The children taught by the Berlitz method passed the test."
- A possible solution: add semantics to the valence probabilities.
- This may also be used to model real-world knowledge:
  - "The view from the window would be improved by the addition of a plant out there."
  - "The view from the window would be destroyed by the addition of a plant out there."
25. Problems and future work
- The simplifying assumption.
- No discussion of morphology.
- No discussion of overload effects (center-embedding).
- Embedding in a connectionist framework.
26. Final thought
- The author sees probabilities not as a replacement for structure, but as an enrichment of structure.
27. Questions / Comments / Discussion