Data-Oriented Parsing

Transcript and Presenter's Notes
1
Data-Oriented Parsing
  • Remko Scha
  • Institute for Logic, Language and Computation
  • University of Amsterdam

2
  • Overview
  • The Big Picture (cognitive motivation)
  • A simple Data-Oriented Parsing model
  • Extended DOP models
  • Psycholinguistics revisited
  • Statistical considerations

3
  • Data-Oriented Parsing
  • The Big Picture

4
  • Data-Oriented Parsing
  • The Big Picture
  • (1) The key to understanding cognition is
  • understanding perception.

5
  • Data-Oriented Parsing
  • The Big Picture
  • (1) The key to understanding cognition is
  • understanding visual Gestalt perception.

6
  • Data-Oriented Parsing
  • The Big Picture
  • (1) The key to understanding cognition is
  • understanding visual Gestalt perception.
  • Conjecture: Language processing and "thinking"
    involve a metaphorical use of our Gestalt
    perception capability.
  • R. Scha: "Wat is het medium van het denken?"
    ["What is the medium of thought?"] In M.B. In 't
    Veld & R. de Groot: Beelddenken en begripsdenken:
    een paradox? ["Visual thinking and conceptual
    thinking: a paradox?"] Utrecht: Agiel, 2005.

7
  • Data-Oriented Parsing
  • The Big Picture
  • (1) The key to understanding cognition is
  • understanding visual Gestalt perception.
  • (2) All perceptual processes are based on
    detecting similarities and analogies with
    concrete past experiences.

8
The Data-Oriented World View
  • All interpretive processes are based on detecting
    similarities and analogies with concrete past
    experiences.
  • E.g.
  • Visual Perception
  • Music Perception
  • Lexical Semantics
  • Concept Formation.

9
E.g. The Data-Oriented Perspective on Lexical
Semantics and Concept Formation.
  • A concept = the extensional set of its
    previously experienced instances.
  • Classifying new input under an existing concept
    = judging the input's similarity to these
    instances.
  • Against:
  • Explicit definitions
  • Prototypes

10
The Data-Oriented Perspective on Lexical
Semantics and Concept Formation.
  • A concept = the extensional set of its
    previously experienced instances.
  • Classifying new input under an existing concept
    = judging the input's similarity to these
    instances.
  • Against:
  • Explicit definitions
  • Prototypes
  • Learning

11
  • Part II
  • Data-Oriented Parsing

12
  • Data-Oriented Parsing
  • Processing new input utterances in terms of their
    similarities and analogies with previously
    experienced utterances.

13
Language processing by analogy was already
proposed by "Bloomfield, Hockett, Paul, Saussure,
Jespersen, and many others". But: "To attribute
the creative aspect of language use to 'analogy'
or 'grammatical patterns' is to use these terms
in a completely metaphorical way, with no clear
sense and with no relation to the technical
usage of linguistic theory." (Chomsky,
1966)
14
Challenge: To work out a formally precise
notion of "language processing by analogy".
15
Challenge: To work out a formally precise
notion of "language processing by analogy". A
first step: Data-Oriented Parsing. Remember all
utterances with their syntactic tree-structures;
analyse new input by recombining fragments of
these tree structures.
16
Data-Oriented Parsing
  • Memory-based approach to syntactic parsing and
    disambiguation.
  • Basic idea: use the subtrees from a syntactically
    annotated corpus directly as a stochastic
    grammar.
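
As an illustration of the basic idea, here is a minimal Python sketch (all names illustrative, not from the slides) that extracts every fragment from one annotated corpus tree, under the standard DOP1 definition: a fragment keeps a node and, below each nonterminal, either cuts (leaving a bare substitution site) or continues; terminals are always kept.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List

@dataclass
class Tree:
    label: str                                    # e.g. "S", "NP", or a word
    children: List["Tree"] = field(default_factory=list)

def fragments(node: Tree) -> List[Tree]:
    """All fragments rooted at `node`: keep the node itself and, for each
    child, either cut (leaving a bare nonterminal as a substitution site)
    or continue with one of the fragments rooted at that child.
    Terminal children (no children of their own) are always kept."""
    options = []
    for child in node.children:
        if child.children:                        # nonterminal child
            options.append([Tree(child.label)] + fragments(child))
        else:                                     # terminal child
            options.append([Tree(child.label)])
    return [Tree(node.label, list(combo)) for combo in product(*options)]

def all_fragments(tree: Tree) -> List[Tree]:
    """The fragment collection of one corpus tree: fragments rooted
    at every internal node."""
    collected, stack = [], [tree]
    while stack:
        node = stack.pop()
        if node.children:
            collected.extend(fragments(node))
            stack.extend(node.children)
    return collected
```

Calling all_fragments on each corpus tree yields the kind of fragment collection shown on slide 21.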

17
Data-Oriented Parsing (DOP)
  • Simplest version: DOP1 (Bod, 1992).
  • Annotated corpus defines a Stochastic Tree
    Substitution Grammar.

18
Data-Oriented Parsing (DOP)
  • Simplest version: DOP1 (Bod, 1992).
  • Annotated corpus defines a Stochastic Tree
    Substitution Grammar.
  • (Slides adapted from Guy De Pauw,
  • University of Antwerp)

19
(No Transcript)
20
(No Transcript)
21
Fragment Collection
22
Generating "Peter killed the bear."
Note one parse has many derivations!
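
The recombination step itself can be sketched in a few lines, reusing the Tree class from the sketch above (the uppercase-initial test for nonterminals is a simplifying assumption of this sketch, not part of DOP): each step substitutes the next fragment at the leftmost open substitution site, which is why one parse tree typically has many distinct derivations.

```python
def substitute(t1: Tree, t2: Tree) -> Tree:
    """Return a copy of t1 with t2 substituted at the leftmost open
    substitution site: a childless node whose label matches t2's root
    and looks like a nonterminal (crude uppercase-initial test)."""
    done = False
    def walk(node: Tree) -> Tree:
        nonlocal done
        if (not done and not node.children
                and node.label == t2.label and node.label[:1].isupper()):
            done = True
            return t2
        return Tree(node.label, [walk(c) for c in node.children])
    result = walk(t1)
    if not done:
        raise ValueError(f"no open site labelled {t2.label!r} in t1")
    return result
```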
23
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Probability of a Derivation
  • Product of the Probabilities of the Subtrees

24
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Probability of a Derivation
  • Product of the Probabilities of the Subtrees
  • Probability of a Parse
  • Sum of the Probabilities of its Derivations

25

Example derivation for "Van Utrecht naar
Leiden."
26
Probability of substituting a subtree t_i on a node:
the number of occurrences of t_i, divided by the
total number of occurrences of subtrees t with the
same root node label as t_i:

  P(t_i) = |t_i| / |{ t : root(t) = root(t_i) }|

Probability of a derivation t_1 ∘ ... ∘ t_n: the
product of the probabilities of the substitutions
that it involves:

  P(t_1 ∘ ... ∘ t_n) = ∏_i |t_i| / |{ t : root(t) = root(t_i) }|

Probability of a parse tree: the sum of the
probabilities of all derivations of that parse tree:

  P(T) = ∑_i ∏_j |t_ij| / |{ t : root(t) = root(t_ij) }|
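
The three formulas can be transcribed almost directly into code. A hedged sketch, assuming fragments are given in some hashable form (e.g. bracketed strings) together with a root-label accessor:

```python
from collections import Counter
from math import prod

def dop1_probabilities(fragment_occurrences, root):
    """P(t) = |t| / |{t' : root(t') = root(t)}|, computed from a multiset
    of fragment occurrences (one entry per occurrence in the corpus)."""
    counts = Counter(fragment_occurrences)
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[root(frag)] += c
    return {f: c / root_totals[root(f)] for f, c in counts.items()}

def derivation_probability(derivation, P):
    """P(t1 ∘ ... ∘ tn) = product of the fragment probabilities."""
    return prod(P[t] for t in derivation)

def parse_probability(derivations_of_parse, P):
    """P(parse) = sum over all derivations that yield the parse."""
    return sum(derivation_probability(d, P) for d in derivations_of_parse)
```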
27
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Probability of a Derivation
  • Product of the Probabilities of the Subtrees
  • Probability of a Parse
  • Sum of the Probabilities of its Derivations
  • Disambiguation: Choose the Most Probable
    Parse-tree

28
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Q. Does this work?

29
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Q. Does this work?
  • A. Yes. Experiments on a small fragment of the
    ATIS corpus gave very good results. (Bod's
    dissertation, 1995.)

30
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Q. Do we really need all fragments?

31
An annotated corpus defines a Stochastic Tree
Substitution Grammar
  • Q. Do we really need all fragments?
  • A. Experiments on the ATIS corpus

32
Experiments on a small subset of the ATIS corpus.

  max tree-depth ↓ \ max words →   1    2    3    4    6    8  unlimited
  1                               47   47    –    –    –    –    –
  2                               65   68   68   68    –    –    –
  3                               74   76   79   79   79   79   79
  4                               75   79   81   83   83   83   83
  5                               77   80   83   83   83   85   84
  6                               75   80   83   83   83   87   84

Parse accuracy (in %) as a function of the maximum
number of lexical items and the maximum tree-depth
of the fragments.
33
Beyond DOP1
  • Computational issues
  • Linguistic issues
  • Psycholinguistic issues
  • Statistical issues

34
Computational issues, Part 1: the good news
  • TSG parsing can be based on the techniques of
    CFG-parsing, and inherits some of their
    properties.
  • Semi-ring algorithms are applicable for many
    useful purposes

35
Computational issues, Part 1: the good news
  • Semi-ring algorithms are applicable for many
    useful purposes. In time O(n³) in the sentence
    length, we can:
  • Build a parse-forest.
  • Compute the Most Probable Derivation.
  • Select a random parse.
  • Compute a Monte-Carlo estimation of the Most
    Probable Parse.
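
A sketch of the last point (the sample_derivation helper is an assumption, standing in for the O(n³) sampler over the parse forest): sample derivations with their DOP probabilities and let the most frequently resulting parse estimate the Most Probable Parse.

```python
from collections import Counter

def monte_carlo_mpp(sample_derivation, n_samples: int = 1000):
    """Estimate the Most Probable Parse by sampling. `sample_derivation`
    is assumed to draw one random derivation from the parse forest with
    its DOP probability and return the parse tree it yields (in some
    hashable form, e.g. a bracketed string)."""
    tally = Counter(sample_derivation() for _ in range(n_samples))
    parse, count = tally.most_common(1)[0]
    return parse, count / n_samples
```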

36
Computational issues, Part 2: the bad news
  • Computing the Most Probable Parse is NP-complete
    (Sima'an). (Not a semi-ring algorithm.)
  • The grammar gets very large.

37
Computational issues, Part 3: solutions
  • Non-probabilistic DOP: choose the shortest
    derivation. (De Pauw, 1997; more recently, good
    results by Bod on the WSJ corpus.)
  • Compress the fragment set (using Minimum
    Description Length; Van der Werff, 2004).
  • Rig the probability assignments so that the Most
    Probable Derivation becomes applicable.

38
  • Linguistic issues

39
  • More powerful models:
  • Kaplan & Bod: LFG-DOP (based on
    Lexical-Functional Grammar)
  • Hoogweg: TIG-DOP (based on Tree-Insertion
    Grammar; cf. Tree-Adjoining Grammar)
  • Sima'an: the Tree-Gram model (Markov processes
    on sister nodes, conditioned on lexical heads)

40
  • Linguistic issues: future work

41
  • Linguistic issues: future work
  • Scha (1990), about an imagined future DOP
    algorithm:
  • It will be especially interesting to find out how
    such an algorithm can deal with complex syntactic
    phenomena such as "long distance movement". It is
    quite possible that an optimal matching algorithm
    does not operate exclusively on constructions
    which occur explicitly in the surface-structure;
    perhaps "transformations" (in the classical
    Chomskyan sense) play a role in the parsing
    process.

42
  • Transformations
  • "John likes Mary."
  • "Mary is liked by John."
  • "Does John like Mary?"
  • "Who does John like?"
  • "Who do you think John likes?"
  • "Mary is the girl I think John likes."

43
  • Transformations
  • Wh-movement, Passivization, Topicalization,
    Fronting, Scrambling, ...?
  • Move-α?

44
Psycholinguistics Revisited
45
Psycholinguistic Considerations
  • DOP is a performance model
  • DOP defines syntactic probabilities of sentences
    and their analyses
  • (against the background of a weak, overgenerating
    competence grammar: the definition of all
    formally possible sentence annotations).

46
Psycholinguistic Considerations
  • Does DOP account for performance phenomena?

47
Psycholinguistic Considerations
  • Probabilistic Disambiguation
  • Psychological experiments consistently show that
    disambiguation preferences correlate with
    occurrence frequencies.

48
Psycholinguistic Considerations
  • The "Garden Path" Phenomenon
  • "The horse raced past the barn "

49
Psycholinguistic Considerations
  • The "Garden Path" Phenomenon
  • "The horse raced past the barn fell."

50
Psycholinguistic Considerations
  • The "Garden Path" Phenomenon
  • "The horse raced past the barn fell."
  • Plausible model: an incremental version of DOP.
  • An analysis with very high probability kills
    analyses with low probability.

51
Psycholinguistic Considerations
  • Utterance Generation
  • Cf. Kempen et al. (Leiden University):
  • (Non-probabilistic) generation mechanism which
    combines tree fragments at random.

52
Psycholinguistic Considerations
  • Grammaticality Judgements
  • Cf. Stich: priming of grammaticality judgements.
  • Plausible model: DOP with a "recency effect".

53
Psycholinguistic Considerations
  • Integration with semantics
  • Cf. "Compositional Semantics" (Montague).
  • Assume semantically annotated corpus.Cf. Van den
    Berg et al.
  • Factoring in the probabilities of semantic
    subcategories Cf. Bonnema.

54
Psycholinguistic Considerations
  • Language dynamics
  • Grammar as an "emergent phenomenon": its
    development to be explained in terms of
    underlying, more detailed, possibly
    incommensurable phenomena.

55
Psycholinguistic Considerations
  • Dynamics
  • E.g., physics:
  • Thermodynamics: describes the relations between
    temperature, pressure, volume and entropy (in
    equilibrium situations).
  • Statistical thermodynamics explains this in terms
    of movements of molecules. (And movements of
    molecules also account for non-equilibrium
    situations.)
  • E.g., biology:
  • The Theory of Evolution

56
  • "Doesn't every science live on this paradoxical
    slope to which it is doomed by the evanescence of
    its object in the very process of its
    apprehension, and by the pitiless reversal this
    dead object exerts on it?"
  • Baudrillard, 1983

57
Psycholinguistic Considerations
  • Language Acquisition
  • Q. How does a child get its first corpus?

58
Psycholinguistic Considerations
  • Language Acquisition
  • Q. How does a child get its first corpus?
  • A. By bootstrapping pragmatic/semantic
    structures.

59
Psycholinguistic Considerations
  • Language Acquisition
  • Rule-based models which bootstrap the syntactic
    structures from perceived semantic relations:
  • Suggested by Schlesinger (1971, 1975)
  • Implemented by Chang & Maia (2001)
  • A data-oriented version of this:
  • Described by De Kreek (2003)

60
Psycholinguistic Considerations
  • Language Change
  • The data-oriented approach allows for gradual
    changes in parsing and generation preferences.
  • It allows language change within a lifetime.
    (Language change does not depend on
    misunderstandings between successive generations.)

61
Psychological Considerations
  • Perception Revisited
  • How to generalize DOP to visual and musical
    perception?

62
Psychological Considerations
  • Perception Revisited
  • How to generalize DOP to visual and musical
    perception?
  • How to represent visual and musical Gestalts in a
    formal way?
  • How to generalize DOP to arbitrary algebras?

63
Data-Oriented Parsing: Statistical Issues
64
  • Statistical problems
  • DOP1: Relative Frequency Estimation on the
    fragment set.
  • Bonnema et al. (1999):
  • The DOP1 estimator has strange properties: the
    largest trees in the corpus completely dominate
    the statistics.
  • Maximum Likelihood Estimation is not a viable
    alternative: MLE completely overfits the corpus.

65
In DOP1, the largest trees in the corpus
completely dominate the statistics.
The above treebank contains 7 fragments with root
label S, each with probability 1/7. For the
input string 'ab', parse (a) will thus receive
probability 3/7; parse (d) will receive
probability 4/7.
66
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Assume the above treebank, with equiprobable
initial rules S → X and S → A. The input string
'ab' will be analysed as a constituent of
category X, because of the relative improbability
of the fragments from (b).
67
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Assume a treebank with 999 binary trees of depth
five and 1 tree of depth six. Now 99.8% of the
probability mass will go to fragments from the
single tree of depth six.
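
This figure can be checked with a short calculation, under the assumption of perfect binary trees over nonterminals and counting only fragments rooted at the top node, with the usual cut-or-continue recursion:

```python
def a(depth: int) -> int:
    """Fragments rooted at the top of a perfect binary tree of the given
    depth: cut or continue independently below each of the two children,
    so a(d) = (1 + a(d-1))**2 with a(1) = 1."""
    return 1 if depth == 1 else (1 + a(depth - 1)) ** 2

mass_5 = 999 * a(5)                          # 999 trees of depth five
mass_6 = a(6)                                # the single tree of depth six
print(f"{mass_6 / (mass_5 + mass_6):.3f}")   # 0.998
```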
68
In DOP1, the largest trees in the corpus
completely dominate the statistics.
  • "Solution"
  • Heuristic constraints on tree-depth and number of
    terminals and non-terminals. E.g., Sima'an
    (1999)
  • Maximum of substitution sites (leaf
    non-terminals) 2.
  • Maximum of lexical items 9.
  • Maximum of consecutive lexical items 3.
  • Maximum tree-depth 4.
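
These constraints are easy to state as a filter. A sketch, reusing the Tree class from the earlier fragment sketch (the is_terminal predicate, and counting depth with the root at level 1, are assumptions of this sketch):

```python
def depth(t: Tree) -> int:
    """Depth of a fragment, counting the root as level 1."""
    return 1 if not t.children else 1 + max(depth(c) for c in t.children)

def frontier(t: Tree):
    """Frontier nodes (leaves) of a fragment, left to right."""
    return [t] if not t.children else [n for c in t.children
                                       for n in frontier(c)]

def keep_fragment(frag: Tree, is_terminal) -> bool:
    """Apply the four heuristic constraints listed above."""
    leaves = frontier(frag)
    sites = sum(1 for n in leaves if not is_terminal(n.label))
    lexical = sum(1 for n in leaves if is_terminal(n.label))
    longest_run = run = 0                    # longest run of lexical items
    for n in leaves:
        run = run + 1 if is_terminal(n.label) else 0
        longest_run = max(longest_run, run)
    return (sites <= 2 and lexical <= 9 and
            longest_run <= 3 and depth(frag) <= 4)
```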

69
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Needed: a different estimator.
  • Not a solution: Maximum Likelihood Estimation.
  • MLE completely overfits the corpus. The DOP
    grammar which maximizes the chance of generating
    the treebank assigns the following probabilities:
  • to every full corpus tree: its relative frequency
    in the corpus;
  • to every other fragment: zero.
  • (Reason: any mass given to smaller fragments also
    generates trees that are not in the treebank, and
    so lowers the treebank's likelihood.)

(Bonnema & Scha, 2003)
70
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Needed: a different estimator.
  • Bonnema et al. (1999): Treat every full
    corpus tree as the representation of a set of
    derivations.
  • If we assume a uniform probability distribution
    over this set of derivations, we arrive at the
    following "weighted relative frequency" estimate:
    a fragment τ with N(τ) non-root non-terminal
    nodes receives probability
  • P(τ) = 2^(−N(τ)) · F(τ)

71
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Needed: a different estimator.
  • Bonnema et al. (1999): Treat every full
    corpus tree as the representation of a set of
    derivations.
  • If we assume a uniform probability distribution
    over this set of derivations, we arrive at the
    following "weighted relative frequency" estimate:
    a fragment τ with N(τ) non-root non-terminal
    nodes receives probability
  • P(τ) = 2^(−N(τ)) · F(τ)
  • Sub-optimal assumption!
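
Read literally, the estimate is straightforward to compute. A sketch, reusing the earlier Tree class; F(τ) (the fragment's relative frequency) is taken as given, and is_terminal is an assumed predicate:

```python
def nonroot_nonterminals(frag: Tree, is_terminal) -> int:
    """N(τ): nonterminal nodes of the fragment, excluding its root."""
    def below(node: Tree) -> int:
        return sum(1 + below(c) for c in node.children
                   if not is_terminal(c.label))
    return below(frag)

def weighted_relative_frequency(frag: Tree, rel_freq: float,
                                is_terminal) -> float:
    """Literal reading of the slide's estimate: P(τ) = 2^(−N(τ)) · F(τ)."""
    return 2.0 ** (-nonroot_nonterminals(frag, is_terminal)) * rel_freq
```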

72
In DOP1, the largest trees in the corpus
completely dominate the statistics.
Needed: a different estimator.
  • Solutions:
  • Smoothing an overfitting estimator (Sima'an,
    Buratto).
  • Held-out estimation (Zollmann).

73
  • Smoothing
  • Good-Turing estimation: estimating the
    probability of unseen events on the basis of the
    number of observed once-occurring events,
    twice-occurring events, etc.
  • Back-off: cf. the sparse-data problem with
    trigram models: estimate the probabilities of
    unseen trigrams on the basis of the probabilities
    of their constituent bigrams and unigrams.
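
For reference, the raw textbook form of Good-Turing (not DOP-specific; in practice the n[r] counts are smoothed before use):

```python
from collections import Counter

def good_turing(counts: dict):
    """Raw Good-Turing: adjusted count r* = (r+1) * n[r+1] / n[r], and
    total probability mass n[1]/N reserved for unseen events."""
    n = Counter(counts.values())          # n[r] = number of events seen r times
    total = sum(counts.values())
    adjusted = {event: (r + 1) * n.get(r + 1, 0) / n[r]
                for event, r in counts.items()}
    p_unseen = n.get(1, 0) / total        # mass for events never observed
    return adjusted, p_unseen
```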

74
  • Held-out estimation
  • Get the fragment set from one part of the corpus
    and the probabilities from another part. Use ten
    different splits and take the average.
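
A sketch of this scheme (extract_fragments and estimate_frequencies are assumed helpers, not named in the slides):

```python
import random

def held_out_estimate(corpus, extract_fragments, estimate_frequencies,
                      n_splits: int = 10):
    """Average fragment probabilities over random halvings of the corpus:
    fragments come from one half, their frequencies from the other."""
    totals = {}
    for _ in range(n_splits):
        shuffled = random.sample(corpus, len(corpus))
        half = len(shuffled) // 2
        frags = extract_fragments(shuffled[:half])
        freqs = estimate_frequencies(frags, shuffled[half:])
        for frag, p in freqs.items():
            totals[frag] = totals.get(frag, 0.0) + p / n_splits
    return totals
```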

75
(No Transcript)
76
The Data-Oriented Perspective on Perlocutionary
Effect
  • "The effect of a lecture depends on the habits of
    the listener, because we expect the language to
    which we are accustomed."
  • Aristotle, Metaphysics II 12,13

77
Data-Oriented Parsing as a cognitive model
[Figure: corpus tree fragments for "every woman
loves a man": S, VP and NP nodes, with det → every,
N → woman, N → man, det → a, V → loves]
78
Data-Oriented Parsing as a cognitive model
[Figure: corpus tree fragments for "every woman
loves a man": S, VP and NP nodes, with det → every,
N → woman, N → man, det → a, V → loves]