1
  • Scaling Minimal Generalization
  • Bart Cramer, John Nerbonne
  • Lecture for the symposium Computing Phonology, Groningen, December 2006

2
Contents
  • The Past Tense discussion
  • A few old models
  • Minimal Generalization (MG)
  • Psycholinguistic results of MG
  • MG phonotactics
  • The (stochastic) n-gram algorithm
  • Modifications to adapt MG to phonotactics
  • Reducing the number of rules
  • Results
  • Conclusions
  • Future work

3
Past Tense discussion
  • Is inflectional morphology learned by a
    generative (i.e. a rule-based) model?
  • Example: the past tense in English
  • PRO: By far most of the verbs in English are inflected regularly. Also, in wug-word experiments, subjects show this preference. This position is adopted in, e.g., Pinker and Prince (1994).
  • CON: This is too explicit; these rules emerge from a more fine-grained approach, for example a connectionist approach (Rumelhart and McClelland, 1986) or an analogical approach (Skousen, 1989).
  • Compromise: a Words and Rules model, in which both approaches are blended (Pinker, 1998).

4
Minimal Generalization (Albright and Hayes, 2003)
  • Albright and Hayes (2003) try to counter the
    argument of the Words and Rules model, and claim
    that productive behavior emerges subsymbolically.
  • Their model extracts rules from the examples it perceives. These rules are actually hypotheses: they might prove wrong in other cases. Hence, the rules are stochastic.
  • The advantages:
  • It offers a nice balance between distributedness
    and explicitness.
  • It turns out to have predictive power in
    psycholinguistics.

5
Minimal Generalization (Albright and Hayes, 2003)
  • For all pairs of examples, identify the non-matching, shared, and identical parts.
  • Example:
  • shine-shined and consign-consigned
  • s: cons., obstruent, fricative, alveolar, voiceless
  • S: cons., obstruent, fricative, alveolar-palatal, voiceless
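A minimal sketch of this pairwise step, assuming a toy feature table (the feature sets and the generalize_segments helper below are illustrative, not the actual Albright and Hayes code): identical material is copied, and non-matching segments are collapsed into the intersection of their feature sets.

```python
# Toy feature table; "S" stands for the sh-sound, as on the slide.
FEATURES = {
    "s": {"cons", "obstruent", "fricative", "alveolar", "voiceless"},
    "S": {"cons", "obstruent", "fricative", "alveolar-palatal", "voiceless"},
}

def generalize_segments(a, b):
    """Identical parts stay; non-matching parts become their shared features."""
    if a == b:
        return a
    return FEATURES[a] & FEATURES[b]

# shine-shined vs consign-consigned: the initial fricatives differ,
# so the generalized rule keeps only the features they share.
print(generalize_segments("S", "s"))
# -> the shared features: cons, obstruent, fricative, voiceless
```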

6
Minimal Generalization (Albright and Hayes, 2003)
  • Each rule has a context and a consequence. Scope is the number of times the context occurs; hits is the number of times the consequence is predicted correctly.
  • The confidence is then computed as hits / scope. The rule with the highest confidence is selected.
  • Example: gleed → ?
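As a rough illustration of this scoring scheme (the Rule class and the counts below are made up, not taken from the paper), confidence is simply hits divided by scope, and the best-scoring applicable rule wins:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    description: str
    hits: int    # times the consequence was predicted correctly
    scope: int   # times the context occurred

    @property
    def confidence(self) -> float:
        return self.hits / self.scope

# Hypothetical competing rules for a wug form like "gleed":
rules = [
    Rule("add -ed (regular)", hits=900, scope=1000),
    Rule("ee -> e before d (bleed-bled pattern)", hits=12, scope=40),
]
best = max(rules, key=lambda r: r.confidence)
print(best.description, round(best.confidence, 2))  # the regular rule wins here
```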

7
Minimal Generalization (Albright and Hayes, 2003)
  • If a certain rule has a very high confidence, it
    is called an Island of Reliability (IOR).
  • When given a wug verb, the model will give
    ratings to a few inflections. Such a wug verb can
    belong to one of these four categories

8
Minimal Generalization (Albright and Hayes, 2003)
9
MG Phonotactics
  • In the past, there have been several attempts to
    model phonotactics.
  • The simplest of these is an n-gram model:
  • remember all combinations of phonemes seen before;
  • if there is a new combination in a probe word, reject the word;
  • otherwise, accept the word.
  • This model can be made stochastic by giving each n-gram a direction:
  • car → (ca → r)
  • context → core
  • In this model, an n-gram is accepted if its confidence is higher than a certain threshold t.
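The stochastic variant sketched in these bullets could look roughly as follows (a toy implementation under assumed details: the padding scheme, the tiny training set, and the threshold value are all placeholders):

```python
from collections import Counter

def train(words, n=3):
    """Count contexts (first n-1 symbols) and full n-grams."""
    contexts, ngrams = Counter(), Counter()
    for w in words:
        padded = "#" * (n - 1) + w + "#"
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            contexts[gram[:-1]] += 1   # context
            ngrams[gram] += 1          # context -> core
    return contexts, ngrams

def accept(word, model, n=3, t=0.05):
    """Accept the word only if every n-gram's confidence exceeds the threshold t."""
    contexts, ngrams = model
    padded = "#" * (n - 1) + word + "#"
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        conf = ngrams[gram] / contexts[gram[:-1]] if contexts[gram[:-1]] else 0.0
        if conf <= t:
            return False
    return True

model = train(["car", "care", "core", "scar"])
print(accept("car", model), accept("rcc", model))  # True False
```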

10
MG Phonotactics
  • The rules in the stochastic n-gram model are
    labelled phoneme-specific.
  • When Minimal Generalization is applied to all
    combinations of these rules, we obtain
    phoneme-general rules.
  • One modification: the core, or consequence, is also generalized over.
  • This is because, given a certain context, there are probably several valid combinations that can follow that context.
  • However, this results in quite a high number of
    rules.
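A sketch of what generalizing both the context and the core of two phoneme-specific rules might look like (the feature table below is a toy stand-in for the real CELEX-based feature system):

```python
FEATURES = {
    "b": {"cons", "obstruent", "stop", "labial", "voiced"},
    "g": {"cons", "obstruent", "stop", "velar", "voiced"},
    "r": {"cons", "sonorant", "liquid"},
    "l": {"cons", "sonorant", "liquid"},
    "a": {"vowel"},
}

def generalize_rule(rule1, rule2):
    """A rule is (context, core); generalize position by position via feature intersection."""
    (ctx1, core1), (ctx2, core2) = rule1, rule2
    context = tuple(FEATURES[x] & FEATURES[y] for x, y in zip(ctx1, ctx2))
    core = FEATURES[core1] & FEATURES[core2]
    return context, core

# The trigram rules from "brand" and "gland": "b r -> a" and "g l -> a"
print(generalize_rule((("b", "r"), "a"), (("g", "l"), "a")))
```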

11
MG Phonotactics
  • Summarizing, we are looking at three models of
    phonotactics
  • The n-gram model
  • The stochastic n-gram model
  • The Minimal Generalization model

12
MG Phonotactics
brand, gland
Apply MG: b r → a and g l → a
13
MG Phonotactics
  • Conclusion: MG is only feasible using the 3-grams from CELEX, treating all vowels the same.

14
MG Phonotactics
  • The evaluation of a candidate word is done as follows:
  • For each n-gram in the candidate word, a list of all matching rules is obtained, and the mean of their confidences is taken.
  • If one of these means is below the threshold, the word is rejected; this amounts to a veto system.
  • Preliminary results showed that the Island of Reliability approach does not work:
  • the very general rules have a very high confidence, but they accept very awkward combinations of vowels as well.
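A small sketch of this veto-style evaluation (the rule representation and the threshold below are invented for illustration):

```python
def evaluate(word_ngrams, rules, threshold=0.5):
    """rules: list of (matches(ngram) -> bool, confidence) pairs."""
    for gram in word_ngrams:
        confidences = [conf for matches, conf in rules if matches(gram)]
        # one failing n-gram vetoes the whole word
        if not confidences or sum(confidences) / len(confidences) < threshold:
            return False
    return True

# Toy rules matching on the last segment of a trigram:
rules = [
    (lambda g: g[2] in "aeiou", 0.9),
    (lambda g: g[2] == "n", 0.7),
    (lambda g: g[2] == "d", 0.6),
]
print(evaluate(["bra", "ran", "and"], rules))  # True: every n-gram clears the threshold
```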

15
MG Phonotactics
  • A few data sets were created:
  • P: all monosyllables from CELEX.
  • N1: negative data created by incorporating unigram frequency levels and the frequencies of onset and coda lengths.
  • N2: same as N1, but with the extra constraint that either the onset or the coda must be longer than 1.
  • N2 is simpler, but more justified, because N1 contains many examples that are actually phonotactically correct.
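The following sketch shows one way negative data along the lines of N1/N2 could be sampled; every distribution and symbol set below is a made-up placeholder, not the actual CELEX counts used in the talk:

```python
import random

ONSET_LENGTHS = {0: 10, 1: 60, 2: 25, 3: 5}   # length -> frequency
CODA_LENGTHS  = {0: 15, 1: 55, 2: 25, 3: 5}
CONSONANTS = {"p": 5, "t": 9, "k": 6, "s": 10, "r": 8, "l": 7, "n": 9}
VOWELS = {"a": 10, "e": 8, "i": 7, "o": 6}

def sample(freqs):
    return random.choices(list(freqs), weights=list(freqs.values()))[0]

def pseudo_monosyllable(require_cluster=False):
    """require_cluster=True mimics N2: the onset or the coda must be longer than 1."""
    while True:
        onset, coda = sample(ONSET_LENGTHS), sample(CODA_LENGTHS)
        if not require_cluster or onset > 1 or coda > 1:
            break
    return ("".join(sample(CONSONANTS) for _ in range(onset))
            + sample(VOWELS)
            + "".join(sample(CONSONANTS) for _ in range(coda)))

print([pseudo_monosyllable(require_cluster=True) for _ in range(5)])
```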

16
MG Phonotactics
  • The main problem we encountered was that a large
    part of the matching rules was not useful,
    because the level of generalization was not
    controlled.
  • Too general: consonant vowel → consonant (1.0).
  • Too many levels of abstraction, e.g. a p matches with:
  • Consonant
  • Consonant, Obstruent
  • Consonant, Obstruent, Stop
  • Consonant, Obstruent, Stop, Labial
  • Consonant, Obstruent, Stop, Labial, Voiceless
  • Overall, MG seems to be aimed at accepting positive examples, but it does less well at rejecting negative ones.

17
MG Phonotactics
  • Filter_spurious: filter out all rules for which an alternative rule exists that is both more general and has a higher confidence.
  • 13,977 → 2,169 rules
  • Filter_coverage: only keep rules for which a certain percentage of the predictions they make is actually covered by the learning set.
  • t_cov = 0.9: 13,977 → 6,939 rules
  • Filter_spurious and filter_coverage combined: 1,749 rules.
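In rough terms, the two filters could be implemented along these lines (the Rule representation, the generality test, and the demo numbers are all assumptions for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    context: frozenset            # feature description of the context
    confidence: float
    predicted: set = field(default_factory=set)   # strings the rule licenses

    def is_more_general_than(self, other):
        # fewer required features = a more general context
        return self.context < other.context

def filter_spurious(rules):
    """Drop rules that have a strictly more general rival with higher confidence."""
    return [r for r in rules
            if not any(o.is_more_general_than(r) and o.confidence > r.confidence
                       for o in rules)]

def filter_coverage(rules, lexicon, t_cov=0.9):
    """Keep a rule only if at least t_cov of its predictions occur in the learning set."""
    return [r for r in rules
            if r.predicted
            and sum(p in lexicon for p in r.predicted) / len(r.predicted) >= t_cov]

lexicon = {"brand", "gland"}
rules = [Rule(frozenset({"cons"}), 0.9, {"brand", "gland"}),
         Rule(frozenset({"cons", "stop"}), 0.7, {"brand", "bnand"})]
print(len(filter_spurious(rules)), len(filter_coverage(rules, lexicon)))  # 1 1
```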

18
MG Phonotactics
19
MG Phonotactics
CA = correct acceptance; CR = correct rejection
20
Conclusions
  • Applying Minimal Generalisation to a slightly more complex problem yields an opaque collection of rules and significantly inferior results.
  • Using filter_spurious was moderately successful: a serious reduction in the number of rules without a loss of performance.
  • Using filter_coverage was successful: the number of rules was reduced, while the number of errors was cut in half.
  • Filter_coverage is a gentle way to bootstrap
    negative information from positive data.

21
Future work
  • Currently, there is one model for all phonotactic
    data. Using different models for onset and coda
    might improve the performance.
  • Unifying the vowels was clearly a step forward, but it has not yet been tried on the n-gram model, which would make for a fairer comparison.
  • Once the model has become more accurate, test its behaviour against human behaviour.

22
Conclusions
  • Applying Minimal Generalisation to a slightly more complex problem yields an opaque collection of rules and significantly inferior results.
  • Using filter_spurious was moderately successful: a serious reduction in the number of rules without a loss of performance.
  • Using filter_coverage was successful: the number of rules was reduced, while the number of errors was cut in half.
  • Filter_coverage is a gentle way to bootstrap
    negative information from positive data.