1
Learning Transfer Rules for Machine Translation
with Limited Data
  • Thesis Defense
  • Katharina Probst
  • Committee
  • Alon Lavie (Chair)
  • Jaime Carbonell
  • Lori Levin
  • Bonnie Dorr, University of Maryland

2
Introduction (I)
  • Why has Machine Translation been applied to only
    a few language pairs?
  • Bilingual corpora are available for only a few
    language pairs (English-French, Japanese-English, etc.)
  • Natural Language Processing tools are available
    for only a few languages (English, German, Spanish,
    Japanese, etc.)
  • Scaling to other languages often difficult,
    time-consuming, and knowledge-intensive
  • What can we do to change this?

3
Introduction (II)
  • This thesis presents a framework for automatic
    inference of transfer rules
  • Transfer rules capture syntactic and
    morphological mappings between languages
  • Learned from a small, word-aligned training corpus
  • Rules are learned for unbalanced language pairs,
    where more data and tools are available for one
    language (L1) than for the other (L2)

4
Training Data Example
  • SL: the widespread interest in the election
  • TL: h niin h rxb b h bxirwt
  •     (gloss: the interest the widespread in the election)
  • Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
  • Type: NP
  • Parse: (NP (DET the-1)
  •         (ADJ widespread-2) (N interest-3)
  •         (PP (PREP in-4)
  •          (NP (DET the-5)
  •           (N election-6))))

[Tree diagram: NP → DET ADJ N PP ("the widespread interest" + PP); PP → PREP NP ("in" + NP); NP → DET N ("the election")]
5
Transfer Rule Formalism
  • L2: h niin h rxb b h bxirwt
  • L1: the widespread interest in the election
  • NP::NP
  • ["h" N "h" Adj PP] -> ["the" Adj N PP]
  • ((X1::Y1) (X2::Y3)
  •  (X3::Y1) (X4::Y2)
  •  (X5::Y4)
  •  ((Y3 num) = (X2 num))
  •  ((X2 num) = sg)
  •  ((X2 gen) = m))

(Annotated parts: training example, rule type, component sequences, component alignments, agreement constraints, value constraints.)
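Read as a data structure, a transfer rule bundles exactly these parts. Below is a minimal illustrative sketch in Python; the class and field names are ours, not the thesis's implementation:

    from dataclasses import dataclass, field

    @dataclass
    class TransferRule:
        """One transfer rule: a context-free backbone plus constraints."""
        rule_type: tuple        # e.g. ("NP", "NP")
        l2_seq: list            # e.g. ['"h"', 'N', '"h"', 'Adj', 'PP']
        l1_seq: list            # e.g. ['"the"', 'Adj', 'N', 'PP']
        alignments: list        # 1-based X-to-Y pairs, e.g. [(1, 1), (2, 3)]
        agreement_constraints: list = field(default_factory=list)
        value_constraints: list = field(default_factory=list)

An agreement constraint could then be stored as (('Y3', 'num'), ('X2', 'num')) and a value constraint as (('X2', 'num'), 'sg'), mirroring the rule above.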
6
Research Goals (I)
  • Develop a framework for learning transfer rules
    from bilingual data
  • Training corpus: a set of sentences/phrases in one
    language with translations into the other language
    (= a bilingual corpus), word-aligned
  • Rules include a) a context-free backbone and b)
    unification constraints
  • Improve the grammaticality of MT output with
    automatically learned rules
  • Learned rules improve translation quality in
    run-time system

7
Research Goals (II)
  • Learn rules in the absence of a parser for one of
    the languages
  • Infer syntactic knowledge about the minor language
    using a) projection from the major language, b)
    analysis of word alignments, c) morphology
    information, and d) a bilingual dictionary
  • Combine a set of different knowledge sources in a
    meaningful way
  • Resources (parser, morphology modules,
    dictionary, etc.) often disagree
  • Combine conflicting knowledge sources

8
Research Goals (III)
  • Address limited-data scenarios with frugal
    techniques
  • Unbalanced language pairs with little or no
    bilingual data
  • Training corpus is small (120 sentences and
    phrases), but carefully designed
  • Pushing MT research in the direction of
    incorporating syntax into statistics-based
    systems
  • Infer highly involved linguistic information,
    incorporate with statistical decoder in hybrid
    system

9
Thesis Statement (I)
  • Given bilingual, word-aligned data, and given a
    parser for one of the languages in the
    translation pair, we can learn a set of syntactic
    transfer rules for MT.
  • The rules consist of a context-free backbone and
    unification constraints, learned in two separate
    stages.
  • The resulting rules form a syntactic translation
    grammar for the language pair and are used in a
    statistical transfer system to translate unseen
    examples.

10
Thesis Statement (II)
  • The translation quality of a run-time system that
    uses the learned rules is
  • superior to a system that does not use the
    learned rules
  • comparable to the performance using a small
    manual grammar written by an expert
  • on Hebrew-English and Hindi-English translation
    tasks.
  • The thesis presents a new approach to learning
    transfer rules for Machine Translation in that
    the system learns syntactic models from text in a
    novel way and in a rich hypothesis space, aiming
    at emulating a human grammar writer.

11
Talk Overview
  • Setting the Stage: related work, system overview,
    training data
  • Rule Learning
  • Step 1: Seed Generation
  • Step 2: Compositionality
  • Step 3: Unification Constraints
  • Experimental Results
  • Conclusion

12
Related Work MT overview
[Diagram (MT pyramid): depth of analysis on the path from source language to target language — analyze sequence (Statistical MT, EBMT), analyze structure (Syntax-based MT), analyze meaning (Semantics-based MT).]
13
Related Work (I)
  • Traditional transfer-based MT: analysis,
    transfer, generation (Hutchins and Somers 1992,
    Senellart et al. 2001)
  • Data-driven MT
  • EBMT: store a database of examples, possibly
    generalized (Sato and Nagao 1990, Brown 1997)
  • SMT: usually a noisy-channel model (translation
    model + target language model) (Vogel et al. 2003,
    Och and Ney 2002, Brown 2004)
  • Hybrid (Knight et al. 1995, Habash and Dorr 2002)

14
Related Work (II)
  • Structure/syntax for MT
  • EBMT (Alshawi et al. 2000, Watanabe et al. 2002)
  • SMT (Yamada and Knight 2001, Wu 1997)
  • Other approaches (Habash and Dorr 2002, Menezes
    and Richardson 2001)
  • Learning from elicited data / small datasets
    (Nirenburg 1998, McShane et al. 2003, Jones and
    Havrilla 1998)

15
Training Data Example
  • SL: the widespread interest in the election
  • TL: h niin h rxb b h bxirwt
  •     (gloss: the interest the widespread in the election)
  • Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
  • Type: NP
  • Parse: (NP (DET the-1)
  •         (ADJ widespread-2) (N interest-3)
  •         (PP (PREP in-4)
  •          (NP (DET the-5)
  •           (N election-6))))

[Tree diagram: NP → DET ADJ N PP ("the widespread interest" + PP); PP → PREP NP ("in" + NP); NP → DET N ("the election")]
16
Transfer Rule Formalism
  • L2: h niin h rxb b h bxirwt
  •     (gloss: the interest the widespread in the election)
  • L1: the widespread interest in the election
  • NP::NP
  • ["h" N "h" Adj PP] -> ["the" Adj N PP]
  • ((X1::Y1) (X2::Y3)
  •  (X3::Y1) (X4::Y2)
  •  (X5::Y4)
  •  ((Y3 num) = (X2 num))
  •  ((X2 num) = sg)
  •  ((X2 gen) = m))

(Annotated parts: training example, rule type, component sequences, component alignments, agreement constraints, value constraints.)
17
Training Data Collection
  • Elicitation Corpora
  • Generally designed to cover major linguistic
    phenomena
  • Bilingual user translates and word-aligns
  • Structural Elicitation Corpus
  • Designed to cover a wide variety of structural
    phenomena (Probst and Lavie 2004)
  • 120 sentences and phrases
  • Targeting specific constituent types (AdvP, AdjP,
    NP, PP, SBAR, S) with subtypes
  • Translated into Hebrew, Hindi

18
Resources
  • L1 parses: either from a statistical parser
    (Charniak 1999) or from Penn Treebank data
  • L1 morphology: can be obtained or created (I
    created one for English)
  • L1 language model: trained on a large amount of
    monolingual data
  • L2 morphology: if available, use a morphology
    module; if not, use automated techniques, such
    as (Goldsmith 2001) or (Probst 2003)
  • Bilingual lexicon: gives word-level
    correspondences; created from training data or
    previously existing

19
Development and Testing Environment
  • Syntactic transfer engine: takes rules and
    lexicon and produces all possible partial
    translations
  • Statistical decoder: uses word-to-word
    probabilities and a TL language model to extract
    the best combination of partial translations (Vogel
    et al. 2003)

20
System Overview
[System diagram — Training time: bilingual training data, together with L1 parses and morphology, feeds the Rule Learner, which outputs the Learned Rules. Run time: L2 test data enters the Transfer Engine, which uses the learned rules, L2 morphology, and the bilingual lexicon to produce a lattice; the Statistical Decoder, guided by the L1 language model, selects the final translation.]
21
Overview of Learning Phases
  • Seed Generation: create initial guesses at rules
    based on specific training examples
  • Compositionality: add context-free structure to
    rules, so that rules can combine
  • Constraint learning: learn appropriate
    unification constraints

22
Seed Generation
  • Training example in rule format
  • Produce rules that closely reflect training
    examples
  • But generalize to the POS level when words are 1-1
    aligned
  • Rules are fully functional but offer little
    generalization
  • Seed rules are intended as input for the later two
    learning phases

23
Seed Generation Sample Learned rule
  • L2: TKNIT H @IPWL H HTNDBWTIT
  •     (gloss: plan the care the voluntary)
  • L1: THE VOLUNTARY CARE PLAN
  • C-Structure: (NP (DET the-1)
  •               (ADJP (ADJ voluntary-2))
  •               (N care-3) (N plan-4))
  • NP::NP [N "H" N "H" ADJ] -> ["THE" ADJ N N]
  • (
  •  (X1::Y4)
  •  (X3::Y3)
  •  (X5::Y2)
  • )

24
Seed Generation Algorithm
  • For a given training example, produce a seed rule
    (see the sketch below)
  • For all 1-1 aligned words, enter the POS tag
    (e.g., N) into the component sequences
  • Get POS tags from the morphology module and the parse
  • Hypothesis: on unseen data, any word of this POS
    can fill this slot
  • For all words that are not 1-1 aligned, put the
    actual words in the component sequences
  • The L2 and L1 types are the parse root's label
  • Derive alignments from the training example
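A minimal Python sketch of this procedure (the example accessors — l1_words, is_one_to_one, l1_pos, and so on — are assumed interfaces, not the thesis's code; TransferRule is the illustrative class above):

    def generate_seed_rule(example):
        """Turn one word-aligned training example into a seed rule."""
        l2_seq = []
        for i, word in enumerate(example.l2_words, start=1):
            if example.is_one_to_one(l2_index=i):
                l2_seq.append(example.l2_pos(i))    # generalize to POS
            else:
                l2_seq.append(f'"{word}"')          # keep lexicalized
        l1_seq = []
        for j, word in enumerate(example.l1_words, start=1):
            if example.is_one_to_one(l1_index=j):
                l1_seq.append(example.l1_pos(j))    # POS from the parse
            else:
                l1_seq.append(f'"{word}"')
        root = example.parse_root_label             # e.g. "NP"
        return TransferRule(rule_type=(root, root),
                            l2_seq=l2_seq, l1_seq=l1_seq,
                            alignments=list(example.alignments))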

25
Compositionality
  • Generalize seed rules to reflect structure
  • Infer a partial constituent grammar for L2
  • Rules map a mixture of
  • lexical items (LIT)
  • parts of speech (PT)
  • constituents (NT)
  • Analyze the L1 parse to find generalizations
  • The produced rules are context-free

26
Compositionality - Example
  • L2: BTWK H M@PH HIH M
  •     (gloss: that inside the envelope was name)
  • L1: THAT INSIDE THE ENVELOPE WAS A NAME
  • C-Structure: (SBAR (SUBORD that-1)
  •               (S (PP (PREP inside-2)
  •                   (NP (DET the-3) (N envelope-4)))
  •                (VP (V was-5))
  •                (NP (DET a-6) (N name-7))))
  • SBAR::SBAR
  • [SUBORD PP V NP] -> [SUBORD PP V NP]
  • (
  •  (X1::Y1) (X2::Y2) (X3::Y3) (X4::Y4)
  • )

27
Basic Compositionality Algorithm
  • Traverse the parse tree in order to partition the
    sentence
  • For each subtree, if a previously learned rule
    can account for the subtree and its
    translation, introduce a compositional element
  • Compositional element: the subtree's root label,
    for both L1 and L2
  • Adjust alignments
  • Note: preference for maximal generalization,
    because the tree is traversed from the top

28
Maximum Compositionality
  • Assume that lower-level rules exist; the assumption
    is correct if the training data is completely
    compositional
  • Introduce compositional elements for the direct
    children of the parse root node
  • Results in a higher level of compositionality, and
    thus higher generalization power
  • Can overgeneralize, but generally preferable
    because of the strong decoder

29
Other Advanced Compositionality Techniques
  • Techniques that generalize words that are not 1-1
    aligned to the POS level
  • Techniques that enhance the dictionary based on
    training data
  • Techniques that deal with noun compounds
  • Rule filters to ensure that no learned rules
    violate axioms

30
Constraint Learning
  • Annotate context-free compositional rules with
    unification constraints
  • a) limit applicability of rules to certain
    contexts (thereby limiting parsing ambiguity)
  • b) ensure the passing of a feature value from
    source to target language (thereby limiting
    transfer ambiguity)
  • c) disallow certain target language outputs
    (thereby limiting generation ambiguity)
  • Value constraints and agreement constraints are
    learned separately

31
Constraint Learning - Overview
  • Introduce basic constraints: use morphology
    module(s) and parses to introduce constraints for
    the words in a training example
  • Create agreement constraints (where appropriate)
    by merging basic constraints
  • Retain appropriate value constraints: they help
    restrict a rule to certain contexts or
    restrict the output

32
Constraint Learning Agreement Constraints (I)
  • For example: in an NP, do the adjective and the
    noun agree in number?
  • In Hebrew, "the good boys":
  • Correct: H ILDIM @WBIM
  •     (the.det.def boy.pl.m good.pl.m)
  •     the good boys
  • Incorrect: H ILDIM @WB
  •     (the.det.def boy.pl.m good.sg.m)
  •     the good boys

33
Constraint Learning Agreement Constraints (II)
  • E.g., number on a determiner and the corresponding
    noun
  • Use a likelihood-ratio test to determine which
    value constraints can be merged into agreement
    constraints
  • The log-likelihood ratio is defined by proposing
    distributions that could have given rise to the
    data
  • Null hypothesis: the values are independently
    distributed
  • Alternative hypothesis: the values are not
    independently distributed
  • For sparse data, use a heuristic test: is there more
    evidence for than against the agreement constraint?

34
Constraint Learning Agreement Constraints (III)
  • Collect all instances in the training data where
    an adjective and a noun mark for number
  • Count how often the feature values are the same and
    how often they differ
  • Feature values are distributed by
  • two multinomial distributions (if they are
    independent, i.e., the null hypothesis)
  • one multinomial distribution (if they should
    agree, i.e., the alternative hypothesis)
  • Compute the log-likelihood under each scenario and
    perform the LL-ratio or heuristic test
  • Generalize to the cross-lingual case

35
Constraint Learning Value Constraints
  • L2: ild @wb
  •     (gloss: boy good)
  • L1: a good boy
  • NP::NP [N ADJ] ->
  •   ["A" ADJ N]
  • (...
  •  ((X1 NUM) = SG)
  •  ((X2 NUM) = SG)
  •  ...)

L2: ildim @wbim
    (gloss: boys good)
L1: good boys
NP::NP [N ADJ] -> [ADJ N]
(...
 ((X1 NUM) = PL)
 ((X2 NUM) = PL)
 ...)
Retain value constraints to distinguish
36
Constraint Learning Value Constraints
  • Retain those value constraints that determine the
    structure of the L2 translation
  • If two rules have
  • different L2 component sequences
  • the same L1 component sequence
  • and differ in only a value constraint,
  • retain the value constraint to distinguish them
    (see the sketch below)
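A minimal sketch of this retention test in Python, over the illustrative TransferRule objects from above (the selection logic is our reading of the slide, not the thesis's code):

    def retained_value_constraints(rule, all_rules):
        """Return the value constraints of `rule` worth keeping: those
        that set it apart from a competing rule with the same L1
        component sequence but a different L2 component sequence."""
        keep = []
        for other in all_rules:
            if (other is not rule
                    and other.l1_seq == rule.l1_seq
                    and other.l2_seq != rule.l2_seq):
                # Constraints not shared with the competing rule carry
                # the information that selects the right L2 structure.
                keep.extend(c for c in rule.value_constraints
                            if c not in other.value_constraints)
        return keep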

37
Constraint Learning Sample Learned Rule
  • L2: ANI AIN@LIGN@I
  •     (gloss: I intelligent)
  • L1: I AM INTELLIGENT
  • S::S
  • [NP ADJP] -> [NP "AM" ADJP]
  • (
  •  (X1::Y1) (X2::Y3)
  •  ((X1 NUM) = (X2 NUM))
  •  ((Y1 NUM) = (X1 NUM))
  •  ((Y1 PER) = (X1 PER))
  •  (Y0 = Y2)
  • )

38
Dimensions of Evaluation
  • Learning phases / settings: default, Seed
    Generation only, Compositionality, Constraint
    Learning
  • Evaluation: rule-based evaluation, pruning
  • Test corpora: TestSet, TestSuite
  • Run-time settings: length limit
  • Portability: Hindi-English translation

39
Test Corpora
  • Test corpora
  • Test Corpus: newspaper text (Haaretz), 65
    sentences, 1 reference translation
  • Test Suite: specific phenomena, 138 sentences, 1
    reference translation
  • Hindi: 245 sentences, 4 reference translations
  • Compare: statistical system only, system with a
    manually written grammar, system with the learned
    grammar
  • Manually written grammar: written by an expert
    within about a month (for both Hebrew and Hindi)

40
Test Corpus Evaluation, Default Settings (I)
41
Test Corpus Evaluation, Default Settings (II)
  • The learned grammar performs statistically
    significantly better than the baseline
  • Performed a one-tailed paired t-test
  • BLEU with resampling
  • t-value 81.98, p-value 0 (df = 999)
  • → significant at the 100% confidence level
  • Median of differences: -0.0217, with 95%
    confidence interval [-0.0383, -0.0056]
  • METEOR
  • t-value 1.73, p-value 0.044 (df = 61)
  • → significant at higher than the 95% confidence level

42
Test Corpus Evaluation, Default Settings (III)
43
Test Corpus Evaluation, Different Settings (I)
44
Test Corpus Evaluation, Different Settings (II)
System times in seconds, lattice sizes
→ 20% reduction in lattice size!
45
Evaluation with Rule Scoring (I)
  • Estimate the translation power of the rules
  • Use training data: most training examples are
    actually unseen data for a given rule
  • Match each arc against the reference translation
  • A rule's score is the average of all its arcs'
    scores
  • Order the rules by precision score, then prune
    (see the sketch below)
  • Goal of rule scoring: limit run time
  • Note: trade-off with decoder power
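A minimal sketch of the scoring-and-pruning step (arc_scores and the keep_fraction threshold are illustrative assumptions, not values from the thesis):

    def score_and_prune(rules, arc_scores, keep_fraction=0.8):
        """Rank rules by the average score of their arcs and prune.

        `arc_scores` maps each rule to the list of scores obtained by
        matching its arcs against the reference translations."""
        def precision(rule):
            scores = arc_scores.get(rule, [])
            return sum(scores) / len(scores) if scores else 0.0

        ranked = sorted(rules, key=precision, reverse=True)
        return ranked[:int(len(ranked) * keep_fraction)]  # keep the best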

46
Evaluation with Rule Scoring (II)
47
Evaluation with Rule Scoring (III)
48
Test Suite Evaluation (I)
  • Test suite designed to target specific
    constructions
  • Conjunctions of PPs
  • Adverb phrases
  • Reordering of adjectives and nouns
  • AdjP embedded in NP
  • Possessives
  • Designed in English, translated into Hebrew
  • 138 sentences, one reference translation

49
Test Suite Evaluation (II)
50
Test Suite Evaluation (III)
  • The learned grammar performs statistically
    significantly better than the baseline
  • Performed a one-tailed paired t-test
  • BLEU with resampling
  • t-value 122.53, p-value 0 (df = 999)
  • → statistically significantly better at the 100%
    confidence level
  • Median of differences: -0.0462, with 95%
    confidence interval [-0.0721, -0.0245]
  • METEOR
  • t-value 47.20, p-value 0.0 (df = 137)
  • → statistically significantly better at the 100%
    confidence level

51
Test Suite Evaluation (IV)
52
Hindi-English Portability Test (I)
53
Hindi-English Portability Test (II)
  • The learned grammar performs statistically
    significantly better than the baseline
  • Performed a one-tailed paired t-test
  • BLEU with resampling
  • t-value 37.20, p-value 0 (df = 999)
  • → statistically significantly better at the 100%
    confidence level
  • Median of differences: -0.0024, with 80%
    confidence interval [-0.0052, 0.0001]
  • METEOR
  • t-value 1.72, p-value 0.043 (df = 244)
  • → statistically significantly better at higher
    than the 95% confidence level

54
Hindi-English Portability Test (III)
55
Discussion of Results
  • Performance superior to a standard SMT system
  • Learned grammar comparable to the manual grammar
  • Learned grammar: higher METEOR score, indicating
    that it is more general
  • Constraints: slightly lower performance in
    exchange for higher run-time efficiency
  • Pruning: slightly lower performance in exchange
    for higher run-time efficiency

56
Conclusions and Contributions
  • A framework for learning transfer rules from
    bilingual data
  • Improvement of translation output in a hybrid
    transfer-and-statistical system
  • Addressing limited-data scenarios with frugal
    techniques
  • Combining different knowledge sources in a
    meaningful way
  • Pushing MT research in the direction of
    incorporating syntax into statistics-based
    systems
  • Human-readable rules that can be improved by an
    expert

57
Summary
  • Take a bilingual word-aligned corpus, and learn
    transfer rules with constituent transfer and
    unification constraints.
  • Is it a big corpus?
  • Ahem. No.
  • Do I have a parser for both languages?
  • No, just for one.
  • So I can use a dictionary, morphology modules,
    a parser... But these are all imperfect resources.
    How do I combine them?
  • We can do it!
  • Ok.

58
  • Thank you!

59
  • Additional Slides

60
References (I)
  • Ayan, Fazil, Bonnie J. Dorr, and Nizar Habash.
    2004. Application of Alignment to Real-World Data:
    Combining Linguistic and Statistical Techniques
    for Adaptable MT. Proceedings of AMTA-2004.
  • Baldwin, Timothy and Aline Villavicencio. 2002.
    Extracting the Unextractable: A case study on
    verb-particles. Proceedings of CoNLL-2002.
  • Brown, Ralf D. 2004. A Modified Burrows-Wheeler
    Transform for Highly-Scalable Example-Based
    Translation. Proceedings of AMTA-2004.
  • Charniak, Eugene, Kevin Knight and Kenji Yamada.
    2003. Syntax-based Language Models for
    Statistical Machine Translation. Proceedings of
    MT-Summit IX.

61
References (II)
  • Hutchins, John W. and Harold L. Somers. 1992. An
    Introduction to Machine Translation. Academic
    Press, London.
  • Jones, Douglas and R. Havrilla. 1998. Twisted Pair
    Grammar: Support for Rapid Development of Machine
    Translation for Low Density Languages.
    Proceedings of AMTA-98.
  • Menezes, Arul and Stephen D. Richardson. 2001. A
    best-first alignment algorithm for automatic
    extraction of transfer mappings from bilingual
    corpora. Proceedings of the Workshop on
    Data-driven Machine Translation at ACL-2001.
  • Nirenburg, Sergei. 1998. Project Boas: A Linguist
    in the Box as a Multi-Purpose Language Resource.
    Proceedings of LREC-98.

62
References (III)
  • Orasan, Constantin and Richard Evans. 2001.
    Learning to identify animate references.
    Proceedings of CoNLL-2001.
  • Probst, Katharina. 2003. Using smart bilingual
    projection to feature-tag a monolingual
    dictionary. Proceedings of CoNLL-2003.
  • Probst, Katharina and Alon Lavie. 2004. A
    Structurally Diverse Minimal Corpus for Eliciting
    Structural Mappings between Languages. Proceedings
    of AMTA-04.
  • Probst, Katharina and Lori Levin. 2002.
    Challenges in Automated Elicitation of a
    Controlled Bilingual Corpus. Proceedings of
    TMI-02.

63
References (IV)
  • Senellart, Jean, Mirko Plitt, Christophe Bailly,
    and Francoise Cardoso. 2001. Resource Alignment
    and Implicit Transfer. Proceedings of MT-Summit
    VIII.
  • Vogel, Stephan and Alicia Tribble. 2002.
    Improving Statistical Machine Translation for a
    Speech-to-Speech Translation Task. Proceedings of
    ICSLP-2002.
  • Watanabe, Hideo, Sadao Kurohashi, and Eiji
    Aramaki. 2000. Finding Structural Correspondences
    from Bilingual Parsed Corpus for Corpus-based
    Translation. Proceedings of COLING-2000.

64
Log-likelihood test for agreement constraints (I)
  • Create a list of all possible index pairs that
    should be considered for an agreement constraint
  • L1-only constraints:
  • the list of all head-head pairs that ever occur with
    the same feature (not necessarily the same value),
    and all head-nonhead pairs in the same constituent
    that occur with the same feature (not necessarily
    the same value)
  • For example, a possible agreement constraint: number
    agreement between a Det and an N in an NP where the
    Det is a dependent of the N
  • L2-only constraints: same as the L1-only constraints
    above
  • L2→L1 constraints: all situations where two
    aligned indices mark the same feature

65
Log-likelihood test for agreement constraints (II)
  • Hypothesis 0: the values are independently
    distributed.
  • Hypothesis 1: the values are not independently
    distributed.
  • Under the null hypothesis: [likelihood formula]
  • Under the alternative hypothesis: [likelihood
    formula], where ind is 1 if v_xi1 = v_xi2 and 0
    otherwise.

66
Log-likelihood test for agreement constraints
(III)
  • The values at i1 and i2 are drawn from a multinomial
    distribution: [formula],
  • where c_vi is the number of times the value vi
    was encountered for the given feature (e.g.,
    PERS), and k is the number of possible values for
    the feature (e.g., 1st, 2nd, 3rd)
  • If there is strong evidence for the agreement
    hypothesis (Hypothesis 1), introduce the agreement
    constraint
  • For cases where there is not enough evidence
    either way (small n), fall back on the heuristic test
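As a runnable sketch, the core computation can be phrased as a G-test-style comparison of the two models (this is our illustrative rendering; the thesis's exact parameterization, with the ind indicator, differs in detail):

    import math
    from collections import Counter

    def agreement_llr(pairs):
        """Log-likelihood ratio for a candidate agreement constraint.

        `pairs` lists the (value_1, value_2) feature values observed
        together in the training data, e.g. number on an adjective and
        on its noun. Returns LL(agree) - LL(independent); a large value
        is evidence for introducing the agreement constraint."""
        n = len(pairs)

        def mle_ll(counts):
            # log-likelihood of the data under the MLE p(v) = c_v / n
            return sum(c * math.log(c / n) for c in counts.values())

        # Hypothesis 0: the two values are independently distributed
        # (one multinomial per position).
        ll_indep = (mle_ll(Counter(v1 for v1, _ in pairs))
                    + mle_ll(Counter(v2 for _, v2 in pairs)))

        # Hypothesis 1: the values co-vary (one multinomial over pairs).
        ll_joint = mle_ll(Counter(pairs))

        return ll_joint - ll_indep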

67
Lexicon Enhancement for Hebrew Adverbs (I)
  • Example 1: B MX → happily
  • Example 2: IWTR GBWH → taller
  • These are not necessarily in the dictionary
  • Both processes are productive
  • How can we add these and similar entries to the
    lexicon? Automatically?

68
Lexicon Enhancement for Hebrew Adverbs (II)
  • For all 1-2 (L1-L2) alignments in the training data:
  • 1. Extract all cases with at least 2 instances
    where one word is constant (constant word
    wL2c, non-constant word wL2v, non-constant word
    wL1v)
  • 2. For each word wL2v:
  • 2.1. Get all L1 translations
  • 2.2. Find the closest match wL1match to wL1v
  • 2.3. Learn the replacement rule wL1match → wL1v
  • 3. For each word wL2POS of the same POS as wL2c:
  • 3.1. For each possible translation wL1POS:
  • 3.1.1. Apply all replacement rules
    possible: wL1POS → wL1POSmod
  • 3.1.2. For each applied replacement rule,
    insert the lexicon entry
  • [wL2c wL2POS] -> [wL1POSmod]

69
Lexicon Enhancement for Hebrew Adverbs (III)
  • Example: B MX → happily
  • Possible translations of MX:
  • joy
  • happiness
  • Use edit distance to find that happiness is the
    wL1match for happily
  • Learn the replacement rule ness → ly
    (see the sketch below)
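A minimal sketch of learning such a suffix replacement rule (the common-prefix heuristic stands in for the edit-distance machinery; the function name is ours):

    def learn_replacement_rule(w_match, w_target):
        """Derive a suffix replacement rule from a matched word pair,
        e.g. ("happiness", "happily") -> ("ness", "ly")."""
        i = 0
        while (i < min(len(w_match), len(w_target))
               and w_match[i] == w_target[i]):
            i += 1                      # extend the common prefix
        return w_match[i:], w_target[i:]

    # learn_replacement_rule("happiness", "happily") == ("ness", "ly")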

70
Lexicon Enhancement for Hebrew Adverbs (IV)
  • For all L2 nouns in the dictionary, get all
    possible L1 translations, and apply the
    replacement rule
  • If a replacement rule can be applied, add a lexicon
    entry
  • Examples of new adverbs added to the lexicon:
  • ADV::ADV ["B" "APTNWT"] -> ["AMBITIOUSLY"]
  • ADV::ADV ["B" "BIRWT"] -> ["BRITTLELY"]
  • ADV::ADV ["B" "GWN"] -> ["MADLY"]
  • ADV::ADV ["B" "I@TIWT"] -> ["METHODICALLY"]

71
Lexicon Enhancement for Hebrew Comparatives
  • Same process as for the adverbs
  • Examples of new comparatives added to the lexicon:
  • ADJ::ADJ ["IWTR" "MLA"] -> ["FULLER"]
  • ADJ::ADJ ["IWTR" "MPGR"] -> ["SLOWER"]
  • ADJ::ADJ ["IWTR" "MQCH"] -> ["HEATER"]
  • All words are checked against the BNC
  • Comment: an automatic process, thus far from perfect

72
Some notation
  • SL: Source Language, the language to be translated
    from
  • TL: Target Language, the language to be translated
    into
  • L1: the language for which abundant information is
    available
  • L2: the language for which less information is
    available
  • (Here) SL = L2 = Hebrew, Hindi
  • (Here) TL = L1 = English
  • POS: part of speech, e.g., noun, adjective, verb
  • Parse: structural (tree) analysis of a sentence
  • Lattice: list of partial translations, arranged
    by length and start index

73
Training Data Example
  • SL: the widespread interest in the election
  • TL: h niin h rxb b h bxirwt
  • Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
  • Type: NP
  • Parse:
  • (NP
  •  (DET the-1) (ADJ widespread-2) (N interest-3)
  •  (PP (PREP in-4)
  •   (NP (DET the-5) (N election-6))))

74
Seed Generation Algorithm
  • for all training examples
  • for all 1-1 aligned words
  • get the L1 POS tag from the parse
  • get the L2 POS tag from the morphology module
    and the dictionary
  • if the L1 POS and the L2 POS tags are not the
    same, leave both words lexicalized
  • for all other words
  • leave the words lexicalized
  • create rule word alignments from the training
    example
  • set the L2 type and L1 type to be the parse root's
    label

75
Taxonomy of Structural Mappings (I)
  • Non-terminals (NT)
  • used in two rule parts:
  • the type definition of a rule (both for SL and TL,
    meaning X0 and Y0),
  • the constituent sequences for both languages
  • any label that can be the type of a rule
  • describe higher-level structures such as
    sentences (S), noun phrases (NP), or
    prepositional phrases (PP)
  • can be filled with more than one word; filled by
    other rules

76
Taxonomy of Structural Mappings (II)
  • Pre-terminals (PT)
  • used only in the constituent sequences of the
    rules, not as X0 or Y0 types
  • filled with only one word (except phrasal lexicon
    entries); filled by lexical entries, not by other
    grammar rules
  • Terminals (LIT)
  • lexicalized entries in the constituent sequences
  • can be used on both the x- and the y-side
  • can only be filled by the specified terminal
    itself

77
Taxonomy of Structural Mappings (III)
  • NTs must not be aligned 1-0 or 0-1
  • PTs must not be aligned 1-0 or 0-1
  • Any word in the bilingual training pair must
    participate in exactly one LIT, PT, or NT
  • An L1 NT is assumed to translate into the same NT
    in L2

78
Taxonomy of Structural Mappings (IV)
  • Transformation I (SL type into SL component
    sequence):
  • NT → (NT | PT | LIT)+
  • Transformation II (SL type into TL type):
  • NTi → NTi (the same type of NT)
  • Transformation III (TL type into TL component
    sequence):
  • NT → (NT | PT | LIT)+
  • Transformation IV (SL components into TL
    components):
  • NTi → NTi (the same type of NT)
  • PT → PT
  • LIT → ε
  • ε → LIT

79
Basic Compositionality Pseudocode
  • traverse parse top-down
  • for each node i in parse
  • extract the subtree rooted at i
  • extract the L1 chunk cL1 rooted at i and the
    corresponding L2 chunk cL2 (using alignments)
  • if transfer engine can translate cL1 into cL2
    using previously learned rules
  • introduce compositional element
  • replace POS sequence for cL1 and cL2 with
    label of node i
  • adjust alignments
  • do not traverse already covered subtree
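The same traversal as runnable Python, under assumed interfaces for the parse, the example, and the transfer engine (none of these names come from the thesis):

    def add_compositionality(rule, parse, example, transfer_engine):
        """Basic compositionality pass over one seed rule."""
        for node in parse.top_down():              # root first
            c_l1 = example.l1_chunk(node)          # words under this subtree
            c_l2 = example.aligned_l2_chunk(c_l1)  # via word alignments
            if transfer_engine.translates(c_l1, c_l2):
                # A previously learned rule accounts for this chunk:
                # collapse both sides to the subtree's root label.
                rule.replace_span(c_l1, c_l2, label=node.label)
                rule.adjust_alignments(c_l1, c_l2)
                parse.skip_subtree(node)           # don't descend further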

80
Co-Embedding Resolution, Iterative Type Learning
  • Problem: looking for previously learned rules
  • Must determine an optimal learning order
  • Co-Embedding Resolution
  • Tag each training example with the depth of its tree,
    i.e., how many embedded elements it has
  • Then learn from lowest to highest
  • Iterative Type Learning
  • Some types (e.g., PPs) are frequently embedded in
    others (e.g., NPs)
  • Pre-determine the order in which types are learned

81
Compositionality Sample Learned Rules (II)
  • L2: RQ AM H RKBT TGI
  • L1: ONLY IF THE TRAIN ARRIVES
  • C-Structure: (SBAR
  •               (ADVP (ADV only-1))
  •               (SUBORD if-2)
  •               (S (NP (DET the-3) (N train-4))
  •                (VP (V arrives-5))))
  • SBAR::SBAR
  • [ADVP SUBORD S] -> [ADVP SUBORD S]
  • (
  •  (X1::Y1) (X2::Y2) (X3::Y3)
  • )

82
Taxonomy of Constraints (I)
83
Co-Embedding Resolution, Iterative Type Learning
  • find the highest co-embedding score in the training
    data
  • find the number of types to learn, ntypes
  • for (i = 0; i <= highest co-embedding score; i++)
  • for (j = 0; j < ntypes; j++)
  • for all training examples with co-embedding
    score i and of type j
  • perform Seed Generation
  • perform Compositionality Learning

84
Taxonomy of Constraints (II)
85
Taxonomy of Constraints (III)
86
Constraint Learning Sample Learned Rules (II)
  • L2: H ILD AKL KI HWA HIH RB
  • L1: THE BOY ATE BECAUSE HE WAS HUNGRY
  • S::S [NP V SBAR] -> [NP V SBAR]
  • (
  •  (X1::Y1) (X2::Y2) (X3::Y3)
  •  (X0 = X2)
  •  ((X1 GEN) = (X2 GEN))
  •  ((X1 NUM) = (X2 NUM))
  •  ((Y1 NUM) = (X1 NUM))
  •  ((Y2 TENSE) = (X2 TENSE))
  •  ((Y3 NUM) = (X3 NUM))
  •  ((Y3 TENSE) = (X3 TENSE))
  •  (Y0 = Y2))

87
Evaluation with Different Length Limits (I)
88
Evaluation with Different Length Limits (II) (METEOR score)
89
Discussion of Results Comparison of Translations
(back to Hebrew-English)
  • No grammar: the doctor helps to patients his
  • Learned grammar: the doctor helps to his patients
  • Reference translation: The doctor helps his
    patients
  • No grammar: the soldier writes many letters to
    the family of he
  • Learned grammar: the soldier writes many letters
    to his family
  • Reference translation: The soldier writes many
    letters to his family

90
Time Complexity of Algorithms
  • Seed Generation: O(n)
  • Compositionality
  • Basic: O(n · max(tree_depth))
  • Maximum Compositionality: O(n · max(num_children))
  • Constraint Learning: O(n · max(num_basic_constraints))
  • Practically no issue

91
If I had 6 more months
  • Application to larger datasets
  • Training data enhancement to obtain training
    examples at different levels (NPs, PPs, etc.)
  • More emphasis on rule scoring (more noise)
  • More emphasis on learning context constraints
  • Constraint learning as a version-space learning
    problem
  • Integrate rules into statistical system more
    directly, without producing full lattice