Title: Learning Transfer Rules for Machine Translation with Limited Data
Learning Transfer Rules for Machine Translation with Limited Data
- Thesis Defense
- Katharina Probst
- Committee
  - Alon Lavie (Chair)
  - Jaime Carbonell
  - Lori Levin
  - Bonnie Dorr, University of Maryland
Introduction (I)
- Why has Machine Translation been applied to only a few language pairs?
  - Bilingual corpora are available for only a few language pairs (English-French, Japanese-English, etc.)
  - Natural Language Processing tools are available for only a few languages (English, German, Spanish, Japanese, etc.)
  - Scaling to other languages is often difficult, time-consuming, and knowledge-intensive
- What can we do to change this?
Introduction (II)
- This thesis presents a framework for automatic inference of transfer rules
  - Transfer rules capture syntactic and morphological mappings between languages
  - Learned from a small, word-aligned training corpus
- Rules are learned for unbalanced language pairs, where more data and tools are available for one language (L1) than for the other (L2)
Training Data Example
Setting the Stage | Rule Learning | Experimental Results | Conclusions
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Transfer Rule Formalism
- L2: h niin h rxb b h bxirwt
- L1: the widespread interest in the election
- Training example → rule:
  - Rule type: NP::NP
  - Component sequences: [h N h Adj PP] → [the Adj N PP]
  - Component alignments: ((X1 Y1) (X2 Y3) (X3 Y1) (X4 Y2) (X5 Y4))
  - Agreement constraint: ((Y3 num) (X2 num))
  - Value constraints: ((X2 num) sg), ((X2 gen) m)
Research Goals (I)
- Develop a framework for learning transfer rules from bilingual data
  - Training corpus: a set of sentences/phrases in one language with translations into the other (a bilingual corpus), word-aligned
  - Rules include a) a context-free backbone and b) unification constraints
- Improve the grammaticality of MT output with automatically learned rules
  - Learned rules improve translation quality in a run-time system
Research Goals (II)
- Learn rules in the absence of a parser for one of the languages
  - Infer syntactic knowledge about the minor language using a) projection from the major language, b) analysis of word alignments, c) morphological information, and d) a bilingual dictionary
- Combine a set of different knowledge sources in a meaningful way
  - Resources (parser, morphology modules, dictionary, etc.) often disagree
  - Combine conflicting knowledge sources
Research Goals (III)
- Address limited-data scenarios with frugal techniques
  - Unbalanced language pairs with little or no bilingual data
  - The training corpus is small (120 sentences and phrases), but carefully designed
- Push MT research in the direction of incorporating syntax into statistical systems
  - Infer highly involved linguistic information; incorporate it with a statistical decoder in a hybrid system
Thesis Statement (I)
- Given bilingual, word-aligned data, and given a parser for one of the languages in the translation pair, we can learn a set of syntactic transfer rules for MT.
- The rules consist of a context-free backbone and unification constraints, learned in two separate stages.
- The resulting rules form a syntactic translation grammar for the language pair and are used in a statistical transfer system to translate unseen examples.
Thesis Statement (II)
- The translation quality of a run-time system that uses the learned rules is
  - superior to a system that does not use the learned rules
  - comparable to the performance using a small manual grammar written by an expert
  on Hebrew-English and Hindi-English translation tasks.
- The thesis presents a new approach to learning transfer rules for Machine Translation: the system learns syntactic models from text in a novel way and in a rich hypothesis space, aiming to emulate a human grammar writer.
Talk Overview
- Setting the Stage: related work, system overview, training data
- Rule Learning
  - Step 1: Seed Generation
  - Step 2: Compositionality
  - Step 3: Unification Constraints
- Experimental Results
- Conclusions
Related Work: MT Overview
- [Diagram: depth of analysis between source language and target language]
  - Analyze meaning: semantics-based MT
  - Analyze structure: syntax-based MT
  - Analyze sequence: statistical MT, EBMT
Related Work (I)
- Traditional transfer-based MT: analysis, transfer, generation (Hutchins and Somers 1992, Senellart et al. 2001)
- Data-driven MT
  - EBMT: store a database of examples, possibly generalized (Sato and Nagao 1990, Brown 1997)
  - SMT: usually a noisy-channel model, i.e. a translation model combined with a target language model (Vogel et al. 2003, Och and Ney 2002, Brown 2004)
  - Hybrid (Knight et al. 1995, Habash and Dorr 2002)
Related Work (II)
- Structure/syntax for MT
  - EBMT (Alshawi et al. 2000, Watanabe et al. 2002)
  - SMT (Yamada and Knight 2001, Wu 1997)
  - Other approaches (Habash and Dorr 2002, Menezes and Richardson 2001)
- Learning from elicited data / small datasets (Nirenburg 1998, McShane et al. 2003, Jones and Havrilla 1998)
Training Data Example
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Transfer Rule Formalism
- L2: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- L1: the widespread interest in the election
- Training example → rule:
  - Rule type: NP::NP
  - Component sequences: [h N h Adj PP] → [the Adj N PP]
  - Component alignments: ((X1 Y1) (X2 Y3) (X3 Y1) (X4 Y2) (X5 Y4))
  - Agreement constraint: ((Y3 num) (X2 num))
  - Value constraints: ((X2 num) sg), ((X2 gen) m)
Training Data Collection
- Elicitation corpora
  - Generally designed to cover major linguistic phenomena
  - A bilingual user translates and word-aligns
- Structural Elicitation Corpus
  - Designed to cover a wide variety of structural phenomena (Probst and Lavie 2004)
  - 120 sentences and phrases
  - Targeting specific constituent types: AdvP, AdjP, NP, PP, SBAR, S, with subtypes
  - Translated into Hebrew, Hindi
Resources
- L1 parses: either from a statistical parser (Charniak 1999), or data from the Penn Treebank
- L1 morphology: can be obtained or created (I created one for English)
- L1 language model: trained on a large amount of monolingual data
- L2 morphology: if available, use a morphology module; if not, use automated techniques such as (Goldsmith 2001) or (Probst 2003)
- Bilingual lexicon: gives word-level correspondences; created from training data or previously existing
Development and Testing Environment
- Syntactic transfer engine: takes rules and lexicon and produces all possible partial translations
- Statistical decoder: uses word-to-word probabilities and a TL language model to extract the best combination of partial translations (Vogel et al. 2003)
System Overview
- [System diagram]
- Training time: bilingual training data + L1 parses and morphology → Rule Learner → learned rules
- Run time: L2 test data → Transfer Engine (learned rules, L2 morphology, bilingual lexicon) → lattice → Statistical Decoder (L1 language model) → final translation
Overview of Learning Phases
- Seed Generation: create initial guesses at rules based on specific training examples
- Compositionality: add context-free structure to rules, so that rules can combine
- Constraint Learning: learn appropriate unification constraints
Seed Generation
- Training example in rule format
- Produce rules that closely reflect the training examples
  - But generalize to the POS level when words are 1-1 aligned
- Rules are fully functional, but show little generalization
- Seed rules are intended as input for the two later learning phases
Seed Generation: Sample Learned Rule
- L2: TKNIT H @IPWL H HTNDBWTIT
  (gloss: plan the care the voluntary)
- L1: THE VOLUNTARY CARE PLAN
- C-Structure: (NP (DET the-1) (ADJP (ADJ voluntary-2)) (N care-3) (N plan-4))
- NP::NP [N "H" N "H" ADJ] → ["THE" ADJ N N]
- Alignments: ((X1 Y4) (X3 Y3) (X5 Y2))
Seed Generation Algorithm
- For a given training example, produce a seed rule
- For all 1-1 aligned words, enter the POS tag (e.g. N) into the component sequences
  - Get POS tags from the morphology module and the parse
  - Hypothesis: on unseen data, any word of this POS can fill this slot
- For all words that are not 1-1 aligned, put the actual words into the component sequences
- The L2 and L1 types are the parse's root label
- Derive alignments from the training example
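As a sketch in code, the algorithm above might look as follows. This is illustrative only: the data structures and the `seed_rule` function are hypothetical, not the thesis implementation, and word positions are 0-based here.

```python
# Minimal sketch of seed-rule generation (hypothetical representation):
# generalize 1-1 aligned words to their POS tag, keep everything else
# lexicalized, and take the rule type from the parse root label.
from collections import Counter

def seed_rule(l2_words, l1_words, alignment, l2_pos, l1_pos, root):
    """alignment: list of (l2_index, l1_index) pairs, 0-based."""
    # Count how many links each position participates in.
    l2_links = Counter(i for i, _ in alignment)
    l1_links = Counter(j for _, j in alignment)

    l2_seq = list(l2_words)
    l1_seq = list(l1_words)
    for i, j in alignment:
        # 1-1 aligned: neither side is linked to anything else.
        if l2_links[i] == 1 and l1_links[j] == 1:
            # Only generalize when both sides agree on the POS tag;
            # otherwise both words stay lexicalized.
            if l2_pos[i] == l1_pos[j]:
                l2_seq[i] = l2_pos[i]
                l1_seq[j] = l1_pos[j]
    return {"type": f"{root}::{root}",
            "l2_seq": l2_seq, "l1_seq": l1_seq,
            "alignment": sorted(alignment)}
```

For example, for L2 "ild @wb" / L1 "a good boy", the unaligned "a" stays lexicalized while the two 1-1 aligned words generalize to N and ADJ.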
Compositionality
- Generalize seed rules to reflect structure
- Infer a partial constituent grammar for L2
- Rules map a mixture of
  - lexical items (LIT)
  - parts of speech (PT)
  - constituents (NT)
- Analyze the L1 parse to find generalizations
- The produced rules are context-free
Compositionality: Example
- L2: BTWK H M@PH HIH M
  (gloss: that inside the envelope was name)
- L1: THAT INSIDE THE ENVELOPE WAS A NAME
- C-Structure: (SBAR (SUBORD that-1)
    (S (PP (PREP inside-2)
        (NP (DET the-3) (N envelope-4)))
      (VP (V was-5))
      (NP (DET a-6) (N name-7))))
- SBAR::SBAR [SUBORD PP V NP] → [SUBORD PP V NP]
- Alignments: ((X1 Y1) (X2 Y2) (X3 Y3) (X4 Y4))
Basic Compositionality Algorithm
- Traverse the parse tree in order to partition the sentence
- For each subtree, if a previously learned rule can account for the subtree and its translation, introduce a compositional element
- Compositional element: the subtree's root label, for both L1 and L2
- Adjust alignments
- Note the preference for maximum generalization, because the tree is traversed from the top
Maximum Compositionality
- Assume that lower-level rules exist; this assumption is correct if the training data is completely compositional
- Introduce compositional elements for the direct children of the parse root node
- Results in a higher level of compositionality, and thus higher generalization power
- Can overgeneralize, but because of the strong decoder this is generally preferable
Other Advanced Compositionality Techniques
- Techniques that generalize to the POS level even words that are not 1-1 aligned
- Techniques that enhance the dictionary based on training data
- Techniques that deal with noun compounds
- Rule filters to ensure that no learned rules violate the axioms
Constraint Learning
- Annotate context-free compositional rules with unification constraints that
  - a) limit the applicability of rules to certain contexts (thereby limiting parsing ambiguity)
  - b) ensure the passing of a feature value from source to target language (thereby limiting transfer ambiguity)
  - c) disallow certain target language outputs (thereby limiting generation ambiguity)
- Value constraints and agreement constraints are learned separately
Constraint Learning: Overview
- Introduce basic constraints: use the morphology module(s) and parses to introduce constraints for the words in a training example
- Create agreement constraints (where appropriate) by merging basic constraints
- Retain appropriate value constraints: they help in restricting a rule to certain contexts or in restricting output
Constraint Learning: Agreement Constraints (I)
- For example: in an NP, do the adjective and the noun agree in number?
- In Hebrew, "the good boys":
  - Correct: H ILDIM @WBIM
    (the.det.def boy.pl.m good.pl.m)
  - Incorrect: H ILDIM @WB
    (the.det.def boy.pl.m good.sg.m)
Constraint Learning: Agreement Constraints (II)
- E.g. number in a determiner and the corresponding noun
- Use a likelihood ratio test to determine which value constraints can be merged into agreement constraints
- The log-likelihood ratio is defined by proposing distributions that could have given rise to the data
  - Null hypothesis: the values are independently distributed.
  - Alternative hypothesis: the values are not independently distributed.
- For sparse data, use a heuristic test: is there more evidence for than against the agreement constraint?
Constraint Learning: Agreement Constraints (III)
- Collect all instances in the training data where an adjective and a noun mark for number
- Count how often the feature value is the same, how often different
- Feature values are distributed by
  - two multinomial distributions (if they are independent, i.e. the null hypothesis)
  - one multinomial distribution (if they should agree, i.e. the alternative hypothesis)
- Compute the log-likelihood under each scenario and perform the LL-ratio or heuristic test
- Generalize to the cross-lingual case
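The comparison can be sketched as follows. This is an illustrative simplification under maximum-likelihood multinomials, not the thesis code; the thesis' exact likelihood formulation (including its treatment of mismatched pairs via an indicator term) differs in detail.

```python
# Hedged sketch of the agreement likelihood-ratio test. Given paired
# feature values (e.g. number on an adjective and on its noun), compare:
#   H0: the two positions draw values independently (two multinomials)
#   H1: the positions agree (one shared multinomial)
from collections import Counter
from math import log

def log_multinomial(values):
    """Log-likelihood of the values under their own MLE multinomial."""
    counts = Counter(values)
    n = len(values)
    return sum(c * log(c / n) for c in counts.values())

def agreement_log_ratio(pairs):
    """log L(H1) - log L(H0); large values favor an agreement constraint.
    Under this simplified H1 a mismatched pair has probability zero,
    so any mismatch yields -inf."""
    a_vals = [a for a, _ in pairs]
    b_vals = [b for _, b in pairs]
    ll_h0 = log_multinomial(a_vals) + log_multinomial(b_vals)
    if any(a != b for a, b in pairs):
        return float("-inf")
    # Under agreement, each pair is a single draw of the shared value.
    ll_h1 = log_multinomial(a_vals)
    return ll_h1 - ll_h0
```

With data that always agrees, H1 wins (positive ratio); a single disagreement rules out strict agreement in this simplified model.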
Constraint Learning: Value Constraints
- L2: ild @wb (gloss: boy good); L1: a good boy
  - NP::NP [N ADJ] → ["A" ADJ N]
  - (... ((X1 NUM) SG) ((X2 NUM) SG) ...)
- L2: ildim t@wbim (gloss: boys good); L1: good boys
  - NP::NP [N ADJ] → [ADJ N]
  - (... ((X1 NUM) PL) ((X2 NUM) PL) ...)
- Retain value constraints to distinguish the two rules
Constraint Learning: Value Constraints
- Retain those value constraints that determine the structure of the L2 translation
- If two rules have
  - different L2 component sequences,
  - the same L1 component sequence,
  - and differ in only a value constraint,
- then retain the value constraint to distinguish them
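A minimal sketch of this retention check, using a hypothetical rule representation (the dict fields and the two sample rules are invented for illustration, not taken from the thesis):

```python
# Sketch of the value-constraint retention criterion: a value constraint
# is worth keeping when two rules share the L1 side, differ on the L2
# side, and are otherwise told apart only by their value constraints.
def should_retain(rule_a, rule_b):
    """True if the rules' value constraints are what distinguishes them."""
    return (rule_a["l1_seq"] == rule_b["l1_seq"]
            and rule_a["l2_seq"] != rule_b["l2_seq"]
            and rule_a["value_constraints"] != rule_b["value_constraints"])

# Hypothetical singular/plural rule pair with the same L1 sequence.
sg_rule = {"l1_seq": ["ADJ", "N"], "l2_seq": ["N", "ADJ"],
           "value_constraints": [("X1 NUM", "SG")]}
pl_rule = {"l1_seq": ["ADJ", "N"], "l2_seq": ["H", "N", "ADJ"],
           "value_constraints": [("X1 NUM", "PL")]}
```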
Constraint Learning: Sample Learned Rule
- L2: ANI AIN@LIGN@I (gloss: I intelligent)
- L1: I AM INTELLIGENT
- S::S [NP ADJP] → [NP AM ADJP]
- ((X1 Y1) (X2 Y3)
  ((X1 NUM) (X2 NUM))
  ((Y1 NUM) (X1 NUM))
  ((Y1 PER) (X1 PER))
  (Y0 Y2))
Dimensions of Evaluation
- Learning phases / settings: default, Seed Generation only, Compositionality, Constraint Learning
- Evaluation: rule-based evaluation and pruning
- Test corpora: TestSet, TestSuite
- Run-time settings: length limit
- Portability: Hindi→English translation
Test Corpora
- Test Corpus: newspaper text (Haaretz), 65 sentences, 1 reference translation
- Test Suite: specific phenomena, 138 sentences, 1 reference translation
- Hindi: 245 sentences, 4 reference translations
- Compare: statistical system only, system with manually written grammar, system with learned grammar
- Manually written grammar: written by an expert within about a month (both Hebrew and Hindi)
Test Corpus Evaluation, Default Settings (I)
Test Corpus Evaluation, Default Settings (II)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 81.98, p-value ≈ 0 (df = 999)
  - ⇒ significant at the 100% confidence level
  - Median of differences: -0.0217, with 95% confidence interval [-0.0383, -0.0056]
- METEOR
  - t-value 1.73, p-value 0.044 (df = 61)
  - ⇒ significant at a higher than 95% confidence level
Test Corpus Evaluation, Default Settings (III)
Test Corpus Evaluation, Different Settings (I)
Test Corpus Evaluation, Different Settings (II)
- System times in seconds, lattice sizes
- ⇒ 20% reduction in lattice size!
Evaluation with Rule Scoring (I)
- Estimate the translation power of the rules
- Use the training data: most training examples are actually unseen data for a given rule
- Match each arc against the reference translation
- A rule's score is the average of all its arcs' scores
- Order the rules by precision score, then prune
- Goal of rule scoring: limit run-time
- Note the trade-off with decoder power
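The scoring-and-pruning step can be sketched as follows. This is a hedged illustration with a hypothetical representation: the thesis scores arcs produced by the transfer engine against reference translations, whereas here per-arc scores are simply given as numbers, and `keep_fraction` is an invented knob.

```python
# Sketch of precision-based rule scoring and pruning: a rule's score is
# the mean of its per-arc match scores; the lowest-scoring rules are cut.
def score_rules(arc_scores, keep_fraction=0.8):
    """arc_scores maps rule id -> list of per-arc scores in [0, 1].
    Returns the rule ids kept after pruning, best first."""
    averages = {rule: sum(scores) / len(scores)
                for rule, scores in arc_scores.items() if scores}
    ranked = sorted(averages, key=averages.get, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))  # always keep one
    return ranked[:keep]
```

Pruning more aggressively lowers run-time at the cost of translation options, which is the decoder-power trade-off noted above.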
Evaluation with Rule Scoring (II)
Evaluation with Rule Scoring (III)
Test Suite Evaluation (I)
- Test suite designed to target specific constructions
  - Conjunctions of PPs
  - Adverb phrases
  - Reordering of adjectives and nouns
  - AdjP embedded in NP
  - Possessives
  - ...
- Designed in English, translated into Hebrew
- 138 sentences, one reference translation
Test Suite Evaluation (II)
Test Suite Evaluation (III)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 122.53, p-value ≈ 0 (df = 999)
  - ⇒ statistically significantly better at the 100% confidence level
  - Median of differences: -0.0462, with 95% confidence interval [-0.0721, -0.0245]
- METEOR
  - t-value 47.20, p-value 0.0 (df = 137)
  - ⇒ statistically significantly better at the 100% confidence level
Test Suite Evaluation (IV)
Hindi-English Portability Test (I)
Hindi-English Portability Test (II)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 37.20, p-value ≈ 0 (df = 999)
  - ⇒ statistically significantly better at the 100% confidence level
  - Median of differences: -0.0024, with 80% confidence interval [-0.0052, 0.0001]
- METEOR
  - t-value 1.72, p-value 0.043 (df = 244)
  - ⇒ statistically significantly better at a higher than 95% confidence level
Hindi-English Portability Test (III)
Discussion of Results
- Performance superior to a standard SMT system
- Learned grammar comparable to the manual grammar
- Learned grammar: higher METEOR score, indicating that it is more general
- Constraints: slightly lower performance in exchange for higher run-time efficiency
- Pruning: slightly lower performance in exchange for higher run-time efficiency
Conclusions and Contributions
- A framework for learning transfer rules from bilingual data
- Improvement of translation output in a hybrid transfer-and-statistical system
- Addressing limited-data scenarios with frugal techniques
- Combining different knowledge sources in a meaningful way
- Pushing MT research in the direction of incorporating syntax into statistical systems
- Human-readable rules that can be improved by an expert
Summary
- Take a bilingual word-aligned corpus, and learn transfer rules with constituent transfer and unification constraints.
- "Is it a big corpus?" "Ahem. No."
- "Do I have a parser for both languages?" "No, just for one."
- "So I can use a dictionary, morphology modules, a parser... But these are all imperfect resources. How do I combine them?" "We can do it!"
- "Ok."
References (I)
- Ayan, Fazil, Bonnie J. Dorr, and Nizar Habash. 2004. Application of Alignment to Real-World Data: Combining Linguistic and Statistical Techniques for Adaptable MT. Proceedings of AMTA-2004.
- Baldwin, Timothy and Aline Villavicencio. 2002. Extracting the Unextractable: A case study on verb-particles. Proceedings of CoNLL-2002.
- Brown, Ralf D. 2004. A Modified Burrows-Wheeler Transform for Highly-Scalable Example-Based Translation. Proceedings of AMTA-2004.
- Charniak, Eugene, Kevin Knight and Kenji Yamada. 2003. Syntax-based Language Models for Statistical Machine Translation. Proceedings of MT-Summit IX.
References (II)
- Hutchins, John W. and Harold L. Somers. 1992. An Introduction to Machine Translation. Academic Press, London.
- Jones, Douglas and R. Havrilla. 1998. Twisted Pair Grammar: Support for Rapid Development of Machine Translation for Low Density Languages. Proceedings of AMTA-98.
- Menezes, Arul and Stephen D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. Proceedings of the Workshop on Data-driven Machine Translation at ACL-2001.
- Nirenburg, Sergei. 1998. Project Boas: A Linguist in the Box as a Multi-Purpose Language Resource. Proceedings of LREC-98.
References (III)
- Orasan, Constantin and Richard Evans. 2001. Learning to identify animate references. Proceedings of CoNLL-2001.
- Probst, Katharina. 2003. Using smart bilingual projection to feature-tag a monolingual dictionary. Proceedings of CoNLL-2003.
- Probst, Katharina and Alon Lavie. 2004. A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages. Proceedings of AMTA-04.
- Probst, Katharina and Lori Levin. 2002. Challenges in Automated Elicitation of a Controlled Bilingual Corpus. Proceedings of TMI-02.
References (IV)
- Senellart, Jean, Mirko Plitt, Christophe Bailly, and Francoise Cardoso. 2001. Resource Alignment and Implicit Transfer. Proceedings of MT-Summit VIII.
- Vogel, Stephan and Alicia Tribble. 2002. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. Proceedings of ICSLP-2002.
- Watanabe, Hideo, Sadao Kurohashi, and Eiji Aramaki. 2000. Finding Structural Correspondences from Bilingual Parsed Corpus for Corpus-based Translation. Proceedings of COLING-2000.
Log-likelihood test for agreement constraints (I)
- Create a list of all possible index pairs that should be considered for an agreement constraint
- L1-only constraints
  - All head-head pairs that ever occur with the same feature (not necessarily the same value), and all head-nonhead pairs in the same constituent that occur with the same feature (not necessarily the same value)
  - For example, a possible agreement constraint: Num agreement between Det and N in an NP where the Det is a dependent of N
- L2-only constraints: same as the L1-only constraints above
- L2→L1 constraints: all situations where two aligned indices mark the same feature
Log-likelihood test for agreement constraints (II)
- Hypothesis 0: the values are independently distributed.
- Hypothesis 1: the values are not independently distributed.
- [Likelihood formulas under the null and alternative hypotheses; equation images not recoverable]
- where ind is 1 if vx_i1 = vx_i2 and 0 otherwise.
Log-likelihood test for agreement constraints (III)
- i1 and i2 are drawn from a multinomial distribution.
- [Multinomial likelihood formula; equation image not recoverable]
- where c(v_i) is the number of times the value v_i was encountered for the given feature (e.g. PERS), and k is the number of possible values for the feature (e.g. 1st, 2nd, 3rd).
- If there is strong enough evidence that the values are not independent, introduce an agreement constraint
- For cases where there is not enough evidence either way (n too small), use the heuristic test
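The multinomial likelihood lost from this slide plausibly had the following form; this is a reconstruction from the surrounding definitions (counts c(v_i), k possible values), not recovered from the slide itself:

```latex
% Plausible reconstruction of the missing MLE multinomial log-likelihood:
% with counts c(v_i) for each of the k values and n = \sum_i c(v_i),
\[
  \log L \;=\; \sum_{i=1}^{k} c(v_i)\,\log\frac{c(v_i)}{n}
\]
% Under the null hypothesis this term is computed separately for the two
% positions and summed; the log-likelihood ratio compares the hypotheses.
```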
Lexicon Enhancement for Hebrew Adverbs (I)
- Example 1: B MX → happily
- Example 2: IWTR GBWH → taller
- These are not necessarily in the dictionary
- Both processes are productive
- How can we add these and similar entries to the lexicon? Automatically?
Lexicon Enhancement for Hebrew Adverbs (II)
- For all 1-2 (L1-L2) alignments in the training data:
  1. Extract all cases with at least 2 instances where one word is constant (constant word wL2c, non-constant word wL2v, non-constant word wL1v)
  2. For each word wL2v:
     2.1. Get all L1 translations
     2.2. Find the closest match wL1match to wL1v
     2.3. Learn a replacement rule wL1match → wL1v
  3. For each word wL2POS of the same POS as wL2c:
     3.1. For each possible translation wL1POS:
       3.1.1. Apply all replacement rules where possible: wL1POS → wL1POSmod
       3.1.2. For each applied replacement rule, insert the lexicon entry wL2c wL2POS → wL1POSmod
Lexicon Enhancement for Hebrew Adverbs (III)
- Example: B MX → happily
- Possible translations of MX:
  - joy
  - happiness
- Use edit distance to find that "happiness" is wL1match for "happily"
- Learn the replacement rule ness → ly
Lexicon Enhancement for Hebrew Adverbs (IV)
- For all L2 nouns in the dictionary, get all possible L1 translations, and apply the replacement rule
- If the replacement rule can be applied, add a lexicon entry
- Examples of new adverbs added to the lexicon:
  - ADV::ADV ["B" "APTNWT"] → ["AMBITIOUSLY"]
  - ADV::ADV ["B" "BIRWT"] → ["BRITTLELY"]
  - ADV::ADV ["B" "GWN"] → ["MADLY"]
  - ADV::ADV ["B" "I@TIWT"] → ["METHODICALLY"]
Lexicon Enhancement for Hebrew Comparatives
- Same process as for adverbs
- Examples of new comparatives added to the lexicon:
  - ADJ::ADJ ["IWTR" "MLA"] → ["FULLER"]
  - ADJ::ADJ ["IWTR" "MPGR"] → ["SLOWER"]
  - ADJ::ADJ ["IWTR" "MQCH"] → ["HEATER"]
- All words are checked in the BNC
- Comment: an automatic process, and thus far from perfect
Some Notation
- SL: Source Language, the language to be translated from
- TL: Target Language, the language to be translated into
- L1: language for which abundant information is available
- L2: language for which less information is available
- (Here) SL = L2 = Hebrew, Hindi
- (Here) TL = L1 = English
- POS: part of speech, e.g. noun, adjective, verb
- Parse: structural (tree) analysis of a sentence
- Lattice: list of partial translations, arranged by length and start index
Training Data Example
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Seed Generation Algorithm
- for all training examples:
  - for all 1-1 aligned words:
    - get the L1 POS tag from the parse
    - get the L2 POS tag from the morphology module and the dictionary
    - if the L1 POS and L2 POS tags are not the same, leave both words lexicalized
  - for all other words:
    - leave the words lexicalized
  - create rule word alignments from the training example
  - set the L2 type and L1 type to be the parse root's label
Taxonomy of Structural Mappings (I)
- Non-terminals (NT)
  - used in two rule parts:
    - the type definition of a rule (both for SL and TL, meaning X0 and Y0)
    - the constituent sequences for both languages
  - any label that can be the type of a rule
  - describe higher-level structures such as sentences (S), noun phrases (NP), or prepositional phrases (PP)
  - can be filled with more than one word; filled by other rules
Taxonomy of Structural Mappings (II)
- Pre-terminals (PT)
  - used only in the constituent sequences of the rules, not as X0 or Y0 types
  - filled with only one word (except phrasal lexicon entries); filled by lexical entries, not by other grammar rules
- Terminals (LIT)
  - lexicalized entries in the constituent sequences
  - can be used on both the x- and the y-side
  - can only be filled by the specified terminal itself
Taxonomy of Structural Mappings (III)
- NTs must not be aligned 1-0 or 0-1
- PTs must not be aligned 1-0 or 0-1
- Any word in the bilingual training pair must participate in exactly one LIT, PT, or NT
- An L1 NT is assumed to translate into the same NT in L2
Taxonomy of Structural Mappings (IV)
- Transformation I (SL type into SL component sequence): NT → (NT PT LIT)
- Transformation II (SL type into TL type): NTi → NTi (same type of NT)
- Transformation III (TL type into TL component sequence): NT → (NT PT LIT)
- Transformation IV (SL components into TL components):
  - NTi → NTi (same type of NT)
  - PT → PT
  - LIT → e
  - e → LIT
Basic Compositionality Pseudocode
- traverse the parse top-down
- for each node i in the parse:
  - extract the subtree rooted at i
  - extract the L1 chunk cL1 rooted at i and the corresponding L2 chunk cL2 (using alignments)
  - if the transfer engine can translate cL1 into cL2 using previously learned rules:
    - introduce a compositional element:
      - replace the POS sequences for cL1 and cL2 with the label of node i
      - adjust alignments
    - do not traverse the already-covered subtree
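The traversal above can be sketched as follows. This is a simplified illustration with hypothetical data structures: a parse node is a `(label, children)` tuple, and the `translatable` callback stands in for the transfer engine's check against previously learned rules (alignment adjustment is omitted).

```python
# Sketch of the basic compositionality pass: replace any subtree that
# existing rules can already translate with its constituent label (a
# compositional element); otherwise keep descending into its children.
def compositionalize(node, translatable):
    """node: (label, children); children are sub-nodes or leaf strings.
    Returns the generalized component sequence for this node."""
    label, children = node
    out = []
    for child in children:
        if isinstance(child, tuple):           # an inner constituent
            if translatable(child):
                out.append(child[0])           # compositional element
            else:
                out.extend(compositionalize(child, translatable))
        else:
            out.append(child)                  # a word / POS slot
    return out
```

With a PP rule already learned, the NP of the running example collapses to [DET, ADJ, N, PP]; with no lower-level rules, the flat seed sequence comes back unchanged.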
Co-Embedding Resolution, Iterative Type Learning
- Problem: looking for previously learned rules
- Must determine the optimal learning order
- Co-Embedding Resolution
  - Tag each training example with the depth of its tree, i.e. how many embedded elements it contains
  - Then learn from lowest to highest
- Iterative Type Learning
  - Some types (e.g. PPs) are frequently embedded in others (e.g. NPs)
  - Pre-determine the order in which types are learned
Compositionality: Sample Learned Rules (II)
- L2: RQ AM H RKBT TGI
- L1: ONLY IF THE TRAIN ARRIVES
- C-Structure: (SBAR (ADVP (ADV only-1))
    (SUBORD if-2)
    (S (NP (DET the-3) (N train-4))
      (VP (V arrives-5))))
- SBAR::SBAR [ADVP SUBORD S] → [ADVP SUBORD S]
- Alignments: ((X1 Y1) (X2 Y2) (X3 Y3))
Taxonomy of Constraints (I)
Co-Embedding Resolution, Iterative Type Learning
- find the highest co-embedding score in the training data
- find the number of types to learn, ntypes
- for (i = 0; i <= highest co-embedding score; i++):
  - for (j = 0; j < ntypes; j++):
    - for all training examples with co-embedding score i and of type j:
      - perform Seed Generation
      - perform Compositionality Learning
Taxonomy of Constraints (II)
Taxonomy of Constraints (III)
Constraint Learning: Sample Learned Rules (II)
- L2: H ILD AKL KI HWA HIH RB
- L1: THE BOY ATE BECAUSE HE WAS HUNGRY
- S::S [NP V SBAR] → [NP V SBAR]
- ((X1 Y1) (X2 Y2) (X3 Y3)
  (X0 X2)
  ((X1 GEN) (X2 GEN))
  ((X1 NUM) (X2 NUM))
  ((Y1 NUM) (X1 NUM))
  ((Y2 TENSE) (X2 TENSE))
  ((Y3 NUM) (X3 NUM))
  ((Y3 TENSE) (X3 TENSE))
  (Y0 Y2))
Evaluation with Different Length Limits (I)
Evaluation with Different Length Limits (II) (METEOR score)
Discussion of Results: Comparison of Translations (back to Hebrew-English)
- No grammar: "the doctor helps to patients his"
  - Learned grammar: "the doctor helps to his patients"
  - Reference translation: "The doctor helps his patients"
- No grammar: "the soldier writes many letters to the family of he"
  - Learned grammar: "the soldier writes many letters to his family"
  - Reference translation: "The soldier writes many letters to his family"
Time Complexity of Algorithms
- Seed Generation: O(n)
- Compositionality
  - Basic: O(n · max(tree_depth))
  - Maximum Compositionality: O(n · max(num_children))
- Constraint Learning: O(n · max(num_basic_constraints))
- Practically not an issue
If I Had 6 More Months...
- Application to larger datasets
- Training data enhancement to obtain training examples at different levels (NPs, PPs, etc.)
- More emphasis on rule scoring (more noise)
- More emphasis on learning context constraints
- Constraint learning as a version-space learning problem
- Integrate rules into the statistical system more directly, without producing the full lattice