Title: Learning Transfer Rules for Machine Translation with Limited Data
Learning Transfer Rules for Machine Translation with Limited Data
- Thesis Defense
- Katharina Probst
- Committee
  - Alon Lavie (Chair)
  - Jaime Carbonell
  - Lori Levin
  - Bonnie Dorr, University of Maryland
Introduction (I)
- Why has Machine Translation been applied to only a few language pairs?
  - Bilingual corpora are available for only a few language pairs (English-French, Japanese-English, etc.)
  - Natural Language Processing tools are available for only a few languages (English, German, Spanish, Japanese, etc.)
  - Scaling to other languages is often difficult, time-consuming, and knowledge-intensive
- What can we do to change this?
Introduction (II)
- This thesis presents a framework for automatic inference of transfer rules
  - Transfer rules capture syntactic and morphological mappings between languages
  - Learned from a small, word-aligned training corpus
- Rules are learned for unbalanced language pairs, where more data and tools are available for one language (L1) than for the other (L2)
Training Data Example
Setting the Stage | Rule Learning | Experimental Results | Conclusions
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Transfer Rule Formalism
- L2: h niin h rxb b h bxirwt
- L1: the widespread interest in the election
- Training example → rule:
  - Rule type: NP::NP
  - Component sequences: [h N h Adj PP] → [the Adj N PP]
  - Component alignments: ((X1 Y1) (X2 Y3) (X3 Y1) (X4 Y2) (X5 Y4))
  - Agreement constraint: ((Y3 num) (X2 num))
  - Value constraints: ((X2 num) sg), ((X2 gen) m)
Research Goals (I)
- Develop a framework for learning transfer rules from bilingual data
  - Training corpus: a set of sentences/phrases in one language with translations into the other (a bilingual corpus), word-aligned
  - Rules include a) a context-free backbone and b) unification constraints
- Improve the grammaticality of MT output with automatically learned rules
  - Learned rules improve translation quality in a run-time system
Research Goals (II)
- Learn rules in the absence of a parser for one of the languages
  - Infer syntactic knowledge about the minor language using a) projection from the major language, b) analysis of word alignments, c) morphological information, and d) a bilingual dictionary
- Combine a set of different knowledge sources in a meaningful way
  - Resources (parser, morphology modules, dictionary, etc.) often disagree
  - Combine conflicting knowledge sources
Research Goals (III)
- Address limited-data scenarios with frugal techniques
  - Unbalanced language pairs with little or no bilingual data
  - The training corpus is small (120 sentences and phrases), but carefully designed
- Push MT research in the direction of incorporating syntax into statistical systems
  - Infer highly involved linguistic information; incorporate it with a statistical decoder in a hybrid system
Thesis Statement (I)
- Given bilingual, word-aligned data, and given a parser for one of the languages in the translation pair, we can learn a set of syntactic transfer rules for MT.
- The rules consist of a context-free backbone and unification constraints, learned in two separate stages.
- The resulting rules form a syntactic translation grammar for the language pair and are used in a statistical transfer system to translate unseen examples.
Thesis Statement (II)
- The translation quality of a run-time system that uses the learned rules is
  - superior to a system that does not use the learned rules
  - comparable to the performance using a small manual grammar written by an expert
  on Hebrew-English and Hindi-English translation tasks.
- The thesis presents a new approach to learning transfer rules for Machine Translation: the system learns syntactic models from text in a novel way and in a rich hypothesis space, aiming to emulate a human grammar writer.
Talk Overview
- Setting the Stage: related work, system overview, training data
- Rule Learning
  - Step 1: Seed Generation
  - Step 2: Compositionality
  - Step 3: Unification Constraints
- Experimental Results
- Conclusions
Related Work: MT Overview
- [Diagram: depth of analysis between source language and target language]
  - Analyze meaning: semantics-based MT
  - Analyze structure: syntax-based MT
  - Analyze sequence: statistical MT, EBMT
Related Work (I)
- Traditional transfer-based MT: analysis, transfer, generation (Hutchins and Somers 1992, Senellart et al. 2001)
- Data-driven MT
  - EBMT: store a database of examples, possibly generalized (Sato and Nagao 1990, Brown 1997)
  - SMT: usually a noisy-channel model, i.e. a translation model combined with a target language model (Vogel et al. 2003, Och and Ney 2002, Brown 2004)
  - Hybrid (Knight et al. 1995, Habash and Dorr 2002)
Related Work (II)
- Structure/syntax for MT
  - EBMT (Alshawi et al. 2000, Watanabe et al. 2002)
  - SMT (Yamada and Knight 2001, Wu 1997)
  - Other approaches (Habash and Dorr 2002, Menezes and Richardson 2001)
- Learning from elicited data / small datasets (Nirenburg 1998, McShane et al. 2003, Jones and Havrilla 1998)
Training Data Example
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Transfer Rule Formalism
- L2: h niin h rxb b h bxirwt
  (gloss: the interest the widespread in the election)
- L1: the widespread interest in the election
- Training example → rule:
  - Rule type: NP::NP
  - Component sequences: [h N h Adj PP] → [the Adj N PP]
  - Component alignments: ((X1 Y1) (X2 Y3) (X3 Y1) (X4 Y2) (X5 Y4))
  - Agreement constraint: ((Y3 num) (X2 num))
  - Value constraints: ((X2 num) sg), ((X2 gen) m)
Training Data Collection
- Elicitation corpora
  - Generally designed to cover major linguistic phenomena
  - A bilingual user translates and word-aligns
- Structural Elicitation Corpus
  - Designed to cover a wide variety of structural phenomena (Probst and Lavie 2004)
  - 120 sentences and phrases
  - Targeting specific constituent types: AdvP, AdjP, NP, PP, SBAR, S, with subtypes
  - Translated into Hebrew, Hindi
Resources
- L1 parses: either from a statistical parser (Charniak 1999), or data from the Penn Treebank
- L1 morphology: can be obtained or created (I created one for English)
- L1 language model: trained on a large amount of monolingual data
- L2 morphology: if available, use a morphology module; if not, use automated techniques such as (Goldsmith 2001) or (Probst 2003)
- Bilingual lexicon: gives word-level correspondences; created from training data or previously existing
Development and Testing Environment
- Syntactic transfer engine: takes rules and lexicon and produces all possible partial translations
- Statistical decoder: uses word-to-word probabilities and a TL language model to extract the best combination of partial translations (Vogel et al. 2003)
System Overview
- [System diagram]
- Training time: bilingual training data + L1 parses and morphology → Rule Learner → learned rules
- Run time: L2 test data → Transfer Engine (learned rules, L2 morphology, bilingual lexicon) → lattice → Statistical Decoder (L1 language model) → final translation
Overview of Learning Phases
- Seed Generation: create initial guesses at rules based on specific training examples
- Compositionality: add context-free structure to rules, so that rules can combine
- Constraint Learning: learn appropriate unification constraints
Seed Generation
- Training example in rule format
- Produce rules that closely reflect the training examples
  - But generalize to the POS level when words are 1-1 aligned
- Rules are fully functional, but show little generalization
- Seed rules are intended as input for the two later learning phases
Seed Generation: Sample Learned Rule
- L2: TKNIT H @IPWL H HTNDBWTIT
  (gloss: plan the care the voluntary)
- L1: THE VOLUNTARY CARE PLAN
- C-Structure: (NP (DET the-1) (ADJP (ADJ voluntary-2)) (N care-3) (N plan-4))
- NP::NP [N "H" N "H" ADJ] → ["THE" ADJ N N]
- Alignments: ((X1 Y4) (X3 Y3) (X5 Y2))
Seed Generation Algorithm
- For a given training example, produce a seed rule
- For all 1-1 aligned words, enter the POS tag (e.g. N) into the component sequences
  - Get POS tags from the morphology module and the parse
  - Hypothesis: on unseen data, any word of this POS can fill this slot
- For all words that are not 1-1 aligned, put the actual words into the component sequences
- The L2 and L1 types are the parse's root label
- Derive alignments from the training example
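As a sketch in code, the algorithm above might look as follows. This is illustrative only: the data structures and the `seed_rule` function are hypothetical, not the thesis implementation, and word positions are 0-based here.

```python
# Minimal sketch of seed-rule generation (hypothetical representation):
# generalize 1-1 aligned words to their POS tag, keep everything else
# lexicalized, and take the rule type from the parse root label.
from collections import Counter

def seed_rule(l2_words, l1_words, alignment, l2_pos, l1_pos, root):
    """alignment: list of (l2_index, l1_index) pairs, 0-based."""
    # Count how many links each position participates in.
    l2_links = Counter(i for i, _ in alignment)
    l1_links = Counter(j for _, j in alignment)

    l2_seq = list(l2_words)
    l1_seq = list(l1_words)
    for i, j in alignment:
        # 1-1 aligned: neither side is linked to anything else.
        if l2_links[i] == 1 and l1_links[j] == 1:
            # Only generalize when both sides agree on the POS tag;
            # otherwise both words stay lexicalized.
            if l2_pos[i] == l1_pos[j]:
                l2_seq[i] = l2_pos[i]
                l1_seq[j] = l1_pos[j]
    return {"type": f"{root}::{root}",
            "l2_seq": l2_seq, "l1_seq": l1_seq,
            "alignment": sorted(alignment)}
```

For example, for L2 "ild @wb" / L1 "a good boy", the unaligned "a" stays lexicalized while the two 1-1 aligned words generalize to N and ADJ.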
Compositionality
- Generalize seed rules to reflect structure
- Infer a partial constituent grammar for L2
- Rules map a mixture of
  - lexical items (LIT)
  - parts of speech (PT)
  - constituents (NT)
- Analyze the L1 parse to find generalizations
- The produced rules are context-free
Compositionality: Example
- L2: BTWK H M@PH HIH M
  (gloss: that inside the envelope was name)
- L1: THAT INSIDE THE ENVELOPE WAS A NAME
- C-Structure: (SBAR (SUBORD that-1)
    (S (PP (PREP inside-2)
        (NP (DET the-3) (N envelope-4)))
      (VP (V was-5))
      (NP (DET a-6) (N name-7))))
- SBAR::SBAR [SUBORD PP V NP] → [SUBORD PP V NP]
- Alignments: ((X1 Y1) (X2 Y2) (X3 Y3) (X4 Y4))
Basic Compositionality Algorithm
- Traverse the parse tree in order to partition the sentence
- For each subtree, if a previously learned rule can account for the subtree and its translation, introduce a compositional element
- Compositional element: the subtree's root label, for both L1 and L2
- Adjust alignments
- Note the preference for maximum generalization, because the tree is traversed from the top
Maximum Compositionality
- Assume that lower-level rules exist; this assumption is correct if the training data is completely compositional
- Introduce compositional elements for the direct children of the parse root node
- Results in a higher level of compositionality, and thus higher generalization power
- Can overgeneralize, but because of the strong decoder this is generally preferable
Other Advanced Compositionality Techniques
- Techniques that generalize to the POS level even words that are not 1-1 aligned
- Techniques that enhance the dictionary based on training data
- Techniques that deal with noun compounds
- Rule filters to ensure that no learned rules violate the axioms
Constraint Learning
- Annotate context-free compositional rules with unification constraints that
  - a) limit the applicability of rules to certain contexts (thereby limiting parsing ambiguity)
  - b) ensure the passing of a feature value from source to target language (thereby limiting transfer ambiguity)
  - c) disallow certain target language outputs (thereby limiting generation ambiguity)
- Value constraints and agreement constraints are learned separately
Constraint Learning: Overview
- Introduce basic constraints: use the morphology module(s) and parses to introduce constraints for the words in a training example
- Create agreement constraints (where appropriate) by merging basic constraints
- Retain appropriate value constraints: they help in restricting a rule to certain contexts or in restricting output
Constraint Learning: Agreement Constraints (I)
- For example: in an NP, do the adjective and the noun agree in number?
- In Hebrew, "the good boys":
  - Correct: H ILDIM @WBIM
    (the.det.def boy.pl.m good.pl.m)
  - Incorrect: H ILDIM @WB
    (the.det.def boy.pl.m good.sg.m)
Constraint Learning: Agreement Constraints (II)
- E.g. number in a determiner and the corresponding noun
- Use a likelihood ratio test to determine which value constraints can be merged into agreement constraints
- The log-likelihood ratio is defined by proposing distributions that could have given rise to the data
  - Null hypothesis: the values are independently distributed.
  - Alternative hypothesis: the values are not independently distributed.
- For sparse data, use a heuristic test: is there more evidence for than against the agreement constraint?
Constraint Learning: Agreement Constraints (III)
- Collect all instances in the training data where an adjective and a noun mark for number
- Count how often the feature value is the same, how often different
- Feature values are distributed by
  - two multinomial distributions (if they are independent, i.e. the null hypothesis)
  - one multinomial distribution (if they should agree, i.e. the alternative hypothesis)
- Compute the log-likelihood under each scenario and perform the LL-ratio or heuristic test
- Generalize to the cross-lingual case
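The comparison can be sketched as follows. This is an illustrative simplification under maximum-likelihood multinomials, not the thesis code; the thesis' exact likelihood formulation (including its treatment of mismatched pairs via an indicator term) differs in detail.

```python
# Hedged sketch of the agreement likelihood-ratio test. Given paired
# feature values (e.g. number on an adjective and on its noun), compare:
#   H0: the two positions draw values independently (two multinomials)
#   H1: the positions agree (one shared multinomial)
from collections import Counter
from math import log

def log_multinomial(values):
    """Log-likelihood of the values under their own MLE multinomial."""
    counts = Counter(values)
    n = len(values)
    return sum(c * log(c / n) for c in counts.values())

def agreement_log_ratio(pairs):
    """log L(H1) - log L(H0); large values favor an agreement constraint.
    Under this simplified H1 a mismatched pair has probability zero,
    so any mismatch yields -inf."""
    a_vals = [a for a, _ in pairs]
    b_vals = [b for _, b in pairs]
    ll_h0 = log_multinomial(a_vals) + log_multinomial(b_vals)
    if any(a != b for a, b in pairs):
        return float("-inf")
    # Under agreement, each pair is a single draw of the shared value.
    ll_h1 = log_multinomial(a_vals)
    return ll_h1 - ll_h0
```

With data that always agrees, H1 wins (positive ratio); a single disagreement rules out strict agreement in this simplified model.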
Constraint Learning: Value Constraints
- L2: ild @wb (gloss: boy good); L1: a good boy
  - NP::NP [N ADJ] → ["A" ADJ N]
  - (... ((X1 NUM) SG) ((X2 NUM) SG) ...)
- L2: ildim t@wbim (gloss: boys good); L1: good boys
  - NP::NP [N ADJ] → [ADJ N]
  - (... ((X1 NUM) PL) ((X2 NUM) PL) ...)
- Retain value constraints to distinguish the two rules
Constraint Learning: Value Constraints
- Retain those value constraints that determine the structure of the L2 translation
- If two rules have
  - different L2 component sequences,
  - the same L1 component sequence,
  - and differ in only a value constraint,
- then retain the value constraint to distinguish them
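A minimal sketch of this retention check, using a hypothetical rule representation (the dict fields and the two sample rules are invented for illustration, not taken from the thesis):

```python
# Sketch of the value-constraint retention criterion: a value constraint
# is worth keeping when two rules share the L1 side, differ on the L2
# side, and are otherwise told apart only by their value constraints.
def should_retain(rule_a, rule_b):
    """True if the rules' value constraints are what distinguishes them."""
    return (rule_a["l1_seq"] == rule_b["l1_seq"]
            and rule_a["l2_seq"] != rule_b["l2_seq"]
            and rule_a["value_constraints"] != rule_b["value_constraints"])

# Hypothetical singular/plural rule pair with the same L1 sequence.
sg_rule = {"l1_seq": ["ADJ", "N"], "l2_seq": ["N", "ADJ"],
           "value_constraints": [("X1 NUM", "SG")]}
pl_rule = {"l1_seq": ["ADJ", "N"], "l2_seq": ["H", "N", "ADJ"],
           "value_constraints": [("X1 NUM", "PL")]}
```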
Constraint Learning: Sample Learned Rule
- L2: ANI AIN@LIGN@I (gloss: I intelligent)
- L1: I AM INTELLIGENT
- S::S [NP ADJP] → [NP AM ADJP]
- ((X1 Y1) (X2 Y3)
  ((X1 NUM) (X2 NUM))
  ((Y1 NUM) (X1 NUM))
  ((Y1 PER) (X1 PER))
  (Y0 Y2))
Dimensions of Evaluation
- Learning phases / settings: default, Seed Generation only, Compositionality, Constraint Learning
- Evaluation: rule-based evaluation and pruning
- Test corpora: TestSet, TestSuite
- Run-time settings: length limit
- Portability: Hindi→English translation
Test Corpora
- Test Corpus: newspaper text (Haaretz), 65 sentences, 1 reference translation
- Test Suite: specific phenomena, 138 sentences, 1 reference translation
- Hindi: 245 sentences, 4 reference translations
- Compare: statistical system only, system with manually written grammar, system with learned grammar
- Manually written grammar: written by an expert within about a month (both Hebrew and Hindi)
Test Corpus Evaluation, Default Settings (I)
Test Corpus Evaluation, Default Settings (II)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 81.98, p-value ≈ 0 (df = 999)
  - ⇒ significant at the 100% confidence level
  - Median of differences: -0.0217, with 95% confidence interval [-0.0383, -0.0056]
- METEOR
  - t-value 1.73, p-value 0.044 (df = 61)
  - ⇒ significant at a higher than 95% confidence level
Test Corpus Evaluation, Default Settings (III)
Test Corpus Evaluation, Different Settings (I)
Test Corpus Evaluation, Different Settings (II)
- System times in seconds, lattice sizes
- ⇒ 20% reduction in lattice size!
Evaluation with Rule Scoring (I)
- Estimate the translation power of the rules
- Use the training data: most training examples are actually unseen data for a given rule
- Match each arc against the reference translation
- A rule's score is the average of all its arcs' scores
- Order the rules by precision score, then prune
- Goal of rule scoring: limit run-time
- Note the trade-off with decoder power
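The scoring-and-pruning step can be sketched as follows. This is a hedged illustration with a hypothetical representation: the thesis scores arcs produced by the transfer engine against reference translations, whereas here per-arc scores are simply given as numbers, and `keep_fraction` is an invented knob.

```python
# Sketch of precision-based rule scoring and pruning: a rule's score is
# the mean of its per-arc match scores; the lowest-scoring rules are cut.
def score_rules(arc_scores, keep_fraction=0.8):
    """arc_scores maps rule id -> list of per-arc scores in [0, 1].
    Returns the rule ids kept after pruning, best first."""
    averages = {rule: sum(scores) / len(scores)
                for rule, scores in arc_scores.items() if scores}
    ranked = sorted(averages, key=averages.get, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))  # always keep one
    return ranked[:keep]
```

Pruning more aggressively lowers run-time at the cost of translation options, which is the decoder-power trade-off noted above.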
Evaluation with Rule Scoring (II)
Evaluation with Rule Scoring (III)
Test Suite Evaluation (I)
- Test suite designed to target specific constructions
  - Conjunctions of PPs
  - Adverb phrases
  - Reordering of adjectives and nouns
  - AdjP embedded in NP
  - Possessives
  - ...
- Designed in English, translated into Hebrew
- 138 sentences, one reference translation
Test Suite Evaluation (II)
Test Suite Evaluation (III)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 122.53, p-value ≈ 0 (df = 999)
  - ⇒ statistically significantly better at the 100% confidence level
  - Median of differences: -0.0462, with 95% confidence interval [-0.0721, -0.0245]
- METEOR
  - t-value 47.20, p-value 0.0 (df = 137)
  - ⇒ statistically significantly better at the 100% confidence level
Test Suite Evaluation (IV)
Hindi-English Portability Test (I)
Hindi-English Portability Test (II)
- The learned grammar performs statistically significantly better than the baseline
- Performed a one-tailed paired t-test
- BLEU with resampling
  - t-value 37.20, p-value ≈ 0 (df = 999)
  - ⇒ statistically significantly better at the 100% confidence level
  - Median of differences: -0.0024, with 80% confidence interval [-0.0052, 0.0001]
- METEOR
  - t-value 1.72, p-value 0.043 (df = 244)
  - ⇒ statistically significantly better at a higher than 95% confidence level
Hindi-English Portability Test (III)
Discussion of Results
- Performance superior to a standard SMT system
- Learned grammar comparable to the manual grammar
- Learned grammar: higher METEOR score, indicating that it is more general
- Constraints: slightly lower performance in exchange for higher run-time efficiency
- Pruning: slightly lower performance in exchange for higher run-time efficiency
Conclusions and Contributions
- A framework for learning transfer rules from bilingual data
- Improvement of translation output in a hybrid transfer-and-statistical system
- Addressing limited-data scenarios with frugal techniques
- Combining different knowledge sources in a meaningful way
- Pushing MT research in the direction of incorporating syntax into statistical systems
- Human-readable rules that can be improved by an expert
Summary
- Take a bilingual word-aligned corpus, and learn transfer rules with constituent transfer and unification constraints.
- "Is it a big corpus?" "Ahem. No."
- "Do I have a parser for both languages?" "No, just for one."
- "So I can use a dictionary, morphology modules, a parser... But these are all imperfect resources. How do I combine them?" "We can do it!"
- "Ok."
References (I)
- Ayan, Fazil, Bonnie J. Dorr, and Nizar Habash. 2004. Application of Alignment to Real-World Data: Combining Linguistic and Statistical Techniques for Adaptable MT. Proceedings of AMTA-2004.
- Baldwin, Timothy and Aline Villavicencio. 2002. Extracting the Unextractable: A case study on verb-particles. Proceedings of CoNLL-2002.
- Brown, Ralf D. 2004. A Modified Burrows-Wheeler Transform for Highly-Scalable Example-Based Translation. Proceedings of AMTA-2004.
- Charniak, Eugene, Kevin Knight and Kenji Yamada. 2003. Syntax-based Language Models for Statistical Machine Translation. Proceedings of MT-Summit IX.
References (II)
- Hutchins, John W. and Harold L. Somers. 1992. An Introduction to Machine Translation. Academic Press, London.
- Jones, Douglas and R. Havrilla. 1998. Twisted Pair Grammar: Support for Rapid Development of Machine Translation for Low Density Languages. Proceedings of AMTA-98.
- Menezes, Arul and Stephen D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. Proceedings of the Workshop on Data-driven Machine Translation at ACL-2001.
- Nirenburg, Sergei. 1998. Project Boas: A Linguist in the Box as a Multi-Purpose Language Resource. Proceedings of LREC-98.
References (III)
- Orasan, Constantin and Richard Evans. 2001. Learning to identify animate references. Proceedings of CoNLL-2001.
- Probst, Katharina. 2003. Using smart bilingual projection to feature-tag a monolingual dictionary. Proceedings of CoNLL-2003.
- Probst, Katharina and Alon Lavie. 2004. A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages. Proceedings of AMTA-04.
- Probst, Katharina and Lori Levin. 2002. Challenges in Automated Elicitation of a Controlled Bilingual Corpus. Proceedings of TMI-02.
References (IV)
- Senellart, Jean, Mirko Plitt, Christophe Bailly, and Francoise Cardoso. 2001. Resource Alignment and Implicit Transfer. Proceedings of MT-Summit VIII.
- Vogel, Stephan and Alicia Tribble. 2002. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. Proceedings of ICSLP-2002.
- Watanabe, Hideo, Sadao Kurohashi, and Eiji Aramaki. 2000. Finding Structural Correspondences from Bilingual Parsed Corpus for Corpus-based Translation. Proceedings of COLING-2000.
Log-likelihood test for agreement constraints (I)
- Create a list of all possible index pairs that should be considered for an agreement constraint
- L1-only constraints
  - All head-head pairs that ever occur with the same feature (not necessarily the same value), and all head-nonhead pairs in the same constituent that occur with the same feature (not necessarily the same value)
  - For example, a possible agreement constraint: Num agreement between Det and N in an NP where the Det is a dependent of N
- L2-only constraints: same as the L1-only constraints above
- L2→L1 constraints: all situations where two aligned indices mark the same feature
Log-likelihood test for agreement constraints (II)
- Hypothesis 0: the values are independently distributed.
- Hypothesis 1: the values are not independently distributed.
- [Likelihood formulas under the null and alternative hypotheses; equation images not recoverable]
- where ind is 1 if vx_i1 = vx_i2 and 0 otherwise.
Log-likelihood test for agreement constraints (III)
- i1 and i2 are drawn from a multinomial distribution.
- [Multinomial likelihood formula; equation image not recoverable]
- where c(v_i) is the number of times the value v_i was encountered for the given feature (e.g. PERS), and k is the number of possible values for the feature (e.g. 1st, 2nd, 3rd).
- If there is strong enough evidence that the values are not independent, introduce an agreement constraint
- For cases where there is not enough evidence either way (n too small), use the heuristic test
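The multinomial likelihood lost from this slide plausibly had the following form; this is a reconstruction from the surrounding definitions (counts c(v_i), k possible values), not recovered from the slide itself:

```latex
% Plausible reconstruction of the missing MLE multinomial log-likelihood:
% with counts c(v_i) for each of the k values and n = \sum_i c(v_i),
\[
  \log L \;=\; \sum_{i=1}^{k} c(v_i)\,\log\frac{c(v_i)}{n}
\]
% Under the null hypothesis this term is computed separately for the two
% positions and summed; the log-likelihood ratio compares the hypotheses.
```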
Lexicon Enhancement for Hebrew Adverbs (I)
- Example 1: B MX → happily
- Example 2: IWTR GBWH → taller
- These are not necessarily in the dictionary
- Both processes are productive
- How can we add these and similar entries to the lexicon? Automatically?
Lexicon Enhancement for Hebrew Adverbs (II)
- For all 1-2 (L1-L2) alignments in the training data:
  1. Extract all cases with at least 2 instances where one word is constant (constant word wL2c, non-constant word wL2v, non-constant word wL1v)
  2. For each word wL2v:
     2.1. Get all L1 translations
     2.2. Find the closest match wL1match to wL1v
     2.3. Learn a replacement rule wL1match → wL1v
  3. For each word wL2POS of the same POS as wL2c:
     3.1. For each possible translation wL1POS:
       3.1.1. Apply all replacement rules where possible: wL1POS → wL1POSmod
       3.1.2. For each applied replacement rule, insert the lexicon entry wL2c wL2POS → wL1POSmod
Lexicon Enhancement for Hebrew Adverbs (III)
- Example: B MX → happily
- Possible translations of MX:
  - joy
  - happiness
- Use edit distance to find that "happiness" is wL1match for "happily"
- Learn the replacement rule ness → ly
Lexicon Enhancement for Hebrew Adverbs (IV)
- For all L2 nouns in the dictionary, get all possible L1 translations, and apply the replacement rule
- If the replacement rule can be applied, add a lexicon entry
- Examples of new adverbs added to the lexicon:
  - ADV::ADV ["B" "APTNWT"] → ["AMBITIOUSLY"]
  - ADV::ADV ["B" "BIRWT"] → ["BRITTLELY"]
  - ADV::ADV ["B" "GWN"] → ["MADLY"]
  - ADV::ADV ["B" "I@TIWT"] → ["METHODICALLY"]
Lexicon Enhancement for Hebrew Comparatives
- Same process as for adverbs
- Examples of new comparatives added to the lexicon:
  - ADJ::ADJ ["IWTR" "MLA"] → ["FULLER"]
  - ADJ::ADJ ["IWTR" "MPGR"] → ["SLOWER"]
  - ADJ::ADJ ["IWTR" "MQCH"] → ["HEATER"]
- All words are checked in the BNC
- Comment: an automatic process, and thus far from perfect
Some Notation
- SL: Source Language, the language to be translated from
- TL: Target Language, the language to be translated into
- L1: language for which abundant information is available
- L2: language for which less information is available
- (Here) SL = L2 = Hebrew, Hindi
- (Here) TL = L1 = English
- POS: part of speech, e.g. noun, adjective, verb
- Parse: structural (tree) analysis of a sentence
- Lattice: list of partial translations, arranged by length and start index
Training Data Example
- SL: the widespread interest in the election
- TL: h niin h rxb b h bxirwt
- Alignment: ((1,1),(1,3),(2,4),(3,2),(4,5),(5,6),(6,7))
- Type: NP
- Parse: (NP (DET the-1) (ADJ widespread-2) (N interest-3)
    (PP (PREP in-4)
      (NP (DET the-5) (N election-6))))
Seed Generation Algorithm
- for all training examples:
  - for all 1-1 aligned words:
    - get the L1 POS tag from the parse
    - get the L2 POS tag from the morphology module and the dictionary
    - if the L1 POS and L2 POS tags are not the same, leave both words lexicalized
  - for all other words:
    - leave the words lexicalized
  - create rule word alignments from the training example
  - set the L2 type and L1 type to be the parse root's label
Taxonomy of Structural Mappings (I)
- Non-terminals (NT)
  - used in two rule parts:
    - the type definition of a rule (both for SL and TL, meaning X0 and Y0)
    - the constituent sequences for both languages
  - any label that can be the type of a rule
  - describe higher-level structures such as sentences (S), noun phrases (NP), or prepositional phrases (PP)
  - can be filled with more than one word; filled by other rules
Taxonomy of Structural Mappings (II)
- Pre-terminals (PT)
  - used only in the constituent sequences of the rules, not as X0 or Y0 types
  - filled with only one word (except phrasal lexicon entries); filled by lexical entries, not by other grammar rules
- Terminals (LIT)
  - lexicalized entries in the constituent sequences
  - can be used on both the x- and the y-side
  - can only be filled by the specified terminal itself
Taxonomy of Structural Mappings (III)
- NTs must not be aligned 1-0 or 0-1
- PTs must not be aligned 1-0 or 0-1
- Any word in the bilingual training pair must participate in exactly one LIT, PT, or NT
- An L1 NT is assumed to translate into the same NT in L2
Taxonomy of Structural Mappings (IV)
- Transformation I (SL type into SL component sequence): NT → (NT PT LIT)
- Transformation II (SL type into TL type): NTi → NTi (same type of NT)
- Transformation III (TL type into TL component sequence): NT → (NT PT LIT)
- Transformation IV (SL components into TL components):
  - NTi → NTi (same type of NT)
  - PT → PT
  - LIT → e
  - e → LIT
Basic Compositionality Pseudocode
- traverse the parse top-down
- for each node i in the parse:
  - extract the subtree rooted at i
  - extract the L1 chunk cL1 rooted at i and the corresponding L2 chunk cL2 (using alignments)
  - if the transfer engine can translate cL1 into cL2 using previously learned rules:
    - introduce a compositional element:
      - replace the POS sequences for cL1 and cL2 with the label of node i
      - adjust alignments
    - do not traverse the already-covered subtree
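The traversal above can be sketched as follows. This is a simplified illustration with hypothetical data structures: a parse node is a `(label, children)` tuple, and the `translatable` callback stands in for the transfer engine's check against previously learned rules (alignment adjustment is omitted).

```python
# Sketch of the basic compositionality pass: replace any subtree that
# existing rules can already translate with its constituent label (a
# compositional element); otherwise keep descending into its children.
def compositionalize(node, translatable):
    """node: (label, children); children are sub-nodes or leaf strings.
    Returns the generalized component sequence for this node."""
    label, children = node
    out = []
    for child in children:
        if isinstance(child, tuple):           # an inner constituent
            if translatable(child):
                out.append(child[0])           # compositional element
            else:
                out.extend(compositionalize(child, translatable))
        else:
            out.append(child)                  # a word / POS slot
    return out
```

With a PP rule already learned, the NP of the running example collapses to [DET, ADJ, N, PP]; with no lower-level rules, the flat seed sequence comes back unchanged.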
Co-Embedding Resolution, Iterative Type Learning
- Problem: looking for previously learned rules
- Must determine the optimal learning order
- Co-Embedding Resolution
  - Tag each training example with the depth of its tree, i.e. how many embedded elements it contains
  - Then learn from lowest to highest
- Iterative Type Learning
  - Some types (e.g. PPs) are frequently embedded in others (e.g. NPs)
  - Pre-determine the order in which types are learned
Compositionality: Sample Learned Rules (II)
- L2: RQ AM H RKBT TGI
- L1: ONLY IF THE TRAIN ARRIVES
- C-Structure: (SBAR (ADVP (ADV only-1))
    (SUBORD if-2)
    (S (NP (DET the-3) (N train-4))
      (VP (V arrives-5))))
- SBAR::SBAR [ADVP SUBORD S] → [ADVP SUBORD S]
- Alignments: ((X1 Y1) (X2 Y2) (X3 Y3))
Taxonomy of Constraints (I)
Co-Embedding Resolution, Iterative Type Learning
- find the highest co-embedding score in the training data
- find the number of types to learn, ntypes
- for (i = 0; i <= highest co-embedding score; i++):
  - for (j = 0; j < ntypes; j++):
    - for all training examples with co-embedding score i and of type j:
      - perform Seed Generation
      - perform Compositionality Learning
Taxonomy of Constraints (II)
Taxonomy of Constraints (III)
Constraint Learning: Sample Learned Rules (II)
- L2: H ILD AKL KI HWA HIH RB
- L1: THE BOY ATE BECAUSE HE WAS HUNGRY
- S::S [NP V SBAR] → [NP V SBAR]
- ((X1 Y1) (X2 Y2) (X3 Y3)
  (X0 X2)
  ((X1 GEN) (X2 GEN))
  ((X1 NUM) (X2 NUM))
  ((Y1 NUM) (X1 NUM))
  ((Y2 TENSE) (X2 TENSE))
  ((Y3 NUM) (X3 NUM))
  ((Y3 TENSE) (X3 TENSE))
  (Y0 Y2))
Evaluation with Different Length Limits (I)
Evaluation with Different Length Limits (II) (METEOR score)
Discussion of Results: Comparison of Translations (back to Hebrew-English)
- No grammar: "the doctor helps to patients his"
  - Learned grammar: "the doctor helps to his patients"
  - Reference translation: "The doctor helps his patients"
- No grammar: "the soldier writes many letters to the family of he"
  - Learned grammar: "the soldier writes many letters to his family"
  - Reference translation: "The soldier writes many letters to his family"
Time Complexity of Algorithms
- Seed Generation: O(n)
- Compositionality
  - Basic: O(n · max(tree_depth))
  - Maximum Compositionality: O(n · max(num_children))
- Constraint Learning: O(n · max(num_basic_constraints))
- Practically not an issue
If I Had 6 More Months...
- Application to larger datasets
- Training data enhancement to obtain training examples at different levels (NPs, PPs, etc.)
- More emphasis on rule scoring (more noise)
- More emphasis on learning context constraints
- Constraint learning as a version-space learning problem
- Integrate rules into the statistical system more directly, without producing the full lattice