Transcript and Presenter's Notes

Title: Grammatical Machine Translation


1
Grammatical Machine Translation
  • Stefan Riezler & John Maxwell

2
Overview
  • Introduction
  • Extracting F-Structure Snippets
  • Parsing-Transfer-Generation
  • Statistical Models and Training
  • Experimental Evaluation
  • Discussion

3
Section 1: Introduction
4
Introduction
  • Recent approaches to SMT use
  • Phrase-based SMT
  • Syntactic knowledge
  • Phrase-based SMT is great for
  • Local ordering
  • Short idiomatic expressions
  • But it is not so good for
  • Learning long-distance dependencies (LDDs)
  • Generalising to unseen phrases that share
    non-overt linguistic info

5
Statistical Parsers
  • Statistical Parsers can provide information to
  • Resolve LDDs
  • Generalise to unseen phrases that share non-overt
    linguistic info
  • Examples
  • Xia & McCord 2004
  • Collins et al. 2005
  • Lin 2004
  • Ding & Palmer 2005
  • Quirk et al. 2005

6
Grammar-based Generation
  • Could grammar-based generation be useful for MT?
  • Quirk et al. 2005
  • A simple statistical model outperforms the
    grammar-based generator of Menezes & Richardson
    2001 on BLEU score
  • Charniak et al. 2003
  • Parsing-based language modelling can improve
    grammaticality of translations while not
    improving BLEU score
  • Perhaps the BLEU score is not a sufficient way
    to test for grammaticality
  • Further investigation needed

7
Grammatical Machine Translation
  • Aim
  • Investigate incorporating a grammar-based
    generator into a dependency-based SMT system
  • The authors present
  • A dependency-based SMT model
  • Statistical components modelled on the
    phrase-based system of Koehn et al. 2003
  • Also used
  • Component weights adjusted by minimum error
    rate (MER) training (Och 2003)
  • A grammar-based generator
  • N-gram and distortion models

8
Section 2: Extracting F-Structure Snippets
9
Extracting F-Structure Snippets
  • Source-language (SL) and target-language (TL)
    sentences of the bilingual corpus are parsed using
  • LFG grammars
  • For each English and German f-structure pair
  • The two f-structures that most preserve
    dependencies are selected
  • Many-to-many word alignments are used to create
    many-to-many correspondences between the
    substructures
  • The correspondences are the basis for deciding
    what goes into the basic transfer rules

10
Extracting F-Structure Snippets: Example
  • Dafür bin ich zutiefst dankbar ⇒ I have a
    deep appreciation for that
  • Gloss: <for that> <am> <I> <deepest> <thankful>
  • Many-to-many bidirectional word alignment

11
Transfer Rule Extraction Example
  • From the aligned words we get the following
    substructure correspondences

12
Transfer Rule Extraction Example
  • From the correspondences two kinds of transfer
    rules are extracted
  • Primitive Transfer Rules
  • Complex Transfer Rules
  • Transfer Contiguity Constraint
  • Source and target f-structures are each
    connected
  • F-structures in the transfer source can only be
    aligned with f-structures in the transfer target,
    and vice versa (a minimal check of this
    constraint is sketched below)
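
As an aside, the contiguity constraint is easy to state in code. The
following is a minimal sketch, not the authors' implementation:
f-structure snippets are modelled as sets of node ids plus
parent-child edges, and is_connected / satisfies_contiguity are
hypothetical helper names introduced here for illustration.

from collections import defaultdict

def is_connected(nodes, edges):
    # Treat the f-structure fragment as an undirected graph and
    # check that its nodes form a single connected piece.
    if not nodes:
        return False
    graph = defaultdict(set)
    for parent, child in edges:
        if parent in nodes and child in nodes:
            graph[parent].add(child)
            graph[child].add(parent)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen == set(nodes)

def satisfies_contiguity(src_nodes, src_edges, tgt_nodes, tgt_edges, links):
    # Both snippets must be connected, and every alignment link must
    # either stay inside the snippet pair or stay entirely outside it.
    if not (is_connected(src_nodes, src_edges)
            and is_connected(tgt_nodes, tgt_edges)):
        return False
    return all((s in src_nodes) == (t in tgt_nodes) for s, t in links)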

13
Transfer Rule Extraction Example
  • Primitive Rule 1
    pred(X1, sein), subj(X1, X2), xcomp(X1, X3)
      ⇒ pred(X1, have), subj(X1, X2), obj(X1, X3)

14
Transfer Rule Extraction Example
  • Primitive Rule 2
    pred(X1, ich) ⇒ pred(X1, I)

15
Transfer Rule Extraction Example
  • Primitive Rule 3
    pred(X1, dafür)
      ⇒ pred(X1, for), obj(X1, X2), pred(X2, that)

16
Transfer Rule Extraction Example
  • Primitive Rule 4
    pred(X1, dankbar), adj(X1, X2), in_set(X3, X2),
    pred(X3, zutiefst)
      ⇒ pred(X1, appreciation), spec(X1, X2),
        pred(X2, a), adj(X1, X3), in_set(X4, X3),
        pred(X4, deep)

17
Transfer Rule Extraction Example
  • Complex Transfer Rules
  • Primitive transfer rules that are adjacent in
    the f-structure are combined to form more
    complex rules
  • Example (rules 1 & 2 above):

    pred(X1, sein), subj(X1, X2), pred(X2, ich),
    xcomp(X1, X3)
      ⇒ pred(X1, have), subj(X1, X2), pred(X2, I),
        obj(X1, X3)

In the worst case there can be an exponential
number of combinations of primitive transfer
rules, so the number of primitive rules used to
form a complex rule is restricted to 3, which
keeps the number of transfer rules extracted
O(n²) in the worst case (a toy sketch of the
combination step follows).
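
To make the bound concrete, here is a hedged toy sketch of the
combination step, not the actual extraction code: a primitive rule is
a (source facts, target facts) pair, adjacency is approximated as
sharing an f-structure variable between source sides, and at most 3
primitives are merged per complex rule.

MAX_PRIMITIVES = 3  # the paper's bound on primitives per complex rule

def variables(facts):
    # Collect f-structure variables (X1, X2, ...) mentioned in the facts.
    return {arg for _, *args in facts for arg in args if str(arg).startswith("X")}

def adjacent(rule_a, rule_b):
    # Approximation of f-structure adjacency used for illustration:
    # the two rules' source sides share a variable.
    return bool(variables(rule_a[0]) & variables(rule_b[0]))

def complex_rules(primitives):
    # Merge runs of adjacent primitive rules, at most MAX_PRIMITIVES
    # long, which keeps the number of extracted rules quadratic in the
    # number of primitives rather than exponential.
    results = []
    for i in range(len(primitives)):
        src, tgt = list(primitives[i][0]), list(primitives[i][1])
        prev = primitives[i]
        for j in range(i + 1, min(i + MAX_PRIMITIVES, len(primitives))):
            if not adjacent(prev, primitives[j]):
                break
            src = src + list(primitives[j][0])
            tgt = tgt + list(primitives[j][1])
            prev = primitives[j]
            results.append((src[:], tgt[:]))
    return results

# Rules 1 and 2 from the slides above combine into one complex rule:
rule1 = ([("pred", "X1", "sein"), ("subj", "X1", "X2"), ("xcomp", "X1", "X3")],
         [("pred", "X1", "have"), ("subj", "X1", "X2"), ("obj", "X1", "X3")])
rule2 = ([("pred", "X2", "ich")], [("pred", "X2", "I")])
print(complex_rules([rule1, rule2]))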
18
Section 3: Parsing-Transfer-Generation

19
Parsing
  • LFG grammars are used to parse source and target text
  • A FRAGMENT grammar augments the standard
    grammar, increasing robustness
  • The best parse is determined by the
    fewest-chunk method

20
Transfer
  • Rules are applied to the source f-structure
    non-deterministically and in parallel
  • Each fact of the German f-structure is translated
    by exactly one transfer rule
  • A default rule is included that allows any fact
    to be translated as itself
  • A chart is used to encode the translations
  • Beam search decoding is used to select the most
    probable translations (a minimal sketch follows
    below)
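
A minimal sketch of the beam search, illustrative only and not the
actual implementation: each chart cell is assumed to hold alternative
translations of one source fact, score is a caller-supplied stand-in
for the log-linear feature model of Section 4, and the default beam
size of 20 comes from the evaluation slides.

import heapq

def beam_search(chart, score, beam_size=20):
    # chart: list of cells, each a list of alternative translated
    # facts for one source fact. Hypotheses are extended cell by cell
    # and only the beam_size best partial hypotheses survive.
    beams = [((), 0.0)]
    for cell in chart:
        expanded = [(facts + (alt,), logp + score(alt))
                    for facts, logp in beams for alt in cell]
        beams = heapq.nlargest(beam_size, expanded, key=lambda h: h[1])
    return beams[0]  # best complete translation and its score

# Toy usage with a made-up per-fact score:
chart = [["have", "be"], ["I"], ["appreciation", "thankfulness"]]
best, logp = beam_search(chart, score=lambda fact: -len(fact))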

21
Generation
  • The method of generation has to be fault tolerant
  • The transfer system can be given a fragmentary
    parse as input
  • The transfer system can output an invalid
    f-structure
  • Unknown predicates
  • Default morphology is used to inflect the source
    stem for English
  • Unknown structures
  • A default grammar is used that allows any attribute
    to be generated in any order with any category

22
Section 4: Statistical Models & Training

23
Statistical Components
  • Modelled on the statistical components of Pharaoh
  • Pharaoh integrates 8 statistical models
  • Relative frequency of phrase translations in
    source-to-target
  • Relative frequency of phrase translations in
    target-to-source
  • Lexical weighting in source-to-target
  • Lexical weighting in target-to-source
  • Phrase count
  • Language model probability
  • Word count
  • Distortion probability

24
Statistical Components
  • The following statistics are computed for each
    translation
  1. Log-probability of source-to-target transfer
     rules, where the probability r(e|f) of a rule
     that transfers source snippet f into target
     snippet e is estimated by relative frequency
     (formula reconstructed below)
  2. Log-probability of target-to-source transfer rules
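
The formula itself is not reproduced in this transcript; the standard
relative-frequency estimate it describes would read, in LaTeX:

  r(e \mid f) = \frac{\mathrm{count}(f \Rightarrow e)}{\sum_{e'} \mathrm{count}(f \Rightarrow e')}

with the target-to-source probability r(f|e) estimated symmetrically.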
25
Statistical Components
  3. Log-probability of lexical translations from
     source to target snippets, estimated from
     Viterbi alignments â between source word
     positions i = 1, …, n and target word positions
     j = 1, …, m for stems f_i and e_j in snippets f
     and e, with relative word translation
     frequencies t(e_j|f_i) (formula reconstructed
     below)
  4. Log-probability of lexical translations from
     target-to-source snippets
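
Again the formula image is missing from the transcript; assuming it
matches the lexical weighting of Koehn et al. 2003, it would be:

  p_w(\bar{e} \mid \bar{f}, \hat{a}) = \prod_{j=1}^{m} \frac{1}{|\{i \mid (i, j) \in \hat{a}\}|} \sum_{(i, j) \in \hat{a}} t(e_j \mid f_i)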
26
Statistical Components
  5. Number of transfer rules
  6. Number of transfer rules with frequency 1
  7. Number of default transfer rules
  8. Log-probability of strings of predicates from
     root to frontier of the target f-structure,
     estimated from predicate trigrams of English
  9. Number of predicates in the target language
  10. Number of constituent movements during
      generation, based on the original order of the
      head predicates of the constituents (for
      example, AP[2] BP[3] CP[1] counts as two
      movements since the head predicate of CP moved
      from first to third position; one reading of
      this count is sketched below)
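
The movement count is not fully specified here; one reading that is
consistent with the AP[2] BP[3] CP[1] example is the number of
pairwise order inversions between the generated constituent order and
the original order of their head predicates. A toy sketch under that
assumption:

def movement_count(generated_order):
    # generated_order lists the original head-predicate positions of
    # the constituents in their generated order, e.g. [2, 3, 1] for
    # AP[2] BP[3] CP[1]. Counts pairwise inversions: [2, 3, 1] -> 2,
    # matching "the head predicate of CP moved from first to third".
    count = 0
    for i in range(len(generated_order)):
        for j in range(i + 1, len(generated_order)):
            if generated_order[i] > generated_order[j]:
                count += 1
    return count

assert movement_count([2, 3, 1]) == 2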

27
Statistical Components
  11. Number of generation repairs
  12. Log-probability of the target string as
      computed by a trigram language model
  13. Number of words in the target string
  • Features 1-10 are used to choose the most
    probable translation from the transfer chart
  • Features 1-7 are tests on source and target
    f-structure snippets related via transfer rules
  • Features 8-10 are language model and distortion
    features on the target c- and f-structures
  • Features 11-13 are computed on the strings
    generated from the target f-structure
  • The statistics are combined into a log-linear
    model (reconstructed below) whose parameters are
    adjusted by minimum error rate training
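
The log-linear form is not shown in the transcript; assuming the
standard Och-style model over the 13 feature functions h_k with
weights λ_k, it would be:

  p_{\lambda}(e \mid f) = \frac{\exp \sum_{k=1}^{13} \lambda_k h_k(e, f)}{\sum_{e'} \exp \sum_{k=1}^{13} \lambda_k h_k(e', f)}

with the λ_k set by minimum error rate training rather than by
maximum likelihood.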

28
Section 5: Experimental Evaluation

29
Experimental Evaluation
  • Europarl German to English
  • Sentences of length 5-15 words
  • Training set: 163,141 sentences
  • Development set: 1,967 sentences
  • Test set: 1,755 sentences (same as Koehn et al.
    2003)
  • Bidirectional word alignment created from the
    word alignment of IBM model 4 as implemented by
    GIZA (Och et al. 1999)
  • Grammars achieve 100% coverage on unseen data
  • 80% as full parses
  • 20% as fragment parses
  • 700,000 transfer rules extracted
  • For language modelling, the trigram model of
    Stolcke 2002 is used

30
Experimental Evaluation
  • For translating the test set
  • 1 parse was used for each German sentence
  • 10 transferred f-structures were considered
  • 1,000 strings were generated for each transferred
    f-structure
  • The most probable target f-structure is obtained
    by a beam search on the transfer chart using
    features 1-10 above, with a beam size of 20
  • Features 11-13 are computed on the strings that
    are generated

31
Experimental Evaluation
  • For automatic evaluation they used NIST combined
    with the approximate randomization test (Noreen,
    1989; a sketch of the test follows below)
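
For reference, the approximate randomization test can be sketched in
a few lines. This is a generic illustration, not the authors'
evaluation code, and it simplifies by summing per-sentence scores
(NIST proper is computed corpus-wide, not sentence by sentence):

import random

def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    # scores_a, scores_b: per-sentence scores of the two systems,
    # aligned by test sentence. Estimates how often randomly swapping
    # the systems' outputs yields a score difference at least as
    # large as the one observed (the p-value of the difference).
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    hits = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a  # shuffle: swap this sentence's outputs
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # smoothed p-value estimate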

32
Experimental Evaluation
  • Manual Evaluation
  • To separate the factors of grammaticality and
    translation adequacy
  • 500 sentences randomly extracted from in-coverage
    examples
  • 2 independent human judges
  • The judges were presented with the output of the
    phrase-based SMT system and the LFG-based system
    in a blind test and asked to state a preference
    for one of the translations based on
  • Grammaticality / fluency
  • Translational / semantic adequacy

33
Experimental Evaluation
  • Promising results for examples that are
    in-coverage of the LFG grammars
  • However, backing off to robustness techniques for
    parsing and generation results in a loss of
    translation quality
  • Rule Extraction Problems
  • 20% of the parses are fragmental
  • Errors in the rule extraction process result in
    ill-formed transfer rules
  • Parsing-Transfer-Generation Problems
  • Parsing errors → transfer errors → generation
    errors
  • In-coverage → disambiguation errors in parsing
    and transfer → suboptimal translations

34
Experimental Evaluation
  • Despite the use of minimum error rate training
    and n-gram language models, the system cannot be
    tuned to maximize n-gram scores on reference
    translations in the same way as phrase-based
    systems, since the statistical ordering models
    are employed after generation
  • This gives preference to grammaticality over
    similarity to the reference translations

35
Conclusion
  • SMT model that marries phrase-based SMT with
    traditional grammar-based MT
  • The NIST measure showed that the results achieved
    are comparable with the phrase-based SMT system
    of Koehn et al. 2003 for in-coverage examples
  • Manual evaluation showed significant improvements
    in both grammaticality and translational adequacy
    for in-coverage examples

36
Conclusion
  • The system can determine whether or not a source
    sentence is in-coverage
  • This opens the possibility of a hybrid system
    that achieves improved grammaticality at
    state-of-the-art translation quality
  • Future Work
  • Improving the translation of in-coverage source
    sentences, e.g. via stochastic generation
  • Applying the system to other language pairs and
    data sets

37
References
  • Miriam Butt, Helge Dyvik, Tracy King, Hiroshi
    Masuichi and Christian Rohrer. 2002. The Parallel
    Grammar Project.
  • Eugene Charniak, Kevin Knight and Kenji Yamada.
    2003. Syntax-based Language Models for Statistical
    Machine Translation.
  • Michael Collins, Philipp Koehn and Ivona
    Kucerova. 2005. Clause Restructuring for
    Statistical Machine Translation.
  • Philipp Koehn, Franz Och and Daniel Marcu. 2003.
    Statistical Phrase-based Translation.
  • Philipp Koehn. 2004. Pharaoh: A Beam Search
    Decoder for Phrase-based Statistical Machine
    Translation.
  • Arul Menezes and Stephen Richardson. 2001. A
    Best-first Alignment Algorithm for Automatic
    Extraction of Transfer Mappings from Bilingual
    Corpora.
  • Franz Och, Christoph Tillmann and Hermann Ney.
    1999. Improved Alignment Models for Statistical
    Machine Translation.
  • Franz Och. 2003. Minimum Error Rate Training in
    Statistical Machine Translation.
  • Kishore Papineni, Salim Roukos, Todd Ward and
    Wei-Jing Zhu. 2002. BLEU: A Method for Automatic
    Evaluation of Machine Translation.
  • Stefan Riezler, Tracy King, Ronald Kaplan,
    Richard Crouch, John Maxwell and Mark Johnson.
    2002. Parsing the Wall Street Journal using LFG
    and Discriminative Estimation Techniques.
  • Stefan Riezler and John Maxwell. 2006.
    Grammatical Machine Translation.
  • Fei Xia and Michael McCord. 2004. Improving a
    Statistical MT System with Automatically Learned
    Rewrite Patterns.