Transcript and Presenter's Notes

Title: Grammatical Machine Translation


1
Grammatical Machine Translation
  • Stefan Riezler & John Maxwell

2
Overview
  • Introduction
  • Extracting F-Structure Snippets
  • Parsing-Transfer-Generation
  • Statistical Models and Training
  • Experimental Evaluation
  • Discussion

3
Section 1: Introduction
4
Introduction
  • Recent approaches to SMT use
  • Phrase-based SMT
  • Syntactic knowledge
  • Phrase-based SMT is great for
  • Local ordering
  • Short idiomatic expressions
  • But it is not so good for
  • Learning long-distance dependencies (LDDs)
  • Generalising to unseen phrases that share
    non-overt linguistic info

5
Statistical Parsers
  • Statistical Parsers can provide information to
  • Resolve LDDs
  • Generalise to unseen phrases that share non-overt
    linguistic info
  • Examples
  • Xia & McCord 2004
  • Collins et al. 2005
  • Lin 2004
  • Ding & Palmer 2005
  • Quirk et al. 2005

6
Grammar-based Generation
  • Could grammar-based generation be useful for MT?
  • Quirk et al. 2005
  • A simple statistical model outperforms the
    grammar-based generator of Menezes & Richardson
    2001 on BLEU score
  • Charniak et al. 2003
  • Parsing-based language modelling can improve
    grammaticality of translations while not
    improving BLEU score
  • Perhaps the BLEU score is not a sufficient way
    to test for grammaticality
  • Further investigation needed

7
Grammatical Machine Translation
  • Aim
  • Investigate incorporating a grammar-based
    generator into a dependency-based SMT system
  • The authors present
  • A dependency-based SMT model
  • Statistical components modelled on the
    phrase-based system of Koehn et al. 2003
  • Also used
  • Component weights adjusted by minimum error
    rate (MER) training (Och 2003)
  • A grammar-based generator
  • N-gram and distortion models

8
Section 2: Extracting F-Structure Snippets
9
Extracting F-Structure Snippets
  • Source-language (SL) and target-language (TL)
    sentences of the bilingual corpus are parsed using
  • LFG grammars
  • For each English and German f-structure pair
  • The two f-structures that most preserve
    dependencies are selected
  • Many-to-many word alignments are used to create
    many-to-many correspondences between the
    substructures
  • The correspondences are the basis for deciding
    what goes into the basic transfer rules

10
Extracting F-Structure Snippets: Example
  • Dafür bin ich zutiefst dankbar ⇒ I have a
    deep appreciation for that
  • Gloss: <for that> <am> <I> <deepest> <thankful>
  • Many-to-many bidirectional word alignment

11
Transfer Rule Extraction Example
  • From the aligned words we get the following
    substructure correspondences

12
Transfer Rule Extraction Example
  • From the correspondences two kinds of transfer
    rules are extracted
  • Primitive Transfer Rules
  • Complex Transfer Rules
  • Transfer Contiguity Constraint
  • Source and target f-structures are each
    connected
  • F-structures in the transfer source can only be
    aligned with f-structures in the transfer target,
    and vice versa (a minimal check of this
    constraint is sketched below)
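
As an aside, the contiguity constraint is easy to state in code. The
following is a minimal sketch, not the authors' implementation:
f-structure snippets are modelled as sets of node ids plus
parent-child edges, and is_connected / satisfies_contiguity are
hypothetical helper names introduced here for illustration.

from collections import defaultdict

def is_connected(nodes, edges):
    # Treat the f-structure fragment as an undirected graph and
    # check that its nodes form a single connected piece.
    if not nodes:
        return False
    graph = defaultdict(set)
    for parent, child in edges:
        if parent in nodes and child in nodes:
            graph[parent].add(child)
            graph[child].add(parent)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen == set(nodes)

def satisfies_contiguity(src_nodes, src_edges, tgt_nodes, tgt_edges, links):
    # Both snippets must be connected, and every alignment link must
    # either stay inside the snippet pair or stay entirely outside it.
    if not (is_connected(src_nodes, src_edges)
            and is_connected(tgt_nodes, tgt_edges)):
        return False
    return all((s in src_nodes) == (t in tgt_nodes) for s, t in links)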

13
Transfer Rule Extraction Example
  • Primitive Rule 1
    pred(X1, sein), subj(X1, X2), xcomp(X1, X3)
      ⇒ pred(X1, have), subj(X1, X2), obj(X1, X3)

14
Transfer Rule Extraction Example
  • Primitive Rule 2
    pred(X1, ich) ⇒ pred(X1, I)

15
Transfer Rule Extraction Example
  • Primitive Rule 3
    pred(X1, dafür)
      ⇒ pred(X1, for), obj(X1, X2), pred(X2, that)

16
Transfer Rule Extraction Example
  • Primitive Rule 4
    pred(X1, dankbar), adj(X1, X2), in_set(X3, X2),
    pred(X3, zutiefst)
      ⇒ pred(X1, appreciation), spec(X1, X2),
        pred(X2, a), adj(X1, X3), in_set(X4, X3),
        pred(X4, deep)

17
Transfer Rule Extraction Example
  • Complex Transfer Rules
  • Primitive transfer rules that are adjacent in
    the f-structure are combined to form more
    complex rules
  • Example (rules 1 & 2 above):

    pred(X1, sein), subj(X1, X2), pred(X2, ich),
    xcomp(X1, X3)
      ⇒ pred(X1, have), subj(X1, X2), pred(X2, I),
        obj(X1, X3)

In the worst case there can be an exponential
number of combinations of primitive transfer
rules, so the number of primitive rules used to
form a complex rule is restricted to 3, which
keeps the number of transfer rules extracted
O(n²) in the worst case (a toy sketch of the
combination step follows).
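
To make the bound concrete, here is a hedged toy sketch of the
combination step, not the actual extraction code: a primitive rule is
a (source facts, target facts) pair, adjacency is approximated as
sharing an f-structure variable between source sides, and at most 3
primitives are merged per complex rule.

MAX_PRIMITIVES = 3  # the paper's bound on primitives per complex rule

def variables(facts):
    # Collect f-structure variables (X1, X2, ...) mentioned in the facts.
    return {arg for _, *args in facts for arg in args if str(arg).startswith("X")}

def adjacent(rule_a, rule_b):
    # Approximation of f-structure adjacency used for illustration:
    # the two rules' source sides share a variable.
    return bool(variables(rule_a[0]) & variables(rule_b[0]))

def complex_rules(primitives):
    # Merge runs of adjacent primitive rules, at most MAX_PRIMITIVES
    # long, which keeps the number of extracted rules quadratic in the
    # number of primitives rather than exponential.
    results = []
    for i in range(len(primitives)):
        src, tgt = list(primitives[i][0]), list(primitives[i][1])
        prev = primitives[i]
        for j in range(i + 1, min(i + MAX_PRIMITIVES, len(primitives))):
            if not adjacent(prev, primitives[j]):
                break
            src = src + list(primitives[j][0])
            tgt = tgt + list(primitives[j][1])
            prev = primitives[j]
            results.append((src[:], tgt[:]))
    return results

# Rules 1 and 2 from the slides above combine into one complex rule:
rule1 = ([("pred", "X1", "sein"), ("subj", "X1", "X2"), ("xcomp", "X1", "X3")],
         [("pred", "X1", "have"), ("subj", "X1", "X2"), ("obj", "X1", "X3")])
rule2 = ([("pred", "X2", "ich")], [("pred", "X2", "I")])
print(complex_rules([rule1, rule2]))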
18
Section 3: Parsing-Transfer-Generation

19
Parsing
  • LFG grammars are used to parse source and target text
  • A FRAGMENT grammar augments the standard
    grammar, increasing robustness
  • The best parse is determined by the
    fewest-chunk method

20
Transfer
  • Rules are applied to the source f-structure
    non-deterministically and in parallel
  • Each fact of the German f-structure is translated
    by exactly one transfer rule
  • A default rule is included that allows any fact
    to be translated as itself
  • A chart is used to encode the translations
  • Beam search decoding is used to select the most
    probable translations (a minimal sketch follows
    below)
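
A minimal sketch of the beam search, illustrative only and not the
actual implementation: each chart cell is assumed to hold alternative
translations of one source fact, score is a caller-supplied stand-in
for the log-linear feature model of Section 4, and the default beam
size of 20 comes from the evaluation slides.

import heapq

def beam_search(chart, score, beam_size=20):
    # chart: list of cells, each a list of alternative translated
    # facts for one source fact. Hypotheses are extended cell by cell
    # and only the beam_size best partial hypotheses survive.
    beams = [((), 0.0)]
    for cell in chart:
        expanded = [(facts + (alt,), logp + score(alt))
                    for facts, logp in beams for alt in cell]
        beams = heapq.nlargest(beam_size, expanded, key=lambda h: h[1])
    return beams[0]  # best complete translation and its score

# Toy usage with a made-up per-fact score:
chart = [["have", "be"], ["I"], ["appreciation", "thankfulness"]]
best, logp = beam_search(chart, score=lambda fact: -len(fact))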

21
Generation
  • The method of generation has to be fault tolerant
  • The transfer system can be given a fragmentary
    parse as input
  • The transfer system can output an invalid
    f-structure
  • Unknown predicates
  • Default morphology is used to inflect the source
    stem for English
  • Unknown structures
  • A default grammar is used that allows any attribute
    to be generated in any order with any category

22
Section 4: Statistical Models & Training

23
Statistical Components
  • Modelled on the statistical components of Pharaoh
  • Pharaoh integrates 8 statistical models
  • Relative frequency of phrase translations in
    source-to-target
  • Relative frequency of phrase translations in
    target-to-source
  • Lexical weighting in source-to-target
  • Lexical weighting in target-to-source
  • Phrase count
  • Language model probability
  • Word count
  • Distortion probability

24
Statistical Components
  • The following statistics are computed for each
    translation
  1. Log-probability of source-to-target transfer
     rules, where the probability r(e|f) of a rule
     that transfers source snippet f into target
     snippet e is estimated by relative frequency
     (formula reconstructed below)
  2. Log-probability of target-to-source transfer rules
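
The formula itself is not reproduced in this transcript; the standard
relative-frequency estimate it describes would read, in LaTeX:

  r(e \mid f) = \frac{\mathrm{count}(f \Rightarrow e)}{\sum_{e'} \mathrm{count}(f \Rightarrow e')}

with the target-to-source probability r(f|e) estimated symmetrically.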
25
Statistical Components
  3. Log-probability of lexical translations from
     source to target snippets, estimated from
     Viterbi alignments â between source word
     positions i = 1, …, n and target word positions
     j = 1, …, m for stems f_i and e_j in snippets f
     and e, with relative word translation
     frequencies t(e_j|f_i) (formula reconstructed
     below)
  4. Log-probability of lexical translations from
     target-to-source snippets
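
Again the formula image is missing from the transcript; assuming it
matches the lexical weighting of Koehn et al. 2003, it would be:

  p_w(\bar{e} \mid \bar{f}, \hat{a}) = \prod_{j=1}^{m} \frac{1}{|\{i \mid (i, j) \in \hat{a}\}|} \sum_{(i, j) \in \hat{a}} t(e_j \mid f_i)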
26
Statistical Components
  5. Number of transfer rules
  6. Number of transfer rules with frequency 1
  7. Number of default transfer rules
  8. Log-probability of strings of predicates from
     root to frontier of the target f-structure,
     estimated from predicate trigrams of English
  9. Number of predicates in the target language
  10. Number of constituent movements during
      generation, based on the original order of the
      head predicates of the constituents (for
      example, AP[2] BP[3] CP[1] counts as two
      movements since the head predicate of CP moved
      from first to third position; one reading of
      this count is sketched below)
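
The movement count is not fully specified here; one reading that is
consistent with the AP[2] BP[3] CP[1] example is the number of
pairwise order inversions between the generated constituent order and
the original order of their head predicates. A toy sketch under that
assumption:

def movement_count(generated_order):
    # generated_order lists the original head-predicate positions of
    # the constituents in their generated order, e.g. [2, 3, 1] for
    # AP[2] BP[3] CP[1]. Counts pairwise inversions: [2, 3, 1] -> 2,
    # matching "the head predicate of CP moved from first to third".
    count = 0
    for i in range(len(generated_order)):
        for j in range(i + 1, len(generated_order)):
            if generated_order[i] > generated_order[j]:
                count += 1
    return count

assert movement_count([2, 3, 1]) == 2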

27
Statistical Components
  11. Number of generation repairs
  12. Log-probability of the target string as
      computed by a trigram language model
  13. Number of words in the target string
  • Features 1-10 are used to choose the most
    probable translation from the transfer chart
  • Features 1-7 are tests on source and target
    f-structure snippets related via transfer rules
  • Features 8-10 are language model and distortion
    features on the target c- and f-structures
  • Features 11-13 are computed on the strings
    generated from the target f-structure
  • The statistics are combined into a log-linear
    model (reconstructed below) whose parameters are
    adjusted by minimum error rate training
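
The log-linear form is not shown in the transcript; assuming the
standard Och-style model over the 13 feature functions h_k with
weights λ_k, it would be:

  p_{\lambda}(e \mid f) = \frac{\exp \sum_{k=1}^{13} \lambda_k h_k(e, f)}{\sum_{e'} \exp \sum_{k=1}^{13} \lambda_k h_k(e', f)}

with the λ_k set by minimum error rate training rather than by
maximum likelihood.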

28
Section 5: Experimental Evaluation

29
Experimental Evaluation
  • Europarl German to English
  • Sentences of length 5-15 words
  • Training set: 163,141 sentences
  • Development set: 1,967 sentences
  • Test set: 1,755 sentences (same as Koehn et al.
    2003)
  • Bidirectional word alignment created from the
    word alignment of IBM model 4 as implemented by
    GIZA (Och et al. 1999)
  • Grammars achieve 100% coverage on unseen data
  • 80% as full parses
  • 20% as fragment parses
  • 700,000 transfer rules extracted
  • For language modelling, the trigram model of
    Stolcke 2002 is used

30
Experimental Evaluation
  • For translating the test set
  • 1 parse was used for each German sentence
  • 10 transferred f-structures were considered
  • 1,000 strings were generated for each transferred
    f-structure
  • The most probable target f-structure is obtained
    by a beam search on the transfer chart using
    features 1-10 above, with a beam size of 20
  • Features 11-13 are computed on the strings that
    are generated

31
Experimental Evaluation
  • For automatic evaluation they used NIST combined
    with the approximate randomization test (Noreen,
    1989; a sketch of the test follows below)
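
For reference, the approximate randomization test can be sketched in
a few lines. This is a generic illustration, not the authors'
evaluation code, and it simplifies by summing per-sentence scores
(NIST proper is computed corpus-wide, not sentence by sentence):

import random

def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    # scores_a, scores_b: per-sentence scores of the two systems,
    # aligned by test sentence. Estimates how often randomly swapping
    # the systems' outputs yields a score difference at least as
    # large as the one observed (the p-value of the difference).
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    hits = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a  # shuffle: swap this sentence's outputs
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # smoothed p-value estimate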

32
Experimental Evaluation
  • Manual Evaluation
  • To separate the factors of grammaticality and
    translation adequacy
  • 500 sentences randomly extracted from in-coverage
    examples
  • 2 independent human judges
  • The judges were presented with the output of the
    phrase-based SMT system and the LFG-based system
    in a blind test and asked to state a preference
    for one of the translations based on
  • Grammaticality / fluency
  • Translational / semantic adequacy

33
Experimental Evaluation
  • Promising results for examples that are
    in-coverage of the LFG grammars
  • However, backing off to robustness techniques for
    parsing and generation results in a loss of
    translation quality
  • Rule Extraction Problems
  • 20% of the parses are fragmental
  • Errors in the rule extraction process result in
    ill-formed transfer rules
  • Parsing-Transfer-Generation Problems
  • Parsing errors → transfer errors → generation
    errors
  • In-coverage → disambiguation errors in parsing
    and transfer → suboptimal translations

34
Experimental Evaluation
  • Despite the use of minimum error rate training
    and n-gram language models, the system cannot be
    tuned to maximize n-gram scores on reference
    translations in the same way as phrase-based
    systems, since the statistical ordering models
    are employed after generation
  • This gives preference to grammaticality over
    similarity to the reference translations

35
Conclusion
  • SMT model that marries phrase-based SMT with
    traditional grammar-based MT
  • The NIST measure showed that the results achieved
    are comparable with the phrase-based SMT system
    of Koehn et al. 2003 for in-coverage examples
  • Manual evaluation showed significant improvements
    in both grammaticality and translational adequacy
    for in-coverage examples

36
Conclusion
  • The system can determine whether or not a source
    sentence is in-coverage
  • This opens the possibility of a hybrid system
    that achieves improved grammaticality at
    state-of-the-art translation quality
  • Future Work
  • Improving the translation of in-coverage source
    sentences, e.g. via stochastic generation
  • Applying the system to other language pairs and
    data sets

37
References
  • Miriam Butt, Helge Dyvik, Tracy King, Hiroshi
    Masuichi and Christian Rohrer. 2002. The Parallel
    Grammar Project.
  • Eugene Charniak, Kevin Knight and Kenji Yamada.
    2003. Syntax-based Language Models for Statistical
    Machine Translation.
  • Michael Collins, Philipp Koehn and Ivona
    Kucerova. 2005. Clause Restructuring for
    Statistical Machine Translation.
  • Philipp Koehn, Franz Och and Daniel Marcu. 2003.
    Statistical Phrase-based Translation.
  • Philipp Koehn. 2004. Pharaoh: A Beam Search
    Decoder for Phrase-based Statistical Machine
    Translation.
  • Arul Menezes and Stephen Richardson. 2001. A
    Best-first Alignment Algorithm for Automatic
    Extraction of Transfer Mappings from Bilingual
    Corpora.
  • Franz Och, Christoph Tillmann and Hermann Ney.
    1999. Improved Alignment Models for Statistical
    Machine Translation.
  • Franz Och. 2003. Minimum Error Rate Training in
    Statistical Machine Translation.
  • Kishore Papineni, Salim Roukos, Todd Ward and
    Wei-Jing Zhu. 2002. BLEU: A Method for Automatic
    Evaluation of Machine Translation.
  • Stefan Riezler, Tracy King, Ronald Kaplan,
    Richard Crouch, John Maxwell and Mark Johnson.
    2002. Parsing the Wall Street Journal using LFG
    and Discriminative Estimation Techniques.
  • Stefan Riezler and John Maxwell. 2006.
    Grammatical Machine Translation.
  • Fei Xia and Michael McCord. 2004. Improving a
    Statistical MT System with Automatically Learned
    Rewrite Patterns.