1
A Hierarchical Phrase-Based Model for Statistical
Machine Translation
Author: David Chiang
  • Presented by Achim Ruopp
  • Formulas/illustrations/numbers extracted from
    referenced papers

2
Outline
  • Phrase Order in Phrase-based Statistical MT
  • Using synchronous CFGs to solve the issue
  • Integrating the idea into an SMT system
  • Results
  • Conclusions
  • Future work
  • My Thoughts/Questions

3
Phrase Order in Phrase-based Statistical MT
  • Example from [Chiang2005]

4
Phrase Order in Phrase-based Statistical MT
  • Translation of the example with a phrase-based
    SMT system (Pharaoh, [Koehn2004])
  • Aozhou shi yu Bei Han you [bangjiao]1
    de shaoshu guojia zhiyi
  • Australia is [dipl. rels.]1 with North
    Korea is one of the few countries
  • (Correct: Australia is one of the few countries
    that have diplomatic relations with North Korea)
  • Uses learned phrase translations
  • Accomplishes local phrase reordering
  • Fails on the overall reordering of phrases
  • Not only applicable to Chinese, but also to
    Japanese (SOV order) and German (scrambling)

5
Idea: Rules for Subphrases
  • Motivation
  • Since phrases are good for learning reorderings
    of words, we can use them to learn reorderings of
    phrases as well
  • Rules with placeholders for subphrases
  • ⟨yu 1 you 2, have 2 with 1⟩ (see the sketch
    after this list)
  • Learned automatically from bitext without
    syntactic annotation
  • Formally syntax-based but not linguistically
    syntax-based
  • the result sometimes resembles a syntactician's
    grammar but often does not
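
To make the placeholder mechanism concrete, here is a minimal
Python sketch (an assumed encoding, not Chiang's implementation)
of applying the rule ⟨yu 1 you 2, have 2 with 1⟩: integers in the
templates mark linked subphrase slots, and filling them swaps the
order of the two subphrases on the English side.

    # One hierarchical rule: source and target templates sharing
    # numbered subphrase slots (hypothetical encoding).
    rule = {
        "source": ["yu", 1, "you", 2],
        "target": ["have", 2, "with", 1],
    }

    def apply_rule(rule, fillers):
        """Splice translated subphrases into the target template.

        fillers maps slot index -> list of English tokens."""
        out = []
        for sym in rule["target"]:
            if isinstance(sym, int):
                out.extend(fillers[sym])  # linked slot: insert subphrase
            else:
                out.append(sym)           # terminal: copy as-is
        return out

    # Slot 1 = "Bei Han", slot 2 = "bangjiao", already translated:
    print(apply_rule(rule, {1: ["North", "Korea"],
                            2: ["diplomatic", "relations"]}))
    # -> ['have', 'diplomatic', 'relations', 'with', 'North', 'Korea']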

6
Synchronous CFGs
  • Developed in the 1960s for programming-language
    compilation [Aho1969]
  • Separate tutorial by Chiang describing them
    [Chiang2005b]
  • In NLP, synchronous CFGs have been used for
  • Machine translation
  • Semantic interpretation

7
Synchronous CFGs
  • Like CFGs, but productions have two right-hand
    sides
  • Source side
  • Target side
  • Related through linked non-terminal symbols
  • E.g. VP → ⟨V1 NP2, NP2 V1⟩
  • One-to-one correspondence
  • A non-terminal of type X is always linked to one
    of the same type
  • Productions are applied in parallel to both sides,
    rewriting linked non-terminals together (see the
    sketch below)
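
A toy Python sketch of a parallel derivation (grammar, encoding,
and lexical rules are illustrative assumptions, not from the
paper): expanding a linked non-terminal pair rewrites both sides
at once, so source and target stay in one-to-one correspondence.

    # Each rule: LHS -> (source side, target side); non-terminals
    # are (symbol, link-index) pairs, terminals plain strings.
    # One rule per symbol here; a real grammar offers alternatives.
    GRAMMAR = {
        "VP": ([("V", 1), ("NP", 2)],
               [("NP", 2), ("V", 1)]),     # VP -> <V1 NP2, NP2 V1>
        "V":  (["kanjian"], ["saw"]),      # toy lexical rules
        "NP": (["ta"], ["him"]),
    }

    def derive(symbol):
        """Expand linked non-terminals in parallel on both sides."""
        src_rhs, tgt_rhs = GRAMMAR[symbol]
        # Each linked pair gets exactly one shared expansion.
        expansions = {}
        for item in src_rhs:
            if isinstance(item, tuple):
                sym, link = item
                expansions[link] = derive(sym)
        def realize(rhs, side):
            out = []
            for item in rhs:
                if isinstance(item, tuple):   # linked non-terminal
                    out.extend(expansions[item[1]][side])
                else:                         # terminal
                    out.append(item)
            return out
        return realize(src_rhs, 0), realize(tgt_rhs, 1)

    print(derive("VP"))  # (['kanjian', 'ta'], ['him', 'saw'])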

8
Synchronous CFGs
9
Synchronous CFGs
  • Limitations
  • No Chomsky normal form
  • This has implications for the complexity of the
    decoder
  • Only limited closure under composition
  • Sister reordering only

10
Model
  • Using the log-linear model [Och2002] (see the
    formula below)
  • Presented by Bill last week
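
For reference, this is the standard log-linear formulation of
[Och2002]: feature functions h_i are combined with learned
weights λ_i and normalized over competing translations e':

    \[
    P(e \mid f) =
      \frac{\exp\big(\sum_i \lambda_i h_i(e, f)\big)}
           {\sum_{e'} \exp\big(\sum_i \lambda_i h_i(e', f)\big)}
    \]

Since the denominator is constant for a given f, decoding simply
maximizes the weighted feature sum.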

11
Model: Rule Features
  • P(γ|α) and P(α|γ)
  • Lexical weights Pw(γ|α) and Pw(α|γ)
  • Estimate how well the words in α translate to the
    words in γ
  • Phrase penalty exp(1)
  • Allows the model to learn a preference for longer
    or shorter derivations
  • Exception: glue rule weights
  • w(S → ⟨X1, X1⟩) = 1
  • w(S → ⟨S1 X2, S1 X2⟩) = exp(−λg)
  • λg controls the model's preference for hierarchical
    phrases over serial phrase combination
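
Combined, the weight of a regular rule X → ⟨γ, α⟩ is the product
of these features raised to their learned weights (the λ indices
below are just notational labels for the five feature weights):

    \[
    w(X \to \langle \gamma, \alpha \rangle) =
      P(\gamma \mid \alpha)^{\lambda_1}\,
      P(\alpha \mid \gamma)^{\lambda_2}\,
      P_w(\gamma \mid \alpha)^{\lambda_3}\,
      P_w(\alpha \mid \gamma)^{\lambda_4}\,
      \exp(1)^{\lambda_{pp}}
    \]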

12
Model: Additional Features
  • Separated out from the rule weights
  • Notational convenience
  • Conceptually cleaner (necessary for
    polynomial-time decoding)
  • Derivation D
  • A set of triples ⟨r, i, j⟩: apply grammar rule r
    to rewrite a non-terminal spanning f(D) from i
    to j
  • This representation is ambiguous
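
These separated features enter the derivation weight outside the
per-rule products. Following the paper's formulation, with p_LM
the language model and |e| the length of the English yield:

    \[
    w(D) = \prod_{\langle r, i, j \rangle \in D} w(r)
           \;\times\; p_{LM}(e)^{\lambda_{LM}}
           \;\times\; \exp(-\lambda_{wp}\,|e|)
    \]

Keeping the language model and word penalty out of the individual
rule weights is what keeps the rule-level factorization clean.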

13
Training
  • Training starts from a symmetrical,
    word-aligned corpus
  • Adopted from [Och2004] and [Koehn2003]
  • How to get from a one-directional alignment to a
    symmetrical alignment
  • How to find initial phrase pairs (see the sketch
    after this list)
  • An alternative would be the joint probability
    model [Marcu2002] that Ping presented
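
A minimal Python sketch of the standard consistency criterion for
initial phrase pairs, in the style of [Och2004]/[Koehn2003]: a
source span and target span form a pair if all of their alignment
links stay inside the box. (This omits the usual extension to
unaligned boundary words; names and data are illustrative.)

    def extract_phrase_pairs(n_src, alignment, max_len=10):
        """Enumerate phrase pairs consistent with a word alignment.

        alignment: set of (src_pos, tgt_pos) links, 0-indexed.
        A span pair is consistent if no link crosses its border."""
        pairs = []
        for i1 in range(n_src):
            for i2 in range(i1, min(i1 + max_len, n_src)):
                # Target positions linked to the source span.
                tgt = [t for (s, t) in alignment if i1 <= s <= i2]
                if not tgt:
                    continue
                j1, j2 = min(tgt), max(tgt)
                # Reject if the target span links back outside it.
                if any(not (i1 <= s <= i2)
                       for (s, t) in alignment if j1 <= t <= j2):
                    continue
                pairs.append(((i1, i2), (j1, j2)))
        return pairs

    # Toy alignment: "yu Bei Han you bangjiao" /
    # "have dipl. rels. with North Korea"
    links = {(0, 3), (1, 4), (2, 5), (3, 0), (4, 1), (4, 2)}
    print(extract_phrase_pairs(5, links))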

14
Training
15
Training
  • Unfortunately, this scheme leads
  • To a large number of rules
  • With false ambiguity (many distinct derivations
    yielding the same translation)
  • The grammar is therefore filtered to
  • Balance grammar size and performance
  • Five filter criteria, e.g.
  • At most two non-terminals per rule
  • Initial phrase length limited to 10

16
Decoding
  • Our good old friend, the CKY parser
  • Enhanced with
  • Beam search
  • A postprocessor to map French (source) derivations
    to English derivations (see the sketch below)
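
A schematic Python sketch of CKY-style decoding with a beam. This
is illustrative only: the real decoder parses with the source
sides of the learned synchronous rules and intersects with an
n-gram language model, while this toy keeps just the best-scoring
items per source span and combines them monotonically or swapped.

    from collections import defaultdict

    def cky_decode(sentence, lexical, binary, beam=10):
        """Toy CKY over source spans [i, j), keeping a beam per cell.

        lexical: token -> list of (logprob, translation)
        binary:  list of (logprob, combine) pairs, where combine
                 joins two partial target strings."""
        n = len(sentence)
        chart = defaultdict(list)        # (i, j) -> [(score, target)]
        for i, w in enumerate(sentence):
            chart[(i, i + 1)] = sorted(lexical.get(w, []),
                                       reverse=True)[:beam]
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j, cands = i + span, []
                for k in range(i + 1, j):          # split point
                    for s1, t1 in chart[(i, k)]:
                        for s2, t2 in chart[(k, j)]:
                            for sr, combine in binary:
                                cands.append((s1 + s2 + sr,
                                              combine(t1, t2)))
                chart[(i, j)] = sorted(cands, reverse=True)[:beam]
        return chart[(0, n)]

    # Monotone ("glue"-like) vs. swapped combination, toy scores.
    binary = [(0.0, lambda a, b: a + " " + b),
              (-0.5, lambda a, b: b + " " + a)]
    lexical = {"yu": [(-0.2, "with")],
               "bangjiao": [(-0.1, "dipl. rels.")]}
    print(cky_decode(["yu", "bangjiao"], lexical, binary)[0])
    # -> (-0.3, 'with dipl. rels.')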

17
Results
  • Baseline
  • Pharaoh [Koehn2003, Koehn2004]
  • Minimum error rate training on the BLEU measure
  • Hierarchical model
  • 2.2 million rules after filtering, down from 24
    million
  • 7.5% relative improvement in BLEU
  • Additional constituent feature
  • An additional feature favoring syntactic parses
  • Trained on 250k sentences of the Penn Chinese
    Treebank
  • Improved accuracy only on the development set

18
Learned Feature Weights
  • Word = word penalty
  • Phr = phrase penalty (pp)
  • λg penalizes glue rules much less than λpp does
    regular rules
  • i.e., this suggests that the model will prefer the
    serial combination of phrases, unless some other
    factor supports the use of hierarchical phrases

19
Conclusions
  • Hierarchical phrase pairs can be learned from
    data without syntactic annotation
  • Hierarchical phrase pairs improve translation
    accuracy significantly
  • Added syntactic information (the constituent
    feature) did not provide a statistically
    significant gain

20
Future Work
  • Move to a more syntactically motivated grammar
  • Reduce grammar size to allow more aggressive
    training settings

21
My Thoughts/Questions
  • Really interesting approach to bringing syntactic
    information into SMT
  • The example sentence was still not translated
    correctly
  • Missing words are problematic
  • Can phrase reordering also be learned by
    lexicalized phrase-reordering models [Och2004]?
  • Why did the constituent feature improve accuracy
    only on the development set, but not on the test
    set?
  • Does data sparseness influence the learned
    feature weights?
  • What syntactic features are already built into
    Pharaoh?

22
References
  • [Aho1969] Aho, A. V. and J. D. Ullman. 1969.
    Syntax directed translations and the pushdown
    assembler. Journal of Computer and System
    Sciences, 3:37–56.
  • [Chiang2005] Chiang, David. 2005. A Hierarchical
    Phrase-Based Model for Statistical Machine
    Translation. In Proceedings of ACL 2005, pages
    263–270.
  • [Chiang2005b] http://www.umiacs.umd.edu/~resnik/ling645_fa2005/notes/synchcfg.pdf
  • [Koehn2003] Koehn, Philipp. 2003. Noun Phrase
    Translation. Ph.D. thesis, University of Southern
    California.
  • [Koehn2004] Koehn, Philipp. 2004. Pharaoh: a
    beam search decoder for phrase-based statistical
    machine translation models. In Proceedings of the
    Sixth Conference of the Association for Machine
    Translation in the Americas, pages 115–124.
  • [Marcu2002] Marcu, Daniel and William Wong.
    2002. A phrase-based, joint probability model for
    statistical machine translation. In Proceedings
    of the 2002 Conference on Empirical Methods in
    Natural Language Processing (EMNLP), pages
    133–139.
  • [Och2002] Och, Franz Josef and Hermann Ney.
    2002. Discriminative training and maximum entropy
    models for statistical machine translation. In
    Proceedings of the 40th Annual Meeting of the
    ACL, pages 295–302.
  • [Och2004] Och, Franz Josef and Hermann Ney.
    2004. The alignment template approach to
    statistical machine translation. Computational
    Linguistics, 30(4):417–449.