1
A Hierarchical Phrase-Based Model for Statistical
Machine Translation
Author: David Chiang
  • Presented by Achim Ruopp
  • Formulas/illustrations/numbers extracted from
    referenced papers

2
Outline
  • Phrase Order in Phrase-based Statistical MT
  • Using synchronous CFGs to solve the issue
  • Integrating the idea into an SMT system
  • Results
  • Conclusions
  • Future work
  • My Thoughts/Questions

3
Phrase Order in Phrase-based Statistical MT
  • Example from [Chiang2005]

4
Phrase Order in Phrase-based Statistical MT
  • Translation of the example with a phrase-based
    SMT system (Pharaoh, [Koehn2004])
  • Aozhou shi yu Bei Han you [bangjiao]1
    de shaoshu guojia zhiyi
  • Australia is [dipl. rels.]1 with North
    Korea is one of the few countries
  • (Correct: Australia is one of the few countries
    that have diplomatic relations with North Korea)
  • Uses learned phrase translations
  • Accomplishes local phrase reordering
  • Fails on the overall reordering of phrases
  • Not only applicable to Chinese, but also to
    Japanese (SOV order) and German (scrambling)

5
Idea: Rules for Subphrases
  • Motivation
  • Since phrases are good for learning reorderings
    of words, we can use them to learn reorderings of
    phrases as well
  • Rules with placeholders for subphrases
  • ⟨yu 1 you 2, have 2 with 1⟩ (see the sketch
    after this list)
  • Learned automatically from bitext without
    syntactic annotation
  • Formally syntax-based but not linguistically
    syntax-based
  • the result sometimes resembles a syntactician's
    grammar but often does not
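
To make the placeholder mechanism concrete, here is a minimal
Python sketch (an assumed encoding, not Chiang's implementation)
of applying the rule ⟨yu 1 you 2, have 2 with 1⟩: integers in the
templates mark linked subphrase slots, and filling them swaps the
order of the two subphrases on the English side.

    # One hierarchical rule: source and target templates sharing
    # numbered subphrase slots (hypothetical encoding).
    rule = {
        "source": ["yu", 1, "you", 2],
        "target": ["have", 2, "with", 1],
    }

    def apply_rule(rule, fillers):
        """Splice translated subphrases into the target template.

        fillers maps slot index -> list of English tokens."""
        out = []
        for sym in rule["target"]:
            if isinstance(sym, int):
                out.extend(fillers[sym])  # linked slot: insert subphrase
            else:
                out.append(sym)           # terminal: copy as-is
        return out

    # Slot 1 = "Bei Han", slot 2 = "bangjiao", already translated:
    print(apply_rule(rule, {1: ["North", "Korea"],
                            2: ["diplomatic", "relations"]}))
    # -> ['have', 'diplomatic', 'relations', 'with', 'North', 'Korea']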

6
Synchronous CFGs
  • Developed in the 1960s for programming-language
    compilation [Aho1969]
  • Separate tutorial by Chiang describing them
    [Chiang2005b]
  • In NLP, synchronous CFGs have been used for
  • Machine translation
  • Semantic interpretation

7
Synchronous CFGs
  • Like CFGs, but productions have two right-hand
    sides
  • Source side
  • Target side
  • Related through linked non-terminal symbols
  • E.g. VP → ⟨V1 NP2, NP2 V1⟩
  • One-to-one correspondence
  • A non-terminal of type X is always linked to one
    of the same type
  • Productions are applied in parallel to both sides,
    rewriting linked non-terminals together (see the
    sketch below)
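
A toy Python sketch of a parallel derivation (grammar, encoding,
and lexical rules are illustrative assumptions, not from the
paper): expanding a linked non-terminal pair rewrites both sides
at once, so source and target stay in one-to-one correspondence.

    # Each rule: LHS -> (source side, target side); non-terminals
    # are (symbol, link-index) pairs, terminals plain strings.
    # One rule per symbol here; a real grammar offers alternatives.
    GRAMMAR = {
        "VP": ([("V", 1), ("NP", 2)],
               [("NP", 2), ("V", 1)]),     # VP -> <V1 NP2, NP2 V1>
        "V":  (["kanjian"], ["saw"]),      # toy lexical rules
        "NP": (["ta"], ["him"]),
    }

    def derive(symbol):
        """Expand linked non-terminals in parallel on both sides."""
        src_rhs, tgt_rhs = GRAMMAR[symbol]
        # Each linked pair gets exactly one shared expansion.
        expansions = {}
        for item in src_rhs:
            if isinstance(item, tuple):
                sym, link = item
                expansions[link] = derive(sym)
        def realize(rhs, side):
            out = []
            for item in rhs:
                if isinstance(item, tuple):   # linked non-terminal
                    out.extend(expansions[item[1]][side])
                else:                         # terminal
                    out.append(item)
            return out
        return realize(src_rhs, 0), realize(tgt_rhs, 1)

    print(derive("VP"))  # (['kanjian', 'ta'], ['him', 'saw'])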

8
Synchronous CFGs
9
Synchronous CFGs
  • Limitations
  • No Chomsky normal form
  • This has implications for the complexity of the
    decoder
  • Only limited closure under composition
  • Sister reordering only

10
Model
  • Using the log-linear model [Och2002] (see the
    formula below)
  • Presented by Bill last week
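
For reference, this is the standard log-linear formulation of
[Och2002]: feature functions h_i are combined with learned
weights λ_i and normalized over competing translations e':

    \[
    P(e \mid f) =
      \frac{\exp\big(\sum_i \lambda_i h_i(e, f)\big)}
           {\sum_{e'} \exp\big(\sum_i \lambda_i h_i(e', f)\big)}
    \]

Since the denominator is constant for a given f, decoding simply
maximizes the weighted feature sum.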

11
Model: Rule Features
  • P(γ|α) and P(α|γ)
  • Lexical weights Pw(γ|α) and Pw(α|γ)
  • Estimate how well the words in α translate to the
    words in γ
  • Phrase penalty exp(1)
  • Allows the model to learn a preference for longer
    or shorter derivations
  • Exception: glue rule weights
  • w(S → ⟨X1, X1⟩) = 1
  • w(S → ⟨S1 X2, S1 X2⟩) = exp(−λg)
  • λg controls the model's preference for hierarchical
    phrases over serial phrase combination
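
Combined, the weight of a regular rule X → ⟨γ, α⟩ is the product
of these features raised to their learned weights (the λ indices
below are just notational labels for the five feature weights):

    \[
    w(X \to \langle \gamma, \alpha \rangle) =
      P(\gamma \mid \alpha)^{\lambda_1}\,
      P(\alpha \mid \gamma)^{\lambda_2}\,
      P_w(\gamma \mid \alpha)^{\lambda_3}\,
      P_w(\alpha \mid \gamma)^{\lambda_4}\,
      \exp(1)^{\lambda_{pp}}
    \]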

12
Model: Additional Features
  • Separated out from the rule weights
  • Notational convenience
  • Conceptually cleaner (necessary for
    polynomial-time decoding)
  • Derivation D
  • A set of triples ⟨r, i, j⟩: apply grammar rule r
    to rewrite a non-terminal spanning f(D) from i
    to j
  • This representation is ambiguous
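
These separated features enter the derivation weight outside the
per-rule products. Following the paper's formulation, with p_LM
the language model and |e| the length of the English yield:

    \[
    w(D) = \prod_{\langle r, i, j \rangle \in D} w(r)
           \;\times\; p_{LM}(e)^{\lambda_{LM}}
           \;\times\; \exp(-\lambda_{wp}\,|e|)
    \]

Keeping the language model and word penalty out of the individual
rule weights is what keeps the rule-level factorization clean.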

13
Training
  • Training starts from a symmetrical,
    word-aligned corpus
  • Adopted from [Och2004] and [Koehn2003]
  • How to get from a one-directional alignment to a
    symmetrical alignment
  • How to find initial phrase pairs (see the sketch
    after this list)
  • An alternative would be the joint probability
    model [Marcu2002] that Ping presented
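
A minimal Python sketch of the standard consistency criterion for
initial phrase pairs, in the style of [Och2004]/[Koehn2003]: a
source span and target span form a pair if all of their alignment
links stay inside the box. (This omits the usual extension to
unaligned boundary words; names and data are illustrative.)

    def extract_phrase_pairs(n_src, alignment, max_len=10):
        """Enumerate phrase pairs consistent with a word alignment.

        alignment: set of (src_pos, tgt_pos) links, 0-indexed.
        A span pair is consistent if no link crosses its border."""
        pairs = []
        for i1 in range(n_src):
            for i2 in range(i1, min(i1 + max_len, n_src)):
                # Target positions linked to the source span.
                tgt = [t for (s, t) in alignment if i1 <= s <= i2]
                if not tgt:
                    continue
                j1, j2 = min(tgt), max(tgt)
                # Reject if the target span links back outside it.
                if any(not (i1 <= s <= i2)
                       for (s, t) in alignment if j1 <= t <= j2):
                    continue
                pairs.append(((i1, i2), (j1, j2)))
        return pairs

    # Toy alignment: "yu Bei Han you bangjiao" /
    # "have dipl. rels. with North Korea"
    links = {(0, 3), (1, 4), (2, 5), (3, 0), (4, 1), (4, 2)}
    print(extract_phrase_pairs(5, links))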

14
Training
15
Training
  • Unfortunately, this scheme leads
  • To a large number of rules
  • With false ambiguity (many distinct derivations
    yielding the same translation)
  • The grammar is therefore filtered to
  • Balance grammar size and performance
  • Five filter criteria, e.g.
  • At most two non-terminals per rule
  • Initial phrase length limited to 10

16
Decoding
  • Our good old friend, the CKY parser
  • Enhanced with
  • Beam search
  • A postprocessor to map French (source) derivations
    to English derivations (see the sketch below)
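
A schematic Python sketch of CKY-style decoding with a beam. This
is illustrative only: the real decoder parses with the source
sides of the learned synchronous rules and intersects with an
n-gram language model, while this toy keeps just the best-scoring
items per source span and combines them monotonically or swapped.

    from collections import defaultdict

    def cky_decode(sentence, lexical, binary, beam=10):
        """Toy CKY over source spans [i, j), keeping a beam per cell.

        lexical: token -> list of (logprob, translation)
        binary:  list of (logprob, combine) pairs, where combine
                 joins two partial target strings."""
        n = len(sentence)
        chart = defaultdict(list)        # (i, j) -> [(score, target)]
        for i, w in enumerate(sentence):
            chart[(i, i + 1)] = sorted(lexical.get(w, []),
                                       reverse=True)[:beam]
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j, cands = i + span, []
                for k in range(i + 1, j):          # split point
                    for s1, t1 in chart[(i, k)]:
                        for s2, t2 in chart[(k, j)]:
                            for sr, combine in binary:
                                cands.append((s1 + s2 + sr,
                                              combine(t1, t2)))
                chart[(i, j)] = sorted(cands, reverse=True)[:beam]
        return chart[(0, n)]

    # Monotone ("glue"-like) vs. swapped combination, toy scores.
    binary = [(0.0, lambda a, b: a + " " + b),
              (-0.5, lambda a, b: b + " " + a)]
    lexical = {"yu": [(-0.2, "with")],
               "bangjiao": [(-0.1, "dipl. rels.")]}
    print(cky_decode(["yu", "bangjiao"], lexical, binary)[0])
    # -> (-0.3, 'with dipl. rels.')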

17
Results
  • Baseline
  • Pharaoh [Koehn2003, Koehn2004]
  • Minimum error rate training on the BLEU measure
  • Hierarchical model
  • 2.2 million rules after filtering, down from 24
    million
  • 7.5% relative improvement in BLEU
  • Additional constituent feature
  • An additional feature favoring syntactic parses
  • Trained on 250k sentences of the Penn Chinese
    Treebank
  • Improved accuracy only on the development set

18
Learned Feature Weights
  • Word = word penalty
  • Phr = phrase penalty (pp)
  • λg penalizes glue rules much less than λpp does
    regular rules
  • i.e., this suggests that the model will prefer the
    serial combination of phrases, unless some other
    factor supports the use of hierarchical phrases

19
Conclusions
  • Hierarchical phrase pairs can be learned from
    data without syntactic annotation
  • Hierarchical phrase pairs improve translation
    accuracy significantly
  • Added syntactic information (the constituent
    feature) did not provide a statistically
    significant gain

20
Future Work
  • Move to a more syntactically motivated grammar
  • Reduce grammar size to allow more aggressive
    training settings

21
My Thoughts/Questions
  • Really interesting approach to bringing syntactic
    information into SMT
  • The example sentence was still not translated
    correctly
  • Missing words are problematic
  • Can phrase reordering also be learned by
    lexicalized phrase-reordering models [Och2004]?
  • Why did the constituent feature improve accuracy
    only on the development set, but not on the test
    set?
  • Does data sparseness influence the learned
    feature weights?
  • What syntactic features are already built into
    Pharaoh?

22
References
  • [Aho1969] Aho, A. V. and J. D. Ullman. 1969.
    Syntax directed translations and the pushdown
    assembler. Journal of Computer and System
    Sciences, 3:37–56.
  • [Chiang2005] Chiang, David. 2005. A Hierarchical
    Phrase-Based Model for Statistical Machine
    Translation. In Proceedings of ACL 2005, pages
    263–270.
  • [Chiang2005b] http://www.umiacs.umd.edu/~resnik/ling645_fa2005/notes/synchcfg.pdf
  • [Koehn2003] Koehn, Philipp. 2003. Noun Phrase
    Translation. Ph.D. thesis, University of Southern
    California.
  • [Koehn2004] Koehn, Philipp. 2004. Pharaoh: a
    beam search decoder for phrase-based statistical
    machine translation models. In Proceedings of the
    Sixth Conference of the Association for Machine
    Translation in the Americas, pages 115–124.
  • [Marcu2002] Marcu, Daniel and William Wong.
    2002. A phrase-based, joint probability model for
    statistical machine translation. In Proceedings
    of the 2002 Conference on Empirical Methods in
    Natural Language Processing (EMNLP), pages
    133–139.
  • [Och2002] Och, Franz Josef and Hermann Ney.
    2002. Discriminative training and maximum entropy
    models for statistical machine translation. In
    Proceedings of the 40th Annual Meeting of the
    ACL, pages 295–302.
  • [Och2004] Och, Franz Josef and Hermann Ney.
    2004. The alignment template approach to
    statistical machine translation. Computational
    Linguistics, 30(4):417–449.